Recent analyses of mammalian genomes and microarray data suggest that the majority of mammalian genes generate multiple transcripts and protein isoforms with distinct functional roles. This transcript diversity arises in part from the use of alternative promoters (1) and alternative splicing (2), which produce pre-mRNA and mRNA isoforms, respectively. Alternative promoter usage plays a fundamental role in regulating the isoforms of genes such as LEF1, TP73, RUNX1 and MYC across mammalian tissues and developmental stages. For example, the LEF1 protein isoforms generated from its two promoters have opposing biological functions. Full-length LEF1, transcribed from the upstream promoter, interacts with β-catenin and regulates Wnt target genes, whereas the shorter isoform cannot bind β-catenin and suppresses β-catenin-mediated regulation of Wnt targets (3). Moreover, activation of the upstream promoter and silencing of the internal promoter are observed in most colon cancers (4). Identifying primary and alternative gene promoters in normal tissues is therefore critical to understanding the diverse physiological processes associated with normal and disease states. The advent of high-throughput molecular technologies, and of computational methods to analyze the data they produce, has significantly improved our ability to annotate mammalian gene regulatory regions. High-throughput technologies such as cap analysis of gene expression (CAGE); chromatin immunoprecipitation (ChIP) followed by microarray analysis (ChIP–chip); ChIP coupled with paired-end ditag sequencing (ChIP-PET) (5,6); and, more recently, ChIP coupled with sequencing (ChIP-seq) (7) are enabling the genome-wide identification of alternative promoters and their patterns of use. This information will help us understand how alternative promoters are used across a wide variety of cell and tissue types and developmental stages, and how they are misused in disease.
Growing evidence suggests that about half of mammalian genes have multiple alternative promoters that can span up to thousands of bases (8–12). For example, a comprehensive analysis of 1% of the human genome in 16 diverse human cell lines, using transient transfection reporter assays, demonstrated functional alternative promoters in >20% of genes (12). Similarly, 35% of 100 human erythroid genes examined have been reported to have alternative promoters, and 24% of active genes in human fibroblasts possess multiple promoters (13). These are high proportions for a single biological process or cell type, suggesting that mammalian genes make extensive use of multiple promoters. Yet knowledge of alternative promoter usage across mammalian tissues remains very limited, and the question cannot be addressed without high-resolution, genome-wide mapping of promoter regions. However, high-throughput approaches for genome-level promoter annotation, such as CAGE (14), deepCAGE (15), ChIP–chip (16,17) and ChIP-seq (7), must be applied with caution because each has inherent limitations. For example, cytoplasmic enzyme complexes can add caps to 5′-monophosphate RNA molecules generated by ribonuclease cleavage (18), so CAGE tags may represent the 5′ ends of RNAs produced by cleavage and subsequent recapping rather than transcription start sites (19). CAGE can also capture some uncapped transcripts that may represent cleaved, decaying mRNA (20). Furthermore, many CAGE tags are distributed along the length of gene transcripts, which makes CAGE unreliable as the sole means of identifying promoters. We (16) and others (17) have previously used ChIP–chip to profile the activity of mammalian promoters across different cell and tissue types. However, ChIP–chip requires the design of genome-wide microarrays to probe the ChIP-enriched DNA. Moreover, with either ChIP–chip or ChIP-seq, promoters cannot be identified solely from Pol-II enrichment at a genomic location, because Pol-II is enriched throughout transcribed regions and highly specific antibodies that distinguish promoter-bound from elongating Pol-II are lacking. To overcome these limitations, we combined Pol-II ChIP-seq with computational promoter prediction to identify promoter regions and their activity in five mouse tissues. We provide a genome-wide catalog of active promoters, including tissue-specific promoters, in five adult mouse tissues, which should aid future studies of transcriptional regulation in mammalian genomes.
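Because Pol-II enrichment alone cannot separate promoters from transcribed regions, one way to picture the combined strategy is to intersect Pol-II signal with independently predicted promoter regions. The Python sketch below is a minimal, hypothetical illustration of that idea, not the pipeline used in this study. It assumes that Pol-II peaks and predicted promoters are already available as BED-style (0-based, half-open) intervals, and it calls a predicted promoter "active" in a tissue if any Pol-II peak overlaps it by at least one base. The names `Interval`, `active_promoters` and `tissue_specific`, the tissue labels and all coordinates are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Interval(NamedTuple):
    """Genomic interval in 0-based, half-open (BED-style) coordinates."""
    chrom: str
    start: int
    end: int
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def active_promoters(predicted: List[Interval], peaks: List[Interval]) -> List[Interval]:
    """Keep predicted promoters overlapped by at least one Pol-II peak.

    Peaks lying only in gene bodies (e.g. elongating Pol-II) overlap no
    predicted promoter, so they never produce a promoter call.
    """
    peaks_by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for peak in peaks:
        peaks_by_chrom[peak.chrom].append(peak)
    # Brute-force scan per chromosome; adequate for a small illustration.
    return [
        prom for prom in predicted
        if any(overlaps(prom, peak) for peak in peaks_by_chrom.get(prom.chrom, []))
    ]


def tissue_specific(predicted: List[Interval],
                    peaks_by_tissue: Dict[str, List[Interval]]) -> Dict[str, Set[str]]:
    """Map each tissue to the names of promoters active in that tissue only."""
    active = {
        tissue: {p.name for p in active_promoters(predicted, peaks)}
        for tissue, peaks in peaks_by_tissue.items()
    }
    result: Dict[str, Set[str]] = {}
    for tissue, names in active.items():
        # Union of promoters active in every other tissue.
        others = set().union(*(n for t, n in active.items() if t != tissue))
        result[tissue] = names - others
    return result


# Toy data (hypothetical): two promoters for geneA, one for geneB.
promoters = [
    Interval("chr1", 1000, 2000, "geneA_P1"),
    Interval("chr1", 5000, 6000, "geneA_P2"),
    Interval("chr2", 300, 800, "geneB_P1"),
]
peaks = {
    # The chr1:3000-3500 peak sits in a gene body and overlaps no promoter.
    "liver": [Interval("chr1", 1500, 1700), Interval("chr1", 3000, 3500)],
    "brain": [Interval("chr1", 1200, 1300), Interval("chr2", 700, 900)],
}

print(tissue_specific(promoters, peaks))
# prints {'liver': set(), 'brain': {'geneB_P1'}}
```

The brute-force overlap scan keeps the sketch short; for genome-scale data an interval tree, or an established tool such as `bedtools intersect` with `-u` (report each promoter once if any overlap exists), would be the usual choice. The one-base overlap threshold is an assumption, not a parameter taken from this study.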