The completed sequences of numerous vertebrate genomes have enabled rapid gene annotation across species using orthologous relationships. This approach is feasible because purifying selection, acting on the open reading frames of coding exons to preserve the encoded protein sequences, limits sequence divergence. Protein-coding sequences generally change more slowly over millions of years than non-coding sequences do, and similarity at the nucleotide level is reflected in the similar structure and function of the gene products in different species. Some non-coding functional elements are also maintained as conserved sequences across species through purifying selection [1]. Enhancer elements, for example, are often predicted from their distinctive sequence conservation. Other functional classes, such as promoters, are more plastic in composition and cannot readily be identified in this manner. Because precise computational methods for predicting promoter regions in newly assembled genomes have not yet been developed, promoter annotation lags behind that of coding genes and enhancers.
We hypothesized that promoter regions could be reliably mapped across species using a distinctive class of promoters that are flanked by a gene on each side. These promoters, known as bidirectional promoters, should be useful for annotating promoter regions across mammals because the genes on both sides of the promoter change slowly. The promoter region is thus maintained as a recognizable intergenic architecture that is amenable to computational discovery. Furthermore, if no repetitive elements have been inserted into the bidirectional promoter region in either species, the intergenic distance should be maintained across species. In support of this hypothesis, Takai and Jones (2004) [2] showed that repetitive elements are excluded from bidirectional promoters on human chromosomes 20, 21, and 22.
Bidirectional promoters were originally defined as the regulatory regions in the intergenic space between two oppositely oriented genes whose transcription start sites (TSSs) are separated by no more than 1,000 bp [3]. Such genes are arranged head-to-head, i.e., they are transcribed away from one another from opposite strands of DNA. This closely spaced arrangement of TSSs flanking a bidirectional promoter was recognized as non-random, because more genes have this architecture than expected by chance [2]. Initially, up to 10% of human protein-coding genes were identified as having bidirectional promoters. We subsequently identified thousands of additional putative bidirectional promoters by analyzing divergently transcribed, spliced EST data [4]. The method used here to map bidirectional promoters across species treats the genes on each side of a promoter as anchors that delimit the orthologous intergenic regulatory region. If the genome of the other species has conserved gene order and orientation at the orthologous location, then the intergenic promoter region must have evolved from the ancestral sequence at that location. If the intergenic distance between the annotated transcripts in the other species is also ≤ 1,000 bp, the orthologous bidirectional promoter is considered validated. An added benefit is that this method does not depend on the level of nucleotide sequence conservation in promoter regions, which can vary extensively [5].
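The definition and the anchor-based validation described above can be sketched in Python. This is a minimal illustration under assumptions not taken from the source: each gene is reduced to a single annotated TSS (the hypothetical `Gene` record), head-to-head pairs with overlapping 5' ends are ignored, orthology is supplied as a precomputed name-to-name mapping, and "conserved gene order and orientation" is approximated as both orthologs lying head-to-head on the same chromosome, without checking for intervening genes. All function names and status strings are illustrative, not the authors' pipeline.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_TSS_DISTANCE = 1000  # bp; the original bidirectional-promoter cutoff


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal record: one annotated TSS per gene."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # TSS coordinate on the chromosome


def head_to_head_distance(a: Gene, b: Gene) -> Optional[int]:
    """TSS-to-TSS distance if a and b are head-to-head on the same
    chromosome, else None. Assumes non-overlapping 5' ends, i.e. the
    minus-strand TSS lies at or to the left of the plus-strand TSS."""
    if a.chrom != b.chrom or {a.strand, b.strand} != {"+", "-"}:
        return None
    minus, plus = (a, b) if a.strand == "-" else (b, a)
    distance = plus.tss - minus.tss
    return distance if distance >= 0 else None


def find_bidirectional_pairs(
    genes: List[Gene], max_distance: int = MAX_TSS_DISTANCE
) -> List[Tuple[Gene, Gene, int]]:
    """Return (minus-strand gene, plus-strand gene, distance) triples whose
    TSSs are head-to-head and at most max_distance bp apart."""
    pairs = []
    for chrom in sorted({g.chrom for g in genes}):
        on_chrom = [g for g in genes if g.chrom == chrom]
        plus = sorted((g for g in on_chrom if g.strand == "+"), key=lambda g: g.tss)
        plus_tss = [g.tss for g in plus]
        for m in sorted((g for g in on_chrom if g.strand == "-"), key=lambda g: g.tss):
            # Plus-strand TSSs falling in the window [m.tss, m.tss + max_distance].
            lo = bisect_left(plus_tss, m.tss)
            hi = bisect_right(plus_tss, m.tss + max_distance)
            pairs.extend((m, p, p.tss - m.tss) for p in plus[lo:hi])
    return pairs


def validate_ortholog(
    pair: Tuple[Gene, Gene, int],
    orthologs: Dict[str, str],
    other_genes: Dict[str, Gene],
    max_distance: int = MAX_TSS_DISTANCE,
) -> str:
    """Classify one reference-species pair in another species, using its two
    flanking genes as anchors. head_to_head_distance does not care which gene
    is on which strand, so an inverted locus still counts as conserved."""
    anchors = []
    for gene in pair[:2]:
        name = orthologs.get(gene.name)
        anchors.append(other_genes.get(name) if name else None)
    if any(a is None for a in anchors):
        return "missing ortholog"
    distance = head_to_head_distance(*anchors)
    if distance is None:
        return "arrangement not conserved"
    if distance <= max_distance:
        return "validated"
    return "arrangement conserved, distance > max"


if __name__ == "__main__":
    human = [
        Gene("GENE_A", "chr1", "-", 10_000),
        Gene("GENE_B", "chr1", "+", 10_400),
        Gene("GENE_C", "chr2", "+", 7_000),  # 2,000 bp from GENE_D: too far
        Gene("GENE_D", "chr2", "-", 5_000),
    ]
    # Hypothetical locus that is inverted in the other species' assembly.
    other = {g.name: g for g in [Gene("a", "chr5", "+", 20_600),
                                 Gene("b", "chr5", "-", 20_000)]}
    orthologs = {"GENE_A": "a", "GENE_B": "b"}
    for pair in find_bidirectional_pairs(human):
        print(pair[0].name, pair[1].name, pair[2],
              validate_ortholog(pair, orthologs, other))
    # prints: GENE_A GENE_B 400 validated
```

Exposing the 1,000 bp cutoff as `max_distance` keeps the original definition as the default while allowing it to be varied; the gene names and coordinates in the demo are made up for illustration.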
The enrichment of bidirectional promoters in the human genome raises questions about their evolution. In some cases, chromosomal rearrangements could have joined the promoter regions of two genes. Those genes would then remain linked through all subsequent speciation events because of selective pressure against change: any break in this linkage (within or near the bidirectional promoter) could disrupt the normal regulation of both genes, potentially with profound, disadvantageous effects on cellular function. If so, bidirectional promoters should provide an evolutionary timestamp of rearrangement events across mammalian genomes. Alternatively, some unidirectional promoters could have lost control of their regulated transcription, enabling RNA polymerase to load and proceed in the opposite direction [6]. This scenario could serve as a mechanism for generating new genes, which would occur rarely and in a species-specific manner.
Building on our previous computational infrastructure, we used updated human genome annotations to compare bidirectional promoters in the human and bovine genomes, testing the hypothesis that the long-term evolutionary histories of these promoters can be identified and used to annotate the bovine genome. We used these data to create a detailed regulatory map of orthologous promoter regions across five placental mammals (human, chimpanzee, cow, mouse, and rat). We show that the "locked" arrangement of genes around these promoters enables prediction of unannotated 5' UTRs through cross-species comparisons. Furthermore, we identified bidirectional promoters that have no orthologous counterpart in any of the other species, supporting the conclusion that species-specific genes can be identified through rigorous cross-species comparisons of this dataset. One human-specific example came from the RecQ helicase family, whose five paralogs (WRN, BLM, RECQL, RECQL4, and RECQL5) all have bidirectional promoters that developed independently.
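The two outcomes summarized in this paragraph can be sketched as simple filters over per-species comparison results. This assumes, hypothetically, that each reference promoter has already been assigned one of four status labels (defined in the code) for every other species compared; the labels, function names, and demo data are illustrative. Treating a conserved head-to-head arrangement whose annotated TSSs lie more than 1,000 bp apart as a candidate for an unannotated 5' UTR is one possible reading of the "locked" arrangement, not the authors' stated procedure.

```python
from typing import Dict, List

# Hypothetical status labels for one reference promoter in one other species.
VALIDATED = "validated"
CONSERVED_FAR = "arrangement conserved, distance > max"
NOT_CONSERVED = "arrangement not conserved"
MISSING = "missing ortholog"

Outcomes = Dict[str, Dict[str, str]]  # promoter ID -> {species: status}


def species_specific(outcomes: Outcomes) -> List[str]:
    """Promoter IDs with no conserved counterpart in any other species compared."""
    return sorted(
        pid
        for pid, per_species in outcomes.items()
        if per_species  # require at least one comparison
        and all(s in (NOT_CONSERVED, MISSING) for s in per_species.values())
    )


def utr_candidates(outcomes: Outcomes) -> Dict[str, List[str]]:
    """For each promoter, the species where the flanking genes keep their
    head-to-head arrangement but annotated TSSs lie too far apart; flagged,
    as an assumption, as possibly missing 5' UTR sequence."""
    candidates: Dict[str, List[str]] = {}
    for pid, per_species in outcomes.items():
        species = sorted(sp for sp, s in per_species.items() if s == CONSERVED_FAR)
        if species:
            candidates[pid] = species
    return candidates


if __name__ == "__main__":
    demo = {
        "BP1": {"chimp": VALIDATED, "cow": VALIDATED, "mouse": CONSERVED_FAR, "rat": VALIDATED},
        "BP2": {"chimp": NOT_CONSERVED, "cow": MISSING, "mouse": MISSING, "rat": NOT_CONSERVED},
    }
    print(species_specific(demo))  # ['BP2']
    print(utr_candidates(demo))    # {'BP1': ['mouse']}
```

A promoter whose arrangement is conserved but whose TSSs are far apart is deliberately not counted as species-specific here, since an orthologous counterpart still exists.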