Genome-wide pervasive transcription has been reported in many eukaryotic organisms1-7, revealing a highly interleaved transcriptome organization that involves hundreds of novel non-coding RNAs8. These recently identified transcripts either exist stably in cells (Stable Unannotated Transcripts) or are rapidly degraded by the RNA surveillance pathway (Cryptic Unstable Transcripts). One characteristic of pervasive transcription is the extensive overlap of SUTs and CUTs with previously annotated features, which prompts the questions of how these transcripts are generated, and whether they exert function9. Single-gene studies have shown that transcription of SUTs and CUTs can be functional, through mechanisms involving the generated RNAs10,11 or their generation itself12-14. To date, a complete transcriptome architecture including SUTs and CUTs has not been described in any organism. Knowledge about the position and genome-wide arrangement of these transcripts will be instrumental in understanding their function8,15. We provide here a comprehensive analysis of these transcripts in the context of multiple conditions, a mutant of the exosome machinery and different strain backgrounds. We show that both SUTs and CUTs display distinct patterns of distribution at specific locations. Most of the newly identified transcripts initiate from nucleosome-free regions (NFRs) associated with the promoters of other transcripts (mostly protein-coding genes), or from NFRs at the 3’ ends of protein-coding genes. Likewise, about half of all coding transcripts initiate from NFRs associated with promoters of other transcripts. These data change our view of how a genome is transcribed, suggesting that bidirectionality is an inherent feature of promoters. 
Such an arrangement of divergent and overlapping transcripts may provide a mechanism for local spreading of regulatory signals – that is, coupling the transcriptional regulation of neighbouring genes via transcriptional interference or histone modification.
To obtain a comprehensive survey of the structure and expression level of transcripts across the yeast genome, we used tiling arrays3 to profile wild-type transcriptomes in ethanol (YPE), glucose (YPD, SDC) and galactose (YPGal), which together encompass the main laboratory growth conditions of yeast (Supplementary Tables 1 and 2). Transcript start and end positions were mapped to the genome by a segmentation algorithm16 and subsequent manual curation. To identify CUTs, profiles were measured for a deletion mutant of RRP6, coding for an important component of the nuclear exosome, which is involved in the degradation of CUTs17,18. Transcripts specific to the rrp6Δ mutant were designated as CUTs (Methods). Expression profiles are provided in a searchable web database (http://steinmetzlab.embl.de/NFRsharing).
Altogether, 7,272 transcripts were identified, including 5,171 verified or uncharacterized ORF transcripts (ORF-Ts), 847 SUTs and 925 CUTs (Fig. 1, Supplementary Table 3). We took advantage of data from different conditions to disambiguate cases of overlapping or immediately adjacent transcripts (Methods). We only used transcripts with confidently mapped 5’ ends for analyses involving start sites (5,084 ORF-Ts, 823 SUTs and 704 CUTs) (Methods and Supplementary Table 4). For validation, we compared our data to transcript start sites (TSSs) mapped by 5’ RACE19. Of the TSSs compared, 81% (1,039 of 1,281) agreed within 50 bases with the 5’ RACE results (Supplementary Fig. 1), 3% higher than a recent Solexa sequencing approach19. Furthermore, a comparison of our 3’ ends with the Solexa dataset showed agreement of 61% (2,774 of 4,551) within 50 bases. In addition, we tested several CUT boundaries and they agreed well with our RT-PCR and 5’ RACE validations (Supplementary Fig. 2 and Supplementary Table 5). Altogether, 102 SUTs had a higher expression level in rrp6Δ than in wild-type (Supplementary Table 6), suggesting that the distinction between CUTs and SUTs is in some cases condition-dependent; for example, the CUT antisense of PHO84 is stabilized in old cells11. CUTs were, overall, shorter (median length 440 bases) than SUTs (median length 761 bases; p < 2 × 10⁻¹⁶, Wilcoxon test).
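The agreement figures above amount to counting matched positions that differ by at most 50 bases. The following is a minimal sketch of that count, assuming the two sets of coordinates have already been paired transcript by transcript; the function name and the toy coordinates are hypothetical and are not taken from the study.

```python
def fraction_within(pairs, tolerance=50):
    """Count paired coordinates that agree within `tolerance` bases.

    `pairs` is a list of (position_a, position_b) tuples, e.g. an
    array-derived start site and the 5' RACE start site of the same
    transcript. Returns (n_agree, n_total, fraction).
    """
    n_total = len(pairs)
    n_agree = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    fraction = n_agree / n_total if n_total else 0.0
    return n_agree, n_total, fraction


# Toy coordinates for illustration only (not real data).
toy_pairs = [(1000, 1012), (5230, 5301), (8800, 8750), (12040, 12039)]
n_agree, n_total, fraction = fraction_within(toy_pairs)
print(n_agree, n_total, fraction)

# The percentages quoted in the text follow from the reported counts.
print(round(100 * 1039 / 1281))  # 5' ends versus 5' RACE
print(round(100 * 2774 / 4551))  # 3' ends versus the Solexa dataset
```

The tolerance is inclusive here (a difference of exactly 50 bases counts as agreement); whether the original comparison used an inclusive bound is an assumption.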
Nucleosome-free promoter regions (or 5’ NFRs), which facilitate transcription by allowing RNA polymerase to bind to DNA, have been reported as hallmarks of gene promoters20-24. To test whether unannotated transcripts have such hallmarks, we compared our transcript positions with nucleosome maps22,25. Consistent with promoter activity at NFRs, all classes of transcripts, ORF-Ts, CUTs and SUTs, exhibited depletion of nucleosomes upstream of their TSS (Fig. 2a). Furthermore, no nucleosome was detected between 422 of the 666 (63%) non-overlapping divergent transcript pairs involving at least one unannotated transcript (Methods and Supplementary Table 7). This suggests that these pairs share a single 5’ NFR that may function as a bidirectional promoter.
To further investigate the set of potential bidirectional promoters in the yeast genome, we analyzed all 1,049 non-overlapping divergent transcript pairs that shared a single 5’ NFR. The distribution of distances between their TSSs had an estimated mode at 180 bases, while their shared NFR lengths had a mode at 131 bases (Fig. 2b). The size of the shared 5’ NFRs increased with the inter-transcript distances, in a relationship consistent with a model of a single NFR surrounded by two regions inside the flanking nucleosomes from which transcripts initiate22,25.
In our analysis, 612 of 931 non-overlapping divergent protein-coding transcript pairs were found to share a single 5’ NFR (66%, Supplementary Table 7). This fraction is considerably higher than the 30% of divergent ORF pairs that were previously estimated to share promoters26. Previous studies may have underestimated the number of bidirectional promoters by considering only distances between ORF start codons. Indeed, for divergent ORF-T pairs sharing a 5’ NFR (Fig. 2c, red dots), the total UTR length increased with the distance between the start codons, consistent with a typical size of the inter-transcript distance of a shared promoter being ~180 bases, as evident from Fig. 2b. This relationship holds for a wide range of inter-ORF distances, including cases greater than 1,000 bases, such as SAG1 and APL1 (Fig. 1f). In contrast, divergent ORF-T pairs separated by multiple NFRs showed no correlation between total UTR length and distance separating start codons (Fig. 2c, black dots). Moreover, most of these pairs were separated by more than 452 bases, which is approximately the minimal size of a region spanned by two NFRs (2 × 131 bases), a nucleosome (146 bases) and two intra-nucleosome regions (2 × 22 bases; Supplementary Fig. 3). These results suggest that bidirectional promoter usage is frequent for divergent transcript pairs involving unannotated transcripts and protein-coding genes in any combination.
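The 452-base threshold quoted above is the sum of the component sizes given in the text. A minimal sketch of that arithmetic follows; the constant and function names are illustrative only.

```python
# Component sizes in bases, as quoted in the text.
NFR_LENGTH = 131         # mode of the shared 5' NFR length (Fig. 2b)
NUCLEOSOME_LENGTH = 146  # one nucleosome separating the two NFRs
INTRA_NUCLEOSOME = 22    # region inside a flanking nucleosome


def min_span_two_nfrs(nfr=NFR_LENGTH,
                      nucleosome=NUCLEOSOME_LENGTH,
                      intra=INTRA_NUCLEOSOME):
    """Approximate minimal span covered by two NFRs, the nucleosome
    between them and two intra-nucleosome regions."""
    return 2 * nfr + nucleosome + 2 * intra


print(min_span_two_nfrs())  # 2*131 + 146 + 2*22 = 452
```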
To determine how many of the 5’ NFRs initiate transcripts bidirectionally, we selected all nucleosome-depleted regions longer than 80 bases immediately upstream of TSSs, defining a set of 3,965 5’ NFRs (Methods and Supplementary Fig. 4). Of these, 1,318 (33%) were bidirectional, involving half of all transcripts with a mapped 5’ NFR (2,656 of 5,339, Supplementary Tables 8-10). The sequences of NFRs detected as bidirectional promoters did not differ significantly from the other 5’ NFRs by content of palindromic sequences or GC nucleotides. Among transcripts with mapped 5’ NFRs, 61% of unannotated transcripts and 48% of protein-coding transcripts initiated bidirectionally from shared 5’ NFRs rather than initiating from their own promoters (Fig. 3b). Of the unannotated transcripts, 90% shared the 5’ NFR with a protein-coding transcript. These results suggest that bidirectionality is an inherent property of promoters. In addition to bidirectional transcription, a small number of transcripts were found to initiate in tandem orientation from shared 5’ NFRs (Fig. 3b). This number is likely underestimated, however, due to the difficulty of distinguishing immediately adjacent tandem transcripts by microarray hybridization. Altogether, our results suggest that multiple transcripts often initiate from NFRs at promoters in yeast. Additional transcripts will likely be detected by profiling other conditions or mutants other than rrp6Δ.
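The classification described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure given in Methods: it assumes that each transcript has already been assigned to the NFR immediately upstream of its TSS, and it treats an NFR as bidirectional when transcripts on both strands are assigned to it. Only the 80-base length threshold comes from the text; all names and toy data are hypothetical.

```python
from collections import defaultdict

MIN_NFR_LENGTH = 80  # bases; NFRs must be longer than this (from the text)


def classify_nfrs(nfrs, assignments):
    """Label each sufficiently long NFR by the transcripts it initiates.

    nfrs:        dict mapping nfr_id -> (start, end) coordinates
    assignments: iterable of (transcript_id, nfr_id, strand) tuples,
                 with strand given as "+" or "-"
    """
    strands = defaultdict(list)
    for _transcript_id, nfr_id, strand in assignments:
        strands[nfr_id].append(strand)

    labels = {}
    for nfr_id, (start, end) in nfrs.items():
        if end - start <= MIN_NFR_LENGTH:
            continue  # too short to count as an NFR
        found = strands.get(nfr_id, [])
        if "+" in found and "-" in found:
            labels[nfr_id] = "bidirectional"
        elif len(found) > 1:
            labels[nfr_id] = "tandem"
        elif len(found) == 1:
            labels[nfr_id] = "single"
        else:
            labels[nfr_id] = "none"
    return labels


# Toy input for illustration only (not real data).
toy_nfrs = {"n1": (100, 250), "n2": (900, 1040), "n3": (2000, 2060)}
toy_assignments = [
    ("tA", "n1", "+"), ("tB", "n1", "-"),  # divergent pair sharing n1
    ("tC", "n2", "+"),
    ("tD", "n3", "-"),                     # n3 is only 60 bases long
]
print(classify_nfrs(toy_nfrs, toy_assignments))
```

A fuller version would also check the relative positions of the two start sites to confirm that the pair is divergent rather than convergent; that check is omitted here for brevity.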
In addition to NFRs at promoters, nucleosome-free regions downstream of stop codons have been reported for the vast majority of ORFs and are suspected to play a role in transcription termination as well as in the generation of transcripts antisense to the ORF22. To better characterize such NFRs, we selected all nucleosome-depleted regions longer than 80 bases immediately downstream of stop codons of all verified and uncharacterized ORFs that we detected as expressed, defining a set of 2,616 3’ NFRs (Supplementary Table 9). Of these, 827 initiated a transcript. We observed that 27% of unannotated transcripts with a mapped 5’ NFR initiated from the 3’ NFR of an ORF (Fig. 3b). Together, 3’ and 5’ shared NFRs thus accounted for the majority (73%) of SUT or CUT initiation, and for the majority (61%) of ORF-T initiation (Fig. 3b; see Supplementary Tables 10 and 11 for a list of all pairs). Altogether, these results show a surprisingly high level of NFR sharing, not only in bidirectional promoters but also in 3’ NFRs.
The high level of NFR sharing may explain a large extent of antisense transcription3, i.e. transcription on opposite strands. Of all antisense transcripts with a mapped 5’ NFR, 70% initiated from a shared nucleosome-free region. For example, 269 unannotated transcripts initiating from the 3’ NFR of an ORF were transcribed antisense to the ORF (such as YLR050C and MBR1, Figs. 1c, 1e). Another recurrent configuration is an antisense transcript starting from the 5’ NFR of a downstream tandem transcript. These configurations associate three transcripts; an example is GAL80, whose 5’ NFR initiates a transcript antisense to its upstream gene SUR7 (Fig. 1b). Notably, the level of SUR7 was lowest in YPGal medium, where the antisense transcript and GAL80 had the highest expression (18 further examples are given in Supplementary Table 12). To generalize these observations, we analyzed expression correlations across growth conditions among transcript pairs involving at least one SUT. We observed significant expression anti-correlation between sense-antisense pairs, while bidirectional pairs of transcripts showed a tendency for co-expression (Supplementary Fig. 5 and Supplementary Table 13; p < 10⁻⁷, Pearson’s product moment correlation test). These findings fit the patterns displayed by individual cases of transcriptional interference or inhibitory histone modifications10-14,27.
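The correlation analysis above compares the expression profiles of two transcripts across growth conditions. The following is a minimal sketch of the Pearson correlation coefficient for one such pair; the expression values are invented for illustration, the function name is hypothetical, and the significance test reported in the text is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Invented expression levels in the four wild-type conditions.
conditions = ["YPE", "YPD", "SDC", "YPGal"]
sense_profile = [4.0, 3.0, 2.0, 1.0]
antisense_profile = [1.0, 2.0, 3.0, 4.0]

# A perfectly anti-correlated toy pair gives r = -1.
print(pearson_r(sense_profile, antisense_profile))
```

With only a handful of conditions, a single pair carries little statistical weight; the text draws its conclusion from the distribution of correlations over many pairs.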
The extent to which unannotated transcripts across the genome play a biological role, or are merely transcriptional side products (noise) originating from nucleosome-depleted regions9, is unknown. The action of transcription itself can be functional even if the transcription product is not. This is the case, for example, with the transcription of the ncRNAs SRG1 and IME4 antisense, which mediate transcriptional silencing13,14. To explore the conservation of transcription initiation from 5’ and 3’ NFRs, we profiled the transcriptome of YJM789 (ref. 28), a highly diverged relative of the laboratory strain S288c. In rich medium (YPD), about fifty percent (380/769) of the SUTs expressed in S288c were also found expressed in YJM789 (Methods). The frequencies with which these 380 conserved SUTs were observed sharing NFRs with other transcripts were similar to those in the overall dataset. These results indicate that the interlaced architecture of transcript initiation from 5’ and 3’ NFRs is conserved between these strains of S. cerevisiae. Why some of the unannotated transcripts are stable and others unstable remains to be explored. The parasite Giardia lamblia produces an abundance of antisense transcripts originating bidirectionally from promoters29, and consistent with our rrp6Δ results, its genome lacks orthologs of several nuclear exosome components. Likewise, the function of bidirectional transcription requires further exploration. One hypothesis is that bidirectional transcription has a role in maintaining an open chromatin structure at promoters. In other instances the combined action of bidirectional promoters and transcriptional regulation by these transcripts, or their generation, may provide a mechanism to spread transcriptional regulatory signals locally in the genome.
cDNA for hybridization was prepared using random priming or random plus oligo-dT priming, with the addition of actinomycin D during reverse transcription30. The hybridization data were normalized and segmented using the Bioconductor package ‘tilingArray’16. Segments were then manually curated. Further details can be found in Methods and Supplementary Information.
We thank Asifa Akhtar, Andreas Ladurner, Stephanie Blandin, Raeka Aiyar, Eugenio Mancera Ramos and Emilie Fritsch for helpful comments on the manuscript, Joern Toedling for helpful discussion and for the template of the website, Charles Girardot for data submission to ArrayExpress, Nick Proudfoot for access to experimental equipment, and the contributors to the Bioconductor (www.bioconductor.org) and R (http://www.r-project.org) projects for their software. This work was supported by grants to L.M.S. from the National Institutes of Health and Deutsche Forschungsgemeinschaft, by a SystemsX fellowship to E.G., a Roche fellowship to J.C. and grants to F.S. from SNF and NCCR Frontiers in Genetics.
Author Contributions L.M.S., Z.X. and W.W. designed the research; Z.X. and W.W. annotated the transcripts with the help of J.G. and F.P.; W.W. and Z.X. performed analysis of the transcripts with the help of J.G.; F.P. and S.C. performed the array hybridizations; J.C., E.G. and F.S. provided samples for the rrp6 mutant, and designed and performed validation RT-PCR and 5’ RACE experiments; L.M.S., J.G., F.S. and W.H. supervised the research; L.M.S., Z.X., W.W., J.G. and W.H. wrote the manuscript.
Author Information Raw data are available from ArrayExpress (http://www.ebi.ac.uk/arrayexpress) under accession number E-TABM-590.