The comparative search in several yeasts showed a large number of signals indicative of structured RNAs. We found evidence for structured RNAs not only in intergenic regions (which are often believed to harbor ncRNAs [10]), but also in coding regions and untranslated regions of coding sequences. The only previous in silico study to predict new ncRNAs in yeast, by McCutcheon and Eddy [10], used QRNA [30] and was based on pairwise alignments of the intergenic regions only. The authors estimated the sensitivity of their screen to be 45%, measured against known and annotated ncRNAs (162 predicted out of 363 known ncRNAs). In contrast to the screen of McCutcheon and Eddy, we considered the entire genomic sequence. Based on multiple alignments instead of pairwise alignments, our RNAz-based approach has a significantly increased sensitivity and specificity. We recovered 257 of the 375 known ncRNAs in the S. cerevisiae genome, amounting to a sensitivity of 69%. We retrieved almost all known ncRNAs that were also detected by QRNA, while the overlap with the novel predictions is much smaller: only 42 of the 94 candidate ncRNAs from McCutcheon and Eddy [10] are contained in our predictions. McCutcheon and Eddy verified the transcription of eight candidate ncRNAs (RUF1-8) using Northern blots; three of these (RUF4, RUF6, RUF7), however, turned out to be false positives in later experiments, and RUF8 was identified as a misclassified ORF. Our RNAz-based approach classified RUF1, RUF2, RUF3, RUF5-1 and RUF5-2 as structured RNAs, but did not detect any of the false positives. This observation adds confidence to the specificity of our approach.
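The sensitivity figures quoted in this paragraph follow directly from the reported counts. A minimal Python sketch of the arithmetic is given below; the function name `sensitivity` and the variable names are chosen here for illustration only and do not refer to any tool used in the screen.

```python
def sensitivity(recovered: int, known: int) -> float:
    """Fraction of known ncRNAs that a screen recovers."""
    if known <= 0:
        raise ValueError("known must be a positive count")
    return recovered / known


# Counts as reported in the text.
qrna_sensitivity = sensitivity(162, 363)  # QRNA screen of McCutcheon and Eddy
rnaz_sensitivity = sensitivity(257, 375)  # RNAz-based screen
shared_candidates = 42 / 94               # QRNA candidates also among the RNAz predictions

print(f"QRNA sensitivity:  {qrna_sensitivity:.1%}")
print(f"RNAz sensitivity:  {rnaz_sensitivity:.1%}")
print(f"Shared candidates: {shared_candidates:.1%}")
```

Rounded to whole percentages, the first two values correspond to the 45% and 69% stated above.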
Surprisingly, the largest single class of predicted RNA structures was found in protein coding sequences. By contrast, it is widely believed that RNA structures in CDS can interfere both with translation and with the evolution of the protein coding sequence [13]. However, statistical evidence of widespread secondary structure in eukaryotic CDS was recently provided by Meyer et al. [14]. The best-known examples of RNA structures that are superimposed on protein coding regions come from viruses, e.g. the Rev response element of HIV-1 [31] or the cis-acting replication element (CRE) in picornaviruses [32]. Eukaryotic examples are the mammalian steroid receptor RNA activator (SRA) [33] and the plant gene ENOD40 [34]. An example in yeast is ASH1, which is one of the best-studied systems for localization of mRNAs within the cell [35]. The ASH1 mRNA harbours at least four regions (E1, E2A, E2B, E3) with RNA secondary structures within its protein coding region. These localization elements of ASH1 have no similarity at the sequence level but are structurally related; it is therefore believed that they function at the structural level [37]. Our data strongly suggest that this phenomenon is in fact common in yeast.
The relevance of the observation of a large number of structured RNA elements in coding regions is supported by an unexpected clustering of functional GO annotation terms of the affected protein coding genes. This significant clustering into a small number of functional classes strongly supports the interpretation that these RNAz hits are functional at the posttranscriptional level. The most prominent group is related to cellular metabolism. Another large group of proteins is found to function within the ribosomal complex or within the mitochondria. ASH1 also belongs to the latter group. Many mitochondrial proteins are among the 55 organelle-specific proteins that have RNAz signals. This list includes in particular ATP2 and TIM44, both of which are known to be actively transported to the mitochondria [38]. It is tempting to speculate that many or most of the RNA structures within coding sequences function as localization signals.
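As an illustration only, and not necessarily the procedure used in this study, over-representation of a GO term among genes with RNAz hits is commonly assessed with a one-sided hypergeometric test. The sketch below uses only the Python standard library; the function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Upper-tail probability P(X >= hits) for a hypergeometric X.

    population: all genes considered
    annotated:  genes in the population carrying the GO term
    sample:     genes with at least one RNAz hit
    hits:       genes in the sample carrying the GO term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie within the population")
    if hits <= 0:
        return 1.0
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


# Hypothetical counts: 6000 genes, 300 with the term, 500 genes with hits,
# 50 of which carry the term (25 would be expected by chance).
p_value = hypergeom_enrichment_p(6000, 300, 500, 50)
print(f"enrichment p-value: {p_value:.3g}")
```

When many GO terms are tested at once, the resulting p-values would additionally need a multiple-testing correction.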
Structured RNA elements in UTRs (cis-acting elements) often bind trans-acting factors and control important aspects of gene expression, such as translational efficiency, mRNA stability and subcellular localization. Known examples are iron response elements (IRE), translation control elements (TCE), internal ribosome entry sites (IRES) and AU-rich elements [40]. In addition, many cellular targeting signals are located within UTRs [37]. From our screen, two groups of CDS with conserved RNA structures in their 3'-UTRs seem to be of special importance. The first group of proteins is involved in translation and consists mostly of ribosomal proteins. Shalgi et al. [44] also reported that genes sharing RNA sequence motifs in their 3'-UTRs that control transcript stability are enriched in ribosomal proteins. It is conceivable that similar RNA motifs are embedded in larger, conserved structured regions that can be detected by RNAz.
The second large group consists of mitochondrial genes with structured 3'-UTRs. A number of mRNAs corresponding to nuclear-encoded mitochondrial proteins are targeted to the vicinity of mitochondria [45]. Many of the cis-acting mitochondrial localization elements are located in the 3'-UTRs of the transcripts and have been shown to be sufficient to target mRNAs to mitochondria [39]. Together with the structured signals found in CDS of mitochondrial proteins, this is the first report of an enlarged set for this class of proteins. Shalgi et al. [44] described a motif common to many mitochondrial proteins, which was also associated with a distinct subcellular localization. It is plausible that more nuclear-encoded mitochondrial transcripts are actively transported. However, more subtle roles of transcript localization might exist that seem to be partially redundant, and whose specific localization mechanisms are not yet completely understood.
Most of the predicted RNA structures with a distance of more than 120 bp to the nearest known feature could not be reliably annotated. With a very small number of exceptions, no significant sequence or structural homology outside the Saccharomyces genus was found. Nevertheless, the combination of three independent tiling array studies, EST data, and SAGE data provides evidence that about 120 of these novel intergenic elements are transcribed in S. cerevisiae. As our computational approach is designed to detect stabilizing selection acting on the RNA structure, we suggest that these transcripts are functional at the RNA level rather than being the mere by-product of other regulatory processes or constituting transcriptional noise.
For a subclass of the novel intergenic elements, we have at least circumstantial evidence that hints at their function. Firstly, a significantly larger number of structured RNAs is predicted in the 5' vicinity of known protein coding transcripts than in their 3' neighborhood. Secondly, tiling array data indicate that many of the transcribed sequences are promoter-associated transcripts in the sense that they are transcribed upstream of a gene and cover the promoter region of the gene. Structured RNA signals are overrepresented in these sequences. One of the current hypotheses about the function of promoter-associated transcripts suggests that these RNAs are directly involved in regulating Pol II transcription by occupying promoter regions [7]. Recently, such a regulation was shown in yeast for the ncRNA SRG1, which controls the transcription of its downstream gene SER3.
Our data also suggest another possibility. Recently, Thomas et al. [49] described a synthetic aptamer that binds with high affinity to Pol II and is able to specifically inhibit transcription. Similar cases are known for an ncRNA (B2) in mouse, which acts in the same way in response to stress signals [50], and the bacterial 6S RNA [52]. A non-coding RNA, Evf-2, that probably acts as a transcriptional enhancer was recently found in mammals [54]. Most probably, these molecules are examples of an expanding repertoire of direct transcriptional modifiers. It is thus not implausible that many of the promoter-associated transcripts that exhibit a conserved RNA structure function via direct modification of the Pol II transcription complex.
Finally, our data also indicate that at least some of the predicted structured RNAs could function directly via RNA-RNA interactions: we derived a substantial number of CDS/ncRNA and ncRNA/ncRNA antisense overlaps from the computational data, drawing a picture similar to that known in other eukaryotic species [55]. This finding further implies that the antisense mechanism is dependent on RNA structures, for example to control the accessibility of antisense regions in the first step of duplex formation.
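Antisense overlaps of the kind described above can be derived from annotation coordinates alone. The following Python sketch shows one simple way to do this, assuming 1-based inclusive coordinates and strands given as '+' or '-'; the `Feature` type, the function name and all example coordinates are hypothetical and are not taken from our data.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # '+' or '-'


def antisense_overlaps(features: List[Feature]) -> List[Tuple[str, str, int]]:
    """Return (name_a, name_b, overlap_length) for every pair of features
    that overlap on the same chromosome but lie on opposite strands."""
    ordered = sorted(features, key=lambda f: (f.chrom, f.start))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Features are sorted by chromosome and start, so once b lies on
            # another chromosome or starts past a's end, no later feature
            # can overlap a.
            if b.chrom != a.chrom or b.start > a.end:
                break
            if a.strand != b.strand:
                # b.start >= a.start here, so the overlap begins at b.start.
                pairs.append((a.name, b.name, min(a.end, b.end) - b.start + 1))
    return pairs


# Hypothetical annotation: one CDS and three predicted structured RNAs.
example = [
    Feature("cds_A", "chrI", 100, 500, "+"),
    Feature("rna_1", "chrI", 450, 600, "-"),
    Feature("rna_2", "chrI", 550, 700, "+"),
    Feature("rna_3", "chrII", 450, 600, "-"),
]
overlaps = antisense_overlaps(example)
print(overlaps)
```

In this toy annotation, `cds_A`/`rna_1` represents a CDS/ncRNA overlap and `rna_1`/`rna_2` an ncRNA/ncRNA overlap, while `rna_3` lies on a different chromosome and is not reported.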