We carried out genome-wide computations of secondary structures in the 5′-UTRs of yeast mRNAs and correlated 5′-UTR folding free energy with various other transcript features. We chose, somewhat arbitrarily, to fold the 50 nt immediately upstream of the coding start, because these sequences are almost certainly inside the 5′-UTR. We also folded 100- and 200-nt sequences and obtained similar but weaker results (). Folding of RNA is somewhat local in sequence: when the 100- or 200-nt upstream sequences were folded, the last 50 nt typically folded into the same structure as when the 50-nt upstream sequences were folded alone. Translation has been shown to be most sensitive to secondary structure close to the 5′ end of mRNA [12]. Hence, we think that the weaker results obtained for longer upstream sequences reflect the inclusion of more genomic sequence that is not transcribed, rather than indicating that secondary structure close to the translation start is most important for the transcript features we have investigated. We used 5′-UTRs of fixed length to avoid comparing free energies for sequences of different lengths. Bernstein et al. [
24] used predicted UTRs for each gene in E. coli and found no association between secondary structure in UTRs and mRNA half-life. The difference from our findings may be due to differences between prokaryotes and eukaryotes, or to difficulties in comparing UTRs of different lengths.
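The fixed-length window described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and the coordinate convention (0-based forward-strand index of the first base of the start codon, which for a minus-strand gene is the highest forward-strand coordinate of the codon) are assumptions made for the example.

```python
# Hypothetical helpers; the coordinate convention is an assumption of this sketch.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def upstream_window(chromosome, start_codon_pos, strand, length=50):
    """Return the `length` nt immediately upstream of the start codon as RNA (5'->3').

    start_codon_pos is the 0-based forward-strand index of the first base of
    the start codon. For a '-' strand gene this is the highest forward-strand
    coordinate occupied by the codon.
    """
    if strand == "+":
        begin = start_codon_pos - length
        if begin < 0:
            raise ValueError("window runs off the start of the chromosome")
        dna = chromosome[begin:start_codon_pos]
    elif strand == "-":
        begin = start_codon_pos + 1
        end = begin + length
        if end > len(chromosome):
            raise ValueError("window runs off the end of the chromosome")
        # Upstream of a minus-strand gene lies at higher forward coordinates.
        dna = reverse_complement(chromosome[begin:end])
    else:
        raise ValueError("strand must be '+' or '-'")
    # Report the window as RNA so it can be passed to a folding program.
    return dna.upper().replace("T", "U")
```

Using the same window length for every gene keeps the resulting free energies comparable, since folding free energy generally becomes more negative as sequences get longer.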
To compare 5′-UTRs with other genomic regions, 50-nt sequences from intergenic regions, coding regions, and 3′-UTRs were also folded. These three sets of sequences had significantly lower free energies than the 5′-UTR sequences (see A). The folding free energy of RNA depends on both nucleotide composition and the order of the nucleotides. The nucleotide composition, quantified both by GC-content and weighted dinucleotide composition, was similar in 5′-UTRs and intergenic regions, indicating that the difference in free energies between these groups is due to nucleotide order. Indeed, the 5′-UTRs had higher folding free energies than random sequences with the same dinucleotide composition (B). In contrast, yeast coding regions have lower folding energies than randomized sequences preserving the encoded protein, the codon usage, and the dinucleotide composition [
36]. This opposite behavior is in agreement with the large difference in folding free energies between coding regions and 5′-UTRs (A), although GC-content is probably more important for this difference. Our results indicate that there has been evolutionary selection for 5′-UTRs to be weakly folded, and suggest that folding free energy might be used as one probabilistic component of a gene prediction program.
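The two composition measures discussed above can be illustrated with a short sketch. It computes GC-content and plain dinucleotide frequencies for a single sequence; the weighting scheme behind the "weighted dinucleotide composition" is not specified here, so it is not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a sequence (DNA or RNA alphabet)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in a sequence."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        raise ValueError("sequence must contain at least two bases")
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}
```

Two sets of sequences with matching values for both measures can still differ in folding free energy, which is the sense in which the difference is attributed to nucleotide order.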
In line with our observation that 5′-UTRs tend to be weakly folded is our finding that uncharacterized ORFs are overrepresented among the genes with strongly folded 5′-UTRs. Assuming that uncharacterized genes typically are expressed at low levels or under rare conditions, or even are pseudogenes, this finding hints at a larger selective pressure for absence of secondary structure for commonly or highly expressed genes. Confirming this picture is our finding that 5′-UTR folding free energy is significantly positively correlated with mRNA copy number and protein abundance (see ). Since we only investigated verified genes, we could look into the source of the verification of the genes with strongly folded 5′-UTRs. The 5′-UTR of the gene YBR296C-A (see B) has the secondary structure with the lowest free energy of all genes, and is annotated as unknown in GO. Remarkably, this gene has only one literature reference, in which Kumar et al. [
37] describe an approach for finding overlooked genes in yeast. Of the 137 new genes reported by Kumar et al., 41 are annotated as verified in the Saccharomyces Genome Database (SGD). Ten of these 41 genes have a free energy below −10 kcal/mol, which is significantly more genes than expected by chance (p = 4 × 10⁻⁶, Fisher's exact test).
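An enrichment test of this kind can be sketched as a hypergeometric tail probability, which is equivalent to a one-sided Fisher's exact test. Whether the reported p-value is one- or two-sided is not stated, so the one-sided form is an assumption here, as is the function name. The background counts needed to reproduce the reported value (total verified genes and how many of them fall below −10 kcal/mol) are not given in this section and are left as parameters.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value, P(X >= k).

    X follows a hypergeometric distribution:
      N = genes in the background set,
      K = background genes with the property (e.g. strongly folded 5'-UTR),
      n = genes in the subset of interest,
      k = subset genes with the property.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible tables contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For the comparison above, `k` would be 10 and `n` would be 41, with `K` and `N` taken from the full set of verified genes.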
The three most significant GO categories among the genes with weakly folded 5′-UTRs were related to Ty element retrotransposons (see ). Ty element retrotransposons are stretches of DNA that replicate and move in the genome through RNA intermediates [
38]. The Ty elements contain various genes in their sequences, e.g., proteases, integrases, and reverse transcriptases. The fact that they have weakly folded 5′-UTRs suggests that folding of their RNA is detrimental to their function or integration in the genome. Interestingly, Ty elements showed up in a study of RNA half-life where different methods of transcriptional inhibition were compared [
39]. The transcripts whose stability differed most between rpb1-1 inhibition on the one hand and thiolutin, 1,10-phenanthroline, and 6-azauracil on the other were predominantly Ty elements. It may be worth investigating whether there is a connection between this difference in transcript stability and the lack of 5′-UTR secondary structure.
We found that 5′-UTR folding free energy was significantly positively correlated with both translational activity and mRNA half-life (see ). These correlations were still significant after correction for GC-content, indicating that the correlations are not simply a secondary effect caused by nucleotide frequencies. Parker and colleagues showed that the insertion of secondary structures into the 5′-UTR of
yeast PGK1 mRNA inhibited translation and stimulated decay of PGK1 [13]. Together, these findings suggest a widespread use of 5′-UTR secondary structure in post-transcriptional regulation. Our correlations need not be caused by any biochemical mechanism; for example, transcripts of one evolutionary origin could have both strongly folded 5′-UTRs and low translation rates, whereas transcripts of another evolutionary origin could have weakly folded 5′-UTRs and high translation rates. Nevertheless, we believe that the correlations do reflect more direct connections. Our findings may be explained by an inhibitory effect of 5′-UTR secondary structure on translation initiation combined with competition between translation and decay. However, more direct biochemical pathways that preferentially degrade mRNA with 5′-UTR secondary structure might also exist. Early support for the inhibitory effect of 5′-UTR secondary structure on translation came from insertion of hairpin loops into 5′-UTRs [
9, 10]. Later studies have shown connections between mRNA 5′ secondary structure and proteins important for translation such as eIF4A [15]. Competition between translation and decay has been proposed because both may require cap access [20, 33]. Moreover, during translation the mRNA is circularized through interactions between cap-binding translation initiation factors and the poly(A)-binding protein (PABP). This conformation presumably protects mRNA from degradation by preventing access to both the cap and the poly(A) tail, suggesting that the poly(A) tail is also important for competition [17]. We expected that such competition would be more easily seen for short-lived transcripts because degradation takes up a larger part of their lifetimes. Indeed, our global analysis revealed that transcript half-life is positively correlated with both ribosome density and ribosome occupancy, in particular for short-lived transcripts (see ).
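One common way to correct a correlation for a covariate such as GC-content is a first-order partial correlation. The sketch below illustrates that idea only. The correction procedure actually used for the results above is not described in this section, so both the choice of Pearson correlation and the function names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    Here x could be 5'-UTR folding free energy, y a transcript feature such
    as half-life, and z the GC-content of the folded sequence.
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

If the partial correlation remains clearly different from zero, the association between the two features cannot be attributed to their shared linear dependence on GC-content alone.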
A major mediator of the heat shock response is mRNA decay [40], and the mRNA decay profile is similar to the heat shock response [39]. In line with these observations, we found a positive correlation between 5′-UTR free energy and mRNA response to heat shock (), i.e., transcripts with weakly folded 5′-UTRs are, in addition to being relatively long-lived, relatively upregulated after heat shock. Given that transcripts upregulated by heat shock tend to have weakly folded 5′-UTRs, they would be expected to be translated at relatively high rates. Indeed, the correlation between ribosome occupancy and relative upregulation 10 min after heat shock was 0.23 (p < 2 × 10⁻⁶⁰; similar for 5 min). Of interest, the heat shock mRNA Hsp90 in Drosophila has extensive secondary structure in its 5′-UTR. Hsp90 translation is inefficient at normal growth temperature and is activated by heat shock, perhaps by thermal destabilization of the secondary structure in the 5′-UTR [41]. It may be worthwhile to perform genome-wide protein abundance experiments of the heat shock response to investigate whether preferential translation during heat shock is a common mechanism.
We assessed whether transcripts associated with RBPs, or carrying 3′-UTR sequence motifs associated with these RBPs, were over- or underrepresented among fast-decaying transcripts or among transcripts with strongly folded 5′-UTRs. Puf proteins are known to enhance mRNA turnover or repress translation [42]. We found targets of Puf3p, Puf4p, and Puf5p to be significantly associated with fast decay, extending an earlier study [43]. We note that the three Puf proteins for which Gerber et al. identified sequence motifs [7] were associated with fast-decaying transcripts, while the remaining two Puf proteins, as well as Mex67 and Yra1, instead tended to be associated with weakly folded 5′-UTRs.
To summarize, we found that (i) 5′-UTRs have higher folding free energies than other genomic regions and than expected from their nucleotide composition, (ii) secondary structures in 5′-UTRs likely play a role in mRNA translation and turnover on a genomic scale, and (iii) genes with strongly folded 5′-UTRs are generally rarer, harder to find experimentally, and less well annotated. It is important to keep in mind that the correlations we have found, although highly significant, are small, showing that folding of 5′-UTRs is, as expected, only one aspect of post-transcriptional regulation. However, the correlations may be larger in subgroups of mRNAs, such as mRNAs targeted by individual decay pathways [44] and specific RBPs [45]. An example of a larger correlation in a subgroup is our observation that translational activity and mRNA decay are highly correlated for mRNAs with short half-lives.