In this study we have examined the role of upstream sequence elements in yeast UTRs, considering whether they play a concerted role in the regulation of gene expression under stress conditions. Specifically, we considered available data sets of full-length mapped yeast transcripts to define a superset of mapped transcriptional start sites (TSSs) representing over 70% of yeast protein-coding genes. This is the most comprehensive survey of yeast uORFs using known 5' UTR sequence, and the first time these data have been considered in light of known translationally regulated genes under stress conditions [17]. The analysis shows that yeast uORFs are statistically under-represented in 5' UTRs and that UTRs have generally evolved to select against uORFs, suggesting that those that are tolerated may play some specific role or function in translation. This is in agreement with previous studies, either on specific genes [13] or more generally [6], which have suggested functional roles for uORFs. In addition to their relative scarcity, uORFs are also less "efficient" in terms of their start codon local sequence (AUG CAI index), their overall codon bias (tAI) and, importantly, their preferred stop codon contexts. The decreased translational adaptation of uORFs has also been noted for those associated with NMD [26] and is consistent with their scarcity in UTRs and generic functional role. They would be expected to reduce the translation efficiency of the principal ORF, blocking the ribosome from progressing to the true ORF or even promoting termination and detachment [26]. Interestingly, however, they tend to avoid strong stop signals which would promote termination, instead allowing the ribosomal machinery to re-initiate and continue scanning. One possibility is that uORFs might permit ribosomes to stall and "wait", a general mechanism which has been suggested to facilitate a fast response once stresses have been removed and normal translation can continue [20].
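As an illustration of what a uORF survey over mapped 5' UTRs involves, the following is a minimal sketch of a uORF scan. It assumes a uORF is defined as an AUG followed by an in-frame stop codon lying wholly within the 5' UTR; the precise definition used in the study may differ (for example in how uORFs overlapping the principal ORF are treated), and the function name `find_uorfs` is hypothetical.

```python
# Minimal sketch: locate uORFs that are wholly contained in a 5' UTR.
# Assumption (not from the study): a uORF is an ATG followed by an
# in-frame stop codon before the end of the UTR sequence.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr):
    """Return (start, end) pairs; start is the 0-based index of the A of
    the ATG and end is one past the last base of the stop codon."""
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA alphabets
    uorfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, in frame with this ATG.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


print(find_uorfs("CCATGAAATAGCC"))  # [(2, 11)]
```

An AUG with no in-frame stop inside the UTR is simply skipped here, which is a simplification made for brevity.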
Regarding differential regulation at the translational level, a significant trend is observed across most of the stresses examined here: genes that undergo relative translational up-regulation under stress have longer UTRs. This observation seems self-evident when considered at face value: any gene that can be regulated at the translational level must have a mechanism to support this, and this should be via some motif or element contained within either the 5' or 3' UTR. However, here we demonstrate this for a variety of stresses for the first time and, importantly, show that the trend is statistically significant. It offers a simple approach to selecting genes that are more likely to be translationally regulated on the basis of UTR size and contents. Interestingly, the recent tiling array study of the yeast genome also defined 5' UTRs for a subset of the gene set [23], and these authors also noted trends with 5' UTR length. They noted anecdotally that genes with shorter 5' UTRs were generally "housekeeping" genes involved in processes such as rRNA metabolism, RNA processing and ribosomal biogenesis. Our polysomal array data support this, finding that these Gene Ontology categories are translationally down-regulated under stress and do not have the longer 5' UTRs that would allow them to escape this. Equally, we also demonstrate that genes with longer 5' UTRs are translationally up-regulated and include those involved in processes such as transport and localisation, as reported by David and colleagues [23]. This provides further evidence for a relationship between mRNA transcript length and gene function, as proposed by Hurowitz and Brown [45].
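The comparison of UTR lengths between gene groups can be sketched as a one-sided permutation test. This is only an illustrative stand-in: this section does not state which statistical test was used in the study, and the function name, parameters and example lengths below are hypothetical.

```python
# Minimal sketch: one-sided permutation test asking whether up-regulated
# genes have longer UTRs (by mean length) than the remaining genes.
# Assumption: the test used in the study is not specified in this
# section; this is an illustrative alternative only.
import random
from statistics import mean


def permutation_pvalue(up_lengths, other_lengths, n_perm=2000, seed=0):
    """Estimate P(mean difference >= observed) under random relabelling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(up_lengths) - mean(other_lengths)
    pooled = list(up_lengths) + list(other_lengths)
    n_up = len(up_lengths)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_up]) - mean(pooled[n_up:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical UTR lengths in nucleotides, for illustration only.
up = [210, 180, 250, 300, 190, 220, 270, 240]
other = [60, 45, 80, 55, 70, 40, 65, 50]
print(permutation_pvalue(up, other))
```

A rank-based test would be a reasonable alternative where UTR length distributions are strongly skewed.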
Pinpointing the precise nature of the elements conferring translational regulatory properties is rather more challenging. Our data suggest that uORFs play a significant role in mediating gene expression during stress responses, as they are over-represented in translationally up-regulated genes, particularly under 0.2 mM peroxide stress. However, this trend is not as striking as the UTR length correlation, and must in part be a result of it, since longer UTRs are more likely to contain a uORF.
It should also be noted that the over-representation in up-regulated genes is difficult to reconcile with the "standard" uORF mechanism, in which uORFs are generally expected to down-regulate gene expression at the translational level. This suggests that they are acting in a novel way, that the complex "GCN4"-type mechanism is more widespread, or that UTR elements other than uORFs are responsible. Regardless, given their relative scarcity, it is clear that there is still much to learn about UTR and uORF function.
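Over-representation of uORF-containing genes in an up-regulated set can be framed as a hypergeometric upper-tail probability. The sketch below assumes that framing; the test actually applied in the study is not stated in this section, and the function name and argument names are hypothetical.

```python
# Minimal sketch: hypergeometric upper-tail probability for
# over-representation of uORF-containing genes in a selected gene set.
# Assumption: this framing is illustrative, not the study's stated method.
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which contain a uORF."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible splits contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total


# Toy numbers: 10 genes, 5 with a uORF, 5 selected, all 5 carry a uORF.
print(hypergeom_upper_tail(5, 5, 5, 10))  # 1/252
```

Because longer UTRs are more likely to contain a uORF, a test of this kind would overstate the effect unless the background set is matched for UTR length.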
Other authors have focused on conservation as a strong predictor of functional significance [27]. Although early studies have suggested that uORFs are generally not conserved, this is far from a straightforward calculation to make. A single uORF may not necessarily be conserved in terms of exact sequence, length or relative position with respect to either the transcriptional or translational start, yet might still fulfil its functional role. In this study we add the constraint of a known transcriptional start site and consider two complementary conservation metrics: the phastCons score [46] and a Z-score local conservation statistic calculated against the local UTR background. We have also examined whether a uORF is directly conserved in close fungal relatives. Although uORFs are generally not conserved, many are more conserved than expected by chance within their respective UTR sequences, confirming and extending previous studies [27]. Using strict criteria, we find 61 uORFs (from 43 genes) with high conservation across related fungal species (see Additional File 7), extending these previously reported data sets to 365 genes.
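One way a Z-score against the local UTR background could be computed is sketched below. It is an assumption about the general idea, not the statistic defined in the study: the mean per-base conservation of the uORF is compared with the means of all windows of the same width across the UTR. The function name and the toy scores are hypothetical.

```python
# Minimal sketch: Z-score of a uORF's mean conservation relative to all
# same-width windows in its own UTR.
# Assumption: the study's exact statistic is not given in this section.
from statistics import mean, pstdev


def local_conservation_z(scores, start, end):
    """scores: per-base conservation values for the UTR (e.g. in [0, 1]);
    start/end: 0-based, end-exclusive coordinates of the uORF."""
    width = end - start
    if start < 0 or width <= 0 or end > len(scores):
        raise ValueError("uORF coordinates fall outside the UTR")
    window_means = [mean(scores[i:i + width])
                    for i in range(len(scores) - width + 1)]
    spread = pstdev(window_means)
    if spread == 0:
        return 0.0  # flat background: no window stands out
    return (mean(scores[start:end]) - mean(window_means)) / spread


# Toy UTR: a conserved 3-base block inside an unconserved background.
toy_scores = [0.0] * 6 + [1.0] * 3 + [0.0] * 6
print(local_conservation_z(toy_scores, 6, 9) > 0)  # True
```

A positive value indicates that the uORF is more conserved than its surrounding UTR, which matches the sense of "more conserved than by chance within their respective UTR sequences".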
It has been reported that secondary structure in 5' UTRs mediates translational control of gene expression on a genomic scale in yeast [5]. We re-examined this result using the true TSS mapping for the 4149 5' UTR sequences and the program Randfold [48]. In broad agreement with Ringnér and Krogh [44], the vast majority of 5' UTRs appear not to be strongly folded; only 20 5' UTRs were found to have low MFE values with an associated p-value < 0.005 (see Additional File 8). However, we do not see the same general trend between calculated 5' UTR folding energies and translation rates. Clearly, the use of the true TSS has a marked effect on the 5' UTR folding energy, and raises the possibility that this trend might also be a result of the true size of the UTRs. Put simply, shorter 5' UTRs facilitate faster translation. Nevertheless, the secondary structural states might be stronger, not weaker, in the true 5' UTRs, and this also seems to be a significant regulatory mechanism for a small number of genes.
In summary, the results presented here provide convincing evidence that 5' UTR sequence has a major role to play in the regulation of gene expression, particularly under stress conditions and at the translational level. This effect appears to be widespread, affecting large numbers of different yeast genes under different conditions. Yeast has evolved a variety of mechanisms to effect these changes, including upstream open reading frames, which are over-represented in translationally up-regulated genes.