The lincRNAs comprise a substantial part of the mammalian RNome, but very little is currently known about their functions and evolution. Together with previous observations, the results described here suggest, even if indirectly, that many lincRNAs are indeed functional molecules subject to relatively weak but significant purifying selection, as determined from the Ke/Ki ratio. As such, lincRNA genes provide evolutionary biologists with a unique data set for investigating both the general and the more idiosyncratic features of evolution by comparing their evolutionary patterns with those of protein-coding genes. Unlike the highly conserved structural RNAs (rRNAs and tRNAs) or small microRNAs, the lincRNA genes closely resemble protein-coding genes in terms of diversity, size, and gene architecture. The fundamental difference is that the transcripts of these genes are not translated into proteins but rather function directly as RNA molecules. Evolution of protein-coding genes shows correlations of varying strengths with several molecular phenomic variables (Koonin and Wolf 2006; Wolf et al. 2006). The most consistent and typically the strongest of these is the negative correlation between the rate of sequence evolution and the expression level of protein-coding genes or protein abundance (Drummond and Wilke 2008; Wolf et al. 2010). This relationship between evolution and expression of protein-coding genes inspired the hypothesis that evolution of proteins is driven primarily by selection for robustness to misfolding, which is partly caused by errors of translation (Drummond and Wilke 2008). Evolutionary models built on the assumption that the deleterious effect of misfolding is the primary fitness cost associated with mutations in protein-coding genes have been shown to be compatible both with the dependency between evolutionary rate and expression and with the universal distribution of the rates of protein evolution (Drummond and Wilke 2009; Lobkovsky et al. 2010). In view of this unifying hypothesis of protein evolution, we were interested to determine whether the evolution of lincRNAs is similarly connected with expression.
The results presented here reveal a relatively weak but consistent and highly significant negative correlation between the evolutionary rate and expression level of lincRNAs. Introns of lincRNA genes provide an internal control: the absence of correlation for the intronic sequences indicates that the observed connection between evolution and expression has to do with the structure and function (or robustness to malfunction) of the mature lincRNA molecules. We further showed that the level of correlation between evolutionary distances and expression is similar for lincRNAs and protein-coding genes evolving under comparable constraints. The connection between expression and evolution in mammals is relatively weak for both lincRNA and protein-coding genes, with only 1–2% of the variance in evolutionary rates accounted for by expression. These findings are compatible with the previous observations that the negative correlation between the sequence evolutionary rate and the expression level is the weakest in mammals among all tested model organisms (Drummond and Wilke 2008). It seems most likely that this limited dependency is caused by the general weakness of purifying selection in mammals due to their characteristically low effective population sizes (Lynch and Conery 2003; Lynch 2006). Accordingly, mammals might not be the best choice of model for studying the causes of the dependency between evolution and expression for protein-coding genes. However, by the same token, mammals seem to be the only model in which a comparison of the evolutionary regimes of protein-coding genes and “protein-like” lincRNAs is possible, because large, diverse repertoires of long ncRNAs apparently could not evolve in organisms subject to strong selective constraints (Lynch 2007; Koonin and Wolf 2010).
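To put the 1–2% figure in perspective: if it is read as a squared correlation coefficient (an assumption; the text does not state how the variance fraction was computed), it corresponds to an absolute correlation of roughly 0.10 to 0.14. The following minimal Python sketch shows that relation together with a plain rank (Spearman) correlation. The function names are hypothetical and the toy values are not data from this study.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def variance_explained(r):
    """Fraction of variance accounted for, taken here as r squared."""
    return r * r

# Toy illustration only: hypothetical expression levels and rates.
expression = [1.0, 2.0, 3.0, 4.0]
rate = [10.0, 9.0, 7.0, 1.0]
rho = spearman(expression, rate)
r_low, r_high = math.sqrt(0.01), math.sqrt(0.02)
```

In the same spirit, the intronic control described above would amount to computing the same coefficient for intron divergence against expression and checking that it is indistinguishable from zero.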
We then examined potential connections between the predicted stability of lincRNA folding, lincRNA expression, and the rate of evolution. A positive correlation, limited in magnitude but significant, was detected between predicted folding and expression: lincRNA molecules with greater folding potential tend to be highly expressed. A positive correlation between (predicted) RNA stability and expression level has been described previously for mammalian mRNAs (Shabalina et al. 2006). However, we found no significant link between folding and the rate of evolution of lincRNAs and further observed that RNA folding and sequence evolution rate contributed to the expression level of lincRNAs independently.
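One common way to examine whether two predictors contribute independently to a third variable is the first-order partial correlation. The sketch below is offered only as an illustration of that general technique; the text above does not specify which procedure was used, and the function name and input values are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z.

    Uses the standard first-order formula
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2)).
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical coefficients, not values from this study:
# x = expression, y = evolutionary rate, z = predicted folding.
r_expr_rate = -0.12
r_expr_fold = 0.10
r_rate_fold = 0.0   # no link between folding and rate

controlled = partial_correlation(r_expr_rate, r_expr_fold, r_rate_fold)
```

When folding and evolutionary rate are themselves uncorrelated, as in this toy input, controlling for folding changes the expression-rate coefficient only slightly, which is the pattern one would expect for independent contributions.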
The findings reported here show that the link between evolution and expression is a fundamental dependency that is not limited to protein-coding genes. Whether or not the deleterious effects of misfolding, leading to the formation of nonfunctional protein or RNA molecules, represent the principal factor behind this universal link remains to be determined. Certainly, the process of RNA folding is fundamentally different from protein folding, as the two processes are based on different types of molecular interactions. Nevertheless, there is also an undeniable general similarity between the folding processes of these two classes of biomolecules. Indeed, both proteins and RNAs are heteropolymers that fold to form well-defined secondary structure elements through local interactions, followed by the formation of a unique 3D conformation through nonlocal interactions. Moreover, RNA misfolding is common, if not thoroughly understood, and the increasingly apparent prevalence of RNA chaperones attests to its biological relevance (Cristofari and Darlix 2002; Bhaskaran and Russell 2007; Rajkowitsch et al. 2007; Russell 2008; Semrad 2011). At face value, the observations reported here on the lack of connection between predicted RNA folding and evolutionary rate, and on the independence of the contributions of predicted folding and evolutionary rate to lincRNA expression, can be taken as an argument against a causal connection between lincRNA misfolding and the evolution–expression coupling. However, these observations should be interpreted with considerable caution. Prediction of the base-pairing potential is a blunt instrument that certainly does not reveal the true complexity of the RNA folding process and might not be able to distinguish well between correctly folded and misfolded RNA molecules. On the other hand, according to our estimates, about 60% of nucleotides are paired in lincRNAs and mRNAs (Shabalina et al. 2006), which is comparable with the base-pairing values for some experimentally characterized mRNAs (Kertesz et al. 2010). Also, some of the local predicted structures for lincRNAs are in agreement with the structures derived from biochemical probing, for example, for the A region of Xist RNA (Maenner et al. 2010).
The distinct possibility remains that misfolded lincRNAs are deleterious, as misfolded proteins are, and that this effect might explain the connection between their evolutionary rate and expression. Certainly, alternative explanations for this universal link could be relevant as well, for example, the potentially greater number of both functional and nonfunctional interactions in highly expressed proteins and RNAs, which would constrain their evolution. Furthermore, it is impossible to rule out that, although the correlations between expression and evolution are of the same sign and similar in magnitude for proteins and lincRNAs, the underlying causes are substantially different (even if this possibility is less than parsimonious).