We have found that although duplicated RP genes created by WGD show strong evidence of surviving recent gene conversion in their coding regions, the same is not true of the upstream noncoding regions. Moreover, even in the coding regions, there is a bias toward detecting conversion events at nonsynonymous positions. One use of these results is to shed light on previous analyses. For instance, Gao and Innan (2004) argued that high rates of gene conversion had resulted in overestimates of the rate of yeast gene duplication. However, the data set used by these authors was heavily biased toward RP genes: of the 68 duplicate pairs considered, 50 fall into the set of the 55 WGD-produced RP duplicate genes. Our results imply that RPs are likely to be unusual in their patterns of both duplication and gene conversion and therefore probably should not be used as a proxy for the genome at large.
Given the existence of cDNA or mRNA-based gene conversion (Derr and Strathern 1993; Storici et al. 2007), events that would be naturally limited to transcribed regions, one might ask whether the patterns observed here simply represent a mutational bias in the conversion events themselves. This model is attractive because RP genes are highly expressed and recombination and gene conversion are associated with high levels of transcription (Aguilera and Gómez-González 2008). However, a purely mutational bias explanation for these repeated gene conversion events is unsatisfying for three reasons. First, no bias against upstream conversion is evident for meiotic recombination. Given that the per-cell-cycle rate of conversion is roughly four orders of magnitude greater for meiosis than for mitosis (Barbera and Petes 2006), even the roughly 1,000-fold excess of mitotic over meiotic cell divisions seen in wild relatives of S. cerevisiae (Tsai et al. 2008) is insufficient to give an overwhelming signature of coding region gene conversion. Second, a mutational bias does not explain the absence of gene conversion among the metabolic genes: of the nine MP gene pairs with expression levels (Holstege et al. 1998) that are as high as those of the RP gene pairs, only one shows evidence of gene conversion (data not shown). Finally, such a bias does not explain the fact that synonymous sites in the RP genes show much less evidence for conversion events than do nonsynonymous sites. Thus, we argue that in addition to any mutational biases, some intervening process of selection is helping to fix nonsynonymous conversion events.
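The first of these three reasons rests on a simple order-of-magnitude comparison, which can be made explicit. The sketch below is illustrative only: it assumes that the expected number of conversion events scales as the per-cell-cycle rate multiplied by the number of divisions, a simplification that is not part of the original analysis, and the function and variable names are hypothetical.

```python
# Order-of-magnitude values quoted in the text (Barbera and Petes 2006; Tsai et al. 2008).
# Assumption (not from the source): events scale as per-division rate x number of divisions.
MEIOTIC_TO_MITOTIC_RATE_RATIO = 1e4    # per-cell-cycle conversion rate, meiosis / mitosis
MITOTIC_PER_MEIOTIC_DIVISIONS = 1e3    # mitotic divisions per meiotic division


def expected_meiotic_share(rate_ratio: float, divisions_ratio: float) -> float:
    """Fraction of all conversion events expected to arise during meiosis.

    Rates are expressed relative to the mitotic rate (set to 1), and
    division counts relative to a single meiotic division.
    """
    meiotic_events = rate_ratio * 1.0        # one meiotic division at the higher rate
    mitotic_events = 1.0 * divisions_ratio   # many mitotic divisions at unit rate
    return meiotic_events / (meiotic_events + mitotic_events)


share = expected_meiotic_share(MEIOTIC_TO_MITOTIC_RATE_RATIO,
                               MITOTIC_PER_MEIOTIC_DIVISIONS)
print(f"Expected share of conversion events arising in meiosis: {share:.2f}")
```

Under these assumptions, meiotic events would still outnumber mitotic events about tenfold, so a transcription-coupled mitotic bias alone would not be expected to dominate the observed pattern.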
Another partial explanation for the similarity in yeast RP coding sequences is selection for high dosages of these proteins. There is evidence for dosage benefits from RP gene duplication (Koszul et al. 2004). However, dosage selection does not wholly account for the biology of these ohnologs, as we will discuss below. Moreover, such conservation does not appear to be a general response to dosage selection: metabolic genes also likely survived in duplicate partly due to dosage selection (Blank et al. 2005; Piškur et al. 2006; Conant and Wolfe 2007; Merico et al. 2007; van Hoek and Hogeweg 2009) and yet do not have strong signatures of conversion.
Instead, these results provide evidence that even if the coding regions of the duplicated RP genes are still being homogenized by gene conversion, their expression patterns have diverged considerably since the WGD. Indeed, a number of recent analyses have demonstrated that the duplicated yeast RP genes are not functionally interchangeable. For example, several RP genes, but not their paralogs, have been shown to be essential for determining bud location in S. cerevisiae (Ni and Snyder 2001) and for localizing proteins to that bud (Komili et al. 2007). Similar patterns have been seen in Brassica napus, where RP genes show tissue-specific expression despite high amino acid sequence identity (Whittle and Krochko 2009).
An equally intriguing case is the difference in protein localization between the RP paralogs Rpl7a and Rpl7b. Rpl7a is much more highly expressed than is Rpl7b (Ghaemmaghami et al. 2003) but while Rpl7a is only found in the cytoplasm, Rpl7b, despite its lower abundance, is found both in the cytoplasm and in the nucleolus (Kim et al. 2009). This difference does not appear to be caused by variations in the coding sequences of the two genes: replacing the RPL7B sequence with that from RPL7A does not alter the cellular localization of the protein encoded at that locus (Kim et al. 2009). These authors propose that the localization difference is instead driven by preferential incorporation of Rpl7a into ribosomal subunits, meaning that the free protein is rarely present at the site of ribosome subunit assembly in the nucleolus. However, the origins of this difference in incorporation rate remain unclear given the apparent equivalence of the two protein sequences seen after sequence replacement.
What combination of phenomena might give rise to expression divergence coupled to strong protein sequence homogenization? One possibility is expression subfunctionalization. This hypothesis is partly supported by our network analysis, which indicates partial expression isolation between two groups of RP duplicate genes. Because the ribosome represents a tightly integrated functional module, expression subfunctionalization might still require very high degrees of protein identity between the RP paralogs, such that the proteins encoded by these paralogs are able to substitute for each other under the different expression conditions. This hypothesis of strong purifying selection acting on RP coding sequences is supported by the observation that these genes show fewer single-nucleotide polymorphisms per base pair than do the MP genes (0.002 vs. 0.007; polymorphism data taken from Schacherer et al. 2009). Likewise, an increased frequency of expression-coupled gene conversion events, as discussed above, could very well improve the ability of natural selection to maintain such coding sequence conservation. On the expression front, the subfunctionalization itself might be either quantitative (only expression of both paralogs gives sufficient protein product) or qualitative (the expression of the two paralogs varies with respect to each other temporally). Similarly, the process might follow either the purely neutral DDC model originally proposed by several authors (Force et al. 1999; Stoltzfus 1999) or involve other selective forces, including adaptive ones (Des Marais and Rausher 2008); for a review, see Innan and Kondrashov (2010). Neofunctionalization is another possible explanation for the expression divergence among the RP genes, but we are skeptical that it would occur on this scale (more than 50 genes).
Finally, we note that a broader perspective argues that gene dosage and subfunctionalization are not mutually exclusive explanations for the fate of a duplicate gene pair: He and Zhang (2005) propose that a duplication might be initially preserved by subfunctionalization and then might later undergo neofunctionalization. In the case of the RPs, we suspect that the initial preservation of the duplicated RP genes was for reasons of gene dosage; this process was likely followed by subfunctionalization (He and Zhang 2005; Innan and Kondrashov 2010).
The key prediction of our model is that gene conversion among the RPs helps to maintain a coadapted functional module: the ribosome. In the future, it would be very interesting to test whether this prediction of coadaptation holds.