In this study we systematically quantify functional redundancy for a set of duplicated genes in S. cerevisiae that exist in exactly two copies, as well as for a comparable set of paired singleton genes. We find that duplicated genes commonly show redundancy, whereas we detect no redundancy for any of the singleton pairs. This redundancy appears to be a general property of the duplicated genes in our set: it is independent of whether they are associated with ribosomal or non-ribosomal functions and of whether they arose by whole-genome duplication (WGD) or small-scale duplication (SSD). Even ancient duplicate pairs, which have been evolving for ~100 million years, often show redundancy.
In many cases, the degree of redundancy exhibited by duplicated genes is substantial. For instance, a quarter of all duplicate gene pairs are redundant for at least one essential function (i.e., they are synthetically lethal). Of these, more than 30% have an expected fitness (W_A·W_B, the product of the two single-deletion fitnesses) greater than 0.8, indicating that the redundant functionality contributes considerably to fitness. We also estimate that for approximately 50% of the non-synthetically-lethal duplicate pairs, the fitness effect of the redundant functionality exceeds 30% of the combined fitness effect of all the functions carried out by both genes (i.e., R > 0.30).
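The quantities in this paragraph can be made concrete with a small calculation. The sketch below assumes a multiplicative model in which deleting one gene removes only that gene's non-redundant functions, while the double deletion additionally removes the shared function. Under that assumption W_A·W_B is the double-deletion fitness expected if the pair shared no function, and R is the log-scale share of the double-deletion effect attributable to the shared function. This formalization and the function names `expected_fitness` and `redundancy_fraction` are illustrative assumptions, not necessarily the estimator used in the study.

```python
import math


def expected_fitness(w_a: float, w_b: float) -> float:
    """Double-deletion fitness expected if genes A and B shared no function
    (multiplicative model): the product of the two single-deletion fitnesses."""
    return w_a * w_b


def redundancy_fraction(w_a: float, w_b: float, w_ab: float) -> float:
    """Fraction R of the double-deletion fitness effect (log scale) attributed
    to functionality shared by A and B.

    Illustrative assumption, not necessarily the estimator used in the study:
      W_A  = f_A              (loss of A-specific functions)
      W_B  = f_B              (loss of B-specific functions)
      W_AB = f_A * f_B * f_R  (loss of both plus the shared function)
    so log f_R = log W_AB - log W_A - log W_B and R = log f_R / log W_AB.
    R is 0 when W_AB equals W_A * W_B (no redundancy), 1 when both single
    deletions are neutral, and negative if the double deletion is fitter
    than expected.
    """
    if not (0.0 < w_a <= 1.0 and 0.0 < w_b <= 1.0):
        raise ValueError("single-deletion fitnesses must lie in (0, 1]")
    if not (0.0 < w_ab < 1.0):
        # Synthetically lethal pairs (w_ab == 0) have no finite log; they are
        # characterized by expected_fitness(w_a, w_b) instead.
        raise ValueError("w_ab must lie strictly between 0 and 1")
    return (math.log(w_ab) - math.log(w_a) - math.log(w_b)) / math.log(w_ab)


# Hypothetical pair: each single deletion costs 10%, the double deletion 50%.
print(round(redundancy_fraction(0.9, 0.9, 0.5), 3))
```

For this hypothetical pair the call prints 0.696, so roughly 70% of the double-deletion cost would be attributed to redundant functionality. Working on a log scale makes the non-redundant and redundant contributions additive, which is why R is written as a ratio of logarithms.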
We considered two potential explanations for the pervasive functional redundancy among duplicated genes. First, duplicate genes that have partitioned ancestral functionality might still perform similar functions in the organism. Any two functionally related genes could, in principle, show high levels of redundancy, although this is not necessarily likely: genes participating in parallel biochemical pathways should show more redundancy than entirely unrelated genes, but genes participating in the same pathway should show lower levels of redundancy than randomly paired genes. Nonetheless, there is some evidence that functionally related genes can show a higher degree of redundancy than unrelated genes. For example, genes that are important for cell survival and growth after treatment with the DNA-damaging agent methyl methanesulfonate (MMS) [27] (“MMS genes”), and which thus participate in related cellular roles, do show some redundancy (). However, MMS gene pairs, whether the strains are grown in the presence or in the absence of MMS, do not show the same degree of redundancy we observe among the duplicate genes. We therefore argue that functional similarity among duplicate genes is unlikely to fully account for their redundancy.
The second explanation is that redundant functionality reflects ancestral function retained by both duplicates, whereas non-redundant functions correspond either to subfunctions that have been partitioned between the two duplicates or to independently acquired new functions. Why would two duplicate genes continue to share the ancestral function over long periods of evolutionary time (i.e., why does subfunctionalization not proceed to completion)? One possibility is that functional similarity within the duplicate pair is maintained by selection for its effect on the level, rate, dynamics, or noisiness of expression [32]–[39]. This may apply to duplicate genes that encode parts of macromolecular complexes, for which a stoichiometrically precise balance in gene dosage may need to be maintained [32]. Such dosage constraints are a likely reason for the high levels of redundancy shown by duplicated genes that encode components of the ribosome. Indeed, 51.1% of the ribosomal duplicate genes are haploinsufficient [40], indicating that S. cerevisiae is sensitive to changes in the dosage of ribosomal genes. Also consistent with stoichiometric constraints, most of the protein components of the ribosome are encoded by duplicated genes (86.9%), and the vast majority of these duplicated genes derive from the WGD event (92.4%). This is the pattern expected under the stoichiometric explanation: duplicating ribosomal genes one at a time would unbalance their dosage and reduce fitness, whereas after all ribosomal genes were duplicated simultaneously in the WGD, losing any single copy would itself be deleterious [32], [41]. This does not fully explain the evolution of these pairs, however, because duplicated ribosomal genes are known to also possess some non-redundant functionality [42], [43].
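The stoichiometric argument can be illustrated with a deliberately simple toy model, which is an assumption rather than a model from the study: the number of assembled complexes is set by the least abundant subunit, and any surplus subunit protein is treated as a proxy for the cost of dosage imbalance. The subunit names and the function `complex_output` are hypothetical.

```python
from typing import Dict, Tuple


def complex_output(dosage: Dict[str, int]) -> Tuple[int, int]:
    """Toy dosage-balance model (an illustrative assumption).

    Each copy of a subunit gene contributes one unit of protein; the number
    of assembled complexes is limited by the scarcest subunit, and every
    surplus subunit unit is counted as 'imbalance' (unassembled protein).
    Returns (complexes, imbalance).
    """
    complexes = min(dosage.values())
    imbalance = sum(d - complexes for d in dosage.values())
    return complexes, imbalance


subunits = ["S1", "S2", "S3", "S4"]  # hypothetical subunit genes
ancestral = {s: 1 for s in subunits}

# Small-scale duplication of one subunit gene: no extra complexes, one surplus unit.
ssd = dict(ancestral, S1=2)

# Whole-genome duplication: every subunit doubled, balance preserved.
wgd = {s: 2 * d for s, d in ancestral.items()}

# After the WGD, losing one copy of a single subunit lowers output and creates surplus.
wgd_loss = dict(wgd, S3=1)

for label, state in [("ancestral", ancestral), ("SSD", ssd),
                     ("WGD", wgd), ("WGD + loss", wgd_loss)]:
    print(label, complex_output(state))
```

The four states print (1, 0), (1, 1), (2, 0) and (1, 3): a single-gene duplication adds surplus without adding complexes, the WGD doubles output while preserving balance, and losing one copy afterward both halves output and creates surplus. Real ribosome assembly and its fitness consequences are far more complex than this sketch.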
Among the non-ribosomal duplicate gene pairs we do not see the same signs of selection for stoichiometrically determined gene dosage. Only 1.2% of the non-ribosomal duplicated genes are haploinsufficient, giving little indication of dosage sensitivity. Furthermore, redundancy is common not only among duplicate pairs derived from the WGD but also among those generated by SSD events, which, unlike the WGD, do not duplicate all members of a complex at once. It remains possible that redundancy among non-ribosomal genes is maintained by selection for elevated rates or levels of expression that are not stoichiometrically determined. In addition, having two redundant loci might help buffer against stochastic fluctuations in expression level, which can be deleterious for certain genes [38], [39]. In some cases, even the initial fixation of the duplicated copy may have been driven by its immediate beneficial effect on properties of gene expression [33], [36]. As long as these expression properties remain beneficial, both duplicates would be maintained in the genome and would retain redundancy.
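The buffering idea can be illustrated with a short simulation under explicit assumptions that are not from the study: each locus's protein output per cell is gamma-distributed, and the two loci fluctuate independently (intrinsic noise only). Under these assumptions, two loci with the same total mean as one up-regulated locus show lower relative noise, because independent fluctuations partly cancel when summed. The parameter values and function names are arbitrary and hypothetical.

```python
import random
import statistics


def coefficient_of_variation(samples):
    """Standard deviation divided by mean (a common measure of expression noise)."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def compare_noise(n_cells=20_000, shape=4.0, scale=25.0, seed=1):
    """Compare relative noise of one locus at doubled output vs two independent loci.

    Assumptions (illustrative only): each locus's output per cell is
    gamma-distributed with the given shape and scale, and the two loci
    fluctuate independently. Both scenarios have the same mean output.
    """
    rng = random.Random(seed)
    # One locus whose output is simply scaled up: relative noise is unchanged.
    one_locus = [2.0 * rng.gammavariate(shape, scale) for _ in range(n_cells)]
    # Two independent loci: their fluctuations partly cancel when summed.
    two_loci = [rng.gammavariate(shape, scale) + rng.gammavariate(shape, scale)
                for _ in range(n_cells)]
    return coefficient_of_variation(one_locus), coefficient_of_variation(two_loci)


cv_one, cv_two = compare_noise()
print(f"one locus CV = {cv_one:.3f}, two loci CV = {cv_two:.3f}")
```

With these parameters the two coefficients of variation should come out close to 0.50 and 0.35, the theoretical values 1/√4 and 1/√8 for gamma-distributed output. Noise shared by both loci (extrinsic noise) would not be averaged out in this way.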
Finally, part of the ancestral functionality may be impossible to partition because both duplicates require it to perform their non-redundant functions. In this case, mutations that would cause further subfunctionalization would inevitably either inactivate the gene entirely or produce dominant-negative forms of the protein. This and the possibilities above are not mutually exclusive, and each will need to be assessed explicitly in future research.
The other major conclusion of our study is that duplicate genes do not appear to acquire new functionality in rich medium, even after very long periods of evolution (~100 million years). If duplicated genes have not gained new functionality, then together they perform only the functions of their single ancestral gene, so strains carrying double deletions of a duplicate pair should suffer fitness costs similar to those of strains carrying single deletions of singleton genes. Conversely, if duplicated genes have gained enough new functionality to behave as two independent singleton genes, then removing a duplicate pair should cost as much fitness as removing two singleton genes. Because the ancestral progenitors of the duplicate genes might be a biased subset of genes [29], [30], we developed a proxy set of singleton genes to account for this potential bias. The distribution of fitness values for strains carrying single deletions of singletons in the proxy set is similar to that for strains carrying double deletions of duplicates. Because our test is sensitive enough to detect gained functionality, we conclude that duplicate genes have not gained substantial new functionality in rich medium (Figure S2). Additional work is needed to understand discrepancies between our results and various other predictions of new functionality (e.g., [11], [15], [22], [44], [45]; although see [46], [47]).
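The logic of this test is a comparison between distributions of deletion-strain fitness. The text does not specify the statistic used; the sketch below assumes a two-sample Kolmogorov-Smirnov distance and compares the double-deletion fitness of duplicate pairs with two reference distributions: the single-deletion fitness of singletons (no new functionality) and the product of two randomly paired singleton fitnesses (a pair behaving as two independent singletons, assuming multiplicative costs). All fitness values are simulated under the no-new-functionality scenario and are not data from the study; the function names are hypothetical.

```python
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of x and y (0 = identical, 1 = fully separated)."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in xs + ys:  # the supremum is attained at an observed value
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def two_singleton_expectation(singleton_w, n, rng):
    """Fitness expected if a duplicate pair behaved as two unrelated singletons,
    assuming costs combine multiplicatively: product of two random draws."""
    return [rng.choice(singleton_w) * rng.choice(singleton_w) for _ in range(n)]


def run_demo(seed=0):
    rng = random.Random(seed)
    # SIMULATED fitness values (capped at 1.0), not data from the study.
    singleton_single = [min(1.0, rng.gauss(0.9, 0.08)) for _ in range(500)]
    # Simulated under the "no new functionality" scenario: deleting a duplicate
    # pair costs about as much as deleting one singleton.
    duplicate_double = [min(1.0, rng.gauss(0.9, 0.08)) for _ in range(300)]
    two_singletons = two_singleton_expectation(singleton_single, 300, rng)
    d_one = ks_statistic(duplicate_double, singleton_single)
    d_two = ks_statistic(duplicate_double, two_singletons)
    return d_one, d_two


d_one, d_two = run_demo()
print(f"KS vs one singleton: {d_one:.3f}; KS vs two singletons: {d_two:.3f}")
```

Under this simulated scenario the distance to the single-singleton distribution is small and the distance to the two-singleton expectation is large. A library implementation such as SciPy's `scipy.stats.ks_2samp` also reports a p-value for the same statistic.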
Two explanations might account for the lack of appreciable new functionality. First, our fitness measurements were carried out exclusively in rich medium (YPD). Measurements in this single condition suffice to show that duplicate genes differ markedly from singleton genes in their contributions to fitness, but they cannot resolve whether duplicate genes gain new functionality: any new functionality that is important only under other environmental conditions would go undetected in this assay. Indeed, several studies show that duplicated genes are often involved in interacting with the environment and managing stress [36], [48]. Measuring the fitness of deletion strains under a range of environmental conditions should address this possibility.
Second, the maintenance of redundancy and the lack of new functions may be linked. In this scenario, mutations that confer new functions do so at the expense of ancestral functionality. To the extent that purifying selection maintains the ancestral function in both duplicated genes, the same purifying selection would act against the evolution of new functionality in either duplicate.
Taken together, our results shed light on the life cycle of genes in eukaryotic genomes. The common view of the long-term evolution of duplicate genes is that they are redundant immediately upon duplication, quickly undergo subfunctionalization and some neofunctionalization, and over long periods begin to behave as singleton genes. This view is attractive because it provides a steady-state description of gene fate in the genome. However, here we show that, when tested in rich medium, duplicate gene pairs maintain substantial redundancy, acquire little new functionality, and do not behave as singleton genes even after ~100 million years of evolution.