Related Articles

1.  Backup without redundancy: genetic interactions reveal the cost of duplicate gene loss 
We show that genetic interaction profiles offer a powerful approach to elicit phenotypes that are far richer than is attainable using single gene deletions. This has allowed us to address the long-standing question of the role played by duplicate genes (paralogs) in robustness against deletion. We provide for the first time direct evidence that the capacity of some duplicates to cover for the loss of their paralogs can account for the observed difference in fitness between duplicate and singleton deletion mutants, but that the overall contribution of this effect to dispensability is small. More broadly, we demonstrate that paralogs possessing apparent backup capacity in some environments have in fact distinct and non-overlapping functions, and are unable to provide backup across a range of compromising conditions. This resolves the previous paradox of how backup genes conferring dispensability can nevertheless be independently maintained in the population. From a practical point of view, our findings suggest efficient strategies to elicit rich deletion phenotypes that should be highly relevant for the design of future phenotypic screens.
Much of our understanding of biological processes has been derived from the characterization of the functional consequence to an organism of altering one or more of its genes. Efforts to systematically evaluate the phenotypic effects of gene loss, however, have been hampered by the fact that the disruption of most genes has surprisingly modest effects on cell growth and viability. The high proportion of genes with no apparent deletion effect has wide-ranging practical and theoretical implications and has been the subject of considerable interest (Wagner, 2000, 2005; Giaever et al, 2002; Gu et al, 2003; Papp et al, 2004; Kafri et al, 2005). One factor that has been implicated as contributing to the high degree of dispensability is the abundance of closely related paralogs present in most genomes (Winzeler et al, 1999; Wagner, 2000; Giaever et al, 2002). Indeed, recent work in S. cerevisiae has shown that the existence of a paralog elsewhere in the genome significantly increases the chance that deletion of a given gene has little effect on growth (Gu et al, 2003). However, current analyses have been mostly correlative, and direct mechanistic evidence supporting or refuting the role of backup compensation in mutational robustness is still largely missing. Furthermore, backup between duplicates is not easily justified in evolutionary terms, in that a genuine ability to comprehensively cover for the loss of another gene is evolutionarily unstable (Brookfield, 1992).
Here, we exploit the recent availability of high-density quantitative genetic interaction profiles (EMAPs) to address these issues directly. To test whether SSL paralogs can account for the excess fitness of duplicates, we classified genes into fitness categories according to their deletion growth defect (Materials and methods). The subset of genes covered by our combined data set exhibits an over-representation of duplicate genes in the weak/no deletion phenotype (WNP) class similar to that reported previously (Gu et al, 2003) (Figure 1B). Strikingly, this difference corresponds to the number of WNP duplicates that have an SSL interaction with their corresponding paralog (Figure 1C). Our data thus provide direct evidence that it is indeed duplicate compensation that accounts for the observed difference in deletion growth defect between duplicates and singletons, at least for the genes covered by our data set.
Apart from the mechanism itself, the characteristic features of buffering duplicates have received considerable attention (Gu et al, 2003; Kafri et al, 2005; Wagner, 2005). Our data allowed us to unambiguously distinguish the subset of duplicates whose dispensability can be attributed to the existence of a backup paralog. The ability to identify backup duplicates directly put us in a position to study their features, and how they differ from other duplicates without buffering properties. In particular, we asked to what extent the observed buffering in rich media reflects functional similarity and a genuine ability to cover for the loss of a paralog in a broader range of conditions.
To assess the extent to which SSL duplicates can provide genuine backup under compromising conditions, we first used genetic interaction profiles as a more stringent test for redundancy that assesses the effect of gene loss in the background of additional gene deletions. In contrast to the expectation that truly buffered duplicates should have few if any synthetic interactions, we find that the number is in fact substantial and often exceeds that of random genes and non-SSL duplicates (Figure 2B). Similarly, using a recent data set of sensitivity profiles of deletion strains to a range of agents and environments (Brown et al, 2006), we find that the deletion of SSL duplicates across a range of environments has on average no weaker (and in fact a slightly stronger) effect on cellular growth rate than that of non-SSL duplicates or random genes. Taken together, these findings suggest that the backup capacity of SSL duplicates is limited and not indicative of a comprehensive ability to cover for the loss of the paralogous partner.
We next tested the degree of functional similarity of buffering duplicates using similarity in genetic interaction as well as environmental sensitivity profiles as indicators of shared functionality (Tong et al, 2004; Schuldiner et al, 2005; Brown et al, 2006; Pan et al, 2006). In spite of their rich media buffering properties, we find that the interaction and sensitivity patterns of most SSL duplicates are divergent and are usually more similar to those of other, non-paralogous genes (Figure 2C and D; Supplementary Figure 10).
Lastly, in addition to our analysis of duplicate phenotypes, we used genetic interaction spectra as deletion phenotypes for generic genes whose single deletion in standard conditions has little measurable effect. As expected, genetic interactions provide a deletion phenotype for many more genes (80–90%) than single gene deletions in standard growth environments (Steinmetz et al, 2002), which yield a detectable growth defect only for 30–40% (Figure 4B). To assess whether these interactions reflect the cost of gene loss (gene importance), we asked if there is a relationship between the probability of a gene being retained between related species and its number of genetic interactions. Indeed, genetic interactivity exhibits a strong correlation with gene retention across related phyla (Figure 4C and Supplementary Figure 7), and predicts the likelihood of gene loss better than lethality/viability, quantitative growth deficiency or environmental specificity (Supplementary Figure 8). Thus, genetic interactions provide a cost of gene loss that effectively recapitulates evolutionary constraints. This is further supported by the observation that genetic interactions are significantly correlated with environmental sensitivity across a range of conditions. Thus, our findings suggest that for most genes there is a substantial cost of gene loss, even though this is often not reflected in single gene deletion tests carried out in standard conditions.
Many genes can be deleted with little phenotypic consequence. By what mechanism and to what extent the presence of duplicate genes in the genome contributes to this robustness against deletions has been the subject of considerable interest. Here, we exploit the availability of high-density genetic interaction maps to provide direct support for the role of backup compensation, where functionally overlapping duplicates cover for the loss of their paralog. However, we find that the overall contribution of duplicates to robustness against null mutations is low (∼25%). The ability to directly identify buffering paralogs allowed us to further study their properties, and how they differ from non-buffering duplicates. Using environmental sensitivity profiles as well as quantitative genetic interaction spectra as high-resolution phenotypes, we establish that even duplicate pairs with compensation capacity exhibit rich and typically non-overlapping deletion phenotypes, and are thus unable to comprehensively cover against loss of their paralog. Our findings reconcile the fact that duplicates can compensate for each other's loss under a limited number of conditions with the evolutionary instability of genes whose loss is not associated with a phenotypic penalty.
PMCID: PMC1847942  PMID: 17389874
duplication; evolution; genetic interactions; redundancy
2.  The Cellular Robustness by Genetic Redundancy in Budding Yeast 
PLoS Genetics  2010;6(11):e1001187.
The frequent dispensability of duplicated genes in budding yeast is heralded as a hallmark of genetic robustness contributed by genetic redundancy. However, theoretical predictions suggest such backup by redundancy is evolutionarily unstable, and the extent of genetic robustness contributed from redundancy remains controversial. It is anticipated that, to achieve mutual buffering, the duplicated paralogs must at least share some functional overlap. However, counter-intuitively, several recent studies reported little functional redundancy between these buffering duplicates. The large yeast genetic interactions released recently allowed us to address these issues on a genome-wide scale. We herein characterized the synthetic genetic interactions for ∼500 pairs of yeast duplicated genes originated from either whole-genome duplication (WGD) or small-scale duplication (SSD) events. We established that functional redundancy between duplicates is a pre-requisite and thus is highly predictive of their backup capacity. This observation was particularly pronounced with the use of a newly introduced metric in scoring functional overlap between paralogs on the basis of gene ontology annotations. Even though mutual buffering was observed to be prevalent among duplicated genes, we showed that the observed backup capacity is largely an evolutionarily transient state. The loss of backup capacity generally follows a neutral mode, with the buffering strength decreasing in proportion to divergence time, and the vast majority of the paralogs have already lost their backup capacity. These observations validated previous theoretic predictions about instability of genetic redundancy. However, departing from the general neutral mode, intriguingly, our analysis revealed the presence of natural selection in stabilizing functional overlap between SSD pairs. 
These selected pairs, both WGD and SSD, tend to have decelerated functional evolution, have higher propensities of co-clustering into the same protein complexes, and share common interacting partners. Our study revealed the general principles for the long-term retention of genetic redundancy.
Author Summary
Eukaryotic cells show remarkable robustness against external perturbations, which has been attributed, at least in part, to the extensive gene duplication events in eukaryotic genomes. By duplication, genes are likely to gain redundant copies for backup purposes; however, this notion contradicts the population genetic theory that genetic redundancy is evolutionarily unstable. In this study, we used yeast as a model organism to delineate the evolutionary trajectory of genetic robustness by gene duplication, utilizing the comprehensively characterized synthetic genetic interaction data in the yeast genome. We showed that the evolution of genetic robustness by duplication follows a neutral mode, with the loss of backup capacity proportional to the divergence time. However, natural selection was also acting on a few pairs to maintain their long-term backup capacity; these pairs are slowly evolving, are co-clustered in the same protein complexes, and tend to interact with similar partners. This study unravels the general principles underlying the evolution of the cellular robustness arising from genetic redundancy.
PMCID: PMC2973813  PMID: 21079672
3.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are 'hidden' due to buffering. Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function. The genetic interactions of duplicate genes evolve in an extremely asymmetric way and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes. Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serves as a primary source for new genes and new functions, and as such has broad implications on the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes and profiles of such interactions across the entire genome provide a new context in which to examine the properties of duplicate compensation.
In this study we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability for duplicate genes to buffer the deletion of a partner has three main consequences. First it agrees with previous work demonstrating that a high proportion of duplicate pairs are synthetic lethal, a classic indication of the ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, it reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions relating to common function from experimental detection. Third, this buffering of common interactions serves to reduce profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, we show in a comparison with local neighbors on a protein–protein interaction (PPI) network, that although dosage-mediated duplicates more often have higher similarity to each other than they do to their neighbors, the reverse is true for duplicates in general. 
That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than they do with each other, and that observation is consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than protein–protein interactions, for example, they do offer a more direct measurement of functional consequence and, thus, may give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and also tends to evolve at a slower rate (Figure 4B).
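The comparison against random redistribution described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy interaction counts and the use of the 7:1 ratio as a cutoff are assumptions made for the example. Each simulated pair keeps its total number of interactions, and every interaction is assigned to one of the two sisters with probability 0.5.

```python
import random

def asymmetry_ratio(a, b):
    """Ratio of the larger to the smaller interaction count of a sister pair."""
    hi, lo = max(a, b), min(a, b)
    if hi == 0:
        return 1.0           # no interactions at all: treat as symmetric
    if lo == 0:
        return float("inf")  # all interactions belong to one sister
    return hi / lo

def observed_fraction(pairs, threshold=7.0):
    """Fraction of pairs whose asymmetry ratio reaches the threshold."""
    return sum(asymmetry_ratio(a, b) >= threshold for a, b in pairs) / len(pairs)

def null_fractions(pairs, threshold=7.0, n_sim=1000, seed=0):
    """Redistribute each pair's interactions between sisters with p = 0.5."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_sim):
        simulated = []
        for a, b in pairs:
            total = a + b
            sim_a = sum(rng.random() < 0.5 for _ in range(total))
            simulated.append((sim_a, total - sim_a))
        fractions.append(observed_fraction(simulated, threshold))
    return fractions

def empirical_p_value(pairs, threshold=7.0, n_sim=1000, seed=0):
    """Share of simulations at least as asymmetric as the observed data."""
    obs = observed_fraction(pairs, threshold)
    null = null_fractions(pairs, threshold, n_sim, seed)
    return sum(f >= obs for f in null) / n_sim

# Hypothetical interaction counts for four sister pairs (illustrative only).
toy_pairs = [(70, 8), (45, 5), (30, 28), (64, 6)]
print(observed_fraction(toy_pairs), empirical_p_value(toy_pairs))
```

A small empirical p-value would indicate that the observed share of highly asymmetric pairs is unlikely under equal-probability redistribution, which is the kind of departure the abstract reports.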
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicate pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encodes a protein that forms a distinct heterodimeric complex with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depends on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
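Profile similarity values such as the r reported above are correlation coefficients between two genes' genetic interaction profiles. The sketch below assumes a Pearson correlation computed over the query genes measured for both profiles; the abstract does not state how missing values were handled, so that choice, the function name and the toy scores are all assumptions for illustration.

```python
from math import sqrt

def profile_similarity(profile_a, profile_b):
    """Pearson correlation over positions measured in both profiles.

    Profiles are equal-length lists of interaction scores; None marks a
    missing measurement and that position is skipped (an assumption).
    """
    shared = [(x, y) for x, y in zip(profile_a, profile_b)
              if x is not None and y is not None]
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared measurements")
    mean_x = sum(x for x, _ in shared) / n
    mean_y = sum(y for _, y in shared) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in shared)
    sxx = sum((x - mean_x) ** 2 for x, _ in shared)
    syy = sum((y - mean_y) ** 2 for _, y in shared)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)

# Hypothetical interaction scores against the same five query genes.
gene_x = [-0.4, 0.1, None, -0.9, 0.3]
gene_y = [-0.3, 0.0, -0.2, -0.7, 0.1]
print(round(profile_similarity(gene_x, gene_y), 3))
```

Comparing each sister's similarity to a shared partner in this way is one route to the kind of contrast drawn between Cik1 and Vik1.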
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
4.  Modification of Gene Duplicability during the Evolution of Protein Interaction Network 
PLoS Computational Biology  2011;7(4):e1002029.
Duplications of genes encoding highly connected and essential proteins are selected against in several species but not in human, where duplicated genes encode highly connected proteins. To understand when and how gene duplicability changed in evolution, we compare gene and network properties in four species (Escherichia coli, yeast, fly, and human) that are representative of the increase in evolutionary complexity, defined as progressive growth in the number of genes, cells, and cell types. We find that the origin and conservation of a gene significantly correlates with the properties of the encoded protein in the protein-protein interaction network. All four species preserve a core of singleton and central hubs that originated early in evolution, are highly conserved, and accomplish basic biological functions. Another group of hubs appeared in metazoans and duplicated in vertebrates, mostly through vertebrate-specific whole genome duplication. Such recent and duplicated hubs are frequently targets of microRNAs and show tissue-selective expression, suggesting that these are alternative mechanisms to control their dosage. Our study shows how networks modified during evolution and contributes to explaining the occurrence of somatic genetic diseases, such as cancer, in terms of network perturbations.
Author Summary
Gene copy number is often tightly controlled because it directly affects the gene dosage. In several species, including yeast, worm, and fly, genes that have a single gene copy (singleton genes) encode proteins with several connections in the protein interaction network (hubs) as well as essential proteins. Surprisingly, in mouse and human, essential proteins and hubs are encoded by genes with more than one copy in the genome (duplicated genes). Here we show that these two distinct groups of hubs were acquired at different times during the evolution of the protein interaction network and contribute in different ways to the life of the cell. Singleton hubs are ancestral genes that are conserved from prokaryotes to vertebrates and accomplish basic functions related to cell survival. Duplicated hubs were acquired mostly within metazoans and duplicated through vertebrate-specific whole genome duplication. These genes are involved in processes that are crucial for the organization of multicellularity. Although duplicated, recent hubs are also subject to gene dosage control through microRNAs and tissue-selective expression. The clarification of how the protein interaction network evolves enables us to understand the adaptation to the progressive increase in complexity and to better characterize the genes involved in diseases such as cancer.
PMCID: PMC3072358  PMID: 21490719
5.  Variation in gene duplicates with low synonymous divergence in Saccharomyces cerevisiae relative to Caenorhabditis elegans 
Genome Biology  2009;10(7):R75.
Differences between yeast and worm duplicates result from differences in mechanisms of duplication and effective population size.
The direct examination of large, unbiased samples of young gene duplicates in their early stages of evolution is crucial to understanding the origin, divergence and preservation of new genes. Furthermore, comparative analysis of multiple genomes is necessary to determine whether patterns of gene duplication can be generalized across diverse lineages or are species-specific. Here we present results from an analysis comprising 68 duplication events in the Saccharomyces cerevisiae genome. We partition the yeast duplicates into ohnologs (generated by a whole-genome duplication) and non-ohnologs (from small-scale duplication events) to determine whether their disparate origins commit them to divergent evolutionary trajectories and genomic attributes.
We conclude that, for the most part, ohnologs tend to appear remarkably similar to non-ohnologs in their structural attributes (specifically the relative composition frequencies of complete, partial and chimeric duplicates), the discernible length of the duplicated region (duplication span) as well as genomic location. Furthermore, we find notable differences in the features of S. cerevisiae gene duplicates relative to those of another eukaryote, Caenorhabditis elegans, with respect to chromosomal location, extent of duplication and the relative frequencies of complete, partial and chimeric duplications.
We conclude that the variation between yeast and worm duplicates can be attributed to differing mechanisms of duplication in conjunction with the varying efficacy of natural selection in these two genomes as dictated by their disparate effective population sizes.
PMCID: PMC2728529  PMID: 19594930
6.  The Roles of Whole-Genome and Small-Scale Duplications in the Functional Specialization of Saccharomyces cerevisiae Genes 
PLoS Genetics  2013;9(1):e1003176.
Researchers have long been enthralled with the idea that gene duplication can generate novel functions, crediting this process with great evolutionary importance. Empirical data shows that whole-genome duplications (WGDs) are more likely to be retained than small-scale duplications (SSDs), though their relative contribution to the functional fate of duplicates remains unexplored. Using the map of genetic interactions and the re-sequencing of 27 Saccharomyces cerevisiae genomes evolving for 2,200 generations we show that SSD-duplicates lead to neo-functionalization while WGD-duplicates partition ancestral functions. This conclusion is supported by: (a) SSD-duplicates establish more genetic interactions than singletons and WGD-duplicates; (b) SSD-duplicates copies share more interaction-partners than WGD-duplicates copies; (c) WGD-duplicates interaction partners are more functionally related than SSD-duplicates partners; (d) SSD-duplicates gene copies are more functionally divergent from one another, while keeping more overlapping functions, and diverge in their sub-cellular locations more than WGD-duplicates copies; and (e) SSD-duplicates complement their functions to a greater extent than WGD-duplicates. We propose a novel model that uncovers the complexity of evolution after gene duplication.
Author Summary
Gene duplication involves the doubling of a gene, originating an identical gene copy. Early evolutionary theory predicted that, as one gene copy is performing the ancestral function, the other gene copy, devoid from strong selection constraints, could evolve exploring alternative functions. Because of its potential to generate novel functions, hence biological complexity, gene duplication has been credited with enormous evolutionary importance. The way in which duplicated genes acquire novel functions remains the focus of intense research. Does the mechanism of duplication—duplication of small genome regions versus genome duplication—influence the fate of duplicates? Although it has been shown that the mechanism of duplication determines the persistence of genes in duplicate, a model describing the functional fates of duplicates generated by whole-genome or small-scale duplications remains largely obscure. Here we show that despite the large amount of genetic material originated by whole-genome duplication in the yeast Saccharomyces cerevisiae, these duplicates specialized in subsets of ancestral functions. Conversely, small-scale duplicates originated novel functions. We describe and test a model to explain the evolutionary dynamics of duplicates originated by different mechanisms. Our results shed light on the functional fates of duplicates and role of the duplication mechanism in generating functional diversity.
PMCID: PMC3536658  PMID: 23300483
7.  A Synergism between Adaptive Effects and Evolvability Drives Whole Genome Duplication to Fixation 
PLoS Computational Biology  2014;10(4):e1003547.
Whole genome duplication has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes. This pattern has been explained by a neutral process of subfunctionalization and more recently, dosage balance selection. However, much about the relationship between environmental change, WGD and adaptation remains unknown. Here, we study the duplicate retention pattern post-WGD, by letting virtual cells adapt to environmental changes. The virtual cells have structured genomes that encode a regulatory network and simple metabolism. Populations are under selection for homeostasis and evolve by point mutations, small indels and WGD. After populations had initially adapted fully to fluctuating resource conditions, re-adaptation to a broad range of novel environments was studied by tracking mutations in the line of descent. WGD was established in a minority (≈30%) of lineages, yet these were significantly more successful at re-adaptation. Unexpectedly, WGD lineages conserved more seemingly redundant genes, yet had higher per gene mutation rates. While WGD duplicates of all functional classes were significantly over-retained compared to a model of neutral losses, duplicate retention was clearly biased towards highly connected TFs. Importantly, no subfunctionalization occurred in conserved pairs, strongly suggesting that dosage balance shaped retention. Meanwhile, singles diverged significantly. WGD, therefore, is a powerful mechanism to cope with environmental change, allowing conservation of a core machinery, while adapting the peripheral network to accommodate change.
Author Summary
The evolution of eukaryotes is characterized by drastic changes in their genome content. Genome expansions have often occurred by duplication of the entire genome. It is generally not known whether organisms gain any adaptive advantage from these mutations. However, they appear to become fixed in response to environmental change. Many interesting whole genome duplications happened long ago in eukaryotic evolutionary history during periods of turbulent genome and species evolution. Genomic data analysis alone cannot resolve the evolutionary mechanisms and consequences of whole genome duplication. Here, we modeled evolution with whole genome duplications in a Virtual Cell model. Simulating populations that undergo a range of different environmental changes, we found that, in addition to often increasing fitness directly, whole genome duplications made lineages more evolvable and hence more able to adapt to harsh new environments. Although most duplicates are deleted in subsequent evolution, genes with many interaction partners were retained preferentially, increasing regulatory complexity. Interestingly, however, we found that innovation happened most likely in the more loosely connected and less essential genes.
PMCID: PMC3990473  PMID: 24743268
8.  Network Hubs Buffer Environmental Variation in Saccharomyces cerevisiae 
PLoS Biology  2008;6(11):e264.
Regulatory and developmental systems produce phenotypes that are robust to environmental and genetic variation. A gene product that normally contributes to this robustness is termed a phenotypic capacitor. When a phenotypic capacitor fails, for example when challenged by a harsh environment or mutation, the system becomes less robust and thus produces greater phenotypic variation. A functional phenotypic capacitor provides a mechanism by which hidden polymorphism can accumulate, whereas its failure provides a mechanism by which evolutionary change might be promoted. The primary example to date of a phenotypic capacitor is Hsp90, a molecular chaperone that targets a large set of signal transduction proteins. In both Drosophila and Arabidopsis, compromised Hsp90 function results in pleiotropic phenotypic effects dependent on the underlying genotype. For some traits, Hsp90 also appears to buffer stochastic variation, yet the relationship between environmental and genetic buffering remains an important unresolved question. We previously used simulations of knockout mutations in transcriptional networks to predict that many gene products would act as phenotypic capacitors. To test this prediction, we use high-throughput morphological phenotyping of individual yeast cells from single-gene deletion strains to identify gene products that buffer environmental variation in Saccharomyces cerevisiae. We find more than 300 gene products that, when absent, increase morphological variation. Overrepresented among these capacitors are gene products that control chromosome organization and DNA integrity, RNA elongation, protein modification, cell cycle, and response to stimuli such as stress. Capacitors have a high number of synthetic-lethal interactions but knockouts of these genes do not tend to cause severe decreases in growth rate. Each capacitor can be classified based on whether or not it is encoded by a gene with a paralog in the genome. 
Capacitors with a duplicate are highly connected in the protein–protein interaction network and show considerable divergence in expression from their paralogs. In contrast, capacitors encoded by singleton genes are part of highly interconnected protein clusters whose other members also tend to affect phenotypic variability or fitness. These results suggest that buffering and release of variation is a widespread phenomenon that is caused by incomplete functional redundancy at multiple levels in the genetic architecture.
Author Summary
Most species maintain abundant genetic variation and experience a wide range of environmental conditions, yet phenotypic differences between individuals are usually small. This phenomenon, known as phenotypic robustness, presents an apparent contradiction: if biological systems are so resistant to variation, how do they diverge and adapt through evolutionary time? Here, we address this question by investigating the molecular mechanisms that underlie phenotypic robustness and how these mechanisms can be broken to produce phenotypic heterogeneity. We identify genes that contribute to phenotypic robustness in yeast by analyzing the variance of morphological phenotypes in a comprehensive collection of single-gene knockout strains. We find that ∼5% of yeast genes break phenotypic robustness when knocked out. The products of these genes tend to be involved in critical cellular processes, including maintaining DNA stability, processing RNA, modifying proteins, and responding to stressful environments. These genes tend to interact genetically with a large number of other genes, and their products tend to interact physically with a large number of other gene products. Our results suggest that loss of phenotypic robustness might be a common phenomenon during evolution that occurs when cellular networks are disrupted.
A genome-wide screen in Saccharomyces cerevisiae identifies over 300 gene products that buffer environmental variation--dubbed phenotypic capacitors--and function as hubs in protein-protein and synthetic-lethal interaction networks.
PMCID: PMC2577700  PMID: 18986213
9.  Niche adaptation by expansion and reprogramming of general transcription factors 
Experimental analysis of TFB family proteins in a halophilic archaeon reveals complex environment-dependent fitness contributions. Gene conversion events among these proteins can generate novel niche adaptation capabilities, a process that may have contributed to archaeal adaptation to extreme environments.
Evolution of archaeal lineages correlates with duplication events in the TFB family. Each TFB is required for adaptation to multiple environments. The relative fitness contributions of TFBs change with environmental context. Changes in the regulation of duplicated TFBs can generate new adaptation capabilities.
The evolutionary success of an organism depends on its ability to continually adapt to changes in the patterns of constant, periodic, and transient challenges within its environment. This process of ‘niche adaptation’ requires reprogramming of the organism's environmental response networks by reorganizing interactions among diverse parts including environmental sensors, signal transducers, and transcriptional and post-transcriptional regulators. Gene duplications have been discovered to be one of the principal strategies in this process, especially for reprogramming of gene regulatory networks (GRNs). Whereas eukaryotes require dozens of factors for recruitment of RNA polymerase, archaea require just two general transcription factors (GTFs) that are orthologous to eukaryotic TFIIB (TFB in archaea) and TATA-binding protein (TBP) (Bell et al, 1998). Both of these GTFs have expanded extensively in nearly 50% of all archaea whose genomes have been fully sequenced. The phylogenetic analysis presented in this study reveals lineage-specific expansions of TFBs, suggesting that they might encode functionally specialized gene regulatory programs for the unique environments to which these organisms have adapted. This hypothesis is particularly appealing when we consider that the greatest expansion is observed within the group of halophilic archaea whose habitats are associated with routine and dynamic changes in a number of environmental factors including light, temperature, oxygen, salinity, and ionic composition (Rodriguez-Valera, 1993; Litchfield, 1998).
We have previously demonstrated that variations in the expanded set of TFBs (a through e) in Halobacterium salinarum NRC-1 manifest at the level of physical interactions within and across the two families, their DNA-binding specificity, their differential regulation in varying environments, and, ultimately, on the large-scale segregation of transcription of all genes into overlapping yet distinct sets of functionally related groups (Facciotti et al, 2007). We have extended findings from this earlier study with a systematic survey of the fitness consequences of perturbing the TFB network of H. salinarum NRC-1 across 17 environments. Notably, each TFB conferred fitness in two or more environmental conditions tested, and the relative fitness contributions (see Table I) of the five TFBs varied significantly by environment. From an evolutionary perspective, the relationships among these fitness landscapes reveal that two classes of TFBs (c/g- and f-type) appear to have played an important role in the evolution of halophilic archaea by overseeing regulation of core physiological capabilities in these organisms. TFBs of the other clades (b/d and a/e) seem to have emerged much more recently through gene duplications or horizontal gene transfers (HGTs) and are being utilized for adaptation to specialized environmental conditions.
We also investigated higher-order functional interactions and relationships among the duplicated TFBs by performing competition experiments and by mapping genetic interactions in different environments. This demonstrated that depending on environmental context, the TFBs have strikingly different functional hierarchies and genetic interactions with one another. This is remarkable as it makes each TFB essential albeit at different times in a dynamically changing environment.
In order to understand the process by which such gene family expansions shape architecture and functioning of a GRN, we performed integrated analysis of phylogeny, physical interactions, regulation, and fitness landscapes of the seven TFBs in H. salinarum NRC-1. This revealed that evolution of both their protein-coding sequence and their promoter has been instrumental in the encoding of environment-specific regulatory programs. Importantly, the convergent and divergent evolution of regulation and binding properties of TFBs suggested that, aside from HGT and random mutations, a third plausible (and perhaps most interesting) mechanism for acquiring a novel TFB variant is through gene conversion. To test this hypothesis, we synthesized a novel TFBx by transferring TFBa/e clade-specific residues to a TFBd backbone, transformed this variant under the control of either the TFBd or the TFBe promoter (PtfbD or PtfbE) into three different host genetic backgrounds (Δura3 (parent), ΔtfbD, and ΔtfbE), and analyzed fitness and gene expression patterns during growth at 25 and 37°C. This showed that gene conversion events spanning the coding sequence and the promoter, environmental context, and genetic background of the host are all extremely influential in the functional integration of a TFB into the GRN. Importantly, this analysis suggested that altering the regulation of an existing set of expanded TFBs might be an efficient mechanism to reprogram the GRN to rapidly generate novel niche adaptation capability. We have confirmed this experimentally by increasing fitness merely by moving tfbE to PtfbD control, and by generating a completely novel phenotype (biofilm-like appearance) by overexpression of tfbE.
Altogether, this study clearly demonstrates that archaea can rapidly generate novel niche adaptation programs by simply altering regulation of duplicated TFBs. This is significant because expansions in the TFB family are widespread in archaea, a class of organisms that not only represent 20% of biomass on earth but are also known to have colonized some of the most extreme environments (DeLong and Pace, 2001). This strategy for niche adaptation is further expanded through interactions of the multiple TFBs with members of other expanded TF families such as TBPs (Facciotti et al, 2007) and sequence-specific regulators (e.g. Lrp family (Peeters and Charlier, 2010)). This is analogous to combinatorial solutions for other complex biological problems such as recognition of pathogens by Toll-like receptors (Roach et al, 2005), generation of antibody diversity by V(D)J recombination (Early et al, 1980), and recognition and processing of odors (Malnic et al, 1999).
Numerous lineage-specific expansions of the transcription factor B (TFB) family in archaea suggest an important role for expanded TFBs in encoding environment-specific gene regulatory programs. Given the characteristics of hypersaline lakes, the unusually large numbers of TFBs in halophilic archaea further suggest that they might be especially important in rapid adaptation to the challenges of a dynamically changing environment. Motivated by these observations, we have investigated the implications of TFB expansions by correlating sequence variations, regulation, and physical interactions of all seven TFBs in Halobacterium salinarum NRC-1 to their fitness landscapes, functional hierarchies, and genetic interactions across 2488 experiments covering combinatorial variations in salt, pH, temperature, and Cu stress. This systems analysis has revealed an elegant scheme in which completely novel fitness landscapes are generated by gene conversion events that introduce subtle changes to the regulation or physical interactions of duplicated TFBs. Based on these insights, we have introduced a synthetically redesigned TFB and altered the regulation of existing TFBs to illustrate how archaea can rapidly generate novel phenotypes by simply reprogramming their TFB regulatory network.
PMCID: PMC3261711  PMID: 22108796
evolution by gene family expansion; fitness; niche adaptation; reprogramming of gene regulatory network; transcription factor B
10.  Subfunctionalization reduces the fitness cost of gene duplication in humans by buffering dosage imbalances 
BMC Genomics  2011;12:604.
Driven essentially by random genetic drift, subfunctionalization has been identified as a possible non-adaptive mechanism for the retention of duplicate genes in small-population species, where widespread deleterious mutations are likely to cause complementary loss of subfunctions across gene copies. Through subfunctionalization, duplicates become indispensable to maintain the functional requirements of the ancestral locus. Yet, gene duplication produces a dosage imbalance in the encoded proteins and thus, as investigated in this paper, subfunctionalization must be subject to the selective forces arising from the fitness bottleneck introduced by the duplication event.
We show that, while arising from random drift, subfunctionalization must be inescapably subject to selective forces, since the diversification of expression patterns across paralogs mitigates duplication-related dosage imbalances in the concentrations of encoded proteins. Dosage imbalance effects become paramount when proteins rely on obligatory associations to maintain their structural integrity, and are expected to be weaker when protein complexation is ephemeral or adventitious. To establish the buffering effect of subfunctionalization on selection pressure, we determine the packing quality of encoded proteins, an established indicator of dosage sensitivity, and correlate this parameter with the extent of paralog segregation in humans, using species with larger populations (and more efficient selection) as controls.
Recognizing the role of subfunctionalization as a dosage-imbalance buffer in gene duplication events enabled us to reconcile its mechanistic nonadaptive origin with its adaptive role as an enabler of the evolution of genetic redundancy. This constructive role was established in this paper by proving the following assertion: If subfunctionalization is indeed adaptive, its effect on paralog segregation should scale with the dosage sensitivity of the duplicated genes. Thus, subfunctionalization becomes adaptive in response to the selection forces arising from the fitness bottleneck imposed by gene duplication.
PMCID: PMC3280233  PMID: 22168623
11.  Role of Duplicate Genes in Robustness against Deleterious Human Mutations 
PLoS Genetics  2008;4(3):e1000014.
It is now widely recognized that robustness is an inherent property of biological systems [1],[2],[3]. The contribution of close sequence homologs to genetic robustness against null mutations has been previously demonstrated in simple organisms [4],[5]. In this paper we investigate in detail the contribution of gene duplicates to back-up against deleterious human mutations. Our analysis demonstrates that the functional compensation by close homologs may play an important role in human genetic disease. Genes with a 90% sequence identity homolog are about 3 times less likely to harbor known disease mutations compared to genes with remote homologs. Moreover, close duplicates affect the phenotypic consequences of deleterious mutations by making a decrease in life expectancy significantly less likely. We also demonstrate that similarity of expression profiles across tissues significantly increases the likelihood of functional compensation by homologs.
Author Summary
Genetic robustness is the ability of an organism to buffer deleterious genetic mutations. It has been previously demonstrated that the functional compensation by duplicates plays an important role in protection against gene deletions in model organisms. Close duplicates often share similar functions, and loss of one paralog may be buffered by others. In the present work we specifically investigate the contribution of gene duplicates to backup against deleterious human mutations. We find that genes with close homologs are significantly less likely to harbor known disease mutations compared to genes with remote homologs. In addition, close duplicates affect the phenotypic consequences of deleterious mutations by making a decrease in life expectancy less likely. Similarity of expression profiles across tissues increases the likelihood of functional compensation by homologs. Taken together, our analysis demonstrates that functional compensation by close duplicates plays an important role in human genetic disease.
PMCID: PMC2265532  PMID: 18369440
12.  Effect of Duplicate Genes on Mouse Genetic Robustness: An Update 
BioMed Research International  2014;2014:758672.
In contrast to S. cerevisiae and C. elegans, analyses based on the current knockout (KO) mouse phenotypes led to the conclusion that duplicate genes had almost no role in mouse genetic robustness. It has been suggested that the bias of the mouse KO database toward ancient duplicates may cause this knockout duplicate puzzle, that is, a very similar proportion of essential genes (PE) between duplicate genes and singletons. In this paper, we conducted an extensive and careful analysis of the mouse KO phenotype data and corroborated a strong effect of duplicate genes on mouse genetic robustness. Moreover, the effect of duplicate genes on mouse genetic robustness is duplication-age dependent, which holds after ruling out the potential confounding effect from coding-sequence conservation, protein-protein connectivity, functional bias, or the bias of duplicates generated by whole genome duplication (WGD). Our findings suggest that two factors, the sampling bias toward ancient duplicates and very ancient duplicates with a proportion of essential genes higher than that of singletons, have caused the mouse knockout duplicate puzzle; meanwhile, the effect of genetic buffering may be correlated with sequence conservation as well as protein-protein interactivity.
PMCID: PMC4119742  PMID: 25110693
13.  Relationship between gene duplicability and diversifiability in the topology of biochemical networks 
BMC Genomics  2014;15(1):577.
Selective gene duplicability, the extensive expansion of a small number of gene families, is universal. Quantitatively, the number of genes, P(K), with K duplicates in a genome decreases precipitously as K increases, and often follows a power law, P(K) ∝ K^(-α). Functional diversification, either neo- or sub-functionalization, is a major evolution route for duplicate genes.
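As an illustration of the power-law relationship stated above, the exponent α can be estimated from a table of duplicate counts by a least-squares line on log-log axes. This is only a minimal sketch: the function name, the input layout (a mapping from K to P(K)), and the toy numbers are assumptions made for this example and do not come from the paper, which may well use a different fitting procedure.

```python
import math


def fit_power_law_exponent(family_sizes):
    """Estimate alpha in P(K) ∝ K^(-alpha) by least squares on log-log axes.

    family_sizes maps K (duplicate count) to P(K) (number of genes with
    K duplicates). Entries with non-positive K or P(K) are skipped because
    their logarithm is undefined.
    """
    points = [
        (math.log(k), math.log(p))
        for k, p in family_sizes.items()
        if k > 0 and p > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two (K, P(K)) pairs with positive values")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    # log P(K) = c - alpha * log K, so alpha is the negated slope.
    return -(sxy / sxx)


# Hypothetical toy data generated from an exact power law with alpha = 2.
toy_counts = {k: 1000.0 * k ** -2 for k in range(1, 6)}
alpha = fit_power_law_exponent(toy_counts)
```

A log-log regression is the simplest way to read off the exponent, but it is sensitive to noise in the sparse high-K tail; maximum-likelihood fitting is a common alternative when the data are real counts.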
Using three lines of genomic datasets, we studied the relationship between gene duplicability and diversifiability in the topology of biochemical networks. First, we explored a scenario where two pathways in the biochemical networks antagonize each other. Synthetic knockout of the respective genes for the two pathways rescues the phenotypic defects of each individual knockout. We identified duplicate gene pairs with sufficient divergences that represent this antagonism relationship in the yeast S. cerevisiae. Such pairs overwhelmingly belong to large gene families, and thus tend to have high duplicability. Second, we used distances between proteins of duplicate genes in the protein interaction network as a metric of their diversification. The higher a gene’s duplicate count, the further the proteins of this gene and its duplicates drift away from one another in the networks, which is especially true for genetically antagonizing duplicate genes. Third, we computed a sequence-homology-based clustering coefficient to quantify sequence diversifiability among duplicate genes – the lower the coefficient, the more the sequences have diverged. Duplicate count (K) of a gene is negatively correlated to the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family.
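The second analysis above uses distance in the protein interaction network as a measure of how far duplicates have diverged. The abstract does not say how that distance is defined; the sketch below assumes it means shortest-path length in an undirected, unweighted network and computes it by breadth-first search. The function name and the toy protein labels are hypothetical.

```python
from collections import deque


def network_distance(edges, source, target):
    """Shortest-path length (number of edges) between two proteins.

    edges is an iterable of (protein_a, protein_b) pairs describing an
    undirected interaction network. Returns None when either protein is
    absent from the network or no path connects them.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    if source not in adjacency or target not in adjacency:
        return None
    if source == target:
        return 0

    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical toy network: paralogs "A1" and "A2" are three steps apart,
# and "D"/"E" form a separate component.
toy_edges = [("A1", "B"), ("B", "C"), ("C", "A2"), ("D", "E")]
paralog_distance = network_distance(toy_edges, "A1", "A2")
unreachable = network_distance(toy_edges, "A1", "D")
```

Under this assumption, a larger distance between the products of a duplicate pair would be read as greater diversification of their network roles.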
Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks – an improvement of our understanding of gene duplicability.
PMCID: PMC4129122  PMID: 25005725
14.  Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human 
PLoS Computational Biology  2006;2(9):e133.
Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or “in-paralogues,” are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. 
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
Biologists often exploit the evolutionary relationships between proteins in order to explain how their findings are relevant to the biology of other species, including Homo sapiens. The most natural way to define these relationships is to draw family trees showing, for example, which human protein is the counterpart (“orthologue”) of a protein in dog, and which human proteins have arisen by recent duplication of existing genes (“paralogues”). On a small-scale this is relatively straightforward, but it is difficult to do this automatically on a genome-wide scale. In this paper the authors describe a new approach to drawing a giant family tree of all proteins from humans and dogs. They show how this tree allows them to refine some protein predictions and discard others that are likely to be nonfunctional dead sequences. Family relationships can show how the dog and human genomes have been rearranged since their last common ancestor. In addition, they help to identify the proteins that are specific to either dog or human, and which contribute to these species' biological differences. Giant trees, drawn from this method, will help to associate the differences, duplications, and evolution of proteins in different mammals with their distinctive physiologies and behaviours.
PMCID: PMC1584324  PMID: 17009864
15.  Generation of Tandem Direct Duplications by Reversed-Ends Transposition of Maize Ac Elements 
PLoS Genetics  2013;9(8):e1003691.
Tandem direct duplications are a common feature of the genomes of eukaryotes ranging from yeast to human, where they comprise a significant fraction of copy number variations. The prevailing model for the formation of tandem direct duplications is non-allelic homologous recombination (NAHR). Here we report the isolation of a series of duplications and reciprocal deletions isolated de novo from a maize allele containing two Class II Ac/Ds transposons. The duplication/deletion structures suggest that they were generated by alternative transposition reactions involving the termini of two nearby transposable elements. The deletion/duplication breakpoint junctions contain 8 bp target site duplications characteristic of Ac/Ds transposition events, confirming their formation directly by an alternative transposition mechanism. Tandem direct duplications and reciprocal deletions were generated at a relatively high frequency (∼0.5 to 1%) in the materials examined here in which transposons are positioned nearby each other in appropriate orientation; frequencies would likely be much lower in other genotypes. To test whether this mechanism may have contributed to maize genome evolution, we analyzed sequences flanking Ac/Ds and other hAT family transposons and identified three small tandem direct duplications with the structural features predicted by the alternative transposition mechanism. Together these results show that some class II transposons are capable of directly inducing tandem sequence duplications, and that this activity has contributed to the evolution of the maize genome.
Author Summary
The recent explosion of genome sequence data has greatly increased the need to understand the forces that shape eukaryotic genomes. A common feature of higher plant genomes is the presence of large numbers of duplications, often occurring as tandem repeats of thousands of base pairs. Despite the importance of gene duplications in evolution and disease, the precise mechanism(s) that generate tandem duplications are still unclear. In this study we identified nine new spontaneous duplications that arose flanking elements of the Ac transposon system. These duplications range in size from 8 kbp to >5,000 kbp, and all cases exhibit features characteristic of Ac transposition. Using similar criteria in a bioinformatics search, we identified three smaller duplications adjacent to other hAT family transposons in the maize B73 reference genome sequence. Our results show that transposable elements can directly generate tandem duplications via alternative transposition, and that this mechanism is responsible for at least some of the duplications present in the maize B73 genome. This work extends the significance of Barbara McClintock's discovery of transposable elements by demonstrating how they can act as agents of genome expansion.
PMCID: PMC3744419  PMID: 23966872
16.  Molecular characterization of the evolution of phagosomes 
First large-scale comparative proteomics/phosphoproteomics study characterizing some of the key steps that contributed to the remodeling of phagosomes that occurred during evolution. Comparison of profiling analyses of isolated phagosomes from three distant organisms (Dictyostelium, Drosophila, and mouse) revealed a protein core that defines a potential ‘ancient’ phagosome and a set of 50 proteins that emerged while adaptive immunity was already well established. Gene duplication events of mouse phagosome paralogs occurred mostly in Bilateria and Euteleostomi, coinciding with the emergence of innate and adaptive immunity, and thus, provided the functional innovations needed for the establishment of these two crucial evolutionary steps of the immune system. Phosphoproteomics of isolated phagosomes from the same three distant species indicate that the phagosome phosphoproteome has been extensively modified during evolution. Still, some phosphosites have been maintained for >1.2 billion years, and thus, highlight their particular significance in the regulation of key phagosomal functions.
Phagocytosis is the process by which multiple cell types internalize large particulate material from the external milieu. The functional properties of phagosomes are acquired through a complex maturation process, referred to as phagolysosome biogenesis. This pathway involves a series of rapid interactions with organelles of the endocytic apparatus, enabling the gradual transformation of newly formed phagosomes into phagolysosomes in which proteolytic degradation occurs. The degradative environment encountered in the phagosome lumen has enabled the use of phagocytosis as a predation mechanism for feeding (phagotrophy) in amoeba, whereas multicellular organisms utilize this process as a defense mechanism to kill microbes and, in jawed vertebrates (fish), initiate a sustained immune response.
High-throughput proteomics profiling of isolated phagosomes has been tremendously helpful for the molecular comprehension of this organelle. This approach is achieved by feeding low buoyancy latex beads to phagocytic cells, enabling the subsequent isolation of latex bead-containing phagosomes, away from all the other cell organelles, by a single isopycnic centrifugation in a sucrose gradient. In order to characterize some of the key steps that contributed to the remodeling of phagosomes during evolution, we isolated this organelle from three distant organisms that use phagocytosis for different purposes: the amoeba Dictyostelium discoideum, the fruit fly Drosophila melanogaster, and mouse (Mus musculus). We then performed detailed proteomics and phosphoproteomics analyses with unparalleled protein coverage for this organelle (two- to four-fold enhancements in identified proteins).
In order to establish the origin of the mouse phagosome proteome, we performed comparative analyses among 39 taxa including plants/algae, unicellular organisms, fungi, and more complex animal multicellular organisms. These genomic comparisons indicated that a large proportion of the mouse phagosome proteome is of ancient origin (73.1% of the proteome is conserved in eukaryotic organisms) (Figure 2A). This stresses the fact that phagocytosis is a very ancient process, as shown by its possible involvement in the emergence of eukaryotic cells (eukaryogenesis). Indeed, we identified close to 300 phagosome mouse proteins also present on Drosophila and Dictyostelium phagosomes, defining a potential ‘ancient’ core of proteins from which the immune functions of phagosomes likely evolved. Around 16.7% of the mouse phagosome proteins appeared in organisms that use phagocytosis for innate immunity (Bilateria to Chordata), whereas 10.2% appeared in Euteleostomi or Tetrapoda where phagosomes have an important function in linking the killing of microorganisms with the development of a specific sustained immune response following antigen recognition. The phagosome is made of molecules taken from a variety of sources within the cell, including the cytoplasm, the cytoskeleton and membrane organelles. Despite the evolution and diversification of these various cellular systems, the mammalian phagosome proteome is made preferentially of ancient proteins (Figure 2B). Comparison of functional annotation during evolution highlighted the emergence of specific phagosomal functions at various steps during evolution (Figure 2C). Some of these proteins and their point of origin during evolution are highlighted in Figure 2D. Strikingly, we identified in Tetrapods a set of 50 proteins that arose while adaptive immunity was already well established in teleosts (fish), indicating that the phagocytic system is still evolving.
Our study highlights the fact that the functional properties of phagosomes emerged by the remodeling of ancient molecules, the addition of novel components, and the duplication of existing proteins (paralogs) leading to the formation of molecular machines of mixed origin. Gene duplication is a process that contributed continuously to the complexification of the mouse proteome during evolution. In sharp contrast, paralog analysis indicated that the phagosome proteome was mainly reorganized through two periods of gene duplication, in Bilateria and Euteleostomi, coinciding with the emergence of innate immunity (at the split between Metazoa and Bilateria) and adaptive immunity (in jawed fish), respectively. These results strongly suggest that selective constraints may have favored the maintenance of phagosome paralogs to ensure the establishment of novel functions associated with this organelle at these two crucial evolutionary steps of the immune system.
The emergence of genes associated with the MHC locus in mammals, which appeared originally in the genome of jawed fishes, contributed to the development of complex molecular mechanisms linking innate immunity (the part of the immune system that defends the host from infection in a non-specific manner) and adaptive immunity (the part of the immune system triggered specifically after antigen recognition). Several of the genes of this locus encode proteins known to have important functions in antigen presentation, such as subunits of the immunoproteasome (LMP2 and LMP7), MHC class I and class II molecules, as well as tapasin and the transporter associated with antigen processing (TAP1 and TAP2), involved in the transport and loading of peptides on MHC class I molecules (Figure 6). In addition to their ability to present peptides on MHC class II molecules, phagosomes of vertebrates have been shown to be competent for the presentation of exogenous peptides on MHC class I molecules, a process referred to as cross-presentation. From a functional point of view, the involvement of phagosomes in antigen cross-presentation is the outcome of the successful integration of a wide range of multimolecular components that emerged throughout evolution (Figure 6). The trimming of exogenous proteins into small peptides that can be loaded on MHC class I molecules is inherited from the phagotrophic properties of unicellular organisms, where internalized bacteria are degraded into basic molecules and used as a source of nutrients. Ancient processes have therefore been co-opted (the use of an existing biological structure or feature for a new function) for new functionalities. A summarizing model of the various steps that enabled phagosome antigen presentation is presented in Figure 6.
This model highlights the fact that although antigen presentation is unique to evolutionarily recent phagosomes (starting in jawed fishes about 450 million years ago), it uses and integrates molecular machines composed of proteins that emerged throughout evolution.
In summary, we present here the first large-scale comparative proteomics/phosphoproteomics study characterizing some of the key evolutionary steps that contributed to the remodeling of phagosomes during evolution. Functional properties of this organelle emerged by the remodeling of ancient molecules, the addition of novel components, the extensive adaptation of protein phosphorylation sites and the duplication of existing proteins leading to the formation of molecular machines of mixed origin.
Amoebae use phagocytosis to internalize bacteria as a source of nutrients, whereas multicellular organisms utilize this process as a defense mechanism to kill microbes and, in vertebrates, initiate a sustained immune response. By using a large-scale approach to identify and compare the proteome and phosphoproteome of phagosomes isolated from distant organisms, and by comparative analysis over 39 taxa, we identified an ‘ancient’ core of phagosomal proteins around which the immune functions of this organelle have likely organized. Our data indicate that a larger proportion of the phagosome proteome, compared with the whole cell proteome, has been acquired through gene duplication at a period coinciding with the emergence of innate and adaptive immunity. Our study also characterizes in detail the acquisition of novel proteins and the significant remodeling of the phagosome phosphoproteome that contributed to modify the core constituents of this organelle in evolution. Our work thus provides the first thorough analysis of the changes that enabled the transformation of the phagosome from a phagotrophic compartment into an organelle fully competent for antigen presentation.
PMCID: PMC2990642  PMID: 20959821
evolution; immunity; phosphoproteomics; phylogeny; proteomics
17.  Protein Under-Wrapping Causes Dosage Sensitivity and Decreases Gene Duplicability 
PLoS Genetics  2008;4(1):e11.
A fundamental issue in molecular evolution is how to identify the evolutionary forces that determine the fate of duplicated genes. The dosage balance hypothesis has been invoked to explain gene duplication patterns at the genomic level under the premise that a dosage imbalance among protein-complex subunits or interacting partners is often deleterious. Here we examine this hypothesis by investigating the molecular basis of dosage sensitivity. We focus on the extent of protein wrapping, which indicates how strongly the structural integrity of a protein relies on its interactive context. From this perspective, we predict that the duplicates of a highly under-wrapped protein or protein subunit should (1) be more sensitive to dosage imbalance and be less likely to be retained and (2) be more likely to survive from a whole-genome duplication (WGD) than from a non-WGD because a WGD causes little or no dosage imbalance. Our under-wrapping analysis of more than 12,000 protein structures strongly supports these predictions and further reveals that the effect of dosage sensitivity on gene duplicability decreases with increasing organismal complexity.
Author Summary
A gene duplication provides an extra gene copy that can be free to accumulate mutations and gain a new function. Therefore, gene duplication plays a very important role in evolution. However, the presence of an additional gene copy can sometimes be deleterious because it can lead to an excessive dosage relative to those of its interacting partners. This dosage imbalance effect in turn influences the fate of duplicated genes in evolution. Our study gives the first description to our knowledge of the molecular/structural basis for the dosage imbalance effect. We study the relationships between gene family size and extent of protein under-wrapping, a molecular quantifier of the reliance of the protein on binding partnerships to maintain structural integrity, indicative of the extent of structure protection from disruptive hydration. Using more than 12,000 protein three-dimensional structures from six organisms that range from bacteria to human, we show an inverse relationship between extent of protein under-wrapping and family size. That is, a duplication is unlikely to be tolerated if the protein is highly under-wrapped (i.e., its structure requires substantial stabilizing interactions with other proteins). We also show that the effect of dosage imbalance is more apparent in unicellular organisms but is buffered to some extent in higher eukaryotes.
PMCID: PMC2211539  PMID: 18208334
18.  Increased Expression and Protein Divergence in Duplicate Genes Is Associated with Morphological Diversification 
PLoS Genetics  2009;5(12):e1000781.
The differentiation of both gene expression and protein function is thought to be important as a mechanism of the functionalization of duplicate genes. However, it has not been addressed whether expression or protein divergence of duplicate genes is greater in those genes that have undergone functionalization compared with those that have not. We examined a total of 492 paralogous gene pairs associated with morphological diversification in a plant model organism (Arabidopsis thaliana). Classifying these paralogous gene pairs into high, low, and no morphological diversification groups, based on knock-out data, we found that the divergence rates of both gene expression and protein sequences were significantly higher in either the high or low morphological diversification group compared with those in the no morphological diversification group. These results strongly suggest that the divergence of both expression and protein sequence are important sources for morphological diversification of duplicate genes. Although the two mechanisms are not mutually exclusive, our analysis suggested that changes of expression pattern play the minor role (33%–41%) and that changes of protein sequence play the major role (59%–67%) in morphological diversification. Finally, we examined to what extent duplicate genes are associated with expression or protein divergence exerting morphological diversification at the whole-genome level. Interestingly, duplicate genes randomly chosen from A. thaliana had not experienced expression or protein divergence that resulted in morphological diversification. These results indicate that most duplicate genes have experienced minor functionalization.
Author Summary
The relationship between morphological and molecular evolution is central to the understanding of eukaryote evolution. In particular, there is much interest in how duplicate genes have contributed to morphological diversification during evolution. As a mechanism of functionalization of duplicate genes, differentiation of both gene expression and protein function is believed to be important. Although it has been reported that both expression and protein divergence tend to increase as a duplication ages, it is unclear whether expression or protein divergence in duplicate genes is greater in those genes that have undergone functionalization compared with those that have not. Here, we studied 492 duplicate gene pairs associated with various degrees of morphological diversification in Arabidopsis thaliana. Using these data, we found that the divergence of both expression and protein sequence were important sources for morphological diversification of duplicate genes. Although the two mechanisms are not mutually exclusive, our analysis suggested that expression divergence is the minor contributor and protein divergence is the major contributor to morphological diversification. However, the expression or protein sequence of randomly chosen duplicate genes did not show significant divergence that resulted in morphological diversification. These results indicate that most duplicate genes experienced minor functionalization in the genome.
PMCID: PMC2788128  PMID: 20041196
19.  Duplication and Retention Biases of Essential and Non-Essential Genes Revealed by Systematic Knockdown Analyses 
PLoS Genetics  2013;9(5):e1003330.
When a duplicate gene has no apparent loss-of-function phenotype, it is commonly considered that the phenotype has been masked as a result of functional redundancy with the remaining paralog. This is supported by indirect evidence showing that multi-copy genes show loss-of-function phenotypes less often than single-copy genes and by direct tests of phenotype masking using select gene sets. Here we take a systematic genome-wide RNA interference approach to assess phenotype masking in paralog pairs in the Caenorhabditis elegans genome. Remarkably, in contrast to expectations, we find that phenotype masking makes only a minor contribution to the low knockdown phenotype rate for duplicate genes. Instead, we find that non-essential genes are highly over-represented among duplicates, leading to a low observed loss-of-function phenotype rate. We further find that duplicate pairs derived from essential and non-essential genes have contrasting evolutionary dynamics: whereas non-essential genes are both more often successfully duplicated (fixed) and lost, essential genes are less often duplicated but upon successful duplication are maintained over longer periods. We expect the fundamental evolutionary duplication dynamics presented here to be broadly applicable.
Author Summary
Duplicate genes occur in all organisms. It has been found that mutations in duplicate genes cause defects much less often than when single copy genes are mutated. It is widely believed that this is due to functional redundancy—that is, the two genes can carry out similar functions so that the non-mutated duplicate gene can cover for or “mask” the phenotype of the mutation in the first duplicate. To determine whether this hypothesis is true, it is necessary to test systematically whether defects indeed occur in the organism when both duplicate genes are inhibited. We have for the first time carried out such an analysis in a multicellular organism, the nematode Caenorhabditis elegans. In contrast to expectations, we observed that when both copies of duplicate genes are inhibited deleterious effects are very rare. We show that this is because duplicate genes are much more often non-essential compared to genes where there is only a single copy. Non-essential genes are also lost from the genome much more often than essential genes. However, when essential genes are duplicated, they remain present in the genome over longer periods. Our results give a framework to explain the evolutionary dynamics of duplications in the genome.
PMCID: PMC3649981  PMID: 23675306
20.  Evolutionary Models for Formation of Network Motifs and Modularity in the Saccharomyces Transcription Factor Network 
PLoS Computational Biology  2007;3(10):e198.
Many natural and artificial networks contain overrepresented subgraphs, which have been termed network motifs. In this article, we investigate the processes that led to the formation of the two most common network motifs in eukaryote transcription factor networks: the bi-fan motif and the feed-forward loop. Around 100 million years ago, the common ancestor of the Saccharomyces clade underwent a whole-genome duplication event. The simultaneous duplication of the genes created by this event enabled the origin of many network motifs to be established. The data suggest that there are two primary mechanisms that are involved in motif formation. The first mechanism, enabled by the substantial plasticity in promoter regions, is rewiring of connections as a result of positive environmental selection. The second is duplication of transcription factors, which is also shown to be involved in the formation of intermediate-scale network modularity. These two evolutionary processes are complementary, with the pre-existence of network motifs enabling duplicated transcription factors to bind different targets despite structural constraints on their DNA-binding specificities. This process may facilitate the creation of novel expression states and the increases in regulatory complexity associated with higher eukaryotes.
Author Summary
Networks are a simple and general way of representing natural phenomena that range in scale from the social interactions between people to the organization of circuits on a microchip. Many networks have been found to contain repeated patterns of connections between small groups of nodes. These patterns, termed network motifs, are thought to be involved in controlling the flow of information through the network. This article investigates the processes that led to the formation of the two most common types of motif in the network controlling gene expression in baker's yeast. Around 100 million years ago, yeast's ancestor underwent a whole-genome duplication, which resulted in the organism containing four copies of each gene rather than the usual two. The duplicated genes that remain in the yeast genome are used to infer the two mechanisms that give rise to network motifs. These are rewiring of interactions between genes, and the duplication of proteins that control gene expression (transcription factors). These two processes are complementary, with the rewiring mechanism enabling duplicated transcription factors to regulate the expression of different genes. It appears likely that these two processes are involved in enabling the increases in complexity that are associated with multicellular life.
PMCID: PMC2041975  PMID: 17967049
21.  Evolutionary principles of modular gene regulation in yeasts 
eLife  2013;2:e00603.
Divergence in gene regulation can play a major role in evolution. Here, we used a phylogenetic framework to measure mRNA profiles in 15 yeast species from the phylum Ascomycota and reconstruct the evolution of their modular regulatory programs along a time course of growth on glucose over 300 million years. We found that modules have diverged proportionally to phylogenetic distance, with prominent changes in gene regulation accompanying changes in lifestyle and ploidy, especially in carbon metabolism. Paralogs have significantly contributed to regulatory divergence, typically within a very short window from their duplication. Paralogs from a whole genome duplication (WGD) event have a uniquely substantial contribution that extends over a longer span. Similar patterns occur when considering the evolution of the heat shock regulatory program measured in eight of the species, suggesting that these are general evolutionary principles.
eLife digest
The incredible diversity of living creatures belies the fact that their genes are quite similar. In the 1970s Mary-Claire King and Allan Wilson proposed that a process called gene regulation—which determines when, where and how genes are expressed as proteins—is responsible for this diversity. Four decades later, the central role of gene regulation in evolution has been confirmed in a wide range of species including bacteria, fungi, flies and mammals, although the details remain poorly understood. In recent years it has been suggested that the duplication of genes—and sometimes the duplication of whole genomes—has had a crucial influence on the part played by gene regulation in the evolution of many different species.
Ascomycota fungi are uniquely suited to the study of genetics and evolution because of their diversity (they include C. albicans, a fungus that is found in the human mouth and gut, and various species of yeast) and because many of their genomes have already been sequenced. Moreover, their genomes are relatively small, which simplifies the task of working out how they have changed over the course of evolution. It is also known that species in this branch of the tree of life diverged before and after an event in which a whole genome was duplicated.
Ascomycota fungi use glucose as a source of carbon in different ways during aerobic growth. Most, including C. albicans, are respiratory and rely on oxidative phosphorylation processes to produce energy. However, a small number (including S. cerevisiae and S. pombe, two types of yeast that are widely used as model organisms) prefer to ferment glucose, even when oxygen is available. Species that favor the latter respiro-fermentative lifestyle have evolved independently at least twice: once after the whole genome duplication event that led to S. cerevisiae, and once when S. pombe and the other fission yeasts evolved.
Thompson et al. have measured mRNA profiles in 15 different species of yeast and reconstructed how the regulation of groups of genes (modules) has evolved over a period of more than 300 million years. They found that modules have diverged proportionally to evolutionary time, with prominent changes in gene regulation being associated with changes in lifestyle (especially changes in carbon metabolism) and a whole genome duplication event.
Gene duplication events result in gene paralogs—identical genes at different places in the genome—and these have made significant contributions to the evolution of different forms of gene regulation, especially just after the duplication event. Moreover, the paralogs produced in whole genome duplication events have resulted in bigger changes over longer periods of time. Similar patterns were observed in the regulation of the genes involved in the response to heat shock in eight of the species, which suggests that these are general evolutionary principles.
The changes in gene expression associated with the respiro-fermentative lifestyle may also have implications for our understanding of cancer: healthy cells rely on oxidative phosphorylation to produce energy whereas, similar to yeast cells, most cancerous cells rely on respiro-fermentation. Furthermore, yeast cells and cancer cells both support their rapid growth and proliferation by using glucose for biosynthesis to support cell division, although this process is not fully understood. Normal cells, on the other hand, use glucose primarily for energy and tend not to divide rapidly.
Thompson et al. found that the genes encoding enzymes in two biosynthetic pathways—one that produces the nucleotides necessary for DNA replication, and one that synthesizes glycine—are induced in respiro-fermentative yeasts but repressed in respiratory yeast cells. The fact that similar changes are observed in the same two pathways when normal cells become cancer cells suggests that these pathways have an important role in the development of cancer. The framework developed by Thompson et al. could also be used to explore the evolution of gene regulation in other species and biological processes.
PMCID: PMC3687341  PMID: 23795289
regulatory evolution; duplication; divergence; carbon lifestyle; module; gene expression; S. cerevisiae; S. pombe
22.  Evolutionary Diversification of Plant Shikimate Kinase Gene Duplicates 
PLoS Genetics  2008;4(12):e1000292.
Shikimate kinase (SK) catalyzes the fifth reaction of the shikimate pathway, which directs carbon from the central metabolism pool to a broad range of secondary metabolites involved in plant development, growth, and stress responses. In this study, we demonstrate the role of plant SK gene duplicate evolution in the diversification of metabolic regulation and the acquisition of novel and physiologically essential function. Phylogenetic analysis of plant SK homologs resolves an orthologous cluster of plant SKs and two functionally distinct orthologous clusters. These previously undescribed genes, shikimate kinase-like 1 (SKL1) and -2 (SKL2), do not encode SK activity, are present in all major plant lineages, and apparently evolved under positive selection following SK gene duplication over 400 MYA. This is supported by functional assays using recombinant SK, SKL1, and SKL2 from Arabidopsis thaliana (At) and evolutionary analyses of the diversification of SK-catalytic and -substrate binding sites based on theoretical structure models. AtSKL1 mutants yield albino and novel variegated phenotypes, which indicate SKL1 is required for chloroplast biogenesis. Extant SKL2 sequences show a strong genetic signature of positive selection, which is enriched in a protein–protein interaction module not found in other SK homologs. We also report the first kinetic characterization of plant SKs and show that gene expression diversification among the AtSK inparalogs is correlated with developmental processes and stress responses. This study examines the functional diversification of ancient and recent plant SK gene duplicates and highlights the utility of SKs as scaffolds for functional innovation.
Author Summary
Gene duplicates provide an opportunity for functional innovation by buffering their ancestral function. Mutations or genomic rearrangements altering when and where the duplicates are expressed, or the structure/function of the products encoded by the genes, can provide a selective advantage to the organism and are subsequently retained. In this study, we demonstrate that duplicates of genes encoding the metabolic enzyme shikimate kinase (SK) in plants have evolved to acquire novel gene product functions and novel gene expression patterns. We introduce two ancient genes, SKL1 and SKL2, present in all higher plant groups that were previously overlooked due to their overall similarity to the ancestral SKs from which they originated. SKL1 mutants in the model plant Arabidopsis indicate this gene is required for chloroplast biogenesis. We show that SKL2 acquired a protein–protein interaction domain that is evolving under positive selection. We also show that SK duplicates that retained their ancestral enzyme function have acquired new expression patterns correlated with developmental processes and stress responses. These findings demonstrate that plant SK evolution has played an important role in both the acquisition of novel gene function as well as the diversification of metabolic regulation.
PMCID: PMC2593004  PMID: 19057671
23.  Chromatin regulators as capacitors of interspecies variations in gene expression 
Deletion of eight chromatin regulators and one transcription factor increases the variability in gene expression between two closely related yeast species, suggesting that large-scale regulators often buffer variations in gene expression. Similar analysis of metabolic enzymes indicates that, unlike regulators, these enzymes do not buffer gene expression variations.
Biological systems are often robust to mutations: their outputs (for example, gene expression profiles) remain stable in the face of mutations. This ensures that most individuals maintain the ‘correct’ behavior, which has been shaped by millions of years of evolution, despite a constant flux of mutations. How is robustness maintained, and in particular, which genes are required for it? Such questions have been studied for decades, yet there are no simple answers.
Previous studies suggested that particular proteins, termed genetic capacitors, buffer the effects of mutations, thereby promoting robustness. The classical example of such a protein is Hsp90, whose activity as a chaperone has been proposed to aid the correct folding of mutant proteins and thus buffer the structural effects of mutations. The hallmark of a genetic capacitor is that its deletion reveals phenotypic differences between individuals or species, which are hidden (that is, buffered) in its presence.
The example of Hsp90 may suggest that buffering is a property of only a few proteins that carry particular catalytic functions such as chaperones. However, theoretical studies have instead suggested that many proteins serve as genetic capacitors and that buffering is not necessarily a consequence of their direct activity but rather emerges naturally during evolution of complex biological systems.
Here, we show that eight chromatin regulators and one transcription factor buffer interspecies variations in gene expression. We deleted each of these nine regulators in two closely related yeast species and compared the extent of interspecies expression difference before and after each deletion. The results clearly show that deletion of these regulators tends to increase the extent of expression differences, indicating that they are normally buffering variations in gene expression, thus serving as genetic capacitors.
Similar analysis of 11 metabolic enzymes showed that, unlike the regulators, deletion of these enzymes does not increase expression divergence. Thus, buffering may be a characteristic feature of large-scale regulators. Further analysis of the buffered variations suggested that these are often caused by mutations that affect regulatory proteins, presumably those involved in sensing the environment, and that buffered variations are found primarily in genes with distinctive promoter features that are associated with highly dynamic and responsive regulation.
We believe, as others have previously proposed, that buffering emerged naturally during evolution of a complex system. More specifically, we propose that organisms accumulate many mutations that have no functional consequences through random drift, but that some of these mutations would in fact be functional if a certain regulatory protein were inactive. These mutations are often conditionally neutral because of their epistatic interactions with mutations in regulatory proteins. Such epistatic interactions may not reflect direct buffering activity (as proposed for Hsp90) but rather an inevitable consequence of the connectivity and complexity of biological systems. Note that the opposite case (mutations that are normally functional but become neutral when the regulatory protein is inactive) is also frequent, but such mutations are presumed to be efficiently purged by natural selection. As a result, deletion of such regulatory proteins unleashes the effects of many ‘hidden’ mutations and increases variations among individuals or species.
Gene expression varies widely between closely related species and strains, yet the genetic basis of most differences is still unknown. Several studies suggested that chromatin regulators have a key role in generating expression diversity, predicting a reduction in the interspecies differences on deletion of genes that influence chromatin structure or modifications. To examine this, we compared the genome-wide expression profiles of two closely related yeast species following the individual deletions of eight chromatin regulators and one transcription factor. In all cases, regulator deletions increased, rather than decreased, the expression differences between the species, revealing hidden genetic variability that was masked in the wild-type backgrounds. This effect was not observed for individual deletions of 11 enzymes involved in central metabolic pathways. The buffered variations were associated with trans differences, as revealed by allele-specific profiling of the interspecific hybrids. Our results support the idea that regulatory proteins serve as capacitors that buffer gene expression against hidden genetic variability.
PMCID: PMC3010112  PMID: 21119629
chromatin structure; evolution; gene expression; genetic capacitor
24.  Substitution as a Mechanism for Genetic Robustness: The Duplicated Deacetylases Hst1p and Sir2p in Saccharomyces cerevisiae  
PLoS Genetics  2007;3(8):e126.
How duplicate genes provide genetic robustness remains an unresolved question. We have examined the duplicated histone deacetylases Sir2p and Hst1p in Saccharomyces cerevisiae and find that these paralogs with non-overlapping functions can provide genetic robustness against null mutations through a substitution mechanism. Hst1p is an NAD+-dependent histone deacetylase that acts with Sum1p to repress a subset of midsporulation genes. However, hst1Δ mutants show much weaker derepression of target loci than sum1Δ mutants. We show that this modest derepression of target loci in hst1Δ strains occurs in part because Sir2p substitutes for Hst1p. Sir2p contributes to repression of the midsporulation genes only in the absence of Hst1p and is recruited to target promoters by a physical interaction with the Sum1 complex. Furthermore, when Sir2p associates with the Sum1 complex, the complex continues to repress in a promoter-specific manner and does not spread. Our results imply that after the duplication, SIR2 and HST1 subfunctionalized. The single SIR2/HST1 gene from Kluyveromyces lactis, a closely related species that diverged prior to the duplication, can suppress an hst1Δ mutation in S. cerevisiae as well as interact with Sir4p in S. cerevisiae. In addition, the existence of two distinct protein interaction domains for the Sir and Sum1 complexes was revealed through the analysis of a chimeric Sir2–Hst1 molecule. Therefore, the ability of Sir2p to substitute for Hst1p probably results from a retained but reduced affinity for the Sum1 complex that is a consequence of subfunctionalization via the duplication, degeneration, and complementation mechanism. These results suggest that the evolutionary path of duplicate gene preservation may be an important indicator for the ability of duplicated genes to contribute to genetic robustness.
Author Summary
Gene duplication is an important force in evolution, as it provides a source of new genetic material. However, the mechanisms by which duplicated genes are retained and diverge are understudied at the experimental level. We have examined a pair of duplicated histone deacetylases Hst1p and Sir2p from baker's yeast, which are important for distinct types of gene repression. In this study, we show that before the duplication the ancestral histone deacetylase had both Hst1p- and Sir2p-like functions, and after the duplication Sir2p and Hst1p subfunctionalized, giving rise to two distinct proteins with normally nonoverlapping functions. Despite having partitioned the ancestral functions after the duplication, Sir2p can substitute for Hst1p in its absence by interacting with the normal partner of Hst1p. This study suggests that the evolutionary path of duplicate gene preservation may be an important indicator for the ability of duplicated genes to substitute for one another and hence protect the organism against deleterious mutations.
PMCID: PMC1937012  PMID: 17676954
25.  Profiling of gene duplication patterns of sequenced teleost genomes: evidence for rapid lineage-specific genome expansion mediated by recent tandem duplications 
BMC Genomics  2012;13:246.
Gene duplication has had a major impact on genome evolution. Localized (or tandem) duplication resulting from unequal crossing over and whole genome duplication are believed to be the two dominant mechanisms contributing to vertebrate genome evolution. While much scrutiny has been directed toward discerning patterns indicative of whole-genome duplication events in teleost species, less attention has been paid to the continuous nature of gene duplications and their impact on the size, gene content, functional diversity, and overall architecture of teleost genomes.
Here, using a Markov clustering algorithm-directed approach, we catalogue and analyze patterns of gene duplication in the four model teleost species with chromosomal coordinates: zebrafish, medaka, stickleback, and Tetraodon. Our analyses based on set size, duplication type, synonymous substitution rate (Ks), and gene ontology emphasize shared and lineage-specific patterns of genome evolution via gene duplication. Most strikingly, our analyses highlight the extraordinary duplication and retention rate of recent duplicates in zebrafish and their likely role in the structural and functional expansion of the zebrafish genome. We find that the zebrafish genome is remarkable in its large number of duplicated genes, small duplicate set size, biased Ks distribution toward minimal mutational divergence, and proportion of tandem and intra-chromosomal duplicates when compared with the other teleost model genomes. The observed gene duplication patterns have played significant roles in shaping the architecture of teleost genomes and appear to have contributed to the recent functional diversification and divergence of important physiological processes in zebrafish.
We have analyzed gene duplication patterns and duplication types among the available teleost genomes and found that a large number of genes were tandemly and intrachromosomally duplicated, suggesting that they originated through independent and continuous duplication. This is particularly true for the zebrafish genome. Further analysis of the duplicated gene sets indicated that a significant portion of duplicated genes in the zebrafish genome arose from recent, lineage-specific duplication events. Most strikingly, a subset of duplicated genes is enriched among the recently duplicated genes involved in immune or sensory response pathways. Such findings demonstrate the significance of continuous gene duplication as well as that of whole genome duplication in the course of genome evolution.
PMCID: PMC3464592  PMID: 22702965
Gene duplication; Whole genome duplication; Teleost species; Tandem duplication
