Search tips
Search criteria

Results 1-25 (1192585)

Clipboard (0)

Related Articles

1.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are ‘hidden' due to buffering.Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function.The genetic interactions of duplicate genes evolve in an extremely asymmetric way and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes.Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serves as a primary source for new genes and new functions, and as such has broad implications on the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes and profiles of such interactions across the entire genome provide a new context in which to examine the properties of duplicate compensation.
In this study we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability for duplicate genes to buffer the deletion of a partner has three main consequences. First it agrees with previous work demonstrating that a high proportion of duplicate pairs are synthetic lethal, a classic indication of the ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, it reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions relating to common function from experimental detection. Third, this buffering of common interactions serves to reduce profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, we show in a comparison with local neighbors on a protein–protein interaction (PPI) network, that although dosage-mediated duplicates more often have higher similarity to each other than they do to their neighbors, the reverse is true for duplicates in general. That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than they do with each other, and that observation is consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than protein–protein interactions, for example, they do offer a more direct measurement of functional consequence and, thus, may give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and also tends to evolve at a slower rate (Figure 4B).
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicates pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encode proteins that form distinct heterodimeric complexes with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depend on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
2.  A Synergism between Adaptive Effects and Evolvability Drives Whole Genome Duplication to Fixation 
PLoS Computational Biology  2014;10(4):e1003547.
Whole genome duplication has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes. This pattern has been explained by a neutral process of subfunctionalization and more recently, dosage balance selection. However, much about the relationship between environmental change, WGD and adaptation remains unknown. Here, we study the duplicate retention pattern postWGD, by letting virtual cells adapt to environmental changes. The virtual cells have structured genomes that encode a regulatory network and simple metabolism. Populations are under selection for homeostasis and evolve by point mutations, small indels and WGD. After populations had initially adapted fully to fluctuating resource conditions re-adaptation to a broad range of novel environments was studied by tracking mutations in the line of descent. WGD was established in a minority (≈30%) of lineages, yet, these were significantly more successful at re-adaptation. Unexpectedly, WGD lineages conserved more seemingly redundant genes, yet had higher per gene mutation rates. While WGD duplicates of all functional classes were significantly over-retained compared to a model of neutral losses, duplicate retention was clearly biased towards highly connected TFs. Importantly, no subfunctionalization occurred in conserved pairs, strongly suggesting that dosage balance shaped retention. Meanwhile, singles diverged significantly. WGD, therefore, is a powerful mechanism to cope with environmental change, allowing conservation of a core machinery, while adapting the peripheral network to accommodate change.
Author Summary
The evolution of eukaryotes is characterized by drastic changes in their genome content. Genome expansions have often occurred by duplication of the entire genome. It is generally not know whether organisms gain any adaptive advantage from these mutations. However, they appear to become fixed in response to environmental change. Many interesting whole genome duplications happened long ago in eukaryotic evolutionary history during periods of turbulent genome and species evolution. Genomic data analysis alone cannot resolve the evolutionary mechanisms and consequences of whole genome duplication. Here, we modeled evolution with whole genome duplications in a Virtual Cell model. Simulating populations that undergo a range of different environmental changes we found that next to often increasing fitness directly, whole genome duplications made lineages more evolvable and hence more able to adapt to harsh new environments. Although most duplicates are deleted in subsequent evolution, genes with many interaction partners were retained preferentially, increasing regulatory complexity. Interestingly however, we found that innovation happened most likely in the more loosely connected and less essential genes.
PMCID: PMC3990473  PMID: 24743268
3.  The Roles of Whole-Genome and Small-Scale Duplications in the Functional Specialization of Saccharomyces cerevisiae Genes 
PLoS Genetics  2013;9(1):e1003176.
Researchers have long been enthralled with the idea that gene duplication can generate novel functions, crediting this process with great evolutionary importance. Empirical data shows that whole-genome duplications (WGDs) are more likely to be retained than small-scale duplications (SSDs), though their relative contribution to the functional fate of duplicates remains unexplored. Using the map of genetic interactions and the re-sequencing of 27 Saccharomyces cerevisiae genomes evolving for 2,200 generations we show that SSD-duplicates lead to neo-functionalization while WGD-duplicates partition ancestral functions. This conclusion is supported by: (a) SSD-duplicates establish more genetic interactions than singletons and WGD-duplicates; (b) SSD-duplicates copies share more interaction-partners than WGD-duplicates copies; (c) WGD-duplicates interaction partners are more functionally related than SSD-duplicates partners; (d) SSD-duplicates gene copies are more functionally divergent from one another, while keeping more overlapping functions, and diverge in their sub-cellular locations more than WGD-duplicates copies; and (e) SSD-duplicates complement their functions to a greater extent than WGD–duplicates. We propose a novel model that uncovers the complexity of evolution after gene duplication.
Author Summary
Gene duplication involves the doubling of a gene, originating an identical gene copy. Early evolutionary theory predicted that, as one gene copy is performing the ancestral function, the other gene copy, devoid from strong selection constraints, could evolve exploring alternative functions. Because of its potential to generate novel functions, hence biological complexity, gene duplication has been credited with enormous evolutionary importance. The way in which duplicated genes acquire novel functions remains the focus of intense research. Does the mechanism of duplication—duplication of small genome regions versus genome duplication—influence the fate of duplicates? Although it has been shown that the mechanism of duplication determines the persistence of genes in duplicate, a model describing the functional fates of duplicates generated by whole-genome or small-scale duplications remains largely obscure. Here we show that despite the large amount of genetic material originated by whole-genome duplication in the yeast Saccharomyces cerevisiae, these duplicates specialized in subsets of ancestral functions. Conversely, small-scale duplicates originated novel functions. We describe and test a model to explain the evolutionary dynamics of duplicates originated by different mechanisms. Our results shed light on the functional fates of duplicates and role of the duplication mechanism in generating functional diversity.
PMCID: PMC3536658  PMID: 23300483
4.  Protein Under-Wrapping Causes Dosage Sensitivity and Decreases Gene Duplicability 
PLoS Genetics  2008;4(1):e11.
A fundamental issue in molecular evolution is how to identify the evolutionary forces that determine the fate of duplicated genes. The dosage balance hypothesis has been invoked to explain gene duplication patterns at the genomic level under the premise that a dosage imbalance among protein-complex subunits or interacting partners is often deleterious. Here we examine this hypothesis by investigating the molecular basis of dosage sensitivity. We focus on the extent of protein wrapping, which indicates how strongly the structural integrity of a protein relies on its interactive context. From this perspective, we predict that the duplicates of a highly under-wrapped protein or protein subunit should (1) be more sensitive to dosage imbalance and be less likely to be retained and (2) be more likely to survive from a whole-genome duplication (WGD) than from a non-WGD because a WGD causes little or no dosage imbalance. Our under-wrapping analysis of more than 12,000 protein structures strongly supports these predictions and further reveals that the effect of dosage sensitivity on gene duplicability decreases with increasing organismal complexity.
Author Summary
A gene duplication provides an extra gene copy that can be free to accumulate mutations and gain a new function. Therefore, gene duplication plays a very important role in evolution. However, the presence of an additional gene copy can sometimes be deleterious because it can lead to an excessive dosage relative to those of its interacting partners. This dosage imbalance effect in turn influences the fate of duplicated genes in evolution. Our study gives the first description to our knowledge of the molecular/structural basis for the dosage imbalance effect. We study the relationships between gene family size and extent of protein under-wrapping, a molecular quantifier of the reliance of the protein on binding partnerships to maintain structural integrity, indicative of the extent of structure protection from disruptive hydration. Using more than 12,000 protein three-dimensional structures from six organisms that range from bacteria to human, we show an inverse relationship between extent of protein under-wrapping and family size. That is, a duplication is unlikely to be tolerated if the protein is highly under-wrapped (i.e., its structure requires substantial stabilizing interactions with other proteins). We also show that the effect of dosage imbalance is more apparent in unicellular organisms but is buffered to some extent in higher eukaryotes.
PMCID: PMC2211539  PMID: 18208334
5.  Dose–Sensitivity, Conserved Non-Coding Sequences, and Duplicate Gene Retention Through Multiple Tetraploidies in the Grasses 
Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes are genes preferentially retained as duplicate copies are engaged in dose sensitive protein–protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplications studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose–sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes – nicknamed bigfoot genes – and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.
PMCID: PMC3355796  PMID: 22645525
conserved non-coding sequence; polyploidy; fractionation; gene dosage; gene regulation
6.  Subfunctionalization of Duplicated Zebrafish pax6 Genes by cis-Regulatory Divergence 
PLoS Genetics  2008;4(2):e29.
Gene duplication is a major driver of evolutionary divergence. In most vertebrates a single PAX6 gene encodes a transcription factor required for eye, brain, olfactory system, and pancreas development. In zebrafish, following a postulated whole-genome duplication event in an ancestral teleost, duplicates pax6a and pax6b jointly fulfill these roles. Mapping of the homozygously viable eye mutant sunrise identified a homeodomain missense change in pax6b, leading to loss of target binding. The mild phenotype emphasizes role-sharing between the co-orthologues. Meticulous mapping of isolated BACs identified perturbed synteny relationships around the duplicates. This highlights the functional conservation of pax6 downstream (3′) control sequences, which in most vertebrates reside within the introns of a ubiquitously expressed neighbour gene, ELP4, whose pax6a-linked exons have been lost in zebrafish. Reporter transgenic studies in both mouse and zebrafish, combined with analysis of vertebrate sequence conservation, reveal loss and retention of specific cis-regulatory elements, correlating strongly with the diverged expression of co-orthologues, and providing clear evidence for evolution by subfunctionalization.
Author Summary
Studying the zebrafish small eyed mutant “sunrise,” we identified the causative amino acid change in the pax6b gene. This mutation leads to reduced DNA binding capacity. There are two closely related pax6 genes in zebrafish, pax6a and pax6b, which arose following a whole-genome duplication event about 350 million years ago; they map to different chromosomes. Each copy is now associated with a different subset of the neighbouring genes found associated with all vertebrate single-copy Pax6 genes. The expression patterns of pax6a and pax6b have diverged from each other since the duplication event. Some division of labour has emerged: pax6b is less widely expressed in the brain than pax6a, but only pax6b is found in the developing pancreas. Multiple evolutionarily conserved regulatory elements (enhancers) control these expression patterns, which can be recapitulated in transgenic animals. Some enhancer elements lie more than 150 kb outside the transcribed gene region, inside the introns of unrelated neighbouring genes. Such juxtaposition imposes the need to conserve gene order in many vertebrate species. Genome duplication releases the constraint for retaining all neighbouring genes. Thus, pax6a has lost the coding region of its immediate neighbours, although it retains most of the brain-specific regulatory domains. Duplication also allows some orderly changes in the exact role of each regulatory component, as long as the two duplicates can, together, ensure the complex expression pattern required for complete function. We demonstrate functional loss of a brain element downstream of pax6b, while an upstream pancreas enhancer element has evolved in a more complex way.
PMCID: PMC2242813  PMID: 18282108
7.  Molecular evolution of glycinin and β-conglycinin gene families in soybean (Glycine max L. Merr.) 
Heredity  2010;106(4):633-641.
There are two main classes of multi-subunit seed storage proteins, glycinin (11S) and β-conglycinin (7S), which account for approximately 70% of the total protein in a typical soybean seed. The subunits of these two protein classes are encoded by a number of genes. The genomic organization of these genes follows a complex evolutionary history. This research was designed to describe the origin and maintenance of genes in each of these gene families by analyzing the synteny, phylogenies, selection pressure and duplications of the genes in each gene family. The ancestral glycinin gene initially experienced a tandem duplication event; then, the genome underwent two subsequent rounds of whole-genome duplication, thereby resulting in duplication of the glycinin genes, and finally a tandem duplication likely gave rise to the Gy1 and Gy2 genes. The β-conglycinin genes primarily originated through the more recent whole-genome duplication and several tandem duplications. Purifying selection has had a key role in the maintenance of genes in both gene families. In addition, positive selection in the glycinin genes and a large deletion in a β-conglycinin exon contribute to the diversity of the duplicate genes. In summary, our results suggest that the duplicated genes in both gene families prefer to retain similar function throughout evolution and therefore may contribute to phenotypic robustness.
PMCID: PMC3183897  PMID: 20668431
β-Conglycinin; duplicate divergence; glycinin; molecular evolution; positive selection; soybean
8.  Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses 
Genome Biology  2009;10(6):R68.
Comparison of the sorghum, maize and rice genomes shows that gene duplication and functional innovation is common to evolution of most but not all genes in the C4 photosynthetic pathway
Sorghum is the first C4 plant and the second grass with a full genome sequence available. This makes it possible to perform a whole-genome-level exploration of C4 pathway evolution by comparing key photosynthetic enzyme genes in sorghum, maize (C4) and rice (C3), and to investigate a long-standing hypothesis that a reservoir of duplicated genes is a prerequisite for the evolution of C4 photosynthesis from a C3 progenitor.
We show that both whole-genome and individual gene duplication have contributed to the evolution of C4 photosynthesis. The C4 gene isoforms show differential duplicability, with some C4 genes being recruited from whole genome duplication duplicates by multiple modes of functional innovation. The sorghum and maize carbonic anhydrase genes display a novel mode of new gene formation, with recursive tandem duplication and gene fusion accompanied by adaptive evolution to produce C4 genes with one to three functional units. Other C4 enzymes in sorghum and maize also show evidence of adaptive evolution, though differing in level and mode. Intriguingly, a phosphoenolpyruvate carboxylase gene in the C3 plant rice has also been evolving rapidly and shows evidence of adaptive evolution, although lacking key mutations that are characteristic of C4 metabolism. We also found evidence that both gene redundancy and alternative splicing may have sheltered the evolution of new function.
Gene duplication followed by functional innovation is common to evolution of most but not all C4 genes. The apparently long time-lag between the availability of duplicates for recruitment into C4 and the appearance of C4 grasses, together with the heterogeneity of origins of C4 genes, suggests that there may have been a long transition process before the establishment of C4 photosynthesis.
PMCID: PMC2718502  PMID: 19549309
9.  Modification of Gene Duplicability during the Evolution of Protein Interaction Network 
PLoS Computational Biology  2011;7(4):e1002029.
Duplications of genes encoding highly connected and essential proteins are selected against in several species but not in human, where duplicated genes encode highly connected proteins. To understand when and how gene duplicability changed in evolution, we compare gene and network properties in four species (Escherichia coli, yeast, fly, and human) that are representative of the increase in evolutionary complexity, defined as progressive growth in the number of genes, cells, and cell types. We find that the origin and conservation of a gene significantly correlates with the properties of the encoded protein in the protein-protein interaction network. All four species preserve a core of singleton and central hubs that originated early in evolution, are highly conserved, and accomplish basic biological functions. Another group of hubs appeared in metazoans and duplicated in vertebrates, mostly through vertebrate-specific whole genome duplication. Such recent and duplicated hubs are frequently targets of microRNAs and show tissue-selective expression, suggesting that these are alternative mechanisms to control their dosage. Our study shows how networks modified during evolution and contributes to explaining the occurrence of somatic genetic diseases, such as cancer, in terms of network perturbations.
Author Summary
Gene copy number is often tightly controlled because it directly affects the gene dosage. In several species, including yeast, worm, and fly, genes that have a single gene copy (singleton genes) encode proteins with several connections in the protein interaction network (hubs) as well as essential proteins. Surprisingly, in mouse and human essential proteins and hubs are encoded by genes with more than one copy in the genome (duplicated genes). Here we show that these two distinct groups of hubs were acquired at different times during the evolution of protein interaction network and contribute in different ways to the cell life. Singleton hubs are ancestral genes that are conserved from prokaryotes to vertebrates and accomplish basic functions that deal with the cell survival. Duplicated hubs were acquired mostly within metazoans and duplicated through vertebrate-specific whole genome duplication. These genes are involved in processes that are crucial for the organization of multicellularity. Although duplicated, also recent hubs are subject to gene dosage control through microRNAs and tissue-selective expression. The clarification of how the protein interaction network evolves enables us to understand the adaptation to the progressive increase in complexity and to better characterize the genes involved in diseases such as cancer.
PMCID: PMC3072358  PMID: 21490719
10.  Divergent evolutionary fates of major photosynthetic gene networks following gene and whole genome duplications 
Plant Signaling & Behavior  2011;6(4):594-597.
Gene and genome duplication are recurring processes in flowering plants, and elucidating the mechanisms by which duplicated genes are lost or deployed is a key component of understanding plant evolution. Using gene ontologies (GO) or protein family (PFAM) domains, distinct patterns of duplicate retention and loss have been identified depending on gene functional properties and duplication mechanism, but little is known about how gene networks encoding interacting proteins (protein complexes or signaling cascades) evolve in response to duplication. We examined patterns of duplicate retention within four major gene networks involved in photosynthesis (the Calvin cycle, photosystem I, photosystem II and the light harvesting complex) across three species and four whole genome duplications, as well as small-scale duplications and showed that photosystem gene family evolution is governed largely by dosage sensitivity.1 In contrast, Calvin cycle gene families are not dosage-sensitive, but exhibit a greater capacity for functional differentiation. Here we review these findings, highlight how this study, by analyzing defined gene networks, is complementary to global studies using functional annotations such as GO and PFAM, and elaborate on one example of functional differentiation in the Calvin cycle gene family, transketolase.
PMCID: PMC3142401  PMID: 21494088
gene duplication; whole genome duplication; dosage sensitivity; balance hypothesis
11.  Genome Duplication and Gene Loss Affect the Evolution of Heat Shock Transcription Factor Genes in Legumes 
PLoS ONE  2014;9(7):e102825.
Whole-genome duplication events (polyploidy events) and gene loss events have played important roles in the evolution of legumes. Here we show that the vast majority of Hsf gene duplications resulted from whole genome duplication events rather than tandem duplication, and significant differences in gene retention exist between species. By searching for intraspecies gene colinearity (microsynteny) and dating the age distributions of duplicated genes, we found that genome duplications accounted for 42 of 46 Hsf-containing segments in Glycine max, while paired segments were rarely identified in Lotus japonicas, Medicago truncatula and Cajanus cajan. However, by comparing interspecies microsynteny, we determined that the great majority of Hsf-containing segments in Lotus japonicas, Medicago truncatula and Cajanus cajan show extensive conservation with the duplicated regions of Glycine max. These segments formed 17 groups of orthologous segments. These results suggest that these regions shared ancient genome duplication with Hsf genes in Glycine max, but more than half of the copies of these genes were lost. On the other hand, the Glycine max Hsf gene family retained approximately 75% and 84% of duplicated genes produced from the ancient genome duplication and recent Glycine-specific genome duplication, respectively. Continuous purifying selection has played a key role in the maintenance of Hsf genes in Glycine max. Expression analysis of the Hsf genes in Lotus japonicus revealed their putative involvement in multiple tissue-/developmental stages and responses to various abiotic stimuli. This study traces the evolution of Hsf genes in legume species and demonstrates that the rates of gene gain and loss are far from equilibrium in different species.
PMCID: PMC4105503  PMID: 25047803
12.  All duplicates are not equal: the difference between small-scale and genome duplication 
Genome Biology  2007;8(10):R209.
The comparison of pairs of gene duplications generated by small-scale duplications with those created by large-scale duplications shows that they differ in quantifiable ways. It is suggested that this is directly due to biases on the paths to gene retention rather than association with different functional categories.
Genes in populations are in constant flux, being gained through duplication and occasionally retained or, more frequently, lost from the genome. In this study we compare pairs of identifiable gene duplicates generated by small-scale (predominantly single-gene) duplications with those created by a large-scale gene duplication event (whole-genome duplication) in the yeast Saccharomyces cerevisiae.
We find a number of quantifiable differences between these data sets. Whole-genome duplicates tend to exhibit less profound phenotypic effects when deleted, are functionally less divergent, and are associated with a different set of functions than their small-scale duplicate counterparts. At first sight, either of these latter two features could provide a plausible mechanism by which the difference in dispensability might arise. However, we uncover no evidence suggesting that this is the case. We find that the difference in dispensability observed between the two duplicate types is limited to gene products found within protein complexes, and probably results from differences in the relative strength of the evolutionary pressures present following each type of duplication event.
Genes, and the proteins they specify, originating from small-scale and whole-genome duplication events differ in quantifiable ways. We infer that this is not due to their association with different functional categories; rather, it is a direct result of biases in gene retention.
PMCID: PMC2246283  PMID: 17916239
13.  Multiple Mechanisms Promote the Retained Expression of Gene Duplicates in the Tetraploid Frog Xenopus laevis 
PLoS Genetics  2006;2(4):e56.
Gene duplication provides a window of opportunity for biological variants to persist under the protection of a co-expressed copy with similar or redundant function. Duplication catalyzes innovation (neofunctionalization), subfunction degeneration (subfunctionalization), and genetic buffering (redundancy), and the genetic survival of each paralog is triggered by mechanisms that add, compromise, or do not alter protein function. We tested the applicability of three types of mechanisms for promoting the retained expression of duplicated genes in 290 expressed paralogs of the tetraploid clawed frog, Xenopus laevis. Tests were based on explicit expectations concerning the ka/ks ratio, and the number and location of nonsynonymous substitutions after duplication. Functional constraints on the majority of paralogs are not significantly different from a singleton ortholog. However, we recover strong support that some of them have an asymmetric rate of nonsynonymous substitution: 6% match predictions of the neofunctionalization hypothesis in that (1) each paralog accumulated nonsynonymous substitutions at a significantly different rate and (2) the one that evolves faster has a higher ka/ks ratio than the other paralog and than a singleton ortholog. Fewer paralogs (3%) exhibit a complementary pattern of substitution at the protein level that is predicted by enhancement or degradation of different functional domains, and the remaining 13% have a higher average ka/ks ratio in both paralogs that is consistent with altered functional constraints, diversifying selection, or activity-reducing mutations after duplication. We estimate that these paralogs have been retained since they originated by genome duplication between 21 and 41 million years ago. Multiple mechanisms operate to promote the retained expression of duplicates in the same genome, in genes in the same functional class, over the same period of time following duplication, and sometimes in the same pair of paralogs. None of these paralogs are superfluous; degradation or enhancement of different protein subfunctions and neofunctionalization are plausible hypotheses for the retained expression of some of them. Evolution of most X. laevis paralogs, however, is consistent with retained expression via mechanisms that do not radically alter functional constraints, such as selection to preserve post-duplication stoichiometry or temporal, quantitative, or spatial subfunctionalization.
Gene duplication plays a fundamental role in biological innovation but it is not clear how both copies of a duplicated gene manage to circumvent degradation by mutation if neither is unique. This study explores genetic mechanisms that could make each copy of a duplicate gene different, and therefore distinguishable and potentially preserved by natural selection. It is based on DNA sequences of the protein-coding region of 290 expressed duplicated genes in a frog, Xenopus laevis, that underwent complete duplication of its entire genome. Results provide evidence for multiple mechanisms acting within the same genome, within the same functional classes of genes, within the same period of time following duplication, and even on the same set of duplicated genes. Each copy of a duplicate gene may be subject to distinct evolutionary constraints, and this could be associated with degradation or enhancement of function. Functional constraints of most of these duplicates, however, are not substantially different from a single copy gene; their persistence in the first dozens of millions of years after duplication may more frequently be explained by mechanisms acting on their expression rather than their function.
PMCID: PMC1449897  PMID: 16683033
14.  Dynamics of Gene Duplication in the Genomes of Chlorophyll d-Producing Cyanobacteria: Implications for the Ecological Niche 
Gene duplication may be an important mechanism for the evolution of new functions and for the adaptive modulation of gene expression via dosage effects. Here, we analyzed the fate of gene duplicates for two strains of a novel group of cyanobacteria (genus Acaryochloris) that produces the far-red light absorbing chlorophyll d as its main photosynthetic pigment. The genomes of both strains contain an unusually high number of gene duplicates for bacteria. As has been observed for eukaryotic genomes, we find that the demography of gene duplicates can be well modeled by a birth–death process. Most duplicated Acaryochloris genes are of comparatively recent origin, are strain-specific, and tend to be located on different genetic elements. Analyses of selection on duplicates of different divergence classes suggest that a minority of paralogs exhibit near neutral evolutionary dynamics immediately following duplication but that most duplicate pairs (including those which have been retained for long periods) are under strong purifying selection against amino acid change. The likelihood of duplicate retention varied among gene functional classes, and the pronounced differences between strains in the pool of retained recent duplicates likely reflects differences in the nutrient status and other characteristics of their respective environments. We conclude that most duplicates are quickly purged from Acaryochloris genomes and that those which are retained likely make important contributions to organism ecology by conferring fitness benefits via gene dosage effects. The mechanism of enhanced duplication may involve homologous recombination between genetic elements mediated by paralogous copies of recA.
PMCID: PMC3156569  PMID: 21697100
Acaryochloris; recA; homologous recombination; plasmid
15.  Extensive Copy-Number Variation of Young Genes across Stickleback Populations 
PLoS Genetics  2014;10(12):e1004830.
Duplicate genes emerge as copy-number variations (CNVs) at the population level, and remain copy-number polymorphic until they are fixed or lost. The successful establishment of such structural polymorphisms in the genome plays an important role in evolution by promoting genetic diversity, complexity and innovation. To characterize the early evolutionary stages of duplicate genes and their potential adaptive benefits, we combine comparative genomics with population genomics analyses to evaluate the distribution and impact of CNVs across natural populations of an eco-genomic model, the three-spined stickleback. With whole genome sequences of 66 individuals from populations inhabiting three distinct habitats, we find that CNVs generally occur at low frequencies and are often only found in one of the 11 populations surveyed. A subset of CNVs, however, displays copy-number differentiation between populations, showing elevated within-population frequencies consistent with local adaptation. By comparing teleost genomes to identify lineage-specific genes and duplications in sticklebacks, we highlight rampant gene content differences among individuals in which over 30% of young duplicate genes are CNVs. These CNV genes are evolving rapidly at the molecular level and are enriched with functional categories associated with environmental interactions, depicting the dynamic early copy-number polymorphic stage of genes during population differentiation.
Author Summary
After a locus is duplicated in a genome, individuals from a population instantaneously differ in the number of copies of this locus producing a copy-number variation (CNV). Over time, the joint effects of selection and other evolutionary forces will act to either eliminate the extra genetic copy or retain it. Depending on this evolutionary interplay, young duplications, including newly duplicated genes, can persist for millions of years as CNVs. CNVs may especially be prevalent between populations that have colonized and adapted to disparate environments in which selective pressures differ. Using whole genome sequences from several populations of three-spined sticklebacks that inhabit different environments, we find that a third of young duplicated genes are CNVs. These young CNV genes are enriched with environmental response functions and evolving rapidly at the molecular level, making them promising candidates for a role in the rapid ecological adaptation to novel environments.
PMCID: PMC4256280  PMID: 25474574
16.  Following Tetraploidy in Maize, a Short Deletion Mechanism Removed Genes Preferentially from One of the Two Homeologs 
PLoS Biology  2010;8(6):e1000409.
Following genome duplication and selfish DNA expansion, maize used a heretofore unknown mechanism to shed redundant genes and functionless DNA with bias toward one of the parental genomes.
Previous work in Arabidopsis showed that after an ancient tetraploidy event, genes were preferentially removed from one of the two homeologs, a process known as fractionation. The mechanism of fractionation is unknown. We sought to determine whether such preferential, or biased, fractionation exists in maize and, if so, whether a specific mechanism could be implicated in this process. We studied the process of fractionation using two recently sequenced grass species: sorghum and maize. The maize lineage has experienced a tetraploidy since its divergence from sorghum approximately 12 million years ago, and fragments of many knocked-out genes retain enough sequence similarity to be easily identifiable. Using sorghum exons as the query sequence, we studied the fate of both orthologous genes in maize following the maize tetraploidy. We show that genes are predominantly lost, not relocated, and that single-gene loss by deletion is the rule. Based on comparisons with orthologous sorghum and rice genes, we also infer that the sequences present before the deletion events were flanked by short direct repeats, a signature of intra-chromosomal recombination. Evidence of this deletion mechanism is found 2.3 times more frequently on one of the maize homeologs, consistent with earlier observations of biased fractionation. The over-fractionated homeolog is also a greater than 3-fold better target for transposon removal, but does not have an observably higher synonymous base substitution rate, nor could we find differentially placed methylation domains. We conclude that fractionation is indeed biased in maize and that intra-chromosomal or possibly a similar illegitimate recombination is the primary mechanism by which fractionation occurs. The mechanism of intra-chromosomal recombination explains the observed bias in both gene and transposon loss in the maize lineage. The existence of fractionation bias demonstrates that the frequency of deletion is modulated. Among the evolutionary benefits of this deletion/fractionation mechanism is bulk DNA removal and the generation of novel combinations of regulatory sequences and coding regions.
Author Summary
All genomes can accumulate dispensable DNA in the form of duplications of individual genes or even partial or whole genome duplications. Genomes also can accumulate selfish DNA elements. Duplication events specifically are often followed by extensive gene loss. The maize genome is particularly extreme, having become tetraploid 10 million years ago and played host to massive transposon amplifications. We compared the genome of sorghum (which is homologous to the pre-tetraploid maize genome) with the two identifiable parental genomes retained in maize. The two maize genomes differ greatly: one of the parental genomes has lost 2.3 times more genes than the other, and the selfish DNA regions between genes were even more frequently lost, suggesting maize can distinguish between the parental genomes present in the original tetraploid. We show that genes are actually lost, not simply relocated. Deletions were rarely longer than a single gene, and occurred between repeated DNA sequences, suggesting mis-recombination as a mechanism of gene removal. We hypothesize an epigenetic mechanism of genome distinction to account for the selective loss. To the extent that the rate of base substitutions tracks time, we neither support nor refute claims of maize allotetraploidy. Finally, we explain why it makes sense that purifying selection in mammals does not operate at all like the gene and genome deletion program we describe here.
PMCID: PMC2893956  PMID: 20613864
17.  Striking Similarities in the Genomic Distribution of Tandemly Arrayed Genes in Arabidopsis and Rice 
PLoS Computational Biology  2006;2(9):e115.
In Arabidopsis, tandemly arrayed genes (TAGs) comprise >10% of the genes in the genome. These duplicated genes represent a rich template for genetic innovation, but little is known of the evolutionary forces governing their generation and maintenance. Here we compare the organization and evolution of TAGs between Arabidopsis and rice, two plant genomes that diverged ~150 million years ago. TAGs from the two genomes are similar in a number of respects, including the proportion of genes that are tandemly arrayed, the number of genes within an array, the number of tandem arrays, and the dearth of TAGs relative to single copy genes in centromeric regions. Analysis of recombination rates along rice chromosomes confirms a positive correlation between the occurrence of TAGs and recombination rate, as found in Arabidopsis. TAGs are also biased functionally relative to duplicated, nontandemly arrayed genes. In both genomes, TAGs are enriched for genes that encode membrane proteins and function in “abiotic and biotic stress” but underrepresented for genes involved in transcription and DNA or RNA binding functions. We speculate that these observations reflect an evolutionary trend in which successful tandem duplication involves genes either at the end of biochemical pathways or in flexible steps in a pathway, for which fluctuation in copy number is unlikely to affect downstream genes. Despite differences in the age distribution of tandem arrays, the striking similarities between rice and Arabidopsis indicate similar mechanisms of TAG generation and maintenance.
The nuclear genomes of higher plants vary tremendously in size and gene content. Much of this variation is attributable to gene duplication. To date, most studies of plant gene duplication have focused on whole genome duplication events, which duplicate all genes simultaneously. Another prominent process is single gene duplication, which often results in duplicated genes arranged in a tandem array. Here Rizzon, Ponger, and Gaut identify tandem arrays in rice and their genome organization between Arabidopsis and rice, two plant species that diverged ~150 million years ago. The two genomes contain a similar proportion of genes that are tandemly arrayed, with a similar number of genes within an array. Moreover, tandemly arrayed genes are most common in genomic regions of high recombination in both species. This organization appears to be a general feature of eukaryotic genomes, perhaps because duplication rates are higher in high recombination regions. Tandemly arrayed genes of rice and Arabidopsis also represent a biased gene set with regard to function. In contrast to genes duplicated through whole genome events, tandemly arrayed genes are enriched for genes that encode membrane proteins and genes that function in response to environmental stresses. Taken together, these observations suggest that tandemly arrayed genes represent a rich and relatively fluid source for plant adaptation.
PMCID: PMC1557586  PMID: 16948529
18.  Selection in the evolution of gene duplications 
Genome Biology  2002;3(2):research0008.1-research0008.9.
Gene duplications have a major role in the evolution of new biological functions. Theoretical studies often assume that a duplication per se is selectively neutral and that, following a duplication, one of the gene copies is freed from purifying (stabilizing) selection, which creates the potential for evolution of a new function.
In search of systematic evidence of accelerated evolution after duplication, we used data from 26 bacterial, six archaeal, and seven eukaryotic genomes to compare the mode and strength of selection acting on recently duplicated genes (paralogs) and on similarly diverged, unduplicated orthologous genes in different species. We find that the ratio of nonsynonymous to synonymous substitutions (Kn/Ks) in most paralogous pairs is <<1 and that paralogs typically evolve at similar rates, without significant asymmetry, indicating that both paralogs produced by a duplication are subject to purifying selection. This selection is, however, substantially weaker than the purifying selection affecting unduplicated orthologs that have diverged to the same extent as the analyzed paralogs. Most of the recently duplicated genes appear to be involved in various forms of environmental response; in particular, many of them encode membrane and secreted proteins.
The results of this analysis indicate that recently duplicated paralogs evolve faster than orthologs with the same level of divergence and similar functions, but apparently do not experience a phase of neutral evolution. We hypothesize that gene duplications that persist in an evolving lineage are beneficial from the time of their origin, due primarily to a protein dosage effect in response to variable environmental conditions; duplications are likely to give rise to new functions at a later phase of their evolution once a higher level of divergence is reached.
PMCID: PMC65685  PMID: 11864370
19.  Multiple Chromosomal Rearrangements Structured the Ancestral Vertebrate Hox-Bearing Protochromosomes 
PLoS Genetics  2009;5(1):e1000349.
While the proposal that large-scale genome expansions occurred early in vertebrate evolution is widely accepted, the exact mechanisms of the expansion—such as a single or multiple rounds of whole genome duplication, bloc chromosome duplications, large-scale individual gene duplications, or some combination of these—is unclear. Gene families with a single invertebrate member but four vertebrate members, such as the Hox clusters, provided early support for Ohno's hypothesis that two rounds of genome duplication (the 2R-model) occurred in the stem lineage of extant vertebrates. However, despite extensive study, the duplication history of the Hox clusters has remained unclear, calling into question its usefulness in resolving the role of large-scale gene or genome duplications in early vertebrates. Here, we present a phylogenetic analysis of the vertebrate Hox clusters and several linked genes (the Hox “paralogon”) and show that different phylogenies are obtained for Dlx and Col genes than for Hox and ErbB genes. We show that these results are robust to errors in phylogenetic inference and suggest that these competing phylogenies can be resolved if two chromosomal crossover events occurred in the ancestral vertebrate. These results resolve conflicting data on the order of Hox gene duplications and the role of genome duplication in vertebrate evolution and suggest that a period of genome reorganization occurred after genome duplications in early vertebrates.
Author Summary
The genome of vertebrates has expanded greatly in gene number since our last common ancestor with invertebrates. While it is clear that genome expansions occurred early in the evolution of vertebrates, the mechanisms of that expansion—such as a single or multiple rounds of whole genome duplication, chromosome duplications, large-scale individual gene duplications, or some combination of these—is unclear. Central to this debate has been the duplication history of Hox clusters, which ancestrally have four copies in vertebrates, but only a single copy in invertebrates. This 1∶4 ratio has been used to support the hypothesis that two rounds of whole-genome duplications occurred in early vertebrates (named the 2R model); however, the phylogeny of the Hox clusters and its linked genes (the Hox paralogon) seem to contradict this model. Here, we use phylogenetic methods to infer that two chromosomal rearrangements occurred shortly after the genome duplications within the Hox paralogon. These results resolve the apparent conflict between the duplication order of the Hox paralogon and the 2R model and suggest that vertebrates are pseudo-octoploids.
PMCID: PMC2622764  PMID: 19165336
20.  Generation of Tandem Direct Duplications by Reversed-Ends Transposition of Maize Ac Elements 
PLoS Genetics  2013;9(8):e1003691.
Tandem direct duplications are a common feature of the genomes of eukaryotes ranging from yeast to human, where they comprise a significant fraction of copy number variations. The prevailing model for the formation of tandem direct duplications is non-allelic homologous recombination (NAHR). Here we report the isolation of a series of duplications and reciprocal deletions isolated de novo from a maize allele containing two Class II Ac/Ds transposons. The duplication/deletion structures suggest that they were generated by alternative transposition reactions involving the termini of two nearby transposable elements. The deletion/duplication breakpoint junctions contain 8 bp target site duplications characteristic of Ac/Ds transposition events, confirming their formation directly by an alternative transposition mechanism. Tandem direct duplications and reciprocal deletions were generated at a relatively high frequency (∼0.5 to 1%) in the materials examined here in which transposons are positioned nearby each other in appropriate orientation; frequencies would likely be much lower in other genotypes. To test whether this mechanism may have contributed to maize genome evolution, we analyzed sequences flanking Ac/Ds and other hAT family transposons and identified three small tandem direct duplications with the structural features predicted by the alternative transposition mechanism. Together these results show that some class II transposons are capable of directly inducing tandem sequence duplications, and that this activity has contributed to the evolution of the maize genome.
Author Summary
The recent explosion of genome sequence data has greatly increased the need to understand the forces that shape eukaryotic genomes. A common feature of higher plant genomes is the presence of large numbers of duplications, often occurring as tandem repeats of thousands of base pairs. Despite the importance of gene duplications in evolution and disease, the precise mechanism(s) that generate tandem duplications are still unclear. In this study we identified nine new spontaneous duplications that arose flanking elements of the Ac transposon system. These duplications range in size from 8 kbp to >5,000 kbp, and all cases exhibit features characteristic of Ac transposition. Using similar criteria in a bioinformatics search, we identified three smaller duplications adjacent to other hAT family transposons in the maize B73 reference genome sequence. Our results show that transposable elements can directly generate tandem duplications via alternative transposition, and that this mechanism is responsible for at least some of the duplications present in the maize B73 genome. This work extends the significance of Barbara McClintock's discovery of transposable elements by demonstrating how they can act as agents of genome expansion.
PMCID: PMC3744419  PMID: 23966872
21.  Gene Expansion Shapes Genome Architecture in the Human Pathogen Lichtheimia corymbifera: An Evolutionary Genomics Analysis in the Ancient Terrestrial Mucorales (Mucoromycotina) 
PLoS Genetics  2014;10(8):e1004496.
Lichtheimia species are the second most important cause of mucormycosis in Europe. To provide broader insights into the molecular basis of the pathogenicity-associated traits of the basal Mucorales, we report the full genome sequence of L. corymbifera and compared it to the genome of Rhizopus oryzae, the most common cause of mucormycosis worldwide. The genome assembly encompasses 33.6 MB and 12,379 protein-coding genes. This study reveals four major differences of the L. corymbifera genome to R. oryzae: (i) the presence of an highly elevated number of gene duplications which are unlike R. oryzae not due to whole genome duplication (WGD), (ii) despite the relatively high incidence of introns, alternative splicing (AS) is not frequently observed for the generation of paralogs and in response to stress, (iii) the content of repetitive elements is strikingly low (<5%), (iv) L. corymbifera is typically haploid. Novel virulence factors were identified which may be involved in the regulation of the adaptation to iron-limitation, e.g. LCor01340.1 encoding a putative siderophore transporter and LCor00410.1 involved in the siderophore metabolism. Genes encoding the transcription factors LCor08192.1 and LCor01236.1, which are similar to GATA type regulators and to calcineurin regulated CRZ1, respectively, indicating an involvement of the calcineurin pathway in the adaption to iron limitation. Genes encoding MADS-box transcription factors are elevated up to 11 copies compared to the 1–4 copies usually found in other fungi. More findings are: (i) lower content of tRNAs, but unique codons in L. corymbifera, (ii) Over 25% of the proteins are apparently specific for L. corymbifera. (iii) L. corymbifera contains only 2/3 of the proteases (known to be essential virulence factors) in comparision to R. oryzae. On the other hand, the number of secreted proteases, however, is roughly twice as high as in R. oryzae.
Author Summary
Lichtheimia species are ubiquitous saprophytic fungi, which cause life-threating infections in humans. In contrast to the mucoralean pathogen R. oryzae, Lichtheimia species belong to the ancient mucoralean lineages. We determined the genome of L. corymbifera (formerly Mycocladus corymbifer ex Absidia corymbifera) and found high dissimilarities between L. corymbifera and other sequenced mucoralean fungi in terms of gene families and syntenies. A highly elevated number of gene duplications and expansions was observed, which comprises virulence-associated genes like proteases, transporters and iron uptake genes but also transcription factors and genes involved in signal transduction. In contrast to R. oryzae, we did not find evidence for a recent whole genome duplication in Lichtheimia. However, gene duplications create functionally diverse paralogs in L. corymbifera, which are differentially expressed in virulence-related compared to standard conditions. In addition, new potential virulence factors could be identified which may play a role in the regulation of the adaptation to iron-limitation. The L. corymbifera genome and the phylome will advance further research and better understanding of virulence mechanisms of these medically important pathogens at the level of genome architecture and evolution.
PMCID: PMC4133162  PMID: 25121733
22.  Interrogation of alternative splicing events in duplicated genes during evolution 
BMC Genomics  2011;12(Suppl 3):S16.
Gene duplication provides resources for developing novel genes and new functions while retaining the original functions. In addition, alternative splicing could increase the complexity of expression at the transcriptome and proteome level without increasing the number of gene copy in the genome. Duplication and alternative splicing are thought to work together to provide the diverse functions or expression patterns for eukaryotes. Previously, it was believed that duplication and alternative splicing were negatively correlated and probably interchangeable.
We look into the relationship between occurrence of alternative splicing and duplication at different time after duplication events. We found duplication and alternative splicing were indeed inversely correlated if only recently duplicated genes were considered, but they became positively correlated when we took those ancient duplications into account. Specifically, for slightly or moderately duplicated genes with gene families containing 2 - 7 paralogs, genes were more likely to evolve alternative splicing and had on average a greater number of alternative splicing isoforms after long-term evolution compared to singleton genes. On the other hand, those large gene families (contain at least 8 paralogs) had a lower proportion of alternative splicing, and fewer alternative splicing isoforms on average even when ancient duplicated genes were taken into consideration. We also found these duplicated genes having alternative splicing were under tighter evolutionary constraints compared to those having no alternative splicing, and had an enrichment of genes that participate in molecular transducer activities.
We studied the association between occurrences of alternative splicing and gene duplication. Our results implicate that there are key differences in functions and evolutionary constraints among singleton genes or duplicated genes with or without alternative splicing incidences. It implies that the gene duplication and alternative splicing may have different functional significance in the evolution of speciation diversity.
PMCID: PMC3333175  PMID: 22369477
23.  Conserved CO-FT regulons contribute to the photoperiod flowering control in soybean 
BMC Plant Biology  2014;14:9.
CO and FT orthologs, belonging to the BBX and PEBP family, respectively, have important and conserved roles in the photoperiod regulation of flowering time in plants. Soybean genome experienced at least three rounds of whole genome duplications (WGDs), which resulted in multiple copies of about 75% of genes. Subsequent subfunctionalization is the main fate for paralogous gene pairs during the evolutionary process.
The phylogenic relationships revealed that CO orthologs were widespread in the plant kingdom while FT orthologs were present only in angiosperms. Twenty-eight CO homologous genes and twenty-four FT homologous genes were gained in the soybean genome. Based on the collinear relationship, the soybean ancestral CO ortholog experienced three WGD events, but only two paralogous gene pairs (GmCOL1/2 and GmCOL5/13) survived in the modern soybean. The paralogous gene pairs, GmCOL1/2 or GmCOL5/13, showed similar expression patterns in pair but different between pairs, indicating that they functionally diverged. GmFTL1 to 7 were derived from the same ancestor prior to the whole genome triplication (WGT) event, and after the Legume WGD event the ancestor diverged into two branches, GmFTL3/5/7 and GmFTL1/2/4/6. GmFTL7 were truncated in the N-terminus compared to other FT-lineage genes, but ubiquitously expressed. Expressions of GmFTL1 to 6 were higher in leaves at the flowering stage than that at the seedling stage. GmFTL3 was expressed at the highest level in all tissues except roots at the seedling stage, and its circadian pattern was different from the other five ones. The transcript of GmFTL6 was highly accumulated in seedling roots. The circadian rhythms of GmCOL5/13 and GmFT1/2/4/5/6 were synchronized in a day, demonstrating the complicate relationship of CO-FT regulons in soybean leaves. Over-expression of GmCOL2 did not rescue the flowering phenotype of the Arabidopsis co mutant. However, ectopic expression of GmCOL5 did rescue the co mutant phenotype. All GmFTL1 to 6 showed flower-promoting activities in Arabidopsis.
After three recent rounds of whole genome duplications in the soybean, the paralogous genes of CO-FT regulons showed subfunctionalization through expression divergence. Then, only GmCOL5/13 kept flowering-promoting activities, while GmFTL1 to 6 contributed to flowering control. Additionally, GmCOL5/13 and GmFT1/2/3/4/5/6 showed similar circadian expression profiles. Therefore, our results suggested that GmCOL5/13 and GmFT1/2/3/4/5/6 formed the complicate CO-FT regulons in the photoperiod regulation of flowering time in soybean.
PMCID: PMC3890618  PMID: 24397545
CONSTANS; FLOWERING LOCUS T; Paralog; Ortholog; Functional divergence; Soybean
24.  Protein complexity, gene duplicability and gene dispensability in the yeast genome 
Gene  2006;387(1-2):109-117.
Using functional genomic and protein structural data we studied the effects of protein complexity (here defined as the number of subunit types in a protein) on gene dispensability and gene duplicability. We found that in terms of gene duplicability the major distinction in protein complexity is between hetero-complexes, each of which includes at least two different types of subunits (polypeptides), and homo-complexes, which include monomers and complexes that consist of only subunits of one polypeptide type. However, gene dispensability decreases only gradually as the number of subunit types in a protein complex increases. These observations suggest that the dosage balance hypothesis can explain gene duplicability of complex proteins well, but cannot completely explain the difference in dispensabilities between hetero-complex subunits. It is likely that knocking out a gene coding for a hetero-complex subunit would disrupt the function of the whole complex, so that the deletion effect on fitness would increase with protein complexity. We also found that multi-domain polypeptide genes are less dispensable but more duplicable than single domain polypeptide genes. Duplicate genes derived from the whole genome duplication event in yeast are more dispensable (except for ribosomal protein genes) than other duplicate genes. Further, we found that subunits of the same protein complex tend to have similar expression levels and similar effects of gene deletion on fitness. Finally, we estimated that in yeast the contribution of duplicate genes to genetic robustness against null mutation is ~ 9%, smaller than previously estimated. In yeast, protein complexity may serve as a better indicator of gene dispensability than do duplicate genes.
PMCID: PMC2707112  PMID: 17049186
Protein complex; Gene deletion; Fitness effect; Duplicate gene; Protein domain; Whole genome duplication
25.  Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana 
Biology  2013;2(4):1465-1487.
Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid lifecycle and short adult size. Its genome was the first among plants to be sequenced, becoming the reference in plant genomics. However, the Arabidopsis genome is characterized by an inherently complex organization, since it has undergone ancient whole genome duplications, followed by gene reduction, diploidization events and extended rearrangements, which relocated and split up the retained portions. These events, together with probable chromosome reductions, dramatically increased the genome complexity, limiting its role as a reference. The identification of paralogs and single copy genes within a highly duplicated genome is a prerequisite to understand its organization and evolution and to improve its exploitation in comparative genomics. This is still controversial, even in the widely studied Arabidopsis genome. This is also due to the lack of a reference bioinformatics pipeline that could exhaustively identify paralogs and singleton genes. We describe here a complete computational strategy to detect both duplicated and single copy genes in a genome, discussing all the methodological issues that may strongly affect the results, their quality and their reliability. This approach was used to analyze the organization of Arabidopsis nuclear protein coding genes, and besides classifying computationally defined paralogs into networks and single copy genes into different classes, it unraveled further intriguing aspects concerning the genome annotation and the gene relationships in this reference plant species. Since our results may be useful for comparative genomics and genome functional analyses, we organized a dedicated web interface to make them accessible to the scientific community
PMCID: PMC4009786  PMID: 24833233
gene duplication; paralog genes; single copy genes; singleton; gene network; Arabidopsis; genome annotation

Results 1-25 (1192585)