Related Articles
Gene and genome duplication are recurring processes in flowering plants, and elucidating the mechanisms by which duplicated genes are lost or deployed is a key component of understanding plant evolution. Studies using Gene Ontology (GO) terms or protein family (PFAM) domains have identified distinct patterns of duplicate retention and loss that depend on gene functional properties and duplication mechanism, but little is known about how gene networks encoding interacting proteins (protein complexes or signaling cascades) evolve in response to duplication. We examined patterns of duplicate retention within four major gene networks involved in photosynthesis (the Calvin cycle, photosystem I, photosystem II and the light-harvesting complex) across three species and four whole genome duplications, as well as small-scale duplications, and showed that photosystem gene family evolution is governed largely by dosage sensitivity [1]. In contrast, Calvin cycle gene families are not dosage-sensitive but exhibit a greater capacity for functional differentiation. Here we review these findings, highlight how this study, by analyzing defined gene networks, complements global studies that use functional annotations such as GO and PFAM, and elaborate on one example of functional differentiation in a Calvin cycle gene family, transketolase.
doi:10.4161/psb.6.4.15370
PMCID: PMC3142401
PMID: 21494088
gene duplication; whole genome duplication; dosage sensitivity; balance hypothesis
Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss is biased both in the classes of genes deleted and in the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein–protein interactions, such that deletion of either duplicate upsets the status quo of subunit concentrations and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose-sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes (nicknamed bigfoot genes) and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.
doi:10.3389/fpls.2011.00002
PMCID: PMC3355796
PMID: 22645525
conserved non-coding sequence; polyploidy; fractionation; gene dosage; gene regulation
Researchers have long been enthralled with the idea that gene duplication can generate novel functions, crediting this process with great evolutionary importance. Empirical data show that duplicates arising from whole-genome duplications (WGDs) are more likely to be retained than those arising from small-scale duplications (SSDs), though the relative contribution of each mechanism to the functional fate of duplicates remains unexplored. Using the map of genetic interactions and the re-sequencing of 27 Saccharomyces cerevisiae genomes evolving for 2,200 generations, we show that SSD-duplicates lead to neofunctionalization while WGD-duplicates partition ancestral functions. This conclusion is supported by the following: (a) SSD-duplicates establish more genetic interactions than singletons and WGD-duplicates; (b) SSD-duplicate copies share more interaction partners than WGD-duplicate copies; (c) the interaction partners of WGD-duplicates are more functionally related than those of SSD-duplicates; (d) SSD-duplicate copies are more functionally divergent from one another, while keeping more overlapping functions, and diverge in their subcellular locations more than WGD-duplicate copies; and (e) SSD-duplicates complement their functions to a greater extent than WGD-duplicates. We propose a novel model that uncovers the complexity of evolution after gene duplication.
Author Summary
Gene duplication involves the doubling of a gene, producing an identical gene copy. Early evolutionary theory predicted that, while one gene copy performs the ancestral function, the other copy, freed from strong selective constraints, could evolve to explore alternative functions. Because of its potential to generate novel functions, and hence biological complexity, gene duplication has been credited with enormous evolutionary importance. The way in which duplicated genes acquire novel functions remains the focus of intense research. Does the mechanism of duplication (duplication of small genome regions versus duplication of the whole genome) influence the fate of duplicates? Although it has been shown that the mechanism of duplication determines the persistence of genes in duplicate, a model describing the functional fates of duplicates generated by whole-genome or small-scale duplications remains largely obscure. Here we show that despite the large amount of genetic material originated by whole-genome duplication in the yeast Saccharomyces cerevisiae, these duplicates specialized in subsets of ancestral functions. Conversely, small-scale duplicates originated novel functions. We describe and test a model to explain the evolutionary dynamics of duplicates originated by different mechanisms. Our results shed light on the functional fates of duplicates and the role of the duplication mechanism in generating functional diversity.
doi:10.1371/journal.pgen.1003176
PMCID: PMC3536658
PMID: 23300483
There are two main classes of multi-subunit seed storage proteins, glycinin (11S) and β-conglycinin (7S), which account for approximately 70% of the total protein in a typical soybean seed. The subunits of these two protein classes are encoded by a number of genes, whose genomic organization reflects a complex evolutionary history. This research was designed to describe the origin and maintenance of genes in each of these gene families by analyzing the synteny, phylogenies, selection pressure and duplications of the genes in each gene family. The ancestral glycinin gene initially experienced a tandem duplication event; the genome then underwent two subsequent rounds of whole-genome duplication, thereby duplicating the glycinin genes, and finally a tandem duplication likely gave rise to the Gy1 and Gy2 genes. The β-conglycinin genes originated primarily through the more recent whole-genome duplication and several tandem duplications. Purifying selection has had a key role in the maintenance of genes in both gene families. In addition, positive selection in the glycinin genes and a large deletion in a β-conglycinin exon contribute to the diversity of the duplicate genes. In summary, our results suggest that the duplicated genes in both gene families tend to retain similar functions throughout evolution and therefore may contribute to phenotypic robustness.
doi:10.1038/hdy.2010.97
PMCID: PMC3183897
PMID: 20668431
β-Conglycinin; duplicate divergence; glycinin; molecular evolution; positive selection; soybean
The evolutionary origins of the multitude of duplicate genes in plant genomes are still incompletely understood. To gain an appreciation of the potential selective forces acting on these duplicates, we phylogenetically inferred the set of metabolic gene families from 10 flowering plant (angiosperm) genomes. We then compared the metabolic fluxes for these families, predicted using the Arabidopsis thaliana and Sorghum bicolor metabolic networks, with the families' duplication propensities. For duplicates produced both by small-scale duplications and by whole-genome duplications, there is a significant association between flux and the tendency to duplicate. Following this global analysis, we made a more fine-scale study of the selective constraints observed on plant sodium and phosphate transporters. We find that the different duplication mechanisms give rise to differing selective constraints. However, the exact nature of this pattern varies between gene families, and we argue that the duplication mechanism alone does not define a duplicated gene's subsequent evolutionary trajectory. Collectively, our results argue for the interplay of history, function, and selection in shaping duplicate gene evolution in plants.
doi:10.1093/gbe/evr115
PMCID: PMC3240960
PMID: 22056313
dosage selection; genome duplication; gene duplication
Gene and genome duplications provide a playground for various selective pressures and contribute significantly to genome complexity. It is assumed that the genomes of all major eukaryotic lineages possess duplicated regions that result from gene and genome duplication. There is evidence that the model plant Arabidopsis has been subjected to at least three whole-genome duplication events over the last 150–200 million years. As a result, many cellular processes are governed by redundantly acting gene families. Plants pass through two distinct life phases, with a haploid gametophytic generation alternating with a diploid sporophytic generation. This ontogenetic difference in gene copy number has important implications for the outcome of deleterious mutations, which are masked by the second gene copy in diploid systems but expressed in a dominant fashion in haploid organisms. As a consequence, maintaining the activity of duplicated genes might be particularly advantageous during the haploid gametophytic generation. Here, we describe the distinctive features associated with the alternation of generations and discuss how the activity profiles of duplicated genes might be modulated in a life-phase-dependent fashion.
doi:10.3389/fpls.2011.00094
PMCID: PMC3355729
PMID: 22645557
gene duplication; flowering plants; alternation of generations; haploid; diploid
While the proposal that large-scale genome expansions occurred early in vertebrate evolution is widely accepted, the exact mechanism of the expansion (a single round or multiple rounds of whole genome duplication, bloc chromosome duplications, large-scale individual gene duplications, or some combination of these) remains unclear. Gene families with a single invertebrate member but four vertebrate members, such as the Hox clusters, provided early support for Ohno's hypothesis that two rounds of genome duplication (the 2R-model) occurred in the stem lineage of extant vertebrates. However, despite extensive study, the duplication history of the Hox clusters has remained unclear, calling into question its usefulness in resolving the role of large-scale gene or genome duplications in early vertebrates. Here, we present a phylogenetic analysis of the vertebrate Hox clusters and several linked genes (the Hox "paralogon") and show that different phylogenies are obtained for Dlx and Col genes than for Hox and ErbB genes. We show that these results are robust to errors in phylogenetic inference and suggest that these competing phylogenies can be resolved if two chromosomal crossover events occurred in the ancestral vertebrate. These results resolve conflicting data on the order of Hox gene duplications and the role of genome duplication in vertebrate evolution, and suggest that a period of genome reorganization occurred after genome duplications in early vertebrates.
Author Summary
The genome of vertebrates has expanded greatly in gene number since our last common ancestor with invertebrates. While it is clear that genome expansions occurred early in the evolution of vertebrates, the mechanism of that expansion (a single round or multiple rounds of whole genome duplication, chromosome duplications, large-scale individual gene duplications, or some combination of these) remains unclear. Central to this debate has been the duplication history of the Hox clusters, which ancestrally have four copies in vertebrates but only a single copy in invertebrates. This 1∶4 ratio has been used to support the hypothesis that two rounds of whole-genome duplication occurred in early vertebrates (the 2R model); however, the phylogeny of the Hox clusters and their linked genes (the Hox paralogon) seems to contradict this model. Here, we use phylogenetic methods to infer that two chromosomal rearrangements occurred within the Hox paralogon shortly after the genome duplications. These results resolve the apparent conflict between the duplication order of the Hox paralogon and the 2R model and suggest that vertebrates are pseudo-octoploids.
doi:10.1371/journal.pgen.1000349
PMCID: PMC2622764
PMID: 19165336
Phylogenetic analysis of gene gain and loss during vertebrate evolution provides evidence for the importance of early gene or genome duplication events in the evolution of complex vertebrates.
Background
Gene duplication is assumed to have played a crucial role in the evolution of vertebrate organisms. Apart from a continuous mode of duplication, two or three whole genome duplication events have been proposed during the evolution of vertebrates, one or two at the dawn of vertebrate evolution, and an additional one in the fish lineage, not shared with land vertebrates. Here, we have studied gene gain and loss in seven different vertebrate genomes, spanning an evolutionary period of about 600 million years.
Results
We show that: first, the majority of duplicated genes in extant vertebrate genomes are ancient and were created at times that coincide with proposed whole genome duplication events; second, there are significant differences in gene retention for different functional categories of genes between fishes and land vertebrates; third, the retention of regulatory genes seems to be considerably biased by the mode of gene duplication (whole genome duplication events compared to smaller-scale events), in accordance with the so-called gene balance hypothesis; and fourth, ancient duplicates that have survived for many hundreds of millions of years can still be lost.
Conclusion
Based on phylogenetic analyses, we show that both the mode of duplication and the functional class the duplicated genes belong to have been of major importance for the evolution of the vertebrates. In particular, we provide evidence that massive gene duplication (probably as a consequence of entire genome duplications) at the dawn of vertebrate evolution might have been particularly important for the evolution of complex vertebrates.
doi:10.1186/gb-2006-7-5-r43
PMCID: PMC1779523
PMID: 16723033
Background
Gene duplication, a major evolutionary path to genomic innovation, can occur at the scale of an entire genome. One such "whole-genome duplication" (WGD) event among the Ascomycota fungi gave rise to genes with distinct biological properties compared to small-scale duplications.
Results
We studied the evolution of transcriptional interactions of whole-genome duplicates, to understand how they are wired into the yeast regulatory system. Our work combines network analysis and modeling of the large-scale structure of the interactions stemming from the WGD.
Conclusions
The results uncover the WGD as a major source for the evolution of a complex interconnected block of transcriptional pathways. The inheritance of interactions among WGD duplicates follows elementary "duplication subgraphs", relating ancestral interactions with newly formed ones. Duplication subgraphs are correlated with their neighbours and give rise to higher order circuits with two elementary properties: newly formed transcriptional pathways remain connected (paths are not broken), and are preferentially cross-connected with ancestral ones. The result is a coherent and connected "WGD-network", where duplication subgraphs are arranged in an astonishingly ordered configuration.
doi:10.1186/1752-0509-4-77
PMCID: PMC2900227
PMID: 20525287
A fundamental issue in molecular evolution is how to identify the evolutionary forces that determine the fate of duplicated genes. The dosage balance hypothesis has been invoked to explain gene duplication patterns at the genomic level under the premise that a dosage imbalance among protein-complex subunits or interacting partners is often deleterious. Here we examine this hypothesis by investigating the molecular basis of dosage sensitivity. We focus on the extent of protein wrapping, which indicates how strongly the structural integrity of a protein relies on its interactive context. From this perspective, we predict that the duplicates of a highly under-wrapped protein or protein subunit should (1) be more sensitive to dosage imbalance and be less likely to be retained and (2) be more likely to survive from a whole-genome duplication (WGD) than from a non-WGD because a WGD causes little or no dosage imbalance. Our under-wrapping analysis of more than 12,000 protein structures strongly supports these predictions and further reveals that the effect of dosage sensitivity on gene duplicability decreases with increasing organismal complexity.
Author Summary
A gene duplication provides an extra gene copy that can be free to accumulate mutations and gain a new function. Therefore, gene duplication plays a very important role in evolution. However, the presence of an additional gene copy can sometimes be deleterious because it can lead to an excessive dosage relative to those of its interacting partners. This dosage imbalance effect in turn influences the fate of duplicated genes in evolution. To our knowledge, our study gives the first description of the molecular and structural basis for the dosage imbalance effect. We study the relationships between gene family size and the extent of protein under-wrapping, a molecular quantifier of how much a protein relies on binding partnerships to maintain structural integrity, indicative of the extent to which its structure is protected from disruptive hydration. Using more than 12,000 three-dimensional protein structures from six organisms ranging from bacteria to human, we show an inverse relationship between the extent of protein under-wrapping and family size. That is, a duplication is unlikely to be tolerated if the protein is highly under-wrapped (i.e., its structure requires substantial stabilizing interactions with other proteins). We also show that the effect of dosage imbalance is more apparent in unicellular organisms but is buffered to some extent in higher eukaryotes.
doi:10.1371/journal.pgen.0040011
PMCID: PMC2211539
PMID: 18208334
The comparison of pairs of gene duplications generated by small-scale duplications with those created by large-scale duplications shows that they differ in quantifiable ways. It is suggested that this is directly due to biases on the paths to gene retention rather than association with different functional categories.
Background
Genes in populations are in constant flux, being gained through duplication and occasionally retained or, more frequently, lost from the genome. In this study we compare pairs of identifiable gene duplicates generated by small-scale (predominantly single-gene) duplications with those created by a large-scale gene duplication event (whole-genome duplication) in the yeast Saccharomyces cerevisiae.
Results
We find a number of quantifiable differences between these data sets. Whole-genome duplicates tend to exhibit less profound phenotypic effects when deleted, are functionally less divergent, and are associated with a different set of functions than their small-scale duplicate counterparts. At first sight, either of these latter two features could provide a plausible mechanism by which the difference in dispensability might arise. However, we uncover no evidence suggesting that this is the case. We find that the difference in dispensability observed between the two duplicate types is limited to gene products found within protein complexes, and probably results from differences in the relative strength of the evolutionary pressures present following each type of duplication event.
Conclusion
Genes, and the proteins they specify, originating from small-scale and whole-genome duplication events differ in quantifiable ways. We infer that this is not due to their association with different functional categories; rather, it is a direct result of biases in gene retention.
doi:10.1186/gb-2007-8-10-r209
PMCID: PMC2246283
PMID: 17916239
Comparison of the sorghum, maize and rice genomes shows that gene duplication and functional innovation are common to the evolution of most but not all genes in the C4 photosynthetic pathway
Background
Sorghum is the first C4 plant and the second grass with a full genome sequence available. This makes it possible to perform a whole-genome-level exploration of C4 pathway evolution by comparing key photosynthetic enzyme genes in sorghum, maize (C4) and rice (C3), and to investigate a long-standing hypothesis that a reservoir of duplicated genes is a prerequisite for the evolution of C4 photosynthesis from a C3 progenitor.
Results
We show that both whole-genome and individual gene duplication have contributed to the evolution of C4 photosynthesis. The C4 gene isoforms show differential duplicability, with some C4 genes being recruited from whole genome duplication duplicates by multiple modes of functional innovation. The sorghum and maize carbonic anhydrase genes display a novel mode of new gene formation, with recursive tandem duplication and gene fusion accompanied by adaptive evolution to produce C4 genes with one to three functional units. Other C4 enzymes in sorghum and maize also show evidence of adaptive evolution, though differing in level and mode. Intriguingly, a phosphoenolpyruvate carboxylase gene in the C3 plant rice has also been evolving rapidly and shows evidence of adaptive evolution, although lacking key mutations that are characteristic of C4 metabolism. We also found evidence that both gene redundancy and alternative splicing may have sheltered the evolution of new function.
Conclusions
Gene duplication followed by functional innovation is common to the evolution of most but not all C4 genes. The apparently long time lag between the availability of duplicates for recruitment into C4 and the appearance of C4 grasses, together with the heterogeneous origins of C4 genes, suggests that there may have been a long transition process before the establishment of C4 photosynthesis.
doi:10.1186/gb-2009-10-6-r68
PMCID: PMC2718502
PMID: 19549309
After whole-genome duplication (WGD), deletions return most loci to single copy. However, duplicate loci may survive through selection for increased dosage. Here, we show how the increased copy number of some glycolytic genes resulting from the WGD could have conferred an almost immediate selective advantage on an ancestor of Saccharomyces cerevisiae, providing a rationale for the success of the WGD. We propose that the loss of other redundant genes throughout the genome resulted in incremental dosage increases for the surviving duplicated glycolytic genes. This increase gave post-WGD yeasts a growth advantage through rapid glucose fermentation, one of this lineage's many adaptations to glucose-rich environments. Our hypothesis is supported by data from enzyme kinetics and comparative genomics. Because changes in gene dosage follow directly from post-WGD deletions, dosage selection can confer an almost instantaneous benefit after WGD, unlike neofunctionalization or subfunctionalization, which require specific mutations. We also show theoretically that increased fermentative capacity is of greatest advantage when glucose resources are both large and dense, an observation potentially related to the appearance of angiosperms around the time of the WGD.
doi:10.1038/msb4100170
PMCID: PMC1943425
PMID: 17667951
evolution; genome duplication; metabolism
The loss of functional redundancy is the key process in the evolution of duplicated genes. Here we systematically assess the extent of functional redundancy among a large set of duplicated genes in Saccharomyces cerevisiae. We quantify growth rate in rich medium for a large number of S. cerevisiae strains that carry single and double deletions of duplicated and singleton genes. We demonstrate that duplicated genes can maintain substantial redundancy for extensive periods of time following duplication (∼100 million years). We find high levels of redundancy among genes duplicated both via the whole genome duplication and via smaller scale duplications. Further, we see no evidence that two duplicated genes together contribute to fitness in rich medium substantially beyond that of their ancestral progenitor gene. We argue that duplicate genes do not often evolve to behave like singleton genes even after very long periods of time.
Author Summary
Gene duplication is the primary source of new genes. To persist, duplicated genes must lose some of the original redundancy either by partitioning the ancestral function (subfunctionalization) or by gaining new non-redundant functions (neofunctionalization). The extent to which these processes shape the evolution of duplicated genes over long periods of time is unknown. We investigate these questions experimentally by building strains carrying single and double gene deletions of duplicated genes and measuring their growth rates in rich medium. Using these data, we determine that many duplicated genes are functionally redundant to a substantial degree. We also investigate how often duplicated genes gain new functionality. We demonstrate that the fitness effects of double deletions of duplicate genes are indistinguishable from our best estimate of the fitness effects of deletions of their ancestral singleton genes. We therefore argue that many duplicate genes do not gain substantial new functionality, at least in rich medium. Our results suggest that subfunctionalization does not generally proceed to completion, even after very long periods of time, and that neofunctionalization is either rare or of little consequence, at least under some growth conditions.
doi:10.1371/journal.pgen.1000113
PMCID: PMC2440806
PMID: 18604285
Duplications of genes encoding highly connected and essential proteins are selected against in several species but not in human, where duplicated genes encode highly connected proteins. To understand when and how gene duplicability changed in evolution, we compare gene and network properties in four species (Escherichia coli, yeast, fly, and human) that are representative of the increase in evolutionary complexity, defined as progressive growth in the number of genes, cells, and cell types. We find that the origin and conservation of a gene significantly correlate with the properties of the encoded protein in the protein-protein interaction network. All four species preserve a core of singleton and central hubs that originated early in evolution, are highly conserved, and accomplish basic biological functions. Another group of hubs appeared in metazoans and duplicated in vertebrates, mostly through the vertebrate-specific whole genome duplication. Such recent, duplicated hubs are frequently targets of microRNAs and show tissue-selective expression, suggesting that these are alternative mechanisms to control their dosage. Our study shows how networks were modified during evolution and helps to explain the occurrence of somatic genetic diseases, such as cancer, in terms of network perturbations.
Author Summary
Gene copy number is often tightly controlled because it directly affects gene dosage. In several species, including yeast, worm, and fly, genes that have a single copy (singleton genes) encode proteins with several connections in the protein interaction network (hubs) as well as essential proteins. Surprisingly, in mouse and human, essential proteins and hubs are encoded by genes with more than one copy in the genome (duplicated genes). Here we show that these two distinct groups of hubs were acquired at different times during the evolution of the protein interaction network and contribute in different ways to cell life. Singleton hubs are ancestral genes that are conserved from prokaryotes to vertebrates and accomplish basic functions related to cell survival. Duplicated hubs were acquired mostly within metazoans and duplicated through the vertebrate-specific whole genome duplication. These genes are involved in processes that are crucial for the organization of multicellularity. Although duplicated, recent hubs are also subject to gene dosage control through microRNAs and tissue-selective expression. Clarifying how the protein interaction network evolves enables us to understand the adaptation to the progressive increase in complexity and to better characterize the genes involved in diseases such as cancer.
doi:10.1371/journal.pcbi.1002029
PMCID: PMC3072358
PMID: 21490719
Polyploidy, or whole-genome duplication (WGD), is an important genomic feature for all eukaryotes, especially many plants and some animals. The common occurrence of polyploidy suggests an evolutionary advantage of having multiple sets of genetic material for adaptive evolution. However, increased gene and genome dosages in autopolyploids (duplications of a single genome) and allopolyploids (combinations of two or more divergent genomes) often cause genome instabilities, chromosome imbalances, regulatory incompatibilities, and reproductive failures. Therefore, new allopolyploids must establish a compatible relationship between alien cytoplasm and nuclei and between two divergent genomes, leading to rapid changes in genome structure, gene expression, and developmental traits such as fertility, inbreeding, apomixis, flowering time, and hybrid vigor. Although the underlying mechanisms for these changes are poorly understood, some themes are emerging. There is compelling evidence that changes in DNA sequence, cis- and trans-acting effects, chromatin modifications, RNA-mediated pathways, and regulatory networks modulate differential expression of homoeologous genes and phenotypic variation that may facilitate adaptive evolution in polyploid plants and domestication in crops.
doi:10.1146/annurev.arplant.58.032806.103835
PMCID: PMC1949485
PMID: 17280525
polyploidy; nonadditive gene expression; epigenetic regulation; RNA interference; evolution
The Sox gene family is found in a broad range of animal taxa and encodes important gene regulatory proteins involved in a variety of developmental processes. We have obtained clones representing the HMG boxes of twelve Sox genes from grass carp (Ctenopharyngodon idella), one of the four major domestic carps in China. The cloned Sox genes belong to groups B1, B2 and C. Our analyses show that whereas the human genome contains a single copy each of Sox4, Sox11 and Sox14, each of these genes has two co-orthologs in grass carp, and that the duplication of Sox4 and Sox11 occurred before the divergence of grass carp and zebrafish, which supports the "fish-specific whole-genome duplication" theory. A molecular clock estimate of the origin of grass carp, using Sox1, Sox3 and Sox11 genes as markers, indicates that grass carp (subfamily Leuciscinae) and zebrafish (subfamily Danioninae) diverged approximately 60 million years ago. The potential uses of Sox genes as markers for revealing the evolutionary history of grass carp are discussed.
doi:10.1186/1297-9686-38-6-673
PMCID: PMC2689270
PMID: 17129566
grass carp (Ctenopharyngodon idella); Sox; genome duplication; co-ortholog; molecular clock
Background
Plant genomes contain a high proportion of duplicated genes as a result of numerous whole, segmental and local duplications. These duplications lead to the formation of gene families, which are the usual material for many evolutionary studies. However, all characterized genomes include single-copy (unique) genes that have not received much attention. Unlike gene duplication, gene loss is not an unspecific mechanism but is rather influenced by functional selection. In this context, we have established and used stringent criteria to identify suitable sets of unique genes present in plant proteomes. Comparisons of unique genes in the green phylum were used to characterize the gene and protein features exhibited by both conserved and species-specific unique genes.
Results
We identified the unique genes in both the A. thaliana and O. sativa genomes and classified them according to the number of homologs in the other species: none (U{1:0}), one (U{1:1}) or several (U{1:m}). Regardless of the species, the genes in all of these groups share some conserved characteristics, such as a small average protein size and an atypical intron number. To understand the origin and function of unique genes, we further characterized the U{1:1} gene pairs. Sequence convergence as the origin of U{1:1} pairs was ruled out because intron positions were frequently conserved. Furthermore, an orthology relationship between the two members of each U{1:1} pair was strongly supported by highly conserved protein sizes and transcription levels. In the promoters of the conserved unique genes, the numbers of TATA and TELO boxes differed specifically from their genome-wide means. Many unique genes have remained single-copy throughout evolution from the green alga Ostreococcus lucimarinus to higher plants. Plant unique genes may also have homologs in bacteria, and we showed a link between the plastid targeting of proteins encoded by nuclear unique genes and their homology with a bacterial protein.
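The three-way classification above can be sketched as a simple mapping from homolog counts to groups. This is a minimal illustration assuming that single-copy status in species 1 and homolog detection in species 2 (for example, by sequence-similarity thresholds) have already been decided upstream; the function name and gene IDs are hypothetical.

```python
from typing import Dict, List


def classify_unique_genes(homolog_counts: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign unique genes of species 1 to U{1:0}, U{1:1} or U{1:m}.

    homolog_counts maps each gene already judged single-copy in species 1
    to the number of homologs detected in species 2.
    """
    groups: Dict[str, List[str]] = {"U{1:0}": [], "U{1:1}": [], "U{1:m}": []}
    for gene, n in sorted(homolog_counts.items()):
        if n < 0:
            raise ValueError(f"negative homolog count for {gene}")
        if n == 0:
            groups["U{1:0}"].append(gene)  # no homolog in species 2
        elif n == 1:
            groups["U{1:1}"].append(gene)  # one homolog: candidate ortholog pair
        else:
            groups["U{1:m}"].append(gene)  # several homologs in species 2
    return groups


# Hypothetical gene IDs and counts, for illustration only.
counts = {"gene_a": 0, "gene_b": 1, "gene_c": 3, "gene_d": 2}
print(classify_unique_genes(counts))
```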
Conclusion
Many of the A. thaliana and O. sativa unique genes are conserved in plant lineages whose common ancestor diverged at least 725 million years ago (MYA). Half of these genes are also present in other eukaryotic and/or prokaryotic species. Thus, our results indicate that (i) strong negative selection has kept a number of genes single-copy in genomes throughout evolution, (ii) most unique genes have a low divergence rate, (iii) they share some features with housekeeping genes, yet most of them lack functional annotation, and (iv) they may have an ancient origin, possibly involving gene transfer from ancestral chloroplasts or bacteria to the plant nucleus.
doi:10.1186/1471-2148-8-280
PMCID: PMC2576244
PMID: 18847470
Whole-genome duplications (WGDs) have occurred repeatedly in the vertebrate lineage, but their evolutionary significance for phenotypic evolution remains elusive. Here, we have investigated the impact of the fish-specific genome duplication (FSGD) on the evolution of pigmentation pathways in teleost fishes. Pigmentation and color patterning are among the most diverse traits in teleosts, and their pigmentary system is the most complex of all vertebrate groups.
Using a comparative genomic approach including phylogenetic and synteny analyses, we reconstructed the evolution of 128 vertebrate pigmentation genes in five teleost genomes following the FSGD. We show that pigmentation genes have been preferentially retained in duplicate after the FSGD, so that teleosts have 30% more pigmentation genes than tetrapods. This is significantly higher than genome-wide estimates of FSGD gene duplicate retention in teleosts. Large parts of the melanocyte regulatory network have been retained in two copies after the FSGD. Duplicated pigmentation genes follow general evolutionary patterns such as the preservation of protein complex stoichiometries and the overrepresentation of developmental genes among retained duplicates. These results suggest that the FSGD has made an important contribution to the evolution of teleost-specific features of pigmentation, which include novel pigment cell types or the division of existing pigment cell types into distinct subtypes. Furthermore, we observed species-specific differences in duplicate retention and evolution that might contribute to pigmentary diversity among teleosts.
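One simple way to ask whether a gene set retains duplicates more often than the genome-wide rate is a one-sided binomial test. The sketch below is not the test used in the study (which is not described here); it assumes each gene is an independent trial with a hypothetical genome-wide retention probability p, an assumption that ignores variation among gene families and species. The counts in the example are made up.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided P(X >= k) for X ~ Binomial(n, p).

    k: genes retained in duplicate in the focal set (e.g. pigmentation genes).
    n: genes examined in that set.
    p: assumed genome-wide duplicate-retention probability.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Sum the binomial probabilities of observing k or more retained duplicates.
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts, for illustration only (not the study's data).
print(binomial_upper_tail(k=40, n=128, p=0.2))
```

A small tail probability would indicate that the observed retention is unlikely under the genome-wide rate; a permutation test over real gene families would relax the independence assumption.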
Our study therefore strongly supports the hypothesis that WGDs have promoted the increase of complexity and diversity during vertebrate phenotypic evolution.
doi:10.1093/gbe/evp050
PMCID: PMC2839281
PMID: 20333216
genome duplication; fish; conserved synteny; pigment cell; melanocyte; functional module
Background
Gene duplication provides resources for developing novel genes and new functions while retaining the original functions. In addition, alternative splicing can increase the complexity of expression at the transcriptome and proteome levels without increasing the number of gene copies in the genome. Duplication and alternative splicing are thought to work together to provide diverse functions or expression patterns in eukaryotes. Previously, it was believed that duplication and alternative splicing were negatively correlated and probably interchangeable.
Results
We examined the relationship between the occurrence of alternative splicing and duplication at different times after duplication events. We found that duplication and alternative splicing were indeed inversely correlated if only recently duplicated genes were considered, but they became positively correlated when ancient duplications were taken into account. Specifically, for slightly or moderately duplicated genes, in gene families containing 2 to 7 paralogs, genes were more likely to evolve alternative splicing and on average had more alternative splicing isoforms after long-term evolution than singleton genes. In contrast, large gene families (containing at least 8 paralogs) had a lower proportion of alternative splicing and fewer alternative splicing isoforms on average, even when ancient duplicated genes were taken into consideration. We also found that duplicated genes with alternative splicing were under tighter evolutionary constraints than those without alternative splicing, and were enriched in genes that participate in molecular transducer activities.
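The comparison across family-size classes can be sketched as a grouped proportion: bin each gene by the size of its family (singleton, 2 to 7, 8 or more) and compute the fraction with more than one isoform. This is an illustrative sketch with hypothetical names and records; it assumes family sizes and isoform counts are already known and, unlike the study, ignores duplication age.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def size_class(family_size: int) -> str:
    """Bin a gene by family size, following the 1 / 2-7 / >=8 split above."""
    if family_size < 1:
        raise ValueError("family size must be at least 1")
    if family_size == 1:
        return "singleton"
    if family_size <= 7:
        return "2-7"
    return ">=8"


def as_fraction_by_class(genes: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Fraction of genes with alternative splicing (more than one isoform) per class.

    genes: (family_size, n_isoforms) for each gene.
    """
    total: Dict[str, int] = defaultdict(int)
    with_as: Dict[str, int] = defaultdict(int)
    for family_size, n_isoforms in genes:
        cls = size_class(family_size)
        total[cls] += 1
        if n_isoforms > 1:
            with_as[cls] += 1
    return {cls: with_as[cls] / total[cls] for cls in total}


# Hypothetical (family size, isoform count) records, for illustration only.
records = [(1, 1), (1, 2), (3, 2), (5, 3), (4, 1), (10, 1), (12, 2), (9, 1)]
print(as_fraction_by_class(records))
```

Stratifying the same computation by a duplication-age proxy such as Ks would be needed to reproduce the recent-versus-ancient contrast described above.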
Conclusions
We studied the association between the occurrence of alternative splicing and gene duplication. Our results indicate that there are key differences in function and evolutionary constraint among singleton genes and duplicated genes with or without alternative splicing. This suggests that gene duplication and alternative splicing may have different functional significance in the evolution of species diversity.
doi:10.1186/1471-2164-12-S3-S16
PMCID: PMC3333175
PMID: 22369477
Background and Aims
To assess the number and phylogenetic distribution of large-scale genome duplications in the ancestry of Actinidia, publicly available expressed sequence tags (ESTs) for members of the Actinidiaceae and related Ericales, including tea (Camellia sinensis), were analysed.
Methods
Synonymous divergences (Ks) were calculated for all duplications within gene families and examined for evidence of large-scale duplication events. Phylogenetic comparisons for a selection of orthologues among several related species in Ericales and two outgroups permitted placement of duplication events in relation to lineage divergences. Gene ontology (GO) categories were analysed for each whole-genome duplication (WGD) and the whole transcriptome.
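Large-scale duplications typically appear as peaks in the distribution of Ks values among paralog pairs. The sketch below bins Ks values and reports bins that are higher than both neighbours; it is a crude heuristic, not the authors' method (mixture-model fitting is more common), and it assumes Ks values were already computed upstream. Values at or above an assumed saturation cutoff (`max_ks`) are dropped; the bin width, cutoff and example values are hypothetical.

```python
from typing import List, Sequence


def ks_histogram(ks_values: Sequence[float], bin_width: float = 0.1,
                 max_ks: float = 2.0) -> List[int]:
    """Count duplicate pairs per Ks bin, dropping saturated values (Ks >= max_ks)."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0.0 <= ks < max_ks:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return counts


def peak_bins(counts: Sequence[int]) -> List[int]:
    """Indices of bins strictly higher than their neighbours (a crude peak call)."""
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i + 1 < len(counts) else -1
        if c > left and c > right:
            peaks.append(i)
    return peaks


# Hypothetical Ks values with two clusters, for illustration only.
ks = [0.12, 0.14, 0.16, 0.18, 0.25, 0.85, 0.92, 0.94, 0.96, 1.05, 2.5]
counts = ks_histogram(ks)
print([round((i + 0.5) * 0.1, 2) for i in peak_bins(counts)])  # bin midpoints
```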
Key Results
Evidence for three ancient WGDs in Actinidia was found. Analyses of paleologue GO categories indicated a different pattern of retained genes for each genome duplication, but a pattern consistent with the dosage-balance hypothesis among all retained paleologues.
Conclusions
This study provides evidence for one independent WGD in the ancestry of Actinidia (Ad-α), a WGD shared by Actinidia and Camellia (Ad-β), and the well-established At-γ WGD that occurred prior to the divergence of all taxa examined. More ESTs in other taxa are needed to elucidate which groups in Ericales share the Ad-β or Ad-α duplications and their impact on diversification.
doi:10.1093/aob/mcq129
PMCID: PMC2924827
PMID: 20576738
Paleopolyploidy; Actinidiaceae; Ericales; Actinidia; Camellia; kiwi; genome duplication; dosage balance
Background
Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods.
Results
We address these problems by using the gene orders of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and of two others (poplar and cucumber) that descend from independent WGDs, to infer the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than to the fabids, where it has traditionally been assigned.
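Adjacency-based reconstruction methods generally score candidate ancestral orders by how many gene adjacencies they share with the extant genomes. The sketch below only counts adjacencies conserved between two genomes; it is not the OMG or PATHGROUPS implementation. It assumes orthologous genes already carry a common ID (the role multiple-ortholog extraction plays above) and ignores gene orientation. All names and gene IDs are hypothetical.

```python
from typing import FrozenSet, List, Set

Genome = List[List[str]]  # chromosomes, each an ordered list of ortholog IDs


def adjacencies(genome: Genome) -> Set[FrozenSet[str]]:
    """Unordered pairs of genes that sit next to each other on a chromosome."""
    adj: Set[FrozenSet[str]] = set()
    for chromosome in genome:
        for left, right in zip(chromosome, chromosome[1:]):
            adj.add(frozenset((left, right)))
    return adj


def shared_adjacencies(a: Genome, b: Genome) -> Set[FrozenSet[str]]:
    """Adjacencies conserved between two genomes (gene orientation ignored)."""
    return adjacencies(a) & adjacencies(b)


# Hypothetical genomes, for illustration only.
genome_x = [["g1", "g2", "g3", "g4"]]
genome_y = [["g3", "g2", "g1"], ["g4", "g5"]]
print(len(shared_adjacencies(genome_x, genome_y)))  # 2
```

Using unordered pairs makes an inverted segment (g3, g2, g1) still count as conserving its internal adjacencies, which matches the unsigned setting assumed here.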
Conclusions
Gene orders of ancestral eudicot species, involving 10,000 or more genes, can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem.
doi:10.1186/1471-2105-13-S10-S9
PMCID: PMC3389459
PMID: 22759433
Background
Whole genome duplication (WGD) is a special case of gene duplication, observed rarely in animals, whereby all genes duplicate simultaneously through polyploidisation. Two rounds of WGD (2R-WGD) occurred at the base of vertebrates, giving rise to an enormous wave of genetic novelty, but a systematic analysis of functional consequences of this event has not yet been performed.
Results
We show that 2R-WGD affected an overwhelming majority (74%) of signalling genes, in particular developmental pathways involving receptor tyrosine kinases, Wnt and transforming growth factor-β ligands, G protein-coupled receptors and the apoptosis pathway. 2R-retained genes, in contrast to tandem duplicates, were enriched in protein interaction domains and multifunctional signalling modules of Ras and mitogen-activated protein kinase cascades. 2R-WGD had a fundamental impact on the cell-cycle machinery, redefined molecular building blocks of the neuronal synapse, and was formative for vertebrate brains. We investigated 2R-associated nodes in the context of the human signalling network, as well as in an inferred ancestral pre-2R (AP2R) network, and found that hubs (particularly involving negative regulation) were preferentially retained, with high connectivity driving retention. Finally, microarrays and proteomics demonstrated a trend for gradual paralog expression divergence independent of the duplication mechanism, but inferred ancestral expression states suggested preferential subfunctionalisation among 2R-ohnologs (2ROs).
Conclusions
The 2R event left an indelible imprint on vertebrate signalling and the cell cycle. We show that 2R-WGD preferentially retained genes are associated with higher organismal complexity (for example, locomotion, nervous system, morphogenesis), while genes associated with basic cellular functions (for example, translation, replication, splicing, recombination; with the notable exception of cell cycle) tended to be excluded. 2R-WGD set the stage for the emergence of key vertebrate functional novelties (such as complex brains, circulatory system, heart, bone, cartilage, musculature and adipose tissue). A full explanation of the impact of 2R on evolution, function and the flow of information in vertebrate signalling networks is likely to have practical consequences for regenerative medicine, stem cell therapies and cancer treatment.
doi:10.1186/1741-7007-8-146
PMCID: PMC3238295
PMID: 21144020
Gene duplication has long been acknowledged by biologists as a major evolutionary force shaping genomic architectures and characteristics across the Tree of Life. Much research has been conducted to elucidate the fate of duplicated genes in a variety of organisms, as well as the factors that affect a gene's duplicability, that is, the tendency of certain genes to retain more duplicates than others. In particular, two studies have examined the correlation between a gene's duplicability and its degree in a protein-protein interaction network in yeast, mouse, and human, and another has examined the correlation between a gene's duplicability and its complexity (length, number of domains, etc.) in yeast. In this paper, we extend these studies to six species, and two trends emerge. The duplicability-connectivity correlation increases in line with genome size and with the phylogenetic relationships of the species. In contrast, the duplicability-complexity correlation appears to be constant across species. We argue that the observed correlations can be explained by neutral evolutionary forces acting on the genomic regions containing the genes. For the duplicability-connectivity correlation, we show through simulations that an increasing trend can be obtained by adjusting parameters to approximate the genomic characteristics of the respective species. Our results call for more research into the factors, adaptive and non-adaptive alike, that determine a gene's duplicability.
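A duplicability-connectivity correlation can be quantified with a rank correlation, which tolerates the skewed degree distributions typical of interaction networks. Which statistic the study used is not stated here, so Spearman's rho is an assumption in this sketch; the function names and per-gene data are hypothetical. In practice `scipy.stats.spearmanr` computes the same coefficient.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-gene data, for illustration only.
degree = [1, 3, 5, 8, 12]        # interaction partners in the PPI network
duplicability = [0, 1, 1, 2, 4]  # number of paralogs
print(round(spearman_rho(degree, duplicability), 3))  # 0.975
```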
doi:10.1371/journal.pone.0044491
PMCID: PMC3439388
PMID: 22984517
Background
Multiple models have been proposed to explain the retention of duplicated genes. In this study, we asked whether duplicates arising from tandem duplications and from retropositions are retained by the same mechanisms in the human and mouse genomes.
Results
Both sequence and expression similarity analyses revealed that tandem duplicates tend to be more conserved, whereas retrogenes tend to be more divergent. The duplicability of tandem duplicates is also higher than that of retrogenes. However, positive selection seems to play significant roles in the retention of both types of duplicates.
Conclusions
We propose that dosage effect is more prevalent in the retention of tandem duplicates, while 'escape from adaptive conflict' (EAC) effect is more prevalent in the retention of retrogenes.
doi:10.1186/1297-9686-42-24
PMCID: PMC2902415
PMID: 20584267