1.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are ‘hidden’ due to buffering. Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function. The genetic interactions of duplicate genes evolve in an extremely asymmetric way and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes. Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serves as a primary source for new genes and new functions, and as such has broad implications on the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes and profiles of such interactions across the entire genome provide a new context in which to examine the properties of duplicate compensation.
In this study we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability of duplicate genes to buffer the deletion of a partner has three main consequences. First, it agrees with previous work demonstrating that a high proportion of duplicate pairs are synthetic lethal, a classic indication of the ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, it reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions relating to common function from experimental detection. Third, this buffering of common interactions serves to reduce profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, we show, in a comparison with local neighbors on a protein–protein interaction (PPI) network, that although dosage-mediated duplicates more often have higher similarity to each other than they do to their neighbors, the reverse is true for duplicates in general.
That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than they do with each other, and that observation is consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than protein–protein interactions, for example, they do offer a more direct measurement of functional consequence and, thus, may give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and also tends to evolve at a slower rate (Figure 4B).
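The comparison against random simulations described above can be illustrated with a minimal sketch. It assumes each duplicate pair is summarized by two interaction counts, and it redistributes the pooled interactions of a pair between the sisters with equal probability. The function names, the toy counts and the use of the 7:1 threshold as a test statistic are illustrative assumptions, not the procedure used in the study.

```python
import random

def asymmetry_ratio(a, b):
    """Ratio of the larger to the smaller interaction count (inf if one is zero)."""
    hi, lo = max(a, b), min(a, b)
    return float("inf") if lo == 0 else hi / lo

def fraction_exceeding(pairs, threshold=7.0):
    """Fraction of pairs whose asymmetry ratio is at least `threshold`."""
    return sum(asymmetry_ratio(a, b) >= threshold for a, b in pairs) / len(pairs)

def redistribute(pairs, rng):
    """Null model: assign each interaction of a pair to either sister with p = 0.5."""
    shuffled = []
    for a, b in pairs:
        total = a + b
        first = sum(rng.random() < 0.5 for _ in range(total))
        shuffled.append((first, total - first))
    return shuffled

def null_distribution(pairs, n_sim=200, threshold=7.0, seed=0):
    """Statistic recomputed on `n_sim` randomized copies of the data."""
    rng = random.Random(seed)
    return [fraction_exceeding(redistribute(pairs, rng), threshold)
            for _ in range(n_sim)]

def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the usual +1 correction."""
    exceed = sum(v >= observed for v in null_values)
    return (exceed + 1) / (len(null_values) + 1)

# Hypothetical interaction counts for six duplicate pairs (not real data).
toy_pairs = [(70, 5), (40, 4), (3, 30), (12, 10), (56, 8), (2, 25)]
observed = fraction_exceeding(toy_pairs)
null_values = null_distribution(toy_pairs)
p_value = empirical_p_value(observed, null_values)
print(f"observed fraction >= 7:1: {observed:.2f}, empirical p = {p_value:.4f}")
```

Because the null model keeps the total number of interactions per pair fixed, any excess asymmetry in the observed counts cannot be attributed to some pairs simply having more interactions overall.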
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicate pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encodes a protein that forms a distinct heterodimeric complex with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depends on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
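Profile similarity of the kind reported above (r values) is commonly computed as a Pearson correlation between two genes' interaction scores across a shared set of query genes. The sketch below assumes that definition. The profiles are made-up numbers for a hypothetical hub gene and two sisters, and they do not reproduce the Cik1, Vik1 or Kar3 values.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length interaction profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical interaction scores against the same five query genes.
hub      = [-0.8, -0.1, 0.3, -0.6, 0.0]
sister_a = [-0.7,  0.0, 0.2, -0.5, 0.1]
sister_b = [-0.1, -0.4, 0.3,  0.0, 0.2]

r_a = pearson(hub, sister_a)
r_b = pearson(hub, sister_b)
print(f"r(hub, sister_a) = {r_a:.2f}, r(hub, sister_b) = {r_b:.2f}")
```

In this toy example the sister with the higher correlation to the hub would be read as the one sharing more of its function, which mirrors the reasoning applied to the CIK1–VIK1 pair.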
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
2.  A Synergism between Adaptive Effects and Evolvability Drives Whole Genome Duplication to Fixation 
PLoS Computational Biology  2014;10(4):e1003547.
Whole genome duplication (WGD) has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes. This pattern has been explained by a neutral process of subfunctionalization and, more recently, dosage balance selection. However, much about the relationship between environmental change, WGD and adaptation remains unknown. Here, we study the duplicate retention pattern post-WGD by letting virtual cells adapt to environmental changes. The virtual cells have structured genomes that encode a regulatory network and simple metabolism. Populations are under selection for homeostasis and evolve by point mutations, small indels and WGD. After populations had initially adapted fully to fluctuating resource conditions, re-adaptation to a broad range of novel environments was studied by tracking mutations in the line of descent. WGD was established in a minority (≈30%) of lineages, yet these were significantly more successful at re-adaptation. Unexpectedly, WGD lineages conserved more seemingly redundant genes, yet had higher per gene mutation rates. While WGD duplicates of all functional classes were significantly over-retained compared to a model of neutral losses, duplicate retention was clearly biased towards highly connected TFs. Importantly, no subfunctionalization occurred in conserved pairs, strongly suggesting that dosage balance shaped retention. Meanwhile, singles diverged significantly. WGD, therefore, is a powerful mechanism to cope with environmental change, allowing conservation of a core machinery, while adapting the peripheral network to accommodate change.
Author Summary
The evolution of eukaryotes is characterized by drastic changes in their genome content. Genome expansions have often occurred by duplication of the entire genome. It is generally not known whether organisms gain any adaptive advantage from these mutations. However, they appear to become fixed in response to environmental change. Many interesting whole genome duplications happened long ago in eukaryotic evolutionary history during periods of turbulent genome and species evolution. Genomic data analysis alone cannot resolve the evolutionary mechanisms and consequences of whole genome duplication. Here, we modeled evolution with whole genome duplications in a Virtual Cell model. Simulating populations that undergo a range of different environmental changes, we found that, besides often increasing fitness directly, whole genome duplications made lineages more evolvable and hence more able to adapt to harsh new environments. Although most duplicates are deleted in subsequent evolution, genes with many interaction partners were retained preferentially, increasing regulatory complexity. Interestingly, however, we found that innovation happened most likely in the more loosely connected and less essential genes.
PMCID: PMC3990473  PMID: 24743268
3.  The Roles of Whole-Genome and Small-Scale Duplications in the Functional Specialization of Saccharomyces cerevisiae Genes 
PLoS Genetics  2013;9(1):e1003176.
Researchers have long been enthralled with the idea that gene duplication can generate novel functions, crediting this process with great evolutionary importance. Empirical data show that whole-genome duplications (WGDs) are more likely to be retained than small-scale duplications (SSDs), though their relative contribution to the functional fate of duplicates remains unexplored. Using the map of genetic interactions and the re-sequencing of 27 Saccharomyces cerevisiae genomes evolving for 2,200 generations, we show that SSD-duplicates lead to neo-functionalization while WGD-duplicates partition ancestral functions. This conclusion is supported by the following observations: (a) SSD-duplicates establish more genetic interactions than singletons and WGD-duplicates; (b) SSD-duplicate copies share more interaction partners than WGD-duplicate copies; (c) the interaction partners of WGD-duplicates are more functionally related than those of SSD-duplicates; (d) SSD-duplicate gene copies are more functionally divergent from one another, while keeping more overlapping functions, and diverge in their sub-cellular locations more than WGD-duplicate copies; and (e) SSD-duplicates complement their functions to a greater extent than WGD-duplicates. We propose a novel model that uncovers the complexity of evolution after gene duplication.
Author Summary
Gene duplication involves the doubling of a gene, giving rise to an identical gene copy. Early evolutionary theory predicted that, as one gene copy is performing the ancestral function, the other gene copy, free from strong selective constraints, could evolve exploring alternative functions. Because of its potential to generate novel functions, hence biological complexity, gene duplication has been credited with enormous evolutionary importance. The way in which duplicated genes acquire novel functions remains the focus of intense research. Does the mechanism of duplication (duplication of small genome regions versus genome duplication) influence the fate of duplicates? Although it has been shown that the mechanism of duplication determines the persistence of genes in duplicate, a model describing the functional fates of duplicates generated by whole-genome or small-scale duplications remains largely obscure. Here we show that despite the large amount of genetic material generated by whole-genome duplication in the yeast Saccharomyces cerevisiae, these duplicates specialized in subsets of ancestral functions. Conversely, small-scale duplicates gave rise to novel functions. We describe and test a model to explain the evolutionary dynamics of duplicates generated by different mechanisms. Our results shed light on the functional fates of duplicates and the role of the duplication mechanism in generating functional diversity.
PMCID: PMC3536658  PMID: 23300483
4.  Dose–Sensitivity, Conserved Non-Coding Sequences, and Duplicate Gene Retention Through Multiple Tetraploidies in the Grasses 
Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein–protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose-sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes (nicknamed bigfoot genes) and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.
PMCID: PMC3355796  PMID: 22645525
conserved non-coding sequence; polyploidy; fractionation; gene dosage; gene regulation
5.  Subfunctionalization of Duplicated Zebrafish pax6 Genes by cis-Regulatory Divergence 
PLoS Genetics  2008;4(2):e29.
Gene duplication is a major driver of evolutionary divergence. In most vertebrates a single PAX6 gene encodes a transcription factor required for eye, brain, olfactory system, and pancreas development. In zebrafish, following a postulated whole-genome duplication event in an ancestral teleost, duplicates pax6a and pax6b jointly fulfill these roles. Mapping of the homozygously viable eye mutant sunrise identified a homeodomain missense change in pax6b, leading to loss of target binding. The mild phenotype emphasizes role-sharing between the co-orthologues. Meticulous mapping of isolated BACs identified perturbed synteny relationships around the duplicates. This highlights the functional conservation of pax6 downstream (3′) control sequences, which in most vertebrates reside within the introns of a ubiquitously expressed neighbour gene, ELP4, whose pax6a-linked exons have been lost in zebrafish. Reporter transgenic studies in both mouse and zebrafish, combined with analysis of vertebrate sequence conservation, reveal loss and retention of specific cis-regulatory elements, correlating strongly with the diverged expression of co-orthologues, and providing clear evidence for evolution by subfunctionalization.
Author Summary
Studying the zebrafish small eyed mutant “sunrise,” we identified the causative amino acid change in the pax6b gene. This mutation leads to reduced DNA binding capacity. There are two closely related pax6 genes in zebrafish, pax6a and pax6b, which arose following a whole-genome duplication event about 350 million years ago; they map to different chromosomes. Each copy is now associated with a different subset of the neighbouring genes found associated with all vertebrate single-copy Pax6 genes. The expression patterns of pax6a and pax6b have diverged from each other since the duplication event. Some division of labour has emerged: pax6b is less widely expressed in the brain than pax6a, but only pax6b is found in the developing pancreas. Multiple evolutionarily conserved regulatory elements (enhancers) control these expression patterns, which can be recapitulated in transgenic animals. Some enhancer elements lie more than 150 kb outside the transcribed gene region, inside the introns of unrelated neighbouring genes. Such juxtaposition imposes the need to conserve gene order in many vertebrate species. Genome duplication releases the constraint for retaining all neighbouring genes. Thus, pax6a has lost the coding region of its immediate neighbours, although it retains most of the brain-specific regulatory domains. Duplication also allows some orderly changes in the exact role of each regulatory component, as long as the two duplicates can, together, ensure the complex expression pattern required for complete function. We demonstrate functional loss of a brain element downstream of pax6b, while an upstream pancreas enhancer element has evolved in a more complex way.
PMCID: PMC2242813  PMID: 18282108
6.  Molecular evolution of glycinin and β-conglycinin gene families in soybean (Glycine max L. Merr.) 
Heredity  2010;106(4):633-641.
There are two main classes of multi-subunit seed storage proteins, glycinin (11S) and β-conglycinin (7S), which account for approximately 70% of the total protein in a typical soybean seed. The subunits of these two protein classes are encoded by a number of genes. The genomic organization of these genes follows a complex evolutionary history. This research was designed to describe the origin and maintenance of genes in each of these gene families by analyzing the synteny, phylogenies, selection pressure and duplications of the genes in each gene family. The ancestral glycinin gene initially experienced a tandem duplication event; then, the genome underwent two subsequent rounds of whole-genome duplication, thereby resulting in duplication of the glycinin genes, and finally a tandem duplication likely gave rise to the Gy1 and Gy2 genes. The β-conglycinin genes primarily originated through the more recent whole-genome duplication and several tandem duplications. Purifying selection has had a key role in the maintenance of genes in both gene families. In addition, positive selection in the glycinin genes and a large deletion in a β-conglycinin exon contribute to the diversity of the duplicate genes. In summary, our results suggest that the duplicated genes in both gene families prefer to retain similar function throughout evolution and therefore may contribute to phenotypic robustness.
PMCID: PMC3183897  PMID: 20668431
β-Conglycinin; duplicate divergence; glycinin; molecular evolution; positive selection; soybean
7.  Protein Under-Wrapping Causes Dosage Sensitivity and Decreases Gene Duplicability 
PLoS Genetics  2008;4(1):e11.
A fundamental issue in molecular evolution is how to identify the evolutionary forces that determine the fate of duplicated genes. The dosage balance hypothesis has been invoked to explain gene duplication patterns at the genomic level under the premise that a dosage imbalance among protein-complex subunits or interacting partners is often deleterious. Here we examine this hypothesis by investigating the molecular basis of dosage sensitivity. We focus on the extent of protein wrapping, which indicates how strongly the structural integrity of a protein relies on its interactive context. From this perspective, we predict that the duplicates of a highly under-wrapped protein or protein subunit should (1) be more sensitive to dosage imbalance and be less likely to be retained and (2) be more likely to survive from a whole-genome duplication (WGD) than from a non-WGD because a WGD causes little or no dosage imbalance. Our under-wrapping analysis of more than 12,000 protein structures strongly supports these predictions and further reveals that the effect of dosage sensitivity on gene duplicability decreases with increasing organismal complexity.
Author Summary
A gene duplication provides an extra gene copy that can be free to accumulate mutations and gain a new function. Therefore, gene duplication plays a very important role in evolution. However, the presence of an additional gene copy can sometimes be deleterious because it can lead to an excessive dosage relative to those of its interacting partners. This dosage imbalance effect in turn influences the fate of duplicated genes in evolution. Our study gives the first description to our knowledge of the molecular/structural basis for the dosage imbalance effect. We study the relationships between gene family size and extent of protein under-wrapping, a molecular quantifier of the reliance of the protein on binding partnerships to maintain structural integrity, indicative of the extent of structure protection from disruptive hydration. Using more than 12,000 protein three-dimensional structures from six organisms that range from bacteria to human, we show an inverse relationship between extent of protein under-wrapping and family size. That is, a duplication is unlikely to be tolerated if the protein is highly under-wrapped (i.e., its structure requires substantial stabilizing interactions with other proteins). We also show that the effect of dosage imbalance is more apparent in unicellular organisms but is buffered to some extent in higher eukaryotes.
PMCID: PMC2211539  PMID: 18208334
8.  Multiple Chromosomal Rearrangements Structured the Ancestral Vertebrate Hox-Bearing Protochromosomes 
PLoS Genetics  2009;5(1):e1000349.
While the proposal that large-scale genome expansions occurred early in vertebrate evolution is widely accepted, the exact mechanisms of the expansion (such as a single or multiple rounds of whole genome duplication, bloc chromosome duplications, large-scale individual gene duplications, or some combination of these) are unclear. Gene families with a single invertebrate member but four vertebrate members, such as the Hox clusters, provided early support for Ohno's hypothesis that two rounds of genome duplication (the 2R-model) occurred in the stem lineage of extant vertebrates. However, despite extensive study, the duplication history of the Hox clusters has remained unclear, calling into question its usefulness in resolving the role of large-scale gene or genome duplications in early vertebrates. Here, we present a phylogenetic analysis of the vertebrate Hox clusters and several linked genes (the Hox “paralogon”) and show that different phylogenies are obtained for Dlx and Col genes than for Hox and ErbB genes. We show that these results are robust to errors in phylogenetic inference and suggest that these competing phylogenies can be resolved if two chromosomal crossover events occurred in the ancestral vertebrate. These results resolve conflicting data on the order of Hox gene duplications and the role of genome duplication in vertebrate evolution and suggest that a period of genome reorganization occurred after genome duplications in early vertebrates.
Author Summary
The genome of vertebrates has expanded greatly in gene number since our last common ancestor with invertebrates. While it is clear that genome expansions occurred early in the evolution of vertebrates, the mechanisms of that expansion (such as a single or multiple rounds of whole genome duplication, chromosome duplications, large-scale individual gene duplications, or some combination of these) are unclear. Central to this debate has been the duplication history of Hox clusters, which ancestrally have four copies in vertebrates, but only a single copy in invertebrates. This 1∶4 ratio has been used to support the hypothesis that two rounds of whole-genome duplications occurred in early vertebrates (named the 2R model); however, the phylogeny of the Hox clusters and their linked genes (the Hox paralogon) seems to contradict this model. Here, we use phylogenetic methods to infer that two chromosomal rearrangements occurred shortly after the genome duplications within the Hox paralogon. These results resolve the apparent conflict between the duplication order of the Hox paralogon and the 2R model and suggest that vertebrates are pseudo-octoploids.
PMCID: PMC2622764  PMID: 19165336
9.  Modification of Gene Duplicability during the Evolution of Protein Interaction Network 
PLoS Computational Biology  2011;7(4):e1002029.
Duplications of genes encoding highly connected and essential proteins are selected against in several species but not in human, where duplicated genes encode highly connected proteins. To understand when and how gene duplicability changed in evolution, we compare gene and network properties in four species (Escherichia coli, yeast, fly, and human) that are representative of the increase in evolutionary complexity, defined as progressive growth in the number of genes, cells, and cell types. We find that the origin and conservation of a gene significantly correlate with the properties of the encoded protein in the protein-protein interaction network. All four species preserve a core of singleton and central hubs that originated early in evolution, are highly conserved, and accomplish basic biological functions. Another group of hubs appeared in metazoans and duplicated in vertebrates, mostly through vertebrate-specific whole genome duplication. Such recent and duplicated hubs are frequently targets of microRNAs and show tissue-selective expression, suggesting that these are alternative mechanisms to control their dosage. Our study shows how networks were modified during evolution and contributes to explaining the occurrence of somatic genetic diseases, such as cancer, in terms of network perturbations.
Author Summary
Gene copy number is often tightly controlled because it directly affects the gene dosage. In several species, including yeast, worm, and fly, genes that have a single gene copy (singleton genes) encode proteins with several connections in the protein interaction network (hubs) as well as essential proteins. Surprisingly, in mouse and human, essential proteins and hubs are encoded by genes with more than one copy in the genome (duplicated genes). Here we show that these two distinct groups of hubs were acquired at different times during the evolution of the protein interaction network and contribute in different ways to the life of the cell. Singleton hubs are ancestral genes that are conserved from prokaryotes to vertebrates and accomplish basic functions that deal with cell survival. Duplicated hubs were acquired mostly within metazoans and duplicated through vertebrate-specific whole genome duplication. These genes are involved in processes that are crucial for the organization of multicellularity. Although duplicated, recent hubs are also subject to gene dosage control through microRNAs and tissue-selective expression. The clarification of how the protein interaction network evolves enables us to understand the adaptation to the progressive increase in complexity and to better characterize the genes involved in diseases such as cancer.
PMCID: PMC3072358  PMID: 21490719
10.  Genome Duplication and Gene Loss Affect the Evolution of Heat Shock Transcription Factor Genes in Legumes 
PLoS ONE  2014;9(7):e102825.
Whole-genome duplication events (polyploidy events) and gene loss events have played important roles in the evolution of legumes. Here we show that the vast majority of Hsf gene duplications resulted from whole genome duplication events rather than tandem duplication, and significant differences in gene retention exist between species. By searching for intraspecies gene colinearity (microsynteny) and dating the age distributions of duplicated genes, we found that genome duplications accounted for 42 of 46 Hsf-containing segments in Glycine max, while paired segments were rarely identified in Lotus japonicus, Medicago truncatula and Cajanus cajan. However, by comparing interspecies microsynteny, we determined that the great majority of Hsf-containing segments in Lotus japonicus, Medicago truncatula and Cajanus cajan show extensive conservation with the duplicated regions of Glycine max. These segments formed 17 groups of orthologous segments. These results suggest that these regions shared ancient genome duplication with Hsf genes in Glycine max, but more than half of the copies of these genes were lost. On the other hand, the Glycine max Hsf gene family retained approximately 75% and 84% of duplicated genes produced from the ancient genome duplication and recent Glycine-specific genome duplication, respectively. Continuous purifying selection has played a key role in the maintenance of Hsf genes in Glycine max. Expression analysis of the Hsf genes in Lotus japonicus revealed their putative involvement in multiple tissue-/developmental stages and responses to various abiotic stimuli. This study traces the evolution of Hsf genes in legume species and demonstrates that the rates of gene gain and loss are far from equilibrium in different species.
PMCID: PMC4105503  PMID: 25047803
11.  Divergent evolutionary fates of major photosynthetic gene networks following gene and whole genome duplications 
Plant Signaling & Behavior  2011;6(4):594-597.
Gene and genome duplication are recurring processes in flowering plants, and elucidating the mechanisms by which duplicated genes are lost or deployed is a key component of understanding plant evolution. Using gene ontologies (GO) or protein family (PFAM) domains, distinct patterns of duplicate retention and loss have been identified depending on gene functional properties and duplication mechanism, but little is known about how gene networks encoding interacting proteins (protein complexes or signaling cascades) evolve in response to duplication. We examined patterns of duplicate retention within four major gene networks involved in photosynthesis (the Calvin cycle, photosystem I, photosystem II and the light harvesting complex) across three species and four whole genome duplications, as well as small-scale duplications, and showed that photosystem gene family evolution is governed largely by dosage sensitivity [1]. In contrast, Calvin cycle gene families are not dosage-sensitive, but exhibit a greater capacity for functional differentiation. Here we review these findings, highlight how this study, by analyzing defined gene networks, is complementary to global studies using functional annotations such as GO and PFAM, and elaborate on one example of functional differentiation in the Calvin cycle gene family, transketolase.
PMCID: PMC3142401  PMID: 21494088
gene duplication; whole genome duplication; dosage sensitivity; balance hypothesis
12.  All duplicates are not equal: the difference between small-scale and genome duplication 
Genome Biology  2007;8(10):R209.
The comparison of pairs of gene duplications generated by small-scale duplications with those created by large-scale duplications shows that they differ in quantifiable ways. It is suggested that this is directly due to biases on the paths to gene retention rather than association with different functional categories.
Genes in populations are in constant flux, being gained through duplication and occasionally retained or, more frequently, lost from the genome. In this study we compare pairs of identifiable gene duplicates generated by small-scale (predominantly single-gene) duplications with those created by a large-scale gene duplication event (whole-genome duplication) in the yeast Saccharomyces cerevisiae.
We find a number of quantifiable differences between these data sets. Whole-genome duplicates tend to exhibit less profound phenotypic effects when deleted, are functionally less divergent, and are associated with a different set of functions than their small-scale duplicate counterparts. At first sight, either of these latter two features could provide a plausible mechanism by which the difference in dispensability might arise. However, we uncover no evidence suggesting that this is the case. We find that the difference in dispensability observed between the two duplicate types is limited to gene products found within protein complexes, and probably results from differences in the relative strength of the evolutionary pressures present following each type of duplication event.
Genes, and the proteins they specify, originating from small-scale and whole-genome duplication events differ in quantifiable ways. We infer that this is not due to their association with different functional categories; rather, it is a direct result of biases in gene retention.
PMCID: PMC2246283  PMID: 17916239
13.  Comparative genomics of Lbx loci reveals conservation of identical Lbx ohnologs in bony vertebrates 
Lbx/ladybird genes originated as part of the metazoan cluster of Nk homeobox genes. In all animals investigated so far, both the protostome genes and the vertebrate Lbx1 genes were found to play crucial roles in neural and muscle development. Recently however, additional Lbx genes with divergent expression patterns were discovered in amniotes. Early in the evolution of vertebrates, two rounds of whole genome duplication are thought to have occurred, during which 4 Lbx genes were generated. Which of these genes were maintained in extant vertebrates, and how these genes and their functions evolved, is not known.
Here we searched vertebrate genomes for Lbx genes and discovered novel members of this gene family. We also identified signature genes linked to particular Lbx loci and traced the remnants of 4 Lbx paralogons (two of which retain Lbx genes) in amniotes. In teleosts, that have undergone an additional genome duplication, 8 Lbx paralogons (three of which retain Lbx genes) were found. Phylogenetic analyses of Lbx and Lbx-associated genes show that in extant, bony vertebrates only Lbx1- and Lbx2-type genes are maintained. Of these, some Lbx2 sequences evolved faster and were probably subject to neofunctionalisation, while Lbx1 genes may have retained more features of the ancestral Lbx gene. Genes at Lbx1 and former Lbx4 loci are more closely related, as are genes at Lbx2 and former Lbx3 loci. This suggests that during the second vertebrate genome duplication, Lbx1/4 and Lbx2/3 paralogons were generated from the duplicated Lbx loci created during the first duplication event.
Our study establishes for the first time the evolutionary history of Lbx genes in bony vertebrates, including the order of gene duplication events, gene loss and phylogenetic relationships. Moreover, we identified genetic hallmarks for each of the Lbx paralogons that can be used to trace Lbx genes as other vertebrate genomes become available. Significantly, we show that bony vertebrates only retained copies of Lbx1 and Lbx2 genes, with some Lbx2 genes being highly divergent. Thus, we have established a base on which the evolution of Lbx gene function in vertebrate development can be evaluated.
PMCID: PMC2446394  PMID: 18541024
14.  Dynamics of Gene Duplication in the Genomes of Chlorophyll d-Producing Cyanobacteria: Implications for the Ecological Niche 
Gene duplication may be an important mechanism for the evolution of new functions and for the adaptive modulation of gene expression via dosage effects. Here, we analyzed the fate of gene duplicates for two strains of a novel group of cyanobacteria (genus Acaryochloris) that produces the far-red light absorbing chlorophyll d as its main photosynthetic pigment. The genomes of both strains contain an unusually high number of gene duplicates for bacteria. As has been observed for eukaryotic genomes, we find that the demography of gene duplicates can be well modeled by a birth–death process. Most duplicated Acaryochloris genes are of comparatively recent origin, are strain-specific, and tend to be located on different genetic elements. Analyses of selection on duplicates of different divergence classes suggest that a minority of paralogs exhibit near neutral evolutionary dynamics immediately following duplication but that most duplicate pairs (including those which have been retained for long periods) are under strong purifying selection against amino acid change. The likelihood of duplicate retention varied among gene functional classes, and the pronounced differences between strains in the pool of retained recent duplicates likely reflects differences in the nutrient status and other characteristics of their respective environments. We conclude that most duplicates are quickly purged from Acaryochloris genomes and that those which are retained likely make important contributions to organism ecology by conferring fitness benefits via gene dosage effects. The mechanism of enhanced duplication may involve homologous recombination between genetic elements mediated by paralogous copies of recA.
PMCID: PMC3156569  PMID: 21697100
Acaryochloris; recA; homologous recombination; plasmid
15.  Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses 
Genome Biology  2009;10(6):R68.
Comparison of the sorghum, maize and rice genomes shows that gene duplication and functional innovation is common to evolution of most but not all genes in the C4 photosynthetic pathway.
Sorghum is the first C4 plant and the second grass with a full genome sequence available. This makes it possible to perform a whole-genome-level exploration of C4 pathway evolution by comparing key photosynthetic enzyme genes in sorghum, maize (C4) and rice (C3), and to investigate a long-standing hypothesis that a reservoir of duplicated genes is a prerequisite for the evolution of C4 photosynthesis from a C3 progenitor.
We show that both whole-genome and individual gene duplication have contributed to the evolution of C4 photosynthesis. The C4 gene isoforms show differential duplicability, with some C4 genes being recruited from whole genome duplication duplicates by multiple modes of functional innovation. The sorghum and maize carbonic anhydrase genes display a novel mode of new gene formation, with recursive tandem duplication and gene fusion accompanied by adaptive evolution to produce C4 genes with one to three functional units. Other C4 enzymes in sorghum and maize also show evidence of adaptive evolution, though differing in level and mode. Intriguingly, a phosphoenolpyruvate carboxylase gene in the C3 plant rice has also been evolving rapidly and shows evidence of adaptive evolution, although lacking key mutations that are characteristic of C4 metabolism. We also found evidence that both gene redundancy and alternative splicing may have sheltered the evolution of new function.
Gene duplication followed by functional innovation is common to evolution of most but not all C4 genes. The apparently long time-lag between the availability of duplicates for recruitment into C4 and the appearance of C4 grasses, together with the heterogeneity of origins of C4 genes, suggests that there may have been a long transition process before the establishment of C4 photosynthesis.
PMCID: PMC2718502  PMID: 19549309
16.  Striking Similarities in the Genomic Distribution of Tandemly Arrayed Genes in Arabidopsis and Rice 
PLoS Computational Biology  2006;2(9):e115.
In Arabidopsis, tandemly arrayed genes (TAGs) comprise >10% of the genes in the genome. These duplicated genes represent a rich template for genetic innovation, but little is known of the evolutionary forces governing their generation and maintenance. Here we compare the organization and evolution of TAGs between Arabidopsis and rice, two plant genomes that diverged ~150 million years ago. TAGs from the two genomes are similar in a number of respects, including the proportion of genes that are tandemly arrayed, the number of genes within an array, the number of tandem arrays, and the dearth of TAGs relative to single copy genes in centromeric regions. Analysis of recombination rates along rice chromosomes confirms a positive correlation between the occurrence of TAGs and recombination rate, as found in Arabidopsis. TAGs are also biased functionally relative to duplicated, nontandemly arrayed genes. In both genomes, TAGs are enriched for genes that encode membrane proteins and function in “abiotic and biotic stress” but underrepresented for genes involved in transcription and DNA or RNA binding functions. We speculate that these observations reflect an evolutionary trend in which successful tandem duplication involves genes either at the end of biochemical pathways or in flexible steps in a pathway, for which fluctuation in copy number is unlikely to affect downstream genes. Despite differences in the age distribution of tandem arrays, the striking similarities between rice and Arabidopsis indicate similar mechanisms of TAG generation and maintenance.
The nuclear genomes of higher plants vary tremendously in size and gene content. Much of this variation is attributable to gene duplication. To date, most studies of plant gene duplication have focused on whole genome duplication events, which duplicate all genes simultaneously. Another prominent process is single gene duplication, which often results in duplicated genes arranged in a tandem array. Here Rizzon, Ponger, and Gaut identify tandem arrays in rice and compare their genome organization between Arabidopsis and rice, two plant species that diverged ~150 million years ago. The two genomes contain a similar proportion of genes that are tandemly arrayed, with a similar number of genes within an array. Moreover, tandemly arrayed genes are most common in genomic regions of high recombination in both species. This organization appears to be a general feature of eukaryotic genomes, perhaps because duplication rates are higher in high recombination regions. Tandemly arrayed genes of rice and Arabidopsis also represent a biased gene set with regard to function. In contrast to genes duplicated through whole genome events, tandemly arrayed genes are enriched for genes that encode membrane proteins and genes that function in response to environmental stresses. Taken together, these observations suggest that tandemly arrayed genes represent a rich and relatively fluid source for plant adaptation.
PMCID: PMC1557586  PMID: 16948529
17.  Following Tetraploidy in Maize, a Short Deletion Mechanism Removed Genes Preferentially from One of the Two Homeologs 
PLoS Biology  2010;8(6):e1000409.
Following genome duplication and selfish DNA expansion, maize used a heretofore unknown mechanism to shed redundant genes and functionless DNA with bias toward one of the parental genomes.
Previous work in Arabidopsis showed that after an ancient tetraploidy event, genes were preferentially removed from one of the two homeologs, a process known as fractionation. The mechanism of fractionation is unknown. We sought to determine whether such preferential, or biased, fractionation exists in maize and, if so, whether a specific mechanism could be implicated in this process. We studied the process of fractionation using two recently sequenced grass species: sorghum and maize. The maize lineage has experienced a tetraploidy since its divergence from sorghum approximately 12 million years ago, and fragments of many knocked-out genes retain enough sequence similarity to be easily identifiable. Using sorghum exons as the query sequence, we studied the fate of both orthologous genes in maize following the maize tetraploidy. We show that genes are predominantly lost, not relocated, and that single-gene loss by deletion is the rule. Based on comparisons with orthologous sorghum and rice genes, we also infer that the sequences present before the deletion events were flanked by short direct repeats, a signature of intra-chromosomal recombination. Evidence of this deletion mechanism is found 2.3 times more frequently on one of the maize homeologs, consistent with earlier observations of biased fractionation. The over-fractionated homeolog is also a greater than 3-fold better target for transposon removal, but does not have an observably higher synonymous base substitution rate, nor could we find differentially placed methylation domains. We conclude that fractionation is indeed biased in maize and that intra-chromosomal or possibly a similar illegitimate recombination is the primary mechanism by which fractionation occurs. The mechanism of intra-chromosomal recombination explains the observed bias in both gene and transposon loss in the maize lineage. The existence of fractionation bias demonstrates that the frequency of deletion is modulated. 
Among the evolutionary benefits of this deletion/fractionation mechanism is bulk DNA removal and the generation of novel combinations of regulatory sequences and coding regions.
Author Summary
All genomes can accumulate dispensable DNA in the form of duplications of individual genes or even partial or whole genome duplications. Genomes also can accumulate selfish DNA elements. Duplication events specifically are often followed by extensive gene loss. The maize genome is particularly extreme, having become tetraploid 10 million years ago and played host to massive transposon amplifications. We compared the genome of sorghum (which is homologous to the pre-tetraploid maize genome) with the two identifiable parental genomes retained in maize. The two maize genomes differ greatly: one of the parental genomes has lost 2.3 times more genes than the other, and the selfish DNA regions between genes were even more frequently lost, suggesting maize can distinguish between the parental genomes present in the original tetraploid. We show that genes are actually lost, not simply relocated. Deletions were rarely longer than a single gene, and occurred between repeated DNA sequences, suggesting mis-recombination as a mechanism of gene removal. We hypothesize an epigenetic mechanism of genome distinction to account for the selective loss. To the extent that the rate of base substitutions tracks time, we neither support nor refute claims of maize allotetraploidy. Finally, we explain why it makes sense that purifying selection in mammals does not operate at all like the gene and genome deletion program we describe here.
PMCID: PMC2893956  PMID: 20613864
18.  Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes 
PLoS Computational Biology  2015;11(7):e1004394.
Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, which occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined ‘ohnologs’ after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.
Author Summary
Duplication of existing genes with subsequent divergence of duplicated copies has long been recognized as the primary source of genomic innovation. Gene duplication is thus at the root of the evolution and complexification of living organisms. However, gene duplicates have been retained differently depending on the genomic scale of their duplication and their implication in genetic diseases. The scale of genomic duplication spans from small scale segmental duplication to whole genome duplication (WGD), which corresponds to a dramatic doubling event of a species genome. In particular, all vertebrates, including human, descend from two rounds of WGDs, which occurred in their jawless ancestor some 500 MY ago. Interestingly, WGD gene duplicates, also called ‘ohnologs’, have been shown to be more frequently implicated in genetic diseases in human. Hence, identifying ohnologs appears central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. In this study, we present a computational approach to predict ohnologs in six vertebrate genomes, including human, based on the comparison of their local gene content (i.e. synteny) with the genomes of six invertebrate outgroups. We show that such synteny comparisons across multiple genomes enhance the statistical power of ohnolog identification compared to earlier approaches.
PMCID: PMC4504502  PMID: 26181593
19.  Generation of Tandem Direct Duplications by Reversed-Ends Transposition of Maize Ac Elements 
PLoS Genetics  2013;9(8):e1003691.
Tandem direct duplications are a common feature of the genomes of eukaryotes ranging from yeast to human, where they comprise a significant fraction of copy number variations. The prevailing model for the formation of tandem direct duplications is non-allelic homologous recombination (NAHR). Here we report a series of duplications and reciprocal deletions isolated de novo from a maize allele containing two Class II Ac/Ds transposons. The duplication/deletion structures suggest that they were generated by alternative transposition reactions involving the termini of two nearby transposable elements. The deletion/duplication breakpoint junctions contain 8 bp target site duplications characteristic of Ac/Ds transposition events, confirming their formation directly by an alternative transposition mechanism. Tandem direct duplications and reciprocal deletions were generated at a relatively high frequency (∼0.5 to 1%) in the materials examined here, in which transposons are positioned near each other in appropriate orientation; frequencies would likely be much lower in other genotypes. To test whether this mechanism may have contributed to maize genome evolution, we analyzed sequences flanking Ac/Ds and other hAT family transposons and identified three small tandem direct duplications with the structural features predicted by the alternative transposition mechanism. Together these results show that some class II transposons are capable of directly inducing tandem sequence duplications, and that this activity has contributed to the evolution of the maize genome.
Author Summary
The recent explosion of genome sequence data has greatly increased the need to understand the forces that shape eukaryotic genomes. A common feature of higher plant genomes is the presence of large numbers of duplications, often occurring as tandem repeats of thousands of base pairs. Despite the importance of gene duplications in evolution and disease, the precise mechanism(s) that generate tandem duplications are still unclear. In this study we identified nine new spontaneous duplications that arose flanking elements of the Ac transposon system. These duplications range in size from 8 kbp to >5,000 kbp, and all cases exhibit features characteristic of Ac transposition. Using similar criteria in a bioinformatics search, we identified three smaller duplications adjacent to other hAT family transposons in the maize B73 reference genome sequence. Our results show that transposable elements can directly generate tandem duplications via alternative transposition, and that this mechanism is responsible for at least some of the duplications present in the maize B73 genome. This work extends the significance of Barbara McClintock's discovery of transposable elements by demonstrating how they can act as agents of genome expansion.
PMCID: PMC3744419  PMID: 23966872
20.  The gain and loss of genes during 600 million years of vertebrate evolution 
Genome Biology  2006;7(5):R43.
Phylogenetic analysis of gene gain and loss during vertebrate evolution provides evidence for the importance of early gene or genome duplication events in evolution of complex vertebrates.
Gene duplication is assumed to have played a crucial role in the evolution of vertebrate organisms. Apart from a continuous mode of duplication, two or three whole genome duplication events have been proposed during the evolution of vertebrates, one or two at the dawn of vertebrate evolution, and an additional one in the fish lineage, not shared with land vertebrates. Here, we have studied gene gain and loss in seven different vertebrate genomes, spanning an evolutionary period of about 600 million years.
We show that: first, the majority of duplicated genes in extant vertebrate genomes are ancient and were created at times that coincide with proposed whole genome duplication events; second, there exist significant differences in gene retention for different functional categories of genes between fishes and land vertebrates; third, there seems to be a considerable bias in gene retention of regulatory genes towards the mode of gene duplication (whole genome duplication events compared to smaller-scale events), which is in accordance with the so-called gene balance hypothesis; and fourth, that ancient duplicates that have survived for many hundreds of millions of years can still be lost.
Based on phylogenetic analyses, we show that both the mode of duplication and the functional class the duplicated genes belong to have been of major importance for the evolution of the vertebrates. In particular, we provide evidence that massive gene duplication (probably as a consequence of entire genome duplications) at the dawn of vertebrate evolution might have been particularly important for the evolution of complex vertebrates.
PMCID: PMC1779523  PMID: 16723033
21.  Exploiting a Reference Genome in Terms of Duplications: The Network of Paralogs and Single Copy Genes in Arabidopsis thaliana 
Biology  2013;2(4):1465-1487.
Arabidopsis thaliana became the model organism for plant studies because of its small diploid genome, rapid lifecycle and small adult size. Its genome was the first among plants to be sequenced, becoming the reference in plant genomics. However, the Arabidopsis genome is characterized by an inherently complex organization, since it has undergone ancient whole genome duplications, followed by gene reduction, diploidization events and extended rearrangements, which relocated and split up the retained portions. These events, together with probable chromosome reductions, dramatically increased the genome complexity, limiting its role as a reference. The identification of paralogs and single copy genes within a highly duplicated genome is a prerequisite to understand its organization and evolution and to improve its exploitation in comparative genomics. This is still controversial, even in the widely studied Arabidopsis genome, partly because of the lack of a reference bioinformatics pipeline that could exhaustively identify paralogs and singleton genes. We describe here a complete computational strategy to detect both duplicated and single copy genes in a genome, discussing all the methodological issues that may strongly affect the results, their quality and their reliability. This approach was used to analyze the organization of Arabidopsis nuclear protein coding genes, and besides classifying computationally defined paralogs into networks and single copy genes into different classes, it unraveled further intriguing aspects concerning the genome annotation and the gene relationships in this reference plant species. Since our results may be useful for comparative genomics and genome functional analyses, we organized a dedicated web interface to make them accessible to the scientific community.
PMCID: PMC4009786  PMID: 24833233
gene duplication; paralog genes; single copy genes; singleton; gene network; Arabidopsis; genome annotation
22.  GINS motion reveals replication fork progression is remarkably uniform throughout the yeast genome 
Time-resolved ChIP-chip can be utilized to monitor the genome-wide dynamics of the GINS complex, yielding quantitative information on replication fork movement. Replication forks progress at remarkably uniform rates across the genome, regardless of location. GINS progression appears to be arrested, albeit with very low frequency, at sites of highly transcribed genes. Comparison of simulation with data leads to novel biological insights regarding the dynamics of replication fork progression.
In mitotic division, cells duplicate their DNA in S phase to ensure that the proper genetic material is passed on to their progeny. This process of DNA replication is initiated from several hundred specific sites, termed origins of replication, spaced across the genome. It is essential for replication to begin only after G1 and finish before the initiation of anaphase (Blow and Dutta, 2005; Machida et al, 2005). To ensure proper timing, the beginning stages of DNA replication are tightly coupled to the cell cycle through the activity of cyclin-dependent kinases (Nguyen et al, 2001; Masumoto et al, 2002; Sclafani and Holzen, 2007), which promote the accumulation of the pre-RC at the origins and initiate replication. Replication fork movement occurs subsequent to the firing of origins on recruitment of the replicative helicase and the other fork-associated proteins as the cell enters S phase (Diffley, 2004). The replication machinery itself (polymerases, PCNA, etc.) trails behind the helicase, copying the newly unwound DNA in the wake of the replication fork.
One component of the pre-RC, the GINS complex, consists of a highly conserved set of paralogous proteins (Psf1, Psf2, Psf3 and Sld5 (Kanemaki et al, 2003; Kubota et al, 2003; Takayama et al, 2003)). Previous work suggests that the GINS complex is an integral component of the replication fork and that its interaction with the genome correlates directly to the movement of the fork (reviewed in Labib and Gambus, 2007). Here, we used the GINS complex as a surrogate to measure features of the dynamics of replication—that is, to determine which origins in the genome are active, the timing of their firing and the rates of replication fork progression.
The timing of origin firing and the rates of fork progression have also been investigated by monitoring nascent DNA synthesis (Raghuraman et al, 2001; Yabuki et al, 2002). Origin firing was observed to occur as early as 14 min into the cell cycle and as late as 44 min (Raghuraman et al, 2001). A wide range of nucleotide incorporation rates (0.5–11 kb/min) were observed, with a mean of 2.9 kb/min (Raghuraman et al, 2001), whereas a second study reported a comparable mean rate of DNA duplication of 2.8±1.0 kb/min (Yabuki et al, 2002). In addition to these observations, replication has been inferred to progress asymmetrically from certain origins (Raghuraman et al, 2001). These data have been interpreted to mean that the dynamics of replication fork progression are strongly affected by local chromatin structure or architecture, and perhaps by interaction with the machineries controlling transcription, repair and epigenetic maintenance (Deshpande and Newlon, 1996; Rothstein et al, 2000; Raghuraman et al, 2001; Ivessa et al, 2003). In this study, we adopted a complementary ChIP-chip approach for assaying replication dynamics, in which we followed GINS complexes as they traverse the genome during the cell cycle (Figure 1). These data reveal that GINS binds to active replication origins and spreads bi-directionally and symmetrically as S phase progresses (Figure 3). The majority of origins appear to fire in the first ∼15 min of S phase. A small fraction (∼10%) of the origins to which GINS binds show no evidence of spreading (category 3 origins), although it remains possible that these peaks represent passively fired origins (Shirahige et al, 1998). Once an active origin fires, the GINS complex moves at an almost constant rate of 1.6±0.3 kb/min. Its movement through the inter-origin regions is consistent with that of a protein complex associated with a smoothly moving replication fork. 
This progression rate is considerably lower and more tightly distributed than those inferred from previous genome-wide measurements assayed through nascent DNA production (Raghuraman et al, 2001; Yabuki et al, 2002). Our study leads us to a different view of replication fork dynamics wherein fork progression is highly uniform in rate and little affected by genomic location.
In this work, we also observe a large number of low-intensity persistent features at sites of high transcriptional activity (e.g. tRNA genes). We were able to accurately simulate these features by assuming they are the result of low probability arrest of replication forks at these sites, rather than fork pausing (Deshpande and Newlon, 1996). The extremely low frequency of these events in wild-type cells suggests they are due to low probability stochastic occurrences during the replication process. It is hoped that future studies will resolve whether these persistent features indeed represent rare instances of fork arrest, or are the result of some alternative process. These may include, for example, the deposition of GINS complexes (or perhaps more specifically Psf2) once a pause has been resolved.
In this work, we have made extensive use of modeling to test a number of different hypotheses and assumptions. In particular, iterative modeling allowed us to infer that GINS progression is uniform and smooth throughout the genome. We have also demonstrated the potential of simulations for estimating firing efficiencies. In the future, extending such firing efficiency simulations to the whole genome should allow us to make correlations with chromosomal features such as nucleosome occupancy. Such correlations may help in determining factors that govern the probability of replication initiation throughout the genome.
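As a rough illustration of the uniform-progression picture described above, the following minimal sketch computes when a genomic position would be replicated if forks spread bidirectionally from each fired origin at one constant rate. It is not the authors' simulation: the function name, the origin positions and the firing times are hypothetical, and only the 1.6 kb/min rate is taken from the text.

```python
# Mean GINS progression rate reported in the text (kb per minute).
FORK_RATE_KB_PER_MIN = 1.6


def replication_time(position_kb, origins, rate=FORK_RATE_KB_PER_MIN):
    """Return the earliest time (min) at which any fork reaches position_kb.

    origins: list of (origin_position_kb, firing_time_min) tuples.
    Assumes every fork moves at the same constant rate in both directions
    and is never arrested (a simplification for illustration only).
    """
    return min(fire + abs(position_kb - pos) / rate for pos, fire in origins)


# Hypothetical example: two origins 80 kb apart, the second firing 10 min later.
example_origins = [(100.0, 0.0), (180.0, 10.0)]
midpoint_time = replication_time(140.0, example_origins)
```

Under these assumptions the replication-time profile is a set of straight-sided valleys centred on the origins, with slopes fixed by the single rate; departures from that shape in real data would point to variable fork speed or arrest.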
Previous studies have led to a picture wherein the replication of DNA progresses at variable rates over different parts of the budding yeast genome. These prior experiments, focused on production of nascent DNA, have been interpreted to imply that the dynamics of replication fork progression are strongly affected by local chromatin structure/architecture, and by interaction with machineries controlling transcription, repair and epigenetic maintenance. Here, we adopted a complementary approach for assaying replication dynamics using whole genome time-resolved chromatin immunoprecipitation combined with microarray analysis of the GINS complex, an integral member of the replication fork. Surprisingly, our data show that this complex progresses at highly uniform rates regardless of genomic location, revealing that replication fork dynamics in yeast is simpler and more uniform than previously envisaged. In addition, we show how the synergistic use of experiment and modeling leads to novel biological insights. In particular, a parsimonious model allowed us to accurately simulate fork movement throughout the genome and also revealed a subtle phenomenon, which we interpret as arising from low-frequency fork arrest.
PMCID: PMC2858444  PMID: 20212525
cell cycle; ChIP-chip; DNA replication; replication fork; simulation
23.  Whole genome duplication events in plant evolution reconstructed and predicted using myosin motor proteins 
The evolution of land plants is characterized by whole genome duplications (WGD), which drove species diversification and evolutionary novelties. Detecting these events is especially difficult if they date back to the origin of the plant kingdom. Established methods for reconstructing WGDs include intra- and inter-genome comparisons, KS age distribution analyses, and phylogenetic tree constructions.
By analysing 67 completely sequenced plant genomes, we identified and manually assembled 775 myosins. Phylogenetic trees of the myosin motor domains revealed orthologous and paralogous relationships and were consistent with recent species trees. Based on the myosin inventories and the phylogenetic trees, we have identified duplications of the entire myosin motor protein family at timings consistent with 23 previously reported WGDs. We also predict 6 WGDs based on further protein family duplications. Notably, the myosin data support the two recently reported WGDs in the common ancestor of all extant angiosperms. We predict single WGDs in the Manihot esculenta and Nicotiana benthamiana lineages, two WGDs for Linum usitatissimum and Phoenix dactylifera, and a triplication or two WGDs for Gossypium raimondii. Our data show another myosin duplication in the ancestor of the angiosperms that could be either the result of a single gene duplication or a remnant of a WGD.
We have shown that the myosin inventories in angiosperms retain evidence of numerous WGDs that happened throughout plant evolution. In contrast to other protein families, many myosins are still present in extant species. They are closely related and have similar domain architectures, and their phylogenetic grouping follows the genome duplications. Because of its broad taxonomic sampling, the dataset provides the basis for reliable future identification of further whole genome duplications.
PMCID: PMC3850447  PMID: 24053117
Myosin; Plant evolution; Whole genome duplication
24.  Identification and characterization of two wheat Glycogen Synthase Kinase 3/ SHAGGY-like kinases 
BMC Plant Biology  2013;13:64.
Plant Glycogen Synthase Kinase 3/SHAGGY-like kinases (GSKs) have been implicated in numerous biological processes ranging from embryonic, flower, and stomatal development to stress and wound responses. They are key regulators of brassinosteroid signaling and are also involved in the cross-talk between auxin and brassinosteroid pathways. In contrast to the human genome, which contains two genes, plant GSKs are encoded by a multigene family. Little is known about GSKs from Liliopsida, or more specifically Poaceae, in comparison to Brassicaceae GSKs. Here, we report the identification and structural characterization of two GSK homologs named TaSK1 and TaSK2 in the hexaploid wheat genome as well as a broad phylogenetic analysis of land plant GSKs.
Genomic and cDNA sequence alignments as well as chromosome localization using nullisomic-tetrasomic lines provided strong evidence for three expressed gene copies located on homoeologous chromosomes for TaSK1 as well as for TaSK2. Predicted proteins displayed a clear GSK signature. In vitro kinase assays showed that TaSK1 and TaSK2 possessed kinase activity. A phylogenetic analysis of land plant GSKs indicated that TaSK1 and TaSK2 belong to clade II of plant GSKs, the Arabidopsis members of which are all involved in brassinosteroid signaling. Based on a single ancestral gene in the last common ancestor of all land plants, paralogs were acquired and retained through paleopolyploidization events, resulting in six to eight genes in angiosperms. More recent duplication events have increased the number up to ten in some lineages.
To account for plant diversity in terms of functionality, morphology and development, attention has to be devoted to GSKs from Liliopsida, or more specifically Poaceae, in addition to Arabidopsis GSKs. In this study, molecular characterization, chromosome localization, kinase activity tests and phylogenetic analysis (1) clarified the homologous/paralogous versus homoeologous status of TaSK sequences, (2) established their affiliation to the GSK multigene family, (3) demonstrated functional kinase activity, (4) placed them in clade II, members of which are involved in BR signaling, and (5) provided information on the acquisition and retention of GSK paralogs in angiosperms in the context of whole genome duplication events. Our results provide a framework to explore the functions of GSKs from Liliopsida, or more specifically Poaceae, in development.
PMCID: PMC3637598  PMID: 23594413
SHAGGY-like kinase; GSK-3-like kinase; Poaceae; Wheat; Homologs; Homoeologs; Phylogenetic analysis; Brassinosteroid signaling
25.  Multiple Mechanisms Promote the Retained Expression of Gene Duplicates in the Tetraploid Frog Xenopus laevis 
PLoS Genetics  2006;2(4):e56.
Gene duplication provides a window of opportunity for biological variants to persist under the protection of a co-expressed copy with similar or redundant function. Duplication catalyzes innovation (neofunctionalization), subfunction degeneration (subfunctionalization), and genetic buffering (redundancy), and the genetic survival of each paralog is triggered by mechanisms that add, compromise, or do not alter protein function. We tested the applicability of three types of mechanisms for promoting the retained expression of duplicated genes in 290 expressed paralogs of the tetraploid clawed frog, Xenopus laevis. Tests were based on explicit expectations concerning the ka/ks ratio, and the number and location of nonsynonymous substitutions after duplication. Functional constraints on the majority of paralogs are not significantly different from a singleton ortholog. However, we recover strong support that some of them have an asymmetric rate of nonsynonymous substitution: 6% match predictions of the neofunctionalization hypothesis in that (1) each paralog accumulated nonsynonymous substitutions at a significantly different rate and (2) the one that evolves faster has a higher ka/ks ratio than the other paralog and than a singleton ortholog. Fewer paralogs (3%) exhibit a complementary pattern of substitution at the protein level that is predicted by enhancement or degradation of different functional domains, and the remaining 13% have a higher average ka/ks ratio in both paralogs that is consistent with altered functional constraints, diversifying selection, or activity-reducing mutations after duplication. We estimate that these paralogs have been retained since they originated by genome duplication between 21 and 41 million years ago. Multiple mechanisms operate to promote the retained expression of duplicates in the same genome, in genes in the same functional class, over the same period of time following duplication, and sometimes in the same pair of paralogs. 
None of these paralogs are superfluous; degradation or enhancement of different protein subfunctions and neofunctionalization are plausible hypotheses for the retained expression of some of them. Evolution of most X. laevis paralogs, however, is consistent with retained expression via mechanisms that do not radically alter functional constraints, such as selection to preserve post-duplication stoichiometry or temporal, quantitative, or spatial subfunctionalization.
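The ka/ks-based expectations described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the study relies on explicit statistical tests, whereas the hypothetical `describe_pair` function below compares point estimates only, with an assumed tolerance `tol` standing in for a significance test. The `Branch` class, the labels, and the example values are all illustrative assumptions. The complementary-substitution pattern is not covered, because it depends on where substitutions fall within the protein.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """Substitution rate estimates for one lineage (hypothetical container)."""
    ka: float  # nonsynonymous substitutions per nonsynonymous site
    ks: float  # synonymous substitutions per synonymous site

    @property
    def ratio(self) -> float:
        if self.ks <= 0:
            raise ValueError("ks must be positive to form a ka/ks ratio")
        return self.ka / self.ks


def describe_pair(a: Branch, b: Branch, singleton: Branch, tol: float = 0.05) -> str:
    """Label a paralog pair from point estimates alone.

    `tol` is an assumed stand-in for a real significance test.
    """
    # Identify the faster- and slower-evolving paralog by nonsynonymous rate.
    fast, slow = (a, b) if a.ka >= b.ka else (b, a)
    rates_differ = fast.ka - slow.ka > tol

    # Asymmetric pattern: the rates differ, and the faster paralog has a higher
    # ka/ks ratio than both the other paralog and the singleton ortholog.
    if (rates_differ
            and fast.ratio > slow.ratio + tol
            and fast.ratio > singleton.ratio + tol):
        return "asymmetric"

    # Elevated pattern: the average ka/ks of the two paralogs exceeds the singleton's.
    if (a.ratio + b.ratio) / 2 > singleton.ratio + tol:
        return "elevated average"

    return "similar to singleton"


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(describe_pair(Branch(0.30, 0.5), Branch(0.05, 0.5), Branch(0.05, 0.5)))
```

In practice the point-estimate comparisons would be replaced by likelihood-based or resampling tests, since small differences in ka/ks are not meaningful on their own.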
Gene duplication plays a fundamental role in biological innovation, but it is not clear how both copies of a duplicated gene manage to circumvent degradation by mutation if neither is unique. This study explores genetic mechanisms that could make each copy of a duplicate gene different, and therefore distinguishable and potentially preserved by natural selection. It is based on DNA sequences of the protein-coding region of 290 expressed duplicated genes in a frog, Xenopus laevis, that underwent duplication of its entire genome. Results provide evidence for multiple mechanisms acting within the same genome, within the same functional classes of genes, within the same period of time following duplication, and even on the same set of duplicated genes. Each copy of a duplicate gene may be subject to distinct evolutionary constraints, and this could be associated with degradation or enhancement of function. Functional constraints of most of these duplicates, however, are not substantially different from those of a single-copy gene; their persistence in the first tens of millions of years after duplication may more frequently be explained by mechanisms acting on their expression rather than their function.
PMCID: PMC1449897  PMID: 16683033

