1.  Genome-wide transcriptional plasticity underlies cellular adaptation to novel challenge 
By recruiting the essential HIS3 gene to the GAL regulatory system and switching to a repressing glucose medium, we confronted yeast cells with a novel challenge they had not encountered during their evolutionary history. Adaptation to this challenge involved a global transcriptional response of a sizeable fraction of the genome, which relaxed on the time scale of the population adaptation, of the order of 10 generations. For a large fraction of the responding genes there is no simple biological interpretation connecting them to the specific cellular demands imposed by the novel challenge. Strikingly, repeating the experiment did not reproduce similar transcription patterns, either in the transient phase or in the adapted state in glucose. These results suggest that physiological selection operates on the new metabolic configurations generated by the non-specific large-scale transcriptional response to eventually stabilize an adaptive state.
Cells adjust their transcriptional state to accommodate environmental and genetic perturbations. Some common perturbations, such as changes in nutrient composition, elicit well-characterized transcriptional responses that can be understood by simple engineering-like design principles as satisfying specific demands imposed by the perturbation. However, cells can also adapt to novel and unforeseen challenges. This ability is central to realizing the evolvability potential of cells as they respond to dramatic genetic or environmental changes during evolution. Little is known about the mechanisms underlying such adaptations to novel challenges; in particular, the role of the transcriptional regulatory network in such adaptations has not been characterized. Genome-wide measurements have revealed that, in many cases, perturbations lead to a global transcriptional response involving a sizeable fraction of the genome (Gasch et al, 2000; Jelinsky et al, 2000; Causton et al, 2001; Ideker et al, 2001; Lai et al, 2005). Such global behavior suggests that general collective properties of the genetic network, rather than specific pre-designed pathways, determine an important part of the transcriptional response. It is not known, however, what fraction of the genes within such massive transcriptional responses is essential for meeting the specific cellular demands. It is also unknown whether the non-pre-designed part of the response can have a functional role in adaptation to novel challenges.
To study these questions, we confronted yeast cells with a novel challenge they had not encountered during their evolutionary history. A strain of the yeast Saccharomyces cerevisiae was engineered to recruit the gene HIS3, which encodes an essential enzyme of the histidine biosynthesis pathway (Hinnebusch, 1992), to the GAL regulatory system, which is responsible for galactose utilization (Stolovicki et al, 2006). The GAL system is known to be strongly repressed when the cells are exposed to glucose. Therefore, upon switching to a medium containing glucose and lacking histidine, the GAL system, and with it HIS3, is highly repressed immediately following the switch, and the cells encounter a severe challenge. We have recently shown that a cell population carrying this rewired genome can adapt to grow competitively in a chemostat in a medium containing pure glucose (Stolovicki et al, 2006). This adaptation occurred on a timescale of ∼10 generations. Applying a stronger environmental pressure in the form of 3-aminotriazole (3AT), a competitive inhibitor of the HIS3 gene product, resulted in a similar adaptation, albeit on a longer timescale. Figure 1 shows the dynamics of the population's cell density (blue lines, measured by OD) following a medium switch from galactose to glucose in the chemostat without (A) and with (B) 3AT. The experiments revealed that adaptation occurs on physiological timescales (much shorter than required by spontaneous random mutations), but the mechanisms underlying this adaptation have remained unclear (Stolovicki et al, 2006).
Yeast cells had not encountered recruitment of HIS3 to the GAL system along their evolutionary history, and their genome could not possibly have been selected to specifically address glucose repression of HIS3. This experiment, therefore, provides a unique opportunity to characterize the spontaneous transcriptional response during adaptation to a novel challenge and to assess the functional role of the regulatory system in this adaptation. We used DNA microarrays to measure the genome-wide expression levels at time points along the adaptation process, with and without 3AT. These measurements revealed that a sizeable fraction of the genome responded by induction or repression to the switch into glucose. Superimposed on the OD traces, Figure 1 shows the results of a clustering analysis of the expression of genes as measured by the arrays along time in the experiments. This analysis revealed two dominant clusters, each containing hundreds of genes in each experiment, which responded to the medium switch to glucose by a strong transient induction or repression followed by relaxation to steady state on the timescale of the adaptation process, ∼ 10 generations. The two clusters in each experiment show similar but opposite dynamics.
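The paragraph describes grouping genes by their expression time courses into one cluster showing transient induction and one showing transient repression. The clustering algorithm used is not stated here. As a simplified, hypothetical stand-in, the sketch below assigns each gene by the sign of its largest log2 deviation from the pre-switch level. It keeps only transient responders, meaning genes that partially relax by the last time point. The gene names, values and the `threshold` parameter are illustrative assumptions.

```python
from typing import Dict, List

def classify_transient(profiles: Dict[str, List[float]], threshold: float = 1.0) -> Dict[str, List[str]]:
    """Split genes into induced/repressed groups by their peak log2 deviation.

    profiles maps gene -> log2(expression / pre-switch expression) at successive time points.
    """
    groups: Dict[str, List[str]] = {"induced": [], "repressed": []}
    for gene, series in profiles.items():
        # signed value of the largest deviation from the pre-switch level
        peak = max(series, key=abs)
        if abs(peak) < threshold:
            continue  # weak responder: left unassigned
        if abs(series[-1]) >= abs(peak):
            continue  # no relaxation toward steady state, so not a transient response
        groups["induced" if peak > 0 else "repressed"].append(gene)
    return groups

# Hypothetical time courses (log2 ratios) for four genes
profiles = {
    "g1": [0.0, 2.0, 1.5, 0.5],     # strong transient induction, then relaxation
    "g2": [0.0, -1.8, -1.0, -0.3],  # strong transient repression, then relaxation
    "g3": [0.0, 0.2, 0.1, 0.0],     # below threshold
    "g4": [0.0, 1.2, 1.5, 2.0],     # still rising at the last time point
}
print(classify_transient(profiles))  # {'induced': ['g1'], 'repressed': ['g2']}
```

A real analysis would cluster whole trajectories, for example with k-means on normalized profiles. The sign-of-peak rule only mirrors the two opposite-dynamics groups described in the text.
<test>
assert classify_transient(profiles) == {"induced": ["g1"], "repressed": ["g2"]}
assert classify_transient({"x": [0.0, -3.0, -1.0]}, threshold=2.0) == {"induced": [], "repressed": ["x"]}
assert classify_transient({"y": [0.0, 0.5]}) == {"induced": [], "repressed": []}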
A detailed analysis of the gene content in the two clusters revealed that only a small portion of the response was induced by a change in carbon source (15% overlap between the corresponding clusters in the two experiments, with and without 3AT). Moreover, it revealed a very low overlap with the universal stress response observed for a wide range of environmental stresses (Gasch et al, 2000; Causton et al, 2001) and with the typical response to amino-acid starvation (Natarajan et al, 2001). Additionally, all known specific responses to stress in the literature are characterized by transient induction or repression with relaxation to steady state within a generation time (Gasch et al, 2000; Koerkamp et al, 2002; Wu et al, 2004), whereas in our experiments relaxation of the transcriptional response occurs over many generations. Taken together, these results show that the transcriptional response observed here is neither a metabolic response to the change in carbon source nor is it a standard response to stress or amino-acid starvation. This raises the possibility that it is a spontaneous collective response that is largely composed of genes that do not have a specific function. This possibility was tested directly by repeating the experiment with different populations and comparing their responses. This procedure revealed reproducible adaptation dynamics and steady states in terms of population density, but showed significantly different transcriptional transient responses and steady states for the two repeated experiments. Thus, a significant portion of the genes that changed their expression during the adaptation process do not have a well-defined and reproducible function in the challenging environment.
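Comparing the gene content of clusters (with and without 3AT, or across repeated experiments) comes down to a set overlap plus a test of whether that overlap exceeds chance. The text does not say which overlap convention gives the "15%" figure. The sketch below therefore assumes "fraction of the smaller set" and pairs it with a one-sided hypergeometric tail. The function names and toy numbers are hypothetical.

```python
from math import comb

def overlap_fraction(a: set, b: set) -> float:
    """Shared genes as a fraction of the smaller set (one possible convention)."""
    return len(a & b) / min(len(a), len(b))

def hypergeom_sf(k: int, n_genome: int, n_a: int, n_b: int) -> float:
    """P(overlap >= k) when n_b genes are drawn at random from n_genome genes,
    n_a of which belong to the first set (one-sided hypergeometric tail)."""
    top = min(n_a, n_b)
    # math.comb returns 0 when the lower argument exceeds the upper one
    hits = sum(comb(n_a, i) * comb(n_genome - n_a, n_b - i) for i in range(k, top + 1))
    return hits / comb(n_genome, n_b)

cluster_no_3at = {"a", "b", "c", "d"}
cluster_3at = {"c", "d", "e"}
print(overlap_fraction(cluster_no_3at, cluster_3at))  # 0.6666666666666666
print(hypergeom_sf(5, 10, 5, 5))  # chance of a complete 5/5 overlap among 10 genes
```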
The application of a stronger environmental pressure in the form of 3AT had a dramatic effect on the global characteristics of the transcriptional response: it induced a markedly higher correlation among the hundreds of responding genes. Figure 3A compares the array data in color code for the two experiments. The emergent pattern of transcription exhibits a higher degree of order upon the introduction of high external pressure. Observation of the transcriptional patterns for specific metabolic pathways illustrates the different contributions to the correlated dynamics (Figure 3B–D). A general energetic module such as glycolysis exhibited similar patterns of induction and relaxation in experiments with and without 3AT (Figure 3B). However, in general, we found that more than one-third of the known metabolic modules (30 out of 88 modules described in KEGG) exhibited high expression correlation among their genes when the environmental pressure was high but not when it was low. As an example, Figure 3C shows the histidine biosynthesis pathway and Figure 3D the purine pathway. Note the highly ordered trajectories in the lower panels (with 3AT) compared to the disordered ones in the upper panels (no 3AT). This order also extends to genes belonging to different and even distant metabolic modules. This indicates that a global transcriptional regulatory mechanism is in operation, rather than a local specific one. Surprisingly, genes belonging to the same metabolic pathway exhibited simultaneous positively and negatively correlated dynamics. Thus, an important conclusion of this work is that the global transcriptional response to a novel challenge cannot be explained by a simple cellular or metabolic logic. This is to be expected if the response had not been specifically selected in evolution and was not pre-designed for the challenge.
Our data clearly reveal that the massive transcriptional response underlies the adaptation process to a novel challenge. The novelty of the challenge presented to the cells excludes the possibility that this response has been specifically selected toward this challenge. Thus, transcriptional regulation has dynamic properties resulting in a general massive nonspecific response to a novel perturbation. Such a response in turn allows for metabolic rearrangements, which by feeding back on transcription lead to adaptation of the cells to the unforeseen situation. The drastic change in the expression state of the cell opens multiple new metabolic pathways. Physiological selection then works on these multiple metabolic pathways to stabilize an adaptive state that causes relaxation of the perturbed expression pattern. This scenario, involving the creation of a library of possibilities and physiological selection over this library, is compatible with our understanding of a broad class of biological systems, placing the cellular metabolic/regulatory networks on the same footing as the neural or the immune systems (Gerhart and Kirschner, 1997).
Cells adjust their transcriptional state to accommodate environmental and genetic perturbations. An open question is to what extent transcriptional response to perturbations has been specifically selected along evolution. To test the possibility that transcriptional reprogramming does not need to be ‘pre-designed' to lead to an adaptive metabolic state on physiological timescales, we confronted yeast cells with a novel challenge they had not previously encountered. We rewired the genome by recruiting an essential gene, HIS3, from the histidine biosynthesis pathway to a foreign regulatory system, the GAL network responsible for galactose utilization. Switching medium to glucose in a chemostat caused repression of the essential gene and presented the cells with a severe challenge to which they adapted over approximately 10 generations. Using genome-wide expression arrays, we show here that a global transcriptional reprogramming (>1200 genes) underlies the adaptation. A large fraction of the responding genes is nonreproducible in repeated experiments. These results show that a nonspecific transcriptional response reflecting the natural plasticity of the regulatory network supports adaptation of cells to novel challenges.
doi:10.1038/msb4100147
PMCID: PMC1865588  PMID: 17453047
adaptation; cellular metabolism; expression arrays; plasticity; transcriptional response
2.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are ‘hidden' due to buffering. Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function. The genetic interactions of duplicate genes evolve in an extremely asymmetric way, and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes. Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serve as a primary source of new genes and new functions and, as such, have broad implications for the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than perturbation of singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes and profiles of such interactions across the entire genome provide a new context in which to examine the properties of duplicate compensation.
In this study we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability of duplicate genes to buffer the deletion of a partner has three main consequences. First, in agreement with previous work, a high proportion of duplicate pairs are synthetic lethal, a classic indication of their ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, buffering reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions relating to their common function from experimental detection. Third, this buffering of common interactions reduces profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, in a comparison with their local neighbors on a protein–protein interaction (PPI) network, we show that although dosage-mediated duplicates more often have higher similarity to each other than to their neighbors, the reverse is true for duplicates in general. That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than with each other, an observation consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than, for example, protein–protein interactions, they offer a more direct measurement of functional consequence and may therefore give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and tends to evolve at a slower rate (Figure 4B).
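The null model in this paragraph redistributes a pair's interactions between the two sisters with equal probability. For a single pair this null has an exact binomial form, shown in the sketch below. It is an analytical counterpart to the random simulations described, not the authors' code. The function name is hypothetical.

```python
from math import comb

def asymmetry_pvalue(n_a: int, n_b: int) -> float:
    """Probability, if each of the n_a + n_b interactions went to either sister with
    probability 1/2, of a split at least as lopsided as the observed (n_a, n_b)."""
    n = n_a + n_b
    observed_minor = min(n_a, n_b)
    # for fixed n, a smaller minority count means a larger major:minor ratio
    lopsided = sum(comb(n, k) for k in range(n + 1) if min(k, n - k) <= observed_minor)
    return lopsided / 2 ** n

# A 14:2 split is a 7:1 ratio; splits this extreme arise by chance with probability ~0.0042
print(asymmetry_pvalue(14, 2))
```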
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicate pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encodes a protein that forms a distinct heterodimeric complex with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depends on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
doi:10.1038/msb.2010.82
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
3.  Network Hubs Buffer Environmental Variation in Saccharomyces cerevisiae 
PLoS Biology  2008;6(11):e264.
Regulatory and developmental systems produce phenotypes that are robust to environmental and genetic variation. A gene product that normally contributes to this robustness is termed a phenotypic capacitor. When a phenotypic capacitor fails, for example when challenged by a harsh environment or mutation, the system becomes less robust and thus produces greater phenotypic variation. A functional phenotypic capacitor provides a mechanism by which hidden polymorphism can accumulate, whereas its failure provides a mechanism by which evolutionary change might be promoted. The primary example to date of a phenotypic capacitor is Hsp90, a molecular chaperone that targets a large set of signal transduction proteins. In both Drosophila and Arabidopsis, compromised Hsp90 function results in pleiotropic phenotypic effects dependent on the underlying genotype. For some traits, Hsp90 also appears to buffer stochastic variation, yet the relationship between environmental and genetic buffering remains an important unresolved question. We previously used simulations of knockout mutations in transcriptional networks to predict that many gene products would act as phenotypic capacitors. To test this prediction, we use high-throughput morphological phenotyping of individual yeast cells from single-gene deletion strains to identify gene products that buffer environmental variation in Saccharomyces cerevisiae. We find more than 300 gene products that, when absent, increase morphological variation. Overrepresented among these capacitors are gene products that control chromosome organization and DNA integrity, RNA elongation, protein modification, cell cycle, and response to stimuli such as stress. Capacitors have a high number of synthetic-lethal interactions but knockouts of these genes do not tend to cause severe decreases in growth rate. Each capacitor can be classified based on whether or not it is encoded by a gene with a paralog in the genome. 
Capacitors with a duplicate are highly connected in the protein–protein interaction network and show considerable divergence in expression from their paralogs. In contrast, capacitors encoded by singleton genes are part of highly interconnected protein clusters whose other members also tend to affect phenotypic variability or fitness. These results suggest that buffering and release of variation is a widespread phenomenon that is caused by incomplete functional redundancy at multiple levels in the genetic architecture.
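The screen flags gene deletions that increase morphological variation relative to wild type. The statistic used is not given here. The sketch below therefore assumes a simple coefficient-of-variation (CV) comparison for one trait. The `fold=1.5` cutoff, the strain names and the trait values are hypothetical.

```python
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation: standard deviation relative to the mean."""
    return stdev(values) / mean(values)

def flag_capacitors(wild_type, knockouts, fold=1.5):
    """Knockout strains whose trait CV exceeds `fold` times the wild-type CV."""
    base = cv(wild_type)
    return sorted(name for name, vals in knockouts.items() if cv(vals) > fold * base)

# Hypothetical single-cell measurements of one morphological trait
wild_type = [10, 11, 9, 10, 10]
knockouts = {
    "ko1": [10, 14, 6, 10, 10],  # same mean, much wider spread
    "ko2": [10, 11, 9, 10, 10],  # indistinguishable from wild type
}
print(flag_capacitors(wild_type, knockouts))  # ['ko1']
```

Using the CV rather than the raw variance keeps strains with a larger mean trait value from being flagged only because of scale. A real screen would also need a significance test across many traits and cells.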
Author Summary
Most species maintain abundant genetic variation and experience a wide range of environmental conditions, yet phenotypic differences between individuals are usually small. This phenomenon, known as phenotypic robustness, presents an apparent contradiction: if biological systems are so resistant to variation, how do they diverge and adapt through evolutionary time? Here, we address this question by investigating the molecular mechanisms that underlie phenotypic robustness and how these mechanisms can be broken to produce phenotypic heterogeneity. We identify genes that contribute to phenotypic robustness in yeast by analyzing the variance of morphological phenotypes in a comprehensive collection of single-gene knockout strains. We find that ∼5% of yeast genes break phenotypic robustness when knocked out. The products of these genes tend to be involved in critical cellular processes, including maintaining DNA stability, processing RNA, modifying proteins, and responding to stressful environments. These genes tend to interact genetically with a large number of other genes, and their products tend to interact physically with a large number of other gene products. Our results suggest that loss of phenotypic robustness might be a common phenomenon during evolution that occurs when cellular networks are disrupted.
A genome-wide screen in Saccharomyces cerevisiae identifies over 300 gene products, dubbed phenotypic capacitors, that buffer environmental variation and function as hubs in protein-protein and synthetic-lethal interaction networks.
doi:10.1371/journal.pbio.0060264
PMCID: PMC2577700  PMID: 18986213
4.  Metabolic modeling of endosymbiont genome reduction on a temporal scale 
This study explores the order in which individual metabolic genes are lost in an in silico evolutionary process leading from the metabolic network of Escherichia coli to that of the genome-reduced endosymbiont Buchnera aphidicola.
Simulating the reductive evolutionary process under several growth conditions yields a remarkable correlation between in silico and phylogenetically reconstructed gene loss times. A gene's k-robustness (its depth of backups) is a prime determinant of its loss time. In silico gene loss times are better predictors of actual loss times than genomic features and network properties. Simulating the reductive evolutionary process as the loss of large blocks followed by single-gene deletions, as is known to occur in evolution, yields a remarkable correspondence with the phylogenetic reconstruction and with the block loss reported in the literature.
An open fundamental challenge in Systems Biology is whether a genome-scale model can predict patterns of genome evolution by realistically accounting for the associated biochemical constraints. In this study, we explore the order in which individual genes are lost in an in silico evolutionary process, leading from the metabolic network of Escherichia coli to that of the endosymbiont Buchnera aphidicola.
To evaluate the in silico gene loss time, we repeated the reductive evolutionary process introduced by Pál et al (2006), denoting the in silico deletion time of a gene in a single run of the reductive evolutionary process as the number of genes deleted before its own deletion occurred. By comparing the in silico evaluations of the gene loss time to those obtained by a phylogenetic reconstruction (Figure 1), we could evaluate the ability of an in silico process to predict temporal patterns of genome reduction. Applying this procedure to a literature-based viable medium, we obtained a mean Spearman's correlation of 0.46 (53% of the maximal correlation, empirical P-value <9.9e−4) between in silico and phylogenetically reconstructed loss times. In order to provide an upper bound on evolutionary necessity stemming from metabolic constraints, we searched the space of potential growth media and biomass functions via a simulated annealing search algorithm aimed at identifying an environment/biomass function that maximizes the target correlation between in silico and reconstructed loss times. Simulating the reductive evolutionary process under the growth conditions and biomass function obtained in this process, we improved the correlation between in silico and reconstructed loss times to a mean Spearman's correlation of 0.54 (63% of the maximal correlation, empirical P-value <9.9e−4, Figure 3).
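The loss time defined above is "the number of genes deleted before its own deletion" in one random run. The sketch below reproduces that bookkeeping on a toy model. The published process checks viability with a genome-scale metabolic model, so a hypothetical stand-in replaces it here: the cell is viable if every essential function keeps at least one of its alternative genes. All gene and function names are illustrative.

```python
import random
from statistics import mean

def viable(genome, functions):
    """Toy growth check: each essential function keeps at least one gene."""
    return all(genome & alternatives for alternatives in functions)

def reductive_run(genes, functions, rng):
    """One run: visit genes in random order, deleting each one whose loss keeps the
    cell viable. Removing genes never restores viability in this toy model, so a
    single pass ends in a genome from which no further gene can be deleted."""
    genome = set(genes)
    loss_time = {}
    order = sorted(genome)
    rng.shuffle(order)
    for gene in order:
        trial = genome - {gene}
        if viable(trial, functions):
            loss_time[gene] = len(loss_time)  # number of genes deleted before this one
            genome = trial
    return loss_time, genome

def mean_loss_times(genes, functions, runs=200, seed=0):
    """Average loss time per gene across runs; genes never lost get no entry."""
    rng = random.Random(seed)
    samples = {g: [] for g in genes}
    for _ in range(runs):
        loss_time, _ = reductive_run(genes, functions, rng)
        for g, t in loss_time.items():
            samples[g].append(t)
    return {g: mean(ts) for g, ts in samples.items() if ts}

# Hypothetical network: a and b back each other up, c is essential, d is dispensable
genes = ["a", "b", "c", "d"]
functions = [{"a", "b"}, {"c"}]
print(mean_loss_times(genes, functions))
```

The averaged times would then be rank-correlated with the phylogenetically reconstructed loss times.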
Examining the dependency of the predicted loss time of each gene on its intrinsic network-level properties we find a very strong inverse Spearman's correlation of −0.84 (empirical P-value <9.9e−4) between the order of gene loss predicted in silico and the k-robustness levels of the genes, the latter denoting the depth of their functional backups in the network (Deutscher et al, 2006). Moreover, in order to examine whether the relative loss time of a gene is influenced by its functional dependencies with other genes, we performed a flux-coupling analysis and identified pairs of reactions whose activities asymmetrically depend on each other, i.e., are directionally coupled (Burgard et al, 2004). We find that genes encoding reactions whose activity is needed for activating the other reaction (and not vice versa) have a tendency to be lost later, as one would expect (binomial P-value <1e−14).
To put these results in perspective, we examined as a control how well genomic features and network properties predict the phylogenetically reconstructed gene loss times. We examined the dependency of the latter on several factors that are known to be inversely correlated with the propensity of a gene to be lost (Brinza et al, 2009; Delmotte et al, 2006; Tamames et al, 2007), including the genes' mRNA levels, tRNA adaptation index (tAI) values (Covert et al, 2004; Reis et al, 2004; Sharp and Li, 1987; Tuller et al, 2010a) and the number of partners the gene products have in a protein–protein interaction network. Remarkably, these genomic features yield considerably lower Spearman's correlations than that obtained by the in silico simulations. Moreover, multiply regressing the loss times from the phylogenetic reconstruction on the in silico gene loss time predictions and the genomic and network variables, we found that the (normalized) coefficient of the in silico predictions in the regression is much higher than those of the genomic features, further testifying to the considerable independent predictive power of the metabolic model.
Finally, simulating the evolutionary process as initial large block deletions followed by single-gene deletions, as is thought to occur in evolution (Moran and Mira, 2001; van Ham et al, 2003), yielded a remarkable correspondence with the phylogenetic reconstruction. Namely, we find that after a certain number of genes have been deleted from the genome, no further block deletions can occur because of the increasing density of essential genes. Notably, the maximum number of genes that can be deleted in blocks (i.e., until no more blocks can be deleted) corresponds to the number of genes appearing in our phylogenetic reconstruction from the LCA (last common ancestor of Buchnera and E. coli) to the LCSA (last common symbiotic ancestor, nodes 1–3 in Figure 1A), as described in the literature.
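The two-phase scheme can be sketched on a toy chromosome. In the first phase, windows of adjacent surviving genes are removed while some removable window exists. In the second phase, single genes are deleted. The block size, the gene order and the "every essential function keeps one gene" viability rule are hypothetical stand-ins for the metabolic model. The sketch only shows why block deletion stalls once essential genes become dense.

```python
import random

def viable(genome, functions):
    """Toy growth check: each essential function keeps at least one gene."""
    return all(set(genome) & alternatives for alternatives in functions)

def block_then_single(chromosome, functions, block=3, seed=0):
    """Delete random viable blocks of adjacent genes until none remain, then single genes.
    Returns the number of genes lost in blocks and the final genome."""
    rng = random.Random(seed)
    genome = list(chromosome)
    block_deleted = 0
    while True:
        # start positions of every window of `block` adjacent genes whose removal is tolerated
        windows = [i for i in range(len(genome) - block + 1)
                   if viable(genome[:i] + genome[i + block:], functions)]
        if not windows:
            break  # essential genes are now too dense for any further block deletion
        i = rng.choice(windows)
        del genome[i:i + block]
        block_deleted += block
    order = genome[:]
    rng.shuffle(order)
    for gene in order:
        trial = [g for g in genome if g != gene]
        if viable(trial, functions):
            genome = trial
    return block_deleted, genome

# Hypothetical 10-gene chromosome with essential genes a, e and j
chromosome = list("abcdefghij")
functions = [{"a"}, {"e"}, {"j"}]
print(block_then_single(chromosome, functions))  # 6 genes lost in blocks; ['a', 'e', 'j'] remain
```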
A fundamental challenge in Systems Biology is whether a cell-scale metabolic model can predict patterns of genome evolution by realistically accounting for associated biochemical constraints. Here, we study the order in which genes are lost in an in silico evolutionary process, leading from the metabolic network of Escherichia coli to that of the endosymbiont Buchnera aphidicola. We examine how this order correlates with the order by which the genes were actually lost, as estimated from a phylogenetic reconstruction. By optimizing this correlation across the space of potential growth and biomass conditions, we compute an upper bound estimate on the model's prediction accuracy (R=0.54). The model's network-based predictive ability outperforms predictions obtained using genomic features of individual genes, reflecting the effect of selection imposed by metabolic stoichiometric constraints. Thus, while the timing of gene loss might be expected to be a completely stochastic evolutionary process, remarkably, we find that metabolic considerations, on their own, make a marked 40% contribution to determining when such losses occur.
doi:10.1038/msb.2011.11
PMCID: PMC3094061  PMID: 21451589
constraint-based modeling; endosymbiont; evolution; metabolism
5.  Niche adaptation by expansion and reprogramming of general transcription factors 
Experimental analysis of TFB family proteins in a halophilic archaeon reveals complex environment-dependent fitness contributions. Gene conversion events among these proteins can generate novel niche adaptation capabilities, a process that may have contributed to archaeal adaptation to extreme environments.
Evolution of archaeal lineages correlates with duplication events in the TFB family. Each TFB is required for adaptation to multiple environments. The relative fitness contributions of TFBs change with environmental context. Changes in the regulation of duplicated TFBs can generate new adaptation capabilities.
The evolutionary success of an organism depends on its ability to continually adapt to changes in the patterns of constant, periodic, and transient challenges within its environment. This process of ‘niche adaptation' requires reprogramming of the organism's environmental response networks by reorganizing interactions among diverse parts including environmental sensors, signal transducers, and transcriptional and post-transcriptional regulators. Gene duplication has been found to be one of the principal strategies in this process, especially for reprogramming of gene regulatory networks (GRNs). Whereas eukaryotes require dozens of factors for recruitment of RNA polymerase, archaea require just two general transcription factors (GTFs) that are orthologous to eukaryotic TFIIB (TFB in archaea) and TATA-binding protein (TBP) (Bell et al, 1998). Both of these GTFs have expanded extensively in nearly 50% of all archaea whose genomes have been fully sequenced. The phylogenetic analysis presented in this study reveals lineage-specific expansions of TFBs, suggesting that they might encode functionally specialized gene regulatory programs for the unique environments to which these organisms have adapted. This hypothesis is particularly appealing when we consider that the greatest expansion is observed within the group of halophilic archaea whose habitats are associated with routine and dynamic changes in a number of environmental factors including light, temperature, oxygen, salinity, and ionic composition (Rodriguez-Valera, 1993; Litchfield, 1998).
We have previously demonstrated that variations in the expanded set of TFBs (a through e) in Halobacterium salinarum NRC-1 manifest at the level of physical interactions within and across the two families, their DNA-binding specificity, their differential regulation in varying environments, and, ultimately, in the large-scale segregation of transcription of all genes into overlapping yet distinct sets of functionally related groups (Facciotti et al, 2007). We have extended findings from this earlier study with a systematic survey of the fitness consequences of perturbing the TFB network of H. salinarum NRC-1 across 17 environments. Notably, each TFB conferred fitness in two or more of the environmental conditions tested, and the relative fitness contributions (see Table I) of the five TFBs varied significantly by environment. From an evolutionary perspective, the relationships among these fitness landscapes reveal that two classes of TFBs (c/g- and f-type) appear to have played an important role in the evolution of halophilic archaea by overseeing regulation of core physiological capabilities in these organisms. TFBs of the other clades (b/d and a/e) seem to have emerged much more recently through gene duplications or horizontal gene transfers (HGTs) and are being utilized for adaptation to specialized environmental conditions.
We also investigated higher-order functional interactions and relationships among the duplicated TFBs by performing competition experiments and by mapping genetic interactions in different environments. This demonstrated that, depending on environmental context, the TFBs have strikingly different functional hierarchies and genetic interactions with one another. This is remarkable because it makes each TFB essential, albeit at different times, in a dynamically changing environment.
In order to understand the process by which such gene family expansions shape architecture and functioning of a GRN, we performed integrated analysis of phylogeny, physical interactions, regulation, and fitness landscapes of the seven TFBs in H. salinarum NRC-1. This revealed that evolution of both their protein-coding sequence and their promoter has been instrumental in the encoding of environment-specific regulatory programs. Importantly, the convergent and divergent evolution of regulation and binding properties of TFBs suggested that, aside from HGT and random mutations, a third plausible (and perhaps most interesting) mechanism for acquiring a novel TFB variant is through gene conversion. To test this hypothesis, we synthesized a novel TFBx by transferring TFBa/e clade-specific residues to a TFBd backbone, transformed this variant under the control of either the TFBd or the TFBe promoter (PtfbD or PtfbE) into three different host genetic backgrounds (Δura3 (parent), ΔtfbD, and ΔtfbE), and analyzed fitness and gene expression patterns during growth at 25 and 37°C. This showed that gene conversion events spanning the coding sequence and the promoter, environmental context, and genetic background of the host are all extremely influential in the functional integration of a TFB into the GRN. Importantly, this analysis suggested that altering the regulation of an existing set of expanded TFBs might be an efficient mechanism to reprogram the GRN to rapidly generate novel niche adaptation capability. We have confirmed this experimentally by increasing fitness merely by moving tfbE to PtfbD control, and by generating a completely novel phenotype (biofilm-like appearance) by overexpression of tfbE.
Altogether, this study clearly demonstrates that archaea can rapidly generate novel niche adaptation programs by simply altering regulation of duplicated TFBs. This is significant because expansions in the TFB family are widespread in archaea, a class of organisms that not only represent 20% of biomass on earth but are also known to have colonized some of the most extreme environments (DeLong and Pace, 2001). This strategy for niche adaptation is further expanded through interactions of the multiple TFBs with members of other expanded TF families such as TBPs (Facciotti et al, 2007) and sequence-specific regulators (e.g. Lrp family (Peeters and Charlier, 2010)). This is analogous to combinatorial solutions for other complex biological problems such as recognition of pathogens by Toll-like receptors (Roach et al, 2005), generation of antibody diversity by V(D)J recombination (Early et al, 1980), and recognition and processing of odors (Malnic et al, 1999).
Numerous lineage-specific expansions of the transcription factor B (TFB) family in archaea suggest an important role for expanded TFBs in encoding environment-specific gene regulatory programs. Given the characteristics of hypersaline lakes, the unusually large numbers of TFBs in halophilic archaea further suggest that they might be especially important in rapid adaptation to the challenges of a dynamically changing environment. Motivated by these observations, we have investigated the implications of TFB expansions by correlating sequence variations, regulation, and physical interactions of all seven TFBs in Halobacterium salinarum NRC-1 to their fitness landscapes, functional hierarchies, and genetic interactions across 2488 experiments covering combinatorial variations in salt, pH, temperature, and Cu stress. This systems analysis has revealed an elegant scheme in which completely novel fitness landscapes are generated by gene conversion events that introduce subtle changes to the regulation or physical interactions of duplicated TFBs. Based on these insights, we have introduced a synthetically redesigned TFB and altered the regulation of existing TFBs to illustrate how archaea can rapidly generate novel phenotypes by simply reprogramming their TFB regulatory network.
doi:10.1038/msb.2011.87
PMCID: PMC3261711  PMID: 22108796
evolution by gene family expansion; fitness; niche adaptation; reprogramming of gene regulatory network; transcription factor B
6.  Targeted interactomics reveals a complex core cell cycle machinery in Arabidopsis thaliana 
A protein interactome focused towards cell proliferation was mapped comprising 857 interactions among 393 proteins, leading to many new insights in plant cell cycle regulation. A comprehensive view on heterodimeric cyclin-dependent kinase (CDK)/cyclin complexes in plants is obtained, in relation with their regulators. Over 100 new candidate cell cycle proteins were predicted.
The basic underlying mechanisms that govern the cell cycle are conserved among all eukaryotes. Peculiar to plants, however, is that their genome contains a collection of cell cycle regulatory genes that is intriguingly large (Vandepoele et al, 2002; Menges et al, 2005) compared to other eukaryotes. Arabidopsis thaliana (Arabidopsis) encodes 71 genes in five regulatory classes versus only 15 in yeast and 23 in human.
Despite the discovery of numerous cell cycle genes, little is known about the protein complex machinery that steers plant cell division. Therefore, we applied a tandem affinity purification (TAP) approach coupled with mass spectrometry (MS) on Arabidopsis cell suspension cultures to isolate and analyze protein complexes involved in the cell cycle. This approach allowed us to successfully map a first draft of the basic cell cycle complex machinery of Arabidopsis, providing many new insights into plant cell division.
To map the interactome, we relied on a streamlined platform comprising generic Gateway-based vectors with high cloning flexibility, the fast generation of transgenic suspension cultures, TAP adapted for plant cells, and matrix-assisted laser desorption ionization (MALDI) tandem-MS for the identification of purified proteins (Van Leene et al, 2007, 2008). Complexes for 102 cell cycle proteins were analyzed using this approach, leading to a non-redundant data set of 857 interactions among 393 proteins (Figure 1A). Two subspaces were identified in this data set: domain I1, containing interactions confirmed in at least two independent experimental repeats or in the reciprocal purification experiment, and domain I2, consisting of uniquely observed interactions.
Several observations underlined the quality of both domains. All tested reverse purifications found the original interaction, and 150 known or predicted interactions were confirmed, which also means that a large number of new interactions was revealed. An in-depth computational analysis revealed enrichment for many cell cycle-related features among the proteins of the network (Figure 1B), and many protein pairs were coregulated at the transcriptional level (Figure 1C). Through integration of known cell cycle-related features, more than 100 new candidate cell cycle proteins were predicted (Figure 1D). Beyond the qualities common to both interactome domains, their real significance emerged from their differences, which expose two subspaces in the cell cycle interactome: a central regulatory network of stable complexes that are repeatedly isolated and represent core regulatory units, and a peripheral network comprising transient interactions identified less frequently, which are involved in other aspects of the process, such as crosstalk between core complexes or connections with other pathways. To evaluate the biological relevance of the cell cycle interactome in plants, we validated interactions from both domains by a transient split-luciferase assay in Arabidopsis plants (Marion et al, 2008), further supporting the hypothesis-generating power of the data set to understand plant growth.
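The rule that splits the data set into domains I1 and I2 can be made concrete with a small sketch. It assumes each detection is recorded as a hypothetical (bait, prey, repeat_id) tuple; the record format and protein names are illustrative, not the study's actual data structures.

```python
from collections import defaultdict


def assign_domains(observations):
    """Split purified interactions into a confirmed domain (I1) and a
    uniquely observed domain (I2).

    observations: iterable of (bait, prey, repeat_id) tuples, one per detection
    (a hypothetical record format). Returns {frozenset({bait, prey}): "I1" | "I2"}.
    """
    # undirected pair -> protein used as bait -> repeats in which the partner was found
    seen = defaultdict(lambda: defaultdict(set))
    for bait, prey, repeat_id in observations:
        seen[frozenset((bait, prey))][bait].add(repeat_id)

    domains = {}
    for pair, by_bait in seen.items():
        reciprocal = len(by_bait) == 2  # each partner pulled down the other
        repeated = any(len(reps) >= 2 for reps in by_bait.values())
        domains[pair] = "I1" if (reciprocal or repeated) else "I2"
    return domains


# Toy detections with hypothetical protein names.
obs = [("P1", "P2", 1), ("P1", "P2", 2), ("P3", "P4", 1), ("P4", "P3", 1), ("P1", "P5", 1)]
print(assign_domains(obs))
```

Here P1–P2 (two repeats) and P3–P4 (reciprocal) fall into I1, while P1–P5, seen once, falls into I2.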
With respect to insights into the cell cycle physiology, the interactome was subdivided according to the functional classes of the baits and core protein complexes were extracted, covering cyclin-dependent kinase (CDK)/cyclin core complexes together with their positive and negative regulation networks, DNA replication complexes, the anaphase-promoting complex, and spindle checkpoint complexes. The data imply that mitotic A- and B-type cyclins exclusively form heterodimeric complexes with the plant-specific B-type CDKs and not with CDKA;1, whereas D-type cyclins seem to associate with CDKA;1. Besides the extraction of complexes previously shown in other organisms, our data also suggested many new functional links; for example, the link coupling cell division with the regulation of transcript splicing. The association of negative regulators of CDK/cyclin complexes with transcription factors suggests that their role in reallocation is not solely targeted to CDK/cyclin complexes. New members of the Siamese-related inhibitory proteins were identified, and for the first time potential inhibitors of plant-specific mitotic B-type CDKs have been found in plants. New evidence that the E2F–DP–RBR network is not only active at G1-to-S, but also at the G2-to-M transition is provided and many complexes involved in DNA replication or repair were isolated. For the first time, a plant APC has been isolated biochemically, identifying three potential new plant-specific APC interactors, and finally, complexes involved in the spindle checkpoint were isolated mapping many new but specific interactions.
Finally, to get a general view on the complex machinery, modules of interacting cyclins and core cell cycle regulators were ranked along the cell cycle phases according to the transcript expression peak of the cyclins, showing an assorted set of CDK–cyclin complexes with high regulatory differentiation (Figure 4). Even within the same subfamily (e.g. cyclin A3, B1, B2, D3, and D4), cyclins differ not only in their functional time frame but also in the type and number of CDKs, inhibitors, and scaffolding proteins they bind, further indicating their functional diversification. According to our interaction data, at least 92 different variants of CDK–cyclin complexes are found in Arabidopsis.
In conclusion, these results reflect how several rounds of gene duplication (Sterck et al, 2007) led to the evolution of a large set of cyclin paralogs and a myriad of regulators, resulting in a significant jump in the complexity of the cell cycle machinery that could accommodate unique plant-specific features such as an indeterminate mode of postembryonic development. Through their extensive regulation and connection with a myriad of up- and downstream pathways, the core cell cycle complexes might offer the plant a flexible toolkit to fine-tune cell proliferation in response to an ever-changing environment.
Cell proliferation is the main driving force for plant growth. Although genome sequence analysis revealed a high number of cell cycle genes in plants, little is known about the molecular complexes steering cell division. In a targeted proteomics approach, we mapped the core complex machinery at the heart of the Arabidopsis thaliana cell cycle control. Besides a central regulatory network of core complexes, we distinguished a peripheral network that links the core machinery to up- and downstream pathways. Over 100 new candidate cell cycle proteins were predicted and an in-depth biological interpretation demonstrated the hypothesis-generating power of the interaction data. The data set provided a comprehensive view on heterodimeric cyclin-dependent kinase (CDK)–cyclin complexes in plants. For the first time, inhibitory proteins of plant-specific B-type CDKs were discovered and the anaphase-promoting complex was characterized and extended. Important conclusions were that mitotic A- and B-type cyclins form complexes with the plant-specific B-type CDKs and not with CDKA;1, and that D-type cyclins and S-phase-specific A-type cyclins seem to be associated exclusively with CDKA;1. Furthermore, we could show that plants have evolved a combinatorial toolkit consisting of at least 92 different CDK–cyclin complex variants, which strongly underscores the functional diversification among the large family of cyclins and reflects the pivotal role of cell cycle regulation in the developmental plasticity of plants.
doi:10.1038/msb.2010.53
PMCID: PMC2950081  PMID: 20706207
Arabidopsis thaliana; cell cycle; interactome; protein complex; protein interactions
7.  Different sets of QTLs influence fitness variation in yeast 
We have carried out a combination of in-lab evolution (ILE) and congenic crosses to identify the gene sets that contribute to the ability of yeast cells to survive under alkali stress. Each selected line acquired a different set of mutations, all resulting in the same phenotype. We identified a total of 15 genes in ILE and 17 candidates in the congenic approach, and studied their individual contribution to the phenotype. The total additive effect of the QTLs was much larger than the difference between the ancestor and the evolved strains, suggesting epistatic interactions between the QTLs. None of the genes identified encode structural components of the pH machinery. Instead, most encode regulatory functions, such as ubiquitin ligases, chromatin remodelers, GPI anchoring and copper/iron sensing transcription factors.
The majority of phenotypes in nature are complex traits affected by multiple genes [usually called quantitative trait loci (QTLs)], as well as by environmental factors. Many traits with practical importance such as crop yield in plants and susceptibility to various diseases in humans fall under this category. Understanding the architecture of complex traits has become the new frontier of genetic research, and many studies have greatly contributed to this field. However, to date, the genetic basis of only a few of these traits has been identified, and many questions regarding the architecture of complex traits and the accumulation of QTLs during evolution still remain unanswered. Among them are: How many QTLs affect complex phenotypes? What is the effect of each QTL? How do complex traits change during evolution? Is the adaptation process repeatable? In order to identify the QTLs that affect one of the important components of fitness variability in yeast, and to answer some of the questions above, we combined in-lab evolution (ILE) with the construction of congenic lines to isolate and map several gene sets that contribute to the ability of yeast cells to survive under alkali stress.
We carried out an ILE experiment, in which we grew yeast populations under increasing alkali stress to enrich for beneficial mutations. This process was followed by hybridizations to tiling arrays to identify the mutations acquired during the laboratory selective process. The ILE procedure revealed mutations in 15 genes, thus defining the QTLs and mechanisms that affect, in a quantitative fashion, the ability to cope with alkali stress. Our results indicate that during ILE several populations acquired different sets of QTLs that conferred the same phenotype. We identified each individual mutation in these strains, and validated and estimated their contribution to the phenotype. The total additive effect of the QTLs was much larger than the difference between the ancestor and the evolved strains, suggesting epistatic interactions between the QTLs.
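The epistasis argument is simple arithmetic: under a purely additive model, the ancestor-to-evolved change should equal the sum of the effects each allele has on its own. The sketch below uses made-up growth scores for illustration; it does not reproduce the study's measurements.

```python
def epistasis_deviation(ancestor, evolved, single_effects):
    """Compare the observed ancestor-to-evolved change with the additive expectation.

    single_effects: phenotypic gain of each allele measured alone in the ancestor.
    A negative deviation means the alleles together achieve less than the sum of
    their separate effects (negative, diminishing-returns epistasis).
    """
    additive = sum(single_effects)
    observed = evolved - ancestor
    return additive, observed, observed - additive


# Hypothetical growth scores at high pH, for illustration only.
additive, observed, deviation = epistasis_deviation(1.0, 1.5, [0.3, 0.25, 0.2, 0.15])
print(f"additive={additive:.2f} observed={observed:.2f} deviation={deviation:.2f}")
# prints additive=0.90 observed=0.50 deviation=-0.40
```

A sum of single-allele effects that exceeds the observed difference, as reported above, shows up here as a negative deviation.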
In addition to the ILE, we have studied the mechanisms regulating fitness under alkali stress at natural habitats. We used a clinically isolated strain able to grow at high pH and a standard laboratory strain with a limited ability to sustain high pH as the parents of a series of backcrosses to construct congenic lines up to the 8th generation. Seventeen genomic intervals that are candidates to contain QTLs were thus identified. In order to detect the contributing QTL in each interval, a predictive algorithm was applied, which scored the candidate genes in each genomic interval based on their interactions and similarity to the ILE genes. The algorithm was validated by testing the effect on the phenotype of deleting the predicted candidate genes. Twelve out of 29 deletions were found to affect the trait (P-value 0.023).
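The distance-based part of the candidate-gene prediction can be sketched as a multi-source breadth-first search from the ILE genes over a protein-interaction network, followed by ranking the genes of an interval from closest to farthest. This is a simplified stand-in. It ignores the similarity component the text mentions, and the gene names and edges below are hypothetical.

```python
import math
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from_seeds(graph, seeds):
    """Multi-source BFS: fewest interaction steps from any seed to each reachable gene."""
    dist = {s: 0 for s in seeds if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def rank_interval(graph, seeds, interval_genes):
    """Order candidate genes in a genomic interval, closest to a seed first;
    genes with no path to any seed are ranked last (distance = infinity)."""
    dist = distances_from_seeds(graph, seeds)
    return sorted(interval_genes, key=lambda g: (dist.get(g, math.inf), g))


# Hypothetical network: S1 stands in for an ILE gene; E is disconnected from it.
g = build_graph([("S1", "A"), ("A", "B"), ("S1", "C"), ("D", "E")])
print(rank_interval(g, {"S1"}, ["E", "B", "C"]))  # ['C', 'B', 'E']
```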
Interestingly, our results show that almost all beneficial mutations affected regulatory genes, and not structural components of the pH homeostasis machinery (such as proton pumps, which control the cell's pH). The genes identified affect global regulators, such as ubiquitin ligases, proteins involved in GPI anchoring, copper sensing and chromatin remodelers. Thus, we show that adaptive changes tend to occur in genes with wide influence, rather than in genes narrowly affecting the phenotype selected for.
One example of genes identified both in the ILE and in the congenic lines is the copper-sensing transcription factor MAC1, and its downstream targets CTR1 and CTR3, which encode copper transporters. Different mutations at the same residue (Cys 271) were found in four out of five independent ILE lines. These mutations inactivate a copper-sensing region of Mac1 and cause up-regulation of its target genes. The CTR1 and CTR3 genes were identified in the congenic lines. Moreover, we found that a Ty transposable element is responsible for the decreased expression of CTR3 in some strains, and its excision caused transcriptional activation, affecting the ability to thrive at high pH.
This work provides insights on both evolutionary and genetic issues (such as the appearance of adaptive mutations and the architecture of complex traits), while at the same time providing information about the mechanisms that contribute to growth at high pH, a subject with ramifications for cell physiology, pathogenicity, and stress response.
Most of the phenotypes in nature are complex and are determined by many quantitative trait loci (QTLs). In this study we identify gene sets that contribute to one important complex trait: the ability of yeast cells to survive under alkali stress. We carried out an in-lab evolution (ILE) experiment, in which we grew yeast populations under increasing alkali stress to enrich for beneficial mutations. The populations acquired different sets of affecting alleles, showing that evolution can provide alternative solutions to the same challenge. We measured the contribution of each allele to the phenotype. The sum of the effects of the QTLs was larger than the difference between the ancestor phenotype and the evolved strains, suggesting epistatic interactions between the QTLs. In parallel, a clinically isolated strain was used to map natural QTLs affecting growth at high pH. In all, 17 candidate regions were found. Using a predictive algorithm based on the distances in protein-interaction networks, candidate genes were defined and validated by gene disruption. Many of the QTLs found by both methods are not directly implicated in pH homeostasis but have more general, and often regulatory, roles.
doi:10.1038/msb.2010.1
PMCID: PMC2835564  PMID: 20160707
congenic lines; growth on alkali; in-lab evolution; QTL mapping; Saccharomyces cerevisiae
8.  GroEL dependency affects codon usage—support for a critical role of misfolding in gene evolution 
Integrating genome-scale sequence, expression, structural and protein interaction data from E. coli, we establish an interaction between chaperone (GroEL) dependency and optimal codon usage. Highly expressed sporadic substrates of GroEL employ more optimal codons than expected, show enrichment for optimal codons at structurally sensitive sites and greater conservation of codon optimality under conditions of relaxed purifying selection. We suggest that highly expressed genes cannot routinely utilize GroEL for error control so that codon usage has evolved to provide complementary error limitation, whereas obligate GroEL substrates experience relaxed selection on codon usage. Our results support a critical role of misfolding prevention in gene evolution.
Errors during gene expression are relatively commonplace, which has prompted speculations that many features of gene and genome anatomy and organization have evolved to reduce or mitigate such errors. One type of error that can be particularly costly occurs when the polypeptide chain that emerges from the ribosome fails to fold into its native structure. Some aberrantly folded proteins, exposing hydrophobic residues that would normally be buried, may begin to promiscuously interact with other proteins, become toxic to the cell and thus pose a substantial fitness concern (Gregersen et al, 2006).
In trans, molecular chaperones have long been recognized to play crucial roles in misfolding prevention and remedy. In cis, it has recently been suggested that the use of optimal codons limits mistranslation-induced protein misfolding (Drummond and Wilke, 2008). Evidence for the latter is centred on the argument that synonymous codons differ in their propensity to cause mistranslation. Translationally optimal codons, typically represented by more abundant cognate tRNAs (Duret, 2000), are thought less likely to cause ribosomal stalling and/or incorporation of the wrong amino acid.
Here, we suggest that the role, if any, of error limitation in cis can be revealed by studying its interaction with well-established error management systems in trans (chaperones). If codon usage does indeed play a tangible role in misfolding prevention, we would expect selection on codon identity to vary with the degree to which a protein can rely on other error control mechanisms, namely chaperones. We use the E. coli chaperonin GroEL as a model system to explore whether there is any interaction between optimal codon usage and chaperone dependency.
Kerner et al (2005) had previously determined GroEL substrates on a genome-wide scale. Based on enrichment in GroEL complexes the authors assigned ∼250 proteins to three classes reflecting GroEL dependency: class-I proteins, only a small fraction of which (<1%) associates with GroEL and which spontaneously regain some activity; class-II proteins, which only exhibit spontaneous refolding at more permissive temperatures and class-III proteins, which are obligate substrates of GroEL and largely fail to refold even under more benign conditions. Notably, although on average less abundant than class-I/II proteins (‘sporadic clients'), class-III proteins (‘obligate clients') occupy ∼80% of GroEL's capacity in vivo. Consequently, a higher proportion (∼100% versus ∼20% for class-II and ∼1% for class-I) of these proteins is routinely processed by the GroEL system.
We demonstrate that sporadic but not obligate clients of GroEL exhibit enhanced codon adaptation, carefully controlling for possible confounding factors, notably expression level and protein length (Figure 1). We also point out that genes that recently entered the E. coli genome via horizontal gene transfer will distort equilibrium analyses of codon usage in bacteria and should thus be routinely eliminated from analysis.
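One standard way to quantify codon adaptation is the codon adaptation index (CAI). It is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: the codon's frequency in a reference set of highly expressed genes divided by the frequency of the most used synonymous codon. The sketch below is not necessarily the measure used in this study. It assumes the standard genetic code, excludes single-codon amino acids and stops, and gives codons absent from the reference set a count of 0.5, which is a common convention adopted here as an assumption.

```python
import math
from collections import Counter

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """w = count / max synonymous count in the reference set (absent codons count as 0.5)."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    w = {}
    for aa in set(CODE.values()):
        if aa in ("*", "M", "W"):  # stops and single-codon amino acids carry no choice
            continue
        family = [c for c, a in CODE.items() if a == aa]
        adjusted = {c: max(counts[c], 0.5) for c in family}
        top = max(adjusted.values())
        for c in family:
            w[c] = adjusted[c] / top
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of seq that have a w value."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["CTGCTGCTGAAAAAA"])  # toy reference set
print(cai("CTGAAA", w), cai("CTGAAG", w))
```

With this toy reference set, a gene that uses only the preferred codons (CTG, AAA) scores 1.0. Swapping in the rarer AAG lowers the score to about 0.5.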
Building on earlier work by Zhou et al (2009), we further show that sporadic substrates are conspicuously enriched for optimal codons at structurally sensitive sites, consistent with more severe fitness implications of codon choice for these proteins.
Lastly, we reveal that codon optimality in sporadic clients is more highly conserved in S. dysenteriae. S. dysenteriae is an E. coli clone that has diverged relatively recently from the E. coli K12 strain and has adopted an intracellular lifestyle (Balbi et al, 2009). Concomitant with that lifestyle, Shigella has experienced a lower effective population size and therefore reduced efficiency of purifying selection. This has generated conditions where, overall, codon optimality has started to decay. However, when we followed the fate of ancestrally optimal codons at buried sites in the S. dysenteriae and E. coli K12 genomes, we found that a lower fraction of buried sites has lost codon optimality in sporadic substrates (Figure 4), again consistent with greater structural importance of codon choice in these substrates.
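The buried-site comparison reduces to a simple proportion: among aligned sites that are buried and carried an optimal codon ancestrally, what fraction carries a non-optimal codon in the descendant genome? The sketch assumes that ancestral codons have already been inferred and aligned, and that the set of optimal codons is given. How the study obtained these inputs is not described here, and the toy codons below are hypothetical.

```python
def fraction_optimality_lost(ancestral, descendant, buried, optimal):
    """Fraction of buried, ancestrally optimal sites whose descendant codon is non-optimal.

    ancestral, descendant: aligned lists of codons; buried: parallel list of booleans;
    optimal: set of codons treated as translationally optimal.
    """
    sites = [d for a, d, b in zip(ancestral, descendant, buried) if b and a in optimal]
    if not sites:
        return float("nan")
    lost = sum(1 for d in sites if d not in optimal)
    return lost / len(sites)


# Toy alignment: two buried sites start optimal, and one of them (AAA -> AAG) loses it.
print(fraction_optimality_lost(
    ["CTG", "AAA", "CTG", "CTT"],
    ["CTG", "AAG", "CTA", "CTG"],
    [True, True, False, True],
    {"CTG", "AAA"},
))  # 0.5
```

Computing this fraction separately for sporadic and obligate substrates gives the comparison described above.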
Based on these findings, we suggest the following explanation: As mentioned above, class-III substrates are defined not only by GroEL being critical for proper folding, but also by occupying most of GroEL's capacity (∼80%). With a high proportion of class-III protein passaged through the GroEL system, mistranslation errors in these proteins weigh less severely as GroEL can remedy at least some misfolding that ensues. In contrast, class-I and II genes are more highly expressed and cannot routinely rely on GroEL to rectify folding errors. Yet class-I/II proteins are clearly liable to misfold as testified by their sporadic association with GroEL. We argue that augmenting GroEL's capacity to address the misfolding propensity of these genes would be prohibitively costly to the organism and that, as an alternative strategy, these genes employ optimal codons to reduce the rate of misfolding error.
Our findings (a) reveal a cis–trans interaction between codon usage and chaperones in providing an integrated error management system, (b) provide independent evidence for a role of misfolding in shaping gene evolution and (c) suggest that the burden of deleterious mutations in long-term bottlenecking populations like that of the insect endosymbiont Buchnera comprises not only unfavourable amino-acid substitutions (Moran, 1996) but also synonymous ones.
It has recently been suggested that the use of optimal codons limits mistranslation-induced protein misfolding, yet evidence for this remains largely circumstantial. In contrast, molecular chaperones have long been recognized to play crucial roles in misfolding prevention and remedy. We propose that putative error limitation in cis can be elucidated by examining the interaction between codon usage and chaperoning processes. Using Escherichia coli as a model system, we find that codon optimality covaries with dependency on the chaperonin GroEL. Sporadic but not obligate substrates of GroEL exhibit higher average codon adaptation and are conspicuously enriched for optimal codons at structurally sensitive sites. Further, codon optimality of sporadic clients is more conserved in the E. coli clone Shigella dysenteriae. We suggest that highly expressed genes cannot routinely use GroEL for error control so that codon usage has evolved to provide complementary error limitation. These findings provide independent evidence for a role of misfolding in shaping gene evolution and highlight the need to co-characterize adaptations in cis and trans to unravel the workings of integrated molecular systems.
doi:10.1038/msb.2009.94
PMCID: PMC2824523  PMID: 20087338
codon bias; GroEL; misfolding
9.  Regulatory Network Structure as a Dominant Determinant of Transcription Factor Evolutionary Rate 
PLoS Computational Biology  2012;8(10):e1002734.
The evolution of transcriptional regulatory networks has thus far mostly been studied at the level of cis-regulatory elements. To gain a complete understanding of regulatory network evolution we must also study the evolutionary role of trans-factors, such as transcription factors (TFs). Here, we systematically assess genomic and network-level determinants of TF evolutionary rate in yeast, and how they compare to those of generic proteins, while carefully controlling for differences of the TF protein set, such as expression level. We found significantly distinct trends relating TF evolutionary rate to mRNA expression level, codon adaptation index, the evolutionary rate of physical interaction partners, and, confirming previous reports, to protein-protein interaction degree and regulatory in-degree. We discovered that for TFs, the dominant determinants of evolutionary rate lie in the structure of the regulatory network, such as the median evolutionary rate of target genes and the fraction of species-specific target genes. Decomposing the regulatory network by edge sign, we found that this modular evolution of TFs and their targets is limited to activating regulatory relationships. We show that fast evolving TFs tend to regulate other TFs and niche-specific processes and that their targets show larger evolutionary expression changes than targets of other TFs. We also show that the positive trend relating TF regulatory in-degree and evolutionary rate is likely related to the species-specificity of the transcriptional regulation modules. Finally, we discuss likely causes for TFs' different evolutionary relationship to the physical interaction network, such as the prevalence of transient interactions in the TF subnetwork. This work suggests that positive and negative regulatory networks follow very different evolutionary rules, and that transcription factor evolution is best understood at a network- or systems-level.
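The central network-level comparison, relating each TF's evolutionary rate to the median rate of its target genes, can be sketched as a Spearman rank correlation. This assumes per-gene rates and a TF-to-targets map are already available. The TF and gene names below are hypothetical, and the sketch does not split edges by sign as the study does.

```python
from statistics import mean, median


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation (raises ZeroDivisionError if either input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def tf_vs_target_rates(tf_rate, targets, gene_rate):
    """Spearman correlation between each TF's rate and the median rate of its targets."""
    xs, ys = [], []
    for tf, rate in tf_rate.items():
        known = [gene_rate[g] for g in targets.get(tf, []) if g in gene_rate]
        if known:  # skip TFs with no targets that have a rate estimate
            xs.append(rate)
            ys.append(median(known))
    return spearman(xs, ys)


# Hypothetical Ka/Ks values for three TFs and their targets.
tf_rate = {"TF1": 0.05, "TF2": 0.10, "TF3": 0.20}
targets = {"TF1": ["g1", "g2"], "TF2": ["g3"], "TF3": ["g4", "g5"]}
gene_rate = {"g1": 0.02, "g2": 0.04, "g3": 0.06, "g4": 0.10, "g5": 0.20}
print(tf_vs_target_rates(tf_rate, targets, gene_rate))
```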
Author Summary
Transcription factors (TFs) are proteins which regulate the expression of genes by interacting with DNA. Mutations in TF protein sequences can affect the expression levels of regulated genes throughout evolution. In this study, we look into the factors which cause the different TFs in baker's yeast to be more or less tolerant of mutations during recent evolution. This tolerance is measured as the evolutionary rate, defined for each protein as the relative rate of protein-changing DNA mutations over other mutations (Ka/Ks). We found that the typical determinants of protein evolutionary rate, such as expression level and network interactions have a very different influence on TF evolutionary rate. We found that TF evolutionary rate is most highly correlated to the evolutionary properties of the genes which they regulate and specifically genes which they activate. We also show that TF evolutionary rate predicts actual evolutionary expression differences of regulated genes and we discuss some of the features unique to TFs which likely contribute to their different evolutionary trends, such as the types of protein-protein interactions prevalent in the TF subnetwork or TFs' potential role in adaptive evolution.
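More precisely, Ka/Ks compares substitutions per nonsynonymous site with substitutions per synonymous site, so it is normalized by the number of available sites of each kind. A minimal sketch of the final step is shown below. It assumes the site and difference counts have already been obtained, for example with a Nei-Gojobori-style codon comparison, and it applies the Jukes-Cantor correction. The estimator actually used in the study is not stated here.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ka/Ks from precomputed counts of differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        return math.inf if ka > 0 else float("nan")
    return ka / ks


# Hypothetical counts: 10 of 300 nonsynonymous and 30 of 100 synonymous sites differ.
print(round(ka_ks(10, 300, 30, 100), 3))
```

A ratio well below 1, as in this toy case (about 0.089), is the usual signature of purifying selection. Values closer to 1 indicate a protein that is more tolerant of amino-acid changes.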
doi:10.1371/journal.pcbi.1002734
PMCID: PMC3475661  PMID: 23093926
10.  Backup without redundancy: genetic interactions reveal the cost of duplicate gene loss 
We show that genetic interaction profiles offer a powerful approach to elicit phenotypes that are far richer than is attainable using single gene deletions. This has allowed us to address the long-standing question of the role played by duplicate genes (paralogs) in robustness against deletion. We provide for the first time direct evidence that the capacity of some duplicates to cover for the loss of their paralogs can account for the observed difference in fitness between duplicate and singleton deletion mutants, but that the overall contribution of this effect to dispensability is small. More broadly, we demonstrate that paralogs possessing apparent backup capacity in some environments have in fact distinct and non-overlapping functions, and are unable to provide backup across a range of compromising conditions. This resolves the previous paradox of how backup genes conferring dispensability can nevertheless be independently maintained in the population. From a practical point of view, our findings suggest efficient strategies to elicit rich deletion phenotypes that should be highly relevant for the design of future phenotypic screens.
Much of our understanding of biological processes has been derived from the characterization of the functional consequence to an organism of altering one or more of its genes. Efforts to systematically evaluate the phenotypic effects of gene loss, however, have been hampered by the fact that the disruption of most genes has surprisingly modest effects on cell growth and viability. The high proportion of genes with no apparent deletion effect has wide-ranging practical and theoretical implications and has been the subject of considerable interest (Wagner, 2000, 2005; Giaever et al, 2002; Gu et al, 2003; Papp et al, 2004; Kafri et al, 2005). One factor that has been implicated as contributing to the high degree of dispensability is the abundance of closely related paralogs present in most genomes (Winzeler et al, 1999; Wagner, 2000; Giaever et al, 2002). Indeed, recent work in S. cerevisiae has shown that the existence of a paralog elsewhere in the genome significantly increases the chance that deletion of a given gene has little effect on growth (Gu et al, 2003). However, current analyses have been mostly correlative, and direct mechanistic evidence supporting or refuting the role of backup compensation in mutational robustness is still largely missing. Furthermore, backup between duplicates is not easily justified in evolutionary terms, in that a genuine ability to comprehensively cover for the loss of another gene is evolutionarily unstable (Brookfield, 1992).
Here, we exploit the recent availability of high-density quantitative genetic interaction profiles (epistatic miniarray profiles, EMAPs) to address these issues directly. To test whether paralog pairs with a synthetic sick or lethal (SSL) interaction can account for the excess fitness of duplicates, we classified genes into fitness categories according to their deletion growth defect (Materials and methods). The subset of genes covered by our combined data set exhibits an over-representation of duplicate genes in the weak/no deletion phenotype (WNP) class similar to that reported previously (Gu et al, 2003) (Figure 1B). Strikingly, this difference corresponds to the number of WNP duplicates that have an SSL interaction with their corresponding paralog (Figure 1C). Our data thus provide direct evidence that it is indeed duplicate compensation that accounts for the observed difference in deletion growth defect between duplicates and singletons, at least for the genes covered by our data set.
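The comparison in this paragraph is bookkeeping: how many more duplicates fall in the WNP class than expected from the singleton rate, and how many WNP duplicates are SSL with their own paralog. The sketch below illustrates that arithmetic on toy data; the record fields (`is_duplicate`, `wnp`, `ssl_with_paralog`) are hypothetical, and the study's actual fitness thresholds (given in its Materials and methods) are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    is_duplicate: bool       # has a paralog elsewhere in the genome
    wnp: bool                # falls in the weak/no deletion phenotype class
    ssl_with_paralog: bool   # synthetic sick/lethal with its own paralog


def wnp_excess(genes: list[Gene]) -> tuple[float, int]:
    """Return (excess of WNP duplicates over the singleton expectation,
    number of WNP duplicates that are SSL with their paralog)."""
    dups = [g for g in genes if g.is_duplicate]
    singles = [g for g in genes if not g.is_duplicate]
    # Expected WNP duplicates if duplicates behaved like singletons.
    singleton_wnp_rate = sum(g.wnp for g in singles) / len(singles)
    expected = singleton_wnp_rate * len(dups)
    observed = sum(g.wnp for g in dups)
    ssl_wnp = sum(g.wnp and g.ssl_with_paralog for g in dups)
    return observed - expected, ssl_wnp


# Toy data: 6 of 10 singletons and 8 of 10 duplicates are WNP;
# 2 of the WNP duplicates are SSL with their paralog.
genes = [Gene(f"single{i}", False, i < 6, False) for i in range(10)]
genes += [Gene(f"dup{i}", True, i < 8, i < 2) for i in range(10)]
excess, ssl_count = wnp_excess(genes)
print(f"excess WNP duplicates: {excess:.1f}; SSL-buffered WNP duplicates: {ssl_count}")
```

When the two numbers agree, as in this toy case, paralog buffering is sufficient to account for the extra robustness of duplicates.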
Apart from the mechanism itself, the characteristic features of buffering duplicates have received considerable attention (Gu et al, 2003; Kafri et al, 2005; Wagner, 2005). Our data allowed us to unambiguously distinguish the subset of duplicates whose dispensability can be attributed to the existence of a backup paralog. The ability to identify backup duplicates directly put us in a position to study their features, and how they differ from other duplicates without buffering properties. In particular, we asked to what extent the observed buffering in rich media reflects functional similarity and a genuine ability to cover for the loss of a paralog in a broader range of conditions.
To assess the extent to which SSL duplicates can provide genuine backup under compromising conditions, we first used genetic interaction profiles as a more stringent test for redundancy that assesses the effect of gene loss in the background of additional gene deletions. In contrast to the expectation that truly buffered duplicates should have few, if any, synthetic interactions, we find that the number is in fact substantial and often exceeds that of random genes and non-SSL duplicates (Figure 2B). Similarly, using a recent data set of sensitivity profiles of deletion strains to a range of agents and environments (Brown et al, 2006), we find that the deletion of SSL duplicates across a range of environments has on average no weaker (and in fact a slightly stronger) effect on cellular growth rate than that of non-SSL duplicates or random genes. Taken together, these findings suggest that the backup capacity of SSL duplicates is limited and not indicative of a comprehensive ability to cover for the loss of the paralogous partner.
We next tested the degree of functional similarity of buffering duplicates using similarity in genetic interaction as well as environmental sensitivity profiles as indicators of shared functionality (Tong et al, 2004; Schuldiner et al, 2005; Brown et al, 2006; Pan et al, 2006). In spite of their rich media buffering properties, we find that the interaction and sensitivity patterns of most SSL duplicates are divergent and are usually more similar to those of other, non-paralogous genes (Figure 2C and D; Supplementary Figure 10).
Lastly, in addition to our analysis of duplicate phenotypes, we used genetic interaction spectra as deletion phenotypes for generic genes whose single deletion in standard conditions has little measurable effect. As expected, genetic interactions provide a deletion phenotype for many more genes (80–90%) than single gene deletions in standard growth environments (Steinmetz et al, 2002), which yield a detectable growth defect only for 30–40% (Figure 4B). To assess whether these interactions reflect the cost of gene loss (gene importance), we asked if there is a relationship between the probability of a gene being retained between related species and its number of genetic interactions. Indeed, genetic interactivity exhibits a strong correlation with gene retention across related phyla (Figure 4C and Supplementary Figure 7), and predicts the likelihood of gene loss better than lethality/viability, quantitative growth deficiency or environmental specificity (Supplementary Figure 8). Thus, genetic interactions provide a cost of gene loss that effectively recapitulates evolutionary constraints. This is further supported by the observation that genetic interactions are significantly correlated with environmental sensitivity across a range of conditions. Thus, our findings suggest that for most genes there is a substantial cost of gene loss, even though this is often not reflected in single gene deletion tests carried out in standard conditions.
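The link between genetic interactivity and gene retention is a question of whether genes with more interactions tend to be retained more often. A rank correlation is one way to quantify that. The sketch below computes Spearman's rho (Pearson correlation of average ranks, with ties handled) in plain Python. It assumes per-gene interaction counts and a per-gene retention score, such as the fraction of related species keeping an ortholog. The statistic is an illustrative choice and not necessarily the test used in the study, and the values are invented.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-gene values: number of genetic interactions and the
# fraction of related species in which the gene is retained.
interactions = [0, 3, 3, 12, 25, 40]
retention = [0.30, 0.45, 0.40, 0.70, 0.85, 0.90]
print(f"Spearman rho = {spearman(interactions, retention):.3f}")
```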
Many genes can be deleted with little phenotypic consequence. By what mechanism and to what extent the presence of duplicate genes in the genome contributes to this robustness against deletions has been the subject of considerable interest. Here, we exploit the availability of high-density genetic interaction maps to provide direct support for the role of backup compensation, where functionally overlapping duplicates cover for the loss of their paralog. However, we find that the overall contribution of duplicates to robustness against null mutations is low (∼25%). The ability to directly identify buffering paralogs allowed us to further study their properties, and how they differ from non-buffering duplicates. Using environmental sensitivity profiles as well as quantitative genetic interaction spectra as high-resolution phenotypes, we establish that even duplicate pairs with compensation capacity exhibit rich and typically non-overlapping deletion phenotypes, and are thus unable to comprehensively cover against loss of their paralog. Our findings reconcile the fact that duplicates can compensate for each other's loss under a limited number of conditions with the evolutionary instability of genes whose loss is not associated with a phenotypic penalty.
doi:10.1038/msb4100127
PMCID: PMC1847942  PMID: 17389874
duplication; evolution; genetic interactions; redundancy
11.  Automated identification of pathways from quantitative genetic interaction data 
We present a novel Bayesian learning method that reconstructs large detailed gene networks from quantitative genetic interaction (GI) data. The method uses global reasoning to handle missing and ambiguous measurements, and provides confidence estimates for each prediction. Applied to a recent data set covering genes relevant to protein folding, the learned networks reflect known biological pathways, including details such as pathway ordering and directionality of relationships. The reconstructed networks also suggest novel relationships, including the placement of SGT2 in the tail-anchored biogenesis pathway, a finding that we experimentally validated.
Recent developments have enabled large-scale quantitative measurement of genetic interactions (GIs) that report on the extent to which the activity of one gene is dependent on a second. It has long been recognized (Avery and Wasserman, 1992; Hartman et al, 2001; Segre et al, 2004; Tong et al, 2004; Drees et al, 2005; Schuldiner et al, 2005; St Onge et al, 2007; Costanzo et al, 2010) that functional dependencies revealed by GI data can provide rich information regarding underlying biological pathways. Further, the precise phenotypic measurements provided by quantitative GI data can provide evidence for even more detailed aspects of pathway structure, such as differentiating between full and partial dependence between two genes (Drees et al, 2005; Schuldiner et al, 2005; St Onge et al, 2007; Jonikas et al, 2009) (Figure 1A). As GI data sets become available for a range of quantitative phenotypes and organisms, such patterns will allow researchers to elucidate pathways important to a diverse set of biological processes.
We present a new method that exploits the high-quality, quantitative nature of recent GI assays to automatically reconstruct detailed multi-gene pathway structures, including the organization of a large set of genes into coherent pathways, the connectivity and ordering within each pathway, and the directionality of each relationship. We introduce activity pathway networks (APNs), which represent functional dependencies among a set of genes in the form of a network. We present an automatic method to efficiently reconstruct APNs over large sets of genes based on quantitative GI measurements. This method handles uncertainty in the data arising from noise, missing measurements, and data points with ambiguous interpretations, by performing global reasoning that combines evidence from multiple data points. In addition, because some structure choices remain uncertain even when jointly considering all measurements, our method maintains multiple likely networks, and allows computation of confidence estimates over each structure choice.
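Maintaining several likely networks instead of a single best one allows each structure choice to be assigned a confidence. One simple way to do this is likelihood-weighted model averaging. The sketch below assumes each candidate network is a set of directed edges with a log-likelihood score. The confidence of an edge is then the normalized weight of the networks that contain it. This is generic ensemble averaging, not the authors' APN scoring or search procedure, and genes A, B, C are placeholders.

```python
import math
from collections import defaultdict

# A directed edge (upstream, downstream); direction is part of the structure.
Edge = tuple[str, str]


def edge_confidence(networks: list[set[Edge]],
                    log_likelihoods: list[float]) -> dict[Edge, float]:
    """Confidence of each edge = normalized weight of the networks containing it."""
    top = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(weights)
    conf: dict[Edge, float] = defaultdict(float)
    for net, w in zip(networks, weights):
        for edge in net:
            conf[edge] += w / total
    return dict(conf)


# Toy ensemble of three candidate networks over hypothetical genes A, B, C.
ensemble = [{("A", "B"), ("B", "C")}, {("A", "B")}, {("B", "A")}]
scores = [-5.0, -5.0, -5.0 - math.log(2.0)]  # third network half as likely
for edge, c in sorted(edge_confidence(ensemble, scores).items()):
    print(edge, round(c, 3))
```

Because opposite orientations of the same gene pair are scored separately, this kind of averaging also yields a confidence for the direction of each relationship.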
We applied our APN reconstruction method to the recent high-quality GI data set of Jonikas et al (2009), which examined the functional interaction between genes that contribute to protein folding in the ER. Specifically, Jonikas et al used the cell's endogenous sensor (the unfolded protein response) to first identify several hundred yeast genes with functions in endoplasmic reticulum folding, and then systematically characterized their functional interdependencies by measuring unfolded protein response levels in double mutants. Our analysis produced an ensemble of 500 likelihood-weighted APNs over 178 genes (Figure 2).
We performed an aggregate evaluation of our results by comparing to known biological relationships between gene pairs, including participation in pathways according to the Kyoto Encyclopedia of Genes and Genomes (KEGG), correlation of chemical genomic profiles in a recent high-throughput assay (Hillenmeyer et al, 2008) and similarity of Gene Ontology (GO) annotations. In each evaluation performed, our reconstructed APNs were significantly more consistent with the known relationships than either the raw GI values or the Pearson correlation between profiles of GI values.
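The evaluation above tests whether the gene pairs that a method ranks highest are enriched for known relationships, such as a shared KEGG pathway or similar GO annotations. The sketch below uses precision among the top-k pairs. This metric is an illustrative choice, and the study's actual statistics are not reproduced. The gene pairs, scores, and "known" set are placeholders, and the scores could come from raw GI values, profile correlations, or APN confidences.

```python
def pair(a: str, b: str) -> frozenset[str]:
    """Unordered gene pair, so (A, B) and (B, A) compare equal."""
    return frozenset((a, b))


def precision_at_k(scores: dict[frozenset[str], float],
                   known: set[frozenset[str]], k: int) -> float:
    """Fraction of the k highest-scoring pairs that are known to be related."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(p in known for p in top) / len(top)


# Hypothetical scores from two methods over the same four gene pairs.
known = {pair("A", "B"), pair("C", "D")}
method_1 = {pair("A", "B"): 0.9, pair("A", "C"): 0.7,
            pair("B", "D"): 0.4, pair("C", "D"): 0.1}
method_2 = {pair("A", "B"): 0.8, pair("C", "D"): 0.6,
            pair("A", "C"): 0.3, pair("B", "D"): 0.2}
for name, s in [("method 1", method_1), ("method 2", method_2)]:
    print(name, precision_at_k(s, known, k=2))
```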
Importantly, our approach provides not only an improved means for defining pairs or groups of related genes, but also enables the identification of detailed multi-gene network structures. In many cases, our method successfully reconstructed known cellular pathways, including the ER-associated degradation (ERAD) pathway, and the biosynthesis of N-linked glycans, ranking them among the highest confidence structures. In-depth examination of the learned network structures indicates agreement with many known details of these pathways. In addition, quantitative analysis indicates that our learned APNs are indicative of ordering within KEGG-annotated biological pathways.
Our results also suggest several novel relationships, including placement of uncharacterized genes into pathways, and novel relationships between characterized genes. These include the dependence of the J domain chaperone JEM1 on the PDI homolog MPD1, the dependence of the ubiquitin-recycling enzyme DOA4 on N-linked glycosylation, and the dependence of the E3 ubiquitin ligase DOA10 on the signal peptidase complex subunit SPC2. Our APNs also place the poorly characterized TPR-containing protein SGT2 upstream of the tail-anchored protein biogenesis machinery components GET3, GET4, and MDY2 (also known as GET5), suggesting that SGT2 has a function in the insertion of tail-anchored proteins into membranes. Consistent with this prediction, our experimental analysis shows that sgt2Δ cells are defective in localizing the tail-anchored protein GFP-Sed5, which shifts from punctate Golgi structures to a more diffuse pattern, as seen for mutants of other genes involved in this pathway.
Our results show that multi-gene, detailed pathway networks can be reconstructed from quantitative GI data, providing a concrete computational manifestation to intuitions that have traditionally accompanied the manual interpretation of such data. Ongoing technological developments in both genetics and imaging are enabling the measurement of GI data at a genome-wide scale, using high-accuracy quantitative phenotypes that relate to a range of particular biological functions. Methods based on RNAi will soon allow collection of similar data for human cell lines and other mammalian systems (Moffat et al, 2006). Thus, computational methods for analyzing GI data could have an important function in mapping pathways involved in complex biological systems including human cells.
High-throughput quantitative genetic interaction (GI) measurements provide detailed information regarding the structure of the underlying biological pathways by reporting on functional dependencies between genes. However, the analytical tools for fully exploiting such information lag behind the ability to collect these data. We present a novel Bayesian learning method that uses quantitative phenotypes of double knockout organisms to automatically reconstruct detailed pathway structures. We applied our method to a recent data set that measures GIs for endoplasmic reticulum (ER) genes, using the unfolded protein response as a quantitative phenotype. The reconstructions recovered known functional pathways, including N-linked glycosylation and ER-associated protein degradation, and revealed novel relationships, such as the placement of SGT2 in the tail-anchored biogenesis pathway, a finding that we experimentally validated. Our approach should be readily applicable to the next generation of quantitative GI data sets, as assays become available for additional phenotypes and eventually higher-level organisms.
doi:10.1038/msb.2010.27
PMCID: PMC2913392  PMID: 20531408
computational biology; genetic interaction; pathway reconstruction; probabilistic methods
12.  Genomic Characterization of Variable Surface Antigens Reveals a Telomere Position Effect as a Prerequisite for RNA Interference-Mediated Silencing in Paramecium tetraurelia 
mBio  2014;5(6):e01328-14.
ABSTRACT
Antigenic or phenotypic variation, the expression of variable surface protein coats by eukaryotic microbes, is a widespread phenomenon. To clarify the mechanism behind mutually exclusive gene expression, we characterized the genetic properties of the surface antigen multigene family in the ciliate Paramecium tetraurelia and the epigenetic factors controlling expression and silencing. Genome analysis indicated that the multigene family consists of intrachromosomal and subtelomeric genes; both classes apparently derive from different gene duplication events: whole-genome and intrachromosomal duplication. Expression analysis provides evidence for telomere position effects, because only subtelomeric genes show mutually exclusive transcription. Microarray analysis of cultures deficient in Rdr3, an RNA-dependent RNA polymerase, in comparison to serotype-pure wild-type cultures, shows cotranscription of a subset of subtelomeric genes, indicating that the telomere position effect is due to a selective occurrence of Rdr3-mediated silencing in subtelomeric regions. We present a model of surface antigen evolution by intrachromosomal gene duplication involving the maintenance of positive selection of structurally relevant regions. Further analysis of chromosome heterogeneity shows that alternative telomere addition regions clearly affect transcription of closely related genes. Consequently, chromosome fragmentation appears to be of crucial importance for surface antigen expression and evolution. Our data suggest that RNAi-mediated control of this genetic network by trans-acting RNAs allows rapid epigenetic adaptation by phenotypic variation in combination with long-term genetic adaptation by Darwinian evolution of antigen genes.
IMPORTANCE
Alternating surface protein structures have been described for almost all eukaryotic microbes, and a broad variety of functions have been attributed to them, such as virulence factors, adhesion molecules, and molecular camouflage. Mechanisms controlling gene expression of variable surface proteins therefore represent a powerful tool for rapid phenotypic variation across kingdoms in pathogenic as well as free-living eukaryotic microbes. However, the epigenetic mechanisms controlling synchronous expression and silencing of individual genes are poorly understood. Using the ciliate Paramecium tetraurelia as an (epi)genetic model, we showed that a subtelomeric gene position effect is associated with the selective occurrence of RNAi-mediated silencing of silent surface protein genes, suggesting small interfering RNA (siRNA)-mediated epigenetic cross talk between silent and active surface antigen genes. Our integrated genomic and molecular approach reveals the correlation between gene position effects and siRNA-mediated trans-silencing, thus providing two new parameters for regulation of mutually exclusive gene expression and the genomic organization of variant gene families.
doi:10.1128/mBio.01328-14
PMCID: PMC4235209  PMID: 25389173
13.  A Gene's Ability to Buffer Variation Is Predicted by Its Fitness Contribution and Genetic Interactions 
PLoS ONE  2011;6(3):e17650.
Background
Many single-gene knockouts result in increased phenotypic (e.g., morphological) variability among the mutant's offspring. This has been interpreted as an intrinsic ability of genes to buffer genetic and environmental variation. A phenotypic capacitor is a gene that appears to mask phenotypic variation: when knocked out, the offspring shows more variability than the wild type. Theory predicts that this phenotypic potential should be correlated with a gene's knockout fitness and its number of negative genetic interactions. Based on experimentally measured phenotypic capacity, it was suggested that knockout fitness was unimportant, but that phenotypic capacitors tend to be hubs in genetic and physical interaction networks.
Methodology/Principal Findings
We re-analyse the available experimental data in a combined model, which includes knockout fitness and network parameters as well as expression level and protein length as predictors of phenotypic potential. Contrary to previous conclusions, we find that the strongest predictor is in fact haploid knockout fitness (responsible for 9% of the variation in phenotypic potential), with an additional contribution from the genetic interaction network (5%); once these two factors are taken into account, protein-protein interactions do not make any additional contribution to the variation in phenotypic potential.
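The percentages above are shares of variance explained, and the additional contribution of one factor after another has been accounted for can be read as the gain in R² between nested linear models. The sketch below fits ordinary least squares from the normal equations in plain Python and reports that gain. The predictor names and values are hypothetical, and the study's actual model specification is not reproduced.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors: list[list[float]], y: list[float]) -> float:
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bj * col[i] for bj, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-gene values.
knockout_fitness = [1.0, 0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.75]
gi_degree = [2.0, 5.0, 9.0, 3.0, 14.0, 6.0, 20.0, 8.0]
phenotypic_potential = [0.10, 0.22, 0.35, 0.15, 0.50, 0.30, 0.62, 0.33]

r2_fitness = r_squared([knockout_fitness], phenotypic_potential)
r2_both = r_squared([knockout_fitness, gi_degree], phenotypic_potential)
print(f"R^2 fitness only: {r2_fitness:.3f}; added by GI degree: {r2_both - r2_fitness:.3f}")
```

Because the models are nested, the gain in R² can never be negative. A predictor that adds essentially nothing once the others are included behaves like the protein-protein interaction term described above.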
Conclusions/Significance
We conclude that phenotypic potential is not a mysterious “emergent” property of cellular networks. Instead, it is very simply determined by the overall fitness reduction of the organism (which in its compromised state can no longer compensate for multiple factors that contribute to phenotypic variation), and by the number (and presumably nature) of genetic interactions of the knocked-out gene. In this light, Hsp90, the prototypical phenotypic capacitor, may not be representative: typical phenotypic capacitors are not direct “buffers” of variation, but are simply genes encoding central cellular functions.
doi:10.1371/journal.pone.0017650
PMCID: PMC3047586  PMID: 21407817
14.  Quantifying Adaptive Evolution in the Drosophila Immune System 
PLoS Genetics  2009;5(10):e1000698.
It is estimated that a large proportion of amino acid substitutions in Drosophila have been fixed by natural selection, and as organisms are faced with an ever-changing array of pathogens and parasites to which they must adapt, we have investigated the role of parasite-mediated selection as a likely cause. To quantify the effect, and to identify which genes and pathways are most likely to be involved in the host–parasite arms race, we have re-sequenced population samples of 136 immunity and 287 position-matched non-immunity genes in two species of Drosophila. Using these data, and a new extension of the McDonald-Kreitman approach, we estimate that natural selection fixes advantageous amino acid changes in immunity genes at nearly double the rate of other genes. We find the rate of adaptive evolution in immunity genes is also more variable than other genes, with a small subset of immune genes evolving under intense selection. These genes, which are likely to represent hotspots of host–parasite coevolution, tend to share similar functions or belong to the same pathways, such as the antiviral RNAi pathway and the IMD signalling pathway. These patterns appear to be general features of immune system evolution in both species, as rates of adaptive evolution are correlated between the D. melanogaster and D. simulans lineages. In summary, our data provide quantitative estimates of the elevated rate of adaptive evolution in immune system genes relative to the rest of the genome, and they suggest that adaptation to parasites is an important force driving molecular evolution.
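The McDonald-Kreitman framework compares nonsynonymous and synonymous changes between species (Dn, Ds) with those within a species (Pn, Ps). The widely used estimator of the proportion of adaptive amino-acid substitutions is α = 1 − (Ds·Pn)/(Dn·Ps). The study uses its own new extension of this approach, which is not reproduced here. The sketch below shows only the standard estimator on invented pooled counts for two gene classes.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of amino-acid substitutions fixed by positive selection,
    alpha = 1 - (Ds * Pn) / (Dn * Ps), from McDonald-Kreitman counts."""
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical pooled counts for two gene classes.
# Dn, Ds: nonsynonymous / synonymous fixed differences between species.
# Pn, Ps: nonsynonymous / synonymous polymorphisms within a species.
classes = {"immunity": (60, 100, 20, 80), "non-immunity": (40, 100, 25, 75)}
for label, (dn, ds, pn, ps) in classes.items():
    print(f"{label}: alpha = {mk_alpha(dn, ds, pn, ps):.2f}")
```

Under strict neutrality, Dn/Ds equals Pn/Ps and α is 0. An excess of nonsynonymous fixed differences relative to polymorphism pushes α toward 1.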
Author Summary
All organisms are attacked by an ever-changing array of pathogens and parasites, and it is widely supposed that the ensuing host–parasite “arms race” must drive extensive adaptive evolution in genes of the immune system. Here we have taken advantage of new sequencing technologies and analytical approaches to quantify the amount of adaptation that is occurring in immunity genes relative to the rest of the genome. We sampled two species of fruit fly (D. melanogaster and D. simulans) from eight different populations around the world, and sequenced 136 immunity and 287 non-immunity genes from these samples. Based on the differences in the sequences between the two species, and the genetic diversity within each species, we have estimated that natural selection drives twice as much change in immune-related proteins as in proteins with no immune function. Interestingly, the rate of adaptation is also more variable among immunity genes than among other genes in the genome, with a small subset of immunity genes evolving under intense natural selection. We suggest that these genes may represent hotspots of host–parasite coevolution within the genome.
doi:10.1371/journal.pgen.1000698
PMCID: PMC2759075  PMID: 19851448
15.  Adaptive Evolution and the Birth of CTCF Binding Sites in the Drosophila Genome 
PLoS Biology  2012;10(11):e1001420.
Comparative ChIP-seq data reveal adaptive evolution of insulator protein CTCF binding in multiple Drosophila species.
Changes in the physical interaction between cis-regulatory DNA sequences and proteins drive the evolution of gene expression. However, it has proven difficult to accurately quantify evolutionary rates of such binding change or to estimate the relative effects of selection and drift in shaping the binding evolution. Here we examine the genome-wide binding of CTCF in four species of Drosophila separated by between ∼2.5 and 25 million years. CTCF is a highly conserved protein known to be associated with insulator sequences in the genomes of human and Drosophila. Although the binding preference for CTCF is highly conserved, we find that CTCF binding itself is highly evolutionarily dynamic and has adaptively evolved. Between species, binding divergence increased linearly with evolutionary distance, and CTCF binding profiles are diverging rapidly at the rate of 2.22% per million years (Myr). At least 89 new CTCF binding sites have originated in the Drosophila melanogaster genome since the most recent common ancestor with Drosophila simulans. Comparing these data to genome sequence data from 37 different strains of Drosophila melanogaster, we detected signatures of selection in both newly gained and evolutionarily conserved binding sites. Newly evolved CTCF binding sites show a significantly stronger signature for positive selection than older sites. Comparative gene expression profiling revealed that expression divergence of genes adjacent to CTCF binding sites is significantly associated with the gain and loss of CTCF binding. Further, the birth of new genes is associated with the birth of new CTCF binding sites. Our data indicate that binding of Drosophila CTCF protein has evolved under natural selection, and CTCF binding evolution has shaped both the evolution of gene expression and genome evolution during the birth of new genes.
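Divergence that increases linearly with evolutionary distance, at a stated percentage per Myr, corresponds to the slope of a line through the origin. The sketch below fits that slope by least squares. It assumes divergence is expressed as the percentage of binding sites not shared by a species pair, and that there is no divergence at zero separation. The study's exact divergence measure and fitting procedure are not reproduced, and the numbers are invented.

```python
def rate_through_origin(times_myr: list[float], divergence_pct: list[float]) -> float:
    """Least-squares slope of divergence on time with the intercept fixed at 0
    (no divergence at zero separation): sum(t*d) / sum(t*t)."""
    num = sum(t * d for t, d in zip(times_myr, divergence_pct))
    den = sum(t * t for t in times_myr)
    return num / den


# Hypothetical pairwise comparisons: divergence time (Myr) and the percentage
# of binding sites that differ between the two species.
times = [2.5, 5.0, 25.0]
divergence = [5.5, 11.2, 55.4]
print(f"estimated divergence rate: {rate_through_origin(times, divergence):.2f}% per Myr")
```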
Author Summary
A large proportion of the diversity of living organisms results from differential regulation of gene transcription. Transcriptional regulation is thought to differ between species because of evolutionary changes in the physical interactions between regulatory DNA elements and DNA-binding proteins; these can generate variation in the spatial and temporal patterns of gene expression. The mechanisms by which these protein–DNA interactions evolve are therefore an important question in evolutionary biology. Does adaptive evolution play a role, or is the process dominated by neutral genetic drift? Insulator proteins are a special group of DNA-binding proteins: instead of directly serving to activate or repress genes, they can function to coordinate the interactions between other regulatory elements (such as enhancers and promoters). Additionally, insulator proteins can limit the spreading of chromatin condensation and help to demarcate the boundaries of regulatory domains in the genome. In spite of their critical role in genome regulation, little is known about the evolution of interactions between insulator proteins and DNA. Here, we use ChIP-seq to examine the distribution of binding sites for CTCF, a highly conserved insulator protein, in four closely related Drosophila species. We find that genome-wide binding profiles of CTCF are highly dynamic across evolutionary time, with frequent births of new CTCF-DNA interactions, and we demonstrate that this evolutionary process is driven by natural selection. By comparing these with RNA-seq data, we find that gain or loss of CTCF binding impacts the expression levels of nearby genes and correlates with structural evolution of the genome. Together these results suggest a potential mechanism of regulatory re-wiring through adaptive evolution of CTCF binding.
doi:10.1371/journal.pbio.1001420
PMCID: PMC3491045  PMID: 23139640
16.  The Cellular Robustness by Genetic Redundancy in Budding Yeast 
PLoS Genetics  2010;6(11):e1001187.
The frequent dispensability of duplicated genes in budding yeast is heralded as a hallmark of genetic robustness contributed by genetic redundancy. However, theoretical predictions suggest such backup by redundancy is evolutionarily unstable, and the extent of genetic robustness contributed from redundancy remains controversial. It is anticipated that, to achieve mutual buffering, the duplicated paralogs must at least share some functional overlap. However, counter-intuitively, several recent studies reported little functional redundancy between these buffering duplicates. The large yeast genetic interaction data sets released recently allowed us to address these issues on a genome-wide scale. We herein characterized the synthetic genetic interactions for ∼500 pairs of yeast duplicated genes originating from either whole-genome duplication (WGD) or small-scale duplication (SSD) events. We established that functional redundancy between duplicates is a prerequisite and thus is highly predictive of their backup capacity. This observation was particularly pronounced with the use of a newly introduced metric in scoring functional overlap between paralogs on the basis of gene ontology annotations. Even though mutual buffering was observed to be prevalent among duplicated genes, we showed that the observed backup capacity is largely an evolutionarily transient state. The loss of backup capacity generally follows a neutral mode, with the buffering strength decreasing in proportion to divergence time, and the vast majority of the paralogs have already lost their backup capacity. These observations validated previous theoretical predictions about the instability of genetic redundancy. However, departing from the general neutral mode, intriguingly, our analysis revealed the presence of natural selection in stabilizing functional overlap between SSD pairs.
These selected pairs, both WGD and SSD, tend to have decelerated functional evolution, have higher propensities of co-clustering into the same protein complexes, and share common interacting partners. Our study revealed the general principles for the long-term retention of genetic redundancy.
Author Summary
Eukaryotic cells show remarkable robustness against external perturbations, which has been attributed, at least in part, to the extensive gene duplication events in eukaryotic genomes. By duplication, genes are likely to gain redundant copies for backup purposes; however, this notion contradicts the population genetic theory that genetic redundancy is evolutionarily unstable. In this study, we used yeast as a model organism to delineate the evolutionary trajectory of genetic robustness by gene duplication, utilizing the comprehensively characterized synthetic genetic interaction data in the yeast genome. We showed that the evolution of genetic robustness by duplication follows a neutral mode, with the loss of backup capacity proportional to the divergence time. However, natural selection was also acting on a few pairs to maintain their long-term backup capacity; these pairs are slowly evolving, are co-clustered in the same protein complexes, and tend to interact with similar partners. This study unravels the general principles underlying the evolution of the cellular robustness arising from genetic redundancy.
doi:10.1371/journal.pgen.1001187
PMCID: PMC2973813  PMID: 21079672
17.  Horizontal Transfer, Not Duplication, Drives the Expansion of Protein Families in Prokaryotes 
PLoS Genetics  2011;7(1):e1001284.
Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes (xenologs) persist longer in prokaryotic lineages, possibly due to a greater or longer-lasting adaptive role. On the other hand, duplicated genes (paralogs) are expressed more and, when persistent, evolve more slowly. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions and duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein–protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biological systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.
Author Summary
Prokaryotes can be found in the most diverse and severe ecological niches of the planet. Their rapid adaptation is, in part, the result of their ability to acquire genetic information horizontally. This means that prokaryotes have two major paths to expand their repertoire of protein families: they can duplicate a pre-existing gene or acquire it by horizontal transfer. In this study, we track family expansions among closely related strains of prokaryotic species. We find that the majority of gene expansions arrive via transfer, not via duplication. Additionally, we find that duplicate genes tend to be more transient and evolve more slowly than transferred ones, highlighting different roles with respect to adaptation and evolution. These results suggest that prevailing theories of the evolution of biological systems grounded in gene duplication may be poorly suited to explaining the evolution of prokaryotic systems, which include the vast majority of life's biochemical diversity.
doi:10.1371/journal.pgen.1001284
PMCID: PMC3029252  PMID: 21298028
18.  Genetic Selection for Context-Dependent Stochastic Phenotypes: Sp1 and TATA Mutations Increase Phenotypic Noise in HIV-1 Gene Expression 
PLoS Computational Biology  2013;9(7):e1003135.
The sequence of a promoter within a genome does not uniquely determine gene expression levels and their variability; rather, promoter sequence can additionally interact with its location in the genome, or genomic context, to shape eukaryotic gene expression. Retroviruses, such as human immunodeficiency virus-1 (HIV), integrate their genomes into those of their host and thereby provide a biomedically relevant model system to quantitatively explore the relationship between promoter sequence, genomic context, and noise-driven variability in viral gene expression. Using an in vitro model of the HIV Tat-mediated positive-feedback loop, we previously demonstrated that fluctuations in viral Tat-transactivating protein levels generate integration-site-dependent, stochastically driven phenotypes, in which infected cells randomly ‘switch’ between high and low expressing states in a manner that may be related to viral latency. Here we extended this model and designed a forward genetic screen to systematically identify genetic elements in the HIV LTR promoter that modulate the fraction of genomic integrations that specify ‘Switching’ phenotypes. Our screen identified mutations in core promoter regions, including Sp1 and TATA transcription factor binding sites, which increased the Switching fraction severalfold. By integrating single-cell experiments with computational modeling, we further investigated the mechanism of Switching-fraction enhancement for a selected Sp1 mutation. Our experimental observations demonstrated that the Sp1 mutation both impaired Tat-transactivated expression and altered basal expression in the absence of Tat. Computational analysis demonstrated that the observed change in basal expression could contribute significantly to the observed increase in viral integrations that specify a Switching phenotype, provided that the selected mutation affected Tat-mediated noise amplification differentially across genomic contexts.
Our study thus demonstrates a methodology to identify and characterize promoter elements that affect the distribution of stochastic phenotypes over genomic contexts, and advances our understanding of how promoter mutations may control the frequency of latent HIV infection.
Author Summary
The sequence of a gene within a cellular genome does not uniquely determine its expression level, even for a single type of cell under fixed conditions. Numerous other factors, including gene location on the chromosome and random gene-expression “noise,” can alter expression patterns and cause differences between otherwise identical cells. This poses new challenges for characterizing the genotype–phenotype relationship. Infection by the human immunodeficiency virus-1 (HIV-1) provides a biomedically important example in which transcriptional noise and viral genomic location impact the decision between viral replication and latency, a quiescent but reversible state that cannot be eliminated by anti-viral therapies. Here, we designed a forward genetic screen to systematically identify mutations in the HIV promoter that alter the fraction of genomic integrations that specify noisy/reactivating expression phenotypes. The mechanisms by which the selected mutations specify the observed phenotypic enrichments are investigated through a combination of single-cell experiments and computational modeling. Our study provides a framework for identifying genetic sequences that alter the distribution of stochastic expression phenotypes over genomic locations and for characterizing their mechanisms of regulation. Our results also may yield further insights into the mechanisms by which HIV sequence evolution can alter the propensity for latent infections.
doi:10.1371/journal.pcbi.1003135
PMCID: PMC3708878  PMID: 23874178
19.  Gain and Loss of Multiple Genes During the Evolution of Helicobacter pylori 
PLoS Genetics  2005;1(4):e43.
Sequence diversity and gene content distinguish most isolates of Helicobacter pylori. Even greater sequence differences differentiate distinct populations of H. pylori from different continents, but it was not clear whether these populations also differ in gene content. To address this question, we tested 56 globally representative strains of H. pylori and four strains of Helicobacter acinonychis with whole genome microarrays. Of the weighted average of 1,531 genes present in the two sequenced genomes, 25% are absent in at least one strain of H. pylori and 21% are absent or variable in H. acinonychis. We extrapolate that the core genome present in all isolates of H. pylori contains 1,111 genes. Variable genes tend to be small and possess unusual GC content; many of them have probably been imported by horizontal gene transfer. Phylogenetic trees based on the microarray data differ from those based on sequences of seven genes from the core genome. These discrepancies are due to homoplasies resulting from independent gene loss by deletion or recombination in multiple strains, which distort phylogenetic patterns. The patterns of these discrepancies versus population structure allow a reconstruction of the timing of the acquisition of variable genes within this species. Variable genes that are located within the cag pathogenicity island were apparently first acquired en bloc after speciation. In contrast, most other variable genes are of unknown function or encode restriction/modification enzymes, transposases, or outer membrane proteins. These seem to have been acquired prior to speciation of H. pylori and were subsequently lost by convergent evolution within individual strains. Thus, the use of microarrays can reveal patterns of gene gain or loss when examined within a phylogenetic context that is based on sequences of core genes.
Synopsis
The Gram-negative pathogenic bacterium Helicobacter pylori colonizes the stomach of 50% of mankind and has probably infected humans since their origins. Due to geographic isolation and frequent local recombination, phylogeographic differences within H. pylori have arisen, resulting in multiple populations and subpopulations that mirror ancient human migrations and genetic diversity. We have examined the gene content of representatives of these populations by whole genome microarrays. Of the 1,531 genes present on average in the two sequenced genomes, only 1,111 are predicted to exist in all H. pylori. Missing genes fall into two classes: one class contains genes within the cag pathogenicity island, which was acquired en bloc after speciation and is present only in particular populations. The second class contains a variety of genes whose function may be unimportant for the cell and that were acquired prior to speciation. Their absence in individual isolates reflects convergent evolution through gene loss. Thus, patterns of gene gain or loss can be identified by whole genome microarrays within a phylogenetic context supplied by sequences of genes from the core genome.
doi:10.1371/journal.pgen.0010043
PMCID: PMC1245399  PMID: 16217547
20.  Epistatic relationships reveal the functional organization of yeast transcription factors 
A comprehensive quantitative genetic interaction map, or E-MAP, has provided a global view of the functional interdependencies between the components of the transcriptional apparatus in budding yeast. Transcription factors that display aggravating/negative genetic interactions regulate gene expression in an independent rather than coordinated manner. Parallel/compensating relationships between regulators often characterize transcriptional circuits.
Genetic interactions identify the functional interdependencies between genes (Guarente, 1993). They can be either positive (i.e. alleviating) or negative (i.e. aggravating) in nature, corresponding to cases where the double mutant grows better or worse, respectively, than expected from the growth of the corresponding single mutants (Beyer et al, 2007). Negative genetic interactions between non-essential genes often identify factors involved in parallel pathways, whereas positive ones often correspond to cases where the corresponding proteins are working in the same pathway and/or are physically associated (Beltrao et al, 2010). The epistatic miniarray profile (E-MAP) approach (Schuldiner et al, 2005), which quantitatively and comprehensively identifies both positive and negative genetic interactions on a logically selected set of genes, was used in this study in S. cerevisiae to genetically interrogate the set of 151 sequence-specific transcription factors (STFs) as well as 172 components of the general transcriptional machinery (GTFs).
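To make "better or worse than expected from the single mutants" concrete, here is a minimal Python sketch assuming the common multiplicative null model, in which the expected double-mutant fitness is the product of the two single-mutant fitnesses (each normalized so that wild type = 1). The E-MAP S-score used in the study is computed differently; the function names, the neutrality band `tol` and the fitness values are hypothetical.

```python
def interaction_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Return epsilon = observed minus expected double-mutant fitness.

    Fitness values are relative to wild type (= 1.0). Under the
    multiplicative null model (an assumption here), the expected
    double-mutant fitness is w_a * w_b.
    """
    expected = w_a * w_b
    return w_ab - expected


def classify(epsilon: float, tol: float = 0.05) -> str:
    """Label an interaction; `tol` is a hypothetical neutrality band."""
    if epsilon < -tol:
        return "negative (aggravating)"
    if epsilon > tol:
        return "positive (alleviating)"
    return "neutral"


# Two mildly sick single mutants whose double mutant is much sicker.
eps = interaction_score(0.9, 0.8, 0.3)
print(round(eps, 2), classify(eps))  # -0.42 negative (aggravating)
```

A negative epsilon (double mutant sicker than predicted) is the signature the text associates with parallel, compensating pathways.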
We found a higher propensity of the group of STFs to strongly genetically interact with GTFs than with themselves (Figure 1A and B). However, within the set of STF–STF genetic interactions, there was a significant enrichment of negative over positive genetic interactions (Figure 1A and C), suggesting that parallel/compensating relationships, rather than linear pathways, predominate within the set of STFs. These genetic trends are in stark contrast to what was previously observed with factors involved in regulating signaling (e.g. kinases and phosphatases), which were significantly enriched in positive over negative genetic interactions (Fiedler et al, 2009).
In addition to providing an overview of the global relationships among TFs, the fine structure of the E-MAP can be used to address the nature of the regulatory architecture controlling individual genes. A variety of regulatory patterns have been described that serve the differing functional requirements of various biological processes (Istrail and Davidson, 2005). Our E-MAP identified several examples of the regulatory relationships between transcription factors, including (1) one TF acting as a repressor of another TF (e.g. Gal80 acting as a repressor of Gal4, the activator of the GAL genes); (2) two TFs acting redundantly to regulate a set of genes (e.g. Gln3 and Gat1, two GATA family activators involved in regulating nitrogen catabolite repression (NCR)); and (3) two TFs regulating genes in a coordinated manner (e.g. Hac1 working with the HDAC complex Rpd3C(L) to regulate expression of early meiotic genes).
Given the complex structures of promoters (Zhu and Zhang, 1999; Chin et al, 2005) and the possible types of regulatory logic (Buchler et al, 2003), we wanted to identify the types of logic that are used in nature. We explored this by combining our genetic interaction data with information about the network connections between STFs and their targets. By initially focusing on pairs of STFs that share a set of targets defined by genome-wide binding studies (Harbison et al, 2004; MacIsaac et al, 2006), we identified a total of 110 STF gene pairs with statistically significant target overlap (P < 0.005), of which 49 pairs have significant overlap at a more stringent cutoff ($P < 10^{-7}$). Several examples were examined in more detail by quantitative growth assays in liquid culture and by gene expression profiling of the TF-deletion mutants. In each case, the growth rate of one of the single-deletion mutants is significantly reduced (i.e. ‘the major regulator’), whereas the growth rate of the other single-deletion mutant is similar to that of the wild type (i.e. ‘the minor regulator’). In the absence of the major regulator, deletion of the minor regulator leads to a more severe growth defect, resulting in a negative genetic interaction (Figure 5A). We examined the response of common target genes of two pairs of TFs (Swi4-Skn7 and Gcr2-Tye7) and found an enrichment of common target genes displaying ‘OR’ but not ‘AND’ behavior, in the simplified language of Boolean logic. Further examination of the targets revealed that many of them are induced or repressed more by the double deletion than by either single deletion (Figure 5D). Collectively, these results suggest that TF pairs with negative interactions frequently regulate the transcription of their common target genes in a redundant manner.
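The distinction between ‘OR’ and ‘AND’ behavior can be illustrated with a toy Boolean model. The Python sketch below assumes that a target is simply on (1) or off (0) depending on whether two activators, here called TF1 and TF2, are present, and tabulates the target output in wild type, each single deletion and the double deletion. Real promoters are graded rather than binary, so this is only an illustration; all names are hypothetical.

```python
def target_output(tf1: bool, tf2: bool, logic: str) -> int:
    """Toy promoter: return 1 if the target is expressed, else 0."""
    if logic == "OR":   # redundant: either activator suffices
        return int(tf1 or tf2)
    if logic == "AND":  # cooperative: both activators are required
        return int(tf1 and tf2)
    raise ValueError(f"unknown logic: {logic}")


# (TF1 present, TF2 present) for each genotype.
GENOTYPES = {
    "wild type": (True, True),
    "tf1 deletion": (False, True),
    "tf2 deletion": (True, False),
    "double deletion": (False, False),
}


def profile(logic: str) -> dict:
    """Target output for every genotype under the given promoter logic."""
    return {name: target_output(*tfs, logic) for name, tfs in GENOTYPES.items()}


for logic in ("OR", "AND"):
    print(logic, profile(logic))
```

Under OR logic only the double deletion changes the target, which matches targets that respond more to the double deletion than to either single deletion; under AND logic either single deletion already abolishes expression.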
The regulation of gene expression is, in large part, mediated by interplay between the general transcription factors (GTFs) that function to bring about the expression of many genes and site-specific DNA-binding transcription factors (STFs). Here, quantitative genetic profiling using the epistatic miniarray profile (E-MAP) approach allowed us to measure 48 391 pairwise genetic interactions, both negative (aggravating) and positive (alleviating), between and among genes encoding STFs and GTFs in Saccharomyces cerevisiae. This allowed us to both reconstruct regulatory models for specific subsets of transcription factors and identify global epistatic patterns. Overall, there was a much stronger preference for negative relative to positive genetic interactions among STFs than there was among GTFs. Negative genetic interactions, which often identify factors working in non-essential, redundant pathways, were also enriched for pairs of STFs that co-regulate similar sets of genes. Microarray analysis demonstrated that pairs of STFs that display negative genetic interactions regulate gene expression in an independent rather than coordinated manner. Collectively, these data suggest that parallel/compensating relationships between regulators, rather than linear pathways, often characterize transcriptional circuits.
doi:10.1038/msb.2010.77
PMCID: PMC2990640  PMID: 20959818
genetic interaction; regulatory network; transcription factor; transcription regulation
21.  Chromatin regulators as capacitors of interspecies variations in gene expression 
Deletion of eight chromatin regulators and one transcription factor increases the variability in gene expression between two closely related yeast species, suggesting that large-scale regulators often buffer variations in gene expression. Similar analysis of metabolic enzymes indicates that, unlike regulators, these enzymes do not buffer gene expression variations.
Biological systems are often robust to mutations: their outputs (for example, gene expression profiles) remain stable in the face of mutations. This ensures that most individuals maintain the ‘correct’ behavior, which has been shaped by millions of years of evolution, despite a constant flux of mutations. How is robustness maintained, and in particular, which genes are required for it? Such questions have been studied for decades, yet there are no simple answers.
Previous studies suggested that particular proteins, termed genetic capacitors, buffer the effects of mutations, thereby promoting robustness. The classical example of such a protein is Hsp90, whose activity as a chaperone has been proposed to aid the correct folding of mutant proteins and thus buffer the structural effects of mutations. The hallmark of a genetic capacitor is that its deletion reveals phenotypic differences between individuals or species, which are hidden (that is, buffered) in its presence.
The example of Hsp90 may suggest that buffering is a property of only a few proteins that carry particular catalytic functions, such as chaperones. However, theoretical studies have instead suggested that many proteins serve as genetic capacitors and that buffering is not necessarily a consequence of their direct activity but rather emerges naturally during the evolution of complex biological systems.
Here, we show that eight chromatin regulators and one transcription factor buffer interspecies variations in gene expression. We deleted each of these nine regulators in two closely related yeast species and compared the extent of interspecies expression difference before and after each deletion. The results clearly show that deletion of these regulators tends to increase the extent of expression differences, indicating that they are normally buffering variations in gene expression, thus serving as genetic capacitors.
Similar analysis of 11 metabolic enzymes showed that, unlike the regulators, deletion of these enzymes does not increase expression divergence. Thus, buffering may be a characteristic feature of large-scale regulators. Further analysis of the buffered variations suggested that these are often caused by mutations that affect regulatory proteins, presumably those involved in sensing the environment, and that buffered variations are found primarily in genes with distinctive promoter features that are associated with highly dynamic and responsive regulation.
We believe, as others have previously proposed, that buffering emerged naturally during evolution of a complex system. More specifically, we propose that organisms accumulate many mutations that have no functional consequences through random drift, but that some of these mutations would in fact be functional if a certain regulatory protein is inactive. These mutations are often conditionally neutral because of their epistatic interactions with mutations in regulatory proteins. Such epistatic interactions may not reflect direct buffering activity (as proposed for Hsp90) but rather an inevitable consequence of the connectivity and complexity of biological systems. Note that the opposite case—mutations that are normally functional but become neutral when the regulatory protein is inactive—are also frequent, but these are presumed to be efficiently purged by natural selection. As a result, deletion of such regulatory proteins unleashes the effects of many ‘hidden' mutations and increases variations among individuals or species.
Gene expression varies widely between closely related species and strains, yet the genetic basis of most differences is still unknown. Several studies suggested that chromatin regulators have a key role in generating expression diversity, predicting a reduction in the interspecies differences on deletion of genes that influence chromatin structure or modifications. To examine this, we compared the genome-wide expression profiles of two closely related yeast species following the individual deletions of eight chromatin regulators and one transcription factor. In all cases, regulator deletions increased, rather than decreased, the expression differences between the species, revealing hidden genetic variability that was masked in the wild-type backgrounds. This effect was not observed for individual deletions of 11 enzymes involved in central metabolic pathways. The buffered variations were associated with trans differences, as revealed by allele-specific profiling of the interspecific hybrids. Our results support the idea that regulatory proteins serve as capacitors that buffer gene expression against hidden genetic variability.
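The "before and after each deletion" comparison can be sketched as follows, assuming that interspecies divergence is summarized as the mean absolute log2 expression ratio between the two species over genes measured in both. The study's actual divergence statistic may differ, and the gene names and expression values are made up for illustration.

```python
import math


def divergence(species_a: dict, species_b: dict) -> float:
    """Mean |log2(a / b)| over genes measured in both species."""
    shared = species_a.keys() & species_b.keys()
    if not shared:
        raise ValueError("no shared genes")
    total = sum(abs(math.log2(species_a[g] / species_b[g])) for g in shared)
    return total / len(shared)


# Hypothetical expression levels (arbitrary units).
wt_a = {"g1": 100.0, "g2": 50.0, "g3": 20.0}
wt_b = {"g1": 100.0, "g2": 50.0, "g3": 40.0}
del_a = {"g1": 100.0, "g2": 25.0, "g3": 20.0}
del_b = {"g1": 400.0, "g2": 50.0, "g3": 40.0}

d_wt = divergence(wt_a, wt_b)     # only g3 differs (2-fold)
d_del = divergence(del_a, del_b)  # g1 differs 4-fold, g2 and g3 2-fold
print(d_wt, d_del, d_del > d_wt)
```

A regulator behaving as a capacitor is one whose deletion gives `d_del > d_wt`, i.e. the deletion unmasks expression differences that were hidden in the wild-type background.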
doi:10.1038/msb.2010.84
PMCID: PMC3010112  PMID: 21119629
chromatin structure; evolution; gene expression; genetic capacitor
22.  Cooperative Adaptive Responses in Gene Regulatory Networks with Many Degrees of Freedom 
PLoS Computational Biology  2013;9(4):e1003001.
Cells generally adapt to environmental changes by first exhibiting an immediate response and then gradually returning to their original state to achieve homeostasis. Although simple network motifs consisting of a few genes have been shown to exhibit such adaptive dynamics, they do not reflect the complexity of real cells, in which a large number of genes activate or repress one another to produce adaptive behaviors. Here, we investigated the responses of gene regulatory networks containing many genes that underwent numerical evolution, with fitness determined solely by the adaptive response of a single target gene; this target gene responds to changes in external inputs and later returns to basal levels. Despite selection on a single target, most genes showed adaptive responses after evolution. Such adaptive dynamics were not due to common motifs within a few genes; even without such motifs, almost all genes showed adaptation, albeit sometimes partial adaptation, in the sense that expression levels did not always return to original levels. The genes split into two groups: genes in the first group exhibited an initial increase in expression and then returned to basal levels, while genes in the second group exhibited the opposite changes in expression. In this model, genes in the first group received positive input from other genes within the first group but negative input from genes in the second group, and vice versa. Thus, the adaptation dynamics of genes from both groups were consolidated. This cooperative adaptive behavior was commonly observed when the number of genes involved exceeded roughly ten. These results have implications for the collective responses of gene expression networks in microarray measurements of the yeast Saccharomyces cerevisiae and for biological homeostasis in systems with many components.
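The difference between full and partial adaptation can be quantified per gene. The Python sketch below assumes a gene's trajectory is sampled starting just before the input change, and scores it as 1 when expression returns exactly to its pre-stimulus level and 0 when it stays at its peak deviation. This metric and the name `adaptation_score` are illustrative assumptions, not the definition used in the study.

```python
def adaptation_score(trajectory: list[float]) -> float:
    """Return 1 - |x_final - x_0| / |x_peak - x_0|.

    trajectory[0] is the pre-stimulus level; x_peak is the sample with
    the largest deviation from it, whether an increase or a decrease.
    """
    if len(trajectory) < 2:
        raise ValueError("need at least two time points")
    x0, x_final = trajectory[0], trajectory[-1]
    peak_dev = max(abs(x - x0) for x in trajectory)
    if peak_dev == 0:
        return 1.0  # no response at all: treated as trivially adapted by convention
    return 1.0 - abs(x_final - x0) / peak_dev


# A "first group" gene: rises, then comes halfway back (partial adaptation).
up = [1.0, 3.0, 2.0]
# A "second group" gene: dips, then fully recovers.
down = [1.0, 0.2, 1.0]
print(adaptation_score(up), adaptation_score(down))
```

Because the peak is taken as the largest absolute deviation, the same score handles both the up-then-down genes of the first group and the down-then-up genes of the second.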
Author Summary
Homeostasis is an inherent property of biological systems, which have a general tendency to adapt, i.e., to recover their original state following environmental changes. In cells, this adaptation is mediated by changes in protein expression. Initially, cells respond to environmental changes by altering gene/protein expression; subsequently, the expression of most genes returns to basal levels, albeit not completely, as shown by recent experimental analyses of yeast. Although simple mechanisms for adaptation through network motifs composed of just a few genes are well understood, how regulatory networks involving many genes that activate or repress each other can generate adaptive behaviors is unclear. Here, by numerically evolving gene regulatory networks, we obtained a class of networks in which almost all genes showed adaptive expression dynamics, and from these we revealed the general logic underlying such adaptive dynamics with many degrees of freedom, which was not reducible to motifs of a few genes. This adaptation was cooperative, i.e., the adaptation of each gene relied on the adaptive expression of the others. Moreover, this collective behavior was robust to noise and mutations. The present study sheds light on the nature of the collective gene expression dynamics that allow for biological homeostasis.
doi:10.1371/journal.pcbi.1003001
PMCID: PMC3616990  PMID: 23592959
23.  Proteome-wide evidence for enhanced positive Darwinian selection within intrinsically disordered regions in proteins 
Genome Biology  2011;12(7):R65.
Background
Understanding the adaptive changes that alter the function of proteins during evolution is an important question for biology and medicine. The increasing number of completely sequenced genomes from closely related organisms, as well as individuals within species, facilitates systematic detection of recent selection events by means of comparative genomics.
Results
We have used genome-wide strain-specific single nucleotide polymorphism data from 64 strains of budding yeast (Saccharomyces cerevisiae or Saccharomyces paradoxus) to determine whether adaptive positive selection is correlated with protein regions showing propensity for different classes of structural conformation. Data from phylogenetic and population genetic analysis of 3,746 gene alignments consistently show a significantly higher degree of positive Darwinian selection in intrinsically disordered regions of proteins compared to regions of alpha helix, beta sheet or tertiary structure. Evidence of positive selection is significantly enriched in classes of proteins whose functions and molecular mechanisms can be coupled to adaptive processes, and these classes tend to have a higher average content of intrinsically unstructured protein regions.
Conclusions
We suggest that intrinsically disordered protein regions may be important for the production and maintenance of genetic variation with adaptive potential and that they may thus be of central significance for the evolvability of the organism or cell in which they occur.
doi:10.1186/gb-2011-12-7-r65
PMCID: PMC3218827  PMID: 21771306
24.  Molecular evolution of genes in avian genomes 
Genome Biology  2010;11(6):R68.
Background
Obtaining a draft genome sequence of the zebra finch (Taeniopygia guttata), the second bird genome to be sequenced, provides the necessary resource for whole-genome comparative analysis of gene sequence evolution in a non-mammalian vertebrate lineage. To analyze basic molecular evolutionary processes during avian evolution, and to contrast these with the situation in mammals, we aligned the protein-coding sequences of 8,384 1:1 orthologs of chicken, zebra finch, a lizard and three mammalian species.
Results
We found clear differences in the substitution rate at fourfold degenerate sites, which was lowest in the ancestral bird lineage, intermediate in the chicken lineage and highest in the zebra finch lineage, possibly reflecting differences in generation time. We identified positively selected and/or rapidly evolving genes in avian lineages and found an over-representation of several functional classes, including anion transporter activity, calcium ion binding, cell adhesion and microtubule cytoskeleton.
Conclusions
Focusing specifically on genes of neurological interest and genes differentially expressed in the unique vocal control nuclei of the songbird brain, we find a number of positively selected genes, including synaptic receptors. We found no evidence that selection for beneficial alleles is more efficient in regions of high recombination; in fact, there was a weak yet significant negative correlation between ω (the ratio of nonsynonymous to synonymous substitution rates, dN/dS) and recombination rate, which is in the direction predicted by the Hill-Robertson effect if slightly deleterious mutations contribute to protein evolution. These findings set the stage for studies of the functional genetics of avian genes.
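ω is the ratio of the nonsynonymous to the synonymous substitution rate (dN/dS): values below 1 indicate purifying selection, values near 1 neutrality, and values above 1 positive selection. The Python sketch below illustrates only the ratio itself, assuming the numbers of nonsynonymous and synonymous sites and differences have already been counted, and applying a Jukes–Cantor multiple-hit correction to each proportion. Genome-scale analyses usually rely on likelihood-based codon models instead, and the input counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """Return dN/dS from counted differences and sites."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0:
        raise ValueError("no synonymous divergence; omega undefined")
    return d_n / d_s


# Hypothetical alignment: 300 nonsynonymous and 100 synonymous sites.
print(round(omega(6, 300, 10, 100), 3))
```

In this example the nonsynonymous proportion (2%) is well below the synonymous one (10%), so ω is well below 1, the pattern expected for a gene under purifying selection.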
doi:10.1186/gb-2010-11-6-r68
PMCID: PMC2911116  PMID: 20573239
25.  Relationship between gene duplicability and diversifiability in the topology of biochemical networks 
BMC Genomics  2014;15(1):577.
Background
Selective gene duplicability, the extensive expansion of a small number of gene families, is universal. Quantitatively, the number of genes $P(K)$ with $K$ duplicates in a genome decreases precipitously as $K$ increases and often follows a power law, $P(K) \propto K^{-\alpha}$. Functional diversification, through either neo- or sub-functionalization, is a major evolutionary route for duplicate genes.
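A quick way to estimate α from a duplicate-count distribution is a least-squares line through log P(K) versus log K, whose slope is -α. The Python sketch below assumes the counts are given as a mapping from K to the number of genes with K duplicates. Log-log regression is a rough estimator (maximum-likelihood fitting is generally preferred for real data), and the synthetic counts are purely illustrative.

```python
import math


def fit_power_law_exponent(counts: dict[int, float]) -> float:
    """Least-squares slope of log(count) vs log(K); returns alpha = -slope.

    Entries with K <= 0 or zero counts are skipped (log undefined).
    """
    pts = [(math.log(k), math.log(n)) for k, n in counts.items() if k > 0 and n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two usable points")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return -sxy / sxx


# Synthetic counts that follow P(K) = 1000 * K**-2 exactly.
synthetic = {k: 1000 / k**2 for k in range(1, 11)}
print(fit_power_law_exponent(synthetic))
```

On noise-free synthetic data the fit recovers α = 2; with real gene-family counts the sparse tail at large K makes the regression sensitive to binning.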
Results
Using three lines of genomic datasets, we studied the relationship between gene duplicability and diversifiability in the topology of biochemical networks. First, we explored a scenario in which two pathways in the biochemical networks antagonize each other: a synthetic double knockout of genes from both pathways rescues the phenotypic defects of each individual knockout. We identified duplicate gene pairs with sufficient divergence that represent this antagonistic relationship in the yeast S. cerevisiae. Such pairs overwhelmingly belong to large gene families and thus tend to have high duplicability. Second, we used distances between the proteins encoded by duplicate genes in the protein interaction network as a metric of their diversification. The higher a gene's duplicate count, the further the proteins of this gene and its duplicates drift away from one another in the network, which is especially true for genetically antagonizing duplicate genes. Third, we computed a sequence-homology-based clustering coefficient to quantify sequence diversifiability among duplicate genes: the lower the coefficient, the more the sequences have diverged. The duplicate count (K) of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family.
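The clustering-coefficient idea can be sketched with the standard local clustering coefficient. Assuming a sequence-similarity graph in which an edge joins two genes whose sequences pass some similarity threshold, a gene's duplicates are its neighbors, and the coefficient is the fraction of duplicate pairs that are themselves still linked. Tightly conserved families score near 1, while families whose members have drifted apart score lower. The graph construction and the example family are assumptions; the study's exact homology-based definition may differ.

```python
from itertools import combinations


def local_clustering(graph: dict[str, set[str]], gene: str) -> float:
    """Fraction of neighbor pairs of `gene` that are linked to each other.

    `graph` is an undirected adjacency map; an edge means the two
    sequences pass a similarity threshold (an assumption here).
    """
    neighbors = graph[gene]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0 by convention
    linked = sum(1 for u, v in combinations(sorted(neighbors), 2) if v in graph[u])
    return linked / (k * (k - 1) / 2)


# Hypothetical family: A is similar to B, C and D; of those, only B and C
# are still similar to each other.
family = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(family, "A"))
```

Here only one of the three duplicate pairs around A remains linked, so A's coefficient is 1/3, indicating substantial sequence divergence within its family.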
Conclusion
Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks, a finding that improves our understanding of gene duplicability.
doi:10.1186/1471-2164-15-577
PMCID: PMC4129122  PMID: 25005725