Organisms face a constantly shifting landscape of environmental conditions and internal physiological states. How gene regulation and cellular functions are maintained across genetic and environmental variation is therefore a fundamental question in biology. Here, we analyze the Saccharomyces cerevisiae genetic interaction network to understand how the yeast cell maintains regulatory capacity across genetic backgrounds and environmental conditions. We used the recently characterized synthetic sick/lethal network in yeast, which measures the fitness effects of knocking out pairs of genes, to analyze interactions among 4,364 genes. Genes with large variance in epistatic effects on fitness are highly and ubiquitously expressed (with open chromatin conformations in their promoter regions) and evolve more slowly than genes with weak effects on fitness. Thus, rather than being the elements responsible for the regulation and responsiveness of the genetic network, genes with large epistatic effects tend to be more mundane “housekeeping” genes whose consistent expression is critical to fitness under all environments and that are thereby deeply embedded within the regulatory structure of the network. Our analysis shows that the yeast cell has evolved a system whereby a physical mechanism of regulation (nucleosome occupancy) buffers key genes from the variability experienced by the cell as a whole.
Biological systems are continuously faced with variability generated by both internal and external sources. Internally, mutations and shifting genetic backgrounds change the genetic context in which genes are expressed. Externally, changing environmental conditions alter the global context within which molecular systems operate and to which they must respond. Understanding how biological systems evolve robust function across changing conditions has been a frequent focus of research, yet the mechanisms underlying robustness remain unknown (Masel and Siegal 2009). At a structural level, all organisms achieve a balance between systems stability and responsiveness using complex networks of interacting genetic elements drawn from a wide variety of functional classes (e.g., proteins, noncoding RNA, epigenetic modifications, and DNA response elements). In contrast to other commonly studied networks, such as the Internet, genetic networks are remarkable in the heterogeneity of function of each individual component (Proulx et al. 2005). In light of this heterogeneity, are there general principles about the structure of genetic networks that can be understood by studying network properties as a whole, or do the unique properties of each component in the system make general conclusions impossible? This is one of the fundamental questions in systems biology.
In principle, the network-wide influences of individual genes should reveal themselves via epistatic interactions among network components, broadly construed (Dixon et al. 2009). By necessity, the effects of individual genes will be contingent upon the broader genetic network in which they are found, since no gene can function in isolation. Taken two at a time, whether the combined effects of genes are positive or negative will depend on the relative position and connection of those particular genes within the network (Dixon et al. 2009; Michaut et al. 2011). There is no reason that a given gene should have all positive or negative interactions with genes throughout the genome—in fact network structure demands that this not be the case. Thus, while examining the average interaction properties of genes can be useful (e.g., He et al. 2010), it is in the variability of epistatic interactions that the structure of the genetic network should reveal itself (see also Phillips et al. 2000). Despite decades of interest and speculation, we still know very little about the role that epistasis plays in the structure, function, and evolution of genetic systems (Wolf et al. 2000; Phillips 2008). To some extent this limitation has been generated by a tendency (and perhaps necessity) to only consider a few interactions at a time, rather than the entire spectrum of possible interactions that a single gene may have with the many thousand other genes in the genome.
Comprehensive systems approaches are beginning to produce the detailed, large-scale descriptions of gene effects and interactions that are needed to address the question of an individual gene’s role within a broader network (Boone et al. 2007; Vizeacoumar et al. 2009). For example, Costanzo et al. (2010) mated strains of the baker's yeast Saccharomyces cerevisiae containing single gene deletions or mutant alleles and measured the resulting double mutant effects on growth for ~6 million gene combinations. Growth is the most critical component of fitness in yeast and is highly correlated with other fitness components such as competitive ability (Bell 2010). Costanzo et al. (2010) estimated the magnitude and direction of the interaction between each pair of genes by comparing the double mutant fitness to fitness expectations derived from single mutants (fig. 1). Overall, this dataset comprises approximately 20% of the total potential yeast interactome.
The yeast genetic interaction network is remarkable because it measures functional biological relationships through quantitative effects on the fundamental unit of evolution: fitness. An emerging view is that single-cell organisms achieve robust cellular function in part through responsiveness in gene expression and regulation (Gasch et al. 2000). For example, S. cerevisiae tolerates fluctuations in the type and quantity of nutrients, temperature, pH, and chemical stressors. Yeast respond to these environmental stresses with global changes in gene expression involving 20–50% of the genome (Gasch et al. 2000; Causton et al. 2001). At the same time, yeast tolerate high levels of genetic perturbation, and ~80% of yeast genes are dispensable or not essential for growth under standard laboratory conditions (Giaever et al. 2002). Perturbations affect genomic regulation, and gene expression is correlated across environmental and genetic variation (Proulx et al. 2007; Choi and Kim 2009; Lehner 2010b). These patterns suggest that each gene may play a defined role in the cell's response to variability.
In this study, we address how epistasis relates to gene function and evolutionary change by calculating the whole-genome spectrum of epistatic fitness effects for each gene. Previous analyses of the yeast genetic interaction network have focused on the number and pattern of interactions each gene participates in (Costanzo et al. 2010; Bellay et al. 2011; Michaut et al. 2011; Szappanos et al. 2011). Measures of network structure, or topology, are common across diverse disciplines (for review, see Boccaletti et al. 2006), but numerous studies indicate that understanding biological networks requires a deeper understanding of biological processes and mechanisms (e.g., Hahn et al. 2004; Wang and Zhang 2007; Hakes et al. 2008; Jovelin and Phillips 2009; Agarwal et al. 2010; Podder and Ghosh 2010). Here, we take a different approach to analyzing a biological network by focusing on the effects that genetic interactions have on the organism and how network structure relates to these effects. We then use the context of variation in fitness effects to analyze results from a diverse set of published studies reporting genomic expression and regulation, evolutionary rates, and functional molecular mechanisms, thereby revealing an integrated picture of organismal function and evolution.
We calculated the interaction mean and variance for each gene as a measure of epistatic fitness effects (fig. 2A and B). Epistasis refers to interactions between genes (Phillips 2008), and here is defined as a deviation from a multiplicative (independent) expectation of mutant fitness effects (fig. 1). Although previous studies have focused on the direction of epistatic interactions (for recent examples, see He et al. 2010; Chou et al. 2011; Khan et al. 2011), each gene in the yeast interaction network participates in numerous epistatic interactions, both positive and negative (ranging from 6 to 3,738 total interactions per gene). We find that the variance in epistatic effects is a particularly useful secondary measure of each gene's whole-genome impact on fitness relative to other genes and quantifies its importance in the cellular system. If a gene has large variance in epistatic fitness effects, then it participates in a larger and more diverse set of interactions across the genome. Some of these interactions will be positive and result in less severe reductions in fitness, whereas others will be negative and thereby enhance or exacerbate the single mutant effects (fig. 1). On average, a perturbation in that gene will have a larger effect on fitness than one in a gene with small epistatic fitness effects. Interaction mean was highly correlated with variance (supplementary table S1, Supplementary Material online) but was not a statistically significant factor in our analyses (supplementary table S2, Supplementary Material online).
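The definitions above can be illustrated with a minimal Python sketch. It assumes that single and double mutant fitness values are available as plain dictionaries and that epistasis is scored as ε = f_ab − f_a × f_b, the deviation from the multiplicative expectation. The function names, the use of the population variance, and the toy fitness values are all hypothetical choices made for illustration; they are not taken from Costanzo et al. (2010).

```python
from collections import defaultdict
from statistics import mean, pvariance


def epistasis(f_a, f_b, f_ab):
    """Deviation of double mutant fitness from the multiplicative expectation."""
    return f_ab - f_a * f_b


def interaction_summary(single, double):
    """Return {gene: (number of interactions, interaction mean, interaction variance)}.

    single: {gene: single mutant fitness}
    double: {(gene_a, gene_b): double mutant fitness}
    Assumption: every pair in `double` counts as an interaction; a real
    analysis would first filter pairs by significance.
    """
    scores = defaultdict(list)
    for (a, b), f_ab in double.items():
        eps = epistasis(single[a], single[b], f_ab)
        # Each interaction contributes to the spectrum of both partners.
        scores[a].append(eps)
        scores[b].append(eps)
    return {
        gene: (len(eps), mean(eps), pvariance(eps))
        for gene, eps in scores.items()
    }


# Hypothetical toy values, for illustration only.
single = {"A": 0.9, "B": 0.8, "C": 1.0}
double = {("A", "B"): 0.80, ("A", "C"): 0.70, ("B", "C"): 0.80}
summary = interaction_summary(single, double)
```

In this toy example, gene A has one positive interaction (with B) and one negative interaction (with C), so its interaction mean is close to zero while its interaction variance is comparatively large.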
To analyze gene characteristics, we measured network topology and individual gene effects. Different measures of network topology were highly correlated with one another (supplementary table S1, Supplementary Material online), and we selected connectivity (the number of genes that a given gene interacts with) as the most descriptive topology measurement for further analyses. Individual gene effects were measured through essentiality (whether the gene is essential for growth and development) and single mutant fitness (fitness after perturbing a single gene). These measurements were correlated with one another and with epistatic fitness effects, indicating that genes with higher interaction variance have more connections and larger individual effects on fitness.
Patterns of epistasis should in principle be reflective of the functional networks that underlie the mapping from genotype to phenotype. These networks are composed of genetic elements that vary across individuals and populations and are structured to respond to the environment by amplifying or suppressing variation. To examine the relationship between an individual gene's epistatic fitness effects and its tendency toward variation, we analyzed data from several genome-wide studies that reported gene expression across a broad range of genetic and environmental conditions. These include transcriptional and translational variation among genes (Landry et al. 2007; Nagalakshmi et al. 2008) and cells (Gasch et al. 2000; Newman et al. 2006), and genetic variation among siblings (Brem and Kruglyak 2005), populations (Choi and Kim 2008), and closely related species (Tirosh et al. 2006). Previous work has demonstrated that individual genes have correlated patterns of expression across these different scales (Choi and Kim 2009; Tirosh, Barkai, et al. 2009). Some genes are sensitive to perturbation and have expression levels that vary with environment or genetic background, whereas others have stable expression levels that are robust to both genetic and environmental change.
We mapped expression variation onto the yeast interaction network and found that the majority of these datasets showed a distinct pattern in which genes with small epistatic fitness effects show high variance in expression sensitivity (an example is shown in fig. 2C, and the full dataset is shown in supplementary fig. S1, Supplementary Material online). Some genes with small epistatic fitness effects are therefore extremely sensitive to perturbations while others have stable, robust expression. In contrast, we find that genes with larger variance in epistatic fitness effects consistently have stable levels of gene expression. The datasets we examined were negatively correlated with epistatic fitness effects with the exception of response to transcription factor regulation (Choi and Kim 2008) and expression level (Nagalakshmi et al. 2008), which were positively correlated with epistatic fitness effects.
We performed a partial correlation analysis to test for the possibility that relationships among the measures were affecting the correlations (supplementary table S2, Supplementary Material online). Epistatic fitness effects remained significantly correlated with the same datasets with the exception of expression level, for which the relationship was no longer significant. Single mutant fitness was significantly correlated with all of the expression datasets but transcription factor regulation. Connectivity was negatively correlated with chromatin regulation (Choi and Kim 2008, 2009), positively correlated with transcription factor regulation and expression level, and not correlated with the other datasets. We therefore conclude that single gene and epistatic fitness effects show the best relationship with expression variation.
Variation in gene expression can be generated both by trans-acting elements such as transcription factors and by cis-acting factors such as DNA response elements and local chromatin configuration. Recent work within yeast suggests that variation in gene expression maps to promoter architecture and nucleosome occupancy (Tirosh and Barkai 2008b; Choi and Kim 2009; Tirosh, Barkai, et al. 2009; Tirosh et al. 2010). A critical region of the promoter (−200 bp to −1 bp upstream of the transcription start site) shows a prominent nucleosome-depleted region (NDR) across the majority of genes (Lee et al. 2007). The DNA in NDRs is less likely to be bound to nucleosomes and is therefore thought to be more directly available to the transcriptional apparatus. The NDR is also found across Hemiascomycota species (Tsankov et al. 2010) and correlates with expression variance and gene function. Genes with open promoters tend to be “growth” genes involved in basic cellular maintenance processes and expressed constitutively during the cell's growth phase. Genes with promoters that are on average occupied by nucleosomes are overrepresented in stress and periodic physiological responses (Lee et al. 2007). These genes tend toward higher expression variance, and it is thought that the stochasticity and time lag associated with chromatin remodeling and the removal of the nucleosome result in transcriptional variation.
We used a dataset that mapped nucleosome occupancy at a 4 bp resolution across the entire S. cerevisiae genome (Lee et al. 2007) to examine the promoter architecture of each of the genes in the yeast interaction network and found that the probability of nucleosome occupancy in the NDR correlated negatively with epistatic fitness effects (ρ = −0.104, P = 2.2 × 10⁻¹⁸; fig. 3A and B). Single mutant fitness (ρ = 0.051, P = 0.75) and connectivity (ρ = −0.032, P = 0.22) were not correlated with nucleosome occupancy. The genome-wide average for nucleosome occupancy sits directly between genes with small epistatic effects and genes with increasingly strong epistatic effects (fig. 3C). Genes with the largest epistatic fitness effects have, on average, the most open chromatin directly upstream of the transcription start site. Nucleosome occupancy in other regions of the promoter and within the transcribed region does not differ between genes with different epistatic fitness effects. The area of the promoter upstream of the NDR shows high occupancy and weak, ‘fuzzy’ positioning (Lee et al. 2007), whereas the transcription start site marks the beginning of strong, consistent periodicity with well-positioned nucleosomes.
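As a rough illustration of how a per-gene NDR occupancy value could be summarized before correlating it with interaction variance, the following Python sketch averages probe-level occupancy over the −200 bp to −1 bp window. The input layout (pairs of position relative to the transcription start site and an occupancy score), the function name, and the toy values are assumptions for illustration; they do not describe the actual file format or processing used for the Lee et al. (2007) data.

```python
def ndr_occupancy(probes, start=-200, end=-1):
    """Mean nucleosome occupancy over probes within [start, end] relative to the TSS.

    probes: iterable of (position_relative_to_tss, occupancy) pairs.
    Returns None when no probe falls inside the window.
    """
    values = [occ for pos, occ in probes if start <= pos <= end]
    return sum(values) / len(values) if values else None


# Hypothetical probes at 4 bp spacing: a depleted promoter window
# flanked by more heavily occupied DNA.
probes = [
    (pos, 0.2 if -200 <= pos <= -1 else 0.8)
    for pos in range(-400, 200, 4)
]
ndr = ndr_occupancy(probes)
```

Passing a different window, for example `ndr_occupancy(probes, start=-400, end=-201)`, gives the corresponding summary for the region upstream of the NDR.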
Although measures of gene expression variation have previously been found to be correlated with one another (Choi and Kim 2009), they have tended not to be correlated with divergence in coding sequence (Tirosh and Barkai 2008a). In contrast, we found that sequence-level divergence (dN/dS ratio) among closely related species (Wall et al. 2005) correlated negatively with variance in epistatic fitness effects (fig. 2D). Single mutant fitness and connectivity were also correlated with dN/dS, indicating that genes with small effects on fitness and fewer numbers of connections tend to evolve more quickly than other genes in the yeast interaction network (supplementary table S2, Supplementary Material online).
We analyzed the most significantly enriched gene ontology (GO) biological processes to gain a functional perspective on epistatic fitness effects (reported in supplementary table S3, Supplementary Material online). Genes with very small epistatic fitness effects (interaction variance < 0.005) are enriched for metabolism and biological processes that connect the cell with its external environment, such as drug transport. Genes with slightly larger epistatic fitness effects (0.005 < interaction variance < 0.01) are enriched for processes associated with translation, gene expression, and metabolism. Genes with larger epistatic fitness effects (0.01 < interaction variance < 0.015) are enriched for processes associated with cellular, organelle, and chromosome organization and biogenesis. Genes with the largest epistatic fitness effects (interaction variance > 0.015) are also enriched for processes associated with different types of organization and biogenesis, and cellular localization.
Understanding biological and evolutionary significance at a systems level is a primary challenge for current biology, and our work demonstrates that studying genome-wide fitness effects creates a powerful framework for analyzing systems level patterns. Overall, the set of negative correlations revealed here implies a quantitative relationship between the multidimensional fitness effects of individual genes, nucleosome occupancy patterns, the production of variation through gene expression, and sequence-level divergence (fig. 3). Genes with large epistatic fitness effects are constitutively expressed throughout growth and have open chromatin directly upstream of the transcription start site. These genes are heavily regulated, core cellular components, and somewhat surprisingly, constant use has resulted in relative insensitivity to cellular conditions. Lack of dependence on chromatin remodeling produces robust expression levels, and these low variation genes show greater DNA sequence conservation among species. In contrast, genes with small epistatic effects have high variation across these scales. Some are constitutively expressed while others are periodically expressed, some have high nucleosome occupancy while others do not, and some have diverged through nucleotide substitutions while others have not.
Our analysis integrates several sets of well-documented phenomena, while providing insight into the evolutionary connections linking these phenomena. For example, previous studies have connected GO categories (Lee et al. 2007) and gene expression variability (Choi and Kim 2009) with patterns of nucleosome occupancy, and correlated highly connected “hub” genes with a number of features including essentiality, pleiotropy, chromatin structure, transcription, phenotypic capacitance, evolutionary rate, secretion, and vesicle transport (Costanzo et al. 2010). Previous studies have also suggested that promoter nucleosome occupancy may result in two classes of genes, one characterized by low levels of transcriptional plasticity and a second class with higher levels of transcriptional plasticity and evolutionary potential (Tirosh, Barkai, et al. 2009; Lehner 2010a). Although there has been ample evidence linking promoter nucleosome occupancy to transcriptional plasticity, until this point there has been no link between evolutionary change and patterns of nucleosome occupancy. The results presented here therefore connect transcriptional plasticity with sequence-level divergence via whole-genome fitness effects and show that genes with higher nucleosome occupancy evolved more quickly than low-occupancy genes.
From a functional perspective, the set of genes demonstrating large variance in epistatic fitness effects is somewhat surprising. Instead of the transcription factors and regulatory proteins that may a priori be expected to have both positive and negative interactions within the yeast cell, the genes with largest epistatic fitness variances are those involved in growth and maintenance of the cell, chromosomes, and organelles. These genes have stable expression levels and few evolutionary changes, indicating that the cell may not tolerate variation at these loci. The most likely explanation for these patterns is that this consistent expression is a dynamic equilibrium that is an evolved property of the yeast genetic network, which in turn is revealed by the fitness effects of the epistatic gene interactions. The variance in interaction effects generated when these genes are knocked out likely results from the differential influence of interactions with both positive and negative regulators. The yeast cell has thus evolved a system whereby expression and use of the core cellular components are buffered from the genetic and environmental variation the cell itself experiences. The most striking result from this study is that physical control at an epigenetic level is strongly reflected in fitness and the rate of evolutionary change.
In an evolutionary sense, the causal structure of this relationship is unclear, and there are two possibilities. First, because sequence-level divergence (dN/dS) measures the realized response to selection, genes with large epistatic effects may display constrained divergence because they do not produce sufficient expression variation for selection to act on. This is unlikely because yeast respond quickly to artificial selection for phenotypes as extreme as primitive multicellularity (Ratcliff et al. 2012). Expression variation therefore does not appear to be a barrier to divergence, but we cannot exclude this possibility. The second, more likely, explanation is that these genes display constrained divergence because they are under very strong stabilizing selection. Strong stabilizing selection has been documented for phenotypic traits (Stinchcombe et al. 2008), and mutation accumulation and hybridization experiments suggest a large capacity for expression divergence under relaxed selection (Denver et al. 2005; Rifkin et al. 2005; Tirosh et al. 2006; Tirosh, Reikhav, et al. 2009). Genes with large epistatic effects interact strongly across the genome and tend to be involved in basic cellular function and maintenance (supplementary table S3, Supplementary Material online). This combination of multidimensional fitness effects and functional roles suggests that the cell may not readily tolerate sequence-level changes at these genes.
Additional studies including information on interactions across environments and genotypes, and across multiple phylogenetic scales will be needed to test the hypothesis that regulatory robustness is an evolved property of these networks. Nevertheless, in yeast, it appears that one critical level of epigenetic regulation drives the complex system of genetic interactions. Epigenetics and chromatin dynamics are increasingly implicated in errors across functional networks and cancer (e.g., Gui et al. 2011). As interaction networks are documented at a higher resolution in a variety of organisms, additional research addressing the link between regulation, variation, and fitness may refine this picture and demonstrate a quantitative relationship between multidimensional fitness effects and the propensity for error.
Genetic interactions were obtained from measures of single and double mutant fitnesses reported by Costanzo et al. (2010) for 4,364 open reading frames (ORFs). Interactions with P < 0.05 were used to create the genetic interaction network we analyzed here. We calculated the connectivity (number of ORFs with a nonzero interaction), interaction mean, and variance for each ORF in the genetic interaction network. To analyze the relationship among network measures, we calculated Pearson correlation coefficients between the estimates for essentiality and single mutant fitness (reported in Costanzo et al. 2010), and our estimates of connectivity, interaction mean, and interaction variance with R (reported in supplementary table S1, Supplementary Material online). We used the Python 2.7 Networkx package (version 1.5) to calculate the betweenness, closeness, and degree centrality for each ORF in the yeast interaction network. Betweenness was calculated as $c_B(\nu) = \sum_{s,t \in V} \sigma(s,t|\nu) / \sigma(s,t)$ for each node ν, where V is the set of nodes, σ(s, t) is the number of shortest paths between s and t, and σ(s, t|ν) is the number of those paths passing through a node ν other than s or t. Closeness was calculated as the inverse of the average distance to all other nodes. Degree centrality was calculated as the fraction of other nodes each ORF was connected to. We calculated the Pearson correlation coefficients between the network topology measures and interaction variance with R (supplementary table S3, Supplementary Material online).
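The three topology measures described above can be sketched in plain Python for a small undirected, unweighted graph. This is a standard-library stand-in for the Networkx routines, not the code used in the analysis: betweenness follows the algorithm of Brandes and is left unnormalized (any normalization applied in the original analysis is not stated here), and the adjacency dictionary with nodes A to D is a hypothetical toy network.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Inverse of the average shortest-path distance to the reachable nodes."""
    out = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        out[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return out


def betweenness_centrality(adj):
    """Unnormalized betweenness: sum over pairs of sigma(s,t|v) / sigma(s,t)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Count shortest paths from s with a breadth-first search.
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes inward.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each unordered pair (s, t) is visited twice.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network: a hub B connected to A, C, and D.
adj = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
```

In the toy network the hub B lies on the only shortest path between every pair of the other nodes, so it has the highest value for all three measures, which mirrors the strong correlation among topology measures noted in the main text.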
The yeast genetic interaction data did not overlap completely with the datasets measuring gene expression, and for each dataset we extracted measurements for the yeast interaction network ORFs. We obtained measures of expression level for 2,812 of the ORFs in the yeast interaction network from a study reporting the median transcription level of the last 30 bp for each ORF through RNA-Seq (Nagalakshmi et al. 2008). We obtained measures of transcription factor regulation and chromatin regulation from a study measuring gene expression across a compendium describing the deletion effect of chromatin regulators or transcription factors (Choi and Kim 2008, 2009). In total, 4,268 ORFs from the yeast interaction network were measured in the chromatin regulation study and 4,163 ORFs were measured in the transcriptional regulation study. Divergence was reported in a microarray study measuring gene expression across five sets of environmental perturbation (heat shock, oxidative stress, nitrogen starvation, DNA damage, and carbon source switching) between S. cerevisiae, S. paradoxus, S. mikatae, and S. kudriavzevii. The set covered 2,838 ORFs of the yeast interaction network. Plasticity was measured as the sum of squares of the log2 ratios over a literature-curated dataset of yeast expression across 1,500 conditions for the same 2,838 ORFs in the yeast interaction set (Tirosh et al. 2006). Mutational variance (Vm) was estimated through microarray gene expression measures across four mutation accumulation lines for 3,873 ORFs in the yeast interaction set (Landry et al. 2007). Noise was measured through single-cell proteomic analysis that described variation relative to the median (Newman et al. 2006) for 1,502 ORFs contained in the yeast interaction network. Measures of stress response were obtained from a study reporting variance in microarray gene expression for 4,308 ORFs in the yeast interaction network (Gasch et al. 2000).
Segregating genetic variance was reported in a study measuring gene expression across 112 segregants from a cross between a standard laboratory strain and a wild yeast isolate (Brem and Kruglyak 2005), and included 3,978 ORFs contained in the yeast interaction set. We calculated the Spearman correlation coefficient between each of the datasets and interaction variance in R. We also calculated partial Spearman correlation coefficients with the R Partial Correlation package to analyze the significance of interaction variance when controlling for the other network measures (supplementary table S2, Supplementary Material online).
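The correlation step can be sketched in Python as follows. This is a simplified standard-library illustration, not the R code used in the analysis: it computes the Spearman coefficient as the Pearson correlation of average ranks and, as a simplifying assumption, controls for a single covariate with the first-order partial correlation formula rather than for all network measures at once. All variable names and numeric values are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y after controlling for one covariate z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values for five genes, for illustration only.
interaction_variance = [0.001, 0.004, 0.008, 0.012, 0.020]
expression_variance = [0.9, 0.7, 0.8, 0.3, 0.1]
connectivity = [10, 60, 40, 200, 150]

rho = spearman(interaction_variance, expression_variance)
rho_partial = partial_spearman(
    interaction_variance, expression_variance, connectivity
)
```

In this toy example the negative association between interaction variance and expression variance is weakened but not removed once connectivity is controlled for, which is the kind of comparison the partial correlation analysis is designed to make.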
Nucleotide divergence was obtained from a study reporting dN/dS ratios between S. bayanus, S. mikatae, S. paradoxus, and S. cerevisiae (Wall et al. 2005) for 2,167 ORFs in the yeast interaction network. Results did not differ when the ratio between nonsynonymous substitutions (dN) and synonymous substitutions per synonymous site (dS) was corrected for selection on synonymous sites (dN/dS′). Nucleosome patterns were extracted from a tiling array study mapping nucleosome occupancy at a 4 bp resolution across the yeast genome (Lee et al. 2007). Genomic chromatin was extracted from haploid yeast grown in YPD medium, and 3,505 ORFs overlapped with the yeast interaction network. The yeast interaction dataset is available from http://drygin.ccbr.utoronto.ca/costanzo2009 (last accessed June 2012). Divergence and plasticity data are given at: http://barkai-serv.weizmann.ac.il/TATA (last accessed June 2012). Stress response data are available from: http://genome-www.stanford.edu/yeast (last accessed June 2012). The atlas of yeast nucleosomes is available from: http://chemogenomics.stanford.edu/supplements/03nuc (last accessed June 2012). Enrichment and significance for GO biological processes were calculated with GOStat: http://gostat.wehi.edu.au/ (last accessed August 2012).
The authors thank the members of the Phillips lab for discussions and two anonymous reviewers for helpful comments. This work was supported by NIH grant R01-GM096008 and a Senior Scholar in Aging award from the Ellison Medical Foundation to P.C.P. and a U.S. NSF Postdoctoral Fellowship in Biological Informatics to J.L.F.