|Home | About | Journals | Submit | Contact Us | Français|
The effects of genetic variants on phenotypic traits often depend on environmental and physiological conditions, but such gene–environment interactions are poorly understood. Recently developed approaches that treat transcript abundances of thousands of genes as quantitative traits offer the opportunity to broadly characterize the architecture of gene–environment interactions. We examined the genetic and molecular basis of variation in gene expression between two yeast strains (BY and RM) grown in two different conditions (glucose and ethanol as carbon sources). We observed that most transcripts vary by strain and condition, with 2,996, 3,448, and 2,037 transcripts showing significant strain, condition, and strain–condition interaction effects, respectively. We expression profiled over 100 segregants derived from a cross between BY and RM in both growth conditions, and identified 1,555 linkages for 1,382 transcripts that show significant gene–environment interaction. At the locus level, local linkages, which usually correspond to polymorphisms in cis-regulatory elements, tend to be more stable across conditions, such that they are more likely to show the same effect or the same direction of effect across conditions. Distant linkages, which usually correspond to polymorphisms influencing trans-acting factors, are more condition-dependent, and often show effects in different directions in the two conditions. We characterized a locus that influences expression of many growth-related transcripts, and showed that the majority of the variation is explained by polymorphism in the gene IRA2. The RM allele of IRA2 appears to inhibit Ras/PKA signaling more strongly than the BY allele, and has undergone a change in selective pressure. Our results provide a broad overview of the genetic architecture of gene–environment interactions, as well as a detailed molecular example, and lead to key insights into how the effects of different classes of regulatory variants are modulated by the environment. These observations will guide the design of studies aimed at understanding the genetic basis of complex traits.
Individuals frequently encounter different environmental conditions, and the physiological and behavioral responses to these conditions can depend on an individual's genetic makeup. This phenomenon is known as gene–environment interaction. For example, individuals who are infected with the Plasmodium falciparum parasite are susceptible to malaria, but not if they carry the sickle-cell allele of hemoglobin. The general properties of gene–environment interaction are poorly understood, and a better understanding is essential if individuals are to make informed health choices guided by their genomic information. We have investigated gene–environment interaction on a genomic level, characterizing its role in over 4,000 traits at once by investigating natural variation in yeast gene expression. We compared lab and vineyard strains of yeast growing in two conditions (glucose and ethanol as carbon sources) in which they adopt two different metabolic states: fermentation and aerobic respiration, respectively. We show that gene–environment interaction is a common phenomenon, describe how different classes of genetic variants affect the nature of the interactions, and provide detailed molecular examples of interactions.
We are rapidly approaching an age when genomic sequence will be used to inform life decisions. The extent to which lifestyle choices will change how our genetic blueprint is expressed, however, depends on the presence of gene–environment interactions: the phenomenon where the effect of a genetic variant differs in multiple environments. In humans, gene–environment interactions have been reported for many diseases, including heart disease (reviewed in ), depression , and cancer . More often, however, studies either fail to see these effects or they are difficult to reproduce. Studies in model and agricultural organisms have been more promising, particularly in experimental crosses where gene–environment interaction can be studied more easily on a genome-wide level, as opposed to only targeting candidate genes. When quantitative trait linkage analysis has been performed for the same trait in multiple environments, different loci are often found in the different environments , and when tested explicitly, these loci often display gene–environment interaction [5,6]. However, these studies are limited in their scope because they either examine one or a few phenotypes or restrict their analysis to loci that were initially found within a single condition. In humans, as well as in experimental systems, tracking down the molecular mechanisms of gene–environment interaction has proved difficult.
Recently, gene transcript abundance has been used as a model to study the genetics of thousands of quantitative traits at a time [7,8]. The studies have targeted organisms such as yeast , humans [10–13], mice [14–16], rat , and Arabidopsis . Because so many traits can be studied at once, these studies have been able to elucidate trends in difficult-to-study phenomena, such as gene–gene interaction [19,20] and genetic complexity .
Gene–environment interaction recently has been studied using transcript levels in yeast  and in worms . Landry et al. (2006) compared six strains in three conditions and found 221 transcripts showing gene–environment interaction , suggesting that there is much phenotypic diversity at the strain level in how yeast responds to its environment. Li at al (2006) mapped quantitative trait loci responsible for expression differences in two different temperatures in Caenorhabditis elegans . They identified 197 linkages that showed gene–environment interaction, suggesting that studying gene–environment interaction of expression phenotypes using model organisms in controlled conditions should prove fruitful.
We have used a large family of yeast segregants to characterize gene–environment interaction on a global scale. Two parental strains (BY and RM, see Materials and Methods) and 109 segregant strains from a cross between the parental strains were grown in two conditions—glucose and ethanol as carbon sources—and expression profiled using microarrays. When growing in glucose, yeast predominantly ferment glucose to ethanol. When the cells run low on glucose, they switch to a primarily respiratory state in which they metabolize ethanol . The transcriptional state of the cells changes dramatically during this period [24,25], with transcripts influencing growth and respiration particularly affected. This difference in the metabolic state may change how genetic variants influence traits, resulting in the observation of gene–environment interaction. By studying thousands of such traits, we sought to characterize the general genetic architecture of gene–environment interaction, and by using the molecular tools available in yeast, we were able to characterize some of the variants involved at the gene level.
Gene–environment interaction occurs when the phenotypic effects of genotype and environmental condition are not independent. For instance, a transcript could show a large difference in expression between two strains in glucose, but show no difference between these strains in ethanol. Thus, the effect of the genotype depends on which condition is tested. We characterized gene–environment interaction by two complementary approaches. First, we investigated gene–environment interaction on a strain level by comparing the two parental strains. Second, we used 109 segregants from a cross between the parents to characterize gene–environment interaction on a locus level.
We characterized the extent of gene–environment interaction influencing transcriptional phenotypes in the parental strains BY and RM by growing six independent cultures of each strain in glucose and in ethanol, and measuring gene expression with microarrays (Figure 1 and Dataset S1). We determined the influence of strain, condition, and the interaction between strain and condition (here we use strain–condition interaction to distinguish effects due to overall strain differences in genetic background, as opposed to a specific gene or locus) by using analysis of variance (ANOVA). Of the 4,342 transcripts with high-quality data, 2,037 (47%), 2,996 (69%), and 3,448 (79%) showed significant effects due to strain–condition interaction, strain, and condition, respectively (see Figure 1B and and1C1C for strong examples and Table S3 for a full list). Transcripts were often influenced by multiple factors (Figure S2A). We detected ten times as many transcripts showing strain–condition interaction as has been previously reported for gene expression in yeast . This result is primarily due to the number of replicates we performed (six); an analysis with two replicates detected a number of interactions similar to the previous report. Strain–condition interaction effects accounted for 9% of the total variance explained over all transcripts, whereas strain and condition effects were larger, explaining 21% and 36% of the variance, respectively. The importance of each of these factors varied for individual transcripts, with each factor playing a dominant role for a subset of transcripts (Figure 2). Genetic correlations of transcript levels between the two conditions were on average low (mean genetic correlation = 0.26), with transcripts that show significant strain–condition interaction having lower genetic correlations than those that do not (−0.07 vs. 0.56, respectively; Mann-Whitney p < 10−15). These genetic correlations are lower than those observed for outbred livestock populations [26,27]. Thus, as previously reported, strain  and condition  effects profoundly influence gene expression, but interaction effects between the two are also important.
To understand the genetic basis of gene–environment interaction, we measured transcript expression in 109 segregants derived from a cross between BY and RM . Each of the 109 previously genotyped segregants was expression profiled in both glucose and ethanol conditions on microarrays (Dataset S2). We performed linkage analysis on the transcript levels within each condition, and 3,997 and 3,489 linkages were observed in glucose and ethanol, respectively (Figure S1 and Table S4). We determined loci that show gene–environment interaction (gxeQTL) by performing linkage analysis on the difference between conditions (for each segregant, the difference between conditions is the difference between expression in ethanol and expression in glucose; this type of measurement is also known as plasticity  and is similar to environmental sensitivity  (see Text S1)). The difference between conditions was calculated for each segregant for 4,482 transcripts with high-quality data (see Materials and Methods), and subjected to linkage analysis with 2,894 markers spaced throughout the genome. There were 1,382 transcripts that showed gene–environment interaction with at least one locus (total of 1,555 linkages) at a genome-wide 5% false discovery rate (FDR), as determined by permutation (see Figure 3 for an example). Transcripts that showed strain–condition interaction in the parents were more likely to show at least one linkage in the segregants (relative risk [RR] = 1.8, 95% confidence interval [CI] = 1.6–1.9, χ2 = 154, p = 3 × 10−35), but there was not a one-to-one correspondence. For some transcripts, effects were revealed in the segregants that were not observed in the parents. For others, we were unable to map specific loci responsible for differences observed in the parents. Some of this may due to differences in power, but because of genetic complexity, we do not expect to map a locus for all transcripts that show significant genetic or gene–environment interaction effects in the parent strains. Previous work has also shown that effects that are revealed only in the segregants (transgressive segregation) are common in the genetic regulation of gene expression within a condition . We think that these phenomena are likely to play a role here as well.
Since transcripts derive from open reading frames (ORFs) with physical locations in the genome, we can make a distinction between linkages that are physically near the ORF (local) and those that are far away from the ORF (distant). Local linkages are often due to variation at cis-acting sites [30–32], whereas distant linkages likely result from variants that influence a trans intermediate, such as those resulting in a change in protein function or concentration. Most transcripts linked to distant loci. However, for gxeQTL, this was especially pronounced. gxeQTL were less likely to be local, with 172/1,555 (11%) of all gxeQTL being local, as compared to linkages within either condition, where 22%–25% of all linkages were local (Table 1). This result agrees with previous reports that either directly addressed interaction or observed differences between conditions [6,10,15–17,33,34], and suggests that changes in cis-regulatory sites are less condition dependent than those influencing trans-acting factors.
As has been previously reported in mouse [14–16] and yeast , distant linkages cluster in a small number of regions. We divided the genome into 10-cM–sized bins and counted the number of distant linkages observed in glucose, ethanol, or gxeQTL that fell within each of the bins (Figure 4). The majority of distant linkages fell into bins with a significant excess of linkages, with 81% (1,126/1,383) of distant gxeQTL occurring in 31/563 bins, or 5.5% of the genome. Significant bins that were located immediately next to each other were merged into a single peak to form 13, 13, and 15 peaks for glucose, ethanol, and gxeQTL, respectively (Figure 4 and Table S1). Some of these regions coincided with loci that have been characterized at the gene level, including AMN1 , GPA1 , MAT , HAP1 , and IRA2 (this paper).
Peaks identified in glucose did not coincide with those in ethanol (Figure 4). Thirteen peaks were found in each condition, but only seven were found in both. Two large peaks are found for gxeQTL, but are only observed within one condition, suggesting that the polymorphism likely acts in one condition, but not the other. We highlight these as striking examples of differences in genetic regulation across conditions. The first of these colocalizes with the gene encoding Gpa1, the G-protein alpha subunit of the pheromone response signaling pathway [36,37], which contains a polymorphism previously associated with transcripts involved in pheromone response . In glucose, 49 transcripts show distant linkage here, whereas in ethanol, there are no distant linkages, suggesting that the polymorphism in GPA1 does not influence these transcripts in ethanol. When we asked where these transcripts linked in ethanol, we found that 12 showed linkage to eth12, a peak that is also enriched for genes involved in the pheromone response. Given the overall rates of linkage to both loci, we would expect less than one overlap by chance, and the observed overlap of 12 genes is highly significant (Fisher exact p-value = 8 × 10−19). Within this region lies DIG1, which codes for an inhibitor of the pheromone response/invasive growth transcription factor Ste12 [38,39]. Allele replacements of DIG1 in both backgrounds show that variants in DIG1 make a major contribution to the effects at this locus (Figure S3). This example highlights how environmental differences can change the relative importance of genetic variants in the same pathway, resulting in different genes being more important in one or another condition.
Another example of a condition-specific distant peak is provided by the eth11/gxe13 peak on chromosome 15 (160–220 cM). A total of 386 transcripts show distant linkage to this locus in ethanol, whereas in glucose, only ten show distant linkage. In contrast to most other peaks, the transcripts linking here do not show functional enrichment for gene ontology (GO) terms. Targets of similar signaling pathways should share transcription factor binding sites, even if the functional implications of the pathway are unknown. To determine whether we could find enrichment for specific transcription factor binding sites, we split this group of transcripts into four groups, depending on whether they were up or down-regulated by condition and genotype. We used transcription factor binding site data derived from experimental binding as well as species conservation  to determine whether any of these gene lists were enriched for experimentally confirmed as well as potential transcription factor binding sites. Those genes that were expressed higher in glucose and in the segregants carrying the RM allele were enriched for sites where MSN2 and MSN4 target sequence was conserved (MSN2 p = 2 × 10−7, MSN4 p = 3 × 10−6), but not for sites that have been shown to experimentally bind MSN2 or MSN4. Experimental binding of MSN2 and MSN4 was tested in a variety of conditions , but these did not include ethanol, and so these targets may reflect an additional functional role for these transcription factors. Those transcripts that show lower expression in ethanol and higher expression in segregants with the BY allele are enriched for CIN5 binding sites that are both conserved and show experimental binding in the conditions tested (high and moderate hyperoxia, enrichment p = 6 × 10−5). CIN5 is located in this region, shows local linkage with gene–environment interaction, and contains 17 nucleotide changes in the 500 bp upstream of start (two to three are expected on average, given an upstream polymorphism rate of 0.005 ), in addition to five coding single nucleotide polymorphisms (SNPs) (two nonsynonymous), making it an excellent candidate for regulating a subset of these transcripts. CIN5 may not be the regulator for all transcripts linking here—because the peak is very wide, it may contain multiple variants.
After examining large differences in distant peaks, we sought to more generally and rigorously describe how the effect of a locus changes across conditions. Comparisons of gxeQTL with linkage results for transcript levels in glucose or ethanol showed that only 965/1,555 (62%) gxeQTL also reached genome-wide significance levels in at least one condition (see Figure S2B for full Venn diagram). However, for a test of whether a specific marker that represents a gxeQTL is also linked to the expression level of the corresponding transcript in one of the conditions, setting a genome-wide significance threshold is too conservative. We tested for linkage between the most significant marker at each of the 1,555 gxeQTL and expression of the respective transcripts in glucose and in ethanol, lowering the threshold to a FDR of 5% for testing a single marker for each trait, as obtained through inspection of the p-value distribution with QVALUE . All 1,555 gxeQTL showed an effect in at least one condition, with 265/1,555 (17%) showing an effect only in glucose, 327/1,555 (21%) showing an effect only in ethanol, and 963/1,555 (62%) showing an effect in both conditions. The fact that all gxeQTL showed linkage to the expression phenotype in at least one condition is not surprising, as a difference in how the genotypes differ between two conditions mathematically requires a change in the mean phenotypic value in at least one condition. The QVALUE method also provides an estimate of the proportion of loci found within a condition that show gene–environment interaction. We inspected the distribution of all p-values for linkage between the most significant marker in one condition and the difference in expression between the conditions, and found that 77% and 76% of loci in glucose and ethanol, respectively, also influence the difference between conditions.
Since promoters often define when and where transcripts get expressed, it might be thought that when they do show gene–environment interaction, variants influencing cis-acting regions would be more likely to be condition specific. We did not observe this relationship between local and distant linkages, however. Distant linkages were significantly more likely to be condition specific than local linkages, as 546/1,383 (40%) of distant linkages were significant in only one condition, as compared to 46/172 (27%) of local linkages (Table 2 and see Figure 5 for a schematic). Thus, local linkages are overall less dependent on the environment, and when they do show gene–environment interaction, their effects are less likely to be restricted to a specific condition.
We investigated whether individual hot spots showed condition specificity by comparing the proportion of gxeQTL in each hotspot that were glucose specific or ethanol specific to the overall rates in all distant gxeQTL using a hypergeometric test (Figure 6). Five hot spot regions showed altered proportions of condition-specific gxeQTL: peaks gxe2 (AMN1), gxe5, gxe6 (GPA1), and gxe12 (IRA2) were enriched for glucose-specific linkages, whereas peak gxe13 was enriched for ethanol-specific linkages. These results agree with the observation that distant peaks change between conditions, but provide a more quantitative result that indicates that some distant loci show differential effects between conditions. This may be due to changes in the activity states of the proteins and pathways involved as conditions change, or it could also be due to epistasis that masks the effects of the polymorphisms in the alternate condition.
Loci showing gene–environment interaction that have effects in both conditions may act by increasing expression in one condition and decreasing expression in the other or by affecting expression in the same direction in both conditions, but to a different extent (see Figure 5 for a schematic). Of those gxeQTL that had an effect in both conditions (963/1,555), the effects were in opposite directions 72% of the time. This phenomenon was dependent on the type of locus, as local gxeQTL were more likely to act in the same direction in both conditions (66%), whereas distant gxeQL were more likely to act in opposite directions (78%) (Table 3). The majority of the loci that had effects in opposite directions did not reach genome-wide significance in either condition (463/697). Exclusion of loci that were not genome-wide significant in at least one condition only slightly reduced the association between the type of locus and the direction of effect (RR = 2.3 vs. 3.0). The high rate of distant loci acting in opposite directions is not due to an individual peak, as most distant peaks, as well as the set of all linkages that fall outside of peaks, have a high proportion of linkages that act in opposite directions (Figure 6). This pattern is consistent with the result that local gxeQTL are less likely to show gene–environment interaction overall and less likely to be condition specific.
We further investigated peak gxe12, located at chromosome 15 50–80 cM, which we observed to be enriched for transcripts that show glucose-specific effects. A total of 372 transcripts link here distantly, and many are generally involved in energy metabolism and growth (Figure 7A). Most of these transcripts show large changes between glucose and ethanol, but the allele that the segregant inherits determines the strength of the change. The transcripts that are repressed in ethanol are enriched for ribosomal biogenesis and assembly (p = 8 × 10−13), whereas those that are activated in ethanol are enriched for generation of precursor metabolites and energy (p = 1 × 10−6). IRA2 is a strong candidate for the regulation of these transcripts. Ira2 is an inhibitor (GTPase activating protein) of the Ras proteins, which mediate the cellular response to glucose via the Ras/PKA pathway . Components of this pathway are highly conserved across species, and Ira2 is a homolog of the neurofibromin tumor suppressor in humans . The IRA2 coding region is highly polymorphic, with 87 SNPs and a single 3-bp indel differing between the strains, resulting in 61 synonymous changes, 26 nonsynonymous changes, and one RM insertion. Of the 26 nonsynonymous changes, BY carries the ancestral amino acid (Saccharomyces paradoxus) in 20 cases, and RM carries the ancestral amino acid in four cases, with the remaining two different from S. paradoxus in both strains.
To determine whether polymorphism within IRA2 is responsible for the observed linkages, we generated allele replacement strains that carried the coding region of each parent strain in the background of the other parent. We felt that targeting the coding region was appropriate because the expression of IRA2 has not shown local linkage . We expression profiled the replacement strains in glucose and ethanol, and compared the effect of the replacement to the effect of the locus observed in the segregants (Figure 7B and and7C).7C). We observed high correlations between replacements in both parental backgrounds and segregant effects for transcripts showing gene–environment interaction, and the regression slope for both was close to 1. Many transcripts link to this region in glucose and ethanol (1,159 and 410, respectively), and IRA2 polymorphisms also contribute to the variation in transcript levels within these conditions (Figure S4). Thus, polymorphism in IRA2 appears to be the major contributor to the difference in expression between conditions as well as to transcript abundance in each condition for many transcripts linking to this region. This is consistent with coding variants in Ira2 influencing transcript levels via changes in the signaling state of the Ras/PKA pathway.
To determine which allele had a stronger effect on the Ras/PKA pathway, we compared the IRA2 alleles to their respective knockout strains. We knocked out IRA2 in both parental strains, grew the resulting strains in glucose and ethanol, and then measured the transcriptional response, as compared to the parental strain (Figure 8). If the allele is active under the conditions that we tested, then the knockout should look different from the parental strain. We observed that both knockout strains were different from the parental strains, with linear regression slopes less than 1 (BY slope = 0.75, 95% CI: 0.71–0.78; RM slope = 0.40, 95% CI: 0.35–0.45), but the RM knockout showed a greater difference from the parental strain (Figure 8). Since knocking out IRA2 in RM causes a bigger change than knocking it out in BY, this suggests that the RM allele of IRA2 is better at inhibiting the Ras/PKA pathway in these conditions.
The BY allele of IRA2 could be a poorer inhibitor because BY has lost some ability to inhibit or because RM has gained this ability. We took a sequence-based approach and compared IRA2 coding sequences from the three sequenced S. cerevisiae strains (S288c, isogenic to BY; RM11-1a; and YJM789 [YJM]), with S. paradoxus as an outgroup. The S288c sequence is more similar to the ancestral S. paradoxus than either RM or YJM, indicating that RM IRA2 is the more diverged allele. BY is more similar to the ancestral sequence, suggesting that RM has gained the ability to inhibit signaling more. However, since we do not know the specific causal variant(s), we are unable to rule out the possibility that a small number of polymorphisms that are not characteristic of the overall trend are responsible for a decrease in the BY allele's ability to inhibit.
If the RM allele has gained the ability to more strongly inhibit Ras/PKA signaling, then it might be expected to show signs of selection. We used the McDonald-Kreitman test to look for evidence of selection in IRA2 by comparing rates of nonsynonymous and synonymous substitutions at polymorphic versus fixed sites within the three sequenced S. cerevisiae strains, as compared to the outgroup S. paradoxus. Of the sites that were fixed in S. cerevisiae, 133/882 (15%) were nonsynonymous, whereas within polymorphic sites, this rate was 2-fold higher (31/103, 30%). This difference is significant (Fisher exact p = 0.0001), suggesting that positive selection, or a relaxation of negative selection, has occurred in the S. cerevisiae strains. The RM strain appears to be contributing the most to the nonsynonymous rates, as the ratio of nonsynonymous to synonymous changes along the RM branch is the highest within the strains. Thus, the RM allele appears to have experienced a change in selection pressure in the past. Further analyses of a variety of yeast isolates should help elucidate the evolutionary history of this locus.
Here, we have used yeast gene expression as a model system to describe gene–environment interaction at strain, locus, and gene levels. We showed that 2,037 transcripts were jointly dependent on strain and condition in two parental strains. Then, we performed linkage analysis on the difference in transcript levels across conditions with 109 segregants, and identified 1,555 gxeQTL. The high number of gxeQTL that we detected has allowed us to make some general observations. We have shown that local and distant linkages differ dramatically in how they act across multiple conditions. Local linkages appear to be more stable: they are less likely to be dependent on the environment, and even when they are, they are more likely to have an effect in both conditions, with the direction of effect often being the same. Distant linkages, on the other hand, are more volatile: they are more likely to be dependent on condition and to show an effect in only one condition. Entire distant peaks can change across conditions, and when they do have an effect in multiple conditions, distant loci are more likely to act in different directions. Finally, we characterized the gene responsible for influencing the largest gene–environment interaction distant peak, IRA2. We showed that the RM allele of IRA2 is a stronger inhibitor of Ras/PKA signaling than the BY allele in the conditions that we tested, and that this locus has experienced a change in selective pressure in RM.
Previous studies have reported that local linkages are more consistent than distant linkages across conditions and experiments, including worms in different temperatures , different tissues in mice [15,16,33], and in the reproducibility of transcript linkages in human studies [10,34], indicating that this pattern is likely to extend beyond yeast. Since local and distant linkages are likely to differ in how they influence traits on a molecular level, we can speculate as to how they show differences in condition dependence. Although we do not know most of the causative polymorphisms involved for each type of linkage, local linkages show increased rates of polymorphism in 5′ and 3′ noncoding regions and high rates of allele-specific expression . Thus, we feel comfortable treating the two groups as distinct entities: local linkages are likely to be enriched for variants that directly influence transcript levels via changes in cis-regulatory sites, whereas distant linkages typically influence levels via a protein intermediate (trans factors). In a cis-regulatory site that interacts with a binding protein to directly increase or decrease transcript levels, mutations can either disrupt or enhance binding, but are unlikely to be able to do both in a condition-specific manner. If the binding protein is an activator, a loss of binding mutation will result in lower transcript levels in all conditions where the activator is present, and will show no change when the activator is absent, resulting in either a condition-specific pattern or in a pattern in which the locus has the same effect or the same direction of effect in both conditions (Figure S5A). Other than the case where a single polymorphism destroys a site for one binding factor while creating a site for another, it is less clear how one cis-regulatory mutation could be associated with effects in opposite directions, although one might imagine more complicated scenarios with either multiple linked cis-regulatory polymorphisms or transcription factors that are able to act as both activators and inhibitors. On the other hand, distant variants that influence protein intermediates have the potential to interact with many proteins, depending on the milieu present in the cell in a given condition (Figure S5B). A single variant may be able to activate transcription in one condition and repress it in another, resulting in a change in direction of effect.
The frequent occurrence of locus effects in opposite directions in the two conditions is surprising. One possible explanation is multiple linked polymorphisms. Distant loci are much larger targets for variation than cis-regulatory regions , but this could occur in both contexts. Multiple compensatory mutations could accumulate at loci and, depending on the condition, could compensate differentially. The mean phenotype would be stable over conditions, yet the direction of the effect within a condition could vary (Figure S5C). One example is suggested by the gxe11 peak on chromosome 14, where multiple polymorphisms that influence sporulation [46,47] and high-temperature growth  have been characterized, and some act in the opposite direction of the overall locus effect. At least one of these alleles (MKT1 D30G) is at least partially responsible for the gxeQTL in this region (Figure S6). Further characterization of the polymorphisms involved at these loci should help elucidate the underlying mechanisms behind this phenomenon.
A practical implication of the observation that distant loci often act in opposite directions in the two conditions is that such loci may be inherently difficult to detect in experiments where condition is not controlled, as when different tissues of multicellular organisms are mixed or when nonexperimental organisms (including humans) that experience different unmeasured environments are studied. This is because when conditions are not controlled, effects of opposite direction will cancel each other out, resulting in no overall association between the trait and the locus. This also has implications for selection, as these loci may be hidden from selection in organisms that experience fluctuating environmental conditions. The detection of loci that act in opposite direction is additionally complicated by our observation that the majority of these loci did not reach genome-wide significance in either condition alone, emphasizing the importance of using methods to directly test for gene–environment interaction without prior reliance on linkage in a single environment.
One general implication of our results for studies in other species, including humans, is that many genetic effects on most traits are likely to be detected without testing for gene–environment interactions, provided that the relevant environmental factors are known and controlled either experimentally or statistically. However, analyses that ignore gene–environment interactions introduce strong biases with regard to the types of loci that are detected. Moreover, gene–environment interactions play a dominant role for a minority of traits. We have studied the prevalence and importance of gene–environment interactions in a single-cell organism grown in two very different and precisely controlled environmental conditions. Our focus on transcript levels as quantitative traits allowed us to study a very large number of traits simultaneously and to delineate general patterns, as well as to provide detailed molecular examples of loci that show gene–environment interactions. The quantitative details would undoubtedly differ if different species, environments, and phenotypes were studied. It is possible that many environmental differences experienced by humans may be less drastic than those between growth on glucose and ethanol are for yeast. However, some environmental differences (for example, exposure to pathogens or shift from traditional to modern diets) can have a dramatic effect on health. Our detailed studies in a model organism provide examples of the types of effects that may be expected in humans, and thereby inform practical study design. Understanding the subset of human genetic variants whose phenotypic effects are modifiable by the environment will be key in making full use of personal genomic variation.
The segregants used in this study have previously been expression profiled in condition similar to the glucose condition used here . However, in order to make conditions as similar as possible for this study, with only the carbon source differing, these experiments were repeated alongside the experiments in ethanol. Yeast strains were grown in media consisting of 6.7g/l Yeast Nitrogen Base (Sigma Y0626), 100 mg/l leucine, 100 mg/l lysine, 20 mg/l uracil, and one of the following: 1% (v/v) ethanol or 2% (w/v) glucose. All growth was performed at 30 °C. After streaking to rich medium (YPD) plates from the freezer, cells were pregrown in 5 ml of the phenotyping medium overnight, while spinning on a rotor drum. Approximately 5 × 105 (glucose) or 107 (ethanol) cells were transferred to 15 ml of fresh medium in a 125-ml Erlenmeyer flask and grown overnight on a rotary shaker at 200 rpm. Cells were harvested at an optical density at 600 nm (OD600) between 0.36 and 0.40. Cells were collected (5 ml) at 30 °C by vacuum filtration, and the filters were immediately frozen in liquid nitrogen. Replicates of parental strains and parental strains used to generate the reference sample were grown interspersed with segregant strains such that the parental variation would reflect experimental variation throughout the experiment.
RNA was extracted from the filters using a standard hot acid phenol method, followed by RNeasy cleanup (Qiagen). Samples were quality controlled with RNA 6000 Nano kit of the Bioanalyzer 2100 (Agilent). Samples were labeled with either Cy3 or Cy5 dye, using the Low RNA Input Linear Amp Kit (Agilent) with the modification that half reactions were used with a quarter of the recommended dye.
All samples were hybridized against the same common reference consisting of equal amounts of RNA from both parents (BY and RM) grown in both conditions (glucose and ethanol). Six independent replicates of each strain–condition combination were grown and harvested, for a total of 24 samples. RNA was isolated for each sample individually and then pooled, with an equal amount of RNA contributed by each sample. Multiple labeling reactions (24 for each dye) were pooled, and the same samples were used for all segregant and parental arrays. Experiments for knockout and allele replacement arrays used the same reference RNA sample, but a new labeling reaction.
Samples were hybridized to Agilent 11k yeast arrays, which are two-color, 60-mer oligo arrays with two arrays per slide, each containing spots for 6,256 transcripts, with some duplicated spots. In order to minimize experimentally induced bias, we randomized the order in which samples were processed, hybridizing glucose and ethanol samples at the same time and on the same slides. After hybridizing and washing per Agilent instructions, the arrays were scanned using an Agilent scanner and analyzed with Agilent's Feature Extraction software (versions 8.0–9.5). The arrays were uploaded into PUMAdb for processing (http://puma.princeton.edu/). Spots were considered good data if intensity was well above background and the feature was not a nonuniformity outlier. Additionally, transcripts were retained if data were present in all parental strains in the parental analysis or in at least 80% of the segregants in both conditions for the segregant analysis. We ultimately used 4,342 transcripts for the parental analysis and 4,482 transcripts for the segregant analysis.
Six replicates of each parent (BY and RM) in each condition (glucose and ethanol) were grown, expression profiled, and labeled (half with Cy3 and half with Cy5). There were 4,342 transcripts with high-quality data: there were no polymorphisms within the probe sequence, and good data was present for each of the replicates. For each transcript, an ANOVA of the form
was performed in R using the aov function. In order to assess significance, the dataset was permuted with respect to strain and condition 1,000 times, retaining six values of each strain–condition combination, and repeating the ANOVA for each transcript. A FDR was calculated for a range of p-values for strain, condition, and strain–condition interaction effects separately. p-Value cutoffs of 0.04, 0.04, and 0.03 corresponded to FDRs of 5% for strain, condition, and strain–condition interaction effects, respectively.
Genetic correlation was calculated from the variance results of the ANOVA study using the following equation .
Segregants were grown and expression profiled in both conditions, resulting in a total of 218 segregant arrays: 109 segregants each in glucose and ethanol. Samples were randomized with respect to dye, and all arrays were linearly adjusted for dye. Three phenotypes were used in linkage analysis: expression in glucose, expression in ethanol, and the difference in expression between glucose and ethanol (ethanol minus glucose for each segregant). Mapping a significant locus for the change across conditions is interpreted as gene–environment interaction. It can also be conceptualized as a paired test where the segregant's ethanol phenotype is effectively corrected for its phenotype in glucose.
Genotypes have been previously characterized , but we further refined the map by removing a set of 62 markers, corresponding to 13 Affymetrix gene regions, that were artificially inflating the map. When each gene region was removed, the total map distance (Haldane) decreased at least 25 cM. All but one of these gene regions corresponded to sequence that was duplicated in the genome, which would increase genotyping error. Our final set of markers consisted of 2,894 markers with an average spacing of 4 kb or 1.9 cM. Linkage analysis was performed using the nonparametric method of the qtl package in R , which is an adaptation from the Kruskal-Wallace test and is similar to that described by Kruglyak and Lander . In addition to the genotypes of the markers, genotype probabilities were estimated at every centimorgan along the genome using the calc.genoprob function. Significance was assessed by permutation. For each permutation, segregants were assigned to a new array randomly, and linkage analysis on all transcripts was repeated. The permutation was performed ten times and the average number of transcripts showing linkage at a specific logarithm of the odds (to the base 10) (LOD) score was used to calculate a FDR. LOD scores of 3.25, 3.3, and 3.7 corresponded to FDRs of 4.9%, 4.9%, and 4.8% for glucose, ethanol, and the difference between condition phenotypes, respectively. We also calculated confidence intervals as LOD support regions that extend outward from the peak marker until the LOD score decreases by 1.5 LOD units .
A linkage was classified as a local linkage if the confidence interval of the linkage overlapped with the region containing the open reading frame of the gene of interest (ORF region). The two markers flanking the region from 500 bp upstream of the start of the ORF to 500 bp downstream of the stop of the ORF, as defined by the Saccharomyces Genome Database, defined the ORF region. A distant linkage is any linkage that fell outside of the ORF region.
The genome was broken into 10-cM bins, and the peak of each linkage in each condition (glucose, ethanol, or gene–environment interaction) was assigned to a bin. A bin was considered to have an excess of linkages if the number exceeded the number expected by chance by Poisson distribution, given the total number of distant linkages and Bonferroni correction for 563 tests (p < 9 × 10−5). These cutoffs were greater than 15 for glucose, 14 for ethanol, and nine for interaction. Significant bins that were located immediately next to each other were merged into a single peak. Peaks in different conditions were considered overlapping if any bin contained in one peak was contained in the other.
For each gxeQTL, we took the marker closest to the linkage peak and computed the LOD score for linkage of this marker to the transcript levels in glucose and in ethanol. We then converted the LOD score to a nominal p-value by comparison to permuted data. We created 1,000 random phenotypes and performed nonparametric linkage analysis on each. We translated LOD scores to p-values for a given marker by counting the proportion of the randomized phenotypes that had a LOD score as high or higher than that observed. By inspecting the p-value distribution of gxeQTL marker linkages in each condition using QVALUE software , we determined cutoffs that allowed us to call individual linkages significant at a FDR of 5% (p = 0.228 in ethanol and p = 0.22 in glucose). To estimate the proportion of linkages within a condition that also showed gene–environment interaction, we calculated p-values for the linkage between peak markers found within a condition and the difference in expression between conditions. The proportion of significant tests is .
RR estimates and 95% CIs were calculated in R using the twoby2 command of the Epi package. Values of χ2 and p-values were calculated in Excel with expected values derived from row and column expectations. The Mann-Whitney test was performed in R using the wilcox.test command of the stats package.
The RM genome was downloaded from the Broad Web site and aligned to the S288c genome (SGD). The sequence of each probe was found in the S288c sequence, and the corresponding sequence determined for RM. We were able to find 6,217 probes in the alignment, and of these, 5,029 have no polymorphisms at all, 747 have a single polymorphism or gap, and 438 have two or more strain differences. Missing probes were either in the mitochondrial genome or differed from the most recent reference sequence. The presence of even a single polymorphism was associated with an increased probability of apparent local linkage, as would be expected if there were differences in binding efficiencies of the alleles. We thus excluded them from all further analysis.
IRA2 and DIG1 replacement strains were generated by a two-step allele replacement method . For example, IRA2 was replaced with URA3 in BY4724  (MATa ura3 lys2) and RM11-1a  (MATa leu2 ura3 ho::KAN), generating ira2Δ::URA3 knockout strains. New IRA2 alleles were amplified by PCR with approximately 500-bp overlapping sequence and introduced into the appropriate background to replace URA3. See Table S2 for strain descriptions. Allele replacements were sequenced to ensure that no new mutations were introduced. For both replacements, the entire coding sequence was exchanged, but the extent that the 3′UTR polymorphisms were exchanged varied. Adam Deutschbauer kindly provided the MKT1 D30G replacement strain in the BY4742 background .
The RM IRA2 sequence was obtained from the whole-genome sequencing project at the Broad Institute (http://www.broad.mit.edu/annotation/genome/saccharomyces_cerevisiae/Home.html), with the exception of a small gap, which we sequenced using standard dideoxy methods. Replacement strains were also sequenced using standard dideoxy sequencing methods.
We quantified how well the IRA2 replacement strains recapitulated the effect due to the locus in the segregants by comparing the locus effect to the replacement effect for all linking transcripts. For a given transcript, the locus effect is the difference between the BY change across condition (average of all segregants carrying the BY allele in ethanol minus the average of all segregants carrying the BY allele in glucose) and the RM change across condition (average of all segregants carrying the RM allele in ethanol minus the average of all segregants carrying the RM allele in glucose). For the replacement effect, the calculation is similar, except instead of segregant strains, we use the parental and replacement strains. Thus, for a single transcript, the replacement effect would be:
We then compared the locus effect to the replacement effect across all transcripts by calculating a Pearson correlation. We estimated significance of this correlation by randomizing the genotype at the marker nearest to IRA2 and repeating the analysis 1,000 times. The p-value is the number of times that we got a correlation as high or higher than what was observed. A significant correlation indicates that gene that was replaced functionally influences the traits. We also analyzed this relationship by linear regression using the lm command in R. If the regression slope is close to 1, then it indicates that the gene is a major contributor to the phenotype.
Of the 4,482 transcripts that were analyzed in the segregants, 4,339 had GO terms associated with them. This set was used as a background for GO term enrichment. Hypergeometric tests were performed using GOLEM , and p-values were Bonferroni corrected for multiple testing.
Of the 4,482 transcripts that were analyzed in the segregants, 4,465 were used to look for enrichment of transcription factor (TF) binding sites. All nine “ORFs_by_factor” files that are provided by MacIsaac et al. on their Web site (available at: http://rd.plos.org/pbio.0060083) were used to look for enrichment of binding sites. These lists are constructed by creating lists of genes that contain a predicted TF site at three levels of conservation and three levels of experimental binding . In order to be in a list, the site has to occur within the upstream intergenic region of the gene (K. MacIsaac, personal communication). Hypergeometric tests of enrichment were performed in GOLEM using modified input files in which each gene list corresponded to its own term. p-Values were Bonferroni corrected.
Log2 ratios (experimental/reference) values for all parental replicates. Data have been processed according to Materials and Methods, and only good data are reported.
(1.7 MB XLS)
Log2 ratios (experimental/reference) of segregant glucose and ethanol expression values. Data are processed according to Materials and Methods, and only good data are reported.
(6.6 MB XLS)
The linkage map was calculated with 109 segregants using the Haldane mapping function in R/qtl, with the chromosome and order of the markers determined by their physical position. Chromosomes are indicated by vertical lines, and markers are indicated by horizontal lines. The map covers a total of 5,545 cM.
(663 KB PDF)
(A) The number of transcripts that show significant strain–condition interaction, strain, and condition effects in the parental strains.
(B) The number of linkages showing genome-wide level significance for gene–environment interaction, glucose, and ethanol. For each linkage, the marker closest to the peak of linkage was investigated in each condition. If the LOD score exceeded the genome-wide–level threshold, then the marker was considered significant in that condition. Since we are pooling results from different analyses, there are multiple linkages per chromosome. We randomly picked one of these linkages for the diagram and removed the rest. The totals within each condition do not add up to the total number of significant linkages reported because although a linkage can occur on the same chromosome in glucose and ethanol, they are not necessarily close enough that both markers are significant in both conditions, particularly when the scores are near the significance thresholds.
(116 KB PDF)
The relationship between the segregant effect and the replacement effect is shown for all transcripts that have a peak of linkage at the marker closest to DIG1. We chose to use the marker closest to DIG1 to limit the proportion of transcripts that were being influenced by nearby peaks (gxe15). Each point represents a transcript. Solid lines indicate the best fit by linear regression (orange = BY, and purple = RM). The grey dotted line indicates y = x. Both lines are significantly different from 0, as indicated by the slopes.
(150 KB PDF)
The relationship between the segregant effect and the replacement effect is shown for all transcripts that have confidence that overlap with IRA2 are shown. Associations are shown for linkages in (A) glucose and (B) ethanol. Each point represents a transcript. Solid lines indicate the best fit by linear regression (orange = BY, and purple = RM). The grey dotted line indicates y = x. Linear regression slopes, confidence intervals, and R 2 values are shown for each line.
(283 KB PDF)
(A) A scenario involving the loss of a transcription factor binding site across conditions that differ in the concentration (or activity) of the transcription factor. Depicted in the graph are two alleles, one with a mutation upstream of a start site (asterisk) and one without. Dotted lines indicate time points when measurements are taken. The activity at the locus is depicted below the graph for these two time points where the transcription factor (double circle), when present, is binding to the functional site and causing an increase in expression or no change. Expression levels are shown increasing from green to red, relative to the average expression.
(B) A scenario involving a transcription factor that is able to bind an activator or an inhibitor. An open circle indicates a protein capable of binding both factors, while the three-quarter circle indicates a protein unable to bind to either. The activator and the inhibitor vary in concentration depending on condition.
(C) A scenario in which one mutation (blue) changes the ancestral state (black) such that the average phenotype changes. A second mutation on the same haplotype (brown) changes expression such that the average phenotype is the same as the ancestral state. Thus the final phenotype averaged across the conditions is the same for both haplotypes, and the effects of the haplotype in the two conditions are in opposite directions.
(126 KB PDF)
Segregant and replacement effects for distant gxeQTL with confidence intervals that overlap MKT1 are shown. Each point represents a transcript. Solid lines indicate the best fit by linear regression (orange = BY, and purple = RM). The grey dotted line indicates y = x. Linear regression slopes, confidence intervals, and R 2 values are shown for each line. The slope is significantly different from 0, indicating that the variant contributes to phenotypic variation at this locus.
(187 KB PDF)
Each distant peak is described in further detail, with the location, number of linkages, verified or potential candidates, top significant GO term, and significant enriched transcription factor binding sites.
(29 KB XLS)
A list of the strains used in this study.
(20 KB XLS)
A list of the results from the ANOVA for all transcripts tested. For each transcript, the presence of a polymorphism in the probe, p-values for each factor, percent variance explained for each factor, whether the effect was significant, and average effect size are listed.
(1.7 MB XLS)
A list of all significant linkages. For each linkage, the presence of a polymorphism in the probe; the linkage peak marker, location, LOD score, confidence interval, and distant peak name; whether the peak is local or distant, and average effect sizes are listed.
(10.5 MB XLS)
(44 KB DOC)
Gene accession numbers reference the Saccharomyces Genome Database (http://www.yeastgenome.org): AMN1 (S000000362), CIN5 (S000005554, GPA1 (S000001047), HAP1 (S000004246), HXT6 (S000002751) , HXT7 (S000002750), IDP2 (S000004164), IRA2 (S000005441, MATALPHA1 (S000000636), MSN2 (S000004640),and MSN4 (S000001545).
We acknowledge Joshua A. Shapiro for performing the McDonald-Kreitman test and assistance with phylogenetics; Rachel Brem and David Botstein for valuable advice; John Storey for suggesting the paired analysis; Hannah Seidel, Matthew V. Rockman, Joshua A. Shapiro, and Rachel Brem for close reading of the manuscript; and Adam Deutschbauer for providing the MKT1 strain.
Author contributions. ENS conceived and designed the experiments, performed the experiments, analyzed the data, and wrote the paper. LK conceived and designed the experiments and wrote the paper.
Funding. This work was supported by National Institutes of Health (NIH) grant R37 MH059520 and a James S. McDonnell Foundation Centennial Fellowship (LK), and NIH grant GM071508 to the Lewis-Sigler Institute. The The Princeton University MicroArray database is funded in part by the National Institute of General Medical Sciences (NIGMS) (NIH grant P50 GM071508).
Competing interests. The authors have declared that no competing interests exist.