Many, and possibly the majority, of phenotypic differences among individuals are likely to be elicited by alterations in gene expression and the underlying transcriptional regulation
1,2. Differences in gene expression have been observed among individuals in a variety of species, including yeast
3-8. These differences may be mediated at a variety of levels, including transcription, mRNA processing and mRNA stability, and the contribution of each level is unknown. Recently, we and others discovered a surprisingly high frequency of TF binding site variation among closely related species
9,10, but the extent to which this occurs in individuals is unknown.
To investigate the extent of diversity in transcription factor binding among individual strains of yeast, we used ChIP-Seq to generate high-resolution Ste12 binding profiles for two distinct yeast strains, S96 (isogenic to S288c) and HS959 (isogenic to YJM789), and 43
MATa genotyped segregants of these lines. For the parent strains, the profiles were generated in the presence and absence of the mating pheromone α-factor
9, and for the segregants only α-factor treatment conditions were examined. At least two biological replicates were performed for each strain along with one input DNA control (see Methods).
From vegetatively grown cells, we found 536 and 846 binding regions in S96 and HS959 cells, respectively (
Tables S2, S3). After treatment with α-factor, Ste12 bound many more regions at the same threshold (1,179 in S96 and 1,112 in HS959;
Tables S4, S5), consistent with the role of Ste12 as a major transcriptional activator in pheromone response
11. Assignment of binding regions revealed that many of the gene targets were identical in vegetative and pheromone-treated cells (
Fig. S1); many of these common targets are involved in conjugation, and may have an additional role during vegetative growth.
Comparison of the signal tracks revealed extensive differences among the different yeast strains and their progeny. We quantitatively identified the variable binding regions across all 43 segregants and the two parental strains under pheromone-treatment conditions by calculating a Normalized Difference (NormDiff) score, and defined the genomic positions within the top 2.5% of binding variance across all 45 strains as variable (see Methods). Adjacent variable regions were grouped into 930 genomic “traits”. Inspection of the signal tracks revealed that the majority (~78%) of the variable binding regions exhibit Mendelian segregation in the progeny. Interestingly, a large number of transgression effects were observed, in which the segregants either gained or lost binding relative to the parents.
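The selection and grouping step described above can be illustrated with a minimal sketch. It assumes that a per-position, per-strain binding score (for example a NormDiff-like score, whose definition is given in the Methods and is not reproduced here) has already been computed. The function names, the `max_gap` parameter and the toy data are hypothetical and are not taken from the study.

```python
from statistics import pvariance

def variable_positions(scores, top_fraction=0.025):
    """Return the positions whose across-strain variance is in the top fraction.

    `scores` maps a genomic position to a list of per-strain binding scores.
    """
    variances = {pos: pvariance(vals) for pos, vals in scores.items()}
    n_keep = max(1, round(len(variances) * top_fraction))
    ranked = sorted(variances, key=variances.get, reverse=True)
    return sorted(ranked[:n_keep])

def group_adjacent(positions, max_gap=1):
    """Merge sorted positions separated by at most `max_gap` into (start, end) traits."""
    traits = []
    for pos in positions:
        if traits and pos - traits[-1][1] <= max_gap:
            traits[-1][1] = pos          # extend the current trait
        else:
            traits.append([pos, pos])    # start a new trait
    return [tuple(t) for t in traits]

# Toy data (hypothetical): 200 positions scored in 3 strains.
scores = {pos: [1.0, 1.0, 1.0] for pos in range(200)}
for pos in (10, 11, 12, 50, 51):
    scores[pos] = [0.0, 5.0, 10.0]       # positions that differ between strains

variable = variable_positions(scores)    # 2.5% of 200 positions = 5 positions
traits = group_adjacent(variable)
print(traits)
```

In this toy case the five high-variance positions collapse into two traits. How adjacency was defined in the actual analysis is not stated in this section, so `max_gap` is only a placeholder.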
To identify the underlying QTLs for each binding variation trait, we used the genotype information from Mancera et al., which contained 53,460 markers
12 (see Methods). Neighbouring markers with the same genotype distribution across strains were grouped, resulting in 2,592 nonredundant marker sets of 1,132 bp median size. We performed single marker regression with the 930 binding traits and the 2,592 markers and found significant marker-trait associations for 195 traits at a local false discovery rate (FDR) of 0.01 (see Methods). Of these, 166 traits were associated with
cis-markers (defined as markers on the same chromosome as the associated traits); 157 of these
cis-traits were within 10 kb of the associated marker. Thirty-five traits were associated with
trans-markers (defined as markers on a different chromosome from the associated traits); the trans markers each affected between one and 11 loci. Six traits were affected by both
cis- and
trans-markers. The effect of the
cis markers was much stronger than that of the trans markers. Thus, the vast majority of significantly associated markers (85%) lay
cis to the variable binding site, indicating that most binding differences are mediated by adjacent regulatory elements. Changing the linkage threshold yielded similar results (
Fig. S4).
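As a rough illustration of the marker grouping and the cis/trans definition used above, the following sketch collapses neighbouring markers with identical genotype vectors into marker sets, scores each set against one trait by the squared Pearson correlation (the fraction of variance explained in a single-marker regression), and labels the best set as cis or trans by chromosome. The data, the names and the use of r² alone are assumptions made for illustration; the significance testing and local FDR estimation described in the Methods are not implemented here.

```python
from itertools import groupby

def group_markers(markers):
    """Collapse runs of neighbouring markers with identical genotype vectors.

    `markers` is a list of (chromosome, position, genotype_tuple), sorted by
    chromosome and position; genotypes are coded 0/1 for the two parents.
    """
    sets = []
    for (chrom, genotype), run in groupby(markers, key=lambda m: (m[0], m[2])):
        positions = [pos for _, pos, _ in run]
        sets.append({"chrom": chrom, "start": positions[0],
                     "end": positions[-1], "genotype": genotype})
    return sets

def r_squared(x, y):
    """Squared Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical genotypes for six segregants at four markers.
markers = [
    ("I", 100, (0, 0, 1, 1, 0, 1)),
    ("I", 250, (0, 0, 1, 1, 0, 1)),   # same distribution as the previous marker
    ("I", 900, (0, 1, 1, 1, 0, 1)),
    ("II", 50, (1, 0, 1, 0, 1, 0)),
]
# Hypothetical binding trait located on chromosome I.
trait = {"chrom": "I", "values": [1.0, 1.2, 3.9, 4.1, 0.9, 4.0]}

marker_sets = group_markers(markers)
best = max(marker_sets, key=lambda s: r_squared(s["genotype"], trait["values"]))
linkage = "cis" if best["chrom"] == trait["chrom"] else "trans"
print(len(marker_sets), best["start"], best["end"], linkage)
```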
We next examined the underlying genetic basis of the variable binding loci that were affected in
cis. Searching for position weight matrix (PWM) matches of the Ste12 motif in the genomic regions of the 166 significant
cis-variable binding traits of the two parental strains revealed that 102 contained at least one Ste12 motif. Thirty-six of these Ste12 motifs, in 31 variable binding regions, were affected by polymorphisms: 14 contained SNPs (
Fig. S5) and the majority (22) were disrupted by Indels. For most of the SNPs (11 out of 14), the poorer match to the Ste12 consensus motif corresponded to a lower Ste12 binding signal. Two examples are shown (see
Fig. S5 for two exceptions). In contrast to the variable regions, the non-variable Ste12 binding regions contained 177 Ste12 motifs, of which only 2 contained a SNP or Indel (Pearson's Chi-square test,
P < 0.001,
Table S6).
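The effect of a SNP on a PWM match can be sketched as follows. The example builds a toy log-odds matrix around the heptamer TGAAACA, which is commonly cited as the core pheromone response element recognized by Ste12. The counts behind the matrix, the uniform background, the two allele sequences and all function names are invented for illustration and do not reproduce the matrix used in the analysis. Only the forward strand is scanned.

```python
import math

BASES = "ACGT"
CONSENSUS = "TGAAACA"  # assumed core motif; the weights below are invented

def build_pwm(consensus, match=17, mismatch=1, background=0.25):
    """Toy log-odds PWM: each column favours the consensus base."""
    total = match + 3 * mismatch
    pwm = []
    for base in consensus:
        column = {}
        for b in BASES:
            freq = (match if b == base else mismatch) / total
            column[b] = math.log2(freq / background)
        pwm.append(column)
    return pwm

def best_match(seq, pwm):
    """Return (score, start) of the highest-scoring window on the forward strand."""
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(col[seq[start + i]] for i, col in enumerate(pwm))
        if best is None or score > best[0]:
            best = (score, start)
    return best

pwm = build_pwm(CONSENSUS)
ref_allele = "CCTGAAACAGG"   # hypothetical parental sequence with an intact motif
alt_allele = "CCTGAAGCAGG"   # hypothetical SNP (A to G) inside the motif

ref_hit = best_match(ref_allele, pwm)
alt_hit = best_match(alt_allele, pwm)
print(ref_hit, alt_hit)
```

Under these assumptions the allele carrying the SNP yields a lower best score at the same position, which corresponds to the "poorer match" referred to in the text.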
To examine whether there are
cis-mutations in the binding sequences of other TFs affecting Ste12 binding, we searched PWM matches of all known TF recognition consensus sequences
13 in all Ste12 binding regions with a particular focus on the variable binding regions. Excluding Ste12, 38 TF consensus sequences were over-represented in Ste12 bound regions (
Table S6, Bonferroni corrected
P < 0.05; see Methods). For these 38 TF recognition sequences, 219 motifs in the 166
cis-variable Ste12 binding regions contained at least one polymorphism. In contrast, only 175 motifs with polymorphisms were found in more than 1,000 non-variable Ste12 binding regions (Pearson's Chi-Square test,
P < 0.001). In total, of the 166
cis-traits, 72 traits had polymorphic motifs: 10 (6.0%) with an altered Ste12 motif, 41 (24.7%) with altered motifs of other TFs, and 21 (12.7%) with altered motifs of both Ste12 and other TFs. We presume that the
cis-variable regions that lack polymorphisms result from differences in other TF binding regions or chromatin effects, or that the underlying variants affect other genes, which in turn affect TF binding through feedback loops or post-transcriptional mechanisms
14.
We next analyzed whether polymorphisms in a specific TF motif correlated with Ste12 binding in the variable regions. The presence of polymorphisms in six TF motifs correlated positively or negatively with Ste12 binding (
Table S6). For example, for Yhp1, which associates with Mcm1
15 (a Ste12-associated protein
16), and Yap5, whose deletion causes a defect in haploid invasive growth
17, a total of 18 variations (4 SNPs and 14 Indels) and 27 variations (12 SNPs and 15 Indels) lie in their respective binding motifs. In both cases the poorer match to the consensus motif was associated with reduced Ste12 binding (Binomial test,
P = 0.0096 and 0.0038 respectively,
Table S6). These results suggest that Yhp1 and Yap5, along with four other factors, function to facilitate Ste12 binding.
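A binomial test of this kind asks whether the number of variations in which the poorer motif match coincides with weaker binding exceeds what would be expected by chance. The sketch below computes a one-sided tail probability under a null probability of 0.5. The counts are hypothetical, and both the sidedness and the exact counts used in the analysis are not given in this section.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """One-sided probability of observing at least k successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts: in 9 of 10 polymorphic motifs the allele with the
# poorer motif match also shows the lower binding signal.
p_value = binomial_tail(9, 10)
print(round(p_value, 4))  # 0.0107
```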
In our linkage analysis, we found 194
trans-markers significantly linked to binding variation traits on different chromosomes (). We used an independent hierarchical clustering method that has greater statistical power
6 to also identify
trans-regulators (see Methods). The 930 quantitative traits were first grouped into 121 mutually exclusive clusters of high significance. Eighty-five of the clusters contained three or more traits (the largest cluster had 46 traits) and 70% of the clusters contained traits on different chromosomes. We next tested for association between these cluster-level traits and the genotype markers. At an FDR of 0.01 we found significant linkages for 51 clusters (
P-value < 2 × 10⁻⁵), including 16 clusters with traits on different chromosomes. Six distinct genotype markers from 7 trans clusters were in common with those identified by the linkage analysis (
Fig. S6; see Methods) and are potential
trans-QTLs that affect Ste12 binding.
Four of the common trans clusters with Ste12 binding variation contained distinct classes of genes. Two clusters shown in
Fig. S6a and g exhibit Ste12 binding variation in the promoters of
DSE1,
DSE2,
SCW11,
CTS1, and
WTM1 which are enriched for the GO categories: “cytokinesis” (corrected
P << 0.001) and “cell wall organization and biogenesis” (corrected
P = 0.045). The cluster in
Fig. S6e includes binding variation in promoters of several
FLO genes (e.g.
FLO1, FLO11), suggesting that the QTL regulates pathways involved in flocculation
18 and/or cell-cell interaction
19. The fourth cluster (
Fig. S6f) includes 9 binding variations on chromosomes II, IV, VI, VII, XI, XIV and XV, with the highest association to a marker on chromosome II. The chromosome II region contains several interesting genes including chromatin remodelling factors. These results indicate that not only can
trans loci be identified, but they also affect distinct classes of genes.
To identify the
trans-acting genes responsible for the Ste12 binding differences, we disrupted 12 candidate genes in each of the parental strains. The genes were located primarily in three regions containing the
trans-QTLs shown in
Fig. S6a/g, e, and f. For two regions we found a single gene,
AMN1 and
FLO8, respectively, whose disruption recapitulated the Ste12 binding pattern of the other parent.
AMN1 deletion increased Ste12 binding whereas loss of
FLO8 diminished Ste12 binding, indicating that these are negative and positive regulators of Ste12 binding, respectively. Interestingly, whereas Flo8 affects highly localized Ste12 peaks, Amn1 affects broadly bound Ste12 peaks, suggesting that Amn1 may operate over larger genomic regions.
AMN1 was previously shown to affect vegetative gene expression in a different set of yeast crosses
6,7. Our data therefore indicate that
AMN1 has a role in regulating transcription under several conditions.
FLO8 is often disrupted in laboratory strains
20, is involved in pseudohyphal growth
18,21, and was shown previously to have a role in Ste12/Tec1 binding at one promoter
22. However, its binding motif is not enriched near Ste12 binding sites, and it was not known to have a role in either Ste12 function or mating. Our data indicate that
FLO8 is directly or indirectly important for Ste12 binding at a number of loci during α-factor treatment. Flo8 has been reported to localize to specific loci, including the promoters of
FLO1, FLO11, and
CIN5 (ref. 23), consistent with a direct regulatory role at these genes. Thus, our study revealed two novel loci affecting Ste12 binding during the pheromone response and yielded several other candidate regions.
To determine whether the variation in Ste12 binding is functional, we examined the correlation between binding and gene expression at the Ste12 target genes. Gene expression was measured using DNA microarrays for all 45 strains grown under the same conditions as the ChIP-Seq experiments. Many traits exhibit a positive correlation between binding and expression (permutation test, P < 0.001), consistent with Ste12's role as a transcriptional activator. Interestingly, however, a number of traits were negatively correlated with gene expression, suggesting that Ste12 may also function as a repressor, which had not been reported previously. Overall, 222 unique target genes (28%) had an absolute correlation coefficient for gene expression and Ste12 binding intensity higher than 0.335; 171 were positively correlated and 51 negatively correlated. Although the results are highly significant (P < 0.001 from 1,000 permutations), we presume that the absolute correlation is not higher because of the presence of additional binding sites for Ste12 and/or other factors, or additional mechanisms of gene regulation. The positively correlated genes were enriched in the GO category “cell wall organization and biogenesis” (P < 0.001).
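The binding-expression comparison can be sketched for a single trait as a Pearson correlation across strains, with an empirical P value obtained by shuffling the expression values among strains. The permutation scheme shown here (per trait, 1,000 shuffles, two-sided on the absolute correlation) is an assumption, since the exact procedure is described in the Methods rather than in this section. The data and function names are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p(x, y, n_perm=1000, seed=0):
    """Empirical two-sided P value from shuffling y across strains."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids P = 0

# Hypothetical binding signal and expression level for one trait in 10 strains.
binding = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.9, 2.2, 2.8, 3.0]
expression = [1.0, 1.3, 1.2, 1.9, 2.1, 2.6, 2.7, 3.3, 3.8, 4.1]

r = pearson(binding, expression)
p = permutation_p(binding, expression)
passes_cutoff = abs(r) > 0.335         # cutoff quoted in the text
print(round(r, 3), p, passes_cutoff)
```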
We next grouped the patterns of binding variation and expression variation at individual genes and found 14 clusters of variable Ste12 binding regions that exhibited strongly correlated or anti-correlated gene expression (
Fig. S8). Twelve of the 14 clusters include
cis-binding variations on the same chromosome. For the
trans-cluster affected by
AMN1, all Ste12 binding variations are positively correlated with expression of their target genes, suggesting that
AMN1 regulates gene expression through Ste12 binding to these genes. Another
trans-cluster, with binding traits on chromosomes VI and X, was also positively correlated with gene expression. Notably, these traits correspond to a translocation between chromosomes VI and X in HS959, and are actually in
cis for segregants inheriting the HS959 genotype (
Fig. S8), thereby suggesting that coordinated expression patterns can disperse in strain diversification.
Interestingly, we often observed combinations of both positively and negatively correlated binding-expression patterns in the same cluster (, second and third cluster;
Fig. S8). These results suggest that Ste12 binding may act as an activator on some gene promoters and a repressor on others. Alternatively, the differential expression may be due to an indirect effect of Ste12 binding through an activator or repressor of the target genes.
Although the trans-cluster affected by FLO8 was not among the most highly correlated for binding and expression, most binding traits in the FLO8 cluster are correlated with the expression of downstream gene targets. Specifically, four binding regions positively correlate with the expression of three adjacent genes, FLO1 (two regions; r = 0.554 and 0.577), FLO11 (r = 0.574), and CIN5 (r = 0.490) and one region negatively correlates with the expression level of its neighbouring genes YPS6 and GTT1 (r = −0.306 and −0.340 respectively). Furthermore, although not directly downstream, all binding traits in this cluster are highly correlated with the expression level of two filamentous growth genes, HMS1 (r = 0.600, 0.615, 0.650, 0.725, 0.715 respectively) and TMN3 (r = −0.500, −0.591, −0.599, −0.526, −0.643 respectively). Thus, FLO8-influenced Ste12 binding appears to be functional in regulating gene expression. The overall correlation of Flo8-mediated binding and gene expression may not have been revealed initially because some genes are influenced positively and others negatively.
Overall, our studies have revealed that TF binding variation in different yeast strains is widespread and that the majority of the strongest differences can be explained by genetic variants acting in cis. The cis events are often mediated by variation in consensus sequences required for binding of Ste12 or other co-associated TFs. Through genetic analysis of Ste12 binding, we were able to pinpoint potential causative DNA polymorphisms in TF binding motifs, identify novel associations between TFs, and identify novel functional roles of Ste12 in activation and repression of different genes.
Our results concerning Flo8 and the covariance of motifs of other TFs such as Yhp1 and Yap5 with Ste12 binding suggest that many factors not previously known to be involved in mating may have direct roles in the binding of key regulators such as Ste12 at specific loci. This result is consistent with the analysis of TF binding during the salt response, where many different combinations of factors were present at different gene promoters
24. Our results suggest that although major transcriptional regulators can control the expression of many inducible genes, the particular combination of factors at specific loci is ultimately responsible for individual gene expression. Thus, gene expression is regulated both globally and locally by different factors. Analysis of both
cis- and
trans-acting factors in different individuals of the same species can reveal a large number of new regulators of gene expression, even in well-characterized processes such as yeast mating.
The observation that
AMN1 and
FLO8 vary in different laboratory strains
6,7,18,20 and affect several cellular processes such as vegetative growth and mating raises the possibility that these genes determine not one, but several phenotypic traits. Each of these genes, along with many other QTLs identified in our study, affects the expression of genes involved in the response of cells to the environment, such as constituents of the cell wall and flocculation genes. Thus, we speculate that
AMN1 and
FLO8 and other QTLs are important for controlling the response to different environmental conditions and phenotypic diversity.