We have presented a transcription-factor-centric method for identifying trans-acting genetic modulators of gene expression using parallel genotyping and mRNA expression phenotyping data. Our approach treats the genotype-specific regulatory activity of each TF as a quantitative trait. It exploits prior information about the network of interactions between TFs and their target genes to infer genotype-specific TF activities from genome-wide measurements of mRNA expression. Our method greatly increases the statistical power to detect locus-TF associations. Because it pools evidence across both genes and segregants, it is sensitive even to relatively subtle effects of genotype-specific TF activity on mRNA expression. The fact that TF activity is not a gene-specific phenotype allows us to make the rather crude assumption that the strength of the regulatory connection between a TF and a target gene is proportional to the in vitro affinity of the TF for the target's promoter. In reality, many of the predicted binding sites in promoter regions are not functional, due to complex interactions with nucleosomes and other chromatin-associated factors. It is remarkable that our method works in spite of this complexity.
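The core idea can be illustrated with a minimal two-stage sketch. It assumes (i) that the activity of a TF in one segregant is estimated as the least-squares slope of log-expression on predicted promoter affinity across genes, and (ii) that linkage at a marker is scored with a Welch t-statistic comparing activities between segregants carrying the BY and RM alleles. The function names `tf_activity` and `welch_t` and the simulated data are hypothetical; this is a simplified illustration, not the exact model described in Materials and methods.

```python
import random
import statistics


def tf_activity(expr, affinity):
    """Least-squares slope of expression on promoter affinity across genes.

    Within one segregant, this slope serves as a simple (hypothetical) estimate
    of the TF's activity.
    """
    mean_a = statistics.fmean(affinity)
    mean_y = statistics.fmean(expr)
    sxy = sum((a - mean_a) * (y - mean_y) for a, y in zip(affinity, expr))
    sxx = sum((a - mean_a) ** 2 for a in affinity)
    return sxy / sxx


def welch_t(x, y):
    """Welch t-statistic for a difference in mean activity between two allele groups."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.fmean(x) - statistics.fmean(y)) / se


# Hypothetical simulation: 200 genes, 40 segregants, one marker whose
# RM allele halves the activity of the TF (true activity 1.0 vs 0.5).
rng = random.Random(0)
n_genes, n_segregants = 200, 40
affinity = [rng.random() for _ in range(n_genes)]
genotype = [s % 2 for s in range(n_segregants)]  # 0 = BY allele, 1 = RM allele

activities = []
for g in genotype:
    true_activity = 1.0 if g == 0 else 0.5
    expr = [true_activity * a + rng.gauss(0.0, 0.1) for a in affinity]
    activities.append(tf_activity(expr, affinity))

by_act = [act for act, g in zip(activities, genotype) if g == 0]
rm_act = [act for act, g in zip(activities, genotype) if g == 1]
t_stat = welch_t(by_act, rm_act)
print(f"t = {t_stat:.1f}")
```

Because each activity estimate pools information from all 200 genes, the per-segregant estimates are precise enough that even a modest shift in activity between allele groups yields a large test statistic.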
Application of our aQTL method to a data set for 108 haploid segregants from a cross between two yeast strains (Smith and Kruglyak, 2008) demonstrated a dramatic increase in statistical power to uncover the regulatory mechanisms underlying genetic variation in gene expression levels. The results are summarized in Supplementary Table S2. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). In total, 31 distinct genomic loci were identified as an aQTL for one or more TFs, including 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008). Thus, our method identifies 20 novel trans-acting polymorphisms: almost double the number of such loci previously known in yeast. For many of the eQTL hotspots, it also implicated several TFs not previously known to mediate the influence of these loci on genome-wide mRNA expression.
Our regression procedure fully accounts for post-translational regulation of TF activity at the protein level, as we do not use the mRNA expression level of either the gene encoding the TF or one of its upstream modulators as a surrogate for regulatory activity. Indeed, the correlation between the protein-level regulatory activity of a TF and its mRNA expression level across a large number of experimental conditions in yeast was recently found to be often quite poor (Boorsma et al, 2008). The present study confirms this observation: only one third of the TFs analyzed show a significant (<5% FDR) correlation between mRNA expression and activity (Supplementary Figure S6). Moreover, only 12 of the 103 TF-locus associations could be confirmed when mRNA expression level was used as a proxy for protein-level TF activity.
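The procedure used to control the false discovery rate is not specified in this section. Assuming the standard Benjamini-Hochberg step-up procedure, the significance call at <5% FDR can be sketched as follows; the p-values in the usage example are hypothetical.

```python
def benjamini_hochberg(pvals, fdr=0.05):
    """Return booleans marking which tests are significant at the given FDR.

    Benjamini-Hochberg step-up: find the largest rank k with p_(k) <= (k / m) * fdr
    and call the k smallest p-values significant.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * fdr:
            k_max = rank
    significant = [False] * m
    for i in order[:k_max]:
        significant[i] = True
    return significant


# Hypothetical p-values for the mRNA-versus-activity correlation of four TFs.
print(benjamini_hochberg([0.01, 0.5, 0.03, 0.02]))
```

Note that the step-up rule can declare a test significant even if its own p-value exceeds its rank threshold, as long as a larger-ranked p-value passes.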
We also applied our aQTL method to the earlier yeast segregant data set of Brem and Kruglyak (2005). This confirmed the dramatic increase in statistical power afforded by our approach (see Supplementary Table S5). We detected a total of 79 locus-TF associations, a more than five-fold improvement over the 14 locus-TF associations detected from these data by several existing methods combined (Lee et al, 2006; Sun et al, 2007; Zhu et al, 2008; Ye et al, 2009). Furthermore, 28 of these 79 locus-TF associations were also detected using the data of Smith and Kruglyak (2008). This degree of reproducibility strongly validates our method. The number of possible locus-TF associations is roughly the number of TFs (123) times the number of markers (~3000) divided by the average number of genes per locus (~20), or ~18,450; two random sets of 103 and 79 associations drawn from this pool would therefore be expected to overlap by only 103 × 79 / 18,450 ≈ 0.4. Complete overlap is not expected either, as the data sets were similar but not identical. Indeed, although 13 eQTL hotspots have been identified in each data set, only 8 of these are shared (Smith and Kruglyak, 2008; Zhu et al, 2008).
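The chance expectation of ~0.4 follows from a simple calculation, assuming the two sets of detected associations are independent, uniformly random subsets of all possible locus-TF pairs. Under that assumption the expected intersection size is the product of the two set sizes divided by the number of possible pairs.

```python
n_tfs = 123            # TFs analyzed
n_markers = 3000       # approximate number of markers
genes_per_locus = 20   # approximate average number of genes per locus
n_possible = n_tfs * n_markers // genes_per_locus  # possible locus-TF associations

hits_smith = 103  # associations from the Smith and Kruglyak (2008) data
hits_brem = 79    # associations from the Brem and Kruglyak (2005) data

# If both sets were drawn independently and uniformly at random from the
# n_possible associations, the expected size of their intersection is k1 * k2 / N.
expected_overlap = hits_smith * hits_brem / n_possible
print(f"{n_possible} possible associations; expected random overlap = {expected_overlap:.2f}")
```

An observed overlap of 28 is therefore roughly 60 times larger than expected by chance.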
Our findings are consistent with previous observations (Yvert et al, 2003) that most trans-acting variation in yeast maps not to TF genes but to upstream modulators of TF activity. Indeed, of the 103 TF-locus associations, only four are local. We confirmed that HAP1 is directly affected by a sequence polymorphism, and discovered novel trans-acting polymorphisms in the TF-encoding genes STB5 and HAP4. Unexpectedly, our analysis revealed loci on chromosomes II and XV that are informative for a large number of TFs (‘aQTL hotspots’). We stress that this cannot be explained by correlated promoter-affinity profiles across genes, as we found these profiles to be largely independent between TFs (cf. Supplementary Figure S1A). Rather, this phenomenon seems to point to one-to-many relationships between signal transduction pathways and TFs. For instance, our method predicts that genetic variation at the locus on chromosome II encoding the cyclin-dependent kinase CDC28 changes the activity of multiple cell cycle-associated TFs (Ace2p, Fkh1p, Fkh2p, and Swi5p). At the same time, distinct polymorphisms at the same aQTL could be responsible for modulating different subsets of linked TFs. Evidence for this comes from our observation that allele replacement at the IRA2 locus on chromosome XV affected only a small subset of the TFs whose activity is linked to this aQTL (cf. Supplementary Figure S4).
In an effort to uncover further specific molecular mechanisms underlying the aQTL linkages, we supplemented our genetic analysis with knowledge about physical and genetic PPIs; see Materials and methods for details. The information provided by PPI and aQTL data is highly complementary. On the one hand, aQTL linkage can only implicate relatively large genomic regions, not individual genes, as genetic modulators of TF activity. On the other hand, although PPI data can connect a TF to a putative modulator of its activity, without the strict causality and directionality of aQTL linkage it would be questionable to conclude that the interaction corresponds to a functional regulatory connection. In all cases, the probability that a gene within the aQTL region encodes one of the direct interactors of the TF by chance is <3% (see Materials and methods and Supplementary Table S4). Therefore, most of these genes (which we refer to as aQTGs) are expected to encode direct or indirect modulators of the TF's activity. We were able to implicate a non-coding polymorphism in the CDC28 gene as a plausible genetic factor underlying the major eQTL hotspot on chromosome II, in addition to the experimentally validated trans-acting polymorphism in the AMN1 gene in the same region (Yvert et al, 2003). We also make a strong prediction that the functionally distinct cell cycle regulators Fkh1p and Fkh2p are modulated by the cyclin-dependent kinase Cdc28p in an antagonistic manner.
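The exact chance calculation is described in Materials and methods and is not reproduced here. As a rough, hypothetical illustration: if the genes in an aQTL region were a random draw (without replacement) from the genome, the probability that at least one of them is a direct interactor of the TF follows from the hypergeometric distribution. The function name `chance_hit_probability` and the genome size, number of interactors, and region size below are illustrative assumptions, not values from our data.

```python
from math import comb


def chance_hit_probability(n_genome, n_interactors, n_region):
    """Probability that at least one of n_region genes in an aQTL region is among
    the n_interactors direct interactors of a TF, if the region's genes were a
    random draw (without replacement) from a genome of n_genome genes."""
    # P(no interactor in region) = C(n_genome - n_interactors, n_region) / C(n_genome, n_region)
    return 1 - comb(n_genome - n_interactors, n_region) / comb(n_genome, n_region)


# Hypothetical numbers: ~6000 yeast genes, 5 interactors, 20 genes in the region.
p = chance_hit_probability(6000, 5, 20)
print(f"P(at least one interactor by chance) = {p:.3f}")
```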
Extensive transgressive segregation has previously been reported for the expression levels of individual genes (Brem and Kruglyak, 2005). However, when we tested for the same phenomenon at the level of TF activity (see Materials and methods), we detected transgressive segregation only for Ecm22p and Tec1p (Supplementary Figure S8). In both cases, the effects of two aQTLs for the same TF cancel each other in both parental strains, and no differential activity between RM and BY could be observed. Presumably, much of the transgressive segregation at the level of individual genes arises because positive and negative contributions from different TFs can cancel each other. Our multivariate model of each gene's expression level in terms of the activities of multiple TFs accounts for such compensation explicitly, and hence transgressive segregation is much less prevalent for aQTLs than for eQTLs.
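The cancellation described for Ecm22p and Tec1p can be illustrated with a toy additive model, assuming two aQTLs with equal and opposite (hypothetical) effects on the activity of the same TF. Both parents then have identical activity, yet the recombinant segregants fall outside the parental range in both directions.

```python
from itertools import product


def activity(genotype, effects, baseline=0.0):
    """Additive model: each RM allele (coded 1) shifts TF activity by its locus effect."""
    return baseline + sum(effect * allele for effect, allele in zip(effects, genotype))


# Hypothetical effects of two aQTLs for the same TF, with opposite signs.
effects = (+1.0, -1.0)

by_parent = activity((0, 0), effects)  # BY alleles at both loci
rm_parent = activity((1, 1), effects)  # RM alleles at both loci
segregants = {g: activity(g, effects) for g in product((0, 1), repeat=2)}

print("parents:", by_parent, rm_parent)
print("segregants:", segregants)
```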
In our approach, ‘phenotype space’ is reduced from that of all genes to that of all TFs. Rather than mapping the measured mRNA expression level of individual genes to eQTLs, we map the inferred activity of each TF to ‘aQTLs’. This enhances statistical power in two distinct ways. First, it improves the signal-to-noise ratio of the quantitative trait itself, as the activity of each TF is estimated from the mRNA expression levels of its many targets. Second, it greatly reduces the multiple-testing burden that QTL mapping incurs from the large number of marker/trait combinations. Running in only seconds on a single processor, our algorithm is also computationally efficient.
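Both effects can be made concrete with back-of-the-envelope numbers. The figure of ~6000 genes, the use of a Bonferroni correction, and the assumption of independent, equal-variance noise across targets are illustrative simplifications, not necessarily the corrections used in our analysis.

```python
n_markers = 3000  # approximate number of markers (from the text)
n_tfs = 123       # TFs analyzed (from the text)
n_genes = 6000    # assumption: approximate number of yeast genes tested as eQTL traits
alpha = 0.05

n_eqtl_tests = n_genes * n_markers
n_aqtl_tests = n_tfs * n_markers

# Bonferroni per-test thresholds, used here only to illustrate the scale of the correction.
print(f"eQTL: {n_eqtl_tests:,} tests, threshold {alpha / n_eqtl_tests:.1e}")
print(f"aQTL: {n_aqtl_tests:,} tests, threshold {alpha / n_aqtl_tests:.1e}")

# Averaging independent noise over n targets shrinks its standard deviation by sqrt(n).
n_targets = 100  # hypothetical number of targets contributing to one activity estimate
noise_reduction = n_targets ** 0.5
print(f"noise SD reduced {noise_reduction:.0f}-fold with {n_targets} targets")
```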
It is important to emphasize that in our method the molecular identity of a TF is defined only through the position-specific affinity matrix (PSAM) that parameterizes its DNA-binding specificity. The sequence-to-affinity model for each TF needs to be specific enough to allow differentiation from all other TFs. We found that this condition generally holds for the budding yeast S. cerevisiae. Given the rapid pace at which in vitro DNA-binding data are currently being generated for mammalian TFs (Badis et al, 2009), together with the demonstrated ability of regression-based models to infer TF activity in human cells (Das et al, 2006), we expect our method to be applicable to higher eukaryotes as well.
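A common way to turn a PSAM into a promoter affinity is to multiply the per-position relative affinities within each window and sum over all windows on both strands, assuming that positions contribute independently and that sites act additively. The sketch below follows that convention; the 2-bp PSAM and the function names are hypothetical, and this is not necessarily the exact scoring used in our analysis.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def site_affinity(site, psam):
    """Relative affinity of one site: product of per-position relative affinities.

    Assumes positions contribute independently to binding.
    """
    result = 1.0
    for position, base in zip(psam, site):
        result *= position[base]
    return result


def promoter_affinity(seq, psam):
    """Total relative affinity: sum over every window on both strands.

    Assumes non-interacting sites whose contributions add.
    """
    width = len(psam)
    total = 0.0
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            total += site_affinity(strand[start:start + width], psam)
    return total


# Hypothetical 2-bp PSAM that prefers "AC" (each position's optimal base has weight 1).
psam = [
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 1.0, "G": 0.1, "T": 0.1},
]
print(promoter_affinity("ACAC", psam))
```

Because the score depends only on the PSAM, two TFs with nearly identical PSAMs would produce nearly identical affinity profiles across promoters and could not be told apart, which is why sufficiently distinct sequence-to-affinity models are required.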
Taken together, our results show that explicitly treating TF activity as a quantitative trait, from a systems biology perspective, is a promising strategy for increasing the statistical power of genome-wide linkage and association studies. More generally, our method is applicable whenever a matrix of connection strengths between regulators and targets, independent of the phenotype matrix, is available as prior information. There are several directions in which this approach can be extended. First, the use of more sophisticated methods for causal gene identification (Sun et al, 2007; Suthram et al, 2008; Lee et al, 2009) is likely to uncover additional molecular mechanisms. It will also be interesting to analyze to what extent the connectivity between TFs and their genetic modulators depends on the nutrient condition in which the yeast cells are grown (Smith and Kruglyak, 2008). Furthermore, aQTLs provide a novel vantage point for analyzing locus–locus interactions. Finally, it will be interesting to analyze to what extent genetic variation in steady-state gene expression levels due to post-transcriptional regulation of mRNA stability (Foat et al, 2005; Lee et al, 2009) is amenable to dissection using the method introduced in this paper.
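The general setting, a known regulator-target connectivity matrix used to infer regulator activities from a phenotype matrix, can be sketched as ordinary least squares for each sample, solving the normal equations N^T N a = N^T y. Modeling each target jointly in terms of several regulators is also what allows positive and negative contributions to compensate. This minimal sketch assumes no intercept, a well-conditioned connectivity matrix, and hypothetical toy values; the names `solve` and `infer_activities` are illustrative, and this is not our implementation.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def infer_activities(connectivity, expr):
    """Least-squares regulator activities for one sample via the normal equations.

    connectivity: one row per target gene, one column per regulator (prior connection strengths)
    expr: observed expression value of each target gene in this sample
    """
    k = len(connectivity[0])
    ntn = [[sum(row[i] * row[j] for row in connectivity) for j in range(k)] for i in range(k)]
    nty = [sum(row[i] * y for row, y in zip(connectivity, expr)) for i in range(k)]
    return solve(ntn, nty)


# Hypothetical toy example: 4 target genes, 2 regulators, noise-free data.
connectivity = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
true_activity = [0.5, -1.5]
expr = [sum(n * a for n, a in zip(row, true_activity)) for row in connectivity]
print(infer_activities(connectivity, expr))
```

With noise-free data the true activities are recovered up to floating-point rounding; for larger problems a numerically stabler solver (for example, a QR-based least-squares routine) would be preferable to forming N^T N explicitly.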