Analyzing low-dimensional representations of high-dimensional data through techniques such as SVD and ICA is a useful approach for studying the genetic architecture of gene expression variation [1]. We find that the meta-trait linkage analysis approach is complementary to traditional single-trait linkage scans, which do not efficiently exploit the complex correlation structure that exists among gene expression levels. In addition, this approach yields a smaller number of traits to analyze, which increases statistical power by attenuating the multiple testing problem.
Randomizing parental yeast genomes through genetic crosses induces widespread changes in expression, which allows the contribution of genetic variation to gene expression changes to be systematically probed and makes this an appealing setting in which to apply dimension reduction methods. By applying SVD and ICA to the unfiltered expression matrix, we are able to focus our analysis on the most biologically meaningful meta-traits, which we term "eigentraits" and "ICAtraits", respectively. In this low-dimensional snapshot, each meta-trait is a weighted average of all 6216 traits and is uncorrelated with the other meta-traits, and hence can be analyzed independently of them. The approach we outline for identifying significantly correlated genes for each meta-trait allows gene sets to be identified and subjected to further bioinformatics and functional analyses.
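To make the construction of eigentraits concrete, the following is a minimal sketch rather than the pipeline used in this study. It assumes a genes-by-segregants expression matrix, centers each gene across segregants, and takes the leading right singular vectors as eigentraits. The function name `compute_eigentraits`, the toy matrix dimensions, and the random data are hypothetical and chosen only for illustration.

```python
import numpy as np


def compute_eigentraits(expr, n_components):
    """Return (eigentraits, loadings, singular_values) for a genes x segregants matrix.

    Assumption: rows are expression traits (genes) and columns are segregants.
    Each eigentrait has one value per segregant and can be used as a
    phenotype in a linkage scan.
    """
    # Center every gene across segregants so that each eigentrait has mean zero.
    centered = expr - expr.mean(axis=1, keepdims=True)
    # Thin SVD: centered = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components], u[:, :n_components], s[:n_components]


# Hypothetical toy data: 200 traits measured in 30 segregants.
rng = np.random.default_rng(0)
toy_expr = rng.normal(size=(200, 30))

eigentraits, loadings, singular_values = compute_eigentraits(toy_expr, 5)

# Pairwise correlations between eigentraits (identity matrix up to rounding).
corr = np.corrcoef(eigentraits)
```

In this sketch each eigentrait equals the corresponding column of `loadings` applied to the centered matrix and divided by its singular value, which is the sense in which a meta-trait is a weighted combination of all traits. The columns of `loadings` could then be used to rank genes by their contribution to a given eigentrait.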
The complementary nature of this study compared to single-trait analyses is supported by the fact that eigentraits with some of the highest singular values map to previously described strong-effect eQTLs such as the LEU2 locus, Msn2/4 targets, and AMN1. These results are reinforced by the ICAtrait analysis, which uncovered similar large-effect QTLs. Furthermore, the utility of both approaches is demonstrated by the detection of eleven novel eQTLs that supplement our understanding of the genetic architecture of gene expression differences between these two S. cerevisiae strains. These include four novel cis-linkages that were studied in greater detail. Two of them map to tandem arrays of genes with similar functions that are involved in asparaginase metabolism and sodium ion transport. Comparative sequence analysis of the BY and RM strains suggests that these gene clusters have been lost in RM, which is consistent with reports of copy number changes at these loci in non-laboratory yeast strains [38]. These two meta-traits also show differential expression between the two parental strains at the marker with the highest linkage statistic (see Additional file 7). This is interesting because the two strains have evolved in very different ecological niches and might depend on different nutrient sources for survival. Using ICAtraits, two additional cis-linkages were identified and found to be associated with differences in retrotransposition and alcohol dehydrogenase activity. Analysis of the linkage regions at a finer scale also provided strong candidate genes that might regulate these expression differences.
Despite the strict genome-wide threshold that we used, there were four cases of eQTLs common between eigentraits (Figure ). Such observations, coupled with the orthogonality of the meta-traits, are consistent with either pleiotropy or coordinate linkage between two closely spaced eQTLs. However, it is important to note that such inferences are tenuous because a single biological signal may be captured by multiple eigentraits. This cannot be ruled out because the eigentraits, being linear combinations of all expression traits, are hard to interpret qualitatively. Interestingly, there is only one case of a linkage being shared between ICAtraits, suggesting that ICA is better at discriminating between the different biological signals present in the data. Furthermore, ICA identified a larger set of novel eQTLs than SVD. This may be because ICA estimates statistically independent components using higher-order moments and can therefore detect non-normally distributed trends, whereas SVD relies only on the absence of correlation, a second-order property that implies independence only when the trends are jointly normally distributed. A non-normal, long-tailed distribution is expected in this dataset given the finding from single-trait analyses that a small number of linkage "hotspots" are responsible for most of the variation in the dataset.
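The distinction between second-order and higher-order moments can be illustrated with excess kurtosis, a fourth-order statistic that is zero for a normal distribution and positive for long-tailed ones. The sketch below is only an illustration of that idea and is not an ICA implementation or part of the original analysis. The helper `excess_kurtosis` and the simulated signals are hypothetical, with a Laplace distribution standing in for a long-tailed trend.

```python
import numpy as np


def excess_kurtosis(x):
    """Population excess kurtosis: fourth central moment / variance^2 - 3."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    fourth_moment = np.mean(centered ** 4)
    return fourth_moment / variance ** 2 - 3.0


# Hypothetical simulated signals, used only to contrast the two cases.
rng = np.random.default_rng(1)
normal_signal = rng.normal(size=50_000)        # no excess kurtosis expected
long_tailed_signal = rng.laplace(size=50_000)  # long tails, positive excess kurtosis

k_normal = excess_kurtosis(normal_signal)
k_long = excess_kurtosis(long_tailed_signal)
```

Both signals can be scaled to the same variance, so a method that looks only at second-order structure cannot tell them apart, whereas a fourth-order statistic such as `k_long` separates them clearly.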
Another scenario in which the interpretation of the results might be misleading is when the meta-traits capture technical artifacts of the microarray experiment, for example signal that is driven by cross-hybridization instead of true differential expression. One approach to assessing the effect of cross-hybridization on the eQTL data is to test the hypothesis that paralogous genes are enriched among the traits significantly correlated with a meta-trait. For example, eigentrait 8 consists of 47 significantly correlated traits, of which 7 are paralogs (YRF1-1, YRF1-2, YRF1-3, YRF1-4, YRF1-5, YRF1-6, YRF1-7). Thus cross-hybridization among these genes may be influencing this eigentrait. Note that linkage analysis of eigentrait 8 identified two closely linked cis-eQTLs on chromosome 12 (Figure ). One of these cis-linked regions contains YRF1-4 and YRF1-5. Cross-hybridization of YRF1-4 and YRF1-5 with the other YRF1 paralogs could therefore potentially explain this apparent cis-linkage. However, if this were true we would expect to see a cis-linkage at one of the other YRF1 genes, which we do not observe. The linkages observed for eigentrait 8 therefore appear to be robust, but the larger issue of technical artifacts due to cross-hybridization and other sources is important to keep in mind when interpreting eQTL studies.
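One way to phrase the paralog enrichment hypothesis is as a one-sided hypergeometric test, sketched below. This is an assumed formulation, not necessarily the test the study would use. The function `hypergeom_enrichment_pvalue` is hypothetical, and the genome-wide count of paralogous genes is not given in the text, so the value used here is a placeholder for illustration only. The counts 6216, 47, and 7 are taken from the text.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, observed):
    """One-sided hypergeometric tail probability P(X >= observed).

    population: total number of traits
    annotated:  number of traits in the population carrying the annotation
    sample:     number of traits in the gene set being tested
    observed:   number of annotated traits found in the gene set
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


N_TRAITS = 6216           # from the text: total number of expression traits
SET_SIZE = 47             # from the text: traits correlated with eigentrait 8
OBSERVED_PARALOGS = 7     # from the text: YRF1 paralogs in that set
N_PARALOGS_GENOME = 500   # HYPOTHETICAL placeholder, not a value from the study

p_value = hypergeom_enrichment_pvalue(
    N_TRAITS, N_PARALOGS_GENOME, SET_SIZE, OBSERVED_PARALOGS
)
```

Because the result depends strongly on the assumed genome-wide paralog count, `p_value` here should be read only as a demonstration of the calculation. A small p-value for a real annotation count would flag a meta-trait as a candidate for cross-hybridization effects and prompt closer inspection of its linkages.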
In summary, we highlight the applicability of dimension reduction methods for studying large-scale patterns of variation in gene expression traits. We argue for the use of both SVD and ICA when there are no prior expectations about the patterns of variation present in genome-wide expression trait measurements. This approach is also an important tool for recovering previously undetected eQTLs and for exploring the widespread but uncharacterized cases of pleiotropy, and it provides the basis for a more detailed understanding of how regulatory variation manifests itself across transcriptional networks.