Expression quantitative trait loci (eQTL), or genetic variants associated with changes in gene expression, have the potential to assist in interpreting results of genome-wide association studies (GWAS). eQTLs also have varying degrees of tissue specificity. By correlating the statistical significance of eQTLs mapped in various tissue types to their odds ratios reported in a large GWAS by the Wellcome Trust Case Control Consortium (WTCCC), we discovered that there is a significant association between diseases studied genetically and their relevant tissues. This suggests that eQTL data sets can be used to determine tissues that play a role in the pathogenesis of a disease, thereby highlighting these tissue types for further post-GWAS functional studies.
Genome-wide association studies (GWAS) of common complex or multifactorial diseases have proliferated enormously over the last few years. They have also been successful in identifying a large number of loci at extraordinary levels of significance. However, this success has presented a new challenge: translating these findings into a full understanding of how the loci affect complex disease traits. Most of the reported variants do not affect protein function in an obvious manner, and indeed a large number lie in introns or intergenic regions, indicating that they may function through more subtle regulation of gene expression. This is compatible with the hypothesis that such genetic variants, commonly known as expression quantitative trait loci (eQTL), are an important factor in disease susceptibility (1).
eQTLs are discovered when samples are simultaneously studied using genotyping tools, yielding information on DNA variants, and expression measurement tools, yielding information on RNA levels. Each SNP can be represented by its alleles or genotypes, and statistically associated across samples with RNA levels. As a hypothetical example of one eQTL (Figure 1), as genotypes vary from CC, CT, TT at SNP rs7188573 in gene MMP25, expression levels of this gene decrease in monocytes. Note that the SNP may be within a certain distance from the gene, in which case the SNP is termed a cis eSNP, or might be distant, termed a trans eSNP. The degree of association is typically estimated statistically, using ANOVA or equivalent tests, controlled for multiple hypothesis testing. Nicolae, et al., recently demonstrated that SNPs associated with human traits are in general enriched for eQTLs (2). Other studies have shown that eQTLs calculated in the tissue of interest in a disease are enriched for disease-associated SNPs (3,4). These insights have been applied to use eQTLs to provide another layer of meaning to SNPs and prioritize GWAS results (5,6).
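As a minimal sketch of the association test described above, the following computes a one-way ANOVA F statistic for expression levels grouped by genotype at a single SNP. The expression values are invented for illustration and are not taken from any data set; the function name `anova_f` is likewise hypothetical. Only the F statistic is computed; turning it into a p-value and correcting for multiple testing are left out.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of expression values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of genotype classes
    n = len(all_values)    # total number of samples
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical expression levels (arbitrary units) by genotype at one SNP.
expression_by_genotype = {
    "CC": [9.0, 10.0, 11.0],
    "CT": [7.0, 8.0, 9.0],
    "TT": [5.0, 6.0, 7.0],
}

f_stat = anova_f(list(expression_by_genotype.values()))
print(f_stat)
```

A large F statistic indicates that mean expression differs between genotype classes by more than the within-class scatter would suggest, which is the pattern an eQTL produces.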
Because gene expression profiles vary across tissues, some eQTLs are expected to be tissue specific. Indeed, 33–69% of eQTLs, depending on the analysis method and the tissue type, have been reported as not discovered in other tissues (3,7,8), suggesting that the differences in eQTLs across tissues may provide additional functional information. This prompted us to investigate the correlation between the statistical significance of eQTLs calculated in different tissues (blood monocytes, liver, and adipose tissue) and their odds ratios in diseases studied by the Wellcome Trust Case Control Consortium (WTCCC) (9). Our results reveal the potential for this type of correlation analysis to be used to determine the tissue of interest for a disease, ultimately providing information that narrows the research focus or opens up novel avenues of inquiry.
The overall experimental design is shown in Figure 2. We obtained publicly available eQTL data generated from 3 different tissue types in healthy Caucasians: peripheral blood monocytes (8), liver, and adipose tissue (10). The first data set specifically involved monocytes isolated from the blood of 1,490 healthy individuals recruited in the Gutenberg Heart Study. The eQTLs were mapped using analysis of variance (ANOVA), with a p-value cutoff of 5.78E-12, corresponding to a family-wise error rate of 0.05. This study reported 37,403 associations, comprising 29,912 SNPs and 2,745 expression traits. The liver and adipose data sets involved samples obtained from 1,008 morbidly obese individuals at the time of gastric bypass surgery. Association was determined using the Kruskal-Wallis test, and results were reported at a 10% false discovery rate (FDR) based on permutations of the SNP genotypes and gene expression levels, for a total of 24,513 eSNPs associated with 15,241 transcripts or 9,931 distinct genes.
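To give a sense of scale for the monocyte p-value cutoff quoted above, the short calculation below works backwards from the cutoff to the number of tests it would imply. It assumes a Bonferroni-style correction (cutoff = family-wise error rate divided by the number of tests); the text states only the cutoff and the error rate, so the correction method is an assumption here.

```python
# Values quoted in the text for the monocyte eQTL data set.
fwer = 0.05          # family-wise error rate
p_cutoff = 5.78e-12  # per-test p-value cutoff

# Assumption: Bonferroni correction, i.e. p_cutoff = fwer / number_of_tests.
implied_tests = fwer / p_cutoff
print(f"{implied_tests:.3g} SNP-expression trait tests implied")
```

Under that assumption the cutoff corresponds to roughly 8.65 billion SNP-expression trait pairs, which illustrates how stringent the threshold is.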
To compare against these, we used the results of the genome-wide association studies reported by the Wellcome Trust Case Control Consortium for 7 different diseases: bipolar disorder (BD), coronary artery disease (CAD), Crohn’s disease (CD), hypertension (HTN), rheumatoid arthritis (RA), type 1 diabetes (T1D), and type 2 diabetes (T2D).
We will refer to the p-value of eQTL association for an eSNP as peqtl, and the p-value of disease association for a given SNP as pgwas. For each of the three tissue types for which we had eQTL relationships, we studied the relationship between the minimum eSNP p-value (peqtl) and the odds ratio for each of the seven diseases. For each disease studied by the WTCCC, we selected eSNPs in each tissue that were in very strong linkage disequilibrium (LD, r² = 1) with any SNP having uncorrected pgwas < 0.01 for that disease. Because testing multiple SNPs in LD with each other could result in spurious correlation, we selected the one SNP with the minimum peqtl from each LD block and performed Kendall rank correlation analysis of -log(peqtl) and the absolute value of log(disease odds ratio) using the cor.test function in R (11). The resulting p-values of the 21 statistical tests (3 tissues × 7 diseases) were provided as input to the qvalue package in R to calculate corresponding q-values (12).
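The selection and correlation steps described above can be sketched as follows. This is an illustration in plain Python rather than the R code used in the analysis. The SNP names, LD block labels, p-values and odds ratios are all hypothetical, and `kendall_tau_b` is a hand-written helper, not a library function. The sketch computes only the tau statistic; the significance test that cor.test also reports is omitted.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b, which accounts for ties)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                # tied on both variables
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Hypothetical eSNP records: (SNP, LD block, peqtl, disease odds ratio).
records = [
    ("snpA", "block1", 1e-20, 1.30),
    ("snpB", "block1", 1e-15, 1.28),
    ("snpC", "block2", 1e-14, 0.90),
    ("snpD", "block3", 1e-13, 1.15),
    ("snpE", "block4", 1e-12, 0.95),
]

# Keep one SNP per LD block: the one with the smallest peqtl.
best = {}
for snp, block, peqtl, odds_ratio in records:
    if block not in best or peqtl < best[block][1]:
        best[block] = (snp, peqtl, odds_ratio)

# Rank correlation is unaffected by the choice of logarithm base.
x = [-math.log10(p) for _, p, _ in best.values()]
y = [abs(math.log(odds)) for _, _, odds in best.values()]
tau = kendall_tau_b(x, y)
print(sorted(s for s, _, _ in best.values()), round(tau, 3))
```

Taking the absolute value of the log odds ratio treats risk and protective alleles alike, so the correlation measures effect size regardless of direction.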
Our goal was to determine whether one can computationally assign a functional tissue to a disease, given only the genetic architecture found through GWAS, using tissue-specific eQTL data sets. As shown in Figure 2, we intersected two sets of variants, one from each disease studied by the WTCCC and the other from eSNPs reported in each tissue type.
The total numbers of eSNPs reported to be significant in each tissue, and the number that were associated with each disease at p < 0.01, are shown in Table 1. For the eSNPs listed in this table, we investigated the correlation between peqtl and disease odds ratio. For example, Figure 3 shows the rank of log(odds ratio) plotted against log(peqtl) for monocyte eSNPs and type 1 diabetes. The log of Kendall rank correlation p-value for each disease is shown in Figure 4.
For SNPs reported to be associated with gene expression in monocytes, we found that the correlation was significant at p < 0.05 in three diseases: Crohn’s disease, rheumatoid arthritis, and type 1 diabetes. These are all autoimmune diseases in which macrophages play various roles. For example, there are recent reports that Crohn’s disease may be a primary immunodeficiency of macrophages (13), and these cells are also involved in the pathogenesis of type 1 diabetes (14,15). Monocytes are also of central importance in rheumatoid arthritis (16). Closer inspection revealed several SNPs in the vicinity of CARD9, but not within it, that were strongly associated with the expression of this gene. CARD9 has previously been implicated in Crohn’s disease, although the association was not replicated in the WTCCC study (17).
In liver tissue, we found that the correlation was significant for type 2 diabetes, and borderline for coronary artery disease. The liver is of course intimately involved in the metabolism of glucose and lipids, and liver pathology, such as nonalcoholic fatty liver disease seen in metabolic syndrome, is linked to coronary artery disease (18) and type 2 diabetes (19). We did not find any significant correlations in adipose tissue. The Kendall rank correlation statistics and q-values are summarized in Table 2. For the significant correlations, the q-values ranged from 0.025 to 0.205.
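For readers unfamiliar with q-values, the sketch below shows a closely related calculation: Benjamini-Hochberg adjusted p-values. This is a deliberately simpler stand-in, not the qvalue package procedure used in the analysis. The qvalue method additionally estimates the proportion of true null hypotheses, and Benjamini-Hochberg corresponds to fixing that proportion at 1, which makes it the more conservative of the two. The p-values are hypothetical and `bh_adjust` is an illustrative name.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjustment stays monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical correlation p-values for four tissue-disease pairs.
pvals = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(pvals)
print([round(a, 4) for a in adj])
```

An adjusted value of 0.05 for a given test means that, among all tests called significant at that threshold, about 5% are expected to be false discoveries.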
We systematically investigated the correlation between the statistical significance of eQTLs calculated in different tissues and their odds ratios for different diseases studied by the WTCCC. Although several groups have used these ‘tissue eQTLs’ as an indicator of functional significance in interpreting GWAS results, to our knowledge, this is the first time the correlation of tissues to diseases has been systematically evaluated. If we consider eQTLs to be a marker of the functional significance of SNPs, our analysis reveals that in tissues that play a role in the disease this significance correlates with the disease odds ratio. These results suggest that the reverse may be true as well: tissues in which there is a correlation between functional significance and odds ratio are more likely to play a role in that disease. While the statistical associations are not very strong, the false discovery rates of 0.025 to 0.205 indicate that the majority of them are not spurious. We plan to validate these findings by performing the analysis on additional eQTL and GWAS datasets. Ultimately, eSNP-gene relationships in the tissue of interest may reveal novel candidate disease SNPs, providing additional clues to the pathogenesis of the disease. For example, rs7698608, a SNP with p = 0.0005 for type 2 diabetes, is associated with the expression of CISD2 in the liver. CISD2 is a causative gene for Wolfram syndrome type 2, and a glucose intolerance phenotype has been observed in CISD2−/− C57BL/6 mice (20).
In conclusion, analyzing the correlation between tissue eQTL significance and disease odds ratio may provide another layer of tissue-specific information that can be used to decipher GWAS results. To facilitate this, more studies of tissue-specific eQTLs must be performed and the data shared with the scientific community. Many researchers have been generous with their data; however, the trend so far is to share mainly the p-values for the SNPs, and most of the data sets have not been collected in a centralized location. Providing the information equivalent to the odds ratio in GWAS (the direction and magnitude of effect on gene expression) and submitting it to a centralized database such as the GTEx (Genotype-Tissue Expression) eQTL Browser (http://www.ncbi.nlm.nih.gov/gtex/test/GTEX2/gtex.cgi) will help other investigators fully leverage the power of these data.
We thank Alex Skrenchuk and Mike Seda from Stanford University for computer support. This work was supported by Lucile Packard Foundation for Children's Health, the Hewlett Packard Foundation, and the National Library of Medicine (R01 LM009719 and T15 LM007033). This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113.