Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.
When interpreting genome-wide association studies showing that specific genetic variants are associated with disease risk, scientists look for a link between the genetic variant and a biological mechanism behind that disease. One functional mechanism is that the genetic variant may influence gene transcription via a co-localized genomic regulatory element, such as a transcription factor binding site within an open chromatin region. Often this type of regulation occurs in some cell types but not others. In this study, we look across eleven gene expression studies with seven cell types and consider how genetic transcription regulators, or eQTLs, replicate within and between cell types. We identify pervasive allelic heterogeneity, or transcriptional control of a single gene by multiple, independent eQTLs. We integrate extensive data on cell type specific regulatory elements from ENCODE to identify general methods of transcription regulation through enrichment of eQTLs within regulatory elements. We also build a classifier to predict eQTL replication across cell types. The results in this paper present a path to an integrative, predictive approach to improve our ability to understand the mechanistic basis of human phenotypic variation.
A large fraction of human genes are regulated by genetic variation near the transcribed sequence (cis-eQTL, expression quantitative trait locus), and many cis-eQTLs have implications for human disease. Less is known regarding the effects of genetic variation on expression of distant genes (trans-eQTLs) and their biological mechanisms. In this work, we use genome-wide data on SNPs and array-based expression measures from mononuclear cells obtained from a population-based cohort of 1,799 Bangladeshi individuals to characterize cis- and trans-eQTLs and determine if observed trans-eQTL associations are mediated by expression of transcripts in cis with the SNPs showing trans-association, using Sobel tests of mediation. We observed 434 independent trans-eQTL associations at a false-discovery rate of 0.05, and 189 of these trans-eQTLs were also cis-eQTLs (enrichment P<0.0001). Among these 189 trans-eQTL associations, 39 were significantly attenuated after adjusting for a cis-mediator based on Sobel P<10-5. We attempted to replicate 21 of these mediation signals in two European cohorts, and while only 7 trans-eQTL associations were present in one or both cohorts, 6 showed evidence of cis-mediation. Analyses of simulated data show that complete mediation will be observed as partial mediation in the presence of mediator measurement error or imperfect LD between measured and causal variants. Our data demonstrates that trans-associations can become significantly stronger or switch directions after adjusting for a potential mediator. Using simulated data, we demonstrate that this phenomenon is expected in the presence of strong cis-trans confounding and when the measured cis-transcript is correlated with the true (unmeasured) mediator. In conclusion, by applying mediation analysis to eQTL data, we show that a substantial fraction of observed trans-eQTL associations can be explained by cis-mediation. Future studies should focus on understanding the mechanisms underlying widespread cis-mediation and their relevance to disease biology, as well as using mediation analysis to improve eQTL discovery.
Expression quantitative trait locus (eQTL) studies have demonstrated that human genes can be regulated by genetic variation residing close to the gene (cis-eQTLs) or in a distant region or on a different chromosome (trans-eQTLs). While cis-eQTL variants are likely to affect transcription factor binding or chromatin structure, our understanding of the mechanisms underlying trans-eQTLs is incomplete. We hypothesize that a substantial fraction of trans-eQTLs influence expression of distant genes through mediation by expression levels of a cis-transcript. In this paper, we use genome-wide SNPs and expression data for 1,799 South Asians to identify cis- and trans-eQTLs and to test our hypothesis using Sobel tests of mediation. Among 189 observed trans-eQTL associations, we provide evidence of cis-mediation for 39, 6 of which show mediation in an independent European cohort. We used simulated data to demonstrate that complete mediation will be observed as partial mediation in the presence of mediator measurement error or imperfect LD between measured and causal variants. We also demonstrate how unobserved confounding variables and incorrect mediator selection can bias mediation estimates. In conclusion, we have identified cis-mediators for many trans-eQTLs and described a mediation analysis approach that can be used to validate, characterize, and enhance discovery of trans-eQTLs.
Most genome-wide association studies consider genes that are located closest to single nucleotide polymorphisms (SNPs) that are highly significant for those studies. However, the significance of the associations between SNPs and candidate genes has not been fully determined. An alternative approach that used SNPs in expression quantitative trait loci (eQTL) was reported previously for Crohn’s disease; it was shown that eQTL-based preselection for follow-up studies was a useful approach for identifying risk loci from the results of moderately sized GWAS. In this study, we propose an approach that uses eQTL SNPs to support the functional relationships between an SNP and a candidate gene in a genome-wide association study. The genome-wide SNP genotypes and 10 biochemical measures (fasting glucose levels, BUN, serum albumin levels, AST, ALT, gamma GTP, total cholesterol, HDL cholesterol, triglycerides, and LDL cholesterol) were obtained from the Korean Association Resource (KARE) consortium. The eQTL SNPs were isolated from the SNP dataset based on the RegulomeDB eQTL-SNP data from the ENCODE projects and two recent eQTL reports. A total of 25,658 eQTL SNPs were tested for their association with the 10 metabolic traits in 2 Korean populations (Ansung and Ansan). The proportion of phenotypic variance explained by eQTL and non-eQTL SNPs showed that eQTL SNPs were more likely to be associated with the metabolic traits genetically compared with non-eQTL SNPs. Finally, via a meta-analysis of the two Korean populations, we identified 14 eQTL SNPs that were significantly associated with metabolic traits. These results suggest that our approach can be expanded to other genome-wide association studies.
Motivation: Gene expression is influenced by variants commonly known as expression quantitative trait loci (eQTL). On the basis of this fact, researchers proposed to use eQTL/functional information univariately for prioritizing single nucleotide polymorphisms (SNPs) signals from genome-wide association studies (GWAS). However, most genes are influenced by multiple eQTLs which, thus, jointly affect any downstream phenotype. Therefore, when compared with the univariate prioritization approach, a joint modeling of eQTL action on phenotypes has the potential to substantially increase signal detection power. Nonetheless, a joint eQTL analysis is impeded by (i) not measuring all eQTLs in a gene and/or (ii) lack of access to individual genotypes.
Results: We propose joint effect on phenotype of eQTL/functional SNPs associated with a gene (JEPEG), a novel software tool which uses only GWAS summary statistics to (i) impute the summary statistics at unmeasured eQTLs and (ii) test for the joint effect of all measured and imputed eQTLs in a gene. We illustrate the behavior/performance of the developed tool by analysing the GWAS meta-analysis summary statistics from the Psychiatric Genomics Consortium Stage 1 and the Genetic Consortium for Anorexia Nervosa.
Conclusions: Applied analyses results suggest that JEPEG complements commonly used univariate GWAS tools by: (i) increasing signal detection power via uncovering (a) novel genes or (b) known associated genes in smaller cohorts and (ii) assisting in fine-mapping of challenging regions, e.g. major histocompatibility complex for schizophrenia.
Availability and implementation: JEPEG, its associated database of eQTL SNPs and usage examples are publicly available at http://code.google.com/p/jepeg/.
Supplementary data are available at Bioinformatics online.
Mapping expression quantitative trait loci (eQTL) has identified genetic variants associated with transcription rates and has provided insight into genotype–phenotype associations obtained from genome-wide association studies (GWAS). Traditional eQTL mapping methods present significant challenges for the multiple-testing burden, resulting in a limited ability to detect eQTL that reside distal to the affected gene. To overcome this, we developed a novel eQTL testing approach, “network-based, large-scale identification of distal eQTL” (NetLIFT), which performs eQTL testing based on the pairwise conditional dependencies between genes’ expression levels. When applied to existing data from yeast segregants, NetLIFT replicated most previously identified distal eQTL and identified 46% more genes with distal effects compared to local effects. In liver data from mouse lines derived through the Collaborative Cross project, NetLIFT detected 5744 genes with local eQTL while 3322 genes had distal eQTL. This analysis revealed founder-of-origin effects for a subset of local eQTL that may contribute to previously described phenotypic differences in metabolic traits. In human lymphoblastoid cell lines, NetLIFT was able to detect 1274 transcripts with distal eQTL that had not been reported in previous studies, while 2483 transcripts with local eQTL were identified. In all species, we found no enrichment for transcription factors facilitating eQTL associations; instead, we found that most trans-acting factors were annotated for metabolic function, suggesting that genetic variation may indirectly regulate multigene pathways by targeting key components of feedback processes within regulatory networks. Furthermore, the unique genetic history of each population appears to influence the detection of genes with local and distal eQTL.
eQTL; gene expression; gene networks; genetical genomics
Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
Genome-wide association studies (GWAS) have found a large number of genetic regions (“loci”) affecting clinical end-points and phenotypes, many outside coding intervals. One approach to understanding the biological basis of these associations has been to explore whether GWAS signals from intermediate cellular phenotypes, in particular gene expression, are located in the same loci (“colocalise”) and are potentially mediating the disease signals. However, it is not clear how to assess whether the same variants are responsible for the two GWAS signals or whether it is distinct causal variants close to each other. In this paper, we describe a statistical method that can use simply single variant summary statistics to test for colocalisation of GWAS signals. We describe one application of our method to a meta-analysis of blood lipids and liver expression, although any two datasets resulting from association studies can be used. Our method is able to detect the subset of GWAS signals explained by regulatory effects and identify candidate genes affected by the same GWAS variants. As summary GWAS data are increasingly available, applications of colocalisation methods to integrate the findings will be essential for functional follow-up, and will also be particularly useful to identify tissue specific signals in eQTL datasets.
Gene-based analysis has become popular in genomic research because of its appealing biological and statistical properties compared with those of a single-locus analysis. However, only a few, if any, studies have discussed a mapping of expression quantitative trait loci (eQTL) in a gene-based framework. Neither study has discussed ancestry-informative eQTL nor investigated their roles in pharmacogenetics by integrating single nucleotide polymorphism (SNP)-based eQTL (s-eQTL) and gene-based eQTL (g-eQTL).
In this g-eQTL mapping study, the transcript expression levels of genes (transcript-level genes; T-genes) were correlated with the SNPs of genes (sequence-level genes; S-genes) by using a method of gene-based partial least squares (PLS). Ancestry-informative transcripts were identified using a rank-score-based multivariate association test, and ancestry-informative eQTL were identified using Fisher’s exact test. Furthermore, key ancestry-predictive eQTL were selected in a flexible discriminant analysis. We analyzed SNPs and gene expression of 210 independent people of African-, Asian- and European-descent. We identified numerous cis- and trans-acting g-eQTL and s-eQTL for each population by using PLS. We observed ancestry information enriched in eQTL. Furthermore, we identified 2 ancestry-informative eQTL associated with adverse drug reactions and/or drug response. Rs1045642, located on MDR1, is an ancestry-informative eQTL (P = 2.13E-13, using Fisher’s exact test) associated with adverse drug reactions to amitriptyline and nortriptyline and drug responses to morphine. Rs20455, located in KIF6, is an ancestry-informative eQTL (P = 2.76E-23, using Fisher’s exact test) associated with the response to statin drugs (e.g., pravastatin and atorvastatin). The ancestry-informative eQTL of drug biotransformation genes were also observed; cross-population cis-acting expression regulators included SPG7, TAP2, SLC7A7, and CYP4F2. Finally, we also identified key ancestry-predictive eQTL and established classification models with promising training and testing accuracies in separating samples from close populations.
In summary, we developed a gene-based PLS procedure and a SAS macro for identifying g-eQTL and s-eQTL. We established data archives of eQTL for global populations. The program and data archives are accessible at http://www.stat.sinica.edu.tw/hsinchou/genetics/eQTL/HapMapII.htm. Finally, the results from our investigations regarding the interrelationship between eQTL, ancestry information, and pharmacodynamics provide rich resources for future eQTL studies and practical applications in population genetics and medical genetics.
Gene-based approach; Expression quantitative trait locus (eQTL); Partial least squares (PLS); Ancestry-informative marker (AIM); Pharmacogenetics; Adverse drug reaction; Drug response; Drug biotransformation
Genome-wide association studies (GWASs) have uncovered susceptibility loci for a large number of complex traits. Functional interpretation of candidate genes identified by GWAS and confident assignment of the causal variant still remains a major challenge. Expression quantitative trait (eQTL) mapping has facilitated identification of risk loci for quantitative traits and might allow prioritization of GWAS candidate genes. One major challenge of eQTL studies is the need for larger sample numbers and replication. The aim of this study was to evaluate the robustness and reproducibility of whole-blood eQTLs in humans and test their value in the identification of putative functional variants involved in the etiology of complex traits. In the current study, we performed comphrehensive eQTL mapping from whole blood. The discovery sample included 322 Caucasians from a general population sample (KORA F3). We identified 363 cis and 8 trans eQTLs after stringent Bonferroni correction for multiple testing. Of these, 98.6% and 50% of cis and trans eQTLs, respectively, could be replicated in two independent populations (KORA F4 (n=740) and SHIP-TREND (n=653)). Furthermore, we identified evidence of regulatory variation for SNPs previously reported to be associated with disease loci (n=59) or quantitative trait loci (n=20), indicating a possible functional mechanism for these eSNPs. Our data demonstrate that eQTLs in whole blood are highly robust and reproducible across studies and highlight the relevance of whole-blood eQTL mapping in prioritization of GWAS candidate genes in humans.
gene expression; eQTL; GWAS; whole blood
Amyotrophic lateral sclerosis (ALS) is a progressive, neurodegenerative disease characterized by loss of upper and lower motor neurons. ALS is considered to be a complex trait and genome-wide association studies (GWAS) have implicated a few susceptibility loci. However, many more causal loci remain to be discovered. Since it has been shown that genetic variants associated with complex traits are more likely to be eQTLs than frequency-matched variants from GWAS platforms, we conducted a two-stage genome-wide screening for eQTLs associated with ALS. In addition, we applied an eQTL analysis to finemap association loci. Expression profiles using peripheral blood of 323 sporadic ALS patients and 413 controls were mapped to genome-wide genotyping data. Subsequently, data from a two-stage GWAS (3,568 patients and 10,163 controls) were used to prioritize eQTLs identified in the first stage (162 ALS, 207 controls). These prioritized eQTLs were carried forward to the second sample with both gene-expression and genotyping data (161 ALS, 206 controls). Replicated eQTL SNPs were then tested for association in the second-stage GWAS data to find SNPs associated with disease, that survived correction for multiple testing. We thus identified twelve cis eQTLs with nominally significant associations in the second-stage GWAS data. Eight SNP-transcript pairs of highest significance (lowest p = 1.27×10−51) withstood multiple-testing correction in the second stage and modulated CYP27A1 gene expression. Additionally, we show that C9orf72 appears to be the only gene in the 9p21.2 locus that is regulated in cis, showing the potential of this approach in identifying causative genes in association loci in ALS. This study has identified candidate genes for sporadic ALS, most notably CYP27A1. Mutations in CYP27A1 are causal to cerebrotendinous xanthomatosis which can present as a clinical mimic of ALS with progressive upper motor neuron loss, making it a plausible susceptibility gene for ALS.
Restless legs syndrome (RLS) is a common neurologic disorder characterized by nightly dysesthesias affecting the legs primarily during periods of rest and relieved by movement. RLS is a complex genetic disease and susceptibility factors in six genomic regions have been identified by means of genome-wide association studies (GWAS). For some complex genetic traits, expression quantitative trait loci (eQTLs) are enriched among trait-associated single nucleotide polymorphisms (SNPs). With the aim of identifying new genetic susceptibility factors for RLS, we assessed the 332 best-associated SNPs from the genome-wide phase of the to date largest RLS GWAS for cis-eQTL effects in peripheral blood from individuals of European descent. In 740 individuals belonging to the KORA general population cohort, 52 cis-eQTLs with pnominal<10−3 were identified, while in 976 individuals belonging to the SHIP-TREND general population study 53 cis-eQTLs with pnominal<10−3 were present. 23 of these cis-eQTLs overlapped between the two cohorts. Subsequently, the twelve of the 23 cis-eQTL SNPs, which were not located at an already published RLS-associated locus, were tested for association in 2449 RLS cases and 1462 controls. The top SNP, located in the DET1 gene, was nominally significant (p<0.05) but did not withstand correction for multiple testing (p = 0.42). Although a similar approach has been used successfully with regard to other complex diseases, we were unable to identify new genetic susceptibility factor for RLS by adding this novel level of functional assessment to RLS GWAS data.
While genome-wide association studies (GWASs) have been successful in identifying novel variants associated with various diseases, it has been much more difficult to determine the biological mechanisms underlying these associations. Expression quantitative trait loci (eQTL) provide another dimension to these data by associating single nucleotide polymorphisms (SNPs) with gene expression. We hypothesised that integrating SNPs known to be associated with type 2 diabetes with eQTLs and coexpression networks would enable the discovery of novel candidate genes for type 2 diabetes.
We selected 32 SNPs associated with type 2 diabetes in two or more independent GWASs. We used previously described eQTLs mapped from genotype and gene expression data collected from 1,008 morbidly obese patients to find genes with expression associated with these SNPs. We linked these genes to coexpression modules, and ranked the other genes in these modules using an inverse sum score.
We found 62 genes with expression associated with type 2 diabetes SNPs. We validated our method by linking highly ranked genes in the coexpression modules back to SNPs through a combined eQTL dataset. We showed that the eQTLs highlighted by this method are significantly enriched for association with type 2 diabetes in data from the Wellcome Trust Case Control Consortium (WTCCC, p = 0.026) and the Gene Environment Association Studies (GENEVA, p = 0.042), validating our approach. Many of the highly ranked genes are also involved in the regulation or metabolism of insulin, glucose or lipids.
We have devised a novel method, involving the integration of datasets of different modalities, to discover novel candidate genes for type 2 diabetes.
Genetics of type 2 diabetes; Genomics/proteomics; Mathematical modelling and simulation
Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Although recent genome-wide association studies (GWAS) have contributed to discovery of SLE susceptibility genes, few studies has been performed in Asian populations. Here, we report a GWAS for SLE examining 891 SLE cases and 3,384 controls and multi-stage replication studies examining 1,387 SLE cases and 28,564 controls in Japanese subjects. Considering that expression quantitative trait loci (eQTLs) have been implicated in genetic risks for autoimmune diseases, we integrated an eQTL study into the results of the GWAS. We observed enrichments of cis-eQTL positive loci among the known SLE susceptibility loci (30.8%) compared to the genome-wide SNPs (6.9%). In addition, we identified a novel association of a variant in the AF4/FMR2 family, member 1 (AFF1) gene at 4q21 with SLE susceptibility (rs340630; P = 8.3×10−9, odds ratio = 1.21). The risk A allele of rs340630 demonstrated a cis-eQTL effect on the AFF1 transcript with enhanced expression levels (P<0.05). As AFF1 transcripts were prominently expressed in CD4+ and CD19+ peripheral blood lymphocytes, up-regulation of AFF1 may cause the abnormality in these lymphocytes, leading to disease onset.
Although recent genome-wide association study (GWAS) approaches have successfully contributed to disease gene discovery, many susceptibility loci are known to be still uncaptured due to strict significance threshold for multiple hypothesis testing. Therefore, prioritization of GWAS results by incorporating additional information is recommended. Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Considering that abnormalities in B cell activity play essential roles in SLE, prioritization based on an expression quantitative trait loci (eQTLs) study for B cells would be a promising approach. In this study, we report a GWAS and multi-stage replication studies for SLE examining 2,278 SLE cases and 31,948 controls in Japanese subjects. We integrated eQTL study into the results of the GWAS and identified AFF1 as a novel SLE susceptibility loci. We also confirmed cis-regulatory effect of the locus on the AFF1 transcript. Our study would be one of the initial successes for detecting novel genetic locus using the eQTL study, and it should contribute to our understanding of the genetic loci being uncaptured by standard GWAS approaches.
Elucidating the genetic basis underlying hepatic gene expression variability is of importance to understand the aetiology of the disease and variation in drug metabolism. To date, no genome-wide expression quantitative trait loci (eQTLs) analysis has been conducted in the Han Chinese population, the largest ethnic group in the world.
We performed a genome-wide eQTL mapping in a set of Han Chinese liver tissue samples (n=64). The data were then compared with published eQTL data from a Caucasian population. We then performed correlations between these eQTLs with important pharmacogenes, and genome-wide association study (GWAS) identified single nucleotide polymorphisms (SNPs), in particular those identified in the Asian population.
Our analyses identified 1669 significant eQTLs (false discovery rate (FDR) < 0.05). We found that 41% of Asian eQTLs were also eQTLs in Caucasians at the genome-wide significance level (p=10−8). Both cis- and trans-eQTLs in the Asian population were also more likely to be eQTLs in Caucasians (p<10−4). Enrichment analyses revealed that trait-associated GWAS-SNPs were enriched within the eQTLs identified in our data, so were the GWAS-SNPs specifically identified in Asian populations in a separate analysis (p<0.001 for both). We also found that hepatic expression of very important pharmacogenetic (VIP) genes (n=44) and a manually curated list of major genes involved in pharmacokinetics (n=341) were both more likely to be controlled by eQTLs (p<0.002 for both).
Our study provided, for the first time, a comprehensive hepatic eQTL analysis in a non-European population, further generating valuable data for characterising the genetic basis of human diseases and pharmacogenetic traits.
Clinical genetics; Genetics; Genome-wide; Molecular genetics
Genome-wide association studies (GWAS) have identified loci reproducibly associated with pulmonary diseases; however, the molecular mechanism underlying these associations are largely unknown. The objectives of this study were to discover genetic variants affecting gene expression in human lung tissue, to refine susceptibility loci for asthma identified in GWAS studies, and to use the genetics of gene expression and network analyses to find key molecular drivers of asthma. We performed a genome-wide search for expression quantitative trait loci (eQTL) in 1,111 human lung samples. The lung eQTL dataset was then used to inform asthma genetic studies reported in the literature. The top ranked lung eQTLs were integrated with the GWAS on asthma reported by the GABRIEL consortium to generate a Bayesian gene expression network for discovery of novel molecular pathways underpinning asthma. We detected 17,178 cis- and 593 trans- lung eQTLs, which can be used to explore the functional consequences of loci associated with lung diseases and traits. Some strong eQTLs are also asthma susceptibility loci. For example, rs3859192 on chr17q21 is robustly associated with the mRNA levels of GSDMA (P = 3.55×10−151). The genetic-gene expression network identified the SOCS3 pathway as one of the key drivers of asthma. The eQTLs and gene networks identified in this study are powerful tools for elucidating the causal mechanisms underlying pulmonary disease. This data resource offers much-needed support to pinpoint the causal genes and characterize the molecular function of gene variants associated with lung diseases.
Recent genome-wide association studies (GWAS) have identified genetic variants associated with lung diseases. The challenge now is to find the causal genes in GWAS–nominated chromosomal regions and to characterize the molecular function of disease-associated genetic variants. In this paper, we describe an international effort to systematically capture the genetic architecture of gene expression regulation in human lung. By studying lung specimens from 1,111 individuals of European ancestry, we found a large number of genetic variants affecting gene expression in the lung, or lung expression quantitative trait loci (eQTL). These lung eQTLs will serve as an important resource to aid in the understanding of the molecular underpinnings of lung biology and its disruption in disease. To demonstrate the utility of this lung eQTL dataset, we integrated our data with previous genetic studies on asthma. Through integrative techniques, we identified causal variants and genes in GWAS–nominated loci and found key molecular drivers for asthma. We feel that sharing our lung eQTLs dataset with the scientific community will leverage the impact of previous large-scale GWAS on lung diseases and function by providing much needed functional information to understand the molecular changes introduced by the susceptibility genetic variants.
There is genetic evidence that schizophrenia is a polygenic disorder with a large number of loci of small effect on disease susceptibility. Genome-wide association studies (GWASs) of schizophrenia have had limited success, with the best finding at the MHC locus at chromosome 6p. A recent effort of the Psychiatric GWAS consortium (PGC) yielded five novel loci for schizophrenia. In this study, we aim to highlight additional schizophrenia susceptibility loci from the PGC study by combining the top association findings from the discovery stage (9394 schizophrenia cases and 12 462 controls) with expression QTLs (eQTLs) and differential gene expression in whole blood of schizophrenia patients and controls. We examined the 6192 single-nucleotide polymorphisms (SNPs) with significance threshold at P<0.001. eQTLs were calculated for these SNPs in a sample of healthy controls (n=437). The transcripts significantly regulated by the top SNPs from the GWAS meta-analysis were subsequently tested for differential expression in an independent set of schizophrenia cases and controls (n=202). After correction for multiple testing, the eQTL analysis yielded 40 significant cis-acting effects of the SNPs. Seven of these transcripts show differential expression between cases and controls. Of these, the effect of three genes (RNF5, TRIM26 and HLA-DRB3) coincided with the direction expected from meta-analysis findings and were all located within the MHC region. Our results identify new genes of interest and highlight again the involvement of the MHC region in schizophrenia susceptibility.
expression; schizophrenia; eQTL; MHC; SNP
The discovery of expression quantitative trait loci (“eQTLs”) can
help to unravel genetic contributions to complex traits. We identified genetic
determinants of human liver gene expression variation using two independent
collections of primary tissue profiled with Agilent
(n = 206) and Illumina (n = 60)
expression arrays and Illumina SNP genotyping (550K), and we also incorporated
data from a published study (n = 266). We found that
∼30% of SNP-expression correlations in one study failed to replicate
in either of the others, even at thresholds yielding high reproducibility in
simulations, and we quantified numerous factors affecting reproducibility. Our
data suggest that drug exposure, clinical descriptors, and unknown factors
associated with tissue ascertainment and analysis have substantial effects on
gene expression and that controlling for hidden confounding variables
significantly increases replication rate. Furthermore, we found that
reproducible eQTL SNPs were heavily enriched near gene starts and ends, and
subsequently resequenced the promoters and 3′UTRs for 14 genes and tested
the identified haplotypes using luciferase assays. For three genes, significant
haplotype-specific in vitro functional differences correlated
directly with expression levels, suggesting that many bona fide
eQTLs result from functional variants that can be mechanistically isolated in a
high-throughput fashion. Finally, given our study design, we were able to
discover and validate hundreds of liver eQTLs. Many of these relate directly to
complex traits for which liver-specific analyses are likely to be relevant, and
we identified dozens of potential connections with disease-associated loci.
These included previously characterized eQTL contributors to diabetes, drug
response, and lipid levels, and they suggest novel candidates such as a role for
NOD2 expression in leprosy risk and
C2orf43 in prostate cancer. In general, the work presented
here will be valuable for future efforts to precisely identify and functionally
characterize genetic contributions to a variety of complex traits.
Many disease-associated genetic variants do not alter protein sequences and are
difficult to precisely identify. Discovery of expression quantitative trait loci
(eQTL), or correlations between genetic variants and gene expression levels,
offers one means of addressing this challenge. However, eQTL studies in primary
cells have several shortcomings. In particular, their reproducibility is largely
unknown, the variables that generate unreliable associations are
uncharacterized, and the resolution of their findings is constrained by linkage
disequilibrium. We performed a three-way replication study of eQTLs in primary
human livers. We demonstrated that ∼67% of cis-eQTL associations are
replicated in an independent study and that known polymorphisms overlapping
expression probes, SNP-to-gene distance, and unmeasured confounding variables
all influence the replication rate. We fine-mapped 14 eQTLs and identified
causative polymorphisms in the promoter or 3′UTR for 3 genes, suggesting
that a considerable fraction of eQTLs are driven by proximal variants that are
amenable to functional isolation. Finally, we found hundreds of overlaps between
SNPs associated with complex traits and replicated eQTL SNPs. Our data provide
both cautionary (i.e. non-reproducibility of many strong eQTLs)
and optimistic (i.e. precise identification of functional
non-coding variants) forecasts for future eQTL analyses and the complex traits
that they influence.
Although genome-wide association studies (GWAS) of complex traits have yielded more reproducible associations than had been discovered using any other approach, the loci characterized to date do not account for much of the heritability to such traits and, in general, have not led to improved understanding of the biology underlying complex phenotypes. Using a web site we developed to serve results of expression quantitative trait locus (eQTL) studies in lymphoblastoid cell lines from HapMap samples (http://www.scandb.org), we show that single nucleotide polymorphisms (SNPs) associated with complex traits (from http://www.genome.gov/gwastudies/) are significantly more likely to be eQTLs than minor-allele-frequency–matched SNPs chosen from high-throughput GWAS platforms. These findings are robust across a range of thresholds for establishing eQTLs (p-values from 10−4–10−8), and a broad spectrum of human complex traits. Analyses of GWAS data from the Wellcome Trust studies confirm that annotating SNPs with a score reflecting the strength of the evidence that the SNP is an eQTL can improve the ability to discover true associations and clarify the nature of the mechanism driving the associations. Our results showing that trait-associated SNPs are more likely to be eQTLs and that application of this information can enhance discovery of trait-associated SNPs for complex phenotypes raise the possibility that we can utilize this information both to increase the heritability explained by identifiable genetic factors and to gain a better understanding of the biology underlying complex traits.
We show here that single nucleotide polymorphisms (SNPs) associated with complex traits (as identified in the catalog of results from genome-wide association studies http://www.genome.gov/gwastudies/) are more likely than other SNPs chosen from high-throughput genotyping platforms to predict expression levels of genes. These observations confirm that genetic risk factors for complex traits will often affect phenotype by altering the amount or timing of protein production, rather than by changing the type of protein produced. This knowledge can be used to improve our ability to discover genetic risk factors for complex traits and to improve our understanding of their underlying biology.
Genome-wide gene expression profiling has been extensively used to generate biological hypotheses based on differential expression. Recently, many studies have used microarrays to measure gene expression levels across genetic mapping populations. These gene expression phenotypes have been used for genome-wide association analyses, an analysis referred to as expression QTL (eQTL) mapping. Here, eQTL analysis was performed in adipose tissue from 28 inbred strains of mice. We focused our analysis on “trans-eQTL bands”, defined as instances in which the expression patterns of many genes were all associated to a common genetic locus. Genes comprising trans-eQTL bands were screened for enrichments in functional gene sets representing known biological pathways, and genes located at associated trans-eQTL band loci were considered candidate transcriptional modulators. We demonstrate that these patterns were enriched for previously characterized relationships between known upstream transcriptional regulators and their downstream target genes. Moreover, we used this strategy to identify both novel regulators and novel members of known pathways. Finally, based on a putative regulatory relationship identified in our analysis, we identified and validated a previously uncharacterized role for cyclin H in the regulation of oxidative phosphorylation. We believe that the specific molecular hypotheses generated in this study will reveal many additional pathway members and regulators, and that the analysis approaches described herein will be broadly applicable to other eQTL data sets.
Genome-wide association (GWA) analyses seek to relate variation of phenotype to underlying (and presumably causative) variation in genotype. Recently, many GWA studies have identified candidate genes underlying disease phenotypes such as diabetes, heart disease, and cancer risk. Many groups have also performed GWA using variation in gene expression levels as the input phenotype. These expression QTL (eQTL) studies have provided important clues as to the genetic basis of gene expression regulation. Here, we perform an eQTL study in mouse adipose tissue. We then developed a systematic analysis method to relate these patterns of eQTL associations to biological pathways. Based on this approach, we identified putative roles for thousands of candidate upstream regulators and candidate pathway members in relation to specific biological pathways. Statistical analysis showed that these predictions were highly enriched for true genetic modulators of these pathways. Based on these predictions, we also experimentally validated a role for one particular gene, cyclin H, in the regulation of oxidative phosphorylation. These findings illustrate a new analysis method for relating eQTL studies to biological pathways and identify cyclin H as a novel key regulator of cellular energy metabolism.
DNA sequence variation causes changes in gene expression, which in turn has profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an “expression quantitative trait locus” (eQTL). Whereas the impact of cellular context on expression levels in general is well established, a lot less is known about the cell-state specificity of eQTL. Previous studies differed with respect to how “dynamic eQTL” were defined. Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. Further, we introduce a new approach to simultaneously infer eQTL from different cell types. By using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total number of 3,941 eQTL we detected 2,729 static eQTL, 1,187 eQTL were conditionally active in one or several cell types, and 70 eQTL affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms reverting the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, thus, demonstrating the importance of conditional eQTL for understanding molecular mechanisms underlying physiological trait variation. The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL as well as many other kinds of QTL data.
Complex physiological traits are affected through subtle changes of molecular traits like gene expression in the relevant tissues, which in turn are caused by genetic variation. A genetic locus containing a sequence variation affecting gene expression is called an expression quantitative trait locus (eQTL). Understanding the tissue and cell type specificity of eQTL effects is essential for revealing the molecular mechanisms underlying disease phenotypes. However, so far the cell-state dependence of eQTL is poorly understood. In order to systematically assess the importance of cell state-specific eQTL, we propose to distinguish static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. We applied our framework to mouse gene expression data from four hematopoietic stages and related cellular traits. The different eQTL classes, although derived from the same expression data, represent functionally distinct types of eQTL. Importantly, conditional eQTL are well correlated with relevant hematological traits. These findings emphasize the condition specificity of many regulatory relationships, even if the conditions under study are related. This calls for due caution when transferring conclusions about regulatory mechanisms across cell types or tissues. The proposed classification will also help to unravel dynamic behaviors in many other kinds of QTL data.
Interactions among genomic loci (also known as epistasis) have been suggested as one of the potential sources of missing heritability in single locus analysis of genome-wide association studies (GWAS). The computational burden of searching for interactions is compounded by the extremely low threshold for identifying significant p-values due to multiple hypothesis testing corrections. Utilizing prior biological knowledge to restrict the set of candidate SNP pairs to be tested can alleviate this problem, but systematic studies that investigate the relative merits of integrating different biological frameworks and GWAS data have not been conducted.
We developed four biologically based frameworks to identify pairwise interactions among candidate SNP pairs as follows: (1) for each human protein-coding gene, a set of SNPs associated with that gene was constructed providing a gene-based interaction model, (2) for each known biological pathway, a set of SNPs associated with the genes in the pathway was constructed providing a pathway-based interaction model, (3) a set of SNPs associated with genes in a disease-related subnetwork provides a network-based interaction model, and (4) a framework is based on the function of SNPs. The last approach uses expression SNPs (eSNPs or eQTLs), which are SNPs or loci that have defined effects on the abundance of transcripts of other genes. We constructed pairs of eSNPs and SNPs located in the target genes whose expression is regulated by eSNPs. For all four frameworks the SNP sets were exhaustively tested for pairwise interactions within the sets using a traditional logistic regression model after excluding genes that were previously identified to associate with the trait. Using previously published GWAS data for type 2 diabetes (T2D) and the biologically based pair-wise interaction modeling, we identify twelve genes not seen in the previous single locus analysis.
We present four approaches to detect interactions associated with complex diseases. The results show our approaches outperform the traditional single locus approaches in detecting genes that previously did not reach significance; the results also provide novel drug targets and biomarkers relevant to the underlying mechanisms of disease.
Recently it has become clear that only a small percentage (7%) of disease-associated single nucleotide polymorphisms (SNPs) are located in protein-coding regions, while the remaining 93% are located in gene regulatory regions or in intergenic regions. Thus, the understanding of how genetic variations control the expression of non-coding RNAs (in a tissue-dependent manner) has far-reaching implications. We tested the association of SNPs with expression levels (eQTLs) of large intergenic non-coding RNAs (lincRNAs), using genome-wide gene expression and genotype data from five different tissues. We identified 112 cis-regulated lincRNAs, of which 45% could be replicated in an independent dataset. We observed that 75% of the SNPs affecting lincRNA expression (lincRNA cis-eQTLs) were specific to lincRNA alone and did not affect the expression of neighboring protein-coding genes. We show that this specific genotype-lincRNA expression correlation is tissue-dependent and that many of these lincRNA cis-eQTL SNPs are also associated with complex traits and diseases.
Large intergenic non-coding RNAs (lincRNAs) are the largest class of non-coding RNA molecules in the human genome. Many genome-wide association studies (GWAS) have mapped disease-associated genetic variants (SNPs) to, or in, the vicinity of such lincRNA regions. However, it is not clear how these SNPs can affect the disease. We tested whether SNPs were also associated with the lincRNA expression levels in five different human primary tissues. We observed that there is a strong genotype-lincRNA expression correlation that is tissue-dependent. Many of the observed lincRNA cis-eQTLs are disease- or trait-associated SNPs. Our results suggest that lincRNA-eQTLs represent a novel link between non-coding SNPs and the expression of protein-coding genes, which can be exploited to understand the process of gene-regulation through lincRNAs in more detail.
Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with diseases of the colon including inflammatory bowel diseases (IBD) and colorectal cancer (CRC). However, the functional role of many of these SNPs is largely unknown and tissue-specific resources are lacking. Expression quantitative trait loci (eQTL) mapping identifies target genes of disease-associated SNPs. This study provides a comprehensive eQTL map of distal colonic samples obtained from 40 healthy African Americans and demonstrates their relevance for GWAS of colonic diseases.
8.4 million imputed SNPs were tested for their associations with 16,252 expression probes representing 12,363 unique genes. 1,941 significant cis-eQTL, corresponding to 122 independent signals, were identified at a false discovery rate (FDR) of 0.01. Overall, among colon cis-eQTL, there was significant enrichment for GWAS variants for IBD (Crohn’s disease [CD] and ulcerative colitis [UC]) and CRC as well as type 2 diabetes and body mass index. ERAP2, ADCY3, INPP5E, UBA7, SFMBT1, NXPE1 and REXO2 were identified as target genes for IBD-associated variants. The CRC-associated eQTL rs3802842 was associated with the expression of C11orf93 (COLCA2). Enrichment of colon eQTL near transcription start sites and for active histone marks was demonstrated, and eQTL with high population differentiation were identified.
Through the comprehensive study of eQTL in the human colon, this study identified novel target genes for IBD- and CRC-associated genetic variants. Moreover, bioinformatic characterization of colon eQTL provides a tissue-specific tool to improve understanding of biological differences in diseases between different ethnic groups.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1292-z) contains supplementary material, which is available to authorized users.
Expression quantitative trait loci; Colon; Gene expression; African Americans; Regulatory variation; Transcriptomics; Inflammatory bowel disease; Colorectal cancer; Genomics; Genome-wide association studies
Genome-wide association studies (GWAS) have identified numerous prostate cancer-associated risk loci. Some variants at these loci may be regulatory and influence expression of nearby genes. Such loci are known as cis-expression quantitative trait loci (cis-eQTL). As cis-eQTLs are highly tissue-specific, we asked if GWAS-identified prostate cancer risk loci are cis-eQTLs in human prostate tumor tissues. We investigated 50 prostate cancer samples for their genotype at 59 prostate cancer risk-associated single-nucleotide polymorphisms (SNPs) and performed cis-eQTL analysis of transcripts from paired primary tumors within two megabase windows. We tested 586 transcript–genotype associations, of which 27 were significant (false discovery rate ≤10%). An equivalent eQTL analysis of the same prostate cancer risk loci in lymphoblastoid cell lines did not result in any significant associations. The top-ranked cis-eQTL involved the IRX4 (Iroquois homeobox protein 4) transcript and rs12653946, tagged by rs10866528 in our study (P=4.91 × 10−5). Replication studies, linkage disequilibrium, and imputation analyses highlight population specificity at this locus. We independently validated IRX4 as a potential prostate cancer risk gene through cis-eQTL analysis of prostate cancer risk variants. Cis-eQTL analysis in relevant tissues, even with a small sample size, can be a powerful method to expedite functional follow-up of GWAS.
expression quantitative trait loci; eQTL; prostate cancer; GWAS; risk SNPs; IRX4
Increasing evidence suggests that single nucleotide polymorphisms (SNPs) associated with complex traits are more likely to be expression quantitative trait loci (eQTLs). Incorporating eQTL information hence has potential to increase power of genome-wide association studies (GWAS). In this paper, we propose using eQTL weights as prior information in SNP based association tests to improve test power while maintaining control of the family-wise error rate (FWER) or the false discovery rate (FDR). We apply the proposed methods to the analysis of a GWAS for childhood asthma consisting of 1296 unrelated individuals with German ancestry. The results confirm that eQTLs are enriched for previously reported asthma SNPs. We also find that some SNPs are insignificant using procedures without eQTL weighting, but become significant using eQTL-weighted Bonferroni or Benjamini–Hochberg procedures, while controlling the same FWER or FDR level. Some of these SNPs have been reported by independent studies in recent literature. The results suggest that the eQTL-weighted procedures provide a promising approach for improving power of GWAS. We also report the results of our methods applied to the large-scale European GABRIEL consortium data.
asthma; family-wise error rate; false discovery rate; eQTL; genome-wide association study; weighted hypothesis test
Current catalogs of brain expression quantitative trait loci (eQTL) are incomplete and the findings do not replicate well across studies. All existing cortical eQTL studies are small and emphasize the need for a meta-analysis. We performed a meta-analysis of 424 brain samples across five studies to identify regulatory variants influencing gene expression in human cortex. We identified 3584 genes in autosomes and chromosome X with false discovery rate q<0.05 whose expression was significantly associated with DNA sequence variation. Consistent with previous eQTL studies, local regulatory variants tended to occur symmetrically around transcription start sites and the effect was more evident in studies with large sample sizes. In contrast to random SNPs, we observed that significant eQTLs were more likely to be near 5'-untranslated regions and intersect with regulatory features. Permutation-based enrichment analysis revealed that SNPs associated with schizophrenia and bipolar disorder were enriched among brain eQTLs. Genes with significant eQTL evidence were also strongly associated with diseases from OMIM (Online Mendelian Inheritance in Man) and the NHGRI (National Human Genome Research Institute) genome-wide association study catalog. Surprisingly, we found that a large proportion (28%) of ~1000 autosomal genes encoding proteins needed for mitochondrial structure or function were eQTLs (enrichment P-value=1.3 × 10−9), suggesting a potential role for common genetic variation influencing the robustness of energy supply in brain and a possible role in the etiology of some psychiatric disorders. These systematically generated eQTL information should be a valuable resource in determining the functional mechanisms of brain gene expression and the underlying biology of associations with psychiatric disorders.