Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.
When interpreting genome-wide association studies showing that specific genetic variants are associated with disease risk, scientists look for a link between the genetic variant and a biological mechanism behind that disease. One functional mechanism is that the genetic variant may influence gene transcription via a co-localized genomic regulatory element, such as a transcription factor binding site within an open chromatin region. Often this type of regulation occurs in some cell types but not others. In this study, we look across eleven gene expression studies with seven cell types and consider how genetic transcription regulators, or eQTLs, replicate within and between cell types. We identify pervasive allelic heterogeneity, or transcriptional control of a single gene by multiple, independent eQTLs. We integrate extensive data on cell type specific regulatory elements from ENCODE to identify general methods of transcription regulation through enrichment of eQTLs within regulatory elements. We also build a classifier to predict eQTL replication across cell types. The results in this paper present a path to an integrative, predictive approach to improve our ability to understand the mechanistic basis of human phenotypic variation.
Genome-wide association studies (GWAS) have identified loci reproducibly associated with pulmonary diseases; however, the molecular mechanisms underlying these associations are largely unknown. The objectives of this study were to discover genetic variants affecting gene expression in human lung tissue, to refine susceptibility loci for asthma identified in GWAS, and to use the genetics of gene expression and network analyses to find key molecular drivers of asthma. We performed a genome-wide search for expression quantitative trait loci (eQTL) in 1,111 human lung samples. The lung eQTL dataset was then used to inform asthma genetic studies reported in the literature. The top ranked lung eQTLs were integrated with the GWAS on asthma reported by the GABRIEL consortium to generate a Bayesian gene expression network for discovery of novel molecular pathways underpinning asthma. We detected 17,178 cis- and 593 trans- lung eQTLs, which can be used to explore the functional consequences of loci associated with lung diseases and traits. Some strong eQTLs are also asthma susceptibility loci. For example, rs3859192 on chr17q21 is robustly associated with the mRNA levels of GSDMA (P = 3.55 × 10^−151). The genetic-gene expression network identified the SOCS3 pathway as one of the key drivers of asthma. The eQTLs and gene networks identified in this study are powerful tools for elucidating the causal mechanisms underlying pulmonary disease. This data resource offers much-needed support to pinpoint the causal genes and characterize the molecular function of gene variants associated with lung diseases.
Recent genome-wide association studies (GWAS) have identified genetic variants associated with lung diseases. The challenge now is to find the causal genes in GWAS-nominated chromosomal regions and to characterize the molecular function of disease-associated genetic variants. In this paper, we describe an international effort to systematically capture the genetic architecture of gene expression regulation in human lung. By studying lung specimens from 1,111 individuals of European ancestry, we found a large number of genetic variants affecting gene expression in the lung, or lung expression quantitative trait loci (eQTL). These lung eQTLs will serve as an important resource to aid in the understanding of the molecular underpinnings of lung biology and its disruption in disease. To demonstrate the utility of this lung eQTL dataset, we integrated our data with previous genetic studies on asthma. Through integrative techniques, we identified causal variants and genes in GWAS-nominated loci and found key molecular drivers for asthma. We feel that sharing our lung eQTL dataset with the scientific community will leverage the impact of previous large-scale GWAS on lung diseases and function by providing much-needed functional information to understand the molecular changes introduced by the susceptibility genetic variants.
Increasing evidence suggests that single nucleotide polymorphisms (SNPs) associated with complex traits are more likely to be expression quantitative trait loci (eQTLs). Incorporating eQTL information hence has potential to increase power of genome-wide association studies (GWAS). In this paper, we propose using eQTL weights as prior information in SNP based association tests to improve test power while maintaining control of the family-wise error rate (FWER) or the false discovery rate (FDR). We apply the proposed methods to the analysis of a GWAS for childhood asthma consisting of 1296 unrelated individuals with German ancestry. The results confirm that eQTLs are enriched for previously reported asthma SNPs. We also find that some SNPs are insignificant using procedures without eQTL weighting, but become significant using eQTL-weighted Bonferroni or Benjamini–Hochberg procedures, while controlling the same FWER or FDR level. Some of these SNPs have been reported by independent studies in recent literature. The results suggest that the eQTL-weighted procedures provide a promising approach for improving power of GWAS. We also report the results of our methods applied to the large-scale European GABRIEL consortium data.
asthma; family-wise error rate; false discovery rate; eQTL; genome-wide association study; weighted hypothesis test
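The eQTL-weighted procedures described above can be illustrated with the standard weighted Bonferroni and weighted Benjamini-Hochberg rules. The following is a minimal sketch, not the authors' implementation: it assumes each SNP carries a positive prior weight (for example, larger for SNPs that are eQTLs), rescales the weights to average one, and then applies the usual thresholds to the weighted p-values. The function names and the example weights are hypothetical.

```python
def _normalize(weights):
    """Rescale positive weights so that they average to 1."""
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject H_i when p_i <= alpha * w_i / m (weights rescaled to mean 1)."""
    m = len(pvals)
    w = _normalize(weights)
    return [p <= alpha * wi / m for p, wi in zip(pvals, w)]


def weighted_bh(pvals, weights, alpha=0.05):
    """Benjamini-Hochberg step-up rule applied to weighted p-values p_i / w_i."""
    m = len(pvals)
    w = _normalize(weights)
    q = [p / wi for p, wi in zip(pvals, w)]
    order = sorted(range(m), key=lambda i: q[i])
    k = 0  # largest rank whose weighted p-value passes its threshold
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical example: the first SNP is an eQTL and gets a larger weight.
pvals = [0.02, 0.001, 0.3, 0.6]
eqtl_weights = [2.0, 1.0, 0.5, 0.5]
flat_weights = [1.0, 1.0, 1.0, 1.0]
print(weighted_bonferroni(pvals, flat_weights))
print(weighted_bonferroni(pvals, eqtl_weights))
print(weighted_bh(pvals, eqtl_weights))
```

With flat weights the first SNP misses the Bonferroni threshold, while the eQTL weight relaxes its threshold enough for rejection. The error-rate guarantee of such schemes relies on the weights being chosen independently of the GWAS p-values.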
While genome-wide association studies (GWASs) have been successful in identifying novel variants associated with various diseases, it has been much more difficult to determine the biological mechanisms underlying these associations. Expression quantitative trait loci (eQTL) provide another dimension to these data by associating single nucleotide polymorphisms (SNPs) with gene expression. We hypothesised that integrating SNPs known to be associated with type 2 diabetes with eQTLs and coexpression networks would enable the discovery of novel candidate genes for type 2 diabetes.
We selected 32 SNPs associated with type 2 diabetes in two or more independent GWASs. We used previously described eQTLs mapped from genotype and gene expression data collected from 1,008 morbidly obese patients to find genes with expression associated with these SNPs. We linked these genes to coexpression modules, and ranked the other genes in these modules using an inverse sum score.
We found 62 genes with expression associated with type 2 diabetes SNPs. We validated our method by linking highly ranked genes in the coexpression modules back to SNPs through a combined eQTL dataset. We showed that the eQTLs highlighted by this method are significantly enriched for association with type 2 diabetes in data from the Wellcome Trust Case Control Consortium (WTCCC, p = 0.026) and the Gene Environment Association Studies (GENEVA, p = 0.042), validating our approach. Many of the highly ranked genes are also involved in the regulation or metabolism of insulin, glucose or lipids.
We have devised a novel method, involving the integration of datasets of different modalities, to discover novel candidate genes for type 2 diabetes.
Genetics of type 2 diabetes; Genomics/proteomics; Mathematical modelling and simulation
The recent success of genome-wide association studies (GWAS) is now followed by the challenge to determine how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC), that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with either one or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r²) or historical (D'). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal to prioritize relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.
Genome-wide association studies have led to the identification of susceptibility loci for a variety of human complex traits. What is still largely missing, however, is the understanding of the biological context in which these candidate variants act and of how they determine each trait. Given the localization of many GWAS loci outside coding regions and the important role of regulatory variation in shaping phenotypic variance, gene expression has been proposed as a plausible informative intermediate phenotype. Here we show that for a subset of the currently published GWAS this is indeed the case, by observing a significant excess of regulatory variants among disease loci. We propose an empirical methodology (regulatory trait concordance—RTC) able to integrate expression and disease data in order to detect causal regulatory effects. We show that the RTC outperforms simple correlation metrics under various simulated linkage disequilibrium (LD) scenarios. Our method is able to recover previously suspected causal regulatory effects from the literature and, as expected given the nature of the tested tissue, an overrepresentation of immunity-related candidates is observed. As the number of available tissues will increase, this prioritization approach will become even more useful in understanding the implication of regulatory variants in disease etiology.
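For reference, the two simple SNP correlation metrics that RTC is compared against, r² and D', can be computed from phased two-locus haplotypes as in the minimal sketch below. This illustrates only the baseline LD metrics, not the RTC score itself; the function name and the toy haplotypes are hypothetical.

```python
def ld_metrics(haplotypes):
    """Return (r2, |D'|) for two biallelic loci.

    haplotypes: list of (a, b) pairs, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # allele 1 frequency, locus A
    p_b = sum(b for _, b in haplotypes) / n           # allele 1 frequency, locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Toy example: no recombinant haplotype is observed, but allele frequencies differ.
haps = [(1, 1)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 4
print(ld_metrics(haps))
```

In the toy example D' equals 1 while r² is only about 0.33, which shows why the two metrics can rank the same SNP pair very differently.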
Expression quantitative trait loci (eQTL), or genetic variants associated with changes in gene expression, have the potential to assist in interpreting results of genome-wide association studies (GWAS). eQTLs also have varying degrees of tissue specificity. By correlating the statistical significance of eQTLs mapped in various tissue types to their odds ratios reported in a large GWAS by the Wellcome Trust Case Control Consortium (WTCCC), we discovered that there is a significant association between diseases studied genetically and their relevant tissues. This suggests that eQTL data sets can be used to determine tissues that play a role in the pathogenesis of a disease, thereby highlighting these tissue types for further post-GWAS functional studies.
Although genome-wide association studies (GWAS) of complex traits have yielded more reproducible associations than had been discovered using any other approach, the loci characterized to date do not account for much of the heritability of such traits and, in general, have not led to improved understanding of the biology underlying complex phenotypes. Using a web site we developed to serve results of expression quantitative trait locus (eQTL) studies in lymphoblastoid cell lines from HapMap samples (http://www.scandb.org), we show that single nucleotide polymorphisms (SNPs) associated with complex traits (from http://www.genome.gov/gwastudies/) are significantly more likely to be eQTLs than minor-allele-frequency–matched SNPs chosen from high-throughput GWAS platforms. These findings are robust across a range of thresholds for establishing eQTLs (p-values from 10^−4 to 10^−8), and a broad spectrum of human complex traits. Analyses of GWAS data from the Wellcome Trust studies confirm that annotating SNPs with a score reflecting the strength of the evidence that the SNP is an eQTL can improve the ability to discover true associations and clarify the nature of the mechanism driving the associations. Our results showing that trait-associated SNPs are more likely to be eQTLs and that application of this information can enhance discovery of trait-associated SNPs for complex phenotypes raise the possibility that we can utilize this information both to increase the heritability explained by identifiable genetic factors and to gain a better understanding of the biology underlying complex traits.
We show here that single nucleotide polymorphisms (SNPs) associated with complex traits (as identified in the catalog of results from genome-wide association studies http://www.genome.gov/gwastudies/) are more likely than other SNPs chosen from high-throughput genotyping platforms to predict expression levels of genes. These observations confirm that genetic risk factors for complex traits will often affect phenotype by altering the amount or timing of protein production, rather than by changing the type of protein produced. This knowledge can be used to improve our ability to discover genetic risk factors for complex traits and to improve our understanding of their underlying biology.
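The comparison against minor-allele-frequency-matched SNPs can be sketched as a simple resampling test. This is an illustrative outline under stated assumptions, not the procedure used in the study: the MAF bin width, the number of draws, the data layout, and the function name are all hypothetical, and every trait SNP is assumed to have at least one non-trait SNP in its MAF bin.

```python
import random


def eqtl_enrichment(trait_snps, background, n_draws=1000, bin_width=0.05, seed=0):
    """Empirical test of eQTL enrichment among trait-associated SNPs.

    background: dict mapping SNP id -> (maf, is_eqtl); must include the trait SNPs.
    Returns (observed eQTL count, empirical one-sided p-value).
    """
    rng = random.Random(seed)

    def maf_bin(maf):
        return int(maf / bin_width)

    trait_set = set(trait_snps)
    pools = {}  # MAF bin -> non-trait SNPs available for matching
    for snp, (maf, _) in background.items():
        if snp not in trait_set:
            pools.setdefault(maf_bin(maf), []).append(snp)

    observed = sum(background[s][1] for s in trait_snps)
    at_least = 0
    for _ in range(n_draws):
        # Draw one MAF-matched SNP per trait SNP and count how many are eQTLs.
        count = 0
        for s in trait_snps:
            matched = rng.choice(pools[maf_bin(background[s][0])])
            count += background[matched][1]
        if count >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_draws + 1)


# Hypothetical toy data: both trait SNPs are eQTLs, no background SNP is.
background = {
    "t1": (0.12, True), "t2": (0.32, True),
    "b1": (0.13, False), "b2": (0.14, False),
    "b3": (0.33, False), "b4": (0.34, False),
}
print(eqtl_enrichment(["t1", "t2"], background))
```

Adding one to both the numerator and the denominator keeps the empirical p-value strictly positive, so it is never reported as exactly zero.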
There is genetic evidence that schizophrenia is a polygenic disorder with a large number of loci of small effect on disease susceptibility. Genome-wide association studies (GWASs) of schizophrenia have had limited success, with the best finding at the MHC locus at chromosome 6p. A recent effort of the Psychiatric GWAS consortium (PGC) yielded five novel loci for schizophrenia. In this study, we aim to highlight additional schizophrenia susceptibility loci from the PGC study by combining the top association findings from the discovery stage (9394 schizophrenia cases and 12 462 controls) with expression QTLs (eQTLs) and differential gene expression in whole blood of schizophrenia patients and controls. We examined the 6192 single-nucleotide polymorphisms (SNPs) that met the significance threshold of P<0.001. eQTLs were calculated for these SNPs in a sample of healthy controls (n=437). The transcripts significantly regulated by the top SNPs from the GWAS meta-analysis were subsequently tested for differential expression in an independent set of schizophrenia cases and controls (n=202). After correction for multiple testing, the eQTL analysis yielded 40 significant cis-acting effects of the SNPs. Seven of these transcripts show differential expression between cases and controls. Of these, the effects of three genes (RNF5, TRIM26 and HLA-DRB3) coincided with the direction expected from the meta-analysis findings, and all three are located within the MHC region. Our results identify new genes of interest and highlight again the involvement of the MHC region in schizophrenia susceptibility.
expression; schizophrenia; eQTL; MHC; SNP
Identification of expression quantitative trait loci (eQTLs) is an emerging area in genomic research. The task requires an integrated analysis of genome-wide single nucleotide polymorphism (SNP) data and gene expression data, raising a new computational challenge due to the tremendous size of the data.
We develop a method to identify eQTLs. The method represents eQTLs as information flux between genetic variants and transcripts. We use information theory to simultaneously interrogate SNP and gene expression data, resulting in a Transcriptional Information Map (TIM) which captures the network of transcriptional information that links genetic variations, gene expression and regulatory mechanisms. These maps are able to identify both cis- and trans- regulating eQTLs. The application on a dataset of leukemia patients identifies eQTLs in the regions of the GART, PCP4, DSCAM, and RIPK4 genes that regulate ADAMTS1, a known leukemia correlate.
The information theory approach presented in this paper is able to infer the dependence networks between SNPs and transcripts, which in turn can identify cis- and trans-eQTLs. The application of our method to the leukemia study explains how genetic variants and gene expression are linked to leukemia.
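The basic information-theoretic quantity behind such an approach is the mutual information between a SNP genotype and a transcript level. The sketch below is an assumption-laden illustration and not the Transcriptional Information Map construction itself: it assumes genotypes coded as discrete values and expression already discretized into bins, and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(genotypes, expression_bins):
    """Mutual information (in bits) between two equally long discrete sequences."""
    if len(genotypes) != len(expression_bins):
        raise ValueError("sequences must have the same length")
    n = len(genotypes)
    joint = Counter(zip(genotypes, expression_bins))
    count_g = Counter(genotypes)
    count_e = Counter(expression_bins)
    mi = 0.0
    for (g, e), c in joint.items():
        # p(g, e) * log2( p(g, e) / (p(g) * p(e)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_g[g] * count_e[e]))
    return mi


# Genotype fully determines the expression bin: 1 bit of shared information.
print(mutual_information([0, 0, 1, 1], ["low", "low", "high", "high"]))
# Genotype and expression bin are independent: 0 bits.
print(mutual_information([0, 0, 1, 1], ["low", "high", "low", "high"]))
```

Unlike a linear correlation, mutual information also captures non-additive genotype effects, although its estimate depends on how expression is binned.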
Motivation: Genome-wide association studies (GWAS) generate relationships between hundreds of thousands of single nucleotide polymorphisms (SNPs) and complex phenotypes. The contribution of the traditionally overlooked copy number variations (CNVs) to complex traits is also being actively studied. To facilitate the interpretation of the data and the designing of follow-up experimental validations, we have developed a database that enables the sensible prioritization of these variants by combining several approaches, involving not only publicly available physical and functional annotations but also multilocus linkage disequilibrium (LD) annotations as well as annotations of expression quantitative trait loci (eQTLs).
Results: For each SNP, the SCAN database provides: (i) summary information from eQTL mapping of HapMap SNPs to gene expression (evaluated by the Affymetrix exon array) in the full set of HapMap CEU (Caucasians from UT, USA) and YRI (Yoruba people from Ibadan, Nigeria) samples; (ii) LD information, in the case of a HapMap SNP, including what genes have variation in strong LD (pairwise or multilocus LD) with the variant and how well the SNP is covered by different high-throughput platforms; (iii) summary information available from public databases (e.g. physical and functional annotations); and (iv) summary information from other GWAS. For each gene, SCAN provides annotations on: (i) eQTLs for the gene (both local and distant SNPs) and (ii) the coverage of all variants in the HapMap at that gene on each high-throughput platform. For each genomic region, SCAN provides annotations on: (i) physical and functional annotations of all SNPs, genes and known CNVs within the region and (ii) all genes regulated by the eQTLs within the region.
Supplementary information: Supplementary data are available at Bioinformatics online.
Numerous single nucleotide polymorphisms (SNPs) associated with complex diseases have been identified by genome-wide association studies (GWAS) and expression quantitative trait loci (eQTLs) studies. However, few of these SNPs have explicit biological functions. Recent studies indicated that the SNPs within the 3’UTR regions of susceptibility genes could affect complex traits/diseases by affecting the function of miRNAs. These 3’UTR SNPs are functional candidates and therefore of interest to GWAS and eQTL researchers.
We developed a publicly available online database, MirSNP (http://cmbi.bjmu.edu.cn/mirsnp), which is a collection of human SNPs in predicted miRNA-mRNA binding sites. We identified 414,510 SNPs that might affect miRNA-mRNA binding. Annotations were added to these SNPs to predict whether a SNP within the target site would decrease/break or enhance/create an miRNA-mRNA binding site. By applying the MirSNP database to three brain eQTL data sets, we identified four unreported SNPs (rs3087822, rs13042, rs1058381, and rs1058398), which might affect miRNA binding and thus affect the expression of their host genes in the brain. We also applied the MirSNP database to our GWAS for schizophrenia: seven predicted miRNA-related SNPs (p < 0.0001) were found in the schizophrenia GWAS. Our findings identify possible functions of these SNP loci and provide a basis for subsequent functional research.
MirSNP can identify putative miRNA-related SNPs from GWAS and eQTL research and provide direction for subsequent functional research.
microRNA; Single nucleotide polymorphism (SNP); Genome-wide association study (GWAS); Expression quantitative trait loci (eQTLs); MirSNP
Motivation: Expression quantitative trait loci (eQTL) analysis links variations in gene expression levels to genotypes. For modern datasets, eQTL analysis is a computationally intensive task as it involves testing for association of billions of transcript-SNP (single-nucleotide polymorphism) pairs. The heavy computational burden makes eQTL analysis less popular and sometimes forces analysts to restrict their attention to just a small subset of transcript-SNP pairs. As more transcripts and SNPs get interrogated over a growing number of samples, the demand for faster tools for eQTL analysis grows stronger.
Results: We have developed new software for computationally efficient eQTL analysis called Matrix eQTL. In tests on large datasets, it was 2–3 orders of magnitude faster than existing popular tools for QTL/eQTL analysis, while finding the same eQTLs. The fast performance is achieved by special preprocessing and expressing the most computationally intensive part of the algorithm in terms of large matrix operations. Matrix eQTL supports additive linear and ANOVA models with covariates, including models with correlated and heteroskedastic errors. The issue of multiple testing is addressed by calculating false discovery rate; this can be done separately for cis- and trans-eQTLs.
Availability: Matlab and R implementations are available for free at http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL
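The idea of recasting association testing as a matrix operation can be sketched for the simplest case, an additive linear model without covariates. The code below is an illustration of the principle only and is not taken from Matrix eQTL: once every SNP row and every transcript row is centered and scaled to unit length, all pairwise correlations are the entries of a single matrix product, and each correlation maps to a t-statistic. Pure Python lists are used so the example is self-contained; in practice the product would be delegated to an optimized linear algebra library. All function names are hypothetical.

```python
from math import sqrt


def standardize(row):
    """Center a row and scale it to unit Euclidean length."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = sqrt(sum(x * x for x in centered))
    if norm == 0:
        raise ValueError("constant row cannot be standardized")
    return [x / norm for x in centered]


def correlation_matrix(snps, genes):
    """All SNP-by-gene Pearson correlations as one matrix product S * G^T."""
    s = [standardize(r) for r in snps]
    g = [standardize(r) for r in genes]
    return [[sum(a * b for a, b in zip(srow, grow)) for grow in g] for srow in s]


def t_from_r(r, n_samples):
    """t-statistic of a simple linear regression with correlation r."""
    df = n_samples - 2
    if 1 - r * r <= 0:
        return float("inf") if r > 0 else float("-inf")
    return r * sqrt(df / (1 - r * r))


# Toy data: 2 SNPs and 1 transcript measured in 4 samples.
snps = [[0, 1, 2, 1], [0, 0, 1, 2]]
genes = [[1.0, 2.0, 3.0, 2.0]]
corr = correlation_matrix(snps, genes)
print(corr)
print(t_from_r(corr[1][0], n_samples=4))
```

The standardization is done once per row, so the cost of testing every pair is dominated by the matrix product rather than by a separate regression fit for each pair.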
Amyotrophic lateral sclerosis (ALS) is a progressive, neurodegenerative disease characterized by loss of upper and lower motor neurons. ALS is considered to be a complex trait and genome-wide association studies (GWAS) have implicated a few susceptibility loci. However, many more causal loci remain to be discovered. Since it has been shown that genetic variants associated with complex traits are more likely to be eQTLs than frequency-matched variants from GWAS platforms, we conducted a two-stage genome-wide screening for eQTLs associated with ALS. In addition, we applied an eQTL analysis to fine-map association loci. Expression profiles using peripheral blood of 323 sporadic ALS patients and 413 controls were mapped to genome-wide genotyping data. Subsequently, data from a two-stage GWAS (3,568 patients and 10,163 controls) were used to prioritize eQTLs identified in the first stage (162 ALS, 207 controls). These prioritized eQTLs were carried forward to the second sample with both gene-expression and genotyping data (161 ALS, 206 controls). Replicated eQTL SNPs were then tested for association in the second-stage GWAS data to find SNPs associated with disease that survived correction for multiple testing. We thus identified twelve cis eQTLs with nominally significant associations in the second-stage GWAS data. Eight SNP-transcript pairs of highest significance (lowest p = 1.27 × 10^−51) withstood multiple-testing correction in the second stage and modulated CYP27A1 gene expression. Additionally, we show that C9orf72 appears to be the only gene in the 9p21.2 locus that is regulated in cis, showing the potential of this approach in identifying causative genes in association loci in ALS. This study has identified candidate genes for sporadic ALS, most notably CYP27A1. Mutations in CYP27A1 are causal to cerebrotendinous xanthomatosis, which can present as a clinical mimic of ALS with progressive upper motor neuron loss, making it a plausible susceptibility gene for ALS.
Variation in gene expression has been found to be important in disease susceptibility and pharmacogenomics. Local and distant expression quantitative trait loci (eQTLs) have been identified via genome-wide association study (GWAS); yet the functional analysis of these variants has been challenging. The aim of this study was to unravel the functional consequence of a gene with a local SNP with evidence for local and distant regulatory roles in cellular sensitivity to cisplatin, one of the most widely used chemotherapeutic drugs. To this end, we measured cellular susceptibility to cisplatin in 176 HapMap lymphoblastoid cell lines derived from Yoruba individuals from Ibadan, Nigeria. The 276 cytotoxicity-associated SNPs at the suggestive threshold of P ≤ 0.0001 were significantly enriched for eQTLs. Of these SNPs, we found one intronic SNP, rs17115814, that had a significant relationship with the expression level of its host gene, PRPF39 (P= 0.0007), and a significant correlation with the expression of over 100 distant transcripts (P ≤ 0.0001). Successful knockdown of PRPF39 expression using siRNA resulted in a significant increase in cisplatin resistance. We then measured the expression of 61 downstream targets after PRPF39 knockdown and found 53 gene targets had significant (P ≤ 0.05) expression changes. Included in the list of genes that significantly changed after PRPF39 knockdown were MAP3K4 and TFPD2, two important signaling genes previously shown to be relevant in cisplatin response. Thus, modulation of a local target gene identified through a GWAS was followed by a downstream cascade of gene expression changes resulting in greater resistance to cisplatin.
Complexity and noise in expression quantitative trait loci (eQTL) studies make it difficult to distinguish potential regulatory relationships among the many interactions. The predominant method of identifying eQTLs finds associations that are significant at a genome-wide level. The vast number of statistical tests carried out on these data makes false negatives very likely. Corrections for multiple testing error render genome-wide eQTL techniques unable to detect modest regulatory effects.
We propose an alternative method to identify eQTLs that builds on traditional approaches. In contrast to genome-wide techniques, our method determines the significance of an association between an expression trait and a locus with respect to the set of all associations to the expression trait. The use of this specific information facilitates identification of expression traits that have an expression profile that is characterized by a single exceptional association to a locus.
Our approach identifies expression traits that have exceptional associations regardless of the genome-wide significance of those associations. This property facilitates the identification of possible false negatives for genome-wide significance. Further, our approach has the property of prioritizing expression traits that are affected by few strong associations. Expression traits identified by this method may warrant additional study because their expression level may be affected by targeting genes near a single locus.
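The idea of judging an association relative to the other associations of the same expression trait, rather than against a genome-wide threshold, can be sketched as follows. The statistic used here (a z-score of the top locus against the remaining loci of that trait) is a hypothetical stand-in chosen for illustration; the abstract does not specify the actual statistic, and the cutoff and function name are assumptions.

```python
from statistics import mean, stdev


def exceptional_association(scores, z_cutoff=3.0):
    """Check whether one locus stands out within a single trait's profile.

    scores: dict mapping locus id -> association score for ONE expression
            trait (for example, -log10 p or a LOD score).
    Returns (top locus, z-score of the top locus, flag).
    """
    top_locus = max(scores, key=scores.get)
    rest = [s for locus, s in scores.items() if locus != top_locus]
    if len(rest) < 2:
        raise ValueError("need at least three loci per trait")
    spread = stdev(rest)
    if spread == 0:
        # Degenerate profile: every other locus has the same score.
        z = float("inf") if scores[top_locus] > rest[0] else 0.0
    else:
        z = (scores[top_locus] - mean(rest)) / spread
    return top_locus, z, z >= z_cutoff


# Hypothetical profile: a score of 4.0 may miss a genome-wide cutoff,
# yet it is clearly exceptional for this particular trait.
profile = {"L1": 1.0, "L2": 1.2, "L3": 0.8, "L4": 1.0, "L5": 4.0}
print(exceptional_association(profile))
```

Because the reference distribution is trait-specific, a trait with one strong locus is flagged even when that locus would not survive a genome-wide multiple-testing correction.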
We demonstrate our method by identifying eQTL hotspots in Plasmodium falciparum (malaria) and Saccharomyces cerevisiae (yeast). We demonstrate the prioritization of traits with few strong genetic effects through Gene Ontology (GO) analysis of yeast. Our results are strongly consistent with results gathered using genome-wide methods and identify additional hotspots and eQTLs.
New eQTLs and hotspots found with this method may represent regions of the genome or biological processes that are controlled through few relatively strong genetic interactions. These points of interest warrant experimental investigation.
Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Although recent genome-wide association studies (GWAS) have contributed to discovery of SLE susceptibility genes, few studies have been performed in Asian populations. Here, we report a GWAS for SLE examining 891 SLE cases and 3,384 controls and multi-stage replication studies examining 1,387 SLE cases and 28,564 controls in Japanese subjects. Considering that expression quantitative trait loci (eQTLs) have been implicated in genetic risks for autoimmune diseases, we integrated an eQTL study into the results of the GWAS. We observed enrichment of cis-eQTL positive loci among the known SLE susceptibility loci (30.8%) compared to the genome-wide SNPs (6.9%). In addition, we identified a novel association of a variant in the AF4/FMR2 family, member 1 (AFF1) gene at 4q21 with SLE susceptibility (rs340630; P = 8.3 × 10^−9, odds ratio = 1.21). The risk A allele of rs340630 demonstrated a cis-eQTL effect on the AFF1 transcript with enhanced expression levels (P<0.05). As AFF1 transcripts were prominently expressed in CD4+ and CD19+ peripheral blood lymphocytes, up-regulation of AFF1 may cause abnormalities in these lymphocytes, leading to disease onset.
Although recent genome-wide association study (GWAS) approaches have successfully contributed to disease gene discovery, many susceptibility loci are known to remain uncaptured because of the strict significance thresholds required for multiple hypothesis testing. Therefore, prioritization of GWAS results by incorporating additional information is recommended. Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Considering that abnormalities in B cell activity play essential roles in SLE, prioritization based on an expression quantitative trait loci (eQTL) study for B cells would be a promising approach. In this study, we report a GWAS and multi-stage replication studies for SLE examining 2,278 SLE cases and 31,948 controls in Japanese subjects. We integrated an eQTL study into the results of the GWAS and identified AFF1 as a novel SLE susceptibility locus. We also confirmed a cis-regulatory effect of the locus on the AFF1 transcript. Our study would be one of the initial successes in detecting a novel genetic locus using an eQTL study, and it should contribute to our understanding of the genetic loci left uncaptured by standard GWAS approaches.
Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not leverage the availability of the structural information of the transcriptome, such as presence of gene networks that reveal correlation and potentially regulatory relationships among the study genes. We employ a new eQTL mapping algorithm, GFlasso, which we have previously developed for sparse structured regression, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal/noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes, which enables us to gain more power in detecting previously neglected signals that are marginally weak but pleiotropically significant.
While eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes, our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results from in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis (Ribi), telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots. Not only do we find that many of these candidate regulators contain mutations in their promoter and coding regions, but in the case of the Ribi group we also provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso.
Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits due to correlation, regulation, or other forms of biological dependencies.
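As a rough illustration of how a gene network can enter a sparse regression for eQTL mapping, the sketch below evaluates one commonly described form of a graph-guided fused lasso objective: a squared-error loss, a lasso penalty on all coefficients, and a fusion penalty that pulls the coefficients of network-linked expression traits toward each other. The function name, the edge weighting by |r|, and the toy data are assumptions made for illustration; the abstract above does not specify the exact objective, and this is not the authors' implementation.

```python
def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Evaluate a graph-guided fused lasso style objective (illustrative).

    X     : n x J list of genotype rows
    Y     : n x K list of expression rows
    B     : J x K list of regression coefficients (marker j -> trait k)
    edges : list of (m, l, r) with r the correlation between traits m and l
    lam   : weight of the lasso (sparsity) penalty
    gamma : weight of the fusion (network) penalty
    """
    n, J, K = len(X), len(B), len(B[0])

    # Squared-error loss summed over samples and expression traits.
    loss = 0.0
    for i in range(n):
        for k in range(K):
            pred = sum(X[i][j] * B[j][k] for j in range(J))
            loss += (Y[i][k] - pred) ** 2

    # Lasso penalty: encourages most marker-trait effects to be zero.
    lasso = sum(abs(B[j][k]) for j in range(J) for k in range(K))

    # Fusion penalty: for each network edge, encourage a marker to have
    # similar effects (same sign if r > 0, opposite if r < 0) on both traits.
    # Weighting by |r| is an assumption of this sketch.
    fusion = 0.0
    for m, l, r in edges:
        sign = 1.0 if r >= 0 else -1.0
        fusion += abs(r) * sum(abs(B[j][m] - sign * B[j][l]) for j in range(J))

    return loss + lam * lasso + gamma * fusion


# Toy data (hypothetical): two samples, two markers, two correlated traits.
X = [[1, 0], [0, 1]]
Y = [[1, 1], [0, 0]]
edges = [(0, 1, 0.5)]
shared = [[1, 1], [0, 0]]    # marker 0 affects both traits equally
isolated = [[1, 0], [0, 0]]  # marker 0 affects only trait 0
```

In this toy setting the coefficient matrix with a shared effect on both correlated traits scores lower than the one with an isolated effect, which is the sense in which a network penalty favors pleiotropic signals.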
Following the recent success of genome-wide association studies in uncovering disease-associated genetic variants, the next challenge is to understand how these variants affect downstream pathways. The most proximal trait to a disease-associated variant, most commonly a single nucleotide polymorphism (SNP), is differential gene expression due to the cis effect of SNP alleles on transcription, translation, and/or splicing, that is, a gene expression quantitative trait locus (eQTL). Several genome-wide SNP–gene expression association studies have already provided convincing evidence that eQTL associations are widespread. As a consequence, some eQTL associations are found in the same genomic region as a disease variant, either by coincidence or because of a causal relationship. Cis-regulation of RPS26 gene expression and a type 1 diabetes (T1D) susceptibility locus have been colocalized to the 12q13 genomic region. A recent study has also suggested RPS26 as the most likely susceptibility gene for T1D in this genomic region. However, it is still not clear whether this colocalization is the result of chance alone or whether RPS26 expression is directly correlated with T1D susceptibility and is therefore potentially causal. Here, we derive and apply a statistical test of this hypothesis. We conclude that RPS26 expression is unlikely to be the molecular trait responsible for T1D susceptibility at this locus, at least not in a direct, linear connection.
Association studies; Gene expression; RPS26; T1D
Genome-wide association studies (GWASs) have been used to identify genes that increase risk of psychiatric diseases. However, much of the variation in disease risk is still unexplained, suggesting that there are genes still to be discovered. Functional annotation of genetic variants may increase the power of GWASs to identify disease genes by providing prior information that can be used in Bayesian analysis or in reducing the number of tests. Genetic mapping of expression quantitative trait loci (eQTLs) is helping us to reveal novel functional effects of thousands of single nucleotide polymorphisms (SNPs). The published brain eQTL studies are reviewed here, and major methodological issues and their possible solutions are discussed. We emphasize the frequently-ignored problems of batch effects, covariates, and multiple testing, all of which can lead to false positives and false negatives. The future application of eQTL data to the GWAS analysis is also discussed.
genome-wide association study; brain; psychiatric diseases; eQTL; genetics; SNP
Genome-wide association studies (GWAS) have identified genetic variants associated with numerous cardiovascular and metabolic diseases. Newly identified polymorphisms associated with myocardial infarction, dyslipidemia, hypertension, diabetes, and insulin resistance suggest novel mechanistic pathways that underlie these and other complex diseases. Working out the connections between the polymorphisms identified in GWAS and their biological mechanisms has been especially challenging given the number of non-coding variants identified thus far. In this review, we discuss the utility of expression quantitative trait locus (eQTL) databases in the study of non-coding variants with respect to cardiovascular and metabolic phenotypes. Recent successes in using eQTL data to link variants with functional candidate genes will be reviewed, and the shortcomings of this approach will be outlined. Finally, we discuss the emerging next generation of eQTL studies that take advantage of the ability to generate induced pluripotent stem cell lines from population cohorts.
cardiovascular disease; eQTL; GWAS; induced pluripotent stem cells; RNA-seq
The determination of expression quantitative trait locus (eQTL) epistasis, a form of functional interaction between genetic loci that affect gene expression, is an important step toward a thorough understanding of gene regulation. Since gene expression has emerged as an “intermediate” molecular phenotype, eQTL epistasis might help to explain the relationship between genotype and higher-level organismal phenotypes such as diseases. A characteristic feature of eQTL analysis is the large number of tests required to identify associations between gene expression and variability at genetic loci. This problem is aggravated when epistatic effects between eQTLs are analyzed. In this review, we discuss recent algorithmic approaches for the detection of eQTL epistasis and highlight lessons that can be learned from current methods.
eQTL; epistasis; genetic association; genetic crosses; network modules
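To make the scale of the testing burden in the epistasis review above concrete, the following sketch counts single-locus and pairwise tests for a hypothetical study. The marker and gene counts, the function names, and the use of a Bonferroni correction are assumptions chosen for illustration; they are not taken from the review.

```python
from math import comb


def n_single_locus_tests(n_markers, n_genes):
    """One test per (marker, gene) pair."""
    return n_markers * n_genes


def n_pairwise_epistasis_tests(n_markers, n_genes):
    """One test per (unordered marker pair, gene) combination."""
    return comb(n_markers, 2) * n_genes


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Hypothetical study size: 1,000 markers and 5,000 expression traits.
single = n_single_locus_tests(1_000, 5_000)          # 5,000,000 tests
pairwise = n_pairwise_epistasis_tests(1_000, 5_000)  # 2,497,500,000 tests
```

Even at this modest marker density, moving from single-locus to pairwise analysis multiplies the number of tests by roughly 500, which is why exhaustive epistasis scans usually need some form of search-space reduction.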
Motivation: Analysis of expression quantitative trait loci (eQTL) significantly contributes to the determination of gene regulation programs. However, the discovery and analysis of associations of gene expression levels and their underlying sequence polymorphisms continue to pose many challenges. Methods are limited in their ability to illuminate the full structure of the eQTL data. Most rely on an exhaustive, genome scale search that considers all possible locus–gene pairs and tests the linkage between each locus and gene.
Result: To analyze eQTLs in a more comprehensive and efficient way, we developed the Graph based eQTL Decomposition method (GeD) that allows us to model genotype and expression data using an eQTL association graph. Through graph-based heuristics, GeD identifies dense subgraphs in the eQTL association graph. By identifying eQTL association cliques that expose the hidden structure of genotype and expression data, GeD effectively filters out most locus–gene pairs that are unlikely to have significant linkage. We apply GeD on eQTL data from Plasmodium falciparum, the human malaria parasite, and show that GeD reveals the structure of the relationship between all loci and all genes on a whole genome level. Furthermore, GeD allows us to uncover additional eQTLs with lower FDR, providing an important complement to traditional eQTL analysis methods.
Kidneys age at different rates, such that some people show little or no effects of aging whereas others show rapid functional decline. We sequentially used transcriptional profiling and expression quantitative trait loci (eQTL) mapping to narrow down which genes to test for association with kidney aging. We first performed whole-genome transcriptional profiling to find 630 genes that change expression with age in the kidney. Using two methods to detect eQTLs, we found 101 of these age-regulated genes contain expression-associated SNPs. We tested the eQTLs for association with kidney aging, measured by glomerular filtration rate (GFR) using combined data from the Baltimore Longitudinal Study of Aging (BLSA) and the InCHIANTI study. We found a SNP association (rs1711437 in MMP20) with kidney aging (uncorrected p = 3.6×10⁻⁵, empirical p = 0.01) that explains 1%–2% of the variance in GFR among individuals. The results of this sequential analysis may provide the first evidence for a gene association with kidney aging in humans.
Although family studies have shown that genes play a role in longevity, it has proven difficult to identify the specific genetic variants involved. We developed a sequential transcriptional profiling and eQTL mapping approach to find genes associated with aging in the kidney. First, we used genome-wide transcriptional profiling to determine which genes change expression with age in kidney tissue. Next, we used two methods to determine which of these age-regulated genes contain SNPs that associate with expression level. The allele-specific expression method, which compares the mRNA levels of the two alleles within heterozygous individuals, was more sensitive than the total expression method. We tested the eQTLs for association with kidney aging in two populations. One gene that encodes an extracellular matrix protein, MMP20, significantly associated with kidney aging, providing the first gene association with kidney aging. Our approach of combining both expression and genotype data can be applied to any phenotype of interest to increase the power to find genetic associations.
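The allele-specific expression idea described above, comparing the mRNA levels of the two alleles within a heterozygous individual, can be sketched as a test of whether allele-specific read counts depart from a 50:50 ratio. The exact two-sided binomial test used here, the function names, and the example counts are assumptions for illustration; the abstract does not state which statistical test the authors applied.

```python
from math import exp, lgamma, log


def _log_binom_pmf(i, n, p):
    """Log of the binomial probability of i successes in n trials."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def allelic_imbalance_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference-allele count, under the null that each read comes
    from the reference allele with probability p (0 < p < 1).
    """
    n = ref_reads + alt_reads
    probs = [exp(_log_binom_pmf(i, n, p)) for i in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical read counts at a heterozygous SNP in one individual.
balanced = allelic_imbalance_p(5, 5)     # no evidence of imbalance
imbalanced = allelic_imbalance_p(0, 10)  # all reads from one allele
```

Because both alleles are measured in the same cells of the same individual, environmental and trans-acting influences affect them equally, which is one plausible reason such within-individual comparisons can be more sensitive than total-expression comparisons across individuals.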
Expression Quantitative Trait Locus (eQTL) analysis is a powerful tool to study the biological mechanisms linking the genotype with gene expression. Such analyses can identify genomic locations where genotypic variants influence the expression of genes, both in close proximity to the variant (cis-eQTL) and on other chromosomes (trans-eQTL). Many traditional eQTL methods are based on a linear regression model. In this study, we propose a novel method to identify eQTL associations using information theory and machine learning approaches. Mutual Information (MI) is used to describe the association between a genetic marker and gene expression. MI can detect both linear and non-linear associations. Moreover, it can capture the heterogeneity of the population. Advanced feature selection methods, Maximum Relevance Minimum Redundancy (mRMR) and Incremental Feature Selection (IFS), were applied to optimize the selection of the genes affected by each genetic marker. When we applied our method to a study of apoE-deficient mice, we found that cis-acting eQTLs are stronger than trans-acting eQTLs but that trans-acting eQTLs are more numerous than cis-acting eQTLs. We compared our results (mRMR.eQTL) with R/qtl and MatrixEQTL (modelLINEAR and modelANOVA). In female mice, 67.9% of mRMR.eQTL results can be confirmed by at least two other methods, while only 14.4% of R/qtl results can be confirmed by at least two other methods. In male mice, 74.1% of mRMR.eQTL results can be confirmed by at least two other methods, while only 18.2% of R/qtl results can be confirmed by at least two other methods. Our method provides a new way to identify associations between genetic markers and gene expression. Our software is available in the supporting information.
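As a minimal sketch of how mutual information can score a marker-expression association, the code below bins expression values into equal-frequency classes and computes a plug-in MI estimate against genotype classes. The binning scheme, the estimator, the function names, and the toy data are assumptions made for illustration; this is not the mRMR.eQTL software, and the abstract does not describe how MI was estimated there.

```python
from collections import Counter
from math import log2


def discretize(values, n_bins=3):
    """Equal-frequency binning: assign each value a bin index by its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs of p(x,y) * log2(p(x,y) / (p(x) p(y))).
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


# Toy data (hypothetical): genotype coded 0/1/2, expression rising with dosage.
genotype = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.1, 2.0, 2.1, 3.0, 3.1]
mi_associated = mutual_information(genotype, discretize(expression))

# Two sequences with no association at all.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Because MI depends only on the joint distribution of genotype and binned expression, it does not assume that expression changes linearly with allele dosage, which is the property the abstract highlights. Plug-in estimates are biased upward in small samples, so in practice a permutation-based null would likely be needed to judge significance.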
The polymorphism in microRNA target site (PolymiRTS) database aims to identify single-nucleotide polymorphisms (SNPs) that affect miRNA targeting in human and mouse. These polymorphisms can disrupt the regulation of gene expression by miRNAs and are candidate genetic variants responsible for transcriptional and phenotypic variation. The database is therefore organized to provide links between SNPs in miRNA target sites, cis-acting expression quantitative trait loci (eQTLs), and the results of genome-wide association studies (GWAS) of human diseases. Here, we describe new features that have been integrated in the PolymiRTS database, including: (i) polymiRTSs in genes associated with human diseases and traits in GWAS, (ii) polymorphisms in target sites that have been supported by a variety of experimental methods and (iii) polymorphisms in miRNA seed regions. A large number of newly identified microRNAs and SNPs, recently published mouse phenotypes, and human and mouse eQTLs have also been integrated into the database. The PolymiRTS database is available at http://compbio.uthsc.edu/miRSNP/.