A large fraction of human genes are regulated by genetic variation near the transcribed sequence (cis-eQTL, expression quantitative trait locus), and many cis-eQTLs have implications for human disease. Less is known regarding the effects of genetic variation on expression of distant genes (trans-eQTLs) and their biological mechanisms. In this work, we use genome-wide data on SNPs and array-based expression measures from mononuclear cells obtained from a population-based cohort of 1,799 Bangladeshi individuals to characterize cis- and trans-eQTLs and determine whether observed trans-eQTL associations are mediated by expression of transcripts in cis with the SNPs showing trans-association, using Sobel tests of mediation. We observed 434 independent trans-eQTL associations at a false-discovery rate of 0.05, and 189 of these trans-eQTLs were also cis-eQTLs (enrichment P<0.0001). Among these 189 trans-eQTL associations, 39 were significantly attenuated after adjusting for a cis-mediator based on Sobel P<10^-5. We attempted to replicate 21 of these mediation signals in two European cohorts, and while only 7 trans-eQTL associations were present in one or both cohorts, 6 showed evidence of cis-mediation. Analyses of simulated data show that complete mediation will be observed as partial mediation in the presence of mediator measurement error or imperfect LD between measured and causal variants. Our data demonstrate that trans-associations can become significantly stronger or switch directions after adjusting for a potential mediator. Using simulated data, we demonstrate that this phenomenon is expected in the presence of strong cis-trans confounding and when the measured cis-transcript is correlated with the true (unmeasured) mediator. In conclusion, by applying mediation analysis to eQTL data, we show that a substantial fraction of observed trans-eQTL associations can be explained by cis-mediation.
Future studies should focus on understanding the mechanisms underlying widespread cis-mediation and their relevance to disease biology, as well as using mediation analysis to improve eQTL discovery.
Expression quantitative trait locus (eQTL) studies have demonstrated that human genes can be regulated by genetic variation residing close to the gene (cis-eQTLs) or in a distant region or on a different chromosome (trans-eQTLs). While cis-eQTL variants are likely to affect transcription factor binding or chromatin structure, our understanding of the mechanisms underlying trans-eQTLs is incomplete. We hypothesize that a substantial fraction of trans-eQTLs influence expression of distant genes through mediation by expression levels of a cis-transcript. In this paper, we use genome-wide SNPs and expression data for 1,799 South Asians to identify cis- and trans-eQTLs and to test our hypothesis using Sobel tests of mediation. Among 189 observed trans-eQTL associations, we provide evidence of cis-mediation for 39, 6 of which show mediation in an independent European cohort. We used simulated data to demonstrate that complete mediation will be observed as partial mediation in the presence of mediator measurement error or imperfect LD between measured and causal variants. We also demonstrate how unobserved confounding variables and incorrect mediator selection can bias mediation estimates. In conclusion, we have identified cis-mediators for many trans-eQTLs and described a mediation analysis approach that can be used to validate, characterize, and enhance discovery of trans-eQTLs.
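The Sobel test named above can be illustrated with a minimal sketch. It assumes the usual two-regression setup: `a` is the coefficient of the SNP on the cis-transcript, and `b` is the coefficient of the cis-transcript on the trans-transcript with the SNP in the model. The function name and the numeric inputs are hypothetical, and the abstract does not describe the authors' actual implementation or covariates.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Return (z, two-sided p) for the indirect effect a*b (first-order Sobel test)."""
    indirect = a * b
    # First-order (delta-method) standard error of the product a*b.
    se_indirect = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = indirect / se_indirect
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical regression estimates, for illustration only.
    z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
    print(f"Sobel z = {z:.3f}, two-sided p = {p:.4g}")
```

A small p-value here would suggest that the SNP's effect on the trans-transcript passes, at least in part, through the cis-transcript; as the abstract notes, measurement error and imperfect LD can make complete mediation look partial.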
Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.
When interpreting genome-wide association studies showing that specific genetic variants are associated with disease risk, scientists look for a link between the genetic variant and a biological mechanism behind that disease. One functional mechanism is that the genetic variant may influence gene transcription via a co-localized genomic regulatory element, such as a transcription factor binding site within an open chromatin region. Often this type of regulation occurs in some cell types but not others. In this study, we look across eleven gene expression studies with seven cell types and consider how genetic transcription regulators, or eQTLs, replicate within and between cell types. We identify pervasive allelic heterogeneity, or transcriptional control of a single gene by multiple, independent eQTLs. We integrate extensive data on cell type specific regulatory elements from ENCODE to identify general methods of transcription regulation through enrichment of eQTLs within regulatory elements. We also build a classifier to predict eQTL replication across cell types. The results in this paper present a path to an integrative, predictive approach to improve our ability to understand the mechanistic basis of human phenotypic variation.
Chronic obstructive pulmonary disease (COPD) is the fourth leading cause of mortality worldwide. Recent genome-wide association studies (GWAS) have identified robust susceptibility loci associated with COPD. However, the mechanisms mediating the risk conferred by these loci remain to be found. The goal of this study was to identify causal genes/variants within susceptibility loci associated with COPD. In the discovery cohort, genome-wide gene expression profiles of 500 non-tumor lung specimens were obtained from patients undergoing lung surgery. Blood DNA from the same patients was genotyped for 1.2 million SNPs. Following genotyping and gene expression quality control filters, 409 samples were analyzed. Lung expression quantitative trait loci (eQTLs) were identified and overlaid onto three COPD susceptibility loci derived from GWAS: 4q31 (HHIP), 4q22 (FAM13A), and 19q13 (RAB4B, EGLN2, MIA, CYP2A6). Significant eQTLs were replicated in two independent datasets (n = 363 and 339). SNPs previously associated with COPD and lung function on 4q31 (rs1828591, rs13118928) were associated with the mRNA expression of HHIP. An association between mRNA expression level of FAM13A and SNP rs2045517 was detected at 4q22, but did not reach statistical significance. At 19q13, significant eQTLs were detected with EGLN2. In summary, this study supports HHIP, FAM13A, and EGLN2 as the most likely causal COPD genes on 4q31, 4q22, and 19q13, respectively. Strong lung eQTL SNPs identified in this study will need to be tested for association with COPD in case-control studies. Further functional studies will also be needed to understand the role of genes regulated by disease-related variants in COPD.
DNA sequence variation causes changes in gene expression, which in turn has profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an “expression quantitative trait locus” (eQTL). Whereas the impact of cellular context on expression levels in general is well established, much less is known about the cell-state specificity of eQTL. Previous studies differed with respect to how “dynamic eQTL” were defined. Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. Further, we introduce a new approach to simultaneously infer eQTL from different cell types. By using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total of 3,941 eQTL, we detected 2,729 static eQTL, 1,187 eQTL that were conditionally active in one or several cell types, and 70 eQTL that affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms reverting the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, thus demonstrating the importance of conditional eQTL for understanding molecular mechanisms underlying physiological trait variation.
The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL as well as many other kinds of QTL data.
Complex physiological traits are affected through subtle changes of molecular traits like gene expression in the relevant tissues, which in turn are caused by genetic variation. A genetic locus containing a sequence variation affecting gene expression is called an expression quantitative trait locus (eQTL). Understanding the tissue and cell type specificity of eQTL effects is essential for revealing the molecular mechanisms underlying disease phenotypes. However, so far the cell-state dependence of eQTL is poorly understood. In order to systematically assess the importance of cell state-specific eQTL, we propose to distinguish static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. We applied our framework to mouse gene expression data from four hematopoietic stages and related cellular traits. The different eQTL classes, although derived from the same expression data, represent functionally distinct types of eQTL. Importantly, conditional eQTL are well correlated with relevant hematological traits. These findings emphasize the condition specificity of many regulatory relationships, even if the conditions under study are related. This calls for due caution when transferring conclusions about regulatory mechanisms across cell types or tissues. The proposed classification will also help to unravel dynamic behaviors in many other kinds of QTL data.
Gene-based analysis has become popular in genomic research because of its appealing biological and statistical properties compared with those of a single-locus analysis. However, only a few, if any, studies have discussed a mapping of expression quantitative trait loci (eQTL) in a gene-based framework. No study has discussed ancestry-informative eQTL or investigated their roles in pharmacogenetics by integrating single nucleotide polymorphism (SNP)-based eQTL (s-eQTL) and gene-based eQTL (g-eQTL).
In this g-eQTL mapping study, the transcript expression levels of genes (transcript-level genes; T-genes) were correlated with the SNPs of genes (sequence-level genes; S-genes) by using a method of gene-based partial least squares (PLS). Ancestry-informative transcripts were identified using a rank-score-based multivariate association test, and ancestry-informative eQTL were identified using Fisher’s exact test. Furthermore, key ancestry-predictive eQTL were selected in a flexible discriminant analysis. We analyzed SNPs and gene expression of 210 independent people of African-, Asian- and European-descent. We identified numerous cis- and trans-acting g-eQTL and s-eQTL for each population by using PLS. We observed ancestry information enriched in eQTL. Furthermore, we identified 2 ancestry-informative eQTL associated with adverse drug reactions and/or drug response. Rs1045642, located on MDR1, is an ancestry-informative eQTL (P = 2.13E-13, using Fisher’s exact test) associated with adverse drug reactions to amitriptyline and nortriptyline and drug responses to morphine. Rs20455, located in KIF6, is an ancestry-informative eQTL (P = 2.76E-23, using Fisher’s exact test) associated with the response to statin drugs (e.g., pravastatin and atorvastatin). The ancestry-informative eQTL of drug biotransformation genes were also observed; cross-population cis-acting expression regulators included SPG7, TAP2, SLC7A7, and CYP4F2. Finally, we also identified key ancestry-predictive eQTL and established classification models with promising training and testing accuracies in separating samples from close populations.
In summary, we developed a gene-based PLS procedure and a SAS macro for identifying g-eQTL and s-eQTL. We established data archives of eQTL for global populations. The program and data archives are accessible at http://www.stat.sinica.edu.tw/hsinchou/genetics/eQTL/HapMapII.htm. Finally, the results from our investigations regarding the interrelationship between eQTL, ancestry information, and pharmacodynamics provide rich resources for future eQTL studies and practical applications in population genetics and medical genetics.
Gene-based approach; Expression quantitative trait locus (eQTL); Partial least squares (PLS); Ancestry-informative marker (AIM); Pharmacogenetics; Adverse drug reaction; Drug response; Drug biotransformation
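The abstract reports that ancestry-informative eQTL were identified with Fisher's exact test but does not describe the contingency table. As a rough illustration only, the sketch below computes a two-sided Fisher's exact P value for a hypothetical 2×2 table (for example, allele counts in two populations). The function name, the table layout, and the counts are assumptions, not the authors' SAS procedure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical allele counts: rows are two populations, columns are alleles.
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

With realistic sample sizes a library routine would normally be preferred; the pure-Python version is shown only to make the calculation explicit.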
There is considerable variability in the susceptibility of smokers to develop chronic obstructive pulmonary disease (COPD). The only known genetic risk factor is severe deficiency of α1-antitrypsin, which is present in 1–2% of individuals with COPD. We conducted a genome-wide association study (GWAS) in a homogenous case-control cohort from Bergen, Norway (823 COPD cases and 810 smoking controls) and evaluated the top 100 single nucleotide polymorphisms (SNPs) in the family-based International COPD Genetics Network (ICGN; 1891 Caucasian individuals from 606 pedigrees) study. The polymorphisms that showed replication were further evaluated in 389 subjects from the US National Emphysema Treatment Trial (NETT) and 472 controls from the Normative Aging Study (NAS) and then in a fourth cohort of 949 individuals from 127 extended pedigrees from the Boston Early-Onset COPD population. Logistic regression models with adjustment for covariates were used to analyze the case-control populations. Family-based association analyses were conducted for a diagnosis of COPD and lung function in the family populations. Two SNPs at the α-nicotinic acetylcholine receptor (CHRNA3/5) locus were identified in the genome-wide association study. They showed unambiguous replication in the ICGN family-based analysis and in the NETT case-control analysis with combined p-values of 1.48×10−10 (rs8034191) and 5.74×10−10 (rs1051730). Furthermore, these SNPs were significantly associated with lung function in both the ICGN and Boston Early-Onset COPD populations. The C allele of the rs8034191 SNP was estimated to have a population attributable risk for COPD of 12.2%. The association of the hedgehog interacting protein (HHIP) locus on chromosome 4 was also consistently replicated, but did not reach genome-wide significance levels.
Genome-wide significant association of the HHIP locus with lung function was identified in the Framingham Heart Study (Wilk et al., companion article in this issue of PLoS Genetics; doi:10.1371/journal.pgen.1000429). The CHRNA3/5 and the HHIP loci make a significant contribution to the risk of COPD. CHRNA3/5 is the same locus that has been implicated in the risk of lung cancer.
There is considerable variability in the susceptibility of smokers to develop chronic obstructive pulmonary disease (COPD), which is a heritable multi-factorial trait. Identifying the genetic determinants of COPD risk will have tremendous public health importance. This study describes the first genome-wide association study (GWAS) in COPD. We conducted a GWAS in a homogenous case-control cohort from Norway and evaluated the top 100 single nucleotide polymorphisms in the family-based International COPD Genetics Network. The polymorphisms that showed replication were further evaluated in subjects from the US National Emphysema Treatment Trial and controls from the Normative Aging Study and then in a fourth cohort of extended pedigrees from the Boston Early-Onset COPD population. Two polymorphisms in the α-nicotinic acetylcholine receptor 3/5 locus on chromosome 15 showed unambiguous evidence of association with COPD. This locus has previously been implicated in both smoking behavior and risk of lung cancer, suggesting the possibility of multiple functional polymorphisms in the region or a single polymorphism with wide phenotypic consequences. The hedgehog interacting protein (HHIP) locus on chromosome 4, which is associated with COPD, is also a significant risk locus for COPD.
Gene expression genetic studies in human tissues and cells identify cis- and trans-acting expression quantitative trait loci (eQTLs). These eQTLs provide insights into regulatory mechanisms underlying disease risk. However, few studies have systematically characterized eQTL results across cell and tissue types. We synthesized eQTL results from >50 datasets, including new primary data from human brain, peripheral plaque and kidney samples, in order to discover features of human eQTLs.
We find a substantial number of robust cis-eQTLs and far fewer trans-eQTLs consistent across tissues. Analysis of 45 full human GWAS scans indicates eQTLs are enriched overall, and above nSNPs, among positive statistical signals in genetic mapping studies, and account for a significant fraction of the strongest human trait effects. Expression QTLs are enriched for gene centricity, higher population allele frequencies, housekeeping genes, and coincidence with regulatory features, though there is little evidence of 5′ or 3′ positional bias. Several regulatory categories are not enriched, including microRNAs and their predicted binding sites and long, intergenic non-coding RNAs. Among the most tissue-ubiquitous cis-eQTLs, there is enrichment for genes involved in xenobiotic metabolism and mitochondrial function, suggesting these eQTLs may have adaptive origins. Several strong eQTLs (CDK5RAP2, NBPFs) coincide with regions of reported human lineage selection. The intersection of new kidney and plaque eQTLs with related GWAS suggests possible gene prioritization. For example, butyrophilins are now linked to arterial pathogenesis via multiple genetic and expression studies. Expression QTL and GWAS results are made available as a community resource through the NHLBI GRASP database [http://apps.nhlbi.nih.gov/grasp/].
Expression QTLs inform the interpretation of human trait variability, and may account for a greater fraction of phenotypic variability than protein-coding variants. The synthesis of available tissue eQTL data highlights many strong cis-eQTLs that may have important biologic roles and could serve as positive controls in future studies. Our results indicate some strong tissue-ubiquitous eQTLs may have adaptive origins in humans. Efforts to expand the genetic, splicing and tissue coverage of known eQTLs will provide further insights into human gene regulation.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-532) contains supplementary material, which is available to authorized users.
eQTL; RNA; Gene expression; Genomics; Transcriptome; GWAS; Genome-wide; Tissue; Cis; Trans
The discovery of expression quantitative trait loci (“eQTLs”) can help to unravel genetic contributions to complex traits. We identified genetic determinants of human liver gene expression variation using two independent collections of primary tissue profiled with Agilent (n = 206) and Illumina (n = 60) expression arrays and Illumina SNP genotyping (550K), and we also incorporated data from a published study (n = 266). We found that ∼30% of SNP-expression correlations in one study failed to replicate in either of the others, even at thresholds yielding high reproducibility in simulations, and we quantified numerous factors affecting reproducibility. Our data suggest that drug exposure, clinical descriptors, and unknown factors associated with tissue ascertainment and analysis have substantial effects on gene expression and that controlling for hidden confounding variables significantly increases replication rate. Furthermore, we found that reproducible eQTL SNPs were heavily enriched near gene starts and ends, and subsequently resequenced the promoters and 3′UTRs for 14 genes and tested the identified haplotypes using luciferase assays. For three genes, significant haplotype-specific in vitro functional differences correlated directly with expression levels, suggesting that many bona fide eQTLs result from functional variants that can be mechanistically isolated in a high-throughput fashion. Finally, given our study design, we were able to discover and validate hundreds of liver eQTLs. Many of these relate directly to complex traits for which liver-specific analyses are likely to be relevant, and we identified dozens of potential connections with disease-associated loci. These included previously characterized eQTL contributors to diabetes, drug response, and lipid levels, and they suggest novel candidates such as a role for NOD2 expression in leprosy risk and C2orf43 in prostate cancer. In general, the work presented here will be valuable for future efforts to precisely identify and functionally characterize genetic contributions to a variety of complex traits.
Many disease-associated genetic variants do not alter protein sequences and are difficult to precisely identify. Discovery of expression quantitative trait loci (eQTL), or correlations between genetic variants and gene expression levels, offers one means of addressing this challenge. However, eQTL studies in primary cells have several shortcomings. In particular, their reproducibility is largely unknown, the variables that generate unreliable associations are uncharacterized, and the resolution of their findings is constrained by linkage disequilibrium. We performed a three-way replication study of eQTLs in primary human livers. We demonstrated that ∼67% of cis-eQTL associations are replicated in an independent study and that known polymorphisms overlapping expression probes, SNP-to-gene distance, and unmeasured confounding variables all influence the replication rate. We fine-mapped 14 eQTLs and identified causative polymorphisms in the promoter or 3′UTR for 3 genes, suggesting that a considerable fraction of eQTLs are driven by proximal variants that are amenable to functional isolation. Finally, we found hundreds of overlaps between SNPs associated with complex traits and replicated eQTL SNPs. Our data provide both cautionary (i.e. non-reproducibility of many strong eQTLs) and optimistic (i.e. precise identification of functional non-coding variants) forecasts for future eQTL analyses and the complex traits that they influence.
Most genome-wide association studies consider genes that are located closest to single nucleotide polymorphisms (SNPs) that are highly significant for those studies. However, the significance of the associations between SNPs and candidate genes has not been fully determined. An alternative approach that used SNPs in expression quantitative trait loci (eQTL) was reported previously for Crohn’s disease; it was shown that eQTL-based preselection for follow-up studies was a useful approach for identifying risk loci from the results of moderately sized GWAS. In this study, we propose an approach that uses eQTL SNPs to support the functional relationships between an SNP and a candidate gene in a genome-wide association study. The genome-wide SNP genotypes and 10 biochemical measures (fasting glucose levels, BUN, serum albumin levels, AST, ALT, gamma GTP, total cholesterol, HDL cholesterol, triglycerides, and LDL cholesterol) were obtained from the Korean Association Resource (KARE) consortium. The eQTL SNPs were isolated from the SNP dataset based on the RegulomeDB eQTL-SNP data from the ENCODE projects and two recent eQTL reports. A total of 25,658 eQTL SNPs were tested for their association with the 10 metabolic traits in 2 Korean populations (Ansung and Ansan). The proportion of phenotypic variance explained by eQTL and non-eQTL SNPs showed that eQTL SNPs were more likely to be associated with the metabolic traits genetically compared with non-eQTL SNPs. Finally, via a meta-analysis of the two Korean populations, we identified 14 eQTL SNPs that were significantly associated with metabolic traits. These results suggest that our approach can be expanded to other genome-wide association studies.
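The abstract mentions a meta-analysis of the two Korean populations without naming the method. One common choice, shown here purely as an assumption, is an inverse-variance fixed-effect meta-analysis of per-cohort effect sizes. The function name and the input values are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect pooling of per-cohort effects.

    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical SNP effects on one trait in two cohorts.
    beta, se, z, p = fixed_effect_meta(betas=[0.2, 0.4], ses=[0.1, 0.1])
    print(f"pooled beta = {beta:.3f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.2g}")
```

A fixed-effect model assumes the two cohorts share one true effect; if the effects differ between populations, a random-effects model may be more appropriate.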
We examined the association between single-nucleotide polymorphisms (SNPs) previously associated with chronic obstructive pulmonary disease (COPD) and/or lung function with COPD and COPD-related phenotypes in a novel cohort of patients with severe to very severe COPD. We examined 315 cases of COPD and 330 Caucasian control smokers from Poland. We included three SNPs previously associated with COPD: rs7671167 (FAM13A), rs13180 (IREB2), and rs8034191 (CHRNA3/5), and four SNPs associated with lung function in a genome-wide association study of general population samples: rs2070600 (AGER), rs11134242 (ADCY2), rs4316710 (THSD4), and rs17096090 (INTS12). We tested for associations with severe COPD and COPD-related phenotypes, including lung function, smoking behavior, and body mass index. Subjects with COPD were older (average age 62 versus 58 years, P < 0.01), with more pack-years of smoking (45 versus 33 pack-years, P < 0.01). CHRNA3/5 (odds ratio [OR], 1.89; 95% confidence interval [CI], 1.5–2.4; P = 7.4 × 10−7), IREB2 (OR, 0.69; 95% CI, 0.5–0.9; P = 3.4 × 10−3), and ADCY2 (OR, 1.35; 95% CI, 1.1–1.7; P = 0.01) demonstrated significant associations with COPD. FAM13A (OR, 0.8; 95% CI, 0.7–1.0; P = 0.11) approached statistical significance. FAM13A and ADCY2 also demonstrated a significant association with lung function. Thus, in severe to very severe COPD, we demonstrate a replication of association between two SNPs previously associated with COPD (CHRNA3/5 and IREB2), as well as an association with COPD of one locus initially associated with lung function (ADCY2).
chronic obstructive pulmonary disease; genetic association analysis; lung function; smoking; nicotine addiction
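The odds ratios and 95% confidence intervals above are reported without the underlying model. As a minimal sketch of how such quantities relate, the code below computes an unadjusted odds ratio with a Wald (Woolf) interval from a 2×2 table. The function name and the counts are hypothetical, and the published estimates may come from a covariate-adjusted model rather than this simple calculation.

```python
import math


def odds_ratio_ci(a, b, c, d, z_crit=1.96):
    """Odds ratio and Wald CI for the 2x2 table [[a, b], [c, d]].

    a, b: cases with / without the risk allele
    c, d: controls with / without the risk allele
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) by Woolf's method.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that excludes 1 corresponds to a nominally significant association at the matching alpha level.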
Rationale: A genome-wide association study (GWAS) for circulating chronic obstructive pulmonary disease (COPD) biomarkers could identify genetic determinants of biomarker levels and COPD susceptibility.
Objectives: To identify genetic variants of circulating protein biomarkers and novel genetic determinants of COPD.
Methods: GWAS was performed for two pneumoproteins, Clara cell secretory protein (CC16) and surfactant protein D (SP-D), and five systemic inflammatory markers (C-reactive protein, fibrinogen, IL-6, IL-8, and tumor necrosis factor-α) in 1,951 subjects with COPD. For genome-wide significant single nucleotide polymorphisms (SNPs) (P < 1 × 10−8), association with COPD susceptibility was tested in 2,939 cases with COPD and 1,380 smoking control subjects. The association of candidate SNPs with mRNA expression in induced sputum was also elucidated.
Measurements and Main Results: Genome-wide significant susceptibility loci affecting biomarker levels were found only for the two pneumoproteins. Two discrete loci affecting CC16, one region near the CC16 coding gene (SCGB1A1) on chromosome 11 and another locus approximately 25 Mb away from SCGB1A1, were identified, whereas multiple SNPs on chromosomes 6 and 16, in addition to SNPs near SFTPD, had genome-wide significant associations with SP-D levels. Several SNPs affecting circulating CC16 levels were significantly associated with sputum mRNA expression of SCGB1A1 (P = 0.009–0.03). Several SNPs highly associated with CC16 or SP-D levels were nominally associated with COPD in a collaborative GWAS (P = 0.001–0.049), although these COPD associations were not replicated in two additional cohorts.
Conclusions: Distant genetic loci and biomarker-coding genes affect circulating levels of COPD-related pneumoproteins. A subset of these protein quantitative trait loci may influence their gene expression in the lung and/or COPD susceptibility.
Clinical trial registered with www.clinicaltrials.gov (NCT 00292552).
biomarker; chronic obstructive pulmonary disease; genome-wide association study
Genome-wide association studies (GWAS) have identified loci reproducibly associated with pulmonary diseases; however, the molecular mechanisms underlying these associations are largely unknown. The objectives of this study were to discover genetic variants affecting gene expression in human lung tissue, to refine susceptibility loci for asthma identified in GWAS, and to use the genetics of gene expression and network analyses to find key molecular drivers of asthma. We performed a genome-wide search for expression quantitative trait loci (eQTL) in 1,111 human lung samples. The lung eQTL dataset was then used to inform asthma genetic studies reported in the literature. The top ranked lung eQTLs were integrated with the GWAS on asthma reported by the GABRIEL consortium to generate a Bayesian gene expression network for discovery of novel molecular pathways underpinning asthma. We detected 17,178 cis- and 593 trans- lung eQTLs, which can be used to explore the functional consequences of loci associated with lung diseases and traits. Some strong eQTLs are also asthma susceptibility loci. For example, rs3859192 on chr17q21 is robustly associated with the mRNA levels of GSDMA (P = 3.55×10−151). The genetic-gene expression network identified the SOCS3 pathway as one of the key drivers of asthma. The eQTLs and gene networks identified in this study are powerful tools for elucidating the causal mechanisms underlying pulmonary disease. This data resource offers much-needed support to pinpoint the causal genes and characterize the molecular function of gene variants associated with lung diseases.
Recent genome-wide association studies (GWAS) have identified genetic variants associated with lung diseases. The challenge now is to find the causal genes in GWAS–nominated chromosomal regions and to characterize the molecular function of disease-associated genetic variants. In this paper, we describe an international effort to systematically capture the genetic architecture of gene expression regulation in human lung. By studying lung specimens from 1,111 individuals of European ancestry, we found a large number of genetic variants affecting gene expression in the lung, or lung expression quantitative trait loci (eQTL). These lung eQTLs will serve as an important resource to aid in the understanding of the molecular underpinnings of lung biology and its disruption in disease. To demonstrate the utility of this lung eQTL dataset, we integrated our data with previous genetic studies on asthma. Through integrative techniques, we identified causal variants and genes in GWAS–nominated loci and found key molecular drivers for asthma. We feel that sharing our lung eQTLs dataset with the scientific community will leverage the impact of previous large-scale GWAS on lung diseases and function by providing much needed functional information to understand the molecular changes introduced by the susceptibility genetic variants.
Genome-wide gene expression profiling has been extensively used to generate biological hypotheses based on differential expression. Recently, many studies have used microarrays to measure gene expression levels across genetic mapping populations. These gene expression phenotypes have been used for genome-wide association analyses, an analysis referred to as expression QTL (eQTL) mapping. Here, eQTL analysis was performed in adipose tissue from 28 inbred strains of mice. We focused our analysis on “trans-eQTL bands”, defined as instances in which the expression patterns of many genes were all associated with a common genetic locus. Genes comprising trans-eQTL bands were screened for enrichments in functional gene sets representing known biological pathways, and genes located at associated trans-eQTL band loci were considered candidate transcriptional modulators. We demonstrate that these patterns were enriched for previously characterized relationships between known upstream transcriptional regulators and their downstream target genes. Moreover, we used this strategy to identify both novel regulators and novel members of known pathways. Finally, based on a putative regulatory relationship identified in our analysis, we identified and validated a previously uncharacterized role for cyclin H in the regulation of oxidative phosphorylation. We believe that the specific molecular hypotheses generated in this study will reveal many additional pathway members and regulators, and that the analysis approaches described herein will be broadly applicable to other eQTL data sets.
Genome-wide association (GWA) analyses seek to relate variation of phenotype to underlying (and presumably causative) variation in genotype. Recently, many GWA studies have identified candidate genes underlying disease phenotypes such as diabetes, heart disease, and cancer risk. Many groups have also performed GWA using variation in gene expression levels as the input phenotype. These expression QTL (eQTL) studies have provided important clues as to the genetic basis of gene expression regulation. Here, we performed an eQTL study in mouse adipose tissue. We then developed a systematic analysis method to relate these patterns of eQTL associations to biological pathways. Based on this approach, we identified putative roles for thousands of candidate upstream regulators and candidate pathway members in relation to specific biological pathways. Statistical analysis showed that these predictions were highly enriched for true genetic modulators of these pathways. Based on these predictions, we also experimentally validated a role for one particular gene, cyclin H, in the regulation of oxidative phosphorylation. These findings illustrate a new analysis method for relating eQTL studies to biological pathways and identify cyclin H as a novel key regulator of cellular energy metabolism.
Elucidating the genetic basis underlying hepatic gene expression variability is important for understanding the aetiology of disease and variation in drug metabolism. To date, no genome-wide expression quantitative trait locus (eQTL) analysis has been conducted in the Han Chinese population, the largest ethnic group in the world.
We performed genome-wide eQTL mapping in a set of Han Chinese liver tissue samples (n=64). The data were then compared with published eQTL data from a Caucasian population. We then correlated these eQTLs with important pharmacogenes and with single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS), in particular those identified in the Asian population.
Our analyses identified 1669 significant eQTLs (false discovery rate (FDR) < 0.05). We found that 41% of Asian eQTLs were also eQTLs in Caucasians at the genome-wide significance level (p=10−8). Both cis- and trans-eQTLs in the Asian population were also more likely to be eQTLs in Caucasians (p<10−4). Enrichment analyses revealed that trait-associated GWAS-SNPs were enriched within the eQTLs identified in our data, as were the GWAS-SNPs specifically identified in Asian populations in a separate analysis (p<0.001 for both). We also found that hepatic expression of very important pharmacogenetic (VIP) genes (n=44) and of a manually curated list of major genes involved in pharmacokinetics (n=341) was more likely to be controlled by eQTLs (p<0.002 for both).
Our study provided, for the first time, a comprehensive hepatic eQTL analysis in a non-European population, further generating valuable data for characterising the genetic basis of human diseases and pharmacogenetic traits.
Clinical genetics; Genetics; Genome-wide; Molecular genetics
Profiles of sequence variants that influence gene transcription are very important for understanding mechanisms that affect phenotypic variation and disease susceptibility. Using genotypes at 1.4 million SNPs and a comprehensive transcriptional profile of 15,454 coding genes and 6,113 lincRNA genes obtained from peripheral blood cells of 298 Japanese individuals, we mapped expression quantitative trait loci (eQTLs). We identified 3,804 cis-eQTLs (within 500 kb from target genes) and 165 trans-eQTLs (>500 kb away or on different chromosomes). Cis-eQTLs were often located in transcribed or adjacent regions of genes; among these regions, 5′ untranslated regions and 5′ flanking regions had the largest effects. Epigenetic evidence for regulatory potential accumulated in public databases explained the magnitude of the effects of our eQTLs. Cis-eQTLs were often located near the respective target genes, if not within genes. Large effect sizes were observed with eQTLs near target genes, and effect sizes were clearly attenuated as the eQTL distance from the gene increased. Using a very stringent significance threshold, we identified 165 large-effect trans-eQTLs. We used our eQTL map to assess 8,069 disease-associated SNPs identified in 1,436 genome-wide association studies (GWAS). For 148 of the GWAS-identified SNPs, we identified genes that might be truly causative but that the GWAS might have failed to identify; for example, TUFM (P = 3.3E-48) was identified for inflammatory bowel disease (early onset); ZFP90 (P = 4.4E-34) for ulcerative colitis; and IDUA (P = 2.2E-11) for Parkinson's disease. We identified four genes (P<2.0E-14) that might be related to three diseases and two hematological traits; the expression of each is regulated by trans-eQTLs on a different chromosome from the gene.
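The cis/trans definition quoted above (cis within 500 kb of the target gene, trans more than 500 kb away or on a different chromosome) can be illustrated with a minimal Python sketch. The function name, its arguments, and the choice to measure distance from the nearest gene boundary are assumptions made for illustration; the abstract does not say how distance was measured.

```python
CIS_WINDOW_BP = 500_000  # 500 kb cis window, as quoted in the abstract


def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=CIS_WINDOW_BP):
    """Label a SNP-gene pair as 'cis' or 'trans' (hypothetical helper).

    Assumption: distance is measured from the SNP to the nearest gene
    boundary, and a SNP inside the gene has distance 0.
    """
    # Different chromosomes are always trans.
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start <= snp_pos <= gene_end:
        distance = 0
    else:
        distance = min(abs(snp_pos - gene_start), abs(snp_pos - gene_end))
    # Same chromosome: cis if within the window, otherwise trans.
    return "cis" if distance <= window else "trans"


if __name__ == "__main__":
    # Illustrative coordinates only, not taken from the study.
    print(classify_eqtl("16", 28_700_000, "16", 28_850_000, 28_860_000))
    print(classify_eqtl("16", 28_700_000, "4", 980_000, 1_000_000))
```

Passing `window` as a parameter keeps the sketch usable with other cis definitions, since the window size differs between studies.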
The genetic risk factors for chronic obstructive pulmonary disease (COPD) are still largely unknown. To date, genome-wide association studies (GWASs) of limited size have identified several novel risk loci for COPD at CHRNA3/CHRNA5/IREB2, HHIP and FAM13A; additional loci may be identified through larger studies. We performed a GWAS using a total of 3499 cases and 1922 control subjects from four cohorts: the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE); the Normative Aging Study (NAS) and National Emphysema Treatment Trial (NETT); Bergen, Norway (GenKOLS); and the COPDGene study. Genotyping was performed on Illumina platforms with additional markers imputed using 1000 Genomes data; results were summarized using fixed-effect meta-analysis. We identified a new genome-wide significant locus on chromosome 19q13 (rs7937, OR = 0.74, P = 2.9 × 10−9). Genotyping this single nucleotide polymorphism (SNP) and another nearby SNP in linkage disequilibrium (rs2604894) in 2859 subjects from the family-based International COPD Genetics Network study (ICGN) demonstrated supportive evidence for association for COPD (P = 0.28 and 0.11 for rs7937 and rs2604894), pre-bronchodilator FEV1 (P = 0.08 and 0.04) and severe (GOLD 3&4) COPD (P = 0.09 and 0.017). This region includes RAB4B, EGLN2, MIA and CYP2A6, and has previously been identified in association with cigarette smoking behavior.
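As a rough illustration of how per-cohort results can be combined by fixed-effect meta-analysis, the following Python sketch uses inverse-variance weighting. Inverse-variance weighting is a common form of fixed-effect meta-analysis, but the abstract does not state which weighting scheme was used, so this is an assumption; the function name and the input values are hypothetical and are not results from the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis (illustrative sketch).

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   per-cohort standard errors
    Returns the pooled estimate, its standard error, the z statistic,
    and a two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Made-up log odds ratios and standard errors for four cohorts.
    beta, se, z, p = fixed_effect_meta(
        [-0.30, -0.25, -0.35, -0.28], [0.10, 0.12, 0.15, 0.09]
    )
    print(f"pooled OR = {math.exp(beta):.2f}, z = {z:.2f}, p = {p:.2e}")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precise studies.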
Amyotrophic lateral sclerosis (ALS) is a progressive, neurodegenerative disease characterized by loss of upper and lower motor neurons. ALS is considered to be a complex trait and genome-wide association studies (GWAS) have implicated a few susceptibility loci. However, many more causal loci remain to be discovered. Since it has been shown that genetic variants associated with complex traits are more likely to be eQTLs than frequency-matched variants from GWAS platforms, we conducted a two-stage genome-wide screening for eQTLs associated with ALS. In addition, we applied an eQTL analysis to fine-map association loci. Expression profiles using peripheral blood of 323 sporadic ALS patients and 413 controls were mapped to genome-wide genotyping data. Subsequently, data from a two-stage GWAS (3,568 patients and 10,163 controls) were used to prioritize eQTLs identified in the first stage (162 ALS, 207 controls). These prioritized eQTLs were carried forward to the second sample with both gene-expression and genotyping data (161 ALS, 206 controls). Replicated eQTL SNPs were then tested for association in the second-stage GWAS data to find disease-associated SNPs that survived correction for multiple testing. We thus identified twelve cis eQTLs with nominally significant associations in the second-stage GWAS data. Eight SNP-transcript pairs of highest significance (lowest p = 1.27×10−51) withstood multiple-testing correction in the second stage and modulated CYP27A1 gene expression. Additionally, we show that C9orf72 appears to be the only gene in the 9p21.2 locus that is regulated in cis, showing the potential of this approach in identifying causative genes in association loci in ALS. This study has identified candidate genes for sporadic ALS, most notably CYP27A1. Mutations in CYP27A1 cause cerebrotendinous xanthomatosis, which can present as a clinical mimic of ALS with progressive upper motor neuron loss, making it a plausible susceptibility gene for ALS.
Many disease-associated variants affect gene expression levels (expression quantitative trait loci, eQTLs) and expression profiling using next generation sequencing (NGS) technology is a powerful way to detect these eQTLs. We analyzed 94 total blood samples from healthy volunteers with DeepSAGE to gain specific insight into how genetic variants affect the expression of genes and lengths of 3′-untranslated regions (3′-UTRs). We detected previously unknown cis-eQTL effects for GWAS hits in disease- and physiology-associated traits. Apart from cis-eQTLs that are typically easily identifiable using microarrays or RNA-sequencing, DeepSAGE also revealed many cis-eQTLs for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. We also identified and confirmed SNPs that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of messenger RNAs (mRNA). We then combined the power of RNA-sequencing with DeepSAGE by performing a meta-analysis of three datasets, leading to the identification of many more cis-eQTLs. Our results indicate that DeepSAGE data is useful for eQTL mapping of known and unknown transcripts, and for identifying SNPs that affect alternative polyadenylation. Because of the inherent differences between DeepSAGE and RNA-sequencing, our complementary, integrative approach leads to greater insight into the molecular consequences of many disease-associated variants.
Many genetic variants that are associated with diseases also affect gene expression levels. We used a next generation sequencing approach targeting 3′ transcript ends (DeepSAGE) to gain specific insight into how genetic variants affect the expression of genes and the usage and length of 3′-untranslated regions. We detected many associations for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. Some of these variants are also associated with disease. We also identified and confirmed variants that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of mRNAs. We conclude that DeepSAGE is useful for detecting eQTL effects on both known and unknown transcripts, and for identifying variants that affect alternative polyadenylation.
Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Although recent genome-wide association studies (GWAS) have contributed to discovery of SLE susceptibility genes, few studies have been performed in Asian populations. Here, we report a GWAS for SLE examining 891 SLE cases and 3,384 controls and multi-stage replication studies examining 1,387 SLE cases and 28,564 controls in Japanese subjects. Considering that expression quantitative trait loci (eQTLs) have been implicated in genetic risks for autoimmune diseases, we integrated an eQTL study into the results of the GWAS. We observed enrichment of cis-eQTL positive loci among the known SLE susceptibility loci (30.8%) compared to the genome-wide SNPs (6.9%). In addition, we identified a novel association of a variant in the AF4/FMR2 family, member 1 (AFF1) gene at 4q21 with SLE susceptibility (rs340630; P = 8.3×10−9, odds ratio = 1.21). The risk A allele of rs340630 demonstrated a cis-eQTL effect on the AFF1 transcript with enhanced expression levels (P<0.05). As AFF1 transcripts were prominently expressed in CD4+ and CD19+ peripheral blood lymphocytes, up-regulation of AFF1 may cause abnormalities in these lymphocytes, leading to disease onset.
Although recent genome-wide association study (GWAS) approaches have successfully contributed to disease gene discovery, many susceptibility loci are known to remain uncaptured because of the strict significance threshold for multiple hypothesis testing. Therefore, prioritization of GWAS results by incorporating additional information is recommended. Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Considering that abnormalities in B cell activity play essential roles in SLE, prioritization based on an expression quantitative trait loci (eQTL) study for B cells would be a promising approach. In this study, we report a GWAS and multi-stage replication studies for SLE examining 2,278 SLE cases and 31,948 controls in Japanese subjects. We integrated an eQTL study into the results of the GWAS and identified AFF1 as a novel SLE susceptibility locus. We also confirmed a cis-regulatory effect of the locus on the AFF1 transcript. Our study is one of the initial successes in detecting a novel genetic locus using an eQTL study, and it should contribute to our understanding of the genetic loci left uncaptured by standard GWAS approaches.
Cigarette smoking is the major environmental risk factor for chronic obstructive pulmonary disease (COPD). Genome-wide association studies have provided compelling associations for three loci with COPD. In this study, we aimed to estimate direct, i.e., independent from smoking, and indirect effects of those loci on COPD development using mediation analysis. We included a total of 3,424 COPD cases and 1,872 unaffected controls with data on two smoking-related phenotypes: lifetime average smoking intensity and cumulative exposure to tobacco smoke (pack years). Our analysis revealed that effects of two linked variants (rs1051730 and rs8034191) in the AGPHD1/CHRNA3 cluster on COPD development are significantly, yet not entirely, mediated by the smoking-related phenotypes. Approximately 30 % of the total effect of variants in the AGPHD1/CHRNA3 cluster on COPD development was mediated by pack years. Simultaneous analysis of modestly (r2 = 0.21) linked markers in CHRNA3 and IREB2 revealed that an even larger (~42 %) proportion of the total effect of the CHRNA3 locus on COPD was mediated by pack years after adjustment for an IREB2 single nucleotide polymorphism. This study confirms the existence of direct effects of the AGPHD1/CHRNA3, IREB2, FAM13A and HHIP loci on COPD development. While the association of the AGPHD1/CHRNA3 locus with COPD is significantly mediated by smoking-related phenotypes, IREB2 appears to affect COPD independently of smoking.
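The "proportion mediated" figures quoted above can be made concrete with a small product-of-coefficients sketch in Python. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify the estimator, the function names are hypothetical, the coefficients are invented, and the decomposition total = direct + indirect is exact only for linear models (for a binary outcome such as COPD it is an approximation).

```python
import math


def proportion_mediated(a, b, direct):
    """Share of the total effect that runs through the mediator.

    a:      effect of the variant on the mediator (e.g. pack years)
    b:      effect of the mediator on the outcome, adjusted for the variant
    direct: effect of the variant on the outcome, adjusted for the mediator
    Assumes total effect = direct + a * b (exact in linear models only).
    """
    indirect = a * b
    return indirect / (indirect + direct)


def sobel_z(a, se_a, b, se_b):
    """Sobel z statistic for the indirect effect a * b."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


if __name__ == "__main__":
    # Invented coefficients chosen so that roughly 30% is mediated.
    a, b, direct = 0.5, 0.6, 0.7
    print(f"proportion mediated = {proportion_mediated(a, b, direct):.2f}")
    print(f"Sobel z = {sobel_z(a, 0.1, b, 0.1):.2f}")
```

A proportion between 0 and 1 corresponds to partial mediation; a direct effect that stays clearly non-zero after adjustment is what the abstract describes as a direct effect independent of smoking.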
Rationale: Genome-wide association studies (GWAS) have identified loci influencing lung function, but fewer genes influencing chronic obstructive pulmonary disease (COPD) are known.
Objectives: Perform meta-analyses of GWAS for airflow obstruction, a key pathophysiologic characteristic of COPD assessed by spirometry, in population-based cohorts examining all participants, ever smokers, never smokers, asthma-free participants, and more severe cases.
Methods: Fifteen cohorts were studied for discovery (3,368 affected; 29,507 unaffected), and a population-based family study and a meta-analysis of case-control studies were used for replication and regional follow-up (3,837 cases; 4,479 control subjects). Airflow obstruction was defined as FEV1 and its ratio to FVC (FEV1/FVC) both less than their respective lower limits of normal as determined by published reference equations.
Measurements and Main Results: The discovery meta-analyses identified one region on chromosome 15q25.1 meeting genome-wide significance in ever smokers that includes AGPHD1, IREB2, and CHRNA5/CHRNA3 genes. The region was also modestly associated among never smokers. Gene expression studies confirmed the presence of CHRNA5/3 in lung, airway smooth muscle, and bronchial epithelial cells. A single-nucleotide polymorphism in HTR4, a gene previously related to FEV1/FVC, achieved genome-wide statistical significance in combined meta-analysis. Top single-nucleotide polymorphisms in ADAM19, RARB, PPAP2B, and ADAMTS19 were nominally replicated in the COPD meta-analysis.
Conclusions: These results suggest an important role for the CHRNA5/3 region as a genetic risk factor for airflow obstruction that may be independent of smoking and implicate the HTR4 gene in the etiology of airflow obstruction.
chronic obstructive pulmonary disease; single-nucleotide polymorphism; genes
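The airflow obstruction definition given in the Methods above (FEV1 and FEV1/FVC both below their lower limits of normal) can be written as a simple predicate. The Python sketch below is illustrative: the function and parameter names are hypothetical, and the lower limits of normal are assumed to be supplied by the caller from published reference equations, which are not reproduced here.

```python
def has_airflow_obstruction(fev1, fvc, fev1_lln, ratio_lln):
    """Return True if both FEV1 and FEV1/FVC fall below their lower limits.

    fev1, fvc:  measured values in litres
    fev1_lln:   lower limit of normal for FEV1 (from a reference equation)
    ratio_lln:  lower limit of normal for FEV1/FVC (same source)
    """
    if fvc <= 0:
        raise ValueError("FVC must be positive")
    ratio = fev1 / fvc
    # Both criteria must be met, mirroring the word "both" in the definition.
    return fev1 < fev1_lln and ratio < ratio_lln


if __name__ == "__main__":
    # Invented spirometry values, for illustration only.
    print(has_airflow_obstruction(fev1=1.8, fvc=3.6, fev1_lln=2.4, ratio_lln=0.65))
    print(has_airflow_obstruction(fev1=3.0, fvc=3.8, fev1_lln=2.4, ratio_lln=0.65))
```

Requiring both criteria is stricter than a ratio-only definition: a participant with a low FEV1/FVC but a preserved FEV1 is not classified as affected under this rule.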
The development of COPD in subjects with alpha-1 antitrypsin (AAT) deficiency is likely to be influenced by modifier genes. Genome-wide association studies and integrative genomics approaches in COPD have demonstrated significant associations with SNPs in the chromosome 15q region that includes CHRNA3 (cholinergic nicotine receptor alpha3) and IREB2 (iron regulatory binding protein 2).
We investigated whether SNPs in the chromosome 15q region would be modifiers for lung function and COPD in AAT deficiency.
The current analysis included 378 PIZZ subjects in the AAT Genetic Modifiers Study and a replication cohort of 458 subjects from the UK AAT Deficiency National Registry. Nine SNPs in LOC123688, CHRNA3 and IREB2 were selected for genotyping. FEV1 percent of predicted and FEV1/FVC ratio were analyzed as quantitative phenotypes. Family-based association analysis was performed in the AAT Genetic Modifiers Study. In the replication set, general linear models were used for quantitative phenotypes and logistic regression models were used for the presence/absence of emphysema or COPD.
Three SNPs (rs2568494 in IREB2, rs8034191 in LOC123688, and rs1051730 in CHRNA3) were associated with pre-bronchodilator FEV1 percent of predicted in the AAT Genetic Modifiers Study. Two SNPs (rs2568494 and rs1051730) were associated with the post-bronchodilator FEV1 percent of predicted and pre-bronchodilator FEV1/FVC ratio; SNP-by-gender interactions were observed. In the UK National Registry dataset, rs2568494 was significantly associated with emphysema in the male subgroup; significant SNP-by-smoking interactions were observed.
IREB2 and CHRNA3 are potential genetic modifiers of COPD phenotypes in individuals with severe AAT deficiency and may be sex-specific in their impact.
CHRNA3; Chronic obstructive pulmonary disease; Genetic association analysis; Genetic modifiers; IREB2
In recent years, genome-wide association studies (GWAS) have uncovered numerous chromosomal loci associated with various electrocardiographic traits and cardiac arrhythmia predisposition. A considerable fraction of these loci lie within inter-genic regions. The underlying trait-associated variants likely reside in regulatory regions and exert their effect by modulating gene expression. Hence, the key to unraveling the molecular mechanisms underlying these cardiac traits is to interrogate variants for association with differential transcript abundance by expression quantitative trait locus (eQTL) analysis. In this study we conducted an eQTL analysis of human heart. For a total of 129 left ventricular samples that were collected from non-diseased human donor hearts, genome-wide transcript abundance and genotypes were determined using microarrays. Each of the 18,402 transcripts and 897,683 SNP genotypes that remained after pre-processing and stringent quality control was tested for eQTL effects. We identified 771 eQTLs, regulating 429 unique transcripts. Overlaying these eQTLs with cardiac GWAS loci identified novel candidates for studies aimed at elucidating the functional and transcriptional impact of these loci. Thus, this work provides for the first time a comprehensive eQTL map of human heart: a powerful and unique resource that enables systems genetics approaches for the study of cardiac traits.
One major expectation from the transcriptome in humans is to characterize the biological basis of associations identified by genome-wide association studies. So far, few cis expression quantitative trait loci (eQTLs) have been reliably related to disease susceptibility. Trans-regulating mechanisms may play a more prominent role in disease susceptibility. We analyzed 12,808 genes detected in at least 5% of circulating monocyte samples from a population-based sample of 1,490 unrelated European subjects. We applied a method for extracting expression patterns, independent component analysis, to identify sets of co-regulated genes. These patterns were then related to 675,350 SNPs to identify major trans-acting regulators. We detected three genomic regions significantly associated with co-regulated gene modules. Association of these loci with multiple expression traits was replicated in Cardiogenics, an independent study in which expression profiles of monocytes were available in 758 subjects. The locus 12q13 (lead SNP rs11171739), previously identified as a type 1 diabetes locus, was associated with a pattern including two cis eQTLs, RPS26 and SUOX, and 5 trans eQTLs, one of which (MADCAM1) is a potential candidate for mediating T1D susceptibility. The locus 12q24 (lead SNP rs653178), which has demonstrated extensive disease pleiotropy, including type 1 diabetes, hypertension, and celiac disease, was associated with a pattern strongly correlated with blood pressure level. The strongest trans eQTL in this pattern was CRIP1, a known marker of cellular proliferation in cancer. The locus 12q15 (lead SNP rs11177644) was associated with a pattern driven by two cis eQTLs, LYZ and YEATS4, and including 34 trans eQTLs, several of them tumor-related genes.
This study shows that a method exploiting the structure of co-expressions among genes can help identify genomic regions involved in trans regulation of sets of genes and can provide clues for understanding the mechanisms linking genome-wide association loci to disease.
One major expectation from the transcriptome in humans is to help characterize the biological basis of associations identified by genome-wide association studies. Here, we take advantage of recent technical and methodological advances to examine the influence of natural genetic variability on >12,000 genes expressed in the monocyte, a blood cell playing a key role in immunity-related disorders and atherosclerosis. By examining 1,490 European population-based subjects, we identify three regions of the genome reproducibly associated with specific patterns of gene expression. Two of these regions overlap genetic variants previously known to be involved in the susceptibility to type 1 diabetes, celiac disease, and hypertension. Genes whose expression is modulated by these genetic variants may act as mediators in the causal relationship linking the variability of the genome to complex disease. These findings illustrate how integration of genetic and transcriptomic data at an epidemiological scale can help decipher the genetic basis of complex diseases.
For many complex traits, associated genetic variants have been found. However, it is still mostly unclear through which downstream mechanisms these variants cause these phenotypes. Knowledge of these intermediate steps is crucial to understand pathogenesis, while also providing leads for potential pharmacological intervention. Here we relied upon natural human genetic variation to identify effects of these variants on trans-gene expression (expression quantitative trait locus mapping, eQTL) in whole peripheral blood from 1,469 unrelated individuals. We looked at 1,167 published trait- or disease-associated SNPs and observed trans-eQTL effects on 113 different genes, of which we replicated 46 in monocytes of 1,490 different individuals and 18 in a smaller dataset that comprised subcutaneous adipose, visceral adipose, liver tissue, and muscle tissue. HLA single-nucleotide polymorphisms (SNPs) were 10-fold enriched for trans-eQTLs: 48% of the trans-acting SNPs map within the HLA, including ulcerative colitis susceptibility variants that affect plausible candidate genes AOAH and TRBV18 in trans. We identified 18 pairs of unlinked SNPs associated with the same phenotype and affecting expression of the same trans-gene (21 times more than expected, P<10−16). This was particularly pronounced for mean platelet volume (MPV): Two independent SNPs significantly affect the well-known blood coagulation genes GP9 and F13A1 but also C19orf33, SAMD14, VCL, and GNG11. Several of these SNPs have a substantially higher effect on the downstream trans-genes than on the eventual phenotypes, supporting the concept that the effects of these SNPs on expression seem to be much less multifactorial. Therefore, these trans-eQTLs could well represent some of the intermediate genes that connect genetic variants with their eventual complex phenotypic outcomes.
Many genetic variants have been found to be associated with diseases. However, for many of these genetic variants, it remains unclear how they exert their effect on the eventual phenotype. We investigated genetic variants that are known to be associated with diseases and complex phenotypes and assessed whether these variants were also associated with gene expression levels in a set of 1,469 unrelated whole blood samples. For several diseases, such as type 1 diabetes and ulcerative colitis, we observed that genetic variants affect the expression of genes not implicated before. For complex traits, such as mean platelet volume and mean corpuscular volume, we observed that independent genetic variants on different chromosomes influence the expression of exactly the same genes. For mean platelet volume, these genes include well-known blood coagulation genes but also genes with still unknown functions. These results indicate that, by systematically correlating genetic variation with gene expression levels, it is possible to identify downstream genes, which provide important avenues for further research.