Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.
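The CRE-SNP overlap features mentioned above can be illustrated with a minimal Python sketch. The track names, coordinates, and helper functions here are hypothetical and are not taken from the study; intervals are assumed to be half-open, sorted by start, and non-overlapping within each track. The resulting 0/1 vector is the kind of input a random forest or similar classifier could consume.

```python
from bisect import bisect_right

def overlaps_any(pos, intervals):
    """Return True if pos lies in any half-open [start, end) interval.

    Assumes `intervals` is sorted by start and non-overlapping.
    """
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

def overlap_features(snp_pos, cre_tracks):
    """One 0/1 feature per (cell type, CRE class) track for a single SNP."""
    return {name: int(overlaps_any(snp_pos, ivs))
            for name, ivs in sorted(cre_tracks.items())}

# Hypothetical CRE tracks on one chromosome (illustrative coordinates only).
tracks = {
    "LCL:H3K27ac": [(100, 200), (500, 650)],
    "LCL:DNase": [(150, 180)],
    "liver:H3K27ac": [(900, 1000)],
}
features = overlap_features(160, tracks)
```

Here the SNP at position 160 overlaps both LCL tracks but not the liver track, so only the two LCL features are set to 1.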
When interpreting genome-wide association studies showing that specific genetic variants are associated with disease risk, scientists look for a link between the genetic variant and a biological mechanism behind that disease. One functional mechanism is that the genetic variant may influence gene transcription via a co-localized genomic regulatory element, such as a transcription factor binding site within an open chromatin region. Often this type of regulation occurs in some cell types but not others. In this study, we look across eleven gene expression studies with seven cell types and consider how genetic transcription regulators, or eQTLs, replicate within and between cell types. We identify pervasive allelic heterogeneity, or transcriptional control of a single gene by multiple, independent eQTLs. We integrate extensive data on cell type specific regulatory elements from ENCODE to identify general methods of transcription regulation through enrichment of eQTLs within regulatory elements. We also build a classifier to predict eQTL replication across cell types. The results in this paper present a path to an integrative, predictive approach to improve our ability to understand the mechanistic basis of human phenotypic variation.
DNA sequence variation causes changes in gene expression, which in turn has profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an “expression quantitative trait locus” (eQTL). Whereas the impact of cellular context on expression levels in general is well established, much less is known about the cell-state specificity of eQTL. Previous studies differed with respect to how “dynamic eQTL” were defined. Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. Further, we introduce a new approach to simultaneously infer eQTL from different cell types. By using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total of 3,941 eQTL, we detected 2,729 static eQTL, 1,187 eQTL that were conditionally active in one or several cell types, and 70 eQTL that affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms reverting the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, thus demonstrating the importance of conditional eQTL for understanding molecular mechanisms underlying physiological trait variation. 
The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL as well as many other kinds of QTL data.
Complex physiological traits are affected through subtle changes of molecular traits like gene expression in the relevant tissues, which in turn are caused by genetic variation. A genetic locus containing a sequence variation affecting gene expression is called an expression quantitative trait locus (eQTL). Understanding the tissue and cell type specificity of eQTL effects is essential for revealing the molecular mechanisms underlying disease phenotypes. However, so far the cell-state dependence of eQTL is poorly understood. In order to systematically assess the importance of cell state-specific eQTL, we propose to distinguish static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. We applied our framework to mouse gene expression data from four hematopoietic stages and related cellular traits. The different eQTL classes, although derived from the same expression data, represent functionally distinct types of eQTL. Importantly, conditional eQTL are well correlated with relevant hematological traits. These findings emphasize the condition specificity of many regulatory relationships, even if the conditions under study are related. This calls for due caution when transferring conclusions about regulatory mechanisms across cell types or tissues. The proposed classification will also help to unravel dynamic behaviors in many other kinds of QTL data.
Chronic obstructive pulmonary disease (COPD) is the fourth leading cause of mortality worldwide. Recent genome-wide association studies (GWAS) have identified robust susceptibility loci associated with COPD. However, the mechanisms mediating the risk conferred by these loci remain to be found. The goal of this study was to identify causal genes/variants within susceptibility loci associated with COPD. In the discovery cohort, genome-wide gene expression profiles of 500 non-tumor lung specimens were obtained from patients undergoing lung surgery. Blood DNA from the same patients was genotyped for 1.2 million SNPs. Following genotyping and gene expression quality control filters, 409 samples were analyzed. Lung expression quantitative trait loci (eQTLs) were identified and overlaid onto three COPD susceptibility loci derived from GWAS: 4q31 (HHIP), 4q22 (FAM13A), and 19q13 (RAB4B, EGLN2, MIA, CYP2A6). Significant eQTLs were replicated in two independent datasets (n = 363 and 339). SNPs previously associated with COPD and lung function on 4q31 (rs1828591, rs13118928) were associated with the mRNA expression of HHIP. An association between mRNA expression level of FAM13A and SNP rs2045517 was detected at 4q22, but did not reach statistical significance. At 19q13, significant eQTLs were detected with EGLN2. In summary, this study supports HHIP, FAM13A, and EGLN2 as the most likely causal COPD genes on 4q31, 4q22, and 19q13, respectively. Strong lung eQTL SNPs identified in this study will need to be tested for association with COPD in case-control studies. Further functional studies will also be needed to understand the role of genes regulated by disease-related variants in COPD.
There is considerable variability in the susceptibility of smokers to develop chronic obstructive pulmonary disease (COPD). The only known genetic risk factor is severe deficiency of α1-antitrypsin, which is present in 1–2% of individuals with COPD. We conducted a genome-wide association study (GWAS) in a homogeneous case-control cohort from Bergen, Norway (823 COPD cases and 810 smoking controls) and evaluated the top 100 single nucleotide polymorphisms (SNPs) in the family-based International COPD Genetics Network (ICGN; 1891 Caucasian individuals from 606 pedigrees) study. The polymorphisms that showed replication were further evaluated in 389 subjects from the US National Emphysema Treatment Trial (NETT) and 472 controls from the Normative Aging Study (NAS) and then in a fourth cohort of 949 individuals from 127 extended pedigrees from the Boston Early-Onset COPD population. Logistic regression models with adjustments of covariates were used to analyze the case-control populations. Family-based association analyses were conducted for a diagnosis of COPD and lung function in the family populations. Two SNPs at the α-nicotinic acetylcholine receptor (CHRNA3/5) locus were identified in the genome-wide association study. They showed unambiguous replication in the ICGN family-based analysis and in the NETT case-control analysis with combined p-values of 1.48 × 10⁻¹⁰ (rs8034191) and 5.74 × 10⁻¹⁰ (rs1051730). Furthermore, these SNPs were significantly associated with lung function in both the ICGN and Boston Early-Onset COPD populations. The C allele of the rs8034191 SNP was estimated to have a population attributable risk for COPD of 12.2%. The association of the hedgehog interacting protein (HHIP) locus on chromosome 4 was also consistently replicated, but did not reach genome-wide significance levels. 
Genome-wide significant association of the HHIP locus with lung function was identified in the Framingham Heart study (Wilk et al., companion article in this issue of PLoS Genetics; doi:10.1371/journal.pgen.1000429). The CHRNA3/5 and the HHIP loci make a significant contribution to the risk of COPD. CHRNA3/5 is the same locus that has been implicated in the risk of lung cancer.
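For readers unfamiliar with population attributable risk, one common way to compute it is Levin's formula, PAR = p(RR − 1) / (1 + p(RR − 1)), where p is the proportion of the population exposed (here, carrying the risk allele) and RR is the relative risk. The sketch below assumes that formula; the abstract does not state which estimator the authors used, and the input values are hypothetical rather than the study's data.

```python
def population_attributable_risk(p_exposed, relative_risk):
    """Levin's formula for the population attributable risk fraction.

    p_exposed: proportion of the population carrying the risk factor (0..1).
    relative_risk: risk in exposed divided by risk in unexposed.
    """
    excess = p_exposed * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical inputs, chosen only to illustrate the arithmetic.
par = population_attributable_risk(p_exposed=0.30, relative_risk=1.5)
```

With these illustrative inputs the excess term is 0.15, so the attributable fraction is 0.15 / 1.15, or roughly 13%.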
There is considerable variability in the susceptibility of smokers to develop chronic obstructive pulmonary disease (COPD), which is a heritable multi-factorial trait. Identifying the genetic determinants of COPD risk will have tremendous public health importance. This study describes the first genome-wide association study (GWAS) in COPD. We conducted a GWAS in a homogeneous case-control cohort from Norway and evaluated the top 100 single nucleotide polymorphisms in the family-based International COPD Genetics Network. The polymorphisms that showed replication were further evaluated in subjects from the US National Emphysema Treatment Trial and controls from the Normative Aging Study and then in a fourth cohort of extended pedigrees from the Boston Early-Onset COPD population. Two polymorphisms in the α-nicotinic acetylcholine receptor 3/5 locus on chromosome 15 showed unambiguous evidence of association with COPD. This locus has previously been implicated in both smoking behavior and risk of lung cancer, suggesting the possibility of multiple functional polymorphisms in the region or a single polymorphism with wide phenotypic consequences. The hedgehog interacting protein (HHIP) locus on chromosome 4 is also a significant risk locus for COPD.
The discovery of expression quantitative trait loci (“eQTLs”) can help to unravel genetic contributions to complex traits. We identified genetic determinants of human liver gene expression variation using two independent collections of primary tissue profiled with Agilent (n = 206) and Illumina (n = 60) expression arrays and Illumina SNP genotyping (550K), and we also incorporated data from a published study (n = 266). We found that ∼30% of SNP-expression correlations in one study failed to replicate in either of the others, even at thresholds yielding high reproducibility in simulations, and we quantified numerous factors affecting reproducibility. Our data suggest that drug exposure, clinical descriptors, and unknown factors associated with tissue ascertainment and analysis have substantial effects on gene expression and that controlling for hidden confounding variables significantly increases replication rate. Furthermore, we found that reproducible eQTL SNPs were heavily enriched near gene starts and ends, and subsequently resequenced the promoters and 3′UTRs for 14 genes and tested the identified haplotypes using luciferase assays. For three genes, significant haplotype-specific in vitro functional differences correlated directly with expression levels, suggesting that many bona fide eQTLs result from functional variants that can be mechanistically isolated in a high-throughput fashion. Finally, given our study design, we were able to discover and validate hundreds of liver eQTLs. Many of these relate directly to complex traits for which liver-specific analyses are likely to be relevant, and we identified dozens of potential connections with disease-associated loci. These included previously characterized eQTL contributors to diabetes, drug response, and lipid levels, and they suggest novel candidates such as a role for NOD2 expression in leprosy risk and C2orf43 in prostate cancer. In general, the work presented here will be valuable for future efforts to precisely identify and functionally characterize genetic contributions to a variety of complex traits.
Many disease-associated genetic variants do not alter protein sequences and are difficult to precisely identify. Discovery of expression quantitative trait loci (eQTL), or correlations between genetic variants and gene expression levels, offers one means of addressing this challenge. However, eQTL studies in primary cells have several shortcomings. In particular, their reproducibility is largely unknown, the variables that generate unreliable associations are uncharacterized, and the resolution of their findings is constrained by linkage disequilibrium. We performed a three-way replication study of eQTLs in primary human livers. We demonstrated that ∼67% of cis-eQTL associations are replicated in an independent study and that known polymorphisms overlapping expression probes, SNP-to-gene distance, and unmeasured confounding variables all influence the replication rate. We fine-mapped 14 eQTLs and identified causative polymorphisms in the promoter or 3′UTR for 3 genes, suggesting that a considerable fraction of eQTLs are driven by proximal variants that are amenable to functional isolation. Finally, we found hundreds of overlaps between SNPs associated with complex traits and replicated eQTL SNPs. Our data provide both cautionary (i.e. non-reproducibility of many strong eQTLs) and optimistic (i.e. precise identification of functional non-coding variants) forecasts for future eQTL analyses and the complex traits that they influence.
We examined the association between single-nucleotide polymorphisms (SNPs) previously associated with chronic obstructive pulmonary disease (COPD) and/or lung function with COPD and COPD-related phenotypes in a novel cohort of patients with severe to very severe COPD. We examined 315 cases of COPD and 330 Caucasian control smokers from Poland. We included three SNPs previously associated with COPD: rs7671167 (FAM13A), rs13180 (IREB2), and rs8034191 (CHRNA3/5), and four SNPs associated with lung function in a genome-wide association study of general population samples: rs2070600 (AGER), rs11134242 (ADCY2), rs4316710 (THSD4), and rs17096090 (INTS12). We tested for associations with severe COPD and COPD-related phenotypes, including lung function, smoking behavior, and body mass index. Subjects with COPD were older (average age 62 versus 58 years, P < 0.01), with more pack-years of smoking (45 versus 33 pack-years, P < 0.01). CHRNA3/5 (odds ratio [OR], 1.89; 95% confidence interval [CI], 1.5–2.4; P = 7.4 × 10⁻⁷), IREB2 (OR, 0.69; 95% CI, 0.5–0.9; P = 3.4 × 10⁻³), and ADCY2 (OR, 1.35; 95% CI, 1.1–1.7; P = 0.01) demonstrated significant associations with COPD. FAM13A (OR, 0.8; 95% CI, 0.7–1.0; P = 0.11) approached statistical significance. FAM13A and ADCY2 also demonstrated a significant association with lung function. Thus, in severe to very severe COPD, we demonstrate a replication of association between two SNPs previously associated with COPD (CHRNA3/5 and IREB2), as well as an association with COPD of one locus initially associated with lung function (ADCY2).
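To make the reported odds ratios and confidence intervals concrete, the following Python sketch computes an unadjusted odds ratio with a Wald (Woolf) 95% interval from a 2×2 carrier table. This is a simplification and an assumption on our part: studies of this kind typically report per-allele estimates from covariate-adjusted logistic regression, and the counts below are hypothetical, not the study's data.

```python
import math

def odds_ratio_ci(case_carriers, control_carriers,
                  case_noncarriers, control_noncarriers, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval."""
    a, b = case_carriers, control_carriers
    c, d = case_noncarriers, control_noncarriers
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only.
or_, ci_low, ci_high = odds_ratio_ci(120, 80, 195, 250)
```

An interval that excludes 1 corresponds to nominal significance at the 5% level under this approximation.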
chronic obstructive pulmonary disease; genetic association analysis; lung function; smoking; nicotine addiction
Genome-wide association studies (GWAS) have identified loci reproducibly associated with pulmonary diseases; however, the molecular mechanisms underlying these associations are largely unknown. The objectives of this study were to discover genetic variants affecting gene expression in human lung tissue, to refine susceptibility loci for asthma identified in GWAS, and to use the genetics of gene expression and network analyses to find key molecular drivers of asthma. We performed a genome-wide search for expression quantitative trait loci (eQTL) in 1,111 human lung samples. The lung eQTL dataset was then used to inform asthma genetic studies reported in the literature. The top ranked lung eQTLs were integrated with the GWAS on asthma reported by the GABRIEL consortium to generate a Bayesian gene expression network for discovery of novel molecular pathways underpinning asthma. We detected 17,178 cis- and 593 trans- lung eQTLs, which can be used to explore the functional consequences of loci associated with lung diseases and traits. Some strong eQTLs are also asthma susceptibility loci. For example, rs3859192 on chr17q21 is robustly associated with the mRNA levels of GSDMA (P = 3.55 × 10⁻¹⁵¹). The genetic-gene expression network identified the SOCS3 pathway as one of the key drivers of asthma. The eQTLs and gene networks identified in this study are powerful tools for elucidating the causal mechanisms underlying pulmonary disease. This data resource offers much-needed support to pinpoint the causal genes and characterize the molecular function of gene variants associated with lung diseases.
Recent genome-wide association studies (GWAS) have identified genetic variants associated with lung diseases. The challenge now is to find the causal genes in GWAS–nominated chromosomal regions and to characterize the molecular function of disease-associated genetic variants. In this paper, we describe an international effort to systematically capture the genetic architecture of gene expression regulation in human lung. By studying lung specimens from 1,111 individuals of European ancestry, we found a large number of genetic variants affecting gene expression in the lung, or lung expression quantitative trait loci (eQTL). These lung eQTLs will serve as an important resource to aid in the understanding of the molecular underpinnings of lung biology and its disruption in disease. To demonstrate the utility of this lung eQTL dataset, we integrated our data with previous genetic studies on asthma. Through integrative techniques, we identified causal variants and genes in GWAS–nominated loci and found key molecular drivers for asthma. We feel that sharing our lung eQTL dataset with the scientific community will leverage the impact of previous large-scale GWAS on lung diseases and function by providing much-needed functional information to understand the molecular changes introduced by the susceptibility genetic variants.
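At its simplest, a single eQTL test asks whether expression of a gene changes with genotype dosage (0, 1, or 2 copies of an allele). The Python sketch below fits an ordinary least-squares line and returns the slope and its t-statistic. It is a generic illustration under stated assumptions, not the pipeline used in the study, which would also involve covariate adjustment, normalization, and multiple-testing control; the data values are made up.

```python
import math

def eqtl_slope(dosages, expression):
    """OLS regression of expression on genotype dosage.

    Returns (slope, t_statistic) with n - 2 degrees of freedom.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance and standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, expression))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope

# Hypothetical dosages and expression values for six individuals.
slope, t_stat = eqtl_slope([0, 1, 2, 0, 1, 2],
                           [1.0, 2.0, 3.0, 1.2, 1.8, 3.0])
```

A genome-wide scan repeats a test of this kind for every SNP-transcript pair, which is why stringent significance thresholds are needed.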
Genome-wide gene expression profiling has been extensively used to generate biological hypotheses based on differential expression. Recently, many studies have used microarrays to measure gene expression levels across genetic mapping populations. These gene expression phenotypes have been used for genome-wide association analyses, an analysis referred to as expression QTL (eQTL) mapping. Here, eQTL analysis was performed in adipose tissue from 28 inbred strains of mice. We focused our analysis on “trans-eQTL bands”, defined as instances in which the expression patterns of many genes were all associated with a common genetic locus. Genes comprising trans-eQTL bands were screened for enrichments in functional gene sets representing known biological pathways, and genes located at associated trans-eQTL band loci were considered candidate transcriptional modulators. We demonstrate that these patterns were enriched for previously characterized relationships between known upstream transcriptional regulators and their downstream target genes. Moreover, we used this strategy to identify both novel regulators and novel members of known pathways. Finally, based on a putative regulatory relationship identified in our analysis, we identified and validated a previously uncharacterized role for cyclin H in the regulation of oxidative phosphorylation. We believe that the specific molecular hypotheses generated in this study will reveal many additional pathway members and regulators, and that the analysis approaches described herein will be broadly applicable to other eQTL data sets.
Genome-wide association (GWA) analyses seek to relate variation of phenotype to underlying (and presumably causative) variation in genotype. Recently, many GWA studies have identified candidate genes underlying disease phenotypes such as diabetes, heart disease, and cancer risk. Many groups have also performed GWA using variation in gene expression levels as the input phenotype. These expression QTL (eQTL) studies have provided important clues as to the genetic basis of gene expression regulation. Here, we performed an eQTL study in mouse adipose tissue. We then developed a systematic analysis method to relate these patterns of eQTL associations to biological pathways. Based on this approach, we identified putative roles for thousands of candidate upstream regulators and candidate pathway members in relation to specific biological pathways. Statistical analysis showed that these predictions were highly enriched for true genetic modulators of these pathways. Based on these predictions, we also experimentally validated a role for one particular gene, cyclin H, in the regulation of oxidative phosphorylation. These findings illustrate a new analysis method for relating eQTL studies to biological pathways and identify cyclin H as a novel key regulator of cellular energy metabolism.
Elucidating the genetic basis underlying hepatic gene expression variability is important for understanding the aetiology of disease and variation in drug metabolism. To date, no genome-wide expression quantitative trait loci (eQTL) analysis has been conducted in the Han Chinese population, the largest ethnic group in the world.
We performed genome-wide eQTL mapping in a set of Han Chinese liver tissue samples (n = 64). The data were then compared with published eQTL data from a Caucasian population. We then tested for correlations between these eQTLs and important pharmacogenes, as well as single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS), in particular those identified in the Asian population.
Our analyses identified 1669 significant eQTLs (false discovery rate (FDR) < 0.05). We found that 41% of Asian eQTLs were also eQTLs in Caucasians at the genome-wide significance level (p = 10⁻⁸). Both cis- and trans-eQTLs in the Asian population were also more likely to be eQTLs in Caucasians (p < 10⁻⁴). Enrichment analyses revealed that trait-associated GWAS-SNPs were enriched within the eQTLs identified in our data, as were the GWAS-SNPs specifically identified in Asian populations in a separate analysis (p < 0.001 for both). We also found that hepatic expression of very important pharmacogenetic (VIP) genes (n = 44) and a manually curated list of major genes involved in pharmacokinetics (n = 341) were both more likely to be controlled by eQTLs (p < 0.002 for both).
Our study provided, for the first time, a comprehensive hepatic eQTL analysis in a non-European population, further generating valuable data for characterising the genetic basis of human diseases and pharmacogenetic traits.
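The FDR threshold mentioned above can be illustrated with the Benjamini-Hochberg step-up procedure, sketched below in Python. Whether the study used this exact procedure is an assumption; the abstract states only the FDR cutoff, and the p-values in the example are invented.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking tests rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for four SNP-transcript pairs.
calls = benjamini_hochberg([0.04, 0.001, 0.03, 0.5])
```

Because the procedure is step-up, a test can be rejected even when its own p-value exceeds its rank-specific threshold, provided a higher-ranked test passes.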
Clinical genetics; Genetics; Genome-wide; Molecular genetics
The genetic risk factors for chronic obstructive pulmonary disease (COPD) are still largely unknown. To date, genome-wide association studies (GWASs) of limited size have identified several novel risk loci for COPD at CHRNA3/CHRNA5/IREB2, HHIP and FAM13A; additional loci may be identified through larger studies. We performed a GWAS using a total of 3499 cases and 1922 control subjects from four cohorts: the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE); the Normative Aging Study (NAS) and National Emphysema Treatment Trial (NETT); Bergen, Norway (GenKOLS); and the COPDGene study. Genotyping was performed on Illumina platforms with additional markers imputed using 1000 Genomes data; results were summarized using fixed-effect meta-analysis. We identified a new genome-wide significant locus on chromosome 19q13 (rs7937, OR = 0.74, P = 2.9 × 10⁻⁹). Genotyping this single nucleotide polymorphism (SNP) and another nearby SNP in linkage disequilibrium (rs2604894) in 2859 subjects from the family-based International COPD Genetics Network study (ICGN) demonstrated supportive evidence for association for COPD (P = 0.28 and 0.11 for rs7937 and rs2604894), pre-bronchodilator FEV1 (P = 0.08 and 0.04) and severe (GOLD 3&4) COPD (P = 0.09 and 0.017). This region includes RAB4B, EGLN2, MIA and CYP2A6, and has previously been identified in association with cigarette smoking behavior.
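A fixed-effect meta-analysis of the kind mentioned above is commonly carried out by inverse-variance weighting of per-cohort effect estimates. The Python sketch below assumes that approach, with effects expressed as log odds ratios; the cohort estimates shown are hypothetical and are not the study's results.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled_beta, pooled_se, two_sided_p) under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled_beta, pooled_se, p_value

# Hypothetical per-cohort log odds ratios and standard errors.
beta, se, p = fixed_effect_meta([-0.35, -0.25, -0.30, -0.28],
                                [0.10, 0.12, 0.15, 0.09])
pooled_or = math.exp(beta)
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precise studies.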
Rationale: A genome-wide association study (GWAS) for circulating chronic obstructive pulmonary disease (COPD) biomarkers could identify genetic determinants of biomarker levels and COPD susceptibility.
Objectives: To identify genetic variants of circulating protein biomarkers and novel genetic determinants of COPD.
Methods: GWAS was performed for two pneumoproteins, Clara cell secretory protein (CC16) and surfactant protein D (SP-D), and five systemic inflammatory markers (C-reactive protein, fibrinogen, IL-6, IL-8, and tumor necrosis factor-α) in 1,951 subjects with COPD. For genome-wide significant single nucleotide polymorphisms (SNPs) (P < 1 × 10⁻⁸), association with COPD susceptibility was tested in 2,939 cases with COPD and 1,380 smoking control subjects. The association of candidate SNPs with mRNA expression in induced sputum was also elucidated.
Measurements and Main Results: Genome-wide significant susceptibility loci affecting biomarker levels were found only for the two pneumoproteins. Two discrete loci affecting CC16, one region near the CC16 coding gene (SCGB1A1) on chromosome 11 and another locus approximately 25 Mb away from SCGB1A1, were identified, whereas multiple SNPs on chromosomes 6 and 16, in addition to SNPs near SFTPD, had genome-wide significant associations with SP-D levels. Several SNPs affecting circulating CC16 levels were significantly associated with sputum mRNA expression of SCGB1A1 (P = 0.009–0.03). Several SNPs highly associated with CC16 or SP-D levels were nominally associated with COPD in a collaborative GWAS (P = 0.001–0.049), although these COPD associations were not replicated in two additional cohorts.
Conclusions: Distant genetic loci and biomarker-coding genes affect circulating levels of COPD-related pneumoproteins. A subset of these protein quantitative trait loci may influence their gene expression in the lung and/or COPD susceptibility.
Clinical trial registered with www.clinicaltrials.gov (NCT 00292552).
biomarker; chronic obstructive pulmonary disease; genome-wide association study
Amyotrophic lateral sclerosis (ALS) is a progressive, neurodegenerative disease characterized by loss of upper and lower motor neurons. ALS is considered to be a complex trait and genome-wide association studies (GWAS) have implicated a few susceptibility loci. However, many more causal loci remain to be discovered. Since it has been shown that genetic variants associated with complex traits are more likely to be eQTLs than frequency-matched variants from GWAS platforms, we conducted a two-stage genome-wide screening for eQTLs associated with ALS. In addition, we applied an eQTL analysis to fine-map association loci. Expression profiles using peripheral blood of 323 sporadic ALS patients and 413 controls were mapped to genome-wide genotyping data. Subsequently, data from a two-stage GWAS (3,568 patients and 10,163 controls) were used to prioritize eQTLs identified in the first stage (162 ALS, 207 controls). These prioritized eQTLs were carried forward to the second sample with both gene-expression and genotyping data (161 ALS, 206 controls). Replicated eQTL SNPs were then tested for association in the second-stage GWAS data to find disease-associated SNPs that survived correction for multiple testing. We thus identified twelve cis eQTLs with nominally significant associations in the second-stage GWAS data. Eight SNP-transcript pairs of highest significance (lowest p = 1.27 × 10⁻⁵¹) withstood multiple-testing correction in the second stage and modulated CYP27A1 gene expression. Additionally, we show that C9orf72 appears to be the only gene in the 9p21.2 locus that is regulated in cis, showing the potential of this approach in identifying causative genes in association loci in ALS. This study has identified candidate genes for sporadic ALS, most notably CYP27A1. Mutations in CYP27A1 cause cerebrotendinous xanthomatosis, which can present as a clinical mimic of ALS with progressive upper motor neuron loss, making it a plausible susceptibility gene for ALS.
The development of COPD in subjects with alpha-1 antitrypsin (AAT) deficiency is likely to be influenced by modifier genes. Genome-wide association studies and integrative genomics approaches in COPD have demonstrated significant associations with SNPs in the chromosome 15q region that includes CHRNA3 (cholinergic nicotine receptor alpha3) and IREB2 (iron regulatory binding protein 2).
We investigated whether SNPs in the chromosome 15q region would be modifiers for lung function and COPD in AAT deficiency.
The current analysis included 378 PIZZ subjects in the AAT Genetic Modifiers Study and a replication cohort of 458 subjects from the UK AAT Deficiency National Registry. Nine SNPs in LOC123688, CHRNA3 and IREB2 were selected for genotyping. FEV1 percent of predicted and FEV1/FVC ratio were analyzed as quantitative phenotypes. Family-based association analysis was performed in the AAT Genetic Modifiers Study. In the replication set, general linear models were used for quantitative phenotypes and logistic regression models were used for the presence/absence of emphysema or COPD.
Three SNPs (rs2568494 in IREB2, rs8034191 in LOC123688, and rs1051730 in CHRNA3) were associated with pre-bronchodilator FEV1 percent of predicted in the AAT Genetic Modifiers Study. Two SNPs (rs2568494 and rs1051730) were associated with the post-bronchodilator FEV1 percent of predicted and pre-bronchodilator FEV1/FVC ratio; SNP-by-gender interactions were observed. In the UK National Registry dataset, rs2568494 was significantly associated with emphysema in the male subgroup; significant SNP-by-smoking interactions were observed.
IREB2 and CHRNA3 are potential genetic modifiers of COPD phenotypes in individuals with severe AAT deficiency and may be sex-specific in their impact.
CHRNA3; Chronic obstructive pulmonary disease; Genetic association analysis; Genetic modifiers; IREB2
Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Although recent genome-wide association studies (GWAS) have contributed to the discovery of SLE susceptibility genes, few studies have been performed in Asian populations. Here, we report a GWAS for SLE examining 891 SLE cases and 3,384 controls and multi-stage replication studies examining 1,387 SLE cases and 28,564 controls in Japanese subjects. Considering that expression quantitative trait loci (eQTLs) have been implicated in genetic risks for autoimmune diseases, we integrated an eQTL study into the results of the GWAS. We observed enrichment of cis-eQTL positive loci among the known SLE susceptibility loci (30.8%) compared to the genome-wide SNPs (6.9%). In addition, we identified a novel association of a variant in the AF4/FMR2 family, member 1 (AFF1) gene at 4q21 with SLE susceptibility (rs340630; P = 8.3×10^−9, odds ratio = 1.21). The risk A allele of rs340630 demonstrated a cis-eQTL effect on the AFF1 transcript with enhanced expression levels (P<0.05). As AFF1 transcripts were prominently expressed in CD4+ and CD19+ peripheral blood lymphocytes, up-regulation of AFF1 may cause the abnormality in these lymphocytes, leading to disease onset.
Although recent genome-wide association study (GWAS) approaches have successfully contributed to disease gene discovery, many susceptibility loci are known to remain uncaptured because of the strict significance thresholds required for multiple hypothesis testing. Therefore, prioritization of GWAS results by incorporating additional information is recommended. Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Considering that abnormalities in B cell activity play essential roles in SLE, prioritization based on an expression quantitative trait loci (eQTL) study for B cells would be a promising approach. In this study, we report a GWAS and multi-stage replication studies for SLE examining 2,278 SLE cases and 31,948 controls in Japanese subjects. We integrated an eQTL study into the results of the GWAS and identified AFF1 as a novel SLE susceptibility locus. We also confirmed a cis-regulatory effect of the locus on the AFF1 transcript. Our study would be one of the initial successes in detecting a novel genetic locus using an eQTL study, and it should contribute to our understanding of the genetic loci that remain uncaptured by standard GWAS approaches.
Rationale: Genome-wide association studies (GWAS) have identified loci influencing lung function, but fewer genes influencing chronic obstructive pulmonary disease (COPD) are known.
Objectives: Perform meta-analyses of GWAS for airflow obstruction, a key pathophysiologic characteristic of COPD assessed by spirometry, in population-based cohorts examining all participants, ever smokers, never smokers, asthma-free participants, and more severe cases.
Methods: Fifteen cohorts were studied for discovery (3,368 affected; 29,507 unaffected), and a population-based family study and a meta-analysis of case-control studies were used for replication and regional follow-up (3,837 cases; 4,479 control subjects). Airflow obstruction was defined as FEV1 and its ratio to FVC (FEV1/FVC) both less than their respective lower limits of normal as determined by published reference equations.
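The case definition in the Methods above is a conjunction of two threshold tests, which can be sketched as follows. This is a minimal illustration only: the `Spirometry` record, its field names, and the example values are hypothetical, and the lower limits of normal are assumed to have been computed beforehand from whichever published reference equations a study uses.

```python
from dataclasses import dataclass


@dataclass
class Spirometry:
    """Hypothetical record for one participant (field names are assumptions)."""
    fev1: float       # measured FEV1, litres
    fvc: float        # measured FVC, litres
    fev1_lln: float   # lower limit of normal for FEV1, from a reference equation
    ratio_lln: float  # lower limit of normal for FEV1/FVC, from a reference equation


def has_airflow_obstruction(s: Spirometry) -> bool:
    """True only when BOTH FEV1 and FEV1/FVC are below their lower limits of normal."""
    return s.fev1 < s.fev1_lln and (s.fev1 / s.fvc) < s.ratio_lln


# Illustrative values only, not data from the study.
affected = Spirometry(fev1=1.8, fvc=3.5, fev1_lln=2.4, ratio_lln=0.70)
unaffected = Spirometry(fev1=2.0, fvc=3.5, fev1_lln=2.4, ratio_lln=0.50)
print(has_airflow_obstruction(affected), has_airflow_obstruction(unaffected))
```

Requiring both criteria means a participant with a low FEV1 but a preserved FEV1/FVC ratio is not counted as obstructed under this definition.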
Measurements and Main Results: The discovery meta-analyses identified one region on chromosome 15q25.1 meeting genome-wide significance in ever smokers that includes AGPHD1, IREB2, and CHRNA5/CHRNA3 genes. The region was also modestly associated among never smokers. Gene expression studies confirmed the presence of CHRNA5/3 in lung, airway smooth muscle, and bronchial epithelial cells. A single-nucleotide polymorphism in HTR4, a gene previously related to FEV1/FVC, achieved genome-wide statistical significance in combined meta-analysis. Top single-nucleotide polymorphisms in ADAM19, RARB, PPAP2B, and ADAMTS19 were nominally replicated in the COPD meta-analysis.
Conclusions: These results suggest an important role for the CHRNA5/3 region as a genetic risk factor for airflow obstruction that may be independent of smoking and implicate the HTR4 gene in the etiology of airflow obstruction.
chronic obstructive pulmonary disease; single-nucleotide polymorphism; genes
One major expectation from the transcriptome in humans is to characterize the biological basis of associations identified by genome-wide association studies. So far, few cis expression quantitative trait loci (eQTLs) have been reliably related to disease susceptibility. Trans-regulating mechanisms may play a more prominent role in disease susceptibility. We analyzed 12,808 genes detected in at least 5% of circulating monocyte samples from a population-based sample of 1,490 European unrelated subjects. We applied a method of extraction of expression patterns—independent component analysis—to identify sets of co-regulated genes. These patterns were then related to 675,350 SNPs to identify major trans-acting regulators. We detected three genomic regions significantly associated with co-regulated gene modules. Association of these loci with multiple expression traits was replicated in Cardiogenics, an independent study in which expression profiles of monocytes were available in 758 subjects. The locus 12q13 (lead SNP rs11171739), previously identified as a type 1 diabetes locus, was associated with a pattern including two cis eQTLs, RPS26 and SUOX, and 5 trans eQTLs, one of which (MADCAM1) is a potential candidate for mediating T1D susceptibility. The locus 12q24 (lead SNP rs653178), which has demonstrated extensive disease pleiotropy, including type 1 diabetes, hypertension, and celiac disease, was associated to a pattern strongly correlating to blood pressure level. The strongest trans eQTL in this pattern was CRIP1, a known marker of cellular proliferation in cancer. The locus 12q15 (lead SNP rs11177644) was associated with a pattern driven by two cis eQTLs, LYZ and YEATS4, and including 34 trans eQTLs, several of them tumor-related genes. 
This study shows that a method exploiting the structure of co-expressions among genes can help identify genomic regions involved in trans regulation of sets of genes and can provide clues for understanding the mechanisms linking genome-wide association loci to disease.
One major expectation from the transcriptome in humans is to help characterize the biological basis of associations identified by genome-wide association studies. Here, we take advantage of recent technical and methodological advances to examine the influence of natural genetic variability on >12,000 genes expressed in the monocyte, a blood cell playing a key role in immunity-related disorders and atherosclerosis. By examining 1,490 European population-based subjects, we identify three regions of the genome reproducibly associated with specific patterns of gene expression. Two of these regions overlap genetic variants previously known to be involved in the susceptibility to type 1 diabetes, celiac disease, and hypertension. Genes whose expression is modulated by these genetic variants may act as mediators in the causal relationship linking the variability of the genome to complex disease. These findings illustrate how integration of genetic and transcriptomic data at an epidemiological scale can help decipher the genetic basis of complex diseases.
Genetic variation in the expression of human xenobiotic metabolism enzymes and transporters (XMETs) leads to inter-individual variability in metabolism of therapeutic agents as well as differing susceptibility to various diseases. Recent expression quantitative trait loci (eQTL) mapping in a few human cells/tissues has identified a number of single nucleotide polymorphisms (SNPs) significantly associated with mRNA expression of many XMET genes. These eQTLs are therefore important candidate markers for pharmacogenetic studies. However, questions remain about whether these SNPs are causative and by what mechanism they may function. Given the important role of microRNAs (miRs) in gene transcription regulation, we hypothesize that those eQTLs, or their proxies in strong linkage disequilibrium (LD), that alter miR targeting are likely causative SNPs affecting gene expression. The aim of this study is to identify eQTLs potentially regulating major XMETs via interference with miR targeting. To this end, we performed a genome-wide screening for eQTLs for 409 genes encoding major drug metabolism enzymes, transporters and transcription factors, in publicly available eQTL datasets generated from the HapMap lymphoblastoid cell lines and human liver and brain tissue. As a result, 308 eQTLs significantly (p < 10^−5) associated with mRNA expression of 101 genes were identified. We further identified 7,869 SNPs in strong LD (r^2 ≥ 0.8) with these eQTLs using the 1000 Genomes SNP data. Among these 8,177 SNPs, 27 are located in the 3′-UTR of 14 genes. Using two algorithms predicting miR-SNP interaction, we found that almost all these SNPs (26 out of 27) were predicted to create, abolish, or change the target site for miRs in both algorithms. Many of these miRs were also expressed in the same tissue in which the eQTLs were identified. Our study provides a strong rationale for continued investigation of the functions of these eQTLs in pharmacogenetic settings.
eQTL; xenobiotic metabolism enzyme and transporter; microRNA; pharmacogenetics; 3′-UTR
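The LD-proxy step described in the abstract above (collecting SNPs with r^2 ≥ 0.8 to an eQTL SNP) can be sketched as follows. This is an assumption-laden illustration: it approximates r^2 by the squared Pearson correlation of genotype dosages (0, 1, 2), whereas the study may have used haplotype-based LD values; the function names, SNP labels, and genotype vectors are hypothetical.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def ld_proxies(lead, candidates, threshold=0.8):
    """Names of candidate SNPs whose r^2 with the lead SNP meets the threshold."""
    return [name for name, g in candidates.items()
            if genotype_r2(lead, g) >= threshold]


# Hypothetical genotypes for six individuals.
lead_snp = [0, 1, 2, 1, 0, 2]
candidates = {
    "snp_linked": [0, 1, 2, 1, 0, 2],    # identical dosages, r^2 = 1
    "snp_unlinked": [1, 1, 0, 2, 2, 0],  # r^2 = 0.5625, below the threshold
}
print(ld_proxies(lead_snp, candidates))
```

The union of the lead SNPs and their proxies is what gets screened for 3′-UTR location; in the abstract this union is 308 + 7,869 = 8,177 SNPs.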
In recent years genome-wide association studies (GWAS) have uncovered numerous chromosomal loci associated with various electrocardiographic traits and cardiac arrhythmia predisposition. A considerable fraction of these loci lie within inter-genic regions. The underlying trait-associated variants likely reside in regulatory regions and exert their effect by modulating gene expression. Hence, the key to unraveling the molecular mechanisms underlying these cardiac traits is to interrogate variants for association with differential transcript abundance by expression quantitative trait locus (eQTL) analysis. In this study we conducted an eQTL analysis of human heart. For a total of 129 left ventricular samples that were collected from non-diseased human donor hearts, genome-wide transcript abundance and genotypes were determined using microarrays. All 18,402 transcripts and 897,683 SNP genotypes that remained after pre-processing and stringent quality control were tested for eQTL effects. We identified 771 eQTLs, regulating 429 unique transcripts. Overlaying these eQTLs with cardiac GWAS loci identified novel candidates for studies aimed at elucidating the functional and transcriptional impact of these loci. Thus, this work provides for the first time a comprehensive eQTL map of human heart: a powerful and unique resource that enables systems genetics approaches for the study of cardiac traits.
Associated genetic variants have been found for many complex traits. However, it is still mostly unclear through which downstream mechanisms these variants cause these phenotypes. Knowledge of these intermediate steps is crucial to understand pathogenesis, while also providing leads for potential pharmacological intervention. Here we relied upon natural human genetic variation to identify effects of these variants on trans-gene expression (expression quantitative trait locus mapping, eQTL) in whole peripheral blood from 1,469 unrelated individuals. We looked at 1,167 published trait- or disease-associated SNPs and observed trans-eQTL effects on 113 different genes, of which we replicated 46 in monocytes of 1,490 different individuals and 18 in a smaller dataset that comprised subcutaneous adipose, visceral adipose, liver tissue, and muscle tissue. HLA single-nucleotide polymorphisms (SNPs) were 10-fold enriched for trans-eQTLs: 48% of the trans-acting SNPs map within the HLA, including ulcerative colitis susceptibility variants that affect plausible candidate genes AOAH and TRBV18 in trans. We identified 18 pairs of unlinked SNPs associated with the same phenotype and affecting expression of the same trans-gene (21 times more than expected, P<10^−16). This was particularly pronounced for mean platelet volume (MPV): two independent SNPs significantly affect the well-known blood coagulation genes GP9 and F13A1 but also C19orf33, SAMD14, VCL, and GNG11. Several of these SNPs have a substantially higher effect on the downstream trans-genes than on the eventual phenotypes, supporting the concept that the effects of these SNPs on expression seem to be much less multifactorial. Therefore, these trans-eQTLs could well represent some of the intermediate genes that connect genetic variants with their eventual complex phenotypic outcomes.
Many genetic variants have been found associated with diseases. However, for many of these genetic variants, it remains unclear how they exert their effect on the eventual phenotype. We investigated genetic variants that are known to be associated with diseases and complex phenotypes and assessed whether these variants were also associated with gene expression levels in a set of 1,469 unrelated whole blood samples. For several diseases, such as type 1 diabetes and ulcerative colitis, we observed that genetic variants affect the expression of genes, not implicated before. For complex traits, such as mean platelet volume and mean corpuscular volume, we observed that independent genetic variants on different chromosomes influence the expression of exactly the same genes. For mean platelet volume, these genes include well-known blood coagulation genes but also genes with still unknown functions. These results indicate that, by systematically correlating genetic variation with gene expression levels, it is possible to identify downstream genes, which provide important avenues for further research.
Restless legs syndrome (RLS) is a common neurologic disorder characterized by nightly dysesthesias affecting the legs primarily during periods of rest and relieved by movement. RLS is a complex genetic disease and susceptibility factors in six genomic regions have been identified by means of genome-wide association studies (GWAS). For some complex genetic traits, expression quantitative trait loci (eQTLs) are enriched among trait-associated single nucleotide polymorphisms (SNPs). With the aim of identifying new genetic susceptibility factors for RLS, we assessed the 332 best-associated SNPs from the genome-wide phase of the largest RLS GWAS to date for cis-eQTL effects in peripheral blood from individuals of European descent. In 740 individuals belonging to the KORA general population cohort, 52 cis-eQTLs with nominal p<10^−3 were identified, while in 976 individuals belonging to the SHIP-TREND general population study 53 cis-eQTLs with nominal p<10^−3 were present. Twenty-three of these cis-eQTLs overlapped between the two cohorts. Subsequently, the twelve of these 23 cis-eQTL SNPs that were not located at an already published RLS-associated locus were tested for association in 2449 RLS cases and 1462 controls. The top SNP, located in the DET1 gene, was nominally significant (p<0.05) but did not withstand correction for multiple testing (p = 0.42). Although a similar approach has been used successfully with regard to other complex diseases, we were unable to identify new genetic susceptibility factors for RLS by adding this novel level of functional assessment to RLS GWAS data.
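The two filtering steps in the RLS abstract above (keeping cis-eQTLs that pass the nominal threshold in both cohorts, then correcting the follow-up association tests for multiple testing) can be sketched as follows. Everything here is illustrative: the SNP and transcript labels and the p-values are invented, and Bonferroni correction is an assumption, since the abstract does not name the correction method. The value 0.035 is a hypothetical nominal p-value chosen only because 0.035 × 12 ≈ 0.42; the study's actual nominal p-value is not reported in the abstract.

```python
def significant_pairs(pvalues, threshold=1e-3):
    """SNP-transcript pairs whose nominal p-value is below the threshold."""
    return {pair for pair, p in pvalues.items() if p < threshold}


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1.0 (assumed correction method)."""
    return min(1.0, p * n_tests)


# Hypothetical nominal p-values for the same SNP-transcript pairs in two cohorts.
cohort_a = {("rs_a", "GENE1"): 1e-5, ("rs_b", "GENE2"): 1e-2, ("rs_c", "GENE3"): 5e-4}
cohort_b = {("rs_a", "GENE1"): 2e-4, ("rs_b", "GENE2"): 1e-6, ("rs_c", "GENE3"): 2e-1}

# A cis-eQTL counts as overlapping only if it passes the threshold in both cohorts.
overlap = significant_pairs(cohort_a) & significant_pairs(cohort_b)
print(overlap)

# A nominally significant hit can fail after correcting for twelve tests.
print(bonferroni(0.035, 12))
```

The second print shows how a result can be nominally significant at p<0.05 yet far from significant once twelve SNPs are tested.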
Many disease-associated variants affect gene expression levels (expression quantitative trait loci, eQTLs) and expression profiling using next generation sequencing (NGS) technology is a powerful way to detect these eQTLs. We analyzed 94 total blood samples from healthy volunteers with DeepSAGE to gain specific insight into how genetic variants affect the expression of genes and lengths of 3′-untranslated regions (3′-UTRs). We detected previously unknown cis-eQTL effects for GWAS hits in disease- and physiology-associated traits. Apart from cis-eQTLs that are typically easily identifiable using microarrays or RNA-sequencing, DeepSAGE also revealed many cis-eQTLs for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. We also identified and confirmed SNPs that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of messenger RNAs (mRNA). We then combined the power of RNA-sequencing with DeepSAGE by performing a meta-analysis of three datasets, leading to the identification of many more cis-eQTLs. Our results indicate that DeepSAGE data is useful for eQTL mapping of known and unknown transcripts, and for identifying SNPs that affect alternative polyadenylation. Because of the inherent differences between DeepSAGE and RNA-sequencing, our complementary, integrative approach leads to greater insight into the molecular consequences of many disease-associated variants.
Many genetic variants that are associated with diseases also affect gene expression levels. We used a next generation sequencing approach targeting 3′ transcript ends (DeepSAGE) to gain specific insight into how genetic variants affect the expression of genes and the usage and length of 3′-untranslated regions. We detected many associations for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. Some of these variants are also associated with disease. We also identified and confirmed variants that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of mRNAs. We conclude that DeepSAGE is useful for detecting eQTL effects on both known and unknown transcripts, and for identifying variants that affect alternative polyadenylation.
Cigarette smoking is the major environmental risk factor for chronic obstructive pulmonary disease (COPD). Genome-wide association studies have provided compelling associations for three loci with COPD. In this study, we aimed to estimate direct, i.e., independent from smoking, and indirect effects of those loci on COPD development using mediation analysis. We included a total of 3,424 COPD cases and 1,872 unaffected controls with data on two smoking-related phenotypes: lifetime average smoking intensity and cumulative exposure to tobacco smoke (pack years). Our analysis revealed that effects of two linked variants (rs1051730 and rs8034191) in the AGPHD1/CHRNA3 cluster on COPD development are significantly, yet not entirely, mediated by the smoking-related phenotypes. Approximately 30% of the total effect of variants in the AGPHD1/CHRNA3 cluster on COPD development was mediated by pack years. Simultaneous analysis of modestly (r^2 = 0.21) linked markers in CHRNA3 and IREB2 revealed that an even larger (~42%) proportion of the total effect of the CHRNA3 locus on COPD was mediated by pack years after adjustment for an IREB2 single nucleotide polymorphism. This study confirms the existence of direct effects of the AGPHD1/CHRNA3, IREB2, FAM13A and HHIP loci on COPD development. While the association of the AGPHD1/CHRNA3 locus with COPD is significantly mediated by smoking-related phenotypes, IREB2 appears to affect COPD independently of smoking.
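The "proportion mediated" quoted in the abstract above can be illustrated with the simple difference method, in which the indirect effect is the total effect minus the direct effect. This is a sketch under stated assumptions: the abstract does not say which mediation estimator was used, the function name is hypothetical, and the effect sizes below are invented values chosen to reproduce a proportion of about 30%.

```python
def proportion_mediated(total_effect, direct_effect):
    """Share of the total effect that passes through the mediator.

    Uses the difference method: indirect = total - direct.
    Effects are assumed to be on a common additive scale (e.g. log odds).
    """
    if total_effect == 0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    return (total_effect - direct_effect) / total_effect


# Hypothetical effects: total effect of a variant on COPD, and the effect
# remaining after adjusting for pack years (the direct effect).
total = 0.20
direct = 0.14
print(f"{proportion_mediated(total, direct):.0%} of the total effect is mediated")
```

A proportion well below 100% corresponds to the abstract's wording that the effect is mediated "significantly, yet not entirely" by smoking, leaving a direct effect of the locus on COPD.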
Genetic factors are known to contribute to COPD susceptibility, but these factors are not fully understood. Conflicting results have been reported for many genetic studies of candidate genes based on their role in the disease. Genome-wide association studies in combination with expression profiling have identified a number of new candidates including IREB2. A meta-analysis has implicated transforming growth factor beta-1 (TGFbeta1) as a contributor to disease susceptibility.
We have examined previously reported associations in both genes in a collection of 1017 white COPD patients and 912 non-diseased smoking controls. Genotype information was obtained for seven SNPs in the IREB2 gene, and for four SNPs in the TGFbeta1 gene. Allele and genotype frequencies were compared between COPD cases and controls, and odds ratios were calculated. The analysis was adjusted for age, sex, smoking and centre, including interactions of age, sex and smoking with centre.
Our data replicate the association of IREB2 SNPs with COPD for SNPs rs2568494, rs2656069 and rs12593229, with respective adjusted p-values of 0.0018, 0.0039 and 0.0053. No significant associations were identified for TGFbeta1.
These studies have therefore confirmed that the IREB2 locus is a contributor to COPD susceptibility and suggest a new pathway in COPD pathogenesis involving iron homeostasis.
BACKGROUND & AIMS
Genome-wide association studies have greatly increased our understanding of intestinal disease. However, little is known about how genetic variations result in phenotypic changes. Some polymorphisms have been shown to modulate quantifiable phenotypic traits; these are called quantitative trait loci. Quantitative trait loci that affect levels of gene expression are called expression quantitative trait loci (eQTL), which can provide insight into the biological relevance of data from genome-wide association studies. We performed a comprehensive eQTL scan of intestinal tissue.
Total RNA was extracted from ileal biopsy specimens and genomic DNA was obtained from whole-blood samples from the same cohort of individuals. Cis- and trans-eQTL analyses were performed using a custom software pipeline for samples from 173 subjects. The analyses determined the expression levels of 19,047 unique autosomal genes listed in the US National Center for Biotechnology Information database and more than 580,000 variants from the Single Nucleotide Polymorphism database.
The presence of more than 15,000 cis- and trans-eQTL was detected with statistical significance. eQTL associated with the same expression trait were in high linkage disequilibrium. Comparative analysis with previous eQTL studies showed that 30% to 40% of genes identified as eQTL in monocytes, liver tissue, lymphoblastoid cell lines, T cells, and fibroblasts are also eQTL in ileal tissue. Conversely, most of the significant eQTL have not been previously identified and could be tissue specific. These are involved in many cell functions, including division and antigen processing and presentation. Our analysis confirmed that previously published cis-eQTL are single nucleotide polymorphisms associated with inflammatory bowel disease: rs2298428/UBE2L3, rs1050152/SLC22A4, and SLC22A5. We identified many new associations between inflammatory bowel disease susceptibility loci and gene expression.
eQTL analysis of intestinal tissue supports findings that some eQTL remain stable across cell types, whereas others are specific to the sampled location. Our findings confirm and expand the number of known genotypes associated with expression and could help elucidate mechanisms of intestinal disease.
SNPdb; IBD; Transcriptomics; Systems Biology
Expression Quantitative Trait Locus (eQTL) analysis is a powerful tool to study the biological mechanisms linking the genotype with gene expression. Such analyses can identify genomic locations where genotypic variants influence the expression of genes, both in close proximity to the variant (cis-eQTL), and on other chromosomes (trans-eQTL). Many traditional eQTL methods are based on a linear regression model. In this study, we propose a novel method by which to identify eQTL associations with information theory and machine learning approaches. Mutual Information (MI) is used to describe the association between genetic marker and gene expression. MI can detect both linear and non-linear associations. Moreover, it can capture the heterogeneity of the population. Advanced feature selection methods, Maximum Relevance Minimum Redundancy (mRMR) and Incremental Feature Selection (IFS), were applied to optimize the selection of the genes affected by the genetic marker. When we applied our method to a study of apoE-deficient mice, it was found that the cis-acting eQTLs are stronger than trans-acting eQTLs but there are more trans-acting eQTLs than cis-acting eQTLs. We compared our results (mRMR.eQTL) with R/qtl, and MatrixEQTL (modelLINEAR and modelANOVA). In female mice, 67.9% of mRMR.eQTL results can be confirmed by at least two other methods while only 14.4% of R/qtl results can be confirmed by at least two other methods. In male mice, 74.1% of mRMR.eQTL results can be confirmed by at least two other methods while only 18.2% of R/qtl results can be confirmed by at least two other methods. Our methods provide a new way to identify the association between genetic markers and gene expression. Our software is available in the supporting information.
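To make concrete the claim that mutual information can detect non-linear genotype-expression associations, the following is a minimal sketch of a plug-in MI estimate between a genotype vector and binned expression values. It is not the authors' mRMR.eQTL software: the mRMR and IFS steps are not implemented, the equal-frequency binning and all function names are assumptions, and the example data are invented.

```python
from collections import Counter
from math import log2


def bin_expression(values, n_bins=3):
    """Assign each expression value to an equal-frequency bin by rank (assumed scheme)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum over observed cells of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
    return sum((c / n) * log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())


# Hypothetical data: heterozygotes (genotype 1) have low expression, both
# homozygote classes have high expression. A linear model in allele dosage
# sees no trend here, but MI is clearly positive.
genotypes = [0, 0, 1, 1, 2, 2]
expression_bins = [1, 1, 0, 0, 1, 1]
print(round(mutual_information(genotypes, expression_bins), 3))
```

In a fuller pipeline, continuous expression values would first pass through something like `bin_expression`, and the resulting MI scores would feed a feature selection step such as mRMR.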