Chronic obstructive pulmonary disease (COPD) is the fourth leading cause of mortality worldwide. Recent genome-wide association studies (GWAS) have identified robust susceptibility loci associated with COPD. However, the mechanisms mediating the risk conferred by these loci remain to be found. The goal of this study was to identify causal genes/variants within susceptibility loci associated with COPD. In the discovery cohort, genome-wide gene expression profiles of 500 non-tumor lung specimens were obtained from patients undergoing lung surgery. Blood DNA from the same patients was genotyped for 1.2 million SNPs. Following genotyping and gene expression quality control filters, 409 samples were analyzed. Lung expression quantitative trait loci (eQTLs) were identified and overlaid onto three COPD susceptibility loci derived from GWAS: 4q31 (HHIP), 4q22 (FAM13A), and 19q13 (RAB4B, EGLN2, MIA, CYP2A6). Significant eQTLs were replicated in two independent datasets (n = 363 and 339). SNPs previously associated with COPD and lung function on 4q31 (rs1828591, rs13118928) were associated with the mRNA expression of HHIP. An association between mRNA expression level of FAM13A and SNP rs2045517 was detected at 4q22, but did not reach statistical significance. At 19q13, significant eQTLs were detected with EGLN2. In summary, this study supports HHIP, FAM13A, and EGLN2 as the most likely causal COPD genes on 4q31, 4q22, and 19q13, respectively. Strong lung eQTL SNPs identified in this study will need to be tested for association with COPD in case-control studies. Further functional studies will also be needed to understand the role of genes regulated by disease-related variants in COPD.
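The abstract above does not state which association model was used to call lung eQTLs. As a minimal sketch only, assuming the commonly used additive model in which expression is regressed on allele dosage (0, 1 or 2 copies of the minor allele), a single SNP-transcript test could look like the following. The function name and the toy data are hypothetical and are not taken from the study.

```python
import math


def eqtl_regression(dosages, expression):
    """Least-squares regression of expression on allele dosage.

    Returns (slope, t_statistic, degrees_of_freedom). The slope is the
    estimated change in expression per copy of the counted allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual sum of squares and standard error of the slope.
    rss = sum((e - (intercept + slope * g)) ** 2
              for g, e in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, t_stat, n - 2


# Hypothetical toy data: 10 individuals, one SNP, one transcript.
toy_dosage = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
toy_expr = [5.1, 4.9, 5.6, 5.4, 6.2, 5.9, 5.0, 5.5, 6.1, 5.3]
slope, t_stat, df = eqtl_regression(toy_dosage, toy_expr)
print(f"slope per allele = {slope:.3f}, t = {t_stat:.2f}, df = {df}")
```

In a genome-wide setting the t statistic would be converted to a P value and corrected for the number of SNP-transcript pairs tested; covariates such as age, sex and smoking status are usually added to the model, which this sketch omits.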
Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.
When interpreting genome-wide association studies showing that specific genetic variants are associated with disease risk, scientists look for a link between the genetic variant and a biological mechanism behind that disease. One functional mechanism is that the genetic variant may influence gene transcription via a co-localized genomic regulatory element, such as a transcription factor binding site within an open chromatin region. Often this type of regulation occurs in some cell types but not others. In this study, we look across eleven gene expression studies with seven cell types and consider how genetic transcription regulators, or eQTLs, replicate within and between cell types. We identify pervasive allelic heterogeneity, or transcriptional control of a single gene by multiple, independent eQTLs. We integrate extensive data on cell type specific regulatory elements from ENCODE to identify general methods of transcription regulation through enrichment of eQTLs within regulatory elements. We also build a classifier to predict eQTL replication across cell types. The results in this paper present a path to an integrative, predictive approach to improve our ability to understand the mechanistic basis of human phenotypic variation.
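The study above uses the overlap between eQTL SNPs and cell type specific cis-regulatory elements (CREs) as classifier features. The sketch below shows one way such binary overlap features could be built for a single SNP. It is an illustration under stated assumptions, not the authors' pipeline: the track names, the coordinates, and the half-open interval convention are all hypothetical, and the intervals of each track are assumed to be sorted and non-overlapping on one chromosome.

```python
from bisect import bisect_right


def overlaps_any(position, intervals):
    """True if position lies in any half-open (start, end) interval.

    intervals must be sorted by start and must not overlap each other.
    """
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < intervals[i][1]


def cre_features(snp_position, cre_tracks):
    """One 0/1 feature per CRE track: does the SNP fall inside the track?"""
    return {name: int(overlaps_any(snp_position, intervals))
            for name, intervals in cre_tracks.items()}


# Hypothetical CRE tracks on a single chromosome.
toy_tracks = {
    "open_chromatin_cellA": [(100, 200), (500, 650)],
    "enhancer_mark_cellB": [(150, 180), (900, 1000)],
}
features_160 = cre_features(160, toy_tracks)
features_600 = cre_features(600, toy_tracks)
print(features_160)
print(features_600)
```

Feature vectors of this kind, one row per eQTL SNP, could then be passed to any standard classifier, including a random forest as named in the abstract.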
Schizophrenia (SZ) is a complex disorder resulting from both genetic and environmental causes with a lifetime prevalence worldwide of 1%; however, there are no specific, sensitive and validated biomarkers for SZ. A general unifying hypothesis has been put forward that disease-associated single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS) are more likely to be associated with gene expression quantitative trait loci (eQTL). We will describe this hypothesis and review primary methodology with refinements for testing this paradigmatic approach in SZ. We will describe biomarker studies of SZ and testing enrichment of SNPs that are associated both with eQTLs and existing GWAS of SZ. SZ-associated SNPs that overlap with eQTLs can be placed into gene–gene expression, protein–protein and protein–DNA interaction networks. Further, those networks can be tested by reducing/silencing the gene expression levels of critical nodes. We present pilot data to support these methods of investigation such as the use of eQTLs to annotate GWASs of SZ, which could be applied to the field of biomarker discovery. Those networks that have association with SNP markers, especially cis-regulated expression, might lead to a clearer understanding of important candidate genes that predispose to disease and alter expression. This method has general application to many complex disorders.
expression quantitative trait loci; cis-regulatory SNPs; GWAS; gene expression; lymphoblastoid cell lines
There is genetic evidence that schizophrenia is a polygenic disorder with a large number of loci of small effect on disease susceptibility. Genome-wide association studies (GWASs) of schizophrenia have had limited success, with the best finding at the MHC locus at chromosome 6p. A recent effort of the Psychiatric GWAS consortium (PGC) yielded five novel loci for schizophrenia. In this study, we aim to highlight additional schizophrenia susceptibility loci from the PGC study by combining the top association findings from the discovery stage (9394 schizophrenia cases and 12 462 controls) with expression QTLs (eQTLs) and differential gene expression in whole blood of schizophrenia patients and controls. We examined the 6192 single-nucleotide polymorphisms (SNPs) with significance threshold at P<0.001. eQTLs were calculated for these SNPs in a sample of healthy controls (n=437). The transcripts significantly regulated by the top SNPs from the GWAS meta-analysis were subsequently tested for differential expression in an independent set of schizophrenia cases and controls (n=202). After correction for multiple testing, the eQTL analysis yielded 40 significant cis-acting effects of the SNPs. Seven of these transcripts show differential expression between cases and controls. Of these, the effect of three genes (RNF5, TRIM26 and HLA-DRB3) coincided with the direction expected from meta-analysis findings and were all located within the MHC region. Our results identify new genes of interest and highlight again the involvement of the MHC region in schizophrenia susceptibility.
expression; schizophrenia; eQTL; MHC; SNP
Genome-wide association studies (GWAS) have identified loci reproducibly associated with pulmonary diseases; however, the molecular mechanisms underlying these associations are largely unknown. The objectives of this study were to discover genetic variants affecting gene expression in human lung tissue, to refine susceptibility loci for asthma identified in GWAS, and to use the genetics of gene expression and network analyses to find key molecular drivers of asthma. We performed a genome-wide search for expression quantitative trait loci (eQTL) in 1,111 human lung samples. The lung eQTL dataset was then used to inform asthma genetic studies reported in the literature. The top ranked lung eQTLs were integrated with the GWAS on asthma reported by the GABRIEL consortium to generate a Bayesian gene expression network for discovery of novel molecular pathways underpinning asthma. We detected 17,178 cis- and 593 trans- lung eQTLs, which can be used to explore the functional consequences of loci associated with lung diseases and traits. Some strong eQTLs are also asthma susceptibility loci. For example, rs3859192 on chr17q21 is robustly associated with the mRNA levels of GSDMA (P = 3.55 × 10^−151). The genetic-gene expression network identified the SOCS3 pathway as one of the key drivers of asthma. The eQTLs and gene networks identified in this study are powerful tools for elucidating the causal mechanisms underlying pulmonary disease. This data resource offers much-needed support to pinpoint the causal genes and characterize the molecular function of gene variants associated with lung diseases.
Recent genome-wide association studies (GWAS) have identified genetic variants associated with lung diseases. The challenge now is to find the causal genes in GWAS–nominated chromosomal regions and to characterize the molecular function of disease-associated genetic variants. In this paper, we describe an international effort to systematically capture the genetic architecture of gene expression regulation in human lung. By studying lung specimens from 1,111 individuals of European ancestry, we found a large number of genetic variants affecting gene expression in the lung, or lung expression quantitative trait loci (eQTL). These lung eQTLs will serve as an important resource to aid in the understanding of the molecular underpinnings of lung biology and its disruption in disease. To demonstrate the utility of this lung eQTL dataset, we integrated our data with previous genetic studies on asthma. Through integrative techniques, we identified causal variants and genes in GWAS–nominated loci and found key molecular drivers for asthma. We feel that sharing our lung eQTL dataset with the scientific community will leverage the impact of previous large-scale GWAS on lung diseases and function by providing much-needed functional information to understand the molecular changes introduced by the susceptibility genetic variants.
Variation in gene expression has been found to be important in disease susceptibility and pharmacogenomics. Local and distant expression quantitative trait loci (eQTLs) have been identified via genome-wide association study (GWAS); yet the functional analysis of these variants has been challenging. The aim of this study was to unravel the functional consequence of a gene with a local SNP with evidence for local and distant regulatory roles in cellular sensitivity to cisplatin, one of the most widely used chemotherapeutic drugs. To this end, we measured cellular susceptibility to cisplatin in 176 HapMap lymphoblastoid cell lines derived from Yoruba individuals from Ibadan, Nigeria. The 276 cytotoxicity-associated SNPs at the suggestive threshold of P ≤ 0.0001 were significantly enriched for eQTLs. Of these SNPs, we found one intronic SNP, rs17115814, that had a significant relationship with the expression level of its host gene, PRPF39 (P= 0.0007), and a significant correlation with the expression of over 100 distant transcripts (P ≤ 0.0001). Successful knockdown of PRPF39 expression using siRNA resulted in a significant increase in cisplatin resistance. We then measured the expression of 61 downstream targets after PRPF39 knockdown and found 53 gene targets had significant (P ≤ 0.05) expression changes. Included in the list of genes that significantly changed after PRPF39 knockdown were MAP3K4 and TFPD2, two important signaling genes previously shown to be relevant in cisplatin response. Thus, modulation of a local target gene identified through a GWAS was followed by a downstream cascade of gene expression changes resulting in greater resistance to cisplatin.
The integrated analysis of genotypic and expression data for association with complex traits could identify novel genetic pathways involved in complex traits. We profiled 19,573 expression probes in Epstein-Barr virus-transformed lymphoblastoid cell lines (LCLs) from 299 twins and correlated these with 44 quantitative traits (QTs). For 939 expressed probes correlating with more than one QT, we investigated the presence of eQTL associations in three datasets of 57 CEU HapMap founders and 86 unrelated twins. Genome-wide association analysis of these probes with 2.2 million SNPs revealed 131 potential eQTLs (1,989 eQTL SNPs) overlapping between the HapMap datasets, five of which were in cis (58 eQTL SNPs). We then tested 535 SNPs tagging the eQTL SNPs for association with the relevant QT in 2,905 twins. We identified nine potential SNP-QT associations (P<0.01) but none significantly replicated in five large consortia of 1,097–16,129 subjects. We also failed to replicate previously reported eQTL associations with body mass index, plasma low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglyceride levels derived from lymphocytes, adipose and liver tissue. Our results and additional power calculations suggest that proponents may have been overoptimistic in the power of LCLs in eQTL approaches to elucidate regulatory genetic effects on complex traits using the small datasets generated to date. Nevertheless, larger tissue-specific expression data sets relevant to specific traits are becoming available, and should enable the adoption of similar integrated analyses in the near future.
Expression quantitative trait loci (eQTL) mapping is a powerful tool for identifying genetic regulatory variation. However, at present, most eQTLs in humans were identified using gene expression data from cell lines, and it remains unknown whether these eQTLs also have a regulatory function in other expression contexts, such as human primary tissues. Here we investigate this question using a targeted strategy. Specifically, we selected a subset of large-effect eQTLs identified in the HapMap lymphoblastoid cell lines, and examined the association of these eQTLs with gene expression levels across individuals in five human primary tissues (heart, kidney, liver, lung and testes). We show that genotypes at the eQTLs we selected are often predictive of variation in gene expression levels in one or more of the five primary tissues. The genotype effects in the primary tissues are consistently in the same direction as the effects inferred in the cell lines. Additionally, a number of the eQTLs we tested are found in more than one of the tissues. Our results indicate that functional studies in cell lines may uncover a substantial amount of genetic variation that affects gene expression levels in human primary tissues.
Rationale: Chromosome 12p has been linked to chronic obstructive pulmonary disease (COPD) in the Boston Early-Onset COPD Study (BEOCOPD), but a susceptibility gene in that region has not been identified.
Objectives: We used high-density single-nucleotide polymorphism (SNP) mapping to implicate a COPD susceptibility gene and an animal model to determine the potential role of SOX5 in lung development and COPD.
Methods: On chromosome 12p, we genotyped 1,387 SNPs in 386 COPD cases from the National Emphysema Treatment Trial and 424 control smokers from the Normative Aging Study. SNPs with significant associations were then tested in the BEOCOPD study and the International COPD Genetics Network. Based on the human results, we assessed histology and gene expression in the lungs of Sox5−/− mice.
Measurements and Main Results: In the case-control analysis, 27 SNPs were significant at P ≤ 0.01. The most significant SNP in the BEOCOPD replication was rs11046966 (National Emphysema Treatment Trial–Normative Aging Study P = 6.0 × 10^−4, BEOCOPD P = 1.5 × 10^−5, combined P = 1.7 × 10^−7), located 3′ to the gene SOX5. Association with rs11046966 was not replicated in the International COPD Genetics Network. Sox5−/− mice showed abnormal lung development, with a delay in maturation before the saccular stage, as early as E16.5. Lung pathology in Sox5−/− lungs was associated with a decrease in fibronectin expression, an extracellular matrix component critical for branching morphogenesis.
Conclusions: Genetic variation in the transcription factor SOX5 is associated with COPD susceptibility. A mouse model suggests that the effect may be due, in part, to its effects on lung development and/or repair processes.
chronic obstructive pulmonary disease; emphysema; knockout mice; lung development; single nucleotide polymorphism
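The results above report a combined P value for rs11046966 across two populations but do not say how the two P values were combined. As a worked illustration only, the sketch below applies Fisher's method, one standard way to combine independent P values; the function name is hypothetical. With the two reported inputs it gives roughly 1.8 × 10^-7, close to the reported combined value, which is consistent with Fisher's method but does not establish that the authors used it.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # equals x / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# P values as reported in the abstract for the two populations.
combined_p = fisher_combined_p([6.0e-4, 1.5e-5])
print(f"Fisher combined P = {combined_p:.2e}")
```

Fisher's method ignores effect direction and sample size; a fixed-effects meta-analysis of effect estimates is a common alternative when those are available.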
Multiple intergenic single-nucleotide polymorphisms (SNPs) near hedgehog interacting protein (HHIP) on chromosome 4q31 have been strongly associated with pulmonary function levels and moderate-to-severe chronic obstructive pulmonary disease (COPD). However, whether the effects of variants in this region are related to HHIP or another gene has not been proven. We confirmed genetic association of SNPs in the 4q31 COPD genome-wide association study (GWAS) region in a Polish cohort containing severe COPD cases and healthy smoking controls (P = 0.001 to 0.002). We found that HHIP expression at both mRNA and protein levels is reduced in COPD lung tissues. We identified a genomic region located ∼85 kb upstream of HHIP which contains a subset of associated SNPs, interacts with the HHIP promoter through a chromatin loop and functions as an HHIP enhancer. The COPD risk haplotype of two SNPs within this enhancer region (rs6537296A and rs1542725C) was associated with statistically significant reductions in HHIP promoter activity. Moreover, rs1542725 demonstrates differential binding to the transcription factor Sp3; the COPD-associated allele exhibits increased Sp3 binding, which is consistent with Sp3's usual function as a transcriptional repressor. Thus, increased Sp3 binding at a functional SNP within the chromosome 4q31 COPD GWAS locus leads to reduced HHIP expression and increased susceptibility to COPD through distal transcriptional regulation. Together, our findings reveal one mechanism through which SNPs upstream of the HHIP gene modulate the expression of HHIP and functionally implicate reduced HHIP gene expression in the pathogenesis of COPD.
Many disease-associated variants affect gene expression levels (expression quantitative trait loci, eQTLs) and expression profiling using next generation sequencing (NGS) technology is a powerful way to detect these eQTLs. We analyzed 94 total blood samples from healthy volunteers with DeepSAGE to gain specific insight into how genetic variants affect the expression of genes and lengths of 3′-untranslated regions (3′-UTRs). We detected previously unknown cis-eQTL effects for GWAS hits in disease- and physiology-associated traits. Apart from cis-eQTLs that are typically easily identifiable using microarrays or RNA-sequencing, DeepSAGE also revealed many cis-eQTLs for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. We also identified and confirmed SNPs that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of messenger RNAs (mRNA). We then combined the power of RNA-sequencing with DeepSAGE by performing a meta-analysis of three datasets, leading to the identification of many more cis-eQTLs. Our results indicate that DeepSAGE data is useful for eQTL mapping of known and unknown transcripts, and for identifying SNPs that affect alternative polyadenylation. Because of the inherent differences between DeepSAGE and RNA-sequencing, our complementary, integrative approach leads to greater insight into the molecular consequences of many disease-associated variants.
Many genetic variants that are associated with diseases also affect gene expression levels. We used a next generation sequencing approach targeting 3′ transcript ends (DeepSAGE) to gain specific insight into how genetic variants affect the expression of genes and the usage and length of 3′-untranslated regions. We detected many associations for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. Some of these variants are also associated with disease. We also identified and confirmed variants that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of mRNAs. We conclude that DeepSAGE is useful for detecting eQTL effects on both known and unknown transcripts, and for identifying variants that affect alternative polyadenylation.
Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3′ untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation.
Following the recent success of genome-wide association studies in uncovering disease-associated genetic variants, the next challenge is to understand how these variants affect downstream pathways. The most proximal trait to a disease-associated variant, most commonly a single nucleotide polymorphism (SNP), is differential gene expression due to the cis effect of SNP alleles on transcription, translation, and/or splicing, that is, a gene expression quantitative trait locus (eQTL). Several genome-wide SNP–gene expression association studies have already provided convincing evidence of widespread association of eQTLs. As a consequence, some eQTL associations are found in the same genomic region as a disease variant, either as a coincidence or a causal relationship. Cis-regulation of RPS26 gene expression and a type 1 diabetes (T1D) susceptibility locus have been colocalized to the 12q13 genomic region. A recent study has also suggested RPS26 as the most likely susceptibility gene for T1D in this genomic region. However, it is still not clear whether this colocalization is the result of chance alone or if RPS26 expression is directly correlated with T1D susceptibility, and therefore, potentially causal. Here, we derive and apply a statistical test of this hypothesis. We conclude that RPS26 expression is unlikely to be the molecular trait responsible for T1D susceptibility at this locus, at least not in a direct, linear connection.
Association studies; Gene expression; RPS26; T1D
Expression quantitative trait loci (eQTLs) have been found to be enriched in trait-associated single-nucleotide polymorphisms (SNPs). However, whether eQTLs are adaptive to different environmental factors, and their evolutionary significance relative to nonsynonymous SNPs (NS SNPs), remain elusive. Compiling environmental correlation data from three studies for more than 500,000 SNPs and 42 environmental factors, including climate, subsistence, pathogens, and dietary patterns, we performed a systematic examination of the adaptive patterns of eQTLs to local environment. Compared with intergenic SNPs, eQTLs are significantly enriched in the lower tail of a transformed rank statistic in the environmental correlation analysis, indicating possible adaptation of eQTLs to the majority of 42 environmental factors. The mean enrichment of eQTLs across 42 environmental factors is as great as, if not greater than, that of NS SNPs. The enrichment of eQTLs, although significant across all levels of recombination rate, is inversely correlated with recombination rate, suggesting the presence of selective sweeps or background selection. Further pathway enrichment analysis identified a number of pathways with possible environmental adaptation from eQTLs. These pathways are mostly related to immune function and metabolism. Our results indicate that eQTLs might have played an important role in recent and ongoing human adaptation and are of special importance for some environmental factors and biological pathways.
eQTLs; environmental adaptation; population genetics
Identification of expression quantitative trait loci (eQTLs) is an emerging area in genomic study. The task requires an integrated analysis of genome-wide single nucleotide polymorphism (SNP) data and gene expression data, raising a new computational challenge due to the tremendous size of data.
We develop a method to identify eQTLs. The method represents eQTLs as information flux between genetic variants and transcripts. We use information theory to simultaneously interrogate SNP and gene expression data, resulting in a Transcriptional Information Map (TIM) which captures the network of transcriptional information that links genetic variations, gene expression and regulatory mechanisms. These maps are able to identify both cis- and trans- regulating eQTLs. The application on a dataset of leukemia patients identifies eQTLs in the regions of the GART, PCP4, DSCAM, and RIPK4 genes that regulate ADAMTS1, a known leukemia correlate.
The information theory approach presented in this paper is able to infer the dependence networks between SNPs and transcripts, which in turn can identify cis- and trans-eQTLs. The application of our method to the leukemia study explains how genetic variants and gene expression are linked to leukemia.
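The abstract above describes eQTLs as information flux between genetic variants and transcripts but does not define the quantity used to build the Transcriptional Information Map. As a generic illustration of the underlying idea, and not the authors' method, the sketch below estimates the mutual information (in bits) between a SNP genotype and a discretized expression level from sample counts. The median split used for discretization and all names are assumptions made for this example.

```python
import math
from collections import Counter


def median_split(values):
    """Label each value 'hi' or 'lo' relative to the sample median."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return ["hi" if v > median else "lo" for v in values]


def mutual_information(genotypes, expression_bins):
    """Plug-in estimate of I(G; E) in bits from paired discrete samples."""
    n = len(genotypes)
    if n == 0 or n != len(expression_bins):
        raise ValueError("need matched, non-empty vectors")
    joint = Counter(zip(genotypes, expression_bins))
    count_g = Counter(genotypes)
    count_e = Counter(expression_bins)
    mi = 0.0
    for (g, e), c in joint.items():
        # p(g, e) / (p(g) * p(e)) simplifies to c * n / (count_g * count_e).
        mi += (c / n) * math.log2(c * n / (count_g[g] * count_e[e]))
    return mi


# Hypothetical toy data: expression tracks genotype perfectly.
toy_genotypes = [0, 0, 1, 1]
toy_bins = median_split([1.0, 1.2, 3.1, 2.9])
mi_dependent = mutual_information(toy_genotypes, toy_bins)
print(f"I(G; E) = {mi_dependent:.2f} bits")
```

A value near zero suggests no dependence between genotype and expression in the sample; the plug-in estimate is biased upward in small samples, so permutation of sample labels is a common way to assess significance.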
Numerous single nucleotide polymorphisms (SNPs) associated with complex diseases have been identified by genome-wide association studies (GWAS) and expression quantitative trait loci (eQTLs) studies. However, few of these SNPs have explicit biological functions. Recent studies indicated that the SNPs within the 3’UTR regions of susceptibility genes could affect complex traits/diseases by affecting the function of miRNAs. These 3’UTR SNPs are functional candidates and therefore of interest to GWAS and eQTL researchers.
We developed a publicly available online database, MirSNP (http://cmbi.bjmu.edu.cn/mirsnp), which is a collection of human SNPs in predicted miRNA-mRNA binding sites. We identified 414,510 SNPs that might affect miRNA-mRNA binding. Annotations were added to these SNPs to predict whether a SNP within the target site would decrease/break or enhance/create an miRNA-mRNA binding site. By applying MirSNP database to three brain eQTL data sets, we identified four unreported SNPs (rs3087822, rs13042, rs1058381, and rs1058398), which might affect miRNA binding and thus affect the expression of their host genes in the brain. We also applied the MirSNP database to our GWAS for schizophrenia: seven predicted miRNA-related SNPs (p < 0.0001) were found in the schizophrenia GWAS. Our findings identified the possible functions of these SNP loci, and provide the basis for subsequent functional research.
MirSNP can identify putative miRNA-related SNPs from GWAS and eQTL studies and provide direction for subsequent functional research.
microRNA; Single nucleotide polymorphism (SNP); Genome-wide association study (GWAS); Expression quantitative trait loci (eQTLs); MirSNP
Genetic factors play a role in chronic obstructive pulmonary disease (COPD) but are poorly understood. A number of candidate genes have been proposed on the basis of the pathogenesis of COPD. These include the matrix metalloproteinase (MMP) genes which play a role in tissue remodelling and fit in with the protease - antiprotease imbalance theory for the cause of COPD. Previous genetic studies of MMPs in COPD have had inadequate coverage of the genes, and have reported conflicting associations of both single nucleotide polymorphisms (SNPs) and SNP haplotypes, plausibly due to under-powered studies.
To address these issues we genotyped 26 SNPs, providing comprehensive coverage of reported SNP variation, in MMP-1, MMP-9 and MMP-12 from 977 COPD patients and 876 non-diseased smokers of European descent and evaluated their association with disease singly and in haplotype combinations. We used logistic regression to adjust for age, gender, centre and smoking history.
Haplotypes of two SNPs in MMP-12 (rs652438 and rs2276109) showed an association with severe/very severe disease, corresponding to GOLD Stages III and IV.
Those with the common A-A haplotype for these two SNPs were at greater risk of developing severe/very severe disease (p = 0.0039) while possession of the minor G variants at either SNP locus had a protective effect (adjusted odds ratio of 0.76; 95% CI 0.61–0.94). The A-A haplotype was also associated with significantly lower predicted FEV1 (42.62% versus 44.79%; p = 0.0129). This implicates haplotypes of MMP-12 as modifiers of disease severity.
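The adjusted odds ratio and confidence interval reported above come from logistic regression, where the odds ratio is the exponential of a regression coefficient and a Wald 95% interval is the exponential of the coefficient plus or minus about 1.96 standard errors. The sketch below shows that relationship and, as a consistency check, recovers an approximate coefficient and standard error from the published figures. The function names are hypothetical, and the assumption that the reported interval is a Wald interval is mine rather than something stated in the abstract.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence limits from a logistic coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


def beta_se_from_ci(odds_ratio, lower, upper, z=Z_95):
    """Back out the log-odds coefficient and its standard error."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Figures as reported: adjusted OR 0.76, 95% CI 0.61 to 0.94.
beta, se = beta_se_from_ci(0.76, 0.61, 0.94)
or_point, or_lower, or_upper = odds_ratio_with_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"OR = {or_point:.2f} (95% CI {or_lower:.2f} to {or_upper:.2f})")
```

Because the interval is symmetric on the log scale, the point estimate should sit near the geometric mean of the limits; here the square root of 0.61 × 0.94 is about 0.76, matching the reported odds ratio.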
Expression quantitative trait loci (eQTL), or genetic variants associated with changes in gene expression, have the potential to assist in interpreting results of genome-wide association studies (GWAS). eQTLs also have varying degrees of tissue specificity. By correlating the statistical significance of eQTLs mapped in various tissue types to their odds ratios reported in a large GWAS by the Wellcome Trust Case Control Consortium (WTCCC), we discovered that there is a significant association between diseases studied genetically and their relevant tissues. This suggests that eQTL data sets can be used to determine tissues that play a role in the pathogenesis of a disease, thereby highlighting these tissue types for further post-GWAS functional studies.
Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Although recent genome-wide association studies (GWAS) have contributed to discovery of SLE susceptibility genes, few studies have been performed in Asian populations. Here, we report a GWAS for SLE examining 891 SLE cases and 3,384 controls and multi-stage replication studies examining 1,387 SLE cases and 28,564 controls in Japanese subjects. Considering that expression quantitative trait loci (eQTLs) have been implicated in genetic risks for autoimmune diseases, we integrated an eQTL study into the results of the GWAS. We observed enrichment of cis-eQTL-positive loci among the known SLE susceptibility loci (30.8%) compared to the genome-wide SNPs (6.9%). In addition, we identified a novel association of a variant in the AF4/FMR2 family, member 1 (AFF1) gene at 4q21 with SLE susceptibility (rs340630; P = 8.3 × 10^−9, odds ratio = 1.21). The risk A allele of rs340630 demonstrated a cis-eQTL effect on the AFF1 transcript with enhanced expression levels (P<0.05). As AFF1 transcripts were prominently expressed in CD4+ and CD19+ peripheral blood lymphocytes, up-regulation of AFF1 may cause the abnormality in these lymphocytes, leading to disease onset.
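The abstract above compares the fraction of cis-eQTL-positive loci among known SLE loci with the fraction among genome-wide SNPs but gives only percentages. One common way to test such an enrichment is a one-sided hypergeometric test, sketched below. The counts are hypothetical, chosen only so that the fractions resemble those in the text (4 of 13 is about 30.8%, and 69 of 1,000 is 6.9%); the study's actual counts and test are not stated here.

```python
from math import comb


def hypergeom_upper_tail(k, n_draws, n_success, n_total):
    """P(X >= k) for X ~ Hypergeometric(n_total, n_success, n_draws).

    n_total   : size of the background set (e.g. all tested SNPs)
    n_success : background items with the property (e.g. cis-eQTL positive)
    n_draws   : size of the set of interest (e.g. known disease loci)
    k         : items in the set of interest that have the property
    """
    top = min(n_draws, n_success)
    denominator = comb(n_total, n_draws)
    numerator = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draws - i)
        for i in range(k, top + 1)
    )
    return numerator / denominator


# Hypothetical counts, not taken from the study.
enrichment_p = hypergeom_upper_tail(k=4, n_draws=13,
                                    n_success=69, n_total=1000)
print(f"one-sided enrichment P = {enrichment_p:.4f}")
```

In practice, linkage disequilibrium between SNPs violates the independence assumed here, so permutation or matched-SNP sampling schemes are often preferred over a plain hypergeometric test.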
Although recent genome-wide association study (GWAS) approaches have successfully contributed to disease gene discovery, many susceptibility loci are known to remain uncaptured because of the strict significance thresholds required for multiple hypothesis testing. Therefore, prioritization of GWAS results by incorporating additional information is recommended. Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Considering that abnormalities in B cell activity play essential roles in SLE, prioritization based on an expression quantitative trait loci (eQTL) study for B cells would be a promising approach. In this study, we report a GWAS and multi-stage replication studies for SLE examining 2,278 SLE cases and 31,948 controls in Japanese subjects. We integrated an eQTL study into the results of the GWAS and identified AFF1 as a novel SLE susceptibility locus. We also confirmed a cis-regulatory effect of the locus on the AFF1 transcript. Our study is one of the initial successes in detecting a novel genetic locus using an eQTL study, and it should contribute to our understanding of the genetic loci left uncaptured by standard GWAS approaches.
While genome-wide association studies (GWASs) have been successful in identifying novel variants associated with various diseases, it has been much more difficult to determine the biological mechanisms underlying these associations. Expression quantitative trait loci (eQTL) provide another dimension to these data by associating single nucleotide polymorphisms (SNPs) with gene expression. We hypothesised that integrating SNPs known to be associated with type 2 diabetes with eQTLs and coexpression networks would enable the discovery of novel candidate genes for type 2 diabetes.
We selected 32 SNPs associated with type 2 diabetes in two or more independent GWASs. We used previously described eQTLs mapped from genotype and gene expression data collected from 1,008 morbidly obese patients to find genes with expression associated with these SNPs. We linked these genes to coexpression modules, and ranked the other genes in these modules using an inverse sum score.
We found 62 genes with expression associated with type 2 diabetes SNPs. We validated our method by linking highly ranked genes in the coexpression modules back to SNPs through a combined eQTL dataset. We showed that the eQTLs highlighted by this method are significantly enriched for association with type 2 diabetes in data from the Wellcome Trust Case Control Consortium (WTCCC, p = 0.026) and the Gene Environment Association Studies (GENEVA, p = 0.042), validating our approach. Many of the highly ranked genes are also involved in the regulation or metabolism of insulin, glucose or lipids.
We have devised a novel method, involving the integration of datasets of different modalities, to discover novel candidate genes for type 2 diabetes.
Genetics of type 2 diabetes; Genomics/proteomics; Mathematical modelling and simulation
Genetic variation in the expression of human xenobiotic metabolism enzymes and transporters (XMETs) leads to inter-individual variability in the metabolism of therapeutic agents as well as differing susceptibility to various diseases. Recent expression quantitative trait loci (eQTL) mapping in a few human cells/tissues has identified a number of single nucleotide polymorphisms (SNPs) significantly associated with mRNA expression of many XMET genes. These eQTLs are therefore important candidate markers for pharmacogenetic studies. However, questions remain about whether these SNPs are causative and by what mechanism they may function. Given the important role of microRNAs (miRs) in the regulation of gene expression, we hypothesize that those eQTLs, or their proxies in strong linkage disequilibrium (LD), that alter miR targeting are likely causative SNPs affecting gene expression. The aim of this study is to identify eQTLs potentially regulating major XMETs via interference with miR targeting. To this end, we performed a genome-wide screening for eQTLs for 409 genes encoding major drug metabolism enzymes, transporters and transcription factors, in publicly available eQTL datasets generated from the HapMap lymphoblastoid cell lines and human liver and brain tissue. As a result, 308 eQTLs significantly (p < 10⁻⁵) associated with mRNA expression of 101 genes were identified. We further identified 7,869 SNPs in strong LD (r² ≥ 0.8) with these eQTLs using the 1000 Genomes SNP data. Among these 8,177 SNPs, 27 are located in the 3′-UTR of 14 genes. Using two algorithms predicting miR-SNP interaction, we found that almost all these SNPs (26 out of 27) were predicted by both algorithms to create, abolish, or change the target site for miRs. Many of these miRs were also expressed in the same tissue in which the eQTLs were identified. Our study provides a strong rationale for continued investigation of the functions of these eQTLs in pharmacogenetic settings.
eQTL; xenobiotic metabolism enzyme and transporter; microRNA; pharmacogenetics; 3′-UTR
Due to the pleiotropic effects of nitric oxide (NO) within the lungs, it is likely that NO is a significant factor in the pathogenesis of chronic obstructive pulmonary disease (COPD). The aim of this study was to test for association between single nucleotide polymorphisms (SNPs) in three NO synthase (NOS) genes and lung function, as well as to examine gene expression and protein levels in relation to the genetic variation.
One SNP in each NOS gene (neuronal NOS (NOS1), inducible NOS (NOS2), and endothelial NOS (NOS3)) was genotyped in the Lung Health Study (LHS) and correlated with lung function. One SNP (rs1800779) was also analyzed for association with COPD and lung function in four COPD case–control populations. Lung tissue expression of NOS3 mRNA and protein was tested in individuals of known genotype for rs1800779. Immunohistochemistry of lung tissue was used to localize NOS3 expression.
For the NOS3 rs1800779 SNP, the baseline forced expiratory volume in one second in the LHS was significantly higher in the combined AG + GG genotypic groups compared with the AA genotypic group. Gene expression and protein levels in lung tissue were significantly lower in subjects with the AG + GG genotypes than in AA subjects. NOS3 protein was expressed in the airway epithelium and subjects with the AA genotype demonstrated higher NOS3 expression compared with AG and GG individuals. However, we were not able to replicate the associations with COPD or lung function in the other COPD study groups.
Variants in the NOS genes were not associated with lung function or COPD status. However, the G allele of rs1800779 resulted in a decrease of NOS3 gene expression and protein levels and this has implications for the numerous disease states that have been associated with this polymorphism.
Chronic obstructive pulmonary disease; Nitric oxide synthase; Polymorphism; Gene expression
Motivation: Expression quantitative trait loci (eQTL) analysis links variations in gene expression levels to genotypes. For modern datasets, eQTL analysis is a computationally intensive task as it involves testing for association of billions of transcript-SNP (single-nucleotide polymorphism) pairs. The heavy computational burden makes eQTL analysis less popular and sometimes forces analysts to restrict their attention to just a small subset of transcript-SNP pairs. As more transcripts and SNPs get interrogated over a growing number of samples, the demand for faster tools for eQTL analysis grows stronger.
Results: We have developed Matrix eQTL, new software for computationally efficient eQTL analysis. In tests on large datasets, it was 2–3 orders of magnitude faster than existing popular tools for QTL/eQTL analysis, while finding the same eQTLs. The fast performance is achieved by special preprocessing and by expressing the most computationally intensive part of the algorithm in terms of large matrix operations. Matrix eQTL supports additive linear and ANOVA models with covariates, including models with correlated and heteroskedastic errors. The issue of multiple testing is addressed by calculating the false discovery rate; this can be done separately for cis- and trans-eQTLs.
Availability: Matlab and R implementations are available for free at http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL
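The idea of recasting per-pair association tests as one matrix operation can be illustrated with a minimal sketch. It is not the Matrix eQTL implementation: it assumes the simplest case (additive linear model, no covariates), in which centering each SNP row and each transcript row and scaling it to unit length turns the whole table of Pearson correlations into a single matrix product. The function names and the toy data are hypothetical, and plain Python lists stand in for an optimized matrix library.

```python
import math

def standardize_rows(matrix):
    """Center each row and scale it to unit Euclidean length."""
    out = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered = [x - mean for x in row]
        norm = math.sqrt(sum(c * c for c in centered))
        if norm == 0:
            raise ValueError("constant row has no defined correlation")
        out.append([c / norm for c in centered])
    return out

def all_pairs_correlation(genotypes, expression):
    """Return R with R[i][j] = Pearson r between SNP i and transcript j.

    After row standardization, R is the matrix product G_std * E_std^T,
    so every SNP-transcript pair is handled by one multiplication.
    """
    g = standardize_rows(genotypes)
    e = standardize_rows(expression)
    return [[sum(a * b for a, b in zip(g_row, e_row)) for e_row in e]
            for g_row in g]

def t_statistic(r, n_samples):
    """t statistic of a simple linear regression, from r (df = n - 2)."""
    return r * math.sqrt((n_samples - 2) / (1.0 - r * r))

# Hypothetical toy data: 2 SNPs and 2 transcripts measured in 5 samples.
genotypes = [
    [0, 1, 2, 1, 0],
    [2, 2, 1, 0, 0],
]
expression = [
    [1.0, 2.1, 2.9, 2.0, 1.1],
    [5.0, 4.0, 4.5, 5.5, 4.8],
]
R = all_pairs_correlation(genotypes, expression)
```

In a real analysis the product would be computed blockwise with an optimized linear algebra routine, and only the entries whose test statistic passes a significance threshold would be kept.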
Recently, several genome-wide association studies (GWAS) have identified many susceptibility single nucleotide polymorphisms (SNPs) for chronic obstructive pulmonary disease (COPD) and lung cancer, two closely related diseases. Some of those SNPs are shared by both diseases, suggesting possible genetic similarity between them. Here we tested the hypothesis that those shared SNPs are common predictors of risk or prognosis of COPD and lung cancer. Two SNPs (rs6495309 and rs1051730) located in the nicotinic acetylcholine receptor alpha 3 (CHRNA3) gene were genotyped in 1511 patients with COPD, 1559 lung cancer cases and 1677 controls in southern and eastern Chinese populations. We found that the rs6495309CC and rs6495309CT/CC variant genotypes were associated with increased risks of COPD (OR = 1.32, 95% CI = 1.14–1.54) and lung cancer (OR = 1.57, 95% CI = 1.31–1.87), respectively. The rs6495309CC genotype contributed to a more rapid annual decline in forced expiratory volume in one second (FEV1) in both COPD cases and controls (P<0.05), and it was associated with advanced stages of COPD (P = 0.033); the rs6495309CT/CC genotypes conferred poor survival in lung cancer (HR = 1.41, 95% CI = 1.13–1.75). The luciferase assays further showed that nicotine and other tobacco chemicals had diverse effects on the luciferase activity of the rs6495309C or T alleles. However, none of these effects were found for the other SNP, rs1051730G>A. The data show a statistical association and suggest biological plausibility that the rs6495309T>C polymorphism contributes to increased risk and poor prognosis of both COPD and lung cancer.
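For readers unfamiliar with how an odds ratio and its 95% confidence interval relate to genotype counts, the following is a minimal sketch. It computes a crude (unadjusted) odds ratio with a Wald interval on the log scale. The abstract does not report genotype counts or state how its estimates were derived, so the function name and the counts below are hypothetical and will not reproduce the reported values.

```python
import math

def odds_ratio_with_ci(carrier_cases, noncarrier_cases,
                       carrier_controls, noncarrier_controls, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    z = 1.96 gives an approximate 95% interval.
    """
    cells = (carrier_cases, noncarrier_cases,
             carrier_controls, noncarrier_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (carrier_cases * noncarrier_controls) / (
        noncarrier_cases * carrier_controls)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
estimate, ci_low, ci_high = odds_ratio_with_ci(20, 10, 10, 20)
```

An interval that excludes 1 corresponds to a nominally significant association at the 5% level. Estimates adjusted for covariates such as age or smoking would instead come from a logistic regression model.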
Rationale: Chronic obstructive pulmonary disease (COPD), characterized by airflow limitation, is a disorder with high phenotypic and genetic heterogeneity. Pulmonary emphysema is a major but variable component of COPD; familial data suggest that different components of COPD, such as emphysema, may be influenced by specific genetic factors.
Objectives: To identify genetic determinants of emphysema assessed through high-resolution chest computed tomography in individuals with COPD.
Methods: We performed a genome-wide association study (GWAS) of emphysema determined from chest computed tomography scans with a total of 2,380 individuals with COPD in three independent cohorts of white individuals: (1) a cohort from Bergen, Norway, (2) the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) Study, and (3) the National Emphysema Treatment Trial (NETT). We tested single-nucleotide polymorphism associations with the presence or absence of emphysema determined by radiologist assessment in two of the three cohorts and a quantitative emphysema trait (percentage of lung voxels less than –950 Hounsfield units) in all three cohorts.
Measurements and Main Results: We identified association of a single-nucleotide polymorphism in BICD1 with the presence or absence of emphysema (P = 5.2 × 10⁻⁷ with at least mild emphysema vs. control subjects; P = 4.8 × 10⁻⁸ with moderate and more severe emphysema vs. control subjects).
Conclusions: Our study suggests that genetic variants in BICD1 are associated with qualitative emphysema in COPD. Variants in BICD1 are associated with length of telomeres, which suggests that a mechanism linked to accelerated aging may be involved in the pathogenesis of emphysema.
Clinical trial registered with www.clinicaltrials.gov (NCT00292552).
emphysema; chronic obstructive pulmonary disease; BICD1; single-nucleotide polymorphism
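The quantitative emphysema trait named in the Methods above (percentage of lung voxels less than -950 Hounsfield units) can be sketched as follows. This is an illustration of the definition only. It assumes the lung has already been segmented so that the input holds lung voxels alone, and the function name and sample values are hypothetical.

```python
def percent_low_attenuation(lung_voxels_hu, threshold_hu=-950):
    """Percentage of lung voxels with attenuation strictly below the threshold."""
    values = list(lung_voxels_hu)
    if not values:
        raise ValueError("no lung voxels supplied")
    below = sum(1 for v in values if v < threshold_hu)
    return 100.0 * below / len(values)

# Hypothetical attenuation values (in Hounsfield units) for five lung voxels.
sample_voxels = [-1000, -960, -950, -900, -800]
percent_emphysema = percent_low_attenuation(sample_voxels)
```

The comparison is strict, so a voxel at exactly -950 Hounsfield units is not counted, matching the "less than" wording of the trait definition.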