Schizophrenia (SZ) is a complex disorder resulting from both genetic and environmental causes with a lifetime prevalence world-wide of 1%; however, there are no specific, sensitive and validated biomarkers for SZ. A general unifying hypothesis has been put forward that disease-associated single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS) are more likely to be associated with gene expression quantitative trait loci (eQTL). We will describe this hypothesis and review primary methodology with refinements for testing this paradigmatic approach in SZ. We will describe biomarker studies of SZ and testing enrichment of SNPs that are associated both with eQTLs and existing GWAS of SZ. SZ-associated SNPs that overlap with eQTLs can be placed into gene–gene expression, protein–protein and protein–DNA interaction networks. Further, those networks can be tested by reducing/silencing the gene expression levels of critical nodes. We present pilot data to support these methods of investigation such as the use of eQTLs to annotate GWASs of SZ, which could be applied to the field of biomarker discovery. Those networks that have association with SNP markers, especially cis-regulated expression, might lead to a clearer understanding of important candidate genes that predispose to disease and alter expression. This method has general application to many complex disorders.
expression quantitative trait loci; cis-regulatory SNPs; GWAS; gene expression; lymphoblastoid cell lines
Genome-wide association studies (GWAS) have identified loci reproducibly associated with pulmonary diseases; however, the molecular mechanisms underlying these associations are largely unknown. The objectives of this study were to discover genetic variants affecting gene expression in human lung tissue, to refine susceptibility loci for asthma identified in GWAS, and to use the genetics of gene expression and network analyses to find key molecular drivers of asthma. We performed a genome-wide search for expression quantitative trait loci (eQTL) in 1,111 human lung samples. The lung eQTL dataset was then used to inform asthma genetic studies reported in the literature. The top ranked lung eQTLs were integrated with the GWAS on asthma reported by the GABRIEL consortium to generate a Bayesian gene expression network for discovery of novel molecular pathways underpinning asthma. We detected 17,178 cis- and 593 trans- lung eQTLs, which can be used to explore the functional consequences of loci associated with lung diseases and traits. Some strong eQTLs are also asthma susceptibility loci. For example, rs3859192 on chr17q21 is robustly associated with the mRNA levels of GSDMA (P = 3.55×10−151). The genetic-gene expression network identified the SOCS3 pathway as one of the key drivers of asthma. The eQTLs and gene networks identified in this study are powerful tools for elucidating the causal mechanisms underlying pulmonary disease. This data resource offers much-needed support to pinpoint the causal genes and characterize the molecular function of gene variants associated with lung diseases.
Recent genome-wide association studies (GWAS) have identified genetic variants associated with lung diseases. The challenge now is to find the causal genes in GWAS–nominated chromosomal regions and to characterize the molecular function of disease-associated genetic variants. In this paper, we describe an international effort to systematically capture the genetic architecture of gene expression regulation in human lung. By studying lung specimens from 1,111 individuals of European ancestry, we found a large number of genetic variants affecting gene expression in the lung, or lung expression quantitative trait loci (eQTL). These lung eQTLs will serve as an important resource to aid in the understanding of the molecular underpinnings of lung biology and its disruption in disease. To demonstrate the utility of this lung eQTL dataset, we integrated our data with previous genetic studies on asthma. Through integrative techniques, we identified causal variants and genes in GWAS–nominated loci and found key molecular drivers for asthma. We feel that sharing our lung eQTL dataset with the scientific community will leverage the impact of previous large-scale GWAS on lung diseases and function by providing much-needed functional information to understand the molecular changes introduced by the susceptibility genetic variants.
Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3′ untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation.
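The basic association underlying eQTL mapping can be illustrated with a single SNP and a single transcript. The following is a minimal sketch, assuming an additive model in which expression is regressed on genotype dosage (0, 1 or 2 copies of one allele); the function name `eqtl_slope` and the toy data are hypothetical and are not taken from any of the studies summarized here.

```python
import math


def eqtl_slope(dosages, expression):
    """Return (slope, pearson_r) for expression regressed on genotype dosage.

    Assumes an additive model: dosages are 0/1/2 allele counts per individual.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 individuals")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Sums of squares and cross-products around the means
    sgg = sum((g - mean_g) ** 2 for g in dosages)
    see = sum((e - mean_e) ** 2 for e in expression)
    sge = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sgg == 0 or see == 0:
        raise ValueError("genotype or expression has no variation")
    slope = sge / sgg                 # expression change per allele copy
    r = sge / math.sqrt(sgg * see)    # strength and direction of association
    return slope, r


# Toy example (hypothetical values): six individuals, one SNP, one transcript
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r = eqtl_slope(dosages, expression)
print(f"slope per allele = {slope:.3f}, r = {r:.3f}")
```

A genome-wide scan repeats this test for every SNP-transcript pair, which is why multiple-testing control and efficient computation matter in practice.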
The integrated analysis of genotypic and expression data for association with complex traits could identify novel genetic pathways involved in complex traits. We profiled 19,573 expression probes in Epstein-Barr virus-transformed lymphoblastoid cell lines (LCLs) from 299 twins and correlated these with 44 quantitative traits (QTs). For 939 expressed probes correlating with more than one QT, we investigated the presence of eQTL associations in three datasets of 57 CEU HapMap founders and 86 unrelated twins. Genome-wide association analysis of these probes with 2.2 million SNPs revealed 131 potential eQTLs (1,989 eQTL SNPs) overlapping between the HapMap datasets, five of which were in cis (58 eQTL SNPs). We then tested 535 SNPs tagging the eQTL SNPs, for association with the relevant QT in 2,905 twins. We identified nine potential SNP-QT associations (P<0.01) but none significantly replicated in five large consortia of 1,097–16,129 subjects. We also failed to replicate previously reported eQTL associations with body mass index, plasma low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglyceride levels derived from lymphocytes, adipose and liver tissue. Our results and additional power calculations suggest that proponents may have been overoptimistic about the power of LCLs in eQTL approaches to elucidate regulatory genetic effects on complex traits using the small datasets generated to date. Nevertheless, larger tissue-specific expression data sets relevant to specific traits are becoming available, and should enable the adoption of similar integrated analyses in the near future.
Identification of expression quantitative trait loci (eQTLs) is an emerging area in genomic study. The task requires an integrated analysis of genome-wide single nucleotide polymorphism (SNP) data and gene expression data, raising a new computational challenge due to the tremendous size of data.
We develop a method to identify eQTLs. The method represents eQTLs as information flux between genetic variants and transcripts. We use information theory to simultaneously interrogate SNP and gene expression data, resulting in a Transcriptional Information Map (TIM) which captures the network of transcriptional information that links genetic variations, gene expression and regulatory mechanisms. These maps are able to identify both cis- and trans- regulating eQTLs. The application on a dataset of leukemia patients identifies eQTLs in the regions of the GART, PCP4, DSCAM, and RIPK4 genes that regulate ADAMTS1, a known leukemia correlate.
The information theory approach presented in this paper is able to infer the dependence networks between SNPs and transcripts, which in turn can identify cis- and trans-eQTLs. The application of our method to the leukemia study explains how genetic variants and gene expression are linked to leukemia.
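The idea of treating an eQTL as information shared between a genetic variant and a transcript can be illustrated with a plain mutual-information calculation. The sketch below is a generic estimator on discrete data and is not the Transcriptional Information Map method itself; the median-split discretization and both function names are assumptions made for illustration.

```python
import math
from collections import Counter


def discretize_at_median(values):
    """Binarize continuous expression values: 1 if above the median, else 0."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]


def mutual_information_bits(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two discrete vectors."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty vectors of equal length")
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Toy example (hypothetical values): genotype fully determines expression level
genotypes = [0, 0, 1, 1]
expression = [1.0, 1.1, 2.0, 2.1]
mi_dependent = mutual_information_bits(genotypes, discretize_at_median(expression))

# Genotype carries no information about the second variable
mi_independent = mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1])
print(mi_dependent, mi_independent)
```

With real data the plug-in estimate is biased upward in small samples, so significance would normally be assessed by permutation rather than by the raw value.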
Following the recent success of genome-wide association studies in uncovering disease-associated genetic variants, the next challenge is to understand how these variants affect downstream pathways. The most proximal trait to a disease-associated variant, most commonly a single nucleotide polymorphism (SNP), is differential gene expression due to the cis effect of SNP alleles on transcription, translation, and/or splicing, i.e., gene expression quantitative trait loci (eQTL). Several genome-wide SNP–gene expression association studies have already provided convincing evidence of widespread association of eQTLs. As a consequence, some eQTL associations are found in the same genomic region as a disease variant, either as a coincidence or a causal relationship. Cis-regulation of RPS26 gene expression and a type 1 diabetes (T1D) susceptibility locus have been colocalized to the 12q13 genomic region. A recent study has also suggested RPS26 as the most likely susceptibility gene for T1D in this genomic region. However, it is still not clear whether this colocalization is the result of chance alone or if RPS26 expression is directly correlated with T1D susceptibility, and therefore, potentially causal. Here, we derive and apply a statistical test of this hypothesis. We conclude that RPS26 expression is unlikely to be the molecular trait responsible for T1D susceptibility at this locus, at least not in a direct, linear connection.
Association studies; Gene expression; RPS26; T1D
Expression quantitative trait loci (eQTL) mapping is a powerful tool for identifying genetic regulatory variation. However, at present, most eQTLs in humans were identified using gene expression data from cell lines, and it remains unknown whether these eQTLs also have a regulatory function in other expression contexts, such as human primary tissues. Here we investigate this question using a targeted strategy. Specifically, we selected a subset of large-effect eQTLs identified in the HapMap lymphoblastoid cell lines, and examined the association of these eQTLs with gene expression levels across individuals in five human primary tissues (heart, kidney, liver, lung and testes). We show that genotypes at the eQTLs we selected are often predictive of variation in gene expression levels in one or more of the five primary tissues. The genotype effects in the primary tissues are consistently in the same direction as the effects inferred in the cell lines. Additionally, a number of the eQTLs we tested are found in more than one of the tissues. Our results indicate that functional studies in cell lines may uncover a substantial amount of genetic variation that affects gene expression levels in human primary tissues.
Expression quantitative trait loci (eQTL), or genetic variants associated with changes in gene expression, have the potential to assist in interpreting results of genome-wide association studies (GWAS). eQTLs also have varying degrees of tissue specificity. By correlating the statistical significance of eQTLs mapped in various tissue types to their odds ratios reported in a large GWAS by the Wellcome Trust Case Control Consortium (WTCCC), we discovered that there is a significant association between diseases studied genetically and their relevant tissues. This suggests that eQTL data sets can be used to determine tissues that play a role in the pathogenesis of a disease, thereby highlighting these tissue types for further post-GWAS functional studies.
Numerous single nucleotide polymorphisms (SNPs) associated with complex diseases have been identified by genome-wide association studies (GWAS) and expression quantitative trait loci (eQTLs) studies. However, few of these SNPs have explicit biological functions. Recent studies indicated that the SNPs within the 3’UTR regions of susceptibility genes could affect complex traits/diseases by affecting the function of miRNAs. These 3’UTR SNPs are functional candidates and therefore of interest to GWAS and eQTL researchers.
We developed a publicly available online database, MirSNP (http://cmbi.bjmu.edu.cn/mirsnp), which is a collection of human SNPs in predicted miRNA-mRNA binding sites. We identified 414,510 SNPs that might affect miRNA-mRNA binding. Annotations were added to these SNPs to predict whether a SNP within the target site would decrease/break or enhance/create an miRNA-mRNA binding site. By applying MirSNP database to three brain eQTL data sets, we identified four unreported SNPs (rs3087822, rs13042, rs1058381, and rs1058398), which might affect miRNA binding and thus affect the expression of their host genes in the brain. We also applied the MirSNP database to our GWAS for schizophrenia: seven predicted miRNA-related SNPs (p < 0.0001) were found in the schizophrenia GWAS. Our findings identified the possible functions of these SNP loci, and provide the basis for subsequent functional research.
MirSNP can identify putative miRNA-related SNPs from GWAS and eQTL studies and provide direction for subsequent functional research.
microRNA; Single nucleotide polymorphism (SNP); Genome-wide association study (GWAS); Expression quantitative trait loci (eQTLs); MirSNP
While genome-wide association studies (GWASs) have been successful in identifying novel variants associated with various diseases, it has been much more difficult to determine the biological mechanisms underlying these associations. Expression quantitative trait loci (eQTL) provide another dimension to these data by associating single nucleotide polymorphisms (SNPs) with gene expression. We hypothesised that integrating SNPs known to be associated with type 2 diabetes with eQTLs and coexpression networks would enable the discovery of novel candidate genes for type 2 diabetes.
We selected 32 SNPs associated with type 2 diabetes in two or more independent GWASs. We used previously described eQTLs mapped from genotype and gene expression data collected from 1,008 morbidly obese patients to find genes with expression associated with these SNPs. We linked these genes to coexpression modules, and ranked the other genes in these modules using an inverse sum score.
We found 62 genes with expression associated with type 2 diabetes SNPs. We validated our method by linking highly ranked genes in the coexpression modules back to SNPs through a combined eQTL dataset. We showed that the eQTLs highlighted by this method are significantly enriched for association with type 2 diabetes in data from the Wellcome Trust Case Control Consortium (WTCCC, p = 0.026) and the Gene Environment Association Studies (GENEVA, p = 0.042), validating our approach. Many of the highly ranked genes are also involved in the regulation or metabolism of insulin, glucose or lipids.
We have devised a novel method, involving the integration of datasets of different modalities, to discover novel candidate genes for type 2 diabetes.
Genetics of type 2 diabetes; Genomics/proteomics; Mathematical modelling and simulation
Genetic variation in the expression of human xenobiotic metabolism enzymes and transporters (XMETs) leads to inter-individual variability in metabolism of therapeutic agents as well as differing susceptibility to various diseases. Recent expression quantitative trait loci (eQTL) mapping in a few human cells/tissues has identified a number of single nucleotide polymorphisms (SNPs) significantly associated with mRNA expression of many XMET genes. These eQTLs are therefore important candidate markers for pharmacogenetic studies. However, questions remain about whether these SNPs are causative and by what mechanism these SNPs may function. Given the important role of microRNAs (miRs) in gene transcription regulation, we hypothesize that those eQTLs or their proxies in strong linkage disequilibrium (LD) altering miR targeting are likely causative SNPs affecting gene expression. The aim of this study is to identify eQTLs potentially regulating major XMETs via interference with miR targeting. To this end, we performed a genome-wide screening for eQTLs for 409 genes encoding major drug metabolism enzymes, transporters and transcription factors, in publicly available eQTL datasets generated from the HapMap lymphoblastoid cell lines and human liver and brain tissue. As a result, 308 eQTLs significantly (p < 10−5) associated with mRNA expression of 101 genes were identified. We further identified 7,869 SNPs in strong LD (r2 ≥ 0.8) with these eQTLs using the 1000 Genomes SNP data. Among these 8,177 SNPs, 27 are located in the 3′-UTR of 14 genes. Using two algorithms predicting miR-SNP interaction, we found that almost all these SNPs (26 out of 27) were predicted to create, abolish, or change the target site for miRs in both algorithms. Many of these miRs were also expressed in the same tissue in which the eQTLs were identified. Our study provides a strong rationale for continued investigation of the functions of these eQTLs in pharmacogenetic settings.
eQTL; xenobiotic metabolism enzyme and transporter; microRNA; pharmacogenetics; 3′-UTR
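The proxy-selection step in the abstract above (keeping SNPs in strong LD, r2 ≥ 0.8, with an eQTL) can be sketched as follows. This is a minimal illustration, assuming that r2 is approximated by the squared Pearson correlation of unphased genotype dosages; the study's actual LD computation is not described here, and the function names and SNP labels are hypothetical.

```python
def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two SNPs' genotype dosages (0/1/2).

    This is a common approximation to LD r2 when haplotype phase is unknown.
    """
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need matched dosage vectors with at least 2 individuals")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    saa = sum((a - mean_a) ** 2 for a in dosages_a)
    sbb = sum((b - mean_b) ** 2 for b in dosages_b)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    if saa == 0 or sbb == 0:
        raise ValueError("monomorphic SNP: r2 is undefined")
    return sab * sab / (saa * sbb)


def select_proxies(lead_dosages, candidates, threshold=0.8):
    """Return the names of candidate SNPs with r2 >= threshold to the lead eQTL SNP."""
    return sorted(
        name
        for name, dosages in candidates.items()
        if ld_r2(lead_dosages, dosages) >= threshold
    )


# Toy example (hypothetical SNP labels and genotypes)
lead = [0, 1, 2, 1, 0, 2]
candidates = {
    "proxy_same": [0, 1, 2, 1, 0, 2],   # perfectly correlated with the lead SNP
    "proxy_weak": [2, 0, 1, 1, 2, 0],   # r2 = 0.5625, below the 0.8 threshold
}
proxies = select_proxies(lead, candidates)
print(proxies)

# The SNP total in the abstract is the eQTLs plus their LD proxies
total_snps = 308 + 7869
```

The total of 8,177 SNPs quoted in the abstract is consistent with 308 eQTLs plus 7,869 proxies.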
Rationale: Chromosome 12p has been linked to chronic obstructive pulmonary disease (COPD) in the Boston Early-Onset COPD Study (BEOCOPD), but a susceptibility gene in that region has not been identified.
Objectives: We used high-density single-nucleotide polymorphism (SNP) mapping to implicate a COPD susceptibility gene and an animal model to determine the potential role of SOX5 in lung development and COPD.
Methods: On chromosome 12p, we genotyped 1,387 SNPs in 386 COPD cases from the National Emphysema Treatment Trial and 424 control smokers from the Normative Aging Study. SNPs with significant associations were then tested in the BEOCOPD study and the International COPD Genetics Network. Based on the human results, we assessed histology and gene expression in the lungs of Sox5−/− mice.
Measurements and Main Results: In the case-control analysis, 27 SNPs were significant at P ≤ 0.01. The most significant SNP in the BEOCOPD replication was rs11046966 (National Emphysema Treatment Trial–Normative Aging Study P = 6.0 × 10−4, BEOCOPD P = 1.5 × 10−5, combined P = 1.7 × 10−7), located 3′ to the gene SOX5. Association with rs11046966 was not replicated in the International COPD Genetics Network. Sox5−/− mice showed abnormal lung development, with a delay in maturation before the saccular stage, as early as E16.5. Lung pathology in Sox5−/− lungs was associated with a decrease in fibronectin expression, an extracellular matrix component critical for branching morphogenesis.
Conclusions: Genetic variation in the transcription factor SOX5 is associated with COPD susceptibility. A mouse model suggests that the effect may be due, in part, to its effects on lung development and/or repair processes.
chronic obstructive pulmonary disease; emphysema; knockout mice; lung development; single nucleotide polymorphism
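The abstract above reports a combined P value for rs11046966 without stating how the two study-level P values were combined. As a hedged illustration only, the sketch below applies Fisher's method, which is an assumption on our part; it yields a value of about 1.76 × 10−7, close to the reported 1.7 × 10−7. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has a closed form: exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i      # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


# Study-level P values quoted in the abstract (discovery and BEOCOPD replication)
combined = fisher_combined_p([6.0e-4, 1.5e-5])
print(f"combined P = {combined:.3g}")
```

Fisher's method assumes the two tests are independent and ignores effect direction, so an inverse-variance meta-analysis of effect sizes would be a reasonable alternative when effect estimates are available.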
Multiple intergenic single-nucleotide polymorphisms (SNPs) near hedgehog interacting protein (HHIP) on chromosome 4q31 have been strongly associated with pulmonary function levels and moderate-to-severe chronic obstructive pulmonary disease (COPD). However, whether the effects of variants in this region are related to HHIP or another gene has not been proven. We confirmed genetic association of SNPs in the 4q31 COPD genome-wide association study (GWAS) region in a Polish cohort containing severe COPD cases and healthy smoking controls (P = 0.001 to 0.002). We found that HHIP expression at both mRNA and protein levels is reduced in COPD lung tissues. We identified a genomic region located ∼85 kb upstream of HHIP which contains a subset of associated SNPs, interacts with the HHIP promoter through a chromatin loop and functions as an HHIP enhancer. The COPD risk haplotype of two SNPs within this enhancer region (rs6537296A and rs1542725C) was associated with statistically significant reductions in HHIP promoter activity. Moreover, rs1542725 demonstrates differential binding to the transcription factor Sp3; the COPD-associated allele exhibits increased Sp3 binding, which is consistent with Sp3's usual function as a transcriptional repressor. Thus, increased Sp3 binding at a functional SNP within the chromosome 4q31 COPD GWAS locus leads to reduced HHIP expression and increased susceptibility to COPD through distal transcriptional regulation. Together, our findings reveal one mechanism through which SNPs upstream of the HHIP gene modulate the expression of HHIP and functionally implicate reduced HHIP gene expression in the pathogenesis of COPD.
Cachexia, whether assessed by body mass index (BMI) or fat-free mass index (FFMI), affects a significant proportion of patients with chronic obstructive pulmonary disease (COPD), and is an independent risk factor for increased mortality, increased emphysema, and more severe airflow obstruction. The variable development of cachexia among patients with COPD suggests a role for genetic susceptibility. The objective of the present study was to determine genetic susceptibility loci involved in the development of low BMI and FFMI in subjects with COPD. A genome-wide association study (GWAS) of BMI was conducted in three independent cohorts of European descent with Global Initiative for Chronic Obstructive Lung Disease stage II or higher COPD: Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-Points (ECLIPSE; n = 1,734); Norway-Bergen cohort (n = 851); and a subset of subjects from the National Emphysema Treatment Trial (NETT; n = 365). A genome-wide association study of FFMI was conducted in two of the cohorts (ECLIPSE and Norway). In the combined analyses, a significant association was found between rs8050136, located in the first intron of the fat mass and obesity–associated (FTO) gene, and BMI (P = 4.97 × 10−7) and FFMI (P = 1.19 × 10−7). We replicated the association in a fourth, independent cohort consisting of 502 subjects with COPD from COPDGene (P = 6 × 10−3). Within the largest contributing cohort of our analysis, lung function, as assessed by forced expiratory volume in 1 second, varied significantly by FTO genotype. Our analysis suggests a potential role for the FTO locus in the determination of anthropometric measures associated with COPD.
chronic obstructive pulmonary disease genetics; chronic obstructive pulmonary disease epidemiology; chronic obstructive pulmonary disease metabolism; genome-wide association study
We conducted a comprehensive study of copy number variants (CNVs) well-tagged by SNPs (r2≥0.8) by analyzing their effect on gene expression and their association with disease susceptibility and other complex human traits. We tested whether these CNVs were more likely to be functional than frequency-matched SNPs as trait-associated loci or as expression quantitative trait loci (eQTLs) influencing phenotype by altering gene regulation. Our study found that CNV–tagging SNPs are significantly enriched for cis eQTLs; furthermore, we observed that trait associations from the NHGRI catalog show an overrepresentation of SNPs tagging CNVs relative to frequency-matched SNPs. We found that these SNPs tagging CNVs are more likely to affect multiple expression traits than frequency-matched variants. Given these findings on the functional relevance of CNVs, we created an online resource of expression-associated CNVs (eCNVs) using the most comprehensive population-based map of CNVs to inform future studies of complex traits. Although previous studies of common CNVs that can be typed on existing platforms and/or interrogated by SNPs in genome-wide association studies concluded that such CNVs appear unlikely to have a major role in the genetic basis of several complex diseases examined, our findings indicate that it would be premature to dismiss the possibility that even common CNVs may contribute to complex phenotypes and at least some common diseases.
Despite the large number of SNPs found to be reproducibly associated with complex diseases, they collectively account for only a small proportion of the overall heritability of such traits. CNVs have thus been proposed to explain some of the missing heritability and to alter disease susceptibility. However, a recent study of the genetics of 8 common diseases involving 16,000 cases and 3,000 controls failed to identify any novel CNVs associated with disease and concluded that CNVs are unlikely to play a major role in their etiology. Studies we report here show that we must be careful not to dismiss the possibility that CNVs may indeed underlie some of the observed associations with complex disease. Our findings show that well-tagged CNVs are disproportionately more likely to be eQTLs, as well as cis-eQTLs, than frequency-matched SNPs; furthermore, reproducible trait associations, as represented in the NHGRI catalog, are enriched for well-tagged CNVs relative to frequency-matched SNPs. Because of these findings on the strong functional relevance of these CNVs, we created a database (available at http://www.scandb.org/) of expression-associated CNVs to supplement our earlier studies of SNP eQTLs and to contribute to future studies of the genetics of complex traits.
Although genome-wide association studies (GWAS) of complex traits have yielded more reproducible associations than had been discovered using any other approach, the loci characterized to date do not account for much of the heritability of such traits and, in general, have not led to improved understanding of the biology underlying complex phenotypes. Using a web site we developed to serve results of expression quantitative trait locus (eQTL) studies in lymphoblastoid cell lines from HapMap samples (http://www.scandb.org), we show that single nucleotide polymorphisms (SNPs) associated with complex traits (from http://www.genome.gov/gwastudies/) are significantly more likely to be eQTLs than minor-allele-frequency–matched SNPs chosen from high-throughput GWAS platforms. These findings are robust across a range of thresholds for establishing eQTLs (p-values from 10−4–10−8), and a broad spectrum of human complex traits. Analyses of GWAS data from the Wellcome Trust studies confirm that annotating SNPs with a score reflecting the strength of the evidence that the SNP is an eQTL can improve the ability to discover true associations and clarify the nature of the mechanism driving the associations. Our results, showing that trait-associated SNPs are more likely to be eQTLs and that application of this information can enhance discovery of trait-associated SNPs for complex phenotypes, raise the possibility that we can utilize this information both to increase the heritability explained by identifiable genetic factors and to gain a better understanding of the biology underlying complex traits.
We show here that single nucleotide polymorphisms (SNPs) associated with complex traits (as identified in the catalog of results from genome-wide association studies http://www.genome.gov/gwastudies/) are more likely than other SNPs chosen from high-throughput genotyping platforms to predict expression levels of genes. These observations confirm that genetic risk factors for complex traits will often affect phenotype by altering the amount or timing of protein production, rather than by changing the type of protein produced. This knowledge can be used to improve our ability to discover genetic risk factors for complex traits and to improve our understanding of their underlying biology.
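The comparison described above, trait-associated SNPs versus minor-allele-frequency-matched SNPs, can be sketched as a simple resampling test. The code below is an illustrative outline under stated assumptions, not the authors' procedure: SNPs are matched by MAF bin only, null SNPs are drawn with replacement, and the function names, bin count and toy data are all hypothetical.

```python
import random
from collections import defaultdict


def maf_bin(maf, n_bins=10):
    """Assign a minor allele frequency in [0, 0.5] to one of n_bins equal-width bins."""
    return min(int(maf * 2 * n_bins), n_bins - 1)


def eqtl_enrichment(snps, trait_ids, n_draws=1000, n_bins=10, seed=0):
    """Empirical test of eQTL enrichment among trait-associated SNPs.

    snps: dict mapping SNP id -> (maf, is_eqtl)
    trait_ids: ids of trait-associated SNPs (must be keys of snps)
    Returns (observed eQTL count, empirical one-sided P value).
    Every MAF bin used by a trait SNP must contain at least one background SNP.
    """
    rng = random.Random(seed)
    background = defaultdict(list)
    for snp_id, (maf, _) in snps.items():
        if snp_id not in trait_ids:
            background[maf_bin(maf, n_bins)].append(snp_id)
    trait_bins = [maf_bin(snps[s][0], n_bins) for s in trait_ids]
    observed = sum(snps[s][1] for s in trait_ids)
    at_least_as_extreme = 0
    for _ in range(n_draws):
        # Draw one background SNP from the same MAF bin as each trait SNP
        null_count = sum(snps[rng.choice(background[b])][1] for b in trait_bins)
        if null_count >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical P value strictly positive
    return observed, (at_least_as_extreme + 1) / (n_draws + 1)


# Toy example (hypothetical SNP ids): both trait SNPs are eQTLs, no background SNP is
snps = {
    "t1": (0.12, True), "t2": (0.32, True),
    "b1": (0.11, False), "b2": (0.13, False),
    "b3": (0.31, False), "b4": (0.33, False),
}
observed, p_emp = eqtl_enrichment(snps, {"t1", "t2"}, n_draws=99)
print(observed, p_emp)
```

In a real analysis one would typically also match on features such as distance to the nearest gene or platform membership, since MAF alone does not capture all confounders of eQTL detection.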
Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Although recent genome-wide association studies (GWAS) have contributed to discovery of SLE susceptibility genes, few studies have been performed in Asian populations. Here, we report a GWAS for SLE examining 891 SLE cases and 3,384 controls and multi-stage replication studies examining 1,387 SLE cases and 28,564 controls in Japanese subjects. Considering that expression quantitative trait loci (eQTLs) have been implicated in genetic risks for autoimmune diseases, we integrated an eQTL study into the results of the GWAS. We observed enrichment of cis-eQTL-positive loci among the known SLE susceptibility loci (30.8%) compared to the genome-wide SNPs (6.9%). In addition, we identified a novel association of a variant in the AF4/FMR2 family, member 1 (AFF1) gene at 4q21 with SLE susceptibility (rs340630; P = 8.3×10−9, odds ratio = 1.21). The risk A allele of rs340630 demonstrated a cis-eQTL effect on the AFF1 transcript with enhanced expression levels (P<0.05). As AFF1 transcripts were prominently expressed in CD4+ and CD19+ peripheral blood lymphocytes, up-regulation of AFF1 may cause abnormalities in these lymphocytes, leading to disease onset.
Although recent genome-wide association study (GWAS) approaches have successfully contributed to disease gene discovery, many susceptibility loci are known to remain uncaptured because of the strict significance thresholds required for multiple hypothesis testing. Therefore, prioritization of GWAS results by incorporating additional information is recommended. Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Considering that abnormalities in B cell activity play essential roles in SLE, prioritization based on an expression quantitative trait loci (eQTL) study in B cells would be a promising approach. In this study, we report a GWAS and multi-stage replication studies for SLE examining 2,278 SLE cases and 31,948 controls in Japanese subjects. We integrated an eQTL study into the results of the GWAS and identified AFF1 as a novel SLE susceptibility locus. We also confirmed a cis-regulatory effect of the locus on the AFF1 transcript. Our study is among the first to detect a novel genetic locus using an eQTL study, and it should contribute to our understanding of the genetic loci that remain uncaptured by standard GWAS approaches.
Gene expression quantitative trait loci (eQTL) are useful for identifying single nucleotide polymorphisms (SNPs) associated with diseases. At times, a genetic variant may be associated with a master regulator involved in the manifestation of a disease. The downstream target genes of the master regulator are typically co-expressed and share biological function. Therefore, it is practical to screen for eQTLs by identifying SNPs associated with the targets of a transcript-regulator (TR). We used a multivariate regression with the gene expression of known targets of TRs and SNPs to identify TReQTLs in European (CEU) and African (YRI) HapMap populations. A nominal p-value of <1×10−6 revealed 234 SNPs in CEU and 154 in YRI as TReQTLs. These represent 36 independent (tag) SNPs in CEU and 39 in YRI affecting the downstream targets of 25 and 36 TRs respectively. At a false discovery rate (FDR) = 45%, one cis-acting tag SNP (within 1 kb of a gene) in each population was identified as a TReQTL. In CEU, the SNP (rs16858621) in Pcnxl2 was found to be associated with the genes regulated by CREM whereas in YRI, the SNP (rs16909324) was linked to the targets of miRNA hsa-miR-125a. To infer the pathways that regulate expression, we ranked TReQTLs by connectivity within the structure of biological process subtrees. One TReQTL SNP (rs3790904) in CEU maps to Lphn2 and is associated (nominal p-value = 8.1×10−7) with the targets of the X-linked breast cancer suppressor Foxp3. The structure of the biological process subtree and a gene interaction network of the TReQTL revealed that tumor necrosis factor, NF-kappaB and variants in G-protein coupled receptors signaling may play a central role as communicators in Foxp3 functional regulation. The potential pleiotropic effect of the Foxp3 TReQTLs was gleaned from integrating mRNA-Seq data and SNP-set enrichment into the analysis.
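The abstract above reports both nominal p-values and a false discovery rate, but does not say which FDR procedure was used. As an illustration only, the sketch below computes Benjamini-Hochberg adjusted p-values (q-values), which is an assumed choice; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the i-th smallest of n p-values the raw adjustment is p * n / i;
    a running minimum from the largest rank down enforces monotonicity.
    """
    n = len(p_values)
    if any(not (0.0 <= p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Toy example (hypothetical nominal p-values for four SNP-regulator tests)
nominal = [0.04, 0.001, 0.5, 0.01]
q_values = benjamini_hochberg(nominal)
print(q_values)
```

A test is declared significant at FDR level alpha when its adjusted value is at most alpha, so a reported FDR of 45% corresponds to a very permissive threshold under this procedure.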
Amyotrophic lateral sclerosis (ALS) is a progressive, neurodegenerative disease characterized by loss of upper and lower motor neurons. ALS is considered to be a complex trait and genome-wide association studies (GWAS) have implicated a few susceptibility loci. However, many more causal loci remain to be discovered. Since it has been shown that genetic variants associated with complex traits are more likely to be eQTLs than frequency-matched variants from GWAS platforms, we conducted a two-stage genome-wide screening for eQTLs associated with ALS. In addition, we applied an eQTL analysis to fine-map association loci. Expression profiles using peripheral blood of 323 sporadic ALS patients and 413 controls were mapped to genome-wide genotyping data. Subsequently, data from a two-stage GWAS (3,568 patients and 10,163 controls) were used to prioritize eQTLs identified in the first stage (162 ALS, 207 controls). These prioritized eQTLs were carried forward to the second sample with both gene-expression and genotyping data (161 ALS, 206 controls). Replicated eQTL SNPs were then tested for association in the second-stage GWAS data to find disease-associated SNPs that survived correction for multiple testing. We thus identified twelve cis eQTLs with nominally significant associations in the second-stage GWAS data. Eight SNP-transcript pairs of highest significance (lowest p = 1.27×10−51) withstood multiple-testing correction in the second stage and modulated CYP27A1 gene expression. Additionally, we show that C9orf72 appears to be the only gene in the 9p21.2 locus that is regulated in cis, showing the potential of this approach in identifying causative genes in association loci in ALS. This study has identified candidate genes for sporadic ALS, most notably CYP27A1. Mutations in CYP27A1 cause cerebrotendinous xanthomatosis, which can present as a clinical mimic of ALS with progressive upper motor neuron loss, making it a plausible susceptibility gene for ALS.
Genetic factors play a role in chronic obstructive pulmonary disease (COPD) but are poorly understood. A number of candidate genes have been proposed on the basis of the pathogenesis of COPD. These include the matrix metalloproteinase (MMP) genes which play a role in tissue remodelling and fit in with the protease - antiprotease imbalance theory for the cause of COPD. Previous genetic studies of MMPs in COPD have had inadequate coverage of the genes, and have reported conflicting associations of both single nucleotide polymorphisms (SNPs) and SNP haplotypes, plausibly due to under-powered studies.
To address these issues we genotyped 26 SNPs, providing comprehensive coverage of reported SNP variation, in MMP-1, MMP-9 and MMP-12 from 977 COPD patients and 876 non-diseased smokers of European descent and evaluated their association with disease singly and in haplotype combinations. We used logistic regression to adjust for age, gender, centre and smoking history.
Haplotypes of two SNPs in MMP-12 (rs652438 and rs2276109), showed an association with severe/very severe disease, corresponding to GOLD Stages III and IV.
Those with the common A-A haplotype for these two SNPs were at greater risk of developing severe/very severe disease (p = 0.0039) while possession of the minor G variants at either SNP locus had a protective effect (adjusted odds ratio of 0.76; 95% CI 0.61 - 0.94). The A-A haplotype was also associated with significantly lower predicted FEV1 (42.62% versus 44.79%; p = 0.0129). This implicates haplotypes of MMP-12 as modifiers of disease severity.
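For readers less familiar with how an odds ratio and its 95% confidence interval are obtained, the sketch below computes an unadjusted odds ratio from a 2×2 table with the standard log-odds (Woolf) interval. This is a simplification: the odds ratio reported above was adjusted by logistic regression, which this example does not reproduce. The function name and the counts are hypothetical and are not data from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: cases carrying the variant      b: cases without the variant
    c: controls carrying the variant   d: controls without the variant
    z: normal quantile (1.96 gives an approximate 95% interval).
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to exercise the function.
or_value, ci_low, ci_high = odds_ratio_ci(10, 20, 30, 40)
```

An interval that excludes 1, as the interval reported in the abstract does, indicates an association at roughly the 5% level; an odds ratio below 1 corresponds to a protective effect of the variant.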
Motivation: Genome-wide association studies (GWAS) generate relationships between hundreds of thousands of single nucleotide polymorphisms (SNPs) and complex phenotypes. The contribution of the traditionally overlooked copy number variations (CNVs) to complex traits is also being actively studied. To facilitate the interpretation of the data and the designing of follow-up experimental validations, we have developed a database that enables the sensible prioritization of these variants by combining several approaches, involving not only publicly available physical and functional annotations but also multilocus linkage disequilibrium (LD) annotations as well as annotations of expression quantitative trait loci (eQTLs).
Results: For each SNP, the SCAN database provides: (i) summary information from eQTL mapping of HapMap SNPs to gene expression (evaluated by the Affymetrix exon array) in the full set of HapMap CEU (Caucasians from UT, USA) and YRI (Yoruba people from Ibadan, Nigeria) samples; (ii) LD information, in the case of a HapMap SNP, including what genes have variation in strong LD (pairwise or multilocus LD) with the variant and how well the SNP is covered by different high-throughput platforms; (iii) summary information available from public databases (e.g. physical and functional annotations); and (iv) summary information from other GWAS. For each gene, SCAN provides annotations on: (i) eQTLs for the gene (both local and distant SNPs) and (ii) the coverage of all variants in the HapMap at that gene on each high-throughput platform. For each genomic region, SCAN provides annotations on: (i) physical and functional annotations of all SNPs, genes and known CNVs within the region and (ii) all genes regulated by the eQTLs within the region.
Supplementary information: Supplementary data are available at Bioinformatics online.
Systemic lupus erythematosus (SLE) is a serious prototype autoimmune disease characterized by chronic inflammation, auto-antibody production and multi-organ damage. Recent association studies have identified a long list of loci that were associated with SLE with relatively high statistical power. However, most of them only established the statistical associations of genetic markers and SLE at the DNA level without supporting evidence of functional relevance. Here, using publicly available datasets, we performed integrative analyses (gene relationship across implicated loci analysis, differential gene expression analysis and functional annotation clustering analysis) and combined them with expression quantitative trait loci (eQTL) results to dissect functional mechanisms underlying the associations for SLE. We found that 14 SNPs, which were significantly associated with SLE in previous studies, have cis-regulation effects on four eQTL genes (HLA-DQA1, HLA-DQB1, HLA-DQB2, and IRF5) that were also differentially expressed in SLE-related cell groups. The functional evidence, taken together, suggested the functional mechanisms underlying the associations of 14 SNPs and SLE. The study may serve as an example of mining publicly available datasets and results to validate significant disease-association results. Utilization of public data resources for integrative analyses may provide novel insights into the molecular genetic mechanisms underlying human diseases.
Rationale: Chronic obstructive pulmonary disease (COPD), characterized by airflow limitation, is a disorder with high phenotypic and genetic heterogeneity. Pulmonary emphysema is a major but variable component of COPD; familial data suggest that different components of COPD, such as emphysema, may be influenced by specific genetic factors.
Objectives: To identify genetic determinants of emphysema assessed through high-resolution chest computed tomography in individuals with COPD.
Methods: We performed a genome-wide association study (GWAS) of emphysema determined from chest computed tomography scans with a total of 2,380 individuals with COPD in three independent cohorts of white individuals from (1) a cohort from Bergen, Norway, (2) the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) Study, and (3) the National Emphysema Treatment Trial (NETT). We tested single-nucleotide polymorphism associations with the presence or absence of emphysema determined by radiologist assessment in two of the three cohorts and a quantitative emphysema trait (percentage of lung voxels less than –950 Hounsfield units) in all three cohorts.
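The quantitative emphysema trait named in the Methods, the percentage of lung voxels below –950 Hounsfield units, is a simple proportion. The sketch below shows the arithmetic under the assumption that the lung has already been segmented and its voxel attenuation values are available as a flat list; the function name and the example values are hypothetical, and real pipelines would operate on segmented CT volumes.

```python
def percent_low_attenuation(hu_values, threshold=-950.0):
    """Percentage of lung voxels with attenuation strictly below the threshold.

    hu_values: attenuation values (Hounsfield units) of segmented lung voxels.
    threshold: cutoff in Hounsfield units; -950 matches the trait described above.
    """
    if not hu_values:
        raise ValueError("hu_values must contain at least one voxel")
    below = sum(1 for hu in hu_values if hu < threshold)
    return 100.0 * below / len(hu_values)


# Hypothetical voxel values: two of the four lie strictly below -950 HU.
example_voxels = [-1000.0, -960.0, -950.0, -800.0]
emphysema_percent = percent_low_attenuation(example_voxels)
```

Note that a voxel at exactly –950 HU is not counted, following the "less than" wording of the trait definition.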
Measurements and Main Results: We identified association of a single-nucleotide polymorphism in BICD1 with the presence or absence of emphysema (P = 5.2 × 10−7 with at least mild emphysema vs. control subjects; P = 4.8 × 10−8 with moderate and more severe emphysema vs. control subjects).
Conclusions: Our study suggests that genetic variants in BICD1 are associated with qualitative emphysema in COPD. Variants in BICD1 are associated with length of telomeres, which suggests that a mechanism linked to accelerated aging may be involved in the pathogenesis of emphysema.
Clinical trial registered with www.clinicaltrials.gov (NCT00292552).
emphysema; chronic obstructive pulmonary disease; BICD1; single-nucleotide polymorphism
The polymorphism in microRNA target site (PolymiRTS) database aims to identify single-nucleotide polymorphisms (SNPs) that affect miRNA targeting in human and mouse. These polymorphisms can disrupt the regulation of gene expression by miRNAs and are candidate genetic variants responsible for transcriptional and phenotypic variation. The database is therefore organized to provide links between SNPs in miRNA target sites, cis-acting expression quantitative trait loci (eQTLs), and the results of genome-wide association studies (GWAS) of human diseases. Here, we describe new features that have been integrated in the PolymiRTS database, including: (i) polymiRTSs in genes associated with human diseases and traits in GWAS, (ii) polymorphisms in target sites that have been supported by a variety of experimental methods and (iii) polymorphisms in miRNA seed regions. A large number of newly identified microRNAs and SNPs, recently published mouse phenotypes, and human and mouse eQTLs have also been integrated into the database. The PolymiRTS database is available at http://compbio.uthsc.edu/miRSNP/.
Recently, several genome-wide association studies (GWAS) have identified many susceptibility single nucleotide polymorphisms (SNPs) for chronic obstructive pulmonary disease (COPD) and lung cancer, two closely related diseases. Some of those SNPs are shared by both diseases, suggesting possible genetic similarity between them. Here we tested the hypothesis that those shared SNPs are common predictors of the risk or prognosis of COPD and lung cancer. Two SNPs (rs6495309 and rs1051730) located in the nicotinic acetylcholine receptor alpha 3 (CHRNA3) gene were genotyped in 1511 patients with COPD, 1559 lung cancer cases and 1677 controls in southern and eastern Chinese populations. We found that the rs6495309CC and rs6495309CT/CC variant genotypes were associated with increased risks of COPD (OR = 1.32, 95% CI = 1.14–1.54) and lung cancer (OR = 1.57, 95% CI = 1.31–1.87), respectively. The rs6495309CC genotype contributed to a more rapid annual decline of forced expiratory volume in one second (FEV1) in both COPD cases and controls (P<0.05), and it was associated with advanced stages of COPD (P = 0.033); the rs6495309CT/CC genotypes conferred poor survival for lung cancer (HR = 1.41, 95% CI = 1.13–1.75). The luciferase assays further showed that nicotine and other tobacco chemicals had diverse effects on the luciferase activity of the rs6495309C or T alleles. However, none of these effects were found for the other SNP, rs1051730G>A. The data show a statistical association and suggest biological plausibility that the rs6495309T>C polymorphism contributes to increased risk and poor prognosis of both COPD and lung cancer.