|Home | About | Journals | Submit | Contact Us | Français|
We conducted a systematic study of top susceptibility variants from a genome-wide association (GWA) study of Bipolar Disorder to gain insight into the functional consequences of genetic variation influencing disease risk. We report here the results of experiments to explore the effects of these susceptibility variants on DNA methylation and mRNA expression in human cerebellum samples. Among the top susceptibility variants, we identified an enrichment of cis regulatory loci on mRNA expression (eQTLs), and a significant excess of quantitative trait loci for DNA CpG methylation, hereafter referred to as mQTLs. Bipolar Disorder susceptibility variants that cis-regulate both cerebellar expression and methylation of the same gene are a very small proportion of Bipolar Disorder susceptibility variants. This finding suggests that mQTLs and eQTLs provide orthogonal ways of functionally annotating genetic variation within the context of studies of pathophysiology in brain. No lymphocyte mQTL enrichment was found, suggesting that mQTL enrichment was specific to the cerebellum, in contrast to eQTLs. Separately, we found that using mQTL information to restrict the number of SNPs studied enhances our ability to detect a significant association. With this restriction a priori informed by the observed functional enrichment, we identified a significant association (rs12618769, Pbonferroni<0.05) from two other GWA studies (TGen+GAIN; 2,191 cases and 1,434 controls) of Bipolar Disorder, which we replicated in an independent GWA study (WTCCC). Collectively, our findings highlight the importance of integrating functional annotation of genetic variants for gene expression and DNA methylation to advance biological understanding of Bipolar Disorder.
Epigenetic phenomena 1, including DNA methylation, histone modification, and RNA-induced gene silencing, are important mechanisms of transcriptional regulation. Rapid advances in the development of high-throughput assays facilitate correspondingly dramatic advances in elucidating how such processes mediate gene expression and other phenotypes 2, 3. Enhanced understanding of epigenetic changes promises to shed light on disease mechanisms and clarify the consequences of epigenetic disruptions on pathophysiology 4. Single-gene disruptions of the epigenetic machinery have long been recognized to cause neurodevelopmental disorders; for example, mutations in the MECP2 gene which encodes a protein that binds to methylated DNA are the cause of Rett syndrome 5. Little, however, is known regarding the epigenetic component of common complex diseases. Recent studies from our group 6, 7 and others 8 on the utility of expression quantitative trait loci (eQTLs) to improve our understanding of complex traits genetics and to enhance discovery from genome-wide association (GWA) studies prompted us to evaluate to what extent genetic and epigenetic regulation of gene expression may contribute to susceptibility to neuropsychiatric disorders.
Focusing on Bipolar Disorder, we were particularly interested in the question of whether mapping methylation levels and transcript levels to genomic loci as methylation quantitative trait loci (mQTLs) 9 and expression quantitative trait loci (eQTLs) 10 in brain respectively, may provide information on the functional consequences of otherwise anonymous susceptibility variants identified by GWA studies on Bipolar Disorder 11-13, assist in the identification of novel susceptibility loci, and provide a framework for interpreting and prioritizing results from GWA studies of neuropsychiatric disorders. For an investigation of disease epigenetics, Bipolar Disorder provides a particularly striking example, as dynamic epigenetic fluctuations in gene regulation may contribute to fluctuation of disease course 14, 15. In particular, a systematic genome-wide investigation of genetic control of DNA methylation allowed us to test such genomic loci for effect on disease risk.
DNA and RNA of human cerebellum samples from 153 individuals of European ancestry were obtained from the Stanley Medical Research Institute (SMRI). As previously described 9, the specimens came from patients with psychiatric disorders and normal controls. Information on age, gender, ethnicity, brain pH, smoking and alcohol use, suicide status, postmortem interval (PMI) is available for the samples included in this study.
DNA was extracted from cerebellar tissues obtained from SMRI. Sample genotyping was conducted using Genome-Wide Human SNP Array 5.0, at Translational Genomics Research Institute (Tgen). BRLMM-p (Affymetrix) was used as the genotype calling algorithm. SNPs with call rates less than 99% were excluded from the analysis. SNPs showing departure from Hardy-Weinberg equilibrium (HWE) were filtered (p < 0.001). Of the remaining SNPs, only SNPs showing minor allele frequency (MAF) of at least 10% were carried forward for further analysis. These quality control filters yielded nearly 239,000 SNPs. We expanded on the earlier study 9 by performing imputation using MaCH v1.0 16, 17 to increase the density of interrogated SNPs. Only SNPs with MAF of at least 10% and Rsq (the squared correlation between imputed and true genotypes) > 0.3. 886,338 autosomal SNPs were analyzed for QTL mapping.
To test for the presence of population structure, principal components analysis (PCA) as implemented in Eigenstrat was performed. To identify novel relationships between the samples, pairwise identity by state analysis was done using PLINK.
RNAs were extracted from the cerebellum cortex of 132 samples using the RNeasy Mini kit (Qiagen, Valencia, CA). The concentration and A260/A280 ratio were measured on the NanoDrop spectrophotometer. The ratio of 28S to 18S rRNA and RNA Integrity Number (RIN) were measured using an RNA LabChip kit on the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). In order to avoid use of seriously degraded samples, only RNA samples with a RIN > 7 were used for the expression profiling. Affymetrix Human Gene 1.0 ST array was used for whole genome transcriptome profiling at the NIH Neuroscience Microarray Consortium at the facility at Yale University.
DNA methylation was assayed using the Illumina Infinium HumanMethylation27 BeadChips (Illumina Inc., San Diego, CA) platform; the assay was performed at the Genomics Core Facility at Northwestern University. 27,578 CpG sites were probed, almost all of which (approximately 97.7%) are within 1500 bp of a transcription start site (TSS). Methylation level at each interrogated CpG site was defined as: methylated probe intensity/(sum of methylated and unmethylated probe intensities) Therefore, methylation level is a quantitative trait whose values range from 0 to 1.
Affymetrix Expression Console was used to process raw data. All probes, which contain common SNPs (MAF ≥ 0.01) based on the 1000 Genome Project data and HapMap data, or could be mapped to multiple genomic regions, were excluded from the analysis. ComBat 18 was used to correct for batch effects within the methylation and expression array data. For later analysis of each technical replicate pair, the data were averaged for the replicated samples to obtain a single datum. In order to remove the effects of known and unknown covariates on the data, surrogate variable analysis (SVA) 19 was applied and the identified surrogate variables were regressed out. We have found that these methods perform well in removing the effects of batch and known covariates such as pH. To fit a normal distribution, quantile normalization was used for both expression and methylation residuals.
Imputed genotype dosage data was analyzed for association with methylation and expression phenotypes using PLINK 20. Linear regression of the phenotype with dosage of the minor allele for each SNP was performed. Cis regions were defined as 4 Mb of the probe site. This definition allowed us to compare our findings in cerebellum to our earlier observations in lymphocytes 7, 21. Trans regions included the rest of the genome. The significance thresholds were based on the estimated number of probes for the cis and trans-analysis. For the cis analysis, the significance threshold was 0.01. For the trans-analysis, 8597 and 25834 probes were analyzed for the methylation and expression data respectively. The trans-analysis significance threshold was set to 0.05 divided by the number of probes.
To determine the level of (cis) mQTL (or eQTL or joint eQTL-mQTL) enrichment in the set of top susceptibility variants (defined in terms of p-value threshold) from a GWAS, we generated 1000 sets, randomly drawn without replacement from a control set (e.g., Affymetrix GeneChip 500K Mapping Array Set, the study platform, for the WTCCC GWAS), which match the minor allele frequency distribution of the SNP set of interest. For each random set, the (cis) mQTL (or eQTL or joint eQTL-mQTL) count was determined, yielding the distribution under the null. The observed (cis) mQTL (or eQTL or joint eQTL-mQTL) count enabled us to calculate an empirical p-value for the level of enrichment by considering the proportion of random sets with mQTL (or eQTL or joint eQTL-mQTL) count that matches or exceeds the observed count. We tested the dependence of our enrichment results on the definition of cis by doing these analyses when cis is defined as within 1 Mb of the gene.
We utilized the results from several GWA studies of Bipolar Disorder. Our TGen sample set consists of 1,190 Bipolar Disorder cases from the Bipolar Genome Study (BiGS) and 401 controls 12. The GAIN sample set, from an earlier study conducted under the GAIN initiative, consists of 1,001 Bipolar Disorder cases and 1,033 controls of European ancestry 11. The WTCCC study of seven common diseases, including Bipolar Disorder, had an experimental design of 2000 cases for each disease and 3000 shared controls 13.
We re-analyzed our published mQTL dataset 9 by including imputed genotype data (see Materials and Methods), and extended the cis-region to 4 Mb. Analogous to the traditional classification of eQTLs, we sought to classify methylation QTLs (mQTLs) into cis and trans regulators of methylation phenotypes. SNPs within the cis-region of a variably methylated CpG site were included in our analysis. We identified 5974 distinct genes with cis mQTLs at association p < 0.01, and 4143 distinct genes with cis mQTLs at a more stringent p < 0.001. Supplemental Figure 1 illustrates the genomic locations of the identified cis mQTLs. To further the investigation of these mQTLs, we have deposited this dataset into SCAN (http://www.scandb.org) 6, a publicly available genomic resource containing initially the results of eQTL studies conducted in lymphoblastoid cell lines.
We sought to test the hypothesis that the top associations from the WTCCC Bipolar Disorder study are enriched for variants that affect methylation levels. This follows our earlier study on the use of eQTL enrichment analysis and eQTL-based annotation to enhance discovery of reproducible signals from GWA studies 6. That initial study was performed on (transformed) lymphocytes. We thus wanted to test a secondary hypothesis, whether SNPs that exert an influence on methylation levels in brain, the presumed tissue of relevance for Bipolar Disorder, are enriched in SNPs showing strong associations with Bipolar Disorder.
Figure 1 illustrates the result of our enrichment analysis applied to the WTCCC study of Bipolar Disorder. First, we took the top SNP associations (defined as p < 0.001; n=935) and asked whether there was an excess of cis mQTLs relative to null expectation. For the definition of mQTLs, we used the threshold of p < 0.01; a subsequent analysis using a more stringent p < 0.001 yielded similar results (so that the enrichment results appear robust to the definition of mQTL). To generate the null distribution, we conducted simulation studies, as previously described 6. See Materials and Methods for details of the simulation procedure. At the given mQTL threshold, we identified 132 such mQTLs among the top Bipolar Disorder associated SNPs (shown in Figure 1 as a black dot). From the simulated datasets (N=1000) each comprising of minor allele frequency-matched SNPs, the expected cis mQTL count was found to be 108.9 with standard deviation of 9.7. The enrichment p value is 0.01.
We then tested for cis mQTL enrichment for all SNP-disease associations satisfying p < 0.0001. Such a test would establish robustness of our observation to the definition of “top SNP associations” used in the analysis. We found that the cis mQTL enrichment held robustly (p = 0.01) at the more stringent SNP-disease association cutoff.
We also tested for enrichment of trans mQTLs among the top associated SNPs. At this initial mQTL threshold, there was no evidence for any significant excess. In 1000 simulated sets, the expected count of trans mQTLs was found to be 201.8 (SD=12.4); the observed count, however, was 199.
In the WTCCC study, cis mQTLs comprised 14% of the most significant associations (p < 0.001). We asked whether a similar level of cis mQTL enrichment could be observed among the top SNPs in the GAIN and TGen studies of Bipolar Disorder 11, 12. At the given association threshold, a similar percentage, 14.4%, of the top SNPs from the GAIN study were found to be cis mQTLs. Consistent with this trend, the top variants from the TGen study show a significant overlap with local mQTLs (20%). Furthermore, as in the WTCCC study, we observed no enrichment of trans mQTLs (enrichment p > 0.50) among the susceptibility variants from either the GAIN or TGen study.
Given the reported enrichment of cis mQTLs, we applied mQTL annotation to the top loci from GWAS of Bipolar Disorder, as represented in the NHGRI GWAS repository 22, to assess the functional relevance of these reported susceptibility loci (see Table 1).
We then sought to determine the extent of overlap between the mQTLs and the eQTLs in the top Bipolar Disorder associated SNPs. First, we evaluated for (cis) eQTL enrichment in cerebellum among the top Bipolar Disorder associated SNPs. As expected from our earlier study 7 (which was done, however, in transformed lymphocytes), we found a significant excess of locally acting eQTLs in cerebellum among the top SNPs. Again, from 1000 simulated sets, the observed count was 222 while the expected count was 167.9 with standard deviation of 10.9; no simulated set yielded a (cis) eQTL count that matched the observed count for an enrichment p value < 0.001. In contrast, we found no evidence for excess of trans-acting eQTLs (enrichment p = 0.36) in cerebellum among the top Bipolar Disorder associated SNPs. The enrichment we report here does not appear to depend on the 4 Mb definition of cis since the use of a lower threshold (1 Mb) preserves the enrichment robustly (enrichment p < 0.001).
Given this enrichment for eQTLs, we asked whether the enrichment for cis mQTLs could be explained by our observation on the excess of cis eQTLs relative to expectation in cerebellum. Of the 132 cis mQTLs among the top Bipolar Disorder associated SNPs, 55 were found to be mQTLs that are not cis eQTLs. The remaining 77 SNPs are cis eQTLs. However, when we required that cis mQTLs and cis eQTLs have the same target genes in the SNP-gene pair, of 169 identified mQTL-gene pairs (resulting from several of the 132 cis mQTLs having multiple methylation gene targets), we identified only 8 pairs with the same SNP-gene combination as cis eQTL. A cis mQTL SNP for a given gene may be a cis eQTL of a different gene target. Of these 8 pairs, 5 are for one gene, DLG5 (for cis joint eQTLs-mQTLs: rs10762763, rs11002306, rs1866437, rs2579177, and rs2579178, the first two of which are intronic to the gene). The overlap of the target genes of the cis mQTL and of the cis eQTL is not greater than expected (hypergeometric distribution, p = 0.25).
To evaluate the significance of the observed number of joint cis eQTLs-mQTLs (for the same target gene) among the top Bipolar Disorder associated SNPs, we conducted simulations (see Materials and Methods for details of the simulation procedure to empirically generate the null distribution to test the significance of the observed count of such joint cis eQTLs-mQTLs). The results of the simulation analysis on joint cis eQTLs-mQTLs show that such SNPs altering expression level and DNA methylation of the same proximal transcript constitute a small proportion of either cis mQTLs or cis eQTLs, and that the count of such joint eQTLs-mQTLs falls well within the range (enrichment p = 0.75) that can occur by chance based on our simulation analysis. 75% of all simulated sets matched or exceeded the actual count of joint (cis) eQTLs-mQTLs.
Notably, SNPs affecting both methylation and expression (allowing the respective target genes to be different) are significantly enriched (enrichment p < 0.001) among the top Bipolar Disorder associated SNPs.
Thus, we conclude that the enrichment of cerebellum mQTLs and that of eQTLs in the Bipolar Disorder data are independent features. The variants that are both eQTLs and mQTLs for the same proximal target transcript are not driving the observed enrichment. They should thus be considered separately in terms of biological functionality.
To determine whether there is tissue specificity regarding the contribution of mQTLs to disease, we also assessed the level of overlap between mQTLs in cerebellum and in LCLs among the top associations with Bipolar Disorder. We compared our findings with the (publicly available) results from a recent association analysis of methylation levels 23 in the HapMap lymphoblastoid cell lines to look for evidence of cross-tissue common regulators. The methylation profiles for these cell lines were assayed using the same platform as our human cerebellum samples. We found among the top SNP associations (p < 0.001) from the WTCCC study no overlap between mQTLs identified in brain with the cis mQTLs identified in the LCLs. Indeed, none of the mQTLs identified in LCLs overlapped with the most significant SNPs (p < 0.001) from the WTCCC study.
We used the results of GWA studies of Bipolar Disorder (GAIN and TGen) 11, 12 in order to evaluate whether incorporating a priori functional (mQTL) information into GWA studies would have utility for increasing power to detect significant associations. The meta-analysis of the original GWA studies yielded no genome-wide significant findings 11, 12. But restricting the association analysis to cis mQTLs (p < 0.001, n=59959) identified a SNP, rs12618769, that acts locally to regulate methylation patterns for INPP4A (p = 0.0009) and exhibits significant association (TGen+GAIN, p = 8.318e-07, OR=1.43, T allele) with Bipolar Disorder that can survive Bonferroni correction for 59959 tests. We also found the SNP was a cis eQTL for UNC50 (p = 0.009). In the WTCCC study 13, the SNP showed a replicated association at p = 0.02 (OR=1.14, T allele). If we restrict the association analyses to cis mQTLs-cis eQTLs (i.e., SNPs affecting both methylation and expression regardless of the target gene), the SNP rs12618769 continues to be significant with the number of tests reduced to 18173.
In this study, an approach to the investigation of the genetic basis of Bipolar Disorder was pursued. Particularly, we sought to integrate the results of genome-wide analyses of genetic regulation of DNA methylation and of gene expression to inform on the biological relevance of the results from genome-wide association studies of Bipolar Disorder.
A recent publication has revealed the impact of genetic control of gene expression in brain on Schizophrenia 24. Thus, our effort was aimed primarily at understanding the impact of genetic control of DNA methylation on Bipolar Disorder, a common complex disease for which epigenetic mechanisms have been proposed as critical for pathophysiology. The proposed epigenetic connection is consistent with such observations as reported clinical differences between males and females and the fluctuation (i.e., clinical remissions and relapses) of disease course 15. Our study implicates SNP-mediated regulation of epigenetic modification in the etiology of Bipolar Disorder. We found that susceptibility loci of Bipolar Disorder are enriched for genetic variation with substantial impact on local methylation patterns.
We were also secondarily interested in integrating studies of genetic regulation of DNA methylation and of transcriptional expression, on a genome-wide scale, into a coherent framework for characterizing the underlying biological relevance of disease susceptibility loci. We found that of the most significant associations with Bipolar Disorder, SNPs that control both methylation and the expression of the same proximal gene are a small minority. This is consistent with a recent report that concludes that mQTLs and eQTLs largely capture independent traits 25. This finding lends substantial support to our thesis that, within the context of studies of brain pathophysiology, genetic variation controlling gene expression and genetic regulation of methylation present largely independent sources of variability in disease susceptibility. While a SNP may regulate both methylation and gene expression in brain, it is likely to do so with different target genes. DNA methylation may contribute to disease risk other than through influencing gene expression, for example by affecting DNA stability. Alternatively, methylation effects on expression may be more difficult to detect due to methodological limitations; for example, RNA is much more susceptible to degradation than DNA.
The result of our cis eQTL enrichment analysis strongly demonstrates the influence of genetic regulation of gene expression in brain on Bipolar Disorder. In contrast to the enrichment method (which tests a polygenic model that includes tens of thousands of SNPs with disease association p < 0.50) utilized by a recent study 24 that reports enrichment for alleles that affect gene expression in brain among Schizophrenia susceptibility alleles, our enrichment results were robustly observed in the significant Bipolar Disorder susceptibility associations at different significance thresholds (p < 0.001 and p < 0.0001) for association with disease. Finally, through the use of local mQTL information, we identified a significant association, rs12618769, that can stand multiple testing correction for all mQTLs tested in the GWA studies of TGen+GAIN Bipolar Disorder. We found the SNP to regulate (p = 0.0009) the methylation of inositol polyphosphate phosphatase 4A (INPP4A) in cis. The gene INPP4A has been shown to protect neurons from excitotoxic cell death and to maintain the functional integrity of the brain 26. Remarkably, a previous study has shown that increased excitotoxicity may account for disease progression in Bipolar Disorder 27. The gene INPP4A encodes one of the enzymes involved in the phosphatidylinositol signaling pathway, which is a target for the effects of lithium 28. We obtained replication of the association in the independent WTCCC GWA study. We also found the SNP was a cis eQTL for UNC50 (p = 0.009). It is worth noting that considering SNPs that are both cis mQTLs and cis eQTLs even if the methylation and expression targets are not the same gene may increase power to detect an association in genome-wide studies of Bipolar Disorder by further reducing the multiple testing burden and providing a potentially more robust genetic association finding.
We believe that genome-wide association studies of neuropsychiatric phenotypes merit the kind of investigation made possible by the genome-wide maps of mQTLs and eQTLs . Though much of the functionality of mQTLs that do not correlate with eQTLs remains to be elucidated, the effect of genetic variation on methylation suggests an important new area to explore and may provide mechanistic insights into genome-wide association findings.
This work was funded through 5R01MH080425 (to CL), PAAR (Pharmacogenetics of Anti-cancer Agents Research; U01 GM61393), ENDGAMe (ENhancing Development of Genome-wide Association Methods) initiative (U01 HL084715), and the Genotype-Tissue Expression project (GTeX) (R01 MH090937).
Author Contributions: E.R.G. and C.L. conceived and designed the study. E.R.G. and J.A.B. performed statistical analyses. E.R.G. and N.J.C. provided analysis tools. E.R.G. and C.L. wrote the manuscript. C.L. coordinated and supervised the study. L.C., C.Z., D.Z., E.S.G., J.R.K., T.A.G., C.M.N., R.M., P.D.S., N.J.S., E.N.S., C.S.B., J.I.N. Jr, H.J.E., T.F., D.L.K., W.A.S., W.C., J.R., W.B., F.J.M., T.G.S., W.H.B., J.B.P., P.P.Z., P.B.M., M.G.M., S.Z., P.Z., D.W.C., S.S., T.B.B., C.L. participated in patient diagnosis and sample collection. All authors edited and approved the manuscript.