We report associations of DNA methylation with genetic and gene expression variation at a genome-wide level. We have identified methylation QTLs (meQTLs) genome-wide, the majority of which act over very short distances, namely less than 5 kb. Furthermore, methylation patterns generally covary within individuals over distances of approximately 2 kb and, in conjunction with this, meQTLs frequently affect multiple neighboring CpG sites. Our findings are consistent with previous methylation associations [5], familial aggregation [13], correlation with local sequence [10], allele-specific methylation [15], and effects of histone modifications [47]. Little is known about the biological mechanisms that underlie meQTL effects; however, studying them is one important route to identifying how genetic variation affects gene regulation.
We find an overall enrichment of significant associations between genetic variants and methylation at CpG sites, which is consistent with the results of two recent reports examining genome-wide methylation QTLs in human brain samples [5]. Overall, the number of genome-wide significant meQTLs varies across the three studies, which is likely due to differences in sample sizes, differences in multiple testing corrections and in the definition of cis intervals, and the presence of large tissue-specific differences in DNA methylation with tissue-specific meQTLs. In general, power to detect meQTLs will depend on many factors, including sample size, genome-wide coverage of genetic variation, genome-wide coverage of methylation variation, and the effect size of the genetic variants associated with methylation variation in the tissue of interest.
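The dependence of power on sample size and effect size can be illustrated with a small simulation. The following is a minimal sketch, not the analysis used in this study: it assumes an additive SNP effect on a standardized methylation trait, tests association with a Fisher z approximation for the genotype-methylation correlation, and uses hypothetical values for the sample sizes, effect size (`beta`), minor allele frequency (`maf`), and significance threshold (`alpha`).

```python
import math
import numpy as np


def fisher_z_pvalue(r, n):
    """Two-sided p-value for a correlation r observed in n samples,
    using the Fisher z approximation (an assumption of this sketch)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_power(n, beta, maf=0.3, alpha=1e-4, n_sim=500, seed=0):
    """Fraction of simulated data sets in which the SNP-methylation
    association passes the threshold alpha. All parameters are hypothetical."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        # Additive genotype coding: 0, 1, or 2 copies of the minor allele.
        g = rng.binomial(2, maf, size=n).astype(float)
        if g.std() == 0:
            continue  # monomorphic draw: no test possible, counted as a miss
        # Methylation = additive genetic effect + unit-variance noise.
        m = beta * g + rng.normal(size=n)
        r = np.corrcoef(g, m)[0, 1]
        if fisher_z_pvalue(r, n) < alpha:
            hits += 1
    return hits / n_sim


power_small = simulated_power(n=60, beta=0.5)
power_large = simulated_power(n=300, beta=0.5)
print(f"power at n=60: {power_small:.2f}, power at n=300: {power_large:.2f}")
```

Under these assumptions the same effect is detected far more often in the larger sample, which is one reason why counts of significant meQTLs are hard to compare across studies of different sizes.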
Additionally, our analyses are based on Epstein-Barr virus-transformed lymphoblastoid cell lines (LCLs). The choice of cell type will affect the observed genome-wide DNA methylation patterns and, in particular, high-passage LCLs may exhibit methylation alterations over time [29]. Sun et al
], for example, investigated genome-wide differences in DNA methylation between LCLs and peripheral blood cells (PBCs), and identified 3,723 autosomal DNA methylation sites that had significantly different methylation patterns across cell types. In that respect, we expect a subset of our results to reflect LCL-specific events. We have tested potential confounding variables that could affect methylation levels specifically in LCLs [30], but do not observe significant effects of these on overall DNA methylation patterns in our data. However, variation in methylation differs slightly between HapMap Phase 1/2 samples and HapMap Phase 3 samples, suggesting that technical variation related to LCL culture may influence DNA methylation. We took this into account when performing all downstream methylation QTL analyses, and our analyses of the uncorrected methylation patterns are consistent with the results of previous studies in primary cells [4
The trans analysis highlighted several loci with potential long-range effects on DNA methylation. Furthermore, an intriguing association of a SNP within the intron of DIP2B, which contains a DMAP1-binding domain, with the first principal component of autosomal methylation patterns suggests novel genome-wide effects on methylation variability. However, we do not observe a strong effect of polymorphisms in many of the candidate methylation regulatory genes on overall patterns of methylation or on specific probes. The sample size used in the study limits our power to detect trans signals, rendering these analyses more difficult to interpret. In general, the moderate sample sizes used in all three genome-wide methylation studies to date do not allow for the detection of subtle effects of genetic variants on methylation variation, and correspondingly variation at the majority of methylation sites assayed across all studies remains unexplained by the GWAS analyses. However, the findings indicate that genetic regulation of methylation is as complex as that of expression or phenotypic variation.
Relating genetic variation to both DNA methylation and gene expression variation reveals complex patterns. We observe significant overlap between meQTLs and eQTLs for cis regulatory variants. These findings hold both when we focus exclusively on meQTL SNPs (Figure ) and when we compare the genome-wide meQTL results for all SNPs classified as eQTLs in the hierarchical model framework (Figure S9 in Additional file 1). The observations indicate evidence for shared regulatory mechanisms in a fraction of genes. However, in the re-analyses of the eQTL data taking into account DNA methylation, the genetic effect of the SNP on expression was affected by controlling for methylation in only 10% of eQTLs, suggesting that variation in methylation accounts for only a small fraction of variation in gene expression levels. There may be several explanations for this. First, the coverage of the methylation array provides a relatively low-resolution snapshot of the genome-wide DNA methylation patterns. Second, steady-state gene expression levels (as measured by RNA-sequencing) are controlled by many other factors in addition to DNA methylation, such as transcription factor binding, chromatin state including histone marks and nucleosome positioning, and regulation by small RNAs. Finally, our study sample size provides modest power, both for eQTL and meQTL mapping. However, compared to previous studies addressing this issue [5], we find more convincing evidence for meQTL and eQTL overlap. For example, Zhang et al
] found ten cases where genetic variants were associated with both methylation and expression, but they examined gene expression data for fewer than 100 genes in these comparisons, and only in a subset of the sample, while Gibbs et al
] found that approximately 5% of SNPs in their study were significant as both meQTLs and eQTLs. Also, Gibbs et al
] found proportionally similar numbers of QTLs for methylation and gene expression, whereas we find more eQTLs. A potential explanation for the greater overlap obtained in our data is that our study examines one cell type, in contrast to the heterogeneous cell types in the human brain tissue samples used in both other studies [5
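The logic of re-testing an eQTL while controlling for methylation can be sketched with ordinary least squares on simulated data. This is an illustrative sketch only and is not the hierarchical model or the re-analysis pipeline used in this study; the function name `genotype_effect`, the effect sizes, and the sample size are hypothetical, and the simulated SNP is assumed to act on expression entirely through methylation.

```python
import numpy as np


def genotype_effect(expr, geno, covariate=None):
    """OLS coefficient of genotype on expression, optionally adjusted
    for one covariate (here, methylation at a nearby CpG site)."""
    cols = [np.ones(len(geno)), np.asarray(geno, dtype=float)]
    if covariate is not None:
        cols.append(np.asarray(covariate, dtype=float))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return coef[1]  # index 0 is the intercept, index 1 is genotype


# Simulated example (all values hypothetical):
# SNP -> methylation -> expression, with no direct SNP -> expression path.
rng = np.random.default_rng(1)
n = 2000
geno = rng.binomial(2, 0.3, size=n)
meth = 0.8 * geno + rng.normal(scale=0.5, size=n)
expr = -1.0 * meth + rng.normal(scale=0.5, size=n)

marginal = genotype_effect(expr, geno)                  # plain eQTL effect
adjusted = genotype_effect(expr, geno, covariate=meth)  # controlling for methylation
print(f"marginal effect: {marginal:.2f}, adjusted effect: {adjusted:.2f}")
```

In this simulated case the genotype effect shrinks toward zero once methylation is included, which is the pattern expected for an eQTL whose effect is carried by methylation. An eQTL acting through other mechanisms would retain its effect after adjustment.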
Characterizing the genetic control of methylation and its relationship to the regulation of gene expression is an important area for research, critical to our understanding of how complex living systems are regulated. Our study has the potential to help disease mapping studies by shedding light on the phenotypic consequences of this variation. Altogether, of the 173 genes with proximal meQTLs in our study, eighteen genes were previously reported to be differentially methylated in cancer, in other diseases, or across multiple tissues (see Table S4 in Additional file 1). Furthermore, thirty of the meQTL associations reported in our study were also observed in human brain samples [5]. These findings provide a framework to help interpret GWAS findings and improve our understanding of the underlying biology in multiple complex phenotypes.