Epigenetics is being increasingly combined with epidemiology to add mechanistic understanding to associations observed between environmental, genetic and stochastic factors and human disease phenotypes. Currently, epigenetic epidemiological studies primarily focus on exploring if and where the epigenome (i.e. the overall epigenetic state of a cell) is influenced by specific environmental exposures like prenatal nutrition,1 sun exposure2 and smoking.3 In this issue of the IJE, Nada Borghol et al.4 report an association between childhood socio-economic status (SES) and differential DNA methylation in adulthood. Low SES may integrate diverse and heterogeneous environmental influences, and knowing which epigenetic changes are associated with low SES may provide clues about the biological processes underlying its health consequences. The authors stress that their study is preliminary. This statement is, in fact, to a greater or lesser extent applicable to the entire first wave of studies currently being published that likewise aim to discover associations between epigenetic variation measured on a genome-wide scale and environmental exposures or disease phenotypes. When executing such epigenome-wide association studies (EWASs),5 every epigenetic epidemiologist struggles with the same biological, technical and methodological issues. It is important to take these into consideration when designing a study and interpreting the results. Let us consider seven of those issues, taking the current study on SES as a starting point.
Most epigenetic epidemiological studies focus on DNA methylation for various practical and biological reasons, neglecting other layers of the epigenome, such as histone modifications, that are also likely to be important in influencing disease phenotypes. Our basic understanding of the methylome (i.e. the whole of DNA methylation marks on the genome) is in its infancy, and we are still learning about the specific localization of the features that, when differentially methylated, regulate gene expression and are thus relevant for epigenetic epidemiologists to study. The current study, like many others, evaluated promoter regions, in this case defined as 1000 bp upstream to 250 bp downstream of transcription start sites. Although these features are often enriched for DNA methylation marks influencing the expression of genes, recent work suggests that other regions of the methylome outside of promoters, including inter-genic CpG island shores6 and intra-genic CpG islands,7 may ultimately be more important for regulating phenotypic variation.
For any differentially methylated region identified in EWASs it will be important to demonstrate functionality. Promoter methylation in the current study was integrated with public gene expression data and, as expected, highly expressed genes were more commonly flanked by less methylated promoters and vice versa. A limitation is that this observation is for groups of promoters, whereas information is needed about this relationship for individual promoters. Mining the reference epigenomes and transcriptomes that are being generated for different cell types under the umbrella of initiatives such as the National Institutes of Health (NIH) Epigenomics Roadmap8 and the International Human Epigenome Consortium9 may contribute to such information. Additional in vitro experiments will be required to evaluate the transcriptional effects of differential DNA methylation at a specific locus independent of its genomic context.10
The good news is that recent advances in genomic technology mean that genome-scale studies of DNA methylation across multiple samples are now feasible. In practice, however, one has to compromise between coverage and precision in epidemiological studies, which are likely to incorporate a large number of samples. A large (and growing) number of methods exist for assessing DNA methylation both genome-wide and at specific CpG sites,11 and one problem relates to our inability to compare results across studies that have used different platforms. On the one hand, there are methods such as that used in the current study, in which the methylated portion of the genome is captured using antibodies against methylated DNA and subsequently quantified using microarrays or next-generation sequencing. These approaches can provide coverage across most of the genome and may be optimally suited to discriminate low from high methylation, but have lower reliability for smaller differences and are biased by factors such as CG density.12,13 On the other hand, there are methods based on the bisulphite conversion of DNA combined with next-generation sequencing that provide higher accuracy and single-nucleotide resolution. Although whole-genome bisulphite sequencing is currently not feasible across large epidemiological cohorts, the method can be adapted to target a reduced representation of the genome (approximately 3 million out of approximately 28 million CG dinucleotides in the human genome).12,13 The recently launched Illumina 450k Methylation Beadchip may offer a balance between coverage and precision, which will be attractive for epidemiological EWASs executed during the next few years.5 It interrogates DNA methylation at over 480 000 CG dinucleotides, is high-throughput and relatively affordable. The precision of this platform appears to compare well with some of the other platforms,12,13 but these results should be interpreted with caution.
Although correlation coefficients reported across the various platform comparisons are high, they are mainly driven by the fact that the large majority of the genome is either unmethylated or fully methylated, and substantial discrepancies between platforms may exist for intermediate-level methylation.12,14 Therefore, the technological validation of findings using an independent method remains important. This will be feasible for a small number of ‘top hits’, like the three protocadherin promoters assessed in the current study. However, validating the outcomes of the complex pathway analyses performed to implicate either entire biological processes (such as extra- and intra-cellular signalling in the current study) or genomic features with a specific function in gene regulation [e.g. promoters, enhancers, inter/intragenic CG island (shores) etc.] is more demanding and has not yet been realized. Validating the results of such gene-set testing methods will entail the re-assessment of DNA methylation across large sets of loci.
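The point about correlation coefficients can be illustrated with a small simulation. The Python sketch below is not taken from any of the cited platform comparisons: the mixture of sites (45% unmethylated, 45% fully methylated, 10% intermediate), the measurement noise (SD 0.1) and all variable names are assumptions chosen purely for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def clip(v):
    """Keep a methylation fraction within [0, 1]."""
    return min(1.0, max(0.0, v))

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical "true" methylation levels: a bimodal genome with a small
# intermediate class (proportions are assumptions, not estimates).
truth = []
for _ in range(20000):
    u = rng.random()
    if u < 0.45:
        truth.append(rng.uniform(0.0, 0.1))   # unmethylated
    elif u < 0.90:
        truth.append(rng.uniform(0.9, 1.0))   # fully methylated
    else:
        truth.append(rng.uniform(0.3, 0.7))   # intermediate

# Two platforms measure the same sites with independent noise (assumed SD 0.1).
platform_a = [clip(t + rng.gauss(0, 0.1)) for t in truth]
platform_b = [clip(t + rng.gauss(0, 0.1)) for t in truth]

# Correlation across all sites versus across intermediate sites only.
r_all = pearson(platform_a, platform_b)
mid = [(a, b) for t, a, b in zip(truth, platform_a, platform_b) if 0.3 <= t <= 0.7]
r_mid = pearson([a for a, _ in mid], [b for _, b in mid])

print(f"all sites: r = {r_all:.2f}; intermediate sites: r = {r_mid:.2f}")
```

Under these assumptions the correlation across all sites is high while the correlation among intermediate sites is clearly lower, even though both simulated platforms are equally noisy at every site. The genome-wide coefficient mostly reflects the separation between the unmethylated and methylated classes.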
The current study investigated only 40 individuals. Investigators will be able to secure budgets for larger studies as empirical data increasingly highlight the value of epigenetic epidemiology, and high-throughput, economical laboratory approaches become more widely adopted. Nevertheless, it is unlikely that the simple brute-force approach that has been used relatively successfully in genome-wide association studies (GWASs) is valid for EWASs. In genetics, many of the epidemiological principles about designing studies with respect to selection biases, confounding, batch effects and appropriateness of controls could largely be replaced by the simple rule ‘bigger-is-better’. This is not true for epigenetic epidemiology, because the epigenome is not a static entity like the genome, which necessitates the use of more conventional epidemiological approaches.15 Further complicating matters is the fact that, for the most powerful study designs in epigenetic epidemiology (including studies of discordant monozygotic twins,16 particularly when longitudinally sampled,17 early exposure studies with long-term follow-up,1 and studies of specific cell types18), the number of eligible individuals for whom relevant biological materials have been stored in existing epidemiological cohorts is often limited, and it will be difficult to scale up analyses to include the thousands of samples that may be required for establishing robust associations with disease phenotypes. Moving forward, it will be important to establish cause and effect in epigenetic epidemiology; disease-associated differentially methylated regions may arise prior to illness and contribute to the disease phenotype, or could be a secondary effect of the disease process or of the medications used in treatment.19 Furthermore, maximum information will be obtained from epidemiological studies that are able to integrate epigenomic information with genomic, transcriptomic and proteomic data obtained from the same samples.
In many respects, large, comprehensively phenotyped and longitudinally sampled epidemiological studies, like the 1958 British birth cohort used in the current study, are an ideal resource for epigenetic epidemiology. In nearly all of these studies, however, whole blood is the only biological material that has been archived. Blood is a heterogeneous tissue and any DNA methylation difference between groups could be confounded by differences in the cellular composition of whole blood samples, for example, resulting from the immune response to sub-clinical infection. The good news is that fewer DNA methylation differences than might be expected exist between leucocyte types, and controlling for cellular heterogeneity may be possible in biobanks with a simple blood cell count.20 Whether the latter is sufficient (and under which circumstances it is not), however, remains to be established. Epigenomic studies of separate cell types such as those being undertaken by the NIH Epigenomic Roadmap Initiative and the European Union Blueprint consortium are currently generating reference epigenomes of haematopoietic cells that will be of great utility in this regard.8 When moving beyond associations with environmental exposures to epigenetic associations with phenotypes, a key question for epigenetic epidemiology concerns the extent to which easily accessible peripheral tissues (such as blood) can be used to ask questions about inter-individual phenotypic variation manifest in inaccessible tissues such as the brain, visceral fat and other internal organs and tissues. Cross-tissue comparisons of the methylome within the same individual are currently underway to establish the relationship between epigenetic patterns in blood and those in other tissues.
Although these analyses are crucial, the results may not be generally applicable; higher inter-tissue concordance may be present for DNA methylation changes induced early in development (and potentially propagated soma-wide) than for changes that occur during ageing, which are more likely to remain tissue specific.19,21 Efforts to obtain biopsies (subcutaneous fat, muscle, etc.) and post-mortem material in subsets of longitudinal biobanks will greatly increase their value for epigenetic studies, despite the problems associated with cellular heterogeneity that also hold for such samples.
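How differences in cellular composition can confound a whole-blood comparison, and how a blood cell count might be used to control for it, can be sketched in a few lines of Python. Everything in this example is hypothetical: the cell-type-specific methylation levels (0.2 in granulocytes, 0.6 in other leucocytes), the 10-percentage-point shift in granulocyte proportion among exposed individuals, the noise levels and the helper `ols` are assumptions for illustration, not values or methods from the studies discussed.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * v for row, v in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(7)  # fixed seed so the sketch is reproducible
n = 400
exposed = [i % 2 for i in range(n)]  # 0 = unexposed, 1 = exposed

# Assumed: exposed individuals have, on average, more granulocytes.
gran = [min(0.9, max(0.3, rng.gauss(0.55 + 0.10 * e, 0.05))) for e in exposed]

# Whole-blood methylation depends only on cell composition here;
# the exposure itself has no effect on methylation in any cell type.
meth = [0.2 * g + 0.6 * (1 - g) + rng.gauss(0, 0.02) for g in gran]

# Exposure coefficient without and with adjustment for granulocyte proportion.
crude = ols([[1.0, e] for e in exposed], meth)[1]
adjusted = ols([[1.0, e, g] for e, g in zip(exposed, gran)], meth)[1]

print(f"crude difference: {crude:+.3f}; adjusted difference: {adjusted:+.3f}")
```

In this toy setting the unadjusted comparison shows a methylation difference of a few percentage points that disappears once the granulocyte proportion is included as a covariate. Real whole blood contains more than two cell types, so whether a simple cell count is sufficient remains an empirical question, as noted above.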
The main findings in the current study concerned DNA methylation differences at three protocadherin promoters.4 The extent of the difference at these promoters was similar to those commonly observed in other recent studies, namely ~5%,5 and was most apparent for a single, nominally statistically significant CG dinucleotide in each region. The biological implications of such small alterations in DNA methylation in terms of gene expression and function are unknown. Although DNA methylation is recognized as one of the most stable epigenetic marks, it is still relatively dynamic and this has important implications for epigenetic epidemiology. The randomness of maintaining and mitotically transmitting DNA methylation patterns may potentially dilute the putative epigenetic signatures of an adverse exposure early in life (e.g. to low SES in childhood) observed decades later. Of note, recent studies indicate that DNA methylation patterns in leucocytes undergo considerable changes during the first years of life.22 Thus, on top of the previously discussed question of whether DNA methylation at a specific locus actually influences transcriptional activity, researchers should also aim to establish whether the small DNA methylation differences often observed between groups (whether expressed as an absolute difference, a relative difference or a difference relative to the variation in the population) translate into differences in gene expression in the relevant tissue. It will be of particular interest to see whether the effects of such modest differences, while perhaps of little consequence individually, may shift transcription of a biological process or functional network when they co-occur with other changes to the methylome.23 Little is known about the actual scale and extent of between-individual variation in DNA methylation across the genome.
In this regard, public genome-scale resources need to be created that document inter-individual differences in DNA methylation and gene expression, in addition to the reference epigenomes that are currently being generated.
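The three ways of expressing a between-group difference mentioned above can give quite different impressions of the same data. The following Python sketch uses made-up methylation fractions for two small groups (the values and variable names are hypothetical) to compute an absolute difference, a relative difference and a difference scaled by the pooled standard deviation.

```python
from statistics import mean, stdev

# Hypothetical methylation fractions at one CG dinucleotide
unexposed = [0.50, 0.55, 0.45, 0.52, 0.48]
exposed = [0.55, 0.60, 0.50, 0.57, 0.53]

# Absolute difference in mean methylation (in methylation fraction units)
abs_diff = mean(exposed) - mean(unexposed)

# Difference relative to the mean level in the unexposed group
rel_diff = abs_diff / mean(unexposed)

# Difference relative to the variation between individuals (pooled SD,
# equal group sizes assumed)
pooled_sd = ((stdev(exposed) ** 2 + stdev(unexposed) ** 2) / 2) ** 0.5
std_diff = abs_diff / pooled_sd

print(f"absolute: {abs_diff:.3f}, relative: {rel_diff:.1%}, standardized: {std_diff:.2f}")
```

With these invented numbers, an absolute difference of 5 percentage points corresponds to a 10% relative difference and to more than one standard deviation of the between-individual variation. Which scale is most informative for gene expression is exactly the open question raised in the text.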
The results of GWASs are relatively easy to judge. Quality-control steps are well-defined and reported, individually testing every genetic variant [i.e. single nucleotide polymorphism (SNP)] is straightforward, and levels of genome-wide statistical significance are clear. For EWASs, the analytical methodology is very much under construction. For example, in the current study it was not possible to attain genome-wide levels of significance, which is acceptable for an exploratory study, but makes it difficult to fully interpret the reported differences. Because of the vast range of methods currently being used to assess DNA methylation, meta-analyses across different studies are difficult. The adoption of a common technology platform, such as the new Illumina 450k Methylation Beadchip, across multiple studies would provide an excellent opportunity to converge on widely accepted guidelines for the analysis and integration of EWAS data. Apart from pre-processing procedures (quality control, normalization, handling different probe types, accounting for genetic variation, etc.), elements of these guidelines should deal with the analysis of individual CG dinucleotides vs groups of (correlated) adjacent CGs, the use of genome annotations in the analysis (histone states, promoter types, CG content, etc.), and levels of epigenome-wide significance for various analyses. An important aspect will be the exploration of the previously mentioned gene-set testing methods in the context of DNA methylation since they will be vital to obtain meaningful interpretations of genome-wide data in terms of underlying biological processes or genomic functions [e.g. promoters, enhancers, inter/intragenic CG island (shores), etc.]. 
For example, commonly used enrichment methods assume independence within a gene set and, apart from consistency in biological signal in a gene set, statistical significance may reflect consistency in other characteristics such as GC content, coverage or other sequence features.24 Alternative implementations of gene-set testing methods include global testing approaches.25 Finally, it will be important to adopt an integrative paradigm based on the combination of genetic and epigenetic epidemiological data.26 Of particular relevance in this respect is evidence for the widespread occurrence of allele-specific DNA methylation (ASM) across the genome. Recent studies have shown that there are considerable inter-individual differences in ASM, which are frequently associated with genetic variation but can also be mediated by genomic imprinting (i.e. the parent-of-origin dependent silencing of expression by epigenetic mechanisms), environmental influences and apparently stochastic factors in the cell.27,28 ASM can mask the effect of risk alleles by silencing their expression, and also provides a potential mechanism underlying gene–environment interactions.26 Furthermore, ASM may contribute towards the apparent ‘missing heritability’ of many complex diseases and the low penetrance often reported for SNPs identified by GWASs.29
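Returning to the question of epigenome-wide significance levels, the scale of the multiple-testing problem can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the round figure of 480 000 tests stands in for the number of CG dinucleotides on the array, the Bonferroni bound treats all sites as independent (which adjacent, correlated CGs are not), the Benjamini-Hochberg procedure is shown as one commonly used alternative rather than a recommended guideline, and the P-values are invented.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that bounds the family-wise error rate."""
    return alpha / n_tests

def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected at false discovery rate q (step-up rule)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose P-value lies under the line q * rank / m
        if pvalues[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Assumed number of tested CG dinucleotides (round figure for illustration)
threshold = bonferroni_threshold(0.05, 480_000)
print(f"Bonferroni per-site threshold: {threshold:.2e}")

# Invented P-values for ten sites
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, q=0.05)
print("sites rejected at FDR 0.05:", rejected)
```

A per-site threshold of roughly 1 in 10 million shows why a study of 40 individuals is unlikely to reach epigenome-wide significance for differences of around 5%. Because neighbouring CGs tend to be correlated, the effective number of independent tests is probably smaller than the number of sites, which is one reason agreed guidelines are needed.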
There is considerable interest in epigenetic research in the popular press. The current study is a vivid illustration: even though the authors deem it preliminary, it was widely covered by the media.30 Epigenetics should avoid some of the hype that surrounded the early days of genetic epidemiology. After the draft human genome sequence was announced in 2001, it was widely perceived that we would soon understand the causes of most common diseases and how to treat them. This expectation was not realistic, but it was not always disavowed by geneticists. Currently, many scientists outside the field are disappointed by the results of human genetics, and in particular GWASs, despite their overall considerable success. Genetic epidemiology has proven to be harder than expected despite the favourable starting point of thousands of Mendelian diseases and the high heritabilities associated with most traits to be explained. Very much like genetics, epigenetics will not be able to deliver the miracles it is sometimes claimed it will.
In conclusion, epigenetic epidemiology is early in its development and receptive to new ideas and approaches. Only a few years ago empirical papers were greatly outnumbered by reviews. Now, reference epigenomes are produced at a great pace (see http://epigenomeatlas.org).8,9 Moreover, furthered by pilot studies like the one from Nada Borghol et al.,4 the outline of the infrastructure required for EWASs is emerging. Crucial elements include optimal study designs, benchmarking technology and data analysis approaches that are statistically and biologically sound. An additional key aspect to the successful design and interpretation of epigenetic epidemiological studies will be the creation of public genome-scale resources focusing on inter-individual variation incorporating epigenomic, DNA sequence and transcriptomic data. Education, hard work and a certain degree of luck will get us there, which is not very different from the remedy against low SES.
Funding: NGI/NWO (#93518027, to B.T.H.); NGI/NWO-funded Netherlands Consortium for Healthy Ageing (NCHA) (#05060810, B.T.H.); NIH grant (AG036039, to J.M.).
Acknowledgements: We thank Elmar Tobi for his comments.
Conflict of interest: None declared.