At genome-wide scale, the epigenome is still understood in less detail than the genome sequence. One reason is that there are as many epigenomes as cell types; although the bulk of DNA methylation changes little from one tissue to another, the fraction that does change has a profound impact on cell differentiation and disease [6
]. Another reason for the relative lack of detailed mapping of the epigenome is that it cannot be directly measured. DNA methylation status, for example, is not revealed by direct sequencing and thus depends on additional manipulation [12
]. This has resulted in studies that cover a small fraction of the genome per experiment, analyses that are qualitative rather than quantitative, and in some cases measurements of DNA methylation that are biased toward CpG-rich or CpG-poor regions of the genome, according to the chosen method [13
]. In many instances such biases are desirable, since it is still beyond the capacity of most laboratories to assay every single CpG nucleotide. Fortunately, as for genome studies, the methods to study the epigenome are evolving at a fast pace. The decreasing cost of high-throughput sequencing promises to make genome-wide mapping of methylated cytosines using bisulphite-treated DNA as template [15
] accessible to a larger number of research groups. Also, new sequencing technologies have reported the direct discrimination of modified nucleotide bases with a certain degree of accuracy [17].
Our understanding of the human methylome (in both normal and cancerous cells) has evolved over the years, reflecting both the introduction of new technologies and the description of the biochemical pathways that regulate the addition and removal of methylation at CpG sites. As for technological advances, the single most extraordinary breakthrough was the introduction of bisulphite treatment [12
], which moved the field from inaccurate estimates based on Southern blot analysis [18
] to accurate, quantitative analyses. Most importantly, bisulphite treatment of DNA finally allowed the positive identification of individual methylated CpG sites using the polymerase chain reaction (PCR). Bisulphite-based methods thus also made it possible to study DNA available only in small quantities or of poor quality, such as that obtained from frozen or paraffin-embedded tissues. Over the years, the technology to study single genes moved from qualitative (methylation-specific PCR [19
]), to semi-quantitative (COBRA, Q-MSP, among other methods [20
]) and highly quantitative (pyroMeth [22
]) assays. All these assays were designed to study discrete genomic regions, covering a few hundred base pairs at a time (bisulphite sequencing) but mostly evaluating a few CpG sites of a single gene per experiment. As a consequence, the genes studied were selected a priori based on function or genomic location, and tumor suppressor genes were the prime candidates. This biased selection of genes led to the commonly accepted concept that DNA methylation is an alternative to genetic mutation as a means of inactivating tumor suppressor genes. It is true that DNA methylation can have a result similar to that of inactivating mutations, but DNA methylation also targets genes in other functional categories, and not all tumor suppressor genes are targeted by DNA methylation. This topic is discussed in further detail in the next sections of this review.
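The principle behind the bisulphite-based assays described above can be illustrated with a minimal Python sketch. It relies on the well-established chemistry that bisulphite treatment followed by PCR causes unmethylated cytosines to be read as thymine, while 5-methylcytosine is protected and still read as cytosine. The function names, the toy reference sequence, and the methylation patterns are hypothetical and chosen only for illustration; the sketch also assumes full-length, gap-free, top-strand reads, which real data would not guarantee.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment plus PCR on the top strand.

    Unmethylated C is read as T; C at the 0-based indices listed in
    `methylated_positions` is protected and remains C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def cpg_methylation_levels(reference, reads):
    """Return, for each CpG cytosine in `reference`, the fraction of reads
    that retained C (methylated) among reads showing C or T at that position.

    Assumption: every read is the same length as the reference and is
    already aligned to it without gaps.
    """
    ref = reference.upper()
    cpg_sites = [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]
    levels = {}
    for pos in cpg_sites:
        calls = [read[pos] for read in reads if read[pos] in "CT"]
        levels[pos] = sum(c == "C" for c in calls) / len(calls) if calls else None
    return levels


# Hypothetical example: one reference, four molecules with different
# methylation patterns (sets of methylated 0-based cytosine positions).
reference = "ACGTTCGACCGA"
molecules = [{1, 5}, {1}, {1, 5, 9}, {1}]
reads = [bisulphite_convert(reference, m) for m in molecules]
levels = cpg_methylation_levels(reference, reads)
print(levels)  # {1: 1.0, 5: 0.5, 9: 0.25}
```

Counting retained cytosines across many molecules is what turns a per-molecule yes/no call into a quantitative methylation level per CpG site, which is the distinction the text draws between qualitative and quantitative assays.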
In the same way that global profiling of gene expression is a better strategy to identify molecular signatures of tumor subtypes and clinical outcome, large-scale DNA methylation analysis is also more powerful than single-gene assays, particularly when the critical markers for the disease of interest are unknown. One of the first attempts at genome-wide (or at least large-scale) methylome analysis adapted a previously published method to detect copy number changes [24
]. In this method, after serial digestion of DNA with methylation-sensitive and insensitive restriction enzymes, methylated fragments could be detected as missing signals in two-dimensional gel analysis [25
]. We learned from this method that a large number of genes (on average 5%) are hypermethylated in cancer, and more tumor-specific markers were identified in this process [27
]. The fact that assigning a genomic position to each identified differentially methylated target was somewhat labor-intensive (the method required gel extraction and cloning of the fragments of interest) was an obstacle to its incorporation into routine analysis. Other methods with a similar design became available and again were adopted by only a few groups, sometimes limited to the original developers of the method [28
]. In a short period of time, however, several groups independently developed DNA microarray-based methods to assay DNA methylation [31
]. These have the advantage that targets are known in advance, so labor-intensive cloning and sequencing of differentially methylated loci is not necessary. With greater interest in cancer epigenetics, companies developed arrays specifically designed to investigate promoter regions and CpG islands (in fact, the first arrays adopted to study DNA methylation were designed for ChIP-on-chip experiments, but their coverage of gene promoters made them compatible with DNA methylation studies). A limitation of most methylation-microarray methods is that they are still qualitative. Nonetheless, hundreds to thousands of cancer-methylated genes were identified using these methods, and they further confirmed the tissue-specificity and age-related nature of DNA methylation [34].
The most recent developments in methylome analysis resulted from massively parallel sequencing techniques. All methods used to generate methylation libraries suitable for microarray analysis can be applied to each of these platforms directly or with slight modifications (thus meDIP has become meDIP-Seq, HELP has become HELP-Seq, and so on). Due to the complexity of mammalian genomes and the high cost associated with single-base mapping of methylated cytosines, most groups employ reduced-representation libraries to study the epigenome [36
]. The only truly genome-wide mapping of methylated cytosines in humans has been performed for normal cells [16
], but similar maps of adult cells and of the cancer epigenome will certainly become available soon. Once again, despite the increase in coverage, the results agree with what had been previously reported: the vast majority of promoter CpG islands are unmethylated in normal cells, and gene body methylation correlates positively with gene expression. Among the novel findings, a difference in CpG methylation between exons and introns raises the possibility that gene body methylation participates in splicing regulation. Also, the presence of non-CG methylation (which occurs mostly in the CHG and CHH contexts, where H is A, C, or T) in the human genome was reported by Lister et al. [16
] when profiling the methylome of embryonic stem (ES) cells. Non-CG methylation is well documented in plants, and although it had previously been reported for individual genes in mouse ES cells [39
], the discovery of the non-CG methylation mark at such high frequency in the human genome (up to 25% according to Lister et al.) was surprising. Supporting a possible functional role, non-CG methylation showed a different distribution between gene bodies (enriched) and regulatory regions (depleted), and was virtually lost during differentiation. There is limited information about non-CG methylation in the cancer methylome [41
] and more studies are necessary to resolve its prevalence and physiological consequence.
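To make the sequence contexts discussed above concrete, the following Python sketch classifies a cytosine as CG, CHG, or CHH from the two bases downstream of it (H being A, C, or T) and computes the fraction of methylated cytosines that fall outside the CG context. It is only an illustration under stated assumptions: the function names, the toy sequence, and the list of methylated positions are hypothetical, and only the top strand is considered.

```python
from collections import Counter


def cytosine_context(seq, i):
    """Classify the cytosine at 0-based index `i` of `seq` (top strand only).

    Returns 'CG', 'CHG', or 'CHH', or None if the base is not a cytosine or
    the context cannot be determined (sequence end or ambiguous base).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in "ACT" or downstream[1] not in "ACGT":
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def context_summary(seq, methylated_positions):
    """Count methylated cytosines per context and return the non-CG fraction."""
    counts = Counter(cytosine_context(seq, i) for i in methylated_positions)
    counts.pop(None, None)  # drop positions with no determinable context
    total = sum(counts.values())
    non_cg_fraction = (total - counts["CG"]) / total if total else None
    return counts, non_cg_fraction


# Hypothetical example: four methylated cytosines in a toy sequence.
seq = "ACGTCAGCTTCG"
methylated = [1, 4, 7, 10]
counts, non_cg_fraction = context_summary(seq, methylated)
print(dict(counts), non_cg_fraction)
```

A per-context tally of this kind is what allows a figure such as the non-CG share of methylation to be reported, and the same classification applied separately to gene bodies and regulatory regions would support the enrichment comparison described in the text.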
Massively parallel sequencing-based methods to study DNA methylation in high coverage or whole-genome resolution.
In terms of methods, just as the development of bisulphite treatment was one of the biggest breakthroughs in methylation analysis, the next breakthrough will be the ability to acquire information on chemically modified nucleotide bases (not only 5-methylcytosine, but also the recently rediscovered 5-hydroxymethylcytosine) while collecting sequence information. A new sequencing technology still in development follows the dynamics of nucleotide incorporation in real time and records the brief stalling of the DNA polymerase when it encounters modified DNA bases [17
]. Competing technology using nanopores is also in development [43
]. Both methods have only been validated for short synthetic pieces of DNA and need more optimization before routine use for DNA methylation analysis.