Microarray-based methods, despite their high resolution, generally fall far short of truly genome-wide analysis. Close to genome-wide coverage can be achieved by combining one of the affinity-based methods with high-density tiling arrays, and this has been done to study the methylome of B lymphoid blood cells at 100-bp resolution [36]. Such an approach is quite expensive and time-consuming, which explains why few research groups have used it to study whole-genome methylation. The introduction of what has been called next-generation sequencing brought fresh excitement to genome and epigenome analysis. By making it possible to read millions of sequences at once, next-generation sequencing shifted the balance among methods for revealing genome-wide DNA methylation in favor of the gold-standard bisulfite-based detection. Currently, there are four main competing next-generation sequencing technologies available: Illumina Genome Analyzer, generally referred to as Solexa sequencing, from Illumina, Inc.; SOLiD™ System, from Applied Biosystems; HeliScope Single Molecule Sequencer, from Helicos BioSciences; and 454 Sequencing, from Roche. Despite variations, all platforms take advantage of parallel processing of thousands to millions of DNA sequences at a time (massively parallel sequencing), and base detection relies either on fluorescently labeled nucleotides or probes or on the pyrosequencing method. This is a rapidly advancing field, and companies are competing strongly to increase genome coverage per run and to reduce the cost of their methods.
As for whole-genome tiling microarrays [37], the first organism to have its methylome sequenced at single-base resolution was the plant Arabidopsis thaliana
]. To do this, two groups fragmented the genomic DNA by sonication prior to ligation of PCR primer adaptors and bisulfite conversion, and performed shotgun sequencing using the Illumina Solexa platform. Compared to the human methylome (and the methylome of all mammals), the methylome of Arabidopsis
is quite complex: in addition to methylation in CpG dinucleotides, there is also CHG and CHH methylation (H = A, C or T). From an analytical point of view, the possible combinations of methylated and unmethylated cytosines are fewer in humans than in Arabidopsis, making sequence matching and assembly less laborious. However, the Arabidopsis genome is just a fraction of the size of the human genome (119 Mb in Arabidopsis versus 3.1 Gb in human). Thus, the size of the human genome has been the main obstacle to sequencing the whole human methylome.
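The three sequence contexts mentioned above (CpG, CHG and CHH, with H = A, C or T) can be made concrete with a short sketch. The Python code below is only an illustration under simplifying assumptions: the function names `cytosine_context` and `count_contexts` are hypothetical, only the forward strand is examined, and cytosines too close to the end of the sequence or followed by an ambiguous base are left unclassified.

```python
from typing import Dict, Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not a cytosine or if the context cannot be
    determined (sequence end, ambiguous base). Forward strand only.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    nxt = seq[i + 1:i + 3]  # up to two downstream bases
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or "N" in nxt:
        return None  # context undetermined
    # At this point the first downstream base is H (A, C or T).
    return "CHG" if nxt[1] == "G" else "CHH"


def count_contexts(seq: str) -> Dict[str, int]:
    """Count forward-strand cytosines in each of the three contexts."""
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i in range(len(seq)):
        ctx = cytosine_context(seq, i)
        if ctx is not None:
            counts[ctx] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence, not taken from any real genome.
    print(count_contexts("ACGTCAGCTTCCA"))  # {'CG': 1, 'CHG': 1, 'CHH': 2}
```

A complete analysis would also classify cytosines on the reverse strand (that is, guanines on the forward strand, read as their reverse complement), which this sketch omits for brevity.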
Not long after the Arabidopsis
methylome was fully sequenced, the mouse methylome of pluripotent and differentiated cells from various tissues was sequenced with moderate coverage. To circumvent the genome size obstacle (the mouse genome is 2.7 Gb in size), the authors took advantage of the reduced representation generated from DNA digestion with the Msp
I restriction enzyme, whose recognition site (CCGG) is abundant in CpG islands [40]. In this technique (reduced representation bisulfite sequencing, RRBS), the digested DNA fragments are size-selected to target the most CpG island-enriched fraction, followed by bisulfite treatment and Illumina Solexa sequencing. While analysis of the human methylome by RRBS has not yet been reported, this ingenious technique is very promising for such investigations. Meanwhile, the human methylome has been studied using other reduced representation strategies. A target-specific approach using 'padlock' probes was recently introduced by two different groups [41
]. By presenting a unique sequence at each end, designed to match the bisulfite-converted genome, these probes capture targeted regions and create a circular molecule. The internal part of each probe is a universal sequence that allows simultaneous amplification of all circularized, captured sequences prior to massively parallel sequencing. Coincidentally, in their initial articles, both groups demonstrated the feasibility of their method by sequencing 10,000 targets, but the method can be extended to more or fewer targets according to the research goal. Interestingly, there seems to be an inherent bias in the process, with some circularized DNA being preferentially amplified or sequenced. Thus, some additional optimization will be necessary before the number of targets per analysis is increased. It is also important to note that, since target selection is part of the procedure, these methods are not genome-wide. However, they are of great practical use when interest is focused on specific genomic regions or on promoter CpG islands alone. In one of these reports, the authors go one step further and introduce a less biased approach, termed MSCC (for methyl-sensitive cut counting) [41
]. In this method, the authors use the methylation-sensitive restriction enzyme Hpa
II, which, like its methylation-insensitive isoschizomer Msp I, cuts the genome at CCGG sites and thus covers 90% or more of the human CpG islands. The ligation of adaptors to the generated fragments, followed by PCR and massively parallel sequencing, results in mapping of unmethylated cytosines in the CCGG context. The authors present an inverse correlation between the abundance of MSCC tags and the measured cytosine methylation per region, but recognize that a much larger sequencing effort is necessary to increase accuracy at low methylation densities. In another independent publication, Brunner et al.
] described an approach similar to MSCC but included Msp I-digested DNA as a control, to distinguish CpG sites that can be assayed and mapped uniquely in the genome from those that cannot, thereby reducing the rate of false-positive methylation calls.
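The restriction-based strategies described in this paragraph can be illustrated with an in silico digestion. The Python sketch below is a minimal illustration, not the procedure used in the cited studies: the function names `ccgg_digest` and `size_select` are hypothetical, the size window is a free parameter because the text does not specify one, and the cut position (between the two cytosines of CCGG) is the known cleavage site of Msp I and Hpa II rather than a detail stated above. Sites listed in `blocked_sites` are left uncut, which mimics the behavior of the methylation-sensitive Hpa II at methylated sites; with no blocked sites the function mimics Msp I.

```python
import re
from typing import Collection, List


def ccgg_digest(seq: str, blocked_sites: Collection[int] = ()) -> List[str]:
    """Cut seq at every CCGG site (C^CGG), skipping blocked sites.

    blocked_sites holds 0-based start positions of CCGG sites that must
    not be cut (for example, methylated sites in an Hpa II digest).
    The first and last fragments end at the sequence boundary rather
    than at a restriction site.
    """
    seq = seq.upper()
    blocked = set(blocked_sites)
    # Lookahead so that overlapping occurrences would also be found.
    cuts = [m.start() + 1
            for m in re.finditer(r"(?=CCGG)", seq)
            if m.start() not in blocked]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    toy = "AAACCGGTTTTCCGGAA"  # toy sequence with CCGG at positions 3 and 11
    print(ccgg_digest(toy))                      # Msp I-like: all sites cut
    print(ccgg_digest(toy, blocked_sites={11}))  # Hpa II-like: site 11 methylated
    print(size_select(ccgg_digest(toy), 5, 8))   # arbitrary size window
```

Comparing the two digests of the same sequence shows why an Msp I control is informative: it reveals every CCGG site that could be assayed, independently of methylation status.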
The first human methylome at single-base resolution was published earlier this year [44]; the authors employed the MethylC-Seq method, previously used to sequence the Arabidopsis methylome. This landmark report is impressive both in its methodology and in its findings. One embryonic stem cell (ESC) line and one fetal lung fibroblast line were sequenced and, to achieve 14-fold coverage of the genome, more than 1 billion Solexa reads were generated for each. The results indicate that the methylome differs markedly between undifferentiated and differentiated cells, and the authors' unexpected findings of significant non-CpG methylation in ESCs (up to 25% of the methylated cytosines were in CHG and CHH contexts, similar to Arabidopsis
cytosine methylation) strongly support the view that the physiological impact of DNA methylation will be better captured in whole-genome, deep, unbiased analyses. However, until sequencing costs are significantly reduced, human methylome analysis at single-base resolution will be restricted to a few samples at a time. Studies in cancer will need more extensive analysis. At a minimum, cancer studies require the sequencing of dozens, if not hundreds, of samples because of their inherent genetic and epigenetic heterogeneity and the various disease grades and prognostic groups. Additionally, genome-wide mapping of methylated cytosines must be quantitative rather than just qualitative; thus, massively parallel sequencing requires several-fold coverage of each individual CpG dinucleotide, which makes the task prohibitively expensive. As a compromise, strategies based on reduced representation of the genome are currently more practical for whole-methylome analysis.
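The link between sequencing depth and quantitative methylation calls can be sketched numerically. The Python code below is an illustration under stated assumptions, not a method from the cited studies: `mean_coverage` uses the usual approximation of reads multiplied by read length and divided by genome size, with the read length supplied by the user because it is not given in the text, and `methylation_level` reports the fraction of methylated reads at one cytosine together with a Wilson score interval, a statistical choice made here only to show how the uncertainty shrinks as coverage grows. Both function names are hypothetical.

```python
import math
from typing import Tuple


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate mean fold coverage, ignoring unmapped or duplicate reads."""
    return n_reads * read_length / genome_size


def methylation_level(methylated: int, total: int,
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Return (level, lower, upper) for one cytosine.

    level is methylated / total; lower and upper bound a Wilson score
    interval (z = 1.96 corresponds to roughly 95% confidence).
    """
    if total <= 0:
        raise ValueError("total must be a positive read count")
    p = methylated / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Arbitrary toy numbers, chosen only to exercise the functions.
    print(mean_coverage(1_000_000, 50, 5_000_000))
    print(methylation_level(5, 10))    # wide interval at 10 reads
    print(methylation_level(50, 100))  # narrower interval at 100 reads
```

With the same observed level of 50%, the interval at 10 reads is several times wider than at 100 reads, which is the practical reason why quantitative analysis requires several-fold coverage of each CpG.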