General considerations in profiling DNA methylation
Here we review currently available and emerging laboratory methods for investigating DNA methylation patterns in health and disease. We emphasize cancer research, but the methods are equally applicable to other disease states. Current research in epigenetics is largely driven by novel technologies [21
], and over the past decade studies of DNA methylation have grown dramatically, becoming one of the most dynamic and rapidly developing fields in molecular biology. Most approaches for large-scale methylation profiling rely on an initial genomic fractionation step followed by analysis of methylation patterns across the genome by microarray-based or sequencing-based approaches. Previously, microarrays for this purpose contained clones from libraries of CGIs [22
]. As the field progressed, more comprehensive global approaches were accomplished using whole-genome comparative genomic hybridization tiling arrays or bacterial artificial chromosomes arrays [25
]. To achieve global epigenetic profiling, a variety of commercial microarrays are now available. Whole-genome arrays containing overlapping oligonucleotides tiling through large areas of mammalian genomes, including but not restricted to promoter regions, are available from NimbleGen (Roche, Basel, Switzerland), Agilent (CA, USA) and Affymetrix (CA, USA). These platforms have been used successfully for analyzing DNA methylation, typically with probes made from DNA fractionated by methylation-sensitive restriction-enzyme treatment and affinity purification of methylated DNA by either methyl-binding proteins or methylated DNA immunoprecipitation (MeDIP) [24
]. These approaches will likely remain a mainstay in mammalian epigenetics for some time, as they have major advantages of dense genomic coverage and high sample throughput at a reasonable cost.
Before discussing each type of genomic microarray and its applications, we need to describe in more detail the methods of genomic fractionation and probe preparation for identifying methylated and unmethylated DNA sequences. There are two main choices: genomic DNA can be treated with methylation-sensitive restriction endonucleases, which discriminate sequences based on methylation status; or DNA can be immunoprecipitated with antibodies that recognize methylcytosine, or affinity-purified on methyl-binding protein beads, so that the resulting DNA is enriched for methylated sequences. An alternative that does not involve fractionation, and that has different technical advantages, is to treat DNA with sodium bisulfite, which converts unmethylated cytosines into uracil residues without changing methylated cytosines. This is an important approach, as bisulfite sequencing is considered to be the gold standard for validating DNA methylation patterns.
As one of the most promising technologies for genomic analysis, next-generation (NextGen) sequencing is just beginning to take center stage as another approach to DNA methylation profiling. We discuss this methodology and its application to studying DNA methylation in a later section. Some of the representative DNA methylation methods included in this review are illustrated in the accompanying figures, and we will explore each of them in more detail.
Figure: Methodologies based on methylation-sensitive restriction.
Figure: High-throughput bisulfite sequencing.
Restriction endonuclease digestion followed by microarray analysis
The site-specificity and methylation dependence of several available restriction enzymes make restriction digestion a powerful approach for genomic fractionation. In principle, these technologies permit only one of the two fractions (either methylated or unmethylated) to remain intact after the restriction digestions, and subsequently those intact probes can be labeled and hybridized to an array. By contrast, fragments that have been cut at internal sites fail to amplify and drop out from this genomic representation. For example, Huang et al. developed differential methylation hybridization (DMH) arrays by combining restriction endonucleases and microarrays for high-throughput examination of the methylation status of CGIs in human genomes [22]. In these studies, genomic DNA was fragmented by restriction with MseI, a four-base cutter that cleaves bulk DNA into small fragments (<200 bp). This endonuclease recognizes a sequence that rarely occurs within GC-rich regions, leaving most CGIs intact. The restricted fragments were then ligated to synthetic linkers and further restricted with the methylation-sensitive endonucleases BstUI and/or HpaII. BstUI was selected for the methylation analysis because more than 80% of CGIs contain BstUI sites. The BstUI-treated DNAs were used as templates for the subsequent linker-dependent PCR. Methylated DNAs resisted the restriction digestions and hence could be amplified. By contrast, the unmethylated DNAs were digested by the endonucleases and failed to be amplified. The resulting PCR products were next labeled with fluorescent dyes. In general, Cy3 dye was used for DNA from normal controls, while Cy5 dye was used for DNA from patients with cancer. Equal amounts of final amplicons from both groups were mixed and overlaid on a DMH microarray slide upon which the human CGI library probes were printed. After stringent hybridization, weakly bound or unbound amplicons were eliminated by extensive washing and the resulting microarray slides were subjected to high-resolution fluorescence scanning using a laser beam. The ratio of Cy5 to Cy3 intensities reflects the methylation status in the cancer group relative to that in the normal counterparts at each locus.
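As a rough illustration of how the Cy5/Cy3 readout described above might be turned into per-locus calls, the following Python sketch computes a log2 ratio for each spot. The function names, the pseudocount and the twofold threshold are assumptions made for this example, not part of the published DMH protocol, and a real analysis would also normalize the two channels before comparing them.

```python
import math


def log2_ratio(cy5, cy3, pseudocount=1.0):
    """log2(Cy5/Cy3) for one spot; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((cy5 + pseudocount) / (cy3 + pseudocount))


def call_loci(spots, threshold=1.0):
    """Label each locus from its tumor (Cy5) versus normal (Cy3) intensity.

    spots maps a locus name to a (cy5, cy3) tuple. The threshold of 1.0
    (a twofold change) is a hypothetical cutoff chosen for illustration.
    """
    calls = {}
    for locus, (cy5, cy3) in spots.items():
        ratio = log2_ratio(cy5, cy3)
        if ratio >= threshold:
            calls[locus] = "more methylated in tumor"
        elif ratio <= -threshold:
            calls[locus] = "less methylated in tumor"
        else:
            calls[locus] = "no clear difference"
    return calls


# Made-up intensities for three hypothetical loci.
example = {"locus_A": (799, 99), "locus_B": (99, 799), "locus_C": (500, 480)}
print(call_loci(example))
```

Working on the log scale makes gains and losses symmetric around zero, which is why a single threshold can be applied in both directions.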
To judge the potential false discoveries in this method, a methylation-insensitive isoschizomer of Hpa
II, namely Msp
I, can be employed in a control reaction. This methodology was applied for profiling promoter methylation in 8091 human genes [35
]. Fukasawa et al.
employed a similar approach and carried out promoter methylation studies in human lung cancer [36
]. In their studies, the methylated Hpa
II-resistant DNA fragments and Msp
I-cleaved products were amplified and labeled with Cy3 and Cy5, respectively, and then were cohybridized to microarrays containing promoters of 288 cancer-related genes [36
]. Using a similar restriction fractionation approach, the HpaII tiny fragment enrichment by ligation-mediated PCR (HELP) assay uses MspI representations as an internal control [37]. Because Msp
I is the methylation-insensitive isoschizomer of Hpa
II, it would cut every possible Hpa
II restriction site across the genome. By comparing the two profiles generated by each enzyme, Khulan et al.
performed both intra- and inter-genomic DNA-methylation analyses in 6.2 Mb of the mouse genome, and identified 223 novel tissue-specific differentially methylated regions [37
A limitation in utilizing Bst
UI and Hpa
II to assess methylation status is that these enzymes identify only a limited fraction of genome CpG sites. To improve sensitivity, Nouzova et al.
added a reverse approach by using genomic representations made by digesting with Mcr
BC, an unusual restriction enzyme that cleaves methylated, rather than unmethylated, DNA. Mcr
BC cuts between two closely spaced (55–100 bp) methylated cytosines in the context (G/A)mC (a purine followed by a methylated cytosine), and therefore preferentially digests densely methylated regions of DNA, such as abnormally methylated CGIs in cancer cells and normally methylated repetitive and intragenic sequences in most cells and tissues. As an example of this method, a greater number of hypermethylated loci were identified in an acute promyelocytic leukemia cell line than in normal peripheral blood mononuclear cells, and the sensitivity of the analysis was greater than that of the conventional approach using only Hpa
II and Bst
]. Irizarry et al.
developed comprehensive high-throughput arrays for relative methylation (CHARM) from the Mcr
BC assay by improving statistical procedures and the array design algorithm [39
]. By using the CHARM assay to detect genome-wide DNA methylation in colon cancers, this group reported hypermethylated CpG sites in ‘CGI shores’, which were defined as the regions approximately 2 kb away from a CGI [40
]. This interesting finding is consistent with the DNA methylation ‘spreading theory’, which proposes that de novo methylation may begin at the flanking CpG sites and progressively invade the core of the island [41
]. As will be described later, genomic fractionation into methylated (Hpa
II-resistant) and unmethylated (Mcr
BC-resistant) fractions can also be exploited for mapping DNA methylation by conventional and NextGen sequencing [5
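The McrBC recognition rule summarized earlier in this section (two methylated cytosines in a purine-C context, spaced 55–100 bp apart) can be sketched in Python as follows. This is a simplified, single-strand model written for illustration: the function names are hypothetical, methylated positions are supplied by hand as 0-based indices, and the spacing window is simply the range quoted in the text.

```python
def mcrbc_half_sites(seq, methylated):
    """Return methylated cytosine positions that are preceded by a purine (G or A)."""
    seq = seq.upper()
    return [
        i for i in sorted(methylated)
        if 0 < i < len(seq) and seq[i] == "C" and seq[i - 1] in "GA"
    ]


def mcrbc_cleavable_pairs(seq, methylated, min_gap=55, max_gap=100):
    """Pairs of half-sites whose spacing falls inside the window quoted in the text."""
    sites = mcrbc_half_sites(seq, methylated)
    pairs = []
    for index, first in enumerate(sites):
        for second in sites[index + 1:]:
            gap = second - first
            if gap > max_gap:
                break  # sites are sorted, so later ones are even further away
            if gap >= min_gap:
                pairs.append((first, second))
    return pairs


# Toy sequence: an "AC" half-site at the start and a "GC" half-site 60 bp later.
toy = "AC" + "T" * 58 + "GC" + "T" * 10
print(mcrbc_cleavable_pairs(toy, {1, 61}))  # both cytosines methylated
print(mcrbc_cleavable_pairs(toy, {1}))      # only one methylated: no pair
```

A region only becomes a substrate in this model when both half-sites are methylated, which mirrors why the enzyme preferentially removes densely methylated DNA.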
Recently, a methodology to enrich the unmethylated DNA with the involvement of multiple enzyme-mediated restrictions was developed [31
]. In this approach, genomic DNA was subjected to a cocktail of methylation-sensitive restriction endonucleases (HpaII and HpyCH4IV). This combinatorial approach provides better coverage (up to 41%) of all CpG dinucleotides in mammalian genomes [31
]. After the multiple restrictions of genomic DNA, a double-stranded adaptor was ligated to the CpG overhangs. At this point, the relatively short and amplifiable DNA fragments were predominantly derived from unmethylated regions and were susceptible to the subsequent PCR amplifications. However, if any residual fragments that harbor methylated cytosines remained, they could be further eliminated by a second step of methylation-specific restriction using Mcr
BC prior to the PCR. After the two-step restrictions and subsequent PCR, amplicons were labeled with different fluorescent cyanine (Cy5 or Cy3) dyes for either sample or reference DNA, mixed and cohybridized to oligonucleotide microarrays [31
]. In another variation of DMH, enriching unmethylated DNA, the total genomic DNA was first restricted with Hpa
II, ligated to special linkers prior to subsequent PCR amplifications, and then the resultant amplicons were hybridized to arrays of promoter sequences [42
]. This approach relied on the occurrence of two methylation-sensitive restriction sites in close proximity. In this case, if the restriction sites were both unmethylated, they were susceptible to Hpa
II and could be ligated to linkers followed by PCR amplification. By contrast, if the sites at either or both ends were methylated in the genome then restriction cuts were prevented and longer fragments were generated that were poorly susceptible to PCR amplification.
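The logic of this paragraph, in which only fragments bounded by two nearby unmethylated HpaII sites are short enough to amplify, can be sketched as a small in silico digestion. The function names, the use of site start indices to mark methylation, and the `max_len` amplification limit are all assumptions made for this illustration; they are not taken from the cited protocol.

```python
import re


def hpaii_sites(seq):
    """Start index of every CCGG (HpaII) site, including overlapping ones."""
    return [match.start() for match in re.finditer("(?=CCGG)", seq.upper())]


def amplifiable_fragments(seq, methylated_sites, max_len=1500):
    """Fragments between consecutive cut sites that are short enough to amplify.

    methylated_sites holds the start indices of CCGG sites assumed to be
    methylated and therefore left uncut. max_len is a hypothetical PCR size limit.
    """
    cut = [site for site in hpaii_sites(seq) if site not in methylated_sites]
    return [(a, b) for a, b in zip(cut, cut[1:]) if b - a <= max_len]


# Three sites, 24 bp apart, in a toy sequence.
toy = "CCGG" + "A" * 20 + "CCGG" + "A" * 20 + "CCGG"
print(amplifiable_fragments(toy, set(), max_len=30))  # all sites cut
print(amplifiable_fragments(toy, {24}, max_len=30))   # middle site methylated
```

Methylating the middle site merges the two short fragments into one longer fragment, which then exceeds the assumed size limit and drops out of the representation.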
As DMH methods rely on PCR amplification prior to microarray analysis, Chen et al.
conducted a test to determine at which cycles of PCR the amplification is discriminative, and found that PCR should be carried out for fewer than 29 cycles to prevent overamplification of the partially restricted DNA fragments while still yielding sufficient PCR products from intact DNAs [43
]. Besides PCR cycles, another potential bias associated with PCR amplification of GC-rich sequences, including CGIs, is the choice of polymerase enzyme. Pike et al.
compared the efficiency of three ‘GC-improved’ DNA polymerases (AccuPrime™ DNA polymerase [Invitrogen, CA, USA], ThermalAce™ DNA polymerase [Invitrogen] and GC-RICH PCR enzyme [Invitrogen]) to the classical Taq polymerase and found advantages to using these higher efficiency enzymes [44
Of historical importance, another restriction enzyme-based methodology not involving microarrays is restriction landmark genomic scanning (RLGS), a two-dimensional gel electrophoresis approach combining restriction enzyme polymorphism and DNA methylation-sensitive sites for genome-wide analysis of DNA methylation and expression. This method entails the restriction digestion of genomic DNA with methylation-sensitive restriction enzymes (Not
I or Asc
I) followed by radiolabeling of the restriction fragments and two-dimensional gel electrophoresis; this method yields a pattern of spots on autoradiograms representing unmethylated sites in the genome of the experimental sample being analyzed [45
]. Comparison of the patterns of spots found with two different samples has uncovered important differentially methylated genes in diverse cancer types, ranging from leukemias to lung cancers [46
Profiling allele-specific methylation on microarrays
Allele-specific epigenetic modifications are the hallmark of imprinted loci. This type of allelic asymmetry is increasingly recognized as occurring at a subset of nonimprinted loci as well. While simple DMH does not yield information on allele-specific methylation, restriction enzyme-based approaches followed by microarray analysis have now been developed to produce such allele-specific data. In a single nucleotide polymorphism (SNP) chip-based method, called methylation-sensitive SNP chip analysis (MSNP), Yuan et al.
and Kerkel et al.
used restriction endonuclease-based methylation profiling on Affymetrix SNP arrays to determine net methylation and allele-specific methylation genome-wide [49
]. In the initial proof-of-principle study using 50K SNP arrays, genomic DNA was first cleaved by Xba
I in the presence or absence of Hpa
II. The resulting DNA fragments were ligated to adaptor linkers, followed by PCR amplification, labeling and hybridization to the SNP arrays, which contain oligonucleotides matching both alleles of a large number of SNPs, distributed at roughly equal intervals along the human genome. The resultant allele-specific SNP hybridization intensities derived from Xba
I genomic representations were compared with those from the XbaI plus HpaII representation (and control representations with Xba
I). As will be discussed, MSNP can now be applied to 250K and 1M SNP arrays that use NspI or StyI as the generic restriction enzymes, but the principle remains the same. With regard to DNA methylation, the SNPs on the arrays fall into three general categories. Class 1 SNPs lie within fragments that lack internal HpaII sites; they yield genetic information (copy number aberrations and loss of heterozygosity) and serve as invariant internal controls in the DNA methylation analysis. Class 2 SNPs are within fragments that contain Hpa
II sites at positions other than the SNP itself. In fact, these SNPs are the informative ones for assessing both net (average of the two alleles) and allele-specific DNA methylation. Class 3 SNPs, the rarest category, fall within a CCGG sequence and therefore create or destroy a Hpa
II restriction site, or have adjacent polymorphic Hpa
II sites, based on the SNP database. These SNPs are not reliable for assessing DNA methylation but, like the class 1 SNPs, they are useful for assessing loss of heterozygosity and DNA copy number.
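The three SNP classes can be expressed as a simple sequence check, sketched below in Python. The function name and inputs are hypothetical, and the sketch covers only the sequence-based part of the definition: it does not consult a SNP database, so the case of separate polymorphic HpaII sites adjacent to the SNP is not modeled.

```python
import re


def classify_snp(fragment, snp_pos, alleles):
    """Assign an MSNP class (1, 2 or 3) to a SNP on a restriction fragment.

    fragment is the fragment sequence, snp_pos the 0-based SNP index and
    alleles the possible bases at that position.
    """
    fragment = fragment.upper()
    sites_per_allele = []
    for allele in alleles:
        seq = fragment[:snp_pos] + allele.upper() + fragment[snp_pos + 1:]
        sites_per_allele.append(
            {match.start() for match in re.finditer("(?=CCGG)", seq)}
        )
    # Class 3: for at least one allele the SNP lies inside a CCGG site,
    # so the polymorphism itself creates or destroys an HpaII site.
    for sites in sites_per_allele:
        if any(start <= snp_pos < start + 4 for start in sites):
            return 3
    # No site overlaps the SNP, so every allele carries the same set of sites.
    return 2 if sites_per_allele[0] else 1


print(classify_snp("AAAATAAAA", 4, ("T", "G")))    # no HpaII site on the fragment
print(classify_snp("AACCGGAATAA", 8, ("T", "G")))  # HpaII site away from the SNP
print(classify_snp("AACCGGAA", 3, ("C", "T")))     # SNP inside the CCGG site
```

Only class 2 SNPs would be carried forward for methylation calls; the other two classes remain useful as copy number and loss-of-heterozygosity controls, as noted above.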
In contrast to CGI arrays, SNP arrays and full-genome tiling arrays query diverse locations in the genome that are intergenic, intragenic and promoter CGI-associated. As noted previously, while much prior research in cancer epigenetics has focused on gains of methylation in CGIs, there are advantages to surveying not only islands, but also nonisland sequences. In contrast to islands, the vast majority of which are nonmethylated, nonisland sequences frequently have substantial baseline CpG methylation, so both gains and losses of DNA methylation can be evaluated as biomarkers. Moreover, nonisland CpG sequences may have important regulatory functions that will only be revealed when these sequences begin to be studied using MSNP and related genomic profiling methods such as HELP, CHARM assays and MeDIP on tiling arrays, as well as ultra-high-throughput sequencing.
New biological principles can emerge specifically from studying allele-specific epigenetic modifications by MSNP and related methods. For example, Kerkel et al.
carried out a large study of allele-specific methylation in normal human hematopoietic and placental tissues. This study uncovered a strong genetic/epigenetic dependence at multiple loci with strong linkage between SNP genotypes and the propensity of a given allele (CpGs near the SNP) to become methylated [50
]. This category of allele-specific methylation, being sequence-dependent, is distinct from genomic imprinting, in which the allele that becomes methylated is determined by its parental origin and not by its sequence. A method very similar to MSNP was utilized to examine allele-specific methylation on the human inactive and active X-chromosomes in females. This study revealed an interesting phenomenon of opposite patterns of methylation in CGIs versus gene bodies (intragenic sequences), such that the CGIs were found to be hypermethylated on the inactive X chromosome while the gene bodies were relatively hypermethylated on the active X chromosome [51
Bisulfite conversion-based assays utilizing microarray platforms
To characterize methylation patterns at base-pair resolution, bisulfite conversion of DNA followed by sequencing is considered to be the gold-standard approach [52
]. Bisulfite conversion-based approaches can be the first step in characterizing DNA methylation both on microarrays and by NextGen sequencing. Bisulfite treatment converts unmethylated cytosines to uracil, such that U is read as T after PCR amplification and sequencing. This conversion does not affect methylated cytosines, which remain C in the sequence. PCR amplicons generated after bisulfite conversion of genomic DNA can be hybridized to microarrays containing methylation-specific oligonucleotides (MSO; comprise 19–23 nucleotides) to query DNA methylation status [53
]. Because MSO probes discriminate methylated from unmethylated cytosines within a given CG-rich sequence, quantitative differences in hybridization, assessed by fluorescence intensity, indicate the methylation status of a particular locus. For example, a set of 12 MSOs was designed to test 15 CpG sites within the CGI in the first exon of the ERα gene [53]. As each probe can interrogate 2–4 CpG sites in the CGI, the methylation status of the ERα gene was found to be strikingly different among breast cancer cell lines. Likewise, further studies applying MSO were able to classify various human tumor types by methylation patterns [54] and to distinguish a specific subtype of non-Hodgkin’s lymphoma [56] based on the differential methylation profiles of several gene promoters. In addition, a modification of MSO technology illustrating the methylation status of the promoter region of the MGMT
gene was developed to examine colorectal cancer [57
]. An increasingly popular bisulfite-based high-throughput DNA-methylation profiling platform, commercially available from Illumina, utilizes bead arrays to obtain a quantitative measure of the percentage methylation at each CpG site [58
]. In the current version of this approach, genomic DNA samples are bisulfite-converted, and the bead array assay utilizes hybridization and primer extension to query the methylation status of cytosines in specific CpG dinucleotides. The information content of the bead arrays is limited only by the number of specific primers attached to the beads. The proof-of-principle study for this basic approach examined human lung cancers at 1536 specific CpG sites in 371 genes, thereby deriving panels of cancer-specific epigenetic markers [58
]. Over the past several years, the coverage offered by this system (Illumina Infinium methylation assays) has increased substantially, to more than 20,000 CpGs in the promoter regions of more than 14,000 genes.
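The bisulfite chemistry that underlies all of the assays in this section can be illustrated with a short Python sketch. It models only the top strand, assumes complete conversion, and takes methylated positions as hand-supplied 0-based indices; the function names are hypothetical and the sketch is not meant to represent any particular platform's analysis software.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C is read as T (via uracil); methylated C stays C.
    """
    return "".join(
        "T" if base == "C" and index not in methylated else base
        for index, base in enumerate(seq.upper())
    )


def call_cpg_methylation(reference, converted_read):
    """For each CpG cytosine in the reference, report whether the read kept a C."""
    reference = reference.upper()
    calls = {}
    for index in range(len(reference) - 1):
        if reference[index:index + 2] == "CG":
            calls[index] = converted_read[index] == "C"
    return calls


reference = "ACGTCGCA"
read = bisulfite_convert(reference, methylated={1})
print(read)
print(call_cpg_methylation(reference, read))
```

Comparing the converted read with the unconverted reference is what allows a retained C at a CpG to be scored as methylated and a C-to-T change as unmethylated.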
MeDIP & methyl-binding domain affinity chromatography followed by microarray-based profiling
Genomic fractionation using restriction enzymes has limitations as previously noted. For example, only 3.9% of all CpG dinucleotides in human nonrepetitive regions are recognizable by Hpa
]. Although the conversion of unmethylated cytosines with bisulfite provides a sensitive alternative, it cannot yet be flexibly or comprehensively applied to whole-genome screenings, as even the Illumina Infinium assays only query a restricted number of CpG sites. A third general type of high-throughput approach in methylation analysis applies MeDIP or affinity chromatography over a methyl-binding domain (MBD) linked to beads, followed by probe preparation and microarray hybridization. MeDIP utilizes nonspecific fragmentation of the genomic DNA followed by anti-5mC antibodies to enrich for methylated DNA fragments. The resultant immunoprecipitated DNA, enriched in hypermethylated sequences, and total genomic DNA (input) are labeled with fluorescent dyes Cy5 and Cy3, respectively, and cohybridized onto microarray chips. The ratio of fluorescent intensity (Cy5 to Cy3) indicates the methylation status at each particular gene. MeDIP is thus a valuable general fractionation approach, compatible with any genomic microarray platform to query the level of methylation in genomic sequences. In their proof-of-principle study, Weber et al.
analyzed the genome-wide methylation between male and female fibroblasts using MeDIP coupled with a comparative genomic hybridization microarray that contains bacterial artificial chromosome clones with an average tiling resolution of 80 kb [24
]. In addition to demonstrating methylation differences related to cellular transformation, their results, which included but were not restricted to CGIs, revealed an interesting spatial pattern of methylation on the X chromosome: the inactive X chromosome was found to be hypermethylated at only a subset of gene-rich regions at the telomeric end and, unexpectedly, was hypomethylated overall relative to its active counterpart. One of the crucial factors in this assay is the sensitivity of the anti-5-methylcytosine antibody. Moreover, the MeDIP method is most sensitive to densely methylated sequences, as DNA fragments with many contiguous methylated CpGs are more efficiently precipitated. Approximately 200 differentially methylated genes were identified in a SW48 colon cancer cell line using MeDIP coupled with a 12k CpG island microarray [24], substantially fewer than in the previous examination [48
]. Keshet et al.
coupled MeDIP to a promoter array (nearly 13,000 human gene promoters) and have identified several common motifs of promoters that were significantly methylated in various human cancer cell lines [60
Affinity purification of hypermethylated DNA
Unlike conventional MeDIP, which employs a monoclonal antibody against 5-mC in the context of single-stranded DNA, MBD-based affinity purification is an alternative approach for enriching hypermethylated DNA in the genome. An example is the methylated CGI recovery assay (MIRA), which utilizes the very high affinity of the MBD2/MBD3-like 1 complex to purify methylated DNA. MIRA is not sequence-dependent, and does not require a denaturation step to make the DNA single stranded. MIRA coupled with microarray-based analyses provides high-resolution genome-wide methylation profiling. By this approach, Rauch et al.
identified a tumor suppressor gene, DLEC1
, as well as 11 homeodomain-family genes that were frequently methylated in primary human lung cancers [61
]. It is worth noting that the MIRA approach has now been commercialized by Life Sciences Technologies (Invitrogen). Another laboratory also pursued this affinity approach, in conjunction with a second step of partial strand melting, followed by direct sequencing of the methylated DNA, for successfully identifying methylated CGIs in human lung cancers [62
]. A similar strategy, using MBD2 fused to the Fc fragment of human IgG1 by protein A-sepharose, was recently developed [63
]. Finally, in an interesting variant of the well known ChIP-on-chip approach for studying chromatin proteins, Ballestar et al.
globally examined the distribution of MBD2 as a surrogate marker for densely clustered methylated CpGs in the genome of breast cancer cells [64
As noted above, when interpreting microarray experiments that use these methods for preparing the probes, it is important to keep in mind that both the MBDs and the anti-5-methylcytosine antibody recognize only the most densely methylated DNA sequences and do not pull down DNA that is methylated, even at all available CpGs, in regions of the genome that have a low CpG content. Such regions are better examined by restriction endonuclease-based approaches. It is worthwhile to note that several studies have described improved analytic algorithms for interpreting genome-wide methylation data generated by MeDIP experiments [65
Thus, no single approach, with the possible exception of extensive and deep sequencing of bisulfite-converted DNA (which is still slightly beyond current practical capabilities for large sets of biological samples) can give a truly complete picture of the overall epigenome. In summary, combinations of several modalities applied to the same sample set will give the clearest picture.
NextGen sequencing & its application in analyzing DNA methylation
NextGen sequencing, a family of high-throughput sequencing techniques, offers much higher coverage per run and relatively lower cost for genome-wide sequencing than previous technologies. It therefore provides a more cost-effective platform for large-scale methylation detection. As summarized in the accompanying table, the most widely used high-throughput sequencing platforms on the market are the 454 Genome Sequencer (Roche), Solexa technology (Illumina, CA, USA) and the SOLiD platform (Applied Biosystems Inc., CA, USA). NextGen sequencing technology is in an exponential development stage. Some recent review papers have provided a more comprehensive view of this technology [67]. Owing to limited space, only a brief introduction to these technologies and their applications to the methylome is given here.
Table: Overview of next-generation DNA sequencing technologies.
The 454 system was the first platform available for NextGen sequencing. Based on the principle of real-time pyrophosphate DNA sequencing (pyrosequencing), DNA fragments are ligated to adaptors and subjected to emulsion PCR in water-in-oil microreactors. The sequencing signals are collected as light generated from the luciferin substrate in the sequencing-by-synthesis (SBS) reaction. The system can generate more than 1 million individual reads at lengths of up to 500 bp. Illumina developed another high-throughput sequencing platform, the Solexa Genome Analyzer, using a parallelized sequencing approach. The analyzer is based on bridge PCR, in which the forward and reverse primers are attached to a solid surface so that the adaptor-ligated DNA fragments can anneal to the surface and undergo PCR amplification. The system uses SBS technology with reversible, fluorescently labeled terminators, allowing detection of each synthesized base in real time. Almost 150 million reads can be analyzed within 6 h with an accuracy of 99%. Although read lengths of up to 75 bp can be obtained in paired-end sequencing, an average of 36 bp provides much higher accuracy because of signal decay and dephasing. The third platform, the SOLiD system, is based on a sequencing-by-ligation technique. The system propagates sequencing on the template DNA fragments by ligating a pool of fluorescently labeled octamers that contain random oligonucleotide combinations. Each cycle of hybridization and ligation is followed by cleavage of the 3′ end of the ligated octamer and addition of the next fluorescent probe. A total of 400 million sequence tags can be generated per run and the length of each read can reach 50 bp.
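To put the read counts and read lengths quoted above on a common scale, the following Python sketch multiplies them into raw bases per run and divides by a genome size. The 3.1 Gb human genome size is an assumption introduced for this example and does not come from the text, the variable and function names are illustrative, and the figures are upper bounds on raw output that ignore read quality and mapping losses.

```python
# (reads per run, read length in bp), as quoted in the paragraph above.
PLATFORMS = {
    "454": (1_000_000, 500),
    "Solexa": (150_000_000, 36),
    "SOLiD": (400_000_000, 50),
}

# Assumed haploid human genome size in bp; not taken from the text.
HUMAN_GENOME_BP = 3_100_000_000


def bases_per_run(reads, read_length):
    """Raw sequence output of one run, in bases."""
    return reads * read_length


def fold_coverage(reads, read_length, genome_bp=HUMAN_GENOME_BP):
    """Raw fold coverage of a genome from one run, ignoring mapping losses."""
    return bases_per_run(reads, read_length) / genome_bp


for name, (reads, length) in PLATFORMS.items():
    total = bases_per_run(reads, length)
    print(f"{name}: {total:,} bases per run, "
          f"{fold_coverage(reads, length):.2f}x of the assumed genome")
```

Even under these optimistic assumptions a single run yields at most a few-fold raw coverage of a mammalian genome, which helps explain the interest in enrichment and targeting strategies.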
By taking advantage of high-throughput sequencing technology, an obvious ultimate objective is to achieve cost-effective complete-genome bisulfite sequencing, which will reveal the methylation status of every CpG dinucleotide. A tour-de-force
whole-genome bisulfite sequencing study of Arabidopsis thaliana
, a widely used plant genetic model, was recently carried out by the Jacobsen laboratory by using the Solexa platform [69
]. Their paper contains an interesting comparison of the relative accuracy of array-based and bisulfite sequencing-based methods and, not surprisingly, the sequencing-based approach was more accurate. Another advantage of the sequencing-based analysis was the ability to score methylation in most types of repetitive sequences, which are less easily probed by microarray-based methods.
Unfortunately, whole-human genome bisulfite sequencing currently remains just out of reach because of the larger genome size and relatively small portion of methylated cytosine in mammalian genomes, compared with Arabidopsis thaliana
. Therefore, new approaches have been developed to either enrich high-CpG-density sequences or target specific CpG sites through padlock (molecular inversion) probes. In a pilot study, Taylor et al.
used an MBD affinity column to enrich methylated DNA from either normal peripheral blood leukocytes or several types of human lymphomas. The 454 NextGen sequencing platform was used for high-throughput sequencing of bisulfite PCR amplicons [70]. Although this approach circumvents the previously rate-limiting step of cloning the bisulfite PCR products, the total number of sequencing reads remains less than 1 million. To further extend NextGen sequencing technology to the genome-wide scale, Meissner et al. reported a reduced representation bisulfite sequencing method, in which 90% of CpG islands in the mouse genome could be covered through Msp
I fractionation. DNA fragments were then subjected to bisulfite conversion and high-throughput sequencing by the Solexa system [71
]. Bestor and associates are utilizing the SOLiD sequencing platform to conduct direct end-sequencing of restriction enzyme-digested and size-fractionated genomic DNA – an approach that they previously validated by using high-volume conventional sequencing [5
A second approach to selectively sequence the CpG-rich methylome is to target specific CpG sites across the genome. By taking advantage of the pilot study from The Encyclopedia of DNA Elements (ENCODE) project, which provided detailed DNA sequence information for 1% of the human genome [72
], two research groups simultaneously reported the utilization of padlock probes to target specific CpG sites across the genome [73
]. The probes were first synthesized on programmable DNA microarrays based on the targeted DNA sequences. Padlock probes were designed to target both the 5′ and 3′ flanks of specific CpG sites in bisulfite-converted DNA, so that the CpG sites fall in the gap between the two probe ends. The methylation status of the CpG sites was captured by a single-ligation amplification reaction. The beauty of this technique is that a library of tens of thousands of multiplexed padlock probes is amplified in one single tube. The captured target CpG sites in the padlock loops were then subjected to NextGen sequencing. Although only 66,000 CpG sites, accounting for approximately 0.25% of total CpG sites in the human genome, were sequenced in these studies, the assay can readily be expanded to broader genomic representation as the coverage of the ENCODE project increases.
New biological generalities are emerging from these studies examining both CGIs and nonisland sequences; for example, Meissner et al.
found that most developmental changes in CpG methylation occur outside of promoter regions [71
], and Ball et al.
found a trend for gene bodies to be hypermethylated as a function of active transcription [73
]. Rapid technical progress and new discoveries can be anticipated in this exciting area.