Epigenetic mechanisms of transcriptional regulation are increasingly being studied for their potential influence on human disease pathogenesis. Much of this interest is based on the paradigm of neoplastic transformation, in which epigenetic changes appear to be universal, widespread throughout the genome, causative of critical transcriptional changes and predictive of disease prognosis (reviewed in [1]). Furthermore, these epigenetic changes represent potential pharmacological targets for reversal and amelioration of the disease process [2].
Among the many regulatory processes referred to as epigenetic, numerous assays exist to study chromatin component distribution, cytosine methylation and microRNA expression genome-wide. The chromatin components include a large number of post-translational modifications of histones, variant histones, DNA-binding proteins and associated complexes, all tested by chromatin immunoprecipitation (ChIP) approaches coupled with microarray hybridization or massively parallel sequencing (MPS). MicroRNAs can be identified and quantified using microarrays and MPS. Cytosine methylation can be studied definitively by converting the DNA of the genome using sodium bisulfite, shotgun sequencing the product using MPS and mapping the reads back to the genome. Because methylcytosine resists bisulfite conversion whereas unmethylated cytosine does not, the frequency with which a cytosine remains unconverted indicates its methylation in the starting material. This allows nucleotide-resolution, strand-specific, quantitative assessment of cytosine methylation, with such studies performed in Arabidopsis and human cells to date [6].
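As a rough illustration of the counting step described above, the following minimal sketch computes a per-cytosine methylation level as the fraction of reads in which the cytosine remained unconverted. The function name `methylation_levels` and the input layout are hypothetical, and the sketch assumes that read alignment and strand orientation have already been handled upstream, so that each observation is simply `C` (unconverted) or `T` (converted) at a reference cytosine. It is not the pipeline used in the studies cited.

```python
from collections import Counter


def methylation_levels(calls):
    """Return the fraction of unconverted reads per (chrom, pos, strand).

    `calls` is an iterable of (chrom, pos, strand, base) tuples, one per read
    covering a reference cytosine. Assumption: bases are already oriented to
    the read strand, so 'C' means unconverted (methylated) and 'T' means
    converted (unmethylated). Any other base is ignored as uninformative.
    """
    unconverted = Counter()
    informative = Counter()
    for chrom, pos, strand, base in calls:
        if base not in ("C", "T"):
            continue  # skip sequencing errors and ambiguous calls
        key = (chrom, pos, strand)
        informative[key] += 1
        if base == "C":
            unconverted[key] += 1
    # Only positions with at least one informative read appear in the result.
    return {key: unconverted[key] / informative[key] for key in informative}


if __name__ == "__main__":
    example_calls = [
        ("chr1", 100, "+", "C"),
        ("chr1", 100, "+", "T"),
        ("chr1", 100, "+", "C"),
        ("chr1", 100, "+", "C"),
        ("chr1", 200, "-", "T"),
        ("chr1", 200, "-", "N"),
    ]
    for site, level in sorted(methylation_levels(example_calls).items()):
        print(site, level)
```

Keeping the strand in the key preserves the strand-specific nature of the measurement; the precision of each estimate depends on the number of informative reads, which is why genome-wide quantification demands deep sequencing.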
While this approach represents the ideal means of testing cytosine methylation, the amount of sequencing necessary to generate quantitative information genome-wide (for the human genome, over 1 billion sequences of ~75 bp each [6]) remains prohibitive in terms of cost, limiting these studies to the few referred to above. When studying human disease, the emphasis remains on cytosine methylation assays, as it is generally easier to collect clinical samples for DNA purification than for ChIP or even RNA assays. However, the cell populations harvested are rarely of high purity, and we generally do not know the degree of change in cytosine methylation in the disease of interest, and thus the quantitative discrimination required of an assay; some studies to date indicate that the changes may be quite subtle [7]. These concerns emphasize the need for cytosine methylation assays that can detect intermediate methylation levels and disease-associated changes of relatively modest magnitude. Certain microarray-based assays to study cytosine methylation have addressed this issue, with the methylated DNA immunoprecipitation (meDIP) assay amenable to such quantification when used for CpG islands [8] and possibly also for less CG dinucleotide-rich regions [9]. Restriction enzyme-based assays used with microarrays have also proven to be reasonably quantitative, whether based on methylation-sensitive (for example, the HELP assay [10]) or methylation-dependent (for example, MethylMapper [11]) enzymes. A promising new MPS-based assay is reduced representation bisulfite sequencing (RRBS), which is designed to study the CG-dense regions defined by short MspI fragments, and provides nucleotide-resolution, quantitative data [12].
The use of MPS for what were previously microarray-based assays has been associated with improved performance [13], as we found when we modified our HELP (HpaII tiny fragment Enrichment by Ligation-mediated PCR) assay [10] for MPS, creating an assay similar to Methyl-Seq [14]. The strength of the HELP assay lies in the comparison of the HpaII representation with the methylation-insensitive MspI representation, allowing a normalization step that makes the assay semi-quantitative [10]. The HELP representation approach was improved upon by Ball et al., who developed the Methyl-Sensitive Cut Counting (MSCC) assay, which involves digesting DNA with HpaII, ligating an adapter to the cohesive end formed, and using a restriction enzyme site within the adapter to digest at a flanking sequence, thus capturing the sequence immediately adjacent to the HpaII site. By adding a second MPS-compatible adapter, a library can be generated for MPS, allowing the count of reads at each site to reflect the degree of methylation at that site; because HpaII is methylation-sensitive, read counts vary inversely with methylation. The authors demonstrated the assay to be reasonably quantitative, testing over 1.3 million sites in the human genome, representing not only HpaII sites clustered in CG-dense regions of the genome (approximately 12% of all HpaII sites are located in annotated CpG islands in the human genome [16]) but also the remaining majority of the genome in which CG dinucleotides are depleted, a genomic compartment not tested by RRBS as currently designed. A focus on the CG-dense minority of the genome will fail to observe changes such as those at CG-depleted promoters (such as OCT4) and CpG island shores [18], and within gene bodies, where cytosine methylation has been found to be positively correlated with gene transcription [15]. It is likely, therefore, that an assay system that can study both CG-dense and CG-depleted regions will acquire substantially more information about epigenomic states than those directed at the CG-dense compartment alone.
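To make the tag-capture idea concrete, the sketch below performs an in silico version of it: it locates every CCGG site (the recognition sequence shared by HpaII and MspI) in a DNA string and returns the sequence immediately flanking each site on either side. The function name `flanking_tags` and the default `tag_len` are illustrative assumptions; the actual tag length depends on the restriction enzyme placed in the adapter, which is not specified here, and the sketch ignores strand orientation and the exact cut position within the site.

```python
import re


def flanking_tags(sequence, tag_len=18, site="CCGG"):
    """List (site_start, upstream_tag, downstream_tag) for each site found.

    `tag_len` is a hypothetical parameter standing in for the reach of the
    enzyme that cuts at a distance from the adapter. Tags are truncated
    when a site lies closer than `tag_len` to either end of the sequence.
    """
    sequence = sequence.upper()
    tags = []
    for match in re.finditer(re.escape(site), sequence):
        start, end = match.start(), match.end()
        upstream = sequence[max(0, start - tag_len):start]
        downstream = sequence[end:end + tag_len]
        tags.append((start, upstream, downstream))
    return tags


if __name__ == "__main__":
    toy_sequence = "AAAACCGGTTTTGGGGCCGGAC"
    for site_start, up, down in flanking_tags(toy_sequence, tag_len=4):
        print(site_start, up, down)
```

Running such a scan over a reference genome gives the set of sites an assay of this kind can interrogate, which is how the proportions of sites falling inside and outside CpG islands can be tallied.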
In the current study, we tested whether the use of an MspI control would improve MSCC assay performance, as we had found for microarray-based HELP, and whether we could develop an analytical pipeline for routine use of this assay in epigenome-wide association studies. We also explored the use of longer tags than those employed in MSCC, and added T7 RNA polymerase and reverse transcription steps to allow the generation of libraries without contaminating products, thus obviating the need for gel extraction. The influence of base composition and fragment length as potential sources of bias was also tested, using the H1 (WA01) human embryonic stem (ES) cell line. The outcome is a modified assay that combines the strengths of MSCC and HELP-seq/Methyl-seq, together with a supporting analytical workflow that maximizes the quantitative capabilities of the data generated.
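One simple way to picture what an MspI control contributes is a depth-scaled ratio of HpaII to MspI tag counts at each site, sketched below. This is an assumption made for illustration only and is not the analytical pipeline developed in this study; the function name `hpaii_mspi_ratios`, the counts-per-million scaling and the rule of skipping sites with no MspI reads are all hypothetical choices.

```python
def hpaii_mspi_ratios(hpaii_counts, mspi_counts):
    """Return a depth-scaled HpaII/MspI ratio for each site seen by MspI.

    Both arguments map a site identifier to a raw tag count. Each library
    is scaled to counts per million so that sequencing depth cancels out.
    Sites with zero MspI reads are skipped, because the control gives no
    evidence that the site is represented in the library at all.
    Higher ratios suggest less methylation, since HpaII cuts only
    unmethylated sites while MspI cuts regardless of methylation.
    """
    hpaii_total = sum(hpaii_counts.values())
    mspi_total = sum(mspi_counts.values())
    if hpaii_total == 0 or mspi_total == 0:
        raise ValueError("both libraries need at least one read")

    ratios = {}
    for site, mspi in mspi_counts.items():
        if mspi == 0:
            continue
        hpaii_cpm = hpaii_counts.get(site, 0) / hpaii_total * 1e6
        mspi_cpm = mspi / mspi_total * 1e6
        ratios[site] = hpaii_cpm / mspi_cpm
    return ratios


if __name__ == "__main__":
    hpaii = {"site_a": 10, "site_b": 0, "site_c": 30}
    mspi = {"site_a": 25, "site_b": 25, "site_c": 0}
    for site, ratio in sorted(hpaii_mspi_ratios(hpaii, mspi).items()):
        print(site, ratio)
```

The point of the control is that a site with few HpaII tags is ambiguous on its own: it may be methylated, or it may simply be poorly represented for reasons such as fragment length or base composition. Dividing by the MspI signal separates these cases.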