


A complete understanding of the potential function of 5-hydroxymethylcytosine (5-hmC), a DNA cytosine modification in mammalian cells, requires an accurate single-base resolution sequencing method. Here we describe a modified bisulfite-sequencing method, Tet-assisted bisulfite sequencing (TAB-seq), which can identify 5-hmC at single-base resolution, as well as determine its abundance at each modification site. This protocol involves β-glucosyltransferase (β-GT)-mediated protection of 5-hmC (glucosylation) and recombinant mouse Tet1 (mTet1)-mediated oxidation of 5-methylcytosine (5-mC) to 5-carboxylcytosine (5-caC). After the subsequent bisulfite treatment and PCR amplification, both cytosine and 5-caC (derived from 5-mC) are converted to thymine (T), whereas 5-hmC reads as C. The treated genomic DNA is suitable for both whole-genome and locus-specific sequencing. The entire procedure (which does not include data analysis) can be completed in 14 d for whole-genome sequencing or 7 d for locus-specific sequencing.
5-Hydroxymethylcytosine was first observed in mammals in 1972, but it was not given much attention until recently1. In 2009, 5-hmC was found to exist in relatively high abundance in Purkinje neurons and embryonic stem cells (ESCs), and was produced specifically through 5-mC oxidation catalyzed by the Tet family of proteins2,3. 5-hmC is thought to be an intermediate in an active demethylation process and may have direct roles in gene expression, as the modified base itself cannot be recognized by most 5-mC–binding proteins3–7. With the development and application of more sensitive detection technologies, 5-hmC has been found to be present at different levels in the genomes of various cell types or tissues8–11. Genome-wide profiling of 5-hmC further indicates potential regulatory roles of 5-hmC in ESC regulation, myelopoiesis, zygote development and neurodevelopment, thus suggesting that it may serve as an epigenetic mark12–20.
After the discovery of 5-hmC, several groups independently reported further oxidation of 5-hmC to 5-formylcytosine (5-fC) and 5-caC catalyzed by Tet proteins21–23. Both 5-caC and 5-fC can be recognized and excised by thymine DNA glycosylase (TDG) and then converted back to cytosine through the base excision repair pathway21,24,25. This newly discovered active demethylation pathway again suggests that 5-hmC is an intermediate of demethylation. However, 5-hmC accumulates to high abundance in certain brain tissues, implying functional roles other than as an intermediate in demethylation. Determination of the exact location and relative abundance of 5-hmC will be crucial in order to fully unveil the biology associated with this base modification. We describe here a detailed protocol for the TAB-seq method that we recently published for single-base resolution sequencing of 5-hmC26.
Traditional bisulfite sequencing, which has been widely used to detect 5-mC at single-base resolution, cannot differentiate 5-mC from 5-hmC, as both resist deamination during the treatment of DNA with sodium bisulfite7,27. The protocol described here overcomes this limitation by selectively converting 5-mC to 5-caC in two steps (Fig. 1): protection of 5-hmC through glucosylation and mTet1-mediated oxidation of 5-mC to 5-caC. After subsequent bisulfite conversion, the protected β-glucosyl-5-hydroxymethylcytosine (5-gmC; from 5-hmC) is sequenced as C, whereas 5-caC and C read as T, enabling single-base resolution sequencing of 5-hmC26.

In the first step, β-GT, a T4 bacteriophage protein, is used to transfer a glucose to the hydroxyl group of 5-hmC and generate 5-gmC28,29. This β-GT–catalyzed glucosylation is highly selective and efficient with either natural or chemically modified uridine diphosphate (UDP)-glucose11,30. Several groups, including ours, have used this selective glucosylation reaction of 5-hmC for the enrichment of 5-hmC–containing genomic DNA fragments11,31–33.
Tet proteins convert 5-methylcytosine to 5-caC, which is eventually read as T in bisulfite sequencing. 5-fC, which can be partially converted to T under standard bisulfite treatment, can also be oxidized by Tet proteins to 5-caC34. Thus, only protected 5-gmC will read as C in TAB-seq. Most reagents in the protocol are readily available. Active mTet1 is now commercially available (Wisegen), and expression as well as purification procedures for wild-type β-GT and the active domain of mTet1 can be followed as reported11,13,26. We also provide a detailed protocol for producing and purifying a recombinant mTet1 protein (Box 1).
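The read-out rules described above can be summarized as a small lookup table. The following is a minimal sketch for illustration only; the dictionary and function names are hypothetical, and 5-fC is omitted from the conventional bisulfite table because the text describes its conversion there as only partial.

```python
# Expected base call after bisulfite treatment and PCR, following the text.
# Conventional bisulfite sequencing: 5-mC and 5-hmC both resist deamination.
CONVENTIONAL_BS = {"C": "T", "5-mC": "C", "5-hmC": "C"}

# TAB-seq: beta-GT protects 5-hmC as 5-gmC (reads as C); mTet1 oxidizes
# 5-mC and 5-fC to 5-caC, which reads as T.
TAB_SEQ = {"C": "T", "5-mC": "T", "5-hmC": "C", "5-fC": "T", "5-caC": "T"}


def read_as(base, method="tab"):
    """Return the base call expected for `base` under the given method."""
    table = TAB_SEQ if method == "tab" else CONVENTIONAL_BS
    return table[base]


def bases_reading_c(method="tab"):
    """List the input bases that are still sequenced as C."""
    table = TAB_SEQ if method == "tab" else CONVENTIONAL_BS
    return sorted(base for base, call in table.items() if call == "C")


if __name__ == "__main__":
    print(bases_reading_c("bs"))   # 5-mC and 5-hmC are indistinguishable
    print(bases_reading_c("tab"))  # only 5-hmC remains as C
```

Under these rules a C call in TAB-seq points to 5-hmC alone, whereas a C call in conventional bisulfite sequencing is ambiguous between 5-mC and 5-hmC.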
TAB-seq is amenable to both whole-genome sequencing and locus-specific sequencing. This method has recently been used to produce genome-wide 5-hmC maps at base resolution in human and mouse ESCs26. Although we have not tested this method combined with reduced representation bisulfite sequencing (RRBS), we believe that the TAB method is compatible with RRBS35. In TAB-seq, the detection limit is governed by the conversion rate of 5-mC, the protection efficiency of 5-hmC, the abundance of 5-hmC at the modification site and the sequencing depth26. With the protocol described here, highly efficient conversion of 5-mC to T (above 96%) in genomic DNA can be achieved, with at least 90% of the 5-hmC protected from conversion. Thus, the sensitivity and specificity of 5-hmC detection by TAB-seq depend on sequencing depth. A cytosine base with less-abundant 5-hmC modification (i.e., < 5%) will require more sequencing depth than a base with a higher level of 5-hmC. With RRBS or locus-specific sequencing, a better sensitivity of 5-hmC detection may be achieved owing to higher sequencing depth at selected bases.
Affinity-based methods have been widely used to enrich 5-hmC–containing fragments and profile 5-hmC distribution in the genome. Two main strategies were developed previously. One is antibody based, wherein antibodies against 5-hmC12,36–38 or cytosine 5-methylenesulphonate, the product of 5-hmC after bisulfite treatment31, were used. The other is β-GT based and involves one of the following three approaches to achieve selective modification of 5-hmC for affinity purification: (i) an azide-modified glucose is transferred onto 5-hmC, followed by selective chemical labeling to attach a biotin tag11; (ii) a natural glucose is transferred, followed by periodate oxidation and biotinylation31; or (iii) a protein (JBP1) that specifically recognizes and binds to 5-gmC is used to enrich glucosylated 5-hmC32,33. However, affinity-based methods can neither detect 5-hmC at single-base resolution nor quantify its abundance at the modification site. In the antibody-based approach, recovery of hydroxymethylated fragments can be affected by the density of 5-hmC, especially in 5-hmC immunoprecipitation (hMeDIP), which uses antibodies that recognize 5-hmC31. The regions with high 5-hmC density may be overrepresented, whereas the regions with low 5-hmC density may be underrepresented.
Single-molecule real-time (SMRT) sequencing can identify modified bases on the basis of the different polymerase passing rates at and around the base. Although this technology is capable of detecting 5-mC and 5-hmC modifications directly, its application is limited by low sensitivity and low throughput39. In early 2012, we modified our previous 5-hmC labeling method and combined it with SMRT DNA sequencing40. With a larger kinetic signature, increased 5-hmC abundance and a reduced amount of DNA to sequence, SMRT sequencing can be applied to detect 5-hmC in genomic DNA at single-base resolution. However, the quantitative information about 5-hmC at each modification site is lost during enrichment, and the throughput of the method needs to be improved for the sequencing of large genomes.
Oxidative bisulfite sequencing (oxBS-seq), which can discriminate 5-mC from 5-hmC, was recently reported34. In this modified bisulfite-sequencing method, KRuO4 selectively oxidizes 5-hmC to 5-fC at high efficiency, followed by conversion to T in subsequent bisulfite treatment and PCR amplification. A comparison of the results of oxBS-seq with those of standard bisulfite sequencing allows for the quantitative sequencing of both 5-mC and 5-hmC at single-base resolution. Application of oxBS-seq requires multiple rounds of bisulfite treatment to fully deaminate 5-fC, and chemical oxidation may cause extensive oxidative DNA damage. Compared with oxBS-seq, TAB-seq gives direct reads of 5-hmC, and the treatment procedure incurs less DNA damage34. However, TAB-seq requires highly active Tet protein for efficient conversion of 5-mC to 5-caC.
As incomplete oxidation of 5-mC will result in false-positive 5-hmC signals, the availability of highly active mTet1 protein is crucial to TAB-seq. The expression and purification procedures of recombinant mTet1 are described in Box 1. To ensure high activity of the recombinant mTet1, all purification steps must be performed at 4 °C or on ice. Aliquots of mTet1 protein are stored at −80 °C before use. Improper storage or more than two freeze-thaw cycles may result in decreased oxidation activity. The protocol to test the activity of purified mTet1 is described in Box 2; it is strongly recommended to carry out the activity test on each batch of newly purified recombinant mTet1 before applying it to TAB-seq.
For both locus-specific and whole-genome sequencing, two key parameters exist for an accurate estimation of 5-hmC abundance besides the conversion rate of unmodified cytosine to uracil: the oxidation efficiency of 5-mC to 5-caC and the protection efficiency of 5-hmC. Although nonprotected 5-hmC can result in an underestimation of 5-hmC abundance, nonconversion of unmodified C and 5-mC will result in false-positive 5-hmC signals, and should therefore be determined in each experiment. With sufficiently high C and mC conversion rates and 5-hmC protection rates, the abundance of 5-hmC can be quantified from the frequency with which C is read compared with T at a given genomic position in any sequencing experiment. To assess the conversion rates in genomic DNA, controls containing 5-mC and 5-hmC need to be spiked in before treatment. Such controls should be of sufficient complexity and contain modified cytosines in various sequence contexts (i.e., multiple CpGs throughout). For genome-wide sequencing, spike-in DNA should span at least 1 kb of the sequence, such that subsequent random fragmentation by sonication and sequencing can distinguish PCR duplicates. Furthermore, after bisulfite conversion, spike-in DNA should not align to a bisulfite-converted target genome. In practice, we find that DNA from the lambda phage and the pUC19 plasmid work well as spike-ins for mouse and human samples.
For the 5-mC control, DNA can either be selectively methylated at CpG sites using CpG methyltransferase or amplified with 5-mdCTP. However, the CpG-methylated control is recommended, as the frequent neighboring 5-mC generated by PCR may lead to the underestimation of the oxidation efficiency.
For the 5-hmC control, there is no enzyme that can selectively generate 100% 5-hmC from C or 5-mC; therefore, besides synthesizing long oligonucleotides containing 5-hmC at required positions, the easiest and most cost-effective way to generate DNA longer than 1 kb with multiple 5-hmC sites may be through PCR amplification with 5-hydroxymethyl dCTP (5-hmdCTP). With this method, each C position is supposed to be 100% 5-hmC. However, we have found that many commercial 5-hmdCTPs contain contaminating dCTP, which will result in the underestimation of protection efficiency because unmodified cytosine will display as 'T' in TAB-seq (Fig. 2). Purifying the commercial 5-hmdCTP with HPLC offers one solution to this problem and takes about 2 d (Box 3). However, it should be noted that it may be difficult to amplify fragments larger than 2 kb with purified 5-hmdCTP. Alternatively, if experiments are run alongside conventional bisulfite treatments, dCTP contamination can be adjusted for by direct measurement of bisulfite-converted cytosine in the 5-hmC spike-in control. Typically, > 90% protection efficiency can be achieved after taking contamination into account.
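One way to read the contamination adjustment described above is sketched below. It assumes that, in the conventional bisulfite run, 5-hmC reads as C and contaminating unmodified C reads as T, so the C fraction of the spike-in in that run estimates the fraction of positions that truly carry 5-hmC. The function names and the read counts in the example are hypothetical.

```python
def c_fraction(c_reads, t_reads):
    """Fraction of C calls among C + T calls at spike-in cytosine positions."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no C or T calls to evaluate")
    return c_reads / total


def protection_efficiency(tab_c, tab_t, bs_c=None, bs_t=None):
    """Estimate 5-hmC protection efficiency on the 5-hmC spike-in control.

    Without conventional bisulfite counts, the raw TAB-seq C fraction is
    returned. With them, the raw value is divided by the C fraction seen in
    conventional bisulfite sequencing, which (by assumption) equals the
    fraction of positions that are genuinely 5-hmC rather than dCTP-derived C.
    """
    raw = c_fraction(tab_c, tab_t)
    if bs_c is None or bs_t is None:
        return raw
    true_hmc_fraction = c_fraction(bs_c, bs_t)
    if true_hmc_fraction == 0:
        raise ValueError("spike-in shows no 5-hmC in the bisulfite control")
    return raw / true_hmc_fraction


if __name__ == "__main__":
    # Hypothetical counts: 85% C in TAB-seq, 92% C in conventional bisulfite.
    print(round(protection_efficiency(850, 150), 3))
    print(round(protection_efficiency(850, 150, 920, 80), 3))
```

The adjusted value is always at least as large as the raw value, which matches the direction of the bias described in the text: dCTP contamination makes protection look worse than it is.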

In this protocol, we use methylated λ-DNA as C and 5-mC control in both genome-wide and locus-specific sequencing. For the 5-hmC control, a PCR product of 1.64 kb from a pUC19 vector (5-hmC control 1; generated with 5-hmC_F_1 and 5-hmC_R_1 primers) is used for the estimation of 5-hmC protection efficiency only in genome-wide sequencing. The 290-bp control (5-hmC control 2; generated with 5-hmC_F_2 and 5-hmC_R_2 primers), which is relatively easier to amplify and clone after bisulfite treatment, is used for the verification of 5-hmC protection efficiency after β-GT and mTet1 treatment with TOPO cloning. In both 5-hmC spike-in controls, all Cs are 5-hmC except for those in the primer sequence.
Verification of 5-mC conversion and 5-hmC protection on spike-in controls with TOPO cloning is strongly recommended after β-GT and mTet1 treatment and before proceeding with large-scale sequencing. However, the procedure may be simplified if no quantitative conversion or protection rate is required for locus-specific sequencing.
Data analysis for TAB-seq is nearly identical to that for traditional bisulfite sequencing. At each genomic locus, the estimated abundance of 5-hmC (A5-hmC) is measured as the number of cytosine base calls divided by the total (C + T) sequencing depth at the locus. For genome-wide analysis, only good-quality base calls (Phred score of 20 or greater) are considered. To correct for the 5-hmC protection rate (r5-hmC) being less than 100%, the absolute abundance of 5-hmC (E5-hmC) is calculated as E5-hmC = A5-hmC / r5-hmC.
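The calculation above can be sketched in a few lines. This is an illustrative example rather than the analysis pipeline used in the study: the function name and input layout (a list of base call and Phred score pairs for one cytosine position) are hypothetical, and capping the corrected value at 1.0 is an added assumption to keep the estimate within a valid range.

```python
def estimate_5hmc(calls, protection_rate, min_phred=20):
    """Estimate 5-hmC abundance at one cytosine position.

    calls           -- iterable of (base, phred) tuples covering the position
    protection_rate -- 5-hmC protection rate measured on the spike-in (0-1]
    min_phred       -- minimum Phred score for a base call to be counted

    Returns (observed, corrected), or None if no usable calls remain.
    """
    if not 0 < protection_rate <= 1:
        raise ValueError("protection_rate must be in (0, 1]")

    # Keep only good-quality C and T calls.
    c = sum(1 for base, q in calls if base == "C" and q >= min_phred)
    t = sum(1 for base, q in calls if base == "T" and q >= min_phred)
    depth = c + t
    if depth == 0:
        return None

    observed = c / depth                               # A = C / (C + T)
    corrected = min(observed / protection_rate, 1.0)   # E = A / r, capped
    return observed, corrected


if __name__ == "__main__":
    # Hypothetical pile-up: 2 good C calls, 8 good T calls, 1 low-quality C.
    pileup = [("C", 30)] * 2 + [("T", 35)] * 8 + [("C", 10)]
    print(estimate_5hmc(pileup, protection_rate=0.9))
```

In this example the low-quality C call is discarded, so the observed abundance is 2/10 and the corrected abundance is 0.2/0.9.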
The resolution of TAB-seq to detect 5-hmC is crucially dependent on sequencing depth. For example, the median abundance of 5-hmC at 5-hmC sites in H1 ESCs is just under 20%, and detecting base-resolution 5-hmC at this level requires sequencing at a depth of ~25 times per cytosine, or ~50 times the haploid genome size. For cells with higher levels of 5-hmC, less sequencing is required: we estimate that a depth of ~15 times per cytosine is sufficient to detect base-resolution 5-hmC with an abundance of 30%. If base-resolution precision is not necessary, much less sequencing is required. We recommend sequencing reads with a length of at least 100 bp. Please see genome-wide bisulfite sequencing methods for additional details26,41.
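The dependence on depth can be explored with a simple binomial model. The sketch below is an assumption-laden illustration, not the statistical procedure used for the estimates quoted above: it treats C calls at an unhydroxymethylated site as arising only from non-converted bases at a fixed rate, treats the probability of a C call at a genuine site as abundance times protection rate, and uses hypothetical default values (4% non-conversion, 90% protection, significance level 0.01).

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_c_reads(depth, nonconversion=0.04, alpha=0.01):
    """Smallest number of C calls that is unlikely (<= alpha) under the null.

    Returns depth + 1 if no count at this depth reaches significance.
    """
    for k in range(depth + 2):
        if binom_sf(k, depth, nonconversion) <= alpha:
            return k


def detection_power(depth, abundance, protection=0.9,
                    nonconversion=0.04, alpha=0.01):
    """Chance of calling a site with the given true 5-hmC abundance."""
    k = min_c_reads(depth, nonconversion, alpha)
    if k > depth:
        return 0.0
    return binom_sf(k, depth, abundance * protection)


if __name__ == "__main__":
    for depth in (10, 25, 50, 100):
        print(depth, min_c_reads(depth), round(detection_power(depth, 0.2), 3))
```

Under this model, deeper coverage raises the chance of detecting a site of a given abundance, and a lower non-conversion rate lowers the number of C calls needed, which is why the 5-mC oxidation rate and the sequencing depth together set the detection limit.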
The entire workflow for the method is shown in Figure 3. The details of all oligonucleotides used in this protocol are listed in Table 1. Supplementary Note 1 shows the sequence of the 5-hmC spike-in controls.
Add 50 ml of FBS and 5 ml of penicillin-streptomycin to 500 ml of supplemented Grace's insect medium.
The medium should be freshly prepared before use.
Mix 20 mM Tris-HCl (pH 8.0), 500 mM NaCl, 1 mM TCEP, 1 mM PMSF, 1 μg ml−1 leupeptin and 1 μg ml−1 pepstatin. TCEP, PMSF, leupeptin and pepstatin should be added immediately before use. The cell lysis buffer without TCEP, PMSF, leupeptin or pepstatin can be stored for 6 months at 4 °C.
Mix 20 mM HEPES (pH 8.0), 150 mM NaCl and 1 mM DTT. DTT should be added immediately before use. The GF running buffer without DTT can be stored for 6 months at 4 °C.
Mix 10 mM each of ultrapure dATP, dTTP, dGTP and 5-hmdCTP. The mix can be stored for 3 months at −20 °C.
Mix 500 mM HEPES (pH 8.0) and 250 mM MgCl2. This buffer can be stored for 1 year at room temperature (20–25 °C).
Add 14.7 mg of Fe(NH4)2(SO4)2·6H2O to 1 ml of H2O and then make a 24-fold dilution to a concentration of 1.5 mM.
▲ CRITICAL Tet oxidation reagent 1 can be stored at −80 °C for 2 months. Storage of Tet oxidation reagent 1 at room temperature can lead to reduced 5-mC oxidation efficiency.
Mix 333 mM NaCl, 167 mM HEPES (pH 8.0), 4 mM ATP, 8.3 mM DTT, 3.3 mM α-KG and 6.7 mM L-ascorbic acid.
▲ CRITICAL Tet oxidation reagent 2 can be stored at −80 °C for 2 months. Storage of Tet oxidation reagent 2 at room temperature will lead to reduced 5-mC oxidation efficiency.

| Component | Amount (μl) | Final concentration |
|---|---|---|
| Milli-Q water | 44 | |
| NEBuffer 2 (10×) | 10 | 1× |
| S-adenosylmethionine (32 mM) | 2 | 0.64 mM |
| Unmethylated λ-DNA (450 μg ml−1) | 40 | 180 ng μl−1 |
| SssI methylase (20 U μl−1) | 4 | 0.8 U μl−1 |
| Final volume | 100 | |

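
The final concentrations in reaction tables such as the one above follow from the usual dilution relation (stock concentration × volume added ÷ final volume). A small helper, written here as an illustrative sketch with a hypothetical function name, makes it easy to re-derive them when a reaction is scaled.

```python
def final_concentration(stock, volume_ul, total_ul):
    """Concentration after dilution; the result keeps the units of `stock`."""
    if total_ul <= 0 or volume_ul < 0:
        raise ValueError("volumes must be positive")
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    total = 100  # μl, final volume of the methylation reaction
    # S-adenosylmethionine: 32 mM stock, 2 μl added
    print(final_concentration(32, 2, total), "mM")
    # Unmethylated λ-DNA: 450 μg/ml (equal to 450 ng/μl) stock, 40 μl added
    print(final_concentration(450, 40, total), "ng/μl")
    # SssI methylase: 20 U/μl stock, 4 μl added
    print(final_concentration(20, 4, total), "U/μl")
```

These reproduce the 0.64 mM, 180 ng μl−1 and 0.8 U μl−1 values listed in the table.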

| Component | Tube 1 (μl) | Tube 2 (μl) | Final concentration |
|---|---|---|---|
| Reaction buffer (2×) | 25 | 25 | 1× |
| 5-hmC dNTP mix (10 mM each) | 1 | 1 | 200 μM each |
| 5-hmC_F_1 (10 μM) | 2 | — | 0.4 μM |
| 5-hmC_R_1 (10 μM) | 2 | — | 0.4 μM |
| 5-hmC_F_2 (10 μM) | — | 2 | 0.4 μM |
| 5-hmC_R_2 (10 μM) | — | 2 | 0.4 μM |
| pUC19 vector (1 ng μl−1) | 1 | 1 | 1 ng per 50 μl |
| ZymoTaq DNA Polymerase (5 U μl−1) | 0.4 | 0.4 | 2 U per 50 μl PCR |
| Milli-Q water | to 50 | to 50 | |
| Final volume | 50 | 50 | |


| Cycle number | Denature | Anneal | Extend |
|---|---|---|---|
| 1 | 95 °C, 10 min | | |
| 2–41 | 95 °C, 30 s | 57 °C, 30 s | 72 °C, 1.5 min for tube 1; 72 °C, 1 min for tube 2 |
| 42 | | | 72 °C, 7 min |


| Component | Amount | Final concentration |
|---|---|---|
| Sheared DNA from Step 11 | 1 or 3 μg | 50 or 150 ng μl−1 |
| UDP-glucose (10 mM) | 1 μl | 200 μM |
| β-GT protection buffer (10×) | 2 μl | 1× |
| T4-βGT (40 μM) | 0.5 μl | 1 μM |
| Milli-Q water | to 20 μl | |
| Final volume | 20 μl | |


| Component | Amount | Final concentration |
|---|---|---|
| Glucosylated DNA from Step 14 | 500 ng | 10 ng μl−1 |
| Tet oxidation reagent 1 | 3.5 μl | |
| Tet oxidation reagent 2 | 15 μl | |
| mTet1 protein (3 mg ml−1) | 5 μl | 0.3 μg μl−1 |
| Milli-Q water | to 50 μl | |
| Final volume | 50 μl | |


| Component | C and 5-mC test (μl) | 5-hmC test (μl) | Final concentration |
|---|---|---|---|
| PfuTurbo Cx reaction buffer (10×) | 5 | 5 | 1× |
| dNTP mix (25 mM each) | 0.4 | 0.4 | 200 μM each |
| 5-mC_test_F (10 μM) | 1 | — | 0.2 μM |
| 5-mC_test_R (10 μM) | 1 | — | 0.2 μM |
| 5-hmC_test_F (10 μM) | — | 1 | 0.2 μM |
| 5-hmC_test_R (10 μM) | — | 1 | 0.2 μM |
| Bisulfite-treated DNA (from Step 19) | 2 | 2 | |
| PfuTurbo Cx DNA polymerase (2.5 U μl−1) | 1 | 1 | 2.5 U per 50 μl PCR |
| Milli-Q water | to 50 | to 50 | |
| Final volume | 50 | 50 | |


| Cycle number | Denature | Anneal | Extend |
|---|---|---|---|
| 1 | 95 °C, 2 min | | |
| 2–41 | 95 °C, 30 s | 57 °C, 30 s for 5-mC test; 45 °C, 30 s for 5-hmC test | 72 °C, 1 min |
| 42 | | | 72 °C, 10 min |

■ PAUSE POINT The bisulfite-treated DNA can be stored at −20 °C for several weeks.


| Component | Amount (μl) | Final concentration |
|---|---|---|
| PfuTurbo Cx reaction buffer (10×) | 5 | 1× |
| dNTP mix (25 mM each) | 0.4 | 200 μM each |
| Forward primer (10 μM) | 1 | 0.2 μM |
| Reverse primer (10 μM) | 1 | 0.2 μM |
| Bisulfite-treated DNA (from Step 25A(i)) | 2 | |
| PfuTurbo Cx DNA polymerase (2.5 U μl−1) | 1 | 2.5 U per 50 μl PCR |
| Milli-Q water | to 50 | |
| Final volume | 50 | |


| Cycle number | Denature | Anneal | Extend |
|---|---|---|---|
| 1 | 95 °C, 2 min | | |
| 2–41 | 95 °C, 30 s | Primer dependent | 72 °C, 1 min |
| 42 | | | 72 °C, 10 min |

Troubleshooting advice can be found in Table 2.
For genome-wide sequencing, the 5-hmC protection rate on the spike-in control should be > 80% before adjusting for contaminants. The 5-mC oxidation rate (5-mC to 5-caC) on the spike-in control is the single most important parameter in TAB-seq, as it defines the lowest statistically obtainable abundance of 5-hmC. Because of the rarity of 5-hmC in most cell types, this value should not be < 95%.
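The two acceptance criteria above can be checked directly from spike-in read counts. The sketch below is illustrative: the function and key names are hypothetical, and it assumes that the counts are taken at CpG cytosines of the methylated λ-DNA (for oxidation) and at non-primer cytosines of the 5-hmC control (for protection, before any adjustment for dCTP contamination).

```python
MIN_PROTECTION = 0.80  # 5-hmC protection rate should be > 80%
MIN_OXIDATION = 0.95   # 5-mC oxidation rate should not be < 95%


def spike_in_qc(mc_c, mc_t, hmc_c, hmc_t):
    """Compute spike-in rates and compare them with the thresholds.

    mc_c, mc_t   -- C and T calls at 5-mC positions of the methylated control
    hmc_c, hmc_t -- C and T calls at 5-hmC positions of the 5-hmC control
    """
    if mc_c + mc_t == 0 or hmc_c + hmc_t == 0:
        raise ValueError("both spike-in controls need at least one call")

    oxidation = mc_t / (mc_c + mc_t)       # 5-mC oxidized to 5-caC reads as T
    protection = hmc_c / (hmc_c + hmc_t)   # protected 5-hmC reads as C
    return {
        "oxidation_rate": oxidation,
        "protection_rate": protection,
        "oxidation_ok": oxidation >= MIN_OXIDATION,
        "protection_ok": protection > MIN_PROTECTION,
    }


if __name__ == "__main__":
    # Hypothetical counts from a pilot run.
    print(spike_in_qc(mc_c=30, mc_t=970, hmc_c=860, hmc_t=140))
```

A run that fails either check is better repeated with fresh mTet1 or β-GT than carried forward to deep sequencing.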
For locus-specific sequencing, genuine 5-hmC will display partially as C (i.e., not all C sites will be modified) in Sanger sequencing results depending on the abundance of 5-hmC at that site. The relative abundance of 5-hmC at the modification site can be further estimated by TOPO cloning.

| Component | Amount | Final concentration |
|---|---|---|
| Genomic DNA of interest | 500 ng | 10 ng μl−1 |
| Methylated λ-DNA (from Step 3 of the main PROCEDURE) | 2.5 ng | |
| Tet oxidation reagent 1 | 3.5 μl | |
| Tet oxidation reagent 2 | 15 μl | |
| mTet1 protein (3 mg ml−1) | 5 μl | 0.3 μg μl−1 |
| Milli-Q water | to 50 μl | |
| Final volume | 50 μl | |


| Component | Amount (μl) | Final concentration |
|---|---|---|
| PfuTurbo Cx reaction buffer (10×) | 5 | 1× |
| dNTP mix (25 mM each) | 0.4 | 200 μM each |
| 5-mC_test_F (10 μM) | 1 | 0.2 μM |
| 5-mC_test_R (10 μM) | 1 | 0.2 μM |
| Bisulfite-treated DNA (from Step 5) | 2 | |
| PfuTurbo Cx DNA polymerase (2.5 U μl−1) | 1 | 2.5 U per 50 μl PCR |
| Milli-Q water | to 50 | |
| Final volume | 50 | |


| Cycle number | Denature | Anneal | Extend |
|---|---|---|---|
| 1 | 95 °C, 2 min | | |
| 2–41 | 95 °C, 30 s | 57 °C, 30 s | 72 °C, 1 min |
| 42 | | | 72 °C, 10 min |

This study was supported by the US National Institutes of Health (GM071440 and HG006827 to C.H., U01 ES017166 to B.R., NS079625 and HD073162 to P.J.), a Catalyst Award (to C.H.) from the Chicago Biomedical Consortium with support from the Searle Funds at The Chicago Community Trust, the Ludwig Institute for Cancer Research (to B.R.) and the Emory Genetics Discovery Fund (to P.J.).
Supplementary information is available in the online version of the paper.
AUTHOR CONTRIBUTIONS M.Y., C.-X.S. and C.H. conceived the original idea. M.Y., C.-X.S. and C.H. designed the experiment with help from B.R. and P.J.; M.Y. performed treatment of genomic DNA; M.Y., G.C.H. and K.E.S. performed locus-specific sequencing; and G.C.H. and K.E.S. performed genome-wide sequencing. M.Y. and C.H. drafted the manuscript, and all the authors participated in writing and editing the manuscript.
COMPETING FINANCIAL INTERESTS The authors declare competing financial interests: details are available in the online version of the paper.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.