|Home | About | Journals | Submit | Contact Us | Français|
The epigenetic changes during B-cell development relevant to both normal function and hematologic malignancy are incompletely understood. We examined DNA methylation and RNA expression status during early B-cell development by sorting multiple replicates of four separate stages of pre-B cells derived from normal human fetal bone marrow and applied high-dimension DNA methylation scanning and expression arrays. Features of promoter and gene body DNA methylation were strongly correlated with RNA expression in multipotent progenitors (MPPs) both in a static state and throughout differentiation. As MPPs commit to pre-B cells, a predominantly demethylating phenotype ensues, with 79% of the 2966 differentially methylated regions observed involving demethylation. Demethylation events were more often gene body associated rather than promoter associated; predominantly located outside of CpG islands; and closely associated with EBF1, E2F, PAX5 and other functional transcription factor (TF) sites related to B-cell development. Such demethylation events were accompanied by TF occupancy. After commitment, DNA methylation changes appeared to play a smaller role in B-cell development. We identified a distinct development-dependent demethylation signature which has gene expression regulatory properties for pre-B cells, and provide a catalog reference for the epigenetic changes that occur in pre-B-cell leukemia and other B-cell-related diseases.
B-lymphopoiesis is a highly coordinated process initiating from pluripotent hematopoietic stem cells (HSCs) and involves multiple developmental stages. Multipotent progenitors (MPPs) develop into common lymphoid progenitors (CLPs), B-progenitor (pro-B), B-precursor (pre-B), immature B-cell and mature B-cell stages, each of which is characterized by distinct biological features (1,2). The key events during early stages include commitment to B-lineage and suppression of non-B and stem cell components, which are gradually intensified as the developmental stages proceed through hierarchical lineage priming by different transcription factors (TFs) (3,4). It is now believed that an orchestrated network of TFs has a fundamental role in B-cell development (5,6). These TFs are shown to be indispensible in expressing functioning B-cell proteins and altering the genetic landscape, including immunoglobulin V(D)J recombination.
Recent evidence suggests that TF networks are closely related to DNA methylation, an important means of gene regulation (7,8). During tissue-specific development, DNA hypermethylation has been implicated in the stable silencing of stem cell-associated TFs, such as POU5F1 (also known as OCT4), SOX2 and NANOG, whereas DNA demethylation has been implicated in the activation of differentiation-associated TFs and their target genes (9–11). In the case of B-cell development, the key TF genes include SPI1 (PU.1), BCL11A, TCF3 (E2A), IKZF1 (Ikaros), EBF1, PAX5 and FOXO1, and their target genes include IL7R, RAG1, RAG2, VPREB1, IGLL1 (λ5), CD79A (mb-1), CD19, BLNK, IRF4 and many others (3–5). Although key B-cell regulators have generally been studied on a gene-by-gene basis in mouse models, genome-wide studies in human cells are urgently needed for comprehensive understanding of the transcriptional and epigenetic signatures of B-cell development.
In the present study, we used genome-wide arrays to analyse the dynamic DNA methylation and expression changes during B-cell development, including the use of isolated and purified MPPs, pre-B-I cells, pre-B-II cells and immature B-cells from bone marrow isolates. We provide a reference methylation and expression map as well as a catalog of key DNA methylation changes during B-cell development.
Human fetal bone marrow (FBM) was obtained from elective abortions at San Francisco General Hospital with the consent of the women undergoing the surgical procedures and with the approval of the University of California San Francisco’s Committee on Human Research. FBM was extracted as previously described (12) from specimens from eight foetuses ranging in age between 20 weeks and 24 weeks of gestation as estimated based on foot length. Four stages of B-cell precursors were isolated via flow cytometry sorting (Supplementary Figure S1). Early progenitors were isolated based on high levels of CD34 protein expression (CD34++) and a lack of expression of the B-cell marker CD19. This population, designated as stage 1 (S1), contains MPPs before lineage commitment (predominantly MPPs) but also containing CLPs and stem cells. B-cell-committed progenitors were isolated based on their expression of CD19 and CD34 (CD19+CD34+), which were predominantly pre-B-I cells and were designated as stage 2 (S2). Two immature B-cell populations were isolated that expressed CD19, but not CD34, and were differentiated based on surface IgM (sIgM) expression: stage 3 cells (S3) were predominantly pre-B-II cells that express sIgM−CD19+; stage 4 cells (S4) were predominantly immature B-cells that express sIgM+CD19+.
Genomic DNA was isolated from eight FBM specimens using the AllPrep DNA/RNA Mini Kit (Qiagen). For each specimen, both DNA and total RNA was extracted for four purified cell populations representing S1–S4 (Supplementary Figure S1), giving rise to a total of 32 isolations. DNA was modified by sodium bisulfite to convert unmethylated cytosines to uracil using the EZ DNA Methylation kit (Zymo Research, Orange, CA). The bisulfite-treated DNA of six specimens (24 samples) was amplified and hybridized onto the Illumina HumanMethylation450 Beadchip (HM450; Illumina, San Diego, CA) according to the manufacturer’s specifications. Raw data were processed using the GenomeStudio software (Illumina), and the average methylation beta values (β) ranging from 0 (unmethylated) to 1 (fully methylated) were retrieved for each probe. CpG sites with detection P-values >1.0 × 10−4 were removed from analysis. Methylation samples with a high proportion (>4%) of suboptimal data were eliminated from analysis. This resulted in the removal of two samples with poor data quality.
Total RNA was isolated from the four purified cell populations, S1–S4 (Supplementary Figure S1), from each of the eight specimens as described above. Genome-wide gene expression analysis was performed using the GeneChip Human Gene 1.0 ST Array (Affymetrix, Santa Clara, CA). For each sample, 1 μg of high-quality total RNA was amplified and labeled (NuGen Ovation) and hybridized onto the microarray according to the manufacturer’s specifications. Data were processed and normalized using the Expression Console software (Affymetrix) with the standard Robust Multi-array Average (RMA) algorithm (13). One sample was excluded because of poor data quality, resulting in a total of 31 samples from 4 developmental stages in 8 fetuses.
Chromatin immunoprecipitation (ChIP) was done as previously described (14) with modifications. Briefly, sorted S1 and S3 cells (see above) from normal bone marrow were crosslinked for 15 min at room temperature with 1% (wt/vol) formaldehyde (sigma, F8775), then lysed and sheared to 100–1000 bp in size with ChIP-IT Express Enzymatic Shearing Kit (Active Motif. Cat# 53035) according to manufacturer’s instruction. Sheared chromatin was immunoprecipitated with 10 μg anti-EBF1 antibody (Abnova, H00001879-D01P) or 10 μg anti-BCL11A antibody (abcam, Cat# ab19489) or non-specific IgG (Santa Cruz Biotechnology, Cat# sc-2025, performed as a negative control) using EZ-ChIP™ Chromatin Immunoprecipitation Kit (Millipore, Cat# 17-371) following the kit instruction manual. The agarose beads were washed and the bound chromatin was eluted. The ChIPed and input chromatin were incubated overnight at 65°C for reversal of crosslinking. Samples were then treated with RNase A and proteinase K and purified with a Spin Filter from EZ ChIP—a Chromatin Immunoprecipitation Kit (Millipore, Cat# 17-371). Chromatin was end repaired, ligated to a linker and amplified by linker-mediated PCR (LM-PCR) using GeneAmp High Fidelity PCR System (ABI, Cat# 4328212). Amplified ChIP and input DNA were purified with MinElute Reaction Cleanup Kit (Qiagen, Cat# 28204) and normalized to ~30 ng/µl.
Thirty nanograms DNA (pull-down and input DNA, normalized for DNA concentration) was analysed by SYBR green real-time PCR and the CFX384 Real-Time PCR System (BioRad, CA, USA). Primers were designed to four predicted EBF1 targets and four independent BCL11A targets (predicted from ENCODE data). Primer sequences and amplicon locations are available on request. A two-step PCR cycling method was used: Preheat at 95°C for 10 min, and repeat 40 times at 95°C for 15 s, 60°C for 1 min. We calculated the fold differences in eight candidate loci between ChIP DNA and the input DNA using the following expression: Fold Difference = 2(Cq,Input DNA-Cq,Ip DNA). The experiment was peformed in quadruplicate, and a separate experiment starting from chromatin was repeated once.
Statistical analysis and data visualization were carried out using the R statistical software with Bioconductor packages. Methylation or expression levels between two groups were compared using moderated t-statistics in a mixed-effect model to accommodate correlations between samples from the same donor (15). P-values were adjusted with the Benjamini–Hochberg correction (16). To investigate the association between differential expression and differential methylation, a robust linear regression model was used with the differential expression level (log2 fold changes) as the outcome and the differential methylation level (beta differences) as the predictor, adjusted for baseline expression and methylation levels at the preceding stage. Confidence intervals and P-values were derived for the regression coefficients using 2000 bootstrapped samples. Fisher’s exact test was used to compare the equality of two binomial probabilities. Enrichment for TF motifs was done using the Analysis of Motif Enrichment (AME) (17) and MEME Suite (18); sequences 100 bp (±50 bp) around significant CpGs were retrieved and analysed with core motifs from the TRANSFAC Professional (http://www.gene-regulation.com) database. E-values below 0.001 were regarded as significant. DNA sequences, positional information of the NCBI reference genes, the ENCODE chromatin immunoprecipitation sequencing (ChIP-Seq) data, and CpG island (CGI) annotations were retrieved from the UCSC database (http://genome.ucsc.edu/). Chip-Seq enrichment analysis was performed by examining whether differentially methylated regions (DMRs) are located within TF binding sites of the ENCODE Chip-Seq database and by comparing their frequencies with those of all CpG sites on HM450 (Fisher’s exact test). Pathway enrichment analysis was done using the Ingenuity Pathway Analysis software (Ingenuity Systems, Redwood City, CA).
The MPPs (S1) serve as the baseline DNA methylation levels before the cells are committed to any lineages. At S1, similar to a previous report (19), we observed lower methylation levels near the promoter region compared to the body of a gene (Supplementary Figures S2A and B). Interestingly, while CGIs were in general depleted of methylation, we found that CGIs located in the body region were more methylated and more variable than CGIs sites lying within promoters (Supplementary Figures S2B).
We next compared methylation levels in the S1 samples with the histone modification ChIP-seq data of the lymphoblastoid cell line GM12878 from the ENCODE project (20) as DNA methylation is believed to interact with histone modifications to regulate gene expression (7,8). There was a strong negative correlation between DNA methylation levels and the presence of histone marks that target actively transcribed genes. The median methylation value was low in H3K4Me1, H3K4Me3 and H3K79me2 peaks, which preferentially mark enhancers, promoters and transcriptional transition regions, respectively, whereas the value was high in repressive mark, H3K27me3. In addition, Polycomb target genes tended to be hypomethylated whereas TET1 target genes (derived from human homologs of mouse TET1 targets) tended to be semimethylated (Figure 1A) at S1. TET2 targets would be more appropriate to examine in hematopoietic cells but have not been described. Similar methylation profiles were also observed in the other stages (results not shown).
Although it has been well-established that methylation in the promoter region is involved in transcriptional silencing (21), little is known about how body methylation affects gene expression and whether promoter and body methylation regulate expression in concert. To examine this, we partitioned promoter and body methylation levels into hypo- (β ≤ 0.3), semi- (0.3 < β ≤ 0.7) and hypermethylated (β > 0.7) groups and compared gene expression levels according to these different combinations in Figure 1B. The most highly expressed group was that with both promoter and body hypomethylated whereas the lowest was with both regions hypermethylated, and the mean difference in gene expression between these two groups was 2.8-fold (P < 2.2 × 10−16, two-sample t-test). Interestingly, when the gene body region is devoid of CpG sites, the repressive nature of promoter methylation is at its most potent with a striking 3.6-fold difference in expression between the hypo- and hypermethylated promoter groups whereas the same difference is 2-fold in genes with body CpG sites (Figure 1B, right). We found many housekeeping genes such as GUK1, CENPB, RPS2, NME2 and UBE2M belong to this group of genes lacking body methylation sites. To formally model the relationship between expression and methylation, a robust linear regression model was fit to the expression levels with promoter and body methylation and their interactions as predictors, and the regression coefficients were −1.55 (promoter, P < 0.001 using 2000 bootstrapping), −0.46 (body, P < 0.001) and 0.05 (interaction, P = 0.36), respectively. This quantification contrasts the extent to which promoter and body methylation controls expression and confirms that promoter methylation exerts stronger control over transcription. But intriguingly, when promoter methylation is held constant, increasing body methylation is associated with more repressed expression, suggesting a fine-tuning mechanism of body methylation for transcription regulation.
After examining the static methylation and expression profiles of the lymphoid precursor (S1), we next sought to identify DMRs between any two subsequent stages (S1->S2, S2->S3 and S3->S4). We chose the CpG sites that had Benjamini–Hochberg-adjusted P-values ≤0.05 for the moderated t-test and β-difference (Δβ) ≥ 0.2. Although two distinct types of DMRs were observed during stage progression, those that had a loss of methylation, which we designate as de-DMRs, and those that obtained more methylation, which we designate as de novo DMRs, de-DMRs were conspicuously dominant during stage progression. A scatter plot between –log10 P and Δβ of any two sequential stages showed that significant CpG sites were skewed to negative Δβ values (Figure 2A). We next assembled the 2987 unique de-DMRs along the stage progression and designated them as the ‘core de-DMRs’; as expected, these loci were likely to be more methylated at S1 than de novo DMRs (Figure 2B). During S1->S2, de-DMRs (2330 CpG sites; 1039 genes) outnumbered de novo DMRs (636 CpG sites; 421 genes) by almost 4 to 1. DMRs between S2->S3, and S3->S4 are exclusively de-DMRs, 828 and 139 CpG sites (455 and 81 genes), respectively (Figure 2C). Not only did de-DMRs far outnumber de novo DMRs loci, there was also a significant overlap between these two sets from S1->S2 (P < 2.2 × 10−16), which suggests that demethylation plays a more significant role during pre-B-cell development but de novo methylation likely plays only a secondary and accessory role.
We observed only a small number of DMRs between S3 and S4, suggesting that these two stages are very similar in terms of their methylation profiles. A hierarchical clustering of the 22 samples using the 2987 core de-DMRs perfectly classified each sample into its relevant stage, with S3 and S4 samples closely assigned into the same sub-tree (Figure 2D). Interestingly, both the clustering heatmap and the boxplots indicate that loss of methylation is a continuous event during pre-B-cell development and once a CpG site incurs a loss of methylation, it remains or becomes even more demethylated throughout subsequent stages (Figure 2D).
The core de-DMRs in B-cell development predominantly occurred in the CpG sites that are located in the gene body regions and outside of the CGIs, and depleted in promoter regions within the CGIs (Figure 2E). Among the core de-DMRs, 23% and 42% map to the promoter and body regions of a gene, respectively; a 1.8-fold preference for the body regions, in contrast to the background frequencies by all the CpG sites on the HM450 array, which have roughly equal amount of promoter or body CpG sites (36% and 34%, respectively). Interestingly, only 35 among the >1000 genes demethylated from S1 to S2 acquired both promoter and gene body changes, indicating that methylation status in these two regions were altered by independent processes. The partiality toward CpG sites outside of CGIs is even more striking with only 5% of the core de-DMRs located inside of the CGIs as opposed to 31% background frequency on the array (Figure 2E).
Using Chip-Seq enrichment data from the ENCODE project which combines data from 95 different cell lines and 152 transcription-factor (TF) targeting antibodies, DMRs were frequently observed in locations associated with specific TFs; with specific TF enrichment being stage dependent. From S1 to S2, EBF binding sites are very significantly enriched for de-DMRs (fold enrichment = 10.94; Fisher’s exact test P < 2.2 × 10−16, Table 1), supporting its principal role in B-cell development. On the other hand, de novo DMRs were significantly enriched for CEBPB, c-Jun and p300. From S2 to S3, de-DMRs were preferentially enriched with IRF4 and MEF2A binding sequences (Table 1). This specific enrichment of TF binding sites in B-cell development DMRs was further confirmed by motif enrichment analysis using AME and the TRANSFAC motifs. From S1 to S2, EBF-related motifs, V$EBF_Q6 and V$OLF1_01, were the most significantly associated with de-DMRs followed by other ETS and E2A motifs, whereas de novo DMRs were significantly enriched for PU.1, SPI1, ELF1 and SPIB motifs (Supplementary Table S1). To test whether the demethylation events at TF binding sites was functionally significant, we performed ChIP-PCR analysis with two of the most significant TFs, EBF1 and BCL11A, comparing multipotential cells (S1) to committed pre-B cells (S3). ChIP-PCR analysis revealed that predicted targets were not occupied in S1 cells but strongly and specifically occupied for each TF during S3 (Figure 2F).
We next analysed the concomitant RNA expression changes within the same samples in order to examine RNA expression changes associated with pre-B-cell development, and moreover, to distil functional methylation changes that had transcriptional consequences. To nominate differentially expressed genes (DEGs) between any two ensuing stages, we used the thresholds of adjusted P-values ≤0.05 (Benjamini–Hochberg adjustment) for moderated t-test and fold-change ≥ 1.5, which resulted in a total of 3913 DEGs. Unlike DNA methylation that was predominantly one-directional with hypomethylation being the driving force, RNA expression was a bi-directional process with similar numbers of up- and down-regulated genes in total. There was, however, a stage dependent preference toward up-regulation or down-regulation. During transition from S1 to S2, up-regulated DEGs were about two times more prevalent than down-regulated ones (1996 versus 675 genes); this together with the concomitant predominant demethylation leading to S2 suggests a possible role of methylation-regulated transcriptional changes during the initial B-cell lineage commitment. In contrast, during S3–S4 progression, down-regulated genes significantly outnumbered up-regulated genes (949 versus 364 genes; Figure 3A), this in combination with just 81 genes (corresponding to 139 DMRs) that are demethylated from S3 suggests that DNA methylation might not play a major role in regulating gene expression leading to the progression to immature B-cells. As expected, hierarchical clustering using these 3913 DEGs accurately classified the 31 samples into 4 different stages, and S2 and S3 were the most similar in expression profiles (Figure 3B). The putative B-cell-related TF and surface antigen genes such as TCF3, IKZF1, EBF1, PAX5, CD19, CD79A and BLNK were up-regulated throughout all stages. Genes involved in immunoglobulin rearrangement or functioning as temporary surrogate markers, such as RAG1, RAG2, VPREB1 and IGLL1, were activated during S2 and S3 and down-regulated again at S4. The early progenitor genes, SOX2 and SPI1, were gradually down-regulated as well as non-B lineage genes such as GFI1, GATA1 and NOTCH3. DNA methyltransferase genes including DNMT1, DNMT3A and DNMT3B were progressively down-regulated (Supplementary Figure S3).
Pathway enrichment analysis on the up-regulated and down-regulated genes between the three stage progressions (S1->S2, S2->S3 and S3->S4) identified significant pathways specifically in up-regulated genes between S1 and S2, and S2 and S3, and down-regulated genes between S3 and S4. From S1 to S2, antigen presentation and B-cell receptor (BCR) pathways associated with B-cell development were enriched in up-regulated genes, as well as polo-like kinase (PLK) and G2/M checkpoint pathways-associated cell proliferation. Nuclear factor of activated T-cells (NFAT), hepatocyte growth factor (HGF), interleukin-3, protein kinase A, estrogen receptor, glucocorticoid receptor signaling pathways and protein ubiquitination pathways were also modestly enriched during these stages. From S2 to S3, Phosphoinositide 3-kinase (PI3K) and interferon (mediated by JAK1, JAK2 and TYK2) signaling pathways were specifically enriched in up-regulated genes. During S3->S4 progression, many genes linked to cell cycle, cell division and mismatch repair were down-regulated. The associated signaling pathways include BRCA1 (most significant, P = 1.6 × 10−23), ataxia telangiectasia mutated (ATM), checkpoint kinase (CHK), PLK, cyclin and purine/pyrimidine pathways (Supplementary Table S2).
To investigate whether methylation alterations affect transcription, we examined the expression changes of genes with altered methylation levels during stage progression. In most cases, methylation changes did not lead to discernable expression differences in associated genes. 12.4% of the genes with one or more DMRs from S1 to S2 resulted in corresponding transcriptional alterations, which represented a highly significant overlap (Fisher’s exact P = 1.9 × 10−6). Similarly, from S2 to S3, 10% of genes with ≥1 DMRs had concurrent methylation and expression changes (Fisher’s exact P < 2.2 × 10−16), whereas the corresponding number from S3 to S4 was only 2.6% (P = 0.77). Looking among the genes that incurred stage-wise expression changes, the percentages that were accompanied with methylation changes were 10.1, 11.9 and 0.25% for stage progression from S1 to S2, from S2 to S3 and from S3 to S4, respectively. This suggests that methylation plays a pivotal role in B-cell lineage commitment, but mechanisms other than methylation also play an important role, especially toward progression into immature B-cells. We listed the 224, 60 and 7 genes in that had undergone concurrent methylation and expression changes from S1 to S2, from S2 to S3 and from S3 to S4, respectively, in Supplementary Table S3.
Given the enrichment of DMRs detected at intragenic regions outside of CGIs, we classified all stage-wise DMRs into different regulatory subgroups according to their relative locations in the gene (Promoter, Body) and to CGIs (non-Island, Island, Shelf/Shore) and examined the relationship between differential methylation and differential expression within each subgroup. If there were multiple DMRs mapped to the same gene and same regulatory subgroup, we selected the DMRs with the largest absolute methylation changes to represent the unit. In a robust linear regression analysis using the expression changes as the outcome and the methylation changes as the predictor, adjusted for baseline expression and methylation levels at the preceding stage, DMRs in all regions, except for body/CGI, showed an inverse correlation with expression changes. Interestingly, the strongest association was found in the group of promoter/non-CGI (coefficient = −0.61, P = 0.001, n = 350); a Δβ of 0.75 in this region was estimated to correspond to about a 1.37-fold change of expression levels. Even though the majority of the methylation changes (810 genes) were detected in body/non-CGI regions, their effects on expression were much weaker (coefficient = −0.21, P = 0.002). DMRs in body/CGI were small in number but intriguingly showed the only positive correlation with expression (coefficient = 0.11, P = 0.32, n = 87; Figure 4A). In a subset of CpGs and genes, we observed delayed or lagged expression changes in later stages after methylation changes in prior stages. A total of 48 de-DMRs from S1 to S2 were associated with up-regulated DEGs from S2 to S3, and 56 de-DMRs from S1 to S2 were so with those from S3 to S4 (Supplementary Table S4). Characteristic CpGs include those near the CD19 gene, which are substantially demethylated from S1 to S2, with accompanying substantial expression changes in later stages (Supplementary Figure S4).
Among the 350 genes that had one or more DMRs detected in the promoter/non-CGI group, 25% (89) genes had gene expression changes of at least 1.5-fold. Focusing on these 89 genes, we observed that 65 of them (74%) had methylation and expression changes occurring in inverse correlation (Figure 4B). Methylation and gene expression changes of this 65-gene set are listed in Supplementary Table S5. Examining the CpG dinucleotide density of these 65 genes, we found that a surprisingly high percentage (75.8%) do not contain any CGI regions in their entire genic region (including promoter region), furthermore, 86.4% had no CGI at their promoter region (Supplementary Figure S5A). In contrast, only 27.1% of the transcripts on HM450 do not contain any overlapping CGIs in their genic region, and only 34.1% have non-CGI promoters (Supplementary Figure S5B). While the role of CGIs located in the promoter regions in transcriptional silencing is well-recognized, little is known about the regulation of the genes whose promoters contain no CGIs, our results show that demethylation at low-density CpG sites in the promoter regions has a functional significance in the establishment of pre-B-cell lineage.
Specific histone modifications have been previously shown to be associated with active CGI promoters (22). To characterize the chromatin structure associated with these 65 genes, we examined the histone modification ChIP-seq data of the lymphoblastoid cell line GM12878 from the ENCODE project. A large majority, 75% and 66%, of these 65 genes have their non-CGI promoter regions marked with active histone modifications H3K4me1 and H3K4me3, respectively. Interestingly, only 37% and 28% of all the non-CGI promoter regions on HM450 are marked with these two active histone modifications. The percentages are notably higher, at 50% and 85%, for CGI promoter regions (Supplementary Figure S6). Our results indicate that the non-CGI promoters associated with expression changes display a histone modification profile more similar to that of CGI promoters than that of the average non-CGI promoters.
Although some trends were noted in methylation and expression changes, the situation differed gene by gene. A critical gene for pre-B-cell formation is CD19, whose promoter region was demethylated during gene activation. These regions and DMRs exactly coincided with EBF binding sites (Supplementary Figure S7A). A second example is TCF3 (E2A), which has two distinct TSSs. The upstream TSS (TSS1; encoding isoform E12) was hypomethylated at S1 and its methylation level remained unchanged during B-cell development whereas regions around the second TSS encoding isoform E47 underwent demethylation (Supplementary Figure S7B). This suggests a role of methylation changes in isoform regulation. In the case of PAX5, its TSS overlaps with CGIs and these regions were hypomethylated at S1. During gene activation, the CGI region in the middle of intron 5 became hypermethylated (Supplementary Figure S7C). In a mouse study, the 5′-part of PAX5 intron 5 was proven to have enhancer elements that were affected by EBF1 binding and were demethylated during B-cell development (23). An example of down-regulated gene is NR1H3, a key regulator of macrophage function, whose CpG sites in the promoter region were subjected to de novo methylation whereas CpG sites located in the 3′-part of the gene body were demethylated (Supplementary Figure S7D).
In the present study, we compared the methylation profiles of early progenitor cells starting from a point prior to lineage commitment (predominantly MPPs and CLPs) through pre-B-I, pre-B-II and immature B-cells, and have comprehensively cataloged methylation changes associated with pre-B-cell development. The dense probe placement enabled us to examine methylation changes within and outside CGIs and in various locations within individual genes to gain insights on loco-regional epigenetic control of human early B-cell development. Our analysis created a reference set for those studies investigating malignancies originating from B-cell progenitors which are the most common cancer in children. In combination with genome-wide RNA expression profiling, we have identified a distinct development-dependent demethylation signature which has gene expression regulatory properties. Altered DNA methylation occurred predominantly within gene bodies outside of CGIs, but less frequently at annotated gene promoters or with CGIs.
The canonical view on epigenetic gene regulation is that DNA methylation in CGIs in gene promoters has a principle role for repressing gene expression, and the loss of methylation in this region is associated with gene activation (7,8). Recently, several groups have provided evidence linking gene body methylation and transcription, but outstanding questions remain (24). Our data suggest a complex mechanism of epigenetic gene regulation during pre-B-cell development via DNA methylation. We observed that DNA methylation changes more frequently occur at gene body and remote upstream regions than promoter regions during early B-cell development although those at promoters still possess the most potent effect on gene expression. It is notable that at S1, before complete lineage commitment, the promoters of the majority of the genes are hypomethylated (β < 0.3), especially those having CGIs in their promoters (Supplementary Figure S2A). Our data indicate that, during progression through pre-B-cell stages, these promoter regions maintained their hypomethylation status whereas alterations in methylation occurred specifically to the gene body regions of many genes. Many of the DMRs in the body regions may correspond to cryptic TSS sites or enhancer regions, which need to be suppressed or activated for efficient gene transcription (25). We also observed demethylation of remote upstream regions in some up-regulated genes such as NH1R3. In fact, there is evidence that enhancer methylation in some genes is functionally critical for gene regulation (26,27). This may be understood in line with a previous model that primary TFs prime their target gene promoters at earlier stage in association with cell fate decisions and the gene expression is ‘triggered’ with a time lag after accumulation of secondary reinforcing events (5,11). On the contrary, some genes had their promoter methylated at baseline and were activated almost simultaneously with promoter demethylation, suggesting an instant triggering role of these genes. We provided one example of this in the pertinent role of demethylation at low-density CpG (non-Island) sites in the promoter regions having the strongest functional impact in triggering expression changes (Figure 3A) in the establishment of pre-B-cell lineage. We also found that these non-CGI promoters display a histone modification profile that is more similar to CGI promoters than to the average non-CGI promoters.
Strikingly, only demethylation processes were noted from stages S2 to S4, although many genes were down-regulated during this period. Furthermore, there were only seven genes that were altered in expression and in methylation from stages S3 to S4 (Supplementary Table S3); these genes included a negative regulator of BCR signaling, BTLA (which was increased in expression in S4) and the adenosine deaminase ADARB1, a gene distinct from the hypermutation and switch recombinase gene AICDA or activation-induced cytidine deaminase (Supplementary Table S3C). DNA methyltransferases including DNMT1, DNMT3A and DNMT3B were down-regulated as the stages progressed, supporting the diminished role of methylation in later stages. Therefore, it may be postulated that mechanisms other than methylation have the primary role in down-regulating genes in later stages, which may include changes in TF activity from protein modification, alterations in TF interactions and combinatory co-binding properties, changes in histone modification, effects of cytokines and miRNAs, and altered RNA decay speeds from RNA interference (28–32).
We noted that the de novo and demethylation processes occurred in regions enriched for specific TF binding sites, and these were identified in silico data using motif finding algorithms (Supplementary Table S1) as well as in in vivo data using ChIP-Seq data from the ENCODE project (Table 1). From S1 to S2, EBF binding motifs were most significantly demethylated corresponding to its central role in B-cell development (3,4). This was followed by Ets, RUNX1, TCF3 and ELF5 motifs (in silico) and E2F family members and PAX5 (in vivo). The Ets family TFs consists of over 30 members sharing common DNA binding (ETS) domain and have significant but heterogeneous roles on hematopoiesis (33). Accordingly, de novo-methylated regions were also enriched for Ets family TFs. SPI1 (PU.1), an Ets family TF involved in the myeloid versus B-lymphoid lineage decision, were also enriched both in hyper- and hypomethylated regions in vivo suggesting its complex roles (3,5). The TCF3 (E2A) is putatively known to activate the B-cell lineage-specific gene program synergistically with EBF1 and its motifs were significantly enriched here (Table 1) (3,4,34). Gene families associated with hypermethylation include targets of the myeloid TFs CEBPB and P300 (Table 1), a marker of lineage commitment. The SPIB motif was also enriched in hypermethylated regions; the molecule is specific to plasmacytoid dendritic cells and HSCs (35) suggesting that its repression is associated with binding site hypermethylation.
Using expression analysis, we found that many pathway-related genes were up-regulated from S1 to S2 (Supplementary Table S2). As expected, antigen presenting, BCR and B-cell developmental pathways were most significantly enriched. The PI3K pathway was suggested to be crucial for BCR/pre-BCR signaling (36), and protein ubiquitination was recently highlighted in NFkB signaling (37). The HGF β-chain was found to form a pre-pro-B-cell growth-stimulating factor with interleukin-7 (38), but may need further exploration. The role of NFAT family, HGF, IL-3, protein kinase A and glucocorticoid and estrogen receptor pathways were somewhat redundant although some were previously implicated in B-cell development (39–44). From S3 to S4, a number of genes related to cell division and mismatch repair were down-regulated with BRCA1-related pathway being most significant. The ATM, CHK, PLK and cyclin pathways are also related to cell cycle control and DNA repair. This may be due to the reduced need for cell division but may also be related to preparations for ‘hypermutation’ in mature and activated B-cells. Interestingly, a study using Brca−/− cell line found increased somatic hypermutation in the immunoglobulin genes (45).
Recently a global DNA demethylation signature was discovered in mouse erythropoiesis (46). This demethylation is associated with DNA replication and occurs at all DNA elements. This process is in contrast with the demethylation signature that Calvanese et al. (11) observe in non-erythropoietic cells, which are specific and targeted demethylation events with functional relevance, e.g. Figure 2F. In contrast to erythrocytes, a functional and specific DNA methylation pattern is critical for differentiated cell function in B-cells which remain nucleated and many of which are long-lived.
In summary, we have investigated DNA methylation changes in early human B-cell development in association with expression changes. DNA methylation changes were associated with profound effects on gene expression during early lineage commitment, especially DNA methylation changes in regions other than promoters. The changes were non-randomly located in terms of CGIs, alternative TSSs and TF binding sites. The impact of DNA methylation on gene regulation was reduced in later stages of B-cell development, suggesting that mechanisms other than DNA methylation may have a principal role after lineage commitment.
Supplementary Data are available at NAR Online: Supplementary Tables 1–5 and Supplementary Figures 1–7.
National Institutes of Health [NIEHS/EPA P01ES018172 to J.L.W. and P.B., and NIDDK P01DK088760 to M.O.M.], [NCI R01CA155461 to J.L.W and X.M.]; Tobacco-Related Disease Research Program [18CA-0127 to J.L.W.]; Leukemia and Lymphoma Society [6026-10 to J.L.W.]; and J.L.W. and P.B. funded the study. Funding for open access charge: NIH.
Conflict of interest statement. None declared.
We thank the staff and faculty at San Francisco General Hospital Women’s Options Center for assistance in the collection of human fetal tissues. We thank Margaret Wrensch for helpful discussions and comments. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding bodies. S.-T.L. and Y.X. performed all biostatistical analyses, helped direct the analysis and drafted the manuscript. M.O.M. and M.E.F. conducted the cell isolation and sorting, and helped to draft the manuscript. J.X. performed nucleic acid isolations and DNA bisulfite modifications, and helped to draft the manuscript. J.K.W., S.Z., X.D., A.dS., A.C., P.B. and X.M. helped to analyse data, and helped to draft the manuscript. J.L.W. provided overall direction and management, designed the study and drafted the manuscript.