The cohesin complex has recently been shown to be a key regulator of eukaryotic gene expression, although the mechanisms by which it exerts its effects are poorly understood. We have undertaken a genome-wide analysis of DNA methylation in cohesin-deficient cell lines from probands with Cornelia de Lange syndrome (CdLS). Heterozygous mutations in NIPBL, SMC1A and SMC3 genes account for ~65% of individuals with CdLS. SMC1A and SMC3 are subunits of the cohesin complex that controls sister chromatid cohesion, whereas NIPBL facilitates cohesin loading and unloading. We have examined the methylation status of 27 578 CpG dinucleotides in 72 CdLS and control samples. We have documented the DNA methylation pattern in human lymphoblastoid cell lines (LCLs) as well as identified specific differential DNA methylation in CdLS. Subgroups of CdLS probands and controls can be classified using selected CpG loci. The X chromosome was also found to have a unique DNA methylation pattern in CdLS. Cohesin preferentially binds to hypo-methylated DNA in control LCLs, whereas the differential DNA methylation alters cohesin binding in CdLS. Our results suggest that in addition to DNA methylation multiple mechanisms may be involved in transcriptional regulation in human cells and in the resultant gene misexpression in CdLS.
Vertebrate gene expression is tightly regulated at several levels which are mechanistically linked to each other (1). DNA methylation, histone modification and chromatin remodeling are the most well-recognized and closely interwoven epigenetic events (2). Epigenetic regulation during development may occur very early during embryogenesis, driving the formation of different organ systems (3). DNA methylation is maintained by methyltransferase DNMT1 in both mitosis and meiosis, and is considered the most stable epigenetic mark (4).
In mammals, DNA methylation predominantly occurs at CpG dinucleotides by covalent addition of a methyl group to position 5 of the cytosine ring, creating 5-methylcytosine. CpG dinucleotides are underrepresented in the human genome with a frequency of 2–5% as compared to the GC content (5). CpG dinucleotides are not equally distributed throughout the human genome; instead, they occur in clusters of large repetitive sequences [such as ribosomal DNA (rDNA), satellite sequences or centromeric repeats] or in short CG-rich DNA stretches, known as CpG islands (CGIs) (6). CpG islands are present in the promoter and exonic regions of ~40–70% of mammalian genes and are usually unmethylated (7,8). By contrast, other regions of the mammalian genome contain fewer CpG dinucleotides, and the majority (75%) of these sparsely located CpG dinucleotides are methylated (9). A large number of experiments have shown that methylation of promoter CpG islands plays an important role in gene expression, genomic imprinting, X-chromosome inactivation, genomic instability, embryonic development and carcinogenesis (10,11).
Four DNA methyltransferases (DNMTs) (DNMT1, DNMT2, DNMT3A and DNMT3B) and one DNMT-related protein (DNMT3L) have been identified (12). DNMT1 acts as a maintenance methyltransferase, whereas DNMT2 may be an RNA methyltransferase, and DNMT3A and DNMT3B are de novo methyltransferases targeting unmethylated DNA. All DNMTs are essential for embryonic viability, with homozygous mutant mice dying early in development (13). Methyl-CpG-binding proteins (MBD1–4 and MeCP2) recognize and bind to methylated DNA. They recruit transcriptional corepressors such as histone-deacetylating complexes and polycomb group (PcG) proteins, associate with chromatin-remodeling complexes and attract chromodomain-binding proteins (13).
DNA methylation and chromatin structure are strikingly altered in many pathological situations, particularly in cancers and various mental retardation syndromes. Altered levels of the methyl donor folate and homocysteine have been repeatedly linked to these disorders. Disease-associated changes in epigenetic modifications can be classified into changes in genes that are epigenetically regulated and in genes that are part of the molecular machinery establishing and propagating the epigenetic modifications through the development and cell divisions. Aberrant methylation patterns have been reported in various neurodevelopmental disorders, including X-linked α-thalassemia and mental retardation (ATRX), Fragile X, and immune deficiency, centromeric instability and facial abnormalities (ICF) (14). Interestingly, the ATRX gene is misexpressed in CdLS [+1.2, false discovery rate (FDR) = 0.07] and the disorder ATRX presents defective sister chromatid cohesion and is considered as one of the ‘cohesinopathies’ (15).
Cornelia de Lange syndrome (CdLS, OMIM#122470, 300590, 610759) is the first identified ‘cohesinopathy’, a heterogeneous, dominantly inherited developmental disorder with multiple-organ system involvement (16–19). The majority of CdLS probands were found to have heterozygous mutations in the NIPBL gene, whereas a small percentage have mutations in the SMC1A and SMC3 genes. SMC1A and SMC3 are core components of the cohesin complex, which controls sister chromatid cohesion during S phase, while NIPBL facilitates cohesin loading and unloading (20,21). In addition to their canonical role in regulating sister chromatid cohesion, cohesin and NIPBL have also been implicated as key regulators of gene expression over long distances (22,23), and have been shown to preferentially associate with actively transcribed genes and colocalize with RNA polymerase II in Drosophila (24). In humans, cohesin colocalizes with CTCF and regulates gene expression at the imprinted IGF2/H19 locus (25). Moreover, we recently performed a genome-wide transcription and cohesin-binding study on CdLS cell lines and identified a CdLS-specific expression profile from NIPBL and cohesin mutant individuals. Our data also suggest that cohesin preferentially binds to transcription start sites (TSSs) and that its binding tightly correlates with transcriptional activation in humans. Loss of cohesin binding occurs in CdLS and correlates with dysregulated gene expression (26). We undertook a genome-wide DNA methylation analysis on cell lines from probands with CdLS for the following reasons. First, the colocalization of CTCF and cohesin suggests possible functional overlap between these two proteins. A growing body of evidence suggests that CTCF is involved in regulating DNA methylation in vertebrates (14), and it binds to diverse DNA sequences including most imprinting center regions and many CpG islands. 
CTCF’s binding to chromatin seems sensitive to DNA methylation, because it not only reads DNA-methylation marks but also has a role in determining DNA methylation patterns. Second, NIPBL was reported to be involved in chromatin remodeling by direct association with HP1, linking NIPBL to multiple enzymes and protein factors that determine the histone modification patterns that tightly correlate with DNA methylation status (27,28). Third, a qualitative analysis of cohesin binding peaks in control and CdLS cells using data obtained from our previous chromatin immunoprecipitation (ChIP)–chip assay (26) has revealed that 6.85% of cohesin peaks overlap CpG islands, while 7.77% of the lost peaks in CdLS overlap CpG islands; hence CpG islands are overrepresented by 6.35% (P = 0.0018) among cohesin peaks that are lost in NIPBL mutant CdLS cells. Fourth, we have identified differentially expressed genes in CdLS from our previous genome-wide studies (using a relatively loose cutoff of FDR < 0.2), several of which are critical in DNA methylation. Among these are several proteins directly involved in DNA methylation, such as the maintenance DNA methyltransferase DNMT1 (−1.11, FDR = 0.15) and the methyl-binding domain proteins MBD1 (−1.23, FDR = 0.02) and MeCP2 (+1.11, FDR = 0.06). In addition, PcG proteins, which form the polycomb repressive complexes 1 and 2 (PRC1 and PRC2), are also dysregulated in CdLS, for example, CBX7 (+1.2, FDR = 0.13) and BMI1 (+1.36, FDR = 0.02) in PRC1, and EZH2 (−1.2, FDR = 0.05) and SUZ12 (−1.13, FDR = 0.199) in PRC2. Of note, Enhancer of Zeste homolog 2 (EZH2) was suggested to serve as a recruitment platform for DNA methyltransferases because it interacts with DNA methyltransferases (DNMTs) and associates with DNMT activity in vivo (29). Another example is GADD45A (+1.54, FDR = 0.057), ectopic expression of which leads to a reduction in methylation both at specific gene loci and in the total cellular 5-methylcytosine content (30). 
Fifth, SmcHD1, an SMC hinge domain-containing protein, maintains hypermethylation on the inactivated X chromosome in mice (31), and a second SMC-like protein, DMS3, mediates RNA-directed DNA methylation (RdDM) in plants (32); the functional identification of these two proteins therefore links cohesin to DNA methylation. Sixth, ‘genomic neighborhood diseases’ have recently been proposed (33); these involve genes that are located in the same chromatin domain, are co-expressed and communicate through three-dimensional structures. Dysregulated epigenetic events such as DNA methylation are tightly involved in gene misexpression and the onset of these diseases. It is not known whether mutations in NIPBL or cohesin might be able to alter global DNA methylation and affect genomic organization, which may contribute to transcriptional dysregulation in CdLS.
Very little is known about epigenetic regulation by cohesin. Similarly, the role of epigenetic modulation in CdLS and the majority of other human developmental disorders is poorly characterized. To date, there has been no global assessment of overall DNA methylation in cohesin or NIPBL mutant human cells. We applied a comprehensive DNA methylation profiling approach to assess the epigenetic state in CdLS and asked whether these mutant cells differed from healthy controls in terms of DNA methylation. We used an array-based method to quantitatively measure the methylation levels of 27 578 CpG sites in the regulatory regions of 14 495 genes in the human genome. We have identified differential methylation patterns in CdLS probands and also provided an integrative whole-genome view of DNA methylation, gene expression and cohesin binding in CdLS. We suggest that CdLS has its own epigenetic signature that is formed in the early stage of embryonic development, which likely contributes to its clinical features.
Sixty-three lymphoblastoid cell lines (LCLs) from 39 CdLS probands, two Roberts syndrome (RBS) probands and 22 gender- and race-matched healthy controls were tested. In addition, triplicates of one universally methylated DNA control (Zymo Research) and triplicates of each of two universally unmethylated DNA controls were also included (CHEMICON). All the tested CdLS probands have identified gene mutations and well-documented clinical features, including 22 severely affected probands with NIPBL mutations, eight mildly affected probands with NIPBL mutations, eight mildly affected probands with SMC1A mutations and one mildly affected proband with an SMC3 mutation. These samples include most of the individuals studied in our previous gene expression project (26). All human subjects participating in this study were enrolled under an institutional review board-approved protocol of informed consent at The Children’s Hospital of Philadelphia and Misakaenosono Mutsumi Developmental, Medical, and Welfare Center. All subjects were evaluated by one or more experienced clinicians. Gene mutations were confirmed by sequencing.
LCLs were grown uniformly in RPMI 1640 with 20% fetal bovine serum (FBS), 100 U/ml penicillin, 100 μg/ml streptomycin sulfate and 1% L-glutamine as described previously. All 63 cell lines were grown anonymously and processed randomly. Genomic DNA was isolated from LCLs following the manufacturer’s instructions (Gentra Systems). An ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) was used to check DNA quality and quantity.
Prior to hybridization, bisulfite conversion of DNA samples was performed using the EZ DNA methylation kit (Zymo Research). Five hundred nanograms of DNA were used; a thermocycling program with a short denaturation step was included for bisulfite conversion (16 cycles of 95°C for 30 s followed by 50°C for 1 h).
After bisulfite treatment, each sample was whole-genome amplified (WGA) and enzymatically fragmented, and then applied to the BeadChips using Illumina-supplied reagents and conditions at the Genomic Facility at the Wistar Institute. The HumanMethylation27 DNA Analysis BeadChip (Illumina), which carries 27 578 highly informative CpG sites derived from the well-annotated NCBI CCDS database (Genome Build 36) and spans more than 14 495 genes, was used for this experiment. Allele-specific primer annealing was followed by single-base extension using DNP- and Biotin-labeled ddNTPs. After extension, the array was fluorescently stained and scanned, and the intensities of the unmethylated and methylated bead types were measured by a BeadArray Reader (34). Each methylation data point was represented by fluorescent signals from the M (methylated) and U (unmethylated) alleles and was recorded via the Methylation Module in BeadStudio software. DNA methylation values were described as beta (β) values, which were computed from the two alleles: β = M/(U + M). The β-value therefore reflects the fractional methylation level of each CpG site. DNA methylation β-values are continuous variables between 0 and 1, representing the ratio of the intensity of the methylated bead type to the combined locus intensity. We quantified the methylation level using the β-value, and performed the statistical analysis on Log2[β/(1 – β)], which gives more linearized data. We arbitrarily defined CpGs with Log2[β/(1 – β)] > 0 [equivalent to β > 0.5 (50% of reads)] as hyper-methylated, CpGs with Log2[β/(1 – β)] < −2 [equivalent to β < 0.2 (20% of reads)] as hypo-methylated, and CpGs with −2 < Log2[β/(1 – β)] < 0 as medium methylated.
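The β-value, its Log2[β/(1 – β)] transform and the three methylation classes defined above can be sketched as follows. This is a minimal illustration in Python rather than the BeadStudio or R code used in the study; the function names are hypothetical, and assigning values that fall exactly on a cutoff (0 or −2) to the medium class is an assumption, since the definitions above use strict inequalities.

```python
import math


def beta_value(m_intensity, u_intensity):
    """Fractional methylation of one CpG site: beta = M / (U + M)."""
    total = m_intensity + u_intensity
    if total <= 0:
        raise ValueError("combined intensity must be positive")
    return m_intensity / total


def logit2(beta):
    """Log2[beta / (1 - beta)]; undefined at beta = 0 or beta = 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def classify(beta):
    """Assign a CpG site to the hyper, hypo or medium methylated class."""
    value = logit2(beta)
    if value > 0:        # same as beta > 0.5
        return "hyper"
    if value < -2:       # same as beta < 0.2
        return "hypo"
    # Assumption: values exactly on a cutoff are treated as medium.
    return "medium"
```

For example, methylated and unmethylated intensities of 300 and 100 give β = 0.75, which falls in the hyper-methylated class.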
Data processing and statistical analyses were performed within the R statistical environment (www.r-project.org); all CpG sites were included in the analysis. Array data from the methylated and unmethylated alleles were first processed separately by LOESS normalization across the 63 LCL samples (the three sets of triplicated artificially methylated and unmethylated DNA controls were excluded) and then combined to calculate β-values. Differential methylation between the control and CdLS groups or between the male and female groups was evaluated by a two-way analysis of variance (ANOVA) model with disease status and sample gender as the two tested factors. FDR was estimated by a procedure that randomly shuffled the sample labels and repeated the ANOVA test 100 times.
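The label-shuffling idea behind the FDR estimate can be sketched as below. This is not the R code used in the study: a plain absolute difference of group means stands in for the two-way ANOVA, and the estimator (average number of sites passing the cutoff under shuffled labels, divided by the number passing with the true labels) is an assumption about how such a procedure is commonly set up. All function and parameter names are hypothetical.

```python
import random


def mean_diff(values, labels):
    """Absolute difference between the means of the two label groups."""
    group_a = [v for v, is_case in zip(values, labels) if is_case]
    group_b = [v for v, is_case in zip(values, labels) if not is_case]
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sample")
    return abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))


def permutation_fdr(matrix, labels, cutoff, n_perm=100, seed=0):
    """Estimate the FDR among sites whose statistic reaches `cutoff`.

    matrix: one row of per-sample values for each CpG site.
    labels: True for case samples, False for controls, in column order.
    """
    rng = random.Random(seed)
    observed = sum(1 for row in matrix if mean_diff(row, labels) >= cutoff)
    if observed == 0:
        return 0.0  # no sites called, so no false discoveries to estimate

    shuffled = list(labels)
    null_hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between sample and label
        null_hits += sum(1 for row in matrix
                         if mean_diff(row, shuffled) >= cutoff)

    expected_false = null_hits / n_perm
    return min(1.0, expected_false / observed)
```

Shuffling keeps the group sizes fixed, so each permutation is a valid relabeling of the same samples; a real analysis would substitute the ANOVA P-value for the simple statistic used here.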
To validate the data obtained using the HumanMethylation27 BeadChip, samples from four to six healthy controls and four to six severely affected probands with NIPBL mutations were evaluated by bisulfite sequencing (BS). The genomic addresses and sequence information of each CpG dinucleotide on the BeadChip were downloaded from the company’s database (Illumina); ±200 bp surrounding the target CpG of CAPN2 and LMO2 were retrieved from the UCSC genome database (http://genome.ucsc.edu/) and used as the template sequence to design polymerase chain reaction (PCR) primers. Genomic DNA (1 μg) was bisulfite-converted and recovered as described above. Primers were designed with Methyl Primer Express v1.0 software (Applied Biosystems) using default settings. Primer sequences are available on request. Hot-start touchdown PCR was done in a 25 μl reaction containing 0.25 mM dNTPs, 1× buffer, 0.4 μM forward and reverse primers and 1 U of ZymoTaq™ DNA Polymerase (Zymo Research). The PCR conditions were as follows: 94°C for 15 min, then 14 cycles with a gradual decrease of annealing temperature from 62°C to 55°C with a 0.5°C reduction per cycle, followed by 72°C for 1 min per cycle. After that, amplification was continued with 36 cycles of 94°C for 30 s, 55°C for 30 s and 72°C for 1 min, and ended with 72°C for 15–20 min. PCR products were verified by gel electrophoresis; 2 μl of PCR product was subsequently cloned into the pGEM-T vector and transformed into JM109 cells according to the manufacturer’s instructions (Promega). Ten to twelve individual clones of each PCR fragment were selected for sequencing using an ABI Prism 377 automatic sequencer (Applied Biosystems) with the T7 primer. Sequencher and MacVector were used to align the obtained sequences with reference sequences from the UCSC genome browser (http://genome.ucsc.edu/). C→T changes at CpG sites were documented.
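The scoring of C→T changes at CpG sites can be illustrated with a short sketch. It assumes each clone sequence has already been aligned to the reference without gaps, has the same length and represents the same strand; these are simplifications for illustration, not a description of the Sequencher/MacVector workflow, and the function names are hypothetical. A cytosine retained at a reference CpG is scored as methylated and a thymine as unmethylated.

```python
def cpg_positions(reference):
    """Return 0-based indices of the C in every CpG of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def call_clone(reference, clone):
    """Score each reference CpG in one bisulfite-converted clone.

    Returns {position: True if methylated (C kept), False if converted (T)}.
    Any other base at a CpG position is treated as ambiguous and skipped.
    """
    if len(reference) != len(clone):
        raise ValueError("clone must be aligned to the reference, gap-free")
    calls = {}
    for i in cpg_positions(reference):
        base = clone[i].upper()
        if base == "C":
            calls[i] = True
        elif base == "T":
            calls[i] = False
    return calls


def fraction_methylated(reference, clones):
    """Fraction of scored CpG calls that are methylated across all clones."""
    methylated = 0
    total = 0
    for clone in clones:
        calls = call_clone(reference, clone)
        methylated += sum(calls.values())
        total += len(calls)
    return methylated / total if total else float("nan")
```

With ten to twelve clones per sample, the pooled fraction gives a per-sample methylation percentage for the amplified fragment.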
All procedures were performed as described in our previous publication (26).
EpiGRAPH (http://epigraph.mpi-inf.mpg.de/WebGRAPH/) is an online tool for analyzing genomic and epigenomic features enriched in a given group of DNA fragments (35–37). EpiGRAPH was used to analyze DNA sequences harboring the differentially methylated CpG sites in CdLS in terms of specific DNA sequence patterns, overlap with specific genomic regions (e.g. CpG islands, repetitive regions and SNPs) and histone modification markers. Galaxy (http://galaxy.psu.edu/) was used to format genomic sequences downloaded from the UCSC genome browser (http://genome.ucsc.edu/). ClustalW2 (http://www.ebi.ac.uk/Tools/clustalw2/index.html) was used to identify consensus sequences around CpG sites whose methylation was significantly changed in CdLS.
Genomic sequences reported in this manuscript have been submitted to NCBI GEO (http://www.ncbi.nlm.nih.gov/geo): methylation data are under accession number GSE 18458, gene expression data are under accession number GSE 12408 and ChIP–chip data are under accession number GSE 12603.
Whole-genome DNA methylation studies were conducted on 72 sodium bisulfite-converted DNA samples obtained from 63 LCL samples, two sets of triplicated artificially de-methylated DNA controls and one set of triplicated artificially fully methylated DNA controls. The Infinium HumanMethylation27 BeadChip, which carries 27 578 CpG dinucleotides located within the promoter regions of 14 495 unique genes in the human genome, was used for hybridization. Hybridization signals and β-values were examined as described in the ‘Material and Methods’ section; β < 0.2 (20% of reads) was considered a low level of methylation, while β > 0.5 (50% of reads) was considered a high level of methylation. Linearized Log2[β/(1 – β)] was used for statistical analysis. Samples were selected from Caucasian individuals, and gender and age were closely matched. The 63 experimental samples included 22 healthy controls, 22 severely affected CdLS probands with NIPBL protein truncating mutations, eight mildly affected CdLS probands with NIPBL missense mutations, eight mildly affected CdLS probands with SMC1A mutations, one mildly affected proband with an SMC3 mutation and two RBS probands with homozygous mutations in ESCO2 (Supplementary Table S1). After normalization, the six replicated unmethylated and three replicated methylated artificial DNA control samples showed correspondingly low and high β-values, indicating efficient bisulfite conversion of the DNA samples (Supplementary Figure S1). The methylation density peaks of all 63 experimental samples have Log2[β/(1 – β)] < 0 and overlap the peaks of the unmethylated artificial controls, indicating that the DNA from human LCLs is hypo-methylated globally (Supplementary Figure S1). The array carries 26 486 CpG dinucleotides on the human autosomes (chromosomes 1–22); the average DNA methylation levels of these dinucleotides are highly variable among the 22 control LCLs, but the majority of them are hypo-methylated. 
A bimodal distribution of DNA methylation was seen across all CpGs among the 22 controls, suggesting that DNA methylation may distinguish two types of genomic content. This bimodal distribution may represent the overall hypo-methylation of CpG sites inside CpG islands and the overall hyper-methylation of CpG sites located within CpG-poor genomic regions in human LCLs (Supplementary Figure S1).
Methylation levels of the 26 486 probes on autosomes (chromosomes 1–22) were quantified by β; Log2[β/(1 – β)] of each probe in each sample was then calculated and used for unsupervised principal component analysis (PCA). Probes on X and Y were excluded because of the gender effect on DNA methylation. Figure 1A shows that the 22 severely affected CdLS probands with NIPBL protein truncating mutations can be separated from the 22 healthy controls solely on the basis of autosomal DNA methylation levels, with the exception of five healthy control samples. Controls ‘AGS-222-S’, ‘27574’, ‘CDL-145-03S’, ‘27572-B’ and ‘95-3986-S’ are discordant with the rest of the control group. This unsupervised PCA result indicates that the genome-wide DNA methylation profile is significantly different in CdLS as compared to healthy individuals.
In order to define the differentially methylated DNA loci in CdLS, we performed a two-way ANOVA of all 27 578 probes on the control group (22 individuals) and the severely affected NIPBL mutant CdLS group (22 individuals). There are 924 CpG sites (corresponding to 902 cognate genes) differentially methylated in CdLS with P < 0.01 (FDR = 0.222). Of these 924 CpG sites, methylation levels were decreased at 361 sites (356 genes) and increased at 563 sites (546 genes) in CdLS (Supplementary Table S2). We selected 152 differentially methylated CpG sites on the autosomes with P < 0.001 (Supplementary Table S2) and further performed clustering analysis on all 63 samples in our cohort, which vary both clinically and genotypically (Figure 1B). Samples from each subgroup could be clustered together, although with a few outliers. Control samples and severely affected NIPBL mutant CdLS probands are evidently separated from each other, whereas mildly affected CdLS probands lie in between. Within the mildly affected CdLS probands, individuals with NIPBL mutations lie closer to the NIPBL mutant subgroup that has severe manifestations, whereas CdLS probands with other gene mutations (SMC1A and SMC3) tend to cluster together and lie closer to healthy controls. Interestingly, the two RBS probands are clustered side by side with SMC1A mutant individuals. RBS is an autosomal recessive genetic disorder due to homozygous or compound heterozygous mutations in the ESCO2 gene; ESCO2 has acetyltransferase activity and is involved in the establishment of sister chromatid cohesion.
We further analyzed separately the CpG sites that are hypo-methylated (Log2[β/(1 – β)] < −2) and hyper-methylated (Log2[β/(1 – β)] > 0) in controls. In addition, autosomal and X-linked sites were also analyzed separately (Supplementary Table S3). In general, the average DNA methylation levels of all of the hyper-methylated CpG sites and of the X-linked hypo-methylated CpG sites are increased in CdLS, whereas the autosomal hypo-methylated CpG sites remain unchanged (Figure 2A). Statistically, the average methylation level (combining both females and males) of X-linked hypo-methylated CpG sites has increased significantly in CdLS as compared to the increase of autosomal hypo-methylated CpG sites (P = 1.51E-08) (Figure 2A). Since gender could play a major role in affecting DNA methylation levels, we then repeated the above analysis for female and male samples separately. In CdLS females, the methylation change at hypo-methylated CpG sites was more significant on the X chromosome than on the autosomes (P = 3.5E-17). On the X chromosome in female CdLS probands, the increase in DNA methylation was also significantly greater at hypo-methylated CpG sites than at hyper-methylated CpG sites (P = 1.2E-12) (Figure 2B). Surprisingly, significantly reduced methylation levels were found for the above hypo-methylated CpG sites on the X chromosome in male CdLS probands (Figure 2B).
The expression of genes in LCLs and the methylation levels of their associated CpG sites were first examined in the 22 control samples. We took advantage of the whole-genome expression analysis from our previous study (26), in which we identified 10 378 out of 15 162 unique refSeq genes as expressed in LCLs using Affymetrix HG_U133plus2.0 arrays. Each of these genes was mapped to one and only one transcription start site (TSS). CpG dinucleotides were mapped to within ±1-kb regions surrounding TSSs to obtain 22 351 TSS–CpG pairs from 12 081 unique genes and 21 110 unique CpG sites. Consistent with the well-documented literature, an inverse correlation between DNA methylation and gene expression was also found in LCLs in our study. DNA methylation suppresses gene expression, and a reduced methylation level correlates with a higher probability of gene expression (Figure 3A, Supplementary Figure S2A). Figure 3A shows the percentage of expressed genes and the corresponding CpG methylation levels. The overall percentage of genes expressed in LCLs is 69.9% (8449/12 081), which indicates that 69.9% of human genes are likely to be expressed in LCLs. As the level of DNA methylation increases, fewer genes are expressed. Using Log2[β/(1 – β)] = 0 as the cutoff, Fisher’s exact test shows a highly significant interdependence between gene expression and DNA methylation in the promoter region (P = 4.5E-315) (Figure 3A). The expression of a subgroup of genes that is associated with hyper-methylation (Log2[β/(1 – β)] > 0) is unexpected and suggests that mechanisms other than DNA methylation are involved in their transcriptional regulation; in other words, DNA methylation alone is not enough to downregulate gene expression in human LCLs (Figure 3A, Supplementary Figure S2A). 
We subdivided the 21 110 unique sites into two groups based on the CpG island mapping information from the UCSC genome browser (http://genome.ucsc.edu): CpG sites located within a CpG island (CGI sites) and CpG sites located outside CpG islands (non-CGI sites). No linear correlation between the level of DNA methylation and the transcriptional activity of the cognate gene was observed for either CGI sites or non-CGI sites, which further indicates that the direct impact of DNA methylation on gene expression might not be predominant (Supplementary Figure S2B).
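The Fisher’s exact test on a 2 × 2 table of expression status (expressed or not) against methylation status (above or below the Log2[β/(1 – β)] = 0 cutoff) can be sketched as follows. This is a generic two-sided implementation written for illustration, not the R routine used in the study; the function names are hypothetical. Probabilities are computed in log space because tables with thousands of genes produce very large binomial coefficients, and P-values as small as the one reported above lie at the edge of double precision, so this sketch may return 0.0 in such cases.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def _log_hypergeom(a, row1, col1, total):
    """Log-probability of `a` in the top-left cell with margins fixed."""
    return (_log_comb(col1, a)
            + _log_comb(total - col1, row1 - a)
            - _log_comb(total, row1))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    log_observed = _log_hypergeom(a, row1, col1, total)

    p_value = 0.0
    for x in range(lowest, highest + 1):
        log_p = _log_hypergeom(x, row1, col1, total)
        if log_p <= log_observed + 1e-9:  # small tolerance for ties
            p_value += math.exp(log_p)
    return min(1.0, p_value)
```

Here the rows could hold hyper-methylated and non-hyper-methylated genes and the columns expressed and non-expressed genes; the layout is an assumption about how the table was arranged.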
We also found that the location of the CpG site relative to the TSS affects the association between DNA methylation and the expression of the downstream gene (Figure 3B). While 44.5% (1224/2752) of the genes with hyper-methylated CpG sites around their TSSs were expressed in LCLs, the closer the sites were located to the TSSs, the less likely the genes were to be expressed. When there was a hyper-methylated site located within ±50 bases around a TSS, the downstream gene had only a 27.6% (93/337) chance of being expressed. However, such an effect was less evident for the hypo-methylated sites. The negative correlation between DNA methylation and gene expression is stronger on autosomes than on the X chromosome (Supplementary Figure S2C). In female samples, the probability that a gene is expressed on autosomes and on the X chromosome is 82.3% and 61.4%, respectively (P = 6.0E-06), when there is a hypo-methylated site around the TSS, and 36.8% and 44.5% (P = 0.08) when there is a hyper-methylated site. This observation suggests that the association between DNA methylation and gene expression has different mechanisms on autosomes and the X chromosome.
As described earlier, 12 081 unique refSeq genes are associated with CpG sites with DNA methylation information available from this study. Out of these 12 081 genes, 8449 genes are expressed in LCLs. We compared the expression level and the methylation level for each of these 8449 genes between the 22 controls and the 22 severely affected NIPBL mutant CdLS probands, and then plotted the alteration of expression against the alteration of methylation for each gene. In general, there is no strong evidence that global differential DNA methylation correlates with global transcriptional dysregulation in CdLS (Figure 4). An increased DNA methylation level correlates with transcriptional downregulation for only six genes in CdLS (P = 0.003), whereas a decreased DNA methylation level correlates with transcriptional upregulation for only two genes, without statistical significance (Figure 4, Supplementary Table S4). No link is seen between altered gene expression and altered DNA methylation for the remaining 8441 genes in CdLS (Figure 4). We therefore conclude that differential DNA methylation may not directly contribute to the transcriptional dysregulation in CdLS.
The average DNA methylation level of the 21 110 CpG sites associated with the above 12 081 genes in the 22 controls was analyzed as Log2[β/(1 – β)]. The cohesin binding signal was obtained from our previous ChIP–chip assay (26), re-analyzed quantitatively and mapped to each of the 21 110 CpG sites. The majority of cohesin signals were concentrated at chromatin regions with a low level of DNA methylation, that is, Log2[β/(1 – β)] < −2 (β < 0.2), indicating that cohesin preferentially binds to hypo-methylated chromatin, whereas an increased DNA methylation level is associated with reduced cohesin binding to the chromatin (Supplementary Figure S3A and B). In our previous study (26), we showed evidence that cohesin preferentially binds to gene promoters, especially at TSSs. We therefore split the 12 081 genes into three groups according to the methylation levels of the CpG sites at their TSSs, and looked at cohesin binding intensities around promoters. As described above, CpGs with Log2[β/(1 – β)] > 0 were defined as hyper-methylated, CpGs with Log2[β/(1 – β)] < −2 were defined as hypo-methylated, and CpGs with −2 < Log2[β/(1 – β)] < 0 were defined as medium methylated. A similar cohesin-binding pattern was identified for the 8128 genes with hypo-methylated CpGs and the 2247 genes with medium methylated CpGs. Cohesin preferentially binds to the vicinity of their TSSs, especially around the core promoter region (from −200 bp to +1 bp), with each peak clearly seen (Figure 5). By contrast, very little cohesin binds to the 1706 genes with hyper-methylated CpGs, and there is no quantitative difference in cohesin binding among the regions surrounding the TSSs (Figure 5). Although cohesin binding correlates with gene expression (26), we were not able to identify a direct correlation between gene expression and DNA methylation based on our above analyses. 
In combination with our previous studies, these results suggest that the association between cohesin and promoters is tightly correlated with gene expression, but that DNA methylation may be only one of several upstream events that affect cohesin binding. Transcriptional regulation does not appear to depend solely on DNA methylation; rather, multiple mechanisms or pathways are likely functioning together to control the expression of genes in human LCLs.
In addition to examining genes with hypo-methylated promoters and genes with hyper-methylated promoters separately in CdLS, we asked whether the location of the CpG dinucleotide on the X chromosome or on an autosome affects the correlation between DNA methylation and cohesin binding in CdLS. For this reason, only the female samples from the control group and from the severely affected NIPBL mutant CdLS individuals were analyzed. For the genes with hypo-methylated promoters, binding of cohesin is enriched around TSSs regardless of whether they are located on autosomes (6692 genes) or on the X chromosome (145 genes); however, less cohesin binds to the X chromosome than to autosomes and the peak shifts upstream of the TSS. In CdLS, the amount of both autosomal and X chromosome-bound cohesin has decreased, although a smaller peak remains in a narrowed region surrounding the TSS (Figure 6A). For the genes with hyper-methylated promoters, the amount of cohesin bound is quite low for both autosomal and X-linked genes (1640 and 129, respectively) (Figure 6B), and cohesin binding is equally distributed along the ±1.5-kb region around TSSs with no obvious peak seen at TSSs. More cohesin is associated with the X chromosome than with the autosomes in controls, but this tendency is reversed in CdLS, in which less cohesin is associated with the X chromosome than with autosomes (Figure 6B). A slightly reduced amount of cohesin binds to the X chromosome in CdLS, but there is little change in cohesin binding to autosomal genes in CdLS (Figure 6B). In summary, cohesin binding has changed remarkably at hypo-methylated promoters in CdLS, and unknown mechanisms in addition to DNA methylation are likely involved in regulating cohesin binding to X-linked and hyper-methylated promoters.
Bisulfite conversion and sequencing, as described by Susan Clark (38), has long been considered the gold standard for measuring CpG methylation. We selected two genes, LMO2 and CAPN2, which demonstrated differential methylation between CdLS and control cells, for BS validation. In our previous studies (26), these two genes showed reduced transcription and loss of cohesin binding to TSSs in CdLS cells. There is a large CpG island embedded in the CAPN2 promoter region; two probes on the HumanMethylation27 array, cg01566170 and cg14972271, are located 455 bp and 465 bp upstream of the TSS of CAPN2, respectively. The 252-bp fragment (chr1:221,966,278-221,966,529) amplified by primers CAPN2 329/330 covers these two probe CpG sites in addition to seven other CpG dinucleotides. DNA isolated from four controls and four severely affected CdLS probands with NIPBL mutations, all of which had been tested on the methylation arrays above, was tested again by BS. Ten to twelve clones were sequenced for each sample. Figure 7 shows the increased number of 5-methylcytosines at CpG sites in CdLS probands, indicating elevated DNA methylation levels in this chromatin region at the CAPN2 promoter, which is consistent with the methylation microarray findings. In addition, the BS results from control samples illustrate varied levels of DNA methylation in the healthy population. Sample 11 clearly has a much higher methylation level than samples 46 and 48, although, on average, 48.8% of CpGs in controls and 80.3% in CdLS are methylated. A second promoter region, the promoter of the LMO2 gene, was also tested by BS in six control and six CdLS individuals. A 358-bp fragment located in the LMO2 promoter region was amplified by primers LMO2 337/338. This fragment is located at chr11:33,870,140-33,870,497 and surrounds probe cg11822932 on the HumanMethylation27 array, which is 120 bp upstream of the TSS of LMO2.
The LMO2 promoter is very CpG poor and harbors no region that meets the criteria for a ‘CpG island’. There are only three CpG dinucleotides in the tested region, including the probe CpG. We observed methylation changes at the LMO2 locus similar to those detected by the array study (Supplementary Figure S4). In conclusion, a consistent methylation pattern was obtained from both BS and HumanMethylation27 analysis.
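The per-group percentages quoted for the BS validation are fractions of methylated CpG calls pooled over the sequenced clones. The following is a minimal sketch of that arithmetic, assuming each clone has already been reduced to one call per CpG site; the `M`/`U` encoding, the function name and the toy clone strings are hypothetical and are not data from this study.

```python
from typing import Dict, List


def methylation_fraction(clones: List[str]) -> float:
    """Fraction of methylated CpG calls pooled across sequenced clones.

    Assumed encoding (hypothetical): one character per CpG site,
    'M' = methylated (C retained after bisulfite conversion),
    'U' = unmethylated (C read as T). Any other character, such as '-',
    is treated as a missing call and ignored.
    """
    methylated = sum(clone.count("M") for clone in clones)
    unmethylated = sum(clone.count("U") for clone in clones)
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no informative CpG calls")
    return methylated / total


# Toy input only: three clones with nine CpG sites each, mimicking the
# nine CpGs of the CAPN2 fragment. These are NOT the study's results.
samples: Dict[str, List[str]] = {
    "control_example": ["MUUUMUUUM", "UUUUMUUUU", "MMUUUUUU-"],
    "cdls_example": ["MMMMMUMMM", "MMMUMMMMM", "MMMMMMMUM"],
}
fractions = {name: methylation_fraction(c) for name, c in samples.items()}
```

Pooling calls over clones, rather than averaging per-clone fractions, keeps clones with missing calls from being over-weighted; either choice would need to be stated when reporting a percentage.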
EpiGRAPH (http://epigraph.mpi-inf.mpg.de/WebGRAPH/) is an online tool for analyzing genomic and epigenomic features that are enriched in given DNA fragments (35–37). As described above, we have identified 924 CpG sites correlated to 902 genes that are differentially methylated in CdLS (P < 0.01), out of which the methylation level of 361 CpGs (356 genes) is decreased and the methylation level of 563 CpGs (546 genes) is increased. The EpiGRAPH web service was used to analyze DNA features enriched in regions with differential methylation in CdLS. One kilobase of DNA sequence (±500 bp) surrounding each of the 924 CpG dinucleotides was identified and downloaded from the UCSC genome browser (http://genome.ucsc.edu/), processed in Galaxy (http://galaxy.psu.edu/) and uploaded to EpiGRAPH to identify overrepresented genomic or epigenomic features by comparing the 361 sequences harboring hyper-methylated CpGs and the 563 sequences harboring the hypo-methylated CpGs in CdLS. We chose a size of 1000 bp for the analyses because in normal tissues the extended distance of DNA methylation is generally shorter than 1000 bp (13). Significantly distinct histone modifications were found in chromatin regions with both differential DNA hypo-methylation and hyper-methylation, indicating a tight correlation between the alteration of DNA methylation and the diverse chromatin structure in CdLS (Table 1, Supplementary Table S5). The finding that repetitive sequences are enriched in chromatin regions harboring differentially methylated CpG sites is of particular interest because cohesin binding is dysregulated in CdLS and the association between cohesin and repetitive sequences has been suggested to be involved in multiple biological roles (15). ClustalW2 (http://www.ebi.ac.uk/Tools/clustalw2/index.html) was further used to look for consensus sequences covering the differentially methylated CpG sites; however, no consensus sequences could be identified (data not shown).
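The ±500-bp windows described above can be expressed as BED-style intervals before sequence retrieval. This is a minimal sketch under stated assumptions: the CpG position is taken as a 0-based coordinate, the interval is half-open, and windows are clamped at the chromosome start; the function names and the example coordinates are hypothetical and do not correspond to array probes.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]


def cpg_window(chrom: str, cpg_pos: int, flank: int = 500) -> Interval:
    """Return a half-open (chrom, start, end) interval around a CpG.

    Assumptions (not from the original text): cpg_pos is 0-based and the
    interval covers cpg_pos - flank up to, but not including,
    cpg_pos + flank, i.e. 2 * flank bp unless clamped at position 0.
    """
    if cpg_pos < 0 or flank <= 0:
        raise ValueError("cpg_pos must be >= 0 and flank must be > 0")
    start = max(0, cpg_pos - flank)
    end = cpg_pos + flank
    return chrom, start, end


def to_bed_lines(intervals: List[Interval]) -> List[str]:
    """Format intervals as tab-separated BED lines."""
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end in intervals]


# Hypothetical CpG positions, for illustration only.
windows = [cpg_window("chr1", 1000), cpg_window("chrX", 100)]
bed = to_bed_lines(windows)
```

The resulting lines could be uploaded as a custom interval set to a genome browser or workflow tool; the exact coordinate convention used in the original analysis is not stated, so it should be checked before reuse.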
CdLS is a dominant genetic disorder with multiple-organ system abnormalities, including characteristic facial features, limb defects, mental retardation, developmental delay and gastrointestinal problems. Mutations in the cohesin regulatory protein NIPBL and cohesin subunits SMC1A and SMC3 account for ~65% of confidently diagnosed probands, while the remaining ~35% of probands have no identifiable gene mutations (19). Prior studies (19) have shown that CdLS probands with SMC3 and SMC1A mutations present with a milder clinical picture than the classic form of CdLS typically associated with protein-truncating mutations in NIPBL. Since NIPBL is a regulator of the cohesin complex whereas both SMC3 and SMC1A are structural components of the complex, the exact mechanisms by which mutations in these proteins manifest their effects on development and gene regulation are likely to be quite different. The cohesin complex consists of four major subunits, SMC1A, SMC3, RAD21 and STAG1/STAG2, forming a ring structure that holds sister chromatids together during mitosis and meiosis (39). Additionally, cohesin has been suggested to play pivotal roles in fundamental biological events in humans such as gene expression (22,26), double-strand DNA repair (40,41), genome instability (42), carcinogenesis (43) and chromatin loop formation (44). The co-localization of cohesin and CTCF in human cells further suggests that functional cooperation and overlap may exist between these two proteins (25). Of note, mutations in cohesin accessory factors such as NIPBL, ESCO2 and ATRX have also been identified in human developmental disorders with similar but quite distinct clinical presentations, which are currently collectively named ‘cohesinopathies’ (15). The cohesinopathies provide valuable experimental models with naturally occurring mutations to study the biological functions of the cohesin pathway in general and of the specific proteins in particular in human cells.
Recently, the term ‘genomic disorder’ was proposed to describe groups of developmental disorders or human malignancies in which epigenetic mechanisms including DNA methylation play important roles in the pathogenesis (33). It is not known whether similar mechanisms could also be involved in CdLS.
We chose to use a seemingly phenotypically unrelated tissue type, LCL, in this proof-of-principle study on a human developmental disorder, CdLS, based on two considerations: availability of the sample and goals of the project. As opposed to primary fibroblasts, which represent a limited resource, have uncontrollable environmental exposures (such as diet or medications) and show significant variation in growth and survival rate across different samples in culture, patient-derived LCLs have obvious advantages for genome-wide DNA methylation studies on human subjects, including (i) easier growth under controlled conditions to minimize the environmental influence on DNA methylation, (ii) ease of identification of sufficient numbers of matched samples (gender, age, clinical manifestations, etc.) for valid statistical analysis, which would be much more difficult using fibroblasts (especially for groups engaged in the study of rare human disorders) and (iii) provision of continually renewable and stable biomaterials that allow sequential integrated genomic analyses. Of note, GM12878, a widely circulated model LCL, has been extensively used in the HapMap, ENCODE and other major projects. Publicly available DNA methylation data generated for GM12878 on the same Illumina platform demonstrated good correlation with our LCL samples, with r = 0.947 when the low-quality measurements were removed (Zhang et al., manuscript in preparation). In addition, LCLs have been widely used as surrogates or as cellular models to study epigenetic changes in neuropsychiatric illnesses such as autism, bipolar disorder and schizophrenia, and in central nervous system disorders such as Parkinson disease (45). Genetic disorders, such as ataxia telangiectasia (46) and Nijmegen breakage syndrome, have also been investigated in patient-derived LCLs, yielding valuable insights into the pathobiology of these disorders (47).
Methylation and demethylation of regulatory sequences in the genome are known to have profound effects on cellular behavior and fate. In this study, we have performed genome-wide DNA methylation analysis in CdLS using the Illumina Methylation27 bead chip, which carries 27 578 CpG dinucleotides that represent 14 495 cognate genes in the human genome. The power of this array is that a very large number of sites can be assayed simultaneously, allowing highly reproducible global patterns to be discovered. Both the unsupervised PCA based on data from all available CpG sites and the supervised clustering signatures based on differentially methylated autosomal CpG sites shown in Figure 1 indicate that CdLS has a distinct methylation profile as compared to controls. Of note, NIPBL transcription has dropped by only 30% in those CdLS cells presenting the unique disease-specific DNA methylation pattern in the current study (26). The hierarchical clustering also demonstrated that the methylation profile correlates with disease severity and with the specific mutated genes: the mildly affected probands lie midway between severe probands and controls, and mildly affected probands with NIPBL mutations were clustered as a transitional group between the mildly affected individuals with mutations in other genes (SMC1A or SMC3) and the severe individuals with NIPBL mutations, suggesting a genotype–phenotype correlation (Figure 1B). However, the CdLS DNA methylation profile alone is not sensitive enough to serve as a diagnostic tool (Supplementary Figure S5). Leave-one-out cross validation using P < 0.001 on 22 healthy controls and 22 severely affected CdLS probands with NIPBL mutations was conducted. Controls and probands could be roughly separated; however, 11 of the 44 samples were misclassified over the 44 rounds, giving a classification accuracy of only 75%.
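The leave-one-out procedure and the accuracy figure above (33 of 44 samples correct, i.e. 75%) can be illustrated with a small sketch. It assumes a nearest-centroid classifier on methylation values and omits the P < 0.001 feature-selection step; the classifier, the function names and the toy values are hypothetical and are not the method or data of this study. If feature selection were included, it would need to be repeated inside each round on the training samples only.

```python
from typing import Dict, List


def nearest_centroid_predict(
    train_x: List[List[float]], train_y: List[str], query: List[float]
) -> str:
    """Assign the query to the class whose mean profile is closest."""
    centroids: Dict[str, List[float]] = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # sorted() makes tie-breaking deterministic.
    return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab], query))


def leave_one_out_accuracy(x: List[List[float]], y: List[str]) -> float:
    """Hold out each sample once, train on the rest, score the held-out call."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Toy methylation values for two CpG sites (hypothetical, not study data).
toy_x = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.8, 0.9], [0.9, 0.8], [0.2, 0.2]]
toy_y = ["control", "control", "control", "CdLS", "CdLS", "CdLS"]
toy_accuracy = leave_one_out_accuracy(toy_x, toy_y)

# Arithmetic behind the reported figure: 11 of 44 samples misclassified.
reported_accuracy = (44 - 11) / 44
```

In the toy input the last CdLS sample resembles the controls and is the only one misclassified, which mirrors how overlapping profiles limit diagnostic use.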
We next attempted to correlate the data from the three sets of genome-wide analyses on DNA methylation, gene expression and cohesin binding in CdLS. Venn diagram analysis did not reveal a single gene that had significant alterations in all three biological events (Supplementary Figure S6); therefore, no direct correlation between the three dysregulated biological events could be revealed in CdLS, indicating that additional unknown mechanisms might be involved in the pathogenesis of CdLS. The lack of a remarkable correlation between gene expression and DNA methylation in CdLS can probably be explained as follows. (i) A relatively large variation in DNA methylation level exists in healthy controls, as seen on both array and BS (Supplementary Figure S1B and Figure 7), which adds to the difficulty of identifying an unambiguously changed DNA methylation pattern in CdLS. (ii) Global DNA methylation is a more stable type of epigenetic modification modulating the transcriptional plasticity in the human genome than we have realized (48). One study reporting the high-resolution methylation states of 1.9 million CpGs on human chromosomes 6, 20 and 22 from 12 different tissues failed to correlate DNA methylation with mRNA expression levels for 63% of the genes. Therefore, the author suggested that differential promoter methylation might have only a permissive role in transcriptional regulation, such as establishing an open chromatin conformation. DNA methylation could regulate the transcription of the cognate genes in combination with other factors or mechanisms that drive transcription, but not alone (49). In support of this argument, we found that keratin genes are not expressed in the studied LCLs and that most promoters of the two keratin gene clusters located on chromosomes 12 and 17 are consistently hyper-methylated in control cells.
In CdLS, uniform methylation was found again at the two clusters but with even higher methylation levels (Supplementary Figure S7A and B). Keratin genes remain silent in both control and CdLS cells with differential promoter hyper-methylation levels, suggesting gene expression is not solely controlled by DNA methylation. (iii) The average global DNA methylation level in LCLs is low, and overall methylation changes between control and CdLS are quite subtle, making it more difficult to identify minor but significant changes. (iv) As seen in Drosophila, NIPBL or cohesin may regulate transcription mainly by the association with enhancers or other remote regulatory elements first and only subsequently with promoters. The CpG dinucleotides examined on the Methylation27 platform in this study have minimal coverage of the remote cis elements, which could lead to the lack of detection of any DNA methylation alteration on enhancers, silencers, insulators, locus control regions, etc. We therefore propose whole-genome BS to be the next experimental approach.
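The three-way comparison of DNA methylation, gene expression and cohesin binding described above amounts to intersecting three gene sets, which is what a Venn diagram summarizes. A minimal sketch follows, assuming each assay has been reduced to a set of gene symbols with significant changes; the function name and the placeholder gene identifiers are hypothetical.

```python
from typing import Dict, Set


def three_way_overlap(
    methylation: Set[str], expression: Set[str], binding: Set[str]
) -> Dict[str, Set[str]]:
    """Partition genes into the regions of a three-set Venn diagram
    that involve at least two assays."""
    return {
        "all_three": methylation & expression & binding,
        "methylation_expression_only": (methylation & expression) - binding,
        "methylation_binding_only": (methylation & binding) - expression,
        "expression_binding_only": (expression & binding) - methylation,
    }


# Placeholder identifiers, not genes reported in this study.
meth_genes = {"GENE_A", "GENE_B", "GENE_C"}
expr_genes = {"GENE_B", "GENE_D"}
bind_genes = {"GENE_C", "GENE_D", "GENE_E"}
overlap = three_way_overlap(meth_genes, expr_genes, bind_genes)
```

An empty `all_three` set, as in this toy input, corresponds to the situation reported here, in which no single gene was altered in all three assays.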
In humans, most genes are expressed from both alleles in diploid cells; however, more and more genes are being identified that are expressed from only a single allele. Mammalian X inactivation, imprinting (e.g. IGF2 and H19) and allelic exclusion (e.g. olfactory receptor genes, immunoglobulin genes, T cell receptors, interleukins and natural killer cell receptors) are classic examples of monoallelic gene expression. A recent report suggests that at least 1000 autosomal human genes are subject to random monoallelic expression (50). Regulation of this group of genes is clearly epigenetic and the role of DNA methylation in imprinting has been well recognized; for example, allele-specific DNA methylation has been observed for the immunoglobulin gene and a few other monoallelically expressed genes (50–52). Hence, we combined the data set from this study with our previous genome-wide gene expression study and ChIP–chip assay to test whether differential DNA methylation affects monoallelic expression in CdLS and whether it is also involved in the altered cohesin–chromatin association in CdLS. Information on documented human imprinted genes was obtained from the online database ‘Catalogue of Parent of Origin Effects’ (http://igc.otago.ac.nz/Summary-table.pdf); a list of X-linked genes with their X inactivation status and a list of monoallelically expressed autosomal genes were obtained from the literature (50,53). Twenty-nine imprinted genes, 465 X-linked genes and 191 randomly (paternal or maternal) monoallelically expressed autosomal genes have data available from all three assays (DNA methylation, expression and ChIP–chip). Fisher’s exact tests did not reveal any correlation between differential DNA methylation and differential expression for monoallelically expressed genes in severely affected CdLS probands with NIPBL mutations (Supplementary Table S6).
In addition, no correlation was identified between differential DNA methylation and altered cohesin binding (loss or addition) at transcription start sites (TSSs), or between altered cohesin binding at TSSs and differential gene expression, for the monoallelically expressed genes (Supplementary Table S7). In conclusion, the current data do not support an impact of CdLS-related disruption of cohesin on monoallelic gene expression. This may be due to the tissue or developmental stage specificity of expression of these genes.
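The Fisher's exact tests mentioned above operate on 2 × 2 tables, for example genes cross-classified by differential methylation (yes/no) and differential expression (yes/no). The sketch below computes a two-sided P-value from the hypergeometric distribution using only the standard library; the function name and the example counts are hypothetical, and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention that may differ from the software used in the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts, for illustration only:
# rows = differentially methylated (yes/no),
# columns = differentially expressed (yes/no).
p_example = fisher_exact_two_sided(3, 1, 1, 3)
```

With small counts such as those available for 29 imprinted genes, an exact test is preferable to a chi-square approximation, which is presumably why it was chosen here.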
DNA methylation state is influenced by a number of endogenous and exogenous parameters such as gender, age, tissue type or passage of cell cultures (54). To be consistent, all samples for expression and methylation studies were from Caucasians, all three genome-wide assays used the same tissue type (LCLs), and the same set of samples was used for all the studies whenever possible. Gender was strictly matched for the differential expression and DNA methylation analyses. Only two factors, age and culture passage, could not be controlled for stringently due to the limitation of resources. Most of our CdLS probands are children and some of the healthy controls are adults. The LCLs we have used are stock cell lines that have been continuously collected in our laboratory for decades. However, in one report, no detectable global changes in average DNA methylation levels were found between a group of 26-year-olds and a second group of 68-year-olds (49), and, in another report, the methylation change during prolonged passage in culture was suggested to be insignificant (55).
At present, genome-wide information on gene expression, DNA methylation and cohesin binding in CdLS is available from our integrative studies. The tip of the iceberg revealed in these studies will help us to design new assays to further understand how cohesin regulates critical biological pathways in humans. For example, ‘6C’ could be conducted to identify epigenetic chromatin looping structures mediated by cohesin at specific loci that associate with a particular gene transcription pattern. Allele-specific DNA methylation cannot be studied on the current array platform but could become possible if the array data are combined with SNP information. The combined data will help us to illustrate the roles of NIPBL and cohesin in monoallelic expression in the human genome, knowledge of which is currently lacking. ChIP–chip or ChIP–Seq studies to identify specific histone modifications in CdLS are also a needed next step to understand how chromatin structure is involved in human developmental disorders. Finally, animal models with complete or partial NIPBL or cohesin knockdown may help us to interpret cohesin-dependent cellular functions in vivo.
Supplementary Data are available at NAR Online.
The CdLS Foundation (a Fellowship Grant to J.L.); National Institutes of Health P01 HD052860 and R21 HD050538 (to I.D.K.); K08 HD055488 (to M.A.D.); Genome Network Project and Grant-in-Aid for Scientific Research (S) from the MEXT, Japan (to K.S.); Pennsylvania Department of Health (to N.B.S.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Funding for open access charge: National Institutes of Health (P01 HD052860 and R21 HD050538 to I.D.K.).
Conflict of interest statement. None declared.
We are grateful for the participation of the children and families with CdLS and to the CdLS Foundation for their support.