DNA methylation and chromatin DNaseI sensitivity were analyzed in and adjacent to D4Z4 repeat arrays, which consist of 1 to ~100 tandem 3.3-kb units at subtelomeric 4q and 10q. D4Z4 displayed hypomethylation in some cancers and hypermethylation in others relative to normal tissues. Surprisingly, in cancers with extensive D4Z4 methylation there was a barrier to hypermethylation spreading to the beginning of this disease-associated array (facioscapulohumeral muscular dystrophy, FSHD) despite sequence conservation in repeat units throughout the array. We infer a different chromatin structure at the proximal end of the array than at interior repeats, consistent with results from chromatin DNaseI sensitivity assays indicating a boundary element near the beginning of the array. The relative chromatin DNaseI sensitivity in FSHD and control myoblasts and lymphoblasts was as follows: a non-genic D4Z4-adjacent sequence (p13E-11, array-proximal) > untranscribed gene standards > D4Z4 arrays > constitutive heterochromatin (satellite 2; P < 10⁻⁴ for all comparisons). Cancers displaying D4Z4 hypermethylation also exhibited a hypermethylation-resistant subregion within the 3.3-kb D4Z4 repeat units. This subregion contains runs of G that form G-quadruplexes in vitro. Unusual DNA structures might contribute to topological constraints that link short 4q D4Z4 arrays to FSHD and make long ones phenotypically neutral.
Overall hypomethylation in some cancer specimens and hypermethylation in others was found in NBL2, a tandem array of 1.4-kb repeat units, in a comparison to diverse somatic control tissues. We demonstrated this in ovarian carcinomas and Wilms tumors by hairpin genomic sequencing to analyze methylation at 14 CpG dyads and by Southern blotting to assess overall NBL2 methylation at CpG-containing restriction sites (1). NBL2 is located mostly in the short arms of the acrocentric chromosomes. This large tandem array, as well as others, including pericentromeric satellite 2 DNA and subtelomeric D4Z4 repeat arrays, requires DNA methyltransferase 3B (DNMT3B) for normal levels of methylation (2). Satellite 2 and other human satellite DNAs are very frequently hypomethylated in diverse cancers compared with a wide variety of control tissues (3). There is clearly much plasticity in the epigenetics of these tandem repeat arrays.
The epigenetics of D4Z4 is of great interest because of the linkage of a short array of these tandem 3.3-kb repeats at 4q35 (Figure 1) to facioscapulohumeral muscular dystrophy (FSHD), a severely debilitating, dominantly inherited disease (4). In the present study, we examined DNA methylation and chromatin structure in D4Z4 and the immediately proximal sequence in various cell populations to gain insights into this array, whose structure must play a central role in FSHD. The very high G+C content of these repeat arrays, 73%, as contrasted with the neighboring 1-kb proximal sequence, which has only 40% G+C, suggests biologically important epigenetic features of the proximal border of the array.
Contraction of the D4Z4 array is tightly linked to the disease in an unelucidated manner. About 95% of FSHD patients have only 1–10 tandem 3.3-kb repeat units in one of the two allelic 4q35 arrays (5,6). Almost all unaffected individuals have 11–100 of these repeat units on both 4q35 arrays. Analysis of the epigenetics of D4Z4 is facilitated by its unusually highly conserved primary structure, compared to many other types of tandem repeats, including satellite DNAs and NBL2. D4Z4 also offers the advantage of having at its proximal border a well-characterized 0.8-kb sequence, p13E-11 (Figure 1A), a hybridization probe for molecular diagnosis of FSHD (7–9).
FSHD is an enigmatic disease. There are subtelomeric D4Z4 arrays at 10q as well as 4q, and both are size polymorphic but only 4q arrays are linked to the disease. Short 10q arrays are phenotypically neutral despite 4q and 10q D4Z4 arrays having 99% homology (GenBank AF117653, AC126281, AL732375). Moreover, there is >95% homology in the 42 kb proximal to D4Z4 at 4q35 and 10q26 and also in the 15–25 kb at 4q35 and 10q26 that is distal to D4Z4 (Figure 1A) all the way to the telomeric TTAGGG repeats (5). FSHD pathogenesis is probably initiated by a short D4Z4 array interacting with a distant 4q35 sequence that is proximal to the array and not present on 10q26 (10). The identity of this sequence is unclear. From three reports of conventional microarray expression analyses on FSHD versus control muscle cell populations (11–13), no candidate FSHD gene present on 4q35, but not 10q26, has emerged. Moreover, 4q35 genes, including a gene-like sequence within the D4Z4 repeat unit, individually tested for involvement in FSHD have produced conflicting results (8,11,14–17).
Because DNA methylation and chromatin structure are often interrelated (18), insights into D4Z4 chromatin might be derived from the study of its methylation patterns. Shortened arrays in FSHD patients are significantly hypomethylated relative to D4Z4 arrays from controls (19). However, they are not as hypomethylated as the normal-length D4Z4 arrays of ICF patients (20,21), who do not show any tendency to muscular disorders. Nonetheless, the FSHD-associated hypomethylation of D4Z4 might be a marker of a less condensed chromatin structure in disease-related short arrays (5).
Several groups reported D4Z4 hypomethylation in cancers, cancer-derived cell lines, and cells treated with demethylating agents or deficient in DNA methyltransferase (22–24). However, interpretation of those studies is complicated by their being PCR-based methylation analyses in which the investigators probably mostly detected highly abundant D4Z4-like sequences rather than specifically D4Z4. In contrast, blot hybridization under stringent conditions provides specificity for D4Z4 arrays (25) and offers the important advantage that a single blot can be serially hybridized with different probes for comparison of DNA methylation or chromatin DNase sensitivity at different sequences. Southern blotting also allows individual assessment of 4q versus 10q D4Z4 arrays by use of one of the rare sequence variations at a restriction site that is 4q- or 10q-specific (26,27).
In this study, we observed that D4Z4 was hypermethylated in some cancers and hypomethylated in others of the same type compared with a diverse set of somatic control tissues. Importantly, the first D4Z4 repeat unit behaved very differently from the rest of the D4Z4 array with respect to changes in DNA methylation. Moreover, the immediately proximal chromatin had a very different accessibility to DNaseI than the array, suggesting a different chromatin structure at the proximal border than in the body of the array. Our results are consistent with a model that we present for the biological importance of the proximal end of the D4Z4 array and a special chromatin structure within the array that could be sensitive to array length in an FSHD-determining manner.
With IRB approval, primary tumor samples were obtained from patients who had not been treated with chemotherapy prior to surgery (28). DNA was purified by phenol extraction and ethanol precipitation from quick-frozen samples of tumor tissue or from autopsy samples from non-cancer patients or trauma victims.
Lymphoblastoid cell lines (LCLs 149, 724, 112 and 952) were from normal controls (AG14953 and GM18478, Coriell Institute for Medical Research) or FSHD patients (GM17724, Coriell Institute, and WJ952, from Dr Silvere van der Maarel) and propagated in RPMI 1640 (Invitrogen) with 15% fetal bovine serum (FBS, Invitrogen). Normal-control skin fibroblast cell strains 996 and CF702 (GM01996, Coriell Institute, and Tulane Health Science Center, respectively) were derived from skin punch biopsies from non-FSHD patients and grown in MEM (Invitrogen) with 15% FBS and 2 mM L-glutamine. The FSHD myoblast cell strain (FM41j) was derived from a moderately affected, left deltoid biopsy of an FSHD patient. Duly signed patient consent forms were obtained that had been approved by the Institutional Review Boards of Tulane Health Science Center and the University of Mississippi Medical Center in Jackson. The fetal myoblast cell strain (H246) from Dr Stephen Hauschka was derived in 1980 from an anonymous, therapeutically aborted E-79 fetus under a signed patient consent form approved by the University of Washington's Human Subjects Committee. Myoblasts were propagated at 37°C in 5% CO2 in F-10 medium (Invitrogen) containing 20% FBS, 1 μM dexamethasone, 10 ng/ml basic fibroblast growth factor (Invitrogen), 100 U/ml penicillin and 100 μg/ml streptomycin in flasks coated with 0.7% gelatin; for H246, the medium was supplemented with 1.3 mM CaCl2. They were passaged when no more than about 80% confluent (about 2.5 × 10⁶ cells per T75 flask). Cultures used for experiments were checked by immunocytochemistry to have >90% desmin-positive cells.
Either cells (10⁸ cells) that were permeabilized (LCLs) or nuclei that were gently isolated (LCLs, myoblasts and fibroblasts) were treated with 0–200 U/ml of DNaseI as previously described (29,30) using minor modifications given in the Supplementary Data. Then the DNA was purified by RNase treatment and organic solvent extraction and stored in 10 mM Tris–HCl, pH 7.4, 0.1 mM EDTA, pH 8.0.
DNA was digested with restriction endonucleases and probes were prepared by PCR or directly by excision from recombinant plasmids as described previously (25) or in the Supplementary Data. Complete digestion with restriction endonucleases was verified for each sample with an internal plasmid or phage DNA control (28). When more than one probe was used with a blot, the lower-copy-number probes were used first and the blots were checked for complete stripping before hybridizing to the next probe.
For methylation analysis, tumors were scored for the approximate extent of methylation change relative to the somatic controls by phosphorimager quantitation and assessment of band patterns from X-rays by methods similar to those used previously (28). Scores for cancer DNA methylation (methylation change) relative to diverse somatic control tissues in the same blot were as follows: no appreciable change (0), progressively more hypermethylation (+1, +2 or +3), or more hypomethylation (−1, −2 or −3); −3 was strong hypomethylation, like that of sperm, and +3 was little or no digestion with the CpG methylation-sensitive restriction endonuclease (28). Some blots were hybridized first to the p13E-11 probe (7), followed by stripping and rehybridizing to the D4Z4 probe. Where indicated, the percent methylation was quantitated from the phosphorimager results of the relevant bands, with normalization, when necessary, for the percent overlap with the probe. Methods for statistical analyses of the methylation data are described in Supplementary Data.
For chromatin DNaseI sensitivity assays, the percent remaining DNA (R) in the StyI parent fragment was calculated from phosphorimager data as 100× the signal in the parent fragment after DNaseI treatment divided by that in the parent fragment of the analogous sample incubated with buffer instead of DNaseI. Where indicated, R values for D4Z4, p13E-11, B2M, HMBS and IL-2 sequences were normalized to the expected results for a parent band of 1.5 kb, the size of the CST5 and GHRHR parent bands. This was done for each concentration of DNaseI in each blot by using the near-linear slope of the plot of the size of chromosome 1 satellite 2 (Sat2) parent bands (1.3, 1.7, 2.3 and 3.9 kb) versus R for each of those bands. From the slope, R values were extrapolated for hypothetical Sat2 parent bands of 1.5 kb (R_{1.5-kb Sat2}) and of the size of the restriction fragment of interest. For B2M, with its 1.2-kb StyI parent fragment, the normalization was R_{B2M} × R_{1.5-kb Sat2} / R_{1.2-kb Sat2}. After quantitation, the blots were stripped and checked for complete removal of signal by phosphorimager analysis before hybridizing to the next probe.
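The size normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, an ordinary least-squares line stands in for the "near-linear slope" of the Sat2 plot, and the numerical values in the usage note are invented for the example rather than taken from the blots.

```python
def percent_remaining(signal_dnase, signal_buffer):
    """R: percent of parent-fragment signal left after DNaseI treatment."""
    return 100.0 * signal_dnase / signal_buffer


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize_r(r_observed, fragment_kb, sat2_sizes_kb, sat2_r, reference_kb=1.5):
    """Rescale R for a fragment of fragment_kb to a 1.5-kb reference size.

    The Sat2 bands (size versus R) calibrate how R depends on fragment
    size at one DNaseI concentration in one blot.
    """
    slope, intercept = fit_line(sat2_sizes_kb, sat2_r)
    r_reference = slope * reference_kb + intercept  # R for a 1.5-kb Sat2 band
    r_same_size = slope * fragment_kb + intercept   # R for a Sat2 band of fragment_kb
    return r_observed * r_reference / r_same_size


# Hypothetical Sat2 calibration points for the four parent bands (illustrative only).
SAT2_SIZES_KB = [1.3, 1.7, 2.3, 3.9]
SAT2_R = [80.0, 76.0, 70.0, 54.0]
```

With these invented calibration values, `normalize_r(40.5, 1.2, SAT2_SIZES_KB, SAT2_R)` rescales a hypothetical B2M reading for its 1.2-kb parent fragment to the 1.5-kb reference size.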
D4Z4-derived oligonucleotides (5 μM) were heated at 100°C for 10 min, immediately placed on ice, incubated at 37°C for 48 h in 0.1 M KCl, 10 mM Tris-HCl, pH 7.4, and analyzed by circular dichroism spectropolarimetry (CD; Jasco-810, Easton, MD) at 37°C. A 1-mm optical path length, a scanning speed of 50 nm/min, and a response time of 4 s were used for the following oligonucleotides: PQS6, 5′-TGGGGGGGGGGGGTGGGGGGGGA-3′; PQS6-mut, 5′-TGCGAGTGCGAGGTGAGCGTGTA-3′; and PQS5, 5′-AGGGGGAGAGGGGGGAGGGGGGAGGGGGGCGC-3′. The average of three scans was plotted after subtracting the buffer baseline. Non-denaturing polyacrylamide gel electrophoresis (PAGE) was done in a 20% gel at ambient temperature after labeling PQS6 or PQS6-mut with [γ-32P]ATP and polynucleotide kinase and then incubating the oligonucleotide as above.
To examine the nature of epigenetic changes in D4Z4 repeat arrays in cancers relative to various somatic control tissues, DNA samples were digested with the following CpG methylation-sensitive enzymes: EagI (5′-CGGCCG-3′), SmaI (5′-CCCGGG-3′), MluI (5′-ACGCGT-3′), BstUI (5′-CGCG-3′), HpaII (5′-CCGG-3′), HpyCH4IV (5′-ACGT-3′), or BsaAI (5′-YACGTR-3′). Either single digests (EagI, SmaI and MluI) or double digests (BstUI, HpaII, HpyCH4IV and BsaAI plus the CpG methylation-insensitive enzymes KpnI or BglII) were used. After checking all samples for complete digestion, they were blot-hybridized to a 1-kb D4Z4 subfragment under stringent conditions specific for D4Z4 that did not give interference from cross-hybridizing sequences (25). All eight normal postnatal somatic tissues exhibited similar blot-hybridization patterns indicating considerable, but incomplete, methylation of D4Z4. In contrast, the 17 analyzed ovarian epithelial carcinomas and 44 Wilms tumors displayed large differences in D4Z4 methylation (Supplementary Figures 1–3; with summary data from representative samples in Figure 2). As shown in the summary of data from these 61 cancers (Table 1), most specimens from both of these diverse types of cancers were either significantly hypo- or hypermethylated at D4Z4 EagI and SmaI sites relative to somatic control tissues (P = 0.015 and 0.00003 for ovarian carcinomas and P = 0.0006 and 0.00003 for Wilms tumors, respectively). There was a strong association between alterations in D4Z4 methylation at EagI and SmaI sites in both ovarian carcinomas (P < 10⁻⁶) and Wilms tumors (P < 10⁻¹⁴). Cancer-linked methylation changes at the other examined CpG-containing restriction sites for a given tumor were also very similar to those at EagI and SmaI sites (Supplementary Figures 2A and 3A and data not shown).
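As an aside, the distribution of these recognition sites in any sequence of interest can be tabulated with a short script. The sketch below is not part of the study's methods: the helper names are hypothetical, the recognition sequences are those listed above, and the toy sequence is invented. Because every listed site is palindromic, scanning one strand is sufficient.

```python
import re

# Recognition sequences as listed in the text (5'->3'); Y = C or T, R = A or G.
ENZYME_SITES = {
    "EagI": "CGGCCG",
    "SmaI": "CCCGGG",
    "MluI": "ACGCGT",
    "BstUI": "CGCG",
    "HpaII": "CCGG",
    "HpyCH4IV": "ACGT",
    "BsaAI": "YACGTR",
}

# Map each IUPAC symbol used above to a regular-expression fragment.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}


def site_positions(sequence, recognition):
    """Return 0-based start positions of a recognition sequence.

    A lookahead is used so that overlapping sites (e.g. CGCGCG for BstUI)
    are all reported.
    """
    pattern = "".join(IUPAC[base] for base in recognition)
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


def count_sites(sequence):
    """Count sites for every enzyme in ENZYME_SITES."""
    return {name: len(site_positions(sequence, site))
            for name, site in ENZYME_SITES.items()}


# Invented toy sequence containing one site for each enzyme.
TOY_SEQUENCE = "TTCGGCCGAACCCGGGTTACGCGTAATACGTGTT"
```

Running `count_sites` on a real D4Z4 repeat unit (e.g. from the GenBank entries cited in the text) would give the per-unit site frequencies that determine the expected fragment sizes in each digest.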
We also found that methylation changes in the subtelomeric 4q and 10q D4Z4 arrays were significantly associated with those in the NBL2 repeat arrays, which are located mostly in the short arms of the acrocentric chromosomes. For this analysis, tumor-linked methylation changes at EagI sites in the D4Z4 arrays were compared with those previously described for HhaI (5′-GCGC-3′) sites in NBL2 (1) using Southern-blot-derived methylation scores that had been validated by hairpin genomic sequencing (1). These sites are present at similar densities (5 and 3 per 3.3- and 1.4-kb repeat unit, respectively) and were analyzed in overlapping sets of tumors (Table 1). This analysis included borderline carcinomas (low malignant potential tumors) and benign tumors (cystadenomas) of epithelial origin (Supplementary Data). Alterations in D4Z4 methylation in the group of all examined tumors were significantly associated with those of NBL2 (P < 0.0001, Kendall's tau correlation). These two dissimilar types of repeat arrays are probably both recognized by the poorly understood pathways for demethylation and de novo methylation that function during tumorigenesis, just as both tandem repeats are dependent on DNMT3B for normal levels of methylation (1,20).
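For readers unfamiliar with Kendall's tau, the following sketch shows how a rank correlation between two sets of ordinal methylation scores (−3 to +3) can be computed. It is an illustration only: the tau-b variant (which corrects for ties) is an assumption, since the statistical details are given in the Supplementary Data, and the function name is hypothetical. No significance test is included.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length lists of ordinal scores.

    Pairs tied in both variables are ignored; pairs tied in only one
    variable enter the denominator, which is the tau-b tie correction.
    Raises ZeroDivisionError if either variable is constant.
    """
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator
```

Each tumor would contribute one pair of scores, for example its D4Z4 EagI score and its NBL2 HhaI score; a tau near +1 indicates that tumors hypermethylated at one array tend to be hypermethylated at the other.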
Several of the cancers had extremely high levels of methylation in consecutive 3.3-kb repeat units of D4Z4 arrays (Supplementary Figures 1–3). For example, the amount of D4Z4 uncleaved by BstUI, HpaII or HpyCH4IV in double digests with the CpG methylation-insensitive KpnI was 52–88% for Wilms tumor 11 (WT11) as compared with 7–14% for control brain DNA, even though BstUI and HpaII sites are respectively about 6 and 12 times as frequent as HpyCH4IV sites in D4Z4 (Supplementary Figure 2A). In single digests of WT11 or WT52 DNA with BstUI or HpyCH4IV, most of the D4Z4 signal was in fragments of more than 10 kb (Figure 2 and data not shown). This indicates complete methylation at these sites for at least three contiguous repeat units. Because BstUI has 37 recognition sites per D4Z4 repeat unit, the long fragments had more than 100 contiguous methylated sites. Other, much smaller, hybridizing fragments were also present indicative of close unmethylated sites in other portions of D4Z4. Not only these two cancer DNA digests but also many other SmaI digests of control and cancer DNAs displayed significantly non-random distributions of D4Z4 fragment sizes consistent with long regions of these arrays being highly methylated and other regions having little methylation in the same sample (Supplementary Figure 1 and Supplementary Data). These observations and the associations between methylation at different restriction sites in D4Z4 suggest that methylation spreads along the arrays during normal development and during tumorigenesis with long regions of complete methylation and interspersed regions of considerably lower methylation.
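The arithmetic behind the estimate of more than 100 contiguous methylated sites can be made explicit. The sketch below assumes, as in the text, 37 BstUI sites per 3.3-kb repeat unit and counts only whole repeat units spanned by an uncut fragment; the function name is hypothetical.

```python
import math

REPEAT_UNIT_KB = 3.3        # size of one D4Z4 repeat unit
BSTUI_SITES_PER_UNIT = 37   # BstUI (CGCG) sites per repeat unit, from the text


def min_contiguous_methylated_sites(fragment_kb,
                                    unit_kb=REPEAT_UNIT_KB,
                                    sites_per_unit=BSTUI_SITES_PER_UNIT):
    """Lower bound on methylated sites in an uncut fragment.

    A fragment that resists a methylation-sensitive enzyme must be
    methylated at every site it spans; only whole repeat units are counted.
    """
    whole_units = math.floor(fragment_kb / unit_kb)
    return whole_units, whole_units * sites_per_unit
```

For a 10-kb fragment this gives three whole repeat units and 3 × 37 = 111 sites, consistent with the statement that the long fragments contained more than 100 contiguous methylated sites.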
One subregion within the 3.3-kb D4Z4 repeat unit displayed resistance to hypermethylation. An unexpectedly strong D4Z4-hybridizing band at the position of 1.4-kb fragments was seen in BstUI/KpnI and HpaII/KpnI digests from the five examined cancers that had hypermethylation of D4Z4 (WT11, WT52, WT5 and ovarian carcinomas Q and S, Figure 2 and Supplementary Figure 2A and B). This prominent band cannot be explained by trivial reasons, e.g. the distribution of BstUI and HpaII sites in D4Z4 (37 and 71 per 3.3-kb repeat unit, respectively; Figure 3A). It was also observed in BstUI/KpnI and HpaII/KpnI digests of somatic control DNAs, although at a lower intensity, but not in cancer or control samples singly digested with BstUI or HpaII. Therefore, this 1.4-kb fragment should be derived from cleavage at the single KpnI site in each D4Z4 repeat unit and a subregion of the repeat unit located about 1.4 kb distal (Figure 3A). This subregion of the D4Z4 repeat unit is apparently partly resistant to hypermethylation in cancer.
The primary structure of D4Z4 in this subregion is very unusual (Figure 3B). There is an uninterrupted run of 37 pyrimidine residues (Y₃₇) on the forward strand adjacent to HpaII and BstUI sites located 1395 and 1409 nt, respectively, after the KpnI site of the 3.3-kb D4Z4 repeat unit. Importantly, (G₅N₂₋₃)₄, a potential guanine quadruplex sequence (PQS5, Figure 3A and B), is complementary to 27 nt of the Y₃₇ sequence. Nearby is a (G₃N₂)₄ sequence (PQS6) that also fits the PQS consensus sequence of four equally long runs of G in the following context: G₃₋₅N₁₋₇G₃₋₅N₁₋₇G₃₋₅. PQS5, which is present 1412 nt after the KpnI site, scores especially high in a PQS prediction program for intramolecular G-quadruplexes (31). Each 3.3-kb D4Z4 repeat unit contains seven PQS that match the above consensus sequence (Figure 3A; GenBank AF117653 and AC126281).
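A simple pattern search illustrates how candidate PQS can be recognized. The sketch below is not the prediction program cited in the text (31). It assumes the commonly used four-run form of the consensus (four runs of 3–5 G separated by loops of 1–7 nt of any base), does not enforce equal run lengths, and applies the pattern to the three oligonucleotides given in the methods; the names `PQS_PATTERN` and `has_pqs` are hypothetical.

```python
import re

# Assumed four-run consensus: G(3-5) N(1-7) G(3-5) N(1-7) G(3-5) N(1-7) G(3-5).
# Loop positions may be any base, including G.
PQS_PATTERN = re.compile(
    r"G{3,5}[ACGT]{1,7}G{3,5}[ACGT]{1,7}G{3,5}[ACGT]{1,7}G{3,5}"
)

# Oligonucleotide sequences as given in the methods (5'->3').
OLIGOS = {
    "PQS6": "TGGGGGGGGGGGGTGGGGGGGGA",
    "PQS6-mut": "TGCGAGTGCGAGGTGAGCGTGTA",
    "PQS5": "AGGGGGAGAGGGGGGAGGGGGGAGGGGGGCGC",
}


def has_pqs(sequence):
    """True if the sequence contains a match to the assumed PQS consensus."""
    return PQS_PATTERN.search(sequence.upper()) is not None


if __name__ == "__main__":
    for name, seq in OLIGOS.items():
        print(name, has_pqs(seq))
```

Under these assumptions PQS5 and PQS6 match the pattern, whereas PQS6-mut, in which the G runs are interrupted, does not. A sequence match only flags a candidate; whether a quadruplex actually forms has to be tested experimentally.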
By CD spectroscopy, a single-stranded PQS5 or PQS6 oligonucleotide had a strong positive peak with a maximum at 263 nm and a negative peak centered around 243 nm (Figure 3C). A positive CD peak at about 260 nm and a negative peak at about 240 nm are diagnostic for G-quadruplexes, usually of the parallel type (32) (Figure 3D). PQS6-mut, an oligonucleotide derived from PQS6 but with nine substitutions of C, A or T for G residues, gave a non-specific CD spectrum (Figure 3B and C). Radiolabeled PQS6 and PQS6-mut oligonucleotides were analyzed by non-denaturing PAGE. In addition to a band that migrated as expected for the unstructured oligonucleotide, about half of the signal from PQS6 was in low-mobility bands indicating intermolecular G-quadruplexes (33) (data not shown). For PQS6-mut, only one band with the expected mobility was seen.
We were able to compare methylation in the very beginning of the D4Z4 array, in the bulk of this array, and immediately proximal to it by hybridizing blots to a p13E-11 probe, which is a D4Z4-adjacent sequence (Figure 1), and then to the D4Z4 probe. This was done for BstUI/KpnI, HpaII/KpnI, HpyCH4IV/KpnI and BsaAI/BglII digests of cancer and control DNAs (Supplementary Figures 2 and 3). While D4Z4 is extremely rich in CpGs (9.9% CpG), there are only six CpG sites (0.6% CpG) in the 930-bp region immediately proximal to D4Z4. This proximal region encompasses the 0.8-kb p13E-11 sequence (Figure 1). Two of the six proximal CpGs are in HpyCH4IV and BstBI recognition sites and were analyzed by Southern blotting (Supplementary Figure 2). Surprisingly, in cancers with overall D4Z4 hypermethylation relative to somatic controls, there was resistance to hypermethylation, and even decreased methylation, in the first approximately 2 kb of the array and immediately proximal to it (Figure 2). A BstUI site located 2.1 kb proximal to the array was the exception (Figure 2). It was tenaciously methylated in all cell populations, including sperm and WT22, which were strongly hypomethylated at D4Z4 compared to somatic controls (Figure 2 and Supplementary Figure 2C and D). Conventional hairpin structures with T-A or T-G base-pairing might form from the T₂₋₅A₄₋₅ sequence and the (TG)₈ sequence 10 bp 5′ and 25 bp 3′ to this site, respectively (Supplementary Figure 2E). This might help recruit DNA methyltransferases (34) to keep the site methylated.
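The base-composition figures quoted for D4Z4 and its proximal flank can be computed for any sequence as sketched below. One assumption is made explicit here: "% CpG" is taken to mean the number of CpG dinucleotides per 100 nt of sequence, which reproduces the quoted value for the proximal region (6 CpGs in 930 bp is about 0.6%). The function names are hypothetical.

```python
def gc_percent(sequence):
    """Percent of bases that are G or C."""
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_percent(sequence):
    """CpG dinucleotides per 100 nt (assumed definition of '% CpG')."""
    seq = sequence.upper()
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return 100.0 * seq.count("CG") / len(seq)


def cpg_percent_from_counts(n_cpg, length_bp):
    """Same measure when only the CpG count and region length are known."""
    return 100.0 * n_cpg / length_bp
```

For example, `cpg_percent_from_counts(6, 930)` evaluates to roughly 0.65, in line with the 0.6% given for the 930-bp proximal region.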
Given the unusual DNA methylation changes that we found in and adjacent to this array in cancer and the linkage of D4Z4 arrays to FSHD, we studied the conformation of chromatin in these regions by DNaseI sensitivity assays on cultured cells. Chromatin standards (Supplementary Table 1) and the restriction enzyme for digestion (StyI) of isolated DNA from DNaseI-treated nuclei or permeabilized cells were chosen to allow comparison of test sequences and euchromatin and heterochromatin standards in a single Southern blot. The test sequences were 4q35 D4Z4, which is linked to FSHD; 10q26 D4Z4, which is phenotypically neutral; and p13E-11, the sequence immediately proximal to D4Z4 (Figure 1). Because FSHD is mostly a skeletal muscle disease, it was important to test cells from the muscle lineage, namely, FSHD and control myoblast cell strains. These non-transformed cell cultures are generally difficult to obtain, generate and propagate and have limited growth potential before irreversible growth arrest. Moreover, large numbers of cells (10⁸) are required for one chromatin DNaseI sensitivity assay monitored by Southern blotting. Therefore, we first tested lymphoblastoid cell lines (LCLs) and control fibroblast cell strains. Lysolecithin efficiently permeabilized LCLs, but not fibroblasts, to DNaseI (Supplementary Figure 4A). We were able to study the chromatin DNaseI sensitivity of fibroblasts and myoblasts by incubating gently isolated nuclei with DNaseI and adjusting DNaseI digestion conditions. A comparison of permeabilized LCL cells and LCL nuclei gave similar results for relative DNaseI sensitivity of the analyzed DNA sequences (Supplementary Table 2).
The relative DNaseI sensitivity of standard and test sequences in FSHD and control myoblasts, LCLs and fibroblasts was as follows: constitutively transcribed gene standards (B2M and HMBS) > D4Z4-proximal p13E-11 marker > untranscribed gene standards (CST5, IL-2 and GHRHR) > 4q and 10q D4Z4 > Sat2 juxtacentromeric heterochromatin in chromosome 1 (Figures 4 and 5 and Supplementary Figures 4B and 5). B2M and HMBS are highly transcribed in all studied tissues while CST5, IL-2 and GHRHR are selectively transcribed in only a small number of cell types, not including those in this study (http://www.genecards.org/ and http://expression.gnf.org/cgi-bin/index.cgi). The differences in DNaseI digestion for untranscribed gene standards versus transcribed gene standards and constitutive heterochromatin versus untranscribed gene standards were significant (P < 10⁻⁴). It was very surprising that p13E-11, which is only 0.1 kb from D4Z4 (Figure 1) and has no features of a gene (http://exon.gatech.edu/GeneMark/, http://www.softberry.com/berry.phtml, and http://genes.mit.edu/GENSCAN.html), was more sensitive to DNaseI than the untranscribed gene standards in all tested myoblast, fibroblast and lymphoblastoid cell or nuclei populations (P < 10⁻⁴).
In every studied cell type, both the 3.3- and 2.8-kb D4Z4 parent bands from 4q35 and 10q26, respectively (Figure 1B), were much more resistant to DNaseI than the 1.9-kb region containing the p13E-11 sequence (P < 10⁻⁵; Figures 4 and 5). In addition, the D4Z4 arrays were more resistant to DNaseI than untranscribed gene standards and less resistant than the constitutive heterochromatin standard in every FSHD and control cell population (P < 10⁻⁴ for both comparisons). Similar results were obtained with or without normalization for the somewhat different sizes of the parent fragments (1.2–3.3 kb; Supplementary Table 2). No DNaseI hypersensitive sites (discrete lower-molecular-weight bands) were seen in the promoter regions or other tested sequences. However, this might be due to optimizing conditions for measuring general DNaseI sensitivity rather than hypersensitive sites.
Within each of the two studied groups of similar standards, namely, the untranscribed and the constitutively transcribed gene standards, smaller but also significant differences were seen (P < 10⁻³, Figures 4 and 5). CST5 was slightly more resistant to DNaseI than GHRHR in all samples although both genes should be untranscribed in the examined cells. Similarly, the ubiquitously transcribed B2M was always appreciably more resistant to DNaseI than the other constitutively expressed standard, HMBS. This might reflect gene-specific differences in chromatin structure even when comparing genes with the same expression status. Given the reproducibility of these findings among different cell populations, it is noteworthy that 10q D4Z4 chromatin was generally a little more resistant to DNaseI than 4q D4Z4 chromatin (P = 10⁻³, Figures 4 and 5).
The length of the D4Z4 arrays needs to be considered in evaluation of their chromatin DNaseI sensitivity because of the frequent correlations between long tandem repeat arrays and heterochromatinization (35,36) and the effect of the size of 4q D4Z4 arrays on disease status. We looked for correlations of DNaseI chromatin resistance and D4Z4 array size as determined by pulsed-field gel electrophoresis (25). Interpretation of the results is complicated by large D4Z4 arrays contributing proportionally more signal than short arrays and the highly polymorphic sizes of these arrays. The most informative of our studied cell populations to look for an effect of D4Z4 array length on the DNaseI resistance was the control LCL 724. It has two short 10q D4Z4 arrays (6- and 10-unit) and two long allelic 4q arrays (both 27 repeat units). The 2.8-kb band from 10q D4Z4 was well separated from the 3.3-kb band from 4q D4Z4. Because 10q and 4q D4Z4 arrays are almost identical in sequence (26), if the chromatin structure of 10q D4Z4 is dependent on the length of the array, this is probably also true for 4q D4Z4. Contrary to a simple correlation between array size and DNaseI resistance, the 10q and 4q D4Z4 arrays of LCL 724 had virtually the same DNaseI resistance (Figure 5 and Supplementary Table 2).
A major unanswered question about the molecular genetics of FSHD is how a tandem array of more than ten 3.3-kb D4Z4 units at both 4q35 allelic positions is almost always protective against the disease while one 4q35 D4Z4 array with ten or fewer repeat units usually indicates the presence of the disease by the third decade (37). The in situ ‘ruler’ or biochemical device for measuring D4Z4 array size to establish disease status is likely to involve topological differences between a short array and one containing more than ten units. However, the nature of D4Z4 chromatin is little understood.
We previously reported on histone H4 acetylation of D4Z4 and proximal sequences (8). Histone acetylation is often inversely related to local chromatin compaction (38), although not always (39). D4Z4 chromatin in several human chromosome 4-containing somatic cell hybrids has low levels of H4 acetylation, but not as low as those of a heterochromatin standard. Those PCR-dependent experiments were confined to somatic cell hybrids because of the lack of primers specific for D4Z4 (8). Histone acetylation at p13E-11, the immediately proximal sequence to D4Z4, was at levels similar to those of untranscribed genes rather than to constitutive heterochromatin.
Here we analyzed the general DNaseI sensitivity of D4Z4 and the immediately proximal region under blot-hybridization conditions specific for these sequences (7,25). The DNaseI sensitivity of both 4q and 10q D4Z4 chromatin was intermediate to that of constitutive heterochromatin and inactive genes in FSHD and control myoblasts, fibroblasts and lymphoblastoid cell lines. A caveat is that in comparing short, disease-associated D4Z4 arrays with normal, long D4Z4 arrays, the contribution of the short array to the blot-hybridization signal is obscured by the proportionally greater contribution of the long arrays. However, in a control LCL (724, Figure 5), two short allelic D4Z4 arrays at 10q (6 and 10 repeat units) were not more sensitive to DNaseI than two long 4q D4Z4 arrays (both 27 repeat units) or than untranscribed gene standards. These results suggest a high degree of condensation of D4Z4 chromatin even in short, FSHD-sized repeat arrays, with the caveat that DNaseI sensitivity measures only certain aspects of higher-order chromatin structure.
It has been proposed that heterochromatinization spreads proximally from a normal, long D4Z4 array to a 4q35 gene responsible for initiating the FSHD dysregulation cascade (14,40,41). Contrary to that hypothesis or the idea that long D4Z4 arrays might normally act as insulators (5,41), the region immediately proximal to the D4Z4 array (p13E-11) was more sensitive to DNaseI than the bulk D4Z4 array in all tested cell cultures. Unexpectedly, this D4Z4-adjacent region consistently displayed a DNaseI sensitivity that was even greater than that of untranscribed gene standards. We found no predicted promoter, enhancer, or exon in the 4-kb region proximal to the 4q or 10q D4Z4 arrays that could explain its unexpectedly high sensitivity to DNaseI even in control fibroblasts containing four long D4Z4 arrays (18–30 repeat units per array). The nuclease sensitivity in this region might reflect a local strain in the chromatin conformation.
We could monitor the DNaseI sensitivity of the D4Z4-proximal region and the average sensitivity of the D4Z4 array but not specifically that of the first D4Z4 repeat unit. However, by studying cancer-linked DNA methylation changes, we found evidence for an atypical chromatin structure at the beginning of the D4Z4 array and immediately proximal to it. The apparent spreading of methylation along D4Z4 that we observed seems to have limits to its processivity and to be prone to stop at certain subregions. Specifically, there was a remarkable resistance to cancer-linked hypermethylation at the very beginning of the repeat array and at several CpG sites proximal to the array in cancers that displayed strong hypermethylation in the bulk of the array (Figure 2). The sequence of D4Z4 repeat units is highly conserved throughout the array. Therefore, the differential susceptibility to tumor-linked hypermethylation at the start of the array versus in the bulk of the array is best explained by a special chromatin structure at the junction of D4Z4 and the proximal non-repeated sequence.
The methylation and DNaseI sensitivity findings suggest that there is a boundary element (42) (Figure 6) around the junction of the D4Z4 array and the immediately proximal sequence. Naturally occurring deletions of this D4Z4-proximal region indicate that the exact sequence immediately proximal to D4Z4 cannot be contributing to the phenotype (43). However, there might be a critical interface between D4Z4 (first 200 bp, 78% G+C) and a DNA sequence with a base composition more typical of human DNA (immediately proximal 200 bp, 43.5% G+C).
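The G+C percentages quoted for the two 200-bp segments are simple base-composition measurements. A minimal sketch of that calculation is given here for illustration only; the function names and the toy sequence in the usage note are hypothetical and are not taken from the D4Z4 sequence or from the methods of this study.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def gc_windows(seq: str, size: int) -> list[float]:
    """G+C percentage of consecutive, non-overlapping windows of `size` bp.

    A trailing partial window is ignored (an arbitrary choice for this sketch).
    """
    if size <= 0:
        raise ValueError("window size must be positive")
    return [
        gc_percent(seq[start:start + size])
        for start in range(0, len(seq) - size + 1, size)
    ]
```

Applied with a 200-bp window across a junction sequence, `gc_windows` would show an abrupt change in composition of the kind described above; on a toy input such as `gc_windows("GGGGATAT", 4)` the two windows are entirely G+C and entirely A+T, respectively.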
The second region in which we observed resistance to cancer-linked hypermethylation is a subregion of the 3.3-kb D4Z4 repeat unit, about 1.4 kb distal to its single KpnI site. It has runs of G residues that are predicted to form stable G-quadruplexes (Figure 3B). We demonstrated that two of these sequences, PQS5 and PQS6, readily form G-quadruplexes when tested as single-stranded oligonucleotides. Inhomogeneous spreading of DNA methylation along the body of the D4Z4 arrays might be due partly to such unusual DNA secondary structures within the D4Z4 repeat units, which could hinder the spread of DNA methylation in vivo either directly or indirectly through effects on chromatin structure.
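Runs of G residues with quadruplex-forming potential can be located by a simple pattern search. The sketch below uses a widely used folding rule (four runs of at least three G separated by loops of 1 to 7 nucleotides); this rule is an assumption made for illustration, and it is not necessarily the prediction method that identified PQS5 and PQS6. The function name and the test sequence (a telomere-like repeat, not D4Z4) are likewise hypothetical.

```python
import re

# Assumed folding rule: G{3+} loop{1-7} G{3+} loop{1-7} G{3+} loop{1-7} G{3+}.
# Only the given strand is scanned; a C-rich strand would need to be
# reverse-complemented first.
PQS_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_pqs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for non-overlapping PQS matches.

    Coordinates are 0-based and end-exclusive, as in Python slicing.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(seq)]
```

Because `finditer` reports non-overlapping matches, a long G-rich tract is returned as a few large hits rather than every possible folding register; a search with lookahead would be needed to enumerate overlapping candidates.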
Various types of indirect evidence suggest that genomic G-quadruplexes, which are non-B DNA structures, exist in vivo and mediate transcriptional control, genomic stability and chromatin packing at telomeres (44,45). In certain functional classes of genes, including those associated with muscle development or contraction, PQS overrepresentation is especially high (46), which is further evidence for their biological relevance. G-quadruplexes can involve Hoogsteen base-pair interactions between runs of G residues within one local sequence on a DNA strand (Figure 3D) or between four equal-length runs of G residues on two or four different strands (44,47). They might also form between two runs of G residues in non-neighboring locations within the same region of a given strand and thereby help establish higher-order packing of chromatin (45).
We propose that G-quadruplex formation in the 3.3-kb D4Z4 repeat unit (Figure 3A) helps organize D4Z4 chromatin to give special intra-array looping in normal, long arrays (Figure 6). We further hypothesize that, within short arrays, abrogation of this intra-array looping allows long-distance looping between the beginning of the array and an FSHD master gene at 4q35 (Figure 6). The identity of this 4q35 gene is still uncertain despite many conventional analyses (8,14–16), including microarray expression studies (11–13). The postulated alternative looping to regulate expression of the FSHD master gene could explain why only short D4Z4 arrays at 4q35 are linked to FSHD: short arrays would permit long-distance looping that initiates inappropriate gene expression, which would account for the dominant inheritance of this disease.
The hypothesized boundary element at the first D4Z4 repeat unit in normal and FSHD cells (Figure 6) makes it easier to envision pathogenic long-distance chromatin interactions between a short D4Z4 array at 4q35 and another FSHD-related DNA sequence in cis. Given that FSHD is mostly a muscle-specific disease, it is intriguing that the myogenesis-specific homodimeric MyoD protein can recognize G-quadruplexes with a higher affinity than specific duplex sequences (48–50). Because this transcription factor preferentially binds to G-quadruplexes formed between runs of G on different oligonucleotide strands, the above model is an attractive hypothesis for explaining part of the perplexing near-threshold effect of D4Z4 array size on disease status, whereby a 4q35 D4Z4 array of <11 repeat units (<36 kb) is diagnostic for FSHD.
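The size threshold quoted above follows from the 3.3-kb length of the repeat unit: 11 units correspond to about 36.3 kb, which is rounded to 36 kb in the text. A minimal sketch of this arithmetic is shown below. The function and constant names are hypothetical, the calculation ignores any flanking sequence, and it is meant only to illustrate the unit-to-length conversion, not to serve as a diagnostic procedure.

```python
REPEAT_UNIT_KB = 3.3           # length of one D4Z4 repeat unit, from the text
SHORT_ARRAY_UNIT_LIMIT = 11    # arrays with fewer units than this are "short"


def array_length_kb(n_units: int) -> float:
    """Approximate length of a D4Z4 array with n_units tandem repeats."""
    if n_units < 1:
        raise ValueError("an array has at least one repeat unit")
    return n_units * REPEAT_UNIT_KB


def is_short_array(n_units: int) -> bool:
    """True if the array falls below the <11-unit threshold cited above."""
    return n_units < SHORT_ARRAY_UNIT_LIMIT
```

For example, the control arrays of 18 to 30 repeat units mentioned earlier would span roughly 59 to 99 kb by this calculation and would not be classified as short.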
We thank the Brain and Tissue Bank for Developmental Disorders at the University of Maryland for generously replacing tissues lost during the Hurricane Katrina flood in New Orleans, LA. Supported in part by NIH grant R01 NS048859 and a grant from the Louisiana Cancer Research Consortium. Funding to pay the Open Access publication charges for this article was provided by the FSH Society.
Conflict of interest statement. None declared.