A major unanswered question about the molecular genetics of FSHD is how tandem arrays of more than ten 3.3-kb D4Z4 units at both 4q35 allelic positions are almost always protective against the disease, whereas a single 4q35 D4Z4 array with ten or fewer repeat units usually indicates the presence of the disease by the third decade (37). The in situ ‘ruler’ or biochemical device for measuring D4Z4 array size to establish disease status is likely to involve topological differences between a short array and one containing more than ten units. However, the nature of D4Z4 chromatin is poorly understood.
We previously reported on histone H4 acetylation of D4Z4 and proximal sequences (8). Histone acetylation is often (38), although not always (39), inversely related to local chromatin compaction. D4Z4 chromatin in several human chromosome 4-containing somatic cell hybrids had low levels of H4 acetylation, but not as low as those of a heterochromatin standard. Those PCR-dependent experiments were confined to somatic cell hybrids because of the lack of primers specific for D4Z4 (8). Histone acetylation at p13E-11, the sequence immediately proximal to D4Z4, was at levels similar to those of untranscribed genes rather than those of constitutive heterochromatin.
Here we analyzed the general DNaseI sensitivity of D4Z4 and the immediately proximal region under blot-hybridization conditions specific for these sequences (7). In FSHD and control myoblasts, fibroblasts and lymphoblastoid cell lines (LCLs), the DNaseI sensitivity of both 4q and 10q D4Z4 chromatin was intermediate between that of constitutive heterochromatin and that of inactive genes. A caveat is that, when short, disease-associated D4Z4 arrays are compared with normal, long D4Z4 arrays, the contribution of the short array to the blot-hybridization signal is obscured by the proportionally greater contribution of the long arrays. However, in a control LCL (724), two short allelic D4Z4 arrays at 10q (6 and 10 repeat units) were not more sensitive to DNaseI than two long 4q D4Z4 arrays (both 27 repeat units) or than untranscribed gene standards. These results suggest a high degree of condensation of D4Z4 chromatin even in short, FSHD-sized repeat arrays, with the caveat that DNaseI sensitivity measures only certain aspects of higher-order chromatin structure.
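The signal-dilution caveat above can be made concrete with a small calculation. The following is a minimal sketch, assuming that the hybridization signal from each array scales linearly with its number of repeat units; the function name `signal_fractions` and the genotype in the example are hypothetical and are not data from this study.

```python
def signal_fractions(array_units):
    """Return each array's expected share of the total D4Z4 signal.

    Assumption (not from the study): signal is proportional to the
    number of repeat units in each array.
    """
    total = sum(array_units.values())
    if total <= 0:
        raise ValueError("total number of repeat units must be positive")
    return {label: units / total for label, units in array_units.items()}


# Hypothetical FSHD genotype: one contracted 4q array and three long arrays.
example = {"4q_short": 5, "4q_long": 30, "10q_a": 25, "10q_b": 20}
fractions = signal_fractions(example)
for label, share in fractions.items():
    print(f"{label}: {share:.1%} of total D4Z4 signal")
```

Under this assumption, a contracted array contributes only a small share of the combined signal whenever the other arrays in the same cell are long, which is why a cell line carrying short arrays that can be scored separately is informative.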
It has been proposed that heterochromatinization spreads proximally from a normal, long D4Z4 array to a 4q35 gene responsible for initiating the FSHD dysregulation cascade (14). Contrary to that hypothesis and to the idea that long D4Z4 arrays might normally act as insulators (5), the region immediately proximal to the D4Z4 array (p13E-11) was more sensitive to DNaseI than the bulk D4Z4 array in all tested cell cultures. Unexpectedly, this D4Z4-adjacent region consistently displayed a DNaseI sensitivity that was even greater than that of untranscribed gene standards. We found no predicted promoter, enhancer, or exon in the 4-kb region proximal to the 4q or 10q D4Z4 arrays that could explain this high sensitivity to DNaseI, which was seen even in control fibroblasts containing four long D4Z4 arrays (18–30 repeat units per array). The nuclease sensitivity in this region might reflect a local strain in the chromatin conformation.
We could monitor the DNaseI sensitivity of the D4Z4-proximal region and the average sensitivity of the D4Z4 array but not specifically that of the first D4Z4 repeat unit. However, by studying cancer-linked DNA methylation changes, we found evidence for an atypical chromatin structure at the beginning of the D4Z4 array and immediately proximal to it. The apparent spreading of methylation along D4Z4 that we observed seems to have limits to its processivity and to be prone to stop at certain subregions. Specifically, there was a remarkable resistance to cancer-linked hypermethylation at the very beginning of the repeat array and at several CpG sites proximal to the array in cancers that displayed strong hypermethylation in the bulk of the array. The sequence of D4Z4 repeat units is highly conserved throughout the array. Therefore, the differential susceptibility to tumor-linked hypermethylation at the start of the array versus in the bulk of the array is best explained by a special chromatin structure at the junction of D4Z4 and the proximal non-repeated sequence.
The methylation and DNaseI sensitivity findings suggest that there is a boundary element (42) around the junction of the D4Z4 array and the immediately proximal sequence. Naturally occurring deletions of this D4Z4-proximal region indicate that the exact sequence immediately proximal to D4Z4 cannot be contributing to the phenotype (43). However, there might be a critical interfacing of D4Z4 (first 200 bp, 78% G+C) with a DNA sequence that has a more typical base composition for human DNA (immediately proximal 200 bp, 43.5% G+C).
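A base-composition comparison of this kind is a simple count over each 200-bp window. The following is a minimal sketch; the function name `gc_percent` and the two short test sequences are illustrative assumptions, not the actual D4Z4 or proximal sequences.

```python
def gc_percent(seq):
    """Return the percentage of G+C among the A/C/G/T residues of a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]  # skip N, gaps
    if not counted:
        raise ValueError("sequence contains no A/C/G/T residues")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequences (hypothetical, for illustration only).
gc_rich = "GGCCGCGGGACCGCTT"
at_rich = "ATTAGCATTTAACGTA"
print(f"{gc_percent(gc_rich):.2f}% G+C vs {gc_percent(at_rich):.2f}% G+C")
```

Applied to the first 200 bp of the array and to the 200 bp immediately proximal to it, the same function would give the two percentages being contrasted.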
Figure 6. Hypothesized intrachromosomal communication by chromatin looping. (A) A disease-associated long-distance loop is hypothesized to form when the D4Z4 repeat array is contracted to fewer than 11 repeat units, leading to inappropriate expression in cis of ...
The second region in which we observed resistance to cancer-linked hypermethylation is a subregion of the 3.3-kb D4Z4 repeat unit, about 1.4 kb distal to its single KpnI site. It has runs of G residues that are predicted to form stable G-quadruplexes (B). We demonstrated that two of these sequences, PQS5 and PQS6, readily form G-quadruplexes when tested as single-stranded oligonucleotides. Inhomogeneous spreading of DNA methylation along the body of the D4Z4 arrays might be due partly to such unusual DNA secondary structures within the D4Z4 repeat units, which could hinder the spread of DNA methylation in vivo either directly or indirectly through effects on chromatin structure.
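Sequence-based prediction of potential G-quadruplex-forming sequences is commonly done with a simple folding rule: four runs of at least three G residues separated by loops of 1 to 7 nucleotides. The following is a minimal sketch of that rule as a regular expression. It is an assumption that this rule resembles the prediction used for D4Z4, since the method is not described in this section; the function name `find_pqs` is hypothetical, and the test sequence is the human telomeric repeat rather than a D4Z4 sequence.

```python
import re

# Folding rule assumed here: four runs of >=3 G separated by loops of 1-7 nt.
# The lookahead lets overlapping candidates be reported, one per start position.
PQS_PATTERN = re.compile(
    r"(?=(G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}))"
)


def find_pqs(seq):
    """Return (start, matched_sequence) for each candidate on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in PQS_PATTERN.finditer(seq)]


# Human telomeric repeat, a well-known G-quadruplex-forming sequence.
telomeric = "TTAGGGTTAGGGTTAGGGTTAGGGTT"
for start, hit in find_pqs(telomeric):
    print(start, hit)
```

This sketch scans only the strand it is given; candidates on the complementary strand would require scanning the reverse complement as well. A regular-expression hit indicates only potential to fold, which is why experimental tests on oligonucleotides, such as those described for PQS5 and PQS6, remain necessary.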
Various types of indirect evidence suggest that genomic G-quadruplexes, which are non-B DNA structures, exist in vivo and mediate transcription control, genomic stability and chromatin packing at telomeres (44). In certain functional classes of genes, including those associated with muscle development or contraction, PQS overrepresentation is especially high (46), which is further evidence for their biological relevance. G-quadruplexes can involve Hoogsteen base-pair interactions between runs of G residues within one local sequence on a DNA strand (D) or between four equal-length runs of G residues on two or four different strands (44). They might also form between two runs of G residues in non-neighboring locations within the same region of a given strand and thereby help establish higher-order packing of chromatin (45).
We propose that G-quadruplex formation in the 3.3-kb D4Z4 repeat unit (A) helps organize D4Z4 chromatin to give special intra-array looping in normal, long arrays. We further hypothesize that, within short arrays, abrogation of this intra-array looping allows long-distance looping between the beginning of the array and an FSHD master gene at 4q35. The identity of this 4q35 gene is still uncertain despite many conventional analyses (8), including microarray expression studies (11–13). This postulated alternative looping could explain why only short D4Z4 arrays at 4q35 are linked to FSHD: only they would permit the long-distance looping that initiates inappropriate expression of the FSHD master gene, which in turn would account for the dominant inheritance of this disease.
The hypothesized boundary element at the first D4Z4 repeat unit in normal and FSHD cells makes it easier to envision pathogenic long-distance chromatin interactions between a short D4Z4 array at 4q35 and another FSHD-related DNA sequence in cis. Given that FSHD is mostly a muscle-specific disease, it is intriguing that the myogenesis-specific homodimeric MyoD protein can specifically recognize G-quadruplexes with a higher affinity than specific duplex sequences (48–50). That this transcription factor preferentially binds to G-quadruplexes formed between runs of G on different oligonucleotide strands makes the above model an attractive hypothesis for explaining part of the perplexing near-threshold effect of D4Z4 array size on disease status, whereby a 4q35 D4Z4 array of <11 repeat units (<36 kb) is diagnostic for FSHD.