FSHD is associated with short arrays of the macrosatellite D4Z4 at subtelomeric 4q but not at subtelomeric 10q. It is still uncertain why, despite the near identity of 4q and 10q D4Z4 and the extensive homology proximal and distal to the arrays, FSHD is a 4q-specific disease. This dominant disease is caused by the reduction in size of a 4q D4Z4 array to below an approximate threshold of ~36 kb. For example, contraction of a 40-kb array (with 12 3.3-kb repeat units) to one of 30 kb (with 9 3.3-kb repeat units) can result in the disease. We proposed that FSHD involves pathogenic long-range looping in
cis of the centromere-proximal end of D4Z4 chromatin with 4q-specific sequences at 4q35.2 that is enabled by changes in intra-array chromatin looping dependent on the array size (
20,
46). The importance of pathogenic chromatin structure changes to this disease is indicated by recent evidence for FSHD-specific chromatin alterations in the D4Z4 array itself in cells from FSHD patients (
47,
48). In addition, the most proximal D4Z4 repeat unit apparently has a more open structure than the bulk of the array (
20,
48,
49). Many experimental studies of the molecular genetics of FSHD do not reproduce the unusual chromatin environment of 4q35.2, which is likely to be critical for this disease in view of its 4q specificity. We used DNase-chip to examine 4q35.2 for chromatin features suggestive of a distinctive higher-order structure. Given the lack of definitive findings about
cis effects of short D4Z4 arrays at 4q35.2 on gene expression (
10–14,
34,
35), DNase-chip also served as an annotation-neutral method of finding evidence for undocumented genes that may be important to FSHD in this gene-sparse 4-Mb region.
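As a rough illustration of the array-size arithmetic given at the start of this section, the following sketch converts repeat-unit counts into approximate array lengths and compares them with the ~36-kb value. The function names and the strict less-than comparison are assumptions made for illustration only; the text describes an approximate threshold, not a sharp cutoff.

```python
# Illustrative sketch only. The names and the strict "<" comparison are
# assumptions for this example, not part of the study's methods.
REPEAT_UNIT_KB = 3.3   # size of one D4Z4 repeat unit, as stated in the text
THRESHOLD_KB = 36.0    # approximate disease-associated array size, as stated in the text


def array_size_kb(n_units: int) -> float:
    """Approximate D4Z4 array length in kb for a given number of repeat units."""
    return n_units * REPEAT_UNIT_KB


def below_threshold(n_units: int) -> bool:
    """True if the approximate array length falls below the ~36-kb value."""
    return array_size_kb(n_units) < THRESHOLD_KB


if __name__ == "__main__":
    # The example from the text: a 12-unit array contracting to 9 units.
    for units in (12, 9):
        size = array_size_kb(units)
        print(f"{units} units: {size:.1f} kb, below threshold: {below_threshold(units)}")
```

With these numbers, 12 units correspond to roughly 40 kb (above the value) and 9 units to roughly 30 kb (below it), matching the example in the text.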
At 4q35.2, we found 28 DH sites detectable in all six examined myoblast cultures from FSHD patients or normal controls. As expected, most were located in the proximal 1 Mb of 4q35.2, the most gene-rich subregion. Surprisingly, within the bifurcated 3.1-Mb gene desert at 4q35.2, 12 DH sites located >100 kb from the nearest gene were observed in all tested myoblast cultures. For some of these DH sites, notably DH8, DH9 and DH10, the distances to the closest genes active in myoblasts were very large, >0.7 Mb. Nonetheless, these sites may identify long-distance enhancers, silencers or locus control regions (
50–53). Alternatively, they might be associated with unannotated genes or structural elements, such as looping hubs (
54). The observation that DH8, DH9 and DH10 were present in myoblasts and fibroblasts, both of mesodermal origin, but not in cells of the lymphoid, myeloid and hepatic lineages, suggests that these sites are functional.
In the D4Z4-proximal 1-Mb region, which is mostly gene desert, we found nine DH sites present in at least three of the six myoblast cell cultures. Only two of these, namely, DH
FRG1 (in the promoter of
FRG1) and DH272 (in the distal gene desert), did not overlap a DNA repeat. The other seven overlapped tandem repeats of short units (STRs). By DNase-chip analysis, we also observed that DH sites frequently overlap STRs in the terminal 1 Mb of 10q (unpublished data). While the biological significance of these DH-STRs remains to be determined, there are precedents for shorter tandem repeats influencing nucleosome positioning and excluding nucleosomes (
55,
56). With respect to DH
FRG1, the DH site at the
FRG1 promoter, one group reported overexpression of
FRG1 RNA in FSHD muscle (
17) but several others were unable to confirm this (
11–13). In this study, we found no difference between control and FSHD myoblasts in this DH site and no significant difference in the amount of RNA product. DH272, the unique DH site located 150 kb proximal to
FRG1, was observed preferentially in FSHD versus control myoblast cultures. Preliminary results from DNase-seq on three other control myoblast cell strains also revealed little or no DH peak at the position of DH272. We found nearby unannotated transcripts (probably non-coding RNAs,
Supplementary Figure S5) that were not FSHD-specific in myoblasts. However, further study of both myotubes and myoblasts is needed to test the possibility of disease-linked expression of amplicons in the vicinity of DH272 and other DH sites in the 4q35.2 gene desert.
Even DH sites in 4q35.2 that did not display FSHD-related differences might be involved in pathogenic chromatin looping interactions. A DH site could be present in both normal and disease cells, yet its 3D structure (looping) and the protein complexes bound to it could differ between the two. Given that DH sites can be associated with loci at which chromatin looping occurs (
54), our results suggest subregions of 4q35.2 with the potential for these chromatin interactions that should be investigated. Our study points to the DH272 region as particularly attractive for searching for FSHD-related sequences because of the overlap of DH272 with a CTCF-binding sequence found in many cell types (
4). Moreover, the potential CTCF-binding site identified in this ChIP-positive region of lung fibroblasts by Kim
et al. (
4) matched the CTCF consensus sequence at 19 out of 20 nt. CTCF is a sequence-specific DNA-binding protein with diverse functions, including as an insulator and organizer of chromatin looping (
54,
57). CTCF might play a role in our proposed pathogenic looping of 4q35.2-specific sequences to a short pathogenic D4Z4 array because D4Z4 was recently shown to have a CTCF-binding sequence (
47). Some evidence was presented for increased binding of CTCF to D4Z4 in FSHD versus control myoblasts (
47).
In addition to revealing candidate DNA sequences for FSHD involvement in gene deserts, our data indicate that
FAT1 transcription warrants further study of possible differences between FSHD and control muscle cells beyond the few studies involving expression microarrays (
11,
13).
FAT1 is the only annotated 4q35.2 gene with evidence for complex tissue-specific expression and, in this study, a muscle-specific pattern of DH sites. Many myoblast-specific DH sites were found in and around this large gene in both FSHD and control cell strains, suggesting that this subregion contains active regulatory elements associated with the muscle lineage. The cell type-specific differences in chromatin that we observed are consistent with tissue-specific production of multiple
FAT1 RNA and protein isoforms from predicted gene-internal promoters and by alternative splicing (
http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/).
FAT1, which contains >25 exons, encodes a cadherin-type integral membrane protein which is implicated in diverse developmental and signaling pathways, including in vascular smooth muscle remodeling (
58).
It has been proposed that a short disease-linked D4Z4 array at 4q35.2, but not at 10q26.3, triggers abnormal transcription of DNA sequences within the array itself in affected FSHD muscle cells (
34,
35,
39,
59). Overexpression of
DUX4 RNA, derived from the 1.6-kb gene inside each 3.3-kb D4Z4 repeat unit, was reported in FSHD myotubes relative to control myotubes (
35) but truncated transcripts or transcripts from other portions of the D4Z4 repeat unit are more prevalent than full-length
DUX4 transcripts (
34). Currently, definitive conclusions as to the relationship of D4Z4 transcription and pathogenicity are precluded by low expression levels, small numbers of samples, many cross-hybridizing sequences and the variety of small transcripts (
34). If dysregulated expression of some D4Z4 sequence from short arrays initiates abnormal gene expression in FSHD, it remains to be explained why it is only short 4q arrays that cause the disease despite the ~98% identity between 4q and 10q D4Z4 (
8) and homology outside the arrays. In addition, exchanges between the almost (but not completely) identical 4q and 10q D4Z4 arrays are rather frequent and can result in an array with 4q-type repeat units replacing all the 10q units (
60). Nonetheless, short D4Z4 arrays cause disease only when they reside on 4q (
61). Therefore, polymorphisms that were found to be associated with canonical 4q-type D4Z4 units, but not canonical 10q-type D4Z4 units (
8), are unlikely to explain the 4q linkage of FSHD.
We propose that the chromosomal environment of 4q35.2 plays a key role in the 4q-specific nature of FSHD, whether abnormal expression from 4q containing a short D4Z4 array initiates from within or outside D4Z4. At both the DNA and the chromosome levels, 4q35.2 is unusual. It has the lowest gene density in its terminal 3 Mb of any of the q arms. It is distal to a large bifurcated gene desert punctuated centrally by a few genes that appear to be critical in early embryogenesis. Like some other genes (
62), especially those important in the control of development (
63), these inter-desert genes may be flanked by gene deserts to help keep their expression tightly restricted to certain stages in development. They might be part of large blocks of chromatin with distinguishing epigenetic features. In CD4+ cells, this gene desert region has histone modifications [(
5) and
http://genome.ucsc.edu] indicative of inactive euchromatin rather than constitutive heterochromatin. This is consistent with our previous immunocytochemical and DNA replication analyses of FSHD and control myoblasts (
64). However, given the complexity of epigenetic modification of chromatin, there can be a variety of types of large distinctive chromatin blocks within euchromatin (
65).
One of the properties that distinguish subtelomeric 4q (which can have pathogenic D4Z4 arrays) from subtelomeric 10q (whose D4Z4 arrays are always phenotypically neutral) is that only the 4q subtelomere (and not that of 10q) has a strong association with the nuclear rim in FSHD and control myoblasts and myotubes (
66). A marker that was 0.22 Mb from D4Z4 on 4q35.2 (close to DH272) showed a significantly closer association with the nuclear periphery than did D4Z4. The unusual localization of subtelomeric 4q to the nuclear periphery might be necessary for pathogenicity. This localization may result partly from its uncommonly large region of inactive euchromatin (
67,
68) in a distinctive conformation, as reflected in its low concentration of DH sites. Our results emphasize the underappreciated importance of considering the regional chromatin context of D4Z4 in analysis of the mechanism by which contraction of D4Z4 to a size of <36 kb can lead to disease (
69).