Autosomal dominant facioscapulohumeral muscular dystrophy (FSHD) has an unusual pathogenic mechanism. FSHD is caused by deletion of a subset of D4Z4 macrosatellite repeat units in the subtelomere of chromosome 4q. Recent studies provide compelling evidence that a retrotransposed gene in the D4Z4 repeat, DUX4, is expressed in the human germline and then epigenetically silenced in somatic tissues. In FSHD, the combination of inefficient chromatin silencing of the D4Z4 repeat and polymorphisms on the FSHD-permissive alleles that stabilize the DUX4 mRNAs emanating from the repeat results in inappropriate DUX4 protein expression in muscle cells. FSHD is thereby the first example of a human disease caused by the inefficient repression of a retrogene in a macrosatellite repeat array.
Facioscapulohumeral dystrophy (FSHD) was first described by two French neurologists in the late 19th century. The disease was named after its clinical presentation to distinguish it from the well-known Duchenne type of muscular dystrophy. Indeed, the core phenotype of FSHD involves progressive weakness and wasting of the facial (facio), shoulder and upper arm (scapulohumeral) muscles (for a detailed clinical description, see Box 1). There is currently no treatment available for FSHD (for outstanding questions, see Box 2).
At disease onset, typically in the second decade of life, FSHD is characterized by initially restricted weakness of shoulder and facial muscles [1,48]. With progression, the lower extremities, both distal and proximal, become involved. The spectrum of disease severity is wide, ranging from mildly affected, asymptomatic individuals to severely affected, wheelchair-bound individuals (about 20%). Non-muscular manifestations include hearing loss and retinal vascular abnormalities that remain largely of no clinical consequence [49–50]. Rarely, however, the retinal vascular disease can lead to an exudative retinopathy, Coats' syndrome, that can cause significant loss of vision. There is no linear and inverse correlation between residual repeat size and disease severity and onset. However, patients having repeat arrays of 1–3 units usually have an infantile onset and rapid progression.
The major form of FSHD (FSHD1: MIM # 158900) is autosomal dominantly transmitted with linkage to the subtelomere of chromosome 4q. The FSHD1 locus was mapped in 1990, the first genetic condition to be mapped with polymorphic microsatellite repeats, and the nature of the genetic defect was resolved in 1993. Nevertheless, its pathogenic mechanism remains somewhat elusive. This review focuses on recent developments that establish FSHD as the first example of a human disease caused by the inefficient repression of a retrogene array.
FSHD1 is caused by a contraction of the highly polymorphic D4Z4 macrosatellite repeat (MSR) in chromosome 4q (Box 3, Figure 1). The D4Z4 macrosatellite repeat is located approximately 40–60 kb proximal to the telomere repeat and varies between 11 and >100 copies of D4Z4 units in the unaffected population [6–7]. The D4Z4 repeat unit is defined as a 3.3 kb KpnI fragment, and multiple units are ordered head-to-tail to form the D4Z4 repeat array. Most patients with FSHD1 have a partial and internal deletion of the repeat array leaving only 1–10 units on one of their chromosomes 4. It is believed that at least one unit of D4Z4 is necessary to develop FSHD, as monosomy of 4q does not cause FSHD.
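The size thresholds in this paragraph can be written out as a small calculation. The Python sketch below is illustrative only: the function names and category labels are hypothetical, the 3.3 kb unit is rounded to 3,300 bp, and repeat size on its own does not establish disease status.

```python
# Illustrative sketch; names and labels are assumptions, not a diagnostic tool.
D4Z4_UNIT_BP = 3300  # one D4Z4 unit: a 3.3 kb KpnI fragment (rounded)


def array_length_bp(units: int) -> int:
    """Approximate length of a D4Z4 array with the given number of units."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return units * D4Z4_UNIT_BP


def size_category(units: int) -> str:
    """Place a unit count in the size ranges quoted in the text."""
    if units < 0:
        raise ValueError("units must be non-negative")
    if units == 0:
        # The text notes that at least one unit seems necessary for FSHD.
        return "no D4Z4 units"
    if units <= 10:
        return "contracted (FSHD1 range)"
    return "unaffected-population range"
```

For example, an array of 10 units spans roughly 33 kb and falls in the contracted range, whereas 11 units falls in the range seen in the unaffected population.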
Approximately half of the human genome consists of repetitive DNA, and a significant proportion is organized in tandem arrays. These tandem arrays of DNA embody an extreme example of copy number variation and are classified according to their repeat unit size and their total length. Although different definitions exist, repeats with unit sizes of 1–4 nucleotides that span less than 100 bp are typically defined as microsatellite repeats. Those with repeat unit sizes of 10–40 nucleotides covering several hundreds of base pairs are referred to as minisatellite repeats. The term midisatellite repeat has been proposed for loci containing repeat units of 40–100 nucleotides that may extend over distances of 250 to 500 kb. Macrosatellite repeats, to which D4Z4 belongs, are the largest class of repeat arrays, with unit sizes of >100 nucleotides; the units are typically much larger than this, and the arrays can span hundreds of kilobases of DNA. While FSHD represents a macrosatellite repeat contraction disease, microsatellite repeat expansions are a frequent cause of neurodegenerative diseases.
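As a rough illustration of this classification, the following Python sketch assigns a class from the repeat unit size alone. It is a simplification under stated assumptions: the function name is hypothetical, total array length is ignored, a 40-nucleotide unit (which sits on the boundary between two ranges in the text) is assigned to the minisatellite class, and unit sizes that fall between the quoted ranges are reported as unclassified.

```python
def classify_tandem_repeat(unit_nt: int) -> str:
    """Classify a tandem repeat by unit size (nucleotides) only."""
    if unit_nt < 1:
        raise ValueError("unit size must be at least 1 nucleotide")
    if unit_nt <= 4:
        return "microsatellite"
    if 10 <= unit_nt <= 40:
        # Assumption: the shared boundary value 40 is treated as minisatellite.
        return "minisatellite"
    if 40 < unit_nt <= 100:
        return "midisatellite"
    if unit_nt > 100:
        return "macrosatellite"
    # 5-9 nucleotides: not covered by the definitions quoted in the text.
    return "unclassified"
```

With these rules, the 3.3 kb D4Z4 unit (about 3,300 nucleotides) is classified as a macrosatellite.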
A contraction of the D4Z4 repeat array only predisposes to the disease, as this contraction needs to occur on a specific chromosomal background. Soon after the discovery of the D4Z4 repeat on chromosome 4, it was established that the subtelomere of chromosome 10q is almost identical to that of chromosome 4q and that it also contains a highly homologous and equally polymorphic repeat array [10–11]. A considerable proportion of individuals in the population carry 4q or 10q chromosome ends with repeat arrays that have apparently been entirely or partially transferred between both chromosomes [7,12–13]. However, contracted repeat arrays on chromosome 10 have until recently never been shown to cause FSHD [14–15]. The observation of linkage of the disease with chromosome 4 and the absence of linkage with chromosome 10 led to the hypothesis that interplay of D4Z4 with other, more proximal elements on chromosome 4 could explain the chromosome 4 specificity of the disease. In this scenario, either by spreading or looping [17–18] mechanisms, the D4Z4 repeat contraction would affect the transcriptional regulation of proximal chromosome 4-specific genes. Indeed, closely located genes with high myopathic potential, such as FRG1, FRG2 and ANT1, were reported to be transcriptionally upregulated in FSHD. However, variability between studies, possibly due to biological differences between the tissues sampled and technical differences between the methods used, prevented a consensus on any single mechanism.
The D4Z4 repeat and its homolog are located in the subtelomere of chromosomes 4 and 10. Subtelomeres are unusual domains, showing a relatively high level of plasticity and resulting in the frequent transfer of sequences between homologous and non-homologous chromosome ends.
To understand the chromosome 4 linkage with the disease, genetic studies were undertaken leading to the identification of the mechanism of D4Z4 rearrangements and to the identification of large polymorphisms in the subtelomere of chromosome 4q [21–23]. As approximately half of new FSHD cases arise as a consequence of a postzygotic rearrangement of the repeat leading to somatic mosaicism for the D4Z4 repeat contraction, these mosaic cases were instrumental for studying the genetic mechanism. It became apparent that the preferred template for these rearrangements was the sister chromatid, instead of the homologous chromosome, making a frequent occurrence of sequence transfers between homologous chromosomes or between chromosomes 4 and 10 unlikely. In addition, a large polymorphism was identified on chromosome 4 that involved the presence or absence of a β-satellite repeat immediately distal to the D4Z4 repeat array, identifying two major haplotypes of chromosome 4 called 4A (with β-satellite repeat) and 4B (without β-satellite repeat). On chromosome 10, only the A variant was identified. While 4A and 4B chromosomes are almost equally common in the population, FSHD chromosomes seemed to be exclusively of the 4A type, which was later confirmed in independent studies [24–25]. Thus it was concluded that attributes specific to 4A, presumably polymorphisms, confer permissiveness to the D4Z4 repeat.
This led to focused attention on identifying allele-specific polymorphisms associated with FSHD. Initially, a simple sequence length polymorphism (SSLP) was identified immediately proximal to the D4Z4 repeat and was instrumental in our understanding of the genetic basis of FSHD [9,26]. Studies in patients and control individuals from different populations showed that during recent human evolution, there were likely only four events in which sequences were transferred between chromosomes 4 and 10. Subsequent detailed genetic studies of the FSHD locus led to the identification of additional polymorphisms subdividing chromosome 4 into at least 17 genetically distinct subtelomeric variants and chromosome 10 into 8 subtelomeric variants. Intriguingly, contractions in only three genetically almost identical chromosome 4 subtelomeres, the common variant 4A161 and the rare variants 4A159 and 4A168, caused FSHD, while contractions in other 4q subtelomeres were not associated with disease [9,26–28]. This finding provided strong evidence that genetic factors in the subtelomere of chromosome 4 contribute to FSHD pathology, and indeed detailed sequence analysis of the first and last repeat unit of the array of the most common chromosomal backgrounds identified 4A161-specific sequence variants [26,28]. Thus, not only outside the D4Z4 repeat, but also within the repeat array, it was shown that the permissive chromosomes contained specific sequence variants not shared by other (non-permissive) chromosome ends [26,28]. In summary, these detailed genetic studies revealed that each chromosome maintained specific polymorphisms, and that specific sequences in the FSHD-permissive variants of 4A confer permissiveness to this repeat.
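The two genetic conditions described so far, a contracted array and a permissive chromosome 4 background, can be combined in a short Python sketch. This is a schematic summary of the text under stated assumptions, not a diagnostic rule: the function and constant names are hypothetical, the 1–10 unit range is taken from the earlier description of FSHD1, and only the three permissive variants named above are recognized.

```python
# Permissive 4q subtelomere variants named in the text.
PERMISSIVE_4Q_VARIANTS = frozenset({"4A161", "4A159", "4A168"})


def is_fshd1_associated(units: int, variant: str) -> bool:
    """True if a contracted array sits on a permissive 4q variant.

    Schematic only: `units` is the D4Z4 unit count on one allele and
    `variant` is the subtelomeric variant label of that allele.
    """
    if units < 0:
        raise ValueError("units must be non-negative")
    contracted = 1 <= units <= 10
    return contracted and variant in PERMISSIVE_4Q_VARIANTS
```

In this sketch, `is_fshd1_associated(8, "4A161")` is true, whereas the same 8-unit contraction on a variant outside the permissive set is not flagged.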
The sequence of the D4Z4 repeat contains the open reading frame (ORF) of a double homeobox transcription factor, DUX4 (Figure 1) [30–31]. The DUX4 ORF is in a single exon, whereas other members of the double-homeobox family have multiple introns, indicating that DUX4 was inserted into the genome as a retrotransposed mRNA from an intron-containing DUX gene, possibly DUXC or less likely Duxbl [32–34]. In contrast to the many pseudogenes retrotransposed to our genome, the DUX4 retrogene maintained a conserved ORF. However, it is unclear whether the conservation of the ORF was a consequence of a conserved functional role propagated to all repeats by concerted evolution.
Although initial attempts to identify DUX4 mRNA expression in normal development or disease were unsuccessful, a major advance in understanding FSHD was the identification of poly-adenylated mRNA containing the DUX4 ORF using RT-PCR. Initially identified only in FSHD muscle samples, the polyadenylation site of the DUX4 mRNA was mapped to the region immediately telomeric to the last D4Z4 repeat, a region previously cloned from a phage clone containing the D4Z4 repeat and flanking sequences and called pLAM1. It was proposed that the contraction of the D4Z4 array results in the transcription of the DUX4 retrogene; however, the abundance of the DUX4 mRNA and protein was extremely low.
In addition, a later study identified D4Z4 and DUX4 transcripts in both FSHD and control muscle. Random priming of RNA identified both sense and anti-sense transcripts throughout the D4Z4 region. Regions of lower transcript abundance correlated with the presence of small si- or mi-RNA-sized fragments, and it was suggested that these bidirectional transcripts and small RNAs contribute to the heterochromatin suppression of this region, as discussed in the next section. In addition, several splice forms of a DUX4 polyadenylated mRNA were identified that used the pLAM1 polyadenylation site, but these DUX4 transcripts were also at extremely low abundance. Several groups demonstrated that relatively high levels of DUX4 expression were pathologic to muscle cells and other cell types [37–40]. Therefore, if DUX4 could be shown to be expressed at sufficient levels in FSHD, then it was likely to be a major cause of the muscle pathology.
A third important clue that could explain how a repeat contraction can cause disease came from chromatin studies of the D4Z4 array. The D4Z4 repeat is GC rich and contains sequences often residing in heterochromatic domains of the genome. It was therefore postulated that normally the D4Z4 repeat is in a relatively closed chromatin configuration and that, as a consequence of repeat contraction, it would adopt a more open chromatin configuration (Figure 1). DNA methylation studies and studies of histone modifications and other chromatin factors supported this hypothesis [17,42–43]. Normally, the D4Z4 repeat is densely DNA methylated; FSHD chromosomes experience an approximately 30–40% reduction of DNA methylation at specific sites tested in D4Z4. In addition, chromatin immunoprecipitation (ChIP) studies showed that the D4Z4 repeat is normally occupied by both transcriptionally repressive and permissive histone modifications, while in FSHD chromosomes, there is a relative loss of repressive histone modifications. These changes in chromatin structure are restricted to the D4Z4 repeat and do not seem to spread proximally. ChIP studies also identified the losses or gains of other chromatin factors such as HP1γ, the cohesin complex, YY1 (losses) and CTCF (gain) at D4Z4 of disease alleles [16,43–44]. Overall, the data support a model in which D4Z4 in FSHD chromosomes adopts a relatively open chromatin structure facilitating the transcriptional activity of the repeat and possibly affecting the processing of the different D4Z4 transcripts.
Interestingly, similar changes in the chromatin structure of D4Z4 were also identified in a small cohort of patients whose disease status could not be confirmed by standard molecular diagnostic tests [43,45–46]. These patients, now classified as FSHD2, have D4Z4 repeat arrays that are of normal size but smaller than those typical of the general population; disease alleles of FSHD2 patients do show similar changes in D4Z4 chromatin structure as those of FSHD1 patients. In contrast to FSHD1, where the relative relaxation of the D4Z4 chromatin structure seems restricted to the contracted allele, in patients with FSHD2 the D4Z4 repeats on both chromosomes 4 and 10 seem to be affected. Other repeat structures in the genome of these patients seem to be normally structured, and the cause for this change in chromatin structure of D4Z4 in patients with FSHD2 is currently not known. Also in common with FSHD1, patients with FSHD2 have at least one permissive (4A161) chromosome. Thus it seems that patients with FSHD1 and FSHD2 share the commonality of a relative chromatin relaxation of D4Z4 on the genetic 4A161 background.
Very recently, genetic studies directly demonstrated the requirement of the DUX4 polyadenylation site for FSHD, and molecular studies have produced a new developmental model for the disease that is consistent with the extremely low abundance of the mRNA and protein (Figure 2). Together, these studies explain the apparent discrepancies in previous models of FSHD and provide compelling support for the expression of DUX4 as a major cause of FSHD.
Meticulous genetic analysis of patients with unusual hybrid D4Z4 repeat structures containing units with sequence signatures consistent with those originating from chromosomes 4 and 10 revealed a common last portion of the D4Z4 repeat array and flanking pLAM1 sequence. This old-fashioned positional cloning strategy strongly argued that the distal end of the repeat array and flanking pLAM1 sequences are critically important for the development of FSHD. Further corroborating this finding was the identification of an FSHD family in which the disease segregated with a contracted D4Z4 allele of chromosome 10. Importantly, the last part of this disease-associated repeat array was replaced by permissive chromosome 4 sequences. The identification of this family in which FSHD segregates with chromosome 10 essentially confirms the importance of the distal end of the repeat and pLAM1 sequences and precludes a prominent role for other proximal candidate genes on chromosome 4. Further genetic studies of the distal end of the D4Z4 repeat array and flanking sequences allowed an almost perfect separation of permissive and non-permissive chromosome ends based on the identification of consistent polymorphisms in the region sequenced. One noticeable difference between permissive chromosomes 4 and non-permissive chromosomes 10 was the presence of a DUX4 polyadenylation signal on chromosome 4, while on chromosome 10 this polyadenylation signal was lost because of the presence of independent polymorphisms. Indeed, when transfecting the critical region in murine C2C12 muscle cells, stable DUX4 transcripts making efficient use of the DUX4 polyadenylation signal could only be identified when constructs derived from permissive chromosomes with a polyadenylation signal were transfected.
Interestingly, a study of the recent hominoid evolution of the 4q subtelomere shows that the permissive chromosome end is ancestral to all 4q chromosome ends, and the data are consistent with an evolutionary pressure to eliminate the third DUX4 exon in pLAM1. These genetic studies clearly demonstrated the requirement for the polyadenylation site utilized by DUX4 mRNA and, therefore, strongly implicated DUX4 protein as a cause of FSHD. However, as noted above, although DUX4 mRNA was detected in FSHD muscle, it was still at extremely low abundance.
Low-abundance mRNA in a population of cells could reflect either a small amount of mRNA in all cells or an abundant amount of mRNA in just a few cells. RT-PCR amplification of DUX4 mRNA in small pools of 100 or 600 differentiated FSHD muscle cells identified relatively abundant transcripts in a subset of the pools. The frequency of positive pools suggested that approximately one in 1000 FSHD muscle cell nuclei were expressing an abundant amount of DUX4 mRNA. Immuno-detection confirmed that approximately 0.1% of nuclei in cultured FSHD muscle cells expressed an abundant amount of protein. In addition, the DUX4-expressing FSHD muscle nuclei had characteristics consistent with DUX4-induced toxicity, i.e. an aggregation of nuclear DUX4 protein that occurs coincident with DUX4-induced apoptosis. Therefore, the very low abundance of DUX4 mRNA in FSHD muscle represented relatively abundant amounts of DUX4 mRNA and protein in a small subset of the nuclei, likely leading to dysfunction or death of those DUX4-expressing nuclei.
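The link between the fraction of positive pools and the underlying frequency of expressing cells can be illustrated with a simple probability model. The Python sketch below is not the analysis used in the cited study; it assumes that each cell in a pool expresses DUX4 independently with the same probability, that a pool scores positive whenever it contains at least one expressing cell, and it ignores the fact that differentiated muscle cells are multinucleated. The function names are hypothetical.

```python
def positive_pool_fraction(cell_frequency: float, pool_size: int) -> float:
    """Expected fraction of pools with at least one expressing cell."""
    if not 0.0 <= cell_frequency <= 1.0:
        raise ValueError("cell_frequency must be between 0 and 1")
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # P(pool positive) = 1 - P(no cell in the pool expresses)
    return 1.0 - (1.0 - cell_frequency) ** pool_size


def estimate_cell_frequency(positive_fraction: float, pool_size: int) -> float:
    """Invert the model: per-cell frequency from the fraction of positive pools."""
    if not 0.0 <= positive_fraction < 1.0:
        raise ValueError("positive_fraction must be in [0, 1)")
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return 1.0 - (1.0 - positive_fraction) ** (1.0 / pool_size)
```

Under these assumptions, a frequency of one expressing cell in 1000 would yield roughly 10% positive pools of 100 cells and roughly 45% positive pools of 600 cells. These figures are illustrative and are not values reported in the source.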
As a retrogene, DUX4 was viewed by some as a dead gene, or pseudogene, brought ‘back to life’ by the contraction of the D4Z4 array, but the conservation of the ORF suggested otherwise [32–33]. Indeed, DUX4 mRNA and protein were shown to be highly expressed in the testes of unaffected individuals, most likely in the germline, both from the permissive 4A161 allele and from the non-permissive alleles. It is interesting to note that the germline DUX4 mRNA transcripts from chromosome 10, which lacks the polyadenylation site that is present on the permissive chromosomes, use an alternative polyadenylation site approximately six kilobases telomeric to the end of the D4Z4 array. Use of this alternative polyadenylation site was restricted to germline tissues and not identified in somatic tissues. Therefore, the DUX4 retrogene is expressed in early development, i.e., in the human germline, and is epigenetically silenced in somatic tissues. The inefficient chromatin-mediated repression, either related to the contraction of the array in FSHD1 or through unknown mechanisms in FSHD2, results in the occasional escape from repression in muscle cells, and possibly other somatic cells. In this model, the muscle cell nuclei would be lost over time in FSHD because of the inappropriate expression of DUX4 protein (Figure 2).
These recent studies substantiate a developmental model of FSHD that explains many of the previously unexplained mysteries of this human disease. First, genetic studies demonstrate the requirement for the DUX4 polyadenylation site in the pLAM1 region of the permissive alleles, indicating that DUX4 mRNA is critical for FSHD. Second, molecular studies show decreased density of repressive chromatin modifications in both FSHD1 and FSHD2, indicating that DUX4 mRNA is more likely to be expressed. Third, RNA and protein studies show an occasional escape from the inefficient chromatin repression leading to high levels of DUX4 expression in a small number of nuclei in FSHD muscle cells. Fourth, abundant expression of DUX4 in testis, most likely the germline cells, indicates that this retrogene might have a normal role in germ cell development. Finally, if a retrogene has subsumed a normal role in germ cell biology, then repression of that gene in somatic cells needs to co-opt regulatory mechanisms distinct from the evolved enhancers and promoters of the parental gene. Therefore, the repression of DUX4 in somatic cells is likely a mechanism adopted from other loci, such as the mechanism of silencing retrotransposons and other repetitive elements, and is not highly evolved for the DUX4 locus. In this case, co-opted mechanisms might not be sufficiently robust to avoid disease, as is evident from the association of FSHD with the contracted D4Z4 array.
This new unifying and substantiated model of FSHD has one additional profound and as yet unexplored implication. DUX4 arose from the retrotransposition of a parental DUX mRNA, possibly either DUXC or less likely Duxbl [32–34]. Both DUXC and Duxbl are expressed in the germline, a requirement for introducing a retrogene into the population, but neither is present in primates. Therefore, primates have retained the retrogene and lost the parental gene, suggesting a selective advantage of the retrogene. Primates have sacrificed upper extremity and facial muscle mass for the advantage of an upright posture and highly expressive facial muscles. Although this remains highly speculative, it is interesting to suggest that the DUX4 retrogene might have been retained in preference to the parental gene because inefficient chromatin repression results in sufficient expression in skeletal muscle to modulate facial and upper extremity muscle mass, even in individuals without the FSHD deletion. If this is correct, then FSHD is a hypermorphic phenotype for traits that are critical for primate evolution, and the mystery of FSHD still has the potential to lead us to new understandings of human biology.
Our work is supported by the Fields Center for FSHD and Neuromuscular Research, the National Institutes of Health (NINDS P01NS069539, NIAMS R01AR045203 and NIAMS R21AR059966), the Muscular Dystrophy Association (173202), the Prinses Beatrix Fonds (WAR08-14), the Shaw Family Foundation, the FSH Society, the Dutch FSHD Foundation, and the Pacific Northwest Friends of FSH Research.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.