In summary, our data demonstrate capped transcripts initiating upstream of the DUX4 MAL codons that continue into the pLAM region where they are polyadenylated. Although a full-length DUX4 transcript is initially produced, and can be detected at low levels, spliced and cleaved forms are more readily detected, presumably indicating higher abundance of these processed forms of the DUX4 mRNA. The carboxy-terminal portion of DUX4 is either removed by a splice event that creates an mRNA encoding the double-homeobox region of DUX4, or is isolated as an uncapped and polyadenylated RNA, presumably by cleavage of the full-length DUX4 transcript. The presence of a direct repeat of the second exon in some mRNAs strongly indicates that some of the spliced and polyadenylated transcripts initiate from internal D4Z4 units and progress through intervening D4Z4 repeats to terminate at the polyadenylation site in the pLAM region.
Our study is consistent with a recent publication (7) showing that a transcript from the last D4Z4 unit extends into the pLAM region, has specific splice sites in the 3-prime UTR and is polyadenylated at a specific site. We extend that study by demonstrating (i) several overlapping sense and anti-sense RNA transcripts from the 4q D4Z4 units; (ii) RNA cleavage and processing of the central portion of the DUX4-containing transcript to generate mi/siRNA-sized fragments; (iii) additional miRNA-sized fragments generated from the introns in the 3-prime UTR of the DUX4 transcript; (iv) biological activity of DUX4 transcripts that do not produce a full-length DUX4 protein; (v) that translation of the highly conserved C-terminal portion of DUX4, possibly through internal ribosomal initiation, is sufficient to inhibit myogenic differentiation and (vi) novel splice forms of the DUX4 transcript that lack the highly conserved C-terminal region. These findings suggest several new candidate mechanisms for FSHD pathophysiology that deserve rigorous exploration (Supplementary Material, Fig. S3).
Small RNA fragments might have a biological function
The northern analysis demonstrates that restricted regions of the D4Z4 transcripts give rise to the small si- or miRNA-sized fragments, because moving the probe 10 nt in either direction fails to detect the small fragments, and multiple other probes to the D4Z4 regions failed to detect small fragments. Some of the fragments map to regions of predicted hairpin RNA and others map to introns in the 3-prime UTR of the DUX4 transcript, both consistent with a miRNA mechanism. However, we are reluctant to conclude that these are si-, mi- or pi-RNAs without further confirmatory study. The northern blots show more hybridization to larger precursor-like RNA fragments than is typically seen for miRNA precursors (pre-miRNA or pri-miRNA); this suggests a fragmentation process distinct from miRNA generation, one that might be more consistent with the formation of endogenous siRNAs from double-stranded RNA. In either case, however, these small RNA fragments might have a biological role.
The overlapping sense and anti-sense transcripts that apparently span the D4Z4 region, with areas of discontinuity (see Figs and ), might generate double-stranded RNA that can subsequently be cleaved to generate siRNAs. This is of interest because repeat-associated heterochromatin formation in numerous species has been shown to be mediated by an RNAi mechanism (19). Transcripts from the repetitive regions are converted to siRNA fragments through Dicer-mediated cleavage, and the siRNAs induce local heterochromatin through recruitment of HP1 and other factors. This creates the paradox that transcription of the repetitive element is necessary for heterochromatic silencing. Therefore, the bidirectional transcripts and small RNA fragments we describe at the D4Z4 repeats are consistent with the emerging model of repeat-associated heterochromatic silencing.
In addition to silencing the transcribed locus in cis, endogenous siRNA can also target other DNA loci in trans, or RNA transcripts. A recent example reveals that siRNA generated from pseudogenes can target transcripts from corresponding genes in mammalian ES cells (20), either through double-stranded RNA generated from the pseudogene, or a single-stranded pseudogene RNA that hybridizes with the spliced RNA from the cognate gene. Alternatively, the ~21 nt siRNA fragments, or the ~25–27 nt piRNA fragments can target retroposons, or potentially other DNA elements to induce heterochromatic silencing in cis
). Therefore, the transcripts and small RNA fragments we identified at the D4Z4 repeats might be associated with local chromatin silencing, chromatin silencing at distant loci or might target RNA from other loci. We should note that a prior publication failed to detect RNA transcripts or PolII association with the D4Z4 repeats (22); however, this might represent differences in cell type or relative sensitivity of the assays compared with our current study.
In contrast to siRNAs, miRNAs are generated from a single RNA strand that forms a double-stranded hairpin structure, and these hairpins are frequently encoded in introns of transcribed genes. Several of the small RNA fragments identified in this study map to regions of predicted RNA hairpin structure and several also map to introns, both characteristics common to miRNA. It is interesting to note that standard miRNA prediction algorithms identify several RNAs involved in muscle cell differentiation (S.J.T., unpublished data), and it will be important to determine whether these have a role in normal development or FSHD.
Internal translation initiation can produce a C-terminal fragment of DUX4 that blocks myogenesis
Transfection experiments with stop codons introduced in the DUX4 ORF indicate that internal translation initiation can result in translation of the C-terminal region of DUX4, and that this small protein (76 amino acids) can block specific steps of myogenic differentiation. Prior studies have demonstrated that transcriptional targets of MyoD are expressed at lower levels in FSHD muscle cells (23), and it is interesting that the C-terminal protein beginning at the MQG codons appears to block myogenesis at a step between MyoD RNA transcription and the activation of MyoD target genes. It is important to note that prior studies have not identified the presence of this 76 aa protein in either FSHD or wild-type muscle, and our study only shows that it is expressed from transfected RNA, not from the endogenous RNA. However, the epitope recognized by the 9A12 monoclonal antibody that we, and others, have used to detect DUX4 is not contained in the 76 aa MQG protein, and it will be necessary to generate antibodies to this protein to assess its expression in wild-type and FSHD tissues.
RNA containing this C-terminal MQG ORF fractionates with polyadenylated mRNA; however, 5-prime RACE identifies only uncapped 5-prime ends upstream of the MQG codons, suggesting that the MQG ORF-containing RNA is generated through cleavage of a longer transcript, possibly initiating upstream of the DUX4 MAL codons. Normally, it would be anticipated that an uncapped RNA would not be translated; however, the transient transfection studies provided definite evidence for internal translation initiation in the region upstream of the MQG codons. Our demonstration of IRES activity in this region of the RNA further indicates that this uncapped RNA fragment can be translated. Uncapped and polyadenylated viral RNA has been shown to be translated in mammalian cells through IRES elements, although as noted above, there remains some disagreement regarding the molecular mechanisms (16). Therefore, our suggestion that the MQG ORF might be translated from an uncapped and polyadenylated RNA needs rigorous validation. However, the ability of this 76-aa protein to inhibit myogenesis in C2C12 cells and in zebrafish embryos suggests a possible role in FSHD, particularly since this 76-aa protein inhibits a specific stage of myogenesis, after the expression of MyoD and before the activation of Myog (a fast myosin isoform) (Fig. ), whereas DUX4 appears to be broadly toxic to both cells and embryos. It is interesting to note that protein and RNA expression studies on FSHD muscle identified both a decreased expression of MyoD targets and a transition from fast-glycolytic to slow-oxidative fibers in FSHD (23). In addition, a prior study demonstrated partial inhibition of C2C12 differentiation when transfected with D4Z4 repeats but did not identify the full-length DUX4 protein (25). Our findings provide a new basis for extending these earlier studies.
Novel splice sites suggest continuous transcripts through the D4Z4 units producing a protein similar to DUX4C
We have also identified polyadenylated mRNA containing the 5-prime region of the DUX4 transcript. In our studies, these transcripts lack the 3-prime region containing the MQG ORF due to an internal splice donor site that connects with the splice acceptor of the second exon located in the region of the KpnI site that arbitrarily determines the repeat boundaries. It might be quite revealing that the majority of these transcripts contain a direct repeat of the second exon. Our best interpretation at this time is that the duplicated second exon is strong evidence that the polyadenylated transcript originates within an internal D4Z4 unit, continues through one or more additional D4Z4 units until there is a successful splice to an internal second exon splice acceptor site, and then this internal second exon is spliced to the second exon in the last repeat, followed by polyadenylation in the pLAM region. This interpretation is consistent with our original observation that sense transcripts appear to span the entire D4Z4 unit with some regions of interruption that are likely secondary to RNA processing. In addition, the 5-prime region of the transcripts containing the duplicated second exon has a polymorphism that does not match the first or last D4Z4 sequence in λ42, again suggesting that this transcript arises from an internal repeat. It will, however, be necessary to accurately identify intra-allelic polymorphisms in the 4qA161 D4Z4 units to validate our interpretations.
If a protein is produced from this spliced transcript, it would contain the double-homeobox region of DUX4 but lack the highly conserved C-terminal region. This would be very similar to the DUX4c transcript and protein, which has been shown to inhibit myogenesis and suppress expression of both MyoD and Myf5 (18). Together with our data, it appears that the amino-terminal portion of DUX4 might suppress MyoD and Myf5 expression, whereas the carboxy-terminal portion can suppress myogenesis at a step following the expression of MyoD. The presence of alternative splice forms and RNAs that potentially differentially regulate the expression of each of these DUX4 regions suggests that each might have a distinct developmental role that needs further exploration.
Finally, it is important to mention that we do find evidence of full-length DUX4 transcripts; however, these appear to be of significantly lower abundance than RNAs containing the 5-prime or 3-prime regions.
Macrosatellite repeats and an emerging model for FSHD
At this time, a biological role for the D4Z4 arrays remains speculative, but recent studies on retrotransposons, chromatin regulation and other macrosatellite repeats reveal striking parallels to our current findings at D4Z4 and suggest a biological role for these repeats. A strong parallel to our work on D4Z4 is the DXZ4 macrosatellite repeat (26). Similar to D4Z4, DXZ4 is a 3 kb GC-rich unit repeated 50–100 times on the X-chromosome. On the active X-chromosome, bidirectional transcription of DXZ4 results in small RNA fragments, presumably siRNA generated from dsRNA. The locus also has H3K9 and CpG methylation that are associated with an siRNA-mediated induction of heterochromatin. On the inactive X-chromosome, the insulator factor CTCF binds adjacent to a bidirectional promoter in a region that remains unmethylated at CpG residues, and this is associated with epigenetic marks of euchromatin and longer RNA transcripts, possibly secondary to decreased production and processing of double-stranded RNA. Similar to DXZ4, our study finds bidirectional transcription of D4Z4 associated with small RNAs. In addition, the D4Z4 units have CTCF binding sites and we find enriched CTCF binding at hypomethylated sites on the deleted pathogenic allele, as well as enriched CTCF binding to the D4Z4 units in undifferentiated ES cells (Filippova et al., in preparation).
At least two other macrosatellite repeats contain genes. TSPY is in the DYZ5 repeat on the Y chromosome and is expressed in the placenta, and USP17 encodes a deubiquitinating enzyme in the RS447 repeat (27). Similar to the coding region of DUX4, the USP17 gene does not contain introns. Also similar to our findings at DUX4, USP17 is transcribed in both sense and anti-sense directions and the anti-sense transcripts are believed to have a role in regulating USP17 expression.
Many intronless genes and pseudogenes were generated by retrotransposition of a spliced mRNA into the genome. It was initially suggested that DUX4 was generated following a retrotransposition of DUXA (30); however, an elegant study of the evolution of the human DUX4 and the D4Z4 repeat indicates that this region arose from a retrotransposition of the DUXC gene (8). DUXC has apparently been lost in the primate lineages but is still present in dogs, cows and armadillos. Generation and propagation of multiple retrotransposed genes indicates germ-line expression and thereby suggests a potential role for DUXA and DUXC in germ cells or early embryonic stem cells. The coding region for DUX4 has been conserved (8) and it is possible that DUX4 protein expression might substitute for the original DUXC function. The mouse DUX4 ortholog is also transcribed in the sense and anti-sense orientation with partial fragments of the RNA detected more readily than the full length (8), indicating that our findings at human DUX4 are conserved in the murine ortholog.
Although DUX4 is not a pseudogene because of its conserved ORF, it has some similarities to emerging properties of some pseudogenes. Recent studies in Drosophila (31) and mammals (20) demonstrate that transcripts from pseudogenes, sometimes occurring in subtelomeric clusters, can suppress transposable elements in the germ-line and regulate RNA stability or translation from the related gene family. One pathway for this regulation is through piRNA, but siRNA is also generated through bidirectional transcription of these pseudogenes. In this context, it is very interesting that we have identified bidirectional transcripts through the subtelomeric cluster of D4Z4 units that contain the pseudogene-like DUX4, and also have demonstrated DUX4 RNA expression in ES cells. Although a more thorough analysis is needed, the apparent decreased RNA processing of the DUX4 transcript in ES cells (Fig. D) suggests that there is a special function for this RNA in ES cells and possibly for the cleaved RNA in the process of ES cell differentiation. It will be interesting to determine whether the small RNAs generated from DUX4 function to suppress DUX4 expression in the germ-line or regulate DUXC in some species or other DUX paralogs.
Therefore, our studies on D4Z4 are consistent with the recent findings that bidirectional transcription of pseudogenes and genes in macrosatellite repeats is developmentally regulated and also serves a regulatory function. The contraction of the repeats in FSHD likely alters the efficiency of one of these functions, such as maintaining regional heterochromatin. Because of the apparent restriction of FSHD to D4Z4 deletions of the 4qA161 allele, it is likely that a polymorphism results in the production of an abnormal product by affecting RNA splicing or polyadenylation, binding of CTCF (or another factor), or small RNA production or targeting. Our current study has provided a strong foundation for this new model of FSHD and identified several new biological processes associated with D4Z4 that warrant further investigation as candidate mechanisms of disease pathophysiology.