Myotonic dystrophy (DM1) is associated with expression of expanded CTG DNA repeats as RNA (CUGexp RNA). To test whether CUGexp RNA creates a global splicing defect, we compared skeletal muscle of two mouse DM1 models, one expressing a CTGexp transgene, and another homozygous for a defective Mbnl1 gene. Strong correlation in splicing changes for ~100 new Mbnl1-regulated exons indicates loss of Mbnl1 explains >80% of the splicing pathology due to CUGexp RNA. In contrast, only about half of mRNA level changes can be attributed to loss of Mbnl1, indicating CUGexp RNA has Mbnl1-independent effects, particularly on mRNAs for extracellular matrix (ECM) proteins. We propose that CUGexp RNA causes two separate effects: loss of Mbnl1 function, disrupting splicing, and loss of another function that disrupts ECM mRNA regulation, possibly mediated by MBNL2. These findings reveal unanticipated similarities between DM1 and other muscular dystrophies.
Myotonic dystrophy (DM) is a triplet repeat expansion disease, one of a group that includes Huntington disease, Fragile X syndrome, and Friedreich’s ataxia1,2. The common form, DM1, is caused by a CTG-repeat expansion (CTGexp) in the 3′-untranslated region of the DMPK gene, leading to myotonia, muscle degeneration, reduced heart function, ocular cataracts, and nervous system dysfunction3. RNA containing CUG repeats (CUGexp RNA) accumulates in nuclear foci4,5. Based on its autosomal dominant inheritance, a leading hypothesis for DM1 is that CUGexp RNA is toxic. Mice engineered to express CUGexp RNA show many symptoms of myotonic dystrophy6,7. In addition to RNA splicing (see below), CUGexp RNA has been proposed to disrupt a wide variety of cellular processes through several mechanisms, including “leaching” of transcription factors8, processing into small RNAs that trigger inappropriate gene silencing9, or through PKC-dependent signaling pathways10. Of these, the contribution of splicing perturbations is most clear given the physiological relevance of several splicing alterations that occur when CUGexp RNA is expressed5,7,11. For example, aberrant splicing of transcripts from the chloride channel gene CLCN1 is responsible for the myotonia in skeletal muscle12–15, and aberrant splicing of insulin receptor transcripts is associated with insulin resistance16.
Among proteins that bind CUGexp RNA are homologs of Drosophila muscleblind (muscleblind-like proteins, Mbnl1 through 3)17,18. Mbnl proteins contain 4 CCCH-type zinc fingers that recognize a YGCY motif repeated within the CUG repeat (CUGCUG), and can function in splicing regulation19–21. CUG repeats can form long dsRNA structures in vitro22; however, crystal structures of Mbnl1 zinc fingers complexed with RNA indicate that the Watson-Crick faces of the binding site 5′-GC-3′ dinucleotide residues are buried in the protein and unavailable for duplex formation20. The colocalization of CUGexp RNA and Mbnl1 in nuclear foci suggests a place where sequestration occurs4,11,17.
Besides Mbnl1, altered function of the splicing factors CUGBP1 and hnRNP H has been proposed to play a role in DM1 pathogenesis23–26. To test the role of Mbnl1, a homozygous Mbnl1 mutant mouse (Mbnl1ΔE3/ΔE3) was created27. Like mice expressing CUGexp RNA, mice deficient in Mbnl1 show characteristics of myotonic dystrophy including aberrant splicing11,19,27. Some but not all of the DM-like symptoms and aberrant splicing of six exons are rescued by AAV-mediated expression of Mbnl1 in CUGexp RNA expressing mice28, leaving open the possibility that CUGexp RNA has other mechanisms of action. In addition, the broader impact on splicing of the loss or sequestration of Mbnl1, beyond the few genes tested so far, is unknown.
To determine the extent to which the loss of Mbnl1 explains the splicing and gene expression defects caused by CUGexp RNA, we compared mRNA in skeletal muscle of HSALR mice7 expressing CUGexp RNA, to those in Mbnl1ΔE3/ΔE3 mice27 using splicing-sensitive microarrays29,30. We find that global splicing perturbations are remarkably congruent in these mice, and identify more than 200 splicing events altered upon loss of Mbnl1. Testing human exons orthologous to these new mouse Mbnl1-dependent exons suggests that human DM1 patients suffer many of the same mis-splicing events, identifying new potential splicing markers for the human disease. Mbnl1 RNA binding sequences were greatly enriched near the affected exons suggesting that CUGexp RNA affects splicing primarily through Mbnl1. In contrast to splicing, many changes in mRNA level are found in CUGexp RNA expressing muscle but not in the Mbnl1 mutant, indicating a distinct second defect caused by CTGexp DNA. This defect appears to be unusually focused on genes expressing extracellular matrix (ECM) components and their regulators, some of which are known to play roles in other forms of muscular dystrophy and connective tissue diseases.
If Mbnl1 sequestration is the main cause of gene expression perturbation in DM1, then transcripts from muscle genetically lacking Mbnl1 should appear broadly similar to those expressing CUGexp RNA. The extent of this similarity should provide a strong quantitative estimate of the extent to which the Mbnl1 sequestration hypothesis can explain the disease. To test this directly, RNA was extracted from quadriceps muscle of age-matched males carrying either the HSALR transgene expressing CUGexp RNA (HSALR)7 or homozygous for the Mbnl1 knockout allele (Mbnl1ΔE3/ΔE3)27, or homozygous wild type Mbnl1 in the same (FVB) background (wt), and analyzed on splicing-sensitive microarrays as described previously29,30. Experience with this method indicates that differences in the log2 of the skip/include ratio between two samples (Fig. 1a, Sepscore, see Methods) with absolute value > 0.3 can be validated by RT-PCR about 85% of the time29,30. We observe 246 events in Mbnl1ΔE3/ΔE3 mice and 221 events in HSALR mice that exceed this score, distributed among different splicing modes (alternative cassette exons, alternative 5′ or 3′ splice sites, and mutually exclusive cassette exons, Fig. 1b and Supplementary Table 1). Even at this relatively crude level of analysis, nearly 80% of splicing events altered in HSALR mice are also altered in Mbnl1ΔE3/ΔE3 mice. About half show increased exon skipping after loss of Mbnl1, and half show decreased skipping, indicating formally that Mbnl1 contributes to both activation and repression of splicing.
To evaluate whether the two disease models share quantitatively similar changes in splicing, we compared the Sepscores for each splicing event altered (with |Sepscore| ≥ 0.3) in both models. The R2 value for this comparison is 0.84, suggesting that Mbnl1 loss of function explains more than 80% of the splicing phenotype caused by CUGexp RNA expression (Fig. 1c). We used RT-PCR to validate a subset of the alternative cassette exons, as shown for a 123 nt cassette exon repressed by Mbnl1 in the Nfix gene (encoding NF-I/X CAAT box-binding transcription factor, Fig. 1d, see Methods). The 93 nt exon of the Tlk gene represents Mbnl1 contribution to splicing activation, as skipping is increased in the mutants (Fig. 1d). Of 33 other events tested, all but one behaved as expected from the arrays (FDR = 0.03, Supplementary Fig. 1 and Supplementary Table 1). Of these, 28 are affected in both HSALR and Mbnl1ΔE3/ΔE3 mice (Fig. 1d; Supplementary Fig. 1a); 4 appear affected only in Mbnl1ΔE3/ΔE3 (e.g. Ndrg2, Fig. 1e; Supplementary Fig. 1b); and 1 appears affected only in HSALR (Fig. 1f; Supplementary Fig. 1c). The concordance observed by arrays (Fig. 1c) was confirmed by RT-PCR reactions quantified on an Agilent Bioanalyzer (R2 = 0.88, Supplementary Fig. 1d).
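The comparison of Sepscores between the two models can be sketched as follows. This is a minimal illustration, assuming R2 is the squared Pearson correlation of Sepscores for events passing the |Sepscore| ≥ 0.3 cutoff in both models; the event names and scores are hypothetical placeholders, not data from the arrays.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def shared_events(sep_a, sep_b, cutoff=0.3):
    """Events whose |Sepscore| meets the cutoff in both models."""
    return sorted(
        e for e in sep_a
        if e in sep_b and abs(sep_a[e]) >= cutoff and abs(sep_b[e]) >= cutoff
    )


# Hypothetical Sepscores keyed by made-up event names (illustration only)
hsalr = {"ev1": -1.2, "ev2": 0.8, "ev3": 0.1, "ev4": -0.5}
mbnl1 = {"ev1": -1.0, "ev2": 0.9, "ev3": 0.6, "ev4": -0.4}

events = shared_events(hsalr, mbnl1)
r2 = r_squared([hsalr[e] for e in events], [mbnl1[e] for e in events])
print(events, round(r2, 3))
```

Here `ev3` is dropped because it passes the cutoff in only one model, which mirrors the restriction to events altered in both.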
We found four splicing events affected in Mbnl1ΔE3/ΔE3 but not in HSALR mice (Fig. 1e, Supplementary Fig. 1b). These might compete effectively for Mbnl1 against CUGexp RNA, and thus may not be inhibited except at levels higher than those achieved by this transgene. We found only one event affected specifically in HSALR mice (Fig. 1f), suggesting that CUGexp RNA transgene expression can cause splicing defects by mechanisms other than simple loss of Mbnl1. Possibly the regulation of this exon is the responsibility of other factors that function in the Mbnl1 knockout mouse but not in HSALR, such as Mbnl2, which is also sequestered by CUGexp RNA31,32, or CUGBP1 whose activity is altered in CUGexp RNA expressing cells24, or by some other indirect or combinatorial mechanism (see below). Given that Mbnl2 is sequestered by CUGexp RNA31,32, our results also mean that Mbnl2 is largely unable to compensate for the loss of Mbnl1 in muscle, otherwise we would have found many more splicing events perturbed in the HSALR but not in the Mbnl1ΔE3/ΔE3 mouse. We conclude that the vast majority of splicing dysregulation in the CUGexp RNA-expressing muscle is due to catastrophic loss of Mbnl1 splicing factor activity.
To identify motifs associated with Mbnl1-regulated exons, we used Improbizer, which identifies sequence motifs present in a set of sequences as compared to a background sequence set (see Methods). We contrasted the introns upstream and downstream of three groups of exons (Supplementary Table 2): Mbnl1-activated exons (more skipping in Mbnl1ΔE3/ΔE3 than wild type); Mbnl1-repressed exons (more inclusion in Mbnl1ΔE3/ΔE3); and background exons (detected in our experiments but showing no significant splicing change). Improbizer strongly recognizes enrichment of a YGCY-containing motif (CUGCY) in the Mbnl1-repressed and Mbnl1-activated exons (Fig. 2a, b), consistent with interpretation of mutagenesis experiments on several splicing substrates18,33,34, and with crystal complexes of Mbnl1 zinc fingers with RNA20. We mapped the distribution of positions of CUGCY motifs to upstream and downstream intron regions (Fig. 2c, d). Introns upstream of Mbnl1-repressed exons, as well as part of the exon itself, are significantly enriched for CUGCY motifs, which often occur multiple times in individual pre-mRNAs. In contrast, the corresponding region of Mbnl1-activated exons is not enriched (Fig. 2c). In the intron downstream of affected exons, both Mbnl1-repressed and Mbnl1-activated exons show slight but significant enrichment in CUGCY elements above the background exons (Fig. 2d). Enrichment for CUGBP1 or hnRNP H motifs is not observed (Supplementary Fig. 2). This pattern argues that the splicing defects are primarily due to loss of Mbnl1, and suggests that regulation by Mbnl1 often involves its direct binding to regions near the exons it regulates.
To confirm the function of YGCY motifs in newly identified exons, we cloned the Mbnl1-activated exon from the Vldlr gene and the Mbnl1-repressed exon from the Nfix gene, each with their native flanking intron sequences, into a splicing reporter plasmid. We then mutated copies of the motif in each and tested them in mouse embryo fibroblasts lacking Mbnl1 (derived from the Mbnl1ΔE3/ΔE3 mouse), along with a construct expressing an Mbnl1-GFP fusion protein or GFP alone (Fig. 3a, b). Splicing activation is promoted by the Mbnl1-GFP fusion protein for the wild type, but not the mutant Vldlr splicing reporter, indicating that the motif downstream of the exon is important for Mbnl1-mediated splicing activation (Fig. 3a). Splicing repression is promoted by the Mbnl1-GFP fusion protein for the wild type Nfix reporter, and is only partly compromised by alteration of four copies of the motif in the intron upstream of the exon. Complete loss of repression is achieved once three additional motif copies in the exon itself are altered (Fig. 3b). These results confirm by reconstruction that newly identified exons depend on Mbnl1 for their correct splicing, and furthermore show that regulation is mediated by sequence motifs enriched in Mbnl1-regulated exons.
A handful of splicing defects are known in human DM1 patients11. To determine whether human DM1 patients share newly identified splicing changes with the mouse DM1 models, we aligned affected mouse exons to the human genome and found 49 that are conserved and orthologous to Mbnl1-dependent mouse exons, 39 of which show significant evidence of alternative splicing in humans, according to the Alt Events track on the UCSC Genome Browser35. We tested six for perturbation of splicing in human DM1 patients. Three (in Nfix, Smyd1, and Spag9) are perturbed in all three DM1 patients tested, as compared to a normal human control, whereas three others (Gnas, Mtdh and Ppp2r5c) are clearly affected only in patient 2, patient 3, or both (Fig. 4). To rigorously determine the value of such changes in monitoring human DM1, more DM1 patients and unaffected people would need to be tested. Nonetheless, these and other11 splicing changes indicate that Mbnl1-dependent events observed in the mouse are altered in human DM1. Thus exons affected in the mouse models represent a rich source of potential markers for the detection and evaluation of the human disease, as well as for testing the efficacy of treatments.
Besides impacting splicing, Mbnl1 loss could lead to transcript-level changes by other direct or indirect mechanisms, including nonsense mediated decay caused by aberrant splicing (AS-NMD)36, altered mRNA stability, or indirectly through mis-splicing of transcription factor or splicing factor mRNAs, all of which could combine to alter the transcriptome37. If Mbnl1 sequestration is the only mechanism by which CTGexp DNA alters gene expression, these myriad effects should be highly similar in both mouse models. To test this, we compared gene expression changes in HSALR mice to Mbnl1ΔE3/ΔE3 mice using the array probe intensities for all the constitutive (always included) exons in each gene (see Methods). After filtering out genes with probes affected by cross-hybridization to the massive amounts of CUGexp RNA in the HSALR mouse samples (see Methods), we identified 148 genes whose muscle transcript levels were significantly altered in HSALR mice, and 110 in the Mbnl1ΔE3/ΔE3 mice (≥ 1.5 fold change; q < 0.05 cutoff, Fig. 5a). Of the 148 changes observed in HSALR mice, 102 (69%) also appear in Mbnl1ΔE3/ΔE3 mice, suggesting that, as for splicing, the underlying reason for those expression changes is loss of Mbnl1, either directly or indirectly through other factors. The 46 genes (31%) that change only in HSALR mice indicate that CTGexp DNA imposes a second layer of dysregulation not shared by mice lacking Mbnl1. Overall, Mbnl1 loss explains a little more than half of the transcript level variance between wild type and HSALR mice (Fig. 5b; R2 = 0.57). We named the larger class of genes whose mRNA levels are altered in both models primarily due to loss of Mbnl1 “class I” genes, and those altered only in the HSALR but not Mbnl1ΔE3/ΔE3 mice “class II” genes. In a more complex study using a very different array platform37, similar gene classes were identified that contained different genes (see Discussion).
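The class I and class II proportions quoted above follow from simple set arithmetic. The sketch below reproduces them with placeholder gene identifiers sized to match the reported counts (148 HSALR, 110 Mbnl1ΔE3/ΔE3, 102 shared); the identifiers are invented for illustration and the 8 Mbnl1-only genes are simply 110 minus 102.

```python
def overlap_summary(changed_a, changed_b):
    """Count genes shared between two sets and genes unique to the first."""
    shared = changed_a & changed_b
    only_a = changed_a - changed_b
    total = len(changed_a)
    return {
        "shared": len(shared),
        "only_a": len(only_a),
        "shared_pct": round(100 * len(shared) / total),
        "only_a_pct": round(100 * len(only_a) / total),
    }


# Placeholder gene IDs sized to match the counts reported in the text
shared_ids = {f"shared_{i}" for i in range(102)}
hsalr_changed = shared_ids | {f"hsalr_only_{i}" for i in range(46)}
mbnl1_changed = shared_ids | {f"mbnl1_only_{i}" for i in range(8)}

summary = overlap_summary(hsalr_changed, mbnl1_changed)
print(summary)
# {'shared': 102, 'only_a': 46, 'shared_pct': 69, 'only_a_pct': 31}
```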
To validate the transcript level changes, we used quantitative RT-PCR (qPCR, Fig. 5c and Supplementary Table 3). Of changes observed only in the HSALR mice, more genes are down-regulated than up-regulated (Fig. 5c). There is good agreement with the microarray data; however, the magnitude of the change detected is larger by qPCR, likely because of the smaller dynamic range of arrays. Because the class II changes do not occur in the Mbnl1ΔE3/ΔE3 mouse (at least by 12–14 weeks of age), they are unlikely to be a simple consequence of a loss of Mbnl1. However, the HSALR mouse also suffers a loss of Mbnl1 function28, and it remains possible that Mbnl1 loss is necessary but not sufficient to produce the class II changes. We conclude that a substantial number of transcript level changes occur in the HSALR mouse that do not occur in the Mbnl1ΔE3/ΔE3 mouse.
To capture class II gene functions, we relaxed the array-measured fold change cut off from 1.5 to 1.3 fold, since qPCR indicated that fold change is underestimated in this data. Because the filtering of cross hybridizing CUGexp RNA in the samples is incomplete, some genes will artificially appear to be up-regulated in the HSALR mouse but not in the Mbnl1ΔE3/ΔE3 mouse. We avoided such genes by only considering those significantly down regulated in the HSALR mouse relative to wild type. To ensure that the genes on the list were specific to CTGexp effects, we required that their q values (FDRs) in both the HSALR vs wild type and the HSALR vs Mbnl1ΔE3/ΔE3 mouse comparisons were < 0.05, and that their q values in the Mbnl1 vs wild type comparisons were > 0.05. This resulted in a set of 93 genes down regulated in the HSALR mouse but not changed in the Mbnl1ΔE3/ΔE3 mouse (Supplementary Table 4).
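The selection rule for down-regulated class II genes can be written as a small filter. This is a sketch under stated assumptions: the record layout, field names, and example genes are hypothetical, and fold change is taken as the HSALR/wild type expression ratio so that "down regulated by at least 1.3 fold" means a ratio of 1/1.3 or less.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    name: str
    fold_hsalr_vs_wt: float   # HSALR / wild type expression ratio (assumed layout)
    q_hsalr_vs_wt: float      # q value, HSALR vs wild type
    q_hsalr_vs_mbnl1: float   # q value, HSALR vs Mbnl1 mutant
    q_mbnl1_vs_wt: float      # q value, Mbnl1 mutant vs wild type


def is_class_ii_down(g, min_fold=1.3, q_cut=0.05):
    """Down in HSALR, significant in both HSALR comparisons, unchanged in the Mbnl1 mutant."""
    down = g.fold_hsalr_vs_wt <= 1.0 / min_fold
    return (
        down
        and g.q_hsalr_vs_wt < q_cut
        and g.q_hsalr_vs_mbnl1 < q_cut
        and g.q_mbnl1_vs_wt > q_cut
    )


# Hypothetical records for illustration only
genes = [
    GeneStats("geneA", 0.60, 0.01, 0.02, 0.40),  # passes every criterion
    GeneStats("geneB", 0.60, 0.01, 0.02, 0.01),  # also changed in the Mbnl1 mutant
    GeneStats("geneC", 1.50, 0.01, 0.02, 0.40),  # up-regulated, so excluded
    GeneStats("geneD", 0.90, 0.01, 0.02, 0.40),  # change smaller than 1.3 fold
]
selected = [g.name for g in genes if is_class_ii_down(g)]
print(selected)
```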
Using the splicing changes, and the class I and class II gene expression change lists, we searched for functionally related gene classes that might provide insight into the disease. We performed a Gene Ontology (GO) analysis38, using the background set of all genes expressed in the experiment (~6000 genes of the ~10,000 on the array). Splicing changes were not significantly enriched in any GO category (not shown), suggesting that Mbnl1-regulated exons are distributed in several broad functional classes of genes. Class I genes, whose expression changes presumably arise as an indirect consequence of the loss of Mbnl1 in splicing, are enriched for insulin and insulin-like growth factor signaling and glucose metabolism (Supplementary Table 5), suggesting that loss of Mbnl1 has a strong but specific effect on glucose metabolism in muscle cells. Other muscle cell terms such as contractile fiber and muscle development are also significantly associated with this gene set.
Class II genes are very significantly enriched for those encoding or regulating components of the extracellular matrix (ECM), with one fifth of the genes (18/93) carrying this functional label (Supplementary Table 6). Fifteen are orthologous to human genes involved in other forms of muscular dystrophy, myopathies, and several connective tissue diseases, or their deletion in mouse creates a similar defect (Table 1, Supplementary Table 7). We conclude that the coordinate dysregulation of a large set of genes associated with ECM function is caused by a second effect of CTGexp DNA that is not explained by simple loss of Mbnl1.
We analyzed the sequences associated with the 5′ and 3′ untranslated regions of the down regulated class II genes. Using the frequency of 7-mer sequence words, we find that sequences containing the Mbnl1 motif (YGCY) are enriched in the untranslated regions of the 93 class II mRNAs. There is a significant enrichment of GGUGCUA in the 3′UTRs as well as enrichment of several related words in the 5′ UTRs (UGCCUGC, CCUGCCU, UGUGCCU; Supplementary Table 8). Due to the extreme similarity between the RNA binding domains of Mbnl1 and Mbnl2, it seems likely that the two proteins bind highly related RNA sequences21,39. Since Mbnl2 is sequestered by CUGexp RNA31,32, these results lead to the strong hypothesis that expression of class II genes might be altered through loss of Mbnl2 function mediated through binding to the untranslated regions of the mRNAs. The enrichment of Mbnl binding sites observed in the 93 class II mRNAs is also observed in the subgroup of class II genes that encode ECM components (Supplementary Table 8). Mbnl2 colocalizes with integrin α3 mRNA in the cytoplasm and promotes its expression at the cell surface40. Since integrins are ECM components, it is tempting to propose that Mbnl2 plays a general role in promoting correct localization, stability, and translation of mRNAs encoding ECM proteins and their regulators.
Our genome-wide test of the Mbnl1 sequestration hypothesis identified a large set of mouse muscle splicing events that depend on Mbnl1, and compared them to splicing changes caused by toxic CUGexp RNA. Strikingly, genetic loss of Mbnl1 accounts for about 80–90% of the phenotype in the CUGexp RNA expressing mice (Fig. 1, Supplementary Fig. 1). The extreme similarity of global splicing defects supports a sequestration model in which CUGexp RNA creates a catastrophic loss of Mbnl1 splicing function in DM1. This is confirmed by the presence of Mbnl1 binding motifs in and near exons whose splicing is compromised by loss of Mbnl1 (Figs. 2, 3). Our analysis was unable to identify enrichment of motifs for other splicing factors implicated in DM1 pathogenesis, such as CUGBP1 (refs. 23, 25) or hnRNP H23 and neither we (this study) nor Kalsotra et al.25, who studied the developing heart, found strong co-enrichment of CUGBP1 motifs with Mbnl1-dependent exons. We also found rare splicing changes specific to each model (Fig. 1). Thus, although Mbnl1 loss is the major contributor to splicing defects in the HSALR mouse, other factors also contribute in more subtle ways.
A cascade of other perturbations due to Mbnl1 loss also occurs. These encompass changes in transcript level (Fig. 5, see also ref. 37). Fully 69% of the transcript level changes observed in the HSALR mouse are also observed in the Mbnl1ΔE3/ΔE3 mouse (class I genes). Many of these show no altered splicing, but they could depend directly on Mbnl1 (if it has a role in mRNA stabilization), or could be altered through indirect effects consequent to the loss of splicing regulation for transcription or other RNA stability factors. The broad impact of Mbnl1 loss makes following the threads of each perturbation to the mechanistic source of each disease symptom a daunting prospect. Whether direct or indirect, perturbations that arise from loss of Mbnl1 are pervasive and complex, and may explain the medical complexities of the human disease. The mouse models reveal defects conserved in human DM1 patients (Fig. 4), suggesting that diagnostic tests for a wide spectrum of genes that depend on Mbnl1 function may presage DM1 onset, or allow sensitive evaluation of therapies.
A second class of gene expression perturbations occurs in CUGexp RNA-expressing mice, but not in mice lacking Mbnl1 (class II genes). A surprising fraction is associated with ECM function (Fig. 6, Supplementary Tables 6 and 7). We propose that their misregulation is due to loss of function of a second factor. The leading candidate for such a factor is Mbnl2, which, like Mbnl1, is sequestered by CUGexp RNA5,31,32,41. The RNA binding domains of Mbnl1 and Mbnl2 are nearly identical, and untranslated regions of class II genes are enriched in Mbnl RNA motifs (Supplementary Table 8), supporting this hypothesis. A mouse Mbnl2 mutant has a relatively mild phenotype that includes muscle fibrosis; however, expression of residual Mbnl2 has not been strictly ruled out in this system42. The enrichment of ECM genes among the class II set (Fig. 6, Supplementary Tables 6 and 7) combined with the finding that Mbnl2 colocalizes with integrin α3 mRNA and promotes localization of the integrin protein at focal adhesions40 makes it reasonable to postulate that Mbnl2 plays a general role in promoting correct expression of ECM molecules.
Loss of Mbnl1 through sequestration by CUGexp RNA contributes to changes in muscle morphology and excitability as indicated by studies with Mbnl1-deficient mice27. These include aberrant splicing of CLCN1 causing myotonia14,15,43, mis-splicing of Camk2g causing interference in excitation–contraction coupling44, or alteration of Acvr2a expression disturbing regulation of muscle growth45. A large number of new candidates associated with ion channels, cytoskeleton and myotubule composition, glucose metabolism and signal transduction may lead to additional defects (Supplementary Tables 1 and 3).
Despite this, loss of Mbnl1 does not produce the progressive muscle wasting observed in DM1. Orengo et al. reported that transgenic mice with 960 CTG repeats in DMPK 3′UTR display muscle loss not observed in Mbnl1ΔE3/ΔE3 mice6,27. We found that CUGexp RNA, but not genetic loss of Mbnl1, causes alteration of a broad set of ECM mRNAs (Fig. 6) that could worsen with age or increased repeat expansion, as observed in the human disease. Consistent with this, mutations in some of these same ECM genes cause muscular dystrophies or connective tissue diseases (Table 1, refs. 46–48). For example, defects in collagen I cause osteogenesis imperfecta with muscle weakness49, whereas mutations in Col6a1 are responsible for Ullrich congenital muscular dystrophy50. Col15a1 is involved in maintaining ECM structural integrity, and Col15a1-deficient mice show progressive muscle fiber degeneration51. Defects in elastin assembly also affect muscle strength and regeneration. Fibrillin-1 (FBN1) mutation causes Marfan syndrome with muscle involvement52,53. Loss of Fibulin-5 contributes to age-related muscular degeneration54,55. Tenascin C interacts with integrin (integrin α7 mutations cause congenital muscular dystrophy), and TNC-deficient mice show reduced muscle strength56. Unlike EpA960 transgenic mice6, HSALR mice do not develop severe muscle weakness and wasting, so more extreme alterations in ECM gene expression may promote a more severe dystrophic phenotype. Together these data overwhelmingly point to the idea that a loss of regulation of ECM function and the consequent effects on cell adhesion contribute to muscle defects in DM1 (Table 1, Fig. 6).
RNA samples from the quadriceps muscle of individual 12–14-week-old male mice from the Mbnl1ΔE3/ΔE3, HSALR, and FVB inbred background lines (n = 4 for each group) were compared. To identify Mbnl1-dependent splicing events in heart (shown in Supplementary Table 2), RNA from quadriceps muscle and heart from Mbnl1ΔE3/ΔE3 mice and age-matched wild type mice in the C57BL/6J background were compared. RNA samples were processed for hybridization to Affymetrix “A-chip” oligonucleotide microarrays29 according to the standards of the manufacturer.
Analysis was done according to ref. 29, with modifications indicated below. Preliminary analysis of data for the HSALR mice indicated that probes with any 7-mer comprised of CTG repeats reported significantly higher intensities than others in the same gene, likely reflecting cross-hybridization driven by the large mass of CUGexp RNA in the sample. We corrected for this by ignoring any gene with more than 4 probes containing any of the three 7-mer permutations of (CTG)n: CTGCTGC, GCTGCTG, or TGCTGCT.
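The cross-hybridization filter described above can be sketched as a simple substring screen. This is an illustration only: the mapping of gene names to probe sequences is a hypothetical data layout, and the code checks each probe sequence as given, without considering strand or reverse complements.

```python
CTG_7MERS = ("CTGCTGC", "GCTGCTG", "TGCTGCT")  # the three 7-mer frames of (CTG)n


def has_ctg_7mer(probe_seq):
    """True if the probe contains any of the three (CTG)n 7-mers."""
    seq = probe_seq.upper()
    return any(kmer in seq for kmer in CTG_7MERS)


def genes_to_ignore(probes_by_gene, max_probes=4):
    """Genes with more than max_probes CTG-repeat-containing probes."""
    return {
        gene
        for gene, probes in probes_by_gene.items()
        if sum(has_ctg_7mer(p) for p in probes) > max_probes
    }


# Hypothetical probe sets for illustration only
probes_by_gene = {
    "geneX": ["AACTGCTGCTT"] * 5 + ["ACGTACGTACGT"],   # 5 flagged probes
    "geneY": ["AACTGCTGCTT"] * 4 + ["ACGTACGTACGT"],   # 4 flagged probes
    "geneZ": ["ACGTACGTACGT"] * 6,                      # none flagged
}
ignored = genes_to_ignore(probes_by_gene)
print(sorted(ignored))
```

Only `geneX` is dropped here, since the rule excludes genes with more than 4 flagged probes rather than 4 or more.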
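The separation score method29 is illustrated for the Nfix(123) exon in Fig. 1a: when exon skipping (Skip) intensities are plotted against exon inclusion (Include) intensities, the wild type samples appear on the left side of the graph while the Mbnl1ΔE3/ΔE3 and HSALR samples appear in the lower right, indicating a shift from exon skipping in the wild type to inclusion in the mutants. The sepscore is the difference in the log2 of the skip/include ratio between two samples:
$$\mathrm{sepscore} = \log_2\left[\frac{(\mathrm{Skip}/\mathrm{Include})_{\mathrm{mutant}}}{(\mathrm{Skip}/\mathrm{Include})_{\mathrm{wild\ type}}}\right]$$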
For each replicate set (HSALR, Mbnl1ΔE3/ΔE3, and wild type), we estimated the log2 ratio of skipping to inclusion using robust least squares analysis. We evaluated sepscore significance by permuting the assignments of data points to replicate sets, calculating the separation score for the permuted data, and estimating the likelihood that the observed data came from the permuted distribution. We estimated rates of overall gene expression according to probesets that measure constitutive features. We identified genes that are differentially expressed between replicate sets using SAM57.
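The scoring and permutation steps can be sketched as below. This is a simplified illustration, not the published procedure: a plain mean of per-sample log2(Skip/Include) values stands in for the robust least squares estimate, the score is taken as mutant minus wild type (an assumed orientation), and the permutation test enumerates every relabeling of the pooled samples exactly. The intensity values are invented.

```python
from itertools import combinations
from math import log2
from statistics import mean


def log2_ratio(samples):
    """Mean log2(skip/include) over (skip, include) intensity pairs."""
    return mean(log2(skip / include) for skip, include in samples)


def sepscore(mutant, wild_type):
    """Difference in log2 skip/include ratio, mutant minus wild type."""
    return log2_ratio(mutant) - log2_ratio(wild_type)


def permutation_p(mutant, wild_type):
    """Fraction of relabelings with |score| at least as large as observed."""
    pooled = mutant + wild_type
    n = len(mutant)
    observed = abs(sepscore(mutant, wild_type))
    hits = total = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(sepscore(group_a, group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Invented (skip, include) intensities, n = 4 per group as in the study design
wild_type = [(800, 200), (760, 210), (820, 190), (790, 205)]
mutant = [(200, 800), (215, 770), (190, 810), (205, 790)]

score = sepscore(mutant, wild_type)
p_value = permutation_p(mutant, wild_type)
print(round(score, 2), round(p_value, 4))
```

With four samples per group there are only 70 possible relabelings, so the smallest two-sided p-value this exact enumeration can return is 2/70.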
Total RNA was reverse transcribed to cDNA with Superscript III (Invitrogen) and oligo-dT, following the manufacturer’s protocol. Then ~50 ng cDNA was used as template for PCR with primers to regions spanning the test cassette exons. PCR was carried out for 25–35 cycles using the Platinum Taq polymerase (Invitrogen). To measure splicing, 1 µl of PCR product from the above reaction was separated on the Agilent Bioanalyzer, which then reported the size and concentration of each splicing-derived product.
Quantitative RT-PCR was carried out in 30 µl reactions using Brilliant QRT-PCR Master Mix kits (SYBR-Green, Stratagene, La Jolla, CA) on a Bio-Rad iCycler. We used GAPDH for normalization, and the protocol of Livak et al.58 for calculating changes in threshold cycles.
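The Livak calculation normalizes the target threshold cycle (Ct) to GAPDH in each sample and then compares test to control, giving a relative expression of 2 to the power of minus ΔΔCt. The sketch below assumes roughly 100% amplification efficiency, as that method does; the Ct values are invented for illustration.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (test vs control) by the 2^-ddCt method."""
    delta_test = ct_target_test - ct_ref_test   # normalize to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta = delta_test - delta_ctrl
    return 2.0 ** (-delta_delta)


# Invented Ct values: target crosses threshold 2 cycles later in the test sample
ratio = fold_change_ddct(
    ct_target_test=26.0, ct_ref_test=18.0,   # e.g. HSALR sample, GAPDH reference
    ct_target_ctrl=24.0, ct_ref_ctrl=18.0,   # e.g. wild type sample
)
print(ratio)  # 0.25, i.e. a 4-fold decrease
```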
We used Improbizer to search intronic sequences flanking each differentially-spliced exon. Background sequences were from alternative exons that were expressed but showed no changes in splicing. We allowed up to 4 copies of each motif per sequence during the search (maxOcc setting = 4). To map motifs, we searched in 40-base windows (offset by 5 bases) through the 150-base region immediately upstream and downstream of each alternative exon. We contrasted 55 Mbnl1-repressed exons (skipping lost in the mutant), 66 Mbnl1-induced exons (inclusion lost in the mutant), and 790 alternative exons expressed without splicing changes. We determined background motif frequencies by randomly sampling equivalent numbers of exons from the background 1000 times per sequence window. Mean background counts and the 95% confidence interval were determined through quartile analysis.
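The window-based motif mapping can be sketched as follows. This is an illustrative reading of the procedure, with assumptions flagged: the motif is CUGCY written in DNA letters (CTGC followed by C or T), overlapping occurrences are counted, and a motif is assigned to a window only if it lies entirely within it. The example sequence is synthetic.

```python
import re

# CUGCY in DNA letters; the lookahead lets overlapping occurrences be found
MOTIF = re.compile(r"(?=CTGC[CT])")
MOTIF_LEN = 5


def window_counts(region, window=40, step=5):
    """Motif counts in sliding windows across a flanking region."""
    region = region.upper().replace("U", "T")
    starts = [m.start() for m in MOTIF.finditer(region)]
    counts = []
    for w in range(0, len(region) - window + 1, step):
        inside = sum(
            1 for s in starts if w <= s and s + MOTIF_LEN <= w + window
        )
        counts.append(inside)
    return counts


# Synthetic 150-base region with a single CTGCT motif at position 10
region = "A" * 10 + "CTGCT" + "A" * 135
counts = window_counts(region)
print(len(counts), counts[:5])
```

A 150-base region scanned with 40-base windows offset by 5 bases yields 23 windows, so each exon contributes 23 counts per flank under this scheme.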
To identify sequence words associated with class II genes we contrasted the sequences of the 5′ and 3′ UTRs to those of all genes expressed in the experiment. We determine the number of times each k-mer appears in the selected sequences and in the background, for k = 5, 6, and 7. A public Perl module (Statistics::ChisqIndep) for chi-squared evaluation of 2×2 contingency table data is used to determine the likelihood (p-value) that the observed enrichment is due to chance. A Bonferroni correction is applied by multiplying the p-value by the number of all possible k-mers (4^k). This correction does not assume independence and ignores the fact that many k-mers only rarely appear in the genome. We consider k-mers to be significantly enriched in the test set relative to background if their false discovery rates are 0.1 or less (Supplementary Table 8).
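The k-mer test can be sketched in a few lines. This is an illustration, not a port of the Perl module: the 2×2 table layout (occurrences of the k-mer versus all other k-mer positions, in test versus background sequences) is an assumption, the Pearson statistic is computed without continuity correction, and the p-value uses the closed form for one degree of freedom. The sequences are synthetic.

```python
from collections import Counter
from math import erfc, sqrt


def count_kmers(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.o.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, erfc(sqrt(stat / 2.0))  # survival function for 1 d.o.f.


def bonferroni_kmer_p(kmer, test_seqs, background_seqs):
    """Chi-squared p-value for one k-mer, multiplied by 4^k and capped at 1."""
    k = len(kmer)
    test = count_kmers(test_seqs, k)
    background = count_kmers(background_seqs, k)
    a, c = test[kmer], background[kmer]
    _, p = chi2_2x2(a, sum(test.values()) - a, c, sum(background.values()) - c)
    return min(1.0, p * 4 ** k)


# Synthetic UTR-like sequences for illustration only
test_seqs = ["TGCCTGCATGCCTGCA", "AATGCCTGCTT"]
background_seqs = ["ACGTACGTACGTACGT", "TTTTAAAACCCCGGGG", "ATATATATGCGCGCGC"]
corrected_p = bonferroni_kmer_p("TGCCTGC", test_seqs, background_seqs)
print(corrected_p)
```

Note that the chi-squared test is two-sided, so a full analysis would also check that the k-mer is more frequent in the test set than in the background before calling it enriched.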
The human β-globin Dup33 minigene plasmid59 (gift of Doug Black) was used to create splicing reporters. The Vldlr(84) and Nfix(123) exons and associated intron sequences (~200 bp on each side) were amplified from mouse genomic DNA and inserted between the ApaI and BglII sites. Mutagenesis was performed using the QuikChange Site-Directed Mutagenesis Kit (Stratagene Corp.)
Mouse embryonic fibroblasts from Mbnl1ΔE3/ΔE3 (C57BL/6J background) were immortalized by transfection with pOT, a plasmid expressing SV40 T-antigen60 (gift of Xiang-Dong Fu). The cells were grown in antibiotic-free Dulbecco’s modified Eagle’s medium (DMEM) with 15% fetal bovine serum. To test the function of Mbnl1 and the motifs, 0.5 µg of wild type or mutant minigene and 1 µg of pEGFP-MBNL1 (ref. 28) or the control plasmid producing only GFP (pEGFPC1) were co-transfected using Lipofectamine 2000 reagent (Invitrogen). RNA was harvested 24 h after transfection.
We evaluated Gene Ontology term enrichment using GoMiner (Version 200904, April 2009; Application Build 246, ref. 38), relative to the set of all genes expressed in the experiment. Because of the use of Fisher’s exact test for scoring, enrichment of GO classes for which there are 5 or fewer genes must be considered tentative.
We thank Doug Black (UCLA) and Xiang-Dong Fu (UCSD) for gifts of plasmid and advice on mammalian cell culture. Thanks to Benoit Chabot and Jeremy Sanford for critical readings of the manuscript. This work was primarily supported by grant GM084317 to M.A., with additional support from GM040478 (to M.A.), grant AR046799 (to M.S.) and grants AR046806 and a Muscular Dystrophy Center grant U54NS048843 (to C.T.). M.H. acknowledges the California Institute of Regenerative Medicine for postdoctoral support.
Microarray data for this study has been deposited in GEO under accession number GSE17986.
M.A., H.D., M.S. and C.T. designed the experiments. H.D., R.O., D.T., and M.H. performed the experiments. T.C. provided materials and data analysis. M.C., J.D., and L.S. performed data analysis. M.A., H.D., and M.C. wrote the paper.