TAR DNA-binding protein 43 (TDP-43) is associated with a spectrum of neurodegenerative diseases. Although TDP-43 resembles heterogeneous nuclear ribonucleoproteins, its RNA targets and physiological protein partners remain unknown. Here we identify RNA targets of TDP-43 from cortical neurons by RNA immunoprecipitation followed by deep sequencing (RIP-seq). The canonical TDP-43 binding site (TG)n is 55.1-fold enriched, and moreover, a variant with adenine in the middle, (TG)nTA(TG)m, is highly abundant among reads in our TDP-43 RIP-seq library. TDP-43 RNA targets can be divided into three different groups: those primarily binding in introns, in exons, and across both introns and exons. TDP-43 RNA targets are particularly enriched for Gene Ontology terms related to synaptic function, RNA metabolism, and neuronal development. Furthermore, TDP-43 binds to a number of RNAs encoding for proteins implicated in neurodegeneration, including TDP-43 itself, FUS/TLS, progranulin, Tau, and ataxin 1 and -2. We also identify 25 proteins that co-purify with TDP-43 from rodent brain nuclear extracts. Prominent among them are nuclear proteins involved in pre-mRNA splicing and RNA stability and transport. Also notable are two neuron-enriched proteins, methyl CpG-binding protein 2 and polypyrimidine tract-binding protein 2 (PTBP2). A PTBP2 consensus RNA binding motif is enriched in the TDP-43 RIP-seq library, suggesting that PTBP2 may co-regulate TDP-43 RNA targets. This work thus reveals the protein and RNA components of the TDP-43-containing ribonucleoprotein complexes and provides a framework for understanding how dysregulation of TDP-43 in RNA metabolism contributes to neurodegeneration.
Gene expression is an essential process common to all living organisms. Regulation of genes in mammals can occur by repression or activation at transcription promoter sites or by regulating aspects of RNA metabolism (1). RNA metabolism is dysregulated in several neurodevelopmental and neurodegenerative diseases (2, 3), and it is plausible that defective RNA metabolism contributes to the pathogenesis and progression of neurodegeneration. Genetic mutations in two RNA-binding proteins, TDP-43 and fused in sarcoma/translated in liposarcoma (FUS/TLS), have recently been identified as causative factors of familial and sporadic amyotrophic lateral sclerosis (ALS) (4, 5). TDP-43 and FUS/TLS are also major components of the ubiquitinated neuronal and glial inclusions in affected brain and spinal cord regions of patients with ALS and frontotemporal lobar degeneration with ubiquitin-positive inclusions (6). In animal studies, transgenic mice for wild type (7) and mutant (8) TDP-43 partially phenocopy the human diseases. Genomic deletion of TDP-43 is embryonic lethal, indicating an essential role of TDP-43 in early embryogenesis (9, 10). Its neural function, however, is not known, nor is how alterations of neural TDP-43 lead to neurodegeneration.
TDP-43 is part of the family of heterogeneous nuclear ribonucleoproteins (hnRNPs), containing two highly conserved RNA recognition motifs and a non-conserved C-terminal region that mediates protein-protein interactions (11). It has been implicated in gene transcription, pre-mRNA splicing, mRNA stability, and mRNA transport (12). TDP-43 was shown to have high binding affinity for the (TG)n motif (13). Splicing of the cystic fibrosis transmembrane conductance regulator (CFTR) (13), apolipoprotein A-II (APOAII) (14), and survival of motor neuron (SMN) (15) was reported to be regulated by TDP-43. In addition, TDP-43 has been implicated in regulation of mRNA biogenesis (16) and shown to be localized to sites of mRNA transcription and processing in neurons (17) and to bind directly to miRNAs (18).
TDP-43 is one of ~600 annotated and predicted RNA-binding proteins that function in multiprotein complexes, working in cooperation to perform a collective function in RNA metabolism to regulate protein-coding genes. The constitutions of many RNA-binding protein complexes and their RNA targets are not well characterized, nor is the site specificity of these complexes for their RNA targets known. These questions can now be answered due to recent advances in proteomics, functional genomics, and high throughput sequencing.
This study aims to identify the native protein constituents of TDP-43-containing ribonucleoprotein complexes and their RNA targets in neural cells. We performed a proteomic analysis on proteins co-purified with TDP-43 from brain nuclear extracts and determined that TDP-43 is in complexes with 25 endogenous nuclear proteins, mostly RNA-binding proteins and splicing factors. Notable among them are the neural proteins methyl CpG-binding protein 2 (MECP2) and polypyrimidine tract-binding protein 2 (PTBP2). In parallel, we used RIP-seq and a bioinformatics approach, which included producing our own mappability track for calculating the read density within each gene, to identify and analyze novel, in vivo TDP-43 RNA targets. We found that TDP-43 binds predominantly to RNAs containing the consensus motif (UG)n. Moreover, our analysis shows that there is often an adenine in the middle of the motif, (UG)nUA(UG)m. (Because our TDP-43 library generated for deep sequencing is composed of cDNAs that are then mapped to the rat genome, for simplicity, we will use TG instead of UG in describing TDP-43 targets in the rest of the text.) We also demonstrated that our TDP-43 library is significantly enriched in reads containing a PTBP2 consensus motif, suggesting that PTBP2 may co-regulate TDP-43 RNA targets. This work thus reveals the nuclear components of the TDP-43-containing ribonucleoprotein complexes and provides a framework for understanding the neuronal function of TDP-43 and its contribution to neurodegenerative disorders.
Electrophoresis reagents were from Bio-Rad. All other chemicals were reagent-grade and were as indicated in the following sections. TDP-43 was detected with antibody 748C (9) or an anti-TDP-43 antibody from Proteintech Group Inc.; other antibodies were against hnRNPA1, lamin A/C, and MECP2 (Sigma-Aldrich).
Mice on a mixed background (ages ~3–5 months) were sacrificed, and brains were harvested. Mouse brains were added to homogenization buffer (10 mM HEPES-NaOH, pH 7.4, 1 mM MgCl2, 250 mM sucrose, 1× protease inhibitors) (Roche Diagnostics) and homogenized using a Dounce homogenizer. Homogenates were centrifuged at 1,500 × g for 10 min at 4 °C. The pellet was suspended in nuclear isolation buffer (10 mM HEPES-NaOH, pH 7.4, 1 mM MgCl2, 1.42 M sucrose, 1 mM DTT, 1× protease inhibitors), added to Beckman centrifuge tubes, and centrifuged in a SW 45 Ti rotor at 100,000 × g for 1 h. The nuclear pellet (P100) was suspended in NT2 lysis buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 1 mM MgCl2, 0.05% Nonidet P-40, 20 mM DTT, 1× protease inhibitors).
Precleared nuclear extracts were applied to the AminoLink Plus resin (Pierce) cross-linked with either nonspecific rabbit IgG or TDP-43(748C) antibodies. Briefly, antibodies were diluted in coupling buffer (0.1 M Na3C6H5O7 and 0.05 M Na2CO3, pH 10) and coupled to the resin. Antibodies were then cross-linked to the resin, using 0.1 M NaBH3CN. Lysates were incubated overnight at 4 °C and eluted with 50 mM glycine, pH 2.5, and eluents were neutralized with 1 M Tris-HCl, pH 8.0.
HeLa nuclear lysates were untreated or pretreated with RNase A (Roche Diagnostics) and rat brain nuclear lysates were untreated or pretreated with micrococcal nuclease before they were loaded into a Superose 6 or Superdex 200 column (GE Healthcare), respectively, in buffer containing 50 mM Tris-HCl, 150 mM NaCl at pH 7.5. The collected fractions were used for Western blot analysis.
Immunoprecipitation eluates were desalted via loading and briefly resolving protein bands in a 10% polyacrylamide SDS gel. After staining with Coomassie Blue, each gel lane was cut into a band, and bands were subjected to in-gel digestion (12.5 μg/ml trypsin). Extracted peptides were loaded onto a C18 column (100-μm internal diameter, 12 cm long, ~300 nl/min flow rate, 5 μm, 200 Å pore size resin from Michrom Bioresources, Auburn, CA) and eluted during a 10–30% gradient (Buffer A, 0.4% acetic acid, 0.005% heptafluorobutyric acid, and 5% acetonitrile; Buffer B, 0.4% acetic acid, 0.005% heptafluorobutyric acid, and 95% acetonitrile) for 90 min (Experiment Sample 1) or 30 min (Experiment Sample 2). The eluted peptides were detected by Orbitrap (350–1500 m/z, 1,000,000 automatic gain control target, 1,000-ms maximum ion time, resolution 30,000 full width at half maximum) followed by 9–10 data-dependent MS/MS scans in linear trap quadrupole (2 m/z isolation width, 35% collision energy, 5,000 automatic gain control target, 200-ms maximum ion time) on a hybrid mass spectrometer (Thermo Finnigan).
Acquired MS/MS spectra were extracted and searched against a mouse reference database from the National Center for Biotechnology Information using the SEQUEST Sorcerer algorithm (version 2.0, SAGE-N). Searching parameters included mass tolerance of precursor ions (±50 ppm) and product ion (±0.5 m/z), partial tryptic restriction, with a dynamic mass shift for oxidized Met (+15.9949), four maximal modification sites, and three maximal missed cleavages. Only b and y ions were considered during the database match. To evaluate false discovery rate, all original protein sequences were reversed to generate a decoy database that was concatenated to the original database. The false discovery rate was estimated by the number of decoy matches (nd) and the total number of assigned matches (na). False discovery rate = 2 × nd/na, assuming the mismatches in the original database were the same as in the decoy database. To remove false positive matches, assigned peptides were grouped by a combination of trypticity (fully, partial, and non-tryptic) and precursor ion charge state (1+, 2+, and 3+). Considering shift from expected precursor m/z (<10 ppm) and by dynamically increasing XCorr (minimal 1.8) and ΔCn (minimal 0.05) values, protein false discovery rate was reduced to less than 5% (and less than 3% for proteins identified by a single peptide match). Proteins were quantified using the Abundance Index, which is defined as the spectral counts divided by the number of peptides per protein (19, 20).
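The two quantities defined above, the decoy-based false discovery rate and the Abundance Index, can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study; the function names and the example numbers are hypothetical.

```python
def decoy_fdr(n_decoy: int, n_assigned: int) -> float:
    """Estimate FDR as 2 * n_d / n_a for a concatenated target-decoy search.

    The factor of 2 reflects the stated assumption that the original
    database contains as many mismatches as the decoy database.
    """
    if n_assigned <= 0:
        raise ValueError("n_assigned must be positive")
    return 2.0 * n_decoy / n_assigned


def abundance_index(spectral_counts: int, n_peptides: int) -> float:
    """Spectral counts divided by the number of peptides for one protein."""
    if n_peptides <= 0:
        raise ValueError("n_peptides must be positive")
    return spectral_counts / n_peptides


# Hypothetical numbers, for illustration only.
fdr = decoy_fdr(n_decoy=5, n_assigned=400)
ai = abundance_index(spectral_counts=20, n_peptides=4)
print(f"FDR = {fdr:.3f}, Abundance Index = {ai:.2f}")
```

With these made-up inputs, 5 decoy matches among 400 assigned matches give an estimated false discovery rate of 2.5%.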
Rat cortical neurons (14 days in vitro) were isolated from embryonic day 18 rat brains and cultured as described previously (9). Cells were rinsed with PBS, lysed using polysomal lysis buffer (100 mM KCl, 5 mM MgCl2, 10 mM HEPES pH 7.0, 0.5% Nonidet P-40, 1 mM DTT, 100 units ml⁻¹ RNase Out, 400 μM vanadyl ribonucleoside complexes, 1× protease inhibitors), and sonicated using a Bioruptor® UCD200 to fragment the RNA (30 s on, 30 s off, repeated three times), and then lysates were precleared. Supernatants were diluted 10-fold in NT2 buffer (supplemented with 200 units ml⁻¹ RNase Out, 400 μM vanadyl ribonucleoside complexes, 20 mM EDTA) and added to antibody-protein A beads. Both TDP-43 and nonspecific rabbit IgG antibodies were affinity-purified using protein A beads. Immunoprecipitation occurred for 2 h. Beads were washed with ice-cold NT2 buffer before eluting the RNP components and RNA with the addition of RNA-Stat60. The aqueous phase was separated by adding chloroform, and RNA was precipitated from the aqueous phase using 70% ethanol. Isolated RNA was treated with DNase I to remove any genomic DNA contamination. The cDNA libraries were generated as per the Illumina manufacturer's instructions accompanying the RNA sample kit (part number 1004898). Briefly, the isolated RNA was fragmented using the provided fragmentation buffer, and first strand cDNA was generated using SuperScript II followed by second strand synthesis using DNA polymerase I. The cDNA was end-repaired using a combination of T4 DNA polymerase, Escherichia coli DNA polymerase I large fragment (Klenow polymerase), and T4 polynucleotide kinase. The blunt, phosphorylated ends were treated with Klenow fragment (3′ to 5′ exonuclease-minus) and dATP to yield a protruding 3′-A base for ligation of the Illumina adapters, which have a single T base overhang at the 3′ end. After adapter ligation, cDNA was PCR-amplified with Illumina primers for 15 cycles, and library fragments of ~250 bp (insert plus adaptor and PCR primer sequences) were band-isolated from an agarose gel. The purified cDNA was captured on an Illumina flow cell for cluster generation. Libraries were sequenced on the Illumina GA IIx genome analyzer following the manufacturer's protocols.
The rat genome (build rn4) was downloaded from the University of California, Santa Cruz (UCSC) Genome Browser (21) in May 2010. We used the RefSeq annotations (22) for rat and downloaded the associated table from the UCSC Genome Browser in May 2010. 5′-UTR, 3′-UTR, and coding region as well as intron and exon definitions were based on this RefSeq annotation.
The read lengths from both TDP-43 and control libraries were 36 nt (nucleotides) long. These reads were mapped to the rat genome using the Bowtie software (23). To ensure that the highest possible coverage was obtained, the reads were first truncated 1 nt at a time so that the length range was 12–36 nt, yielding a total of 24 different search files. Next, the error rates were compared in mapping with 0–3 mismatches. It was found that using the whole length of the reads (36 nt) allowing two mismatches gave the best trade-off between coverage and mapping accuracy (supplemental Fig. S1). Given the nature of the sequencing protocol, it was not possible to differentiate between reads coming from the two different strands. Using the mapped Bowtie output, the number of reads that map to a particular region (intron/exon, 5′-UTR, 3′-UTR, coding sequence) of each gene was calculated.
The number of reads that map to a particular gene is strongly correlated with its length. Furthermore, due to shared/repetitive sequences between genes, not all 36-nt reads can be mapped uniquely to the genome. Therefore, it is not enough to simply divide the number of uniquely mapped reads by the total length of the gene to calculate the density of reads per unit sequence length.
Currently, there is no existing information about how mappable a given sequence is for the rat genome, so we generated our own mappability track for the accurate calculation of the density of reads within each gene. For every position in the rat genome, we took 36 nt starting at that nucleotide and mapped this sequence using the Bowtie software back to the rat genome, allowing no mismatches. We assigned a mappability score of 1 if this sequence was the only instance in the genome. Otherwise, we assigned a mappability score of 0. Finally, we defined the effective length of a region as the sum of mappability scores of all positions in that region.
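The mappability scoring and effective-length definition can be illustrated on a toy sequence. The sketch below is not the Bowtie-based procedure itself: it assumes exact matching on the forward strand only and replaces the aligner with an in-memory k-mer count, and the function names and the toy genome are hypothetical.

```python
from collections import Counter


def mappability_scores(genome: str, k: int) -> list[int]:
    """Score each start position 1 if its k-mer is unique in the genome, else 0.

    Assumption: forward-strand exact matches only (a simplification of
    aligning every k-mer back to the genome with zero mismatches).
    """
    n_windows = len(genome) - k + 1
    if n_windows <= 0:
        return []
    counts = Counter(genome[i:i + k] for i in range(n_windows))
    return [1 if counts[genome[i:i + k]] == 1 else 0 for i in range(n_windows)]


def effective_length(scores: list[int], start: int, end: int) -> int:
    """Sum of mappability scores over the half-open region [start, end)."""
    return sum(scores[start:end])


# Toy example with k = 4 instead of 36; "ACGT" occurs twice, so both of
# its start positions receive a score of 0.
toy_genome = "ACGTACGTTT"
scores = mappability_scores(toy_genome, k=4)
print(scores)
print(effective_length(scores, 0, len(scores)))
```

For a genome-scale sequence an in-memory counter of all 36-mers would be impractical, which is one reason an aligner is the natural tool for the real computation.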
One way to identify common short sequences that are preferentially bound by TDP-43 is to search for sequence fragments that are enriched in the TDP-43 library when compared with the control library. To determine how many times a short sequence appears in the TDP-43 library, we first found all possible 12-nt sequence fragments by shifting one nucleotide at a time to produce 24 fragments from each 36-nt read in the library. For each read, the same 12-nt sequence could appear more than once. In these cases, we only counted a single occurrence of these 12 nt. Due to the high computational cost of this operation, we decided to use a randomly selected subset of ~10 million reads from the TDP-43 library. After ranking the 12-nt sequences by the number of occurrences among these reads, we took the most frequent 100 fragments, and we counted the total number of occurrences of only these sequences in both the entire TDP-43 and the control libraries. The significance of the -fold enrichment was assessed using Fisher's exact test. Because we used the same dataset to generate and test our hypothesis, we corrected for the multiple hypotheses testing problem using Bonferroni correction. We multiplied our p values by 4¹² (the number of possible 12-nt sequences) to obtain the adjusted p values.
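The fragment counting described above can be sketched as follows. This is a simplified illustration under stated assumptions: each 12-nt fragment is collapsed with its reverse complement (because the libraries are not strand-specific), each fragment is counted at most once per read, and the Bonferroni factor is taken to be the number of possible 12-nt sequences. All function names are hypothetical, and Fisher's exact test itself is left to a statistics library.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Collapse a k-mer and its reverse complement into one representative."""
    return min(kmer, reverse_complement(kmer))


def count_kmers_per_read(reads, k: int = 12) -> Counter:
    """Count, for each canonical k-mer, the number of reads containing it.

    A k-mer that occurs several times in one read is counted once.
    """
    counts = Counter()
    for read in reads:
        seen = {canonical(read[i:i + k]) for i in range(len(read) - k + 1)}
        counts.update(seen)
    return counts


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy reads (14 nt rather than 36 nt) for illustration only.
toy_reads = ["TGTGTGTGTGTGTG", "ACACACACACACAC", "AAAAAAAAAAAAAA"]
kmer_counts = count_kmers_per_read(toy_reads, k=12)
for kmer, n in kmer_counts.most_common(3):
    print(kmer, n)
print(bonferroni(1e-10, 4 ** 12))
```

In this toy input the TG-repeat read and the AC-repeat read contribute to the same canonical fragments, which mirrors the treatment of (TG)6 and its reverse complement as a single motif.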
For each gene, we first determined the number of reads mapping to its exons and introns and calculated the total intronic and exonic effective length of all genes. Then, we defined the exonic read density of each gene as equal to the number of reads mapped to its exons divided by its total exonic effective length. We also calculated intronic read density analogously.
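As a worked illustration of the density calculation, the sketch below divides a read count by an effective (mappable) length and also shows a variant scaled per 1,000 mappable nucleotides and per million reads. The function names and the numbers are hypothetical, and returning zero for a region with no mappable positions is an assumption of this sketch.

```python
def read_density(n_reads: int, effective_length: int) -> float:
    """Reads per mappable nucleotide for one region (exonic or intronic)."""
    if effective_length == 0:
        return 0.0  # assumption: regions with no mappable positions score 0
    return n_reads / effective_length


def scaled_read_density(n_reads: int, effective_length: int,
                        total_reads: int) -> float:
    """Reads per 1,000 mappable nucleotides per million reads in the library."""
    if effective_length == 0 or total_reads == 0:
        return 0.0
    return n_reads * 1e3 * 1e6 / (effective_length * total_reads)


# Hypothetical gene: 50 exonic reads over 2,000 mappable exonic nucleotides
# in a library of 10 million reads.
print(read_density(50, 2000))
print(scaled_read_density(50, 2000, 10_000_000))
```

Using the effective length rather than the annotated length avoids penalizing genes whose sequence is largely repetitive and therefore cannot attract uniquely mapped reads.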
Using the reads from the TDP-43 library, we ranked all RefSeq genes based on either exonic or intronic read density, obtaining two separate ranked lists. Then, for each of these genes, we calculated the ratio of exonic and intronic read densities in the TDP-43 library to the control library. The top 25% genes from these lists were analyzed further after filtering out genes for which the ratio of read density between the TDP-43 sample and the control library was less than the ratio of total number of reads in the TDP-43 library to that in the control library (1.0547). This filter eliminated more genes from the exonic set than the intronic set. Among the filtered out genes were highly abundant transcripts. We defined three groups within TDP-43 targets. The “exonic targets” of TDP-43 were genes that ranked in the top 25% in exonic read density and had at least 1.0547-fold more exonic reads in the TDP-43 library when compared with the control library. The “intronic targets” of TDP-43 were defined similarly. The genes that appeared in the top 25% of both exonic and intronic read density and had a greater than 1.0547-fold difference in both exonic and intronic ratio of TDP-43 reads to control were defined as the set of “dual targets.” For a summary of the computational analysis of the libraries, see supplemental Fig. S2.
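The grouping rules above can be sketched in a few lines. This is an illustrative reading of the procedure, not the original analysis code: it assumes the three groups are made mutually exclusive (a gene passing both filters is reported only as a dual target), and it treats a gene with zero control density as enriched whenever its TDP-43 density is positive. The class, the function names, and the toy genes are hypothetical.

```python
from dataclasses import dataclass

LIBRARY_RATIO = 1.0547  # ratio of total reads, TDP-43 library to control


@dataclass
class GeneDensity:
    name: str
    exon_tdp: float
    exon_ctrl: float
    intron_tdp: float
    intron_ctrl: float


def top_quartile(genes, key) -> set:
    """Names of the genes in the top 25% when ranked by `key` (descending)."""
    ranked = sorted(genes, key=key, reverse=True)
    n_top = max(1, len(ranked) // 4)
    return {g.name for g in ranked[:n_top]}


def enriched(tdp: float, ctrl: float, threshold: float = LIBRARY_RATIO) -> bool:
    """True if the TDP-43/control density ratio reaches the threshold."""
    if ctrl == 0:
        return tdp > 0  # assumption for genes absent from the control
    return tdp / ctrl >= threshold


def classify(genes) -> dict:
    """Split genes into exonic, intronic, and dual targets."""
    top_exon = top_quartile(genes, key=lambda g: g.exon_tdp)
    top_intron = top_quartile(genes, key=lambda g: g.intron_tdp)
    groups = {"exonic": [], "intronic": [], "dual": []}
    for g in genes:
        exon_hit = g.name in top_exon and enriched(g.exon_tdp, g.exon_ctrl)
        intron_hit = g.name in top_intron and enriched(g.intron_tdp, g.intron_ctrl)
        if exon_hit and intron_hit:
            groups["dual"].append(g.name)
        elif exon_hit:
            groups["exonic"].append(g.name)
        elif intron_hit:
            groups["intronic"].append(g.name)
    return groups


# Toy data: 12 genes, so the top quartile holds 3 genes per ranking.
toy_genes = [
    GeneDensity("geneA", 10.0, 2.0, 9.0, 1.0),    # high and enriched in both
    GeneDensity("geneB", 8.0, 1.0, 0.05, 0.05),   # exons only
    GeneDensity("geneC", 0.05, 0.05, 7.0, 1.0),   # introns only
    GeneDensity("geneE", 9.0, 9.0, 0.05, 0.05),   # abundant, not enriched
    GeneDensity("geneF", 0.05, 0.05, 6.0, 6.0),   # abundant, not enriched
] + [GeneDensity(f"bg{i}", 0.1 * i, 0.1 * i, 0.1 * i, 0.1 * i) for i in range(1, 8)]

print(classify(toy_genes))
```

In the toy data, geneE and geneF rank in the top quartile but are dropped by the ratio filter, which corresponds to the removal of highly abundant transcripts that are not enriched relative to the control.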
The functional analysis of TDP-43 targets was performed using the FuncAssociate software (24, 25). Briefly, enrichment for each Gene Ontology classification was calculated using Fisher's exact test. The p values were adjusted for the multiple hypotheses testing problem using a resampling approach (24). We specified the set of all RefSeq IDs as the universe of all genes unless otherwise specified. All other statistical analyses were carried out using the R 2.9 software package.
We searched the TDP-43 library for the consensus binding motifs and their reverse complements of PTBP2 (CTCTCTCTCTCT), hnRNP-A2/B1 (TTAGGGTTAGGG), and hnRNPC (CTTTACATTTG) and as a negative control PPARγ (AGGTCAXAGGTCA). We compared the number of occurrences of these motifs in the TDP-43 library with the control one to determine -fold enrichment.
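A motif search of this kind can be sketched as follows. The code counts reads that contain either a motif or its reverse complement and then compares the two libraries. Normalizing by library size is an assumption of this sketch, since the exact normalization is not stated here; the function names and toy reads are hypothetical, and wildcard positions such as the X in the PPARγ motif are not handled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_reads_with_motif(reads, motif: str) -> int:
    """Number of reads containing the motif on either strand."""
    rc = reverse_complement(motif)
    return sum(1 for read in reads if motif in read or rc in read)


def fold_enrichment(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """Ratio of motif-containing read fractions, library A over library B."""
    if hits_b == 0 or total_a == 0 or total_b == 0:
        raise ValueError("fold enrichment is undefined for these counts")
    return (hits_a / total_a) / (hits_b / total_b)


# Toy libraries (16-nt reads) for illustration only.
motif = "CTCTCTCTCTCT"
tdp_reads = ["AACTCTCTCTCTCTAA", "TTAGAGAGAGAGAGTT",
             "ACGTACGTACGTACGT", "GGGGGGGGGGGGGGGG"]
ctrl_reads = ["AACTCTCTCTCTCTAA", "ACGTACGTACGTACGT",
              "GGGGGGGGGGGGGGGG", "TTTTTTTTTTTTTTTT"]

hits_tdp = count_reads_with_motif(tdp_reads, motif)
hits_ctrl = count_reads_with_motif(ctrl_reads, motif)
print(fold_enrichment(hits_tdp, len(tdp_reads), hits_ctrl, len(ctrl_reads)))
```

Searching both orientations is needed because the sequencing protocol does not distinguish the sense strand from the antisense strand.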
We first determined the native state of TDP-43 in HeLa nuclear extracts using gel filtration. We found that TDP-43 elutes in a high molecular mass range with a peak elution at >500 kDa and less prominently in a lower molecular mass range (Fig. 1A). We show that with the addition of RNase A, TDP-43 elution in the high molecular mass range could be shifted to the lower molecular mass range (Fig. 1A). A similar pattern was observed for hnRNPA1 in control lysates as well as lysates treated with RNase A but not for lamin A/C (Fig. 1A). We noticed that nuclease treatment resulted in the presence of ~35- and ~25-kDa TDP-43 bands, which are thought to be associated with neuropathology. To test whether TDP-43 exists in high molecular mass complexes in the brain, we took rat brain nuclear extracts and applied the extracts to a gel filtration column. Similar trends were observed as in HeLa nuclear extracts (Fig. 1B). TDP-43 elution in a lower molecular mass range of ~100 kDa probably reflects a TDP-43 dimer (Fig. 1, A and B) (26). The observation that TDP-43 prominently exists in high molecular mass complexes (which probably are functional units) is consistent with the association of TDP-43 with RNAs.
Native TDP-43 RNA targets in neurons have not been identified. To identify RNAs associating with TDP-43, we used a modified RNA immunoprecipitation method followed by deep sequencing (RIP-seq) (27) from primary cultured rat cortical neurons (Fig. 1C). We obtained 30.6 and 28.9 million 36-nt reads from the TDP-43 and control libraries, respectively. Of these, 47.1% (TDP-43) and 17.7% (control) mapped uniquely to the rat genome (build rn4). This 2.75-fold difference in mappability between the two libraries indicates that the TDP-43 library was enriched for regions of high sequence complexity. Consistent with this, 7.98 million reads in the TDP-43 library mapped uniquely to annotated RefSeq genes versus only 1.97 million reads from the control library (supplemental Data Files S1 and S2).
In our TDP-43 library, 1.33 million reads mapped to exonic regions of genes versus 6.65 million reads to intronic regions (Fig. 1D). We calculated read densities (number of reads per 1,000 mappable nucleotides per million reads (mRPKM)), which take into account differences in lengths of various regions (see “Experimental Procedures”). Within exons, TDP-43 reads exhibited 2.2 and 2.3 times higher read density in 3′-UTRs when compared with 5′-UTRs or open reading frames (coding sequence), respectively (Fig. 1E). This suggests that TDP-43 binding sites are enriched within 3′-UTRs. Interestingly, we observed several cases where TDP-43 reads extended beyond the annotated 3′-UTR end (data not shown), suggesting that several of these genes might have alternative isoforms in which use of distal polyadenylation sites results in longer 3′-UTRs. This is consistent with previous analyses suggesting that in differentiated tissues, genes tend to have longer 3′-UTRs (28). With regard to reads mapping to introns, we did not find any overall bias in the distribution of reads (8.33 × 10⁻⁷, 6.94 × 10⁻⁷, and 8.33 × 10⁻⁷, for coding sequence intron, 3′-UTR intron, and 5′-UTR intron read density, respectively).
We then undertook an unbiased search for frequent short sequences in our TDP-43 library (see “Experimental Procedures”). Given our sequencing procedure, it was not possible to differentiate between the sense and antisense strands (for example, (TG)6 and (AC)6 were considered as a single motif). We discovered (TG)6 with its reverse complement (AC)6 to be the most frequent 12-nt sequence in our TDP-43 library such that 1.30 million reads contain one or the other. The number of these sequences in TDP-43 showed a remarkable 55.1-fold enrichment when compared with the control library (Fisher's exact test; adjusted p value < 3.4 × 10⁻⁸). Previously, TDP-43 was suggested to bind to TG repeats, which is in agreement with our analysis. More surprisingly, we found motifs of class (TG)nTA(TG)m (and their reverse complement) to be highly enriched in the TDP-43 library as well (Fig. 2 and supplemental Table S1). For example, for the sequence (CA)3TA(CA)2, the odds ratio was 95.8 (Fisher's exact test, adjusted p value < 3.4 × 10⁻⁸). The most frequent variant was in instances where n = m + 1 and n > 1, m > 1 in the motif (TG)nTA(TG)m, indicating that adenine is in the middle of the motif.
We also looked at the distribution of reads within each gene from the TDP-43 library. We analyzed exonic and intronic reads separately and observed that TDP-43 reads for most genes are spread across the entirety of the gene for both exons and introns (supplemental Fig. S3). However, there were two minor sets of genes, one that had most of their exonic reads in the 3′-UTR and another that had most of their intronic reads in the 5′-UTR (supplemental Fig. S3, A, panel iii, and B, panel i).
We next defined TDP-43 RNA targets. A total of 4,352 genes passed the enrichment filter (see “Experimental Procedures” and supplemental Table S2) and are referred to as TDP-43 RNA targets henceforth. There were 1,971 TDP-43 RNA targets that had predominantly intronic reads and 910 targets that had predominantly exonic reads, whereas 1,471 targets had both exonic and intronic reads (Fig. 3, A–C). These three categories of genes are henceforth referred to as: 1) exonic, 2) intronic, and 3) dual RNA targets of TDP-43.
We then asked whether these three categories of TDP-43 RNA targets differed with respect to their functions. We used the Gene Ontology database and searched for statistically significant enrichment of functional categories within these three sets. TDP-43-targeted RNAs were enriched in diverse functional categories (supplemental Tables S3–S6). Remarkably, all three sets revealed distinct functional enrichment profiles (Fig. 4, A–C). In particular, genes with TDP-43 exonic reads were enriched for Gene Ontology terms related to splicing and RNA processing and maturation (Fig. 4A, panel i, and supplemental Table S3), whereas genes with intronic TDP-43 reads were enriched for terms associated with synaptic formation and function and in regulation of neurotransmitter processes (Fig. 4B, panel i, and supplemental Table S4); genes with dual TDP-43 reads were enriched for terms related to various aspects of development (Fig. 4C, panel i, and supplemental Table S5). These results provide an important perspective about how TDP-43 regulates different biological and pathobiological processes.
One major group of TDP-43 exonic targets consists of transcripts for proteins involved in RNA metabolism (Table 1), for example splicing factor arginine/serine-rich 1 (SFRS1) and RNA-binding proteins: TDP-43 itself (Fig. 4A, panel ii), FUS/TLS, hnRNPs (A1, A2/B1, C, D, Dl, F, H1, K, M, R, U), and poly(A)-binding protein cytoplasmic 1 (PABPC1). Notably, there was a particular enrichment of reads in 3′-UTRs of some of these genes, like TDP-43 (Fig. 4A, panel ii). Together with the observation that in Tardbp+/− heterozygous mice there is a compensatory increase in TDP-43 RNA levels (9), the current work supports a model wherein TDP-43 binds to the 3′-UTR and regulates the stability or translational efficiency of its own RNA transcript. Moreover, our RIP-seq studies, when combined with our proteomics analysis (see below), suggest that TDP-43 and other factors involved in RNA metabolism mediate post-transcriptional regulation in a complex regulatory network analogous to gene regulation by transcriptional factors.
Transcripts with predominantly intronic reads had enriched Gene Ontology terms related to synaptic formation and function and in regulation of neurotransmitter processes (Table 2). Prominent examples included transcripts for neurexin (Nrxn1–3) and neuroligin (Nlgn1–3), alternative pre-mRNA splicing of which specifies a trans-synaptic signaling code (29, 30) and slit homolog (Slit1,3) (Fig. 4B, panel ii), Slit3 being the primary transcript for miR218-2, which is also involved in neuron differentiation (31). We also noticed that reads from the TDP-43 library were mapped to genomic regions where a number of known miRNAs are annotated (the data are deposited in the National Center for Biotechnology Information (NCBI) GEO database, GSE25032).
TDP-43 RNA targets bound in both intronic and exonic regions were particularly enriched for genes involved in CNS development and differentiation (Table 3). Genes from this category include: notch homolog 1 (Notch1) (Fig. 4C, panel ii), neurotrophic tyrosine kinase receptor types 2 and 3 (Ntrk2,3), myelin transcription factor 1-like (Myt1l), and dual specificity tyrosine phosphorylation-regulated kinase 1A (Dyrk1a). In our previous study addressing the biological impact of deleting Tardbp in mice, we determined that TDP-43 is necessary for embryonic development and is highly expressed in the developing CNS of embryos and into adulthood (9). The RIP-seq data from the current study have allowed us to appreciate the broad spectrum of transcripts regulated by TDP-43 in the CNS and the functional importance of TDP-43 during development and maintenance of the CNS.
Some of the TDP-43 RNA targets have also been associated with neurodegenerative diseases, for example, Tardbp, Fus, progranulin (Grn), α-synuclein (Snca), microtubule-associated protein Tau (Mapt), adenosine deaminase, RNA-specific B1 (Adarb1), and ataxin 1 and -2 (Atxn1,2) (3, 4, 32–35) (Table 4). Interestingly, a recent study showed a positive correlation between ADARB1-absent neurons and TDP-43-positive cytoplasmic inclusions (34). Given the numerous transcripts TDP-43 regulates, including an apparent self-regulation mechanism, it is conceivable that the TDP-43 inclusions that are present in ALS and frontotemporal lobar degeneration with ubiquitin-positive inclusions and subsequent neurodegeneration could be a result of, or alternatively could result in, a loss of TDP-43 function for a subset of its RNA targets.
We noticed a pronounced representation of exonic reads in the 3′-UTR and intronic reads in the 5′-UTR in a subset of TDP-43 targets (data not shown). We found genes with intronic TDP-43 binding in 5′-UTRs to be enriched among several regulatory functions including regulation of transcription and metabolism (supplemental Table S6). However, we realized that the enrichment of 5′-UTR introns in regulatory genes holds true in the rat genome irrespective of TDP-43 binding (supplemental Table S7), consistent with a similar enrichment of 5′-UTR introns in regulatory genes in the human genome (36).
These results, taken together, suggest that TDP-43 regulates genes in three different modes. One is through binding sites in introns, one is through binding sites in exons, and another is through binding across both introns and exons. Our analysis also supports that, consistent with the “post-transcriptional operon” theory (37), TDP-43 regulates functionally coherent sets of genes via binding to distinct modalities.
This study is also the first to report endogenous TDP-43 RNA targets in a genome-wide manner. Previous studies reporting TDP-43 binding to RNAs used overexpression models or showed a correlation between knockdown of TDP-43 and changes in transcript and protein levels (13–18, 38, 50). Previously identified TDP-43 RNA targets, HDAC6, APOAII, SMN, and neurofilament (NEF), were not identified in our study. These TDP-43 RNA targets may be context-specific. Several TDP-43 RNA targets have been predicted from microarray analysis of altered cellular transcripts upon siRNA knockdown of TDP-43 (18). TDP-43 targets identified from our RIP-seq data set corresponded to some of these altered transcripts including: Dyrk1a, cyclin-dependent kinase 6 (Cdk6), insulin-like growth factor 1 receptor (Igf1r), laminin γ1 (Lamc1), structural maintenance of chromosomes protein (Smc1a), Rho-related BTB domain-containing (Rhobtb2), and protein CDV3 homolog (Cdv3) (18) (Tables 1 and 3 and supplemental Table S1). The altered transcripts listed by Buratti et al. (18) are also down-regulated following let-7b overexpression, which the authors interpreted as being the result of the TDP-43-miRNA interaction. Interestingly, our TDP-43 RIP-seq results indicate that there is a direct interaction between TDP-43 and these transcripts, which indicates a dual means of transcript regulation.
We immunoprecipitated endogenous TDP-43 from rodent brain nuclear extracts and analyzed the resultant precipitation products with semiquantitative mass spectrometry. Taking into consideration the abundance index (>3.33) and consistency of spectral count trend (Fig. 5A, under spectral counts) of two independent experiments, we reliably identified 25 co-precipitating proteins highly enriched in the TDP-43 precipitate relative to control (Fig. 5A). There were 34 co-purified proteins that did not meet our criteria (supplemental Table S8), which nevertheless may represent transient interacting proteins of TDP-43.
Of the 25 proteins we identified as part of a TDP-43 nuclear interactome, 16 had been previously shown to co-purify with TDP-43 (39, 40). The nine new proteins not previously reported to co-purify with TDP-43 are GM9242 (similar to hnRNPA3), MECP2, SFRS1, peroxiredoxin (PRDX1 and -2), calmodulin-like 3 (CALML3), U1 small nuclear ribonucleoprotein (SNRNP), eukaryotic translation initiation factor 5A (EIF5A), and splicing factor 3a, subunit 1 (SF3A). Many of these proteins are ubiquitously expressed, except MECP2 and PTBP2, which are highly enriched in the CNS. The TDP-43 nuclear interactome reveals that the majority of co-purified proteins are RNA-binding proteins, splicing factors, and translation factors involved in aspects of RNA metabolism (Fig. 5B and supplemental Table S9), but some are considered antioxidants (PRDX1 and -2) or a calcium-binding protein (CALML3). Previously, SFRS1, which promotes exon skipping of CFTR, was shown to work with TDP-43 additively to promote CFTR exon skipping (13). The brain-specific PTBP2 binds intronic clusters of RNA regulatory elements and controls the assembly of other splicing regulatory factors, including RNA-binding proteins (41). From this list, RNA binding motif protein, X chromosome retrogene (RBMXRT) and hnRNPH2 are RNA-binding proteins involved in RNA splicing, transport, and stability (42). The predicted hnRNP A3 isoform 4 homolog (GM9242) is likely to have a similar role in RNA metabolism, but its function is unknown. MECP2 is an X-linked gene, mutations of which cause Rett syndrome, a progressive neurodevelopmental disorder (43). MECP2 is known for binding methylated DNA and repressing transcription, but its interaction with TDP-43 was confirmed by co-immunoprecipitation plus Western blotting (Fig. 5C). There is one report that demonstrates that MECP2 interacts with the RNA-binding protein YBX1 (Y box-binding protein 1) and that together they regulate splicing of reporter minigenes (44). Otherwise its role in RNA metabolism is not well characterized.
Several recent studies have reported identification of many proteins that interact with TDP-43 from peripheral cells overexpressing tagged TDP-43 protein (39, 40, 45). The native nuclear interactome of TDP-43 that we isolated from mouse brain nuclear extracts only partially overlaps with those from the other studies. Future studies will be needed to examine whether the 16 commonly co-purified proteins from our study and the other studies reflect ubiquitous TDP-43-interacting proteins. Moreover, the nine novel proteins identified in our study may represent unique constituents of nuclear TDP-43-containing complexes in the brain.
Our proteomics analysis (Fig. 5, A and B) indicated that the bulk of TDP-43 is physically associated with other proteins, and many of these proteins are RNA-binding proteins with known RNA binding motifs (46, 47). Therefore, we wondered whether there was any enrichment for reads containing motifs for TDP-43-associated RNA-binding proteins. We focused on binding motifs for three TDP-43-associated proteins, PTBP2, hnRNPA2/B1, and hnRNPC, that have well defined and relatively long binding sites. Binding motifs for other associated proteins were not searched because their binding motifs either are not well defined or are too short. For example, the known binding site of hnRNPH2 is 4 nt long (GGGA). The consensus motif of hnRNPL, (CA)n, on the other hand, could not be searched independently due to the nature of our TDP-43 library generation, which does not account for strandedness. Therefore, it is not possible to determine what fraction of the significant enrichment of (TG)n/(CA)n in our TDP-43 library is explained by the affinity of hnRNPL to (CA)n or by the affinity of TDP-43 to (TG)n.
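The strandedness limitation described above can be illustrated with a short Python sketch. Because an unstranded library may report either strand of a fragment, a motif search has to accept both the motif and its reverse complement, and (CA)n is exactly the reverse complement of (TG)n. The function names and the minimum repeat count of 6 are hypothetical choices for illustration, not parameters taken from this study.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def repeat_pattern(unit: str, min_repeats: int) -> "re.Pattern[str]":
    """Regex matching at least `min_repeats` tandem copies of `unit`."""
    return re.compile(f"(?:{unit}){{{min_repeats},}}")


def contains_motif_unstranded(read: str, unit: str, min_repeats: int) -> bool:
    """True if the read carries the repeat on either strand."""
    forward = repeat_pattern(unit, min_repeats)
    reverse = repeat_pattern(reverse_complement(unit), min_repeats)
    return bool(forward.search(read) or reverse.search(read))


# A made-up read carrying (CA)6 is reported as a (TG)6 hit, and vice versa.
ca_read = "AAA" + "CA" * 6 + "GG"
tg_read = "TG" * 6
ca_read_hits_tg = contains_motif_unstranded(ca_read, "TG", 6)
tg_read_hits_ca = contains_motif_unstranded(tg_read, "CA", 6)
```

In this sketch a read derived from an hnRNPL-type (CA)n site and one derived from a TDP-43-type (TG)n site give the same answer to both queries, which is why the two contributions cannot be separated in an unstranded library.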
We found an 18.9-fold enrichment for reads containing a consensus binding site motif for PTBP2, (CT)6, in our TDP-43 library when compared with the control (Fisher's exact test; p value < 2 × 10^−16; 95,300 reads in the TDP-43 library versus 5,100 in the control library). These results suggest that PTBP2 binding sites are in proximity to TDP-43 binding sites and that PTBP2 may co-regulate TDP-43 RNA targets. However, we did not find any significant enrichment of reads containing the binding sites of hnRNPA2/B1 or hnRNPC. One possible explanation is that these RNA-binding proteins do not directly bind TDP-43 target transcripts but work as co-regulators through their direct association with the C-terminal region of TDP-43 (48). The other possibility is that hnRNPA2/B1 and hnRNPC bind to regions distal to TDP-43 binding sites. As an additional negative control, we used the DNA binding motif for PPARγ (49), which has not been suggested to associate with TDP-43 in previous analyses. As expected, there was no enrichment for reads containing the PPARγ binding motif in our TDP-43 library.
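As an illustration of the enrichment comparison above, the following Python sketch computes a library-size-normalized fold enrichment and a one-sided Fisher's exact p value from a 2 × 2 table of motif-containing versus other reads. The total library sizes are not stated in this section, so the equal totals of 10 million reads used below are purely hypothetical, as are the function names. Whether the reported test was one- or two-sided is likewise an assumption.

```python
from math import exp, lgamma


def log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fold_enrichment(hits_ip: int, total_ip: int,
                    hits_ctrl: int, total_ctrl: int) -> float:
    """Ratio of motif-read fractions, TDP-43 library over control library."""
    return (hits_ip / total_ip) / (hits_ctrl / total_ctrl)


def fisher_one_sided(hits_ip: int, total_ip: int,
                     hits_ctrl: int, total_ctrl: int) -> float:
    """P(X >= hits_ip) under the hypergeometric null of no enrichment."""
    hits = hits_ip + hits_ctrl        # motif reads in both libraries combined
    total = total_ip + total_ctrl     # all reads in both libraries combined
    log_denominator = log_choose(total, total_ip)
    p = 0.0
    # Sum the probability of every table at least as extreme as the observed one.
    for k in range(hits_ip, min(hits, total_ip) + 1):
        p += exp(log_choose(hits, k)
                 + log_choose(total - hits, total_ip - k)
                 - log_denominator)
    return min(p, 1.0)


# Read counts from the text; library totals are HYPOTHETICAL placeholders.
HYPOTHETICAL_TOTAL = 10_000_000
fold = fold_enrichment(95_300, HYPOTHETICAL_TOTAL, 5_100, HYPOTHETICAL_TOTAL)
p_value = fisher_one_sided(95_300, HYPOTHETICAL_TOTAL, 5_100, HYPOTHETICAL_TOTAL)
```

With equal hypothetical library sizes the fold enrichment reduces to 95,300/5,100, or about 18.7; the reported 18.9-fold value depends on the actual library sizes. Under these assumptions the p value is far below the reported bound of 2 × 10^−16.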
This study reveals the nuclear components of the TDP-43-containing ribonucleoprotein complexes in the nervous system. We uncovered 25 protein constituents of TDP-43 nuclear protein complexes, referred to as the TDP-43 nuclear interactome. Using RIP-seq, we identified 4,352 RNA targets of TDP-43 and revealed distinct regulatory roles of TDP-43 in post-transcriptional regulation. We also observed similar profiles of TDP-43 RNA targets using cross-linking immunoprecipitation followed by deep sequencing (data not shown). Our work on the TDP-43 nuclear interactome and RNA targets provides a framework for uncovering the biochemical principle of TDP-43-dependent regulation of pre-mRNA splicing and RNA stability and transport, for revealing the neural functions of TDP-43, and for understanding how dysregulation of TDP-43 in RNA metabolism contributes to neurodegeneration.
We thank Dr. Edward K. Wakeland and the staff at University of Texas Southwestern Medical Center Microarray Core Facility and the members of the Dr. Jane Johnson laboratory for assistance with the deep sequencing and its analysis.
*This work was supported, in whole or in part, by National Institutes of Health Grants R01 AG029547 and AG023104 (to G. Y.), HG004233 (to F. P. R.), T32 NS007480 (to E. B. D.), and P30 NS055077 (to J. P.). This work was also supported by the Consortium for Frontotemporal Dementia Research (to G. Y. and J. H.), the Welch Foundation (to G. Y.), the Ted Nash Long Life Foundation (to G. Y.), and a fellowship from the Canadian Institute for Advanced Research (to F. P. R.). M. J. M. is an HHMI investigator.
♦This article was selected as a Paper of the Week.
The on-line version of this article (available at http://www.jbc.org) contains supplemental Data Files S1 and S2, Figs. S1–S3, and Tables S1–S9.
The nucleotide sequences reported in this paper have been submitted to the GEO database under accession number GSE25032.
4The abbreviations used are: