MicroRNAs (miRNAs) are 19-22-nt non-coding RNAs that post-transcriptionally regulate mRNA targets. To identify endogenous miRNA binding sites, we performed photo-crosslinking immunoprecipitation using Ago2 antibodies, followed by deep sequencing of RNAs (CLIP-seq), in mouse embryonic stem cells (mESCs). We also performed CLIP-seq in Dicer−/− mESCs, which lack mature miRNAs, allowing us to determine whether the association of Ago2 with the identified sites was miRNA-dependent. A significantly enriched motif, GCACUU, was identified only in wild-type mESCs, in both 3′ untranslated regions and coding regions. This motif matches the seed of a miRNA family that constitutes ~68% of the mESC miRNA population. Unexpectedly, a G-rich motif was enriched in sequences crosslinked to Ago2 in both the presence and the absence of miRNAs. Expression analysis and reporter assays confirmed that the seed-related motif confers miRNA-directed regulation on host mRNAs and that the G-rich motif can modulate this regulation.
miRNAs are key regulators of gene expression in fundamental processes including cell proliferation, cell death, cell differentiation and cellular responses to the environment1-3. These short non-coding RNAs guide a ribonucleoprotein complex, containing a member of the conserved Argonaute (Ago) protein family, to sites predominantly in the 3′UTRs of their target mRNAs, resulting in destabilization of the message and/or inhibition of translation4,5. Biochemical and computational studies have shown that base-pairing between the mRNA target and the “seed” of the miRNA (nucleotides 2-7 from its 5′ end) is important for this regulation in animals6-10. Comparative genomic analysis of miRNA seed sites in 3′UTRs suggests that miRNAs regulate ~60% of all mammalian mRNAs11. Moreover, both comparative genomic analysis and emerging data from a handful of genes suggest that miRNAs also target coding sequences8,12,13, but the prevalence of this interaction is unclear. Recent efforts14-16, including the study presented here, have therefore aimed to identify bona fide miRNA binding sites on a genome-wide scale; earlier studies of this kind used samples such as whole mouse brain and whole-animal nematodes. However, one challenge of these studies is to deconvolute miRNA-target relationships in the mixed cell types present in such samples15,16.
In this study, we dissect the miRNA-target relationship in a homogeneous cell population, mouse embryonic stem cells (mESCs), with defined miRNA characteristics17-19. RNA tags photo-crosslinked to Ago2 in these cells were isolated by immunoprecipitation and subjected to deep sequencing (CLIP-seq)15,16,20,21. Importantly, no RNA species were detectable by autoradiography in Ago2 immunoprecipitates without crosslinking, suggesting that recovery of RNA tags requires crosslinking and thus reflects direct association with Ago2 (Fig. 1a and Supplementary Fig. 1a). In addition, we performed a parallel analysis in derivative mESCs that lack Dicer and hence mature miRNAs22. Unexpectedly, we identified specific RNAs crosslinked to Ago2 in Dicer−/− cells (Fig. 1a), indicating that Ago2 can associate with RNAs in a miRNA-independent manner23,24.
Approximately 24.5M sequenced RNA tags from three wild-type mESC libraries representing two biological replicates (WT1A, WT1B, WT2) and 10.6M tags from two Dicer−/− libraries (KO1, KO2) were processed and mapped to the mouse genome (Supplementary Methods and Supplementary Fig. 1b-d). Across all libraries, 79% of reads mapped uniquely to the genome, 21% mapped to non-unique locations and 0.05% could not be aligned.
miRNAs crosslinked to Ago2 in mESCs were identified by screening reads with unique and repeat matches to the genome against non-coding RNA databases (Supplementary Fig. 1e; Supplementary Methods). As expected17,18, mature miRNAs are significantly enriched in Ago2-crosslinked samples from wild-type cells compared with Dicer−/− cells. The miR-290~295 cluster, miR-467 family and miR-302~367 cluster (most members of which share the AAGUGC seed) represent the largest fraction (~68%) of the Ago2-crosslinked mature miRNA population17-19,25 (Fig. 1b and Supplementary Fig. 1e), and the Ago2-CLIP and whole-cell miRNA populations were positively correlated (Fig. 1b). Although the WT2 library had more reads mapping to ncRNA and repetitive regions than the WT1 libraries, the distribution of crosslinked miRNAs is similar between the libraries (Supplementary Fig. 1e). The specificity of the CLIP-seq method is supported by the near absence of Ago2 crosslinking to the highly abundant rRNAs (~0.2%) and tRNAs (~0.2%).
For each library, the remaining tags that mapped uniquely to 3′UTRs were subjected to a data processing pipeline consisting of four filtering steps (Fig. 1c and Supplementary Table 1). First, identical reads were collapsed into a single read to eliminate potential PCR bias, and overlapping reads were then clustered (Clustering filter, Fig. 1c). Flanking regions of 25 nt were added to the clusters in case RNase cleavage had separated the site where the miRNA bound from the site where Ago2 crosslinked. Second, clusters were considered further only if they were significantly enriched over background levels (Normalization filter, Fig. 1c; Supplementary Methods). Third, to select for a reproducible signal, only clusters that overlapped with at least one cluster from another Normalized biological replicate library were retained (Multi-Library filter, Fig. 1c). Fourth, remaining WT clusters that overlapped with clusters from either Normalized Dicer−/− (Knockout, KO) library were removed (Knockout filter, Fig. 1c). Finally, after duplicates from technical replicates WT1A and WT1B were removed, 430 clusters in the combined WT libraries (244 in WT1[A+B] and 186 in WT2), with an average length of 81 nt, passed all four filters. Sets of clusters from the different filtering steps were then subjected to motif enrichment analysis using two independent approaches.
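The interval logic behind the Clustering, Multi-Library and Knockout filters can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: reads are assumed to be (chrom, strand, start, end) tuples with 0-based, half-open coordinates, all function names are hypothetical, and the Normalization filter (a background-enrichment test described in the Supplementary Methods) is omitted.

```python
def cluster_reads(reads):
    """Collapse identical reads, then merge overlapping reads into clusters.
    Reads are (chrom, strand, start, end) tuples, 0-based and half-open."""
    clusters = []
    for chrom, strand, start, end in sorted(set(reads)):  # set() collapses identical reads
        if clusters:
            c_chrom, c_strand, c_start, c_end = clusters[-1]
            if (chrom, strand) == (c_chrom, c_strand) and start < c_end:
                clusters[-1] = (chrom, strand, c_start, max(c_end, end))
                continue
        clusters.append((chrom, strand, start, end))
    return clusters

def add_flanks(clusters, flank=25):
    """Extend clusters so a site separated from the crosslink by RNase cleavage is kept."""
    return [(c, s, max(0, a - flank), b + flank) for c, s, a, b in clusters]

def overlaps(x, y):
    """True if two intervals share chromosome, strand and at least one base."""
    return x[0] == y[0] and x[1] == y[1] and x[2] < y[3] and y[2] < x[3]

def multi_library_filter(clusters, replicate_libraries):
    """Keep clusters that overlap a cluster in at least one other replicate library."""
    return [x for x in clusters
            if any(overlaps(x, y) for lib in replicate_libraries for y in lib)]

def knockout_filter(clusters, ko_libraries):
    """Remove WT clusters that overlap any cluster from a Dicer-/- (KO) library."""
    return [x for x in clusters
            if not any(overlaps(x, y) for lib in ko_libraries for y in lib)]
```

The all-against-all overlap check is quadratic; for genome-scale libraries an interval tree or a sorted sweep would be the more usual choice.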
First, significantly enriched motifs were identified in 3′UTR-mapped clusters from WT and KO sets independently (Fig. 1d, Supplementary Table 2 for all motif analyses). The motif discovery tool MEME26 was used to search for significantly enriched motifs of variable lengths over background (Supplementary Table 2a; Supplementary Methods). We found significant enrichment for G-rich motifs in clusters from both WT and KO libraries, suggesting that Ago2 may be associated with G-rich sequences independently of miRNAs. The G-rich motifs identified independently in WT and KO libraries have an average Pearson correlation of ~0.80, suggesting a high degree of similarity between the motifs (Supplementary Table 3a). Therefore, we defined a consensus G-rich motif by performing MEME analysis on Ago2-crosslinked clusters that overlapped between WT and KO libraries. This motif was highly statistically enriched (E-value = 2.9 × 10^-386) and present in 87% of the common clusters between Normalized WT and KO libraries. Examination of individual libraries showed that this consensus G-rich motif was present at approximately equal frequency in sequences crosslinked to Ago2 from wild-type and Dicer−/− mESCs, again suggesting that its association with Ago2 is miRNA independent (Supplementary Table 3b).
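The text does not state exactly how the ~0.80 motif correlation was computed. One common approach, available as a column-comparison function in motif-comparison tools such as Tomtom, is the average column-wise Pearson correlation of two position-specific probability matrices at their best ungapped alignment. The sketch below assumes that approach; the function names and the `min_overlap` parameter are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def motif_similarity(pwm_a, pwm_b, min_overlap=4):
    """Best average column-wise Pearson correlation over all ungapped alignments
    that overlap by at least `min_overlap` columns. Each PWM is a list of columns,
    each column holding A, C, G, U probabilities."""
    best = float("-inf")
    for offset in range(min_overlap - len(pwm_b), len(pwm_a) - min_overlap + 1):
        # column i of pwm_a is aligned with column (i - offset) of pwm_b
        pairs = [(pwm_a[i], pwm_b[i - offset])
                 for i in range(len(pwm_a)) if 0 <= i - offset < len(pwm_b)]
        if len(pairs) >= min_overlap:
            best = max(best, mean(pearson(a, b) for a, b in pairs))
    return best
```

Averaging such scores over all WT/KO motif pairs would give a summary value comparable in spirit to the one reported, under the assumptions above.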
One of the two significantly enriched miRNA-dependent motifs identified in 3′UTRs was GCACU[UG] (79 instances from 430 WT clusters). GCACUU (48/79 GCACU[UG] motifs) is complementary to the seed AAGUGC of several highly expressed miRNA families in mESCs. The only other statistically enriched miRNA-dependent motif in the selected clusters was CCAGCC (51 instances). However, unlike GCACUU, this motif is not complementary to any miRNA sequence with appreciable expression in mESCs.
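The complementarity between GCACUU and the AAGUGC seed can be checked mechanically, since a seed match is the reverse complement of miRNA nucleotides 2-7. The sketch below assumes the mature mouse miR-295 sequence shown as a constant (taken from miRBase for illustration; verify before use) and counts GCACU[UG] instances with a lookahead regular expression, so that overlapping matches such as GCACUGCACUU are both counted. Function names are hypothetical.

```python
import re

# Mature mmu-miR-295 sequence, assumed from miRBase for illustration; verify before use.
MIR_295 = "AAAGUGCUACUACUUUUGAGUCU"
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_match(mirna, first=2, last=7):
    """Target-site sequence (5'->3') complementary to miRNA nucleotides first..last (1-based)."""
    return mirna[first - 1:last].translate(COMPLEMENT)[::-1]

def count_motif(sequences, pattern="GCACU[UG]"):
    """Count motif instances across sequences; the lookahead also counts overlapping hits."""
    regex = re.compile(f"(?={pattern})")
    return sum(len(regex.findall(seq)) for seq in sequences)
```

For example, `seed_match(MIR_295)` yields the GCACUU site, and `count_motif(clusters, "GCACUU")` would count the subset of GCACU[UG] instances that are exact 6mer seed matches.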
To independently investigate motif enrichment in clusters within each individual CLIP library (Fig. 1e, Supplementary Fig. 2a, Supplementary Table 2b for top 20 motifs), we used an enumerative approach that guarantees global optimality with respect to statistical overrepresentation and avoids the problem, inherent in most general motif-finding algorithms, of being trapped at local optima27. Briefly, we measured the statistical significance of the occurrence of every possible n-mer within each library compared with its occurrence in sequences drawn randomly from a background distribution. This independent analysis confirmed the significant enrichment of G-rich hexamers (≥3 Gs, red dots, Fig. 1e, Supplementary Fig. 2a) among all possible hexamers in WT and KO libraries, as demonstrated by their low p-values and high z-scores at a false discovery rate (FDR) < 0.5% (see Supplementary Methods for derivations). The three hexamers encompassed in the consensus G-rich motif are among the top 7 significantly enriched hexamers in WT and KO libraries (black circle, Fig. 1e). Several non-G rich hexamers were enriched exclusively in WT libraries (blue dots, Fig. 1e), including GCACUU (black dot) and CCAGCC (light-blue dot). Twenty-nine non-G rich hexamers matched miRNA seeds, but the corresponding miRNAs are associated with Ago2 at a median frequency of only 0.003% (p < 0.05; Supplementary Fig. 1f, Supplementary Table 2b, c). Several other miRNA seed-matching hexamers occurred with high frequency but were not observed significantly more often than expected by chance and thus were not considered further (Supplementary Table 2c). The miRNA-dependent motif GCACUU is one of the top significantly enriched non-G rich hexamers in all WT libraries, including WT2, which had a lower proportion of 3′UTR-mapping clusters.
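A minimal sketch of an enumerative test of this kind is shown below. It assumes a zero-order (mononucleotide) background estimated from background sequences, a binomial variance that ignores the dependence between overlapping windows, a normal approximation for the p-value, and Benjamini-Hochberg control of the FDR. The authors' exact background model and derivations are in their Supplementary Methods and may differ; all names here are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def kmer_counts(seqs, k):
    """Count every k-mer occurrence (overlapping windows) across sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def kmer_enrichment(seqs, background, k=6):
    """Return {kmer: (z, one-sided p)} for all 4**k k-mers against a
    mononucleotide background estimated from `background`."""
    base = Counter("".join(background))
    total = sum(base[b] for b in "ACGU")
    freq = {b: base[b] / total for b in "ACGU"}
    n_windows = sum(max(len(s) - k + 1, 0) for s in seqs)
    observed = kmer_counts(seqs, k)
    results = {}
    for kmer in map("".join, product("ACGU", repeat=k)):
        p = math.prod(freq[b] for b in kmer)
        expected = n_windows * p
        sd = math.sqrt(n_windows * p * (1 - p))  # binomial approximation
        z = (observed[kmer] - expected) / sd if sd > 0 else 0.0
        results[kmer] = (z, 0.5 * math.erfc(z / math.sqrt(2)))  # upper-tail P(Z >= z)
    return results

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted q-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q
```

Hexamers with q-values below 0.005 would correspond to the FDR < 0.5% threshold quoted above, under these modeling assumptions.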
The enrichment of GCACUU is particularly apparent after applying the Knockout filter, where common clusters between WT and KO libraries, many of which contain G-rich hexamers (red dots), are removed from the WT set. In effect, the Knockout filter reveals GCACUU as the most significantly enriched non-G rich hexamer in mESCs expressing miRNAs (black dot, left-most panel vs. right-most panel Fig. 1e). We also observed enrichment of 7mers and 8mers containing GCACUU that match the extended seed region6,8-10 of the AAGUGC-seed family (Supplementary Table 2d).
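The 7mer and 8mer variants of a seed match follow standard site-type definitions, as used by TargetScan: a 7mer-m8 site pairs with miRNA nucleotides 2-8, a 7mer-A1 site pairs with nucleotides 2-7 and has an A opposite position 1, and an 8mer combines both. The sketch below derives these sites; it assumes the mature miR-295 sequence given as a constant (from miRBase, for illustration only), and the function names are hypothetical.

```python
# Mature mmu-miR-295 sequence, assumed from miRBase for illustration; verify before use.
MIR_295 = "AAAGUGCUACUACUUUUGAGUCU"
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def revcomp(rna):
    """Reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]

def seed_site_types(mirna):
    """Canonical seed-match site sequences on the mRNA, written 5'->3'."""
    six = revcomp(mirna[1:7])   # pairs with miRNA nt 2-7
    m8 = revcomp(mirna[1:8])    # pairs with miRNA nt 2-8
    # The A opposite miRNA position 1 sits at the 3' end of the target site.
    return {"6mer": six, "7mer-A1": six + "A", "7mer-m8": m8, "8mer": m8 + "A"}
```

Scanning clusters for the 7mer-m8 and 8mer strings returned here is one way to look for extended-seed matches containing GCACUU.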
Sequences mapping to coding sequences (CDS) were also subjected to the same data processing pipeline, resulting in a set of 197 clusters (106 in WT1[A+B], 91 in WT2). As with 3′UTR clusters, G-rich motifs were highly significantly enriched by MEME analysis in CDS clusters from both WT and KO libraries (constituting ~25% and ~30% of clusters, respectively; data not shown). Moreover, the GCACUU hexamer was observed in the CDS clusters from wild-type libraries (22 instances in 197 clusters; Supplementary Table 2a) but not from KO libraries. Similarly, in the enumerative analysis of individual libraries, G-rich hexamers were highly enriched in both WT and KO libraries, whereas GCACUU was enriched only in WT libraries (Supplementary Fig. 2b, Supplementary Table 2e, 3c). Unlike for 3′UTR-mapped clusters, neither MEME nor enumerative analysis indicated enrichment for CCAGCC in the CDS-mapped clusters.
mRNAs targeted by miRNAs are often destabilized, resulting in lower abundance of targeted transcripts in wild-type cells than in Dicer−/− cells4,28,29. To determine whether the stability of Ago2-CLIP 3′UTR GCACUU transcripts is miRNA-regulated, we compared their expression in wild-type and Dicer−/− mESCs using two gene sets: a high-confidence “Overlap” set of 43 genes that passed the Normalization and Multi-Library filters, and a more inclusive “All” set of 201 genes that passed the Normalization filter for any WT library. The log2 fold expression change (LFC) between wild-type and Dicer−/− mESCs was compared with the LFC of a control set of genes lacking the GCACUU motif. The Ago2-CLIP 3′UTR GCACUU-motif genes from both the “Overlap” and “All” sets showed significantly more downregulation in wild-type relative to Dicer−/− mESCs than the control gene set (Fig. 2a-d; Supplementary Table 4 for statistics and Supplementary Table 5 for gene lists). These results independently support the conclusion that these mRNAs, which are physically bound to Ago2, are in vivo miRNA targets in mESCs.
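The LFC comparison can be sketched as follows. The text does not say which test underlies the statistics in Supplementary Table 4, so a one-sided Mann-Whitney (Wilcoxon rank-sum) test of whether motif-gene LFCs are shifted below control LFCs is used here as one reasonable, assumed choice. The function names and the pseudocount are hypothetical.

```python
import math

def log2_fold_change(wt, ko, pseudocount=1.0):
    """LFC = log2(WT / Dicer-/-); negative values mean lower expression in WT.
    The pseudocount is an assumption that guards against zero expression."""
    return math.log2((wt + pseudocount) / (ko + pseudocount))

def mann_whitney_less(x, y):
    """One-sided Mann-Whitney U test of whether values in x tend to be smaller
    than values in y. Uses average ranks for ties and a normal approximation
    without tie or continuity correction. Returns (U for x, p-value)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1  # average 1-based rank within a tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, 0.5 * math.erfc(-z / math.sqrt(2))  # lower-tail P(Z <= z)
```

Here `x` would hold LFCs for the Ago2-CLIP motif genes and `y` those for the control set; a small p-value indicates stronger downregulation in wild-type cells.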
miRNA-dependent changes in mRNA expression have previously been shown for high-confidence targets predicted by computational analysis of conservation and of the sequence context around the seed site (TargetScan 5.1; refs. 8,11,30). We therefore compared the properties of these predicted targets of the AAGUGC seed-related family with those of the mRNAs identified by Ago2-CLIP. Comparison of expression levels in wild-type mESCs showed that the Ago2-CLIP 3′UTR GCACUU-motif genes tend to be more highly expressed than the predicted targets (Supplementary Fig. 3). This is not surprising, as biochemical enrichment protocols tend to sample highly expressed genes more effectively.
To compare the predicted targets and CLIP-identified mRNAs on properties other than expression level, we generated two sets of predicted targets for the AAGUGC-seed family that were matched to the Ago2-CLIP sets for expression level and 3′UTR length. The first set, “All predicted targets”, contains 799 TargetScan GCACUU-containing predicted targets. Compared with this set, both the Ago2-CLIP “Overlap” and “All” gene sets show significantly greater miRNA-dependent changes in expression (Fig. 2a and 2c), suggesting that the CLIP-identified mRNAs possess features beyond the required miRNA seed match.
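A generic way to build expression- and 3′UTR length-matched sets is quantile-bin matching, sketched below. This is an assumption for illustration only; the authors' matching procedure is described in their Supplementary Methods and may differ, for example in the number of bins or in sampling with replacement. Gene records are hypothetical (name, expression, utr_length) tuples.

```python
import random
from collections import defaultdict

def quantile_edges(values, n_bins):
    """Internal bin edges splitting `values` into roughly equal-sized quantile bins."""
    s = sorted(values)
    return [s[len(s) * q // n_bins] for q in range(1, n_bins)]

def matched_sample(query, pool, n_bins=10, seed=0):
    """Draw, without replacement, one pool gene per query gene from the same
    (expression, 3'UTR length) quantile bin. Genes are (name, expression, utr_length);
    query genes whose bin is exhausted are skipped."""
    expr_edges = quantile_edges([g[1] for g in pool], n_bins)
    len_edges = quantile_edges([g[2] for g in pool], n_bins)

    def bin_of(gene):
        return (sum(gene[1] >= e for e in expr_edges), sum(gene[2] >= e for e in len_edges))

    by_bin = defaultdict(list)
    for gene in pool:
        by_bin[bin_of(gene)].append(gene)
    rng = random.Random(seed)
    sample = []
    for gene in query:
        candidates = by_bin[bin_of(gene)]
        if candidates:
            sample.append(candidates.pop(rng.randrange(len(candidates))))
    return sample
```

Binning on both covariates jointly keeps the matched set close to the query set in expression and 3′UTR length at the same time, at the cost of sparse bins when the pool is small.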
To assess the importance of conservation and of sequence context around the seed site, we created two gene sets (“Conserved predicted targets”) containing the highest-confidence bioinformatically predicted targets, ranked first by branch length (i.e. conservation) and then by context score (which combines site type, 3′ pairing, local AU content and position within the 3′UTR8,11) (Fig. 2b, d). These two sets were comparable to the corresponding Ago2-CLIP “Overlap” and “All” sets in gene number, expression level and 3′UTR length. No statistically significant difference in miRNA-dependent gene expression change was observed between either the Ago2-CLIP “Overlap” or “All” gene set and its corresponding “Conserved predicted targets” set (Fig. 2a, c). Yet the CLIP-identified GCACUU sites from the “Overlap” and “All” sets are generally less conserved and surrounded by less favorable sequence context than those of the “Conserved predicted targets” (Fig. 2b, d; Supplementary Table 4d-f). Taken together, our results suggest that the “All” set and the smaller “Overlap” set represent high-confidence sets of miRNA-regulated mRNAs, and that factors besides conservation and the context around the GCACUU seed motif govern which sites are targeted by miRNAs and/or bound by Ago2 in mESCs.
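The two-key ranking described above corresponds to a lexicographic sort. A minimal sketch, assuming hypothetical prediction records with `gene`, `branch_length` and `context_score` fields, and TargetScan's convention that more negative context scores indicate more favorable sites:

```python
def top_conserved_predictions(predictions, n):
    """Rank predicted targets first by branch length (higher = more conserved),
    then by context score (more negative = more favorable), and return the
    top n gene names."""
    ranked = sorted(predictions, key=lambda p: (-p["branch_length"], p["context_score"]))
    return [p["gene"] for p in ranked[:n]]
```

Setting `n` to the size of the corresponding Ago2-CLIP set would yield a gene-number-matched “Conserved predicted targets” list under these assumptions.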
We also sought to determine whether the GCACUU motifs identified in CDS were associated with a miRNA-dependent gene expression signature. To this end, expression of Normalization-filtered Ago2-CLIP CDS GCACUU-motif genes from all WT libraries (excluding those with GCACUU in the 3′UTR) was compared with that of a control set lacking GCACUU in the CDS. The 80 Ago2-CLIP CDS GCACUU-motif genes showed miRNA-dependent downregulation of mRNA expression compared with the control set (Fig. 2e). Interestingly, other expression-matched CDS GCACUU-motif genes (“Predicted set”, Fig. 2e) showed a profile similar to that of the Ago2-CLIP-identified set and significant downregulation compared with the control. This indicates that, as in the 3′UTR, the presence of the GCACUU motif in the CDS is associated with a miRNA-dependent gene expression signature8,12.
The difference in expression profile between wild-type and Dicer−/− mESCs was further examined for genes containing the G-rich motif, whose association with Ago2 appears to be miRNA-independent, and for genes containing the CCAGCC motif, whose association with Ago2 might be miRNA-dependent. Neither the G-rich motif nor the CCAGCC motif is complementary to any miRNA sequence with appreciable expression in mESCs. We compared the CCAGCC-containing genes that passed the Normalization filter (excluding those containing GCACUU) with expression-matched sets of mouse genes lacking CCAGCC in the 3′UTR (control) (Fig. 2f). Surprisingly, we observed significant downregulation of the CCAGCC-containing genes in wild-type relative to Dicer−/− mESCs. This expression difference appears to be specific to the Ago2-CLIP CCAGCC-containing mRNAs, as other mRNAs containing CCAGCC did not show a similar change (“Predicted set” in Supplementary Fig. 4). For the G-rich motif from Fig. 1d, we compared the expression change of Ago2-CLIP genes that contain matches to this motif, lack GCACUU in their 3′UTRs and passed the Normalization and Multi-Library filters (Fig. 2f) with that of a set of 3′UTRs randomly chosen from the mouse genome and matched for expression level, CG dinucleotide composition and 3′UTR length. As for the GCACUU- and CCAGCC-containing genes, a significant increase in expression was observed upon deletion of Dicer for these G-motif-containing genes identified by Ago2-CLIP. This change could be due to the presence of other miRNA seed matches in the 3′UTRs of Ago2-CLIP G-rich motif genes. However, excluding G-rich motif genes harboring seed matches to abundant mESC miRNAs did not affect the aggregate gene expression change of the G-rich motif gene set (Supplementary Fig. 4).
Interestingly, the degrees of change in mRNA expression observed for G-motif- or CCAGCC-containing genes were not significantly different from those observed for the Ago2-CLIP GCACUU-motif genes (Fig. 2f) and their expression-matched predicted GCACUU set (cf. Fig. 2a). Consistent with this, previous data suggest that the effect of miRNAs can be mimicked by miRNA-independent tethering of Argonaute proteins to reporter mRNAs31,32. Thus, the miRNA-dependent expression changes observed for Ago2-CLIP genes could be due to the close proximity between Ago2 and the crosslinked mRNA targets.
Next, we sought to determine whether the GCACUU-containing regions that crosslinked to Ago2 are sufficient to confer miRNA-dependent repression on luciferase reporter transgenes in the presence of endogenous levels of the corresponding miRNAs. Because only four genes33-36 have been validated as GCACUU seed-match targets in mESCs, it was difficult to evaluate our dataset against the existing literature. Instead, each ~80-nt Ago2-CLIP cluster sequence was inserted into the 3′UTR of luciferase, and expression of this construct was compared with that of an equivalent construct in which the GCACUU motif was mutated to CCUCAU. The ratio of wild-type to mutant construct expression was evaluated in three cellular states: (1) wild-type mESCs (endogenous miRNA levels), (2) Dicer−/− mESCs (no mature miRNAs) and (3) Dicer−/− mESCs transfected with a miR-295 mimic, as illustrated in Fig. 3a. In each cellular state, relative repression was calculated by normalizing to the ratio in Dicer−/− cells. All eight Ago2-CLIP 3′UTR GCACUU motifs tested showed significant miRNA-dependent repression in wild-type cells but not in Dicer−/− cells (Fig. 3b). Repression in Dicer−/− cells was restored by addition of a miR-295 mimic, suggesting that a member of this mESC-specific miRNA cluster (with the AAGUGC seed) is sufficient to provide the specificity for such regulation. Additionally, Ago2-CLIP-identified binding sites were present in three genes previously shown to be regulated by the AAGUGC-related miRNA family: E2f1 (ref. 37), Pten (ref. 38) and Cdkn1a (ref. 33). These data show that the Ago2-CLIP 3′UTR GCACUU-motif sites are indeed endogenous targets for direct regulation by miRNAs in mESCs, and that a short fragment of ~80 nt containing such a site is sufficient to confer mESC-specific repression mediated by miR-290~295.
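The normalization described above amounts to a ratio of ratios. A minimal sketch, assuming each argument is a list of replicate reporter signals (already normalized to any transfection control, which is an assumption since the text does not specify one) and that replicates are summarized by their mean; the function name is hypothetical.

```python
from statistics import mean

def relative_expression(wt_construct, mut_construct, ko_wt_construct, ko_mut_construct):
    """Seed-WT/seed-mutant reporter ratio in one cellular state, normalized to the
    same ratio in Dicer-/- cells (no mature miRNAs). Values below 1 indicate
    miRNA-dependent repression; 1 / value gives fold repression."""
    state_ratio = mean(wt_construct) / mean(mut_construct)
    ko_ratio = mean(ko_wt_construct) / mean(ko_mut_construct)
    return state_ratio / ko_ratio
```

The same function applies to all three cellular states; for Dicer−/− cells it returns 1 by construction, and for miR-295 mimic-transfected cells a value below 1 would indicate restored repression.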
To determine whether the Ago2-CLIP CDS GCACUU-motif sites are sufficient for miRNA regulation, we inserted the CDS cluster sequence (~80 nt), or a seed-mutant equivalent, into the 3′UTR of luciferase. Seven of eight clusters containing CDS GCACUU motifs conferred downregulation on the luciferase reporter (Fig. 3c), suggesting that these sequences are recognized by the endogenous miRNA machinery even in the heterologous context of the 3′UTR.
The enrichment of the G-rich motif in Ago2-CLIP sequences from both wild-type and Dicer−/− mESCs (Fig. 1d) suggests that it is likely a miRNA-independent binding site for Ago2. Because this binding preference has not previously been described for Ago2, we used an independent method to confirm the miRNA-independent association of Ago2 with the G-rich motif-containing mRNAs. We transfected Dicer−/− mESCs with an HA-tagged Ago2 construct, immunoprecipitated Ago2 with anti-HA antibodies, isolated the bound mRNA and hybridized it to Affymetrix microarrays. We also performed microarray analysis on total RNA from Dicer−/− mESCs. The enrichment of mRNAs in the Ago2-IP was determined by comparing expression values between the Ago2-IP and total RNA. We then determined whether the set of genes enriched in the Ago2-IP from Dicer−/− mESCs significantly overlapped with the sets of Ago2-CLIP G-motif-containing genes. We found 1.6-2.1-fold more genes in the overlap between the Ago2-IP set and the Ago2-CLIP G-motif sets than expected by chance (Supplementary Table 6a). These data support the observation that the G-motif-containing genes identified by Ago2-CLIP are likely bound to Ago2, or to its associated protein complex, in a miRNA-independent manner in mESCs.
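A fold enrichment over chance can be computed from the sizes of the two gene sets and of the gene universe. The sketch below assumes a hypergeometric null, a standard choice that the text does not explicitly confirm (the authors' statistics are in Supplementary Table 6a); the function name and argument names are hypothetical.

```python
from math import comb

def overlap_enrichment(n_universe, n_a, n_b, n_overlap):
    """Fold enrichment of an observed overlap between gene sets A and B over the
    chance expectation |A||B|/N, plus the hypergeometric P(overlap >= observed)."""
    expected = n_a * n_b / n_universe
    tail = sum(comb(n_a, k) * comb(n_universe - n_a, n_b - k)
               for k in range(n_overlap, min(n_a, n_b) + 1))
    return n_overlap / expected, tail / comb(n_universe, n_b)
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for a universe of tens of thousands of genes; only the final division is done in floating point.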
As shown above, the CLIP-identified 3′UTR GCACUU mRNAs have a miRNA-dependent expression change comparable to that of the high-confidence predicted targets, despite being less conserved and in a less favorable sequence context (Fig. 2a-d). To explore whether the G-motif is a feature of these sequences that can contribute to miRNA-dependent regulation, we focused on Txnip, a new miRNA target identified in this study. We validated this target by demonstrating that both its endogenous mRNA and protein levels are regulated by Dicer in mESCs (Fig. 4a), similarly to the previously validated miR-290~295 target Cdkn1a33. One of the Ago2-CLIP clusters identified in Txnip was among the most repressed in our 3′UTR luciferase assay (Fig. 3b), providing a wide dynamic range for testing whether neighboring G-rich motifs affect the miRNA-dependent activity of GCACUU seed sites (Fig. 4b).
The relationship between this G-rich motif and the seed motif was investigated as in Fig. 3 using the following luciferase constructs: (1) the WT cluster, (2) the cluster with the GCACUU seed motif mutated to CCUCAU, (3) the cluster with a mutant G-rich motif in which all Gs are mutated to Cs (so as not to alter AU content8,10) and (4) the cluster with both motifs mutated. For the Txnip “A” cluster, repression was strongest with both the wild-type GCACUU seed motif and the G-motif intact (Fig. 4c). Interestingly, repression was relieved by 50% when the G-motif was mutated (Fig. 4c). However, in the absence of GCACUU, the G-motif alone did not confer repression (Fig. 4c). We extended this analysis by investigating the contributions of G-rich motifs to miRNA-dependent repression in another cluster from Txnip, Txnip “B” (Fig. 4d), and in a cluster from Ei24 (Fig. 4e). These clusters each have multiple G-rich motifs; mutation of each G-rich motif individually had a varying impact on repression by the GCACUU seed site, with deletion of all G-rich motifs having the greatest effect on repression (Fig. 4f). A similar loss of miRNA-dependent repression was observed when the G-rich motifs of the Txnip “A” and Ei24 clusters were deleted (Supplementary Fig. 5a). Taken together, these data suggest that the G-rich motif is important for the full activity of the miRNA seed site but does not contribute activity in the absence of the seed site.
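The four reporter variants can be generated programmatically, which also makes it easy to confirm that both mutations (GCACUU to CCUCAU, and G to C within the G-motif) preserve sequence length and AU content. A sketch, assuming the G-motif coordinates in the cluster are known; the function names and the example sequence in the test are hypothetical.

```python
SEED_WT, SEED_MUT = "GCACUU", "CCUCAU"

def mutate_seed(seq):
    """Replace every GCACUU seed match with the CCUCAU mutant used in the reporters."""
    return seq.replace(SEED_WT, SEED_MUT)

def mutate_g_motif(seq, start, end):
    """Convert every G to C within [start, end), leaving A/U content unchanged."""
    return seq[:start] + seq[start:end].replace("G", "C") + seq[end:]

def reporter_constructs(seq, g_start, g_end):
    """The four constructs compared for each cluster: WT, seed mutant,
    G-motif mutant and double mutant. Both mutations keep the sequence length,
    so the G-motif coordinates remain valid after the seed mutation."""
    return {
        "WT": seq,
        "seed_mut": mutate_seed(seq),
        "G_mut": mutate_g_motif(seq, g_start, g_end),
        "double_mut": mutate_g_motif(mutate_seed(seq), g_start, g_end),
    }
```

If a G-motif overlapped the seed match, the order of the two mutations would matter; the sketch applies the seed mutation first.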
Given that the CLIP-identified G-rich motif modulated miRNA-mediated repression in the three clusters examined, we further investigated the general features of this motif, including its composition, conservation and location within the Ago2-associated sequence. We searched for enrichment of shorter motifs in the 3′UTR clusters and found that the original 8mer G-motif is composed of enriched G-rich 4mers and 5mers (Supplementary Fig. 5b). Next, we analyzed the conservation of the 8mer G-motif. We determined an average conservation score for all G-motifs based on the phastCons conservation score39 of each nucleotide within the motifs and compared it with that of a background set of 8mers (Supplementary Methods). G-motifs are generally more conserved than random sequences (p < 1E-06) (Fig. 4g). We also analyzed positional conservation in an alignment of G-motifs with 10-nt flanks at either end. The level of conservation decreased immediately after the 3′ end of the motif, whereas a higher level of conservation persisted in the 10 nt 5′ of the motif (Supplementary Fig. 5c). Interestingly, further MEME analyses suggest that the 8mer G-motif is likely embedded in an extended G-motif (Supplementary Fig. 5b). The excess conservation of G-motifs held for all 3′UTR clusters, including those lacking the GCACUU motif (Supplementary Fig. 5d). Thus, the excess conservation of the G-motif is not a bystander effect of proximity to this particular miRNA seed match; rather, the G-motif has attributes of a functional regulatory element40.
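The comparison of G-motif conservation with background 8mers can be illustrated with a resampling test. This is a sketch under stated assumptions: each instance is represented by its list of per-nucleotide phastCons values, background draws are equal-sized random samples, and the empirical p-value is one-sided. The authors' actual procedure is in their Supplementary Methods; the names here are hypothetical.

```python
import random
from statistics import mean

def instance_mean(scores):
    """Average per-nucleotide phastCons score of one motif (or background) instance."""
    return sum(scores) / len(scores)

def conservation_test(motif_scores, background_scores, n_draws=10000, seed=0):
    """Compare the mean phastCons score of motif instances with random, equal-sized
    draws of background 8mers. Returns (observed mean, empirical one-sided p)."""
    observed = mean(instance_mean(s) for s in motif_scores)
    background = [instance_mean(s) for s in background_scores]
    rng = random.Random(seed)
    k = len(motif_scores)
    hits = sum(mean(rng.sample(background, k)) >= observed for _ in range(n_draws))
    return observed, (hits + 1) / (n_draws + 1)
```

With an empirical test the smallest attainable p-value is 1/(n_draws + 1), so reporting p < 1E-06 this way would require more than a million draws; an analytical test is an alternative.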
Another common feature of this G-motif is that it tends to be located in the 5′ half of the sequence crosslinked to Ago2 (Supplementary Fig. 5e-f). In contrast, there is no positional bias for the GCACUU motif (Supplementary Fig. 5f). Where both motifs are present in the Ago2-CLIP sequences, there is no bias as to whether the G-motif lies 5′ or 3′ of the GCACUU motif (data not shown). The activity of the examined G-motifs is independent of their location relative to GCACUU (cf. Fig. 4b, d-e), suggesting that proximity, rather than relative position (5′ or 3′), is important for modulating miRNA repression.
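One simple way to test a 5′-half preference is a one-sided binomial test against the 50% expected if motifs were placed uniformly along clusters. This is an assumed test for illustration; the analysis behind Supplementary Fig. 5e-f may differ. Motif positions are hypothetical (centre, cluster length) pairs measured from the 5′ end of each cluster.

```python
from math import comb

def count_five_prime(positions):
    """positions: (motif_centre, cluster_length) pairs measured from the cluster's
    5' end. Returns (number of motifs centred in the 5' half, total motifs)."""
    return sum(centre < length / 2 for centre, length in positions), len(positions)

def binomial_upper_p(k, n, p=0.5):
    """One-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

Running the same test on GCACUU positions would be expected to give a non-significant result if, as reported, that motif shows no positional bias.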
Photo-crosslinking followed by Ago2 immunoprecipitation (Ago2-CLIP) was used to identify miRNA binding sites in mESCs. We found significantly enriched motifs in 3′UTRs and CDS that correspond to miRNA seed matches, representing 201 and 103 potential mESC miRNA targets in 3′UTRs and CDS, respectively. Regarding the CDS sites, this study agrees with other studies14-16 that miRNA binding sites in CDS are more widespread than previously considered and nearly as prevalent as those in 3′UTRs. Here we provide gene expression data suggesting that these CDS sites regulate mRNA stability much like 3′UTR sites. Moreover, these sites can be recognized by miRNAs at endogenous expression levels and confer repression in a heterologous 3′UTR8,13,41,42.
Two other Ago-CLIP studies have identified potential miRNA targets in mammalian cells and tissue. Our study differs from them in analyzing mRNAs associated with endogenous Ago2 in a largely homogeneous population of mESCs, whereas Chi et al.15 performed CLIP on brain extracts using antibodies against endogenous Ago proteins and Hafner et al.14 performed CLIP in 293 cells using HA-tagged Ago1-4 and crosslinking via a photoactivatable nucleotide. Despite variations in CLIP technique and data analysis, these studies and ours identified similar numbers of targets for each miRNA seed family (several hundred), comparable to the number of moderately conserved targets predicted for each miRNA seed family by TargetScan11 (see Supplementary Notes for cross-comparison with other CLIP datasets).
There are several previously published reports of miRNA-regulated mRNAs in mESCs that we could compare to the Ago2-CLIP 3′UTR GCACUU-containing genes35, 43. Only miR-294 (member of AAGUGC seed family) regulated mRNAs described by Melton et al.43 showed significant overlap with the Ago2-CLIP 3′UTR GCACUU-containing mRNAs (Supplementary Table 6b for all comparisons).
Unlike other cell types, including those used in other Ago2-CLIP studies14,44, mESCs appear to be dominated by a single miRNA seed family that is probably responsible for most of the miRNA regulation in this cell type. Essentially all of the GCACUU-motif-containing CLIP 3′UTR clusters conferred miRNA-dependent regulation when tested in luciferase reporter assays in the presence of endogenous levels of miRNAs. This suggests that our stringent filtering criteria selected a high-confidence set of GCACUU-containing mRNAs that are most likely bona fide miRNA targets in mESCs. Previous studies have shown that this miRNA family plays important roles in mESCs, including in maintaining pluripotency, self-renewal and cell cycle control33-35,43,45. However, few of its targets have been identified and validated. By identifying a few hundred new miRNA targets, this Ago2-CLIP study is a significant step in the exploration of this biology.
To understand the extent of miRNA-regulated pathways represented by the Ago2-CLIP 3′UTR GCACUU-motif genes (“All” set, 201 genes), we performed pathway enrichment analysis (Supplementary Methods) and compared this set with the top 201 “Conserved predicted targets” and with all mRNAs expressed in mESCs that contain the GCACUU hexamer in the 3′UTR (“All predicted targets”, 2969 genes). 37 and 11 pathways were significantly enriched in the CLIP and “Conserved predicted targets” sets, respectively (Supplementary Fig. 6a and Supplementary Table 7). The pathways significantly enriched in CLIP included “Early S-phase” (4 genes), a pathway in which miR-290~295 has been previously implicated33, and “TGF-beta receptor signaling” (5 genes), a pathway in which miR-290~295 has not been implicated.
The genes identified by Ago2-CLIP in the “TGF-beta receptor signaling” pathway (p-value 0.013) include two intracellular pathway inhibitors, the cytoplasm-localized Smad7 and the nucleus-localized Skil, and an extracellular inhibitor, Lefty1 (ref. 46). Our reporter assay confirmed that these three genes are indeed targeted by miR-295 (Supplementary Fig. 6b). We extended this analysis to Lefty2, a gene that was not identified in the CLIP results but is homologous to Lefty1 and contains the GCACUU hexamer, and showed that it is also targeted by miR-295 (Supplementary Fig. 6b). Consistent with this, miR-302 and miR-430, whose seeds are related to that of miR-290~295, have been shown to regulate differentiation by targeting Lefty homologs in human ESCs47 and zebrafish embryos48, respectively. Using a genome-wide approach, we found that the miR-290~295 cluster regulates not only the extracellular Lefty homologs but also additional inhibitory nodes of the TGF-beta pathway localized in different cellular compartments (Supplementary Fig. 6c). This coordinate inhibition, as observed for other miRNAs49-51, might confer robustness on this signaling network.
We unexpectedly identified a G-rich motif in most of the sequences associated with Ago2, regardless of the miRNA status of the cell. We believe this is a true biological association rather than a technical artifact, based on the following observations. First, this motif is conserved above the general 3′UTR background even when matched for sequence content. Second, the genes containing this G-rich motif significantly overlap with the set of genes enriched in HA-tagged Ago2 immunoprecipitates from Dicer−/− mESCs. Third, we observe G bias only in genic sequences, not in miRNA or intergenic sequences crosslinked to Ago2. Lastly, the enrichment of G residues is unlikely to be due to CLIP itself, as no G biases have been described in the literature for any of the steps involved (see Supplementary Notes for further discussion).
It remains unclear, however, whether crosslinking to this G-rich sequence is due to Ago2 itself or to a binding partner of Ago2. Because UV-crosslinking forms covalent bonds only between protein and RNA within angstroms of each other, a potential binding partner would have to be in close proximity to both Ago2 and the mRNA target. Indeed, several proteins that co-immunoprecipitate with Ago2 have binding preferences for G-rich sequences, including HNRNP-H and FMRP52-54, although we observed only one Ago2-dependent RNA-protein complex close to the molecular weight of native Ago2 in the CLIP procedure. Alternatively, Ago2 itself could have a previously unidentified preference for binding G-rich sequences. In either case, a G-rich sequence near a miRNA binding site could give Ago2/miRNA complexes a higher affinity for this region and thus increase the probability that the mRNA is targeted for degradation and/or translational inhibition. In the three cases examined, this G-rich motif modulated the level of miRNA-dependent regulation by the miR-290~295-related seed motif but imparted no regulation by itself. Identification of this association thus highlights the value of Ago2-CLIP data from Dicer−/− mESCs as a background for delineating bona fide microRNA targets.
Methods and any associated references are available in the online version of the paper at http://www.nature.com/nsmb/
We thank C. Burge, J. Wilusz, A. Ravi and members of the Sharp laboratory for critical comments, M. Lindstrom for illustration, T. Cybulski for technical help, A. Leshinsky and R. Cook for running the Solexa sequencing samples in the KI Biopolymer & Proteomics Core Facility, M. Luo and L. Smeester for microarray technical assistance in the MIT Department of Biology Biomicrocenter, and C. Whittaker for bioinformatics support in the Bioinformatics & Computing Core Facility at the Koch Institute. A.K.L. was supported by a special fellowship from the Leukemia and Lymphoma Society. A.G.Y. was partially supported by a David H. Koch graduate fellowship. This work was supported by United States Public Health Service grants R01-GM34277 and R01-CA133404 from the National Institutes of Health, P01-CA42063 from the National Cancer Institute to P.A.S. and partially by Cancer Center Support (core) grant P30-CA14051 from the National Cancer Institute.
Accession codes. Microarray and short RNA sequence files have been deposited at the Gene Expression Omnibus database repository under accession number GSE25310. BED files for clusters in all libraries are available for download from http://rowley.mit.edu/pubs/Ago2_CLIP/Ago2_CLIP.html
Note: Supplementary information is available on the Nature Structural & Molecular Biology website.
Author Contributions: A.K.L. and A.G.Y. designed and performed the experiments; A.K.L., A.G.Y. and P.A.S. wrote the paper; A.D.B. performed experiments; A.B., C.B.N. and G.X.Z. performed the bioinformatics analyses. All authors reviewed and approved the manuscript.
Competing Interest Statement: The authors declare no competing financial interests.