MicroRNAs (miRNAs) are endogenous ~23-nt RNAs that can play important gene-regulatory roles in animals and plants by pairing to the mRNAs of protein-coding genes to direct their posttranscriptional repression. This review outlines the current understanding of miRNA target recognition in animals and discusses the widespread impact of miRNAs on both the expression and evolution of protein-coding genes.
Characterization of C. elegans genes that control the timing of larval development revealed two small regulatory RNAs, known as the lin-4 and let-7 RNAs (Lee et al., 1993; Reinhart et al., 2000). let-7 homologs, soon recognized in other bilateral animals, including mammals, exhibited temporal expression resembling that observed in C. elegans, suggesting that let-7 and perhaps other small temporal RNAs might be playing orthologous roles in diverse metazoan lineages (Pasquinelli et al., 2000). Soon thereafter, lin-4 and let-7 RNAs were reported to represent a very populous class of small endogenous RNAs found in worms, flies and mammals — a few expressed temporally, but most not — which were named microRNAs (miRNAs) (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). miRNAs have since been found in plants, green algae, viruses, and more deeply branching animals (Griffiths-Jones et al., 2008).
Meanwhile, other types of small RNAs have been found in animals, plants, and fungi. These include endogenous small interfering RNAs (siRNAs) (Reinhart and Bartel, 2002; Ambros et al., 2003) and Piwi-interacting RNAs (piRNAs) (Aravin et al., 2007). Like miRNAs, many of these other RNAs function as guide RNAs within the broad phenomenon known as RNA silencing, but miRNAs differ from these other classes of small RNAs in their biogenesis: miRNAs derive from transcripts that fold back on themselves to form distinctive hairpin structures (Bartel, 2004), whereas the other types of endogenous small RNAs derive either from much longer hairpins that give rise to a greater diversity of small RNAs (siRNAs), or from bimolecular RNA duplexes (siRNAs), or from precursors without any suspected double-stranded character (piRNAs).
Once processed from the hairpin (Grishok et al., 2001; Lee et al., 2003) and loaded into the Argonaute protein of the silencing complex (Hutvagner and Zamore, 2002; Mourelatos et al., 2002), the miRNAs pair with mRNAs to direct posttranscriptional repression. At sites with extensive pairing complementarity, metazoan miRNAs can direct Argonaute-catalyzed mRNA cleavage (Hutvagner and Zamore, 2002; Song et al., 2004; Yekta et al., 2004). More commonly, though, the metazoan miRNAs direct translational repression, mRNA destabilization, or a combination of the two (Lee et al., 1993; Wightman et al., 1993; Lim et al., 2005). The various molecular processes at the heart of miRNA-directed translational repression and mRNA destabilization, which include inhibition of translation initiation and poly(A) shortening, are reviewed elsewhere (Filipowicz et al., 2008).
The number of confidently identified miRNA genes has surpassed 110 in C. elegans, 140 in D. melanogaster, and 400 in humans, numbers that approach 1–2% of the number of protein-coding genes in these respective species (Ruby et al., 2006; Landgraf et al., 2007; Ruby et al., 2007). These numbers will undoubtedly increase as high-throughput sequencing continues to be applied both to miRNA discovery and to the validation of some of the many additional candidates proposed.
The discovery of the abundance of miRNAs in diverse multicellular species raised many questions, including, perhaps most intriguingly, what are all these tiny noncoding RNAs doing? Key to answering this question has been to learn how to find their regulatory targets. Initial clues to miRNA target recognition came from the observation that the lin-4 RNA had some sequence complementarity to multiple conserved sites within the lin-14 mRNA (Lee et al., 1993; Wightman et al., 1993), within a region of the 3' untranslated region (UTR) that earlier molecular genetic analyses had shown was required for the repression of lin-14 by lin-4 (Wightman et al., 1991). Similarly, lin-4 and let-7 RNAs were found to have complementarity to UTR sites of lin-28 and lin-41, respectively, which are targets that were also found with the help of genetic analyses (Moss et al., 1997; Reinhart et al., 2000).
What about the hundreds of miRNAs identified by cloning and computation, most of which correspond to loci without previously identified functions? In plants, many targets can be predicted with confidence simply by searching for messages with extensive complementarity to the miRNAs (Rhoades et al., 2002). In animals extensive complementarity, with consequent cleavage of the targeted message, occasionally occurs, but is much more unusual (Yekta et al., 2004; Davis et al., 2005). Thus, for metazoan miRNAs the challenge has been to devise a genomewide computational search that captures most of the regulatory targets without also bringing in too many false predictions.
Initial attempts generated algorithms and sets of predictions that were difficult for experimentalists to evaluate, which was exacerbated by the poor overlap between sets of predictions from the same organism (Enright et al., 2003; Lewis et al., 2003; Stark et al., 2003; John et al., 2004; Kiriakidou et al., 2004). Nonetheless, some of these efforts have provided methods and insights that helped set the stage for our current understanding of metazoan miRNA recognition. A key methodological advance was the use of preferential evolutionary conservation to evaluate the ability of an algorithm to distinguish miRNA target sites from the multitude of 3'-UTR segments that otherwise would score equally well with regard to the quality of miRNA pairing (Lewis et al., 2003). To the extent that sites are conserved more than would be expected by chance, they are judged to be under selective pressure and therefore biologically functional. Thus, by summing the net yield of conserved sites after correcting for the number expected by chance, features and refinements of the algorithm can be evaluated computationally. For example, short subsegments of the miRNA can be individually screened to learn which ones are subject to preferentially conserved pairing. In this way, common features of target recognition can be distinguished from those that seem equally plausible but are rarely if ever used, thereby enabling the principles of target recognition to be elucidated and algorithms to be developed without resorting to training on a known set of targets (Lewis et al., 2003; Lewis et al., 2005). Developing the algorithm without consideration of known targets avoids biases from sites that are more easily found experimentally and was particularly useful for mammalian miRNAs, for which no targets were known.
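The conservation-based evaluation described above reduces to simple arithmetic, which can be sketched as follows. This is a minimal illustration, not the published procedure: the function name and the counts are hypothetical, and it assumes that the number of sites conserved by chance is estimated as the mean conserved count over control motifs that match no miRNA.

```python
from statistics import mean

def conservation_signal(conserved_mirna_sites, conserved_control_sites):
    """Return (net_yield, signal_to_background) for one miRNA motif.

    conserved_mirna_sites: conserved occurrences of the motif matching the miRNA.
    conserved_control_sites: conserved occurrences of each control motif
        (assumed here to be motifs that match no miRNA).
    """
    # Estimate of the number of sites conserved by chance (an assumption of this sketch).
    background = mean(conserved_control_sites)
    if background <= 0:
        raise ValueError("need at least one conserved control site to form a ratio")
    net_yield = conserved_mirna_sites - background
    return net_yield, conserved_mirna_sites / background

# Hypothetical counts, chosen only to illustrate the arithmetic.
net, ratio = conservation_signal(350, [90, 110, 100])
print(net, ratio)
```

A feature or refinement of an algorithm would then be judged by whether it raises the net yield or the ratio across many miRNAs.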
Current prediction methods are diverse, both in approach and performance (Table 1, Baek et al., 2008; Selbach et al., 2008), and all have room for improvement. Nonetheless, agreement is emerging on three conclusions, which are each reassuringly consistent with a growing body of experimental data. And, as further relief for the non-computational biologist, the most critical concepts for computational target prediction can be distilled down to a few simple guidelines that can be implemented by anyone with access to the UC Santa Cruz Genome Browser and the Find function on their word processor.
The first major conclusion is that requiring conserved Watson–Crick pairing to the 5' region of the miRNA centered on nucleotides 2–7, which is called the miRNA ‘seed’ (Figure 1), markedly reduces the occurrence of false-positive predictions (Lewis et al., 2003; Brennecke et al., 2005; Krek et al., 2005; Lewis et al., 2005). The discovery that perfect seed pairing substantially improves prediction reliability implied that it was also important for miRNA target recognition (Lewis et al., 2003). This assertion dovetailed nicely with previous reports that the 5' region is the most conserved portion of the metazoan miRNAs (Lim et al., 2003) and that the 5' region of certain Drosophila miRNAs perfectly matches 3'-UTR elements that mediate mRNA decay and translational repression (Lai, 2002), as well as with subsequent experiments showing that miRNA-like regulation was most sensitive to nucleotide substitutions that disrupt seed pairing (Doench and Sharp, 2004; Kloosterman et al., 2004; Brennecke et al., 2005; Lai et al., 2005).
The second conclusion is that conserved pairing to the seed region can also be sufficient on its own for predicting conserved targets above the noise of false-positive predictions (Brennecke et al., 2005; Krek et al., 2005; Lewis et al., 2005). For example, mammalian targets can be predicted by simply searching for conserved 7-nt matches in aligned regions of vertebrate 3' UTRs (Lewis et al., 2005). Prediction specificity increases when requiring an 8-nt match or multiple matches to the same miRNA, but systematic analysis of preferentially conserved features indicates that most targets of a given miRNA have only a single 7-nt match to that miRNA seed region. Fortunately, enough genomes have been sequenced and aligned such that these targets with single sites can now be predicted with confidence that most are authentic; when assessing the evolutionary conservation of 7-nt motifs that match miRNAs compared to those that do not match miRNAs but are of equal abundance in the UTRs, the ratio of predicted targets to estimated false positives is 3.5:1 in a five-genome analysis that extends to chicken (Lewis et al., 2005).
Hence, a simple three-step protocol can predict evolutionarily conserved targets for a metazoan miRNA: 1) Identify the two 7-nt matches to the seed region (Figure 1A and B). For example, miR-1, with sequence 5'-UGGAAUGUAAAGAAGUAUGUA, would recognize the CAUUCCA match and the ACAUUCC match. 2) Use available whole-genome alignments (Karolchik et al., 2008) to compile orthologous 3' UTRs. 3) Search within the orthologous UTRs for conserved occurrence of either 7-nt match. These are predicted regulatory sites. Note that members of the same miRNA family (i.e., miRNAs with the same sequence at nucleotides 2–8) all share the same predicted targets. A search for conserved 8-nt sites comprised of both 7-nt motifs (e.g., ACAUUCCA, in the case of miR-1, Figure 1C) yields greater prediction specificity, whereas a search for conserved 6-nt seed matches (Figure 1D) yields greater sensitivity. When only a few genomes are available, those sites present at orthologous positions in all genomes examined are the ones considered conserved. When more genomes are available, more sophisticated measures of conservation increase the information gleaned from the alignments (Gaidatzis et al., 2007; Kheradpour et al., 2007; Friedman et al., 2008).
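The three-step protocol above can be sketched in a few lines of code. This is an illustrative sketch only: the function names are hypothetical, the three aligned UTR fragments are invented toy sequences (not real orthologs), and "conserved" is taken in the simplest sense described above, namely an identical match at the same alignment position in every species.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna))

def seed_matches(mirna):
    """Step 1: derive the target-site motifs (written 5'->3') for a miRNA."""
    six = reverse_complement(mirna[1:7])   # pairs with miRNA nucleotides 2-7
    m8 = reverse_complement(mirna[1:8])    # pairs with miRNA nucleotides 2-8
    return {
        "6mer": six,
        "7mer-m8": m8,
        "7mer-A1": six + "A",              # seed match followed by an A opposite position 1
        "8mer": m8 + "A",
    }

def conserved_sites(aligned_utrs, motif):
    """Step 3: alignment positions where `motif` occurs in every species."""
    seqs = list(aligned_utrs.values())
    length = min(len(s) for s in seqs)
    return [
        i for i in range(length - len(motif) + 1)
        if all(s[i:i + len(motif)] == motif for s in seqs)
    ]

MIR1 = "UGGAAUGUAAAGAAGUAUGUA"

# Step 2 stand-in: invented, gap-free toy alignment of orthologous UTR fragments.
ALIGNED = {
    "human":   "GCUACAUUCCAGGU",
    "mouse":   "GCAACAUUCCAGCU",
    "chicken": "GUUACAUUCCUGGA",
}

motifs = seed_matches(MIR1)
for name in ("7mer-m8", "7mer-A1"):
    print(name, motifs[name], conserved_sites(ALIGNED, motifs[name]))
```

In this toy alignment the 7mer-m8 match is present in all three sequences, whereas the 7mer-A1 match is lost in the third, so only the former would count as a conserved site.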
As might be expected, searching for conserved instances of either of two 7-mers yields many predicted targets—hundreds of messages for each miRNA family. The surprise is that after subtracting the number of sites expected to be conserved by chance, the number of predicted targets remains very high. This leads to the third major conclusion: highly conserved miRNAs have very many conserved targets (Brennecke et al., 2005; Krek et al., 2005; Lewis et al., 2005; Xie et al., 2005). An updated analysis of preferential conservation of 7–8-nt sites reveals that the mammalian miRNAs conserved through vertebrates have an average of 300 conserved targets per miRNA family, a number that exceeds 400 if 6-mer sites are also included (Figure 1H, Friedman et al., 2008). In sum, more than half of the human protein-coding genes appear to have been under selective pressure to maintain 3´-UTR pairing to miRNAs (Friedman et al., 2008). As a result, miRNA targeting can explain a sizable fraction of the conserved motifs in mammalian 3´ UTRs (Xie et al., 2005).
As computational studies were uncovering evidence of widespread targeting, an experimental approach reached the same conclusion. After introducing miRNAs into HeLa cells, microarray analyses revealed modest effects on the levels of hundreds of mRNAs (Lim et al., 2005). Messages that decrease in response to each miRNA tend to have corresponding seed matches at a propensity indicating that most of the down-regulated messages are directly targeted by the miRNA. Because the miRNAs are introduced into cells that normally do not express them, some of the interactions observed in HeLa cells probably do not occur in the animal. Nonetheless, introduction of the miRNAs causes the expression profile of HeLa to shift towards that of the organ that normally expresses the introduced miRNA, thereby indicating that much of the observed targeting reflects, either directly or indirectly, that within the animal (Lim et al., 2005). In addition to providing experimental support for both the importance of seed pairing and the conclusion that miRNAs repress many messages, these results overturned the prevailing notion that miRNAs down-regulate protein output without influencing message levels. Indeed, re-evaluation of the effects of the lin-4 miRNA on lin-14 mRNA levels indicated substantial mRNA destabilization of this classical miRNA target (Bagga et al., 2005). Subsequent studies inhibiting or ablating endogenous miRNAs conclusively demonstrate the widespread transcriptomic effects on seed-matched messages (Krutzfeldt et al., 2005; Giraldez et al., 2006; Rodriguez et al., 2007). Identities of the mRNAs that co-immunoprecipitate with silencing complexes and the proteins that change following miRNA introduction or disruption further confirm the widespread targeting of seed-matched mRNAs (Beitzinger et al., 2007; Easow et al., 2007; Karginov et al., 2007; Baek et al., 2008; Selbach et al., 2008).
Although initial target-prediction efforts yielded largely non-overlapping predictions, this changed with the developing consensus on the importance of seed pairing for miRNA:target recognition (Table 1). When evaluated based on proteomic changes following miRNA addition or deletion, tools that stringently require Watson–Crick seed pairing perform better than those that do not (Baek et al., 2008; Selbach et al., 2008). Tools with more moderately stringent cutoffs that allow a mismatch or wobble to the miRNA seed generally do perform better than expected by chance, but this observed efficacy can be explained by the strong response of a subset of predictions with perfect seed matches. The inability of these tools to find many functional sites with seed mismatches does not necessarily mean that mammalian messages lack such sites — it only means that, of the tools developed thus far, none are able to find many functional seed-mismatched sites.
The current predictions by TargetScan, PicTar, EMBL, and ElMMo have a high degree of overlap because they now all require stringent seed pairing. However, they are not 100% identical. Some reasons for imperfect overlap can be traced to alignment artifacts, the use of slightly different UTR databases, or the use of different miRNA sequences. Other reasons are intrinsic to the prediction algorithms themselves, such as the treatment of the target nucleotide opposite the first miRNA nucleotide. TargetScan rewards an A across from position 1 (Figure 1A, C), whereas the other algorithms with stringent seed pairing reward a Watson–Crick match across from this position (Krek et al., 2005; Lewis et al., 2005; Stark et al., 2005; Gaidatzis et al., 2007). This is not a factor for the many miRNAs that begin with a U; for such miRNAs the 7-nt matches used by each of the four tools are identical. However, for a miRNA beginning with A, C, or G, about half of the predicted targets differ because for such a miRNA one of the two 7-nt matches (the 7mer-m8, Figure 1B) is the same for all four tools, whereas the other one differs (the 7mer-A1, Figure 1A).
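The effect of the position-1 rule on the pair of 7-nt matches can be illustrated with a short sketch. The function and rule names are hypothetical labels for the two behaviors described above, and the second guide sequence is an invented variant of miR-1 whose first nucleotide has been changed to G purely for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna))

def seven_mers(mirna, position1_rule):
    """Return the two 7-nt matches searched under a given position-1 rule.

    position1_rule: "A1" rewards an A opposite miRNA position 1;
                    "WC" rewards a Watson-Crick match to position 1.
    """
    six = reverse_complement(mirna[1:7])       # match to nucleotides 2-7
    m8 = reverse_complement(mirna[1:8])        # match to nucleotides 2-8 (7mer-m8)
    if position1_rule == "A1":
        last = "A"
    elif position1_rule == "WC":
        last = COMPLEMENT[mirna[0]]
    else:
        raise ValueError("position1_rule must be 'A1' or 'WC'")
    return {m8, six + last}

MIR1 = "UGGAAUGUAAAGAAGUAUGUA"            # begins with U
HYPOTHETICAL_G = "G" + MIR1[1:]            # invented guide beginning with G

print(seven_mers(MIR1, "A1") == seven_mers(MIR1, "WC"))
print(sorted(seven_mers(HYPOTHETICAL_G, "A1")), sorted(seven_mers(HYPOTHETICAL_G, "WC")))
```

For the guide beginning with U the two rules give identical motifs, whereas for the invented guide beginning with G they share only the 7mer-m8 match.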
Several lines of evidence support the non-Watson–Crick recognition of target position 1. The first is the site-conservation analyses that originally detected this preference in vertebrates (Lewis et al., 2005). The second is array and proteomics data in mammalian cells showing that for miRNAs (and siRNAs) that do not begin with U, the 7mer-A1 sites out-perform sites with a Watson–Crick match to position 1 (Nielsen et al., 2007; Baek et al., 2008). Whether or not this preference extends beyond mammals has not yet been reported, but a biochemical preference for a mismatch at position 1 and crystallographic studies indicating that the 5´-most nucleotide of an Argonaute-bound guide RNA is not paired to the target strand both suggest that it could extend beyond mammals (Haley and Zamore, 2004; Ma et al., 2005; Parker et al., 2005).
Another difference between the prediction sets is their rank ordering of the targets, designed to help biologists focus on predictions more likely to be authentic or responsive. When assessed using available proteomics results, the higher-ranked predictions of several tools trend toward better performance, with the most robust discrimination observed for TargetScan rankings (Baek et al., 2008). When evaluated independently, each of the four parameters used to rank TargetScan predictions correlates with targeting efficacy: site conservation, site number, site type (with 8mer > 7mer-m8 > 7mer-A1), and site context (described below) (Grimson et al., 2007; Nielsen et al., 2007; Baek et al., 2008; Friedman et al., 2008; Selbach et al., 2008). Examples of top-ranked targets for well-studied miRNA families in nematodes, flies, and mammals are shown (Figure 2A).
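The site-type hierarchy (8mer > 7mer-m8 > 7mer-A1) can be made concrete with a small classifier. This is a sketch under stated assumptions, not the TargetScan implementation: the function name is hypothetical, the UTR is an invented sequence, and only site type is considered (conservation, site number, and context are ignored).

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
RANK = {"8mer": 0, "7mer-m8": 1, "7mer-A1": 2}   # lower value = higher rank

def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna))

def classify_sites(utr, mirna):
    """Find matches to miRNA nucleotides 2-7 in `utr` and classify each by its flanks.

    Returns (seed_match_start, site_type) tuples sorted by site-type rank.
    Bare 6mer matches are skipped in this sketch.
    """
    six = reverse_complement(mirna[1:7])
    m8_nt = COMPLEMENT[mirna[7]]           # UTR nucleotide that pairs with miRNA position 8
    sites = []
    start = utr.find(six)
    while start != -1:
        has_m8 = start > 0 and utr[start - 1] == m8_nt
        has_a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if has_m8 and has_a1:
            sites.append((start, "8mer"))
        elif has_m8:
            sites.append((start, "7mer-m8"))
        elif has_a1:
            sites.append((start, "7mer-A1"))
        start = utr.find(six, start + 1)
    return sorted(sites, key=lambda s: (RANK[s[1]], s[0]))

MIR1 = "UGGAAUGUAAAGAAGUAUGUA"
# Invented UTR carrying one site of each type, separated by arbitrary spacers.
UTR = "GG" + "CAUUCCA" + "GGCC" + "ACAUUCC" + "UGGG" + "ACAUUCCA" + "UU"

print(classify_sites(UTR, MIR1))
```

Sorting by site type alone is a deliberate simplification; a real ranking would combine it with the other parameters named above.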
Even after recognition of the importance of seed pairing, a reasonable assumption has been that pairing to the remainder of the miRNA usually supplements seed pairing to enhance binding specificity and affinity. However, experimental evidence for frequent function of such 3´-supplementary pairing has not yet materialized (Doench and Sharp, 2004; Brennecke et al., 2005; Lim et al., 2005). Similarly, comparative analysis in flies indicates that the majority of sites under selection have no more 3´-supplementary pairing than expected by chance (Brennecke et al., 2005), and parallel analyses in mammals motivated a disregard of 3´-supplementary pairing altogether (Lewis et al., 2005). Indeed, the early pairing and energy-based rubrics initially designed to identify and rank 3´-supplementary pairing (Lewis et al., 2003; John et al., 2004; Krek et al., 2005) lack predictive value (Grimson et al., 2007; Baek et al., 2008).
Complicating genomewide attempts to evaluate the potential contribution of 3´-supplementary pairing are the numerous potential pairing possibilities involving the 3´ portion of the miRNA and the UTR, some of which might be more favored than others, perhaps because they are more compatible with the location and configuration of the miRNA in the silencing complex. Systematic analyses of pairing possibilities, searching for those associated with preferential conservation and increased efficacy on the arrays, have revealed a type of 3´-supplementary pairing that is both productive and associated with a sufficient number of sites to be supported by microarray datasets (Grimson et al., 2007). This 3´ pairing optimally centers on miRNA nucleotides 13–16 and the UTR region directly opposite this miRNA segment (Figure 1F). Such sites in which 3´ pairing productively augments seed pairing are called ‘3´-supplementary sites’. Like seed pairing, 3´ pairing appears relatively insensitive to predicted thermostability and instead is sensitive to pairing geometry, preferring at least 3–4 contiguous Watson–Crick pairs uninterrupted by bulges, mismatches or wobbles. Sites with conserved supplementary pairing of this type are predicted with significantly greater specificity, but such sites are atypical (Figure 1H) and tend to be only slightly more effective than those without the supplemental pairing (Grimson et al., 2007), suggesting that supplemental 3´ pairing plays a modest role in target recognition.
Pairing to the 3´ portion of the miRNA can not only supplement a 7–8mer match, it can also compensate for a single-nucleotide bulge or mismatch in the seed region, as illustrated by the let-7 sites in lin-41 and the miR-196 site in Hoxb8 (Figure 2F). These are called ‘3´-compensatory sites’ (Figure 1G). In all experimentally validated examples of 3´-compensatory sites, the pairing centered on miRNA nucleotides 13–17 extends to at least 9 contiguous Watson–Crick pairs — substantially more than the number needed to observe effective supplementary pairing. Indeed, for the miR-196 site in Hoxb8, the pairing is so extensive that the miRNA directs the Argonaute-mediated cleavage of the message (Yekta et al., 2004).
The let-7 sites in C. elegans lin-41 happened to be some of the first proposed and definitively validated sites in animals (Reinhart et al., 2000; Vella et al., 2004), and therefore had a particularly strong influence on early concepts of miRNA targeting. Indeed, the let-7 sites in lin-41, which had some resemblance to a few of the proposed (but not individually validated) lin-4 sites in lin-14, were originally thought to exemplify typical miRNA target sites. However, systematic examination of site conservation indicates that mismatched seed sites with 3´-compensatory pairing are only rarely under selective pressure to be conserved (Brennecke et al., 2005; Lewis et al., 2005; Friedman et al., 2008). Perhaps because such sites with extensive pairing to the 3´ portion of the miRNA possess much more informational complexity than do the 7–8mer perfect matches and therefore emerge much less frequently and are harder to maintain in evolution, these 3´-compensatory sites appear to be used only rarely for biological targeting, comprising ~1% of the conserved sites in mammals (Figure 1H). Thus to achieve prediction specificity, algorithms can either omit the prediction of 3´-compensatory sites (Lewis et al., 2005; Gaidatzis et al., 2007) or predict them at high stringency, requiring such extensive pairing to the miRNA that the 3´-compensatory sites do not substantially increase the total number of predictions (Krek et al., 2005; Stark et al., 2005; Friedman et al., 2008).
If 3´-compensatory sites are indeed rare, then the question arises as to why lin-41 would have two highly conserved 3´-compensatory sites for let-7. An answer to this question becomes apparent when considering the other let-7 family members in worms, which include miR-48, miR-84, and miR-241, all of which have the same seed region as let-7 but differ in their remaining sequence (Lau et al., 2001; Lim et al., 2003). Because these three paralogs are expressed earlier than is let-7, their repression of lin-41 would presumably cause cells to precociously assume adult cell fates (Reinhart et al., 2000; Abbott et al., 2005). To prevent this undesired outcome, both lin-41 sites have two important features: 1) imperfect seed pairing to the let-7 family, which prevents regulation by most family members, and 2) extensive compensatory pairing involving the unique 3´ portion of let-7 RNA, which enables later repression by let-7 (Brennecke et al., 2005; Lewis et al., 2005). Presumably other situations requiring regulation by a specific member of a miRNA family would favor the emergence and retention of 3´-compensatory pairing, and the consequent lack of redundant function among miRNA family members might make such targets particularly sensitive to the loss of a single miRNA (and thereby more readily identified through forward genetics). Moreover, a few key 3'-compensatory sites per miRNA could explain the preferential conservation of miRNA nucleotides sometimes observed outside the seed region.
For the many sites that lack evidence for consequential 3´ pairing, the 3´ region of the miRNA might still interact with the message, but in a way that does not favor matches over mismatches and therefore does not add detectably to targeting specificity. A common practice is to show the 3´ region of the miRNA paired to the message, depicting short regions of potential pairing of the type that generally can be found between any two arbitrarily chosen RNA fragments. A reasonable alternative is to avoid proposing such pairing for a majority of miRNA sites (e.g., Figure 2D) and to depict pairing to the miRNA 3´ region only in cases for which there is reason to believe that it adds to targeting specificity (Figure 2E and F).
One mechanistic model for explaining the primacy of seed pairing proposes that the protein of the silencing complex presents the 5´ region of the miRNA (or siRNA) preorganized to favor Watson–Crick pairing to the mRNA (Bartel, 2004; Mallory et al., 2004) (Figure 3A). Presentation of nucleotides 2–8 prearranged in a geometry resembling an A-form helix would enhance both the affinity and specificity for matched mRNA segments, enabling 7–8-nt sites to suffice for most targeting functions (Figure 3B). Presentation of a preformed helical segment longer than ~7 nt would impose topological challenges and would not increase the effective nucleation surface because too many nucleotides would face opposing directions, whereas a shorter preformed segment would have both lower affinity and lower specificity. Consistent with this model, a co-structure with a target mimic reveals protein contacts to the guide-strand backbone that might also preorganize the seed region prior to target binding (Ma et al., 2005; Parker et al., 2005), and biochemical studies indicate that affinity to the seed is stronger than that to other regions of the RNA (Haley and Zamore, 2004; Ameres et al., 2007).
Although pairing to the seed region is often sufficient for functional binding specificity, some sites involve additional pairing outside the seed region (Figure 1F–H). A disproportionate importance of seed pairing nonetheless is observed at cleavage sites, suggesting that pairing might still nucleate at the seed match of these cleavage sites and then spread to the central and 3´ regions of the miRNA (or siRNA) (Bartel, 2004; Haley and Zamore, 2004; Mallory et al., 2004; Ameres et al., 2007). In this seed-nucleation model, a substantial conformational accommodation must occur to achieve extensive Watson–Crick pairing: Prior to target recognition, the miRNA is likely bound along its entire length to protein, presumably to the Argonaute protein (Figure 3A); otherwise the miRNA sugar-phosphate backbone would be accessible to cellular RNases. After nucleating at the seed (Figure 3B), pairing cannot spread much further without the protein releasing its grip on the remainder of the miRNA. Perhaps successive pairs to the more central miRNA residues help trigger a transient release of the central and 3´ miRNA regions, allowing these regions to wrap around the message to complete the two helical turns (Figure 3C). Following this large conformational change, the protein can complete the accommodation by re-binding the central and 3´ regions of the miRNA in a mode that cleaves the message (Figure 3D). In this model, some binding energy gained in forming the central pairs would be offset by the disruption of miRNA:Argonaute interactions, causing contiguous pairing outside the seed to contribute less affinity than might have otherwise been expected. This lower contribution to affinity could explain why the sites that do not require central pairing for function (3´-supplementary and 3´-compensatory sites) tend to skip contiguous pairing to the central residues (Figure 1F–G). Pairing to these residues might be neutral or even disfavored in terms of binding energy and would begin to induce the accommodation, whereas starting a fresh pairing region at residues 13–16 (without the miRNA actually wrapping around the mRNA) would enable additional favorable interactions without incurring the cost of the large conformational change (Figure 3E).
Target recognition that relies heavily on 7-nt matches to the seed region creates the possibility for a lot of nonconserved targeting, given that poorly conserved 7-nt sites outnumber preferentially conserved ones by about ten-to-one. Are a substantial fraction of these sites functional? After all, the cell cannot evaluate evolutionary conservation when choosing which of the sites should mediate repression.
Heterologous reporter assays show that when present in the same cell as the miRNA, a large fraction of nonconserved sites can indeed function (Farh et al., 2005), raising the question of how often messages with nonconserved sites are present in the same tissues as their cognate miRNAs. Analyses of mRNA and miRNA expression profiles indicate that 3´ UTRs with nonconserved sites are most often found in genes primarily expressed in tissues where the cognate miRNA is absent (Farh et al., 2005). This observation has a simple evolutionary explanation, known as selective avoidance. Over the course of evolutionary drift, sites for miRNAs that are absent in the cells where the mRNA is expressed can accumulate without consequence, whereas sites that emerge for a miRNA that is highly expressed in the same cell where the message functions will often impart a selective disadvantage and thus fail to be fixed in the population (Farh et al., 2005; Stark et al., 2005). Despite this trend, because so many messages have nonconserved 7-nt sites for each miRNA, the minority of messages that are co-expressed with the miRNA still constitutes a large number, creating the possibility for much nonconserved targeting (Farh et al., 2005). Identities of messages and proteins that increase upon inhibition or removal of an endogenous miRNA demonstrate that nonconserved targeting is even more widespread than conserved targeting (Krutzfeldt et al., 2005; Giraldez et al., 2006; Rodriguez et al., 2007; Baek et al., 2008; Selbach et al., 2008). Although much of this nonconserved targeting could represent inconsequential, evolutionarily neutral dampening of gene expression, some presumably represents important species-specific repression. With this in mind, targets have been predicted without considering site conservation (Miranda et al., 2006; Grimson et al., 2007; Kertesz et al., 2007). PITA and TargetScan predictions have been reported both with and without conservation cutoffs. Although disregarding conservation compromises overall performance, the highest ranked predictions of both tools perform at least as well as do their respective highest ranked conserved predictions (Baek et al., 2008).
The depletion of sites in messages co-expressed with the miRNA is particularly striking in messages most highly and preferentially expressed in the same tissue as the miRNA. 3´-UTRs of these mRNAs have about half as many nonconserved 7-nt sites for that miRNA as expected by chance, a depletion attributed in large part to selective avoidance (Farh et al., 2005). The evolutionary pressure to avoid emergence of fortuitous miRNA sites is also detected in ‘housekeeping genes’ and might explain the shorter 3´ UTRs of these genes in animals when compared to 3´ UTRs of orthologs in plants and fungi, which lack abundant 3´ UTR targeting (Stark et al., 2005). Messages selectively avoiding targeting to a miRNA are called the ‘antitargets’ of that miRNA (Bartel and Chen, 2004). When considering the thousands of messages avoiding targeting to particular miRNAs together with those avoiding targeting to all miRNAs, the phenomenon of selective avoidance clearly has had a widespread impact on UTR evolution, with the estimated number of antitargets comparable to the number of conserved targets (Farh et al., 2005).
When considering conserved targeting, nonconserved targeting, and antitargeting, miRNAs likely influence the expression or evolution of nearly all mammalian mRNAs. Indeed, these effects are so widespread that the spatial and temporal specificities of highly expressed miRNAs can be revealed by finding those 7-nt motifs that are underrepresented in messages preferentially expressed in particular tissues or developmental stages (Farh et al., 2005). The reason that this unusual approach of searching for the absence of nonconserved sites (rather than the presence of conserved sites) is productive lies in the fact that 7-nt sites—conserved ones as well as nonconserved ones—frequently have what it takes to mediate biological repression. Indeed, the most important implication of the selective avoidance phenomenon concerns its ramifications for endogenous target recognition. Selective avoidance shows that when 7mer sites emerge in 3´ UTRs, they often appear in contexts suitable for biological repression—otherwise, depletion of sites could not be observed.
Although 7–8-nt matches to the seed region, coupled with whether or not the mRNA is coexpressed with the miRNA, can explain much of targeting specificity, they cannot explain all of it. Selective avoidance leads to ~50% site depletion, not the near-100% depletion that might have been expected if 7-nt sites mediated repression regardless of their sequence context. Moreover, reporter assays show that the identical site can mediate repression in some UTRs but not in others (e.g., Brennecke et al., 2005; Farh et al., 2005; Giraldez et al., 2006). Because pairing to the 3´ portion of the miRNA is rarely consequential, additional features of UTR context must influence site efficacy. Additional features recently found to boost site efficacy include 1) positioning within the 3´ UTR at least 15 nt from the stop codon, 2) positioning away from the center of long UTRs, 3) AU-rich nucleotide composition near the site or other measures of site accessibility, and 4) proximity to sites for coexpressed miRNAs. These features are discussed below.
Although most investigation into metazoan miRNA function has been for sites in 3´ UTRs, experiments using artificial sites show that targeting can occur in 5´ UTRs and open reading frames (ORFs) (Kloosterman et al., 2004; Lytle et al., 2007), and computational and experimental genome-wide analyses indicate that a significant amount of targeting, involving thousands of mRNAs, occurs in ORFs (Farh et al., 2005; Lewis et al., 2005; Lim et al., 2005; Easow et al., 2007; Grimson et al., 2007; Stark et al., 2007; Baek et al., 2008) (Figure 2C). Overall, endogenous ORF targeting appears less frequent and less effective than 3´-UTR targeting but still much more frequent than 5´-UTR targeting.
One reason that 5´ UTRs and ORFs may be less hospitable for targeting is that silencing complexes bound to these regions would be displaced by the translation machinery as it translocates from the cap-binding complex through the ORF (Bartel, 2004). Support for this notion comes with the observation that the transition to more effective and more selectively conserved sites is not at the stop codon but instead occurs ~15 nt into the 3´ UTR, precisely as expected if the first 15 nt of the 3´ UTR were cleared of silencing complexes when they enter the ribosome as it approaches the stop codon (Grimson et al., 2007). Targeting of sites perfectly complementary to artificial siRNAs is not hampered by ribosome interference, presumably because the ribosome has more difficulty disrupting extensive pairing or because sites that are cleaved need not remain associated with the mRNA for as long.
The apparent interference by the ribosome along its entire path of translation implies that most messages under detectable miRNA control experience at least one round of translation prior to or concurrent with their repression; if an appreciable number of molecules were repressed prior to translation of the full-length protein, then the strong effects of ribosome interference could not be observed (Grimson et al., 2007). By similar reasoning, ORF targeting is expected to be much more effective in messages that are already inefficiently translated, because fewer ribosomes passing through the site would allow for greater residency time of the silencing complex. In this way, ORF targeting could provide an important mechanism for amplifying the effects of both miRNA 3´-UTR targeting and other types of translational repression.
Genome-wide analyses of site conservation, site efficacy, and site depletion all indicate that 7–8-nt sites within the 3´ UTR and out of the path of the ribosome tend to be most effective if they do not fall in the middle of long UTRs (Grimson et al., 2007). One explanation for these results is that sites in the middle of long UTRs might be less accessible to the silencing complex because they would have opportunities to form occlusive interactions with segments from either side, whereas sites near the UTR ends would not. These same types of analyses show that even more important than site position is the nucleotide composition in the immediate vicinity of the site, with those sites within high local AU content performing best (Grimson et al., 2007). When the site has a match to position 8, an A or U across from position 9 is particularly favorable, suggesting non-Watson–Crick recognition of this nucleotide resembling that opposite position 1 (Lewis et al., 2005; Nielsen et al., 2007). The remaining benefit of local AU composition might be, as suggested for site position, to place the site within a more accessible UTR context. Indeed, several methods have been proposed for predicting accessible UTR secondary structure favorable for miRNA targeting (Robins et al., 2005; Zhao et al., 2005; Kertesz et al., 2007; Long et al., 2007; Hammell et al., 2008). Although some of these methods have predictive value, when evaluated by monitoring the impact of the miRNA on both mRNA destabilization and protein output, they are less successful in predicting responsive targets than is scoring local AU content (Grimson et al., 2007; Baek et al., 2008). Perhaps because of RNA-binding proteins, RNA tertiary structure, and multiple competing RNA pairing conformations, the details of intracellular UTR structures might differ substantially from predicted structures, such that scoring local AU content is more reliable for predicting site accessibility.
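The idea of scoring local AU content around a site can be illustrated with a minimal sketch. The 30-nt flank, the unweighted average, and the function name are assumptions made for illustration only; this is not the published scoring scheme of Grimson et al. (2007).

```python
def local_au_fraction(utr, site_start, site_len=7, flank=30):
    """Fraction of A/U among up to `flank` nt on each side of a site.

    The site itself is excluded. The flank size and the equal weighting of
    all flanking positions are illustrative assumptions.
    """
    left = utr[max(0, site_start - flank):site_start]
    right = utr[site_start + site_len:site_start + site_len + flank]
    window = left + right
    if not window:
        return 0.0
    return sum(nt in "AU" for nt in window) / len(window)


# The same hypothetical 7-nt site placed in an AU-rich and a GC-rich context.
site = "CUACCUC"
au_rich_utr = "AUUUA" * 4 + site + "UUAUA" * 4
gc_rich_utr = "GCCGG" * 4 + site + "CCGGC" * 4
print(local_au_fraction(au_rich_utr, 20))
print(local_au_fraction(gc_rich_utr, 20))
```

Under this toy score, the site embedded in the AU-rich context would be ranked as the more promising of the two.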
When analyzing orthologous 3´ UTRs, conserved 7-mers in general, not just those matching miRNAs, are preferentially found in local AU-rich contexts, in predicted accessible secondary structure, away from the first 15 nt of the 3´ UTR, and away from the centers of long UTRs (Gaidatzis et al., 2007; Grimson et al., 2007; Majoros and Ohler, 2007). This result would be expected if these UTR context features found to boost miRNA effectiveness do so by enhancing site accessibility and therefore generalize to protein-binding elements. Systematic analysis of the number of sites conserved above the number expected by chance, i.e., analysis of the signal above background rather than the signal divided by background, prevents this general effect from confounding interpretation of miRNA site conservation (Grimson et al., 2007).
The proposal of multiple lin-4 sites in the lin-14 3´ UTR (Lee et al., 1993; Wightman et al., 1993) and of sites for both lin-4 and let-7 RNAs in both the lin-14 and lin-28 UTRs (Reinhart et al., 2000) led to the notion that multiple sites for the same or different miRNAs might function cooperatively. Quantitative analyses of array data have shown that with most site configurations the increased response observed for messages with multiple sites is nearly the same as that expected if each site contributes independently to repression; that is, the response for a gene with two sites matches that anticipated by multiplying the responses from each site working on its own (Grimson et al., 2007; Nielsen et al., 2007). This multiplicative effect, a hallmark of independent and noncooperative action, was observed previously using reporter assays (Doench et al., 2003). Although not cooperative in a biochemical sense, such independent action can add up to substantial repression. For instance, a message with eight sites to co-expressed miRNAs would be repressed by ~25 fold if each site independently decreased protein output by a third [(0.67)^8 ≈ 0.04].
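The multiplicative arithmetic in the eight-site example can be checked with a short sketch. The function names are hypothetical, and each site is assumed to leave exactly two-thirds of protein output.

```python
from math import prod


def remaining_output(per_site_fractions):
    """Fraction of protein output left when every site acts independently.

    Each entry is the fraction of output that one site leaves on its own,
    so independent action corresponds to multiplying the entries.
    """
    return prod(per_site_fractions)


def fold_repression(per_site_fractions):
    """Fold reduction in protein output implied by the remaining fraction."""
    return 1.0 / remaining_output(per_site_fractions)


# Eight sites, each assumed to reduce output by one third.
eight_sites = [2 / 3] * 8
print(f"remaining output: {remaining_output(eight_sites):.3f}")
print(f"fold repression:  {fold_repression(eight_sites):.1f}")
```

Because the per-site effects multiply, many individually modest sites can add up to substantial repression without any biochemical cooperativity.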
A notable exception to the overall tendency of independent action has been found: two sites that are close together (within 40 nt, but no closer than 8 nt) tend to act cooperatively, leading to marked enhancement in repression over that expected from the independent contributions of the two sites (Grimson et al., 2007; Saetrom et al., 2007). By analogy to transcription factors, cooperative miRNA function provides a mechanism by which repression can become more sensitive to small changes in miRNA expression levels, and it greatly enhances the regulatory effect and utility of combinatorial miRNA expression. Although not the norm, hundreds if not thousands of sites fall within a cooperative configuration with a site for the same or a co-expressed miRNA. Such cases would likely be more responsive to the miRNA, which may make them easier to identify genetically. Indeed, the miR-2:E(Spl) and miR-4:Brd interactions both involve conserved sites with intersite spacing suitable for cooperative action (Figure 2A). The same holds for the lin-4:lin-14, let-7:lin-41, let-7:hbl-1 and lsy-6:cog-1 interactions, and the lin-4 site within lin-28, which is another classical interaction identified genetically in worms, falls within cooperative distance of a site for the co-expressed let-7 family members (Figure 2A–B).
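The spacing rule for cooperative action can be sketched as a simple filter over site positions. How intersite distance is measured is not specified above, so the start-to-start convention used here is an assumption, as are the function and parameter names.

```python
from itertools import combinations


def cooperative_pairs(site_starts, min_gap=8, max_gap=40):
    """Return pairs of site positions whose spacing falls in the cooperative range.

    Spacing is measured start-to-start, which is an illustrative assumption.
    The default bounds follow the 'within 40 nt, but no closer than 8 nt' rule.
    """
    pairs = []
    for a, b in combinations(sorted(site_starts), 2):
        if min_gap <= b - a <= max_gap:
            pairs.append((a, b))
    return pairs


# Hypothetical site positions within one 3' UTR.
print(cooperative_pairs([100, 120, 300, 304]))
```

In this toy input, only the sites at 100 and 120 are paired: the sites at 300 and 304 are too close together, and all other combinations are too far apart.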
Quantitative analyses of array data provide the opportunity to compare the relative contributions of different types of 6–8-nt sites, 3´ pairing, and each of the four features of UTR context that boost site efficacy (Figure 4). The hierarchy of site efficacy is as follows: 8mer >> 7mer-m8 > 7mer-A1 >> 6mer > no site, with the 6mer differing only slightly from no site at all (Grimson et al., 2007; Nielsen et al., 2007) (Figure 4A). This hierarchy reflects average efficacies; within each site type, efficacy varies widely, depending on the context of each individual site. Regarding the known context features, dual 7mers with cooperative intersite spacing tend to out-perform a single 8mer, whereas those at non-cooperative spacing do not (Figure 4B); an 8mer in the path of the ribosome tends to be less effective than a 7mer in the UTR (Figure 4A and C), and local AU content, site position, and supplemental 3´ pairing are all influential, but local AU content more commonly distinguishes sites because relatively few sites are positioned in the middle of long UTRs or proximal to residues that can provide the extra ~4 bp needed for consequential 3´ pairing (Figure 4C–E; Grimson et al., 2007).
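The site types ranked in this hierarchy can be told apart with a short classifier. This sketch assumes the commonly used definitions: a 6mer matches miRNA nucleotides 2–7, a 7mer-m8 adds a match to nucleotide 8, a 7mer-A1 adds an A across from nucleotide 1, and an 8mer has both. The function names are hypothetical, and the let-7a sequence is used only as an example.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(_COMPLEMENT)[::-1]


def classify_site(mirna, utr, i):
    """Classify the site whose match to miRNA nt 2-7 starts at utr[i]."""
    if utr[i:i + 6] != revcomp(mirna[1:7]):
        return "no site"
    # The UTR base pairing with miRNA nt 8 lies just 5' of the 6mer match.
    pairs_nt8 = i > 0 and utr[i - 1] == revcomp(mirna[7])
    # The base across from miRNA nt 1 lies just 3' of the 6mer match.
    has_a1 = utr[i + 6:i + 7] == "A"
    if pairs_nt8 and has_a1:
        return "8mer"
    if pairs_nt8:
        return "7mer-m8"
    if has_a1:
        return "7mer-A1"
    return "6mer"


def find_sites(mirna, utr):
    """List (position, site type) for every 6mer seed match in the UTR."""
    seed_match = revcomp(mirna[1:7])
    sites = []
    i = utr.find(seed_match)
    while i != -1:
        sites.append((i, classify_site(mirna, utr, i)))
        i = utr.find(seed_match, i + 1)
    return sites


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # example miRNA, written 5' to 3'
print(find_sites(LET7A, "AAACUACCUCAGGAAAGUACCUCGGG"))
```

The classifier reports only the site type. As the hierarchy reflects averages, the efficacy of any individual site would still depend on its context.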
The quantitative estimates of Figure 4 were calculated from the mRNA destabilization effects of introducing a miRNA into cultured cells, leading to the possibility that these estimates might miss or underestimate the relative importance of those features that exert more of an influence on translational repression or have a greater influence in the animal than in cell culture. Helping allay these concerns is the performance of a model that quantitatively integrates these parameters into “context scores” for predicting site efficacy (Grimson et al., 2007). Proteomics data and reporter assays support the predictive value of the model for protein down-regulation, and proteomics and array data monitoring endogenous responses following miRNA knockout show that the model applies to endogenous miRNA:target interactions in fish and mice (Grimson et al., 2007; Baek et al., 2008). The model, with its context scores, is used to rank both conserved and nonconserved TargetScan predictions (Table 1). Because the model predicts site efficacy without considering site conservation, it also predicts siRNA off-targets, which appear to be repressed through the same mechanisms as endogenous miRNA targets.
Although much is known about the principles of miRNA target recognition and some targets can be predicted with high confidence, much remains to be learned. A limitation of comparative sequence analysis is that some preferentially conserved sites are difficult to distinguish from those that are conserved by chance, and thus prediction sensitivity requires permissive conservation cutoffs. For example, the cutoffs used currently by TargetScan are chosen such that only about half of the conserved sites listed are thought to be preferentially conserved, some of which are indistinguishable from the others that are included as a consequence of their fortuitous conservation (Friedman et al., 2008). Probabilities of preferentially conserved targeting and context scores are both provided to guide biologists wanting to focus on the more confident predictions, but because these values imperfectly reflect the preferential conservation and efficacy of each site, using them to increase prediction specificity eliminates some biological targets without eliminating all false positives. Some high-scoring sites do not respond at all to the miRNA in reporter and proteomic experiments (Grimson et al., 2007; Baek et al., 2008), and overall, context scores explain ~60% of the variability observed in reporter assays for the same 7-nt site in different UTR contexts (A. Grimson, D.B., unpublished data). Further illustrating the room for improvement, conserved sites tend to perform better than do nonconserved sites with identical context scores, presumably because a higher fraction of the conserved sites have been under selection to fall in more favorable contexts, and current methods only partially capture features of these more favorable contexts. Thus, some determinants of targeting specificity remain to be discovered or more accurately quantified, a conclusion congruent with recent reporter-mutagenesis experiments in worms (Didiano and Hobert, 2008).
Another challenge concerns sites that are functional despite lacking both perfect seed pairing and 3´-compensatory pairing. Many such sites have been proposed, and in a few cases the messages or their UTRs are reported to respond to changes in the levels of the miRNA. However, only very rarely has function been tested by observing the effect of mutating the site; in the other cases the responsive UTRs could be indirect targets of the miRNA, regulated downstream of the direct targets. One experimentally confirmed site, found in the human LIN28 3´ UTR, involves a mismatch at the first seed nucleotide and then perfect pairing to miRNA nucleotides 3–8 (Wu and Belasco, 2005). Genomewide, such “offset 6mer” sites (Figure 1E) do have a detectable tendency to be conserved (Figure 1H), but they typically mediate very limited repression (Figure 4A; Friedman et al., 2008). Understanding what makes the LIN28 offset 6mer site so effective could yield important new insights into miRNA targeting.
Adding to the challenge of fully understanding targeting specificity are cases in which other regulatory processes counteract miRNA-directed regulation of particular targets. For example, alternative cleavage and polyadenylation can eliminate regulatory sites from the message—a phenomenon that appears widespread in proliferating cells, in which the shorter UTRs have only half the number of conserved miRNA sites as observed in the longer isoforms that dominate in non-proliferating cells (Sandberg et al., 2008). In another example, miR-430-directed repression of nanos1 is observed in the zebrafish soma but is blocked by the binding of Dead-end RNA-binding protein in the germline, thereby allowing germline-specific expression (Mishima et al., 2006; Kedde et al., 2007). Moreover, within the same cell the miRNA-mediated repression can be modulated in response to different conditions. For example, under stress conditions miR-122-directed repression of human CAT-1 mRNA is relieved through the binding of HuR to AU-rich elements in the CAT-1 3´ UTR (Bhattacharyya et al., 2006). These examples illustrate that in addition to the general features that influence site efficacy, other mechanisms, some involving cell-type-specific UTR-binding cofactors, can influence site accessibility and subcellular message localization. Indeed, under some circumstances, in particular those that induce cells to become quiescent, miRNA targeting of UTRs is reported to enhance rather than repress translation (Vasudevan et al., 2007, 2008).
Perhaps the most important caveat of the current understanding of miRNA targeting is that virtually everything known about metazoan targeting has been learned from the minority of miRNAs that are the most highly expressed. Expression-array, proteomic, and reporter assays are all conducted with miRNAs introduced or expressed at high levels, and site-depletion analysis is informative only for the miRNAs with highly differential expression. A similar caveat applies to analyses of site conservation. The 87 mammalian miRNA families conserved throughout most vertebrates have an average of >400 preferentially conserved sites per family (Figure 1H). By contrast, the 53 families that are conserved throughout mammals but appear to have emerged well after the divergence of chicken have far fewer preferentially conserved sites; after subtracting the estimated number of sites conserved by chance, an average of only 11 sites remain per miRNA family, a number high enough to explain the conservation of these mammalian-only miRNAs but too low to contribute to our understanding of target recognition (Friedman et al., 2008). A lower number of conserved sites is expected for mammalian-only miRNAs because messages have had a relatively short time between the emergence of these miRNAs and the divergence of mammals to acquire beneficial sites, whereas messages have had much more time to acquire beneficial sites for older miRNAs. Another reason for this dramatic difference might be that mammalian-only miRNAs are often expressed at lower levels and in more narrow domains than are the more broadly conserved miRNAs, which would provide fewer opportunities for evolutionary acquisition of targets.
Regardless of the reasons for this difference, one ramification is that the predicted targets listed for these 53 mammalian-only families must be viewed with caution; because site conservation is barely above that expected by chance, the observation that a site is conserved provides little evidence of biological relevance.
Although found and studied largely in the context of highly expressed miRNAs, seed pairing and favorable UTR context presumably are relevant also for more modestly expressed miRNAs. However, features normally sufficient for targeting by highly expressed miRNAs might not be sufficient for targeting by more modestly expressed miRNAs. Perhaps additional determinants, such as binding of a protein cofactor or extensive 3´-supplementary pairing, would be required to concentrate a limiting amount of miRNA to a small subset of the sites that would otherwise function with higher miRNA concentration. Because many mammalian miRNAs are not highly expressed, this more restrictive (and still uncharacterized) recognition mode presumably applies to a majority of mammalian miRNAs. In sum, the state of knowledge does not sound good when expressed in terms of miRNAs: we do not understand targeting for most miRNAs. Nonetheless, it sounds better when expressed in terms of functional miRNA:target interactions: we likely understand most of miRNA targeting. This is because each of the 87 highly conserved miRNA families has so many conserved and nonconserved targets that are repressed through 7–8-nt sites, and thus the canonical, less restrictive recognition mode presumably applies to the vast majority of metazoan miRNA targeting interactions.
Since the discovery that lin-4 and let-7 play roles in the timing of C. elegans larval development, specific miRNAs have been implicated in many other biological processes. With more than half of mammalian messages under selective pressure to maintain pairing to miRNAs (Friedman et al., 2008), it may prove difficult to find a biological function or process that is not influenced at least to some degree, in some cell type, by miRNAs.
Some miRNAs have predicted propensity to target genes with related functions, which can provide insight into biological roles of these miRNAs (Stark et al., 2003; Grun et al., 2005; Lall et al., 2006; Gaidatzis et al., 2007). For example, the vertebrate miRNAs of the miR-17~92 cluster tend to target genes involved in growth control (Lewis et al., 2005), consistent with their oncogenic properties (He et al., 2005). However, the conserved targets of particular miRNAs, even those miRNAs with very striking expression specificities, are not always statistically enriched for specific functions or processes, and even in those cases for which statistical enrichment is found, the enrichment involves only a minority of the conserved targets. Although noise in the predictions and flaws in the gene-ontology databases contribute to this low signal, it is difficult to escape the conclusion that most of the broadly conserved miRNAs each represses genes with a wide variety of biological and molecular functions.
The diversity of conserved miRNA targets is rationalized in a model of miRNA function proposed as the abundance, differential expression, and targeting promiscuity of metazoan miRNAs were coming into focus (Bartel and Chen, 2004). In this model, the different expression profiles of miRNAs in different cell types constitute a miRNA milieu, unique to each cell type, that dampens the expression of thousands of mRNAs and provides important context for the evolution of all metazoan mRNA sequences. As the UTR sequences drift over the course of evolution, they are continuously sampling matches to co-expressed miRNAs. Depending on whether the dampening of protein output is beneficial, inconsequential, or harmful, the sites are either selectively conserved, neutral, or selectively avoided during evolution (with the messages classified as conserved targets, neutral targets, and antitargets, respectively, of the miRNA). In muscle, for example, genes involved in many functions and processes might be detrimental if expressed at too high a level, and thus a wide variety of messages would be expected to accrue conserved complementary sites to muscle-specific miRNAs. For genes that should not be expressed in a particular cell type, the cell can come to depend on its miRNAs to act as binary off-switches to help to repress target protein output to inconsequential levels. Examples include the lin-4 targeting of lin-14 and lin-28, and let-7 targeting of lin-41. These classical switch interactions embodied the initial paradigm of miRNA targeting (summarized in Reinhart et al., 2000), whereby miRNA induction turns off expression of a pre-existing target. The current model expands this paradigm to include other types of switch interactions, tuning and neutral interactions, as well as instances in which messages selectively avoid miRNA targeting (described earlier in “Beyond conserved targeting”).
In contrast to the originally identified lin-4 and let-7 interactions, switch interactions can also include those in which a miRNA is already present when the target is first expressed (Figure 5A). In this scenario, the miRNA sets a more stringent threshold for consequential transcriptional activity, which can help quiet stochastic cell-to-cell noise during developmental fate decisions (Cohen et al., 2006). For example, targeting of senseless by miR-9a in fly epithelial cells prevents sporadic production of extra neuronal precursor cells (Li et al., 2006). At their extreme, such switch interactions can be regarded as failsafe interactions. For both classical switch interactions and failsafe interactions the miRNA represses protein output to inconsequential levels, but failsafe interactions differ because protein output falls below functional levels even in the absence of the miRNA. For failsafe interactions, miRNA repression adds an additional, functionally redundant layer of repression, helping ensure that aberrant transcripts do not give rise to a consequential amount of protein. Proposed examples of failsafe targeting include the miR-1 repression of non-muscle Tropomyosin isoforms and non-muscle V-ATPase subunits in the developing muscle (Stark et al., 2005).
Tuning interactions are those for which the miRNA acts as a rheostat rather than a binary off-switch to dampen protein output to a more optimal level but one that is still functional in the cell (Figure 5A), thereby enabling more customized expression in different cell types as well as more uniform expression within each cell type. A recent example is the Drosophila miR-8 regulation of atrophin, which reduces protein output to a level that prevents neurodegeneration but not so low as to compromise viability (Karres et al., 2007). Another likely example is the miR-375 targeting of Myotrophin (Mtpn), which dampens Mtpn output in pancreatic islets to a more optimal level but one that remains functional for insulin secretion (Poy et al., 2004). In the case of Mtpn targeting, the miRNA level remains constant in the adult animal, illustrating that the regulation need not be dynamic to be classified as tuning.
Neutral interactions dampen protein output but this repression is tolerated or offset by feedback mechanisms such that the regulatory sites are under no selective pressure to be retained or lost during the course of evolution (Figure 5A). Neutral interactions comprise cases in which biological targeting (i.e., targeting occurring in the animal) has no biological function. Because 7–8-nt sites so frequently fall in contexts suitable for repression in the animal, many ‘bystander’ messages that fortuitously pair to co-expressed miRNAs are likely subject to neutral repression. Indeed, when endogenous miRNAs are inhibited or removed, most derepressed messages have nonconserved sites (Krutzfeldt et al., 2005; Giraldez et al., 2006; Rodriguez et al., 2007; Baek et al., 2008), raising the possibility that neutral repression might be the most frequent type of biological repression. However, this possibility is difficult to confirm because tuning or switch interactions can have useful lineage-specific functions, and antitargets can have detrimental sites that have yet to be lost.
Comparing the expression of the miRNAs with that of their predicted targets can provide important clues to the more prevalent regulatory effects of metazoan miRNAs. In Drosophila, expression of miRNAs and their conserved targets usually appears ‘mutually exclusive,’ as judged by in situ hybridization (Stark et al., 2005). Microarray data from mammals paint a similar picture of a mutually exclusive tendency when considering messages with nonconserved sites (discussed above). However, the array data, which have greater dynamic range than do in situ hybridization data, suggest a different picture for messages with conserved sites, indicating that although the conserved mRNA targets tend to be expressed higher in tissues that lack the miRNA, they are still usually detected, albeit at lower levels, in tissues that express the miRNA (Farh et al., 2005; Sood et al., 2006). When miRNA-expressing cells are purified to cellular resolution, this tendency for some overlap between miRNA and target expression domains is retained, which indicates that the observed overlap is not an artifact of mixing cell types (Farh et al., 2005; A. Shkumatava, A. Stark, H. Sive, D.B., unpublished data). The tendency of conserved targets to be present at low levels in the same tissues as the miRNA suggests that, rather than performing failsafe functions, miRNAs more frequently function to actively sculpt expression domains through a combination of tuning and classical switch targeting (Farh et al., 2005; Sood et al., 2006; A. Shkumatava, A. Stark, H. Sive, D.B., unpublished data). Still, with so many conserved targets, each highly conserved miRNA likely performs each type of regulatory function, and the proportions of classical switch, tuning, and failsafe interactions could vary widely from one miRNA to the next.
Moreover, a single miRNA:target relationship could vary in different tissues or over the course of development, with, for example, active repression transitioning to failsafe repression as transcriptional output of the message declines.
The degree of target repression could also provide clues to the more prevalent functions of metazoan miRNAs. However, very little is known about the influence of miRNAs in their endogenous context on the protein output of their many targets. Large-scale proteomic analysis has been performed for only one miRNA, miR-223, in only one biologically relevant cell type, murine neutrophils (Baek et al., 2008). This analysis revealed that although some detected proteins are repressed by 50–80%, miR-223 typically has more modest effects on its endogenous targets (even those targets that are conserved), with individual sites usually reducing protein output by less than a half and often by less than a third. Perhaps other miRNAs in their endogenous contexts have many more targets for which protein output is dramatically repressed. Even allowing for this possibility, it seems reasonable to presume that for each highly conserved miRNA, a minority of the preferentially conserved targets (much less than 150 for most miRNAs) are repressed >50% by that miRNA, whereas the hundreds of remaining preferentially conserved targets (particularly those with only 6mer sites) are repressed more modestly.
Those interactions conferring the greatest repression presumably would be enriched in switch interactions (classical or failsafe), whereas those with more modest repression would tend to be tuning interactions. Nonetheless, some targets that respond more modestly to the miRNA are likely to run counter to this tendency. For example, when target expression falls at the razor edge of efficacy, a 30% knockdown could provide switch function (Figure 5A). Alternatively, the miRNA and target can fall within a mutually repressive regulatory loop that amplifies small changes in target output to achieve switch function (Johnston et al., 2005; Li and Carthew, 2005; Yoo and Greenwald, 2005; Li et al., 2006). Another scenario in which modest miRNA-directed regulation cooperates with transcriptional regulation to achieve classical switch function occurs when a miRNA targets an mRNA that lingers after the gene has been shut off transcriptionally. In this case, depending on the threshold level for protein function, the mRNA decay rate, and the protein decay rate, modest miRNA-mediated repression can lead to substantially reduced protein at later timepoints, with a much more rapid transition to the off state (Figure 5B). Indeed this type of switch targeting applies to hundreds of maternal messages whose expression is damped by miR-430 in zebrafish [and perhaps analogous miRNAs in mammals (Farh et al., 2005)] to facilitate the transition to the zygotic gene-expression program (Giraldez et al., 2006). Nonetheless, many conserved mammalian interactions involve targets that are not strongly repressed by the miRNA, are not expressed at the razor edge of function, are not gene-regulatory molecules and thus cannot participate in amplifying regulatory loops, and are not transcriptionally shut off concurrently with miRNA repression. 
Most of these are unlikely to be classical switch interactions, and because modest repression would seem to impart less selective advantage for failsafe interactions than for tuning interactions, most are presumed tuning interactions.
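The lingering-message scenario described above (Figure 5B) can be made concrete with a toy kinetic model. In this sketch, transcription has already shut off, so mRNA only decays while protein is still made from the remaining message. All rate constants, the threshold, and the function name are hypothetical values chosen for illustration; they are not taken from the cited studies.

```python
def hours_until_below(threshold, mrna_decay, translation, protein_decay,
                      protein0, mrna0=1.0, dt=0.01, t_max=200.0):
    """Time (h) for protein to fall below a functional threshold.

    Simple Euler integration of:
        d(mRNA)/dt    = -mrna_decay * mRNA
        d(protein)/dt = translation * mRNA - protein_decay * protein
    Returns t_max if the threshold is never crossed within the time limit.
    """
    m, p, t = mrna0, protein0, 0.0
    while p >= threshold and t < t_max:
        dm = -mrna_decay * m
        dp = translation * m - protein_decay * p
        m += dm * dt
        p += dp * dt
        t += dt
    return t


# Both cases start from the same protein level (hypothetical units).
without_mirna = hours_until_below(threshold=1.0, mrna_decay=0.10,
                                  translation=1.0, protein_decay=0.2,
                                  protein0=5.0)
# Modest miRNA effect: somewhat faster mRNA decay and ~30% less translation.
with_mirna = hours_until_below(threshold=1.0, mrna_decay=0.13,
                               translation=0.7, protein_decay=0.2,
                               protein0=5.0)
print(f"without miRNA: {without_mirna:.1f} h; with miRNA: {with_mirna:.1f} h")
```

With these assumed parameters, modest repression shortens the time needed for protein to drop below the threshold, which is the sense in which it sharpens the transition to the off state. The size of the effect depends on the threshold and on the mRNA and protein decay rates.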
The vast number of predicted targets, often with quite disparate functions, presents biologists with the challenge of choosing which are worthy of experimental follow-up. In some cases, known properties of a predicted target will suggest that the biological process of interest might be particularly sensitive to changes in its expression, making it especially promising for follow-up. Another way to choose targets to investigate is to assume that those messages with multiple conserved sites and particularly favorable sites might be among the most responsive to the miRNA. Examination of the predictions illustrates the utility of this approach. Among the predicted targets of C. elegans lin-4 miRNA, the one with the highest number of conserved sites is lin-14 (Figure 2A). Thus, for those interested in investigating the molecular etiology of the lin-4 phenotype, the target-prediction results suggest that lin-14 would be the top candidate for experimental follow-up, which turns out to be right on the mark (Lee et al., 1993; Wightman et al., 1993). For the other two genetically identified worm miRNAs, let-7 and lsy-6, similar correspondence is observed between targets revealed using genetic information and the top genome-wide target predictions (Figure 2A), presumably because such targets are more responsive to the miRNA than the multitude of other conserved targets and thus most easily implicated genetically. With this correspondence in mind, Hmga2, the top predicted target of the let-7 family in mammals (Figure 2A), has been investigated. As anticipated, its expression is highly responsive to changes in let-7 levels, and disrupting this regulation has phenotypic consequences in cell culture (Mayr et al., 2007). Likewise, Myb, the top predicted target of miR-150 (Figure 2A), has been investigated, and the effects of too little or too much miR-150 are largely explained by Myb repression (Xiao et al., 2007).
Once a miRNA:target interaction is chosen for study, how can the function of that specific interaction be assigned with confidence? Monitoring protein changes following miRNA knockout or knockdown is a useful starting point, but with so many targets for each miRNA, the possibility of indirect effects can be difficult to rule out. Thus, an attractive approach is to disrupt only that interaction and observe the phenotypic consequences. An expedient method to disrupt a single miRNA targeting interaction is to use antisense reagents that hybridize to the target site, thereby preventing miRNA pairing. This approach reveals the importance of miR-430 regulation in balancing the expression of Nodal agonist (Squint) and antagonist (Lefty) during zebrafish mesoderm development (Choi et al., 2007). Another approach is to mutate the miRNA sites within a transgene expressing the target mRNA, ideally under the same transcriptional control as the endogenous gene. Transgene experiments performed in flies even before miRNAs were known to regulate the sites can now be interpreted as evidence that miRNA regulation of Enhancer of split (E(Spl)) and Bearded (Brd) is needed for proper development of the peripheral nervous system (Lai and Posakony, 1997; Lai et al., 1998; Lai et al., 2005). In plants this approach reveals the biological importance of specific miRNA:target interactions throughout every stage of development, and because conserved miRNAs of plants fall within gene families whose members have largely redundant functions, this approach provides more information on plant miRNA functions than does disrupting miRNA loci (Jones-Rhoades et al., 2006). The transgene approach has also been fruitful in mammalian systems (Mayr et al., 2007; Teng et al., 2008).
The cleanest way to specifically disrupt a miRNA targeting interaction is to perturb pairing at an endogenous site through homologous recombination. Such an experiment shows that miR-155 repression of activation-induced cytidine deaminase (AID) production in mice helps prevent a potentially oncogenic translocation (Dorsett et al., 2008). Mapping naturally occurring variation can also uncover consequential targeting interactions. A point substitution creating a single miR-1 regulatory site in the sheep Myostatin 3' UTR dramatically impacts musculature (Clop et al., 2006), and a polymorphism that optimizes the context of a pre-existing miR-189 site in the human SLITRK1 3' UTR is associated with Tourette’s syndrome (Abelson et al., 2005).
Although the biological characterization of miRNA-mediated regulatory interactions is in its infancy, the emerging picture is that the phenotypic consequences of the vast majority of conserved interactions would be very challenging to detect in the lab. Indeed, simultaneously disrupting all the interactions of a miRNA by knocking out the miRNA locus often does not have dramatic phenotypic consequences. Of the 95 C. elegans miRNA genes tested, only a few have grossly abnormal phenotypes when individually knocked out (Miska et al., 2007).
Knockouts in flies and vertebrates are likely to have more easily discernable phenotypic consequences than those in worms. Compared to worm miRNAs, far fewer conserved fly miRNAs fall into multigene families, thereby lessening opportunities for redundant functions. Moreover, tissue-specific expression patterns, which can inform more focused phenotypic analyses, are more easily determined in flies and vertebrates because these species are more amenable to dissection and in situ hybridization (Kosman et al., 2004; Wienholds et al., 2005). The lsy-6 miRNA mutant, which scores as wild-type in the assays of Miska et al. (2007) but lacks the ability to discriminate between certain chemosensory inputs, illustrates the utility of a more focused assay (Johnston and Hobert, 2003). Nonetheless, gene-knockout phenotypes in flies and vertebrates still can be subtle, or largely attributable to derepression of only a few conserved targets (Li and Carthew, 2005; Li et al., 2006; Xiao et al., 2007), thereby reinforcing the idea that individually disrupting most conserved targeting interactions would have phenotypic consequences that are difficult to detect in the lab.
At least three factors explain why the phenotypic consequences of disrupting single miRNA:target interactions are expected to be subtle. One is that >90% of the conserved miRNA:target interactions involve only a single site to the miRNA, and therefore most of these targets would be expected to be down-regulated by less than 50%. Because most messages with a conserved site to one miRNA have at least one other conserved site to an unrelated miRNA (Brennecke et al., 2005; Krek et al., 2005; Lewis et al., 2005), interactions with multiple miRNAs might need to be disrupted before the de-repression of that message had perceptible consequences.
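The arithmetic behind this first factor can be sketched under a simplifying assumption that is not stated in this section: that each site reduces protein output by an independent multiplicative factor. The per-site repression values below are hypothetical, chosen only to stay within the "less than 50%" range mentioned above, and the function names are illustrative.

```python
def remaining_output(repressions):
    """Fraction of protein output remaining for a message.

    Each entry is the fraction of output removed by one site. Sites are
    assumed to act independently and multiplicatively (an assumption).
    """
    out = 1.0
    for r in repressions:
        if not 0.0 <= r < 1.0:
            raise ValueError("each repression must be in [0, 1)")
        out *= 1.0 - r
    return out


def fold_derepression(repressions, lost_index):
    """Fold increase in output when the site at lost_index is disrupted."""
    before = remaining_output(repressions)
    kept = [r for i, r in enumerate(repressions) if i != lost_index]
    return remaining_output(kept) / before


# Hypothetical message with one site each for two unrelated miRNAs,
# each site removing 30% of output.
sites = [0.30, 0.30]
one_lost = fold_derepression(sites, 0)     # disrupt a single interaction
both_lost = 1.0 / remaining_output(sites)  # disrupt both interactions
```

With these assumed values, disrupting one interaction raises output about 1.4-fold, whereas disrupting both raises it about 2-fold, which illustrates why more than one interaction might need to be disrupted before derepression becomes perceptible.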
A second factor is that miRNAs have conserved interactions with targets possessing many types of functions, and for the vast majority of these targeted genes protein output can vary by twofold without detectable consequences, as evidenced by the rarity of haplo-insufficient phenotypes. One of the more interesting reasons that such perturbations are so frequently tolerated, even for miRNA targets that are themselves gene-regulatory proteins, is the phenomenon of regulatory network buffering. Many regulatory interactions, including many miRNA:target interactions, presumably fall within complex regulatory networks with bifurcating pathways and feedback control that enable accurate response despite a defective node in the network. With this ability to buffer the effects of losing a node, such networks must be perturbed elsewhere before the lost miRNA interaction has discernable phenotypic consequences. Reciprocally, perturbing the miRNA node would be expected to sensitize the network to reveal the importance of other regulatory nodes. Such experiments are beginning to reveal important miRNA functions that otherwise would be missed (Li and Carthew, 2005).
A third reason for the subtle phenotypes is that lab conditions have been optimized to preserve and propagate mutant lines, whereas conditions that better simulate the stresses and competition that have shaped the evolution of each species would uncover many more instances in which disrupting miRNA regulation of a target has discernable phenotypic consequences. Indeed, to the extent that a miRNA-target interaction has been under selective pressure to be preserved, it must be a biologically meaningful regulatory relationship with phenotypic consequences, albeit potentially mild ones. By this criterion, comparative sequence analysis, which measures the results not of a laboratory experiment but of The Big Experiment (otherwise known as evolution), indicates a vast scope of consequential regulatory targeting. However, because a very subtle fitness disadvantage can prevent alleles with mutant sites from being the ones that become fixed in a population, purifying selection is exquisitely sensitive in retaining consequential interactions. A challenge of the next decade will be to design the laboratory experiments with the sensitivity needed to uncover the functional roles of most of these biological interactions.
When combining the observation that most mammalian messages are under selective pressure to maintain sites to miRNAs together with the observation that each conserved targeting interaction typically imparts only a modest reduction in protein output, it is difficult to escape the conclusion that the precise levels of most individual proteins impact animal fitness. That these levels should be so precise for so many proteins, with the tight tolerances so often retained through evolution, is one of the more fascinating biological conclusions arising from miRNA research of the past few years.
The emerging picture of miRNA regulation in animals is far richer and more complex than the crisp linear pathways of the previous decade, with miRNAs participating in executive decisions but also performing much of the grunt work to micromanage protein output. However, the situation is not so complicated as to make the reductionistic approach futile. Initial experiments to characterize individual regulatory interactions in the lab will still reveal much by focusing on those with the more easily scored consequences. With the help of target predictions, expression data, and biological knowledge, candidates for such interactions can now be found by focusing on those messages that are predicted to be most responsive to the miRNA, coexpressed with the miRNA in relevant cells, and at interesting and vulnerable nodes in regulatory networks. With so many biologists now cognizant of miRNAs and how they recognize their targets, the stage is set for rapid progress in learning the functions of these more accessible miRNA:target interactions.
I thank D. Baek, C. Burge, K. Farh, R. Friedman, A. Grimson, B. Lewis, L. Lim and other colleagues for their input and stimulating discussions, and members of the lab for helpful comments on this manuscript. Work on miRNAs in my lab is supported by grants from the NIH.
David P. Bartel, Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA.