|Home | About | Journals | Submit | Contact Us | Français|
MicroRNAs are emerging as important regulators of diverse biological processes and pathologies in animals and plants. While hundreds of human miRNAs are known, only a few have known functions. Here we predict human microRNA functions by using a new method that systematically assesses the statistical enrichment of several miRNA targeting signatures in annotated gene sets such as signaling networks and protein complexes. Some of our top predictions are supported by published experiments, yet many are entirely new or provide mechanistic insights to known phenotypes. Our results indicate that coordinate miRNA targeting of closely connected genes is prevalent across pathways. We use the same method to infer which microRNAs regulate similar targets and provide the first genome-wide evidence of pervasive co-targeting, where a handful of “hub” microRNAs are involved in a majority of co-targeting relationships. Our method and analyses pave the way to systematic discovery of microRNA functions.
MicroRNAs (miRNAs) regulate diverse biological processes in animals and plants (Bushati and Cohen, 2007) and are among the most abundant regulatory factors in the human genome, comprising 3–5% of known human genes (Griffiths-Jones et al., 2008). miRNAs recognize target mRNAs by imperfect base pairing to sites in the 3′ untranslated region (3′ UTR), usually with perfect pairing of the miRNA seed region (nucleotides 2–8), ultimately leading to translational repression and/or mRNA degradation (Bushati and Cohen, 2007). Thousands of human genes are predicted to be targeted by miRNAs (Rajewsky, 2006), suggesting that miRNAs play a pervasive role in the regulation of gene expression.
Although hundreds of human miRNAs have been identified and new ones are continually being discovered (Griffiths-Jones et al., 2008), the function of most miRNAs remains unknown. Increasingly, miRNA expression changes are being linked to phenotypes, but the mechanistic role of the miRNA in the underlying biological network is often unclear. Given that many human miRNAs can target up to thousands of genes, how often do miRNAs target a set of related genes to regulate a specific pathway or process? While recent studies show that a few miRNAs have pathway-specific functions (Xiao and Rajewsky, 2009), earlier work suggests that miRNAs primarily serve to fine-tune and confer robustness upon the expression of many genes (Bartel and Chen, 2004; Farh et al., 2005; Stark et al., 2005).
The prevalence of multiple miRNAs targeting the same gene (“co-targeting”) is also unclear. While many genes contain putative binding sites for multiple miRNAs (Krek et al., 2005; Stark et al., 2005), many putative sites may not be functional in vivo. More specifically, the combinations of miRNAs that function together by regulating common targets are unknown. Knowledge of such co-targeting relationships would also enable one to infer a miRNA’s function from the function of its co-targeting miRNAs.
Typically miRNA function is predicted by assessing whether the predicted targets of a given miRNA are enriched for particular functional annotations. Such an approach has several limitations: (1) target prediction is imperfect and can lead to spurious targets (Rajewsky, 2006); (2) having a subset of one’s favorite pathway genes in the putative target set does not necessarily mean that the miRNA functions in the pathway; (3) predicted target sets are often so large (hundreds to thousands of genes) and have such heterogeneous functional annotations that standard algorithms are not sufficiently sensitive to make high-confidence predictions. Rather than progressing from a miRNA to a potentially spurious target set that may or may not have enriched function, here we introduce a computational method called mirBridge, which starts with a gene set of known function, then assesses whether functional sites for a given miRNA are enriched in the gene set compared to random gene sets with similar properties.
We apply mirBridge to a variety of annotated gene sets for signaling pathways, diseases, drug treatments, and protein complexes. We also use mirBridge to infer miRNA pairs that tend to function together by regulating common targets and use the results to assemble a miRNA-miRNA co-targeting network. Together, our analyses provide: (1) hundreds of miRNA function predictions, many of which are supported by published experiments; (2) genome-wide evidence that many miRNAs coordinately regulate multiple components of pathways or protein complexes; and (3) evidence that miRNA co-targeting is prevalent, with a small number of “hub” miRNA families involved in a large fraction of the co-targeting interactions. Both the mirBridge method and the predictions it has generated can serve as important resources for the future experimental dissection of miRNA functions.
Many gene sets contain tens to hundreds of putative targets for any particular miRNA. However, for a variety of reasons (e.g. mRNA secondary structure occludes binding, or the miRNA and the target are not expressed together) many target sites are not functional in vivo. The goal of mirBridge is to infer whether an unusually large proportion and number of putative target sites for a miRNA (m) in a given gene set (G) are likely to be functional in vivo. Towards this end, mirBridge computes a score by combining the results of three statistical tests that evaluate different aspects of likely-functional target-site enrichment in G. It is essential that the enrichment of sites in G be compared to enrichment in appropriate control gene sets. Below we describe the individual tests and the method for constructing the control gene sets (see Supplemental Experimental Procedures for details).
The following definitions are essential to the methodology of mirBridge. First, any gene with one or more seed-matched site for m in its 3′ UTR is deemed a “putative target.” Second, seed-matched sites can be classified into two categories (Figure 1A): “conserved sites” (CS) are sites that are conserved across mammalian genomes; “high-context scoring sites” (HCS) are sites with a context score above a predefined threshold. The context score reflects the likelihood of a seed-matched site to confer repression based on several features, including the distance of the site from the stop codon, accessibility of the site based on secondary structure, and the extent of base pairing beyond the seed (Grimson et al., 2007).
The first test used by mirBridge, called “conservation enrichment signature” (CE), infers whether the number of CS in G is significantly higher than that of random gene sets containing the same number of putative targets as G. This test is similar to evaluating whether the sites have evolved at a slower rate compared to random putative target sets, but is fundamentally different from prior tests that utilize sequence conservation (Lewis et al., 2005; Stark et al., 2005) (see Supplemental Experimental Procedures). The second test, called “context-score signature” (CTX), evaluates whether the number of HCS is significantly higher than that of random gene sets containing the same number of putative targets as G. The CTX test is designed to detect enrichment of sites in G that are likely functional but not necessarily conserved. The third test, called “site occurrence signature” (OC), evaluates whether the number of putative target sites in G is unusually high compared to random gene sets containing the same number of genes. While target site abundance alone is not necessarily indicative of functional targeting by m, functional targeting enrichment becomes a likely scenario even when G tests as moderately significant for the CE and/or CTX tests. Note that both CE and CTX are based on comparison with random gene sets with the same number of putative targets to detect enrichment in the proportion rather than the number of CS or HCS. This ensures that the comparisons are valid, as gene sets with more putative targets tend to have more CS or HCS. Since true positives are more likely than false positives to test as simultaneously significant across the tests, we combine the three tests and form a composite score (“OC-CE-CTX”) to increase sensitivity without sacrificing specificity.
We developed a nearest-neighbor gene sampling algorithm, motivated by the principle of kernel-based density estimators (Wegman, 1972), to generate random gene sets that are similar to the input gene set with respect to general conservation level, 3′ UTR length, and GC content, which primarily bias the CE, OC, and CTX tests, respectively. Simultaneous adjustment is particularly important because these factors are correlated with each other across genes. Specifically, for the OC test, comparable random gene sets are generated by replacing each member of G with a randomly drawn gene that has similar GC content, 3′ UTR length, and general conservation level (Figure 1B). To ensure that the number of putative targets in the random gene sets is the same as that in G for the CE and CTX tests, the same nearest-neighbor procedure is used but only putative targets in G are replaced by random putative targets (Figure 1C).
Finally, to obtain the OC-CE-CTX p value, the p values of the individual tests are combined using a customized version of the inverse-normal method that corrects for dependencies among tests (Joachim, 1999). When multiple gene sets and/or miRNAs are tested simultaneously, multiple hypothesis testing is corrected by computing the false discovery rate (FDR) using the q-value method (Storey and Tibshirani, 2003). “FDR” and “q value” are used interchangeably below.
Besides 3′ UTR length, GC content, and general conservation, other less apparent factors could bias mirBridge results, but their effects are likely small (see Supplemental Experimental Procedures). The statistical model in mirBridge was also designed to incorporate additional factors if needed – in principle any number of factors can be accounted for by our nearest-neighbor sampling procedure.
mirBridge is fundamentally different from testing whether the number of predicted miRNA targets in a gene set is significantly higher than expected using the Fisher’s Exact Test (FET), a standard way to assess the significance of gene set overlaps. First, mirBridge takes gene set properties into account; second, it combines different and important biological characteristics of target sites; and finally, it uses metrics (CE and CTX) that focus on the proportion of likely-functional target sites instead of the number of predicted target overlaps. In fact, mirBridge has superior sensitivity and specificity compared to FET as shown in the applications below.
To link human miRNA families (miRNAs with a shared seed sequence) to functions, we applied mirBridge to gene sets from (1) canonical signaling pathways from MSigDB (Subramanian et al., 2005); (2) KEGG (Kanehisa and Goto, 2000); (3) human protein complexes from the CORUM database (Ruepp et al., 2008); (4) gene co-expression modules (Segal et al., 2004); (5) Gene Ontology (GO) Biological Process; (6) GO Component; and (7) GO Function (Ashburner et al., 2000). At a FDR cutoff of 0.2, mirBridge predicts 185, 128, 1198, 456, 432, 71, and 175 distinct miRNA-function associations, respectively (Tables S1a–g). Most predictions implicate pathways or protein complexes with multiple putative targets for the miRNA, whereas some have only one (or very few) putative targets containing multiple high-quality sites (e.g. miR-33 and statin pathway). The latter fits the paradigm implied in some recent papers where a miRNA phenotype seems to be accounted for by one (or just a few) targets: “miR-X regulates process Y by targeting gene Z.” However, the prevalence of coordinate targeting of multiple related genes suggests that most miRNAs exert their phenotypic effects by targeting multiple network components.
To facilitate a succinct discussion of such a large set of predictions, Table 1 shows a selection of predictions that either already have support from the literature or wherein the predicted pathway (1) has known activity in the tissue where the miRNA is known to be expressed; or (2) represents core cellular processes (e.g. “apoptosis”) and has a large number of putative targets for the miRNA. We also favor predictions that reoccur in closely related or synonymous gene sets, e.g. “cell cycle” and “G1 to S transition.”
Although mirBridge is not trained on any dataset of known miRNA functions, several of the top hits already have experimental support in the literature (Table 1A), such as the association of miR-16 with the cell cycle, Wnt signaling and prostate cancer (Calin et al., 2005; Linsley et al., 2007) (Figure S1A). This is also an example where mirBridge links a disease and the pathways underlying its pathology: miR-16 has been shown to work through the Wnt pathway to function as a tumor suppressor in prostate cancer (Bonci et al., 2008). Analogously, miR-7 hits the ErbB pathway in glioblastoma (Kefas et al., 2008; Webster et al., 2009); miR-221/222 hits the estrogen signaling pathway in breast cancer (Miller et al., 2008; Zhao et al., 2008); and let-7 hits the G1-S cell-cycle pathway in breast cancer (Schultz et al., 2008; Yu et al., 2007). mirBridge can also implicate a pathway of interest given the tissue specificity of a miRNA: miR-7 is predicted to regulate the insulin receptor pathway and is known to be highly expressed in insulin-producing cells of pancreatic islets (Bravo-Egana et al., 2008; Correa-Medina et al., 2009; Joglekar et al., 2009). mirBridge also independently uncovered feedback loops: miR-146 is predicted to target several upstream signaling genes in the NF-kB pathway while its transcription is known to be activated by NF-kB (Taganov et al., 2006) (Figure S1B). Another notable prediction supported by the literature is miR-34 targeting BCL2 and several additional anti-apoptotic genes in the BAD pathway (Chang et al., 2007; Cloonan et al., 2008; He et al., 2007). This prediction provides an attractive hypothesis for how miR-34 upregulation could lead to apoptosis. In sum, these results are reassuring and indicate that mirBridge can capture biologically relevant signals.
mirBridge is significantly more sensitive than the standard approach of evaluating gene set overlaps using FET. For instance, when FET is applied to the canonical pathway gene sets, only five predictions can be made at the 0.2 FDR cutoff (Table S1h); all five have FDRs above 0.18 and only one has support from the literature (miR-16 and the Gleevec pathway given that miR-16 is associated with leukemia). Furthermore, none of the top mirBridge predictions supported by published experiments were uncovered. For example, for miR-16, none of the cell-cycle related pathways are ranked near the top even if we ignore the statistical significance and order the pathways within each miRNA family by their q values (the top cell-cycle related entry has rank 54, q=0.55). These results suggest that mirBridge can better uncover biologically relevant signals than FET.
It is important to note that the comprehensiveness of our predictions is dependent on the gene sets used. Some known miRNA functions are not in our predicted list because the appropriate gene set(s) were not included in the analysis. For example, miR-200 is known to function in the epithelial-mesenchymal transition (Burk et al., 2008; Gregory et al., 2008a; Korpal et al., 2008a; Park et al., 2008), but none of the gene sets used in our analysis captures this process. However, when mirBridge is applied to genes whose function annotation in the GeneCards database includes “epithelial-mesenchymal transition,” miR-141/200a has the lowest q value among all miRNAs (q=0.08).
To further assess the ability of mirBridge to predict known miRNA functions independently, we compiled eight additional miRNA phenotypes from the literature and applied mirBridge to seemingly relevant gene sets from KEGG or GeneCards (Table S2). Of nine phenotypes, four miRNA-gene set p values are significant and two are marginally significant (Table 2). In a multiple hypothesis testing context where all miRNAs are tested simultaneously for the phenotype gene set, however, only two would have been predicted at a FDR cutoff of 0.2 even though the desired miRNA ranks at or near the top for all four of the significant cases. This suggests that for these specific gene sets, mirBridge is sensitive to the relevant biological signals but lacks sufficient statistical power after multiple-testing correction. It follows that the hundreds of low-FDR predictions that are made by mirBridge are compelling candidates for experimental follow-up given that these emerged in the simultaneous testing of thousands of miRNA-gene set combinations. We expect the statistical power of mirBridge to continue to improve as additional genomes and knowledge of miRNA-target interactions become available.
We also sought to understand cases where mirBridge failed to predict the correct functions. Closer examination of the three failed cases in Table 2 suggests that for let-7 and miR-133 the gene sets used do not capture the biology relevant to the miRNA targeting. The cell cycle may be a key pathway through which let-7 exerts its effect on lung cancer (Esquela-Kerscher et al., 2008; Kumar et al., 2008; Schultz et al., 2008), but the non-small cell lung cancer gene set lacks most cell cycle genes and other postulated targets such as HMGA2 and MYC (let-7 does hit the G1-S cell-cycle transition pathway; Table 1A). Similarly for miR-133 and cardiac hypertrophy: two out of the three known targets relevant to the phenotype are not in the GeneCards set (CDC42 and WHSC2; (Care et al., 2007)). Finally, for miR-122, it turns out that inhibition of miR-122 by antagomir treatment tends to downregulate, rather than upregulate, cholesterol biosynthetic genes (Krutzfeldt et al., 2005), suggesting that the effect of miR-122 on cholesterol biosynthetic genes is indirect. Thus the insignificant mirBridge p value for miR-122 and cholesterol biosynthesis genes is not surprising.
The majority of mirBridge predictions are not yet directly supported by existing experiments (Tables 1B and S1–4). Some pathways predicted in common for multiple miRNAs seem particularly compelling because the miRNAs are known to be co-regulated. For example, the apoptosis pathway is predicted for miR-23 and -24, which are different in sequence but are co-expressed from the same cluster (Chhabra et al., 2009). Some predictions seem reasonable based on the function of the miRNA host gene. For example, the statin/cholesterol homeostasis pathway is linked to miR-33, which is embedded in an intron of a transcription factor that regulates cholesterol synthesis and uptake (Figure S1C). Other predictions seem plausible based on known miRNA functions with similar developmental placement and timing. For example, axon guidance pathways are predicted for miR-124, which has already been shown to positively regulate neurogenesis (Cheng et al., 2009; Visvanathan et al., 2007). Consistently, miR-124 was linked to the SNARE protein complex as it putatively targets VAMP3, a component of SNARE, via three conserved and high context-scoring sites; VAMP3 is known to function in the docking and fusion of synaptic vesicles with the presynaptic membrane (Sudhof, 2004).
mirBridge predictions can also provide mechanistic interpretations of published experiments. For example, it is known that activation of PIP3 signaling leads to the hypertrophic response in cardiac myocytes and that miR-1 expression is down-regulated upon hypertrophic stress (Care et al., 2007; Heineke and Molkentin, 2006; Sayed et al., 2007). mirBridge links miR-1 to the PIP3 pathway, and the putative miR-1 targets in the pathway are all pro-hypertrophic except PTPN1 (Table 1A), suggesting that the down-regulation of miR-1 helps to drive pathway activation (Figure 2). Post-transcriptional repression by miR-1 could allow these genes to be transcribed at higher (or leaky) levels without triggering a hypertrophic response, such that a reduction in miR-1 expression would suffice to rapidly activate signaling at multiple levels. For example, de-repression of the most downstream factors (e.g. CDC42) could quickly lead to sarcomere remodeling, a first step in the hypertrophic response (Nagai et al., 2003). Increasing levels of upstream factors coupled with positive feedback loops would intensify the response.
We envision a useful application of mirBridge would be to probe function of interest guided by the expression profile of likely relevant miRNAs. Since we are interested in neurotransmitter and axon guidance pathways, we applied mirBridge to manually curated gene sets for these pathways and examined miRNAs known to be expressed in the brain (see Supplemental Experimental Procedures). miR-218 is a top hit for both GABA and glutamate gene sets (q=0.025 and 0.033, respectively). That these two neurotransmitter activities may be regulated by the same miRNA is intriguing given that glutamate and GABA are, respectively, the major excitatory and inhibitory neurotransmitters, and that the latter can be enzymatically converted from the former. In addition, encouraged by the recent finding that miR-218 is enriched at synapses of hippocampal neurons, we tested a gene set for the presynaptic cytomatrix protein complex (Siegel et al., 2009). miR-218 is the top hit with a q value of 0.0008. In sum, the mirBridge hits for these gene sets extend early experimental findings to implicate miR-218 as potential regulators of neuronal activity at hippocampal synapses.
Our miRNA-pathway map indicates that some miRNAs function in the same pathway(s) by targeting a similar set of genes. Indeed, many miRNAs may function together (via “co-targeting”) to regulate target-gene expression. To assess the prevalence of co-targeting and infer which miRNAs are co-targeting partners, we next used sets of genes likely regulated by particular miRNAs to create a miRNA-to-miRNA mapping. Specifically, our inputs to mirBridge were the predicted target sets (PTS) of 73 deeply conserved human miRNA families. We call a miRNA family Y a “co-targeting partner” of a miRNA family X if at least one of Y’s seed-matched sequences has a significant mirBridge q value in the PTS of X and denote the relationship as “X→Y.” We predicted co-targeting relationships for all ordered pairs of the 73 families (73 × 72 = 5256 distinct pairs).
Our results indicate that miRNA co-targeting is prevalent: 221 distinct X→Y co-targeting relationships are inferred at a FDR cutoff of 0.2 (Table S3a). A subset of these predictions correspond to miRNA genomic clusters (Yu et al., 2006), such as the miR-19b-2/106a cluster on Xq26.2 and the miR-17-18-19a-20-92 cluster on 13q31.3 (Table S3b). Co-targeting pairs in close genomic proximity are not surprising: these miRNAs are polycistronic and co-expressed, and are thus likely to function together to regulate common targets. In fact, clustered miRNAs are enriched for co-targeting relationships: when X and Y are members of a genomic cluster, they are predicted as co-targeting partners 25% of the time compared to 3% when X and Y are not clustered. Consequently, the median q-value of clustered pairs is significantly lower than that of unclustered ones (p < 2.1 × 10−7, Mann-Whitney Test; see Table S3c for the clusters used in this analysis), indicating that our method for detecting co-targeting is sensitive, specific, and capable of uncovering biologically relevant signals.
If our predictions reflect bona fide biological signals, we also expect a significant percentage of the X→Y pairs to possess mutual co-targeting relationships, i.e. each miRNA’s putative binding sites would have a score below the FDR cutoff in the other miRNA’s PTS. Indeed, 96/221 (43%) of the X→Y predicted pairs do. While the remaining 57% of the X→Y pairs do not have the corresponding Y→X pairs falling below the FDR cutoff, there is nonetheless a significant correlation between their q values (Spearman correlation = 0.42 (p=0); Figure S2). Also, the reciprocal (Y→X) q values of significant X→Y pairs are lower than those of pairs with q values greater than 0.2 (p < 5 × 10−140 Mann-Whitney Test). The general reciprocation of co-targeting scores indicates that a significant percentage of our predictions are specific and that the signals we are detecting are likely biologically relevant.
We also tested whether co-targeting relationships could be inferred from gene set overlaps, where the X→Y q value was computed using FET on the number of genes shared between the PTSs of the miRNA family pair. This analysis failed to provide informative results because almost all tested pairs have a significant q value: 2264 (86%) and 2628 (100%) of the pairs have a q value of less than 0.05 by using the Bonferroni and FDR correction, respectively. This suggests that a core set of genes are frequently predicted as targets for many miRNA family pairs; these likely correspond to genes with highly conserved 3′ UTRs and/or low GC content, properties that favor a gene being predicted as a target using Targetscan. This result strongly suggests that the degree of PTS overlap is not sufficiently specific to detect authentic co-targeting relationships, whereas mirBridge has superior specificity and thus able to provide biologically relevant signals as shown above.
Our co-targeting predictions can naturally be organized as a network where the nodes are miRNA families and the directed edges between nodes denote the X→Y predictions. A network representation enables examination of connectivity patterns beyond pairwise interactions. We first checked whether the edges in the network are evenly distributed across nodes or concentrated around a few nodes (“hubs”). Strikingly, the edges connecting the 10 most connected nodes (out of 69 nodes with at least one adjacent edge) account for more than 55% (123/221) of the edges in the network (Figure 3A and Table S3d). While overall the size of a miRNA family’s PTS is correlated to its connectivity ranking (p = 10−6 Spearman correlation), this correlation becomes insignificant when restricted to families with at least 900 predicted targets (p > 0.1). Since only six of the top 40 most-connected families have fewer than 900 predicted targets, the size of a miRNA family’s PTS alone cannot explain the connectivity pattern among the top 40 families. The hub miRNA families probably have functions in diverse contexts. For example, some hubs have a large number of members and therefore are likely to have more diverse functions depending on the spatial-temporal expression of individual miRNAs (e.g. miR-93.hd/291-3p/294/295/302/372/373/520).
We reasoned that groups of tightly inter-connected nodes might represent miRNAs that perform similar functions. To identify such groups we used a graph clustering tool that ignores edge weights to identify tightly interconnected nodes (Bader and Hogue, 2003) (Figure 3B). We find that subnetwork 1 has four families and is the largest and most highly interconnected; three of the families (miR-17-5p, -130, -93.hd) are among the most connected families (Figure 3A). This subnetwork is also well-connected to subnetwork 3 (miR-18, -19, -181), probably because miR-17-18-19-20 are co-expressed from a polycistronic transcript. The miR-17 cluster is known to be overexpressed in a number of human cancers, including B-cell tumors, while miR-142 is also highly expressed in B cells (Chen and Lodish, 2005; Mendell, 2008). Their shared PTS is enriched for genes in developmental processes (p < 3.8 × 10−5), consistent with the miR-17 cluster’s function in the development of B cells, the heart and lungs (Mendell, 2008; Ventura et al., 2008). Our linking of the miR-142 and miR-130/301 families—whose functions are largely unknown—to the miR-17 cluster suggests that these miRNA families also participate in similar developmental and oncogenic processes.
We have introduced a systematic method for inferring miRNA functions by assessing the enrichment of likely functional target sites in gene sets. Key features of mirBridge include combining test metrics that detect different aspects of functional targeting, and a sampling algorithm for removing gene-set biases to improve estimation of statistical significance. Hundreds of human miRNA-function associations were inferred by mirBridge; some are reassuringly supported by published experiments, but many are not previously known and/or provide mechanistic insights beyond published data.
Our results provide hints about the general principles of miRNA-mediated regulation in networks. While some miRNAs could act as global regulators by repressing up to thousands of targets genome-wide (Lewis et al., 2005), many appear to have pathway-specific functions, and these miRNAs tend to target multiple genes in the same pathway. Typically, the predicted targets of the miRNA are genes that drive pathway activity in a coherent direction (e.g. miR-16 targeting of G1-to-S-promoting genes). Such coordinate targeting could partially explain how individual miRNAs can be potent effectors of pathway activity even though the amount of repression conferred by miRNAs tends to be modest for any single target (Baek et al., 2008; Selbach et al., 2008; Xiao and Rajewsky, 2009). As was observed earlier (Martinez et al., 2008; Tsang et al., 2007), some of our predictions (e.g. miR-1) involve miRNAs mediating feedback and feedforward loops, whose functions include protein homeostasis and signal amplification, respectively. For example, miRNAs could be “master” regulators of pathways and thus serve as effective therapeutic targets because positive feedbacks could amplify small changes in protein concentration conferred by miRNA targeting of multiple genes. Our analysis also indicates that miRNAs can function in, and mediate cross-talk among, multiple canonical pathways, such as miR-16’s potential roles across the cell cycle and Wnt pathways to coordinately regulate cellular growth and proliferation.
mirBridge also facilitates context-specific target prediction: one can first predict which pathways a miRNA regulates and then compile high-quality putative targets within a pathway. This strategy may be especially effective for miRNAs that function in only a few pathways, as targets predicted genome-wide may have low specificity (Lewis et al., 2005). Additional filtering can be used to strengthen the target predictions, for example, by requiring that the putative target and the miRNA be significantly correlated in their expression using miRNA-mRNA expression data sets (Lu et al., 2005) (Table S1i).
In addition to providing functional links across miRNAs, our human miRNA-miRNA map provides, to the best of our knowledge, the first genome-wide evidence that miRNA co-targeting is prevalent, and that a handful of hub miRNA families are involved in a large fraction of the co-targeting connections. The abundance of co-targeting further suggests that while individual miRNAs may provide only modest levels of repression, combinatorial targeting by multiple miRNAs (Krek et al., 2005) can potentially achieve a wide range of target-level modulations. Given that multiple miRNAs are expressed at different levels in any given cell type, individual genes can evolve combinations of miRNA binding sites to optimize expression levels across cell types (Bartel and Chen, 2004). miRNA target sites are short and could thus be acquired or lost relatively quickly over evolution to fine-tune gene expression levels.
Designating a group of miRNAs as “co-targeting” does not necessarily imply that these miRNAs are co-expressed so as to regulate their common targets at the same time and place. In fact, the exact opposite is also likely: different miRNAs are responsible for controlling a given set of targets in different contexts. In general, a combination of the above scenarios is likely for individual cases, and additional data (e.g. miRNA and target expression profiles) are needed to further dissect the mechanistic basis of individual co-targeting predictions.
mirBridge is currently limited to assessing enrichment at the level of miRNA families using seed-matched motifs. But this is largely due to our lack of general understanding of miRNA-target interaction beyond seed pairing and features captured by the context score. In principle, the mirBridge methodology is general and can be applied to any combinations of gene sets, sequence motifs and site scoring metrics, including non-miRNA motifs, such as those involved in regulating mRNA stability. Given mirBridge’s ability to simultaneously correct for multiple gene-set biases, and the increasing availability of genomes and annotated gene sets, mirBridge is poised to serve as a key resource for the comprehensive functional dissection of miRNAs and other regulatory sequence motifs in genomes.
miRNA family memberships, 3′ UTR sequences, seed-matched sites and their context scores and conservation status were downloaded from TargetScan (http://www.targetscan.org/vert_40/). For each known human gene, the number of seed-matched sites for each miRNA family, the number of those that are conserved, and the context score were computed. Since the context score depends on the full miRNA sequence, the context score for a miRNA family is defined as the average of all human members of that family.
The method as described in the text was implemented in Matlab. More details and related discussions can be found in Supplementary Experimental Procedures.
Canonical signaling pathway and KEGG gene sets were downloaded from http://www.broad.mit.edu/gsea/msigdb/index.jsp. The cancer, CORUM, and GO sets were downloaded from http://robotics.stanford.edu/~erans/cancer/, http://mips.helmholtz-muenchen.de/genre/proj/corum, and NCBI Gene, respectively. To reduce noise and avoid spurious annotations, we only used GO annotations with experimental and peer-reviewed evidence. A miRNA-gene set prediction requires at least one of the miRNA seed motifs (m2-8 and/or m7-A) to test as significant in the gene set. The q value reported for individual miRNAs corresponds to the q value of the seed motif with the smaller p value.
The deeply conserved miRNAs are ones that are conserved across human, mouse, rat, dog and chicken. We focused on these miRNAs because they probably have (1) more conserved functions, (2) a larger number of targets compared to less-conserved miRNAs, and (3) stronger conservation enrichment signals.
Targets were compiled for each miRNA by including genes with at least one conserved seed-match (across human, mouse, rat and dog) or a seed-match with a context score of greater than 68 in the 3′ UTR (see Supplemental Experimental Procedures). Predictions based on context score alone were included because functional target sites can be imperfectly conserved. High-quality putative targets in gene sets (Table 1 and S1) were compiled using the same definition.
mirBridge was applied to the predicted target set of each miRNA family. Only the seed-matched motifs of the 73 families were scored. When both seed-matched motifs of a miRNA family are tested significant, the smaller q value is used as the X→Y q value. Human miRNA clusters were obtained from (Yu et al., 2006).
The number of overlaps between the predicted target set of each miRNA-family pair was computed. The statistical significance was computed using Fisher’s Exact Test (see Supplemental Experimental Procedures).
Similar to above except that (1) all genes that are not predicted as a target for any miRNA were removed from the pathway gene sets; and (2) the population size is taken as the number of genes that are predicted as a target for at least one miRNA family and belong to at least one pathway.
We thank H. Fraser, D. Muzzey, M. Narayanan and M. Umbarger for comments on the manuscript; J. Zhu for discussions; D. Bartel for the suggestion to examine co-targeting by polycistronic miRNAs; M. Fang for help on importing gene sets. This work was supported by grants from the NSF (PHY-0548484), NIH (R01-GM068957), and by a NIH Director’s Pioneer Award to A.v.O. (1DP1OD003936); J.S.T. was partially supported by a doctoral scholarship from the NSERC of Canada; M.S.E. was supported by a HHMI Predoctoral Scholarship, a Paul and Cleo Schimmel Scholarship, and a grant from the NIH (RO1-CA133404) to Phillip Sharp.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.