microRNAs are a class of small regulatory RNAs that are involved in post-transcriptional gene silencing. These small (approximately 22 nucleotide) single-stranded RNAs guide a gene silencing complex to an mRNA by complementary base pairing, mostly in the 3' untranslated region (3' UTR). The association of the RNA-induced silencing complex (RISC) with the target mRNA silences the gene either by translational repression or by degradation of the mRNA [1]. Reliable microRNA target prediction is an important and still unsolved computational challenge, hampered both by insufficient knowledge of microRNA biology and by the limited number of experimentally validated targets.
Early studies of target recognition revealed that near-perfect complementarity at the 5' end of the microRNA, the so-called "seed region" at positions 2 to 7, is a primary determinant of target specificity [2]. However, a perfect seed match by itself is a poor predictor for microRNA regulation due to the large number of random occurrences of any given hexamer in 3' UTRs.
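To make the seed-match idea concrete, the following minimal sketch locates perfect matches to positions 2 to 7 of a microRNA in a 3' UTR and estimates how many matches would be expected by chance. The function names are hypothetical, and the chance estimate assumes independent, uniformly distributed bases, which real 3' UTRs do not satisfy; it is an illustration of the argument, not the procedure used by miRanda or mirSVR.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA hexamer (5'->3') complementary to miRNA positions 2-7."""
    seed = mirna.upper().replace("T", "U")[1:7]  # positions 2..7, 1-based
    # The target site is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - 5) if utr[i:i + 6] == site]


def expected_random_matches(utr_length: int) -> float:
    """Expected chance occurrences of one fixed hexamer.

    Assumes independent, uniformly distributed bases (an idealization).
    """
    return max(utr_length - 5, 0) / 4 ** 6


# Example with the let-7a sequence and a toy (made-up) UTR fragment.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
TOY_UTR = "AAAUACCUCAAA"
matches = find_seed_matches(LET7A, TOY_UTR)
```

Under the uniform-base assumption a single hexamer is expected roughly once every 4096 nucleotides, so a 1 kb UTR carries about 0.24 chance matches for each microRNA, which is why a seed match alone identifies many non-functional sites.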
Conversely, a number of studies have shown that some target sites with a mismatch or a G:U wobble in the seed region confer a noticeable regulatory effect [3], and a recent study using a cross-linking and immunoprecipitation (CLIP) method to study in vivo microRNA targets found a significant number of non-canonical sites [6]. Therefore, perfect seed complementarity is neither necessary nor sufficient for microRNA regulation.
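The distinction between perfect seed pairing and pairing with a G:U wobble or mismatch can be sketched as follows. This is an illustrative classifier with hypothetical names and labels; it only counts pair types across positions 2 to 7 and makes no claim about how any published tool scores such sites.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_seed_pairing(mirna: str, site: str) -> tuple[str, int, int]:
    """Classify pairing between miRNA positions 2-7 and a 6-nt mRNA site.

    Both sequences are given 5'->3'. Returns (label, wobbles, mismatches),
    where label is "canonical" for six Watson-Crick pairs and
    "non-canonical" otherwise.
    """
    seed = mirna.upper().replace("T", "U")[1:7]
    site = site.upper().replace("T", "U")
    if len(seed) != 6 or len(site) != 6:
        raise ValueError("need a miRNA of >= 7 nt and a 6-nt site")
    wobbles = mismatches = 0
    # Strands pair antiparallel: miRNA position 2 faces the site's last base.
    for mirna_base, target_base in zip(seed, reversed(site)):
        pair = (mirna_base, target_base)
        if pair in WATSON_CRICK:
            continue
        if pair in WOBBLE:
            wobbles += 1
        else:
            mismatches += 1
    label = "canonical" if wobbles == 0 and mismatches == 0 else "non-canonical"
    return label, wobbles, mismatches


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
perfect = classify_seed_pairing(LET7A, "UACCUC")
with_wobble = classify_seed_pairing(LET7A, "UACCUU")
with_mismatch = classify_seed_pairing(LET7A, "UACCUA")
```

A predictor restricted to the first label would discard the second and third toy sites outright, even though sites of that kind can be functional.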
Most computational methods require sites to have perfect seed complementarity ("canonical" sites) [8], with only a few methods allowing for G:U wobbles or mismatches in the seed region [11] ("non-canonical" sites). Other approaches consider predicted mRNA secondary structure and require energetically favorable hybridization between microRNA and target mRNA [13]. However, most of these target prediction methods generate a large number of predictions, many of which are presumed to be false. To address this problem, virtually all computational methods filter predictions by conservation, which eliminates poorly conserved candidate sites from consideration.
Several studies have used genome-wide mRNA expression changes following microRNA transfection to elucidate microRNA target specificity rules [8]. Grimson et al. defined a four-class hierarchy of canonical seed types of differing efficiencies and identified additional "context" features of target sites that correlate (but only weakly) with reduced expression levels, in particular the AU content flanking the target site. Using univariate regression between feature scores and expression change, they developed a seed-class-dependent scoring system called "context score", which has been incorporated into the TargetScan prediction program. Nielsen et al. assessed the significance of similar features by the shift in the cumulative distribution of log expression ratios using the same four-class seed hierarchy. Recently, proteomics studies of protein expression changes in response to microRNA transfection and knockdown [17] corroborated a number of these specificity features. Importantly, these studies showed that most targets with significantly reduced protein levels also experienced detectable reduction in mRNA levels, indicating that changes in mRNA expression are reasonable indicators for microRNA regulation.
Here we present a new algorithm called mirSVR for scoring and ranking the efficiency of miRanda-predicted microRNA target sites by using supervised learning on mRNA expression changes following microRNA transfections. mirSVR incorporates target site information and contextual features into a single integrated model, without the need to define seed subclasses. We use support vector regression (SVR) to train on a wide range of features, including secondary structure accessibility of the site and conservation.
We first compared mirSVR against a number of existing target prediction algorithms using a large panel of independent microRNA transfection and inhibition experiments as test data. For a fair comparison, we limited consideration to sites with canonical seed pairing in this analysis. mirSVR performs as well as, and often better than, existing methods for the task of predicting the extent of downregulation of genes at the mRNA or protein level. The miRanda-mirSVR approach effectively broadens target prediction beyond the standard notion of seed hierarchy and strict conservation without introducing a large number of spurious predictions. In particular, we found that the mirSVR scoring model correctly identified functional but poorly conserved target sites, and that imposing a conservation filter results in a reduced rate of detection of true targets.
mirSVR downregulation scores are calibrated to correlate linearly with the extent of downregulation and therefore enable accurate scoring of genes with multiple target sites by simple addition of the individual target scores. Furthermore, the scores can be interpreted as an empirical probability of downregulation, which provides a meaningful guide for selecting a score cutoff. By analyzing targets bound to human Argonaute (AGO) proteins as identified by AGO immunoprecipitation [19], we found that the model can correctly identify genes that are regulated by multiple endogenous microRNAs, rather than by transfected microRNAs whose concentrations are above physiological levels. We also revisited the idea of the seed hierarchy, and found that different seed types had wide and overlapping ranges of efficiencies. Finally, we tested the usefulness of including non-canonical sites in the model by evaluating performance on biochemically determined sites from recent photoactivatable-ribonucleoside-enhanced CLIP (PAR-CLIP) experiments. In this data set approximately 7% of the detected sites do not contain a perfect seed match to the expressed microRNAs [7]. We found that miRanda-mirSVR indeed correctly identified a significant number of these experimentally verified non-canonical sites. miRanda target sites and mirSVR scores are available at http://www.microRNA.org
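
The additive gene-level scoring described above can be sketched in a few lines. The gene names and score values below are invented for illustration, and the sketch assumes that more negative site scores indicate stronger predicted downregulation; that sign convention and the cutoff value are assumptions of this example, not values taken from the text.

```python
from collections import defaultdict


def gene_scores(site_scores):
    """Sum per-site scores into one score per gene.

    site_scores: iterable of (gene, score) pairs, one per predicted site.
    """
    totals = defaultdict(float)
    for gene, score in site_scores:
        totals[gene] += score
    return dict(totals)


def rank_genes(site_scores, cutoff):
    """Return (gene, total) pairs with total <= cutoff, strongest first.

    Assumes more negative totals mean stronger predicted downregulation.
    """
    totals = gene_scores(site_scores)
    hits = [(gene, total) for gene, total in totals.items() if total <= cutoff]
    return sorted(hits, key=lambda item: item[1])


# Hypothetical site-level scores: GENE_A has two sites, GENE_B has one.
SITES = [("GENE_A", -0.5), ("GENE_A", -0.25), ("GENE_B", -0.125)]
totals = gene_scores(SITES)
ranked = rank_genes(SITES, cutoff=-0.5)
```

In this toy input neither site of GENE_A needs to be exceptional on its own for the gene to rank first, which is the practical consequence of summing linearly calibrated site scores.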