Computational identification of putative microRNA (miRNA) targets is an important step towards elucidating miRNA functions. Several miRNA target-prediction algorithms have been developed, followed by publicly available databases of their predictions. Here we present a new database offering miRNA target predictions of several binding types, identified by our recently developed modular algorithm RepTar. RepTar is based on identification of repetitive elements in 3′-UTRs and is independent of both evolutionary conservation and conventional binding patterns (i.e. Watson–Crick pairing of ‘seed’ regions). The modularity of RepTar enables the prediction of targets with conventional seed sites as well as rarer targets with non-conventional sites, such as sites with seed wobbles (G-U pairing in the seed region), 3′-compensatory sites and the newly discovered centered sites. Furthermore, RepTar’s independence of conservation enables the prediction of cellular targets of the less evolutionarily conserved viral miRNAs. Thus, the RepTar database contains genome-wide target predictions for human and mouse miRNAs as well as predictions of cellular targets of human and mouse viral miRNAs. These predictions are presented in a user-friendly database, which allows browsing through the putative sites as well as conducting simple and advanced queries including data intersections of various types. The RepTar database is available at http://reptar.ekmd.huji.ac.il.
Since microRNAs (miRNAs) emerged as key regulators of gene expression, many experimental and computational efforts have been made to identify their targets. Over the years, numerous target-prediction algorithms have been developed, offering a plethora of predicted miRNA target genes [for review, see (1)]. Most of these algorithms used two main features observed in early experimentally discovered targets: the complementarity of the target gene’s 3′-UTR to the miRNA ‘seed’ region (~7 nt in the 5′-region of the miRNA) [reviewed in (1)], and the evolutionary conservation of the binding sites in the 3′-UTR. However, since these earlier discoveries, several functional miRNA targets were shown to lack seed pairing and to compensate for this by extensive binding at the 3′-region of the miRNA (termed ‘3′-compensatory’ sites) (2,3), and many genes lacking evolutionary conservation in their 3′-UTR-binding sites were found to be targeted by miRNAs as well. Recently, the miRNA binding options were expanded further with the identification of ‘centered sites’, functional miRNA target sites that lack both perfect seed pairing and 3′-compensatory pairing and instead exhibit pairing with the target along 11–12 contiguous pairs at the center of the miRNA (4). While some algorithms have relaxed the evolutionary conservation criterion (5–11) and/or also offer predictions of 3′-compensatory sites [e.g. (6,12,13)], few databases offer predictions of the whole repertoire of miRNA targeting patterns. Furthermore, to date, no database lists genome-wide predictions of cellular targets of viral miRNAs. These miRNAs lack significant evolutionary conservation and their targets are not necessarily expected to be evolutionarily conserved. In addition, the few identified viral miRNA targets have shown both conventional seed binding and 3′-compensatory binding [e.g. (3,14)].
Here we present a database of genome-wide miRNA target predictions for mouse and human genes, based on the predictions of our novel target prediction algorithm, RepTar (described hereinafter). The RepTar database adds to the currently available databases in three major aspects. First, it offers a wide repertoire of binding-site variants, including conserved and non-conserved seed sites, wobble seed sites, 3′-compensatory sites, full match sites and the recently discovered centered sites (Figure 1). Second, in addition to predictions of human and mouse miRNA targets, it offers for the first time genome-wide predictions of cellular targets of human and mouse viral miRNAs. Third, it offers users both simple and advanced query options to mine the data and obtain predictions of interest.
The RepTar database is based on our recently developed RepTar algorithm (Figure 2), which is described in detail in the Supplementary data. In brief, based on the finding that a miRNA may have multiple binding sites in the 3′-UTR of its target (15–17), RepTar characterizes repetitive sequences in the 3′-UTRs of genes in order to identify conventional and non-conventional miRNA-binding sites. It first searches for statistically significant repetitive motifs in each 3′-UTR. These motifs consist of 7 nt, allow up to two mismatched positions, and occur at least three times in each 3′-UTR. Next, the instances of these repeating motifs are represented by a profile hidden Markov model (HMM). These HMMs are used to identify miRNA sequences that can base pair with the repeating motifs they represent. To identify such miRNAs, we reverse complement the HMMs so that they represent the reverse-complementary sequence of the repeat and then use these complementary HMMs to search for matching miRNA sequences. When reverse complementing the HMMs, we also allow for G–U pairing. The nature of the HMMs allows for the identification of miRNAs that match them perfectly as well as imperfectly, as it allows insertions and deletions in the alignment. The initial search does not impose any restriction on the location of the match within the miRNA (whether it is in the 5′-end or 3′-end). For HMMs that match a miRNA, the binding pattern and the thermodynamic stability of the miRNA–mRNA duplex are evaluated using the Vienna package RNAcofold program (18). To qualify as a putative binding site, the miRNA–mRNA duplex must exhibit an adequate free energy score (less than or equal to −10 kcal/mol), and a binding pattern of either a seed site, 3′-compensatory site or full-match site (Figure 1). For a gene to be considered a putative target gene, the 3′-UTR must contain at least two qualified binding sites.
Requiring the sites to be repetitive yields a restricted and more reliable set of miRNA-binding sites, without considering evolutionary conservation.
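The motif-search step can be illustrated with a minimal Python sketch. It assumes a plain Hamming-distance scan with greedy, non-overlapping hits, and it omits both the statistical-significance test and the HMM construction described in the Supplementary data. The function names are hypothetical and are not part of RepTar.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def repeated_motifs(utr, k=7, max_mismatches=2, min_occurrences=3):
    """Return {seed k-mer: [start positions]} for approximate repeats.

    Every k-mer present in the 3'-UTR is tried as a seed. A window counts
    as an instance if it differs from the seed in at most max_mismatches
    positions; instances are collected greedily without overlap.
    """
    windows = [(i, utr[i:i + k]) for i in range(len(utr) - k + 1)]
    result = {}
    for seed in {w for _, w in windows}:
        hits, last_end = [], -1
        for i, w in windows:
            if i >= last_end and hamming(seed, w) <= max_mismatches:
                hits.append(i)
                last_end = i + k
        if len(hits) >= min_occurrences:
            result[seed] = hits
    return result


# Toy 3'-UTR: two exact copies of a 7-mer and one copy with one mismatch.
toy_utr = "ACGUACG" + "UUU" + "ACGUACG" + "CCC" + "ACGAACG"
motifs = repeated_motifs(toy_utr)
```

In this toy sequence the seed `ACGUACG` is reported with instances at positions 0, 10 and 20. A real implementation would still need to score each motif for significance before building a profile HMM from its instances.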
Next, this set of binding sites is used to identify targets with non-repetitive sites or with sites that lack a high-scoring repetitive motif. The HMMs created in the previous step for a given miRNA are combined, and the top-scoring combined HMMs are then used to search for additional binding sites in the whole 3′-UTR database. Top-scoring hits to these combined HMMs are further evaluated for thermodynamic stability and binding pattern as above.
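The qualification criteria (a duplex free energy of at most −10 kcal/mol, an accepted binding pattern and, in the repeat-based stage, at least two qualified sites per 3′-UTR) can be sketched as a simple filter. This is a sketch under stated assumptions: the `Site` record, its field names and the site-type labels are hypothetical, the free energies are taken as precomputed values, and using `min_sites=1` for the second stage is our assumption about how single-site targets might be admitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    gene: str         # target gene identifier (hypothetical field)
    mirna: str        # miRNA name (hypothetical field)
    energy: float     # duplex free energy in kcal/mol, precomputed
    site_type: str    # binding-pattern label (hypothetical labels)


ALLOWED_TYPES = {"seed", "3'-compensatory", "full-match"}


def putative_targets(sites, max_energy=-10.0, min_sites=2):
    """Return the (miRNA, gene) pairs with enough qualified binding sites."""
    counts = defaultdict(int)
    for s in sites:
        if s.energy <= max_energy and s.site_type in ALLOWED_TYPES:
            counts[(s.mirna, s.gene)] += 1
    return {pair for pair, n in counts.items() if n >= min_sites}


example_sites = [
    Site("geneA", "miR-x", -14.2, "seed"),
    Site("geneA", "miR-x", -11.0, "3'-compensatory"),
    Site("geneB", "miR-x", -12.5, "seed"),
    Site("geneB", "miR-x", -6.0, "seed"),   # too weak to qualify
]
repeat_stage = putative_targets(example_sites)               # two-site rule
relaxed_stage = putative_targets(example_sites, min_sites=1)
```

With the two-site rule only `geneA` is retained for `miR-x`; relaxing the rule to a single site also admits `geneB`.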
Following the recent discovery of centered sites, HMM hits that fulfill the criteria of a centered site and show thermodynamic stability are no longer excluded and are considered an additional type of miRNA-binding site.
The final set of RepTar predictions includes conserved and non-conserved targets that have either multiple or single binding sites of various types: seed binding sites, wobble seed sites, 3′-compensatory binding sites, centered sites and full match sites.
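The distinction between a seed site and a wobble seed site can be made concrete with a short Python sketch. It assumes that the seed is miRNA nucleotides 2–8 (the text states only that the seed is ~7 nt in the 5′-region) and that the target segment is already aligned to the seed. The function name and the example sequences are illustrative only.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_seed(mirna, target_site, seed_start=1, seed_len=7):
    """Classify pairing between a miRNA seed and an aligned target segment.

    Both sequences are RNA strings written 5'->3'. The target segment is
    reversed before comparison because the duplex is antiparallel.
    Returns "seed", "wobble seed" or None.
    """
    seed = mirna[seed_start:seed_start + seed_len]
    if len(seed) != seed_len or len(target_site) != seed_len:
        raise ValueError("seed and target segment must both be seed_len long")
    pairs = list(zip(seed, reversed(target_site)))
    if all(p in WATSON_CRICK for p in pairs):
        return "seed"
    if all(p in WATSON_CRICK or p in WOBBLE for p in pairs):
        return "wobble seed"
    return None


mirna = "UGAGGUAGUAGGUUGUAUAGUU"            # illustrative miRNA sequence
perfect = classify_seed(mirna, "CUACCUC")   # "seed"
wobble = classify_seed(mirna, "CUACCUU")    # "wobble seed" (one G-U pair)
mismatch = classify_seed(mirna, "CUACCUA")  # None (G-A cannot pair)
```

The 3′-compensatory, centered and full-match classes would need the full duplex structure and are not covered by this sketch.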
The RepTar algorithm was applied to all human 3′-UTR sequences, searching for putative targets of all human miRNAs as well as Epstein–Barr virus (EBV), human cytomegalovirus (HCMV) and Kaposi’s sarcoma herpes virus (KSHV) miRNAs. We also applied RepTar to all mouse 3′-UTR sequences searching for putative targets of all mouse miRNAs as well as mouse cytomegalovirus (MCMV) and mouse gammaherpesvirus (MGHV) miRNAs. The human and mouse sets of 3′-UTR sequences were extracted from the UCSC Genome Browser hg18 and mm9 databases, respectively (http://genome.ucsc.edu/, 19). All miRNA sequences (human, mouse and viral) were extracted from miRBase registry release 15 (http://microrna.sanger.ac.uk/, 20–22).
We first assessed the accuracy of RepTar’s predictions using a database of direct targets determined experimentally in small-scale experiments (23). We successfully predicted 142 out of 197 reported direct targets of human and mouse miRNAs, corresponding to a sensitivity of 72% on this data set. This result is highly statistically significant (P ≤ 2.5e–64 by a hypergeometric test). In a recent assessment of miRNA-target-prediction algorithms (24), the predictions of each algorithm were compared to published data on measured changes of protein levels after overexpression or underexpression of the miRNAs (25). This comparison demonstrated that most miRNA target prediction algorithms showed precision values between 23 and 58%, and sensitivity values ~10%, hence missing many of the known targets. Using the same data we assessed RepTar’s performance, achieving a precision of 25% and a sensitivity of 23%. RepTar’s method of identifying a wide variety of binding sites places its precision in the range of the more permissive miRNA target prediction methods but gives it greater sensitivity in identifying sites missed by other tools. To validate these non-conventional miRNA-target predictions we used experimental data of the change in mRNA or protein expression levels following overexpression or deletion of several human miRNAs (17,26). We demonstrated that targets predicted through 3′-compensatory sites, non-conserved seed sites and wobble seed sites are statistically significantly downregulated (Supplementary Data).
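The performance measures used here follow the standard definitions, which the following Python sketch spells out. Only the counts 142 and 197 come from the text. The population sizes behind the reported hypergeometric P-value are not given in this section, so the tail function is shown on a toy example rather than as a reproduction of that value.

```python
from math import comb


def sensitivity(true_positives, known_targets):
    """Fraction of the known targets that were predicted."""
    return true_positives / known_targets


def precision(true_positives, predicted):
    """Fraction of the predictions that are true targets."""
    return true_positives / predicted


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable X.

    population: all candidate miRNA-gene pairs; successes: pairs that are
    predicted; draws: known targets; observed: known targets predicted.
    """
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


reported = sensitivity(142, 197)      # about 0.72, as stated in the text
toy_p = hypergeom_tail(10, 5, 5, 5)   # toy example: 1 / C(10, 5)
```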
To validate RepTar’s predictions of cellular targets of viral miRNAs, we compiled a set of 21 experimentally validated direct cellular targets of viral miRNAs (Supplementary Table S1). RepTar correctly predicted 15 of these 21 targets, constituting a sensitivity of 71%, similar to that obtained for the human and mouse miRNAs.
The RepTar database offers its users an interface for both browsing and querying the multiple data sets of RepTar predictions. The database is meant to allow the user to easily find predicted targets of interest, and also to allow more advanced analyses of large sets of targets. The user may browse the database of all target predictions of miRNAs of each organism (human, mouse or viruses) or conduct a more specific search by querying the database. The different ways of querying the database are grouped into two main categories, a ‘simple target search’ and an ‘advanced target search’. Both search options allow users to set their own cutoff values for the parameters RepTar uses to characterize its predictions, with the advanced search allowing more flexibility than the simple search. The major parameters include the minimal free energy of the miRNA–mRNA duplex, the binding site profile (whether it is a seed site, a 3′-compensatory site, a centered site or a nearly fully matched site), the conservation value for the binding site, the number of G–U base pairs within the binding site, the presence of genomic repeats in the site, and the number of sites within the target’s 3′-UTR. For example, the user can filter the results requesting only seed binding sites that are highly evolutionarily conserved and that do not contain G–U pairing, or can request 3′-compensatory sites that are not evolutionarily conserved and overlap a genomic repeating element. Once a query is built, it may be saved and uploaded later for reuse.
The ‘simple target search’ provides the user with the most commonly used search options of miRNA-target prediction algorithms: (i) a search of all targets and binding sites predicted for a single miRNA or a group of miRNAs; (ii) a search of all miRNAs predicted to target a single gene or a group of genes, and their binding sites. In addition, the simple search provides the user with a more sophisticated option: to search for shared predicted targets of a query miRNA group, or for miRNAs that are predicted to commonly target a query gene group. For example, a user studying the activity of a genomic cluster of miRNAs can ask for the common predicted targets of all the miRNAs in the cluster. Similarly, a user studying the regulation of cell-cycle genes can look for a single miRNA or a group of miRNAs predicted to target each of the cell-cycle genes. In addition, as mentioned above, for all types of queries the user can filter the predictions by the different parameters. To use the simple target search, the user should enter (or upload) a single miRNA or gene name, or a list of such names, and set the filter parameters to the desired cutoffs. The ‘simple target search’ provides the user with a simple interface for conducting such common and useful queries (Figure 3).
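Conceptually, the shared-target queries amount to set intersections over the prediction table. The following Python sketch illustrates this on a toy mapping from miRNA names to predicted target genes. The data, names and functions are hypothetical and do not reflect the database’s actual implementation.

```python
def shared_targets(predictions, mirnas):
    """Genes predicted as targets of every miRNA in the query group."""
    sets = [predictions.get(m, set()) for m in mirnas]
    return set.intersection(*sets) if sets else set()


def common_regulators(predictions, genes):
    """miRNAs predicted to target every gene in the query group."""
    wanted = set(genes)
    return {m for m, targets in predictions.items() if wanted <= targets}


# Toy prediction table: miRNA name -> set of predicted target genes.
toy_predictions = {
    "miR-a": {"CDK6", "E2F1", "MYC"},
    "miR-b": {"CDK6", "E2F1"},
    "miR-c": {"MYC"},
}
cluster_targets = shared_targets(toy_predictions, ["miR-a", "miR-b"])
cell_cycle_regulators = common_regulators(toy_predictions, ["CDK6", "E2F1"])
```

Here the two-miRNA cluster shares the targets `CDK6` and `E2F1`, and both `miR-a` and `miR-b` are predicted to target every gene in the two-gene query group.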
The ‘advanced target search’ allows more versatile and complex queries. It enables the user to employ more types of filters than in the simple search and, in particular, enables global queries on the data regardless of specific miRNAs or genes. For example, the user can ask for all predicted binding sites that are localized at the beginning of the 3′-UTR, for targets with 3′-compensatory binding sites, or for all targets that overlap a genomic repeat. The advanced target search comprises a set of rules defined by the user. These rules are joined together using logic operators to form a more complex query. The combination of the rules with the different logic operators opens up numerous advanced search options. To search for intersections of the data, the user may join the rules with the logic operator AND (for example, all sites that are predicted targets of a group of miRNAs ‘AND’ are localized at the beginning of the 3′-UTR). To search for unions of the data (sites that satisfy at least one of the rules), the rules can be joined with the operator OR, for example, all predicted targets that are either highly evolutionarily conserved ‘OR’ have a pre-set minimum free energy score. The user may also negate any rule by using ‘NOT’ before the rule. For example, the user can search for all binding sites that are predicted in a given set of genes but are ‘NOT’ targets of ‘star’ miRNAs. The different rule options and logic operators provide a flexible interface for advanced user-tailored queries of RepTar’s predictions. A step-by-step tutorial is supplied on the web site to guide the user through this sophisticated search option.
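The way rules combine under AND, OR and NOT can be sketched in Python with predicates over site records. This is an illustration of the logic only: the `Site` fields, the helper names, the 100-nt threshold for the ‘beginning’ of the 3′-UTR and the convention that star miRNA names end in `*` are all assumptions, not the web interface’s actual rule set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Site:
    mirna: str
    gene: str
    utr_start: int    # start of the site within the 3'-UTR (hypothetical)
    conserved: bool
    energy: float     # kcal/mol


Rule = Callable[[Site], bool]


def all_of(*rules: Rule) -> Rule:      # logic operator AND
    return lambda site: all(rule(site) for rule in rules)


def any_of(*rules: Rule) -> Rule:      # logic operator OR
    return lambda site: any(rule(site) for rule in rules)


def negate(rule: Rule) -> Rule:        # logic operator NOT
    return lambda site: not rule(site)


def mirna_in(names) -> Rule:
    wanted = frozenset(names)
    return lambda site: site.mirna in wanted


def near_utr_start(max_start: int = 100) -> Rule:
    return lambda site: site.utr_start <= max_start


def is_star_mirna(site: Site) -> bool:
    return site.mirna.endswith("*")


def run_query(sites, rule: Rule):
    return [s for s in sites if rule(s)]


toy_sites = [
    Site("miR-a", "gene1", 40, True, -15.0),
    Site("miR-a*", "gene1", 60, False, -12.0),
    Site("miR-a", "gene2", 900, False, -11.0),
]
query = all_of(mirna_in({"miR-a", "miR-a*"}), near_utr_start(),
               negate(is_star_mirna))
hits = run_query(toy_sites, query)
```

Only the first toy site passes this query: the second is excluded by the NOT rule and the third lies too far from the start of the 3′-UTR.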
A submitted query produces as output all the miRNA target sites that match the query’s conditions (Figure 4). The output contains an entry for each predicted binding site with detailed information. This information includes the target’s gene name (as GeneSymbol and RefSeq accessions), the targeting miRNA name and sequence, the coordinates of the location of the binding site within the 3′-UTR sequence, the computed minimal free energy of the predicted miRNA–mRNA duplex, the binding site type, the evolutionary conservation score of the binding site, a schematic representation of the pattern of base pairing of miRNA–mRNA and several other fields. These fields are presented in a dynamic table where the user can choose to display or hide any of the fields. The results can be sorted by the various displayed fields. For example, clicking on the heading of the minimal free energy will sort all results by the minimal free energy scores, in either ascending or descending order. The table may be displayed on several web pages or, alternatively, to facilitate its use, the full table of results may be downloaded and saved as a tab-delimited text file. In addition, each miRNA that appears in the table is linked to a list of all genes targeted by it. Likewise, each gene is linked to a list of all miRNAs that target it. This provides a quick and general view of the predicted targets.
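A downloaded tab-delimited results file can be processed offline with standard tools. The Python sketch below parses such a table and sorts it by free energy. The column names (`gene_symbol`, `mirna`, `min_free_energy`, `site_type`) and the sample rows are hypothetical and would need to be replaced by the headers of the actual download.

```python
import csv
import io


def load_sites(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def sort_by_energy(rows, descending=False):
    """Sort rows by the numeric free-energy column (hypothetical name)."""
    return sorted(rows, key=lambda r: float(r["min_free_energy"]),
                  reverse=descending)


# Made-up sample standing in for a downloaded results file.
sample = (
    "gene_symbol\tmirna\tmin_free_energy\tsite_type\n"
    "GENE1\tmiR-a\t-12.3\tseed\n"
    "GENE2\tmiR-b\t-21.7\t3'-compensatory\n"
    "GENE3\tmiR-a\t-10.1\tcentered\n"
)
rows = load_sites(sample)
most_stable_first = sort_by_energy(rows)   # most negative energy first
```

Because more negative free energies indicate more stable duplexes, the ascending sort lists the most stable predicted sites first.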
We present the RepTar database of miRNA target predictions. This database provides a comprehensive set of conventional (‘seed’ type) and non-conventional miRNA target predictions, including 3′-compensatory and centered sites. It offers genome-wide predictions of cellular targets of host and viral miRNAs and provides sophisticated data-mining techniques for querying the large data set of miRNA-target predictions.
Supplementary Data are available at NAR Online.
Funding for open access charge: Israeli Cancer Research Fund; The Israeli Science Foundation administered by the Israel Academy of Sciences and Humanities (granted to H.M.); Azrieli Foundation Fellowship (granted to N.E.).
Conflict of interest statement. None declared.
We thank Shai Fisher and Roman Melnikov for their assistance in configuring the database, and Diego Kramer, Vladimir Dubnikov and Roy Shapira for their help in installing the database.