MicroRNAs (miRNAs), i.e. small non-coding RNA molecules (~22nt), can bind to one or more target sites on a gene transcript to negatively regulate protein expression, thereby controlling many cellular mechanisms. A current and curated collection of miRNA–target interactions (MTIs) with experimental support is essential to thoroughly elucidating miRNA functions under different conditions and in different species. As a database, miRTarBase has accumulated more than 3500 MTIs by manually surveying pertinent literature after systematic text mining to filter research articles related to functional studies of miRNAs. Generally, the collected MTIs are validated experimentally by reporter assays, western blot, or microarray experiments with overexpression or knockdown of miRNAs. miRTarBase curates 3576 experimentally verified MTIs between 657 miRNAs and 2297 target genes among 17 species. Compared with similar, previously developed databases, miRTarBase contains the largest number of validated MTIs. The MTIs collected in miRTarBase can also provide a large number of positive samples for developing computational methods capable of identifying miRNA–target interactions. miRTarBase is now available at http://miRTarBase.mbc.nctu.edu.tw/, and is updated frequently by continuously surveying research articles.
As small non-coding RNAs of ~22nt, microRNAs (miRNAs) regulate gene expression post-transcriptionally by suppressing mRNA translation or inducing mRNA degradation through hybridization to the 3′-untranslated regions (3′-UTRs) of mRNAs. Discovery of the first miRNA in Caenorhabditis elegans in 1993 (1) ushered in numerous studies on the cellular processes of these tiny regulatory RNAs in a large variety of metazoans. Thousands of miRNAs have been identified in mammalian cells over the past two decades, and they play critical roles in many biological processes, including cell-cycle control, cell growth and differentiation, apoptosis, as well as embryo development.
Many miRNA-related database systems have been developed in recent years to provide further insight into miRNAs and their target genes. miRBase (2) is the most complete repository for miRNA annotation and nomenclature. To date, miRBase (version 16.0) contains 15172 miRNA entries, and new sequences are added regularly. miRGen (3), miRGator (4), miRDB (5), microRNA.org (6) and miRNAMap (7,8) provide miRNA targets by integrating extensively adopted target prediction programs. Moreover, TarBase (9), miRecords (10) and miR2Disease (11) contain experimentally validated miRNA–target interactions (MTIs). TarBase is the first resource to provide experimentally verified MTIs by surveying pertinent literature (9). miRecords accumulates both experimentally validated miRNA targets and computationally predicted miRNA targets (10). miR2Disease contains relationships among miRNAs, target genes and diseases in humans (11). miRSel (12) incorporates a text-mining method to systematically extract miRNA–target relationships from PubMed abstracts. Additionally, several computational methods and web-based programs have been developed to identify target genes of miRNAs, including miRanda (6), TargetScan (13), RNAhybrid (14), PicTar (15) and PITA (16). These extensively adopted research tools are highly promising for identifying MTIs, with their effectiveness confirmed experimentally.
miRNA-related research has grown exponentially in recent years (Figure 1). The accelerated rate of miRNA gene discovery warrants a thorough investigation of the functions of these miRNAs. Additionally, more than 20 databases and computational methods have been developed to identify candidate MTIs. A curated and up-to-date collection of MTIs with experimental support is essential to elucidating miRNA functions under different conditions and in different species. This work presents a database, miRTarBase, which has accumulated more than 3000 MTIs by manually surveying pertinent literature after adopting a systematic text-mining procedure to select research articles related to functional studies of miRNAs. Generally, the collected MTIs are validated experimentally by reporter assays, western blot, or microarray experiments with overexpression or knockdown of miRNAs.
Generally, experimental validation of an MTI begins with computational methods that identify putative target sites of miRNAs. These putative MTIs are then validated by molecular experiments. Reporter assays and western blot are the conventional means of confirming the interaction between a miRNA and its target mRNA. Additionally, northern blot analysis, quantitative real-time PCR (qPCR), or in situ hybridization is often performed to examine the co-expression of a predicted miRNA and its target mRNA. In contrast with these conventional validation approaches, genome-wide screening approaches have been developed, including microarray experiments with overexpression or knockdown of miRNAs, stable isotope labeling with amino acids in culture (SILAC) and pulsed SILAC (pSILAC). For instance, Selbach et al. determined the complement of all genes targeted by five miRNAs induced independently in HeLa cells using microarrays and pSILAC (17). That study also identified more than 400 MTIs.
The growing number of miRNAs and targets identified in recent years, combined with their major roles in biological systems, explains the need for easy access to an accurate, updated and centralized information repository. This work devises a frequently updated database by continuously surveying research articles, pre-screened by text-mining programs, with the intention of making the database a major repository for experimentally confirmed MTIs. In addition to containing the largest number of validated MTIs, miRTarBase provides the most current collection compared with similar databases developed previously, including TarBase, miRecords and miR2Disease. This work also elucidates the biological features of the miRNA/target duplex based on more than 700 validated human MTIs whose miRNA target sites have been described in previous literature. The MTIs collected in miRTarBase can also serve as a large set of positive samples for developing computational methods to identify MTIs.
All database entries are collected manually to describe the relation between a miRNA and its target genes along with the experimental support (Figure 2). Initially, all fields in the PubMed database are searched with the keywords ‘microRNA targets’ or ‘miRNA targets’, followed by downloading the full text of the retrieved articles. A text-mining system is then used to survey the full-text literature for articles that potentially describe MTIs verified by various experimental methods. Each research article is carefully reviewed by at least two of our developers to extract the MTIs that are experimentally confirmed by reporter assays, western blot, microarray experiments, pSILAC or qRT–PCR. Additionally, other useful information is extracted, e.g. the species of miRNAs, species of target genes and experimental conditions.
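The pre-screening step can be pictured as a keyword filter over the full text of each article. The following is a minimal sketch under stated assumptions, not the text-mining system used for miRTarBase: the regular expressions, the method labels and the function name `screen_article` are hypothetical and chosen only for illustration. An article is flagged for manual review when it mentions at least one miRNA name and at least one of the validation methods listed above.

```python
import re

# Hypothetical patterns: the validation methods are named in the text,
# but the actual patterns of the text-mining system are not published here.
METHOD_PATTERNS = {
    "reporter assay": re.compile(r"\b(luciferase|reporter)\s+assays?\b", re.I),
    "western blot": re.compile(r"\bwestern\s+blot(s|ting)?\b", re.I),
    "microarray": re.compile(r"\bmicroarrays?\b", re.I),
    "pSILAC": re.compile(r"\bpSILAC\b"),
    "qRT-PCR": re.compile(r"\b(qRT[-–]?PCR|qPCR|real-time\s+PCR)\b", re.I),
}

# Simplified miRNA name pattern (assumption): optional three-letter species
# prefix, then miR/let/lin, a number, an optional letter and an optional arm.
MIRNA_PATTERN = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|let|lin)-\d+[a-z]?(?:-[35]p)?\b"
)


def screen_article(text):
    """Return the miRNA names and validation methods mentioned in the text."""
    mirnas = sorted(set(MIRNA_PATTERN.findall(text)))
    methods = [name for name, pat in METHOD_PATTERNS.items() if pat.search(text)]
    return {
        "mirnas": mirnas,
        "methods": methods,
        # Only articles naming both a miRNA and a method go to the curators.
        "needs_review": bool(mirnas and methods),
    }
```

Such a filter only narrows the reading list; whether an article actually reports an MTI is still decided by the manual review.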
The most effective means of verifying MTIs involves using fluorescence quantitative PCR and western blot to detect mRNA and protein expression levels, respectively, in cells with miRNA overexpression or knockdown. Although these methods identify miRNA target genes accurately, other experimental methods are needed to locate the regions targeted by miRNAs; the luciferase reporter assay is extensively adopted for this purpose. Here, MTIs are viewed as having strong support when they are validated by western blot, qPCR, or reporter assays.
High-throughput miRNA target identification methods, including pSILAC and microarray experiments, can determine mRNA or protein expression levels in the presence or absence of the miRNA (9). Because it cannot be determined whether the over-expressed miRNAs directly cause the changed expression patterns, these technologies provide weaker experimental evidence for the collected MTIs.
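The two tiers of experimental support described above can be sketched as a small classification rule. This is an illustrative assumption about how such a rule might be encoded, not the database's implementation; the labels `"strong"`, `"less strong"` and `"unclassified"` and the function name `evidence_level` are hypothetical.

```python
# Methods treated as strong support in the text: western blot, qPCR, reporter assays.
STRONG_METHODS = {"reporter assay", "western blot", "qPCR"}
# High-throughput methods that cannot show a direct effect of the miRNA.
WEAKER_METHODS = {"microarray", "pSILAC"}


def evidence_level(methods):
    """Classify an MTI by the strongest validation method reported for it."""
    reported = set(methods)
    if reported & STRONG_METHODS:
        return "strong"
    if reported & WEAKER_METHODS:
        return "less strong"
    # Method names outside both lists are left for a curator to judge.
    return "unclassified"
```

An MTI supported by both a microarray experiment and a western blot is therefore classified by its strongest evidence.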
In release 1.0 (15 October 2010) of miRTarBase, 3576 curated MTIs between 657 miRNAs and 2297 target genes were collected from 985 articles. Table 1 lists the number of collected MTIs in each species. For instance, 2531 human MTIs were collected between 287 miRNAs and 1631 target genes with experimental support from 773 articles; in addition, 617 and 1409 interactions were confirmed experimentally by western blot and reporter assays, respectively. On average, each human miRNA targets five genes. Supplementary Figure S1 shows the distribution of miRNAs by the number of target genes supported by reporter assays or western blot. In miRTarBase, hsa-miR-122 was recorded as having 45 target genes, which were validated experimentally by luciferase reporter assays or western blot. Notably, hsa-miR-122 is a liver-specific miRNA in humans and is significantly down-regulated in liver cancer (18).
This work also examines the functions of the target genes involved in the human MTIs collected in the database by performing gene ontology (GO) and KEGG (19) pathway enrichment annotation using the DAVID gene annotation scheme (20). GO enrichment analysis indicates that cellular process, biological regulation and metabolic process are the most significantly enriched GO terms among the human target genes (Supplementary Figure S2). Supplementary Table S1 lists the top 20 pathways significantly enriched in these human target genes, most of which are involved in cancer, including prostate, pancreatic, colorectal, small cell lung, endometrial, non-small cell lung and bladder cancers. Interestingly, the above analysis provides an overview of the possible functions of human miRNAs based on this curation of MTIs. However, the data are likely biased, since miRNAs have attracted increasing attention in cancer research recently.
Only 709 human MTIs in miRTarBase have miRNA target site annotations that can be extracted from the articles. Of these target site sequences, nine provide only the sequence of the seed region (<10nt), 667 contain the target site sequences (10–50nt), and the remaining 33 provide cloned partial UTR sequences (>50nt). Next, an attempt is made to summarize the data distributions of 12 biological features of the miRNA/target duplex in these 709 known human MTIs, as shown in Figure 3. The miRNA target sites are mapped to the 3′-UTR of the corresponding target gene; in addition, 70nt around the target site are extracted. Moreover, the miRNA target sites are selected when the alignment score (by miRanda) of the miRNA/target duplex exceeds 100 and the number of base pairs within the seed region is higher than 5. Notably, 721 miRNA target sites are obtained from the 709 MTIs. Figure 3 displays the histograms of various features of these miRNA–target duplexes. Figure 3A and B show the longest consecutive matches, excluding or including wobble pairing (GU pairing), respectively, in the seed region, i.e. the subsequence from nucleotide 1–8 at the 5′-end of the miRNA. More than 80% of all binding sites have more than seven bases of consecutive pairings. The minimum free energy of the seed regions and the binding sites is also calculated, as shown in Figure 3C and D, respectively. The mean value of the free energy of the binding site is approximately −14kcal/mol. The free energy of most of the seeds is lower than −6kcal/mol. Next, the numbers of nucleotide matches, GU matches and mismatches in the seed regions and the target sites are analyzed. Figure 3E and J summarize these statistics. More than 85% of all target sites have at least six matches in the seed region; in addition, GU matches rarely occur in the seed region, i.e. <30%. The number of matches is significantly higher than the number of mismatches. The number of GU matches in the target sites is significantly smaller than the numbers of matches and mismatches. The interaction energy is also estimated based on the calculations of Kertesz et al. (16). According to our results, most of the interaction energy of target site accessibility shown in Figure 3K ranges from −10 to 10kcal/mol.
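The seed-region statistics above (matches, GU matches, mismatches and the longest consecutive pairing with or without wobble pairs) can be illustrated with a short sketch. It assumes an ungapped antiparallel duplex in which the 3′-end of the target site faces nucleotide 1 of the miRNA, which is a simplification: the analysis in this work relies on miRanda alignments. The function name `seed_pairing` and the sequences in the usage note are invented for demonstration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def seed_pairing(mirna, site, seed_len=8):
    """Count pair types between the miRNA seed (nt 1..seed_len) and a site.

    Both sequences are given 5'->3'. Assumption: no gaps or bulges, and the
    last nucleotide of `site` pairs with nucleotide 1 of the miRNA.
    """
    seed = mirna.upper().replace("T", "U")[:seed_len]
    # Reverse the site so that index i faces miRNA position i + 1.
    facing = site.upper().replace("T", "U")[::-1][:seed_len]
    stats = {"matches": 0, "gu": 0, "mismatches": 0,
             "longest_wc": 0, "longest_with_gu": 0}
    run_wc = run_gu = 0
    for pair in zip(seed, facing):
        if pair in WATSON_CRICK:
            stats["matches"] += 1
            run_wc += 1
            run_gu += 1
        elif pair in WOBBLE:
            stats["gu"] += 1
            run_wc = 0       # a wobble pair breaks a strict Watson-Crick run
            run_gu += 1      # but extends a run that tolerates GU pairs
        else:
            stats["mismatches"] += 1
            run_wc = run_gu = 0
        stats["longest_wc"] = max(stats["longest_wc"], run_wc)
        stats["longest_with_gu"] = max(stats["longest_with_gu"], run_gu)
    return stats
```

For the invented miRNA `UGGAGUGUGACA`, the site `ACACUCCA` is a perfect seed complement, whereas `ACACUCUA` introduces one GU pair at seed position 2, so the longest run differs depending on whether wobble pairs are counted.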
Comparison with other manually curated databases such as TarBase, miRecords and miR2Disease reveals that miRTarBase has accumulated a larger and more up-to-date collection of MTIs than other resources (Table 2). In particular, around 900 research articles are collected in our database. This comparison also reveals that miRTarBase has the most abundant MTIs, even when considering only the entries supported by reporter assays or western blot experiments. Furthermore, the Venn diagrams display the intersection of articles collected in the different databases (Supplementary Figure S4). miRTarBase covers all of the research articles collected in TarBase, miRecords and miR2Disease.
Alternatively, text-mining methods can retrieve information on relations between miRNAs and target genes. However, an MTI is generally described in natural language and is not easily extracted correctly by computational methods alone. For instance, although DICER1 and Drosha are important genes involved in miRNA biogenesis, they are normally not the target genes of a miRNA when they are discussed along with the miRNA in an article. Nevertheless, text-mining methods may identify a relation between a miRNA and DICER1 and incorrectly annotate it as an MTI. Therefore, manually reviewing the articles that may contain MTIs is necessary for extracting the experimental evidence that supports an MTI. Notably, this study does not compare the contents of miRTarBase with those of other databases established by text-mining methods alone, without manual review.
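The DICER1 problem can be reproduced with a deliberately naive co-occurrence extractor. The sketch below is an assumption made for illustration and does not correspond to any published text-mining tool: it pairs every miRNA name with every upper-case gene-like token in the same sentence. The patterns, the function name `cooccurrence_candidates` and the example sentences in the usage note are hypothetical.

```python
import re

MIRNA = re.compile(r"\bmiR-\d+[a-z]?\b")
# Crude gene-symbol pattern (assumption): three or more upper-case letters/digits.
GENE = re.compile(r"\b[A-Z][A-Z0-9]{2,}\b")
# Biogenesis genes named in the text; co-occurrence with them is rarely an MTI.
BIOGENESIS_GENES = {"DICER1", "DROSHA"}


def cooccurrence_candidates(text):
    """Return (miRNA, gene, is_biogenesis_gene) for each same-sentence pair."""
    candidates = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for mirna in MIRNA.findall(sentence):
            for gene in GENE.findall(sentence):
                candidates.append((mirna, gene, gene in BIOGENESIS_GENES))
    return candidates
```

Given the invented text "Processing of miR-21 requires DICER1. In these cells miR-21 represses PTEN.", the extractor reports both (miR-21, DICER1) and (miR-21, PTEN) as candidates. Only the flag in the third field hints that the first pair describes biogenesis rather than targeting, which is why each candidate still needs a curator's judgment.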
miRTarBase provides various query interfaces and graphical visualization pages to facilitate access to MTI data (Supplementary Figure S3). Several search functions for retrieving MTIs are designed, including search by miRNA accessions, search by target genes, and search by literature. Alternatively, miRTarBase provides a keyword search across all fields of all data entries. A result page is designed to describe each MTI, and each MTI is assigned a miRTarBase accession. The result page largely comprises three main parts, i.e. miRNA information, target gene information and evidence support. Generally, the web pages of miRTarBase contain useful quick links to several other web resources, including NCBI Entrez (21), UCSC Genome Browser (22), miRBase (2), BioGPS (23), iHOP (24) and HGNC (25). The web pages are described in detail below.
The ‘miRNA information’ page contains the attributes of a miRNA such as accession, synonyms, descriptions, miRNA sequence, and links to other putative MTI databases. In particular, this page presents all MTIs of a miRNA as a network, which depicts the relationships between the miRNA and its multiple target genes. The ‘Target Gene’ page provides the basic information of a target gene, including gene symbol, description, genomic location, transcript sequence and links to other resources. Target site-related information located in the transcript is carefully examined and displayed on the web page. Notably, many articles only describe the regulatory relationship between a miRNA and its target genes without providing the exact regions of the miRNA target sites. For such experimentally supported MTIs, this work uses miRanda to computationally identify the potential target sites.
The ‘Evidence’ page provides the experimental information supporting an MTI from one or multiple articles by presenting the experimental validation methods, experimental conditions, location of target sites, computational tools used in the article, key descriptions extracted from the article and the article abstract. Additionally, this resource provides a data submission page that allows users or researchers to submit information on MTIs that have not yet been curated. Users can thus directly suggest articles containing information about MTIs, and the suggested articles are then reviewed by the developers of miRTarBase.
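The search functions and the three-part result page described above suggest a simple record layout. The following sketch is hypothetical: the field names, the accession format `MTI-0001`, the literature identifiers and the example entries are all invented for demonstration and do not reflect the actual miRTarBase schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTIRecord:
    accession: str      # placeholder format, not a real miRTarBase accession
    mirna: str          # miRNA information
    target_gene: str    # target gene information
    species: str
    methods: tuple      # evidence support: validation methods
    pubmed_ids: tuple   # evidence support: literature identifiers (as strings)


def search(records, mirna=None, target_gene=None, pubmed_id=None, keyword=None):
    """Return the records that satisfy every criterion that was supplied."""
    hits = []
    for rec in records:
        if mirna is not None and rec.mirna.lower() != mirna.lower():
            continue
        if target_gene is not None and rec.target_gene.lower() != target_gene.lower():
            continue
        if pubmed_id is not None and pubmed_id not in rec.pubmed_ids:
            continue
        if keyword is not None:
            # Keyword search looks across all fields of the entry.
            fields = (rec.accession, rec.mirna, rec.target_gene, rec.species,
                      *rec.methods, *rec.pubmed_ids)
            if keyword.lower() not in " ".join(fields).lower():
                continue
        hits.append(rec)
    return hits


# Invented example entries for demonstration only.
RECORDS = [
    MTIRecord("MTI-0001", "miR-x", "GENE1", "Homo sapiens",
              ("reporter assay",), ("PMID-A",)),
    MTIRecord("MTI-0002", "miR-x", "GENE2", "Homo sapiens",
              ("microarray",), ("PMID-B",)),
    MTIRecord("MTI-0003", "miR-y", "GENE1", "Mus musculus",
              ("western blot",), ("PMID-A", "PMID-C")),
]
```

Calling `search(RECORDS, mirna="miR-x")` mirrors a search by miRNA, `search(RECORDS, pubmed_id="PMID-A")` mirrors a search by literature, and combining criteria narrows the result to entries that satisfy all of them.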
This work presents a more comprehensive collection of experimentally validated MTIs. The biological features of the miRNA/target duplex are observed based on the largest collection of human MTIs currently available. Various web interfaces are also designed to facilitate the presentation of MTIs. Moreover, a pipeline combining text mining and manual review is established to extract MTI information from research articles.
Future work on the proposed database should extend the human MTIs to mouse, rat and other mammalian genomes based on the evolutionary conservation of miRNAs and their target sites. In this way, more probable MTIs can be provided as promising candidates for experimental confirmation.
The miRTarBase database will be continuously maintained and updated monthly. The database is now publicly accessible at http://miRTarBase.mbc.nctu.edu.tw/.
Supplementary Data are available at NAR Online.
National Science Council of the Republic of China (Contract No. NSC 98-2311-B-009-004-MY3 and NSC 99-2627-B-009-003); National Research Program For Genomic Medicine (NRPGM), Taiwan; MOE ATU (partial). Funding for open access charge: National Science Council, Taiwan.
Conflict of interest statement. None declared.
Ted Knoy is appreciated for his editorial assistance. Special thanks for the financial support from the National Research Program for Genomic Medicine (NRPGM), Taiwan.