|Home | About | Journals | Submit | Contact Us | Français|
While a growing body of evidence implicates regulatory miRNA modules in various aspects of human disease and development, insights into specific miRNA function remain limited. Here, we present an innovative approach to elucidate tissue-specific miRNA functions that goes beyond miRNA target prediction and expression correlation. This approach is based on a multi-level integration of corresponding miRNA and mRNA gene expression levels, miRNA target prediction, transcription factor target prediction and mechanistic models of gene network regulation. Predicted miRNA functions were either validated experimentally or compared to published data. The predicted miRNA functions are accessible in the miRNA bodymap, an interactive online compendium and mining tool of high-dimensional newly generated and published miRNA expression profiles. The miRNA bodymap enables prioritization of candidate miRNAs based on their expression pattern or functional annotation across tissue or disease subgroup. The miRNA bodymap project provides users with a single one-stop data-mining solution and has great potential to become a community resource.
MicroRNAs (miRNAs) are small non-coding RNA molecules that function as indispensible regulators of an increasing number of cellular processes (1–4). The exact role of an individual miRNA strictly depends on its spatiotemporal expression pattern and that of its targeted genes. With >1000 mature human miRNA species reported thus far, miRNAs form one of the largest classes of gene regulators. While miRNA expression profiles have been established for various normal and diseased tissues, our understanding of specific miRNA function remains limited. To accommodate this, several experimental procedures have been developed for high-throughput miRNA target identification such as RIP-chip (5) and HITS-CLIP (6). Unfortunately, these methods are technically challenging and are typically performed for only one or few miRNAs, necessitating an up-front prioritization and selection of candidate miRNAs. Alternatively, computer-based miRNA target predictions can be used to gain insights into miRNA function by probing annotated gene sets for miRNA target enrichment (7,8). Of note, miRNA target prediction algorithms are prone to a high degree of false positives and completely ignore the tissue- or disease-specific nature of miRNA–target interactions.
Here, we present an innovative and sensitive method and accompanying resource to elucidate tissue-specific miRNA function by combining matching miRNA and mRNA expression data with miRNA target prediction and mechanistic models of gene network regulation. Inferred miRNA functions, based on different data sets, can be queried through the ‘miRNA bodymap’, a web tool available at www.mirnabodymap.org. To complement the functional predictions, we implemented an in-depth literature knowledge mining tool with result context highlighting to retrieve experimentally validated miRNA functions. In addition, the miRNA bodymap contains high-quality RT–qPCR miRNA expression profiles for more than 750 human, mouse and rat samples, belonging to different tissue and disease types, which can be examined through a built-in miRNA expression analysis pipeline.
RNA samples from 39 normal human tissues were obtained from Ambion and Biochain. Reverse transcription for 704 miRNAs, 18 small RNA controls and U6 was performed using stem–loop primers (Applied Biosystems) in single-plex reactions containing 45 ng of total RNA. qPCR reactions were performed in quadruplicate on a 7900 HT system (Applied Biosystems). Whole-genome stem–loop RT–qPCR miRNA expression data for over 700 additional samples were gathered from the literature. miRNA expression data were normalized according to the global mean normalization strategy (9). MiRNA expression data can be obtained from the miRNA bodymap web tool (www.miRNAbodymap.org). Microarray mRNA expression data were taken from GEO (GSE16558, GSE5846, GSE21713 and GSE1133).
For each individual data set, Spearman's rank rho values were calculated for each mRNA–miRNA combination using normalized mRNA and miRNA expression values. mRNA–miRNA combinations with less than 10 pair-wise observations were excluded from the analysis. For each miRNA, mRNAs were ranked according to their correlation coefficient and ranked gene lists were used as input for GSEA. The following gene set collections were taken from the Molecular Signatures Database (MSigDB v3.0): chemical and genetic perturbations, gene ontology molecular function and gene ontology biological process. Gene sets significantly enriched among the positive and negative correlating mRNAs were selected based on the GSEA FDR value (FDR<0.05). All analyses were performed using the R Bioconductor statistical programming platform (version 2.11).
One-way ANOVA was used to analyze the impact of the miRNA target prediction algorithm on protein downregulation. Two-by-two comparisons of individual prediction algorithms were performed by Tukey's ‘honest significant difference’ method to identify significant differences.
For each miRNA, predicted targets were derived from the MIRDB database (10,11) and enrichment of these targets in the different gene sets was calculated using Fisher's exact test. Fisher's exact P-values were multiple testing corrected using the Benjamini–Hochberg algorithm. Gene sets that are enriched among the mRNAs that negatively correlate with an miRNA and that are enriched for targets of that miRNA were assigned to the multiple component targeting model. To determine the enrichment of transcription factor targets in the different gene sets, we used the transcription factor targets gene set collection from the MSigDB v3.0. Enrichments were calculated using Fisher's exact test, and P-values were corrected for multiple testing using the Benjamini–Hochberg algorithm. Gene sets that are enriched among the mRNAs that positively correlate with an miRNA and that are enriched for targets of a transcription factor that is a predicted target of that miRNA (according to MIRDB) were assigned to the transcription factor targeting model.
To evaluate miR-29a binding to the MYCN 3′-UTR, 74-bp oligonucleotides spanning the predicted 3′-UTR miRNA binding site flanked by XhoI and NotI restriction sites were cloned into psicheck2 (Promega) as described previously (12). Oligonucleotides with a mutated binding site were used as control. DLD1Dicerhypo cells were seeded at a density of 10000 cells per well in an opaque 96-well plate. Twenty-four hours after seeding, cells were co-transfected with a miR-29a pre-miR (Ambion) or negative control pre-miR (Ambion) in combination with the 3′-UTR construct using DharmaFECT Duo (Dharmacon). Forty-eight hours after transfection, luciferase reporter gene activity was measured using the Dual-Glo Luciferase Assay System (Promega) and a FLUOstar OPTIMA microplate reader (BMG LABTECH).
SK-N-BE(2c) neurobblastoma cells were cultured in RPMI (Invitrogen) supplemented with 10% fetal calf serum and transfected with a miR-29a pre-miR (Ambion) or negative control pre-miR (Ambion) as described previously (13). Cells were harvested 48 h after transfection and RNA was isolated using the miRNeasy mini kit (Qiagen) according to the manufacturer's instructions. Total RNA was reverse transcribed using the iScript cDNA synthesis kit (BioRad). MYCN mRNA expression was detected using quantitative PCR and SYBRGreen detection chemistry (Roche) on a Lightcycler 480 (Roche). MYCN expression data were normalized to the expression of stable reference genes (Alu-sx, HPRT1, SDHA and UBC) using qbasePLUS software (www.biogazelle.com).
To determine tissue or disease-specific miRNA functions, matching miRNA and mRNA expression levels were analyzed using rank correlation coefficients. Matching genome-wide mRNA and miRNA expression data were obtained from the literature or newly generated. In total, 244 human samples belonging to 4 different data sets (normal adult tissues, neuroblastoma tumors, myeloma tumors and NCI60 cancer cell lines) were included in the analysis. For each miRNA, mRNAs were ranked according to their correlation coefficient and functional gene sets, enriched among the positively or negatively correlated mRNAs, were identified using gene set enrichment analysis (GSEA) (14). We next integrated the GSEA results with miRNA target prediction, transcription factor target prediction and mechanistic models of gene network regulation. These models represent specific interaction schemes between an miRNA and a gene set (or pathway), and form the basis of the mechanism underlying a particular miRNA—gene set association. We hypothesized that a significant miRNA—gene set association is more likely to be functional if there is mechanistic evidence that links the association between an miRNA and a gene set to one of five proposed models (Figure 1). According to these models, negative miRNA—gene set associations can occur if the gene set is enriched for targets of the miRNA (multi-component targeting), if the miRNA targets a key signaling molecule in the pathway represented by the gene set (component targeting) or if the miRNA negatively regulates a transcriptional activator that has its targets enriched in the gene set (transcription factor targeting) (Figure 1A). Similarly, positive miRNA–gene set associations can occur if the miRNA negatively regulates a transcriptional repressor with its targets enriched in the gene set (transcription factor targeting), if the miRNA targets a negative regulator of a pathway or if the miRNA and the genes in the gene set share a common transcriptional activator or repressor (Figure 1B).
In order to integrate miRNA target predictions in the mechanistic models, we first made a rational selection of one single miRNA target prediction database. To this end, we used publically available mass spectrometry protein expression data from eight miRNA perturbation experiments (15,16) and evaluated seven widely used miRNA target databases for their ability to predict protein downregulation. We found that the targets in the MIRDB database showed the highest fold downregulation in the experimental data sets (Figure 2A and B) suggesting that MIRDB predicts targets that are primarily downregulated upon overexpression of the miRNA. Fold downregulation observed for MIRDB targets was significantly higher compared to targets predicted by PITA, RNA22, MICROCOSM, TARGETSCAN and DIANA (P<0.001). Furthermore, MIRDB showed the lowest number of false positive predictions (Figure 2C). Combining individual databases did not give better results that those obtained by MIRDB alone (Figure 2D) suggesting that, according to our selection procedure, MIRDB predictions are best suited for accurate assessment of miRNA target enrichment in the studied gene sets. MIRDB predictions were then used to calculate miRNA target enrichments in the gene sets in order to identify those negative miRNA–gene set associations that adhere to the multi-component targeting model. To identify miRNA–gene set associations that follow the mechanistic model of transcription factor targeting, we searched for gene sets that are enriched for transcription factor targets and used MIRDB to predict whether the transcription factor is a target of the miRNA.
To support the predicted miRNA functions and their assignment to one of the proposed models of gene regulation, we first compared our findings to experimentally validated miRNA functions. For each of the five proposed models, representative examples were found in the literature (Supplementary Figure S1), underscoring the relevance of the models and the power of the miRNA bodymap functional annotation pipeline. To further assess the accuracy of the pipeline, we compared functional predictions for miRNAs from the miR-17–92 cluster to a set of experimentally derived miR-17–92 functions. To this end, we used a recently published protein expression data set from a miR-17–92 perturbation experiment in neuroblastoma cells as our benchmark (13). Measured proteins (n=3249) were ranked according to their fold change and GSEA revealed 94 gene sets enriched among the downregulated proteins. We then compared these experimentally derived gene sets to gene sets predicted to be negatively associated with miR-17–92 in the neuroblastoma tumor data set. In total, 78 out of 94 experimentally validated gene sets were predicted suggesting that the predicted miRNA functions show a good concordance with experimentally derived functions. Another 201 gene sets were predicted that were not identified in the benchmark data set. Interestingly, the most significant gene sets [GSEA false discovery rate (FDR)=0], according to the annotation pipeline, were more frequently present in the benchmark data set as compared to the least significant gene sets suggesting that predicted gene sets can be further prioritized based on the GSEA FDR value, greatly enhancing prediction specificity. Furthermore, the most significant gene sets (GSEA FDR=0) were found to be largely independent of the sample size of the data set (Supplementary Figure S2).
In addition to functional miRNA annotation, the miRNA bodymap enables the detection of regulators of miRNA expression. Such regulators are identified by looking for positive associations between an miRNA and a gene set representing targets of a transcription factor (i.e. the miRNA and the genes in the gene set share a common transcriptional activator, Figure 1). To test this assumption, we searched for miRNAs that were positively correlated to a gene set containing MYC target genes (SCHUHMACHER_MYC_TARGETS_UP) in the neuroblastoma data set and selected only the most significant associations (GSEA FDR=0). The selected miRNAs, 38 in total, were compared to a set of 18 MYC/MYCN-activated miRNAs, previously validated in neuroblastoma (17). In total, 16 out of 18 miRNAs were identified suggesting high concordance between predicted and validated interactions.
To further assess the validity of our approach, we evaluated inferred miRNA annotations for tissue-specific miRNAs and hypothesized that a tissue-specific miRNA should play a role in pathways relevant for that tissue. In the normal tissues data set, we searched for miRNAs that are highly expressed in tissues of the lymphatic system compared to all other tissues in that data set. The expression of five lymphatic system-specific miRNAs (miR-142-3p, miR-150, miR-15, miR-146a and miR-150*) is visualized in a ranked expression map (Figure 3A). The most significant gene sets (GSEA FDR=0, gene ontology biological process gene set collection) for each of these miRNAs are primarily annotated to processes such as immune response and immune cell activation (Figure 3B). Furthermore, we found miR-142-3p and miR-155 to be associated with the NF-kB pathway, a signaling cascade involved in immune response. Next, we selected tissue-specific gene sets, such as ‘heart development’, ‘brain development’, muscle development and ‘digestion’ from the gene ontology biological process gene set collection and identified miRNAs that are predicted to be positively associated with those gene sets in the normal tissues data set. miRNAs annotated to ‘brain development’ showed the highest expression in brain while miRNAs annotated to ‘heart development’ showed the highest expression in cardiovascular tissues, skeletal muscle and mononuclear blood cells (Figure 3C and D). Similarly, miRNAs annotated to muscle development showed the highest expression in skeletal muscle and miRNAs annotated to ‘digestion’ had the highest expression in tissues from the gastro-intestinal tract (Figure 3E and F). Together, these results support the relation between tissue-specific expression and function and lend further credibility to the predicted miRNA functions.
MiRNAs have been shown to act as key components in transcription factor signaling networks, either through cooperation with a transcription factor in the process of gene expression regulation (18) or through direct regulation of the transcription factor itself (19). Using the miRNA bodymap annotation pipeline, we searched for miRNAs regulating the MYCN transcription factor in neuroblastoma. To identify miRNAs regulating MYCN, we searched for miRNAs that negatively correlate to a gene set containing MYC targets (SCHUHMACHER_MYC_TARGETS_UP) with a GSEA FDR value=0 and that are predicted to target MYCN according to MIRDB predictions. We identified a single miRNA (miR-29a) that could meet these criteria. To evaluate whether miR-29a directly targets the MYCN 3′-UTR, we established a 3′-UTR luciferase reporter vector containing the predicted miR-29a binding site downstream of the luciferase gene and evaluated luciferase activity in the presence of a miR-29a pre-miR or negative control pre-miR. Luciferase activity significantly decreased in the presence of miR-29a compared to the negative control pre-miR (Student's t-test, P<0.01) (Figure 4A) suggesting that MYCN is a target of miR-29a. Mutation of the miR-29a binding site resulted in a partial but significant rescue of luciferase activity (Figure 4A). Furthermore, overexpression of miR-29a in MYCN amplified SK-N-BE(2c) cells resulted in a 40% decrease of MYCN expression levels (Figure 4B).
Predicted miRNA functions available in the miRNA bodymap are based on three different gene set collections: gene ontology biological process, gene ontology molecular function and chemical and genetic perturbations. On top of that, the miRNA bodymap allows users to perform GSEA with custom gene sets obtained from the literature or derived from their own perturbation or profiling experiments. Based on the KEGG pathway database (20), we established a gene set representing the p53-signaling pathway and a gene set representing the B-cell receptor signaling pathway and searched for miRNA associations in the NCI60 data set and normal adult tissues data set, respectively. Five miRNAs were highly associated with increased p53 pathway activity (GSEA FDR=0) including miR-34a, miR-373 and miR-141. All three miRNAs have previously been shown to function downstream of p53 (21–23). MiRNAs associated with B-cell receptor signaling were, among others, miR-150 (GSEA FDR=0), miR-155 (GSEA FDR=0) and different members of the miR-17–92 cluster (GSEA FDR=0), all of which were shown to play important roles in B-cell development (24), B-cell receptor activation (25) or lymphoproliferative disease (26).
Elucidating tissue-specific miRNA functions has become one of the major challenges in miRNA research. While miRNA target prediction databases provide insights into putative miRNA target genes, they ignore the tissue- and context-specific nature of an miRNA–target interaction. To overcome this issue, we designed a functional miRNA annotation pipeline based on the integration of miRNA target prediction with matching miRNA and mRNA expression data. In addition, we introduced different interaction schemes representing the mechanistic relation between the miRNA and the genes within the pathway that is predicted to be regulated by the miRNA. We hereby hypothesized that the probability of obtaining a valid functional miRNA prediction increases when there is a mechanistic basis that supports it. This mechanistic basis can be a predicted interaction between the miRNA and a transcription factor within the pathway or an overrepresentation of miRNA targets in the pathway. MiRNA targets were predicted using the MIRDB algorithm that we identified to correlate best with protein expression data sets from miRNA perturbation experiments. Of note, this analysis is not comprehensive but provides an indication which miRNA target prediction algorithm produces the least false positive predictions. Ideally, high-throughput reporter assays should be performed for all (or a significant subset of) predicted miRNA targets in order to comprehensively assess the accuracy of miRNA target prediction databases. Functional miRNA predictions for four different data sets are available through the miRNA bodymap web tool and can be queried based on the underlying interaction schemes. In addition, the miRNA bodymap web tool contains an miRNA expression analysis pipeline that enables identification of differentially expressed miRNAs between tissues or disease groups, tissue- or disease-specific miRNAs and stably expressed miRNAs for miRNA expression normalization. Combining tissue- or disease-specific miRNA expression patterns with functional miRNA predictions allows prioritization of candidate miRNAs and hypothesis generation for further research. The miRNA bodymap web tool enables researchers to upload additional data sets of matching mRNA and miRNA expression data to populate the database with additional functional miRNA predictions. In this way, miRNA functions that are predicted in multiple independent data sets can be further prioritized, making the miRNA bodymap a community resource for functional miRNA research. A manual describing the entire functionality of the miRNA bodymap web tool is available on the website.
Supplementary Data are available at NAR Online.
This work was supported by the Research Foundation-Flanders (G.0198.08, 31516610 and 31511809); the Children Cancer Fund, the Stichting Tegen Kanker and GOA (01G01910). PM is supported by the Ghent University Research Fund (BOF 01D31406 and the Belgian Program of Interuniversity Poles of Attraction, SL is supported by the Ghent University Research Fund (BOF/ZAP/094, FP is a postdoctoral fellow of the Research Foundation-Flanders (FWO). Funding for open access charge: Research Foundation-Flanders.
Conflict of interest statement. None declared.
P.M., S.L., F.P., E.F., A.F., M.O. and J.V. performed the analysis and developped the web tool. P.M. and J.V. wrote the manuscript. D.R., L.W. and C.C. performed miRNA expresssion analysis. A.D.P., F.S. and J.V. supervised the study. This article represents research results of the Belgian program of Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister's Office, Science Policy Programming.