Available probes on miRNA and mRNA microarrays
Filtering criteria (see Methods for more details) were applied to 59 NCI samples from nine tissues for all the expression profiles. The NCI-H23 cell line lacks mRNA data. The retained probes for correlation analysis on the microarray platforms should be those that appear in the downloaded miRNA-target data set and display adequate variability across the expression profiles. As illustrated in Figure , the red circles denote the number of probes whose corresponding targets or mature miRNAs can be found in the miRNA-target pairs predicted by TargetScan or miRBase, respectively. The blue circles indicate that 16,769 and 555 probes have at least two-fold difference in expression level between the maximum and minimum values among the 22,283 Affymetrix and 627 miRNA probes, respectively. The number in the intersection between the two circles is the set of probes used for correlation analyses.
Correlation of expression profiles between miRNA and TargetScan predicted targets
The data set was taken from the TargetScan 4.1 web site. It contains 46,458 predicted pairs comprising 162 conserved miRNA families and 7,927 target genes. Using the filtering criteria (described in Method, also see Figure ), we selected 284 miRNA probes and 8,813 Affymetrix probe sets to compute correlations. Among the 138,919 Pearson correlation coefficients and the corresponding p-values computed at the probe level, 2,976 probe-probe interactions, representing 1,389 predictions between 113 conserved miRNA families and 940 target genes, show statistical significance (Benjamini and Hochberg-adjusted p < 0.05). The percentages of positive and negative correlations are 39.28% and 60.72%, respectively. The density plot of the correlation is shown in Figure . As a random control test, we computed all the 2,502,892 correlations between 284 miRNA probes and 8,813 Affymetrix IDs, then randomly selected 138,919 values for 100 times to compare with those from the predicted pairs. The difference of the correlation coefficients between the TargetScan predicted pairs and the 100 random sets are all statistically significant (p < 2.2 × 10-16, Wilcoxon rank-sum test).
In the microarray data used in our study, a miRNA or a target gene could be represented by more than one probe thereby creating a problem of multiple probe-probe interactions. We found 664 miRNA-target pairs carrying multiple probe-probe correspondences that in turn lead to multiple correlation coefficients when comparing their expression profiles. For each of those 664 pairs, a standard deviation can be computed to estimate the variation of those correlation coefficients. Among the 664 pairs, 24 such standard deviations are found to be greater than 0.1, and only four pairs produce correlation coefficients with opposite signs. It suggests, in most cases, that probes representing the same gene indeed have similar expression profiles.
Correlations of expression profiles between miRNAs and miRBase predicted target
The predicted miRNA-target pairs provided by the miRBase::Targets database are presented as interactions between individual miRNAs and target mRNA transcripts. The miRBase::Targets 5 contains 676,265 paired predictions for human, composed of 711 mature miRNAs and 34,525 Ensembl transcript IDs. As shown in Figure , 460 miRNA probes and 14,640 Affymetrix probe sets were used to perform correlation analysis. We obtained 3,575 significant
p-values for the 293,176 correlations by using the Benjamini and Hochberg correction method. Those significantly correlated probe-probe pairs consist of 3,210 miRNA-target interactions, among which 303 miRNAs and 2,227 mRNA transcripts were found. As shown by the density plot in Figure , the negative correlations cover 46.18% of the significant pairs, comparing with 60.72% for the TargetScan predicted interactions. This observation is consistent with the earlier claims [
19] that the TargetScanS-predicted target genes are more likely to be down-regulated owing to miRNA-mediated mRNA degradation. For all the 6,734,400 correlations, we also tested whether correlation coefficients from the predicted and randomly selected probe-probe pairs are different. The
p-values from the 100 random tests indicate significant difference (
p < 10
-6, Wilcoxon rank-sum test).
As described earlier, certain miRNA-target pairs show multiple probe-probe correspondences. Among the 715 such miRNA-target pairs, 39 of them produce standard deviations greater than 0.1 for the multiple correlation coefficients. In four cases, correlation coefficients with opposite signs were found. The observation indicates that in most cases the expression profiles produced by different probes representing the same gene indeed resemble each other.
Comparison between TargetScan and miRBase
The density plot shown in Figure was prepared from the miRNA-mRNA interactions predicted from TargetScan and miRBase, respectively. Due to the different prediction strategy adopted by the two methods, we found 284 miRNA and 8,813 mRNA probes in NCI-60 data for TargetScan and 460 miRNA and 14,640 mRNA probes for miRBase. The uneven data size raises a question that whether the bias toward negatively correlated miRNA-mRNA for TargetBase predictions can be attributed to the algorithm design or simply to the differences in data sizes.
To address the concern, using the NCI-60 miRNA and mRNA expression data, we identified 17,777 miRNA-mRNA probe pairs that can be found in both the TargetScan and the miRBase predictions. We refer the 17,777 pairs as the common set. These pairs were then subject to the computation of correlation coefficients. After correcting the p-values of the correlation coefficients and keeping the 392 significant correlations, the resulting density plot is shown in Figure . Pairs predicted by TargetScan and by miRBase cannot be distinguished in Figure because in the common set all data point come from the predictions made by both methods. Nevertheless it is clear that negative correlations occur more often than positive ones indicating that the expression levels of a miRNA and its target mRNA are more likely to be negatively correlated across the NCI-60 data than to be positively correlated, at least for the common set described here.
To compare the two methods of TargetScan and miRBase, one has to be able to distinguish the performance of both methods using some measurements. The measurement should produce different results for targets predicted by different method. For example, Baek et al[
25] evaluated five target prediction methods by measuring the protein level change for targets predicted by each method. They also performed comparisons among the five methods using the best-scoring targets predicted by the respective approaches. Given that Figure does not produce conclusive comparison between TargetScan and miRBase using a common probe set, in the following we will compare the distributions of negative and positive miRNA-mRNA correlations using three new experiments.
To compare TargetScan and miRBase predictions, here we try to select equal number of statistically significant miRNA-mRNA pairs from both datasets. Using NCI-60 expression data, only those miRNA-mRNA pairs that have statistically significant Pearson correlation coefficients were retained. The correlation coefficients were ranked according to their absolute values. Only the top 1,000 pairs (from both datasets) with the most positive or negative correlation coefficients were used to draw the density plot shown in Figure . The figure shows the distribution of correlation coefficients of the 1,000 most correlated pairs for each of the two datasets. The distributions indicate that TargetScan-predicted miRNA targets tend to be more down-regulated when the comparison was performed using the same number, that is, 1,000, of the most correlated miRNA-mRNA interactions.
Next we want to study the typical experimental scenario where target mRNAs are to be predicted off a particular set of candidate miNRAs that might have been selected using microarray or qPCR experiments. That is, in this experiment we had a common set of miRNAs and two sets of targets predicted by TargetScan and miRBase, respectively. Corresponding probes were identified. The filtering criteria were applied. The resulting pairs were subject to the computation of Pearson correlation coefficients using NCI-60 expression data. After p-value correction with the Benjamini-Hochberg method, pairs with p < 0.05 were considered as the statistically significant pairs. Their density plot is shown in Figure . For the 2,975 significantly correlated TargetScan-predicted pairs, 60.63% of them show negative correlations. For miRBase, 51.7% of the 3,027 significantly correlated pairs show negative correlation. This result indicates that, starting from a common set of miRNAs, expression profiles are more useful when assessing targets predicted by TargetScan than by miRBase.
Sometimes we want to identify miRNAs that have potential interactions with a set of candidate mRNA transcripts. Thus in this experiment we first selected common mRNAs in TargetScan and miRBase databases. Then we retrieved the predicted miRNA-mRNA interactions. The corresponding probe sets in NCI-60 expression data were then collected. The correlation coefficients were computed as described. Figure shows the distribution of the significant correlation coefficients. The figure shows that 60.2% of 2,450 significantly correlated TargetScan-predicted pairs are negative, whereas 48.1% of 1,890 significantly correlated pairs are negative for miRBase. This result suggests that, given a common starting set of mRNAs, TargetScan predicted miRNA-mRNA interactions are more likely to show negative correlations and hence are possibly more suitable for the assessment using miRNA/mRNA expression profiles.
Correlations of expression profiles for validated miRNA-target pairs
To investigate whether experimentally validated miRNA-target interactions show negative correlations of expression profiles, we computed the correlation coefficients for the validated miRNA-target pairs collected by TarBase [
26]. The validated miRNA-mRNA pairs in Tarbase are divided into two classes according to their corresponding regulatory mechanisms: the translational repression and the mRNA degradation. The former class contains 96 pairs while the latter has 358 pairs. Among the 358 pairs listed by TarBase under the mRNA down-regulation or cleavage class, 274 of them have corresponding probes in the NCI-60 expression arrays. Similarly, probes in NCI-60 data can be found in 75 of the 96 translationally repressed miRNA-mRNA pairs.
Using Wilcoxon rank-sum test, the pairs involved in mRNA degradation show more negative correlations (p < 10-8) than those involved in translational repression. However, due to the multiple probe-probe correspondences, among the 349 (274+75) miRNA-target interactions we computed, 327 pairs have more than one correlation coefficient. Among the 327 pairs, the standard deviations of the correlation coefficients for 106 pairs are greater than 0.1, indicating that the profile correlation method is not reliable when applied to the validated data.
Correlation of expression profiles for predictions made by GenMiR++
GenMiR++ [
19] was developed to identify functional miRNA targets. GenMiR++ classifies putative human miRNA targets that were originally predicted by TargetScanS into confident and unsupported ones using mRNA expression profiles. The expression data were obtained from 88 common normal and cancerous tissue samples [
27,
28]. Here we are to compute Pearson correlation coefficients for such miRNA-mRNA pairs to investigate whether GenMiR++'s confident and unsupported pairs show different correlations in terms of NCI-60 expression profiles. For the 5,572 miRNA-target interactions predicted by GenMiR++, a total of 16,388 miRNA-mRNA pairs at the probe-probe level were found in NCI-60 data. Among them, 3,467 are classified into the high-confidence class by GenMiR++, 4,421 the low-confidence and 8,500 the unsupported class. The computation of Pearson correlation coefficients and the subsequent Benjamini/Hochberg correction lead to 80, 41 and 291 significant correlations for the high-confidence, low-confidence and unsupported classes, respectively. Their density plot is shown in Figure .
Looking at the GenMiR++ prediction set as a whole (see the subfigure of Figure that is labeled "All"), the density plot is similar to the one produced from TargetScanS's predictions, that is, negative correlations appear to be more dominant than positive ones. This result is expected because GenMiR++ takes TargetScanS's predictions as the input in the first place. What is worth noting is that the dominance of negative correlations seems to be more obvious in the unsupported class than in the high or low-confidence ones. While definite conclusion cannot be drawn from such a limited number of significant correlations (80 and 41 for the high and low-confidence classes, respectively), the observation might be attributed to two facts: (1) GenMiR++ and NCI-60 expression data come from different tissue samples; (2) For a given miRNA-mRNA interaction, GenMiR++ does not make its prediction using only the simple correlations in expression profiles. Instead, GenMiR++ considers all other predicted miRNA regulators of the mRNA using sophisticated inference algorithms [
19].
Among the interactions that were validated by Huang
et al.[
19], the NCI-60 data show that the expression profile of let-7b is negatively correlated with the validated targets, SMARCC1 (
r = -0.40,
p = 0.06), CDC25A (
r = -0.27,
p = 0.29) and BCL7A (
r = -0.37,
p = 0.09). Furthermore, because miRNAs having identical seed regions are classified into the same family in the current released version of TargetScan 4.1, different putative targets of a miRNA family may actually interact with different members in the family. For this reason, we decided to see whether the experimentally validated let-7b targets are correlated with other members within the let-7 family. We found highly correlated let-7 member-target pairs, including let-7f/SMARCC1 (
r = -0.49,
p = 9.41 × 10
-3), let-7a/CDC25A (
r = -0.57,
p = 9.54 × 10
-4), and let-7c/BCL7A (
r = -0.43,
p = 3.84 × 10
-2). Because a single miRNA recognition site could be targeted by miRNAs sharing the same seed sequence, we speculate that multiple miRNAs within the same family might simultaneously regulate the expression of the same target gene. As a result, the overexpression or suppression of different miRNA members in a miRNA family could lead to different changes in the expression levels of the target genes.
Associations between intronic miRNAs and host genes
We now ask whether intronic miRNAs have consistent expression profiles compared with their host genes. Here 139 probe-probe correlation coefficients representing 74 miRNA-host interactions were obtained. As shown in Figure , the distribution of those correlation coefficients is biased toward the positive correlation. Moreover, 41 significantly correlated pairs were found (Benjamini and Hochberg-adjusted p < 0.05), representing 25 interactions between 25 intronic miRNAs and 18 host genes. All of them are positively correlated (Table ). This result suggests that the transcriptional regulation of intronic miRNAs is, at least in some cases, similar to that of the mRNAs from the same host genes.
| Table 1Intronic miRNAs and their corresponding host genes show significantly correlated expression profiles. |