In this systematic review, we found that several miR classifiers were associated with prognosis above and beyond traditional clinical and pathological metrics for a diverse group of cancers. The median hazard ratio for statistically significant classifiers that were developed and tested in separate groups of patients was 2.52, suggesting a modestly strong discriminatory ability. Moreover, we identified specific miRs that appeared repeatedly among the selected classifiers for different malignancies. This finding suggests that miR-coordinated regulatory pathways are common to many cancers.
Although these results are promising, this overview suggests several caveats and areas for improvement in miR-related translational research. There are many critical steps in the appraisal of miRs for prognostic purposes, and errors or biases can appear at all steps (64), including data acquisition, which is subject to preanalytical imprecision (65), and the large variety of statistical methods that are applied for classifier generation, which allows for a range of different results and creates opportunities for selective analysis and other reporting biases. We found that adjustment for important prognostic factors in both training and test cohorts was neither common nor standardized. Moreover, miRs that were considered in datasets for external validation were often a selected subset of those found in the training sets and were sometimes chosen without clear justification. Furthermore, most test cohorts were small, and although a large number of classifiers reached nominally statistically significant results in external validation, many had wide 95% confidence intervals, and most P values were close to the .05 threshold. A “winner’s curse” has been demonstrated in other biomarker fields: small studies are typically performed first and have large effect sizes (66), whereas subsequent larger replication studies report smaller effects or no effect at all (69).
We also found some evidence in the available data that smaller studies tended to report the largest effect estimates and that the number of statistically significant studies was probably excessive given the generally small sample sizes and relatively limited statistical power of many of these investigations. These observations are consistent with the possibility of publication bias, although other explanations such as genuine heterogeneity of effects cannot be excluded, especially given the large diversity of cancers and markers analyzed (70).
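The reasoning behind an "excess of statistically significant studies" can be sketched as a comparison of the observed number of significant studies with the number expected from their statistical power. The sketch below is illustrative only and is not necessarily the procedure used in this review. It assumes Schoenfeld's approximation for the power of a two-group survival comparison, an assumed "plausible" hazard ratio, and hypothetical event counts; the function names and all input numbers are invented for illustration.

```python
from math import comb, log, sqrt
from statistics import NormalDist

_NORM = NormalDist()

def schoenfeld_power(events, hazard_ratio, alpha=0.05, prop_high_risk=0.5):
    """Approximate two-sided power of a two-group survival comparison.

    Uses Schoenfeld's approximation; the negligible chance of a significant
    result in the wrong direction is ignored.
    """
    z_crit = _NORM.inv_cdf(1 - alpha / 2)
    signal = sqrt(events * prop_high_risk * (1 - prop_high_risk)) * abs(log(hazard_ratio))
    return _NORM.cdf(signal - z_crit)

def excess_significance(events_per_study, observed_significant, plausible_hr):
    """Return (expected number of significant studies, one-sided binomial P).

    The binomial tail uses the mean power across studies, which is a
    simplification of study-specific power.
    """
    powers = [schoenfeld_power(d, plausible_hr) for d in events_per_study]
    n = len(powers)
    expected = sum(powers)
    p_mean = expected / n
    p_value = sum(
        comb(n, k) * p_mean**k * (1 - p_mean) ** (n - k)
        for k in range(observed_significant, n + 1)
    )
    return expected, p_value

# Hypothetical inputs: five small test cohorts, all reported as significant.
hypothetical_events = [20, 25, 30, 40, 60]
expected, p_value = excess_significance(
    hypothetical_events, observed_significant=5, plausible_hr=1.5
)
print(f"expected significant: {expected:.2f} of {len(hypothetical_events)}; P = {p_value:.4f}")
```

Under these assumptions, a small P value indicates more significant results than the cohorts' power would predict. As noted above, such an excess is compatible with publication bias but also with genuine heterogeneity of effects, and the result depends strongly on the hazard ratio assumed to be plausible.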
An interesting question that remains poorly defined for the field is the number of features required for a robust classifier. Most studies in this review developed an miR classifier and carried it forward to an external test set using only a few miRs. Patnaik et al. (43) suggested that at least four miRs were required for optimal test performance when examining the area under the receiver operating characteristic curve during cross-validation, whereas Bray et al. (14) found very marginally, albeit statistically significantly, improved test performance when expanding their classifier from 10 to 42 miRs. Previous work in the field (71) suggests that differences in expression levels across genes necessarily define the size of a classifier and that a 1.5-fold increase in expression across seven different features (eg, miRs) will result in a small classification error rate between classes or populations (eg, patients with cancer who have different outcomes). Given that the majority of studies to date have externally validated classifiers with only one or a few selected miRs, it is likely that larger studies and a larger number of validated miRs will provide substantial improvement in discriminatory ability.
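The relationship between fold change, number of features, and classification error can be illustrated with a simple idealized model. The sketch below is not the calculation from the cited work (71). It assumes independent Gaussian features on the log2 scale, an identical between-class shift for every feature, a shared within-class standard deviation, two equally likely classes, and known parameters, in which case the error of the optimal linear rule is Φ(−Δ/2), where Δ is the Mahalanobis distance between class means. The function name and the within-class standard deviation of 0.5 are assumptions chosen for illustration.

```python
from math import log2, sqrt
from statistics import NormalDist

def misclassification_rate(fold_change, n_features, within_class_sd):
    """Error rate of the optimal linear rule for two equally likely classes.

    Assumes independent Gaussian features on the log2 scale that share the
    same within-class SD and the same between-class shift.
    """
    shift = abs(log2(fold_change))
    mahalanobis = sqrt(n_features) * shift / within_class_sd
    return NormalDist().cdf(-mahalanobis / 2)

# Hypothetical within-class SD of 0.5 log2 units; vary the number of miRs.
for k in (1, 3, 7, 15):
    rate = misclassification_rate(1.5, k, within_class_sd=0.5)
    print(f"{k:>2} features: error rate {rate:.3f}")
```

In this idealized setting the error rate falls as informative features are added, which is consistent with the expectation that classifiers validated with more miRs could discriminate better. Correlated features or parameters estimated from small training sets would weaken this gain.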
The miR field is evolving rapidly. The first database of miR sequences (miRBase) contained 506 entries for six organisms in 2002; by 2010, it contained 1424 entries for humans alone (2). Furthermore, miR terms are dynamic, and some miRs are “deceased” based on misclassifications due to high sequence similarities between preprocessed miRs, actual miRs, and minor variations among different miRs (72). Because platform arrays are made from probes that are specific for sequences and annotated to the most recent registry, each change can render current or previous studies inaccurate. For these reasons, we chose to use an ontological approach commonly used in gene enrichment analysis to account for some of these issues and to understand the expression of miRs across diverse cancers (73).
Although previous reviews (74) have illustrated the importance of certain cancer-associated miRs, including let-7 and miR-21, our analysis augments this literature by systematically pooling prognostic associations for classifiers across cancers. By using an ontological approach, we demonstrate that let-7 and miR-21 remained statistically significantly associated with outcome across all classifiers as well as when analyzed separately for miRs with increased (miR-21) or decreased (let-7) expression.
Specific miRs could be selected more frequently in cancer prognosis classifiers if they have large average expression differences between patients with good vs poor prognosis, if the expression values within each of these two groups have small variability, or both. There are not sufficient data to tell whether the variances of the distributions of miR values consistently tend to be higher for some miRs than for others across very heterogeneous malignancies. One possibility is that miRs that are more frequently found to be differentially expressed have smaller variances within compared groups, so that studies of them have higher statistical power to detect differences. Alternatively, it is possible that some miRs indeed tend to be more important determinants of cancer survival than others.
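The dependence of statistical power on within-group variability can be made concrete with a normal approximation for a two-group comparison of mean expression. This is a minimal sketch under stated assumptions: equal group sizes, a common within-group standard deviation, and a two-sided test at the .05 level. The function name and all numeric inputs are hypothetical and are not taken from the reviewed studies.

```python
from math import sqrt
from statistics import NormalDist

def two_group_power(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate two-sided power to detect a difference in mean expression.

    Normal approximation with equal group sizes and a common within-group SD;
    the tiny chance of significance in the wrong direction is ignored.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)
    return norm.cdf(abs(mean_diff) / standard_error - z_crit)

# Same hypothetical mean difference (0.6 log2 units) and sample size,
# but different within-group variability.
for sd in (0.5, 1.0, 2.0):
    power = two_group_power(mean_diff=0.6, sd=sd, n_per_group=20)
    print(f"within-group SD {sd}: power {power:.2f}")
```

With the mean difference and sample size held fixed, power rises as the within-group standard deviation shrinks. A miR with tightly distributed expression could therefore be selected more often even if its true effect on prognosis is no larger than that of other miRs.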
This systematic review has several limitations. The review excluded non–English-language articles, studies that analyzed fewer than 20 miRs, cross-sectional studies, studies concerning genetic alterations of a miR, and studies involving oncoviruses. Some of the outcome data were missing (eg, HR estimates were not available or possible to calculate for all studies). Similarly, information was not always provided on each analyzed miR separately. Finally, some of the primary studies did not provide full details on their design, and some selection or information biases cannot be excluded; in addition, adjustments for other factors, including potential confounders, were not standardized. This systematic review suggests that these are fronts where standardized reporting may improve the quality of this research literature in the future.
In conclusion, imperfect methodology and potential biases hinder the clinical application of miR expression testing as a biomarker of prognosis. However, the availability of many promising results suggests that larger-scale standardized investigations with robust external validation practices for the most promising miRs are likely to better define the true potential of these novel cancer biomarkers and help select miR classifiers for further successful clinical translation.