MicroRNA (miR) expression may have prognostic value for many types of cancers. However, the miR literature comprises many small studies. We systematically reviewed and synthesized the evidence.
Using MEDLINE (last update December 2010), we identified English language studies that examined associations between miRs and cancer prognosis using tumor specimens for more than 10 patients during classifier development. We included studies that assessed a major clinical outcome (nodal disease, disease progression, response to therapy, metastasis, recurrence, or overall survival) in an agnostic fashion using either polymerase chain reaction or hybridized oligonucleotide microarrays.
Forty-six articles presenting results on 43 studies pertaining to 20 different types of malignancy were eligible for inclusion in this review. The median study size was 65 patients (interquartile range [IQR] = 34–129), the median number of miRs assayed was 328 (IQR = 250–470), and overall survival and recurrence were the most commonly measured outcomes (30 and 19 studies, respectively). External validation was performed in 21 studies, 20 of which reported at least one nominally statistically significant result for a miR classifier. The median hazard ratio for poor outcome in externally validated studies was 2.52 (IQR = 2.26–5.40). For all classifier miRs in studies that evaluated overall survival across diverse malignancies, the miRs most frequently associated with poor outcome after accounting for differences in miR assessment due to platform type were let-7 (decreased expression in patients with cancer) and miR 21 (increased expression).
MiR classifiers show promising prognostic associations with major cancer outcomes, and specific miRs are consistently identified across diverse studies and platforms. These types of classifiers require careful external validation in large groups of cancer patients with adequate protection from bias.
Many small studies have examined associations between tumor expression of microRNAs (miRs)—small, noncoding, highly stable nucleic acids that regulate the translation of messenger RNA into protein—and clinical outcome for a variety of cancers. However, there has been no formal systematic synthesis of the evidence on miR prognostic markers for cancer outcomes.
A systematic review of 46 articles presenting results on 43 studies pertaining to 20 different cancers that examined associations between miRs and prognosis using tumor specimens for more than 10 patients during classifier development. Included studies used either polymerase chain reaction or hybridized oligonucleotide microarrays to assess associations with a major clinical outcome in an agnostic fashion (ie, without selection for prior credibility).
The median study size was 65 patients, the median number of miRs assayed was 328, and the most commonly measured outcomes were overall survival and recurrence. Of the 21 studies that performed external validation, 20 reported at least one nominally statistically significant result for a miR classifier. The median hazard ratio for poor outcome in externally validated studies was 2.52. Two miRs appeared repeatedly among the selected classifiers for different malignancies: let-7 (decreased expression in patients with cancer) and miR 21 (increased expression).
MiR classifiers show promising prognostic associations with major cancer outcomes. The fact that specific miRs are consistently identified across diverse studies and assay platforms suggests that regulatory pathways coordinated by miRs operate across cancers.
The review excluded non–English language articles and studies that analyzed fewer than 20 miRs, cross-sectional studies, studies concerning genetic alterations of a miR, and studies involving oncoviruses. Some biases at the study or outcome level were difficult to assess due to non-standardized reporting in the primary studies.
Insights gained from novel molecular tests may improve the ability to make accurate prognostic assessments for patients with cancer. One of the most promising new biological measurements involves the expression of microRNAs (miRs), which are small, noncoding, highly stable nucleic acids of approximately 22 nucleotides first discovered in the nematode Caenorhabditis elegans (1). Unlike messenger RNA, which serves as the template for protein translation, miRs are unique because they directly regulate this translation. To date, investigators have discovered several hundred of these oligomers (2). Furthermore, the ability of these molecules to act as gene suppressors or activators at multiple sites, their relatively limited number, and their remarkable in vivo stability make them appealing to study as clinical biomarkers in cancer and other diseases (3).
Many articles have reported promising results for miR-based prognostic classifiers in diverse malignancies. Whereas some studies test only a handful of miRs on an opportunistic basis, many others evaluate a large number of miRs in an agnostic fashion, that is, they select the most prognostic miRs from a large initial unselected set. The development of miR-based classifiers requires rigorous methods and validation processes, and it is often challenging to make sense of the evidence accumulated from complex “omics” studies to ensure that results are unbiased and well validated (4,5). This problem applies to the cancer miR literature as well, and it is accentuated by the fact that the literature comprises many, mostly small, studies among different cancer types. Currently, a formal systematic synthesis of the evidence on miR prognostic markers for cancer outcomes is lacking.
For this study, we systematically evaluated and synthesized the data from miR studies that have attempted to identify prognostic classifiers in malignancies. We describe the characteristics of studies published to date, assess study design and limitations thereof—with emphasis on the validation practices—and provide a quantitative synopsis of the performance of these biomarkers for important clinical outcomes in cancer. We further evaluate the specific miRs that are most frequently identified as important prognostic indicators of survival for diverse malignancies.
We searched MEDLINE for English language studies that analyzed associations between miRs and prognosis in cancer patients. The search strategy used the clinical queries prognosis filter, which has a sensitivity of 92% for detecting articles related to prognosis (6): (Prognosis/Broad [filter]) AND ((microRNA OR miRNA OR miR) AND (cancer OR tumor OR neoplas* OR tumour OR malignan* OR metastat* OR metastas* OR leukemia OR lymphoma OR recurren* OR tumor control OR lymph node OR response) AND (Humans[Mesh] AND English[lang])). The search was last updated to include articles published through December 31, 2010.
Studies were iteratively screened for inclusion at the title, abstract, and full-text levels. Titles or abstracts that were not clearly categorized as being eligible or not were reviewed at the next stage until a definitive assessment could be made. Articles were considered eligible if they described miR analyses in samples from patients with any type of malignancy, with the aim of evaluating major clinical outcomes. Eligible outcomes included disease progression, response to therapy, metastasis, recurrence, and overall survival. We only considered studies that analyzed a large number of miRs (>20) without specifying a priori that only a select number of miRs should be tested based on speculation about their biological importance. Studies that included such agnostic testing of multiple miRs were eligible for consideration in this review if they examined associations between these miRs and clinical outcomes in a single dataset or if they also performed validation of these associations (either by cross-validation or by external validation in a separate dataset). Articles that presented the results of studies with selective testing of one or a few miRs were eligible if they evaluated miRs associated with clinical outcomes in an earlier study that had started from the agnostic testing of a large number of miRs; these articles were then considered as efforts to validate the earlier studies. Finally, we only considered studies with a sample size exceeding 10 patients because the ability to evaluate any prognostic marker would be negligible in smaller studies.
Studies that used either quantitative real-time polymerase chain reaction (qRT-PCR) array or hybridized oligonucleotide microarray (oligoarray) platforms were eligible for inclusion. We excluded cross-sectional studies (eg, those that addressed associations with tumor stage), studies concerning genetic alterations of an miR (eg, polymorphisms or methylation patterns), and studies involving oncoviruses. Studies that defined associations with tumor histology, tumor size, tumor differentiation from normal tissue, or malignant potential without specifically examining associations with a clinical outcome were likewise excluded. Two reviewers (V. S. Nair and L. S. Maeda) identified eligible studies, and contested articles were adjudicated by a third reviewer (J. P. A. Ioannidis). The reference lists from included articles were examined to ensure that relevant studies were not missed.
We used a worksheet to record information about all studies that qualified for final inclusion. The worksheet documented relevant metrics identified during full-text retrieval of the included studies based on previous empirical evaluations of “omics” studies and information that we thought would be important in identifying systematic variation for this genre of literature (4,5,7). More specifically, the worksheet included information on the journal in which the study was published, demographics, study characteristics, array platform characteristics, technical validation procedures, statistical methods for selecting miRs, validation techniques, adjustment for other potential predictors, and clinical outcomes. The performance of miRs for prognosis was captured separately for training, cross-validation, and independent external validation settings whenever applicable.
Descriptive characteristics for studies were summarized by the median value with interquartile range (IQR) for continuous variables and by frequencies with percentages for categorical variables. MiR classifiers were defined by the authors of each study; some classifiers consisted of a single miR, whereas other classifiers combined information from multiple miRs. We abstracted the miR threshold published by the original investigators to define the association between miR expression and outcome along with the threshold metric used for this purpose. To quantify the risk of poor clinical outcome associated with each classifier, we extracted the hazard ratio (HR) from time-to-event analyses along with the 95% confidence interval for the defined miR categories. Whenever the 95% confidence interval was not directly provided, we approximated it using the hazard ratio and the P value from the log-rank test, or we estimated the variance of the natural logarithm of the hazard ratio using the formula (t1 + t2)²/[(n1 + n2) × t1 × t2], where n was the number of events per group delineated by miR expression levels and t was the total number of subjects per group (8). When the hazard ratio was not reported and it was not possible to approximate it with these calculations, we used the inverse of the ratio of the median survival times in groups that were defined by miR expression levels to approximate the hazard ratio. For consistency, all hazard ratios were expressed as values greater than 1 to denote increased risk (ie, HRs <1 were presented as the inverted ratio). No formal quantitative synthesis (meta-analysis) of these data was performed because they pertained to heterogeneous cancers, miRs, validation settings, and adjustments.
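As a minimal sketch (not the authors' code), the two confidence-interval approximations above can be written as follows. The arguments follow the definitions of n (events per group) and t (subjects per group) in the text; treating the log-rank P value as a Wald-type z test on ln(HR) is an assumption about the intended back-calculation:

```python
import math
from statistics import NormalDist

_nd = NormalDist()

def var_log_hr(n1, n2, t1, t2):
    """Approximate Var(ln HR) from events per group (n1, n2) and total
    subjects per group (t1, t2), per the formula in the text."""
    return (t1 + t2) ** 2 / ((n1 + n2) * t1 * t2)

def ci_from_variance(hr, var):
    """95% confidence interval for a hazard ratio given Var(ln HR)."""
    se = math.sqrt(var)
    return math.exp(math.log(hr) - 1.96 * se), math.exp(math.log(hr) + 1.96 * se)

def ci_from_p(hr, p):
    """Back-calculate a 95% CI from the point estimate and a two-sided
    log-rank P value (assumes a Wald-type z statistic on ln HR)."""
    z = _nd.inv_cdf(1 - p / 2)
    se = abs(math.log(hr)) / z
    return math.exp(math.log(hr) - 1.96 * se), math.exp(math.log(hr) + 1.96 * se)
```

As a sanity check, an HR of 2.0 with P = .05 yields a lower confidence bound of almost exactly 1.0, as expected for a result sitting on the significance boundary.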
For estimates of effects derived from validation studies, we evaluated whether smaller studies yielded different effects compared with larger studies by creating a funnel plot with sample size on the vertical axis and log10 hazard ratio on the horizontal axis. We used sample size for the funnel plot because there were no complete data on standard errors for all estimates. We used linear regression (9) to assess whether the sample size was associated with the log10 hazard ratio. We also calculated the expected number of positive classifiers based on the power of each study to find nominally statistically significant (P < .05) results given the number of events and assuming an effect equal to the median effect found in these analyses. The expected number was then compared with the observed number using a χ2 test to determine whether an excess of statistically significant findings existed (10,11).
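The excess-significance comparison can be sketched as follows, under the simplifying assumption (not stated in the text) that groups are roughly balanced, so that SE(ln HR) ≈ 2/√(total events); the 1-df chi-square compares the observed count of nominally significant classifiers with the count expected from power:

```python
import math
from statistics import NormalDist

_nd = NormalDist()

def power_two_sided(log_hr, total_events, alpha=0.05):
    """Approximate power to detect log_hr at two-sided alpha, assuming
    balanced groups so that SE(ln HR) ~= 2 / sqrt(total_events)."""
    se = 2.0 / math.sqrt(total_events)
    z_crit = _nd.inv_cdf(1 - alpha / 2)
    mu = abs(log_hr) / se
    # P(|Z + mu| > z_crit) under the alternative
    return (1 - _nd.cdf(z_crit - mu)) + _nd.cdf(-z_crit - mu)

def excess_significance(event_counts, observed_positive, hr=2.29):
    """Sum per-study power into the expected number of significant
    results, then form a chi-square for observed vs expected counts."""
    expected = sum(power_two_sided(math.log(hr), e) for e in event_counts)
    n = len(event_counts)
    chi2 = (observed_positive - expected) ** 2 * (1 / expected + 1 / (n - expected))
    return expected, chi2
```

With a null effect (log_hr = 0) the function returns the alpha level itself, which is a quick check that the power approximation behaves correctly.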
We also determined which specific miRs were identified most frequently across all cancers, both regardless of expression level direction and separately for those with increased or decreased expression levels. This analysis required adjusting for how many times each miR was assayed because not all known miRs were assessed up front in all studies. We therefore counted how many opportunities each miR had for selection (ie, how many times it was assayed in the original agnostic platform) and how many times each miR was identified in training classifiers used for survival analysis. The null hypothesis was that all miRs had the same probability of being selected in a prognostic classifier across any study. This expected probability is p_exp = N_cl/N_pl, where N_cl is the sum of the number of statistically significant miRs across all survival classifiers and N_pl is the sum of the number of miRs across all platforms evaluated in survival studies. We used a binomial test to determine whether the observed frequency of selection for each miR across all classifiers differed from the expected probability. In a sensitivity analysis, we also assessed variants of the same “parent” miR as separate opportunities for classifier prediction. The parent is defined here as the root number assigned to each miR without additional annotation according to the Wellcome Trust Sanger Institute’s miRBase registry (12) and represents the fundamental unique biological miR. As a hypothetical example, the parent miR of miR 0*, 0a, 0-1, and 0-3p would be coded as miR 0. We limited our analysis to overall survival because recurrence, progression, and metastasis were often studied in concert with survival and were typically not studied independently (recurrence was examined in nine studies, progression in four studies, and metastasis in one study).
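The binomial comparison above can be sketched with the standard library alone. The one-sided enrichment tail shown here simplifies the two-sided test described in the text, and the example counts are hypothetical:

```python
import math

def binom_sf(k, n, p):
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# p_exp = Ncl / Npl: under the null, every assayed miR is equally
# likely to be selected into a prognostic classifier.
def selection_p_value(times_selected, times_assayed, p_exp):
    """P value for a miR being selected at least this often by chance."""
    return binom_sf(times_selected, times_assayed, p_exp)
```

For instance, a hypothetical miR assayed on 25 platforms but selected into 6 classifiers, against an expected selection probability of 0.05, would yield an enrichment P value well below .01.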
MiR platform data were recovered from the supplementary material of the relevant studies, the Gene Expression Omnibus of the National Center for Biotechnology Information, or the European Bioinformatics Institute Array Express Database.
All calculations were performed using Excel software (for Mac 2011, Microsoft Corp, Redmond, WA) or SAS statistical software (v9.2; SAS Institute Inc, Cary, NC). Forest plots were developed using GraphPad Prism (v5; GraphPad Software, Inc, La Jolla, CA). P values are two-sided.
Of the 987 items retrieved by the MEDLINE search, 908 were excluded based on screening of the title or abstract and 79 were evaluated in full text. Of these, ultimately 46 articles (13–58) pertaining to 43 different studies (ie, unique miR classifiers associated with a clinical outcome) were considered eligible for inclusion in this review (Figure 1). One article (52) described a validation study of a classifier that had been developed but not externally validated in a previous article (18), another article (15) described a new validation study of a classifier that had already been externally validated in a previous article (14), and two other articles from the same research group (39,40) described the development and training of the same classifier and were counted as one study. We refer to these combined classifiers from here as “Calin–Visone” (18,52), “Bray–Buckley” (14,15), and “Navarro” (39,40) in the accompanying tables.
The median impact factor of the journals in which the eligible studies were published was 7.5 (IQR = 4.7–11). The first eligible article was published in the New England Journal of Medicine in 2005 and 21 different journals published this research (Table 1). The median study size was 65 patients (IQR = 34–129 patients), most studies analyzed solid tumors, and more than three times as many studies used frozen tumor tissue vs formalin-fixed paraffin-embedded tumor tissue. Hybridized oligoarrays were used by 81% of the studies to identify statistically significant miRs, whereas PCR arrays were used by 19% of the studies. The median number of miRs assayed across studies was 328 (IQR = 250–470).
In general, most studies that used oligoarray analysis reported data normalization procedures consisting of: 1) filtering probe signal intensity by mean normalization across replicates to adjust for sample variability, 2) median centering across arrays to adjust for interarray variability, and 3) log normalization to accommodate further statistical analyses (Supplementary Table 1, available online). By contrast, studies that used qRT-PCR arrays performed mean normalization for replicates based on the amplification threshold cycle (Ct) with normalization to an endogenous control to account for between-sample variability. Among the 43 included studies, 33 of the 35 oligoarray studies defined a normalization procedure for microarray data processing, with global median normalization being the most common method. Thirty-two oligoarray studies validated signal intensity by PCR, and none of the PCR array studies provided additional signal validation. Normalization using endogenous miR controls to account for between-sample variability for PCR arrays was performed in all cases, and the most commonly used endogenous miR controls were U6B and let-7. Comparative expression analysis was the dominant method used for quantifying PCR transcript expression (59).
Although the included studies used diverse statistical methods, the majority (31 [72%] of 43) provided methods to account for multiple comparisons, either by reporting a false discovery rate (Q value) (60) or an adjusted P value. The most commonly used Q value was 0.05 (n = 9) and the median adjusted P value was .005 (n = 8).
Thirty-three studies (13–19,22–28,32,33,35–37,39–50,53–55,58) performed either cross-validation or external validation, and seven studies (16,18,28,37,47,48,55) performed both cross-validation and external validation (Table 1). Among the 20 studies that performed cross-validation (14,16–19,23,26–28,36,37,41,42,45,47,48,53–55,58), the most common methods were K-means clustering with bootstrapping, leave-one-out cross-validation, and prediction analysis of microarrays (61–63). Two studies (23,24) used multiple cross-validation methods, and six studies (22,35,43,46,47,49) validated results using different model building methods for the same cohort. Of the 21 studies that carried a developed miR classifier to an external validation stage (14–19,23,27,28,32,36,37,41,42,47,48,53–55,58), 17 used a distinct previously unexamined group of patients, whereas the other four studies (16,28,37,58) that claimed to externally validate actually combined the training and test groups for final classifier modeling.
The most commonly measured outcomes were overall survival (30 studies) and recurrence (19 studies), and 10 studies examined both (Table 1). Metastasis, disease progression, and response to therapy were not commonly studied endpoints.
The included studies addressed outcomes for 20 different malignancies, with the most common being lung cancer (five studies), hepatocellular carcinoma (four studies), leukemia (four studies), and ovarian cancer (four studies) (Table 2). Several common cancers (ie, prostate, colon, and breast cancer) were evaluated in comparatively few studies. Training set sample size varied considerably. The largest training set was in a study of hepatocellular carcinoma (n = 131), and studies that addressed outcomes for gynecologic cancers generally had the smallest training cohorts. Sixteen studies assessed clinical variables in the miR training cohort, and 14 of these studies stated that these potentially prognostic variables were incorporated in multivariable adjusted analyses in the training set during miR classifier development (Table 2). Stage (n = 10 studies), age (n = 6 studies), grade (n = 5 studies), and sex (n = 5 studies) were the most commonly used adjustment factors.
Most studies assayed hundreds of miRs (range = 73–911), and the number of miRs that were found to be statistically significantly associated with an outcome of interest ranged from 1 to 42 (Table 2). Of the 43 studies, 14 used continuous miR expression levels to assess outcome without stratifying, whereas the remaining 29 stratified miR levels to assess outcomes (Supplementary Table 2, available online). When we summed the number of miRs that were statistically significantly associated with outcomes across all developed classifiers, the total number of miRs selected was 339, and of those, 150 displayed increased expression in samples of patients with cancer, 103 displayed decreased expression, and for 86, it was not clearly stated whether their expression was increased or decreased in patients with cancer.
Of the 21 studies that performed external validation, 11 carried forward to the test set all of the statistically significant miRs in the training set, whereas the remaining 10 carried forward only part of the developed classifier for external validation testing. Four of the studies that dropped part of the prognostic classifier in the test cohort did not provide a clear reason for doing so (Table 2).
The median sample size for training and test cohorts for the 21 studies that performed external validation was 62 (range = 20–131) and 55 (range = 16–147), respectively. These studies externally validated their data on a total of 1188 patients by using disease progression (one study), metastasis (two studies), recurrence (10 studies), or overall survival (14 studies) as an outcome measure (Table 3). Six of 21 studies provided externally validated data for more than one outcome. Nine studies used a classifier for outcome analysis that combined information for more than one miR, and 12 studies used only a single miR for outcome analysis. Six of these 12 studies presented separate analyses for several miRs.
To assess the effect of miR expression levels on patient outcome, studies performed survival analysis by regression modeling or Kaplan–Meier plots against a threshold miR expression level in most test sets except for one study where the threshold was not explicitly stated (52). Single miR classifiers used the median (n = 10 studies), mean (n = 1 study), or some other (n = 1 study) expression threshold to dichotomize expression levels. Composite classifiers used a “summary” score for further analysis and selected various cutoffs (Supplementary Table 2, available online).
For these 21 externally validated studies, 45 cancer-specific outcomes were assessed (Table 3). We imputed hazard ratios for 16 of the 45 outcomes reported in these 21 studies and 95% confidence intervals for 19 of the 45 outcomes for which this information was not explicitly provided. For 14 of the 45 outcomes, we could extract point estimates only. Examining unadjusted hazard ratios (unless only an adjusted value was available) revealed that 16 of the 45 miR classifiers were not statistically significantly associated with outcome (Table 3). The median hazard ratio in external validation studies that reported statistically significant results at the P = .05 level for which we were able to extract or approximately calculate a hazard ratio (n = 25) was 2.52 (IQR = 2.26–5.40; range = 1.39–19.7) (Figure 2). Fifteen of these 25 estimates had a P value less than .01. For all 45 reported hazard ratios, regardless of statistical significance, the median hazard ratio was 2.29 (IQR = 1.39–3.28) (Table 3).
Twelve studies adjusted for potentially prognostic clinical and pathological variables during multivariable regression in test cohorts (Table 3). Of the six studies that presented hazard ratios and 95% confidence intervals for both unadjusted and adjusted classifiers, only one (28) reported a substantially smaller hazard ratio with adjustment, whereas in the other studies (16,37,47,53,58), there was no substantial change or an increase in the hazard ratio estimate with adjustment. Only three studies performed adjusted and unadjusted analysis for their miR classifier in both training and test cohorts: two studies (47,53) adjusted for the same variables in both the test and training cohorts, and one study (45) adjusted for different variables.
As shown in Figure 3, there was a suggestion that large effect sizes were seen predominantly in smaller studies (P = .08 for the correlation between log10 hazard ratio and sample size). Moreover, we estimated the statistical power of 40 of the 45 outcome analyses to detect the median hazard ratio of 2.29 (we excluded five outcome analyses for which the number of events was missing and could not be approximated). Power calculations showed that the expected number of outcomes with nominally statistically significant results among the 40 available was 17.9 vs the 25 observed (P = .02 for excess statistical significance of observed vs expected positive classifiers).
Thirty studies evaluated overall survival as a clinical outcome in a training cohort, and for 27 of these studies, we were able to extract data for the full miR profiles tested (Supplementary Tables 3 and 4, available online). A total of 257 prognostic miRs from the original 339 miRs were identified in these 27 classifiers, and 20 miRs (7.8%) were identified statistically significantly more frequently than expected at an alpha level of 5%. Four miRs (let-7, 21, 100, and 125) were statistically significantly associated with overall survival at an alpha level of 1% (Table 4). In addition, four miRs (20, 21, 155, and 193) appeared more frequently than expected as miRs with increased expression and four miRs (let-7, 29, 30, and 7039) appeared more frequently than expected as miRs with decreased expression at an alpha level of .01. Let-7 and miR 21 were commonly part of classifiers associated with poor outcome, both overall and when miRs were grouped by direction of expression level. Expression of let-7 was decreased in a variety of tumors, including gastric, lung, liver, and ovarian carcinomas. Expression of miR 21 was increased consistently in lung, ovarian, and colon cancer as well as in astrocytoma. Additional sensitivity analyses for fully annotated miRs are presented in Supplementary Tables 5–8 (available online).
In this systematic review, we found that several miR classifiers were associated with prognosis above and beyond traditional clinical and pathological metrics for a diverse group of cancers. The median hazard ratio for statistically significant classifiers that were developed and tested in separate groups of patients was 2.52, suggesting a modestly strong discriminatory ability. Moreover, we identified specific miRs that appeared repeatedly among the selected classifiers for different malignancies. This finding suggests that miR-coordinated regulatory pathways are common to many cancers.
Although these results are promising, this overview suggests several caveats and areas for improvement in miR-related translational research. There are many critical steps in the appraisal of miRs for prognostic purposes, and errors or biases can appear at all steps (64), including data acquisition, which is subject to preanalytical imprecision (65), and the large variety of statistical methods that are applied for classifier generation, which allows for a range of different results and creates opportunities for selective analysis and other reporting biases. We found that adjustment for important prognostic factors in both training and test cohorts was neither common nor standardized. Moreover, miRs that were considered in datasets for external validation were often a selected subset of those found in the training sets and were sometimes chosen without clear justification. Furthermore, most test cohorts were small, and although a large number of classifiers reached nominally statistically significant results in external validation, many had wide 95% confidence intervals, and most P values were close to the .05 threshold. A “winner’s curse” has been demonstrated in other biomarker fields: small studies are typically performed first and have large effect sizes (66–68), whereas subsequent larger replication studies report smaller effects or no effect at all (69).
We also found some evidence in the available data that smaller studies tended to report the largest effect estimates and that the number of statistically significant studies was probably excessive given the generally small sample sizes and relatively limited statistical power of many of these investigations. These observations are consistent with the possibility of publication bias, although other explanations such as genuine heterogeneity of effects cannot be excluded, especially given the large diversity of cancers and markers analyzed (70).
An interesting question that remains poorly defined for the field is the number of features required for a robust classifier. Most studies in this review developed an miR classifier and carried it forward to an external test set using only a few miRs. Patnaik et al. (43) suggested that at least four miRs were required for optimal test performance when examining the area under the receiver operating characteristic curve during cross-validation, whereas Bray et al. (14) found very marginally, albeit statistically significantly, improved test performance when expanding their classifier from 10 to 42 miRs. Previous work in the field (71) suggests that differences in expression levels across genes necessarily define the size of a classifier and that a 1.5-fold increase in expression across seven different features (eg, miRs) will result in a small classification error rate between classes or populations (eg, patients with cancer who have different outcomes). Given that the majority of studies to date have externally validated classifiers with only one or a few selected miRs, it is likely that larger studies and a larger number of validated miRs will provide substantial improvement in discriminatory ability.
The miR field is evolving rapidly. The first database of miR sequences (miRBase) contained 506 entries for six organisms in 2002; by 2010, it contained 1424 entries for humans alone (2). Furthermore, miR terms are dynamic, and some miRs are “deceased” based on misclassifications due to high sequence similarities between preprocessed miRs, actual miRs, and minor variations among different miRs (72). Because platform arrays are made from probes that are specific for sequences and annotated to the most recent registry, each change can render current or previous studies inaccurate. For these reasons, we chose to use an ontological approach commonly used in gene enrichment analysis to account for some of these issues and to understand the expression of miRs across diverse cancers (73).
Although previous reviews (74,75) have illustrated the importance of certain cancer-associated miRs, including let-7 and miR 21, our analysis augments this literature by systematically pooling prognostic associations for classifiers across cancers. By using an ontological approach, we demonstrate that let-7 and miR 21 remained statistically significantly associated with outcome across all classifiers as well as when analyzed separately for miRs with increased (miR 21) or decreased (let-7) expression.
Specific miRs could be selected more frequently in cancer prognosis classifiers if they have large average expression differences between patients with good vs poor prognosis, if the expression values within each of these two groups have small variability, or both. There are not sufficient data to tell whether the variances of the distributions of miR values consistently tend to be higher for some miRs than for others across very heterogeneous malignancies. One possibility is that miRs that are more frequently found to be differentially expressed have smaller variances within compared groups and thus confer higher statistical power to detect differences. Alternatively, it is possible that some miRs indeed tend to be more important determinants of cancer survival than others.
Limitations of this systematic review are as follows. The review excluded non–English-language articles, studies that analyzed fewer than 20 miRs, cross-sectional studies, studies concerning genetic alterations of a miR, and studies involving oncoviruses. Some of the outcome data were missing (eg, hazard ratio estimates were not available or calculable for all studies). Similarly, information was not always provided on each analyzed miR separately. Finally, some of the primary studies did not provide full details on their design, so some selection or information biases cannot be excluded, and adjustments for other factors, including potential confounders, were not standardized. This systematic review suggests that these are fronts where standardized reporting may improve the quality of this research literature in the future.
In conclusion, imperfect methodology and potential biases hinder the clinical application of miR expression testing as a biomarker of prognosis. However, the availability of many promising results suggests that larger scale standardized investigations with robust external validation practices for the most promising miRs are likely to better define the true potential of these novel cancer biomarkers and help select miR classifiers for further successful clinical translation.
This work was supported by the National Institutes of Health (5T32HL00794 to V.S.N.).
The authors take full responsibility for the study design, the data collection, the analysis and interpretation of the data, the decision to submit the article for publication, and the writing of the article.