The scores obtained by each study are presented in the table "Comparison of prognostic gene expression studies." The immediately striking finding from this table is that none of the studies succeeded in showing improvement in predictive power for the gene expression signatures over and above known risk factors. In fact, the majority of the risk factors outlined by the NCCN were not even considered by most of the studies. For example, according to NCCN guidelines (4), completeness of resection is the most important decision variable after stage; it has also been shown to statistically significantly influence survival (31). However, only seven of the 16 studies (9) stated that completeness of resection was a criterion for patient selection. In addition, only three studies (10) placed sufficient importance on patient selection by adhering to consecutive enrollment of patients who had undergone complete resection and received no adjuvant therapy, and only nine studies (9) reported having used snap-frozen tissues. These points indicate that most of the studies reviewed were based on a convenience sample of patients for whom tissue was available, with limited attention to either patient selection or the collection of the patient information needed to address specific questions of therapeutic decision making.
Comparison of prognostic gene expression studies*
The most important medical question that needs to be answered by a new prognostic signature in NSCLC is whether it can identify the subset of stage IA patients who might benefit from adjuvant chemotherapy. However, only two studies (20) presented validation results for the prognostic signature separately for stage IA patients. Although the stratification results for stage IA patients in the study by Potti et al. (20) look promising, this signature failed to achieve statistical significance in a subsequent independent validation effort by Shedden et al. (11), even for stratifying stage I patients. In the only other study that presented validation results for stage IA samples (21), both the predicted low-risk and high-risk groups achieved 100% 3-year survival.
Most of the studies (9) presented overall validation results for stage I patients. The 3-year overall survival rates for stage I patients in the predicted high- and low-risk groups in the validation datasets of these studies (see the table "Three-year overall survival for stage I patients in validation datasets") show that some of the signatures succeeded in identifying high-risk stage I patients [eg, (17), studies that reported 40% or less 3-year overall survival for the high-risk group]. However, whether the signature predicted overall survival better than tumor size (ie, using information on whether the tumor stage was IA or IB) and other standard risk factors was not adequately addressed in most of these studies and hence remains unclear. Only Sun et al. (12) reported a marginal improvement in predictive accuracy for their gene expression signature over tumor size for stage I patients: the area under the receiver operating characteristic curve increased only from 0.63 to 0.67.
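To make this kind of comparison concrete, the following is a minimal sketch of how the area under the receiver operating characteristic curve could be compared between a score based on tumor size alone and a score that adds a gene expression signature. It is not the analysis of Sun et al. (12): the eight patients, the `stage_ib` indicator, and the `signature` values are invented for illustration, and the outcome is simplified to death within 3 years with no censoring.

```python
def auc(scores, events):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen patient with the event
    has a higher score than a randomly chosen patient without it;
    tied scores count as one half.
    """
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT taken from any study discussed here).
died_within_3y = [1, 1, 1, 0, 0, 0, 0, 0]
stage_ib = [1, 1, 0, 1, 0, 0, 0, 0]                    # 1 = stage IB, 0 = stage IA
signature = [0.2, 0.9, 0.8, 0.1, 0.3, 0.4, 0.6, 0.5]   # made-up signature scores

clinical_only = stage_ib
combined = [c + g for c, g in zip(stage_ib, signature)]

auc_clinical = auc(clinical_only, died_within_3y)
auc_combined = auc(combined, died_within_3y)
print(f"clinical only: {auc_clinical:.2f}, clinical + signature: {auc_combined:.2f}")
```

A difference between two such values is informative only together with an estimate of its uncertainty on independent validation data (for example, a bootstrap confidence interval), which this sketch omits.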
Three-year overall survival for stage I patients in validation datasets*
The recent large multicenter study (11) compared many genomic prognostic models with a model that used clinical covariates alone to predict overall survival in lung cancer patients. The best of the genomic models (identified as method A in the article) provided a statistically significant prognostic gradient for stage I patients in only one of the two validation sets. The authors, however, did not report whether the prognostic gradient for a combined model incorporating gene expression and clinical covariates was statistically significantly greater than that for the model containing only clinical covariates. Also, the clinical information included only age and sex (not tumor size). Separate validation for stage IA and stage IB samples was not addressed in this study.
Identification of the subset of stage IB and stage II patients who are at a low risk of disease recurrence without chemotherapy is also an important medical need. Only the study by Lu et al. (21) presented separate validation results for stage IB patients. The 3-year overall survival was 100% for the low-risk group and 70% for the high-risk group. The study by Roepman et al. (10) was the only one that reported statistical significance of the prognostic signature for validation in stage II samples. The 3-year survival rate for their low-risk stage II group was approximately 90%. These survival estimates, however, were based on very small sample sizes [38 patients in the study by Lu et al. (21) and 24 patients in the study by Roepman et al. (10)], and the authors did not compare the predictive power of their signature with that obtained using standard risk factors. None of the other studies showed results separately for stage IB or stage II samples; however, two studies (15) pointed out that the respective signatures did not statistically significantly distinguish prognosis in stage II validation samples. As pointed out in the respective publications, the lack of predictive power for stage II patients could have resulted from the small number of stage II patients in the samples.
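The effect of sample size on such estimates can be illustrated with a confidence interval for a survival proportion. The sketch below computes a Wilson score interval under the simplifying assumption of complete 3-year follow-up (no censoring), which the reviewed studies do not guarantee. The counts in the example (22 survivors among 24 patients) are hypothetical and were chosen only to match the order of magnitude of the samples discussed above.

```python
from math import sqrt


def wilson_interval(survivors, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0 or not 0 <= survivors <= n:
        raise ValueError("require n > 0 and 0 <= survivors <= n")
    p_hat = survivors / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 22 of 24 patients alive at 3 years (about 92%).
low, high = wilson_interval(22, 24)
print(f"3-year survival 22/24: approximate 95% CI {low:.2f} to {high:.2f}")
```

With about two dozen patients, an observed survival rate near 90% remains compatible with a true rate well below 80%, which is why subgroup results of this size need confirmation in larger validation sets.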
Most of the studies presented validation results on data that were not used for developing the predictive signatures. Four studies (15) developed signatures that were subsequently independently evaluated by other authors. Only the signature reported by Beer et al. (24) provided a statistically significant difference in outcome of the low-risk vs high-risk group on independent validation by Sun et al. (12). However, the signature was not statistically significantly prognostic after adjustment for clinical covariates. This validation study of the Beer et al. signature by Sun et al. (12) also included all stages of disease and reported no separate analysis of stage I or stage II patients. Sun et al. (12) also attempted to validate the signature reported by Raponi et al. (22), which provided nearly statistically significant differences (P = .09) in outcome among the predicted risk groups after adjusting for clinical covariates. However, this validation study also included patients from all stages and again, no separate validation of stage I or stage II patients was reported.
Shedden et al. (11) attempted unsuccessfully to validate the signatures reported by Chen et al. (15) and Potti et al. (20). In neither case was there convincing evidence that the signatures alone provided statistically significant risk discrimination for stage I or stage II patients. Shedden et al. (11) reported that the signature of Chen et al. (15), when combined with clinical covariates, provided statistically significant risk discrimination for one of their validation sets of stage I patients. However, in this case, the model with clinical covariates (age and sex) alone gave statistically significant discrimination, and no evidence was presented that the signature added statistically significant prognostic power to the clinical covariates.
The studies by Shedden et al. (11) and Sun et al. (12) were the only attempts at independent validation of prognostic signatures reported by others. Such attempts at independent assessment of signatures are difficult because the prognostic models are often not fully specified in the original publications; in most cases, only the list of statistically significant genes is provided. A predictive signature is not just a gene list. To enable independent confirmation of a prognostic signature, all other aspects of the predictive model, such as weights and cut points, should also be reported. Only three of the 16 studies we reviewed presented fully specified models. It is interesting that these three studies were RT-qPCR studies with simple three- to five-gene prognostic models. Two of these studies (15) specified the normalization and preprocessing steps used to apply their prognostic models to microarray data. We attempted to independently assess the prognostic signatures reported in (15) and (16) for stage IA and stage IB samples using the data of Shedden et al. (11). However, in our validation study, the signatures did not demonstrate statistically significant differences in outcome among the predicted risk groups (Figure 1; Supplementary Methods, available online).
Figure 1 Independent validation of gene expression–based prognostic signatures on stage IA (left) and stage IB (right) samples obtained from the datasets reported by Shedden et al. (11). Kaplan–Meier survival curves for the five-gene signature ...
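As an illustration of what a fully specified prognostic model involves, the sketch below bundles the gene list with the weights, the normalization step, and the cut point needed to classify a new patient. Every concrete value in it (the gene names `GENE_A` to `GENE_C`, the reference gene `REF_GENE`, the weights, and the cut point) is hypothetical and does not come from any of the reviewed studies; expression values are assumed to be on a log scale so that normalization is a subtraction.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PrognosticModel:
    """A signature specified completely enough for independent validation."""
    weights: Dict[str, float]   # per-gene coefficients
    reference_gene: str         # gene used to normalize each sample
    cut_point: float            # scores above this value are called high risk

    def risk_score(self, expression: Dict[str, float]) -> float:
        """Weighted sum of reference-normalized (log-scale) expression values."""
        needed = set(self.weights) | {self.reference_gene}
        missing = needed - set(expression)
        if missing:
            raise KeyError(f"missing measurements: {sorted(missing)}")
        ref = expression[self.reference_gene]
        return sum(w * (expression[g] - ref) for g, w in self.weights.items())

    def classify(self, expression: Dict[str, float]) -> str:
        """Return 'high' or 'low' risk according to the fixed cut point."""
        return "high" if self.risk_score(expression) > self.cut_point else "low"


# Hypothetical three-gene model; all numbers are invented for illustration.
model = PrognosticModel(
    weights={"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0},
    reference_gene="REF_GENE",
    cut_point=0.0,
)

patient = {"GENE_A": 3.0, "GENE_B": 5.0, "GENE_C": 1.5, "REF_GENE": 1.0}
print(model.risk_score(patient), model.classify(patient))
```

Publishing only the keys of `weights` corresponds to publishing a gene list: without the coefficients, the normalization, and the cut point, a second group cannot assign a new patient to a risk group in the same way as the original authors.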
In developing predictive models that use data in which the number of variables is much greater than the number of samples, it is essential to separate the data used for model development from the data used for model evaluation (33). Statistics that are computed by using the same data for model development and evaluation are called “resubstitution” statistics. The separation between the Kaplan–Meier survival curves for low- and high-risk patients of the training set used for model development is an example of a resubstitution statistic. Even though the enormous bias involved in presenting such resubstitution statistics has been repeatedly emphasized (25), presentation of resubstitution statistics has again emerged as an area of concern in our analysis, with nine studies (9) presenting such biased survival curves. We conducted a small simulation study to demonstrate the bias involved in presenting resubstitution-based estimates of prediction accuracy for prognostic models. Full details on the methodology for this simulation study are provided in the Supplementary Methods (available online). Our simulation studies show that even with completely random gene expression profiles, a prognostic model can always be developed that provides excellent associations with survival time for the training set. The poor predictive power of the model in such cases is revealed only when it is applied to independent validation data (Figure 2).
Figure 2 Kaplan–Meier survival estimates for the simulation study. *Prediction accuracy for the training and validation datasets with random gene expression profiles. For this simulation, survival data on 129 patients were obtained from Bild et al. (32 ...)
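The resubstitution bias described above can be reproduced with a few lines of code. The sketch below is not the authors' simulation, whose details are in the Supplementary Methods: it substitutes synthetic exponential survival times without censoring for the real survival data, selects genes by their correlation with survival time, and uses arbitrary assumed sizes (60 training patients, 60 validation patients, 500 genes, 10 selected genes). Gene expression is pure noise, so any apparent association is an artifact of model fitting.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n_train=60, n_valid=60, n_genes=500, top_k=10, seed=0):
    """Fit a signature to pure noise; return (training r, validation r)."""
    rng = random.Random(seed)
    n = n_train + n_valid
    # Survival times (months) drawn independently of expression: no true signal.
    times = [rng.expovariate(1 / 36) for _ in range(n)]
    # expr[j][i] = expression of gene j in patient i, pure Gaussian noise.
    expr = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_genes)]

    train_idx = list(range(n_train))
    valid_idx = list(range(n_train, n))
    t_train = [times[i] for i in train_idx]
    t_valid = [times[i] for i in valid_idx]

    # Model development uses ONLY the training patients.
    corr = [pearson([g[i] for i in train_idx], t_train) for g in expr]
    top = sorted(range(n_genes), key=lambda j: abs(corr[j]), reverse=True)[:top_k]

    def score(i):
        # Signed sum of the selected genes; higher = longer predicted survival.
        return sum((1 if corr[j] > 0 else -1) * expr[j][i] for j in top)

    r_train = pearson([score(i) for i in train_idx], t_train)  # resubstitution
    r_valid = pearson([score(i) for i in valid_idx], t_valid)  # independent data
    return r_train, r_valid


r_train, r_valid = simulate()
print(f"training (resubstitution) r = {r_train:.2f}, validation r = {r_valid:.2f}")
```

Because the genes are selected and signed using the training outcomes, the training correlation is strongly positive by construction, whereas the validation correlation fluctuates around zero. Only the second number estimates how the model would perform on new patients.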
None of the 16 studies reviewed adequately addressed the question of the predictive power that could be attained by using easily measurable clinicopathological factors for stage I samples. We attempted to analyze the predictive power of clinicopathological factors for stage I samples by using the training data from Shedden et al. (11). We developed a predictive model based on age, tumor stage (IA vs IB), and adjuvant chemotherapy (received vs not received) for stage I patients [the study by Shedden et al. (11) was among those studies that did not exclude patients receiving adjuvant chemotherapy]. Full details on the methods used for this study are provided in Supplementary Methods (available online). Statistically significant separation of the risk groups (P = .013) was obtained for the test datasets using this model (Figure 3). An unexpected finding from this analysis was the poorer outcome for stage I patients who received adjuvant chemotherapy (see the Cox regression table and Figure 4). This poorer outcome probably arises because “adjuvant chemotherapy” acts as a surrogate variable for risk factors that clinicians use to select stage I patients for chemotherapy. These risk factors unfortunately were neither recorded nor analyzed in the publications reviewed. Our analysis also emphasizes again the importance of establishing appropriate patient selection criteria for studies of gene expression–based prognostic signatures. Because the objective of such studies is to identify patients for adjuvant chemotherapy, they should be restricted to patients who do not receive adjuvant chemotherapy.
Figure 3 Evaluation of the prognostic models for stage I samples developed using clinical information alone on the Memorial Sloan-Kettering Cancer Center (MSK) and Dana-Farber Cancer Institute (CAN/DF) test datasets of Shedden et al. (11). The (two-sided) ...
Cox regression analysis of association between patient characteristics and overall survival for stage I samples*
Figure 4 Effect of adjuvant chemotherapy on survival. These survival curves are based on the combined University of Michigan Cancer Center and Moffitt Cancer Center (HLM), Memorial Sloan-Kettering Cancer Center (MSK), and Dana-Farber Cancer Institute (CAN/DF) ...
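Separation between two groups of survival curves, such as predicted high- and low-risk patients, is commonly assessed with a log-rank test. The sketch below implements the standard two-group log-rank statistic in plain Python. Which test produced the P values reported here is not stated in this section, so its use is an assumption, and the ten patients in the example are invented.

```python
from math import erfc, sqrt


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time for each patient
    events : 1 if death was observed, 0 if the patient was censored
    groups : 1 for the predicted high-risk group, 0 for the low-risk group
    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    data = list(zip(times, events, groups))
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({tt for tt, e, _ in data if e}):
        n = sum(1 for tt, _, _ in data if tt >= t)               # at risk
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)   # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e)         # deaths at t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("both groups must be at risk at some event time")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p_value


# Hypothetical follow-up data in months (NOT from any study discussed here).
times = [5, 8, 12, 15, 20, 30, 34, 36, 36, 36]
events = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

chi2, p_value = logrank_test(times, events, groups)
print(f"log-rank chi-square = {chi2:.2f}, P = {p_value:.3f}")
```

A small P value on test data shows only that the groups differ. As the adjuvant chemotherapy result illustrates, it does not show why they differ, so a variable that acts as a surrogate for unrecorded risk factors can separate the curves just as well as a causal one.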
On the basis of observations made during this review and considering previous publications on analysis and reporting recommendations for microarray studies (25), we present a set of design, analysis, and reporting practice guidelines for prognostic gene expression studies, with a focus on NSCLC.
Guidelines for prognostic factor studies in NSCLC*