Several studies of primary lung adenocarcinoma or NSCLC have reported expression signatures capable of grouping subjects according to their survival outcomes. However, most of these studies were small (approximately 100 subjects or fewer) and typically drew data from a single treatment institution. Gene expression profiles with real clinical applicability must be recognizable despite variability that might occur in the processing of samples at different institutions. So far, little has been published on the ability of prognostic methods for lung cancer to perform in larger data sets or with independent validation samples. Often the published signatures show little overlap in the genes identified as significant predictors of outcome. Thus, there is a strong possibility that sample collection methods, processing protocols, single-institution subject cohorts, small sample sizes, and peculiarities of the different microarray platforms are contributing significantly to the results. To address these issues, a multi-institutional collaborative study was conducted to generate gene expression profiles from a large number of samples with a priori determined clinical features that could be used to fully evaluate proposed prognostic models for potential clinical implementation.
The present study was designed and executed with the specific issues discussed above in mind. Significant emphasis was placed on reducing technical variability by using similar protocols, reagents and platforms [12], so that the major uncontrolled variables represent the biology of the lung cancers and associated clinical data. The sample sizes used for training and validation were determined to be sufficient, and two blinded external validation sets were used to provide a realistic assessment of the performance of each prognostic method. This is in contrast to the more common approach of obtaining all the data from a single source and randomly assigning samples to training and validation sets for the development and assessment of classifiers. Furthermore, great care was taken to standardize the pathological assessment of each tumor sample and the collection of clinical information across all institutions involved in this study. The lessons learned from this coordinated effort will likely influence the research practice for future profiling efforts in lung cancer.
Several classifiers were developed from the training data and tested on the independent data sets. These classifiers represent many of the established techniques for classifier development, with novel approaches also represented. The classifiers had various levels of success in stratifying subjects according to risk. Two of the methods (C and E) showed little predictive capacity. The poor performance of method E was expected, because one individual gene parameter is too sensitive to noise to perform well in gene expression data collected from multiple institutions. More complex classifiers showed better success, with a few classifiers demonstrating the ability to classify across different institutional data sets and within the stage 1 tumors. The most successful classifiers at stratifying stage 1 samples were trained on samples from all stages. This suggests that heterogeneity of aggressiveness exists in stage 1 tumors, and that the pattern of gene expression in higher stage tumors is informative for predicting the risk of stage 1 tumors. We note that the power for comparing classifiers tends to be lower than the power for identifying differentially expressed genes. This study was not adequately powered to draw sharp lines between the performances of different classifiers.
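One common way to quantify how well a classifier's risk score orders subjects by outcome is Harrell's concordance index. The following Python sketch is illustrative only: it is not the evaluation procedure used in this study, and the function name `concordance_index` and the toy follow-up data are assumptions introduced for the example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs, the fraction in which the
    subject with the higher risk score has the shorter survival time.

    times       -- follow-up times (hypothetical units, e.g. months)
    events      -- 1 if death was observed, 0 if the subject was censored
    risk_scores -- classifier output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # time corresponds to an observed event (not a censored subject).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration, not taken from the study).
toy_times = [5, 12, 20, 30, 44]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.3, 0.4, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Because the index is estimated from a limited number of comparable pairs, differences between two classifiers evaluated on the same modest test set are often within sampling noise, which is in line with the point above about limited power for comparing classifiers.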
Method A, which worked with all tumor samples or with stage 1 samples alone, both with and without clinical covariates, showed the best overall predictive ability. Method H also had good performance without clinical covariates. The genes in these classifiers may provide insight into the biology of aggressive tumors. Method A relied on the correlated expression of multiple gene clusters to predict subject outcome. Relatively higher expression of genes in cluster 6 of method A (545 genes) was associated with poor subject outcome. This cluster included cell proliferation-related genes such as cyclin A (CCNA2) and other cyclins, BUB1B, topoisomerases, checkpoint genes (CHEK1), and chromosomal and spindle protein genes. Method H also relied heavily on these genes for classification. This is consistent with elevated cell proliferation and loss of cell cycle control being associated with poor outcomes [7]. Greater expression of genes in cluster 4 of method A (262 genes), cluster 5 (82 genes), and cluster 12 (427 genes) was associated with better survival. Cluster 4 includes several differentiation-related genes such as thyroid transcription factor 1 (TITF1) and pulmonary-associated surfactant protein B (SFTPB), as well as G protein-coupled receptor 116 (GPR116) and MAP3K12 binding inhibitory protein 1 (MBIP), while cluster 12 contains many immunology-related genes. This is consistent with tumors showing some aspects of recognition by the subject’s immune system having better outcomes [14]. The variety of genes found useful for classification suggests that multiple mechanisms contribute to the clinical progression of lung adenocarcinomas and that multiple classifiers may be equally effective.
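The general idea of predicting outcome from gene clusters, rather than from individual genes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual procedure of method A, whose details are not given in this section. The cluster memberships below use only a few gene symbols named above; the expression values, the signed weights, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical, heavily abbreviated clusters built from genes named in the text.
CLUSTERS = {
    "proliferation": ["CCNA2", "BUB1B", "CHEK1"],     # higher -> poorer outcome
    "differentiation": ["SFTPB", "GPR116", "MBIP"],   # higher -> better outcome
}

# Hypothetical signed weights; a real model would estimate these from training data.
WEIGHTS = {"proliferation": 1.0, "differentiation": -1.0}


def cluster_means(expression, clusters):
    """Average the expression of each cluster's genes within one sample."""
    return {
        name: mean(expression[gene] for gene in genes if gene in expression)
        for name, genes in clusters.items()
    }


def risk_score(expression, clusters=CLUSTERS, weights=WEIGHTS):
    """Weighted sum of cluster averages; a higher score means higher predicted risk."""
    means = cluster_means(expression, clusters)
    return sum(weights[name] * means[name] for name in weights)


# Made-up log-scale expression values for two illustrative samples.
sample_high_risk = {"CCNA2": 8.0, "BUB1B": 7.0, "CHEK1": 6.0,
                    "SFTPB": 5.0, "GPR116": 4.0, "MBIP": 3.0}
sample_low_risk = {"CCNA2": 4.0, "BUB1B": 5.0, "CHEK1": 6.0,
                   "SFTPB": 9.0, "GPR116": 10.0, "MBIP": 11.0}

score_high = risk_score(sample_high_risk)
score_low = risk_score(sample_low_risk)
```

Averaging over many correlated genes dampens the noise of any single probe, which is one plausible reason a cluster-based score could transfer across institutions better than a single-gene parameter.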
This study provides a realistic assessment of the challenges in developing prognostic models for early stage lung cancer. A significant degree of outcome prediction accuracy was observed using gene expression data alone, yet the hazard ratios for most of our models increase with the inclusion of clinical data. Conversely, adding gene expression data improves on the predictive performance of clinical parameters alone (method I), as seen in the comparison with method A, which uses both gene expression and clinical variables. We note that even this uniquely large study was not adequately powered to make comparisons between classification methods with high statistical confidence. Nevertheless, some interesting trends emerge. For the all-stage analysis, method I (clinical variables only) was competitive with most of the procedures using gene expression data without clinical variables, consistent with gene expression largely recapitulating stage. However, it is notable that method A with covariates performs substantially better on the CAN/DF samples than either method A without covariates or method I. In the stage 1 analysis, the clinical variables reduce to age and sex. In the MSK test set, these variables are uninformative about disease risk, so the fact that gene expression appears to risk-stratify subjects in method A is important. The predictive performance of method I in the stage 1 CAN/DF test set is driven by a strong association with age. However, it is unclear how far this relationship will generalize. Therefore, an integrated approach using gene expression together with associated clinical, pathological, and other information may be more promising for future work, as has previously been pointed out in studies examining prostate and breast cancer [15,16]. Although the slightly better results of method A across the hypotheses and test sets cannot be attributed to specific classifier properties, we note that method A used substantially more genes than the other approaches and incorporated an initial gene clustering procedure. These properties may have contributed to its more consistent performance. We have provided a detailed discussion of the challenges in using gene expression profiling for lung cancer prognosis in practice in Supplementary Materials section 2. Our findings suggest that clinical covariates should be collected with the same care as is used for obtaining gene expression signatures.
The present study was designed to address three key issues in the field of gene expression based outcome prediction. First, this study provides the largest gene expression data set with pathological and clinical annotation for lung adenocarcinomas to date. Because of the large sample size, additional analyses of prognostic genes associated with specific histological subtypes, such as bronchioloalveolar carcinomas, can now be undertaken. Extensive pathological and mutational annotation of each specimen is ongoing, and this careful assessment will provide an extremely valuable resource for hypothesis generation. Second, these data were used to test in a rigorous manner the current methodologies used to predict tumor biology and, by inference, subject prognosis from gene expression signatures. Finally, this study was used to identify issues relevant to the use of gene expression profiles that should be taken into consideration in designing future studies. We had observed previously [12] that the biological variation between tumors exceeds the technical variation introduced by microarray analysis. We have observed in this study that clinical covariates improve upon gene expression alone as a mechanism for stratifying tumor samples. We have also learned that coordinating the collection of clinical and pathological data across several institutions is an important task for prospective studies designed to further refine prognostic signatures. There are also limitations in using overall survival as an endpoint, which may be overcome by using time to tumor recurrence as the primary endpoint instead. Although significant challenges remain for the use of gene expression-based classifiers in the clinical setting, the potential of these tools to improve subject care and increase survival provides a strong impetus to continue refining these approaches for eventual clinical use.