Between 30% and 35% of patients with stage I NSCLC relapse after tumor resection (56). Clinical trials have indicated a potential survival advantage for patients with early-stage lung cancer who receive adjuvant chemotherapy (58). However, it would be useful to identify the subset of these patients who are at low risk of relapse, to spare them the side effects of unnecessary treatment. Gene expression profiles have the potential to augment current prognostic indicators such as clinicopathological stage, K-ras mutations, poor differentiation, and a high tumor proliferative index.
Prognostic gene expression signatures for NSCLC
The Garber et al. and Bhattacharjee et al. studies found correlations between molecular subgroups of lung AD and prognosis. These findings set the stage for several studies that used supervised approaches to identify genes associated with prognosis among early-stage ADs. Supervised approaches first stratify patients by known outcome, then identify genes associated with these outcomes in a set of training samples, and finally use these genes and an algorithm to predict the outcome of additional test set samples. In 2002, Beer et al. used DNA microarrays to measure gene expression levels in 67 stage I ADs, 19 stage III ADs, and 10 non-neoplastic lung tissues. The stage I and III tumors were divided into training and test sets, and 50 genes associated with survival were identified in the training set using univariate Cox proportional-hazards regression. The expression levels of these genes were combined by a prediction algorithm into a risk index, which was then used to stratify patients into low- and high-risk groups. Survival differed significantly between the predicted low- and high-risk groups, both in the test set as a whole and in the subgroup of stage I test samples. Interestingly, stratifying patients by established prognostic markers such as K-ras mutation status did not identify subgroups with a significant difference in survival. After refinement, the predictor was validated on 84 lung AD samples from Bhattacharjee et al., and patients assigned to the low- and high-risk groups by gene expression differed significantly in survival. Since the publication of this study, several other studies have reported gene expression prognostic profiles for early-stage NSCLC (59).
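The supervised recipe described above has three steps: univariate Cox screening of genes, a weighted risk index, and a cutoff that splits patients into low- and high-risk groups. The Python sketch below illustrates that recipe under our own assumptions. It is not Beer et al.'s actual prediction algorithm, and all function names are hypothetical. Specifically, it assumes that:

- ties in survival time are handled with the Breslow approximation;
- the risk index is the sum of Cox coefficient times expression over the selected genes;
- the cutoff is the training-set median.

```python
import numpy as np


def univariate_cox(x, time, event, n_iter=50, tol=1e-8):
    """Fit a one-gene Cox proportional-hazards model by Newton-Raphson.

    Ties in event time use the Breslow approximation. Returns (beta, z),
    where z = beta / SE(beta) is the Wald statistic.
    """
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        w = np.exp(beta * x)
        score, info = 0.0, 0.0
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]          # patients still at risk at this event
            s0 = w[at_risk].sum()
            s1 = (w[at_risk] * x[at_risk]).sum()
            s2 = (w[at_risk] * x[at_risk] ** 2).sum()
            score += x[i] - s1 / s0            # gradient of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2   # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, beta * np.sqrt(info)


def select_genes(X, time, event, k):
    """Screen genes (columns of X) by |Wald z| and keep the top k."""
    fits = [univariate_cox(X[:, g], time, event) for g in range(X.shape[1])]
    z = np.array([fit[1] for fit in fits])
    top = np.argsort(-np.abs(z))[:k]
    betas = np.array([fits[g][0] for g in top])
    return top, betas


def risk_index(X, genes, betas):
    """Hypothetical risk index: sum of Cox coefficient x expression."""
    return X[:, genes] @ betas


def stratify(train_risk, test_risk):
    """Call test patients high-risk if above the training-set median (assumed cutoff)."""
    cutoff = np.median(train_risk)
    return np.where(test_risk > cutoff, "high", "low")


# Usage on synthetic data: genes 0 and 1 drive the hazard, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(160, 20))
hazard = np.exp(X[:, 0] - X[:, 1])
time = rng.exponential(1.0 / hazard)
event = np.ones(160, dtype=bool)
train, test = slice(0, 120), slice(120, 160)
genes, betas = select_genes(X[train], time[train], event[train], k=2)
calls = stratify(risk_index(X[train], genes, betas),
                 risk_index(X[test], genes, betas))
```

Fixing the cutoff on the training set and applying it unchanged to the test set keeps the test samples blind to the choice of threshold.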
One such study, by Potti et al., analyzed 89 NSCLC patients using DNA microarrays to develop a metagene prediction model capable of predicting disease recurrence. The model was more accurate than models containing clinical data alone (age, sex, tumor diameter, stage of disease, histological subtype, and smoking history) or both clinical and gene expression data. The model was 72% accurate on ACOSOG Z0030 trial samples (n=25), 79% accurate on CALGB 9761 trial samples (n=84), and 80% accurate on an independent set of stage I SCCs (n=15). As proposed by Potti et al., a randomized Phase III trial, CALGB 30506, is about to begin to evaluate the use of the metagene predictor to direct adjuvant therapy in patients with high-risk stage IA NSCLC. Although the prediction model was validated on an independent sample set, it remains unclear whether the signature reflects differences in prognosis per se or instead recognized subtypes of AD (patients with BAC were not identified). In addition, the clinical risk model did not include potentially important prognostic indicators such as tumor grade, the histological subtype of AD (67), and the mutational status of cancer-related genes such as K-ras. Finally, it is not clear whether the use of adjuvant chemotherapy differed among the patients in ways that could affect survival. The trial will hopefully answer several of the questions that the study left unaddressed.
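The metagene idea can be illustrated by collapsing a cluster of co-expressed genes into one score per patient: the cluster's first principal component. The sketch below assumes the gene cluster has already been chosen and uses a plain SVD. It does not reproduce Potti et al.'s clustering step or the predictive model they built on top of the metagenes, and the function name is hypothetical.

```python
import numpy as np


def metagene(X_cluster):
    """Summarize a gene cluster (samples x genes) as one score per sample.

    The score is the projection of each gene-centered sample onto the first
    principal component. The sign of a principal component is arbitrary, so it
    is oriented here so that higher scores mean higher average cluster expression.
    """
    X = np.asarray(X_cluster, dtype=float)
    Xc = X - X.mean(axis=0)                     # center each gene
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    score = U[:, 0] * s[0]                      # PC1 score per sample
    if np.corrcoef(score, Xc.mean(axis=1))[0, 1] < 0:
        score = -score                          # fix the sign convention
    return score


# Usage on synthetic data: five genes that share one latent expression program.
rng = np.random.default_rng(1)
latent = rng.normal(size=50)
cluster = latent[:, None] + 0.2 * rng.normal(size=(50, 5))
scores = metagene(cluster)
```

Summarizing correlated genes this way gives a downstream model a few stable inputs instead of many noisy, redundant ones.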
A study by Lu et al., published shortly after the Potti et al. study, performed a meta-analysis of 7 different datasets (10), including a previously unpublished dataset of their own, to identify a gene expression signature that predicts survival in patients with stage I NSCLC. The authors identified genes common to the microarray platforms used in all of the studies, adjusted the datasets for systematic bias, and used 197 stage I NSCLC samples from 5 of the 7 datasets to identify a 64-gene expression signature predictive of survival. The signature had greater classification power than stage, was predictive of survival among both ADs and SCCs, and accurately predicted survival in the 2 datasets not used to develop it. A subset of the 64 genes was also validated using quantitative RT-PCR and immunohistochemistry. This study demonstrates the feasibility of combining data from different Affymetrix DNA microarray platforms to increase sample size and predictive power, and to identify a robust gene expression signature predictive of survival.
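Combining datasets from different platforms starts with the two steps described above. The first is restricting to genes measured in every dataset; the second is removing dataset-level systematic differences. The sketch below assumes each dataset arrives as a list of gene identifiers plus a samples-by-genes matrix. It uses per-dataset, per-gene standardization as a simple stand-in for bias adjustment. Lu et al.'s actual adjustment method may differ, and the function name is hypothetical.

```python
import numpy as np


def merge_datasets(datasets):
    """Merge (gene_ids, samples x genes matrix) pairs on their shared genes.

    Each gene is z-scored within each dataset, which removes differences in
    location and scale between platforms or labs before samples are pooled.
    """
    common = sorted(set.intersection(*(set(genes) for genes, _ in datasets)))
    blocks = []
    for genes, X in datasets:
        col = {g: j for j, g in enumerate(genes)}
        sub = np.asarray(X, dtype=float)[:, [col[g] for g in common]]
        sd = sub.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                       # leave constant genes at zero
        blocks.append((sub - sub.mean(axis=0)) / sd)
    return common, np.vstack(blocks)


# Usage: two toy datasets that share genes "A" and "C" on very different scales.
ds1 = (["A", "B", "C"], [[1, 2, 3], [3, 4, 5], [5, 6, 7]])
ds2 = (["C", "A", "D"], [[10, 100, 0], [20, 200, 0]])
genes, merged = merge_datasets([ds1, ds2])
```

Per-dataset standardization also removes any true difference in average expression between cohorts. This is acceptable when the cohorts are assumed to be comparable, but it is a design choice rather than a neutral step.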
Chen et al. recently reported a five-gene signature capable of predicting survival among patients with NSCLC. Using DNA microarrays that measured 672 genes previously associated with invasive activity in invasive NSCLC cell lines (71), they found sixteen genes associated with survival across training and test sets. Five of these sixteen genes were correlated with survival using quantitative RT-PCR, and this subset was used to build a decision tree that stratified patients into low- and high-risk groups for recurrence. The predictor was tested on an independent set of 60 patients and on the Beer et al. dataset. The shortcomings of this study include a heterogeneous sample set comprising stage I, II, and III NSCLCs of different subtypes. In addition, Chen et al. chose to focus on a set of invasion-associated genes derived from NSCLC cell lines, which are characteristic of the tumor cells but not of the adjacent stromal tissue. Because the samples were not microdissected and contained both tumor and stromal tissue, the analysis may have missed more robust predictive genes.
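A decision tree is grown by repeatedly choosing the split that best separates outcomes. The sketch below shows that core step for a single feature, using Gini impurity as the split criterion. It is a generic CART-style illustration under our own assumptions, not the tree Chen et al. built. The input (for example, a hypothetical risk score derived from the five genes) and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels (1 = recurrence)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Threshold on one feature that minimizes the weighted Gini impurity.

    Returns (threshold, impurity); patients below the threshold go left.
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                            # cannot split between equal values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Usage: a hypothetical score that separates recurrence perfectly.
score = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
recurred = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(score, recurred)
```

A full tree applies this search to every candidate feature, keeps the best split, and recurses on each side until a stopping rule is met.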
Given the numerous studies that have identified prognostic gene expression signatures for NSCLC, one important question is whether these studies are comparable, since they used different microarray platforms, analysis techniques, and samples. The Lu et al. study discussed above and other published studies (72) have demonstrated the feasibility of combining different datasets to increase the power and robustness of a prognostic signature. Additional work has assessed the feasibility of larger studies involving multiple laboratories. Recently, a large retrospective, multi-site, blinded study by Shedden et al. collected 442 lung ADs with relevant clinical, pathological, and outcome data at 4 institutions from 6 lung cancer treatment sites to characterize the performance of several prognostic models (75). The feasibility of the study had been established previously by comparing gene expression data generated by the 4 participating institutions on the same microarray platform using a standardized protocol (76). Eight prognostic classifiers, as well as classifiers based on the work of Potti et al. and Chen et al., were developed and evaluated on designated training and blinded test subsets of the data, with variable results. Including clinical covariates improved the performance of most classifiers; more complex classifiers (those that included more genes) performed better; classifiers trained on samples of all stages performed better on stage I samples; and a small subset of the classifiers performed well on both test sets, which came from 2 different institutions.
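Evaluating a prognostic classifier on a blinded test set requires a metric that respects censored follow-up. One widely used choice is Harrell's concordance index, sketched below. We do not claim it is the exact metric Shedden et al. reported, and the function name is our own. A value of 0.5 corresponds to a random ordering of patients, and 1.0 to a perfect ranking by risk.

```python
def concordance_index(time, event, risk):
    """Harrell's C for right-censored survival data.

    A pair (i, j) is usable when patient i had an observed event and patient j
    was still under follow-up afterwards (time[j] > time[i]). The pair is
    concordant when the model assigned i the higher risk; ties in risk count 1/2.
    """
    concordant, usable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue                            # censored patients cannot anchor a pair
        for j in range(len(time)):
            if time[j] > time[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Usage: follow-up in months, 1 = death observed, 0 = censored.
months = [5, 12, 20, 30, 41]
died = [1, 1, 0, 1, 0]
predicted_risk = [0.9, 0.7, 0.6, 0.4, 0.1]
c = concordance_index(months, died, predicted_risk)
```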
The study illustrates many important points concerning the development of gene expression-based prognostic predictors for early-stage lung cancer. Although the prognostic classifiers contain different gene sets, there was some concordance between the predictions they made. This suggests that the ability of gene expression to predict prognosis is not restricted to the differential expression of a few genes, and that each classifier measures aspects of prognosis-related lung AD biology. Similar results have been seen in breast cancer, where prognostic classifiers containing different genes show high rates of concordance in their outcome predictions for individual samples (77). Interestingly, for some lung tumor samples there was complete agreement (or complete disagreement) between the classifiers and clinical outcome, whereas for other samples there was considerable heterogeneity. There are several possible explanations for these discrepancies. Lung ADs show significant histological variation and mixed subtypes, so for some samples the tissue analyzed may not accurately represent the tumor or the biological process on which a particular classifier depends. In addition, heterogeneity in tissue composition and sample processing, or inaccuracies in clinical information, may contribute to the variability in the predictions made for a particular sample. There are also potential problems with using overall survival as the endpoint for evaluating prognostic gene expression signatures. A patient with a “high-risk” tumor that is completely resected may be cured, and a patient with a “low-risk” tumor may die of a secondary condition shortly after diagnosis. In both cases, the observed outcome does not reflect the tumor biology that the classifier measures. The study addresses problems that have plagued past studies, such as small sample sizes, inconsistent clinical data, and variable sample collection. It also illustrates many of the challenges that remain in developing a prognostic gene expression signature for clinical application.
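The sketch below quantifies two things from each classifier's high- and low-risk calls: agreement between pairs of classifiers, and the per-sample heterogeneity described above. It assumes the calls are stored in a dictionary keyed by classifier name, with samples in the same order for every classifier. The function names, classifier names, and data are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(calls):
    """Fraction of samples on which each pair of classifiers makes the same call.

    calls maps classifier name -> list of "high"/"low" calls, one per sample,
    with samples in the same order for every classifier.
    """
    agreement = {}
    for a, b in combinations(sorted(calls), 2):
        same = sum(x == y for x, y in zip(calls[a], calls[b]))
        agreement[(a, b)] = same / len(calls[a])
    return agreement


def fraction_high(calls):
    """Per sample, the fraction of classifiers calling it high-risk.

    Values of 0 or 1 mean the classifiers agree; values near 0.5 mean they split.
    """
    names = list(calls)
    n_samples = len(calls[names[0]])
    return [sum(calls[m][s] == "high" for m in names) / len(names)
            for s in range(n_samples)]


# Usage with three hypothetical classifiers and four samples.
calls = {
    "A": ["high", "low", "high", "low"],
    "B": ["high", "low", "low", "low"],
    "C": ["high", "high", "low", "low"],
}
agree = pairwise_agreement(calls)
consensus = fraction_high(calls)
```

Samples whose consensus value sits near 0.5 are natural candidates for re-review of tissue composition or clinical annotation.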
In addition, the MicroArray Quality Control (MAQC) project, led by the FDA, evaluated microarray technology for use in clinical and regulatory settings. It examined the repeatability of data generated within a site, across multiple sites, and across seven different microarray platforms (78). The study found that gene expression measurements were reproducible across sites and platforms. Such reproducibility is a critical milestone in the development of gene expression biomarkers that can be used routinely in the clinic.
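One simple way to quantify inter-site reproducibility is to correlate log-transformed expression values for the same sample measured at two sites. MAQC used several metrics; the sketch below shows only a Pearson correlation on log2 values. The dictionary input format and the pseudocount are our assumptions, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def site_reproducibility(site_a, site_b, pseudocount=1.0):
    """Correlate log2 expression of one sample measured at two sites.

    site_a and site_b map gene ID -> expression; only shared genes are used.
    The pseudocount (an assumption) avoids log2(0).
    """
    genes = sorted(set(site_a) & set(site_b))
    xa = [math.log2(site_a[g] + pseudocount) for g in genes]
    xb = [math.log2(site_b[g] + pseudocount) for g in genes]
    return pearson(xa, xb)


# Usage: site B reads uniformly twice as bright, which log2 turns into an offset.
site_a = {"g1": 1.0, "g2": 4.0, "g3": 16.0}
site_b = {"g1": 2.0, "g2": 8.0, "g3": 32.0, "g4": 5.0}
r = site_reproducibility(site_a, site_b, pseudocount=0.0)
```

In this example the correlation is perfect despite a twofold intensity difference. Correlation ignores uniform offsets, so it should be read alongside a measure of systematic bias.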
Prognostic miRNA signatures for NSCLC
Prior to the work of Johnson et al. associating let-7 miRNA and RAS expression in lung cancer, Takamizawa et al. demonstrated that reduced expression of let-7 miRNA in lung cancer was associated with shortened postoperative survival. One hundred forty-three lung tissue specimens, predominantly ADs, from stage I, II, and III lung cancers were collected from patients undergoing resection. Let-7 expression was used to dichotomize patients into two groups that had significantly different survival (p = 0.0003), whether all samples or only ADs were analyzed. Patients with lower let-7 expression had a significantly worse prognosis, independent of disease stage.
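A two-group survival comparison like the one described above is commonly made with the log-rank test. The sketch below assumes patients are split at the median let-7 expression; the cutoff Takamizawa et al. actually used is not stated here. It then compares the two groups with a standard two-sample log-rank test. The function names and data are hypothetical.

```python
import math
import statistics


def split_low_expression(expr):
    """Label patients 1 if expression is below the median, else 0 (assumed cutoff)."""
    cutoff = statistics.median(expr)
    return [1 if x < cutoff else 0 for x in expr]


def logrank_test(time, event, group):
    """Two-sample log-rank test; group holds 0/1 labels.

    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    o_minus_e, var = 0.0, 0.0
    for t in sorted({tt for tt, e in zip(time, event) if e}):
        at_risk = [g for tt, g in zip(time, group) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(time, event) if e and tt == t)
        d1 = sum(1 for tt, e, g in zip(time, event, group) if e and tt == t and g == 1)
        o_minus_e += d1 - d * n1 / n            # observed minus expected deaths, group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))          # upper tail of chi-square, 1 df
    return chi2, p


# Usage with toy data: lower expression goes with earlier deaths.
let7 = [0.2, 0.3, 0.1, 0.4, 0.25, 1.2, 1.5, 0.9, 1.1, 1.3]
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
died = [1] * 10
low = split_low_expression(let7)
chi2, p = logrank_test(months, died, low)
```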
Yanaihara et al. used microarrays to quantify miRNA expression in lung tumors and found that two miRNAs, including mir-155, were significantly associated with survival in lung ADs by Kaplan-Meier survival analysis. In a multivariate Cox proportional hazards analysis that included all clinicopathological and molecular factors, increased expression of mir-155 was significantly associated with a worse prognosis. Real-time RT-PCR on an independent validation set of 32 ADs confirmed a significant relationship between mir-155 expression and survival. A subsequent study by Yu et al. used real-time PCR to measure the expression of 157 miRNAs in 112 NSCLC patients. They identified a five-miRNA signature (let-7a, mir-221, mir-137, mir-372, and mir-182) capable of predicting overall and relapse-free survival. Cox proportional hazards regression and risk-score analysis were used to identify the signature in a training set (n=56). The signature was then used to classify samples as high- or low-risk in a test set (n=56) and in an independent cohort of NSCLC samples (n=62). Overall and relapse-free survival differed significantly between the low- and high-risk groups. The signature was also a reasonable predictor of survival within subsets of samples with the same cell type or stage. Yu et al. were also able to show that modulating the levels of 4 of the 5 miRNAs altered lung cancer cell invasiveness in vitro. These results indicate that miRNA expression profiles can serve as prognostic markers for lung cancer. Future studies profiling both gene and miRNA expression across a large cohort of early-stage ADs are needed to determine whether a signature composed of miRNAs, mRNAs, or both has the greatest diagnostic and prognostic potential in lung cancer.
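Kaplan-Meier analysis, used by Yanaihara et al. and behind the survival comparisons throughout this section, estimates the probability of surviving beyond each observed event time. It accounts for patients whose follow-up ends without an event (censoring). A minimal sketch, with a hypothetical function name and toy data:

```python
def kaplan_meier(time, event):
    """Kaplan-Meier survival curve as a list of (event time, S(t)) pairs.

    S(t) is the product, over event times t_i <= t, of (1 - d_i / n_i), where
    d_i is the number of deaths at t_i and n_i the number still at risk.
    Censored patients (event 0) leave the risk set without lowering S(t).
    """
    surv, curve = 1.0, []
    for t in sorted({tt for tt, e in zip(time, event) if e}):
        n_at_risk = sum(1 for tt in time if tt >= t)
        deaths = sum(1 for tt, e in zip(time, event) if e and tt == t)
        surv *= 1 - deaths / n_at_risk
        curve.append((t, surv))
    return curve


# Usage: five patients, two of them censored (event 0).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Plotting one such curve per risk group (for example, high versus low mir-155 expression) gives the survival curves that prognostic studies compare.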