Gene expression profiling using oligonucleotide microarrays enables a genome-wide analysis of transcriptional profiles that are associated with specific clinical phenotypes of human malignancies. Several recent studies have identified gene expression profiles that discriminate benign prostate epithelium from prostate carcinoma, tumors of different Gleason grades, and primary from metastatic prostate carcinoma.8,12,13,15,16
High-throughput techniques such as oligonucleotide microarrays are potentially powerful tools to identify, in a relatively unbiased manner, molecular signatures associated with progressive prostate carcinoma after definitive local therapy. These prognostic molecular signatures may more accurately reflect tumor biology than clinicopathologic parameters and may enhance the ability to predict the outcome of patients treated by RP. However, few studies have analyzed gene expression profiles among primary tumors associated with prostate carcinoma progression.14
In a gene expression analysis of primary tumors, we identified molecular signatures of 5–8 genes associated with recurrent prostate carcinoma that accurately predicted disease recurrence in 75% of tissue samples. However, the predictive accuracy of the gene models was inferior to that of a validated model based on standard variables currently used in clinical practice. Predictive accuracy was significantly enhanced when prognostic genes identified by molecular profiling were combined with the postoperative nomogram prediction. Validation of these modeling approaches in an independent expression data set is required to evaluate their potential clinical applicability. This novel approach of integrating clinical and molecular variables to predict cancer progression, however, may provide a new paradigm for the use of expression profiling to predict clinical outcome for all malignancies.
To identify prognostic genes for incorporation into our gene and combined models, the LOOCV procedure was used. The LOOCV reduces the likelihood of developing an overly optimistic predictive model (i.e., overfitted to the tissue samples within our cohort) by selecting genes associated with disease recurrence only within the 78 tissue samples of the training set. The validation set consists of one tissue sample that is excluded at each step of model development. Although the LOOCV procedure likely represents the optimal use of the data given the relatively small sample in our study, it is no substitute for external validation on an independent sample set. Independent validation of these models is required to more accurately assess their performance relative to the nomogram and to determine if they are suitable for incorporation into clinical practice. However, due to the difficulty in obtaining sufficient quantities of cancerous tissue from frozen prostatectomy specimens and the long-term follow-up required to determine recurrent cases, it is difficult to obtain an appropriate validation set of sufficient size and with sufficient clinical follow-up. To our knowledge, there are no publicly available prostate carcinoma gene expression data sets with the required size, clinical information, and follow-up for independent testing.
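The key point of the procedure above, that genes are selected only within the training samples of each fold, can be illustrated with a minimal sketch. The ranking statistic (a standardized difference in class means), the nearest-centroid classifier, and the names `rank_genes` and `loocv` are illustrative assumptions, not the methods used in this study.

```python
from statistics import mean, pstdev


def rank_genes(samples, labels, k):
    """Return indices of the k genes with the largest standardized
    difference in mean expression between recurrent (1) and
    nonrecurrent (0) samples. Illustrative statistic only."""
    scores = []
    for g in range(len(samples[0])):
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-9  # avoid division by zero
        scores.append((abs(mean(pos) - mean(neg)) / spread, g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def loocv(samples, labels, k=3):
    """Leave-one-out cross-validation with gene selection repeated
    inside every fold, so the held-out sample never influences
    which genes enter the model."""
    predictions, selected = [], []
    for i in range(len(samples)):
        # Training set: every sample except the one held out.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = rank_genes(train_x, train_y, k)

        # Assumed classifier: nearest class centroid on the selected genes.
        centroid = {}
        for cls in (0, 1):
            rows = [s for s, y in zip(train_x, train_y) if y == cls]
            centroid[cls] = [mean(r[g] for r in rows) for g in genes]

        held_out = [samples[i][g] for g in genes]
        dist = {
            cls: sum((a - b) ** 2 for a, b in zip(held_out, centroid[cls]))
            for cls in (0, 1)
        }
        predictions.append(min(dist, key=dist.get))
        selected.append(genes)
    return predictions, selected
```

Recording the genes chosen in each fold (`selected`) also makes it possible to tally how consistently the same genes are picked across the fold-specific models.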
We identified 153 genes that exhibited significant expression differences between recurrent and nonrecurrent primary tumors. These expression differences are relatively few compared with the results of our previous analysis of nonrecurrent primary and metastatic tumor samples.12
In a recent study, Singh et al.14 were unable to identify any genes that exhibited significant differential expression between recurrent and nonrecurrent tumors, although their sample was considerably smaller than the sample in the current study.14
We used the differences in gene expression between the two classes to develop models that predict disease recurrence after RP with 75% accuracy in LOOCV. A final model will eventually be tested on an independent validation set. However, that EI24, and MAP4K4 were chosen as the first 3 variables in 78 of 79 models suggests that the final model may have similar classification accuracy when applied to a validation set. It is noteworthy that 3 of the 4 most commonly selected genes in these models (EI24) have not been previously implicated in prostate carcinoma, whereas EPB49 may have a role as a tumor suppressor gene in prostate carcinoma.27
Singh et al.14 developed a similar model for disease recurrence after RP using a supervised machine learning algorithm based on 5 genes that classified 19 of 21 tissue samples (90%) correctly in LOOCV. Although this model performed well relative to clinical variables, it was not compared with a validated multivariable model such as the nomogram. Nonetheless, these two examples illustrate that prostate carcinoma can be accurately classified with respect to outcome based exclusively on gene expression differences.
Recently, outcome prediction models have been developed for breast and lung carcinoma using molecular profiling and have been shown to perform well, independent of clinical variables.28,29 These molecular models are promising for integration into clinical practice as accurate clinical models for these cancers are lacking. Fortunately, accurate prediction models based on standard variables exist for prostate carcinoma recurrence after RP, external-beam radiotherapy, and brachytherapy.4,30,31 As we observed with our modeling approach using gene expression differences alone, outcome prediction based on molecular profiling may not significantly improve on models that are based on the optimal combination of clinical variables. Although we were able to predict outcome at a high level of accuracy using molecular profiling, this approach did not generate models that outperformed the nomogram.
To our knowledge, no previous study has attempted to integrate prognostic genes identified by molecular profiling with validated models based on clinical variables. The diverse information of clinical and molecular variables is likely to provide a broader assessment of factors that are associated with cancer progression. Based on LOOCV, the predictive accuracy of the combined modeling approach, measured by the c-index (0.89), was superior to the postoperative nomogram (0.84) and the approach using gene variables alone (0.75). This suggests that models incorporating both clinical variables and gene expression information provide greater predictive accuracy than models based on either set of variables alone.
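For readers unfamiliar with the c-index, the following minimal sketch shows how a concordance index of this kind is commonly computed from follow-up times, recurrence indicators, and predicted risks. It follows the standard Harrell definition; the text does not specify how the c-index was calculated in this study, so the pair-selection rule and the function name `concordance_index` are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell-style c-index.

    times  : follow-up time for each patient
    events : 1 if recurrence was observed, 0 if censored
    risks  : predicted risk (higher means earlier recurrence expected)

    A pair is usable when the patient with the shorter follow-up had an
    observed recurrence. The pair is concordant when that patient was
    also assigned the higher predicted risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs; c-index is undefined")
    return concordant / usable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking of patients, which is the scale on which the values 0.75, 0.84, and 0.89 above should be read.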
Overall, the integration of gene expression profiling and clinical variables produced a model that had a significant but modest improvement in predictive accuracy over the nomogram. However, a substantial improvement in the classification of patients whose nomogram predictions were in the middle range (7-year PFP, 30–70%) was achieved with the combined modeling approach. The combined models accurately classified 85% of these patients (c-index 0.85) and performed significantly better than the nomogram (c-index 0.59). The nomogram is useful for discriminating among patients at the extremes of predictions, but the anticipated outcome is indeterminate for patients whose probability of disease recurrence is in this middle range. A potential clinical application of a combined model is to distinguish patients who will experience disease recurrence from those who will not when the nomogram prediction is in this middle range, which represents approximately 30% of patients who undergo RP for clinically localized prostate carcinoma.
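The middle-range comparison described above can be sketched as follows. This is an illustration under stated assumptions: the 30% and 70% bounds are treated as inclusive, recurrence is treated as a binary outcome without censoring, nomogram risk is taken as one minus the predicted 7-year PFP, and the function names are hypothetical.

```python
def middle_range(pfp, low=0.30, high=0.70):
    """Indices of patients whose nomogram-predicted 7-year PFP lies
    in the indeterminate middle range (bounds assumed inclusive)."""
    return [i for i, p in enumerate(pfp) if low <= p <= high]


def binary_concordance(recurred, risk):
    """Fraction of (recurrent, nonrecurrent) patient pairs in which the
    recurrent patient received the higher risk; ties count as 0.5."""
    pairs = 0
    concordant = 0.0
    for i, yi in enumerate(recurred):
        for j, yj in enumerate(recurred):
            if yi == 1 and yj == 0:
                pairs += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one recurrent and one nonrecurrent patient")
    return concordant / pairs


def compare_in_middle_range(pfp, combined_risk, recurred):
    """Return (nomogram concordance, combined-model concordance)
    restricted to middle-range patients."""
    idx = middle_range(pfp)
    outcome = [recurred[i] for i in idx]
    nomogram_risk = [1.0 - pfp[i] for i in idx]  # higher PFP means lower risk
    combined = [combined_risk[i] for i in idx]
    return (
        binary_concordance(outcome, nomogram_risk),
        binary_concordance(outcome, combined),
    )
```

Restricting the comparison to this subgroup isolates the patients for whom the nomogram alone is least informative, which is where an added molecular signal would be expected to matter most.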
Pending the independent validation of a final combined model, the approach we have taken to integrate gene expression information with clinical variables may provide a new paradigm for the use of molecular profiling to predict clinical outcome for all malignancies. For prostate carcinoma, we believe that the optimal predictive model must be based, in part, on clinical variables. A patient’s prognosis after RP is dependent on technical factors in addition to the inherent biologic properties of his cancer (reflected by serum PSA level, Gleason grade, and pathologic stage). Surgical margins have been reported to be positive in 5–53% of patients and increase the risk of disease recurrence by up to 4-fold in multivariable analysis. Margin status is included in the postoperative nomogram.1,2,4,32 The risk of positive surgical margins in prostatectomy specimens is associated with the clinical features of prostate carcinoma as well as the technique used by individual surgeons.33 The prognostic information of molecular profiling may reflect the biologic potential of prostate carcinoma better than tumor grade, stage, and PSA level, but it does not capture the prognostic importance of technical factors.
Few of the genes that exhibited significant differences in expression in our microarray analysis of recurrent and nonrecurrent primary tumors have been previously implicated in prostate carcinoma. Nonetheless, their proposed functional properties are intriguing. Accumulating evidence suggests that oxidative genomic DNA damage is responsible for the molecular events that lead to the development and progression of prostate carcinoma.34 GSTP1 are important carcinogen detoxification enzymes and both were significantly underexpressed in our recurrent tumor specimens. EI24, the most highly overexpressed gene in recurrent tumors, is believed to be a direct target of TP53 transcriptional activation and is responsible for the formation of reactive oxygen species leading to apoptosis.35,36 Uncoordinated activity of EI24 may be a potential mechanism contributing to genomic instability and prostate carcinoma progression via oxidative DNA damage, or it may be a sign of uncoordinated apoptotic pathways that have been described in a number of malignancies.37 EPB49, the most highly underexpressed gene in recurrent tumors, was the most frequently selected gene in the combined models. The EPB49 gene has been localized to chromosome 8p21.1 (a region frequently deleted in prostate carcinoma) and encodes an actin-binding/bundling protein involved in the regulation of cell shape.27 MAP4K4 was the second most highly overexpressed gene in our recurrent tumor specimens. MAP4K4 is overexpressed in tumor cell lines and may be an upstream activator of the c-jun N-terminal kinase pathway responsible for activation of several transcription factors.38 These initial observations deserve further investigation to clarify the potential roles of these genes in prostate carcinoma progression.