|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: YXL WZ. Performed the experiments: YXL YS. Analyzed the data: YXL YS RB JSL AKS IS WZ. Contributed reagents/materials/analysis tools: YXL WZ. Wrote the paper: YXL YS RB JSL AKS IS WZ. Designed the software used in analysis: YXL.
Small sample sizes used in previous studies result in a lack of overlap between the reported gene signatures for prediction of chemotherapy response. Although morphologic features, especially tumor nuclear morphology, are important for cancer grading, little research has been reported on quantitatively correlating cellular morphology with chemotherapy response, especially in a large data set. In this study, we have used a large population of patients to identify molecular and morphologic signatures associated with chemotherapy response in serous ovarian carcinoma.
A gene expression model that predicts response to chemotherapy is developed and validated using a large-scale data set consisting of 493 samples from The Cancer Genome Atlas (TCGA) and 244 samples from an Australian report. An identified 227-gene signature achieves an overall predictive accuracy of greater than 85% with a sensitivity of approximately 95% and specificity of approximately 70%. The gene signature significantly distinguishes between patients with unfavorable versus favorable prognosis, when applied to either an independent data set (P=0.04) or an external validation set (P<0.0001). In parallel, we present the production of a tumor nuclear image profile generated from 253 sample slides by characterizing patients with nuclear features (such as size, elongation, and roundness) in incremental bins, and we identify a morphologic signature that demonstrates a strong association with chemotherapy response in serous ovarian carcinoma.
A gene signature discovered on a large data set provides robustness in accurately predicting chemotherapy response in serous ovarian carcinoma. The combination of the molecular and morphologic signatures yields a new understanding of potential mechanisms involved in drug resistance.
Ovarian carcinoma (OvCa) remains a leading cause of mortality from gynecologic cancer, with approximately 21,880 new cases and 13,850 deaths estimated in the United States in 2010 , . The standard treatment protocol for advanced-stage epithelial OvCa is cytoreductive surgery followed by platinum-based combination chemotherapy. However, the majority of patients eventually relapse with generally incurable disease, mainly due to the emergence of chemotherapy resistance , . Early identification and differentiation of patients who are resistant to chemotherapy could lead to their enrollment in clinical trials with alternative therapeutics and is of utmost importance for improving the outcome of ovarian cancer.
Understanding the molecular mechanisms for chemoresistance has been the subject of intense research. Various genomic methodologies – have been applied to the study of OvCa to identify a gene signature associated with chemotherapy response , . However, there is a lack of overlap between the discovered genes in different studies , , possibly because of limited sample size in most studies.
The Cancer Genome Atlas (TCGA), a project of the National Cancer Institute and the National Human Genome Research Institute, generates a comprehensive catalog of genomic abnormalities with large-scale data sets that include cancers with the highest mortality rates including serous OvCa. In addition, the TCGA effort has led to the accumulation of a large set of tumor images in the repository. It is recognized that cell morphologies are intimately linked to multiple cell functions, such as cell growth, apoptosis, differentiation, and migration –. Switches between different cell functions can be controlled by regulating cell shapes , . It is reported that nuclear size is correlated with tumor prognosis in Stage III-IV ovarian cancer  and is capable of distinguishing low- from high-grade serous OvCa . However, the molecular mechanism underlying this association remains unknown. The tumor image collections in TCGA provide the opportunity to systematically characterize the morphologic features associated with chemotherapy response and gene activity.
In this study, we leverage the full scope of the TCGA database with a large population of patients, including gene expression and tumor images, to identify the molecular and morphologic signatures associated with chemotherapeutic response in OvCa. Integration of the genomic and morphologic dimensions of OvCa will yield potential insights into mechanism of drug resistance and facilitate identification of novel system-level events for alternate therapeutic interventions.
To gain insight into the potential mechanisms underlying the differential response of OvCa to chemotherapy, we perform an integrated analysis of gene expression and tumor nuclear image profiles. The 232-sample set that has both gene expression and image data is used to identify a gene expression pattern that could predict clinical outcome. The 227 genes most weighted in achieving the prediction are identified, of which 154 (67.8%) were downregulated and 73 (32.2%) are upregulated in the chemoresistant group (Figure 1A; Table S1). The gene expression fold change cutoff between these two groups is determined on the basis of the overall predictive accuracy of the patients in this training set (Figure S1); this cutoff is similar to the one used in the previous study . The gene expression profiling well separates the chemoresistant patients from the chemosensitive patients (Figure 1B) and achieves an overall predictive accuracy of approximately 87.9% (Figure 1D), with a sensitivity of approximately 95.2% and specificity of approximately 70% (Figure 1C). Pathway analysis of the discovered genes reveals an enrichment of several groups of genes that regulate morphologic changes at the cellular (approximately 11%), tissue (approximately 13%), and tumor (approximately 3%) levels (Figure 1E).
A validation of the gene signature is performed on an independent set of 261 samples from TCGA. Based on the score cutoff from Figure 1D, the predictive model splits the patients into two groups (Figure 2A) that are well separated from each other (Figure 2B). 35 patients are identified to have an explicit response to chemotherapy  and 26 of them are correctly predicted (Figure 2A). Kaplan-Meier analysis of the remaining samples after removing the patients without survival data shows that the patients in the low-scoring group exhibited poorer progression-free survival (PFS) (Figure 2C; median: 22.3 vs 34.2 months; log-rank P=0.04, HR [95%CI]=0.43 [0.19–0.97]). The clinicopathologic characteristics of patients in these two groups are summarized in Table S2.
Robustness and scalability of the gene signature are next evaluated by using the Australian data set which is based on a different microarray platform. We use it to validate whether the discovered genes are associated with patient outcome. Overlap analysis reveals 198 among the 227 genes in this data set (Table S1). Using the threshold from the reported chemosensitive rates in ovarian cancer patients (approximately 70% ), we group the ~70.0% patients (171 out of 244) with the highest scores into the high-scoring group and the remaining 30% patients (73 out of 244) into the low-scoring group (Figure 2D); consistently patients in the low-scoring group have poorer prognosis (Figure 2E; P<0.0001, HR [95%CI]=0.36 [0.22–0.56]) where the median PFS of Group 2 (12.0 months) is almost 4 times shorter than that of Group 1 (50.0 months). The clinicopathologic characteristics of the patients in these two groups either with high scores or with low scores in both validation sets are detailed in Table S2, which shows the similar age, tumor stage, and tumor grade distributions as the TCGA training set (Table 1). Cox proportional hazard analysis demonstrates that the two groups have significantly different progression-free survival patterns, independent of age, grade, and stage (Table S3).
These results not only validate the predictive performance of the gene signature but also suggest its strong association with tumor prognosis, which is most likely contributable from chemotherapy response.
The results from pathway analysis (Figure 1E) suggests that the morphologic characteristics may play a key role in determining chemotherapy response. The tumor nuclear image profile in the 130-sample training set is used to identify a morphologic signature associated with chemotherapy response; this is then validated using the 123-sample validation set. The 15 significant features (FDR≤2%, Table S4) consisting of 5 with the highest signal-to-noise ratios (SNRs) and 10 with the smallest SNRs clearly uncovers binary patterns in both the training and validation sets, as illustrated in a similar fashion commonly used in gene expression profiles (Figure 3A). A detailed version of this panel with morphological feature names is provided in the Figure S5. More prominently, we find that the Std_Ar_Bin2 feature (see Methods and Table S5) values are strongly associated with tumor prognosis. Based on the Std_Ar_Bin2 feature values, we split the 253 patients into two groups, where patients with values greater than or equal to the median are categorized into a group (ie, High Std_Ar_Bin2, n=129), and patients with values less than the feature median are categorized a different group (ie, Low Std_Ar_Bin2, n=124) (Figure 3B). Kaplan-Meier survival analysis demonstrates that tumors with smaller values of Std_Ar_Bin2 feature have significantly poorer overall survival (OS) (Figure 3C, log-rank P=0.001, HR [95%CI]=1.99 [1.32–3.01]) and poorer PFS (Figure 3D, log-rank P=0.017, HR [95%CI]=2.72 [1.20–6.19]). Cox proportional hazard analysis demonstrates that the two subgroups, split on the basis of Std_Ar_Bin2 feature values, have significantly different OS and PFS patterns, after controlling for age, stage, and grade (Table S6). These results suggest that morphologic features are significantly related to patient survival and could serve as valuable prognostic markers .
Both genomic and morphologic features are associated with chemotherapy, suggesting that these two types of signatures are strongly associated with each other. With the patients split into the two groups based on the Std_Ar_Bin2 feature values, as described above, we carry out a supervised analysis of the gene expression data and find that five of the signature genes are significantly lower (P<0.01) in the Low Std_Ar_Bin2 group (Figure 4A). Similar analysis is performed on the other morphologic features, and the corresponding differentially expressed genes are summarized in Table S7. Next we perform correlation analysis of morphologic feature data and gene expression data, and the highly correlated (either positively or negatively) feature-gene pairs (P<0.005) are depicted in Figure 4B, in which we can see that the morphologic features are strongly related to the gene signature.
Several studies have described chemotherapy response in ovarian cancer using gene expression profiles, as summarized by Helleman et al . However, the number of ovarian cancer specimens used for the gene selection in those studies was relatively small, ranging from 6 to 119, and the corresponding gene sets discovered to be associated with platinum-based chemotherapy resistance exhibited a wide range of 14 to 1,727 genes where only seven genes were observed as an overlap and each between only two gene sets . Lack of overlap between the discovered gene sets is likely due to the limited sample size in most studies. However, ours is the first study performed on such a large scale, two genes in the 227-gene set, EPH receptor B3 (EPHB3) and nuclear factor I/B (NFIB), had been identified in one of the previous studies , and one gene, RNA binding protein 1 (RNABP1), had been identified in a different study . More prominently, a gene set discovered on a large data set undoubtedly has high statistical power and robustness in accurately predicting chemotherapy response. Recently, the TCGA research network identified 193 prognostic gene signatures predictive of OS, but the gene association with chemotherapy response remains unexplored . Here we used a large sample set (493 samples from TCGA and 244 samples from an external source) for identification of molecular and morphologic signatures that are associated with chemotherapy response. The predictive model on the basis of gene signature revealed an accuracy of 87.9% in correctly classifying refractory from responsive tumors in the TCGA training set and stratified patients in both the TCGA validation set and the Australian data set into groups that demonstrated significant discrepancy in tumor progression, suggesting the capacity of the gene signature to serve as a mechanism to stratify patients with respect to treatment.
The imaging approach stratifies the cells into 10 bins based on nuclear size and accounts for the heterogeneity of cells in a tumor population. Our stratification revealed that most significant morphologic features differed between the chemosensitive and chemoresistant groups in the larger nuclei (range, 300 to 500 pixel2; Table S2). However, nuclei within this size range account for a very small percentage (approximately 2.0%), and the majority of the nuclei (approximately 98.0%) do not show a significant difference in chemotherapy response. This observation not only is consistent with the Goldie-Coldman hypothesis  that only a small cell population may contribute to differential response to chemotherapy, but also suggests the difficulty of a conventional approach of simply correlating the overall morphologic differences with chemotherapy response, owing to the “dilution” effect . Therefore, our imaging approach allows us to interrogate different cell populations separated on the basis of nuclear size in a high throughput and automated fashion.
The 15 morphologic features (Table S4) most weighted in achieving the patient separation are highly instructive. The same nuclear parameter might exhibit different or even opposite patterns. The average roundness of nucleus in Bin 8 (Mean_Ro_Bin8) is significantly higher in the chemoresistant group (P=1.5×10−4, Figure S2A), on the contrary, the same nuclear parameter in Bin 9 (Mean_Ro_Bin9) shows significant decrease in the chemoresistant patients (P=0.0015, Figure S2B). The average roundness of the entire nucleus per sample (Mean_Ro_Total) shows no significant difference (P=0.56, Figure S2C). In addition, none of the image features calculated from the entire nucleus per sample, the way similar to those used in other studies , , , show significant difference between the chemoresistant and chemosensitive patients. This discrepancy from the previous studies  likely results from the number of nuclei used in the feature calculation. We used approximately 4000 nuclei per sample for feature value calculation, almost 80 times more than the amount used in the other studies , , . Taken together, our approach of binning the nucleus size and then assessing the image feature in each individual bin improves the image feature resolution and enhances the discriminating power. Furthermore, our approach of calculating the morphologic features in separate bins (with smaller size variations) is capable of alleviating the size dependence of some of the features, such as circularity and roundness .
Aside from the potentially practical value, the morphologic features also provide insights into cancer morphogenesis. The chemosensitive patients exhibit a smaller value of nuclear roundness in Bin 8 (Mean_Ro_Bin8), but with a larger variability (Std_Ro_Bin8) and a larger aspect ratio (Mean_AR_Bin8). Such morphologic differences likely result from the active response of the cells to their environment and heightened cellular metabolism, that is contributable from different molecular regulations (Figure 4B, Table S7). This is further corroborated by pathway analysis, which revealed the gene enrichment in the morphologic function at cellular, tissue, and tumor levels (Table 2). The gene content of this table offers potential insight into the structural and molecular mechanisms of the chemotherapy response. The importance of A2M gene expression is of particular interest, in view of past work suggesting a correlation between decreased A2M levels with sensitivity to drugs . A2M is an inhibitor of matrix metalloproteinase activity, which is reported to contribute to tissue remodeling and morphogenesis , . PAX6, which is associated with drug response, is strongly activated by cotylenin A in retinoblastoma cell lines . Decreased expression of EPHB3 in the chemoresistant group may have promoted chemoresistance by impairing the apoptotic response to cell damage .
In conclusion, a gene signature discovered on a large data set provides robustness in accurately predicting chemotherapy response in serous OvCa. Meanwhile, we propose a novel approach for tumor nuclear image profile generation by characterizing patients with nuclear features (such as size, aspect ratio, and roundness etc) in incremental bins, and we demonstrate that the tumor nuclear image profile exhibits a strong association with chemotherapy response. This imaging approach is capable of accounting for cell heterogeneity and improving the discriminating power. The integrated approach herein, using gene expression profile that predicts chemotherapy response coupled with the morphologic features to stratify patients to the most appropriate treatment regimen, represents an important step toward the goal of personalized cancer treatment by identifying the area where novel drugs can be developed. Although our observations suggest that the tumor image profile is capable of defining prognosis and yielding mechanistic insights into the process of chemoresistance, one limitation of this study is the lack of validation of the image analysis due to unavailability of the independent image sets especially in a large population. This issue should be addressed in the future in order to determine the ultimate value of this technique in clinical practice. Besides, the resolution dependence of the morphologic features in separate bins has not been systematically investigated yet in this study and deserves attention in the follow-up studies. Future work also consists of inclusion of more possible morphologic features and verification of the gene-feature relation identified in this study.
Two hundred fifty three OvCa patients in the TCGA database with explicit platinum status  are obtained for nuclear image profile generation, among which 172 patients are sensitive to chemotherapy, and 81 are chemoresistant. Platinum status is defined as resistant if the patient recurred within six months. Platinum status is defined as sensitive if the platinum free interval is six months or greater, there is no evidence of progression or recurrence, and the follow-up interval is at least six months from the date of last primary platinum treatment . Compared with patients who are chemosensitive, the chemoresistant patients exhibit relatively poorer overall survival (OS; median, 53.9 vs. 33.8 months; p<0.0001) and progression-free survival (PFS; median, 25.8 vs. 9.3 months; p<0.0001; Figure S3). Other characteristics of these 253 patients are listed in Table 1. The average age at diagnosis is 61.7 years (range, 38.0 to 84.7 years) for the chemoresistant group and 59.1 years (range, 30.5 to 87.5 years) for the chemosensitive group. Up to 84% of the chemosensitive patients show the symptom of recurrent diseases in contrast to 100% of relapse for the chemoresistant patients. 232 among the 253 samples with expression data serve as the TCGA training set to identify the gene signature, of which 165 are chemosensitive and 67 are chemoresistant. An independent data set from TCGA (n=261) and an external data set from an Australian study  are applied for validation of the gene signature. The gene expression profile in TCGA dataset was performed on three different platforms (Affymetrix Exon 1.0, Agilent 244 K Whole Genome Expression Array and Affymetrix HT-HG-U133A) and a unified expression data set was created by the TCGA research working group and is available in the TCGA data portal. The external expression data was performed on the Affymetrix HG-U133 plus 2 platform and was downloaded from the Gene Expression Omnibus (accession GSE 9899 ). The training set is used to discover the gene signature and then to create the predictive model. To be consistent with patient characteristics in these two data sets, we exclude patients from the Australian data set who have either non-serous OvCa or grade 2 disease, resulting in 244 patients in this validation set. The clinicopathologic characteristics of patients in these two validation data sets are summarized in Table S2.
Expression data are prescreened to remove genes with trivial variation across the samples and low median expression levels, resulting in 14,084 genes in the analysis. The gene signature identified through a supervised method  is used for constructing a predictive model using the weighted voting algorithm , . A predictive score is assigned to each sample and is calculated as
where , N is the number of discovered genes, wf is the weighting factor, xf is the expression value and μ represents the expression mean for each class. A sample with a score greater than a cutoff is assigned to the chemosensitive group, and a sample with a score less than or equal to the cutoff is assigned to the chemoresistant group. The predictive accuracy, based on a cutoff score determined by receiver operating characteristic (ROC) curve, is assessed. The gene signature is validated on an independent sample set from TCGA and an external data set . Pathway and network analysis is performed using Ingenuity Pathway Analysis (IPA, version 8.6-3003; Ingenuity Systems, Inc.).
An average of 10 high-resolution tumor images (20 X magnification, 1072×648 pixels) per sample at the different views of the tissue blocks are first selected by a pathologist from hematoxylin- and eosin-stained ScanScope virtual slides, to account for the spatial heterogeneity of the tumor tissues. Next, we automatically identify and measure the nuclei in each image by using a cell-image analysis software (ImageJ, version 1.42, NIH) , producing a parametric profile for each nucleus. In brief, the first slice of the tumor image is processed using a Fast-Fourier-transform (FFT) band pass filter with the default setting before it is converted to a black-white image (features of interest such as nuclei are displayed as black and the background as white) by using a threshold value verified by overlaying the segmented nuclei with the original RGB image. Nuclei with a size range of 50 to 500 pixel2 and a circularity of greater than 0.3 are selected for further analysis. The nucleus profile consists of a set of numbers that describe the nucleus’s characteristics, including size, location, and shapes that are automatically measured using the ImageJ Plugins, which is widely used in research , . Definition of these nuclear parameters is described in details in the ImageJ user guide . Typically, approximately 4000 nuclei within the size range from 50 to 500 pixel2 per sample are produced which is almost 80 times more than the amount used in other studies , . An example detailing this procedure is shown in Figure S4.
To generate the tumor image profile, we first split the nuclei into 10 evenly spaced bins based on the nuclear size, and then calculate the average value and standard deviation (SD) of these parameters (e.g, area, perimeter, circularity, aspect ratio, solidity, and roundness) for each nucleus in each bin as well as for all the nuclei in an image. In addition, the number of nuclei in each bin (feature name, count) and its percentage (feature name, percentage) are defined as image features and calculated accordingly. The compactness is defined as an image feature to qualify the spatial distribution of nuclei within the tumor tissue. As a result, the tumor image profile consists of 153 morphologic features including 66 means, 66 SDs, 10 percentages, 10 counts, and 1 compactness (Table S5). Definition of these nuclear parameters is described in details in the ImageJ user guide .
The 253 TCGA samples with the calculated image profile (described above) are randomly divided into a 130-sample training set and a 123-sample validation set in an approximate 11 ratio. The training set consists of 90 chemosensitive patients (69.2%) and 40 chemoresistant patients (30.8%), while the validation set contains 82 chemosensitive patients (66.7%) and 41 chemoresistant patients (33.3%). The distribution of chemosensitive and chemoresistant patients in both the training and validation sets is selected to reflect clinical chemosensitive rates of approximately 70% . Similar to the gene expression analysis as stated above, the identification of the morphologic features associated with chemotherapy response was performed using the method described previously ,  in the training set and validated in the validation set: In brief, the signal-to-noise ratio (SNR) is calculated for each potential feature , , in which positive or negative SNR value indicates the feature favorable for either the chemoresistant or the chemosensitive group. The 153 features are ranked on the basis of their SNR values. The differentially varied morphologic features are determined in the training set on the basis of FDR≤2% and then validated in the test set. Feature data exhibit a normal distribution after median centered and log transformed.
OS and PFS curves are generated by the Kaplan-Meier method, and the statistical significance of survival differences is determined with the log-rank test. Survival analysis is performed and an ROC curve is generated using SPSS (version 17.0; SPSS Inc.) and GraphPad Prism (version 5.04; GraphPad Software, Inc.). Normality of feature values is verified via a Jarque-Bera test . The statistical significance of the morphologic signature is calculated via an unpaired, two-tailed t-test combined with Benjamini-Hochberg (BH) multiple testing . The p-value of the identified pathways is assessed by the Fisher exact test using Ingenuity Pathway Analysis.
Overall predictive accuracy of patients in the training set as a function of the gene expression fold change cutoff. The arrow indicates the fold-change cutoff used in the study that gives rise to the highest predictive accuracy.
(A) The average roundness of nuclei in Bin 8 (Mean_Ro_Bin8) is significantly higher in the chemoresistant group (P=1.5 E-04). (B) The same nuclear parameter in Bin 9 (Mean_Ro_Bin9) shows a significant decrease in the chemoresistant group (P=0.0015). (C) The average roundness of the nucleus in an entire sample shows no significant difference between groups (Mean_Ro_Total) (P=0.56).
Overall survival (OS) and progression-free survival (PFS) curves of the 253 patients used for tissue nuclear image profile generation, among which 172 patients were sensitive to chemotherapy and 81 of them were chemoresistant.
Flow chart for nucleus parametric profile generation.
Detailed version of Figure 3A including morphologic feature names.
The 227 genes are differentially expressed between chemoresistant and chemosensitive patients (P<0.05, as identified by parametric t-test).
Clinicopathologic characteristics of patients in the TCGA and Australia Validation data sets.
Cox proportional hazard analysis of progression-free survival of OvCa patients in the TCGA and Australian validation data sets in relation to predictive sub-groups (the group with high predictive scores versus the group with low predictive scores).
The 15 morphologic features are differentially varied between chemoresistant and chemosensitive patients with serous OvCa (FDR≤2%).
Image features defined for tumor nuclear image profile generation.
Cox proportional hazard analysis of overall and progression-free survival of OvCa patients in relation to the morphological feature value (Std_Ar_Bin2).
Significantly expressed gene signatures associated with each morphologic feature.
We would like to thank Jilong Yang and Yingmei Wang for their assistance in evaluating histological images, Lina Zhang and Da Yang for their input in genomic data analysis, Sue Moreau and Sarah Bronson of the Department of Scientific Publications at the University of Texas MD Anderson Cancer Center for editing the original and revised manuscripts.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was supported by funding for the Genome Data Analysis Centers from the National Institutes of Health (U24 CA143835 to I.S. and W.Z.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.