Expanding interest in and use of active surveillance for early stage prostate cancer has increased the need for prognostic biomarkers. Using a multi-institutional tissue microarray resource including over 1000 radical prostatectomy samples, we sought to correlate Ki67 expression captured by an automated image analysis system with clinico-pathologic features and validate its utility as a clinical grade test in predicting cancer-specific outcomes.
After immunostaining, the Ki67 proliferation index (PI) of tumor areas of each core (3 cancer cores/case) was analyzed using a nuclear quantification algorithm (Aperio). We assessed whether Ki67 PI was associated with clinico-pathologic factors and recurrence free survival including biochemical recurrence, metastasis or PC death (7-year median follow-up).
In 1004 PCs (~4,000 tissue cores) Ki67 PI showed significantly higher inter-tumor (0.68) than intra-tumor variation (0.39). Ki67 PI was associated with stage (p<0.0001), seminal vesicle invasion (SVI, p=0.02), extracapsular extension (ECE, p<0.0001) and Gleason Score (GS, p<0.0001). Ki67 PI as a continuous variable significantly correlated with recurrence free, overall and disease-specific survival by multivariable Cox proportional hazard model (HR=1.04–1.1, p=0.02–0.0008). High Ki67 score (defined as ≥5%) was significantly associated with worse recurrence free survival (HR=1.47, p=0.0007) and worse overall survival (HR=2.03, p=0.03).
In localized PC treated by radical prostatectomy, higher Ki67 PI assessed using a clinical grade automated algorithm is strongly associated with a higher GS, stage, SVI and ECE, and greater probability of recurrence.
Differentiating indolent from aggressive prostate cancer (PC) is a major priority given the high prevalence of PC in the aging population and the current magnitude of its overtreatment (1–3). While clinical and pathological assessments of tumor characteristics provide prognostic information, there is a broad spectrum of outcomes in individual patients. Substantial investments have been made in identifying biomarkers related to PC behavior, though to date no tissue-based markers have been incorporated in routine clinical practice due to conflicting data, lack of independence from other well established clinico-pathological characteristics, and a paucity of appropriately-designed and powered validation and standardization studies.
The Canary Tissue Microarray of Prostate cancer outcomes (CTMAP) was designed with the primary objective of validating promising candidate biomarkers reported to predict PC recurrence at the time of radical prostatectomy (RP) (4). The CTMAP model of a case-cohort study was designed to allow validation of a biomarker with approximately two-fold sensitivity greater than Gleason score (GS) in predicting PC recurrence. Such a marker would likely benefit clinical practice. Oversampling of recurrent low-grade tumors and non-recurrent high-grade tumors in CTMAP decreases the influence of GS, allowing unbiased validation of independent prognostic biomarkers. This large and well-annotated tissue resource of more than 1,300 randomly selected RP specimens with prolonged follow-up from six academic institutions also reflects the heterogeneity of PC and spectrum of patient management in North America.
Comprehensive literature reviews and a recent meta-analysis of prognostic PC biomarkers indicates that immunohistochemical measurement of Ki67 (MKI67/MIB-1) expression is the tissue biomarker with the most consistent association with the clinical outcomes of PC (5–7). Ki67 has provided independent prognostic value in prostate needle biopsies, transurethral prostate resections and prostatectomy specimens (1, 8, 9) including independent associations with biochemical and clinical recurrence regardless of treatment (9). In addition, Ki67 is an attractive biomarker from a technical perspective due to ease of interpretation with moderate to high intra- and inter-observer reproducibility, relatively high tolerance to typical preanalytical variability, universal use and availability across diagnostic laboratories, and the frequent presence of internal positive controls within sampled tissues (3, 7, 10, 11). As a marker of tumor proliferation, Ki67 has been successfully used in routine pathology practice for differential diagnoses, grading, prognostication and assessment of treatment responses for multiple neoplasms including endocrine and neuroendocrine neoplasms (12, 13), breast cancer (10, 14), trophoblastic tumors, lymphomas, soft tissue and brain tumors (15–17).
In view of the need for biomarkers that aid in the clinical management of men with PC, we sought to determine if automated detection of Ki67 immunohistochemical staining, reported as a percentage of cells expressing Ki67 as a measure of cell proliferation, associated with a specific adverse PC outcome: recurrence after primary therapy. We also sought to determine if Ki67 provided information independent of other risk features and if the magnitude of this additional information was sufficient to impact clinical practice.
This study utilized tissue microarrays (TMA) composed of tissue samples from over 1,300 randomly selected participants treated for PC with RP at 6 institutions between 1995 and 2004 (4). The cohort included approximately equal numbers of samples from men with biochemically recurrent and non-recurrent PC after 5 or more years of follow-up. Recurrent PC was defined by (1) a single serum PSA level >0.2 ng/mL more than 8 weeks after RP and/or (2) receipt of salvage or secondary therapy after RP and/or (3) clinical or radiologic evidence of metastatic disease after RP. Median follow-up was 7 years (range: 1 day – 21.4 years).
The TMA was constructed to assess biomarkers that provide prognostic information independent of clinical and pathological information. As GS is a powerful predictor of outcome, we oversampled recurrent cases of GS 3+3 and 3+4 as well as non-recurrent cases with GS 4+4. This strategy diminishes the prognostic significance of GS and allows for the validation of biomarkers that correlate with PC clinical outcomes, independent of GS. All six participating sites contributed approximately 200 formalin-fixed and paraffin embedded RP specimens each, which were distributed on 33 TMA blocks. Each TMA block with 11 × 16 layouts was fabricated from 42 RP specimens and 8 normal control tissues including tonsil, prostate, kidney, colon and liver. Each PC, sampled with a 1 mm puncher, was represented by 3 cores obtained from the highest grade cancer in the dominant tumor nodule. In addition, 1 core was obtained from histologically benign prostatic tissue of every patient.
This study was conducted with a multi-institutional agreement and approvals from the institutional review boards at University of Washington, Stanford University, University of British Columbia, University of California San Francisco, University of Texas Health Sciences Center San Antonio, Veterans Affairs Puget Sound Health Care System, and Fred Hutchinson Cancer Research Center (FHCRC; Coordinating Center). De-identified demographic, clinical, and pathologic data are maintained in a central data repository at FHCRC managed by the Coordinating Center.
Unstained 4 μm TMA sections were deparaffinized on an automated immunostainer (Bond III (TM), Leica Biosystems, Germany) using a proprietary Bond Dewax solution. Three applications of the Dewax solution were followed by three applications of 100% ethanol and then three applications of Bond Wash solution. The immunostaining was performed in a CAP certified diagnostic immunohistochemistry laboratory according to a standardized protocol. In brief, antigen retrieval was performed on Bond III (TM) using ER2 buffer (pH 9.0) for 30 minutes. After rinsing and endogenous peroxidase blocking, a post primary IgG linker was applied followed by several rinses with the Bond Wash solution and a de-ionized water rinse. The slides were incubated for 15 minutes with a mouse monoclonal IgG1 antibody against Ki67 (clone MIB-1, dilution 1:200, Cat# M7240, DAKO, Carpinteria, CA). Slides were then rinsed multiple times with Bond Wash solution, a polymer anti-mouse Poly-HRP-IgG was applied, and slides were incubated for 8 minutes with polymer detection reagent (Bond Polymer Refine Detection kit, Leica). After multiple rinses, the slides were reacted with 3,3′-diaminobenzidine tetrahydrochloride chromogen for 10 minutes and counterstained with hematoxylin for 5 minutes. Negative controls for the immunostaining were prepared by omitting the primary antibody step and substituting it with non-immune mouse serum. Normal tonsil tissue cores present on each TMA section served as internal positive controls for Ki67 staining.
The TMA slides were scanned on Aperio ScanScope AT (Aperio Technologies, Vista, CA, USA) at 200× magnification. High resolution digital image files were created and saved in the web-based Spectrum Plus digital slide manager and segmented using TMALab to further automate the analysis of the individual gridded tissue cores. The tumor areas were manually annotated by an experienced genitourinary pathologist (MT) to maximally exclude non-tumor stroma, benign glands and inflammatory cells [Figure 1]. The Ki67 staining of the cells comprising the tumor areas of each core was determined using a nuclear quantification algorithm which was tuned by an experienced user (MT) to allow reliable identification of all Ki67 positive nuclei and hematoxylin counterstained negative nuclei (18). The threshold for size and shape of tumor cells was manually calibrated to maximally exclude stromal cells and lymphocytes as previously reported (19). Cases from TMA slides with weak or negative internal control cases (tonsil tissue) after repeat staining were excluded from further analysis. The Ki67 proliferation index (PI) was automatically calculated by the software as a ratio (%) of positively stained nuclei to all nuclei. For each core a median of 3,019 tumor nuclei were counted (range 19–12,091). Tumor PIs for each patient were averaged from 3 analyzed cores. The maximum score % Ki67 positivity per tumor was calculated as a surrogate of the ‘hotspot’ reflecting the area of most intense proliferation (20, 21). All cases were coded and analyzed by a pathologist in a blinded fashion, without any knowledge of patient outcome.
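The per-core and per-case calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example nuclei counts are hypothetical, the case-level score is taken as a simple unweighted mean of the core PIs (the exact weighting used for the "weighted average" is not specified here), and the 5% threshold is the literature cutoff applied later in the analysis.

```python
from statistics import mean


def proliferation_index(positive_nuclei, total_nuclei):
    """Ki67 PI of one core: percentage of positively stained nuclei."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    return 100.0 * positive_nuclei / total_nuclei


def summarize_case(cores, cutoff=5.0):
    """Summarize one tumor from its cores.

    cores  : list of (positive_nuclei, total_nuclei) tuples, one per core
    cutoff : PI threshold in percent (5% is the cutoff taken from prior studies)
    """
    pis = [proliferation_index(pos, tot) for pos, tot in cores]
    mean_pi = mean(pis)   # assumption: unweighted mean across cores
    max_pi = max(pis)     # surrogate for the proliferation 'hotspot'
    return {
        "core_pis": pis,
        "mean_pi": mean_pi,
        "max_pi": max_pi,
        "high_by_mean": mean_pi >= cutoff,
        "high_by_max": max_pi >= cutoff,
    }


# Hypothetical counts for three cores of a single case.
case = summarize_case([(30, 3000), (90, 3000), (240, 3000)])
print(case)
```

In this made-up case the mean PI falls below 5% while the maximum exceeds it, which shows why the two summaries can classify the same tumor differently.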
Patient characteristics were collected in the clinical data set and included pre-operative serum PSA level, pathology stage, GS, seminal vesicle invasion (SVI), extracapsular extension (ECE), and surgical margin status (positive or negative). Of 1326 patients with clinical data, 1004 patients had complete high-quality Ki67 tumor data available for analysis with acceptable strong uniform TMA staining of the positive controls.
The summary statistics of patient characteristics are provided in frequencies and percentages (see Table 1). Ki67 and pre-operative PSA data were summarized using mean, SD, and range. The Wilcoxon rank sum test or Kruskal-Wallis test was used to compare Ki67 PI between patient groups. The Pearson correlation test was used to assess association between Ki67 PI and pre-operative PSA. Inter- and intra-tumor variation of Ki67 positive nuclei were estimated using variance component analysis. Ki67 PI was also dichotomized, by either the weighted average of tumor Ki67 or the maximum percent positive nuclei, using the 5% cutoff point reported previously for RP/TURP cohorts as an independent prognosticator in multivariable analyses (3, 7, 22–24).
The primary endpoints of this analysis were: 1) recurrence-free survival (RFS) post-surgery, where an event was defined as any PC recurrence (biological, clinical/radiological, or treatment with salvage therapy) or metastasis or death due to PC; 2) overall survival (OS), where an event was defined as death of any cause; and 3) disease-specific survival (DSS), where an event was defined as PC metastasis or death due to PC. The baseline was set as the date of RP.
There were four groups of patients defined in this study: 1) non-recurrence (48% cases), 2) recurrence within 5 years of RP (40% cases), 3) lost-to-follow up within 5 years of RP (6% cases), and 4) recurrence more than 5 years after RP (6% cases). The survival of patient groups was estimated using the Kaplan-Meier method and groups were compared using the log rank test. A univariable Cox model was used to assess the effects of Ki67 PI and other patient characteristics on RFS, OS, and DSS. A backwards elimination procedure including PSA, GS, age, margin, ECE, SVI, and Ki67 covariates was used to identify a final multivariable Cox model with all significant factors for each survival endpoint. The summaries of concordance index of different Cox proportional hazard models for RFS, types of recurrence and salvage therapy
All tests were two-sided and p-values of 0.05 or less were considered statistically significant. Statistical analysis was carried out using SAS version 9 (SAS Institute, Cary, NC). Plots were produced using Spotfire S+ 8.2 (TIBCO Inc., Palo Alto, CA).
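For readers unfamiliar with the Kaplan-Meier method named above, the following is a minimal pure-Python sketch of the product-limit estimator. It is not the SAS procedure used in this study; the function name and the toy follow-up data are hypothetical and serve only to show how censored observations leave the risk set without lowering the survival estimate.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve


# Toy data: six patients, two of them censored (event flag 0).
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 4))
```

Comparing two such curves (for example, Ki67 PI below versus at or above a cutoff) is what the log-rank test formalizes.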
We determined the Ki67 PI in > 4,000 prostate tissue cores from 1004 RPs using the Aperio system for automated detection of staining and quantification of positive nuclei. The Ki67 PI ranged from 0 to 35.6% with a weighted average median value of 2.19%. We found significant associations between Ki67 PI and pathologic stage, SVI, ECE, and GS (Table 1). More specifically, Ki67 PI was significantly higher in patients with pathologic stages pT3/pT4 vs. stage pT2 (p<0.0001), as well as in cases with SVI and ECE (p=0.02 and p<0.0001, respectively). There was also a statistically significant increase in Ki67 PI from lower to higher GS (Figure 1). A strong positive correlation between higher GS and higher Ki67 PI was found for both the weighted average and the maximum Ki67 PI by Kruskal-Wallis test (p<0.0001). No significant association was found between Ki67 PI and positive surgical margin (PSM) status or pre-operative PSA by Wilcoxon rank sum test and Pearson’s correlation (p=0.21 and p=0.36, respectively).
The intra-tumor variance of Ki67 expression, estimated from different tumor cores, was 0.39 (95% CI: 0.36–0.41), whereas the inter-tumor variation was 0.68 (95% CI: 0.61–0.76). The inter-tumor variation explained 64% of total variation.
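The 64% figure can be reproduced from the two reported components. The short check below assumes that the inter- and intra-tumor values are the only two variance components in the model and that they add to the total; the variable names are illustrative.

```python
# Reported variance components for Ki67 expression.
inter_tumor = 0.68  # between-tumor component
intra_tumor = 0.39  # within-tumor (between-core) component

# Assumption: total variation is the sum of the two components.
total = inter_tumor + intra_tumor
inter_share = inter_tumor / total

print(f"inter-tumor share of total variation: {inter_share:.1%}")
```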
The summary of 5-year RFS by Ki67 PI and clinico-pathological factors is shown in supplementary Table 1. The univariable Cox proportional hazard model determined that the weighted average Ki67 PI was significantly correlated with RFS as a continuous variable (per 1% increase, HR=1.04, p=0.002), as was maximum % Ki67 positivity (per 1% increase, HR=1.03, p=0.005). Other PC characteristics significantly associated with RFS events were PSM, presence of SVI and ECE, and (log) pre-operative PSA as continuous 1 unit incremental values (p<0.0001). Two multivariable Cox proportional hazard models (including 634 cases with 281 recurrences) for RFS demonstrated that higher weighted Ki67 PI average (model 1) and higher maximum Ki67 % positivity (model 2) were both significantly correlated with worse RFS after adjusting for other clinical factors (PSM, SVI, GS, ECE, and pre-operative PSA) (p=0.02–0.0008; Table 2). The summary of concordance index of different Cox proportional hazard models for RFS is shown in supplemental Table 2.
The univariable Cox proportional hazard analysis demonstrated that significantly worse OS was associated with increasing Ki67 PI (p=0.003) or pre-operative PSA (p=0.018), stage pT3/pT4 vs. pT2 (p=0.01), presence of SVI (p<0.0001) or ECE (p=0.01), and GS of 8 or higher (vs. 6 or lower) (p=0.008). Worse DSS was also strongly associated with increasing Ki67 PI (p=0.004) and pre-operative PSA (p<0.0001), PSM (p=0.02), stage pT3/pT4 vs. pT2 (p<0.0001), presence of SVI (p=0.01), and GS of 6 vs 4+3 (p<0.0001). The multivariable Cox proportional hazard model for OS comprised a total sample size of 984 with 57 events. After adjusting for pre-operative PSA, PSM, pathologic stage, SVI, and ECE, worse OS was significantly associated with each 1% increase in Ki67 PI (HR=1.09, p=0.02) and with GS of 8 or higher (vs. 6 or lower) (HR=3.28, p=0.0007). The same model was run for DSS and comprised a total sample size of 874 with 44 events. Worse DSS was significantly associated with each 1% increase in Ki67 PI (HR=1.1, p=0.02), pre-operative PSA (HR=1.98, p=0.005), and GS of 8 or higher (vs. 6 or lower) (HR=5.13, p=0.001). Multivariable Cox regression analysis results for RFS, OS and DSS are shown in Table 2.
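Because these hazard ratios are expressed per 1% increase in Ki67 PI, their practical size is easier to judge over a larger difference. The sketch below assumes the log-linear form of the Cox model, in which the hazard ratio for a difference of d percentage points is the per-unit hazard ratio raised to the power d; the helper name is hypothetical and the 5-point difference is chosen purely for illustration.

```python
import math


def hazard_ratio_for_increase(hr_per_unit, delta):
    """Hazard ratio implied by a `delta`-unit increase in a covariate,
    assuming a log-linear effect (HR_delta = HR_unit ** delta)."""
    return math.exp(delta * math.log(hr_per_unit))


# Reported per-1% hazard ratios for Ki67 PI in the multivariable models.
hr_os_5pt = hazard_ratio_for_increase(1.09, 5)   # overall survival
hr_dss_5pt = hazard_ratio_for_increase(1.10, 5)  # disease-specific survival

print(f"OS, 5-point difference:  HR = {hr_os_5pt:.2f}")
print(f"DSS, 5-point difference: HR = {hr_dss_5pt:.2f}")
```

Under this assumption, a tumor with a Ki67 PI 5 percentage points higher would carry roughly a 1.5-fold to 1.6-fold higher hazard, other covariates being equal.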
We examined the functional form and possible cutoff points of Ki67 PI and pre-operative PSA using martingale residual plots (Supplementary Figure 1). The linear form of Ki67 and logarithmic transformation of pre-operative PSA were used for modeling. Martingale residual plots showed no discrete cut-point that could be used to dichotomize samples prognostically using log(PSA) or Ki67 PI values. Therefore, we tested three different Ki67 cut-points to dichotomize cases and evaluate associations with clinical endpoints: the median weighted average (2.19%), the median maximum per case (3.11%), and a cut-point of 5%, which has been used successfully in several previous studies of localized PC (3, 7, 22–24).
The univariable Cox proportional hazard model showed significant correlation with RFS when dichotomized by median classes only for the weighted average (2.19%, HR=0.72, p=0.01), but not for the maximum % positive Ki67 per tumor (3.11%, HR=0.86, p=0.06). The multivariable Cox proportional hazard model for outcomes dichotomized by Ki67 PI weighted average and maximum Ki67 % positivity were not significant. Moreover, no significant relationship with OS and DSS was detected with either Ki67 cutoff of 2.19% or 3.11%.
The multivariable Cox proportional hazard model showed a significant correlation with RFS when Ki67 was dichotomized at the 5% cutoff, both for the weighted average (model 1) and for the maximum % positive (model 2). Tumors with Ki67 ≥5% were associated with worse RFS after adjusting for pre-operative PSA, margin status, SVI status, and Gleason score: HR=1.47, p=0.0007 (model 1) and HR=1.31, p=0.03 (model 2). The multivariable Cox proportional hazard model for OS, which comprised a total sample size of 992 with 57 events, showed a significant relationship between worse OS and Ki67 ≥5% based on the weighted average (HR=2.03, 95% CI 1.09–3.8, p=0.03). The same model was run for DSS and comprised a total sample size of 1352 with 51 events. Analysis of DSS demonstrated a significant relationship with Gleason score (HR=2.6–8.11, p=0.0004) and (log) pre-operative PSA (HR=2.01, p=0.006), but not with Ki67 dichotomized at the 5% cutoff. Multivariable Cox regression analysis for RFS stratified by center is shown in Table 3.
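As a consistency check on the OS result (HR=2.03, 95% CI 1.09–3.8, p=0.03), the p-value can be approximately recovered from the confidence interval. The sketch below assumes a Wald-type interval that is symmetric on the log-hazard scale with a normal critical value of 1.96; the function name is hypothetical, and this is not necessarily how the reported p-value was computed.

```python
import math


def wald_p_from_ci(hr, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided Wald p-value from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE).
    Returns (standard_error, z_statistic, p_value).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


se, z, p = wald_p_from_ci(2.03, 1.09, 3.8)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The recovered value rounds to the reported p=0.03, so the hazard ratio, interval, and p-value are mutually consistent under this approximation.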
Kaplan-Meier plots for all survival endpoints indicated that patients with tumor Ki67 PI above the median (>2.19%) had a significantly decreased probability of RFS (p=0.003) (Figure 2A), which was even more evident with 5% cut-point (p<0.0001) (Figure 2B). Moreover, Kaplan-Meier survival curves with 5% cutoff point showed a trend toward predicting DSS in tumors with higher Ki67 (p=0.083) and significant difference for OS (p=0.045), but not for cutoffs of 2.19% or 3.11%. All p-values in this paragraph were produced by the log-rank test.
Our Ki67 validation study was conducted in over 1,000 radical prostatectomies in compliance with MISHFISHIE (Minimum Information specification for in situ hybridization and immunohistochemistry experiments) (25) and REMARK (Reporting recommendations for tumor marker prognostic studies) (26, 27). Our objective was to evaluate whether Ki67 testing provides meaningful prognostication and whether an image-based automated scoring system could meet criteria for use in clinical practice. To our knowledge, this is the largest study to date, with one of the longest periods of follow-up (median: 7 years) (5, 9, 28), and represents a wide spectrum of patients with PC managed at six major centers across North America. We found that increased Ki67 as a continuous variable was significantly associated with stage pT3/pT4, presence of SVI or ECE, and higher GS, but not with PSM and/or pre-operative PSA levels. After adjusting for other factors on multivariable analysis, Ki67 as a continuous variable remained a significant independent predictor for recurrence-free, overall, and disease-specific survival using logistic regression models and Cox proportional hazard models. These findings substantiate Ki67 staining as an independent predictive biomarker for PC outcomes.
Our tested Ki67% cutoffs of weighted average and maximum % positivity per case (hotspot equivalent) were 2.19% and 3.11%, respectively. These values were comparable to several other studies (1, 29, 30), but at the lower end of broadly ranging cutoffs from 2.4% to 26%. The range of Ki67 PI values in these studies could reflect differences in the risk profiles of the patient cohorts, tumor heterogeneity, pre- and post - analytical variables, manual vs automated counting, and different statistical methodologies in determining “suitable cutpoints” as mean, median, maximal, or quartile-based (1, 3, 8, 14, 28, 30, 31). Our relatively low weighted average PI could also be partially explained by the study population that was selected to enrich for recurrent low-grade and non-recurrent high-grade cases, possibly leading to oversampling of cases with lower proliferative indices. However, our patient population reflects contemporary risk groups of prostate cancer diagnosed in North America, particularly patients who are candidates for active surveillance in whom prognostic biomarkers are most needed.
A recent study of PC patients treated on Radiation Therapy Oncology Group (RTOG 94–08) using automated ACIS scoring showed median Ki67 PI similar to what we observed (2.65%), although the automated scoring was slightly lower than the median value obtained by manual scoring in the same cohort (3.85%) (30). Manual scoring likely produces higher PIs because there typically are smaller numbers of nuclei counted and regions with more stained nuclei (hotspots) are likely to be oversampled. In agreement with these observations, median Ki67 PI in our study was nearly identical to RTOG 94–08 (PI=2.19%) while the maximum Ki67 PI % positivity per case in our study (PI=3.11%) was remarkably similar to manually scored hotspots.
Implementation of automated scoring in our high-throughput study was justified since 1) it provides more accurate determination of Ki67 PI due to higher numbers of counted cells (a median of 3,019 tumor nuclei), 2) it reduces human error and fatigue during quantitating of ~4,000 cores, and 3) it eliminates scoring variations by analysis of uniformly batch stained TMA slides with a standard nuclear counting algorithm. Furthermore, the accuracy of automated scoring has been separately validated in studies in which adequate sample sizes for manually counted nuclei were obtained. A comparative study between digital image analysis using the same Aperio XT nuclear algorithm and manual counting of adequate numbers of cells (>2,000 nuclei) showed excellent concordance with interclass correlation of 0.98 (19).
Our data suggest 2 alternative approaches for incorporating Ki67 PI into clinical practice. In multivariable analysis, both median and maximum Ki67 PI provided independent prediction of RFS when coupled with other clinical and pathological parameters. In this case, Ki67 PI could be incorporated into clinical risk models, such as CAPRA or other nomograms, although the weighting of the models might need to be adjusted for a non-selected population. Alternatively, biomarkers with a purely dichotomous output (positive or negative) that sort patients into high and low risk groups are easier to incorporate into risk models for clinical use (14, 31). Despite our inability to define a cut-point of Ki67 PI in our dataset using the Lowess smoothed plot of martingale residuals, we did validate that a cut-point of 5%, which had been derived empirically in previous studies (3, 7, 22–24), provided independent prediction of RFS with a hazard ratio (HR=1.47) comparable to Gleason score (HR=1.29–1.81), margin status (HR=1.59) and log(PSA) increase by 1 unit (HR=1.54), although slightly lower than SVI (HR=2.07). Additionally, Kaplan-Meier plots showed robust curve separation between low and high Ki67 groups at 5% that outperformed median or maximum Ki67 PI per case. Whether 5% is the ideal cut-point is unclear, given the results of our cut-point analysis. However, Kaplan-Meier analysis also showed that tumors with very low Ki67 PI (≤1%) were indistinguishable from those that were 1–5%, implying that the cut-point is likely to be at 5% or greater.
Our study has some limitations: it is retrospective, it is based on radical prostatectomy samples, and it was designed with specific parameters to help identify prognostic biomarkers that are independent of clinical predictors. For example, the selection of balanced numbers of recurrent and non-recurrent Gleason score 3+4 and 4+3 cases and oversampling of recurrent 3+3 and non-recurrent 4+4 cases improved our ability to validate markers that predict outcomes after surgery independently of known clinical predictors such as Gleason score. Because of this, the relative weight of the biomarker in predicting outcome cannot be incorporated into existing algorithms, such as CAPRA, that have been developed and tested in non-selected surgical populations. In addition, selection of the region of highest grade in construction of the TMA could limit application of the findings to non-selected random biopsies, and the findings could be further confounded by intratumoral heterogeneity. There is also considerable overlap in Ki67 PI in univariable analysis between favorable and high risk clinical and pathological outcomes, making clinical translation challenging. Very likely, the greatest utility will be derived from combining Ki67 PI with independent predictors, or by focusing more on extreme values, such as the >5% cut-point. Translation to clinically relevant scenarios, such as selection of patients for active surveillance in the low risk population or for adjuvant therapies in high risk localized disease, will require testing in prospective cohorts. While our study is based on small samples of cancer, in many ways comparable to standard prostate needle biopsies, it will need to be validated on biopsy samples because of potential issues with sampling error in prostate needle biopsies.
It is possible that the advent of image-directed biopsies using multiparametric MRI could significantly improve the performance of prognostic biomarkers because of improved sampling of the largest incident lesion. However, this hypothesis needs to be tested.
In localized PC treated by radical prostatectomy, Ki67 PI provides independent prognostic value for RFS, DSS and OS beyond Gleason score, pathological stage and PSA levels. In our large, multi-institutional cohort, Ki67 PI performed best as a continuous variable and could be incorporated into existing or new predictive nomograms. Additionally, our study suggests that risk stratification for localized PC could be achieved with Ki67 as a dichotomous variable at 5% cutoff. These findings strongly suggest that Ki67 PI should be further tested as a prognostic biomarker in other clinically relevant cohorts such as patients on active surveillance (32), and possibly in patients undergoing image guided biopsies.
Supported by: The Canary Foundation; the Department of Defense W81XWH-11-1-0380; the NCI Early Detection Research Network U01 CA086402, CA152737, CA08636815; P30 CA054174; and the Pacific Northwest Prostate Cancer SPORE P50 CA097186
We cordially thank a large and dedicated Canary Foundation team of coordinating center staff, local coordinators, lab staff, and physicians who have made this study possible.
CONFLICT OF INTEREST
There are no conflicts of interest for any authors.