Gleason grading is an important predictor of prostate cancer (PCa) outcomes. Studies using surrogate PCa end points suggest outcomes for Gleason score (GS) 7 cancers vary according to the predominance of pattern 4. These studies have influenced clinical practice, but it is unclear if rates of PCa mortality differ for 3 + 4 and 4 + 3 tumors. Using PCa mortality as the primary end point, we compared outcomes in Gleason 3 + 4 and 4 + 3 cancers, and the predictive ability of GS from a standardized review versus original scoring.
Three study pathologists conducted a blinded standardized review of 693 prostatectomy and 119 biopsy specimens to assign primary and secondary Gleason patterns. Tumor specimens were from PCa patients diagnosed between 1984 and 2004 from the Physicians' Health Study and Health Professionals Follow-Up Study. Lethal PCa (n = 53) was defined as development of bony metastases or PCa death. Hazard ratios (HR) were estimated according to original GS and standardized GS. We compared the discrimination of standardized and original grading with C-statistics from models of 10-year survival.
For prostatectomy specimens, 4 + 3 cancers were associated with a three-fold increase in lethal PCa compared with 3 + 4 cancers (95% CI, 1.1 to 8.6). The discrimination of models of standardized scores from prostatectomy (C-statistic, 0.86) and biopsy (C-statistic, 0.85) was improved compared to models of original scores (prostatectomy C-statistic, 0.82; biopsy C-statistic, 0.72).
Ignoring the predominance of Gleason pattern 4 in GS 7 cancers may conceal important prognostic information. A standardized review of GS can improve prediction of PCa survival.
Gleason grading1 is a strong predictor of survival among men with prostate cancer (PCa).2 The Gleason system, introduced in 1974,3 is an architectural grading system that ranges from 1 (well differentiated) to 5 (poorly differentiated). The Gleason score (GS) is the sum of the primary and secondary patterns with a range of 2 to 10. It has long been appreciated that patients with GS ≥ 7 are at greater risk for extraprostatic extension and biochemical recurrence.4 However, more recent evidence suggests that the application of GS has changed, leading to changes in the distribution of scores over time.5–7 Further, studies using surrogate end points have shown that the prognosis of GS 7 cancers varies considerably.8–10
Analyses of GS assignment before and after the introduction of prostate-specific antigen (PSA) screening,7,11,12 and studies that have involved blind rereviews of original specimens,5–7,13 have noted increases in GS with more contemporary readings. The change in GS over time appears to be largely due to a systematic shift in Gleason grading, which is attributed primarily to two factors.14 First, it is widely accepted that Gleason grade from biopsy is frequently upgraded at prostatectomy, resulting in a reluctance to assign a low GS at diagnosis.15 Second, previously undescribed benign lesions (ie, atypical adenomatous hyperplasia) were historically mistaken for Gleason 1 + 1 tumors, but contemporary scoring more often correctly classifies those lesions as benign.16 Because systematic upgrading in Gleason results in improved survival for all Gleason categories, the observed Gleason shift has been described as the Will Rogers phenomenon.5,7
Historically, PCa risk prediction models4,17 and observational studies that have adjusted for GS utilized only the overall GS. Because numerous studies suggest that Gleason 3 + 4 tumors (ie, those where pattern 3 is most prevalent but some amount of pattern 4 is also observed) have a better prognosis than Gleason 4 + 3 tumors (ie, those where pattern 4 is more prevalent than pattern 3), contemporary clinical risk prediction now incorporates primary and secondary Gleason pattern.18 However, studies that have specifically assessed differences in Gleason 3 + 4 and 4 + 3 cancers have relied on associations of Gleason pattern with other prognostic factors and biochemical progression8,10,12,19–21 or, less frequently, development of metastases.8,21 While biochemical recurrence is a widely used outcome for studies relating to PCa risk prediction and treatment efficacy, it is important to note that its definition varies,22,23 and it is an imperfect surrogate for PCa mortality.24,25 We undertook a standardized review of radical prostatectomy and needle biopsy tumor tissue samples from the Physicians' Health Study (PHS) and the Health Professionals Follow-Up Study (HPFS) PCa cohorts to assess the predictive ability of a contemporary review compared to original GS, as well as explore potential differences in GS 7 subtypes, with PCa mortality as the primary end point.
The PHS26–28 was initiated as a randomized, double-blind, placebo-controlled trial for the primary prevention of cardiovascular disease and cancer. The study included 29,021 healthy US male physicians age 40 to 84 years at baseline. The majority of participants were white (94%). Participants were observed through annual questionnaires to collect data on diet, health, and lifestyle behaviors, and medical history, and biannually to ascertain compliance and health end points, including PCa.
The HPFS was initiated in 1986, when 51,529 male health professionals, ages 40 to 75 years, completed a mailed questionnaire on demographic characteristics, risk factors and preventive behaviors, and diet and use of supplements. The cohort was predominantly white (> 91%). Through biennial follow-up mailed questionnaires, we update exposure information and medical events, including PCa.
In both cohorts, participants are asked to report new diagnoses of cancer on follow-up questionnaires. Subsequently, hospital records and pathology reports are requested and reviewed by study investigators. Through systematic medical record review of PCa patients, we obtain clinical and pathological data. When available, the original GS, major and minor patterns are recorded. Stage is recorded according to the TNM staging system or a modified Whitmore-Jewett classification scheme. We also observed PCa patients through questionnaires to collect information on their PCa clinical course, including PSA levels and development of metastases. Deaths were ascertained through mailings, telephone calls, and searches of the National Death Index, and cause of death is assigned after review of death certificates, information from the family, and medical records. Follow-up for mortality was more than 99% complete in the PHS and more than 98% complete in the HPFS.
For PHS patients, original specimens (blocks and hematoxylin and eosin slides) from needle biopsy or prostatectomy are requested from the diagnosing institution of every PCa patient. For HPFS patients, only prostatectomy specimens are requested. Of the 1,195 tissue blocks obtained, we excluded two patients determined not to have cancer on standardized review, as well as two patients found to have transitional cell bladder cancer. An additional 379 patients were excluded because an original GS from the same specimen type as the obtained tissue was not available. A total of 812 men with PCa were included in the study. Blinded to the original pathology reports and clinical data, we undertook a standardized review (M.A.R., S.P., S.F.) of original hematoxylin and eosin slides from the referring hospitals and assigned a primary and secondary Gleason grade. The pathologists reviewed the slides independently, and then GSs were compared. For any patients with discrepant Gleason data between pathologists, slides were rereviewed until a consensus was reached.
All analyses were conducted separately for prostatectomy and needle biopsy specimens. To assess differences in original and standardized GS, we compared distributions among the men using a Wilcoxon signed-rank test. Trends in Gleason during three 4-year time periods (1985 to 1988, 1994 to 1997, and 2001 to 2004) were explored by plotting the original versus standardized scores and the best-fitting regression line through the points. To compare the ability of GS from original report and standardized review to predict PCa survival, the outcome event was defined as lethal PCa. Event dates were the date of diagnosis of bony metastases when such data were available, or the date of PCa death, otherwise. Patients who did not die of PCa and who did not develop metastases to bone were censored at time of death from other causes or the end of study follow-up (March 1, 2008). Follow-up time was calculated from the date of PCa diagnosis to the event date, or time of censoring.
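The event and censoring rules described above can be sketched in a few lines of Python. This is an illustration only, not the SAS code used in the study: the `Patient` record, its field names, and the 365.25-day year are assumptions introduced here, and the handling of events dated after the end of follow-up is a guess at a reasonable convention.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2008, 3, 1)  # end of study follow-up stated in the text


@dataclass
class Patient:  # hypothetical record layout, for illustration only
    diagnosis: date
    bone_metastasis: Optional[date] = None  # date bony metastases were diagnosed
    pca_death: Optional[date] = None        # date of death from PCa
    other_death: Optional[date] = None      # date of death from other causes


def follow_up(p: Patient) -> Tuple[float, bool]:
    """Return (years from diagnosis, True if lethal PCa was observed)."""
    # The metastasis date is used when available, the PCa death date otherwise.
    event_date = p.bone_metastasis if p.bone_metastasis is not None else p.pca_death
    if event_date is not None and event_date <= STUDY_END:
        end, event = event_date, True
    else:
        # Censor at death from other causes or at the end of follow-up,
        # whichever comes first.
        end = STUDY_END if p.other_death is None else min(p.other_death, STUDY_END)
        event = False
    return (end - p.diagnosis).days / 365.25, event
```

A patient with metastases recorded before PCa death contributes time only up to the metastasis date, which matches the definition of lethal PCa as the first of the two events.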
Crude mortality rates were calculated within six strata defined by the original reports and standardized GS (2 to 5, 6, 3 + 4, 4 + 3, 8, and 9 to 10). Time-to-event analyses to predict lethal PCa were also conducted separately for the original and the standardized Gleason data. Cox proportional hazards models that controlled for age at diagnosis were used to estimate hazard ratios (HR) and 95% CIs by including in the model indicator variables for each stratum of Gleason.
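As a rough illustration of the stratification and crude rate calculation, the sketch below assigns each tumor to one of the six strata and computes events per 1,000 person-years. The function names and the input layout are assumptions made for this example. The text does not say how a score of 7 from patterns other than 3 + 4 or 4 + 3 would be handled, so the sketch refuses to classify such cases rather than guess.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def gleason_stratum(primary: int, secondary: int) -> str:
    """Map primary and secondary Gleason patterns to one of the six strata."""
    if not (1 <= primary <= 5 and 1 <= secondary <= 5):
        raise ValueError("Gleason patterns range from 1 to 5")
    score = primary + secondary
    if score <= 5:
        return "2-5"
    if score == 6:
        return "6"
    if score == 7:
        # Only 3 + 4 and 4 + 3 are described in the text.
        if (primary, secondary) == (3, 4):
            return "3+4"
        if (primary, secondary) == (4, 3):
            return "4+3"
        raise ValueError("GS 7 from patterns other than 3 and 4 is not covered")
    if score == 8:
        return "8"
    return "9-10"


def crude_rates(records: Iterable[Tuple[str, float, bool]]) -> Dict[str, float]:
    """Events per 1,000 person-years by stratum.

    Each record is (stratum, years of follow-up, lethal-event indicator).
    """
    events: Dict[str, int] = defaultdict(int)
    person_years: Dict[str, float] = defaultdict(float)
    for stratum, years, event in records:
        person_years[stratum] += years
        events[stratum] += int(event)
    return {s: 1000.0 * events[s] / person_years[s]
            for s in person_years if person_years[s] > 0}
```

In a Cox model these strata would enter as indicator variables with one stratum omitted as the reference group.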
The discrimination of models that included the GS based on the standardized review and historical grading was compared using 10-year survival as an end point. In this analysis, we contrasted two groups: men who lived at least 10 years after diagnosis without known metastases, and men who developed metastases or died of PCa within 10 years after diagnosis. To obtain C-statistics for both original and standardized scores, a 10-level ordinal variable for GS 2 to 10 (with separate codes for Gleason 3 + 4 and 4 + 3) was included in a logistic regression model. We fit crude models and models that adjusted for age, stage, and PSA at diagnosis. We obtained 95% CIs by repeating the analysis on 1,000 bootstrap samples.29 All analyses were performed using SAS version 9.1.3 (SAS Institute, Cary, NC). The research protocol was approved by the institutional review board at the Harvard School of Public Health and Partners Healthcare.
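For readers unfamiliar with the C-statistic, a minimal Python sketch follows. It is not the SAS procedure used in the analysis. It assumes the ordinal GS code enters the crude model as a single term with a positive coefficient, in which case the predicted probability rises monotonically with the code and the model's C-statistic equals the concordance of the code itself. The percentile bootstrap interval is likewise an assumption, because the text does not state which bootstrap interval was used.

```python
import random
from typing import List, Sequence, Tuple


def c_statistic(scores: Sequence[float], events: Sequence[bool]) -> float:
    """Probability that a patient with the event has a higher score than a
    patient without it, counting ties as one half."""
    cases = [s for s, e in zip(scores, events) if e]
    controls = [s for s, e in zip(scores, events) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def bootstrap_ci(scores: Sequence[float], events: Sequence[bool],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the C-statistic."""
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ev = [events[i] for i in idx]
        if all(ev) or not any(ev):
            continue  # this resample lacks one of the two groups; draw again
        stats.append(c_statistic([scores[i] for i in idx], ev))
    stats.sort()
    lo = stats[int(round((alpha / 2) * (n_boot - 1)))]
    hi = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lo, hi
```

Adjusted models with age, stage, and PSA would require fitting the logistic regression and scoring concordance on the fitted probabilities instead of the raw GS code.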
Selected characteristics of the study population are included in Table 1. The mean age at diagnosis was 65.5 years among the 693 patients with prostatectomy specimens and 71.2 years among the 119 patients with biopsy specimens. Most patients were diagnosed in the PSA screening era (85% of prostatectomy and 93% of biopsy specimens).
We observed a dramatic and statistically significant shift in Gleason grading on standardized review for both prostatectomy (P < .0001) and biopsy specimens (P < .0001). The shift in GS assignment of prostatectomy specimens is illustrated in Table 2. Among the prostatectomy patients, 171 were originally assigned a GS of 2 to 5, but on standardized review, only six patients had that assignment (all of whom were GS 5). At the upper end of the scale, 28 prostatectomy patients were originally assigned GS 9 or 10, compared to 45 on standardized review. For biopsy specimens, GS assignment of 2 to 5 decreased from 20 to 1 and GS of 9 to 10 decreased from 7 to 3.
Figure 1 illustrates the trends in Gleason grading of prostatectomy specimens during the study period. Our pathologists tended to give scores that were higher than those originally assigned from 1985 to 1988. Compared to the scores assigned in 1994 to 1997, the standardized GS at the upper end of the scale were generally upgraded from their original score. This trend appeared to persist through 2001 to 2004. When we divided the data into three time periods (pre-1994, 1994 to 2000, and 2001 to 2004), the weighted κ statistics for concordance between original and standardized scores increased in each subsequent time period (from 23% to 33% to 44%), while the percent of patients with original GS that were lower than the standardized scores simultaneously decreased (from 73% to 55% to 39%).
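The weighted κ statistic reported above measures agreement beyond chance while giving partial credit to near-misses. The sketch below shows one common form, Cohen's κ with linear disagreement weights. The text does not specify which weighting scheme was used, so the linear weights, the function name, and the 0-based category coding are all assumptions.

```python
from typing import Sequence


def weighted_kappa(a: Sequence[int], b: Sequence[int], n_levels: int) -> float:
    """Cohen's kappa with linear weights for ratings coded 0..n_levels-1."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_levels < 2:
        raise ValueError("need at least two categories")
    n = len(a)
    # Observed joint proportions of (original, standardized) ratings.
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for i, j in zip(a, b):
        observed[i][j] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = abs(i - j) / (n_levels - 1)  # 0 on the diagonal, 1 at the extremes
            obs_disagreement += w * observed[i][j]
            exp_disagreement += w * row[i] * col[j]
    if exp_disagreement == 0.0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - obs_disagreement / exp_disagreement
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the rise from 23% to 44% corresponds to original scores moving closer to the standardized review over time.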
A total of 53 lethal PCa events, which included the development of bony metastases (n = 11) and PCa-specific deaths (n = 42), occurred during the study period. Of the 241 total patients (biopsy and prostatectomy) assigned GS 2 to 5 or 6 on standardized review, none developed lethal PCa. Among the 693 prostatectomy patients, 37 developed lethal PCa, with postdiagnostic survival time ranging from 0.1 to 21.1 years (median, 10.4). As presented in Table 3, crude cancer mortality rates increased in each consecutive stratum of standardized GS for the prostatectomy specimens.
Because no lethal events occurred among prostatectomy cases scored as GS 2 to 6 in the standardized review, we designated 3 + 4 as the reference group when estimating HRs for lethal PCa. After adjusting for age at diagnosis, rates of lethal PCa for prostatectomy specimens assigned as GS 4 + 3, 8, and 9 to 10 on standardized review were significantly higher than GS 3 + 4 (Table 3), and rates increased with each level of Gleason. Among those with a prostatectomy specimen, patients with a standardized GS of 4 + 3 were 3.1 times more likely to develop lethal PCa than patients with 3 + 4 (95% CI, 1.1 to 8.6), while patients with a standardized GS of 9 to 10 were more than 19 times more likely to develop lethal PCa compared to patients with GS of 3 + 4 (HR, 19.1; 95% CI, 7.4 to 49.2). Using the original GS from prostatectomy, an elevated rate of lethal PCa was also found when we compared GS 4 + 3 versus 3 + 4 (HR, 2.4; 95% CI, 1.0 to 5.6). Patients with prostatectomy specimens assigned GS 9 to 10 in the original report were only 4.9 times as likely to develop lethal PCa compared to cases with GS 3 + 4 tumors assigned by the original pathologist (95% CI, 2.0 to 11.9; Table 3). After additional adjustment for pathological stage and log (PSA) at diagnosis among the subset of 502 patients for whom data were available, the point estimates for the standardized and original GS among prostatectomy cases were similar, albeit with wider CIs. The HRs for standardized GS of 4 + 3, 8, and 9 to 10 cancers compared to 3 + 4 were 2.6 (95% CI, 0.4 to 16.0), 6.2 (95% CI, 0.9 to 44.7), and 25.9 (95% CI, 4.7 to 145.2), respectively. For the original GS, there were no deaths in the GS 2 to 5 category among the subset of men with stage and PSA data. HRs comparing each remaining stratum of original GS to Gleason 3 + 4 were 0.2 for Gleason 6 (95% CI, < 0.1 to 1.8), 3.2 for Gleason 4 + 3 (95% CI, 0.9 to 11.1), 0.5 for Gleason 8 (95% CI, 0.1 to 4.3), and 3.3 for Gleason 9 to 10 (95% CI, 0.8 to 13.7).
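The asymmetric intervals around these HRs follow from estimation on the log scale. Assuming the intervals are Wald-type (an assumption, since the text does not name the method), the point estimate should sit near the geometric mean of the limits, and the standard error of the log HR can be recovered from the interval width. The helper below is a hypothetical illustration of that arithmetic.

```python
import math
from typing import Tuple


def log_hr_summary(lower: float, upper: float, z: float = 1.959964) -> Tuple[float, float]:
    """Return (geometric midpoint of the CI, implied SE of the log HR).

    Assumes a 95% Wald interval of the form exp(log HR +/- z * SE).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    midpoint = math.exp((log_lo + log_hi) / 2.0)
    se = (log_hi - log_lo) / (2.0 * z)
    return midpoint, se


# Standardized GS 4 + 3 versus 3 + 4 at prostatectomy: 95% CI, 1.1 to 8.6
mid, se = log_hr_summary(1.1, 8.6)
print(round(mid, 2), round(se, 2))
```

For the interval 1.1 to 8.6 the midpoint is close to the reported HR of 3.1, and for 7.4 to 49.2 it is close to the reported 19.1, which is what a Wald interval would produce after rounding.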
For the 119 patients with biopsy specimens, 16 developed lethal PCa and postdiagnostic survival ranged from 0.1 to 16.3 years (median, 8.3). Crude cancer mortality rates in each stratum of standardized GS of the biopsy specimens were as follows: GS 2 to 5, no deaths; GS 6, no deaths; GS 3 + 4, 11.0/1,000 person-years; GS 4 + 3, 18.7/1,000 person-years; GS 8, 40.6/1,000 person-years; GS 9 to 10, 98.8/1,000 person-years. According to the original GS of the biopsy specimens, crude cancer mortality rates per 1,000 person-years were 14.0 for GS 2 to 5, 8.5 for GS 6, 10.8 for GS 3 + 4, 45.2 for GS 4 + 3, 26.6 for GS 8, and 15.9 for GS 9 to 10.
In predicting 10-year survival, we observed a marked improvement in discrimination of models that utilized strata of the standardized scores compared to original scores when modeled alone or with age. For men with prostatectomy specimens, this analysis included the 380 men who lived at least 10 years after diagnosis without known metastases and the 30 who experienced development of distant metastases or death from PCa within 10 years of diagnosis. When we modeled GS alone, the original score from prostatectomy was associated with a C-statistic of 0.82 (95% CI, 0.77 to 0.88), whereas the C-statistic was 0.86 (95% CI, 0.82 to 0.91) in the model using the standardized scores (Fig 2). When we modeled GS, age, pathologic stage, and log (PSA) at diagnosis, the C-statistics were 0.89 (95% CI, 0.82 to 0.96) for the original GS and 0.90 (95% CI, 0.83 to 0.97) for the standardized GS. For men with biopsy specimens, the comparison of model discrimination included the 37 men who lived at least 10 years after diagnosis without known metastases and the 12 who developed lethal PCa within 10 years of diagnosis. C-statistics for models of biopsy GS alone were 0.72 for the original scoring (95% CI, 0.58 to 0.86) and 0.85 for the standardized review (95% CI, 0.76 to 0.94).
As noted in previous studies,5,7,13 we observed a striking upgrading of GS in a standardized review compared to the original reading. No patients were assigned a GS of 2 on standardized review and fewer than 1% received GS 3 to 5. We observed a 61% increase in the number of prostatectomy specimens assigned a GS of 9 to 10 in the standardized review. More importantly, we found that systematic changes in Gleason grading have improved the ability of GS to predict PCa mortality. Among men with GS lower than 7 assigned in a standardized review, no lethal PCa developed in more than 2,600 person-years of follow-up. Moreover, our analysis revealed that Gleason 4 + 3 cancers assigned to prostatectomy specimens have three times the rate of PCa mortality compared with Gleason 3 + 4 tumors. Considering that capturing the amount of Gleason pattern 4 in tumors revealed remarkable differences in outcome, it is possible that utilizing the percentage of Gleason 4 may provide further prognostic information, as suggested by a study of biochemical progression.30
One previous study found that standardized GSs of needle biopsy and transurethral resection of the prostate specimens were significantly better at discriminating indolent from aggressive disease than the original GS assigned in 1991 to 1996.13 Our study confirms these findings in a population where the majority of GS data came from prostatectomy. Interestingly, adding data on stage and PSA at diagnosis to our models of PCa mortality produced similar C-statistics for both original and standardized GS (0.89 and 0.90, respectively). These results suggest that stage and PSA levels are more important predictors of PCa mortality when original Gleason data are used, as would often be the case in epidemiologic studies. As PSA screening continues to increase the number of patients diagnosed with localized cancer and reduces variability in stage at diagnosis,31 obtaining the most precise assessment of tumor grade will become more critical for population-based studies. Thus, both the primary and secondary Gleason patterns should be considered essential components of PCa data collection for prognostic and research purposes. Given that standard PCa nomograms are most often based on a particular clinical series graded by a single pathologist or one team of pathologists,18 it is reassuring that GS assigned in a variety of settings do as well as standardized scores in discriminating lethal from indolent cancers when other clinical covariates are considered.
Our findings underscore the difficulty in identifying PCa patients who should be treated with prostatectomy. All men with standardized GS 6 tumors at prostatectomy survived, but many of these men likely would have survived without intervention.32 By contrast, one third of men with standardized GS 9 to 10 tumors at prostatectomy developed lethal PCa despite surgery, most likely due to micrometastases at diagnosis. If GS at diagnosis is to be used for guiding treatment decisions, our biopsy data suggest that contemporary, standardized scoring is preferable to historical scoring from diagnosing institutions. All 35 men with standardized GS 2 to 6 at biopsy survived, while two of three with standardized GS 9 to 10 at biopsy developed lethal PCa; however, there were no clear trends in mortality rates according to the biopsy GS assigned by diagnosing institutions—a concerning finding given that the original biopsy GS may have factored prominently into disease management.
This study has several strengths. A large sample of patients with 20 years of follow-up allowed the use of PCa mortality as a primary end point. In light of data indicating that the risks of dying from PCa and other causes 15 years after biochemical recurrence are virtually equivalent (32% v 33%),33 PSA relapse cannot be viewed as a substitute for more definitive end points. The availability of data on primary and secondary Gleason pattern from both the initial review and a standardized review allowed us to utilize mortality data to make comparisons of 3 + 4 and 4 + 3 cancers, separately for needle biopsy and prostatectomy specimens. Limitations include missing data on tumor stage and PSA at diagnosis. Nevertheless, our study indicates that contemporary GS from a standardized review can markedly improve prediction of PCa-specific survival compared to the original GS. Further, our study is the first to confirm with PCa-specific mortality data that the predominance of Gleason pattern 4 in GS 7 cancers represents important prognostic information: when it comes to Gleason grading, 3 + 4 does not equal 4 + 3.
We thank the participants in the Physicians' Health Study and Health Professionals Follow-Up Study for their long-standing participation. We are grateful to Julia Fleet, Luba Bondarenko, Al Wing, and Haiyan Zhang for their assistance with data collection and programming.
Supported by Grants No. 5R01CA058684-13 and 5R01CA042182-20 from the National Cancer Institute; Grant No. W81XWH-05-1-0562 from the Department of Defense; and Grant No. T32 CA009001-32 from the National Research Service Award Training Program in Cancer Epidemiology and a Dana-Farber/Harvard Cancer Center SPORE Career Development Award (J.R.S.). The Physicians' Health Study is supported by Grants No. CA34944, CA40360, and CA097193 from the National Cancer Institute and Grants No. HL-26490 and HL-34595 from the National Heart, Lung, and Blood Institute.
Presented in part in abstract format at the Annual Meeting of the American Association for Cancer Research, April 12-16, 2008, San Diego, CA.
Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.
The author(s) indicated no potential conflicts of interest.
Conception and design: Jennifer R. Stark, Meir J. Stampfer, Lorelei A. Mucci
Financial support: Meir J. Stampfer, Edward L. Giovannucci
Administrative support: Meir J. Stampfer, Anna S. Eisenstein, Tobias Kurth
Provision of study materials or patients: Tobias Kurth
Collection and assembly of data: Jennifer R. Stark, Sven Perner, Stephen Finn, Anna S. Eisenstein, Massimo Loda, Mark A. Rubin
Data analysis and interpretation: Jennifer R. Stark, Meir J. Stampfer, Jennifer A. Sinnott, Jing Ma, Michelangelo Fiorentino, Edward L. Giovannucci, Mark A. Rubin, Lorelei A. Mucci
Manuscript writing: Jennifer R. Stark
Final approval of manuscript: Jennifer R. Stark, Sven Perner, Meir J. Stampfer, Jennifer A. Sinnott, Stephen Finn, Anna S. Eisenstein, Jing Ma, Michelangelo Fiorentino, Tobias Kurth, Massimo Loda, Edward L. Giovannucci, Mark A. Rubin, Lorelei A. Mucci