To compare the accuracy of cancer progression prediction of the molecular-genetics and morphometry-based Endometrial Intraepithelial Neoplasia (EIN) and the WHO-94 classification schemes in endometrial hyperplasias.
A multicenter multivariate analysis of 477 endometria with a required minimum cancer-free interval of 1 year from the index biopsy (follow-up 1-18 years), compared with 197 patients followed for less than 1 year.
24/477 (5.0%) hyperplasias progressed to cancer, after a median of 4 years (maximum 10 years). According to WHO94, 16/123 (13%) atypical and 8/354 (2.3%) non-atypical hyperplasias progressed (hazard ratio [HR] 7). 22/118 (19%) of EINs and 2/359 (0.6%) of non-EINs progressed (HR 45). EIN was prognostic within each WHO94 subcategory: progression rates in simple (SH), complex (CH), simple atypical (SAH) and complex atypical (CAH) hyperplasias with EIN were 3%, 22%, 17% and 38%, respectively, compared with 0.0-2.0% when EIN was absent. EIN detected precancer better (sensitivity 92%) than the WHO94 atypical hyperplasias collectively (67%) or CAH alone (46%). In Cox regression, EIN was the strongest prognostic indicator of future endometrial cancer. The same held for patients with <1 year follow-up (HRs for EIN, atypia and CAH of 58, 7 and 8, respectively).
The EIN classification more accurately predicts cancer-progression than WHO94 and identifies many women with benign changes regarded as high risk under WHO94.
Comparison of the accuracy of cancer progression prediction of the Endometrial Intraepithelial Neoplasia (EIN) and the WHO-94 classification schemes in 477 endometrial hyperplasias with a required minimum cancer-free interval of 1 year and in 197 patients with <1 year follow-up. The EIN classification (hazard ratio [HR] 45) predicts cancer progression more accurately than WHO94 (HR 7) and identifies many women with benign changes regarded as high risk under WHO94.
Endometrial hyperplasia is a common disease, with frequency estimates ranging from roughly half [1] to several times [2, 3] the estimated 40,320 new cases of endometrial cancer reported in the United States alone in 2004 [1]. The higher estimate agrees better with incidences in the Netherlands [4] and in the province of Rogaland, south-west Norway (Baak, unpublished results, 2004). Extrapolating these incidences to the European Union (380 million inhabitants) gives approximately 120,000 new cases per year. Because only 1 to 28% of hyperplasias actually progress to cancer, depending on their severity [5], it is important to stratify cases into high and low cancer risk before initiating therapy. The World Health Organization 1994 classification of endometrial hyperplasias (WHO94) [6] is widely used for this purpose. The most recent WHO classification (WHO2003) [7] acknowledges the shortcomings of WHO94 and on this basis has introduced the alternative molecular genetics- and morphometry-based Endometrial Intraepithelial Neoplasia (“EIN”) classification [8].
The two classification systems differ in their foundations. WHO94, based entirely on histologic findings, uses four subcategories defined by architectural and cytologic alterations. In practice, its diagnostic criteria are difficult to apply reproducibly because they are largely subjective, and even acknowledged experts differ substantially in their reporting. Data from the GOG (Gynecologic Oncology Group) showed that the WHO94 criteria are easily misinterpreted in the community: on expert review, 30% of cases submitted as complex atypical hyperplasia (CAH) were diagnosed as cancer, while another 40% were a lesser degree of hyperplasia or entirely benign [9]. Furthermore, the reported progression rates for 2 of the 4 WHO94 subcategories rest on very small numbers: only 1 of 13 (8%) patients with complex hyperplasia (CH) and 1 of 8 (13%) patients with simple atypical hyperplasia (SAH) progressed [5].
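To illustrate why progression rates estimated from counts such as 1 of 13 or 1 of 8 are imprecise, the Python sketch below computes approximate 95% Wilson score intervals for those counts and, for contrast, for the 22 of 118 EIN cases reported in the Results. The Wilson interval is our choice for illustration; the source does not say how (or whether) intervals were computed for these proportions.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts quoted in the text: CH (1 of 13) and SAH (1 of 8) from ref [5],
# and EIN cases in this series (22 of 118) for contrast.
for label, k, n in [("CH, 1/13", 1, 13), ("SAH, 1/8", 1, 8), ("EIN, 22/118", 22, 118)]:
    lower, upper = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The interval for 1 of 13 spans roughly 1% to 33%, several times wider than the interval for the much larger EIN group.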
The EIN system, in contrast, has a molecular genetic basis, can be implemented by morphometric analysis and, even on H&E slides alone, has semiquantifiable features [8]. Its prognostic foundation is the D-Score, a unique combination of three reproducible morphometric features reflecting glandular volume, architectural complexity and cytologic (nuclear) abnormality [10]. These three features, selected from many variables for their ability to independently predict cancer progression in multivariate analysis, are: 1) volume percentage stroma (VPS), the percentage of endometrial tissue composed of stroma (i.e., the inverse of the glandular percentage, a measure of crowding); 2) the length (perimeter) of basement membrane around the endometrial glands, a measure of gland complexity termed OSD for “outer surface density” of glands; and 3) the standard deviation of the shortest nuclear axis (SDSNA), which reflects nuclear pleomorphism. Three previous studies found the prognostic value of the D-score to be better than that of WHO94 [4, 11, 12]. This paper compares the WHO94 and D-Score-based EIN classification schemes in the largest series reported to date, with long follow-up, to determine which scheme best predicts the risk of future cancer.
Endometrial curettage or biopsy (Vabra or Pipelle) specimens diagnosed as hyperplasia were collected over 22 years from 8 contributing centers (6 European and 2 American) representing 16 hospitals. After EIN was more recently recognized as a diagnostic category, a minority of cases were also collected on that basis. Under the rules established for the diagnosis of EIN, all EIN cases are hyperplasias, but not all hyperplasias are EIN; this is discussed more fully below.
Slides from all cases were reviewed and recategorized according to the WHO94 system by the senior investigator from each contributing center, and these diagnoses were used for all analyses. Pilot studies showed that agreement (correlation) between each center's participating investigator and the staff pathologist who originally diagnosed the case was moderate to reasonable, whereas agreement between investigators from different centers was substantially lower when they first began collaborating. This indicates center-to-center differences in interpreting WHO94 criteria, a well-documented phenomenon [9].
“Progression” is a debatable concept. Tumors encountered during early follow-up of a precancer could be cancers that were already present but missed at the time of the original hyperplasia biopsy. Because of this concern, carcinomas occurring within the first year of follow-up were considered separately as concurrent rather than progression events, as previously done by others [5]. This study explores the issue of progression from EIN to carcinoma by examining events during the first year separately from those after the first year.
Among 1061 cases diagnosed as hyperplasia, 477 fulfilled all criteria for inclusion. Another 197 patients, analyzed separately, were followed for less than 1 year; in 43 of these, cancer was discovered within this time. Of the remaining 387 cases, 181 uteri were removed within 8 weeks for non-cancerous reasons, too short a time frame for meaningful analysis; 74 were endometrial polyps or other lesions misdiagnosed as hyperplasia; 6 had other gynecologic cancers; and 12 had foci too small to measure reliably (< 1 mm in diameter) or tissues that were artifactually damaged or unsuitable for morphometric analysis. 33 subjects were lost to follow-up altogether.
While some patients have been reported previously (Table 1), wherever possible additional follow-up information has been sought from clinical management records and subsequent pathologic specimens. The median follow-up interval was 48 months (range 13-120) for cases with progression and 68 months (range 13-216) for those without.
The 197 patients who either developed cancer within the first year (n=43) or whose follow-up ended within the first year without cancer (n=154) were tabulated separately as short-term outcomes.
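The follow-up rules in the preceding paragraphs can be summarized as a small decision function. This Python sketch is ours, under the assumptions that time is measured in months from the index biopsy and that "within the first year" means fewer than 12 months; the function and label names are hypothetical, and the exclusion criteria (early hysterectomy, misdiagnosis, unsuitable tissue, loss to follow-up) are not modeled.

```python
def assign_cohort(months: float, cancer: bool) -> str:
    """Assign a patient to an analysis group (hypothetical helper).

    months: time from index biopsy to cancer diagnosis (if cancer) or to last contact.
    cancer: whether endometrial cancer was diagnosed.
    """
    if months < 12:
        # Cancer in the first year counts as concurrent, not progression;
        # cancer-free patients followed < 1 year are short-term outcomes.
        return "short-term: concurrent cancer" if cancer else "short-term: no cancer"
    # At least 1 year of cancer-free follow-up: eligible for the progression analysis.
    return "long-term: progression" if cancer else "long-term: censored"


print(assign_cohort(6, True))    # cancer found at 6 months
print(assign_cohort(48, True))   # cancer found at 48 months
```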
Therapeutic interventions between the index hyperplasia diagnosis and the end of follow-up were not formally standardized but reflected the various institutional practices. Given the many institutions and practices involved, detailed analysis of practice patterns and their effect on outcome was not feasible. It is the authors' impression that higher grade lesions were treated more aggressively than lower grade lesions, and that some American centers were more aggressive in their approaches than their European counterparts, which naturally raises the possibility of over- or undertreatment. In general, patients did not receive hormonal therapy but were re-biopsied either for symptomatic indications or at prescribed intervals.
The specimens were fixed in 4% buffered formaldehyde and paraffin-embedded, and 4-μm-thick histological sections were stained with hematoxylin-eosin (H&E). The gynecopathologist co-authors reviewed the cases from their own laboratories without supporting information (“blindly”), using the WHO criteria [13] for simple (SH), complex (CH), simple atypical (SAH) or complex atypical (CAH) hyperplasia. Cancer was diagnosed on subsequent tissue specimens using standard criteria (solid areas of epithelium, cribriform glands, interconnected “meandering” lumina or myometrial invasion) [13].
Computerized morphometric analysis was performed with the QPRODIT system (version 6.1, Leica, Cambridge, U.K.) on the standard diagnostic H&E section [4]. Because non-hyperplastic areas must be carefully excluded from the measurement area, each primary surgical pathologist demarcated the diagnostic area on the glass slide with a black marker; the minimum acceptable diameter was 1 mm. The analyses were performed in the different collaborating laboratories. Over the past two decades, 17 technicians have performed D-score (DS) assessments, all with minimal inter-observer variation (correlation coefficients R = 0.91 to 0.98) and 100% agreement in final prognostic classification (i.e., DS <1 vs. ≥1, see below) [4, 10, 14]. The average technician time per case was less than 30 minutes.
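As an illustration of the two reproducibility measures mentioned above, the Python sketch below computes the Pearson correlation between two technicians' D-scores and the proportion of cases both place on the same side of the DS = 1 threshold. The D-score values are invented for illustration, and the use of Pearson's r is an assumption, since the text does not name the correlation coefficient used.

```python
import math

EIN_THRESHOLD = 1.0  # DS < 1 -> EIN, DS >= 1 -> non-EIN


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classification_agreement(x: list[float], y: list[float]) -> float:
    """Fraction of cases both observers place on the same side of the threshold."""
    same = sum((a < EIN_THRESHOLD) == (b < EIN_THRESHOLD) for a, b in zip(x, y))
    return same / len(x)


# Hypothetical D-scores for the same five slides measured by two technicians.
tech_a = [-1.2, 0.4, 1.8, 2.5, -0.3]
tech_b = [-1.0, 0.5, 1.6, 2.7, -0.4]
print(round(pearson_r(tech_a, tech_b), 3), classification_agreement(tech_a, tech_b))
```

The two measures are kept separate because a high correlation alone does not guarantee that observers agree on cases lying close to the threshold.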
The D-score [4] was calculated from the three morphometric features defined above (VPS, OSD and SDSNA).
D-score values range from -5 to +6. Currently, EIN is diagnosed if the D-score is <1, and low risk (non-EIN) if the D-score is ≥1. D-scores between 0 and +1, considered “indeterminate” in the past, correlate better with monoclonality and are therefore now regarded as EIN [14].
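A minimal Python sketch of how a D-score could be computed and thresholded. The D-score equation is not written out in this text, so the coefficients below are an assumption: they are the values commonly cited for the D-score in the EIN literature and must be checked against reference [4] before any use. The units of SDSNA and OSD are likewise assumed to match the original measurement system. With these signs, more stroma raises the score, while greater nuclear variability and more complex gland outlines lower it, consistent with low scores indicating EIN.

```python
import math

# ASSUMPTION: commonly cited D-score coefficients, not given in this text.
# Verify against reference [4] before relying on them.
INTERCEPT = 0.6229
COEF_VPS = 0.0439        # per percentage point of stroma (VPS, %)
COEF_LN_SDSNA = -3.9934  # natural log of SD of shortest nuclear axis
COEF_LN_OSD = -1.1783    # natural log of outer surface density

EIN_THRESHOLD = 1.0


def d_score(vps: float, sdsna: float, osd: float) -> float:
    """D-score from VPS (%), SDSNA and OSD (units as in the original system; assumed)."""
    return (INTERCEPT
            + COEF_VPS * vps
            + COEF_LN_SDSNA * math.log(sdsna)
            + COEF_LN_OSD * math.log(osd))


def ein_class(d: float) -> str:
    """EIN if D < 1; scores from 0 to +1, once 'indeterminate', now count as EIN."""
    return "EIN" if d < EIN_THRESHOLD else "non-EIN"


print(ein_class(0.5), ein_class(1.0))
```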
Within the EIN diagnostic scheme, endometria with D-Score >1 are reassigned into a variety of benign categories including proliferative endometria with architectural changes of unopposed estrogens (anovulatory type), polyps, reactive and degenerative changes, and unusual cuts of normal structures such as the basalis and lower uterine segment.
Cancer development was the end-point; for patients without cancer, data were censored at the time of last contact. Kaplan-Meier survival analysis and multivariate (Cox proportional hazards) regression were used to assess prognostic value. Hazard ratios (HR) with 95% confidence intervals, sensitivity, specificity, and positive and negative predictive values were also calculated.
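A minimal Python sketch of the Kaplan-Meier (product-limit) estimator for cancer-free follow-up, written from the standard definition rather than from the software used in the study, which the text does not name. The six follow-up records are hypothetical; Cox regression would normally come from a statistics package and is not reimplemented here.

```python
from collections import Counter


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Product-limit estimate of cancer-free probability.

    times: months of follow-up; events: 1 = cancer diagnosed, 0 = censored.
    Returns (time, estimated cancer-free probability) at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(event_counts):
        # Patients still under observation at t; those censored at t count as at risk.
        at_risk = sum(1 for x in times if x >= t)
        survival *= 1 - event_counts[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up for six patients (not study data).
print(kaplan_meier([12, 24, 24, 36, 48, 60], [0, 1, 0, 1, 0, 0]))
```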
Of the 477 patients, 24 (5%) developed endometrial carcinoma more than one year after the index biopsy (Table 2). Under the WHO94 scheme, which emphasizes cytologic atypia, the progression rate was 13% (16/123) with atypia and 2.3% (8/354) without. Under the EIN scheme, 19% (22/118) of patients with EIN progressed, compared with 0.6% (2/359) of patients with benign endometria without EIN (“non-EIN”). Substratifying each WHO94 diagnostic group by the presence or absence of EIN further polarized cancer risk. In complex atypical hyperplasia, the most abnormal category of hyperplasia, the progression rate rose from 0% in cases rediagnosed as non-EIN (D-Score >1) to 38% in those with EIN (D-Score <1). Similarly, the progression rate rose from 2% to 22% for complex non-atypical hyperplasia rediagnosed as EIN, and from 0% to 17% for simple atypical hyperplasia rediagnosed as EIN. The hazard ratio was far higher for EIN (45.4) than for atypical (combined simple and complex) hyperplasia (7.0) or complex atypical hyperplasia (7.7).
Cancers occurred throughout the follow-up interval (median 48 months, range 13-120), strongly suggesting that they were new developments rather than synchronous tumors missed at the initial biopsy (Fig. 1). A time trend in progression to cancer appeared when cases were stratified by D-score (36 months for scores <0 and 60 months for scores >0) (Fig. 2), but not by WHO94 diagnosis (median time for SAH 20 months, CH 40, CAH 60 and SH 65). Progression rates varied between centers as a function of the available follow-up interval and of the proportion of entered hyperplasias reclassified as EIN (Table 1). Only 2 of the 359 patients with a D-score >1 developed cancer.
Previous studies showed that the volume percentage stroma (VPS), one of the 3 features making up the D-Score, was the strongest correlate of cancer progression and also of monoclonality, a cardinal biologic feature of premalignant disease [4, 10, 14]. We therefore evaluated the prognostic value of VPS at different levels. Progression risk was highest (40%) when VPS was <41%, and no case with VPS >57% progressed. Intermediate values carried a 5% cancer risk, but these cancers developed after a longer latent interval than those in specimens with VPS <41%.
Because of relatively few cases, previous studies were inconclusive regarding the prognostic significance of a D-Score between 0 and +1. The current data show that cases with these scores behave more like those with D-Score <0 than those with D-Score >1 (Fig. 2).
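The VPS cut-offs reported above can be expressed as a simple lookup. This hypothetical Python helper only restates the observed risk bands in this series and is not a validated clinical rule; the text does not say how the boundary values 41% and 57% themselves were assigned, so treating them as intermediate is an assumption.

```python
def vps_risk_band(vps: float) -> str:
    """Map volume percentage stroma (%) to the risk bands observed in this series."""
    if vps < 41:
        return "high"          # about 40% progressed
    if vps > 57:
        return "low"           # no progression observed
    return "intermediate"      # about 5%, with a longer latent interval


print([vps_risk_band(v) for v in (35, 50, 60)])
```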
Compared with the various WHO94 subcategories and permutations, EIN more often correctly predicted clinical cancer outcome (sensitivity 92%) than simple and complex atypical hyperplasia as a group (67%) or CAH alone (46%) (Table 3). EIN, atypical hyperplasia and CAH were similarly excellent at predicting absence of progression (negative predictive values of 99%, 98% and 97%, respectively). Cox regression analysis showed that EIN (DS <1) was the strongest independent prognostic indicator, far stronger than WHO94 (whether as 4 subcategories, CAH vs. non-CAH, AH vs. no AH, or SAH vs. non-SAH) (Table 2).
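The sensitivities and negative predictive values above follow directly from the counts given in the Results. The Python sketch below recomputes them for EIN and for WHO94 atypia, treating progression after more than 1 year as the positive outcome; CAH is omitted because its full 2×2 counts are not stated in this text.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard 2x2 metrics, with progression to cancer as the 'positive' outcome."""
    return {
        "sensitivity": tp / (tp + fn),  # progressors classified as high risk
        "specificity": tn / (tn + fp),  # non-progressors classified as low risk
        "ppv": tp / (tp + fp),          # high-risk cases that progressed
        "npv": tn / (tn + fn),          # low-risk cases that did not progress
    }


# Counts from the text: 24 of 477 patients progressed after more than 1 year.
ein = diagnostic_metrics(tp=22, fp=118 - 22, fn=2, tn=359 - 2)
atypia = diagnostic_metrics(tp=16, fp=123 - 16, fn=8, tn=354 - 8)
for name, metrics in [("EIN", ein), ("WHO94 atypia", atypia)]:
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```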
Cancers discovered in the immediate months following a biopsy diagnosis of hyperplasia may have been present but inadvertently missed at the time of biopsy. This section highlights this patient group to assess: 1) the influence of time, and 2) the utility of the EIN system to predict those cases where concurrent cancer will be found.
Cancer was discovered in 43 patients during the first year of follow-up, while another 154 were free of apparent tumor but followed for less than 1 year. No patient with a D-Score >1 developed cancer, whereas 39% of patients with low D-scores did (Table 4), including 30% (3 of 10) of patients with simple non-atypical hyperplasia. The WHO class with the highest cancer rate in the first year (52.5%) was complex atypical hyperplasia (typically with D-Scores <0).
The much higher rate of cancer occurrence in the first year, compared with cancers appearing more than one year later, is strong evidence that many cancers were missed through sampling or interpretive errors at the time of the index biopsy (compare Tables 1 and 4). None of the non-EIN lesions (n=87) developed cancer, whereas EIN lesions were again associated with a high cancer risk within 12 months (hazard ratio 58, compared with 7.4 and 7.5 for WHO94 atypia and CAH; Fig. 3 and Table 4).
This study shows that the Endometrial Intraepithelial Neoplasia (EIN) system is superior to the WHO hyperplasia scheme in identifying lesions at highest risk of conversion to cancer. Furthermore, the large group of women who were initially diagnosed with “hyperplasia” but are found not to have EIN have a near-negligible risk of developing cancer. Prospective discovery of carcinoma may be attributed to one of two mechanisms. Cancers seen soon after the index biopsy can be considered concurrent lesions that were present but missed at the index biopsy, whereas carcinomas seen at longer follow-up intervals are much more likely to represent time-dependent progression from precancer to cancer. Both short- and long-term cancer outcomes are well predicted by an EIN diagnosis (D-Score <1).
A key feature of premalignant lesions is an elevated risk of developing cancer if left untreated. This is an essential and valid standard by which the clinical utility of an existing precancer diagnostic scheme should be measured, but it is inefficient for discovering new diagnostic criteria. Predicting low-frequency clinical outcomes requires very large sample sizes in the learning (“criteria discovery”) case set, and predictive performance degrades substantially when extrapolated to new (test or “challenge”) cases if the initial classification criteria are incompletely or poorly defined. In this regard, objective morphometry has the advantages of high reproducibility and the ability to decompose component variables independently. Alternatively, molecular genetic assessment of seminal precancer characteristics, such as monoclonal growth, acquisition of mutations that set lesions apart from the normal background, and documentation of lineage continuity with subsequent carcinoma, may help definitively identify individual precancers [15, 16]. In the case of EIN, the molecular and morphometric approaches have independently identified the same class of premalignant lesions [14], effectively cross-validating each other as complementary discovery platforms. The current study goes one step further by documenting a clinical predictive value that exceeds that of current practice.
A major problem with the WHO94 scheme is its disappointing reproducibility [17, 18]. Moreover, because only 1% to at most 20% of patients in each of its four subcategories actually develop cancer, at least 80% (100 - 20%) to 99% (100 - 1%) of patients in a given subcategory will not, and are therefore at risk of overtreatment. The current study documents that the EIN classification scheme predicts the development of future cancer more accurately than WHO94. Both classification systems predict endometrioid endometrial cancer; serous carcinoma follows a different pathway that does not involve EIN [19].
Of immediate practical significance, the EIN system more accurately identifies cases with benign changes, some of which are regarded as high risk (atypical hyperplasia) under WHO94. The question is why. At the heart of both schemes are their diagnostic criteria. WHO94, as initially described, emphasized architectural abnormality, substratified by the presence or absence of cytologic atypia (principally nuclear changes). Over time this evolved to the point where cytology became the principal feature for discriminating cancer risk [20]. WHO defines atypical cytology in stereotypical terms as loss of nuclear polarity, nuclei with prominent nucleoli and altered chromatin [7]. In the EIN system, gland architecture is the most important of the 3 independent D-Score variables [4], and cytology is less predictive as an independent variable. Morphometric nuclear pleomorphism in EIN is captured by the variability (standard deviation) of the shortest nuclear axis, a feature found to be more predictive of cancer association than other nuclear characteristics, including textural chromatin features [21]. Subjective review of EIN cases identified by morphometry has reaffirmed the importance of cytologic change, but the nature and appearance of these alterations are not uniform across cases; rather, cytologic changes are best recognized by comparing lesional with background cytology in the same patient. At present, neither the WHO94 nor the EIN scheme finds nuclear-cytoplasmic ratios significant [22], and to date neither has identified textural changes confined to the cytoplasm as prognostically reliable.
The second key feature evaluated by WHO94 is architectural change, essentially in the form of glandular outpouchings (“budding”), infoldings and any other abnormal glandular configuration (“back-to-back positioning”). This feature, while crisply described in textbooks, is subjective in practice and open to considerable interpretation and variation. The EIN system also scores architectural change, but morphometrically, using an easily quantifiable proxy: the linear length of basement membrane per gland (OSD, outer surface density) [23]. For a given gland area, any deviation from a perfect circle appears as an increased total length of basement membrane. OSD is the second most important component of the D-score [10].
The most important contributor to the D-score, VPS (volume percentage stroma), is not formally a WHO94 criterion. Nonetheless, pathologists often intuitively incorporate this feature (usually inversely, as “glandular crowding”) when diagnosing complex atypical hyperplasia, because the glandular component is so dominant. The importance of the current findings is that VPS quantifies the stromal-glandular ratio that pathologists assess on a daily basis. A major reason for the lack of a fixed correspondence between WHO94 and EIN categories is that some EIN variables, such as lesion size and objective assessment of glandular crowding (VPS), are not part of the WHO94 scheme.
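The geometric principle behind OSD, that for a fixed enclosed area a circle has the shortest boundary, can be illustrated with the scale-free circularity index 4πA/P², which equals 1 for a circle and is smaller for any other outline. This Python sketch uses idealized polygons, including a hypothetical "budding" outline; it is not how QPRODIT measures OSD, and circularity is a stand-in for the idea rather than the OSD measure itself.

```python
import math


def area_and_perimeter(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Area (shoelace formula) and perimeter of a closed polygon."""
    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2, perimeter


def circularity(points: list[tuple[float, float]]) -> float:
    """4*pi*area / perimeter**2: 1 for a circle, smaller for any other outline."""
    area, perimeter = area_and_perimeter(points)
    return 4 * math.pi * area / perimeter**2


angles = [2 * math.pi * k / 360 for k in range(360)]
circle = [(math.cos(t), math.sin(t)) for t in angles]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Hypothetical "budding" gland outline: a circle with eight outpouchings.
budding = [((1 + 0.3 * math.cos(8 * t)) * math.cos(t),
            (1 + 0.3 * math.cos(8 * t)) * math.sin(t)) for t in angles]
for name, shape in [("circle", circle), ("square", square), ("budding", budding)]:
    print(name, round(circularity(shape), 3))
```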
The D-Score, being based on objective measurements, is highly reproducible, as shown by high interobserver correlation coefficients (R ≥0.91) among 17 well-trained technicians performing these measurements over the 22-year period [4, 10, 14]. In addition to excellent reproducibility of the measurements, the correlation between the diagnosis (presence or absence of EIN) and later outcome was also high. The data from this study have also helped to refine the decision thresholds for the D-Score. Because of relatively few cases, previous studies were inconclusive about the prognostic significance of a D-Score between 0 and +1, a problem resolved in the current, greatly expanded study: hyperplasias with scores between 0 and +1 behaved more like those with D-Score <0 and are therefore incorporated into EIN.
A quantitative molecular-genetic foundation, partly based on inactivation of the PTEN tumor suppressor gene [15, 24], underlies the EIN classification, distinguishing it from WHO94, which was developed from work in the 1980s without molecular input. EIN lesions are monoclonal outgrowths of transformed cells that begin as a discrete focus and over time spread to involve larger fractions of the endometrial compartment [14, 15, 24]. The concept of a focal origin is highly relevant when interpreting endometrial samples and emphasizes the need to select representative areas of the diagnostic lesion. Large-scale architectural features of EIN lesions, particularly crowding of glands to the point where gland area exceeds stromal area (VPS <50%), are of practical significance at low magnification when selecting individual regions that merit high-resolution scrutiny.
Technical adequacy of specimens and careful exclusion of mimics are prerequisites for accurate EIN diagnosis. The prognostic significance of architectural features underlines the need for tissue fragments large enough to evaluate these parameters; our conclusions apply to fragments in which the diagnostic focus is at least 1 mm in all dimensions. Some cases excluded because of excessive fragmentation contained lesions that were suboptimal owing to artifactual distortion. Other cases were excluded because the epithelium was in strips devoid of underlying stroma, precluding both morphometric analysis and diagnostic evaluation. These limitations apply whether pathologists use the WHO94 or the EIN scheme. An idiosyncrasy of the morphometric implementation of the EIN system is its inability to distinguish EIN from gland-rich areas, such as those commonly found in mid-to-late secretory endometrium, but these are easily recognized by a well-trained pathologist.
One practical advantage of the D-score is that it can be applied to standard H&E-stained tissue sections, provided the lesion is large enough. Because standard H&E sections are used, there are no additional consumable costs for expensive reagents. Moreover, equipment for computerized morphometric D-score analysis is commercially available worldwide from different manufacturers. A non-automated morphometry unit alone, without microscope, costs around US$14,000; the hardware including a microscope with an automated mechanical scanning stage would cost around US$30,000-50,000, with direct costs averaging US$25 per case. The initial capital outlay for a highly automated interactive morphometry workstation is offset by lower running costs (approximately 15-30 minutes of technician time and 3 minutes of pathologist time per case). These expenses clearly fall within the operational budget of the host pathology department. In the Netherlands [4] and Norway (unpublished results), use of the D-Score to triage women into treatment groups has standardized therapeutic decision making and reduced over- and undertreatment of endometrial hyperplasias, many of which would have been inconsistently or incorrectly diagnosed under WHO94. The benefit to patients, and the savings from avoiding unnecessary surgery, more than counterbalance the increased pathology costs. The program has led to fewer diagnoses of precancer on the one hand, and to detection of early cancers that went undetected by WHO94 on the other. Although cost-effective overall, it definitely requires increased fiscal support of pathology departments in anticipation of gains to be seen elsewhere. Once morphometric equipment has been installed and is in use, a large number of non-endometrial [25] and endometrial cancer [26] morphometric applications immediately become possible at minimal added cost and effort.
Additional costs relate to training and to technologist and pathologist time. While highly automated interactive morphometry systems are used in routine patient care in many European institutions, such systems have so far rarely been used in the US. Recent work has shown that subjective EIN diagnosis by pathologists, using criteria designed to mimic stratification across the D-score threshold of 1, is feasible, although it is less reproducible than the formal D-score [27].
In summary, this long-term follow-up study of 477 women with hyperplasia suggests that the EIN classification scheme predicts the development of future cancer more accurately than the WHO94 scheme. The results of morphometric analysis are highly reproducible, as shown by its routine use in multiple laboratories in different countries. Under WHO94, many patients would have been erroneously classified as having premalignant disease, whereas within the EIN scheme these women are more correctly classified as having a non-neoplastic, benign lesion.
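How the capital and running costs above combine per case can be sketched with straight-line amortization. In this Python sketch the annual volume (500 cases) and equipment lifetime (5 years) are hypothetical; the source gives only the capital range and the US$25 direct cost per case, and staff time is not priced here.

```python
def cost_per_case(capital: float, cases_per_year: int, years: float,
                  direct_per_case: float) -> float:
    """Straight-line amortization of equipment cost plus direct cost per case (US$)."""
    return capital / (cases_per_year * years) + direct_per_case


# Capital range and direct cost from the text; volume and lifetime are hypothetical.
for capital in (30_000, 50_000):
    print(capital, cost_per_case(capital, cases_per_year=500, years=5, direct_per_case=25))
```

Under these assumptions the amortized equipment share (US$12 to US$20 per case) is smaller than the direct cost, and it falls further as case volume rises.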
We thank Prof. Peter J. Kenemans, Susan Allen-de Jong, RNRM and Dr. Kjell Kjellevold for critically reading the manuscript and E. Baak, E. Wisse-Brekelmans, J. Van Eijk, E. Matze-Cok, S. Smith, J. Schuuring, L. Schuurmans, J. Brugghe, J. Konneman, I. Kleivan, S. Lysne, H. De Zeeuw, L. Hiwot Taddele, Bianca van Diermen, Emiel Janssen and all other technicians for their skilled technical assistance in performing the morphometric assessments.
Supported by grants #28-1203 (JPA Baak) from The National Health and Research Council of The Netherlands, ZonMw, and RO1-CA92301 (G. Mutter) from the National Institutes of Health, USA.