It has been reported that the diagnosis of serous tubal intraepithelial carcinoma (STIC) is not optimally reproducible based only on histologic assessment. Recently, we reported that the use of a diagnostic algorithm that combines histologic features and coordinate immunohistochemical expression of p53 and Ki-67 substantially improves reproducibility of the diagnosis. The goal of the current study was to validate this algorithm by testing a group of 6 gynecologic pathologists (3 faculty, 3 fellows) who had not participated in the development of the algorithm but who were trained in its use by referring to a website designed for that purpose. They then reviewed a set of microscopic slides, which contained 41 mucosal lesions of the fallopian tube. Overall consensus (≥4 of 6 pathologists) for the 4 categories of STIC, serous tubal intraepithelial lesion (our atypical intermediate category), p53 signature, and normal/reactive was achieved in 76% of lesions with no consensus in 24%. Combining diagnoses into 2 categories (STIC vs. non-STIC) resulted in overall consensus in 93% with no consensus in 7%. The kappa value for STIC vs. non-STIC among all 6 observers was also high at 0.67 and did not differ significantly between faculty (κ=0.66) and fellows (κ=0.60). These findings confirm the reproducibility of this algorithm by a group of gynecologic pathologists who were trained on a website for that purpose. Accordingly, we recommend its use in research studies. Before applying it in routine clinical practice, the algorithm should be evaluated by general surgical pathologists in the community setting.
Serous tubal intraepithelial carcinoma (STIC) is the earliest morphologically recognizable form of pelvic (non-uterine) high-grade serous carcinoma (1–3), and, therefore, a diagnosis of STIC has important management implications when found in fallopian tubes prophylactically removed from women with a strong family history of ovarian cancer or who have been found to have germline mutations of BRCA1/2. As pathologists are now frequently embedding all tubal tissue from surgical specimens and evaluating them with immunohistochemistry, a variety of other lesions ranging from those with normal appearing tubal epithelium that overexpresses p53 [“p53 signature”] (3–8) to lesions displaying cytologic atypia that falls short of STIC (3,7,9–16) have been encountered. The latter have been referred to by some as “tubal intraepithelial lesion in transition” (3), but we prefer the less committal term “serous tubal intraepithelial lesion” (STIL) since the nature of this lesion and its relationship to STIC have not been clearly established. These lesions pose considerable diagnostic difficulty, and, therefore, reproducible diagnostic criteria are required. This problem was recently highlighted by a reproducibility study of STIC by Carlson et al showing that this diagnosis is not optimally reproducible based only on histologic assessment (17). In a subsequent study, we developed an algorithm utilizing a combination of histology and immunohistochemical expression of p53 and Ki-67 for the diagnosis of STIC (18) and demonstrated that this method could substantially improve the reproducibility of the diagnosis. As that study was performed by experienced gynecologic pathologists and also included a training session before beginning the reproducibility analysis, one of the goals of the current study was to confirm whether the algorithm could be successfully employed by pathologists with varying levels of experience without a formal training session.
Another goal was to determine whether the algorithm could be taught using a website specifically developed for that purpose.
H&E slides of fallopian tubes from women without known ovarian/tubal/peritoneal carcinoma who underwent prophylactic bilateral salpingo-oophorectomy were collected from Toronto University Health Network, The Johns Hopkins Hospital, and Memorial Sloan-Kettering Cancer Center. Slides were reviewed, and cases were selected to include a spectrum of lesions ranging from morphologically normal to cytologically malignant, including those with intermediate degrees of atypia. Immunohistochemical stains were performed using formalin-fixed, paraffin-embedded tissue sections at each of the 3 institutions as part of the initial diagnostic evaluation for each case with immunohistochemical/antigen retrieval protocols optimized for each antibody. Only cases with available H&E and immunohistochemical (p53 and Ki-67) slides were included in this study, which consisted of 41 lesions from 28 cases. These cases were part of our prior reproducibility study (18). The lesions were identified and marked with a dotting pen on the H&E and immunohistochemical slides by one pathologist (PAS). The slides were then randomly ordered and assigned individual study numbers by the epidemiologist (KV) so that all pathologists were blinded to patient identification.
Cases were assessed by 6 gynecologic pathologists (3 faculty [BMR, JDS, AY] and 3 fellows [MG, EK, RL]). First, each study pathologist reviewed an on-line training set consisting of normal/reactive tubal mucosa, STIL, and STIC (H&E and p53/Ki-67 stains) at http://www.ovariancancerprevention.org/?page_id=160. Then, each pathologist assessed the dotted foci on H&E and immunohistochemical slides for each of the 41 lesions in the test set to render a morphology-only diagnosis, individual immunohistochemical scores for p53 and Ki-67 slides, and a combined histologic/immunohistochemical final diagnosis using the algorithm at http://www.ovariancancerprevention.org/?page_id=191. A reproduction of this algorithm is shown in Fig. 1.
Briefly, the histologic features were evaluated, and a morphology-only diagnosis of unequivocal for STIC, suspicious for STIC, or not suspicious for STIC was established. These diagnoses were not based on any single major criterion or an arbitrarily determined number of minimal criteria since the morphologic spectrum of STIC is wide and no 2 cases are histologically identical. Rather, each pathologist rendered a diagnosis on the basis of a variable combination of histologic features (nuclear enlargement, hyperchromasia, irregularly distributed chromatin, nucleolar prominence, mitotic activity, apoptosis, loss of polarity, and epithelial tufting). The exact combination used in a given case was at the discretion of each study pathologist.
In histologically atypical lesions (i.e., STIC or suspicious for STIC), the immunostains were assessed only in the area that corresponded to the atypical focus. The percentage of cells showing nuclear expression in the lesion was determined. p53 was considered positive if the focus showed >75% cells with moderate to strong expression or a 0% labeling index (i.e., completely negative). Cases without either of these 2 patterns were considered negative for p53. In cases without histologic atypia (i.e., not suspicious for STIC), the denominator for the calculation of the percentage of p53-positive cells was based on a minimum length of 12 cells. It should be noted that the 0% labeling index pattern of a “positive” p53 stain can only be recognized for foci showing histologic atypia (i.e., STIC or suspicious for STIC) since it is not possible to localize the area with a 0% labeling index in cases without histologic atypia (i.e., not suspicious for STIC). Ki-67 was interpreted as low or high depending on whether <10% or ≥10% of cells showed staining, respectively. The determination of the percentage of cells positive for p53 or Ki-67 was not dependent on whether or not the lesional cells were ciliated.
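The scoring rules in the preceding paragraph can be written out as a minimal sketch. The function and parameter names are hypothetical and are not part of the published algorithm; the thresholds (>75% or a 0% labeling index for p53, 10% for Ki-67) are those stated above, and the sketch assumes the percentages have already been estimated by the pathologist.

```python
def score_p53(percent_moderate_strong: float, completely_negative: bool,
              has_atypia: bool) -> str:
    """Score p53 as 'positive' or 'negative' (hypothetical helper).

    percent_moderate_strong: % of cells in the focus with moderate to
        strong nuclear expression (0-100).
    completely_negative: True for a 0% labeling index (no staining at all).
    has_atypia: True if the morphology-only diagnosis is 'unequivocal for
        STIC' or 'suspicious for STIC'.
    """
    if not 0 <= percent_moderate_strong <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if completely_negative and percent_moderate_strong > 0:
        raise ValueError("a completely negative focus cannot contain stained cells")
    # Pattern 1: more than 75% of cells with moderate to strong expression.
    if percent_moderate_strong > 75:
        return "positive"
    # Pattern 2: 0% labeling index, recognizable only in atypical foci.
    if completely_negative and has_atypia:
        return "positive"
    return "negative"


def score_ki67(percent_positive: float) -> str:
    """Score Ki-67 as 'low' (<10%) or 'high' (>=10%) (hypothetical helper)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return "high" if percent_positive >= 10 else "low"
```

Note that a focus with exactly 75% p53-expressing cells scores as negative, and one with exactly 10% Ki-67-positive cells scores as high, which follows from the strict and non-strict inequalities in the text.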
Lastly, the final diagnosis (STIC, STIL [our atypical category intermediate between p53 signature and STIC], p53 signature, or normal/reactive) was based on a combination of the morphology-only diagnosis and coordinate immunohistochemical scores as per the algorithm.
Interobserver agreement was tested using Cohen’s Kappa statistics and corresponding 95% confidence intervals (19). Category-specific kappa statistics (morphology-only diagnosis, individual p53 and Ki-67 scores, and final diagnosis) were obtained in addition to an overall kappa value. Statistical analyses were performed with the STATA software package (version 11.1; StataCorp LP).
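For readers unfamiliar with the statistic, the two-rater definition of Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), can be sketched as follows. This is an illustration only: the analyses in this study were run in STATA, the way the 6 observers were combined is not described here, and the function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        # Both raters used a single identical label; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up labels for four lesions, for illustration only.
faculty_1 = ["STIC", "STIC", "non-STIC", "non-STIC"]
faculty_2 = ["STIC", "non-STIC", "non-STIC", "non-STIC"]
kappa = cohens_kappa(faculty_1, faculty_2)
```

In this made-up example the raters agree on 3 of 4 lesions (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5.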
Kappa values for interobserver agreement for all 3 morphology-only categories, p53 scores, Ki-67 scores, and all 4 final diagnostic categories are listed in Table 1. For the 3 morphology-only categories, the reproducibility was best for “unequivocal for STIC” (κ=0.50) and worst for “suspicious for STIC” (κ=0.18). The reproducibility of the immunohistochemical scoring by itself was better for Ki-67 (κ=0.78) than p53 (κ=0.49). The addition of immunohistochemistry to the histologic evaluation showed an improvement in the kappa value for a diagnosis of STIC, from 0.50 (morphology-only) to 0.67 (final diagnosis). Regarding all 4 final diagnostic categories, STIC had the best reproducibility (κ=0.67) while STIL had the lowest (κ=0.27). Examples of lesions with and without consensus are shown in Figs. 2–5.
Based on consensus (≥4 of 6 pathologists), the distribution of diagnoses was: STIC in 32% of lesions (13/41), STIL in 17% (7/41), p53 signature in 12% (5/41), and normal/reactive in 15% (6/41). Overall consensus for all 4 categories was achieved in 76% of lesions (31/41) while non-consensus occurred in 24% (10/41). Among the non-consensus lesions, the main discordances between observers for each lesion were STIL vs. STIC in 60% (6/10), normal/reactive vs. p53 signature in 20% (2/10), normal/reactive vs. STIL in 10% (1/10), and p53 signature vs. STIL in 10% (1/10). Thus, STIL was one of the main diagnostic considerations in 80% of the discordant lesions. If STIL, p53 signature, and normal/reactive are collapsed into a non-STIC category, then overall consensus for STIC vs. non-STIC was present in 93% of lesions (38/41), and non-consensus occurred in 7% (3/41).
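The consensus rule and the collapse to 2 categories used above can be sketched as follows. The helper names and the example diagnoses are hypothetical; the rule itself (a diagnosis shared by at least 4 of 6 pathologists) is the one stated in the text.

```python
from collections import Counter


def consensus(diagnoses, threshold):
    """Return the diagnosis given by at least `threshold` raters, else None.

    With a threshold above half the number of raters (e.g., 4 of 6 or
    2 of 3), at most one diagnosis can qualify.
    """
    label, count = Counter(diagnoses).most_common(1)[0]
    return label if count >= threshold else None


def collapse(diagnosis):
    """Map the 4 final categories onto STIC vs. non-STIC."""
    return "STIC" if diagnosis == "STIC" else "non-STIC"


# Made-up diagnoses from 6 pathologists for one lesion.
lesion = ["STIL", "STIL", "p53 signature", "normal/reactive", "STIC", "STIL"]
four_category = consensus(lesion, threshold=4)
two_category = consensus([collapse(d) for d in lesion], threshold=4)
```

In this made-up lesion no single category reaches 4 votes, so there is no 4-category consensus, but 5 of the 6 diagnoses are non-STIC, so the collapsed comparison does reach consensus. This is the mechanism by which the consensus rate rises when categories are combined.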
For the 3 faculty pathologists, kappa values for all 4 final diagnostic categories are listed in Table 2. STIC had the best reproducibility (κ=0.66) while STIL had the lowest (κ=0.13). Based on consensus (≥2 of 3 pathologists), the distribution of diagnoses was: STIC in 37% of lesions (15/41), STIL in 20% (8/41), p53 signature in 15% (6/41), and normal/reactive in 17% (7/41). Overall consensus for all 4 categories was achieved in 88% of lesions (36/41) while non-consensus occurred in 12% (5/41). Among the non-consensus lesions, STIC was diagnosed by 1 pathologist in 60% (3/5) while STIC was not diagnosed by any pathologist in the remaining 40% (2/5).
For the 3 fellows, kappa values for all 4 final diagnostic categories are listed in Table 2. STIC had the best reproducibility (κ=0.60) while STIL had the lowest (κ=0.27). Based on consensus (≥2 of 3 pathologists), the distribution of diagnoses was: STIC in 32% of lesions (13/41), STIL in 29% (12/41), p53 signature in 17% (7/41), and normal/reactive in 17% (7/41). Overall consensus for all 4 categories was achieved in 95% of lesions (39/41) while non-consensus occurred in 5% (2/41). Among the non-consensus lesions, STIC was diagnosed by 1 pathologist in 50% (1/2) while STIC was not diagnosed by any pathologist in the remaining 50% (1/2).
A comparison of the kappa values between faculty and fellows for all 4 final diagnostic categories is listed in Table 2. The reproducibility of STIC vs. non-STIC was similar for faculty and fellows (κ=0.66 and κ=0.60, respectively). However, reproducibility for the 3 morphology-only categories (overall kappa values of 0.34 and 0.42 for faculty and fellows, respectively), p53 score (κ=0.43 and κ=0.55, respectively), and Ki-67 score (κ=0.71 and κ=0.80, respectively) was slightly higher for fellows. A consensus diagnosis (≥2 of 3 pathologists) for any of the 4 final diagnostic categories was reached for the same lesion in both the faculty and fellow groups in 83% of lesions (34/41); of these, 29 lesions (85%) were concordant between faculty and fellows, and 5 (15%) were discordant. Among the discordant lesions, STIC was the consensus diagnosis in 1 group (faculty or fellows) in 60% (3/5), while the consensus diagnosis in both groups was a non-STIC category in the remaining 40% (2/5).
Histologic and immunohistochemical criteria for STIC (3,5–8,16,20–23), p53 signature (3–8), and atypical lesions intermediate between p53 signature and STIC (3,7,9–16) have been applied in a non-uniform fashion in studies from different institutions (Tables 3–5). Consequently, variability exists for these diagnoses. Previously, a histologic reproducibility study of STIC was conducted by Carlson et al (17) that included 14 cases of STIC and 16 cases of benign mucosa. Cases were shown as photographs in a PowerPoint presentation, which included H&E images without immunohistochemical stains. There were 12 observers (6 experienced gynecologic pathologists, 6 pathology residents). Criteria for STIC were not provided to the observers prior to review, and 2 diagnostic categories were selected (STIC and non-STIC). The interobserver agreement for all observers was poor (κ=0.333); however, it was higher for the experienced pathologists (fair-good; κ=0.453) compared with residents (poor; κ=0.253). Interestingly, ≥4 of 6 experienced pathologists agreed with the reference diagnosis in 9 of 14 (64%) STIC cases compared with only 1 of 16 (6%) non-STIC cases. Our findings of only moderate interobserver reproducibility of STIC based entirely on histologic features (Table 1) confirm those of the study by Carlson et al (17), as well as our recent reproducibility study (18) which utilized a different group of gynecologic pathologists than those tested in the current study. Our previous study also demonstrated the improvement of a histologic diagnosis of STIC by adding immunohistochemistry (18), which led to the development of the algorithm used in the current study.
The results of the present study confirm that an algorithm for the diagnosis of STIC, which combines morphology and immunohistochemistry for p53 and Ki-67, can lead to a high level of reproducibility by a group of gynecologic pathologists with varying levels of experience. The findings also demonstrate that the algorithm can be taught by referring to a website that was specifically designed for that purpose. Of the 4 categories assessed with this algorithm, STIC had the highest interobserver reproducibility while STIL (our atypical category intermediate between p53 signature and STIC) had the lowest, which may be partly due to the lower level of interobserver agreement for the interpretation of the p53 immunostain (Table 1); thus, improved criteria for diagnosing atypical intermediate lesions are needed. Our modification of the criteria for p53 signature (lack of atypia; overexpression of p53 in >75% of cells [with or without cilia] in a segment of tubal mucosa at least 12 cells in length; Ki-67 proliferation index <10%), which slightly differs from the definitions used in the literature (3,5–8), resulted in only moderate interobserver reproducibility. However, to the best of our knowledge, other criteria of p53 signature have not been tested for reproducibility in previous studies.
The algorithm (Fig. 1) is easy to use, and it is applied by first determining the morphologic category based on a number of histologic features (combination of nuclear enlargement, hyperchromasia, irregularly distributed chromatin, nucleolar prominence, mitotic activity, apoptosis, loss of polarity, and epithelial tufting). Next, depending on the coordinate immunohistochemical expression of p53 (negative or positive) and Ki-67 (low or high), a final diagnosis of STIC, STIL, p53 signature, or normal/reactive is rendered. The morphologic criteria and immunoprofiles for a diagnosis of STIC in our algorithm (combined atypia [either “unequivocal for STIC” or “suspicious for STIC”], abnormal p53 expression, and an elevated Ki-67 index) were chosen because they had been used in other published studies (Table 3). Moreover, the immunohistochemical cut-off levels in this algorithm are based on biologically valid evidence. Specifically, the immunohistochemical criterion for p53 positivity is expression in >75% cells or complete absence of staining (“0% labeling index”). STICs with or without a synchronous high-grade serous carcinoma involving the ovary have p53 mutations in essentially all cases (4), and these 2 patterns of expression in ovarian high-grade serous carcinoma have been associated with p53 mutations in nearly 95% of cases (24). It should be noted that since the pattern of complete absence of p53 (“0% labeling index” pattern) [Fig. 3] is typically associated with a p53 mutation that results in a truncated protein which is not identified by the p53 antibody, it should not be interpreted as “negative” (24–28). This pattern differs from true negative patterns of p53 (“wild-type” pattern) that predominantly lack immunohistochemical expression but which have occasional scattered weakly staining cells. Thus, it is important to recognize this pattern since misinterpretation as “negative” can result in underdiagnosing STIC. 
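Two branches of the algorithm are stated explicitly in this discussion: the STIC profile (atypia, abnormal p53, elevated Ki-67) and the p53 signature profile (no atypia, p53 overexpression, Ki-67 <10%). The sketch below encodes only those two branches and deliberately returns None for every other combination, because the remaining assignments (STIL and normal/reactive) are defined in Fig. 1 and are not spelled out in the text. All names are hypothetical.

```python
from typing import Optional

MORPHOLOGY = {"unequivocal for STIC", "suspicious for STIC",
              "not suspicious for STIC"}
ATYPICAL = {"unequivocal for STIC", "suspicious for STIC"}


def partial_final_diagnosis(morphology: str, p53: str,
                            ki67: str) -> Optional[str]:
    """Return 'STIC' or 'p53 signature' when the stated criteria are met.

    Returns None for combinations whose final category is given only in
    Fig. 1 of the article (assumption: they are not reconstructed here).
    """
    if morphology not in MORPHOLOGY:
        raise ValueError("unknown morphology-only diagnosis")
    if p53 not in {"positive", "negative"} or ki67 not in {"low", "high"}:
        raise ValueError("unknown immunohistochemical score")
    # STIC: atypia plus abnormal p53 and an elevated Ki-67 index.
    if morphology in ATYPICAL and p53 == "positive" and ki67 == "high":
        return "STIC"
    # p53 signature: no atypia, p53 overexpression, low Ki-67 index.
    if (morphology == "not suspicious for STIC"
            and p53 == "positive" and ki67 == "low"):
        return "p53 signature"
    return None
```

Returning None rather than guessing keeps the sketch faithful to what is written; a complete implementation would need the full decision table from Fig. 1.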
The Ki-67 cut-off level for this algorithm (10% positive cells) was selected because normal fallopian tubes generally have a Ki-67 proliferation index less than 10% (2,23,29). Also, the substantial interobserver agreement for Ki-67 in this study supports the practicality of the 10% cut-off criterion (Table 1). Setting this threshold at higher cut-off levels could result in underdiagnosing some STICs. In fact, we had initially attempted using higher Ki-67 cut-off levels in our prior STIC algorithm study (18) [data not shown], but acceptable diagnostic consensus could not be achieved. Additionally, we have encountered anecdotal consultation cases that had a STIC and extra-pelvic disease in which the Ki-67 index in the STIC was very low. However, it should be noted that the mean Ki-67 index was high in one study of STIC although it was relatively low in others (Table 3). Thus, additional study is warranted to further refine these Ki-67 immunohistochemical criteria.
In conclusion, the present study validates the diagnostic reproducibility of our algorithm for diagnosing tubal mucosal lesions, and assists in standardizing their classification. Since the pathologists in this study had specific training in gynecologic pathology, it is premature to recommend the algorithm for routine use until it has been evaluated by general surgical pathologists. On the other hand, in studies of STICs performed by pathologists experienced in gynecologic pathology, use of the algorithm would be helpful in order to be able to compare the results of studies from different institutions. Further study is needed to refine diagnostic criteria for atypical intermediate lesions (e.g., STIL, tubal intraepithelial lesion in transition, dysplasia, etc.). In clinical practice, these atypical lesions should be designated descriptively, including a comment that the lesion is insufficient for a diagnosis of STIC. Similarly, the designation “p53 signature” should not be used as a diagnostic term in a pathology report and is best reserved for research at the present time.
Funding: CDMRP grant OC100517 from the Department of Defense