Aberrant crypt foci (ACF) have emerged as a putative precursor to colorectal adenoma, with potential use as a biomarker of colorectal cancer (CRC). However, there are wide differences in ACF prevalence, dysplasia and histological confirmation rates across studies. These differences may in part be due to variability in identification of endoscopic criteria.
The objective was to systematically evaluate the accuracy and reliability of various endoscopic criteria used to identify ACF using magnification chromoscopic endoscopy (MCE).
Images obtained via MCE were shown to participating endoscopists who diagnosed them as an ACF or not, and assessed them for the endoscopic characteristics used to identify ACF in the literature.
The main outcome measures were the predictive ability of the endoscopic criteria (crypt number, staining, margin, crypt size, epithelial thickness, lumen shape) for histologic confirmation of an ACF, and their reliability across endoscopists. The accuracy of the examiners in identifying ACF that were histologically confirmed was also assessed.
The inter rater agreement rate for all except one of the endoscopic criteria (crypt number) was low, and did not improve with training. None of the criteria could significantly predict histological confirmation of ACF. Despite training exercises, accuracy of endoscopists to correctly identify a histologically proven ACF remained low.
Still images with 40X optical magnification were analyzed rather than real time endoscopy. All ACF samples were hyperplastic; none were dysplastic.
No endoscopic criteria evaluated by our study predicted histological confirmation of ACF. Magnification chromoendoscopy had low accuracy and poor reliability.
Aberrant crypt foci (ACF) have emerged over the last decade as a putative precursor to colorectal adenoma. ACF were initially identified as the earliest recognizable lesions on the colonic mucosa of rodents exposed to colorectal carcinogens 1, 2, and animal studies have shown ACF to be an important predictor of CRC development 3, 4. Shortly after the description in animals, ACF were discovered in pathologic specimens of human colonic mucosa 5. More recently, ACF have been identified in human colonic mucosa in vivo via magnification chromoscopic endoscopy (MCE) 6, 7.
Cross sectional studies have found that ACF prevalence and density are greater in patients with colorectal carcinoma and adenoma than in normal controls 7–10, emphasizing the potential use of ACF as a biomarker of colorectal carcinoma. However, there is significant variability in the criteria and methods used to identify and define ACF on endoscopy. The criterion used most commonly is darker staining 6–8, 11 compared to the surrounding normal mucosa. Larger crypt size 7, 11, raised appearance 6, 9, 11, thicker epithelial lining 7, and dilated or slit-like crypt lumen 9 compared to the surrounding normal mucosa are other frequently employed criteria. Methylene blue is the most commonly used dye for mucosal staining; however, indigo carmine has also been employed 8, 10.
There is wide variability in the data reported from studies using MCE 1. For example, the prevalence of rectal ACF in patients with a normal colon on colonoscopy ranges from 15% 8 to 100% 11, and the proportion of ACF having dysplastic changes in patients with sporadic colorectal carcinoma ranges from 0% 9 to 61% 8. The rate of agreement between the endoscopic impression of the presence of an ACF and histological confirmation is also variable, ranging from 53% 9 to 92% 7.
Due to these widely variable results, the ACF ancillary study of the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial was initiated. This large, multi-center study examined ACF prevalence and risk factors and ACF reproducibility across institutions, populations and endoscopists 12, 13. In the PLCO study, 589 subjects at 4 clinical centers underwent a flexible sigmoidoscopy (FSG) using HMCE for ACF detection, at year 0 and year 1.
During the main phase of the study, images of endoscopic ACF (some of them confirmed histologically and some not) were shown to 5 participating endoscopists, blinded to the endoscopic and histological diagnosis. The inter-endoscopist agreement rate on whether the image represented an ACF was poor (multi-rater kappa score of 0.2 to 0.3). The accuracy of the examiners in correctly identifying an ACF, based on histology as the gold standard, was 48–66%, and only 60% of the ACF identified endoscopically could be confirmed on histology 13.
Considering the variability observed in the ACF literature and the levels of accuracy and inter rater agreement seen in the PLCO ACF study, we hypothesized that the criteria used to identify ACF are poorly reliable across endoscopists and poorly predict histologic confirmation of an ACF. The aims of this study were 1) to identify the endoscopic criteria that are most predictive of histologic ACF and 2) to identify those that are most reliable across endoscopists. We conducted three training exercises to standardize the assessment of endoscopic criteria, hoping this would improve inter rater agreement and concordance of the endoscopic detection of ACF with histology, and then reassessed the reliability and accuracy of ACF detection across endoscopists.
Images were obtained from the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial ancillary study of ACF 12. Subjects at four centers (Georgetown University, Washington University, University of Pittsburgh and The Marshfield Clinic) were eligible if they had an adequate flexible sigmoidoscopy screening examination at the baseline exam of the PLCO trial. All subjects were participants of the PLCO cancer screening trial, a multi-center randomized clinical trial of cancer screening, which includes flexible sigmoidoscopy (FSG) 14. Subjects with abnormal screening FSG were referred to their personal physicians for evaluation of screen-detected abnormalities and were tracked to determine the results from subsequent diagnostic work-up, such as colonoscopy. In the ACF study, subjects with advanced and non-advanced adenoma, hyperplastic polyps, or negative sigmoidoscopy or colonoscopy exams were included.
Examiners underwent standardized training, and the ACF exam was performed under a defined protocol using methylene blue dye (0.2%) and the Fujinon ES-410CE5 sigmoidoscope (Fujinon, Wayne, NJ) 12. The sigmoidoscope provided an optical magnification of 40X (with an added 80X electronic magnification) and had a charge-coupled device (CCD) with a resolution of 450 thousand pixels. ACF in the rectum were assessed and, when detected, removed by biopsy; subjects returned 1 year later for a repeat exam. The endoscopic definition of an ACF was: “A lesion with crypts larger in diameter than surrounding normal crypts, having thicker epithelium, which may be darker staining, and must be <2mm raised”. Not all criteria were required for identification of an ACF. For this study, images, stripped of patient identifying and clinical information, were posted online in JPEG format for review.
All examiners took part in extensive training for the study, which included some examiners personally observing ACF exams at the Sapporo Medical University School of Medicine in Japan. Prior to study initiation, a joint conference among examiners with a live demonstration of the ACF exam technique, review of the protocol, and ACF definition was conducted. In addition, a pilot phase was built into the study, which consisted of ACF exams in 78 subjects, after which the study researchers met and reevaluated the study protocol. The goals of these pilot sessions and conferences were to adequately standardize the protocol across special study sites. The issues discussed and agreed upon included the precise means of bowel preparation, the amount and concentration of the dye to be used, the area to be examined, and the protocol for photo documentation and biopsy of the mucosa. We also conducted quality control exercises to streamline the manner in which abnormal findings were being interpreted by the investigators. Five participating ACF endoscopists evaluated 50 images of suspected ACF and normal mucosa, and assessed whether they appeared to be ACF or not; the endoscopists were blinded to their endoscopic and histologic classification. We conducted 2 such exercises, each with 50 images, aiming to standardize the manner in which the investigators were using the endoscopic appearances to identify ACF 12, 13.
All centers followed the identical protocol for handling of specimens. Formalin-fixed specimens were retained in room temperature 10% buffered formalin for up to 24 hours. Formalin was then removed, and the specimens were transferred to 70% alcohol and, prior to shipping, to phosphate-buffered saline. Snap-frozen specimens were stored at −80°C and shipped on dry ice. The number of slides per biopsy sample ranged from 5 to greater than 25. All slides were stained with H&E and independently examined for evidence of ACF by two pathologists at UCLA. Pathologists were blinded to the endoscopic classification of the biopsy (ACF vs. normal) and to each other’s diagnosis. With discordant diagnoses, the slides were re-screened, with both pathologists again blinded to the other’s diagnosis. Any remaining discrepant slides were reviewed and discussed openly, and a consensus diagnosis was agreed upon.
The biopsies classified as ‘normal mucosa’ had typical straight, tube-like crypts, lined by cells with basally oriented nuclei and apical mucin. The biopsies that were not normal (ACF) were categorized as hyperplastic, mixed hyperplastic/dysplastic or dysplastic 15, 16. Criteria used for hyperplastic ACF were identical to those used for hyperplastic polyps, i.e., crypts with a star-shaped or serrated luminal appearance, with cells containing open vesicular nuclei. Nuclei in hyperplastic ACF could focally “pile up” in areas with luminal tufting; however, diffuse pseudostratification was not seen. In contrast, dysplastic ACF were defined as having nuclei that diffusely showed pseudostratification; these nuclei also contain coarse, often smudgy, chromatin, and the crypt profile is smooth and tube-like. Mixed hyperplastic/dysplastic ACF show a combination of features. This can be seen as either two distinct adjacent areas within the same lesion (one adenomatous and the other hyperplastic) or as a combination of features, for example, serrated crypts with diffuse pseudostratification of nuclei with coarse, smudgy chromatin.
The images used in the study were randomly selected from the PLCO ACF ancillary study database. A flow diagram of the evaluation of ACF in this study is presented in Figure 1.
Forty-two images of lesions initially interpreted during endoscopy as an ACF were evaluated by a new observer (blinded to the histological diagnoses of these lesions) for each of the endoscopic criteria that have been used to define ACF in the literature. These criteria were the number of crypts in the lesion (size), staining of the crypt epithelium compared to the surrounding mucosa, the margin (whether the lesion was demarcated or not), the size (diameter) of the crypts compared to the surrounding mucosa, the thickness of the epithelial lining compared to crypts in the surrounding mucosa, and the shape of the lumen of the crypt. Lumen shape was assessed on two separate scales: round or not, and actual shape (round, oval, semicircular, slit, asteroid, or nondistinct). Each of the endoscopic criteria was assessed for association with histological confirmation of an ACF.
To help build consensus in the assessment of ACF across endoscopists, three new sets of images (10, 8 and 6 images) were evaluated independently by four participating investigators for each of the endoscopic criteria of ACF (see above). All the images represented lesions suspected to be ACF on initial endoscopy, except for three images of normal mucosa in the 1st set. The endoscopists were again blinded to the results of initial endoscopy and histology. After each set of images was assessed, the investigators held a teleconference to examine the differences amongst them in an attempt to standardize the assessment of the endoscopic criteria.
After completion of the training exercises to standardize the endoscopic assessment of ACF, 113 new images were assessed by two of the investigators, who were blinded to their histological status. Of the 113 images, 80 were confirmed by histology as an ACF. The inter rater agreement was measured.
To assess the intra rater agreement rate, one of the investigators evaluated the 113 images a second time, separated by a 3 month span.
Stata 10® was used for the statistical analysis. Simple logistic regression was used to assess the association of the endoscopic criteria with histological diagnosis. To examine whether all of the criteria combined were predictive of histology, we employed non-parametric linear discriminant analysis. Cohen’s kappa was used to assess intra and inter rater reliability.
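As an illustration of the reliability statistic named above, the following is a minimal sketch of Cohen's kappa for two raters, written in Python rather than Stata. The function name `cohens_kappa` and the ten example calls are hypothetical and are not study data; the sketch only shows how observed agreement is corrected for the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical calls on ten images (illustrative only, not study data).
rater_1 = ["ACF", "ACF", "normal", "ACF", "normal",
           "ACF", "normal", "normal", "ACF", "ACF"]
rater_2 = ["ACF", "normal", "normal", "ACF", "ACF",
           "ACF", "normal", "ACF", "ACF", "normal"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 6 of 10 images, yet chance alone would produce agreement on 52% of them, so kappa is low even though raw agreement looks moderate.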
Results of step 1, the evaluation of six criteria used for assessing an ACF, are summarized in Table 1. The resolution of the images, however, did not permit satisfactory assessment of all the candidate criteria. None of the criteria were statistically significantly associated with histological confirmation of an ACF, though increased epithelial thickness showed a borderline association with ACF histology (p=0.09).
Table 2 displays the results of step 2, a training exercise aimed at improving the inter-observer assessment of endoscopic criteria for the determination of ACF. Four investigators independently analyzed 3 sets of 10, 8 and 6 endoscopic images and, after each assessment, reviewed their results as a group to build consensus on how to interpret and apply endoscopic criteria to particular images. Several issues that contributed to inconsistency across endoscopists were identified. The farther away a comparator crypt was in the surrounding normal mucosa, the smaller it appeared. It was decided to choose comparator crypts as close to the margin of the ACF as possible. Since there was often heterogeneity within a possible ACF, each lesion was examined in 4 quadrants to better identify sources of disagreement. Furthermore, it was noted that pooling of dye within the crypt lumen can result in the lesion appearing darker than the surrounding mucosa, but this was distinguished from darker staining of the epithelium (Figure 2). During these exercises, the agreement rate of endoscopists on whether the lesion was an ACF or not was initially good (kappa 0.61), but dropped to poor and fair levels in subsequent sets. The agreement rate remained excellent for the number of crypts in the lesion (kappa ≥ 0.8). The agreement rate for margin improved to the good range (kappa 0.5 to 0.75) and that for staining to the fair range (kappa 0.3 to 0.5) by the 3rd set (Table 2). For the rest of the endoscopic criteria, the agreement rate was poor (kappa < 0.3) and showed no major improvement over the course of the exercise.
After standardizing the application of the endoscopic criteria via training and consensus building, two endoscopists analyzed a new set of 113 images suspected to be ACF in steps 3 and 4. Using the techniques developed in step 2, especially the examination of the lesions in four quadrants, all the endoscopic criteria could be assessed in all the images. The first examiner’s accuracy in identifying the lesions’ histological diagnosis was 50.4%. The other examiner’s accuracy was 55.6% in the 1st evaluation (step 3) and 62% on repeat evaluation 3 months later (step 4).
The inter rater agreement for the endoscopic diagnosis of an ACF was fair (kappa = 0.32) and the intra rater agreement rate 3 months later was good (kappa = 0.52) (Table 3). Among the endoscopic criteria, the inter rater agreement was excellent for the number of crypts (kappa = 0.83), fair for lesion margin (k = 0.49), crypt diameter (k = 0.34) and epithelial thickness (k = 0.39) and poor for staining (k = 0.21). The intra rater agreement rates were excellent for the number of crypts (k = 0.83), poor for lumen shape (k = 0.05) and good (k = 0.52 – 0.66) for all others.
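The verbal labels attached to kappa values in the Results follow the ranges quoted in the text (poor < 0.3, fair 0.3 to 0.5, good 0.5 to 0.75, excellent ≥ 0.8). A minimal Python sketch of that mapping is given below. The function name `describe_kappa` is hypothetical, and two choices are assumptions rather than statements from the source: a value of exactly 0.5 is treated as good, and values between 0.75 and 0.8, for which the text gives no label, are returned as "unspecified".

```python
def describe_kappa(kappa):
    """Map a kappa value to the verbal ranges quoted in the text.

    Assumptions (not stated in the source): exactly 0.5 counts as "good",
    and the unlabelled gap between 0.75 and 0.8 is reported as "unspecified".
    """
    if kappa < 0.3:
        return "poor"
    if kappa < 0.5:
        return "fair"
    if kappa <= 0.75:
        return "good"
    if kappa >= 0.8:
        return "excellent"
    return "unspecified"


# Inter rater values reported for step 3.
reported = {
    "ACF diagnosis": 0.32,
    "number of crypts": 0.83,
    "lesion margin": 0.49,
    "crypt diameter": 0.34,
    "epithelial thickness": 0.39,
    "staining": 0.21,
}
labels = {name: describe_kappa(value) for name, value in reported.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```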
Whether any of the endoscopic criteria were associated with the histological diagnosis of an ACF was evaluated using simple logistic regression. None of the endoscopic criteria were significantly associated with a histological diagnosis of ACF (all criteria tested individually, data not shown). The results of the linear discriminant analysis showed essentially no predictive ability of the criteria in any combination. Using the six reproducible criteria, as assessed by the 1st reader, the error rate according to the discriminant function was 39.8%, as compared to an error rate by chance alone (based on the overall proportion of ACF in the sample) of 41.7%. Results using the 2nd reader’s calls were similar. To isolate the question of predictive ability from the issue of reader variability, we examined the ACF histology outcome when both readers agreed on an individual criterion. Even when only images where the two raters agreed were included, none of the endoscopic criteria were associated with ACF histology (Table 4). For example, among lesions that both raters agreed had a darker stain, 74% were histologically confirmed as an ACF, compared to 71% of lesions that both raters agreed did not have a darker stain (Table 4). Since all the ACF examined in our study fell in the hyperplastic category, the predictive ability of the endoscopic appearances for the histological subtype of ACF could not be examined.
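The text does not state how the error rate "by chance alone" was calculated. As a hedged illustration only, the Python sketch below computes two common baselines from the reported sample composition (80 histologically confirmed ACF among 113 images): a proportional chance criterion, in which labels are assigned at random in proportion to the class frequencies, and a majority-class guess. The function names are hypothetical, and neither calculation is claimed to be the authors' method.

```python
def proportional_chance_error(n_positive, n_total):
    """Error rate when labels are guessed at random in proportion
    to the observed class frequencies."""
    p = n_positive / n_total
    return 1 - (p ** 2 + (1 - p) ** 2)


def majority_guess_error(n_positive, n_total):
    """Error rate when every item is assigned to the larger class."""
    p = n_positive / n_total
    return min(p, 1 - p)


# Sample composition reported for step 3: 80 confirmed ACF among 113 images.
chance_error = proportional_chance_error(80, 113)
majority_error = majority_guess_error(80, 113)
print(f"proportional chance error: {chance_error:.1%}")
print(f"majority-class error:      {majority_error:.1%}")
```

Under these assumptions the proportional chance criterion comes to roughly 41%, close to but not identical to the 41.7% reported, so the published figure may rest on a slightly different denominator or definition. Either way, the discriminant function's 39.8% error rate is barely better than random assignment.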
Biological data on ACF as a biomarker of colorectal carcinoma are promising, but inconsistency in the estimates of prevalence, in histologic confirmation, and in criteria for endoscopic detection makes standardization of ACF detection a priority 1. Given the variability seen in the previous ACF literature and the low level of accuracy demonstrated by the relatively poor rate of histologic confirmation of endoscopically suspected ACF (48–66%) in the PLCO ACF ancillary study, the objective of this study was to evaluate the ability of specific features of the endoscopic appearance of ACF to predict histologic confirmation of an ACF, and to evaluate their reliability across endoscopists. We conducted 3 training exercises to standardize the assessment of endoscopic criteria, and then reassessed accuracy and inter rater agreement among endoscopists to see whether they had improved.
Initially, none of the endoscopic criteria predicted the histological confirmation of an ACF (step 1). During the training exercises, we did see some improvement in the agreement rates for lesion margin and staining. However, when what was learned was applied to a new set of 113 images, the reliability across endoscopists remained low (k < 0.5) for all criteria except the total crypt number. Furthermore, none of the endoscopic criteria were even weakly associated with a histological diagnosis of ACF, even when we limited the analysis to criteria which endoscopists agreed upon. Two prior studies have used crypt lumen shapes not only to identify ACF, but also to determine the presence of dysplasia 7, 8, with reported excellent accuracy (≥ 85%). In our study, reliability in assessment of lumen shape was especially poor. Even the same rater could not consistently evaluate lumen shape when the same images were viewed 3 months later (kappa = 0.05). We classified lumen shape on 3 different scales: round or not (step 1), a scale which delineated the exact shape (step 1), and a third evaluating whether the lumen was compressed or not (steps 2 and 3). None of the assessments found an association between lumen shape and a histological diagnosis of ACF.
The degree of agreement among endoscopists on whether a lesion is an ACF or not was low in the PLCO ACF study (multi-rater kappa score of 0.2 to 0.3). During the training exercises, the agreement rate did not improve, and it remained at the same level when a larger set of images was assessed (step 3, inter rater kappa = 0.3). The results for the overall assessment of ACF are consistent with the results for each individual criterion (inter rater kappa < 0.5 for all criteria except crypt number). Thus, the global assessment of ACF presence is no more reliable than any of the individual criteria that might be used to determine their presence.
The accuracy of an endoscopist in correctly identifying an ACF, based on histology as the gold standard, was 48% to 66% in the PLCO ACF ancillary study, on 2 different quality control exercises during which 5 endoscopists were shown images obtained from MCE 13. Despite attempts to standardize the assessment of images with three intensive learning review sessions, which studied a few images in great detail, the results across endoscopists did not improve. In fact, one examiner’s accuracy in identifying a lesion’s histological diagnosis dropped from 74% in the PLCO ACF study to 50.4% in step 3 of our study, when a larger set of images was reviewed.
Undoubtedly, some of the discrepancy in reliability and accuracy is due to naturally occurring variation in crypt architecture, which may overlap with the criteria used to identify ACF. Further study of normal mucosa may help clarify the threshold for what is abnormal.
Some studies 7, 8 have reported excellent sensitivity and specificity (> 85%) for endoscopic ACF detection using histology as the gold standard, with an excellent agreement rate (92%) between endoscopic and histological diagnoses 7. Neither of these studies, however, viewed images in real time; both analyzed images after colonoscopy. One study used graphical enhancement of images for analysis 8. Our agreement rates (53% for the evaluation of 113 images by 2 endoscopists in step 3) are dramatically lower. It is possible that the use of unenhanced images in our study was a factor contributing to lower agreement rates. Regardless, these results suggest that ACF detection is unlikely to be successful in routine clinical practice, since major efforts to improve reliability, such as image enhancement, may be required.
We relied on histology as the gold standard, a standard which may not be optimal. ACF are small lesions, sometimes having fewer than 20 crypts, and it’s possible that the biopsy forceps missed or overwhelmed the lesion. Bleeding after biopsy often obscures the operative field, so it can be difficult to be sure the lesion has been excised. More importantly, because of issues in biopsy orientation and the small number of crypts affected in comparison to the biopsy sample, pathological diagnosis may not be reliable. Assessment of molecular abnormalities in histologically confirmed ACF and in endoscopically suspected ACF which are not pathologically confirmed will be an additional means of assessing the validity of the endoscopic classification of an ACF. Even if pathology were not used as the gold standard, our data on reliability show considerable variability across endoscopists, which was not attenuated despite training. Our data demonstrate that reliability in endoscopic assessments cannot be taken for granted. Endoscopic variability in assessment is not unique, and has been observed in diagnosis of ulcers and stigmata of bleeding 17 and in application of new technology such as narrow band imaging 18.
In conclusion, we found several areas of concern with the endoscopic detection of ACF using magnification endoscopy. None of the endoscopic criteria tested by our study predicted histological confirmation of ACF. Despite attempts at standardizing the assessment of these criteria, the accuracy of endoscopists in correctly identifying an ACF, based on histology as the gold standard, did not improve. There was considerable variability among endoscopists on whether a lesion is or is not an ACF, and in agreement on whether endoscopic criteria associated with ACF were present in the lesion under observation.
The need for a reliable biomarker of CRC remains acute, as a marker could reduce the sample size requirements and duration of follow up in prevention trials. Use of digitally enhanced images, as employed by some investigators 8, may be a solution, but whether they can be implemented in large scale studies is unclear. Development of better endoscopes that permit an even higher resolution, or the use of confocal microscopy or other emerging real time histology technologies, might overcome some of the limitations seen in our study and improve ACF detection. A longer training period for endoscopists may be required to reduce variability and improve accuracy. Currently, however, limitations in ACF detection need to be appreciated and incorporated into the design of experimental testing using this technique.
Source of funding: This research was supported under contract N01-CN2551 from the National Cancer Institute.
Institutions where the study took place:
1. Georgetown University
2. Washington University, St. Louis, MO
3. University of Pittsburgh, Pittsburgh, PA
4. The Marshfield Clinic, Marshfield, WI
1. Digestive Disease Week. May 17-22, 2008, San Diego, CA (Poster). Gastroenterology, 2008 (In press)
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.