Biological data on ACF as a biomarker of colorectal carcinoma are promising, but inconsistency in the estimates of prevalence, in histologic confirmation, and in criteria for endoscopic detection makes standardization of ACF detection a priority [1]. Given the variability seen in the previous ACF literature and the poor rate of histologic confirmation of endoscopically suspected ACF (48–66%) in the PLCO ACF ancillary study, the objective of this study was to evaluate whether specific features of the endoscopic appearance of ACF predict histologic confirmation, and to evaluate the reliability of those features across endoscopists. We conducted 3 training exercises to standardize the assessment of the endoscopic criteria, and then reassessed accuracy and inter-rater agreement among endoscopists to see whether they improved.
Initially, none of the endoscopic criteria predicted histological confirmation of an ACF (step 1). During the training exercises, we saw some improvement in the agreement rates for lesion margin and staining. However, when what was learned was applied to a new set of 113 images, reliability across endoscopists remained low (kappa < 0.5) for all criteria except the total crypt number. Furthermore, none of the endoscopic criteria was even weakly associated with a histological diagnosis of ACF, even when we limited the analysis to criteria on which the endoscopists agreed. Two prior studies have used crypt lumen shape not only to identify ACF but also to determine the presence of dysplasia [7, 8], with reported excellent accuracy (≥ 85%). In our study, reliability in the assessment of lumen shape was especially poor. Even the same rater could not consistently evaluate lumen shape when the same images were viewed 3 months later (kappa = 0.05). We classified lumen shape on 3 different scales: round or not (step 1), a scale that delineated the exact shape (step 1), and a scale evaluating whether the lumen was compressed or not (steps 2 and 3). None of these assessments found an association between lumen shape and a histological diagnosis of ACF.
The degree of agreement among endoscopists on whether a lesion is an ACF was low in the PLCO ACF study (multi-rater kappa of 0.2 to 0.3). During the training exercises the agreement rate did not improve, and it remained at the same level when a larger set of images was assessed (step 3, inter-rater kappa = 0.3). The results for the overall assessment of ACF are consistent with the results for each individual criterion (inter-rater kappa < 0.5 for all criteria except crypt number). Thus, the global assessment of ACF presence is no more reliable than any of the individual criteria that might be used to determine it.
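To illustrate why a kappa value can be low even when two raters agree on a majority of images, the following minimal sketch computes Cohen's kappa for two raters. It assumes the two-rater Cohen form of the statistic; the exact kappa variant used in our analysis is not restated here, and the multi-rater values quoted from the PLCO ACF study would need a multi-rater statistic instead. The function name `cohens_kappa` and the ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters gave one identical label to every item
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical calls on 10 images (1 = "ACF", 0 = "not ACF"); not study data.
rater_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]

raw_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohens_kappa(rater_a, rater_b)
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa:.2f}")
```

In this hypothetical example the raters agree on 6 of 10 images, yet because each calls 60% of lesions ACF, agreement expected by chance alone is 0.52 and kappa is only about 0.17. This is why raw agreement percentages and kappa values should not be compared directly.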
The accuracy of endoscopists in identifying an ACF, with histology as the gold standard, was 48% to 66% in the PLCO ACF ancillary study, across 2 quality control exercises in which 5 endoscopists were shown images obtained from MCE [13]. Despite attempts to standardize the assessment of images through three intensive learning review sessions, each of which studied a few images in great detail, the results across endoscopists did not improve. In fact, one examiner’s accuracy in identifying a lesion’s histological diagnosis dropped from 74% in the PLCO ACF study to 50.4% in step 3 of our study, when a larger set of images was reviewed.
Undoubtedly, some of the discrepancy in reliability and accuracy is due to naturally occurring variation in crypt architecture, which may overlap with the criteria used to identify ACF. Further study of normal mucosa may help clarify the threshold for what is abnormal.
Some studies [7, 8] have reported excellent sensitivity and specificity (> 85%) for endoscopic ACF detection using histology as the gold standard, with an excellent agreement rate (92%) between endoscopic and histological diagnoses [7]. Neither of these studies, however, viewed images in real time; both analyzed images after colonoscopy. One study used graphical enhancement of images for analysis [8]. Our agreement rate (53% for the evaluation of 113 images by 2 endoscopists in step 3) is dramatically lower. It is possible that the use of unenhanced images in our study contributed to the lower agreement rate. Regardless, these results suggest that ACF detection is unlikely to be successful in routine clinical practice, since major efforts to improve reliability, such as image enhancement, may be required.
We relied on histology as the gold standard, a standard which may not be optimal. ACF are small lesions, sometimes having fewer than 20 crypts, and it is possible that the biopsy forceps missed or overwhelmed the lesion. Bleeding after biopsy often obscures the operative field, so it can be difficult to be sure that the lesion has been excised. More importantly, because of issues in biopsy orientation and the small number of crypts affected relative to the size of the biopsy sample, pathological diagnosis may not be reliable. Assessment of molecular abnormalities in histologically confirmed ACF, and in endoscopically suspected ACF that are not pathologically confirmed, will be an additional means of assessing the validity of the endoscopic classification of an ACF. Even if pathology were not used as the gold standard, our data on reliability show considerable variability across endoscopists, which was not reduced by training. Our data demonstrate that reliability in endoscopic assessments cannot be taken for granted. Variability in endoscopic assessment is not unique to ACF; it has also been observed in the diagnosis of ulcers and stigmata of bleeding [17] and in the application of new technology such as narrow band imaging [18].
In conclusion, we found several areas of concern with the endoscopic detection of ACF using magnification endoscopy. None of the endoscopic criteria tested in our study predicted histological confirmation of ACF. Despite attempts at standardizing the assessment of these criteria, the accuracy of endoscopists in identifying an ACF, with histology as the gold standard, did not improve. There was considerable variability among endoscopists on whether a lesion was an ACF, and on whether the endoscopic criteria associated with ACF were present in the lesion under observation.
The need for a reliable biomarker of CRC remains acute, as a marker could reduce the sample size requirements and duration of follow-up in prevention trials. Use of digitally enhanced images, as employed by some investigators [8], may be a solution, but whether it can be implemented in large-scale studies is unclear. Development of endoscopes that permit even higher resolution, or the use of confocal microscopy or other emerging real-time histology technologies, may overcome some of the limitations seen in our study and improve ACF detection. A longer training period for endoscopists may be required to reduce variability and improve accuracy. Currently, however, the limitations in ACF detection need to be appreciated and incorporated into the design of studies using this technique.