A multitarget stool DNA test (MSDT), which showed higher sensitivity but lower specificity than a fecal immunochemical test (FIT) for hemoglobin in one recent study from the US and Canada, is increasingly used for colorectal cancer (CRC) screening despite its ~20-fold higher costs compared to FITs. We aimed to assess the diagnostic performance of a quantitative FIT in an independent study among participants of screening colonoscopy and to compare it with the previously reported performance of MSDT.
A total of 3494 participants, aged 50–84 years, who underwent screening colonoscopy in private gastroenterological practices in Germany, and who provided a stool sample before colonoscopy to be evaluated by a commercially available quantitative FIT (FOB Gold®) were included. Diagnostic performance (sensitivity, specificity) for detecting CRC or advanced precancerous lesions (APCLs) was evaluated by comparison of test results with findings at screening colonoscopy. In addition to the original cutoff, we used an adjusted cutoff yielding the same specificity as reported for the MSDT to enhance comparability.
The most advanced finding at colonoscopy was CRC and APCL in 30 (0.86%) and 359 (10.3%) cases, respectively. At a cutoff yielding the same specificity as reported for MSDT (86.6%), the sensitivities (95% CI) of the FIT for detecting CRC and APCL ≥1 cm were 96.7% (82.8–99.9%) and 54.3% (48.3–60.3%), respectively. These sensitivities are higher than those reported for MSDT (92.3% and 43.6%, p=0.66 and 0.003, respectively).
In this large screening population, FIT showed equivalent or better diagnostic performance in comparison to reported performance of MSDT.
Several randomized controlled trials have demonstrated that screening by guaiac-based fecal occult blood tests (gFOBTs) is effective in decreasing colorectal cancer (CRC) mortality.1–3 However, traditional gFOBTs are limited by low sensitivity for detecting CRC and its precursors.4 Meanwhile, fecal immunochemical tests (FITs) for human hemoglobin have been well established, showing better diagnostic performance than traditional gFOBTs.5,6 A recent meta-analysis reported a pooled sensitivity (95% CI) and specificity (95% CI) for detecting CRC of 79% (69−86%) and 94% (92−95%), respectively.6
In a recent study in a large screening population from the USA and Canada, a multitarget stool DNA test (MSDT) that combines quantitative molecular assays for KRAS mutations, aberrant NDRG4 and BMP3 methylation, and β-actin with a hemoglobin immunoassay was compared with a commercial FIT.7 In this study, which was sponsored by the manufacturer of the MSDT and which employed intentional oversampling of older adults ≥65 years of age, MSDT detected significantly more cancers but had more false-positive results than the FIT (sensitivity 92.3% versus 73.8%, specificity 86.6% versus 94.9%). This MSDT, whose costs exceed those of FITs ~20-fold8 and which requires substantially more complex logistics for stool sampling (collection of a whole bowel movement), has been claimed to be the new high-bar benchmark for noninvasive CRC screening.9 The test is commercially available as Cologuard™ and has been increasingly used for CRC screening since Food and Drug Administration (FDA) approval in August 2014 and the start of Medicare coverage in October 2014.
We assessed diagnostic performance of a quantitative FIT in an independent study conducted in the target population and typical age range for CRC screening in Germany and compared the results with the reported performance of MSDT.
This analysis was conducted in the context of the German BLITZ study (Begleitende Evaluierung innovativer Testverfahren zur Darmkrebsfrüherkennung), which has been described in detail elsewhere5,10 and which is registered in the German Clinical Trials Register (DRKS-ID: DRKS00008737). Briefly, participants of screening colonoscopy, which has been offered in Germany since 2002, have been consecutively recruited in gastroenterology practices in Southern Germany since December 2005. Stool and blood samples are collected from participants prior to colonoscopy. Clinical data are extracted from colonoscopy and histology reports, in a standardized manner, by trained research assistants who, like the endoscopists, are blinded with respect to results of blood or stool tests. Written informed consent is obtained from each participant. The study was approved by the ethics committees of the University of Heidelberg and of the responsible state physicians’ boards.
In various periods of recruitment, different FITs were evaluated. For this analysis, we included participants recruited at ages 50–84 years (the age range included in the MSDT study) from November 2008 to September 2014, when the same quantitative FIT (FOB Gold®; Sentinel Diagnostics, Milano, Italy) was applied (n=4203). The following exclusion criteria were applied to ensure that the study participants represented an average-risk screening population and to minimize the potential for false-negative findings at screening colonoscopy: 1) history of CRC or inflammatory bowel disease (n=32), 2) colonoscopy in the preceding 5 years (n=193), 3) inadequate bowel preparation before colonoscopy (n=432), and 4) incomplete colonoscopy (cecum not reached, n=52). Finally, the 3494 remaining participants were included in the analysis.
Participants were handed out stool collection devices at a pre-colonoscopy practice visit. Before February 2012, participants were asked to fill a small plastic container with a native stool sample, store it in a provided plastic bag, keep it in the freezer and bring it to the practice visit for colonoscopy. At the practice visit, the sample was immediately stored at −15 to −40°C, shipped on dry ice to the study’s central laboratory (Labor Limbach, Heidelberg, Germany) and stored again at −70°C until analysis. From February 2012 onward, participants were asked to collect a stool sample according to the manufacturer’s instruction in a collection tube containing hemoglobin stabilizing buffer (10 mg stool in 1.7 mL extraction buffer; Sentinel Diagnostics; Ref. 11561H). The tubes were mailed in sealed envelopes to the German Cancer Research Center (Deutsches Krebsforschungszentrum [DKFZ]). At DKFZ, the tubes were kept at 2–8°C in the refrigerator before transfer in a temperature-controlled environment to the central laboratory, where they were stored at 2–8°C in the refrigerator until FIT analysis.
Laboratory personnel were blinded with respect to colonoscopy findings. For analyses using frozen stool samples, the stool samples were thawed once for extraction of 10 mg stool, which was then diluted in the extraction buffer (1.7 mL, i.e., dilution: 1:170). All FIT analyses were conducted in a fully automated manner on an Abbott Architect c8000. Positivity was defined according to the cutoff recommended by the manufacturer (17 µg hb/g feces, equivalent to 100 ng hb/mL buffer).
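The equivalence between the two cutoff units follows directly from the sampling ratio of 10 mg stool in 1.7 mL buffer. A minimal sketch of the conversion is given below; the function names are hypothetical and chosen only for illustration, and the defaults simply restate the stool mass and buffer volume given above.

```python
def ug_per_g_to_ng_per_ml(conc_ug_per_g, stool_mg=10.0, buffer_ml=1.7):
    """Convert hemoglobin per gram of feces to hemoglobin per mL of buffer."""
    stool_g = stool_mg / 1000.0                 # mg -> g
    hb_ng = conc_ug_per_g * stool_g * 1000.0    # total hemoglobin in the sample, ug -> ng
    return hb_ng / buffer_ml


def ng_per_ml_to_ug_per_g(conc_ng_per_ml, stool_mg=10.0, buffer_ml=1.7):
    """Convert hemoglobin per mL of buffer back to hemoglobin per gram of feces."""
    hb_ug = conc_ng_per_ml * buffer_ml / 1000.0  # total hemoglobin in the tube, ng -> ug
    return hb_ug / (stool_mg / 1000.0)


if __name__ == "__main__":
    # Manufacturer cutoff: 17 ug hb/g feces
    print(round(ug_per_g_to_ng_per_ml(17.0), 6))
    print(round(ng_per_ml_to_ug_per_g(100.0), 6))
```

With the stated sampling ratio, 17 µg hb/g feces corresponds to 170 ng hemoglobin in 1.7 mL buffer, i.e., 100 ng hb/mL.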
All collection, arrival and analysis dates of fecal samples were documented. For frozen stool samples, the median time (interquartile range [IQR]) between fecal sampling and laboratory analysis was 6 (IQR=4−12) days; for fresh stool samples, the median time (IQR) between fecal sampling and arrival in DKFZ was 4 (IQR=3−5) days, and the median time between arrival at DKFZ and laboratory analysis was 2 (IQR=1−4) days.
As previously reported in detail,11 FIT results and diagnostic performance indicators were very similar for both stool-sampling methods, and data were therefore pooled for analysis. We first described the study population according to basic sociodemographic characteristics, overall FIT positivity rate and findings at screening colonoscopy, which was conducted blinded with respect to FIT results in all cases.
We then determined the sensitivity of FOB Gold according to the most advanced finding at screening colonoscopy: 1) CRC (any stage or stages I–III only), 2) advanced precancerous lesions (APCLs) including advanced adenomas (defined by at least one adenoma with any of the following features: ≥1 cm in size, tubulovillous or villous components, high-grade dysplasia) and sessile serrated polyps ≥1 cm, and 3) non-advanced adenoma. Sensitivities were also derived for combinations of the aforementioned groups, such as participants with CRC or any APCL. Specificity was determined for participants without CRC or APCL.
To facilitate comparisons of diagnostic performance with MSDT, we also calculated sensitivities and specificities after adjustment of the FIT cutoff in such a way that it yielded the same specificity (86.6%) as previously reported for MSDT.7 This was achieved by lowering the cutoff from the value recommended by the manufacturer, i.e., from 17 µg hb/g feces to 8.5 µg hb/g feces. Furthermore, in order to assess diagnostic performance over a wide range of possible cutoffs, receiver operating characteristic (ROC) analysis using quantitative test results was conducted for the outcomes CRC and any advanced neoplasia (CRC or any APCL), and areas under the curves (AUCs) along with 95% CIs (derived by 2000 bootstrap samples) were calculated.
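The two steps described here (matching the cutoff to a target specificity, and summarizing performance over all cutoffs by the AUC with a bootstrap CI) can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: a test is taken to be positive when the value is at or above the cutoff, the bootstrap resamples cases and non-cases separately and uses simple percentile limits, and the hemoglobin values are synthetic. All function names are hypothetical.

```python
import random


def specificity(negatives, cutoff):
    """Share of disease-free participants testing negative (value < cutoff)."""
    return sum(v < cutoff for v in negatives) / len(negatives)


def sensitivity(positives, cutoff):
    """Share of diseased participants testing positive (value >= cutoff)."""
    return sum(v >= cutoff for v in positives) / len(positives)


def cutoff_for_specificity(negatives, target):
    """Lowest observed value whose use as cutoff reaches the target specificity."""
    for c in sorted(set(negatives)):
        if specificity(negatives, c) >= target:
            return c
    return float("inf")


def auc(positives, negatives):
    """AUC as the probability that a case scores higher than a non-case (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_ci(positives, negatives, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed resampling scheme)."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(positives, k=len(positives)),
            rng.choices(negatives, k=len(negatives)))
        for _ in range(n_boot)
    )
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic hemoglobin values (ug hb/g feces), for illustration only
    non_cases = [0, 0, 0, 1, 2, 3, 5, 8, 12, 20]
    cases = [4, 15, 30, 60, 120]
    c = cutoff_for_specificity(non_cases, 0.8)
    print(c, specificity(non_cases, c), sensitivity(cases, c))
    print(auc(cases, non_cases), bootstrap_auc_ci(cases, non_cases))
```

Because specificity can only change at observed values, the matched cutoff is the lowest observed value at which the target is reached; with real data the target is usually met approximately rather than exactly.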
In addition to analyses in the entire study population, subgroup analyses were performed according to the location of the most advanced neoplasm (proximal: proximal to or at the splenic flexure; distal: otherwise). To account for a potential role of age differences between our study population and the study population of the MSDT study,7 we also calculated age-adjusted values of FIT sensitivities and specificity as weighted averages of the age-specific values in age groups <65 and 65+ years, with weights equal to the proportion of study participants in the two age groups in the MSDT study.
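The age adjustment is a direct standardization: age-specific estimates are averaged with the age distribution of the reference study as weights. A minimal sketch follows. The function name is hypothetical, and the example inputs are the rounded age-specific sensitivities for APCL ≥1 cm and the proportion of participants ≥65 years in the MSDT study as given in the Results; whether these are exactly the inputs the authors used is an assumption, so the output need not reproduce the published adjusted estimate.

```python
def age_adjusted(rates_by_group, weights_by_group):
    """Weighted average of group-specific rates; weights are normalized to sum to 1."""
    total_weight = sum(weights_by_group[g] for g in rates_by_group)
    weighted_sum = sum(rates_by_group[g] * weights_by_group[g] for g in rates_by_group)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Age-specific sensitivities at the original cutoff (rounded values from the text)
    sens_by_age = {"<65": 0.325, "65+": 0.496}
    # Age distribution of the reference (MSDT) study: 63.1% aged 65 years or older
    msdt_weights = {"<65": 1 - 0.631, "65+": 0.631}
    print(round(age_adjusted(sens_by_age, msdt_weights), 4))
```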
Statistical analyses were performed with R statistical software.12 Differences in sensitivity and specificity were tested for statistical significance by two-sided chi-square test and, where indicated, Fisher’s exact test at α=0.05.
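As an illustration of the exact test, the comparison of CRC sensitivity between the two studies can be set up as a 2×2 table. The sketch below uses only the standard library and is not the authors' R code. The counts are assumptions reconstructed from figures given elsewhere in the text: 29 of 30 cancers detected in BLITZ, and 60 of 65 in the MSDT study (derived from the reported 92.3% of 65 cases). The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = pmf(a)
    return sum(
        pmf(x) for x in range(lowest, highest + 1)
        if pmf(x) <= p_observed * (1 + 1e-7)  # tolerance for floating-point ties
    )


if __name__ == "__main__":
    # Rows: BLITZ (FIT), MSDT study; columns: CRC detected, CRC missed
    p_value = fisher_exact_two_sided(29, 1, 60, 5)
    print(round(p_value, 2))
```

Under these reconstructed counts the p-value is about 0.66, in line with the value reported for the CRC comparison.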
Table 1 shows the main characteristics of the BLITZ study population. Corresponding data from the MSDT study are shown for comparison.7 The BLITZ study included almost equal numbers of men and women. The vast majority (93.2%) of participants were aged between 55 and 74 years; the mean age was 62.1 years.
At least one neoplasm was found in 30.8% of participants, the most advanced finding being CRC, APCL and non-advanced adenoma in 0.86%, 10.3% and 19.7% of participants, respectively. CRC was most commonly diagnosed in stage I or stage III; only three of 30 cases (10%) had stage IV CRC. A total of 77% of APCLs were ≥1 cm. The MSDT study population had included a slightly higher proportion of women (53.7% versus 50.3%) and a substantially higher proportion of participants ≥65 years (63.1% versus 32.8%).7 Prevalences of CRC and APCL were somewhat higher in the BLITZ study. Although these differences would affect positive or negative predictive values, they should not hinder comparisons of sensitivity and specificity.
Data on sensitivity and specificity of FOB Gold, using the cutoff recommended by the manufacturer (17 µg hb/g feces), are summarized in Table 2. CRC was detected in 29 of 30 cases, yielding a sensitivity of 96.7%. The single missed CRC was a stage I CRC in a 69-year-old woman. The sensitivities for any APCL, APCL ≥1 cm and non-advanced adenoma were 33.7%, 39.9% and 10.0%, respectively, and 73.3%, 45.4% and 38.6% for the combined end points “CRC or high-grade dysplasia”, “CRC or APCL ≥1 cm” or “CRC or any APCL”, respectively. With the exception of CRC, sensitivities were somewhat lower, but specificity was significantly higher (92.8% versus 86.6%, p<0.0001) than reported for Cologuard in the MSDT study.7
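For a proportion based on few cases, such as 29 of 30 detected cancers, an exact binomial interval is a natural choice. The sketch below computes a Clopper-Pearson interval by bisection using only the standard library. The text does not state which interval method was used, so treating the reported CI as an exact interval is an assumption; the function names are hypothetical, and the direct summation is intended for small case counts only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); direct summation, suitable for small n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target):
    """Find p in [0, 1] with f(p) = target for a function f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower limit: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    lower, upper = clopper_pearson(29, 30)
    print(f"{29 / 30:.1%} ({lower:.1%} to {upper:.1%})")
```

For 29 of 30 this gives roughly 82.8% to 99.9%, which agrees with the interval reported for CRC sensitivity.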
Shifting the cutoff of FOB Gold from 17.0 to 8.4 µg hb/g feces lowered the specificity to the level reported for Cologuard.7 This shift did not affect the already very high sensitivity for CRC, but increased the sensitivity for any APCL, APCL ≥1 cm and non-advanced adenomas to 47.4%, 54.3% and 19.5%, respectively (Table 2). All these sensitivities are higher (in case of APCL ≥1 cm or the combined end point CRC or APCL ≥1 cm significantly so, p=0.003 and 0.002, respectively) than the sensitivities reported for Cologuard.7 Age adjustment to the age distribution of the MSDT study7 increased the sensitivity of FOB Gold for detecting APCL ≥1 cm above the levels observed for Cologuard even without adjustment of the cutoff while maintaining superior specificity (Figure 1), given the substantially higher sensitivity in age group 65+ years compared to age group <65 years (49.6% versus 32.5%).
In order to assess overall test performance of FOB Gold over a wider range of cutoffs, ROC analyses were performed for the detection of CRC or any advanced neoplasm, i.e., CRC or any APCL. AUCs were 0.95 and 0.72 for the two outcomes, respectively. The corresponding AUCs for Cologuard had been reported to be 0.94 and 0.73, respectively.7
Table 3 shows sensitivities of FOB Gold by site of neoplasms, again using both the original and the adjusted cutoff. With both cutoffs, sensitivity was substantially higher for distal APCLs than for proximal APCLs. Similar site differences had also been reported for Cologuard.7 Significantly lower site-specific sensitivities for FOB Gold compared to Cologuard turned to non-significantly higher sensitivities when the cutoff of FOB Gold was adjusted to yield the same specificity as Cologuard.
In this large screening population, FOB Gold, a quantitative FIT, showed good diagnostic performance not only for detecting CRC (sensitivity 96.7%) but also for detecting APCL, especially large APCL. In particular, 39.9% of APCLs ≥1 cm were detected at a specificity of 92.8%. When the cutoff for positivity was lowered to yield a specificity of 86.6%, the specificity previously reported for MSDT, the sensitivity for detecting APCL ≥1 cm increased to 54.3%, which is substantially and significantly (p=0.003) higher than the corresponding sensitivity reported for MSDT (43.6%).7 Our indirect comparison therefore suggests at least equivalent if not superior performance of a single quantitative FIT compared to MSDT.
The specificity of FOB Gold in our study is very close to the pooled estimate of FIT specificity of 94% derived in a recent meta-analysis of 19 studies.6 However, our estimate of sensitivity for detecting CRC (97%) is substantially higher than the corresponding pooled estimate in the meta-analysis (79%), even though there is a slight overlap of 95% CIs (83–100% and 69–86%, respectively). The meta-analysis had included three cohorts with similarly high levels of sensitivity between 96% and 100%,13–15 but the numbers of CRC cases in those studies had been lower (in two of them substantially so: n=6, 14 and 28, respectively) than in our study (n=30).
To our knowledge, no previous study other than the original MSDT study,7 which had been sponsored by the manufacturer, has reported on the comparison of this test with an FIT in detail. Different FIT cutoffs, resulting in differences in both sensitivity and specificity (with higher cutoffs yielding lower sensitivities and higher specificities and vice versa), often make such comparisons difficult. Adjusting FIT cutoffs to yield the same specificity as MSDT enables an indirect comparison of performance of FIT and MSDT across studies. We have previously reported such an indirect comparison based on a different FIT used in an earlier phase of the BLITZ study.16 For this FIT (OC Sensor; Eiken Chemicals, Tokyo, Japan), performance had been found to be comparable to performance of MSDT. However, the sample size for this indirect comparison had been much smaller (including 15 CRC cases only), and results were reported in much less detail in letter form only.16 In the few previous comparative studies of OC Sensor and FOB Gold, diagnostic performance of these FITs was roughly similar,17–19 even though OC Sensor seemed to show some advantages in terms of analytical performance and test handling.20
When the same or even higher diagnostic performance can be achieved by FIT compared with MSDT, other aspects, such as practicality, acceptance and costs, are crucial criteria for test selection. Excellent adherence rates can be achieved in organized FIT-based CRC screening programs,21–23 and failure rates of FITs due to technical problems seem to be much lower than for MSDT (e.g., 0.3% versus 6.3% in the MSDT study).7 In light of these results, the ~20-fold higher costs and the need for collection of an entire bowel movement for the MSDT are clear arguments against the use of this test as currently offered for CRC screening. In fact, increased use of this test instead of FIT or other established CRC screening options, such as flexible sigmoidoscopy and colonoscopy, could strongly compromise the otherwise excellent cost-effectiveness of CRC screening consistently demonstrated in multiple studies24–29 even if longer screening intervals, such as 3- rather than the 1- or 2-year intervals commonly recommended for FIT,30 should be sufficient for this test. Empirical data supporting prolongation of screening intervals have recently become available for FIT,31 whereas we are not aware of such data for the MSDT. Whether this test should be considered the new high bar benchmark for noninvasive CRC screening as previously claimed9 therefore appears debatable.
Our study has specific strengths and limitations. Strengths include inclusion of a large study population from a true screening setting, with results of screening colonoscopy (the best available albeit not perfect reference test) being available for all study participants. Also, results were worked out and presented in such a way that they allowed the best possible comparison with previously reported results of MSDT. Nevertheless, the most important limitation is that the comparison could only be made in an indirect manner, as MSDT, which is very challenging in terms of sample collection (requiring a whole bowel movement), had not been part of the study protocol and could not be conducted retrospectively using stored stool samples. Therefore, differences in study populations might have confounded comparisons of diagnostic performance. Although both studies were conducted in the same age range (50–84 years) in a true screening setting, the proportion of participants ≥65 years of age was substantially higher (due to intentional oversampling) in the MSDT study than in our study population (63.1% versus 32.8%). As sensitivity was substantially higher among participants ≥65 years of age in our study, this difference led to some underestimation of the advantage in diagnostic performance of FOB Gold in our main analysis. In additional analyses, FOB Gold achieved equivalent sensitivity for detecting CRC or APCL ≥1 cm after adjustment to the age distribution of the MSDT study population, even when using the original FIT cutoff that yielded substantially higher specificity.30 Even though the proportion of stage III CRC was higher and the proportion of stage II CRC was lower in our study than in the MSDT study, this difference should not have affected the comparison, as sensitivity was very high for all CRC stages.
However, despite the overall large size of both studies, the low overall numbers of CRC cases (30 and 65, respectively) limit the power of sensitivity comparisons for the CRC end point. Our study also included larger proportions of participants with APCL. Other factors, such as the sex distribution, were roughly similar in both studies.
Therefore, despite its limitations, our study suggests that at least equivalent if not better diagnostic performance for detecting CRC and its precursors can be achieved with a single high-quality quantitative FIT as with MSDT. Given the ~20-fold higher costs of MSDT, its increasing use for CRC screening instead of FIT should be carefully reconsidered as it may compromise the otherwise very high cost-effectiveness of CRC screening.
We gratefully acknowledge the excellent cooperation of gastroenterology practices and clinics in patient recruitment and of Labor Limbach in sample collection. We gratefully acknowledge Dr. Katja Butterbach, Dr. Katarina Cuk and Ulrike Schlesselmann for their excellent work in laboratory management of stool samples. We also gratefully acknowledge Isabel Lerch, Susanne Köhler, Utz Benscheid, Jason Hochhaus and Maria Kuschel for their contribution in data collection, monitoring and documentation. The BLITZ study was partly funded by grants from the German Research Council (DFG, grant No. BR1704/16-1). The work of Hongda Chen was partly supported by the China Scholarship Council (CSC). The funders did not have any role in the design, conduct or reporting of the study.
The authors report no conflicts of interest in this work.