In this study, we assessed how well ovarian cancer biomarkers performed in pre-diagnostic specimens in asymptomatic women compared with performance in specimens obtained at diagnosis in a different set of subjects. Use of a common set of specimens allowed “head-to-head” comparisons. Generally, markers whose assays had poor CVs also had poor performance as biomarkers. None of the markers with a CV ≥ 30% had a sensitivity (at 95% specificity) better than 37% in either phase II or the phase III data (within 6 months of diagnosis). For “standard” tumor markers with better CVs, like CA125, HE4, CA72.4, and CA15.3, the performance in phase III data was actually quite comparable to the performance in phase II data when the blood in the cases had been drawn within 6 months of diagnosis. These same markers were observed to have good stability in the paired specimens drawn about a year apart in healthy subjects (see ). In contrast, for markers like prolactin, transthyretin, or apolipoprotein A1, which may be derived from the patients’ response to the cancer, performance was poorer in the phase III specimens even when they were taken within 6 months of the diagnosis compared to that at the time of clinical diagnosis. These were also markers which tended to show less stability in normal paired specimens.
Recently, a panel of markers has been approved to assess cancer risk in women who have an ovarian mass (6). This panel includes several acute phase reactants (apolipoprotein A1, transthyretin, and transferrin) evaluated as part of this study. Although these are analyzed by immunoassays in the approved panel and by mass spectrometry in this study, inclusion of acute phase reactants raises concern about use of the test to screen for pre-clinical disease. Physicians and patients should not seek to extend the indication to early detection until it is tested and approved for this indication.
In comparing phase II and III results, differences in the sample sets should be acknowledged, including older age of the phase III subjects, ethnic differences, and more early stage cases in phase II specimens. We had oversampled early-staged cases, as did the Yale-GOG phase II study (15), with the belief they might provide clues about better markers for detection of preclinical disease. Some of the phase II markers which ranked higher in early-staged cases than in all cases included CA19.9, apolipoprotein A1, and prolactin. However, these markers did not perform well in the phase III specimens, challenging the assumption that markers for early stage disease are good screening markers. Early stage ovarian cancers tend to be borderline malignant or low grade tumors, or types such as mucinous tumors, which are slower growing and generally have a better prognosis than high grade serous tumors, relatively few of which are diagnosed at stage I or II.
An important limitation of this study in making inferences about performance of markers relative to CA125 is that CA125 was used in “real time” for triaging women for diagnostic workup for ovarian cancer. Therefore, women with a CA125 value of >35 may have been selectively excluded at testing points when their CA125 values reached that threshold. It should be noted that the CA125 value used as the cutoff for 95% specificity was 24, not the standard cutoff of 35. When 35 was used as the cutoff, the specificity increased to 98.5% and the sensitivities for time periods 0-6, 6-12, 12-18, and >18 months decreased to 84%, 19%, 0% and 3%, respectively. The sensitivities for the over-6 month intervals, regardless of the cutoff, may be biased downwards because of intervention based on the CA125 level. Also, readers are cautioned not to over-interpret data from that the limitations of CA125 (or other markers) can be overcome by screening at more frequent intervals.
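The trade-off between cutoff, specificity, and sensitivity described above can be illustrated with a minimal sketch. This is not the study's analysis code: the function names are hypothetical, the marker values are invented for illustration, and the rule "positive if value > cutoff" is an assumption about how the threshold is applied.

```python
import math


def cutoff_for_specificity(control_values, specificity=0.95):
    """Return the smallest observed control value c such that at least
    `specificity` of controls satisfy value <= c (assumed rule: a test
    is positive when value > c)."""
    ordered = sorted(control_values)
    # Small tolerance guards against floating-point round-up in the product.
    k = math.ceil(specificity * len(ordered) - 1e-9) - 1
    return ordered[max(k, 0)]


def specificity_at_cutoff(control_values, cutoff):
    """Fraction of controls that test negative (value <= cutoff)."""
    return sum(v <= cutoff for v in control_values) / len(control_values)


def sensitivity_at_cutoff(case_values, cutoff):
    """Fraction of cases that test positive (value > cutoff)."""
    return sum(v > cutoff for v in case_values) / len(case_values)


# Hypothetical marker values, for illustration only (not study data).
controls = list(range(1, 21))             # 20 healthy subjects
cases = [10, 15, 22, 30, 40, 18, 25, 60]  # 8 cases

data_driven = cutoff_for_specificity(controls, 0.95)
for cutoff in (data_driven, 30):          # 30 stands in for a higher fixed cutoff
    print(
        f"cutoff={cutoff}: "
        f"specificity={specificity_at_cutoff(controls, cutoff):.3f}, "
        f"sensitivity={sensitivity_at_cutoff(cases, cutoff):.3f}"
    )
```

In this toy example, moving from the data-driven cutoff to the higher fixed cutoff raises specificity and lowers sensitivity, the same direction of change reported when the CA125 cutoff was moved from 24 to 35.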
Because of this potential bias, phase III specimens which were not used in the setting of clinical decision making are of interest for comparison. A recent study by Anderson et al (22) using specimens from the Carotene and Retinol Efficacy Trial (CARET) found a similar, though possibly less steep, drop-off in CA125 performance. The AUC for CA125 was 0.74 for cases within two years of diagnosis and 0.57 for 4 or more years from diagnosis. Anderson et al concluded that a panel including CA125, HE4, and mesothelin may provide signal for ovarian cancer three years before diagnosis and that incorporating prior marker values into the algorithm may also have value (23). Markers showing high stability in healthy controls over time, which as noted tend to exclude the acute phase markers, are likely to be the best candidates for this approach. The longitudinal approach is currently being studied by PLCO and site investigators and will likely be the subject of future communications.
Relevant to the discussion, the PLCO reported results after four rounds of screening with CA125 plus transvaginal ultrasound. This regimen produced a high ratio of surgeries to detected cancers (19.5 to 1) without a clear shift toward earlier stage disease. The authors concluded that screening for ovarian cancer in the general population could not be recommended (23), advice reiterated in several editorials (24). In contrast, Menon et al. reported results of a prevalence screen in which two screening modalities were compared: CA125 used as the primary screen with referral for ultrasound if necessary vs. ultrasound alone. The trajectory of serial CA125 values was taken into consideration in interpreting positive or negative screening results (25). The sensitivity, specificity, and positive predictive value for ovarian and tubal cancers were all higher with CA125 followed by ultrasound (89.4%, 99.8%, and 43.3%) than with ultrasound alone (75.0%, 98.2%, and 5.3%). The ratio of operations per malignancy found within one year of the prevalence screen was 2.3 for CA125 but 18.8 for transvaginal ultrasound, the latter value being similar to what was observed in the PLCO. The authors concluded general population screening using CA125 was “feasible” (26).
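As a rough arithmetic check on the figures quoted above, the operations-per-malignancy ratio can be related to the positive predictive value. The sketch below assumes that PPV was defined as malignancies found per operation performed, so that the ratio is approximately 1/PPV; that definition is an assumption made here, not something stated in this section, and the function name is hypothetical.

```python
def operations_per_malignancy(ppv):
    """Approximate operations per malignancy found, assuming PPV is the
    fraction of operations that find a malignancy (an assumption)."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


# PPVs as quoted in the text: CA125 followed by ultrasound, and ultrasound alone.
for label, ppv in (("CA125 then ultrasound", 0.433), ("ultrasound alone", 0.053)):
    print(f"{label}: about {operations_per_malignancy(ppv):.1f} operations per malignancy")
```

Under this assumption the quoted PPVs give about 2.3 and 18.9 operations per malignancy, in line with the reported 2.3 and 18.8; the small difference in the second figure is plausibly due to rounding of the published PPV.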
In summary, we tested 28 ovarian cancer biomarkers in pre-diagnostic specimens from the PLCO. CA125 remains the single best biomarker for ovarian cancer and has its strongest signal within 6 months of diagnosis. Though disappointing, this conclusion should be viewed in perspective of lessons learned from the study and future directions suggested. Refinement of assays with poor CV is desirable prior to phase III testing of a biomarker since, without this, its performance may not be fairly tested. Performance of a biomarker in phase II does provide clues about performance in phase III when the phase III specimen was taken within 6 months of diagnosis and when the marker does not represent an acute reaction to clinical disease. Whether the decline in performance of the current best ovarian cancer biomarkers in specimens more than 6 months remote from diagnosis is a limitation of screening studies where CA125 prompted clinical action or represents an inherent limitation due to a short lead time for ovarian cancer requires further study. If there is hope for reduced mortality for ovarian cancer through screening, we need markers that would show a signal for ovarian cancer more than 6 months remote from diagnosis. It may be especially worthwhile to focus discovery efforts on high grade invasive tumors associated with a normal CA125. Even phase II specimens with these characteristics will be valuable since cases with low CA125 at clinical diagnosis would likely have had a low CA125 during their pre-symptomatic phase.