The purpose of this project was to assess how well published algorithms identify breast cancer cases in more recent claims data, both overall and by population subgroup (i.e., by age, race, stage, and region). Algorithm sensitivity is lower for the 1998 data than for the 1995 data, indicating that published algorithms may need to be updated to reflect changing patient characteristics or patterns of care. Differential sensitivity of the algorithms by SEER region likely reflects geographic variation in practice patterns, because two of the algorithms rely on administrative procedure codes. Rates of misclassification range from nearly 3 percent to just over 5 percent in 1998, with false negatives highest for Freeman's algorithm and lowest for Nattinger's method. Misclassification disproportionately affects older women and those diagnosed with in situ, metastatic, or unknown-stage disease. Older subjects are more likely to have comorbid conditions, and subjects with metastatic disease are more likely to be facing imminent death. These two groups, along with women diagnosed with in situ (the least severe) breast cancer, therefore tend to receive less aggressive treatment (Ballard-Barbash et al. 1996; Yancik et al. 2001; Bouchardy et al. 2003; Gold and Dick 2004), leaving a smaller pool of breast cancer-related claims that the algorithms can use to identify cases.
Because adding age, race, and region variables to the algorithms' case indicator variable improves the probability of correctly identifying incident breast cancer cases, using demographic information may enhance case identification. For example, when applying Nattinger's algorithm, age categories could be incorporated into step 3, with older women requiring fewer procedure codes to pass this step, as they may be less likely to receive aggressive treatment. Including these variables in the models may thus account for differences in treatment patterns by age, region, and race, even though the demographic variables themselves are not indicators of cancer. Region variables, however, may only be meaningful for the SEER areas and not for other studies where distinct regions are not well defined. It is also possible that the improved results are due to overfitting the model; we do not have an additional validation data set to test our findings.
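To make this idea concrete, an age adjustment of this kind could be sketched as a relaxed claims-count threshold. The age cutoffs, required counts, and function name below are purely illustrative assumptions, not values from Nattinger's published algorithm.

```python
# Hypothetical sketch of an age-adjusted threshold for a step-3-style
# check. All cutoffs and required counts are assumed for illustration,
# not taken from any published algorithm.

def passes_step3(procedure_code_count: int, age: int) -> bool:
    """Return True if a subject passes the (hypothetical) step-3 check.

    Older women may receive less aggressive treatment and thus generate
    fewer breast cancer-related procedure claims, so the required count
    is lowered for older age groups.
    """
    if age >= 80:
        required = 1  # assumed: oldest group needs one qualifying claim
    elif age >= 70:
        required = 2  # assumed intermediate threshold
    else:
        required = 3  # assumed baseline threshold
    return procedure_code_count >= required
```

Under this sketch, a woman aged 82 with a single qualifying procedure claim would pass the step, while a woman aged 66 with the same single claim would not.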
PPV varies widely across the algorithms but improves over time with Warren's algorithm, although it remains lowest for that algorithm. PPV figures must be interpreted cautiously because our sample includes all breast cancer cases but only a 5 percent random sample of Medicare beneficiaries without breast cancer, and PPV depends on disease prevalence. We present PPV to identify trends over time, but the absolute values may not be as meaningful.
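The dependence of PPV on prevalence follows directly from Bayes' rule. The sketch below, using invented sensitivity and specificity values rather than figures from this study, shows how identical algorithm performance yields a much lower PPV at a realistic population prevalence than in an enriched case-control sample.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule:
    PPV = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same (invented) test characteristics, very different PPVs:
print(round(ppv(0.90, 0.99, 0.50), 3))   # enriched sample -> 0.989
print(round(ppv(0.90, 0.99, 0.005), 3))  # population prevalence -> 0.311
```

This is why absolute PPV values computed from a sample with an artificially high proportion of cases should not be carried over to population-level settings.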
The strength of this work is that our analyses include later years of data to represent more recent patterns of care (i.e., a shift to outpatient care), and we provide a head-to-head comparison of three algorithms using the newer data. A limitation is that we use the 5 percent random sample of non-breast-cancer controls provided to us and assume that it is representative of the population without breast cancer; if this assumption does not hold, our results may be misleading.
Accurate identification of breast cancer cases has many implications for studying quality and costs of care. For true positive cases, we have all the information on subjects and would be able to study their treatment/surveillance patterns and costs of care. For false positive subjects, we would be evaluating care patterns of noncases to estimate health care utilization for breast cancer patients, thereby yielding underestimates of cancer costs and/or low compliance rates. For example, subjects without breast cancer would not be expected to comply with posttreatment mammography guidelines, so we would undercount the use of follow-up mammography in breast cancer patients. For true negative subjects, we would not anticipate any added error in our estimates. False negatives, however, would lead to a host of lost information, especially if they are differentially misclassified. We expect that the cases the algorithms miss would have fewer breast cancer-related claims due to less extensive or aggressive treatment, so they may be more likely to be early stage, older, facing imminent death, or comorbidly ill, and possibly of minority race. If one used the algorithms to identify cases for quality-of-care assessment, it could appear that there is less variation in care than actually exists, particularly for the vulnerable populations one might aim to study. In assessing costs (i.e., reimbursed charges) of care using these algorithms, one would in effect overestimate average costs because the lower costs associated with less aggressive treatment would not balance out the high costs of advanced disease with its more involved treatment. Also, cancer-staging information is not available in claims data, so studies that are stage-treatment specific would be hard to conduct without linkage to tumor registry data; previous research has shown cancer-stage identification to be difficult with claims data (Cooper et al. 1999). Important algorithm limitations to note are that Freeman's algorithm was developed for 65–74-year-olds, Warren's was applied only to registries of entire states (not metropolitan areas), and none of the algorithms was designed to detect cases of in situ disease.
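The cost-overestimation bias from differentially missed low-cost cases can be illustrated with a toy calculation; all dollar figures below are invented for illustration only.

```python
# Toy illustration (all numbers invented): if an algorithm
# differentially misses low-cost cases, the mean cost among identified
# cases overstates the true mean across all cases.
identified_costs = [40_000, 55_000, 60_000]  # cases the algorithm finds
missed_costs = [8_000, 12_000]               # false negatives: less aggressive care

observed_mean = sum(identified_costs) / len(identified_costs)
all_costs = identified_costs + missed_costs
true_mean = sum(all_costs) / len(all_costs)

print(observed_mean > true_mean)  # True: observed costs are biased upward
```

Here the observed mean (about $51,700) exceeds the true mean ($35,000) solely because the missed cases cluster at the low-cost end.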
Because our study did not account for Medicaid claims data, there was concern that Medicare claims data for beneficiaries with state buy-in (SBI) coverage may be incomplete. Our findings did not bear this out, however (data not shown). A higher proportion of the older old in our sample does have SBI coverage (e.g., in the 1998 data, almost 24 percent of those aged 80 and older have a full year of SBI coverage compared with 9.6 percent of those aged 65–69), but we found no significant differences in rates of false negatives by SBI status within age groups for any of the algorithms for 1998 (p > .12 for all comparisons). We do note that 7 percent of white subjects compared with 33 percent of black subjects had a full year of SBI coverage, but sample sizes are too small to draw meaningful conclusions about possible effects on algorithm performance. State buy-in coverage may act as a proxy for low-income status in our study sample but likely does not directly affect the completeness of the utilization data, which challenges the notion that Medicare claims alone yield incomplete data for dual eligibles. In this study, Medicare claims data appeared adequate to identify incident cases of breast cancer in SBI beneficiaries.
Some authors of the published algorithms recommended caution in using their algorithms to identify incident breast cancer cases, while others are more enthusiastic. We are not yet aware of any studies in which a researcher has used an algorithm alone to identify breast cancer cases. An important advance in this field would be to refine an algorithm so that it could identify cases of recurrent cancer, information that most registries do not collect. Until the algorithms are refined, researchers should probably use them without cancer registry information only if they highlight the limitations of the method and there is no alternative. For other diseases, diagnosis and procedure codes may be more useful for identifying patient cohorts. In breast cancer, such codes often are recorded for patients undergoing diagnostic testing to rule out disease or before a definitive cancer diagnosis (e.g., a breast abnormality of some sort rather than breast cancer). In addition, cancer stage, which can greatly affect the treatment received, cannot be determined from diagnosis and procedure codes. The next question is: how good does an algorithm need to be for researchers to be confident in applying it to new data? As with any diagnostic test, the algorithms involve trade-offs between sensitivity and specificity. Future work should explore the biases that algorithm misclassification introduces into assessments of use and costs of health care services. In the meantime, algorithms should be applied very cautiously to insurance claims databases to assess health care utilization and costs of breast cancer care outside SEER-Medicare populations.