To develop and validate a clinically informed algorithm that uses solely Medicare claims to identify, with a high positive predictive value, incident breast cancer cases.
Population-based Surveillance, Epidemiology, and End Results (SEER) Tumor Registry data linked to Medicare claims, and Medicare claims from a 5 percent random sample of beneficiaries in SEER areas.
An algorithm was developed using claims from 1995 breast cancer patients from the SEER-Medicare database, as well as 1995 claims from Medicare control subjects. The algorithm was validated on claims from breast cancer subjects and controls from 1994. The algorithm development process used both clinical insight and logistic regression methods.
Training set: Claims from 7,700 SEER-Medicare breast cancer subjects diagnosed in 1995, and 124,884 controls. Validation set: Claims from 7,607 SEER-Medicare breast cancer subjects diagnosed in 1994, and 120,317 controls.
A four-step prediction algorithm was developed and validated. It has a positive predictive value of 89 to 93 percent, and a sensitivity of 80 percent for identifying incident breast cancer. The sensitivity is 82–87 percent for stage I or II, and lower for other stages. The sensitivity is 82–83 percent for women who underwent either breast-conserving surgery or mastectomy, and is similar across geographic sites. A cohort identified with this algorithm will have 89–93 percent incident breast cancer cases, 1.5–6 percent cancer-free cases, and 4–5 percent prevalent breast cancer cases.
This algorithm has better performance characteristics than previously proposed algorithms. The ability to examine national patterns of breast cancer care using Medicare claims data would open new avenues for the assessment of quality of care.
The quality of cancer care in the United States is known to be variable, and factors determining quality of cancer care have been insufficiently studied (Hewitt and Simone 1999). The development of methods for using existing databases to study the quality of cancer care would be a major advance (Hewitt and Simone 2000). Methods to permit the use of Medicare administrative databases to study cancer quality of care would be particularly helpful because about 60 percent of persons diagnosed with cancer are aged 65 and older (Hewitt and Simone 2000), and the Medicare claims data represent a nearly population-based source of data.
With respect to breast cancer specifically, several challenges have been identified in the use of Medicare claims in studying the care provided. The use of inpatient Medicare claims to identify incident breast cancer cases offers excellent specificity but poor sensitivity because 30–40 percent of initial breast cancer operations are done on an outpatient basis (Warren et al. 1999; Warren et al. 1996). Inpatient records are also more likely to identify patients undergoing mastectomy for initial therapy than those undergoing breast-conserving surgery (Warren et al. 1996; Cooper et al. 2000). Compared to inpatient data alone, the use of combined inpatient, outpatient, and physician claims increases sensitivity to 80–90 percent (Freeman et al. 2000; Cooper et al. 1999), but decreases specificity (Warren et al. 1999; Freeman et al. 2000). Because only a small percentage of the female Medicare population develops breast cancer in a given year, even small decreases in specificity lead to large decreases in the positive predictive value (PPV) (Freeman et al. 2000).
Our major goal in the development of this algorithm was to identify a cohort of incident breast cancer patients, whose surgical, medical, and follow-up care could be studied over time. Inherent in this goal was a requirement for a high positive predictive value (PPV), ensuring that a high percentage of the cohort was made up of true breast cancer patients. The requirement for a high PPV was considered more important than the algorithm's sensitivity, particularly for the small percentage (6–7 percent) of women not undergoing initial surgical therapy. However, we also considered important the consistency of the algorithm's sensitivity across subgroups defined by geographic location, age, and type of initial surgery undergone (breast-conserving surgery [BCS] or mastectomy).
The prior work of the other investigators cited had adequately demonstrated that a relatively simple algorithm (generally consisting of the identification of a claim with a coincident breast cancer diagnosis and operative procedure) would not permit us to achieve our goal. Our strategy was to use an interaction of clinical rationale and statistical analysis in developing the four-step algorithm presented herein.
The key data source for this study was the linked SEER-Medicare database (SEER-Medicare Linked Database 2003). This database links information from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) tumor registries and the Centers for Medicare and Medicaid Services (CMS) Medicare claims data. The population-based SEER registries cumulatively represent about 14 percent of the U.S. population, and include information on incident cancer patients, such as demographics, month and year of diagnosis, extent of disease, and initial treatment undergone. The Medicare files required for this study include the Medicare Provider Analysis and Review (MEDPAR) file, which contains inpatient hospital claims; Outpatient file, which contains claims from institutional outpatient providers including hospital ambulatory surgery centers; the Carrier Claims (previously known as Part B Physician/Supplier File), which contains inpatient and outpatient claims from noninstitutional providers such as physicians, as well as stand-alone ambulatory surgical centers; and the Denominator file, which contains beneficiary demographic information and Medicare entitlement and enrollment information. About 94 percent of the SEER registry patients aged 65 and older were successfully linked with their Medicare claims (Potosky et al. 1993). An additional data source was a 5 percent random sample of Medicare beneficiaries residing in the SEER geographic areas, including an indicator for whether the individual linked to the SEER database. When SEER subjects are removed from this sample, it represents nearly a population-based random sample of cancer-free control subjects residing in SEER areas. This study was approved by the Medical College of Wisconsin Human Subjects Research Review Committee.
Training Set: Incident Breast Cancer Cases. A cohort of women aged 65 or older at the time of diagnosis of breast cancer in 1995 (according to SEER) was developed. Cases were excluded if the diagnosis was made only at autopsy or by death certificate. Subjects were required to meet the following criteria for the period from January 1995 to March 1996: eligibility for Medicare Parts A and B, not in a Medicare HMO, and to be alive. Eligibility through the first quarter of 1996 was required to capture Medicare treatment information for patients who were diagnosed near the end of 1995, but treated early in 1996. These criteria resulted in a cohort of 7,700 women, whose 1995 Medicare claims comprised the training set for incident breast cancer cases.
Training Set: Cancer-Free Subjects. From the 5 percent random sample of Medicare beneficiaries who resided in SEER areas but who did not link to SEER registry for an incident cancer between 1973 and 1995, a cancer-free cohort was developed. These 71,752 women were required to meet the same eligibility criteria as the breast cancer cases with respect to Medicare eligibility and survival. The 1995 claims of these women comprised the “cancer-free” training set.
Training Set: Other Cancer Cases. Again using the 5 percent random sample and the same eligibility criteria, a cohort was constructed of 4,501 women who were diagnosed with a cancer other than breast cancer between 1973 and 1995. The 1995 claims of these women comprised the “other cancer” training set.
Prevalent Breast Cancer Cases. For the purposes of this article, we use the term prevalence to indicate cancer cases diagnosed prior to the index year and not including the incident cases diagnosed in the index year. From the SEER-Medicare linked data, a cohort was developed of 48,631 women who had breast cancer between 1973 and 1994, according to SEER, and who were alive and eligible for Medicare Parts A and B and not in an HMO during 1995. The 1995 claims for these women were not used to train the algorithm per se, but were used to assess the impact of prevalent cases on the algorithm's specificity.
Validation Sets. Using the same selection criteria as described above for the year 1995, four analogous sets of claims were constructed for calendar year 1994. When evaluating a predictive algorithm, it is important that the validation set be independent of the training set. We defined the training sets to be comprised of claims from 1995, while the validation sets were comprised of independent claims from 1994. We recognized the possibility that some of the individuals generating the claims for the 1995 training set might also have generated claims for the 1994 validation set, particularly among the cancer-free and other cancer groups. In assessing the frequency of such overlap, though, we determined that only 5.5 percent of the individuals whose claims were part of the algorithm's training for steps 2 or 3 (described below under “Algorithm Development”) also contributed claims to the validation set at those steps.
The algorithm was developed using the 1995 training sets. In constructing the algorithm, consideration was given not only to the presence or absence of breast cancer diagnosis or procedure codes, but also to other related codes (such as historical codes and radiation codes) that might improve the prediction of a case. In addition, variables were evaluated indicating whether a code was in a primary or secondary position on a given claim, and how frequently the code occurred. The algorithm development effort involved an iterative interaction between clinical insight and statistical analysis. The codes actually used in the algorithm are summarized in Table 1.
A four-part algorithm was developed (Figure 1). The input to the algorithm consists of Medicare claims of all women aged 65 and older who were alive and eligible for Medicare Part A and Part B in some index year, including claims for the following three months.
Step 1. This step, referred to as the “screen,” requires that a potential case have both a breast cancer diagnosis code and a breast cancer procedure code (not necessarily on the same claim) anywhere in the MEDPAR, Outpatient, or Carrier Claims records (see Table 1). Only subjects satisfying this screening step are retained for further consideration.
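The screening logic of step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Claim` class, the `passes_screen` function, and the placeholder code values are hypothetical, and the actual diagnosis and procedure codes are those listed in Table 1.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One claim record; the field names are hypothetical."""
    source: str  # e.g. "MEDPAR", "Outpatient", or "Carrier"
    diagnosis_codes: set = field(default_factory=set)
    procedure_codes: set = field(default_factory=set)


def passes_screen(claims, dx_codes, proc_codes):
    """Return True if a qualifying diagnosis code and a qualifying
    procedure code each appear somewhere in the subject's claims.
    The two codes do not have to be on the same claim."""
    has_dx = any(c.diagnosis_codes & dx_codes for c in claims)
    has_proc = any(c.procedure_codes & proc_codes for c in claims)
    return has_dx and has_proc


# Placeholder code sets; substitute the codes from Table 1.
DX = {"DX_BREAST_CANCER"}
PROC = {"PROC_BREAST_SURGERY"}

subject_claims = [
    Claim("Carrier", diagnosis_codes={"DX_BREAST_CANCER"}),
    Claim("Outpatient", procedure_codes={"PROC_BREAST_SURGERY"}),
]
print(passes_screen(subject_claims, DX, PROC))
```

Because the two conditions are checked independently across all claims, a diagnosis on a physician claim and a procedure on a facility claim are enough to pass the screen.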
Step 2. This step directly includes subjects with a high likelihood of being a case. To be classified as a case based on this step, the subject must meet both of the following criteria:
Subjects who pass step 2 are classified as possible incident cases, and proceed to step 4. Subjects who are not classified as cases at step 2 go to step 3.
Step 3. This step of the algorithm applies to all potential cases that passed the screen (step 1), but were not directly included at step 2. In practice, this step differentiates primary breast cancer cases from women undergoing lumpectomy or partial mastectomy for benign disease or for another cancer that had metastasized to the breast. Some such patients without primary breast cancer have claims with erroneous primary breast cancer diagnosis codes, and therefore pass the step 1 screen. To develop step 3 of the algorithm, logistic regression methods were employed. A model was developed predicting an incident breast cancer case from about 50 indicator variables representing the presence of various billing codes (diagnostic or procedural) or combinations of codes that were thought clinically to have possible usefulness. Complete details regarding this model are omitted in the interest of space, and because such details do not assist in assessing the performance of the final algorithm. The final parsimonious logistic model included only the following four dichotomous factors:
Because all the factors in the model are binary variables, it is not necessary for a user of the algorithm to use the regression equation to classify a case as positive or negative. Once the values of the four variables have been determined, subjects can be ruled in if they have one of three combinations of the variables. These combinations are (1) the “surgery” variable is positive and the other three variables are negative, (2) the “surgery” variable is positive, the “other cancer” variable is positive, and the other two variables are negative, or (3) the “surgery” variable is positive, the “secondary cancer to breast” variable is positive, and the other two variables are negative. With all other combinations, the subject is declared not to be a breast cancer case.
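The three rule-in combinations described above can be written as a simple lookup. This is a sketch under stated assumptions: the function and argument names are hypothetical, and `fourth_factor` stands in for the remaining model variable, which is not named in the text above.

```python
from itertools import product

# Rule-in combinations, ordered as
# (surgery, other_cancer, secondary_to_breast, fourth_factor).
RULE_IN = {
    (True, False, False, False),  # (1) surgery only
    (True, True, False, False),   # (2) surgery + other cancer
    (True, False, True, False),   # (3) surgery + secondary cancer to breast
}


def step3_rule_in(surgery, other_cancer, secondary_to_breast, fourth_factor):
    """Return True if the four binary factors match one of the three
    rule-in combinations; every other combination is ruled out."""
    key = (bool(surgery), bool(other_cancer),
           bool(secondary_to_breast), bool(fourth_factor))
    return key in RULE_IN


# Enumerate all 16 combinations to confirm that exactly three rule in.
ruled_in = [combo for combo in product([False, True], repeat=4)
            if step3_rule_in(*combo)]
print(len(ruled_in))
```

Equivalently, a subject is ruled in when the surgery variable is positive, the fourth variable is negative, and the other two variables are not both positive.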
Step 4. This step removes prevalent breast cancer cases. It uses three prior years of claims of subjects classified as a case in step 2 or step 3. Such subjects are removed if they have a claim in prior years 1992–1994 (1991–1993 for the validation cohort) that was either positive for step 1 (the screening step) of the algorithm, or contained a diagnosis of prior history of breast cancer. Women younger than age 68 at diagnosis did not have three full years of claims for review, but as many years as were available were used. This strategy results in the removal of most prior cases, but also a number of incident cases (Table 2, rows 6 and 7).
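A minimal sketch of the step 4 look-back follows. The data layout is an assumption made for illustration: `prior_year_flags` is a hypothetical dictionary mapping a calendar year to a pair of booleans (screen-positive in that year, history-of-breast-cancer diagnosis in that year). Years with no available claims are simply absent, which mirrors the use of as many years as were available.

```python
def is_prevalent(prior_year_flags, index_year, lookback_years=3):
    """Return True if any of the look-back years before index_year has a
    screen-positive claim pattern or a history-of-breast-cancer diagnosis.

    prior_year_flags: dict of year -> (screen_positive, history_dx).
    Missing years are treated as providing no evidence of prior disease.
    """
    for year in range(index_year - lookback_years, index_year):
        screen_positive, history_dx = prior_year_flags.get(year, (False, False))
        if screen_positive or history_dx:
            return True
    return False


# A candidate for index year 1995 with a history code in 1993 is removed.
print(is_prevalent({1993: (False, True)}, index_year=1995))
# A candidate with no flagged prior claims is retained as incident.
print(is_prevalent({}, index_year=1995))
```

Making `lookback_years` a parameter allows the trade-off between removed prevalent cases and lost incident cases to be examined for different look-back windows.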
The initial approach was to consider the “gold standard” for defining an incident breast cancer case to be SEER. However, while conducting this study, it was determined that subjects meeting the criteria from step 2 above had a very high likelihood of being a case according to SEER. A small number of cancer-free control subjects also met the step 2 high likelihood criteria. After manual inspection of the claims for these subjects, the likelihood of these being incident breast cancer cases seemed extremely high (based on the pattern and number of claims with breast cancer diagnosis over time, radiotherapy claims, etc.). Therefore, in the results section two gold standards for defining an incident case of breast cancer are reported. The first is termed the “SEER” gold standard, which is defined solely by whether a subject linked to the SEER registry in that year as a breast cancer case. The second gold standard is termed the “SEER plus High Likelihood” gold standard and consists of cases identified by a SEER registry as well as control subjects identified by the two criteria of step 2 above. Our belief is that these cases might be in the group of about 6 percent of SEER subjects who did not successfully link with Medicare files (Potosky et al. 1993) or they might be due to a patient moving into a SEER area shortly after breast cancer diagnosis.
The estimates of sensitivity and specificity were converted to an estimate of the positive predictive value (PPV) using Bayes Theorem, as
where πB, πO, πPB, πN represent the incidence of breast cancer, incidence and prevalence of other cancers, prior breast cancer, and no cancer, respectively, in the study population. Based on SEER data, these were estimated to be 0.005, 0.07, 0.03, and 0.895, respectively. Confidence intervals for the PPVs were estimated using Fieller's method (Fieller 1940; Steffens 1971).
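The displayed equation is not reproduced in this text. One form that is consistent with Bayes' theorem and with the four population fractions defined above is given below as an assumption, not as the authors' exact expression. The symbols $Se$ (sensitivity among incident breast cancer cases) and $F_O$, $F_{PB}$, $F_N$ (the fractions of other cancer, prior breast cancer, and cancer-free subjects that the algorithm classifies as cases, that is, one minus the group-specific specificity) are introduced here for illustration.

```latex
\mathrm{PPV} =
  \frac{Se \,\pi_B}
       {Se \,\pi_B + F_O \,\pi_O + F_{PB} \,\pi_{PB} + F_N \,\pi_N},
\qquad
\pi_B + \pi_O + \pi_{PB} + \pi_N = 0.005 + 0.07 + 0.03 + 0.895 = 1 .
```

The numerator is the expected proportion of the population that is both an incident case and flagged by the algorithm; the denominator is the expected proportion flagged from any of the four groups.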
When applied to the training cohort, this algorithm had excellent specificity, and moderate sensitivity (Tables 2 and 3). Of the initial breast cancer cohort, about 5 percent of the subjects were not detected by step 1, about 9 percent were not retained by step 3, and a further 6 percent had a prior year code for breast cancer diagnosis or history thereof at step 4. This left an overall sensitivity of 80 percent. The specificity was excellent, at well over 99.9 percent.
The validation of this algorithm was carried out in the 1994 cohorts (Tables 2 and 3). The algorithm's performance was similar to the training year. The specificity of the algorithm remained well above 99.9 percent. Using the stricter “SEER” gold standard, the PPV was 89 percent. Using the “SEER plus high likelihood” gold standard, the PPV was 93 percent.
Using the PPVs for the four cohorts, the expected composition of a cohort developed using this algorithm can be determined (Table 3). In the validation year, the vast majority of cases selected by the algorithm are incident breast cancer cases. About 1 percent are other cancer cases. About 4–5 percent of cases selected by the algorithm are prevalent breast cancer cases. A substantial minority of the prevalent breast cancer cases was diagnosed according to SEER in the three months prior to the start of the 1994 year. If one were willing to tolerate a three-month error in date of diagnosis (i.e., cases diagnosed according to SEER in the last three months of 1993 are not counted against the algorithm's specificity for 1994), the percent of prevalent cases in the 1994 algorithm cohort would decrease from 4.6 percent to about 3 percent, and the percent considered incident cases would increase accordingly. With respect to the composition of the algorithm cohort, the percentage of cancer-free patients in the validation year varies from 1.4 percent to 5.5 percent depending on which gold standard is used.
We performed a sensitivity analysis of the specificity gain associated with examining prior claims in step 4 for differing numbers of years. Of the 48,631 prior breast cancer cases with 1995 claims, only 1,242 were positive after step 2 or 3 of the algorithm. Examining prior claims for one year back in step 4 would have removed 58.6 percent of those cases. Going back two, three, or four years, respectively, removed 69.3 percent, 74.4 percent, and 76.5 percent of the 1,242 cases. The specificity gain from applying step 4, however, is associated with a sensitivity loss (loss of index year true incident cases who met the criteria for removal at step 4). The percentage of true incident cases retained when applying step 4 going back one, two, three, or four years was 95.4 percent, 93.9 percent, 92.7 percent, and 92.3 percent respectively.
The algorithm sensitivity by selected patient characteristics is presented in Table 4. The algorithm sensitivity is lower for women with stage 4 and unknown stage disease at presentation, but there are relatively few such patients in any given year. Women are well represented up to age 84, but there is a decline in sensitivity for the women aged 85 and older. The sensitivity is consistent across the different SEER geographic regions. With respect to initial treatment, the algorithm fails to identify women who did not undergo initial surgery according to SEER, but identifies equally well women who underwent mastectomy and those who underwent BCS. Women who underwent lymph node dissection or radiotherapy according to SEER are somewhat overrepresented compared to those who did not.
We propose a four-step algorithm for the use of Medicare claims data to identify women with surgically treated incident breast cancer. This algorithm has a sensitivity of about 80 percent overall, with a sensitivity of 82–87 percent for stages 1 and 2 disease. The algorithm has a specificity above 99.9 percent, and a positive predictive value of 89 percent, using a SEER gold standard. The PPV is greater than 93 percent based on the SEER plus High Likelihood gold standard.
The algorithm development process described herein illustrates several major issues with respect to the use of Medicare claims to identify breast cancer cases. One is the relationship of specificity to positive predictive value. Because only a minority of women, even in the Medicare age group, develop breast cancer in a given year, an exceedingly high specificity (>99.9 percent) is necessary to have a positive predictive value of 90 percent. The dramatic decline in PPV that occurs with only small decreases in specificity can be seen by comparing the results of this algorithm with prior proposed algorithms (Table 5). Given that the procedures used to treat breast cancer may also be used to identify or treat benign breast disease, and given occasional inaccuracies in the use of a breast cancer diagnosis, it is challenging to reach the necessary level of specificity.
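The dependence of PPV on specificity at low incidence can be illustrated numerically. The sketch below is a simplification: it pools all subjects who are not incident cases into a single group with one overall specificity, whereas the study treats other cancer, prior breast cancer, and cancer-free subjects separately. The 0.005 incidence and 80 percent sensitivity are taken from the text; the specificity values and the function name `ppv` are illustrative choices.

```python
def ppv(sensitivity, specificity, incidence):
    """Positive predictive value from Bayes' theorem for a two-group
    population (incident cases versus everyone else)."""
    true_pos = sensitivity * incidence
    false_pos = (1.0 - specificity) * (1.0 - incidence)
    return true_pos / (true_pos + false_pos)


# PPV at 80 percent sensitivity and 0.5 percent annual incidence
# for a range of hypothetical specificities.
for spec in (0.99, 0.995, 0.999, 0.9995):
    print(f"specificity {spec:.4f} -> PPV {ppv(0.80, spec, 0.005):.3f}")
```

Under these assumptions, a specificity of 99 percent gives a PPV below 30 percent, and the PPV approaches 90 percent only when the specificity is well above 99.9 percent.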
A major goal of this algorithm was to maintain a high specificity while including cases treated in the ambulatory surgical setting. This algorithm achieves a PPV similar to that reported by Warren and colleagues (Warren et al. 1999) for inpatient claims, while providing improved sensitivity (Table 5). Although the sensitivity is not as high as that reported by Freeman and colleagues (Freeman et al. 2000), the PPV is much higher.
Another major issue with this and prior algorithms is the presence of prevalent cases. Because women with breast cancer often live for many years, the number of prevalent cases in a dataset greatly exceeds the number of incident cases. Women with prevalent disease undergo at times the same breast procedures as women with initial disease to diagnose or rule out recurrent or new breast disease, and also may carry diagnostic codes of primary breast cancer for years after initial disease. Since local disease recurrences occur most frequently within the first few years after diagnosis, our approach was to assume that algorithm-identified cases with a history of breast cancer within the prior three years had recurrent disease. This led to a decrease in sensitivity from about 85 percent to 80 percent, but maintained the high specificity of the algorithm.
In attempting to maximize the PPV of the algorithm, we accepted a moderate sensitivity of about 80 percent. Therefore, this algorithm may have limited utility for determining breast cancer incidence. The key uses for this algorithm are likely to be for aspects of care not well captured by SEER or other state tumor registries. The study of survivor care, for example, studies of mammography (Schapira, McAuliffe, and Nattinger 2000) or other health care utilization and physician care (Nattinger et al. 2002) among survivors, is well suited to claims analysis. Patterns of care studies with respect to geographic variation and rural areas not well represented by SEER appear feasible given the consistency of the algorithm in different geographic areas. Studies might examine pre-morbid care for older breast cancer patients, such as use of mammography or other preventive care interventions. Although some of the studies mentioned could be performed using the limited number of available linked tumor registry–Medicare databases, the need for greater geographic representation or larger sample sizes might favor the use of Medicare-derived samples. Given that almost half of all breast cancer cases occur in women aged 65 and older, the algorithm could be applied to 100 percent state Medicare databases for identifying providers with possible quality problems, such as low levels of medical oncology consultation, poor follow-up care, and poor preventive care practices. An algorithm that is less than perfect may still provide a valid assessment of patterns of care (Kahn et al. 1996).
A limitation we encountered is the fact that about 5 percent of women identified by SEER as having an incident breast cancer, and who linked to the Medicare claims data, did not even pass the screening step. Based on Table 4 it appears that some of these women do not undergo initial surgical therapy. Perhaps some women undergo surgery but are covered by employer-based insurance, which pays for their care in preference to Medicare. In any event, this problem does cause a limitation on the sensitivity that can be achieved by the algorithm, even if steps 2, 3, and 4 could be further optimized. Another limitation is that women who underwent radiotherapy are somewhat overrepresented compared to those who did not, limiting the ability to use this algorithm to study patterns of care for radiotherapy.
We are not able to state which of the two “gold standards” represents a more accurate definition of an incident breast cancer case. Although the SEER tumor registry program has excellent case ascertainment, all registry programs likely miss occasional cancer cases. In the case of this study, an incident cancer patient could also have been classified as a cancer-free control subject due to failure to link with the Medicare beneficiary files, or due to moving into a SEER area shortly after disease diagnosis. For these reasons, we developed and presented the “SEER plus High Likelihood” gold standard, which followed a decision rule created initially by manual inspection of the claims histories for certain control subjects who seemed to have a high likelihood of having breast cancer. We were convinced that these subjects likely had incident breast cancer by the lack of prior claims suggesting prevalent disease, and by the multiple claims during the training year that consistently suggested an operation for breast cancer (surgical claims, pathology claims, anesthesiology claims, etc.). Since we did not have access to patient identifiers or charts, we could not confirm that these patients had breast cancer. However, Warren and colleagues (1999) have previously demonstrated that some cases identified by their algorithm using Medicare claims actually had breast cancer but failed to link to SEER when the linkage was conducted. In addition, the number of high-likelihood cases identified by our algorithm within the 5 percent control sample is very close to the number that one would expect given a 94 percent linkage rate between SEER and Medicare. For example, in 1994, a 6 percent failure to link would translate into 456 unlinked breast cancer cases. We would expect 5 percent of these (23 cases) to be found in the 5 percent control sample. We would further expect the high-likelihood definition to identify 75 percent (17 cases). In fact, the high-likelihood definition did identify 19 cases in the 5 percent control cohort that year (Table 2), very close to the expected number.
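The expected-count arithmetic in this paragraph can be retraced in a few lines. One step is an inference rather than a statement from the text: the figure of 456 unlinked cases matches 6 percent of the 7,607 validation-year breast cancer subjects, so that count is used as the starting point here. The variable names are hypothetical.

```python
validation_cases = 7607         # 1994 SEER-Medicare breast cancer subjects
link_failure_rate = 0.06        # share of SEER cases failing to link
sample_fraction = 0.05          # 5 percent Medicare control sample
high_likelihood_share = 0.75    # share the high-likelihood rule would identify

# Each step is rounded to whole cases, as in the text.
unlinked_cases = round(validation_cases * link_failure_rate)
in_control_sample = round(unlinked_cases * sample_fraction)
expected_identified = round(in_control_sample * high_likelihood_share)

print(unlinked_cases, in_control_sample, expected_identified)
```

The result of this chain, 17 expected cases, can be set against the 19 cases actually identified in the control cohort.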
As has been shown in a number of other disease areas, Medicare claims data offer unique advantages for cancer quality of care and health services research (Hewitt and Simone 1999; 2000; McNeil 2001). These data are essentially population-based, and minimize selection bias with respect to geographic region, urban versus rural location, and socioeconomic status. Each of these factors is an important predictor of cancer treatment, a fact that limits analyses of databases from more restricted populations (Nattinger et al. 1992; Guadagnoli et al. 1998; Gilligan et al. 2002). The possibility of using Medicare data more widely to assess patterns of cancer practice and related outcomes offers a potential that is worthy of further exploration.
Grant support from the Department of the Army (DAMD17-96-6262).
This study used the linked SEER-Medicare database. The interpretation and reporting of these data are the sole responsibility of the authors. The authors acknowledge the efforts of the Applied Research Program, NCI; the Office of Research, Development and Information, CMS; Information Management Services (IMS), Inc.; and the Surveillance, Epidemiology, and End Results (SEER) Program tumor registries in the creation of the SEER-Medicare database.