Using a unique database constructed independently in Michigan and Ohio to study cancer-related disparities, we evaluated the ability of the SBI variable in the Medicare denominator file to identify beneficiaries dually enrolled in the Medicare and Medicaid programs. The study considered the Medicaid enrollment file as the gold standard because Medicaid pays medical expenses for those enrolled and Medicare relies on Medicaid to report enrollment.
Overall, sensitivity was low for both states, but lower in Michigan than in Ohio. Low sensitivity implies that the SBI variable in the denominator file fails to identify a beneficiary as a dual, when in fact s/he is a dual according to the Medicaid source. Additionally, we observed substantial variation in sensitivity by patient demographics, as well as by cancer site, both within and between the states.
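As an illustration of the two measures discussed here, the following minimal Python sketch computes sensitivity and PPV for the SBI flag, treating the Medicaid enrollment file as the gold standard. It is not the code used in the study; the function name and the beneficiary identifiers are hypothetical and the data are made up.

```python
def sensitivity_and_ppv(sbi_dual_ids, medicaid_dual_ids):
    """Compare SBI-flagged duals with Medicaid-enrolled duals (gold standard)."""
    sbi = set(sbi_dual_ids)
    gold = set(medicaid_dual_ids)

    true_pos = len(sbi & gold)    # flagged by SBI and enrolled in Medicaid
    false_neg = len(gold - sbi)   # enrolled in Medicaid but missed by SBI
    false_pos = len(sbi - gold)   # flagged by SBI but not in the Medicaid file

    # Sensitivity: share of true duals that the SBI variable identifies.
    sensitivity = true_pos / (true_pos + false_neg) if gold else float("nan")
    # PPV: share of SBI-flagged beneficiaries who are true duals.
    ppv = true_pos / (true_pos + false_pos) if sbi else float("nan")
    return sensitivity, ppv


# Made-up identifiers, for illustration only.
medicaid = {"A", "B", "C", "D", "E"}
sbi = {"A", "B", "F"}
sens, ppv = sensitivity_and_ppv(sbi, medicaid)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In this toy example the SBI flag misses three of the five true duals (low sensitivity), while two of the three flagged beneficiaries are true duals.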
We can only speculate about the reasons for the discrepancies between the states. First, Michigan was unable to locate a number of Medicare beneficiaries because its CMS denominator file was restricted to Michigan residents. Roughly half of the cases missing from the Michigan denominator file were confirmed as CMS recipients with a different recorded state of residence. Ohio, on the other hand, had Medicare data on Ohio cancer patients regardless of residence.
Second, the cancer registry records used to link with the other files encompassed all anatomic cancer sites in Michigan but were limited to breast, prostate, and colorectal cancer cases in Ohio. This difference may have yielded a greater match rate in Michigan than in Ohio. Finally, disagreement as to Medicaid eligibility can arise from poor communication and reporting errors between the Medicare and Medicaid systems.
We obtained higher PPV in Michigan than in Ohio, meaning that nearly all patients identified as duals from the Michigan Medicare denominator file were in fact duals according to the Medicaid enrollment files. On the other hand, data from Ohio indicated that 10 percent or more of those identified as duals through the SBI indicator were false positives, that is, not identified as duals through the Medicaid files. The higher PPV in Michigan is likely due to the use of a different linkage algorithm, as well as to augmenting the standard CMS social security match with an administrative file of linked Medicaid to Medicare information developed by CMS. This file was developed by CMS using date of birth, sex, beneficiary identification code, and claim account number along with SSN, resulting in a much more comprehensive file linkage than that achieved when using SSN alone. This step contributed 4 percent of all Medicaid and Medicare linked recipients in the Michigan study file. Had Ohio used similar strategies, perhaps the PPV would have been similar to that obtained in Michigan.
The linking strategy employed in Michigan differs from that of Ohio in some respects. The linking algorithm in Ohio was deterministic in nature and employed four steps accounting for SSN, first and last name, and date of birth (month and year), as detailed elsewhere (Koroukian 2008). Sex was added to the matching criteria only in the case of colorectal cancer. For the Michigan database, both probabilistic and deterministic approaches were used (Bradley et al. 2007b). Cases identified through probabilistic matching but not through deterministic matching were resolved through manual review. This difference in matching criteria, together with the manual review of cases identified through probabilistic but not deterministic matching, may have been responsible for the superior performance of the linking strategy in Michigan as compared with that of Ohio, although this remains to be confirmed in future studies testing various algorithms.
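To make the idea of a stepwise deterministic linkage concrete, the sketch below tries a sequence of match keys from strictest to loosest and accepts only unambiguous matches. It is a minimal illustration under stated assumptions, not the Ohio or Michigan algorithm: the text above names the variables used (SSN, first and last name, month and year of birth) but not the composition of each step, so the four key combinations in `STEPS`, the field names, and the sample records are all hypothetical.

```python
# Hypothetical match keys, tried from strictest to loosest.
# The actual four steps are described in Koroukian (2008), not here.
STEPS = [
    ("ssn", "last", "first", "dob_ym"),
    ("ssn", "last", "dob_ym"),
    ("ssn", "dob_ym"),
    ("last", "first", "dob_ym"),
]


def make_key(record, fields):
    """Build a normalized key; return None if any field is missing."""
    values = tuple(str(record.get(f) or "").strip().upper() for f in fields)
    return values if all(values) else None


def link(registry, enrollment):
    """Return {registry_id: (enrollment_id, step_number)} for unique matches."""
    links = {}
    for step_no, fields in enumerate(STEPS, start=1):
        # Index enrollment records by the key used at this step.
        index = {}
        for rec in enrollment:
            k = make_key(rec, fields)
            if k is not None:
                index.setdefault(k, []).append(rec["id"])
        for rec in registry:
            if rec["id"] in links:
                continue  # already linked at a stricter step
            k = make_key(rec, fields)
            candidates = index.get(k, []) if k is not None else []
            if len(candidates) == 1:  # accept only unambiguous matches
                links[rec["id"]] = (candidates[0], step_no)
    return links


# Made-up records, for illustration only.
registry = [
    {"id": "R1", "ssn": "111", "last": "Doe", "first": "Ann", "dob_ym": "1930-05"},
    {"id": "R2", "ssn": "", "last": "Roe", "first": "Bob", "dob_ym": "1928-11"},
    {"id": "R3", "ssn": "333", "last": "Poe", "first": "Cy", "dob_ym": "1935-01"},
]
enrollment = [
    {"id": "E1", "ssn": "111", "last": "DOE", "first": "ANN", "dob_ym": "1930-05"},
    {"id": "E2", "ssn": "222", "last": "Roe", "first": "Bob", "dob_ym": "1928-11"},
]
links = link(registry, enrollment)
print(links)
```

A probabilistic approach would instead score partial agreement across fields and send borderline pairs to manual review, which this sketch does not attempt.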
With regard to the length of enrollment in Medicaid, for patients identified through both sources, the two sources agreed that, on average, patients were enrolled in Medicaid for approximately 8 months in the year before or after cancer diagnosis. However, the two sources agreed less frequently for patients with shorter lengths of enrollment in Medicaid. The rate of agreement was limited, barely reaching 30 percent among beneficiaries enrolled in Medicaid for less than 6 months, and ranging from 60 to 70 percent among those enrolled for a period of 7–12 months.
To our knowledge, this is the first study to assess the ability of the SBI variable in the Medicare denominator file to identify dual beneficiaries by examining both Medicare and Medicaid sources of data. It is also the first to compare measures across similar databases constructed independently in two states. The above discussion of idiosyncrasies in the linking strategies is informative, although it does not definitively explain how the divergence in linking strategy between Ohio and Michigan contributed to the observed differences across the states. This study would undoubtedly have been more informative had the databases in the two states been developed following the same linking strategy. However, the national trend has been for individual investigators to link databases within their home states, and, absent a consensus and a uniform methodology to link databases, idiosyncrasies are bound to occur.
Sensitivity and PPV varied considerably between the two states by patient demographics, often with trends in opposite directions. For example, while sensitivity in Ohio increased in older patients, it decreased with older age in Michigan. These unexplained trends further add to the uncertainty as to the biases to be expected when identifying duals through the SBI variable in the denominator file versus a given state's Medicaid enrollment files. A researcher planning a similar study in another state may see altogether different trends.
The findings from this study highlight the need for improvements in information management across Medicare and Medicaid to obtain a greater level of agreement between the two programs. It would also be extremely helpful if CMS included variables other than the beneficiary SSN in its matching algorithm. CMS uses a more extensive list of variables (e.g., sex, date of birth) when it matches its data to sources from other government agencies (e.g., the National Cancer Institute sponsored SEER-Medicare match), but when performing linkages for independent researchers, only SSN and gender are used.
The addition of the State-Reported Dual Eligibility Status code, a variable in the Part D enrollment file, may improve the identification of duals using the denominator files alone (Research Data Assistance Center (ResDAC) website, 2009). The Part D enrollment file will also include the various dual eligibility categories (e.g., SLMB, QMB), which can be very useful when investigating questions related to access to care. This constitutes a substantial improvement over the SBI variable, which fails to distinguish between these varying levels of coverage (Barosso 2006). Depending on states and study periods, Medicaid enrollment files may include these eligibility categories. In this study, for example, these categories could be identified in the Michigan Medicaid files, but not in the Ohio Medicaid files.
Accurate identification of duals through the Medicare denominator file is crucial to the valid comparison of health services use and outcomes between duals and nonduals. A high proportion of false negatives would bias results toward the null because outcomes in nonduals would be unfavorably influenced by the preponderance of unidentified duals within the sample. In turn, falsely underestimating the extent to which disparities between duals and nonduals exist would undermine the urgency with which the special needs of this vulnerable group of elders should be identified and addressed. It is left to future research to assess the adequacy of Part D data to identify duals.
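The bias toward the null described above can be illustrated with simple arithmetic. The following Python sketch uses entirely hypothetical numbers (group sizes, outcome rates, and a sensitivity of 0.5) and assumes, for simplicity, that every beneficiary flagged as a dual is a true dual and that misclassification is unrelated to the outcome. Under these assumptions, duals missed by the SBI variable are counted among the nonduals and pull the nondual outcome rate toward the dual rate.

```python
def observed_gap(n_duals, n_nonduals, rate_duals, rate_nonduals, sensitivity):
    """Observed difference in outcome rates when some duals are missed.

    Assumes no false positives: missed duals are simply mixed into the
    comparison group labeled as nonduals.
    """
    missed = n_duals * (1 - sensitivity)
    # Outcome rate in the group labeled "nondual" (true nonduals + missed duals).
    mixed_rate = (n_nonduals * rate_nonduals + missed * rate_duals) / (
        n_nonduals + missed
    )
    return rate_duals - mixed_rate


# Hypothetical inputs: 1,000 duals with a 30% adverse-outcome rate and
# 4,000 nonduals with a 20% rate.
true_gap = observed_gap(1000, 4000, 0.30, 0.20, sensitivity=1.0)
biased_gap = observed_gap(1000, 4000, 0.30, 0.20, sensitivity=0.5)
print(f"true gap={true_gap:.3f}, observed gap={biased_gap:.3f}")
```

With perfect sensitivity the observed gap equals the true 10-percentage-point difference; with half of the duals misclassified as nonduals, the observed gap shrinks to roughly 9 percentage points.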
A uniform approach to linking strategies is crucial, given the utility of enhanced databases combining multiple sources of data to conduct in-depth studies on disparities. While recent research with such databases focuses on cancer-related disparities, a similar approach can be used to study outcomes for other clinical conditions and to evaluate other aspects of health care delivery and financing, such as cost of care. The potential utility of such databases should provide the impetus for the research community to agree on a linking strategy that could be used in a uniform fashion.
We note that the Medicare denominator file was linked with Medicaid analytic extract (MAX) data for the first time in 1999, leading to the creation of an enhanced MAX eligibility file that incorporates Medicare enrollment data for individuals identified in both datasets (Baugh 2004). While useful, such a data source presents important limitations when data for controls (or nonduals) are not readily available. In addition, the feasibility of linking such a data source to an external one, such as a cancer registry, is unknown.
In conclusion, the use of the SBI variable for identifying duals is limited. The somewhat low sensitivity and the varying PPV between the two states call for improvements in the management of information across the Medicare and Medicaid programs and for uniformity in strategies of linking databases. At present, the identification of beneficiaries dually eligible for Medicare and Medicaid must continue to rely on state-by-state matching algorithms, which restricts research related to duals and limits the generalizability of the research that is conducted.