In this comparison of self-reported race/ethnicity to codes in the Medicare Enrollment Database, we found that all codings in the database were highly specific and had high positive and negative predictive power. The sensitivity of the Medicare EDB measure was high for White and Black respondents, consistent with prior findings of Arday et al. (2000). However, the EDB sensitivities were low (ranging from 39% to 60% by the most focused definition) for all other groups. In short, Whites and Blacks recorded in the Medicare EDB closely matched those identified by self-report, but Hispanics, Asians, Pacific Islanders, American Indians and Alaska Natives identified in the Medicare EDB constituted only subgroups of their corresponding self-reported populations.
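As a reminder of how these agreement measures relate to one another, the following minimal sketch computes sensitivity, specificity, and positive and negative predictive value for a single EDB code, treating self-report as the reference standard. The function name and all counts are hypothetical and are not taken from the study data; they are chosen only to show how a code can be highly specific and predictive while still missing half of the self-reported group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def agreement_measures(tp: int, fn: int, fp: int, tn: int) -> Agreement:
    """Agreement of an EDB code with self-report (the reference standard).

    tp: self-reported members who also carry the EDB code
    fn: self-reported members who lack the EDB code
    fp: people with the EDB code who do not self-report the group
    tn: people with neither the EDB code nor the self-report
    """
    return Agreement(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


# Hypothetical counts for one smaller group (illustration only).
example = agreement_measures(tp=50, fn=50, fp=5, tn=895)
print(example)
```

With these illustrative counts the sensitivity is 0.50 while specificity exceeds 0.99 and both predictive values exceed 0.90, the same qualitative pattern described above for the smaller groups.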
A number of historical factors may account for under-identification of these groups. Most importantly, Medicare relies primarily on Social Security Administration (SSA) records for identification, which were limited to categories of White, Black and Other until 1980, with only limited updates since then. Although the rule adopting the 1997 OMB racial/ethnic standard called for its implementation across federal agencies by 2002, the SSA was granted extensions by OMB through 2009. Beginning on January 1, 2010, SSA forms completed by new adult registrants and individuals requesting a replacement Social Security card use the 1997 OMB categories, but applications for Social Security numbers issued at birth, which are taken by state agencies, do not ask for racial/ethnic identity. It remains unclear whether and when CMS will start to receive and incorporate data in the new format (personal communication, Manuel de la Puente, Associate Commissioner, Office of Retirement and Disability Policy, Social Security Administration).
Furthermore, with changing social understandings of race and ethnicity, racial and ethnic identifications declared in the past may be inconsistent with current self-identification. Between the 1990 and 2000 decennial censuses, for example, the numbers of self-identified AIAN increased substantially within the same birth cohorts, likely due to a more positive image of identification with that group and the growing prosperity of some tribes (Appendix Table 4).
The structure of the categories used by the SSA also could create inconsistencies between Medicare EDB and self-reported race/ethnicity. The SSA system requires selection of a single category. The CAHPS items, consistent with current OMB standards, allow selection of multiple categories, and in particular, are designed to distinguish Hispanic ethnicity and to encourage Hispanic respondents to select a race as well as Hispanic ethnicity. Among self-reported Hispanics in CAHPS, 76% checked at least one race as well, and likewise 76% of AIAN and 57% of NHOPI selected multiple categories; it is unknown how many of these would have reported these categories if they had to choose only one.
Under-ascertainment of some groups in the EDB would be less concerning if the characteristics of members of a group who are so identified were equivalent to those of members who are not. In that case the main effect would be to reduce the sample size of the under-ascertained group and classify the unrecognized members into a larger group (notably, Hispanics, AIAN, or NHOPI are often classified as White) on whose estimates they would have little statistical impact. We found, however, that there were substantial differences in geographic, demographic, health, and health care variables. Members of each group identified in the Medicare EDB were more likely to select the group as a single category and to live in areas with higher concentrations of the group (at both regional and local levels), as well as to have less education, to be Medicaid dual-eligible, and to live in lower-income areas, all compared to those not identified in the EDB. These differences indicate that the EDB-identified populations have a stronger group identification and greater socioeconomic disadvantage. The striking differences in geographic distribution for AIAN might be attributed to the major contribution to their identification in the EDB from enrollment with the Indian Health Service, which is most active on Indian reservations in the western states. Swan et al. (2006) report findings parallel to ours in comparisons of American Indians reporting a single versus multiple racial/ethnic identifications. Similarly, EDB-identified group members tended to report worse health. The main exception was the prevalence of a history of cancer; because most reports of cancer were obtained from cancer survivors, this paradoxical finding might reflect better ascertainment and survival in the non-EDB subgroups due to superior screening and treatment. Access to health care, as measured by CAHPS items, also appeared to be superior for most of the non-EDB subgroups.
Because of these differences, for example, estimates of White-Hispanic differences in health or health care experiences relying on racial/ethnic identifications in the Medicare EDB reflect the experiences of a select subgroup of Hispanics but are not representative of all those who currently self-identify as Hispanic. This might be regarded as more or less problematic depending on the objectives of the analysis, since the boundaries of racial/ethnic subgroups are not immutable or unambiguous. Although we do not know exactly what historical or social factors determined classification as Hispanic in the EDB, we have no reason to think it was a consequence of health status at the time of registration. Therefore, we might regard the group so identified as an appropriate subject for an analysis of social disparities in health or health care, in a way that would not be appropriate for a group identified through its use of health care services (such as Hispanics served at a clinic). On the other hand, a disparity finding for EDB-identified Hispanics would be less generalizable and possibly less actionable than one that is more broadly representative of those who self-identify as Hispanic since the latter group is more readily identifiable in the community and in health care settings (although not within Medicare itself) and is recognizable as a social and political entity.
The differences we found by EDB identification within groups are challenging for methods that attempt to recover means or proportions for self-reported groups by backing out the relationship between self-report and EDB coding. Escarce and McGuire (2003) proposed such a method, using the matrix cross-classifying the two race/ethnicity codings reported by Arday et al. (2000). Their method assumed that each group defined by self-report was homogeneous with respect to the variable under study. Regarding each EDB-based group as a mixture in known proportions of the self-report groups, equations can be written expressing means for the former in terms of means for the latter, and solved to estimate self-report means from the more readily observable EDB-based means. However, the rationale for this method breaks down when, as we found for many variables, means differ by EDB category within each self-report group. Indeed, in that case the proposed method could increase the bias of estimates for some groups, relative to taking EDB groups at face value, rather than reducing it.
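The mixture argument can be made concrete with a deliberately simplified two-group sketch. It is not the published procedure of Escarce and McGuire (2003); the cross-classification counts, the outcome proportions, and the helper names (`solve_2x2`, `edb_means`) are all hypothetical. Each EDB group mean is written as a weighted average of self-report group means, and the resulting linear system is solved for the self-report means, first when the homogeneity assumption holds and then when EDB-identified Hispanics differ from those not identified.

```python
def solve_2x2(p, m):
    """Solve the 2x2 linear system p * mu = m for mu by Cramer's rule."""
    (a, b), (c, d) = p
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [(m[0] * d - b * m[1]) / det, (a * m[1] - c * m[0]) / det]


# Hypothetical cross-classification counts (illustration only).
# Rows: EDB code (Hispanic, White); columns: self-report (Hispanic, White).
counts = [[50, 5], [50, 895]]

# Share of each EDB group that belongs to each self-report group.
mix = [[n / sum(row) for n in row] for row in counts]


def edb_means(cell_means):
    """Observed EDB-group means implied by the mean in each EDB x self-report cell."""
    return [
        sum(share * mu for share, mu in zip(share_row, mean_row))
        for share_row, mean_row in zip(mix, cell_means)
    ]


# Case 1: homogeneity holds. Self-reported Hispanics have the same outcome
# proportion (0.40) whether or not the EDB identifies them; Whites have 0.20.
homogeneous = [[0.40, 0.20], [0.40, 0.20]]
est_hom = solve_2x2(mix, edb_means(homogeneous))

# Case 2: homogeneity fails. EDB-identified Hispanics have 0.50 and the others
# 0.30, so the true self-report Hispanic proportion is still 0.40 (50 of each).
heterogeneous = [[0.50, 0.20], [0.30, 0.20]]
obs_het = edb_means(heterogeneous)
est_het = solve_2x2(mix, obs_het)

true_hispanic = 0.40
bias_face_value = obs_het[0] - true_hispanic  # EDB Hispanic mean taken as is
bias_corrected = est_het[0] - true_hispanic   # after the mixture "correction"

print("homogeneous estimate:", est_hom)
print("face-value bias:", bias_face_value, "corrected bias:", bias_corrected)
```

In this constructed example the solved estimate recovers the true values exactly under homogeneity, but under heterogeneity it lies farther from the true self-report proportion than the unadjusted EDB mean does, which illustrates how the adjustment can increase rather than reduce bias.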
Researchers have made considerable progress over recent years on methods for assigning racial/ethnic categories to individuals, or more precisely estimating probabilities that an individual would report each of the possible racial/ethnic categories, using information such as surnames (Morgan, Wei, and Virnig 2004; Wei et al. 2006), given names, or racial/ethnic composition of the area of residence. Each of these racial/ethnic correlates contributes distinctive information and is most informative for identifying particular groups, so the results may be most useful or accurate when several variables can be combined (Elliott et al. 2008; Elliott et al. 2009; Fiscella and Fremont 2006). We anticipate (based on Eicheldinger and Bonito (2008) and our own exploratory analyses) that including EDB race/ethnicity codes as an additional predictor will further contribute to the reliability of these “indirect estimation” methods. In related work, the National Center for Health Statistics (NCHS) used multivariate Bayesian models to bridge racial/ethnic identification in their surveys from the older (1977) to the new (1997) OMB-mandated system by imputing a primary race for those reporting multiple races (Schenker and Parker 2003). However, regression analyses proposed for use with such estimated probabilities (McCaffrey and Elliott 2008) similarly rely on a homogeneity assumption, namely that within each group the outcome under study is unrelated to the probabilities assigned by the model. This assumption could fail, for example, if Hispanics with Spanish surnames or living in areas with high proportions of Hispanics differ systematically from those with non-Spanish surnames or in non-Hispanic neighborhoods. While validation studies of indirect estimation have had promising results, we recommend continued monitoring of the validity of the underlying assumptions.
Nonetheless, we believe self-report is the preferred method of racial/ethnic identification for studies of health and health care disparities, as underscored by the 2009 Institute of Medicine report Race, Ethnicity and Language Data: Standardization for Health Care Quality Improvement (Ulmer, McFadden, and Nerenz 2009). This approach does not rely on the particular features of any administrative system, although the sensitivity of survey responses to details of item wording, response options, or context is also of concern (Bates et al. 1995; Martin, de la Puente, and Bennett 2001). For studies in which self-report is unavailable, other information sources or indirect estimation methods should be used, while recognizing that the groups identified in this way may not be consistent with those identified by self-report. Finally, recognition of the fluidity (different reporting over time or in different contexts) of race/ethnicity in no way detracts from the reality of disparities in health and health care and the importance of detecting and addressing them.
Our study has a number of limitations. Nonresponse to the CAHPS survey could bias our results, although we speculate that this might have greater effects on marginal distributions of race/ethnicity than on associations between variables. While we weighted our data to match population distributions of EDB race and numerous individual and area characteristics, we were unable to calibrate CAHPS racial identification against a strictly comparable census self-reported race distribution due to differences in response options (inclusion of an “Other race” category) in the American Community Survey (ACS) and census. Furthermore, the most recent data tabulated by ZCTA are from the 2000 Census, but these geocoded measures are probably stable enough over 10 years to give useful information on differences in area characteristics.
In summary, our findings highlight the ambiguity and fluidity of racial/ethnic identification. Research on racial identity shows how self-reported race/ethnicity can change even for the most reliably distinguished groups, Blacks and Whites, with changing social or family circumstances (Lieberson and Waters 1993) or even in a retest after a few months' time (Elwert and Christakis 2006). This fluidity may be even greater for some of the smaller groups within the Medicare population. In particular, Hispanic identity may be variously regarded as an alternative to or distinct categorization from race (Morning 2005), an ambiguity perhaps reflected in the selection of “Other race” by many Hispanics (37% in the 2010 census), an option not offered in the CAHPS survey. Furthermore, identification with multiple races is gradually increasing, from 2.4% in 2000 to 2.9% in 2010 (Humes, Jones, and Ramirez 2011). Meanwhile, the cohorts born in the United States since 1989 were registered with Social Security at birth, a process that provides no racial identification because racial data on birth certificates are not transmitted to the SSA (McBean 2006). Thus the complexities of racial and ethnic identification in the Medicare population can only be expected to grow as a more diverse population ages into Medicare.