There are at least four methodological issues to consider before using federal data to assess racial or ethnic disparities in health care: (1) the validity of the classification of individuals' race and ethnicity, (2) sample size limitations, (3) the smallest analyzable geographic or institutional unit, and (4) the availability of data on other cultural or socioeconomic characteristics of the individuals that may be important mediators of health disparities.
Researchers have examined the validity of race and ethnicity assignment in the Medicare EDB, using self-report in the MCBS as the gold standard (Arday et al. 2000). The specificity of minority classification within the enrollment database is very high, and negative predictive values range from 96 percent among Hispanics to 100 percent among Asians/Pacific Islanders. The sensitivity of the EDB for identifying blacks is also high (95 percent), but it is much lower for identifying Hispanics (39 percent), Asians/Pacific Islanders (58 percent), and American Indians (11 percent). Positive predictive values for the Medicare enrollment data range from 96 percent for blacks to as low as 78 percent for American Indians, suggesting that caution is needed when using Medicare datasets to evaluate care for nonblack minority populations. Similar studies comparing Veterans Health Administration (VHA) data with patient self-report found 98 percent agreement for whites, 92 percent for blacks, 76 percent for Asians, 83 percent for Hispanics, and only 23 percent for Native Americans. Even after excluding the 36 percent of VHA patients with missing race information, the classification of nonblack minority populations remains suboptimal (Kressin et al. 2003).
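The validity measures above all derive from a two-by-two cross-classification of database race codes against self-report. The minimal Python sketch below shows how each measure is computed; the counts are hypothetical and are not taken from Arday et al. (2000). They illustrate how a rare group can show very high specificity and negative predictive value even when sensitivity is poor, because most individuals are correctly classified as not belonging to the group.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from cross-classifying database race codes against self-report.

    tp: coded as the group in the database and self-identified as the group
    fp: coded as the group but self-identified as something else
    fn: self-identified as the group but coded as something else
    tn: neither coded nor self-identified as the group
    """
    tp: int
    fp: int
    fn: int
    tn: int

    def sensitivity(self) -> float:
        # Share of true (self-identified) group members the database flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-members the database correctly leaves unflagged.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of database-flagged individuals who truly belong to the group.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of unflagged individuals who truly do not belong to the group.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for a small minority group (not published data):
# 100 self-identified members among 10,000 people, only 40 flagged by the database.
table = TwoByTwo(tp=40, fp=5, fn=60, tn=9895)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(table, name)():.3f}")
```

With these counts, sensitivity is 0.40 while specificity and negative predictive value both exceed 0.99, echoing the combination of high specificity and low sensitivity reported for several nonblack groups.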
Misclassification of nonblack minority individuals may bias estimates of health status or mortality, and correcting or adjusting for it can reduce that bias. For example, using tribal documentation, the Indian Health Service (IHS) and the National Center for Health Statistics noted that the proportion of American Indians misidentified in the National Death Index ranged from 1 percent in Arizona to 30 percent in California. Using corrected estimates, the IHS has produced adjusted disease-specific mortality rates for American Indians in its most recent report (Indian Health Service 1999). The IHS database might also be used to improve the low sensitivity of the Medicare EDB for identifying Native Americans, and the classification of Hispanic individuals can be refined by surname analysis (Morgan, Wei, and Virnig 2004).
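The kind of correction described here can be sketched as a simple inflation of observed death counts. The Python example below assumes that misclassification runs only one way (true American Indian decedents recorded as another race), that the misclassified share is known from record linkage, and that the population denominator is unaffected. The function names and numbers are hypothetical, and the sketch is not the IHS adjustment method itself.

```python
def corrected_deaths(observed_deaths: float, misclassified_share: float) -> float:
    """Inflate observed deaths for a known share of undercounted deaths.

    Assumes misclassification only runs one way (true group members recorded
    as another race) and that `misclassified_share` is the proportion of true
    group deaths missing from the observed count.
    """
    if not 0 <= misclassified_share < 1:
        raise ValueError("misclassified_share must be in [0, 1)")
    return observed_deaths / (1 - misclassified_share)


def rate_per_100k(deaths: float, population: float) -> float:
    """Crude death rate per 100,000 population."""
    return deaths / population * 100_000


# Hypothetical state: 140 recorded deaths among 200,000 residents, with linkage
# to tribal enrollment records suggesting 30 percent of deaths were misidentified.
observed = 140
adjusted = corrected_deaths(observed, 0.30)
print(f"crude rate:    {rate_per_100k(observed, 200_000):.1f}")
print(f"adjusted rate: {rate_per_100k(adjusted, 200_000):.1f}")
```

Under these assumptions, a 30 percent undercount raises the hypothetical rate from 70 to 100 deaths per 100,000.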
Medicare health plans are uniquely positioned to assess the racial and ethnic distribution of their enrolled populations by comparing estimates from the Medicare EDB with estimates based on self-reported race and ethnicity from surveys such as the Medicare CAHPS. For a given health plan, if the CAHPS data produce an unbiased estimate of the prevalence of each racial and ethnic group, and these estimates are comparable to those in the Medicare EDB, then health plan analysts could rely on the EDB to supplement race and ethnicity data on Medicare plan members. Otherwise, plans may need to request race and ethnicity information at enrollment or to survey their members. For commercial enrollees, no equivalent of the Medicare EDB exists.
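One way a plan analyst might operationalize this comparison is to check whether the EDB prevalence for a group falls within a confidence interval around the survey estimate. The Python sketch below uses a Wilson score interval and, for simplicity, ignores survey weights, design effects, and nonresponse, all of which a real Medicare CAHPS analysis would need to address. The function names and counts are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an unweighted binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def edb_consistent_with_survey(edb_share: float, survey_yes: int, survey_n: int) -> bool:
    """True if the EDB prevalence lies inside the survey's 95 percent interval."""
    low, high = wilson_interval(survey_yes, survey_n)
    return low <= edb_share <= high


# Hypothetical plan: 72 of 600 CAHPS respondents self-identify as Hispanic.
low, high = wilson_interval(72, 600)
print(f"survey 95% interval: {low:.3f} to {high:.3f}")
print("EDB 5% consistent? ", edb_consistent_with_survey(0.05, 72, 600))
print("EDB 11% consistent?", edb_consistent_with_survey(0.11, 72, 600))
```

In this hypothetical plan, an EDB prevalence of 5 percent falls outside the interval around the 12 percent survey estimate and would argue against relying on the EDB alone, whereas an EDB prevalence of 11 percent would not.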
Obtaining sample sizes adequate to reliably estimate the use of services or health status of minority populations is a second important consideration. The large numbers of individuals included in federal databases enable statistically precise estimates of many health measures. Nevertheless, the relatively small numbers of nonblack minority individuals make it difficult to measure health care delivery and outcomes for these groups with precision. Most nonblack minority groups are also clustered within specific geographic regions of the country. For example, while American Indians make up less than 1 percent of the population in every county in Ohio, their share ranges from 1 percent to as high as 90 percent across counties in New Mexico (United States 2005a). The usefulness of specific federal databases to local health care leaders will therefore depend on the racial and ethnic composition of the local population. Fortunately, the county-level prevalence of minority populations is readily available through maps on the U.S. Census Bureau website.
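The precision problem can be made concrete with the usual normal-approximation (Wald) formula for a 95 percent confidence interval around a proportion, p plus or minus 1.96 × sqrt(p(1 - p)/n). The Python sketch below uses a hypothetical 30 percent service-use rate: the interval is roughly ±1 percentage point for 10,000 individuals but about ±4 points for 500 and ±13 points for 50. The helper names are illustrative, and for the smallest groups an exact or Wilson interval would be preferable to the Wald approximation.

```python
import math


def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a Wald 95 percent interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def required_n(p: float, half_width: float, z: float = 1.96) -> int:
    """Smallest n giving a Wald interval no wider than +/- half_width."""
    return math.ceil(z**2 * p * (1 - p) / half_width**2)


# Hypothetical: a service used by 30 percent of each group, at three group sizes.
for group, n in [("large group", 10_000), ("mid-sized group", 500), ("small group", 50)]:
    print(f"{group:>16} (n={n:>6}): +/-{100 * ci_half_width(0.30, n):.1f} percentage points")

print("n needed for +/-5 points:", required_n(0.30, 0.05))
```

Because the half-width shrinks only with the square root of n, halving the interval width for a small group requires roughly four times as many individuals.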
Figure 1 Geographic Distribution of the American Indian Population within New Mexico and Ohio, 2000 Census. Minority populations are geographically clustered within the United States, as indicated by the high proportion of American Indians residing in New Mexico.
Health care managers using federal data sources to assess local disparities will also want to know the smallest analyzable geographic or organizational unit available, such as the state, city, or individual hospital or clinic. Most federal datasets can provide summary information at the state level, and many can do so at the county or zip code level. Administrative datasets are more likely than survey datasets to achieve adequate sample sizes within smaller geographic units. Very few databases provide detailed information at the facility (hospital or clinic) level. The Dartmouth Atlas, which is based on Medicare administrative and claims data, has been used to analyze racial disparities in care across local hospital referral regions (Baicker et al. 2004). Additional sources of facility-level data include the Medicare End Stage Renal Disease program and the AHRQ Healthcare Cost and Utilization Project database. Although such information is not routinely available, the Medicare CAHPS, the MCBS, and the Medicare HEDIS program may be able to provide information on specific health plans; whether this information is actionable will depend on the plan's total enrollment and the prevalence of minority members within it.
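When analysis moves down to the facility or health plan level, small cells quickly become a problem. The Python sketch below aggregates hypothetical patient-level records into rates for each (unit, group) cell and suppresses any rate whose denominator falls below a minimum. The threshold of 30, the function name, and the record layout are assumptions for illustration, not a standard drawn from any of the databases named here.

```python
from collections import defaultdict
from typing import Iterable, Optional

MIN_DENOMINATOR = 30  # hypothetical minimum cell size for reporting a rate


def rates_by_unit(
    records: Iterable[tuple[str, str, bool]], min_n: int = MIN_DENOMINATOR
) -> dict[tuple[str, str], Optional[float]]:
    """Return outcome rates per (unit, group); None where the cell is too small."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for unit, group, outcome in records:
        cell = counts[(unit, group)]
        cell[0] += int(outcome)  # numerator: patients with the outcome
        cell[1] += 1             # denominator: all patients in the cell
    return {
        key: (num / den if den >= min_n else None)
        for key, (num, den) in counts.items()
    }


# Hypothetical records: (facility, self-reported group, received recommended care).
records = (
    [("A", "white", i < 80) for i in range(100)]
    + [("A", "black", i < 24) for i in range(40)]
    + [("B", "American Indian", i < 3) for i in range(5)]
)
for key, rate in rates_by_unit(records).items():
    print(key, "suppressed" if rate is None else f"{rate:.2f}")
```

Here the rates for hospital A are reported, while the five American Indian patients at hospital B yield a suppressed cell, which is the practical consequence of a low minority prevalence within a single facility or plan.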
Measures of socioeconomic position (SEP), such as income or educational attainment, are useful for understanding mediators of racial and ethnic disparities (Braveman et al. 2001). The primary language spoken by an individual is also increasingly recognized as an important determinant of care (Jacobs et al. 2004). The collection of both language and SEP variables varies substantially among federal databases. Surveys conducted by the CDC can be administered in either English or Spanish, and many federal surveys include extensive SEP data, but most federal administrative and claims databases contain varying amounts of SEP information and no information on spoken language. Although geographic locators such as zip code within administrative or claims databases can be used to estimate SEP through geocoding, these methods are less precise than individual-level data (Krieger, Williams, and Moss 1997).
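Geocoding-based SEP estimation amounts to attaching an area-level measure to each individual record through its geographic locator. The minimal Python sketch below assumes a hypothetical lookup table of ZIP-level poverty shares; in practice, census data are tabulated for ZIP Code Tabulation Areas, which only approximate postal ZIP codes. The function name, field names, and values are illustrative. Every person in the same ZIP code receives the same value, which is why such measures are less precise than individual-level data.

```python
# Hypothetical area-level SEP table keyed by 5-digit ZIP code: the share of
# residents below the poverty line. Values are invented for illustration.
ZIP_POVERTY_SHARE = {
    "87001": 0.31,
    "87002": 0.12,
    "44101": 0.24,
}


def attach_area_sep(claims: list[dict], lookup: dict[str, float]) -> list[dict]:
    """Return copies of claim records with a ZIP-level poverty share (or None)."""
    enriched_records = []
    for claim in claims:
        # Truncate ZIP+4 codes such as "87001-1234" to the 5-digit ZIP.
        zip5 = str(claim.get("zip", ""))[:5]
        enriched_records.append({**claim, "zip_poverty_share": lookup.get(zip5)})
    return enriched_records


claims = [
    {"id": 1, "zip": "87001-1234"},
    {"id": 2, "zip": "87001"},
    {"id": 3, "zip": "99999"},  # not in the lookup table
]
enriched = attach_area_sep(claims, ZIP_POVERTY_SHARE)
for record in enriched:
    print(record)
```

Records 1 and 2 receive identical area-level values regardless of their own circumstances, and record 3 receives none, illustrating both the loss of individual variation and the problem of unmatched locators.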
Federal databases currently lack data on other potentially important determinants of health care, such as language proficiency, health literacy, and immigrant status (Kandula, Kersey, and Lurie 2004). For example, disparities in health care within a specific racial group such as blacks may depend not only on immigrant status but also on the precise country of origin (Lucas, Barr-Anderson, and Kington 2003; Read, Emerson, and Tarlov 2005). These subtle differences will prove important to health care organizations attempting to address disparities at the local level. For the foreseeable future, such data must be collected locally.