To reduce racial and ethnic disparities in health care, managers, policy makers, and researchers need valid and reliable data on the race and ethnicity of individuals and populations. The federal government is one of the most important sources of such data. In this paper we review the strengths and weaknesses of federal data that pertain to racial and ethnic disparities in health care. We describe recent developments that are likely to influence how these data can be used in the future and discuss how local programs could make use of these data.
Racial and ethnic disparities in both the use of health care services and the quality of care within the United States have been well documented (Gornick et al. 1996; Institute of Medicine 2002; Virnig et al. 2002; Weech-Maldonado et al. 2003; Haas et al. 2004; Fremont et al. 2005). The recent National Healthcare Disparities Report (NHDR) and other studies summarize these disparities on a national level (Schneider, Zaslavsky, and Epstein 2002; AHRQ 2005; Centers for Disease Control and Prevention 2005; Felix-Aaron et al. 2005). Collectively, these results have spurred national efforts to eliminate racial differences in care by the year 2010 (United States Department of Health and Human Services 2005b).
Much of what we know about racial and ethnic disparities has been derived from national population samples, yet many of the actions to eliminate disparities must occur in the local collaborative efforts of regional health care organizations, communities, health care institutions, and providers (Nerenz 2005). Local groups need data to guide interventions that address locally relevant disparities. Some efforts are already under way. In 2001, Aetna expanded efforts for the collection of race and ethnicity data on members, and used these data to target initiatives like the African American Preterm Labor Prevention and Breastfeeding Program (Hassett 2005).
Even as local projects are implemented, the federal government will remain a major source of data for monitoring racial and ethnic health care disparities (Lurie, Jung, and Lavizzo-Mourey 2005). In this paper we summarize the key features of selected federal health care databases. We examine their strengths and weaknesses, as well as trends that may affect their future use. Finally, we propose a model that would enable local initiatives on health care disparities to make use of federal health data.
A variety of federal programs collect data on race and ethnicity and health care (Committee on National Statistics 2004). Two federal data resources, the Social Security Administration (SSA) and the United States decennial census (United States Census Bureau 2005b), contain limited information on health care processes or outcomes, but supply the race and ethnicity data to federal databases used in analyses of health disparities. The SSA provides race and ethnicity data to other federal agencies including the Centers for Medicare and Medicaid Services (CMS). The U.S. census offers detailed population estimates and a detailed set of socioeconomic factors including race and ethnicity. These data are readily available through the U.S. census website (United States Census Bureau 2005b) and can be linked to multiple geographic levels (e.g., zip code or census block), permitting detailed geocoding analyses involving race and ethnicity (Fiscella and Franks 2001; Fremont et al. 2005; Krieger et al. 2005).
The federal agencies that offer health-related data can be divided into those that purchase or deliver health care and those that monitor health care (Table 1). Of the former, CMS accounts for the majority of databases (Centers for Medicare and Medicaid Services 2005b). The latter group includes the Centers for Disease Control and Prevention, the National Institutes of Health, and the Agency for Healthcare Research and Quality (AHRQ). These federal databases vary in the racial and ethnic composition of the populations assessed, the allowance for multiple race designation, whether ethnicity is collected independent of race, how race and ethnicity are designated (individual self-report versus other mechanisms), and the smallest geographic unit available for analysis (e.g., census region, state, or smaller units).
The datasets in Table 1 are the most widely used and each is available to the public with restrictions on the release of some variables to decrease the risk that individuals can be identified. Some datasets are completely deidentified and freely accessible on the Internet (e.g., some CDC surveys), while others require formal approval from the collecting organization (e.g., CMS). Comprehensive lists of federal datasets containing information on race and ethnicity are available in the National Healthcare Disparities Report and from the Department of Health and Human Services (United States Department of Health and Human Services 2005a).
Table 2 summarizes the general types of health-related data available in the federal databases, organized into five general categories that are illustrative, and not meant to be mutually exclusive: (1) mortality, (2) preventive services, (3) management of chronic conditions, (4) quality of care measures, and (5) patient reported experiences and quality of life. To support analyses of racial and ethnic disparities, these databases must contain or be linked to other databases that include information on health status, use of health services, or health outcomes. Linking databases to one another further enhances the questions that can be addressed. For example, the individual-level HEDIS data submitted by health plans participating in the Medicare Managed Care program do not contain race and ethnicity information, yet linkage to the CMS enrollment files has allowed comparisons of the quality of care between racial and ethnic groups (Schneider et al. 2002; Virnig et al. 2002; Trivedi et al. 2005).
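The linkage described above can be illustrated with a minimal sketch. It assumes a quality file without race information and an enrollment file that carries it, joined on a shared identifier. The field names (`beneficiary_id`, `measure_met`, `race`) and all records are hypothetical and do not reflect the actual HEDIS or CMS file layouts.

```python
from collections import defaultdict

# Hypothetical quality records: no race or ethnicity field.
hedis_records = [
    {"beneficiary_id": "A1", "measure_met": True},
    {"beneficiary_id": "A2", "measure_met": False},
    {"beneficiary_id": "A3", "measure_met": True},
    {"beneficiary_id": "A4", "measure_met": True},
]

# Hypothetical enrollment records: the source of race information.
enrollment_records = [
    {"beneficiary_id": "A1", "race": "white"},
    {"beneficiary_id": "A2", "race": "black"},
    {"beneficiary_id": "A3", "race": "black"},
]


def rates_by_race(quality, enrollment):
    """Link quality records to enrollment records and return the
    proportion of records meeting the measure within each race group."""
    race_by_id = {row["beneficiary_id"]: row["race"] for row in enrollment}
    met = defaultdict(int)
    total = defaultdict(int)
    for row in quality:
        # Records with no match are kept in a separate "unlinked" group
        # so that linkage failures remain visible.
        race = race_by_id.get(row["beneficiary_id"], "unlinked")
        total[race] += 1
        met[race] += int(row["measure_met"])
    return {race: met[race] / total[race] for race in total}


rates = rates_by_race(hedis_records, enrollment_records)
for race, rate in sorted(rates.items()):
    print(f"{race}: {rate:.2f}")
```

Reporting unlinked records as their own group, rather than dropping them, is one way to show how much of the quality file could not be assigned a race.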
Federal definitions and methods for collecting race and ethnicity data have evolved between the time the U.S. census first began collecting race information in 1790 and the 2000 decennial census that for the first time allowed individuals to self-report either a single race or multiple racial and ethnic backgrounds. The collection of ethnicity data has also evolved, with the first collection of Hispanic ethnicity in 1970 (Gibson and Jung 2002).
The collection of federal race and ethnicity data was transformed in 1977 by the Office of Management and Budget (OMB) Directive 15 (Office of Management and Budget 1978). This federal directive eliminated the category of “other race,” and instead required collection of race in four categories: white; black; Asian, Asian American, or Pacific Islander; and American Indian or Alaskan Native. It also required that Hispanic ethnicity be collected as a data element separate from race, with the option to collapse race and ethnicity into one variable, with Hispanic listed as a race. A subsequent revision of OMB Directive 15 in 1997 allowed for the identification of more than one race per individual (Federal Register 1997b).
These directives affected multiple national databases, including the Medicare program and all surveys conducted by AHRQ and the CDC. In the Medicare enrollment database (EDB), race was originally stored as white, black, other, or unknown based on SSA data derived at the time of application for a new or replacement Social Security card (Lauderdale and Goldberg 1996). After expanding the number of race categories in 1994 to comply with the original OMB Directive 15, the Medicare EDB contained a substantial number of patients with “other” and “unknown” race. To remedy this, 2.2 million beneficiaries with a race of “other,” “unknown,” or with a Hispanic surname or country of birth (from SSA files) were surveyed about their race and ethnicity in 1997. Approximately 40 percent of individuals responded to this survey, improving the completeness of the Medicare race and ethnicity data (Arday et al. 2000).
Despite the mandate of OMB Directive 15, federal race and ethnicity data still vary in several ways (Table 1). First, the relative distribution of the race and ethnicity of populations differs across databases, particularly if minority populations were deliberately oversampled. For example, while the U.S. census estimates that 12 percent of the population report black race, 23 percent (unweighted) of the National Health and Nutrition Examination Survey sample reports black race because of deliberate oversampling. Second, collection of multiple race data on individuals is not yet routine except for the U.S. Census and the surveys conducted by the CDC. The majority of other data sources do not yet include such detail (Table 1). Third, the method for collecting ethnicity data is inconsistent among databases. The Medicare EDB includes Hispanic ethnicity as a race category, whereas Medicare surveys such as the Medicare Current Beneficiary Survey (MCBS) (Centers for Medicare and Medicaid Services 2005a) and the Medicare CAHPS® (CAHPS 2005) separate Hispanic ethnicity from race (Table 1). Fourth, the mechanism for assessing an individual's race and ethnicity varies, ranging from individual self-report to the assignment of race by clerical workers at the time of patient registration (Table 1). This variability in the method of assigning race may affect the interpretation of research related to race and ethnicity (Kaplan and Bennett 2003).
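The gap between unweighted and weighted proportions in an oversampled survey can be illustrated with a short sketch. The 12 and 23 percent figures come from the text above; the 100-person sample and the simple ratio weights are illustrative assumptions and are not the actual survey weights of any federal dataset.

```python
def weighted_share(sample, group):
    """Return the weighted proportion of `sample` belonging to `group`.

    `sample` is a list of (group_label, sampling_weight) pairs.
    """
    total_weight = sum(weight for _, weight in sample)
    group_weight = sum(weight for label, weight in sample if label == group)
    return group_weight / total_weight


# Assumed toy sample: 23 of 100 respondents report black race, although
# the group makes up 12 percent of the population. Each weight is the
# population share divided by the sample share for that group.
sample = [("black", 0.12 / 0.23)] * 23 + [("other", 0.88 / 0.77)] * 77

unweighted = sum(1 for label, _ in sample if label == "black") / len(sample)
weighted = weighted_share(sample, "black")

print(f"unweighted share: {unweighted:.2f}")
print(f"weighted share:   {weighted:.2f}")
```

Analysts comparing racial distributions across databases would need to confirm whether published figures are weighted before treating differences as real.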
There are at least four methodological issues to consider before using federal data to assess racial or ethnic disparities in health care: (1) the validity of the classification of individuals' race and ethnicity, (2) sample size limitations, (3) the smallest analyzable geographic or institutional unit, and (4) the availability of data on other cultural or socioeconomic characteristics of the individuals that may be important mediators of health disparities.
Researchers have examined the validity of assignment of race and ethnicity in the Medicare EDB using self-report within the MCBS as a gold standard (Arday et al. 2000). The specificity of classification of minorities within the enrollment database is very high, resulting in negative predictive values ranging from 96 percent among Hispanics to 100 percent among Asian and Pacific Islanders. The sensitivity of the EDB for identifying blacks is also high (95 percent), but the sensitivity is much lower for identifying Hispanics (39 percent), Asian/Pacific Islanders (58 percent), and American Indians (11 percent). The positive predictive values for the Medicare enrollment data range from 96 percent for blacks to as low as 78 percent for American Indians, suggesting the need to exercise caution when using Medicare datasets to evaluate care for nonblack minority populations. Similar studies of the Veterans Health Administration (VHA) data compared with patient self-report found 98 percent agreement for whites, 92 percent for blacks, 76 percent for Asians, 83 percent for Hispanics, and only 23 percent for Native Americans. Even after excluding the 36 percent of patients with missing race information in the VHA, the accuracy of classification of nonblack minority populations is not optimal (Kressin et al. 2003).
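The validity measures reported above follow from a two-by-two comparison of the administrative classification against self-report. The sketch below uses hypothetical counts, not the figures from Arday et al. (2000), to show how low sensitivity can coexist with high specificity when a group is a small share of the population.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts comparing an administrative race code with self-report
    (treated as the gold standard) for one racial or ethnic group."""
    true_pos: int   # coded in group, self-reports in group
    false_pos: int  # coded in group, self-reports not in group
    false_neg: int  # coded not in group, self-reports in group
    true_neg: int   # coded not in group, self-reports not in group


def validity_measures(c):
    """Return sensitivity, specificity, and predictive values."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical counts for 1,000 beneficiaries, 100 of whom self-report
# membership in the group of interest.
example = ConfusionCounts(true_pos=39, false_pos=2, false_neg=61, true_neg=898)
measures = validity_measures(example)

for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In this illustration most members of the group are missed by the administrative code, yet nearly everyone who is coded into the group truly belongs to it.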
Misclassification of nonblack minority individuals may bias estimates of health status or mortality. Correcting or adjusting for this can reduce the bias. For example, using tribal documentation, the Indian Health Service (IHS) and the National Center for Health Statistics noted that the proportion of American Indians misidentified in the National Death Index ranged from 1 percent in Arizona to 30 percent in California. Using corrected estimates, the IHS has produced adjusted disease-specific mortality rates for American Indians in its most recent report (Indian Health Service 1999). The IHS database might also be used to improve the low sensitivity of the Medicare EDB for identifying Native Americans. The classification of Hispanic individuals can be refined by surname analysis (Morgan, Wei, and Virnig 2004).
Medicare health plans are uniquely positioned to assess the distribution of race and ethnicity for their enrolled populations by comparing estimates from the Medicare EDB to estimates based on self-reported race and ethnicity from surveys like the Medicare CAHPS. For a given health plan, if the CAHPS data produce an unbiased estimate of the prevalence of racial and ethnic groups, and these estimates are comparable to those in the Medicare EDB, then health plan analysts could rely on the EDB to supplement race and ethnicity data on Medicare plan members. Otherwise, plans may need to request race and ethnicity information at the time of enrollment or survey members. For commercial enrollees there is no Medicare EDB equivalent.
Obtaining adequate sample sizes to reliably estimate the use of services or health status of minority populations is a second important consideration. The large numbers of individuals included in federal databases enable statistically precise estimates of many health measures. Nevertheless, the relatively small numbers of nonblack minority individuals make it difficult to measure health care delivery and outcomes with precision. Most nonblack minority groups are clustered within specific geographic regions of the country. For example, while American Indians comprise less than 1 percent of the population throughout all counties in Ohio, this proportion ranges from 1 percent to as high as 90 percent in New Mexico (United States 2005a). The utility of specific federal databases to local health care leaders will depend on the racial and ethnic composition of the local population. Fortunately, the prevalence of minority populations at the county level is readily available through maps on the U.S. Census website (Figure 1).
Health care managers using federal data sources to assess local disparities will also want to know the smallest analyzable geographic or organizational unit available, such as the state, city, or individual hospital or clinic (Table 1). Most federal datasets can provide summary information at the state level and many can do so at the county or zip code level. Administrative datasets are more likely than survey datasets to achieve adequate sample sizes within smaller geographic units. Very few databases provide detailed information at the facility (hospital or clinic) level. The Dartmouth Atlas, based on Medicare administrative and claims data, has been used to analyze racial disparities in care based on local hospital referral regions (Baicker et al. 2004). Additional sources of facility-level data include the Medicare End Stage Renal Disease program and the AHRQ Healthcare Cost and Utilization Project database. While not available routinely, the Medicare CAHPS, the MCBS, and the Medicare HEDIS program may be able to provide information on specific health plans, but whether this information is actionable will depend on the total enrollment of the health plan and the prevalence of its minority population.
Measures of socioeconomic position (SEP) such as income or education attained are useful to understand mediators of racial and ethnic disparities (Braveman et al. 2001). The primary language spoken by an individual is also increasingly recognized as an important determinant of care (Jacobs et al. 2004). There is substantial variation in the collection of both language and SEP variables among federal databases. Surveys conducted by the CDC can be administered in either English or Spanish, and many federal surveys include extensive SEP data, but most federal administrative and claims databases contain varying levels of SEP information and no information on spoken language. While it is possible to use geographic locators such as zip code within administrative or claims databases to estimate SEP through geocoding, these methods are less precise than individual-level data (Krieger, Williams, and Moss 1997).
Federal databases currently lack data on other potentially important determinants of health care, such as language proficiency, health literacy, and immigrant status (Kandula, Kersey, and Lurie 2004). For example, disparities in health care within a specific racial group such as blacks may depend not only on U.S. immigrant status, but also on the precise country of origin (Lucas, Barr-Anderson, and Kington 2003; Read, Emerson, and Tarlov 2005). These subtle differences will prove important to health care organizations attempting to address disparities on a local level. For the foreseeable future, such data must be collected locally.
In the near term, three major trends seem likely to affect the collection of race and ethnicity data and increase the complexity of their analysis: (1) the increasing diversity of the minority populations themselves, (2) the rising prevalence of non–English-speaking individuals, and (3) increasing numbers of individuals that self-identify as multiracial. Among births involving at least one black parent, the proportion with the second parent listed as white increased from 2 to 9 percent during 1968–1994 (Federal Register 1997a). The first two demographic trends tend to reduce the homogeneity of analytic categories and may increase bias due to nonresponse. In 1980, only 11 percent of the population reported speaking a language other than English at home, and this increased to 18 percent in the 2000 Census (Shin and Bruno 2003). It is likely that using survey data to assess health care disparities will underestimate existing disparities in the non–English-speaking population owing to nonresponse bias resulting from language barriers. To overcome this limitation, federal data collection efforts should translate survey instruments into additional languages as is done with the Medicare CAHPS survey. Federal agencies might also incorporate primary language spoken as a unique administrative or claims data field to enable health assessment of the non–English-speaking populations.
Between 1982 and 1994, the proportion of respondents reporting multiple races to the National Health Interview Survey (NHIS) increased from 1.2 to 1.8 percent (Federal Register 1997a), and the 2000 Census recorded 2.4 percent of respondents with multiple races (Grieco and Cassidy 2001). The proportion of individuals who self-identify as more than one race also varies significantly among racial groups. For example, individuals reporting American Indian race in combination with white race accounted for 55 percent of all multiple race individuals in the NHIS analyses (Federal Register 1997a). By contrast, black race in combination with white race accounted for 11 percent of multiple race individuals in the 2000 Census (Grieco and Cassidy 2001).
Allowing individuals to select up to six race designations generates 63 possible race combinations, substantially reducing the sample size within each category. Combining some categories for analysis is an option, but the optimal approach to combining them is not obvious. For example, the surveys conducted through CDC include a summary question of “Which group would you say best represents your race?” The National Center for Health Statistics has constructed statistical models that can be used to assign a single “most likely” race for multiple race individuals responding to the 2000 census. Results from these models tend to increase the relative proportions of minority populations by anywhere from 2.5 percent (blacks) to as much as 12 percent for the American Indian population (Ingram et al. 2003).
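The count of 63 combinations can be checked by enumerating every nonempty subset of six categories, which gives 2^6 − 1 = 63. The sketch below assumes the six race categories used in the 2000 Census questionnaire; the category labels are included only for illustration, and any six labels would give the same count.

```python
from itertools import combinations

# Assumed labels for the six race categories of the 2000 Census.
RACE_CATEGORIES = [
    "White",
    "Black or African American",
    "American Indian and Alaska Native",
    "Asian",
    "Native Hawaiian and Other Pacific Islander",
    "Some Other Race",
]


def race_combinations(categories):
    """Return every nonempty combination of the given categories."""
    return [
        combo
        for size in range(1, len(categories) + 1)
        for combo in combinations(categories, size)
    ]


all_combos = race_combinations(RACE_CATEGORIES)
multiple_race = [combo for combo in all_combos if len(combo) > 1]

print(len(all_combos))     # 63 combinations in total
print(len(multiple_race))  # 57 of them involve two or more races
```

Of the 63 combinations, 57 involve more than one race, which shows how quickly cell sizes shrink when each combination is analyzed separately.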
Future federal health data collection programs may enhance sampling of less numerous minority populations such as Asians and American Indians. Collection of data on nonblack minority populations might also be enhanced by refining race and ethnicity categories in special geographic locations. For example, the “Hispanic” category may not suffice in cities such as New York, where expanded definitions including Puerto Rican or Dominican might be more appropriate. CMS could play an active role in such collection efforts by defining required race/ethnicity categories to be used in each region based on the local race/ethnicity profile obtained from the U.S. Census.
Racial and ethnic disparities in health care could be addressed most effectively through collaboration between individual health care organizations and the federal government. Any model for federal and local collaboration needs to address the division of responsibility for three data-related activities: collection, analysis, and reporting. Challenges to bidirectional sharing of standardized race and ethnicity data include time lags in the production of data, significant political and legal barriers (Kamoie and Hodge 2004), suspicions about the benevolent intent of such programs, and limited willingness to devote resources to this issue.
The federal government and states or other local agencies already collaborate to collect some forms of health care data. Three examples include collaborations between the Medicare program and the Quality Improvement Organizations (QIOs) (Jencks, Huff, and Cuerdon 2003), between the Medicare program and the National Committee for Quality Assurance and local health plans to collect standardized health plan performance measures, and between the Centers for Disease Control and Prevention and state health departments to operate the Behavioral Risk Factor Surveillance System (Centers for Disease Control and Prevention 2005a). In each instance, a local agency or organization collects data that are transmitted to the national organization for aggregation and analysis, but which are also available to the local organization for its own purposes. These or similar partnerships could be tasked with collection of standardized data on racial and ethnic health care disparities.
How would analysis and reporting on regional and local disparities be accomplished? Health care quality reporting provides two useful models. One model involves analysis and reporting of local or regional results by a national organization. For example, the National Committee for Quality Assurance (NCQA) currently generates regional and health plan summaries of HEDIS results. The other model involves reporting by regional, state, or local groups. For example, the Massachusetts Health Quality Partnership has produced statewide reports on the quality of physician groups by aggregating and analyzing the data that local health plans report to NCQA. The Medicare Quality Improvement Organizations, many of which have expertise in data collection and analysis, could also play this role (Jencks et al. 2003). As CMS has included the elimination of health disparities as part of the scope of work for the QIOs, local disparity reports could be a product of the QIOs (Department of Health and Human Services 1999).
Currently, the federal government reports on racial and ethnic disparities in health care via the NHDR. Future releases of the NHDR might routinely include regional and local reporting on health disparities. Alternatively, the NHDR might generate local reports in response to requests from state agencies, private sector organizations, and institutions about specific questions related to disparities. Much as census data guide local planners, these reports would enable health care leaders to better understand local manifestations of nationally documented health care disparities and enable institutional investment of resources to address locally relevant disparities (as opposed to disparities that may be relevant in other parts of the United States). Results of these local reports might also reveal resource limitations. For example, survey data could reveal local disparities in the availability of primary care or specialty services.
Some federal data, sampled nationally, may lack sufficient sample sizes to assess health care disparities among minority populations with particular medical conditions. Large claims databases are less vulnerable to sample size limitations than survey data. To address disparities using survey data, the sampling schemes would have to be modified to oversample minority populations or to initiate data collection in geographic areas that are not adequately represented. For example, there is already precedent within the Medicare Current Beneficiary Survey for oversampling selected populations (such as managed care enrollees) in some survey rounds. Likewise, monitoring cancer care for Native Americans has stimulated the addition of Arizona and Alaska to the SEER program. Sampling modifications could be guided by results of racial disparities analyses using larger administrative datasets.
In conclusion, racial and ethnic disparities continue to be an important national problem. Much of what we know about these disparities has been derived from federal databases. The federal government has key roles as standard setter, as data collector for federally sponsored programs, and as a data clearinghouse. The collection of data on race, ethnicity, and health must reflect the growing diversity of our population. Coordinating the efforts of states and local insurers with federal efforts to enhance and standardize race and ethnicity data collection could lead to more powerful analyses of aggregated data. With modifications, federal datasets can also be useful to local health care leaders and policy makers as they strive to reduce racial and ethnic disparities and improve care for all of the citizens of the United States.
This work was supported by a grant from the Robert Wood Johnson Foundation.