This study examined issues related to the transition to the new federal standards for collecting race/ethnicity data in the VA. We showed that the overall agreement between the observer-recorded and self-reported race/ethnicity data was excellent. Excluding those who reported ethnicity only in 2004, the overall agreement between the new and old data was over 95% (kappa = 0.87). This indicates that the observer-recorded data are highly consistent with, and can be used together with, the self-reported data without creating substantial bias in multi-year trends. This consistency was mainly due to accurate identification by observation of the two largest racial groups, Whites and African Americans, whose sensitivity rates were 97.6% (kappa = 0.89) and 94.0% (kappa = 0.93), respectively.
However, we also showed that observation was not a reliable method of identifying race/ethnicity for non-African American minority groups. The sensitivity rates for these groups ranged from 26.6% to 83.0% (kappa = 0.23 to 0.79), too low for identifying them separately for research purposes. These groups can instead be combined into a higher-level, more inclusive category to achieve better sensitivity. We showed that the African American versus Other distinction (Whites and all other non-African American minorities combined in one group) had the best agreement between the old and new race/ethnicity data.
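The two agreement measures used throughout this discussion can be computed directly from a cross-tabulation of self-reported against observer-recorded race. The sketch below is illustrative only: the counts are invented and do not reproduce the study's actual tables, and the function name is ours.

```python
# Illustrative computation of per-category sensitivity and Cohen's kappa
# for agreement between self-reported and observer-recorded race.
# The 2x2 counts below are hypothetical, not the study's actual data.

def sensitivity_and_kappa(table):
    """table[i][j]: count with self-reported category i, observer-recorded j."""
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: sum over categories of the product of marginal shares.
    p_e = sum(
        (sum(table[i]) / n) * (sum(row[i] for row in table) / n)
        for i in range(len(table))
    )
    kappa = (p_o - p_e) / (1 - p_e)
    # Sensitivity per category: diagonal count over the self-reported row total.
    sens = [table[i][i] / sum(table[i]) for i in range(len(table))]
    return sens, kappa

# Hypothetical counts: rows = self-report (White, African American),
# columns = observer record.
sens, kappa = sensitivity_and_kappa([[940, 60],
                                     [30, 970]])
```

With these invented counts, sensitivity is the share of each self-reported group that the observer recorded correctly, and kappa discounts the raw agreement for agreement expected by chance alone.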
We also observed a systematic pattern in how the observer-recorded data misclassified individuals: 85% of all misclassifications in the observer-recorded data involved Whites, with Whites incorrectly identified as members of a minority group or vice versa. This pattern of misclassification can reduce the observed disparity between Whites and other racial groups, so racial disparities based on observer-recorded data may be underestimated. Researchers using the observer-recorded and self-reported data together thus need to conduct sensitivity analyses to rule out the possibility that a change in disparity before and after the transition is attributable to the use of mixed data.
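The attenuating effect of this misclassification pattern can be illustrated with a small hypothetical calculation: when a fraction of each group is recorded as the other, each recorded group absorbs members of the other, pulling the two observed outcome rates toward each other. All rates, group sizes, and the swap fraction below are invented for demonstration.

```python
# Hypothetical illustration: symmetric White/minority misclassification
# attenuates an observed disparity in some outcome rate.

def observed_rates(rate_white, rate_minority, n_white, n_minority, swap_frac):
    """Outcome rates after swap_frac of each group is recorded as the other."""
    # Counts recorded in each group after misclassification.
    rec_white_n = n_white * (1 - swap_frac) + n_minority * swap_frac
    rec_min_n = n_minority * (1 - swap_frac) + n_white * swap_frac
    # Outcome events carried along by the misclassified individuals.
    rec_white_ev = (n_white * (1 - swap_frac) * rate_white
                    + n_minority * swap_frac * rate_minority)
    rec_min_ev = (n_minority * (1 - swap_frac) * rate_minority
                  + n_white * swap_frac * rate_white)
    return rec_white_ev / rec_white_n, rec_min_ev / rec_min_n

true_gap = 0.30 - 0.20                      # true disparity: 10 points
w, m = observed_rates(0.20, 0.30, 8000, 2000, 0.05)
observed_gap = m - w                        # attenuated relative to true_gap
```

Even a modest 5% swap in this toy setting shrinks the observed gap noticeably below the true one, which is why the direction of bias is toward underestimating disparities.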
The findings of this study are consistent with a previous study that reported agreement rates of 97.9% and 92.0% for Whites and African Americans, respectively, between the observer-recorded race in VA administrative files and the self-reported race in a survey of veterans [4]. For Asians and Pacific Islanders (APIs), however, agreement with the self-reported data was much lower in the administrative files (35.3%) than in the survey data (75.5% for Asians and 69.6% for Pacific Islanders).
The observer-recorded race/ethnicity data in the VA Medical SAS Datasets also compare favorably in accuracy with those in the Medicare Enrollment Database (EDB), which showed sensitivity rates of 96.5% and 95.6% for Whites and African Americans, respectively [22]. The VA observer-recorded data performed slightly better in identifying Whites but slightly worse in identifying African Americans. In the VA, only about 15% of Hispanics were misclassified into other race/ethnicity groups, compared with almost 65% in the EDB; the sensitivity for the Hispanic category in the EDB was only 35.7%, versus 85.5% in the VA data. Except for Asians, the sensitivity rates for minority groups in the VA data were much higher than those in the EDB. Thus, when both VA and Medicare race values are available for an individual, the old VA data should be preferred to the Medicare data, especially for Hispanics.
We found that the completeness of the self-reported race/ethnicity data was a serious problem. Over 60% of all VHA users in 2004 did not report any race value, an almost 15 percentage point drop in completeness compared with the observer-recorded data in the pre-transition years; for example, 45% of all VHA users in 2002 were missing race/ethnicity data. This sudden drop in completeness may in part be a transitional problem that occurs during the first few years after a new system is implemented. If this were the case, the race/ethnicity data may be missing at random.
However, it is also possible that some groups are more reluctant than others to disclose their race/ethnicity, so the drop may also be attributable in part to the change in data collection methods. As we have shown, race/ethnicity data for multiracial individuals may be seriously underreported in the VHA data. Only 0.3% of all users and 0.7% of those with valid self-reported race/ethnicity values reported two or more races in the 2004 VHA data, while a national survey of veterans conducted in 2001 indicated that 2.1% of all veterans and 3.2% of VHA users may be multiracial [23].
This selective self-reporting is also evident in the regional variation in the completeness of the 2004 race data. The South had the highest completeness at 43%, followed by the Northeast and Midwest at 39%, and the West at 33%. According to the 2000 Census, the West had the highest concentration of multiracial persons, with 40% of all multiracial individuals in the country [24].
This suggests that multiracial individuals are more reluctant to report their race than individuals of a single race, and that the self-reported data may have selection issues that the previous observer-recorded data do not have, further complicating the mixed use of observer-recorded and self-reported data for multi-year trends.
To address the incompleteness issue, the VA could consider several options. First, the VA could obtain data through special surveys or from external sources. As the Centers for Medicare and Medicaid Services (CMS) have done [25], the VA could survey veterans specifically to collect race/ethnicity data from enrollees whose self-reported race/ethnicity is not known. Alternatively, the VA could establish an interagency agreement with the Social Security Administration (SSA) and acquire the SSA's race data regularly to supplement its own, an approach also used by the CMS [22].
However, a more fundamental and long-term solution to this problem is to improve race reporting at the source, namely, in VA hospitals and clinics. The VA may need to examine whether the way the race/ethnicity questions are asked (e.g., the specific wording of the questions, the use of prefatory remarks or probes following an incomplete answer, or the circumstances under which the questions are asked) can be improved. Previous research suggests that how a race/ethnicity question is asked can make a substantial difference in the response rate, especially for small race/ethnicity groups [27]. For example, one study showed that an open-ended question (i.e., allowing respondents to describe their race/ethnicity in their own terms) can reduce the rate of unusable data compared with data obtained with the OMB standards, and that the open-ended format is especially effective in improving race reporting for minority groups such as Hispanics, Asians, and multiracial individuals, who are often reluctant to describe their race/ethnicity in pre-defined categories such as those in the OMB standards [27].
Until the completeness of the self-reported race data in the VA is substantially improved, researchers using VA race data may consider supplementing the self-reported data with observer-recorded data from past years or with SSA or Medicare data sets where applicable. Future research needs to examine how the observer-recorded and self-reported data can be integrated into a well-validated patient-level race database and whether such an approach can substantially improve the completeness of VA race data.
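One simple integration strategy consistent with this recommendation is to prefer a patient's self-reported race when present and otherwise fall back to the most recent observer-recorded value from past years. The sketch below illustrates that logic; the function name, record layout, and labels are our assumptions, not an actual VA data structure.

```python
# Sketch of a fallback merge: prefer self-reported race; otherwise use the
# most recent observer-recorded value. Field names and layout are hypothetical.

def combined_race(self_reported, observer_history):
    """self_reported: race string or None.
    observer_history: list of (year, race) tuples from pre-transition files."""
    if self_reported is not None:
        return self_reported, "self-reported"
    valid = [(y, r) for y, r in observer_history if r is not None]
    if valid:
        # Tuple comparison picks the entry with the latest year.
        year, race = max(valid)
        return race, f"observer-recorded ({year})"
    return None, "missing"

race, source = combined_race(None, [(1999, "White"), (2002, "White")])
```

Tracking the source alongside the value, as done here, would also make the sensitivity analyses discussed earlier straightforward, since disparity estimates could be recomputed on the self-reported subset alone.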
In the meantime, however, whether self-reported data are used alone or in combination with the old race/ethnicity data, users of VA self-reported race data should be aware of the potential selectivity in these data. As discussed above, about 25% of those who reported Hispanic ethnicity did not report race in the FY2004 VHA data. These individuals may not view themselves as having a racial identity distinct from their ethnicity [27]. They may thus choose an "Other" category for their race, or refuse to disclose race, when given only the OMB categories. This is shown in the 2000 Census data, in which over 42% of those who reported Hispanic ethnicity chose "Some other race," compared with only 5.3% of the total population [29].
In the VHA, "Other race" is not provided as a response category, and as a result many refused to report race. Regional variation in the completeness of self-reported race/ethnicity data (e.g., 33% in the West vs. 41% in the other three regions) may thus reflect not so much a systemic failure to enforce the new race/ethnicity data collection standards among VHA facilities in the West as regional variation in the distribution of non-African American minority groups such as Hispanics, Asians, and individuals of two or more races.
One limitation of this study is that we did not consider the characteristics of the VA population with no self-reported data. Their individual characteristics, and accordingly the accuracy of their observer-recorded race values, may differ systematically from those of the users who could be linked. As a consequence, this study cannot estimate the quality of data that combine the old and new information. Further, the findings about the accuracy of the observer-recorded race should be generalized cautiously, because only about 28% of all valid observer-recorded values for 1997–2002 could be linked to the self-reported data.