Our results suggest that correlational or multivariate latent variable analyses of SF-36v2 responses can be conducted across whites, blacks, English-speaking Hispanics, and Spanish-speaking Hispanics, because these analytic methods require only weak factorial invariance, which was supported in this study. For example, multiple regression analysis (stratified by race/ethnicity/language) with MCS or PCS as the dependent variable does not present a problem in terms of measurement equivalence. However, the lack of strong invariance observed in our study suggests the need for caution when comparing SF-36v2 mean scores of Spanish-speaking Hispanics with those of other groups. The pattern of differences in item intercepts suggests a systematic linguistic/translational or cultural (values and beliefs) bias in the responses of Spanish-speaking Hispanics to most of the SF-36v2 items. Further analysis confirmed strict invariance for whites, blacks, and English-speaking Hispanics.
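The distinction between weak and strong invariance can be made concrete with the standard linear common-factor model. The sketch below uses illustrative notation ($\tau$ for item intercepts, $\lambda$ for loadings, $\eta$ for the latent factor with group mean $\kappa$ and variance $\psi$, $\theta$ for residual variances) and assumes errors uncorrelated with the factor and with each other; it is not necessarily the exact parameterization used in this study.

```latex
\begin{aligned}
x_{ij}^{(g)} &= \tau_j^{(g)} + \lambda_j^{(g)}\,\eta_i^{(g)} + \varepsilon_{ij}^{(g)},
  \qquad \mathrm{E}[\eta^{(g)}] = \kappa^{(g)}, \quad \mathrm{Var}(\eta^{(g)}) = \psi^{(g)} \\
\text{weak: } \lambda_j^{(g)} &= \lambda_j; \quad
  \text{strong: weak and } \tau_j^{(g)} = \tau_j; \quad
  \text{strict: strong and } \mathrm{Var}(\varepsilon_j^{(g)}) = \theta_j \\
\mathrm{Cov}\big(x_j^{(g)}, x_k^{(g)}\big) &= \lambda_j \lambda_k\, \psi^{(g)}
  \qquad (j \neq k) \\
\mathrm{E}[x_j^{(g)}] - \mathrm{E}[x_j^{(h)}] &= \big(\tau_j^{(g)} - \tau_j^{(h)}\big)
  + \lambda_j \big(\kappa^{(g)} - \kappa^{(h)}\big)
\end{aligned}
```

Covariances contain no intercepts, so equal loadings suffice for correlational and regression-type analyses. An observed mean difference, however, mixes the latent difference with the intercept difference; only under strong invariance does the first term vanish, which is why lower intercepts among Spanish speakers could shift their mean scores without any difference in underlying health.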
Our findings agree with the weak (metric) invariance analysis conducted by Hays, Revicki and Coyne,18 who also found support for weak invariance across English- and Spanish-speaking Hispanics using SF-36 version 1. However, they did not test for strong or strict invariance. Our results extend this body of research by showing a lack of strong invariance for most SF-36v2 items among Spanish-speaking Hispanics.
The lack of strong invariance should not be surprising to Hispanic health researchers, given the findings of prior studies. Bzostek et al.4 presented evidence suggesting that a lack of measurement invariance could result from difficulties in translation and the lack of linguistic equivalence for some words in English and Spanish. They suggest that the single self-rated health question (“Would you say your health in general is excellent, very good, good, fair or poor?”), when translated into Spanish with the response options excelente, muy buena, buena, regular, and mala, creates a language-based measurement artifact. They explain:
“‘Regular’ can mean ‘okay’ or ‘fine’ (but also ‘so-so’), whereas in English, ‘fair’ clearly connotes subpar health. Angel and Guarnaccia (1989) also suggest that there may be language-related differences in ‘anchoring’, e.g., buena or regular may be used to describe normal health in Spanish…while excellent and very good may imply having no health problems—i.e., normal—health in English (or in the United States)” (p. 992).
Our results are consistent with these suggestions regarding the single self-rated health item, as noted in the Results section. However, this translational issue cannot explain the pattern of lower intercepts on most items among Spanish speakers, because no other SF-36v2 item uses the same response options.
Several potential explanations of bias in the responses of Spanish-speaking Hispanics have been suggested in prior research.4,8,43,44
Differences in how Spanish-speaking Hispanics respond to particular SF-36v2 items could result from cultural differences in what is acceptable to say about one’s health. For example, it may be socially unacceptable to express optimism or boast about one’s health.4
Alternatively, Hispanics in general may be more likely to choose responses at scale endpoints rather than those in the middle.45
Our study results are consistent with both of these explanations for Spanish-speaking Hispanics.
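Endpoint (extreme) responding of the kind described above is often summarized per respondent as the share of answered items on which the lowest or highest category was chosen. The Python sketch below is a minimal illustration under stated assumptions: responses are coded 1..k for an item with k categories, `None` marks a missing answer, and the function name `endpoint_share` is hypothetical. It does not describe an analysis performed in this study.

```python
from typing import Optional, Sequence


def endpoint_share(responses: Sequence[Optional[int]],
                   n_categories: Sequence[int]) -> float:
    """Return the share of answered items on which the respondent chose
    the lowest (1) or highest (k) response category.

    Assumes responses are coded 1..k for an item with k categories;
    None marks a missing answer and is skipped.
    """
    if len(responses) != len(n_categories):
        raise ValueError("responses and n_categories must have the same length")
    answered = 0
    at_endpoint = 0
    for value, k in zip(responses, n_categories):
        if value is None:
            continue  # missing item: excluded from the denominator
        if not 1 <= value <= k:
            raise ValueError(f"response {value} outside 1..{k}")
        answered += 1
        if value in (1, k):
            at_endpoint += 1
    if answered == 0:
        raise ValueError("no answered items")
    return at_endpoint / answered


# Hypothetical respondent: five items, one missing, the last with 3 categories.
print(endpoint_share([1, 3, 5, None, 2], [5, 5, 5, 5, 3]))
```

Because SF-36v2 items differ in their number of categories, the sketch takes each item's k explicitly rather than assuming a common scale. Comparing this index across groups is only descriptive: it cannot by itself separate response style from true differences in health.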
As a counterpoint, Morales, Diamant and Hays46 analyzed the SF-12 version 1 in a U.S. population of English- and Spanish-speaking persons and also found weak invariance. A test for strong invariance found differential item functioning (i.e., different intercepts) for only two items (general health status and “downhearted and blue”). They suggest that psychometric equivalence can be established with only weak invariance and that partial strong invariance is acceptable. Part of the discrepancy between our study and that of Morales et al. may be due to differences in response options between the two versions, supporting the notion that modifications of response categories, even when designed to improve scale properties, can lead to unintended consequences. For example, the dichotomous responses for four items in SF-12 version 1 (two role-physical and two role-emotional items) were changed to five-category response scales in version 2; all four of these items exhibited differential item functioning in our study, whereas no difference was reported in the Morales et al. study.46
Additionally, demographic differences between the study populations, such as age, sex, education, and national origin among Spanish speakers, may have contributed to the differences in our findings, but these data were not available for comparison. However, both samples were clinic-based and drawn mainly from major urban public hospital systems, so clinical differences are unlikely to account for our divergent findings.
Our analysis has several limitations. First, participants were 45–64 years of age, and hence the results cannot be generalized to other age groups. Second, this is a clinical population, and the results may not generalize to community-dwelling populations. Third, the majority (85%) of our Spanish-speaking respondents self-identified their ethnicity as Puerto Rican (53%) or Mexican (32%). We were concerned about combining these two major Hispanic subgroups in the analysis for two reasons: they have very different cultural histories, and most analyses of English- and Spanish-language differences in self-reported health status have been conducted among participants of Mexican ancestry. However, results among Spanish-speaking Puerto Ricans living in the Commonwealth of Puerto Rico in the Behavioral Risk Factor Surveillance System (BRFSS) suggest that responses to the single-item general health status question are consistent with those seen among Spanish-speaking Hispanics of Mexican ancestry. For example, over the past decade the percentage of residents of Puerto Rico reporting fair or poor health has consistently been approximately 32%, compared with a national average of approximately 14.5% for residents of the U.S. states. This mirrors the results of our study, as well as those of Bzostek et al.4 and others who focused on Mexican Spanish speakers. Hence, we believe this lends support to the linguistic/translational explanation for some of the differences we found and suggests that it applies to both Mexican and Puerto Rican Spanish-speaking subgroups.
A fourth limitation is the significant educational difference between the Spanish speakers and the other three racial/ethnic groups. To partially address this limitation, we replicated our multi-group analysis excluding those with less than a high school education. The results of this supplemental analysis were nearly identical to those of the full sample across the models and showed a similar pattern of differences in intercepts (see Appendix Figure A1).
Fifth, the Spanish-speaking group was disproportionately female (66.4%) compared with whites (53.8%), blacks (48.9%), and English-speaking Hispanics (58.3%), which may have introduced bias of unknown direction into our results. Sample size and power limitations precluded a multi-group analysis stratified by sex; future studies would benefit from larger samples and a more balanced sex distribution. However, cross-tabulation of self-reported overall health status by race/ethnicity and language group revealed a similar distribution of responses by sex. Hence, we believe that, at least for this single item, the linguistic/translational explanation holds across the sexes.
Sixth, although we conducted a comprehensive set of analyses and the approaches we employed are gaining acceptance (i.e., our use of specific goodness-of-fit measures and cutoff points and the use of delta tests), this remains an area of controversy and debate, and these approaches are not universally accepted.
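Delta tests of this kind compare a fit index between successively more constrained models (configural, weak, strong, strict). The Python sketch below is illustrative only: it assumes the comparative fit index (CFI) and the commonly cited rule that a drop in CFI of no more than 0.01 supports the added constraints (Cheung and Rensvold, 2002). Neither the cutoff nor the function names (`cfi`, `step_holds`, `highest_supported_level`) are taken from this study, and the chi-square values would come from whatever SEM software fitted the models.

```python
from typing import Sequence

# Nested invariance models, from least to most constrained.
LEVELS = ("configural", "weak", "strong", "strict")


def cfi(chi2_model: float, df_model: float,
        chi2_baseline: float, df_baseline: float) -> float:
    """Comparative fit index:
    1 - max(chi2_M - df_M, 0) / max(chi2_M - df_M, chi2_B - df_B, 0).
    """
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    if d_base == 0.0:
        return 1.0  # no excess misfit in either model: CFI taken as 1
    return 1.0 - d_model / d_base


def step_holds(cfi_looser: float, cfi_tighter: float,
               threshold: float = 0.01) -> bool:
    """Assumed rule: added constraints are retained if CFI drops by at most threshold."""
    return (cfi_looser - cfi_tighter) <= threshold


def highest_supported_level(cfis: Sequence[float],
                            threshold: float = 0.01) -> str:
    """Return the most constrained invariance level retained.

    cfis[i] is the CFI of the model for LEVELS[i]; each model is compared
    with the one immediately before it, and testing stops at the first failure.
    """
    if not 1 <= len(cfis) <= len(LEVELS):
        raise ValueError("expected CFI values for 1 to 4 nested models")
    supported = LEVELS[0]
    for i in range(1, len(cfis)):
        if not step_holds(cfis[i - 1], cfis[i], threshold):
            break
        supported = LEVELS[i]
    return supported


# Hypothetical CFI values, not results from this study.
print(highest_supported_level([0.970, 0.968, 0.940, 0.935]))  # weak
```

With these made-up inputs the weak model is retained (CFI drops by 0.002) but the strong model is not (drop of 0.028), the same qualitative pattern described for Spanish speakers. Other fit indices or cutoffs could be substituted; that choice is exactly the point of debate noted above.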
Finally, a limitation of all purely psychometric approaches to invariance testing using self-reports of health and physical functioning across language groups is the inability to distinguish linguistic/translational and cultural influences from actual, objective physical health status. In short, we cannot say definitively whether the Spanish-language differences we observed would be replicated against “gold standard” objective measures of health and physical functioning. Future studies would benefit from including performance-based measures of physical functioning and other clinically based measures of health status.
In closing, our findings suggest the need for caution when using self-reported health measures to compare the health status and health care needs of Spanish-speaking Hispanic populations with those of white, black, or English-speaking Hispanic groups. Policy makers, public health officials, and health providers should be aware that SF-36v2 mean scores for Spanish speakers may not be directly equivalent to those of English speakers using current items, response options, and analytic methods. Future studies are needed to establish procedures for accurately comparing the self-reported health of Spanish speakers with that of English speakers, for example by using numerical visual-analogue scales instead of Likert scales whose response options may lack true linguistic equivalents. Additionally, qualitative studies (including focus groups, cognitive interviewing, and in-depth interviews) are needed to better understand the cultural origins and meanings attached to questions of health and how these artifacts, attitudes, and orientations toward health influence perceptions of physical, emotional, and social well-being.