HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Med Care. Author manuscript; available in PMC 2013 September 1.
Published in final edited form as:
PMCID: PMC3443872

Does the Consumer Assessment of Healthcare Providers and Systems Cultural Competence Survey provide equivalent measurement across English and Spanish versions?

Adam C. Carle, MA PhD and Robert Weech-Maldonado, Ph.D., Professor & L.R. Jordan Endowed Chair



The English and Spanish versions of the Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Cultural Competence Survey (CAHPS-CC) assess patients’ experiences with culturally competent care. Even when Spanish and English speakers experience the same level of culturally competent care, their responses describing that care may differ. This is called measurement bias. To deliver reliable and valid information across languages, responses must provide equivalent measurement across versions. In this study, we examined whether measurement bias on the CAHPS-CC impedes valid measurement across the English and Spanish versions.


We used multiple group (MG) confirmatory factor analyses (CFA) to examine measurement bias across English (n = 851) and Spanish (n = 113) speakers. Participants came from a 2008 sample of two Medicaid managed care plans, in New York and California.


MG-CFA provided general support for the equivalence of the CAHPS-CC in measuring Doctor Communication-Positive Behaviors, Doctor Communication-Negative Behaviors, Doctor Communication-Health Promotion, Equitable Treatment, and Trust. We did observe statistically significant differences in the thresholds associated with the item asking whether a doctor gave easy to understand instructions. However, analyses indicated that bias did not meaningfully influence conclusions about average experiences using the English and Spanish versions of the CAHPS-CC.


Our results support the use of the English and Spanish versions of the CAHPS-CC. Though we found some bias, analyses demonstrated that it did not substantively impact conclusions for the studied domains. Health providers can place confidence in the two different CAHPS-CC translations.

Keywords: Cultural competence, CAHPS®, Spanish, translation, ethnicity, measurement equivalence

Culturally competent care refers to the capacity of healthcare providers at various levels to engage with patients in a safe, patient and family centered, evidence-based, and equitable manner.1 Given the increasing size of the Spanish speaking Hispanic population in the US,2 the importance of delivering culturally competent care to this population,3-8 and the importance of the patient’s perspective,9 it seems self-evident that stakeholders need reliable and valid measures of patients’ experiences with culturally competent care. Yet, until recently, few tools have existed to do this, especially for Spanish speaking patients.

In response, a team of investigators developed a new measure of patients’ experiences with culturally competent care.10 The Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Cultural Competence Survey (CAHPS-CC) assesses 8 domains of culturally competent care: Doctor Communication-Positive Behaviors; Doctor Communication-Negative Behaviors; Doctor Communication-Health Promotion; Doctor Communication-Alternative Medicine; Shared Decision Making; Equitable Treatment; Trust; and Access to Interpreter Services. Another paper provides support for the reliability and validity of this survey among patients generally.10 However, research has not yet examined whether responses to the CAHPS-CC item set provide equivalently reliable and valid measurement across patients responding in English and Spanish.

Measurement bias refers to the possibility that two people who have had equivalent experiences with culturally competent care nevertheless answer questions about their experiences differently depending on whether they respond in English or Spanish.11 Without establishing equivalent measurement, the field cannot discern whether differences in reports of care between English and Spanish speakers result from different care experiences or from differences in the way the groups respond. Multiple group confirmatory factor analysis (MG-CFA) provides a powerful method for evaluating bias.12-14 Thus, we used MG-CFA to examine potential measurement bias on the CAHPS-CC across English and Spanish survey versions.
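The definition above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the source: Y is the response to an item, η is the respondent's underlying experience with culturally competent care, and G is the survey language. Under this sketch, an item is free of measurement bias when the response distribution depends on η alone:

```latex
% Assumed notation: Y = item response, \eta = latent care experience,
% G = survey language (English or Spanish).
P(Y = y \mid \eta, G = \text{English}) \;=\; P(Y = y \mid \eta, G = \text{Spanish})
\qquad \text{for all } y \text{ and all } \eta .
```

Any item for which this equality fails at some level of η would show measurement bias in the sense used here.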



Using administrative data provided by the states’ plans, we used a stratified random sampling design (based on race/ethnicity and language) to select 3,200 (New York) and 2,800 (California) adults (18-65 years old). The initial sampling frame consisted of: 1,200 White English speakers, 1,200 Black English speakers, 900 Hispanic English speakers, 900 Hispanic Spanish speakers, 900 Asian English speakers, and 900 bilingual Asian speakers. All communication with the bilingual Asian group occurred in English, because cost restrictions and the number of Asian languages prevented us from developing separate Asian language surveys.
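As a rough illustration of the sampling design described above, the sketch below draws a simple random sample within each race/ethnicity and language stratum. The stratum sizes come from the text; the enrollee IDs, the frame size of 5,000 per stratum, and the function name are hypothetical. The split of each stratum between the two states is not reported, so it is not modeled.

```python
import random

# Stratum sizes as reported in the text (they sum to 6,000).
STRATA = {
    "White English": 1200,
    "Black English": 1200,
    "Hispanic English": 900,
    "Hispanic Spanish": 900,
    "Asian English": 900,
    "Asian bilingual": 900,
}

def stratified_sample(frame, strata, seed=0):
    """Draw a simple random sample of the requested size within each stratum.

    frame:  dict mapping stratum name -> list of enrollee IDs (hypothetical).
    strata: dict mapping stratum name -> number of enrollees to draw.
    """
    rng = random.Random(seed)
    return {name: rng.sample(frame[name], size) for name, size in strata.items()}

# Hypothetical administrative frame: 5,000 made-up IDs per stratum.
frame = {name: [f"{name}-{i}" for i in range(5000)] for name in STRATA}

sample = stratified_sample(frame, STRATA)
total = sum(len(ids) for ids in sample.values())
```

Sampling without replacement inside each stratum fixes the group sizes in advance, which is how a design like this can guarantee enough Spanish-speaking respondents for group comparisons.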

Data collection occurred in two waves: mailing and follow-up telephone interviews of non-respondents. The mailing included an English survey and a cover letter in English and Spanish. The letter directed Spanish speakers to call an 800 number to request a copy of the Spanish survey materials. Four weeks after the initial mailing, non-respondents received a second mailed survey packet. Telephone follow-ups in English and Spanish started 2 weeks after the second mailing. Remaining non-respondents after the second call attempt received a monetary incentive of $10 to complete the survey. In all, 1,380 individuals completed the survey for an overall response rate of 26%.

We used administrative data to compare responders and non-responders on gender, age, race/ethnicity, primary language, and health plan affiliation. Respondents were more likely to be White (24% versus 20%) and older (39 versus 36 years), and less likely to be Black (18% versus 22%). We observed no other significant differences. Note that using administrative data to compare respondents and non-respondents may have influenced our conclusions regarding non-response bias.

After excluding individuals who did not have a personal doctor or a doctor visit during the last 12 months, the final analytic sample comprised 964 respondents. Of these, 851 completed the survey in English and 113 completed it in Spanish. See Weech-Maldonado et al.10 for further methodological details.


Cultural Competency

The CAHPS Cultural Comparability team developed the CAHPS-CC by: 1) evaluating existing CAHPS surveys to identify existing items that addressed the domains; 2) conducting a literature review in order to identify existing items and instruments; 3) placing a Federal Register notice with a call for measures; 4) reviewing and adapting existing public domain measures; and 5) writing new survey items for each of the domains not addressed in 1 through 4. This resulted in a 49 item draft set. Two independent American Translators Association (ATA) certified translators then conducted two forward translations of the items into Spanish. Subsequently, a committee formed by the two translators and bilingual members of the Comparability team reviewed the translations and reconciled differences. Following translation, cognitive interviews15 were conducted. Lastly, psychometric analyses evaluated the CAHPS-CC in the sample overall.10,16,17

At development’s end, the CAHPS-CC consisted of 27 items addressing the extent to which an experience had occurred (rather than evaluating the experience). The items measured 8 constructs: Doctor Communication-Positive Behaviors, Doctor Communication-Negative Behaviors, Doctor Communication-Health Promotion, Doctor Communication-Alternative Medicine, Shared Decision Making, Equitable Treatment, Trust, and Access to Interpreter Services. One can view the entire item set at

Our analyses addressed the Doctor Communication-Positive Behaviors, Doctor Communication-Negative Behaviors, Doctor Communication-Health Promotion, Trust, and Equitable Treatment domains only. Too few individuals used interpreters to create a large enough sample to evaluate the Access to Interpreter Services domain. Additionally, the presence of some bivariate frequencies equal to 0 limited our ability to estimate the polychoric correlation matrix when including all of the remaining items. These “empty” cells occurred as a result of sparse responses in some items’ categories.18 Consistent with the literature, we collapsed categories for polytomous items with this problem19 and dropped dichotomous items that had this problem. This resolved all estimation problems, but limited our analyses to the five factors listed earlier (and their 19 total items).
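The empty-cell problem can be illustrated with a small sketch. The responses, the three-category coding, and the choice to merge the two lowest categories are all hypothetical; the source does not report which categories were collapsed. The idea is to cross-tabulate a pair of items, list the category pairs that never co-occur, and check whether recoding removes them.

```python
from collections import Counter
from itertools import product

def empty_cells(x, y, categories):
    """Return the (a, b) category pairs that never co-occur in paired responses."""
    seen = Counter(zip(x, y))
    return [(a, b) for a, b in product(categories, repeat=2) if seen[(a, b)] == 0]

def collapse(responses, mapping):
    """Recode responses with a category mapping, e.g. merging sparse categories."""
    return [mapping[r] for r in responses]

# Hypothetical paired responses to two 3-category items (1 = lowest, 3 = highest).
item_a = [3, 3, 2, 2, 1, 3, 2, 3]
item_b = [3, 2, 3, 2, 2, 3, 1, 3]

# Cells of the 3 x 3 cross-tabulation with a frequency of 0.
before = empty_cells(item_a, item_b, [1, 2, 3])

# Merge the sparse lowest category into its neighbor, then renumber to 1..2.
merge_low = {1: 1, 2: 1, 3: 2}
after = empty_cells(collapse(item_a, merge_low), collapse(item_b, merge_low), [1, 2])
```

In this toy data, three cells of the original cross-tabulation are empty, while every cell of the collapsed 2 x 2 table is observed. The cost is that the merged categories can no longer be distinguished, which is why collapsing limits what can be said about the full set of thresholds.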

Analytical Approach

Measurement bias

We probed for measurement bias following the method described by Millsap and Yun-Tein20 and Carle.12 To evaluate overall fit, we used fit index levels identified by the literature.21,22 Fit evaluation focused on the set of indices as a whole. We used the chi-square difference test (Δχ2) to test for bias. After identifying bias using this omnibus test, we used item level comparisons to identify the source of the bias.12 All analyses used Mplus (6.1),18 with its theta parameterization, robust weighted least squares estimator, and missing data estimation capability. Given the number of model tests and consistent with the literature, we used a more conservative alpha of 0.01 for all significance tests.12 We evaluated the impact of bias on substantive conclusions by comparing the pattern and size of mean differences from a model ignoring measurement bias to a model incorporating measurement bias, as described by Carle.12



Table 1 presents the analytic sample’s descriptive statistics.

Table 1
Descriptive Statistics across Race and Ethnicity and across Survey Language for Analytic Sample

Evaluating Measurement Bias

We initially tested a 5 factor model’s fit (Model 1) across the English and Spanish groups. This model fit well (RMSEA = 0.05, CFI = 0.98, TLI = 0.98). We then tested Model 2, which constrained the loadings to equality across groups. This model also fit well (RMSEA = 0.04, CFI = 0.99, TLI = 0.99) and the constraints did not result in statistically significant misfit (Δχ2 = 12.7, 13, n = 633, p = 0.23). This indicated no statistically significant bias in the loadings. We next examined bias in the thresholds. Thresholds give the level of the latent variable present before a respondent is more likely than not to respond in a given category. Model 3 constrained the thresholds to equality across the groups. Constraining the thresholds led to statistically significant misfit (Δχ2 = 138.6, 34, n = 964, p < 0.01), revealing bias in at least one threshold. Follow-up analyses indicated bias only in the thresholds of the “easy to understand instructions” item. The final partially invariant model relaxed the ill-fitting constraints. In summary, we found no differences in the loadings and differences in only one item’s thresholds. Table 2 presents the final partially invariant measurement model.
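To make the role of thresholds concrete, the sketch below computes response-category probabilities for a single ordinal item under a probit model. It assumes a latent response equal to the loading times the latent variable plus a standard normal residual, with the observed category given by the number of thresholds the latent response exceeds. All parameter values are hypothetical and are not the estimates in Table 2.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def category_probs(eta, loading, thresholds):
    """Response-category probabilities for one ordinal item (probit link).

    Assumes a latent response y* = loading * eta + e with e ~ N(0, 1);
    the observed category is the number of (increasing) thresholds y* exceeds.
    """
    # P(y >= k) for k = 0..K: 1, then Phi(loading * eta - tau_k), then 0.
    at_least = [1.0] + [normal_cdf(loading * eta - t) for t in thresholds] + [0.0]
    return [at_least[k] - at_least[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical parameters: equal loadings, but a higher top threshold in Spanish.
loading = 1.0
tau_english = [-1.0, 0.5]
tau_spanish = [-1.0, 1.0]

eta = 0.5  # the same latent experience in both groups
p_english = category_probs(eta, loading, tau_english)
p_spanish = category_probs(eta, loading, tau_spanish)
```

With these made-up values, an English respondent at eta = 0.5 sits exactly at the top threshold and picks the highest category with probability 0.5, while a Spanish respondent with the same latent experience does so with probability of about 0.31. A gap of this kind, at equal eta, is what threshold bias means.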

Table 2
Final Partially Invariant Measurement Model.

Evaluating the Influence of Measurement Bias

Statistically significant measurement bias may not substantively influence scores.23,24 To evaluate the influence of bias, we compared model-based estimates from the final partially invariant measurement model incorporating measurement differences to estimates from a model ignoring bias. Any differences in the pattern of mean differences would indicate influence. For example, English respondents had a mean of 0 on each factor (for statistical identification). Thus, we could first evaluate whether the means for each factor differed between groups by examining whether the Spanish respondents’ means differed significantly from 0. If we observed differences, we could then examine changes (if any) in these differences across the models. None of the means for Spanish respondents (Doctor Communication-Positive = -0.062, z = -0.565; Doctor Communication-Negative = -0.052, z = -0.278; Health Promotion = -0.092, z = -0.609; Trust = 0.234, z = 2.107; and Equitable Treatment = -0.137, z = -0.393) showed statistically significant differences relative to English respondents. Under the model adjusting for bias, we also observed no statistically significant mean differences, providing support for the hypothesis that bias does not substantially influence mean-based conclusions for these factors.
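The significance decisions above can be checked with a short sketch. It assumes the reported z values are Wald statistics referred to a standard normal distribution and tested two-sided, which the text does not state explicitly; the alpha of 0.01 is the level given in the analytical approach.

```python
from math import erfc, sqrt

ALPHA = 0.01  # significance level stated in the analytical approach

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2.0))

# z statistics for the Spanish group's factor means, as reported in the text.
z_values = {
    "Doctor Communication-Positive": -0.565,
    "Doctor Communication-Negative": -0.278,
    "Health Promotion": -0.609,
    "Trust": 2.107,
    "Equitable Treatment": -0.393,
}

p_values = {factor: two_sided_p(z) for factor, z in z_values.items()}
significant = [factor for factor, p in p_values.items() if p < ALPHA]
```

Under these assumptions no factor reaches significance at 0.01. The Trust difference gives a p-value of roughly 0.035, so it would count as significant at the conventional 0.05 level but not at the more conservative level adopted here.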


In this study, we investigated whether the CAHPS-CC provides sufficiently equivalent measurement across individuals responding in English and Spanish. Despite best efforts at survey translation, the possibility exists that two people with equivalent cultural competence experiences who answered the CAHPS-CC in different languages may have responded to questions about their experiences differently. Our results indicate that the CAHPS-CC has equivalent measurement properties across individuals responding in English and Spanish for the domains included in our analyses.

We used MG-CFA and probed for bias across language (Spanish and English) in a sample of Medicaid patients in New York and California. Though we found some statistically significant measurement bias, further analyses demonstrated that the observed bias did not influence mean-based comparative conclusions across language when using the CAHPS-CC. These findings highlight the importance of evaluating whether measurement bias exists and whether any observed, statistically significant bias substantively influences decisions.

These findings support the use of the CAHPS-CC to measure patients’ experiences with culturally competent care across Spanish and English speaking patients. Scores on the measure correspond to and estimate the underlying CAHPS-CC constructs similarly whether or not patients answer in Spanish or English. Patients’ reports should have similar reliability across responses in either language and mean-based estimates should correspond to similar levels of the domain across English and Spanish respondents.

Before concluding, we note some study limitations. Due to sparse categories and relatively small within group sample sizes, we had to collapse some item categories and drop three domains (Shared Decision Making, Alternative Medicine, and Access to Interpreter Services). Therefore we could not examine bias in the full set of thresholds or for all of the domains. Also, our data came from a sample of Medicaid managed care enrollees in two states. New York and California’s Medicaid populations may not generalize to the full Medicaid population. Additionally, we only investigated bias across language using Medicaid patients; our findings may not generalize to other populations. Likewise, we did not have measures of other potentially relevant variables (e.g., income, language ability) that might have influenced our results. Moreover, due to sample size restrictions, we could not further split our groups to examine additional variables for which we did have measures (e.g., race and ethnicity). Future research in larger, more diverse samples can address all of these issues.

Summarily, we used MG-CFA to examine whether measurement bias influences conclusions based on the patients’ responses to the CAHPS-CC depending on whether they answer the survey in Spanish or English. Though we found some statistically significant measurement bias, our analyses demonstrated that this measurement bias does not substantively influence mean-based conclusions based on patients’ responses. CAHPS-CC users can place confidence in efforts to compare the cultural competence experiences of English and Spanish speakers using the CAHPS-CC on the studied domains.


This project has been funded in part by Commonwealth Fund Grant # 2006627. Robert Weech-Maldonado was supported in part by the UAB Center of Excellence in Comparative Effectiveness for Eliminating Disparities (CERED), NIH/NCMHD Grant 3P60MD000502-08S1. National Institute of Nursing Research grant R15NR10631 supported Adam Carle in part.

Contributor Information

Adam C. Carle, Assistant Professor of Pediatrics, University of Cincinnati School of Medicine, Cincinnati Children’s Hospital Medical Center, Assistant Professor of Psychology, University of Cincinnati College of Arts and Sciences, 3333 Burnet Avenue, MLC 7014, Cincinnati, OH 45229, Office Phone: (513) 803-1650, Fax: (513) 636-0171.

Robert Weech-Maldonado, Department of Health Services Administration, University of Alabama at Birmingham, 1675 University Boulevard, 520 Webb, Birmingham, AL 35294, Phone: (205) 996-5838, Fax: (205) 975-6608, rweech@uab.edu.


1. National Quality Forum. Endorsing a Framework and Preferred Practices for Measuring and Reporting Culturally Competent Care Quality. Washington DC: 2008.
2. US Census Bureau. Annual Estimates of the Population by Sex, Race, and Hispanic Origin for the United States: April 1, 2000 to July 1, 2007 (NC-EST2007-03). 2008.
3. Lambert BL, Street RL, Cegala DJ, Smith DH, Kurtz S, Schofield T. Provider-patient communication, patient-centered care, and the mangle of practice. Health communication. 1997;9(1):27–43.
4. McWhinney I. The need for a transformed clinical method. Communicating with medical patients. 1989;9:25–40.
5. Ngo-Metzger Q, Telfair J, Sorkin D, et al. Cultural Competency and Quality of Care: Obtaining the Patient’s Perspective. New York, NY: Commonwealth Fund; 2006.
6. Weech-Maldonado R, Morales LS, Elliott M, Spritzer K, Marshall G, Hays RD. Race/ethnicity, language, and patients’ assessments of care in Medicaid managed care. Health services research. 2003;38(3):789–808.
7. Weech-Maldonado R, Dreachslin J, Dansky K, De Souza G, Gatto M. Racial/ethnic diversity management and cultural competency: the case of Pennsylvania hospitals. Journal of healthcare management/American College of Healthcare Executives. 47(2):111.
8. Nápoles-Springer AM, Santoyo J, Houston K, Pérez-Stable EJ, Stewart AL. Patients’ perceptions of cultural factors affecting the quality of their medical encounters. Health Expectations. 2005;8(1):4–17.
9. Stewart AL, Nápoles-Springer AM. Advancing health disparities research: can we afford to ignore measurement issues? Medical care. 2003;41(11):1207–1220.
10. Weech-Maldonado R, Carle AC, Weidmer B, Ngo-Metzger Q, Hays RD. Working Paper. Department of Health Services Administration, University of Alabama at Birmingham; 2010. Assessing Cultural Competency from the Patient’s Perspective: The CAHPS Cultural Competency (CC) Item Set.
11. Mellenbergh GJ. Item bias and item response theory. International Journal of Educational Research. 1989;13:127–143.
12. Carle A. Mitigating systematic measurement error in comparative effectiveness research in heterogeneous populations. Medical Care. 2010;48(6):S68.
13. Carle AC. Assessing the adequacy of self-reported alcohol abuse measurement across time and ethnicity: cross-cultural equivalence across Hispanics and Caucasians in 1992, non-equivalence in 2001–2002. BMC Public Health. 2009;9:60.
14. Carle AC. Tolerating Inadequate Alcohol Dependence Measurement: Cross-cultural Invalidity of Alcohol Dependence across Hispanics and Caucasians in 2001 and 2002. Addictive Behaviors. 2009;34:43–50.
15. Willis G. Cognitive interviewing: a tool for improving questionnaire design. Sage Publications, Inc; 2005.
16. Weech-Maldonado R, Carle AC, Weidmer B, Hurtado M, Ngo-Metzger Q, Hays RD. The Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Cultural Competence (CC) Item Set. Medical Care. In Press.
17. Carle AC, Weech-Maldonado R. Evaluating Measurement Equivalence across Race and Ethnicity on the CAHPS® Cultural Competence Survey. Medical Care. In Press.
18. Muthén LK, Muthén BO. Mplus User’s Guide. Los Angeles, CA: Muthén & Muthén; 2009.
19. Crane PK, Gibbons LE, Jolley L, van Belle G. Differential Item Functioning Analysis With Ordinal Logistic Regression Techniques: DIFdetect and difwithpar. Medical Care. Special Issue: Measurement in a multi-ethnic society. 2006;44(11, Suppl 3):S115–S123.
20. Millsap RE, Yun-Tein J. Assessing factorial invariance in ordered-categorical measures. Journal of Multivariate Behavioral Research. 2004;39:479–515.
21. Hu L, Bentler P. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal. 1999;6(1):1–55.
22. Hu L, Bentler PM. Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological methods. 1998;3(4):424–453.
23. Millsap RE. Statistical Approaches to Measurement Invariance. Routledge; 2011.
24. Millsap RE, Kwok O-M. Evaluating the Impact of Partial Factorial Invariance on Selection in Two Populations. Psychological methods. 2004;9(1):93–115.