To assess the reliability and validity of a translated version of the American Hospital-level Consumer Assessment of Health Plans Survey® (H-CAHPS) instrument for use in Dutch health care.
Primary survey data from adults aged 18 years or more who were recently discharged from two multispecialty city hospitals in the Netherlands.
We used forward and backward translation procedures and a panel of experts to adapt the 66-item pilot H-CAHPS into a 70-item Dutch instrument. Descriptive statistics and standard psychometric methods were then used to test the reliability and validity of the new instrument.
From late November 2003 to early January 2004, the survey was administered by mail to 1,996 patients discharged within the previous 2 months.
Analyses supported the reliability and validity of the following 7-factor H-CAHPS structure for use in Dutch hospitals: doctor's communication, nurses' communication, discharge information, communication about medication, pain control, physical environment of hospital, and nursing services. The internal consistency reliability of the scales ranged from 0.60 to 0.88. Items related to “family receiving help when on visit,” “hospital staff introducing self,” and “admission delays” did not improve the psychometric properties of the new instrument.
These findings suggest that the H-CAHPS instrument is reliable and valid for use in the Dutch context. However, more research will be needed to support its equivalence to the United States version, and its use for between-hospital comparisons.
As in many advanced countries, current health care reforms that emphasize accountability, transparency, choice, and performance improvement encourage Dutch patients to become critical consumers (Ministry of Health, Welfare and Sport 2002; Ministerie van Volksgezondheid, Welzijn en Sport 2004; ten Asbroek et al. 2004). Health care seekers need all the help they can get to identify appropriate and high quality health care providers, insurers, and institutions (Iezzoni 2002). Surveys that assess patients' health care experiences are increasingly being used by consumers and insurers to choose and contract high performers, as well as to hold them accountable. Several Dutch instruments have been previously developed to capture generic, group-specific (e.g., elderly), or disease-specific (e.g., HIV) health care experiences, satisfaction, opinions, and preferences (Sixma et al. 1998, 2000; Sixma, Spreeuwenberg, and van der Pasch 1998; Hendriks et al. 2000, 2002; Jansen, Hutten, and Spreeuwenberg 2002; Hekkink et al. 2003). The growing role of cross-national learning and comparisons of health systems performance (Arah et al. 2003) implies that there are opportunities for standardization of survey instruments and for international comparisons.
After an extensive literature review, the United States Consumer Assessment of Health Plans Survey (CAHPS®) instruments were selected for adaptation and use in Dutch pilot studies. The Dutch health care system is seeking a family of related and standardized instruments such as the CAHPS® products for multiple (choice, purchasing, and performance improvement) purposes. The CAHPS® products are well-established in the American context (Hays et al. 1999; Spranca et al. 2000; Goldstein and Fyock 2001; Morales et al. 2001; Farley et al. 2002; Hargraves, Hays, and Cleary 2003; CAHPS® Survey Users Network [SUN] 2004; Solomon et al. 2005). Funded by the Centers for Medicare and Medicaid Services in partnership with the Agency for Healthcare Research and Quality in the United States (U.S.), the Hospital CAHPS® (H-CAHPS) instrument was recently developed and field tested to provide comparative information on hospitals and to aid informed choices by patients (CAHPS Survey Users Network 2004; Centers for Medicare and Medicaid Services 2004). During the pilot study in the U.S. in 2003, we received and adapted the pilot version of the H-CAHPS for use in Dutch hospitals. This study is part of concerted efforts, initially spearheaded by a large Dutch health insurer, to import a family of CAHPS® instruments, including ambulatory care (such as the health plan) surveys and facility surveys (such as the H-CAHPS), for measuring health care experiences from the perspective of Dutch patients and consumers, and to use the resulting information for their health care purchasing purposes.1 The aim of this study is to report on the reliability and validity of the H-CAHPS in Dutch hospitals.
We used the 2003 U.S. pilot version of the H-CAHPS: a 66-item instrument which contained 33 core items on patient experiences, three global ratings (of nurses, doctor, and the hospital), one item on the likelihood to recommend hospital to friends and family, and several items on patient characteristics. Twenty-seven of the core items on patient experiences were evaluated on a 1-to-4 response scale, where 1 referred to “never,” 2 “sometimes,” 3 “usually,” and 4 “always.” The remaining six core items had a dichotomous (yes–no) response scale. The global ratings had a 0–10 response scale with only the endpoints labeled (e.g., 0 was “worst hospital possible” and 10 was “best hospital possible”). The one question on patient's likelihood to recommend the hospital to friends and family had the following response scale: 1 “definitely no,” 2 “probably no,” 3 “probably yes,” and 4 “definitely yes.” The instrument asked respondents to think about their last stay at the specified and confirmed hospital. The development of the H-CAHPS through extensive systematic literature review, consumer focus groups, public response to Federal Register notice, stakeholder input, cognitive testing, and a 3-state pilot test in New York, Arizona, and Maryland is detailed elsewhere (The CAHPS® II Investigators and the Agency for Healthcare Research and Quality 2003).
Mindful of the difficulties and equivalence issues involved in transplanting survey instruments from one culture to another (Weidmer, Brown, and Garcia 1999; Streiner and Norman 2003), we had H-CAHPS translated into Dutch by two independent professional translators, and subsequently backtranslated into English by two other independent translators who had never seen the original U.S. version. A panel of seven researchers experienced in patient surveys or survey instrument development was asked to choose a mix of the two versions that came closest to the original U.S. instrument and used clear, comprehensible language. A few adaptations were also made to fit the Dutch context as follows: the two U.S. questions on Hispanic descent and race were replaced by three items on birth place of respondent, mother, and father; and three additional items on getting verbal information on activity limitations (Q47), getting help when home (Q50), and taking new medication at home (Q53). The new Dutch instrument totaled 70 items which included 35 core items.
The study sites were two multispecialty city hospitals with 555 and 386 beds, and 15,761 and 12,606 admissions, respectively, in 2003. Assuming a possible 50 percent response rate, a random sample of 998 patients per hospital was drawn to reflect the hospital's patient population aged 18 years and above and discharged from admission (lasting at least one night) within the previous 2 months. The survey was administered by mail from late November 2003 to early January 2004. Nonrespondents were followed up with a postcard 1 week later, a second questionnaire 3 weeks later, and a reminder letter 5 weeks later. Each mailing pack included a stamped addressed envelope and a cover letter using each hospital's letterhead, endorsed by the Dutch Patients and Consumers Federation, and guaranteeing confidentiality.
We conducted several analyses to assess response and psychometric properties of the instrument. The respondents and nonrespondents could only be compared on age and sex because of Dutch privacy laws. We assessed the percentage of valid and invalid response/skip patterns, and the percentage of responses within the minimum and maximum response categories for each item. Using principal component analysis (with oblique rotation) we explored the factor structure. Principal factor analysis would yield similar results with 30 or more variables exhibiting high communalities as was the case here (Stevens 1992). If an item loaded across multiple factors, it was assigned to the factor where it had the highest loading and/or subsequently maximized the internal consistency.
The internal consistency reliability of the scales was estimated using Cronbach's α (Cronbach 1951), where an α value of 0.70 or more was considered satisfactory. Item-total scale correlations were also calculated, correcting for item overlap, to check for homogeneity of the simple-summated scales created from items that loaded strongly on the factors (Streiner and Norman 2003). Each of these correlations was also checked to see if it was greater than the correlation of each item with scales other than its own. Analyses using imputed values derived from a multivariate normal model (as implemented in SAS PROC MI, SAS Version 9.1) to replace missing data yielded similar results to the ones from the pairwise deletion methods reported here.
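The two reliability checks described above can be illustrated with a minimal sketch. It assumes the standard formula for Cronbach's α, α = k/(k − 1) × (1 − Σ item variances / variance of the summed scale), and computes a corrected item-total correlation by correlating an item with the sum of the remaining items in its scale. The function names and the small response matrix are hypothetical and are not data from this study; the analyses reported here were run in SPSS.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """items: list of item columns, each holding scores from the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("a scale needs at least two items")
    totals = [sum(row) for row in zip(*items)]           # simple-summated scale
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def corrected_item_total(items, index):
    """Correlate one item with the sum of the other items (item overlap removed)."""
    rest = [sum(s for j, s in enumerate(row) if j != index) for row in zip(*items)]
    return pearson(items[index], rest)


# Hypothetical 1-to-4 responses ("never" to "always") from six respondents.
example_items = [
    [4, 3, 4, 2, 1, 3],
    [4, 3, 3, 2, 2, 4],
    [3, 4, 4, 1, 2, 3],
]

alpha = cronbach_alpha(example_items)
print(f"alpha = {alpha:.2f}, satisfactory = {alpha >= 0.70}")
print(f"corrected item-total r (item 1) = {corrected_item_total(example_items, 0):.2f}")
```

In this sketch a scale would be flagged when α falls below 0.70 or when a corrected item-total correlation falls below the 0.40 cut-off used in the Results.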
Using Pearson product moment correlations (for the ratings) and Spearman rank-order correlations (for the items), we also evaluated associations between (a) items, (b) simple-summated scales from the extracted factors, (c) scales and the global ratings (of nurse, doctor, and hospital) and the likelihood to recommend hospital. The interscale correlations above were to give further insights into the interpretability of the constructed factors as separate scales (if correlations were less than 0.70, a value that follows a similar logic as the scale reliability threshold), thus supporting the multidimensionality of the questionnaire (Carey and Seibert 1993). The correlations between scales and the global ratings were also evaluated using adjusted regression models.
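As an illustration of the correlation checks described above, the sketch below computes a Spearman rank-order correlation as the Pearson correlation of average ranks (so tied responses share a rank) and applies the 0.70 threshold to interscale correlations. The helper names and the item responses are hypothetical; the only values taken from this study are the lowest and highest interscale correlations reported in the Results (0.22 and 0.68).

```python
from statistics import mean


def pearson(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1     # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation, suited to ordinal 1-to-4 item responses."""
    return pearson(average_ranks(x), average_ranks(y))


def reads_as_separate_scales(interscale_r, threshold=0.70):
    """Two scales are treated as distinct when their correlation is below the threshold."""
    return abs(interscale_r) < threshold


# Hypothetical responses to two items from the same six respondents.
item_a = [4, 3, 4, 2, 1, 3]
item_b = [4, 4, 3, 2, 2, 3]
print(f"Spearman rho = {spearman(item_a, item_b):.2f}")

# Lowest and highest interscale correlations reported in the Results.
print(all(reads_as_separate_scales(r) for r in (0.22, 0.68)))
```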
Finally, we evaluated the associations between respondent characteristics and the three global ratings, using multiple linear regression models specified for all respondents, and separately for respondents who were admitted for surgical, childbirth, or other medical reasons. We also looked at the effects of respondent characteristics on patient experience composites, in order to aid future analysis of between-hospital differences (not reported here). All analyses were conducted using the Statistical Package for the Social Sciences (SPSS Inc. 2004).
The survey response rates were 63 and 57 percent for hospitals A and B, respectively (averaging 60 percent, n = 1,194). The respondents were not significantly older than the nonrespondents (mean age 53.2 versus 52.3 years, p = .32). However, there were significantly more females among respondents than among nonrespondents (64 versus 61 percent, p = .03). Table 1 summarizes the respondents' characteristics.
The percentage of cases within the minimum response category value ranged from 0.4 percent for item Q4 (on how often nurses were respectful) to 66 percent for the dichotomous response item Q49 (on getting written information on problems to watch out for after discharge) (Table 2). For those within the maximum response category, the percentage ranged from 25 percent for item Q41 on being told side-effects of new medicine to 76 percent for item Q53 on getting verbal information on taking medicine at home. An analysis of the response and skip patterns revealed that items were about 92 percent to more than 99 percent consistently completed.
Principal components analysis with oblique rotation yielded seven factors with eigenvalues greater than 1, and which explained 60 percent of the total variance among the items covering patient experiences. The scree test/plot and estimated communalities also lent support to the 7-factor structure. Only the primary factor loadings based on the pattern matrix are presented in Table 2. We dropped items Q27 (family receiving help when visiting), Q28 (hospital staff introducing themselves), and Q43 (admission delay) because they exhibited poor or unclear (difficult to interpret) factor loadings, and did not subsequently improve the internal consistency of the instrument. The remaining 32 items formed the following scales/composites: (1) doctor's communication, (2) nurses' communication, (3) discharge information, (4) communication about medication, (5) pain control, (6) physical environment of hospital, and (7) nursing assistance services. With the exception of item Q35, all items used in the scales exhibited factor loadings approximating or exceeding 0.40.
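The eigenvalue rule used above can be sketched in a few lines. The sketch assumes the components are extracted from a correlation matrix, so that the eigenvalues sum to the number of items and the share of variance explained is the sum of the retained eigenvalues divided by that total. The eigenvalues below are hypothetical values for a six-item example, not the ones obtained in this study, and the function names are illustrative.

```python
def kaiser_retained(eigenvalues):
    """Return the eigenvalues greater than 1, largest first."""
    return sorted((ev for ev in eigenvalues if ev > 1.0), reverse=True)


def share_of_variance(eigenvalues, retained):
    """Proportion of total variance captured by the retained components."""
    return sum(retained) / sum(eigenvalues)


# Hypothetical eigenvalues for a 6-item correlation matrix (they sum to 6).
example_eigenvalues = [2.4, 1.3, 0.9, 0.6, 0.5, 0.3]

retained = kaiser_retained(example_eigenvalues)
print(f"components retained: {len(retained)}")
print(f"variance explained: {share_of_variance(example_eigenvalues, retained):.0%}")
```

A scree plot of the same eigenvalues, as used in this study, offers a second check on the number of components to keep.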
The internal consistency reliability coefficient was greater than 0.70 for six of the seven factors (Table 2). The scale “physical environment” was the exception with a Cronbach's α of 0.60. Except for item Q35, none of the Cronbach's αs (with the item deleted) exceeded the α for the total scale. The corrected item-total scale correlations ranged from 0.33 (item Q35: medical tests without pain) to 0.79 (item Q12: doctor listening carefully). On approximation only item Q35 did not meet the cut-off value of 0.40 for item-total scale correlations (Nunnally 1978).
Interitem correlations revealed that the similar item-pairs of Q46/47, Q49/50, and Q52/53 which detailed both written and verbal discharge information types found only in the Dutch instrument had moderate-to-low positive correlations of 0.40, 0.42, and 0.32, respectively. Interscale correlations ranged from 0.22 for nursing services and discharge information to 0.68 for nurses' communication and doctor's communication (Table 3). As the interscale correlations were all less than 0.70, the scales could be read as separate scales.
The seven scales were all significantly correlated with the three global ratings and the likelihood to recommend the hospital (Table 4). Nurses' communication exhibited the strongest significant association with nurses' global rating (r = 0.72), hospital rating (r = 0.62), and the likelihood to recommend the hospital (ρ = 0.59), with all p < .01. Doctor's communication had the strongest correlation with the global rating of the doctor, but ranked second for the global rating of the hospital and for the likelihood to recommend the hospital. The scale least associated with any global rating or with recommending the hospital was discharge information, followed by communication about medication (except for nurses' rating).
Finally, as the survey instrument was aimed at providing information on the performance of each hospital corrected for its patients' characteristics, we ran multivariable regression models to determine the relationship between individual respondent characteristics and the global ratings (Table 5). We found that, overall, older age was significantly associated with higher ratings. Gender had mixed, but mostly nonsignificant, associations with the ratings by patients admitted for reasons other than childbirth (the models being corrected for age, education, general health status, and mental health status). Education also showed mixed results: higher education was nonsignificantly associated with lower ratings by surgical and medical patients, but significantly associated with higher ratings for childbirth cases (β = 0.17, p < .05). Furthermore, poorer general health status was significantly related to lower global ratings by surgical (β ranging from −0.26 to −0.17, p < .001 or p < .05) and medical patients (β ranging from −0.15 to −0.12, p < .05). Mental health status exhibited no substantial associations with the global ratings given by any group of patients in the corrected models. Consequently, only age and general health status consistently appeared to contribute to between-hospital variations in patient experiences and ratings.
Patient experiences of care are an important data source for evaluating the functioning of the health care system (Cleary and Edgman-Levitan 1997; Cleary 1999). As in the U.S., patient experience data are used for various purposes in the Netherlands. Recent policy reforms in the Netherlands encourage the use of experience data to provide patients/consumers and insurers with information on which to base their choice and contracting of providers and institutions within a regulated health care market (Ministry of Health, Welfare and Sport 2002; Ministerie van Volksgezondheid, Welzijn en Sport 2004; ten Asbroek et al. 2004). Capturing inpatient care experiences augments the technical performance indicators being developed in the Dutch health care system. Moreover, Dutch insurers are looking to use valid performance information from the patients' perspective to guide their hospital contracting, thus potentially widening the scope for the use of patient experience data. In this paper, we presented the initial psychometric properties of the pilot H-CAHPS in the Netherlands.
The items in the pilot instrument could be grouped into the following seven scales or composites: doctor's communication, nurses' communication, discharge information, communication about medication, pain control, physical environment of hospital, and nursing services. These are similar to the ones identified in the U.S. pilot H-CAHPS study (The CAHPS® II Investigators and the Agency for Healthcare Research and Quality 2003; CAHPS Survey Users Network 2004). The factor structure also reflects similar constructs as nursing care, doctor's care, information, hotel care, and discharge seen in existing Dutch instruments (Sixma et al. 1998; Hendriks et al. 2001, 2002; Jansen, Hutten, and Spreeuwenberg 2002). The factors reflect domains of hospital care experiences that patients may find important. Our results further suggest that these constructs may transcend cultural and contextual differences between Dutch and American patients in their understanding and interpretation of hospital care experiences. Additional studies including in-depth cognitive and confirmatory factor analyses will be needed to understand if this is really the case.
Items Q27 (family receiving help when visiting), Q28 (hospital staff introducing themselves), and Q43 (admission delay) had ambiguous loadings on the main factors, and this also appeared to be the case with the U.S. pilot results (The CAHPS® II Investigators and the Agency for Healthcare Research and Quality 2003), thus suggesting that they could be eliminated from the postpilot questionnaire and further analyses. Item Q35 (on medical tests without pain) is also problematic since it has low factor loadings, and did not improve the reliability coefficient of its corresponding scale.
The internal consistency reliability analysis showed that all but one of the seven factors had satisfactory coefficients greater than 0.70. The factor on physical environment had a low consistency coefficient of 0.60 as well as low corrected item-total correlations. This may be because of the low importance attached to this construct by the respondents, as shown in a previous Dutch study where patients were asked to rate which aspects of hospital performance they considered important (Jansen, Hutten, and Spreeuwenberg 2002). Therefore, given that many surveys have historically asked about amenities such as food, parking, and the physical environment, it may be useful to investigate critically the role that communication with providers plays in patient evaluations of hospital care.
Bivariate correlations between scales and between scales and global ratings supported the multifactor structure of the instrument, despite the moderately high correlations among doctor's and nurses' communication and nursing services. Moreover, they reflected the relatively high importance of nurses' and doctor's communication and pain management to patients' ratings of nurses, doctor, and hospital, and the likelihood to recommend the hospital. Discharge information, communication about medication, and the physical environment were less important in influencing patients' ratings. These findings suggest that scales closest to interpersonal care (Donabedian 1980) issues may trump the relative values of technical and environmental issues such as hospital comfort in the face of ill-health. Interestingly, as in the U.S. (The CAHPS® II Investigators and the Agency for Healthcare Research and Quality 2003) and in previous studies (Abramowitz, Cote, and Berry 1987; Rubin 1990), nurses' communication featured rather prominently in influencing global ratings and likelihood to recommend the hospital. This would be expected given the frequency and nature of nurse–patient interactions. Nevertheless, primary care research indicates that there may be no substantial differences in patient satisfaction with care provided by different types of practitioners (Roblin et al. 2004).
As this patient experience survey instrument was aimed at providing information on a specific hospital's performance, it was important that the resulting information reflected the performance of the hospital, not differences arising from its patients' characteristics. Our analysis suggested that age and self-reported general health status were among the most important patient characteristics to adjust for when reporting hospital performance (Hall and Dornan 1990; Cleary et al. 1992; Hall, Milburn, and Epstein 1993; Hargraves et al. 2001). Our analyses also supported the well-known trends in patient survey literature: higher ratings were seen among older patients, those with higher self-reported general health status (Hargraves et al. 2001) and the less educated (Hall and Dornan 1990), although the last association was not significant here. Female patients sometimes gave lower ratings (also not significant here) (Hargraves et al. 2001).
Although we have presented some good psychometric results of the new instrument to support its reliability and validity, the study had several limitations. First, only two hospitals were involved, which limited the potential for exploring hospital-level psychometric properties of the instrument. Second, for lack of resources, we were unable to conduct prior cognitive testing among patients to aid clarity and comprehension and to establish equivalence. Third, because of Dutch privacy restrictions, we could not thoroughly analyze nonresponse in this study. Fortunately, the second phase of the Dutch pilot is already underway to try to address most of these shortcomings using multiple hospital sites.
The H-CAHPS pilot instrument is reliable and valid for use within the Dutch context. Initial evaluation suggests that the American H-CAHPS constructs are not alien to the Dutch situation. In the future, it would be beneficial to pool the U.S. and Dutch data for further psychometric and performance analyses. This is necessary if equivalence and comparability are to be explored in-depth. This study contributes to the reform of the Dutch health care system aimed at increasing its transparency and performance.
The following supplementary material is available for this article online:
Pilot Core Items and Differences between the U.S. and Dutch Versions
We thank the participants of the AcademyHealth's 2004 Annual Research Meeting held in San Diego, California, and of the International Society for Quality in Health Care's 2004 annual conference held in Amsterdam for their useful comments on earlier abstracts of this work. We also thank the RAND Graduate School in Santa Monica, California for their invitation to present and discuss the project on which this paper is based. We appreciate the support of Dr. Charles Darby (Agency for Healthcare Research and Quality). This study was funded by Agis Zorgverzekeringen, the Netherlands. We are also grateful to the editors and two anonymous reviewers of the Journal, Health Services Research, for their helpful comments.
1This is addressed in “Delnoij DMJ et al. Made in the USA: the import of the American Consumer Assessment of Health Plans Study surveys into the Dutch insurance system (submitted manuscript).”