To describe the systematic language translation and cross-cultural evaluation process that assessed the relevance of the Hospital Consumer Assessment of Healthcare Providers and Systems survey in five European countries prior to national data collection efforts.
An approach involving a systematic translation process, expert review by experienced researchers and a review by ‘patient’ experts involving the use of content validity indexing techniques with chance correction.
Five European countries where Dutch, Finnish, French, German, Greek, Italian and Polish are spoken.
‘Patient’ experts who had recently experienced a hospitalization in the participating country.
Content validity indexing with chance correction adjustment providing a quantifiable measure that evaluates the conceptual, contextual, content, semantic and technical equivalence of the instrument in relationship to the patient care experience.
All translations except two received ‘excellent’ ratings and no significant differences existed between scores for languages spoken in more than one country. Patient raters across all countries expressed different concerns about some of the demographic questions and their relevance for evaluating patient satisfaction. Removing demographic questions from the evaluation produced a significant improvement in the scale-level scores (P = .018). The cross-cultural evaluation process suggested that translations and content of the patient satisfaction survey were relevant across countries and languages.
The Hospital Consumer Assessment of Healthcare Providers and Systems survey is relevant to some European hospital systems and has the potential to produce internationally comparable patient satisfaction scores.
Across the globe, consumer groups, practitioners and governing agencies (e.g. ministries of health, regulatory boards, etc.) increasingly place patient satisfaction with hospital care as a priority outcome for health system performance. Many researchers have designed instruments to measure patient satisfaction that are specific to a country's health system or individual hospital, with most countries having a standard set of questions on the topic [2–12]. Survey question length and content can vary widely; therefore, comparisons between countries can be challenging [2, 12]. The Picker Institute, for example, conducted some of the first comparative studies of patient satisfaction in Europe and produced some standardized results [13–15]. Interpersonal care processes also influence patients' perceptions of how satisfied they are with their hospital experience, particularly in a country's cultural minorities [4, 6, 10–12]. Other studies have cited factors related to healthcare personnel as key influences in patient satisfaction scores [8, 9, 16–18]. Yet, in order to compare performance across health systems, standardized and comparable measures of patient satisfaction are necessary.
The RN4CAST project is a 12-country (Belgium, England, Finland, Germany, Greece, Ireland, The Netherlands, Norway, Poland, Spain, Sweden and Switzerland) comparative nursing workforce study funded by the Seventh Framework Programme of the European Commission aimed at developing innovative forecasting methods for developing and sustaining the nursing workforce. Researchers from the USA also participated in the study under separate funding mechanisms. One goal of the study was to examine if there was a relationship between patient satisfaction and the nursing workforce. Eight of the 12 countries agreed to collect patient satisfaction data as one of the outcomes sensitive to the performance of the nursing workforce. A previously tested instrument for comparing patient satisfaction in Europe, however, was not available to the study's team.
Therefore, to standardize the measurement of patient satisfaction with the hospitalization experience, the principal investigators proposed that the study use the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey. The survey was originally developed for use in the USA by the Centers for Medicare and Medicaid Services (CMS) in partnership with the Agency for Healthcare Research and Quality (AHRQ) and later endorsed by the National Quality Forum (NQF). ‘The HCAHPS survey asks discharged patients 27 questions about their recent hospital stay. The survey contains 18 core questions about critical aspects of patients' hospital experiences (communication with nurses and doctors, the responsiveness of hospital staff, the cleanliness and quietness of the hospital environment, pain management, communication about medicines, discharge information, overall rating of hospital, and would they recommend the hospital),’ and demographic questions that allow a researcher to adjust for patient mix. The survey's emphasis on communication and interactions between providers and patients also made its potential for cross-cultural sensitivity high. Research by O'Malley et al. found the HCAHPS to be sensitive across differently sized hospitals in the USA. Therefore, for the RN4CAST study, it offered the potential to produce comparable results across health systems among the participating countries and with US data.
The purpose of this study is to describe the systematic translation and cross-cultural evaluation process used by the RN4CAST project in five European countries to determine, prior to data collection, the cross-cultural relevance and applicability of the HCAHPS in the European context. At present, the validated translated versions of the HCAHPS are available in American English, Spanish (Latin American), Mandarin Chinese, Russian and Vietnamese (http://www.hcahpsonline.org/surveyinstrument.aspx). The available translations reflect the dominant non-English speaking immigrant populations in the USA. Eight countries out of 12 involved in the study opted to include patient satisfaction data in their study. The participating countries included Belgium, Finland, Germany, Greece, Ireland, Poland, Spain and Switzerland. No translations of the HCAHPS were available in several of the languages; thus, the RN4CAST team had to translate the instrument into seven additional languages (Dutch, Finnish, French, German, Greek, Italian and Polish) and cross-culturally evaluate the instrument prior to data collection. In the end, five countries participated in the pre-data collection, cross-cultural evaluation process reported in this study.
Translating an instrument for use in a multi-country, comparative study requires not only translating the instrument from the source language to the target one, but also performing a cross-cultural evaluation of the instrument's applicability to the new context [23–26]. A rigorous review by Maneersriwongul and Dixon concluded that simple forward and back translation of instruments alone, even when researchers conduct factor analyses post-data collection, is insufficient to produce reliable and valid results from a translated instrument. Flaherty et al. recommend that instruments that are used across cultures and require translation undergo an evaluation of content, context, conceptual, semantic and technical equivalence to ensure that the instrument is appropriate for use in the new location (see Fig. 1 for definitions). Failure to integrate this kind of evaluation can produce significant issues related to contextual and conceptual equivalence, especially when administrative language (i.e. managerial roles, terms of reimbursement, etc.) is involved in the translation process.
Prior to embarking on the evaluation study in Europe, Liu et al. undertook a pilot study of the translation method proposed for use in Europe. The pilot study took place in China and attempted to use the US-translated Mandarin Chinese version of the HCAHPS. Initial review of the Chinese translation used in the USA by immigrants found that the translations had some subtle linguistic differences that were deemed sufficient to affect results and resulted in another translation into mainland Mandarin Chinese. The pilot study helped inform the final approach to language translation used in the RN4CAST study for both the nurse and patient surveys.
To translate the HCAHPS survey, each country's team followed the steps below, with the translation framework developed by Squires et al. serving as a methodological guide for the cross-cultural adaptation process. The process began with a review of the instrument by ‘research experts’, a group composed of representatives from each participating country's team. For the first 22 items in the survey, the team determined if there were any US health system-specific terms that might pose a problem for translation. The only translation issue that emerged was that some answer descriptors, like the difference between ‘fair’ and ‘poor’, proved difficult to conceptually differentiate for translation purposes for most non-native English speakers. For Likert-type responses found in the HCAHPS survey (e.g. never, sometimes, often, always), standardized translations were used to ensure equivalence across languages and cultures.
The demographic questions in HCAHPS items 23 through 27 posed some problems related to contextual and conceptual relevance. While important for risk adjustment purposes, the questions about race and ethnic identity were specific to the USA and not applicable to all countries involved in the study. Issues about educational equivalence also arose since the exact equivalence of primary, secondary and post-secondary education across European countries is not well established. European team members also indicated that these types of questions are not commonly asked in survey research in the region. Thus, the result of the initial review was that the teams opted to keep risk adjustment questions (How would you rate your health overall?) and adapt the educational equivalence criteria found in an additional item. This strategy allowed the main questions of the instrument (items 1–22) to remain intact and the demographic ones to reflect each country's needs. The final instrument contained 24 questions in total, with the original 22 questions maintained and 2 questions focusing on demographics.
Once the final version was established, the systematic translation process used by the team for the HCAHPS translations involved the use of experienced translators (separate for forward and back translations as is standard practice) and a review of the resulting translations by the country teams, who were all bilingual. These combined steps address all five aspects of Flaherty's criteria. Then, an evaluation by ‘expert’ raters of the relevance of the survey's questions to the hospital care experience in the country also took place. Expert reviewers have excellent consistency with predicting the relevance of survey questions to the population of interest, as a recent investigation by Olson demonstrated. In the case of this study, recently hospitalized patients were defined as the ‘experts’ since they are the ones who experience the results of the delivery of health services by healthcare professionals and system operations. Thus, each country's team aimed to recruit 7–12 patients who had experienced a hospitalization within the last year to serve as expert raters. The patient experts also had to be able to follow instructions for completing the evaluation of the survey questions and have enough years of education to complete the task. A patient rater's ability to speak English was not required for this aspect of the cross-cultural evaluation process because of the difficulty in gauging the English fluency of the patient experts. The use of patient raters addresses Flaherty's evaluation criteria around content and contextual equivalence.
Once selected by the country's research team, each patient rater received oral and written instructions in their own language about evaluating the relevance of the survey questions to their hospitalization experience. The raters used content validity indexing (CVI) techniques and had the opportunity to make comments on each item and about the survey as a whole. Using the CVI approach, raters scored each survey question for its relevance to the patient care experience using the following scale: 1 = Not relevant, 2 = Somewhat relevant, 3 = Very relevant and 4 = Highly relevant. CVI techniques produce an item-level score (an average of all raters' evaluations of a question, known as an I-CVI) and then a scale-level score (S-CVI), which is the average of all item-level scores for the instrument. A common concern with the CVI approach is the possibility of chance agreement among raters. To address that concern, once the patients' scoring was completed, the research team used a formula that adjusts the CVI calculation to account for chance agreement between the raters. The resulting modified kappa score can then be used to evaluate the cross-cultural relevance of a survey question. The score reflects the chance-corrected proportion of raters who scored an item as ‘relevant’ or ‘highly relevant’.
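The scoring steps described above can be sketched in code. This is a minimal illustration and not the study team's actual computation. It assumes the conventional CVI definitions, in which an item-level score is the proportion of raters who gave an item a rating of 3 or 4, and it assumes the chance correction is the modified kappa commonly used with CVI, in which the probability of chance agreement comes from a binomial distribution with p = 0.5. The function names and the sample ratings are hypothetical.

```python
from math import comb


def item_cvi(ratings):
    """I-CVI: proportion of raters who scored the item 3 or 4 (assumed definition)."""
    agree = sum(1 for r in ratings if r >= 3)
    return agree / len(ratings)


def modified_kappa(ratings):
    """Chance-corrected I-CVI (assumed formula).

    p_chance = C(n, a) * 0.5**n, where n is the number of raters and
    a is the number who rated the item 3 or 4.
    kappa*   = (I-CVI - p_chance) / (1 - p_chance)
    """
    n = len(ratings)
    agree = sum(1 for r in ratings if r >= 3)
    p_chance = comb(n, agree) * 0.5 ** n
    return (agree / n - p_chance) / (1 - p_chance)


def scale_cvi(items):
    """S-CVI: average of the item-level scores across all items."""
    return sum(item_cvi(r) for r in items) / len(items)


# Hypothetical ratings from 10 raters for two survey items.
item_a = [4, 4, 3, 4, 3, 4, 4, 3, 4, 2]  # 9 of 10 raters chose 3 or 4
item_b = [4, 4, 4, 3, 3, 4, 4, 4, 3, 4]  # all 10 raters chose 3 or 4

print(item_cvi(item_a), round(modified_kappa(item_a), 3))
print(scale_cvi([item_a, item_b]))
```

With 10 raters the chance term is small, so the modified kappa stays close to the uncorrected item-level score. The correction matters more when only a few raters are available.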
With seven languages involved in the rating process, a total of 70 patient raters were invited to evaluate the HCAHPS translations. Sixty-eight patient raters (97% participation rate) participated in the process, with only the Swiss-Italian group having 8 raters; all other language groups had 10. As Table 1 illustrates, the scale-level modified kappa scores ranged from 0.63 to 1.00. Swiss German and Italian translations received the lowest scores while Greek was the highest. Per the modified kappa scoring standards recommended by Cicchetti and Sparrow and Fleiss, the team concluded all translations were acceptable for use and had high overall, scale-level relevance scores for their potential applicability to each country's patient care services experience. ‘Excellent’ ratings by the kappa standard were obtained for all translations except for the Swiss German and Swiss Italian which received ‘good’ overall ratings. Only demographic questions received ‘poor’ ratings at the item level.
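The rating bands applied to the modified kappa scores can be expressed as a simple lookup. The thresholds below are an assumption: they follow the cut-offs commonly attributed to Cicchetti and Sparrow (poor below 0.40, fair from 0.40 to 0.59, good from 0.60 to 0.74, excellent at 0.75 and above) and are not stated in the text itself. The function name is hypothetical.

```python
def rate_kappa(kappa):
    """Map a modified kappa score to a qualitative rating band.

    Thresholds are assumed from the commonly cited Cicchetti and
    Sparrow cut-offs; they are not given in the article text.
    """
    if kappa >= 0.75:
        return "excellent"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "fair"
    return "poor"


# The lowest and highest scale-level scores reported in the text.
print(rate_kappa(0.63))  # good
print(rate_kappa(1.00))  # excellent
```

Under these assumed bands, the reported range of 0.63 to 1.00 is consistent with the ‘good’ and ‘excellent’ labels used in the results.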
Because of the scale-level scores for the Swiss German and Swiss Italian translations of the HCAHPS instrument, we explored the effect of removing the item-level scores about the demographic questions from the overall relevance ratings by the patient experts. This resulted in an increase in almost all of the scale-level scores. To determine if the scale-level scores were significantly different if the demographic questions were removed from the process, a t-test analysis was conducted. The result confirmed that the removal of the demographic questions produced a statistically significant increase in the relevance scores (P = .018; 95% CI, −0.069 to −0.009).
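A comparison of this kind can be sketched as follows. The text does not say which form of t-test was used; this sketch assumes a paired test on the scale-level scores of each language group, computed with and without the demographic items. The scores below are hypothetical placeholders and are not the study's data. The critical value is the standard two-sided 95% value of the t distribution for 3 degrees of freedom, and the function name is hypothetical.

```python
import math
import statistics


def paired_t(with_demo, without_demo, t_crit):
    """Paired t statistic and confidence interval for the mean difference.

    Differences are computed as (with demographics) - (without), so a
    negative mean indicates higher scores once the items are removed.
    t_crit must match the degrees of freedom (n - 1) and the CI level.
    """
    diffs = [a - b for a, b in zip(with_demo, without_demo)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    t_stat = mean_diff / std_err
    half_width = t_crit * std_err
    return t_stat, n - 1, (mean_diff - half_width, mean_diff + half_width)


# Hypothetical scale-level scores for four language groups (not study data).
with_demo = [0.90, 0.85, 0.80, 0.95]
without_demo = [0.94, 0.90, 0.82, 0.97]

# 3.182 is the two-sided 95% critical value of t for 3 degrees of freedom.
t_stat, df, ci = paired_t(with_demo, without_demo, t_crit=3.182)
print(t_stat, df, ci)
```

A confidence interval that lies entirely below zero, as in the result reported above, indicates that scores were higher once the demographic items were removed.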
Comments by the patient raters further confirmed the effect of the personal questions on the cross-cultural relevance scores. Many patient raters commented that they did not understand the need to collect ‘personal’ information and that they did not see how things like education level, in particular, were relevant to patient satisfaction scores. The comments by Swiss German and Italian patient raters also shed some light on the lower overall scores as they slanted toward the negative about the entire survey.
The results from this study suggest that patients view the HCAHPS as relevant to their patient care experiences in their home countries. The instrument, as a result, may adapt well across cultures and developed country health systems for measuring patient satisfaction with in-patient acute care services. The experience of the cross-cultural, pre-data collection evaluation process does raise multiple methodological issues researchers may need to consider when designing multi-country comparative health services research studies.
To begin, the effect of the demographic questions on the overall scores of the instrument has several implications for comparative health services research. First, early evaluation by the research teams hinted at the potential problems that could arise with the demographic questions, in particular when trying to compare educational levels across countries. These issues, however, were mostly technical in nature, like trying to determine what constituted educational equivalence. Patients' scores and feedback, however, highlighted other concerns about demographic questions. On the positive side, the patient experts' comments provided the team with valuable feedback that allowed them to anticipate questions that might go unanswered during the survey process.
Negatively, however, the reaction of patients to sharing personal information on this kind of survey did affect the overall cross-cultural relevance scores of the survey. Even when researchers think it is important, patients or other research participants may not perceive standard demographic questions as relevant to a survey, thereby affecting the scores of an overall instrument. The improvement in the majority of relevance rating scores when the personal questions were removed from the overall scale score illustrates that phenomenon. Another implication regarding the effect of evaluating personal questions contained in established instruments is that researchers need to be more sensitive and judicious about asking for what patients may perceive to be unrelated personal information. Furthermore, the question of whether or not researchers even need to include personal questions in the expert evaluation process that uses CVI techniques remains unanswered and may be specific to the country. It may be worthwhile for researchers to have personal questions evaluated separately when integrated into a survey instrument that will be applied in other contexts, cultures or countries.
Researchers also need to use their best judgment to determine the potential impact on results of expert rater identity. The patient raters' feedback, through scoring and comments, provided valuable insight to the team that was not identified by ‘research’ experts alone. This study suggests that combining expertise from researchers and subjects when adapting an instrument for use across cultures and countries may be a stronger approach methodologically to pre-data collection evaluation of a survey instrument. The approach, however, does require further study. Additional follow-up for low scoring questions once data is collected is also necessary to determine if rater evaluation of survey items can accurately predict missing data patterns or unexpected responses.
As with any approach, there are limitations to this study and the methods undertaken. Selection bias by the teams toward specific patient raters may have occurred, and we recommend that researchers develop clear guidelines for choosing raters. Grant and Davis provide some useful references for rater selection. Some raters also had difficulty with the concept of evaluating questions instead of answering them. Finally, since the assessment of the predictive validity of this process for survey results has not yet been completed, researchers should give due consideration to the strength of the approach.
To conclude, the value of this type of pre-data collection, cross-cultural instrument evaluation process is that it lays a solid foundation for comparative country studies of patient satisfaction. The potential threat to validity related to language translation is reduced significantly. Consequently, researchers and policymakers can increase their certainty that their explorations about the impact of health system structures on patient satisfaction across countries have accounted for the basic requirements of rigorous cross-cultural research and have addressed role variations among healthcare workers and the financing mechanisms that may differentially affect access to services.
This research is funded by the European Union's Seventh Framework Programme FP7/2007-2012 under grant agreement 223468, and by the National Institute of Nursing Research (R01NR04513, T32NR0714 and P30NR05043 to L.A.).
The authors would like to thank the expert raters who participated in the study and the RN4CAST Consortium. For a complete list of consortium members, please go to: www.rn4cast.eu