Objective: To review the existing literature (1980–2003) on survey instruments used to collect data on patients' perceptions of hospital care.
Data Sources: Eight literature databases were searched (PubMED, MEDLINE Pro, MEDSCAPE, MEDLINEplus, MDX Health, CINAHL, ERIC, and JSTOR). We undertook 51 searches with each of the eight databases, for a total of 408 searches. The abstracts for each of the identified publications were examined to determine their applicability for review.
Data Extraction: For each instrument used to collect information on patient perceptions of hospital care, we provide descriptive information, instrument content, implementation characteristics, and psychometric performance characteristics.
Principal Findings: The number of institutional settings and patients used in evaluating patient perceptions of hospital care varied greatly. The majority of survey instruments were administered by mail. Response rates varied widely, from very low to relatively high. Most studies provided limited information on the psychometric properties of the instruments.
Conclusions: Our review reveals a diversity of survey instruments used in assessing patient perceptions of hospital care. We conclude that it would be beneficial to use a standardized survey instrument, along with standardization of the sampling, administration protocol, and mode of administration.
Patient evaluations of hospital care can be useful to payers, regulatory bodies, accrediting agencies, hospitals, and consumers. All of these parties can use this information to gauge the quality of hospital care from the patients' perspective (Marino, Marino, and Hayes 2000). Hospitals can use this information to focus on specific areas for improvement, to inform strategic decision making (Sower et al. 2001), to manage the expectations of patients (Hickey et al. 1996), and to benchmark performance (Dull, Lansky, and Davis 1994). Ultimately, the reporting of patient evaluations can influence the delivery of care (Howard et al. 2001).
Many of the benefits of measuring and reporting patient evaluations of hospital care result from using standardized performance information. Clearly, making adequate comparisons across hospitals requires each facility to measure and report the same information. As described elsewhere in this issue (Goldstein et al. 2005), systematic efforts are underway by the Centers for Medicare and Medicaid Services (CMS) to make standardized performance information on hospitals publicly available. As part of the background for this effort, we reviewed the existing literature on survey instruments used to collect data on patients' perceptions of hospital care. We describe and compare the format, content, and administration issues associated with these previously used survey instruments.
We searched the PubMED, MEDLINE Pro, MEDSCAPE, MEDLINEplus, MDX Health, CINAHL (Cumulative Index for Nursing and Allied Health Literature), ERIC, and JSTOR databases. These searches were conducted with a combination of key words. We limited the searches to articles in English and those with abstracts. Searches returning more than 250 articles were further filtered by using terms such as “questionnaire” and “hospital.” We undertook 51 searches with each of the eight databases, for a total of 408 searches.
After the searches were conducted, the abstracts of the returned articles were examined to determine their applicability for review. Relevant studies were defined liberally as those that included any discussion of perceptions of hospital care. Articles that included a survey instrument were included in the analyses. When more than one article reported using the same survey instrument, all of the articles were included in the analyses; we did not restrict this review to one article per survey instrument. This approach provided more information on the instruments, such as response rates and psychometric properties.
We identified articles that included a patient survey of hospital care for further examination. We also consulted several survey development texts (Krowinski and Steiber 1996; Cohen-Mansfield, Ejaz, and Werner 2000) to construct our approach for characterizing the hospital survey instruments.
These texts describe how to develop the content of a survey instrument, how to address implementation issues so that the survey is usable, and how to assess the instrument's performance. To characterize hospital survey instruments, we followed these same general steps. First, we provide some basic information, including the name of the instrument. Second, the contents of the instruments are presented, including the number of domains used. Third, implementation characteristics associated with conducting the surveys are presented, including the sample size per facility. Fourth, performance characteristics of the instruments are presented, including the response rates and psychometric properties.
We first identified the study author(s) and the name of the survey instrument developed (if any). Some instruments were modified from preexisting instruments, or were amalgams of preexisting instruments. Details on the origins/modifications of the survey instrument are given. The setting includes the number and type of hospitals in which the study was conducted. We also identified the type of respondent from whom the instrument was designed to collect data: patients, family, or staff. The number of respondents in the study is also provided.
Second, the contents of the survey instruments are further described. We note the number of items in the instrument, excluding demographic and other background questions. Patient survey instruments often classify “like” questions together; for example, capabilities of staff, staff politeness, and the caring nature of staff might be sorted into a staff “bucket” or category. These similar questions are generally referred to as “domains.” We present the number of domains included in each instrument.
In addition, we present the types of domains included in each survey instrument. We also present the type of rating scale used in the instruments (Krowinski and Steiber 1996), and categorize the response scale in terms of whether it is open-ended or closed-ended, the number of closed-ended response options (dichotomous or multiple categories), and the nature of the response scale. The nature of the response scale included: evaluation (e.g., poor, fair, good, very good, excellent), frequency (e.g., none of the time to all of the time), satisfaction (e.g., very satisfied to very dissatisfied), visual analog, or Chernoff face formats. A visual analog format (also called graphic scaling) is a pictorial scale that usually has some implied interval value (e.g., a scale from 0 to 10). Chernoff faces are pictorial representations using smiles and frowns.
Third, we present characteristics of how the survey instrument was used—that is, implementation characteristics. We present whether any information is provided as to when the instrument was given (or mailed) to respondents (e.g., 2 days after discharge). Survey initiatives can also differ on the target sample size of respondents per facility (or unit). We record these target sample sizes. We also report whether the survey was administered by in-person interviews, telephone, mail, or drop-box.
In some cases, specific sample inclusions are given—for example, including only persons 18 years and older. These sample inclusions are also noted. In addition, in some cases sample restrictions are made—for example, excluding patients receiving hospice services. We record whether any such restrictions are made.
Fourth, we document the performance characteristics of the survey instruments. This includes the response rates and whether information about the reliability (internal consistency, test–retest, and interrater) and construct validity are reported.
We provide information on the time to conduct interviews and further psychometric properties of the instruments. In the interest of space, we do not report the actual levels of reliability and validity achieved for each instrument, instrument domains, or individual questions. Rather, we report whether reliability or validity of the instrument was evaluated (yes or no). Nevertheless, we do note any unusual results (e.g., poor performance), what analyses were used (e.g., factor analysis), or whether any other instrument assessment was undertaken.
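As background for the psychometric terms used here: internal consistency is most commonly quantified with Cronbach's alpha, although the reviewed studies vary in which statistic, if any, they report. For a domain with $k$ items,

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{i}^{2}}{\sigma_{T}^{2}}\right),
\]

where $\sigma_{i}^{2}$ is the variance of item $i$ and $\sigma_{T}^{2}$ is the variance of the total domain score; values closer to 1 indicate that the items within a domain measure a common construct.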
The key words and results for the first nine key word searches are summarized in the on-line Appendix Table A. The first column of figures in this table shows the number of articles identified from the PubMED literature database. For example, 1,289 articles were identified in PubMED using the search term “survey and data collection protocols.” Results in subsequent columns show the number of additional articles identified using the other literature databases. For example, using this same search term (“survey and data collection protocols”), eight additional articles were identified using MEDLINE Pro. The literature search identified 246 articles, all of whose abstracts were reviewed. From these 246 abstracts, 84 full-length articles were subsequently examined, with 59 presenting sufficient information to be included in this review.
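The incremental counting works like deduplication across databases: an article counts toward a database's column only if no earlier database already returned it. A minimal sketch of this logic follows; the database results and article identifiers are invented for illustration, not the actual search output.

```python
# Schematic illustration of counting "additional" articles per database.
# The article identifiers below are invented for illustration only.
search_hits = {
    "PubMED":      {"a1", "a2", "a3"},
    "MEDLINE Pro": {"a2", "a4"},        # "a4" is new relative to PubMED
    "MEDSCAPE":    {"a1", "a5", "a6"},  # "a5" and "a6" are new
}

seen: set[str] = set()
for database, ids in search_hits.items():
    new_ids = ids - seen  # articles not already returned by earlier databases
    print(f"{database}: {len(new_ids)} additional article(s)")
    seen |= ids
```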
The descriptive characteristics of the survey instruments are shown in Table 1. The study settings are diverse, ranging from single hospitals to a system comprising 135 medical centers. The studies are also geographically diverse, coming from many regions of the U.S., Europe, and the Middle East. Likewise, the number of respondents included in these studies varied widely, from 70 to approximately 25,000. Most studies used patients as respondents, although a few assessed family members or caregivers. Twenty-six studies used mail surveys, 13 used telephone interviews, four used drop-boxes, and 12 used in-person interviews.
Summary characteristics of the content, implementation, and performance of the survey instruments are shown in Table 2. The information is also provided by each of the major modes of survey administration (mail, telephone, drop-box, and in-person interviews). The number of items included in the instruments varied from eight to 121. The averages show that more questions were generally asked in mail surveys (average = 45 questions) and fewer in drop-box surveys (average = 16 questions). Likewise, the number of domains varied, ranging from one to as many as 14. However, the average number of domains by mode of administration was quite consistent, at about six.
We also identified various response formats; the most common was an evaluation-type response format. The names of the domains and response formats are shown in the on-line Appendix Table B. Looking across studies, we found that the five most common domains were nursing, physicians, food, services, and care (not shown in the table).
The lag between discharge and mailing of the survey instrument varied from 1 week to 6 months, and in 19 percent of the studies using mail surveys the instrument was sent more than 4 weeks postdischarge. Telephone surveys had a shorter lag time; among the studies for which data were available, most were conducted between 2 and 4 weeks postdischarge. The majority of studies using drop-box surveys or in-person interviews were conducted on-site prior to patient discharge. Few studies provided a target sample size for the survey instrument. Among studies that did, target sample sizes varied from 10 per department to 1,400 per hospital; the target sample size averaged 510 per hospital for mail surveys and 10 per hospital for drop-box surveys. Sample inclusions and exclusions are shown in the on-line Appendix Table C.
Response rates varied widely, from 17 percent in one study to 92 percent in another. The average response rate was 47 percent for mail surveys, 70 percent for telephone interviews, 63 percent for drop-box surveys, and 75 percent for in-person interviews. The majority of studies provided little information on instrument reliability or validity. For example, 54 percent of studies using mail surveys provided measures of internal consistency, but only 15 percent provided measures of construct validity.
More detailed information on the performance characteristics of the survey instruments, including completion time, reliability, and validity, is provided in the on-line Appendix Table D. However, few studies provided information on the time needed to complete the instrument. For the six studies that did, completion times varied from 10 to 60 minutes.
Prior reviews of the literature on patient perceptions of hospital care have cited relatively few survey instruments (e.g., Rubin 1990). In this review we examined 59 studies providing information on 54 different survey instruments. This provides some evidence of the increasing use of patient survey instruments addressing hospital care in recent years.
In examining these survey instruments we provide details on descriptive information, instrument content, implementation characteristics, and performance characteristics. Following these general categories, we critique the existing instruments and offer suggestions for future research.
The survey instruments varied greatly with respect to both the number of institutional settings in which they had been used and the number of patients to whom they had been administered (see Table 1). On the one hand, many survey instruments have been administered in only a few institutional settings and to a limited number of patients; on the other hand, we identified instruments that have been administered at hundreds of hospitals with thousands of patients. The SERVQUAL, Press Ganey Associates instrument, and Picker questionnaires are notable examples of survey instruments in the latter category.
A variety of domains of patient perceptions are represented (see Table 2 and on-line Appendix Table B). In some cases this occurs because survey instruments were developed for very specific purposes (e.g., for use in the ER). The more general instruments measuring patient perceptions of hospital care did share common domains: nursing, physicians, food, services, and care. However, these instruments differ in the level of detail of questions and the number of items per domain. This divergence in emphasis may be a consequence of the fact that many instruments were developed using expert opinion rather than patient input. Expert opinion is often confounded with clinical measures of care quality (Oermann and Templin 2000) and does not necessarily correspond with patient evaluation of care quality. Indeed, of the 54 different survey instruments we examined, 13 (24 percent) were developed using expert opinion, six (11 percent) used patient input, seven (13 percent) used both expert opinion and patient input, and for 28 survey instruments (52 percent) we could not determine how they were developed.
In future questionnaire development initiatives, consulting studies that have examined patients' evaluations of care may be useful. The Institute of Medicine's (IOM 1999) nine domains of care were developed from patient input and can provide useful guidelines for survey-item development. These nine domains are: respect for patient's values; attention to patient's preferences and expressed needs; coordination and integration of care; information, communication, and education; physical comfort; emotional support; involvement of family and friends; transition and continuity; and access to care. The CAHPS Hospital Survey domains (nurse communication, nursing services, doctor communication, physical environment, pain control, communication about medicines, and discharge information) were derived from the IOM domains (Goldstein et al. 2005). These domains derived from patient input may be influenced by cultural factors, and may not apply to settings outside of the U.S. For example, some modifications to items (e.g., race/ethnicity questions) were made and items were added in a recent adaptation of the CAHPS hospital survey for use in Dutch hospitals (Arah et al. 2005).
It was not surprising that we identified survey instruments developed for very specific purposes (e.g., for use in the ER [Burstin et al. 1999], nuclear medicine [Harding et al. 1994], psychiatric care [Eisen et al. 2002], oncology [Brédart et al. 1999], and critical care [Conover et al. 1999]). General instruments may not be specific enough to identify areas for quality improvement in all hospital departments. Longer instruments can be advantageous, as they can provide more detailed information to departments, but there are limits on how many questions can be included in a survey instrument before response rates are adversely affected. An alternative to lengthening instruments is to use a brief core set of questions, followed by a series of specific questions relevant to individual departments. States and accreditation bodies can use the core instrument to assess perceptions of care in the aggregate, and the more specific items can be used by the facility for quality improvement. However, this requires a more sophisticated targeting approach to ensure that each patient receives the correct department-specific instrument.
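To make the core-plus-modules idea concrete, the following minimal sketch assembles an instrument from a shared core and a department-specific module; the question wording and department names are invented placeholders, not items from any reviewed instrument.

```python
# Sketch of a "core plus department-specific" instrument; all question
# text and department names are invented placeholders.
CORE_ITEMS = [
    "How would you rate the care you received overall?",
    "How would you rate the courtesy of the staff?",
]

DEPARTMENT_ITEMS = {
    "emergency": ["How would you rate the waiting time before being seen?"],
    "oncology":  ["How would you rate the explanation of your treatment plan?"],
}

def assemble_instrument(department: str) -> list[str]:
    # Core items support cross-facility comparison; department-specific
    # items support local quality improvement. Unknown departments get
    # the core items only.
    return CORE_ITEMS + DEPARTMENT_ITEMS.get(department, [])

print(assemble_instrument("emergency"))
```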
Instruments measuring patient perceptions of hospital care were administered by telephone, mail, or in-person interview, or were collected by drop-box (see Table 2 and on-line Appendix Table C). However, the majority of survey instruments were administered by mail. No web-based patient surveys were identified.
No agreement on when the instruments should be administered was evident. Many instruments were mailed months after patient discharge. This may reflect the limits of the hospital administrative databases used to construct mailing lists. Still, recall bias is a potential problem when collecting information long after the fact: over time, patients' abilities to reliably remember their hospital care may decline (Krowinski and Steiber 1996). For example, Ley et al. (1976) found ratings of care to be less positive at 8 weeks than at 2 weeks. However, we cannot simply conclude that a shorter lag time is more beneficial. If patients' perceptions become more or less negative as time passes, this does not necessarily mean that they are based on less reliable recollections. Recollections may be just as accurate, but the features of care patients regard as important may change over time. Additional time postdischarge may also give patients more data points to consider (e.g., regarding coordination of care and/or success of treatment) by the time they are asked to evaluate their care. In these cases, it would be reasonable for patients' evaluations to be affected by this new information, and changes in evaluations associated with the passage of time may not reflect memory reliability at all.
Several studies found telephone interviews to be advantageous in terms of more rapid contact with patients and higher response rates (e.g., Woodside and Shinn 1988; Hargraves et al. 2001). However, surveys are also subject to social desirability bias, which can lead to more positive assessments of care (Hays and Ware 1986). Social desirability may be more of a problem with telephone administration because it involves more direct contact, making it harder for the respondent to feel anonymous. In addition, telephone interviews may cost more than mail surveys.
The length of the survey instruments varied greatly. As discussed above, short, very general instruments may be less useful than longer, more detailed instruments. But longer instruments carry more response burden and may lower response rates. Indeed, examining the instruments in this review, we find a correlation of −.65 between response rate and number of questions.
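For readers who wish to reproduce this kind of summary statistic, a minimal sketch of a Pearson correlation between instrument length and response rate follows; the values are invented for illustration and are not the data from the reviewed studies.

```python
# Minimal sketch of the length/response-rate correlation; the values
# below are invented for illustration, not data from the reviewed studies.
from statistics import correlation  # Pearson's r; requires Python 3.10+

n_questions  = [8, 20, 35, 50, 80, 121]  # instrument length (items)
response_pct = [85, 75, 60, 55, 40, 25]  # response rate (percent)

# A negative r indicates that longer instruments tend to see lower
# response rates.
r = correlation(n_questions, response_pct)
print(f"r = {r:.2f}")
```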
One limitation of surveys of patient perceptions of hospital care can be low response rates (Barkley and Furse 1996). Studies with low response rates are reported to yield different results than studies with high response rates (Barkley and Furse 1996). Our review of the literature identified both relatively high and relatively low response rates (see Table 2 and on-line Appendix Table D). Nonrespondents may have less favorable perceptions of care than respondents (Barkley and Furse 1996; Mazor et al. 2002; Elliott et al. 2005). However, very little information is often provided on how the response rates were calculated.
A related issue is the representativeness of the patients selected to receive a survey instrument. In some cases the sampling criteria used in the studies reviewed appear to have been biased (e.g., by including only patients hospitalized for 3 days or more). In other cases, the sampling criteria may have been appropriate, but the precision of estimates and the power to detect differences were limited by small sample sizes. Few of the studies reviewed reported whether the sample was large enough to yield reasonably accurate point estimates or to detect meaningful differences between units of interest at a given point in time. As Ehnfors and Smedby (1993) report, such sampling problems can greatly influence survey results.
We identified few articles providing extensive psychometric properties (see Table 2 and on-line Appendix Table D). In many studies even basic psychometric properties were not reported. This is important because poor survey instruments “… act as a form of censorship imposed on patients. They give misleading results, limit the opportunity of patients to express their concerns about different aspects of care, and can encourage professionals to believe that patients are satisfied when they are highly discontented” (Whitfield and Baker 1992, p. 152).
The plethora of survey instruments measuring patient perceptions of hospital care is heartening, but the advantages of a standardized core instrument cannot be realized when many different instruments are used; for example, benchmarking and the report cards that facilitate consumer choice may be impeded. Our review clearly shows that approaches vary in the domains included in the instruments, how those domains are measured, and when perceptions of care are elicited. We conclude that a standardized instrument would be beneficial. Moreover, our results suggest that it may also be beneficial to standardize the sampling, administration protocol, and mode of administration of survey instruments.
The following supplementary material for this article is available online:
Table A: Results of Literature Search (1980–2003).
Table B: Content Characteristics of Instruments Collecting Patient Perceptions of Hospital Care.
Table C: Implementation Characteristics of Instruments Collecting Patient Perceptions of Hospital Care.
Table D: Performance Characteristics of Instruments Collecting Patient Perceptions of Hospital Care.
This work was supported by grant number 5 U18 HS00924 from the Agency for Healthcare Research and Quality.