To assess variation in safety climate across VA hospitals nationally.
Data were collected from employees at 30 VA hospitals over a 6-month period using the Patient Safety Climate in Healthcare Organizations survey.
We sampled 100 percent of senior managers and physicians and a random 10 percent of other employees. At 10 randomly selected hospitals, we sampled an additional 100 percent of employees working in units with intrinsically higher hazards (high-hazard units [HHUs]).
Data were collected using an anonymous survey design.
We received 4,547 responses (49 percent response rate). The percent problematic response—lower percent reflecting higher levels of patient safety climate—ranged from 12.0–23.7 percent across hospitals (mean=17.5 percent). Differences in safety climate emerged by management level, clinician status, and workgroup. Supervisors and front-line staff reported lower levels of safety climate than senior managers; clinician responses reflected lower levels of safety climate than those of nonclinicians; and responses of employees in HHUs reflected lower levels of safety climate than those of workers in other areas.
This is the first systematic study of patient safety climate in VA hospitals. Findings indicate an overall positive safety climate across the VA, but there is room for improvement.
The health care industry attempts to cure patients while avoiding problems and negative outcomes resulting from the processes of care. In many ways, health care strives to be a “high-reliability” enterprise. High-reliability organizations (HROs), which include the aviation and nuclear power industries, successfully perform highly complex and repetitive tasks while avoiding disastrous events. The reliability of HROs is attributed in part to having a strong safety climate, and several different formulations of key principles of HROs have been introduced into health care (Gaba 2001; Weick and Sutcliffe 2001, 2003; Roberts and Tadmor 2002; Rall and Dieckmann 2005; Gaba, Singer, and Rosen 2006). Research has shown several components to be important in achieving a strong safety climate: (1) attitudes and perceptions that are uniformly safety oriented (Hofmann and Morgeson 1999; Sexton, Thomas, and Helmreich 2000b; Katz-Navon, Naveh, and Stern 2005); (2) procedures, structures, and resources supportive of safety (Shekelle et al. 2000; Sexton, Thomas, and Helmreich 2000b; Zohar 2002; Tucker and Edmondson 2003); (3) adequate safety training for personnel (Roberts 1990; Morey et al. 2002); and (4) auditing of clinical processes and safety standards (Welsh, Pedot, and Anderson 1996; Weingart, Ship, and Aronson 2000).
As the nation's largest integrated health system, the VA provides unique opportunities to study patient-safety climate. With the current emphasis on moving from a punitive climate within hospitals toward a system-wide climate of safety within health care in order to reduce medical errors (Bagian and Perlin 2000; Reason 2000; IOM 2001; Kizer 2001), assessing the status of the VA's safety climate is essential. However, little is known about the safety climate of the VA because previous surveys in the VA on this topic suffer from a number of methodological limitations, including small sample sizes and nonrandomized samples. Results have not been published in the peer-reviewed literature (Landesman and McKnight 2006). This study represents the first systematic investigation of safety climate within VA hospitals using HRO theory as the theoretical base and allows for a more comprehensive understanding of factors affecting patient safety than has been possible to date.
We used a previously validated instrument (Singer et al. 2007) to measure patient safety climate in VA hospitals. Using HRO theory as a conceptual framework, the principal objective of our study was to assess the levels of safety climate in VA hospitals. We hypothesized the following:
We used the Patient Safety Climate in Healthcare Organizations (PSCHO) survey to measure the safety climate of participating hospitals. The PSCHO was developed by the Patient Safety Culture Institute (PSCI) at the VA Palo Alto Health Care System and Stanford University (Singer 2007). Evaluation has demonstrated generally favorable psychometric properties (Colla et al. 2005; Singer et al. 2007b). The PSCHO contained a total of 42 items related to safety climate and six demographic questions. Responses to all items were close-ended. Each safety climate item used a 5-point, neutral mid-point Likert scale (“strongly disagree”=1 to “strongly agree”=5) with an additional “not applicable” option.
Because the literature on HROs emphasizes the necessity of a “preoccupation with failure” and the importance of a highly uniform safety climate, the PSCHO instrument is scored to reflect the percentage of “problematic responses”—answers that suggest an absence of safety climate. A lower percentage of problematic responses indicates a higher level of safety climate. Identifying frequencies of problematic response focuses attention on areas of safety climate that may be susceptible to error and may present opportunities for improvement. Weighting each survey item equally, we calculated overall and item-specific frequencies of problematic response by hospital, job category, and workgroup. In addition, because a neutral mid-point response (i.e., neither agree nor disagree) to an item could potentially be characterized as problematic with respect to safety climate (Singer et al. 2003), we calculated the frequency of neutral responses to all items and scales and for scales by management category.
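The scoring described above can be illustrated with a minimal sketch. It is not the authors' scoring code. It assumes that responses are coded 1 to 5 with "not applicable" stored as `None`, and that a response counts as problematic when it disagrees with a positively worded item or agrees with a negatively worded one; the function names and the `negatively_worded` flag are hypothetical.

```python
from statistics import mean

def is_problematic(response, negatively_worded=False):
    # Assumed rule: 1-2 is problematic for a positively worded item,
    # 4-5 is problematic for a negatively worded item.
    return response >= 4 if negatively_worded else response <= 2

def percent_problematic(responses, negatively_worded=False):
    # "Not applicable" answers (None) are left out of the denominator.
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    flagged = sum(is_problematic(r, negatively_worded) for r in answered)
    return 100.0 * flagged / len(answered)

def percent_neutral(responses):
    # Share of answered responses at the neutral mid-point (3).
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    return 100.0 * sum(r == 3 for r in answered) / len(answered)

def overall_percent_problematic(item_percentages):
    # Each item is weighted equally, so the overall figure is a plain mean.
    return mean(p for p in item_percentages if p is not None)
```

Because every item carries equal weight, the overall figure is simply the unweighted mean of the item-level percentages, whatever the number of respondents answering each item.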
Before conducting this study, we obtained approval from the relevant Institutional Review Boards for all participating hospitals.
To minimize selection bias, we conducted a complex recruitment process to accrue participating hospitals. All 117 VA acute care hospitals were divided into four performance strata based on the Agency for Healthcare Research and Quality (AHRQ)'s Patient Safety Indicators (PSIs) (Shojania et al. 2001; Romano et al. 2003): low, medium, high, and other. To do this, we calculated two PSI rates for each hospital (one for each of two categories of PSIs created through a factor analysis: complications related and medical/surgical) and ranked hospitals for each group of PSIs separately, dividing each ranking into quartiles. We assigned hospitals to strata based on the quartiles, and those whose ranking in the two PSI groups differed by more than one quartile were assigned to the “other” category. Within each stratum, hospitals were recruited in random order. We recruited a total sample of 30 hospitals for participation in the study—eight from the high, medium, and other categories; six from the low category. Details of the hospital recruitment strategy have been described elsewhere (Rosen et al. in press).
Data from the 2004 American Hospital Association Annual Survey of Hospitals provided information on hospitals' teaching status, bedsize, and geographic region, and the 2005 Bureau of Health Professions' Area Resource File provided information about metropolitan location. To calculate hospitals' case mix, we used AHRQ's comorbidity software on 2005 VA discharge-level data (Patient Treatment File) (AHRQ 2003). These data were aggregated to the hospital level to calculate a hospital's case mix. Data from the VA's 2005 Decision Support System files were used to calculate nurse staffing ratios. Almost all hospitals in the sample were located in metropolitan areas (90 percent, n = 26), and over half (53 percent, n = 16) were major teaching hospitals (members of the Council of Teaching Hospitals of the Association of American Medical Colleges). Fifty-five percent (n = 16) were large (bedsize>250), 21 percent (n = 6) were medium (bedsize=100–249), and 24 percent (n = 7) were small (bedsize ≤99). At least one hospital from each of the four U.S. geographic regions (east, south, midwest, and west) was represented in each of the size categories, except for small, which was missing a hospital from the “west.”
In all participating hospitals, we sampled 100 percent of the senior managers (department head and above), 100 percent of hospital-based physicians, and a random 10 percent sample of other employees. In addition, at 10 randomly selected hospitals, we sampled 100 percent of employees from HHUs: operating room (OR) (including postanesthesia care unit [PACU]), intensive care unit (ICU), emergency department (ED), and intravenous chemotherapy administration unit. We oversampled senior managers because of their small number and physicians because of their typically low response rates (Berk 1985). Because nurses comprise such a large fraction of hospital employees, it was not necessary to oversample them. Our original sampling frame consisted of 10,837 employees' names. Out of these, 907 were removed because the individual was no longer working at the hospital or had died. An additional 621 individuals were eliminated because they declined to participate using a response postcard, leaving a final sampling frame of 9,309 individuals.
Survey printing, distribution, and collection were conducted by an outside vendor specializing in survey administration. Surveys were delivered to each hospital via U.S. mail and distributed internally by each site's project coordinator. Between December 2005 and May 2006, participants received up to three surveys in waves spaced approximately 5 weeks apart. Participation was voluntary and all responses were anonymous. Each survey packet contained a cover letter with instructions, the survey instrument, a business reply return envelope, and, for waves 1 and 2, a survey completion postcard, which allowed for anonymous tracking of responses and declines for subsequent mailings.
We generated combined weights for each workgroup within each hospital in order to reflect the original sampling frame accurately. Combined weights were calculated by multiplying the sampling weight by the nonresponse weight. All analyses were conducted using the combined weights.
A sampling weight was calculated for each workgroup within each hospital in order to account for variation in size from the original sampling target. For each of the 100 percent samples (senior administrators, physicians, and employees of HHUs), the sampling weight was set as 1.0. For the other employees, where the target sample size was 10 percent, the sampling weight was close to 10 (the numerator consisted of the total number of employees in this category as reported by each hospital, while the denominator contained the number of employees in our original sampling frame).
To account for varying response rates within workgroups, we assigned a nonresponse weight for each workgroup within each hospital, calculated using the inverse of the response rate. Response rates were calculated using the number of responses received for a specific hospital's workgroup over the number of employees in this workgroup (based on our final, deliverable sample).
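The weighting scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' SAS code: the function names are hypothetical and the workgroup counts in the example are invented purely to show the arithmetic.

```python
def sampling_weight(total_employees, sampled_employees):
    # Employees in the category (as reported by the hospital) divided by
    # the number drawn into the original sampling frame.
    return total_employees / sampled_employees

def nonresponse_weight(responses, deliverable_sample):
    # Inverse of the response rate, where the response rate is
    # responses / deliverable sample for that hospital's workgroup.
    return deliverable_sample / responses

def combined_weight(total_employees, sampled_employees, responses, deliverable_sample):
    # Combined weight = sampling weight x nonresponse weight.
    return (sampling_weight(total_employees, sampled_employees)
            * nonresponse_weight(responses, deliverable_sample))

# Invented example: a 10 percent sample of 1,000 "other" employees,
# 90 deliverable surveys, 45 returned.
example = combined_weight(total_employees=1000, sampled_employees=100,
                          responses=45, deliverable_sample=90)
```

In the invented example the sampling weight is 10 and the nonresponse weight is 2, so each respondent stands in for 20 employees. For a 100 percent sample the sampling weight is 1.0 and only the nonresponse adjustment applies.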
To assess the psychometric properties of the revised PSCHO instrument, respondents were randomly divided into a derivation sample (n = 2,252) and a validation sample (n = 2,252). We applied multitrait analysis (MTA) to the data in the derivation sample to assess both the reliability and validity of the multi-item scales (Campbell and Fiske 1959), beginning with item-to-scale assignments based on our conceptual model of the key elements of safety climate. Item content and scale interpretability were considered, and reliability and validity statistics were examined. Items were dropped from a given scale to improve scale reliability or decrease interscale correlations. MTA was applied to the revised scale structure, and the cycle was repeated until no further improvements in scale psychometrics could be achieved. We found empirical support for 11 of the conceptually based safety climate scales, which we labeled “senior leadership,” “resources for safety,” “facility characteristics,” “workgroup leadership,” “workgroup norms,” “workgroup recognition,” “learning,” “fear of blame,” “psychological safety,” “problem responsiveness,” and “outcomes.” These align generally with the components necessary to achieve a strong safety climate outlined above. For example, “resources for safety” items reflect the need for supportive structures and procedures, while “learning” items reflect the need for auditing of procedures.
To test the stability and robustness of the MTA results, we applied confirmatory factor analysis (CFA) (Bollen 1989; Bryant and Yarnold 1995) in the validation sample using the final scale structure produced by the MTA.
Our primary outcome was percent of problematic response to survey items. We conducted a correlation analysis and, to further investigate the effects of management level on problematic response frequencies, constructed a multiple regression model treating each hospital as a primary sampling unit. These results are reported where they provide additional information or support data presented in the tables.
To assess the potential limitation of differences in problematic response by time to response, we compared differences in mean problematic response by wave using 95 percent confidence intervals and found no significant differences.
All analyses were performed at the individual level using SAS, version 9.1 (SAS Institute, Cary, NC).
One hospital returned data for only the physician sample and no other employees and was dropped from analyses. All results reported reflect data for 29 hospitals. A total of 4,547 surveys were returned, for an overall response rate of 49 percent. Job category was related to response rate, with senior managers having the highest (68 percent) and physicians the lowest (37 percent); excluding physicians, the response rate was 57 percent. The response rate also varied by hospital, ranging from 26 to 73 percent.
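As a check on the arithmetic, the reported response rate can be reproduced from the sampling-frame counts given earlier. The sketch below assumes that the 49 percent figure is the number of returned surveys divided by the final sampling frame; the variable names are illustrative.

```python
# Counts taken from the text.
original_frame = 10_837        # names in the original sampling frame
no_longer_eligible = 907       # no longer working at the hospital, or deceased
declined_by_postcard = 621     # declined using the response postcard
returned_surveys = 4_547

# Final sampling frame after removals.
final_frame = original_frame - no_longer_eligible - declined_by_postcard

# Assumed definition: returned surveys over the final sampling frame.
response_rate_percent = 100 * returned_surveys / final_frame
```

Under that assumption `final_frame` matches the 9,309 individuals reported, and the rate rounds to the stated 49 percent.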
Eighty-eight percent of the items had correlations of 0.40 or higher with their respective scale scores (adjusted for overlap) from the 11 scales resulting from the MTA, suggesting adequate item internal consistency (Kerlinger 1973; Ware et al. 1997). Further, in 329 out of 400 comparisons (82 percent), the correlations between items and their hypothesized scales were significantly higher than their correlations with any other scale; they were higher, though not significantly so, in an additional 37 comparisons. Altogether, appropriate discriminant validity was observed for 92 percent of item-to-scale correlations. Internal consistency reliabilities ranged from 0.61 to 0.89.
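The two statistics reported here, item-to-scale correlation adjusted for overlap and internal consistency reliability, can be illustrated with a minimal sketch. It assumes that "adjusted for overlap" means correlating each item with the sum of the remaining items in its scale, and that the reliability coefficient is Cronbach's alpha; the helper names and the five-respondent scale are hypothetical, and the sketch does not handle items with zero variance.

```python
from math import sqrt
from statistics import mean, variance

def pearson(x, y):
    # Pearson correlation between two equal-length score lists.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sx * sy)

def item_rest_correlation(items, index):
    # Correlate one item with the sum of the other items in its scale,
    # so the item does not inflate its own scale score.
    rest = [sum(v for j, v in enumerate(row) if j != index)
            for row in zip(*items)]
    return pearson(items[index], rest)

def cronbach_alpha(items):
    # items: one list of respondent scores per item (at least two items).
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical three-item scale answered by five respondents.
scale = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
meets_criterion = [item_rest_correlation(scale, i) >= 0.40
                   for i in range(len(scale))]
```

An item would meet the 0.40 criterion used in the text when its entry in `meets_criterion` is true.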
We did not find strong support for the hypothesized “fear of shame” scale, which consists of three items related to individual feelings of shame associated with mistakes and help seeking that could affect willingness to come forward with safety concerns in a timely manner. Internal consistency reliability for this scale was low (0.46), and none of the items achieved item-to-respective scale correlations above the 0.40 criterion. We did not treat these items as a scale in the analyses.
Results of the CFA in the validation sample supported the scale structure suggested by the MTA in the derivation sample. The root mean square error of approximation was 0.065, and Bentler's normed comparative fit index was 0.98. Both values were in the range indicative of a good fit of the model to the data (Bentler and Bonett 1980; Hu and Bentler 1999).
Respondents were evenly divided between males and females (Table 1). Most, 68 percent, were between the ages of 41 and 60. Thirteen percent of respondents were senior managers. Over one-third (36 percent) were physicians. Finally, 17 percent were HHU employees.
Results for individual items grouped by scale are displayed in Table 2. The mean percent problematic response across all hospitals was 17.5, ranging from 12.0 to 23.7. Across all hospitals, individual item percent problematic response ranged from 4.4 to 49.6, with a mean of 18.1. The mean problematic response across all scales was 18.7 percent. Thirty-four items had a percent problematic response of 10 percent or greater, and 10 had a problematic response of 25 percent or greater. The scale with the highest α-coefficient was “senior leadership” (.89).
Overall, 50 percent of respondents indicated that their workgroups did not recognize individual safety achievement through rewards, nor were they rewarded for timely action to identify a serious mistake (41 percent). Forty-three percent reported never to have witnessed a coworker perform an unsafe act during patient care. Seven percent believed that asking for help was a sign of incompetence, and 4 percent indicated that they would not report a mistake with significant consequences that no one had noticed.
Across all items, an average of 19.9 percent of respondents selected the neutral mid-point of the scale. The percent of neutral responses ranged from 5.2 (“If I make a mistake that has significant consequences and nobody notices, I do not tell anyone about it”) to 39.3 percent (“Clinicians who make serious mistakes are usually punished”). This latter item is part of the “fear of blame” scale, designed to capture a punitive climate. The average percent problematic response plus neutral response over all items was 37.9 percent. Items rated by participants as “not applicable” were not included in our analyses.
We examined the mean percent problematic response by scale for senior managers compared with nonsenior managers (i.e., all others) as well as clinicians versus nonclinicians (Table 3). There was a statistically significant difference between the mean problematic response for senior managers (9.8 percent) and that for all others (18.3 percent). Senior managers had significantly lower overall means than nonsenior managers (p<.05) on the “senior leadership,” “workgroup leadership,” “workgroup norms,” “workgroup recognition,” and “problem responsiveness” scales. In contrast, clinicians and nonclinicians had relatively similar responses, with no statistically significant differences between their overall means. When we looked at the data by workgroup, we found statistically significant differences among workgroups for the “learning” and “problem responsiveness” scales.
We found a significant correlation between senior managers and clinicians. We constructed a multiple regression model using percent problematic response as the dependent variable and management level as the independent variable, controlling for facility- and individual-level characteristics (geographic region, teaching status, metropolitan location, bed size, nurse-to-patient ratio, case mix, gender, age, HHU, and nurse). We found that senior managers have a significantly lower percent problematic response than other job types (data not shown).
To assess the relationship between percent problematic and percent neutral responses, we investigated the percent neutral response over all items by management level by creating a dummy variable for senior managers. For all but three items, senior managers had a lower percent neutral response than other staff (data not shown). The three items for which senior managers had higher frequencies of neutral response were “My unit recognizes individual safety achievement through rewards and incentives” (28.2 percent versus 20.6 percent), “Telling others about my mistakes is embarrassing” (21.5 percent versus 15.9 percent), and “I am rewarded for taking quick action to identify a serious mistake” (34.6 percent versus 28.3 percent).
To investigate the effect of hospital workgroup on problematic response, we created seven workgroup categories based on similarity of clinical functions: ambulatory (home care, off-campus ambulatory care, and main-campus ambulatory care); ED and urgent care; ICU; nonclinical; OR and PACU; pharmacy and lab; and ward (Table 3). The three workgroups with the highest overall mean percent problematic responses (ED/urgent, mean=23.4 percent; ICU, mean=22.7 percent; and OR/PACU, mean=20.7 percent) substantially overlapped with the areas categorized as HHUs in our sampling strategy. For seven of the 12 scales, the nonclinical workgroup had the lowest mean percent problematic response. Either ED/urgent or ICU had the highest percent problematic response for all scales, with the exception of the “fear of blame,” “psychological safety,” and “workgroup norms” scales.
We explored the variation in percent problematic response by management level and job type by creating four job categories: physician (including resident), nurse, other clinician, and nonclinician. Management level was broken down into three groups: senior manager, supervisor, and front-line worker. Nonclinicians had the largest percent of senior managers; 16.4 percent of responding nonclinicians were senior managers, compared with 14.7 percent of physicians, 8.9 percent of other clinicians, and 5.2 percent of nurses. Of the 240 nonclinician senior managers, 236 identified themselves as “other” (i.e., having administrative duties) on the demographic questions.
We calculated a mean percent problematic response for each group (Figure 1). Across all job categories, front-line workers had the highest mean and senior managers the lowest. Nurses had the largest variation in mean percent problematic response by management level (senior manager nurses, 9.5 percent; front-line nurses, 21.8 percent; standard deviation, 12.2 percent), closely followed by nonclinicians (senior manager nonclinicians, 7.7 percent; front-line nonclinicians, 18.1 percent; standard deviation, 8.1 percent). Physicians as a group and front-line workers overall had almost identical variations.
The psychometric results of our study provide strong support for 11 scales representing dimensions of safety climate. We found that VA hospitals possess relatively high levels of safety climate (83 percent of responses overall were not problematic), and items with the highest percent problematic response were those related to rewards and recognition of safe behavior, indicating a need for programs aimed at recognizing and rewarding contributions to improving safety. Despite a relatively high level of safety climate in general, results suggest important opportunities for improvement in VA hospitals.
A key question is the meaning of these results as defined by HRO theory. A fundamental feature of HROs is their achievement of a highly uniform safety climate. It is not enough that a majority of employees believe safety climate to be important, because the presence of even a relatively small minority of individuals who do not can raise risks within an organization that may, over time, undermine the performance of others. No minimum threshold for unsafe attitudes or practices exists that enables the differentiation of an authentic HRO from one that is not, but some notable HRO theorists have suggested that having more than 10 percent of a workforce answer items in ways that are problematic regarding safety climate is of concern (A.P. Ciavarelli, Naval Postgraduate School, personal communication; K.H. Roberts, Haas School of Business, University of California, Berkeley, personal communication). In fact, several survey studies of commercial and naval aviators have shown that in these HRO settings high uniformity of answers in favor of safety is achieved (Sexton, Thomas, and Helmreich 2000b; Gaba et al. 2003). Given these results, the level of safety climate within the VA overall has still not achieved highly reliable levels and could benefit from further study and targeted action on rewards and safe behaviors.
Comparing our results with studies outside the VA that have used the PSCHO instrument, we found similar levels of problematic response overall. In a recent study using the PSCHO instrument across 92 non-VA hospitals, the mean problematic response across hospitals was 17.6 percent (Singer 2007), compared with our mean across VA hospitals of 17.5 percent. Direct item-to-item comparison is not possible here because different versions of the PSCHO instrument were used. Noteworthy also is the increase in problematic responses when neutral is classified as problematic. (The non-VA study above had similar results, with a combined problematic plus neutral response of 34.2 percent [personal communication to C.W. Hartman from S.J. Singer, Center for Health Policy/Center for Primary Care and Outcomes Research, Stanford University].) The range of 34.1 percentage points for neutral responses was over 11 percentage points lower than the range observed for problematic responses, suggesting that neutral was not used as a catch-all or a choice for unenthusiastic respondents. Instead, it may be important as a signal of potentially problematic areas and is worthy of further analysis.
One important finding was the difference in problematic responses between senior managers and other staff. On five of the 11 scales, workers who are not senior managers, including front-line staff who are in direct contact with patients and therefore have the most exposure to potential safety issues, had significantly higher levels of problematic response than senior managers, whose work is by nature removed from the direct patient care environment. This supports our hypothesis and indicates a possible difference in understanding of the issues affecting patient care between those providing the care and hospital administration; this interpretation is substantiated by our findings regarding management level and job type. Across all job types, senior managers had the lowest percent problematic response, with the largest differences in problematic response by management level being for nurses and the lowest for physicians, potentially because nurse managers, unlike physician managers, are less likely to have direct patient contact (Singer 2007).
The workgroup has been (Shortell et al. 1991; Shortell et al. 1994; Sexton, Thomas, and Helmreich 2000a) and is being (Pronovost et al. 2003; Pronovost et al. 2004) studied as a locus of safety climate. Specific differences in perceptions of safety climate emerged by workgroup in our study. Workers in the ICU, ED, and urgent care had higher levels of overall problematic response and higher problematic responses on the “learning” and “problem responsiveness” scales than workers in other areas, which is not surprising, given the complex and urgent nature of the tasks performed in these areas. It makes sense that areas of high complexity and hazard will yield more challenges to safety, resulting in a greater incidence of problematic response (Sexton, Makary et al. 2006). On the other hand, such areas have the closest parallels to work in other high-hazard industries in which a more uniform safety climate has been successfully achieved, suggesting that lower levels of problematic response are possible. However, the hospital as an institution is an equally important locus and represents an aggregate possessor of the interacting sub-climates of its workgroups. All workgroups share the same hospital senior management, who, through the resources they control and their decision-making authority, strongly influence the safety climate that individuals and workgroups express. Therefore, it is important to investigate variation both within workgroups and overall as key indicators of overall safety climate (Gaba, Singer, and Rosen 2007).
Although we achieved an overall response rate of almost 50 percent and found no difference in problematic response relative to the time of response, the possibility of nonresponse bias in our results cannot be dismissed. Ours was a voluntary survey administered without incentives to busy hospital staff. Although our response rate is analogous to that achieved in similar studies (Singer et al. 2003, 2007), we acknowledge that other safety climate surveys have achieved higher response rates (Sexton, Helmreich et al. 2006; Sorra, Famolaro, and Dyer 2007). Nonetheless, we used a clearly outlined research methodology to achieve this rate, and we attempted to compensate for our lowest response rate category (i.e., physicians) by sampling 100 percent of physicians. For comparisons among job-types, we adjusted for nonresponse by weighting results to account for the differential response rate of the different target populations.
We made a great effort to recruit hospitals across a broad spectrum of performance and geographic criteria (Rosen et al. in press) but our results may not be completely representative of the larger VA population. Analysis of our recruitment data shows that there was a significantly higher percentage of hospitals with low PSI rates (i.e., “high performers” in terms of patient safety) in our sample than in the nonrecruited population (p<.05), indicating a possible hospital selection bias. However, our efforts to recruit across the spectrum of performance improved the sample's distribution. Further, if our sample included an overabundance of high-performing hospitals (presumably those with higher levels of safety climate), then our data may actually represent a lower bound on the prevalence of a problematic safety climate within VA hospitals.
Finally, as with all cross-sectional studies, the data presented here represent only one point in time, which may be of particular concern given ongoing efforts to improve patient safety climate in VA hospitals. In addition, there may have been other factors affecting levels of problematic response that were not captured by our instrument. However, other studies using similar instruments outside the VA have found comparable percentages of problematic response (Singer et al. 2003, 2007).
It is important to understand the level of safety climate in hospitals to inform improvement efforts. This study presents the first reliable and valid information about a representative sample of VA hospitals. Our findings suggest that there is room for improvement and that institutions vary in their perceptions of safety climate, and in their strengths and weaknesses. Data from VA hospitals, like those from non-VA hospitals, suggest a complex picture of safety climate and attitudes linked to a number of factors. All hospitals need to attend to differences by management level and workgroup; more information about the causes of variation is needed. Our results point to a generally positive climate of safety across all VA institutions, workgroups, and types of workers. However, for the VA, similar to other health care organizations, it still remains a challenge to devise and implement strategies and interventions that successfully create and sustain the kind of highly intense climate of safety that is thought to be a hallmark of high reliability in other domains.
Joint Acknowledgement/Disclosure Statement: The research reported here was supported by the Department of Veterans Affairs, Veterans Health Administration, Health Services Research and Development Service as Investigator-Initiated Research (IIR03-303-2). In addition, Drs. Gaba and Singer and Ms. Falwell are supported by a grant from the Agency for Healthcare Research and Quality (R01 HS 013920).
Disclaimer: The views expressed in this report are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.
Funding provided by VA Health Services Research and Development (HSR&D) Service (grant #IIR03-303-2) and the Agency for Healthcare Research and Quality (AHRQ) (grant #R01 HS 013920).
The following supplementary material for this article is available online:
HSR Author Matrix.
This material is available as part of the online article from http://www.blackwell-synergy.com/doi/abs/10.1111/j.1475-6773.2008.00839.x (this link will take you to the article abstract).