Investigations of nurse burnout are highly relevant given the global shortage of nurses and the need to retain qualified nurses in clinical care roles (Aiken, Buchan, Sochalski, Nichols, & Powell, 2004). Additionally, nurse burnout has been associated with patient dissatisfaction and other measures of deficient care quality (Vahey, Aiken, Sloane, Clarke, & Vargas, 2004).
Burnout was first introduced into the literature by Freudenberger in the early 1970s (Freudenberger, 1974). He defined burnout as a state of fatigue or frustration that resulted from professional relationships that failed to produce the expected rewards (Freudenberger, 1974; Freudenberger & Richelson, 1980). Maslach (1982) later defined burnout as a psychological syndrome involving emotional exhaustion, depersonalization, and a diminished sense of personal accomplishment that occurred among various professionals who work with other people in challenging situations. In Maslach’s view, burnout undermines the care and professional attention given to clients of human service professionals such as teachers, police officers, lawyers, nurses, and others (Maslach, 1982).
There is no consensus on the measurement of burnout. The Maslach Burnout Inventory (MBI) (Maslach & Jackson, 1981a) is the most commonly used instrument for measuring burnout. It captures three dimensions of burnout: emotional exhaustion (EE), depersonalization (DP), and personal accomplishment (PA). Maslach’s team demonstrated, using data from U.S. samples, that the subscales have good psychometric properties (Maslach & Jackson, 1981a). Other researchers have added to the evidence confirming the MBI as a useful tool for research (Greenglass, Burke, & Fiksenbaum, 2001; Hastings, Horne, & Mitchell, 2004) and supporting the three-dimensionality of the MBI (Evans & Fischer, 1993). However, some researchers have conceptualized burnout as having a two-factor structure that includes only the emotional exhaustion and depersonalization attributes (Kalliath, O’Driscoll, Gillespie, & Bluedorn, 2000). Some have suggested viewing it as a unidimensional phenomenon (Brenninkmeijer & VanYperen, 2003; Halbesleben & Buckley, 2004). Still others have relied solely on the emotional exhaustion subscale of the MBI because of its strong predictive properties (Aiken, Clarke, Sloane, Sochalski, & Silber, 2002; Aiken & Sloane, 1997).
Given the lack of consensus about the measurement of burnout in the U.S. where the MBI was developed, it is not surprising that measurement issues are of even greater concern in international research. Researchers have expressed concerns about the existence of burnout in countries with different work environments and organizational structures, and about the ability of research instruments generally and the MBI in particular to capture burnout in those settings.
International studies of burnout have included both single-country samples (Kanste, Miettunen, & Kyngäs, 2006; Langballe, Falkum, Innstrand, & Aasland, 2006) and, in a few cases, multi-country samples (Aiken et al., 2001; Perrewe et al., 2002). These studies have often used different instruments to measure burnout and neglected to investigate the performance of those instruments in the context of the countries studied. In some cases, the studies have lacked representative samples of nurses, and some of the multi-country studies used data derived using decidedly different research protocols. Moreover, several researchers have demonstrated that some of the items included in the MBI subscales fail to measure the latent construct and yield low loadings on the factors or cross-load on more than one factor (Byrne, 1991; Koeske & Koeske, 1989), while others have demonstrated different (and better) item loadings (Higashiguchi et al., 1999) and different factor structures altogether (Schmitz, Neumann, & Oppermann, 2000).
A recent meta-analysis looked at 45 studies that explored the factorial structure of the MBI (Worley, Vassar, Wheeler, & Barnes, 2008). Only 5 of these 45 studies used samples of nurses, mainly in English-speaking countries, and in some of these studies the samples included other professionals as well. The sample sizes in most studies were less than optimal for conducting factor analysis, and in some of the larger-sample studies certain items were removed from the MBI. Comrey and Lee (1992) suggest that, for factor analysis, a sample size of 300 is good, 500 is very good, and 1,000 is excellent. Taking into account that the interpretation of the three dimensions of the MBI is different for nurses than for other professionals (Vanheule, Rosseel, & Vlerick, 2007), the small sample sizes of most studies, and the lack of studies using representative international samples of nurses, the evidence is limited and somewhat ambiguous regarding the performance of the MBI among nurses internationally.
This study investigates the factorial structure of the full MBI among large representative samples of nurses from eight countries, including English and non-English speaking countries, and provides evidence regarding the utility of the MBI in cross-national burnout research. The development of a strong instrument should help to advance burnout research and contribute to the design of successful interventions to reduce nurse burnout.
While burnout measures have been developed by other researchers, including the Burnout Measure (BM) (Pines & Aronson, 1981) and the Copenhagen Burnout Inventory (CBI) (Kristensen, Borritz, Villadsen, & Christensen, 2005), the MBI is the instrument most widely used by researchers. The development of the MBI was based on early research by Maslach and Jackson, who conducted interviews and surveys among various professionals. Those interviews served as a basis for the three-subscale MBI. Maslach and Jackson investigated the performance of the three MBI subscales and demonstrated that they had good psychometric properties; Cronbach’s alphas for all three subscales were above 0.7. They also established the convergent validity of the MBI by correlating individual MBI scores with: 1) measures of various outcomes, such as job dissatisfaction, that were hypothesized to be related to burnout subscales; 2) job characteristics that were expected to contribute to the development of burnout, such as difficult workloads; and 3) behavioral ratings provided by other persons who knew the scored individuals very well (e.g., spouses and co-workers). All correlations provided evidence about the validity of the MBI and its dimensions (Maslach & Jackson, 1981b).
A review of 34 burnout studies by Hwang and colleagues (2003) concluded that even though all three factors of the MBI have not been replicated exactly across studies, there was considerable evidence that the MBI is a useful tool across a wide range of occupations, languages, and countries. The studies reviewed were conducted by many different researchers using differing research protocols and study designs. The unique advantage of this study is that it explores the factorial structure and performance of the MBI using a common investigator and instrument in eight countries, thus adding new knowledge regarding burnout measurement cross-nationally.
The purpose of this study was to investigate the performance of the items and the subscales of the Maslach Burnout Inventory (MBI) by validating its factorial structure and investigating the reliability of the subscales in the eight countries for which we had samples of nurses. Our results should help future researchers to know whether the MBI is an equally valid and reliable measure of the different dimensions of burnout in different countries and whether cross-national comparisons are possible using the MBI.
This investigation was conducted using data from the International Hospital Outcomes Study (Aiken, Clarke, & Sloane, 2002). In 1998–1999, the study was conducted among nurses in four countries (the U.S., Canada, the U.K., and Germany) (Aiken et al., 2001); in 2001 it was replicated in New Zealand (Finlayson, Aiken, & Nakarada-Kordic, 2007); in 2002 it was replicated in Russia and Armenia (Aiken, 2005; Aiken & Poghosyan, 2009); and in 2005 it was replicated in Japan (Kanai-Pak, Aiken, Sloane, & Poghosyan, 2008). The response rates for nurses surveyed in the original four countries in 1998–99 ranged from 42 percent to 53 percent. The response rates in the replications were 37% in New Zealand, 75% in Russia, 100% in Armenia, and 84% in Japan.
In the U.S., a 50 percent sample of registered nurses who were licensed in and residents of the state of Pennsylvania were surveyed. The respondents included over 13,000 nurses employed in 209 adult acute care hospitals. In Canada, nurses working in all adult acute care hospitals in three provinces (Ontario, Alberta, British Columbia) were surveyed and respondents ultimately included over 17,000 nurses in 303 different hospitals (the complete census of registered nurses working in adult acute care hospitals in Alberta was surveyed, while in British Columbia and Ontario representative samples from nurses working in acute care hospitals were drawn). In the U.K. and Germany, hospitals were asked to provide the lists of nurses employed at the hospitals, and all nurses in those lists were surveyed. In the U.K. 9,855 nurses from 63 hospitals and in Germany 2,681 nurses from 29 hospitals responded to the survey. In those countries questionnaires were distributed to all nurses on the units of the participating hospitals, and nurses returned them to the research team after completion.
In New Zealand nurses working in 19 publicly-funded non-specialty hospitals were surveyed. The surveys in these countries were conducted following a modified Dillman (1978) approach to mail surveys. The surveys were mailed to nurses with envelopes for returning the completed questionnaires, and surveys were returned by 4,799 nurses in the 19 hospitals. In Japan, 19 hospitals participated in the survey in 2005 (Kanai-Pak et al., 2008). The nursing directors of the hospitals were contacted about the study. After receiving the approval and the agreement of the nursing directors, nurses in participating hospitals were surveyed. The questionnaires along with a cover letter were distributed to each nurse in their units. The nurses were instructed to complete the questionnaires in private and return them in the special box on each unit, and 5,956 surveys were completed by the nurses in these hospitals.
In the decidedly smaller replication in 2002 of the International Hospital Outcomes Study in Armenia and Russia, two hospitals in each country known for having more advanced professional nursing were selected to participate (Aiken & Poghosyan, 2009). Overall 840 nurses participated in the study from these four hospitals.
In all countries, studies utilized virtually the same questionnaire. Translations of the questionnaire wording to languages other than English were back-translated and carefully checked for differences in meaning. After the original and translated versions of the instruments were compared, they were pilot-tested for final revisions. There were, however, some variations in the data collection procedures: in some countries the data collection took place in hospital settings, while in other countries nurses had the surveys mailed to their homes. Some differences in response might result from these differences in procedures.
In this study we restrict our attention to nurses working in adult general hospitals. The final sample of nurses for the analyses described below included 54,738 nurses in 646 hospitals in the eight countries. Selected characteristics of the nurse samples in the different countries are presented in Table 1. The mean age of the nurses in Japan (29.2 years) was somewhat lower than in the other countries, where it ranged from roughly 34 to 42 years, and their experience, or number of years in nursing, was decidedly less: about 7 years in Japan vs. 11 to 18 years in the other countries. While females predominated in all samples of nurses, the percentage of male nurses varied considerably across countries, from only 1 percent in Russia and Armenia to 8 percent in the U.K. and 15 percent in Germany. No question about nurse education was included in the surveys of Russian and Armenian nurses, but in the other countries educational credentials differ markedly: only between 10 and 20 percent of the nurses from Canada, the U.K., Germany, and Japan were educated in university settings, compared with more than a third of the nurses in the U.S. and New Zealand.
Burnout was measured using the emotional exhaustion (EE), depersonalization (DP) and personal accomplishment (PA) subscales that are parts of the 22-item MBI (Maslach & Jackson, 1981a). The EE subscale describes feelings of being emotionally exhausted because of the work and contains nine items. The PA subscale contains eight items that describe beliefs of competence and successful achievement at work. The DP subscale describes detached and impersonal treatment of patients and consists of five items. Each of the 22 items asks nurses to describe their feelings on a 7-point scale, ranging from never having those feelings to having those feelings a few times a week.
This study used both confirmatory and exploratory factor analysis to investigate the factor structure of the MBI in different samples of nurses. Factor analysis involves considering the joint distribution of the full set of variables in an inventory and combining variables into factors when they are correlated with one another and independent of the other variables in the set (Tabachnick & Fidell, 1996). Our use of factor analysis was guided by Tabachnick and Fidell (1996), Comrey and Lee (1992), and Kim and Mueller (1978). In this study, factor analysis allows us to determine whether the items of the MBI comprise the same factors or subscales in all countries that were originally suggested by Maslach and Jackson (1981a). Exploratory factor analysis was used after the confirmatory factor analysis showed that the factor structure of the 22 items was not entirely consistent with the three subscales in the original Maslach Burnout Inventory in any of the countries.
In our analyses, principal component methods were used to extract factors, and after the three factors were extracted they were rotated with both varimax and promax rotations to achieve interpretable results. Promax rotation was preferable to varimax rotation since the three factors of the MBI were found to be significantly correlated. The models were evaluated for their ability to produce subscales that have items with loadings (or item-to-factor correlations) higher than 0.3. Items with loadings lower than or equal to 0.3 were excluded from consideration for inclusion; loadings greater than 0.3 are considered minimal (Merenda, 1997) and loadings of 0.40 are considered important (Hair, Anderson, Tatham, & Black, 1998). After the three subscales of the MBI were extracted, they were investigated for their internal consistencies. The coefficient of reliability (Cronbach’s alpha) for each subscale in every country was calculated to determine how well the items in each subscale measure the latent construct. We calculated correlation coefficients to assess how strongly the subscales were associated with one another in the various countries and, to describe the levels of burnout in each country, we calculated means and standard deviations on each MBI subscale.
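The extraction and loading-cutoff steps described above can be illustrated with a minimal sketch. It assumes unrotated principal components computed from the item correlation matrix; the promax and varimax rotations used in the study are omitted, and the function names (`principal_component_loadings`, `assign_items`) and the simulated responses are hypothetical and are not study data or the authors' code.

```python
import numpy as np

def principal_component_loadings(data, n_factors=3):
    """Unrotated principal-component loadings (item-to-component correlations)."""
    corr = np.corrcoef(data, rowvar=False)          # item-by-item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]   # indices of the largest components
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def assign_items(loadings, cutoff=0.3):
    """Index of the factor with the largest absolute loading per item,
    or -1 when no loading exceeds the cutoff (item excluded)."""
    magnitude = np.abs(loadings)
    factor = magnitude.argmax(axis=1)
    factor[magnitude.max(axis=1) <= cutoff] = -1
    return factor

# Simulated responses (NOT study data): 3 latent factors, 2 items per factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 3))
noise = rng.normal(size=(2000, 6))
data = 0.8 * np.repeat(latent, 2, axis=1) + 0.6 * noise

loadings = principal_component_loadings(data)
print(assign_items(loadings))
```

Because the components are unrotated, the signs of the loading columns are arbitrary and the pattern is harder to interpret than a rotated solution; the sketch only shows how a 0.3 cutoff turns a loading matrix into item assignments.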
Table 2 provides the goodness-of-fit statistics for the confirmatory factor analysis model for each country. While the values of the Root Mean Square Error of Approximation (RMSEA) and Bentler’s Comparative Fit Index (CFI) approach the values that are usually considered acceptable (i.e., RMSEA < .06 and CFI > .90, respectively), the RMSEA shows an acceptable fit only in Russia and the CFI value is unacceptable in every country. Moreover, the chi-square statistic indicating the goodness-of-fit in each country suggests an unacceptable fit of model to data in every country. As such, Maslach’s original 3-factor solution does not appear to be consonant with the observed data in these countries, though by at least two of the indicators it does come reasonably close.
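For readers unfamiliar with these fit indices, the following sketch shows how RMSEA and CFI are commonly computed from the model chi-square, its degrees of freedom, the sample size, and the chi-square of a baseline (independence) model. The formulas are the standard textbook definitions, not taken from the article; some software uses N rather than N − 1 in the RMSEA denominator, and the input values in the example are hypothetical rather than values from Table 2.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_null, df_null):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null

# Hypothetical values for illustration only (not from the study).
print(rmsea(chi2=300.0, df=100, n=201))
print(cfi(chi2=300.0, df=100, chi2_null=2100.0, df_null=100))
```

With very large samples such as those in this study, the chi-square test rejects almost any model, which is one reason indices such as RMSEA and CFI are reported alongside it.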
The results of the exploratory factor analysis, shown in Table 3, do nonetheless result in three factors being extracted from the 22-item MBI. The scree plot (not shown) clearly indicated the existence of three factors in each country, and virtually all MBI items yielded loadings higher than 0.3 on the three different factors that were extracted. The primary difference between the three factors extracted in the exploratory analysis and the original three subscales suggested by Maslach is that in all countries the item stating that “Working with people all day is really a strain for me” loads on the depersonalization factor rather than the emotional exhaustion factor, and in seven of the eight countries the same is true of the item stating that “Working directly with people puts too much stress on me.” In Russia, one item (“I feel frustrated by my job”) from Maslach’s emotional exhaustion subscale loaded on the depersonalization factor, another (“I feel I’m working too hard on the job”) loaded on the personal accomplishment subscale, and one other item (“I feel patients blame me for their problems”) failed to load on any of the subscales. In Armenia, three of the original five items in Maslach’s depersonalization subscale (including the “Blame” item just mentioned) failed to load on any scale, as did one of the eight items in the original personal accomplishment scale. The only item that did not yield a sufficient loading on any of the factors in the other six countries was the “Blame” item in Germany.
In summary, in nearly all countries three useful subscales emerge with only slight modification: the two items (6 and 16) related to the “stress” and “strain” involved in working with people should be included in the depersonalization subscale rather than the emotional exhaustion subscale to which they were initially assigned. These three subscales (now a 7-item EE subscale, an 8-item PA subscale, and a 7-item DP subscale) are almost fully supported by the exploratory factor analysis in all of the countries except Armenia, where three of the DP items failed to load on that subscale and one item (6) cross-loaded on the EE subscale, and, to a lesser extent, Russia, where items 13 and 14 loaded on the DP and PA subscales, respectively, rather than the EE subscale.
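The modification summarized above amounts to a small change in the scoring key, sketched below. The full item-to-subscale assignment in `ORIGINAL_KEY` is an assumption based on the commonly published 22-item MBI scoring key; the article itself names only items 6, 13, 14, 16, 21, and 22. Scoring a subscale as the sum of its item ratings is likewise an assumption, and the function names are hypothetical.

```python
# Assumed original key (standard published MBI item numbering; not given in full in the article).
ORIGINAL_KEY = {
    "EE": [1, 2, 3, 6, 8, 13, 14, 16, 20],   # emotional exhaustion, 9 items
    "DP": [5, 10, 11, 15, 22],               # depersonalization, 5 items
    "PA": [4, 7, 9, 12, 17, 18, 19, 21],     # personal accomplishment, 8 items
}

def modified_key(key, moved=(6, 16)):
    """Move the 'stress' and 'strain' items from EE to DP, as the findings suggest."""
    new = {name: list(items) for name, items in key.items()}
    for item in moved:
        new["EE"].remove(item)
        new["DP"].append(item)
    new["DP"].sort()
    return new

def score(responses, key):
    """Sum item ratings per subscale; `responses` maps item number -> rating."""
    return {name: sum(responses[i] for i in items) for name, items in key.items()}

MODIFIED_KEY = modified_key(ORIGINAL_KEY)
print({name: len(items) for name, items in MODIFIED_KEY.items()})
```

Keeping the original key untouched and deriving the modified key from it makes it straightforward to score the same responses both ways and compare the results.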
The fact that one other item failed to exhibit a substantial loading in Germany (item 22 on the DP subscale) and another item failed to exhibit a substantial loading in Armenia (item 21 on the PA subscale) is largely ignorable. This is suggested by the fact that all three of the resultant scales yield Cronbach’s alphas, shown in the top panel of Table 4, which exceed the critical value of .70, except for the depersonalization subscale in Armenia. With that one exception, the three subscales, in spite of including a single non-performing item in one or two countries that does not greatly reduce their scalability, provide useful subscales that can be consistently defined in all countries.
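As a reminder of what the reliability coefficient measures, the sketch below computes Cronbach's alpha from a respondents-by-items matrix using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name and the toy data are hypothetical and are not drawn from the study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of item ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the subscale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # sample variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Toy data (NOT study data): three items that move together perfectly.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(consistent))
```

Alpha rises as items covary more strongly and as the number of items grows, so a subscale that loses several items, as the DP subscale effectively does in Armenia, can be expected to show lower reliability.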
The correlations between the three subscales are presented in the middle panel of Table 4. The EE and PA subscales were mildly negatively correlated except in Japan, Russia, and Armenia, where the correlations were virtually nil. The correlation coefficients ranged from −.15 to −.25 in the five countries where these two subscales were negatively correlated. The correlation coefficients for the EE and DP subscales were strong and positive and ranged across the eight countries from 0.42 to 0.60. The correlations between the DP and PA dimensions were negative and relatively similar across all countries except Japan, where the two subscales were only weakly correlated, and Armenia, where the DP subscale is suspect.
The bottom panel of Table 4 shows the average scores on the different burnout subscales across countries. Because of differences in research protocols across countries, these differences should be interpreted cautiously. Japanese nurses exhibit higher levels of emotional exhaustion and depersonalization and lower levels of personal accomplishment than nurses in countries other than Armenia and Russia. German nurses exhibit less emotional exhaustion than nurses in similar countries. Russian and Armenian nurses are quite unlike nurses from any other countries in that they have lower average scores on all three subscales at once.
This study used confirmatory and exploratory factor analyses to investigate the factorial structure of the MBI in eight countries, and while Maslach’s initial configuration of the burnout inventory was not completely confirmed, with minor modifications the three-factor structure of the MBI was largely validated. Except for two items, the MBI items loaded significantly on the factors they were expected to in virtually all countries. Even though a few items of the MBI subscales did not load on the original subscales, the items performed similarly in most countries, and the lack of item loadings in the few cases may be explained by the properties of the original instrument rather than country-specific differences. Moreover, the range of the item loadings was narrow, suggesting that there is no large variation in how each item performs in a specific country. Overall, the EFA demonstrated that the MBI performs relatively similarly across countries, and the item loadings provided evidence about their equivalency across countries.
The findings of this study are consistent with the findings in the literature but add to our knowledge because of the international scope of the investigation. Two EE items load on the DP subscale rather than the EE subscale: item 16 (“Working with patients is really a strain for me”) in all countries, and item 6 (“Working with people directly puts too much stress on me”) in most countries. A similar finding was demonstrated by other researchers (Beckstead, 2002; Byrne, 1991). Both items describe nurses’ feelings of strain and stress from working with people or patients. Since feelings of stress and strain are similar to the negative feelings captured by the items on the depersonalization subscale, it is not surprising that these two items load on the DP subscale. The consistency of this finding across countries suggests that it should be taken into consideration in future revisions of the instrument.
Since the factor analysis was performed with samples of nurses in different countries, some differences in item loadings were anticipated (Tabachnick & Fidell, 1996). It is not realistic to expect that the factor loadings will be identical for all groups (Byrne, 1991). Nurses from different countries may perceive and report items differently, which may result in variability in the MBI loadings.
In all countries, the EE and DP subscales were significantly correlated. This finding is consistent with the findings reported by other researchers (Byrne, 1991; Maslach & Jackson, 1981b). Among nurses in all countries, feelings of emotional exhaustion, depersonalization, and personal accomplishment are similarly correlated, suggesting that emotionally exhausted nurses or nurses who are experiencing depersonalizing feelings perceive that these conditions are associated with their accomplishments at work. Additionally, the very weak or nearly absent relationship of emotional exhaustion and depersonalization with personal accomplishment in Japan may suggest that these nurses have different perceptions about their own accomplishments. In Asian countries, professionals may have unique ways of defining their accomplishments at work (Tang, 1988). Future research is needed to understand how nurses in Asian countries rate their work performance and accomplishments, and which factors are associated with their accomplishments at work.
Even though comparing burnout levels across countries using mean scores of the MBI subscales is not the main focus of this study, some of the patterns in the scores are worth investigating in future research. The high burnout rates in the U.S. and Canada are consistent with their short average length of hospital stay. Hospitals in the U.S. have the shortest average length of stay in the world. When hospital length of stay is short, nurses do the same amount of work but in a shorter period of time, and the cycle of admissions and discharges is more rapid, which places a significant burden on nurses. Germany has a long length of stay and low burnout, but this may change now that Germany has adopted prospective payment for hospitals using Diagnosis Related Groups, which in the U.S. led to significant reductions in length of stay. Japan does not follow the pattern expected by average length of stay, as we observed high rates of burnout and long lengths of stay. This discrepancy was addressed in a recent paper showing that in Japan a young and inexperienced nurse workforce in hospitals and poor physician-nurse relations appear to explain the comparatively high burnout rates (Kanai-Pak et al., 2008).
This study has several limitations. The data collection protocol and the nurses surveyed differed in some ways across countries, and that variability may influence the findings. The relatively small sample of nurses and hospitals in Armenia and Russia may be a factor in the results from these countries. While attempts to verify the accuracy of translations were undertaken, we cannot completely rule out the possibility that there may be some inaccuracies in the translations. Nonetheless, it is unusual to have a measure of nurse burnout across so many different countries located in different parts of the world, and it is remarkable that the factor structure of the MBI was as consistent across countries as we found.
Nurse job-related burnout was observed in countries with very different types of health services organization, financing, and resources. It is important to investigate the causes of burnout and to identify potential solutions to address the phenomenon. This study demonstrated that the 22-item MBI has a similar factorial structure and, with minor modifications, performed similarly across countries. While additional investigations of the factor structure of the MBI using other samples would be useful, the modified versions of the subscales can be used with greater confidence as burnout measures among nurses internationally to determine the effectiveness of burnout reduction measures generated by institutional and national policies.
This research was supported by the National Institute for Nursing Research (R01NR04513 and P30NR05043, Linda Aiken, principal investigator) and by AMN Healthcare, Inc. We thank Dr. Eileen Lake and Tim Cheney for their assistance.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Lusine Poghosyan, Assistant Professor of Nursing and Public Health, Bouvé College of Health Sciences, School of Nursing and School of Health Professions/Masters of Public Health (MPH), Northeastern University, USA.
Linda H. Aiken, The Claire M. Fagin Leadership Professor of Nursing, Professor of Sociology, Director, Center for Health Outcomes and Policy Research, School of Nursing, University of Pennsylvania, USA.
Douglas M Sloane, Adjunct Professor, Center for Health Outcomes and Policy Research, School of Nursing, University of Pennsylvania, USA.