The ACGME released revisions to the 2003 duty hour standards.
To review the impact of the 2003 duty hour reform as it pertains to resident and patient outcomes.
Medline (1989–May 2010), Embase (1989–June 2010), bibliographies, pertinent reviews, and meeting abstracts.
We included studies examining the relationship between the pre- and post-2003 time periods and patient outcomes (mortality, complications, errors), resident education (standardized test scores, clinical experience), and well-being (as measured by the Maslach Burnout Inventory). We excluded non-US studies.
One rater used structured data collection forms to abstract data on study design, quality, and outcomes. We synthesized the literature qualitatively and included a meta-analysis of patient mortality.
Of 5,345 studies identified, 60 met eligibility criteria. Twenty-eight studies included an objective outcome related to patients; 10 assessed standardized resident examination scores; 26 assessed resident operative experience. Eight assessed resident burnout. Meta-analysis of the mortality studies revealed a significant improvement in mortality in the post-2003 time period with a pooled odds ratio (OR) of 0.9 (95% CI: 0.84, 0.95). These results were significant for medical (OR 0.91; 95% CI: 0.85, 0.98) and surgical patients (OR 0.86; 95% CI: 0.75, 0.97). However, significant heterogeneity was present (I² = 83%). Patient complications were more nuanced. Some increased in frequency; others decreased. Outcomes for resident operative experience and standardized knowledge tests varied substantially across studies. Resident well-being improved in most studies.
Most studies were observational. Not all studies of mortality provided enough information to be included in the meta-analysis. We used unadjusted odds ratios in the meta-analysis; statistical heterogeneity was substantial. Publication bias is possible.
Since 2003, patient mortality appears to have improved, although this could be due to secular trends. Resident well-being appears improved. Change in resident educational experience is less clear.
The online version of this article (doi:10.1007/s11606-011-1657-1) contains supplementary material, which is available to authorized users.
In 2003, the Accreditation Council for Graduate Medical Education (ACGME) reduced resident work hours for all US residents.1 Resident well-being and patient safety were the main concerns relating to the excessive duty hours that residents sometimes worked prior to the 2003 changes. Even after these modifications, concerns remained about how compliant residents were with the duty hour rules,2 and one rigorous study suggested that further reductions in maximum shift length could lead to safer patient care.3
The Institute of Medicine (IOM) conducted an investigation in 2008 and published recommendations for further reductions in resident duty hours.4 Partially in response to the IOM report, the ACGME convened a duty hours task force to examine the issues and propose revisions in the duty hour standards. Those proposals were made public in June 2010 and will go into effect in July 2011.5 No changes were made to the 80-h/week limit or to the maximum frequency of call (every 3rd night). However, interns will now be held to a maximum shift length of 16 h. Residents who are post-graduate year (PGY) 2 and above will still be able to work 24-h shifts, but with only 4 additional h for hand-offs (reduced from 6 h).5 As residency programs prepare for the implementation of the new standards, it would be helpful to review how patient care and residents’ lives have changed since the 2003 duty hour rules were implemented. Our objective was to synthesize the research that specifically assessed the relationship of the pre- and post-2003 time periods to patient and resident outcomes.
This systematic review was part of a larger project completed at the request of the Accreditation Council for Graduate Medical Education (ACGME). The overall project goal was to examine all aspects of resident duty hours including sleep, fatigue, education, well-being, learning environment, patient safety, moonlighting, supervision, and the effects of the 2003 duty hour standards.6 Here we present the results of the studies examining the possible effect of the 2003 ACGME duty hour regulations on patient and resident outcomes. We followed PRISMA guidelines (Preferred Reporting Items for Systematic Reviews and Meta-Analyses).7
We conducted an electronic search of the literature in Medline and Embase in June 2009. The Medline search included in-process papers and was updated in May 2010. We updated the Embase search in June 2010. We used an extensive search strategy in Medline, based on prior work8,9 and consultation with a reference librarian. We conducted a similar search in Embase. We used a combination of MeSH subheadings and keywords that can be accessed online. We limited the entire list to studies published in 1989 or later and to English language papers. We reviewed the bibliographies of all included studies, and previous reviews, to identify additional studies.
To identify studies not yet published, we searched titles and abstracts from 2008–2010 national meetings of the Accreditation Council for Graduate Medical Education and the Society of General Internal Medicine. We also searched the Research in Medical Education abstracts from the meeting of the Association of American Medical Colleges for the years 2008–2009. We assumed that abstracts written before 2008 would have been published by the time we conducted our search. Finally, we had an expert review our bibliography.
Inclusion criteria were that the study had to contain data collected after the 2003 ACGME duty hour rules went into effect. If studies contained data prior to implementation of the 2003 duty hour rules, but the data had been obtained after instituting changes to achieve compliance with the duty hour rules, we included these studies. The included studies also had to have been conducted to assess the impact of the duty hour reform. Since ACGME duty hour regulations apply only to accredited programs in the United States, we included only studies conducted in the US.
We identified 5,345 citations in our electronic searches, and we reviewed the abstracts of all pertinent articles (Fig. 1). We divided the abstracts among the three study team members for review. During weekly conference calls, we discussed the abstracts about which we were uncertain and came to consensus about inclusion. In all decisions, we erred on the side of inclusion. Abstracts were rejected without further review if they were not research articles or they did not address the study topic. Each article was reviewed by the same study team member who had originally reviewed that paper’s abstract. Again, we discussed uncertainties regarding inclusion of papers as a group. Only one pertinent abstract was found in the meeting proceedings. It did not significantly change the results.10 We then excluded studies that did not include at least one of the following three outcomes: a direct measure of patient safety, an objective measure of educational outcomes (standardized test scores or experience), or well-being [measured using the Maslach Burnout Inventory (MBI)].11 This resulted in a final set of 60 studies.
We abstracted data from each study into a structured data abstraction form in a database called Research Electronic Data Capture (REDCap).12 This form included information on specialty, sample characteristics, study duration, study design, study quality (see below), and outcomes.
One study team member (KF) reviewed all included papers and abstracted all of the relevant data from them. To validate the abstraction process, the other two study team members each reviewed a randomly selected sample of 15 papers and abstracted all quality-related items from each. Other large systematic reviews have used a similar approach.13 Inter-rater agreement on study quality scores was calculated using a weighted kappa.
We used the Medical Education Research Quality Index (MERSQI) to assess study quality. Substantial evidence supports the use of MERSQI scores for evaluating research study quality.14 The MERSQI has a maximum score of 18, with 9.8 as an average score for medical education research studies. For the studies assessing patient outcomes, we rated the study quality again using the United States Preventive Services Task Force criteria for cohort studies.15 One point is awarded for meeting each of seven criteria for study quality, resulting in a 0–7 score, with 7 representing the highest quality possible.
We qualitatively synthesized the results of the studies in a deliberative process that included weekly conference calls and research-in-progress presentations. We plotted outcomes versus quality scores in an effort to understand the significance of the trends that we noticed. We considered the risk of bias such as publication bias, which would favor studies with significant outcomes. We were also cognizant of selective reporting of significant outcomes within our included studies, and tried to include and consider the importance of null results as well.
We conducted a random-effects meta-analysis (Stata version 11.2, College Station, TX) on the mortality studies that provided enough detail to calculate odds ratios (n=14). One study16 included patients that were a subset of patients from two other studies,17,18 so we excluded the study with the duplicate patients. For two studies, we used the mean upper and mean lower confidence limits of the combined studies. The first was a study that reported no deaths in either group,19 so confidence intervals were not available. The second study reported one death in the pre-2003 period and none in the post period.20 Because the number of deaths was virtually the same and an odds ratio cannot be calculated when a cell count is zero, we assigned an odds ratio of 1 to that study. Not all of the studies reported adjusted odds ratios; hence, we used unadjusted odds ratios, calculating them ourselves when they were not reported. In the studies that included data for more than 1 post-2003 year, we included the most recent year. The mortality outcomes varied across studies (e.g., in-hospital mortality, 30-day mortality). We also performed separate meta-analyses for studies with medical and surgical patients. We assessed publication bias using Egger’s test for small study effects.
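The two calculations described above, an unadjusted odds ratio from raw death counts and a random-effects pooled estimate, can be sketched as follows. This is an illustrative sketch only and not the Stata analysis itself: it assumes a Woolf (log-scale) confidence interval and the DerSimonian-Laird estimator of between-study variance, neither of which is specified in the text, and the function names and any counts passed to them are hypothetical.

```python
import math


def odds_ratio_ci(deaths_post, n_post, deaths_pre, n_pre, z=1.96):
    """Unadjusted odds ratio (post vs. pre) with a log-scale (Woolf) CI.

    Returns (or, lower, upper, log_or, se). Raises ValueError when any
    cell of the 2x2 table is zero, because the estimate or its variance
    is then undefined (the review handled such studies by assignment).
    """
    a, b = deaths_post, n_post - deaths_post   # post-2003: died, survived
    c, d = deaths_pre, n_pre - deaths_pre      # pre-2003: died, survived
    if min(a, b, c, d) <= 0:
        raise ValueError("zero cell: odds ratio or its variance is undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se),
            log_or,
            se)


def pool_random_effects(log_ors, ses, z=1.96):
    """DerSimonian-Laird random-effects pooling of log odds ratios.

    Returns (pooled_or, lower, upper, tau2).
    """
    w = [1 / se ** 2 for se in ses]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in ses]        # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            tau2)
```

A pooled odds ratio below 1 with an upper confidence limit below 1 corresponds to significantly lower odds of death in the post-2003 period.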
A list of all 60 included studies can be found in the online table. In the following paragraphs, we focus on three overarching themes: patient safety, resident education and resident well-being. For the studies that had quality rated by two investigators, the raw percent agreement was 89% and the kappa was 0.67, representing substantial agreement.75
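For readers unfamiliar with the agreement statistic, a weighted kappa for two raters scoring the same items on an ordinal scale can be sketched as below. The weighting scheme used in the review is not stated, so the linear weights here are an assumption, and the function name and inputs are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weight="linear"):
    """Weighted Cohen's kappa for two raters on an ordered category list.

    `categories` lists the possible scores in ascending order.
    `weight` is "linear" or "quadratic" (disagreement weights).
    """
    k = len(categories)
    n = len(ratings_a)
    if k < 2 or n == 0 or n != len(ratings_b):
        raise ValueError("need >= 2 categories and equal-length, non-empty ratings")
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (rater A score, rater B score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d

    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - num / den
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; near-misses on the ordinal scale are penalized less than distant disagreements.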
Twenty studies assessed mortality as an endpoint (Table 1); 12 assessed mortality in surgical patients, 4 in internal medicine patients, and 4 in both. Overall, these studies were of relatively high quality as measured by MERSQI scores between 12 and 16.67. The actual mortality outcomes varied across studies, and included overall, inpatient, and 30-day mortality. No studies demonstrated a worsening in mortality outcomes, and many trended toward improvement in the post-2003 time period.
We conducted a meta-analysis on the 14 studies that included enough detail for the calculations (Fig. 2). The meta-analysis of all included studies yielded a pooled odds ratio of 0.9 (95% CI: 0.84, 0.95). Separate meta-analyses were performed for studies assessing mortality in medical (OR 0.91; 95% CI: 0.85, 0.98) and surgical (OR 0.86; 95% CI: 0.75, 0.97) patients, and these also showed significant improvement in the post-2003 period. Significant heterogeneity was present (I² = 83%, p<0.01); I² > 56% is considered large.76 Egger’s test failed to show a small study effect (z=−0.05; p=0.97), providing no evidence of publication bias.
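The heterogeneity statistic reported above is conventionally derived from Cochran's Q. A minimal sketch of that relationship follows, assuming inverse-variance weights on the log odds ratio scale; the function names are hypothetical and the inputs in any call would be illustrative rather than the study data.

```python
def cochran_q(log_ors, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))


def i_squared(q, df):
    """I-squared as a percentage: share of variation beyond chance.

    `df` is the number of studies minus one. Negative values are
    truncated to zero, as is conventional.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0
```

Under this definition, Q equal to its degrees of freedom gives I² of 0%, and Q of twice the degrees of freedom gives 50%.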
The association of duty hour reform and complications was evaluated in 24 studies (Table 1). A diverse group of complications was evaluated, including surgical complications (e.g., intra-operative, postoperative) and medical [e.g., Intensive Care Unit (ICU) transfers, days on ventilators, adverse drug events]. The preponderance of studies demonstrated that some outcomes improved, some worsened, and some were unchanged.
We highlight the studies of highest quality. All used national databases and controlled for secular trends by comparing teaching intensive and non-teaching intensive hospitals. Browne et al. studied patients with hip fractures, and demonstrated that pneumonia, hematoma, and need for transfusions were significantly more likely in the post-duty hour time period, although measures of effect size were not included.67 In that study, many other complications did not differ between the two time periods. Rosen et al. demonstrated that in Medicare and VA patients, most patient safety indicators (PSIs), which are adverse events that are identifiable in administrative data,77 were equally common in both time periods, although certain PSIs occurred more often in the post-duty hour time period in the VA patients (OR 1.63).78 Silber et al. found no increased risk of prolonged hospitalization in Medicare or VA patients.79
Two studies assessed actual errors (as opposed to self-reported errors). A study in pediatric residents failed to show improvement in medication errors after changes to reduce duty hours were implemented.80 The other study showed that intercepted medication errors made by internal medicine residents were decreased after duty hour rules were initiated.26
This section focuses on the results of studies objectively measuring education using standardized tests and clinical experience. Heterogeneity in outcomes and incomplete reporting precluded meta-analysis.
Ten studies assessed the impact of the duty hour rules on standardized tests (see Table 2). Nine studies were of surgical (or surgical subspecialty) in-training examination scores,49,50,52,54,61,64,69,70,73 and one examined the obstetrics and gynecology in-training examination.35 These were mostly small single site studies with sample sizes ranging from 28 to 238. Several did not report the number of residents included.
Two studies reported an improvement in test scores after the duty hour rules were implemented.61,64 In the first, scores increased for interns only (from 59.5 to 72.4, p=0.006), but were unchanged among the other residents.61 In the other study, residents’ basic science and overall scores increased (by 4.7% and 3.6%, respectively), while clinical management scores did not.64 Five studies showed no change in examination scores between the pre- and post-duty hour periods.35,52,54,69,73 Two studies demonstrated a decrease in examination scores between the pre- and post-duty hour periods.50,70
The difference in operative experience before and after reform has been reported in 26 studies (Table 2). The type of operative experience varied between studies. Of the 17 studies that evaluated the relationship between duty hour rules and overall surgical experience or overall experience as the main surgeon, two showed significant decreases, one showed a significant increase, nine showed no change,40,52,54,55,59,64,71,73,81 and several others did not report statistical analyses. Many of these studies were single site studies, and likely underpowered to detect a true difference.
Other studies suggest that the volume of certain procedures may have changed; this outcome would not be captured in studies examining only overall operative experience. One study of cardiothoracic surgical experience showed that overall experience with coronary artery bypass grafting significantly decreased in the post-duty hour time period (from 148 cases to 110 cases combining all years of residency).82 Another study of abdominal trauma surgery42 found that the overall number of operative procedures per graduating resident in the last 2 years of their residency did not differ between the pre- and post-duty hour periods; however, there was a significant decrease in the number of advanced emergency abdominal cases (51 versus 31) and an increase in the number of basic abdominal cases (47 versus 84) when comparing the pre- and post-duty hour time periods.
A study by Coverdill et al. included interviews with surgical faculty members.83 One theme identified in the analysis of the interviews was that routine work was being shifted from residents to faculty. For example, faculty reported that residents came only to the operating room, while the faculty provided the preoperative and postoperative care. Another study counted the use of the relative value unit (RVU) modifier “-82,” which signifies that no qualified resident was available.71 The use of this modifier increased from 523 in the 2 years prior to duty hour rules to 6,542 in the 2 years post-duty hour rules.71
Eight studies assessed burnout, using the Maslach Burnout Inventory (see Table 3). One of these studies reported burnout in a cross-sectional study23 and reported the same data as part of a pre-post analysis.22 We include only the pre-post information in Table 3. Rates of burnout did not statistically significantly worsen in any study, although there was a non-significant worsening in one study.21 Burnout improved in five studies,22,30,36,51,54 most often as a result of a decrease in emotional exhaustion. Two studies found that a higher number of work hours was related to burnout.30,51 Specifically, in one study, working >80 h corresponded to rates of burnout near 70%, which decreased to 39% when working <80 h per week.30
Appropriate duty hour reform must consider the interests of all stakeholders involved. In a review of the frameworks used to conceptualize this discussion, Schwartz et al. point out that it is imperative to work from models that take into account the trade-offs associated with public policy issues such as this one.84 A recent review by Jamal and colleagues focused on the effects of duty hour limits on surgeons,85 and a second review by Reed et al. examined evidence specifically pertaining to shift length and night float.86 Our review differs from and expands upon these prior reviews by providing a comprehensive synthesis of the impact of the 2003 duty hour policy reforms on the most important stakeholder groups: patients and residents of all specialties.
Our major findings were in the areas of patient safety, resident education, and resident well-being. With respect to patient safety, our meta-analysis suggested an improvement in mortality between the pre- and post-2003 time periods. Medical and surgical complications were more variable, with some improving and others worsening. Resident burnout was improved.
The finding that mortality has improved over time must be considered with several important caveats. First, we used unadjusted odds ratios to conduct our analyses. Therefore, we cannot account for differences in patient characteristics between the two time periods. Of particular importance is the fact that we could not take advantage of the adjustments made in the subset of studies that used non-teaching hospitals as controls to account for temporal trends.17,18,27,28 It is important to note that after adjustment those studies largely found no change in mortality between pre- and post-2003. Therefore, our meta-analysis results could easily reflect improvements in quality of care that occurred over the time period studied rather than a direct result of the duty hour rules.
Complications were more nuanced, with some improving and some worsening in the post-2003 time period. One possible explanation for these variable results is that strategies for complying with duty hour reform may lead to improvements in certain types of complications, and a worsening in others. Another possible explanation is that certain complications are more sensitive to fatigue, and these improved post-reform, whereas outcomes more sensitive to discontinuity of care worsened. For example, in one surgical study, bile duct injuries decreased in the post-2003 time period, but conversion from laparoscopic to open cholecystectomy was significantly more common in the post-2003 time.20 Improved manual dexterity from being better rested could account for the former finding, consistent with prior simulation studies.87–89 The latter finding of more conversions to open procedures could reflect the impact of less resident experience with laparoscopy in the post-2003 time period. Less continuity of care may also contribute to certain complications. For example, if doctors are less familiar with patients, this could lead to delayed decisions and therapeutic interventions. This phenomenon could partially explain the increase in the number of cardiac surgery patients that remained on ventilators for >48 h in one of our studies.90 A third possibility is that these inconsistencies simply represent variation due to local factors or chance. For example, the complication of postoperative pneumonia was increased in one study67 and lessened in another.52 We are unable to explain the specific patterns found in these studies by any one of these explanations alone, so other factors are likely involved as well. Regardless, many complications appear to be worsening in the post-reform period, and this deserves further study as additional changes to duty hour rules are made.
The impact of duty hour reform on resident experience is also important. Today’s residents will become tomorrow’s independent doctors, and we must be confident that they are ready for practice.91,92 Most studies in our review did not demonstrate significant differences in overall resident operative experience between the pre-2003 and post-2003 time periods. However, the role of residents in surgeries may be evolving to one in which they have less responsibility. In addition, none of the studies assessed residents who had trained entirely after the 2003 reform compared with those who trained before the reform. Moving forward into an era of further restrictions, it will be essential to study not only the number of surgeries performed, but also the specific surgeries performed and the residents’ roles in those surgeries. This will allow us to better understand the full effect of reform on residents’ operative experience. There remains a paucity of data on patient care experience in the non-operative specialties. The non-surgical specialties could easily track the admitting diagnoses of all patients their interns see or the non-operative procedures that they perform. It is important to determine whether other specialties are struggling to maintain training experiences.
Another interesting finding from this review was the improvement in resident well-being following the 2003 duty hour reforms, which has been noted in prior work.9 We focused on burnout in this review, but other studies have corroborated the improvement in well-being by documenting more residents having babies in the post-2003 time period,93 greater ability to attend family events,38 and less perceived stress.94 However, other aspects of well-being such as rates of depression do not seem to have changed between the pre-2003 period and the post-2003 period.21,22,36,95 Prior research has demonstrated links between resident well-being and quality of patient care,96,97 making preservation of resident well-being extremely important. This improvement in well-being may be one explanation for why some patient care parameters are improving in the post-2003 time period.
Our study has limitations. Perhaps the greatest limitation of this review is that our conclusions rest upon studies demonstrating association, not causality. It is likely that other contextual changes unrelated to duty hour rules contribute to the observed effects. These confounders may explain much of the heterogeneity that we observe. However, decisions must frequently be made in the context of incomplete evidence. While a causal relationship between the duty hour rules and outcomes cannot be determined with certainty from the studies cited, we have diligently identified and synthesized the best available evidence. The possibility of publication bias is also a limitation. We reviewed abstracts from recent meetings in order to capture studies that have not yet made it to publication and also asked an expert to review our bibliography for omissions. Other limitations include the wide range of quality of the included studies. To account for this variability, we used the MERSQI to rate and compare study quality objectively. However, since the MERSQI is designed to measure quality across the full range of quantitative study designs, the instrument incorporates only broad aspects of methodological quality and thus does not account for finer methodological differences within study types. The decision about whether to include a study was made by a single reviewer, although we erred on the side of inclusion and discussed studies about which we were unsure. Additionally, most data from each study were abstracted by a single reviewer and could have been inaccurate. Finally, the reviewers were not blinded to the study authors or journals, which could result in bias as well. Despite these limitations, this review was comprehensive, including 60 studies. This allows conclusions to be drawn that were not possible when the last comprehensive reviews on this subject were published.8,9
Limitations notwithstanding, this review provides a comprehensive synthesis of the evidence base for the 2003 duty hour reforms in the US. The balance of evidence suggests that burnout among residents has decreased. Given the unacceptably high prevalence of burnout among trainees,96 the reduction in burnout represents an important success of the 2003 reforms. In contrast, data on residents’ educational outcomes, such as test scores and clinical experience, with the 2003 reforms are more mixed, preventing the formulation of any firm conclusions. Moreover, while our review included several studies that examined surgical residents’ operative experience before and after duty hour reform, we were unable to identify any study assessing the impact of the 2003 duty hour rules on the clinical experience of non-surgical residents (e.g., the number of patients seen with specific diagnoses or the number of bedside procedures done). As the new 2011 duty hour rules are implemented, it will be important to quantify any changes in the breadth of clinical exposure for all residents. While this review suggests a modest decrease in mortality following the 2003 duty hour limits, we are unable to exclude the possibility of secular trends playing a role. Nevertheless, because several studies reported increased rates of certain complications, special attention should be paid to monitoring these complications during future reforms. Future efforts to evaluate the impact of the 2011 duty hour limits should build upon this evidence base by using rigorous methods to examine the most important outcomes related to patient care and residents’ education.
Electronic supplementary material: Study quality and outcomes assessed for all included studies (DOC, 168 kb).
Author contributions Dr. Fletcher had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the analysis. Study concept and design: Fletcher, Reed, Arora. Acquisition of data: Fletcher, Reed, Arora. Analysis and interpretation of data: Fletcher, Reed, Arora, Jackson. Drafting of the manuscript: Fletcher. Critical revision of the manuscript for important intellectual content: Fletcher, Reed, Arora. Statistical analysis: Fletcher, Reed, Arora, Jackson. Obtained funding: Fletcher, Reed, Arora. Administrative, technical, or material support: Fletcher, Reed, Arora. Study supervision: Fletcher, Reed, Arora.
Funding Support This study was funded by a grant from the Accreditation Council for Graduate Medical Education.
Role of the Sponsor The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
Financial Disclosures Dr. Fletcher reports receiving funding from the VA HSRD and also from NCI. Dr. Reed reports receiving funding from the ABIM Foundation. Dr. Arora reports receiving grant funding from the ABIM Foundation, the ACP Foundation, the Agency for Healthcare Research and Quality, and the National Institute on Aging. Dr. Fletcher was a reviewer for the Institute of Medicine Report on Resident Duty Hours: Sleep, Supervision and Safety. Dr. Fletcher served voluntarily on the ACGME Committee on Innovation, Learning, and Education. Dr. Arora has provided testimony on duty hours to the Institute of Medicine Committee on Optimizing Graduate Medical Trainee Hours and Work Schedules to Improve Patient Safety and to the ACGME Duty Hours Congress as a representative of the American College of Physicians. Drs. Arora and Reed are members of the Association of Program Directors of Internal Medicine.
Additional Contributions We wish to thank Jessica Schmidt and Andrea Bruckbauer at the Milwaukee VAMC, Alexis Dye, MS, Sherrie Smaxwill and Mark Oium, MS, at the Medical College of Wisconsin, Katya Papatla at Duke University, Patricia Erwin and Kate Featherstone at the Mayo Clinic College of Medicine, and Meryl Prochaska, BA, and Diane Daviera, BS, at the University of Chicago, and Emily Chiu at the University of Michigan for their excellent research assistance. We also wish to thank Jack Littrell, MS, for his assistance with database creation and management, and DeWitt Baldwin, MD, at the ACGME for his assistance with obtaining funding. We are grateful to Jeffrey Jackson, MD MPH, for conducting the meta-analysis and to Dr. Monica Lypson, MD, for her review of the bibliography. Ms. Bruckbauer was a Milwaukee VAMC employee while this project was underway and was also paid through the ACGME grant. Ms. Schmidt is an employee of the Milwaukee VAMC and was also paid through the ACGME grant. Ms. Papatla was paid through the ACGME grant. Ms. Dye, Mr. Oium, Ms. Smaxwill, and Mr. Littrell are paid employees at the Medical College of Wisconsin and volunteered to help with this project. Ms. Erwin and Featherstone are paid employees of the Mayo Clinic College of Medicine. Ms. Prochaska and Daviera are paid employees of the University of Chicago, and Ms. Chiu was paid by the ACGME grant.
This work was presented at the 2010 national SGIM meeting and at the 2010 national Society of Hospital Medicine meeting, both times in poster form.