Objective To investigate the literature for evidence that workplace based assessment affects doctors’ education and performance.
Design Systematic review.
Data sources The primary data sources were the databases Journals@Ovid, Medline, Embase, CINAHL, PsycINFO, and ERIC. Evidence based reviews (Bandolier, Cochrane Library, DARE, HTA Database, and NHS EED) were accessed and searched via the Health Information Resources website. Reference lists of relevant studies and bibliographies of review articles were also searched.
Review methods Studies of any design that attempted to evaluate either the educational impact of workplace based assessment, or the effect of workplace based assessment on doctors’ performance, were included. Studies were excluded if the sampled population was non-medical or the study was performed with medical students. Review articles, commentaries, and letters were also excluded. The final exclusion criterion was the use of simulated patients or models rather than real life clinical encounters.
Results Sixteen studies were included. Fifteen of these were non-comparative descriptive or observational studies; the other was a randomised controlled trial. Study quality was mixed. Eight studies examined multisource feedback with mixed results; most doctors felt that multisource feedback had educational value, although the evidence for practice change was conflicting. Some junior doctors and surgeons displayed little willingness to change in response to multisource feedback, whereas family physicians might be more prepared to initiate change. Performance changes were more likely to occur when feedback was credible and accurate or when coaching was provided to help subjects identify their strengths and weaknesses. Four studies examined the mini-clinical evaluation exercise, one looked at direct observation of procedural skills, and three were concerned with multiple assessment methods: all these studies reported positive results for the educational impact of workplace based assessment tools. However, there was no objective evidence of improved performance with these tools.
Conclusions Considering the emphasis placed on workplace based assessment as a method of formative performance assessment, there are few published articles exploring its impact on doctors’ education and performance. This review shows that multisource feedback can lead to performance improvement, although individual factors, the context of the feedback, and the presence of facilitation have a profound effect on the response. There is no evidence that alternative workplace based assessment tools (mini-clinical evaluation exercise, direct observation of procedural skills, and case based discussion) lead to improvement in performance, although subjective reports on their educational impact are positive.
The assessment of clinical performance in medicine is important but challenging. Historically, assessments have been implicit, unstandardised, and based on holistic or subjective judgments (the apprenticeship model).1 However, recent reforms in postgraduate medical education2 3 have brought new systems for the assessment of competence and performance.
Workplace based assessment is one of these systems. Workplace based assessment refers to “the assessment of day-to-day practices undertaken in the working environment”4—or, more simply, workplace based assessment is an “assessment of what doctors actually do in practice.”5 Although many forms of assessment can be used to show a doctor’s knowledge or competence, there is evidence that competence does not reliably predict performance in clinical practice6; one major advantage of workplace based assessment is its ability to evaluate performance in context.7
Another strength of workplace based assessment is its formative potential. A recently published guideline for the implementation of workplace based assessment emphasises the importance of using such tools as assessments for learning rather than solely as assessments of learning.8 The critical element required to achieve this is the provision of feedback from assessor to trainee, enabling the trainee to steer his or her learning towards desired outcomes.9 There is now convincing evidence that systematic feedback delivered by a credible source can change clinical performance,10 although there are many complexities that influence the effectiveness of feedback in practice.11
Many different workplace based assessment methods exist, all designed to assess different aspects of performance. Commonly, assessment tools will fit into one of the following categories5:
These tools have been described in more detail elsewhere.9 12 13 14 Certain aspects of their utility1 (particularly their reliability, validity, and acceptability) have been scrutinised over the past few years,9 14 but there is still relatively little known about their educational impact.
It is tempting to suggest that, because workplace based assessment requires the provision of feedback, and feedback can lead to learning and improved performance, the implementation of such assessment strategies will have a positive impact on doctors’ learning and performance. However, despite the considerable weight placed on them in postgraduate training, there is little information in the medical education literature to support this claim.
The aim of this study was therefore to perform a systematic review of the literature to investigate the educational impact of workplace based assessment in an attempt to answer the question: “What is the evidence that workplace based assessment affects physician education and performance?”
The primary data sources for this review were the electronic databases Journals@Ovid (English language only, 1996–February 2010), Medline (1950–February 2010), Embase (1980–February 2010), CINAHL (1981–February 2010), PsycINFO (1806–February 2010), and ERIC (1966–February 2010). Evidence based reviews (Bandolier, Cochrane Library, DARE, HTA Database, and NHS EED) were accessed and searched via the Health Information Resources website, www.library.nhs.uk/default.aspx (formerly the National Library for Health).
The search terms (in English only) were
Results from the four searches were combined with “AND”, and duplicate results were removed. The remaining citations were displayed and examined. We decided to limit the search to the terms mini-clinical evaluation exercise, direct observation of procedural skills, case based discussion, and multisource feedback because these four tools are in common use internationally. Otherwise, terms used were kept as broad as possible to maximise the chance of finding relevant articles. Hyphens and abbreviations were not used in case they limited the search.
In addition, we searched reference lists of relevant studies and bibliographies of review articles.
Eligibility judgments were made by a single author (AM), with consensus from the second author (JA), on the basis of information found in the article’s title, abstract, or full text if necessary. Studies were included in the review if they met the following criteria:
Studies were excluded if the sampled population was non-medical or the study was performed with medical students. Review articles, commentaries, and letters were also excluded. The final exclusion criterion was the use of simulated patients or models rather than real life clinical encounters.
Data from eligible articles were extracted into a form to compare the studies. This procedure was performed by both authors independently, with disagreements being resolved by discussion. Column headings were
We assessed study quality using a series of quality indicators developed by Buckley et al15 as a Best Evidence Medical Education (BEME) guide (see box 1). We considered studies to be of higher quality if they met seven or more of these 11 indicators.
As a means of evaluating outcome, we applied Barr’s adaptation of Kirkpatrick’s four level evaluation model (see box 2) to the results of each study.16 Levels of evaluation were included in the outcome column of the data extraction table.
Our initial search was carried out using the Ovid database because of its good coverage of medical education literature; this yielded 201 articles. We screened titles and abstracts, leading to the exclusion of 163 articles. The remaining 38 articles were read in full, but 27 of these did not fit the inclusion criteria, leaving 11 studies for inclusion in the review.
We then performed the same search via the Health Information Resources website using Medline (15 studies identified), Embase (11 studies identified), CINAHL (60 studies identified), and PsycINFO (39 studies identified). No additional studies were identified from the evidence based reviews section. Removal of duplicates produced 114 articles, the titles and abstracts of which we screened and cross referenced with the initial Ovid search results: 110 either did not fit the inclusion criteria or had already been identified, leaving four new articles for inclusion in the review.
The ERIC database highlighted 14 articles, but all had been previously identified. Manual searching of reference lists identified one additional article of relevance, taking the total number of included articles to 16.17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 The figure summarises the results of the search strategy.
Details of the included studies and a summary of the data extracted are listed in the web extra table on bmj.com. Most of the studies were conducted in the United Kingdom and Canada, with smaller numbers originating from New Zealand, Australia and the United States. Study populations consisted of doctors from all levels of training and from different specialties, including primary and secondary care. Fifteen of the 16 included studies were non-comparative descriptive or observational studies, and one was a randomised controlled trial. Study quality was mixed. Of the included articles, 11 were graded as higher quality and five as lower quality according to BEME quality indicators (see web extra table). Eight studies examined multisource feedback,17 18 19 20 21 22 23 24 four concentrated on the mini-clinical evaluation exercise,25 26 27 28 one investigated direct observation of procedural skills,29 and three looked at multiple assessment methods.30 31 32
As part of a larger study investigating the reliability and feasibility of various different workplace based assessment methods in general practice training, Murphy et al20 asked study participants to rate the educational impact of multisource feedback on a 7-point Likert scale. The mean score was 4.2, indicating that most doctors felt multisource feedback held educational value (Kirkpatrick level 1). Other studies went further, attempting to show that multisource feedback could lead to modifications in attitudes (Kirkpatrick level 2a)18 19 or even changes in behaviour (Kirkpatrick level 3).21 22 24
However, in terms of tool effectiveness, results are mixed. A questionnaire of 249 foundation year 1 doctors18 revealed that nearly a third of trainees did not anticipate changing in response to multisource feedback. Similarly, a group of fully qualified surgeons were unlikely to make practice changes in response to feedback data, even if their multisource feedback scores revealed a need to consider change.19 On a more positive note, a survey of 113 family physicians21 showed that 61% had either made practice changes or were planning to make them in response to their multisource feedback data. More detailed focus group data from a small sample of these doctors22 revealed that feedback is useful only if it is perceived to be accurate and credible; feedback perceived as negative and inaccurate is much less likely to lead to practice improvement.
One prospective longitudinal study24 collected multisource feedback data from 250 family physicians on two separate occasions five years apart and found small to moderate improvements in scores the second time. However, the authors were unable to conclude that this performance improvement was due to the multisource feedback.
Brinkman et al17 carried out a randomised controlled trial to determine whether multisource feedback could lead to improvements in communication skills and professional behaviours in paediatric residents. As well as receiving a feedback report from collated multisource feedback data, participants in the intervention group were also obliged to fill in a self assessment form and to take part in a tailored coaching session to help them identify their strengths and weaknesses. Participants in the control group received standard feedback only. After five months, the multisource feedback group showed significant improvements in “communicating effectively with the patient and family (35%; 95% confidence interval, 11.0%-58.0%), timeliness of completing tasks (30%; 95% confidence interval, 7.9%-53.0%), and demonstrating responsibility and accountability (26%; 95% confidence interval, 2.9%-49.0%),” but only when rated by nursing staff.17 However, it is not clear whether the same performance improvements would have occurred without the tailored coaching sessions provided.
The four studies evaluating the mini-clinical evaluation exercise were all concerned with its educational impact as a formative assessment tool. A qualitative study from Canada25 investigated internal medicine residents’ perceptions of the exercise as an educational tool; most agreed that it had had a positive educational impact (Kirkpatrick level 1). The focus group participants also highlighted the point that use of the mini-clinical evaluation exercise as an assessment limited its value as an educational tool.
Nair et al studied the reliability, feasibility, and acceptability of the mini-clinical evaluation exercise in a group of international medical graduates,26 and found that nearly half were either satisfied or very satisfied with the exercise as a tool for learning (Kirkpatrick level 1).
Two studies from New Zealand looked at the educational impact of the mini-clinical evaluation exercise in anaesthesia training.27 28 Survey data revealed that the large majority of trainees (and their assessors) felt that the evaluation exercise improved the frequency and quality of feedback offered (Kirkpatrick level 1).27 Focus group and interview data built on these findings, suggesting that the mini-clinical evaluation exercise promoted educational interaction and improved training quality (Kirkpatrick level 1).28
We found no studies looking at the effect of the mini-clinical evaluation exercise on doctors’ performance.
An observational survey describing the implementation of direct observation of procedural skills, mini-clinical evaluation exercise, and multisource feedback in a London hospital29 provides some data on the educational impact of direct observation of procedural skills. A feedback survey returned by 25 of the 27 preregistration house officers completing the assessments revealed that most (70%) felt that direct observation helped to improve clinical skills (Kirkpatrick level 2b). Furthermore, 65% agreed with the statement, “I think that undertaking direct observation of procedural skills will improve my future career.” However, this study was graded as lower quality according to BEME indicators, and there was no evidence in this study (or any others) that direct observation of procedural skills leads to objective performance improvement.
Three studies looked at the impact of multiple assessment methods on education and training. A large survey collected the opinions of 539 surgical trainees on the Intercollegiate Surgical Curriculum Programme,30 an online portfolio that is used to administer various workplace based assessments (including mini-clinical evaluation exercise, case based discussion, direct observation of procedural skills, and multisource feedback). Over 60% of survey respondents felt that the programme sometimes or frequently impacted adversely on training opportunities, as a result of the time needed to complete the assessments. More than 90% stated that the programme had a neutral or negative impact on their training overall (Kirkpatrick level 1).
A questionnaire of 95 foundation year 2 doctors exploring their experiences of the foundation programme portfolio was slightly more positive.31 Most felt that the portfolio was effective in helping them to achieve their educational requirements (Kirkpatrick level 1), but some felt that its success as an educational tool was limited by lack of understanding of its contents and purpose (particularly by educational supervisors).
An observational study to evaluate the reliability and feasibility of workplace based assessment for assessing medical registrars also gave positive results in terms of educational impact.32 Participants completed a questionnaire about their experiences of workplace based assessment, and the large majority felt that mini-clinical evaluation exercise, direct observation of procedural skills, and multisource feedback were helpful in aiding personal development (Kirkpatrick level 1). There were also positive free-text comments about the ability of the assessments to provide a basis for feedback, although many participants also found them time consuming and administratively burdensome.
Again, there were no studies investigating the impact of multiple assessment methods on performance.
This systematic review brings together the available evidence concerning the educational impact of workplace based assessment and its ability to change doctors’ performance. Considering the emphasis now placed on workplace based assessment as a method of formative performance assessment, there are surprisingly few published articles exploring these areas, and the strength of the findings is questionable.
The strongest evidence for workplace based assessment improving performance comes from studies examining multisource feedback. Work reported in the psychology literature has shown that multisource feedback can lead to small improvements in performance over time,33 and a 10 year old study of medical education also showed that doctors exposed to specific feedback from peers, coworkers, and patients can use the data to inform changes in their practice.34 The studies we reviewed show conflicting evidence, however. Although some junior doctors18 and most surgeons19 displayed little willingness to change in response to multisource feedback, family physicians seemed more prepared to initiate performance changes.21 This variability may be due to individual differences; it is already known that performance improvement is more likely to occur when feedback indicates a need for change, when recipients have a positive view of feedback, and when they believe that change is feasible.33
The single randomised controlled trial in our review17 attempted to show improved performance in the intervention group allocated to multisource feedback, but the positive results seen might have been due to the coaching session that was also part of the intervention. The positive influence of facilitation in the effectiveness of multisource feedback has recently been established,11 35 especially in the context of negative feedback.36
It seems, therefore, that multisource feedback can lead to improved performance, but individual factors, the context of the feedback, and the presence (or absence) of facilitation can have a profound effect on the magnitude of the response.
We were unable to unearth any clear evidence to show that the mini-clinical evaluation exercise, direct observation of procedural skills, or case based discussion can lead to improvements in performance. The studies examining the mini-clinical evaluation exercise and multiple assessment methods showed largely positive results in terms of learner satisfaction but could not show changes in attitudes, skills, knowledge, or behaviour. The study of the impact of direct observation of procedural skills29 revealed that some house officers felt it could improve their clinical skills, but this evidence has not been captured objectively, and participant numbers were small. A previous systematic review investigating tools for direct observation and assessment of clinical skills found similarly few studies describing educational outcome.37
Most of the articles included in this review were non-comparative descriptive or observational studies. Strength of findings may be limited by the uncontrolled nature of the studies, but given the methodological difficulties of evaluating topics such as educational impact and doctor performance,38 descriptive and observational studies can still provide useful information. Indeed, some of the strongest evidence for improved performance after workplace based assessment comes from detailed focus group data.22 The single randomised controlled trial we identified attempted to establish causality (“multisource feedback causes performance improvement”),17 but, as discussed above, the results are undoubtedly affected by confounding factors.
Methodological rigour is clearly apparent in some articles, especially those aiming to evaluate multiple facets of workplace based assessment,20 32 but, because the focus here tends to be on reliability and feasibility, they may be less suitable for gathering data about educational impact or performance change.
Quality is also affected by the voluntary nature of participation in most of the studies. Potentially biased30 or highly motivated21 study populations can lead to profoundly different results. The reliance on self reporting and the small study populations in most of the studies also limit the quality and strength of their findings.
Our review methodology also has its limitations. Although our database search was extensive, we did not review the grey literature and so may have missed some relevant studies. The Ovid database search was also limited to English language publications, which may have introduced language bias. The search may also have been limited by the terms used (for example, multisource feedback is also known as 360° feedback, mini-peer assessment tool, and team assessment of behaviours, but these terms were not included in the search).
This review has highlighted once again the need for further research in the area of formative performance assessment: the increasing use of workplace based assessment methods in postgraduate medical training and recertification should provide fertile ground for this work. Serious consideration needs to be given to the use of study designs that are able to show conclusive links between workplace based assessment and performance improvement. So often workplace based assessment has been implemented wholesale, and evaluation has subsequently and understandably focused on feasibility and self reported outcomes. We need to move to an interventionist, experimental model to establish whether workplace based assessment makes a difference. Future studies will need to be ambitious, not only in size, to show significant change, but also in duration, exposing matched groups of doctors to different interventions over extended periods. This will require collaboration within and across nations.
Further avenues for future work are also clearly signposted from here. A focus on assessment programmes to show how workplace based assessment instruments can be used together would be of great practical benefit. The role of facilitation in workplace based assessment, and the extent of its involvement in performance improvement, must also be fully investigated. Finally, we need to discover whether formative assessment strategies such as workplace based assessment can reach Kirkpatrick’s highest levels of change, leading to improvements in care delivery and patient outcome.
Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous 3 years; no other relationships or activities that could appear to have influenced the submitted work.
Data sharing: No additional data available.
Cite this as: BMJ 2010;341:c5064