Learning in a clinical environment differs from formal educational settings and provides specific challenges for clinicians who are teachers. Instruments that reflect these challenges are needed to identify the strengths and weaknesses of clinical teachers.
To systematically review the content, validity, and aims of questionnaires used to assess clinical teachers.
MEDLINE, EMBASE, PsycINFO and ERIC from 1976 up to March 2010.
The searches revealed 54 papers on 32 instruments. Data from these papers were documented by independent researchers, using a structured format that included content of the instrument, validation methods, aims of the instrument, and its setting.
Aspects covered by the instruments predominantly concerned the use of teaching strategies (included in 30 instruments), supporter role (29), role modeling (27), and feedback (26). Providing opportunities for clinical learning activities was included in 13 instruments. Most studies referred to literature on good clinical teaching, although they failed to provide a clear description of what constitutes a good clinical teacher. Instrument length varied from 1 to 58 items. Except for two instruments, all had to be completed by clerks/residents. Instruments served to provide formative feedback ( instruments) but were also used for resource allocation, promotion, and annual performance review (14 instruments). All but two studies reported on internal consistency and/or reliability; other aspects of validity were examined less frequently.
No instrument covered all relevant aspects of clinical teaching comprehensively. Validation of the instruments was often limited to assessment of internal consistency and reliability. Available instruments for assessing clinical teachers should be used carefully, especially for consequential decisions. There is a need for more valid comprehensive instruments.
The online version of this article (doi:10.1007/s11606-010-1458-y) contains supplementary material, which is available to authorized users.
High-quality patient care is only feasible if physicians have received high-quality teaching during both their undergraduate and their residency years.1,2 Their medical development starts in a university environment and continues in a clinical setting, where they predominantly learn on the job. Most teaching in clinical settings is provided by physicians who also work in that clinical setting. Therefore, it is important that these physicians are good and effective teachers.3,4
There is a considerable body of literature on the roles of clinical teachers, including several review studies.5,10 Excellent clinical teachers are described as physician role models, effective supervisors, dynamic teachers, and supportive individuals, possibly complemented by their role as assessors, planners, and resource developers.11,13 Some of this literature, including a recent review, has described good clinical teachers by looking for typical behaviors or characteristics, which often fit into one or more of the above-mentioned roles.14,18
There is also a considerable body of literature defining physicians in terms of single roles, such as role models or supervisors.3,19,25 Effective role models are clinically competent, possess excellent teaching skills, and have personal qualities, such as compassion, sense of humor and integrity. Effective supervisors give feedback and provide guidance, involve their students in patient care, and provide opportunities for carrying out procedures. Studies on work-based learning show that work allocation and structuring are important for learners to make progress and that a significant proportion of their work needs to be sufficiently new to challenge them without being so daunting as to reduce their confidence.26,29 To assign work that provides effective learning opportunities, therefore, is essential.30
Physician competencies which should be acquired by trainees during their training have recently been formulated.1,4,31,32 Clinical teachers should at least role model these competencies.33 Box 1 summarizes the roles of the clinical teacher.
The assessment of clinical teachers in postgraduate education is often based on questionnaires completed by residents.25 It is important that these instruments should have good measurement properties. If used to help improve clinical teaching skills, such instruments should provide reliable and relevant feedback on clinical teachers’ strengths and weaknesses.6,25 If used for promotion and tenure, or ranking of clinical teachers, instruments should be able to distinguish between good and bad teachers in a highly valid and reliable way.
The American Psychological and Education Research Associations published standards identifying five sources of validity evidence: (1) Content, (2) Response process, (3) Internal structure, (4) Relations to other variables, and (5) Consequences (see Box 2).34,37
Beckman34 extensively reviewed instruments for their psychometric qualities, thereby giving useful recommendations on ways of improving this quality. However, we are unaware of any studies that focus specifically on the content of these questionnaires in relation to literature on good clinical teaching and on how instruments are used in practice. Therefore, we performed a review of instruments for assessing clinical teachers in order to determine (1) the content of these instruments (what they measure) and (2) how well these instruments measure clinical teaching (their construction and use).
We searched the MEDLINE, EMBASE, PsycINFO, and ERIC databases from 1976 through March 2010 (see online appendix Search Strategy). Search terms included clinical teaching, clinical teacher, medical teacher, medical education, evaluation, effectiveness, behavior, instrument, and validity. A manual search was performed by reviewing references of retrieved articles and contents of medical education journals. Two authors (CF and SB) independently reviewed the titles and abstracts of retrieved publications for possible inclusion in the review. If the article or the instrument was not available (N=5), we contacted the author(s). Studies were included after they had been reviewed by two authors (CF and SB) to make sure that they: (1) reported on the development, validation, or application of an instrument for measuring clinical teacher performance; (2) contained a description of the content of the instrument; (3) were applied in a clinical setting (hospital or primary care); (4) used clerks, residents, or peers for assessing clinical teachers. We restricted our review to studies published in English.
A standardized data extraction form was developed and piloted to abstract information from the included articles. Data extraction was done by three authors: the first author (CF) assessed all selected articles; two other authors (SB and MW) each assessed half the articles. Disagreements about data extraction merely concerned the content of six questionnaires, which were discussed by the three authors until consensus was reached.
The content of the instruments was assessed in two ways. First, we ascertained to what degree these instruments reflected the domains described in the literature as being characteristic of good clinical teaching (see Box 1). As the instruments we examined focused on teaching in daily clinical practice, we excluded the domain of ‘resource developer’ as this concerns an off-the-job activity which, moreover, may be more difficult for trainees to assess. Secondly, we wanted to know to what extent instruments assessed the way clinical teachers teach their residents the medical competencies of physicians. Therefore, we used the medical competencies as described by the Canadian Medical Educational Directives (CanMEDS): medical expert, communicator, collaborator, manager, health advocate, scholar, and professional, as these have been widely adopted. Good physician educators would be expected to act as role models of these competencies and be effective teachers of these competencies.4,33,38
The five sources for validity evidence served to analyze the psychometric qualities of the instruments (Box 2). Information was extracted about the study population, the setting where the instrument was used, evaluators, number and type of items, feasibility of the instrument (duration, costs, and number of questionnaires needed), the aim of the instrument, and how the instrument had been developed.
Reliability coefficients estimate measurement error by assessing and quantifying the measurement’s consistency.37 The most frequently used estimates are Cronbach’s α (based on the test-retest concept and indicating internal consistency), the Kappa statistic (a correlation coefficient indicating inter-rater reliability), ANOVA (also indicating inter-rater reliability), and generalizability theory (to estimate the concurrent effect of multiple-source reliability; note that this does not refer to the external validity of a measure). Comparisons with other instruments or related variables were documented.
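To make the two most frequently reported coefficients concrete, the following is a minimal sketch of how Cronbach’s α and an unweighted Cohen’s Kappa could be computed from raw ratings. It uses the usual textbook formulas and is not taken from any of the reviewed studies; the function names `cronbach_alpha` and `cohen_kappa` and the example ratings are assumptions made for illustration only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of ratings.

    scores: list of rows, one row per completed questionnaire,
    each row holding the ratings for the k items of the instrument.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across questionnaires.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of the total score per questionnaire.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same cases."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical data: four questionnaires, three items, 1-4 rating scale.
ratings = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(ratings)

# Hypothetical data: two observers classifying four teaching encounters.
kappa = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 1])
```

In this illustrative data set the items move in perfect step, so α reaches its maximum, whereas the two observers agree on three of four cases, which yields only a moderate Kappa once chance agreement is removed.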
Finally, the purpose of the instrument was documented: feedback (formative assessment) or promotion and tenure (summative assessment).
We found 2,712 potentially relevant abstracts, 155 of which were retrieved for full text review (see online Appendix Flow Diagram). Application of the inclusion criteria resulted in 54 articles.33,38,90 As some articles were about the same instrument, a total of 32 instruments were found. Table 1 presents their general characteristics. Instruments were most frequently used in an inpatient clinical setting (N=20) and tested in one discipline (N=25). Instruments were completed by residents (N=16), students/clerks (N=18), trained observers (N=2), or peers (N=1). Most instruments (N=28) were developed in the USA. There was a wide range in the number of teachers (9-711, median 41) and evaluators (2-731, median 66) involved in validation of the instruments.
The ‘teacher’ and ‘supporter’ domains were represented most frequently in the instruments (30 and 29 instruments), followed by ‘role model’ (27 instruments) and ‘feedback’ (26 instruments) (see online Appendix Table A). Together, these domains accounted for 479 items (79% of all items). Most of these items concerned teaching techniques (216 items, 36%). The domain of ‘planning teaching activities’ was represented by 33 items in 18 instruments. ‘Assigning work that is effective for learning’ was represented by 29 items, that is, 5% of all items, in 13 instruments. Items about ‘assessment’ were represented by nine items (2% of all items) in five instruments. Fifteen instruments asked for overall teaching quality or effectiveness (OTE). Seven instruments contained one question or several questions that were either open questions or questions that were not directly related to the quality of the individual teacher (other).
About one-third of all items (213 items) could be related to the competencies as described by CanMEDS (see online Appendix Table B). The other items did not refer specifically to competencies as specified in the CanMEDS roles. More than half of these (129 items) were related to the medical expert (102 items) and scholarship (27 items) competencies, evaluating the teaching of medical skills and knowledge (e.g., ‘the physician showed me how to take a history’; ‘uses relevant scientific literature in supporting his/her clinical advice’). There were 42 items on professional behavior. Role modeling and teaching (101 and 71 items) are strategies most frequently associated with the teaching and learning of competencies (e.g., ‘is a role model of conscientious care’, ‘sympathetic and considerate towards patients’).
The measurement characteristics of the validated instruments have been summarized in Table 2. The content validity of most instruments was based on the literature and the input of experts and residents/students. In 17 studies, a previously developed instrument served as a basis for the development of the new instrument. Irby’s questionnaire (1986) was mentioned four times for the development of an instrument.39,70,72,79 Not all studies documented what previously developed instrument had been used. Five studies reported the use of a learning theory for questionnaire construction.54,65,67,84,90 The number of available evaluations varied from 30 to 8,048 (median 506). Instruments contained 1 to 58 items, with Likert scale points ranging from four to nine. Information about feasibility in terms of costs, time needed for filling in the questionnaire, or minimum number of questionnaires needed was reported in eight instruments.
Studies represented a variety of validity evidence procedures, with the most common one being the determination of internal structure by factor analysis and/or of internal consistency by Cronbach’s α (20 studies). Less common validation methods were determining inter-rater and intra-class correlations, Pearson correlation coefficients, the Spearman-Brown formula, and studies using generalizability theory. In some studies, scores were compared to the overall teaching score or scores on other instruments. Some studies reported on hypotheses formulated in advance or compared scores of different respondent groups.33,43,46,55,64,70
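As an illustration of one of the less common methods, the Spearman-Brown formula predicts how reliability changes when the number of items or evaluations is multiplied by a given factor. The sketch below uses the standard form of the formula; the function names and the numerical values are hypothetical and are not drawn from the reviewed studies.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying the number of
    items (or evaluations) by length_factor."""
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )


def length_factor_needed(reliability, target):
    """Inverse of the formula: the factor by which the number of
    items (or evaluations) must grow to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))


# Hypothetical example: a single evaluation has reliability 0.5.
doubled = spearman_brown(0.5, 2)           # reliability of two evaluations
factor = length_factor_needed(0.5, 0.8)    # evaluations needed for 0.8
```

Under these assumed values, doubling the number of evaluations raises the predicted reliability from 0.5 to about 0.67, and roughly four evaluations per teacher would be needed to reach 0.8. This is the kind of calculation that underlies statements about the minimum number of questionnaires needed.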
The reported purposes of clinical teacher evaluations are summarized in online Appendix Table C. Not all authors documented how their instrument was to be used. Although providing feedback is the evaluation aim mentioned most frequently, 14 authors reported that the instrument was or would be used for summative purposes such as promotion, tenure, or resource allocation.
Our review revealed 32 instruments designed for evaluating clinical teachers. These instruments differ in terms of content and/or quality of the measurement.
Most instruments cover the important domains of teaching, role modeling, supporting, and providing feedback, roles that have been emphasized in the literature on clinical teaching.
Items on assessment of residents are least represented in the instruments. Assessment is becoming more and more important since society is increasingly demanding accountability from its doctors.91 With the shift towards competency-based residents’ training programs, there is also a growing need for measuring competency levels and competency development, including not only knowledge and skills but also performance in practice.1,91,93 For all these reasons, assessing residents by using a mix of instruments is an important task in clinical teaching.93,96
Items on the supervisor’s role in assigning clinical work and planning are also under-represented in the instruments. Opportunities for participating in the clinical work environment and for performing clinical activities are crucial for residents’ development.30 Planning in the demanding clinical environment provides structure and context for both teachers and trainees, as well as a framework for reflection and evaluation.97 Creating and safeguarding opportunities for performing relevant activities and planning teaching activities can therefore be seen as key evaluable roles of clinical teachers.
Teaching and learning in the clinical environment need to focus on relevant content. Doctors’ competencies in their roles as medical experts, professionals, and scholars were well represented in the instruments, but doctors’ competencies in their roles as communicators, collaborators, health advocates, and managers were less frequently measured. We found two instruments that reflected all CanMEDS competencies.38,61 Remarkably, one instrument had been developed in 1990, before the CanMEDS roles were published.61
In summary, although all instruments cover important parts of clinical teaching, no instrument covers all clinical teaching domains. Therefore, the use of any of these individual assessment tools will be limited.
The inpatient setting was used most frequently for validating and/or applying instruments, and most instruments were used in only one discipline (most frequently internal medicine). These limitations restrict the generalizability of the instruments. Different teaching skills may be required for instruction in outpatient versus inpatient settings.72,75,98 Some authors found no differences in teaching behavior in relation to the setting.52 However, Beckman compared teaching assessment scores of general internists and cardiologists and found factor instability, thus highlighting the importance of validating assessments for the specific contexts in which they are used.43
Most authors used factor analysis and Cronbach’s α to demonstrate an instrument’s dimensionality and internal consistency, respectively. Less commonly used methods included the establishment of validity by showing convergence between new and existing instruments, and by correlating faculty assessments with educationally relevant outcomes. Computing Cronbach’s α or completing a factor analysis may be the simplest statistical analysis to carry out for rating-scale data, but these analyses do not provide sufficient validity evidence for rating scales.99 Validity is a unified concept and should be approached as a hypothesis, requiring that multiple sources of validity evidence be gathered to support or refute the hypothesis.36,99,100 This suggests that a broader variety of validity evidence should be considered when planning clinical teacher assessments.
As most instruments are completed by residents and/or students, there are several issues that may affect ratings. First, residents tend to rate their teachers very highly,34 which may cause ceiling effects, but this is rarely discussed in the selected studies. Second, learners at different stages differ in what they appreciate most in teachers and, hence, may rate their clinical teachers differently.9,39 Last, anonymous evaluations reveal lower scores than non-anonymous evaluations.9,39 Though most questionnaires are anonymous, anonymity may not be realistic in a department with only a few residents. Therefore, as part of an instrument’s validation process, it should be tested in different settings, in different disciplines, by involving learners at different stages of the learning process, and by taking different evaluation circumstances into account. Even if the assessment of clinical teachers by residents could reveal valid information, evaluations should be derived from multiple and diverse sources, including peers and self-assessment, to allow “triangulation” of assessments.101
As in any systematic review, we may have failed to identify instruments. Our search was limited to English-language journals, which may have introduced publication bias.
Instruments for assessing clinical teachers are used to provide feedback but also to back up consequential decisions relating to promotion, tenure, and resource allocation. In order to improve clinical teaching, assessments need to be effective in informing clinical teachers about all important domains and in identifying individual faculty strengths and weaknesses.33,66 Therefore, it is first of all important that the full assessment package includes all aspects of clinical teaching. In addition to the well known domains of teaching, role modeling, providing feedback, and being supportive, other domains also need attention, particularly the domains of assigning relevant clinical work, assessing residents, and planning teaching activities. Real improvement is more likely to be accomplished if all important domains are included in the selected set of assessment instruments.33,54,84 This would likely require multiple complementary evaluation instruments.
Secondly, further study is needed to determine whether instruments can be validly used to assess a wider range of clinicians in different settings and different disciplines. Thirdly, evidence of an instrument’s validity should be obtained from a variety of sources. Fourthly, we need to determine what factors influence evaluation outcomes, for instance, year of residency and non-anonymous versus anonymous evaluations. Finally, optimal assessment needs to balance requirements relating to measurement characteristics, content validity, feasibility, and acceptance.102 The primary requirement for any assessment, however, is that it should measure what it is intended to measure, that is, teaching in the clinical environment.
The authors would like to thank Rikkert Stuve of ‘The Text Consultant’ for editing of the manuscript.
This work was funded by the Department for Evaluation, Quality and Development of Medical Education of the Radboud University Nijmegen Medical Centre.
Conflict of Interest None disclosed.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.