Although objective and structured approaches have improved the reliability of admission interviews, identifying and measuring the relevant attributes or noncognitive domains of interest remains a problem. In the present study, we use generalizability theory to estimate the variance associated with participants, judges, and stations in a semi-structured Medical Judgment Vignettes interview. The interview was used as part of an initiative to improve the reliability and content validity of the interview process for selecting medical school students.
A three-station Medical Judgment Vignettes interview was conducted with 29 participants and scored independently by two judges on a well-defined 5-point rubric. Generalizability theory provides a method for estimating the variability contributed by multiple facets of a measurement. In the present study, each judge (j) rated each participant (p) on all three Medical Judgment Vignettes stations (s). A two-facet crossed-design generalizability study was used to determine the optimal number of stations and judges needed to achieve a reliability coefficient of 0.80.
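For reference, the relative generalizability coefficient for a fully crossed p × j × s design is usually written in the standard form below. The variance-component symbols and the sample sizes n_j and n_s are notation assumed here for illustration; the abstract does not report the individual components.

```latex
E\rho^2 = \frac{\sigma^2_p}
{\sigma^2_p + \dfrac{\sigma^2_{pj}}{n_j} + \dfrac{\sigma^2_{ps}}{n_s} + \dfrac{\sigma^2_{pjs,e}}{n_j n_s}}
```

Here \sigma^2_p is the participant (true-score) variance, and the remaining terms are the participant-by-judge, participant-by-station, and residual components, each divided by the number of conditions averaged over.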
The generalizability analysis showed that a three-station, two-judge Medical Judgment Vignettes interview yields a G coefficient of 0.70. As the adjusted Eρ² values show, interviewer variability is negligible, so increasing the number of judges from two to three does not improve the generalizability coefficient. Increasing the number of stations, however, substantially improves the overall dependability of the measurement. In a decision study analysis, increasing the number of stations to six with a single judge at each station results in a G coefficient of 0.81.
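The decision-study logic described here can be sketched in a few lines of Python. This is a minimal illustration assuming the standard relative G coefficient for a fully crossed participant × judge × station design. The function name and the variance components are hypothetical placeholders, not the estimates from this study, so the numbers it produces will not match the coefficients reported above.

```python
def g_coefficient(var_p, var_pj, var_ps, var_pjs_e, n_judges, n_stations):
    """Relative G coefficient for a fully crossed p x j x s design.

    var_p      : participant (true-score) variance
    var_pj     : participant-by-judge interaction variance
    var_ps     : participant-by-station interaction variance
    var_pjs_e  : three-way interaction confounded with residual error
    """
    relative_error = (
        var_pj / n_judges
        + var_ps / n_stations
        + var_pjs_e / (n_judges * n_stations)
    )
    return var_p / (var_p + relative_error)


# Hypothetical variance components chosen only for illustration
# (NOT the values estimated in the study).
components = dict(var_p=0.30, var_pj=0.00, var_ps=0.25, var_pjs_e=0.10)

# Decision study: vary the number of judges and stations.
for n_j, n_s in [(2, 3), (3, 3), (1, 6)]:
    g = g_coefficient(n_judges=n_j, n_stations=n_s, **components)
    print(f"judges={n_j}, stations={n_s}: G = {g:.2f}")
```

When the participant-by-judge component is near zero, as in this made-up example, adding judges only shrinks the residual term, whereas adding stations reduces both the participant-by-station and residual terms. That is the pattern the results describe.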
The Medical Judgment Vignettes interview provides a reliable approach to assessing candidates' noncognitive attributes for medical school. The high inter-rater reliability is attributed to the greater objectivity achieved through the use of the semi-structured interview format and the clearly defined scoring rubric created for each of the judgment vignettes. Despite the relatively high generalizability coefficient obtained with only three stations, future research should further explore the reliability and, equally important, the validity of the vignettes with a large group of candidates applying to medical school.