Objectives
The aim of this study is to evaluate how two independent virtual patient (VP) design variables influence the effectiveness of VPs as an educational tool in musculoskeletal medicine. The specific objectives are, firstly, to evaluate the performance of students exposed to different VP designs in identical assessments of clinical reasoning skills. Secondly, we aim to determine how different VP designs influence the student experience when using a VP. Finally, we will explore the relationships between student performance in VP assessment metrics and other measurements of clinical skills, including written and clinical examinations.
Study design
This is a randomised 2x2 factorial design study evaluating two independent variables of VP design, branching (present or absent), and structured clinical reasoning feedback (present or absent).
Setting and participants
The setting is three university medical schools in the United Kingdom: Warwick Medical School (WMS), the University of Birmingham Medical School (UBMS), and Keele Medical School (KMS). WMS runs a four-year MBChB degree open only to graduate entry medical (GEM) students; UBMS and KMS run a five-year MBChB degree course, open to undergraduate entry medicine (UEM) and GEM students. The research project will run from 2011 to 2013.
Virtual patient software and information technology
Virtual patient cases in the study are created to the MedBiquitous standard [12] using the XML markup language [29]. The software used to create and host the cases is DecisionSim® v2.0, developed by the University of Pittsburgh. The cases are compatible with open source VP systems such as Open Labyrinth [7]. Access to cases, participation, electronic consent, and post-case evaluations will be controlled by the VP software, with content hosted on the University of Warwick virtual learning environment web pages. Students will be registered with and logged in to the software, allowing tracking of decisions and performance.
Randomisation
The study follows the CONSORT statement on randomised trials [30]. A flow diagram of the study design is provided (see Figure). Students from the eligible year-groups in each institution will be allocated to one of four intervention groups using block randomisation, with each university cohort randomised separately. A computerised random number generator will be used to allocate students. The primary investigator (JB) will implement the allocation and hold a record of the sequence.
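The allocation step can be sketched as permuted-block randomisation run separately for each cohort. This is a minimal illustration only: the block size of four (one slot per arm), the fixed seeds, the function name `block_randomise`, and the student identifiers are all assumptions, none of which are specified in the protocol.

```python
import random

ARMS = ["A", "B", "C", "D"]  # the four intervention groups


def block_randomise(student_ids, seed, arms=ARMS):
    """Allocate students to arms in shuffled blocks of len(arms).

    Assumption: block size equals the number of arms, so group sizes
    within a cohort never differ by more than one.
    """
    rng = random.Random(seed)  # seeded so the sequence can be recorded
    allocation = {}
    for start in range(0, len(student_ids), len(arms)):
        block = arms[:]      # one slot per arm in every block
        rng.shuffle(block)   # random order within the block
        for student, arm in zip(student_ids[start:start + len(arms)], block):
            allocation[student] = arm
    return allocation


# Each university cohort is randomised separately (hypothetical IDs and seeds).
cohorts = {
    "WMS": [f"wms{i:03d}" for i in range(10)],
    "KMS": [f"kms{i:03d}" for i in range(7)],
}
allocations = {
    site: block_randomise(ids, seed=index)
    for index, (site, ids) in enumerate(cohorts.items())
}
```

Keeping the seed alongside the resulting mapping is one way an investigator could hold a reproducible record of the sequence.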
Recruitment and baseline data collection
All eligible students will be invited to participate in the study. Students are eligible if they are in the year group studying musculoskeletal (MSK) medicine; the only students excluded are those who do not volunteer or consent. Eligible students will be invited to attend an oral presentation and demonstration of the study, and will be given an approved study participant information sheet. Students who do not electronically record their informed consent will not be able to complete any cases and are not considered to be participants. Students who consent will be considered study participants from that point onwards, when the following baseline data will be collected: gender, email address, student type (UEM/GEM), year of study, and institution.
Additional data on other aspects of student performance will be collected from the examinations officer at WMS only. This includes student performance on formative clinical and written assessments at both the end of the musculoskeletal block and the end of the year.
Intervention and independent design variables
The intervention consists of students completing four VP cases sequentially. Each case takes approximately 30 minutes to complete. The cases focus on four core clinical musculoskeletal areas: large joint arthritis, back pain, polyarthritis, and connective tissue disease. The 2x2 factorial study design means that any case can be designed in four different ways. The four case designs are: A) not branched + no feedback; B) branched + no feedback; C) not branched + feedback; D) branched + feedback. Students will use all four of the case designs during the research (see Figure).
The first variable is branching pathways through the VP, present or absent. In the branched form there are four branching points, each with three choices, through the thirty-minute case. This gives a possible 81 core pathways (3^4) through the case. The linear case has a single core pathway, with participants being redirected back to the core pathway irrespective of the decision made, for example by feedback from a supervising clinician in the case. The second variable is the use of structured feedback to promote clinical reasoning skills, present or absent. Feedback will be delivered in a predetermined way at five key points in each case, based on the ‘SNAPPS’ approach [20], systematic approaches to help Bayesian reasoning [21], and symptom categorisation [31].
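The pathway count can be checked by enumeration. The sketch below assumes that each of the four branching points offers three mutually exclusive choices (given hypothetical labels here) and that the linear design always returns to one core route.

```python
from itertools import product

BRANCH_POINTS = 4
CHOICES = ("a", "b", "c")  # hypothetical labels for the three options

# Branched design: every combination of choices is a distinct core pathway.
branched_paths = list(product(CHOICES, repeat=BRANCH_POINTS))

# Linear design: whatever is chosen, the student is redirected to the
# same core pathway, so only one route exists.
linear_paths = {("core",) * BRANCH_POINTS}

print(len(branched_paths))  # 3 ** 4 = 81
print(len(linear_paths))    # 1
```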
Cases will be piloted and tested by healthcare professionals and a cohort of students in one centre before the study commences. For the study, students at WMS and KMS will complete cases in sequential teaching sessions held in a computer cluster. Students at UBMS will complete cases during allocated self-study time in their musculoskeletal block.
Other than the described independent variables, we will control for other design variables highlighted in a critical literature review [2].
Inclusion and exclusion criteria
Inclusion criteria are students enrolled on the medical degree course and in the musculoskeletal teaching block in one of the medical schools in the study. Students must electronically sign consent to be included. Exclusion criteria are students from other year-groups. Students registered for a medical degree are required and assumed to have appropriate language and information technology skills.
Blinding
Students will be blind to their group allocation. Investigator blinding for the purposes of the data analysis and allocation is not used. In the institution where clinical examination performance is recorded, none of the investigators examine within the clinical specialty (musculoskeletal medicine).
Outcome measures
The primary outcome measures in this study are performance in a standardised composite clinical reasoning assessment using validated tools, and a modified self-reported 15-item evaluation reviewing four domains. These will be completed both during and immediately following each case. The secondary outcome measures are engagement and patterns of use within the cases, collected from the online environment (see Table). For each case, the composite clinical reasoning assessment consists of 15 validated items: eight ‘key feature problem’ questions, one Bayesian reasoning question, two multiple choice questions on diagnosis, and four multiple choice clinical decision questions. For each case the content of these 15 items is identical, allowing comparison between case designs, for example between a case that is not branched with no structured feedback and the same case in a branched format with structured feedback.
Table 1. Outcomes measured during the study
Student evaluation of each case will be collected using an electronic version of the EViP questionnaire, a fifteen-item self-reported evaluation. This explores authenticity, professionalism, learning, and coaching through the case, using Likert scales with additional free text responses. Secondary outcome measures for each case are students’ patterns of use of the case, such as time taken per case, case completion rates, and time taken to complete individual decisions.
Additional data will be collected from one centre, WMS, to support the validity of the VPs as educational and assessment tools. This includes a pre- and post-test Diagnostic Thinking Inventory, a 41-item validated assessment of clinical reasoning ability [24]. Performance in summative and formative written and clinical assessments, measured one week following the VP case and several months later, will also be collected.
Sample size determination
The authors agreed that an important educational effect would be a 5% difference in the score on validated assessments of clinical reasoning skills, and in student self-reported evaluation scores. As no gold standard exists for the measurement of clinical reasoning skills, we have based the sample size calculation on performance in the key feature problems (KFPs) integrated into each VP case. A previous study has shown mean KFP scores in a student population to be approximately normally distributed, with a standard deviation of 1.32 [32]. In this study, where we will use 16 KFPs, a 5% difference in scores is considered significant, that is a difference in mean scores of 0.8, corresponding to a standardised effect size of approximately 0.6 (moderate to large). Based on these assumptions, we would require a total sample size of 88 students to detect this difference with 80% power at the (two-sided) 5% level. Assuming the effect size to be the same for both branching and feedback interventions, a sample size of 88 students would provide sufficient power to detect the main effects and an interaction effect that was twice as large as the assumed main intervention effect in the setting shown in the sample size table (see Table).
Table 2. Sample size calculation for key feature problem outcomes and student self-evaluation scores
If the interaction between branching and feedback interventions is of the same order of magnitude as the expected main effects, then we would require a fourfold increase in the sample size, giving a total of 352 students [33].
For self-reported scores, where a previous study reported a standard deviation of 0.93 [34], a 10% difference in scores (with a maximum of 5) is considered significant, that is a difference in mean scores of 0.5, corresponding to a standardised effect size of approximately 0.5 (moderate). Based on these assumptions, we would require a total sample size of 112 students to detect this difference with 80% power at the (two-sided) 5% level (see Table).
Therefore 112 students would be required to detect the main effects and a large interaction effect of branching and feedback on self-reported scores. To detect an interaction effect of the same order of magnitude as the expected main effects would require a total of 448 students. The pool of students available for recruitment into this study is large at the three centres (WMS, n~160; UBMS, n~400; KMS, n~150). Even allowing for unforeseen recruitment problems and some loss to follow-up, a target of 112 students should be easily achievable to quantify the main intervention effects (branching and feedback), which are the primary focus of the study. Recruitment above this target will provide increasing power to detect potential interactions between the main effects.
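The arithmetic behind these targets can be approximated with the standard two-group formula. The sketch below is an assumption about the method rather than the authors' calculation: it uses a normal approximation, so it returns totals slightly below the protocol's 88 and 112, which are consistent with a t-based calculation that typically adds about one participant per group. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(diff, sd, alpha=0.05, power=0.80):
    """Two-sample, two-sided sample size per group (normal approximation)."""
    d = diff / sd                                  # standardised effect size
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_power = NormalDist().inv_cdf(power)          # about 0.84
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Each main effect compares two halves of the factorial sample.
kfp_total = 2 * n_per_group(diff=0.8, sd=1.32)   # key feature problems
eval_total = 2 * n_per_group(diff=0.5, sd=0.93)  # self-reported evaluation

# Interaction of the same size as a main effect: four times the total.
print(kfp_total, 4 * kfp_total)
print(eval_total, 4 * eval_total)
```

The fourfold multiplier mirrors the protocol's step from 88 to 352 and from 112 to 448 students.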
Data analysis
We will present absolute numbers for enrolment, eligibility, and complete follow up. Descriptive statistics will be used to present student demographics, along with the mean, standard deviation, standard error of the mean, and 95% confidence intervals for primary and secondary outcome measures.
The primary analysis will be a per-protocol analysis based on complete cases. It is likely that some data will not be available because of voluntary withdrawal of participants, drop-out through non-completion of individual data items, unforeseen technical difficulties, and general loss to follow-up. Where possible, the reasons for data ‘missingness’ will be ascertained and reported. The pattern of missingness will be carefully considered, and the reasons for non-compliance, withdrawal or other protocol violations will be stated and any patterns summarised. The primary analysis will investigate the fixed effects of the factorial combinations of branching and feedback on the primary outcome measures: performance in a standardised composite clinical reasoning assessment and a 15-item self-reported evaluation. Analysis of covariance (ANCOVA) will be used to identify main effects, effect sizes, and interactions between the two independent design variables (feedback and branching). Blocking factors in the ANCOVA will adjust for the effects of randomisation group, case ordering and recruiting centre, with student GEM status and gender as covariates. Tests from the ANCOVA will be two-sided and considered to provide evidence for a significant difference if p-values are less than 0.05 (5% significance level). Estimates of treatment effects will be presented with 95% confidence intervals. Students’ case preferences for learning and realism, and EViP responses, will be evaluated using chi-squared tests, with case design and case number as grouping factors.
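The main effects and interaction that the ANCOVA estimates can be illustrated as contrasts of the four cell means in the 2x2 design. This is only a sketch: the scores are invented for illustration, and it omits the blocking factors, covariates, and significance tests described above.

```python
from statistics import mean

# Hypothetical composite scores per case design, keyed by (branched, feedback).
scores = {
    (False, False): [8.0, 9.0, 10.0],   # design A
    (True, False): [9.0, 10.0, 11.0],   # design B
    (False, True): [10.0, 11.0, 12.0],  # design C
    (True, True): [13.0, 14.0, 15.0],   # design D
}
cell = {design: mean(values) for design, values in scores.items()}

# Main effect: average difference between the levels of one factor.
branching_effect = (cell[True, False] + cell[True, True]
                    - cell[False, False] - cell[False, True]) / 2
feedback_effect = (cell[False, True] + cell[True, True]
                   - cell[False, False] - cell[True, False]) / 2

# Interaction: how much the feedback effect changes when branching is present.
interaction = ((cell[True, True] - cell[True, False])
               - (cell[False, True] - cell[False, False]))

print(branching_effect, feedback_effect, interaction)
```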
We will determine the predictive validity of performance in the VP composite assessment using the summative examination results from one institution, WMS. We will use the correlation coefficient (Pearson’s product–moment, r) to determine the effect size of any linear correlation between the VP scores and institution examinations.
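For reference, Pearson's r can be computed directly from its definition. The sketch below uses invented paired scores for five students; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical paired scores: VP composite assessment vs. examination mark.
vp_scores = [9, 11, 12, 14, 15]
exam_scores = [55, 60, 58, 70, 72]
r = pearson_r(vp_scores, exam_scores)
print(round(r, 2))
```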
A detailed statistical analysis plan (SAP) will be agreed with the trial management group at the start of the study, with any subsequent amendments to this initial SAP being clearly stated and justified. The routine statistical analysis will mainly be carried out using R (http://www.r-project.org/) and S-PLUS (http://www.insightful.com/). Results from this study will also be compared with results from other studies.