The protocol for this trial and the supporting CONSORT checklist are available as supporting information; see Protocol S1 and Checklist S1.
Seventy-two mid-level practitioners (MLPs) were selected from 36 Ugandan health facilities. Inclusion and exclusion criteria for both health facilities and clinicians are presented in . The 36 health facilities represented all administrative regions of Uganda.
Inclusion and exclusion criteria for health facilities and mid-level practitioners.
Two MLPs were selected from each participating health facility. Registered midwives who met the inclusion criteria could be selected when a registered nurse was not available. All selected MLPs had a secondary school education; clinical officers had three years of pre-service training and two years of internship, registered nurses and registered midwives had three years of pre-service training, and registered nurse-midwives had four and one-half years of pre-service training. Preference was given to eligible MLPs with two characteristics: 1) held leadership roles, such as in-charge of a ward or clinic, or served as the focal person for malaria, TB, HIV, or prevention of mother-to-child transmission of HIV (PMTCT); and 2) had previous training and experience in counseling, Integrated Management of Childhood Illness (IMCI), and/or Integrated Management of Adolescent and Adult Illness (IMAI).
IDCAP was reviewed and approved by the School of Medicine Research and Ethics Committee of Makerere University (reference number 2009-175) and the Uganda National Committee on Science and Technology (reference number HS-722). Written informed consent was obtained from participants for secondary analysis of Infectious Diseases Institute (IDI) training program data. The University of Washington Human Subjects Division determined that this analysis did not meet the regulatory definition of research under 45 CFR 46.102(d).
Two interventions sought to support the development of routine and complex clinical reasoning skills: the Integrated Management of Infectious Diseases (IMID) training program and on-site support (OSS). As shown in , all 72 MLPs participated in IMID core training from March to June 2010. Thirty-six MLPs at 18 randomly selected facilities participated in OSS beginning in April 2010, after the first session of the IMID core course. The interventions, and the evidence in the medical education literature supporting them, are described in detail in Miceli A, et al.
Flow diagram of mid-level practitioners who attended the Integrated Management of Infectious Diseases course.
The IMID training program began with a 3-week core course, followed over a 24-week period by two 1-week boost courses and distance learning. The 3-week core course was offered to participants in arm A in either March (A1) or April (A2) 2010, and to arm B in May (B1) or June (B2) 2010. Twelve and 24 weeks after the 3-week IMID course, all participants attended 1-week boost courses. The start and end dates differed across sessions, but the duration of follow-up was the same. The second boost course finished on 1 October 2010 for the A1 session and on 17 December 2010 for the B2 session.
The 3-week IMID core course addressed diagnosis and management of HIV/AIDS, malaria, TB, diarrhea, acute respiratory infections and other infections of local importance in pregnant women, non-pregnant adults, infants and children. Its content was specifically adapted to Ugandan national policy and to the clinical context of a health center IV (HC IV), which included responsibility for child health and prevention, management, and control of infectious diseases within a health subdistrict. 
The content was summarized in 14 Clinical Decision Guides. The core course and boost courses included both classroom sessions taught by expert clinicians at IDI in Kampala, Uganda, and 12 half-day clinical rotations. Distance learning between courses used a case-based method; participants reflected on and recorded cases encountered in their home health facilities using structured log books. Review of original IMID content during boost courses built on the participants' cases from the log books.
All participants were also encouraged to use the AIDS Treatment Information Center (ATIC), which is a Kampala-based warm-line staffed by medical doctors and pharmacists experienced in infectious disease, for advice on management of complicated patients.
The facilities randomized to arm A (1:1 allocation) participated in monthly OSS beginning in April 2010. Arm B participated in OSS beginning in March 2011, but the impact of OSS on clinical competence in arm B was not assessed. OSS was provided by a four-member mobile team: a medical doctor, clinical officer, registered nurse, and laboratory technologist. The team's two-day visits included multidisciplinary didactic sessions, discipline-specific break-out sessions, mentoring for both clinical and laboratory staff, and continuous quality improvement (CQI) activities. Each OSS visit was structured around a theme, beginning with “Emergency, Assessment, Triage, and Treatment.”
The OSS sessions for clinicians were based on IMID core course materials. The multidisciplinary training was primarily an overview of national guidelines. The IMID participants attended the break-out session for clinicians, which focused on the Clinical Decision Guides. All of the mentors who led the clinical mentoring sessions had attended the pilot version of the IMID core course; the content of those sessions varied, however, with the patients present at the facility during the OSS visit. The mentors sought to build the following six patient care competencies: 1) history taking, 2) routine physical examination including danger signs, 3) clinical reasoning including identification of differential diagnoses, 4) ordering laboratory investigations, 5) appropriate diagnosis, and 6) appropriate treatment and management plan.
The CQI activities were designed to support the CQI teams at the sites, which included IMID participants, and focused on a subset of 13 of the facility performance indicators. All of the indicators were selected in collaboration with the curriculum developers for the IMID core course and reflected its content. Some indicators measured individual clinicians' performance, such as reducing the percentage of patients with a negative malaria smear who were treated with anti-malarials. Others measured team performance, such as increasing the percentage of patients with suspected malaria who received laboratory tests for malaria, which required the clinician to order the test and the laboratory staff to perform it. The sites generally chose to focus on six indicators. Three CQI activities for each visit were organized around those indicators: 1) preparing data on the indicators, 2) mapping processes of care, and 3) reviewing the data and processes of care to identify problems and goals for the next month.
The hypothesis for the single-arm intervention with pre-post design was that the IMID course would be effective at building clinical competence, where competence was measured by participating clinicians' scores on written case scenarios. The hypothesis for the cluster-randomized trial component was that OSS would yield additional improvement in the competence of individual MLPs relative to those in the control arm, using the same measure.
The primary objectives of our assessment of the impact of IDCAP interventions on individual clinician competence were to:
- Estimate mean change in written case scenario scores from t0 (baseline) to t1 (end of 3-week IMID course) for arms A and B combined. (Between t0 and t1, both arms received the same intervention.)
- Compare mean changes in written case scenario scores from t1 to t2 (end of second boost course) between arm A and arm B. (Between t1 and t2, only arm A received OSS.)
- Estimate overall mean changes in scenario scores for arms A and B from t0 to t2 and from t1 to t2. (These estimands are sketched in notation below.)
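In notation (ours, for illustration only, not the authors'), with S-bar denoting the mean score on new case scenarios, superscripted by arm and subscripted by testing point, the three objectives correspond to:

```latex
% Sketch of the estimands; symbols are ours, not the authors'.
\Delta_{01} = \bar{S}_{t_1} - \bar{S}_{t_0}
  \qquad \text{(arms A and B combined, } t_0 \text{ to } t_1\text{)}

\delta_{\text{OSS}} =
  \bigl(\bar{S}^{A}_{t_2} - \bar{S}^{A}_{t_1}\bigr)
  - \bigl(\bar{S}^{B}_{t_2} - \bar{S}^{B}_{t_1}\bigr)
  \qquad \text{(arm contrast, } t_1 \text{ to } t_2\text{)}

\Delta_{02} = \bar{S}_{t_2} - \bar{S}_{t_0},
  \qquad
\Delta_{12} = \bar{S}_{t_2} - \bar{S}_{t_1}
```

The second quantity is a difference-in-differences contrast: because only arm A received OSS between t1 and t2, it isolates the incremental effect of OSS under the design's assumptions.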
Secondary objectives were to describe the association of scores with characteristics of scenario administration and differences in scores across testing points for individual case scenarios.
The primary outcome was change in scores on written case scenarios (aggregated over all participants or within arms, depending on the objective) across three time intervals (t0–t1, t0–t2, t1–t2). For the cluster-randomized trial only, the primary outcome was the difference in score change between study arms from t1 to t2.
We selected scores on written case scenarios, sometimes referred to as vignettes, as our primary measure of competence because case scenarios test both knowledge and clinical reasoning skills. Case scenario scores have been validated as a measure of quality of care against data from clinician encounters with standardized patients in the United States.
Case scenario scores have also been used successfully to describe differences in clinician competence across groups characterized by practice setting, level of training, and other factors. 
Recently, Das et al. identified a gap between case scenarios and standardized patients or observation of clinical care as measures of practice, called the “know-do gap,” among doctors in India and Tanzania.
Leonard et al. noted differences in the gap across professions in Tanzania, but the differences were associated with the organization where clinicians practiced rather than with years of training.
Twelve case scenarios were designed to cover the main elements of IMID content. A sample case scenario is presented in Web Appendix S1. Scenario structure was based on a template that included danger/emergency signs, history, physical examination, laboratory testing, initial diagnosis and treatment, and evolution of the case over time (hours to months). The template document also referenced specific IMID curriculum sessions and Ugandan national policy documents that addressed the subject of each question. Scenario questions were short-answer and open-ended (e.g., “What are the three most likely causes of this patient's current signs and symptoms?”). In a pre-trial pilot, scores on draft versions of the case scenarios increased significantly after MLP exposure to a pilot version of the IMID core course (Weaver M, et al., unpublished manuscript).
Available time for assessment did not allow for administration of all 12 case scenarios to each participant at a single testing point. In addition, each case scenario was structured in four parts, with the answers to each part revealed at the beginning of the next part (there being no reveal after the fourth). To isolate course learning from familiarity with the case scenarios, participants responded to different case scenarios at each testing point. Consequently, the 12 case scenarios were divided into three blocks of four scenarios, and each block was assigned to participants from one-third of the sites within each arm at each testing point. Each block contained material relevant to HIV/AIDS, malaria, tuberculosis, and selected other infectious diseases in pregnant women, non-pregnant adults, and infants/children. The competencies differed across blocks; for example, case scenario 2 in block A addressed AIDS in a non-pregnant adult, whereas case scenario 10 in block C addressed AIDS in a pregnant woman. briefly describes the content of the case scenarios and their distribution across blocks.
Scenario Content Description; Evolution of Scores on New Scenarios by Scenario and Block.
Within each arm at t0, 12 participants (two from each of six randomly selected sites) completed block A, participants from another six sites completed block B, and participants from the remaining six sites completed block C. As shown in , the block allocations were then rotated at subsequent testing points, so that participants completed different blocks at t1 and at t2. Within each arm, all 12 scenarios were completed by 12 participants at each testing point; over the three testing points, each participant completed all 12 scenarios. This design allowed us to compare mean scenario scores across arms and time points, but not the evolution of scores at the level of the individual participant.
Allocation of case scenarios across testing points.
To mitigate the possible impact of fatigue or time constraints on case scenario scores, the order of the scenarios within each block was also randomized for each participant. Within each block of four case scenarios, there were 24 (4!) possible sequences, for example 1234 and 1243. The sequence for the first block was repeated in subsequent blocks; for example, a participant whose sequence in block A at t0 was 1234 had sequence 5678 in block B at t1. We selected 12 of the 24 possible sequences for each block and randomly assigned one sequence to each of the 12 participants assigned to that block in each arm. The same 12 sequences were assigned in each arm, so the sequences were balanced across arms.
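A minimal sketch of this assignment in Stata (the software used for randomization); the variable names (u, arm, block, seqid) and the seed are our assumptions, not the trial's actual code:

```stata
* Sketch: pair each of the 12 participants in an arm-by-block cell with
* one of the 12 pre-selected scenario sequences, in random order.
* Assumes one row per participant with arm and block already assigned;
* seqid 1..12 indexes the 12 chosen orderings of the four scenarios.
set seed 20100317                          // hypothetical seed
generate u = runiform()                    // one random draw per participant
bysort arm block (u): generate seqid = _n  // each sequence used once per cell
```

Because each of the 12 sequence identifiers is used exactly once per arm-by-block cell, the orderings are balanced across arms by construction.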
To test whether score improvements on repeated case scenarios reflected course learning or familiarity with case scenario content, each participant repeated one (at t1) or two (at t2) randomly selected scenarios from earlier testing points. The position of a repeated scenario in the sequence for the current testing point was also selected at random. Pretest (t0) case scenarios that were selected to be repeated at post-test (t1) were removed from possible selection for the final test (t2).
Secondary outcomes were differences in scores on new case scenarios (scenarios not previously seen by the individual participant) and repeat scenarios (scenarios completed by the same participant at more than one testing point), differences in scores associated with the order in which case scenarios were completed, and differences in scores associated with individual scenarios.
Two experienced Ugandan physicians scored the scenarios based on pre-specified scoring guides developed specifically to reflect IMID training program content. To eliminate inter-scorer variability, a case scenario was always scored by the same person (Weaver M, et al., unpublished manuscript). After t1, the four co-authors who are clinicians (IC, SE, MG, JN) and Paula Brentlinger reviewed the participants' answers at t0 and t1 to identify correct answers that had not been anticipated in the original scoring guidelines; for example, clinical actions that were technically correct but generally not relevant in the HC IV context, such as requesting computerized tomography of the brain. They also reviewed changes in Ugandan national policy guidelines and/or IMID training program content that had occurred after the case scenarios were drafted. The scoring guidelines were revised and expanded, based on consensus. The revised guidelines were used to score all of the scenarios and those scores are reported below.
The sample size calculations for IDCAP were based on testing the effect of OSS on facility performance, and thus were not driven by power requirements for the analysis of the case scenarios. The sample size calculations are reported in Naikoba S, et al. (unpublished manuscript).
Each facility selected two MLPs to attend the IMID training program, for a total of 36 MLPs per arm. The initial proposal was to train all the MLPs at each site, based on evidence of the effect on clinical practice of IDI's 3-week Comprehensive HIV Care Including ART course for doctors and 1-week Integrated Management of Malaria course for teams of MLPs, laboratory professionals, and records staff. Funding was not available to train all MLPs at 36 sites, however, so the effectiveness of IMID for two MLPs per facility was tested, with the hope of offering it to all MLPs in the future.
Health facilities were assigned to arm A (OSS) or arm B (delayed OSS) by stratified random selection (see ). Sites were stratified by two characteristics: 1) prior experience with the Health Care Improvement project, a CQI program for HIV prevention and treatment, vs. CQI-naïve; and 2) current or prior support from the Baylor International Pediatric AIDS Initiative (BIPAI) for clinical mentoring in pediatric HIV/AIDS vs. no BIPAI support (for more information, please see http://www.bipai.org/Uganda/). Sites were then randomly assigned to arm A or B (1:1) within those strata.
Randomization of health facilities to arm was implemented using random number generation in Stata 10.1. As noted above, sequences of case scenarios were also randomly assigned to participants, using the same method, so that the case scenario sequences would be balanced across arms at all three testing points.
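For illustration, a stratified 1:1 site allocation of this kind can be written in a few lines of Stata; the stratum variable names (hci, bipai) and the seed are assumptions, not the authors' code:

```stata
* Sketch of stratified 1:1 allocation; one row per health facility.
* hci   = 1 if prior Health Care Improvement project experience
* bipai = 1 if current or prior BIPAI support
set seed 20100223                 // hypothetical seed
generate u = runiform()           // random draw per site
sort hci bipai u                  // random order within each stratum
by hci bipai: generate arm = cond(mod(_n, 2) == 1, "A", "B")
```

Alternating assignment within each randomly ordered stratum yields a 1:1 split, up to a single site in any stratum with an odd count.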
Randomization to arm A or B occurred on February 23, 2010, after the majority of participants had completed baseline assessments of clinical performance in January and February 2010. Within two weeks of the A1 session of the IMID core course, arm A participants were notified of their upcoming course dates and arm assignment. Allocation was not concealed during the IMID training program or at testing points. Randomization of participants to sequences of case scenarios occurred on March 17, 2010, before the A1 session of IMID.
The generation of random sequences was performed by the co-author who is a biostatistician (MLT) and who was not involved in site selection or participant enrollment. Participants were assigned to interventions based on the allocation of their home health facility to arm.
This study was not blinded.
For estimation of mean aggregate score changes over the three possible time intervals (t0–t1, t1–t2, t0–t2), we used linear mixed-effects models with the individual score on a single new case scenario as the dependent variable, time and scenario as fixed effects, and participants nested within health facility as random effects. The inclusion of a random effect for health facility did not meaningfully alter the results, so this variable was not included in the analyses reported below. For assessment of score differences across arms between t1 and t2 (the only interval in which the two arms received different interventions), we included an interaction between time and arm.
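In Stata 11-era syntax (xtmixed preceded the mixed command of later releases), the primary model and the arm comparison might look as follows; the variable names are assumed, and this is a sketch rather than the authors' code:

```stata
* Sketch: score on a new case scenario as the outcome, fixed effects for
* testing point and scenario, and a random intercept per participant
* (the facility random effect was dropped, as noted above).
xtmixed score i.time i.scenario || participant:

* Cluster-randomized comparison over t1-t2: add a time-by-arm
* interaction and examine its coefficient.
xtmixed score i.time##i.arm i.scenario || participant:
```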
For comparison of new vs. repeat scores, we used the model described above with individual scores on all case scenarios, with the addition of a dichotomous variable for new vs. repeat scenario. We assessed the effect of scenario order using the model described above with all case scenarios, including a categorical variable for the order in which the scenario was completed or, alternatively, linear spline terms.
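The secondary models extend the same specification; a sketch, with an indicator for repeat scenarios and an illustrative spline (the knot position is our assumption, as it is not reported here):

```stata
* New vs. repeat scenarios: add a dichotomous indicator.
xtmixed score i.time i.scenario i.repeat || participant:

* Order effects: categorical order, or alternatively a linear spline.
xtmixed score i.time i.scenario i.order || participant:
mkspline order1 3 order2 = order   // hypothetical knot after position 3
xtmixed score i.time i.scenario order1 order2 || participant:
```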
We conducted exploratory analyses with a model that included dummy variables for hospital, registered nurse, and registered midwife to control for their effects, because facility type and the professions of the participants were not balanced across arms (see baseline data below).
All analyses were conducted in Stata 11.0 (StataCorp LP, College Station, TX, 2009). Statistical significance was defined by a type I error probability of 0.05, and all tests were two-sided.