The study was a prospective randomized controlled trial. Study outcomes were not assessed at baseline to avoid a pre-test effect. The possibility of a pre-assessment leading to a higher post-assessment score due to an item-practice effect is well recognised in the educational evaluative literature [11]. However, trial participants' characteristics (i.e. gender, age, attitude towards the use of evidence about healthcare research, and details of previous training in research, epidemiology, or statistics) were collected by questionnaire prior to randomization and used as covariates to reduce variation from individual differences. Ethical approval for the study was obtained from all of the local district ethics committees from which the participants were drawn.
Selection of subjects & setting
Over a three-month period, 1,305 practitioners, working within the South and West Regional Health Authority in England, were sent an invitation to participate in one of a number of CAS workshops being run across the region. Invitations were sent to the health authority offices and all general practices in the geographical area. The letters of invitation included an explanation that agreement to take part in the workshops would include a formal evaluation. Applying to attend, which involved completion of a questionnaire with baseline questions, was taken as consent to enter the study. On receipt of a completed questionnaire, participants were randomized to either intervention or control. The intervention group were given a date to attend a CAS workshop and the control participants assigned to a waiting list to attend a workshop. The only exclusion criterion for entry into the study was attendance at a previous CAS workshop.
Sample size determination
The target sample size was 200 (100 in each group), chosen to allow the study to detect a 'moderate' effect size of 0.4 standard deviation units (in any outcome) at 80% power and a 5% significance level (2-tailed) [12].
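The target of 100 per group can be checked against the standard normal-approximation formula for comparing two means. The sketch below assumes that formula; the source cites [12] for its method and does not state which calculation was actually used. The function name `n_per_group` is illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Participants needed per arm to detect a standardized mean difference.

    Assumption: two-sided, two-sample comparison of means using the
    normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = standard_normal.inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Effect size of 0.4 SD units, 80% power, 5% two-sided significance
required = n_per_group(0.4)
print(required)  # 99
```

Under these assumptions the approximation gives 99 participants per arm, which is consistent with the rounded target of 100 in each group.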
Randomization and blinding
An independent researcher used computer generated codes to allocate applicants randomly to intervention (attend a critical appraisal workshop) or control group ('waiting list'), stratified by occupation: manager/administrator; medically qualified practising physician; nurse/profession allied to medicine and 'other' professions. The researchers who scored study outcomes were blinded to the allocation of participants at all times.
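A stratified allocation of this kind could be generated as in the sketch below. It assumes permuted blocks of four within each occupational stratum; the source states only that computer generated codes were used and that allocation was stratified by occupation, so the block size, the seed, the stratum labels and the function name are all hypothetical.

```python
import random
from collections import defaultdict

ARMS = ("intervention", "control")
# Hypothetical short labels for the four occupational strata
STRATA = ("manager/administrator", "physician", "nurse/allied", "other")


def stratified_allocation(applicants, seed=2024, block_size=4):
    """Allocate (applicant_id, stratum) pairs to arms within each stratum.

    Assumption: permuted blocks of `block_size` (an even number) keep
    the two arms balanced inside every stratum.
    """
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    ids_by_stratum = defaultdict(list)
    for applicant_id, stratum in applicants:
        ids_by_stratum[stratum].append(applicant_id)

    allocation = {}
    for ids in ids_by_stratum.values():
        block = []
        for applicant_id in ids:
            if not block:  # start a fresh shuffled block when the last is used up
                block = list(ARMS) * (block_size // 2)
                rng.shuffle(block)
            allocation[applicant_id] = block.pop()
    return allocation


# Eight illustrative applicants per stratum
applicants = [(f"{stratum}-{i}", stratum) for stratum in STRATA for i in range(8)]
allocation = stratified_allocation(applicants)
```

With complete blocks, each stratum ends up with equal numbers in the intervention and control arms, which is the purpose of stratifying by occupation.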
The teaching programme used in this study was based on the Critical Appraisal Skills Programme (CASP). The half-day workshop centres upon facilitating the process by which research evidence is systematically examined to assess study validity, results and relevance to a particular clinical scenario. During the workshop, participants practise these skills by critically appraising a systematic review article; afterwards they receive follow-up materials (see Appendix 1 for details of the intervention).
Development of outcomes
Given the absence of suitable validated outcome measures, the outcomes were developed for use in this trial. A questionnaire was developed and validated (reliability and internal consistency) to assess the following outcomes: knowledge of the principles necessary for appraising evidence; attitudes towards the use of evidence about healthcare; evidence seeking behaviour; and perceived confidence in appraising evidence. A copy of the outcome questionnaire can be found in Appendix 2 (see Additional file 1). Full details of the validation process can be found elsewhere [13].
The questionnaire included 18 multiple-choice knowledge questions, 7 attitude statements and 6 confidence statements. Possible response categories to the knowledge questions were 'true', 'false' or 'don't know'. Correct, incorrect and don't know responses were awarded scores of 1, -1 and 0 respectively. Knowledge scores were summed across questions, giving a possible range of scores from -18 to +18. Attitude statements were scored on a five-point Likert scale. A 'strongly agree' to a positive attitude statement or 'strongly disagree' to a negative attitude statement was given a score of 5. Conversely, a 'strongly disagree' with a positive attitude statement or 'strongly agree' with a negative attitude statement was given a score of 1. Attitude scores were summed giving a possible range of scores from 7 to 35. The 6 statements of confidence in critical appraisal skills were scored using a 1 to 5 Likert scale and summed. A minimum overall score of 5 indicated 'little or no confidence' while a maximum total score of 30 indicated 'complete confidence'.
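The knowledge and attitude scoring rules can be sketched as follows. The answer key, the positive/negative flags and the function names are hypothetical, and the sketch assumes agreement is coded 1 ('strongly disagree') to 5 ('strongly agree') with intermediate points scored linearly; the source specifies only the two extremes.

```python
def knowledge_score(responses, answer_key):
    """Sum +1 for each correct, -1 for each incorrect, 0 for each 'dont_know'."""
    total = 0
    for given, correct in zip(responses, answer_key):
        if given == "dont_know":
            continue  # a "don't know" response contributes 0
        total += 1 if given == correct else -1
    return total


def attitude_score(ratings, is_positive):
    """Sum 1-5 agreement ratings, reverse-scoring negative statements.

    Assumption: rating 5 = 'strongly agree', so a negative statement
    is scored as 6 - rating.
    """
    return sum(r if positive else 6 - r for r, positive in zip(ratings, is_positive))


# Hypothetical answer key for the 18 true/false knowledge questions
answer_key = ["true", "false"] * 9
all_correct = knowledge_score(answer_key, answer_key)        # 18
all_unsure = knowledge_score(["dont_know"] * 18, answer_key)  # 0

# Hypothetical split of the 7 attitude statements: 4 positive, 3 negative
is_positive = [True] * 4 + [False] * 3
most_favourable = attitude_score([5, 5, 5, 5, 1, 1, 1], is_positive)   # 35
least_favourable = attitude_score([1, 1, 1, 1, 5, 5, 5], is_positive)  # 7
```

The extreme cases reproduce the ranges stated above: -18 to +18 for knowledge and 7 to 35 for attitude.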
Critical appraisal ability was assessed through the appraisal of a systematic review article. Participants' critiques were independently assessed by two of the authors (BR & PE) using a 5-point visual analogue scale, a high score indicating a superior level of appraisal skill. A framework for scoring the reviews was developed and agreement assessed; a random sample of 20 appraisals (10 control and 10 intervention) was assessed using this framework. Intra-class correlation coefficients were calculated for each of the three aspects of critical appraisal skills assessed: 'methodology' (0.86), 'results' (0.84) and 'relevance/generalisability' (0.70), indicating satisfactory inter-assessor agreement.
Assessment of outcomes
Six months after the CAS workshop, the intervention group were asked to complete the outcome questionnaire and undertake the critique of a systematic review article (a different article from the one used in the workshop). Five to six months after randomisation, and about one month prior to attending the workshop, controls were asked to complete the same outcomes. Thus, outcomes were obtained from both groups at about the same time after randomisation.
Primary analysis of the difference between CAS training and control groups was performed on an intention-to-treat basis, adjusting for baseline characteristics. Given that not all participants in the intervention group attended a CASP workshop, a secondary explanatory analysis was also conducted, i.e. according to whether participants received the intervention or not (see Figure ). For continuous outcomes, multiple linear regression modeling was used to adjust for potential confounding arising from baseline differences in prognostic variables between groups. Regression model goodness of fit was checked by examining model residuals. Ordinal outcomes were compared by Mann-Whitney U tests, and binary outcomes were compared by Chi-squared analyses. Percentages and time variables were analysed as continuous variables. All analyses were carried out using STATA. All statistical tests used a level of significance of 0.05 and two-sided hypothesis testing. 95% confidence intervals (95% CIs) were calculated for differences between the two groups. No adjustment for multiple comparisons was made. However, all analyses were planned a priori and reported in full. Costs were analysed using recognized methods [14].
Flow diagram summarising participant recruitment and receipt of outcomes
A detailed analysis of the costs of setting up and delivering the programme of CAS workshops was undertaken. This cost analysis was carried out from the perspective of the NHS. Based on information about the resources and associated costs of providing the workshops, the following items were considered: costs of inviting and processing applications to attend a workshop; time of workshop organizers in the Regional R&D Office; hire of workshop venue and catering; time and expenses of workshop tutors associated with preparing and delivering the workshops; and time and expenses (including locum cover) of workshop participants associated with attending the workshops. Published health and social care costs [15], local costs (e.g. NHS trust costs) and Whitley Council pay scales were used to estimate the value of staff time.