From the 183 studies included in 11 Cochrane reviews, we included 161 RCTs published between 1966 and 2009. The diseases evaluated were distributed as follows: COPD (n = 49), heart failure (n = 31), T2DM (n = 48) and stroke (n = 33). Nearly half (43%; n = 70) were drug trials, which assessed long-acting β agonists (n = 21), lipid-lowering agents (n = 7), metformin (n = 26), diuretics (n = 12) and oral anticoagulants (n = 4), while the remainder (57%; n = 91) assessed non-drug interventions such as rehabilitation (n = 69) and exercise and/or diet (n = 22).
Sources of bias, overall and stratified for type of disease and type of intervention
Testing for statistical significance of baseline characteristics was not reported by 68% (n = 110) of all trials (Table 1). Of the 51 papers that did report testing for statistical significance of baseline characteristics, 38% (n = 20) found at least one characteristic with a significant difference. Of these 20 papers, two (10%) adjusted the results for the characteristic and four (20%) addressed the difference in baseline characteristics in the Discussion section. In one further paper (5%), in which no significant difference in baseline characteristics was found, the results were adjusted for a characteristic measured at baseline because the difference between groups was considered to be large. A between-group comparison was reported in 90% of the 161 trials (n = 145), while 10% reported only a within-group analysis. An ITT analysis was reported in 42% (n = 67) of trials.
| Table 1. Reporting of aspects of trial design, conduct and analysis, stratified by disease and type of intervention |
Only 17% (n = 28) of trials reported how missing data were handled: 50% (n = 14) carried the last values forward, 27% (n = 8) performed a complete case analysis, 13% (n = 4) used fixed value imputation and 10% (n = 3) used more advanced methods, such as a regression model that also took into account patients with missing data, or a stratified imputation (one trial used two methods of handling missing data). Only 24% (n = 40) reported both a P value and a 95% CI. One or more primary outcomes were defined in 33% of trials (n = 53), but only 21% (n = 34) of the trials had a single primary outcome.
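The three simpler approaches named above (last value carried forward, complete case analysis and fixed value imputation) can be sketched as follows. This is a minimal illustration, not the procedure used by any of the reviewed trials: the patient series, the helper names (`locf`, `fixed_value`, `complete_cases`) and the fill value of 0.0 are all hypothetical.

```python
from statistics import mean

# Hypothetical outcome series for three patients over four visits.
# None marks a missed visit; the values are illustrative only.
visits = {
    "p1": [10.0, 11.0, None, None],
    "p2": [8.0, 9.0, 11.0, 13.0],
    "p3": [15.0, None, 14.0, None],
}

def locf(series):
    """Carry the last observed value forward into later missing visits."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)  # stays None if nothing has been observed yet
    return filled

def fixed_value(series, fill):
    """Replace every missing visit with one pre-chosen constant."""
    return [fill if value is None else value for value in series]

def complete_cases(data):
    """Keep only patients who have no missing visits."""
    return {pid: s for pid, s in data.items() if None not in s}

# Final-visit value per patient under each approach.
final_locf = {pid: locf(s)[-1] for pid, s in visits.items()}
final_fixed = {pid: fixed_value(s, 0.0)[-1] for pid, s in visits.items()}
final_complete = {pid: s[-1] for pid, s in complete_cases(visits).items()}

for label, result in [("LOCF", final_locf),
                      ("fixed value", final_fixed),
                      ("complete case", final_complete)]:
    print(label, len(result), "patients, mean", round(mean(result.values()), 2))
```

With the same raw data, the three approaches analyse different numbers of patients and give different final-visit means, which is why a report that does not state its approach is hard to interpret.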
For three of the sources of bias we evaluated, the trials scored similarly across disease areas (Figure ). For all four diseases, a single primary outcome was clearly defined in a similar proportion of trials, a P value and 95% CI were reported in approximately 20% of trials, and between-group comparisons were conducted in approximately 90% of trials. Large differences between disease areas were present for the reporting of an ITT analysis (29 to 58% of trials), reporting on the handling of missing data (6 to 30% of trials) and not reporting on statistical comparisons of baseline characteristics (55 to 74% of trials). In the trials of drugs for heart failure, only 67% (n = 8) reported a between-group comparison, and this category of trials also scored worse on other sources of bias, whereas trials of drugs for COPD scored higher than drug trials in any other disease area.
Association of trial quality with type of intervention, year of publication and impact factor
Simple regression analysis (Table 2) showed that drug trials were significantly more likely than non-drug trials to clearly define one or more primary outcomes, to include an ITT analysis, and to avoid testing for differences in baseline characteristics. Defining one or more primary outcomes was significantly more likely after 2001. Finally, a high impact factor was associated with defining one or more primary outcomes, reporting the handling of missing data, including P values and 95% CI, and using an ITT analysis. In multivariate analyses (Table 3), drug trials were still more likely than non-drug trials to define one or more primary outcomes, to conduct an ITT analysis, and to avoid testing for differences in baseline characteristics. A more recent year of publication remained associated with defining one or more primary outcomes and with reporting the handling of missing data. A higher impact factor remained strongly associated with reporting one or more primary outcomes, describing the handling of missing data, and reporting P values and 95% CI.
| Table 2. Simple logistic regression to compare aspects of trial design, conduct and analysis by type of intervention, time before and after CONSORT (2001) and impact of journal |
| Table 3. Multiple logistic regression to assess the associations of type of intervention, time before and after CONSORT and impact of journal with aspects of trial design, conduct and analysis |
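For a single binary predictor such as drug versus non-drug trial, the exponentiated coefficient of a simple logistic regression equals the odds ratio of the corresponding 2×2 table, and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below shows that calculation only. The function name `odds_ratio_with_ci` and the cell counts are hypothetical toy values, not figures from the reviewed trials, and the multiple regression with several predictors is not reproduced here.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: drug trials reporting the feature,     b: drug trials not reporting it
    c: non-drug trials reporting the feature, d: non-drug trials not reporting it
    All four cells must be non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Toy counts chosen for illustration only (assumption, not study data).
or_, lower, upper = odds_ratio_with_ci(20, 30, 10, 40)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to a significant association at the 5% level in the simple model.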
Reporting of subgroup analyses
A subgroup analysis of variables measured before randomization was reported in 27% (n = 43) of trials; of these, only 23% (n = 10) reported an interaction test, whereas 77% reported either separate tests for each subgroup or tests for one subgroup only. Of the 43 trials reporting subgroup analyses, 81% (n = 35) reported a small number (< 5) of subgroups. Of the 19 trials in which more than one significant subgroup effect was reported, 21% (n = 4) reported whether these effects were independent of other subgroup effects (that is, whether interaction terms were still significant when other interaction terms were in the same regression model). Of the 34 trials that reported related outcomes, only 15% (n = 5) found a consistent direction of the subgroup effects among closely related outcomes. Finally, of the 43 trials that performed a subgroup analysis, 16% (n = 7) specified a hypothesis for a subgroup effect a priori, but only two of these trials explained the rationale of their hypothesis by discussing prior evidence and biological plausibility. None of the trials pre-specified the direction of a potential subgroup effect.
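The distinction between an interaction test and separate tests for each subgroup can be illustrated with a small sketch. It compares the log odds ratios of two subgroups with a z-test on their difference, which is one common form of interaction test and not necessarily the one used in any reviewed trial. The event counts, function names and subgroup labels are hypothetical.

```python
import math

def log_or_and_se(events_t, non_t, events_c, non_c):
    """Log odds ratio (treatment vs control) and its standard error."""
    log_or = math.log((events_t * non_c) / (non_t * events_c))
    se = math.sqrt(1 / events_t + 1 / non_t + 1 / events_c + 1 / non_c)
    return log_or, se

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Toy counts (assumption): events and non-events per arm in two subgroups.
log_or_a, se_a = log_or_and_se(15, 85, 30, 70)  # subgroup A
log_or_b, se_b = log_or_and_se(22, 78, 30, 70)  # subgroup B

# Separate tests: is the treatment effect non-zero within each subgroup?
p_a = two_sided_p(log_or_a / se_a)
p_b = two_sided_p(log_or_b / se_b)

# Interaction test: do the two subgroup effects differ from each other?
z_int = (log_or_a - log_or_b) / math.sqrt(se_a ** 2 + se_b ** 2)
p_interaction = two_sided_p(z_int)

print(f"subgroup A: P = {p_a:.3f}")
print(f"subgroup B: P = {p_b:.3f}")
print(f"interaction: P = {p_interaction:.3f}")
```

With these toy counts, the effect is significant at the 5% level in subgroup A but not in subgroup B, yet the interaction test does not show that the two effects differ. Separate subgroup tests alone would therefore suggest a subgroup effect that the data do not support.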