Conceived and designed the experiments: JAH JH JKA RP CK CH. Performed the experiments: JAH JH NR CK. Analyzed the data: JAH JH CK. Contributed reagents/materials/analysis tools: RP. Wrote the paper: JAH JH JKA CK CH.
Randomization, allocation concealment, and blind outcome assessment have been shown to reduce bias in human studies. Authors from the Collaborative Approach to Meta Analysis and Review of Animal Data from Experimental Studies (CAMARADES) collaboration recently found that these features protect against bias in animal stroke studies. We extended the scope of the CAMARADES work to include investigations of treatments for any condition.
We conducted an overview of systematic reviews. We searched Medline and Embase for systematic reviews of animal studies testing any intervention (against any control) and we included any disease area and outcome. We included reviews comparing randomized versus not randomized (but otherwise controlled), concealed versus unconcealed treatment allocation, or blinded versus unblinded outcome assessment.
Thirty-one systematic reviews met our inclusion criteria: 20 investigated treatments for experimental stroke, 4 reviews investigated treatments for spinal cord diseases, while 1 review each investigated treatments for bone cancer, intracerebral hemorrhage, glioma, multiple sclerosis, Parkinson's disease, and treatments used in emergency medicine. In our sample 29% of studies reported randomization, 15% of studies reported allocation concealment, and 35% of studies reported blinded outcome assessment. We pooled the results in a meta-analysis, and in our primary analysis found that failure to randomize significantly increased effect sizes, whereas allocation concealment and blinding did not. In our secondary analyses we found that randomization, allocation concealment, and blinding reduced effect sizes, especially where outcomes were subjective.
Our study demonstrates the need for randomization, allocation concealment, and blind outcome assessment in animal research across a wide range of outcomes and disease areas. Since human studies are often justified based on results from animal studies, our results suggest that unduly biased animal studies should not be allowed to constitute part of the rationale for human trials.
Clinical epidemiologists and proponents of evidence-based medicine (EBM) have been using methods to reduce bias in human studies for over four decades. Random allocation of participants to treatment groups, concealing the allocation sequence from those assigning participants to intervention groups (allocation concealment), and blinding of investigators assessing outcomes are now viewed as fundamental ways of ensuring quality and minimizing bias in clinical trials. This is because concealed random allocation reduces selection bias and blinding outcome assessors reduces detection bias. Armed with these methods, researchers have exposed several common medical practices as ineffective. For example, observational studies led us to believe that sodium fluoride reduced vertebral fractures, that vitamin E reduced major coronary events, and that high-dose aspirin was more effective than low-dose aspirin. But subsequent randomized trials exposed all these treatments as useless or harmful. Benefits of randomization, allocation concealment, and blinding have been confirmed in larger meta-epidemiological studies. In the earliest of these, Schulz et al. (1995) found that odds ratios were exaggerated by 30% in trials lacking allocation concealment and by 17% in studies that lacked blind outcome assessment. Subsequent larger investigations have confirmed these results and also shown that adequate randomization reduces bias in human studies.
A growing body of evidence is beginning to suggest that randomization, allocation concealment, and blinding of outcome assessment can also reduce the risk of bias in animal studies. Some researchers hypothesize that avoidable biases in animal studies contribute to the failure to translate much experimental work for human benefit. For example, while 503 of 835 candidate drugs for use in the management of stroke appeared effective in animal models, only one (tissue plasminogen activator) has proved sufficiently efficacious in humans.
Much research into the empirical dimensions of bias in animal studies has been conducted by investigators from the Collaborative Approach to Meta Analysis and Review of Animal Data from Experimental Studies (CAMARADES) group. CAMARADES researchers recently conducted an overview of systematic reviews of animal studies researching treatments for experimental stroke, and showed that failure to conceal allocation (but not failure to randomize or blind) exaggerated apparent treatment benefits in animal studies. Despite this research, evidence-based principles have not yet been widely adopted in animal research; a recent study showed that only one in six controlled animal studies use randomization and only one in five use blind outcome assessment. We therefore aimed to replicate the CAMARADES study independently and to expand its scope to include all conditions.
We conducted an overview of systematic reviews. The protocol (unpublished) was finalized by JH, CH, RP, and JA in October 2012. We modified the protocol once to add the secondary analysis (testing the “unpredictability paradox”; see below). We searched MEDLINE and Embase databases (19 April 2012) and scanned reference lists for systematic reviews of animal studies that measured effects of randomization, allocation concealment, or blinding of outcome assessment. We included reviews in any disease area, using any intervention, any control group, any outcome measure and any animal model. We limited our search to the last 20 years and excluded human studies (search strategy in Appendix S1). We also excluded conference papers, studies not reported in English, ecological studies, and epidemiological studies.
Two reviewers (JH and JAH) independently extracted data on numbers of studies, numbers of animals, disease/condition, outcomes, effect measures, and effect sizes with confidence intervals, using piloted data extraction forms. Disagreements were resolved by discussion with other authors. Authors were contacted to request data which were not reported. To enable inclusion of one review we estimated the number of animals in randomized and non-randomized groups by calculating the mean number of animals per study. To test whether this estimation affected our results we carried out a sensitivity analysis by removing the study from the meta-analysis. We assessed the risk of bias of included systematic reviews using the Assessment of Multiple Systematic Reviews (AMSTAR) criteria.
We pooled results using the DerSimonian and Laird random effects model.  We reported outcomes for which differences between randomization/no randomization, allocation concealment/no allocation concealment, and blinding/no blinding were reported. We combined different outcomes and measurement units using standardized mean differences (SMDs), and quantified heterogeneity using the I-squared statistic.  We used meta-regression in a post-hoc analysis to examine whether various features influenced outcomes. Specifically, we investigated whether study size, disease state (stroke versus all other outcomes), or outcome measure were significantly associated with the effect size or could explain some of the heterogeneity.
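As an illustration only (this is not the code used for the analyses reported here), the pooling step can be sketched in Python. The sketch assumes each review contributes one SMD with a known sampling variance, and it uses a normal approximation (1.96 standard errors) for the 95% confidence interval. The function name `dersimonian_laird` and the input values in the usage note are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool effect sizes with the DerSimonian and Laird random-effects model.

    effects   -- one effect size (e.g. an SMD) per review
    variances -- the sampling variance of each effect size
    Returns the pooled estimate, its 95% CI, tau^2, Cochran's Q and I^2 (%).
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two effects with matching variances")

    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments estimate of tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-review variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,   # normal approximation (assumption)
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }
```

For example, `dersimonian_laird([0.0, 1.0], [0.01, 0.01])` pools two made-up, strongly discordant estimates. The between-review variance dominates the weights, and the interval is much wider than a fixed-effect analysis would give.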
For our secondary analysis we investigated the “unpredictability paradox”, which was proposed in a similar study involving human subjects.  The paradox states that the difference between inadequately randomized and randomized studies, although real, is unpredictable in terms of direction. This is plausible, given that the direction of bias may relate to differences in expected results. To investigate the paradox we ignored direction to see whether there was an absolute difference between results in randomized and non-randomized studies. We used the same method to investigate the unpredictability paradox for adequate allocation concealment and blinding. This approach is useful only as a guide, since with a large enough sample some absolute difference is likely to arise by chance alone.
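The caveat above can be made concrete with a small simulation. This is a sketch under stated assumptions, not the procedure used in our analysis: we assume that "ignoring direction" means replacing each difference with its negative absolute value (so that all differences point the same way), and we draw hypothetical review-level differences from a normal distribution centred on zero. The names `fold_direction` and `simulate_null` are illustrative.

```python
import random
import statistics


def fold_direction(differences):
    """Ignore direction by mapping every difference d to -|d|."""
    return [-abs(d) for d in differences]


def simulate_null(n_reviews=30, se=0.1, seed=0):
    """Simulate differences with NO true effect and compare the two means.

    Returns (mean of signed differences, mean of direction-folded differences).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    signed = [rng.gauss(0.0, se) for _ in range(n_reviews)]
    folded = fold_direction(signed)
    return statistics.mean(signed), statistics.mean(folded)


if __name__ == "__main__":
    signed_mean, folded_mean = simulate_null()
    print(f"signed mean: {signed_mean:+.3f}")
    print(f"folded mean: {folded_mean:+.3f}")
```

Because the mean of absolute values can never be smaller in magnitude than the absolute value of the mean, the folded estimate sits away from zero even when the simulated true difference is zero. This is why the secondary analysis is best read as a guide rather than as a test.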
We identified 238 articles from our electronic search, and a further 24 articles by hand searching references and contacting CAMARADES authors. Two authors (JH, JAH) excluded 199 articles after reading titles and abstracts. We assessed the full text of the remaining 63 articles and excluded a further 32 for not including outcome data. CAMARADES authors generously shared data from 19 reviews in which data were not included in the published reports. We were left with 31 systematic reviews involving 7339 comparisons (estimated 123,437 animals) to include in the meta-analysis (see Figure 1). Characteristics of the 31 included reviews are shown in Table 1, and our data are available freely from the authors.
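The screening counts above can be tallied as a quick consistency check. The sketch below uses only the numbers reported in this paragraph; the function and variable names are illustrative.

```python
def screening_flow(electronic=238, other_sources=24,
                   excluded_on_abstract=199, excluded_on_full_text=32):
    """Tally the review-selection flow from identification to inclusion."""
    identified = electronic + other_sources            # all records found
    full_text = identified - excluded_on_abstract      # assessed in full
    included = full_text - excluded_on_full_text       # entered the meta-analysis
    return {"identified": identified, "full_text": full_text, "included": included}


if __name__ == "__main__":
    print(screening_flow())
```

With the reported counts, 262 articles were identified, 63 were assessed in full text, and 31 were included, which matches the number of reviews in the meta-analysis.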
Twenty systematic reviews investigated treatments for experimental stroke, four reviews investigated treatments for spinal cord diseases, and one review each investigated treatments for bone cancer, intracerebral hemorrhage, glioma, multiple sclerosis, Parkinson's disease, and any treatments used in emergency medicine. Animal types included baboons, cats, dogs, ewes, gerbils, guinea pigs, lambs, marmosets, mice, monkeys, pigs, rabbits, rats, and sheep. In our sample 29% of studies reported randomization, 15% reported allocation concealment, and 35% reported blinded outcome assessment.
Thirty reviews with 7249 comparisons (121,784 animals) reported the effects of randomization. Randomization reduced effect sizes by a moderate and statistically significant amount (SMD = −0.07, 95% CI −0.12 to −0.02, I² = 89.1%, P = 0.008) (Figure 2). In a subgroup analysis examining the effect of randomization by disease (stroke versus other), we found that randomization resulted in a lower effect size in areas other than stroke (SMD = −0.18, 95% CI −0.30 to −0.06) but not in stroke itself (SMD = −0.03, 95% CI −0.08 to 0.02). However, using meta-regression we found no significant difference between stroke and non-stroke on outcome measures (P = 0.08); additionally, meta-regression could not explain more than 3% of the heterogeneity. A sensitivity analysis excluding the single review in which we had to estimate the number of animals did not alter the overall result (SMD = −0.08, 95% CI −0.13 to −0.03). In our secondary analysis (where we ignored direction of effect) we found a larger difference between randomized and non-randomized studies (SMD = −0.16, 95% CI −0.21 to −0.11, I² = 86.6%, P < 0.0001) compared with the effect size in which we took direction into consideration.
Eighteen reviews with 2696 comparisons (39,405 animals) reported the effect of allocation concealment. Studies in which allocation concealment was used had slightly lower effect sizes, but this was not statistically significant (SMD = −0.04, 95% CI −0.09 to 0.00, I² = 51.6%, P = 0.059) (Figure 3). Subgroup analysis examining different diseases (stroke and non-stroke) showed that allocation concealment in studies of stroke resulted in significantly lower effect sizes (SMD = −0.07, 95% CI −0.12 to −0.02, I² = 48.5%, P = 0.009), whereas allocation concealment in other disease areas resulted in higher effect sizes (SMD = 0.05, 95% CI −0.01 to 0.11, I² = 0%, P = 0.128). The difference between these groups was not significant using meta-regression (P = 0.073). Meta-regression of the combination of disease and outcome measure did not explain more than 9% of the heterogeneity. In our secondary analysis (where we ignored direction of effect) we found a larger difference between concealed and non-concealed studies (SMD = −0.08, 95% CI −0.11 to −0.05, I² = 13.8%, P < 0.0001) compared with the effect size in which we took direction into consideration.
Twenty-eight reviews involving 7140 comparisons (119,597 animals) reported the effects of blinding of outcome assessment. Effect sizes in studies that involved blind outcome assessment were not significantly different from those in studies that did not (SMD = −0.01, 95% CI −0.04 to 0.03, I² = 68.3%, P = 0.667) (Figure 4). A sensitivity analysis excluding one study in which some estimates were made did not change results. We did not find any differences in effect sizes when we sub-divided studies into stroke and non-stroke groups. In a post-hoc subgroup analysis, we showed that blinding in studies reporting infarct volume did not significantly change effect size (SMD = 0.03, 95% CI −0.02 to 0.08, P = 0.187), whereas blinding in those reporting neurobehavioral outcomes did (SMD = −0.06, 95% CI −0.10 to −0.02, P = 0.003), and this difference was significant when tested using meta-regression (P = 0.014). In our secondary analysis (in which effect direction was ignored) we found a larger difference between blinded and non-blinded studies (SMD = −0.08, 95% CI −0.11 to −0.06, I² = 49.5%, P < 0.001) compared with the effect size in which we took direction into consideration.
Using AMSTAR (Table 2), we found a moderate risk of bias. It was encouraging that all 31 reviews assessed the quality of included studies, all but two reviews clearly used appropriate methods, and all but two reviews performed comprehensive literature searches. Yet only 9 reviews provided a protocol, and only 17 reviews searched the grey literature.
In this overview of systematic reviews we found that failure to randomize is likely to result in overestimation of the apparent treatment benefits of interventions across a range of disease areas and outcome measures. We also found a borderline effect of allocation concealment but no overall effect of blinding in our primary analysis. We hypothesize that the reason for an effect of randomization but not allocation concealment or blinding is that subjective judgments are less likely to influence outcomes in trials of (relatively homogeneous) animal models compared with (relatively heterogeneous) humans. While animal heart rates, blood flow, and behavior can be conditioned by human handling so that placebo controls are sometimes also used in animal studies, there are no ‘patient-reported’ (subjective) outcomes in animal studies. This may make some measures of expectancy effects (for which blinding is useful) smaller in animal studies. Our hypothesis is supported by our post hoc analyses, which showed that blinding reduced effect sizes for (more subjective) neurobehavioral scores, but not for (more objective) infarct volume. It may also be relevant that the comparison of allocation concealment versus no allocation concealment was reported far less frequently (about half as often) than the other comparisons, so the failure to find an effect of allocation concealment could be due to insufficient power. A future major study of individual trials is now warranted to investigate the direction, magnitude, and conditions that must hold for randomization, allocation concealment, and blinding to reduce bias in animal studies.
Our results corroborate those of the CAMARADES study, in the sense that we also identified significant bias in animal studies. However, whereas they found a borderline effect of allocation concealment, but no effect for blinding or randomization, we found an effect of randomization, a borderline effect for allocation concealment, and no effect for blinding. The differences between the two reviews could be because our review covered all disease areas, whereas theirs was limited to experimental stroke. In addition, our methods were different; we calculated standardized mean differences rather than (the less widely used and more difficult to replicate) normalized mean differences used by the CAMARADES researchers.
Our study had several potential limitations. First, outcomes, animal models, and disease types were heterogeneous. The high levels of between-study heterogeneity in our overview could not be explained using meta-regression but may result from heterogeneity of the included reviews (and it was beyond the scope of our study to examine the sources of heterogeneity within our included reviews). Secondly, we relied on reports of systematic reviews; these, in turn, relied on reports of individual trials. Some trials may have failed to report randomization, allocation concealment, and blinding when in fact these were used, and vice versa. Evidence from clinical trials suggests that reporting quality is a good surrogate for actual risk of bias. If a similar relationship between reporting quality and study quality holds in animal studies, incomplete reporting may not have affected our results. Based on reporting standards for clinical studies (that require, among other things, descriptions of how randomization, concealment, and blinding were achieved), reporting standards for animal studies are emerging. The Animal Research: Reporting In Vivo Experiments (ARRIVE) guidelines, developed in 2010, arguably constitute the leading candidate for becoming a requirement, although development work in this area continues. More recently, it has been suggested that until formal reporting guidelines become required: “at a minimum, authors of grant applications and scientific publications should report on randomization, blinding, sample-size estimation, and the handling of all data”.
Thirdly, it is unclear whether publication bias affected our results. It has been estimated that 1 in 6 animal trials remain unpublished, so some influence is possible. If we assume that unpublished studies were equally likely to be randomized, allocation concealed, and blinded as they were to be non-randomized, not adequately concealed, and unblinded, then publication bias may not have affected the direction of our results. As with human studies, compulsory registration of preclinical studies would reduce publication bias and allow more precise estimates of the empirical dimensions of bias in animal studies.
Fourthly, many of the individual trials included in the systematic reviews applied randomization, allocation concealment, and blinding together, whereas we examined these features independently. Of the 31 included reviews, 19 investigated experimental stroke. If stroke studies tend to be different from other types of studies this might have influenced the results, although we explored this using sub-group analysis and meta-regression. Fifthly, there was a disproportionate number of stroke studies included in our overview of systematic reviews. This was because stroke researchers have spearheaded empirical investigations of bias in animal research. Finally, this study was restricted to an investigation of the effects of randomization, allocation concealment, and blinding. Other features, such as lack of power, publication bias, choice of animal models, choice of sex of animals, and choice of outcome may also contribute to the internal and external validity of animal studies. A future systematic review and meta-analysis of individual studies is now warranted to address these potential limitations.
Our study has implications that extend beyond the conduct of animal studies. Only animal studies that do not suffer from avoidable bias should be accepted as justification for human studies. For this reason, the United States Food and Drug Administration (FDA), the Medical Research Council (MRC) in the United Kingdom, and the World Health Organization (WHO) insist on fair tests, often involving systematic reviews of high quality randomized trials. Our study therefore supports the requirement for adequate conduct and reporting of animal studies, including those being promoted by CAMARADES and SABRE Research UK.
Our overview of systematic reviews and meta-analyses revealed that failure to randomize leads to exaggerated effect sizes in animal studies across a wide range of disease areas. In our secondary analysis we found that failure to conceal allocation or employ blind outcome assessment exaggerates effect sizes in animal studies. Biased animal research is less likely to provide trustworthy results, is less likely to provide a rationale for research that will benefit humans, and wastes scarce resources. Requiring compulsory study registration and adherence to emerging evidence-based standards for the conduct and reporting of animal research is likely to reduce the risk of bias in animal studies and improve translatability of animal research.
Sir Iain Chalmers made comments on earlier drafts of this paper, and authors from the CAMARADES Collaboration (Al-Shahi Salman, R, Amarasingh, S, Antonic, A, Banwell, V, Batchelor, PE, Bath, PM, Battistuzzo, CR, Bennett, MI, Bernhardt, J, Briscoe, CL, Brommer, B, Carter, S, Chandran, S, Colvin, LA, Currie, GL, Delaney, A, Dickenson, AH, Dirnagl, U, Donnan, GA, Egan, KJ, Fallon, MT, ffrench-Constant, C, Forsberg, K, Frantzias, J, Gibson, C, Gray, L, Hirst, TC, Horky, LL, Howells, DW, Janssen, H, Jerndal, M, Koblar, SA, Kopp, MA, Lees, JS, Linden, T, Longley, L, Macleod, MR, Mead, GE, Mee, S, Murphy, S, Nilsson, M, O'Collins, VE, Pedder, H, Rooke, ED, Sandercock, PA, Schwab, JM, Sena, ES, Skeers, P, Speare, S, Spratt, NJ, van der Worp, HB, Vesterinen, HM, Wardlaw, JM, Watzlawick, R, Wheble, PC, Whittle, IR, Williams, A, Willmot, M, and Wills, TE) generously shared data from the studies their group had published. Malcolm Macleod was especially generous with his support in helping to gather CAMARADES data.
Jeremy Howick was funded by a National Institute for Health Research (NIHR) non-clinical fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.