To assess whether the reported methodological quality of randomized controlled trials (RCTs) reflects their actual methodological quality, and to evaluate the association of effect size (ES) and sample size with methodological quality.
Retrospective analysis of all consecutive phase III RCTs published by 8 National Cancer Institute Cooperative Groups up to 2006. Data were extracted from the protocol (actual quality) and the publication (reported quality) of each study.
429 RCTs met the inclusion criteria. Overall, reporting of methodological quality was poor and did not reflect the actual high methodological quality of the RCTs. There was no association between sample size and the actual methodological quality of a trial. Poor reporting of allocation concealment and blinding exaggerated the ES by 6% (ratio of hazard ratios [RHR]: 0.94, 95% CI: 0.88, 0.99) and 24% (RHR: 1.24, 95% CI: 1.05, 1.43), respectively. However, assessment of actual quality showed no association between ES and methodological quality.
This study, the largest to date, shows that poor quality of reporting does not reflect the actual high methodological quality of the trials. Assessing the impact of quality on the ES on the basis of reported quality alone can produce misleading results.
Randomized controlled trials (RCTs) are considered the most reliable method to assess the efficacy of competing interventions. Well-designed and well-conducted RCTs, and meta-analyses of RCTs, are essential to ascertain whether new treatments offer small or moderate, but worthwhile, benefits.[1, 2] Users of research evidence (i.e., physicians, patients and policy-makers) make decisions on the basis of their confidence in the methodological quality of the data presented in publications. A large body of empirical evidence shows that biased results from poorly designed and reported RCTs can mislead decision making in health care. Accordingly, assessment of the methodological quality of RCTs is crucial for informed healthcare decision making, as also emphasized in a recent report by the Institute of Medicine. However, whether published reports of RCTs truly reflect the actual methodological quality of RCTs has been assessed in only 3 cohorts of RCTs to date.[5–7] The first study, by Soares et al., assessed the publications and protocols of 59 phase III RCTs conducted by the Radiation Therapy Oncology Group in the US and concluded that published reports do not reflect the actual, superior methodological quality of RCTs. Similarly, Hill et al. assessed the published methodological quality of 40 RCTs in rheumatology and reported that published results do not represent the true methodological quality. Devereaux and colleagues assessed the publications of 105 randomly selected RCTs and concluded that authors failed to report the allocation concealment and blinding procedures conducted in these RCTs. However, whether these findings apply to other cohorts of RCTs is not known. Moreover, what matters to users of randomized evidence is not methodological quality per se, but whether quality affects the treatment effect size.
To date, the relative impact of the actual methodological quality of conduct versus the methodological quality of reporting on treatment effect size in RCTs has not been studied. Some investigators have also postulated that RCTs with larger sample sizes may have better methodological quality and that larger sample size might affect outcomes (i.e., whether results favor the standard or the experimental treatment).[8, 9] Nevertheless, the evidence on the association of RCT sample size with methodological quality and associated outcomes is conflicting.[8–10] While some studies found an association between sample size and methodological quality and resulting outcomes,[8, 9] others found no such association. In addition, the association of sample size with the methodological quality of RCTs and resulting outcomes has not been evaluated in oncology RCTs, which is the subject of this paper.
The objectives of the current study are to assess 1) whether the published methodological quality of an RCT truly reflects its actual methodological quality, as documented in the study protocol; 2) the impact of the methodological quality of reporting and of the actual methodological quality on treatment effect size; and 3) the association of RCT sample size with methodological quality and outcomes.
The objectives of this study were addressed using systematic review methodology. The study plan was specified a priori in a protocol.
All consecutive terminated phase III RCTs conducted by 8 National Cancer Institute (NCI)-sponsored Cooperative Groups (COGs), published up to 2006 and for which both the full protocol and the publication were available, were included in the study. The 8 COGs are the Children’s Oncology Group, National Surgical Adjuvant Breast and Bowel Project, Radiation Therapy Oncology Group, North Central Cancer Treatment Group, Gynecologic Oncology Group, Eastern Cooperative Oncology Group, Cancer and Leukemia Group B, and Southwest Oncology Group.
A list of all consecutive RCTs and their study protocols was obtained from the respective COGs, which also provided the matching publication for each RCT.
An initial list of all consecutive RCTs and associated protocols was reviewed independently by 2 authors for eligibility. Studies not meeting the inclusion criteria were excluded.
Two reviewers independently extracted data using a standardized data extraction form. For each included RCT, data on the methodological quality domains relevant to minimizing bias and random error were extracted from both the protocol and the matching publication, according to the methods recommended by the Cochrane Collaboration.[12, 13] The extracted data were then classified according to their source as: a) data reported only in the publication; b) data reported only in the protocol; c) data reported in either the protocol or the publication; and d) data reported in neither the protocol nor the publication. Our “final assessment” of methodological quality was based on data from either the protocol or the publication. We also extracted the hazard ratio (HR) for overall survival and its 95% confidence interval (CI) from each RCT, which was used as the estimate of treatment effect size (ES). When direct extraction was not feasible, the methods of Tierney et al. were used.
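The source classification above can be made concrete with a small sketch. It is illustrative only: `DomainRecord`, `assessments` and `proportions` are hypothetical names, each domain is reduced to an adequate/not-adequate judgment per source, and categories a) and b) are read as the publication-based and protocol-based judgments (the “only in” wording could also be read as mutually exclusive categories). The “either” flag corresponds to the final assessment.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DomainRecord:
    """Hypothetical record: one quality domain (e.g. allocation concealment)
    for one trial, judged adequate (True) or not (False) in each source."""
    trial_id: str
    in_protocol: bool
    in_publication: bool


def assessments(rec: DomainRecord) -> dict[str, bool]:
    """Source-based flags for one trial and domain (our reading of a-d)."""
    either = rec.in_protocol or rec.in_publication
    return {
        "publication": rec.in_publication,  # a) reported quality
        "protocol": rec.in_protocol,        # b) protocol-based quality
        "either": either,                   # c) the "final assessment"
        "neither": not either,              # d) not documented in either source
    }


def proportions(records: Iterable[DomainRecord]) -> dict[str, float]:
    """Share of trials for which each flag is True."""
    rows = [assessments(r) for r in records]
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}


# Invented example: four trials, one domain.
toy = [
    DomainRecord("T1", in_protocol=True, in_publication=False),
    DomainRecord("T2", in_protocol=True, in_publication=True),
    DomainRecord("T3", in_protocol=False, in_publication=False),
    DomainRecord("T4", in_protocol=True, in_publication=False),
]
print(proportions(toy))
# {'publication': 0.25, 'protocol': 0.75, 'either': 0.75, 'neither': 0.25}
```

With these invented trials, the protocol-based and final assessments rate 75% of trials as adequate while the publication-based assessment rates only 25%, the kind of gap between reported and actual quality that the study set out to measure.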
We used descriptive statistics to summarize the reporting of methodological quality domains in protocols and publications. The association between methodological quality (reported versus actual) and ES was assessed using standard meta-epidemiologic methods. Briefly, we computed combined treatment ES estimates separately in trials with and without the methodological quality domain of interest (e.g., inadequate or unclear allocation concealment) to calculate the ratio of hazard ratios (RHR) and its 95% CI. The association between methodological quality domains and RCT sample size was tested using the Kruskal–Wallis one-way analysis of variance. To test the association between RCT sample size and outcomes, we analyzed the correlation between RCT sample size and treatment effect size (HR) using the Spearman rank-correlation test. The Kruskal–Wallis test was also used to compare median sample size across three possible outcomes: RCTs favoring the new treatment (upper limit of the 95% CI of the overall survival HR less than 1), RCTs favoring the standard treatment (lower limit of the 95% CI greater than 1) and RCTs favoring neither treatment (95% CI including 1).
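The RHR calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each trial's standard error is recovered from its reported 95% CI on the log scale (a common approach when only the HR and CI are published), and trials are pooled within each group with a fixed-effect inverse-variance model. The Methods do not say which pooling model was used (meta-epidemiologic studies often use random-effects models or meta-regression instead), and all trial values are invented.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_hr_and_se(hr, lower, upper):
    """Log HR and its standard error, recovered from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_fixed(trials):
    """Fixed-effect inverse-variance pooled log HR and its standard error.

    Each trial is a (HR, lower 95% limit, upper 95% limit) tuple.
    """
    estimates = [log_hr_and_se(*t) for t in trials]
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


def ratio_of_hazard_ratios(with_flaw, without_flaw):
    """RHR = pooled HR (trials with the flaw) / pooled HR (trials without it).

    Returns (RHR, lower 95% limit, upper 95% limit); the two pooled
    estimates are treated as independent, so their variances add.
    """
    y1, se1 = pool_fixed(with_flaw)
    y0, se0 = pool_fixed(without_flaw)
    diff = y1 - y0
    se = math.sqrt(se1 ** 2 + se0 ** 2)
    return math.exp(diff), math.exp(diff - Z95 * se), math.exp(diff + Z95 * se)


# Invented trials: (HR, lower, upper) for overall survival.
unclear_concealment = [(0.80, 0.64, 1.00), (0.80, 0.70, 0.91)]
adequate_concealment = [(1.00, 0.80, 1.25), (1.00, 0.85, 1.18)]
rhr, lo, hi = ratio_of_hazard_ratios(unclear_concealment, adequate_concealment)
print(f"RHR = {rhr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these invented inputs the RHR is 0.80. When an HR below 1 favors the new treatment, an RHR below 1 means that the trials with the methodological flaw show the larger apparent benefit.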
Between 1968 and 2006, the NCI cooperative groups conducted 622 RCTs involving 781 comparisons. Of these, protocols or publications were not available for 276 comparisons (117 RCTs), which were therefore not included in the final analysis. Of the remaining 505 comparisons, 76 (15%, 76/505) shared protocols and were therefore excluded, resulting in 429 unique RCTs (enrolling 158,000 patients) that were used in the final analysis (Figure 1). The publication rate for the 429 RCTs was 98% (421/429).
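The exclusion steps in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; note that subtracting the 117 unavailable RCTs from 622 also gives 505, the same number as the remaining comparisons.

```python
# Figures as stated in the Results; the script only checks their arithmetic.
total_rcts, total_comparisons = 622, 781
missing_rcts, missing_comparisons = 117, 276   # no protocol or no publication
shared_protocol_comparisons = 76
published_rcts = 421

remaining_rcts = total_rcts - missing_rcts
remaining_comparisons = total_comparisons - missing_comparisons
unique_rcts = remaining_comparisons - shared_protocol_comparisons

print(remaining_rcts, remaining_comparisons, unique_rcts)  # 505 505 429
print(f"{shared_protocol_comparisons / remaining_comparisons:.0%}")  # 15%
print(f"{published_rcts / unique_rcts:.0%}")  # 98%
```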
The evaluation of key methodological domains associated with risk of bias is illustrated in Figure 2. While 39% (165/429) of RCTs employed an adequate method for generating the randomization sequence, as stated in the protocol, only 23% (98/429) reported doing so in the publication. Treatment allocation was adequately concealed and specified in the protocols of 95% (409/429) of RCTs, but only 24% (101/429) reported it adequately in the publication. Blinding was applicable to only 35 RCTs; of these, the blinding procedure was adequately reported in 91% (32/35) of protocols versus 43% (15/35) of publications. Thirty-three percent (141/429) of RCTs reported the expected drop-out rate in the protocol, and 97% (418/426) stated the drop-out rate in the publication.
A plan to follow the intention-to-treat (ITT) principle was reported in the protocols of 21% (91/428) of RCTs, whereas 91% (392/428) of RCTs reported an ITT analysis in the publication. However, of these 392 RCTs, only 16% (63/392) used the exact phrase “intention to treat” in the publication. In the remaining 84% (329/392), it was clear (by matching the randomized and analyzed populations) that the ITT principle was used, but the publications neither used the exact phrase nor explained that the analyses were based on the ITT principle. Eighty-two percent (351/429) of RCTs had an active treatment as comparator, 7% (30/429) used placebo, 10% (45/429) had no treatment as comparator, and 1% (3/429) had an active treatment plus add-on placebo as comparator.
The expected difference in the primary outcome was stated a priori in the protocols of 96% (414/429) of RCTs, but was mentioned in the publications of only 43% (186/429). The α and β errors were pre-specified in the protocols of 93% (398/426) and 95% (406/426) of RCTs, respectively. However, only 32% (138/426) of RCTs reported the α error and 41% (174/426) reported the β error in the publication. A priori sample size calculations were performed in 98% (421/429) of RCTs, while only 40% (172/429) reported having done so (figure 3).
There was no statistically significant association between ES and the quality of reporting for the following domains: adequacy of randomization sequence generation, description of drop-outs, intention-to-treat analysis, and pre-specification of α and β errors. However, on average, poorly reported allocation concealment exaggerated the ES by 6% (RHR: 0.94, 95% CI: 0.88–0.99), and poorly reported blinding inflated the ES by 24% (RHR: 1.24, 95% CI: 1.05–1.43; figure 4).
Nonetheless, when data from either the protocol or the publication were taken into consideration in our final assessment of methodological quality, there was no statistically significant association between ES and any of the methodological quality domains (figure 4).
The distribution of RCT sample size was similar for RCTs with adequate versus inadequate description of randomization sequence generation, as reported in the publications only (p value = 0.28). Similarly, there was no difference in median sample size between trials reporting adequate versus inadequate allocation concealment (p value = 0.09), or adequate versus inadequate description of drop-outs (p value = 0.56), in the publications. Furthermore, there was no difference in median sample size between trials that pre-specified the α error (p value = 0.14) and β error (p value = 0.23) in the publications and those that did not. However, RCTs that adequately described the blinding procedures enrolled more patients (median: 234; range: 47–1387) than RCTs that did not describe them (median: 123; range: 45–18882) (p value = 0.05). The choice of comparator was also associated with median RCT sample size (p value < 0.001) (see table 1). However, when the final assessment of methodological quality was considered, regardless of source (protocol or publication), variation in RCT sample size was not associated with methodological quality (see table 2).
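The Kruskal–Wallis comparisons of sample size across quality-domain groups can be sketched in plain Python. This is an illustrative implementation of the textbook tie-corrected H statistic, not the authors' code: the sample sizes are invented, and the closed-form p-values cover only the two-group (1 df) and three-group (2 df) cases used in this paper. In practice, `scipy.stats.kruskal` computes the same tie-corrected statistic for any number of groups; with two groups the test is equivalent to a two-sided Mann–Whitney test with a normal approximation.

```python
import math
from itertools import chain


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and chi-square p-value for 2 or 3 groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    h = max(h, 0.0)  # guard against tiny negative rounding error
    df = len(groups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(h / 2))  # chi-square(1) upper tail
    elif df == 2:
        p = math.exp(-h / 2)             # chi-square(2) upper tail
    else:
        raise ValueError("closed-form p-value implemented for 2 or 3 groups only")
    return h, p


# Invented sample sizes for trials with adequate vs inadequate reporting.
adequate = [234, 310, 180, 420, 260, 150]
inadequate = [123, 98, 150, 240, 75, 130]
h, p = kruskal_wallis(adequate, inadequate)
print(f"H = {h:.2f}, p = {p:.3f}")
```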
There was a statistically significant negative correlation between RCT sample size and absolute treatment effect size for the outcome of overall survival (Rho = −0.147, p value = 0.006). However, the distribution of RCT sample size was similar across RCTs favoring new treatment, RCTs favoring standard treatment and RCTs favoring none of the treatments (p value = 0.13).
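The Spearman correlation and the three outcome categories used in this analysis can be sketched as follows. This is a minimal illustration, not the authors' code: the paper does not define “absolute treatment effect size”, so the sketch assumes it means |log HR|; the trials are invented; and no p-value is computed (`scipy.stats.spearmanr` returns one). The categories follow the CI-based definitions given in the Methods.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def classify_outcome(lower, upper):
    """Outcome category from the overall-survival HR 95% CI (Methods definitions)."""
    if upper < 1:
        return "favors new treatment"
    if lower > 1:
        return "favors standard treatment"
    return "favors neither"


# Invented trials: (sample size, HR, lower 95% limit, upper 95% limit).
trials = [
    (120, 0.70, 0.52, 0.94),
    (450, 0.92, 0.78, 1.08),
    (900, 0.95, 0.85, 1.06),
    (60, 1.40, 1.05, 1.87),
]
sizes = [t[0] for t in trials]
abs_effect = [abs(math.log(t[1])) for t in trials]  # assumed |log HR|
print(f"rho = {spearman_rho(sizes, abs_effect):.2f}")  # rho = -0.80
print([classify_outcome(t[2], t[3]) for t in trials])
```

In this toy set the larger trials show smaller absolute effects, giving a negative rho. The category labels could then be passed, together with sample sizes, to a Kruskal–Wallis test across the three outcome groups.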
The practice of medicine is informed by new research findings, and physicians incorporate evidence into practice after assessing its quality as reported in peer-reviewed publications. Quality of reporting is therefore vital, because users of research evidence (i.e., physicians, patients, policy-makers) make decisions on the basis of their confidence in the accuracy of a given research paper. Our findings show that the quality of reporting of RCTs conducted by the NCI cooperative groups is rather poor: publications often omit important methodological features that are critical to decision making. However, the findings also suggest that poor reporting does not reflect the actual, superior methodological quality of these trials. Our study was limited to the cohort of RCTs for which both the protocol and the publication were available. As a result, we excluded 117 RCTs (276 comparisons) from our analysis (figure 1); however, the methodological quality of reporting in these 117 RCTs was similar to that of the RCTs included in our analysis (data not shown). Also, details related to generation of the randomization sequence were poorly reported in both protocols and publications, which may be an artifact of the strict definitions we applied in assessing methodological quality. Additionally, NCI cooperative groups, which use a centralized mechanism with common, standardized randomization methods, might consider it unimportant to report these details, even in protocols.
Our results also show that the association between the methodological quality of reporting and treatment effect is spurious, as evident from the actual high methodological quality of conduct of these RCTs. Specifically, we found a statistically significant association between treatment effect size and the reporting (published data only) of the allocation concealment and blinding domains: poor reporting of these domains inflated the ES. However, assessment of methodological quality using data from both protocols and publications showed no impact of methodological quality on treatment effect size. Results also showed that, for the cohort of RCTs conducted by the NCI cooperative groups, the published methodological quality was associated with trial sample size, whereas no association between median sample size and quality domains remained once the actual methodological quality was taken into account. That is, we found statistically significant variation in the distribution of median RCT sample size based on the reporting (published data only) of methodological quality domains such as the choice of comparator (placebo) and description of the blinding procedure (table 1), but assessment using data from both protocols and publications showed no variation in sample size across RCTs (table 2). These results further emphasize the importance of including both protocols and publications in the assessment of methodological quality.
Our findings are in line with previous research on the topic.[2, 6, 18] However, except for one study of radiation oncology trials, none of the previous studies used study protocols to assess actual methodological quality, i.e., to compare the quality of reporting with the methods specified in the original research protocols. The current study is also the largest to date on the subject.
It has been shown previously that low methodological quality of RCTs inflates the treatment effect. For example, Colditz et al. found that double-blind RCTs had smaller ESs than non-blinded trials. Similarly, Schulz et al. reported that inadequate allocation concealment accounted for a substantial increase in ES. Allocation concealment has shown the most consistent association with treatment effect size. However, all of these studies assessed only the methodological quality of reporting; the relative impact of the actual methodological quality of conduct versus the methodological quality of reporting on ES has not been studied. Our results show that assessing the impact of methodological quality on ES on the basis of the quality of reporting alone can produce spurious results. These findings are important for meta-epidemiologic research and for the further development of quality assessment tools. Researchers may need to revise the current strategy of assessing methodological quality on the basis of reporting only. Where the study protocol is available, assessment of the methodological quality of conduct may augment the overall assessment of methodological quality.
Our results also showed that the median sample size of an RCT was associated with the reported methodological quality domains of blinding and choice of comparator (placebo versus active treatment). However, there was no association when the actual methodological quality of trials, as documented in the protocols, was assessed. We are aware of one study, by Singh et al., which concluded that RCT sample size was related to the methodological quality domains of blinding and use of the ITT principle of data analysis. That study also reported that sample size was an independent predictor of a positive result in an RCT. In contrast, our findings suggest that RCT sample size is not associated with outcomes (P = 0.11). The discrepancy between the findings of Singh et al. and ours could be attributed to the fact that their study reviewed only arthroplasty RCTs and dichotomized RCT sample size as large (≥ 100 patients) versus small (≤ 100 patients). Additionally, Singh et al. based their analysis on published methodological quality only and did not have access to study protocols to determine the actual methodological quality of the RCTs.
In summary, we report the first comprehensive, formal investigation of the methodological quality of RCTs conducted by the NCI cooperative groups. While the results show that RCTs conducted by the NCI cooperative groups are of high quality, the published reports have deficiencies in their description of the methods actually used. Notably, these RCTs were of high quality even before the publication of the Consolidated Standards of Reporting Trials (CONSORT) statement in 1996. Our findings indicate that although investigators in the NCI cooperative groups were attentive to the critical aspects of the design and conduct of RCTs, they were less aware of the need to report these features. Given that physicians, policy-makers, guideline panels, systematic reviewers and other users of evidence rely on published reports only, this is an important finding that could be rectified immediately by the NCI “officially” adopting the CONSORT reporting guidelines. Our study also highlights the pitfall of the current practice of appraising trials on the basis of the methodological quality of reporting alone, without considering the actual methodological quality of conduct evident from research protocols.
These findings underline the overall need for better adherence to the revised CONSORT statement by both authors and journal editors, which will allow transparent evaluation of RCTs. The results also underscore the importance of publishing RCT protocols in the public domain. Publication of study protocols will not only help decision makers interpret the results of RCTs transparently and efficiently, but might also enhance prospects for collaboration, thereby reducing duplication of research effort, improving RCT recruitment, and reducing bias in reporting.[24–27] These findings are also important for the further development of methodological quality appraisal tools, such as the Cochrane “risk of bias” algorithm, and for their application in assessing actual methodological quality versus quality of reporting.
Funding source: NIH/ORI Grant # 1 R01NS052956-01 PI: Dr. Benjamin Djulbegovic
Conflict of Interest: All authors [RM, BD, AM, HS, AK] declare that they have no non-financial interests that may be relevant to the submitted work. All authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.