Overview of methods
STOPIT-1 included 143 tRCTs, and we identified 14 additional RCTs stopped early for benefit by hand-searching the medical literature and through personal contact with trial investigators. We will continue to identify tRCTs by updating the search that yielded the STOPIT-1 trials (November 2004), using the same search strategy, and by citation searching linked to the STOPIT-1 publication and its accompanying editorial in JAMA [3,5].
In STOPIT-2, we will search for systematic reviews addressing the same question as the tRCTs (Figure ). We will use the sensitive MEDLINE search strategy for systematic reviews proposed by Montori et al. [6]. Systematic reviews that address a similar question to the tRCT but do not include it, because the tRCT was published after the review's search date, will be updated to the present. Systematic reviews that already include the tRCT will not be updated. Systematic reviews that are similar only under the broadest definitions will be included only if the review authors chose to pool the tRCT within the review.
From each eligible systematic review, we will blind the results of each included RCT, and two independent reviewers will determine the trial's eligibility. From each eligible trial, we will then extract data and conduct new meta-analyses addressing the outcome that led to the early termination of the tRCT(s). First, we will compare the relative risk generated by the tRCT with the pooled relative risk from all non-truncated studies. Second, we will use multivariable regression to determine the factors associated with the difference in magnitude of effect between tRCTs and RCTs not stopped early. These factors will include the presence and quality of a stopping rule, the methodological quality of the trials, and the number of events that had occurred at the time of truncation. Finally, we will compare possible methods for correcting the treatment effect estimates from tRCTs, in particular Bayesian methods that use conservative informative priors to "regress" the tRCT estimates toward the mean. We will then compare how far the Bayesian-adjusted and the unadjusted tRCT estimates each depart from the meta-analytic estimates.
Some authors have suggested that pooling tRCTs with non-truncated trials addressing the same question will yield minimally biased estimates of treatment effects [7,8]. However, our previous empirical finding that stopped-early studies contributed more than 40% of the weight in more than a third of meta-analyses including tRCTs challenges this view [9]. Nevertheless, if the overall estimate of treatment effect (based on all studies, including tRCTs) were the least biased estimate of the true underlying effect, it is this estimate with which tRCTs should be compared. Based on simulations and theoretical considerations, we found compelling strengths and compelling limitations for each approach (Table ). Using our empirical data, we will explore the extent to which results are consistent with the hypothesis that the pooled estimate including the tRCTs is least biased (for instance, because tRCTs accrue fewer events by stopping early, they should carry relatively little weight in the meta-analysis). Choosing a primary analysis commonly involves some arbitrariness. Given compelling reasons for either approach, we decided to conduct both analyses. Our primary analysis uses non-truncated RCTs only as the comparator; a complementary secondary analysis compares the tRCT with the pooled estimate of all trials, including the tRCT.
Table 1. Comparison of non-truncated RCTs only and truncated + non-truncated RCTs as comparators to estimate the magnitude of bias associated with stopping clinical trials early for benefit, based on simulations and theoretical considerations.
Literature Search for Systematic Reviews
For meta-analyses, we will search the Cochrane Database of Systematic Reviews and the Database of Abstracts of Reviews of Effects using the population and intervention of the tRCTs as search terms. We will also search MEDLINE for meta-analyses using textwords and Medical Subject Heading terms based on the study population and intervention specified in the research question of the tRCTs (supplemented, if necessary, by a specified outcome), combined with the textwords "meta-analysis" OR "overview" OR "systematic review" and, in a second approach, with the limits "meta-analysis.pt." AND "human" [6].
Eligibility of Systematic Reviews
Systematic reviews will be considered eligible if they meet all of the following 5 criteria:
1) Report the methods used to conduct the review
2) Describe a literature search that, at minimum, includes MEDLINE
3) Include a population similar to that of the tRCT
4) Include an intervention similar to that of the tRCT
5) Include an outcome similar to the one that was the basis of the decision to stop the tRCT early
Because eligibility decisions involve considerable judgment, particularly for criteria 3 to 5, every decision of the initial adjudicator will be reviewed and confirmed or refuted by a second adjudicator and, if necessary, by a third party. When in doubt about whether a review meets the broadest similarity criteria, a key factor for eligibility will be whether the systematic review pooled the tRCT. In general, when in doubt, we will judge the systematic review eligible, because eligibility will be assessed a second time at the level of individual trials.
Updating of Systematic Reviews
The only systematic reviews we will update are those that did not include the index tRCT(s) because they were completed before the tRCT was published. In these instances, we will update the review's search to the present using the same strategy as the original review. Within these reviews, we will update only the meta-analyses for the outcomes that led to the early termination of the matching tRCT(s).
Identification, retrieval and eligibility of RCTs included in the systematic reviews
For each systematic review, we will retrieve all included RCTs in full text (including associated manuscripts describing methods) to determine their similarity to the index tRCT. We will obtain data from unpublished studies included in the systematic reviews by contacting the authors of the systematic review and/or the authors of the unpublished studies. Including trials that address a question different from that addressed in the relevant tRCT would bias the assessment of the magnitude of effect from the trials not stopped early. Thus, we will judge the eligibility of each trial in the systematic review on the basis of the following criteria:
1) Including a population similar to that of the tRCT
2) Including an intervention similar to that of the tRCT
3) Including a control similar to that of the tRCT
4) Including an outcome similar to the one that led to the early termination of the tRCT
5) Random allocation to intervention and control group
Similarity criteria could be very strict or very permissive. Because the right approach is uncertain, we will classify the population, intervention, control and outcome of each potentially eligible trial as "more or less identical", "similar, but not identical" or "broadly similar". The eligibility form will distinguish eligibility under the narrow, the broad and the broadest criteria, and the "closeness" of each RCT to the index tRCT will be considered in the analyses. We will form several teams of two reviewers to make the eligibility decisions.
Each team will include individuals with expertise relevant to the content of the studies they review. Within each pair, reviewers will rate the individual RCTs independently and in duplicate. Disagreements will be resolved by discussion and, if necessary, by a third party. Because the decision about whether to include an RCT could be biased by knowledge of its results, the reviewers who judge eligibility will be blinded to the trial's results. A separate team not involved in study selection will perform the blinding, using black ink on hard copies before they are scanned into electronic format, or black boxes overlaid on the sections describing results in PDF versions of the paper. Every section of a potentially eligible RCT that reports the magnitude of results (abstract, results and discussion) will be blinded before the eligibility decision is made. The success of blinding will be tested on a random set of 20 papers sent to 20 reviewers.
For RCTs, disagreements of two or more levels in similarity ratings will require adjudication. Disagreements of one level will not; in these cases, the broader similarity rating will be assumed correct (Figure ).
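The resolution rule for two reviewers' similarity ratings can be sketched as follows. This is a minimal illustration, assuming the three ratings are encoded as ordered levels from closest to broadest; the function and constant names are hypothetical and not part of the protocol.

```python
# Hypothetical encoding of the three similarity ratings, ordered from
# closest (index 0) to broadest (index 2).
LEVELS = ["more or less identical", "similar, but not identical", "broadly similar"]


def resolve_similarity(rating_a: str, rating_b: str):
    """Return (resolved_rating, needs_adjudication) for two reviewers' ratings."""
    ia, ib = LEVELS.index(rating_a), LEVELS.index(rating_b)
    if abs(ia - ib) >= 2:
        # Disagreement of two or more levels: refer to adjudication.
        return None, True
    # Agreement or a one-level disagreement: accept the broader rating.
    return LEVELS[max(ia, ib)], False


print(resolve_similarity("more or less identical", "similar, but not identical"))
```

With only three levels, a two-level disagreement can arise only when one reviewer rates "more or less identical" and the other "broadly similar".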
Data extraction
From each RCT, we will collect the following data in duplicate.
1. Stopped early (yes/no)
2. Methodological quality: allocation concealment (documented as central independent randomization facility or numbered/coded medication containers prepared and distributed by an independent facility (e.g. pharmacy)); blinding of participants, care providers, and outcome adjudicators (blinding of participants and care providers will be rated as "probably yes" when trial report states "double blinded" or "placebo controlled"); loss to follow-up (we will collect the number of participants randomized and the number of participants with outcome data for the outcome of interest allowing for an estimation of loss to follow-up)
3. Measure of treatment effect for the outcome that terminated the tRCT (events and number randomized in intervention and control groups)
4. Pre-implemented stopping characteristics, if any (e.g., planned sample size, interim looks, stopping rules, number of events)
5. Date of conduct of the trial (start date, stop date, publication date)
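One way to hold the extracted items is a simple per-trial record. The sketch below assumes hypothetical field names; it shows how loss to follow-up can be estimated from the number randomized and the number with outcome data, and how the "probably yes" rule for participant and care-provider blinding might be applied to report text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialExtraction:
    """Hypothetical record for one RCT; field names are illustrative only."""
    trial_id: str
    stopped_early: bool
    n_randomized: int                  # participants randomized
    n_with_outcome: int                # participants with data for the outcome of interest
    events_intervention: int
    n_intervention: int
    events_control: int
    n_control: int
    allocation_concealed: Optional[bool] = None   # None = not reported / unclear
    stopping_rule: Optional[str] = None           # e.g. interim looks, boundaries
    planned_sample_size: Optional[int] = None
    start_date: Optional[str] = None
    stop_date: Optional[str] = None
    publication_date: Optional[str] = None

    @property
    def loss_to_follow_up(self) -> float:
        """Estimated proportion of randomized participants lacking outcome data."""
        return (self.n_randomized - self.n_with_outcome) / self.n_randomized


def blinding_rating(explicitly_reported: Optional[bool], report_text: str) -> str:
    """Rate participant/care-provider blinding, applying the 'probably yes' rule."""
    if explicitly_reported is not None:
        return "yes" if explicitly_reported else "no"
    # Normalize hyphens so "double-blind" and "placebo-controlled" also match.
    text = report_text.lower().replace("-", " ")
    if "double blind" in text or "placebo controlled" in text:
        return "probably yes"
    return "unclear"
```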
Statistical Analysis
We will calculate relative risks for each RCT in our study. For studies that report results as continuous data (means and standard deviations), we will estimate an approximate dichotomous equivalent. To do so, we will assume that results are normally distributed and that half a standard deviation represents the minimal important change [10]. Using baseline data, we will derive the 0.5 standard deviation threshold from the baseline distribution and calculate the proportion of each follow-up distribution above or below the threshold (depending on the direction of the outcome), i.e. the proportion of patients in each treatment arm who "did worse". This will allow us to specify relative risks and associated confidence intervals. If baseline data are not available, we will derive the 0.5 standard deviation threshold from the follow-up distribution of the control group instead.
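The conversion from continuous to dichotomous results can be sketched as below. This is a minimal sketch under stated assumptions: the threshold is placed 0.5 reference standard deviations from the reference mean in the direction of worsening, and approximate event counts (proportion times arm size) are fed into the usual log relative risk standard error to obtain a confidence interval. The function names and the exact threshold placement are illustrative assumptions, not the protocol's specification.

```python
import math
from statistics import NormalDist


def mid_threshold(ref_mean, ref_sd, higher_is_worse=True, mid_sd=0.5):
    """Assumed threshold: 0.5 SD from the reference mean, in the 'worse' direction."""
    shift = mid_sd * ref_sd
    return ref_mean + shift if higher_is_worse else ref_mean - shift


def proportion_worse(mean, sd, threshold, higher_is_worse=True):
    """Share of a normal follow-up distribution beyond the threshold."""
    below = NormalDist(mean, sd).cdf(threshold)
    return 1.0 - below if higher_is_worse else below


def rr_from_continuous(mean_t, sd_t, n_t, mean_c, sd_c, n_c,
                       ref_mean, ref_sd, higher_is_worse=True, z=1.96):
    """Approximate RR (and 95% CI) of 'doing worse', treatment vs control.

    ref_mean/ref_sd: baseline distribution, or the control follow-up
    distribution when baseline data are unavailable.
    """
    thr = mid_threshold(ref_mean, ref_sd, higher_is_worse)
    p_t = proportion_worse(mean_t, sd_t, thr, higher_is_worse)
    p_c = proportion_worse(mean_c, sd_c, thr, higher_is_worse)
    rr = p_t / p_c
    # Approximate event counts so the standard log-RR variance formula applies.
    a, c = p_t * n_t, p_c * n_c
    se = math.sqrt(1 / a - 1 / n_t + 1 / c - 1 / n_c)
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))
```

For example, `rr_from_continuous(9, 2, 100, 10, 2, 100, ref_mean=10, ref_sd=2)` compares a treatment arm with a lower (better) mean against control on a higher-is-worse scale and yields a relative risk below 1.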
In addition, for each meta-analysis we will calculate the pooled relative risk and 95% confidence interval for all trials that were not stopped early. Where a meta-analysis includes more than one tRCT, we will also calculate a pooled relative risk and confidence interval for those tRCTs. All pooled relative risks will be calculated using an inverse variance-weighted random-effects model.
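The paragraph above does not name a specific random-effects estimator. As an illustration only, the sketch below assumes the DerSimonian-Laird method of moments for the between-study variance and pools log relative risks computed from 2x2 counts (without any zero-cell correction).

```python
import math


def log_rr_and_var(e_t, n_t, e_c, n_c):
    """Log relative risk and its variance from event counts (no zero-cell correction)."""
    log_rr = math.log((e_t / n_t) / (e_c / n_c))
    var = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c
    return log_rr, var


def pool_random_effects(log_rrs, variances, z=1.96):
    """DerSimonian-Laird inverse-variance random-effects pooling on the log scale.

    Returns (pooled RR, (lower, upper) CI, tau^2).
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sw
    # Cochran's Q and the method-of-moments estimate of between-study variance.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    df = len(log_rrs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, log_rrs)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(pooled), (math.exp(pooled - z * se), math.exp(pooled + z * se)), tau2
```

When the studies are homogeneous, tau^2 is truncated at zero and the result coincides with the fixed-effect inverse-variance estimate.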
We will present the results graphically in a scatterplot of the effect size (relative risk) of the tRCT (horizontal axis) against the pooled effect size of the non-tRCTs (vertical axis). If the tRCT and non-tRCTs give similar results, the points should lie along the diagonal of the scatterplot; if tRCTs overestimate treatment effects (for beneficial treatments, relative risks further below 1 than those of the non-tRCTs), their points should lie above the diagonal.
We will also perform a z-test for each meta-analysis to look for differences between the pooled relative risks of the truncated and non-truncated RCTs. As a summary measure, we will calculate a ratio of relative risks (RRR) for each meta-analysis as follows:

$$\mathrm{RRR} = \frac{\mathrm{RR}_{\text{tRCT}}}{\mathrm{RR}_{\text{non-tRCT}}}$$
We will plot the log(ratio of relative risks) and calculate an overall log(ratio of relative risks) as an inverse variance-weighted average of the individual log(ratios of relative risks). These will be back-transformed, and the ratios of relative risks will be plotted for presentation.
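The per-meta-analysis z-test and the overall ratio of relative risks can be sketched as follows. This assumes the ratio is formed as the pooled tRCT relative risk divided by the pooled non-tRCT relative risk, that the two pooled estimates are independent (they come from different trials), and that the overall value is a fixed-effect inverse-variance average on the log scale; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def compare_truncated(log_rr_t, var_t, log_rr_nt, var_nt):
    """z-test for tRCT vs non-tRCT pooled log RRs.

    Returns (log RRR, its variance, z, two-sided p), with RRR = RR_tRCT / RR_non-tRCT.
    """
    log_rrr = log_rr_t - log_rr_nt
    var = var_t + var_nt                     # independent estimates: variances add
    z = log_rrr / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return log_rrr, var, z, p


def overall_ratio_of_rrs(log_rrrs, variances, zcrit=1.96):
    """Inverse variance-weighted average of log RRRs, back-transformed with a 95% CI."""
    w = [1 / v for v in variances]
    pooled = sum(wi * y for wi, y in zip(w, log_rrrs)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return math.exp(pooled), (math.exp(pooled - zcrit * se), math.exp(pooled + zcrit * se))
```

Under this orientation, and with relative risks below 1 denoting benefit, an RRR below 1 indicates that the tRCTs suggest a larger benefit than the non-truncated trials.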
To investigate possible predictors of treatment effect sizes in RCTs, we will perform a hierarchical (multilevel) regression analysis with two levels: the individual RCT (study) and the meta-analysis. The dependent variable will be the logarithm of the relative risk (logRR) for each study, and we will investigate its associations with characteristics of the individual studies. We will investigate five possible predictors. Our main predictor of interest is a variable constructed from two study characteristics: the presence and quality of a stopping rule, and whether or not the RCT was stopped early.
The rule for stopping early will be categorized as one of three possibilities: (i) a rigorous rule (published prior to the trial plan), (ii) a less rigorous rule, such as an ad hoc rule developed during the trial, or (iii) no rule or unknown. Each of these three possibilities will be combined with whether or not the trial stopped early, creating six categories in total. Our final analysis will very likely include fewer than six categories, because some combinations may not occur. We will carry out post hoc comparisons of outcomes between these groups, focusing on contrasts that highlight the effects of the rule, the "truncated study" variable and their interaction, to the extent that the available data permit.
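Constructing the combined rule-by-truncation variable and checking which of the six categories actually occur can be sketched as below. The category labels are hypothetical shorthand for the three rule types described above.

```python
from collections import Counter
from itertools import product

# Hypothetical shorthand for the three stopping-rule categories.
RULES = ("rigorous", "less rigorous", "none/unknown")


def rule_truncation_category(rule: str, truncated: bool) -> str:
    """Combine stopping-rule quality with truncation status into one label."""
    if rule not in RULES:
        raise ValueError(f"unknown rule category: {rule}")
    return f"{rule} | {'truncated' if truncated else 'not truncated'}"


# All six theoretically possible categories.
ALL_CATEGORIES = [rule_truncation_category(r, t) for r, t in product(RULES, (True, False))]


def observed_categories(trials):
    """trials: iterable of (rule, truncated) pairs; counts only categories that occur."""
    return Counter(rule_truncation_category(r, t) for r, t in trials)
```

Categories with no trials simply do not appear in the returned counts and would be dropped before modelling.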
Other study-level characteristics that we will examine are methodological quality (blinding of patients, caregivers and outcome assessors, and allocation concealment) and the total number of events. At the meta-analysis level, the only variable in the model will be an indicator of the meta-analysis to which each study belongs.
We will examine the main effects of all variables and the interactions between the rule/truncation variable and the other predictors. Each study will yield a summary statistic (logRR) and an associated variance; the inverse of this variance will provide the weights for a meta-regression to evaluate the determinants of the estimated treatment effect.
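A simplified version of the weighted meta-regression can be sketched with NumPy. Note that this is not the full two-level hierarchical model: it is an inverse variance-weighted least-squares regression of logRR on study-level predictors, with a separate intercept (indicator) for each meta-analysis standing in for the meta-analysis level. Function and argument names are hypothetical.

```python
import numpy as np


def weighted_meta_regression(log_rr, variances, predictors, meta_ids):
    """Inverse variance-weighted least squares of logRR on study-level predictors.

    predictors: array-like of shape (n_studies, k); meta_ids: meta-analysis label
    per study. One indicator column per meta-analysis acts as its own intercept.
    Returns the k predictor coefficients (on the log RR scale).
    """
    y = np.asarray(log_rr, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    x_pred = np.asarray(predictors, dtype=float).reshape(len(y), -1)
    groups = sorted(set(meta_ids))
    x_meta = np.array([[1.0 if m == g else 0.0 for g in groups] for m in meta_ids])
    x = np.hstack([x_pred, x_meta])
    # Scaling rows by sqrt(weight) turns weighted least squares into ordinary least squares.
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(x * sw[:, None], y * sw, rcond=None)
    return coef[: x_pred.shape[1]]
```

With a single truncation indicator as the predictor, its coefficient estimates the within-meta-analysis difference in logRR between truncated and non-truncated trials, i.e. a log ratio of relative risks.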
The multivariable regression described above will be performed on five different datasets defined by a variable we will call closeness. This variable will measure how similar the non-truncated trials in each meta-analysis are to the corresponding truncated trial(s) with regard to a) patient population, b) treatment arm, c) control arm and d) outcome. For each of these four domains, we will categorize closeness into one of three levels: very close (recorded as 'fits the narrow criteria' in the database), moderately close (recorded as 'fits the broad criteria') and less close (recorded as 'fits the broadest criteria'). Two reviewers will code this judgement, and we will assess their agreement using kappa. Each trial will then be categorized by its least close rating across the four domains, which we will use to define the five datasets: 1) trials that are "very close" in all domains; 2) trials with at least one "moderately close" domain but no "less close" domains; 3) trials that are "less close" in at least one domain; 4) trials that are "very close" or "moderately close" in all domains (datasets 1 and 2 combined); and 5) all trials.
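The assignment of non-truncated trials to the five closeness datasets, and an unweighted Cohen's kappa for agreement between the two coders, can be sketched as below. The rating strings and function names are illustrative assumptions; the protocol does not state whether a weighted kappa will be used.

```python
from collections import Counter

# Ordinal closeness levels (0 = closest).
ORDER = {"very close": 0, "moderately close": 1, "less close": 2}


def overall_closeness(domain_ratings):
    """Least close rating across population, intervention, control and outcome."""
    return max(domain_ratings, key=ORDER.__getitem__)


def datasets_for(domain_ratings):
    """Return the dataset numbers (1-5) that a non-truncated trial belongs to."""
    level = overall_closeness(domain_ratings)
    memberships = {5}                       # dataset 5: all trials
    if level == "very close":
        memberships |= {1, 4}
    elif level == "moderately close":
        memberships |= {2, 4}
    else:
        memberships.add(3)
    return memberships


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa (undefined if expected agreement equals 1)."""
    n = len(ratings_a)
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    ca, cb = Counter(ratings_a), Counter(ratings_b)
    p_exp = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)
```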
As discussed previously, we will conduct a further analysis in which the comparison is between the tRCT and the pooled estimate of all trials including the tRCT. If the tRCTs provide relatively small weights in the meta-analyses as a result of fewer events because of stopping early, the pooled estimate including the tRCTs may provide the least biased summary estimate.
Finally, we will compare possible methods for correcting tRCT estimates for possible bias, in particular Bayesian methods. The basic approach is to use a conservative prior for trials (derived empirically from past trials in other areas; we will review existing reviews of this kind [11-13]) and combine this information with the data from the tRCT to obtain a posterior estimate of effect. The relative weight of prior and data will depend on their respective variances: small trials will lead to an emphasis on the conservative prior, whereas large trials will attach relatively greater importance to the observed data. We will calculate such Bayesian relative risks for each tRCT in our study. As for the unadjusted tRCT estimates, we will present the results graphically for a visual comparison of the effect sizes (relative risks) of the truncated and non-truncated RCTs. Based on previous simulation work [14], we predict that the Bayesian estimates will be closer to the meta-analysis findings.
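The weighting of prior and data described above corresponds to a normal-normal conjugate update on the log relative risk scale. The sketch below is illustrative only: the prior mean (no effect) and prior variance are placeholders, not the empirically derived conservative prior the protocol intends to use.

```python
import math


def bayesian_shrunk_rr(log_rr, var, prior_mean=0.0, prior_var=0.25, z=1.96):
    """Posterior RR (and 95% credible interval) from a normal prior on log RR.

    prior_mean and prior_var are hypothetical placeholders.
    """
    w_prior, w_data = 1 / prior_var, 1 / var
    post_var = 1 / (w_prior + w_data)
    # Precision-weighted average: imprecise (small) trials are pulled toward the prior.
    post_mean = post_var * (w_prior * prior_mean + w_data * log_rr)
    se = math.sqrt(post_var)
    return math.exp(post_mean), (math.exp(post_mean - z * se), math.exp(post_mean + z * se))
```

For the same observed relative risk of 0.5, a small trial (variance 0.25 on the log scale) is shrunk to about 0.71 under this placeholder prior, whereas a large trial (variance 0.01) stays close to 0.5.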