Four determinants affect the magnitude of the signal generated in an RCT (as you will see later, these factors may also affect noise). They are the “baseline” or control group's risk of an outcome event, the potency of the experimental treatment, the responsiveness of experimental patients to that treatment, and the completeness with which outcome events are ascertained and included in the analysis. Understanding how these determinants operate begins and ends with the realization that the important number in an RCT is not the number of patients in it, but the number of outcome events among those patients.
All 4 determinants are present in every group of individuals being initially considered for, or later invited to join, a phase III RCT. Sometimes they are already optimum (in terms of maximizing the signal) within all potential study patients, and no restrictive eligibility criteria need to be applied on their account. More often, however, they are optimum only in certain subgroups of these patients, and the trialist needs to decide whether to selectively enrol just these optimum subgroups. As we shall see, manipulations of eligibility criteria to accomplish this selective enrolment can result in large, indeed definitive, increases in the signal produced by the trial. On the other hand, the opportunity costs of examining, lab testing and imaging all patients in order to find just the optimum subgroup of them may be prohibitive. Moreover, as noted in the previous section, eligibility criteria might shift an RCT away from its intended “pragmatic” orientation (“Does offering this treatment to all patients do more good than harm under usual circumstances?”) toward an “explanatory” one that is more difficult to apply (“Can rigorously applying the treatment to just some subgroup of patients do more good than harm under ideal circumstances?”). With those caveats in mind, we can now consider each of the determinants and how they convert into strategies for maximizing the signal.
Selectively enrol “high-risk” patients
Restricting eligibility to patients who are at higher than average “baseline” risk of outcome events leads to higher “Control Event Rates” (CER) among those receiving placebo or standard therapy. Because the absolute risk reduction signal is equivalent to the product of this control event rate and the relative risk reduction from therapy (ARR = CER × RRR),[6] it follows that, if the relative risk reduction achieved by the experimental treatment is both true and constant over different control event rates, the experimental treatment will generate a larger absolute risk reduction signal when the control event rate is high than when it is low. This is illustrated in . If the relative risk reduction is 1/4 for all patients in the RCT (regardless of their control event rates), notice the different impacts on the absolute risk reduction signal and the corresponding confidence in the trial result when we enrol all patients and when we restrict enrolment to just the subgroups at high and low baseline risk. Recruiting and randomly assigning just the subgroup of 120 high-risk patients in panel B generated both a higher absolute risk reduction (up from 0.125 to 0.20) and a 20% narrower confidence interval around it (from ±100% to ±80%) than randomly assigning all 240 patients in panel A. An examination of the low-risk patients in panel C shows how they inflate the confidence interval around the absolute risk reduction signal. In fact, every low-risk patient admitted to this trial makes the need for additional patients go up, not down!
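The arithmetic behind these panels can be sketched in a few lines. This is only an illustration under assumptions of mine, not a reproduction of the figure: I assume equal allocation to the 2 arms, a simple normal-approximation (Wald) 95% confidence interval, and control event rates of 0.5, 0.8 and 0.2 for panels A, B and C. The first 2 rates follow from the quoted absolute risk reductions and a relative risk reduction of 1/4; the third assumes that the 240 patients split evenly into high- and low-risk subgroups. The function name `arr_with_ci` is hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def arr_with_ci(cer, rrr, n_per_arm):
    """Return (ARR, half-width of its 95% Wald confidence interval)."""
    eer = cer * (1 - rrr)  # experimental event rate
    arr = cer - eer        # algebraically the same as cer * rrr
    se = math.sqrt(cer * (1 - cer) / n_per_arm + eer * (1 - eer) / n_per_arm)
    return arr, Z_95 * se


RRR = 0.25  # relative risk reduction assumed constant across risk groups

# (control event rate, patients per arm); the rates are assumptions, see above
panels = {
    "A: all 240 patients": (0.5, 120),
    "B: 120 high-risk patients": (0.8, 60),
    "C: 120 low-risk patients": (0.2, 60),
}

for name, (cer, n_per_arm) in panels.items():
    arr, half_width = arr_with_ci(cer, RRR, n_per_arm)
    # Express the interval as a percentage of the ARR, as in the text
    print(f"{name}: ARR = {arr:.3f}, 95% CI about ±{half_width / arr:.0%} of the ARR")
```

Under these assumptions the sketch should land close to the ±100% and ±80% quoted for panels A and B, and it shows why the interval around the small low-risk absolute risk reduction is so much wider in relative terms.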
Remember that this strategy works only when the relative risk reduction is either constant or increasing as control event rates increase. Although there isn't much documentation about this, and there are some exceptions, I've concluded that relative risk reduction is pretty constant over different control event rates when the treatment is designed to slow the progression of disease and prevent its complications. This has been observed, for example, in meta-analyses of ASA and the secondary prevention of cardiovascular disease,[7] and of both ACE inhibitors[8] and β-blockers[9] in heart failure. Moreover, in an examination of 115 meta-analyses covering a wide range of medical treatments, the control event rate was twice as likely to be related to the absolute risk reduction as to a surrogate for the relative risk reduction (the odds ratio), and in only 13% of the analyses did the relative risk reduction significantly vary over different control event rates.[10] When the treatment is designed to reverse the underlying disease, I've concluded that relative risk reduction should increase as control event rates increase, exemplified by carotid endarterectomy for symptomatic carotid artery stenosis, where the greatest relative risk reductions are seen in patients with the most severe stenosis (and greatest stroke risks).[11]
When outcomes are “continuous” you can look for evidence on whether the experimental treatment will cause the same relative change in a continuous outcome (say, treadmill time) for patients with severe starting values (awful exercise tolerance, analogous to high-risk patients for discrete events) and good starting values (good but not wonderful exercise tolerance, analogous to low-risk patients for discrete events). If this evidence suggests a consistent relative effect over the range of the continuous measure, I hope it's clear why the absolute difference signal generated by experimental treatment is greater (and its confidence interval narrower) among the patients with initially severe disease than among those with less severe disease (if this isn't clear, consider how much “room for improvement” there is in a patient who already is doing pretty well v. one who is doing poorly).
Harsh as it may sound, you need people in your RCT who are the most likely to have the events you hope to prevent with your experimental treatment (e.g., myocardial infarctions, relapses of a dreadful disease, or death). And, as long as the relative risk reduction from treatment is constant or rises with increasing control event rates, these high-risk patients also have the most to gain from being in the trial. Finally, to be practical, this “high-risk” strategy requires not only solid prior evidence that high- and low-risk patients exist, but also that their identification is easy and cheap enough to make their inclusion and exclusion cost-effective in conducting the trial.
The foregoing should cause second thoughts among trialists who are considering arbitrary upper age limits for their trials; they may be excluding precisely the high-risk patients who will benefit the most, raise the absolute risk reduction and make the largest contribution to the confidence in a positive result. On the other hand, if high-risk patients (or those with severe disease) are too far gone to be able to respond to the experimental therapy, or if competing events (e.g., all-cause mortality) swamp those of primary interest in the trial, the absolute risk reduction's confidence interval will expand and its signal might decrease. This discussion introduces a second element, responsiveness.
Selectively enrol highly responsive patients
The second way that you can increase the absolute risk reduction signal and the confidence in a positive trial result is by selectively enrolling highly responsive patients who are more likely (than average) to respond to the experimental therapy. Their greater-than-average relative risk reductions translate to increased absolute risk reductions and higher confidence in positive trial results. This increased responsiveness can arise from 2 different sources. The first and most easily determined cause is patients' compliance with an efficacious experimental therapy. Those who take their medicine might respond to it, but those who don't take their medicine can't respond to it. No wonder, then, that so much attention is paid to promoting and maintaining high compliance during RCTs, and why some RCTs put patients through a pre-randomization “faintness-of-heart” task, rejecting those who are unwilling or unable to comply with it. This is because, once patients are randomly assigned, all of them must be included in subsequent analyses, even if they don't comply with their assigned treatment. The second cause for increased responsiveness is the result of real biologic differences in the way that subgroups of patients respond to experimental treatment. This biologic difference may be much more difficult (and expensive) to determine among otherwise eligible patients. illustrates how either cause works among another 240 patients, this time with subgroups at the same baseline risk but with differing degrees of compliance (or other aspect of responsiveness).
Panel A is identical to panel A of . If, as in panel B, just the highly compliant subgroup is recruited, the resulting confidence interval around the absolute risk reduction is narrower than that observed among all 240 patients. However, every patient with low compliance (panel C) admitted to this trial makes the need for additional patients go up, not down! Note that this high-response strategy works best when control event rates are either constant or increasing in subgroups with progressively higher relative risk reductions. Once again, although there isn't much documentation of control event rates in subgroups with different responsiveness, patients in our carotid endarterectomy trials with higher control event rates also enjoyed greater relative risk reductions with surgery.[11] As in the case of high-risk patients, the identification of highly responsive patients has to be both accurate and inexpensive if it is to decrease the total effort necessary for achieving a definitive trial result.
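To see how compliance alone can shrink or swell the signal under intention-to-treat analysis, here is a minimal sketch. All of its numbers are hypothetical and are not taken from the figure: a control event rate of 0.5, a relative risk reduction of 0.5 among patients who actually take the treatment, and compliance of 90% and 10% in the 2 subgroups. It further assumes that patients who don't take the treatment get no benefit from it, that compliers and non-compliers share the same baseline risk, and that a Wald 95% confidence interval is adequate. The function name `itt_signal` is my own.

```python
import math


def itt_signal(cer, rrr_if_taken, compliance, n_per_arm):
    """Return (ARR, 95% CI half-width as a fraction of the ARR) under intention-to-treat."""
    # Assumption: non-compliers gain nothing, so the relative risk reduction
    # seen across the whole randomized group is diluted in proportion to compliance.
    effective_rrr = rrr_if_taken * compliance
    eer = cer * (1 - effective_rrr)
    arr = cer - eer
    se = math.sqrt((cer * (1 - cer) + eer * (1 - eer)) / n_per_arm)
    return arr, 1.96 * se / arr


CER = 0.5           # hypothetical baseline risk, the same in every subgroup
RRR_IF_TAKEN = 0.5  # hypothetical effect among patients who take the treatment

scenarios = {
    "all 240 patients (50% compliant overall)": (0.5, 120),
    "120 highly compliant patients (90%)": (0.9, 60),
    "120 poorly compliant patients (10%)": (0.1, 60),
}

for name, (compliance, n_per_arm) in scenarios.items():
    arr, relative_half_width = itt_signal(CER, RRR_IF_TAKEN, compliance, n_per_arm)
    print(f"{name}: ARR = {arr:.3f}, 95% CI about ±{relative_half_width:.0%} of the ARR")
```

With these made-up inputs the highly compliant subgroup, although only half the size of the full cohort, should show both the largest absolute risk reduction and the narrowest relative confidence interval, which is the pattern described for panels B and C.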
The foregoing elements of risk and responsiveness can usefully be combined as shown in , where I have summarized the “attractiveness” (in terms of maximizing the absolute risk reduction signal and the confidence in a positive trial result) of different sorts of patients whom you might consider enrolling into your RCT. This will come home to haunt you if, toward the end of your recruitment phase, you are short of “ideal” patients and decide to relax your inclusion criteria and start admitting lower risk or less compliant individuals. As predicted in , admitting such patients may increase, rather than decrease, the remaining sample size requirement (and administrative burdens) that must be satisfied to achieve a sufficiently large absolute risk reduction and a sufficiently narrow confidence interval around it.
Use a potent experimental treatment and give it a chance to exert its effect
The third way that you can tend to raise an absolute risk reduction signal and the confidence in a positive trial result is to employ a potent experimental treatment and give it a chance to exert its effect. You shouldn't expect patients to experience better outcomes when their treatment regimens aren't administered in a sufficient dose for a sufficient duration. Thus, an RCT to see whether drastic reductions in blood pressure reduce the risk of stroke must employ a drug that, in phase II trials, really does reduce blood pressure to the desired level. This “be-sure-your-experimental-treatment-is-potent” strategy is dramatically demonstrated in surgical trials, where the principal investigators may restrict their clinical collaborators to just those surgeons with excellent skills and low perioperative complication rates. In similar fashion, you should be sure that the experimental treatment is applied long enough to be able to achieve its favourable effects, if they are to occur.
If you digested the foregoing, you'll quickly grasp the incremental price of therapeutic progress that trialists must pay as they search for marginal improvements over treatments they already have shown, in previous RCTs, to do more good than harm. When today's standard treatment is already known (through prior RCTs) to do more good than harm, clinicians and ethics committees should and will insist that “standard therapy” (rather than a placebo) be provided to control patients in any subsequent RCT of the next generation of potentially more effective treatments. As a result, the control event rates are progressively reduced in subsequent trials (they behave like the low-risk patients described in panel C of ), and even if relative risk reductions are maintained at their former levels, the resulting absolute risk reductions will fall and their confidence intervals will widen. No surprise, then, that RCTs in acute myocardial infarction have become huge and hugely expensive, not (only) because cardiologists are an entrepreneurial lot, but because they already are reducing control event rates with the thrombolytics, β-blockers, ASA and ACE inhibitors they validated in previous positive trials.
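A rough sample-size calculation makes this price concrete. The sketch below uses the common normal-approximation formula for comparing 2 proportions, with a two-sided α of 0.05 and 80% power; these settings, the illustrative control event rates and the function name `patients_per_arm` are my assumptions, not figures from any of the trials mentioned.

```python
import math

Z_ALPHA = 1.96  # two-sided alpha of 0.05
Z_BETA = 0.84   # power of 0.80


def patients_per_arm(cer, rrr):
    """Approximate patients needed per arm to detect a given RRR at a given CER."""
    eer = cer * (1 - rrr)
    variance = cer * (1 - cer) + eer * (1 - eer)
    return math.ceil((Z_ALPHA + Z_BETA) ** 2 * variance / (cer - eer) ** 2)


RRR = 0.25  # held constant while standard therapy drives the control event rate down

for cer in (0.40, 0.20, 0.10, 0.05):
    arr = cer * RRR
    print(f"CER = {cer:.2f}: ARR = {arr:.4f}, about {patients_per_arm(cer, RRR)} patients per arm")
```

Each halving of the control event rate halves the absolute risk reduction and, under this approximation, more than doubles the number of patients required.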
As forecast in the introduction, the foregoing strategies for increasing the absolute risk reduction and narrowing its confidence interval (restricting trial participants to just the high-risk, high-response group, maximizing compliance, employing just the best surgeons, and so forth) move the resultant trial away from a “pragmatic” study question (“Does offering the treatment do more good than harm under usual circumstances?”) toward an “explanatory” study question (“Can rigorously applying the treatment do more good than harm under ideal circumstances?”).[12] If the original question was highly pragmatic and intended to compare treatment policies rather than rigorous regimens, the strategies described above may be unwise and it becomes more appropriate to conduct a really large, simple trial. Similarly, these restrictive strategies may raise concerns (and not a few hackles) about the generalizability of the trial result. As I've argued elsewhere,[13] front-line clinicians do not want to “generalize” an RCT's results to all patients, but only to “particularize” its results to their individual patient, and already routinely adapt the trial result (expressed, say, as a “number-needed-to-treat” or NNT, which is the inverse of the absolute risk reduction) to fit the unique risk and responsiveness of their individual patient, the skill of their local surgeon, the patient's preferences and expectations, and the like.[14] Moreover, cautionary pronouncements about generalizability have credibility only if the failure to achieve it leads to qualitative differences in the kind of responses patients display such that, for example, experimental therapy is, on average, unambiguously helpful for patients inside the trial but equally unambiguously harmful or powerfully useless, on average, to similar patients outside it. I'll address this straw man in a later essay in this series.
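The “particularizing” of an NNT mentioned above can be sketched as follows. The NNT is simply the inverse of the absolute risk reduction; the adjustment shown is one simple approach, which assumes that the relative risk reduction is constant and that the clinician can estimate a factor `f` expressing the individual patient's baseline risk relative to that of the average control patient in the trial. Both function names and the example value of `f` are hypothetical.

```python
def nnt(arr):
    """Number needed to treat: the inverse of the absolute risk reduction."""
    return 1 / arr


def particularized_nnt(trial_nnt, f):
    """Adjust a trial NNT for a patient whose baseline risk is f times the trial average.

    Assumes a constant relative risk reduction, so that the patient's
    absolute risk reduction scales in proportion to f.
    """
    return trial_nnt / f


trial_nnt = nnt(0.125)  # an ARR of 0.125, as in the earlier example
print(f"Trial NNT: {trial_nnt:.0f}")

# Hypothetical patient judged to be at twice the average trial risk
print(f"NNT for a patient with f = 2: {particularized_nnt(trial_nnt, 2):.0f}")
```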
Identify and record (ascertain) every event suffered by every patient in the trial
This is the fourth way that you can maximize an absolute risk reduction signal and the confidence in a positive trial result. Up to this point, I have assumed that all events have been ascertained in both control and experimental patients and that the resulting absolute risk reduction signal, regardless of whether it is large or small, is true. In other words, although the absolute risk reductions displayed in and are affected by the risk-responsiveness composition of the study patients, they nonetheless provide unbiased estimates of the effects of treatment. What happens in the real world of RCTs, where the ascertainment of events is virtually always incomplete? As you will see, this leads to systematic distortion of the absolute risk reduction signal away from the truth; that is, this estimate of the signal becomes biased. Accordingly, the fourth way that you can increase the absolute risk reduction signal and the confidence in a positive trial result is by improving the ascertainment of events during the RCT. This is shown in .
Suppose that the RCT's follow-up procedures were loose, and many patients were lost. Or, suppose that the outcome criteria were so vague and subjective that lots of events were missed. If experimental and control patients are equally affected by this incomplete ascertainment, the situation depicted in would occur, with a loss in the strength of the absolute risk reduction signal even though the relative risk reduction is preserved. But what if the accuracy of ascertainment differs between control and experimental patients, such as might occur in nonblinded trials, when experimental patients are more closely followed (e.g., for dose-management and the detection of toxicity) than control patients? What if that greater scrutiny of experimental patients leads to missing only 5% of events in the experimental group while continuing to miss 25% of events in the control group? This situation is shown in . Missing more events among control patients than among experimental patients not only decreases the absolute risk reduction signal but also widens its confidence interval. In this case, the bias leads to a “conservative” type II error (concluding that the treatment may be useless when, in truth, it is efficacious) and presents a powerful additional argument for blind RCTs (since they maintain equal scrutiny of experimental and control patients and equal ascertainment of their outcome events).
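The 2 ascertainment scenarios can be worked through numerically. The sketch below assumes true event rates of 0.5 in control patients and 0.375 in experimental patients (an absolute risk reduction of 0.125 and a relative risk reduction of 1/4, borrowed from the earlier example rather than read off the figure), and models incomplete ascertainment as a fixed fraction of events going unrecorded in each group. The function name `observed_signal` is hypothetical.

```python
def observed_signal(true_cer, true_eer, missed_control, missed_experimental):
    """Return the (ARR, RRR) a trial would observe when some events go unrecorded."""
    obs_cer = true_cer * (1 - missed_control)
    obs_eer = true_eer * (1 - missed_experimental)
    arr = obs_cer - obs_eer
    return arr, arr / obs_cer


TRUE_CER = 0.5    # assumed true control event rate
TRUE_EER = 0.375  # assumed true experimental event rate

scenarios = {
    "complete ascertainment": (0.0, 0.0),
    "25% of events missed in both groups": (0.25, 0.25),
    "25% missed in controls, 5% in experimental patients": (0.25, 0.05),
}

for name, (missed_control, missed_experimental) in scenarios.items():
    arr, rrr = observed_signal(TRUE_CER, TRUE_EER, missed_control, missed_experimental)
    print(f"{name}: observed ARR = {arr:.4f}, observed RRR = {rrr:.2f}")
```

Equal under-ascertainment shrinks the absolute risk reduction while leaving the relative risk reduction intact, whereas closer scrutiny of experimental patients erodes both.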
Having defined the determinants of the signal generated in an RCT and demonstrated how they can be manipulated to maximize that signal, we can now consider how noise affects our confidence in the trial result and how that noise can be reduced.