A challenge for the clinician treating patients with multiple sclerosis (MS) is to determine the most effective treatment while weighing the benefits and risks. Results of the phase 2 and phase 3 studies on natalizumab were received with great interest, in part due to the “improved” risk reduction for relapse rate, disease progression, and MRI metrics observed in comparison to results in trials of beta-interferon and glatiramer acetate. However, comparison across trials is invalid, in large part due to differences in the study populations. The increased efficacy observed in more recent trials has also been attributed to a fundamental change in subjects with MS enrolled in recent trials compared with the prior decade. In this article, we debate the relative efficacy of natalizumab vs the older injectable therapies.
In 2004, natalizumab was welcomed to the market as a US Food and Drug Administration (FDA)–approved treatment for multiple sclerosis (MS). It held great promise as a monthly IV agent with possible improved efficacy over its injectable counterparts. However, its temporary withdrawal from the market in 2005 after its association with an often fatal CNS infection, progressive multifocal leukoencephalopathy (PML), incited both increased attention and scrutiny regarding its use. Even as enthusiasm has dampened due to risks, a widely held perception is that the efficacy of natalizumab surpasses other FDA-approved disease-modulating therapies (DMTs) for relapsing-remitting MS (RRMS). We argue that the basis for this belief, which often cites cross-trial comparisons of clinical endpoints, is inherently invalid.
To date, randomized head-to-head trials comparing natalizumab to other DMTs for MS are lacking. However, often in medicine we must make treatment decisions in the absence of comparative trials. Phase III studies of natalizumab include the AFFIRM trial, comparing natalizumab to placebo, and the SENTINEL trial, evaluating natalizumab as an add-on treatment to interferon β (IFNβ)-1a IM.1,2 The perceived superiority of natalizumab over other DMTs originates in part from a robust 68% reduction in annualized relapse rate relative to placebo in the AFFIRM trial. In contrast, the annualized relapse rate reduction relative to placebo in the pivotal trials for IFNβ-1a IM, IFNβ-1a SC, IFNβ-1b SC, and glatiramer acetate (GA) ranged from 29% to 34%, based on 2-year outcome data (table 1).3–6 The perception of superiority may also derive from the SENTINEL trial, where the addition of natalizumab to IFNβ-1a IM reduced relapse rate by 54% relative to IFNβ-1a IM alone. Subjects enrolled in SENTINEL had to meet the inclusion criterion of breakthrough disease while on IFNβ-1a IM monotherapy. A natalizumab monotherapy arm was not included. Although the 2 therapies may have additive effects on clinical and radiographic progression of disease in the population studied in SENTINEL, a direct comparison of the 2 drugs cannot be made.
Likewise, comparison of results from natalizumab trials to relatively recent trials using standard DMTs that were performed in subjects with clinically isolated syndrome (CIS) with high risk for developing MS is improper. CIS trials tested the hypothesis that early treatment would delay the development of MS. This is a very different scenario than in the pivotal trials of these same agents performed in subjects with active RRMS, which tested the effect on relapse rate. We argue that an accurate comparison of the efficacy in RRMS of natalizumab to that of the older DMTs cannot be made until subjects are randomized from the same patient population and compared directly.
The risk of PML mandates that the benefit of natalizumab be examined with a high degree of scrutiny, elevating the stakes of this debate. This discussion is particularly critical when considering the use of natalizumab as a first-line therapy in RRMS. Estimates have placed the risk of PML in natalizumab-treated patients at 1 in 1,000 at 18 months.7 The risk of PML is present whether natalizumab is used in combination or as a monotherapy. Long-term natalizumab-associated PML risk is unknown at this time, making risk-benefit analyses more difficult to extrapolate.8
Moreover, when comparing trials with relatively less frequent primary outcomes, such as clinical relapses, the absolute risk reduction (ARR) may more accurately reflect therapeutic gain than relative risk reduction (RRR).9 This is because the RRR becomes magnified as the event rates in the comparator group get smaller. A difference in efficacy between natalizumab and the 2 high-dose beta IFNs is less apparent when ARR is compared between trials than when RRR is used. In AFFIRM, treatment with natalizumab reduced annualized relapse rate by 0.5 (0.73 in the placebo group vs 0.23 in the natalizumab group), whereas IFNβ-1b SC reduced annualized relapse rate by 0.43 (1.27 vs 0.84) and IFNβ-1a SC reduced annualized relapse rate by 0.42 (1.28 vs 0.86) (table 2). Thus, the number needed to treat (NNT), taken as the reciprocal of the ARR, is 2.0 for natalizumab compared to approximately 2.3 to 2.4 for the 2 high-dose beta IFN trials. Whereas natalizumab appears twice as effective when comparisons are based on RRR, these treatments appear to be similarly effective when comparisons are based on ARR. It has been argued that if dissimilar results arise when comparing 2 trials by the 2 different methods of analysis, then conclusions regarding relative efficacy from cross-trial comparisons should not be made.10
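The arithmetic in the preceding paragraph can be reproduced with a short sketch. It uses only the annualized relapse rates quoted above; the dictionary keys and the function name `effect_measures` are illustrative labels, and NNT is assumed to be the simple reciprocal of the ARR, as in the text.

```python
# Annualized relapse rates (placebo, treated) as quoted in the paragraph above.
trials = {
    "natalizumab (AFFIRM)": (0.73, 0.23),
    "IFNb-1b SC": (1.27, 0.84),
    "IFNb-1a SC": (1.28, 0.86),
}


def effect_measures(placebo_rate, treated_rate):
    """Return (ARR, RRR, NNT) for one placebo-controlled trial."""
    arr = placebo_rate - treated_rate   # absolute reduction in relapse rate
    rrr = arr / placebo_rate            # reduction relative to placebo
    nnt = 1.0 / arr                     # assumption: NNT = 1 / ARR
    return arr, rrr, nnt


results = {name: effect_measures(*rates) for name, rates in trials.items()}

for name, (arr, rrr, nnt) in results.items():
    print(f"{name}: ARR={arr:.2f}  RRR={rrr:.0%}  NNT={nnt:.1f}")
```

On the relative scale natalizumab is roughly double the interferons (about 68% vs 33% to 34%), whereas on the absolute scale the three values fall within a narrow band (0.50 vs 0.42 to 0.43).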
Even with ARRs taken into account, cross-trial comparisons can be flawed by differences in disease severity of subjects in the trials. A change in MS clinical trial recruitment has occurred over the last 15 years. For several reasons, patients with more benign disease are now being recruited into clinical trials. One possible reason is that the modern high-resolution MRI allows earlier confirmation of the diagnosis, including in subjects with nonspecific findings or milder symptoms. Prior to routine use of MRI, emphasis was placed on objective confirmation of 2 relapses, which often involved the corticospinal, cerebellar, or brainstem systems and are sometimes associated with a poorer prognosis.11 In addition, prior to proven therapies, confirmation of the diagnosis of MS was not considered urgent. Another possible reason for the shift toward more benign subjects in present-day clinical trials is that patients with more aggressive disease may be less likely to be recruited into a trial where there is a chance they may receive a placebo or experimental treatment with an unknown efficacy. Prior to the advent of standard DMTs, aggressive disease would not have been a factor discouraging recruitment.
This shift toward enrollment of subjects with more benign MS is illustrated by comparisons of the placebo groups from different trials of RRMS. For example, the AFFIRM placebo group baseline mean Expanded Disability Status Scale (EDSS) score of 2.3 and mean pretrial annualized relapse rate of 1.5 were similar to most pivotal trials, yet this placebo group behaved in a more benign manner than the placebo groups from the earlier trials. The annualized relapse rate of 0.73 for the AFFIRM placebo group was lower than the placebo groups of all 4 other pivotal trials, which ranged from 0.84 to 1.28. More evidence for this trend stems from the recently published REGARD trial, in which subjects were randomized to either GA or IFN-β1a SC.12 The annual relapse rates for each treatment group were only ~50% of the annual relapse rates for subjects randomized to the same agents in the pivotal trials done more than 10 years prior. This suggests that patients in the recent REGARD study had more benign disease. In fact, in these 2 more contemporaneous studies, the 2-year annualized relapse rates for glatiramer acetate (0.29) and IFN-β1a SC (0.30) in REGARD were similar in magnitude to that of natalizumab-treated subjects in AFFIRM (0.23). However, it is important to recognize that these 2 trials enrolled somewhat dissimilar subjects and that a placebo group was not included in the REGARD trial. Moreover, whether patients with MS with more benign disease experience a more robust response to treatment remains a point of debate.
In addition, subjects recruited into present-day clinical trials are enrolled using a different set of diagnostic criteria in the setting of ever-improving diagnostic imaging techniques. The AFFIRM trial enrolled subjects diagnosed using the McDonald criteria, as opposed to the Poser criteria that had been used in prior pivotal trials.13,14 Due to this, a small proportion of the AFFIRM subjects could not have been enrolled in the earlier pivotal trials that required diagnosis by more stringent criteria.
Changes in the MS patient population itself have the potential to yield distinctly different responses in clinical trials. Evidence indicates that the demographics of MS have shifted over the past few generations. For example, the latitudinal gradient of MS appears to be decreasing over time, due to increasing incidence in people living at lower latitudes.15–17 This change may reflect increasing genetic diversity within the MS population. Of note, subjects from the pivotal trials of IFN-β1a IM, IFN-β1b SC, and GA were predominantly recruited from the United States and Canada, with only the PRISMS trial recruiting mainly from Europe. The AFFIRM and SENTINEL trials included subjects from a larger number of clinical centers representing greater geographic diversity, enrolling subjects from Europe, North America, Australia, and New Zealand. We conclude that the general MS population is changing over time, making cross-trial comparisons increasingly fraught with imprecision, especially when comparing trials that were completed more than a decade apart and in different locations.
In MS clinical trials, both clinical and radiographic endpoints are used to assess efficacy. Relapse-related outcome measures are frequent primary endpoints of phase III MS clinical trials. It remains to be determined whether these clinical outcomes are more valid surrogate markers of long-term disability than MRI measures. Radiographic measures may be more sensitive to the underlying pathologic changes and disease activity. In fact, when compared to the 2 high-dose β-IFN pivotal trials, similar relative reductions in new T2 lesions and gadolinium-enhancing lesions were observed in natalizumab-treated AFFIRM subjects. Thus, the relative differences in MRI outcome measures between the clinical trials were much less robust than relative differences in clinical outcomes. It should be noted that MRI techniques have improved over the past decade, and the effect of these improvements on imaging outcomes is unknown.
The appropriate clinical trials to answer the questions raised in this article would be direct comparisons of natalizumab with other FDA-approved treatments for RRMS. These have not yet been done, and may never be initiated. However, several recent and ongoing randomized controlled studies have included comparisons of novel immunomodulators with current FDA-approved injectable treatments. While cross-trial comparisons may be tempting to use as a means of assessing treatments at this time, we must be keenly aware of the shortcomings of such comparisons, as discussed herein. The recently published Report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology states that the “relative efficacy of natalizumab compared to current disease-modifying therapies cannot be accurately defined,” assigning “Level U” evidence to this question.18 As a result, we are left pondering whether the new 66% is just the old 33%.
Any introductory biostatistics course teaches that in the comparison of 2 groups, one should consider the groups to be the same unless there is sufficient evidence to reject the null hypothesis. An appropriate comparative class I trial to reject the null hypothesis might randomize 1,200 blinded individuals to receive either natalizumab infusions and sham injections or sham infusions and active injections with 1 of the 4 MS DMTs introduced during the 1990s. At this time, such a trial seems unlikely.
Scientific evidence comes in a variety of qualities. As practicing clinicians, our task is to make a judgment after careful consideration of study design and external validity. Despite hundreds of millions of dollars spent each year in MS clinical research, many unanswered questions remain. Since our patients need to make treatment decisions now, we must use our education and experience to best interpret the available qualities of evidence in order to increase the chance for net benefit. If we analyze incorrectly, the patient may be exposed to unnecessary toxicity, or be deprived of a more effective therapy. Waiting for ideal evidence may seem a safe choice, although not all of our patients have the luxury of time. Once we accept that interpretation of the literature will be complex in the absence of a class I head-to-head study, the next question is whether the sum of several lines of reasoning is greater than its flawed parts.
The recruitment of a slightly more benign MS population into studies now compared to 10 years ago is plausible. The on-study relapse rates seen in recent trials support this effect. This modern population may have a tendency to do better due to the less stringent baseline relapse rate entry criteria, along with earlier disease detection. The argument has been made that this relatively mild phenotype is contributing to the apparent increased efficacy. However, the modern placebo arm would also do better, diminishing any such net effect. Since early pivotal clinical trials sought to enrich their study population with subjects preselected for active endpoints, inclusion of a more benign population in later studies would reduce the power of the study and might reduce the apparent efficacy of the medication. Expectation of a better response in subjects with mild disease is not necessarily intuitive from a trial design perspective, unless there is a scientific explanation for greater responsiveness to treatment in early and mild MS.
Although it has been hypothesized that the immune system is more effectively controlled in early or less active MS, not only is this explanation unproven, it does not necessarily apply to the studies being compared (table 1). Comparison of disease duration, EDSS, age, and progression of disability in the placebo arms of the earlier vs later pivotal trials does not support a consistent large difference among the study populations in the various trials (table 1). In fact, 96% of subjects in AFFIRM met the same Poser criteria that were used in the early pivotal studies. Arguing against the concept of an enhanced effect in those with milder disease, the results from AFFIRM have been analyzed based on several subgroupings related to severity. Natalizumab had a more robust effect in the more severe than the less severe subgroups as stratified by relapse rate, T2 lesion burden, and presence of gadolinium-enhancing lesions.19 Moreover, natalizumab also demonstrated efficacy in the more severe subgroups, as stratified by EDSS and age.
Comparison across trials is invalid due to known and unknown differences between the responses of the active and placebo groups. Thus, comparing the efficacy of pivotal trials of the 4 injectable DMTs would be flawed due to differences in the populations recruited, the locations of the trials, and the time period. With that caveat, the magnitude of relapse reduction across the injectable pivotal trials appears to be consistent, and this consistency continues in available comparison trials (table 1). Specifically, the pivotal trial intention-to-treat relapse rate reduction over placebo is approximately 30% for IFN-β1b, GA, and IFN-β1a SC, and is approximately 20% for IFN-β1a IM.3–6 In comparator trials, the differences between IFN-β1b and GA, and between IFN-β1a SC and GA, appear minimal.12,20 Based on the 1-year EVIDENCE trial, IFN-β1a SC may have a slight edge over IFN-β1a IM in RRMS patients with overtly active MS, providing an absolute relapse reduction of about 10%.21 The 68% relapse reduction for natalizumab over placebo, at the very least, is a number that stands out. If the relapse reduction in AFFIRM had been 40% or even 50%, one might consider it to be within the range of results seen in the earlier DMT trials. Although it may be an overstatement to say that natalizumab is twice as effective as earlier DMTs, the twofold increase in efficacy in AFFIRM merits further debate.
ARR is a helpful statistical parameter that addresses the magnitude of a treatment effect and the NNT within an individual trial. However, the dependency of ARR upon the rate of an outcome makes it particularly difficult to interpret across trials. The varying placebo relapse rates in MS clinical trials make this dependency particularly pertinent (table 2). As an example, consider the comparison between 2 theoretical agents, one that reduces annualized relapse rate from 1.5 to 0.6 (ARR 0.9) vs a newly developed cure that reduces relapses from 0.5 to 0 (ARR 0.5). In this scenario, the theoretical cure would have a lower ARR than the partial treatment.
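The theoretical example above can be checked numerically. The sketch below uses the 2 hypothetical agents from the paragraph; the final step, which applies an assumed fixed 50% relative effect to two placebo rates reported for the pivotal trials (0.73 and 1.28), is an added illustration and not a result from any trial.

```python
def arr_and_rrr(control_rate, treated_rate):
    """Return (absolute, relative) reduction in annualized relapse rate."""
    arr = control_rate - treated_rate
    return arr, arr / control_rate


# The 2 theoretical agents described in the text.
partial_arr, partial_rrr = arr_and_rrr(1.5, 0.6)   # partial treatment
cure_arr, cure_rrr = arr_and_rrr(0.5, 0.0)         # theoretical cure

print(f"partial treatment: ARR={partial_arr:.2f}, RRR={partial_rrr:.0%}")
print(f"theoretical cure:  ARR={cure_arr:.2f}, RRR={cure_rrr:.0%}")

# Illustrative assumption: an agent with an identical 50% relative effect
# yields a different ARR depending only on the comparator relapse rate.
fixed_rrr = 0.5
arr_by_placebo = {rate: fixed_rrr * rate for rate in (0.73, 1.28)}
for rate, arr in arr_by_placebo.items():
    print(f"placebo rate {rate:.2f}: ARR={arr:.2f}")
```

The cure eliminates all relapses yet shows the smaller ARR, because ARR can never exceed the comparator group's event rate.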
The argument has been put forth that subjects enrolled in recent trials represent a different population of patients with MS, and that this earlier and milder disease phenotype is contributing to higher relative efficacy. However, recent trials of the injectable DMTs in CIS suggest treatment effects of similar magnitude to the pivotal trials. Although patients enrolled in these trials had CIS, follow-up revealed that conversion to McDonald-criteria MS occurred within 5 years in 85% and at least 89% by 10 years, revealing them to be essentially early RRMS.22,23 In CHAMPS, where annualized relapse rates were reported for the first 2 years, relative reduction for IFN-β1a IM was 35% (0.23 in delayed treatment, 0.15 in immediate treatment), again not exceeding 50%.23 At the earliest possible opportunity for treatment, injectable DMTs reduced risk for a second relapse over 2 years by 24% to 43% compared to placebo (table 3).24–27 In contrast, natalizumab prolonged the time to first study relapse by 16 months vs placebo, compared to 6–8 months for 4 CIS trials (table 3, standardized at the percentile for which half of all relapses had occurred for the placebo group). AFFIRM also demonstrated a 1.6-fold increased chance of being relapse-free for subjects receiving natalizumab vs placebo, compared with a 1.2- to 1.3-fold increased chance for injectable DMTs in CIS.
It has been suggested that the relapse reduction in the BEYOND and REGARD trials is similar to the relapse reduction in AFFIRM vs placebo. However, in the BEYOND and REGARD trials, the reduction is compared to baseline, not vs placebo. Comparison of on-study relapse rate to baseline historical relapses is prone to error due to several factors including recall bias, lack of a standardized relapse definition prior to baseline, and regression to the mean in subjects enrolled due to recent relapses. This is illustrated by the relapse reduction on drug vs baseline in the pivotal trials where glatiramer acetate had a 59% reduction over baseline, and IFN-β1a SC had a 42% reduction from baseline (table 1). These results might suggest the first drug is better than the latter. Yet comparison of the relapse rate reduction against placebo in the pivotal trials of each drug yielded near identical reductions, subsequently confirmed by direct comparison of the 2 in REGARD. Not having a placebo arm for the head-to-head BEYOND or REGARD trials makes it very difficult to come to any conclusions about comparative efficacy for the newer agents.
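Regression to the mean, one of the sources of error named above, can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any actual trial: every simulated patient has the same constant true relapse rate (`true_rate`, an arbitrary 0.8 per year), relapse counts are Poisson, no treatment is given, and enrollment requires at least 1 relapse in the prior year (a hypothetical entry criterion).

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) count using Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(n_patients=20000, true_rate=0.8, seed=0):
    """Return (baseline_rate, on_study_rate) among enrolled, untreated patients."""
    rng = random.Random(seed)
    enrolled = baseline_total = on_study_total = 0
    for _ in range(n_patients):
        prior_year = poisson(rng, true_rate)
        if prior_year < 1:          # hypothetical criterion: >= 1 recent relapse
            continue
        enrolled += 1
        baseline_total += prior_year
        # The on-study year is drawn from the same true rate: no drug effect.
        on_study_total += poisson(rng, true_rate)
    return baseline_total / enrolled, on_study_total / enrolled


baseline_rate, on_study_rate = simulate()
apparent_reduction = 1 - on_study_rate / baseline_rate
print(f"baseline {baseline_rate:.2f}, on-study {on_study_rate:.2f}, "
      f"apparent reduction {apparent_reduction:.0%}")
```

Under these assumptions the expected baseline rate among enrolled patients is 0.8 / (1 − e^−0.8) ≈ 1.45, so an apparent reduction of roughly 45% from baseline arises with no treatment at all. This is why a reduction measured against baseline cannot be read as a reduction measured against placebo.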
Secondary endpoints and analyses consistently show a robust treatment effect with natalizumab. In AFFIRM, there was a fivefold increased chance of being disease-free, a 42% increase in sustained improvement, a 47% reduction in a 6-month sustained progression of worsening vision, an improvement on both the physical and mental subscales for quality of life, an improvement on all components within the MS Functional Composite, a greater improvement in the MS Severity Score compared with placebo, and a 43% reduction in a sustained cognitive decline.19,28–33 Those on natalizumab also had an 83% reduction in T2 lesions, and a 92% reduction in gadolinium-enhancing lesions.
With regard to modulating the immune system, there appears to be a delicate balance between efficacy and safety. Serious safety issues, including death, have been identified with many of the newer agents with potential for increased efficacy. Concerns have been identified for natalizumab, rituximab, alemtuzumab, daclizumab, cladribine, and fingolimod. The first-line injectable therapies are known to be efficacious and safe based on multiple consistent clinical trials, and over a decade of experience in more than 100,000 treated individuals.
One may take issue with any of the above arguments. Nonetheless, as practicing clinicians who treat patients with a serious and progressive neurologic disease, we must decide for ourselves whether all these points, taken together, suggest an increase in efficacy sufficient to warrant a switch. In the future, additional head-to-head data may be available for some of the emerging MS agents not yet approved by the FDA. However, even large head-to-head trials will not address all the scenarios and permutations in switching from one agent to the next. For at least the next several years, the burden falls upon the clinician and patient to make a decision together as to the optimal treatment for each individual. Despite unanswered questions, there are many reasons to remain optimistic about the ongoing progress being made with MS therapeutics.
Dr. Klawiter has received a speaker honorarium from Teva Neuroscience and receives fellowship funding from the American Academy of Neurology (AAN) Foundation Clinical Research Training Fellowship. Dr. Cross serves on a scientific advisory board for Lilly (formerly BioMS); has received speaker honoraria for non-industry-sponsored activities; serves on the speakers’ bureaus of BayerHealthcare (formerly Berlex), Genentech, Inc., Biogen Idec, and Teva Neuroscience; and receives research support from the NIH [NINDS; PO1 NS059560-01 (Overall PI, PI of Project 3 and Core A), NINDS UO1 NS45719-01A1 (coinvestigator), RO1 NS047592 (coinvestigator), and NINDS/National MS Society RO1 NS 051591/NMSS RG 3915-A-15 (PI)], and from the National MS Society; and received an honorarium from the AAN for editing and cowriting two chapters in CONTINUUM (Lippincott Williams & Wilkins, 2007). Dr. Naismith has served on speakers’ bureaus and as consultant for Bayer Healthcare, Biogen Idec, Elan Pharmaceuticals, and Teva Neurosciences; receives research support from Acorda Therapeutics (Site PI) and the NIH [K23NS052430-01A1(PI) and K12RR02324902 (PI)]; and received an honorarium from the AAN for editing and writing one chapter in CONTINUUM (Lippincott Williams & Wilkins, 2007).
Address correspondence and reprint requests to Dr. Eric C. Klawiter, Neurology, Box 8111, 660 S. Euclid Ave., St. Louis, MO 63110; klawiter@neuro.wustl.edu
NIH funding included K23NS052430-01A1 (R.T.N.), K12RR02324902 (R.T.N.), UL1RR024992 (E.C.K.), and K24 RR017100 (A.H.C.). National MS Society funding included CA1012 (A.H.C.). Dr. Cross was supported in part by the Manny and Rosalyn Rosenthal–Dr. John L. Trotter Chair in Neuroimmunology. Other funding includes American Academy of Neurology Foundation Clinical Research Training Fellowship (E.C.K.).
Disclosure: Author disclosures are provided at the end of the article.
Received March 12, 2009. Accepted in final form July 7, 2009.