bDMARDs, in combination with a conventional DMARD, have been shown to be efficacious in patients who have had an inadequate response to prior DMARD therapy, thus representing an important addition to the RA treatment algorithm for patients and their health-care provider. Based on the clinical data identified in a systematic review, we conducted NMAs obtaining pooled estimates of relative treatment effects, allowing pair-wise comparisons and ranking of licensed bDMARD therapies. We also conducted a separate analysis of bDMARD monotherapy treatments, which are licensed for use in patients who cannot tolerate MTX or for whom MTX is contraindicated. Our results show that all licensed bDMARD combinations have significantly higher odds (based on the 95% CrI) for ACR 20/50/70 compared to MTX or DMARD monotherapy, ACR 70 results for RTX being the only exception.
For DMARD-experienced patients, our results also show that the etanercept combination is significantly better than the adalimumab and infliximab combinations and comparable to the certolizumab combination in improving ACR 20/50/70 outcomes (based on the 95% CrI). Therefore, previous meta-analyses that pooled TNF-α inhibitors into a single group may have underestimated the efficacy of etanercept.⁸⁵
The internal validity of any NMA is dependent upon three key considerations: RCT identification, individual RCT quality, and the degree of confounding bias because of similarity or consistency assumptions not being met.
Regarding the first of these, an extensive systematic review was conducted to ensure identification of all relevant RCTs. The extent of publication bias was assessed using funnel plots: the slope of the colored lines (–, combination NMA; –, monotherapy NMA) indicated a small degree of publication bias. The network of RCTs was fairly balanced for most treatments. In the combination analysis, however, there was some network asymmetry: a greater weight of evidence was available for tocilizumab (three trials and 1058 patients) and a smaller weight for golimumab (two arms and 124 patients).
Regarding the second consideration, quality assessment of individual RCTs identified some open-label or early escape design studies that may have been more prone to bias. The effect of including these in the base case was assessed by sensitivity analyses, which showed that including these studies did not bias the treatment-effect estimates in favor of etanercept.
Regarding the third consideration, meta-analysis has the underlying assumption that trials and outcomes are sufficiently similar to allow data to be pooled, and the consistency assumption relies on there being no imbalance in modifiers of relative treatment effects across studies. In our NMA, the similarity assumption was supported by the eligibility criteria applied for study selection, and by the adjustment of the results by way of covariate analyses for the potential effect modifiers: low dosing of MTX, length of follow-up, age, and disease duration. This covariate adjustment aimed to reduce the impact of any bias due to similarity and/or consistency violations. Low dosing of MTX did not have a statistically significant impact on ACR 20/50/70, nor did length of follow-up for ACR 20/50. Longer disease duration was associated with higher odds of ACR 50 and higher age with higher odds of ACR 70. Adjusting for age and disease duration did not alter our conclusion. We further examined, by sensitivity analyses, the influence of exposure to prior anti-TNF-α therapies and of incorporating the TEMPO trial, a trial that included some MTX-naïve patients or patients who were not MTX-inadequate responders. Overall, removing subsets of trials had very little impact on treatment-effect estimates, but meta-analysis that included the TEMPO trial⁸⁵ underestimated the efficacy of etanercept in the DMARD-experienced/inadequate-response population because of the high MTX control-arm response rate in patients previously untreated with MTX, ie, these patients were still able to benefit from MTX.
There remained heterogeneity among the studies in our NMA. The patient characteristic that differed across studies but that was not assessed as a covariate was the number of prior DMARD treatments. This is one area, therefore, where the similarity assumption may be challenged, and should be considered for covariate adjustment in future research.
One tocilizumab study in particular, CHARISMA,⁵¹ enrolled patients with a mean duration of disease of only 9.2 months.⁵¹ This does not appear to have influenced the treatment estimates here, however. The CHARISMA study is small compared to the other tocilizumab studies (OPTION⁵²), so will have less weight in the meta-analysis. The random-effects direct meta-analysis of tocilizumab versus DMARD did not indicate any significant heterogeneity in effect on ACR 20 between OPTION, CHARISMA, and TOWARD (ACR 20 I² = 0%, P = 0.86; ACR 50 I² = 30.6%, P = 0.24; ACR 70 I² = 59.7%, P = 0.08). In the direct meta-analysis of tocilizumab versus DMARD for ACR 50 and ACR 70, CHARISMA had a lower ACR 50 and ACR 70 treatment effect (relative to DMARD) compared to OPTION and TOWARD. This was somewhat counterintuitive, as one would expect that patients with shorter disease duration (fewer previous lines of treatment) would have a better response to treatment than would patients with longer disease duration (who have had more previous treatments). The different ACR 50 and ACR 70 relative effects observed in CHARISMA, therefore, may be due to factors other than disease duration, and we conclude that the short duration of disease in the CHARISMA study population did not affect the treatment-effect estimates.
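As an illustration of how an I² value of the kind quoted above is typically derived, the following minimal sketch computes Cochran's Q and the I² statistic from study-level log odds ratios using inverse-variance weights. The function name and the input numbers are hypothetical and are not data from the trials discussed here. The accompanying P values would come from a chi-squared test on Q, which is not shown.

```python
def cochran_q_and_i2(log_effects, variances):
    """Return (Q, I2 in percent) for study-level effects on the log scale."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the fixed-effect pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I2: share of total variation attributed to between-study heterogeneity,
    # truncated at zero when Q is smaller than its degrees of freedom
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical log odds ratios and variances for three trials (not study data).
example_log_or = [0.9, 1.1, 1.0]
example_var = [0.04, 0.09, 0.06]
q_stat, i2_pct = cochran_q_and_i2(example_log_or, example_var)
print(f"Q = {q_stat:.2f}, I2 = {i2_pct:.1f}%")
```

With only three trials, I² is estimated imprecisely, which is one reason a moderate value can coexist with a nonsignificant test of heterogeneity.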
Differences in placebo/MTX responses across trials were assessed by way of funnel plots, identifying some studies within the network that under- or overestimated the response to placebo/MTX, meaning that they would over- or underestimate, respectively, the treatment effect. From review of the funnel plots, it can be deduced that the overall treatment effects on ACR 20 and ACR 50 may be overestimated for certolizumab pegol.⁴⁴ The low response to MTX observed in the certolizumab pegol RAPID 1⁴⁴ and RAPID 2⁴⁵ trials (see and ) may be explained by the early escape trial design, whereby patients who had failed to respond at weeks 12–14 were withdrawn from treatment at week 16 and classified as nonresponders. More than half of the patients were withdrawn from the MTX control arms, whereas a lower percentage of certolizumab combination-arm patients were withdrawn. This imbalance in withdrawals may have had an impact on the treatment effects measured by these studies. Week 16 withdrawals in RAPID 1 were 62.8% for placebo, 21.1% for certolizumab 200 mg, and 17.4% for certolizumab 400 mg; week 16 withdrawals in RAPID 2 were 81.1% for placebo, 21.1% for certolizumab 200 mg, and 21.1% for certolizumab 400 mg. The ITT primary outcome at 24 weeks suggested a greater treatment effect for certolizumab pegol compared to placebo than was the case before early escape.
Treatment effects may be overestimated for tocilizumab on ACR 20 and ACR 50⁴³ and on ACR 70.⁴³ Infliximab treatment effects may be underestimated for ACR 20 and ACR 70.⁵³ For ACR 50, two studies⁵³ underestimated the treatment effect, and one study⁶⁸ provided an overestimate. For adalimumab, the effect may be an underestimate for ACR 50 and 70.⁶² For ACR 20, one study may have underestimated the adalimumab effect⁶² and another overestimated it.⁷¹ The treatment effect of abatacept on ACR 20 may be an underestimate.⁶⁶ There were no etanercept studies that were outliers on the funnel plots, suggesting that the treatment effects for etanercept were within the bounds of what might be expected.
The assumption of consistency between the direct and indirect evidence could not be assessed formally in the combination analyses, as there were no independent loops of evidence in the network: for ACR outcomes, there was only one study⁶⁹ that compared one licensed-treatment combination (infliximab) directly to another (abatacept) head-to-head. The combination network was comprised solely of indirect comparisons via MTX/DMARD. However, the results of direct meta-analyses and of the Bucher indirect comparison were compared to base case results from the NMA to gauge consistency. For example, the direct estimate of etanercept vs DMARD (data from etanercept vs DMARD trials only) was compared to etanercept vs DMARD as estimated by the NMA, and the indirect estimate of etanercept vs other bDMARDs (as there were no head-to-head data) was compared to etanercept vs other bDMARDs as estimated by the NMA. The NMA had a wider confidence interval compared to the direct estimates from head-to-head trials: when comparing confidence intervals, the lower bounds were similar but the NMA estimated a much higher upper bound. Similarly, there is more uncertainty (in favor of etanercept) in the NMA estimates of etanercept versus the other licensed combinations compared to the estimates obtained from the Bucher indirect comparison. In the monotherapy analyses, one loop of evidence was present in the network, enabling a formal test of the consistency assumption. This test indicated that the direct and indirect treatment-effect estimates were not statistically significantly different, indicating that the consistency assumption held.
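For readers less familiar with the Bucher method, the sketch below shows the adjusted indirect comparison of two treatments A and B through a common comparator C (here, MTX/DMARD), together with a simple z-test of the difference between a direct and an indirect estimate within a closed loop. This is one common way of checking consistency and is not necessarily the test applied in this analysis. All function names and numerical inputs are hypothetical.

```python
import math


def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc):
    """Indirect odds ratio of A vs B via common comparator C, with 95% CI."""
    log_or_ab = log_or_ac - log_or_bc          # difference of log odds ratios
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)  # variances add
    lower = math.exp(log_or_ab - 1.96 * se_ab)
    upper = math.exp(log_or_ab + 1.96 * se_ab)
    return math.exp(log_or_ab), lower, upper


def inconsistency_p_value(log_or_direct, se_direct, log_or_indirect, se_indirect):
    """Two-sided P value for the difference between direct and indirect estimates."""
    diff = log_or_direct - log_or_indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))


# Hypothetical inputs: A vs C has OR 4.0, B vs C has OR 2.0 (not study data).
or_ab, ci_low, ci_high = bucher_indirect(math.log(4.0), 0.30, math.log(2.0), 0.40)
print(f"Indirect OR A vs B = {or_ab:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because the variances of the two direct comparisons are summed, the indirect estimate is always less precise than either of its inputs.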
The relative treatment-effect estimates observed in our NMA were not influenced by any prior distribution estimates, as noninformative priors were used, meaning that prior to the data being applied, any result was taken to be equally likely, and that the posterior results were driven by the data. Model selection in our NMA was based on the best model fit, as indicated by the lowest DIC and average residual deviance values.
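To make the model-selection criterion concrete, the following minimal sketch computes the DIC as the posterior mean deviance plus the effective number of parameters, and then picks the candidate model with the lowest value. The deviance values and model labels are hypothetical, and the helper names are illustrative rather than taken from the software used for this analysis.

```python
def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, pD) from posterior deviance samples.

    pD  = mean deviance - deviance evaluated at the posterior mean parameters
    DIC = mean deviance + pD
    """
    mean_deviance = sum(deviance_samples) / len(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d, p_d


# Hypothetical posterior deviance draws for two candidate models (not study data).
candidates = {
    "fixed effects": dic([52.0, 54.0, 53.0], 45.0),
    "random effects": dic([48.0, 50.0, 49.0], 38.0),
}
best_model = min(candidates, key=lambda name: candidates[name][0])
print(best_model, candidates[best_model])
```

The DIC penalizes the gain in fit from a more flexible model by its effective number of parameters, so a lower mean deviance alone does not guarantee a lower DIC.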
Regarding the heterogeneity observed among studies in the network, it may be argued that this might present a challenge to the similarity assumption. It does, however, better support the external validity of these NMA results: this variation in patient populations is more likely to reflect real-world practice.
Our outcome measure was ACR response, a good short-term measure of disease response that is widely reported in clinical trials of RA. Other measures, such as the HAQ, might be more relevant for longer-term progression measurement. However, HAQ is not so broadly reported and is not as sensitive in measuring short-term changes in RA symptoms.
Our data relate to the population of adult patients with active, moderate–severe RA who have failed on or had an inadequate response to MTX or other conventional DMARDs. In these patients, the treatments evaluated were effective. In relation to other NMAs, our data illustrate that evidence from the different TNF-α inhibitors should not be combined together in meta-analyses, because efficacy differs between drugs in this class. Etanercept is a fusion protein including a soluble fragment of human p75-soluble TNF receptor and human immunoglobulin G1, whereas adalimumab and infliximab are MAbs directed against TNF. Differences in the kinetics and mode of action between etanercept and the MAbs have been reported,⁸⁸ and these differences may provide a plausible biological rationale for the differences in treatment-effect estimates observed in our NMA. Differences in the findings of published NMAs of biological DMARDs in RA have been reviewed and attributed to methodological shortcomings and inconsistencies.⁸⁹ Our NMA was performed incorporating many of the recommended criteria for a high-quality NMA,⁸⁹ including a clear statement of the population (DMARD-MTX-inadequate responders, as distinct from TNF-α-inadequate responders, or DMARD- or MTX-naïve populations), analyzing monotherapy and combination therapy in separate networks (thereby avoiding lumping of mono- and combination therapy without controlling for concomitant DMARD use), exploring heterogeneity and effect modification by covariate analyses, and examining the influence of particular trials or sets of trials by sensitivity analyses.
Our data do not address treatment effects in the population of patients who have failed TNF-α treatment, as this is a later stage in the treatment pathway. Likewise, further review work would be required to gain treatment-effect estimates for bDMARDs in a moderate-RA population, which implies introducing bDMARDs at an earlier stage. Future NMAs, whilst mindful of the risk of multiplicity, should consider covariate adjustment for the number of prior DMARD treatments, C-reactive protein, or baseline HAQ, if sufficient data are available.