Biologic disease-modifying antirheumatic drugs (bDMARDs) extend the treatment choices for rheumatoid arthritis patients with suboptimal response or intolerance to conventional DMARDs. The objective of this systematic review and meta-analysis was to compare the relative efficacy of EU-licensed bDMARD combination therapy or monotherapy for patients intolerant of or contraindicated to continued methotrexate.
Comprehensive, structured literature searches were conducted in Medline, Embase, and the Cochrane Library, as well as hand-searching of conference proceedings and reference lists. Phase II or III randomized controlled trials reporting American College of Rheumatology (ACR) criteria scores of 20, 50, and 70 between 12 and 30 weeks’ follow-up and enrolling adult patients meeting ACR classification criteria for rheumatoid arthritis previously treated with and with an inadequate response to conventional DMARDs were eligible. To estimate the relative efficacy of treatments whilst preserving the randomized comparisons within each trial, a Bayesian network meta-analysis was conducted in WinBUGS using fixed and random-effects, logit-link models fitted to the binomial ACR 20/50/70 trial data.
The systematic review identified 10,625 citations, and after a review of 2450 full-text papers, there were 29 and 14 eligible studies for the combination and monotherapy meta-analyses, respectively. In the combination analysis, all licensed bDMARD combinations had significantly higher odds of ACR 20/50/70 compared to DMARDs alone, except for the rituximab comparison, which did not reach significance for the ACR 70 outcome (based on the 95% credible interval). The etanercept combination was significantly better than the tumor necrosis factor-α inhibitors adalimumab and infliximab in improving ACR 20/50/70 outcomes, with no significant differences between the etanercept combination and certolizumab pegol or tocilizumab. Licensed-dose etanercept, adalimumab, and tocilizumab monotherapy were significantly better than placebo in improving ACR 20/50/70 outcomes. Sensitivity analysis indicated that including studies outside the target population could affect the results.
Licensed bDMARDs are efficacious in patients with an inadequate response to conventional therapy, but tumor necrosis factor-α inhibitor combination therapies are not equally effective.
Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease characterized by inflammation of the synovial lining of joints, tendons, and periarticular structures,1 which affects approximately 0.8% of the UK population.2 If untreated, RA leads to joint destruction, functional limitation and severe disability, and has a significant impact on health-related quality of life.3–5 Therefore, RA imposes a significant economic burden on health-care systems and society in general.6 Although the causes of RA are still obscure, research has shown that proinflammatory cytokines, such as tumor necrosis factor-α (TNF-α) and interleukin (IL)-6 or IL-1, play key roles in its pathogenesis.7
Conventional disease-modifying antirheumatic drugs (cDMARDs) are generally offered as first-line treatments (most commonly methotrexate [MTX] alone, or, for active disease, in combination with another DMARD). Biologic DMARDs (bDMARDs) offer a valuable treatment alternative, being recommended for patients with suboptimal response or intolerance to cDMARDs or where continued cDMARD therapy is contraindicated.8,9
A number of bDMARDs have been licensed for such use in the EU. TNF-α inhibitors include etanercept, adalimumab, infliximab, certolizumab pegol, and golimumab. In combination with MTX, the TNF-α inhibitors are each indicated for the treatment of moderate to severe active RA in adults when the response to DMARDs, including MTX, has been inadequate. In addition, adalimumab, etanercept, and certolizumab pegol are licensed as monotherapy in those patients intolerant of MTX or for whom continued MTX is inappropriate.
The costimulatory inhibitor abatacept and the anti-IL-6 therapy tocilizumab, in combination with MTX, are licensed for moderate to severe active RA in adults responding inadequately to previous therapy with one or more cDMARDs including MTX or a TNF-α inhibitor. Tocilizumab is also licensed as monotherapy in patients intolerant of MTX or for whom continued MTX is inappropriate. In addition, the anti-B-cell therapy rituximab, in combination with MTX, is licensed in adult patients with severe active RA with inadequate response or intolerance to other DMARDs including one or more TNF-α inhibitors.
The objective of this systematic review was to compare the clinical efficacy of EU licensed-dose bDMARD combinations for the treatment of adult RA patients after failure on one or more DMARDs, where efficacy was measured using American College of Rheumatology (ACR) response end points from randomized controlled trials (RCTs). A network meta-analysis (NMA) was performed to pool RCT evidence for bDMARDs via common control treatments (eg, MTX control), to provide estimates of relative treatment effects. The rationale for this approach was that there are few trials comparing bDMARDs head-to-head. Therefore, NMA can support inferences to the target RA population, as all the available evidence from relevant RCTs is used in the analysis.
As bDMARD monotherapies are used in a different part of the treatment pathway, ie, in a population intolerant of MTX or for whom continued MTX is inappropriate, a separate analysis of bDMARD monotherapies was performed.
The methods used for the review and meta-analysis of combination therapy are the same as for monotherapy, except where otherwise stated.
A protocol was written to define all aspects of the systematic review prior to commencement. The inclusion criteria are shown in Table 1. As the data used in a meta-analysis should be from sufficiently similar studies and outcomes to make the results meaningful and to reduce the influence of confounding factors, included studies had to report sufficient data for the ACR 20, 50, or 70 response to treatment end point (defined as a 20%, 50%, or 70% improvement in tender and swollen joints and the same level of improvement in three of the five following variables: patient and physician global assessments of overall disease activity; patient evaluation of pain (pain health assessment questionnaire [HAQ]10); a score of physical disability; and blood acute-phase reactants). End points needed to be measured between 12 and 30 weeks from baseline. Studies in which more than 15% of patients had had previous TNF-α inhibitor treatment were excluded, because this population was more extensively pretreated and considered likely to be less responsive than the TNF-α inhibitor-naïve population. Studies were not restricted by date of publication or publication status.
The data sources to identify published RCTs and ongoing (as yet unpublished) RCTs included:
The structured database search strings were designed to identify RCTs or systematic reviews indexed on Medline, and these strings were then modified for performing searches of Embase and the Cochrane Library to account for differences in syntax and thesaurus headings. Searches included terms for free text and Medical Subject Heading (MeSH) terms to identify RCTs of RA patients taking DMARDs or bDMARDs.
One reviewer screened the title and abstract of each identified study against the eligibility criteria. Full-text papers were then assessed to confirm that studies met the criteria, including those studies for which eligibility could not be determined from the title and abstract. Any uncertainties as to eligibility were referred to a second reviewer and resolved by consensus. Data were extracted from eligible publications into a predefined data-extraction table by one reviewer and verified by a second.
The data items collected included patient (average age, percentage female, disease duration, baseline severity of RA, MTX- or other DMARD-exposure and TNF-α exposure), intervention (treatment(s) received, dosage and dose schedule), study (study blinding and country(ies), number of patients randomized, follow-up period, frequency of withdrawals), and outcome (ACR 20/50/70) level parameters.
Risk of bias was assessed using criteria set out in the National Institute for Health and Clinical Excellence (NICE) guidelines manual.22 For studies included in the meta-analysis, a formal assessment of publication bias was conducted via funnel plots with Egger’s linear regression test of asymmetry.23,24
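As an illustration of the asymmetry test named above, the following is a minimal sketch of Egger's linear regression, in which the standardized effect (effect divided by its standard error) is regressed on precision (1 divided by the standard error) and the intercept measures funnel-plot asymmetry. The function name, its inputs, and the example numbers are hypothetical and are not taken from the review.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression: regress effect/SE on 1/SE by ordinary least squares.

    Returns (intercept, se_intercept, t_statistic). An intercept far from
    zero suggests funnel-plot asymmetry. Inputs are study-level effect
    estimates (e.g. log odds ratios) and their standard errors.
    """
    n = len(effects)
    if n < 3 or n != len(std_errors):
        raise ValueError("need at least three studies with matching SEs")
    y = [e / s for e, s in zip(effects, std_errors)]   # standardized effect
    x = [1.0 / s for s in std_errors]                  # precision
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = resid_ss / (n - 2)                        # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.inf
    return intercept, se_intercept, t_stat

# Hypothetical log odds ratios in which smaller studies show larger effects.
intercept, se_int, t_stat = egger_test([0.5, 0.6, 0.8, 0.9], [0.1, 0.2, 0.4, 0.5])
```

The t statistic would be compared with a t distribution on n - 2 degrees of freedom; that step is omitted here to keep the sketch dependency-free.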
For this meta-analysis, the study arms were pooled into treatment groups; we were interested in those study arms where the intervention was used in accordance with the EU license, since these are the treatments used in clinical practice. Therefore, the treatments of interest for therapy in DMARD-experienced patients are licensed bDMARD combinations plus common control arms used to connect the network (Table 2).
In a separate analysis we considered a population of patients who are intolerant of MTX or for whom MTX is contraindicated; the treatments of interest here are licensed bDMARD monotherapies plus common control arms used to connect the network (Table 2). As other cDMARDs may be used as monotherapy if MTX is contraindicated, sulfasalazine is also a treatment of interest in the monotherapy analysis. Other control arms were included in the evidence networks to preserve randomization as well as other unlicensed arms. The results for these unlicensed treatments have been omitted from this publication.
The analyses of the ACR 20/50/70 outcomes were conducted on an intent-to-treat (ITT) basis, or modified ITT (number actually receiving treatment at baseline) if the number randomized to treatment was not reported. An ITT analysis requires imputing outcomes for the missing participants, although there is no overall consensus on how to do this;25 for the ACR 20/50/70 outcomes, it was assumed that missing participants did not achieve the required improvement (ie, a worst-case scenario).
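The worst-case imputation described above amounts to keeping every randomized patient in the denominator and counting dropouts as non-responders. A minimal sketch follows; the function name and the example counts are hypothetical.

```python
def itt_response_rate(responders, completers, randomized):
    """Response rate under non-responder imputation.

    Patients who were randomized but have no outcome (randomized minus
    completers) are counted as failures, so the denominator is the
    number randomized rather than the number completing follow-up.
    """
    if not 0 <= responders <= completers <= randomized:
        raise ValueError("expected 0 <= responders <= completers <= randomized")
    if randomized == 0:
        raise ValueError("no patients randomized")
    return responders / randomized

# Hypothetical arm: 100 randomized, 80 completed, 40 achieved ACR 20.
itt_rate = itt_response_rate(40, 80, 100)   # 40 of 100 randomized
completer_rate = 40 / 80                    # completers-only rate, for contrast
```

The gap between the two rates shows why the choice of denominator matters when withdrawal rates differ between arms.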
A fixed and random-effects meta-analysis was conducted in Stata IC version 11.2 using the Metan package SJ9_2: sbe24_3 (StataCorp, College Station, TX).26,27 The random-effects analysis used the method of DerSimonian and Laird, with the estimate of heterogeneity taken from the Mantel–Haenszel model. The fixed-effect analysis used the Mantel–Haenszel method. For binomial data analysis, if a study contains a zero observation (eg, no patients achieved ACR 70), Stata adds 0.5 to each cell of the trial by default.
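For readers unfamiliar with the fixed-effect estimator, the sketch below computes a Mantel–Haenszel pooled odds ratio from 2 × 2 trial counts and adds 0.5 to each cell of any trial containing a zero, as the text describes. It is an illustration only and does not reproduce the Stata routine; the function name and the trial counts are hypothetical.

```python
def mantel_haenszel_or(trials):
    """Mantel-Haenszel pooled odds ratio.

    trials: iterable of (events_treated, n_treated, events_control, n_control).
    A trial with any zero cell has 0.5 added to all four of its cells.
    """
    numerator = 0.0
    denominator = 0.0
    for r_t, n_t, r_c, n_c in trials:
        a, b = r_t, n_t - r_t        # treated: responders, non-responders
        c, d = r_c, n_c - r_c        # control: responders, non-responders
        if 0 in (a, b, c, d):
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        total = a + b + c + d
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator

# Hypothetical trials; the second has no responders in the control arm.
pooled = mantel_haenszel_or([(20, 100, 10, 100), (5, 50, 0, 50)])
```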
Indirect comparisons between treatment (A) and other treatments of interest (B) via a common comparator (C) were made using the Bucher method28,29 and the pooled odds ratio (OR) produced from the direct meta-analysis.
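The Bucher adjusted indirect comparison can be sketched as follows: the log odds ratio of A versus B is the difference between the A-versus-C and B-versus-C log odds ratios, and its variance is the sum of their variances. The function name and the input values are hypothetical.

```python
import math

def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc, z=1.96):
    """Indirect odds ratio of A vs B through common comparator C.

    Returns (odds_ratio, lower_95, upper_95) on the natural scale.
    """
    log_or_ab = log_or_ac - log_or_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)   # variances add
    lower = math.exp(log_or_ab - z * se_ab)
    upper = math.exp(log_or_ab + z * se_ab)
    return math.exp(log_or_ab), lower, upper

# Hypothetical pooled direct estimates: OR(A vs C) = 4, OR(B vs C) = 2.
or_ab, lo_ab, hi_ab = bucher_indirect(math.log(4.0), 0.3, math.log(2.0), 0.4)
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from.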
In an NMA, treatment effects are calculated for all treatments using all available evidence in one simultaneous analysis.30–32 NMA methods build on the principles of indirect comparisons28,29 and preserve the randomized comparisons within each trial.
The models were fitted to the data using Bayesian Markov chain Monte Carlo methods (specifically Gibbs sampling), using WinBUGS software (version 1.4.3).33–35 WinBUGS code for NMA of dichotomous outcomes and standard Bayesian random-effects meta-analysis was adapted from code developed by the MRC Biostatistics Unit36 and the NICE Decision Support Unit.37
The WinBUGS models were run for a minimum of 100,000 iterations to ensure model convergence. Subsequently, two chains of 20,000 were sampled from the posterior distributions. These samples were used to calculate the median/mean and, where relevant, the 95% credible interval (CrI), which is the interval from the 2.5th to the 97.5th percentile. For treatment effects, medians are presented as the best estimate for the central value, since means may be overly influenced by outliers.
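The summary step described above can be sketched in a few lines: given posterior samples of a treatment effect, report the median, the mean, and the 2.5th and 97.5th percentiles. The helper names are hypothetical, and the percentile rule (linear interpolation between order statistics) is an assumption, since the source does not say which rule WinBUGS output was summarized with.

```python
import statistics

def percentile(sorted_values, p):
    """Percentile by linear interpolation between order statistics."""
    k = (len(sorted_values) - 1) * p / 100.0
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = k - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

def summarize_posterior(samples):
    """Median, mean and 95% credible interval of posterior samples."""
    ordered = sorted(samples)
    return {
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "cri_95": (percentile(ordered, 2.5), percentile(ordered, 97.5)),
    }

# Hypothetical samples of an odds ratio with one extreme draw:
# the mean is pulled upward while the median is unaffected.
summary = summarize_posterior([1.8, 2.0, 2.1, 2.2, 2.4, 40.0])
```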
To calculate the absolute probability of responding to each treatment, we first conducted a standard direct random-effects meta-analysis that pooled data on the log-odds (or mean difference from baseline) of responding to the reference control treatment. The reference treatment is chosen to be the control that has the most data available, ie, the DMARD control in this analysis. The (mean and standard deviation) pooled log-odds (or mean differences) of responding to the reference treatment were then used as priors in the main NMA to inform the calculation of the absolute efficacy of each treatment.38
For dichotomous end points, such as ACR 20/50/70, the NMA calculates the ORs for all treatments compared with other treatments. The base case models were random-effects models; fixed-effect models were used as sensitivity analyses. Random-effects NMA differs from fixed-effect NMA in that it allows the true treatment effect (eg, OR between two treatments) to vary between studies due to heterogeneity. In these random-effects models, a uniform (uninformative) prior is used for the between-studies standard deviation (as per Hasselblad39 and Gelman40).
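For orientation, a random-effects logit-link model of the kind described can be written as below. This is a sketch in the general form used in published NMA tutorials, not a transcription of the authors' WinBUGS code. The symbols are assumptions introduced here: $r_{ik}$ and $n_{ik}$ are the responders and patients in arm $k$ of study $i$, $b$ is that study's baseline arm, $\mu_i$ is the study-specific baseline log-odds, $d_k$ is the log odds ratio of treatment $k$ against the reference treatment (with $d_1 = 0$), $\sigma$ is the between-studies standard deviation, and $U$ is the upper limit of its uniform prior, which the source does not state. The adjustment for correlation in multi-arm trials is omitted.

```latex
\begin{align*}
r_{ik} &\sim \mathrm{Binomial}(p_{ik},\, n_{ik}) \\
\operatorname{logit}(p_{ik}) &= \mu_i + \delta_{i,bk}\, I(k \neq b) \\
\delta_{i,bk} &\sim \mathcal{N}\!\left(d_k - d_b,\ \sigma^2\right) && \text{(random effects)} \\
\delta_{i,bk} &= d_k - d_b && \text{(fixed effect)} \\
\sigma &\sim \mathrm{Uniform}(0,\, U) \\
\mathrm{OR}_{k \text{ vs } c} &= \exp(d_k - d_c)
\end{align*}
```

The fixed-effect model is the special case in which every study estimates the same $d_k - d_b$, which is why it serves as a natural sensitivity analysis.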
For the ACR 70 outcome, some studies reported zero events in the DMARD control arm, requiring a continuity correction to be applied. A fixed value of 0.5 was added to the numerator (and 1 to the denominator) for all arms of the affected trial.37,41 The fixed-value correction overcomes computational errors, but it biases study estimates towards no difference and overestimates variances. Biases will be more apparent in trials where the treatment arms are of unequal effect.
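The direction of the bias can be seen in a small worked example. The sketch below applies the fixed-value correction (0.5 added to the responders and 1 to the denominator of every arm) to a hypothetical trial whose uncorrected odds ratio is still defined, so that the two can be compared. The function names and counts are illustrative only; in the analysis itself the correction was applied only to trials with zero events.

```python
def apply_fixed_correction(arms):
    """Add 0.5 to responders and 1 to the denominator of every arm.

    arms: list of (responders, n) tuples for one trial.
    """
    return [(r + 0.5, n + 1) for r, n in arms]

def odds_ratio(treated, control):
    """Odds ratio of treated vs control from (responders, n) tuples."""
    r_t, n_t = treated
    r_c, n_c = control
    return (r_t / (n_t - r_t)) / (r_c / (n_c - r_c))

# Hypothetical trial with very few control responders: 10/100 vs 1/100.
raw = [(10, 100), (1, 100)]
corrected = apply_fixed_correction(raw)

or_raw = odds_ratio(*raw)                # 11.0
or_corrected = odds_ratio(*corrected)    # about 7.7, closer to 1
```

The corrected estimate moves toward an odds ratio of 1, which is the bias toward no difference noted above.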
Covariate analyses were conducted to explore potentially confounding factors. We conducted a study-level covariate analysis to take into account the following differences in study protocols (DMARD-experienced analysis): (1) Length of follow-up: the model included a study level continuous variable to adjust for the time point at which the response was measured (in weeks). Xweeks is a covariate centered at mean follow-up across the included studies, such that the coefficient βweeks estimates the incremental difference in (log) treatment effect for each week above/below the average follow-up across studies. (2) Studies where MTX was administered at a low dose: the Japanese maximum dose of 8 mg/week was used as a cutoff (Xmtx = 1 if study population received MTX within the normal dose range [maximum dose more than 8 mg/week]; 0 otherwise). The coefficient βmtx estimates the incremental (log) treatment effect between low-dose concomitant MTX and normal-dose concomitant MTX.
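One way to write the study-level covariate model is sketched below, extending a logit-link NMA model with study-specific baseline $\mu_i$ and relative effect $\delta_{i,bk}$ of arm $k$ against baseline arm $b$. The notation is introduced here for illustration: $w_i$ is the follow-up of study $i$ in weeks and $\bar{w}$ its mean across studies. The assumption that both coefficients shift every active-versus-control contrast by the same amount is ours; the source does not specify how the covariates interact with treatment.

```latex
\begin{align*}
\operatorname{logit}(p_{ik}) &= \mu_i + \left(\delta_{i,bk}
  + \beta_{\text{weeks}}\, X_{\text{weeks},i}
  + \beta_{\text{mtx}}\, X_{\text{mtx},i}\right) I(k \neq b) \\
X_{\text{weeks},i} &= w_i - \bar{w} \\
X_{\text{mtx},i} &=
  \begin{cases}
    1 & \text{if the maximum MTX dose exceeds 8 mg/week} \\
    0 & \text{otherwise}
  \end{cases}
\end{align*}
```

Centering follow-up at its mean keeps the treatment effects interpretable as effects at the average follow-up.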
We conducted an additional covariate analysis to take into account the following study-arm level differences in patient characteristics: (1) average age at baseline, and (2) average disease duration at baseline. This covariate model included these continuous variables to adjust for differences in patient age and disease duration (in years) across study arms. Xage and Xduration are covariates centered at mean age and disease duration, respectively, such that the coefficient βage and βduration estimate the incremental difference in the (log) treatment effect for each year above/below the average age or disease duration across study arms.
The following additional analyses were conducted for combination therapy:
Sensitivity analysis for monotherapy was conducted as follows: as base case, but include data from the TEMPO trial (24-week data from the unpublished clinical study report55). Subset analysis was not conducted: removing studies in MTX-naïve or TNF-α-experienced populations from the base case would leave too few remaining studies.
The mean residual deviances provided an estimate of how well the values predicted by the model fit the observed dataset.56 For an adequate model fit, the sum of the residual deviances should be approximately equal to the total number of study arms in the observed dataset. In addition, the deviance information criterion (DIC) output by WinBUGS57 was recorded. The model with the lowest DIC is estimated to be the model that would best predict a replicate dataset of the same structure as that currently observed.
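To make the fit statistic concrete, the sketch below computes the binomial residual deviance of each arm from an observed count and a fitted probability, then divides the total by the number of arms. In a Bayesian analysis the deviance is averaged over posterior samples; here a single fitted value per arm is used for simplicity, and the function names and data are hypothetical.

```python
import math

def binomial_residual_deviance(r, n, p_hat):
    """Residual deviance contribution of one arm.

    r: observed responders, n: patients, p_hat: fitted probability (0 < p_hat < 1).
    Terms with a zero observed count contribute nothing.
    """
    r_hat = n * p_hat
    total = 0.0
    if r > 0:
        total += r * math.log(r / r_hat)
    if r < n:
        total += (n - r) * math.log((n - r) / (n - r_hat))
    return 2.0 * total

def mean_residual_deviance(arms):
    """arms: list of (r, n, p_hat). A value near 1 suggests adequate fit."""
    deviances = [binomial_residual_deviance(r, n, p) for r, n, p in arms]
    return sum(deviances) / len(deviances)

# Hypothetical arms: the first is fitted exactly, the second is not.
fit = mean_residual_deviance([(20, 100, 0.20), (35, 100, 0.30)])
```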
An informal assessment of consistency was performed by comparing the treatment effects estimated via the NMA against the pair-wise direct meta-analysis results and results of the indirect Bucher analysis to identify potential discrepancies between the results from the different methods.
Furthermore, the network diagrams were examined to determine the number of independent loops in the network of evidence for which inconsistency in the evidence could occur.38 Disregarding loops that occur solely from a multi-arm trial (since within-trial treatment effects are not independent), the size of any inconsistency was determined for each independent loop using the Bucher method28,29 and the Z-test (or chi-squared test if one edge of the loop is shared with other loops) to determine if the inconsistency was statistically significant.
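For a single independent loop, the inconsistency check reduces to comparing the direct and indirect estimates of the same contrast. A minimal sketch of the Z-test, assuming both estimates are log odds ratios with known standard errors, is given below; the function name and the inputs are hypothetical.

```python
import math

def inconsistency_z_test(log_or_direct, se_direct, log_or_indirect, se_indirect):
    """Z-test for the difference between direct and indirect evidence.

    Returns (difference, z, two_sided_p). The two estimates are treated
    as independent, so their variances add.
    """
    difference = log_or_direct - log_or_indirect
    se_difference = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = difference / se_difference
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return difference, z, p_value

# Hypothetical loop: direct log OR 1.2 (SE 0.3), indirect log OR 0.8 (SE 0.4).
diff, z, p = inconsistency_z_test(1.2, 0.3, 0.8, 0.4)
```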
A total of 10,616 potentially relevant records were identified, excluding duplicates from the original search, of which 8175 were excluded on screening the title and abstract. On application of the inclusion criteria to the 2441 full-text papers, a further 2415 were excluded. Nine additional studies were identified from the updated search. Thirty-seven publications were included: 23 assessing combination therapy only,43–45,48–50,52,53,58–72 eight monotherapy only,73–80 and six monotherapy in addition to combination-therapy arms42,46,47,51,81,82 (Figure 1).
Of the 29 studies with at least one combination-therapy arm (Figure 2) three assessed abatacept,49,66,69 five adalimumab,59,62,64,65,71 two certolizumab pegol,44,45 six etanercept,42,46–48,67,70 three golimumab,50,63,82 six infliximab (one of which had an abatacept arm also, providing the only head-to-head comparison),53,58,60,68,69,72 two rituximab,61,81 and three tocilizumab.43,51,52 All studies utilized licensed doses, with the exception of one golimumab study.50
Study and patient characteristics are summarized in Table 3. The majority of RCTs were double-blind, three being open-label.46,47,60 In total the included studies randomized 11,490 patients. Included patients had active RA in spite of prior treatment with a DMARD. Not all studies reported baseline disease activity score (DAS). Of those that did, 13 involved populations with more severe RA: eleven trials had a mean or median baseline DAS28 of 5.9 or above,44,45,47,51,52,59,61,66,69,81,82 and in two trials of either abatacept or infliximab, the authors noted the particularly severe or active nature of disease in the study population.49,68 In two trials involving either etanercept or infliximab, the mean baseline DAS28 was between 5.0 and 5.2,42,72 indicating that the population would have included some patients with severe RA and others with more moderate-severity disease. The definition of “active RA” was inconsistent across studies, with some requiring ≥ six tender joints and ≥ six swollen joints, whilst others required ≥ twelve tender joints and ≥ ten swollen joints. One study of infliximab in particular may have enrolled patients with less active RA, as its definition of active RA included having ≥ eight tender joints and only ≥ three swollen joints.53 In most trials, the patient population was TNF-α inhibitor-naïve. Patients had a mean age of between 48 and 58 years and had on average suffered from RA for 5–10 years (around 9 months in Maini et al51 and 13 years in Weinblatt et al71). These trials were, therefore, broadly representative of the population of interest, namely, adult patients with moderate–severe active RA, previously treated with (and with insufficient response to) MTX or another DMARD, irrespective of disease duration.
The risk of bias, as assessed by NICE criteria, was considered low for the majority of included studies. For five studies, the risk of bias was unclear,50,53,59,61,67 due to incomplete reporting. Only the study by van Riel et al47 was considered to have a high risk of bias, as there was no concealment of treatment allocation (and several other parameters were unclear).
Data for the ACR 20/50/70 end points are presented in Table 4. The follow-up period was 24 weeks in 18 of the 29 trials,42–46,48–50,52,61,62,64–66,70,71,81,82 ranging from 12 weeks59,67 to 30 weeks.68
Figures 3–5 show funnel plots for ACR 20/50/70, respectively, for all studies with DMARD control arm used in the combination-therapy meta-analysis. An asymmetrical funnel plot suggests publication bias or systematic difference between smaller and larger studies, and might therefore suggest that simple meta-analysis of the dataset was not appropriate. Funnel plots also highlight outlier studies, where the control-arm response is either particularly high (leading to an underestimate of the active treatment effect by comparison) or particularly low (leading to an overestimate of the active treatment’s relative effect).23,24 For ACR 20, there is a good, symmetrical spread of control responses either side of the mean response (Figure 3). RAPID 1,44 RAPID 2,45 TOWARD,43 and ARMARDA71 may underestimate the log-odds of ACR 20 response in the control arm, and hence overestimate the treatment effects (Figure 3). Conversely, AIM,66 ATTEST,69 Huang et al,62 and Zhang et al53 may overestimate the log-odds of ACR 20 response in the control arm, and hence underestimate the treatment effects (Figure 3). For ACR 50, there is a reasonable spread of control responses either side of the mean response (Figure 4). RAPID 1,44 RAPID 2,45 TOWARD,43 and ATTRACT68 may overestimate treatment effects, and CHARISMA,51 ATTEST,69 Huang et al,62 and Zhang et al53 underestimate them (Figure 4). For ACR 70, the spread of control responses is asymmetrical in the direction of lower-than-expected responses (Figure 5). OPTION52 and TOWARD43 may overestimate treatment effects, whereas CHARISMA,51 Huang et al,62 and Zhang et al53 may underestimate them (Figure 5).
The random-effects model did not show a significant difference in ACR 70 for rituximab 2 × 1000 mg + DMARD compared to DMARD alone. Otherwise, all licensed bDMARD combinations have significantly higher odds of ACR 20/50/70 compared to DMARDs alone (based on the 95% CrI, Table 5).
The etanercept combination was significantly better than the other TNF-α inhibitors adalimumab and infliximab in improving ACR 20/50/70 outcomes (based on the 95% CrI, Table 6). The etanercept combination was also significantly better than abatacept in improving ACR 20/50/70 outcomes, and significantly better than golimumab in improving ACR 20 and than rituximab in improving ACR 70 (based on the 95% CrI, Table 6). There were no significant differences between the etanercept combination and certolizumab pegol or tocilizumab.
Regarding model selection, there were sufficient studies for random-effects models to be used. The base case NMA models displayed good convergence, and for all outcomes the random-effects model had the best fit based on lowest DIC and mean residual deviance (the sum of the residual deviances divided by the total number of study arms in the observed data set) (Table 7). For ACR 70 data, a continuity correction was applied in order to account for several instances of zero events in the control arms for this outcome. Between-study heterogeneity, as shown by the standard deviation in treatment effects between studies (Table 7), was quite high among studies in the network (ACR 20 standard deviation [SD] on a logarithmic scale = 0.31, ACR 50 SD = 0.40, and ACR 70 SD = 0.50). This suggests that the predicted difference on a natural scale between a study’s OR estimate and our NMA estimate may vary (between upper and lower limits) by 3.44 for ACR 20, 4.84 for ACR 50, and 7.23 for ACR 70. There is, therefore, greater uncertainty around the ACR 70 results than around ACR 50 or ACR 20.
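The source does not state how the natural-scale figures were derived from the SDs. One reading, offered here as an assumption rather than the authors' stated method, is that each figure is the ratio of the upper to the lower 95% limit of a study-specific OR around the NMA estimate, that is exp(2 × 1.96 × SD). With the rounded SDs this gives roughly 3.4, 4.8, and 7.1, close to but not identical with the reported values, which may have been calculated from unrounded SDs. The function name is hypothetical.

```python
import math

def or_spread_ratio(sd_log_or, z=1.96):
    """Ratio of upper to lower 95% limit for a study-specific odds ratio.

    sd_log_or: between-study standard deviation on the log odds ratio scale.
    The limits are exp(d + z*sd) and exp(d - z*sd), so their ratio is
    exp(2*z*sd) and does not depend on the pooled effect d.
    """
    return math.exp(2.0 * z * sd_log_or)

# Rounded SDs as reported for the combination-therapy analysis.
ratios = {outcome: or_spread_ratio(sd)
          for outcome, sd in [("ACR 20", 0.31), ("ACR 50", 0.40), ("ACR 70", 0.50)]}
```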
The NMA results compare well with the direct head-to-head analysis (Table S1, Table 5) and with the Bucher indirect comparisons (Table S2, Table 6), though no formal test of consistency could be conducted, due to there being no independent loops of evidence. The NMA has a wider CrI compared to direct estimates from head-to-head trials: the lower bounds are similar, but the NMA estimates a much higher upper bound. Similarly, there is more uncertainty (in favor of etanercept) in the NMA estimates of etanercept versus the other licensed combinations compared to the estimates obtained from the Bucher indirect comparison.
Table 8 shows the results of the study-level covariate analysis, which estimates the treatment effects taking into account the impact of low-dose MTX (maximum dose of 8 mg/week or less) and length of follow-up for reporting the ACR outcomes. A low dose of background MTX did not have a statistically significant impact on ACR 20/50/70, and length of follow-up did not have a statistically significant impact on ACR 20 or ACR 50. The βweeks coefficient was statistically significant in the analysis of ACR 70 outcomes (based on the 95% CrI, Table 8), such that a longer length of follow-up was associated with higher odds of ACR 70 response. However, this single significant result should be viewed with caution: the criterion for significance (a type I error rate, ie, the probability of rejecting the null hypothesis when it is true, of less than 5%) does not take multiple significance testing into account. No correction for multiple testing was applied, and the criterion for significance was not tightened (to 1%, for example) to control the type I error, so this result could have occurred by chance.
In an additional covariate analysis of patient characteristics (at study-arm level), longer disease duration was associated with higher odds of ACR 50 and higher patient age with higher odds of ACR 70. Otherwise, age and disease duration were not statistically significant (Table S3). As the base case and covariate-adjusted odds ratios for each treatment do not differ substantially, our conclusion regarding the differential effectiveness of etanercept vs adalimumab or infliximab remains unaltered.
The results of the predefined sensitivity analyses are shown in Table S4. Removing the RAPID 1/2 or TNF-α-exposed trials had very little impact on the treatment-effect estimates. Removing the etanercept studies had very little impact on the treatment-effect estimates for etanercept for ACR 50 and 70 outcomes, but lowered the treatment-effect estimates for etanercept and certolizumab for ACR 20. The inclusion of the TEMPO study lowered the treatment-effect estimates for etanercept for ACR 20/50/70.
Fourteen studies qualified for inclusion in the analysis of bDMARD monotherapy (Figure 6): two studies with a licensed-dose adalimumab arm,74,78 plus one additional study with nonlicensed adalimumab arms,79 five trials including licensed-dose etanercept,42,46,47,73,75 plus one additional study including nonlicensed etanercept,80 and three studies including licensed tocilizumab.51,76,77 There were two additional studies that included only nonlicensed rituximab and golimumab arms.81,82 There were no studies included in the review that assessed a certolizumab monotherapy arm: the FAST4 WARD trial83 was excluded on the basis that patients may have had a prior bDMARD (other than TNF-α).
Study characteristics and patient characteristics are summarized in Table 9. All studies were double-blind, with the exception of two that were open-label.46,47 The range of baseline disease severity, as measured by DAS28 score, was from moderate–severe (DAS28 5.0–5.2)42 to very severe (DAS28 7.0–7.1).78,79 Seven studies enrolled anti-TNF-α-naïve patients.42,46,47,73–75,81 In one study, 14% of patients had prior exposure to etanercept or infliximab, but not in the 12 weeks prior to enrolment.51 In another,78 there had been no biologic treatment permitted in the 6 months prior to enrolment. In two,77,84 the status was not reported (so for these, an assumption of patients being anti-TNF-α-naïve was made). The mean age ranged from 51 years75 to 57 years.74 The percentage of female participants in any treatment arm varied from 61%80 to 90%.84 Mean disease duration ranged from 8.4 years74 to 13 years.75 The risk of bias was highest in the open-label studies.46,47 Data for ACR 20/50/70 for the monotherapy analysis are presented in Table 10.
The patients enrolled in the adalimumab studies were broadly similar in terms of disease duration, but one of the adalimumab trials78 may have involved patients who had had some prior biologic exposure (though not in the 6 months prior to enrolment), and may therefore have enrolled a group less likely to respond.
Tocilizumab studies had on average a shorter disease duration (8.3 years77 and 8.5 years84) compared to the etanercept and adalimumab monotherapy studies. Figures 7–9 show funnel plots for ACR 20/50/70, respectively, for all studies with placebo control arm used in the monotherapy meta-analysis.
The results from the NMA are shown in Table 11 (comparison versus placebo control), with comparisons of etanercept versus other licensed bDMARDs shown in Table 12. Licensed-dose etanercept, adalimumab, and tocilizumab monotherapy were significantly better than placebo in improving ACR 20/50/70 outcomes (based on the 95% CrI, Table 11). Etanercept monotherapy was significantly better than sulfasalazine in improving ACR 20/50/70 outcomes (based on the 95% CrI, Table 12).
As expected, the NMA had wider credible intervals compared to direct estimates from head-to-head trials. In general, the NMA models displayed fair convergence, though some of the ACR 70 models did not converge. The random-effects model had the best fit (Table 13). A continuity correction was applied to the ACR 70 data where zero events occurred in the control arms, and between-study heterogeneity estimates were high (ACR 20 SD on a logarithmic scale = 0.81, ACR 50 SD = 0.55, and ACR 70 SD = 0.76). This suggests that the predicted difference on a natural scale between a study’s OR estimate and our NMA estimate may vary (between upper and lower limits) by 24.64 for ACR 20, 8.8 for ACR 50, and 19.8 for ACR 70 (Table 13). As a result of between-study heterogeneity, therefore, there is more uncertainty associated with the ACR 20 and ACR 70 treatment-effect estimates for monotherapy, compared to the ACR 50 outcome.
The NMA results compare well with direct head-to-head analysis (Table S5, Table 11) and with Bucher indirect comparisons (Table S6, Table 12). Examination of the monotherapy evidence network shows that there was one independent loop for which inconsistency of the direct and indirect evidence can be assessed (Figure S1). The analysis indicates that the inconsistency on this loop is not significant (P > 0.05 for ACR 20/50/70; Tables S7–S9).
A covariate analysis was not conducted, as there were too few monotherapy studies to make such an analysis robust. The results of the predefined sensitivity analyses are shown in Table S10. The inclusion of the TEMPO study lowers the treatment-effect estimates for etanercept monotherapy.
bDMARDs, in combination with a conventional DMARD, have been shown to be efficacious in patients who have had an inadequate response to prior DMARD therapy, thus representing an important addition to the RA treatment algorithm for patients and their health-care providers. Based on the clinical data identified in a systematic review, we conducted NMAs obtaining pooled estimates of relative treatment effects, allowing pair-wise comparisons and ranking of licensed bDMARD therapies. We also conducted a separate analysis of bDMARD monotherapy treatments, which are licensed for use in patients who cannot tolerate MTX or for whom MTX is contraindicated. Our results show that all licensed bDMARD combinations have significantly higher odds (based on the 95% CrI) for ACR 20/50/70 compared to MTX or DMARD monotherapy, ACR 70 results for rituximab being the only exception.
For DMARD-experienced patients, our results also show that the etanercept combination is significantly better than the adalimumab and infliximab combinations and comparable to the certolizumab combination in improving ACR 20/50/70 outcomes (based on the 95% CrI). Therefore, previous meta-analyses that pooled TNF-α inhibitors into a single group may have underestimated the efficacy of etanercept.85,86
The internal validity of any NMA is dependent upon three key considerations: RCT identification, individual RCT quality, and the degree of confounding bias because of similarity or consistency assumptions not being met.
Regarding the first of these, an extensive systematic review was conducted to ensure identification of all relevant RCTs. Publication bias was assessed using funnel plots (Figures 3–5, combination NMA; Figures 7–9, monotherapy NMA); the slope of the colored lines indicated a small degree of publication bias. The network of RCTs was fairly balanced for most treatments. In the combination analysis, however, there was some network asymmetry: a greater weight of evidence was available for tocilizumab (three trials and 1058 patients) and a smaller weight for golimumab (two arms and 124 patients).
Regarding the second consideration, quality assessment of individual RCTs did identify some open-label or early escape design studies that may have been more prone to bias. The effect of including these studies in the base case was assessed by sensitivity analyses, which showed that their inclusion did not bias the treatment-effect estimates in favor of etanercept.
Regarding the third consideration, meta-analysis has the underlying assumption that trials and outcomes are sufficiently similar to allow data to be pooled, and the consistency assumption relies on there being no imbalance in modifiers of relative treatment effects across studies. In our NMA, the similarity assumption was supported by the eligibility criteria applied for study selection, and the adjustment of the results by way of covariate analyses for the potential effect modifiers, low dosing of MTX, length of follow-up, age, and disease duration. This covariate adjustment aimed to reduce the impact of any bias due to similarity and/or consistency violations. Low dosing of MTX did not have a statistically significant impact on ACR 20/50/70, nor did length of follow-up for ACR 20/50. Longer disease duration was associated with higher odds of ACR 50 and higher age with higher odds of ACR 70. Adjusting for age and disease duration did not alter our conclusion. We further examined the influence of exposure to prior anti-TNF-α therapies and of incorporating the TEMPO trial, a trial that included some MTX-naïve patients or patients that were not MTX-inadequate responders, by sensitivity analyses: overall, removing subsets of trials had very little impact on treatment-effect estimates, but meta-analysis that included the TEMPO trial85,87 underestimated the efficacy of etanercept in the DMARD-experienced/inadequate-response population because of the high MTX control arm response rate in patients previously untreated with MTX, ie, these patients were still able to benefit from MTX.
There remained heterogeneity among the studies in our NMA. The patient characteristic that differed across studies but that was not assessed as a covariate was the number of prior DMARD treatments. This is one area, therefore, where the similarity assumption may be challenged, and should be considered for covariate adjustment in future research.
One tocilizumab study in particular, CHARISMA, enrolled patients with a mean duration of disease of only 9.2 months.51 This does not appear to have influenced the treatment estimates here, however. The CHARISMA study is small compared to the other tocilizumab studies, OPTION52 and TOWARD43, so it will have less weight in the meta-analysis. The random-effects direct meta-analysis of tocilizumab versus DMARD did not indicate any statistically significant heterogeneity in effect between OPTION, CHARISMA, and TOWARD (ACR 20 I² = 0%, P = 0.86; ACR 50 I² = 30.6%, P = 0.24; ACR 70 I² = 59.7%, P = 0.08). In this direct meta-analysis, CHARISMA had lower ACR 50 and ACR 70 treatment effects (relative to DMARD) than OPTION and TOWARD. This was somewhat counterintuitive, as one would expect that patients with shorter disease duration (fewer previous lines of treatment) would have a better response to treatment than would patients with longer disease duration (who have had more previous treatments). The different ACR 50 and ACR 70 relative effects observed in CHARISMA, therefore, may be due to factors other than disease duration, and we conclude that the short duration of disease in the CHARISMA study population did not affect the treatment-effect estimates.
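To make the heterogeneity statistics quoted above easier to interpret, the following minimal sketch shows how Cochran's Q and the I² statistic are usually computed from study-level effects, and how the reported P values can be cross-checked from I² alone. It assumes fixed-effect inverse-variance weights and, for the closed-form P value, exactly three studies (2 degrees of freedom). All function names are hypothetical, and no trial-level data from the study are reproduced.

```python
import math

def cochran_q_i2(effects, std_errors):
    """Cochran's Q and I² (in percent) from study effects (e.g. log odds ratios)."""
    weights = [1.0 / se ** 2 for se in std_errors]       # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, df, i2

def q_from_i2(i2_percent, df):
    """Invert I² = (Q - df) / Q; only valid when I² is strictly positive."""
    return df / (1.0 - i2_percent / 100.0)

def chi2_sf_df2(q):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-q / 2.0)

# Cross-check of the reported values for three studies (df = 2)
p_acr50 = chi2_sf_df2(q_from_i2(30.6, 2))   # rounds to 0.24
p_acr70 = chi2_sf_df2(q_from_i2(59.7, 2))   # rounds to 0.08
```

The back-calculated P values agree with those reported for ACR 50 and ACR 70. For ACR 20, I² = 0% only implies that Q does not exceed its degrees of freedom, so Q cannot be recovered from I² in that case.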
Differences in placebo/MTX responses across trials were assessed by way of funnel plots, identifying some studies within the network that under- or overestimated the response to placebo/MTX, meaning that they would over- or underestimate, respectively, the treatment effect. From review of the funnel plots, it can be deduced that the overall treatment effects on ACR 20 and ACR 50 may be overestimated for certolizumab pegol.44,45 The low response to MTX observed in the certolizumab pegol RAPID 144 and RAPID 245 trials (see Figures 3 and 4) may be explained by the early escape trial design, whereby patients who had failed to respond at weeks 12–14 were withdrawn from treatment at week 16 and classified as nonresponders. More than half of the patients were withdrawn from the MTX control arms, whereas a lower percentage of certolizumab combination-arm patients were withdrawn. This imbalance in withdrawals may have had an impact on the treatment effects measured by these studies: week 16 withdrawals in RAPID 1 were 62.8% for placebo, 21.1% for certolizumab 200 mg, and 17.4% for certolizumab 400 mg; in RAPID 2 they were 81.1% for placebo, 21.1% for certolizumab 200 mg, and 21.1% for certolizumab 400 mg. The ITT primary outcome at 24 weeks suggested a greater treatment effect for certolizumab pegol compared to placebo than was the case before early escape.
Treatment effects may be overestimated for tocilizumab on ACR 20 and ACR 5043 and on ACR 70.43,52 Infliximab treatment effects may be underestimated for ACR 20 and ACR 70.53,69 For ACR 50, two studies53,69 underestimated the treatment effect, and one study68 provided an overestimate. For adalimumab, the effect may be an underestimate for ACR 50 and 70.62 For ACR 20, one study may have underestimated the adalimumab effect62 and another overestimated it.71 The treatment effect of abatacept on ACR 20 may be an underestimate.66 There were no etanercept studies that were outliers on the funnel plots, suggesting that the treatment effects for etanercept were within the bounds of what might be expected.
The assumption of consistency between the direct and indirect evidence could not be assessed formally in the combination analyses, as there were no independent loops of evidence in the network: for ACR outcomes, only one study69 compared one licensed-treatment combination (infliximab) head-to-head with another (abatacept). The combination network consisted solely of indirect comparisons via MTX/DMARD. However, the results of the direct meta-analyses and of the Bucher indirect comparisons were compared to the base case results from the NMA to gauge consistency. For example, the direct estimate of etanercept vs DMARD (data from etanercept vs DMARD trials only) was compared to etanercept vs DMARD as estimated by the NMA, and the indirect estimates of etanercept vs other bDMARDs (there being no head-to-head data) were compared to etanercept vs other bDMARDs as estimated by the NMA. The NMA interval estimates were wider than those of the direct estimates from head-to-head trials: the lower bounds were similar, but the NMA estimated a much higher upper bound. Similarly, there is more uncertainty (in favor of etanercept) in the NMA estimates of etanercept versus the other licensed combinations compared to the estimates obtained from the Bucher indirect comparison. In the monotherapy analyses, one loop of evidence was present in the network, enabling a formal test of the consistency assumption. This test indicated that the direct and indirect treatment-effect estimates were not statistically significantly different, indicating that the consistency assumption held.
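For readers unfamiliar with the Bucher method referred to above, the adjusted indirect comparison of two treatments A and B through a common comparator C (here MTX/DMARD) can be sketched as follows. This is an illustrative outline under the usual assumptions (independent trial sets, effects expressed as log odds ratios); the function name and arguments are hypothetical and do not reproduce the study's calculations.

```python
import math

def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc, z=1.96):
    """Adjusted indirect comparison of A vs B via a common comparator C.

    log_or_ac, log_or_bc: pooled log odds ratios of A vs C and B vs C.
    se_ac, se_bc: their standard errors (assumed independent).
    Returns the odds ratio of A vs B with an approximate 95% interval.
    """
    log_or_ab = log_or_ac - log_or_bc
    # Variances add, so the indirect estimate is less precise than either input
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    lower = math.exp(log_or_ab - z * se_ab)
    upper = math.exp(log_or_ab + z * se_ab)
    return math.exp(log_or_ab), lower, upper
```

Because the two variances add, an indirect estimate always has a wider interval than either of the direct estimates that feed it. This is one reason why head-to-head trials remain valuable even when an indirect comparison is available.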
The relative treatment-effect estimates observed in our NMA were not influenced by any prior distribution estimates, as noninformative priors were used, meaning that prior to the data being applied, any result was taken to be equally likely, and that the posterior results were driven by the data. Model selection in our NMA was based on the best model fit, as indicated by the lowest DIC and average residual deviance values.
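As a reminder of how the DIC criterion mentioned above is assembled, the sketch below computes it from the posterior mean deviance and the deviance evaluated at the posterior means of the parameters, then picks the model with the lowest value. The function names and the dictionary layout are hypothetical; in practice these quantities are read from the WinBUGS output rather than computed by hand.

```python
def dic(mean_deviance, deviance_at_posterior_mean):
    """Return (pD, DIC), where pD is the effective number of parameters."""
    p_d = mean_deviance - deviance_at_posterior_mean
    return p_d, mean_deviance + p_d

def prefer_model(fits):
    """Pick the model with the lowest DIC.

    fits maps a model label to (mean_deviance, deviance_at_posterior_mean).
    """
    scores = {name: dic(*values)[1] for name, values in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

A commonly quoted rule of thumb is that DIC differences of fewer than about 3 to 5 points are not considered meaningful, in which case the simpler (fixed-effects) model is often retained.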
Regarding the heterogeneity observed among studies in the network, it may be argued that this might present a challenge to the similarity assumption. It does, however, better support the external validity of these NMA results: this variation in patient populations is more likely to reflect real-world practice.
Our outcome measure was ACR response, a good short-term measure of disease response that is widely reported in clinical trials of RA. Other measures, such as the HAQ, might be more relevant for longer-term progression measurement. However, HAQ is not so broadly reported and is not as sensitive in measuring short-term changes in RA symptoms.
Our data relate to the population of adult patients with active, moderate–severe RA who have failed on or had an inadequate response to MTX or other conventional DMARDs. In these patients, the treatments evaluated were effective. In relation to other NMAs, our data illustrate that evidence from the different TNF-α inhibitors should not be pooled in meta-analyses, because efficacy differs between drugs in this class. Etanercept is a fusion protein comprising a soluble fragment of the human p75 TNF receptor and human immunoglobulin G1, whereas adalimumab and infliximab are MAbs directed against TNF. Differences in the kinetics and mode of action between etanercept and the MAbs have been reported,88 and these differences may provide a plausible biological rationale for the differences in treatment-effect estimates observed in our NMA. Differences in the findings of published NMAs of biological DMARDs in RA have been reviewed and attributed to methodological shortcomings and inconsistencies.89 Our NMA incorporated many of the recommended criteria89 for a high-quality NMA, including a clear statement of the population (DMARD-MTX-inadequate responders, as distinct from TNF-α-inadequate responders, or DMARD- or MTX-naïve populations), analysis of monotherapy and combination therapy in separate networks (thereby avoiding lumping of mono- and combination therapy without controlling for concomitant DMARD use), exploration of heterogeneity and effect modification by covariate analyses, and examination of the influence of particular trials or sets of trials by sensitivity analyses.
Our data do not address treatment effects in the population of patients who have failed TNF-α treatment, as this is a later stage in the treatment pathway. Likewise, further review work would be required to gain treatment-effect estimates for bDMARDs in a moderate-RA population, which implies introducing bDMARDs at an earlier stage. Future NMAs, whilst mindful of the risk of multiplicity, should consider covariate adjustment for the number of prior DMARD treatments, C-reactive protein, or baseline HAQ, if sufficient data are available.
Part of the monotherapy evidence network containing the tocilizumab 4 mg/kg/4 weeks–tocilizumab 8 mg/kg/4 weeks loop.
Notes: 6, Maini 2006 (CHARISMA); 10, Nishimoto 2004 (STREAM); 11, Nishimoto 2009 (SATORI).
This study was sponsored by Pfizer Ltd, UK. Michelle Orme, Katherine MacGilchrist, and Stephen Mitchell were paid consultants to Pfizer Ltd, UK in connection with this study. Dean Spurden and Alex Bird are paid employees of Pfizer Ltd, UK.