To derive and validate decision trees to categorize rheumatoid arthritis (RA) patients 12 weeks after starting etanercept with or without methotrexate into three groups: patients predicted to achieve low disease activity (LDA) at 1 year; patients predicted to not achieve LDA at 1 year; and patients who needed additional time on therapy to be categorized.
Data from RA patients enrolled in TEMPO were analyzed. Classification and Regression Trees were used to develop and validate decision-tree models with week 12 and earlier assessments that predicted long-term LDA. LDA, defined as DAS28 ≤ 3.2 or Clinical Disease Activity Index (CDAI) ≤ 10.0, was measured at 52 or 48 weeks. Demographics, laboratory data, and clinical data at baseline and through week 12 were analyzed as predictors of response.
Thirty-nine percent (67/172) of patients receiving etanercept and 60% (115/193) of patients receiving etanercept plus methotrexate achieved LDA at week 52. For patients receiving etanercept, 53% were predicted to have LDA, 39% were predicted to not have LDA, and 8% could not be categorized using DAS28 criteria at week 12. For patients receiving etanercept plus methotrexate, 63% were predicted to have LDA, 25% were predicted to not have LDA, and 12% could not be categorized.
Most (80%–90%) patients in TEMPO initiating etanercept with or without methotrexate could be predicted within 12 weeks of starting therapy as likely to have LDA or not at week 52. However, approximately 10%–20% of patients needed additional time on therapy to decide whether to continue treatment.
The presentation and disease course are highly variable in patients with rheumatoid arthritis (RA). Unsurprisingly, there is also great variation in the response to both nonbiologic and biologic disease-modifying antirheumatic drugs (DMARDs). Given the chronic nature of RA (with the consequent need for long-term treatment), the expense of newer biologic DMARDs, and the urgency of identifying efficacious treatment to minimize joint damage in individual patients, the ability to predict response to treatment would have substantial clinical and economic impact.
The National Institute for Health and Clinical Excellence (NICE) and the British Society for Rheumatology (BSR) recommend discontinuation of anti-TNF therapies after 6 months in the absence of an adequate response.[1, 2] The American College of Rheumatology (ACR) recommends re-evaluation of patients who have not achieved clinical benefit within 12 weeks of initiating anti-TNF therapy. Given the short period of time in which physicians are expected to make treatment decisions, it has become increasingly important to identify features in individual patients that may assist in decisions to continue or discontinue a treatment regimen.
Etanercept is a human TNF receptor-Fc fusion protein that binds to TNF and inhibits its interaction with cell surface TNF receptors. Etanercept is approved for treatment of moderately to severely active RA. Using data from a pivotal trial of etanercept, the objective of this analysis was to derive and validate a decision tree that could categorize patients within 12 weeks after starting etanercept with or without methotrexate into one of three groups: patients predicted to achieve low disease activity (LDA) at 1 year; patients predicted to not achieve LDA at 1 year; and patients who could not be categorized at 12 weeks and would need additional time on therapy. Additional analyses substituted the new ACR/EULAR remission definition for the LDA outcome and categorized patients at 12 weeks as being likely or not to achieve remission at 1 year.
Data from patients enrolled in the Trial of Etanercept and Methotrexate with Radiographic Patient Outcomes (TEMPO) were used in this analysis. Patients 18 years or older with active, adult-onset RA were enrolled in TEMPO. Patients in TEMPO received etanercept (25 mg twice weekly [BIW]), methotrexate (7.5 mg escalated to 20 mg oral capsules once weekly [QW] within 8 weeks if patients had any painful or swollen joints), or both.
The primary outcome of this retrospective analysis was LDA (DAS28 ≤ 3.2) at week 52 (or week 48 if the DAS28 measurement was missing at week 52). LDA as a goal is consistent with recent treat-to-target recommendations suggesting that while remission is an optimal goal, LDA is acceptable, especially for RA patients with established disease. As a secondary outcome, patients were considered to have LDA if they had Clinical Disease Activity Index (CDAI) ≤ 10 at week 52. An additional secondary outcome required remission using the ACR/EULAR Boolean definition (tender and swollen joint counts each ≤ 1, CRP ≤ 1 mg/dL, and patient global assessment ≤ 1 on a scale of 0 to 10).
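The outcome definitions above can be written down compactly. The following is a minimal sketch, not the authors' code: the function and constant names are hypothetical, the cutoffs are the ones quoted in the text, and the CRP value is assumed to be supplied in the unit used by the remission definition.

```python
from typing import Optional

DAS28_LDA_CUTOFF = 3.2   # LDA if DAS28 <= 3.2 (cutoff quoted in the text)
CDAI_LDA_CUTOFF = 10.0   # LDA if CDAI <= 10 (cutoff quoted in the text)


def score_at_one_year(week52: Optional[float], week48: Optional[float]) -> Optional[float]:
    """Use the week 52 value, falling back to week 48 when week 52 is missing."""
    return week52 if week52 is not None else week48


def has_lda(score: Optional[float], cutoff: float) -> Optional[bool]:
    """Return True/False for LDA, or None when no score is available."""
    if score is None:
        return None
    return score <= cutoff


def boolean_remission(tjc: int, sjc: int, crp: float, patient_global: float) -> bool:
    """All four components must be <= 1.

    crp is assumed to be in the unit used by the remission definition;
    patient_global is on the 0 to 10 scale.
    """
    return tjc <= 1 and sjc <= 1 and crp <= 1 and patient_global <= 1
```

For example, a patient with no week 52 DAS28 and a week 48 DAS28 of 3.0 would be counted as having LDA under this sketch.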
Patient demographics (age, sex, race [white vs non-white]), clinical data (tender joint count, swollen joint count, rheumatoid factor status, DAS28 raw score at baseline and change from baseline score at weeks 4, 8, and 12, Health Assessment Questionnaire Disability Index [HAQ-DI] score, patient pain, Physician Global Assessment, Patient Global Assessment, and CDAI), and laboratory data (erythrocyte sedimentation rate, C-reactive protein) at baseline and at each visit through week 12 were included as candidate predictors of LDA at week 52.
Patients were included in this analysis if they had DAS28 assessments at 48 or 52 weeks of therapy and had received etanercept alone or etanercept plus methotrexate. Patients who dropped out early because of unsatisfactory efficacy were considered to be nonresponders. Patients who dropped out of the study for safety reasons were excluded from the analysis.
Classification and Regression Trees (CART) software (Salford Systems, San Diego, CA, USA) was used to develop and validate models for identifying week 12 and earlier assessments that would predict LDA at week 52. The CART model relies on statistically optimum recursive splitting of the patients into subgroups based on critical levels of the prognostic variables. In the general implementation of CART, the dataset is split, using a threshold on one of the predictor variables, into the two subgroups that differ most with respect to the outcome, and the subgroups are split further based on the same principle. The percentages of patients with LDA were calculated for each node of the regression tree.
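To make the splitting step concrete, here is a minimal sketch of how a single split on one predictor might be chosen. It is an illustration only: the text does not state which splitting criterion was configured in the CART software, so Gini impurity (a common choice for classification trees) is assumed, and all names and the toy values are hypothetical.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 outcome labels (1 = LDA achieved)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: List[float], labels: List[int]) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split.

    Returns None when all values are identical and no split is possible.
    """
    n = len(values)
    best = None
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical week 12 DAS28 values and week 52 LDA status for six patients.
toy_das28 = [2.1, 2.8, 3.0, 4.5, 5.2, 5.9]
toy_lda = [1, 1, 1, 0, 0, 0]
toy_result = best_split(toy_das28, toy_lda)
```

A full tree would repeat this search over every candidate predictor and then recurse into each resulting subgroup.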
Patients predicted to have LDA at week 52 by the predictor variables were classified as responders and patients predicted to not achieve LDA at week 52 were classified as nonresponders. The remaining patients (who had an approximately 40%–60% predicted likelihood of achieving LDA) had an unclear likelihood of response and were classified as indeterminate responders needing additional time on treatment.
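The three-way categorization can be sketched as a simple rule on the predicted likelihood of LDA in a patient's terminal node. This is a hedged illustration: the text gives the indeterminate band only as approximately 40% to 60%, so the exact cutoffs and the handling of the boundaries below are assumptions, and the function name is hypothetical.

```python
def categorize(predicted_lda_probability: float,
               lower: float = 0.40, upper: float = 0.60) -> str:
    """Map a node's predicted likelihood of LDA to one of three groups.

    The 0.40 and 0.60 defaults mirror the approximate band in the text;
    treating the boundaries as indeterminate is an assumption.
    """
    if not 0.0 <= predicted_lda_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if predicted_lda_probability > upper:
        return "responder"
    if predicted_lda_probability < lower:
        return "nonresponder"
    return "indeterminate"
```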
Two analyses were performed: the primary analysis used LDA based on DAS28 at 52 or 48 weeks; secondary analyses used LDA based on CDAI at 52 or 48 weeks and remission at 52 or 48 weeks. A 10-fold cross-validation technique [9, 10] was used to guard against model overfitting, a potential problem in prediction models in which the model fits the dataset used to derive it but would fit other datasets less well. Misclassification penalties of 3:1 were implemented, placing greater emphasis on correctly classifying patients who were predicted to be nonresponders. This procedure optimizes the prediction model for patients predicted to be nonresponders, since it is these patients for whom a decision to change the treatment regimen at 12 weeks would likely be made.
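The two safeguards described above can be illustrated in a few lines. This sketch is not the CART software's implementation. It assumes that the 3:1 penalty means that labeling a true responder as a nonresponder costs three times as much as the opposite error (one reading of the text), and all function names are hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; each index is in exactly one test fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def node_label(n_responders: int, n_nonresponders: int,
               cost_false_nonresponder: float = 3.0,
               cost_false_responder: float = 1.0) -> str:
    """Label a node with the class that has the lower total misclassification cost."""
    # Calling the node "nonresponder" misclassifies every true responder in it.
    cost_if_nonresponder = cost_false_nonresponder * n_responders
    # Calling the node "responder" misclassifies every true nonresponder in it.
    cost_if_responder = cost_false_responder * n_nonresponders
    return "nonresponder" if cost_if_nonresponder < cost_if_responder else "responder"
```

Under these assumed costs, a node is labeled nonresponder only when fewer than one quarter of its patients responded, which is one way a penalty ratio makes the nonresponder prediction more conservative.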
Finally, we examined the tradeoff between the degree of accuracy of a prediction model that might be minimally acceptable to a clinician and the resulting proportion of patients who could be classified with that amount of accuracy. The best-performing decision tree derived from the combination etanercept plus methotrexate data set was evaluated using 1000 simulated data sets with bootstrapping techniques. Patients receiving etanercept plus methotrexate were sampled 1000 times with replacement to generate 1000 bootstrap samples of equal size to the original TEMPO etanercept plus methotrexate arm. For each of the 1000 samples, the performance of the decision tree was evaluated iteratively by varying the level of required accuracy to range from 50% to 100%. Accuracy was defined as the proportion of patients who could be correctly classified by the decision tree. The proportion of the patients in each node of the tree who could be classified with the required amount of accuracy relative to the total sample size was then plotted against the accuracy level for that iteration. The process was then repeated for the remainder of the 1000 data sets. All points were plotted and a LOESS smoothing curve was fitted; this figure described the proportion of the population that could be predicted to have LDA at 1 year as a function of the accuracy required for that prediction.
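The bootstrap procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: each patient is reduced to a hypothetical pair (terminal node, whether the tree's prediction was correct), required accuracy is stepped in 5% increments, the toy data are invented, and the LOESS smoothing step is omitted.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Patient = Tuple[str, bool]  # (terminal node id, prediction was correct)


def proportion_classifiable(sample: List[Patient], required_accuracy: float) -> float:
    """Share of the sample that falls in nodes whose accuracy meets the required level."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for node, is_correct in sample:
        totals[node] += 1
        correct[node] += int(is_correct)
    qualifying = sum(n for node, n in totals.items()
                     if correct[node] / n >= required_accuracy)
    return qualifying / len(sample)


def bootstrap_tradeoff(patients: List[Patient], n_boot: int = 1000,
                       seed: int = 0) -> List[Tuple[float, float]]:
    """Return (required accuracy, proportion classifiable) points over all bootstrap samples."""
    rng = random.Random(seed)
    levels = [a / 100 for a in range(50, 101, 5)]  # 50% to 100% required accuracy
    points = []
    for _ in range(n_boot):
        # Resample with replacement to the same size as the original arm.
        sample = rng.choices(patients, k=len(patients))
        for level in levels:
            points.append((level, proportion_classifiable(sample, level)))
    return points


# Hypothetical data: three terminal nodes with different accuracies.
toy_patients: List[Patient] = (
    [("A", True)] * 55 + [("A", False)] * 5
    + [("B", True)] * 24 + [("B", False)] * 6
    + [("C", True)] * 5 + [("C", False)] * 5
)
```

Plotting the returned points and fitting a smoother would give a curve of the kind described, with the classifiable proportion falling as the required accuracy rises.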
Patient demographics and disease characteristics at baseline by treatment group and LDA status at week 52 or 48 as defined by DAS28 are shown in Table 1. Thirty-nine percent of patients receiving etanercept and 60% of patients receiving etanercept plus methotrexate achieved an LDA response at week 52 based on DAS28. Demographic and clinical characteristics were similar across treatment groups and between responders and nonresponders at baseline.
High concordance between LDA assessed by DAS28 and LDA assessed by CDAI at week 52 was demonstrated by the kappa (κ) coefficients: κ = 0.64 for patients receiving etanercept; κ = 0.78 for patients receiving etanercept plus methotrexate.
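For readers unfamiliar with the kappa coefficient, the sketch below shows how Cohen's kappa is computed from a 2×2 agreement table. The underlying tables for this study are not reported in the text, so the function name and any counts passed to it are hypothetical.

```python
def cohen_kappa(both_yes: int, only_first: int, only_second: int, both_no: int) -> float:
    """Cohen's kappa for two binary ratings (e.g. LDA by DAS28 vs LDA by CDAI).

    both_yes / both_no: patients on whom the two definitions agree;
    only_first / only_second: patients classified as LDA by one definition only.
    """
    n = both_yes + only_first + only_second + both_no
    observed = (both_yes + both_no) / n
    p_first_yes = (both_yes + only_first) / n
    p_second_yes = (both_yes + only_second) / n
    # Agreement expected by chance if the two ratings were independent.
    expected = p_first_yes * p_second_yes + (1 - p_first_yes) * (1 - p_second_yes)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)
```

Kappa is 1 for perfect agreement and 0 for agreement no better than chance.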
LDA in patients receiving etanercept plus methotrexate was predicted by DAS28 at week 12 and change in DAS28 from baseline at week 8 (Figure 1A). Patients receiving etanercept plus methotrexate were categorized into 3 groups by 12 weeks: responders (63% of all patients, 81% accuracy); nonresponders (25% of all patients, 88% accuracy); and patients with an indeterminate likelihood of response (12% of all patients).
Response to therapy in patients receiving etanercept plus methotrexate was also predicted by CDAI at week 12 and swollen joint count (SJC) at week 8 (Figure 1B). Patients were categorized at week 12: responders (54% of all patients, 94% accuracy); nonresponders (29% of all patients, 80% accuracy); and patients with an indeterminate likelihood of response (17% of all patients).
In summarizing the two models presented in Figures 1A and 1B, 83% to 88% of patients could be classified as responders or nonresponders by 12 weeks. The accuracy of prediction for these individuals was approximately 85%. For the remaining 12% to 17% of patients, additional time on therapy would be necessary to determine their treatment response at 1 year.
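The summary figures can be checked with simple arithmetic on the group sizes and accuracies reported for Figures 1A and 1B. The text does not say how the overall figure of approximately 85% was derived; the sketch below assumes a size-weighted average of the responder and nonresponder accuracies, which is only one plausible reading, and the names are hypothetical.

```python
from typing import List, Tuple


def pooled_accuracy(groups: List[Tuple[float, float]]) -> Tuple[float, float]:
    """groups: (share of all patients, accuracy) for each classified group.

    Returns (total share classified, size-weighted accuracy among those classified).
    """
    classified = sum(share for share, _ in groups)
    weighted = sum(share * accuracy for share, accuracy in groups)
    return classified, weighted / classified


# Responder and nonresponder groups as reported in the text.
das28_model = [(0.63, 0.81), (0.25, 0.88)]  # Figure 1A
cdai_model = [(0.54, 0.94), (0.29, 0.80)]   # Figure 1B

das28_share, das28_accuracy = pooled_accuracy(das28_model)
cdai_share, cdai_accuracy = pooled_accuracy(cdai_model)
```

Under this assumption the pooled accuracy is about 83% for the DAS28 model and about 89% for the CDAI model, which brackets the approximate figure given above.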
The model that predicted remission at 1 year is shown in Figure 2. A total of 26% of patients receiving etanercept plus methotrexate achieved remission. The key predictor variables included tender and swollen joint count (both at week 12), patient pain (at week 12), and CRP (measured at week 4). At 12 weeks, 95% of patients could be classified as responders or nonresponders. The accuracy of prediction for patients classified at week 12 as nonresponders (58% of all patients) was 98%.
Response to therapy in patients receiving etanercept monotherapy was predicted by DAS28 at week 12 and tender joint count at week 8 (Figure 3A). Patients receiving etanercept monotherapy were categorized into 3 groups by 12 weeks: responders (53% of all patients, 61% accuracy); nonresponders (39% of all patients, 93% accuracy); and patients with an indeterminate likelihood of response (only 8% of all patients).
The second model constructed using CDAI is shown in Figure 3B. Response in patients receiving etanercept monotherapy was predicted by HAQ-DI at week 8, change in CDAI at week 4, and CDAI at week 12. Patients were categorized by week 12 as responders (49% of all patients, 81% accuracy); nonresponders (27% of all patients, 91% accuracy); and patients with an indeterminate likelihood of response (24% of all patients). Too few patients in the etanercept monotherapy arm achieved remission to warrant deriving a prediction model for this treatment group.
As shown in Figure 4, there was a tradeoff between accuracy and the proportion of patients who could be classified with that amount of accuracy. Simulations represented in the cluster of points on the left side of the figure show that only 25% of the patients sampled from the bootstrapped TEMPO study population could be classified with approximately 90% accuracy. Assuming a willingness to tolerate somewhat lesser accuracy of 80% to 85%, the substantial majority of the study population sampled from TEMPO could be classified by week 12; indeed, approximately 60% to 65% of the TEMPO patients could be classified with 85% accuracy, and 85% to 90% of patients could be classified with 80% accuracy.
Patients with RA have a heterogeneous pattern of response to currently available therapy, including TNF inhibitors and other DMARDs. This is likely due to the variable course of RA. Researchers have proposed that predicting which patients will develop aggressive versus mild disease is important in order to tailor therapy that will be promising and predictable. Whereas baseline disease activity may assist in selection of the type of treatment used to treat RA, baseline measures are generally inadequate to predict treatment outcome, as we have also shown here (Table 1). For that reason, we built DAS28- and CDAI-based models using early treatment response (through 12 weeks) to show that approximately 80% to 90% of patients in TEMPO could be classified as responders or nonresponders with respect to achieving LDA at 1 year. For the remaining approximately 15% of patients, additional time on therapy would be needed to determine their longer term treatment response. Substituting the alternate outcome of RA remission for LDA, the proportion of patients able to be classified at week 12 was high (95%); overall model accuracy was similar to the prediction models with LDA as the outcome. The accuracy for predicted nonresponders (98%) using the remission outcome was higher than for the LDA outcome.
Results from a study by Verstappen et al showed that early response to nonbiologic DMARD therapy in the first year, rather than the kind of initial treatment given, predicted disease remission in patients with early RA. Similarly, in patients receiving anti-TNF therapy, the likelihood of continuation of treatment was predicted by the response that they had during the first 3 months of treatment. This finding suggests that decision-making regarding continuation of nonbiologic and biologic DMARD therapy might be considered as early as 3 months, which led to our decision to use the response to treatment through 12 weeks as the key predictor variables.
Aletaha et al analyzed pooled data from several clinical trials of patients with early and established RA.[5, 16–19] Similar to our results, they found that disease activity after 3 months of DMARD or anti-TNF therapy, but not at baseline, determined the treatment response at one year. We have extended those findings to be able to provide a clinically useful, albeit preliminary, decision tree to classify patients as being responders, nonresponders, or those for whom more time (beyond 12 weeks) is needed to predict response at 1 year. In contrast to that prior report, our results incorporate multiple predictors at different time points to improve the validity of prediction and increase the proportion of patients for whom such a prediction can be made by week 12.
Three other studies specifically examined the ability to predict treatment outcomes based on short-term response to treatment with anti-TNF therapies. Gülfe et al found that response to treatment as early as 6 weeks predicted continuation of current therapy at 3 months. Pocock et al found that a substantial number of patients who had not achieved response at 3 months of treatment were able to continue treatment and achieve a response at 6 months, supporting a need for longer therapeutic trials before discontinuation in some patients. This result is also supported by a previously published study by Kavanaugh et al, which also suggested that some patients who did not respond by 3 months did achieve some response by 6 months. Our results provide guidance on which patients need more time beyond 12 weeks to make a clinical decision, and which patients probably do not. Patients we classified as ‘indeterminate,’ comprising only approximately 15% of the TEMPO population, likely would benefit from additional time on therapy. In contrast, we were able to accurately predict treatment response for the remainder of the patients (85%) by 12 weeks; for those predicted to be nonresponders at 12 weeks, a decision might be made to switch treatment regimens, and no additional time on therapy would be necessary.
We based our prediction model on clinical and laboratory-based factors. The identification of surrogate markers of response with genetics or proteomics that might assist in predicting treatment response or response to a particular therapy is an attractive possibility but has been elusive. The current lack of genetic markers or biomarkers with the ability to discriminate between patients who will or will not respond to therapy places the emphasis on clinical evaluations and patient-reported measures. We view our LDA response models as providing a useful framework to which more sophisticated biomarker-based predictors might be added.
The CART methodology used in our study has been shown to provide more robust analyses of data containing nonlinear features, collinearity, and interactions than conventional logistic regression analyses. This nonparametric tree-based method of modeling is useful to identify the best predictors of treatment response and has a simple and visual interpretation. The validity and reproducibility of the CART decision trees are enhanced by the cross-validation technique we used, which provides an estimate of how well any classification tree performs on similar but non-identical datasets. However, despite efforts to avoid model overfitting, all prediction models, including these, should be revalidated in an independent data set.
There is a trade-off between the accuracy of prediction and the proportion of patients whose treatment response can be predicted with that degree of accuracy. As we have shown in a simulated data example in Figure 4, and using one of the prediction models we derived, there were patients for whom we could predict treatment response or nonresponse with 90% or greater accuracy; only about 25% of a population such as the patients enrolled in TEMPO could be predicted with this amount of accuracy. A higher proportion of patients could have their treatment response predicted if lower amounts of accuracy are acceptable (e.g. 80% to 85%). The degree of accuracy that clinicians require to make a treatment change at 12 weeks for those predicted to be nonresponders is clinician-dependent, but it is reassuring that the prediction accuracy of most of the nonresponder groups in our models was approximately 90%, and was even higher (98%) for the remission outcome. Further illustrating the tradeoff between the accuracy of prediction and the proportion of patients able to be classified, an analysis by van der Heijde showed that patients who did not achieve a change in their DAS28 of greater than 1.2 units at 12 weeks after starting certolizumab pegol had only a 1% likelihood of achieving LDA at 1 year; however, only 13% of patients in the study could be classified as nonresponders using this criterion.
In summary, approximately 80% to 90% of patients in the TEMPO study who initiated etanercept with or without methotrexate could be classified within 12 weeks of starting therapy as likely to have a good response or not at week 52. Additional time on therapy was needed to determine whether to continue or discontinue etanercept for the remaining 10% to 20% of patients. This exploratory decision tree is specific to our study population; we expect that prediction models may need to vary based on the RA patient population (early versus established disease), the level of baseline disease activity (high versus moderate/low), biologic-naïve versus biologic-experienced patients, and perhaps even the specific agent or treatment regimen used. Separate prediction models might be necessary for each of these different RA patient populations; this remains to be tested. However, at least for RA patients whose clinical and disease characteristics are similar to patients enrolled in TEMPO, these models may assist the clinician in assessing the likelihood of response to TNF inhibitor therapy. This could aid clinical decision-making at an earlier time point. Additional prediction models built with easily measured clinical and laboratory data will likely provide a useful framework to allow for adding predictors of response, including biomarker data.
We thank Edward Mancini and Larry Kovalick of Amgen Inc. and Julia R. Gage on behalf of Amgen Inc. for assistance with writing the manuscript.
Dr. Curtis has received research grants from Amgen, Genentech, Bristol-Myers Squibb, Abbott, Centocor, and CORRONA, and has consulting arrangements/honoraria from Genentech, UCB, Centocor, Amgen, and CORRONA. Dr. Park and Ms. Bitman are compensated employees and shareholders of Amgen Inc. Mr. Wang is a compensated contractor of Amgen Inc. Dr. Kavanaugh has received research grants from Amgen.
This study was sponsored by Immunex, a wholly owned subsidiary of Amgen Inc., and by Wyeth, which was acquired by Pfizer Inc. in October 2009. Data were obtained from Wyeth (Pfizer). Dr. Curtis receives support from the Agency for Healthcare Research and Quality (R01HS018517) and the NIH (AR053351).
Mr. Yang, Dr. Chen, and Dr. Navarro have no competing interests to disclose.