To analyze the minimum clinically important improvement (MCII) of disease activity measures in rheumatoid arthritis (RA) using patient-derived anchors, and to assess whether criteria for improvement differ with baseline disease activity.
We used data from a Norwegian observational database comprising 1,050 patients (73% women, 65% rheumatoid factor-positive, mean duration of RA 7.7 years). At 3 months after initiation of therapy, patients indicated whether their condition had improved, had considerably improved, was unchanged, had worsened, or had considerably worsened. We used receiver operating characteristic curve analysis to determine the MCII for the Disease Activity Score based on the assessment of 28 joints (DAS28), the Simplified Disease Activity Index (SDAI), and the Clinical Disease Activity Index (CDAI), and analyzed the effects of different levels of baseline disease activity on the MCII.
On average, patients started with high disease activity and improved significantly during treatment (American College of Rheumatology 20%, 50%, and 70% improvement criteria responses were 37%, 17%, and 5%, respectively). The overall mean (95% confidence interval [95% CI]) thresholds for MCII after 3 months for the DAS28, SDAI, and CDAI were 1.20 (95% CI 1.18–1.22), 10.95 (95% CI 10.69–11.20), and 10.76 (95% CI 10.49–11.04), respectively, and the mean (95% CI) thresholds for major responses were 1.82 (95% CI 1.80–1.83), 15.82 (95% CI 15.65–16.00), and 15.00 (95% CI 14.82–15.18), respectively. With increasing disease activity, much higher changes in disease activity were needed to achieve MCII according to patient judgment.
The perception of improvement of disease activity of patients with RA is considerably different depending on the disease activity level at which they start.
Disease activity assessment in rheumatoid arthritis (RA) is complex and requires the use of a number of different measures (1), ideally combined in scores, criteria, or pooled indices (2–4). Several of these indices have been developed over the years and are frequently used in clinical trials and practice (5–10). The use of such indices to follow disease activity over time has become a very important aspect in the care for patients with RA (11–14).
The most commonly employed response criteria in clinical trials are those by the American College of Rheumatology (ACR) (8), which have been derived based on the discrimination of responses between treatment with active drugs and placebo. The response criteria of the European League Against Rheumatism (EULAR) were developed with the concept that not only the change in disease activity upon therapeutic intervention was important, but potentially also the disease activity state reached (15). The importance of integrating response to therapy with the disease activity state attained is, however, still a matter of debate, and the newest proposed revision of the ACR criteria has maintained its focus on response (16). However, it has recently been shown that even if the same level of response is reached, radiographic and functional outcomes differ significantly depending on the disease activity state attained (17).
Recently, the perspectives of patients have become an increasingly relevant constituent of RA outcomes assessment (18), putting emphasis on their perceptions of improvement in their disease, and on their limitations. Although patient perspectives are per se subjective, a likely future benchmark will be to improve not only traditional objective measures of disease activity, but also patient satisfaction. It is still common to try to understand the meaning of more objective measures by mapping their levels to patient-reported outcomes such as functional measures, or to map them to a condition called the patient acceptable symptom state (19). In fact, significant advances have been made in valuing the level of subjective improvement (20).
In the present study, we addressed the question of how a decrease in disease activity, as currently measured by composite instruments, relates to patient perception of improvement. In addition, we asked whether patients with higher levels of disease activity would require the same responses on objective scales to perceive improvement. We therefore used a large observational data set of patients with RA who were newly prescribed a disease-modifying antirheumatic drug (DMARD) and assessed the level of minimum clinically important improvements (MCII) for 3 of the available composite indices, using the patients’ ratings of improvement as the anchor. We hypothesized that thresholds for improvement would be greater in patients with more active RA. We also analyzed the associations of other contextual factors, such as duration of RA, sex, and treatment regimens, with the patient-reported responses.
The data source was a Norwegian prescription data set (the Norwegian Disease-Modifying Antirheumatic Drug study) (21). The data set used for the present analyses included 1,285 patients with RA who received DMARD therapy. We identified and analyzed the first documented DMARD in each patient. For all patients, core set measures of disease activity were available at baseline and at 3 monthly followup intervals thereafter. At all visits except the baseline visit, patients were asked to assess the improvement of their disease activity on a 5-point Likert scale. The wording of the question was: “Since you started treatment in this follow-up study, has your rheumatic disease improved, been unchanged or become worse?” (originally in Norwegian: “Siden du startet behandling i denne oppfølgingsundersøkelsen, er du blitt bedre, uforandret eller verre i din revmatiske sykdom?”), and the response options were considerably better, better, unchanged, worse, and considerably worse. In the main analysis, we used the MCII at the 3-month time point for response evaluation, and the Disease Activity Score based on the evaluation of 28 joints (DAS28) (7), the Simplified Disease Activity Index (SDAI) (9), and the Clinical Disease Activity Index (CDAI) (10) as the composite disease activity measures. Additionally, we validated the results of this analysis using the response ratings at 6 months.
The principal method used to derive the cut points for MCII was a receiver operating characteristic (ROC) curve analysis. The anchor for the ROC curve analysis was the degree of improvement as reported by the patient on a 5-point Likert scale. Because we aimed to identify levels of improvement, patients who rated themselves as worse or considerably worse were excluded from analysis. For the MCII analysis, the status of the anchor was defined as follows: unchanged = no; improved or considerably improved = yes. In addition, we analyzed the level of major response at which a patient would perceive considerable improvement. Accordingly, the status of the major response anchor was defined as follows: unchanged or improved = no; considerably improved = yes. In the ROC curve analyses, the sensitivities and specificities of increasing change values in each disease activity measure were tested and plotted. In other words, with increasing changes on the disease activity measure, the specificity for the presence of a response improved and the sensitivity decreased (i.e., more and more improved patients will be missed because their measured changes were smaller). The area under the ROC curve can therefore be used to estimate the overall usefulness of a scale as a test for a patient-reported response.
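The anchor coding and the sensitivity/specificity calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the label strings, the function names `anchor_status` and `sens_spec`, and the convention that a change is counted as positive when it is at least the cut value are all assumptions.

```python
from typing import List, Optional, Tuple

# Response labels follow the wording in the text; the exact strings are assumptions.
MCII_ANCHOR = {"unchanged": False, "better": True, "considerably better": True}
MAJOR_ANCHOR = {"unchanged": False, "better": False, "considerably better": True}


def anchor_status(rating: str, major: bool = False) -> Optional[bool]:
    """Responder status for one patient; None means excluded (worse or considerably worse)."""
    table = MAJOR_ANCHOR if major else MCII_ANCHOR
    return table.get(rating)


def sens_spec(
    changes: List[float], statuses: List[Optional[bool]], cut: float
) -> Tuple[float, float]:
    """Sensitivity and specificity of the rule 'improvement >= cut' against the anchor."""
    tp = fn = tn = fp = 0
    for change, status in zip(changes, statuses):
        if status is None:
            continue  # patient excluded from the analysis
        test_positive = change >= cut
        if status and test_positive:
            tp += 1
        elif status:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one responder and one nonresponder")
    return tp / (tp + fn), tn / (tn + fp)
```

Evaluating `sens_spec` over a grid of cut values traces out the ROC curve; raising the cut can only lower sensitivity and raise specificity, which is the trade-off the paragraph describes.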
However, an ROC curve rarely provides an optimal threshold value on a tested scale. Therefore, we used the 80% specificity method, which we have also used in previous studies (22): among all cut points achieving at least 80% specificity, the one with the best sensitivity for a response was selected. In a later sensitivity analysis, we also used the maximum accuracy method, in which the cut point on the disease activity scale with the highest combination of specificity and sensitivity was selected (19). With the latter method, however, specificity can be traded for sensitivity, leading to the same accuracy but at the cost of poor comparability across different ROC curve analyses. The 80% specificity method was therefore used in the main analysis.
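As a hedged sketch of the 80% specificity rule, the function below scans the observed change values as candidate cut points and keeps the one with the highest sensitivity among those reaching the required specificity. The function name `cut_point_at_specificity`, the use of observed values as candidates, and the tie-breaking toward the smallest cut are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def cut_point_at_specificity(
    changes: List[float], responders: List[bool], min_spec: float = 0.80
) -> Optional[Tuple[float, float, float]]:
    """Return (cut, sensitivity, specificity) for the most sensitive cut
    whose specificity is at least min_spec, or None if no cut qualifies.

    changes: improvement on the index (baseline minus followup).
    responders: True if the patient anchor indicates improvement.
    """
    pos = [c for c, r in zip(changes, responders) if r]
    neg = [c for c, r in zip(changes, responders) if not r]
    if not pos or not neg:
        return None
    best = None
    for cut in sorted(set(changes)):  # ascending, so sensitivity never increases
        sens = sum(c >= cut for c in pos) / len(pos)
        spec = sum(c < cut for c in neg) / len(neg)
        # Strict '>' keeps the smallest qualifying cut when sensitivities tie.
        if spec >= min_spec and (best is None or sens > best[1]):
            best = (cut, sens, spec)
    return best
```

Fixing the specificity in this way makes cut points from different ROC analyses comparable, which is the reason given in the text for preferring it over the maximum accuracy method.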
Any method applied to obtain optimal cut points from ROC curve analysis is prone to various amounts of error. ROC curves using subjective anchors, such as patient-reported responses, especially tend to be flat in the area of interest. In other words, many adjacent cut points yield similar results (regardless of the method used to identify the best cut point). As a consequence, the cut points obtained for the composite disease activity measures from single ROC curves might not be sufficiently reliable.
To overcome this problem, we used the bootstrapping technique, by which only a random sample of the population is subjected to the ROC curve analysis (23,24). The best cut points from each of many repeated random samples of the same population can then be summarized to yield a highly reliable cut point. In our analysis, we used a 50% random sample and bootstrapped 100 times. In this way, each patient contributed on average to 50% of the analyses. Then we summarized the best cut points from all samples using their mean.
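The resampling scheme can be illustrated roughly as below. Several details are assumptions not stated in the text: the 50% samples are drawn without replacement, the confidence interval is a normal approximation around the mean of the replicate cut points, and the helper `best_cut` is a compact stand-in for the 80% specificity rule.

```python
import random
import statistics
from typing import List, Optional, Tuple


def best_cut(changes: List[float], responders: List[bool],
             min_spec: float = 0.80) -> Optional[float]:
    """Smallest cut reaching min_spec specificity (hence the most sensitive one)."""
    pos = [c for c, r in zip(changes, responders) if r]
    neg = [c for c, r in zip(changes, responders) if not r]
    if not pos or not neg:
        return None
    for cut in sorted(set(changes)):
        if sum(c < cut for c in neg) / len(neg) >= min_spec:
            return cut
    return None


def bootstrap_cut(changes: List[float], responders: List[bool], n_rep: int = 100,
                  fraction: float = 0.5, seed: int = 0
                  ) -> Tuple[float, Tuple[float, float]]:
    """Mean cut point over n_rep random subsamples, with an approximate 95% CI."""
    rng = random.Random(seed)
    n = len(changes)
    k = max(2, round(n * fraction))
    cuts = []
    for _ in range(n_rep):
        idx = rng.sample(range(n), k)  # without replacement: an assumption
        cut = best_cut([changes[i] for i in idx], [responders[i] for i in idx])
        if cut is not None:
            cuts.append(cut)
    if len(cuts) < 2:
        raise ValueError("too few usable replicates")
    mean = statistics.mean(cuts)
    sem = statistics.stdev(cuts) / len(cuts) ** 0.5
    return mean, (mean - 1.96 * sem, mean + 1.96 * sem)
```

Averaging over replicates smooths out the instability that arises when many adjacent cut points perform almost equally well on a flat ROC curve.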
We performed subgroup analyses to test our hypothesis that level of baseline disease activity has a considerable impact on the change in disease activity needed for a patient to perceive improvement. We used subgroups of patients with increasing disease activity by testing the various scores in steps of 0.5 units (DAS28) or 5 units (SDAI, CDAI), and used a ±0.5 unit (DAS28) or a ±5 unit (SDAI, CDAI) tolerance interval. For example, the group defined as having a baseline SDAI score of 20 comprised patients with a baseline SDAI score range of 15–25. For each of these subgroups the same analyses were performed as were performed for the complete cohort. The bootstrapping technique allowed reliable cut-point estimates despite the smaller numbers of patients in the various subgroups. We repeated these analyses using subgroups based on the distribution of scores (quartiles) to test whether a similar association would be found across subgroups defined in a data-driven way.
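The overlapping baseline windows can be sketched as follows. The function name `window_subgroups` and the inclusive treatment of both window boundaries are assumptions; the text specifies only the step size and the tolerance.

```python
from typing import Dict, List


def window_subgroups(baseline: List[float], centers: List[float],
                     tolerance: float) -> Dict[float, List[int]]:
    """Map each window center to the indices of patients whose baseline
    score lies within center ± tolerance (boundaries inclusive, by assumption)."""
    return {
        center: [i for i, score in enumerate(baseline)
                 if center - tolerance <= score <= center + tolerance]
        for center in centers
    }


# SDAI example from the text: the window centered on 20 covers scores 15-25.
sdai_baseline = [12.0, 15.0, 18.5, 25.0, 26.0, 40.0]
groups = window_subgroups(sdai_baseline, centers=[15, 20, 25], tolerance=5)
print(groups[20])  # indices of the patients with baseline SDAI between 15 and 25
```

Because the step equals the tolerance, neighboring windows overlap, so a patient can contribute to more than one subgroup; for the DAS28 the same call would use centers in steps of 0.5 and a tolerance of 0.5.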
To assess these associations using a purely patient-derived measure, the patient global score was used in an additional analysis. In this analysis, response levels of the patient global score were analyzed across the levels of patient global scores at baseline.
In addition to baseline disease activity levels, we also tested the associations of other contextual variables, such as disease duration, age, sex, and therapy. In each of these analyses, we stratified by baseline RA disease activity (using tertiles) to control for the presumed effects of baseline disease activity on the response levels.
To validate the results on the association of baseline disease activity levels with the MCII, we performed the same analysis using data from an independent cohort of patients with RA from the US. In that cohort, patients with active RA were enrolled if they were beginning a new treatment (prednisone or DMARDs/biologic agents) or had escalation of their current treatment. Those treated with prednisone (39 [28.3%] patients) were reassessed 1 month after entry, and those treated with new DMARDs/biologic agents (26 [18.8%] patients) and those with escalation of therapy (73 [52.9%] patients) were reassessed 4 months after entry.
At followup, 90 patients reported improvement in global status, 48 reported no change, and the remainder reported worsening in global status (and were excluded from the analysis as detailed previously). The patient anchor wording in that study differed from the Norwegian anchor wording: “Since the start of the study, OVERALL my arthritis has improved, stayed the same, gotten worse (check one).” Those who indicated improvement had to specify its importance as hardly, a little, somewhat, moderately, a good deal, very, or extremely important. Similar options were offered for those who indicated that their disease had worsened. In the context of validating the association with baseline disease activity, this difference was considered to be an advantage, as was the fact that the US patients were in general of a clearly different background from the Norwegian patients.
The patients’ characteristics at the start of a new DMARD therapy are outlined in Table 1. The average disease activity was high when DMARDs were initiated (DAS28 score 5.1, SDAI score 28.1, and CDAI score 25.5). After 3 months, 22.9% of patients reported worsening or no change in disease activity, 45.0% reported that they had improved, and 32.2% reported that they had considerably improved.
The mean (95% confidence interval [95% CI]) overall thresholds by MCII bootstrapping after 3 months for the DAS28, SDAI, and CDAI were 1.20 (95% CI 1.18–1.22), 10.95 (95% CI 10.69–11.20), and 10.76 (95% CI 10.49–11.04), respectively, and the mean levels for major responses were 1.82 (95% CI 1.80–1.83), 15.82 (95% CI 15.65–16.00), and 15.00 (95% CI 14.82–15.18), respectively (Table 2).
The overall MCII levels proved not to be representative when we performed stratified analyses by disease activity at baseline. The almost linear association between baseline disease activity and MCII for the DAS28, SDAI, and CDAI is shown in Figure 1. For example, when patients started DMARDs with moderate disease activity according to the SDAI (e.g., SDAI 15), they required a change in the SDAI of 7 to feel improved, but when they started with high disease activity (e.g., SDAI 50) they needed an SDAI improvement of 30 to perceive a similar improvement. Not unexpectedly, this is also the case for the major response (Figure 1D, E, and F) and for the other indices.
These results can be used to roughly define MCII for patients starting therapy at low, moderate, or high disease activity based on the respective definitions for the DAS28, SDAI, and CDAI. The respective cut points that would then apply are shown in Table 2. For example, the CDAI changes needed for MCII are 1.8, 7.3, and 17.8, respectively, for low, moderate, and high baseline disease activity.
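To illustrate how such category-specific thresholds might be applied to an individual patient, the sketch below uses the CDAI values quoted above. The category boundaries (low up to 10, moderate up to 22, high above 22) are the commonly cited CDAI cut points and are an assumption here, since the text refers to the definitions without stating them; the function names are hypothetical.

```python
# MCII thresholds for the CDAI by baseline category, as quoted in the text.
CDAI_MCII = {"low": 1.8, "moderate": 7.3, "high": 17.8}


def cdai_category(baseline: float, low_max: float = 10.0,
                  moderate_max: float = 22.0) -> str:
    """Baseline disease activity category.

    The boundaries are commonly cited CDAI cut points, assumed here;
    remission is folded into 'low' for simplicity.
    """
    if baseline <= low_max:
        return "low"
    if baseline <= moderate_max:
        return "moderate"
    return "high"


def achieved_mcii(baseline: float, followup: float) -> bool:
    """True if the CDAI improvement reaches the threshold for the baseline category."""
    improvement = baseline - followup
    return improvement >= CDAI_MCII[cdai_category(baseline)]


# The same absolute follow-up score can mean different things:
print(achieved_mcii(30.0, 14.0))  # improvement of 16 from a high baseline
print(achieved_mcii(18.0, 10.0))  # improvement of 8 from a moderate baseline
```

A lookup of this kind makes explicit that a fixed overall threshold would misclassify patients at both ends of the baseline range.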
To assess the association of other contextual variables with thresholds for MCII, we repeated the analyses in subgroups of patients by age (tertiles), duration of RA (tertiles), sex, and therapy (methotrexate, methotrexate plus tumor necrosis factor inhibitor, or leflunomide; all other treatment groups were too small for this analysis). The analyses were stratified by the baseline disease activity states of the respective index, which were a major determinant of the MCII and could therefore potentially confound the effects of other contextual variables. The tertiles of baseline disease activity were 1.7–20.6, 20.7–31.8, and 31.8–81.3. There was no clinically relevant difference in the MCII between the levels of the variables within each stratum of baseline disease activity (Figure 2).
We aimed to validate the findings on 4 levels: 1) using the 6-month rating of response by the patients instead of their rating at 3 months; 2) using the best-accuracy method of determining the best cut point from an ROC curve (see Patients and Methods) instead of the 80% specificity method; 3) using a completely independent cohort of patients from the US; and 4) using subgroups based on score distributions (quartiles) rather than absolute score values. To be concise, and given the comparative analysis of the main analysis for the SDAI, CDAI, and DAS28, the results of these validation analyses are shown only for the SDAI and the MCII level. In addition, we investigated the association between baseline levels of a purely patient-derived measure, the patient global score, and its cut points for MCII determined using the same methodology as we used for the composite indices. Although the overall degree of SDAI change was different, when the analysis was performed in these different settings the association remained unaffected (Figures 3A, 3B, and 3C). Importantly, this was also seen in the US cohort (Figure 3D), which reflects the views of patients who are widely different from the Norwegian patients. A similar association was also observable for the patient global assessment measure, as shown in Figure 3E.
Many instruments for disease activity assessment have been developed by employing physicians’ valuation of disease activity, response to treatment, or decision to start or change treatment as a gold standard, at least in the initial phases of their derivation and composition. Several of these indices, such as the SDAI, CDAI, and DAS28, performed very well when separately evaluated in relation to physicians’ judgment of response (16). In most cases, patient assessment of disease activity or response to therapy has been used to create new outcome measures (patient report outcomes), but rarely, if ever, have patient reports been used to evaluate threshold levels on established and frequently used measures of disease activity such as the composite indices.
Even though physicians’ global estimation of response to therapy was not used for assessment of improvement of certain composite scores, such as the DAS28, SDAI, and CDAI, these scores performed very well when evaluated in relation to physicians’ judgment of response (16). The patient’s perception has also entered the valuation of disease activity in some indices and criteria by encompassing pain and physical function assessment by the patient (ACR response criteria) (8) and/or including patient global assessment of disease activity (the ACR criteria, DAS28, SDAI, and CDAI). However, the patients’ judgment of response to therapy in itself is a fundamental component of disease management, and is currently neglected in our thinking about treatment outcomes in RA. The simple question “How are you today?” is the foundation of every interaction between physicians and patients, and the request of patients is to make them feel better. Although these patient-oriented assessments must not challenge other well-known predictors of long-term damage and patient prognosis, such as swollen joint counts (25) and composite indices (10), the patient’s perception has repeatedly been shown to likewise have pivotal predictive value (26). The combination of the advantages of composite scores and the knowledge of their meanings from the patients’ perspective ought to impart particular strength in providing care for individuals with RA.
The results of the present study reveal that patients’ perceptions of improvement differ significantly with the level of baseline disease activity; the more active their RA was at the start of DMARD therapy, the larger the absolute change in scores had to be for patients to experience it as minimum (or major) improvement. In fact, there was an almost linear correlation between the MCII and baseline disease activity. This observation was essentially made for all composite indices employed. When patient responses by objective scales or criteria are evaluated, it is important to consider the starting point of a patient in conjunction with the quantifiable changes that have been observed.
Recently published recommendations by the ACR and EULAR (27,28) encourage the inclusion of both response and state measures in reports of clinical trials in RA. Interestingly, our study suggests that the mean absolute disease activity by SDAI (i.e., the achieved state) at which patients believed that they had obtained an MCII (i.e., a response) was within the range of the moderate disease activity category, irrespective of the baseline disease activity. Therefore, although improvement constitutes a moving target by patient perception, the ensuing absolute level of disease activity appears to be fixed within a relatively narrow range. This suggests that disease activity states, rather than absolute responses at the end of a trial, may be relevant end points from a patient perspective. This is supported by a recent study in which we showed that the disease category attained in clinical trials was more relevant to progression of joint damage and disability than the degree of improvement was (17). Likewise, this supports the concept of a patient acceptable symptom state, as published by Tubach et al for 2 primarily noninflammatory rheumatic conditions, knee osteoarthritis and rotator cuff syndrome (29). In that study, greater MCII levels were required for patients with higher levels of pain or functional disability than for patients with lower levels of pain or functional disability at baseline, but the patient acceptable symptom state was relatively stable across these groups.
Importantly, attaining an MCII is likely not sufficient from the current perspectives on RA and the aim for remission, because there is still significant progression of damage and functional impairment in moderate disease activity. However, a treatment that reduces damage progression with high efficacy but does not lead to an MCII in a large proportion of patients will not be very useful from a clinical perspective. Examples of such regimens could be purely antidestructive agents. Likewise, a treatment that entails minimum improvement in many patients may lead to major improvement and even remission in some of them, and therefore is likely to be very efficacious. Using therapeutic algorithms and predictive markers of response in conjunction with the consideration of the relevance to patients will allow for optimal use of therapies, will maximize patient compliance, and will lead to improved long-term outcomes of the disease.
Our study had several limitations. First, the size of the validation cohort was small; nevertheless, the US cohort exhibited a similar association between the MCII and baseline disease activity as the original Norwegian population studied. Second, disease activity at baseline was different in the 2 cohorts; however, this difference was helpful regarding the generalizability of the study results, namely the consistent association of response levels with disease activity levels in different patient groups (Norway versus the US). A third limitation relates to the present thresholds for MCII and major response, namely to the possibility of cultural differences in the perception of improvement, which cannot be addressed in a single study. Although it is possible that the overall change needed to perceive response is different in other patient populations, it is very likely that the major finding of our study, i.e., the association of response levels with baseline disease activity, will be similar in those populations. This is, in fact, what we found in the validation analysis comparing the results from the Norwegian cohort with the results from the US. Similarly, the reported association is likely not going to depend on specific methodologic aspects (such as picking an 80% specificity cut point), which has also been shown in the sensitivity/validation analyses.
In summary, patients judge the term improvement in relation to both their absolute baseline disease activity and their absolute disease activity at the time of perceived improvement. Therefore, the proportion of patients who achieved particular disease activity states should be reported in clinical trials. This should be complemented by the proportion of patients who achieved improvement by their own judgment. To make this level comparable between different reports, patient-based anchors, as derived in the present study, need to be defined in order to be applied to evaluation of the presence or absence of patient-reported improvement. This will add an important layer to the interpretation of outcomes in clinical trials, observational studies, and clinical practice.
Dr. Kvien has received consultant fees, speaking fees, and/or honoraria (less than $10,000 each) from Roche, Merck Sharp & Dohme, Abbott, Wyeth, Bristol-Myers Squibb, and Pfizer.
AUTHOR CONTRIBUTIONS
Dr. Aletaha had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Aletaha, Smolen, Kvien.
Acquisition of data. Ward, Kvien.
Analysis and interpretation of data. Aletaha, Funovits, Ward, Smolen, Kvien.
Manuscript preparation. Aletaha, Ward, Smolen, Kvien.
Statistical analysis. Aletaha, Funovits, Smolen.