We developed a novel clinical effectiveness algorithm based on administrative data for use in future studies as a proxy for the clinical effectiveness of RA medications. In this preliminary assessment of its performance, we showed that it has acceptable sensitivity, specificity, PPV and NPV. These values, which were in the 75% to 90% range, reflect good, although not perfect, performance of our effectiveness algorithm applied to administrative claims data. By way of comparison, the corresponding performance characteristics of administrative data for a number of rheumatology conditions and procedures, including diagnoses of RA, spondyloarthropathies, systemic lupus erythematosus, fibromyalgia and osteoarthritis, as well as joint injection and joint replacement procedures [15], were similar and ranged from approximately 80% to 95%. Besides a new or worsened comorbidity, the most common reason why patients met the effectiveness algorithm criteria but failed to meet the gold standard criteria was that the physician and patient were satisfied with the level of disease activity, despite not having achieved low disease activity or an improvement in the DAS28 by ≥ 1.2 units. In this circumstance, providers may feel that the patient is getting at least some benefit from the drug and that the clinical response is adequate to continue its use. It is also possible that quantitative disease activity measures such as the DAS28 do not adequately capture underlying RA disease activity for some patients (for example, those with concomitant fibromyalgia). Moreover, patients may fear that their condition will worsen after switching to a new therapy or may be apprehensive about new side effects [28], and therefore they may be reluctant to change medications. Further studies are needed to validate the effectiveness algorithm in other data sets and RA patient populations. Nevertheless, these results are encouraging and suggest that administrative data can be used to estimate medication effectiveness for RA patients.
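The four performance measures discussed above all derive from a single 2 × 2 cross-classification of the algorithm against the gold standard. As a minimal sketch, the calculation could look like the following; the class name, function name and counts are hypothetical and chosen only for illustration, and they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-classification of the algorithm against the gold standard."""
    tp: int  # algorithm effective, gold standard effective
    fp: int  # algorithm effective, gold standard not effective
    fn: int  # algorithm not effective, gold standard effective
    tn: int  # algorithm not effective, gold standard not effective


def performance(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Illustrative counts only (assumption, not taken from the study).
example = ConfusionCounts(tp=30, fp=10, fn=10, tn=50)
metrics = performance(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```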
As our gold standard for medication effectiveness, we selected low disease activity (DAS28 ≤ 3.2) or improvement in DAS28 by > 1.2 units. It might be argued that these criteria are not stringent enough, although they are broadly consistent with (albeit not identical to) the European League Against Rheumatism (EULAR) responder definition [26]. Consistent with our focus on the DAS28, results from a preference analysis found that RA disease activity score (also measured using the DAS28) was the most important factor in rheumatologists' decisions to escalate care [29]. The results from the Consortium of Rheumatology Researchers of North America (CORRONA) registry showed that low disease activity or a DAS28 improvement > 1.2 units was sufficient for the majority of patients to continue treatment with biologic therapy [30]. As part of a sensitivity analysis, we modified our gold standard to require that patients achieve LDA (DAS28 ≤ 3.2), so that patients who achieved improvement (change in DAS28 ≥ 1.2) without reaching LDA no longer met it. This lowered the PPV, indicating that many patients had clinical improvement but did not achieve LDA. Many of these patients were continued on therapy, suggesting that in many cases both the patients and their physicians were sufficiently satisfied with the response. We also note that the DAS28 response rate (approximately 30%) (Table ) observed for our clinical effectiveness gold standard was relatively low. However, given the comorbidity profile and other characteristics of the RA patients enrolled in VARA [31], response rates would be expected to be lower than those reported in clinical trials of more selectively enrolled RA patients with fewer comorbidities [32].
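The disease activity component of the gold standard, together with the stricter LDA-only variant used in the sensitivity analysis, can be sketched as a simple rule. This is an illustration under stated assumptions: the function and constant names are hypothetical, the improvement criterion is coded as strictly greater than 1.2 units (as in the definition at the start of this paragraph), and the adherence requirement is not included here.

```python
LDA_THRESHOLD = 3.2     # DAS28 at or below this value counts as low disease activity
MIN_IMPROVEMENT = 1.2   # required drop in DAS28 from baseline (assumed strict ">")


def meets_das28_criterion(baseline_das28: float,
                          followup_das28: float,
                          require_lda_only: bool = False) -> bool:
    """Return True if the DAS28 component of the gold standard is met.

    With require_lda_only=True, improvement without LDA no longer
    qualifies, mirroring the sensitivity analysis described in the text.
    """
    achieved_lda = followup_das28 <= LDA_THRESHOLD
    if require_lda_only:
        return achieved_lda
    improvement = baseline_das28 - followup_das28
    return achieved_lda or improvement > MIN_IMPROVEMENT


# Hypothetical patient: improves by about 1.5 units but does not reach LDA.
print(meets_das28_criterion(5.6, 4.1))                         # primary definition
print(meets_das28_criterion(5.6, 4.1, require_lda_only=True))  # LDA-only variant
```

A patient like the hypothetical one above is the kind of case that moves from responder to nonresponder when the gold standard is restricted to LDA.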
Another component of our gold standard was the requirement that patients have high (that is, ≥ 80%) adherence to their medication regimen. We recognize that any threshold for adherence is arbitrary. Requiring ≥ 80% adherence is conventional and has been used when studying other conditions, such as osteoporosis and cardiovascular disease [33]. The main purpose of the adherence requirement was to focus on medication effectiveness. Medications that the patient does not continue, whether for reasons of inefficacy, safety, tolerability or something else, are not effective. Adherence has been required in other observational analyses of comparative effectiveness in RA [37]. Also, we wanted to maximize confidence that the patient's disease activity was attributable to the RA treatment started on the index date rather than to a medication that was later substituted because the medication begun on the index date had failed. Finally, the requirement of continued adherence to the RA therapy is consistent with clinical trial methodology in which patients who do not adhere to the study protocol, including continuing to take the medication, are generally excluded from the trial. These patients' outcomes are often imputed as nonresponse, which is the same classification to which they were assigned in our effectiveness algorithm.
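To make the ≥ 80% adherence threshold concrete, the sketch below computes the proportion of days covered (PDC) from pharmacy fills. This is an assumption made for illustration: the text does not state which adherence measure was used, and the function names, the fill format and the dates are all hypothetical.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, start, end):
    """Fraction of days in [start, end) with medication on hand.

    fills: iterable of (fill_date, days_supply) tuples.
    Overlapping supplies are counted once per calendar day.
    """
    total_days = (end - start).days
    if total_days <= 0:
        raise ValueError("end must be after start")
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day < end:
                covered.add(day)
    return len(covered) / total_days


def is_adherent(fills, start, end, threshold=0.80):
    """Apply the conventional >= 80% threshold discussed in the text."""
    return proportion_of_days_covered(fills, start, end) >= threshold


# Hypothetical 100-day observation window with three 30-day fills.
window_start, window_end = date(2020, 1, 1), date(2020, 4, 10)
fills = [(date(2020, 1, 1), 30), (date(2020, 2, 5), 30), (date(2020, 3, 10), 30)]
pdc = proportion_of_days_covered(fills, window_start, window_end)
print(f"PDC = {pdc:.0%}, adherent = {is_adherent(fills, window_start, window_end)}")
```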
Although many of the elements of our effectiveness algorithm are intuitive, a few deserve special mention. The requirement that patients not initiate or escalate the dose of oral glucocorticoids assumes that the dominant prescribing indication for glucocorticoids is RA. For patients who may have another indication for glucocorticoids (for example, chronic obstructive pulmonary disease, which is very common in VHA patients), this criterion may not perform optimally. As described in Table , this issue was the most common reason why patients failed the effectiveness algorithm. Our algorithm might be expected to perform better in other RA populations that have been shown to have a lower prevalence of comorbidities for which systemic glucocorticoids are used [31]. We also limited the allowable number of intraarticular injections to no more than 1 unique day on which the patient received such injections. VA physicians are not directly compensated for these injections and other procedures and therefore are likely to underreport them. For this reason, our effectiveness algorithm may perform better in settings where there is a financial incentive to code these procedures more accurately. We also found that certain comorbidities (for example, fibromyalgia and depression) were common, and we hypothesized that they might be associated with high patient global scores even if the patient's RA was under good control. This is not a unique feature of the VARA cohort or our study, but it is potentially problematic for the measurement of patient-reported outcomes in all RA studies that include patients with these conditions. Restricting the population to individuals without these comorbidities improved the PPV of our effectiveness algorithm by 6%, but it limited generalizability because it excluded one-third of our data.
The strengths of our study include evaluation of a large number of patients participating in an RA registry at 11 VA medical centers. All patients had rheumatologist-confirmed RA and well-characterized measures of RA disease activity. The novel linkage between the registry and the national VHA administrative data made the development and testing of our effectiveness algorithm possible. Additionally, there are strong financial incentives for RA patients to fill their biologic medications within the VHA system, and it is likely that most if not all RA medications were captured in the VHA administrative data. Despite these strengths, we acknowledge the potentially limited generalizability of patterns of care in the VHA system, and the possible dissimilarity between the RA patients who receive treatment in that system and other RA populations. However, sensitivity and specificity, unlike PPV and NPV, should be less dependent on the prevalence in the population and more reflective of the test itself, thereby decreasing the impact of any unique features of the VA population. Moreover, we might expect the PPV and NPV of the algorithm to be better in other RA cohorts, given the higher prevalence of comorbidities in this VARA population compared to other RA cohorts [31]. We also acknowledge that, although the effectiveness algorithm, which was based upon factors selected from content knowledge, appeared to perform well and have good face validity in VARA, further validation is needed to confirm its robustness, both in more recently recruited VARA participants who were not included in our sample and in different RA cohorts where there is a link to administrative data. We also recognize that using more empirical approaches to let the data guide optimization of the algorithm would be desirable, but substantially more data would be required for this approach and for validation. Finally, as an additional opportunity to extend the algorithm in the future, we note that our effectiveness outcome was measured at 1 year, and assessing effectiveness at other time points (for example, at 6 and 24 months) is important. Although we expect similar performance of the algorithm at these different time points, this hypothesis remains to be confirmed.
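The point made above, that sensitivity and specificity depend less on prevalence than PPV and NPV do, can be illustrated by holding the first two fixed and recomputing the predictive values at different prevalences of true effectiveness. The sketch below uses Bayes' rule; the function name and all input values are hypothetical and are not estimates from this study.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) implied by fixed sensitivity and specificity.

    prevalence is the proportion of patients who are truly effective
    responders according to the gold standard.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test characteristics, evaluated at three prevalences.
SENSITIVITY, SPECIFICITY = 0.75, 0.90
for prevalence in (0.2, 0.3, 0.5):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With sensitivity and specificity held constant, PPV rises and NPV falls as the prevalence of true responders increases, which is why predictive values estimated in one cohort may not transfer directly to a cohort with a different response rate.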