An evaluation of the predictive validity of different candidate definitions of remission in the ESPOIR cohort, a practice-based observational study, shows that the new ACR/EULAR trial-based remission definitions have high predictive validity for good outcomes in clinical practice.
Practice-based definitions suggested by the committee focused on those definitions that did not include acute phase reactants, which were felt to be difficult to obtain during a clinic visit. Our analyses validated the committee's choices and suggested that the recommended definitions would perform well in practice. While the definitions recommended by the committee did not necessarily have the highest positive predictive values and lowest P-values of all those tested, they performed well: using these definitions, the proportion of patients in remission who subsequently did well was within 2% of that for the top-performing definitions (see Table ).
Predictive validity was a critical element in the selection of definitions of remission for RA. It was felt by the ACR/EULAR committee that persons in remission at one time point should have more favorable later RA outcomes than persons not in remission. Our data suggest that the RA patients who attain the recommended definitions of remission in practice have a high likelihood of good future outcomes. As in the trial data analysis, our results suggest that the Boolean and SDAI/CDAI definitions of remission are better at predicting good outcomes than disease activity score (DAS)-based definitions of remission.
One other study has examined the relationship between different definitions of RA remission and functional and radiographic status [11]. While also focused on patients with recent onset disease, this used data from a trial, the BeSt study, which was not conducted in a practice-based setting. In this study, versions of the DAS were as strongly associated with good outcomes as the ACR/EULAR-recommended definitions. It should be noted that the design of the BeSt study, in which patient treatment was guided by the DAS score [12], dictates that outcomes cannot be independent of the DAS score, and this makes it likely that DAS scores would be associated with major outcomes. Thus, it is not surprising that in this analysis of BeSt, being in DAS remission portended good future outcomes.
Evaluating predictors of outcomes in observational studies like ESPOIR is not as straightforward as doing these analyses in clinical trials. First, it is hard to define a fixed time of remission. Among patients in some cohorts (although not in ESPOIR) we do not know when RA treatment was initiated; the majority might have already received multiple RA treatments before they enter the study. Second, some patients might already be in remission when they enter the study. Since the treatment protocol is not controlled as in a trial, it does not make sense to set a fixed time point after cohort entry as the time of remission, because some patients will reach remission at other time points. Also, because patient population heterogeneity is higher in observational studies than in clinical trials, some patients may respond more slowly or more quickly to new treatments than those in a trial.
There are also challenges in choosing a time point for "non-remission" so that valid comparisons can be made between non-remission and remission groups. The time to a non-event, that is, time to non-remission for a patient in an observational study, is either impossible to define or can only be arbitrarily defined. For those never reaching remission, it can be any time from baseline to the end of the study. Arbitrary fixing of the time of non-remission may introduce bias. Besides, since the time of remission for patients achieving remission is dynamic, it makes sense not to fix the non-remission time. We used bootstrap methods to create samples of non-remission patients. The well established advantages of bootstrap methods are that 1) they require fewer assumptions (for example, normality of the parameter estimates is not required); 2) they produce more precise and stable (that is, valid) parameter estimates than classical methods; 3) standard errors, confidence intervals and other parameters are easy to derive based on the distribution of the bootstrap parameter estimates to make inferences; and 4) the results are stable [6
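The general idea of not fixing the non-remission time can be illustrated with a minimal sketch. It is not the procedure used in this study. It assumes a hypothetical data layout in which each non-remission patient contributes a list of 0/1 good-outcome flags, one per candidate reference visit. The function name, parameters, and example data are all illustrative assumptions. In each bootstrap replicate, patients are resampled with replacement and a random reference visit is drawn for each, and a percentile interval is then read off the distribution of the replicate estimates.

```python
import random


def bootstrap_nonremission(patients, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap for the proportion with a good outcome.

    patients: list of lists (hypothetical layout); each inner list holds
    0/1 good-outcome flags, one per candidate reference visit.
    Returns (mean of bootstrap estimates, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(patients)
    estimates = []
    for _ in range(n_boot):
        good = 0
        for _ in range(n):
            # Resample a patient with replacement.
            visits = patients[rng.randrange(n)]
            # Draw a random reference visit instead of fixing one.
            good += rng.choice(visits)
        estimates.append(good / n)
    estimates.sort()
    # Percentile interval taken from the sorted bootstrap estimates.
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    mean = sum(estimates) / n_boot
    return mean, lower, upper


# Made-up illustrative data: five patients with one to three visits each.
example_patients = [[0, 0, 1], [1], [0, 1], [0, 0], [1, 1, 0]]
estimate, ci_low, ci_high = bootstrap_nonremission(example_patients)
print(estimate, ci_low, ci_high)
```

Because the interval comes directly from the sorted bootstrap estimates, no normality assumption on the estimate is needed, which is the first advantage listed above.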
There are a number of limitations to our study. First, we limited our comparison of candidate definitions of remission to those evaluating predictive validity. Other considerations are important too, such as face and content validity, feasibility, and reproducibility. The ACR/EULAR committee considered some but not all of these in its deliberations. Remission was defined by the ACR/EULAR committee using a data-driven consensus process. This type of process, used for all consensus efforts in rheumatology, combines expert opinion and data analysis. Prior to data analysis, there was input from the committee (which included experts in RA research) as to which measures should be included in a remission definition; the committee dictated that swollen and tender joint counts and CRP were mandatory. Although we added other variables to these, this committee decision determined subsequent variable selection and heavily influenced the selection of candidate remission definitions. This could be regarded as controversial; other recommendations, or even an agnostic variable-driven approach, might have produced a different definition of remission, and our analyses in this paper might also have tested other options as definitions, ones not considered by the ACR/EULAR committee.
Another limitation is that we studied only one observational cohort. More observational studies from different geographical regions and study populations are needed to make sure that our single validation is generalizable to other samples of patients. However, the data in this study are from a comprehensive, large nationwide cohort of persons with RA.