To prospectively validate the provisional criteria for the evaluation of response to therapy in children with systemic lupus erythematosus (cSLE).
In this multi-center study cSLE patients (n=98; F: M = 81:17; 50% white; 88% non-Hispanic) were followed every 3 months for up to 7 visits (total number of visits 623). The five cSLE core response variables were obtained at the time of each visit: (1) physician assessment of overall disease activity; (2) parent assessment of patient well-being; (3) Child Health Questionnaire; (4) proteinuria; and (5) global disease activity measure score as measured by the ECLAM, SLEDAI, or SLAM. Physician-rated relevant changes in the disease course (clinically relevant improvement; no change in disease, or worsening) between visits served as the criterion standard. Mixed models were used to assess the diagnostic accuracy of the four highest-ranked provisional definitions of response to therapy.
There were 89 episodes of clinically relevant improvement between two consecutive visits, and 448 episodes without improvement. Irrespective of the choice of the global disease activity measure (ECLAM, SLAM, SLEDAI), sensitivities of all four highest-ranked definitions were low (all ≤ 31%), while their specificities were excellent (all > 88%). Using logistic models, alternative definitions can be developed with both 80% sensitivity and specificity.
The provisional criteria of response to therapy in cSLE may have considerably lower sensitivity than previously reported. Additional validation in clinical trials is necessary to further evaluate the measurement properties of the provisional PRINTO/ACR Criteria for Response to Therapy in cSLE.
Systemic lupus erythematosus (SLE) is a complex, chronic multi-system autoimmune inflammatory disease, and up to 20% of SLE patients are diagnosed during childhood, i.e. prior to the age of 16 years (cSLE) (1, 2). Compared to adults with SLE, patients with cSLE more often have severe disease phenotypes, including a higher prevalence of kidney involvement (3).
Highly sensitive and specific surrogate markers are needed to serve as primary outcome measures of clinical trials of cSLE that study the efficacy of novel medications. The lack of validated surrogate markers is considered a major barrier to the testing of safer and more effective therapies for cSLE (4).
Using consensus methodology, the ‘PRINTO/American College of Rheumatology (ACR) Provisional Criteria for the Evaluation of Response to Therapy’ for children with cSLE were developed. Initial studies suggest that these criteria can measure response to therapy (or clinically relevant improvement) of individual patients with high sensitivity and specificity, using an algorithm that considers percentage changes of five cSLE core set parameters (5). Parameters include the score of an index of global disease activity, physician assessment of overall disease activity, parent assessment of patient overall well-being, proteinuria, and patient health-related quality of life (HRQoL) (6).
The Classification and Response Criteria Subcommittee of the Committee on Quality Measures of the ACR pointed out that validation of any outcome measures or response definition is a dynamic process. Confirmatory studies are mandated to substantiate the usefulness of response criteria in other patient cohorts and by using different raters than those involved in the criteria development (7).
Therefore, we undertook a prospective cohort study to corroborate the measurement properties of the ‘PRINTO/ACR Provisional Criteria for the Evaluation of Response to Therapy’. We specifically investigated their sensitivity, specificity, positive and negative predictive values for identifying cSLE patients who have experienced clinically relevant improvement.
Children (n=98) fulfilling American College of Rheumatology Classification Criteria for SLE (2) prior to the age of 16 years were recruited consecutively during routine clinic visits at seven academic pediatric rheumatology centers in the United States. Study visits occurred every three months for up to 18 months; each time height, weight, findings on physical examination and medication regimens were recorded, and disease activity and health-related quality of life (HRQoL) were measured.
In brief, changes in five cSLE core parameters are used to define improvement with cSLE: (1) physician assessment of overall disease activity as measured on a visual analog scale (VAS) with a range from 0 to 10 (0 = inactive disease; 10 = very active disease); (2) parent assessment of patient overall well-being as measured on a VAS with a range from 0 to 10 (0 = very poor; 10 = very well); (3) global disease activity as measured by a validated disease activity index (see details below); (4) HRQoL as measured by the Child Health Questionnaire (CHQ-PF50) physical summary score (CHQ-PHS; see details below); and (5) renal involvement as measured by daily proteinuria (5).
Consensus methodology and data-driven validation resulted in several proposed candidate criteria of improvement; the four highest ranked criteria (a – d) were tested for this study in more detail: (a) Improvement of two of any five core variables by ≥ 50% without worsening of more than one by ≥ 30% and without increase in proteinuria; (b) Improvement of two of any five core variables by ≥ 40% without worsening of more than one by ≥ 30% without increase in proteinuria; (c) Improvement of three of any five core variables by ≥ 30% without worsening of more than one by ≥ 30% without increase in proteinuria; (d) Improvement of two of any five core variables by ≥ 50% without worsening of more than two by ≥ 30% without increase in proteinuria.
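The four candidate definitions above can be read as a small rule set over percentage changes between two visits. The following is a minimal sketch of that reading, not the authors' implementation. It assumes that "improvement of two of any five" means at least two, that percentage change is computed relative to the previous visit, that a previous value of zero counts as no change, and that the direction of improvement follows the scale anchors given in the text. All variable and function names are hypothetical.

```python
from dataclasses import dataclass

# True means a higher value is better for that core variable
# (assumed from the scale descriptions in the text).
HIGHER_IS_BETTER = {
    "physician_vas": False,        # 0 = inactive disease
    "parent_wellbeing_vas": True,  # 10 = very well
    "disease_activity": False,     # 0 = inactive disease
    "chq_phs": True,               # higher = better physical health
    "proteinuria": False,
}


@dataclass(frozen=True)
class Definition:
    n_improved: int      # minimum number of improved core variables
    improve_pct: float   # improvement threshold in percent
    max_worsened: int    # maximum number of worsened core variables allowed
    worsen_pct: float = 30.0


DEFINITIONS = {
    "a": Definition(2, 50.0, 1),
    "b": Definition(2, 40.0, 1),
    "c": Definition(3, 30.0, 1),
    "d": Definition(2, 50.0, 2),
}


def pct_improvement(name, previous, current):
    """Signed percentage change; positive values mean improvement."""
    if previous == 0:
        return 0.0  # assumption: undefined percentage change treated as no change
    change = 100.0 * (current - previous) / abs(previous)
    return change if HIGHER_IS_BETTER[name] else -change


def meets_definition(key, previous, current):
    """Check one candidate definition (a-d) for a pair of consecutive visits."""
    d = DEFINITIONS[key]
    scores = [pct_improvement(n, previous[n], current[n]) for n in HIGHER_IS_BETTER]
    improved = sum(s >= d.improve_pct for s in scores)
    worsened = sum(s <= -d.worsen_pct for s in scores)
    proteinuria_up = current["proteinuria"] > previous["proteinuria"]
    return improved >= d.n_improved and worsened <= d.max_worsened and not proteinuria_up
```

With made-up visit values (not study data), a call such as `meets_definition("a", previous, current)` returns a boolean for that interval; the same pair of visits can satisfy definition (a) and fail definition (c), because (c) requires three improved variables.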
It has been suggested that any of three indices of global disease activity can be used interchangeably as the core set parameter when measuring improvement: (a) the SLAM (Systemic Lupus Assessment Measure; range 0 – 81; 0 = inactive disease) (11); (b) the SLEDAI (SLE Disease Activity Index; range 0 – 105; 0 = inactive disease) (12); or (c) the ECLAM (European Consensus Lupus Assessment Measure) (13). Unlike the SLAM and the SLEDAI, the global disease activity score of the ECLAM does not correspond to the sum of its item scores. Rounding procedures and special scoring rules for patients with single organ involvement are applied, which yields integer ECLAM summary scores between 0 and 10 (0 = inactive disease).
For the cSLE core set, HRQoL is measured by the Child Health Questionnaire (CHQ™), a generic HRQoL inventory whose parent-completed version (CHQ-PF50) has been translated into numerous languages and culturally cross-validated for use in cSLE (14–16). Two summary scores can be derived to measure Psychosocial Health (CHQ-PSS) and Physical Health (CHQ-PHS), respectively. The CHQ-PHS is the proposed measure of HRQoL to be considered in the cSLE core set.
A timed urine collection has been suggested for the cSLE core set to assess renal involvement (5). For this study, we measured the protein:creatinine ratio in a random urine sample instead. The rationale was that, in recent years, the protein:creatinine ratio has been shown to be an accurate approximation of daily protein excretion (17). The protein:creatinine ratio is now commonly used in clinics and even in clinical trials to measure proteinuria (18).
Values of the protein:creatinine ratio at ≥ 0.2 were considered abnormal; values of < 0.2 were considered normal (19), and all smaller values were rounded up to 0.15. Any changes within the range of normal values were not considered as improved for the purposes of the cSLE core set.
The British Isles Lupus Activity Group (BILAG) Index is another validated SLE disease index but it has not been proposed for use in the cSLE core set. For this study, the BILAG was completed as a potential alternative measure of disease activity (20, 21). For each of the eight organ-systems considered (general, mucocutaneous, neurological, musculoskeletal, cardiovascular & respiratory, vasculitis, hematology, and renal) an alphabetical domain score is obtained that can be converted to a numerical value by the use of one of three conversion schemes, as suggested by Gladman et al (22) (A= 4; B= 3; C= 2; D= 1; E= 0), Liang et al (11) (A= 10; B= 6.7; C= 3.3; D or E= 0), and Stoll et al (23) (A=9; B=3; C=1; D or E= 0), with higher BILAG scores signifying higher disease activity. Global disease activity as measured by the BILAG is the sum of the numerical domain scores.
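The three conversion schemes can be written as lookup tables, with the global BILAG score obtained by summing the converted domain scores. This is a minimal sketch that uses the numerical values quoted above; the dictionary keys, domain identifiers, and function name are hypothetical.

```python
# Letter-to-number conversion schemes as quoted in the text.
SCHEMES = {
    "gladman": {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0},
    "liang": {"A": 10, "B": 6.7, "C": 3.3, "D": 0, "E": 0},
    "stoll": {"A": 9, "B": 3, "C": 1, "D": 0, "E": 0},
}

# The eight organ systems listed in the text (identifiers are illustrative).
DOMAINS = (
    "general",
    "mucocutaneous",
    "neurological",
    "musculoskeletal",
    "cardiovascular_respiratory",
    "vasculitis",
    "hematology",
    "renal",
)


def bilag_global_score(grades, scheme):
    """Sum the numerical domain scores for one visit under one scheme."""
    table = SCHEMES[scheme]
    missing = [d for d in DOMAINS if d not in grades]
    if missing:
        raise ValueError(f"missing domain grades: {missing}")
    return sum(table[grades[d]] for d in DOMAINS)
```

Because the schemes weight the letter grades differently, the same set of domain grades yields different global scores, so changes over time are only comparable within a single scheme.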
The original cSLE core set comprises six core variables for disease activity (5). Besides the five cSLE core variables considered in the cSLE Criteria for Response to Therapy, there is a parameter to represent immunological disease activity. This is achieved by measuring levels of anti-dsDNA antibodies. For this study, anti-dsDNA antibodies were measured by the investigators as part of standard of care, using various laboratory assays. To be considered as improved in this study, anti-dsDNA antibodies had to decrease by a certain percentage plus either be newly within the normal range (previous visit abnormal) or remain above the upper bounds for normal (stay abnormal). Further decreases of values of anti-dsDNA antibodies that were already in the range of normal were not considered as improved.
In response to the sentence stem, ‘Compared to the last study visit three months ago and the patient’s overall disease, the patient experienced a’, the managing pediatric rheumatologist rated the change in disease course on a 5-point Likert-scale as follows: major flare of disease; minor flare of disease; no change in disease; minor improvement of disease; or major improvement of disease.
In exploratory analysis, we assessed how the provisional criteria of improvement would reflect the family’s perspective. Thus, the parent rated the change of their child’s disease on a 5-point Likert scale (much worse; somewhat worse; unchanged; somewhat improved; or much improved) that was presented with the sentence stem ‘Compared to the last study visit three months ago, and when considering medications, school, work, life at home, doctor visits, pains and feelings the overall well-being is’.
Numerical variables were summarized by mean ± standard error (SE) or standard deviations (SD); categorical variables were summarized by frequency (in percent).
Numerical core set variables were assessed for their associations with the PRINTO/ACR Improvement Criteria (a dichotomous variable of improved vs. non-improved or unchanged) using mixed-effect models that adjusted for patient demographic and baseline clinical characteristics. A random effect was used to account for within-patient correlation caused by repeated measurements. To predict the likelihood of improvement as per the PRINTO/ACR criteria (a dichotomized dependent variable) from core set variables, both univariate and multivariate logistic regression models were applied. A generalized estimating equation method was used in the logistic regression models to account for within-patient correlation. In the univariate logistic regression models, each core set variable was considered as the only predictor, one at a time, while in the multivariate logistic regression models all core set variables were included as predictors of interest. To assess whether the predicted PRINTO/ACR Improvement was sensitive to the choice of global disease activity index, a series of multivariate models was generated, using one of the global disease activity indices at a time. Contributions of the other core set variables were also assessed in competing multivariate models by deleting one of these core set variables at a time. The predicted log odds (or scores) of improvement from the logistic regression models were then used to assess diagnostic accuracy in terms of the receiver operating characteristic (ROC) curve, the area under the curve (AUC), sensitivity, and specificity. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were also used to evaluate the diagnostic accuracy of the four highest-ranked PRINTO/ACR Definitions of cSLE Response to Therapy. Confidence intervals of the AUC were estimated using a bootstrap method with a total of 2,500 replicates for each model (24, 25).
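Sensitivity, specificity, PPV and NPV all follow from a 2×2 cross-tabulation of a candidate definition against the criterion standard (the physician rating). The sketch below shows the standard formulas only; it does not account for within-patient correlation, and the function and argument names are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table.

    tp: definition met and physician rated improvement
    fp: definition met but physician rated no improvement
    fn: definition not met but physician rated improvement
    tn: definition not met and physician rated no improvement
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

A definition that is rarely met produces few true positives and few false positives at the same time, which is how low sensitivity and high specificity can coexist.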
Statistical computations were performed using SAS version 9.2 (SAS, Cary, NC) software. P-values < 0.05 were considered statistically significant. Diagnostic accuracy was considered outstanding, excellent, good, fair, and poor if the AUC was in the range of 0.9 – 1.0, 0.81 – 0.90, 0.71 – 0.80, 0.61 – 0.70, and 0.50 – 0.60, respectively (26).
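As an illustration of how an AUC is obtained from model scores and then mapped to a descriptive label, the sketch below uses the pairwise (rank-based) definition of the AUC. It is not the SAS procedure used in the study. The labels follow the reading under which an AUC of 0.82 counts as "excellent", as reported in the Results; how the shared and unlisted boundary values (for example 0.90 or 0.805) are resolved is an assumption, and the function names are hypothetical.

```python
def auc(improved_scores, not_improved_scores):
    """Probability that a randomly chosen improved episode scores higher
    than a randomly chosen non-improved episode (ties count as one half)."""
    if not improved_scores or not not_improved_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in improved_scores:
        for n in not_improved_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(improved_scores) * len(not_improved_scores))


def accuracy_label(auc_value):
    """Map an AUC to a descriptive label (boundary handling is an assumption)."""
    if auc_value > 0.90:
        return "outstanding"
    if auc_value > 0.80:
        return "excellent"
    if auc_value > 0.70:
        return "good"
    if auc_value > 0.60:
        return "fair"
    return "poor"
```

The pairwise form is quadratic in the number of episodes, which is acceptable for a few hundred visit intervals; a sort-based rank formula gives the same value for larger data sets.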
The study was approved by the institutional review boards of the participating pediatric rheumatology centers. Informed consent was obtained from all parents and, as appropriate, assent was given by the participants, prior to the study procedures.
The demographics and disease features of the cSLE patients are summarized in Table 1. A total of 98 children (F: M = 81: 17) were included in the analysis. The population consisted of 49 Caucasian, 32 African-American, 3 Asian and 3 mixed-racial patients (87 Non-Hispanics, 11 Hispanics). Data from a total of 623 visits (or 526 between-visit intervals) were available for analysis. There were 39 patients with biopsy-proven lupus nephritis. The mean ± SD damage as measured by the Systemic Lupus International Collaborating Clinics/ACR Damage Index (27) was 0.42 ± 0.1. The global disease course with cSLE on consecutive visits during the study is depicted in Figure 1. There were 35 renal flares (major or minor), and 42 episodes of renal improvement (major or minor).
The mean changes of the core set parameters by disease course (non-improved or flare and unchanged combined vs. improved) as rated by the managing pediatric rheumatologist are summarized in Table 2. Compared to patients who were not improved, the core variables of patients rated as improved significantly changed with the exception of proteinuria and the CHQ-PHS.
When only children with biopsy-proven lupus nephritis were considered, then the change in the mean ± SE of proteinuria was −0.1 ± 0.4 and −0.005 ± 0.17 for improved and non-improved courses, respectively (p=0.83).
Irrespective of the index used in the core set to measure global disease activity and for all candidate criteria assessed, the sensitivity did not exceed 31%, and the PPV did not exceed 48% (Table 3). However, specificity and the NPV remained high in this validation data set at ≥ 89% and ≥ 84%, respectively. Of note, some candidate definitions that considered the BILAG scored according to Stoll or Gladman were more sensitive and at least equally specific.
When considering the family’s perspective (patient change in health) as the criterion of whether improvement had occurred or not, none of the four highest-ranked definitions had a sensitivity above 23%, and the PPV was < 61%, while specificity and the NPV remained high at > 88% and 61%, respectively. The choice of the global disease activity index (SLAM, SLEDAI, ECLAM, BILAG) did not have a major impact on the above-mentioned measurement properties.
When only patients with biopsy-proven lupus nephritis were considered, depending on the index used in the core set to capture global disease activity, the sensitivity of the four highest-ranked candidate improvement definitions did not exceed 43% and the PPV did not exceed 56%, while specificity and the NPV remained high in this validation data set at ≥ 87% and ≥ 85%, respectively.
Using univariate logistic regression, individual core set variables were found to contribute to a different degree to the measurement of the construct ‘improvement of cSLE’ (Table 4). In multivariate logistic regression models that considered percentage changes of five or all six cSLE core set variables as possible predictors (outcome: physician-rated improvement of cSLE), we generated additional candidate criteria of cSLE improvement with higher sensitivities and still acceptable specificities. The alternative candidate criteria that appeared to lend themselves best to potential use in clinical care and research, based on the face validity of the underlying algorithms, are shown in Table 4. When only patients with renal involvement were assessed (n=39), proteinuria contributed to a similar degree to the identification of patients who had improved (AUC = 0.51). When the family perspective of patient improvement was considered, the AUC for the CHQ-PHS was slightly higher at 0.61. A simplified version (called Model 1S), using rounded regression coefficients from the multivariate model (Model M1), showed an AUC of 0.82, supporting excellent accuracy for identifying patients who have improved. In Figure 2 we present the AUC for these alternative definitions. From the score derived from each of the regression functions, the likelihood that improvement has occurred can be deduced. Using these somewhat more complex algorithms, candidate improvement definitions with sensitivities as high as 80% and equally high specificities can be derived.
Validated response criteria allow investigators, clinicians, regulators, and patients to determine the efficacy (or lack thereof) of a given intervention and to communicate about response using the same metric. We undertook a prospective cohort study to validate the ‘PRINTO/ACR Provisional Criteria for the Evaluation of Response to Therapy’. We confirm the high level of specificity, but the sensitivity of the criteria was much lower than previously reported. This was true for all four previously proposed highest-ranked candidate criteria. Based on our results one might expect that, when used in the setting of a clinical trial, the ‘PRINTO/ACR Provisional Criteria for the Evaluation of Response to Therapy’ may underestimate the rate of responders. Thus, larger sample sizes might be necessary to establish a significant difference between treatment arms than if more sensitive criteria were available.
The reasons why there was such a remarkable difference in sensitivity between our study and a previous study (6) are not completely known. They may include that evaluations were done every three months rather than every six months and that the disease features of our patients differed from those of the previously studied multinational cohort. For example, our patients were somewhat older and had less severe disease at baseline, as is supported by lower disease activity scores, lower values on the physician VAS (VAS-MD), and higher values on the parent well-being VAS (VAS-well). However, given the recruitment approach taken and based on our research and that of others, the patients included in this study can be considered representative of a contemporary cSLE cohort in the U.S.
Another reason for the less favorable performance of the provisional criteria might be differences in the experimental design. Instead of patient profiles used during a consensus conference, this study’s raters judged the course of an individual with cSLE based on physical assessment and standard of care laboratory tests. Nonetheless, our results appear to be well in line with the observations of Ruperto et al. who found only a moderate agreement between the consensus ratings and the ratings of the managing pediatric rheumatologists (kappa =0.4; 95% confidence interval 0.2 – 0.6). Given that the response criteria are to be used for the standardized assessment of real patients, we believe that it is critical to employ criteria that mirror the perceived disease course of true patients.
All raters who provided information on the course of disease activity (clinically important improvement or not), i.e. information about the external standard used for this validation exercise, are board-certified or board-eligible pediatric rheumatology professionals who see, on average, 20 patients with cSLE per week in their academic center and have 10 years of experience in treating cSLE. All raters underwent detailed and repeated training in scoring disease indices and completing the cSLE core set.
The six cSLE core response variables were developed by well-established consensus formation techniques (5). However, titers of anti-dsDNA antibodies are not considered when defining improvement using the PRINTO/ACR Criteria. We revisited the usefulness of changes in anti-dsDNA antibodies and, in univariate analysis, found them to contribute to the identification of patient improvement to a larger degree than proteinuria in a cohort of cSLE patients of which 40% had biopsy-proven lupus nephritis. However, when changes in anti-dsDNA antibodies were considered as a sixth core variable in the current algorithm used for the ‘PRINTO/ACR Provisional Criteria for the Evaluation of Response to Therapy’, the sensitivity of the criteria did not improve substantially (data not shown).
The specific tool to measure global disease activity was not firmly chosen for the ‘PRINTO/ACR Provisional Criteria for the Evaluation of Response to Therapy’. Therefore, we explored whether there was a preferred disease activity index that should be used. Similar to what has been suggested by Ruperto et al., differences in sensitivity and specificity were small. Although response criteria considering the BILAG (as a measure of global disease activity) may have a somewhat higher sensitivity than those considering the ECLAM, SLEDAI or SLAM, this must be weighed against the complexity of the BILAG, which could result in considerable measurement error should the BILAG be scored by less experienced and less trained raters.
We generated several alternative definitions of improvement that consider combinations of various core response variables using weightings derived by multivariate logistic regression modeling. Different from the current candidate definitions of response to therapy that treat each of the core set variables as equally important in prediction, the multivariate logistic models considered the different degree of contributions of each core set variable (via beta coefficients of the logistic model) to predict the outcome, i.e. improvement with cSLE. Candidate criteria of improvement derived by multivariate logistic models provided better diagnostic accuracy in terms of AUC, sensitivity and specificity than any of the provisional candidate definitions proposed in the past. In candidate criteria derived by logistic regression, again, the choice of the disease activity measure was not important. The regression function for each of these alternative definitions yields a score that can be translated to a certain probability that improvement has occurred. The presented algorithms are similar to the one used to calculate absolute disease activity using the DAS index (28). Although response criteria using absolute changes (and a regression formula) rather than percentage changes of core response variables are not commonly used in pediatric rheumatology at present, such criteria may actually be easier to use in clinical practice because the cumbersome calculation of percentage changes (as is done for other pediatric rheumatology response criteria) becomes unnecessary. Furthermore, difficulties of criteria based on percentage changes when assessing very active or very mild disease are circumvented.
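The step from a regression score to a probability of improvement is the inverse logit transform of the linear predictor. The sketch below shows only that generic step under stated assumptions: the intercept, coefficients, and variable names passed in are placeholders supplied by the caller, not the fitted values of the models in this study, and the function names are hypothetical.

```python
import math


def linear_score(intercept, coefficients, changes):
    """Linear predictor (log odds): intercept + sum of coefficient * change.

    coefficients and changes are dicts keyed by core set variable name;
    every variable with a coefficient must have a change value.
    """
    return intercept + sum(coefficients[name] * changes[name] for name in coefficients)


def improvement_probability(score):
    """Inverse logit, written so that large negative scores do not overflow."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)
```

A score of 0 corresponds to a probability of 0.5; choosing a different score cut-off trades sensitivity against specificity, which is what the ROC curve summarizes.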
This study must be seen in the light of certain limitations. Our dataset was relatively small, and all patients were followed in the U.S. by about 20 pediatric rheumatologists. Theoretically, these physicians might have judged the course of cSLE differently than the “average” pediatric rheumatologist who is taking care of children with SLE. However, all participating rheumatologists were rather experienced and see patients with various ethnic and racial backgrounds as is common in the U.S.
Additionally, response to therapy in our study was based on the physician’s perception of the course of cSLE rather than on data from a clinical trial. Clinical trial data from large numbers of cSLE participants testing interventions that have an impact on disease activity are unavailable at the current time. Given its prospective character and the training of the investigators performed, we consider our data to be of as high quality as that collected for clinical trials. This is supported by the fact that the rate of missing data for the cSLE core variables was less than 2%. Another limitation may be that we used the protein:creatinine ratio to estimate the degree of proteinuria, an approach that is disputed by some (17, 29).
The ACR has outlined a series of validation steps necessary before new criteria are to be widely used for clinical care or research (7). Therefore, additional validation of the newly proposed and previously suggested candidate definitions of improvement appears warranted. Furthermore, the assessment of the measurement properties of any definition of improvement must also include testing in so-called ‘extreme phenotypes’, meaning in patients with either very active or relatively quiescent disease and/or with disease activity restricted to single organ systems or rarely involved organ systems. In view of the recent progress in identifying biomarkers of cSLE global disease activity (30, 31), the inclusion of biomarker levels as alternative or additional cSLE core set variable(s) in any definition of improvement may also deserve consideration.
CCHMC, Cincinnati, OH: Drs. Bob Colbert, T. Brent Graham, Murray Passo, Thomas Griffin, Alexi Grom, and Daniel Lovell,
Nationwide Children’s Hospital, Columbus, OH: Dr. Robert Rennebohm
University of Chicago Comer Children's Hospital, Chicago, IL: Drs. Charles Spencer and Linda Wagner-Weiner
Texas Scottish Rite Hospital, Dallas, TX: Shirley Henry PNP
Medical College of Wisconsin, & Children's Research Institute, Milwaukee, WI: Drs. James Nocton and Calvin Williams; Elizabeth Roth-Wojicki, PNP
CCHMC, Cincinnati, OH: Shannen Nelson (study coordinating), Jamie Meyers-Eaton and Cynthia Rutherford (site coordinators); Amber Khan, MD, Clinical Fellow, Division of Rheumatology, University of Cincinnati College of Medicine, Cincinnati, OH (data entry); Lukasz Itert (database management). CCHMC Biomedical Informatics (Web-based data management application development).
Texas Scottish Rite Hospital, Dallas, TX: Shirley Henry, PNP
University of Chicago Comer Children's Hospital, Chicago, IL: Becky Puplava (site coordinator)
Children’s Memorial Hospital, Chicago, IL: Dina Blair (site coordinator)
Medical College of Wisconsin, & Children's Research Institute, Milwaukee, WI: Marsha Malloy (data collection & site coordinator), Jeremy Zimmermann, Joshua Kapfhamer and Noshaba Khan (data collection).
The study was supported by NIAMS grants 5U01AR51868 and P60-AR047884.