To develop a definition of global flare in jSLE and derive candidate criteria for measuring jSLE flares.
Pediatric rheumatologists answered two Delphi questionnaires to achieve consensus on a common definition of jSLE flare and identify variables for use in candidate flare criteria. The diagnostic accuracy of these candidate flare criteria was tested with data from jSLE patients (n=98; 623 visits total). Physician-rated change in the jSLE course (worsening yes/no) between visits served as the criterion standard.
There was 96% consensus that “a flare is a measurable worsening of jSLE disease activity in at least one organ system, involving new or worse signs of disease that may be accompanied by new or worse SLE symptoms. Depending on the severity of the flare, more intensive therapy may be required”. Variables suggested for use in flare criteria were: physician-rated disease activity (V1), patient well-being, protein:creatinine ratio, validated disease activity index (V2), Child Health Questionnaire physical summary score (V3), anti-dsDNA antibodies, ESR, and complement levels. Using multiple logistic regression, several candidate flare criteria were derived with area under the receiver operating characteristic curve (AUC) as high as 0.92 (sensitivity ≥ 85%; specificity ≥ 85%); CART analysis suggested that V1, V2 and V3 suffice to identify jSLE flares (AUC = 0.81; sensitivity = 64%; specificity = 86%).
Consensus about a definition of global disease flare with jSLE has been obtained and promising candidate flare criteria have been developed. These will need further assessment of their ease-of-use and accuracy in prospective study.
The disease course of juvenile SLE (jSLE), i.e. SLE with onset prior to the age of 16 years, is characterized by episodes of clinically relevant worsening followed by periods of jSLE improvement (1–3). The Pediatric Rheumatology International Trial Organization (PRINTO) led an international effort to define jSLE core response variables (jSLE-CRVs) for describing jSLE disease courses (4). Using some of these jSLE-CRVs, criteria for jSLE improvement have been proposed (5), while flare criteria have not been developed. Prior to developing criteria, a commonly accepted definition of jSLE disease flares is needed. Because disease flares are not the opposite of disease improvement, one cannot assume a priori that the jSLE-CRVs used in the jSLE improvement criteria suffice to develop algorithms suited for accurately recognizing children with jSLE flares. The SELENA Flare tool and the Responder Index for Lupus Erythematosus (RIFLE) were developed (6, 7) to diagnose adult SLE flares; their usefulness for identifying clinically relevant worsening of jSLE has not yet been studied.
The objectives of this study were (1) to define global flares in jSLE using formal consensus techniques, (2) to develop data-driven candidate criteria suited to discriminate jSLE patients who experience a global flare of jSLE from those with stable or improving disease, using the jSLE-CRVs and/or other disease parameters considered relevant based on expert consensus, and (3) to examine the usefulness of existing flare measures proposed for adults with SLE.
The goals of the Delphi survey were to define global flares with jSLE and obtain a list of signs or symptoms that are considered relevant when diagnosing jSLE flares. The leading bodies of PRINTO, the Pediatric Rheumatology European Society, the Childhood Arthritis and Rheumatology Research Alliance, the pediatric sections of the Pan-American League of Arthritis & Rheumatology and the American College of Rheumatology were all approached for their membership lists of practicing pediatric rheumatologists with interest in jSLE (trainee members were excluded). Based on literature review, an initial Delphi questionnaire was developed and sent to the identified physicians. Using the results of the first survey, a second questionnaire was sent to the respondents of the first survey who indicated continuous interest in being involved in the project. As commonly done, the level for consensus was set at 80% (8).
Children (n=98) diagnosed with jSLE at age ≤ 16 years (9) were recruited consecutively during routine visits at seven pediatric rheumatology clinics in the United States. Study visits occurred every 3 months for up to 18 months. Age, gender, height, weight, and findings on physical examination were recorded, and information on medication regimens was obtained. At each study visit disease activity was measured, and a subset of 20 physicians prospectively rated the course of these patients, as detailed below. The study was approved by the institutional review boards of the participating pediatric rheumatology centers. Informed consent was obtained from all parents and, as appropriate, assent was given by the participants prior to the study procedures. These data were used to test candidate flare criteria for jSLE.
In response to the sentence stem, ‘Compared to the last study visit three months ago and the patient’s overall disease, the patient experienced a’, the managing pediatric rheumatologist rated the change in disease course on a 5-point Likert-scale as follows: major flare of disease; minor flare of disease; no change in disease; minor improvement of disease; or major improvement of disease. Patients who were rated to have ‘major flare of disease’ or a ‘minor flare of disease’ were considered as having a global flare of disease while all others were regarded as not experiencing a jSLE flare. All pediatric rheumatology professionals (n=20) who provided the above ratings for the course of jSLE, i.e. information about the external standard used for this validation exercise, were board-certified or board-eligible and had, on average, 10-year experience in managing jSLE.
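The dichotomization of the 5-point physician rating into the criterion standard can be sketched as follows. This is a minimal illustration only: the function and constant names are hypothetical, and the category labels are copied from the wording above.

```python
# Sketch of the criterion standard: collapse the 5-point physician rating
# into flare (True) versus no flare (False). Names are illustrative only.
ALL_RATINGS = (
    "major flare of disease",
    "minor flare of disease",
    "no change in disease",
    "minor improvement of disease",
    "major improvement of disease",
)
FLARE_RATINGS = {"major flare of disease", "minor flare of disease"}


def is_global_flare(rating: str) -> bool:
    """Return True if the physician rating counts as a global flare."""
    if rating not in ALL_RATINGS:
        raise ValueError(f"unknown rating: {rating!r}")
    return rating in FLARE_RATINGS
```

For example, `is_global_flare("minor flare of disease")` is true, whereas stable disease and both improvement categories map to no flare.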
Information about the development and initial validation of the jSLE-CRVs is available elsewhere (5, 8). In brief, the six jSLE-CRVs used to define improvement with jSLE are:  physician assessment of overall disease activity (MD-VAS) as measured on a visual analog scale with a range from 0 to 10 (0 = inactive disease; 10 = very active disease);  VAS of parent assessment of patient overall well-being (0 = very poor; 10 = very well);  the score of a global disease activity index: several validated instruments were considered i.e. the SLAM (Systemic Lupus Activity Measure) (10), the SLEDAI (SLE Disease Activity Index) (6, 11), the BILAG (British Isles Lupus Assessment Group Index) (12–14), and the ECLAM (European Consensus Lupus Assessment Measure) (15) with scores of ‘0’ indicating inactive disease in all of these indices;  health-related quality of life as measured by the Child Health Questionnaire physical summary score (CHQ-PHS);  renal involvement as measured by daily proteinuria. For this study, the protein:creatinine ratio in a random urine sample was measured (16, 17); and  a marker of immunological activity as measured by anti-double-stranded DNA antibody levels: for this study, anti-dsDNA antibodies were measured as part of standard of care, using various laboratory assays. To be considered as ‘worse’ in this study, anti-dsDNA antibodies had to increase by a certain percentage plus either be newly within the abnormal range (previous visit normal) or remain above the upper bounds for normal (remain abnormal). Increasing anti-dsDNA antibody values that remained within the normal range were not considered ‘worse’.
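The rule for 'worse' anti-dsDNA antibodies can be written as a small function. This is a sketch under stated assumptions: the required percentage increase is not given in the text, so `min_pct_increase` is a hypothetical parameter, as are the function name and the assay-specific `upper_normal` argument.

```python
def anti_dsdna_worse(previous: float, current: float,
                     upper_normal: float, min_pct_increase: float) -> bool:
    """Sketch of the 'worse' rule for anti-dsDNA antibody levels.

    min_pct_increase is a placeholder: the text only says 'a certain
    percentage'. upper_normal is the assay-specific upper bound of normal.
    """
    if previous <= 0:
        # Assumption: a percentage change is undefined without a positive baseline.
        raise ValueError("previous value must be positive")
    pct_increase = (current - previous) / previous * 100
    rose_enough = pct_increase >= min_pct_increase
    # 'Newly abnormal' and 'remains abnormal' both require that the current
    # value lies above the upper bound of normal.
    abnormal_now = current > upper_normal
    return rose_enough and abnormal_now
```

Because the value must have increased, a level that was already abnormal stays abnormal, so the two cases in the text reduce to the single check that the current value is above the normal range.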
The SELENA Flare Tool rates patients as being ‘unchanged/better’, having a ‘mild to moderate flare’ or ‘severe flare’, depending on changes in the SELENA-SLEDAI, certain disease events and changes in the SELENA physician global assessment (range 0 – 3; 0 = inactive) (6). The RIFLE is a 60-item measure of change in SLE activity (7, 18). Items are rated on a 5-point Likert scale as ‘not present’, ‘partial response’, ‘complete response’, ‘present’ or ‘worsening’ (7). It has been suggested that clinically important worsening of SLE is present when there are at least three RIFLE items scored as showing ‘worsening’.
Numerical variables were summarized by mean ± standard error (SE) or standard deviations (SD); categorical variables by frequency (in %). We used mixed effect models to assess associations between the jSLE-CRVs as well as additional jSLE flare parameters (jSLE-FPs) as per Delphi survey and the fixed effect of interest, i.e. jSLE flares or worsening (yes/no). A random effect model was used to account for within-patient correlation due to repeated measurements. Means of dependent variables with worsening/flare versus no worsening/no flare were compared under the mixed effect model framework.
Before performing formal analyses, all numerical variables were assessed for central tendency, dispersion and skewness to ensure they fit the major assumptions of parametric statistical models. Log-transformed variables were used otherwise. All major outcome variables collected from the seven different centers were assessed for center effect to determine if statistical models needed to be adjusted for center effect in formal analyses.
To generate data-driven jSLE candidate flare criteria, we took four different approaches: (1) We examined whether candidate flare criteria using simple percentage changes of the jSLE-CRVs suffice to accurately identify patients with flares. In this analysis we focused on changes between 30% and 50%, as were favored for the four highest-ranking jSLE Criteria of Improvement under consideration of various validated disease indices (SLEDAI, ECLAM, SLAM, BILAG) (5). Although flare of disease clearly cannot be considered the opposite of disease improvement, we specifically assessed the reversed jSLE Improvement Criteria. We also tested several other candidate flare criteria using percentage changes of jSLE-CRVs and/or the jSLE-FPs. (2) Each of the jSLE-CRVs and the jSLE-FPs was assessed separately as to whether it sufficed to accurately identify patients with flare. (3) Candidate criteria that considered absolute changes of multiple jSLE-CRVs and/or jSLE-FPs in multivariate logistic regression models as predictors of jSLE flares were tested. Each flare episode was then assigned a score (essentially a predicted probability of flare × 100, hence scores ranged between 0 and 100) based upon the episode’s observed values of jSLE variables. (4) Similarly, classification and regression tree (CART) analysis was performed to identify flare episodes using multiple predictors (i.e. jSLE-CRVs and/or jSLE-FPs). Specifically, flare episodes were recursively split into subgroups, using the most powerful predictor of the outcome at a time to form a hierarchical tree, with the final nodes in the tree representing different levels of flare risk, so that they can be ordered as a score of flare risk (19, 20).
Irrespective of the approach taken to generate a candidate criterion [1 – 4], the diagnostic accuracy of each candidate algorithm was assessed by the receiver operating characteristic curve (ROC), and the corresponding area under the curve (AUC) was calculated. For particular cut-off values of each of the candidate flare criteria (generally that closest to the left upper corner of the ROC curve), the sensitivity and specificity were determined. A candidate criterion was considered outstanding, excellent, good, fair, or poor if its AUC was in the range of 0.9 – 1.0, 0.81 – 0.90, 0.71 – 0.80, 0.61 – 0.70, or 0.50 – 0.60, respectively (21).
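The accuracy assessment described above can be illustrated with a short, dependency-free sketch. It is not the SAS or SYSTAT procedure used in the study; the function names are hypothetical, the AUC is computed through its rank-based (Mann-Whitney) interpretation, and the handling of values that fall exactly on or between the published AUC bands is an assumption.

```python
def roc_points(scores, labels):
    """(cut-off, sensitivity, specificity) for each distinct cut-off.

    An episode is called a flare when its score is >= the cut-off.
    labels: 1 = flare per criterion standard, 0 = no flare.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one flare and one non-flare")
    points = []
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
        points.append((cut, tp / n_pos, 1 - fp / n_neg))
    return points


def auc(scores, labels):
    """AUC as the chance that a random flare outscores a random non-flare."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, labels):
    """Cut-off whose ROC point lies closest to the upper left corner."""
    return min(roc_points(scores, labels),
               key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)


def rate_auc(value):
    """Verbal rating of an AUC; boundary handling is an assumption."""
    if value >= 0.9:
        return "outstanding"
    if value > 0.8:
        return "excellent"
    if value > 0.7:
        return "good"
    if value > 0.6:
        return "fair"
    return "poor"
```

With made-up scores 1 to 10 and labels `[0, 0, 0, 1, 0, 0, 1, 1, 1, 1]`, `auc` gives 0.92 and `best_cutoff` selects the cut-off 7, at which sensitivity is 0.8 and specificity is 1.0.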
The accuracy of the SELENA Flare Tool and RIFLE for identifying jSLE patients with worsening was assessed by kappa (κ) coefficients that can be interpreted as follows: poor agreement: κ < 0.4; fair to good agreement: κ ≥ 0.4 – 0.75; excellent agreement: κ ≥ 0.75 (22).
SAS 9.2 (SAS, Cary, NC) software and SYSTAT 12 (Systat Software, Inc. Chicago, IL) were used for analysis. P-values < 0.05 were considered statistically significant.
The first Delphi questionnaire was sent to 299 physicians, and 159 (53%) provided feedback. Most respondents had between 11 and 20 years of experience in managing jSLE. They practiced in North America (n=72), Europe (n=35), and Central and South America (n=38), but also in Asia (n=5), Africa (n=3), and Australia (n=3). Three survey participants did not provide demographic information, and 153 indicated that they wished to continue participating in the project. The second Delphi questionnaire had a response rate of 84%.
There was consensus about the need for developing a common definition of flare. A global disease flare with jSLE was defined as follows: “A flare of disease is a measurable worsening of SLE disease activity in at least one organ system, involving new or worse signs of disease that may be accompanied by new or worse SLE symptoms; depending on the severity of the flare, more intensive therapy may be required” (consensus 96%).
The group concurred that there should be separate criteria for major (or severe) flares, moderate flares and minor (or mild) flares and that minor flares may not necessarily require therapy. There was no consensus on how best to measure flare severity or how to determine when a flare has ended. Agreement was reached that a certain level of disease activity had to be present for at least 3 months for further worsening to be considered a new flare (89% consensus). Additionally, based on consensus, the SLEDAI and ECLAM were ranked as the preferred disease activity indices for use in future criteria of jSLE flares. The group agreed that an increase in the score of a disease activity index alone does not suffice to adequately capture patients who are experiencing a jSLE flare (88% consensus) and that some increases in disease activity do not even constitute a minor flare (86% consensus).
jSLE signs and symptoms that may be considered as variables when developing criteria for jSLE flares included all of the jSLE-CRVs and some additional jSLE-FPs (Table 1). Complete blood counts (CBC) and serum creatinine were considered by 68% of the survey respondents as relevant for measuring jSLE flares. Because abnormally high levels were observed in fewer than 2% of the study visits, serum creatinine was excluded from further consideration as jSLE-FP. Likewise we excluded the CBC because all disease activity indices validated for jSLE already consider CBC information in their global jSLE disease activity score.
The demographics and disease features of the jSLE patients (n=98; F: M = 81: 17) considered in this analysis have been published elsewhere (23). In brief, there were 60 Caucasian, 32 African American, 3 Asian and 3 mixed-racial patients (87 Non-Hispanics, 11 Hispanics). The courses of jSLE between consecutive study visits, as per the managing pediatric rheumatologists, were as follows: 89 episodes of clinically relevant worsening (12 major worsening, 77 minor worsening); 437 episodes without flares (348 episodes with stable disease between visits, 14 major improvements, and 75 minor improvements). The mean ± SD of disease activity at study entry was 7.64±6.01, 5.18±4.35, 5.31±5.44, and 1.85±1.75 when measured by the SLAM, SLEDAI, BILAG, and ECLAM, respectively.
There were 62 episodes of ‘mild to moderate flare’ and 15 episodes of ‘severe flare’ as per the SELENA Flare Tool. Using the SELENA Flare Tool only 41 of 89 episodes of worsening were identified correctly, yielding a sensitivity of 46% and a specificity of 92%. There was fair agreement between SELENA Flare Tool and the external standard (κ ± SE: 0.4±0.05). The RIFLE was 26% sensitive and 95% specific for capturing worsening of jSLE (κ ± SE = 0.26±0.06). Alternative criteria for defining worsening did not improve the accuracy of the RIFLE for capturing jSLE courses (data not shown).
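The SELENA Flare Tool figures can be checked with a small worked example. The 2×2 table below is a reconstruction, not reported data: it assumes that all 77 episodes flagged by the tool (62 + 15) fall among the 526 physician-rated intervals (89 flares, 437 non-flares), so that 41 true positives imply 36 false positives. The function name is illustrative.

```python
def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 agreement table."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


# Reconstructed counts (assumption, see lead-in).
tp = 41            # flares per physician that the tool also flagged
fn = 89 - tp       # flares the tool missed
fp = (62 + 15) - tp  # tool-flagged episodes not rated as flares
tn = 437 - fp      # non-flares the tool left unflagged

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
kappa = cohen_kappa(tp, fp, fn, tn)
```

Under this reconstruction the sensitivity is about 0.46, the specificity about 0.92, and kappa about 0.40, in line with the reported values.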
The mean changes of the jSLE-CRVs and the jSLE-FPs by disease course (flare versus no-flare) as rated by the managing pediatric rheumatologist are summarized in Table 2. None of the jSLE-FPs but most jSLE-CRVs significantly changed with flare episodes. Exceptions were proteinuria, the parent assessment of patient well-being, and the levels of anti-dsDNA antibodies.
The center effect was assessed for key variables before performing formal analyses, and no evidence of such an effect was found in this study. Furthermore, statistical models with and without adjustment for center effect yielded the same findings. Therefore, only results without adjustment for center effect are presented in this paper.
Using uniform percentage changes between 30 and 50%, we generated a series of candidate flare criteria, all with relatively low sensitivity but high specificity (Table 3). As was to be expected based on the results shown in Table 2, none of the AUCs of these candidate flare criteria exceeded 0.67.
CART analysis using the jSLE-CRVs and jSLE-FPs was performed to classify jSLE flare episodes. One of the best performing CART models included the MD-VAS, SLEDAI, and CHQ-PHS: patients with a worsening of the MD-VAS by < 1 were least likely to have a flare; those with a worsening of the MD-VAS by ≥ 2 and any worsening of the SLEDAI were most likely to be rated as having a flare. The resulting candidate flare algorithm (worsening of CHQ-PHS by ≥ −10 plus worsening of MD-VAS ≥ +1 OR worsening of MD-VAS ≥ +2 plus any worsening of the SLEDAI) had an AUC of 0.81, but a sensitivity of only 64% and a specificity of 86%.
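The CART-derived rule can be expressed as a simple decision function. This is a sketch based on one reading of the text: 'worsening of CHQ-PHS by ≥ −10' is interpreted as a drop of at least 10 points between visits, and 'any worsening of the SLEDAI' as any increase in the score. Both interpretations, and the function name, are assumptions.

```python
def cart_flare(delta_md_vas: float, delta_sledai: float,
               delta_chq_phs: float) -> bool:
    """Sketch of the CART-based candidate flare rule.

    Each delta is the current value minus the previous-visit value.
    Assumption: a fall in CHQ-PHS and a rise in MD-VAS or SLEDAI
    represent worsening.
    """
    # Branch A: CHQ-PHS drops by at least 10 and MD-VAS rises by at least 1.
    branch_a = delta_chq_phs <= -10 and delta_md_vas >= 1
    # Branch B: MD-VAS rises by at least 2 and the SLEDAI rises at all.
    branch_b = delta_md_vas >= 2 and delta_sledai > 0
    return branch_a or branch_b
```

For instance, a rise of 2.5 in the MD-VAS together with a 4-point rise in the SLEDAI is classified as a flare, whereas a 0.5-point rise in the MD-VAS is not, regardless of the other two variables.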
In univariate logistic regression, the change scores (between pre-flare and flare visit) of each jSLE-CRVs and jSLE-FPs contributed differently to the measurement of the construct ‘global flare of jSLE’ (Table 4). The single best performing parameter was the change of the MD-VAS (AUC= 0.8), followed by the change scores of the disease activity indices.
As is summarized in Table 5, using multivariate logistic regression models that considered absolute changes of jSLE-CRVs and/or jSLE-FPs as possible predictors of worsening jSLE, we generated additional candidate criteria with higher sensitivities and still acceptable specificities. We also generated models that excluded the MD-VAS because previous research suggested considerable between-physician variability (Table 5). Figure 1 depicts the AUCs for the above-mentioned candidate flare criteria based on multiple logistic regression models. With the score derived from each of the regression functions, the likelihood that a flare has occurred can be deduced. Using these somewhat more complex algorithms that included all jSLE-CRVs and jSLE-FPs as predictors, candidate flare definitions with sensitivities and specificities as high as 94% and 92% were derived (AUC all > 0.82; Table 5). Of note, the consideration of the jSLE-FPs, in addition to the jSLE-CRVs, did not improve the AUC by much, but tended to increase the sensitivity somewhat at the (expected) expense of the criterion’s specificity.
Considering only the jSLE-CRVs, among the best performing models (see MA2; Table 5) was: −2 + 2×ΔMD-VAS + 0.2×ΔSLEDAI − 0.04×ΔCHQ-PHS − 0.1×Δpatient well-being + 0.03×Δprotein:creatinine ratio + 0.02×Δanti-dsDNA antibodies, where Δ indicates the change in score between visits and only the changes in MD-VAS, SLEDAI (both p < 0.05) and CHQ-PHS (p < 0.1) appeared to be relevant predictors of patient worsening in this data set. This suggests that −2 + 2×ΔMD-VAS + 0.2×ΔSLEDAI − 0.04×ΔCHQ-PHS may be among the most promising candidate flare algorithms, where score increases of 15 or more are thought to be indicative of a global flare of jSLE.
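A minimal sketch of how the reduced algorithm could be turned into a flare score is shown below. It assumes that the expression is the linear predictor of a logistic model and that the score is the predicted probability × 100, as described for the logistic regression approach in the methods. The coefficients are the rounded values printed above, and applying the cut-off of 15 to this score scale is one reading of the text, not a confirmed rule. Function names are hypothetical.

```python
import math


def linear_predictor(d_md_vas: float, d_sledai: float,
                     d_chq_phs: float) -> float:
    """-2 + 2*dMD-VAS + 0.2*dSLEDAI - 0.04*dCHQ-PHS (rounded coefficients)."""
    return -2 + 2 * d_md_vas + 0.2 * d_sledai - 0.04 * d_chq_phs


def flare_score(d_md_vas: float, d_sledai: float, d_chq_phs: float) -> float:
    """Predicted probability of flare x 100 (range 0 to 100)."""
    lp = linear_predictor(d_md_vas, d_sledai, d_chq_phs)
    return 100 / (1 + math.exp(-lp))


def meets_candidate_cutoff(score: float, cutoff: float = 15) -> bool:
    """Assumption: a score of 15 or more is read as indicating a flare."""
    return score >= cutoff
```

With no change in any of the three variables, the linear predictor is −2 and the score is about 11.9; a 1-point rise in the MD-VAS alone brings the linear predictor to 0 and the score to 50.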
Validated response criteria allow investigators, clinicians, regulators, and patients to determine the efficacy (or lack thereof) of a given intervention and to communicate about response using the same metric. The need for developing commonly accepted criteria for disease flares has become more urgent since the introduction of randomized withdrawal trials, where the time to flare or the proportion of patients who experience a flare serves as the primary efficacy measure (24). We used consensus methodology to derive a common definition of global flare with jSLE and undertook a prospective cohort study for generating and subsequently testing several candidate flare criteria.
Before one can measure a construct (here: global flare with jSLE) one needs to clearly define it. Using consensus methodology we succeeded in reaching consensus about a common definition of the construct under investigation. Of note, this definition is very similar to the definition of flare suggested by a recent effort of the Lupus Foundation of America.
In an attempt to develop an accurate measure of disease flare with jSLE, we examined whether algorithms that are based on percentage changes of key jSLE signs and symptoms suffice to identify flares with high accuracy. However, irrespective of whether only the jSLE-CRVs and/or the jSLE-FPs were considered, simple equal-sized percentage changes alone appeared inadequate for identifying patients with flare accurately, i.e. with high sensitivity and specificity.
Criteria based on absolute differences of the jSLE-CRVs appeared better suited, especially when using parameter weightings (beta coefficients) derived by multivariate logistic regression modeling. Different from other current candidate criteria of response to therapy that treat each CRV as equally important in prediction, the multivariate logistic models allowed us to consider the relative contributions of each core set variable (through slopes) to predict the outcome, i.e. flares with jSLE. The regression function for each of these alternative criteria yields scores that can be translated into a certain probability of a flare to have occurred. The presented algorithms are reminiscent of the DAS Index used to calculate absolute disease activity with rheumatoid arthritis (25). Although candidate flare criteria that consider not only the jSLE-CRVs but also the jSLE-FPs in this fashion had higher sensitivity and AUCs, the added complexity of such jSLE flare criteria appears prohibitive for their use in clinical care or research. Nonetheless, simpler multivariate algorithms, solely considering all or some of the jSLE-CRVs, may be acceptable and feasible, especially because such criteria are based only on absolute change scores of fewer variables, and the cumbersome calculation of percentage changes (as is done for other pediatric rheumatology response criteria) becomes unnecessary. Furthermore, difficulties of criteria based on percentage changes when assessing very active or very mild disease are even circumvented.
In flare criteria derived by logistic regression or those based on percentage changes, the choice of the disease activity measure did not have much importance. Conversely, the exclusion of the MD-VAS led to a considerable reduction of the accuracy of candidate flare criteria in terms of AUC (decreased by 14% – 23%). Alternatively, algorithms derived by CART-analysis may be useful but close attention has to be paid to achieving sufficient sensitivity.
This study must be viewed in the light of certain limitations. Our dataset was relatively small, and all patients were managed in the U.S. by about 20 pediatric rheumatologists. Theoretically, these physicians might have judged the course of jSLE differently compared to the “average” pediatric rheumatologist taking care of children with jSLE. However, all participating rheumatologists were experienced and regularly looked after patients from various ethnic and racial backgrounds. All raters who provided information on the course of disease activity, i.e. information about the external standard used for this validation exercise, are board-certified or board-eligible pediatric rheumatology professionals at academic centers, and underwent detailed and repeated training in scoring disease indices and completing the jSLE core set.
The ACR outlined a series of validation steps necessary before new criteria are to be widely used for clinical care or research and proposed to use data from clinical trials when developing response criteria for rheumatic diseases (26). The presence of a flare in our study was based on the physician’s perception of the course of jSLE rather than using data from a clinical trial. Clinical trials in jSLE that test interventions with impact on disease activity are unavailable at the current time. Given its prospective character and the training of the investigators, we consider our data to be of as high quality as that collected for clinical trials (missing data < 2%).
Furthermore, in view of the recent progress in identifying high-quality biomarkers of jSLE global disease activity (27, 28), one also must contemplate the potential value of including biomarkers as alternative or additional parameters in future jSLE flare criteria.
Based on the results of the current study, the jSLE-CRVs appear to suffice to adequately capture global flares in jSLE. Consideration of complement levels and ESR adds only to a minor degree to the measurement properties of the candidate criteria. Given their ease of completion and performance, the SLEDAI or the ECLAM appear preferable for use in future criteria of global flare. Validation studies of algorithms derived from logistic and CART analysis are warranted, with focus on those with favorable measurement properties, such as −2 + 2×ΔMD-VAS + 0.2×ΔSLEDAI − 0.04×ΔCHQ-PHS, where score changes of at least 15 were found to be indicative of a jSLE flare. As is clearly stated by the ACR, a single study can never suffice to adequately examine the measurement properties of response criteria.
We are in the process of assessing the performance of the presented candidate flare criteria in different data sets with other physicians providing the ratings as to whether a flare occurred or not.
Grant Support: The study is supported by NIAMS grants 5U01AR51868 and P60-AR047884.
Acknowledgements – Investigators (data collection): CCHMC, Cincinnati, OH: Drs. Bob Colbert, T. Brent Graham, Murray Passo, Thomas Griffin, Alexi Grom, and Daniel Lovell
Nationwide Children’s Hospital, Columbus, OH: Dr. Robert Rennebohm
University of Chicago Comer Children’s Hospital, Chicago, IL: Dr. Linda Wagner-Weiner
Texas Scottish Rite Hospital, Dallas, TX: Shirley Henry PNP
Medical College of Wisconsin & Children’s Research Institute, Milwaukee, WI: Drs. James Nocton and Calvin Williams; Elizabeth Roth-Wojcicki, PNP
CUMC, New York, NY: Drs. Lisa Imundo and Andrew Eichenfield
Acknowledgements – Other: CCHMC, Cincinnati, OH: Shannen Nelson (study coordinating), Jamie Meyers-Eaton (site coordinator); Lukasz Itert (database management); CCHMC Biomedical Informatics (Web-based data management application development). Dr. Rina Mina – manuscript preparation
Texas Scottish Rite Hospital, Dallas, TX: Shirley Henry, PNP
University of Chicago Comer Children’s Hospital, Chicago, IL: Becky Puplava (site coordinator)
Children’s Memorial Hospital, Chicago, IL: Dina Blair (site coordinator)
Medical College of Wisconsin & Children’s Research Institute, Milwaukee, WI: Marsha Malloy, Joshua Kapfhamer (both data collection & site coordinator), Jeremy Zimmermann, and Noshaba Khan (both data collection)
CUMC, New York, NY: Candido Batres (site coordinator)
University Hospitals/Case Medical Center and Rainbow Babies and Children’s Hospital: Michelle Wallette (site coordinator)
Alfred DuPont Children’s Hospital, Wilmington, DE: Drs. AnneMarie Brescia and Carlos Rosé
We are indebted to the pediatric rheumatologists who participated in the Delphi surveys. We would like to thank the American College of Rheumatology, the Childhood Arthritis Rheumatology Research Alliance, the Pan-American League Against Rheumatism, and the Pediatric Rheumatology European Society for their support with the distribution of the Delphi surveys.