To develop a provisional definition for the evaluation of response to therapy in juvenile dermatomyositis (JDM) based on the PRINTO JDM core set of variables.
Thirty-seven experienced pediatric rheumatologists from 27 countries achieved consensus on 128 difficult patient profiles as clinically improved or not improved using a stepwise approach (patient rating, statistical analysis, definition selection). Using the physicians’ consensus ratings as the “gold-standard measure”, chi-square, sensitivity, specificity, false positive and negative rates, area under the ROC curve, and kappa agreement for candidate definitions of improvement were calculated. For definitions with kappa >0.8, the kappa value was multiplied by the face validity score to select the top definitions.
The top definition of improvement was: at least 20% improvement from baseline in 3/6 core set variables with no more than 1 of the remaining worsening by more than 30%, which cannot be muscle strength. The second highest scoring definition was at least 20% improvement from baseline in 3/6 core set variables with no more than 2 of the remaining worsening by more than 25%, which cannot be muscle strength; this is definition P1 selected by the IMACS group. The third is similar to the second with the maximum amount of worsening set to 30%. This indicates convergent validity of the process.
We propose a provisional data-driven definition of improvement that reflects well the consensus rating of experienced clinicians and incorporates clinically meaningful change in core set variables in a composite endpoint for the evaluation of global response to therapy in JDM.
The standardization of the criteria to evaluate improvement in rheumatic diseases has been a goal of numerous research groups. This work led to the establishment of definitions of response in rheumatoid arthritis (1), juvenile arthritis (2–4), and systemic lupus erythematosus (SLE) both in adults (5–7) and children (8–10).
The International Myositis Outcome Assessment and Clinical Studies (IMACS) group proposed a core set of outcome variables for inclusion in clinical trials in adult and juvenile inflammatory myopathies and defined the degree of change in each core set variable that is clinically meaningful, as well as guidelines for performing clinical trials (11–14). However, these proposals have not yet been formally validated in the context of external prospective pediatric studies or clinical trials. Although children/adolescents and adults with DM share many signs and symptoms of disease, they differ in clinical features and outcome (15–17), and treatment approaches should consider the peculiarities of juvenile patients as well as their longer life expectancy. Therefore, all outcome measures developed for adults need to be subjected to a critical evidence-based evaluation of their measurement properties in children and adolescents.
To help standardize the conduct and reporting of juvenile dermatomyositis (JDM) clinical trials and enhance identification of new therapeutic agents, the Pediatric Rheumatology International Trials Organization (PRINTO) (18), in collaboration with the Pediatric Rheumatology Collaborative Study Group (PRCSG) and with the support of the European Union and the U.S. National Institutes of Health, undertook in 2000 a multinational effort to develop and promulgate a core set of outcome variables and a definition of clinical improvement to evaluate response to therapy in patients with JDM and juvenile SLE. The first two phases of the project, previously published (8;19), led to the development of a prospectively validated, evidence-based core set of six variables for the evaluation of response to therapy that is now known as the provisional PRINTO/American College of Rheumatology/European League Against Rheumatism Disease Activity Core Set for the evaluation of response to therapy in JDM (PRINTO/ACR/EULAR JDM core set) (Table 1).
In this paper we report the results of the third phase of the project, which was aimed at developing a provisional validated definition of improvement to aid in the classification of individual patients in future therapeutic trials and in current clinical practice as either improved or not improved.
The overall methodology of this phase of the project was based on a methodological framework used successfully in previous work in rheumatoid arthritis (1), juvenile arthritis (2–4), juvenile SLE (8–10), and inflammatory myopathies (13).
Table 1 gives the six core variables validated previously and the respective tools for their assessment. The PRINTO JDM core set includes the following six variables: 1) physician’s global assessment of the patient’s overall disease activity measured with a 10-cm visual analogue scale (VAS) (0 = no activity; 10 = maximum activity) (20); 2) muscle strength as assessed by the Childhood Myositis Assessment Scale (CMAS) (0 = worst; 52 = best) (21–23); 3) global disease activity assessment through the Disease Activity Score (DAS) (24) or alternatively the Myositis Disease Activity Assessment (MDAA; this instrument (25) combines two partially overlapping tools, the Myositis Disease Activity Assessment Visual Analogue Scale [MYOACT] and the Myositis Intention to Treat Activity Index A–E version [MITAX]); 4) parent’s global assessment of the overall child’s well-being on a 10-cm VAS (0 = very well; 10 = very poor) (20;26;27); 5) functional ability, as measured by the Childhood Health Assessment Questionnaire (C-HAQ) (26;27) (0 = best; 3 = worst); 6) health-related quality of life (HRQOL) assessment using the physical summary score (PhS) of the Child Health Questionnaire (CHQ) parent version (27;28). The methods for calculating the scores of the PRINTO JDM core set variables are reported in Ruperto et al (19).
The variables underwent extensive evidence-based evaluation, the process of which has been described previously (19). In particular, all variables were found to be feasible and to have good construct validity, discriminant ability, and internal consistency. Furthermore, they were not redundant, proved responsive to clinically important change in disease activity, and were strongly associated with treatment outcome, and thus were included in the final core set.
Following this selection of variables for the evaluation of response to therapy, a second consensus conference was held. It was attended by 37 experienced pediatric rheumatologists from 27 different countries, to ensure wide international acceptance of the results, and was facilitated by 4 of the authors (NR, EHG, BAG, AP) with expertise in nominal group process (29;30). The overall goal of the meeting was to reach consensus on a provisional validated definition of improvement, incorporating the PRINTO core set of variables, using a combination of statistical criteria and consensus formation techniques. To achieve this objective, four steps (process and analysis) were pursued; they are briefly described in order below, and full details can be found elsewhere (2;19).
Step 1: Rate each of 128 paper patient profiles as “clinically importantly improved” or “not improved”, using nominal group technique. Data from the 294 JDM patients analysed for the PRINTO/ACR/EULAR JDM core set (19) were used to select a subgroup of 128 difficult/atypical patient profiles, which were presented to conference attendees for evaluation of therapeutic response. The profiles selected (see examples in Table 2) were those that were judged by the conference organizers to be near a putative threshold level of improvement. For example, patients who showed 100% improvement in all outcome variables were not good candidates for inclusion because all would agree that the patient had improved, and all the definitions of improvement would categorize the patient as improved. Each profile contained only information related to the six validated JDM core set variables, with absolute values at baseline and at 6 months as well as absolute and percent change from baseline (Table 1 and Table 2). Participants were randomized into three “nominal groups” of equal size and asked to rate independently all 128 difficult patient profiles as either clinically importantly improved or not improved. If an 80% consensus was not achieved, the case was discussed in a round-robin fashion at each table and, if necessary, also in a plenary session. We expected to reach consensus for at least 80% of the patients discussed.
Step 2 (statistical analysis): Using the physicians’ consensus judgment as the “gold standard”, we performed several statistical evaluations (see below) to identify the definition of improvement with the best performance characteristics. We were unable to find in the literature any definitions of improvement that used combinations of the core set variables. Therefore, we tested 999 different definitions of improvement that were deemed clinically reasonable by the Steering Committee of the project (NR, AP, AR, DHL, EHG, AM). Some of the definitions of improvement tested were provided by the IMACS group (13).
Each definition of improvement was classified as either “generic” or “specific” (9). An example of “generic definition” is as follows: at least 20% improvement from baseline in any 2 of the 6 core set variables with no more than 1 of the remaining worsening by more than 30%. An example of a “specific definition” is as follows: physician’s global assessment of the patient’s overall disease activity and muscle strength improved by at least 30%, two of any remaining three improved by at least 20%, and none worsening by more than 30%.
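To make the structure of a “generic” definition concrete, the following is a minimal Python sketch of how such a rule might be applied to a single patient profile. It is an illustration under stated assumptions, not the software used in the study: the variable keys, the function names, the direction assigned to each scale, the handling of a zero baseline, and the patient values are all hypothetical. The `protected` parameter is an added convenience for rules that forbid worsening of a named variable.

```python
# Assumed direction of each core set variable: True when a higher score
# means a better state. Keys are hypothetical labels for this sketch.
HIGHER_IS_BETTER = {
    "physician_global": False,  # VAS, 0 = no activity
    "cmas": True,               # 52 = best
    "das": False,               # assumed: higher = more disease activity
    "parent_global": False,     # VAS, 0 = very well
    "chaq": False,              # 0 = best
    "chq_phs": True,            # assumed: higher = better physical health
}


def percent_improvement(name, baseline, followup):
    """Percent change from baseline, signed so that positive = improvement."""
    if baseline == 0:
        # Percent change is undefined at a zero baseline; this sketch treats
        # any move away from zero as an unbounded change (an assumption).
        if followup == 0:
            return 0.0
        raw = float("inf") if followup > 0 else float("-inf")
    else:
        raw = (followup - baseline) / abs(baseline) * 100.0
    return raw if HIGHER_IS_BETTER[name] else -raw


def is_improved(baseline, followup, min_improved=2, improve_pct=20.0,
                max_worsened=1, worsen_pct=30.0, protected=()):
    """Apply a generic definition: at least `min_improved` variables improved
    by >= `improve_pct`, at most `max_worsened` variables worsened by more
    than `worsen_pct`, and no `protected` variable among those worsened."""
    pct = {k: percent_improvement(k, baseline[k], followup[k])
           for k in HIGHER_IS_BETTER}
    improved = [k for k, v in pct.items() if v >= improve_pct]
    worsened = [k for k, v in pct.items() if v < -worsen_pct]
    if len(improved) < min_improved or len(worsened) > max_worsened:
        return False
    return not any(k in protected for k in worsened)


# Hypothetical patient profile (values invented for illustration only).
baseline = {"physician_global": 6.0, "cmas": 30, "das": 12,
            "parent_global": 5.0, "chaq": 1.5, "chq_phs": 35}
followup = {"physician_global": 3.0, "cmas": 40, "das": 11,
            "parent_global": 7.0, "chaq": 1.5, "chq_phs": 36}

print(is_improved(baseline, followup))                  # generic example rule
print(is_improved(baseline, followup, min_improved=3))  # stricter variant
```

With the default arguments the function encodes the generic example given above (any 2 of 6 improved by at least 20%, no more than 1 worsening by more than 30%). A rule that additionally forbids worsening of muscle strength would be expressed in this sketch as `protected=("cmas",)`.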
We evaluated the ability of the 999 candidate definitions of improvement to classify individual patients as improved or not improved, and then assessed the agreement between the definitions and the consensus of the physicians. We used only patient profiles for which physician consensus was achieved. For each definition, we calculated the chi-square test (1 df) and the corresponding p value, sensitivity, specificity, percent of false-positives, percent of false-negatives, and area under the receiver operating characteristic (ROC) curve (31). The kappa statistic (32) was used to measure the strength of concordance between the definitions and the consensus of the physicians. The kappa statistic was converted to a Likert-like scale using the conversion proposed by Landis & Koch (33): 0.01–0.2 = slight; 0.21–0.4 = fair; 0.41–0.6 = moderate; 0.61–0.8 = substantial; 0.81–1 = almost perfect agreement. While the statistical properties of all 999 definitions were presented to the consensus attendees, only definitions with a kappa > 0.7 (substantial agreement), sensitivity and specificity > 80%, and percent false positive and false negative < 20% were retained in the further analysis. Results of the statistical analyses were then presented to the conference attendees.
Step 3: We then used nominal group technique to decide which of the definitions of improvement with the highest statistical performance is easiest to use and most credible (highest face validity). The attendees were again randomly split into three groups and, using nominal group technique, were asked to decide which of the definitions of improvement that performed best in the analysis described above (selected among the 999 definitions tested) were easiest to use and most credible (content validity), ranking the 5 best from 1 (lowest) to 5 (highest content validity).
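The agreement statistics described above can be derived from a 2×2 table that cross-classifies each candidate definition against the physicians’ consensus. The sketch below is a minimal illustration in Python, assuming Cohen’s kappa for two raters; the function names are hypothetical, and the cell counts are invented (only the totals of 98 improved and 23 not improved profiles mirror the consensus results reported in this paper). False-positive and false-negative percentages are omitted because the text does not state their denominators.

```python
def agreement_stats(tp, fp, fn, tn):
    """Sensitivity, specificity and Cohen's kappa for a candidate definition
    (test) against the physicians' consensus (reference).

    tp: improved by both; fp: improved by definition only;
    fn: improved by consensus only; tn: not improved by both.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement computed from the marginal totals of the 2x2 table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


def landis_koch(kappa):
    """Map kappa to the verbal scale quoted in the text."""
    if kappa < 0.01:
        return "below the quoted scale"
    for upper, label in ((0.2, "slight"), (0.4, "fair"), (0.6, "moderate"),
                         (0.8, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented cell counts for one hypothetical definition.
stats = agreement_stats(tp=92, fp=2, fn=6, tn=21)
print(stats, landis_koch(stats["kappa"]))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which matters here because most profiles were rated as improved and a definition that labels nearly everyone as improved would otherwise look deceptively accurate.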
Step 4: We multiplied the content validity score by the kappa values to obtain the “best” definitions. For each definition, the three content validity rankings obtained by the 3 nominal groups were summed up and the resulting sum was multiplied by the corresponding value of the kappa statistic, to obtain the “final score” that incorporated both statistical evaluations and experts’ judgment.
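The arithmetic of Step 4 can be sketched in a few lines of Python. The definition labels, rankings, and kappa values below are invented for illustration and are not taken from Table 4; the function name is hypothetical.

```python
def final_score(group_ranks, kappa):
    """Sum the content validity rankings from the nominal groups and
    weight the sum by the kappa statistic of the definition."""
    return sum(group_ranks) * kappa


# Hypothetical candidates: (rankings from the three groups, kappa).
candidates = {
    "definition A": ((5, 4, 5), 0.85),
    "definition B": ((4, 5, 3), 0.88),
}

scores = {name: final_score(ranks, kappa)
          for name, (ranks, kappa) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this invented example the definition with the slightly lower kappa still obtains the higher final score, because the experts’ rankings dominate the product when the kappa values of the candidates are close to one another.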
The association between the change in each core set variable and the evaluation of response to therapy was analyzed by multiple logistic regression, which used as explanatory variables the baseline-to-6-month change in each core set variable and as the dependent outcome the physicians’ consensus evaluation of the patient’s improvement. Odds ratios (OR) with 95% confidence intervals (95% CI) were reported. Continuous variables were dichotomized according to the best cut-offs provided by the ROC analysis (31). The purposes of this post-consensus analysis were to evaluate which core set variables most influenced the consensus decision and to establish the best cut-offs for absolute change for the variables included in the model. The best cut-offs for each core set variable should help physicians decide if a patient is improved based on the absolute change of that particular measure.
Data were entered into an Access XP database and analyzed with Excel XP (Microsoft), XLSTAT 6.1.9 (Addinsoft), Statistica 6.0 (StatSoft, Inc), and Stata 7.0 (Stata Corporation).
Table 3 shows the comparison of demographic features and of baseline and 6-month values of the core set variables between the subgroup of 128 difficult patients used to create the patient profiles for this exercise and the remaining 166-patient cohort; the entire cohort of 294 patients was analysed for the PRINTO/ACR/EULAR JDM core set (19). In general, the features were comparable between cohorts, although the former had longer disease duration. Similarly, the two cohorts were comparable at baseline for five of the core set variables, the exception being the parent’s global assessment of the overall child’s well-being. The differences observed at 6 months between the 128-patient cohort and the remaining sample were expected, because the 128-patient subgroup was composed of the difficult/atypical patients selected for the consensus exercise, who overall responded less to the 6-month treatment given by the treating physicians (see Methods section). The remaining 166-patient cohort consisted of patients who achieved the most pronounced levels of improvement after 6 months of treatment and who were not useful for the purposes of the consensus exercise.
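The text does not state which criterion defined the “best” ROC cut-off. As one common possibility, the sketch below selects the threshold that maximizes the Youden index (sensitivity + specificity − 1); this choice is an assumption, and the function name and the data are hypothetical. The subsequent logistic regression step is not reproduced here.

```python
def best_cutoff(changes, improved):
    """Return (threshold, Youden index) for the rule `change >= threshold`.

    changes:  absolute baseline-to-6-month change, oriented so that larger
              values mean more improvement (an assumption of this sketch).
    improved: consensus rating for the same patients (True = improved).
    """
    n_pos = sum(improved)
    n_neg = len(improved) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(changes)):
        tp = sum(1 for c, y in zip(changes, improved) if y and c >= t)
        tn = sum(1 for c, y in zip(changes, improved) if not y and c < t)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Invented absolute changes in a higher-is-better score for 8 patients.
changes = [1, 2, 3, 5, 6, 8, 9, 12]
improved = [False, False, True, False, True, True, True, True]

cutoff, youden = best_cutoff(changes, improved)
print(cutoff, youden)
```

Each continuous change could then be dichotomized as `change >= cutoff` before entering the regression model; other cut-off criteria (for example, the point closest to the top-left corner of the ROC plot) may give different thresholds.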
Consensus ≥ 80% was achieved for 121 (95%) of the 128 difficult patients, with 98/121 (81%) patients being judged as clinically importantly improved, and 23/121 (19%) patients as not improved. All three nominal groups reached the same consensus opinion as to patient status on all profiles.
Thirteen of the 999 definitions of improvement reached a kappa ≥ 0.8 (almost perfect agreement); their corresponding chi-square values, p values, sensitivity, specificity, percent false positive and false negative rates, AUC, and kappa statistics are reported in Table 4.
After presentation of the above data, attendees used nominal group technique to rate content validity (Step 3) using a 1–5 scale, with five being the highest. The sums of the combined ranks from the three nominal groups are presented in Table 4 (min-max 1–131). Next, the sum of the ranking was multiplied by its respective kappa statistic to obtain the final score (min-max 1–113), thereby allowing identification of the definitions of improvement with the highest final score. The definition of improvement that scored highest was the following: At least 20% improvement from baseline in 3 of any 6 variables with no more than one of the remaining worsening by more than 30%, which cannot be muscle strength (as measured by the CMAS).
As can be seen in Table 4, the definitions that scored second (IMACS P1) and third highest are similar to the first: all require an improvement ≥ 20% in at least 3 core set variables, but they differ in the number of remaining variables allowed to worsen (2 instead of 1) or in the degree of worsening allowed (25% instead of 30%) (13). The similarity of the top ranking definitions indicates convergent validity of the measures. Since the best definitions all had kappa > 0.8, the selection of the final definition of improvement was driven mainly by the ranking (content validity) of the top 5 definitions.
The association between the change in each core set measure and response to therapy was analyzed in a multivariate analysis, as described in the Methods section. In the final model (Table 5), the physician’s global assessment of the patient’s overall disease activity appeared to be the strongest predictor of response to therapy (OR, 11), followed by the CMAS (OR, 10.2) and the parent’s global assessment of the overall child’s well-being (OR, 5.5). The remaining three core set variables, the DAS, the C-HAQ and the CHQ PhS did not reach statistical significance. In the footnote of Table 5 are also reported the best cut-offs for absolute change for the variables included in the model.
Using a combination of data-driven and consensus-formation processes, pediatric rheumatologists with specific expertise in the assessment of JDM developed a provisional validated definition of improvement that PRINTO proposes for use in future JDM clinical trials. Based on the best performing definition, improvement in individual patients with JDM can be defined as follows: any three among the six core set variables improved by at least 20% versus baseline, with no more than one of the remaining variables worsening by more than 30%, which cannot be muscle strength.
The provisional definition selected by the consensus panel performed well in the available data set, with high sensitivity and specificity, and low false-positive and false-negative rates. The consensus process indicated that this definition had the best content validity as well. The main strength of the definition lies in the consensus of a large number of experienced pediatric rheumatologists from many countries, which provided wide international acceptance of the project, and in its strong statistical properties. Furthermore, its core set variables (19) were selected by an evidence-based process and validated through large-scale data collection in patients who had been assessed in a prospective fashion.
During the discussion phase in the content validity session participants made it clear that muscle strength is one of the essential components for the evaluation of response to therapy in JDM. For this reason, all definitions that required muscle strength to not worsen were highly ranked.
Of note, the second highest scoring definition (at least 20% improvement from baseline in 3 of any 6 core set variables with no more than 2 of the remaining worsening by more than 25%, which cannot be muscle strength) is definition P1 selected by the IMACS group (13). This demonstrates convergent validity of the approaches used by the two groups, which confirms the validity of the 2 parallel works and their respective findings in different cohorts. The main difference between the PRINTO and the IMACS group definition of improvement is that we focused on response criteria for use only in JDM and not also in adult patients with DM and PM. Other differences, fully discussed elsewhere (17;19), relate to the core set of variables: first, serum muscle enzymes are included in the IMACS core set but excluded from the PRINTO core set because of their poor statistical performance; second, the PRINTO group included HRQOL assessment as a distinct core set variable specific for children, whereas the IMACS investigators did not incorporate it in the core set, though they recommended including this measure in therapeutic trials of patients with IIM. Future studies in external cohorts will allow the comparison and final validation of the 2 proposed core sets and definitions.
The provisional validated definition of improvement was based on a composite combination of outcome measures that were set up to detect a broad range of clinical change. The PRINTO JDM core set includes both objective and subjective measures from both the physician’s and the patient’s/parents’ perspective. The evaluation of response to therapy from different perspectives has the advantage of covering all changes induced by the agent under study and of providing information related to the entire spectrum of disease manifestations and consequences. It is also expected to provide better discriminant validity than previous clinical trials, which used only muscle strength as the primary outcome (12).
For the practical application of the provisional PRINTO definition of improvement, Table 1 reports the domains and suggested variables included in the final core set for the evaluation of response to therapy in JDM (adapted from ref. (19)). The suggested variables to measure each domain are the ones used for the validation of the core set and of the definition of improvement, but researchers can use other variables that might be more appropriate based on their study design or on new validation data that may appear in the literature. In addition, Table 2 reports 2 examples with data from real patients used at the consensus conference; together with the related formulas, these will help readers apply the PRINTO definition of improvement for JDM. The footnote of Table 5 also reports the best cut-offs for absolute change for the variables included in the model, which might help physicians in daily practice decide if a variable has improved significantly.
A possible limitation of our study is the lack of analysis in the context of a real clinical trial and the fact that the cohort used for the definition/consensus generation is the same as that used for the provisional validation. Another potential limitation is the small sample of not improved patients, since the prevalence of the outcome could have influenced the false positive/negative rates. The main strength resides in the large prospectively collected data set, which is rarely attempted in rheumatic diseases (1;2;13) and which enables a comprehensive evidence-based provisional validation of the JDM core set (19) and related definition of improvement.
In summary, PRINTO developed and validated a data-driven provisional definition of improvement that will help standardize the conduct of JDM clinical trials and assist clinicians in daily practice when attempting to classify patients as either responders or non-responders. The definition of improvement derived here should undergo final validation in future controlled studies in different external cohorts of patients. This will allow examination of its discriminant validity in detecting a therapeutic response greater than placebo or an active comparator, and will establish whether refinements in currently available instruments are required.
We are indebted to Drs. Anna Tortorelli, Monica Tufillo, and Elisabetta Maggi for their help in data handling, organizational skills, and overall management of the project. We are also thankful to Dr Luca Villa and Mr Michele Pesce for their help in database development.
The authors wish to acknowledge the attendees of the Camogli, Italy “International Consensus Conference on defining improvement in JSLE and JDM” for their work during the meeting.
Supported by a grant from the European Union (contract no. QLG1-CT-2000-00514), by IRCCS G. Gaslini, Genoa, Italy, and by the National Institutes of Health (Grant RO3 AI 44046). Lisa G. Rider was supported by the intramural research program of the NIH, National Institute of Environmental Health Sciences.
Italy: Alberto Martini, MD, Prof, Nicolino Ruperto, MD, MPH, Angelo Ravelli, MD, Angela Pistorio, MD, PhD; USA: Edward H Giannini, MSc, DrPH, Daniel J Lovell, MD, MPH; Sweden: Boel Andersson-Gäre, MD, PhD.
Argentina: Carmen De Cunto, MD, Ruben Cuttica, MD; Belgium: Rik Joos, MD; Brasil: Claudia Magalhaes Saad, MD, Sheila Oliveira, MD; Bulgaria: Dimitrina Mihaylova, MD; Canada: Brian M. Feldman, MD, MSc; Croatia: Miroslav Harjacek, MD; Czech Republic: Pavla Dolezalova, MD; Denmark: Susan Nielsen, MD; Finland: Pekka Lahdenne, MD; France: Anne Marie Prieur, MD; Germany: Hans Iko Huppertz, MD; Greece: Florence Kanakoudi Tsakalidou, MD; Israel: Philip Hashkes, Yosef Uziel, MD; Latvia: Ingrida Rumba, MD; Mexico: Ruben Burgos Vargas, MD; Netherlands: Nico Wulffraat, MD; Norway: Berit Flato, MD; Poland: Malgorzata Wierzbowska, MD; Portugal: Jose Antonio Melo-Gomes, MD; Serbia and Montenegro: Gordana Susic, MD; Slovakia: Richard Vesely, MD; Slovenia: Tadej Avcin, MD; Switzerland: Michael Hofer, MD; Turkey: Huri Ozdogan, MD; United Kingdom: Clarissa Pilkington, MD, Madeleine Rooney, MD; USA: Daniel J. Lovell, MD, MPH, Lauren M. Pachman, MD, Lisa G. Rider, MD, Ann M. Reed, MD, Robert Rennebohm, MD, Carol Wallace, MD.
Brasil: Marcia Bandeira, MD; Greece: Jenny Pratsidou, MD; Argentina: Stella Maris Garay, MD.