This prospective longitudinal multicenter study of ambulatory children with cerebral palsy (CP) examined changes in outcome tool scores over time and tool responsiveness, and used a systematic method to define minimum clinically important differences (MCIDs). Participants were 381 children with CP (Gross Motor Function Classification System [GMFCS] Levels I–III; age range 4–18y, mean age 11y [SD 4y 4mo]; 265 with diplegia, 116 with hemiplegia; 230 males, 151 females). At baseline and at follow-up at least 1 year later, the Functional Assessment Questionnaire, Gross Motor Function Measure, Pediatric Quality of Life Inventory, Pediatric Outcomes Data Collection Instrument, Pediatric Functional Independence Measure, temporal–spatial gait parameters, and oxygen cost were collected. Adjusted standardized response means were used to determine tool responsiveness for nonsurgical (n=292) and surgical (n=87) groups at GMFCS Levels I to III. Most scores reaching medium or large effect sizes were for GMFCS Level III. Change scores from the nonsurgical group were used to calculate MCID thresholds for ambulatory children with CP. These values were verified by examining participants who changed GMFCS levels. Tools measuring function were responsive when a change large enough to cause a change in GMFCS level occurred. MCID thresholds can be used to assess change in study populations over time and serve as the basis for designing prospective intervention studies.
Assessing physical function and quality of life using valid and reliable outcome tools is important for the clinical care of individuals with cerebral palsy (CP). One crucial attribute of outcome tools is responsiveness: the ability to detect change when it occurs and to remain stable when no change occurs.1–4 The standardized response mean (SRM)5–13 and effect size14 have been used to establish responsiveness. The responsiveness of the Pediatric Functional Independence Measure (WeeFIM), Pediatric Outcomes Data Collection Instrument (PODCI), and Gross Motor Function Measure (GMFM-88; GMFM-66) has been reported.7,15,16 The responsiveness of other outcome tools commonly used to assess individuals with CP still needs to be established.
It is important to understand the minimum clinically important difference (MCID) for outcome tools. MCID is a threshold for determining when meaningful changes occur. MCIDs have been defined and calculated in various ways.7,15,17–19 Reported changes frequently reach statistical significance but may not be clinically meaningful.20,21 For this study, MCID is the magnitude of change required for an observable difference in function, and is quantified using effect sizes. Small effect sizes may be described as imperceptible to the human eye, medium as being large enough to be seen in normal observation, and large as grossly observable.3
MCIDs have been reported for the GMFM-88 and GMFM-66,15 the Pediatric Quality of Life Inventory (PedsQL),19 and the WeeFIM.7 The techniques used to obtain these thresholds varied, as did the populations from which they were derived. No MCIDs have been established for the PODCI, temporal–spatial gait parameters, or energy cost during walking (O2 cost).
This study determined how outcome measures change over 1 year in ambulatory children with CP, with and without surgical intervention. Responsiveness was assessed using SRM and a known change in function defined as a change in Gross Motor Function Classification System (GMFCS) level between assessments. MCIDs were established for the GMFM, PODCI, PedsQL, WeeFIM, O2 cost, and temporal–spatial gait parameters for ambulatory children with CP. The results provide critical information for designing intervention studies and interpreting results.
This 6-year prospective multicenter study with cross-sectional and longitudinal components was conducted at seven pediatric orthopedic facilities located across the USA (California, Kentucky, Massachusetts, Missouri, Texas, Utah, Virginia) that each treat children from several surrounding states. Institutional Review Board approval was obtained at each site and all participants signed consent, assent as appropriate, and privacy and confidentiality forms. A complete description of the methods was previously reported.22
GMFM Dimensions D (standing) and E (walking, running, and jumping), Parent and Child PedsQL, Parent and Child PODCI,23 Functional Assessment Questionnaire (FAQ),24 WeeFIM, O2 cost, temporal–spatial gait parameters, and GMFCS level were collected at baseline and at follow-up at least 1 year later. Before the start of the study, local coordinators were trained in GMFCS classification, tool administration, and data collection procedures. Consistency among coordinators was verified. Data were collected into a study-specific database by direct computer entry and reviewed by the project manager for completeness and accuracy.
Participants were recruited and enrolled for both the cross-sectional and longitudinal study phases. Inclusion criteria were: diagnosis of CP, GMFCS Levels I to III, ages 4 to 18 years, and ability to complete a gait evaluation. Exclusion criteria were: previous selective dorsal rhizotomy, lower extremity orthopedic surgery within the previous year, botulinum toxin A injections within the previous 6 months, or a currently operating baclofen pump.
Of 562 participants who completed baseline assessments, 387 completed the follow-up evaluation (68.9%). Of those who did not complete the follow-up evaluation, 95 could not be contacted (16.9%), 29 declined (5.2%), nine did not attend their follow-up (1.6%), four were no longer ambulatory (0.7%), two had surgery less than 1 year from baseline at study completion (0.4%), and 36 did not complete it for other reasons (6.4%). Six (1.6%) were excluded from analysis because of incomplete data, resulting in a final sample of 381. At baseline, there were no differences between those who completed the follow-up and the 175 who did not in age, height, weight, type of involvement, sex, GMFCS level, birth history, or ethnicity. Of the 381 participants, there were 230 (60%) males and 151 (40%) females; 174 (46%) at GMFCS Level I, 132 (34%) at Level II, and 75 (20%) at Level III; 265 (69%) with diplegia and 116 (31%) with hemiplegia; and the sample was predominantly Caucasian (83%). Mean age at baseline was 11 years (SD 4y 3mo, range 4y 3mo–18y 4mo) and at follow-up was 12 years 5 months (SD 3y 3mo, range 5y 2mo–20y 6mo). The mean time between assessments was 1 year 5 months (SD 5mo). Demographics are reported in Table I. Individual treatment plans prescribed by the participants’ physicians were followed between assessments; 87 participants (23%) had orthopedic surgery during the study period.
The study sample was separated by GMFCS level and by those with surgical intervention between assessments (n=87; GMFCS I=32, II=35, III=20) and those without (n=292; GMFCS I=141, II=96, III=55). The nonsurgical group represents changes over time with standard care excluding orthopedic surgery. The surgical group represents changes over time due to orthopedic surgery, which were expected to exceed the nonsurgical group’s changes.
Change scores were calculated for all tools by GMFCS level as follow-up minus baseline score. Mean and SD of the outcome tools’ change scores were calculated for the surgical and nonsurgical groups.
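A minimal sketch of this step, assuming each record is a hypothetical (group, GMFCS level, baseline, follow-up) tuple for one tool; the record layout and the function name are illustrative and do not describe the study's database.

```python
from statistics import mean, stdev


def summarize_change(records):
    """Mean and SD of change scores (follow-up minus baseline) per group and GMFCS level.

    records: iterable of (group, gmfcs_level, baseline, follow_up) tuples
    (hypothetical layout: one tuple per participant for a single tool).
    Returns {(group, gmfcs_level): (mean_change, sd_change, n)}.
    """
    changes = {}
    for group, level, baseline, follow_up in records:
        # Change score is defined as follow-up minus baseline.
        changes.setdefault((group, level), []).append(follow_up - baseline)
    summary = {}
    for key, values in changes.items():
        # A sample SD needs at least two participants in the cell.
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[key] = (mean(values), sd, len(values))
    return summary


# Made-up scores for illustration only.
records = [
    ("nonsurgical", "I", 70.0, 72.0),
    ("nonsurgical", "I", 65.0, 64.0),
    ("nonsurgical", "I", 80.0, 83.0),
    ("surgical", "III", 45.0, 50.0),
    ("surgical", "III", 40.0, 43.0),
]
summary = summarize_change(records)
print(summary[("surgical", "III")])  # mean 4.0, SD about 1.414, n = 2
```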
Effect sizes are standardized, unit-free measures, calculated as mean/SD.4 Cohen25 introduced guidelines for interpreting effect size magnitudes: 0.2 small, 0.5 medium, and 0.8 large. The use of Cohen’s effect sizes assumes that the study groups are independent, with equal sample sizes and a common within-population SD.26
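The cut-points above can be applied mechanically. The sketch below is illustrative only: it assumes the SD in the denominator is the SD of the change scores, labels values below 0.2 as "trivial" (the term used in the discussion), and ignores the sign so that declines and improvements are classified alike. The function names are hypothetical.

```python
from statistics import mean, stdev


def effect_size(changes):
    """Unit-free effect size: mean change divided by the SD of change (assumed denominator)."""
    return mean(changes) / stdev(changes)


def cohen_label(es):
    """Label the magnitude of an effect size using Cohen's cut-points 0.2, 0.5, 0.8."""
    magnitude = abs(es)  # direction (improvement vs decline) is ignored here
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "trivial"


print(effect_size([1.0, 2.0, 3.0]), cohen_label(0.6))  # 2.0 medium
```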
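The effect size for repeated measures of independent samples was renamed the standardized response mean (SRM) by Liang et al.12 and is stated as (Formula 1):
$$\mathrm{SRM} = \frac{\bar{x}_{\mathrm{change}}}{SD_{\mathrm{change}}}$$
where $\bar{x}_{\mathrm{change}}$ and $SD_{\mathrm{change}}$ are the mean and SD of the change scores.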
To apply the SRM to dependent samples, one must account for pooled SDs and for the correlation between measures. The SRM for dependent samples is referred to as the adjusted standardized response mean (SRMa). The factors √2 and √(1–r), where r is the correlation between repeated measures, are added to the equation to account for pooled samples and correlations respectively.26 The resulting equation for SRMa is (Formula 2):
$$\mathrm{SRM_a} = \frac{\bar{x}_{\mathrm{change}}}{SD_{\mathrm{change}}}\sqrt{2(1-r)}$$
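A sketch of the adjusted SRM, assuming the adjustment multiplies the SRM by √2 · √(1 − r) = √(2(1 − r)) as described above, with r taken as the Pearson correlation between baseline and follow-up scores. The helper names are hypothetical, and the code does not guard against zero-variance inputs.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between paired baseline and follow-up scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def srm(changes):
    """Unadjusted SRM: mean change / SD of change."""
    return mean(changes) / stdev(changes)


def srm_adjusted(baseline, follow_up):
    """SRMa = SRM * sqrt(2 * (1 - r)), r = baseline/follow-up correlation (assumed form)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    r = pearson_r(baseline, follow_up)
    return srm(changes) * math.sqrt(2.0 * (1.0 - r))


baseline = [1.0, 2.0, 3.0, 4.0]
follow_up = [2.0, 4.0, 4.0, 6.0]
print(srm_adjusted(baseline, follow_up))  # roughly 0.83; r is about 0.95 here
```

Under this form, a high baseline/follow-up correlation shrinks the SRMa relative to the unadjusted SRM, and r = 0.5 leaves it unchanged.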
MCID27 may be subjectively defined based on patient or family rating, or by clinicians from training and personal experience. Some investigators have proposed objective techniques to define MCID.19,26
The MCIDs of study tools were calculated based on non-surgical group data. This non-operative treatment group was used to estimate the population SD for change and the correlation between baseline and follow-up scores.
The MCID is based on the assumption that the mean change score needed to obtain a medium or large effect size is clinically meaningful. This assumption is supported by Portney and Watkins,3 who stated that medium effect sizes are observable and large effect sizes are grossly observable. Based on this assumption, the MCID equation was derived from the effect size of the change scores (Formula 3):
$$\mathrm{SRM} = \frac{\bar{x}_{\mathrm{change}}}{SD_{\mathrm{change}}}$$
Substituting MCID as the mean change score (Formula 4):
$$\mathrm{MCID} = \mathrm{SRM} \times SD_{\mathrm{change}}$$
where any desired effect size can be substituted into the equation.
Because the MCID equation is based on independent samples, Formula 2 was solved for SRM (independent samples; Formula 5):
$$\mathrm{SRM} = \frac{\mathrm{SRM_a}}{\sqrt{2(1-r)}}$$
where SRMa is the desired effect size.
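After substituting Formula 5 into Formula 4, the final MCID equation is (Formula 6):
$$\mathrm{MCID} = \frac{\mathrm{SRM_a} \times SD_{\mathrm{change}}}{\sqrt{2(1-r)}}$$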
From here, 0.5 was substituted for medium (observable) effect size and 0.8 for large (grossly observable) effect size. An example calculation for the GMFM-66 MCID at GMFCS Level I is:
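The sketch below evaluates MCID = SRMa × SD_change / √(2(1 − r)) with hypothetical inputs. They are invented for illustration and are not the study's GMFM-66 values; the function name is likewise hypothetical.

```python
import math


def mcid(sd_change, r, target_srma):
    """MCID = SRMa * SD_change / sqrt(2 * (1 - r)).

    sd_change:   SD of change scores in the reference (e.g. nonsurgical) group
    r:           correlation between baseline and follow-up scores in that group
    target_srma: desired adjusted effect size (0.5 medium, 0.8 large)
    """
    if not -1.0 <= r < 1.0:
        raise ValueError("r must lie in [-1, 1)")
    return target_srma * sd_change / math.sqrt(2.0 * (1.0 - r))


# Hypothetical inputs for illustration only, not values from Table II or III.
sd_change, r = 4.0, 0.875
print(mcid(sd_change, r, 0.5))  # medium: 0.5 * 4.0 / sqrt(0.25) = 4.0
print(mcid(sd_change, r, 0.8))  # large: 0.8 * 4.0 / 0.5, about 6.4 after float rounding
```

With r = 0.875 the denominator is 0.5, so the medium threshold is twice 0.5 × SD_change. Under this formula, strongly correlated repeated measures therefore need a larger mean change to reach a given SRMa.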
A secondary analysis to test the responsiveness of the study tools and the newly established MCIDs was based on individuals who changed GMFCS levels between study assessments. The GMFCS is a standardized classification system designed to reflect differences in gross motor function that are meaningful in the daily lives of children with CP and their families28 and that are clinically meaningful.29 The GMFCS has proven stable over time: 73% of children remained in the same level over time and 87.5% were classified in the same level as at previous visits, with 11.7% reclassified by one GMFCS level.28 Reclassification to a higher level of ability is valid for children who demonstrate considerable improvement in gross motor function.28 The GMFCS is also highly correlated (–0.91) with the functional capacity measure of the GMFM-88.30 Previous work by the Functional Assessment Research Group demonstrated clear functional differences among GMFCS Levels I to III on standardized outcome tools.22
GMFCS levels at baseline and follow-up were verified for 377 of the 381 participants. Thirty-one (8%) individuals had changes in function, defined as a verified change in GMFCS level between assessments, and were placed into an Improved (n=18) or Declined (n=13) group; those whose level stayed the same formed a No Change group (n=346). Child PODCI and PedsQL responses were excluded because of small sample sizes. t-tests determined whether change scores were significantly different from zero (p≤0.05). Comparisons among groups were made using analysis of variance (ANOVA) with post-hoc Tukey tests for results with p≤0.05. Change scores were compared with MCID values derived from the nonsurgical group (n=292).
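A minimal sketch of the grouping and MCID comparison, assuming hypothetical (baseline level, follow-up level, change score) tuples and MCID thresholds supplied by the caller. The tuple layout, the function names, and the choice to compare the absolute mean change (so that declines are judged on the same scale as improvements) are illustrative; the t-tests and ANOVA are not reproduced.

```python
from statistics import mean


def gmfcs_change_group(baseline_level, follow_up_level):
    """A lower GMFCS level means better gross motor function."""
    if follow_up_level < baseline_level:
        return "Improved"
    if follow_up_level > baseline_level:
        return "Declined"
    return "No Change"


def compare_to_mcid(participants, mcid_medium, mcid_large):
    """participants: (baseline_level, follow_up_level, change_score) tuples.

    Returns {group: (mean_change, label)}, where the label says which MCID
    threshold the absolute mean change reaches.
    """
    groups = {}
    for base, follow, change in participants:
        groups.setdefault(gmfcs_change_group(base, follow), []).append(change)
    result = {}
    for group, changes in groups.items():
        m = mean(changes)
        if abs(m) >= mcid_large:
            label = "reaches large MCID"
        elif abs(m) >= mcid_medium:
            label = "reaches medium MCID"
        else:
            label = "below medium MCID"
        result[group] = (m, label)
    return result


# Made-up data: levels are 1-3, change scores are for a single tool.
participants = [(2, 1, 3.0), (3, 2, 2.0), (1, 2, -5.0), (2, 3, -3.0), (1, 1, 0.5), (2, 2, -0.5)]
print(compare_to_mcid(participants, mcid_medium=2.0, mcid_large=3.5))
```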
Table II reports the means and SDs of change scores for the nonsurgical and surgical groups by GMFCS level. SDs of change were large compared with the magnitude of mean changes.
The SRMa values of the nonsurgical and surgical groups by GMFCS level are reported in Table II. For the GMFCS Level I nonsurgical group, the only score exceeding a medium effect size was Parent PODCI Upper Extremity. No tools achieved a medium effect size for GMFCS Level II. For the GMFCS Level III nonsurgical group, GMFM Dimension E achieved a medium effect size and Dimension D a large effect size. For the GMFCS Level I surgical group, Parent PODCI Global Function and Parent PODCI Upper Extremity achieved a medium effect size and Parent PODCI Transfers achieved a large effect size. For the GMFCS Level II surgical group, only WeeFIM Self Care reached a medium effect size. For the GMFCS Level III surgical group, six outcome scores reached a medium effect size: GMFM-66, Parent PedsQL School Functioning, Parent PODCI Global Function, Parent PODCI Comfort/Pain, Parent PODCI Satisfaction, and Parent PODCI Transfers. For the GMFCS Level III surgical group, Parent PedsQL Social Functioning and WeeFIM Cognition reached a large effect size.
The minimum change scores needed for an MCID on the outcome tools at medium and large effect sizes are reported in Table III. Change scores exceeding MCIDs for a medium effect size are shown in bold type in Table II.
In the Improved group, 12 participants changed from GMFCS Level II to I and six from III to II. In the Declined group, five changed from GMFCS Level I to II and eight from II to III. In the Improved group, 9 of 18 had surgery between assessments. In the Declined group, 6 of 13 had surgery between assessments. In the No Change group, 72 of 346 had surgery between assessments (274 did not).
Change scores significantly different from zero (p<0.05) are reported in Table IV. Those with significant differences (p<0.05) between groups (No Change vs Declined, Improved vs Declined) are highlighted. With the exception of PODCI Satisfaction, all change scores for the Declined group in Table IV exceeded MCID for a medium effect size, with six of ten exceeding the thresholds for large effect size. For the Improved group, four of ten exceeded MCID for a medium effect size, and three exceeded MCID for large effect sizes. For the No Change group, change scores did not exceed the medium effect size for any tool. Overall, scores increased for those with an improved GMFCS level, decreased for those with a declined level, and varied little for those with no change. Change scores were larger in the Declined group than the Improved group.
This study sought to evaluate change and responsiveness of outcome tools assessing ambulatory children with CP over time. Although the tools are valid and reliable,23,24,31–33 assessments of responsiveness have been limited to intervention studies, small sample sizes, and a limited number of tools.7,15,16 Because MCIDs have not been established for many tools, interpretation often relies on anecdotal experience. A systematic method of defining MCIDs for ambulatory children with CP is presented in this study.
The responsiveness of outcome tools is difficult to assess because changes must be compared with a known change in function. However, no criterion standard of change exists. In the absence of a known change, responsiveness has been assessed using statistical techniques including SRM5–8,10–12 in studies of childhood disabilities, including CP.7
The magnitude of the SRMa is affected by both the change scores and the variability of the changes. Heterogeneity in diagnosis and severity level in CP results in highly variable baseline and change scores. In this study, inclusion criteria were broad and no constraints were placed on participants during the study duration, resulting in a diverse sample. Owing to high variability, large change scores are needed to obtain large SRMa values.
The natural history of children with CP is characterized by a gradual deterioration of gait function over 1 year 6 months to 5 years.34–36 Therefore, minimal changes were expected over the 17-month study period for the nonsurgical group. This was supported by the nonsurgical group’s trivial to small SRMa values at all GMFCS levels. Orthopedic surgery is expected to maintain or improve function.21,36 Surgical interventions varied from isolated releases to multilevel bony and soft-tissue procedures. In the surgical group, SRMa values were trivial or small for most tools, and change was not demonstrated at every GMFCS level. Most SRMa values that reached medium and large effect sizes were noted for GMFCS Level III. Small SRMa values may be due to maintenance of function or to the variability of the surgical group’s change scores. Other factors that may contribute to small SRMa values in both groups are limited tool resolution, absence of any actual change, or tools not measuring the relevant aspects of change.
The transition from SRMa to the calculation of MCID is a novel approach. MCIDs have been reported for the PedsQL,19 WeeFIM,7 and GMFM-66.15,18 The MCIDs in this study are higher than previously reported,7,15,19 probably because of the larger sample size with greater variability in change data. In the investigators’ clinical experience, the calculated MCIDs are consistent with anecdotal reports. MCIDs based on personal experience and observation may not account for the variability of the change data and so may be overly optimistic.
MCIDs established in this study are based on longitudinal data from a large sample of ambulatory children with CP, from seven geographic regions, and with no surgical intervention during the study period. This is more generalizable than previously reported MCIDs, which were based on a variety of diagnostic populations, from single geographic areas, and primarily collected cross-sectionally.7,15,19 These data define expected change over one year for other studies with the same population. If changes exceed the reported MCIDs, a clinically meaningful change is likely to have occurred.
To establish tool responsiveness further, children who changed GMFCS levels between study assessments were identified and change scores analyzed. The GMFCS is a widely accepted, valid, and stable tool for classifying severity level of children with CP (κ=0.75; children younger than 2y).28,29 In this study, outcome tools were responsive when compared with improvements and declines in function. Changes in functional subscales were most consistent with changes in GMFCS level.
Larger change scores in the Declined group may reflect the more subtle functional differences between GMFCS Levels I and II than between Levels II and III. In the Improved group, most changed from Level II to I, whereas most in the Declined group changed from Level II to III. Children in Level II differ from those in Level I in having limitations in the ease of performing movement transitions, in walking outdoors and in the community, and in performing gross motor skills such as running and jumping.29 Children in Level III need assistive devices to walk, whereas children in Level II do not require them after age 4.29
This study provides clinicians and researchers with data that apply directly to clinical practice. Because data for all tools were collected on the same population, change scores can be compared across tools. The results apply to changes in group means; therefore clinicians should not interpret the results at the individual level. MCID thresholds can help clinicians and researchers move beyond interpreting findings strictly on the basis of statistical significance. Effect sizes are a useful statistical technique because they are operationally defined, independent of sample size, and unit-free.37 However, they can be difficult to translate into the clinical setting. Converting an effect size (SRMa) to an MCID makes the measure more useful because MCIDs are expressed in units of the tool score. MCID thresholds for both medium and large effect sizes were reported based on Cohen’s operational definition, where a medium effect size is of moderate clinical importance and a large effect size is of crucial clinical importance.37 If a change score exceeds the MCID, the change is likely to be of clinical importance. For researchers, the results of this study can help determine whether statistically significant results are also clinically relevant, be used for power calculations, and assist in tool selection for future studies.
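As one illustration of the power-calculation use, the sketch below applies the textbook normal-approximation sample-size formula for a two-sided paired (one-sample) test on change scores, n = ((z₁₋α/₂ + z_power) × SD_change / MCID)². This formula is a standard substitute and not a method from this study. A t-based calculation gives a slightly larger n, and a two-arm intervention-versus-control design would need a different formula. The function name and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(mcid, sd_change, alpha=0.05, power=0.80):
    """Approximate n to detect a mean change of `mcid` with a two-sided paired test.

    Normal approximation: n = ((z_{1-alpha/2} + z_{power}) * SD_change / MCID) ** 2,
    rounded up to the next whole participant.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd_change / mcid) ** 2)


# Hypothetical inputs: an MCID of 1.5 points with an SD of change of 3.0 points.
print(paired_sample_size(1.5, 3.0))  # 32
```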
GMFCS level was used as the measure of change in function because there is no criterion standard for functional change. Examining differences between Improved, Declined, and No Change groups was limited because of small sample sizes in the Improved and Declined groups. Small sample sizes in the Improved group may have been because children in GMFCS Level I cannot, by definition, improve.
SRMa and MCID values are based on data from the non-surgical group, including individuals of different ages. The SRMa and MCID values may differ based on age; however, age was not a significant predictor of changes in outcome scores from other analyses of this study.
The short-term follow-up of the study participants may have limited the magnitude of changes seen for either non-surgical or surgical treatments. Although large effect sizes are more likely to be clinically relevant than small ones, this is not always true.38 Ideally, clinical relevance would also include an external standard.
This study refined the understanding of how ambulatory children with CP function, by examining change scores over time in a large sample. Outcome tools demonstrated few changes beyond a small effect size for either the surgical or nonsurgical group. Tools were responsive when a change in function occurred large enough to cause a change in GMFCS level. A systematic method of defining MCID was established using the variability of the change scores in the nonsurgical group. These threshold values can be used to assess the change in study populations over time. They can also serve as the basis for designing prospective intervention studies.
This work was funded by Shriners Hospitals for Children, Clinical Outcomes Study Advisory Board Grant no. 9140 ‘A cross-sectional and longitudinal assessment of outcome instruments in patients with ambulatory cerebral palsy’. The authors acknowledge the following investigators for their contribution to this article and the Functional Assessment Research Group (FARG) project: Judi Linton MS PT (SHC Houston, TX); Elroy Sullivan PhD (SHC Houston, TX); Mark Romness MD (UVA); and Diane Nicholson PhD PT (SHC Salt Lake City, UT). The authors also acknowledge from each site the participation of the FARG study coordinators, and the patients and their families.
D Oeffinger, Shriners Hospital for Children (SHC), Lexington, KY.
A Bagley, Sacramento, CA.
S Rogers, Lexington, KY.
G Gorton, Springfield, MA.
R Kryscio, University of Kentucky, Lexington, KY.
M Abel, University of Virginia, Charlottesville, VA.
D Damiano, Washington University, St. Louis, MO.
D Barnes, Houston, TX.
C Tylkowski, Lexington, KY, USA.