The clinical meaning of change in measures must be understood in order to interpret effects over time in observational studies and clinical trials. Since physical performance measures are becoming preferred indicators of health and function in older adults, it is essential to develop supportive evidence for their use as measures of change. This study contributes to this purpose with the novel finding that, in a single-blinded clinical trial, the relationship between self-report and performance measures was consistent between two intervention groups. Furthermore, this study provided estimates of the magnitudes of performance change based on clinical trial data. Estimates for gait speed and SPPB appeared to be in the same range as earlier calculations from smaller studies (5). This report provided the first estimates for meaningful change in 400 meter walk time.
This study had several strengths. It was based on longitudinal rather than cross-sectional data, making estimates of change more valid and reliable (21). The data came from a large multi-site clinical trial that targeted a population of older adults with mild to moderate mobility limitations. This was a population at high risk of future disability, and therefore likely to be the target of future clinical trials. Research on this population is therefore likely to provide estimates that are generalizable and useful in future studies. In contrast to observational studies, where performance tends to decline over time and improvement is uncommon, the LIFE-P intervention increased the potential to improve performance, and allowed the magnitudes of both improvement and decline to be estimated. The LIFE-P trial also used meticulous training protocols and quality assurance methods to produce highly reliable performance measures, reducing noise and error in the data. The analyses performed here used state-of-the-art approaches to estimating meaningful change, and the sample size allowed us to compare effects in subgroups based on treatment arm and direction of change.
We used two analytic methods, two magnitudes of change and multiple indicators of self-perceived mobility in our work. Anchor-based methods have strong face validity for clinical meaning because they are based on a clinical perception of change, while distribution-based methods have optimal capacity to maximize precision (24). The combined use of both methods has been recommended as the best approach to balance clinical meaning and precision (24). We calculated two levels of change: minimally important and substantial. The minimally (clinically) important difference (M(C)ID) has been used traditionally to estimate power and sample size (10). We believe, however, that a larger magnitude of change, one that is considered substantial by patients or moderate by effect size estimates, is valuable. In the clinical arena, changes in health and function are perceived as smaller or larger, and larger changes might be more valued or worth more effort than smaller changes (25). A clinical trial that achieves a substantial rather than a minimally important change might be considered to have had a greater effect. We used multiple anchors, as has been recommended by others, in order to seek consistency across individual items and increase the robustness of our conclusions (24).
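The distribution-based side of this approach can be illustrated with a minimal Python sketch. It assumes the conventional effect-size thresholds of 0.2 (small) and 0.5 (moderate) standard deviations, which are not stated in the text above; the function name and the baseline values are hypothetical and are not LIFE-P data.

```python
from statistics import stdev

def distribution_based_thresholds(baseline, small_es=0.2, moderate_es=0.5):
    """Scale effect-size thresholds by the baseline standard deviation.

    small_es and moderate_es are assumed conventional values, not values
    taken from the study.
    """
    sd = stdev(baseline)  # sample standard deviation of baseline scores
    return {
        "minimal": small_es * sd,         # small effect size x SD
        "substantial": moderate_es * sd,  # moderate effect size x SD
    }

# Made-up baseline gait speeds in m/s, for illustration only.
example_baseline = [0.6, 0.8, 1.0, 1.2]
thresholds = distribution_based_thresholds(example_baseline)
print(thresholds)
```

Because the thresholds depend only on the spread of baseline scores, they are precise but carry no direct clinical interpretation, which is why they are paired with anchor-based estimates.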
This study has limitations. Our self-reported mobility anchors appeared to have substantial ceiling effects, since over half of participants had to be excluded from some of the analyses because they reported “no difficulty”. This led to reduced sample sizes and lower precision in some cases. Newer self-reported mobility items that include degree of ease as well as degree of difficulty, such as those used in the Health, Aging and Body Composition Study (Health ABC) (2), may expand the ability to detect change, especially improvement, in performance measures. The study had small to moderate rates of missing data at 12-month follow-up. It is possible that our estimates might be biased by this censoring. Despite the large sample size of the LIFE-P trial, it was still inadequate to reliably estimate the magnitude of improvement versus decline, and we are unable to state with certainty whether symmetry can be assumed.
We made assumptions about the magnitude of change in ordinal anchor measures that could be considered minimally detectable or substantial. This problem is inherent when using ordinal measures, which have no defined magnitude between levels. We arbitrarily defined a one-level change in degree of difficulty as minimally detectable change and a change of two or more levels as substantial change. This approach has been recommended by others to create more than one level of change (25). We acknowledge that we did not account for differences in baseline degree of difficulty; for example, a one-level change could occur from “no difficulty” to “a little”, or from “a little” to “some”. Further insights into the effect of baseline status on estimates of change require much larger sample sizes. Rasch analysis of ordinal data could help calibrate the distance between ordinal points, as has been done with other mobility scales (26).
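The anchor classification described above can be sketched in Python as follows. This is an illustrative sketch only: the function names, the sign convention (positive values mean improvement) and the example values are assumptions, not the study's code or data.

```python
from statistics import mean

def classify_anchor_change(levels):
    """Map a signed change in anchor levels to a change category.

    One level is treated as minimal and two or more levels as substantial,
    as defined in the text. Positive values denote improvement here, which
    is an assumed sign convention.
    """
    if levels == 0:
        return "no change"
    size = "minimal" if abs(levels) == 1 else "substantial"
    direction = "improvement" if levels > 0 else "decline"
    return f"{size} {direction}"

def mean_change_by_category(anchor_levels, performance_changes):
    """Average the performance change within each anchor category."""
    groups = {}
    for levels, delta in zip(anchor_levels, performance_changes):
        groups.setdefault(classify_anchor_change(levels), []).append(delta)
    return {category: mean(values) for category, values in groups.items()}

# Made-up anchor shifts and gait speed changes (m/s), for illustration only.
print(mean_change_by_category([1, 1, 2, 0, -1], [0.04, 0.06, 0.10, 0.01, -0.05]))
```

The sketch treats every one-level shift as equivalent regardless of the starting level, which mirrors the limitation acknowledged above.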
Interestingly, the relationship between self-reported “no change” and the magnitude of change in the three performance measures was not consistent. When subjects reported “no change”, the mean 400 meter walk time was slightly worse, but the mean gait speed and SPPB were slightly better. We do not know why these effects were discordant.
Finally, this study, like a prior study of meaningful change, yielded estimates of minimally important change in SPPB that were smaller than a one-unit change in the score (5). While a meaningful change of less than one point could not be detected in an individual, it could be used for groups. For example, differences in between-group mean change can be used for power estimates or for interpreting the importance of intervention effects.
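As an illustration of how a between-group difference in mean change feeds a power estimate, the following Python sketch applies the standard normal-approximation formula for comparing two group means, n = 2(z_{1-α/2} + z_{power})² σ² / δ² per group. The function name, default values and example inputs are assumptions chosen for illustration; they are not estimates from this study.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample comparison.

    delta: between-group difference in mean change to detect
    sd:    assumed common standard deviation of the change scores
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)

# Hypothetical inputs: detect a 0.5-point difference with an SD of 1.0 point.
print(n_per_group(delta=0.5, sd=1.0))
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the choice between a minimally important and a substantial change matters for trial planning.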
In order to enhance the utility of performance measures in research and clinical care of older adults, further work on meaningful change is needed. Future studies could examine the consistency of estimates of meaningful change across subgroups based on demographics, initial health or functional status, as well as in subgroups with intercurrent events such as beneficial treatments or adverse health events.
While clinically meaningful change has traditionally referred to change that is detectable to patients, significant others, or providers, subclinical change can be important when it predicts future clinically relevant states. For this reason, additional research should assess the effect of change on future events such as hospitalization, mobility disability and survival. We hope that the estimates provided here will be of assistance to both the developers and users of physical performance data so that the mobility, health and function of older adults are improved.