Understanding changes in patient-reported outcomes is indispensable for the interpretation of results from clinical studies. Consequently, the term “minimal clinically important difference” (MCID) was coined in the late 1980s to ease the classification of patients into improved, unchanged or deteriorated. Several methodological categories have been developed to determine the MCID; however, all are subject to weaknesses or biases that reduce the validity of the reported MCID. The objective of this study was to determine the reproducibility and validity of a novel method for estimating low back pain (LBP) patients’ view of an acceptable change (MCIDpre) before treatment begins. One hundred and forty-seven patients with chronic LBP were recruited from an out-patient hospital back pain unit and followed over an 8-week period. Original and modified versions of the Oswestry disability index (ODI), Bournemouth questionnaire (BQ) and numeric pain rating scale (NRSpain) were filled in at baseline. The modified questionnaires determined what the patient considered an acceptable post-treatment outcome, which allowed us to calculate the MCIDpre. Concurrent comparisons were made between the MCIDpre, instrument measurement error and a retrospective approach to establishing the minimal clinically important difference (MCIDpost). The results showed the prospective acceptable outcome method scores to have acceptable reproducibility outside measurement error. The MCIDpre was 4.5 times larger for the ODI and 1.5 times larger for the BQ and NRSpain compared to the MCIDpost. Furthermore, the MCIDpre and the patients’ post-treatment acceptable change were almost equal for the NRSpain but not for the ODI and BQ. In conclusion, chronic LBP patients have a reasonably realistic idea of an acceptable change in pain, but probably an overly optimistic view of changes in functional and psychological/affective domains before treatment begins.
Patient-reported outcomes provide the patient’s perspective on the effectiveness of treatment, and this has become an important source of health outcome endpoint data, also for patients with low back pain (LBP). Using such measures requires a detailed understanding of the meaning of the differences observed when they are used in a longitudinal setting. In 1989, Jaeschke et al. coined the term “minimal clinically important difference” to describe the smallest difference in score which patients perceive as beneficial and which would mandate a change in the patient’s management.
So far, three methodological categories have been used to determine a meaningful change in measured health status. By far the most common are the a posteriori anchor-based approaches, where observed change scores on the outcome measure whose interpretation is under question are compared to some independent measure serving as an aid for interpretation. Of these approaches, the single-anchor method employing the patients’ global rating of change is the most widely used, as it specifies a threshold between important and trivial change (MCIDpost) [9, 19]. Approaches employing the patients’ global rating of change are, however, limited by recall bias [18, 31, 33] and do not, in themselves, account for the measurement error of the instrument. Other weaknesses have been mentioned, such as present-state bias, where the patients’ global rating correlates more with the present health state than with the experienced change during the treatment; motivational bias, where patients going through a cumbersome treatment have a tendency to overestimate their improvement; and the complexity of the global rating of change question compared to questions in multi-item scales. Finally, little information exists on the reliability and validity of the global ratings [9, 10].
Two a priori methods for estimating the minimal important difference include the between-patient method developed by Redelmeier et al. and the clinician-based prognostic rating method by Westaway et al. Several limitations have been highlighted for the between-patient method. First, it is questionable whether the results of this method are comparable to within-patient methods. Second, the method poses the practical difficulty of assembling a representative group of patients. Finally, the generalisability of the results across disease states has been challenged. A drawback of the prognostic rating method is that it focuses on the clinician, and not the patient, as the arbiter of good prognosis. Thus, the question “how do we determine a clinically acceptable change before treatment begins?” remains unanswered. This is unfortunate, as aligning clinician expectations with what is acceptable to the patient is important for the clinical outcome and satisfaction [23, 29].
We therefore developed a novel a priori method of estimating low back pain patients’ acceptable outcome of treatment by modifying the standard outcome instruments. Comparing the modified questionnaire score to the standard instrument score allowed us to calculate the pre-treatment acceptable change score (MCIDpre) and compare it to the MCIDpost, the post-treatment acceptable change and measurement error.
The overall objective of this study was to develop a novel method to estimate LBP patients’ view of an acceptable outcome of treatment before it begins. We did this by: (a) developing new questionnaires measuring acceptable outcomes by modifying well-known questionnaires, (b) testing the modified questionnaires for reproducibility, (c) comparing the results of the prospective acceptable outcome method to a well-established retrospective method and measurement error, and (d) determining if patients with chronic LBP can determine an acceptable change before treatment commences.
The study was reported to and accepted by The Danish Data Protection Agency.
Patients suffering from chronic low back pain and/or leg pain were recruited from an out-patient hospital back pain clinic in 2005. Inclusion criteria were: (1) age between 18 and 60 years, (2) presence of low back pain and/or leg pain, and (3) able to read and understand Danish. Patients were excluded if: (1) a pathological disorder of the spine (e.g. fractures, spinal infections or malignancy, ankylosing spondylitis, rheumatoid arthritis, or other inflammatory diseases) was suspected, (2) they had received a prior back operation, (3) they showed signs and symptoms of a progressive neurological disorder, (4) pending action for damages/litigation was recorded in the case notes, and (5) they had been diagnosed with a psychiatric disorder. Included patients received oral and written information about the project and gave their informed consent.
The modified questionnaires (see outcome measures) were tested for face validity prior to the main study. Twenty-five consecutive patients were interviewed by the first author after filling in the baseline questionnaire booklet. The semi-structured interview focused on difficulty and comprehension in answering the modified questionnaire and resulted in minor changes.
Patients fulfilling the inclusion criteria were followed over an 8-week period when they received standard conservative treatment according to the Danish national guidelines for the management of LBP . They received a questionnaire booklet at baseline before commencing the treatment. Questionnaire booklets were mailed to all patients at 1-week (test–retest study for the modified questionnaires) and 8-week follow-up.
In order to clarify whether patients can distinguish between what is acceptable and what are their expectations/hopes to the treatment we randomised another 133 chronic LBP patients from the same out-patient hospital back pain clinic as the main study into three groups. Group A filled in three 11-box numeric rating scales for: (1) pain intensity over the past week, (2) their expectations/hopes to the treatment, and (3) their acceptable result of the treatment. Group B and C received question 1 and either question 2 or question 3, respectively, and these groups were compared to group A.
At baseline, patients provided standard sociodemographic information and completed a questionnaire booklet, including standard and modified pain and functional/psychological measures.
The summary scores of the ODI and the BQ and the score of the NRSpain at baseline were termed the pre-treatment standard scores.
The standard LBP outcome measures were modified to allow patients (before the start of treatment) to reflect on what would be acceptable for each item after cessation of treatment. First, in the introduction, patients were asked to differentiate between what they considered an acceptable result and their expectations/hopes to the treatment outcome. Second, all the questions in each questionnaire were modified to include the following basic question: “Please indicate what you consider to be (e.g. an acceptable level of pain) after completion of the treatment if you had to accept some (e.g. pain)?” (Fig. 1). The summation of all the items of the modified outcome measures was termed the pre-treatment acceptable score. These scores were subsequently transformed into 0–100 scales to allow for comparison.
At 1-week follow-up, all patients completed both the standard and modified LBP outcome measures including questions indicating whether (a) their condition had changed or (b) their opinion on what constituted an acceptable outcome of the treatment had changed since baseline. Only patients answering “unchanged” to question (a) and “I have not changed my opinion” to question (b) were considered stable and included in the reproducibility study.
At 8-week follow-up, patients were asked to complete the standard LBP outcome instruments. Furthermore, the patient assessed the treatment result by completing a 7-point global rating of change (transition question) . Focus on the change in health rather than the present health state was optimised by informing the patients of their baseline global rating of pain severity before answering the transition questions (TQ) [17, 20].
Descriptive statistics were used to summarise patient demographic and clinical data and frequency distributions of the pre-treatment acceptable scores were generated for each of the modified outcome measures. Pre-treatment standard scores were also correlated with the pre-treatment acceptable scores to ensure score reliability for the validity study .
Test–retest reproducibility of the modified questionnaires was carried out on 55 stable patients using the limits of agreement (LOA) method as outlined by Bland and Altman . This method plots the difference between the measurements (modified scores at baseline minus modified scores at 1-week follow-up) against the mean of the same measurements with 95% limits calculated as the mean difference ± 1.96 SD. Thus, 95% of the differences between the two measurements lie between these limits .
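The limits-of-agreement calculation described above can be sketched as follows. This is a minimal illustration with hypothetical paired test–retest scores, not the study data:

```python
import statistics

def limits_of_agreement(test, retest):
    """Bland-Altman limits of agreement for paired measurements.

    Returns the mean difference (systematic bias) and the 95% limits,
    computed as mean difference +/- 1.96 * SD of the differences.
    """
    diffs = [a - b for a, b in zip(test, retest)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical baseline and 1-week modified-questionnaire scores
baseline = [24, 30, 18, 40, 22, 35, 28, 26]
week1    = [26, 28, 19, 38, 23, 36, 27, 27]
bias, (lower, upper) = limits_of_agreement(baseline, week1)
```

A bias close to zero indicates negligible systematic difference between the two administrations; roughly 95% of individual test–retest differences are expected to fall between `lower` and `upper`.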
Second, the internal consistency was tested for the modified ODI using Cronbach’s alpha, and values between 0.7 and 0.9 were considered acceptable [15, 39]. Alpha was not calculated for the modified BQ and NRSpain as each instrument dimension is represented by only one item. The original questionnaire scale range was used.
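Cronbach’s alpha for a multi-item scale such as the modified ODI can be sketched from its standard formula, alpha = k/(k-1) × (1 − Σ item variances / total-score variance). The data below are illustrative only:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a k-item scale.

    items : list of k lists, each holding one item's scores across
            the same n respondents.
    """
    k = len(items)
    item_vars = sum(statistics.variance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sumscore
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Perfectly correlated items as a sanity check: alpha should be 1.0
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```

Values near 1.0 indicate highly redundant items, which is why an upper bound of 0.9 is often applied alongside the 0.7 lower bound.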
The MCIDpre was compared to a post-treatment anchor-based method of establishing the minimal clinically important difference (MCIDpost) and measurement error. The minimal detectable change at the 95% confidence level (MDC95%) was used as a marker for measurement error.
The MCIDpre was calculated by taking the mean of the pre-treatment acceptable score minus the mean pre-treatment standard score. Thus, the MCIDpre represents the change score acceptable to the included patients determined before commencement of treatment.
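The MCIDpre arithmetic is the difference of two group means. A minimal sketch with hypothetical 0–100 scores (lower = better), not the study data:

```python
import statistics

def mcid_pre(acceptable, standard):
    """MCIDpre: mean pre-treatment acceptable score minus mean
    pre-treatment standard score.  On scales where lower is better,
    a negative value is the reduction patients find acceptable."""
    return statistics.mean(acceptable) - statistics.mean(standard)

# Hypothetical pre-treatment scores for illustration only
standard   = [55, 60, 48, 70, 52]   # standard instrument at baseline
acceptable = [30, 28, 25, 40, 27]   # modified (acceptable-outcome) version
change = mcid_pre(acceptable, standard)  # -27.0: a 27-point acceptable reduction
```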
The MCIDpost was established using the anchor-based receiver operating characteristic (ROC) curve method. It determines the minimal clinically important difference retrospectively at the individual level of interpretation (changes within patients over time). A ROC curve analysis was carried out to determine sensitivity and specificity for classifying patients as having experienced an “important improvement” or “no change”. Patients classified as having experienced an “important improvement” had to rate themselves as either “much better” or “better” on the TQ. The optimal cut-off change score was identified as the cut-point with equally balanced sensitivity and specificity, and this was considered an expression of the MCIDpost at the individual level. Confidence intervals for the MCIDpost were estimated using STATA’s programming function to calculate the optimal cut-point and a bootstrap procedure using 200,000 samples with replacement.
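The balanced sensitivity/specificity cut-point search can be sketched as below. This is an illustrative re-implementation with made-up data; the study itself used STATA, with a bootstrap for the confidence intervals:

```python
def optimal_cutpoint(change_scores, improved):
    """Scan candidate cut-offs on the change score and return the one
    where sensitivity and specificity are most nearly equal.

    change_scores : observed change per patient (higher = more improvement)
    improved      : True if the patient rated "better"/"much better"
                    on the transition question (the external anchor)
    """
    best, best_gap = None, float("inf")
    for cut in sorted(set(change_scores)):
        tp = sum(1 for c, imp in zip(change_scores, improved) if imp and c >= cut)
        fn = sum(1 for c, imp in zip(change_scores, improved) if imp and c < cut)
        tn = sum(1 for c, imp in zip(change_scores, improved) if not imp and c < cut)
        fp = sum(1 for c, imp in zip(change_scores, improved) if not imp and c >= cut)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        if abs(sens - spec) < best_gap:
            best, best_gap = cut, abs(sens - spec)
    return best

# Hypothetical change scores and anchor ratings for illustration only
changes  = [1, 2, 3, 4, 5, 6, 7, 8]
improved = [False, False, False, False, True, True, True, True]
cut = optimal_cutpoint(changes, improved)  # 5: perfectly balanced here
```

In real data sensitivity and specificity are rarely exactly equal at any cut-point, so the cut minimising their gap is taken as the MCIDpost estimate.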
The MDC95% expresses the degree of change required in an individual’s score, in order to establish it (at a 95% confidence level) as being a “real change” over and above measurement error [2, 11]. The standard error of the measurement (SEMconsistency) was used to indicate the MDC95% and was defined as the square root of the residual variance computed with ANOVA for random effects [12, 28]. At the 95% confidence level, the MDC was calculated as 1.96 × √2 × SEMconsistency which is equivalent to 2.77 × SEMconsistency.
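The MDC95% arithmetic follows directly from the SEM. A sketch with a hypothetical SEM value, not one from the study:

```python
import math

def mdc95(sem):
    """Minimal detectable change at the 95% confidence level:
    1.96 * sqrt(2) * SEM, i.e. approximately 2.77 * SEM."""
    return 1.96 * math.sqrt(2) * sem

# A hypothetical SEM of 3.0 points implies that an individual's score
# must change by about 8.3 points to exceed measurement error
change_needed = mdc95(3.0)
```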
To establish whether our cohort of chronic patients was able to determine an acceptable outcome of treatment before it began, the MCIDpre was compared to: (a) the MCIDpost and (b) the post-treatment acceptable change.
The post-treatment acceptable change was defined as the mean serial change score in patients who rated themselves as “better” or “much better” on the transition question.
The difference between the MCIDpre and the post-treatment acceptable change was tested for statistical significance using the Wilcoxon rank-sum test.
All statistical calculations were performed using the statistical package STATA v. 10.0 IC (STATA Corp., College Station, TX, USA) and statistical significance was accepted at the P < 0.05 level.
A total of 225 consecutive patients were eligible for inclusion in the study. Seventy-eight patients (34%) refused to participate or never returned the baseline questionnaires. Thus, 147 patients were available at baseline. At 1- and 8-week follow-ups the response rates were 83.7% (n = 123) and 81.0% (n = 119), respectively. Table 1 provides characteristics of the participants at baseline and 8-week follow-up.
A dropout analysis showed a higher median number of days of sick leave during the last 12 months and a higher proportion of patients with a light physical workload among the dropouts; however, these differences were not statistically significant. All other baseline characteristics were identical in the two groups.
Correlations between the pre-treatment standard scores and the pre-treatment acceptable scores were 0.39 for the ODI, 0.34 for the BQ and 0.38 for the NRSpain.
The distribution of the pre-treatment acceptable score (0–100 scales) is shown in Fig. 2. Approximately 90% of the scores were below 32 for all instruments. The pre-treatment acceptable score varied according to the type of instrument. Eighty-four percent scored less than 16 points on the modified ODI, compared with 56% on the modified BQ and 37% on the modified NRSpain. The NRSpain was the instrument with the most patients scoring a pre-treatment acceptable score between 17 and 32 points (55%).
The median (25, 75 percentiles) time interval between baseline and 1-week follow-up was 11 (9, 20) days. The LOA plots show negligible systematic bias for all the modified questionnaires and acceptable 95% LOA intervals. The systematic difference and 95% LOA were 0.8 [−6.6; 8.2] for the modified ODI, −0.2 [−8.8; 8.4] for the modified BQ, and 0.0 [−1.9; 1.9] for the modified NRSpain (data not shown). Cronbach’s alpha was 0.84 for the modified ODI.
Table 2 provides a concurrent comparison of measurement error (MDC95%), MCIDpre and MCIDpost for all patients.
The acceptable change determined before treatment (MCIDpre) varies between the chosen outcome measures. First, the MCIDpre for chronic LBP patients scoring the ODI is a 26.1-point reduction (26%), whereas this figure is 25.6 points (37%) for the BQ and 4.2 points (42%) for the NRSpain. Second, the MCIDpre is approximately 4.5 times larger than the MCIDpost for the ODI, but only 1.5 times larger for the BQ and NRSpain. Third, the MCIDpre values are well above measurement error (MDC95%) when looking at the instrument sumscores. However, five out of seven of the BQ subscales showed MCIDpre values smaller than measurement error. In contrast, the MCIDpost values are all smaller than measurement error except for the NRSpain.
A comparison of MCIDpre to MCIDpost and post-treatment acceptable change (mean change score of those patients who rated themselves as “better” or “much better” on the external criterion) is provided in Fig. 3. The MCIDpre for the ODI and BQ (including subscales) did not match the post-treatment acceptable change (P < 0.05). However, this was not true for the pain measure where MCIDpre and post-treatment acceptable change were almost comparable (4.2 [3.9; 4.5] vs. 3.8 [2.9; 4.7], P = 0.33). Thus, the mean pain change score of patients who rated their improvement as “better” or “much better” after treatment was in fact almost identical to what was an acceptable change in pain before commencing treatment.
The post-hoc study showed no difference in baseline data (age, sex, pain duration, disability and pain scores) between the three groups. The median baseline pain score was 7–8 in the three groups. Group A (n = 41) received all three questions and expected/hoped to become pain free (median pain score of 0) after the treatment, but found a median pain score of 2 acceptable. Group B (n = 46) received the question about expectations/hopes to the treatment and scored this at a median of 1 after the treatment. In contrast, group C (n = 46) received the question about what was an acceptable result after the treatment and found a pain rating of 2 acceptable.
This study applied the prospective acceptable outcome method to two well-established LBP questionnaires (ODI and BQ) and a pain rating scale (NRSpain) and compared it to a commonly used retrospective method of establishing an important change and to instrument measurement error. We found the pre-treatment acceptable scores reproducible and the MCIDpre outside instrument measurement error for the instrument sumscores. Furthermore, the MCIDpre was between 1.5 and 4.5 times larger than the MCIDpost. Interestingly, when compared to the serial improvement in patients who rated themselves as “better” or “much better” after treatment, patients seemed to overestimate an acceptable change in functional and psychological/affective aspects before treatment, but less so for pain.
Our results showed a gap between the acceptable change determined before treatment (MCIDpre) and the retrospective MCIDpost for all included measures (Table 2). Several reasons may, at least in part, explain why we observed such large discrepancies between what patients consider acceptable before and after treatment. First, it is likely that a response shift took place during the treatment. This has been defined as changes in the meaning of the patient’s self-evaluation of the instrument resulting from changes in (a) conceptualisation (i.e. the meaning of the item content), (b) values (i.e. a change in the relative importance of the item as an indicator of the measured dimension), or (c) internal standards (i.e. a change in the meaning of the response options) [32, 38]. What is an acceptable outcome before treatment may change during the course of treatment as patients continuously receive information on treatment efficacy, and any mismatch between this and the patient-established acceptable outcome is likely to have been reset. Thus, the response shift probably took the direction of an overestimated MCIDpre which, during the course of treatment, was adjusted to more realistic benchmarks for treatment outcome, resulting in a relatively smaller MCIDpost. Second, it can be questioned whether patients can in fact distinguish between what is acceptable and what are their expectations/hopes to the treatment. If our patients experienced difficulties differentiating between the concepts of “acceptable result of the treatment” and “expectations/hopes to the treatment”, this would result in an underestimation of what is acceptable and an overestimation of the MCIDpre. The post-hoc study results suggest that chronic LBP patients are able to distinguish the dimensions of acceptable treatment results from their expectations/hopes, as groups B and C rated these two concepts differently (i.e. 
group B expected/hoped for no pain after treatment whereas group C could accept a pain rating around 2). Similar findings are reported by Yelland et al., who also found a disparity between what LBP patients regard as a minimum worthwhile reduction in pain and disability and their expectations/hopes to the treatment. Therefore, we believe that patients are able to differentiate between an acceptable outcome and their expectations/hopes to the result of the treatment. Thus, this factor is less likely to have resulted in an overestimation of the MCIDpre. A last point to be remembered is that difference scores (i.e. MCIDpre and MCIDpost) often show lower reliability compared to single scores.
In summary, we found that the prospective acceptable outcome method yielded results which are not comparable to the retrospective MCIDpost method and that the disparity is possibly influenced by a response shift during the treatment.
In Fig. 3, the MCIDpre is compared to the MCIDpost and the post-treatment acceptable change. We defined the post-treatment acceptable change as the mean change score of those patients who rated themselves as “better” or “much better” on the external criterion. However, some patients may consider less of an improvement acceptable, resulting in an overestimation of the post-treatment acceptable change using this method. This is illustrated by the disparity between the post-treatment acceptable change and the MCIDpost. Nevertheless, Fig. 3 clearly demonstrates the disproportion between the MCIDpre and an acceptable change after the treatment for the ODI and BQ (including subscales) but less so for the NRSpain. Accordingly, it may be more difficult for back pain patients to establish an acceptable score before treatment for functional and psychological/affective constructs (ODI and BQ) than for pain intensity. The ODI and BQ include various aspects of complex cognitive and affective concepts, such as social life and depression, which require difficult mental estimations of what is an acceptable level of these dimensions. In contrast, estimating an acceptable pain level is probably easier, as this is often the main reason for seeking care. We hypothesise that the difference in cognitive/affective complexity of the constructs included in the instruments accounts, at least in part, for some of the disparity observed between the instruments.
In summary, chronic LBP patients probably overestimate the size of an acceptable change in functional and psychological/affective aspects. In contrast, the same patients seem to have a clearer understanding of what is an acceptable change in pain intensity before treatment begins. Clinically, this highlights the importance of matching the patient established benchmarks for what are acceptable functional and psychological/affective outcomes and the anticipated treatment efficacy during a rehabilitation programme.
Many authors advocate that estimates of the minimal clinically important difference should fall outside the measurement error of the instrument in question [10, 13, 26]. Our results show that measurement error (MDC95%) was smaller than the MCIDpre but larger than the MCIDpost. Whether this invalidates the MCIDpost is probably a matter of perspective. We agree that any MCIDpost value lower than measurement error at the group level of interpretation is probably invalid. However, interpreting the MCIDpost at the individual level could indeed be valid, as a patient may well have experienced an important improvement when the change is equal to or above the MCIDpost value but below measurement error.
In conclusion, the prospective acceptable outcome method offers a benchmark by which the patients’ acceptable scores on well-known and validated clinical outcome measures can be scrutinised before treatment commences. It yields results which are 1.5–4.5 times larger than those of the retrospective anchor-based method of determining the minimal clinically important difference, which can possibly be explained by a response shift. Our results suggest that chronic LBP patients probably overestimate an acceptable change in functional and psychological/affective domains but have a clearer understanding of what is an acceptable change in pain intensity before treatment commences. This has implications for matching patients’ acceptable outcome to the expected treatment efficacy.
We thank Jytte Johannesen and Ida Bhanderi for administering the questionnaires. Furthermore, we would like to thank the management and staff at Backcenter Funen for their enthusiastic participation in the project. A special thanks to the seven chiropractic clinics for their involvement in recruiting patients for the study. The study was supported by the Foundation of Chiropractic Research and Postgraduate Education, The Faculty of Health Science at the University of Southern Denmark and The European Chiropractic Union.
Conflict of interest statement: The funding bodies had no control over the design, conduct, data, analysis, review, reporting, or interpretation of the research conducted with the funds.