Prospective cohort study to establish outcome measures for recovery and chronic pain in studies of patients presenting with recent-onset acute low back pain in primary care
Among back pain researchers, no consensus exists about outcome definitions or about how to identify primary care patients who have not recovered from an episode of low back pain. Cut points for outcome scales have mostly been chosen arbitrarily. Theoretical models for establishing minimal important change (MIC) values in studies of patients with low back pain have been proposed but still need to be applied to real data.
In a sample of 521 patients who presented with acute low back pain (<4 weeks) in primary care clinics and were followed for 6 months, scores for pain and disability were compared with ratings on a global perceived effect scale. Using multiple potential “gold standards” as anchors (reference standards), we used the receiver operating characteristic (ROC) method to determine optimal cut points for different definitions of non-recovery from acute low back pain.
MIC values and upper limits for pain and disability scores, as well as minimal important percent changes, are presented for five different definitions of recovery. A previously suggested 30% change from baseline scores does not accurately discriminate between recovered and non-recovered patients presenting with acute low back pain in primary care.
Outcome definitions that combine ratings from perceived recovery scales with pain and disability measures provide the highest accuracy in discriminating recovered from non-recovered patients.
Great efforts have been made in recent years to assess outcome measures and to define minimal important change (MIC) values when assessing the efficacy of treatments for low back pain (LBP).1–3 Yet, despite long-standing expert recommendations,4, 5 no agreement exists regarding appropriate outcome criteria for defining recovery or chronic pain in patients who present with a new episode of acute LBP. This problem is particularly pressing in primary care, where few of these patients are on sick leave, so using return to work as the primary outcome is inappropriate.
Recent qualitative studies pointed out that patients' views of recovery span multiple domains6, are highly individualized3 and do not fit any single standardized instrument used in prior prediction studies, such as pain scales or the Roland-Morris Disability Questionnaire (RM).7, 8 A low pain score does not clearly distinguish patients who view themselves as recovered from those who do not.3 Cut-offs vary widely, ranging from 0–2 for pain and 2–4 for RM, and are defined arbitrarily by median9 or quartile splits10 or by percent changes.11–13 To address this measurement problem, numerous studies have (a) combined criteria of pain and function10, 14–19, (b) used a symptom satisfaction scale20, or (c) used a global perceived effect (GPE) or recovery scale, commonly as a 7-point Likert scale20–23 and rarely as a dichotomous option.24 One study used a 15-point Likert scale25 that could be collapsed into a 7-point scale.13 Patients have expressed difficulty classifying themselves in a binary judgment and have asked for options that allow ambiguous responses.3 However, a GPE Likert scale yields responses in a middle “gray” range12 of the scale (“slightly improved”, “unchanged” or “slightly worsened”), which poses a challenge for measurement strategies that require binary classifiers with a defined cut point.21, 26
Binary classifiers are commonly used in both clinical decision making and prognosis studies, e.g. when assessing the odds of developing chronic pain given specific risk factors. The choice of a cut point for defining recovery versus chronic pain involves a sensitivity-specificity tradeoff: if we place the cut point so that only the few patients with significant pain and/or disability are classified as chronic LBP cases, we may misclassify as recovered many patients with less pain and disability who might self-classify as not recovered.
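The tradeoff can be illustrated with a small sketch on invented data (the scores and self-classifications below are hypothetical, not study data). Here a follow-up RM score at or above the cut point counts as chronic; raising the cut point can only lower sensitivity and raise specificity.

```python
# Hypothetical follow-up RM scores (0-24) and self-classification as
# not recovered (True) or recovered (False). Toy data for illustration only.
scores = [0, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15]
not_recovered = [False, False, False, False, False, True, False,
                 True, False, True, True, True, True, True]


def sens_spec(cut):
    """Classify score >= cut as chronic; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, not_recovered) if y and s >= cut)
    fn = sum(1 for s, y in zip(scores, not_recovered) if y and s < cut)
    tn = sum(1 for s, y in zip(scores, not_recovered) if not y and s < cut)
    fp = sum(1 for s, y in zip(scores, not_recovered) if not y and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


for cut in (2, 5, 8, 12):
    sens, spec = sens_spec(cut)
    print(f"cut >= {cut:2d}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A high cut point labels only severe cases as chronic (high specificity) but misses patients with milder residual symptoms who consider themselves not recovered (low sensitivity).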
For example, when de Vet et al.9 defined their reference standard for “important improvement” as at least “slightly improved” on the GPE scale, 35% of their hypothetical patients were considered misclassified. Beurskens et al.21 compared two different interpretations of recovery on the same scale, with “slightly improved” classified either as recovered or as non-recovered. Kamper et al.27 used only “fully recovered” patients in their analyses, ignoring “much improved” patients, which many might consider an overly stringent criterion for prognosis studies.
Furthermore, the choice of how to divide a Likert scale might depend on the research question: efficacy studies interested in recovery may want to disregard undecided patients and move the cut point towards the recovery end of the scale, whereas prognostic studies interested, for example, in the chronification of acute pain may move the cut point towards the opposite end. In addition, criteria for improvement from therapy for chronic pain differ from criteria for recovery from acute pain: a patient who has suffered from chronic LBP for several years might be content with a smaller improvement in pain and function than a patient with acute LBP, whose pain and function generally improve by more than 50% within a few weeks.28
This paper presents analyses of data from a prospective cohort study of patients seeking primary medical care for narrowly defined acute LBP in the US (main results published separately). The aim of that study was to explore risk factors for chronic pain and to identify patients who might benefit from early intervention to prevent progression to chronic pain.
For this study, we reasoned that patients who consider themselves “fully recovered” or “much improved” despite a minor degree of persistent pain and/or functional disability would probably not seek further medical services for their LBP and would resume pre-episode activity levels. A low cut-off for pain or disability would count a large proportion of these patients as not recovered.3 Therefore, one option is to use a rating of at least “much improved” on the GPE scale at the 6-month follow-up as the external criterion for recovery, and “slightly worse” or “much worse” as external criteria for non-recovery. For the more ambiguous responses “slightly improved” and “same”, it was less clear how to force these patients into the dichotomy of recovered versus chronic. How these responses are sorted under different “gold standards” has been shown to have a considerable effect on the sensitivity and specificity of disability measures.21, 26 Using a cohort of primary care patients with acute LBP, we therefore aimed to (1) explore how self-reported global recovery relates to standard measures of pain and disability, (2) determine “optimal” cut-offs for discriminating between recovered and non-recovered patients, and (3) compare the sensitivity and specificity of previously suggested “gold standards” with those of reference standards using combined outcome criteria. Although researchers may appreciate the theory-driven face validity of such integrated assessment strategies, these strategies cannot be validated against an external criterion or purported “gold standard”.
Members of the largest health maintenance organization (HMO) in Northern California who sought primary medical care for acute LBP were interviewed twice by telephone, at baseline and at the six-month follow-up. Acute LBP was defined as back pain between the rib cage and the buttocks of less than four weeks' duration. Patients 18–70 years of age were included if they spoke English and had no prior LBP episode in the past year, no red flags (fever, cancer history, inflammatory/rheumatoid diseases), no history of spine surgery, no diagnosis of fibromyalgia, and no current pregnancy. Patients with sciatica, defined as pain radiating below the knee, were not excluded unless they were scheduled for surgery at the time of the baseline interview. From February 2008 to March 2009, consecutive patients were identified from electronic medical records by a computer program on the day following their clinic visit and were invited by mail to participate in the study. The sample represented the socio-economic and ethnic diversity of the population of health-insured adults in Northern California seen in primary care for acute LBP.29
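The inclusion and exclusion rules above can be expressed as a single screening predicate. This is a minimal sketch: the `Candidate` record and its field names are hypothetical, and treating “less than four weeks” as fewer than 28 days is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    age: int
    speaks_english: bool
    pain_duration_days: int       # back pain between rib cage and buttocks
    lbp_episode_past_year: bool   # prior LBP episode within the past year
    red_flags: bool               # fever, cancer history, inflammatory/rheumatoid disease
    prior_spine_surgery: bool
    fibromyalgia: bool
    pregnant: bool
    sciatica: bool                # pain radiating below the knee
    surgery_scheduled: bool


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion/exclusion criteria listed in the text."""
    if not (18 <= c.age <= 70 and c.speaks_english):
        return False
    if c.pain_duration_days >= 28:  # assumption: "less than four weeks" = under 28 days
        return False
    if c.lbp_episode_past_year or c.red_flags or c.prior_spine_surgery:
        return False
    if c.fibromyalgia or c.pregnant:
        return False
    # Sciatica alone does not exclude; a scheduled surgery does.
    if c.sciatica and c.surgery_scheduled:
        return False
    return True


example = Candidate(age=45, speaks_english=True, pain_duration_days=10,
                    lbp_episode_past_year=False, red_flags=False,
                    prior_spine_surgery=False, fibromyalgia=False,
                    pregnant=False, sciatica=True, surgery_scheduled=False)
print(is_eligible(example))  # sciatica without scheduled surgery stays eligible
```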
We assessed average pain, bothersomeness of pain20, 30, and least and worst pain (in the past week) with 11-point numeric rating scales (NRS), and functional disability with the RM, at both time points; the GPE scale was administered at follow-up.21 We calculated absolute and percent changes. We collapsed the response options “much worsened” and “vastly worsened” on the GPE, thereby reducing the original 7-point Likert scale to 6 points. We explored additional criteria suggested by Jordan31, Ostelo1 and Fritz13, assessing the proportion of patients who improved from their baseline values by 30% or 50%.
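The derived variables described above (absolute and percent change, the collapsed 6-point GPE, and the share of patients improving by 30% or 50%) might be computed as in the following sketch. The GPE label wording, the handling of a zero baseline, and counting a change exactly at the threshold as improved are assumptions.

```python
# Collapsing the 7-point GPE to 6 points by merging the two worst categories.
# The label wording is assumed; only the merge itself comes from the text.
GPE_COLLAPSE = {
    "fully recovered": 1,
    "much improved": 2,
    "slightly improved": 3,
    "same": 4,
    "slightly worsened": 5,
    "much worsened": 6,
    "vastly worsened": 6,  # merged with "much worsened"
}


def changes(baseline, follow_up):
    """Return (absolute, percent) change; positive values mean improvement."""
    absolute = baseline - follow_up
    # Percent change is undefined for a baseline score of 0 (assumption: skip it).
    percent = 100.0 * absolute / baseline if baseline > 0 else None
    return absolute, percent


def proportion_improved(pairs, threshold_pct):
    """Share of (baseline, follow_up) pairs improving by at least threshold_pct."""
    percents = [changes(b, f)[1] for b, f in pairs]
    percents = [p for p in percents if p is not None]
    return sum(p >= threshold_pct for p in percents) / len(percents)


# Hypothetical RM scores (baseline, 6 months) for four patients.
pairs = [(10, 5), (10, 6), (8, 2), (6, 6)]
for threshold in (30, 50):
    print(threshold, proportion_improved(pairs, threshold))
```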
We used the receiver operating characteristic (ROC) method with the GPE as reference standard to assess (1) the minimally important change (MIC)1, 2 values for pain and disability that patients perceive as sufficient to self-classify as recovered, (2) upper limits for pain and functional disability compatible with perceived recovery27, and (3) minimally important percent changes in pain and disability from baseline scores. We used the areas under the ROC curves (AUC) as summary measures of the overall ability of the scales to discriminate between patients who recovered and those who did not.32, 33 Similar to de Vet et al.,2 we determined cutoff scores that combined maximal sensitivity with optimal specificity for identifying non-recovered patients. Similar to Beurskens et al.,21 in the absence of a gold standard, cutoffs for this sample were based on different GPE interpretations as reference standards: patients who were “slightly improved” at 6 months were counted either as recovered (Reference Standard 1) or as non-recovered (Reference Standard 2).
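A minimal sketch of Reference Standards 1 and 2 and of a rank-based AUC follows. It assumes the collapsed 6-point GPE labels shown, and computes the AUC as the Mann-Whitney probability that a non-recovered patient has a higher score than a recovered one (ties count one half). This is a generic illustration, not the Stata module the authors used, and the data are invented.

```python
# GPE labels (6-point, after collapsing) treated as "recovered" under each standard.
RECOVERED_RS1 = {"fully recovered", "much improved", "slightly improved"}
RECOVERED_RS2 = {"fully recovered", "much improved"}


def non_recovered(gpe_labels, recovered_set):
    """Dichotomise GPE ratings: True marks a non-recovered patient."""
    return [label not in recovered_set for label in gpe_labels]


def auc(scores, positive):
    """Rank-based AUC: chance a non-recovered patient scores higher than a
    recovered one (ties count one half), i.e. Mann-Whitney U / (n1 * n0)."""
    pos = [s for s, y in zip(scores, positive) if y]
    neg = [s for s, y in zip(scores, positive) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 6-month GPE ratings and RM scores for five patients.
labels = ["fully recovered", "much improved", "slightly improved", "same", "slightly worsened"]
rm = [0, 4, 3, 9, 12]
print(auc(rm, non_recovered(labels, RECOVERED_RS1)))
print(auc(rm, non_recovered(labels, RECOVERED_RS2)))
```

In this toy example, moving the “slightly improved” patient into the non-recovered group (Reference Standard 2) lowers the AUC, because that patient's RM score is lower than that of a “much improved” patient.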
We also explored combined outcome criteria (Reference Standards 3–5): patients reporting being at least “much improved” on the GPE scale were classified as recovered, and patients reporting being “worse” were classified as non-recovered. Patients reporting being “slightly improved” or “same” were classified as non-recovered if their follow-up scores exceeded the upper limit that patients perceived as compatible with recovery, as determined by ROC curves using Reference Standard 1. We conducted multiple analyses to explore which additional criterion would discriminate recovered from non-recovered patients with the greatest sensitivity and specificity. Confidence intervals for MIC values were estimated by bootstrapping (1000 replications).
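Bootstrapped confidence intervals for a cut point can be sketched as follows: patients are resampled with replacement, the optimal cut point is recomputed in each resample, and the 2.5th and 97.5th percentiles of the estimates form the interval. The text does not state which bootstrap interval type was used; the percentile method, the seed, and the helper names here are assumptions.

```python
import random


def best_cut(scores, positive):
    """Cut point c (score >= c counts as positive) minimising (1 - Se) + (1 - Sp)."""
    n_pos = sum(positive)
    n_neg = len(positive) - n_pos
    best, best_err = None, float("inf")
    for cut in sorted(set(scores)):
        fn = sum(1 for s, y in zip(scores, positive) if y and s < cut)
        fp = sum(1 for s, y in zip(scores, positive) if not y and s >= cut)
        err = fn / n_pos + fp / n_neg
        if err < best_err:
            best, best_err = cut, err
    return best


def bootstrap_ci(scores, positive, reps=1000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for best_cut, resampling patients with replacement."""
    if not 0 < sum(positive) < len(positive):
        raise ValueError("need both positive and negative patients")
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        pos = [positive[i] for i in idx]
        if 0 < sum(pos) < n:  # skip resamples lacking one of the two groups
            estimates.append(best_cut([scores[i] for i in idx], pos))
    estimates.sort()
    k = round(alpha / 2 * reps)
    return estimates[k], estimates[reps - 1 - k]


# Hypothetical RM change scores and anchor-based "recovered" flags.
change = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
recovered = [False, False, False, True, False, False, True,
             True, False, True, True, True, True, True]
print(best_cut(change, recovered), bootstrap_ci(change, recovered))
```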
To estimate MIC thresholds, we used the cut point at which the sum of (1 − sensitivity) and (1 − specificity), i.e. the sum of the false-negative and false-positive proportions, was smallest, similar to the study by de Vet et al.2 We used Stata 11 software34 with an additional module provided by R. Froud (London, UK).35
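Written out, the cut-point rule can be sketched as below, where Se(c) and Sp(c) denote sensitivity and specificity at a candidate cut point c (this notation is ours, not the authors'). Minimising the sum of the two error proportions is equivalent to maximising Youden's index J = Se + Sp − 1.

```latex
c^{*} = \arg\min_{c}\,\big[(1-\mathrm{Se}(c)) + (1-\mathrm{Sp}(c))\big]
      = \arg\max_{c}\,\big[\mathrm{Se}(c) + \mathrm{Sp}(c) - 1\big]
```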
A total of 605 patients fulfilled the eligibility criteria and were interviewed at baseline. This represents 25% of the 2,454 respondents to invitations mailed to 42,650 patients who were seen for any kind of LBP in clinics of the HMO during the twelve months of recruitment. Of these, 521 patients (86%) completed the 6-month follow-up interview. Table 1 shows self-ratings on the GPE scale together with mean pain and disability scores for each of the six response levels. At the 6-month follow-up, 32% of patients reported being “fully recovered”, 81% at least “much improved” and 91% at least “slightly improved”. If patients reporting being “slightly improved” are classified as recovered (Reference Standard 1), 47 (9%) patients are classified as non-recovered; if the same patients are classified as non-recovered (Reference Standard 2), 98 (19%) of all patients are classified as non-recovered.
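The percentages above follow directly from the reported counts, as the short check below shows. Because the two reference standards differ only in how “slightly improved” patients are counted, the difference between the two non-recovered counts gives the size of that group, consistent with the gap between 91% and 81%.

```python
# Counts reported in the text.
invited = 42_650        # patients seen for any LBP who were mailed invitations
respondents = 2_454
enrolled = 605          # eligible and interviewed at baseline
followed_up = 521
non_recovered_rs1 = 47  # "slightly improved" counted as recovered
non_recovered_rs2 = 98  # "slightly improved" counted as non-recovered


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(enrolled, invited), pct(enrolled, respondents))  # enrolled share
print(pct(followed_up, enrolled))                          # follow-up rate
print(pct(non_recovered_rs1, followed_up), pct(non_recovered_rs2, followed_up))
# The difference is the number of "slightly improved" patients that switch groups.
switched = non_recovered_rs2 - non_recovered_rs1
print(switched, pct(switched, followed_up))
```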
Table 2 shows the average percentage changes in pain and disability from baseline to the 6-month follow-up for each GPE score. In the “completely recovered” and “much improved” GPE groups, pain and RM disability improved similarly, on average by approximately 100% and 80%, respectively. The two GPE groups reporting “slightly improved” or “same” improved by 30–40%, with identical mean RM change scores for both groups (41%). The finding that patients with a 30–40% improvement in pain or disability may describe their follow-up situation as the “same” illustrates the potential for misclassification if a single criterion (GPE, pain or disability) is used to discriminate between recovery and chronic pain.
Table 3 shows the proportions of patients in each GPE category who improved by more than 50% and by more than 30% from baseline to six months. Although the proportions of patients who improved by 30% or 50% in a given parameter were quite similar within the subgroups at both ends of the GPE scale (“much improved” and “much worse”), these proportions clearly differed in the middle range of the scale. Among “slightly improved” patients, fewer than half reported a 50% reduction in pain or disability; in this GPE group the mean RM score was 8 (median = 7; see Table 1), which is at or above the cut-off of ≥7 used to define chronic pain in several prior studies.9, 36 Consequently, about half of these patients would fall into the chronic pain outcome group if an RM score of ≥7 or a 50% reduction in pain and function were used as reference. These findings call into question the accuracy of a dichotomous outcome that uses the GPE scale and classifies “slightly improved” patients as recovered.2 Although the number of self-reported “slightly worse” patients in our sample is too small (N = 9) to draw general conclusions, choosing a 30% improvement in the RM score as the criterion for improvement would classify more than half of these patients as improved, which makes this choice problematic. In general, dichotomous classifications based on a single criterion may be problematic.
How do “completely recovered” patients differ from “much improved” patients? Almost half of the patients who reported being “much improved” rather than “completely recovered” (117 of 253; 46%) were free of pain at 6 months, with a mean RM score of 1.8 (SD ± 2.5) (data not presented in tables). Conversely, the majority of “much improved” patients still reported pain in the past week, with mean ratings of 1.8 for average intensity and 4.2 for worst pain. In general, patients who still had pain at follow-up rated their worst pain in the past week considerably higher than their average pain (“slightly improved”: 5.6 vs. 3.3; “same”: 7.0 vs. 4.6). Worst pain in the past week, in addition to average pain intensity, may therefore be a key aspect of GPE self-classification.
Tables 4 to 6 show reference standard-based cut-offs (and the areas under the corresponding ROC curves) for MIC values for pain and disability (Table 4), upper limits of pain and disability still compatible with self-reported recovery (Table 5), and minimally important percent changes in pain and disability from baseline (Table 6). Absolute MIC values for pain or disability scores were expected to vary with baseline scores; we therefore present separate results for patient subgroups with baseline scores above (Table 4A) or below (Table 4B) the median. Table 7 shows bootstrapped confidence intervals for the results of Table 6.
Each table presents five rows of data for five different reference standards. For easy comparison, all reference standards are listed in a single legend in Tables 4 to 7. As suggested by de Vet et al.,2 Reference Standard 1 counted patients as recovered if they were “fully recovered”, “much improved” or “slightly improved”, whereas Reference Standard 2 counted “slightly improved” patients as non-recovered. Reference Standards 3, 4 and 5 add conditions for patients who self-classified as “slightly improved” or “same”: these patients were counted as recovered if they had pain of less than 3 out of 10 (NRS; Reference Standard 3), disability of less than 4 out of 24 on the RM scale (Reference Standard 4), or fulfilled both conditions (Reference Standard 5). These cut-offs were taken from the upper limits of values compatible with self-reported recovery according to Reference Standard 1.
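The five reference standards can be summarised as one classification function. This sketch assumes the collapsed 6-point GPE labels shown and implements Reference Standard 5 as worded above (recovered only if both the pain and the RM condition hold); the function and constant names are hypothetical.

```python
RECOVERED = {"fully recovered", "much improved"}
WORSE = {"slightly worsened", "much worsened"}
MIDDLE = {"slightly improved", "same"}

PAIN_LIMIT = 3   # NRS 0-10: pain < 3 compatible with recovery
RM_LIMIT = 4     # RM 0-24: disability < 4 compatible with recovery


def is_non_recovered(gpe, pain, rm, standard):
    """Classify one patient under Reference Standard 1-5 (True = non-recovered)."""
    if standard not in (1, 2, 3, 4, 5):
        raise ValueError("standard must be 1-5")
    if gpe in RECOVERED:
        return False
    if gpe in WORSE:
        return True
    if gpe not in MIDDLE:
        raise ValueError(f"unknown GPE label: {gpe}")
    if standard == 1:
        return gpe == "same"      # "slightly improved" counts as recovered
    if standard == 2:
        return True               # both middle responses count as non-recovered
    low_pain, low_rm = pain < PAIN_LIMIT, rm < RM_LIMIT
    if standard == 3:
        return not low_pain
    if standard == 4:
        return not low_rm
    # Standard 5, as worded in the text: recovered only if both conditions hold.
    return not (low_pain and low_rm)


# A hypothetical "same" patient with pain 2/10 and RM 5/24 under each standard.
print([is_non_recovered("same", 2, 5, s) for s in range(1, 6)])
```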
Using Reference Standards 3, 4 or 5 with combined criteria, 70 (13%), 82 (16%) or 67 (13%) patients, respectively, would be classified as having chronic LBP. In our sample of patients with acute LBP, perceived recovery required percent changes from baseline pain and disability to be well above 50%. As expected, absolute values for MICs were dramatically higher for patients with higher baseline scores than for those with lower baseline values.
In addition to average pain in the past week, we assessed bothersomeness of pain, a parameter used in numerous previous LBP studies.5, 20, 30, 36–39 All of our analyses showed virtually identical results for both pain measures (data not presented). In its ability to discriminate between recovery and non-recovery, bothersomeness of pain in the past week was not superior to average pain in the past week (p-values for comparisons of AUCs ranged from 0.12 to 0.76). As expected, integrating pain, disability or both into the classification criteria for recovery or non-recovery improved discriminative ability. Among the combined criteria, discriminatory accuracy appeared strongest when pain alone, or pain and disability together, were added to the GPE scale.
De Vet et al. presented methods for establishing MIC values on multi-item questionnaires for studies of LBP and used a hypothetical sample of 500 patients to correlate hypothetical responses on the GPE scale, as reference standard, with a hypothetical multi-item scale.2 Their theoretical model described a situation identical to the one we explored in our study. The results of the current study put flesh on that theoretical skeleton by providing data for 521 patients.
For the reference standard, we used the same definition (Reference Standard 1): patients self-reporting as at least “slightly improved” were classified as “importantly improved”. The hypothetical physical functioning questionnaire was a continuous scale scored from 0 to 50, whereas our study used the RM Disability Questionnaire, a validated scale scored from 0 to 24. If the resulting hypothetical MIC value on de Vet's 51-point scale were translated into an MIC on the RM scale, we would expect a change score of 5.0 (95% CI: 2.7–6.8) as the MIC value. Using identical methods, our acute LBP sample showed a higher proportion of “importantly improved” patients (91% versus 80%) and an MIC of 11.6 (95% CI: 8.5–14.7) on the RM scale (range 0–24).
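The text does not state how the hypothetical MIC was translated between the 0–50 scale and the 0–24 RM scale. One simple assumption is linear rescaling by the ratio of the score ranges, sketched below; under that assumption the comparison can also be made in the other direction, on the 0–50 scale.

```python
def rescale_change(change, from_range, to_range):
    """Linearly rescale a change score between instruments with different
    score ranges (assumption: scores map linearly, endpoints to endpoints)."""
    return change * to_range / from_range


# RM MICs (range 0-24) expressed in units of a 0-50 scale, under that assumption.
print(round(rescale_change(5.0, 24, 50), 1))   # translated hypothetical MIC
print(round(rescale_change(11.6, 24, 50), 1))  # MIC observed in this cohort
```

Under this assumption, the translated hypothetical MIC of 5.0 RM points corresponds to about 10.4 points on the 0–50 scale, and the observed MIC of 11.6 RM points to about 24.2 points, so the gap between the two estimates does not depend on which scale is used.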
Similar to Beurskens et al.,21 the current study provides and compares MIC values for multiple hypothetical “gold standards”, with Reference Standards 1 and 2 identical to those used by Beurskens. However, our results are quite different from those of prior studies, because the study populations differ considerably in symptom duration (limited to 4 weeks in our study), a well-known key factor for the prognosis of LBP.40 At the 6-month follow-up in the current study, 32% of patients reported being “fully recovered”. This differs from the 8% reported by Kamper et al.27 and may reflect differences in the participants' duration of LBP or in the length of follow-up. Kamper et al. presented data for a subsample of 239 patients with acute LBP; however, this group was followed for only 3 weeks, and only “fully recovered” patients were analyzed. Beurskens et al.21 excluded patients with LBP of less than 6 weeks' duration, and Demoulin et al.21 excluded those with LBP of less than 3 months. In the samples examined by Hill et al.9 and Dunn et al.19, 75% and 83% of participants, respectively, had LBP for more than 4 weeks (up to 3 years). In the study by Fritz et al.13, 24% reported symptoms of more than 3 months, and 68% had a history of prior LBP with an unknown time interval to the current episode. Finally, our study population is not easily comparable to the US sample of patients with LBP of any duration in which Von Korff et al.18 developed and validated a graded definition of chronic pain using a 6-month recall time frame. Similarly, in the replication study in the UK, 81% of participants had suffered from LBP for at least 3 months.19
A limitation of this study is that we only interviewed patients who responded to our invitation letter. This inception cohort therefore represents only a small portion of all patients seen for any type of LBP in that HMO setting during the time of enrolment. We do not have comprehensive information on the patients who did not respond to our invitation. We know, however, that 1) our patient sample was similar in key characteristics (age, sex, ethnicity, education, income) to the insured patients of that HMO according to membership surveys,29 and 2) respondents were slightly older and slightly more likely to be female than non-respondents, which is common among respondents in membership surveys of this HMO.29
In the absence of a true gold standard for defining recovery from acute LBP or its chronification, we provide data from a primary care cohort of patients with acute LBP to inform the discussion of (a) MIC values and (b) upper limits for pain and functional disability associated with perceived recovery at follow-up. Across the multiple cut points and reference standards we explored, the previously suggested outcome of a 30% decrease in pain or RM scores1, 31 did not discriminate between recovered and non-recovered patients in a sample strictly limited to acute LBP of up to four weeks.
For studies of acute LBP that require a binary outcome criterion for recovery versus non-recovery, we presented “optimal cut-offs” for standard measures of pain and disability and assessed their discriminatory capability. Our data suggest cut-offs of <3 for pain and <4 for RM scores as upper limits of recovery at follow-up. Our data also suggest MIC values and minimally important percent changes compatible with perceived recovery from acute LBP. Minimally important percent changes, which are less vulnerable to baseline differences, appear to provide good discriminatory accuracy at change scores generally above 50%.
However, as qualitative studies have previously suggested3, 41, our data confirmed that single parameters such as pain or disability do not translate easily into perceived recovery. We found large AUC values when we used reference standards with combined outcome criteria (the GPE scale with the addition of pain or disability scales for patients who self-classify as neither much improved nor worse). Combined outcomes showed improved ability to discriminate between recovered patients and those with chronic pain and may be considered as an alternative to single-parameter outcomes. Our results suggest that, for studies of patients with acute LBP, the GPE may be combined with pain scores for the middle group of patients who self-classify as neither much improved nor worse.
There is a need to define recovery and non-recovery from acute low back pain in primary care. In 521 patients with acute low back pain followed over 6 months, perceived recovery and pain and disability scores were used to establish minimal important change scores with the receiver operating characteristic method.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.