Clinicians require brief, practical tools to help identify low back pain (LBP) subgroups requiring early, targeted secondary prevention. The STarT Back Tool (SBT) was recently validated to subgroup LBP patients into early treatment pathways.
This study aimed to test the SBT’s concurrent validity against an existing, popular LBP subgrouping tool, the Örebro Musculoskeletal Pain Screening Questionnaire (ÖMPSQ), and to compare the clinical characteristics of subgroups identified by each tool.
Two hundred and forty-four consecutive ‘non-specific’ LBP consulters at 8 UK GP practices aged 18–59 years were invited to complete a questionnaire. Measures included the ÖMPSQ and SBT, disability, fear, catastrophising, pain intensity, episode duration and demographics. Instruments were compared using Spearman’s correlations, tests for subgroup agreement and discriminant analysis of subgroup characteristics according to reference standards.
Completed SBT (9 items) and ÖMPSQ (24 items) data were available for 130/244 patients (53%). The correlation of SBT and ÖMPSQ scores was ‘excellent’ (rs = 0.80). Subgroup characteristics were similar across the low, medium and high subgroups, but the proportions allocated to ‘low’, ‘medium’ and ‘high’ risk groups were different, with fewer patients in the SBT’s high risk group. Both instruments similarly discriminated for reference standards such as disability, catastrophising, fear, comorbid pain and time off work. The ÖMPSQ was better at discriminating pain intensity, while the SBT was better for discriminating bothersomeness of back pain and referred leg pain.
The SBT baseline psychometrics performed similarly to the ÖMPSQ, but the SBT is shorter and easier to score and is an appropriate alternative for identifying high risk LBP patients in primary care.
Primary care evidence-based guidelines for non-specific back pain highlight the importance of identifying indicators of poor prognosis in order that treatment can be targeted appropriately (Chou et al., 2007; van Tulder et al., 2006). Investigators increasingly advocate that better identification of potentially modifiable prognostic indicators may lead to more effective early secondary prevention of back pain in primary care (Boersma and Linton, 2005; Morley and Vlaeyen, 2005; Jellema et al., 2006; Koes et al., 2006). Several back pain clinical tools exist to aid clinicians in identifying patients either ‘at risk’ of chronicity or to improve targeting of treatment (Childs et al., 2004; Hicks et al., 2005; Dionne et al., 2005; Truchon and Cote, 2005; Duijts et al., 2006; Neubauer et al., 2006; Denison et al., 2007).
The STarT Back Tool (SBT) is a recently validated tool developed to identify subgroups of patients to guide the provision of early secondary prevention in primary care (Hill et al., 2008). The conceptual purposes of the SBT (Hill et al., 2008) were to identify patients with potentially treatment modifiable prognostic indicators using a brief, user-friendly tool and to validate cut-off scores for subgrouping patients into 1 of 3 a priori initial treatment options in primary care (low, medium and high risk groups). The development and validation of the STarT Back Tool (SBT) was recently reported (Hill et al., 2008), but a head to head comparison with a ‘reference standard’ instrument was not evaluated. The Örebro Musculoskeletal Pain Screening Questionnaire (ÖMPSQ: Linton and Hallden, 1998) was considered the most appropriate ‘reference standard’ against which to compare the SBT, as it is one of the most widely used tools in clinical practice to similarly differentiate primary care back pain patients, and it has a common conceptual purpose with the SBT of identifying high risk patients requiring targeted treatment. As well as being a popular instrument, the ÖMPSQ has been externally validated in numerous international samples (Heneweer et al., 2007; Westman et al., 2008; Hough et al., 2007; Margison and French, 2007; Nordeman et al., 2006; Jellema et al., 2007; Gabel et al., 2008; Hurley et al., 2001; Grotle et al., 2006; Linton and Boersma, 2003; Boersma and Linton, 2005; Dunstan et al., 2005).
Clinicians wanting to follow evidence-based practice guidelines to assess prognostic indicators for chronicity (Chou et al., 2007), require brief and practical tools to help them identify subgroups of patients that may require early, targeted secondary prevention pathways. The publication of the SBT (Hill et al., 2008) provides clinicians with a choice, in addition to an existing ‘reference standard’ instrument, the ÖMPSQ. However, in order to decide which clinical tool to use in practice, clinicians need to know how these instruments compare, including the baseline clinical characteristics of the subgroups identified by each tool.
The overall aim of this study was therefore to provide a head to head comparison of the SBT’s concurrent validity against a best available reference standard, the ÖMPSQ. Our objectives were to test the correlation between the SBT and ÖMPSQ scores and the agreement of patients allocated to ‘low’, ‘medium’ and ‘high’ risk subgroups, and to compare the abilities of the SBT and ÖMPSQ scales to discriminate patients according to validated reference standard measures.
The ÖMPSQ consists of 24 self-report items (21 items are scored), selected following a literature review to identify strong independent risk factors for work absence. The authors defined ‘poor prognosis’ as accumulated sick leave of 30 days or more at six months follow up. The ÖMPSQ’s 21 scored items use an 11-point response format, apart from item 1 (pain sites), which has five descriptive components that are double weighted. The instrument therefore provides a potential score ranging from two to 210 points. The reliability of the ÖMPSQ has been reported with a Kappa of 0.83 (Linton and Hallden, 1998) and external validity, with a number of different high risk cut-off scores, has been established in a variety of patient populations and settings (Heneweer et al., 2007; Westman et al., 2008; Hough et al., 2007; Margison and French, 2007; Nordeman et al., 2006; Jellema et al., 2007; Gabel et al., 2008; Hurley et al., 2001; Grotle et al., 2006; Linton and Boersma, 2003; Boersma and Linton, 2005; Dunstan et al., 2005). The predictive validity of the ÖMPSQ has also been investigated in a number of studies, as summarised by Hockings et al. (2008). Since the initial development of the ÖMPSQ, other authors have used it to classify patients into low, medium and high ‘at risk’ groups (Nordeman et al., 2006).
The SBT has 9 items selected as predictive of ‘poor prognosis’ following a literature review and secondary analysis to identify strong independent predictors for persistent disabling back pain. The 9 items each use a dichotomised response format (‘agree’ or ‘disagree’), apart from one bothersomeness item, which uses a Likert scale. Overall SBT scores range from 0 to 9 and are produced by summing all positive items; a psychosocial subscale score ranging from 0 to 5 is produced by summing bothersomeness, fear, catastrophising, anxiety, and depression items (items 1, 4, 7, 8, and 9). The predictive validity and external validity of the STarT Back Tool have been reported, as well as the SBT’s reliability, with a Kappa of 0.79 (Hill et al., 2008).
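The scoring rule described above can be illustrated with a minimal Python sketch. It assumes each of the 9 items has already been reduced to positive or not positive; how the Likert bothersomeness item is dichotomised is not stated here, so that step is left to the caller. The function name `score_sbt` and the input layout are hypothetical.

```python
# Item numbers in the psychosocial subscale, as listed in the text.
PSYCHOSOCIAL_ITEMS = (1, 4, 7, 8, 9)


def score_sbt(positive_items):
    """Return (total, psychosocial_subscale) for one patient.

    positive_items maps item number (1 to 9) to True when that item is
    scored positive. Assumption: the caller has already dichotomised the
    Likert bothersomeness item.
    """
    if set(positive_items) != set(range(1, 10)):
        raise ValueError("expected responses for items 1 to 9")
    # Total score: count of all positive items (range 0 to 9).
    total = sum(1 for positive in positive_items.values() if positive)
    # Subscale score: count of positive psychosocial items (range 0 to 5).
    subscale = sum(1 for item in PSYCHOSOCIAL_ITEMS if positive_items[item])
    return total, subscale


# Hypothetical patient who is positive on items 1, 2, 4 and 9.
responses = {item: False for item in range(1, 10)}
for item in (1, 2, 4, 9):
    responses[item] = True
total, subscale = score_sbt(responses)
```

For this hypothetical patient the total score is 4 and the psychosocial subscale score is 3, because item 2 counts towards the total only.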
Prognostic constructs included in both tools as single screening items include disability, fear avoidance, anxiety, depression and also an item on the patient’s perceived chance that ‘current pain may become persistent’ or that the back pain is ‘never going to get any better’. Both instruments allocate ‘at risk’ patients based on the presence or absence of known indicators of poor prognosis. The SBT also discriminates patients using single items for bothersomeness, referred leg pain, and comorbid pain, while the ÖMPSQ discriminates on the basis of additional items for pain intensity, coping and work-related factors.
Potential differences between the ÖMPSQ and SBT were explored using data from a cross-sectional survey of consecutive adult patients (n = 131) who consulted their GP with low back pain. The methods of patient recruitment have already been published elsewhere (Hill et al., 2008). In brief, 244 participants were invited to complete a questionnaire containing both the ÖMPSQ and the SBT from 8 general practices in North Staffordshire and Central Cheshire, UK using computerised Read Codes to identify recent back pain consulters. This study received ethical approval from the North Staffordshire Local Research Ethics Committee.
The SBT and ÖMPSQ were scored according to the methods specified by the instrument developers (Hill et al., 2008; Linton and Hallden, 1998). Patients were classified into ‘low’, ‘medium’ and ‘high’ risk groups using derived cut-off scores for each instrument. For the SBT, cut-offs recommended in the original article were used (Hill et al., 2008); for the ÖMPSQ, we replicated the high risk cut-off score of 112 determined within a UK primary care population by Hurley et al. (2001), and a cut-off of 90 to separate low from medium risk group patients, as used by Nordeman et al. (2006). To score the ÖMPSQ among patients who were not workers (four work-related items are only relevant to those patients in work), the procedure used by Jellema et al. (2007) was followed, where the mean score of the remaining 17 items was imputed, on the condition that at least 75% of all items were completed.
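As a rough illustration of the ÖMPSQ classification and imputation steps, the sketch below uses the cut-offs of 90 and 112 quoted above. Several details are assumptions rather than the published procedure: scores equal to a cut-off are placed in the higher group, the 75% completion rule is applied to the list of scored items passed in, and the function names are hypothetical.

```python
LOW_MEDIUM_CUTOFF = 90   # Nordeman et al. (2006), as cited in the text
HIGH_CUTOFF = 112        # Hurley et al. (2001), as cited in the text


def impute_total(item_scores, min_completed=0.75):
    """Sum item scores, replacing missing items (None) with the mean of
    the completed items. Returns None when too few items are completed."""
    answered = [score for score in item_scores if score is not None]
    if not item_scores or len(answered) / len(item_scores) < min_completed:
        return None
    mean_answered = sum(answered) / len(answered)
    missing = len(item_scores) - len(answered)
    return sum(answered) + mean_answered * missing


def classify_ompsq(total):
    """Allocate a total score to a risk group.
    Assumption: a score equal to a cut-off falls in the higher group."""
    if total >= HIGH_CUTOFF:
        return "high"
    if total >= LOW_MEDIUM_CUTOFF:
        return "medium"
    return "low"


# Hypothetical non-worker: 17 completed items and 4 missing work items.
total = impute_total([5] * 17 + [None] * 4)
group = classify_ompsq(total)
```

In this hypothetical case 17 of 21 items (81%) are completed, so the four missing items are each imputed as 5 and the total of 105 falls in the ‘medium’ group.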
Correlations between the SBT and ÖMPSQ, and reference standard constructs included in both instruments: disability (Roland and Morris Disability Questionnaire, RMDQ: Roland and Morris, 1983), fear avoidance (Tampa Scale of Kinesiophobia, TSK: Kori et al., 1990) and catastrophising (Pain Catastrophising Scale, PCS: Sullivan et al., 1995), were calculated using Spearman’s rank correlations. The magnitude of each correlation coefficient was classified as small (0.1–0.3), moderate (>0.3–0.5) or large (greater than 0.5) (Cohen, 1998). In addition, box and whisker plots were produced to visually present the correlation between the SBT and ÖMPSQ (see footnote 1).
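For readers who want to reproduce this step, the following is a minimal sketch of Spearman’s rank correlation (Pearson’s correlation applied to tie-averaged ranks) together with the magnitude bands quoted above. The label ‘negligible’ for coefficients below 0.1 is an assumption, since the text does not name that band, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def magnitude(rho):
    """Classify |rho| using the bands quoted in the text."""
    size = abs(rho)
    if size > 0.5:
        return "large"
    if size > 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumption: band not named in the text
```

Applied to the coefficient of 0.802 reported for the SBT and ÖMPSQ total scores, `magnitude` returns ‘large’.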
Observed agreement of the two tools’ allocation to ‘low’, ‘medium’ and ‘high’ risk subgroups was examined by calculating absolute agreement, and agreement beyond chance was statistically evaluated using a weighted Cohen’s kappa test. Kappa values were classified for reference as follows: less than 0.00, poor agreement; 0.00–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and greater than 0.80, near perfect agreement (Landis and Koch, 1977). The McNemar Bowker test was used to determine whether the disagreement observed was evenly balanced or significantly skewed towards the lower or higher group. The characteristics of patients selected for ‘low’, ‘medium’ and ‘high’ subgroups using the SBT and ÖMPSQ were compared using median scores for age, gender, episode duration, bothersomeness, pain intensity, days off work, fear (TSK), catastrophising (PCS), ÖMPSQ scores and SBT scores (mean values were used for age). The Mann Whitney test was used to test for statistical differences between continuous data and the Chi-squared test was used for categorical data among the characteristics for discordant individuals (patients for whom group allocation differed).
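The agreement statistics can be sketched as follows. The weighting scheme is not stated in the text, so linear disagreement weights are assumed here; the cross-tabulation is hypothetical and is not the study’s Table 1. Values falling between the published bands (for example 0.205) are assigned to the next band up, which is also an assumption.

```python
def weighted_kappa(table):
    """Weighted Cohen's kappa for a square agreement table.

    table[i][j] is the number of patients placed in group i by one tool
    and group j by the other. Assumption: linear disagreement weights.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the corners
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n ** 2
    return 1 - observed / expected


def landis_koch(kappa):
    """Label a kappa value using the bands quoted in the text."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "near perfect"


# Hypothetical 3x3 table (rows: tool A low/medium/high; columns: tool B).
example = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
kappa = weighted_kappa(example)
```

With complete agreement, as in the hypothetical table, the weighted kappa is 1; a table in which allocation is independent between the tools gives 0.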
Finally, the ability of the two instruments’ total (and subscale) scores to discriminate patients was compared by plotting receiver operating characteristic (ROC) curves for a range of reference standard ‘cases’ (using the ‘case’ definitions provided in Table 3) and calculating areas under the curve (AUCs) with 95% CIs, with statistical differences tested using the Wilcoxon statistic (Hanley and McNeil, 1982). Total scores were entered as the ‘test’ variables and reference standard ‘cases’ as the ‘state’ variable.
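The link between the AUC and the Wilcoxon statistic can be made concrete with a short sketch: the AUC equals the proportion of case and non-case pairs in which the case has the higher score, with ties counted as half. The standard error below is the approximation usually attributed to Hanley and McNeil (1982); whether the authors used exactly this form is an assumption, and the function names and example scores are hypothetical.

```python
import math


def auc(case_scores, noncase_scores):
    """Area under the ROC curve as the Wilcoxon (Mann-Whitney) statistic."""
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5  # ties count as half
    return wins / (len(case_scores) * len(noncase_scores))


def auc_se(area, n_cases, n_noncases):
    """Approximate standard error of an AUC (Hanley and McNeil form)."""
    q1 = area / (2 - area)
    q2 = 2 * area ** 2 / (1 + area)
    variance = (
        area * (1 - area)
        + (n_cases - 1) * (q1 - area ** 2)
        + (n_noncases - 1) * (q2 - area ** 2)
    ) / (n_cases * n_noncases)
    return math.sqrt(max(variance, 0.0))


# Hypothetical tool scores for patients who are and are not 'cases'.
area = auc([6, 7, 8, 5], [2, 3, 5, 1])
low = area - 1.96 * auc_se(area, 4, 4)
high = area + 1.96 * auc_se(area, 4, 4)
```

An approximate 95% CI is then the AUC plus or minus 1.96 standard errors, as in the last two lines.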
Complete data on the SBT and ÖMPSQ were available for 130/244 (53%) patients. The sample was 60% female, with a mean (±SD) age of 44 (±10.0) years and disability score (RMDQ) of 8.6 (±6.6). Back pain episode durations varied, with 25% < 1 month, 27% 1–6 months, and 48% >6 months, and 43% reported being ‘very’ or ‘extremely’ bothered by their back pain. The median scores (range) for SBT and ÖMPSQ were 4.5 (0–9) and 98.8 (26.3–189.0), respectively.
The Spearman’s rank correlation coefficients for the SBT total scores and psychosocial subscale scores with the ÖMPSQ scores were 0.802 and 0.769, respectively. The magnitude of the reported correlation coefficients was therefore ‘large’ and was similar to the correlation between disability (RMDQ) and the subgroup tools (SBT = 0.813; ÖMPSQ = 0.830). Figures for fear (TSK) (SBT psychosocial subscale = 0.659; ÖMPSQ = 0.683) and catastrophising (PCS) (SBT = 0.671; ÖMPSQ = 0.656) were lower but also classified as ‘large’. Fig. 1 presents box plots of the ÖMPSQ total score against the SBT (a) total score and (b) psychosocial subscale, which demonstrate that increasing ÖMPSQ scores correlated with higher SBT scores across the full range of both instrument scales.
The proportions of patients allocated to ‘low’, ‘medium’ and ‘high’ risk groups by the SBT and ÖMPSQ were 40% cf 40%, 35% cf 22%, and 25% cf 38%, respectively. The ‘low’ risk proportion was therefore the same, but the SBT allocated considerably fewer patients to the ‘high’ risk group and significantly more patients to the ‘medium’ risk group than the ÖMPSQ (McNemar Bowker test p = 0.022).
Observed agreement between the two instruments for allocation to ‘low’, ‘medium’ and ‘high’ risk groups is presented in Table 1, with absolute agreement in 62% of patients. Weighted Cohen’s kappa for agreement in allocation to the three subgroups beyond chance was ‘moderate’, 0.57 (95% CI 0.47–0.68, p < 0.001). Agreement about allocation to the ‘low’ risk group alone (‘low’ versus ‘medium’ or ‘high’) was ‘substantial’ at 0.63 (95% CI 0.50–0.77, p < 0.001).
The clinical characteristics of the subgroups (‘low’, ‘medium’ and ‘high’) derived using the SBT and ÖMPSQ are presented in Table 2. The characteristics of the ‘low’ risk groups were almost identical (e.g. both had an RMDQ median of 2 and a pain NRS of 1.7). The ‘medium’ risk groups differed only slightly with higher pain and disability scores among those allocated by the SBT and the ‘high’ risk groups were also very similar although the ÖMPSQ ‘high’ risk group had slightly more females and had longer episode durations. However, these differences between the clinical characteristics in the ‘medium’ and ‘high’ risk groups produced by both tools were not statistically significant (p > 0.05).
The ability of the two instruments’ total scores (and SBT subscale scores) to discriminate patients defined as ‘cases’ on reference standards is given in Fig. 2 and Table 3. There were no significant differences in the discriminative abilities of the SBT and ÖMPSQ scales for ‘cases’ of disability, catastrophising, fear, comorbid pain, time off work or episode duration reference standards. The SBT was significantly better for discriminating bothersomeness and referred leg pain ‘cases’, while the ÖMPSQ was better at discriminating ‘cases’ of pain intensity.
This study has provided a comparative analysis of the measurement properties of the SBT and the ÖMPSQ for patients with low back pain, consulting in primary care. The ÖMPSQ (21 items) has been available for nearly a decade and is a popular clinical tool with numerous external validation studies in comparison to the STarT Back Tool (9-items), which has been more recently validated and has potential advantages in terms of its length and scoring simplicity. Both share broadly similar purposes, although subtle conceptual differences have become apparent through conducting this investigation, which should be highlighted for clinicians seeking to choose which tool is most appropriate for their context. Firstly, the ÖMPSQ was conceived as a prognostic screening tool, although in practice it has adopted a subgrouping function with established cut-offs across a variety of settings to allocate patients to low and high risk groups. In contrast, the SBT was primarily conceived as a subgrouping tool, with cut-offs based on baseline factors and not on future outcome, and for specifically defined low, medium and high risk treatment subgroups.
This study found that the completion rate for both instruments was high (n = 130/131 for both) and that their three subgroups’ clinical characteristics were broadly similar. Agreement over the two instruments’ allocation to a ‘low’ risk subgroup was ‘substantial’, suggesting that both tools have a similar ability to differentiate patients based on their baseline characteristics. Although the characteristics of patients in the three subgroups allocated by each tool were not significantly different, significant differences were observed in the proportions of patients allocated to the ‘high’ and ‘medium’ risk groups, with a smaller proportion of patients allocated to the high risk group by the SBT than the ÖMPSQ. One factor in the choice of instrument used by clinicians may therefore be the availability of services for the management of a high risk group of patients.
The abilities of both instrument scales to discriminate a number of relevant reference standards were analysed to determine whether similarities in the subgroup clinical characteristics were due to the tool cut-off points used, or due to similar discriminative abilities of the instrument scales. The results demonstrated that the two instrument scales similarly discriminated the majority of clinical characteristics including disability, catastrophising, fear, comorbid pain, time off work and episode duration. The ÖMPSQ total scale was significantly better at discriminating patients’ baseline pain intensity, while the SBT was better for discriminating baseline bothersomeness of back pain and referred leg pain.
One substantial limitation of this study is that only cross-sectional data were available for directly comparing the performance of both instruments, preventing a comparison of the instruments’ predictive validity. However, guidelines for the management of non-specific back pain highlight the importance of identifying indicators of poor prognosis, rather than predicting individual patients’ outcomes, in order that treatment can be targeted appropriately (Chou et al., 2007; van Tulder et al., 2006). Traditionally, validation of the utility of such instrument scales has predominantly focused on establishing an instrument’s predictive validity, although the SBT subgroup cut-offs were specifically validated to optimally differentiate patients with specific baseline clinical characteristics. To date, there have been no studies that have compared the predictive abilities of both instruments within a single sample, but published data on the predictive validity of the ÖMPSQ (Hockings et al., 2008) report AUCs for persisting disability from 0.68 to 0.83, which are comparable to the reported AUC of 0.80 for the SBT (Hill PhD Thesis, 2008). A further limitation is the potential for non-response bias influencing generalizability, as only 53% of the sample responded. This may have inflated the proportion classified by the two instruments to the medium and high risk subgroups, as people with longer term or more severe problems are more likely to respond to the questionnaire. However, non-response bias is likely to have equally affected both instruments and is unlikely to have influenced the tool comparisons made in this study.
This study has compared measurement properties of the SBT to those of the ÖMPSQ, and found them to be similar with respect to the subgroup patient characteristics and the abilities of the scales to differentiate according to validated reference standard measures. The SBT is quicker for patients to complete and easier for clinicians to score, yet is able to identify a high risk subgroup with similar clinical characteristics to the ÖMPSQ. Apart from the differences in instrument length, the results of this study suggest that the main difference between the two instruments was the proportion of patients each tool allocated to the high risk group, with the SBT identifying 25% as high risk, and the ÖMPSQ 38% as high risk. This might be of relevance to clinicians working within services that have resource limitations for treating high risk patients. Recommendations for future research include a direct comparison of the predictive validity of both instruments, and also an investigation of possible instrument item redundancy. Clinicians need to be aware of the relative strengths and weaknesses of the two clinical tools before selecting which tool is most appropriate to help decision making in their specific clinical context and setting. This head to head comparison has demonstrated that the SBT’s discriminative abilities appear to be at least equivalent to, and on some criteria preferable to, those of the ÖMPSQ, which is the most widely used current tool in clinical practice.
This study was supported by a Programme Grant from the Arthritis Research Campaign (arc), UK (Grant Code: 13413) and by the North Staffordshire Primary Care Research Consortium. The authors would like to thank the administrative and health informatics staff at Keele University’s Primary Care Musculoskeletal Research Centre and the Keele General Practice Partnership. Thanks also to the members of the independent monitoring committee, staff and patients of the eight participating general practices and to the wider members of the research team who were involved with the study. Dr Jonathan Hill is funded as an arc Lecturer in Physiotherapy awarded by Arthritis Research Campaign (arc) UK.
1. In box and whisker plots, the central line is the median, the box provides the lower and upper quartiles, the whiskers are the 2.5% and 97.5% values, and extreme values are noted with a ‘*’ (Chambers et al., 1983).