Clinical practice guidelines state that the tissue source of low back pain cannot be specified in the majority of patients. However, there has been no systematic review of the accuracy of diagnostic tests used to identify the source of low back pain. The aim of this systematic review was therefore to determine the diagnostic accuracy of tests available to clinicians to identify the disc, facet joint or sacroiliac joint (SIJ) as the source of low back pain. MEDLINE, EMBASE and CINAHL were searched up to February 2006 with citation tracking of eligible studies. Eligible studies compared index tests with an appropriate reference test (discography, facet joint or SIJ blocks or medial branch blocks) in patients with low back pain. Positive likelihood ratios (+LR) > 2 or negative likelihood ratios (−LR) < 0.5 were considered informative. Forty-one studies of moderate quality were included; 28 investigated the disc, 8 the facet joint and 7 the SIJ. Various features observed on MRI (high intensity zone, endplate changes and disc degeneration) produced informative +LR (> 2) in the majority of studies, increasing the probability of the disc being the low back pain source. However, heterogeneity of the data prevented pooling. +LR ranged from 1.5 to 5.9, 1.6 to 4.0, and 0.6 to 5.9 for high intensity zone, disc degeneration and endplate changes, respectively. Centralisation was the only clinical feature found to increase the likelihood of the disc as the source of pain: +LR = 2.8 (95%CI 1.4–5.3). Absence of degeneration on MRI was the only test found to reduce the likelihood of the disc as the source of pain: −LR = 0.21 (95%CI 0.12–0.35). While single manual tests of the SIJ were uninformative, their use in combination was informative with +LR of 3.2 (95%CI 2.3–4.4) and −LR of 0.29 (95%CI 0.19–0.44). None of the tests for facet joint pain were found to be informative.
The results of this review demonstrate that tests do exist that change the probability of the disc or SIJ (but not the facet joint) as the source of low back pain. However, the changes in probability are usually small and at best moderate. The usefulness of these tests in clinical practice, particularly for guiding treatment selection, remains unclear.
The online version of this article (doi:10.1007/s00586-007-0391-1) contains supplementary material, which is available to authorized users.
Low back pain guidelines recommend the use of the term non-specific low back pain (NSLBP) [1, 51] on the grounds that it is not possible to establish the source of the pain in the majority of cases. However, most guidelines do not refer to any primary studies to support this position. Some authors and clinicians are now questioning the utility of the diagnosis NSLBP [15, 19] arguing that no one treatment would be expected to be effective for all patients with this diagnosis. If the tissue-source of low back pain could be identified, this may lead to more logical, and effective, interventions.
The disc, facet joint and sacroiliac joint (SIJ) are potential sources of low back pain. The prevalence of each of these structures as a source of low back pain has been estimated at 39, 15 and 13%, respectively. Each structure is innervated, and noxious mechanical or chemical stimulation can cause low back pain. There is no universally accepted gold standard for diagnosis of low back pain of disc, facet joint or SIJ origin. The recommended reference standards involve anaesthetic or provocative injections. Much has been written both for and against the diagnostic accuracy of these reference tests [7, 10, 14, 55]; however, they are currently the best available tests to identify the disc, facet joint or SIJ as the source of low back pain. These reference standards are invasive, expensive and not widely available and are therefore not suitable for routine clinical use. Using these reference standards, researchers have investigated the accuracy of diagnostic tests available to clinicians, which aim to identify the tissue source of NSLBP. No systematic review of this body of literature has been performed. Without this systematic summary it is not evident whether the existing literature supports or refutes the position that it is not possible for clinicians to identify a tissue source for low back pain in most patients presenting for care, or whether inadequate research has been performed to answer the question.
To resolve this issue we performed a systematic review of studies investigating the accuracy of diagnostic tests available to clinicians to identify the disc, facet joint or sacroiliac joint as the source of a patient’s NSLBP. Our aims were to determine which tests had been investigated, the diagnostic accuracy of these tests and the methodological quality of this research.
There is no widely accepted search strategy to identify diagnostic studies. We therefore developed a sensitive strategy based on several authors’ work [11, 58]. The final search (Appendix 1) contained several terms for one of three domains (diagnostic studies, index tests available to clinicians and terms for disc, facet joint and sacroiliac joint) which were combined to generate the final strategy. Search terms from retrieved articles were added to the search until saturation occurred.
A search was conducted of MEDLINE, CINAHL and EMBASE up to the end of February 2006. One author inspected the titles of the search results and excluded clearly irrelevant articles. Two independent reviewers then read all abstracts, and full texts as needed, to determine if articles met inclusion criteria. In cases where reviewers disagreed and consensus could not be reached, a third reviewer made the final decision. Reference lists of included articles were reviewed for additional articles. Included articles were entered into Web of Science as a further search for additional articles. A final list of included articles was sent to two experts in the field who reviewed the list for possible omissions.
To be included studies were required to meet the following criteria:
Two reviewers independently rated the quality of studies using the QUADAS scale. Reviewers met initially to define acceptable standards for individual rating items. One item was added such that studies were also rated on whether they used a prospective design. In cases where reviewers disagreed and consensus could not be reached, a third reviewer made the final decision.
We pre-specified that we would investigate the effect of using more strict reference standards. For disc studies this was a stricter control procedure (one adjacent pain-free disc) or abnormal morphology in addition to concordant pain response as part of the reference standard. For facet joint and SIJ studies it was using a double control block or greater levels of pain relief as the reference standard.
Index tests were considered informative when positive likelihood ratios (+LR) were > 2, and/or negative likelihood ratios (−LR) < 0.5 and confidence intervals did not include one. +LR are typically > 1: the higher the +LR the more likely a patient with a positive test does have the disorder. −LR are typically < 1: the lower the −LR the more likely a patient with a negative test does not have the disorder. Meta-DiSc was used to calculate sensitivities and specificities, likelihood ratios, assess heterogeneity, perform meta-analyses and generate summary receiver operating characteristic curves (SROC). Heterogeneity was assessed by visually inspecting SROC for threshold effects and by reviewing Chi-square analysis for significant P values.
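As a rough illustration of the definitions above, the following Python sketch derives sensitivity, specificity and both likelihood ratios from a single 2 × 2 table, and then applies the informativeness thresholds. The function names, the cell-count arguments and the log-method confidence interval are assumptions made for illustration only; the review used Meta-DiSc, and its internal calculations are not reproduced here.

```python
import math


def likelihood_ratios(tp, fp, fn, tn, z=1.96):
    """Accuracy statistics from one 2 x 2 table (index test vs reference test).

    tp, fp, fn, tn are hypothetical cell counts; all must be non-zero,
    because the log-method standard errors divide by each count.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    pos_lr = sensitivity / (1 - specificity)
    neg_lr = (1 - sensitivity) / specificity

    # Standard errors of ln(LR) (log method, an assumed choice of interval).
    se_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))

    def interval(lr, se):
        return (lr * math.exp(-z * se), lr * math.exp(z * se))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_lr": pos_lr,
        "pos_lr_ci": interval(pos_lr, se_pos),
        "neg_lr": neg_lr,
        "neg_lr_ci": interval(neg_lr, se_neg),
    }


def is_informative(lr, ci_low, ci_high, positive=True):
    """Apply the review's rule: +LR > 2 or -LR < 0.5, with a CI excluding one."""
    excludes_one = ci_low > 1 or ci_high < 1
    if positive:
        return lr > 2 and excludes_one
    return lr < 0.5 and excludes_one


# Pooled estimates reported in this review, used here only as inputs.
print(is_informative(3.2, 2.3, 4.4, positive=True))       # SIJ test combination, +LR
print(is_informative(0.66, 0.53, 0.83, positive=False))   # centralisation, -LR
```

Keeping the threshold check separate from the 2 × 2 arithmetic means it can be applied directly to published point estimates and intervals when the underlying counts are not available.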
Search. Our electronic search identified 10,647 articles (Fig. 1). Of these, 10,294 clearly irrelevant articles were excluded by title, leaving 353 potentially eligible articles. Following review by two independent authors, 41 articles [2, 5, 6, 9, 12, 13, 17, 18, 20–31, 34–40, 42, 43, 48–50, 52–54, 56, 59–63] met all inclusion criteria and were included. No additional articles were identified by citation tracking, or by contacting two experts in the field. Individual study characteristics are summarised in Table 1.
Quality. Results of the quality assessment using QUADAS are shown in Appendix 3. Overall the quality of studies was moderate (average 8.8 positive results from a possible 14). The item which scored worst was the spectrum of patients, where only seven of 41 (17%) studies scored positive. In most cases the population was a convenience sample of patients receiving the reference test. It is possible that these patients are not typical of those presenting with low back pain. Other items which were generally poor included: time between index and reference test (27% positive), availability of clinical data (29% positive), and reporting of uninterpretable results (22% positive).
Of the 41 included studies, 28 investigated the disc as the source of low back pain, 8 investigated the facet joint and 7 the SIJ (Table 1). One study investigated all three sources while all other studies investigated only one source of low back pain. Individual studies investigated between 1 and 40 index tests, and each index test was investigated by between 1 and 10 studies. The prevalence of pain originating from each structure across all studies was 20–79% for the disc, 12–61% for the facet joint and 28–61% for the SIJ.
Appendix 2 records the contingency data for all studies. Diagnostic accuracy values for index tests investigated by two or more studies are presented in Tables 2 (disc studies) and 3 (facet joint and SIJ studies). For most index tests heterogeneity of the data made pooling inappropriate.
In a few studies we created new 2 × 2 tables (representing the subset of patients eligible for this review) that differed from those published, after excluding patients who had undergone previous surgery. This was done in two studies [37, 63] from data presented in the published papers, and in two studies [26, 27] by the original authors upon request. We also requested and received new 2 × 2 tables for three studies [22, 24, 60] where the reference test was slightly different to our criteria but results could be easily modified.
Index tests evaluated in at least two studies included magnetic resonance imaging (MRI) findings (high intensity zone, disc degeneration, endplate changes, annular disruption and narrowing), the centralisation phenomenon and response to vibration testing (Table 2). Index tests investigated in single studies were ultrasound (annular tear), radiographs (narrowing), pain drawings, status of posterior annulus (MRI) and isolated findings from the medical history and physical examination. All MRI studies calculated diagnostic accuracy at the level of the disc, while centralisation studies always calculated diagnostic accuracy at the level of the patient. For spinous process vibration, some studies calculated diagnostic accuracy at the level of the disc and others at the level of the patient.
Seven [2, 18, 21, 42, 43, 50, 59] of the ten studies investigating high intensity zone found informative +LRs, but only four [2, 21, 43, 59] of the ten studies found informative −LRs, indicating that a positive high intensity zone increases the probability of the disc being a source of pain but a negative test does not usefully reduce that probability (Table 2). Figure 2a plots sensitivity against 1 − specificity for the ten studies as a summary receiver operating characteristic (ROC) curve. The area under the curve is 0.88.
The various studies utilized different thresholds for disc degeneration (Table 2). There appears to be a significant threshold effect. When the highest threshold for each study is used, seven of the eight studies demonstrate informative +LRs, but only five [5, 18, 28, 37, 56] demonstrate informative −LRs, while if the lowest threshold for each study is used, only three studies [5, 37, 56] have informative +LRs, but all eight [5, 6, 17, 18, 28, 36, 37, 56] have informative −LRs. The results are summarised as a summary ROC curve in Fig. 2b. The area under the curve is 0.81.
The diagnostic accuracy of different thresholds was examined in three [5, 20, 56] of the five studies investigating endplate changes (Table 2). Three studies [5, 18, 56] found informative +LRs. Regardless of threshold −LRs were uninformative for all studies.
Because of contradictory findings of the two studies [18, 28] investigating MRI narrowing, it is unclear whether narrowing is a useful test to help rule in or out the disc as the source of low back pain (Table 2).
Because of contradictory findings of the two studies [54, 63] investigating annular disruption, it is unclear whether annular disruption is a useful test to help rule in or out the disc as the source of low back pain (Table 2).
Lack of statistical heterogeneity made pooling possible for LRs from the four studies [12, 25, 26, 60] investigating centralisation. Results indicated informative +LRs (2.8, CI 1.4–5.3) and uninformative −LRs (0.66, CI 0.53–0.83).
Pooled LRs from the four studies [54, 61–63] investigating spinous process vibration at the level of the patient found uninformative +LRs (1.7, CI 1.3–2.4) and −LRs (0.53, CI 0.39–0.72). Pooled LRs from the two studies investigating vibration at the level of the disc found informative +LRs (2.86, CI 2.0–4.0) and −LRs (0.39, CI 0.22–0.72).
Index tests investigated in more than two studies were “Revel’s criteria” (5 or more of 7 clinical characteristics: age >65 years, pain well relieved by recumbent posture, and absence of pain exacerbation with coughing, forward flexion, rising from sitting, hyperextension or extension-rotation), each of the seven individual variables which make up Revel’s criteria, absence of centralisation, and traumatic onset (Table 3). Other index tests studied only in single studies include intra-articular degeneration on CT, many aspects of a medical examination, and clinical prediction rules (Appendix 2).
The two studies by Revel et al. [38, 39] found informative +LRs and −LRs for “Revel’s criteria”. However, two more recent studies [23, 31] failed to find informative +LRs or −LRs (Table 3). None of the seven individual items that make up “Revel’s criteria” were found to have informative +LRs by more than one study (Table 3). One item (relief with recumbency) had informative −LRs in two of three studies (Table 3).
Most studies investigating the SIJ only included participants whose primary pain was below the level of the fifth lumbar vertebra. Consequently, the results relate only to this group of patients. Index tests investigated included clinical examination findings and bone scan (Table 3).
All four studies [22, 24, 52, 60] investigating a composite of pain provocation tests found worthwhile diagnostic validity. Because the diagnostic accuracy data were not heterogeneous, pooling was performed, giving pooled estimates of 80% (70–88), 75% (67–83), 3.2 (2.3–4.4) and 0.29 (0.19–0.44) for sensitivity, specificity, +LR and −LR, respectively. Only two of the individual pain provocation tests (thigh thrust and sacral thrust) were tested by two studies for their diagnostic accuracy in isolation. Neither test was found to have informative +LRs or −LRs in both studies. Both studies investigating bone scan [29, 49] found high +LR point estimates (6.19 and 5.62); however, the confidence intervals were very wide for both studies and crossed 1 in one of the studies. The −LRs (0.58 and 0.88) were uninformative for both studies. The results suggest that a positive bone scan may increase the probability of the SIJ being the source of pain but a negative bone scan does not reduce the probability.
We pre-planned to investigate the influence of reference test quality on the diagnostic accuracy of index tests if sufficient data existed. Due to the low number of studies for most index tests this was only possible for high intensity zone (HIZ) studies. We investigated the influence of having a control pain-free disc as part of the reference standard on the diagnostic accuracy of the HIZ. Three of the ten HIZ studies were controlled. Meta-analysis using Meta-DiSc found no significant difference (ratio of diagnostic odds ratio (RDOR) = 2.56, CI 0.68–9.59, P = 0.14).
With only four studies investigating the most common index test for pain originating from the facet joint (Revel’s criteria) it was not possible to investigate the influence of controlled facet blocks on diagnostic accuracy. Visual inspection of the data showed that the only study using double controlled blocks found lower diagnostic accuracy than the three studies that did not use double blocks [23, 38, 39].
Of the four studies investigating a combination of pain provocation tests of the SIJ, two studies [22, 52] used double blocks as the reference standard. Visual inspection of the data suggests no difference in diagnostic value for this index test between double blocks and single blocks.
This systematic review reveals that there are relatively few studies which have investigated the diagnostic accuracy of tests to identify the disc, facet joint or SIJ as the source of low back pain. Only two index tests (MRI high intensity zone and MRI disc degeneration) have been investigated by more than five studies. Only a few studies evaluated a cluster of signs or a combination of tests. The SIJ studies found increased diagnostic validity for a cluster of tests compared to a single test in isolation. Forming a diagnosis based on a combination of findings is typical of the clinical reasoning approach used by clinicians and should be investigated in future studies.
The results of studies investigating the disc as the source of low back pain indicate that there is no available clinical test which can be used both to increase and to decrease the likelihood of the disc as the source of low back pain. However, several of the available tests (MRI high intensity zone, MRI disc degeneration, MRI endplate changes, and centralisation) have informative +LRs, indicating that a positive test result does increase the likelihood of the disc as the source of the patient’s symptoms. The results, however, are heterogeneous, making an accurate prediction of diagnostic strength impossible. Reduced MRI signal intensity is the only index test that decreased the likelihood of the disc as the source of symptoms, and then only when a low threshold is used. When the lowest threshold available in the eight studies was used, all studies found informative −LRs. The data approached statistical heterogeneity (P = 0.03) and the pooled estimate for −LR was 0.21 (0.12–0.35), demonstrating moderate ability for a negative MRI to rule out the individual disc as a source of symptoms.
The results of studies investigating the facet joint as the source of a patient’s symptoms suggest that the currently available tests have limited or no diagnostic validity. Studies of “Revel’s criteria” found conflicting results. However, the only study that used a double block found no useful diagnostic value. Two clinical prediction rules developed by Laslett (Appendix 2) have both informative +LRs and −LRs. However, these have only been developed in a single study and need validating in an independent sample.
A combination of SIJ pain provocation tests appears to be useful both to increase and to decrease the likelihood of the SIJ as the source of symptoms in patients with pain primarily below the fifth lumbar vertebra. The summary +LR and −LR of 3.19 and 0.29, respectively, suggest moderate changes to the post-test probability. While a positive bone scan appears to be useful at increasing the probability of the SIJ being the source of low back pain, it also has very low sensitivity, which means that the majority of patients with pain from the SIJ will have a negative bone scan.
The tests reviewed produce small or at best moderate changes from pre-test to post-test probability. For example, assuming a pre-test probability of 50% for the disc being the source of pain, a +LR of 3, as was typical for high intensity zone and centralisation studies, would change the post-test probability to 75%. The −LR of 0.21 for absence of disc degeneration would reduce the probability of the disc as the source of pain to 17%. Assuming a lower pre-test probability of 20% for the SIJ as the source of pain, the +LR of 3.19 for the combination of SIJ tests would increase the probability to about 44%. The −LR of 0.29 for the SIJ tests would reduce the probability to 7%. These changes in probability of the disorder are modest but must be considered in the context of current recommendations that it is impossible to identify a source for a patient’s low back pain.
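The worked figures above follow from converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back to a probability. A minimal Python sketch of that arithmetic is given below; the function name and the scenario labels are illustrative assumptions, and the inputs are the pre-test probabilities and likelihood ratios quoted in the paragraph. The results fall within a percentage point of the values quoted there.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Scenarios taken from the paragraph above (labels are illustrative).
scenarios = [
    ("disc, positive test, +LR 3", 0.50, 3.0),
    ("disc, no degeneration on MRI, -LR 0.21", 0.50, 0.21),
    ("SIJ, positive test combination, +LR 3.19", 0.20, 3.19),
    ("SIJ, negative test combination, -LR 0.29", 0.20, 0.29),
]

for label, pre_test, lr in scenarios:
    post_test = post_test_probability(pre_test, lr)
    print(f"{label}: {pre_test:.0%} -> {post_test:.0%}")
```

Working in odds rather than probabilities is what allows the likelihood ratio to be applied as a simple multiplier, and it makes clear why the same ratio shifts a 20% pre-test probability less in absolute terms than a 50% one.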
The results of this study may be used in future research to identify patients more likely to have pain originating from the disc or SIJ and to test the effectiveness of treatments aimed at these structures. Currently there is no literature indicating that knowledge of the tissue source of low back pain leads to improved outcomes; however, this research has been very difficult to perform without easily available and valid methods of identifying the source of low back pain.
The results of this study rely on the accuracy of the reference standards used. There has been much controversy in the literature on discography [8, 10, 41, 55] and to a lesser extent facet and SIJ blocks [14, 45, 46, 52]. A high rate of false positive responses to discography and facet blocks has been reported in the literature by some authors [16, 45]. Other authors have found low false positive rates, especially when strict criteria for a positive response are used [10, 55]. In our review we required relatively strict criteria for a positive response to discography (concordant pain and a minimum of two levels tested per patient) and to facet and SIJ injections (at least 50% pain reduction with guided injection). We pre-planned to investigate the impact of even stricter reference standards, including a pain-free adjacent disc or positive morphology for discography, and higher levels of pain relief or a pain-free control injection for facet joint or SIJ blocks. However, there were not enough studies using the higher level of control to investigate whether this affected the diagnostic validity of different index tests.
One of the limitations of the studies included in our review was that the majority of patients in the trials may not be representative of patients presenting for care of their low back pain. The patients were primarily a convenience sample of patients presenting for each type of diagnostic injection and may be more likely to have the target condition than an unscreened cohort presenting for care of low back pain. There is a need for research to be done in less selected populations; however, these studies may be difficult to conduct due to the invasive nature of the reference tests. The prevalence of the target disorder varied considerably across the included studies. This implies the populations were dissimilar and some pre-selection bias may have occurred. This may be a primary cause of the heterogeneous results that made pooling impossible.
It appears that only a small amount of investigation has been performed into the diagnostic accuracy of clinical tests to identify the tissue source of low back pain. There are tests for the disc and SIJ that have some diagnostic value but no test for the facet joint that appears informative. The usefulness of these tests in clinical practice, particularly for guiding treatment selection, remains unclear. Further quality investigation into tests that appear promising is required.
Chris Maher’s research fellowship and Mark Hancock’s PhD scholarship are funded by Australia’s National Health and Medical Research Council (NHMRC). An NHMRC project grant funds the salaries of James McAuley and Megan Spindler.