Subgroup analyses according to treatment received.
To evaluate whether baseline radiographic findings predicted outcomes in patients with degenerative spondylolisthesis (DS).
The SPORT combined randomized and observational DS cohorts.
The Meyerding listhesis grade was determined on the neutral radiograph (n=222). Patients were classified as having low disk height if disk height was less than 5 mm. Flexion-extension radiographs (n=185) were evaluated for mobility. Those with greater than 10° rotation or 4mm translation were considered Hypermobile. Changes in outcome measures were compared between listhesis (Grade 1 vs. Grade 2), disk height (Low vs. Normal) and mobility (Stable vs. Hypermobile) groups using longitudinal regression models adjusted for potential confounders. Outcome measures included SF-36 bodily pain (BP) and physical function (PF) scales, Oswestry disability index (ODI), stenosis bothersomeness index (SBI), and low back pain bothersomeness scale.
Overall, 86% had a Grade 1 listhesis, 78% had Normal disk height, and 73% were Stable. Baseline symptom severity was similar between groups. Overall, surgery patients improved more than patients treated non-operatively. At one year, outcomes were similar in surgery patients across listhesis, disk height, and mobility groups (ODI: Grade 1 -23.7 vs. Grade 2 -23.3, p=0.90; Normal disk height -23.5 vs. Low disk height -21.9, p=0.66; Stable -21.6 vs. Hypermobile -25.2, p=0.30). Among those treated non-operatively, Grade 1 patients improved more than Grade 2 patients (BP +13.1 vs. -4.9, p=0.019; ODI -8.0 vs. +4.8, p=0.010 at 1 year), and Hypermobile patients improved more than Stable patients (ODI -15.2 vs. -6.6, p=0.041; SBI -7.8 vs. -2.7, p=0.002 at 1 year).
Regardless of listhesis grade, disk height or mobility, patients who had surgery improved more than those treated non-operatively. These differences were due, in part, to differences in non-operative outcomes, which were better in patients classified as Grade 1 or Hypermobile.
Since early clinical descriptions of degenerative spondylolisthesis (DS),1-3 it has been suggested that certain radiographic features are related to surgical outcomes, including the magnitude of the slip,4-6 the degree of disk space narrowing,6-8 and angular and translational hypermobility identified on functional radiographs.6,9-14 Many of these clinical studies on degenerative “instability” were not specific for DS and included patients with a variety of degenerative conditions.13-18 Because most of these studies have focused on surgical outcomes,5,9,13-16,19-21 the role of radiographic findings in predicting the natural history and non-operative outcomes in patients with DS remains unknown.8,22,23 As a result, it is uncertain to what extent treatment decisions for DS should be influenced by radiographic findings.
Recently, we reported the results of the Spine Patient Outcomes Research Trial (SPORT) for DS, which demonstrated that patients treated surgically had quantitatively better outcomes than patients managed non-operatively over two years of observation.23 The specific goals of the current study were to: 1) describe the baseline characteristics of DS patients stratified by listhesis grade, disk height, and hypermobility; and 2) determine if surgical and non-operative outcomes were associated with these baseline radiographic findings.
The initial design of SPORT consisted of a randomized controlled trial with a concurrent observational cohort study conducted in 11 states at 13 institutions with multidisciplinary spine practices.24 The human subject committees at each participating institution approved a standardized protocol for the study.
Patients were considered for inclusion in the DS cohort of SPORT if they: were over 18 years old; had neurogenic claudication or radicular pain with associated neurologic signs for at least 12 weeks; had spinal stenosis on cross-sectional imaging; had DS identified on the standing lateral radiograph; and were considered surgical candidates by their treating physicians.23,24 Exclusion criteria included: cauda equina syndrome; malignancy; other significant deformities; prior back surgery; and other established contra-indications to elective surgery.24
All patients enrolled in the study (n=607) had lateral neutral standing and flexion-extension radiographs obtained at baseline evaluation. Because of the technical complexities and expense involved in digitizing radiographs, the neutral lateral radiographs of 222 patients were available for independent review. Of this group, 185 patients also had digitized flexion-extension radiographs available. Thus, the 222 patients included in this study represent a “convenience sample” of the entire DS cohort, including patients from 11 of the 13 sites.
The listhesis grade was quantified on neutral lateral radiographs using Meyerding’s classification.25 Anterior translation of the listhetic vertebra less than 25% of the anterior-posterior (AP) vertebral depth was classified as Grade I, and translation of 25-50% was classified as Grade II. There were no translations greater than 50%.
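As an illustration only, the grading rule above can be written as a small function. This is a minimal sketch, not the study's measurement software: the function name `meyerding_grade` and its arguments are hypothetical, and the translation and AP vertebral depth are assumed to be measured in the same units on the same radiograph.

```python
def meyerding_grade(translation: float, ap_depth: float) -> str:
    """Return the listhesis grade from anterior translation and AP vertebral depth.

    Uses the cut-offs stated in the text: less than 25% is Grade I and
    25-50% is Grade II. Slips above 50% did not occur in this sample,
    so they are only flagged with a placeholder label.
    """
    if ap_depth <= 0:
        raise ValueError("ap_depth must be positive")
    slip_pct = 100.0 * translation / ap_depth
    if slip_pct < 25.0:
        return "I"
    if slip_pct <= 50.0:
        return "II"
    return "above II"  # hypothetical label; not a category used in the study


# Example with made-up measurements: 8 units of slip on a 40-unit vertebral depth
print(meyerding_grade(8.0, 40.0))
```

Because the slip is expressed as a fraction of vertebral depth, the result does not depend on radiographic magnification.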
Disk height was measured on the lateral radiograph using Quint’s method specific for spondylolisthesis (Figure 1).26 The calculated disk height was normalized to the AP vertebral depth in order to account for differences in magnification. A patient was classified as having low disk height if the normalized disk height was less than or equal to 0.139. This stratification was based on Wilke et al. who classified a decrease in disk height by two-thirds as severe,27 and Frobin et al. who reported the average disk height at L4-L5 as 0.41 (normalized to the AP depth of L5).28
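The normalization and cut-off described above can be sketched as follows. This is a hedged illustration under stated assumptions: the disk height is taken as already measured (Quint's method itself is not reproduced here), and the names `normalized_disk_height` and `is_low_disk_height` are hypothetical.

```python
LOW_DISK_HEIGHT_THRESHOLD = 0.139  # cut-off stated in the text (normalized units)


def normalized_disk_height(disk_height: float, ap_depth: float) -> float:
    """Divide the measured disk height by the AP vertebral depth."""
    if ap_depth <= 0:
        raise ValueError("ap_depth must be positive")
    return disk_height / ap_depth


def is_low_disk_height(disk_height: float, ap_depth: float) -> bool:
    """True if the normalized disk height is at or below the stated cut-off."""
    return normalized_disk_height(disk_height, ap_depth) <= LOW_DISK_HEIGHT_THRESHOLD


# Made-up measurements: the same disk imaged at two magnifications
print(is_low_disk_height(5.0, 40.0))  # ratio 0.125
print(is_low_disk_height(6.0, 48.0))  # same ratio after 1.2x magnification
```

Both calls give the same classification, which is the purpose of normalizing to vertebral depth.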
The AP translation of the listhetic vertebra relative to the lower vertebra from the extension to flexion radiograph was determined using Quint’s method of digitizing the corners of the vertebral bodies26 and then calculating intervertebral rotation and translation using the method of Morgan and King (Figure 1).29 Intervertebral rotation was calculated as the change in intervertebral angle from extension to flexion.26 Based on Hanley’s definition of instability, patients were classified as Hypermobile if anterior translation of the affected motion segment exceeded 10% of the AP vertebral depth or if intervertebral rotation exceeded 10 degrees.30 Patients not meeting these criteria were classified as stable.
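The hypermobility rule can be expressed in the same way. The sketch below assumes the translation and rotation have already been derived from the digitized flexion and extension films (the Morgan and King calculation is not reproduced), and the function name `is_hypermobile` is hypothetical.

```python
def is_hypermobile(translation: float, ap_depth: float, rotation_deg: float) -> bool:
    """Apply the stated criteria: translation above 10% of AP vertebral depth
    or intervertebral rotation above 10 degrees.

    translation  -- anterior translation from extension to flexion (same units as ap_depth)
    rotation_deg -- change in intervertebral angle from extension to flexion, in degrees
    """
    if ap_depth <= 0:
        raise ValueError("ap_depth must be positive")
    translation_pct = 100.0 * translation / ap_depth
    return translation_pct > 10.0 or rotation_deg > 10.0


# Made-up measurements: 12.5% translation with 5 degrees of rotation
print(is_hypermobile(5.0, 40.0, 5.0))
```

Either criterion alone is sufficient, and values exactly at 10% or 10 degrees are treated as stable because the text specifies that the thresholds must be exceeded.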
We previously have reported the intra-rater reliability of the radiographic measurements employed in this study and found intra-class correlation coefficients of 0.90, 0.89, and 0.93 for translation, intervertebral angle, and disk height, respectively.31
All patients treated surgically had a decompressive laminectomy. If fusion was performed, it consisted of iliac crest bone grafting with or without instrumentation based on the surgeon’s preferences.24 The non-operative treatment group received “usual care”, recommended to include at least physical therapy, education and counseling with home exercise instruction, and non-steroidal anti-inflammatory drugs if tolerated. Details are reported elsewhere.23
Data utilized in this study were obtained from patient questionnaires completed at baseline and at one and two years after enrollment or surgery that included the SF-36,32 ODI,33 Low Back Pain Bothersomeness Scale,34 and the Stenosis Bothersomeness Index.35,36 The SF-36 scales and the ODI range from 0-100, the Stenosis Bothersomeness Index from 0-24, and the Low Back Pain Bothersomeness Scale from 0-6. Higher scores indicated more severe symptoms on the ODI, Stenosis Bothersomeness Index, and Low Back Pain Bothersomeness Scale, while higher scores indicated less severe symptoms on the SF-36.
The initial design of SPORT included both a randomized and an observational cohort. In the first two years of surveillance of the DS randomized trial, 36% of patients assigned to surgery did not have that intervention, and 49% of patients assigned to non-operative treatment did have surgery. We previously reported a comparison of the baseline characteristics between the randomized and observational cohorts.23 The only significant differences between the cohorts were a lower frequency of L3-4 involvement and lateral recess stenosis in the observational group. Furthermore, there were no significant differences in the treatment effects of surgery between the two groups. Given the high rate of treatment cross-over and the consistency of the baseline characteristics between the randomized and the observational cohorts, the data from both cohorts were combined in an as-treated analysis. The detailed statistical rationale for this strategy has been published elsewhere.37
Differences in baseline radiographic characteristics were compared for the listhesis (Grade I vs. Grade II), disk height (normal vs. low), and mobility (stable vs. hypermobile) groups using chi-square tests for categorical data and t-tests for continuous data. Comparisons of clinical baseline characteristics and outcomes were also made between the patients included in the convenience sample and those not included to evaluate whether the convenience sample was representative of the overall cohort.
The primary analyses compared changes in the clinical outcome measures from baseline as a function of the degree of listhesis, disk height, and mobility within each treatment arm (i.e. surgery or non-operative). In addition, the treatment effects of surgery were also compared between the listhesis, mobility, and disk height groups. The treatment effect of surgery was defined as the change in outcome from baseline in the surgery group minus the change in outcome from baseline in the non-operative group.
Positive treatment effects for SF-36 scores and negative treatment effects for ODI, Stenosis Bothersomeness Index, and Low Back Pain Bothersomeness Score indicated that surgery was more effective than non-operative treatment. In these analyses, the treatment indicator (surgery or non-operative) was assigned according to the actual treatment received at each time point. For surgery patients, all changes from baseline prior to surgery were included in the estimates of the effect of non-operative treatment. Following surgery, follow-up times were measured from the date of surgery.
To adjust for potential confounding, baseline variables associated with missing data or treatment received (age, sex, work status, depression, osteoporosis, joint problems, duration of current symptoms, reflex deficit, number of moderate or severe stenotic levels, medical center, and baseline SF-36, ODI and Stenosis Bothersomeness Index scores) were included as adjusting covariates in longitudinal regression models.38 A random effect was specified to account for the repeated measurements of individual patients. Statistical analysis was performed with SAS software (SAS Institute Inc, Cary, NC) using PROC MIXED for continuous data with normal random effects (BP, PF, ODI, Stenosis Bothersomeness Index) and PROC GENMOD for non-normal outcomes (Low Back Pain Bothersomeness). At each time point, adjusted mean scores were estimated, and differences between the listhesis, mobility, and disk height groups were compared using a Wald test. Statistical significance was defined as p<0.05 on the basis of a two-sided hypothesis test.
Of the 892 DS patients eligible for the study, 607 (68%) were enrolled while 285 (32%) declined to participate. The convenience sample selected from the 607 enrolled patients included 222 (37%) patients with neutral lateral digitized radiographs and 185 (30%) who also had lateral flexion-extension radiographs available for analysis. The baseline characteristics of the convenience sample patients were not significantly different from the 385 patients not included except that a lower proportion of the convenience sample had motor weakness (18% vs. 28%, p=0.008) and L3-L4 listhesis (5% vs. 12%, p=0.014). About 60% of available images were from the observational cohort and 40% from the randomized cohort. Outcome data were available for 98% of patients at 1 year and 94% of patients at 2 years.
The study population had a mean age of 66 years, and 70% were female (Table 1). Nine percent had applied for or were receiving disability compensation, and 23% were working full time. Of the 222 patients with lateral radiographs, surgery was performed in 139 patients (63%) within the first 2 years, and the remaining 83 (37%) were treated non-operatively.
Among surgical patients, 71% underwent decompression with instrumented fusion, 24% decompression with uninstrumented fusion, and 5% decompression alone. Detailed information about surgical complications has been published elsewhere.23 In the 83 patients treated non-operatively, treatment included education and counseling (88%), non-steroidal anti-inflammatory drugs (52%), narcotic pain medication (36%), physical therapy (41%), and epidural injections (36%).
Of the 222 patients, 192 (86%) had Grade I listhesis, while the remaining 30 (14%) demonstrated Grade II slips (Table 1). The Grade II group included a higher proportion of females (90% in Grade II vs. 67% in Grade I, p=0.02) and patients with low disk height (60% in Grade II vs. 16% in Grade I, p<0.001). There were no other significant differences in baseline characteristics between the Grade I and Grade II groups, nor did the type of surgical procedure performed vary by listhesis grade.
The treatment effect of surgery for the Grade II group at one year was greater than the Grade I group for bodily pain (41.2 vs. 14.5, p=0.003) and ODI (-28.0 vs -15.7, p=0.04) (Table 2). These differences resulted from the poorer outcomes in the non-operatively treated Grade II group (Grade II vs. Grade I BP -4.9 vs. +13.1, p=0.02; ODI +4.8 vs. -8.0, p=0.01). At 2 years, these differences in treatment effect were no longer significant, with a delayed improvement in BP between 1 and 2 years in the nonoperatively treated group with Grade II slips.
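The treatment effects quoted above can be related to the within-arm changes by simple subtraction. The sketch below assumes the treatment effect is the change with surgery minus the change with non-operative treatment, which is consistent with the sign convention given in the Methods. The surgical ODI changes are the one-year values reported in the abstract (Grade I -23.7, Grade II -23.3); the function name `treatment_effect` is hypothetical.

```python
def treatment_effect(change_surgery: float, change_nonoperative: float) -> float:
    """Change from baseline with surgery minus change with non-operative care."""
    return change_surgery - change_nonoperative


# One-year ODI changes reported in this paper (negative = improvement)
grade1_odi = treatment_effect(-23.7, -8.0)
grade2_odi = treatment_effect(-23.3, 4.8)

print(round(grade1_odi, 1))  # -15.7
print(round(grade2_odi, 1))  # -28.1
```

The Grade I result matches the reported treatment effect of -15.7. The Grade II result differs from the reported -28.0 by 0.1, which may reflect rounding of the model-adjusted estimates. In both cases the larger Grade II effect comes from the non-operative arm, where ODI worsened by 4.8 points rather than improving.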
Seventy-eight percent of the 222 patients had normal disk height at the affected level, while the remaining 22% were classified as having low disk height (Table 1). The low disk height group was older (69.3 years vs. 64.7 years, p=0.007) and had worse physical function scores (28.4 vs. 36.2, p=0.03), but the types of surgical procedures performed were similar for the two groups.
Among patients treated surgically, the normal disk height group improved more than the low disk height group on Stenosis Bothersomeness Index at 1 year (-9.8 vs -6.3, p=0.01), though this difference was no longer significant at 2 years (-8.5 vs -6.1, p=0.10) (Table 2). Comparison of the low disk height and normal disk height patients treated non-operatively revealed no significant differences in outcomes, and there were no significant differences in treatment effect sizes between the disk height groups.
Of the 185 patients with flexion-extension radiographs, 131 (73%) were classified as stable, while the remaining 44 (27%) demonstrated hypermobility. The hypermobile patients included fewer females than the stable group (58% vs. 75%, p=0.04), but the two groups were otherwise similar at baseline (Table 1). The hypermobile group was more likely to undergo instrumented fusion than the stable group (88% vs. 64%, p=0.03).
The hypermobile group tended to improve more with non-operative treatment than did the stable group. As a result, hypermobile patients had a smaller treatment effect of surgery. Among non-operatively treated patients, the hypermobile group improved more than the stable group on ODI (-15.2 vs -6.6, p=0.04 at 1 year) and Stenosis Bothersomeness Index (-7.8 vs. -2.7, p=0.002 at 1 year and -6.8 vs. -2.3, p=0.02 at 2 years). Consequently, the treatment effect of surgery on Stenosis Bothersomeness Index was smaller for the hypermobile group at 1 year (-0.7 vs. -6.5, p=0.007), and the difference was on the threshold of significance at 2 years (-1.5 vs. -5.8, p=0.06).
This analysis of a subset of the SPORT DS patients demonstrated only modest associations between baseline radiographic parameters and surgical and non-operative treatment outcomes. In this study, we have focused on three radiographic measurements: the degree of listhesis, disk space narrowing, and hypermobility at the affected motion segment. We did not evaluate other radiographic factors that may predispose patients to DS, such as facet orientation and segmentation abnormalities,39,40 and this study did not allow for measurement of deformity progression over time.
The effect of listhesis grade on clinical outcomes has not been carefully analyzed in DS.4,5,9 Sengupta and Herkowitz recommended instrumented fusion for the uncommon case of DS where the slip exceeds 50%, but they did not make specific recommendations for Grade I and II listhesis.6 In accordance with prior studies, we found the Grade II listhesis group included a higher proportion of women and low disk height patients than did the Grade I group.6,8 We also found the treatment effect of surgery was significantly greater for Grade II than Grade I patients at one year. These data suggest that a higher-grade slip portends a poorer outcome with non-operative treatment, at least during the first year.
Disk height was generally not associated with outcomes in either the surgically or non-operatively treated patients. Patients with low disk height were older, had baseline physical function scores indicative of more severe symptoms, and, as noted, were more likely to have Grade II slips. Some investigators have suggested that loss of disk height, although indicative of greater degeneration, could lead to “re-stabilization” and diminution of back pain over time.8,28,41 However, that same process could be associated with worsening stenotic symptoms due to advancing facet degeneration accompanying the disc space narrowing. This study does not shed light on that question.
Of the three radiographic parameters evaluated, hypermobility (referred to by some as “segmental instability”) has received the most attention in the literature. Many investigators have suggested instrumented fusion is most appropriate for hypermobile DS patients despite a lack of rigorous outcome data to support this.6,9-11,13,14 Surgeons participating in the current study appear to have been influenced by this recommendation, insofar as instrumented fusion was more commonly performed in patients with increased angular and translational movements.
Our mobility analyses did yield some unexpected results. First, the hypermobile group had a significantly lower proportion of females compared to the stable group, a finding contrary to the traditional belief that women are more likely to have segmental hypermobility than men. However, our findings are consistent with those of McGregor et al who found that women had a smaller total lumbar flexion-extension range of motion compared to men.42 Second, radiographically hypermobile patients had better non-operative treatment results than stable patients. Comparison of this finding to prior results is limited since no prior DS study has compared surgical and non-operative outcomes stratified by baseline radiographic mobility. In a small study that combined spinal stenosis and DS patients, Yone et al demonstrated that unstable patients undergoing decompression with instrumented fusion had better results than unstable patients undergoing decompression alone.13,14 In the current study, we were unable to compare surgical outcomes among hypermobile patients who underwent decompression alone, decompression with uninstrumented fusion or decompression with instrumented fusion because of the small sample sizes.
This unexpected finding does bring into sharper focus the long-unanswered question, “What constitutes instability in patients with degenerative diseases?” Traditionally, DS has been viewed as the prototype for degenerative instabilities because it was the condition most associated with the four criteria for instability at a motion segment: pain, translational and rotational hypermobility, risk for progressive deformity, and the potential for neurologic insult.21,43,44 Because we have not obtained follow-up radiographs, we cannot be certain if the deformity increased or if progressive listhesis was associated with increased pain. However, we do know that symptoms decreased over two years with non-operative treatment, particularly in those patients with stringently defined radiographic hypermobility. While surgery resulted in better outcomes for both hypermobile and stable patients, hypermobility should not be considered a contra-indication to non-operative treatment. Given that hypermobile patients improved with non-operative treatment, the need for fusion in these patients, particularly those with significant medical comorbidities, should be further explored.
The current study has some limitations. Although all patients enrolled in the study had spinal radiographs, we were able, for logistic and economic reasons, to digitize and analyze images from only about one third of the patients. Comparison of baseline characteristics of the patients with available images to those not included suggested that the two groups were very similar, so we felt that the convenience sample was fairly representative. Another issue was the relatively small proportion of patients that were classified as having Grade II listhesis, hypermobility or low disk height. These relatively small subgroups may have limited our power to detect significant differences.
A statistical limitation related to the overall SPORT study has been the substantial cross-over between treatment arms.23 In our prior publications, we have addressed this concern and the rationale behind the decision to perform an as-treated analysis using multiple regression models to control for baseline differences.23,37 Despite this approach, it is possible the current analysis could be vulnerable to residual confounding by unmeasured variables.
Similar to prior spinal radiographic studies, the criteria for classifying listhesis, disk height, and mobility were somewhat arbitrary, although they were based on literature standards.25,27,30 The greatest continuing controversy is the radiographic definition of instability.21,43,45-49 Some authors have suggested that there is substantial overlap in radiographic findings between “unstable” and “normal” subjects.50,51 For this reason, we chose a stringent radiographic definition of hypermobility. We also performed analyses in which the listhesis, disk height, and mobility groups were defined by the median value for each characteristic and found similar results (data not shown). The more perplexing problem is that anterolisthesis can be found in 10% of females over age 65, yet the majority of patients are asymptomatic.52,53 In addition, radiographic signs of “instability” do not correlate with symptoms.45,54 SPORT utilized strict clinical and radiographic inclusion criteria, and we wish to emphasize that our findings do not apply to patients without both clinical signs and symptoms of spinal stenosis and associated radiographic abnormalities.
Finally, obtaining reliable radiographic measurements can be difficult.55 Given that the intra-class correlation coefficients for the measurements used in this study ranged from 0.89 to 0.93, we feel that the measurement technique and classification system were reasonably reliable.31
Despite its limitations, the current study has identified associations between baseline radiographic findings and outcomes in DS patients which should be useful to clinicians caring for these patients. Patients with Grade II listhesis had a greater treatment effect of surgery compared to Grade I patients at 1 year, suggesting that surgery is more strongly favored in patients with higher grade slips. The other significant and unexpected finding was that hypermobile patients had better non-operative outcomes than did stable patients. This indicates that hypermobility should not be considered a contra-indication to non-operative treatment, and leaves open the questions “Who benefits most from fusion?” and “Which patients require instrumented fusion?”
The authors would like to acknowledge funding from the following sources: The National Institute of Arthritis and Musculoskeletal and Skin Diseases (U01-AR45444-01A1) and the Office of Research on Women’s Health, the National Institutes of Health, and the National Institute of Occupational Safety and Health, the Centers for Disease Control and Prevention. The Multidisciplinary Clinical Research Center in Musculoskeletal Diseases is funded by NIAMS (P60-AR048094-01A1). Dr. Pearson was funded by NIAMS (T32-AR-049710). Dr. Lurie received support from a Research Career Award from NIAMS (1 K23 AR 048138-01).