Clinical trial sub-group analysis
To compare outcomes of different fusion techniques treating degenerative spondylolisthesis (DS).
Surgical candidates from 13 centers in 11 states with at least 12 weeks of symptoms and confirmatory imaging showing stenosis and DS were studied. In addition to standard decompressive laminectomy, one of three fusion techniques was employed at the surgeon’s discretion: posterolateral in situ fusion (PLF); posterolateral instrumented fusion with pedicle screws (PPS); or PPS plus interbody fusion (360°). Main outcome measures were the SF-36 Bodily Pain (BP) and Physical Function (PF) scales and the modified Oswestry Disability Index (ODI) assessed at 6 weeks, 3 months, 6 months, and yearly to 4 years. The as-treated analysis combined the randomized and observational cohorts using mixed longitudinal models adjusting for potential confounders.
Of 380 surgical patients, 21% (N=80) received a PLF; 56% (N=213) received a PPS; 17% (N=63) received a 360°; and 6% (N=23) had decompression only without fusion. Early outcomes varied, favoring PLF compared to PPS at 6 weeks (PF: 12.73 vs. 6.22, p<0.020) and 3 months (PF: 25.24 vs. 18.95, p<0.025) and PPS compared to 360° at 6 weeks (ODI: −14.46 vs. −9.30, p<0.03) and 3 months (ODI: −22.30 vs. −16.78, p<0.02). At two years, 360° had better outcomes: BP: 39.08 vs. 29.17 PLF, p<0.011; and vs. 29.13 PPS, p<0.002; PF: 31.93 vs. 23.27 PLF, p<0.021; and vs. 25.29 PPS, p<0.036. However, these differences were not maintained at 3- and 4-year follow-up, when there were no statistically significant differences between the three fusion groups.
In patients with degenerative spondylolisthesis and associated spinal stenosis, no consistent differences in clinical outcomes were seen among fusion groups over four years.
Lumbar fusion rates increased dramatically during the 1980s, and accelerated further in the 1990s. Medicare spending for back surgery more than doubled over the decade, with lumbar fusion spending increasing more than 500% to $482 million. In 1992, lumbar fusion represented 14% of total spending for back surgery; by 2003, lumbar fusion accounted for 47% of spending.1 While overall rates and cost of spine fusion have increased, there remains little evidence of substantial improvement in patient functional outcomes.
Degenerative spondylolisthesis (DS) is one of the most common conditions for which surgery is performed in the US.1,3 In SPORT, 1,4 as-treated comparisons with careful control for potentially confounding baseline factors showed that patients with spinal stenosis and associated DS treated surgically had substantially greater improvement in pain and function during a period of 4 years than did patients treated non-operatively. Despite the evidence that surgically treated patients fare better, questions remain about which surgical fusion treatment is best. 5-23
In 1991, Herkowitz et al evaluated 50 patients and concluded that posterolateral fusion provided a significant improvement in relief of back and lower limb pain, and that pseudarthrosis did not preclude a successful result. 13 In a follow up study, higher radiographic fusion rates were seen with pedicle screw instrumentation but clinical outcomes were no different. 9 Long term follow-up of patients with uninstrumented fusions showed that patients with pseudarthrosis had worse outcomes than those with solid fusion;17 however, there was no control group and so the role of instrumentation in improving clinical outcomes remains unclear.
Despite increasing efforts to establish a solid fusion, the Cochrane review on fusion for a variety of degenerative conditions of the lumbar spine determined that no conclusions are possible about the relative effectiveness of various fusion procedures (anterior, posterior, or circumferential).11 A review of treatment for DS specifically suggested that spinal fusion may lead to better clinical outcomes, though conclusions about the benefits of instrumentation could not be made. 19
In SPORT, whether and how to fuse patients with degenerative spondylolisthesis was optional, based on surgeon/patient preferences. This paper explores the relative outcomes of three different fusion techniques utilized in SPORT DS patients.
SPORT was conducted in 11 U.S. states at 13 medical centers, and included both a randomized (RCT) and a concurrent observational (OBS) cohort with identical selection criteria and outcomes assessment. Additional background information is available in previous publications. 1,24-26 This report is a sub-group analysis of fusion methods using the combined RCT and OBS cohorts with degenerative spondylolisthesis.
All patients had the following: neurogenic claudication or radicular leg pain with associated neurological signs; cross-sectional imaging showing spinal stenosis; degenerative spondylolisthesis on standing lateral radiographs; persistent symptoms for at least twelve weeks; and physician confirmation as a surgical candidate. Patients with adjacent levels of stenosis were eligible; patients with spondylolysis and isthmic spondylolisthesis were not. Enrollment began March, 2000 and ended February, 2005.
Patients were either treated non-operatively or with surgery. The surgeries were classified into the following groups: 1) decompressive laminectomy only; 2) decompression with posterolateral in situ fusion (PLF); 3) decompression with instrumented posterolateral fusion with pedicle screws (PPS); and 4) decompression with interbody fusion plus instrumented posterolateral fusion with pedicle screws (360°).
Main endpoints were the SF-36 Bodily Pain (BP) and Physical Function (PF) scales,27-30 and the AAOS/Modems version of the Oswestry Disability Index (ODI)31 measured at 6 weeks, 3 months, 6 months, and yearly out to four years. Additional outcomes included patient self-reported improvement; satisfaction with current symptoms and care;32 and the stenosis bothersomeness index.28,33 Treatment effect was defined as the difference in the mean changes from baseline between the three fusion groups.
SF-36 scores were scaled to range from 0 to 100, with higher scores indicating less severe symptoms; the standard scoring for the ODI was also scaled to range from 0 to 100, but with lower scores indicating less severe symptoms; the Stenosis Bothersomeness Index ranges from 0 to 24, with lower scores indicating less severe symptoms; and the Low Back Pain Bothersomeness Scale ranges from 0 to 6, with lower scores indicating less severe symptoms.28,33 For measures with higher values indicating better outcomes (i.e., BP, PF), a positive change in score reflects improvement, while for those measures for which lower values indicate better outcomes (i.e., ODI and the Bothersomeness scales), negative changes in scores reflect improvement.
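The sign conventions above are easy to misread when comparing scales, so the following is a minimal Python sketch of how change from baseline and the between-group treatment effect could be computed and interpreted. The function and dictionary names are illustrative assumptions and are not part of the SPORT analysis code.

```python
# Direction of improvement for each scale, as described in the text.
# The dictionary and function names are hypothetical, for illustration only.
HIGHER_IS_BETTER = {
    "SF-36 BP": True,
    "SF-36 PF": True,
    "ODI": False,
    "Stenosis Bothersomeness Index": False,
    "Low Back Pain Bothersomeness Scale": False,
}


def change_from_baseline(baseline: float, follow_up: float) -> float:
    """Raw change in score: follow-up minus baseline."""
    return follow_up - baseline


def is_improvement(scale: str, change: float) -> bool:
    """True when the change moves in the 'better' direction for this scale."""
    if change == 0:
        return False
    return (change > 0) == HIGHER_IS_BETTER[scale]


def treatment_effect(mean_change_a: float, mean_change_b: float) -> float:
    """Difference in mean change from baseline between two fusion groups."""
    return mean_change_a - mean_change_b
```

For example, the reported 6-week ODI change of −14.46 counts as an improvement, whereas the same negative change on SF-36 PF would indicate worsening.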
As part of routine clinical care, the majority of patients undergoing surgery had imaging studies done at one- and two-year follow-up. The treating surgeons were asked to evaluate the patients’ fusion status based on all available information. Fusion status was rated as Solid Fusion, Pseudoarthrosis, or Unclear based on the surgeon’s overall impression; no specific radiographic protocol was used. Surgeons were also asked to record whether any testing in addition to plain radiographs was used to assess fusion status.
Statistical methods for the analysis of this trial have been reported in previous publications, 1,34-36 and are summarized here. Initial analyses compared the baseline characteristics of three fusion groups. The extent of missing data and the percentage of patients undergoing surgery were calculated according to study group for each scheduled follow-up. Baseline predictors of time until surgical treatment were determined through a stepwise proportional-hazards regression model with inclusion criteria of p < 0.1 to enter and p > 0.05 to exit. Predictors of adherence to treatment and missing follow-up visits at 1, 2, 3 and 4 years were determined through stepwise logistic regression. Primary analyses evaluated changes from baseline at each follow-up visit, with a mixed effects model of longitudinal regression that included a random individual effect to account for correlation between repeated measurements.
Repeated measures of outcomes were used as the dependent variables, and treatment received was included as a time-varying covariate. Adjustments were made for post-surgical visit times with respect to time of surgery to better approximate the designated follow-up times. Although the focus of this study was to evaluate for differences in outcomes across the three fusion surgical groups, the nature of the experimental design and analysis approach dictated that the overall analysis include all patients, both operative and non-operative, to ensure the best possible estimates of outcome scores across the follow-up interval. Therefore, the fundamental questions of interest regarding differences in outcomes among the three surgical fusion protocols were evaluated by constructing pre-planned contrasts that tested the overall differences in change from baseline between the three fusion groups both overall and at each time of assessment.
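One way to write down a model of the kind described above is sketched below. The notation is our own assumption for illustration; the exact parameterization fitted in SPORT is given only in the cited methods publications, and none of these symbols appear in the original report.

```latex
% Illustrative notation only (assumed, not taken from the SPORT publications).
% y_{ij}  : outcome score for patient i at follow-up visit j
% y_{i0}  : baseline score for patient i
% g_{ij}  : treatment received by patient i as of visit j (time-varying)
% x_i     : vector of baseline covariates used for adjustment
% b_i     : random patient effect capturing within-patient correlation
\begin{align}
  y_{ij} - y_{i0} &= \mu_{t_j,\, g_{ij}} + x_i^{\top}\gamma + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2) \\
  \mathrm{TE}_{t}(A, B) &= \mu_{t, A} - \mu_{t, B}
\end{align}
```

Under this sketch, a pre-planned contrast at follow-up time t is a comparison of the adjusted mean changes for two fusion groups A and B, and the overall test across the three groups asks whether all such contrasts are zero.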
We evaluated two basic research questions: 1) Do the studied treatments result in improvement over pre-treatment health status? and 2) Do different fusion approaches result in different patterns of change across the follow-up interval? With regard to Question 1, tests for significant change from baseline were evaluated for all outcomes at each follow-up point. With regard to Question 2, tests for differences in change from baseline across follow-up intervals were evaluated for each treatment group; and then tests for differences between treatment groups were evaluated at each follow-up time. This was done in a hierarchical fashion. At each assessment interval the first step was to test for any differences among the three fusion groups. In order to increase power (since this study was not designed to compare differences in fusion technique), the tolerance for making a Type I error was relaxed by setting the threshold at 0.10. If the null hypothesis (H0: PLF = PPS = 360°) was rejected at p < 0.10, then the next step was to evaluate for differences between the three groups based on pair-wise comparisons. For these comparisons Type I error was set at 0.05. Since these comparisons represent the most basic approach for evaluating for treatment differences, they were considered planned comparisons and, as such, no adjustments were made to control for inflated Type I error rates due to multiple comparisons.
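The two-step decision rule can be sketched in a few lines of Python. This is an illustration of the gating logic only, under the thresholds stated above; the function name and data layout are assumptions, and the p-values themselves would come from the fitted longitudinal model, not from this code.

```python
OMNIBUS_ALPHA = 0.10   # relaxed threshold for the three-group test
PAIRWISE_ALPHA = 0.05  # threshold for each pair-wise comparison


def hierarchical_comparison(omnibus_p, pairwise_p):
    """Return the group pairs judged significantly different at one time point.

    omnibus_p  : p-value for the overall test of PLF = PPS = 360°
    pairwise_p : dict mapping a (group, group) tuple to its pair-wise p-value

    Pair-wise tests are only examined when the omnibus test is rejected,
    and no multiplicity adjustment is applied, mirroring the text.
    """
    if omnibus_p >= OMNIBUS_ALPHA:
        return []
    return sorted(pair for pair, p in pairwise_p.items() if p < PAIRWISE_ALPHA)


# SF-36 BP at 2 years, using the p-values reported in the Results
# (the overall test is reported only as p < 0.008, so 0.008 is an upper bound;
# the PLF vs. PPS p-value is not reported numerically and is omitted here).
bp_two_years = hierarchical_comparison(
    0.008,
    {("360°", "PLF"): 0.01, ("360°", "PPS"): 0.003},
)
```

With these inputs both 360° comparisons pass the second step, whereas at 3 years (overall p < 0.79) the procedure stops at the first step and no pair-wise tests are examined.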
Computations were performed with the use of the PROC MIXED procedure for continuous data and the PROC GENMOD procedure for binary and non-normal secondary outcomes in SAS software, version 9.1 (SAS Institute, Cary, NC). Data for these analyses were collected through May 1, 2008.
607 participants enrolled in the DS SPORT trial (304 in the randomized cohort and 303 in the observational cohort). Of these, 34.9% (212) were non-operative patients and 395 were treated surgically. Of the 395 surgical cases, 380 (96%) had surgical descriptive data and at least one follow up: 21% (N=80/380) had a PLF; 56% (N=213/380) had a PPS; 17% (N=63/380) had a 360°; and 6% (N=23/380) had a decompressive laminectomy only. In the 360° fusion group, 35% underwent an anterior-posterior procedure while 65% underwent a posterior procedure including PLIF and TLIF. These were not independently analyzed due to the small size of these sub-sub-groups. Given the small size of the decompression-only group, they are not considered in this analysis. The proportion of enrollees who supplied data at each follow-up interval ranged from 70% to 99% with losses due to dropouts, missed visits, or deaths (Figure 1).
Table 1 summarizes baseline characteristics of the three fusion groups. Statistically significant differences were seen between the groups in age; race; work status; osteoporosis; neurologic deficits; and stenosis level, location and severity (Table 1). Several significant baseline differences appear to be driven by the 360° group. Compared to the other fusion groups the 360° group was: younger; more likely to be working; less likely to report osteoporosis; had lower rates of stenosis at L3-4; less severe stenosis; less central stenosis; and had lower scores on the SF-36 mental component summary scale. There were no other significant differences in baseline characteristics or functional health status between groups.
These observations highlight the need to control for baseline differences in the adjusted models. Based on the selection procedure for variables associated with treatment, missing data and outcomes, the final as-treated models controlled for the following covariates: age; gender; BMI; compensation status; depression; joint problems; hypertension; current symptom duration; number of moderate/severe stenotic levels; baseline stenosis bothersomeness; enrollment center; and baseline score for each outcome.
The mean surgical times for the three fusion groups ranged from 157 to 274 minutes, PLF having the shortest time and 360° the longest (Table 2). Mean estimated blood loss ranged from 499 cc to 666 cc and was lowest for PLF and highest for PPS. Intra-operative blood replacement was lowest in PLF, but the difference did not reach statistical significance (p=0.098); however, there was a difference in the postoperative transfusion rates (14% for PLF versus 26% for PPS and 17% for 360°, p=0.05). The most common surgical complication was dural tear, which was most frequent in PPS (12% versus 9% PLF and 2% 360°, p=0.047). This may reflect the smaller number of operative levels and the less severe stenosis in the 360° group. The 4-year re-operation rate did not significantly differ across the three groups (p=0.27).
Over four years, there were 16 documented deaths across the three fusion groups (Figure 1): 8 PLF, 8 PPS, and 0 360°, compared to expected numbers of 7, 16, and 3, respectively, based on age-specific mortality rates for the general population.7 A Cox model comparing the three fusion treatment mortality rates, adjusting for patient age, gender and wait time for surgery, was not statistically significant (Wald = 2.71, p = 0.259). However, the hazard ratio for PLF referenced to PPS was 2.30 (95% CI 0.85 to 6.17), which would be clinically significant; this result approached statistical significance at p < 0.10 with a 90% CI of 1.001 to 5.265. All 16 deaths were independently reviewed: 12 were judged not to be treatment-related, 2 were of unknown cause, and 2 were judged as potentially related to treatment. Of these 2 potentially related deaths, 1 was in the PLF group and occurred 32 days after surgery due to respiratory distress; the other, in the PPS group, occurred 82 days after surgery due to sepsis.
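The relationship between the 95% and 90% intervals quoted for the hazard ratio can be checked approximately. The Python sketch below assumes a Wald-type interval that is symmetric on the log hazard scale, which is a common default for Cox models but is our assumption here; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def rescale_ci(point, lower95, upper95, level=0.90):
    """Approximate a confidence interval at another level.

    Assumes the reported 95% interval is a Wald interval on the log scale,
    so the standard error can be recovered from its width.
    """
    z95 = NormalDist().inv_cdf(0.975)
    z_new = NormalDist().inv_cdf(0.5 + level / 2)
    se = (math.log(upper95) - math.log(lower95)) / (2 * z95)
    log_point = math.log(point)
    return math.exp(log_point - z_new * se), math.exp(log_point + z_new * se)


# Hazard ratio for PLF referenced to PPS, as reported in the text.
lower90, upper90 = rescale_ci(2.30, 0.85, 6.17)
```

Under this assumption the recomputed 90% interval lands close to the reported 1.001 to 5.265, with a lower bound only marginally above 1; small differences are expected because the published values are rounded.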
All three fusion groups demonstrated significant changes compared to baseline in all primary outcomes (BP, PF and ODI) out to four years (Table 3). The patterns of change are depicted in Figure 2. Overall, there were some varying differences between groups during the early time periods and no significant differences between any of the groups in later time periods.
For SF-36 BP, the groups were similar at the early time points, though at 1-year there was a trend toward a difference between the three groups overall (p<0.10) with the 360° fusion demonstrating a slightly larger improvement than PLF (38.99 vs. 30.7; p=0.04) and PPS (38.99 vs. 32.32; p=0.06) in pair-wise comparisons. At 2-years the groups were significantly different (p<0.008), with 360° having significantly better outcomes than PLF (39.08 vs. 29.17; p=0.01) and PPS (39.08 vs 29.13; p=0.003); however, no significant differences were seen between fusion types at 3 years (p<0.79) or 4 years (p<0.74). The outcomes of PLF and PPS were similar in all pair-wise comparisons.
For SF-36 PF, there were trends toward early differences at 6 weeks (p<0.07) and 3 months (p<0.08), slightly favoring PLF over PPS (6 weeks: 12.73 vs. 6.22, p=0.02; 3 months: 25.24 vs. 18.95, p=0.03). Pair-wise differences between PLF and 360°, or PPS and 360°, were not significant at these early time points. No differences between the groups were seen at 1 year, but the 360° group had better outcomes at 2 years compared to both PLF (31.93 vs. 23.27; p=0.02) and PPS (31.93 vs. 25.29; p=0.04). There were no significant differences between the groups at 3 or 4 years, but there was a trend toward worse outcomes in PPS at 4 years.
For ODI, differences between the three fusion groups were observed at 6 weeks (p<0.10), 3 months (p<0.042), and 1 year (p<0.036). In pair-wise analysis, PPS demonstrated significantly greater improvement than 360° at 6 weeks (−14.46 vs. −9.30, p < 0.03) and 3 months (−22.30 vs. −16.78, p < 0.02). At 1 year, PLF was worse than PPS (−20.92 vs. −26.33, p < 0.02) and 360° (−20.92 vs. −27.61, p < 0.03). Again, no significant differences were seen between any of the groups at 3 and 4 years.
The Stenosis Bothersomeness Scale revealed no statistically significant differences between PLF and 360°, and slightly worse outcomes in PPS compared to 360° that were statistically significant at 2 years (p=0.009) but not at other time points. Back pain bothersomeness showed a similar pattern with somewhat worse outcomes in PPS compared to 360° at 2 and 3 years but not at other time points (Figure 3). There were no significant differences across fusion groups in satisfaction with symptoms or care at any of the 5 follow-up time intervals (data not shown).
Fusion status classifications were reported for 74% (282/380) of the cases. Of the 282 fusion classifications, 89.7% were classified based on plain radiographs only, 3.9% indicated that classification included CT, and the remaining 6.4% indicated that classification included some “other” method.
As illustrated in Table 4, solid fusion was the predominant classification. However, the solid fusion ratings differed significantly across the three treatment approaches (χ² = 10.69, p < 0.005). Follow-up tests using logistic regression methods revealed that the PLF group had a significantly lower solid fusion rate (67.24%) than both the PPS (85.29%, p < 0.004) and the 360° (87.04%, p < 0.017) groups. The difference in solid fusion rates between the two instrumented approaches was not significant (85.29% vs. 87.04%, p < 0.75).
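The overall comparison of solid fusion rates can be illustrated with a short Python sketch. Table 4 is not reproduced here, so the counts below are back-calculated by us from the reported percentages and the 282 rated cases (39/58, 145/170 and 47/54); they are assumptions, as is the choice of a Pearson chi-square test on solid versus not-solid ratings with 2 degrees of freedom.

```python
import math

# Assumed counts, reconstructed from the reported percentages:
# 39/58 = 67.24% (PLF), 145/170 = 85.29% (PPS), 47/54 = 87.04% (360°).
solid = {"PLF": 39, "PPS": 145, "360°": 47}
rated = {"PLF": 58, "PPS": 170, "360°": 54}


def pearson_chi_square(successes, totals):
    """Pearson chi-square statistic for a groups-by-two contingency table."""
    total_n = sum(totals.values())
    total_success = sum(successes.values())
    statistic = 0.0
    for group, n in totals.items():
        cells = (
            (successes[group], total_success),                # solid fusion
            (n - successes[group], total_n - total_success),  # not solid
        )
        for observed, column_total in cells:
            expected = n * column_total / total_n
            statistic += (observed - expected) ** 2 / expected
    return statistic


chi2 = pearson_chi_square(solid, rated)
# With 2 degrees of freedom the upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)
```

With these reconstructed counts the statistic comes out close to the reported 10.69 and the p-value falls below 0.005, which suggests the reported test compared solid against all other ratings.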
The rationale for surgical treatment of degenerative spondylolisthesis is two-fold. The primary goal is decompression of the neural structures to relieve the symptoms of neurogenic claudication via laminectomy. Fusion is performed to prevent potential further slippage of the vertebrae and to stabilize the associated degenerative disc and arthritic facets for improvement in or prevention of back pain and possible instability. Traditional factors favoring fusion include: improved spine stability; minimization of long-term back pain from the operated degenerative levels; and concern for recurrent leg pain from progression of the spondylolisthesis in the absence of fusion. Radiographic rates of solid fusion are improved when instrumentation is used, however, many studies have demonstrated a lack of benefit from instrumentation in terms of patient-oriented outcomes. 11,19 Concern for adjacent segment degeneration or facet violation from instrumentation and the potential for increased operative and peri-operative complications must be considered with increased surgical complexity.37,38
In patients presenting with image-confirmed degenerative spondylolisthesis with spinal stenosis, signs and symptoms of which had persisted for at least twelve weeks, the treatment effects did not consistently demonstrate one fusion procedure to be better than any other. On some measures and at some time points PLF was somewhat better, and on others 360° was better. There was very little to suggest any advantage for PPS based on the outcomes studied here. As with previous studies, we did find a higher rate of solid fusion on imaging in the groups with instrumented procedures. We found a 67% solid fusion rate in the PLF group, similar to the 64% seen by Herkowitz et al. 13 and better than the 45% solid fusion rate seen by Fischgrund et al. 9 However, these determinations were largely based on surgeon impressions from plain radiographs rather than a fixed protocol and, therefore, the reliability and validity of these assessments are unclear. Similar to these prior studies, however, the difference in radiographic fusion rate did not seem to affect the short-term clinical outcomes.
Although SPORT was not specifically designed to study these fusion subgroups, they do represent the largest cohort of DS patients studied to date and the only report in DS comparing the three common fusion methods. Furthermore, the results of this study are strengthened by use of specific inclusion and exclusion criteria, the overall sample size, and adjustment for potentially confounding baseline factors. However, the current study does have several limitations. It represents a subgroup analysis and not the a priori hypothesis for which SPORT was designed. These cases were not randomized to treatment groups, and radiographic fusion was assessed predominantly from plain radiographs rather than formally by CT or fusion exploration. Although these data suggest that fusion method does not influence outcome out to 4 years, further study with appropriate methodological design is necessary to properly answer the questions of clinical outcome, risk, cost effectiveness, and benefit of each of these fusion techniques. In addition, these results may not extrapolate to clinical outcomes for spine fusions performed for diagnoses other than DS.
We are aware of only one other study comparing different fusion techniques specifically in patients with DS. Fischgrund et al. randomized 66 patients undergoing decompressive laminectomy to posterolateral in situ fusion or instrumented posterolateral fusion with pedicle screws, similar to the PLF and PPS groups in this study. 9 They found a higher pseudoarthrosis rate in the PLF group but no difference in clinical outcomes at 2-year follow-up, similar to our findings.
Several comparative studies of these different fusion techniques have been performed in other lumbar conditions. Andersen et al randomized patients with a variety of degenerative conditions (but not degenerative spondylolisthesis) to posterolateral fusion with or without pedicle screw instrumentation (PLF vs. PPS) and found no significant differences in pain outcomes at 5 years, similar to our results in DS. 39 Similarly, the Swedish Lumbar Spine Study Group randomized 222 patients with degenerative low back pain (not DS) to PLF, PPS, or 360° fusion and 72 to a nonsurgical group. The clinical outcomes were similar across the 3 fusion groups at 2-years. 40 Also, Swan et al. compared instrumented posterolateral fusion with circumferential fusion in patients with radiographically unstable isthmic spondylolisthesis in a non-randomized prospective cohort study and found significantly better outcomes with 360° fusion at 6 months and 1 year, but results became similar between the groups at 2 years. 41
Our results only go out to 4 years, and there is the possibility that differences between the groups could emerge with longer-term follow-up. Kornblum et al. followed DS patients after PLF and found that at long-term follow-up (5-14 years, average 7 years 8 months) patients with pseudarthrosis reported worse clinical outcomes than those with solid arthrodesis.17 However, this case series of patients with posterolateral in situ fusion did not have a control group who underwent instrumented fusion and thus does not shed any direct light on the relative outcomes between different fusion approaches. Videbaek et al. reported the long-term follow-up of patients with a variety of degenerative lumbar conditions (although not DS) randomized to PPS or 360° fusion. 42 Of note, they found no differences between the two groups at 2 years but found significantly better results in the 360° group at 5-9 years. This highlights the importance of ongoing follow-up in our study.
There was little evidence of harm from any of the fusion treatments. Over 4 years there have not been any cases of paralysis in any of the treatment groups. The 2-year re-operation rates in each of the three fusion groups were similar to those seen in the Swedish Lumbar Spine Study for PLF (14% vs. 12%) but lower in PPS (11% vs. 22%) and 360° (9% vs. 17%), 40 and somewhat higher than those in the study by Fischgrund et al. (PLF [14% vs. 6%] and PPS [11% vs. 8%]).9 The overall complication rates were slightly higher than those in the Swedish Study for PLF (22% vs. 12%) and PPS (38% vs. 22%) but similar for 360° (25% vs. 25%). 40 Overall perioperative mortality was 0.05%, which is less than the 1.3% seen in Medicare patients after fusion surgery for spondylolisthesis. 7 The 4-year mortality rate was similar across all fusion groups and was lower than actuarial projections, suggesting the likely selection of healthier than average patients for surgery.
Patients with degenerative spondylolisthesis and associated spinal stenosis are commonly treated by a combined procedure of decompression and fusion. Results out to four years suggest no significant advantage of one fusion technique over another on clinical outcomes, though longer-term follow-up may be needed. The fusion techniques were not randomly assigned and selection bias may have affected these results; a more definitive study would require random allocation into the various surgical approaches.
The authors would like to acknowledge funding from the following sources: The National Institute of Arthritis and Musculoskeletal and Skin Diseases (U01-AR45444) and the Office of Research on Women’s Health, the National Institutes of Health, and the National Institute of Occupational Safety and Health, the Centers for Disease Control and Prevention.
Trial Registration: Spine Patient Outcomes Research Trial (SPORT): Degenerative Spondylolisthesis with Spinal Stenosis; #NCT00000409; http://www.clinicaltrials.gov/ct/show/NCT00000409?order=22