The Spine Patient Outcomes Research Trial (SPORT) was designed to assess the relative efficacy and cost-effectiveness of surgical and non-surgical approaches to the treatment of common conditions associated with low back and leg pain.
To describe the rationale and design of the SPORT project and discuss its strengths and limitations.
First, we explain the rationale for embarking on SPORT, i.e. deficiencies in the existing scientific knowledge base for treatment of these conditions. Second, we describe the design of SPORT, including topics such as: specific aims; participating sites; study population; recruitment and enrollment; study interventions; follow-up; outcomes; statistical analysis; and study governance and organization. Finally, we discuss issues that complicate the performance of randomized trials in surgery as they relate to the design and conduct of SPORT.
The SPORT project is being conducted at 11 clinical centers around the United States. It involves the simultaneous conduct of three multi-center, randomized, controlled clinical trials. The study includes patients with the three most common diagnoses for which spine surgery is performed: intervertebral disc herniation (IDH), spinal stenosis (SpS) and degenerative spondylolisthesis (DS), and compares the most commonly used standard surgical and non-surgical treatments for patients with these diagnoses. By the end of enrollment we anticipate a total of 500 IDH, 370 SpS, and 300 DS patients in the randomized trials. Patients who meet the eligibility criteria but decline to be randomized are invited to participate in an observational cohort study. Patients are being followed for a minimum of 24 months with visits scheduled at 6 weeks, 3, 6, 12, and 24 months.
The results of this study will provide high-quality scientific evidence to aid clinical decision making and improve treatment outcomes for these common, costly, and, in some instances, debilitating conditions.
Few areas of clinical medicine are as controversial as the management of conditions associated with low back pain, as evidenced by wide variability in the use of spine surgery both nationally and internationally.14,52 Overall rates of spine surgery in the U.S. have increased dramatically in recent years.2 Between 1988 and 1997, spine surgery rates among U.S. Medicare enrollees grew by 57%, from 2.1 to 3.4 per 1,000 population.2 Nearly half a million cases of spine surgery were performed in the U.S. in 1999.3 Although only about 10% of patients with low back pain have the conditions that are the focus of SPORT (intervertebral disc herniation (IDH), spinal stenosis (SpS) and degenerative spondylolisthesis (DS)),15 these conditions are especially important to study because they account for a large proportion of the spine surgery that is performed in the U.S. each year.15
The lack of clinical consensus regarding the use of spine surgery can be traced to deficiencies in the scientific data that underlie indications for its use.16,17 There are a few cohort studies comparing surgical and non-surgical treatment for patients with the conditions that are the subject of SPORT.9–12,23,26,36,38,42 However, scientific flaws inherent to cohort studies, especially large baseline differences in the treatment groups resulting from selection bias, preclude definitive conclusions regarding the efficacy of surgical and non-surgical treatment from these data. There is also one randomized, controlled study of surgical and non-surgical treatment for patients with herniated disc.49 This study also had some important flaws. In particular, the primary measure of patient outcomes was the investigator’s categorization of patients’ verbal assessments of the results of treatment. The extent to which this measure correlates with patients’ self-evaluations of clinical, functional, or satisfaction outcomes is unknown. In addition, both surgical and non-surgical treatments have changed dramatically since the time the study was conducted (patients enrolled from 1970 to 1971). The rest of the literature on the treatment of these conditions is based on uncontrolled case series. While such studies may be valuable as an initial assessment of treatment safety, they are not adequate to demonstrate effectiveness. This is particularly true for conditions such as these where spontaneous improvement may result in biased estimates from uncontrolled studies.
Research addressing the causes and treatment of spinal conditions has been given high priority because of the prevalence of these conditions and their associated morbidity, costs, and clinical controversy. The Spine Patient Outcomes Research Trial (SPORT) project obtained funding from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) of the National Institutes of Health (NIH). The purpose of this paper is to describe the design of SPORT (Figure 1) and discuss some of its strengths and limitations.
SPORT has three specific aims:
The 11 clinical centers that recruit patients into SPORT are: Dartmouth-Hitchcock Medical Center, Hanover, NH; Rothman Institute at Thomas Jefferson Hospital, Philadelphia, PA; Rush-Presbyterian-St. Luke’s Medical Center, Chicago, IL; Emory University Medical Center, Decatur, GA; University Hospitals of Cleveland/Case-Western Reserve, Cleveland, OH; New York University/Hospital for Joint Diseases, New York, NY; William Beaumont Hospital, Royal Oak, MI; University of California Medical Center-San Francisco, San Francisco, CA; Washington University Medical Center, St. Louis, MO; Nebraska Foundation for Spinal Research, Omaha, NE; and the Hospital for Special Surgery, New York, NY. Each of these sites is a member of the National Spine Network. This non-profit organization was formed in 1994 to develop and foster high quality and cost-effective care for its members’ patients. To this end, the organization developed an infrastructure for collecting outcomes data and a registry of patients with spinal conditions that provided the pilot data for the SPORT proposal.
SPORT inclusion (Table 1) and exclusion criteria were developed from evidence-based management algorithms produced by the Agency For Health Care Policy and Research1 and others,20 as well as by consensus of the participating physicians.
For IDH, patients are eligible for SPORT if they have radicular pain and evidence of nerve root compression with a positive nerve root tension sign (positive straight leg raise test or femoral tension sign). Alternatively, they may have a reflex (asymmetric depressed reflex), sensory (asymmetric decreased sensation in a dermatomal distribution), or motor (asymmetric weakness in a myotomal distribution) deficit with associated radicular symptoms and positive nerve root tension signs. In addition, a confirmatory imaging study (MRI or CT) must indicate an intervertebral disc herniation (a protrusion, extrusion, or sequestered fragment) at a location (level and side) corresponding with the patient’s radicular signs or symptoms.20 Patients with only a bulging disc (circumferential symmetric extension beyond the interspace) are not eligible.20
Patients are eligible for the SpS trial in SPORT if they have pseudoclaudication (pain in the buttock, thigh, or leg on ambulation that improves with rest) or radicular pain with an associated neurologic deficit. The patient with suspected pseudoclaudication must have strong distal pulses or other testing to rule out vascular claudication. In addition, SpS patients must have a confirmatory imaging study (MRI or CT) showing lumbar spinal stenosis at one or more levels (L2 to sacrum) defined as narrowing of the central spinal canal, lateral recesses, or neural foramina due to encroachment on the neural structures by the surrounding bone and soft tissue. Patients are not eligible if they have evidence of instability on lateral flexion/extension radiographs, defined as a change of greater than 10° of angulation of adjacent segments by Cobb measurement or a change of more than 4 mm of AP or PA translation.
To be eligible for the DS trial in SPORT patients must have pseudoclaudication or radicular pain with an associated neurologic deficit and have a confirmatory imaging study (MRI or CT) indicating spinal stenosis at the L3-4 or L4-5 level and a degenerative forward slippage (of the L3 relative to the L4 vertebral body or the L4 vertebral body relative to the L5 vertebral body) in the sagittal plane on lateral standing radiograph. Displacement is usually less than or equal to 25% of the vertebral body. Patients with adjacent levels of stenosis are eligible (additional levels of stenosis are acceptable at L2-3 and/or L4-5 for degenerative slips at L3-4 and at L3-4 and/or L5-S1 for degenerative slips at L4-5).
Patients in all three diagnostic categories are not eligible for SPORT if they have any of the following exclusion criteria:
A research nurse and the participating physicians at each site identify potential subjects from among their clinic’s patients. In addition, SPORT has been advertised on a national level via press releases issued by the NIH/NIAMS and through guest speaking appearances by the principal investigator at various specialty society meetings. The participating clinical sites have advertised the project on a local level via print, television, and radio media. SPORT is also publicized on the NIH/NIAMS web site and on its own web site at http://sport.dartmouth.edu/nsn.
Videotapes are used as part of the recruitment process to standardize and make available information that potential subjects need to make an informed decision about participation. Evidence-based programs produced by the Foundation for Informed Medical Decision Making, “Treatment Choices For Low Back Pain: Herniated Disc” and “Treatment Choices For Low Back Pain: Spinal Stenosis”, were edited to include information about SPORT. The programs provide information from the published literature, in lay language, regarding what currently is and is not known about the risks and benefits of the various treatment options for these conditions.
Informed consent is obtained from patients who are willing and eligible to participate. Data regarding demographic characteristics, medical history and comorbidities, symptoms (bothersomeness and frequency), and baseline measurements for all outcomes are obtained via a combination of patient interview, patient self-administered survey, and physician survey. All data are collected via portable computers that are equipped with software for touch screen data entry or via a SPORT internet site developed for this project.
Patients who agree to participate in a randomized arm of the trial receive their treatment assignment at the time of enrollment. A variable blocked randomization scheme is used to assure that the number of subjects is approximately the same in the surgical and non-surgical groups for each site/disease group combination at the end of accrual, with approximately balanced assignment totals at interim intervals as well.21 “Blocked” randomization means that treatment assignment is performed sequentially within blocks of a predetermined size so that when a blocked sequence is completed it will contain nearly equal numbers from each treatment group. “Variably blocked” means that the sequence and size of the blocks are varied randomly to further assure that the treatment assignments cannot be predicted or manipulated.
An automated randomization system was created by the Statistical Center for SPORT based on computer generated, random, blocked treatment assignments for each disease group within each site. The treatment assignment tables are accessed through either a Web site maintained by the Statistical Center or through portable computers provided for each site with special software for administering questionnaires. When eligibility criteria have been verified, the program extracts the next randomization code from the appropriate disease sequence along with the sequence number and a confirmation code. These codes are permanently embedded in the patient record along with the exact time of randomization. Automated reports for enrollments are updated on demand through the secured Statistical Center Web sites and allow coordinators and investigators to track enrollment performance on a daily basis.
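The variable blocked scheme described above can be sketched roughly as follows. This is a minimal illustration, not the Statistical Center's actual system: the block sizes (2, 4, 6), the site labels, the sequence length, and the function name `variable_block_sequence` are all assumptions made for the example.

```python
import random


def variable_block_sequence(n_patients, block_sizes=(2, 4, 6), seed=None):
    """Return n_patients treatment assignments built from permuted blocks.

    Each block holds equal numbers of the two arms; the block size is drawn
    at random so that the end of a block cannot be anticipated.
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        size = rng.choice(block_sizes)  # hypothetical block sizes, all even
        block = ["surgical"] * (size // 2) + ["non-surgical"] * (size // 2)
        rng.shuffle(block)  # random order within the block
        assignments.extend(block)
    return assignments[:n_patients]


# One independent sequence per site / disease-group combination.
sites = ["site_A", "site_B"]  # placeholder site labels
diagnoses = ["IDH", "SpS", "DS"]
tables = {
    (site, dx): variable_block_sequence(40, seed=f"{site}-{dx}")
    for site in sites
    for dx in diagnoses
}
```

Because every completed block is balanced, the running difference between the two arms within a sequence never exceeds half the largest block size, which is what keeps the interim totals approximately balanced.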
Patients may choose not to be randomized but still be part of the study by participating in the observational cohort study. The inclusion and exclusion criteria, study intervention protocols, follow-up schedule, and study endpoints are identical to those for patients in the randomized study, but treatment is not randomly assigned.
The participating physicians have agreed to use the following standardized surgical approaches by diagnosis:
In addition, participating physicians agreed to forego the use of any experimental devices or biologics as part of these procedures. However, if the physician decides during the surgery that the patient requires a procedure that differs from these protocols, he or she is instructed to perform that preferred procedure and record the details.
At a minimum, patients in all three diagnostic groups receive active physical therapy, education/counseling with home exercise instruction and a non-steroidal anti-inflammatory drug (NSAID) if tolerated. Patients may receive any additional non-surgical therapies deemed appropriate by their physician with all prescribed therapies recorded. Participating sites are encouraged to aggressively utilize all appropriate non-surgical therapies in those non-surgical patients who do not respond to the minimum intervention.
A complete list of non-surgical therapies being tracked in SPORT is shown in Tables 2a and 2b. These lists are meant to include both typical treatments that would be prescribed by the participating physician and treatments that patients might seek on their own. This approach, as opposed to a more limited, standardized non-surgical intervention in all patients, is more generalizable and will allow for a better understanding of the broad range of interventions, and their associated costs, being used in the care of patients with these diagnoses.
Follow-up data are gathered at two short-term follow-up intervals and five long-term follow-up intervals. The short-term follow-ups are at 6 weeks and 3 months from the time of treatment onset (surgery for the surgical group and the start of non-surgical therapy, generally at the time of enrollment, for the non-surgical group). The difference in the timing of the short-term follow-up visits for the surgical and non-surgical treatment groups was necessitated by wide variability in the amount of time between randomization and surgical treatment. This variability occurs for a variety of reasons, including surgeons’ schedules and case review for insurance and workers’ compensation. A complex set of follow-up timing rules has been developed to address the potential biases that this difference might create; in some cases these rules require additional follow-up visits for surgical patients.
Long term follow-up occurs at 6 months, 12 months, and 24 months, and, if time permits, at 36 months and 48 months from the time of enrollment. One-hour clinic visits with the nurse coordinator are scheduled for all follow-ups. When study visits correspond to routine clinical care visits, the meeting with the nurse coordinator is in addition to the usual time scheduled with their physicians. When study visits occur outside the usual care, if possible the patient is also scheduled to meet with their physician at the beginning of this appointment for a brief visit.
The primary outcome measure for the SPORT project is health-related quality of life as measured using both generic and disease-specific instruments. Secondary endpoints, which are included to address the cost-effectiveness of surgical versus non-surgical treatment, include preference-based measures of current health, resource utilization, and work status.
We chose as our generic health status measure the SF-36 Health Status Questionnaire.48 This instrument consists of 36 questions that can be aggregated to form 8 sub-scales (physical function; mental health; general health perceptions; pain; role limitations-physical; role limitations-emotional; social functioning; and vitality) and two summary scales (physical and mental component scales). On each scale, higher scores indicate better outcomes. Scores can be compared to published age- and sex-matched general population or disease-specific norms.
For our disease-specific instrument we chose the Oswestry Disability Index.19 We are using a version of the questionnaire that consists of nine questions relating to limitations in performing the following activities during the past week: getting dressed, lifting, walking and running, sitting, standing, sleeping, social and recreational activities, traveling, and sexual activity. Each question has six graded responses that range from unlimited, pain free activity to total incapacity due to pain. The sum of the 9 responses is expressed as a percentage of the maximum score. In the original scoring of the instrument higher scores represented greater disability.19 We reverse the coding of the responses so that higher scores indicate better outcomes for consistency with the interpretation of the SF-36 measures.
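The scoring described above can be illustrated with a short sketch. It assumes that each of the nine items is scored from 0 (no limitation) to 5 (total incapacity) and that all nine items are answered; the text does not say how missing items are handled, and the function name `oswestry_reversed` is hypothetical.

```python
def oswestry_reversed(responses, n_items=9, max_item=5):
    """Score a 9-item Oswestry form so that higher values mean better outcomes.

    responses: item scores, each assumed to run from 0 (no limitation)
    to max_item (total incapacity due to pain).
    """
    if len(responses) != n_items:
        raise ValueError("expected one response per item")
    if any(not 0 <= r <= max_item for r in responses):
        raise ValueError("item scores must lie between 0 and max_item")
    # Reverse each response, then express the sum as a percentage of the maximum.
    reversed_items = [max_item - r for r in responses]
    return 100.0 * sum(reversed_items) / (n_items * max_item)
```

Reversing each response before summing gives the same result as subtracting the original percentage score from 100, so either reading of "reverse the coding" leads to the same number.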
To assess the economic value of low back pain treatment we use quality-adjusted life years (QALYs) as our measure of effectiveness. To estimate QALYs, preference-based measures (i.e., utilities) for current health are required. Utility values,40,44 that range from 0 (worst imaginable health) to 1 (best imaginable health) reflect the desirability of health outcomes. Utilities are important to consider because patients with the same degree of functional limitation or symptoms may feel very differently about life under those circumstances.37
In SPORT, utility for current health is estimated using two preference classification systems: the 5-item EuroQoL EQ-5D.5 and the 15-item Health Utilities Index (HUI).47 Both provide societal utility values that are appropriate for estimating QALYs. In addition, we obtain individual patient values for current health using a visual analogue scale (VAS).5
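As a rough illustration of how utilities measured at study visits translate into QALYs, the sketch below integrates utility over time with the trapezoidal rule. The text does not state which integration method SPORT uses, so the trapezoidal rule and the function name `qalys_trapezoid` are assumptions.

```python
def qalys_trapezoid(times_months, utilities):
    """Approximate QALYs as the area under the utility-versus-time curve.

    times_months: visit times in months from enrollment, strictly increasing.
    utilities: utility at each visit on the 0 (worst) to 1 (best) scale.
    """
    if len(times_months) != len(utilities) or len(times_months) < 2:
        raise ValueError("need matching lists with at least two visits")
    points = list(zip(times_months, utilities))
    total = 0.0
    for (t0, u0), (t1, u1) in zip(points, points[1:]):
        if t1 <= t0:
            raise ValueError("visit times must be strictly increasing")
        # Interval length in years multiplied by the mean utility over the interval.
        total += (t1 - t0) / 12.0 * (u0 + u1) / 2.0
    return total
```

For example, a patient whose utility stays at 0.5 for 24 months accrues one QALY under this approximation.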
In our analysis, we distinguish among direct inpatient costs, direct outpatient costs, and indirect costs. To estimate costs associated with surgical and non-surgical care, we track resource utilization and work status at each follow-up period. The patient is our primary source for resource utilization and work history data. To enhance the quality of the recalled cost data, two approaches are employed. First, we provide all patients with detailed diaries, which serve as memory aids, for tracking encounters with health care providers. Second, we minimize the period of recall. To capture costs for the first 3 months of study enrollment, cost data are obtained at 6 weeks and 3 months using a 6-week recall window. At the 6, 12, 24, 36 and 48 month visits, we record most costs using a one-month recall period and use it to estimate average costs for the months between visits. However, certain costs, including hospitalizations and medical devices, are captured for the entire period between study visits.
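The recall-window approach for the later visits can be sketched as follows. This assumes the simplest possible extrapolation, in which the one-month recalled cost is multiplied by the number of months since the previous visit and the costs captured for the whole period are added unchanged; the actual estimation procedure may differ, and `interval_cost` is a hypothetical helper.

```python
def interval_cost(recall_month_cost, months_in_interval, full_period_costs=0.0):
    """Estimate the total cost for the interval between two study visits.

    recall_month_cost: costs reported for the one-month recall window.
    months_in_interval: months elapsed since the previous visit.
    full_period_costs: costs captured for the entire interval
        (for example hospitalizations and medical devices).
    """
    return recall_month_cost * months_in_interval + full_period_costs


# Months between the long-term visits named in the text (6, 12, 24, 36, 48),
# with the 3-month visit as the start of the first interval.
visit_months = [3, 6, 12, 24, 36, 48]
interval_lengths = [b - a for a, b in zip(visit_months, visit_months[1:])]
```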
To identify resource use associated with hospitalization, patients are questioned at follow-up visits about hospitalizations since their last study visit. Inpatient services include both physician and hospital service components of costs for each of the hospital stays experienced during the period of follow-up. Additional costs incurred during hospitalization include physician and diagnostic service expenses. We estimate costs for these using the Resource-Based Relative Value Scale (RBRVS),24 which is used by the Centers for Medicare and Medicaid Services.7 We favor this approach over using institution and cost-center-specific cost-to-charge ratios because it reflects national fee schedules and can therefore be generalized to the US population.
To identify care other than hospitalization, patients are questioned at each follow-up visit about their use of outpatient services, medications, and devices. We collect the relevant physician costs for outpatient services using the same RVU weights as for the physician inpatient services. To estimate costs of medications and other health care services not covered by Medicare, we are surveying providers on charges for services at diverse geographic locations involved in this study. Cost estimates of physical therapy at each site are derived based on condition-specific profiles, which are being developed through surveys of each site’s physical therapy practices.
An important component of the overall economic evaluation of surgical and non-surgical treatment for these conditions is the measurement of indirect costs of employees, employers, and the government resulting from the spine-related condition. To this end, our survey asks about occupation, difficulty performing work (either in a job or as a homemaker), job changes, and financial distress stemming from the spine condition. To estimate indirect costs associated with time lost from work, we collect earning and wage information at baseline to allow for assigning costs to lost work hours. Average weekly hours are collected at each study visit, while wage, insurance coverage, disability payments, and measures of difficulty working are collected at 6 months, 12 months, and yearly thereafter. Finally, we collect information about factors such as insurance status, disability payments, and income of other family members to determine what factors affect labor force participation and treatment options. We use the pre-treatment wage rate in these calculations, but also include a question at the 12-month follow-up regarding the post-treatment wage rate.
A monitored event form is completed whenever it is learned that a study patient has been lost to follow-up, has crossed over from their randomly assigned or self-designated treatment group, or has withdrawn from the study. Deaths and hospitalizations are immediately reported to the study Principal Investigator, Data and Safety Monitoring Board, the NIAMS Project Officer, and all participating IRBs. The Data and Safety Monitoring Board uses narrative data provided by the site regarding the circumstances of the event as well as any documentation in the medical record, discharge or operative notes, or death certificates to determine whether the event was related to the patient’s spine condition or its treatment. All other medical complications are monitored throughout the study at all routine follow-ups and are reported to the Data and Safety Monitoring Board and the NIAMS Project Officer at regular intervals. IRB notification guidelines are followed according to each site’s specifications and in compliance with national regulations (45CFR46, see http://ohrp.osophs.dhhs.gov/humansubjects/guidance/45cfr46.htm).
Any deviation from the study protocol is considered a protocol violation. Major protocol violations include: randomization of an ineligible patient; enrollment of a SPORT participant in any other spine-related study; subject receiving the wrong treatment; randomization prior to completion of necessary paperwork or diagnostic studies; loss of radiology or operative report; and informed consent violations. Minor protocol violations include failure to report a monitored event within 24 hours of learning of it and failure to obtain follow-up data within the specified time intervals. Violations are reviewed weekly and serve as data for reports to the DSMB, NIAMS and IRBs.
Our primary analyses will conform to the “intent-to-treat” or “as randomized” principle. Thus, patients who cross over (elect surgery after having been randomized to the non-surgical arm or decline to undergo surgery after having been randomized to that treatment) will be included in the treatment group to which they were randomized for the purposes of assessing the intervention. The intent-to-treat analysis is an estimate of the intervention effect attributable to the randomization event, i.e., an estimate of the benefit of offering an intervention without regard to compliance. In this sense, it is a more realistic assessment of the “effectiveness” of an intervention in a clinical setting and is statistically unbiased. Secondary analyses will examine the potential effects of treatment in the absence of crossover by tracking patient outcomes and actual treatment status.
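The difference between grouping patients as randomized and grouping them by the treatment actually received can be shown with a toy example. The four records below are invented purely for illustration and are not SPORT data; the field names and the helper `mean_change_by` are hypothetical.

```python
from statistics import mean

# Made-up records for illustration only; these are not SPORT data.
patients = [
    {"randomized": "surgical", "received": "surgical", "change": 30},
    {"randomized": "surgical", "received": "non-surgical", "change": 10},
    {"randomized": "non-surgical", "received": "non-surgical", "change": 12},
    {"randomized": "non-surgical", "received": "surgical", "change": 28},
]


def mean_change_by(records, key):
    """Average the outcome change within each group defined by `key`."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record["change"])
    return {arm: mean(values) for arm, values in groups.items()}


# Primary analysis: patients stay in the arm to which they were randomized.
intent_to_treat = mean_change_by(patients, "randomized")
# Secondary analysis: patients are grouped by the treatment they received.
as_treated = mean_change_by(patients, "received")
```

In this toy data set crossover in both directions makes the two randomized arms look identical, while the grouping by treatment received shows a gap; the intent-to-treat comparison keeps the protection of randomization, whereas the second comparison does not.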
All the data analyses required to fulfill the specific aims of the study will be accomplished using statistical methods for repeated measures and longitudinal data.29 The primary study endpoints will be summarized in terms of changes from pre-intervention baseline quality of life measurements. The mean changes observed at specified follow-up times will be compared between the treatment arms with appropriate adjustment for intra-patient correlations. In the event of unexpected baseline imbalances between the treatment arms, adjusted analyses will include important predictors of quality of life scores as covariates.
Sample size calculations were based on our primary outcome measures for health-related quality of life, the SF-36 and Oswestry instruments. For the SF-36 calculations the physical function and bodily pain sub-scales were used. The required sample size for each trial was computed so that a t-test based on the time-specific treatment effect with a two-sided significance level of 0.05 would have a power of 0.85 to detect mean differences from baseline to follow-up of at least 10 points, assuming a 20% loss to follow-up. Appropriate variances for the quality of life scores within each disease group were derived from data provided by the National Spine Network.
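A calculation of this kind can be approximated as shown below. The sketch uses the standard normal approximation for a two-group comparison of means rather than an exact t-test calculation, and the standard deviation passed in is a placeholder: the variances derived from the National Spine Network data are not reported in the text. The function name `n_per_arm` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.85, loss=0.20):
    """Approximate number of patients to enroll in each treatment arm.

    delta: smallest mean difference to detect (10 points in the text).
    sd: assumed standard deviation of the change score (placeholder value).
    alpha: two-sided significance level; power: desired power.
    loss: expected fraction lost to follow-up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Patients with complete follow-up required in each arm.
    n_complete = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    # Inflate enrollment so the expected number of completers still suffices.
    return math.ceil(n_complete / (1 - loss))


# Hypothetical standard deviation of 25 points, for illustration only.
example_n = n_per_arm(delta=10, sd=25)
```

The required enrollment grows with the square of the assumed standard deviation, which is why the variance estimates for each disease group matter so much for the target numbers.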
The overall structure of SPORT (Figure 2) is similar to many large National Institutes of Health (NIH) sponsored collaborative clinical trials. An Executive Committee, the membership of which includes the study’s Principal Investigator, representatives of NIAMS, and a physician representative from the participating sites, oversees the study. The study is operationally directed by the Scientific Work Group, comprised of the Principal and Dartmouth Co-Principal Investigators as well as the study coordinator and a representative of the Site Principal Investigators. The Scientific Work Group meets regularly and as needed to attend to issues that arise throughout the various phases of the project, including protocol development and/or modification; enrollment; follow-up; data analysis; and reporting.
The Data and Safety Monitoring Board is composed of five voting members from outside the study investigative group with experience in statistical and scientific issues relevant to the conduct of large, collaborative clinical trials. This committee reviews summaries of safety, the accrual and progress of the trial, the quality of the data, and blinded interim efficacy/effectiveness analyses, and reports on its findings to the NIAMS Project Officer. It is the responsibility of the Data and Safety Monitoring Board to interpret data on adverse effects. This group reviews the data every six months and makes recommendations to NIAMS regarding actions to ensure that subjects are not exposed to undue risks. The mandate of the committee complies with the July 1, 1999 release of the NIH Policy for Data and Safety Monitoring, with its primary function being to “ensure the safety of participants and the validity and integrity of the data.”
The Study Coordinating Center for SPORT (see Figure 2) is located at Dartmouth Medical School/Dartmouth-Hitchcock Medical Center in Lebanon, NH. Operationally, there are four groups within the Study Coordinating Center: Statistics, Data Management, Study Administration, and Cost-Effectiveness.
Staff from the Data Management group pre-installed all software used in this study, taught participating center staff how to use it, and maintain ongoing communication with the sites to troubleshoot software and hardware problems.
Baseline, enrollment, treatment, and follow-up data are collected at the participating centers by a combination of patient self-report, physician report, or nurse coordinator interview (personal and phone). This data is entered electronically using touch-screen computers and secure Web sites by patients and study staff at the participating centers. The data collection system has numerous advantages, including:
The Data Management group also maintains a project web site that serves as a reference for protocol and policy information. Data regarding enrollments are updated daily and are also available to the sites and data coordinating center staff on the web site. All of the computers and databases are password protected, hard drives are connected to uninterruptible power supplies, and daily back-up of the individual site databases as well as the combined central database located at Dartmouth is performed to ensure data security and integrity. Data moving from participating centers to Dartmouth are encrypted, and are also moved off the central server once they reach Dartmouth.
Surgical treatments are rarely subjected to rigorous evaluation prior to dissemination.32,43,45 However, a few randomized trials have assessed the relative efficacy of surgical and non-surgical approaches to the same problem, proving the feasibility of randomized studies involving surgery and providing some surprising results. For example, a number of commonly used surgical treatments were abandoned after randomized trials proved them ineffective.4,18,28,34,46 Other trials have resulted in revised indications for surgical treatment (for symptom relief rather than improved survival)39 or shown that the balance between risks and benefits is fragile for many surgical procedures.50,51 In other cases, randomized trials proved that surgical treatments were clearly superior to the non-surgical alternatives.8,35,53
There are a number of issues that complicate the performance of randomized trials in surgery that may relate to why these types of studies are not frequently conducted. In this section of the paper we discuss some of these issues as they relate to the design and conduct of SPORT.
Randomization is not easy for patients or physicians to accept. It is rare that a patient would not have preferences regarding the alternatives where both surgical and non-surgical options exist. It is also not surprising that physicians believe in the value of the treatments that they administer. In SPORT we decided that the best way to recruit patients into this study would be to educate them about the lack of existing scientific evidence to guide clinical decision making for these conditions and the resulting need for randomized clinical trials. Rather than relying solely on the participating physicians to make this case, we decided to use videotaped educational materials that were developed to promote shared medical decision-making as part of our recruitment strategy. These tapes are condition-specific and provide information based on the available scientific data regarding what is and is not known about the relative risks and benefits of surgical and non-surgical treatments for these conditions. The programs were edited with permission from the Foundation for Informed Medical Decision Making to include an invitation to participate in SPORT.
Placebo effects have often been discussed as a potential source of bias in surgical trials.13,25 In perhaps the most famous cases, patients receiving sham internal mammary artery ligation for coronary artery disease and gastric freezing for peptic ulcer in randomized, placebo controlled trials reported improvement in their symptoms in similar proportions to the patients receiving the active treatments.18,41 However, both the ethics and feasibility of performing sham surgery are very much debated and it was determined that placebo controlled studies of spine surgery for these conditions were not possible at this time. Instead, SPORT was designed so that the “control” patients would receive an aggressive form of non-surgical care rather than no treatment. We hope that the videotapes used in the informed consent process may help to mitigate preconceived notions regarding the efficacy of the treatment options.
Prior surgical trials have frequently relied on objective outcomes such as survival or objectively defined clinical events that are believed to be less susceptible to placebo effects. While our primary outcome measures (survey instruments for measuring back and leg symptoms and health-related quality of life) are subjective, we also measure other variables such as treatment complications and work status that are more like objective outcomes. Ultimately, the major goal of elective spine surgery is pain relief, which is necessarily a subjective phenomenon. This contrasts with trials for cardiovascular diseases, for example, where the primary endpoints are objective outcomes such as mortality and cardiovascular events. In the end, we will not be able to determine to what extent any observed benefits of surgical treatment are attributable solely to its placebo effect and this must be considered a limitation of SPORT.
In the design of clinical trials, a trade-off between validity and generalizability must be struck. The results of randomized clinical trials are frequently criticized for not being generalizable because of strict institutional and patient eligibility criteria as well as bias in the selection of patients who are invited to participate.
Surgical trials have frequently been criticized for restrictive eligibility criteria that limit enrollment to subjects with the lowest risks of procedure-related complications and the best prognosis.6,8,30 These criticisms refer to differences between patients who are enrolled in the study and those for whom the surgery is regularly performed. Additionally, patients who are willing to be randomized are known to be different from those who refuse in ways that may be related to their response to treatment. Failing to adequately describe the demographic and clinical differences between the patients enrolled in the trial and those to whom the results are intended to apply is a frequent criticism of clinical trials in general.
In SPORT, the results are intended to apply to patients with IDH, SpS, or DS who are surgical candidates, and the eligibility and exclusion criteria were designed to capture this population. To allow an assessment of our study’s generalizability, SPORT includes both randomized trials and observational cohorts in an effort to gather baseline and follow-up information for as many eligible patients as possible. This will allow an examination of the baseline characteristics as well as the responses to treatment for patients who agree to be randomized and those who do not.
One difference between surgery and pharmaceutical therapy is that no two surgeons do an operation exactly the same way or with exactly the same skill. Therefore, surgery cannot be standardized to the same extent that drug therapy can. In addition, surgical procedures evolve over time with the possibility that by the time any trial of a procedure is complete it will be considered obsolete. Prior surgical trials have been criticized for restricting institutional participation to high volume, academic referral centers.51 While these concerns are frequently raised to argue against the use of randomized clinical trials to assess the efficacy of surgical procedures,31 they should be of equal or even greater concern for cohort studies as well as uncontrolled case series.
In SPORT, the participating physicians have agreed to general approaches to both surgical and non-surgical therapy that represent current standard treatment for these conditions. The physicians participating in SPORT are from a range of different medical and surgical specialties (Spine On-line: see Article Plus for details) and their centers represent a wide range of institutions that perform spine surgery, including private and non-profit, academic and non-academic, high volume and low volume. Any observed differences in the specific techniques and skill levels of the physicians and medical centers will be accounted for in the statistical analysis of our data. Any “advances” in treatment that occur during the conduct of the trial can be tested against the most beneficial standard approach upon the completion of SPORT.
While rates of crossover from non-surgical to surgical treatment have been substantial in prior studies of treatment for back conditions, the data from these studies lend some support to the performance of an intention-to-treat analysis. In the Maine Lumbar Spine Study, a prospective cohort study, 15% of the herniated disc patients and 14% of the spinal stenosis patients crossed over from non-surgical to surgical treatment within the first three months of enrolling in the study and were treated as surgical patients in the analysis.11,12 The baseline characteristics of these patients were similar to those of patients who opted for surgical treatment initially, and their reclassification as surgical did not have any effect on the long-term outcomes by treatment group.9,10,27 An additional 16% of the herniated disc patients underwent surgery between 3 and 60 months for an overall crossover rate of 30% at 5 years.9 An additional 8% of the spinal stenosis patients underwent surgery between 3 and 48 months for an overall crossover rate of 22% at 4 years.10 The baseline characteristics and outcomes for the patients who crossed over after three months were not significantly different from those of patients who stayed in the non-surgical treatment group in either the herniated disc or spinal stenosis study.9,10 In Weber’s randomized trial of surgical and non-surgical treatment for herniated disc, 26% of the patients randomized to non-surgical treatment crossed over (all within 1 year).49 Outcomes for the patients who crossed over were not significantly different from those of patients who stayed in the non-surgical treatment group at one year (33% versus 47% “good” results) or ten years (55% versus 59%).49
While our primary analyses will conform to the “intention to treat” principle, an “as treated” analysis is planned to study the potential effects of such crossovers. In these analyses, the intervention group for an individual patient will depend on their actual status at the time of crossover and at each designated follow-up time. These analyses will not be based on a randomized treatment assignment, and will be carefully adjusted for potential confounding factors.
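The distinction between the two analysis groupings can be illustrated with a minimal sketch. It assumes one simple rule, which is our illustration rather than the SPORT analysis plan: under "as treated", a patient counts as surgical at a follow-up time once surgery has actually been performed. The names `Patient`, `itt_group`, and `as_treated_group` are hypothetical, and the adjustment for confounding factors described above is not shown.

```python
from dataclasses import dataclass
from typing import Optional

# Follow-up schedule in months (6 weeks, 3, 6, 12, and 24 months).
FOLLOW_UP_MONTHS = [1.5, 3, 6, 12, 24]


@dataclass
class Patient:
    assigned: str                   # randomized arm: "surgery" or "non-surgical"
    surgery_month: Optional[float]  # month surgery was performed, None if never


def itt_group(patient: Patient) -> str:
    """Intention to treat: always analyze by the randomized assignment."""
    return patient.assigned


def as_treated_group(patient: Patient, month: float) -> str:
    """As treated (assumed rule): surgical once surgery has been received."""
    if patient.surgery_month is not None and patient.surgery_month <= month:
        return "surgery"
    return "non-surgical"


# Hypothetical patient randomized to non-surgical care who has surgery at month 4.
crossover = Patient(assigned="non-surgical", surgery_month=4.0)
groups_by_visit = {m: as_treated_group(crossover, m) for m in FOLLOW_UP_MONTHS}
```

In this sketch the hypothetical patient stays in the non-surgical group under intention to treat at every visit, while the as-treated grouping switches to surgical from the 6-month visit onward.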
In SPORT we track utilization of health care services in a consistent fashion for all patients regardless of treatment arm. To achieve comparability in cost data across treatment arms, we rely on the patient as the single best source of information. Although this minimizes the potential for ascertainment bias that is inherent in relying on hospital administrative data for costing surgical (mainly hospital-based) versus non-surgical (mainly non-hospital-based) care, it may result in some inaccuracies due to patient recall. However, as detailed earlier, we are using two approaches (detailed diaries and one-month recall window) to minimize these problems.
There are some methodological controversies related to measuring the indirect costs and benefits of changes in employed hours and wages. The first is that an increase of one hour of work imposes some cost in terms of lost leisure time, so that simply measuring gains in earnings would tend to overstate the true utility gains to the worker. Sensitivity analysis can be used to provide bounds on how important this issue is in measuring indirect costs, as well as to quantify the implications of changes in employment for government tax revenue. The second, and more important, is the potential for double counting the benefits of improved functioning if patients report higher levels of current health as a consequence of their being able to go back to work.22 However, one recent study demonstrated that financial constraints did not seem to be related to how patients responded to a survey about health functioning.33 Here again, we will use sensitivity analysis to assess the potential effects of different assumptions regarding indirect cost estimates.
A third issue is whether individual wage rates should be used to value the indirect costs and benefits of work. Using actual wages would imply, for example, that a change of one hour of work for an individual making $30 per hour would be weighted five times as heavily as a change of one hour for an individual making $6 per hour. As a statistical issue, this approach could introduce a great deal of “noise” into the estimates, especially if there are a few people with very high wage rates. One approach to solving the statistical noise issue is to assign wage rates according to occupation and education cells based on a larger sample from the Current Population Survey. A more philosophical issue is whether society should value the $30 per hour individual’s change in hours of work at five times the rate of the $6 per hour worker’s.54 An alternative specification that we will test in sensitivity analyses is the calculation of indirect costs where each patient is assigned the average wage of the sample rather than their actual wage.
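The two wage specifications can be compared with a small worked example. This is only a sketch under stated assumptions: the function `indirect_cost_change` is hypothetical, the two-person sample reuses the $30 and $6 hourly wages from the text, and the sample average is an unweighted mean.

```python
from statistics import mean


def indirect_cost_change(hours_changes, wages, use_average_wage=False):
    """Value each patient's change in hours worked, in dollars.

    With use_average_wage=True, every patient is assigned the sample's
    mean wage instead of their own (the alternative specification).
    """
    if len(hours_changes) != len(wages):
        raise ValueError("hours_changes and wages must have the same length")
    if use_average_wage:
        average = mean(wages)
        wages = [average] * len(wages)
    return [hours * wage for hours, wage in zip(hours_changes, wages)]


# Hypothetical sample: each patient gains one hour of work.
hours = [1.0, 1.0]
wages = [30.0, 6.0]

actual_wage_values = indirect_cost_change(hours, wages)
average_wage_values = indirect_cost_change(hours, wages, use_average_wage=True)
```

With actual wages the two one-hour changes are valued at $30 and $6, a five-to-one weighting. With the sample average wage both are valued at $18, so the total of $36 is unchanged in this example but each patient's hour carries equal weight.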
There are at least three advantages to including observational cohorts as part of SPORT. The first is to test whether there are systematic differences in the type and severity of illness for patients who enroll in the randomized and observational study groups. As stated previously, knowing whether the randomized group is representative of the population of surgically eligible patients with low back pain is important in judging the generalizability of the study. Second, where observational cohort data can be combined with the randomized cohort data, they will increase the sample size and power of statistical analysis involving cost data and many of the secondary outcomes such as work experience and earnings. Finally, the observational group can be used in future ancillary studies and supplemental analysis of outcomes at the different surgical sites.
In other surgical specialties, large-scale randomized trials of surgical and non-surgical treatment alternatives have been successfully conducted despite obstacles that seem much harder than those encountered in assessing the effectiveness of spine surgery. Most causes of back pain are not fatal or associated with permanent pain or disability. In addition, spine surgery is associated with less risk than most of the other types of surgery that have been studied in randomized trials. Spine surgery should not be more difficult to standardize than the other types of surgery that have been studied in this way, such as coronary bypass, carotid artery, and brain surgery. Finally, there is no reason to think that patient and surgeon preferences regarding spine surgery would be any more strongly held than those in cardiovascular surgery or (non-spine-related) neurosurgery, where large scale randomized trials have been successfully conducted.
The authors would like to acknowledge funding from the following sources: grants from the National Institutes of Health - National Institute of Arthritis and Musculoskeletal and Skin Diseases and Office of Research on Women’s Health, the Centers for Disease Control and Prevention - National Institute for Occupational Safety and Health (U01-AR45444-01A1), and the Agency for Healthcare Research and Quality (K02 HS11288-).
This study is dedicated to the memory of Brieanna Weinstein.
James N. Weinstein, DO, MS, Dartmouth Medical School, Hanover, NH