Research on surgical interventions is associated with several methodological and practical challenges of which few, if any, apply only to surgery. However, surgical evaluation is especially demanding because many of these challenges coincide. In this report, the second of three on surgical innovation and evaluation, we discuss obstacles related to the study design of randomised controlled trials and non-randomised studies assessing surgical interventions. We also describe the issues related to the nature of surgical procedures—for example, their complexity, surgeon-related factors, and the range of outcomes. Although difficult, surgical evaluation is achievable and necessary. Solutions tailored to surgical research and a framework for generating evidence on which to base surgical practice are essential.
Evaluation of a therapeutic, procedure-based intervention presents several methodological and practical challenges for the surgical research community. Few, if any, of these challenges apply only to surgical procedures; many arise during the assessment of other non-pharmacological interventions, such as interventional radiology, technical procedures and devices, rehabilitation, behavioural interventions, and psychotherapy.1 However, what is arguably unique to surgery is the way in which many of these challenges coincide. Perhaps this situation leads many surgeons to view randomised controlled trials (RCTs)—although theoretically advantageous—to be too difficult and impractical to undertake, and at worst, irrelevant to their practice because of concerns about generalisability.2 Most of the same challenges also affect non-randomised studies and, in some cases, to a greater extent. Despite the barriers, an RCT remains the best possible study design for the assessment of therapeutic interventions.
This report, the second of three papers on surgical innovation and evaluation, presents the conclusions of a meeting held by the Balliol Collaboration on April 3, 2009. By identifying many issues related to surgical research and deconstructing them into constituent methodological parts, we targeted several important areas to develop guidance for appropriate, evidence-based surgical practice. Here, we discuss the challenges related to study design of surgical research and the challenges related to the nature of surgical interventions. Recommendations for improvement and solutions are presented in the third report in this Series.3
RCTs are considered the gold standard for establishing safety and efficacy of an intervention. Despite calls for surgical research to be more rigorous, the overall frequency of RCTs has been consistently low since the 1970s.4 Large, high-quality RCTs have been done in a variety of surgical specialties, but those of the surgical procedure itself are less common. Most surgical RCTs have focused on other aspects of the intervention, such as anaesthesia or pharmacological interventions, in preoperative and postoperative care.5 There are several types of RCT that can be used for different aims.6 A useful distinction can be made between explanatory trials, which seek to assess whether an intervention can work, and pragmatic trials, which seek to inform clinical decision making. Pragmatic trials are needed to ensure surgical practice is based on evidence. Some criticisms of surgical trials are misplaced and reflect misunderstanding of a trial’s aim. Specific challenges to the planning and conduct of randomised trials comparing a surgical intervention with different types of comparator are summarised in the table. Many of the issues raised might need different solutions, depending on the aim of the study.
A particularly difficult question in the assessment of a new surgical intervention is whether an RCT is necessary and, if so, when the first one should be done. Conceptually, there are few arguments against doing RCTs early in development, although a further assessment might be appropriate. For a new surgical intervention, it can be difficult to decide when to shift from an early exploratory stage of development to a formal investigation. If done too early, the constraints of an RCT could obstruct innovation, and if too late, equipoise could be lost. Another consequence of an early RCT is that the definitive technique might not be fully refined; the subsequent study outcome then reflects the stage of development and learning, and not the therapeutic effect of the intervention.11 Additionally, restricting a new procedure to an RCT might be impractical in the absence of regulation that prevents surgeons offering the intervention to patients outside the trial.12
When two interventions have different benefit-to-harm profiles, patients and surgeons might strongly prefer one intervention. Strong preferences might lead patients and surgeons to decline trial participation, making trial recruitment more difficult, if not impossible. This situation is often intractable when the preferred intervention is widely available. The strength of a patient’s preference partly depends on the comparator used in the trial (table). Trial designs that seek to evaluate preferences have been proposed.13,14 Possible comparisons are a new procedure versus a sham (placebo) procedure; similar but distinguishable procedures; substantially different procedures; or surgery versus non-surgical treatment, such as a medical treatment, participative intervention (eg, rehabilitation), or watchful waiting. Research to assess surgical innovations in emergency and paediatric settings is perhaps particularly susceptible to preferences that preclude randomisation. Breast surgery is the archetypal example (difficulty of randomising patients to segmental mastectomy or mastectomy).15
Response to uncertainty has also been suggested as an explanation why surgeons might be unwilling to take part in surgical trials. Compared with physicians, surgeons might be less tolerant of uncertainty about the effectiveness of alternative treatments, affecting their participation in RCTs and thereby making surgical trials more difficult to undertake.16 Previous negative experiences and the perceived threat of litigation might make some surgeons reluctant to submit parts of their practice to evaluation. A feasibility study of patients’ willingness to participate in surgical and oncology trials found a low level of willingness because of a stated dislike for randomisation, and a desire to make their own decisions about the selection of the intervention.17 The preferences of both surgeons and patients, however, might not be based on existing evidence. Patients need to be provided with sufficient information to make informed decisions, which has not always been the case.18 Qualitative research can provide insights into the recruitment process and might enable greater participation in RCTs.19
Randomisation should be done as close to the time of the intervention as possible to reduce the possibility that the allocated intervention will not be delivered because of strong preferences, knowledge of allocation assignment, cancellations, or clinical events before the procedure.20 However, randomisation needs to be sufficiently early for the patient and surgical team to be adequately prepared. In the case of two surgical procedures, randomisation can often be done in the operating theatre, for example, by use of a telephone or web-based randomisation service.21 The challenge is exacerbated when substantially different (eg, surgical vs pharmacological) interventions are compared and participants have to be told their allocation in advance of receiving the intervention, or when the new procedure is available outside the trial. Irrespective of the timing of randomisation, a surgeon might decide that a surgical procedure is inappropriate, impossible, or unsafe after randomisation. The Spine Patient Outcomes Research Trial (SPORT),10 which compared surgery and non-surgical management in low back pain, reported substantial crossing over of patients in both directions. Although the principle of intention to treat provides the preferred analysis framework for dealing with crossing over, application of the results of such an analysis to other settings can be difficult (for example, if crossovers are predominantly from the new treatment to the conventional treatment).
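The dilution that two-way crossover causes in an intention-to-treat comparison can be sketched numerically. The figures below are entirely hypothetical (they are not the SPORT data) and serve only to show why an ITT estimate under heavy crossover can understate the effect of actually receiving surgery.

```python
# Illustrative arithmetic (hypothetical numbers): how two-way crossover
# dilutes an intention-to-treat (ITT) estimate of a treatment effect.
# Assume surgery truly reduces the risk of a poor outcome from 40% to 20%.
risk_surgery = 0.20       # risk if the patient actually receives surgery
risk_conservative = 0.40  # risk if the patient actually receives conservative care

def arm_risk(assigned_risk, other_risk, crossover):
    """Observed risk in an arm where a fraction `crossover` of patients
    receive the other treatment instead of the one assigned."""
    return (1 - crossover) * assigned_risk + crossover * other_risk

# Full effect, recovered by ITT when everyone receives the assigned treatment:
full_effect = risk_conservative - risk_surgery  # 0.20

# With 30% crossing from surgery to conservative care and 20% the other
# way, each arm becomes a mixture of the two treatments:
risk_arm_surgery = arm_risk(risk_surgery, risk_conservative, 0.30)
risk_arm_conservative = arm_risk(risk_conservative, risk_surgery, 0.20)
itt_effect = risk_arm_conservative - risk_arm_surgery  # shrinks to 0.10

print(f"True effect {full_effect:.2f}, ITT estimate under crossover {itt_effect:.2f}")
```

The ITT estimate remains an unbiased comparison of the randomised *policies*, which is why it is still preferred; the sketch only shows why transferring it to a setting with different crossover patterns is hazardous.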
Absence of masking can lead to several forms of bias: performance bias (surgeons, other caregivers, or patients choosing concurrent interventions depending on their allocation); attrition bias (differential withdrawal from follow-up); and detection bias (differential outcome assessment).22 Masking of surgeons, patients, and other caregivers is difficult and often impossible in surgical trials; nevertheless, innovative methods of masking are available.23 In a comparison of laparoscopic and small-incision cholecystectomy, bloody bandages were used to blind patients and other caregivers.21 Sham, or placebo surgery, in which the surgeon mimics the intervention, has been used to assess arthroscopic surgery for osteoarthritis of the knee and stem-cell therapy for Parkinson’s disease.7,24 The use of placebo surgery is controversial, and has been restricted to cases where a suitable comparator was not available or the placebo surgery had limited risk.25 Although masking of the surgeon and patient is difficult, it should be possible to blind the clinical assessment of outcomes (though seldom done).26 If patients cannot be masked to treatment assignment, some outcomes can be susceptible to bias, especially patient-reported outcomes.
The principle of random allocation of participants to surgeons with expertise of different procedures—ie, an expertise-based design (an intrinsic feature of a comparison between a surgical procedure and a non-surgical treatment)—has been proposed for the comparison of two surgical procedures.27 Similar to cluster randomisation, this design protects against contamination and allows surgeons with strong preferences to take part. However, this design brings its own challenges: more surgeons are required, the comparison could be confounded by the characteristics of surgeons who prefer one technique, and the logistics of shared waiting lists across surgeons are formidable.
A tracker trial design has been proposed to reflect and incorporate the difficulty of incremental and stepwise innovation during assessment of a surgical procedure.12 In this design, modifications of the surgical technique during the progress of the trial are allowed, recorded, and subsequently tracked in the statistical analysis. Variations in the randomisation scheme, such as adding a new treatment, are also allowed. In principle, the full development of an innovation can be assessed in a single study. Although conceptually attractive, tracker trials would be very challenging in practice.
As previously mentioned, several factors contribute to make RCTs of surgical procedures difficult and, in a few cases, impossible. For example, lesser surgical innovations might have such a small effect on a serious but rare outcome (eg, mortality) as to make an RCT evaluation prohibitively large to achieve adequate statistical power. Historically, most advances in surgical knowledge have been accepted on the basis of non-randomised studies.28 Surgical interventions such as heart, liver, kidney, and lung transplantation are established therapies in developed countries.29 None of these procedures has been validated with RCTs, and it is generally regarded unethical to do so in view of the apparent benefit.30 Other advances have been identified through observational studies, or even anecdotes, because of dramatic effects, where biases are unlikely to be so severe that they could account for the findings.31
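A back-of-the-envelope calculation shows why small effects on rare outcomes make an RCT prohibitively large. The sketch below uses the standard normal-approximation sample-size formula for comparing two proportions; the effect sizes are purely illustrative.

```python
# Sample size per arm needed to detect a difference between two proportions,
# using the standard normal-approximation formula (two-sided 5% significance,
# 80% power; z values are the corresponding standard normal quantiles).
import math

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.8416):
    p_bar = (p1 + p2) / 2
    term_a = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    term_b = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(((term_a + term_b) / (p1 - p2)) ** 2)

# A hypothetical refinement that cuts 30-day mortality from 2.0% to 1.5%
# (an absolute reduction of 0.5 percentage points on a rare outcome):
n = n_per_group(0.020, 0.015)
print(f"About {n} patients per arm, roughly {2 * n} in total")
```

Even this modest scenario demands a trial of over twenty thousand patients, far beyond the reach of most single-centre surgical studies.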
Early exploratory cases of new procedures are likely to be reported as case reports or case series.32 Large cohort observational studies have been critically and extensively used to develop and validate risk assessment for surgical therapies, to monitor safety in practice, to identify treatment effects (adverse or beneficial) that might not have been looked for or detected in original studies, and to estimate treatment effects when RCTs were deemed impracticable (eg, rare events, observations far in the future).33 When RCTs are not feasible, it is essential to undertake high-quality non-randomised studies.34 A dichotomy between randomised and non-randomised studies is somewhat artificial since both designs can provide different and complementary evidence.35 For example, a non-randomised investigation of long-term and rare safety outcomes could be done alongside an RCT.
Overall, most surgical studies are non-randomised and often retrospective; their quality is also very variable and often poor.4 Prospective comparative designs are substantially more useful than case series, which are over-represented in surgical publications. An important driving factor behind non-randomised studies is that they are easier to undertake than RCTs, and increasingly so with electronic data collection and standardised databases.36 However, a lack of appropriate planning and poor data quality (missing data for important risk factors, inconsistencies, and the absence of key diagnostic and operative details) are common shortcomings that tend to undermine the validity of results from non-randomised studies.
Protocol-driven studies, which account for all cases and have accurate and informative clinical data, are needed. More effort should be focused on data collection to reduce bias caused by incomplete data or unmasked outcome assessment, as is done in RCTs. However, even well-designed non-randomised studies face many of the difficulties associated with RCTs—for example, the existence of a learning curve. Accounting for any pretreatment differences between intervention groups is also a particular concern in non-randomised studies.37 Rigorous prospective design and data collection provide some protection against biases.38 The use of any statistical adjustment (such as propensity scores) to overcome potential confounding effects will only have merit if informed by comprehensive clinical understanding of the condition and its risk factors. The detailed patient data needed to undertake such an analysis are rarely available.
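The logic of adjusting for pretreatment differences can be shown with a toy example. The sketch below uses simple stratification on a single risk factor rather than a full propensity model, and the numbers are entirely hypothetical: low-risk patients preferentially receive the new procedure, so a crude comparison flatters it.

```python
# Toy illustration of confounding by indication and stratified adjustment.
# Each stratum: (n_new, deaths_new, n_old, deaths_old). All numbers invented.
strata = {
    "low-risk":  (80, 4, 20, 1),   # 5% mortality with either procedure
    "high-risk": (20, 4, 80, 16),  # 20% mortality with either procedure
}

# Crude comparison: pool the strata and ignore baseline risk.
n_new = sum(s[0] for s in strata.values())
d_new = sum(s[1] for s in strata.values())
n_old = sum(s[2] for s in strata.values())
d_old = sum(s[3] for s in strata.values())
crude_rd = d_old / n_old - d_new / n_new  # 0.17 - 0.08 = 0.09

# Adjusted comparison: average the within-stratum risk differences,
# weighted by stratum size.
total = n_new + n_old
adj_rd = sum(
    ((do / no) - (dn / nn)) * (nn + no) / total
    for nn, dn, no, do in strata.values()
)

print(f"Crude risk difference:    {crude_rd:.2f}")  # suggests new procedure is safer
print(f"Adjusted risk difference: {adj_rd:.2f}")    # no real difference
```

The crude analysis credits the new procedure with a 9-percentage-point mortality advantage that vanishes once baseline risk is accounted for; a propensity-score analysis pursues the same goal, but only succeeds if the relevant risk factors have actually been recorded.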
Finally, causal inferences established in a non-randomised study are judged weaker than those identified in an RCT and need cautious interpretation.39 There are several examples of established surgical practices—previously validated with non-randomised studies—that have been discontinued after testing in a large RCT (eg, extracranial to intracranial bypass,40 carotid endarterectomy,41 lung volume reduction surgery42). Since recent advances have tended to be more subtle, the need for RCTs should increase—ie, the smaller the difference in outcome, the greater the need for an RCT.
We need to recognise that many surgical interventions are complex and require appropriate evaluation.43 Surgical interventions, like other non-pharmacological interventions such as therapist-based and educational interventions, consist of several components that cannot be separated.44 This situation contrasts with most pharmacological interventions, which can be readily defined and standardised. Although the surgical procedure itself requires attention, a surgical intervention can depend on many health-care professionals and involve other aspects of health-care delivery in ways that a pharmacological intervention does not (figure).
A surgical procedure is mainly delivered by a surgeon and is affected by characteristics such as surgical skill, decision making, preferences, and experience. The delivery of a surgical intervention also depends on the other members of the team (eg, anaesthetists, nurses, technicians) and preoperative and postoperative management (eg, emergency department, imaging services, postoperative recovery ward, intensive care, and rehabilitation programmes). This complexity often receives little recognition in the design of surgical studies. Indeed, its existence is sometimes used to criticise studies of surgical interventions for failing to control for potential confounding factors.45
An example of a typical complex surgical intervention that consists of several interacting components is coronary artery bypass graft surgery (CABG). The aim of this procedure is to revascularise the myocardium by bypassing coronary arteries that are stenosed or blocked. Several steps constitute the surgical procedure: opening the chest; harvesting conduits; attaching (and later detaching) the heart–lung machine; undertaking the anastomoses; reanimating the heart; closing the chest. In the case of CABG, there is limited variation in technique between surgeons.46,47 However, there are many recognised variations in surgical strategy, such as off-pump CABG (avoidance of the heart–lung machine), minimally invasive approaches, and different choices of bypass conduits (eg, bilateral mammary arteries, radial arteries). Some decisions are made intra-operatively (eg, whether additional grafts are needed) and will depend on the judgment of the individual surgeon. Other co-interventions might be used, such as antifibrinolytic agents, insulin, or hypothermia. Preoperative medical care (eg, coronary care unit/cardiology management, medical management of comorbidities, blood bank management), roles of other members of the surgical team (eg, nurses, anaesthetists, perfusionists), and postoperative care (eg, intensive care, acute and chronic cardiac rehabilitation) also vary and affect outcomes.48 These supporting components vary between centres and are affected by infrastructure, staffing, and local policies.
Although an intervention needs to have a coherent aim (or function), different forms are often available.49 The complexity, and potential variability, of a surgical intervention raises two difficult questions for the design of a surgical evaluation for which only general answers can be given. First, when is variation in form substantial enough to be worth assessing? Second, when investigating alternatives, how standardised should they be, in view of the complexity of the steps involved? Continuing the CABG example, does avoidance of the heart–lung machine warrant investigation? If so, how standardised should the off-pump CABG surgical strategy and other steps be? The effect on health services (eg, equipment resources, staff requirements such as training), the potential for a change in the balance of benefits and harms, or consensus among surgeons could justify assessment of alternatives. The degree of intervention definition and the level of standardisation of the new approach will depend on the stage of development and the aim of the evaluation. The amount of information that researchers need to record about the conduct of an intervention will depend on how an intervention is defined and the degree of standardisation sought. Very restrictive approaches could limit surgeon participation and might not be feasible in some centres.
As previously mentioned, attributes of the surgeon, such as surgical knowledge, previous training and experience, and inherent skills, will influence the delivery of a surgical intervention and lead to variability in practice and health outcomes. Variability can be expected irrespective of previous training and experience. Differences between surgeons interact with patients’ differences, affecting the responses to operations. The expectation that all surgeons should attain the ideal, often high level of performance is unrealistic. Evaluations of surgical procedures should therefore be done in realistic settings.
The learning curve for a surgical intervention, whereby surgeons acquire expertise, poses an important challenge. Since the technical and functional success of a procedure is paramount, the early stages of assessment, and thus publication of results, tend to focus on complications.50,51 For example, the rate of bile duct injuries associated with laparoscopic cholecystectomy fell as the surgeons’ experience increased.52 Proxies for operative expertise, such as duration of surgery and amount of blood loss, have been used to assess the impact of learning.51 The effect of the learning process on health outcomes is subject to debate and likely to vary between interventions. For complex operations (eg, radical prostatectomy and laparoscopic hernia repair), learning can continue over a very long time, perhaps hundreds of procedures.9,53
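Cumulative-sum (CUSUM) charts of the kind used in the surgical learning-curve literature track a surgeon's complication record case by case. The sketch below is a minimal version, with invented case data: the statistic accumulates (observed outcome minus target rate), so a rising curve signals excess complications and a falling curve signals performance better than the target.

```python
# Minimal CUSUM sketch for monitoring a surgeon's complication rate
# across consecutive cases (illustrative data, not from any real series).
def cusum(outcomes, target_rate):
    """outcomes: 1 = complication, 0 = none, in chronological case order.
    Returns the cumulative sum of (outcome - target_rate) after each case."""
    curve, s = [], 0.0
    for y in outcomes:
        s += y - target_rate
        curve.append(round(s, 2))
    return curve

# Hypothetical early experience: complications cluster in the first cases
# and become rarer as expertise accumulates.
cases = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(cusum(cases, target_rate=0.10))
```

The early climb followed by a steady decline is the characteristic signature of a learning curve; in practice such charts are drawn with formal control limits, and the target rate itself must be justified (eg, from published benchmarks).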
Evaluation of a new surgical intervention versus an established control has been criticised, owing to a perceived imbalance of experience that favours the established comparator.54 Some have sought to undertake studies that include surgeons who have completed their learning. This strategy is complicated by individual surgeons learning at different rates and the effect of external factors on the learning process itself.55 Trial design could be modified to incorporate individual learning, and studies of surgical innovations should consider the effect of learning. Better recording of surgical training and the experience of participating surgeons would be a step forward. Collection of comprehensive data on new interventions, which requires surgeons to document personal procedure-based learning, would allow a more informative assessment of surgical learning.56
The key questions to address when planning a study of a surgical intervention are: what is the outcome, how should it be measured, who should assess it, and when? The quality assurance literature has used the terms structure, process, and outcomes as suggested aspects for measuring the quality of surgical care.57 Traditionally, surgeons themselves have selected and assessed the outcomes, mainly focusing on short-term clinical measures of technical success and harm. However, such outcomes are often not standardised and therefore not reproducible, which hinders evaluation. For example, a systematic review showed that in 107 studies, there were 56 separate definitions of anastomotic leak at any site after gastrointestinal surgery, precluding comparison of leak rates between studies.58 The absence of standardised (agreed upon) surgical terminology for the definition of clinical outcomes has long been recognised and this has led to the development of methods for grading and classifying deviations from the normal postoperative course, which have been tested, modified, and validated.59 One proposed strategy is to use a validated therapy-oriented classification system for complications, which ranks adverse events by severity with avoidance of confusing terms (panel).59,60 This system could be adjusted to match a clear and consensus definition of postoperative events within specific specialties of surgery.61
Although these surgeon-selected (or physician-centred) clinical outcomes (eg, mortality and morbidity rates) are very important to patients, evaluation of surgery needs to be widened to include the patient’s perspective. Patients’ perceptions—and thus reporting of symptoms and function—can differ from the surgeon’s assessment; what patients judge as important (eg, social, emotional function) might be different from the issues of interest to surgeons. Therefore, studies of surgical interventions require assessment of both clinical and patient-reported outcomes. Typically, this information is captured in questionnaires assessing health-related quality of life. It can be difficult to decide which outcomes are best suited to a particular medical problem. Methods to select and incorporate assessment of health-related quality of life in trials are emerging and better performed studies will produce more reliable data. However, despite the recent interest in this area, there seems to be a gap between measuring health-related quality of life outcomes and using the information to change surgical practice.62 This division might occur because the surgical community does not understand the data or because clinical outcomes are considered paramount. Methods to accurately measure and interpret patient-related outcomes alongside clinical data are needed so that surgeons can effectively evaluate surgery and subsequently inform patients.
There are some situations in which patient-related outcomes are more important than clinical outcomes (eg, palliative surgery, functional outcomes after joint replacement surgery) and capturing these data within well-designed trials is essential. Developing core outcome sets with key clinical, technical, and patient-reported outcomes will help to facilitate the process. Methods to reach agreement about these outcomes have been developed in rheumatology.63 Additionally, economic evaluation is crucial for efficient use of often limited resources.
Grade I: Any deviation from the normal postoperative course without the need for pharmacological treatment or surgical, endoscopic, or radiological interventions. Allowed therapeutic regimens are: drugs such as antiemetics, antipyretics, analgesics, diuretics, and electrolytes, and physiotherapy. This grade also includes wound infections opened at the bedside
Grade II: Requiring pharmacological treatment with drugs other than those allowed for grade I complications. Blood transfusions and total parenteral nutrition are also included
Grade III: Requiring surgical, endoscopic, or radiological intervention
Grade IV: Life-threatening complication (including CNS complications)* requiring intermediate care or intensive-care unit management
Grade V: Death of a patient
Grading system proposed in 2004. The key concept of this scale was that objective severity of a complication could be defined by the treatment it provoked to reverse it, or death. *Brain haemorrhage, ischaemic stroke, subarachnoid bleeding, but excluding transient ischaemic attacks. From references 59 and 60 with permission.
A more comprehensive approach to studying surgical procedures is needed. This approach should use accurate, standardised clinical and patient-reported outcomes, recorded in real time, and whenever possible by an independent observer who is masked to treatment assignment. After the early development of surgical interventions, comprehensive assessment of outcomes is recommended for all other stages of development.3 This approach provides information to allow evidence-based comparisons between different interventions.
The traditional hierarchical system of surgery epitomises eminence-based medicine. This master–student apprenticeship tradition holds that the master has all the knowledge and skill and the student learns by observation and emulation. This approach can prevent new models and information from entering independent practice. Despite attempts to implement change with aggressive knowledge translation methods,64 adoption of best practice guidelines in surgery remains poor without involvement of surgical opinion leaders.65 This might help explain, in part, the slow acceptance of evidence-based surgery, and in particular, RCTs. Meakins’ editorial introducing the first users’ guide for evidence-based surgery did not appear until 2001,66 well after the introduction of users’ guides to the medical literature in 1993.67
The basic principles of clinical epidemiology and biostatistics are familiar to surgeons, but formal training in these specialties is rare. Without the appropriate amount of methodological expertise, it has been difficult to transform the surgical culture to an evidence-seeking profession.68 Research funding agencies have developed programmes to increase research exposure of junior members of the medical faculty, but there is some evidence that surgeons are less likely to apply for funding and are less successful when they do.69 Perhaps as important for improved research is increased recognition of the need for collaboration between surgeons and methodologists, to enable high-quality and clinically relevant studies through the combination of expertise. Surgical and research communities and funding bodies need to recognise this gap in knowledge.
Surgeons must devote a substantial proportion of their career development to their craft, irrespective of whether or not they choose an academic career. Surgical research with an RCT design is not favoured because of the protracted nature of these trials, which, when combined with the obligatory time commitments of the operating theatre, is currently not conducive for rapid career advancement.70 By contrast, many established (funded) faculty development programmes are in place for basic science and medical disciplines in which pharmacological interventions are the predominant focus. Although some funding bodies have provided increasing support to improve surgical research, a disproportionately smaller number of surgeons are working in this specialty.71
Surgical research should generally follow the same ethical and scientific principles as pharmacological research. Worldwide mandatory regulations, such as the International Conference on Harmonisation guidelines, Directives of the European Union, and the regulations of the US Food and Drug Administration, have been developed for assessment of drugs. There are no regulatory procedures for licensing surgical treatments on the basis of high-quality evidence. However, this type of evaluation through assessment bodies has begun to appear in some developed countries.72,73 Unlike pharmacological evaluations, industry funding is limited and financing of research by health-care funding agencies is greatly needed. Whether a regulatory framework and an agency for surgical innovation would make a difference to the quality of surgical research is speculative.
Rigorous evaluation of new surgical interventions, although difficult, is achievable and necessary. The complexity of surgical procedures makes it difficult, if not generally impossible, to mirror some aspects of pharmacological research. This shortcoming has contributed to uncertainty about the risk of biases and has led to scepticism about the value of surgical research. Although much criticism is aimed at RCTs of surgical procedures, few of the challenges apply only to this type of study design; an RCT should be the default choice for an evaluation. A greater understanding of the processes of evaluation in surgery could lead to more high-quality studies. Surgery does not lack evaluative research. What it does not have are accepted guidelines for generating valid evidence: systematic, well-planned and conducted, and meticulously reported evidence, on which surgical practice can be based.
The Balliol Colloquium has been supported by Ethicon UK with unrestricted educational grants and by the National Institute of Health Research Health Technology Assessment Programme. The Balliol Colloquium was administratively and financially supported by the Nuffield Department of Surgery at the University of Oxford and the Department of Surgery at McGill University. JAC holds a Medical Research Council UK special training fellowship. The University of Aberdeen’s Health Services Research Unit is core funded by the Chief Scientist Office of the Scottish Government Health Directorates. IB is supported by a grant from the Société Française de Rhumatologie and Lavoisier Program (Ministère des Affaires Etrangères et Européennes). PLE is a DPhil Candidate in Evidence-Based Health Care at Oxford University.
This is the second in a Series of three papers on surgical innovation and evaluation
Conflicts of interest We declare that we have no conflicts of interest.
Patrick L Ergina, Department of Surgery, McGill University Health Centre, Montreal, QC, Canada; Oxford International Programme in Evidence-Based Health Care, Balliol College, Oxford University, Oxford, UK.
Jonathan A Cook, Health Services Research Unit, University of Aberdeen, Aberdeen, UK.
Prof Jane M Blazeby, Department of Social Medicine, University of Bristol, Bristol, UK.
Isabelle Boutron, Centre for Statistics in Medicine, University of Oxford, Oxford, UK.
Prof Pierre-Alain Clavien, Department of Surgery, University Hospital Zurich.
Prof Barnaby C Reeves, Department of Clinical Science.
Christoph M Seiler, Department of General, Visceral and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany.