The quality and quantity of randomised trials of surgical techniques are acknowledged to be limited. According to Peter McCulloch and colleagues, however, some aspects of surgery present special difficulties for randomised trials. In this article they analyse what these difficulties are and propose some solutions for improving the standards of clinical research in surgery.
The improvement in the quality of clinical research in the past decade is to be welcomed, but it carries its own dangers. Some have extrapolated the advantages of the randomised controlled trial (RCT) into the dogma that it is the only valid method for comparing treatments,1 ignoring the difficulties that have hampered the use of RCTs in some disciplines. The RCT has theoretical advantages over other study designs, but experimental studies comparing treatment effect estimates in randomised and non-randomised studies have not consistently confirmed this,2,3 w1-w3 and the superiority of RCTs should not therefore be accepted as axiomatic.
Small, poorly conducted RCTs are more likely to result when RCTs are difficult to conduct, and these may then be misleading because their design affords them unwarranted credibility. Surgery seems to be such an area. Until recently, most studies of operations were retrospective case series, with RCTs accounting for less than 10% of the total.w4-w6 RCTs declined from 14% of research articles in the British Journal of Surgery in 1985 to 5% in 1992.4,5 Treatments in general surgery are half as likely to be based on RCT evidence as treatments in internal medicine.6,7 Methodological quality was poor in 56% of RCTs comparing cancer surgery techniques.8 Only 58% of these studies described satisfactory randomisation, and few significant outcome differences were found, probably because of type II statistical errors.
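The link between small trials and type II errors can be made concrete with a rough power calculation. The sketch below uses the usual normal approximation for comparing two proportions; the function name `power_two_proportions`, the complication rates, and the trial sizes are illustrative assumptions and are not taken from the studies cited above.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_arm: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with equal arm sizes. The negligible chance of a
    significant result in the wrong direction is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null hypothesis (pooled proportion).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    # Standard error under the alternative (separate proportions).
    se_alt = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return z.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical scenario: complication rate falls from 20% to 10%.
    for n in (50, 100, 200, 400):
        power = power_two_proportions(0.20, 0.10, n)
        print(f"n per arm = {n:4d}  power = {power:.2f}  "
              f"type II error = {1 - power:.2f}")
```

Under these assumed rates, a trial with a few dozen patients per arm would miss a halving of the complication rate more often than it would detect it, which is one way in which a non-significant result from a small surgical RCT can mislead.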
Why is surgery so deficient? Some of the obstacles militate against all scientific studies, but in view of previous specific criticism,w7 we focus on randomised trials and try to evaluate the problems and suggest potential solutions.
History did not favour the validation of surgery by RCTs. After the invention of anaesthesia and antiseptic techniques, surgical treatments were rapidly developed for many previously untreatable conditions. Many current operations were therefore introduced well before randomised trials became established in medicine—unlike most modern drugs. Once a treatment is accepted as standard, testing it against placebo becomes difficult. Rarely, treatment benefits are so obvious that a trial would clearly be unethical,9 but often lack of equipoise (see below) simply prevents studies. This problem applies equally to old drugs—for example, digoxin—which are also difficult to study in RCTs using placebo. For fields such as cardiac surgery, transplantation, orthopaedics, and neurosurgery, however, which have developed rapidly since 1950, surgeons cannot fall back on history to explain the lack of rigour in surgical research.
Doctors can be tempted to ignore evidence that threatens their personal interests. Objectivity about procedures central to a surgeon's reputation is difficult, and RCTs may seem threatening. Private sector competition may affect surgeons particularly strongly, and it arguably influenced the introduction of laparoscopic cholecystectomy. A 1994 consensus conference10 quoted many reports of increased bile duct injuries and only two RCTs.11,12 The benefits that these trials showed were not overwhelming when set against this evidence of possible harm, but further RCTs were declared infeasible because the technique was already so widespread. Surgeons' eagerness to learn the operation seemed related more to commercial concerns than to concern for patients.
Other doctors regard surgeons as making up in self confidence for what they lack in patience, a stereotype containing a kernel of truth. Career surgeons are selected for traits that include comfort with making important clinical decisions quickly with incomplete information. This quality, required for decisive action during operations, may make it difficult for them to be consciously uncertain which of two treatments is better. This state of equipoise, however, is a prerequisite for performing RCTs.
Lack of funding and infrastructure are real and major problems for surgical trials.w8 The difficulty is partly self inflicted, as funding bodies are influenced by the poor quality of much previous surgical research.w9
Subjectively, surgeons' knowledge of clinical epidemiology remains poor despite relevant publications in surgical journals,w10-w17 but we have no objective evidence that they receive less specific education than other doctors.13 w15 Surgeons recruit patients for cancer chemotherapy trials14 w18 but less readily for trials of surgical technique. Whether lack of education can explain this is unclear.
Emergency surgery often occurs outside normal working hours and involves urgent lifesaving treatment, making consent and randomisation difficult. Uncommon conditions are difficult to investigate when accrual of patients takes over two years.13
Some authors suggest that RCTs of new operations should begin with the first patient.15 w19 Operations, however, are complex procedures, and quality in performance requires frequent repetition over time. Learning curves of similar lengths are reported for disparate operations.16,17 w20 During the learning curve, errors and adverse outcomes are more likely. Randomising between a familiar and an unfamiliar operation therefore introduces bias against the latter, as observed for gastrectomy.18 This problem for surgical RCTs has few parallels in drug trials.
Variations on an operation are common and may influence success rates. When comparing operations, clear definitions are therefore needed of the limits on acceptable technical variation. A standard description may be necessary, proscribing all modifications. If definitions are not precise, the treatments delivered may overlap, whereas in drug trials, treatments are usually simple to define exactly.
The technical quality of operations undoubtedly affects outcome. Poor quality surgery represents failure to deliver the intended treatment, causing a difference between efficacy and effectiveness. Trials then measure deliverability, not efficacy.w21 Quality control failures may narrow important differences in the surgery received—for example, for gastric cancer19,20—and may influence outcomes.w22 w23 Defining and enforcing minimum quality standards may be difficult for surgical trials.
RCTs consume substantial resources and are therefore not justified for some questions about small modifications to treatments. Surgical technique typically progresses via such modifications, which individually are unlikely to produce detectable benefits, but which collectively may do so. During the historical progression through hand washing via the use of antiseptics to the aseptic surgical environment, the change in morbidity from surgical infection was huge, but the increment with each step was small enough to allow persistent scepticism.21 Small randomised trials of components of this progression showed no benefit.22 w24 If a positive RCT were required before adopting each small improvement, most would be rejected, and progress would be slowed. RCTs are appropriate where a clear, clinically important choice exists between contrasting alternatives. For smaller changes, an industrial paradigm may be needed.
Three types of RCT are commonly described as “surgical.” Type 1 trials—standard RCTs comparing medical treatments in surgical patients—account for 75% of “surgical trials.”23 Type 2 trials—comparing surgical techniques—pose the problems described above. Type 3 trials—comparing surgical and non-surgical treatments—pose particular difficulties with the equipoise of patientsw25: patients often reject RCTs because they do not wish their treatment to be decided by chance.w26 Type 3 trials increase this discomfort because the adverse effects of the options often differ enormously and the surgical option is irreversible. Eighty two per cent of problems preventing type 3 trials are related to patients' equipoise.13 Examples of choices include aspirin versus carotid endarterectomy to prevent embolic stroke24 and goserelin versus castration for prostate cancer.25 w27 Such trials may recruit slowly, or select an unusual subgroup of patients, making them impractical or their results difficult to generalise.w28
Blinding is particularly difficult in surgical trials, although creative solutions—such as the use of standardised wound dressings—can succeed.w29 Only a third of surgical trials examined by Solomon et al had adequate blinding of patients and/or surgeons.23
History—A comprehensive review of the evidence base is needed to indicate areas warranting new trials of old techniques.
Commercial competition and personal interests may be less obstructive in a framework of comprehensive continuous performance evaluation (see below).
Surgeons' lack of equipoise, if confirmed, may need to be accommodated by including parallel, non-randomised, preference arms alongside RCTs.
Funding and infrastructure problems require a change to a culture of cooperation rather than competition. This would facilitate the creation of large groups to perform specific trials, thereby attracting funding and developing the infrastructure. This change would require support from bodies responsible for funding clinical research.
Lack of education in clinical epidemiology needs to be investigated and, if necessary, corrected through the bodies responsible for postgraduate surgical education and training.
Emergency surgery and uncommon conditions will always be challenging areas for RCTs, but have been successfully studied in other disciplines.26 w30 Paediatric oncologists have illustrated the enormous value of cooperation through their success in trials on childhood leukaemia.27 w31
The learning curve needs to be recognised and evaluated using appropriate statistical techniques.28 Trial methodology will need modification, for example to show completion of the curve before beginning randomisation,w32 as in two recent trials.29,30 In theory, patients could also be randomised not to operations but to surgeons, who would perform their operation of preference, although this option remains untested in practice.
Precisely defined photographic or video evidence and/or pathological specimens could document the nature and quality of the treatment delivered, as in a recent trial of total mesorectal excision in rectal cancer.31 Norms for pre-trial success rates and complications could provide a basis for defining acceptable quality, making reliable surgical audit data essential for participation in RCTs.
Surgeons should adopt industrial quality assessment techniques to evaluate changes in technique where RCTs are inappropriate.32 The Japanese term “kaizen” defines an evaluative system akin to the classical audit loop.w33 Sequential approaches such as CUSUM33 and the “control curve”32 are also applicable to surgical innovation.
Problems with patients' equipoise in type 3 trials may be helped by decision analysis techniques w34 and carefully designed composite end points w35 to reflect the contrasting possible outcomes of trial arms.
Blinding will always be difficult for surgical treatments,34 but blinded observers should be used routinely for evaluating outcomes.w36
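Of the solutions listed above, the sequential approaches such as CUSUM lend themselves to a short illustration. The sketch below is a generic one-sided tabular CUSUM on binary outcomes and is not necessarily the variant described in the cited references; the function name `cusum_failures`, the reference value `k`, the decision limit `h`, and the outcome series are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def cusum_failures(outcomes: Iterable[int], k: float,
                   h: float) -> Tuple[List[float], Optional[int]]:
    """One-sided tabular CUSUM for consecutive binary outcomes.

    outcomes: 1 for an adverse event, 0 for an uneventful case.
    k: reference value subtracted for every case, set between the
       acceptable and the unacceptable adverse event rates (assumed).
    h: decision limit; the chart signals when the sum exceeds it (assumed).
    Returns the CUSUM path and the zero-based index of the first signal,
    or None if the chart never signals.
    """
    s = 0.0
    path: List[float] = []
    first_signal: Optional[int] = None
    for i, x in enumerate(outcomes):
        # Each adverse event raises the sum by (1 - k); each uneventful
        # case lowers it by k, but never below zero.
        s = max(0.0, s + x - k)
        path.append(s)
        if first_signal is None and s > h:
            first_signal = i
    return path, first_signal


if __name__ == "__main__":
    # Hypothetical series of 20 consecutive operations.
    series = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0]
    path, signal = cusum_failures(series, k=0.2, h=2.0)
    print("CUSUM path:", [round(v, 1) for v in path])
    print("first signal at case index:", signal)
```

A signal from such a chart would not in itself show that a change in technique is harmful. In the framework proposed here it would be a prompt for audit and for deciding whether a formal trial is needed.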
This analysis of the problems shows why current practices are not working. We need a framework that reflects the difficulties of evaluation in surgery.
The baseline for the scientific study of surgery is routine collection of comprehensive data about practice and outcomes. The culture and organisation necessary for this should permit easy participation in trials; where they are absent, trialists have to develop the trial infrastructure and run the trial at the same time. Surgeons need the resources to record a meaningful audit dataset, which entails considerable investment in data acquisition and management.
Systems for continuous quality control, using instruments such as CUSUM, CRAM or VLAD plots33,35,36 or control curves32 should be used for the analysis of technical innovations. Indications of outcome changes from this surveillance should lead to an audit or kaizen assessment, using decision analysis techniques to determine whether an RCT is warranted.w37 Where it is not, continuing prospective data collection and regular re-evaluation using bayesian analysisw38 provide the best available data on outcome changes and allow reconsideration of the need for an RCT.
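The regular re-evaluation of prospective data by bayesian analysis can be sketched with a simple beta-binomial model, in which each new batch of audit data updates a posterior distribution for the adverse event rate. This is a minimal sketch under stated assumptions: the uniform Beta(1, 1) prior, the class `BetaPosterior`, the function `prob_lower_rate`, and all the counts are hypothetical and do not come from the cited work.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta distribution for an adverse event rate (uniform prior assumed)."""
    a: float = 1.0  # prior pseudo-count of adverse events
    b: float = 1.0  # prior pseudo-count of event-free cases

    def update(self, events: int, cases: int) -> "BetaPosterior":
        """Return the posterior after observing a new batch of cases."""
        return BetaPosterior(self.a + events, self.b + cases - events)

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


def prob_lower_rate(new: BetaPosterior, old: BetaPosterior,
                    draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate with new technique < rate with old)."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(new.a, new.b) < rng.betavariate(old.a, old.b)
        for _ in range(draws)
    )
    return wins / draws


if __name__ == "__main__":
    # Hypothetical audit data, re-evaluated as each batch accrues.
    old = BetaPosterior().update(events=20, cases=100)
    new = BetaPosterior()
    for events, cases in [(2, 30), (1, 30), (2, 40)]:
        new = new.update(events, cases)
        print(f"after {cases} more cases: posterior mean = {new.mean:.3f}, "
              f"P(new < old) = {prob_lower_rate(new, old):.3f}")
```

Because such data are not randomised, a high posterior probability reflects only the observed rates and cannot rule out bias from case selection. It is offered here as an input to the decision on whether an RCT is warranted and not as a substitute for one.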
When RCTs are necessary, they should routinely be preceded by preliminary phase 2S (phase 2 surgical) studies. These would develop satisfactory definition criteria for the procedure, test measures of surgical quality, define suitable end points, estimate the required sample size, and analyse the learning curve of participants. Such studies would reduce the problems of timing surgical RCTs, and randomisation could be introduced early using “tracker” designs if desired.w39 During randomised data entry, continuous quality control should be linked to preplanned interim analyses by the trial review committee and appropriate stopping rules. Objective validation of quality should evaluate images, pathological specimens, and outcome data against criteria drawn up in the phase 2S study. Parallel preference arms may be used to improve overall power and evaluate generalisability. For type 3 trials, end point design and decision analysis tools to help patients understand their choices may be important.
Historically, the surgical literature is poor in RCTs. Meta-analysis of non-randomised evidence should therefore be used wherever appropriate. Where RCTs are difficult for sound reasons, prospective non-randomised designs that minimise known biases should be considered sympathetically by journals and funding bodies.
The substantial obstacles to RCTs of surgical techniques should be recognised. Alternative methods of studying operations should be based on comprehensive prospective audit data. Where RCTs are appropriate they require attention to the issues of the learning curve, intervention definition, and quality control; a preliminary non-randomised phase is also recommended.
This work was partly inspired by interactions with members of the Cochrane Non-randomised Studies Methodology Group and by the activities of its surgical subgroup. We thank Laurent Audige and Barney Reeves in particular for their helpful criticisms. The final article is the responsibility of the authors and not of the surgical subgroup.
Competing interests: PMcC and DG are members of the Cochrane Non-randomised Studies Methodology Group and its surgical subgroup. PMcC is a member of the Centre for Evidence Based Medicine and is paid to facilitate at its Oxford teaching courses once a year.
References cited in the text with the prefix “w” are available on bmj.com