Randomized controlled trials for antipsychotic drugs have a variety of design features suited to diverse purposes. Efficacy (or explanatory) trials seek to establish whether a drug can reduce psychotic symptoms under ideal circumstances. To isolate drug effects, researchers enroll carefully selected patients, and specialized research personnel use symptom rating scales that are sensitive to drug effects as the primary outcomes. Large simple trials (LSTs) are conducted in typical treatment settings with usual clinical personnel and enroll large numbers of participants so that small but clinically important differences between treatment options can be detected. LSTs focus narrowly on clearly defined, patient-oriented outcomes. To some extent, practical trials can be conceptualized as hybrids of efficacy trials and LSTs. Practical trials provide independent evidence to inform decision makers about the everyday effectiveness of clinically relevant alternative interventions. Practical trial researchers enroll a heterogeneous population of patients and collect data on a broad range of meaningful health outcomes in many types of practice settings intended to represent usual treatment. The designers of practical trials make trade-offs among internal validity, external validity, the breadth of issues addressed, and the ability to detect small differences. The different objectives of trials should be considered when interpreting the complete body of randomized evidence on antipsychotic drugs.
Randomized controlled trials (RCTs) remain the gold standard for establishing evidence to support clinical and policy choices. RCTs comparing antipsychotic drugs, however, have produced conflicting results and interpretations,1–3 and many studies have been criticized for failing to address critical issues adequately.4–6 This article aims to delineate the features of certain kinds of RCTs relevant to antipsychotic treatment of schizophrenia and to demonstrate that diverse trials may have complementary purposes that together can provide a broad understanding of antipsychotic drug effects.
Efficacy or explanatory trials are usually conducted by a drug's developer to meet the requirements of drug regulatory agencies for marketing approval. These highly standardized, short-term studies compare a new drug with placebo, an active comparator, or both and usually make no claims about a drug's long-term effectiveness. Safety, efficacy, and speed are critical priorities because companies seek rapid approval for a new drug.
Large simple trials (LSTs) are new for antipsychotic drugs and psychiatry but are widely used in other medical fields. These trials enroll large numbers of patients in typical practice settings and randomly assign treatments, but they use few specialized research procedures and collect only minimal data focused on a critical outcome. Prominent LSTs in cardiology, eg, the Clopidogrel and Metoprolol in Myocardial Infarction Trial (COMMIT)7,8 and the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT)9 investigations, used mortality or stroke as primary outcomes. These influential studies have helped determine the relative benefits and risks of treatments for hypertension and acute myocardial infarction.
Practical or pragmatic trials are also relatively new to psychiatry, but several examples, including the recent Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE),10–14 Rapid Tranquilization Clinical Trial (TREC),15–17 and Cost Utility of the Latest Antipsychotic Drugs in Schizophrenia Study (CUtLASS)18,19 investigations, have provided additional information about the relative risks and benefits of antipsychotic drugs. Practical trial designs can vary considerably in their complexity, but their overall goal is to provide comparative treatment information that can be applied broadly to relevant patients.
In this article, we describe the typical characteristics and goals of efficacy trials, LSTs, and practical trials and then use contemporary examples involving antipsychotic drugs to illustrate the merits and weaknesses of each type of trial and what can be learned from it. Table 1 shows typical characteristics of these trials, and Table 2 provides examples.
Efficacy (or explanatory) trials seek to establish whether a drug can work under ideal circumstances.20 Developers of new antipsychotic drugs use efficacy trials to establish a drug's capacity to reduce psychotic symptoms. The major goals in the design of efficacy trials are to ensure that any treatment effect is likely to be detected and to reduce the background “noise” caused particularly by interrater variation in the assessment of outcomes and by comorbidity, ie, to maximize the signal-to-noise ratio. To achieve these aims, researchers conduct these studies in specialized settings with trained personnel and use symptom rating scales that are sensitive to drug effects as the primary outcomes.20 A candidate antipsychotic is commonly compared both with placebo and with a known antipsychotic; this procedure helps to confirm “assay sensitivity”, ie, that the study design and procedures are adequate to detect a difference between a drug known to be effective and placebo.21
Trained research personnel recruit patients experiencing an acute episode of schizophrenia and conduct specialized assessments to ensure that diagnostic and illness-severity criteria are met. Most individuals with medical illnesses, substance use disorders, or additional psychiatric diagnoses are excluded, both to reduce the likelihood of serious adverse events and to limit the variance in outcomes. To protect against ascertainment and performance biases, treatments are assigned randomly under double-blind conditions, so that neither the patient nor the research team providing clinical care and conducting outcome assessments knows the treatment assignment. Typical efficacy trials are designed to follow the clinical course of participants for 6–8 weeks. Trained personnel conduct frequent assessments of symptoms and side effects using psychometrically validated instruments. The primary outcome is typically improvement on a symptom rating scale, such as the Positive and Negative Syndrome Scale (PANSS). Response, or improvement by some prespecified amount on a symptom rating scale, is a common secondary outcome. Known side effects and common laboratory parameters are measured systematically, while other adverse events are reported spontaneously if and when they occur.
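Random assignment under double-blind conditions is usually driven by a pre-generated allocation schedule that is held by someone outside the clinical and rating team. The sketch below shows one common approach, permuted-block randomization, purely as an illustration: the text does not say how any of the trials discussed generated their schedules, and the function name, block layout, and arm labels are hypothetical.

```python
import random
from typing import List, Sequence


def permuted_block_schedule(
    arms: Sequence[str], n_blocks: int, per_arm: int = 1, seed: int = 0
) -> List[str]:
    """Hypothetical permuted-block allocation list.

    Each block contains every arm exactly `per_arm` times in random order,
    so group sizes stay balanced as enrollment proceeds. In a blinded trial
    the labels would map to coded medication kits, and that mapping would be
    kept away from patients, clinicians, and raters.
    """
    rng = random.Random(seed)  # fixed seed makes the schedule reproducible for auditing
    schedule: List[str] = []
    for _ in range(n_blocks):
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)  # random order within this block only
        schedule.extend(block)
    return schedule
```

For example, `permuted_block_schedule(["A", "B", "C", "D"], n_blocks=3)` yields 12 assignments in which each consecutive group of four contains every arm once. Small fixed blocks keep arms balanced but make the next assignment easier to guess, which is one reason trials often vary the block size.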
Efficacy trials efficiently meet the requirements of drug-licensing agencies to show that a drug works and is safe to use. They suit drug developers who want to complete trials rapidly, because the high cost of using many highly trained and geographically dispersed research centers can be offset by a shorter time to market and earlier profitability. However, efficacy trials do not reveal much about how a new drug works in typical settings, with patients who may have medical and psychiatric comorbidities and who may take other medications. They do not tell us about long-term safety or about the effects of drug therapies on mortality, ability to work, or other issues that are important to patients.
“Efficacy and safety of paliperidone extended-release tablets: results of a 6-week, randomized, placebo-controlled study” is a recent, typical example of an efficacy trial.22 Janssen (Titusville, NJ) conducted this phase 3 trial to meet regulatory requirements for approval of paliperidone as a treatment for acute schizophrenia. The trial compared 2 fixed dosages of paliperidone (6 and 12 mg daily) with placebo and olanzapine 10 mg daily for up to 6 weeks under double-blind conditions. Change in symptoms as measured by the PANSS total score was the primary outcome. Clinical response was defined as a 30% reduction in PANSS total score.
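Response as defined in this trial is a simple function of the baseline and end point PANSS totals. The sketch below assumes that the reduction is measured relative to the baseline total and that reaching the 30% cutoff exactly counts as a response; both are assumptions, since the text states only "a 30% reduction", and the function names are hypothetical.

```python
def percent_reduction(baseline: float, endpoint: float, floor: float = 0.0) -> float:
    """Percentage reduction in a symptom score from baseline (positive = improvement).

    `floor` is the lowest possible score. With floor=0 this is the plain
    relative change; some analysts set floor=30 for the PANSS total, whose
    30 items are each scored from 1 to 7, so that 100% means no symptoms.
    """
    if baseline <= floor:
        raise ValueError("baseline must exceed the scale floor")
    return 100.0 * (baseline - endpoint) / (baseline - floor)


def is_responder(baseline: float, endpoint: float,
                 threshold: float = 30.0, floor: float = 0.0) -> bool:
    """Responder if the reduction reaches the prespecified threshold."""
    return percent_reduction(baseline, endpoint, floor) >= threshold
```

For a patient whose PANSS total falls from 100 to 70, `percent_reduction(100, 70)` is 30.0, so `is_responder(100, 70)` is true; with `floor=30`, the same change is a reduction of about 42.9%. The choice of convention can therefore change who counts as a responder.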
Individuals aged 18 or older who were experiencing an acute episode of schizophrenia, as indicated by a PANSS score of 70–120 (representing moderate to severe, but not mild or extremely severe, psychopathology), were eligible.22 Patients met DSM-IV criteria for schizophrenia and agreed to at least 14 days of hospitalization. Extensive exclusion criteria included the following: substance dependence in the previous 6 months; medical conditions that might affect drug pharmacokinetics; a history of tardive dyskinesia or neuroleptic malignant syndrome; significant risk of suicide or violence; pregnancy or breastfeeding; use of antidepressants or mood stabilizers within 2 weeks of screening; a history of any drug sensitivity or of hypersensitivity to olanzapine, risperidone, or paliperidone; or a history of unresponsiveness to antipsychotic drugs.
Researchers at 74 sites with inpatient units in the United States conducted the study.22 They randomly assigned a total of 444 volunteer patients to one of the 4 “treatment” arms. Previous medications, including antipsychotics, antiparkinsonian agents, and beta-blockers were discontinued 3 days before randomization. Patients were assigned to take either paliperidone 6 mg, paliperidone 12 mg, olanzapine 10 mg, or placebo once daily in the morning for 6 weeks. Participants were hospitalized for the first 14 days and assessed at least weekly throughout the trial. Trained raters assessed drug efficacy using multiple measures, including the PANSS, the Clinical Global Impression (CGI) scale, and the Personal and Social Performance (PSP) scale. The study included multiple safety measures, including laboratory evaluations, physical parameters, and rating scales of neurologic side effects.
The primary efficacy analysis was conducted among randomized patients who received at least one dose of study medication and had at least one post-baseline assessment.22 The Last Observation Carried Forward (LOCF) method was used to handle missing data for participants who left the study early. Safety was evaluated among all patients who took at least one dose of double-blind medication.
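LOCF replaces each missing post-baseline assessment with the participant's most recent observed value, so a patient who drops out after week 2 contributes the week 2 score as the week 6 end point. A minimal sketch, assuming visits are stored in chronological order and a missing assessment is represented by `None` (a hypothetical data layout):

```python
from typing import List, Optional


def locf(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed score forward over missing (None) visits.

    Visits before the first observation stay None, because there is
    nothing to carry forward.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled


# Hypothetical weekly PANSS totals for a participant who left after week 2.
weekly = [96.0, 90.0, 84.0, None, None, None, None]
print(locf(weekly)[-1])  # end point used in an LOCF analysis: 84.0
```

LOCF implicitly assumes that a participant's symptoms would have stayed frozen at the last assessment, an assumption that becomes hard to defend when, as in this trial, more than half of each group left before week 6.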
Of the 444 people randomized, fewer than half of the participants in each group completed the 6-week trial (placebo 34%, paliperidone 6 mg 46%, paliperidone 12 mg 48%, and olanzapine 10 mg 45%).22 In the primary report from this study, the authors presented efficacy data only for paliperidone and placebo; they excluded olanzapine from the primary efficacy analyses because it had been included only for assay sensitivity. Nevertheless, they reported results for all treatment arms on the PSP and on many safety measures, including treatment-emergent adverse events.
Using LOCF analyses, the authors reported that both doses of paliperidone were more efficacious than placebo in reducing symptoms measured by the PANSS and that patients taking paliperidone were significantly more likely to meet response criteria.22 Adverse event rates were high in all groups, including placebo and olanzapine (range 73%–79%). Paliperidone was associated with elevated prolactin levels and dose-related neurologic symptoms, although specific effects on akathisia, and on sexual functioning that might be related to high prolactin levels, were not reported. The comparison with olanzapine provided interesting information even though olanzapine was included only for assay sensitivity. Overall, olanzapine 10 mg daily appeared similar in efficacy to both paliperidone 6 and 12 mg daily, and it seemed to be associated with a greater incidence of extrapyramidal side effects than paliperidone 6 mg but a lower incidence than paliperidone 12 mg.
Although the low rate of study completion (no study arm reached 50% completion at 6 weeks) may reflect investigator attitudes toward placebo-controlled studies and a desire to minimize risks to study participants, the large amount of missing data makes it very difficult to draw inferences even about the short-term effects of the drug. An extremely cautious observer might judge the study inconclusive because fewer than half of the people who took paliperidone continued to do so for the scheduled 6 weeks. A less conservative interpretation is that paliperidone appeared better than placebo for reducing symptoms of schizophrenia over a few weeks. Assessment of long-term risks and benefits was not a goal of this trial.
LSTs focus narrowly on clearly defined, patient-oriented outcomes.21 A typical LST outcome, mortality, is discrete and meaningful. Designers of LSTs resist the temptation to collect information on a wide array of outcomes, instead using resources to enroll large numbers of participants so that they can reliably detect relatively small but clinically important differences.20 LSTs are conducted in typical treatment settings with usual clinical personnel. Study procedures are kept simple to minimize both the need for specialized research training and interference with routine clinical care. Inclusion criteria are broad, and exclusion criteria are minimal. The key criterion for study entry is uncertainty about which treatment option is best for the individual participant. Treatments are randomly assigned to avoid selection bias.
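Why LSTs need so many participants follows from the standard normal-approximation sample size formula for comparing two proportions: the number required per arm grows roughly with the inverse square of the difference to be detected. A sketch under that textbook approximation (two-sided test, no continuity correction); the function name is hypothetical and the event rates in the example are illustrative, not figures from any trial discussed here.

```python
import math
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per arm to detect event rates p1 vs p2."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided significance level
    z_beta = z(power)           # desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Illustrative rates: a rare outcome (2.0% vs 2.5%) versus a common one (50% vs 60%).
print(n_per_arm(0.02, 0.025), n_per_arm(0.50, 0.60))
```

Under these assumptions, detecting a half-percentage-point difference in a rare outcome such as death needs roughly 13,800 participants per arm, whereas a 10-point difference in a common outcome needs fewer than 400.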
There are no excellent examples of LSTs involving antipsychotic drugs that have been completed and fully reported. Although the results are not yet available, the best available example is the Ziprasidone Observational Study of Cardiac Outcomes (ZODIAC) trial, which was conducted by Pfizer to satisfy regulatory agencies that wanted to know about the long-term effects of the new drug on mortality.23 Premarketing studies revealed that ziprasidone was associated with prolongation of the electrocardiographic QT interval, which carried a potential increased risk of fatal ventricular arrhythmias. Because even a small increase in mortality would be clinically important, an LST designed to detect effects on mortality was highly appropriate.
Individuals aged 18 or older who were diagnosed with schizophrenia, and whose psychiatrist wanted to start a new antipsychotic and would consider either olanzapine or ziprasidone, were eligible to participate.23 Exclusion criteria were minimal: pregnancy or lactation, a disease with a life expectancy of less than 1 year, and participation in any study involving investigational products within 30 days of study entry.
Researchers at almost 1000 sites in 18 countries conducted the study.23 They randomly assigned more than 18 000 volunteer patients to receive open-label olanzapine or ziprasidone. No other study-related interventions, laboratory testing, or clinical monitoring were required. Minimal baseline data included only demographics, the CGI scale to indicate the severity of illness, cardiac risk factors, and prior antipsychotic use. Simple outcome information, including vital status, continuation of the assigned drug, hospitalizations, and emergency room visits, was obtained at regular intervals from the treating physician or another member of the clinical team. If hospitalization occurred, the hospital records were obtained. Each participant was intended to be followed for 1 year regardless of treatment status. An “Endpoint Committee” that was blinded to treatment assignments adjudicated the endpoints based on all available information.
When the results of the ZODIAC study are available, they will provide important information about mortality associated with both ziprasidone and olanzapine over 1 year of treatment.
Like other LSTs, ZODIAC will provide data on an important, highly focused research question. The study will provide critical information about the relative risk of death among a broad group of patients treated with either olanzapine or ziprasidone. Because ziprasidone is widely promoted and used among individuals with significant risk factors for cardiovascular disease, its effects on mortality are important. Olanzapine is a reasonable comparator because it too is associated with risk factors for cardiovascular disease, although some observers might have preferred comparing ziprasidone with a drug less strongly associated with risk factors for heart disease. Secondary outcomes, including hospitalization rates and discontinuation of the assigned study medication, will provide limited information for those who want to know more about the relative effectiveness of these 2 drugs. The ZODIAC study nicely illustrates the trade-offs of an LST: many subjects are enrolled to answer an important, focused question, but many other relevant health outcomes are left unaddressed. The need to enroll large numbers of participants means that results of LSTs are often not available quickly. For example, the results of the ZODIAC trial, which enrolled more than 18 000 patients and examined relatively long-term outcomes, were still not available several years after the study question was identified and the trial began.
Practical trials are intended to provide independent evidence to inform decision makers about clinical and policy choices related to the risks and benefits of approved treatments.24 Researchers design practical trials to provide high-quality evidence, with high internal and external validity, regarding the everyday effectiveness of clinically relevant alternative interventions.24 To do this, researchers include a heterogeneous population of patients and collect data on a broad range of meaningful health outcomes at many types of practice settings intended to represent usual treatment.24
Practical trials use broad inclusion criteria and minimal exclusion criteria to enhance external validity and thus the credibility of study results for clinicians and patients in typical treatment settings.25 Practical trials compare treatments whose relative outcomes for the individual patient are clinically uncertain, and they use randomization to protect against selection biases.26 Not all practical trials conceal the treatment assignment from patients and study clinicians, but for subjective outcomes determined by raters, blinding of the raters is necessary. The training and personnel requirements of rater blinding place burdens on sites that make it less likely that typical clinical sites, rather than research sites, can participate in these trials. In addition, the desire to examine “a broad range of meaningful health outcomes”, often including service utilization to allow estimation of cost-effectiveness, conflicts with the desire to limit participant and researcher burden. To some extent, practical trials can be conceptualized as hybrids of efficacy trials and LSTs, with the main trade-offs being in internal validity and the potential for a low signal-to-noise ratio.
The US National Institute of Mental Health (NIMH) initiated the CATIE schizophrenia trial to provide objective information about the long-term effectiveness and cost-effectiveness of antipsychotic drugs in widespread use.10,14 The NIMH supported the study because it had a strong interest in informing policy makers regarding the appropriate use of these commonly used, expensive, and sometimes toxic medications.27
The CATIE schizophrenia trial was a hybrid study that included characteristics of efficacy trials and LSTs to examine the overall effectiveness of antipsychotic drugs for chronic schizophrenia.14 Researchers conducted the study at heterogeneous practice settings, enrolled a relatively large sample of participants, used a discrete primary end point, and attempted to mimic routine clinical practices. However, because CATIE also included double blinding and an extensive array of assessments, many of which required trained clinical raters, all participating clinical sites were required to have specialized research personnel.
Individuals aged 18–65 years with a diagnosis of schizophrenia who were appropriate for treatment with an oral antipsychotic medication were eligible to enroll in the CATIE schizophrenia trial.14 Exclusion criteria were as follows: a diagnosis of schizoaffective disorder, mental retardation, or other cognitive disorders; a history of serious adverse reactions to any of the proposed treatments; a first episode of schizophrenia; persistent severe symptoms despite an adequate trial of one of the proposed treatments; prior treatment with clozapine for treatment resistance; current pregnancy or breastfeeding; and serious and unstable medical conditions. People with medical conditions that were not serious and unstable, and people with substance use disorders, were included.
The CATIE schizophrenia trial was conducted at 57 clinical sites representing the array of clinical settings where people with schizophrenia receive treatment in the United States.10 The study enrolled nearly 1500 participants and randomly assigned them to receive olanzapine, perphenazine, quetiapine, risperidone, or ziprasidone under double-blind conditions.14 Patients had monthly study visits. Medications were dosed flexibly and, except for additional antipsychotic agents, other medications were permitted throughout the trial. Patients continued on this first treatment for 18 months or until treatment was discontinued for any reason (phase 1).
Participants who did not respond to or tolerate the first assigned medication could enter subsequent phases of the trial. If the phase 1 treatment was perphenazine, patients then received randomized, double-blinded treatment with a second-generation antipsychotic (phase 1B).14 If patients in phase 1B again discontinued treatment, then they entered phase 2. In phase 2, patients could choose between 2 randomization pathways. An “efficacy” pathway (phase 2E) was recommended to individuals who discontinued the previous treatment due to inadequate symptom relief. A “tolerability” pathway (phase 2T) was recommended to individuals who discontinued the previous treatment due to intolerability. If study participants discontinued either of the phase 2 studies they could enter phase 3, which allowed open choice of one of 9 treatment options.
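The phase sequence described above amounts to a small routing rule. The sketch below encodes it as stated in the text; the function and enum names are hypothetical. It treats the recommended 2E or 2T pathway as the default when the participant states no choice, and it raises an error for discontinuation for other reasons, because the text does not say which pathway, if any, was recommended in that case.

```python
from enum import Enum
from typing import Optional


class Reason(Enum):
    INEFFICACY = "inadequate symptom relief"
    INTOLERABILITY = "intolerability"
    OTHER = "other"


def next_phase(current: str, phase1_drug: str, reason: Reason,
               choice: Optional[str] = None) -> str:
    """Return the phase a participant enters after discontinuing `current`."""
    # Perphenazine in phase 1 leads to a randomized second-generation drug (1B).
    if current == "1" and phase1_drug == "perphenazine":
        return "1B"
    if current in ("1", "1B"):
        if choice is not None:
            if choice not in ("2E", "2T"):
                raise ValueError("phase 2 pathway must be '2E' or '2T'")
            return choice  # participants could choose either pathway
        if reason is Reason.INEFFICACY:
            return "2E"  # recommended efficacy pathway
        if reason is Reason.INTOLERABILITY:
            return "2T"  # recommended tolerability pathway
        raise ValueError("no recommended pathway; a choice of '2E' or '2T' is needed")
    if current in ("2E", "2T"):
        return "3"  # open choice among 9 treatment options
    raise ValueError(f"no later phase is defined after phase {current!r}")
```

Writing the flow this way makes the branching explicit, including the point at which participant preference, rather than randomization, determines the pathway.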
The CATIE investigators selected time until all-cause treatment discontinuation, meant to reflect the overall effectiveness and acceptability of the medications, as the primary outcome.14,28 They chose medication discontinuation because it integrates patient and clinician judgments of efficacy, safety, and tolerability into a discrete, global measure of effectiveness. In addition, treatment discontinuation conceptualized as evidence of ineffectiveness avoids the problem of missing data by framing the outcome so that missing data are actually part of the outcome.21 Secondary outcomes included the reason for treatment discontinuation and changes in PANSS and CGI scores.14 Safety and tolerability outcomes included adverse events, changes in weight, measures of neurologic side effects, and laboratory analyses of parameters related to lipid and glucose metabolism. Other important outcomes included effects of the drugs on psychosocial and neurocognitive functioning. Health services use was carefully tracked to allow comparison of the cost-effectiveness of the study medicines.
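Time to all-cause discontinuation is a time-to-event outcome: participants who stop the assigned drug for any reason contribute an event, and those still taking it at the end of follow-up are censored. The Kaplan–Meier estimator is a standard way to summarize such data. The sketch below is a generic implementation with hypothetical data and a hypothetical function name, not the CATIE analysis.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], stopped: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the proportion still on the assigned drug.

    times[i]   -- months until discontinuation or end of follow-up
    stopped[i] -- True if participant i discontinued (event), False if censored
    Returns (time, estimate) pairs at each time with at least one event.
    """
    if len(times) != len(stopped):
        raise ValueError("times and stopped must have the same length")
    data = sorted(zip(times, stopped))
    at_risk = len(data)
    estimate = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            leaving += 1
            i += 1
        if events:
            estimate *= 1 - events / at_risk
            curve.append((t, estimate))
        at_risk -= leaving  # events and censorings both leave the risk set
    return curve
```

For six hypothetical participants with `times=[2, 4, 4, 6, 18, 18]` and `stopped=[True, True, False, True, False, False]`, the estimate falls to 5/6, 2/3, and 4/9 at months 2, 4, and 6. Treating every discontinuation as an event is what lets this outcome absorb dropouts rather than leave them as missing data.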
Almost 1500 individuals with schizophrenia volunteered for the study and took randomly assigned medication.10 Only a minority of patients in each group (18%–36%) remained on the first assigned drug for the 18-month duration of the study. The time to all-cause treatment discontinuation was significantly longer with olanzapine than with quetiapine or risperidone. Comparisons of olanzapine with perphenazine and ziprasidone pointed in the same direction but did not reach statistical significance. Substantial weight gain and adverse metabolic changes associated with increased risk of coronary heart disease were most common with olanzapine.
Analyses of neurocognitive functioning found small but significant improvements for all treatment groups, but no difference among them after 2 months of treatment.29 Similarly, modest improvements in psychosocial functioning were evident across all treatment groups, but there were no significant differences between treatment groups.30 Cost-effectiveness analyses also found modest improvement for all groups on several measures of effectiveness over 18 months, but no significant differences between perphenazine and any second-generation antipsychotic.31 Because total monthly health care costs were lower for perphenazine than for second-generation antipsychotics due to lower antipsychotic drug costs, the investigators concluded that initiating treatment with perphenazine was less costly and no less effective than initiating treatment with each of the 4 newer medications.31
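The conclusion that perphenazine was "less costly and no less effective" is a dominance argument from cost-effectiveness analysis. A minimal sketch of that comparison between two strategies, using point estimates only and ignoring sampling uncertainty, which a real analysis must take into account; the function name, return labels, and numbers are hypothetical.

```python
from typing import Optional, Tuple


def compare_strategies(cost_new: float, effect_new: float,
                       cost_ref: float, effect_ref: float) -> Tuple[str, Optional[float]]:
    """Classify a new strategy against a reference using point estimates.

    Returns ("dominant", None) if the new strategy costs no more and is no
    less effective, ("dominated", None) if it costs no less and is no more
    effective, and otherwise ("trade-off", ICER), where the incremental
    cost-effectiveness ratio is the extra cost per extra unit of effect.
    """
    d_cost = cost_new - cost_ref
    d_effect = effect_new - effect_ref
    if d_cost <= 0 and d_effect >= 0:
        return ("dominant", None)
    if d_cost >= 0 and d_effect <= 0:
        return ("dominated", None)
    return ("trade-off", d_cost / d_effect)
```

With equal effectiveness and a lower monthly cost, `compare_strategies(900, 0.6, 1200, 0.6)` returns `("dominant", None)` for the cheaper drug; an ICER is needed only when the cheaper option is also less effective, or the costlier one more effective.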
In phases 1B, 2E, and 2T, outcomes for individuals taking the various drugs depended considerably upon clinical factors.11–13 The CATIE researchers concluded that clinical circumstances affect drug effectiveness and that these circumstances should be considered when making drug choices.
The attempt to satisfy many constituencies by collecting data on many outcomes and to include many treatment arms and randomizations addressing a variety of clinical situations meant that the study design was quite complicated and required trained personnel. Decisions to add ziprasidone to the protocol after recruitment began and to use perphenazine in only one randomized phase of the study generated controversy. The exclusion of people with tardive dyskinesia from possible randomization to perphenazine may have limited the study's ability to detect differences between perphenazine and the newer drugs. The decision to use double-blinded treatments decreased the risk of measurement bias and improved the study's internal validity. However, double blinding increased study costs, decreased the resemblance of the study procedures to those of routine clinical care, and tied the study to relative dose equivalences that have proven controversial. These complexities and the study's large size made it expensive, at least compared with other studies not conducted by pharmaceutical companies.
Some aspects of routine clinical practice that were incorporated into the trial design in order to mimic usual care led to problems interpreting the results and to new information regarding the treatments. For example, because patients entering the study could be randomized to stay on their current treatment and there were different rates of medication use at study entry, the study provided new information about “switching”.32 The researchers found that switching medications affected the primary outcome and that the effect of switching varied according to both the drug being stopped and the one being started.10,32 Another “naturalistic” aspect of the study, which allowed choice between 2 pathways in phase 2, provided information about patient and clinician preferences but led to small numbers of subjects and low statistical power in the efficacy (clozapine) pathway.
RCTs, with the critical feature of randomization to protect against selection biases, can be the source of high-quality evidence to guide various kinds of health-related decisions. In this article, we discussed some types of late phase trials that can be of use to clinicians and other types of decision makers. We have not included a discussion of all types of RCTs, eg, phase 2 or proof of concept studies, which may focus on dose determination or mechanism of action rather than efficacy.
The goals of a study can help guide decisions regarding appropriate study designs and the associated trade-offs. Explanatory or efficacy trials provide highly standardized, systematic data on the short-term efficacy and safety of new drugs; they provide critical information about how a drug can work in ideal circumstances but little information about how it might work in usual treatment settings. LSTs can provide definitive answers to a focused clinical question about the comparative effects of treatments; the results have high external validity because the studies are conducted in actual treatment settings using routine clinical procedures. If researchers can build effective collaborations with clinicians, these enormous studies can be conducted efficiently and cost-effectively. Although LSTs can answer a focused question definitively, they do not answer the multitude of other health-related questions that might be associated with antipsychotic drugs. Practical trials can provide data on many health-related outcomes, but because so much data are collected and study logistics are complex, high costs can limit the number of participants studied and may lead to less than definitive results. Practical trials can provide vast amounts of useful information, but designers of these studies must resist the temptation to address every question that might be asked about a treatment so that they can provide optimal information about the most important issues.
The thorough assessment of any health technology, including pharmaceuticals, should include a systematic appraisal of all the available randomized evidence. Our confidence that a drug is likely to be effective in the real world increases if a treatment effect found in practical trials is comparable to that observed in highly controlled efficacy trials. For example, the treatment effect on cognition observed in the large pragmatic AD2000 trial comparing donepezil with placebo33 was comparable to that estimated in the manufacturer's short-term efficacy trials.34 However, what can we conclude when trials with different design characteristics produce different results? We would argue that one cannot conclude that the results from one kind of trial are necessarily “true” and the results from the other are “wrong”. Rather, the reasons for the apparent heterogeneity between trials should be sought by examining the clinical and design differences between them. If the trials are judged reasonably similar in design and participants, meta-analysis may be possible, in which case statistical tests of heterogeneity can indicate whether the differences between trials are likely to be real or simply due to the play of chance.
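When trials are similar enough to pool, a usual first step is an inverse-variance weighted average of their effect estimates, with Cochran's Q and the I² statistic describing how much the trials disagree beyond what chance alone would produce. A fixed-effect sketch with a hypothetical function name and inputs; a random-effects model is a common alternative when heterogeneity is present.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(effects: List[float],
                      variances: List[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect meta-analysis.

    Returns (pooled estimate, its standard error, Cochran's Q, I-squared).
    I-squared is the share of between-trial variation attributed to
    heterogeneity rather than chance, truncated at 0.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for at least 2 trials")
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, q, i2
```

Q is conventionally compared with a chi-square distribution on k − 1 degrees of freedom (for example via `scipy.stats.chi2.sf(q, df)`) to test whether the trials are more different than chance would allow; the standard library lacks that distribution, so the sketch stops at Q and I².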
The reasons for any material differences between trials can be assessed using meta-regression, a technique that statistically compares the results of trials according to trial-level characteristics.35 For example, a systematic review of industry-sponsored trials comparing second-generation antipsychotics (SGAs) with first-generation antipsychotics (FGAs) found a number of methodological weaknesses in the trials, and, furthermore, a meta-regression strongly suggested that the substantial statistical heterogeneity between trials could be explained by differences in the dose of the FGA used.4 The use of a higher FGA dose was more likely to produce a result favoring the SGA. Although a subsequent meta-analysis using different methods found partially different results,36 the design of CATIE was informed by the meta-regression findings on dose. CATIE largely confirmed the conclusion that there were probably no major differences between the FGA and the SGAs studied on a broad range of clinically relevant outcomes and on cost-effectiveness.10,29–31 This example demonstrates the particular usefulness of practical trials when substantial uncertainty about the comparative effectiveness of drugs remains even after a large body of efficacy trials. The different objectives of efficacy and practical trials should be considered in the interpretation of the complete body of randomized evidence.
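Meta-regression extends the weighted average by regressing each trial's effect estimate on a trial-level characteristic, here the comparator FGA dose. The sketch below is a fixed-effect, single-covariate version using inverse-variance weights; published meta-regressions often add a between-trial variance term, and the function name and data are hypothetical, not those of the cited review.

```python
from typing import List, Tuple


def weighted_meta_regression(doses: List[float], effects: List[float],
                             variances: List[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of effect = intercept + slope * dose.

    Each trial is weighted by the inverse of its effect-estimate variance.
    Returns (intercept, slope).
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, doses)) / total_w
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total_w
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, doses, effects))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, doses))
    if sxx == 0:
        raise ValueError("doses must vary across trials")
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical trials: comparator FGA dose (eg, haloperidol-equivalent mg/day),
# effect favoring the SGA (positive = SGA better), and variance of each effect.
intercept, slope = weighted_meta_regression(
    [5, 10, 15, 20], [0.05, 0.12, 0.20, 0.28], [0.010, 0.015, 0.020, 0.012]
)
print(f"slope per mg/day: {slope:.4f}")
```

A positive slope, as in this made-up example, is the pattern the text describes: trials that used higher FGA doses tended to report results more favorable to the SGA.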
T.S.S. has consulted or spoken at events sponsored by AstraZeneca, Janssen, Lilly, and Solvay, and he was an investigator in the CATIE trial. J.G. has received research funding and support from GlaxoSmithKline and Sanofi-Aventis and has consulted for Bristol-Myers Squibb, TauRx, and GlaxoSmithKline. He has received research funding from the UK Department of Health, the UK Medical Research Council, and the Stanley Medical Research Institute.