Second-generation antipsychotics have attracted practitioners' and policy-makers' attention because of concerns over their health effects and costs. Comparative effectiveness data from the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE), a high-profile National Institutes of Health (NIH)–funded study, have been used to argue for restricting coverage of these costly drugs. But concerns about the design of CATIE and its associated cost-effectiveness analysis, and uncertainty about the precision of its findings, raise questions about this interpretation. Our work suggests that additional research to increase the precision of comparisons of the effectiveness of antipsychotics would be well worth the cost.
Antipsychotic medications are the primary drug treatment for patients with schizophrenia because of their marked efficacy in the treatment of delusions and hallucinations. However, in most patients they are less effective at improving other symptoms of schizophrenia, including cognitive impairment.1 The first-generation antipsychotic medications were introduced in 1954, beginning with chlorpromazine. A major issue, especially with these older drugs, is the risk of "extrapyramidal" symptoms, which include several serious movement disorders. Clozapine, the first antipsychotic drug that almost never produces extrapyramidal symptoms, was approved for limited use in 1990, initiating what is commonly called the second generation of antipsychotic drugs.2
The fundamental clinical difference between the first- and second-generation antipsychotics is the diminished risk of extrapyramidal symptoms in the latter.3 This paper examines the following set of second-generation antipsychotics: olanzapine, quetiapine, and risperidone.4
Despite the lower rate of extrapyramidal symptoms with second-generation antipsychotics, the optimal choice of drug treatments for patients with schizophrenia has been disputed for several reasons: (1) uncertainties surrounding effectiveness in controlling psychotic symptoms, (2) the tendency of some second-generation drugs to produce weight gain and blood-lipid abnormalities, (3) some evidence that second-generation antipsychotics improve cognition, and (4) the costliness of second-generation antipsychotics.5 Crucial questions have gone unanswered regarding the value of second-generation compared to first-generation antipsychotics.
In 1999 the National Institute of Mental Health (NIMH) funded the $42.6 million Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) study to compare the effectiveness of one first-generation antipsychotic, perphenazine, and all second-generation antipsychotics available in the United States. CATIE’s design as a randomized trial that compared these drugs head to head, its large size compared to prior studies, and federal sponsorship made its results highly anticipated.
The initial results from CATIE, published in 2005, reported that olanzapine but not the other second-generation antipsychotics (quetiapine, risperidone, and ziprasidone) was superior to perphenazine in having greater time to discontinuation of the study medication for any cause (primary outcome), and greater time to discontinuation for lack of efficacy (secondary outcome) (time to discontinuation has been suggested as an index of antipsychotic effectiveness, with longer discontinuation times considered to be evidence of superiority).6 Olanzapine was, however, associated with a shorter time to discontinuation for weight gain, hyperglycemia, and other adverse effects, while perphenazine was associated with faster discontinuation because of extrapyramidal symptoms. The article concluded that perphenazine could not be rejected as an inferior treatment.
In 2006, a cost-effectiveness analysis (CEA) based on CATIE concluded that “treatment with perphenazine was less costly than treatment with second-generation antipsychotics with no significant differences in measures of effectiveness.”7 In light of this analysis, some CATIE investigators have questioned the results of previous studies that favor second- over first-generation antipsychotics.8 Some have suggested that this discrepancy might reflect the influence of industry sponsorship of these trials, although recent meta-analyses of these studies have failed to support this hypothesis.9
Regardless, investigators disagree about the implications that CATIE results should have for access to the second-generation antipsychotics. Several CATIE investigators, including the authors of the CATIE cost-effectiveness study, have stated that the study was not designed to directly answer questions of policy regarding access to antipsychotic drugs; others have suggested that the CATIE results establish that it is wasteful to use public funds to pay for second-generation antipsychotics, a perspective that has been adopted by some influential media outlets and pharmacy benefit managers (PBMs).10
Despite these differences in interpretation, there are no efforts under way to perform large-scale comparative effectiveness studies of second-generation antipsychotics that would address CATIE's major limitations.11
These issues raise questions about how comparative cost-effectiveness results should be designed, interpreted, and used to make public policy decisions. These questions are also timely because, since the introduction of the Medicare Prescription Drug, Improvement, and Modernization Act (MMA), there have been greater incentives for comparative effectiveness research and CEA to play a larger role in U.S. public policy.12 Federal funding for comparative effectiveness research was dramatically increased as part of the American Recovery and Reinvestment Act (ARRA) (the so-called stimulus bill) of 2009.
Here we focus on the design of CATIE and the precision of its findings that could influence its potential policy implications, such as coverage of second-generation antipsychotics and the value of additional public spending on comparative effectiveness research in this area.13
The main focus of our quantitative analysis is a "value-of-information" calculation that builds on CATIE data to calculate the expected value of research to further reduce the uncertainty about the costs and benefits of the first- and second-generation antipsychotics, and to draw implications for the design of future studies in this area. Although the focus is on CATIE, some of the issues that arise are clearly of more general relevance, including challenges in making coverage decisions in the presence of substantial uncertainty about costs and benefits, and the use of "value-of-information" techniques to assess the value of reducing such uncertainty and to identify optimal designs for such research.
The relatively long duration (eighteen months) and number of patients (1,460) in CATIE have caused it to be described as the most comprehensive randomized clinical trial (RCT) ever conducted in patients with schizophrenia.14 However, CATIE’s design has also been criticized from a clinical perspective, and these and other concerns are relevant in considering its public policy implications.15
Probably the most important issue with CATIE from a clinical perspective concerns its primary outcome measure: time to discontinuation of the study medication. In clinical practice for many conditions, time to discontinuation provides a measure that integrates information on both effects on symptoms and tolerability, and perhaps cost, from the perspectives of the patient and the prescriber. However, in the treatment of schizophrenia, decisionmakers might not have much data to assess when medication changes will benefit a given patient, which raises doubts about the selection of this measure as a primary outcome. Similarly, because patients or their families could choose different medications in CATIE at no cost, switching behavior would not reflect cost concerns.
Also, because patients who dropped out of CATIE might have had the cost of treatment fall on themselves or their family, and the prescriber would have lost income from study participation, CATIE may have produced incentives to switch medications more often than otherwise, to keep the patient in the study protocol—again raising concerns about the use of time to discontinuation as an outcome measure.
Other important concerns about the clinical implications of CATIE include (1) assignment of 231 patients with tardive dyskinesia (a movement disorder sometimes caused by first-generation antipsychotics) to treatment with the second-generation antipsychotics but not to the first-generation antipsychotic (perphenazine); (2) failure to explore a range of fixed doses for the drugs; (3) a peak dose of one second-generation antipsychotic, olanzapine (30 mg per day), that is three times the optimally effective dose in non-treatment-resistant patients; (4) inclusion of up to one-third treatment-resistant patients, for whom the higher doses of olanzapine may have increased efficacy; and (5) failure to define and categorize the largest cause of discontinuation ("patient choice"), which need not imply lack of efficacy or intolerability.16 These concerns have led some to argue that the CATIE results are insufficient to justify what would be radical departures from accepted practice for the treatment of schizophrenia, as reflected in multiple published algorithms.17
Several other concerns are especially relevant from a policy perspective. First, CATIE was designed to compare rates of discontinuation from the primary assigned medication, not coverage policies that restrict access to specific medications, which, unfortunately, is how its results have sometimes been used. Second, given the primary objective of studying discontinuation and the high frequency of switching in clinical practice because of the widespread belief that patients who do not respond to one antipsychotic drug may respond better to another, CATIE had to allow patients to switch medications whenever they and their clinicians chose. This implies that CATIE’s findings apply only to the outcome of initial medication assignment if patients can readily switch and not if the assigned medication was the only one available or if coverage policies created barriers to switching. Indeed, switching occurred especially frequently in CATIE: more than 75 percent of patients on perphenazine changed medications over the eighteen-month course of the study, a rate far higher than in other effectiveness studies.18 This reinforces the notion that the effects of initial medication assignment in CATIE might not reflect those in routine clinical care.
For our analysis, another important issue concerning CATIE is that the outcomes needed for cost-effectiveness, including quality-adjusted life-years (QALYs) and costs, must be estimated. Exhibit 1 reports these cost-effectiveness results from CATIE.
The methods used to develop these estimates raise several concerns. First, to compute QALYs, the authors had to estimate the “utilities” of the patients’ health at all times. Such utilities assign values to various health states based on patients’ preferences, to allow comparison of these health states. This was done by defining a series of health states described by psychotic symptoms and potential side effects of treatment with antipsychotic drugs, including tardive dyskinesia and weight gain. Psychotic symptoms were defined by severity scores based on the Positive and Negative Syndrome Scale (PANSS), which is the primary rating scale used to evaluate psychopathology in clinical trials in schizophrenia.
These symptoms and side-effect profiles were presented to members of the general public for rating using the “standard gamble technique,” a reasonably common and theoretically grounded, yet still controversial, approach for measuring preferences for health states.19 Whether this accurately reflects the welfare of patients experiencing these health states is impossible to know. Another concern arises because when psychotic symptoms and drug side effects occurred at the same time, the authors estimated utilities by multiplying utilities for the two separate health states—an approach that has been shown to underpredict utility of health states where conditions co-occur.20 Bias for any of these reasons could alter the conclusions from the CATIE cost-effectiveness study, given the small differences in outcomes between treatment arms.
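The multiplicative shortcut described above, and the bias concern it raises, can be illustrated with a minimal sketch; the utility values below are invented for illustration and are not those elicited in the CATIE CEA.

```python
# Illustrative multiplicative combination of health-state utilities;
# the numbers are invented, not the values used in the CATIE CEA.
u_psychosis = 0.70    # hypothetical utility of moderate psychotic symptoms
u_weight_gain = 0.90  # hypothetical utility of treatment-related weight gain

# Multiplicative approach used in the CEA for co-occurring states:
u_combined = u_psychosis * u_weight_gain
print(round(u_combined, 2))  # 0.63

# The concern noted in the text is that directly elicited utilities for
# the joint state tend to be higher than this product, so the shortcut
# can underpredict utility for co-occurring conditions.
```

Because the QALY differences between treatment arms were small, even a bias of a few hundredths in the joint-state utilities could matter for the CEA's conclusions.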
Another issue is that the CATIE results are based on average outcomes over the eighteen-month duration of the study. A longer-term investigation could have resulted in different conclusions—for example, first-generation antipsychotics could have been found more likely than second-generation antipsychotics to cause tardive dyskinesia and thus be discontinued.
By contrast, if patients in all treatment arms were to switch over time to treatments that were more effective for them, differences between treatment arms would decrease. Even the nature of switching behavior itself must be considered. For example, if one drug was thought to be more effective or more tolerable than another, switching would be less likely with that drug, increasing its performance with respect to this primary outcome measure. These issues make generalizing about long-term outcomes from CATIE extremely complex and raise caution about drawing policy conclusions from CATIE.
The complexity of these issues also makes it challenging to accurately estimate the long-term value of research using the value-of-information approach we describe below. Some of these issues tend to make our estimates of the value of research too high (for example, the tendency of people to find the best treatments for themselves), while others (such as unclear long-term outcomes) tend to make estimates of the value of research too low.
On the cost side, the CATIE CEA found significant differences in total costs across arms, with costs associated with perphenazine less than with quetiapine, olanzapine, and risperidone (all p < 0.001).21
Two major methodological concerns preclude the acceptance of this conclusion. First, the adjusted differences in “raw-scale” costs (dollars) across treatments might not have been calculated correctly.22 Second, the finding of significant differences in costs between perphenazine and the atypical (second-generation) antipsychotic arms appears to be based on estimates from the transformed (log-scale) data, rather than the raw-scale cost data (in dollars).23
Additionally, examination of the time trend in monthly cost estimates suggests that the conclusions of Robert Rosenheck and colleagues regarding the overall average monthly costs may be sensitive to the duration of the study.24 Figure 3 in that paper reveals that monthly costs were rising almost continually for the perphenazine group from the eight-month mark until the end of the study, at least partially because of these patients' switching to more expensive drugs, while the monthly costs in the second-generation antipsychotic groups were stable or declining slightly. This suggests that had the study been continued longer, costs associated with perphenazine may have further approached or even exceeded those of the second-generation antipsychotics.
Investigators examining the results of long-term studies such as CATIE and the U.K.-based Cost Utility of the Latest Antipsychotic Drugs in Schizophrenia Study (CUtLASS), both of which found overall effectiveness (in terms of QALYs) to be similar between first- and second-generation antipsychotics, have noted that these results differ from those of prior short-term trials, which found evidence of benefits for the second-generation antipsychotics.25 However, like CATIE, CUtLASS has important design concerns that may explain this discrepancy.26
Some of the reasons discussed for such discrepancies are that many of these prior studies were industry-funded, examined highly selected patient populations, and had high dropout rates. Another potential explanation is that these short-term studies are more likely to identify the effects of specific drugs than studies like CATIE that allow patients to switch medications, reducing the effect of initial assignment over time. Indeed, even patients started on placebo might have long-term outcomes similar to those of patients receiving an effective treatment if switching to effective treatments were allowed.
All of these concerns suggest that there is great uncertainty about the extent to which CATIE findings should be used to make coverage policy. If existing data are not sufficient to resolve this uncertainty, it is useful to ask whether additional studies to address this question would be justified.
"Value-of-research" calculations have recently begun to be applied in both academic and policy settings (for example, by the U.K. National Institute for Health and Clinical Excellence [NICE]) to determine the value of additional studies of treatments, including diverse clinical applications such as the treatment of Alzheimer's disease and the routine removal of wisdom teeth.27 The approach uses data on the uncertainty around the effects of treatment and the likelihood that new information would change the recommended treatment to calculate the expected value of the improvement in outcomes that would result from additional research.28
To take a simple example, a study comparing a currently accepted treatment (A) to a potential new treatment (B) that had a 75 percent chance of showing that A was better than B, so that no change in practice was indicated, and a 25 percent chance of showing that B increased QALYs by four years, would have an expected value of one year per patient (0.25 × 4 years). This could then be multiplied by the size of the relevant population to estimate the population-level benefits. The approach can also incorporate effects on the costs of treatment or research, or both, to calculate measures of benefits net of costs. The value-of-information approach can estimate the expected benefits of a study that could eliminate all uncertainty surrounding the outcomes of a treatment decision (the expected value of perfect information) or the benefits of less powerful studies with a specific sample size or more limited outcome measures that incompletely characterize the uncertainty in outcomes.29
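The arithmetic of this simple example can be sketched directly; the probabilities and QALY gain are the illustrative figures from the text, and the population size is a made-up placeholder.

```python
# Expected value of research for the hypothetical A-vs.-B example above.
p_b_better = 0.25      # chance the study shows new treatment B is superior
qaly_gain_if_b = 4.0   # QALYs gained per patient if practice switches to B

# With 75 percent probability the study favors A, practice is unchanged,
# and the gain is zero, so only the favorable branch contributes:
expected_qalys_per_patient = p_b_better * qaly_gain_if_b   # 1.0 QALY

# Scaling to a hypothetical affected population of 100,000 patients:
population = 100_000
expected_population_qalys = expected_qalys_per_patient * population

print(expected_qalys_per_patient)  # 1.0
print(expected_population_qalys)   # 100000.0
```

The same structure extends to the cost side: subtracting expected changes in treatment and research costs from the expected health gains yields the net-benefit measures discussed below.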
We began our analysis by assessing the value of performing additional studies that would tell us exactly how treatment with typical and atypical antipsychotics would affect costs and outcomes. To do this, we used data from CATIE and other sources on the uncertainty in the lifetime survival, costs, and QALYs resulting from initial schizophrenia treatment with perphenazine, olanzapine, risperidone, and quetiapine to estimate the average changes in outcomes and costs expected from perfect knowledge of the effects of these drugs. This expected value-of-research calculation was done assuming that without additional research, we accept the CATIE results and use perphenazine as the starting treatment. The value of future research, therefore, depends on the probability that this decision would change because of the additional research and the expected net benefits of improving that decision.30
To perform this expected value-of-research calculation, we first used data from the literature on schizophrenia prevalence and mortality and data from CATIE on the effects of treatments on quality of life and costs to characterize the uncertainty in lifetime incremental costs (change in costs) and incremental benefits (change in QALYs) between perphenazine and each of the second-generation antipsychotics. We then used varying assumptions about the monetary value of a QALY to calculate the net monetary benefit (NMB) of choosing each second-generation antipsychotic over perphenazine, given these effects on QALYs and costs, by taking the monetary value of the gain in QALYs and subtracting the increase in costs. When the resulting NMB was positive, this provided an estimate of the value of perfect information about that second-generation antipsychotic compared to perphenazine, given these effects on QALYs and costs, because the default treatment choice of perphenazine would have been incorrect.
When the NMB was zero or negative, the research was considered to have no value, given those effects on QALYs and costs, because it did not change the treatment decision. For each second-generation antipsychotic, we then calculated the expected value of perfect information by multiplying each level of positive NMB by its probability calculated based on the distribution of effects on costs and QALYs for that second-generation antipsychotic and then adding up over all possible values of positive NMB. The overall value of future research was calculated by taking the maximum value of perfect information over all second-generation antipsychotics.
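The two steps above can be sketched as a small Monte Carlo calculation. This is only a structural illustration: the distributions of incremental QALYs and costs below are invented placeholders, not the CATIE-based estimates, and a full analysis would also discount future years and take the maximum over all second-generation drugs.

```python
import random

random.seed(0)
LAMBDA = 50_000  # assumed monetary value of a QALY, in dollars

def evpi_per_patient(n_draws=50_000):
    """Per-patient expected value of perfect information (EVPI) for one
    second-generation drug versus perphenazine, by simulation.

    Placeholder normal distributions (NOT the CATIE estimates) stand in
    for the uncertainty in lifetime incremental QALYs and costs."""
    gain, wrong = 0.0, 0
    for _ in range(n_draws):
        d_qaly = random.gauss(0.01, 0.05)     # incremental QALYs (placeholder)
        d_cost = random.gauss(2_000, 10_000)  # incremental cost, $ (placeholder)
        nmb = LAMBDA * d_qaly - d_cost        # net monetary benefit of switching
        # Perfect information has value only when it reverses the default
        # choice of perphenazine, i.e. when the NMB of switching is positive:
        if nmb > 0:
            gain += nmb
            wrong += 1
    return gain / n_draws, wrong / n_draws

evpi, p_wrong = evpi_per_patient()
print(f"per-patient EVPI ${evpi:,.0f}; P(default choice wrong) {p_wrong:.2f}")
```

Multiplying the per-patient EVPI by the prevalent and incident populations yields population-level figures of the kind reported below.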
Assuming that schizophrenia incidence occurs at age twenty, we estimated the incidence of schizophrenia to be 13 per 1,000 twenty-year-olds, producing 52,620 incident cases annually. Exhibit 2 shows simulated means and variances for each drug.31 The means and variances match well with the statistics reported in CATIE. The variances in QALYs may seem small when considered on a scale of 0 to 1, but even if a QALY is valued at only $50,000, which is below most recent estimates, a difference of even 0.03 would be valued at $1,500 annually, roughly comparable in magnitude to the annual cost of even the more expensive second-generation antipsychotics.32 The distribution of costs is also notable for having a relatively large proportion of observations that exceed the mean by a large margin (a "long right tail"), which is a common feature of health spending distributions, including this one, because of costly hospitalizations.
The CATIE results about the mean effects on outcomes across treatment arms imply that first-generation antipsychotics are an effective first-line treatment. However, our analysis suggests that there is a 37 percent probability that this current decision based on the CATIE cost-effectiveness results will be wrong. Moreover, at $50,000 per QALY, we estimate that the expected value of more precisely determining the effectiveness (ignoring costs) of atypical/typical antipsychotics in the United States is $17.3 billion: $11.7 billion accrues to patients who already have schizophrenia and the remainder to future cohorts.
When we also incorporate the costs of treatment, we estimate that the CATIE finding that initial assignment to typical antipsychotics is cost-effective has a 55 percent probability of being wrong, and the value of more precisely determining the cost-effectiveness of the initial assignment of antipsychotics in the United States increases to $308 billion: $207 billion accrues to the prevalent cohort of schizophrenia patients and $6.6 billion, to each cohort of persons expected to develop schizophrenia each year over the next twenty years (Exhibit 3).33
Exhibit 4 illustrates how this expected value of future research and the probability that the CATIE findings are wrong (that is, that perphenazine is not the most cost-effective treatment) vary with the value of a QALY. It is notable that the expected value of research varies little even as the value of a QALY varies widely. This is because the major determinant of the value of research for these drugs is uncertainty about their effects on total costs.
Having determined that the potential value of perfect information concerning the comparative costs and effectiveness of first- and second-generation antipsychotics is large, we can use similar analyses to identify the optimal sample size for future studies that try to more precisely address the questions that CATIE sought to answer. As suggested earlier, CATIE was designed to focus on discontinuation of the assigned drug as a primary outcome; it was not powered to show differences in cost-effectiveness. To explore this further, we performed a traditional deterministic power calculation to determine the number of subjects needed to identify a statistically significant effect on NMB, based on the largest average effect size seen in CATIE between a second-generation antipsychotic (ziprasidone) and perphenazine. With estimated NMBs for the two treatments of $15,680 (standard deviation: $315,000) and $26,296 (SD: $140,000), we found that a trial with 80 percent power to detect statistically significant differences at the 5 percent level would require 8,300 patients in each arm. Similar calculations reveal that CATIE, with only 400 patients per arm, had only 10 percent power to detect differences at this level. Although these findings might surprise those who have put much faith in the CATIE CEA results, they are consistent with many previous findings that cost-effectiveness studies require larger sample sizes than clinical studies do, because of high variances in costs.
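The deterministic power calculation reported here can be approximately reproduced from the quoted means and standard deviations with the standard two-sample normal-approximation formula. The text does not say which NMB figure belongs to which drug, but the result depends only on the difference in means and the two SDs.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Net-monetary-benefit summaries quoted in the text (dollars); which drug
# is which does not affect the calculation.
mean_a, sd_a = 15_680.0, 315_000.0
mean_b, sd_b = 26_296.0, 140_000.0
delta = mean_b - mean_a  # difference in mean NMB: $10,616

z_alpha = 1.96  # two-sided 5 percent significance level
z_beta = 0.84   # 80 percent power

# Required patients per arm: n = (z_a + z_b)^2 * (s_a^2 + s_b^2) / delta^2
n_per_arm = (z_alpha + z_beta) ** 2 * (sd_a**2 + sd_b**2) / delta**2
print(round(n_per_arm))  # 8266, which the text rounds to 8,300

# Power achieved with CATIE-sized arms of roughly 400 patients each:
se_400 = math.sqrt((sd_a**2 + sd_b**2) / 400)
power_400 = phi(delta / se_400 - z_alpha)
print(round(power_400, 2))  # 0.09, i.e. roughly 10 percent power
```

The huge SDs relative to the $10,616 mean difference are what drive the required sample size into the thousands per arm.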
An important limitation of these power calculations is that they require assuming an effect size of a given magnitude, and they do not explicitly incorporate the value of the information obtained from a study of a given size. To address these concerns, we used the CATIE data to calculate the expected value of information for trials of varying sizes in each arm net of the estimated cost of performing these trials.34
In making this estimate, we first carried out these calculations ignoring costs and only focusing on QALYs. We found that the expected value of information is maximized for a future trial of 4,000–4,500 subjects per arm. Such research is expected to produce a value of about $13.8 billion, with health benefits valued at $50,000 per QALY. Optimal sample sizes were not sensitive to the threshold value for a QALY.
When the effects of costs are taken into account, optimal sample sizes and the expected value they generate change considerably (Exhibit 5). First, the expected values of these trials of finite size are all much higher, very close to the $308 billion estimated value of perfect information noted above. Second, the optimal sample size is exceptionally large: more than 20,000 subjects per arm.
This value-of-research analysis suggests that CATIE has important limitations for guiding public policy concerning coverage of antipsychotic medications. It also suggests that future research on this important class of medications is likely to be of immense value—exceeding $300 billion—to people with schizophrenia today and those who will develop it over the next twenty years. Much more of the value of research appears to be generated by uncertainty about the effects of treatments on costs than by uncertainty in outcomes, and very large sample sizes will be needed to address such uncertainty.
Although we identified a number of important limitations of CATIE from the technical and clinical perspectives, among the most important from a policy perspective is that it was never intended to determine coverage policy. Indeed, the high rate of medication switching in CATIE implies that policy decisions based on its findings are relevant only in contexts that allow for effective switching. This matters because comparative effectiveness research that is meant to inform payment policy must be able to address these sorts of concerns. If such research is to guide policy decisions, such as changes in coverage policy, it should include direct evaluations of policy changes.
Such analyses could include observational studies of policy changes or large-scale social experiments in which patients are randomly assigned to different forms of coverage, of which there are some examples in health care (such as the RAND Health Insurance Experiment). Social experiments could also be designed to work at higher levels—for example, if states are assigned different treatments.
The importance of considering social experiments as an important part of comparative effectiveness research is increased by our findings about the remarkably large sample sizes needed for cost-effectiveness studies, whether assessed by traditional deterministic sample-size calculations or by value-of-information approaches. Traditional clinical trials with samples of the size our results suggest are needed seem quite unlikely to be funded in the current funding environment, even if policymakers accept our conclusion that such studies would produce expected benefits that justify their expense, and they might not be feasible even if funding were not a concern.
Switching behavior complicates matters in two ways: patients' tendency to gravitate toward (or away from) preferred treatments over time itself influences the value of research, and switching may in turn be influenced by research suggesting that some treatment is dominant. Together these considerations show that merely increasing the size of comparative effectiveness studies such as CATIE is unlikely to provide definitive evidence on the best use of antipsychotic agents in schizophrenia. This point is reinforced further by the need to study the potential for more carefully specified treatment algorithms involving multiple medications (at a point in time or over time) to alter costs and outcomes.
Although the CATIE study had major limitations, it should be emphasized that the analysis performed here would not have been possible without the data it provided. Comparative effectiveness research, like most research, is inherently iterative; prior studies inform future ones. Accordingly, our study has a variety of limitations, including its dependence on mortality data collected before the atypicals were available, the limited duration of and outcome data collected in CATIE, and the distributional assumptions we made to analyze those data. Finally, our analysis cannot address the value of algorithms that may improve decisions about which sequences or combinations of drugs should be used when initial treatment with a single drug does not produce a desirable outcome.
Answering these questions will require further investment in research that will need to be justified relative to other areas of research. Building on the tradition of CATIE in establishing the potential of comparative effectiveness research to inform policy making, value-of-research calculations such as those presented here can play an important role in designing such trials, prioritizing them relative to other studies, and building the evidence base to assess the appropriate level of overall spending on comparative effectiveness research.
This study was sponsored by Best Practice Inc., with partial and unrestricted grant support from the Foundation for Education and Research on Mental Illness, Janssen Pharmaceuticals, and the Center for Medicine in the Public Interest. David Meltzer also acknowledges salary support from the Agency for Healthcare Research and Quality through the Hospital Medicine and Economics Center for Education and Research in Therapeutics (U18HS016967) and the National Institute on Aging through a mid-career career development award (K24 AG031326). Herbert Meltzer acknowledges salary support from and participation in the CATIE study as a consultant and investigator. A part of Anirban Basu’s time was supported by a research grant from the National Institute of Mental Health (1R01MH083706-01).
A preliminary version of this work was presented at the Scientific Session of the International Society for CNS Clinical Trials and Methodology (ISCTM), in Washington, D.C., 26 February 2008.
David O. Meltzer, Section of Hospital Medicine, at the University of Chicago in Illinois.
Anirban Basu, Section of Hospital Medicine, at the University of Chicago in Illinois.
Herbert Y. Meltzer, Vanderbilt University in Nashville, Tennessee.