Cost-effectiveness analysis (CEA) is an analytic tool in which the costs and effects of an intervention designed to prevent, diagnose, or treat disease are calculated and compared with an alternative strategy to achieve the same goals. The results of a CEA are presented as a ratio of costs to effects, where the effects are health outcomes such as cases of disease prevented, years of life gained, or quality-adjusted life years gained, rather than monetary measures, as in cost-benefit analysis. Conducting a CEA requires a framework for portraying the cascade of events that occur as a consequence of the decision to intervene, for describing the probability that each event will occur, for accounting for how long each event will last, and for describing how much each event costs and how it is valued by the population or individuals targeted by the intervention. Mathematical models are well suited to these purposes.
The purpose of this article is to provide an overview of modeling to estimate net effectiveness in a CEA (the difference in effectiveness between an intervention and the alternative to which it is being compared). Many of the principles described for estimating effectiveness apply equally to determining costs in a CEA. The main difference is that health events are weighted by costs in the numerator of the cost-effectiveness ratio, while they are often weighted by preference values in the denominator. Preference values, or utilities, reflect the fact that individuals or populations with similar ability (or disability) to function may regard that level of functioning differently. When preferences are incorporated into CEAs, the results are generally expressed as costs per quality-adjusted life years.1,2 A discussion of measurement of costs and valuing outcomes is beyond the scope of this article; for further information on these, and other components of a CEA, the reader is referred elsewhere.3–5 Following some definitions of terms, this article is organized into two sections describing the process of estimating effectiveness in a CEA: the first presents a review of the sources of event probabilities, and the second describes the use of modeling to estimate effectiveness.
Effectiveness, which reflects the impact of an intervention on health in real practice settings, should be distinguished from two related concepts, efficacy and appropriateness. Efficacy refers to impact under ideal conditions (e.g., randomized controlled trials). Appropriateness reflects a broader range of issues considered in deciding whether an intervention should or should not be done, including acceptability, feasibility, and cost-effectiveness.6–8
Cost-effectiveness analysis calculates incremental effectiveness (and costs); that is, the difference in effectiveness (and costs) between the intervention of interest and the next least effective (and next least costly) alternative. This is distinguished from marginal effects (and costs), which refer to a production function: the marginal cost (or effect), for instance, is that associated with producing one additional unit of output.
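The incremental comparison can be made concrete with a small sketch. The function name and all cost and effect figures below are invented for illustration, not drawn from any study:

```python
def icer(cost_new, cost_comparator, effect_new, effect_comparator):
    """Incremental cost-effectiveness ratio: the additional cost per
    additional unit of effect (e.g., per QALY gained) of the new
    intervention relative to the comparator."""
    return (cost_new - cost_comparator) / (effect_new - effect_comparator)

# Hypothetical figures: the new program costs $12,000 and yields 6.5 QALYs;
# the comparator costs $7,000 and yields 6.0 QALYs.
ratio = icer(12_000.0, 7_000.0, 6.5, 6.0)   # $5,000 extra cost per 0.5 extra QALY
```

Here the incremental ratio is $10,000 per QALY gained; a marginal analysis, by contrast, would ask what it costs to extend the same program by one more unit of output.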
To calculate net effectiveness, we must estimate the probabilities of all events that occur as a consequence of the intervention and alternative. Probabilities express the degree of certainty that an event will happen, on a scale from 0.0 (certainty that the event will not occur) to 1.0 (certainty that the event will occur). Probabilities found in the literature are often not in the form required for the CEA.9–14 For instance, a lifetime cumulative risk of breast cancer must be converted to an annual probability.
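The breast cancer conversion mentioned above can be sketched as follows, assuming a constant hazard over the period. The function name is ours, and the 12% lifetime risk over 80 years is an illustrative figure, not a source value:

```python
import math

def annual_probability(cumulative_risk, years):
    """Convert a cumulative risk over `years` into a constant annual
    probability, assuming the event hazard is constant over the period."""
    # Convert the cumulative probability to a constant annual rate...
    rate = -math.log(1.0 - cumulative_risk) / years
    # ...then convert that annual rate back into an annual probability.
    return 1.0 - math.exp(-rate)

# Illustrative: a hypothetical 12% lifetime (80-year) cumulative risk.
p_annual = annual_probability(0.12, 80)
```

Compounding the resulting annual probability back over 80 years recovers the original 12% cumulative risk, which is a useful check on any such conversion.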
Cost-effectiveness analysis probability data can be collected as part of a research protocol (primary data), or can be abstracted or extrapolated from existing published research (secondary data). Event probability values should be selected or collected from the best designed and least biased sources that are relevant to the question and population under study using the following hierarchy (in decreasing order): well-conducted randomized, controlled trials (RCTs); observational data, including cohort, case-control, and cross-sectional studies; uncontrolled experiments; descriptive series; and expert opinion.15 Less rigorously designed studies drawing similar conclusions may be the best available source of data for a particular subpopulation or research hypothesis, in the absence of other data.
Table 1 summarizes the main advantages and disadvantages of different study designs as data sources for a CEA.16 Although well-conducted RCTs are generally accepted as the most powerful tool for assessing the efficacy of interventions, in CEA one is most interested in how an intervention performs in real-life (i.e., non-RCT) settings. Observational cohort and case-control studies can provide real-world data on the probabilities of particular outcomes associated with an intervention. Observational studies differ from RCTs in that the investigators do not have control over which persons receive the intervention, so that observational studies may be subject to unknown selection effects. Also, the use of case-control studies to evaluate the effectiveness of screening can be biased by the type of case or control group selected.17–21
For example, in evaluating the effectiveness of sigmoidoscopy screening to reduce mortality from colorectal cancer, cases should include all those dying from the disease, and controls should include both those with colorectal cancer who are still alive and those without cancer. Moreover, the definition of cancers in both the case and control groups should include those in reach of the sigmoidoscope and those beyond.22 This choice of study groups eliminates the lead time bias that would occur if early-stage cases were compared with later-stage cases, where lead time bias is the apparent survival advantage that results from earlier diagnosis even though survival does not actually differ between screened and unscreened groups. Cross-sectional studies, case series, uncontrolled cohort studies, postmarketing surveillance, and disease and administrative databases may also provide data for a CEA.23–27
When there are insufficient data from any one source or when studies conflict, information from many types of good-quality studies can be combined to provide probability values for estimating effectiveness. The two major approaches are meta-analysis28–32 and Bayesian methods.33,34 Expert opinion and consensus panels are other synthesis techniques used to estimate effectiveness. For example, the original Oregon priority list relied on educated guesses of experts who estimated the ability of particular technologies and practices to improve survival.35 For further information on sources of effectiveness data, the reader is referred elsewhere.23,36,37
Randomized controlled trials and observational studies cannot compare all relevant alternative program designs that may be of interest to the CEA analyst or policy maker. Models have the advantage of providing the user with the ability to manipulate an intervention program in ways that are not possible in real-time experiments with human subjects. Models can be used to extrapolate from existing data to different population groups, points in time, or disease end points. For instance, models enable exploration of the implications of different screening or treatment intervals. Models allow simulation of the effects and costs of ending a screening program at a given age, investigating questions such as whether to continue cervical screening past the sixth decade of life. Models allow examination of the implications of using different cutoff points for screening tests, such as the cholesterol level chosen for initiating dietary or pharmacologic intervention, or the bone mineral density level chosen for initiating treatment for osteoporosis. Models can also be of use in performing sensitivity analyses and threshold analyses to ask what the data parameters would need to be for an intervention to be considered cost-effective.38
Models for estimating health effectiveness may be characterized along several dimensions: (1) the analytic methodology for accounting for events that occur over time, typically either a decision tree or state transition model; (2) application to cohorts longitudinally or to populations cross-sectionally; and (3) using deterministic or stochastic (probabilistic) calculations.
Decision tree models represent chance events and decisions over time.39–41 Each path through the decision tree represents one possible sequence of events, and is associated with a probability and a consequence, such as life expectancy, or quality-adjusted life expectancy. Decision analysis models have been used extensively in the medical literature, for example, to estimate gains in life expectancy from vaccines,42–44 and for screening elderly women for breast cancer.45 One limitation of decision trees is that they are not well suited to representing multiple outcome events that recur over time.
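Folding a decision tree back to an expected outcome can be sketched in a few lines. The tree below is hypothetical: the vaccination probabilities and QALY payoffs are invented to show the mechanics, not taken from any cited study:

```python
# A chance node is a list of (probability, subtree) pairs; a leaf is a
# payoff (here, quality-adjusted life expectancy in QALYs).
def expected_value(node):
    if isinstance(node, (int, float)):   # leaf: return its payoff directly
        return node
    # chance node: probability-weighted average over its branches
    return sum(p * expected_value(child) for p, child in node)

# Hypothetical vaccination decision.
vaccinate = [(0.99, 25.0),                        # no adverse event
             (0.01, 24.0)]                        # rare adverse event
no_vaccine = [(0.95, 25.0),                       # never contracts disease
              (0.05, [(0.8, 18.0),                # disease, mild course
                      (0.2, 5.0)])]               # disease, severe course
```

Each root-to-leaf path corresponds to one possible sequence of events, and the recursion weights each leaf's payoff by the product of the probabilities along its path.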
State-transition models are more efficient representations of recurring events. State-transition models allocate, and subsequently reallocate, members of a population into one of several categories, or states. States may be defined according to disease stage, treatment status, or a combination of the two. Transitions occur from one state to another at defined, recurring time intervals (usually 1 year, but sometimes 3 months or 1 month for rapidly progressive diseases) according to transition probabilities. Transition probabilities can be made dependent on population characteristics, such as age or other risk factors.
Through simulation, or mathematical calculation, the number of members of the population passing through each state at each point in time can be estimated. A special type of state-transition model, in which the transition probabilities depend only on the current state (and not, for example, on the previous states or the path by which the current state was entered), is called a Markov model.46 State-transition models have been used to estimate outcomes in a large number of cost-effectiveness studies, including coronary heart disease prevention47 and treatment48; breast,49 cervical,50 and prostate cancer screening51; and hormone replacement therapy.52 Decision tree models can also be augmented to include “Markov nodes,” or branching points within the tree that lead into a Markov model.53 Several computer programs, such as SMLTREE (© 1989, J. Hollenberg), DATA (© 1994, TreeAge Software, Inc.), and Decision Maker (© 1980, 1993, S.G. Pauker, F.A. Sonnenberg, and J.B. Wong, New England Medical Center, Boston, Mass.) can be used to construct such models. Other types of models, such as difference equations, have been used to assess the effectiveness of interventions targeting infectious diseases, such as AIDS prevention programs.54
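The cohort-reallocation mechanics of a state-transition (Markov) model can be sketched briefly. The three states and all transition probabilities below are invented for illustration; a real model would derive them from the data sources discussed earlier:

```python
# Annual transition probabilities between states (rows sum to 1).
# "dead" is an absorbing state: no transitions leave it.
P = {
    "well": {"well": 0.90, "sick": 0.08, "dead": 0.02},
    "sick": {"well": 0.10, "sick": 0.70, "dead": 0.20},
    "dead": {"well": 0.00, "sick": 0.00, "dead": 1.00},
}

def run_cohort(dist, cycles):
    """Reallocate the cohort among states once per cycle (here, per year)."""
    for _ in range(cycles):
        new = {state: 0.0 for state in dist}
        for state, fraction in dist.items():
            for target, p in P[state].items():
                new[target] += fraction * p
        dist = new
    return dist

# Start the whole cohort in the "well" state and run 10 annual cycles.
final = run_cohort({"well": 1.0, "sick": 0.0, "dead": 0.0}, 10)
```

Because each row of `P` depends only on the current state, this is a Markov model in the sense described above; making the probabilities depend on age or risk factors would require indexing `P` by those characteristics as well.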
All models include a population or group that is relevant to the research question. Simulations then project future outcomes, in effect following the patients or individuals over time. There are two common ways this can be accomplished, differing in the way in which the study population is constituted at the start of the model.
The first method, known as longitudinal modeling, calculates expected outcomes for “typical” patients or cohorts (e.g., a group of 50-year-old men with a first myocardial infarction) and follows them longitudinally through time to evaluate health outcomes resulting from alternative interventions. This approach is often used in decision tree models or in models that extend the follow-up period of an RCT beyond the end of the trial, typically 1 to 5 years, to death.55 Results of such models are typically expressed as quality-adjusted life-years.
The second method is known as the cross-sectional model. These models record the health outcomes of a cross-section of an entire population, or a substratum of the population, and then follow each person in the population from their age at the start of the model to the end point of the analysis (either a specified period, such as 10 years, or until death). The outcomes from alternative interventions are then summed or averaged over the population and expressed as an aggregate measure, such as quality-adjusted person-years. The model CAN*TROL and the Coronary Heart Disease Policy Model are examples of cross-sectional population models.47,56
The choice of a cross-sectional or longitudinal model is determined by the problem being studied. For instance, a cross-sectional model may be used to ask public health questions about interventions that are to be applied population-wide to groups of varying ages; a longitudinal model may be used to ask questions about the long-term effects of an intervention on an age-specific group.
Deterministic models apply probabilities directly to calculate the average (expected) number of health events. For example, suppose that one is interested in the number of people in a cohort of 10,000 who will be dead in 10 years from a particular disease, and suppose further that one knows the annual disease-specific mortality rate is 10% and the average annual other-cause mortality rate is 1%. The number of people who will be dead from the disease can be computed directly by recursively applying the mortality percentages to the expected number of survivors in each of the 10 years.
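The recursive calculation just described can be written out directly. One caveat: this sketch assumes the two annual mortality probabilities simply add within each year, which the example above does not specify; other conventions for combining competing risks would give slightly different counts:

```python
cohort = 10_000.0
p_disease, p_other = 0.10, 0.01   # annual mortality probabilities from the example

alive, disease_deaths = cohort, 0.0
for _ in range(10):
    disease_deaths += alive * p_disease    # expected disease deaths this year
    alive *= 1 - p_disease - p_other       # expected survivors carried forward
```

Because every quantity is an expectation, the calculation is exact and fully reproducible: running it twice always yields the same numbers, in contrast to the stochastic approach described next.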
In stochastic models, known as discrete event simulations, probabilities for each individual in the cohort over time are simulated using computer-generated random numbers to represent chance events. For instance, to calculate 10-year survival, simulating a 10% chance of death in a given year, the computer is directed to generate a random integer between 1 and 100, and if that integer is 10 or less, the computer program tallies the simulated person as dying in that year; otherwise, the person is deemed to have survived. The process is repeated over time for the survivors. The number of people in the cohort who are “observed” to live the full 10 years in the simulation is used as an estimate of the number that would be observed were a real study done under conditions of the simulation. The entire simulation is repeated many times, and the counts are averaged across simulation runs; as the number of runs grows large, these averages approach the values that would be computed by deterministic calculations. This type of discrete event simulation is also known as Monte Carlo simulation (e.g., the MISCAN simulation program57).
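A minimal Monte Carlo version of the same 10-year survival question follows; it uses only the 10% annual chance of death from the description above, and the number of runs and the seed are arbitrary choices:

```python
import random

def simulate_once(p_death=0.10, years=10, rng=random):
    """Simulate one person's course; return True if they survive all years."""
    for _ in range(years):
        if rng.random() < p_death:   # computer-generated chance event
            return False             # tallied as dying in this year
    return True

def average_survivors(n=10_000, runs=20, seed=1):
    """Average the count of 10-year survivors across repeated simulation runs."""
    rng = random.Random(seed)
    counts = [sum(simulate_once(rng=rng) for _ in range(n)) for _ in range(runs)]
    return sum(counts) / len(counts)
```

As the number of runs grows, the average survivor count converges toward the deterministic expectation of 10,000 × 0.9¹⁰ ≈ 3,487, illustrating the relation between the two approaches described in the text.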
Deterministic calculations have the advantage of being exact. However, if a model is complex, involving many possible events and intervening decisions based on those events, deterministic computations must exhaustively calculate the probability of every possible combination of events and decisions. In problems of even moderate complexity, this may involve millions of combinations. Stochastic models are, in essence, empirical samplings from these combinations, so that each combination appears in the final counts in proportion to its likelihood.
In addition to greater ease of complex simulation, stochastic models have the advantage of yielding not only average effects, but also measures of the uncertainty around the computed average (i.e., they can provide a confidence interval); deterministic calculations yield a point estimate only. A limitation of stochastic models is that complex simulations require intricate knowledge of the disease’s natural history to estimate the parameters of the simulation. In the absence of such knowledge, the analyst must make the most reasonable assumptions about disease history. In either deterministic or stochastic modeling, lack of knowledge may be addressed through sensitivity analyses—varying the model parameters through reasonable ranges to observe the effect on the results. Generally it is desirable to use the simplest model possible, for which critical data are available for describing the parameters and their interrelationships.
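A one-way sensitivity analysis can be as simple as re-running the model while varying one parameter through a plausible range. The toy survival model and the range below are illustrative only:

```python
# Toy model: 10-year survival as a function of an uncertain annual
# death probability (deterministic, for clarity).
def ten_year_survival(p_annual_death):
    return (1 - p_annual_death) ** 10

# Sweep the uncertain parameter through a plausible range and record
# the model output at each value.
results = {p: ten_year_survival(p) for p in (0.05, 0.10, 0.15, 0.20)}
```

If the conclusion of the analysis (e.g., which intervention appears cost-effective) is stable across the swept range, the result is robust to uncertainty in that parameter; if it flips at some value, that value is the threshold referred to in threshold analysis.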
Several issues germane to modeling will be briefly reviewed in this section, including specification of parameters and modeling patient characteristics, use of disease-specific or total mortality data, using models to “correct” for lead time and length biases, and model validation.
Clinical trials and observational studies typically provide estimates of risk reduction or relative risk during the follow-up period, but give little indication of how to estimate the survival curve for individuals beyond the end of the trial. Moreover, a trial restricted to a particular demographic or clinical group leaves open the question of what the effect might be in persons of younger or older ages, persons of the opposite sex, or persons with comorbidities. Thus, the analyst must make assumptions regarding the appropriate basis for extrapolation beyond the period of observation and to populations with different survival curves. For example, in a CEA comparing two thrombolytic therapies for acute myocardial infarction, the analysts used primary data from an RCT to estimate 1-year survival and then extended the observation period by modeling survival based on a separate database of patients with coronary heart disease.55
The simplest assumption to make is that the age- and sex-specific risk of death for the affected population is modified by the disease in question, the intervention being evaluated, and any comorbidities that affect survival relative to the general population. A key choice is whether these three effects are additive or multiplicative.
Event probabilities in CEA models are often represented as conditional on patient characteristics, including age, gender, risk factors, stage of disease, and prior morbid events.58–62 These probabilities can be estimated separately for relevant subpopulations when data permit, but more often they are specified by an equation derived assuming a statistical relation between event probabilities and patient characteristics. The predictive equations can be derived using logistic regression, Poisson regression, proportional hazards models, or Bayesian analysis, to name a few techniques. These analyses can assume independence among the characteristics, or they may allow for interactions (e.g., effect modification); they can also be additive or multiplicative. Proportional hazards models and logistic regression models are both essentially multiplicative. The declining exponential approximation to life expectancy (DEALE) model uses an additive function of risk factors.63,64 The implications of the choice of an additive versus a multiplicative assumption can be striking.9,10,47,65–68 When practical, sensitivity analysis should be used to evaluate different assumptions about parameter form.
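The difference between the multiplicative and additive assumptions can be shown with invented numbers. This is a schematic contrast of the two combination rules, not the DEALE formulation itself, and the baseline risk and relative risks are hypothetical:

```python
# Hypothetical inputs: a 2% baseline annual risk and two risk factors
# with relative risks of 2.0 and 3.0.
base, rr1, rr2 = 0.02, 2.0, 3.0

# Multiplicative combination (the style implied by logistic or
# proportional hazards models): relative risks multiply.
multiplicative = base * rr1 * rr2

# Additive combination (schematically, DEALE-style): excess risks sum.
additive = base * (1 + (rr1 - 1) + (rr2 - 1))
```

With these inputs the multiplicative form gives a 12% annual risk and the additive form 8%, a difference of half again as much risk from the structural assumption alone; this is why the text recommends sensitivity analysis over parameter form when practical.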
Estimating length of life is a central problem in CEA. The main end point in many trials is disease-specific mortality—that is, mortality due to the disease addressed by the trial. However, the disease-specific mortality may be only part of the picture. For example, in estimating the effectiveness of cholesterol-lowering drugs to reduce deaths from cardiovascular disease, use of cardiovascular disease–specific mortality will overstate effectiveness if the intervention also leads to a higher rate of death from other causes.69,70
Another caveat regarding the use of disease-specific mortality to estimate effectiveness in a CEA concerns misclassification.71,72 In RCTs, in which the investigators have developed careful protocols for attribution of cause of death and can make this determination in follow-up of study participants, the attributed disease-specific mortality rates may be useful inputs to the CEA modeling process. However, in less-controlled studies, there may be either an underreporting or overreporting of disease-specific causes. Thus, it is suggested that CEAs use all-cause mortality as the basis for estimating life expectancy gains.
Two related types of bias—lead time and length—make it difficult to determine with certainty if a screening intervention is effective in improving outcomes seen in nonrandomized controlled studies;15,73–75 modeling can be used to “correct” estimates of effectiveness for these biases. Lead time in a screening program is the time, in the normal course of disease, between the average time of early diagnosis by screening or case finding, and the average time of diagnosis in the absence of screening. Lead time bias is an overestimate of the increased survival associated with screening, owing to the fact that the disease is diagnosed earlier in its natural history. In extreme cases, all of the observed increase in survival with screening may be attributable to lead time bias, and there may be no actual prolongation of life.74,75 Lead time may have another consequence that is important for effectiveness estimation: earlier diagnosis and treatment afforded by screening may expose the patient to a longer period of adverse treatment effects than would occur in the absence of screening.
Length bias refers to the tendency for slower-growing, less-virulent disease to be detected in a screening program more often than more aggressive disease. As a result, those with aggressive disease are underrepresented in screened populations, and patients detected by screening may do better than unscreened patients, regardless of whether screening actually influences outcome, leading to an overestimation of effectiveness.15,74
The analyst can address these biases by modeling the disease process directly; this requires estimates of such variables as tumor progression, stage-specific screening sensitivity, and stage-specific treatment response. One simulation model that incorporates such disease process modeling to avoid lead time and length biases is the MISCAN model.57 For examples of this, and other approaches, the reader is referred to other sources.49,58–62,76–79
Models often cannot be validated directly. The results of a model should, therefore, be accompanied by sensitivity analyses identifying which model inputs and parameters exert the most leverage on outputs. However, some aspects of models, such as which variables are included in the model inputs and their relations within the model’s structure (e.g., additivity or multiplicativity), are not easily amenable to sensitivity analysis. Thus, validation of a model may have to rest largely on the inherent reasonableness of the model and its assumptions. The technical accuracy of models must also be verified, to ensure that the model performs the calculations correctly as claimed. Computer programming and data entry errors, as well as logical inconsistencies in model specification, can all lead to errors that should be detected in the verification process. Reports based on models should contain assurances that the model has been verified.
The predictive validity of a model should also be evaluated when data are available to validate intermediate or final numerical predictions. For instance, predictions of cancer models can be compared with data on observed patterns in cancer incidence, staging, and mortality.80 It is incumbent on the modeler to provide for the possibility of peer review and replication by colleagues; procedures for review of complex models need to be developed.
In conclusion, the goal of this article was to familiarize researchers and clinicians with the issues and terminology involved in evaluating the effectiveness of medical practices. The calculation of net effectiveness for CEA involves accounting for the many complex events that follow from the decision to intervene with patients or populations. Estimates of net effectiveness will rarely be based on the results from a single study that collects all of the outcomes of interest for the alternatives to be analyzed. Much more frequently, the process of estimating effectiveness will be one of constructing models that combine diverse information from several sources. Although we may see more future studies that are designed specifically to collect the primary data needed for a CEA,81 we are likely to continue to rely on approaches that combine data in mathematical models.
The members of the Panel on Cost-Effectiveness in Health and Medicine include Norman Daniels, PhD; Dennis G. Fryback, PhD; Alan M. Garber, MD, PhD; Marthe R. Gold, MD, MPH; David H. Hadorn, MD; Mark S. Kamlet, PhD; Joseph Lipscomb, PhD; Bryan R. Luce, PhD; Jeanne S. Mandelblatt, MD, MPH; Willard G. Manning, Jr., PhD; Donald L. Patrick, PhD; Louise B. Russell, PhD; Joanna E. Siegel, ScD; George W. Torrance, PhD; and Milton C. Weinstein, PhD.
We thank Kristine I. McCoy, MPH, for research, coordination, and editorial assistance in association with this project, and William F. Lawrence, MD, MSc, for editorial review of this manuscript.