Microsimulation models (MSMs) for health outcomes simulate individual event histories associated with key components of a disease process; these simulated life histories can be aggregated to estimate population-level effects of treatment on disease outcomes and the comparative effectiveness of treatments. Although MSMs are used to address a wide range of research questions, methodological improvements in MSM approaches have been slowed by the lack of communication among modelers. In addition, there are few resources to guide individuals who may wish to use MSM projections to inform decisions.
This article presents an overview of microsimulation modeling, focusing on the development and application of MSMs for health policy questions. We discuss MSM goals, overall components of MSMs, methods for selecting MSM parameters to reproduce observed or expected results (calibration), methods for MSM checking (validation), and issues related to reporting and interpreting MSM findings (sensitivity analyses, reporting of variability, and model transparency).
MSMs are increasingly being used to provide information to guide health policy decisions. This increased use brings with it the need both for better understanding of MSMs by policy researchers, and continued improvement in methods for developing and applying MSMs.
In the late 1950s, Guy Orcutt proposed microsimulation models (MSMs) as a method for exploring social policy questions by simulating the effect of policy on decision making units (e.g., individuals, families, or corporations).1 During the 1970s, MSMs were developed to guide U.S. social policy decisions.2 Models of traffic flow are used to plan transit projects.3, 4 MSMs are used in operations research5 to describe queuing systems, where they are referred to as discrete event simulations.6–8 MSMs for the transmission of infectious diseases such as HIV,6, 7 influenza,8–10 smallpox,11 onchocerciasis,12, 13 and schistosomiasis14 are used to examine the effects of intervention and policy change on disease transmission, including the cost effectiveness of vaccination programs.15 MSMs for traffic and infectious diseases allow interactions among 'agents' (e.g., cars or individuals). Such 'agent-based models'16, 17 are needed when agent interactions are critical to downstream outcomes, for example, when modeling epidemics. To simplify discussion, this article focuses on somewhat simpler MSMs that assume independence across individuals, though the issues raised here are equally relevant to agent-based models.
MSMs were applied to health policy questions as early as 1985, when Habbema and colleagues introduced the “Microsimulation Screening Analysis” (MISCAN) model18 to examine the impact of cancer screening on morbidity and mortality. MISCAN models have been developed to describe the effects of screening for cervical19, breast20, colorectal21, and prostate cancer.22 In 1994, the Population Health Model was introduced to evaluate costs of diagnosis and treatment of lung and breast cancer,23, 24 as part of a broader microsimulation effort by Statistics Canada25. MSMs have also been developed for diabetes,26, 27 cardiovascular disease,28, 29 stroke,30 reoperation rates after aortic valve replacement,31 organ transplant,32, 33 osteoporosis,34–36 and end stage liver disease.37 Karnon and colleagues review some of the models used to evaluate screening programs.38 Several funding agencies now support the development and application of MSMs, including the National Cancer Institute's Cancer Intervention and Surveillance Modeling Network (CISNET)39, the National Institute of General Medical Sciences' Models of Infectious Disease Agent Study40, and the Robert Wood Johnson Foundation and National Institutes of Health's Childhood Obesity Modeling Network41.
Many of the MSMs used to examine health policy questions are written by researchers for specific disease processes, an endeavor requiring a high level of programming expertise and a large time commitment. Software programs, such as TreeAge42 and Arena,43 allow users to implement discrete event MSMs with relative ease,44–47 though there are few publications to guide this work. With the increasing use of MSMs, there is a growing need for both modelers and end-users of MSMs to consider issues related to their application. This article provides an overview of MSMs, focusing on model development and applications to health policy questions.
MSMs describe events and outcomes at the person-level with an ultimate goal of providing information that can guide policy decisions. (In contrast, mechanistic models, such as those that simulate the behavior of cells, are developed to provide insight into underlying processes. Biological models, discussed in section 3.0, can incorporate mechanistic models into policy-focused MSMs.) Examples of policy-relevant findings from MSMs include over-diagnosis of prostate cancer among PSA-detected cases;48 identification of efficient cervical cancer screening policies;49 and the impact of modifiable risk factors, screening, and treatment on colorectal cancer (CRC) mortality rate.50
MSMs provide policy-relevant information by generating predictions. For example, MSMs can be used to predict trends in disease incidence and mortality under alternative health policy scenarios50, 51, or to compare the effectiveness and cost-effectiveness of treatments.52 Using MSMs for prediction often requires integration of results across studies, and may require extension of results, for example extending cross-sectional results to longitudinal predictions, and extending results to more diverse patient populations or to comparisons not made in available RCTs. For example, RCTs demonstrate that fecal occult blood testing (FOBT) leads to reduced CRC mortality,53–55 and a case-control study found that flexible sigmoidoscopy was also associated with reduced CRC mortality.56 While there is no direct evidence that either optical colonoscopy or CT colonography reduce CRC mortality, several studies have estimated their sensitivity and specificity for detecting colorectal adenomas.57–59 MSMs for colorectal cancer have combined available information about CRC screening tests to compare the effect and cost-effectiveness of all four of these screening modalities.60
Thus, the explicit goal of using MSMs for prediction is met by developing models that combine results from randomized controlled trials, epidemiologic studies (e.g., case-control and cohort studies), meta-analyses, and expert opinion. This combining of information becomes an implicit modeling goal and is conducted in conjunction with model calibration.
MSMs have two components: a natural history model and an intervention model. Our focus is on the natural history model, which describes the disease process in the absence of intervention. The natural history model may be complex, but once developed it can be combined with different intervention models to answer policy questions. The appropriate level of model complexity depends on the disease process, the scientific questions to be addressed by the model, and the data available to inform model parameters. There is a tension between the simplicity of a model and the complexity of the disease process. Simpler models include fewer parameters, making it more likely that model parameters can be estimated using observed data, and are easier to describe, making them more transparent. However, more complex MSMs can be used to address a wider range of scientific questions. One approach is to begin with relatively simple models, extending these as needed to address specific research questions.61
When considering model complexity, it is useful to distinguish between biological models, which describe the underlying disease processes, and epidemiological models, which focus attention on the observable portion of the disease process62. For example, Luebeck and colleagues63 predicted the effect of folate on colorectal cancer risk using a biological model that describes three phases of carcinogenesis: stem cell initiation (acquisition of one or more mutations), cell proliferation, and malignant conversion. Their model allowed folate to both reduce initiation rates and increase proliferation rates, and predicted that early folate initiation (e.g., at age 2) decreases colorectal cancer risk while late folate initiation (e.g., at age 65) may increase risk. While complex biological models can describe the disease process more completely than epidemiological models, they also include parameters that cannot be directly informed by data (such as the impact of folate on cell initiation and proliferation rates).
There are three essential steps in developing any MSM: 1) Identifying a fixed number of distinct states and characteristics associated with these states; 2) Specifying stochastic rules for transition through states; and 3) Setting values for model parameters.
The specification of distinct states included in an MSM depends on the disease process, the research questions of interest, and the availability of data. The number of states included in a model may also depend on how characteristics are attributed to states. For example, an MSM could describe lesion size categorically with different size categories considered different states, or it could model a single lesion state with size treated as a characteristic of the lesion.
MSMs are either state-transition or continuous-time models. State-transition models allow individuals to move between states at fixed time intervals (possibly dependent on the state), with probabilities specific to this cycle length. In continuous-time MSMs, intervals between transitions have continuous distributions. Both types of MSMs allow transition rules to depend on individual characteristics. MSM transition probabilities may be specified as Markov,64 meaning that the probabilities of the next transition depend only on the current state and not on the previous history.65 Many models relax this assumption by carrying forward relevant past information into the current state.
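The three development steps and the state-transition structure described above can be sketched in a few lines of code. The following is a minimal illustration, not any published model: the state names, annual transition probabilities, and age range are all invented for the example.

```python
import random

# Hypothetical disease states and annual transition probabilities.
# All numbers are illustrative only, not estimates from any calibration.
TRANSITIONS = {
    "healthy":   [("precursor", 0.02), ("dead_other", 0.01)],
    "precursor": [("cancer", 0.05), ("dead_other", 0.01)],
    "cancer":    [("dead_cancer", 0.20), ("dead_other", 0.01)],
}
ABSORBING = {"dead_cancer", "dead_other"}

def simulate_life_history(start_age=50, max_age=100):
    """Simulate one individual's annual transitions; return (age, state) events."""
    state, history = "healthy", []
    for age in range(start_age, max_age):
        if state in ABSORBING:
            break
        u, cum = random.random(), 0.0
        for next_state, prob in TRANSITIONS[state]:
            cum += prob
            if u < cum:  # stochastic transition rule
                state = next_state
                history.append((age, state))
                break
    return history

# Aggregate many simulated life histories to estimate a population-level outcome.
random.seed(1)
histories = [simulate_life_history() for _ in range(10_000)]
cancer_deaths = sum(any(s == "dead_cancer" for _, s in h) for h in histories)
print(f"simulated fraction dying of 'cancer': {cancer_deaths / 10_000:.3f}")
```

A continuous-time variant would instead draw waiting times from continuous distributions (e.g., exponential) rather than applying per-cycle probabilities.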
Given the structure of an MSM, it is necessary to specify the values of parameters that determine transitions through disease states. Parameters associated with observable processes can be estimated directly from biological, clinical, or epidemiological evidence. For example, the distribution of survival time following cancer diagnosis may be estimated from Surveillance, Epidemiology, and End Results (SEER) data,66 and other-cause mortality probabilities may be based directly on life tables derived by removing deaths attributable to the modeled disease from the Berkeley mortality databases. Rosenberg,67 for instance, demonstrates calculation of non-breast cancer mortality using the Berkeley mortality databases, with breast cancer deaths estimated using data from the National Center for Health Statistics databases.68
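The life-table derivation described above amounts to subtracting disease-attributable deaths from all-cause deaths at each age. A small sketch, with invented counts standing in for real life-table and vital-statistics data:

```python
# Deaths per 100,000 at each age; the numbers are invented for illustration,
# not taken from the Berkeley mortality databases or NCHS data.
all_cause_deaths = {70: 2500, 71: 2700, 72: 2900}
disease_deaths = {70: 120, 71: 135, 72: 150}  # deaths attributable to the modeled disease
population = 100_000

# Annual other-cause mortality probability = (all-cause - disease) / population.
other_cause_prob = {
    age: (all_cause_deaths[age] - disease_deaths[age]) / population
    for age in all_cause_deaths
}
print(other_cause_prob[70])  # e.g., annual probability of death from other causes at 70
```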
When such direct estimation is not possible, model parameters are selected so that the model reproduces observed results, a process called 'calibration'. Calibration is typically necessary for complex models involving transitions that are observed only through indirect consequences. For example, an observed rate of detected tumors is a function of two unobserved processes: the rate of tumor initiation and the growth of tumors to a detectable size. In this case, the functional relationships between MSM parameters and observable calibration data are complex, since MSM parameters describe transitions between specific states, which might not all be observed, while data generally describe observed states resulting from a series of transitions. As a result, MSM parameters may be nonidentifiable.69 Parameter identifiability is determined by the model in the context of available data. A nonidentifiable parameter cannot be estimated, even with an infinite amount of data, though it could become identifiable if a new type of data became available. When parameters are nonidentifiable, it is still possible to find values that provide good fit to calibration data, but these parameter values are not unique: different combinations of (unobservable) initiation and growth rates can produce the same (observed) rate of tumors detected at screening. Whether or not a parameter is identifiable may not be obvious. A parameter associated with an unobservable process can be identifiable if there is sufficient information about the entire process, and complete description of a disease process may necessitate specification of an MSM with nonidentifiable parameters.
In such cases, parameters can be made identifiable by adding information to the model, either by setting some parameters to fixed values and carrying out sensitivity analyses (described in section 5.2) or by specifying prior distributions for parameters and carrying out Bayesian calibration (described below).
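The tumor-detection example of nonidentifiability can be made concrete with a deliberately oversimplified toy model: suppose the observable one-period detection rate were simply the product of an (unobservable) initiation probability and an (unobservable) probability of growth to detectable size. Then only the product is identified, and distinct parameter pairs fit the data equally well. The parameter values below are invented for illustration.

```python
# Toy nonidentifiability sketch (illustrative model, not from the article):
# observed detection rate = p_init * p_grow, so only the product is identified.
candidates = [(0.10, 0.40), (0.20, 0.20), (0.40, 0.10)]

for p_init, p_grow in candidates:
    # Every pair yields (numerically) the same observable detection rate,
    # so calibration data alone cannot distinguish among them.
    print(f"p_init={p_init}, p_grow={p_grow}, detection rate={p_init * p_grow:.2f}")
```

Fixing one parameter, or placing a prior on it, breaks this tie, which is exactly the role of the added information described above.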
Early MSMs were calibrated by perturbing parameters one at a time and subjectively judging agreement with available data.70 Subjective judgment has largely been replaced by the use of statistics that measure how well the model fits calibration data, and ad hoc parameter perturbation has been replaced by grid search methods.71–74 An undirected grid search selects parameter values based on evaluation of model fit at every node in a grid of parameter values or at a random set of values.73, 75 Undirected searching is based only on MSM results, and therefore can be directly applied to virtually any MSM. However, grid searches are not computationally feasible for highly parameterized models, since the number of grid nodes grows exponentially with the number of model parameters. Furthermore, even a dense grid or a large random sample of parameters might miss regions of good fit.
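An undirected grid search can be sketched as follows. To keep the example self-contained, a trivial stand-in function (the product of two hypothetical transition probabilities) replaces a full MSM run, and the calibration target and grid are invented; a real application would run the simulation model at each node.

```python
import itertools

TARGET_RATE = 0.04  # hypothetical observed rate the model should reproduce

def model_prediction(p_init, p_grow):
    # Stand-in for a full MSM run; a real search would simulate the model here.
    return p_init * p_grow

def goodness_of_fit(pred, target=TARGET_RATE):
    return (pred - target) ** 2  # squared error; smaller is better

# Undirected grid search: evaluate model fit at every node of a parameter grid.
grid = [i / 100 for i in range(1, 51)]  # 0.01 to 0.50 in steps of 0.01
best = min(itertools.product(grid, grid),
           key=lambda params: goodness_of_fit(model_prediction(*params)))
print("best-fitting parameters:", best)
```

Note that the grid here has 50 × 50 = 2,500 nodes for just two parameters; with ten parameters at the same resolution it would have 50^10 nodes, illustrating why grid searches become infeasible for highly parameterized models.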
Directed searches move through the parameter space by 'hill climbing', that is, moving in a direction of improving goodness of fit. (The Nelder-Mead simplex algorithm,76 a commonly used directed search method, does this without derivative information.) Gradient-based directed searches choose their direction using the derivative of the likelihood function, which describes the probability of the calibration data given the MSM and a set of parameter values. These derivatives give the rate of change of the objective function, directing the algorithm to move in the direction of most rapid improvement ('up the hill'). For MSMs, there is usually no closed form expression for these derivatives, so directed searches must rely on approximations.77 In addition, directed searches may find parameter values that provide locally good fit, but not the best fit across all possible parameter sets (globally good fit). To avoid locally, but not globally, good solutions, directed searches should be initiated at widely dispersed points within the parameter space. In spite of these difficulties, directed search methods for MSM calibration are generally more computationally efficient than grid search approaches, requiring fewer runs of the MSM for parameter estimation. Some modelers have focused on model simplification to allow likelihood-based estimation of MSM parameters that parallels usual frequentist or Bayesian estimation approaches.74, 78–86
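The logic of a directed search, including the use of widely dispersed starting points to guard against merely local solutions, can be sketched with a simple derivative-free hill climber. The objective is the same toy stand-in for an MSM (the product of two hypothetical probabilities against an invented target); this is an illustration of the idea, not the Nelder-Mead algorithm itself.

```python
import random

def model_prediction(p_init, p_grow):
    # Stand-in for an MSM run (illustrative toy model, not a real simulation).
    return p_init * p_grow

def fit(params, target=0.04):
    return (model_prediction(*params) - target) ** 2  # smaller is better

def hill_climb(start, step=0.05, min_step=1e-4):
    """Derivative-free directed search: move to the best neighboring point;
    halve the step size whenever no neighbor improves the fit."""
    current = list(start)
    while step > min_step:
        neighbors = [
            [min(max(current[i] + d, 0.001), 0.999) if i == j else current[i]
             for i in range(2)]
            for j in range(2) for d in (-step, step)
        ]
        best = min(neighbors, key=fit)
        if fit(best) < fit(current):
            current = best          # climb 'up the hill'
        else:
            step /= 2               # refine the search near the current point
    return current

# Initiate the search at widely dispersed points to avoid purely local solutions.
random.seed(0)
starts = [(random.uniform(0.05, 0.95), random.uniform(0.05, 0.95)) for _ in range(5)]
solutions = [hill_climb(s) for s in starts]
best_overall = min(solutions, key=fit)
print("best fit achieved:", fit(best_overall))
```

Because the toy objective has a whole curve of equally good solutions, the five starts typically converge to different parameter pairs with near-identical fit, echoing the nonidentifiability issue discussed earlier.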
Another challenge associated with calibration is the combination of data from different sources, collected at different times, from different populations. This problem, essentially one of meta-analysis, arises from the implicit MSM goal of integrating results from randomized controlled trials, observational studies, and expert opinion. If study data include information on participant characteristics, it can be incorporated into models to adjust for between-study differences, but such covariate adjustment is not always possible. In addition, some data may carry more weight than other data because of sample size, the relevance of the population studied, or subjectively perceived data quality. Thus, for example, the modeler might have to assign relative weights to a study of moderate size conducted under current conditions and a much larger study conducted during an earlier period.
Once developed, MSMs are used to simulate a hypothetical population with specific characteristics, such as a specific age-sex distribution, or a specific risk factor profile. An MSM can be structured to simulate a population directly, taking distributions of population characteristics at baseline as inputs (e.g., age, gender, and relevant risk factors). Alternatively, cohort models simulate individuals for a relatively narrow age range and specify uniform age distributions within the age range, with hypothetical populations generated by combining multiple simulated cohorts.
MSMs are sometimes used to simulate cohorts rather than representative populations. The choice of the target set (population or cohort) can affect the conclusions drawn from the MSM when costs, effectiveness, or cost effectiveness vary across cohorts.87 For example, because cervical cancer screening is less cost-effective for older women, simulations targeted to a cohort of 15–20 year olds will find that shortening the screening interval from 3 to 2 years is more cost effective than simulations targeted to a population that includes more older women.88
Assessment of a calibrated MSM includes model validation, examination of sensitivity to untestable assumptions, and incorporation of variability.
Model validation is the process of assessing whether a model is consistent with data not used for calibration, a process also called 'external validation'. Because validation requires that data be held out of the calibration process, MSMs may not be validated, or validation may be based on simulation of randomized trials that are not directly related to the processes of interest. A related problem is that the models generally cannot be validated against the outcomes of greatest interest, since the models focus on unobserved or unobservable phenomena. In some cases, important outcomes, such as survival, may be set aside and used as validation points.89
Comparison of model outputs to calibration data is sometimes called 'internal validation'. There is a grey area between internal and external validation that involves using detailed calibration data for model assessment. For example, suppose that calibration data include incidence rates by decade of life, and these rates can be used to internally validate the MSM. If incidence rates were also available by gender and year of life, then further model validation could compare these more detailed rates that were not directly used for calibration.
Sensitivity analysis refers to estimation and presentation of model results under various scenarios, often corresponding to varying values of model parameters that are inestimable or poorly estimable. Sensitivity analyses can also provide insight into the impact of specific model assumptions. For example, sensitivity analysis can be used to explore whether adenoma regression, which cannot be directly observed, is plausible by comparing predictions under specific scenarios, e.g., 'no regression' and '10% of lesions regress'.72 Probabilistic sensitivity analysis places distributions on unknown parameters, providing a range of possible results. Parameters are sampled from specified distributions, and multiple MSM runs are used to infer variability in model results that result from variability in model parameters.83, 90–93 Sensitivity analyses are common, largely because most models include unobservable components.
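A probabilistic sensitivity analysis can be sketched as follows: place a distribution on an uncertain parameter, sample from it repeatedly, run the model at each draw, and summarize the spread of results. The 'model' below is a trivial stand-in for an MSM, and the parameter (a lesion regression probability) and its Beta distribution are invented for illustration.

```python
import random
import statistics

def model_outcome(regression_prob, n=5_000, rng=random.Random(7)):
    # Stand-in for an MSM run: fraction of simulated 'lesions' that progress
    # rather than regress (toy model, illustrative only).
    return sum(rng.random() > regression_prob for _ in range(n)) / n

# Sample the uncertain parameter from a specified distribution
# (Beta(2, 18), mean regression probability of about 0.10 -- an assumption).
random.seed(7)
draws = [random.betavariate(2, 18) for _ in range(200)]

# Run the model once per draw and summarize variability in the results.
results = sorted(model_outcome(p) for p in draws)
low, high = results[4], results[194]  # empirical 2.5th / 97.5th percentiles
print(f"median outcome: {statistics.median(results):.3f}")
print(f"outcome range: {low:.3f} to {high:.3f}")
```

The width of the reported range shows how much of the variability in model output is attributable to uncertainty about this one parameter.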
Another type of model assessment compares predictions from different models that are calibrated using the same data. Such comparative modeling studies allow exploration of uncertainty due to model structure. Examples of this approach include estimation of the combined effects of screening and treatment on breast cancer mortality based on 7 CISNET models for breast cancer89 and the Mt. Hood Challenge comparing diabetes models.26, 27 Each of these groups compared models only after standardizing the calibration data. Without such cooperation to simulate and present results, it can be difficult to directly compare model results. For example, four different simulation studies examining the cost-effectiveness of spiral CT for lung cancer screening based on results from the Early Lung Cancer Action Project (ELCAP)94, 95 reported a wide range of cost effectiveness, from $2,500 per life year gained to $154,000 per quality adjusted life year gained.96–99 The reasons for these large differences include differences in the screening frequency examined, lung cancer risk in the hypothetical population, and costs attributed to screening, diagnosis, and treatment. Differences in the underlying assumptions of the microsimulation models are another important source of variability, but the impact of these differences is difficult to determine without a systematic comparison of both model assumptions and model predictions.
While comparison of results across independently developed models provides an important avenue for addressing variability due to model structure, these comparisons are very time consuming and are only practical for major policy questions. In addition, it can be difficult to determine which models provide the best fit to available data. Bayesian model averaging provides a formal framework for model comparison that could, in theory, be used to compare different parameterizations of a single model, but this approach requires a maximized likelihood for each competing model.100
Sources of MSM variability and uncertainty include inherent variability in the population of interest, variability due to estimation of unknown parameters, selection of calibration data, sampling variability of the selected calibration data, simulation (Monte Carlo) variability, and variability due to model structure assumptions.83, 101 Bayesian calibration methods parallel Bayesian estimation, providing interval estimates that describe the variability in both parameter estimates and model predictions due to parameter estimation, sampling variability of the selected calibration data, and simulation variability.85 Repeated model runs that systematically vary parameter values can be used to assess the relationship between parameter variability and variability in model predictions, with findings used to direct model improvement (e.g., additional data collection or modifications to the model structure) toward the parameters that have the greatest impact on prediction variability.102, 103
To illustrate the use of replication in error analysis for calibration, suppose the target population includes N individuals. Rather than simulating a single population of 1000N individuals, more information can be gained by simulating results independently in 1000 samples of size N; while a precise point estimate is obtained as the mean of the 1000 simulations, an interval estimate can be obtained by calculating the variability in model results across simulations.
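This replication scheme can be sketched directly. The per-person event probability and sample size below are invented for illustration; a real analysis would replace the one-line sampler with full MSM runs.

```python
import random
import statistics

def simulate_sample(n, p_event=0.04):
    # Stand-in for one MSM run on a sample of n individuals: the fraction
    # experiencing a hypothetical event (probability is illustrative only).
    return sum(random.random() < p_event for _ in range(n)) / n

random.seed(42)
N = 500  # size of the target population (illustrative)

# Simulate 1000 independent samples of size N rather than one sample of 1000*N.
runs = sorted(simulate_sample(N) for _ in range(1000))

# Point estimate: mean across replications.
point_estimate = statistics.mean(runs)
# Interval estimate: variability in results across replications
# (empirical 2.5th and 97.5th percentiles).
interval = (runs[24], runs[974])
print(f"point estimate: {point_estimate:.4f}")
print(f"interval estimate: {interval[0]:.4f} to {interval[1]:.4f}")
```

A single pooled run of 1000*N individuals would yield essentially the same point estimate, but only the replicated design reveals how much results vary from sample to sample.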
Although these methods can be used to summarize and explain uncertainty in model predictions, computational and conceptual limitations mean that MSM results are routinely provided without measures of precision. Further development of computationally efficient methods to estimate the uncertainty in MSM estimates and predictions is an important area of research.
Model transparency refers to the ability to convey model assumptions. There are divergent views on what transparency means and what level of transparency is optimal.104, 105 Different levels of disclosure can be consistent with model transparency, such as describing MSM assumptions (including relative costs), providing algorithms used in the MSM, providing equations used to program the MSM, and, ultimately, releasing the computer code underlying the model. While release of computer code is seemingly the most transparent approach, this strategy is time consuming and ultimately uninformative to the vast majority of end users, so that code release may obscure rather than clarify the model. While transparency often refers to model structure (assumptions, algorithms, equations), it should also incorporate the data used to calibrate or estimate model parameters, goodness of fit to calibration data (including fit to subgroups of interest), and validation results.
MSM transparency can be difficult to achieve. In experimental or observational research, transparency is often understood as the hypothetical ability to repeat an experiment or analysis. Yet many published articles use complex data analytic models that cannot be completely described within page limits, so that even these analyses are not completely transparent. Furthermore, there are very few venues for complete model description and review, which is necessary for complete evaluation of a model. Technical appendices, including supplemental material published online, can be critical; for example, the CISNET modeling group provides online model profiles.106 Finally, there are disincentives to fully transparent models. Models take years to develop, resulting in hesitation to fully disclose this 'intellectual property', and there are explicit financial disincentives for making proprietary models fully transparent. In spite of these difficulties, model transparency is necessary if MSMs are to be used to inform policy decisions.
Microsimulation models combine expert opinion with observational and experimental results, providing a relatively inexpensive way to estimate population-level effects (including costs and benefits) of interventions, policy changes, or shifts in risk factors. Microsimulation modeling is beginning to coalesce as a defined area of expertise, as evidenced by publication of modeling guidelines,107, 108 funding of microsimulation modeling efforts by NCI and NIH, microsimulation-focused conferences such as the Conference of the International Microsimulation Association, sponsored by Statistics Canada,109 and the General Conference of the International Microsimulation Association,110 and the recently launched International Journal of Microsimulation. Given this interest, we anticipate both an increased use of microsimulation models to address research questions and parallel improvements in methods used to develop and apply microsimulation models.
Supported by NCI U01 CA97427