Increasingly, computer simulation models are used for economic and policy evaluation in cancer prevention and control. A model’s predictions of key outcomes such as screening effectiveness depend on the values of unobservable natural history parameters. Calibration is the process of determining the values of unobservable parameters by constraining model output to replicate observed data. Because there are many approaches to model calibration and little consensus on best practices, we surveyed the literature to catalogue the use and reporting of these methods in cancer simulation models.
We conducted a MEDLINE search (1980 through 2006) for articles on cancer screening models and supplemented search results with articles from our personal reference databases. For each article, two authors independently abstracted pre-determined items using a standard form. Data items included cancer site, model type, methods used for determination of unobservable parameter values, and description of any calibration protocol. All authors reached consensus on items of disagreement. Reviews and non-cancer models were excluded. Articles describing analytical models which estimate parameters with statistical approaches (e.g., maximum likelihood) were catalogued separately. Models that included unobservable parameters were analyzed and classified by whether calibration methods were reported and if so, the methods used.
The review process yielded 154 articles that met our inclusion criteria and of these, we concluded that 131 may have used calibration methods to determine model parameters. Although the term “calibration” was not always used, descriptions of calibration or “model fitting” were found in 50% (n=66) of the articles with an additional 16% (n=21) providing a reference to methods. Calibration target data were identified in nearly all of these articles. Other methodologic details such as the goodness-of-fit metric were discussed in 54% (n=47 of 87) of the articles reporting calibration methods while few details were provided on the algorithms used to search the parameter space.
Our review shows that the use of these modeling methods is increasing, although thorough descriptions of calibration procedures are rare in the published literature on cancer screening models. Calibration is a key component of model development and is central to the validity and credibility of subsequent analyses and inferences drawn from model predictions. To aid peer review and facilitate discussion of modeling methods, we propose a standardized Calibration Reporting Checklist for model documentation.
Increasingly, mathematical and computer models are used for economic evaluation of cancer prevention and control policies.[1–5] These models fill an important role in policy making as they are able to synthesize data from multiple sources and estimate the effects of interventions in situations when clinical trials may not be feasible because of time, cost, and/or ethical considerations. The National Cancer Institute’s Cancer Intervention and Surveillance Modeling Network (CISNET) recently spurred growth in modeling efforts by funding the development of over 18 models of breast, prostate, colon and lung cancers built to investigate prevention, screening and treatment policy questions in the United States (http://www.cisnet.cancer.gov/). In addition to CISNET, numerous other cancer simulation models have been developed including those for cervical, ovarian and gastric cancers.[7–11]
In general, disease natural history models use a “systems” approach to simulate the underlying course of disease in individuals and project the overall effect of disease on health in a population. Many of these models are quite complex, describing both unobservable and observable portions of the natural history at an individual level. Capturing the mechanism of disease onset and growth in simulated individuals involves specification and determination of unknown model parameters, many of which cannot be directly informed by data as none may exist. One method for parameter determination is calibration.
Formally, model calibration is the process of determining parameter values such that model output replicates empirical data.[6, 12–14] It is performed by comparing model output from different input parameter sets with existing data to identify the parameter set(s) that produce model outputs best corresponding to those data. This is often a complex task, and there has been little consensus on best practice. Calibration, often termed “model fitting”, may be distinguished from other methods of parameter determination such as direct “estimation” of model parameters. In estimation methods, model parameters are estimated in a process separate from the model itself, and the overall fit of model output to data is not considered in parameter determination.
Calibration is a key component of model development and, together with validation, establishes the credibility of modeling results. Models are often criticized for being “black boxes”, with documentation frequently lacking transparency. If modeling is to gain strength as a tool for informing health policy, it is critical that the assumptions, structure, input data, and parameter estimation methods, including calibration, are well documented and made available to “consumers” of these models. To understand how calibration methods are currently described, we surveyed the literature to catalogue these methods as they are used in cancer simulation models. We then propose a framework for reporting the calibration methods used in cancer models and in disease simulation models in general.
We conducted a focused qualitative review of the literature on calibration methods in cancer simulation models.
Multiple sources were used to identify relevant published literature. We used MEDLINE to identify English-language articles published in the years 1980 through 2006. The following US National Library of Medicine “Medical Subject Headings” (MeSH) and keywords were used in the search: cancer, neoplasm, simulation, computer simulation, natural history, and mass screening. The search results were supplemented by reviewing the reference lists from the articles identified in the search and from articles in our personal reference databases. The full listing of articles retrieved by the search is available from the authors.
Study inclusion criteria consisted of articles describing models that explicitly captured the mechanism of the underlying natural history of cancer. Reviews and commentaries were excluded, as were articles describing models of diseases other than cancer. Also excluded were articles unanimously judged by the authors to describe models with no natural history component, including models that begin from the point of cancer detection. We further distinguished purely analytic models that use statistical inference to estimate parameters from microsimulation models that use calibration to determine underlying parameter values. For analytic models, direct estimation of the unknown natural history parameters from observed data is typically done independently of the model itself. We note that some modelers use a “hybrid” approach, with direct estimation of some parameters and subsequent calibration of others.
For each article, two investigators independently abstracted pre-determined data elements using a standard form (Appendix). Data elements abstracted included cancer site, type of simulation model, methods used for determination of unobservable parameter values, description of calibration protocol (if any), and whether model validation was mentioned. All investigators reached consensus on items of disagreement. For articles that provided no calibration information but that referenced previous publication(s) with the same model, information from the prior publication(s) was used to supplement the information reported in the primary article. Review data were summarized using descriptive statistics.
Calibration protocols were characterized by the descriptions of five components: calibration targets, goodness-of-fit metric, search algorithm, acceptance criteria, and stopping rule. A “target” refers to observed data the model attempts to replicate during calibration. The goodness-of-fit (GOF) metric is the quantitative measure of the model’s fit or ability to replicate target data for a particular set of parameter values. The search algorithm is the method for selecting alternative model parameter values to evaluate. The acceptance criteria specify the satisfactory or acceptable levels of fit based on the GOF metric for a particular set of parameter values. Finally, the stopping rule describes the rationale for ending the search procedure and calibration process as a whole. These components are described in more detail in Table 1.
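As an illustration, the five components can be sketched in a short program. The “model” below is a deliberately toy function, and all parameter ranges, target values, and thresholds are hypothetical assumptions for exposition only; they are not drawn from any article in the review.

```python
def model_output(onset_rate, progression_rate):
    """Toy stand-in for a natural history model: predicted incidence
    at three ages (purely illustrative, not a real cancer model)."""
    return [age * onset_rate * (1 + progression_rate) for age in (50, 60, 70)]

# 1. Calibration targets: observed data the model should replicate.
targets = [3.0, 3.6, 4.2]

# 2. Goodness-of-fit (GOF) metric: sum of squared relative differences.
def gof(predicted, observed):
    return sum(((p - o) / o) ** 2 for p, o in zip(predicted, observed))

# 3. Search algorithm: an exhaustive grid over both parameters.
onset_grid = [0.03 + 0.005 * i for i in range(10)]
progression_grid = [0.05 * j for j in range(10)]

# 4. Acceptance criterion: retain parameter sets whose GOF beats a threshold.
ACCEPT_THRESHOLD = 0.05
accepted = []
best_fit, best_params = float("inf"), None

# 5. Stopping rule: here, simply exhausting the grid.
for onset in onset_grid:
    for progression in progression_grid:
        fit = gof(model_output(onset, progression), targets)
        if fit < ACCEPT_THRESHOLD:
            accepted.append((onset, progression))
        if fit < best_fit:
            best_fit, best_params = fit, (onset, progression)

print(best_fit, best_params, len(accepted))
```

Note that several grid points may fit the targets equally well; such non-identifiability is one reason the acceptance criterion and the choice between a single best-fitting set and multiple accepted sets matter in practice.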
The MEDLINE search and personal databases yielded 169 unique articles. We excluded 18 articles on the basis of the title and/or abstract (5 reviews, editorials, or commentaries; 1 meta-analysis; 3 articles about a disease other than cancer; and 9 cancer-related articles that did not report on a model of the natural history of the disease). We reviewed and abstracted data from the remaining 151 articles and excluded an additional 28 articles (5 review articles, editorials, or commentaries; 1 article reporting the results of a database analysis; 4 methodological papers using general models for which calibration was not necessary; and 18 articles reporting on models that did not track the underlying natural history of cancer or that did not evaluate cancer screening programs). One hundred twenty-three articles met the criteria for inclusion. From the references to prior publications in these articles, we identified 31 additional articles that met the inclusion criteria, yielding a total of 154 articles.
The number of published articles describing cancer simulation models increased substantially over the study period (Figure 1). Articles describing models of the natural history of breast or cervical cancers were the most prevalent (Table 2). Note that multiple articles using the same model were counted separately.
Of these 154 articles, 23 used purely analytic methods to directly estimate parameter values[27, 40, 45, 50, 54, 59, 62–66, 69, 73, 74, 78, 95, 123, 124, 149, 150, 161, 174, 175], leaving 131 articles for which calibration may have been used to determine at least some unknown model parameter values. This subset includes articles that may have used a hybrid or combination approach of both analytic and calibration methods for parameter determination (for example, see references [49, 70, 71]). Subsequent summary statistics on the proportions of articles reporting calibration details are based on these 131 articles.
We found that 66 articles discussed or alluded to calibration in the description of the model in the article itself. In some, calibration was explicitly mentioned in the text (for example, see references [48, 61, 120]) while in others we inferred calibration was conducted by authors’ use of terms such as “model fitting” or “model identification” (for example, see references [11, 93, 135]). An additional 21 articles did not explicitly state or imply that calibration was used but did provide references to a prior publication in which calibration methods were mentioned.[29, 34, 37, 38, 41, 42, 47, 68, 79, 82, 84, 94, 98, 104, 106, 111, 113, 119, 121, 128, 136] Thus 87 articles (66%) provided some documentation for the calibration protocol used in developing the model.
Of the 87 articles that discussed or provided a reference to calibration methods, 95% (83 of 87) made at least some mention of the data used as calibration targets. Targets included data from cancer registries, observational studies, and clinical trials. The vast majority of the models were calibrated to multiple targets, although in most cases it was unclear if they were calibrated to these data simultaneously or in stages.
Goodness-of-fit metrics were either explicitly or implicitly described in 54% (47 of 87) of the articles that discussed calibration. Visual assessment of fit, a qualitative goodness-of-fit metric, was used in 20 articles although its use was typically inferred from the article text rather than explicitly stated. For the 27 articles that reported quantitative methods, two used likelihood-based measures [39, 77] and 25 used distance measures such as the absolute or relative differences (for example, see references [43, 48, 61, 71, 81, 135]). The majority of articles did not describe how goodness-of-fit metrics for individual targets were combined to yield an overall goodness-of-fit measure for the model parameters under calibration.
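Because few articles described how per-target fits were combined, it is worth making the step concrete. The sketch below shows one common possibility, a weighted sum of per-target relative distances; the weights, target names, and values are illustrative assumptions, not taken from any reviewed model.

```python
def relative_distance(predicted, observed):
    """Per-target GOF: mean absolute relative difference (lower is better)."""
    return sum(abs(p - o) / o for p, o in zip(predicted, observed)) / len(observed)

# Each target series gets its own weight, e.g. to prioritize registry
# incidence data over a smaller observational study (hypothetical values).
targets = {
    "incidence":  {"observed": [12.0, 18.0, 25.0], "weight": 2.0},
    "stage_dist": {"observed": [0.55, 0.30, 0.15], "weight": 1.0},
}

def overall_gof(model_predictions):
    """Weighted sum of per-target distances: one overall score per parameter set."""
    total = 0.0
    for name, spec in targets.items():
        total += spec["weight"] * relative_distance(
            model_predictions[name], spec["observed"])
    return total

# Example: one candidate parameter set's predictions vs. the targets.
predictions = {"incidence": [11.0, 19.0, 24.0], "stage_dist": [0.50, 0.33, 0.17]}
score = overall_gof(predictions)
print(round(score, 3))
```

The choice of weights is itself a modeling decision that, under the reporting standard proposed here, would need to be documented.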
Search algorithms were generally not well described in the articles we reviewed. We inferred that some used informal “trial-and-error” approaches, while others specified “systematic variation” of model parameter values and provided no further description (for example, see references [7, 36, 81]). When further specified, formal search algorithms included grid search (for example, see references [58, 61, 71, 80]) and random sampling (for example, see references [31, 48]) as well as directed or iterative search methods. Directed search methods use computer algorithms and numerical approximation techniques to identify points in the parameter space likely to lead to successively better model fits. Methods used included the Nelder-Mead algorithm (for example, see references [39, 164]), as well as a variety of other optimization algorithms from engineering (for example, see references [39, 77, 156]).
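Of the algorithms named above, random sampling is among the simplest to state precisely. The sketch below illustrates it with a toy fit function standing in for the full model run and GOF comparison; the parameter names, bounds, and sampling budget are hypothetical.

```python
import random

random.seed(42)  # fixed seed for a reproducible illustration

def gof(params):
    """Toy stand-in for running the model at `params` and scoring its
    output against the calibration targets (lower is better)."""
    onset, progression = params
    return (onset - 0.04) ** 2 + (progression - 0.25) ** 2

# Hypothetical plausible ranges for each unobservable parameter.
bounds = {"onset": (0.0, 0.1), "progression": (0.0, 1.0)}

best_fit, best_params = float("inf"), None
for _ in range(5000):  # stopping rule: a fixed sampling budget
    candidate = (random.uniform(*bounds["onset"]),
                 random.uniform(*bounds["progression"]))
    fit = gof(candidate)
    if fit < best_fit:
        best_fit, best_params = fit, candidate

print(best_fit, best_params)
```

Unlike a grid search, random sampling does not scale its cost exponentially with the number of parameters, which is one reason it appears alongside directed methods in larger models.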
Criteria for identifying parameter sets that provide an acceptable model fit were rarely described in the articles reviewed. Stopping criteria were also not well documented. With few exceptions, modelers ultimately accepted a single best-fitting parameter set to conduct model analyses rather than accept multiple parameter sets to form a posterior distribution that can capture parameter uncertainty.
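The distinction above, between keeping one best-fitting set and keeping all acceptable sets, can be made concrete. In this hypothetical sketch (toy one-parameter model, invented threshold), every parameter value meeting the acceptance criterion is retained, so a downstream prediction yields a range reflecting parameter uncertainty rather than a single point.

```python
import random

random.seed(7)  # reproducible illustration

def gof(p):
    """Toy stand-in for the model's fit to its calibration targets."""
    return abs(p - 0.3)

ACCEPT = 0.05  # hypothetical acceptance criterion on the GOF metric

# Sample candidate values and keep every one that fits acceptably well.
accepted = [p for p in (random.uniform(0, 1) for _ in range(2000))
            if gof(p) < ACCEPT]

def predict(p):
    """Toy model prediction (e.g., screening effectiveness) at parameter p."""
    return 100 * p

# Running the analysis over all accepted sets yields an interval, not a point.
preds = sorted(predict(p) for p in accepted)
low, high = preds[0], preds[-1]
print(len(accepted), round(low, 1), round(high, 1))
```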
Model validation methods or model validity were mentioned in 52% (80 of 154) of the articles although details as to how validity was assessed were provided in only a few (for example, see references [10, 83]).
Our descriptive review indicates that while an array of techniques is used for model calibration, little attention has been paid to the documentation of these methods in the cancer simulation literature. Calibration is a necessary component of models that simulate an unobserved disease process, as it helps ensure the validity and credibility of inferences drawn from model predictions. We found that the calibration techniques used in cancer screening models vary in complexity and rigor, ranging from “trial-and-error” searching that may rely on a more subjective visual assessment of model fit to directed search algorithms paired with more objective quantitative assessments of fit.
Our review also indicates that the depth of documentation of calibration methods in the cancer simulation literature is highly variable. We had expected the methods to be well documented in all published papers. Descriptions of calibration methods, when included, ranged in detail and were often only a few sentences in length. In most of the equivocal cases, details were not reported or the terminology was too imprecise to categorize the methods. The very real possibility that we misclassified the methods used by these models emphasizes the need for clearer and more consistent reporting. We recognize that journal word count limits may preclude detailed descriptions of calibration protocols and that the technical details of model calibration may be difficult to publish as stand-alone articles because of their specialized nature. However, clear documentation is critical for both modelers and consumers of model results.
To aid the documentation process, we propose a Calibration Reporting Checklist be adopted as standard practice for modelers (Figure 2). The checklist ensures reporting of details regarding the data, methods and sources for the calibration targets, goodness-of-fit metric(s), search algorithm(s), acceptance criteria, and stopping rules. By encouraging more complete and consistent documentation of all components of calibration, the checklist would aid in the peer-review process. Additionally, with improved transparency, modeling may be less frequently viewed as a “black box” process.
The use of the checklist also will provide a means of disseminating existing and new calibration methodology from other fields such as engineering and environmental science to the disease modeling community. Further it can facilitate comparisons of methods across models, critical for collaborative modeling projects such as CISNET. However there are few rules of thumb to guide choice of methods for any particular model. At present the process of calibration is often an art, rather than a science. Open research questions remain about the quality and appropriateness of alternative calibration methods. Direct comparisons of methods both within and across models are needed to understand if different methods could lead to different calibration results. These types of comparisons will be important next steps for the advancement of model calibration methodology as well as disease simulation modeling in general.
Our focus on cancer screening models for this review does not allow conclusions about the reporting of calibration methods used in models of other diseases. However similar reporting issues most likely exist in other disease models and the checklist would be applicable for these as well. Because of the imperfect nature of article indexing, our search may have missed some relevant articles. For example, the term “natural history” only became a US National Library of Medicine MeSH heading in 1996 and is a broad term. We attempted to mitigate the potential loss of articles by performing a keyword search and including articles from our personal reprints and those referenced in the retrieved articles. As illustrated by our survey of the literature, this is a growing field with greater numbers of disease modeling articles published each year. We are aware of several recent articles describing details of calibration methods[26, 178, 179] but there are likely additional examples that have been published after our search period.
Disease simulation modeling plays an important role in health policy analysis. Our review indicates the application of such models in cancer screening has become more widespread especially within the past decade. The use of disease simulation modeling is likely to expand especially in light of new initiatives in comparative effectiveness research. In addressing policy questions, these models synthesize biologic, epidemiologic and economic data from diverse sources. With their increasing use for policy making, the models themselves will face additional scrutiny and therefore careful documentation of methods will be critical for establishing credibility. While there have been efforts to standardize methods for decision analytic modeling in economic evaluation,[177, 180] less attention has been paid to the calibration of these models. Although questions remain about the best calibration practices, the use of the Calibration Reporting Checklist would begin the standardization process, provide a basis for more transparent comparisons across models, and facilitate important discussions about methods.
The authors gratefully acknowledge the support of Drs. Eric (Rocky) Feuer and Karen Kuntz and members of the NCI Cancer Intervention and Surveillance Modeling Network. This work was supported in part by grants from the National Cancer Institute: F32 CA1259842 (NKS), R25 CA92203 (ABK), K99 126147 (PMM, CYK) and R01 97337 (GSG, PMM, CYK). The funding agreements ensured the authors’ independence in designing the study, collecting, analyzing and interpreting the data, writing, and publishing the report. An earlier version of this work was presented at the 2007 Society for Medical Decision Making Annual Meeting.
Publisher's Disclaimer: This is the prepublication, author-produced version of a manuscript accepted for publication in PharmacoEconomics. This version does not include post-acceptance editing and formatting. The definitive publisher-authorized version, PharmacoEconomics 2009;27(7):533-545, is available online at: http://adisonline.com/pharmacoeconomics/.