As the complexity of microsimulation models increases, however, concerns about model transparency are heightened.
We conducted model “experiments” to explore the impact of variations in “deep” model parameters using three colorectal cancer (CRC) models. All natural history models were calibrated to match observed data on adenoma prevalence and cancer incidence, but varied in their underlying specification of the adenoma-carcinoma process. We projected CRC incidence among individuals with an underlying adenoma or preclinical cancer vs. those without any underlying condition and examined the impact of removing adenomas. We calculated the percentage of simulated CRC cases arising from adenomas that developed within 10 or 20 years prior to cancer diagnosis, and estimated dwell time – defined as the time from the development of an adenoma to symptom-detected cancer in the absence of screening among individuals with a CRC diagnosis.
The 20-year CRC incidence among 55-year-old individuals with an adenoma or preclinical cancer was 7 to 75 times greater than in the condition-free group. The removal of all adenomas among the subgroup with an underlying adenoma or cancer resulted in a reduction of 30% to 89% in cumulative incidence. Among CRCs diagnosed at age 65, the proportion arising from adenomas formed within 10 years ranged between 4% and 67%. The mean dwell time varied from 10.6 years to 25.8 years.
Models that all match observed data on adenoma prevalence and cancer incidence can produce quite different dwell times and very different answers with respect to the effectiveness of interventions. When conducting applied analyses to inform policy, using multiple models provides a sensitivity analysis on key (unobserved) “deep” model parameters and can provide guidance about specific areas in need of additional research and validation.
Colorectal cancer (CRC) is the second most common cause of cancer-related death in the United States.1 It is estimated that 142,970 CRC cases were diagnosed in 2010 with 51,370 CRC deaths. Most, if not all, CRCs arise from an adenomatous polyp,2 the detection of which is one of the goals of CRC screening.3,4 Randomized controlled trials (RCTs) of CRC screening with fecal occult blood testing (FOBT) or flexible sigmoidoscopy have shown reductions in CRC incidence and mortality.5-10 While RCTs can provide important information about the effectiveness of screening strategies, they are designed to address specific questions and necessarily leave some important questions unanswered. For example, RCTs are unwieldy for comparing multiple strategies that include different screening intervals or different ages to start or stop screening.
Microsimulation models can be used to synthesize data from multiple sources of evidence to evaluate a more comprehensive set of CRC screening strategies. The models discussed in this article simulate the natural history of the adenoma-carcinoma process, which is largely unobservable. The outcomes associated with screening can then be modeled via the detection and removal of adenomas or the early detection of cancer by applying the test characteristics of screening tests.
Numerous other models have been developed to evaluate the costs, clinical benefits, and cost-effectiveness of CRC screening.11-22 While those models all support the finding that annual FOBT relative to no screening costs less than $20,000 per life year gained, they disagree about how alternative strategies compare against one another.23,24 For example, using a willingness to pay threshold of $50,000-$100,000 per life-year gained, three different strategies were found to be optimal: annual FOBT plus 5-yearly flexible sigmoidoscopy, 10-yearly colonoscopy, and colonoscopy at ages 55 and 65 only.24 The Institute of Medicine Workshop on Economic Models of Colorectal Cancer Screening25 was convened in 2004 to explore these findings further with five models.12,16-18,20 The workshop showed that the discrepancies across models could be reduced somewhat by standardizing the inputs on adherence, test characteristics, costs and follow-up assumptions; however, it was unable to further evaluate the uncertainties in the “deep” natural history assumptions across the different models, such as adenoma dwell time.
The Cancer Intervention and Surveillance Modeling Network (CISNET) is a consortium of research teams funded by the National Cancer Institute to develop and use models to address questions related to cancer control and prevention. The CRC-focused CISNET teams represent three independently-developed microsimulation models of the natural history of CRC. As part of the CISNET consortium the groups conducted several experiments to investigate the relative differences among the three models—all of which were calibrated to the same observational data on adenoma prevalence and cancer incidence, but may have different implications for screening effectiveness due to differences in “deep” model parameters. In this paper we explore the differences in rates of adenoma progression to cancer—the biggest uncertainty in the natural history process—across models, and discuss their implications for evaluating screening strategies. In a companion paper published in this issue of Medical Decision Making, we propose the use of a summary measure that would provide insight into the implications for predicted screening effectiveness of differences in natural history assumptions.26
We used the three independently developed CRC CISNET microsimulation models (MISCAN, CRC-SPIN, and SimCRC) to project outcomes for a synthetic cohort of individuals. Each model has a natural history component that simulates the progression of colorectal disease through the adenoma-carcinoma sequence in the absence of screening, and incorporates the assumption that all CRCs arise from adenomas (Figure 1). The models are calibrated to the same data regarding adenoma prevalence, cancer incidence, and stage distribution. These data were collected and processed as part of CISNET and can be considered the best-available data for informing the simulation models. However, some model parameters cannot be directly informed by available data. In this paper we focus on dwell time, defined as the time from the development of an adenoma of detectable size to symptom-detected cancer in the absence of screening among individuals with a CRC diagnosis. Standardized profiles of each model’s structure and underlying model parameters and assumptions, with additional references, are available at http://cisnet.cancer.gov/profiles/. In addition, model specifications and key parameters are provided in the companion paper.26
All three natural history models simulate the life histories of a large population of individuals from birth to death. Before age 20, simulated individuals are assumed to be at no risk of developing detectable adenomas (i.e., adenomas >1mm). After age 20, a simulated individual can develop an adenoma in any one of six locations: 1) cecum; 2) ascending colon; 3) transverse colon; 4) descending colon; 5) sigmoid colon; and 6) rectum. The risk of an adenoma depends on age, sex and a person-specific risk index, and multiple adenomas may form within one individual over the lifetime. Non-adenomatous polyps are not modeled under the assumption that they do not progress to cancer (though this may not be the case for serrated polyps27). Differences across models relate mostly to the functional forms used to model adenoma risk and the assumptions used to model within-person changes in adenoma risk over time. All models were calibrated to fit reported adenoma prevalence by age according to autopsy studies,28-37 although the methods used for calibrating to these data varied across modeling groups. Because the evidence for adenoma prevalence is variable, especially for younger ages, there were some differences in the predicted prevalence of adenomas for the simulated cohort by model (Figure 2). The models also projected outcomes for adenoma size and multiplicity. For example, for 65-year-old individuals, the percentage with adenoma ≥10mm was 14% (SimCRC), 17% (CRC-SPIN), and 24% (MISCAN), and the average number of adenomas per individual with at least one adenoma was 1.6 (SimCRC), 1.8 (CRC-SPIN), and 2.0 (MISCAN).
Each simulated adenoma may grow in size over time, and growth trajectories vary across adenomas. In the CRC-SPIN model, growth is continuous with an asymptote at 50 mm, while the other two models simulate growth as transitions from small adenomas (1-5 mm) to medium adenomas (6-9mm), and from medium to large adenomas (≥10 mm). All models allow random variability in adenoma growth across individuals, and across adenomas within individuals via an adenoma-specific growth propensity. Models vary in terms of factors that systematically influence adenoma growth. In the MISCAN model, no patient or adenoma level characteristics systematically affect adenoma growth. In the CRC-SPIN and SimCRC models, adenoma growth varies by location (for CRC-SPIN: colon vs. rectum; for SimCRC: proximal colon vs. distal colon vs. rectum).
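The two growth representations described above can be contrasted in a short sketch. Only the 50 mm asymptote and the small/medium/large categories come from the text. The exponential-approach curve, the 1 mm starting diameter, the `rate_per_year` values, and the cut points used to bin a continuous diameter are illustrative assumptions, not the fitted forms or parameters of any of the three models.

```python
import math

ASYMPTOTE_MM = 50.0   # from the text: continuous growth levels off at 50 mm
START_MM = 1.0        # assumption: adenoma becomes detectable at about 1 mm


def continuous_diameter(years_since_onset, rate_per_year):
    """Diameter (mm) on a curve that rises from START_MM toward ASYMPTOTE_MM.

    Assumed functional form, chosen only because it has the stated asymptote.
    """
    gap = ASYMPTOTE_MM - START_MM
    return ASYMPTOTE_MM - gap * math.exp(-rate_per_year * years_since_onset)


def size_category(diameter_mm):
    """Bin a diameter into the small / medium / large categories.

    The text gives integer ranges (1-5, 6-9, >=10 mm); treating anything
    above 5 mm and below 10 mm as medium is an assumption of this sketch.
    """
    if diameter_mm >= 10.0:
        return "large"
    if diameter_mm > 5.0:
        return "medium"
    return "small"


# Hypothetical adenoma-specific growth rates (per year), slow vs. fast.
for rate in (0.01, 0.05):
    trajectory = [continuous_diameter(t, rate) for t in (0, 5, 10, 20)]
    print(rate, [size_category(d) for d in trajectory])
```

Binning the continuous curve in this way is one possible route to comparing a continuous-growth model with the category-based models on a common scale.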
The MISCAN model incorporates the assumption that there are two types of adenomas: progressive and non-progressive adenomas. Non-progressive adenomas can grow in size but will never develop into cancer, while all progressive adenomas have the potential to develop into cancer (though some do not). In the SimCRC and CRC-SPIN models, all adenomas have the ability to develop into cancer (i.e., are progressive adenomas), although most do not because the patient dies from another cause before the adenoma progresses. The probability that a progressive adenoma transitions to a preclinical cancer depends on age, sex, and adenoma location (SimCRC and CRC-SPIN only), as well as a person-specific risk variable (SimCRC only) and size (or size category). In the MISCAN and SimCRC models, only adenomas that are >5 mm in size can transition to preclinical cancer; in the CRC-SPIN model an adenoma of any size can progress, though transitions to cancer are rare among adenomas ≤5 mm.
Sojourn time is defined as the time from the onset of preclinical cancer to the development of symptoms leading to clinical detection in the absence of screening. For the MISCAN and SimCRC models, preclinical cancer starts as stage I and progresses through each undiagnosed stage (I through IV) until becoming detected via symptoms. Transition probabilities through cancer stages and symptom-detection rates vary by location in all models and age in the MISCAN model. In the CRC-SPIN model, the sojourn time of each preclinical cancer is drawn from a lognormal distribution that depends on location (colon or rectum); clinically detected cancers (on the basis of symptoms) are assigned a stage that is consistent with the stage distribution observed in the Surveillance, Epidemiology, and End-Results (SEER) Program for years 1975-1979.38 Predicted cancer incidence is comparable across the three models (Figure 2).
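As a minimal sketch of the location-dependent lognormal draw described for CRC-SPIN, the snippet below samples a sojourn time and adds it to the age at preclinical cancer onset. The parameter values in `SOJOURN_PARAMS` and the function names are hypothetical placeholders; the calibrated values are not given in this paper.

```python
import random

# Hypothetical lognormal parameters (mean and SD of log-years) by location.
# These are placeholders, not the calibrated CRC-SPIN values.
SOJOURN_PARAMS = {"colon": (0.3, 0.6), "rectum": (0.5, 0.6)}


def draw_sojourn_time(location, rng):
    """Sample a sojourn time in years for a preclinical cancer at `location`."""
    mu, sigma = SOJOURN_PARAMS[location]
    return rng.lognormvariate(mu, sigma)


def clinical_detection_age(preclinical_onset_age, location, rng):
    """Age at symptom detection = age at preclinical onset + sojourn time."""
    return preclinical_onset_age + draw_sojourn_time(location, rng)


rng = random.Random(2010)  # fixed seed so the sketch is reproducible
for location in ("colon", "rectum"):
    print(location, round(clinical_detection_age(60.0, location, rng), 1))
```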
The actual transitions from adenoma to preclinical cancer to cancer diagnosis could not directly be inferred because of limited data. Therefore, the models were calibrated to observed data on adenoma prevalence and cancer incidence. The models were all calibrated by simulating the life histories of cohorts of individuals under a given set of parameter values and comparing the model-predicted outcomes with observed data on: (1) the prevalence, size, and multiplicity of adenomas by age and sex from autopsy studies; and (2) the stage- and location-specific incidence of CRC by age and sex from SEER for years 1975-1979.38 We used this period because incidence rates and stage distribution had not yet been affected by screening. Model fits were assessed using maximum-likelihood-based goodness-of-fit statistics.
All three natural history models have a screening component that enables a CRC screening test such as colonoscopy to detect adenomas and preclinical cancers. In the natural history models (i.e., in the absence of screening), all disease states are undetected except for clinical cancer. Once screening is introduced, a simulated person who has an underlying adenoma or preclinical cancer has a chance of having it detected by screening, which depends on the diagnostic characteristics and reach of the test. Test sensitivity varies by the size of the adenoma and the presence of a preclinical cancer. When an adenoma is detected via screening it can then be removed with polypectomy, thereby interrupting the adenoma-carcinoma sequence. Thus, the effectiveness of screening in terms of cancer reduction is a function of how many adenomas are removed and the subsequent cancer risk of these patients.
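The detection step can be illustrated with a small sketch in which each lesion is found with a probability that depends on its state. The sensitivity values, the state labels, and the `screen` function are assumptions made for illustration; the sketch also ignores the reach of the test, which the models take into account.

```python
import random

# Hypothetical per-lesion sensitivities. The text states only that sensitivity
# varies with adenoma size and with the presence of a preclinical cancer.
SENSITIVITY = {
    "small": 0.75,
    "medium": 0.85,
    "large": 0.95,
    "preclinical_cancer": 0.95,
}


def screen(lesions, sensitivity, rng):
    """Split lesions into (detected, missed) for a single screening exam.

    `lesions` is a list of state labels, one entry per lesion. Detected
    adenomas would then be removed by polypectomy, which interrupts the
    adenoma-carcinoma sequence for those lesions only.
    """
    detected, missed = [], []
    for lesion in lesions:
        if rng.random() < sensitivity[lesion]:
            detected.append(lesion)
        else:
            missed.append(lesion)
    return detected, missed


rng = random.Random(55)
found, left_behind = screen(["small", "small", "large"], SENSITIVITY, rng)
print("detected:", found, "missed:", left_behind)
```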
Although the three natural history models yielded similar adenoma prevalence and CRC incidence by age (Figure 2), we conducted several model “experiments” to assess the potential impact of the superimposed screening component (e.g., removal of adenomas) on subsequent colon cancer incidence.
We first examined the cumulative CRC incidence (in the absence of competing mortality risk) among the two distinct subpopulations that comprise the natural history cohort at age 55: 1) individuals with an underlying adenoma or preclinical cancer, and 2) individuals without any underlying condition. While the projections for the overall cohort were similar across models we sought to determine the degree to which the projections for the two subgroups varied across models. For example, for models that predict a higher cumulative CRC incidence for the subgroup with an adenoma or cancer, we would expect a greater potential benefit from screening.
We then compared the cumulative CRC incidence of 55-year-old individuals with underlying adenomas and cancer (as in the last experiment) with a counterfactual group of individuals for whom all adenomas are diagnosed and removed via polypectomy and all preclinical cancers are diagnosed and treated appropriately. The aim of this experiment was to provide insight on the model results once the natural history has been interrupted. Because all models included a person-specific risk index, we expected that the cumulative CRC incidence for a “no preclinical disease” group without prior history of adenomas would be lower than for a “no preclinical disease” group with a prior history of adenomas. In addition, we would expect a greater potential benefit from surveillance (i.e., colonoscopy follow-up after polypectomy) for models that predict a higher cumulative CRC incidence for the subgroup with adenomas and cancers removed.
A colonoscopy screening interval of 10 years has long been recommended.39-42 The effectiveness of a 10-yearly screening colonoscopy depends not only on the sensitivity of colonoscopy for detecting colorectal lesions but also on the likelihood that an adenoma will develop and transition to clinical cancer within the interval. While the sensitivity of colonoscopy is an external model parameter that can have the same definition and value across models (i.e., the probability that an existing lesion is detected), the chance that an adenoma will form and progress to clinical cancer within the screening interval relies on deep model parameters that cannot be directly informed by observable data. To gain insight on the effect of these unknown model parameters, we used the natural history models to simulate individuals and identified those with CRC clinically detected at age 65. Next we determined whether or not the precursor lesion was present 10 years prior to the time of diagnosis, representing a potentially successful screening opportunity. For those cancers for which the precursor adenoma arose within the interval, a screening colonoscopy 10 years prior would have no effect. We repeated this analysis focusing on cancers diagnosed at different ages, and also considered precursor lesions developing within 20 years of diagnosis.
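The tabulation described above amounts to a simple proportion over simulated cases. The sketch below assumes each simulated cancer is recorded with integer ages in the hypothetical fields `adenoma_onset_age` and `diagnosis_age`, and that "within the interval" means strictly less than the window length; the toy records are invented for illustration and are not model output.

```python
def share_arising_within(cases, diagnosis_age, window_years):
    """Fraction of cancers diagnosed at `diagnosis_age` whose precursor
    adenoma first appeared less than `window_years` before diagnosis.

    Returns None when no cancers were diagnosed at that age.
    """
    selected = [c for c in cases if c["diagnosis_age"] == diagnosis_age]
    if not selected:
        return None
    recent = [
        c for c in selected
        if c["diagnosis_age"] - c["adenoma_onset_age"] < window_years
    ]
    return len(recent) / len(selected)


# Invented toy records, one per simulated clinically detected cancer.
toy_cases = [
    {"adenoma_onset_age": 40, "diagnosis_age": 65},
    {"adenoma_onset_age": 58, "diagnosis_age": 65},
    {"adenoma_onset_age": 50, "diagnosis_age": 65},
    {"adenoma_onset_age": 61, "diagnosis_age": 65},
    {"adenoma_onset_age": 30, "diagnosis_age": 70},
]

for window in (10, 20):
    print(window, share_arising_within(toy_cases, 65, window))
```

A cancer counted in the 10-year share is one that a colonoscopy 10 years before diagnosis could not have prevented, because its precursor did not yet exist.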
Dwell time refers to the time between adenoma incidence and adenoma progression to preclinical cancer (i.e., adenoma dwell time), the time between preclinical cancer and symptom-driven cancer diagnosis (i.e., preclinical cancer dwell time or sojourn time), or the total time between adenoma incidence and cancer diagnosis (i.e., total or overall dwell time). We used our simulation models to project the cancer incidence over time and for each case of CRC we determined the time at which the precursor adenoma was either formed (MISCAN, SimCRC) or reached a detectable size (CRC-SPIN) and the time at which the precursor adenoma progressed to preclinical cancer. We calculated the mean, median, and interquartile range for the time between adenoma and preclinical cancer, the time between preclinical and clinical cancer, and the total dwell time for each of the natural history models.
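The three dwell-time components and their summary statistics can be sketched as follows. The record fields (`adenoma_onset_age`, `preclinical_onset_age`, `diagnosis_age`) and the toy values are hypothetical, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ from the method used for the published tables.

```python
import statistics


def summarize(values):
    """Mean, median, and interquartile range (Q1, Q3) of a list of times."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


def dwell_time_summary(cases):
    """Summarize adenoma, preclinical (sojourn), and total dwell times."""
    adenoma = [c["preclinical_onset_age"] - c["adenoma_onset_age"] for c in cases]
    sojourn = [c["diagnosis_age"] - c["preclinical_onset_age"] for c in cases]
    total = [c["diagnosis_age"] - c["adenoma_onset_age"] for c in cases]
    return {
        "adenoma": summarize(adenoma),
        "sojourn": summarize(sojourn),
        "total": summarize(total),
    }


# Invented toy records, one per simulated case of diagnosed CRC.
toy_cases = [
    {"adenoma_onset_age": 40, "preclinical_onset_age": 62, "diagnosis_age": 65},
    {"adenoma_onset_age": 50, "preclinical_onset_age": 63, "diagnosis_age": 66},
    {"adenoma_onset_age": 55, "preclinical_onset_age": 60, "diagnosis_age": 62},
    {"adenoma_onset_age": 30, "preclinical_onset_age": 58, "diagnosis_age": 63},
]

summary = dwell_time_summary(toy_cases)
for component, stats in summary.items():
    print(component, stats)
```

Because the total dwell time of each case is the sum of its two components, the mean total equals the sum of the component means; the same is not true of medians or quartiles.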
Figure 3 shows the cumulative cancer incidence for a simulated 55-year-old cohort, stratified by whether individuals are free of adenomas and preclinical cancer at age 55 or have (undetected) preclinical adenomas or cancers. While the weighted average of the cancer projections of these two subgroups represents the natural history projections, the degree to which the 20-year cancer risk among the preclinical disease group is greater than the disease-free group varied substantially across models: 7 times greater with MISCAN, 29 times greater with SimCRC, and 75 times greater with CRC-SPIN.
The three natural history models predicted a 20-year cancer incidence of 3.2-3.6% for a 55-year-old cohort. However, if we remove all adenomas and cancers at age 55 the 20-year cancer incidence was reduced to 2.7% for MISCAN, 0.7% for SimCRC, and 0.4% for CRC-SPIN. Figure 4 shows the cumulative cancer incidence among the subgroup of the 55-year-old cohort with adenomas or preclinical cancers under two scenarios: 1) adenomas not removed and preclinical cancers not treated (i.e., false-negative finding), and 2) all adenomas removed via polypectomy and preclinical cancers diagnosed (i.e., true-positive finding). The removal of all adenomas and detection of all preclinical cancers among the subgroup of 55-year-old individuals with preclinical disease resulted in a reduction in the 20-year cumulative cancer risk of 30% (MISCAN) to 87-89% (SimCRC and CRC-SPIN). Results were similar for ages 45 and 65 (results not shown).
The comparison between the cumulative CRC incidence among individuals with “no preclinical disease” and no prior polypectomy (Figure 3) and “no preclinical disease” and prior polypectomy (Figure 4) provides some insight into the effect of the person-specific risk index in each of the models. In other words, individuals with a prior polypectomy – even if they do not have any existing adenomas – are at higher risk for developing new adenomas compared with adenoma-free persons without prior polypectomy. We found that 55-year-old individuals with no preclinical disease but with prior polypectomy were 4.7 (MISCAN), 5.7 (CRC-SPIN), and 2.9 (SimCRC) times more likely to have CRC at age 85 than 55-year-old disease-free persons without prior polypectomy.
Table 1 shows the percentage of simulated CRC cases at various ages that arose from adenomas that developed within 10 years or within 20 years prior to cancer diagnosis. The MISCAN model results showed the proportion of cancers arising from adenomas forming within the prior 10 years ranged between 62% and 72%, depending on age at cancer detection. This was substantially greater than the corresponding proportions from the SimCRC model (9-10%) and the CRC-SPIN model (3-4%). The proportion of cancers that arose from adenomas forming within 20 years prior to cancer diagnosis was 89-94% for the MISCAN model, 33-39% for SimCRC, and 24-28% for CRC-SPIN, depending on age at cancer diagnosis.
Table 2 shows the summary measures of dwell times for the three natural history models. The mean total dwell time for the MISCAN model (10.6 years) was much shorter than the other two models (25.2 and 25.8 years for SimCRC and CRC-SPIN, respectively). The differences in dwell time were primarily due to the time spent in the adenoma stage; the models were more comparable in terms of their mean preclinical cancer dwell times (i.e., time from development of preclinical cancer to cancer detection among diagnosed cases): 1.6 years (CRC-SPIN), 3.0 years (MISCAN), and 4.0 years (SimCRC).
We conducted several “experiments” with three independently-developed CRC microsimulation models with an aim to understand the impact of superimposing a screening mechanism on alternative specifications of the adenoma-carcinoma process. The natural history models represent a wide spectrum of dwell times (Table 2), although all are consistent with the observed data on adenoma prevalence and cancer incidence. The projected 20-year cumulative CRC incidence for the subgroup of cancer-free 55-year-old individuals with an underlying lesion varied across models: 8.6% (MISCAN), 13.1% (SimCRC), and 13.5% (CRC-SPIN) (Figure 3). The implication of this result is that the expected effectiveness of screening would be greater for the SimCRC and CRC-SPIN models compared with the MISCAN model. We also found that the 20-year cumulative CRC risk after polypectomy was much greater with the MISCAN model (5.9%) compared with the other two models (1.4% and 1.7% for CRC-SPIN and SimCRC, respectively) (Figure 4). The implication of this finding is that post-polypectomy surveillance would be more beneficial with the MISCAN model compared with the other two models. Both of these findings for the MISCAN model are consistent with a shorter overall dwell time, and the implied lower screening effectiveness associated with the former finding could be offset by the implied higher surveillance effectiveness associated with the latter finding. We also found that the MISCAN model showed a relatively high percentage of diagnosed cancers that would not have had a chance of being prevented with colonoscopy 10 years prior because the associated adenoma developed during the 10-year interval (Table 1), which indicates that the MISCAN model would favor strategies with repeated screenings or shorter intervals, particularly in strategies without surveillance.
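The relative reductions implied by these subgroup incidences can be checked with a short calculation. The inputs below are the rounded percentages quoted in the paragraph above, so the results need not match the 30% and 87-89% reported in the Results to the last digit; the variable and function names are illustrative.

```python
# 20-year cumulative CRC incidence (%) in the subgroup with an underlying
# lesion at age 55, as quoted in the text (rounded values).
WITHOUT_REMOVAL = {"MISCAN": 8.6, "SimCRC": 13.1, "CRC-SPIN": 13.5}
AFTER_REMOVAL = {"MISCAN": 5.9, "SimCRC": 1.7, "CRC-SPIN": 1.4}


def relative_reduction(before, after):
    """Proportional reduction in cumulative incidence after lesion removal."""
    return 1.0 - after / before


reductions = {
    model: relative_reduction(WITHOUT_REMOVAL[model], AFTER_REMOVAL[model])
    for model in WITHOUT_REMOVAL
}

for model, reduction in reductions.items():
    print(f"{model}: {100 * reduction:.1f}% relative reduction")
```

The residual risk after removal (the `AFTER_REMOVAL` values) is the quantity that surveillance can still act on, which is why a smaller reduction from a one-time removal goes together with a larger potential benefit from follow-up.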
The CRC natural history model parameters were selected to fit observed data on adenoma prevalence from autopsy studies and cancer incidence from SEER. Because there is a larger range of uncertainty with adenoma prevalence than with cancer incidence, the natural history models’ projections of cancer incidence were closer to one another than their projections of adenoma prevalence (Figure 2), yet all projections were consistent with the cross-sectional data. The underlying assumptions that were incorporated to simulate the adenoma-carcinoma sequence yielded a range of average overall/total dwell times varying from 10 to 26 years. While dwell times are a function of growth rates and variability among growth rates, the incorporation of non-progressive adenomas (i.e., adenomas that could not progress to cancer) within the MISCAN model was necessary to achieve mean overall/total dwell times as short as 10 years and still calibrate to the empirical data. This assumption results in extreme heterogeneity for the progression rate from adenoma to cancer, where one group of adenomas has rate 0 (i.e., non-progressive adenomas) and the other group has rates that are faster on average than those in the CRC-SPIN or SimCRC models, in which all adenomas have the potential to progress to cancer. The effect of incorporating non-progressive adenomas (and thus modeling a shorter dwell time) is generally a decreased effectiveness of screening.
We have shown substantial differences in generated outcomes with our three natural history models, particularly for the MISCAN model relative to the CRC-SPIN and SimCRC models (also see Table 1 of companion paper26). However, when we have used these models to generate outcomes associated with screening strategies, which include both screening and surveillance, we often reach similar conclusions.43-45 Hence, although models with longer overall/total dwell times predict greater effectiveness of screening than models with shorter dwell times, they also show poorer effectiveness of surveillance compared with those with shorter dwell times, since it takes longer for a new adenoma to progress to cancer. As a result, programs that involve both screening and surveillance may produce similar conclusions when comparing screening strategies. While it is not likely that we will ever have direct evidence on dwell times because of ethical considerations, we anticipate that the large prospective trials of sigmoidoscopy that are currently underway or recently concluded10,46-48 will provide opportunities to further validate our models and provide insights into the natural history process. In particular, these trials have variations in their surveillance protocols (intensive surveillance vs. none prescribed). We would expect the differences of screening effectiveness in trials with surveillance vs. without surveillance to be relatively modest, all else being equal, if the underlying dwell times are longer. Another difference across trial protocols that could shed light on the natural history parameters is the post-sigmoidoscopy follow-up (referral to colonoscopy for large or high-risk adenoma vs. referral to colonoscopy for any adenoma). We would expect the differences of screening effectiveness in trials with follow-up of any adenoma vs. only high-risk adenomas to be relatively modest, all else being equal, if the underlying dwell times are shorter (because missed lesions with shorter dwell times have lower cancer incidence than missed lesions with longer dwell times).
We do not feel that we need to resolve the uncertainties of the natural history of the adenoma-carcinoma process in order for our CRC screening models to be potentially useful for policy makers. That being said, it is important to continue to evaluate our models critically and to be able to evaluate both parameter and structural uncertainty in policy applications. The comparison of outputs from independently developed models provides sensitivity analysis of the uncertainties surrounding each model’s “deep” structural parameters. Consistent conclusions across models would be helpful for policy makers. If the model conclusions are not consistent with each other then we would conclude that models are less helpful for those situations, although they still add value by highlighting the uncertainty.
The CISNET consortium has been a strong proponent of adopting a “comparative modeling” approach, both for understanding the implications of model differences and for conducting policy analyses. Using multiple models designed to address the same question provides a sensitivity analysis on underlying model structure, or on the “deep” model parameters. We have used multiple CRC CISNET models to project life-years gained and resources used associated with various screening strategies to inform the deliberations by the United States Preventive Services Task Force aimed at updating the recommendations for CRC screening for the average-risk population43 and to inform the Centers for Medicare and Medicaid Services on the cost-effectiveness of new screening technologies.44,45 Despite the apparent model differences our findings in those analyses were very similar across models, strengthening the validity of the results.
There are several limitations to note. First, while there are several sets of underlying assumptions that one could make that would be consistent with data on adenoma prevalence and cancer incidence, we only considered three. Although we show that the models represent a broad spectrum of underlying natural history assumptions, three models cannot cover the full range of plausible assumptions. Second, we recognize that there are many “moving parts” in these models and it is difficult to identify the exact cause of differences across models. We focus much of our discussion on dwell times but there may be other assumptions that contribute to the observed differences in model output. Lastly, the ideal study that is needed is one that directly estimates dwell time; unfortunately, that type of study could not be conducted in an ethical manner.
In conclusion, we provide results from several hypothetical exercises designed to gain insights on the impact of screening interventions superimposed on alternative approaches to specifying the adenoma-carcinoma process using three CRC models that are part of the CISNET consortium. We found that differences in dwell times had differential effects on the effectiveness of screening vs. the effectiveness of surveillance. Without direct evidence on dwell time of adenomas, we anticipate that the large prospective trials of sigmoidoscopy may provide insights into the natural history process because of the variations in surveillance and post-sigmoidoscopy follow-up protocols. When conducting applied analyses to inform policy, using multiple models provides a reasonable sensitivity analysis on the key (unobserved) “deep” model parameters.
This research was supported by the National Cancer Institute (U01-CA-088204, U01-CA-097426, U01-CA-097427, and U01-CA-115953).