Several mathematical models of epidemic cholera have recently been proposed in response to outbreaks in Zimbabwe and Haiti. These models aim to estimate the dynamics of cholera transmission and the impact of possible interventions, with a goal of providing guidance to policy-makers in deciding among alternative courses of action, including vaccination, provision of clean water, and antibiotics. Here we discuss concerns about model misspecification, parameter uncertainty, and spatial heterogeneity intrinsic to models for cholera. We argue for caution in interpreting quantitative predictions, particularly predictions of the effectiveness of interventions. We specify sensitivity analyses that would be necessary to improve confidence in model-based quantitative prediction, and suggest types of monitoring in future epidemic settings that would improve analysis and prediction.
The recent cholera epidemic in Zimbabwe (2008-2009) and the ongoing cholera epidemic in Haiti (2010-2011) are catastrophes in two regions already devastated by disease and poverty. The extent of these disasters has prompted inquiries into whether interventions – such as vaccination, antibiotic administration, and the provision of clean water – could have slowed or aborted these cholera epidemics, and how such interventions might be most effectively implemented in future epidemics. Cholera spreads in areas with poor sanitation and through contaminated water, and the ideal solution is to improve infrastructure to provide clean water and effective sanitation -- an approach that has been successful since the 19th century.1 On the timescale of an epidemic, creation of such infrastructure is rarely feasible. Administration of vaccine, a staple of preventive medicine, is one of the few potentially life-saving and implementable solutions.2-8 However, vaccines remain untested in epidemic cholera.
Decisions regarding whether and how to pursue mass vaccination during epidemic cholera present logistical and policy challenges. Ideally, all lifesaving interventions should be employed, but, in practice, policy makers often have to choose among possible interventions, as well as among strategies for deploying these interventions.
Mathematical models of disease transmission aim to provide guidance in making such decisions. Models can estimate key parameters such as R0 (the basic reproductive number: the average number of secondary infections caused by one infectious person in an otherwise entirely susceptible population), as well as the impact of control strategies. Toward this end, several recent models based on data from the cholera epidemics in Haiti and Zimbabwe have been published.3-7
All models are limited by their simplifying assumptions. It is important to critically evaluate cholera models and their assumptions, so as to gauge the strength of their conclusions. Here, we examine the assumptions implicit in mathematical models of cholera, and the ways these models have been applied to data from Haiti and Zimbabwe. We discuss the impact of model misspecification, parameter uncertainty, and spatial heterogeneity, and explore specifically the impact of the unknown lifespan of cholera vibrios in water reservoirs. We discuss why these criticisms -- potentially applicable to many infectious disease modeling efforts -- are particularly germane to cholera models. Our goal is neither to compare models directly nor to critique each model individually, but to explore the general issues that confront cholera modeling efforts.
Cholera transmission depends on excretion of Vibrio cholerae by infected persons and on ingestion of vibrios in contaminated food or water. In endemic settings, cholera transmission is influenced by complex factors including multiple co-circulating strains, local immunity from past outbreaks,9 weather cycles (both seasonal and climatic oscillations 10-12), and phage that destroy V. cholerae.13 Models of epidemic outbreaks in susceptible populations typically ignore many of these factors: they assume a single infecting strain, an entirely susceptible population, and an epidemic short enough that climatic and phage-cholera interactions can be neglected. The rate at which vibrios are excreted depends on the severity of infection, which ranges from asymptomatic infection to cholera gravis (0.5 to 1 L of diarrhea an hour 14), and the rate at which they reach consumers depends on the extent to which food and water supplies are contaminated by sewage.
In 2001, Codeço 15 proposed a model that aims to capture transmission within a community and is a predecessor of several recent models.3,4,6,16 While this model was explicitly designed for simplicity and qualitative analysis (rather than for quantitative prediction), it provides a convenient framework to illustrate concerns that frequently arise in cholera models. A simplified version of the model is:
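A sketch of this simplified system in the notation defined below, assuming a saturating infection term B/(κ + B); the exact form shown is our reconstruction, chosen to be consistent with the expressions for R0 and r given in the eAppendix (in which ξ, β, κ, and S0 enter only as ξβS0/κ):

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{B}{\kappa + B}\, S \\
\frac{dI}{dt} &= \beta \frac{B}{\kappa + B}\, S - \gamma I \\
\frac{dR}{dt} &= \gamma I \\
\frac{dB}{dt} &= \xi I - \delta B
\end{aligned}
```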
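S, I, and R represent the number of susceptible, infected, and recovered persons, respectively, with a total population N = S+I+R; B represents the concentration of Vibrio cholerae in the water reservoir used by this population. Key parameters include those influenced by specific local geographic, aquatic, socioeconomic, and behavioral characteristics and others that reflect the biology of Vibrio cholerae and clinical disease. Below we discuss, as they pertain to this cholera model, the issues of model misspecification (in which the quantity modeled differs from the quantity of interest) and parameter uncertainty (in which the true values of the parameters are difficult or impossible to estimate accurately). The model parameters include β, the rate of contact with contaminated water (day⁻¹); κ, the concentration of V. cholerae in the water reservoir that yields a 50% probability of infection (cells/mL); γ, the recovery rate (day⁻¹), the reciprocal of the duration of infection; ξ, the rate at which each infected person contributes to the vibrio concentration in the reservoir (cells/mL/person/day); and δ, the rate of removal of infectious vibrios from the water (day⁻¹), the reciprocal of their lifespan in the reservoir.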
This model also assumes that the ratio of asymptomatic to symptomatic infections is constant throughout an epidemic, and that dose determines the likelihood of infection but not the likelihood of becoming symptomatic. The latter assumption is contrary to findings from experimental human infections.20 Violations of this assumption may have two consequences for cholera modeling based on case-notification data. First, severity affects the intensity of shedding,14 so the average contribution of an infectious person to transmission may change systematically over time as the distribution of infectious doses changes. Second, because only symptomatic infections are likely to be reported, the reporting rate may change systematically over time for the same reason.
The infection term in this model suffers from misspecification in the sense that there is no physically plausible process that relates the modeled state variables (concentrations of vibrios and “rate of contact with contaminated water”) to a rate (or probability per small unit of time) at which susceptible persons become infected. Put another way, there is no simple way to convert measurable quantities (e.g., a measured dose-response relationship between number of vibrios ingested and the risk of infection) into the parameters β and κ of this model.
This base model has been augmented in multiple ways.16,22-27 Among the recent models that analyze the outbreaks in Haiti and Zimbabwe,3-7 some incorporate a person-to-person transmission term that does not pass through the reservoir.5-7 Some incorporate a hyperinfectious state for vibrios shortly after excretion.3,5,16,22 Some include an asymptomatic state for infected persons,3-5,23 with different models assuming that 20%,5 21%,3 and 25% 4 of infected persons are symptomatic, and that symptomatic persons are 10 (ref. 5) to 1000 (ref. 3) times as infectious as asymptomatic ones. One model includes a latent period,5 and several models link communities to generate meta-population models.4,5,7,25,26
The Table shows the range of several key parameter values either used or generated by these models. Although the models have different structures, we include the ranges to illustrate the uncertainty and inconsistency both in parameters that should reflect the biology of cholera and in parameters that reflect local water infrastructure and sanitation. Some of the parameters chosen by Codeço15 for an exploratory study continue to be used in cholera models despite a lack of empirical support. For example, the rate of contact with reservoir water is fixed at 1 day⁻¹ in some models and fitted in others, although, as described above, the physical meaning of this term is unclear. Similarly, the rate at which each infected person contributes to the vibrio concentration in the aquatic environment is variously set at ξ = 10 cells/mL/person/day following Codeço,15 set as low as 0.01 cells/mL/person/day based on estimated water reservoir size, or allowed to vary in model fitting.
All of the recent models of epidemic cholera in Haiti and Zimbabwe are calibrated to province-level incidence data. Fitting a model to data that aggregate local communities assumes that parameters derived from the aggregate apply homogeneously, which amounts to assuming that everyone within a province shares the same water reservoir. Recent attempts to address this issue include the work of Chao et al.,5 who estimate local communities from population density, using LandScan (http://www.ornl.gov/sci/landscan/), and from geography relative to rivers and highways, and that of Bertuzzo et al.,4 who use administrative sub-district populations in their model. These refinements require additional estimated parameters.
The obvious difficulty in calibrating models to province-level data is that cholera outbreaks may be highly spatially heterogeneous: adjacent neighborhoods can experience very different levels of infection,28,29 and given the dependence on water source and sanitation, there may be significant variation at spatial scales smaller than a neighborhood. Even neighboring households may not be equally exposed to contaminated water. The shape of an aggregate epidemic curve is influenced by the size of constituent communities, relative timing of outbreaks in those communities, local factors that influence each community's R0, control measures implemented over time, fraction of asymptomatic infection, and extent of underreporting.
Fitting a model to an aggregate epidemic curve generates a single R0, but the vaccination level implied by this R0 may be protective in some constituent communities and not in others. For example, Mukandavire et al.6 calibrated their model to the Zimbabwean epidemic as reported in each province and in the country as a whole, and derived R0 for each of these populations. They reported critical vaccination proportions ranging from 13% in Mashonaland East to 81% in Matabeleland South; every province except Mashonaland East had a threshold of 34% or greater. For Zimbabwe overall, the estimated critical vaccination fraction needed to prevent an epidemic was 17%, lower than the threshold for every constituent province but one. This discrepancy illustrates how spatial heterogeneity can bias results.
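To make the comparison concrete, the short sketch below inverts the homogeneous-mixing threshold p_c = 1 - 1/R0 to recover the R0 implied by each reported fraction. It assumes the reported fractions follow that formula with a perfect vaccine; the function name `implied_r0` is ours, and the ordering of the results holds for any formula in which the threshold increases with R0.

```python
# Sketch: invert p_c = 1 - 1/R0 to recover the R0 implied by each reported
# critical vaccination fraction. Assumes the homogeneous-mixing formula with a
# perfect vaccine; the threshold values are those quoted in the text.

def implied_r0(critical_fraction: float) -> float:
    """Return the R0 for which 1 - 1/R0 equals the given critical fraction."""
    if not 0.0 <= critical_fraction < 1.0:
        raise ValueError("critical fraction must lie in [0, 1)")
    return 1.0 / (1.0 - critical_fraction)


reported_thresholds = {
    "Mashonaland East": 0.13,
    "other provinces (at least 34%)": 0.34,
    "Matabeleland South": 0.81,
    "Zimbabwe overall": 0.17,
}
implied = {name: implied_r0(p) for name, p in reported_thresholds.items()}

for name, r0 in implied.items():
    print(f"{name:32s} implied R0 = {r0:.2f}")
```

On this reading the national estimate (R0 of about 1.2) sits below every province except Mashonaland East (about 1.15), while the remaining provinces imply R0 of at least about 1.5, and Matabeleland South about 5.3.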
In summary, the epidemic curves in constituent spatial units may differ both temporally and in shape, such that the aggregated epidemic curve incorporating each of these communities does not reflect homogeneous dynamics, as assumed in mass-action mixing.30 Consequently, an R0 estimated from aggregate data fails to capture the dynamics critical to accurately estimate the impact of interventions in the constituent spatial units. The data from Zimbabwe show multiple peaks and other features characteristic of heterogeneously mixed populations 31 at province and neighborhood spatial scales.6,29 The practice of fitting epidemic models to cumulative incidence curves rather than incidence curves can obscure these features, while also violating statistical assumptions of independence between fitted data points.
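A stylized illustration of the first point, using made-up numbers rather than data: two communities, each modeled as a homogeneously mixing deterministic discrete-time SIR process (our simplification, not a cholera model), with different R0 values and introduction dates. Each community's curve has a single peak, but their sum, the analogue of a province-level curve, has two, a shape no single homogeneous model of this kind can reproduce.

```python
# Stylized sketch: two homogeneous SIR communities with different R0 and
# introduction dates. Parameter values are illustrative only.

def sir_incidence(n, r0, gamma, seed_day, days):
    """Daily new infections for a deterministic discrete-time SIR model."""
    beta = r0 * gamma
    s, i = float(n), 0.0
    incidence = []
    for day in range(days):
        if day == seed_day:
            s, i = s - 1.0, 1.0  # introduce one infected person
        new_inf = beta * s * i / n
        s, i = s - new_inf, i + new_inf - gamma * i
        incidence.append(new_inf)
    return incidence


def count_peaks(series):
    """Count strict interior local maxima of a sequence."""
    return sum(
        1 for k in range(1, len(series) - 1)
        if series[k - 1] < series[k] > series[k + 1]
    )


N, GAMMA, DAYS = 10_000, 0.2, 400
community_a = sir_incidence(N, r0=3.0, gamma=GAMMA, seed_day=0, days=DAYS)
community_b = sir_incidence(N, r0=1.5, gamma=GAMMA, seed_day=150, days=DAYS)
aggregate = [a + b for a, b in zip(community_a, community_b)]

print("peaks:", count_peaks(community_a), count_peaks(community_b), count_peaks(aggregate))
```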
Interventions such as vaccination, antibiotic administration, and provision of clean water can all decrease the number of cholera cases. Vaccination reduces the number of fully susceptible persons, reduces infectiousness (ie, the rate of contamination of the water supply), and reduces the probability of becoming symptomatic when infected. Antibiotic administration shortens the duration of illness and perhaps reduces the concentration of vibrios excreted during illness. Access to clean water reduces consumption of vibrios. Qualitatively, each of these interventions will reduce the extent of the epidemic. The benefits will combine direct effects on those receiving the intervention with indirect effects on those who benefit from reduced exposure because others received the intervention; in the case of vaccines, the latter effect is known as herd immunity.
The estimated direct impact of these interventions is often an input variable for transmission models; for example, these models assume that a certain proportion of the population is vaccinated and that the vaccine is effective in a particular fraction of the population (all-or-nothing efficacy) or reduces the infectiousness of each contact by a fixed fraction (leaky efficacy).32 Thus the role of the transmission models, over and above the assumptions about how interventions affect those who receive them, is to quantify the indirect effects of interventions – how much interventions can slow transmission and protect those who are not directly protected by the intervention. These quantitative results about the impact of interventions depend on the parameter values used in the model. In this sense (setting aside issues of model specification), the value of model-based predictions depends on the extent to which the predictions about indirect effects are robust to uncertainties about the value of input parameters.
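The choice between all-or-nothing and leaky efficacy is itself an input assumption. A minimal sketch of how the two encodings differ, assuming random mixing, a linear force of infection, and vaccination before introduction, uses the classical final-size relations for an SIR-type epidemic; the coverage and efficacy values (70% and 70%) are borrowed from the worked example later in the text, and the function names are ours.

```python
import math

# Sketch: classical final-size relations under random mixing, comparing two
# encodings of the same vaccine efficacy. Not a cholera-specific model.

def cumulative_hazard(r0, coverage, efficacy, mode, iterations=10_000):
    """Solve for the cumulative force of infection by fixed-point iteration.

    Starting from r0 (an upper bound), the iterates decrease monotonically
    to the largest root of the final-size equation.
    """
    lam = r0
    for _ in range(iterations):
        if mode == "all_or_nothing":
            # A fraction coverage*efficacy is fully immune; everyone else is fully susceptible.
            attack = (1 - coverage * efficacy) * (1 - math.exp(-lam))
        elif mode == "leaky":
            # Every vaccinee's susceptibility is multiplied by (1 - efficacy).
            attack = ((1 - coverage) * (1 - math.exp(-lam))
                      + coverage * (1 - math.exp(-(1 - efficacy) * lam)))
        else:
            raise ValueError(f"unknown mode: {mode}")
        lam = r0 * attack
    return lam


def total_attack_rate(r0, coverage, efficacy, mode):
    """Fraction of the whole population infected by the end of the epidemic."""
    return cumulative_hazard(r0, coverage, efficacy, mode) / r0


for mode in ("all_or_nothing", "leaky"):
    print(mode, round(total_attack_rate(6.0, 0.7, 0.7, mode), 3))
```

Both encodings reduce the initial reproductive number to R0(1 - coverage × efficacy), so they share the same threshold, but for a large R0 the leaky version yields substantially more infections.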
Uncertainties in the values of input parameters can translate into massive uncertainties in the values of model predictions. We present an example in the Figure (see eAppendix [http://links.lww.com] for details), and focus for this example on the impact of uncertainty in δ, the rate of removal of cholera from the water supply. In the Codeço model,15 the lifespan of cholera in the water supply is represented as 3 days (δ = 1/3 day⁻¹). In other models, the lifespan of cholera in the water reservoir is set at 30 days,3,5,6,16 estimated at approximately 4.5 days 4,25 or fitted at approximately 41 days.7 Given that this term reflects the rates at which infectious vibrios become noninfectious due to death or physiologic change, one would expect the lifespan to be highly context-dependent and to vary based on the conditions of the water reservoir. Studies from the 1960s 33,34 examined cholera lifespan in a variety of water types (such as well-water and sea-water) and under a variety of conditions (including sun exposure and temperature variation). In these studies, cholera lifespan is reported from 4 to 80+ days depending on water source and condition.
Variation in the assumed survival of cholera in water directly translates into variation in the distribution of assumed serial intervals for cholera transmission. This in turn changes estimates of R0, the basic reproductive number, when these are obtained by fitting a model to the initial growth rate of the epidemic.35 When these estimates of R0 are in turn used to model interventions (by extending the model, after fitting to initial-growth data, into the future and considering the impact of interventions on transmission), the various values of R0 can give dramatically different predictions for the population-level effects of the interventions.
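A sketch of this effect, assuming the simplified, purely waterborne model and its linearization at the start of the epidemic, in which fitting the initial growth rate r fixes R0 = (1 + r/γ)(1 + r/δ). The growth rate of 0.1 per day and the 5-day infection are the values of the eAppendix example; the lifespans are those cited in the preceding paragraph. The function name is ours.

```python
# Sketch: R0 implied by a fixed early growth rate r in the linearized,
# purely waterborne model, R0 = (1 + r/gamma) * (1 + r/delta),
# with 1/gamma = duration of infection and 1/delta = vibrio lifespan.

def r0_from_growth_rate(r, infection_days, vibrio_lifespan_days):
    gamma = 1.0 / infection_days
    delta = 1.0 / vibrio_lifespan_days
    return (1.0 + r / gamma) * (1.0 + r / delta)


LIFESPANS = [3.0, 4.5, 30.0, 41.0]  # days, as used or estimated in published models
fitted_r0 = {L: r0_from_growth_rate(0.1, 5.0, L) for L in LIFESPANS}

for lifespan, r0 in fitted_r0.items():
    print(f"vibrio lifespan {lifespan:4.1f} d -> R0 = {r0:.2f}")
```

The same observed growth rate is thus compatible with R0 values from about 2 to nearly 8, depending only on the assumed lifespan of vibrios in water.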
The proportion of a randomly mixing population that must be effectively vaccinated to prevent an epidemic from taking place is known as the critical vaccination threshold. This threshold is expressed as 1-1/R0 (effective vaccination means fraction vaccinated, or coverage, multiplied by vaccine efficacy 36). For a model fitted to the early growth rate of the epidemic, varying the lifespan of infectious cholera vibrios in the aquatic reservoir (a parameter for which there are no data, but which is a key component to the serial interval) leads to very large changes in the inferred value of R0 and the corresponding critical vaccination threshold. Fitting the model with the lifespan of infectious vibrios set at 30 and then at 3 days changes the fitted R0 from 6 to 1.95, while the critical vaccination threshold decreases from 83% to 49%. Let us assume pre-epidemic vaccination of 70% of the population with a non-leaky vaccine that has 70% efficacy (in keeping with estimates for populations with less natural immunity than the endemic populations in which the vaccine was trialed37). If R0 = 1.95, then pre-vaccination of a population would prevent an epidemic, whereas if R0 = 6, then nearly all unvaccinated persons will become infected. Thus, using parameter values found in the literature, the indirect benefits of vaccination (which is what the model is meant to quantify) range from almost complete protection of all unvaccinated persons to no protection. (See the eAppendix [http://links.lww.com] for further discussion of this issue.)
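The threshold arithmetic in this paragraph can be sketched as follows, assuming random mixing and an all-or-nothing vaccine given before the epidemic, as in the example; the function name is ours.

```python
# Sketch: critical vaccination threshold and post-vaccination reproductive
# number for the two fitted R0 values in the example.

def critical_threshold(r0):
    """Effective vaccination fraction (coverage x efficacy) needed to prevent an epidemic."""
    return 1.0 - 1.0 / r0


COVERAGE, EFFICACY = 0.70, 0.70
effective_fraction = COVERAGE * EFFICACY  # about 0.49 of the population fully protected

for r0 in (1.95, 6.0):
    r_vacc = r0 * (1.0 - effective_fraction)  # reproductive number after pre-vaccination
    outcome = "epidemic prevented" if r_vacc < 1.0 else "epidemic still possible"
    print(f"R0={r0}: threshold={critical_threshold(r0):.0%}, "
          f"R after vaccination={r_vacc:.2f} ({outcome})")
```

Because an effective coverage of 49% only just exceeds the roughly 48.7% threshold for R0 = 1.95, the "prevented" verdict in that scenario is itself sensitive to small changes in coverage or efficacy.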
If one is willing to make strong assumptions, the problems of estimating R0 from the initial growth rate and assumed duration parameters can be overcome in an idealized model by fitting to an epidemic curve with a known peak in cases.38 However, one must then assume a homogeneously mixing, homogeneous population (which is implausible, as we argue in the previous section); a fixed reporting ratio throughout the epidemic, which is not the case;39,40 a fixed ratio of asymptomatic to symptomatic infections throughout the epidemic, for which we know of no supporting data; and a single-peaked epidemic, which has not been the case in multiple locations in both Haiti and Zimbabwe.29,41 Even if these assumptions were tenable, this approach can be used only once the epidemic has peaked, and so cannot guide interventions at the start of an epidemic.
The claims made in this section are particular to the actual parameter values required for cholera models and the range of uncertainty that exists for them, in particular for the duration of infectiousness in contaminated water. Sensitivity analyses are necessary in all prediction models for infectious disease transmission, but here we argue, more specifically, that the uncertainty in just one parameter of cholera models can nearly eliminate the predictive power of these models. Within the range of possible values of this parameter, the qualitative predictions of the model range from substantial indirect vaccine effects to almost no indirect vaccine effects.
The process of fitting models to data on the early growth of an epidemic, then running the models forward to test the predicted impact of interventions, has been applied extensively to other infectious diseases. Are the critiques presented here generally applicable to all transmission models fitted to early epidemic data, or are there particular challenges pertaining to cholera or a limited class of infections including cholera?
Such approaches have been frequently applied in planning for pandemic influenza and in the response to SARS and the 2009 influenza pandemic. For these respiratory diseases, a common approach has been to estimate the early growth rate from daily or aggregated case counts, combine this estimate with a (usually exogenous) estimate of the distribution of serial intervals, and produce an estimate of the early values of the reproductive number of the infection. Relatively high-quality estimates of the serial interval distribution were available from contact tracing for SARS, leading to rather consistent estimates of the initial reproductive number around 3.42-44 Likewise, several sources of data provide estimates for the serial interval of pandemic influenza around 2-4 days, 45,46 with corresponding estimates of early reproductive numbers ranging from around 1.3 to a bit over 2, depending on the pandemic and the setting.45-49 While these estimates vary (reflecting true variation, methodological choices, and statistical noise), the range of variation in estimates is less than described above for epidemic cholera. The influenza literature contains explicit considerations of the appropriate values for natural-history parameters 45,50,51 (including critiques 45 of previously used values 48), discussions of the impact of data processing assumptions on reproductive number estimates,35,49 and extensive sensitivity analyses exploring the consequences of alternative parameter values.52-54
Moving beyond parameter-value uncertainty to issues of model structure, the literature on respiratory diseases has considered how varying assumptions about the scale and “local-ness” of mixing,55,56 the relative importance of various settings for transmission,54 seasonality,57 and other factors affect the predicted natural history of an epidemic and impact of control measures. The importance of heterogeneity is recognized in modeling many diseases.58-61
In summary, the issues raised here about the reliability of quantitative predictions from cholera models are applicable to other diseases, including those for which real-time (and retrospective) model fitting has been attempted, such as influenza and SARS. In these diseases, as well, sensitivity analyses to uncertain or heterogeneous parameters are needed, and have indeed been employed.42,62,63 However, there appears to be less heterogeneity and less uncertainty about parameter values for these diseases, perhaps because of their direct person-to-person transmission route, which reduces the impact of environmental variables on parameter values and improves one's ability to measure relevant quantities. Thus, while the same issues should be considered in other diseases, we believe the magnitude of uncertainty in the predictions of models is greater for cholera than for SARS and influenza.
Each of the uncertainties described here provides a potential avenue for advancing cholera modeling. Additional monitoring, where possible, of spatial heterogeneity and the model's quantifiable variables will aid in understanding the mechanisms and dynamics of cholera transmission.
For example, the accuracy of model predictions can be improved by line-listing data describing an outbreak in high spatial and temporal resolution, coupled with descriptions of water resources, storage, and use, and by direct quantitation of vibrio concentration, or measurement of fecal coliform contamination as a proxy.64 Such data would also improve understanding of the extent to which critical variables vary across epidemic settings. In the context of an ongoing epidemic, treatment and prevention efforts must be primary. Still, we note that data relevant to model-building have been obtained in past epidemics.64 Also, coordination with demographic, geologic, and aquatic databases 4,5 can help improve the understanding of cholera transmission dynamics under various conditions.
From the perspective of model misspecification, one possible improvement is to restructure the rate of infection to reflect quantifiable variables. For example, some model misspecification can be avoided by collecting empirical data on drinking rates. This could allow the contact rate (currently in units of day⁻¹) to be changed to a drinking rate (units of volume per time), with the probability of infection then formulated as a function of the dose of ingested vibrios rather than a function of the concentration of vibrios. Better data on the dose-response relationship for cholera, including differences between ingestion of a given dose all at once and ingestion of the same dose over several hours or days, would help to constrain the infection terms.
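A minimal sketch of a dose-based infection term, in which a daily drinking volume (mL) multiplied by the reservoir concentration B (cells/mL) gives an ingested dose. Both dose-response forms and all parameter values below are hypothetical placeholders, not cholera estimates, and each drinking event is assumed to act independently. The sketch shows why fractionated-dose data would be informative: whether spreading the same dose over several events changes the daily risk depends on the shape of the dose-response curve.

```python
import math

# Sketch of a dose-based infection term. All functions, parameter values, and
# dose-response forms here are hypothetical illustrations.

def dose(concentration_cells_per_ml, volume_ml):
    """Number of vibrios ingested in one drinking event."""
    return concentration_cells_per_ml * volume_ml


def p_exponential(d, k=1e-6):
    """Hypothetical exponential dose-response: each vibrio independently infects with probability k."""
    return 1.0 - math.exp(-k * d)


def p_hyperbolic(d, d50=1e6):
    """Hypothetical saturating dose-response with half-infective dose d50 (mirrors B/(kappa+B))."""
    return d / (d50 + d)


def daily_infection_probability(concentration, daily_volume_ml, n_events, dose_response):
    """Probability of infection when the daily volume is drunk in n_events equal portions,
    assuming each portion acts independently."""
    per_event = dose_response(dose(concentration, daily_volume_ml / n_events))
    return 1.0 - (1.0 - per_event) ** n_events


for model in (p_exponential, p_hyperbolic):
    once = daily_infection_probability(1e3, 1e3, 1, model)
    split = daily_infection_probability(1e3, 1e3, 4, model)
    print(f"{model.__name__}: single dose {once:.3f}, four portions {split:.3f}")
```

Under the exponential (independent-action) form, splitting the dose leaves the daily risk unchanged; under the saturating form it raises it.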
Lastly, the survival of vibrios in a water supply, as shown in our simple sensitivity analysis, may have a significant impact on model-based predictions. The magnitude of the effect may be limited under circumstances in which person-to-person transmission outweighs waterborne transmission (see eAppendix [http://links.lww.com]). This further emphasizes the need for monitoring, and suggests the importance of assessing the sensitivity of results to variations in this parameter.
The uncertainties in epidemic cholera modeling described above suggest that current quantitative estimates of the benefits of intervention strategies are handicapped by uncertainty in both model structure and parameter values. These uncertainties include the role of person-to-person transmission; a lack of data about critical parameters, including the rate of contamination of the communal water supply and the rate of loss of infectious vibrios from the aquatic reservoir; and spatial heterogeneity of parameters among communities. For quantitative modeling to improve its predictions and offer better guidance to policy-makers during episodes of epidemic cholera, innovative approaches are needed for gathering data on neighborhood-level water consumption and contamination, as well as higher spatial-resolution case-reporting.
The analyses and suggestions presented here are intended to provide assistance in critically interpreting the results of cholera models and to point out avenues for further exploration in terms of data collection and modeling development. This is not to dismiss recent cholera modeling efforts or to suggest a particular threshold for modeling accuracy beyond which the use of models is valid. As discussed by George Box, 65 all models are wrong, but some are useful. The extent to which a model is useful depends on the question being asked, and then on an assessment of how a model's uncertainties and simplifying assumptions influence the strength of its conclusions.
In the model we present, based on Codeço,1 the basic reproductive number (R0) and the rate of exponential growth (r), defined as the per capita change in number of new cases per unit of time, are:
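For the simplified model of the main text, assuming its infection term is β·B/(κ + B) and linearizing near the disease-free state (S ≈ S0), these take the form below; the last expression follows because r is the positive root of (r + γ)(r + δ) = ξβS0/κ:

```latex
R_0 = \frac{\xi \beta S_0}{\kappa \gamma \delta},
\qquad
r = \frac{-(\gamma + \delta) + \sqrt{(\gamma - \delta)^2 + 4\,\xi \beta S_0/\kappa}}{2},
\qquad\text{so that}\qquad
R_0 = \left(1 + \frac{r}{\gamma}\right)\left(1 + \frac{r}{\delta}\right).
```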
These equations indicate that, for a given growth rate, varying parameter values within their plausible range of uncertainty can lead to large changes in R0, with important consequences for the model's predictions about the effects of interventions.
As an example, take the starting point in which the duration of infection with cholera is 5 days, the lifespan of cholera in the water supply is 30 days, the size of the population is 10000 individuals, the concentration of V. cholerae in the water reservoir resulting in 50% probability of infection is 1×10⁶ cells/mL, and the contact rate is 1 per day. Assuming an initial growth rate of 0.1 per day and solving for the contamination rate (ξ), then using the ξ term to derive R0, we calculate R0 = 6. Alternatively, assuming the lifespan of cholera in the water supply is 3 days, then, for the same growth rate, R0 = 1.95. We consider the effect of giving 70% of the population a vaccine that gives full immunity to 70% of recipients, but has no effect on the remainder. We assume the vaccine is distributed prior to the introduction of cholera. We model this as shifting individuals to the “Recovered” compartment in the model.
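A sketch of this example under stated assumptions: the simplified waterborne model with a saturating infection term β·B/(κ + B), no births, deaths, or vibrio growth, one initial case and an initially clean reservoir, integrated with a fixed-step fourth-order Runge-Kutta scheme. ξ is solved from the growth rate via (r + γ)(r + δ) = ξβS0/κ. The integration scheme, time horizon, and function names are our choices, not the authors' code.

```python
# Sketch of the eAppendix example: solve xi from the growth rate, then simulate
# with 49% of the population (70% coverage x 70% efficacy) moved to R.

N, KAPPA, BETA, GAMMA, R_GROWTH = 10_000, 1e6, 1.0, 1 / 5, 0.1


def xi_from_growth_rate(delta):
    """Contamination rate consistent with the observed early growth rate."""
    return (R_GROWTH + GAMMA) * (R_GROWTH + delta) * KAPPA / (BETA * N)


def r0(xi, delta):
    return xi * BETA * N / (KAPPA * GAMMA * delta)


def derivatives(state, xi, delta):
    s, i, _, b = state
    infection = BETA * b / (KAPPA + b) * s
    return (-infection, infection - GAMMA * i, GAMMA * i, xi * i - delta * b)


def simulate(xi, delta, protected_fraction, days=1000.0, dt=0.1):
    """Return the fraction of initially susceptible people infected by the end."""
    recovered0 = protected_fraction * N                    # vaccine-protected moved to R
    state = (N - recovered0 - 1.0, 1.0, recovered0, 0.0)  # one initial case, clean water
    s0 = state[0]
    for _ in range(int(days / dt)):
        k1 = derivatives(state, xi, delta)
        k2 = derivatives(tuple(x + dt / 2 * k for x, k in zip(state, k1)), xi, delta)
        k3 = derivatives(tuple(x + dt / 2 * k for x, k in zip(state, k2)), xi, delta)
        k4 = derivatives(tuple(x + dt * k for x, k in zip(state, k3)), xi, delta)
        state = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
    return (s0 - state[0]) / s0


if __name__ == "__main__":
    for lifespan in (30.0, 3.0):
        delta = 1.0 / lifespan
        xi = xi_from_growth_rate(delta)
        attack = simulate(xi, delta, protected_fraction=0.7 * 0.7)
        print(f"lifespan {lifespan:>4} d: xi={xi:.1f}, R0={r0(xi, delta):.2f}, "
              f"infected among susceptibles={attack:.1%}")
```

In this sketch the reservoir concentration stays well below κ, so the saturating term behaves almost linearly and the outcome tracks the linearized R0 closely.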
The examples above reflect the relationship among the growth rate, basic reproductive number, and disease-generation time, defined as the average amount of time between when an individual is infected and when the person who infected that individual was infected.2 Given two of the three, one can determine the third. Because the generation time depends on the duration of cholera infection and the lifespan of cholera in the water reservoir, then, for a given positive growth rate, R0 depends on these variables. Lack of knowledge of the lifespan of V. cholerae in a water reservoir then means we can only guess at the disease generation time, and hence a positive growth rate is compatible with a wide range of values of R0.
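One general way to write this three-way relation is R = 1/M(−r), where M is the moment-generating function of the generation-interval density g (the relation described by Wallinga and Lipsitch; the formulation is our choice). The sketch below evaluates it numerically for the hypoexponential density that this model implies, g(t) = γδ/(δ − γ)·(e^{−γt} − e^{−δt}), assuming γ ≠ δ, and compares the result with the closed form (1 + r/γ)(1 + r/δ).

```python
import math

# Sketch: R = 1 / integral_0^inf exp(-r t) g(t) dt for a hypoexponential
# generation interval (infectious period, then vibrio survival in water).

def generation_interval_density(t, gamma, delta):
    """Density of a rate-gamma stage followed by a rate-delta stage (requires gamma != delta)."""
    return gamma * delta / (delta - gamma) * (math.exp(-gamma * t) - math.exp(-delta * t))


def reproductive_number(r, density, t_max=800.0, dt=0.02):
    """Trapezoidal-rule evaluation of 1 / M(-r) on [0, t_max]."""
    n = int(t_max / dt)
    total = 0.0
    for k in range(n + 1):
        t = k * dt
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-r * t) * density(t)
    return 1.0 / (total * dt)


GAMMA, R_GROWTH = 1 / 5, 0.1
for lifespan in (3.0, 30.0):
    delta = 1 / lifespan
    numeric = reproductive_number(R_GROWTH, lambda t: generation_interval_density(t, GAMMA, delta))
    closed = (1 + R_GROWTH / GAMMA) * (1 + R_GROWTH / delta)
    print(f"lifespan {lifespan:>4} d: numeric R = {numeric:.3f}, closed form = {closed:.3f}")
```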
Note that in both expressions for R0 and r, the terms ξ, β, κ, and S0 appear only as the combination ξβS0/κ. If we were to assume δ and γ are fixed and fit the model to an observed value of r, then we are specifying the combination ξβS0/κ, and so R0 is uniquely determined. This holds regardless of how we allow ξ, β, S0, and κ to vary in the fit; the relation between R0 and r is not sensitive to these parameters. In other words, a sensitivity analysis that varies one of these parameters while fitting another is uninformative, as their combination will remain the same. Introduction of terms to account for hyperinfectivity, differing infectivity for asymptomatic and symptomatic individuals, and person-to-person infection will add terms to, and therefore influence, these relationships, but the core structure remains.
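A quick check of this invariance, assuming the linearized expressions for R0 and r of the simplified model; the function name and the rescaled parameter set are illustrative.

```python
import math

# Sketch: xi, beta, S0, and kappa enter R0 and r only through xi*beta*S0/kappa.

def r0_and_growth_rate(xi, beta, s0, kappa, gamma, delta):
    a = xi * beta * s0 / kappa  # the only combination in which these four appear
    r0 = a / (gamma * delta)
    r = (-(gamma + delta) + math.sqrt((gamma - delta) ** 2 + 4 * a)) / 2
    return r0, r


baseline = r0_and_growth_rate(xi=4.0, beta=1.0, s0=10_000, kappa=1e6, gamma=0.2, delta=1 / 30)
# Raising xi and kappa tenfold together leaves the combination, and hence both outputs, unchanged.
rescaled = r0_and_growth_rate(xi=40.0, beta=1.0, s0=10_000, kappa=1e7, gamma=0.2, delta=1 / 30)
print(baseline, rescaled)
```

Only changing γ or δ moves the fitted R0 for a given r, which is why the sensitivity analysis has to target those durations.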
Combining these observations, the similar values for R0 reported by several recent cholera models may reflect use of similar values of δ and γ and hence a similar serial interval. The variation seen in ξ and β likely then reflects differences in other parameter estimates and model differences. However, while we might know the distribution of duration of infection with cholera, we do not know the lifespan of cholera in a water supply. Because the relation between r and R0 is sensitive to this parameter, and our knowledge of it is poor, sensitivity analyses should investigate wide ranges of this parameter.
The serial interval for the model discussed here can be calculated as follows. Start with one infected individual, who recovers at rate γ and increases the concentration of vibrios in the water reservoir at rate ξ. These vibrios decay at rate δ. The expected concentration of vibrios due to the infected individual at time t is given by:
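Assuming γ ≠ δ, taking the infected individual's probability of still being infected at time t as e^{−γt}, and starting from a reservoir free of vibrios (B(0) = 0), integrating dB/dt = ξe^{−γt} − δB gives:

```latex
B_I(t) = \frac{\xi}{\delta - \gamma}\left(e^{-\gamma t} - e^{-\delta t}\right)
```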
As infectiousness is proportional to vibrio concentration when concentrations are low, to obtain the serial interval distribution we can normalize B_I(t) by its integral over time, which is
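For B_I(t) = ξ/(δ − γ)·(e^{−γt} − e^{−δt}), with γ ≠ δ, the integral and the resulting normalized serial-interval density, written g(t) here (our notation), are:

```latex
\int_0^\infty B_I(t)\,dt = \frac{\xi}{\gamma\delta},
\qquad
g(t) = \frac{B_I(t)}{\int_0^\infty B_I(s)\,ds}
     = \frac{\gamma\delta}{\delta - \gamma}\left(e^{-\gamma t} - e^{-\delta t}\right).
```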
The serial interval has a mean equal to the sum of the mean duration of human infectiousness and the mean duration of vibrio viability in the water,
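Writing g(t) for the normalized serial-interval density, that is, B_I(t) divided by its integral over time, this mean is:

```latex
\int_0^\infty t\,g(t)\,dt = \frac{1}{\gamma} + \frac{1}{\delta}.
```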
Thus, for these parameters, the mean serial interval ranges from 8 days (5 + 3) to 35 days (5 + 30), an uncertainty of more than fourfold (35/8 ≈ 4.4). Plotting the serial interval distribution for the two sets of parameters used in the example above yields the eFigure (http://links.lww.com).
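A numerical check of these figures, assuming the hypoexponential serial-interval density g(t) = γδ/(δ − γ)·(e^{−γt} − e^{−δt}) with γ ≠ δ; the step size, truncation point, and function name are our choices.

```python
import math

# Sketch: confirm that the serial-interval density integrates to 1 and has
# mean 1/gamma + 1/delta, using the trapezoidal rule on [0, t_max].

def serial_interval_moments(gamma, delta, t_max=1500.0, dt=0.05):
    n = int(t_max / dt)
    mass = mean = 0.0
    for k in range(1, n + 1):  # g(0) = 0, so the left endpoint contributes nothing
        t = k * dt
        g = gamma * delta / (delta - gamma) * (math.exp(-gamma * t) - math.exp(-delta * t))
        w = 0.5 if k == n else 1.0
        mass += w * g * dt
        mean += w * t * g * dt
    return mass, mean


for lifespan in (3.0, 30.0):
    mass, mean = serial_interval_moments(1 / 5, 1 / lifespan)
    print(f"lifespan {lifespan:>4} d: total mass {mass:.4f}, mean {mean:.2f} d")
```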
The parameter uncertainty emphasized here is importantly dependent on the most uncertain (and probably variable) parameter that influences timing of infectiousness, the decay rate of cholera infectivity in water. One might argue that this dependence is an artifact of assuming a purely waterborne transmission route, without accounting for person-to-person transmission, which in this context means transmission through contaminated food or water containers within households or at communal meals. The role of waterborne transmission is to extend the duration of infectiousness traceable back to one infected person from the duration of that person's shedding to the (possibly much longer) time that the vibrios shed by that person remain infectious in the water reservoir. In a model with primarily person-to-person transmission, the serial interval would be shorter and less uncertain.
If one were certain of the relative proportion of person-to-person and waterborne transmission of cholera within an epidemic (and if it could be assumed constant in space and time), then the parameter uncertainty described in the main text of this paper would be reduced. However, in models incorporating direct person-to-person transmission, the relative role of this route vs. waterborne transmission is either fitted, for which there may be an identifiability problem, or assumed, based on little or no data, especially for any particular ongoing outbreak. In the absence of knowledge about the relative importance of person-to-person and waterborne transmission, the uncertainty in the serial interval remains unchanged.
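A simple way to see this, under assumptions of ours: let a fraction p of transmission be person-to-person, with a mean serial interval equal to the mean infectious period 1/γ (exponential shedding, no latent period), and let the remaining 1 − p be waterborne, with mean 1/γ + 1/δ. The mixture mean is then 1/γ + (1 − p)/δ, and the function below is a hypothetical illustration of that formula.

```python
# Sketch: mean serial interval when a fraction p_direct of transmission is
# person-to-person and the rest is waterborne (simplifying assumptions above).

def mean_serial_interval(p_direct, infection_days=5.0, vibrio_lifespan_days=30.0):
    return infection_days + (1.0 - p_direct) * vibrio_lifespan_days


for p in (0.0, 0.5, 0.9, 1.0):
    lower = mean_serial_interval(p, vibrio_lifespan_days=3.0)
    upper = mean_serial_interval(p, vibrio_lifespan_days=30.0)
    print(f"p_direct={p:.1f}: mean serial interval {lower:.1f} to {upper:.1f} days")
```

Only when p is known to be close to 1 does the range collapse toward the 5-day infectious period; with p unknown, the mean could lie anywhere from 5 to 35 days.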
SDC Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com). This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.
Conflicts of Interest and Sources of Funding: The project described was supported by Award Number U54GM088558 to ML from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. YHG received support from the National Institute of Allergy and Infectious Diseases (T32 grant AI007061). JCM received support from the RAPIDD program of the Science and Technology Directorate, Department of Homeland Security and the Fogarty International Center, National Institutes of Health.