
Article sections

- Abstract
- 1 Introduction
- 2 General stochastic SIR model
- 3 Generation interval contraction
- 4 Simulations
- 5 Consequences for estimation
- 6 Discussion
- References


Math Biosci. Author manuscript; available in PMC 2009 May 1.


Published online 2008 February 29. doi: 10.1016/j.mbs.2008.02.007

PMCID: PMC2365921

NIHMSID: NIHMS45802

Corresponding author: Email: ekenah/at/hsph.harvard.edu

The publisher's final edited version of this article is available at Math Biosci


The *generation interval* is the time between the infection time of an infected person and the infection time of his or her infector. Probability density functions for generation intervals have been an important input for epidemic models and epidemic data analysis. In this paper, we specify a general stochastic SIR epidemic model and prove that the mean generation interval decreases when susceptible persons are at risk of infectious contact from multiple sources. The intuition behind this is that when a susceptible person has multiple potential infectors, there is a “race” to infect him or her in which only the first infectious contact leads to infection. In an epidemic, the mean generation interval contracts as the prevalence of infection increases. We call this *global competition* among potential infectors. When there is rapid transmission within clusters of contacts, generation interval contraction can be caused by a high local prevalence of infection even when the global prevalence is low. We call this *local competition* among potential infectors. Using simulations, we illustrate both types of competition. Finally, we show that hazards of infectious contact can be used instead of generation intervals to estimate the time course of the effective reproductive number in an epidemic. This approach leads naturally to partial likelihoods for epidemic data that are very similar to those that arise in survival analysis, opening a promising avenue of methodological research in infectious disease epidemiology.

In infectious disease epidemiology, the *serial interval* is the difference between the symptom onset time of an infected person and the symptom onset time of his or her infector [1]. This is sometimes called the “generation interval.” However, we find it more useful to adopt the terminology of Svensson [2] and define the *generation interval* as the difference between the infection time of an infected person and the infection time of his or her infector. By these definitions, the serial interval is observable while the generation interval usually is not. We define *infectious contact* from *i* to *j* to be a contact that is sufficient to infect *j* if *i* is infectious and *j* is susceptible, and we define a *potential infector* of person *i* to be an infectious person who has positive probability of making infectious contact with *i*. Finally, we use the term *hazard* rather than *force of infection* to highlight the similarities between epidemic data analysis and survival analysis.

The generation interval has been an important input for epidemic models used to investigate the transmission and control of SARS [3, 4] and pandemic influenza [5,6]. More recently, generation interval distributions have been used to calculate the incubation period distribution of SARS [7] and to estimate *R*_{0} from the exponential growth rate at the beginning of an epidemic [8]. It is generally assumed that the generation interval distribution is characteristic of an infectious disease. In this paper, we show that this is not true. Instead, the expected generation interval decreases as the number of potential infectors of susceptibles increases. During an epidemic, generation intervals tend to contract as the prevalence of infection increases. This effect was described by Svensson [2] for an SIR model with homogeneous mixing. In this paper, we extend this result to all time-homogeneous stochastic SIR models.

A simple thought experiment illustrates the intuition behind our main result. Imagine a susceptible person *j* in a room. Place *m* other persons in the room and infect them all at time *t* = 0. For simplicity, assume that infectious contact from *i* to *j* occurs with probability one, *i* = 1, ..., *m*. Let *t*_{ij} be a continuous nonnegative random variable denoting the first time at which

When a susceptible person is at risk of infectious contact from multiple sources, there is a “race” to infect him or her in which only the first infectious contact leads to infection. Generation interval contraction is an example of a well-known phenomenon in epidemiology: The expected time to an outcome, given that the outcome occurs, decreases in the presence of competing risks. In our thought experiment, the outcome is the infection of *j* by a given *i* and the competing risks are infectious contacts from all sources other than *i*.

Adapting our thought experiment slightly, we see that the contraction of the generation interval is a consequence of the fact that the hazard of infection for *j* increases as the number of potential infectors increases. Let λ(*t*) be the hazard of infectious contact from any potential infector to *j* at time *t* and let *E*[*t*_{j}|

so the expected generation interval decreases as the number of potential infectors increases. A hazard of infection that increases with the number of potential infectors is a defining feature of most epidemic models, so generation interval contraction is a very general phenomenon. We note that a very similar phenomenon occurs in endemic diseases, where increased force of infection results in a decreased average age at first infection [9].
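The thought experiment can be checked numerically. The sketch below is a minimal illustration, not part of the original analysis: it assumes that each of the *m* infectors makes infectious contact with *j* after an independent exponential waiting time with constant hazard `rate` (the function name `mean_winning_time` and all parameter values are hypothetical). Under that assumption the first contact time has mean 1/(*m* · `rate`), so the simulated mean should fall as *m* grows.

```python
import random


def mean_winning_time(m, rate=1.0, trials=20000, seed=0):
    """Monte Carlo mean of the earliest of m independent contact times.

    Assumption: each contact time is exponential with the same constant
    hazard `rate`; all m infectors are infected at time 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Only the first infectious contact infects j, so the generation
        # interval is the minimum of the m contact times.
        total += min(rng.expovariate(rate) for _ in range(m))
    return total / trials


if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(m, round(mean_winning_time(m), 3), "theory:", 1.0 / m)
```

The exponential choice is only for convenience; any continuous contact-time distribution would show the same qualitative shortening of the winning time.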

The rest of the paper is organized as follows: In Section 2, we describe a general stochastic SIR epidemic model. In Section 3, we use this model to show that the mean generation interval decreases as the number of potential infectors increases. As a corollary, we find that the mean serial interval also decreases. In Section 4, we consider the role of the population contact structure in generation interval contraction and illustrate the effects of global and local competition among potential infectors with simulations. In Section 5, we argue that hazards of infectious contact should be used instead of generation or serial interval distributions in the analysis of epidemic data. Section 6 summarizes our main results and conclusions.

We start with a very general stochastic “Susceptible-Infectious-Removed” (SIR) epidemic model. This model includes fully-mixed and network-based models as special cases, and it has been used previously to define a mapping from the final outcomes of stochastic SIR models to the components of semi-directed random networks [10, 11].

Each person *i* is infected at his or her *infection time* *t*_{i}, with

When person *i* is infected, he or she makes infectious contact with person *j* after an *infectious contact interval* *τ*_{ij}. Each

which is the conditional probability that *i* never makes infectious contact with *j* given *r*_{i}. Since a person cannot transmit disease before being infected or after recovering from infectiousness,

The *infectious contact time* *t*_{ij} =

Schematic diagram of variables in the general stochastic SIR model for the ordered pair *ij*. Recall that *t*_{j} ≤ *t*_{ij}. As discussed in Section 3.2, person *i* develops symptoms at time *t*_{i} + *q*_{i}, where *q*_{i} is the incubation period.

The *importation time t*_{0i} of person *i* is the earliest time at which he or she receives infectious contact from outside the population. The importation time vector **t**_{0} = (*t*_{01}, ..., *t*_{0n}).

We assume that each infected person has a unique infector. Following [4], we let *v*_{i} represent the index of the person who infected person

Let *t*_{(1)} ≤ *t*_{(2)} ≤ ... ≤ *t*_{(m)} be the order statistics of all *t*_{1}, ..., *t*_{n} less than infinity, and let (

In this section, we show that the mean infectious contact interval *τ*_{ij} given that

(note that *v*_{j} =

We first show that *E*[*τ*_{ij}|

be the conditional cdf of *τ*_{ij} given

(1)

If person *j* is susceptible at time *t*_{i} and

If we let

then

Since *S*_{*j}(*t*_{i} +

Therefore,

(2)

Since the same inequality holds for all *r*_{i},

(3)

by the law of iterated expectation.
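The inequality can be illustrated with a small numerical check. This is a sketch under stated assumptions, not the authors' derivation: the infectious contact interval is taken to have a constant hazard `lam` over an infectious period of length `r`, and the probability that *j* escapes infectious contact from all other sources up to a given time is supplied as a hypothetical function `survival`. Weighting the contact-interval density by that escape probability gives the mean interval among contacts that actually transmit.

```python
import math


def conditional_mean(lam, r, survival, steps=20000):
    """Mean contact interval among contacts that find j still susceptible.

    The contact interval has density lam * exp(-lam * t) on (0, r);
    survival(t) is the assumed probability that j has escaped infectious
    contact from every other source by time t. Midpoint-rule integration.
    """
    h = r / steps
    num = den = 0.0
    for s in range(steps):
        t = (s + 0.5) * h
        weight = lam * math.exp(-lam * t) * survival(t)
        num += t * weight
        den += weight
    return num / den


if __name__ == "__main__":
    # No competing sources: mean interval given that contact occurs.
    print(conditional_mean(0.4, 1.0, lambda t: 1.0))
    # Competing sources with an assumed total hazard of 2.0.
    print(conditional_mean(0.4, 1.0, lambda t: math.exp(-2.0 * t)))
```

Any nonincreasing `survival` function gives a mean no larger than the uncontested one, because it shifts weight toward short intervals.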

Equality holds in equation (2) if and only if *τ*_{ij} and

The expected generation interval from *i* to *j* given *v*_{j} =

will be minimized when *S*_{*j}(*t*_{i} +

In general, we expect to see the following pattern over the course of an epidemic: The mean generation interval decreases as the prevalence of infection increases, reaches a minimum as the prevalence of infection peaks, and increases again as the prevalence of infection decreases.

In [2], Svensson discussed two types of generation intervals that are consistent with the verbal definition given in the Introduction. *T*_{p} (

In an epidemic, infection times are generally unobserved. Instead, symptom onset times are observed. Recall that the time between the onset of symptoms in an infected person and the onset of symptoms in his or her infector is called the *serial interval*. Contraction of the mean generation interval implies contraction of the mean serial interval as well. The *incubation period* is the time from infection to the onset of symptoms [1]. Let *q*_{i} be the incubation period in person

Therefore,

with strict inequality whenever strict inequality holds for the corresponding generation interval. Over the course of an epidemic, we expect the mean serial interval to follow a pattern very similar to that of the mean generation interval.
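The link between the two intervals can be written out as a short worked identity. The notation below is an assumption made for illustration: symptom onset in person *i* occurs at *t*_{i} + *q*_{i}, and the second line uses only linearity of expectation.

```latex
% Assumed notation: serial interval = symptom onset of j minus symptom onset of i.
\begin{align*}
(t_j + q_j) - (t_i + q_i) &= (t_j - t_i) + (q_j - q_i), \\
E\bigl[(t_j + q_j) - (t_i + q_i) \mid v_j = i\bigr]
  &= E[\,t_j - t_i \mid v_j = i\,] + E[\,q_j - q_i \mid v_j = i\,].
\end{align*}
```

If the incubation periods are assumed to be independent of the infectious contact intervals, the second term does not depend on the amount of competition, so any contraction of the mean generation interval carries over to the mean serial interval.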

We refer to the “race” to infect a susceptible person as *competition among potential infectors*. In this section, we illustrate two types of competition among potential infectors: *Global competition* among potential infectors results from a high global prevalence of infection. *Local competition* among potential infectors results from rapid transmission within clusters of contacts, which causes susceptibles to be at risk of infectious contact from multiple sources within their clusters even if the global prevalence of infection is low. In real epidemics, the prevalence of infection is usually low but there is clustering of contacts within households, hospital wards, schools, and other settings.

In this section, we use simulations to illustrate generation interval contraction under global and local competition among potential infectors. Each simulation is a single realization of a stochastic SIR model in a population of 10,000. We keep track of the infection times of the primary and secondary case in each infector/infectee pair and the prevalence of infection at the infection time of the secondary case, which is a proxy for the amount of competition to infect the secondary case. We then calculate a smoothed mean of the generation interval as a function of the infection time of the primary case in each pair. Another valid approach would be to calculate the smoothed means from the results of many simulations. We did not take this approach for the following reasons: (i) Because of variation in the time course of different realizations of the same stochastic SIR model, many simulations would be required to obtain a curve that reliably approximates the asymptotic limit. (ii) The smoothed mean over many simulations would show a pattern similar to that obtained in any single simulation. (iii) Generation interval contraction was proven in Section 3, so the simulations are intended primarily as illustrations.

All simulations were implemented in Mathematica 5.0.0.0 [© 1988-2003 Wolfram Research, Inc.]. All data analysis was done using Intercooled Stata 9.2 [© 1985-2007 StataCorp LP]. All smoothed means are running means with a bandwidth of 0.8 (the default for the Stata command lowess with the option mean). Similar results were obtained for larger and smaller bandwidths.
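A simulation of this kind can be sketched in a few lines of Python. This is not the authors' Mathematica code. It assumes a fully-mixed population, a fixed infectious period of length 1, and a constant pairwise hazard of `r0 / (n - 1)` during the infectious period; the function name `simulate_sir`, the default population of 2,000 (smaller than in the text, for speed), and the 10 initial cases at time 0 are all choices made here for illustration.

```python
import heapq
import math
import random


def simulate_sir(n=2000, r0=3.0, initial=10, seed=1):
    """Return (infector infection time, generation interval) for each pair."""
    rng = random.Random(seed)
    beta = r0 / (n - 1)              # assumed pairwise hazard of contact
    p = 1.0 - math.exp(-beta)        # P(contact within the infectious period)
    infection_time = [math.inf] * n
    # Events are (contact time, target, source); source -1 marks importation.
    events = [(0.0, j, -1) for j in range(initial)]
    heapq.heapify(events)
    pairs = []
    while events:
        t, j, i = heapq.heappop(events)
        if infection_time[j] < math.inf:
            continue                 # j already infected: this contact lost the race
        infection_time[j] = t
        if i >= 0:
            pairs.append((infection_time[i], t - infection_time[i]))
        # Geometric skipping selects every other person independently with
        # probability p, without looping over the whole population.
        k = -1
        while True:
            k += 1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
            if k >= n:
                break
            if k == j:
                continue
            # Exponential(beta) interval conditioned on being < 1 (inverse cdf).
            tau = -math.log(1.0 - rng.random() * p) / beta
            heapq.heappush(events, (t + tau, k, j))
    return pairs


if __name__ == "__main__":
    pairs = simulate_sir()
    print(len(pairs), sum(g for _, g in pairs) / len(pairs))
```

Sorting `pairs` by infector infection time and taking a running mean of the intervals gives a smoothed curve of the kind described above.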

To illustrate global competition among potential infectors, we use a fully-mixed model with population size *n* = 10,000 and basic reproductive number *R*_{0}. The infectious period is fixed, with *r*_{i} = 1 with probability one for all

From equation (1), the mean infectious contact interval given that contact occurs is

For *n* = 10,000, Table 1 shows this expected value at each *R*_{0}. For all *R*_{0}, *E*[*τ*_{ij}|

Expected infectious contact interval given that infectious contact occurs in the models illustrating global competition among potential infectors. If the generation interval were constant, this would be the mean generation interval throughout an epidemic **...**

This model was run once at *R*_{0} = 1.25, 1.5, 2, 3, 4, 5, and 10. For each simulation, we recorded *t*_{i},

The smoothed mean generation interval as a function of the source infection time for *R*_{0} = 2, 3, 4, 5. There is a clear tendency to contract, with greater contraction for higher *R*_{0}.

The smoothed mean generation interval (solid lines) and prevalence (dotted lines) as a function of the source infection time for *R*_{0} = 2, 3, 4, 5. In all cases, the greatest contraction of the generation interval coincides with the peak prevalence of infection **...**

To illustrate local competition among potential infectors, we grouped a population of *n* = 9,000 individuals into clusters of size *k*. As before, the infectious period is fixed at *r*_{i} = 1 for all

We fixed the hazard of infectious contact between individuals in the same cluster at λ_{within} = .4. We tuned the hazard of infectious contact between individuals in different clusters to obtain *R* mean infectious contacts by infectious individuals; specifically,

We chose λ_{within} = .4 to obtain rapid transmission within clusters while retaining sufficient transmission between clusters to sustain an epidemic. Note that when *k* > *R*(1 - *e*^{-.4})^{-1} + 1, we get the implausible result that λ_{between} < 0. Clearly, *R* and *k* must be chosen so that an infectious person makes an average of *R* or fewer infectious contacts within his or her cluster, which guarantees that λ_{between} ≥ 0.
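The constraint on *R* and *k* can be made concrete with a short calculation. The formula for λ_{between} is not reproduced in this copy of the text, so the sketch below assumes one that is consistent with the stated condition: with a unit infectious period, λ_{between} solves (*k* − 1)(1 − *e*^{−λ_{within}}) + (*n* − *k*)(1 − *e*^{−λ_{between}}) = *R*. The function names are hypothetical.

```python
import math


def between_cluster_hazard(R, k, n=9000, lam_within=0.4):
    """Between-cluster hazard giving R expected infectious contacts in total.

    Assumption (not taken from the text): unit infectious period, so the
    probability of contact at constant hazard lam is 1 - exp(-lam).
    """
    within = (k - 1) * (1.0 - math.exp(-lam_within))
    p_between = (R - within) / (n - k)
    return -math.log(1.0 - p_between)


def max_cluster_size(R, lam_within=0.4):
    """Largest k satisfying k <= R / (1 - exp(-lam_within)) + 1."""
    return math.floor(R / (1.0 - math.exp(-lam_within)) + 1)


if __name__ == "__main__":
    for R in (2, 3):
        print(R, max_cluster_size(R),
              [round(between_cluster_hazard(R, k), 6) for k in (2, 4, 6)])
```

Under this assumption the largest admissible cluster sizes are 7 for *R* = 2 and 10 for *R* = 3, so the cluster sizes used in the simulations stay inside the admissible range.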

At a given *R*, the mean infectious contact interval given that infectious contact occurs depends on the cluster size. If the entire population is infectious and the cluster size is *k*, then a given individual will receive an average of *R* infectious contacts, of which (*k*-1)(1-*e*^{-.4}) come from within his or her cluster. The mean infectious contact interval for within-cluster contacts is

and the mean infectious contact interval for between-cluster contacts is approximately .5 (as in the models for global competition). Therefore, the mean infectious contact interval given that contact occurs and the cluster size is *k* is
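The weighted average described above can be sketched as follows. The within-cluster mean assumes a constant hazard λ_{within} over a unit infectious period, which gives 1/λ − 1/(*e*^{λ} − 1); the between-cluster mean is taken to be .5 as stated in the text. Function names and defaults are hypothetical.

```python
import math


def truncated_exp_mean(lam, r=1.0):
    """Mean of an exponential(lam) contact interval given that it is < r."""
    return 1.0 / lam - r / math.expm1(lam * r)


def mean_contact_interval(R, k, lam_within=0.4, between_mean=0.5):
    """Mean infectious contact interval given contact, for cluster size k.

    Of R expected contacts, (k - 1)(1 - exp(-lam_within)) are assumed to
    come from within the cluster and the remainder from other clusters.
    """
    within = (k - 1) * (1.0 - math.exp(-lam_within))
    return (within * truncated_exp_mean(lam_within)
            + (R - within) * between_mean) / R


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, round(mean_contact_interval(2, k), 4))
```

Because the within-cluster mean is only slightly below .5, the scaling factor changes little with cluster size.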

To compare generation interval contraction for different cluster sizes, we calculated *scaled generation intervals* by dividing the observed generation intervals at each cluster size by *E*[*τ*_{ij}|

For *R* = 2, we ran the model with cluster sizes of 1 through 6. For *R* = 3, we ran the model with cluster sizes of 2 through 8. For each simulation, we recorded *t*_{i}, *v*_{i}, *t*_{vi}, and the prevalence of infection at time

The effect of generation interval contraction on parameter estimates obtained from models that assume a constant generation or serial interval distribution is difficult to assess. The assumption of a constant serial or generation interval distribution may be reasonable in the early stages of an epidemic with little clustering of contacts, in an epidemic with *R*_{0} near one, or in an endemic situation. However, this ignores the more fundamental issue that estimates of these distributions are obtained from transmission events where the infector/infectee pairs are known (often because of transmission from a known patient within a household or hospital ward). Even in the early stages of an epidemic, the generation interval distribution in these settings may differ substantially from the generation interval distribution for transmission in the general population.

In this section, we argue that hazards of infectious contact can be used instead of generation or serial intervals in the analysis of epidemic data. As an example, we look at the estimator of *R*(*t*) (the effective reproductive number at time *t*) derived by Wallinga and Teunis [4] and applied to data on the SARS outbreaks in Hong Kong, Vietnam, Singapore, and Canada in 2003. In their paper, the available data were the “epidemic curve” **t** = (*t*_{(1)}, ..., *t*_{(m)}), where *t*_{(i)} is the infection time of the *i*^{th} person infected. They assume a probability density function (pdf) *w*(*τ*|*θ*) for the serial interval given a vector *θ* of parameters (note that this parameter vector applies to the population, not to individuals). The infector of person (*i*) is denoted by *v*_{(i)}, with

The sum of this likelihood over the set *V* of all infection networks consistent with the epidemic curve **t** is

Taking a likelihood ratio, Wallinga and Teunis argue that the relative likelihood that person *k* was infected by person *j* is

(4)

The number *R*_{j} of secondary infections generated by person

An estimate of the effective reproductive number *R*(*t*) can be obtained by calculating a smoothed mean for a scatterplot of (*t*_{j}, *E*[
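The estimator described above can be sketched in code. This is a minimal illustration of the general idea (each case's infector is weighted by the interval pdf evaluated at the time difference, normalized over all candidate infectors), not a reproduction of the published implementation. The function names and the gamma-shaped pdf `gamma2_pdf` are hypothetical stand-ins for *w*(*τ*|*θ*).

```python
import math


def gamma2_pdf(tau):
    """Hypothetical interval pdf (gamma, shape 2, rate 1); zero for tau <= 0."""
    return tau * math.exp(-tau) if tau > 0 else 0.0


def wallinga_teunis(times, w):
    """Return (p, R), where p[j][k] is the relative likelihood that case k
    was infected by case j and R[j] is the expected number of secondary
    cases attributed to case j."""
    m = len(times)
    p = [[0.0] * m for _ in range(m)]
    for k in range(m):
        weights = [w(times[k] - times[i]) if i != k else 0.0 for i in range(m)]
        total = sum(weights)
        if total > 0:                # cases with no candidate infector are skipped
            for j in range(m):
                p[j][k] = weights[j] / total
    R = [sum(row) for row in p]
    return p, R


if __name__ == "__main__":
    p, R = wallinga_teunis([0.0, 1.0, 2.0, 3.0], gamma2_pdf)
    print([round(x, 3) for x in R])
```

Plotting `R[j]` against `times[j]` and smoothing gives an estimate of the time course of the effective reproductive number.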

A very similar result can be derived by applying the theory of order statistics (see Ref. [12]) to the general stochastic SIR model from Section 2. Specifically, we use the following results: If *X*_{1}, ..., *X*_{n} are independent non-negative random variables, then their minimum

Given that the minimum is *x*_{(1)}, the probability that *X*_{j} =

For simplicity, we assume that the infectious contact intervals *τ*_{ij} are absolutely continuous random variables.
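For reference, the standard order-statistics facts of the kind invoked here can be written as follows. The notation is assumed for illustration: *X*_{i} has hazard function λ_{i} and survival function *S*_{i}, and all variables are independent and absolutely continuous.

```latex
% Assumed notation: X_i has hazard \lambda_i and survival function S_i.
\begin{align*}
\Pr(X_{(1)} > x) &= \prod_{i=1}^{n} S_i(x)
  = \exp\Bigl(-\int_0^x \sum_{i=1}^{n} \lambda_i(u)\,du\Bigr)
  \;\Longrightarrow\; \lambda_{(1)}(x) = \sum_{i=1}^{n} \lambda_i(x), \\
\Pr\bigl(X_j = X_{(1)} \mid X_{(1)} = x_{(1)}\bigr)
  &= \frac{\lambda_j(x_{(1)}) \prod_{i} S_i(x_{(1)})}
          {\sum_{l} \lambda_l(x_{(1)}) \prod_{i} S_i(x_{(1)})}
   = \frac{\lambda_j(x_{(1)})}{\sum_{l=1}^{n} \lambda_l(x_{(1)})}.
\end{align*}
```

In words, the hazard of the minimum is the sum of the individual hazards, and each variable is the minimum with probability proportional to its own hazard at that time.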

Let λ_{ij}(*τ*|*r*_{i}) be the conditional hazard function for

(5)

which is the probability that *t*_{jk} = min(

A partial likelihood for epidemic data can be derived using the same logic as that used to derive *p*_{jk} in equation (5). For each person

(6)

where the numerator is the hazard of infection (from all sources) in person *k* at time *t*_{k} and the denominator is the total hazard of infection for all persons at risk of infection at time

If there is a vector of parameters **x**_{ij} for each pair *ij* (which may include individual-level covariates for *i* and *j* as well as pairwise covariates for the ordered pair *ij*) and a vector of parameters *θ* such that λ_{ij}(*τ*|*r*_{i}) = λ(

(7)

This is very similar to partial likelihoods that arise in survival analysis, so many techniques from survival analysis may be adaptable for use in the analysis of epidemic data.
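The hazard-based quantities can be sketched in code following the verbal description above: the probability that *j* infected *k* is the hazard from *j* on *k* at time *t*_{k} divided by the total hazard on *k*, and each term of the partial likelihood is the total hazard on *k* divided by the total hazard on everyone still at risk at *t*_{k}. This is an illustrative sketch, not the authors' implementation. The `hazard(i, j, tau)` callback, the example `unit_hazard`, the handling of imported cases, and the neglect of tied infection times are all assumptions.

```python
import math


def unit_hazard(i, j, tau):
    """Hypothetical hazard: constant 1 during a unit infectious period."""
    return 1.0 if 0.0 < tau <= 1.0 else 0.0


def infector_probabilities(times, hazard, k):
    """Probability that each person i infected k (k must not be imported)."""
    h = [hazard(i, k, times[k] - times[i]) if times[i] < times[k] else 0.0
         for i in range(len(times))]
    total = sum(h)
    return [x / total for x in h]


def partial_log_likelihood(times, hazard, imported=(0,)):
    """Sum over infections of log(hazard on k / hazard on all at risk)."""
    n = len(times)
    loglik = 0.0
    for k in range(n):
        if k in imported or math.isinf(times[k]):
            continue                 # imported or never infected

        def total_hazard(l):
            # Hazard of infection in person l at time t_k from all sources.
            return sum(hazard(i, l, times[k] - times[i])
                       for i in range(n) if times[i] < times[k])

        at_risk = [l for l in range(n) if times[l] >= times[k]]
        loglik += math.log(total_hazard(k) / sum(total_hazard(l) for l in at_risk))
    return loglik


if __name__ == "__main__":
    times = [0.0, 0.5, math.inf]     # person 2 is never infected
    print(infector_probabilities(times, unit_hazard, 1))
    print(partial_log_likelihood(times, unit_hazard))
```

Replacing `unit_hazard` with a function of pairwise covariates and a parameter vector would turn `partial_log_likelihood` into an objective that could be maximized numerically.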

The goal of such methods would be to allow statistical inference about the effects of individual and pairwise covariates on the hazard of infection in ordered pairs of individuals. In the ordered pair *ij*, the effects of individual covariates for *i* and *j* on λ_{ij}(*τ*|*r*_{i}) would reflect the infectiousness of

This approach has several advantages over any approach based on a distribution of generation or serial intervals. First, it is not necessary to determine who infected whom in any subset of observed infections. If *v*_{j} is known for some

Generation and serial interval distributions are not stable characteristics of an infectious disease. When multiple infectious persons compete to infect a given susceptible person, infection is caused by the first person to make infectious contact. In Section 3, we showed that the mean infectious contact interval *τ*_{ij} given that

with strict inequality when *τ*_{ij} is non-constant and

In an epidemic, the mean generation (and serial) intervals contract as the prevalence of infection increases and susceptible persons are at risk of infectious contact from multiple sources. In the simulations of Section 4, we saw that the degree of contraction increases with *R*_{0}. For models with clustering of contacts, generation interval contraction can occur even when the global prevalence of infection is low because susceptibles are at risk of infectious contact from multiple sources within their own clusters. In all of the simulations, the greatest generation interval contraction coincided with the peak prevalence of infection, when the risk of infectious contacts from multiple sources was highest. The mean generation interval increases again as the epidemic wanes, but this rebound may be small when *R*_{0} is high.

The reason that generation and serial intervals contract during an epidemic is that their definition applies to pairs of individuals *ij* such that *i* actually transmitted infection to *j*. If we don’t require that an infectious contact leads to the transmission of infection, we are led naturally to the concept of the infectious contact interval, which has a well-defined distribution throughout an epidemic. Similarly, we can define *R*_{0} as the mean number of infectious contacts (i.e., finite infectious contact intervals) made by a primary case without reference to a completely susceptible population. Generation and serial intervals and the effective reproductive number can then be defined in terms of infectious contacts that actually lead to the transmission of infection. Many fundamental concepts in infectious disease epidemiology can be simplified usefully by defining them in terms of infectious contact rather than infection transmission.

Infectious contact hazards for ordered pairs of individuals can be used for many of the same types of analysis that have been attempted using generation or serial interval distributions. In Section 5, we derived a hazard-based estimator of *R*(*t*) very similar to that developed by Wallinga and Teunis [4]. This derivation led naturally to a partial likelihood for epidemic data very similar to those that arise in survival analysis. We believe that the adaptation of methods and theory from survival analysis to infectious disease epidemiology will yield flexible and powerful tools for epidemic data analysis.

This work was supported by the US National Institutes of Health cooperative agreement 5U01GM076497 “Models of Infectious Disease Agent Study” (E.K. and M.L.) and Ruth L. Kirschstein National Research Service Award 5T32AI007535 “Epidemiology of Infectious Diseases and Biodefense” (E.K.). We also wish to thank Jacco Wallinga and the anonymous reviewers of Mathematical Biosciences for useful comments and suggestions.

[1] Giesecke J. Modern Infectious Disease Epidemiology. Edward Arnold; London: 1994.

[2] Svensson Å. A note on generation times in epidemic models. Mathematical Biosciences. 2007;208:300–311.

[3] Lipsitch M, Cohen T, Cooper B, et al. Transmission dynamics and control of Severe Acute Respiratory Syndrome. Science. 2003;300:1966–1970.

[4] Wallinga J, Teunis P. Different epidemic curves for Severe Acute Respiratory Syndrome reveal similar impacts of control measures. American Journal of Epidemiology. 2004;160(6):509–516.

[5] Mills CE, Robins J, Lipsitch M. Transmissibility of 1918 pandemic influenza. Nature. 2004;432:904.

[6] Ferguson NM, Cummings DAT, Cauchemez S, Fraser C, Riley S, Meeyai A, Iamsirithaworn S, Burke D. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature. 2005;437:209–214.

[7] Kuk AY, Ma S. Estimation of SARS incubation distribution from serial interval data using a convolution likelihood. Statistics in Medicine. 2005;24(16):2525–2537.

[8] Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B. 2007;274:599–604.

[9] Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press; New York: 1991.

[10] Kenah E, Robins J. Second look at the spread of epidemics on networks. Physical Review E. 2007;76:036113.

[11] Kenah E, Robins J. Network-based analysis of stochastic SIR epidemic models with random and proportionate mixing. Journal of Theoretical Biology. 2007;249(4):706–722.

[12] Gut A. An Intermediate Course in Probability. Springer-Verlag; New York: 1995.
