Math Biosci. Author manuscript; available in PMC 2010 September 1.
Published in final edited form as:
PMCID: PMC2731010



Many studies of emerging epidemics (such as SARS and pandemic flu) use mass action models to estimate reproductive numbers and the control measures needed. In reality, transmission patterns are more complex because of various social networks. One level of complexity can be accommodated by considering a community of households. Our study of transmission dynamics in a community of households emphasizes five types of reproductive numbers for the epidemic spread: the household-to-household reproductive number, the leaky vaccine-associated reproductive number, the perfect vaccine-associated reproductive number, the growth rate reproductive number, and the individual reproductive number. Each carries different information about the transmission dynamics and the required control measures, and often some can be estimated from the data while others cannot. Simulations have shown that under certain scenarios these reproductive numbers are ordered. We have proven a number of ordering inequalities under general assumptions about the individual infectiousness profiles. Those inequalities make it possible, for instance, to estimate the needed vaccine coverage and other control measures without knowing the various transmission parameters in the models. Along the way, we have also shown that in choosing between increasing vaccine efficacy and increasing coverage levels by the same factor, preference should go to efficacy.

1. Introduction

Over the last few years, there has been much interest in analyzing data from SARS, past influenza pandemics, and other diseases to estimate the reproductive numbers of these infections. A practical goal in making such estimates is that reproductive numbers can be used to gauge the effort that would be required to control such infections, either by classical public health measures such as isolation and quarantine, by biological interventions such as antivirals or vaccines, or by social interventions to reduce contact. A fundamental insight from simple models of infectious disease transmission is that the critical proportion pc of transmission events that must be blocked by such measures to halt the growth of an epidemic is given by the equation

$$p_c = 1 - \frac{1}{R_0}. \tag{1.0.1}$$


Here R0 is the basic reproductive number, or the mean number of infections caused by a typical infectious case. The validity of equation 1.0.1 depends on the mass action assumptions for epidemic transmission; equation 1.0.1 is often applied to an early stage of an epidemic in a large community. We note that equation 1.0.1 is also valid under certain other assumptions besides mass action, such as no small loops and no depletion of susceptibles.
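As a quick numerical illustration of equation 1.0.1, the following minimal Python sketch evaluates the critical proportion for a few values of R0, assuming the standard mass action relation pc = 1 − 1/R0. The function name and the example values of R0 are ours and serve only as an illustration.

```python
def critical_proportion(r0: float) -> float:
    """Fraction of transmission events to block so that the effective
    reproductive number drops to 1 (mass action assumption)."""
    if r0 <= 1.0:
        return 0.0  # the epidemic does not grow, so nothing needs to be blocked
    return 1.0 - 1.0 / r0


# Hypothetical values of R0, chosen only to show the shape of the relation.
for r0 in (1.5, 2.0, 3.0):
    print(f"R0 = {r0}: pc = {critical_proportion(r0):.3f}")
```

The relation is strongly nonlinear near R0 = 1, which is one reason why biased estimates of R0 can translate into sizeable errors in the estimated control effort.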

Another known consequence of the mass action assumptions, often used to compute R0, is that the reproductive number can be related to the epidemic curve, and to the growth rate of an emerging epidemic; this will be discussed further in Section 2. All this makes the mass action model a convenient (and widely used) set-up for estimating the various epidemiological quantities, and for the subsequent specification of the required control measures. In reality, these mass action assumptions are violated due to the existence of complex social networks. Hence, it is possible that estimates of the effort needed to halt the growth of an epidemic are significantly biased. It would be useful to know the directions and magnitude of these biases, and to provide quantitative bounds on the required measures even though the estimates of transmission are made under models that involve simplifications of reality.

In this paper we focus on one important departure from mass action mixing, the existence of small, closely connected groups of people in which transmission is localized and possibly quite intense – that is, individuals within the group tend to mix preferentially with others in the group, and to subject other members of their group to a sustained risk of transmission. The classic example of such a group is a household, though the notions presented here may be generalized to other, similar settings; the basic idea is that there are two levels of mixing (Ball et al. (1997)), one local and one global. In this context, the universal role played by the reproductive number in the mass action model, both in characterizing the epidemic’s dynamics and the required control measures, is no longer valid. In fact, given the two levels of mixing, there are several reproductive numbers describing the spread of an epidemic and the various control measures that can be implemented to stop it. This paper deals with those reproductive numbers and the relations between them. It is organized as follows:

In Section 2 we review the roles played by the reproductive number under the simplifying assumptions of mass action. In Section 3 we present a model for the stochastic transmission dynamics in a community of households, following the one of Ball et al. (1997); Britton and Becker (2000) - see also House and Keeling (2008); Ferguson and Dodd (2007) for a treatment of related issues in a deterministic model. In Section 4 we define five reproductive numbers that arise in this context. Three of these reproductive numbers are quantities that may be measured in an emerging epidemic, while two others encapsulate the extent of control measures directed at individuals that would be required to stop the growth of such an epidemic. While all five numbers are equal in the setting of a mass-action epidemic, their magnitudes diverge in a population with a household structure. In Section 5 we note that all five are epidemic thresholds (the requirement for epidemic spread is that each of them exceeds one), but when they exceed one, their magnitudes diverge. Numerically, we show that there is in most cases a consistent ordering among them, so that the three measurable reproductive numbers provide bounds on the two relevant for control. Analytically, we demonstrate that most of these ordering relations are generally true (Theorem 1). Along the way, we also prove that in the household model, increasing the vaccine efficacy is better than scaling the vaccination coverage level by the same factor (Proposition A.2.3), with the two measures having the same effect in a mass action model. The rest of the paper deals with proving Theorem 1 (Section 6 and Appendix A), analyzing the limiting cases for the remainder of the ordering relations (Appendix B), and presenting the numerical model for computing those reproductive numbers under various assumptions on individual infectiousness profiles (Section 7). 
We conclude by discussing the relevance of these comparisons for public health, and in particular for estimating important quantities during emerging epidemics (Section 8).

2. Roles of the reproductive number in a mass action epidemic

In a mass action model, the critical proportion of infectious contacts that must be blocked to halt the growth of an epidemic is given by equation (1.0.1). If a vaccine is to be used as the sole control measure, then effective immunization of a fraction pc of the population is required; likewise, preventing pc of the contacts by behavior change or reducing the infectiousness of cases (e.g. by treatment) by a proportion pc will be equivalent. In a mass action model, it does not matter whether the intervention is “leaky” (scales down the susceptibility parameter for those who received it) or “all-or-nothing” (completely prevents transmissions for a fraction of individuals who received it, and has no effect on susceptibility for the rest of individuals who got it, Becker and Dietz (1995); Halloran et al. (1997)) because all contacts in mass action models are assumed to be instantaneous and with different people.

Under mass-action mixing, or more generally, under the assumption of no small loops and no depletion of susceptibles, there are several known techniques for estimating the reproductive number. The available information often includes the epidemic curve (number of new cases on each day) and a known distribution w( ) for the infectious contact interval. The latter can be understood in several ways: under no depletion of susceptibles, it represents the distribution of times from appearance of one case in the epidemic curve to appearance of secondary cases caused by it, Kenah et al. (2008); alternatively, w(t) equals the proportion of the average individual’s infectivity which falls on day t since infection. Given that information, one may estimate the effective daily reproductive number R(t), which equals the mean number of infections caused by an individual, who got infected at time (day) t - see Wallinga and Teunis (2004). Essentially, this technique produces a local estimate of how fast the epidemic is growing on each day by attributing cases in the present to their likely infectors in the past, and then estimating how many cases are attributed to each infector on a given day. When detailed epidemic curve data are not available, or when one is interested in the average value of R(t) over some interval (t1, t2), a special case of this approach is to estimate the exponential growth rate r of the epidemic over that interval as

$$r = \frac{\ln N(t_2) - \ln N(t_1)}{t_2 - t_1}, \tag{2.0.2}$$


where N(t) is the number of persons infected on day t, and then to estimate the value of R during that interval using the expression

$$R = \frac{1}{M_w(-r)}, \tag{2.0.3}$$


where the moment-generating function Mw of the density w is

$$M_w(s) = \int_0^\infty e^{st}\, w(t)\, dt. \tag{2.0.4}$$


These two approaches are closely related (Wallinga and Lipsitch (2007)), converging to the same answer if R(t) remains constant during the interval for which data are analyzed.
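The growth-rate approach described above can be sketched in a few lines of Python. The sketch assumes a discrete (daily) infectious contact interval distribution w, a log-linear estimate of r from incidence at two time points, and the relation R = 1/Mw(−r) discussed in Wallinga and Lipsitch (2007). The function names and all numbers are hypothetical.

```python
import math


def growth_rate(n1: float, n2: float, t1: float, t2: float) -> float:
    """Exponential growth rate from incidence N(t1) = n1 and N(t2) = n2."""
    return (math.log(n2) - math.log(n1)) / (t2 - t1)


def mgf(w: dict, s: float) -> float:
    """Moment-generating function of a discrete distribution w: {day: probability}."""
    return sum(p * math.exp(s * t) for t, p in w.items())


def reproductive_number(r: float, w: dict) -> float:
    """R = 1 / M_w(-r), valid under mass action mixing."""
    return 1.0 / mgf(w, -r)


# Hypothetical inputs: incidence grows from 10 to 40 cases per day in 14 days,
# and the infectious contact interval is 2, 3 or 4 days.
w = {2: 0.3, 3: 0.4, 4: 0.3}
r = growth_rate(10, 40, 0, 14)
R = reproductive_number(r, w)
print(f"r = {r:.4f} per day, R = {R:.3f}")
```

When all of the interval distribution sits on a single day T, the expression reduces to R = exp(rT), which gives a convenient sanity check.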

The definition of R has been rigorously extended to the setting in which the population is stratified into a number of groups (e.g., age groups or sexes or sexual activity groups) with different mixing patterns: in this case, members of certain groups are more likely than others to become infected, and these individuals may in turn be more or less likely to infect others. The appropriate definition of a “typical” infective in this setting is a weighted average over the groups, in which the weights are derived from the leading right eigenvector of the next generation matrix (Rij), where Rij is defined as the expected number of secondary cases in group i caused by an infective member of group j. Using this definition, the basic reproductive number for the disease in the whole population is given by the leading eigenvalue of the matrix (Rij). Defined in this way, R0 = 1 still represents the threshold between an infection that can increase and one that will decline; vaccination of a fraction pc from equation 1.0.1 of members of every group in the population will scale all entries of the next generation matrix by (1− pc) and produce R = 1. A full account of these calculations is given in Diekmann and Heesterbeek (2000).
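The eigenvalue calculation for a stratified population can be illustrated with a small Python sketch. It uses plain power iteration (our choice, not a method prescribed by the text) on a hypothetical 2 × 2 next generation matrix with positive entries, and then checks that scaling all entries by 1 − pc brings the leading eigenvalue to 1.

```python
def leading_eigenpair(K, iterations=500):
    """Power iteration for a matrix with positive entries.

    K[i][j] is the expected number of secondary cases in group i caused by
    one infective in group j. Returns the leading eigenvalue and the right
    eigenvector normalised to sum to 1.
    """
    n = len(K)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)              # equals the eigenvalue once v has converged
        v = [x / lam for x in w]
    return lam, v


# Hypothetical next generation matrix for two groups.
K = [[1.5, 0.5],
     [0.4, 0.8]]
R0, typical_case = leading_eigenpair(K)
pc = 1.0 - 1.0 / R0

# Vaccinating a fraction pc of every group scales all entries by (1 - pc).
K_vaccinated = [[(1.0 - pc) * x for x in row] for row in K]
R_after, _ = leading_eigenpair(K_vaccinated)
print(f"R0 = {R0:.4f}, pc = {pc:.4f}, R after vaccination = {R_after:.4f}")
```

The normalised eigenvector gives the distribution of a "typical" infective across the groups once the early epidemic has settled into its stable composition.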

In summary, for a mass action model, the same reproductive number serves four different purposes: it is the mean number of secondary cases caused by a typical infectious person; it is a well-defined function of the exponential growth rate of the epidemic and the infectious contact interval distribution; it determines, via equation 1.0.1, the proportion of contacts that must be blocked to halt the growth of the epidemic (all-or-nothing transmission reduction); and it determines, in the same way, the proportional reduction in the probability of each transmission that is required to halt the growth of an epidemic (leaky intervention).

In the next section, we define a model for transmission in a community of households, and then in Section 4 we discuss these four reproductive numbers and one additional one in the context of a community of households.

3. A model for homogeneous individuals in a community of households of varying sizes

The model used here builds on the seminal paper of Ball et al. (1997); see also Britton and Becker (2000).

We consider a population of households, in which the infection is spreading. The relative frequency of individuals in the population who live in a household of size h is πh, with Σπh = 1. We focus on an early phase of the epidemic so that no more than one individual in a household is infected from outside, and there is no depletion of susceptible households. Individuals are homogeneous except for the size of the household they inhabit. Upon infection, individuals may become infectious immediately or after some period, and their degree of infectiousness may vary over the infectious period. While infectiousness profiles followed by different individuals can be different, they are assumed to have the same distribution. Formally, it means that a person infected t time units ago generates infectiousness of intensity I(t), where I(t) is a trajectory of some fixed stochastic process M. One can think of I(t) as the amount of pathogen shedding by an infected individual t time units after his/her getting infected; or as the risk that he/she poses to the susceptibles -see equations 3.0.7, 3.0.9. Thus I(t) is random, and its expected value is denoted by

$$\beta_G(t) = E\big(I(t)\big). \tag{3.0.5}$$


The number

$$CI_G = \int_0^\infty \beta_G(t)\, dt \tag{3.0.6}$$


is the expected cumulative infectiousness of an infected individual. Having introduced the individual infectiousness profile, we can now describe the infectiousness hazard that an infected individual poses, both outside of his/her household and within the household.

Outside of the household, for a person infected t time units ago, his/her infectiousness hazard to the community is μGI(t), where μG > 0 is a number. We note that at a given time, a person can be inside or outside the household with a certain probability, and the coefficient μG incorporates that probability. Thus, conditional on a particular individual infectiousness profile, the expected number of persons he/she will infect between t and t + Δt time units since own infection is

$$\mu_G \cdot I(t) \cdot \Delta t. \tag{3.0.7}$$


Thus the total number of people that one infected person is expected to infect outside of his/her household is

$$R_G = \mu_G \cdot CI_G. \tag{3.0.8}$$


Within a household of size h, given a person A infected t time units ago and having an individual infectiousness profile I(t), and an uninfected person B, the probability that A will infect B during time Δt is

$$\mu_h \cdot I(t) \cdot \Delta t. \tag{3.0.9}$$


Here μh > 0 is a number, which depends on the household size. Note that the out-of-household hazard, given by equation 3.0.7, and the within-household hazard, given by equation 3.0.9 are proportional. This represents the assumption that the overall shape of the infectiousness curve is determined by biological (e.g. pathogen shedding, fluid production) and behavioral (e.g. amount of time spent sleeping, social contact) characteristics of the infection, and that household contacts simply get a different, but proportional, exposure to the infectious individual than outside contacts. We also note that while in- and out-of household infectiousness hazards have a similar form, the infectiousness process taking place within an infected household is quite different from the mass action (branching) process for out-of-household infections, due to a fixed number of susceptibles in the household. Some insight into that process will be gained via a Sellke-type construction in Appendix A. In the meantime, let us introduce some notation to be used later.

Consider a household of size h infected t time units ago - this means that the first, or index case got infected t time units ago. Let Ih(t) be the sum of the individual intensities of infectiousness for all infected members of the household at that point. Let

$$\beta_h(t) = E\big(I_h(t)\big) \tag{3.0.10}$$


be the expected intensity of infectiousness of a household of size h infected t time units ago. Let

$$CI_h = \int_0^\infty \beta_h(t)\, dt \tag{3.0.11}$$


be the expected cumulative infectiousness of an infected household of size h. From equation 3.0.7 we see that the outside hazard posed by a household of size h infected t time units ago is μGIh(t). Thus the expected total number of people outside the household infected by one household of size h is

$$R_h = \mu_G \cdot CI_h. \tag{3.0.12}$$


We conclude this section with the following well-known fact (see Andersson and Britton (2000), p. 15), whose proof we include for reader’s convenience:

Proposition 3.0.1

$$CI_h = f_h \cdot CI_G, \tag{3.0.13}$$

where fh is the expected number of people who will be eventually infected in a household of size h. Thus we also have

$$R_h = f_h \cdot R_G. \tag{3.0.14}$$



Let the individuals in the household be 1,…,h. Define the random variable Ai to be 1 if person i is ever infected, and zero otherwise. The total number of infected people is ΣAi, thus

$$f_h = E\Big(\sum_{i=1}^{h} A_i\Big) = \sum_{i=1}^{h} P(A_i = 1).$$

Let TIi be the cumulative (total) infectiousness of the ith person, which is a random variable. Then Ai · TIi = TIi (a person who is never infected has zero infectiousness). Also

$$E(TI_i \mid A_i = 1) = CI_G.$$

Thus

$$CI_h = \sum_{i=1}^{h} E(TI_i) = \sum_{i=1}^{h} E(A_i \cdot TI_i) = \sum_{i=1}^{h} P(A_i = 1) \cdot E(TI_i \mid A_i = 1) = CI_G \cdot \sum_{i=1}^{h} P(A_i = 1) = CI_G \cdot f_h.$$

The fact that Rh = fh · RG follows from equations 3.0.8 and 3.0.12. Q.E.D.

4. Five reproductive numbers

In this section, we define five possible reproductive numbers that are relevant to transmission of infection in a community of households. All except the second (to our knowledge) have been described previously in various contexts. We also assume that we are dealing with an early stage of an epidemic in a large community - see Ball et al. (1997) for the limiting behavior in that transmission model as the community size goes to infinity. In particular, there is no depletion of susceptible households, and no household is infected more than once from outside.

4.1. The household reproductive number RH

One definition of the reproductive number that has been widely used in the literature on transmission in communities of households is the “household reproductive number” (we call it RH; it is also occasionally called R* in the literature), which is the mean number of individuals in other households infected by any member of an index household following the infection of one member of the index household (Becker and Dietz (1995); Ball et al. (1997); Fraser (2007)). The idea is to treat households as individuals, and to define RH by analogy to the standard individual reproductive number in a well-mixed population. In this analogy, within-household dynamics - the infection of other members of the household by the index member - play the role analogous to that played by within-host pathogen dynamics in the case of individual reproductive numbers: they are not directly considered but they affect the infectiousness to the outside, in this case to other households.

We know from equation 3.0.14 that the expected number of households infected by an infected household of size h is Rh = fh · RG. The proportion of people living in households of size h is πh, which is also the probability that each new out-of-household infection occurs in a household of size h. From the standard theory of branching processes we get that the mean number of households infected by an index household is

$$R_H = \sum_h \pi_h R_h = R_G \cdot \sum_h \pi_h f_h = R_G \cdot f, \tag{4.1.1}$$


where as before, RG is the mean number of out-of-household infections by a single infected individual; f = Σh πhfh is the expected number of individuals (including the first) infected within a household given that at least one person becomes infected - the final size of the mini-epidemic within the household.

This reproductive number is usefully thought of as corresponding to the extent of reduction in between-household transmission required to stop the epidemic from spreading (Becker and Dietz (1995); Fraser (2007)).
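The calculation of RH from the household size distribution can be sketched as follows, assuming the product form RH = RG · Σh πh fh that follows from Rh = fh · RG. The function name, the household size distribution and the final sizes fh below are hypothetical; in practice the fh would come from a within-household transmission model or from household data.

```python
def household_reproductive_number(r_g: float, pi: dict, f: dict) -> float:
    """R_H = R_G * sum_h pi_h * f_h.

    pi[h]: proportion of individuals living in households of size h.
    f[h]:  expected number of people eventually infected in an infected
           household of size h (index case included).
    """
    if abs(sum(pi.values()) - 1.0) > 1e-9:
        raise ValueError("the proportions pi_h must sum to 1")
    return r_g * sum(pi[h] * f[h] for h in pi)


# Hypothetical inputs.
pi = {1: 0.2, 2: 0.3, 4: 0.5}
f = {1: 1.0, 2: 1.5, 4: 2.4}
R_H = household_reproductive_number(0.8, pi, f)
print(f"R_H = {R_H:.3f}")
```

In this example RG is below 1, yet RH exceeds 1: within-household amplification alone is enough to sustain the epidemic.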

4.2. The individual reproductive number RHI

This reproductive number ignores the distinction between within- and between- household infections and counts the expected number of secondary cases caused by an average infected individual from an average infected household, including those outside and inside the household. There is a caveat in this definition, stemming from double averaging - see Becker and Dietz (1995) for an alternative way of counting. Our approach is as follows:

First, for each household size h, pick N infected households of size h, where N is large. Let AN be the total number of people eventually infected in those households, and let BN be the total number of people whom those AN persons have infected. Thus BN is obtained by counting all people infected within the N households (the index cases in each of the N selected households are excluded), and all the out-of-household infections that those AN people have caused.

We have

$$A_N \simeq N \cdot f_h, \qquad B_N \simeq A_N - N + A_N \cdot R_G.$$

We define

$$R_{HI}^{h} = \lim_{N \to \infty} \frac{B_N}{A_N} = R_G + \frac{f_h - 1}{f_h}$$

to be the expected number of people infected by one infected person in a household of size h. Now we need to average this out over various household sizes. We will choose a stratified average. The proportion of people living in households of size h among all people is πh; this is also the proportion of infected households of size h among all infected households. Thus we define

$$R_{HI} = \sum_h \pi_h R_{HI}^{h} = R_G + g,$$

where g ≡ Σh πh (fh − 1)/fh.

There are merits and demerits to this definition of RHI, discussed further in Appendix A.4.2. We note that RHI is in general an epidemic threshold only if all households have the same size. Nonetheless this RHI will always fit into the scheme of ordering among the reproductive numbers.
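A minimal Python sketch of this definition follows. It assumes the per-size value RG + (fh − 1)/fh, obtained from the counts AN and BN, and the stratified average with weights πh; the function names and inputs are hypothetical.

```python
def individual_reproductive_number(r_g: float, pi: dict, f: dict) -> float:
    """R_HI = R_G + sum_h pi_h * (f_h - 1) / f_h (stratified average)."""
    g = sum(pi[h] * (f[h] - 1.0) / f[h] for h in pi)
    return r_g + g


# Hypothetical household size distribution and final sizes.
pi = {1: 0.2, 2: 0.3, 4: 0.5}
f = {1: 1.0, 2: 1.5, 4: 2.4}
R_HI = individual_reproductive_number(0.8, pi, f)
print(f"R_HI = {R_HI:.4f}")

# Threshold check for a single household size: R_G = 1/f gives R_HI = 1.
single = individual_reproductive_number(1.0 / 2.5, {3: 1.0}, {3: 2.5})
print(f"single household size at threshold: R_HI = {single:.4f}")
```

The second call illustrates why RHI is an epidemic threshold when all households have the same size: RH = RG · f = 1 exactly when RG = 1/f.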

4.3. The perfect-vaccine-associated reproductive number RV

RH defines the extent to which transmission must be reduced between households to halt growth of an epidemic; for instance, if we effectively vaccinate all individuals in a proportion p = 1 − 1/RH of households of each size, the reproductive number will be brought down to 1. However, many interventions, such as vaccination, involve only some members of the targeted households, having an impact on both within-household and between-household transmission. In this section we consider vaccination of individuals at random in the population with a 100% effective vaccine (see Ball et al. (2004) for more details). Let pC be the proportion of the population which needs to be randomly vaccinated with a perfect (100% effective) vaccine to reduce the epidemic’s reproductive number to 1. We define the perfect vaccine-associated reproductive number

$$R_V = \frac{1}{1 - p_C}.$$


4.4. The leaky vaccine-associated reproductive number RV L

In contrast to an “all-or-nothing” vaccine, a “leaky” vaccine with efficacy E is defined as one that reduces the instantaneous probability (rate) of infection given infectious contact (both for in-household and out-of-household contacts) by a proportion E (multiplies it by 1 − E, see Ball and Becker (2006)). Moreover it has no effect on transmission by the vaccinated individuals who did get infected. If E = 1, the vaccine is called perfect. A leaky vaccine would have effects on both within- and between-household transmission. For out-of-household transmission, the expected number of people infected by an index case (RG) scales down by a factor of 1 − E. The effect on within-household transmission is more complex - see Appendix A.

We can define a minimum (critical) efficacy EC for a leaky vaccine, where EC is characterized by the fact that if all the population is vaccinated with a vaccine of efficacy EC, the resulting reproductive number of the epidemic will be 1. By analogy with the perfect-vaccine-associated reproductive number, we then define a leaky-vaccine-associated reproductive number as

$$R_{VL} = \frac{1}{1 - E_C}.$$


Remark 1

Note that this need not refer only to vaccines. One could also consider anti-infective treatments, hygiene measures, or masks that multiply the rate of infection of susceptibles by 1 − E, and consider the critical value EC for such an intervention to be the one which, if administered to the whole population, reduces the reproductive number to 1.

Remark 2

One can define the notion of a reproductive number RE for any vaccine efficacy EC ≤ E ≤ 1. Let pE be the proportion of the population which needs to be randomly vaccinated with a vaccine of efficacy E to reduce the epidemic’s reproductive number to 1. We define the reproductive number for a vaccine of efficacy E as

$$R_E = \frac{1}{1 - E \cdot p_E}.$$


Note that we recover the perfect vaccine-associated reproductive number RV by setting E = 1, and the leaky vaccine-associated reproductive number RV L by setting E = EC. In Appendix A.2 we’ll show that RE is a decreasing function of E as E increases from EC to 1.
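The relation between the three vaccine-associated reproductive numbers can be made concrete with a short Python sketch. It assumes the form RE = 1/(1 − E · pE), which is the expression that reduces to 1/(1 − pC) for a perfect vaccine (E = 1) and to 1/(1 − EC) at full coverage (pE = 1). The critical values used below are hypothetical; in practice they would be obtained from a household transmission model.

```python
def vaccine_reproductive_number(efficacy: float, critical_coverage: float) -> float:
    """R_E = 1 / (1 - E * p_E) for a vaccine of efficacy E whose critical
    random coverage is p_E (assumed form, see the text above)."""
    blocked = efficacy * critical_coverage
    if not 0.0 <= blocked < 1.0:
        raise ValueError("E * p_E must lie in [0, 1)")
    return 1.0 / (1.0 - blocked)


# Hypothetical critical values.
p_C = 0.5   # critical coverage with a perfect vaccine
E_C = 0.6   # critical efficacy when everyone is vaccinated

R_V = vaccine_reproductive_number(1.0, p_C)    # perfect vaccine
R_VL = vaccine_reproductive_number(E_C, 1.0)   # leaky vaccine, full coverage
print(f"R_V = {R_V:.2f}, R_VL = {R_VL:.2f}")
```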

4.5. The exponential-growth-associated reproductive number Rr

Recall the definition of βG(t) from equation 3.0.5. We define the individual infectious contact interval distribution

$$w_G(t) = \frac{\beta_G(t)}{CI_G}. \tag{4.5.1}$$


We note that wG( ) can often be estimated from the data, using the distribution of times from appearance of one case in the epidemic curve to appearances of secondary out-of-household cases caused by it (Kenah et al. (2008), Lipsitch et al. (2003), Wallinga and Teunis (2004), Mills et al. (2004), Ferguson et al. (2005)).

As a consequence of the Euler-Lotka equation (Wallinga and Lipsitch (2007)), infection spreading in a mass-action population with a reproductive number R and an infectious contact interval distribution wG( ) will grow exponentially at a rate r given by the solution to

$$1 = R \int_0^\infty e^{-rt}\, w_G(t)\, dt, \tag{4.5.2}$$


or alternatively,

$$R = \frac{1}{M_{w_G}(-r)}, \tag{4.5.3}$$


where MwG is the moment-generating function for the density wG(t). There is an analogous statement for epidemic dynamics in a community of households. Define the infectious contact interval distribution in a community of households as

$$w_H(t) = \frac{\sum_h \pi_h \beta_h(t)}{\sum_h \pi_h CI_h}, \tag{4.5.4}$$


where as before, βh(t) is the expected intensity of infectiousness of a household of size h infected t time units ago (see equation 3.0.10). Again, one can apply the Euler-Lotka equation using the number of newly infected households, where households are treated as individuals, with the serial interval distribution wH(t) and the reproductive number RH. Thus infectious spread in a community of households will grow exponentially at a rate r given by the solution to

$$1 = R_H \int_0^\infty e^{-rt}\, w_H(t)\, dt, \tag{4.5.5}$$


or alternatively,

$$R_H = \frac{1}{M_{w_H}(-r)}, \tag{4.5.6}$$


where MwH is the moment-generating function for the density wH( ). We note that this growth rate (which is the same as the growth rate for the number of newly infected individuals) can often be estimated well from the incidence curve. We can plug this growth rate r of the epidemic into the moment generating function for the individual infectious contact interval distribution to define

$$R_r = \frac{1}{M_{w_G}(-r)}. \tag{4.5.7}$$


Here wG( ) is the individual infectious contact interval distribution from equation 4.5.1 and r is the solution to equation 4.5.6. Perhaps the main motivation behind this definition is that one can often have good estimates on r and wG( ) from the data (Lipsitch et al. (2003), Wallinga and Teunis (2004), Mills et al. (2004), Ferguson et al. (2005)). The interest in this reproductive number is thus practical: it allows us to ask such questions as, if we know nothing about household structure but apply naive (well mixed) estimates to transmission in a population where households are actually important, how will this affect our estimates of R? If we calculate critical vaccine coverage based on these estimates, how wrong (and in which direction) will we be?
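The two-step definition of Rr can be sketched numerically. The Python code below assumes discrete infectious contact interval distributions, solves RH · MwH(−r) = 1 for the growth rate r by bisection (our choice of root finder), and then evaluates Rr = 1/MwG(−r). The distributions and the value of RH are hypothetical, with wH chosen as a rightward smear of wG.

```python
import math


def mgf(w: dict, s: float) -> float:
    """Moment-generating function of a discrete distribution w: {time: probability}."""
    return sum(p * math.exp(s * t) for t, p in w.items())


def growth_rate(R: float, w: dict, upper: float = 10.0, tol: float = 1e-12) -> float:
    """Solve R * M_w(-r) = 1 for r >= 0 by bisection (needs R >= 1 and positive times)."""
    def excess(r: float) -> float:
        return R * mgf(w, -r) - 1.0      # decreasing in r when all times are > 0

    if excess(0.0) < -1e-12:
        raise ValueError("R < 1: the epidemic is not growing")
    if excess(upper) > 0.0:
        raise ValueError("growth rate exceeds `upper`; increase it")
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def growth_reproductive_number(R_H: float, w_H: dict, w_G: dict) -> float:
    """R_r = 1 / M_{w_G}(-r), where r solves R_H * M_{w_H}(-r) = 1."""
    r = growth_rate(R_H, w_H)
    return 1.0 / mgf(w_G, -r)


# Hypothetical inputs: household intervals are later than individual ones.
w_G = {2: 0.5, 3: 0.5}
w_H = {2: 0.2, 3: 0.3, 4: 0.3, 5: 0.2}
R_r = growth_reproductive_number(2.0, w_H, w_G)
print(f"R_r = {R_r:.3f} for R_H = 2.0")
```

If wH is replaced by wG the function returns RH itself, which corresponds to the mass action case where the two numbers coincide.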

4.6. Characteristics of the reproductive numbers

It is useful to note that the reproductive numbers RH, RHI, and Rr are all quantities that may, in principle, be measured from data that would be available early in an epidemic. RHI requires knowing who infected whom, and a sampling scheme for calculating household sizes, but nothing further. RH requires knowing who infected whom, and who lives in which household. Rr requires knowing the total number of new infections each day, plus a reliable estimate of the infectious contact interval distribution wG( ) (Wallinga and Lipsitch (2007); Wallinga and Teunis (2004)). Of these three, only RH has a direct practical application: it represents the factor by which transmission between households must be reduced to halt epidemic spread (Becker and Dietz (1995); Fraser (2007)). The other two reproductive numbers, RV and RV L, cannot be directly estimated from data, but are of direct practical interest, since they are measures of the extent of effective intervention in individuals required to stop the growth of the epidemic. Under mass-action assumptions, once again, all of these are the same (except RH which is not defined in the absence of households); in reality, we would like to use measurable quantities – RH, RHI, and Rr – to provide bounds on the quantities of practical interest – RV and RV L. In the next section we explore whether such bounds are possible.

5. The relative magnitude of the reproductive numbers

To begin comparing all the reproductive numbers defined so far, we note that they all serve as an epidemic threshold:

Lemma 1

RH = 1 ⇔ Rr = 1 ⇔ RV = 1 ⇔ RV L = 1

and for a single household size, RH = 1 ⇔ RHI = 1.


RH = 1 ⇔ RV = 1 ⇔ RV L = 1 is trivial. Equation 4.5.6 tells us that RH = 1 ⇔ MwH(−r) = 1 ⇔ r = 0 ⇔ Rr = 1 by equation 4.5.7. Finally, for a single household size, RH = RG · f. So RH = 1 ⇔ RG = 1/f ⇔ RHI = 1. Q. E. D.

The situation is quite different for a growing epidemic. In this paper we’ll prove

Theorem 1

In a growing epidemic


We note that Theorem 1 doesn’t characterize well the relation between Rr and the rest of the reproductive numbers. In fact, as we will show in the next section, in most numerical simulations, we also have


suggesting a strict ordering between the reproductive numbers, with the dynamical ones (RH, Rr and RHI) providing upper and lower bounds for the reproductive numbers which assess the required control measures. We’ve shown analytically, however, that there are some exceptions for the second inequality when the latent period for individual infectiousness is very large: see Appendix B.

The proof of Theorem 1 is given in the next section, with several more technical results deferred to Appendix A. Some of the consequences of Theorem 1 are discussed in Section 8.

6. Proofs of inequalities in Theorem 1

6.1. RH ≥ RV L

We have RH = RG · f, see equation 4.1.1. Suppose a vaccine of critical efficacy EC is administered to the whole population. By definition of the critical efficacy, RH(EC) = 1. We also have

$$R_H(E_C) = R_G(E_C) \cdot f(E_C),$$

where RG(EC) is the expected number of out-of-household infections by an index case (in the vaccinated population), and f(EC) is the expected number of people who are eventually infected in the index household. We have

$$R_G(E_C) = (1 - E_C) \cdot R_G$$

(the outside hazard scales by a factor of 1 − E), and f(EC) ≤ f (having vaccinated individuals in a household decreases the expected number of infected). Thus

$$1 = R_H(E_C) = R_G(E_C) \cdot f(E_C) \le (1 - E_C) \cdot R_G \cdot f = (1 - E_C) \cdot R_H.$$

So 1 − EC ≥ 1/RH, which is equivalent to RV L ≤ RH.

6.2. RH ≥ Rr

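Note that RH and Rr are closely related (see equations 4.5.6 and 4.5.7). Define now the CDF for the infectious contact intervals, both individual and household:

$$W_G(t) = \int_0^t w_G(s)\, ds, \qquad W_H(t) = \int_0^t w_H(s)\, ds. \tag{6.2.1}$$

We have WG(0) = 0 = WH(0), W′G(t) = wG(t), W′H(t) = wH(t). Integrating equations 4.5.5 and 4.5.7 by parts we get

$$\frac{1}{R_H} = \int_0^\infty e^{-rt}\, w_H(t)\, dt = r \int_0^\infty e^{-rt}\, W_H(t)\, dt, \qquad \frac{1}{R_r} = \int_0^\infty e^{-rt}\, w_G(t)\, dt = r \int_0^\infty e^{-rt}\, W_G(t)\, dt.$$

Since r > 0 in a growing epidemic, we see from the equation above that Rr ≤ RH is a consequence of Lemma 6.2.1 below.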

Lemma 6.2.1

The household infectious contact interval distribution wH( ) is always longer than the individual infectious contact interval distribution wG( ): WH(t) ≤ WG(t) ∀t > 0


The infectious contact intervals from household to household will be a “rightward smear” of the infectious contact intervals from any given individual to others outside the household.


Let us introduce the following definitions, both for a household of size h and for an individual:

$$B_h(t) = \int_0^t \beta_h(s)\, ds, \qquad B_G(t) = \int_0^t \beta_G(s)\, ds.$$

Thus CIG = BG(∞) and CIh = Bh(∞). Looking at the definitions of wG( ) in equation 4.5.1, of wH( ) in equation 4.5.4, and of WG( ) and WH( ) in equation 6.2.1, clearly Lemma 6.2.1 would follow if we could show that Bh(t)/CIh ≤ BG(t)/CIG for every household size h. The latter is equivalent, by Proposition 3.0.1, to

$$B_h(t) \le f_h \cdot B_G(t).$$


To understand why the last inequality is true, define the function yh(s) by the condition that the probability that in the household of size h there is a new infection during (s, s + Δs) time units since the index case got infected is yh(s) · Δs. We have

$$\int_0^\infty y_h(s)\, ds = f_h - 1,$$

since the integral represents the expected number of people ever infected in the household (besides the index case). Also for each a ≥ 0,

$$\beta_h(a) = \beta_G(a) + \int_0^a y_h(s)\, \beta_G(a - s)\, ds. \tag{6.2.4}$$


The first factor on the right hand side above is the contribution from the index to the household’s expected infectiousness at time a, and the integral gives the contribution of everybody else. Integrating equation 6.2.4 from zero to t we get


6.3. RV L ≥ RV

This inequality was previously demonstrated by Ball et al. (2004) - see also Ball and Becker (2006), where the model is extended to allow for two degrees of infection, mild and severe. In Appendix A we provide a proof of a slightly generalized version of this claim. It is generalized in two ways. First, it is valid for any population structure, not only for a population of households. Second, we generalize the results of Ball et al. (2004), which in effect compared only perfect vs. leaky vaccines, to consider a continuum of vaccines with leaky efficacy between perfect and a minimal (critical) leaky efficacy EC - see Proposition A.2.3.

Recall the notion of a vaccine of efficacy E from Section 4.4. Let R[E, p] denote the value of a reproductive number realized when a fraction p of the population receives a vaccine with efficacy E. For any EC ≤ E ≤ 1, there is a proportion pE of people that need to be randomly vaccinated with a vaccine of efficacy E so that R[E, pE] = 1. Recall from Remark 2 in Section 4.4 that the reproductive number RE associated with a vaccine of efficacy E is

RE = 1/(1 − E · pE).

In the appendix we prove that RE is a non-increasing function of E in the interval EC ≤ E ≤ 1. Since RV L = REC and RV = R1, this general proof includes as a special case the finding RV L ≥ RV.

The basic finding RV L ≥ RV may be interpreted practically as stating that it is always more effective to vaccinate a fraction p with a perfect vaccine than to vaccinate everyone with a vaccine of efficacy p. The more general form established in Proposition A.2.3 in the appendix shows that it is always better to vaccinate a fraction p1 with a vaccine of efficacy E1 than to vaccinate a proportion p2 with a vaccine of efficacy E2, if p1E1 = p2E2 but p1 < p2. Put another way, the efficacy and the vaccination coverage, which are interchangeable in their effect according to mass action models, are in general not interchangeable; it is better to increase vaccination efficacy than to increase the vaccinated fraction of the population in an “equivalent” fashion.

The proof that it is always better to vaccinate a fraction p1 with a vaccine of efficacy E1 than to vaccinate a proportion p2 with a vaccine of efficacy E2, if p1E1 = p2E2 but p1 < p2, is fully general and does not rely on household structure in particular. It stems from the simple observation that for any realization of individual infectiousness profiles I(t), and for any individual i, the probability of i getting infected by a given time is lower under the (E1, p1) scenario than under the (E2, p2) scenario - see Proposition A.2.3 for full details. The argument is then formalized by coupling the two scenarios so that in the first case there are always fewer infected people than in the second case.

The non-increasing nature of RE as a function of E is an immediate consequence which is valid in general and not just in a community of households. Indeed, pick two vaccines of efficacy E1 < E2, and pick a vaccination fraction p > pE1 (recall that R[E1, pE1] = 1). Then R[E1, p] < 1, so the expected number of people who are eventually infected under the (E1, p) scenario is finite. The (E2, p · E1/E2) scenario has the same product of coverage and efficacy but a higher efficacy, hence the expected number of people who are eventually infected under it is also finite. Hence the critical fraction pE2 ≤ p · E1/E2 for any p > pE1. So pE2 ≤ pE1 · E1/E2, that is E2 · pE2 ≤ E1 · pE1, which is equivalent to saying that RE2 ≤ RE1.

6.4. RV ≥ RHI

This is the proof in Appendix A that is longest, most mathematically involved, and least illuminating intuitively. Here we’ll sketch an argument for a community of homogeneous households of the same size. The basic approach is to note that RHI = RG + g is composed of two components, the between-household component RG and the within-household component g = (f − 1)/f. Suppose that a fraction p* of the population receives a perfect vaccine, leaving q* = 1 − p* unvaccinated, and achieving 1 = RH(q*) = RHI(q*) = RG(q*) + g(q*) (here having q* in brackets means considering the scenario when a proportion q* of the population is unvaccinated). By definition, RV = RV (1) = 1/q*. Because the between-household transmission is mass-action, RG(q*) = q*RG(1). Since RV ≥ RHI is the statement 1/q* ≥ RG(1) + g(1), it is therefore sufficient to show that g(q*) ≥ q*g(1).
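The reduction in the paragraph above can be written out step by step. The sketch below uses only the relations RH = RG · f, RHI = RG + g and g = (f − 1)/f from the text; the notation with q* as an argument follows the paragraph above.

```latex
% At critical coverage the household reproductive number equals one:
R_H(q^*) = R_G(q^*)\, f(q^*) = 1
\;\Longrightarrow\;
R_G(q^*) = \frac{1}{f(q^*)} .

% The individual reproductive number is then also equal to one:
R_{HI}(q^*) = R_G(q^*) + \frac{f(q^*) - 1}{f(q^*)}
            = \frac{1}{f(q^*)} + 1 - \frac{1}{f(q^*)} = 1 .

% With R_V = 1/q^* and R_G(q^*) = q^* R_G(1):
R_V \ge R_{HI}(1)
\iff 1 \ge q^* R_G(1) + q^* g(1)
\iff R_G(q^*) + g(q^*) \ge R_G(q^*) + q^* g(1)
\iff g(q^*) \ge q^* g(1) .
```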

To prove the above, pick any q* ≤ x ≤ 1, and let f(x) be the expected number of infected people in an infected household in a population, where each individual is vaccinated with a perfect vaccine with probability 1 − x. Let g(x) = (f(x) − 1)/f(x) as before. We need g(q*) ≥ q*g(1). We’ll show more generally that for any number C > 1, g(Cx) ≤ Cg(x); taking x = q* and C = 1/q* then gives the required inequality. Some intuition for why this inequality holds may be gained from considering limiting cases.

As the first limiting case, suppose that each household is very large with N persons and the probability of each infecting the other is ≈ μ/N for some μ < 1. Thus the dynamics can be well approximated by a branching process, since the depletion of susceptibles within a household has negligible impact on the transmission process in that household. So each person will infect on average μ more people in the next generation, etc., so the expected total number of infected people is 1/(1 − μ). If a proportion 1 − x is vaccinated with a perfect vaccine then the expected number of infected people is f(x) = 1/(1 − xμ), g(x) = μx and g(Cx) = Cg(x).
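The branching-process limit above is easy to check numerically. The following is a minimal sketch in Python (the function names are illustrative, not from the paper): it evaluates f(x) = 1/(1 − xμ) and g(x) = (f(x) − 1)/f(x) and confirms that g scales linearly in x in this limit.

```python
def expected_infected(x, mu):
    """f(x) = 1 / (1 - x*mu): expected total number infected in the
    branching-process limit when a fraction x is left unvaccinated."""
    if not 0.0 <= x * mu < 1.0:
        raise ValueError("the branching process must be subcritical: 0 <= x*mu < 1")
    return 1.0 / (1.0 - x * mu)


def within_household_component(x, mu):
    """g(x) = (f(x) - 1) / f(x), which reduces to mu * x in this limit."""
    f = expected_infected(x, mu)
    return (f - 1.0) / f


if __name__ == "__main__":
    mu, x, C = 0.8, 0.5, 1.5
    print(within_household_component(C * x, mu))   # g(Cx)
    print(C * within_household_component(x, mu))   # C * g(x), equal in this limit
```

In this limit the inequality g(Cx) ≤ Cg(x) holds with equality, which is why the text treats it as the extreme case.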

The spirit of the general proof in the appendix is to demonstrate that this branching process limit is the “worst-case scenario.” To show this, we’ll consider an infected household where each person is vaccinated with a perfect vaccine with probability 1 − x, and define Hk(x) to be the expected number of people infected in the kth generation within the household, for each k ≥ 0. Thus H0(x) = 1 and f(x) = Σk ≥ 0 Hk(x). In the branching process case, Hk(x) = μ^k x^k. A key step in the proof is to show that in general for any C > 1, Hk(Cx) ≤ C^k Hk(x). This implies the differential inequality x H′k(x) ≤ k Hk(x). Another step is to note that for each k ≥ 0, Σl ≥ k Hl(x) ≤ Hk(x) f(x). Combining these, and using f(x) = Σk ≥ 0 Hk(x), one easily obtains a differential inequality

x g′(x) ≤ g(x).

This in turn implies that g(Cx) ≤ Cg(x) for C > 1.
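The chain of steps in this sketch can be spelled out as follows. This is a reconstruction from the two ingredients named above, taken here in the form x H′k(x) ≤ k Hk(x) and Σl ≥ k Hl(x) ≤ Hk(x) f(x); it is not the appendix proof, and the exact form of the authors' differential inequality is an assumption.

```latex
% Sum the generation-wise inequality and exchange the order of summation:
x f'(x) = \sum_{k \ge 0} x H_k'(x) \le \sum_{k \ge 1} k\, H_k(x)
        = \sum_{k \ge 1} \sum_{l \ge k} H_l(x)
        \le f(x) \sum_{k \ge 1} H_k(x) = f(x)\,\bigl(f(x) - 1\bigr).

% Since g = 1 - 1/f, we have g' = f'/f^2, so
x g'(x) = \frac{x f'(x)}{f(x)^2} \le \frac{f(x) - 1}{f(x)} = g(x).

% Therefore g(x)/x is non-increasing:
\frac{d}{dx}\,\frac{g(x)}{x} = \frac{x g'(x) - g(x)}{x^2} \le 0
\;\Longrightarrow\;
\frac{g(Cx)}{Cx} \le \frac{g(x)}{x}
\;\Longrightarrow\;
g(Cx) \le C\, g(x) \quad (C > 1).
```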

7. Numerical simulations

In the preceding sections, we have shown mathematically that in a growing epidemic, RH ≥ RV L ≥ RV ≥ RHI and RH ≥ Rr. The present section uses numerical simulations to address three further points:

  1. We show the magnitude of these reproductive numbers for a wide range of assumptions about the natural history of the disease and the household size distribution in order to assess how much they differ under particular assumptions.
  2. We show that in most plausible parameter regions, we have the strict ordering RH > RV L > Rr > RV > RHI, despite the fact that we can show mathematically that conditions exist in which Rr < RV (Appendix B), and that we have found no mathematical argument to demonstrate that RV L ≥ Rr.
  3. We show that a simple approximation to RV works well for reasonable parameters for several diseases.

We took two approaches to numerical studies. In section 7.1 we evaluate the reproductive numbers in a Markov SEIR model in which the parameters may be varied in a simple fashion to explore a wide range of parameter space. In section 7.2 we use more realistic distributions of parameters, similar to those for measles and influenza (Fraser (2007)), to assess how these reproductive numbers may perform for real diseases.

7.1. Relative magnitudes of reproductive numbers in an SEIR model

To explore parameter space and compare the magnitudes of the five reproductive numbers, we numerically implemented a model for within- and between-household transmission of an infection with a simple susceptible-exposed-infectious-recovered (SEIR) natural history. Upon infection, an individual remains exposed but not infectious for a latent period whose length is exponentially distributed with mean 1/u, and then becomes infectious for a period that is also exponentially distributed with mean 1 (without loss of generality). Infectiousness I(t) (see section 3) during the infectious period is constant (and can be assumed equal to 1), and is zero before and after. Thus CIG = 1 (see equation 3.0.6). External infections follow standard mass-action dynamics, but we limit our consideration to the early phase of the epidemic such that new (uninfected) households are plentiful, and all secondary infections outside the household occur in households that have never before been exposed (no depletion of susceptible households).

Households are assumed to be of a single fixed size h, and the infectiousness profile of a household is calculated using a master equation for within-household infections. In this setting, using “standard” (deterministic) SEIR infection dynamics within a household is inappropriate because of the small numbers of individuals, hence we keep track explicitly of the probability p(σ, ε, ι) that the number of S, E, I and R individuals in the household at a particular time is (σ, ε, ι, h − σ − ε − ι). Infectious individuals exert constant infectiousness μh on other, uninfected individuals within the household (see equation 3.0.9, with I(t) = 1) and μG on all susceptible members of other households combined (see equation 3.0.7). The overall epidemic grows (until depletion of susceptible households sets in) at a rate equal to the leading eigenvalue of the master equation matrix for within-household transmission supplemented with a term for extra-household infections.

The appropriate master equation for within-household dynamics is:


Here we define p(σ, ε, ι) = 0 if σ+ε+ι > h or if any of σ, ε, ι is negative. To simulate the dynamics within a community of households, we allow these dynamics to run in each household while also allowing transmission between households. If we now define z(σ, ε, ι) as the expected number of households with σ susceptible, ε latently infected, ι infectious, and h − σ − ε − ι recovered individuals, we can write these dynamics identically to the within-household dynamics, adding a term for between-household infections


where δ (x) = 1 if x = 0 and 0 otherwise. The last term represents transmission to other households, which feeds new households into the epidemic with h −1 susceptibles, 1 latent, and no infectious individuals. For this last term, note that the transmission coefficient for newly infected households is RG = μG as CIG = 1, see equation 3.0.8.

For a household of size h there are (h+1)(h+2)(h+3)/6 possible configurations (σ, ε, ι, h − σ − ε − ι) for the family, of which two (all susceptible and all recovered) are not tracked in our dynamics, the former being assumed constant and the latter being irrelevant. We can thus exactly track the early phase of the epidemic with (h+1)(h+2)(h+3)/6 − 2 linear differential equations defined by equation 7.1.2. The largest eigenvalue of this system constitutes the exponential growth rate of the epidemic and is used, along with the serial interval distribution, in the calculations of the main text. This system was implemented in Mathematica 6.0, and the eigenvalues taken as the eigenvalues of the Jacobian matrix of the system of linear equations. Code is available on request.
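The linear system described above can be sketched in a few lines of Python with NumPy. This is not the authors' Mathematica code. The transition rates are a reconstruction from the verbal description, so they are assumptions of this sketch: within-household infection at rate μh · σ · ι, progression from latent to infectious at rate u · ε, recovery at rate ι (mean infectious period 1), and seeding of new households in state (h − 1, 1, 0) at rate μG · ι. The function names are illustrative.

```python
import numpy as np


def household_states(h):
    """Tracked configurations (s, e, i) with s + e + i <= h, excluding the
    all-susceptible state (h, 0, 0) and the all-recovered state (0, 0, 0)."""
    states = [(s, e, i)
              for s in range(h + 1)
              for e in range(h + 1 - s)
              for i in range(h + 1 - s - e)]
    return [st for st in states if st not in ((h, 0, 0), (0, 0, 0))]


def growth_rate(h, mu_h, mu_G, u):
    """Leading eigenvalue of the early-phase linear system for z(s, e, i)."""
    states = household_states(h)
    index = {st: k for k, st in enumerate(states)}
    A = np.zeros((len(states), len(states)))
    seed = index[(h - 1, 1, 0)]          # state of a newly infected household
    for (s, e, i), k in index.items():
        transitions = (
            (mu_h * s * i, (s - 1, e + 1, i)),   # within-household infection
            (u * e,        (s, e - 1, i + 1)),   # latent -> infectious
            (1.0 * i,      (s, e, i - 1)),       # recovery
        )
        for rate, target in transitions:
            if rate > 0:
                A[k, k] -= rate
                if target in index:      # the all-recovered state is not tracked
                    A[index[target], k] += rate
        A[seed, k] += mu_G * i           # between-household infections
    return max(np.linalg.eigvals(A).real)


if __name__ == "__main__":
    print(len(household_states(4)))                       # number of equations for h = 4
    print(growth_rate(h=4, mu_h=0.5, mu_G=1.5, u=1.0))    # exponential growth rate
```

For h = 1 the system reduces to a two-state SEIR chain whose characteristic polynomial is λ² + (1 + u)λ + u(1 − μG), which gives a convenient sanity check.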

Figure 1 shows plots of the five reproductive numbers for varying household sizes (different rows), varying values for the out-of-household reproductive number RG, and varying rates of within-household transmission μh, using an SEIR model of transmission in a community of households with an assumed mean infectious period of 1 time unit. The curves give the values of each of the five reproductive numbers defined in the previous section. One curve for each type of reproductive number is shown, except for Rr (the only reproductive number sensitive to u), where three curves are shown to represent values of 0.01, 1, and 100 time units (from top to bottom) for the latent period of the disease, equivalent to u = 100, 1, 0.01 respectively. Thus the upper curve has almost no latent period while the lower curve has a latent period 100 times as long as the infectious period.

Figure 1
Magnitudes of the 5 reproductive numbers for various household sizes (rows) and values of RG, the number of outside-household secondary cases per infectious individual (columns). The horizontal axis shows μh, the within-household transmission ...

Several features are notable in Figure 1. First, the five reproductive numbers follow a strict ordering

RH > RV L > Rr > RV > RHI

for nearly all parameter values chosen, with Rr nearly overlapping RV for very long latent periods (u = 0.01). In Figure 2, we show that, as predicted mathematically in Appendix B.3, Rr < RV for very long latent periods for a larger household size h ≥ 4, though the two are very nearly equal. We consider these results primarily of theoretical interest since a latent period so much longer than the infectious period is unrealistic for most known infections. Hence the strict ordering observed in most of Figure 1 seems to be a safe assumption for most plausible natural histories, assuming homogeneously sized households.

Figure 2
The ratio Rr/RV as a function of the latent period (with all other parameters fixed) in a household of size 4. As the latent period becomes (unrealistically) large, Rr becomes slightly smaller than RV, as predicted in Appendix B.3.

Second, it is clear that RH and RV L have similar magnitudes when the within-household transmission parameter μh is large. This is not surprising, since a leaky vaccine is of little benefit within a household when within-household transmission is intense, so its effect is mainly on between-household transmission. On the other hand, when the within-household transmission parameter is moderate, RH is considerably larger than RV L.

Third, Rr, which has been measured in various forms for infections such as SARS (Lipsitch et al. (2003); Wallinga and Teunis (2004)), influenza (Ferguson et al. (2005); Mills et al. (2004)) and Ebola (Chowell et al. (2004)), gives in general a conservative indication of how much transmission must be blocked to halt the growth of the epidemic (as indicated by RV). In some cases (short latent period, intense within-household transmission) the estimate is strongly conservative.

Fourth, whenever there is significant within-household transmission, RV and RV L are quite different, in percentage terms, implying that the tradeoff between efficacy and coverage is rather strongly weighted in favor of efficacy in such situations.

Fifth, there is significant variation in Rr depending on the household size distribution and the degree of within-household transmission. It has been suggested that if different models of a disease are calibrated to give the same initial growth rate of an epidemic, they should then produce similar estimates for the effect of interventions. Our model suggests this may not be correct, since different models of the same disease might make different assumptions about parameters such as household size distributions and the relative contribution of within-household transmission (Halloran et al. (2008)), producing different values of Rr even for similar values of the other reproductive numbers. As a consequence, it is possible that for a given natural history of disease and value of Rr, models might make conflicting predictions about the effectiveness of control measures.

7.2. Application to realistic distributions of infectiousness

In a second set of numerical simulations, we studied the five reproductive numbers in the context of a disease natural history calibrated more closely to two specific diseases, measles (which has a very high reproductive number and a highly peaked infectiousness profile) and influenza (which has a modest reproductive number and a wider infectiousness profile). These simulations followed the protocol described by Fraser (2007). Individual cumulative intensities of infectiousness were Gamma-distributed, with the shape parameter set to.22 for measles and 1 for flu. Stochastic simulations were used (rather than the master equation approach described in the previous section) to estimate the mean infectiousness profile for each household size, and from these profiles and an assumed value of RG (out of household reproductive number, set to equal 1.5 for flu and 15 for measles), the values of the other reproductive numbers are calculated for a household size distribution based on the 2000 US census. The within-household transmission parameter μh (assumed to be independent of household size) varies along the x-axis of each panel of Figure 3, with all other parameters fixed.

Figure 3
Magnitudes of the 5 reproductive numbers for influenza (left) and measles (right), as a function of the within-household transmission parameter μh. Red - RH, orange - RV L, green - Rr, brown - RV A, purple - RV, blue - RHI. In both cases the individual ...

Here, in addition to the five reproductive numbers, we also calculated an approximate reproductive number RV A, which approximates RV by assuming that the within-household, indirect effects of a perfect vaccine are negligible, hence that upon vaccination of a proportion p of persons at random, the expected final size in an infected household of size h becomes 1 + (fh − 1)(1 − p). Since in the same circumstances, RG[p] = RG(1 − p), the household reproductive number then becomes approximately RH[p] ≈ RH(1 − p)^2 + RGp(1 − p). Setting this quantity equal to one and solving for p, one obtains RV A = 1/(1 − p) = (RG + √(RG^2 + 4(RH − RG)))/2.
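The approximation can be checked numerically. Writing q = 1 − p, the condition RH · q^2 + RG · q · (1 − q) = 1 is a quadratic in q whose positive root gives 1/q = (RG + √(RG^2 + 4(RH − RG)))/2. The Python sketch below evaluates this closed form (derived here from the stated approximation, so treat it as a reconstruction) and verifies that the implied coverage brings the approximate household reproductive number to one. The function names and the example values of RH and RG are illustrative.

```python
import math


def approx_household_R(R_H, R_G, p):
    """Approximate household reproductive number after vaccinating a random
    fraction p with a perfect vaccine: R_H (1 - p)^2 + R_G p (1 - p)."""
    return R_H * (1.0 - p) ** 2 + R_G * p * (1.0 - p)


def approx_perfect_vaccine_R(R_H, R_G):
    """R_VA = 1 / (1 - p), where p solves approx_household_R(R_H, R_G, p) = 1."""
    return 0.5 * (R_G + math.sqrt(R_G ** 2 + 4.0 * (R_H - R_G)))


if __name__ == "__main__":
    R_H, R_G = 3.0, 1.5                  # illustrative values only
    R_VA = approx_perfect_vaccine_R(R_H, R_G)
    p_crit = 1.0 - 1.0 / R_VA
    print(R_VA, p_crit, approx_household_R(R_H, R_G, p_crit))
```

When households have a single member (RH = RG) the formula reduces to RV A = RG, as expected for a mass-action model.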


The following can be observed from Figure 3:

  • Rr is a very close approximation to RV for all parameters considered, and when it is wrong, as in the simulations in the previous section, it is conservative (Rr predicts that more people need to be vaccinated than the actual number). Moreover, RV A is also an excellent approximation, indicating within-household herd immunity effects are minor compared to the direct effects of vaccination in the household and the effects in the population as a whole.
  • RV L is significantly larger than RV whenever within-household transmission is substantial.
  • For flu, RHI significantly underestimates the other reproductive numbers, and in both cases RH significantly overestimates them.

8. Discussion and implications

In this paper we have defined five reproductive numbers that may be of epidemiologic interest when an infection is spreading in a population of households. The reproductive numbers RH, RHI and Rr are related to the dynamics of the emerging epidemic and in principle can be measured from the incidence and contact data. The reproductive numbers RV, RV L and RH each indicate the level of effort required to halt the growth of the epidemic with a different class of control measures. RV corresponds to the fraction of persons who must be completely removed from potential transmission; the most obvious way to accomplish this is by the administration of a perfectly effective vaccine. RV L corresponds to the fractional reduction in the hazard of transmission that must be accomplished for all persons in the population, for example by administration of a “leaky” vaccine, but also conceivably through improved hygiene, wearing of masks, or any other imperfect means of reducing individuals’ infectiousness or susceptibility to infection. RH corresponds to the fractional reduction in between-household transmission required to halt epidemic growth, for example by reducing public gatherings or closing schools, which would reduce between-, but not within- household contacts. In all cases, for simplicity, we assume that the interventions are applied uniformly (or at random in the case of RV) across the population.

The upper and lower bound values, the household reproductive number RH and the individual reproductive number RHI respectively, are conceptually the simplest and perhaps the most straightforward to measure, if contact tracing data are available. RH may be measured by counting the number of out-of-household transmissions from each member of each household, adding within a household and averaging across households. RHI may be measured by counting the total number of transmissions from each person and averaging.

As Figure 1 shows, these two numbers may in general give extremely divergent values, especially when household sizes are large and within-household transmission is important. An intermediate value Rr may be obtained by estimating the exponential growth rate of the epidemic, then calculating a reproductive number from the Euler-Lotka equation 4.5.7, using the individual infectious contact distribution.

The main purpose of this paper was to compare these three (dynamical) quantities that can be estimated from data against two other quantities that are important for epidemic control: namely, the proportion pC of the population that must be successfully immunized to stop epidemic growth with a perfect vaccine, or the minimum efficacy EC of a leaky vaccine that, if given to the entire population, would stop epidemic growth. To keep comparisons on a single scale, these values are converted into reproductive numbers in the same way they would be for simple epidemic models. Specifically the perfect vaccine-associated reproductive number RV is defined as RV = 1/(1 − pC) and the leaky vaccine-associated reproductive number RV L = 1/(1 − EC).

Conveniently, the reproductive numbers that can be estimated from data provide bounds on these reproductive numbers associated with control. RV L lies between RH and (in all simulated cases) Rr, while RV lies above RHI and (in most cases) below Rr. The last qualification is required because the inequality Rr ≥ RV fails, albeit by a negligible amount, for very long latent periods - see Appendix B.3 and Figure 2. Study of the relations between the reproductive numbers has also revealed that increasing the vaccine efficacy is better than scaling the vaccination coverage level by the same factor. This implies that attempting to model the tradeoff between vaccine efficacy and coverage levels without considering household structure may underestimate the value of higher-efficacy vaccines that could be given to fewer people (e.g. by changing the dose, Riley et al. (2007)).

These findings suggest that while no simple measure of epidemic progress is exactly indicative of the effort needed to control epidemic spread in a community of households, it is nonetheless possible to obtain some bounds on, and reliable estimates of, the effort required. For instance, even if no good contact data exists for most cases, Rr may be obtained by observing the exponential growth rate of the epidemic, then calculating it from the Euler-Lotka equation 4.5.7, using the individual infectious contact distribution. Several studies in the past (Lipsitch et al. (2003), Wallinga and Teunis (2004), Mills et al. (2004), Ferguson et al. (2005) etc.) took that approach, using the growth rate of an epidemic along with an infectious contact interval distribution, in various ways, to estimate the reproductive number. In at least the SARS cases, the infectious contact interval distribution was derived from known contacts of brief duration, and thus may be approximately equal to the infectious contact distribution. Rr appears for the real diseases we have simulated to be a close, and conservative, approximation, to RV, consistent with our findings in Figure 1, where this was true for cases where the latent period was comparable to (or shorter than) the infectious period.

Another consequence of our findings is that we can give a lower bound for the quantity of a vaccine needed to halt the epidemic. The argument runs as follows: first we can observe the epidemic’s impact on infected households, estimating the numbers fh. Using the inequalities we’ve proven we get that for any vaccine of efficacy E,


where as before, f = Σπifi and g=πifi1fi.

Several aspects of the story remain incomplete. First, we have no proof yet of the inequality RV L ≥ Rr, which holds in all numerical simulations. Second, the inequality Rr ≥ RV is violated very modestly for some very long latent periods, but we have not been able to quantify this statement. Third, we have considered only random vaccination, rather than rational or targeted vaccination. Finally, we have considered only the early, exponential phase of the epidemic. The impact of household structure on the dynamics between this phase and the final outcome (considered by Ball and Neal (2002)) remains to be studied in more detail.


This work is supported by the US National Institutes of Health cooperative agreement 5U01GM076497 “Models of Infectious Disease Agent Study”; Ruth L. Kirschstein National Research Service Award 5T32AI007535 “Epidemiology of Infectious Diseases and Biodefense”; and a Royal Society Research Fellowship.

Appendix A. Proofs of RV L ≥ RV ≥ RHI in a growing epidemic

A.1. A Sellke-type construction for the spread of infection on networks with randomly vaccinated persons

In this section we’ll present a framework to understand the spread of infection on some network, on which each person receives a leaky (or perfect) vaccine with a given probability p, and those vaccinations are independent of one another. The main idea behind this construction appeared essentially in Sellke (1983). We’ll use two variants of the Sellke construction, needed for our proofs: one with individual thresholds, which is more common (see Andersson and Britton (2000), p. 14), and one with pairwise thresholds, to trace who infected whom.

Let the nodes (persons) in the network be labelled as P1, …, PN. Once infected, individual Pi generates infectiousness of intensity Ii(s) ≥ 0, where s is the time since Pi’s infection and Ii(s) is a trajectory of some stochastic process Mi. The processes Mi are not necessarily the same (say people in different age groups have different patterns of pathogen shedding).

For a calendar time t and a person Pi infected at time ti, define


The integral (which is zero if ti > t) is called the cumulative infectiousness of the person Pi up to time t.

Given two individuals Pi and Pj, there is a transmission coefficient μij ≥ 0 for infectious contacts between Pi and Pj. For a person Pi initially infected at calendar time ti, the probability of Pj receiving an infectious contact from Pi during time (t, t + Δt) (conditional on Pi’s particular infectiousness profile) is


Note that “receiving an infectious contact” is not the same thing as getting infected as Pj could already be infected by time t. The number μij is sometimes also called Pj’s susceptibility to infectious contact from Pi.

For a person Pj and a calendar time t, we define the cumulative dosage of infection received by Pj up to time t as


Recall the notion of a leaky vaccine from Section 4.4. Given a number 0 ≤ E ≤ 1, we say that a vaccine has efficacy E if for a person Pj receiving this vaccine, the transmission coefficient for infectious contacts between any person Pi and Pj becomes (1 − E)μij. If E = 1, the vaccine is called perfect and Pj cannot be infected.

We assume that each person receives such a vaccine with probability p and those events are independent. We now describe two variants of a construction for the spread of infection:

A.1.1. Individual thresholds

This construction is well known, see Andersson and Britton (2000), p. 12, Ball et al. (2004). For a person Pj, let Qj be the cumulative dosage of infection that Pj needs to receive before getting infected. (Qj is called Pj’s threshold for infection). Note that Qj is a random variable; P(Qj > Q) is the probability that Pj has not been infected by a given time, conditional on the fact that by that time Pj received a dosage Q of infection. To understand the distribution of Qj, we first need to understand the distribution of Qj conditional on whether Pj is unvaccinated (with probability q = 1 − p), or vaccinated (with probability p).

If Pj is unvaccinated, equations A.1.2 and A.1.3 say that

P(D < Qj ≤ D + ΔD | Qj > D) = ΔD

Thus (with probability q), Qj is an exponential variable Exp(1). Similarly with probability p (Pj vaccinated), Qj is an exponential variable Exp(1 − E) (if E = 1, Qj = ∞ with probability p). We’ll call such a random variable Exp(1, 1 − E, q). Also various Qj are independent of each other and of the infectiousness profiles Ii.

The construction of the spread of infection works as follows: suppose we have a network as above with individuals Pi1, …, Pil initially infected. Consider the following collection of independent random variables:

  1. I1(s), …, IN (s) are trajectories of the stochastic processes M1, …, MN (those are individual infectiousness profiles).
  2. Qj = Exp(1, 1 − E, q).

Given any realization for those variables, we reconstruct the dynamics of the spread of infection. We go along the time line until the first susceptible Ps1 receives a dosage of infection (from the initially infected), that matches Qs1. Once that happens, we add Ps1 to the collection of the infected and his infectiousness trajectory Is1 starts to count towards other susceptibles’ dosages. If another susceptible Ps2 receives a dosage (from the initially infected and Ps1) which matches Qs2, we add Ps2 to the set of the infected, etc. This process recovers the timing of each infection, but it does not recover the information on who infected whom.
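The individual-threshold construction can be illustrated with a short simulation. The Python sketch below makes several simplifying assumptions that are not part of the text: infectiousness is constant (equal to 1) over a fixed infectious period, dosages are accumulated on a discrete time grid of width dt (so infection times are only accurate to within dt), and the function and parameter names are illustrative. Thresholds are drawn from Exp(1, 1 − E, q) as described above, and dosages are computed with the unvaccinated transmission coefficients μij.

```python
import math
import random


def sample_threshold(rng, efficacy, coverage):
    """Draw Q_j from Exp(1, 1 - E, q): rate 1 if unvaccinated (probability q),
    rate 1 - E if vaccinated (probability p = coverage)."""
    vaccinated = rng.random() < coverage
    rate = 1.0 - efficacy if vaccinated else 1.0
    return math.inf if rate <= 0.0 else rng.expovariate(rate)


def sellke_epidemic(mu, initially_infected, efficacy, coverage,
                    infectious_period=1.0, dt=0.01, seed=0):
    """Return {person: infection time} for one realization.
    mu[i][j] is the transmission coefficient from person i to person j."""
    rng = random.Random(seed)
    n = len(mu)
    thresholds = [sample_threshold(rng, efficacy, coverage) for _ in range(n)]
    infection_time = {i: 0.0 for i in initially_infected}
    dose = [0.0] * n                     # cumulative dosage received so far
    t = 0.0
    while True:
        infectious = [i for i, t_i in infection_time.items()
                      if t - t_i < infectious_period]
        if not infectious:               # nobody can transmit any more
            break
        t += dt
        for j in range(n):
            if j in infection_time:
                continue
            dose[j] += dt * sum(mu[i][j] for i in infectious)
            if dose[j] >= thresholds[j]: # threshold reached: j becomes infected
                infection_time[j] = t
    return infection_time


if __name__ == "__main__":
    n = 6
    mu = [[0.0 if i == j else 0.4 for j in range(n)] for i in range(n)]
    print(sellke_epidemic(mu, [0], efficacy=0.5, coverage=0.5, seed=1))
```

As in the text, this variant records when each person is infected but not who infected whom.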

A.1.2. Pairwise Thresholds

For any individual Pi, define a random variable Xi to be uniform on [0, 1], with various Xi being independent of each other and of the individual infectiousness profiles. Let p be the probability of being vaccinated, with q = 1 − p. Define χq(Xi) = 1 if Xi ≤ q, zero otherwise. The random variable χq(Xi) is Bernoulli, taking value 1 with probability q and 0 with probability p. The reason for introducing Xi into the construction instead of just χq(Xi) will become clear later (see the proof of Lemma A.3.1), when the probability q for being unvaccinated gets scaled, and comparisons between the stochastic processes will need to be drawn.

For persons Pi and Pj and calendar time t, we define the pairwise dosage of infection that Pj received from Pi up to time t as


Let Qij be the dosage of infection that Pj needs to receive from Pi before the first infectious contact from Pi. Note that infectious contact doesn’t mean that Pj gets infected as he might have already been infected by someone else. Also for E = 1, we’ll think of Qij as the dosage that needs to be received conditional on the fact that Pj is unvaccinated. We see, using the same reasoning as in section A.1.1, that Qij are independent exponential variables Exp(1). We now have a construction of the spread of infection analogous to the one in section A.1.1. We start with the initially infected and, as time moves forward, we add individuals to the list of the infected if one of their pairwise thresholds to one of the previously infected is reached by that person’s infectiousness trajectory. Note that in this construction we recover more information, namely the information on who infected whom.

It is also useful to state measure-theoretically that the whole stochastic process is represented by the probability space


Mi is the space of trajectories of the stochastic process of Pi’s individual infectiousness profile; Rij has a (probability) measure associated with the exponential variable Exp(1) (pairwise threshold); and [0, 1]i has the standard probability measure (which determines whether Pi was vaccinated or not). The interval [0, 1]i can also be replaced by a Bernoulli variable Bi(p).

Any point in S represents a particular realization of the dynamics of infection on our network. Thus, for instance the total number of infected people under a particular dynamic is an integer-valued function of S, and the integral of that function over S gives the expected number of people, who are eventually infected etc.

A.2. Reproductive numbers associated with vaccines of varying efficacy

Suppose that in an emerging epidemic in a community of households, we randomly distribute a vaccine of efficacy E (0 ≤ E ≤ 1); see Section 4.4 for more details. Recall (see Remark 2 in Section 4.4) that the reproductive number for a vaccine of efficacy E is

$$R_E = \frac{1}{1 - E\,p_E},$$

where $p_E$ is the critical proportion of the population that must receive this vaccine to bring the reproductive numbers to 1. Note that the definition makes sense only for E ≥ EC. The main result of this section is the following

Lemma A.2.1

The function RE is a non-increasing function of E as E ranges between EC and 1. In particular, RV L ≥ RV.


This lemma and its proof are a slight generalization of a proposition in Ball et al. (2004), where it is essentially proved that for any vaccine efficacy E, RE ≥ R1 = RV.

Proof of the Lemma

The idea is as follows: suppose we have two vaccines of efficacy E1 and E2 with E1 > E2, and we distribute them at random to proportions p1 and p2 of the population, with p1E1 = p2E2. Then the spread of infection is less severe in the first case than in the second case. This is true for any finite network, as we’ll show below. A simple argument will then establish Lemma A.2.1 for a community of households. First we prove the following elementary

Proposition A.2.1

Consider two numbers 1 > q1 > q2 ≥ 0, and two numbers 0 ≤ c1 < c2 < 1. If (1−q1)(1−c1) = (1−q2)(1−c2), then the random variable X = Exp(1, c1, q1) is stochastically larger than the random variable Y = Exp(1, c2, q2).


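Let $p_i = 1 - q_i$, so that $p_1 < p_2$. The stochastic comparison means that for any number a,

$$P(X \ge a) \ge P(Y \ge a).$$

Now $P(X \ge a) = q_1 e^{-a} + p_1 e^{-c_1 a} = e^{-a}\left(1 - p_1 + p_1 e^{(1-c_1)a}\right)$. Similarly $P(Y \ge a) = e^{-a}\left(1 - p_2 + p_2 e^{(1-c_2)a}\right)$. Let $C = p_1(1 - c_1) = p_2(1 - c_2)$, so that $(1 - c_i)a = Ca/p_i$. It is elementary to see that $f(x) = 1 - x + x e^{Ca/x}$ is a decreasing function of x for a ≥ 0; since $p_1 < p_2$, the comparison follows. Q.E.D.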
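The comparison in Proposition A.2.1 can be checked numerically. The following is a minimal sketch, assuming (as the survival functions in the proof indicate) that Exp(1, c, q) denotes a mixture that is exponential with rate 1 with probability q and with rate c with probability 1 − q. The function names and the parameter values are illustrative choices, not part of the original argument.

```python
import math

def survival(a, c, q):
    """P(T >= a) for T = Exp(1, c, q): rate 1 w.p. q, rate c w.p. 1 - q."""
    if a <= 0:
        return 1.0
    return q * math.exp(-a) + (1.0 - q) * math.exp(-c * a)

def dominates(c1, q1, c2, q2, grid):
    """True if P(X >= a) >= P(Y >= a) at every grid point (up to rounding)."""
    return all(survival(a, c1, q1) >= survival(a, c2, q2) - 1e-12 for a in grid)

# Hypothetical parameters satisfying (1 - q1)(1 - c1) = (1 - q2)(1 - c2) = 0.36,
# with q1 > q2 and c1 < c2 as the proposition requires.
c1, q1 = 0.1, 0.6
c2, q2 = 0.4, 0.4
grid = [0.05 * k for k in range(401)]  # a ranging over [0, 20]

print(dominates(c1, q1, c2, q2, grid))
```

A grid check of this kind only illustrates the inequality at finitely many points; the monotonicity of f(x) in the proof is what establishes it for all a.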

The next proposition is well-known (see Andersson and Britton (2000), p. 20)

Proposition A.2.2

(Coupling) Suppose a random variable X is stochastically larger than a random variable Y. Then there is a probability space J and realizations X̂: J → R ∪ {∞} and Ŷ: J → R ∪ {∞} such that X̂ has the same distribution as X, Ŷ has the same distribution as Y, and X̂(z) ≥ Ŷ(z) for every point z ∈ J.

From this we derive the following proposition, which generalizes the result in Ball et al. (2004):

Proposition A.2.3

Consider a finite network with the initially infected Pi1,…,Pik. Consider two scenarios: in the first, each of the susceptibles receives a leaky vaccine of efficacy E1 with probability p1; in the second, each receives a leaky vaccine of efficacy E2 with probability p2. Suppose E1 > E2 and E1p1 = E2p2. Then one can couple the two processes for the spread of infection so that for any realization and for each point in time, the set of people infected by then in the first scenario is contained in the corresponding set for the second scenario.


For each person $P_i$, we assign a pair $(Q_i^1, Q_i^2)$ of individual thresholds for the two scenarios (thus $Q_i^1 = \mathrm{Exp}(1, 1-E_1, q_1)$ and $Q_i^2 = \mathrm{Exp}(1, 1-E_2, q_2)$), coupled according to Propositions A.2.1 and A.2.2, so that $Q_i^1 \ge Q_i^2$. We construct the spread of infection in time as described in section A.1.1. We claim that for a particular realization of the individuals' infectiousness profiles and thresholds and for each time t, the set of individuals infected by time t in the first scenario is contained in the corresponding set in the second scenario. If not, there is a first time $t_0$ and a person $P_0$ infected at that time in the first scenario, but not yet infected in the second scenario. That means that the threshold $Q_0^1$ was reached by the infection dosage $D_0^1(t_0)$ from the previously infected in the first scenario. Since $t_0$ is the first such time, any person $P_i$ previously infected (at time $t_i^1 < t_0$) in the first scenario was already infected at time $t_i^2 \le t_i^1$ in the second scenario. Since the infectiousness profiles $I_i(s)$ are the same under the two scenarios, those earlier infected generated a higher (no lower) dosage of infection $D_0^2(t_0)$ for $P_0$ under the second scenario than under the first one. Since $Q_0^2 \le Q_0^1$, person $P_0$ must have been infected by time $t_0$ under the second scenario, a contradiction. Q.E.D.
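The coupling argument can be illustrated with a small simulation. The sketch below rests on several assumptions that are not in the original text: the thresholds are coupled by feeding one shared uniform variable into the inverse survival function of each mixture (one standard way to realize Proposition A.2.2), the network is homogeneous with a single hypothetical transmission parameter `mu`, cumulative infectiousness is drawn from an exponential distribution, and only the final infected sets are compared rather than the sets at each time t.

```python
import math
import random

def survival(a, c, q):
    """P(Q >= a) for Q = Exp(1, c, q): rate 1 w.p. q, rate c w.p. 1 - q."""
    return q * math.exp(-a) + (1.0 - q) * math.exp(-c * a)

def quantile(u, c, q):
    """Solve survival(a) = u for a >= 0 by bisection (survival decreases in a)."""
    lo, hi = 0.0, 1.0
    while survival(hi, c, q) > u:
        hi *= 2.0
        if hi > 1e9:  # survival never drops to u (possible when c = 0)
            return math.inf
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival(mid, c, q) > u:
            lo = mid
        else:
            hi = mid
    return hi

def final_infected(thresholds, doses, initial):
    """Person i ends up infected once the total dose emitted by the
    infected set reaches thresholds[i]; iterate until nothing changes."""
    infected = set(initial)
    while True:
        total = sum(doses[j] for j in infected)
        new = {i for i, t in enumerate(thresholds) if t <= total} - infected
        if not new:
            return infected
        infected |= new

def coupled_run(n, e1, p1, e2, p2, mu, seed):
    """Run both vaccination scenarios on the same randomness."""
    rng = random.Random(seed)
    u = [1.0 - rng.random() for _ in range(n)]             # shared uniforms in (0, 1]
    doses = [mu * rng.expovariate(1.0) for _ in range(n)]  # same profiles in both scenarios
    q1 = [quantile(x, 1.0 - e1, 1.0 - p1) for x in u]
    q2 = [quantile(x, 1.0 - e2, 1.0 - p2) for x in u]
    return final_infected(q1, doses, [0]), final_infected(q2, doses, [0]), q1, q2

# Hypothetical example: E1 = 0.9, p1 = 0.4 versus E2 = 0.6, p2 = 0.6 (E1*p1 = E2*p2).
set1, set2, _, _ = coupled_run(30, 0.9, 0.4, 0.6, 0.6, 0.15, seed=1)
print(len(set1), len(set2), set1 <= set2)
```

Because each threshold in the first scenario is at least as large as its counterpart in the second, the iteration in `final_infected` can never add a person in the first scenario who is not also added in the second.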

We can now prove Lemma A.2.1, which is simplest to see for a community of homogeneous households. Consider a community of homogeneous households of sizes 1,…,n and suppose that $\pi_i$ is the proportion of people living in households of size i. Then the household reproductive number of the epidemic is

$$R_H = R_G \sum_{i=1}^{n} \pi_i f_i.$$

Here $R_G$ is the expected number of out-of-household infections caused by one infected person, and $f_i$ is the expected number of people who are eventually infected in a (homogeneous) household of size i (given an initial infection). Suppose we now have two vaccination strategies. In the first case, we distribute a vaccine of efficacy $E_1$ at random to a proportion $p_1$ of the population, obtaining a community Com1. In the second case, we distribute a vaccine of efficacy $E_2$ at random to a proportion $p_2$ of the population, obtaining a community Com2. As before, $p_1E_1 = p_2E_2$ and $E_1 > E_2$. For the first community, the reproductive number is

$$R_H[E_1, p_1] = (1 - E_1 p_1)\, R_G \sum_{i=1}^{n} \pi_i f_i[E_1, p_1].$$

Here $f_i[E_1, p_1]$ is the expected number of infected in a household of size i with one initially infected, where each person is given a vaccine of efficacy $E_1$ with probability $p_1$. There is also an analogous expression for $R_H[E_2, p_2]$. From Proposition A.2.3 we conclude that if $E_1 > E_2$ and $p_1E_1 = p_2E_2$, then

$$R_H[E_1, p_1] \le R_H[E_2, p_2].$$


Now pick a proportion $p_{E_1}$ of the population to receive the first vaccine so that the reproductive number is $R_H[E_1, p_{E_1}] = 1$. Consider also a proportion $p_2 = p_{E_1}E_1/E_2$ receiving the vaccine of efficacy $E_2$. We have $R_H[E_2, p_{E_1}E_1/E_2] \ge 1$. Thus $p_{E_2} \ge p_{E_1}E_1/E_2$, which is equivalent to Lemma A.2.1.
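The equivalence claimed in the last step can be spelled out. The derivation below is a sketch that assumes the Section 4.4 definition has the form $R_E = 1/(1 - E\,p_E)$, with $p_E$ the critical coverage for a vaccine of efficacy E; this form is inferred from the surrounding argument rather than quoted. It also uses the fact that both denominators are positive.

```latex
\begin{align*}
p_{E_2} \ge \frac{p_{E_1} E_1}{E_2}
  &\iff E_2\, p_{E_2} \ge E_1\, p_{E_1} \\
  &\iff 1 - E_2\, p_{E_2} \le 1 - E_1\, p_{E_1} \\
  &\iff R_{E_2} = \frac{1}{1 - E_2\, p_{E_2}} \ge \frac{1}{1 - E_1\, p_{E_1}} = R_{E_1}.
\end{align*}
```

The first inequality itself holds because raising coverage with the second vaccine can only lower $R_H[E_2, \cdot]$, so the critical coverage $p_{E_2}$ cannot be smaller than a coverage at which the reproductive number is still at least 1.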

A.3. Generations of infections and comparison results

In this section we’ll establish several auxiliary results towards proving that RV ≥ RHI. Throughout this section we’ll talk about the distribution of a perfect vaccine (E = 1), given with probability p = 1 − q to each individual on an arbitrary finite network as in section A.1. We’ll call such a network Netq.

Let S be the space of all realizations of infectiousness profiles, pairwise thresholds and individual vaccination statuses as in equation A.1.5. Pick a particular realization z in S. We’ll now define the notion of generations of infections for z. The 0th generation $G_0(z)$ consists of the initially infected. The first generation $G_1(z)$ consists of those who were infected by people in $G_0(z)$. Inductively, $G_{k+1}(z)$ consists of all persons infected by people in $G_k(z)$. Thus the sets $G_0(z), G_1(z), \dots$ are disjoint and their sizes $|G_i(z)|$ are random variables $G_i^q$. Our first result is a comparison lemma for the expectations of the number of infected persons in the kth generation when we scale the probability of not being immunized:

Lemma A.3.1

Consider a network Netq as above and let C > 1 be a real number such that Cq ≤ 1. Consider also the network NetCq. Then for each k ≥ 1, $E(G_k^{Cq}) \le C^k E(G_k^q)$.


For each person $P_j$ who is infected in generation k, there is a transmission path

$$P_{j,0} \to P_{j,1} \to \cdots \to P_{j,k}$$

where $P_{j,0}$ is initially infected, $P_{j,k} = P_j$, and person $P_{j,i}$ infected $P_{j,i+1}$. This calls for a definition:

Definition A.3.1

Let S be the probability space as in equation A.1.5. For a given path κ of length k, let $L_q(\kappa) \subset S$ be the set of all points z ∈ S such that κ is a transmission path for z in Netq.

For any z ∈ S, the number of persons in the kth generation $G_k(z)$ equals the number of transmission paths of length k which were realized under z. Thus

$$E(G_k^q) = \sum_{\kappa} P(L_q(\kappa)).$$


Here the sum is taken over all possible paths of length k starting from one of the initially infected persons. Thus Lemma A.3.1 will follow from

Proposition A.3.1

For any path κ of length k, $P(L_{Cq}(\kappa)) \le C^k P(L_q(\kappa))$.


Let S be the space as in equation A.1.5. The same space works for both networks Netq and NetCq; only the spreads under a particular scenario z ∈ S differ, because of the different definitions of the pairwise dosages in equation A.1.4. We have the subsets $L_q(\kappa), L_{Cq}(\kappa) \subset S$. We’ll establish the proposition by constructing a map W: S → S with the following properties:

  1. $W(L_{Cq}(\kappa)) \subset L_q(\kappa)$.
  2. For any subset B of S, $P(W(B)) = C^{-k}P(B)$.

Together these give $P(L_{Cq}(\kappa)) = C^k P(W(L_{Cq}(\kappa))) \le C^k P(L_q(\kappa))$.

To construct W, recall that $S = \prod M_i \times \prod R_{ij} \times \prod [0, 1]_i$. We define W: S → S componentwise. On the components $M_i$ and $R_{ij}$, we define W to be the identity. On the components $[0, 1]_i$, we have two cases:

Case 1

If Pi is a vertex in the path κ except for the initial one, define W(x) = x/C on $[0, 1]_i$.

Case 2

If not, W is the identity map on $[0, 1]_i$.

Clearly property 2 holds, since W rescales exactly k of the coordinates $[0, 1]_i$ by the factor 1/C. Also for any z ∈ S, W(z) has the same infectiousness profiles and pairwise thresholds as z, while $\chi_{Cq}(X_i(z)) \ge \chi_q(X_i(W(z)))$, with equality when Pi is a (non-initial) vertex in κ. Arguing as in the proof of Proposition A.2.3, we conclude that each person is infected later (no earlier) under the W(z) scenario in Netq than under the z scenario in NetCq. Moreover, for $z \in L_{Cq}(\kappa)$, the times till the first infectious contact for edges in κ are the same for W(z) as for z. Thus κ is indeed a transmission path for W(z), and so $W(L_{Cq}(\kappa)) \subset L_q(\kappa)$. Q.E.D.
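Lemma A.3.1 can be checked exactly in a simple special case. The sketch below assumes a Reed-Frost-type household, which is a simplification not used in the original proof: infections proceed in discrete generations, each infective infects each unvaccinated susceptible independently with a hypothetical probability `a`, and each non-index member is left unvaccinated independently with probability `q`. Under these assumptions the expected generation sizes follow from a chain-binomial recursion.

```python
from math import comb

def generation_means(n, a, q):
    """Expected generation sizes E[G_k^q], k = 0..n-1, in a household of
    size n with one unvaccinated index case (Reed-Frost assumption)."""
    means = [0.0] * n
    means[0] = 1.0
    # State (s, i): s unvaccinated susceptibles left, i current infectives.
    dist = {(s, 1): comb(n - 1, s) * q**s * (1 - q)**(n - 1 - s) for s in range(n)}
    for k in range(1, n):
        nxt = {}
        for (s, i), pr in dist.items():
            hit = 1.0 - (1.0 - a)**i  # chance a susceptible is infected this step
            for j in range(s + 1):
                pj = comb(s, j) * hit**j * (1.0 - hit)**(s - j)
                if pj == 0.0:
                    continue
                nxt[(s - j, j)] = nxt.get((s - j, j), 0.0) + pr * pj
                means[k] += pr * pj * j
        dist = nxt
    return means

def lemma_holds(n, a, q, c):
    """Check E[G_k^{cq}] <= c^k E[G_k^q] for every generation k."""
    low, high = generation_means(n, a, q), generation_means(n, a, c * q)
    return all(high[k] <= c**k * low[k] + 1e-12 for k in range(n))

# Hypothetical values: household of size 4, a = 0.4, q = 0.5 scaled by C = 1.6.
print(lemma_holds(4, 0.4, 0.5, 1.6))
```

The first generation always gives equality, since its expected size is (n − 1)qa; the inequality becomes strict in later generations of larger households, where a person can have more than one potential infector.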

For the rest of this section we specialize to the case when our network is homogeneous, namely all $\mu_{ij}$ are equal and all the individual infectiousness profiles are trajectories of the same stochastic process M. To fix notation, let $H_k(q) = E(G_k^q)$ be the expected number of people who are infected in the kth generation. Also let $f(q) = \sum_{k \ge 0} H_k(q)$ be the expectation of the total number of people infected. We have the following basic

Proposition A.3.2

In a homogeneous network, $\sum_{l \ge 0} H_{k+l}(q) \le H_k(q)\, f(q)$.


For each individual $P_i$, let $U_i$ be the event that $P_i$ is infected in the kth generation. Thus $H_k(q) = \sum_i P(U_i)$. Since each person infected in generation k + l has a unique ancestor in generation k, we get that

$$\sum_{l \ge 0} H_{k+l}(q) = \sum_i P(U_i)\, E(\text{infected via } P_i \mid U_i).$$

Here a person being “infected via $P_i$” means that $P_i$ is in the infection path leading to that person. As we know, $\sum_i P(U_i) = H_k(q)$. Also

$$E(\text{infected via } P_i \mid U_i) \le f(q).$$

This is because under any spread of infection in time until $P_i$ gets infected in the kth generation, the remaining uninfected persons have conditional probabilities of being vaccinated that are higher than p, and so the expected number of those to be infected via $P_i$ is no greater than the expected number of people to be infected by one vertex in a completely susceptible network. Q.E.D.

A.4. RV vs. RHI

A.4.1. Single household size

Suppose we have an epidemic in a community of households, with all households having a single size n. Let f be the expected number of infected people in an infected household. Thus $R_{HI} = R_{hI} = R_G + \frac{f-1}{f}$. We have

Theorem A.4.1

$R_V \ge R_{HI}$.


Let p = 1 − q be the critical proportion of people that needs to be vaccinated to bring the reproductive numbers to 1. By definition, $R_V = 1/q$. For each t between q and 1, we consider a community of households with a proportion 1 − t vaccinated with a perfect vaccine. We have $R_{HI}(t) = R_G(t) + \frac{f(t)-1}{f(t)}$, with $R_{HI}(q) = 1$. Note once again that the proportion of vaccinated individuals is 1 − t; as in section A.3, it will be convenient to express everything as a function of the proportion of unvaccinated individuals.

The theorem will hold via the following

Lemma A.4.1

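Let C > 1. Then

$$1 - \frac{1}{f(Ct)} \le C\left(1 - \frac{1}{f(t)}\right).$$


Consider $g(x) = 1 - \frac{1}{f(t e^{x})}$ for x ≥ 0. We’ll show that g′(x) ≤ g(x). It will follow from Gronwall’s inequality that $g(x) \le e^{x} g(0)$, which is the statement of the lemma with $C = e^{x}$. Now $g'(x) = \frac{t e^{x} f'(e^{x} t)}{f^{2}(e^{x} t)}$. We’ll now show that for any number y,

$$y f'(y) \le f(y)\,(f(y) - 1),$$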


which is equivalent to g′(x) ≤ g(x).


The above inequality becomes an equality for a branching process, in which case $f(y) = \frac{1}{1-y}$ for y < 1 (here y is the expected number of people that one infected individual will infect) and f(y) is infinite if y > 1.
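The equality case can be verified directly. The lines below are a short worked check of the inequality $y f'(y) \le f(y)(f(y) - 1)$, to which $g'(x) \le g(x)$ reduces, using only the branching-process expression $f(y) = 1/(1-y)$ for y < 1.

```latex
\begin{align*}
f(y) &= \frac{1}{1-y}, \qquad f'(y) = \frac{1}{(1-y)^2}, \\
y f'(y) &= \frac{y}{(1-y)^2}, \\
f(y)\,\bigl(f(y)-1\bigr) &= \frac{1}{1-y}\cdot\frac{y}{1-y} = \frac{y}{(1-y)^2}.
\end{align*}
```

Both sides agree, and correspondingly $g(x) = 1 - 1/f(t e^{x}) = t e^{x}$ satisfies $g'(x) = g(x)$ exactly.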

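Let $H_k(t)$ be the expected number of infected people in the kth generation as before. We have $f(t) = \sum_{k=0}^{n} H_k(t)$ with $H_0(t) = 1$. Lemma A.3.1 tells us that $H_k(e^{s} y) \le e^{sk} H_k(y)$ for s ≥ 0, with equality for s = 0. Differentiating at s = 0, we get that $y H_k'(y) \le k H_k(y)$. Thus

$$y f'(y) = \sum_{k \ge 1} y H_k'(y) \le \sum_{k \ge 1} k H_k(y).$$

Now we re-write the latest sum as

$$\sum_{k \ge 1} k H_k(y) = \sum_{k \ge 1} \sum_{l \ge 0} H_{k+l}(y).$$

By Proposition A.3.2, $\sum_{k \ge 1} \sum_{l \ge 0} H_{k+l}(y) \le \sum_{k \ge 1} f(y) H_k(y) = f(y)(f(y) - 1)$. Q.E.D.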

A.4.2. Varying household sizes

Suppose we have households of sizes 1,…,n, and the proportion of people living in households of size i is $\pi_i$, with $\sum_i \pi_i = 1$. Let $f_i = f_i(1)$ be the expected number of infected people given one initial infection in a household of size i. Recall the definition of the reproductive number $R_{HI}$ from section 4.1.1: the proportion of people who reside in households of size i is $\pi_i$, and each such infected person infects on average $R_G + \frac{f_i - 1}{f_i}$ persons, thus

$$R_{HI} = R_G + \sum_{i=1}^{n} \pi_i \frac{f_i - 1}{f_i}.$$

At the same time, if we forget about the stratification by household size, we note that the proportion $\pi_i$ of people in households of size i does not equal the proportion of infected people in households of size i among all infected people. To prove $R_V \ge R_{HI}$, it is convenient to define an alternative way of averaging the number of infections per infected individual, weighting households of size i by their share $\pi_i f_i / \sum_j \pi_j f_j$ of all infected people:

$$R^{*}_{HI} = R_G + \frac{\sum_i \pi_i f_i - 1}{\sum_i \pi_i f_i}.$$

The latter reproductive number serves as an epidemic threshold. However, one can construct examples of household structures with $R^{*}_{HI} > R_{VL} \ge R_V$. On the other hand we always have

Theorem A.4.2

For any household size distribution, RV ≥ RHI.


First we need to establish the following

Proposition A.4.1

For any non-negative numbers $\pi_1, \dots, \pi_n$ with $\sum_i \pi_i = 1$, and for any positive numbers $f_i$,

$$\sum_{i=1}^{n} \pi_i \frac{f_i - 1}{f_i} \le \frac{\sum_i \pi_i f_i - 1}{\sum_i \pi_i f_i}.$$


Define a random variable X which takes values $f_i$ with probabilities $\pi_i$. Consider the function $g(x) = \frac{x-1}{x}$ for x > 0. We have g″(x) < 0, thus by Jensen’s inequality, E(g(X)) ≤ g(E(X)). But this is the statement of the proposition. Q.E.D.

The above proposition tells us that $R_{HI} \le R^{*}_{HI}$, where $R^{*}_{HI}$ denotes the alternative average defined above. Now we can argue as in the case of single-size households. First we vaccinate the proportion p = 1 − q of the population to have $R_H(q) = 1 = R^{*}_{HI}(q)$. By definition, $R_V = R_V(1) = 1/q$. Since $R^{*}_{HI}(q) = 1$, by Proposition A.4.1 $R_{HI}(q) \le 1$. When we scale the proportion of unvaccinated people from q up to 1, we conclude by Lemma A.4.1 that $R_{HI} = R_{HI}(1) \le \frac{1}{q} R_{HI}(q) \le \frac{1}{q} = R_V$. Q.E.D.
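The two ways of averaging can be compared numerically. The sketch below assumes the forms suggested by the surrounding text: the individual reproductive number averages $(f_i - 1)/f_i$ over people with weights $\pi_i$, while the alternative average uses the mean household outbreak size $\sum_i \pi_i f_i$. The function names and the example values of `pi`, `f` and `r_g` are hypothetical.

```python
def r_hi(r_g, pi, f):
    """Individual reproductive number: weights pi_i over household sizes."""
    return r_g + sum(p * (x - 1.0) / x for p, x in zip(pi, f))

def r_hi_alt(r_g, pi, f):
    """Alternative average based on the mean household outbreak size."""
    f_bar = sum(p * x for p, x in zip(pi, f))
    return r_g + (f_bar - 1.0) / f_bar

# Hypothetical household structure: sizes 1..3 with expected outbreak sizes f.
pi = [0.2, 0.3, 0.5]
f = [1.0, 1.6, 2.4]
r_g = 0.5

print(r_hi(r_g, pi, f), r_hi_alt(r_g, pi, f))

# Threshold property: the alternative average equals 1 exactly when
# R_G times the mean outbreak size (the household reproductive number) is 1.
f_bar = sum(p * x for p, x in zip(pi, f))
print(r_hi_alt(1.0 / f_bar, pi, f))
```

The first comparison is Jensen's inequality for the concave function (x − 1)/x; the second line illustrates why the alternative average, unlike the first, acts as an epidemic threshold.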

Appendix B. Inequalities involving Rr in some special cases

B.1. Proof that RVL ≥ Rr in certain cases

Here we demonstrate the inequality at two extremes. First consider the extreme where within-household transmission is very low, the limit as μh →0 (see equation 3.0.9). The household model converges to the mass action model, and all reproductive numbers converge; thus at this extreme equality holds.

Now, consider the extreme where within-household transmission is very high, μh → ∞. Then everyone in the household becomes infected, and a leaky vaccine will change that fact only negligibly. Thus, the only effect of a leaky vaccine is to reduce between-household transmission. Hence at this extreme, $R_{VL} \simeq R_H \ge R_r$.

Another comparison result between RV L and Rr is the following

Proposition B.1.1

For households of size h = 2 with exponentially distributed infectiousness periods and no latent periods, RV L = Rr

We note that in this case one can explicitly compute both reproductive numbers to see that they are equal - we omit the derivations.

B.2. Relationship between Rr and RV in the limiting cases of within-household transmission parameters

Again, we note that equality holds at the extreme of no within-household transmission, μh →0 - in this case dynamics converge to mass action model.

Now we consider the limiting case where μh → ∞. Here, all persons within the household will become infected (f → h). Within this limiting case, take a further limit where there is no incubation period. Thus all individuals in an infected household immediately become infectious, and therefore the infectious contact interval distributions are equal:


- see equations 4.5.1 and 4.5.4. Thus in this limit, $R_r \approx R_H \ge R_V$.

B.3. Relationship between Rr and RV for a very long latent period

In this appendix we prove the following

Lemma B.3.1

If we add an “infinite” latent period (a limiting case of adding very long latent periods) to individual infectiousness profiles, then RV ≥ Rr, with equality if all households have sizes up to 3.


Suppose we add a large latent period to individual infectiousness profiles (without changing anything else). We can scale down time and scale up intensities of infectiousness by the same factor, so that the reproductive numbers don’t change, the latent period is 1, and the infectiousness period is very short. In this limit the individual infectious contact interval distribution wG(t) is zero for t ≠ 1, and has “weight” 1 at t = 1 - thus in the limit its moment-generating function is


To understand the household infectious contact interval distribution, we observe that in the limit, the kth generation of infections in the household consists of the people infected at time k (after the index case). Let us denote the expected number of people infected in the kth generation in a household of size h by $G_k(h)$ (with $G_0(h) = 1$). Consider the random variable $A = \int_0^{\infty} I(s)\,ds$, the cumulative individual infectiousness. In this notation, the function $\beta_h(t)$ (see equation 3.0.10) is zero for t ≠ 1, …, h, and $\beta_h(k)$ has weight $E(A)\,G_k(h)$ for k = 1, …, h. Also $E(A) = C_{IG}$ and $R_G = C_{IG} \cdot \mu_G$ (see equation 3.0.8). Thus the equation 4.5.5 for the growth rate r of the epidemic becomes


with $R_r = e^{r}$. To understand RV, suppose we need to immunize a proportion p of the population with the perfect vaccine to bring the reproductive numbers to 1. Let q = 1 − p, so that $R_V = 1/q$.


Let us denote by $G_k^q(h)$ the expected number of people infected in the kth generation in an infected household of size h in which each person is vaccinated with a perfect vaccine with probability p. We have


By Lemma A.3.1 (applied with C = 1/q), $G_k^q(h) \ge q^k G_k^1(h) = q^k G_k(h)$. Using this, we rewrite the latest equation as


Comparing equations B.3.1 and B.3.2, we see that $q \le e^{-r}$. Thus

$$R_V = \frac{1}{q} \ge e^{r} = R_r.$$


Finally we note that if all households are of size up to 3, it is easy to see that $G_k^q(h) = q^k G_k^1(h)$. Thus the inequality in equation B.3.2 becomes an equality. Comparing equations B.3.1 and B.3.2, we conclude that $e^{-r} = q$, so Rr = RV in the limiting case of a latent period going to ∞.
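A small worked example may help to see where the equality for households of size up to 3 comes from and why it can fail for larger households. It assumes a Reed-Frost-type simplification that is not part of the original argument: in the infinite-latent-period limit, each infected household member infects each remaining unvaccinated susceptible member independently with the same probability a (the symbol a is introduced only for this illustration).

```latex
\begin{align*}
% h = 3: the index case and two other members
G_1(3) &= 2a, & G_2(3) &= 2a^2(1-a), \\
G_1^q(3) &= 2qa = q\,G_1(3), & G_2^q(3) &= 2q^2a^2(1-a) = q^2\,G_2(3). \\[6pt]
% h = 4: a member is in generation 2 if unvaccinated, missed by the index case,
% and infected by at least one of the (up to two) generation-1 members
G_2(4) &= 3(1-a)\bigl[1-(1-a^2)^2\bigr] = 3a^2(1-a)(2-a^2), \\
G_2^q(4) &= 3q(1-a)\bigl[1-(1-qa^2)^2\bigr] = 3q^2a^2(1-a)(2-qa^2) \;\ge\; q^2\,G_2(4).
\end{align*}
```

In a household of size 3 every transmission path to a generation-2 case is unique, so vaccination removes it with probability exactly $1 - q^2$. In a household of size 4 a generation-2 case can have two potential infectors, and the inequality is strict whenever 0 < q < 1 and 0 < a < 1.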




  • Andersson H, Britton T. Stochastic epidemic models and their statistical analysis. Springer Lecture Notes in Statistics. 2000;151.
  • Ball F, Becker N. Control of transmission with two types of infection. Math Biosci. 2006;200:170–187.
  • Ball F, Britton T, Lyne O. Stochastic multitype epidemic in a community of households: estimation and form of optimal vaccination schemes. Math Biosci. 2004;191:19–40.
  • Ball F, Mollison D, Scalia-Tomba G. Epidemics with two levels of mixing. Ann Applied Prob. 1997;7(1):46–89.
  • Ball F, Neal P. A general model for stochastic SIR epidemics with two levels of mixing. Math Biosci. 2002;180:73–102.
  • Becker NG, Dietz K. The effect of household distribution on transmission and control of highly infectious diseases. Math Biosci. 1995;127(2):207–219.
  • Britton T, Becker NG. Estimating the immunity coverage required to prevent epidemics in a community of households. Biostatistics. 2000;1(4):389–402.
  • Chowell G, Hengartner N, Castillo-Chavez C, Fenimore P, Hyman J. The basic reproductive number of Ebola and the effects of public health measures: the cases of Congo and Uganda. Journal of Theoretical Biology. 2004;229(1):119–126.
  • Diekmann O, Heesterbeek JAP. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. John Wiley & Sons; New York: 2000.
  • Ferguson NM, Cummings DA, Cauchemez S, Fraser C, Riley S, Meeyai A, Iamsirithaworn S, Burke DS. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature. 2005;437(7056):209–14.
  • Ferguson N, Dodd P. Approximate disease dynamics in household-structured populations. Journal of The Royal Society Interface. 2007;4(17):1103–1106.
  • Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS ONE. 2007;2(1):e758.
  • Halloran ME, Struchiner CJ, Longini IM Jr. Study designs for evaluating different efficacy and effectiveness aspects of vaccines. American Journal of Epidemiology. 1997;146(10):789–803.
  • Halloran ME, Ferguson NM, Eubank S, Longini IM Jr, Cummings DAT, Lewis B, Xu S, Fraser C, Vullikanti A, Germann TC, Wagener D, Beckman R, Kadau K, Barrett C, Macken CA, Burke DS, Cooley P. Modeling targeted layered containment of an influenza pandemic in the United States. PNAS. 2008;105(12):4639–4644.
  • House T, Keeling M. Deterministic epidemic models with explicit household structure. Math Biosci. 2008;213(1):29–39.
  • Kenah E, Lipsitch M, Robins JM. Generation interval contraction and epidemic data analysis. Math Biosci. 2008;213(1):71–79.
  • Lipsitch M, Cohen T, Cooper B, Robins JM, Ma S, James L, Gopalakrishna G, Chew SK, Tan CC, Samore MH, Fisman D, Murray M. Transmission dynamics and control of severe acute respiratory syndrome. Science. 2003;300(5627):1966–70.
  • Mills CE, Robins JM, Lipsitch M. Transmissibility of 1918 pandemic influenza. Nature. 2004;432(7019):904–6.
  • Riley S, Wu JT, Leung G. Optimizing the dose of pre-pandemic influenza vaccines to reduce the infection attack rate. PLoS Medicine. 2007;4(6):1032–40.
  • Sellke T. On the asymptotic distribution of the size of a stochastic epidemic. J Appl Prob. 1983;20:390–394.
  • Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc Biol Sci. 2007;274(1609):599–604.
  • Wallinga J, Teunis P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am J Epidemiol. 2004;160(6):509–16.