To begin, we consider a setting similar to that in previous literature^{7,8} in which there are *N* households indexed by *i* = 1, …, *N* such that each household consists of two persons indexed by *j* = 1, 2. We let *A*_{ij} denote the vaccine status for individual *j* in household *i*, where *A*_{ij} = 1 denotes vaccinated and *A*_{ij} = 0 denotes not vaccinated. We let *Y*_{ij} denote the infection status of individual *j* in household *i* after some fixed follow-up period, with *Y*_{ij} = 1 denoting infection and *Y*_{ij} = 0 denoting no infection.

The counterfactual framework defines causal effects in terms of contrasts of hypothetical scenarios or interventions, some of which may be contrary to fact. For example, *Y*_{ij}(*a*_{i1}, *a*_{i2}) denotes the potential infection outcome for person *j* in household *i* if the two persons in that household had, possibly contrary to fact, vaccination status (*a*_{i1}, *a*_{i2}). Thus, *Y*_{i2}(1, 0) denotes the potential infection outcome of person 2 if person 1 receives the vaccine and person 2 does not, and *Y*_{i1}(0, 0) denotes the potential infection outcome of person 1 if neither person 1 nor person 2 receives the vaccine. For a particular person, say person 1, we have four potential outcomes in this setting: *Y*_{i1}(0, 0), *Y*_{i1}(0, 1), *Y*_{i1}(1, 0), and *Y*_{i1}(1, 1). We could then consider various contrasts of these potential outcomes as causal effects. The potential outcomes are related to the observed data by a consistency assumption: when the actual vaccine status is (*A*_{i1}, *A*_{i2}) = (*a*_{i1}, *a*_{i2}), the observed outcome is *Y*_{ij} = *Y*_{ij}(*a*_{i1}, *a*_{i2}). Thus, for a particular household and a particular person we observe only one of these four potential outcomes: the one corresponding to the vaccine status that actually occurred. We therefore cannot estimate the causal effects for a particular person in a particular household. However, if vaccine is randomly assigned, we can hope to estimate a population average effect. In this paper, we define our average causal effects using this “potential outcomes” or counterfactual notation, as it allows for precise definitions of contagion and infectiousness effects.
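As a purely illustrative sketch of the consistency assumption, the four potential outcomes of a single household can be stored in a lookup table keyed by the vaccination status (*a*_{i1}, *a*_{i2}). The names `household` and `observed_outcomes` and all outcome values are hypothetical, chosen only for illustration; in real data only one row of such a table is ever observed.

```python
from typing import Dict, Tuple

# Hypothetical potential outcomes for ONE household (made-up values).
# Key:   (a_i1, a_i2)  vaccination status of persons 1 and 2
# Value: (Y_i1, Y_i2)  infection status of persons 1 and 2 under that status
PotentialOutcomes = Dict[Tuple[int, int], Tuple[int, int]]

household: PotentialOutcomes = {
    (0, 0): (1, 1),
    (0, 1): (1, 0),
    (1, 0): (0, 0),
    (1, 1): (0, 0),
}


def observed_outcomes(po: PotentialOutcomes, a1: int, a2: int) -> Tuple[int, int]:
    """Consistency: the observed outcomes are the potential outcomes
    under the vaccination status that actually occurred."""
    return po[(a1, a2)]


# If person 1 was actually vaccinated and person 2 was not, we observe only
# the (1, 0) row; the other three rows remain counterfactual.
y1_obs, y2_obs = observed_outcomes(household, a1=1, a2=0)
```

Because only one of the four rows is observable per household, individual-level contrasts such as *Y*_{i2}(1, 0) − *Y*_{i2}(0, 0) cannot be computed from data, which is why the focus shifts to population averages.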

Under the notation above, the potential outcome for individual 1, *Y*_{i1}(*a*_{i1}, *a*_{i2}), depends on the vaccine status of both person 1 and person 2, and likewise the potential outcome for individual 2, *Y*_{i2}(*a*_{i1}, *a*_{i2}), depends on the vaccine status of both persons. Thus, the exposure status of one person could affect the outcome of another. In the statistics literature, this is sometimes referred to as interference or a spillover effect.^{3,4,16–19} Most literature in causal inference assumes there is no interference,^{16,20} so that one person’s outcome does not depend on the exposure of others. In the current context this would imply that *Y*_{i1}(*a*_{i1}, *a*_{i2}) = *Y*_{i1}(*a*_{i1}) and *Y*_{i2}(*a*_{i1}, *a*_{i2}) = *Y*_{i2}(*a*_{i2}), so that each person’s outcome depends only on his or her own exposure status. This assumption of no interference is implausible in the infectious disease context,^{2} and so we do not make it here. We do, however, assume that the exposure status of persons in one household in the study does not affect the outcomes of those in other study households, sometimes called an assumption of partial interference.^{3,17} This might be plausible if the various households are sufficiently geographically separated or do not interact with one another. In a vaccine trial in which a few households are randomly sampled from a large city and the household is treated as the cluster, this assumption is perhaps not unreasonable, although it is unlikely to hold exactly. Importantly, this assumption pertains to household units in the study, not to all households that might have been in the study.

Throughout this paper we assume a simple randomized experiment in which one of the two persons is randomized to receive a vaccine or control and the second person is always unvaccinated. This could correspond to the hypothetical pneumococcal vaccine trial described above where we are interested in the effect on the mother of vaccinating the one-year-old. In the Discussion we consider relaxing these assumptions. We will let *j* = 1 denote the individual who may or may not be vaccinated and *j* = 2 the individual who is always unvaccinated. The methodology below and the definitions used will still be applicable even if some persons in the study are immune to infection.

Using the counterfactual notation, the average indirect effect is

*E*[*Y*_{i2}(1, 0)] − *E*[*Y*_{i2}(0, 0)],

i.e. the difference in expected infection status for person 2 if person 1 is vaccinated versus unvaccinated.^{8}

If vaccine status is randomized, this can be estimated by^{7,8}

*E*[*Y*_{i2} | *A*_{i1} = 1] − *E*[*Y*_{i2} | *A*_{i1} = 0].

Halloran and Hudgens^{8} also refer to this as the “ITT (intention to treat) indirect effect.”
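A minimal sketch of how this indirect effect might be estimated in a randomized trial: compare the infection frequency of person 2 between households where person 1 was vaccinated and households where person 1 was not. The function names, the infection probabilities (0.10 and 0.20), and the simulated data are all assumptions made for illustration; they are not taken from any trial.

```python
import random
from statistics import mean


def estimate_indirect_effect(a1, y2):
    """Difference in person 2's infection frequency between households where
    person 1 was vaccinated (a1 == 1) and unvaccinated (a1 == 0).
    Raises statistics.StatisticsError if either arm is empty."""
    vaccinated_arm = [y for a, y in zip(a1, y2) if a == 1]
    control_arm = [y for a, y in zip(a1, y2) if a == 0]
    return mean(vaccinated_arm) - mean(control_arm)


def simulate_trial(n, p_y2_vacc=0.10, p_y2_unvacc=0.20, seed=0):
    """Simulate n households with person 1 randomized 1:1 to vaccine or control.
    The infection probabilities for person 2 are hypothetical."""
    rng = random.Random(seed)
    a1 = [rng.randint(0, 1) for _ in range(n)]
    y2 = [int(rng.random() < (p_y2_vacc if a == 1 else p_y2_unvacc)) for a in a1]
    return a1, y2


a1, y2 = simulate_trial(100_000)
# Under the assumed probabilities the true value is 0.10 - 0.20 = -0.10;
# the estimate should be close to it for large n.
estimate = estimate_indirect_effect(a1, y2)
```

Randomization is what justifies reading this simple difference in frequencies as an average causal effect; in an observational setting the same computation would generally require adjustment for confounding.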

To proceed with decomposing this indirect effect into contagion and infectiousness effects, we need to consider counterfactuals of a different form. From this point onwards, we assume that only person 1, not person 2, can be infected from outside the household; person 2 can be infected only by person 1. Thus, if *Y*_{i1}(*a*_{i1}, *a*_{i2}) = 0 then *Y*_{i2}(*a*_{i1}, *a*_{i2}) = 0. This could again correspond to a trial of pneumococcal conjugate vaccine for one-year-olds at the day-care center.
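Combined with consistency, this assumption has an observable implication: no household should have person 2 infected while person 1 is uninfected. A small sketch of such a data check follows; the function name `violating_households` and the example vectors are hypothetical.

```python
def violating_households(y1, y2):
    """Return indices of households in which person 2 is infected (y2 == 1)
    although person 1 is not (y1 == 0). Under the assumption that person 2
    can be infected only by person 1, this list should be empty."""
    return [i for i, (p1, p2) in enumerate(zip(y1, y2)) if p1 == 0 and p2 == 1]


# Hypothetical observed infection indicators for four households.
y1 = [1, 0, 0, 1]
y2 = [1, 0, 1, 0]
flagged = violating_households(y1, y2)  # household at index 2 is flagged
```

An empty result does not prove the assumption, since it constrains all potential outcomes and not only the observed ones, but a non-empty result would indicate that the assumption fails in the data at hand.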

Suppose that, in addition to potentially intervening to give person 1 the vaccine, we could also, at least hypothetically, intervene to infect or not infect person 1. Then *Y*_{i2}(*a*_{i1}, *a*_{i2}, *y*_{i1}) would denote the infection status of person 2 if we were to set the vaccine status of person 1 and person 2 to *a*_{i1} and *a*_{i2} and the infection status of person 1 to *y*_{i1}. This in some sense formalizes, using counterfactual notation, ideas that were proposed by Halloran and Struchiner.^{2}

The assumption that individual 2 is always unvaccinated allows a simplified notation. Counterfactuals *Y*_{i1}(*a*_{i1}, *a*_{i2}) and *Y*_{i2}(*a*_{i1}, *a*_{i2}) can be written as *Y*_{i1}(*a*_{i1}) := *Y*_{i1}(*a*_{i1}, 0) and *Y*_{i2}(*a*_{i1}) := *Y*_{i2}(*a*_{i1}, 0). We are still allowing for interference/spillover in that the vaccine status of person 1 can affect the outcome of person 2. This simple setting in which person 2 always remains unvaccinated also allows us to rewrite the counterfactual *Y*_{i2}(*a*_{i1}, *a*_{i2}, *y*_{i1}) as *Y*_{i2}(*a*_{i1}, *y*_{i1}) := *Y*_{i2}(*a*_{i1}, 0, *y*_{i1}). We thus consider counterfactuals of the form *Y*_{i1}(*a*_{i1}), *Y*_{i2}(*a*_{i1}) and *Y*_{i2}(*a*_{i1}, *y*_{i1}). The direct effect of person 1’s vaccine on person 1’s outcome is *E*[*Y*_{i1}(1) − *Y*_{i1}(0)]; the indirect effect of person 1’s vaccine on person 2’s outcome is simply *E*[*Y*_{i2}(1) − *Y*_{i2}(0)]. In the next section we will use these counterfactuals to define contagion and unconditional infectiousness effects.