Using simple differential-equation-based models to gain insights into the phylodynamics of viral infections, we have demonstrated that the pattern of coalescence for an infectious disease is dominated by the transmission rate, while the number of infected individuals is of secondary importance. Although
Holmes et al. (1995) recognized that coalescence in an infectious disease was related to transmission, this was not taken into account in later phylodynamic studies, which referred to the ‘effective number of infections’, i.e. the prevalence. Some studies also noted that the generation time is effectively the time between infections (
Pomeroy et al. 2008;
van Ballegooijen et al. 2009), and not the duration of infectiousness, but did not recognize that this changes throughout an epidemic. Hence, a single transformation of time, which is commonly used to estimate
Ne from temporally sampled sequence data, cannot be used to recover the ‘effective number of infected individuals’. In some cases, such as during exponential growth, there is a linear relationship between the transmission rate and the number of infected individuals, and with an appropriate choice of time scale (dividing time by
βc in the models here) it is possible to estimate the number of infected individuals, but this is not true in general. Some studies (e.g.
Rambaut et al. 2008) have been vague in the interpretation of the coalescence rate, relating it to ‘genetic diversity’. We believe that this is a little too cautious: the rate of coalescence can be related to epidemiological parameters, but only if the underlying transmission dynamics are considered explicitly. For example, in the case of endogenous retroviruses (
Romano et al. 2008), transmission tracks the reproduction of the host, and standard coalescent models developed for human populations can be applied. In the case of viruses where there is significant vertical and horizontal transmission, more sophisticated models that incorporate coalescence in both the host and the virus will be required to interpret the phylodynamic patterns in the context of transmission parameters. A particularly pertinent quote comes from a review by
Donnelly & Tavaré (1995) in their discussion of the time-varying coalescent (equation (
2.1)):
[T]he results described above do not apply in general. It is true for very general neutral models that unless there are discontinuities, i.e. sudden changes, in the processes governing the population size, the ancestral process can be represented as a time change of the process described in (equation (
2.1)). However, the form of the time change, which is in general different from (equation (
2.1)), depends on properties of the random process governing the rate at which individuals are born in the population, about which little is known in many practical contexts. It thus appears that some caution is appropriate in applying the above results on the coalescent in populations of variable size.
(Donnelly & Tavaré 1995, p. 408)
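The exponential-growth special case mentioned before the quotation can be made concrete with a short sketch. It assumes that transmission takes the standard frequency-dependent form, so that incidence is f(t) = βc S(t) I(t)/N. This functional form and the symbol f are assumptions introduced here for illustration; they are not a restatement of equations (2.5) and (2.6).

```latex
\[
  f(t) = \beta c \, \frac{S(t)\, I(t)}{N},
  \qquad
  S(t) \approx N \;\Rightarrow\; f(t) \approx \beta c \, I(t).
\]
```

Under this assumption, incidence is proportional to prevalence with the constant factor βc for as long as susceptibles are not appreciably depleted, which is why a single rescaling of time can work in that phase. Once S(t)/N changes, the factor relating incidence to prevalence changes with it, and no single rescaling applies.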
That coalescence is related to transmission has important implications when interpreting phylodynamic patterns in the context of other data, such as information on the timing of external events or on disease prevalence. For example, in a recent study of dengue (DENV-4) in Puerto Rico (
Bennett et al. 2009), although both
Ne and case counts fluctuated over time, changes in
Ne preceded changes in case counts by about seven months. This puzzling result is easily explained when one recognizes that the coalescence rate is a measure of incidence; as shown in our simple model of an oscillating epidemic, we expect incidence and prevalence to be out of phase, and in general, peaks of incidence precede peaks of prevalence. There was also no simple relationship between the amplitude of the fluctuations in
Ne and the amplitude of the fluctuations in case counts; to make a meaningful comparison between these data, we would have to compare fluctuations in estimated incidence with
Ne. Multiple studies have interpreted the timing of changes in phylodynamic patterns in the context of changes in other factors. For example, a decline in a skyline plot obtained from hepatitis A sequences sampled in France coincided with the introduction of vaccination (
Moratorio et al. 2007), while a massive expansion in the ‘effective number of infections’ of hepatitis C virus in Egypt fell within a time period when the general population received parenteral antischistosomal treatment (
Pybus et al. 2003). Such external forces have a more immediate impact on transmission than on prevalence.
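The claim that peaks of incidence precede peaks of prevalence can be checked numerically. The sketch below is illustrative only: it uses a closed SIR model with a single epidemic wave rather than the oscillating model referred to above, integrates it with a simple Euler scheme, and all parameter values (`beta`, `gamma`, `n`, `i0`) are hypothetical choices, not values from this study.

```python
def simulate_sir(beta=0.5, gamma=0.1, n=10_000.0, i0=1.0, dt=0.01, t_max=200.0):
    """Euler integration of a closed SIR model (assumed model, hypothetical parameters).

    Returns lists of times, prevalence I(t) and incidence beta * S * I / N.
    """
    s, i = n - i0, i0
    times, prevalence, incidence = [], [], []
    steps = int(t_max / dt)
    for k in range(steps + 1):
        inc = beta * s * i / n          # new infections per unit time
        times.append(k * dt)
        prevalence.append(i)
        incidence.append(inc)
        s -= inc * dt                   # susceptibles lost to infection
        i += (inc - gamma * i) * dt     # infections gained minus recoveries
    return times, prevalence, incidence


def peak_time(times, values):
    """Time at which `values` attains its maximum."""
    idx = max(range(len(values)), key=values.__getitem__)
    return times[idx]


times, prevalence, incidence = simulate_sir()
t_peak_incidence = peak_time(times, incidence)
t_peak_prevalence = peak_time(times, prevalence)
print(f"incidence peaks at t = {t_peak_incidence:.2f}")
print(f"prevalence peaks at t = {t_peak_prevalence:.2f}")
```

In this assumed model, prevalence is still rising when incidence starts to fall, because new infections continue to outnumber recoveries for a while after the number of new infections per unit time has peaked.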
Phylodynamic patterns can also be affected by sampling; sampling a higher fraction of the infected individuals at a given time results in more recent coalescent times, shorter terminal (external) branches and a different tree shape (
Mooers 1995;
Rannala et al. 1998; Pybus
et al. 2000,
2002;
Purvis & Agapow 2002;
Huelsenbeck & Lander 2003;
Volz et al. 2009). As many viral phylodynamic studies employ serial samples of viral sequences, it is important to correct for possible differences in sampling depth, which will be a function of the temporal pattern of the sampling and the number of infected individuals. In a heterogeneous epidemic, the extent to which specific subpopulations are over- or under-sampled also has to be taken into account. The model framework we present here can help to clarify the potential effects of sampling on phylodynamic patterns, and offers a computationally faster way of studying sampling effects than full epidemic simulations coupled with computationally intensive Bayesian approaches for estimating
Ne (
Stack et al. in press).
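The effect of sampling depth on coalescent times can be illustrated with a deliberately simplified calculation. The sketch below swaps in the standard constant-size Kingman coalescent, not the epidemic models of this paper: with pairwise coalescence rate 1/`pop_size`, the expected time back to the first coalescence among `n_sampled` lineages is `pop_size` divided by the number of pairs. The function name and the population size used are hypothetical.

```python
from math import comb


def expected_first_coalescence_time(n_sampled, pop_size):
    """Expected time back to the first coalescence among n_sampled lineages.

    Assumes a constant-size Kingman coalescent with pairwise rate 1 / pop_size,
    used here only as a stand-in for illustration.
    """
    if n_sampled < 2:
        raise ValueError("at least two sampled lineages are needed")
    return pop_size / comb(n_sampled, 2)


# Hypothetical population of 1000; compare shallow and deep sampling.
for n_sampled in (10, 50, 100):
    t = expected_first_coalescence_time(n_sampled, pop_size=1000)
    print(f"n = {n_sampled:3d}: expected time to first coalescence = {t:.3f}")
```

Because the number of pairs grows quadratically with the sample size, deeper sampling pulls the most recent coalescent events sharply towards the present, which is consistent with the shorter terminal branches described above.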
Deterministic models of the phylodynamics of infectious disease can be very informative due to their relative simplicity. However, in some cases, such as the very early stages of an epidemic, or an endemic infection in a small population, a stochastic model may be more appropriate. In the simple case of a susceptible-infected (SI) model in a closed population (i.e. equations (
2.5) and (
2.6) with
Λ =
μ = 0), the coalescent events coincide in time with the transmission events, and hence in this case we can use the widely studied stochastic version of the SI model to model changes in ancestral lineages through time. However, in general, we cannot simply borrow from the epidemiological or population genetic literature. Most work on coalescent theory in finite populations has focused on birth–death processes (
Hey 1992;
Nee et al. 1994;
Rannala 1997), either homogeneous or non-homogeneous, which are too simple for our purposes, while stochastic epidemiological models generally consider the dynamics of the process forward in time, rather than backwards, and do not consider the number of lineages. Unlike the deterministic models, stochastic nonlinear epidemiological models cannot in general simply be run backwards in time from the present; for example, the stochastic version of equations (2.5) and (2.6) reaches a quasistationary state, at which point the system has no ‘memory’ of when the first infection occurred.
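The closed-population SI case described above, in which coalescent events coincide with transmissions, can be sketched as a small Gillespie-style simulation. This is a minimal illustration under stated assumptions: transmission occurs at total rate βc S I/N, every infected individual is sampled at the present, and the function names and parameter values are hypothetical. Under complete sampling, each transmission read backwards in time merges two lineages.

```python
import random


def simulate_si_transmission_times(beta_c=1.0, n=50, i0=1, seed=1):
    """Simulate a closed stochastic SI model and return the transmission times.

    Assumes the total transmission rate is beta_c * S * I / N (hypothetical values).
    """
    rng = random.Random(seed)
    t, i = 0.0, i0
    transmission_times = []
    while i < n:
        rate = beta_c * (n - i) * i / n   # positive while 0 < i < n
        t += rng.expovariate(rate)        # exponential waiting time to next event
        i += 1
        transmission_times.append(t)
    return transmission_times


def lineages_through_time(transmission_times, i0=1):
    """Read the epidemic backwards from the present, assuming complete sampling.

    Returns (time, lineages) pairs, where `lineages` is the number of ancestral
    lineages remaining immediately before (older than) each transmission.
    """
    k = i0 + len(transmission_times)      # lineages at the present
    ltt = []
    for t in reversed(transmission_times):
        k -= 1                            # one coalescence per transmission
        ltt.append((t, k))
    return ltt


transmission_times = simulate_si_transmission_times()
ltt = lineages_through_time(transmission_times)
for t, k in ltt[:5]:
    print(f"t = {t:.3f}: {k} lineages remain further back in time")
```

With incomplete sampling, or with removal of infected individuals, this one-to-one correspondence between transmissions and coalescent events no longer holds, which is the difficulty discussed in the paragraph above.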
The simple nature of the epidemiological models considered here allowed us to draw direct comparisons between population genetic models, such as the Wright–Fisher and Moran models, and epidemiological models. The correspondence between population genetic and epidemiological models becomes more complex in the case of heterogeneous populations; the models described here can be extended to incorporate heterogeneity in contact rates, infectivity, spatial structure and so on. For example, we previously considered a model of HIV infection that assumed two stages of infection, a brief, highly infectious acute period followed by a long, less infectious chronic period (
Volz et al. 2009), such that there is no longer a single rate of coalescence that applies to all individuals. In addition, for the simple models discussed here, the shape of the tree is captured by the dynamics of the number of lineages over time. However, phylogenetic trees contain more information than simply the number of lineages over time, for example tree balance, the distribution of the length of the terminal branches, and in the case of a heterogeneous population, the relative distribution of subpopulations across the tree. The development of new phylodynamic models will help to elucidate the role of epidemiological processes in generating these patterns.