Here, we presented a two-tiered model that can be used to simulate both the ecological and the evolutionary dynamics of rapidly evolving RNA viruses. The model's novelty resides in its modular design: it separates antigenic dynamics from genotypic dynamics, and thereby yields computationally simpler simulations that allow for a more realistic representation of viral sequences. At the heart of the two-tiered model is the antigenic emergence rate, which drives the emergence dynamics of new antigenic variants in the epidemiological submodel. We also showed that this antigenic emergence rate, when parametrized with a shape parameter k of 2, can be mechanistically interpreted in terms of a model in which mutations accumulate neutrally and the probability of immune escape increases linearly with the number of mutations already accumulated (appendix A). The phenotypic dynamics resulting from this first tier of the model are then used as input for the second tier of the model, the molecular evolution submodel. We showed that this second submodel simulates sequence data from which quantitative indexes (divergence and diversity metrics) can be computed and compared with those from empirical sequence data. Furthermore, phylogenies can be inferred from these simulated sequences, with branch lengths that are comparable to those from trees inferred from empirical viral sequences. The two-tiered model in its entirety can, therefore, be used to generate case data and sequence data that can be statistically confronted with empirical datasets of these two types.
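As an illustration of the kind of quantitative indexes referred to above, the following minimal sketch computes a divergence and a diversity value from a set of aligned sequences. The definitions used here are assumptions made for the example, not necessarily those used in our analyses: divergence is taken as the mean per-site Hamming distance to a reference sequence, and diversity as the mean per-site Hamming distance over all pairs of sequences in a sample. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming_per_site(a: str, b: str) -> float:
    """Proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def divergence(seqs, reference):
    """Assumed definition: mean per-site distance from each sequence to a reference."""
    return sum(hamming_per_site(s, reference) for s in seqs) / len(seqs)


def diversity(seqs):
    """Assumed definition: mean per-site distance over all pairs in one sample."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # a single sequence has no pairwise diversity
    return sum(hamming_per_site(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): a reference and three sequences from one time point.
reference = "ACGTACGT"
sample = ["ACGTACGA", "ACGTACGA", "ACGAACGA"]
print(divergence(sample, reference))
print(diversity(sample))
```

Applied to time-stamped sequences grouped by sampling date, the same two functions would give time series of divergence and diversity that could be computed identically for simulated and empirical data.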
The modularity of the two-tiered model is its principal strength and will allow this framework to be adapted to consider alternative hypotheses and to include alternative, and potentially better or faster, submodels. For example, the first tier of the model uses a status-based, reduced infectivity multi-strain model in its implementation (Gog & Grenfell 2002). However, recent work has shown that this model, in contrast to other, more highly dimensional, multi-strain models, overestimates the level of herd immunity to a new antigenic variant (Ballesteros et al. 2009). Owing to its modularity, the two-tiered model can be easily modified to consider alternative epidemiological submodels, for example, the well-known history-based multi-strain model (Andreasen et al. 1997). Furthermore, any of these models can be extended to consider specific questions of interest, such as what climate variables are important drivers of influenza's seasonal dynamics (Shaman et al. 2009), what role population substructure plays in the ecological and evolutionary dynamics of influenza (Truscott et al. 2009) and how cross-immunity may act (e.g. by its separate effects on infectiousness and infectious period; Park et al. 2009). The only requirement of the epidemiological submodel is that it generates variant-specific case dynamics, which are used as input into the second tier of the model (electronic supplementary material, figure S1).
Similarly, the molecular evolution submodel, as described, can be easily replaced with an alternative submodel. One possible alternative submodel that would be computationally faster, but apply to a more limited number of cases, might use approaches based on coalescent theory to yield viral genealogies. A second possibility is to replace the current molecular evolution submodel with a model that has a mechanistic link between the parameter f in the second tier of the model and the parameter γ in the first tier of the model. A third possibility would be to consider not just the process of point mutations, but also to allow for insertions, deletions, recombination and, for segmented viruses, reassortment. Here, the only requirement for the second tier is that it takes in variant-specific case data and generates time-stamped viral sequences.
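The two interface requirements stated above (a first tier that outputs variant-specific case dynamics, and a second tier that turns these into time-stamped sequences) can be sketched as follows. This is only an illustration of the modular contract, not our implementation: both tiers are deliberately trivial stand-ins, and all names (`CaseSeries`, `toy_epidemiological_tier`, `toy_molecular_tier`, `run_two_tiered`), the sampling of one sequence per variant per time step, and the uniform point-mutation scheme are hypothetical assumptions.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical types: cases[variant][t] is the case count of a variant at step t.
CaseSeries = Dict[int, List[int]]
StampedSequence = Tuple[int, int, str]  # (time step, variant id, sequence)


def toy_epidemiological_tier(n_steps: int) -> CaseSeries:
    """Stand-in first tier: variant 1 replaces variant 0 halfway through."""
    half = n_steps // 2
    return {
        0: [10] * half + [0] * (n_steps - half),
        1: [0] * half + [10] * (n_steps - half),
    }


def point_mutate(seq: str, prob: float, rng: random.Random) -> str:
    """Mutate each site independently with probability `prob` (toy assumption)."""
    out = []
    for base in seq:
        if rng.random() < prob:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def toy_molecular_tier(cases: CaseSeries, ancestor: str, prob: float,
                       rng: random.Random) -> List[StampedSequence]:
    """Stand-in second tier: one sequence per variant per step with cases."""
    sequences = []
    for variant, series in cases.items():
        current = ancestor  # simplification: every variant starts from the ancestor
        for t, n_cases in enumerate(series):
            if n_cases > 0:
                current = point_mutate(current, prob, rng)
                sequences.append((t, variant, current))
    return sorted(sequences)  # ordered by time step, then variant


def run_two_tiered(first_tier: Callable[[], CaseSeries],
                   second_tier: Callable[[CaseSeries], List[StampedSequence]]):
    """Chain the tiers; either one can be swapped if it honours the interface."""
    cases = first_tier()
    return cases, second_tier(cases)


rng = random.Random(1)
cases, seqs = run_two_tiered(
    lambda: toy_epidemiological_tier(6),
    lambda c: toy_molecular_tier(c, "ACGTACGTAC", 0.05, rng),
)
for stamped in seqs:
    print(stamped)
```

Because `run_two_tiered` depends only on the shapes of the data passed between the tiers, a history-based epidemiological submodel or a coalescent-based sequence generator could in principle be substituted without changing the other tier.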
Following its description, we applied the two-tiered model to influenza A (H3N2) in humans, to influenza B in humans and to influenza A (H3N8) in equine hosts in order to illustrate its use. In the first application, we showed that a model parametrized for a combination of gradual and punctuated antigenic change could quantitatively reproduce the ecological and evolutionary patterns of this subtype in humans. In contrast, and consistent with previous theoretical findings (Ballesteros et al. 2009), a model with purely punctuated antigenic evolution failed to capture these patterns well. In the electronic supplementary material, we also showed that gradual antigenic evolution alone was not consistent with all of the observed dynamics of influenza A (H3N2). That only the model with both modes of antigenic change could quantitatively reproduce the dynamic patterns of this subtype resolves the seemingly contradictory findings that antigenic change occurs either in a punctuated (Smith et al. 2004; Wolf et al. 2006; Blackburne et al. 2008) or in a gradual (Shih et al. 2007; Suzuki 2008) manner. Both modes of antigenic evolution appear necessary: gradual antigenic evolution is needed to reproduce the observed periodicity of influenza's ecological dynamics and the rapid rate of HA divergence, while punctuated antigenic evolution is needed to reproduce rates of divergence and the overall ladder-like topology of influenza A (H3N2)'s HA phylogeny.
The application of the model to influenza B illustrated the model's ability to generate qualitatively different ecological and evolutionary dynamics under alternative parametrizations. This ability is critical for the model to be effectively interfaced with different empirical datasets, through the development and application of new statistical approaches. The application of the model to influenza A (H3N8) in equine hosts served to illustrate the ease with which the model could be extended to accommodate further hypotheses. Specifically, we considered the hypothesis that the introduction and later weakening of quarantine measures between North America and Europe played a role in shaping the evolutionary dynamics of H3N8. Although the model realizing this hypothesis was able to reproduce features of the ecological and evolutionary dynamics of H3N8, alternative hypotheses could easily be considered within this framework. A statistical comparison between these models' simulated sequences (and possibly case dynamics) could then determine the appropriate level of support for each of the models considered.
In our applications to influenza, we parametrized the two-tiered model to consider the effects of humoral immune escape, driven by genetic changes in the virus's dominant antigenic protein. While this parametrization has empirical support in the case of influenza (Smith et al. 2004), we may want to consider alternative mechanisms of immune escape. For example, there is evidence for positive selection of cytotoxic T lymphocyte escape mutants (Gog et al. 2003). Another possible hypothesis is that generalized immunity plays a role in shaping the ecological and evolutionary dynamics of influenza (Ferguson et al. 2003). These hypotheses, like the others mentioned above, could easily be integrated into this two-tiered modelling framework. This integration would finally enable us to compare these hypotheses in a quantitative way, considering both incidence data and sequence data.
Although our focus here was on the ecological and evolutionary dynamics of RNA viruses at the population level, the two-tiered structure of the model could also be used to consider the dynamics at another level of organization. Specifically, while we modelled the dynamics of susceptible, infected and recovered hosts here, within-host dynamics could instead consider classes of naive cells and cells that are infected with virus of different antigenic phenotypes. In lieu of simulating epidemiological dynamics, the first tier of the model would simulate the viral load dynamics, by antigenic type. The second tier of the model would then be used again to generate viral sequences that could be compared with viral sequences isolated from a single chronically infected host over several time points (e.g. in the case of HIV; Shankarappa et al. 1999).
Regardless of whether the two-tiered framework presented here is applied at the within-host level or the population level, its ability to generate both case data and sequence data that can be statistically confronted with empirical observations will improve our understanding of the key drivers of viral dynamics, and may thereby ultimately help in their control.