3.1. Model components
Comparison of spatial and non-spatial population-independent models shows that including distance substantially improves model fit for both England and Wales and the US (ΔDIC = 64.7 and ΔDIC = 33.3, respectively). DIC and parameter estimates for the distance-only model are given in column 1 of the parameter tables for England and Wales and for the US.
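The ΔDIC values quoted here can be obtained from MCMC output. The following is a minimal sketch, assuming the standard definition DIC = D̄ + p_D with p_D = D̄ − D(θ̄); the function names and the toy deviance values are illustrative assumptions, not the authors' code or data.

```python
from statistics import mean

def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = mean posterior deviance + effective number of parameters (p_D)."""
    d_bar = mean(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d

def delta_dic(dic_reference, dic_candidate):
    """Positive values mean the candidate model has the lower (better) DIC."""
    return dic_reference - dic_candidate

# Toy deviance samples (hypothetical, not taken from the paper).
non_spatial = dic([210.0, 212.0, 214.0], 209.0)
spatial = dic([150.0, 151.0, 152.0], 148.0)
improvement = delta_dic(non_spatial, spatial)
print(non_spatial, spatial, improvement)
```

Under this convention a positive ΔDIC favours the candidate model, which is how the improvement from including distance is reported above.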
Parameter estimates for six models in England and Wales. Posterior medians and equal-tailed 95% credible intervals are presented for each parameter.
Parameter estimates for six models in the US. Posterior medians and equal-tailed 95% credible intervals are presented for each parameter.
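The summaries reported in these tables can be reproduced from a vector of posterior draws. The sketch below shows one way to compute a posterior median and an equal-tailed 95% credible interval; the helper names and the simulated draws are assumptions made for illustration only.

```python
import random
from statistics import median

def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac

def summarise(samples, level=0.95):
    """Return (median, lower, upper), leaving equal mass in each tail."""
    xs = sorted(samples)
    tail = (1 - level) / 2
    return median(xs), quantile(xs, tail), quantile(xs, 1 - tail)

# Simulated draws standing in for an MCMC chain (hypothetical values).
rng = random.Random(1)
draws = [rng.gauss(1.0, 0.1) for _ in range(5000)]
med, lower, upper = summarise(draws)
print(round(med, 2), round(lower, 2), round(upper, 2))
```

An equal-tailed interval leaves 2.5% of the posterior mass on each side, so it need not be the shortest interval when the posterior is skewed.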
Previous formulations of the gravity kernel in the literature have considered either density-dependent transmission (density-dependence parameter fixed at 0) or density-independent transmission (parameter fixed at 1). Figure 2 compares the fit (expressed by the posterior deviance) of these two formulations with that of the model in which the degree of density dependence is estimated. This comparison is made for models assuming no, linear or a fitted power-dependence of spatial coupling on both source and destination city population size. In figure 2a the parameter [symbol missing] is estimated, whereas in figure 2b it is fixed at 1. See model components in the electronic supplementary material for further comparisons.
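To make the model variants above concrete, the following sketch implements one common gravity-type coupling. The functional form is an assumption chosen for illustration and is not taken from the paper: coupling from source city j to destination city i is P_i^mu × P_j^nu × d_ij^(−gamma), divided by a normalising sum raised to the power eps. The names `nu` and `eps` are hypothetical; only the distance power and the destination population power are named in the text.

```python
def coupling(i, j, pops, dist, gamma, mu=0.0, nu=0.0, eps=0.0):
    """Assumed gravity-type coupling from source city j to destination city i."""
    def kernel(a, b):
        return dist[a][b] ** (-gamma)

    # Normaliser: kernel-weighted source "mass" seen by destination i.
    norm = sum(pops[k] ** nu * kernel(i, k) for k in range(len(pops)) if k != i)
    return pops[i] ** mu * pops[j] ** nu * kernel(i, j) / norm ** eps

# Toy populations and pairwise distances (hypothetical values).
pops = [100_000, 50_000, 20_000]
dist = [[0, 10, 40], [10, 0, 30], [40, 30, 0]]

# eps = 0: no normalisation (the density-dependent end of the range).
dd = coupling(0, 1, pops, dist, gamma=1.0, eps=0.0)
# eps = 1: couplings into city 0 are normalised to sum to one.
di = [coupling(0, j, pops, dist, gamma=1.0, eps=1.0) for j in (1, 2)]
print(dd, di, sum(di))
```

In this sketch, setting `mu = nu = 0` gives a population-independent model, `mu = nu = 1` a linear one, and free values a fitted power dependence, while `eps` between 0 and 1 interpolates between the two density-dependence extremes.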
Figure 2. Posterior deviances of nine models for comparison in (a) England and Wales, and (b) the US. Blue models are population independent, green have a linear relationship with source and destination population size, and red estimate the relationship between ...
For England and Wales (figure 2a), in each population context the variant that estimates the degree of density dependence (the lightest curve of each colour) gives a slightly better fit than models with no density dependence, with pure density dependence fitting substantially less well. The comparison also shows that models which estimate the effect of origin and destination city population sizes on the connectivity of cities fit much better than either the population-independent or the linear population size-dependent models.
The same set of comparisons is made for the US in figure 2b. The situation is more complex, with the posterior deviance distributions of many model variants overlapping, so comparisons by DIC value cannot distinguish these models. Unlike in England and Wales, no density-dependence variant has the lowest deviance for all three of the population size-dependence variants examined. Inclusion of nonlinear population size-dependence does not penalize the fit of the US model, and so such dependence cannot be definitively ruled out by the data. The models presented in columns 5 and 6 of the US parameter table have different population relationships but the same DIC score. The credible intervals on the population parameters of the density-dependent population with infectivity model (column 5) are very wide, suggesting that little information is added by including these parameters.
In England and Wales, the lowest DIC model is one in which both the degree of density dependence and the effect of population are estimated. In the US, by contrast, population-independent models with either density-dependent or estimated density-dependence spatial interaction terms are indistinguishable.
3.2. Impact of infectivity profile
We tested models with three types of infectiousness profile through time: constant infectivity, a linear relationship between infectivity and mortality in the week ahead, and an estimated power-law relationship between mortality and infectivity. Mixing was poor when estimating the power-law exponent with the US data, so we compare only the first two models in that setting.
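The three infectiousness profiles can be written as simple functions of next-week mortality. This is a minimal sketch under stated assumptions: the function names, the exponent name `alpha` and the toy mortality series are hypothetical, and any proportionality constant is omitted.

```python
def infectivity(mortality_next_week, profile="constant", alpha=1.0):
    """Relative infectivity of an infected city under one of three profiles."""
    if profile == "constant":
        return 1.0
    if profile == "linear":
        return float(mortality_next_week)
    if profile == "power":
        return float(mortality_next_week) ** alpha
    raise ValueError(f"unknown profile: {profile}")

def infectivity_series(mortality, profile="constant", alpha=1.0):
    """Infectivity in week t is driven by mortality in week t + 1."""
    return [infectivity(m, profile, alpha) for m in mortality[1:]]

# Toy weekly mortality rates for one city (hypothetical values).
weekly_mortality = [0.0, 1.0, 4.0, 9.0, 4.0]
constant = infectivity_series(weekly_mortality)
linear = infectivity_series(weekly_mortality, "linear")
power = infectivity_series(weekly_mortality, "power", alpha=0.5)
print(constant, linear, power)
```

The linear profile is the special case `alpha = 1` of the power-law profile, and the constant profile corresponds to `alpha = 0` whenever mortality is positive.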
In England and Wales, the linear infectivity model has a DIC value more than 25 above that of either the constant or the estimated infectivity model. Parameter estimates for the constant-infectivity and estimated-infectivity model variants are shown in columns 4 and 5 of the England and Wales parameter table, using the density-dependent population-dependent framework from the previous comparison in England and Wales. These two models are indistinguishable by DIC (ΔDIC = 0.1), and estimates for all other parameters are closely comparable between them.
The estimated relationship includes two inputs from the infected city: the mortality rate and the population of the city. It can be more difficult to estimate parameters regarding infectivity, so we tested a model which takes only one piece of information from the infected city. The final column of the England and Wales parameter table shows a model which takes the mortality rate from the infected city into account but does not include the population size of that city. This model improves the DIC score by 4.4 over the constant infectivity model.
The US parameter table shows parameter estimates for models in the US. The difference in DIC score between a constant infectivity model and one with a linear relationship between mortality and infectivity is negligible in either a distance-only model framework (columns 1 and 6) or a population-dependent framework (columns 2 and 5). Adding infectivity information does not improve the fit of the model.
In the England and Wales dataset, the lowest DIC model is the single infected city parameter model in column 6 of the England and Wales parameter table. The model is dependent on the destination population size, has estimated dependence of infectivity on mortality and an estimated intermediate degree of density dependence. The distance power γ was estimated as 1.18 (0.96, 1.39). A lower value was found for models in the US, where for the most parsimonious low DIC model (density dependent, population independent), γ was estimated as 0.79 (0.54, 1.00). The distance kernels for the two datasets are shown in the best-fit distance kernel figure. The credible intervals for γ overlap for the two datasets.
Best-fit distance kernels for England and Wales (blue) and US (red). Posterior median is shown as a darker line and 95% credible intervals are given by the shaded region. (a) Unlogged and (b) logged.
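To illustrate what the reported γ estimates imply, the sketch below evaluates how much a pure power-law kernel d^(−γ) falls when distance increases tenfold. The power-law form and the function name are assumptions; the γ values are the posterior medians and credible limits quoted in the text.

```python
def kernel_ratio(gamma, distance_factor=10.0):
    """Factor by which d ** (-gamma) changes when distance is multiplied."""
    return distance_factor ** (-gamma)

# (median, lower, upper) for the distance power, as quoted in the text.
estimates = {
    "England and Wales": (1.18, 0.96, 1.39),
    "US": (0.79, 0.54, 1.00),
}

for name, (med, lo, hi) in estimates.items():
    # A larger gamma gives a smaller ratio, so the interval bounds swap order.
    print(name, round(kernel_ratio(med), 3),
          (round(kernel_ratio(hi), 3), round(kernel_ratio(lo), 3)))
```

Because the transformation is monotone in γ, mapping the credible limits through it gives the corresponding limits for the ratio.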
The power parameter on the destination city population, μ, was estimated at 0.40 (0.25, 0.54) in England and Wales. The credible interval excludes 1, indicating that a city's susceptibility increases less than proportionally with its population size.
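As a worked illustration of this sublinear scaling, suppose that the susceptibility of a city with population P is proportional to P^μ (the notation S(P) is introduced here for illustration only). Doubling the population then multiplies susceptibility by 2^μ, evaluated below at the posterior median and at the credible limits:

```latex
\frac{S(2P)}{S(P)} = \frac{(2P)^{\mu}}{P^{\mu}} = 2^{\mu},
\qquad 2^{0.40} \approx 1.32,
\quad 2^{0.25} \approx 1.19,
\quad 2^{0.54} \approx 1.45 .
```

Under this assumption a city twice as large is roughly 1.2 to 1.5 times as susceptible, rather than twice as susceptible.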
3.3. Comparison between datasets
We used the posterior median parameter estimates fitted to the England and Wales dataset to calculate a likelihood value for the US dataset. By likelihood ratio test, this value was not significantly different from that of the most parsimonious low DIC US model (−55.23, −57.72, p > 0.97). We therefore cannot reject the assumption that spread had the same characteristics in the US and in England and Wales, though the smaller size of the US dataset clearly reduces inferential power.
3.4. Infection trees
Figure 4 shows the most likely infection tree for each city in England and Wales, stratified by the phase of the epidemic during which each city was infected. Inferred city-to-city infection events with a frequency above 70 per cent (in 1000 trees) are shown in black; events of lower frequency are shown in grey. Interactions in weeks 0–3 are longer range than those in weeks 4–7 (p < 0.01), which are in turn longer range than those in weeks 8–10 (p < 0.01) (figure 4d). The probability that the most likely infector was responsible for each infection falls as the epidemic progresses because there are many more potential infectors available later (figure 4e). Cities infected early give rise to more infections than those infected late in the wave, as expected, but the range is large, with some early cities giving rise to no new infections (figure 4f).
Figure 4. (a–c) The most likely infector tree for each stage of the epidemic in England and Wales: (a) weeks 0–4, the early stage of the epidemic; (b) weeks 5–7, the middle; and (c) weeks 8–10, late in the epidemic. Black ...
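The support for each inferred infection event can be tallied across sampled trees. The sketch below shows one way to do this, assuming each sampled tree is stored as a mapping from infected city to infector; the data structure, function name and toy trees are hypothetical.

```python
from collections import Counter

def infector_support(trees, city):
    """Return (most frequent infector, its frequency) for `city` across trees."""
    counts = Counter(tree[city] for tree in trees)
    infector, n = counts.most_common(1)[0]
    return infector, n / len(trees)

# Ten toy trees: B is always infected by A; C by A in six trees, by B in four.
trees = [{"B": "A", "C": "A"}] * 6 + [{"B": "A", "C": "B"}] * 4

support = {city: infector_support(trees, city) for city in ("B", "C")}
# Links above the 70 per cent threshold would be drawn in black, others in grey.
black_links = {city for city, (_, freq) in support.items() if freq > 0.7}
print(support, black_links)
```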
Figure 5 shows results for two models using the US data. We compare the most likely infection trees for the distance-only constant infectivity model with parameters inferred from the US data (figure 5a) with a model where the parameters used to generate the trees are taken from the England and Wales single-infected city parameter model (figure 5c). In the distance-only constant infectivity model, the nearest infected city is always the most likely infector. In contrast, with the England and Wales parameters, some links between cities are high frequency, while other cities have several potential infectors of intermediate frequency (figure 5b). As in England and Wales, infection events inferred early in the epidemic have higher support than those later in the epidemic. There are some exceptions owing to the distribution of cities in the US dataset: Oakland and San Francisco are distant from all other cities but very close to each other. In the distance-only constant infectivity model, some cities may give rise to a large number of new infections (e.g. Pittsburgh gives rise to nearly a quarter of infections) (figure 5d). The effect of a city acting as a hub of infection is reduced in the more complex model, as the risk of infection from one city to another is the combined effect of several factors including distance.
Figure 5. (a) The most likely US infection tree for 1000 parameter sets from the distance-only constant-infectivity model. Arrows show infector and infected cities. Cities infected early in the epidemic (week 0–2) are shown in red, those infected in the ...
For England and Wales, there is relatively good agreement between observed and simulated epidemic curves (figure 6b). The observed epidemic curve rises more steeply than the simulation curves in the early stages of the epidemic, and peaks one week earlier than the simulation mean. This suggests that the model may underestimate the external infection pressure early in the wave.
Figure 6. (a) Cities that lie outside of the 75% probability interval for infection week are shown in red. (b) 1000 simulations from the best model, the single infected-city parameter model in England and Wales. Dark grey is the simulation mean, red is the observed ...
We calculated the probability that a city was infected in each week given the observed behaviour of all other cities up to that time. In England and Wales, 245 of 246 cities lie within the 95 per cent interval of their expected distribution. Figure 6a shows the cities for which the observed infection week lies outside the stricter interquartile interval. For further information, see the electronic supplementary material. There are no population size (p = 0.36) or density (p = 0.11) trends in these cities, which are typically infected later in the epidemic (p < 0.01 for difference in infection week). In the US, all cities lie within the 95 per cent probability interval and all but three lie within the interquartile interval. Those three outlier cities are smaller than other cities (p = 0.01) but are not distributed differently in space (p = 0.88) or time (p = 0.06).
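The interval check described above can be sketched as follows, assuming that each city has a discrete predictive distribution over its week of infection. The function names and the toy probabilities are hypothetical, and the central interval is defined here by cutting equal mass from each tail.

```python
def central_interval(week_probs, level=0.95):
    """First and last week bounding the central `level` mass of a distribution."""
    tail = (1 - level) / 2
    cum = 0.0
    lower = upper = None
    for week, p in enumerate(week_probs):
        cum += p
        if lower is None and cum > tail:
            lower = week
        if upper is None and cum >= 1 - tail:
            upper = week
    return lower, upper

def is_outlier(observed_week, week_probs, level=0.95):
    """True when the observed week falls outside the central interval."""
    lower, upper = central_interval(week_probs, level)
    return not (lower <= observed_week <= upper)

# Toy predictive distribution over weeks 0-7 for one city (hypothetical).
probs = [0.01, 0.04, 0.15, 0.30, 0.30, 0.15, 0.04, 0.01]
print(central_interval(probs, 0.95), central_interval(probs, 0.50))
print(is_outlier(2, probs, 0.95), is_outlier(2, probs, 0.50))
```

A city observed in week 2 of this toy example sits inside the 95 per cent interval but outside the interquartile interval, which is the stricter of the two criteria.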
We tested the effect on parameter estimates in England and Wales of relaxing the single-introduction assumption inherent in the model. We re-estimated the parameters conditioning on infections that occurred from week 3 of the epidemic onward. There is a small increase in the kernel power parameter estimate, which causes the kernel to decay more rapidly with distance (electronic supplementary material, figure S9). This suggests that the very long-range interactions, which are forced to occur early in the epidemic, affect the shape of the kernel. However, the credible intervals largely overlap, which indicates that this assumption does not affect the fit of the model to a large degree.
In the US, the simulated curves for the distance-only constant infectivity model are shown in figure 6e and those for the England and Wales parameters in figure 6f. In both cases, the mean simulated and observed curves are closely comparable, with the distance-only constant infectivity model giving peak incidence in the same week as observed. Figure 6g shows the observed week of infection against the simulated week of infection for all 1000 simulated epidemics. There is good correlation between the observed and simulated weeks of infection for both parametrizations.
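The comparison between observed and simulated epidemics can be summarised with a mean epidemic curve and a correlation of infection weeks. The sketch below is one possible approach, not the authors' procedure: it correlates the observed week with the per-city mean simulated week, and all names and toy values are hypothetical.

```python
from statistics import mean

def epidemic_curve(infection_weeks, n_weeks):
    """Number of cities newly infected in each week."""
    curve = [0] * n_weeks
    for week in infection_weeks:
        curve[week] += 1
    return curve

def mean_curve(simulations, n_weeks):
    """Week-by-week mean of the simulated epidemic curves."""
    curves = [epidemic_curve(sim, n_weeks) for sim in simulations]
    return [mean(c[w] for c in curves) for w in range(n_weeks)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Infection week of five cities: observed, and in two toy simulations.
observed = [0, 1, 1, 2, 3]
simulations = [[0, 1, 2, 2, 3], [0, 1, 1, 3, 3]]

sim_mean = mean_curve(simulations, n_weeks=4)
mean_week = [mean(sim[c] for sim in simulations) for c in range(len(observed))]
r = pearson(observed, mean_week)
print(sim_mean, mean_week, round(r, 3))
```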
We tested the effect of thinning the England and Wales dataset so that it more closely resembles the US dataset, to determine whether the differences in formulation between the best models for each dataset are due to the smaller number of cities in the US dataset. We removed all cities with fewer than 90 000 inhabitants in England and Wales, leaving 46 cities distributed quite evenly across England and Wales, as shown in the electronic supplementary material, figure S10. There were identifiability problems in estimating the density-dependence parameter using the thinned dataset. The best model by DIC comparison gave a distance-only interaction (no dependence on population size) with infectivity scaling linearly with mortality in a density-independent framework. As we found with the US data, it is difficult to disentangle the effects of population and infectivity parameters because these feature in different combinations in models with comparable DIC.