The most striking observation from our study was the number of independent entries of H1N1/09 into the UCSD community, despite the relatively small number of sequences collected and the short duration of sampling. For ease of presentation, only the ML tree of the second wave of H1N1/09 virus is shown here (Fig. 3), with that for all available H1N1/09 sequences shown in Fig. S1 in the supplemental material (these trees did not differ in any substantive way with respect to the UCSD sequences). Because of the lack of resolution in some parts of the H1N1/09 phylogenies, such that most viruses differ by very few nucleotides even at the genomic scale, estimating the number of independent introductions of this virus into the UCSD community is complex. An upper-bound estimate of the number of independent introductions is provided by the number of phylogenetically distinct and strongly supported UCSD lineages; that is, well-supported clusters of UCSD viruses (>70% bootstrap support in the tree of sequences from the second wave of H1N1/09) that are separated from each other by non-UCSD viruses, as well as singleton lineages. Under these criteria, a total of 33 separate introductions into the UCSD community are apparent. A lower-bound estimate of the number of introductions can be obtained by simply counting the total number of UCSD lineages of H1N1/09 that may be considered phylogenetically distinct and hence indicative of independent entry, irrespective of their bootstrap support, plus singleton lineages. With the use of this second approach, a total of 24 separate introductions can be inferred. Importantly, we observed no clear signal for reassortment either with the use of the RDP3 analysis of concatenated genomes or in the individual HA and NA phylogenies.
Although the HA and NA phylogenies differed topologically from each other and from the whole-genome tree, these differences likely reflect a lack of phylogenetic signal, as bootstrap values in the HA and NA trees were consistently very low, and both trees clearly showed multiple entries of H1N1/09 into UCSD (Fig. S2). As such, concatenating genomic segments as we have done here represents a valid approach for studying the microevolution of H1N1/09.
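The counting logic behind the two bounds can be illustrated with a minimal sketch. It is not the procedure used in the study: the function name, the input layout, and the toy data are hypothetical, and the sketch assumes that the members of a poorly supported cluster are each counted as a separate lineage under the strict (upper-bound) criterion, which is one possible reading of the description above.

```python
from typing import List, Sequence, Tuple

Cluster = Tuple[Sequence[str], float]  # (member names, bootstrap support in %)


def introduction_bounds(
    clusters: List[Cluster],
    singletons: Sequence[str],
    min_support: float = 70.0,
) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the number of introductions.

    Lower bound: every distinct cluster counts once, whatever its support,
    plus one per singleton lineage.
    Upper bound: only clusters with support above `min_support` count once;
    members of weaker clusters are counted individually (an assumption of
    this sketch), plus one per singleton lineage.
    """
    lower = len(clusters) + len(singletons)
    upper = len(singletons)
    for members, support in clusters:
        if support > min_support:
            upper += 1             # one introduction for a supported cluster
        else:
            upper += len(members)  # unsupported: treat each member separately
    return lower, upper


# Hypothetical toy data, unrelated to the UCSD sequences.
toy_clusters = [(["A", "B", "C"], 82.0), (["D", "E"], 45.0), (["F", "G"], 91.0)]
toy_singletons = ["H", "I"]
lower, upper = introduction_bounds(toy_clusters, toy_singletons)
print(f"lower bound = {lower}, upper bound = {upper}")
```

In this toy case the weakly supported pair is the only source of disagreement between the two bounds, which mirrors why poorly resolved parts of a phylogeny widen the interval.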
Fig. 3. Maximum likelihood phylogenetic tree of 1,036 complete genome sequences from the second wave of H1N1/09 influenza A virus. Those viruses sampled from the UCSD community are highlighted in red. The approximate positions of the seven clusters of sequences ...
Visual inspection of the phylogenies revealed that in all cases the sequences most closely related to the UCSD viruses, and hence their probable source populations, were also sampled from the United States, and often from California (although more-precise location information within California was unavailable). This suggests that there has been at least some regional evolution of H1N1/09 within the United States, although the clear bias toward U.S. sequences in the available data from the second wave means that all conclusions here should be drawn with caution. All but one of the UCSD sequences fell into clade 7 of H1N1/09 influenza A virus, which appears to have dominated infections globally during the second wave of H1N1/09 (14). The exception was a single sequence (A/San Diego/INS35/2009) associated with clade 2, which commonly circulated during the first (spring) wave of H1N1/09 virus. Hence, not all early circulating viral lineages went extinct with the arrival of the second wave.
Fig. 4. Maximum likelihood phylogenetic tree of 57 complete genome sequences of H1N1/09 sampled from UCSD. Each of the seven putative UCSD transmission clusters is indicated, with viruses collected from students residing on campus shown in red. Bootstrap support ...
Of equal note was the extensive spatial mixing of H1N1/09 within the UCSD community. In total, we observed seven well-supported clusters of sequences from UCSD students that were compatible with intracommunity transmission (Fig. 3 and 4), although the composition of some of the clusters is hard to define. Strikingly, none of these clusters contained sequences sampled exclusively from on-campus residences. The remaining sequences fell as singletons, such that each may represent an independent entry into the UCSD community.
Of the seven UCSD clusters, three comprised only two sequences, another two contained three sequences, and one cluster comprised four sequences. Of most significance was the observation of a major cluster (no. 6) of 14 sequences that received strong bootstrap support (82% in the tree of sequences from the second wave of H1N1/09) and was composed of many short branches, including some identical sequences (see below), as expected if some of the individuals were linked by direct, or near-direct, transmission. Although the presence of this cluster is strongly suggestive of intra-UCSD transmission, closer inspection of the phylogenetic trees again reveals a complex picture of geographical movement. In particular, there is no clear clustering according to whether individuals were residing in the same dorm or on the east or west part of the campus (see below). For example, sequences A/San Diego/INS02/2009 and A/San Diego/INS05/2009 occupy divergent phylogenetic positions within cluster no. 6, although they were taken from individuals residing in the same dorm and who reported sick on the same day. Such extensive spatial mixing is also apparent from the three identical pairs of genome sequences that were sampled from UCSD. One such pair comprised two sequences (A/San Diego/INS03/2009 and A/San Diego/INS197/2009) sampled off-campus from different ZIP codes in the San Diego region and 9 days apart. A second pair (A/San Diego/INS08/2009 and A/San Diego/INS195/2009) comprised one sequence from an on-campus location and one from an off-campus location, sampled 2 days apart. Most interestingly, a third pair of sequences (A/San Diego/INS02/2009 and A/San Diego/INS06/2009) was sampled on the same day, both on-campus, but from different residences.
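For illustration, pairs of identical genome sequences such as those described above could be located by grouping sequences on their exact content. The sketch below is an assumption-laden example rather than the method used here: it presumes aligned, concatenated genomes of equal length held in a dictionary, and the function name and toy sequences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def identical_pairs(genomes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return all pairs of sequence names whose genomes match exactly.

    Comparison is case-insensitive; ambiguity codes and gaps are treated
    as ordinary characters (a simplification of this sketch).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, seq in genomes.items():
        groups[seq.upper()].append(name)

    pairs: List[Tuple[str, str]] = []
    for names in groups.values():
        # Every unordered pair within a group of identical sequences.
        pairs.extend(combinations(sorted(names), 2))
    return sorted(pairs)


# Hypothetical toy sequences, not real H1N1/09 genomes.
toy_genomes = {
    "seq_1": "ATGGCA",
    "seq_2": "ATGGCA",
    "seq_3": "ATGGTA",
    "seq_4": "atggta",
}
print(identical_pairs(toy_genomes))
```

Each returned pair can then be compared against sampling metadata (date, on- or off-campus residence, ZIP code) to judge whether identical genomes were also close in space and time.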
To obtain a more quantitative measure of the pattern of the geographical structure of H1N1/09 in the UCSD community, we employed a number of phylogeny-trait association statistics, in which the trait in question represents a discrete measure of the home residence of each patient. Accordingly, the strength of clustering was explored according to (i) whether individuals were resident “on” or “off” campus, (ii) the ZIP code of each individual residence, and (iii) the ZIP code of each individual, with a division into those individuals resident in either the east or the west part of the UCSD campus. The results here were striking: in no case did we find a significant association between phylogeny and geography (P > 0.4 in all cases). Hence, there is no more clustering by residence in these data than might be expected from chance alone. In addition, we observed no significant viral clustering by time (week of sampling) under the AI statistic and only a relatively weakly significant result (P = 0.044) under the PS statistic, which was in large part due to a clustering of sequences from week 1 of sampling (12 October to 18 October; P = 0.005), representing the majority of those collected on campus. Finally, we observed no significant phylogenetic clustering by either the age or the sex of the patients in question.
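The idea behind a parsimony score (PS) test of phylogeny-trait association can be sketched as follows. This is a simplified illustration, not the analysis performed in this study: it uses a single fixed binary tree written as nested tuples, counts trait changes with Fitch parsimony, and builds the null distribution by shuffling trait labels among tips. The function names, the toy tree, and the toy traits are all hypothetical.

```python
import random
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a tip name or a pair of subtrees


def fitch_score(tree: Tree, trait: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible states at this node, number of trait changes below it)."""
    if isinstance(tree, str):
        return {trait[tree]}, 0
    left_states, left_changes = fitch_score(tree[0], trait)
    right_states, right_changes = fitch_score(tree[1], trait)
    changes = left_changes + right_changes
    common = left_states & right_states
    if common:
        return common, changes
    # No shared state: one change is required on this part of the tree.
    return left_states | right_states, changes + 1


def ps_permutation_test(
    tree: Tree, trait: Dict[str, str], n_perm: int = 1000, seed: int = 0
) -> Tuple[int, float]:
    """Return the observed parsimony score and a one-sided permutation P value.

    A low score means strong clustering by trait, so the P value is the
    fraction of shuffled data sets with a score at most as large as observed.
    """
    rng = random.Random(seed)
    observed = fitch_score(tree, trait)[1]
    tips = sorted(trait)
    values = [trait[tip] for tip in tips]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if fitch_score(tree, dict(zip(tips, values)))[1] <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy tree of eight tips and an on/off-campus trait.
toy_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
clustered = {t: "on" for t in "abcd"} | {t: "off" for t in "efgh"}
mixed = {t: ("on" if i % 2 == 0 else "off") for i, t in enumerate("abcdefgh")}

print(ps_permutation_test(toy_tree, clustered))
print(ps_permutation_test(toy_tree, mixed))
```

With the perfectly clustered toy trait the score is minimal and the permutation P value is small, whereas the fully interleaved trait gives the maximal score and a P value near 1, which is the kind of pattern reported here for residence.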
Results of phylogeny trait association tests for H1N1/09 genome sequences sampled from UCSD