Pandemic infectious diseases have accompanied humans since their origins1, and have shaped the form of civilizations2. Of these, plague is possibly historically the most dramatic. We reconstructed historical patterns of plague transmission through sequence variation in 17 complete genome sequences and 933 single nucleotide polymorphisms (SNPs) within a global collection of 286 Yersinia pestis isolates. Y. pestis evolved in or near China, and has been transmitted via multiple epidemics that followed various routes, probably including transmissions to West Asia via the Silk Road and to Africa by Chinese marine voyages. In 1894, Y. pestis spread to India and radiated to diverse parts of the globe, leading to country-specific lineages that can be traced by lineage-specific SNPs. All 626 current isolates from the U.S.A. reflect one radiation and 82 isolates from Madagascar represent a second. Subsequent local microevolution of Y. pestis is marked by sequential, geographically-specific SNPs.
‘Plague’ designates a human invasive disease caused by Yersinia pestis that is usually fatal without antimicrobial treatment. However, most of the primary hosts for Y. pestis are rodents of various species, in which sylvatic cycles of disease depend on transmission by species-specific flea vectors3. Epidemic expansions of endemic sylvatic disease can result in epidemics and global pandemics of human plague4. Europe was devastated by Justinian's plague (541-767) and the Black Death (1346-18th century)5,6, which also ravaged China7. Plague resurged in China in 1894, spreading from Yunnan province to Hong Kong, and then via marine shipping to diverse global destinations, including India, Europe, Africa and the Americas7. However, direct genetic insights into these historical events were lacking until now.
Y. pestis was subdivided into biovars Orientalis, Medievalis, Antiqua and Pestoides on the basis of nutritional properties8. Genetically, Y. pestis is a monomorphic clone of its more diverse parental species, Yersinia pseudotuberculosis9. Only 76 synonymous SNPs were found in 3,250 orthologous coding sequences from the first three Y. pestis genomes10. These SNPs defined a tree consisting of an ancestral branch 0 and derived branches 1 and 2. These branches contained populations with distinctive geographic patterns, designated 1.ORI (Orientalis), 2.MED (Medievalis), 1.ANT (African Antiqua) and 2.ANT (East Asian Antiqua). Pestoides/Microtus were assigned to four other populations, 0.PE1 through 0.PE4. Here we present a global overview of the phylogeographic diversity of Y. pestis and reconstruct historical patterns of plague transmission.
We compared non-repetitive core genomes of 17 isolates of Y. pestis, including eleven that were sequenced for this project (Supplementary Table 1), and found that the phylogenetic patterns of synonymous, non-synonymous or intergenic SNPs were almost identical. The genomes tree in Fig. 1 is based on 1,364 SNPs in coding regions that were present in all genomes, and was dated using an endemic molecular clock rate based on isolates from Madagascar (Supplementary Table 2). We performed additional mutation discovery in up to 185 kb with 370 isolates from diverse sources, and screened 286 isolates for 933 SNPs that were discovered by the combination of genomics and mutation discovery (Supplementary Fig. 1). The resulting SNP assignments were used to calculate a minimal spanning tree (MSTree) and assign isolates to populations (Fig. 2). The phylogenetic tree and the MSTree share an identical branching order, which is fully parsimonious and reflects unidirectional, clonal evolution from the root to the tips, thus allowing deductions about historical waves of Y. pestis transmission.
A particularly striking aspect of the MSTree is the strong geographical clustering of populations (Supplementary Fig. 2). Isolates from China are scattered in multiple populations over ancestral branch 0, which evolved >2,600 ya, as well as branches 1 and 2, which split from branch 0 at least 728 ya. At the base of branch 0, Chinese populations 0.PE4 and 0.PE7 are intermingled with populations 0.PE1 through 0.PE3 from the Former Soviet Union (FSU) (Fig. 2). Isolates from outside China or the FSU were only found on branches 1 (1.ORI, 1.ANT) and 2 (2.ANT, 2.MED), suggesting that plague originated in China or the FSU. Consistent with a Chinese source, the average phylogenetic diversity among isolates was greater in China than within other countries (99% bootstrap confidence intervals: China: [0.23 ; 0.32]; elsewhere: [0.029 ; 0.058]; Welch t-test: p < 2.2×10^−16). The genomes tree suggests that 0.PE2 (FSU) is the oldest population but 0.PE7, isolated in China, is at least as old according to SNP typing, and considerably older according to genomic sequencing (R. Yang, pers. comm.). Our observations thus suggest that Y. pestis evolved in China and spread to other areas again and again.
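The bootstrap intervals quoted above can be illustrated with a percentile bootstrap of a mean. This is only a minimal sketch: the function name, the resampling unit (a flat list of diversity values) and the toy numbers are assumptions made for illustration, and the procedure actually used for the published intervals is not described in this text.

```python
import random


def bootstrap_mean_ci(values, level=0.95, n_boot=2000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    `values` is assumed to be a list of diversity measurements; the
    resampling unit used in the paper is not stated here.
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_boot)]
    upper = means[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper


# Hypothetical toy values, not data from the paper.
toy_diversity = [0.21, 0.25, 0.30, 0.27, 0.33, 0.24, 0.29]
ci_low, ci_high = bootstrap_mean_ci(toy_diversity, level=0.99)
```

Non-overlapping intervals for two groups, as reported for China versus elsewhere, are what such a comparison would look for.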
Population 3.ANT, at the end of branch 0 (Fig. 2), is as old as branches 1 and 2 and represents a fourth branch, branch 3, which is apparently restricted to China and Mongolia (R. Yang, pers. comm.). More recently, >545 ya, branch 2 split into 2.ANT and 2.MED. This evolutionary separation probably occurred in China with subsequent transmission by land to other areas: all isolates in 2.ANT3 and 2.ANT2 were from China whereas the terminal nodes in 2.ANT1 were isolated in neighboring Nepal (Fig. 2). Similarly, all isolates in 2.MED3 and 2.MED2 were from China whereas the terminal nodes in 2.MED1 were from Kurdistan. Isolates in 1.ANT are restricted to East and Central Africa, which also requires long distance travel if Y. pestis evolved in China. The next population on Branch 1, 1.IN, consists of three sub-populations (Fig. 2) in Western and Southern China (Supplementary Figs. 2-3).
The youngest population on branch 1, 1.ORI, evolved >210 ya and spread globally via multiple independent radiations during the third pandemic (Supplementary Fig. 3E). The earliest node in 1.ORI gave rise to three sub-branches: 1.ORI1, 1.ORI2 and 1.ORI3. 1.ORI1 reached the U.S.A. 1.ORI2 refers to multiple radiations (iii through ix) that reached Europe, South America, Africa and Southeast Asia (Table 1). 1.ORI3 spread to Madagascar and Turkey.
We postulate that sub-branch 1.ORI1 evolved in China because its ancestral node, 1.ORI1.a, contains one isolate from China plus two from Indonesian Java (Supplementary Fig. 4). The next node along the 1.ORI1 sub-branch (1.ORI1.d) contains multiple isolates from northern India, Hawaii and the vicinity of Los Angeles, California. The subsequent evolutionary path in the U.S.A. is marked by five strains isolated since 1939 in central California (Supplementary Fig. 4, red, yellow). All 626 other isolates from diverse sources in the western U.S.A. belonged to descendent nodes, derived from the red and yellow nodes. Thus, all extant Y. pestis in the U.S.A. seem to be derived from a single import.
We further postulate that sub-branch 1.ORI3 spread from India to Madagascar because its ancestral node, 1.ORI2.a, contains two isolates from India (Supplementary Fig. 5), one of which (strain 195) was isolated in Bombay in the same year (1898) that plague was imported to Madagascar via a plague ship from India11. All 82 strains that were isolated in Madagascar over a period of 80 years fell into two Madagascar-specific clusters, radiating from node 1.ORI3.k (blue) or its derived node, 1.ORI3.d (red) (Supplementary Fig. 5).
The 1.ORI3.k cluster already existed by 1926, in which year EV76, a widely used attenuated live vaccine strain within the blue cluster, was isolated12. Other members of that cluster were isolated from diverse geographical sources, including highlands and coastal regions. In contrast, the 1.ORI3.d cluster was restricted to a smaller area in the highlands near Fianarantsoa. Plague first began here in 1933 11 and the oldest member of the 1.ORI3.d cluster is from 1939, suggesting that 1.ORI3.d evolved between 1933 and 1939. Three descendents of 1.ORI3.d were isolated in Turkey.
Clonal microevolution within Y. pestis allows inferences about its evolutionary history, especially when placed in the context of the geographic sources of the isolates and historical records regarding waves of transmission. One important conclusion is that Y. pestis probably evolved in China. Isolates from China are scattered over all four phylogenetic branches and average phylogenetic diversity among isolates is greater within China than other countries. Subsequently, Y. pestis has spread from China to other areas on multiple occasions since the origins of branch 0. For example, we infer that Y. pestis on branch 0 spread on multiple occasions to Mongolia, Siberia and central regions of the FSU, which is the most parsimonious explanation for the isolation there of strains with related microsatellite (MLVA) patterns13.
Dates of several branching events postdate historical events with which they might be associated, for reasons that are explored in the Supplementary Note. Genotypes from the Black Death in Europe dating to the mid-14th century (~610 ya)14 map at or near the split between branches 1, 2 and 3, which occurred >728 ya. The geographical sources and evolutionary branch order of 2.MED subpopulations, which arose >545 ya, correspond with points along the former Silk Road15 (Supplementary Fig. 3B), an extensive trade route from China to Western Asia between 200 BC and AD 1400. Other 2.MED1 isolates have been found in Western China (R. Yang, pers. comm.), as well as Kazakhstan and the Caucasus13, which supports the westward spread of 2.MED from China via trade articles that were carried along the Silk Road.
We also invoke extensive spread of Y. pestis for the 1.ANT1 to 1.ANT3 populations that have only been isolated from East and Central Africa. The estimated age of 1.ANT1 (628-6,914 ya) slightly predates the extensive voyages from China led by Zheng He between 1409 and 1433 (Supplementary Fig. 3A). These voyages involved up to 300 ships, up to ten times larger than those of contemporary European explorers, and carrying ~28,000 crewmen16. It seems highly likely that these ships were infested by rats, which could have transmitted Y. pestis from China to Africa. The geographic locations of 1.ANT isolates are consistent with the terminus of Zheng He's route, and suggest progressive evolution during migration from the coast. However, a causal association between 1.ANT in Africa and the voyages by Zheng He remains an unproven hypothesis. Plague may have been introduced to East Africa by an alternative route, such as the limited contacts between East Africa and China facilitated by Arab traders.
The third plague pandemic initially spread from Yunnan to Hong Kong5-7 prior to global dissemination in 1894. This pandemic was caused by Y. pestis isolates from the youngest population on branch 1, 1.ORI, which evolved >210 ya. As expected, multiple 1.ORI isolates were found in China, including the oldest node of 1.ORI1. Subsequent global dispersion during the third pandemic was associated with multiple independent radiations of sub-branches 1.ORI1, 1.ORI2 and 1.ORI3.
1.ORI1 spread to northeast India from which six 1.ORI1.d strains were isolated. These may have been associated with a major epidemic in 1899 in Calcutta (now Kolkata) which is thought to have been infected from Hong Kong by 1896 5. Historical records also document that plague was imported to the U.S.A. in 1899 via a plague ship from Hong Kong that docked in Hawaii and then in San Francisco17. Plague broke out soon thereafter in both Hawaii (December, 1899) and San Francisco (March, 1900). Our data pinpoint the origin of modern plague in the continental U.S.A. to California as all extant Y. pestis in the U.S.A. are the progeny of 1.ORI1.d, which was isolated three times in the vicinity of Los Angeles (Supplementary Fig. 4) where plague-infected squirrels were observed by 1910 17. 1.ORI1.d was also isolated in Hawaii, and its ancestor 1.ORI1.a was isolated in China, which is consistent with the historical records. The subsequent evolutionary path in the U.S.A. is marked by two descendent nodes containing five strains in central California. All 626 other isolates from diverse sources in the western U.S.A. are descendents of those nodes. Thus, all extant Y. pestis in the U.S.A. are derived from a single import, possibly corresponding to bacteria introduced to San Francisco in 1899 that then spread to Los Angeles.
1.ORI2.a, a second descendent of 1.ORI1.a, probably also evolved in China because its descendent radiation ix spread from the Chinese-Vietnamese border to southern Vietnam and Burma and back to China (Supplementary Fig. 3E inset). 1.ORI2.a strain 195 was isolated from Bombay in 1898, which is compatible with historical records showing that Bombay was infected by plague in 1896 via a ship from Hong Kong18. But 1.ORI2.a was also the parent of multiple other radiations that reached Europe, South America, western and southern Africa and Southeast Asia (Table 1; Supplementary Fig. 3E). For example, radiation viii to Hamburg in Germany and Argentina probably originated in India because plague was imported into Argentina in 1899 from Uruguay by a rice ship from India via Rotterdam19. Several ships carrying plague-infected rats docked in Hamburg soon after 1894 20 but did not cause recorded cases of human plague there.
1.ORI2.a also gave rise to sub-branch 1.ORI3 that reached Madagascar. Our data are consistent with one single successful import event from India into Madagascar in 1898, which then differentiated further within Madagascar, finally reaching the highlands in 1921 11, where it remains endemic. Alternatively, the original import in 1898 has no extant descendants, and 1.ORI3.k was imported after 1898 but before 1921. Still other scenarios invoking independent imports of the blue and red clusters (Supplementary Fig. 5) after microevolution outside Madagascar are less likely, because they would need to account for the restricted geographical specificity of the red cluster within Madagascar. A descendent of 1.ORI3 spread from Madagascar to Turkey because three isolates from Turkey are descendents of nodes that likely evolved within Madagascar. Historical records document two cases of human plague from Madagascar that reached the Middle East in 1931 21, which may have been the time period in which transmission to Turkey occurred.
In summary, we present a phylogeny of Y. pestis that covers a large part of its global evolutionary history since an origin in the vicinity of China. We also provide a postulated historical reconstruction for major migrations from East Asia to other continents. The phylogeny of this genetically monomorphic clone is based on an unambiguous reconstruction of the sequential accumulation of approximately 1,000 SNPs that have accumulated in different branches during that phylogenetic history. This extensive SNP-based framework will facilitate future investigations of under-sampled regions, such as Africa and the former Soviet Union, for which details are still lacking. It will also help to elucidate the basis of historical pandemics such as Justinian's plague and the Black Death through ancient DNA studies. This study thus provides a basis for more detailed analyses as well as a general paradigm for the reconstruction of historical pandemics.
We investigated bacteria from various geographical sources, including 92 isolates from global origins and 98 from China that represent the genetic diversity revealed by biotyping, ribotyping24 and large deletions25 as well as country-specific isolates from Madagascar (82 isolates; Supplementary Table 3) and the U.S.A. (651; Supplementary Table 4). We also tested eight Pestoides isolates10, including strain Angola, for which no information is available about source other than its name.
We performed genomic resequencing on 11 Y. pestis isolates, four from China (B42003004, E1979001, F1991016, K1973002)26, two from Madagascar (MG05-1020, IP275), one from Uganda (UG05-0454), one from Turkey (IP674), one of ambiguous source (Angola)27 and two from the U.S.A. (CA88 28, FV-1 29) (Supplementary Table 1) in order to expand the diversity of genomic polymorphisms beyond that of published genomic sequences28-35. Details of the epidemiological sources and other properties of the four unpublished isolates are:
MG05-1020. Biovar Orientalis. Isolated in 2005 in Antananarivo, Madagascar from an 8 year old male with bubonic plague. MG05-1020 expresses the F1 capsule and is sensitive to Y. pestis-specific bacteriophage as well as chloramphenicol, trimethoprim/sulfamethoxazole, ciprofloxacin, gentamicin, streptomycin, and doxycycline.
Biovar Antiqua. UG05-0454 was isolated in 2004 in Arua, Uganda from a 10 year old female with bubonic plague. It expresses the F1 capsule and is sensitive to Y. pestis-specific bacteriophage, chloramphenicol, trimethoprim/sulfamethoxazole, ciprofloxacin, gentamicin, streptomycin, and doxycycline. 3.2×10^3 cfu kill mice within four days.
IP674. Biovar Orientalis. Isolated in Turkey in 1952.
In order to avoid phylogenetic discovery bias38, we compared the genomes of 17 Y. pestis isolates (Supplementary Table 1), which represent all known biovars, multiple populations from each of the three known branches, and representatives of novel populations from China. Synonymous, non-synonymous and intergenic SNPs were extracted from the genomic comparisons, after annotating and excluding potentially repetitive, mobile or hyper-variable regions (Supplementary Table 5).
These comparisons were supplemented with SNP discovery in up to 185 kb by denaturing High Performance Liquid Chromatography (dHPLC)39 with isolates from the different collections. The SNPs discovered within coding regions of isolates from Madagascar were used to calculate minimal and maximal estimates of a mutational clock rate (Supplementary Fig. 6), which were then used to estimate ranges of dates for the branches in Fig. 1 (Supplementary Table 2B).
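The way minimal and maximal clock rates translate into a range of dates can be sketched with strict-clock arithmetic. This is a simplified illustration under stated assumptions, not the dating procedure behind Supplementary Table 2B: the function name, the per-site-per-year rate units and all numerical values are hypothetical.

```python
def age_range(snps_on_path, sites, rate_min, rate_max):
    """Return (youngest, oldest) age in years under a strict molecular clock.

    snps_on_path: number of SNPs accumulated along the path being dated
    sites:        number of nucleotide sites that were compared
    rate_min/max: assumed mutations per site per year (hypothetical units)
    """
    if not 0 < rate_min <= rate_max:
        raise ValueError("rates must satisfy 0 < rate_min <= rate_max")
    if sites <= 0 or snps_on_path < 0:
        raise ValueError("sites must be positive and SNP count non-negative")
    # A faster clock needs less time to accumulate the same number of SNPs,
    # so the maximal rate yields the youngest age and vice versa.
    youngest = snps_on_path / (rate_max * sites)
    oldest = snps_on_path / (rate_min * sites)
    return youngest, oldest


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
youngest, oldest = age_range(10, 1_000_000, rate_min=1e-8, rate_max=1e-7)
```

Because the minimal rate gives the oldest date and the maximal rate the youngest, uncertainty in the rate maps directly onto wide date ranges such as the one quoted for 1.ANT1.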
Finally, 286 isolates were screened by Sequenom MassArray SNP typing for 933 SNPs that had been identified by genomic comparisons and/or SNP discovery, resulting in a minimal spanning tree (MSTree) of clustered nodes (Fig. 2, Supplementary Fig. 7, Supplementary Table 6). Pestoides isolates were also assigned to this MSTree by typing selected SNPs (Supplementary Table 7). SNP typing identified several homoplastic sites and sequencing errors in the genomic sequences, and showed that several isolates represented cross-contamination with vaccine strain EV76 (Supplementary Table 3), which were all excluded. Thereafter, clustered nodes in the MSTree were assigned to individual populations and subpopulations (Supplementary Table 8, Supplementary Fig. 7).
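The idea of a minimal spanning tree over SNP profiles can be sketched with Prim's algorithm on pairwise SNP differences. This is a generic illustration and not the MSTree algorithm implemented in Bionumerics, which may apply additional priority rules; the function names and the toy profiles are assumptions.

```python
def hamming(a, b):
    """Number of SNP positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of (node_u, node_v, snp_distance).

    profiles: {node_name: string of 0/1 SNP calls} (toy representation).
    Ties are broken by node name so that the output is deterministic.
    """
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge joining the current tree to a node outside it.
        weight, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in sorted(in_tree)
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, weight))
        in_tree.add(v)
    return edges


# Hypothetical toy profiles: 0 = ancestral call, 1 = derived call.
toy_profiles = {"root": "0000", "x": "1000", "y": "1100", "z": "0001"}
tree_edges = minimum_spanning_tree(toy_profiles)
```

In this toy case the edge weights equal the numbers of SNP differences between linked nodes, which is how branch lengths in the MSTree are interpreted.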
Isolates IP275, MG05-1020 and UG05-0454 were sequenced at the J. Craig Venter Institute by random whole genome shotgun sequencing and closure strategies40. Plasmid (pHOS2) and fosmid (pCC1fos) libraries were constructed for each isolate with insert sizes of 4-6 kb and 30-40 kb, respectively. An average of 63,451 high quality Sanger reads were generated on ABI3730xl as previously described41 and assembled using the Celera assembler42.
IP674 was sequenced at the Wellcome Trust Sanger Institute, using 454 FLX pyrosequencing, and assembled (454/Roche Newbler) into ~700 contigs (N50 contig size: 13,799 bp) from 784,705 sequence reads of average length 450 bp. Putative SNPs were confirmed by DNA sequencing.
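For readers unfamiliar with the N50 statistic quoted for the IP674 assembly, the following minimal sketch computes it under the common definition (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length). The function name and the toy contig lengths are illustrative assumptions, not the IP674 data.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs at least as long as the N50 together cover at least
    half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length


# Hypothetical toy assembly of five contigs (total length 300).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```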
Chromosomal genomic sequences of Y. pestis (16 sequences) and Y. pseudotuberculosis IP32953 (1 sequence; outgroup) (Supplementary Table 1) were aligned against the well-annotated genome of strain CO92 33 using Kodon (Applied Maths, Belgium) in order to identify non-repetitive SNPs. We used the alignments to identify and exclude all repetitive regions because these can lead to pseudo-SNPs due to faulty alignments or to gene conversion by recombination, resulting in homoplasies43,44. We excluded microsatellites (VNTRs), IS elements, bacteriophages, homo- and hetero-polymeric repeats, and duplications (the largest of which, DR1/DR2, was 12.1 kb). Additional potential repetitive regions and/or regions that might be under strong diversifying selection were identified by examining 31 bp flanking each potential SNP for three or more polymorphic sites across the 17 Y. pestis genomes. Additional repetitive regions were identified by reversed best hit Fasta searches for duplicated regions containing putative SNPs. These procedures excluded 388 kb (8.3%) from the ~4.65 Mb CO92 genome (Supplementary Table 9).
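The flanking-window screen for clustered polymorphisms can be sketched as follows. The sketch assumes that the window extends 31 bp on each side of the focal SNP and that the focal SNP counts towards the threshold of three sites; the text does not specify either point, so both are exposed as parameters. The function name and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right


def flag_dense_snps(positions, flank=31, min_sites=3):
    """Return the set of SNP positions lying in a polymorphism-dense window.

    A position is flagged when the interval [pos - flank, pos + flank]
    contains at least `min_sites` polymorphic sites, the focal site included.
    """
    ordered = sorted(set(positions))
    flagged = set()
    for pos in ordered:
        # Binary search for the first and last site inside the window.
        first = bisect_left(ordered, pos - flank)
        last = bisect_right(ordered, pos + flank)
        if last - first >= min_sites:
            flagged.add(pos)
    return flagged


# Hypothetical genome coordinates: three SNPs cluster near position 100.
toy_positions = [100, 110, 125, 500, 900, 910]
dense = flag_dense_snps(toy_positions)
```

With these toy coordinates, positions 100, 110 and 125 are flagged, whereas the isolated site at 500 and the pair at 900 and 910 are retained.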
We also excluded all SNPs that were exclusive to the FV-1 genome, because that genome was suspected to contain many sequencing errors, as well as all SNPs that were exclusive to strain Angola. Angola contains >708 genome-specific SNPs, which is extraordinarily high for a strain of Y. pestis, and no other isolate was closely related to Angola according to dHPLC. Finally, we excluded SNPs in 1,000 regions spanning ~600 kb that were lacking in one or more major branches in the tree.
Independent lists of SNPs in non-repetitive regions were also generated with the nucmer module of MUMmer45 from pair-wise alignments to CO92 of 16 Y. pestis genomes (excluding FV-1). Differences between the Kodon and MUMmer results were resolved by manual inspection. The remaining SNPs were combined with SNPs detected by dHPLC mutation discovery, resulting in a total of 1,232 biallelic SNPs that were considered suitable for genotyping analyses (Supplementary Fig. 1). For each SNP, the ancestral state was assigned to the nucleotide present within Y. pseudotuberculosis IP32953 and the derived state was assigned to the alternative nucleotide found in Y. pestis.
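The outgroup-based assignment of ancestral and derived states can be sketched as below. The function name, the treatment of missing calls ("N" and "-") and the decision to return None for sites that cannot be polarized are assumptions made for illustration.

```python
def polarize(outgroup_base, pestis_bases):
    """Return (ancestral, derived) for a biallelic SNP, or None.

    outgroup_base: nucleotide at this site in the outgroup
                   (Y. pseudotuberculosis IP32953 in the text above)
    pestis_bases:  nucleotides observed at this site among Y. pestis genomes
    """
    # Ignore missing calls; these symbols are an assumed encoding.
    alleles = {base for base in pestis_bases if base not in ("N", "-")}
    # Only biallelic sites whose outgroup state matches one allele
    # can be polarized unambiguously.
    if len(alleles) != 2 or outgroup_base not in alleles:
        return None
    derived = (alleles - {outgroup_base}).pop()
    return outgroup_base, derived


# Hypothetical site: the outgroup carries A, one Y. pestis genome carries G.
state = polarize("A", ["A", "A", "G", "N"])
```

Sites at which the outgroup base matches neither Y. pestis allele are left unresolved here rather than guessed.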
Thirteen SNPs on branch 1.ORI1 were screened (Supplementary Fig. 4) with Y. pestis DNAs from India (N = 2), Hawaii (N = 2), and diverse sources in western states of continental U.S.A. (N = 634) (Supplementary Table 4). SNPs s34 and s59 were screened using TaqMan assays, as described46. SNPs s1076, s1086 and s1135 were screened using similar, newly designed TaqMan assays. SNPs s691, s729, and s985 were screened using Sequenom MassArray as described above. SNPs s57, s58, s60, s274 and s429 were screened using the melt-MAMA approach47 with newly designed primers (Supplementary Table 10).
Strains Pestoides A, B, C, D, E and G were tested for 39 SNPs specific for the beginning of branch 0 by the melt-MAMA approach47 (Supplementary Table 10). Those results were combined with published sequencing results10 to provide the SNP calls in Supplementary Table 8. The locations of these isolates in Fig. 2 and Supplementary Fig. 7 reflect the following conclusions: Pestoides E and G in 0.PE2.b share 12/13 derived SNPs with Pest-F, confirming that 0.PE2.b is closely related to 0.PE2.a10, and Pestoides A, B, C and D in population 0.PE1 share 6/10 derived SNPs with 0.PE4 isolates. We also re-confirmed10 that strain Nich51 from the FSU is in 1.ORI by melt-MAMA and PCR tests (Supplementary Table 10) and that it differs from all other known 1.ORI isolates by containing an intact glpD gene.
The merged SNP data were stored as a character set in Bionumerics 5.1 (Applied Maths, Belgium) and depicted as an MSTree whose branch lengths reflect the numbers of SNP differences between pairs of nodes. However, missing data are interpreted by Bionumerics as equivalent to 0, which leads to artificial nodes and branches due to apparent homoplasies. We therefore assigned each isolate to a node on the basis of unambiguous SNP calls. Where such an assignment was ambiguous due to missing data, the ambiguity was resolved by sequencing (Supplementary Table 11). After unambiguously assigning each isolate to a node, we arbitrarily replaced all remaining missing data for that isolate by the SNP calls that were characteristic of other members of the same node. This strategy is justified because different SNP calls in other parts of the phylogenetic tree would correspond to homoplasies, which are exceedingly rare in Y. pestis.
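The replacement of missing calls after node assignment can be sketched as follows. The data layout (lists of 0, 1 or None per isolate) and the function name are assumptions, and the sketch fills a missing call only when the other members of the node agree on it, which is a cautious simplification of the strategy described above.

```python
def fill_missing(calls_by_isolate, node_of):
    """Replace missing SNP calls with the call shared by the isolate's node.

    calls_by_isolate: {isolate: [0, 1 or None, ...]}, None meaning missing
    node_of:          {isolate: node identifier}
    Returns a new dictionary; the inputs are not modified.
    """
    members = {}
    for isolate, node in node_of.items():
        members.setdefault(node, []).append(isolate)

    filled = {}
    for isolate, calls in calls_by_isolate.items():
        peers = [calls_by_isolate[p]
                 for p in members[node_of[isolate]] if p != isolate]
        new_calls = []
        for i, call in enumerate(calls):
            if call is None:
                observed = {peer[i] for peer in peers if peer[i] is not None}
                # Fill only when the other node members agree on a call.
                call = observed.pop() if len(observed) == 1 else None
            new_calls.append(call)
        filled[isolate] = new_calls
    return filled


# Hypothetical toy data: isolates a, b and c share a node, d stands alone.
toy_calls = {"a": [0, 1, None], "b": [0, 1, 1], "c": [None, 1, 1], "d": [1, None, 0]}
toy_nodes = {"a": "n1", "b": "n1", "c": "n1", "d": "n2"}
completed = fill_missing(toy_calls, toy_nodes)
```

An isolate that is the only member of its node, such as d in the toy data, keeps its missing calls, which corresponds to the cases that were resolved by sequencing.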
36 SNPs were considered to represent homoplasies because the same nucleotide change, as confirmed by direct sequencing, was found in at least two independent branches of the MSTree (Supplementary Table 12). 10 additional SNPs were scored as putative homoplasies; in those cases, sequence confirmation was not performed because many isolates had missing Sequenom data.
Strain Angola evolved prior to strain 91001 according to four SNPs but an independent fifth SNP (s595) indicated that 91001 evolved earlier (Supplementary Table 13). On the basis of the former four SNPs we decided that s595 represents a homoplasy that had occurred in strain Angola and excluded it from further analyses.
We gratefully acknowledge technical assistance by Roxanne Nera and Adina Doyle and helpful comments by Andrew Rambaut and Daniel Falush. Support was provided by grants from the German Army Medical Corps (MSAB15A013) and the Science Foundation of Ireland (05/FE1/B882) to M.A., the National Key Program for Infectious Diseases of China (2008ZX10004009) and the State Key Development Program for Basic Research of China (2009CB522600) to R. Y., and the US Department of Homeland Security (NBCH2070001; HSHQDC-08-C00158) and National Institutes of Health (AI065359) to P.K. and D.M.W. Whole genome sequencing of Y. pestis strains IP275, MG05-1020 and UG05-0454 was supported by federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services (N01 AI-30071) and sequencing of IP674 was supported by funding for Sanger Institute Pathogen Genomics by the Wellcome Trust. Genomic DNA of Y. pestis MG05-1020 was kindly provided by Scott Bearden and Martin Schriefer (Centers for Disease Control and Prevention, Fort Collins, CO).
Database accessions. Genomic sequences have been deposited under the accession codes listed in Supplementary Table 1. The Sequenom results are available under accession number E-MTAB-213 at http://www.ebi.ac.uk
Supplementary Information including Supplementary Figures 1-5, Supplementary Tables 1-2 and a Supplementary Note is linked to the online version of the paper at www.nature.com/nature. Supplementary Fig. 1 summarizes the main approach of this paper. Supplementary Figs. 6-13 and Tables 3-18 can be found at http://research.ucc.ie/NG1/index.html
Reprints and permissions information is available at www.nature.com/reprints.