This work presents the complete genome sequences of the two previously unsequenced major Y. pestis lineages (both designated biovar antiqua under classical nomenclature). Phylogenetic relationships were clearly resolved from the distribution of synonymous SNPs (sSNPs; Fig. ). Because synonymous mutations do not alter the encoded protein (unlike nonsynonymous SNPs [nsSNPs] or some IS element insertions), their accumulation is expected to be largely free of selective pressure, which makes them the least biased basis for inferring evolutionary relationships. The distribution of sSNPs convincingly demonstrates that a single biovar, antiqua, does not accurately represent the phylogeny of these strains, supporting previous proposals to divide biovar antiqua strains into two groups (2). Using the terminology proposed previously (2), lineage 1.ANT (African strain Antiqua) is closely related to orientalis strain CO92, while 2.ANT (Asian strain Nepal516) is more closely related to medievalis strain KIM. These four classical isolates fall on a branch separate from the nonclassical, human-avirulent Chinese strain 91001. The analysis also revealed a relatively rapid divergence of the four distinct lineages from two ancestral lines of classical Y. pestis strains. Although only very crude estimates of the age of these four lineages are possible, the numbers of sSNPs are consistent with all of the lineages having been present during the last 1,500 years, the period spanned by the three great pandemics (calculation not shown).
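Whether a SNP is synonymous depends only on whether the substituted codon still encodes the same amino acid. The Python sketch below illustrates that rule, assuming an in-frame coding sequence and the standard codon-to-amino-acid assignments (shared by NCBI translation tables 1 and 11); the function name `classify_snp` is hypothetical and is not the SNP pipeline used in this study.

```python
# Standard genetic code: codons enumerated with bases ordered T, C, A, G at
# each of the three positions ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution in an in-frame coding sequence.

    cds: coding sequence starting at the first base of the start codon.
    pos: 0-based position of the SNP within cds.
    alt: the alternative base (A, C, G, or T).
    Returns "synonymous" if the encoded amino acid is unchanged, otherwise
    "nonsynonymous" (changes to or from a stop codon are included here).
    """
    cds, alt = cds.upper(), alt.upper()
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("position lies outside complete codons")
    start = pos - pos % 3                  # first base of the affected codon
    offset = pos - start                   # 0, 1, or 2 within the codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"
```

For example, changing the third base of the CTG codon in `"ATGCTG"` to A gives CTA, which still encodes leucine, so `classify_snp("ATGCTG", 5, "A")` returns `"synonymous"`.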
A comparison of all five Y. pestis sequences reveals extensive DNA sequence rearrangement, widespread gene reduction, and strain-specific IS elements as well as SNPs. Y. pestis strains were previously reported to differ greatly in genome synteny, with repeated sequences most often found at the borders of rearrangements (10). Indeed, most rearrangements occur at IS elements, and regardless of which genomes were chosen for pairwise comparisons, we identified numbers of rearrangements similar to those previously observed between Y. pestis genomes ( ) and even between Y. pestis and Y. pseudotuberculosis ( ) (data not shown). It remains unclear whether these rearrangements affect transcription or exert an overall destabilizing influence on the genome.
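This section does not state how rearrangements were counted. One simple, widely used measure is the number of breakpoints between two genomes expressed as orders of conserved blocks. The hypothetical sketch below assumes the genomes have already been reduced to signed block orders (for example, from a whole-genome aligner) and, for simplicity, treats the chromosome as linear.

```python
def adjacencies(order):
    """Adjacent block pairs of a linear genome given as signed block IDs."""
    return list(zip(order, order[1:]))


def count_breakpoints(genome_a, genome_b):
    """Count adjacencies in genome_a that are not conserved in genome_b.

    Blocks are signed integers (negative = reverse orientation). An
    adjacency (x, y) is conserved if genome_b contains (x, y) or its reverse
    complement (-y, -x), i.e. the same junction read from the other strand.
    """
    conserved = set()
    for x, y in adjacencies(genome_b):
        conserved.add((x, y))
        conserved.add((-y, -x))
    return sum(1 for adj in adjacencies(genome_a) if adj not in conserved)
```

For example, `count_breakpoints([1, 2, 3, 4], [1, -3, -2, 4])` returns 2, because inverting blocks 2 and 3 breaks two adjacencies. For a circular chromosome, the adjacency joining the last and first blocks would also need to be included.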
The distribution pattern of IS elements in the sequenced strains generally supports the SNP-derived phylogeny: several IS elements are shared by all classical strains but absent from 91001, and others are found only in the CO92/Antiqua pair or only in the KIM/Nepal516 pair. Only five IS elements shared by two or more strains did not conform to the predicted phylogeny (footnoted in Table ); similar observations have been reported previously (2). Our analyses suggest three possible explanations: a small number of IS elements may have been precisely excised from their insertion sites, identical insertion events may have occurred independently in two different strains or lineages, or limited horizontal transfer between Y. pestis strains may have moved IS elements from one strain or lineage to another (or, alternatively, removed an IS element by introducing the wild-type sequence). One example is an aminotransferase gene (YPO3250) that is disrupted by IS100 in all sequenced Y. pestis strains except Nepal516, which instead carries the intact wild-type gene with no trace of IS100. These data also suggest that certain IS elements may not be useful for typing or grouping strains, which may explain some discrepancies among phylogenetic groupings obtained by different methods.
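Whether an IS element conforms to the predicted phylogeny can be phrased as a set test: if an insertion occurred once and was never excised, the strains carrying it must form a clade of the tree. A minimal sketch, assuming the rooted topology described in the text (91001 branching first, the classical strains splitting into CO92/Antiqua and KIM/Nepal516); `CLADES` and `fits_single_insertion` are hypothetical names:

```python
# Clades of the rooted tree described in the text (assumed topology).
STRAINS = {"CO92", "Antiqua", "KIM", "Nepal516", "91001"}
CLADES = [
    frozenset(STRAINS),
    frozenset({"CO92", "Antiqua", "KIM", "Nepal516"}),  # classical strains
    frozenset({"CO92", "Antiqua"}),
    frozenset({"KIM", "Nepal516"}),
] + [frozenset({s}) for s in STRAINS]                    # single strains


def fits_single_insertion(carriers):
    """True if the strains carrying an IS copy form one clade, i.e. the
    pattern is explained by one insertion with no later excision."""
    return frozenset(carriers) in CLADES
```

Under this rule the YPO3250 IS100 pattern is flagged as nonconforming: `fits_single_insertion({"CO92", "Antiqua", "KIM", "91001"})` returns False, because those four strains do not form a clade once Nepal516 is excluded.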
Interestingly, the entire complement of IS1541 elements (and almost all IS1661 elements) in strain 91001 was acquired by the ancestor of all Y. pestis strains. In contrast, since diverging from the other strains, 91001 has acquired a number of strain-specific IS100 and IS285 elements, supporting the view that IS elements are still actively transposing within the Y. pestis genome. With the exception of Nepal516, IS100 appears to have been more active (a greater number of new transposition events) than other IS elements, but the reason for this is unknown.
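Counting new transposition events per IS family amounts to tallying insertion sites occupied in only one strain. The sketch below assumes that each insertion site has already been matched across genomes, so that every record lists the strains carrying that site; the record format and the function name are hypothetical.

```python
from collections import Counter


def strain_specific_counts(insertions):
    """Count strain-specific insertions per (strain, IS family).

    insertions: iterable of (is_family, carriers) pairs, where carriers is
    the set of strains in which that insertion site is occupied. A site
    occupied in exactly one strain is treated as a transposition that
    occurred after that strain diverged from the others.
    """
    counts = Counter()
    for family, carriers in insertions:
        if len(carriers) == 1:
            (strain,) = carriers          # unpack the single carrier
            counts[(strain, family)] += 1
    return counts
```

Sites present in all five strains (like the IS1541 copies in 91001) contribute nothing here, which matches the reading that they predate the divergence of the strains.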
Functional reduction analysis also generally agrees with the SNP-based phylogenetic tree (Fig. ), as well as with a more limited study that identified the loss of gene regions across a panel of Y. pestis isolates using a CO92 gene-specific microarray (21). As with the IS and SNP data, functional reduction indicates that the four classical strains share an evolutionary path distinct from that of strain 91001, and the KIM/Nepal516 and CO92/Antiqua pairs also share larger numbers of losses of function. The exceptions result from independent mutations: two shared losses between KIM and Antiqua, one between Nepal516 and 91001, eight between Antiqua and 91001, and 16 between CO92 and KIM (see Table and the supplemental material). The two losses shared by KIM and Antiqua affect a putative siderophore biosynthetic enzyme and a putative membrane protein. Their predicted functions suggest that both proteins could be involved in interactions with the environment, so these losses may reflect adaptations to the Y. pestis microenvironment. The single loss shared by Nepal516 and 91001 affects the arabinose operon regulatory protein. Although the losses shared by Antiqua and 91001 involve several genes, all of them lie in the prophage region described above and result from independent deletion events. The losses shared by CO92 and KIM may have arisen from a single deletion event.
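Pairwise shared-loss counts like those above can be derived from per-strain sets of inactivated genes. A minimal sketch, assuming a loss is attributed to a pair only when no other strain shares it; the function name and any gene IDs used with it are hypothetical:

```python
from itertools import combinations


def shared_losses_by_pair(losses):
    """Return, for every pair of strains, the genes lost in exactly that pair.

    losses: dict mapping strain name -> set of gene IDs with loss of function.
    A gene is assigned to a pair only if no other strain has also lost it.
    """
    result = {}
    for a, b in combinations(sorted(losses), 2):
        others = set().union(*(losses[s] for s in losses if s not in (a, b)))
        result[(a, b)] = (losses[a] & losses[b]) - others
    return result
```

Comparing each pair's count with the phylogeny (for example, with a clade test like the one sketched for IS elements) then separates conforming shared losses from the exceptions listed in the text.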
Strain 91001 has the highest number of strain-specific losses of function, 69 in total. Interestingly, all but four of the 91001-specific pseudogenes have homologs with >90% identity in Y. pseudotuberculosis, suggesting that 91001 lost these genes while the virulent Y. pestis strains retained them. These genes may therefore be involved in human virulence and/or fitness in the human host. Some of the inactivated proteins, such as a hemolysin (YPO2045), a sulfatase and sulfatase modifier protein (YPO3046 and YPO3047), a UDP-glycosyltransferase (YPO1985), and an O-unit flippase-like protein (YPO3110), may be related to pathogenicity (see the supplemental material). Hemolysin is a toxin that forms transmembrane channels and is involved in heme utilization and adhesion. The precise function of the sulfatase operon (YPO3046 and YPO3047) in Y. pestis is not known; however, these enzymes belong to a family of proteins that hydrolyze various sulfate esters or catalyze sulfur insertions. In mammalian cells, the oligosaccharide moieties of glycoproteins, glycolipids, and proteoglycans are frequently modified with sulfate. Sulfatase from pathogenic bacteria has been shown to interact with mucin (47), and a previous study suggested that mucin-sulfatase activity in Burkholderia cepacia and Pseudomonas aeruginosa may contribute to their association with airway infection in cystic fibrosis patients, possibly by facilitating bacterial colonization (25). The loss of the sulfatase and sulfatase modifier protein in strain 91001 may therefore have contributed to its human-avirulent phenotype. Finally, O-unit flippases translocate polysaccharide units across the membrane, while UDP-glycosyltransferases such as YPO1985 are typically involved in O-antigen biosynthesis. Because Y. pestis is known to lack O antigen, YPO3110 and YPO1985 may act not on O antigen directly but on other surface polysaccharides.
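The >90% identity criterion depends on how percent identity is defined, which this section does not specify. The sketch below uses one common definition (identical positions divided by alignment columns, ignoring columns that are gaps in both sequences) on a precomputed pairwise alignment; it is an illustration, not the method used in the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two equal-length aligned sequences ('-' = gap).

    Identity = identical non-gap positions / alignment columns, ignoring
    columns where both sequences have a gap. Other definitions exist,
    e.g. dividing by the length of the shorter ungapped sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)
```

A 91001 pseudogene would then count as conserved in Y. pseudotuberculosis when `percent_identity(...) > 90`. For example, `percent_identity("ACGT-A", "ACGTTA")` gives 5 matches over 6 columns, about 83.3%.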
Antiqua also has a high number of strain-specific losses of function: 41 in total, and 31 after discounting deletion events that involved several genes. This correlates with the higher IS100 transposition activity observed in Antiqua, as 13 of the 31 inactivations are caused by IS100 insertions. The Antiqua-specific losses include a substantial number of proteins that interact with the environment, such as glutathione S-reductase, a chemotaxis protein, porin C protein, a potassium efflux pump, an insecticidal toxin, a flagellar motor switch protein, and six membrane proteins of unknown function. One possible explanation is that the genome has been adapting to the niche that Antiqua occupies.
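Moving from gene-level counts (41 for Antiqua) to event-level counts (31) requires collapsing genes removed by the same multigene deletion. A minimal sketch, assuming each inactivated gene is annotated with an event identifier and a cause; the record layout and function name are hypothetical, and reading "discounting" as "counting each multigene deletion once" is an interpretation.

```python
from collections import Counter


def summarize_losses(inactivated_genes):
    """Collapse gene-level losses into inactivation events and tally causes.

    inactivated_genes: list of (gene_id, event_id, cause) tuples; genes
    removed by the same multigene deletion share an event_id.
    Returns (number_of_genes, number_of_events, Counter of causes per event).
    """
    genes = {gene_id for gene_id, _, _ in inactivated_genes}
    event_causes = {}
    for _, event_id, cause in inactivated_genes:
        event_causes.setdefault(event_id, cause)  # one cause per event
    return len(genes), len(event_causes), Counter(event_causes.values())
```

With event-level counts, the IS100 share for Antiqua is 13 of 31 events, roughly 42%.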
Discounting genes lost in a single deletion event, the numbers of KIM-specific (14) and Nepal516-specific (13) losses of function are similar. Surprisingly, only three CO92-specific losses of function were identified. The orientalis biovar may have gained a selective advantage by maintaining a larger gene repertoire, preserving the flexibility to adapt quickly to new hosts. The worldwide distribution of this group and the small number of CO92-specific putative gene inactivations are consistent with this hypothesis. A 31-amino-acid deletion in YPO3937 (473 amino acids) confers the glycerol-negative phenotype of biovar orientalis (33); however, because this deletion fell below the cutoff threshold, it was not counted as a loss of function in our study. Unique to strain CO92 are a hypothetical protein (YPO2469), a hemolysin activator protein (YPO3720), and a prophage, all of which are absent or inactivated in the other sequenced Y. pestis strains and in Y. pseudotuberculosis. These genes may likewise have been retained by CO92 to maintain its ability to interact with a more variable environment.
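The cutoff threshold mentioned above is not defined in this section. Purely as an illustration, the sketch below assumes a rule based on the fraction of the protein removed by an in-frame deletion; both the rule and any threshold value passed to it are hypothetical.

```python
def is_called_loss(deleted_aa: int, protein_length_aa: int,
                   min_fraction: float) -> bool:
    """Hypothetical rule: call loss of function only if an in-frame deletion
    removes at least `min_fraction` of the protein's length."""
    if protein_length_aa <= 0:
        raise ValueError("protein length must be positive")
    return deleted_aa / protein_length_aa >= min_fraction
```

For YPO3937, 31 of 473 amino acids is about 6.6% of the protein, so under a rule of this kind the deletion would be excluded by any threshold above that value, even though it has a known phenotypic effect.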
Unexpectedly, Nepal516 differs from the other sequenced Y. pestis strains in several respects: TufB has apparently lost its function, Nepal516 has far fewer strain-specific SNPs than the other strains (Fig. ), and IS100 has been less active in Nepal516 than in the other strains (Table ). Because nsSNPs and sSNPs are equally reduced, selection is an unlikely cause (it should have little effect on sSNPs); a lower mutation rate is more likely responsible, suggesting that Nepal516 has mutated, or evolved, more slowly. The reason for this is not known; one possible explanation is that fewer rounds of bacterial division occurred, owing to a relatively cool local environment and to hibernation of the host(s), which would have provided fewer opportunities for transmission.
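The argument against selection can be made explicit by comparing, separately for sSNPs and nsSNPs, Nepal516's strain-specific SNP count with the mean count of the other strains. If purifying selection alone were responsible, the nsSNP ratio would be depressed while the sSNP ratio stayed near 1; similar ratios in both classes point instead to a lower mutation rate. A minimal sketch with hypothetical names; no counts from the study are included.

```python
def relative_snp_deficit(focal, others):
    """Ratio of the focal strain's strain-specific SNP count to the mean
    count of the other strains, computed separately for each SNP class.

    focal: dict such as {"sSNP": n, "nsSNP": m}
    others: list of dicts with the same keys, one per comparison strain.
    """
    if not others:
        raise ValueError("need at least one comparison strain")
    ratios = {}
    for snp_class, count in focal.items():
        mean_other = sum(o[snp_class] for o in others) / len(others)
        ratios[snp_class] = count / mean_other
    return ratios
```

Roughly equal sSNP and nsSNP ratios, both well below 1, correspond to the pattern described for Nepal516.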
Despite the observed differences among Y. pestis strains, the sequenced genomes reveal a highly conserved chromosomal backbone, reminiscent of that observed in Bacillus anthracis ( ). Among the five Y. pestis genomes compared here, only a single region, present in strain CO92, was found to be unique (not shared with another Y. pestis genome), although independent studies have shown that this region, which encodes phage genes, is present in most, if not all, 1.ORI strains as well as in some 1.ANT strains (10). We therefore believe that most of the genomic sequence shared among the “classical” Y. pestis isolates is represented in this data set, although sequences of other, nonclassical isolates may harbor novel genomic regions not revealed by these analyses.
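Describing a conserved backbone and genome-specific regions amounts to partitioning regions into those present in every genome (core) and those present in only one. A minimal sketch, assuming regions have already been clustered across genomes into shared IDs; the function name and any region IDs are hypothetical.

```python
def partition_regions(presence):
    """Split genomic regions into core (in every genome) and unique
    (in exactly one genome).

    presence: dict mapping genome name -> set of region IDs it contains.
    Returns (core, unique), where unique maps each genome to its private
    regions.
    """
    genomes = list(presence)
    core = set.intersection(*(presence[g] for g in genomes))
    unique = {}
    for g in genomes:
        others = set().union(*(presence[h] for h in genomes if h != g))
        unique[g] = presence[g] - others
    return core, unique
```

For the five genomes compared here, the text reports that only CO92 would have a non-empty unique set (its phage region).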
The two completed genomes presented here, from the previously unrepresented antiqua biovar, provide important references for SNP discovery and for studying insertion element distribution, genome rearrangement, and reductive evolution in Y. pestis. Comparisons of the four virulent “classical” strains with the human-avirulent strain 91001 have also provided further insight into Y. pestis human virulence. With sSNPs as the preferred basis for resolving phylogenetic relationships, strains Nepal516 and Antiqua were placed on two clearly separate branches, one shared by strains KIM (medievalis) and Nepal516 and the other by strains CO92 (orientalis) and Antiqua. While IS element distributions and losses of function across the strains generally agreed with this phylogeny, certain exceptions were found; these are thought to result from a lack of selective pressure in the niches inhabited by Y. pestis strains, from possible horizontal gene exchange between Y. pestis strains, or from homoplasy in the reductive processes. Although there is some evidence of convergent evolution, whether it is the primary mechanism underlying the observed discrepancies remains to be investigated. The Y. pestis genome is a clear example of a genome actively undergoing reductive evolution, as the organism's lifestyle has shifted from that of an enteropathogen to that of an intracellular pathogen. The genome has slowly accumulated inactivations and deletions that cause loss of function but that, in the virulent strains (all strains except 91001), appear to have little effect on pathogenicity. The differences between these strains and the human-avirulent 91001 provide an ideal starting point for future experiments to elucidate the mechanisms of Yersinia pathogenicity.