Yersinia enterocolitica is a heterogeneous bacterial species with a wide range of animal reservoirs through which human intestinal illness can be facilitated. In contrast to the epidemiological pattern observed in the United States, infections in China present a pattern similar to those in European countries and Japan, wherein “Old World” strains (biotypes 2 to 5) are prevalent. To gain insights into the evolution of Y. enterocolitica and pathogenic properties toward human hosts, we sequenced the genome of a biotype 3 strain, 105.5R(r) (O:9), obtained from a Chinese patient. Comparative genome sequence analysis with strain 8081 (1B/O:8) revealed new insights into Y. enterocolitica. Both strains have more than 14% specific genes. In strain 105.5R(r), putative virulence factors were found in strain-specific genomic pathogenicity islands that comprised a novel type III secretion system and rtx-like genes. Many of the loci representing ancestral clusters, which are believed to contribute to enteric survival and pathogenesis, are present in strain 105.5R(r) but lost in strain 8081. Insertion elements in 105.5R(r) have a pattern distinct from those in strain 8081 and were exclusively located in a strain-specific region. In summary, our comparative genome analysis indicates that these two strains may have attained their pathogenicity by completely separate evolutionary events, and the 105.5R(r) strain, a representative of the Old World biogroup, lies in a branch of Y. enterocolitica that is distinct from the “New World” 8081 strain.
Yersinia enterocolitica is an enteropathogen transmitted by consumption of contaminated food or water (3), outbreaks of which have been reported worldwide. Y. enterocolitica has established a broad base of animal reservoirs; however, colonization in swine presents the most robust capacity for transmission to humans (22, 32), and dogs have been implicated as a potentially significant source in rural families (43). Human infection is primarily localized to the gastrointestinal tract, where a wide range of syndromes can emerge in response to the bacteria's presence, including enteritis, enterocolitis, mesenteric lymphadenitis, and terminal ileitis (6). Moreover, some of these syndromes can rapidly advance from self-limiting manifestations to lethality (3). The pathogenic process of Y. enterocolitica is characterized by the bacteria translocating through the intestinal epithelium to attain residence in the Peyer's patches, which represent a less hostile environment and are conducive to unimpeded bacterial proliferation (16, 36). Translocation of Y. enterocolitica's toxic virulence effectors into proximal host cells is mediated by the bacteria's type III secretion system (T3SS), in conjunction with other transporter systems, and this event plays an essential role in establishing infection (30).
Y. enterocolitica has been classified as a heterogeneous collection of organisms consisting of six distinct biotypes and 60 serotypes. The six biotypes were delineated according to their pathogenic properties: nonpathogenic biotype 1A, weakly pathogenic biotypes 2 to 5 (“Old World”), and highly pathogenic and mouse-lethal biotype 1B (“New World”). Eleven of the 60 serotypes have been associated with clinical illness in humans (2). In particular, the serotype O:3, O:8, and O:9 strains expressing virulence factors are considered causative agents of yersiniosis. Interestingly, the predominant bioserotypes differ with geographical region and time. In Great Britain, the major portion of pathogenic isolates from humans diagnosed with yersiniosis were found to be biotype 3/serotype O:9 (24%) or 4/O:3 (19%). Surprisingly, only veterinary isolates from pigs, the most frequent source of human infection, were found to carry the 3/O:9 strain (31). In the United States, the 1B/O:8 strains were historically considered the most common and highly pathogenic bioserotype (3) until the 4/O:3 strain recently emerged as an important enteric pathogen (27). In contrast, all of the O:8 isolates recovered from patients and animals in China have been biotype 1A strains and have lacked virulence determinants (44). The only pathogenic strains isolated from China have been of serotypes O:3 and O:9 (42). China has a distribution and a pathogenic pattern similar to those of the European countries and Japan, making it a region where Old World strains predominate.
Prior to this study, only the genome of the high-pathogenicity strain 8081 (1B/O:8) had been sequenced for Y. enterocolitica (39). To increase our understanding of the pathogenesis of the other bioserotypes, we sequenced the genome of a pathogenic strain, 105.5R(r) (3/O:9), recovered from a patient in Liaoning Province, in the northeastern part of China. Bioserotype 3/O:9 is the prevalent cause of human yersiniosis in China and some European countries. A comparative genomic strategy was employed using the published genome of strain 8081 to analyze the heterogeneity and evolutionary relationships within Y. enterocolitica (39). We identified the strain-specific regions that were closely related to pathogenicity and analyzed the differences in loss and acquisition of virulence determinants between the two strains. Our findings indicated that the Old World strain 105.5R(r) occupies an evolutionary branch within Y. enterocolitica which is distinct from that of the New World 8081 strain.
Strain 105.5R(r) was recovered in 1996 from a patient in the People's Hospital of Shenyang, Liaoning Province, China. Characterization of this strain has been described previously (43).
Whole-genome sequencing of strain 105.5R(r) was performed with a combined strategy of Sanger shotgun sequencing (11) and next-generation high-throughput 454 single-end sequencing by synthesis (29). Genomic libraries containing 5-kb inserts were constructed, and 5,000 ABI sequences were generated. A total of 211,782 single-end reads were generated using the GS FLX system (454 Life Sciences Corporation, Branford, CT) and assembled with the 454 Newbler assembler (454 Life Sciences Corporation). Newbler-generated contigs and ABI reads were assembled using the Phred/Phrap/Consed software package (14). In total, 81 contigs were generated, indicating an average of 19-fold coverage across the genome. Physical gaps were closed by sequencing of combinatorial and multiplex PCR products. The complete genome revealed that the 105.5R(r) isolate possesses a single circular chromosome and the virulence plasmid (pYV) associated with classical Y. enterocolitica strains.
Automated gene modeling was carried out using GLIMMER 3.0 software (8), in addition to comparing the respective gene products by a BLASTP search of the nonredundant protein sequences obtained from the NCBI, InterPro (1), KEGG, and COG databases (38). The final results were arranged by custom-made Perl scripts. The tRNA genes were predicted by tRNAscan-SE (28), while the rRNA genes were identified by a BLAST search against Rfam (15) and rRNA gene sequences from strain 8081. Insertion sequence (IS) elements were assigned by ISfinder (http://www-is.biotoul.fr/) (37) and a BLASTX search in the NCBI database.
The published genome of strain 8081 (GenBank accession numbers AM286415 and AM286416 [plasmid]) was employed for comparative genomic analysis. An all-versus-all tBLASTn search was performed for all proteins of strain 105.5R(r) against all bases of strain 8081 by using an E value cutoff of 1e−5. Orthologous proteins were defined as reciprocal best-hit proteins with a minimum of 50% identity to and 70% of the length of the query protein, as calculated by the BLAST algorithm. Proteins without orthologs were considered to be specific proteins. Average nucleotide identity (ANI) was calculated according to the method described previously (25) and by using strain 105.5R(r) as the query genome. Genomic islands (GIs) were identified by the presence of transposable elements and the genes specific to strain 105.5R(r) as well as G+C content variation across the genome and dinucleotide bias (23). Therefore, the GIs identified contain mainly strain 105.5R(r)-specific genes. Comparison of the nucleotide sequences between strains 105.5R(r) and 8081 was carried out using the MUMmer program (26) with default values (minimum match length, 20 bp).
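The ortholog definition above (reciprocal best hits with at least 50% identity over at least 70% of the query length) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which relied on tBLASTn and custom Perl scripts. The `Hit` record, the function names, the use of bit score to rank hits, and the reading of the length criterion as alignment length divided by query length are all assumptions made for illustration, and the protein identifiers in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit (field layout is hypothetical)."""
    query: str
    subject: str
    identity: float   # percent identity of the alignment
    aln_len: int      # alignment length in residues
    query_len: int    # full length of the query protein
    bitscore: float


def best_hits(hits, min_identity=50.0, min_coverage=0.70):
    """Return the top-scoring hit per query among hits passing both thresholds."""
    best = {}
    for h in hits:
        if h.identity < min_identity:
            continue
        if h.aln_len / h.query_len < min_coverage:
            continue
        current = best.get(h.query)
        # Assumption: ties and ranking are resolved by bit score.
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a, min_identity=50.0, min_coverage=0.70):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, min_identity, min_coverage)
    reverse = best_hits(b_vs_a, min_identity, min_coverage)
    pairs = []
    for query, hit in forward.items():
        back = reverse.get(hit.subject)
        if back is not None and back.subject == query:
            pairs.append((query, hit.subject))
    return sorted(pairs)


def specific_proteins(all_ids, ortholog_pairs):
    """Proteins of the first genome that have no ortholog in the second."""
    with_ortholog = {a for a, _ in ortholog_pairs}
    return sorted(set(all_ids) - with_ortholog)
```

With toy hit tables in which only the hypothetical pair `A1`/`B1` passes both thresholds in both directions, `reciprocal_best_hits` returns that single pair and `specific_proteins` reports the remaining proteins as strain specific.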
The genome sequence of strain 105.5R(r) was deposited in GenBank under accession numbers CP002246 (for the chromosome) and CP002247 (for the pYV plasmid).
The genome of strain 105.5R(r) is composed of a single circular chromosome, 4,552,107 bp in length, and a pYV virulence plasmid, 69,704 bp long, that is nearly identical to the published pathogenic pYV plasmid from biotypes 2 to 5. The whole genome contains a total of 4,021 predicted coding sequences (CDSs), 76.6% of which could be annotated with known or predicted functions. The chromosome contains 3,935 predicted genes, of which 3,012 were assigned a known function and 923 were most similar to hypothetical proteins in the public database. In addition, 85 pseudogenes were found, as well as 71 tRNA-encoding genes and 7 rRNA operons (Fig. 1 and Table 1). The pYV plasmid was found to carry 86 protein-encoding genes and 6 pseudogenes. The chromosome carries 67 IS elements belonging to 9 IS families, while the plasmid has 3 IS elements of 2 IS families.
The structure and size characteristics of the chromosome and plasmid of strain 105.5R(r) were similar to those of 8081 (Table 1), with an average nucleotide identity of 95%. The chromosome of strain 105.5R(r) is about 63 kb smaller, while the plasmid is about 2 kb larger. There was no significant difference found in amino acid composition between the two strains. Whole-genome nucleotide alignment was performed to determine the synteny of these two strains. The results indicated partial synteny, with numerous inversions, rearrangements, and indels being present (Fig. 2). A large region of approximately 2.7 Mb, corresponding to half of the entire genome, is inverted between the two genomes. As 105.5R(r)'s genome harbors a large number and variety of IS elements, the occurrence of genome-wide inversion, rearrangement, and indels might be expected. In fact, most of the synteny breakpoints between the two genomes were found to be bounded by IS elements. Five 105.5R(r)-specific phages identified may have also contributed to structural rearrangements.
Although the total number and type of IS elements carried by strain 105.5R(r) were comparable to those carried by 8081, the diversity and distribution in IS families were different. Similar to what has been observed in Yersinia pestis strains, IS elements in 105.5R(r) were not evenly represented by the different types. IS1667 was predominant, and three specific ISs (IS21, IS256, and ISNCY) were found on the chromosome. IS3 and IS1666 were found only on the plasmid. All of the IS elements were located in the strain 105.5R(r)-specific region. Six IS elements were found to contain genes associated with transcription regulation, while three carried ABC transporter genes and two were related to virulence. In addition, three ISs were located in prophages, and one was located in the flag-2 genes.
Comparison of the two genomes revealed that they share 3,431 core CDSs (Fig. 3) and that the orthologous genes dominated most of the COG (clusters of orthologous groups) categories of metabolic functions. The YGI-1 island and the Y. enterocolitica species-specific yts2 type II secretion cluster were found in both strains. Most of the virulence determinants in Y. enterocolitica (including ail, inv, yst, yadA, virF, and yopT) were also present in the genome of strain 105.5R(r), consistent with its identity as a pathogenic strain (12).
Considerable variation in the gene repertoire between the two strains was also found. A significant proportion of genes are unique to strain 8081 (16.3%) or 105.5R(r) (14.7%). More 105.5R(r)-specific genes belonged to COGs representing the L (replication, recombination, and repair) and N (cell motility) groups than 8081-specific genes; fewer unique genes fell into the H (coenzyme transport and metabolism), I (lipid transport and metabolism), S (function unknown), and T (signal transduction mechanisms) groups. Results from metabolic network analysis indicated that the 105.5R(r)-specific genes could be mapped to 117 different KEGG orthology (KO) categories and were dispersed among 19 pathways (see Table S1 in the supplemental material).
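As a quick arithmetic check, the share of strain-specific genes in 105.5R(r) can be recomputed from figures given in the text: 4,021 predicted CDSs in the whole genome and 3,431 core CDSs shared with strain 8081. The sketch below assumes that the reported percentage was calculated over all CDSs, chromosome and plasmid together; the text does not state this explicitly, and the function name is hypothetical.

```python
def specific_gene_percent(total_cds: int, core_cds: int) -> float:
    """Percentage of CDSs that are not part of the shared core set."""
    if total_cds <= 0 or not 0 <= core_cds <= total_cds:
        raise ValueError("expected 0 <= core_cds <= total_cds and total_cds > 0")
    return 100.0 * (total_cds - core_cds) / total_cds


# Figures for strain 105.5R(r) taken from the text above.
percent_105 = specific_gene_percent(total_cds=4021, core_cds=3431)
print(f"105.5R(r)-specific CDSs: {4021 - 3431} ({percent_105:.1f}%)")
```

Under that assumption, 590 of 4,021 CDSs are strain specific, which rounds to the 14.7% reported above.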
Many of the 105.5R(r)-specific genes form genomic islands. Fifteen of the GIs that were larger than 4.2 kb were able to be identified by the methods described above; these GIs included a novel T3SS, an ATP binding cassette transporter system, an insecticidal toxin complex (TC) gene cluster (13), a Vibrio cholerae RTX toxin gene cluster, a colicin E2 immunity protein gene cluster, a flagellar gene cluster (Flag-2) (4), a respiration-related gene cluster, and five prophage-related gene clusters. The high-pathogenicity island (HPI) (35), Yersinia type II secretion 1 (yts1) (21), and Yersinia secretion apparatus (ysa) T3SS, all of which characterize the high-pathogenicity strains of biotype 1B (19), were markedly absent from the 105.5R(r) genome. The 105.5R(r)-specific GIs which may confer pathogenic features are as follows.
Type III secretion systems are widely utilized among proteobacterial pathogens of plants, animals, and humans and constitute one of the most fundamental virulence determinants (20). Studies into yersiniosis using a mouse model system have demonstrated that the T3SS plays an important role in Y. enterocolitica colonization of gastrointestinal tissue during the earliest stages of infection (41).
Strain 8081 has two sets of T3SSs, which act independently of one another: the Yop T3SS on pYV and the ysa T3SS, which is carried on the plasticity zone of the chromosome (YE3450 to YE3644). In strain 105.5R(r), a plasmid-encoded Yop T3SS which resembled the corresponding one in the 8081 pYV plasmid was identified, but ysa was absent from the 105.5R(r) chromosome. Strain 105.5R(r) carried a second T3SS on the chromosome, which was composed of 30 CDSs (YE105_C0312 to YE105_C0341) in its specific region. In strain 8081, the genes of the ysa are arranged into two divergently oriented clusters; however, the T3SS transcript orientation on the chromosome of 105.5R(r) was arranged for expression in only one direction.
Phylogenetic analysis was applied to determine the relationship between the T3SS on the chromosome of strain 105.5R(r) and those of other closely related bacterial strains. Comparison of its sequences with the two T3SSs of strain 8081 revealed that seven genes exist in all three sets, but two of those seven showed very low similarity between the two strains. Hence, phylogenetic analysis was carried out using the other five genes of these two strains, eight strains of other Yersinia species and 41 strains of 16 closely related genera (Fig. 4). The T3SS of strain 105.5R(r) resembled those sequences in the same genus other than 8081, and was closely related to Salmonella spp. and “Candidatus Hamiltonella defensa.” The ysa gene in 8081 was closest to that in Y. pestis CO92 and significantly more similar to those of other genera. The ysa gene is located in the substantially large plasticity zone that is believed to have been acquired by horizontal gene transfer. Thus, the T3SS of strain 105.5R(r) may represent the original and conserved characteristics of the Yersinia genus, and 8081 may have attained the ysa gene after its divergence.
In the Yersinia genus, translocation of toxic virulence effectors into host cells by type III secretion systems plays an essential role in determining the outcome of infection. Strain 105.5R(r) carries the six common pYV plasmid-encoded Yersinia outer proteins (known as Yop effectors [YopE, YopH, YpkA, YopM, YopJ/P, and YopT]) delivered by Yop T3SS; however, the ysa-related Yersinia secreted proteins (Ysps) do not exist in this strain.
Homology searches of these two strains against more than 300 effectors from various type III secreting organisms (40), including plant and animal pathogens and symbionts, revealed that each strain has its specific effectors scattered throughout the genome. Strain 105.5R(r) has six specific effector-encoding genes (YE105_C0316, YE105_C0320, YE105_C0322, YE105_C2952, YE105_C3581, and YE105_P0044), of which three (YE105_C0316, YE105_C0320, and YE105_C0322) are located in the T3SS region. YE105_C0316 shows similarity to sseB of Shewanella baltica, a translocon component for effector proteins (33). YE105_C0320 and YE105_C0322 resemble sopB (24) and sseF (7), respectively, the protein products of which are known to modulate membrane structure and localization of vacuoles during bacterial infections. The product of YE105_C2952 acts in conjunction with SopB to modulate host cell membrane integrity and facilitate bacterial entrance. YE105_C3581 appears to be a protein kinase gene and may act to negatively control the host innate immune response. Finally, YE105_P0044 is almost identical to lcrQ, the product of which is known to determine the substrate specificity of Ysc (45).
Genomic comparison with strain 8081 revealed the presence of a transporter system unique to 105.5R(r), which was somewhat similar to the ATP binding cassette transporter system of the enteroaggregative Escherichia coli (EAEC) virulence plasmid. The aat (enteroaggregative ABC transporter) gene cluster is known to encode a specialized ABC transporter, which plays a role in pathogenesis by transporting out dispersin to promote detachment and dispersal of the bacterial cells (34). This gene cluster, composed of 10 CDSs (YE105_C3315 to YE105_C3324), carries all five genes required for ATP binding and formation of transporter components; however, the aatD gene was found to be truncated, and the genetic arrangement was different from that typically observed (Fig. 5). We applied the dispersin (aap gene product) protein sequence from EAEC042 (CBG27807) to search the 105.5R(r) genome using the BLASTP program, with an E value of 1e−2, and found no hits. Thus, this particular transporter system may use other yet-unidentified effectors to carry out transporter functions.
Genome comparison with strain 8081 and metabolic pathway analysis revealed the presence of three toxin-related gene clusters in the 105.5R(r)-specific region.
Toxin complex (TC) proteins, termed TcaABC, TcdAB, and TccABC, with insecticidal activity have been identified in a variety of bacteria. The TC gene organization presents a high degree of conservation, but remarkable functional differences have been observed. The Y. pseudotuberculosis and Y. pestis toxin complexes have been experimentally demonstrated to be active against cultured mammalian cells (17). The TC gene cluster in strain 105.5R(r) (YE105_C3507 to YE105_C3520) presented the same homology groups (tcaAB, tcaC, and tccC) as Y. enterocolitica W22703 (2/O:9) (13). Both tcaB and tcaC genes in Y. enterocolitica W22703 harbor frameshift mutations that result in two open reading frames (ORFs), but all the genes in strain 105.5R(r) are intact CDSs without any apparent truncations. The overall genetic organization of this gene cluster is almost identical in these two strains, which express insecticidal determinants. The TC pathogenic island was probably acquired by an ancestral Yersinia strain prior to the separation of species within the Yersinia genus. The offspring strains could then have evolved the ability to exploit invertebrates by the acquisition of further genetic determinants required for interaction with those particular hosts (18). Although strain 8081 lacks any TC-like genes, it demonstrates toxicity to fleas equal to that of a Y. enterocolitica strain that contains the tcdB-tccC gene pairs (10). This finding suggests that strain 105.5R(r) most resembles a characteristic ancestral strain, as opposed to 8081, which has undergone large-scale genomic evolution.
One locus composed of three CDSs (YE105_C1278 to YE105_C1280) had high similarity to the Vibrio cholerae RTX toxin gene cluster. In particular, it has an rtxC activator gene (which is the hallmark feature of RTX toxins), an rtxH peptide chain release factor gene, and an rtxA remnant cytotoxin gene. Y. enterocolitica O:3 has a similar locus (GenBank accession number AM258967) with an intact rtxA gene. The 105.5R(r) remnant rtxA was found to be disrupted by two mutations, which divided it into three parts; yet, this gene shared 50% to 65% similarity with the rtxA gene of V. cholerae RC385. The major difference between the RTX loci of Y. enterocolitica and V. cholerae was that the associated ABC transporter system encoded by rtxB and rtxD was absent in Y. enterocolitica. We compared the amino acid sequences of 1,431 housekeeping genes of V. cholerae and Y. enterocolitica and found that their similarity was in the range of 74.79% ± 23.50%. The overall level of amino acid similarity in the RTX gene cluster of these two species was also in this range (between 50% and 77%). No insertion element or transposase was found upstream and downstream of this region. These findings indicated that the RTX gene cluster existed in an ancestral strain of Y. enterocolitica and that this gene cluster had undergone gene deletion and mutation since then.
Another locus consisting of four CDSs (YE105_C0155 to YE105_C0158) was found to resemble a colicin E2 immunity protein gene cluster. Colicin E2 is known to cause DNA breakdown. Moreover, the release of colicin E2 protein requires the presence of the periplasmic and inner membrane Tol components (9). The colicin E2 gene cluster encodes three components of this operon: colicin, immunity protein, and lysis signal peptide. The YE105_C0155, YE105_C0156, and YE105_C0158 genes resemble genes for colicin immunity-related proteins. Importantly, strain 105.5R(r) carries the Tol components necessary for colicin translocation; however, no homolog of the lysis peptide was identified within this region. Whether or not the colicin-like gene cluster in this strain is able to function as such requires further investigation.
A gene set composed of 42 CDSs (YE105_C3260 to YE105_C3301) was found to be responsible for flagellar assembly and chemotaxis. In this tightly clustered set, the first 14 CDSs (YE105_C3260 to YE105_C3273) were found to closely resemble the corresponding genes found in Y. enterocolitica W22703 (2/O:9); however, the remaining portion showed high identity with flagellar genes of other Yersinia species. This suggested that a genomic recombination event might have occurred in this region. Only the left boundary of the genomic island was found in the genome, and this boundary was marked by a reduced G+C content and bordered by an IS3-type transposase (YE105_C3312). The right border of the genomic island did not display any distinctive insertion features. The Flag-2 gene cluster was found to be absent from the corresponding genomic location in strain 8081 but was present in biotypes 2 to 5 (4).
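A reduced G+C content of the kind noted at the left boundary can be screened for with a simple sliding window over the sequence. The following Python sketch illustrates only that one signal; the island detection in this study also used transposable elements, strain-specific genes, and dinucleotide bias (23). The function names, window size, step, and the 5-percentage-point threshold are assumptions chosen for illustration, not parameters from the study.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def low_gc_windows(seq: str, window: int = 1000, step: int = 500,
                   drop: float = 0.05):
    """Flag windows whose G+C fraction is at least `drop` below the overall value.

    Returns the overall G+C fraction and a list of (start, end, gc) tuples
    using 0-based, end-exclusive coordinates.
    """
    overall = gc_fraction(seq)
    flagged = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if overall - gc >= drop:
            flagged.append((start, start + len(chunk), gc))
    return overall, flagged
```

On a real chromosome, adjacent flagged windows would usually be merged into candidate regions and then compared against gene annotations, since low G+C content alone does not establish that a region is a genomic island.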
According to the biotype identification scheme reviewed by Bottone (3), Y. enterocolitica biotypes 2 and 3 were characterized as nearly identical to one another; the only exception was found in indole production, with biotype 2 presenting a variable status and biotype 3 having a negative reaction. Previous microarray analysis based solely on the genes of strain 8081 revealed that biotypes 2 and 3 have almost the same gene composition, while other biotypes show significantly different gene patterns (39). However, the microarray strategy used was limited by its inability to detect genes that are not present in strain 8081; thus, genes present only in other lineages could not be analyzed by this approach.
Bioserotype 3/O:9 is one of the predominant pathogens that afflict the populations of European countries and Japan. Although the predominant bioserotype in China is 2/O:9, we chose a 3/O:9 strain for analysis in this study as a representative of the Old World strains due to the high level of homology between biotypes 2 and 3 (42). The genome sequence of strain 105.5R(r) agreed with the results obtained using eight biotype 3 strains in the previous microarray analysis; specifically, they all have tad, fep, fes, hyb, proP, speF, and potE gene clusters and lack cusA, cusB, alkB, flgII, ysa, yts1, and YAPI genomic islands (39). The comparative genome analysis presented herein not only revealed the absence of high-pathogenicity-related genomic islands but also identified several unique pathogenic islands in strain 105.5R(r).
Strain 8081 was determined to have undergone a large amount of gene acquisition and loss; for instance, the acquisition of the substantially large plasticity zone probably arose through a series of independent insertions (5). In contrast to strain 8081, 105.5R(r) was found to lack a high-pathogenicity plasticity zone containing abundant virulence determinants; in addition, the pathogenic genomic islands, especially the strain-specific ones, were found to be scattered throughout the genome. The proposed gene loss and acquisition were supported by the IS elements analysis, which revealed that transposase enzyme genes were located throughout the whole genome and were not localized to any particular region. Phylogenetic analysis of T3SS and the comparison of several ancestral gene clusters, especially the toxin complex and the flagellar gene cluster, revealed that strain 105.5R(r) better resembles other strains within this species. This finding is consistent with the fact that the Y. enterocolitica 3/O:9 belongs to the comparatively low-pathogenicity group, which most Y. enterocolitica strains belong to.
The number of core genes encoding orthologous proteins shared by strains 105.5R(r) and 8081 was higher than the number of core genes detected in Yersinia (39). This is consistent with the fact that these two strains are of the same species; however, their genomes present a high level of variation when compared with one another. Together with the previous microarray result indicating that most biotypes differ in gene content, this supports the view that Y. enterocolitica is a highly divergent, heterogeneous species and that strains 105.5R(r) and 8081 lie in distinct branches. However, the information provided by comparison of these two complete genomes remains insufficient to confirm which strain is closer to the ancestral strain. Further genome sequencing of other bioserotype strains is necessary to definitively characterize this heterogeneous species.
This work was supported by the Chinese National Science Fund for Distinguished Young Scholars (30788001), the National 863 Program (2010AA10A203), the National 973 program of China (grant 2009CB522603), the NSFC Key Program (grant 31030002), and the National Key Programs for Infectious Diseases of China (2009ZX10004-108 and 2009ZX10004-203).
†Supplemental material for this article may be found at http://jcm.asm.org/.
Published ahead of print on 16 February 2011.