General features of Y. enterocolitica strain 105.5R(r).
The genome of strain 105.5R(r) is composed of a single circular chromosome, 4,552,107 bp in length, and a pYV virulence plasmid, 69,704 bp long, that is nearly identical to the published pathogenic pYV plasmid from biotypes 2 to 5. The whole genome contains a total of 4,021 predicted coding sequences (CDSs), 76.6% of which could be annotated with known or predicted functions. The chromosome contains 3,935 predicted genes, of which 3,012 were assigned a known function and 923 were most similar to hypothetical proteins in the public database. In addition, 85 pseudogenes were found, as well as 71 tRNA-encoding genes and 7 rRNA operons ( and ). The pYV plasmid was found to carry 86 protein-encoding genes and 6 pseudogenes. The chromosome carries 67 IS elements belonging to 9 IS families, while the plasmid has 3 IS elements of 2 IS families.
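The counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch: the variable names are illustrative, and every figure is taken directly from the text.

```python
# Figures quoted in the text; variable names are illustrative only.
chromosome_cds = 3935
plasmid_cds = 86
total_cds = chromosome_cds + plasmid_cds  # the text reports 4,021 CDSs

known_function = 3012
hypothetical = 923
# The two chromosomal classes should add up to the chromosomal gene count.
assert known_function + hypothetical == chromosome_cds

# Percentage of chromosomal genes with an assigned function.
chromosomal_annotated_pct = round(100 * known_function / chromosome_cds, 1)

# Combined size of the chromosome and the pYV plasmid, in bp.
genome_size_bp = 4_552_107 + 69_704

print(total_cds, chromosomal_annotated_pct, genome_size_bp)
```

The chromosomal percentage computed this way is close to the 76.6% reported for the whole genome, as expected given that the plasmid contributes only 86 CDSs.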
Fig. 1. Circular representation of the strain 105.5R(r) chromosome. The outer scale shows the size in bp. From the outside in, the first circle shows strain 105.5R(r)-specific genomic islands, compared with those for 8081; the second and third circles show the (more ...)
Genome properties of strains 105.5R(r) and 8081
Comparative genomics with Y. enterocolitica strain 8081.
The structure and size characteristics of the chromosome and plasmid of strain 105.5R(r) were similar to those of 8081 (), with an average nucleotide identity of 95%. The chromosome of strain 105.5R(r) is about 63 kb smaller, while the plasmid is about 2 kb larger. There was no significant difference found in amino acid composition between the two strains. Whole-genome nucleotide alignment was performed to determine the synteny of these two strains. The results indicated partial synteny, with numerous inversions, rearrangements, and indels being present (). A large region of approximately 2.7 Mb, corresponding to half of the entire genome, is inverted between the two genomes. As 105.5R(r)'s genome harbors a large number and variety of IS elements, the occurrence of genome-wide inversion, rearrangement, and indels might be expected. In fact, most of the synteny breakpoints between the two genomes were found to be bounded by IS elements. Five 105.5R(r)-specific phages identified may have also contributed to structural rearrangements.
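The extent of an inversion can be estimated from pairwise alignment coordinates. The sketch below assumes a coordinate list in the style of MUMmer's tabular output, where a reverse-strand match is reported with a query start greater than its query end. The segment coordinates are invented for illustration and the function name is hypothetical; only the chromosome length comes from the text.

```python
from typing import List, Tuple

# (ref_start, ref_end, query_start, query_end), 1-based and inclusive.
# query_start > query_end marks a reverse-strand match.
Segment = Tuple[int, int, int, int]

def inverted_length(segments: List[Segment]) -> int:
    """Sum the reference bases covered by reverse-strand matches.

    Assumes the segments do not overlap on the reference.
    """
    total = 0
    for ref_start, ref_end, q_start, q_end in segments:
        if q_start > q_end:
            total += ref_end - ref_start + 1
    return total

# Invented example: one large reverse block flanked by forward blocks.
example = [
    (1, 900_000, 1, 900_000),                      # forward
    (900_001, 3_600_000, 3_650_000, 950_001),      # reverse
    (3_600_001, 4_552_107, 3_650_001, 4_602_107),  # forward
]

chromosome_length = 4_552_107  # strain 105.5R(r), from the text
inverted = inverted_length(example)
inverted_fraction = inverted / chromosome_length
print(inverted, round(inverted_fraction, 2))
```

With these invented coordinates, a 2.7-Mb reverse block covers somewhat more than half of the 105.5R(r) chromosome.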
Fig. 2. Whole-genome comparison of strains 105.5R(r) and 8081. The coordinate image shows the whole-genome sequence alignment (per the MUMmer package, version 3.0). The forward and reverse matches are indicated by the upward and downward slopes, respectively. (more ...)
Although the total number and type of IS elements carried by strain 105.5R(r) were comparable to those carried by 8081, the diversity and distribution of IS families were different. Similar to what has been observed in Yersinia pestis strains, the different IS types were not evenly represented in 105.5R(r). IS1667 was predominant, and three specific ISs (IS21, IS256, and ISNCY) were found on the chromosome. IS3 and IS1666 were found only on the plasmid. All of the IS elements were located in the strain 105.5R(r)-specific region. Six IS elements were found to contain genes associated with transcription regulation, while three carried ABC transporter genes and two were related to virulence. In addition, three ISs were located within phages, and one was located in the flag-2 genes.
Comparison of the two genomes revealed that they share 3,431 core CDSs () and that the orthologous genes dominated most of the COG (clusters of orthologous groups) categories of metabolic functions. The YGI-1 island and the Y. enterocolitica type II secretion cluster were found in both strains. Most of the virulence determinants in Y. enterocolitica (…, and yopT) were also present in the genome of strain 105.5R(r), consistent with its identity as a pathogenic strain (12).
Fig. 3. Comparison of gene contents in strains 105.5R(r) and 8081. (A) Venn diagram of the orthologous and specific genes in each strain. (B) COG categories of the orthologous and specific genes in each strain. The alphabetic code for the column charts is as (more ...)
Considerable variation in the gene repertoire between the two strains was also found. A significant proportion of genes are unique to strain 8081 (16.3%) or 105.5R(r) (14.7%). More 105.5R(r)-specific genes belonged to COGs representing the L (replication, recombination, and repair) and N (cell motility) groups than 8081-specific genes; fewer unique genes fell into the H (coenzyme transport and metabolism), I (lipid transport and metabolism), S (function unknown), and T (signal transduction mechanisms) groups. Results from metabolic network analysis indicated that the 105.5R(r)-specific genes could be mapped to 117 different KEGG orthology (KO) categories and were dispersed among 19 pathways (see Table S1 in the supplemental material).
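The split into core and strain-specific genes amounts to set operations on ortholog assignments. The sketch below uses invented ortholog group identifiers and a hypothetical `partition` helper. The final arithmetic check assumes that the strain-specific genes of 105.5R(r) are simply its 4,021 CDSs minus the 3,431 core CDSs, which is an assumption about how the percentage was derived.

```python
from typing import Set, Tuple

def partition(genes_a: Set[str], genes_b: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, specific to a, specific to b) for two gene sets."""
    core = genes_a & genes_b
    return core, genes_a - core, genes_b - core

# Invented ortholog group identifiers, for illustration only.
strain_a = {"og1", "og2", "og3", "og4"}
strain_b = {"og2", "og3", "og5"}
core, only_a, only_b = partition(strain_a, strain_b)

# Arithmetic check against the figures quoted in the text.
total_cds = 4021
core_cds = 3431
specific_pct = round(100 * (total_cds - core_cds) / total_cds, 1)
print(sorted(core), sorted(only_a), sorted(only_b), specific_pct)
```

Under that assumption, the computed share of 105.5R(r)-specific genes agrees with the 14.7% given in the text.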
Unique genomic islands identified in strain 105.5R(r).
Many of the 105.5R(r)-specific genes form genomic islands (GIs). Fifteen GIs larger than 4.2 kb were identified by the methods described above; these GIs included a novel T3SS, an ATP binding cassette transporter system, an insecticidal toxin complex (TC) gene cluster (13), a Vibrio cholerae RTX toxin gene cluster, a colicin E2 immunity protein gene cluster, a flagellar gene cluster (Flag-2) (4), a respiration-related gene cluster, and five prophage-related gene clusters. The high-pathogenicity island (HPI) (35), type II secretion 1 (yts1), and Yersinia secretion apparatus (ysa) T3SS, all of which characterize the high-pathogenicity strains of biotype 1B (19), were markedly absent from the 105.5R(r) genome. The 105.5R(r)-specific GIs which may confer pathogenic features are as follows.
(i) Novel type III secretion system.
Type III secretion systems are widely utilized among proteobacterial pathogens of plants, animals, and humans and constitute one of the most fundamental virulence determinants (20). Studies into yersiniosis using a mouse model system have demonstrated that the T3SS plays an important role in Y. enterocolitica colonization of gastrointestinal tissue during the earliest stages of infection (41).
Strain 8081 has two sets of T3SSs, which act independently of one another: the Yop T3SS on pYV and the ysa T3SS, which is carried on the plasticity zone of the chromosome (YE3450 to YE3644). In strain 105.5R(r), a plasmid-encoded Yop T3SS which resembled the corresponding one in the 8081 pYV plasmid was identified, but ysa was absent from the 105.5R(r) chromosome. Strain 105.5R(r) carried a second T3SS on the chromosome, which was composed of 30 CDSs (YE105_C0312 to YE105_C0341) in its specific region. In strain 8081, the genes of the ysa are arranged into two divergently oriented clusters; however, the T3SS transcript orientation on the chromosome of 105.5R(r) was arranged for expression in only one direction.
Phylogenetic analysis was applied to determine the relationship between the T3SS on the chromosome of strain 105.5R(r) and other closely related bacterial strains. Comparison of its sequences with the two T3SS of strain 8081 revealed that seven genes exist in all three sets, but two of those seven showed very low similarity between the two strains. Hence, phylogenetic analysis was carried out using the other five genes of these two strains, eight strains of other Yersinia species and 41 strains of 16 closely related genera (). The T3SS of strain 105.5R(r) resembled those sequences in the same genus other than 8081, and was closely related to Salmonella spp. and “Candidatus Hamiltonella defensa.” The ysa gene in 8081 was closest to that in Y. pestis CO92 and significantly more similar to those of other genera. The ysa gene is located in the substantially large plasticity zone that is believed to have been acquired by horizontal gene transfer. Thus, the T3SS of strain 105.5R(r) may represent the original and conservative characteristics of the Yersinia genus, and 8081 may have attained the ysa gene after its divergence.
Fig. 4. Cladogram based on the amino acid sequence data of type III secretion systems. The phylogenetic analysis was based on the sequences of 51 T3SSs within 17 genera and performed with the neighbor-joining method. Calculations were performed using the two-parameter (more ...)
In the Yersinia genus, translocation of toxic virulence effectors into host cells by type III secretion systems plays an essential role in determining the outcome of infection. Strain 105.5R(r) carries the six common pYV plasmid-encoded Yersinia outer proteins (known as Yop effectors [YopE, YopH, YpkA, YopM, YopJ/P, and YopT]) delivered by Yop T3SS; however, the ysa-related Yersinia secreted proteins (Ysps) do not exist in this strain.
Homology searches of these two strains against more than 300 effectors from various type III secreting organisms (40), including plant and animal pathogens and symbionts, revealed that each strain has its own specific effectors scattered throughout the genome. Strain 105.5R(r) has six specific effector-encoding genes (YE105_C0316, YE105_C0320, YE105_C0322, YE105_C2952, YE105_C3581, and YE105_P0044), of which three (YE105_C0316, YE105_C0320, and YE105_C0322) are located in the T3SS region. YE105_C0316 shows similarity to sseB of Shewanella baltica, a translocon component for effector proteins (33). YE105_C0320 and YE105_C0322 resemble sopB () and sseF (), respectively, the protein products of which are known to modulate membrane structure and localization of vacuoles during bacterial infections. The product of YE105_C2952 acts in conjunction with SopB to modulate host cell membrane integrity and facilitate bacterial entry. YE105_C3581 appears to be a protein kinase gene and may act to negatively control the host innate immune response. Finally, YE105_P0044 is almost identical to lcrQ, the product of which is known to determine the substrate specificity of Ysc (45).
(ii) ATP binding cassette transporter system.
Genomic comparison with strain 8081 revealed the presence of a transporter system unique to 105.5R(r), which was somewhat similar to the ATP binding cassette transporter system of the enteroaggregative Escherichia coli (EAEC) virulence plasmid. The aat (enteroaggregative ABC transporter) gene cluster is known to encode a specialized ABC transporter, which plays a role in pathogenesis by exporting dispersin to promote detachment and dispersal of the bacterial cells (34). The cluster in 105.5R(r), composed of 10 CDSs (YE105_C3315 to YE105_C3324), carries all five genes required for ATP binding and formation of transporter components; however, the aatD gene was found to be truncated, and the genetic arrangement was different from that typically observed (). We used the dispersin (aap gene product) protein sequence from EAEC042 (CBG27807) as a query to search the 105.5R(r) genome with the BLASTP program at an E value cutoff of 1e−2 and found no hits. Thus, this particular transporter system may transport other, as yet unidentified, effectors.
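Applying an E value cutoff to a BLASTP search result can be sketched as follows. The example assumes the default 12-column tabular output of BLAST+ (`-outfmt 6`). The `significant_hits` helper is hypothetical, and the rows are invented for illustration; they are not results from this study.

```python
import csv
import io

# Default 12-column layout of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def significant_hits(tabular_text, max_evalue=1e-2):
    """Return the rows whose E value is at or below the cutoff."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    return [row for row in reader if float(row["evalue"]) <= max_evalue]

# Invented rows: two weak matches, then one strong match.
weak_only = "\n".join([
    "\t".join(["query_1", "subject_1", "28.1", "64", "40", "3",
               "10", "70", "5", "66", "3.2", "21.6"]),
    "\t".join(["query_1", "subject_2", "31.0", "42", "27", "1",
               "55", "95", "12", "53", "0.51", "23.9"]),
])
with_strong = weak_only + "\n" + "\t".join(
    ["query_1", "subject_3", "62.5", "110", "40", "1",
     "1", "110", "1", "109", "1e-30", "118"]
)

print(len(significant_hits(weak_only)))    # weak matches are filtered out
print(len(significant_hits(with_strong)))  # only the strong match remains
```

A search that returns only weak matches, as in the first invented case, would be reported as "no hits" at this cutoff.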
Fig. 5. Comparison of the aat clusters in strain 105.5R(r) and Escherichia coli strain 042(EAEC). The aat gene cluster is known to encode a specialized ABC transporter, which promotes pathogenesis by transporting virulent effector proteins out of the bacterial (more ...)
(iii) Toxin-related gene clusters.
Genome comparison with strain 8081 and metabolic pathway analysis revealed the presence of three toxin-related gene clusters in the 105.5R(r)-specific region.
Toxin complex (TC) proteins, termed TcaABC, TcdAB, and TccABC, with insecticidal activity have been identified in a variety of bacteria. The TC gene organization is highly conserved, but remarkable functional differences have been observed. The Y. pseudotuberculosis and Y. pestis toxin complexes have been experimentally demonstrated to be active against cultured mammalian cells (17). The TC gene cluster in strain 105.5R(r) (YE105_C3507 to YE105_C3520) presented the same homology groups (tcaAB, …, and tccC) as Y. enterocolitica W22703 (2/O:9) (13). Both tcaB genes in Y. enterocolitica W22703 harbor frameshift mutations that result in two open reading frames (ORFs), but all the genes in strain 105.5R(r) are intact CDSs without any apparent truncations. The overall genetic organization of this gene cluster is almost identical in these two strains, which express insecticidal determinants. The TC pathogenic island was probably acquired by an ancestral Yersinia strain prior to the separation of species within the Yersinia genus. The offspring strains could then have evolved the ability to exploit invertebrates by the acquisition of further genetic determinants required for interaction with those particular hosts (18). Although strain 8081 lacks any TC-like genes, it shows toxicity to fleas equal to that of a Y. enterocolitica strain that contains the tcdB-tccC gene pairs (10). This finding indicates that strain 105.5R(r) more closely resembles an ancestral strain than does 8081, which has undergone large-scale genomic evolution.
One locus composed of three CDSs (YE105_C1278 to YE105_C1280) had high similarity to the Vibrio cholerae RTX toxin gene cluster. In particular, it has an rtxC activator gene (which is the hallmark feature of RTX toxins), an rtxH peptide chain release factor gene, and an rtxA remnant cytotoxin gene. Y. enterocolitica O:3 has a similar locus (GenBank accession number AM258967) with an intact rtxA gene. The 105.5R(r) remnant rtxA was found to be disrupted by two mutations, which divided it into three parts; nevertheless, this gene shared 50% to 65% similarity with the rtxA gene of V. cholerae RC385. The major difference between the RTX loci of Y. enterocolitica and V. cholerae was that the associated ABC transporter system encoded by rtxB and rtxD was absent in Y. enterocolitica. We compared the amino acid sequences of 1,431 housekeeping genes of V. cholerae and Y. enterocolitica and found that their similarity was in the range of 74.79% ± 23.50%. The overall level of amino acid similarity in the RTX gene cluster of these two species was also in this range (between 50% and 77%). No insertion element or transposase was found upstream or downstream of this region. These findings indicated that the RTX gene cluster existed in an ancestral strain of Y. enterocolitica and that this gene cluster had undergone gene deletion and mutation since then.
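The comparison of the RTX similarity range with the housekeeping-gene baseline can be made explicit. This minimal sketch uses only figures from the text and assumes that "±" denotes one standard deviation around the mean, which the text does not state.

```python
# Housekeeping-gene similarity between the two species (from the text).
mean_pct = 74.79
sd_pct = 23.50  # assumed to be one standard deviation

baseline_low = mean_pct - sd_pct
baseline_high = mean_pct + sd_pct

# Similarity range reported for the RTX gene cluster (from the text).
rtx_low, rtx_high = 50.0, 77.0

# Portion of the RTX range that lies inside the baseline interval.
overlap = max(0.0, min(baseline_high, rtx_high) - max(baseline_low, rtx_low))
fraction_inside = overlap / (rtx_high - rtx_low)

print(round(baseline_low, 2), round(baseline_high, 2), round(fraction_inside, 2))
```

Under this reading, nearly all of the RTX similarity range falls inside the baseline interval, with only its lowest end lying marginally below it.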
Another locus consisting of four CDSs (YE105_C0155 to YE105_C0158) was found to resemble a colicin E2 immunity protein gene cluster. Colicin E2 is known to cause DNA breakdown. Moreover, the release of colicin E2 protein requires the presence of the periplasmic and inner membrane Tol components (9). The colicin E2 operon encodes three components: colicin, immunity protein, and lysis signal peptide. The YE105_C0155, YE105_C0156, and YE105_C0158 genes resemble colicin immunity-related protein genes. Importantly, strain 105.5R(r) carries the Tol components necessary for colicin translocation; however, no homolog of the lysis peptide was identified within this region. Whether or not the colicin-like gene cluster in this strain is able to function as such requires further investigation.
(iv) Flagellar gene cluster (Flag-2).
The gene set composed of 42 CDSs (YE105_C3260 to YE105_C3301) was found to be responsible for flagellar assembly and chemotaxis. In this tightly clustered set, the first 14 CDSs (YE105_C3260 to YE105_C3273) were found to closely resemble the corresponding genes found in Y. enterocolitica W22703 (2/O:9); however, the remaining portion showed high identity with flagellar genes of other Yersinia species. This suggested that a genomic recombination event might have occurred in this region. Only the left boundary of the genomic island was found in the genome, and this boundary was distinguished by a reduced G+C content and bordered by an IS3-type transposase (YE105_C3312). The right border of the genomic island did not display any distinctive insertion features. The Flag-2 gene cluster was also found to be absent from the corresponding genomic location in strain 8081 but was present in biotypes 2 to 5 (4).
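A drop in G+C content at an island boundary is typically detected by scanning the sequence in windows. The sketch below is a generic illustration and not the procedure used in this study; the function names, window size, and toy sequence are all hypothetical.

```python
from typing import List, Tuple

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """G+C fraction for each full window, keyed by 0-based start position."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GGCCGCGC" + "ATATTAAT"
profile = windowed_gc(toy, window=8, step=8)
print(profile)
```

On real data, window and step sizes of hundreds to thousands of bases would be more appropriate, and a boundary would show up as a sustained shift in the profile rather than a single low window.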