Citrobacter rodentium (formerly Citrobacter freundii biotype 4280) is a highly infectious pathogen that causes colitis and transmissible colonic hyperplasia in mice. In common with enteropathogenic and enterohemorrhagic Escherichia coli (EPEC and EHEC, respectively), C. rodentium exploits a type III secretion system (T3SS) to induce attaching and effacing (A/E) lesions that are essential for virulence. Here, we report the fully annotated genome sequence of the 5.3-Mb chromosome and four plasmids harbored by C. rodentium strain ICC168. The genome sequence revealed key information about the phylogeny of C. rodentium and identified 1,585 C. rodentium-specific (without orthologues in EPEC or EHEC) coding sequences, 10 prophage-like regions, and 17 genomic islands, including the locus for enterocyte effacement (LEE) region, which encodes a T3SS and effector proteins. Among the 29 T3SS effectors found in C. rodentium are all 22 of the core effectors of EPEC strain E2348/69. In addition, we identified a novel C. rodentium effector, named EspS. C. rodentium harbors two type VI secretion systems (T6SS) (CTS1 and CTS2), while EHEC contains only one T6SS (EHS). Our analysis suggests that C. rodentium and EPEC/EHEC have converged on a common host infection strategy through access to a common pool of mobile DNA and that C. rodentium has lost gene functions associated with a previous pathogenic niche.
The mouse has been utilized extensively as a model for studies of infection and immunity (11, 25, 41, 44). Mice can be housed in relatively simple facilities and can be maintained under controlled pathogen- or germ-free conditions. Further, the availability of inbred or genetically manipulated mice greatly enhances the potential of any studies (35, 80). Although murine infection systems are used extensively, relatively few pathogens have been exploited that are naturally virulent for the species.
Spontaneous disease outbreaks in mouse colonies in the United States and Japan in the 1960s and 1970s were associated with infections by the Gram-negative pathogens Escherichia coli (Ex30, also known as murine pathogenic E. coli [MPEC]), Citrobacter freundii ANL, and C. freundii biotype 4280 (3, 9, 25, 43, 64). On the basis of DNA relatedness, these pathogenic strains were subsequently assigned to a separate species, Citrobacter rodentium (54, 84). C. rodentium is highly infectious, causing colitis and transmissible colonic hyperplasia (61). Following ingestion, C. rodentium colonizes the intestines of mice, residing predominantly in the cecum and colon (91), where in some inbred strains it can drive hyperplasia or overgrowth of the epithelium. C. rodentium-mediated colonic hyperplasia is highly infectious in poorly managed mouse facilities, spreading between mice via contaminated feces and environments (93). Some mouse strains are relatively resistant to clinical disease, but in such strains, C. rodentium lives in the intestine as a component of the microbiota; younger mice are generally more susceptible to infection (89).
C. rodentium is genetically related to Escherichia coli and is a member of the Enterobacteriaceae. An extracellular pathogen, C. rodentium colonizes the lumen of the mouse gut mucosa by formation of so-called attaching/effacing (A/E) lesions on the apical surfaces of the enterocytes (82; reviewed in references 30 and 61). A/E lesions, which are characterized by effacement of the brush border microvilli and intimate bacterial attachment to pedestal-like structures, are also classically associated with the pathogenesis of enteropathogenic E. coli (EPEC), a leading cause of pediatric diarrhea, and enterohemorrhagic E. coli (EHEC), which causes diarrhea, hemorrhagic colitis, and hemolytic uremic syndrome (HUS) (reviewed in reference 45). In this respect, C. rodentium, EPEC, and EHEC share a common virulence strategy.
A/E lesion formation is dependent on the expression of intimin, a type III secretion system (T3SS) (reviewed in reference 31) and effector proteins encoded on a specific pathogenicity island (PI) known as the locus for enterocyte effacement (LEE) (57). The LEE of C. rodentium, and those of EPEC and EHEC, are horizontally acquired sections of DNA, and although not identical, they are highly related in terms of overall genetic organization and gene function (19, 26, 75). The fact that mice are resistant to EPEC and EHEC infection (60) makes C. rodentium an ideal animal model to study the roles of conserved genes in pathogenesis in vivo.
Aside from the role of the LEE in pathogenesis, a number of other genes have been shown to contribute to C. rodentium disease, including type IV pili (63) and various T3SS effector proteins (33, 39, 46, 55, 62, 86). However, direct comparisons between the virulence gene repertoires of C. rodentium and other enteric pathogens have been limited. Whole-genome sequencing has proved to be a powerful approach to defining the genetic structure of bacteria, providing a genomewide blueprint to guide genetic analysis. A fully annotated genome sequence can guide targeted mutagenesis and transcriptomic or proteomic studies. Here, we report the fully annotated genome sequence of a murine virulent strain of C. rodentium, ICC168, and make comparisons with the genomes of EPEC, EHEC, and other enteric bacteria.
The strain of C. rodentium sequenced, strain ICC168, was obtained in 1993 from S. W. Barthold's original stocks, previously known as Citrobacter freundii biotype 4280 (ATCC 51459) (3, 4), and subjected to minimal passages. The bacterial strains, plasmids, and primers used in the biological studies are shown in Table 1. Ten μg of genomic DNA was fragmented by multiple passages through a 30-gauge needle and end repaired with a mixture of Klenow polymerase, T4 DNA polymerase, and polynucleotide kinase before size selection by preparative electrophoresis through a 0.8% agarose gel. The recovered sized fractions were subjected to a second round of end repair and size selection. Two- to 2.2-kb and 2.2- to 4-kb fractions were ligated into pUC19_smaI, and 4- to 5-, 5- to 6-, 6- to 9-, and 9- to 10-kb fractions were ligated into the low-copy-number vector pMAQ1b_smaI. Two bacterial artificial chromosome (BAC) libraries containing 15- to 25- and 25- to 40-kb inserts were constructed as previously described (69). End sequences obtained from 672 clones from each library were used to scaffold the assembly. The whole-genome shotgun was sequenced to approximately 9× coverage on ABI3730 automated sequencers using BigDye terminator chemistry.
The sequence was assembled, finished, and annotated as previously described (72). The program Artemis (81) was used to collate data and facilitate annotation. Base 1 of the C. rodentium genome matched that of E. coli K-12 MG1655 to aid whole-genome alignments, making the first gene thrL. The origin of replication is located between positions 4232946 and 4233181.
Pseudogenes were defined as coding sequences (CDSs) that had one or more possible inactivating mutations based on similarity with intact CDSs in related bacteria. Each single frameshift mutation or premature stop codon within a CDS was confirmed by checking the original sequencing data.
In addition, in order to provide a robust whole-genome phylogenetic species tree for Citrobacter, we sequenced the whole genome of Citrobacter freundii ballerup 7851 using 454 FLX pyrosequencing, assembled using the 454/Roche Newbler assembly program into 357 contigs (N50 contig size, 60,237 bp) from 410,057 sequence reads with an average read length of 177 bp to give a total sequence length of 4,904,659 bp.
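The N50 contig size quoted above is a summary statistic of assembly contiguity: it is the length of the contig at which, going from the longest contig downward, at least half of the assembled bases have been accounted for. A minimal Python sketch of that calculation follows; the function name and the toy contig lengths are illustrative assumptions and are not the C. freundii data or the Newbler implementation.

```python
def n50(contig_lengths):
    """Return the N50 of a set of contig lengths.

    The N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downward, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (bp), chosen only to illustrate the idea.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

For the toy list, the two longest contigs (80 and 70 bp) already cover half of the 300 assembled bases, so the function reports the length of the second contig.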
The following genome sequences were used in genome comparisons with C. rodentium ICC168: E. coli K-12 MG1655 (K-12; accession number U00096), E. coli E2348/69 (EPEC; accession number FM180568), E. coli O157:H7 Sakai (EHEC; accession number BA000007), Salmonella enterica subspecies enterica serovar Typhi CT18 (accession number AL513382), S. enterica subspecies enterica serovar Typhimurium LT2 (accession number AE006468), S. enterica subspecies enterica serovar Enteritidis P125109 (accession number AM933172), Citrobacter koseri ATCC BAA-895 (accession number CP000822), Yersinia enterocolitica 8081 (accession number AM286415), Pectobacterium atrosepticum SCRI1043 (accession number BX950851), Klebsiella pneumoniae 342 (accession number CP000964), Enterobacter sakazakii ATCC BAA-894 (accession number CP000783), and Serratia marcescens Db11 (http://www.sanger.ac.uk/Projects/S_marcescens/).
Pairwise whole-genome comparisons of C. rodentium with a range of enteric bacteria were performed using TBLASTX and visualized using the Artemis Comparison Tool (13). Circular diagrams were made using DNAplotter (12). Average nucleotide identities (ANI) were calculated from BLASTN matches over the entire lengths of the bacterial chromosomes. Orthologues in C. rodentium ICC168 and the three E. coli genomes (K-12, EPEC, and EHEC) were identified using an all-against-all reciprocal FASTA comparison of translated DNA with at least 40% identity over 80% of the length.
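The reciprocal comparison used to call orthologues can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the Hit record, the use of an alignment score to rank hits, and the decision to measure the 80% length threshold against the longer of the two sequences are assumptions made for illustration, and all identifiers are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent identity of the alignment
    aln_len: int      # aligned length (residues)
    query_len: int
    subject_len: int
    score: float      # any score where higher means better


def passes(hit, min_identity=40.0, min_cover=0.80):
    """Apply the identity and length thresholds to one hit.

    Assumption: coverage is measured against the longer sequence.
    """
    cover = hit.aln_len / max(hit.query_len, hit.subject_len)
    return hit.identity >= min_identity and cover >= min_cover


def best_hits(hits):
    """Map each query to its best-scoring subject among passing hits."""
    best = {}
    for h in hits:
        if passes(h) and (h.query not in best or h.score > best[h.query].score):
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical hits between two genomes A and B.
a_vs_b = [
    Hit("crA", "ecA", 85.0, 300, 310, 305, 500.0),
    Hit("crA", "ecB", 45.0, 280, 310, 400, 200.0),  # fails length threshold
    Hit("crB", "ecB", 35.0, 390, 400, 400, 150.0),  # fails identity threshold
]
b_vs_a = [
    Hit("ecA", "crA", 85.0, 300, 305, 310, 500.0),
    Hit("ecB", "crA", 45.0, 280, 400, 310, 200.0),  # fails length threshold
]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Requiring the match in both directions is what prevents a short, conserved domain in a large protein from being reported as an orthologue of an unrelated full-length gene.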
The sequences for each of seven housekeeping genes (adk, fumC, gyrB, icd, mdh, purA, and recA) were extracted from the whole-genome sequences of several enteric bacteria, individually aligned using MUSCLE (24), and then concatenated. FindModel (77) was used to identify the best phylogenetic model to fit the data. Phylogenetic trees were constructed by the maximum-likelihood method using the General Time Reversible (GTR) plus Gamma model with PhyML (34) and Bayesian likelihood using GTR plus Gamma with 100,000 generations, sampling every 1,000, and a burn-in of 1,000 using MrBayes, and gave consistent results.
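The concatenation step can be sketched in a few lines of Python, assuming that each gene has already been aligned separately and that every alignment contains the same set of taxa. The data structure, function name, and toy sequences below are hypothetical; the sketch only shows the bookkeeping that keeps columns from different genes in register across taxa.

```python
GENE_ORDER = ["adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA"]


def concatenate_alignments(alignments, gene_order):
    """Join per-gene alignments into one supermatrix.

    alignments maps gene name -> {taxon name -> aligned sequence}.
    Every gene must contain the same taxa, and all sequences within
    one gene alignment must have the same length.
    """
    taxa = sorted(alignments[gene_order[0]])
    for gene in gene_order:
        aln = alignments[gene]
        if sorted(aln) != taxa:
            raise ValueError(f"taxon set differs for gene {gene}")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"unequal sequence lengths in gene {gene}")
    return {t: "".join(alignments[g][t] for g in gene_order) for t in taxa}


# Toy alignments for two of the seven genes (hypothetical sequences).
toy = {
    "adk": {"taxon1": "ATG-CA", "taxon2": "ATGACA"},
    "recA": {"taxon1": "GGTT", "taxon2": "GGCT"},
}
supermatrix = concatenate_alignments(toy, ["adk", "recA"])
print(supermatrix)
```

The concatenated sequences can then be written out in whatever format the tree-building program expects.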
C. rodentium strain ICC169 ΔespA (ICC304) was constructed using a modified version of the lambda red-based mutagenesis system (18). The espA gene and flanking regions were PCR amplified from wild-type C. rodentium ICC169 genomic DNA using the primers EspA-Fw/BamHI-EspA-Rv and BamHI-EspA-Fw/EspA-Rv (Table 1). The two fragments were digested with BamHI, ligated to each other, and cloned into pGEMT, and the nonpolar aphT cassette was then inserted into the BamHI site between the two fragments. After verification for correct orientation of the kanamycin resistance gene cassette, the insert was PCR amplified using the EspA-Fw and EspA-Rv primers (Table 1). The PCR product was electroporated into ICC169 containing pKD46 encoding the lambda red recombinase (18). Transformants were selected on kanamycin plates, and the correct insertion of the kanamycin resistance gene cassette was confirmed by PCR.
Full-length espA was amplified using C. rodentium ICC169 genomic DNA as a template and the primer pair EcoRV-rbsEspA-Fw and BamHI-EspA-Rv (Table 1). The PCR product was digested and ligated into the EcoRV/BamHI sites of pACYC184 to produce plasmid pICC490, which constitutively expresses espA from the tetracycline promoter (Table 1). The construction was confirmed by DNA sequencing.
Genes encoding EspS and EspH fusions to TEM-1 β-lactamase were constructed by PCR amplification using ICC169 or E2348/69 genomic DNA and the primer sets NdeI-EspS-Fw/EcoRI-EspS-Rv and Nde-EspH-Fw/EcoRI-EspH-Rv, respectively (Table 1). The plasmids resulting from the ligation of the digested PCR products into pCX340 were called pICC453 and pICC454, respectively.
Pathogen-free male C3H/HeJ mice, 6 to 8 weeks old, were purchased from Harlan Olac (Bicester, United Kingdom). All animals were housed in individually HEPA-filtered cages with sterile bedding and free access to sterilized food and water. All animal experiments were performed in accordance with the Animals (Scientific Procedures) Act 1986 and were approved by the local ethical review committee. Independent single-infection experiments were performed twice using four mice per group. The mice were inoculated by oral gavage with 200 μl of overnight LB-grown C. rodentium suspension in phosphate-buffered saline (PBS) (~5 × 10⁹ CFU). The number of viable bacteria used as an inoculum was determined by retrospective plating on LB agar containing antibiotics. Stool samples were recovered aseptically at various time points after inoculation, and the number of viable bacteria per gram of stool was determined by plating them on LB agar (93). The experiment was terminated at day 8 postinoculation by sacrificing the mice.
The translocation assay is an adaptation of the assay described previously (14). The Swiss 3T3 cell line was grown in Dulbecco's modified Eagle's medium (DMEM) containing 4,500 mg liter⁻¹ glucose supplemented with 10% fetal calf serum and 2 mM glutamine at 37°C in 5% CO2. Two hundred μl of cells was seeded in black-wall/clear-bottom 96-well plates at a density of 2 × 10⁴ cells per well to obtain 100% confluence on the day of infection. C. rodentium strains were grown for 8 h in LB broth and then transferred into fresh, sterile DMEM containing 1,000 mg liter⁻¹ glucose and incubated statically at 37°C in 5% CO2 overnight prior to infection. Each well was infected with 40 μl of the static overnight culture of either C. rodentium ICC169 or ICC304 carrying either pICC453 or pICC454. The plate was then centrifuged at 1,000 rpm for 5 min at room temperature. The infection was carried out at 37°C in 5% CO2 for 2 h, followed by induction of the TEM-1 β-lactamase fusion protein for 1 h 30 min in the presence of 1 mM IPTG (isopropyl-β-D-thiogalactopyranoside). The cells were washed three times with Hanks' buffered salt solution (HBSS) and incubated for a minimum of 1 h 30 min in the dark at room temperature in 100 μl of 6 mM probenecid in HBSS-20 mM HEPES buffer supplemented with 20 μl of 6× CCF2/AM solution freshly prepared with the CCF2/AM loading kit (Invitrogen). The cells were washed in HBSS, and the fluorescence was quantified on a Fluostar Optima reader with excitation at 410 nm (10-nm band-pass); emission was detected via 450-nm (blue fluorescence) and 520-nm (green fluorescence) filters. The translocation rate was expressed as the 450/520-nm emission ratio. Experiments were performed in triplicate.
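The readout of the assay reduces to a simple ratio per well, averaged over the triplicate measurements. The short Python sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the fluorescence values are invented for illustration, and no background subtraction is applied because none is described here.

```python
from statistics import mean


def emission_ratio(blue_450, green_520):
    """Return the 450/520-nm emission ratio for one well."""
    if green_520 <= 0:
        raise ValueError("green (520-nm) signal must be positive")
    return blue_450 / green_520


def mean_emission_ratio(replicates):
    """Average the ratio over replicate wells.

    replicates is a list of (blue_450, green_520) readings.
    """
    return mean(emission_ratio(b, g) for b, g in replicates)


# Invented triplicate readings (arbitrary fluorescence units).
wells = [(300.0, 100.0), (330.0, 110.0), (270.0, 90.0)]
print(mean_emission_ratio(wells))
```

Because cleavage of CCF2 by translocated β-lactamase shifts emission from green toward blue, a higher ratio is read as more translocation, and comparing strains by ratio rather than by raw blue signal reduces the effect of well-to-well differences in dye loading.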
The annotated genome sequence of C. rodentium ICC168 has been deposited in the public databases under accession numbers FN543502 (chromosome), FN543503 (pCROD1), FN543504 (pCROD2), and FN543505 (pCROD3). The sequence of the fourth plasmid was identical to that of the previously sequenced plasmid pCRP3 (accession number AF311902). The raw sequence data generated for Citrobacter freundii ballerup 7851 has been deposited in the trace archive under accession number ERA000106 (ftp://ftp.era.ebi.ac.uk/vol1/ERA000/).
The genome of C. rodentium ICC168 consists of a 5.3-Mb circular chromosome and four plasmids (pCROD1 to -3 and pCRP3) that range in size from 54 to 3 kb (Table 2; see Table S1 in the supplemental material). The genome of C. rodentium ICC168 shares significant synteny with those of E. coli K-12 and other members of the Enterobacteriaceae; however, there is evidence of repeat-mediated recombination events (see below and data not shown). The chromosome is predicted to contain 4,984 CDSs, including 182 pseudogenes, 86 tRNAs, and 7 rRNA operons. We identified a total of 10 prophage-like regions, including five complete prophages, which were mainly Mu- or P2-like, and four putative lambdoid prophage remnants (Table 3). Seventeen genomic islands (GIs) were also identified within the chromosome (GI1 to GI16 and the LEE) as regions of aberrant GC content carrying mobility functions or showing evidence of integration, such as being flanked by direct repeats or insertion sequence (IS) elements. The exceptions are GI2, GI6, and GI13, which were identified on the basis of similarity to Salmonella pathogenicity islands 4 and 9. Many of the known and putative virulence factors of C. rodentium are encoded on the GIs (Table 3).
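One of the criteria above, aberrant GC content, lends itself to a simple sliding-window scan. The Python sketch below is a generic illustration rather than the procedure used for this annotation: the window size, step, and deviation threshold are arbitrary assumptions, and in practice a flagged window would still need supporting evidence such as mobility genes or flanking repeats.

```python
def gc_fraction(seq):
    """Return the G+C fraction of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def aberrant_windows(seq, window, step, max_deviation):
    """List windows whose GC fraction deviates from the overall value.

    Returns (start, end, gc_fraction) tuples with 0-based, end-exclusive
    coordinates for windows deviating by more than max_deviation.
    """
    overall = gc_fraction(seq)
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if abs(gc - overall) > max_deviation:
            flagged.append((start, start + window, gc))
    return flagged


# Toy sequence: a GC-rich backbone with a 20-bp AT-rich insert.
toy_seq = "GC" * 20 + "AT" * 10 + "GC" * 20
print(aberrant_windows(toy_seq, window=20, step=20, max_deviation=0.5))
```

In the toy sequence only the AT-rich insert is reported, because the backbone windows differ from the overall GC fraction by far less than the chosen threshold.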
A total of 113 IS elements comprising 29 different types were found within the genome of C. rodentium ICC168, six of which (ISCro1 to -6) were newly identified in this study. The most abundant element, ISCro1, which had 22 intact copies in the genome sequence, was previously referred to as IS679 (19) but is actually a novel IS element. ISCro1 was found to be very similar to the single IS679 element in C. rodentium (59%, 97%, and 98% amino acid identity to transposases A, B, and C, respectively), and it is possible, therefore, that one is a derivative of the other. It has been suggested that IS679-like elements may be involved in the horizontal transfer of virulence determinants (32), and it is notable that both GI7, which encodes the virulence factor LifA (47), and the LEE are flanked by intact copies of ISEc14 at one end and ISCro1 at the other end.
To determine the relationship between C. rodentium, E. coli, and other enteric bacteria, a phylogenetic tree was constructed, based on the DNA sequences of seven conserved backbone genes located in core regions of the genome (Fig. 1). Figure 1 shows that C. rodentium clusters within the tree between other members of the Enterobacteriaceae. However, even though C. rodentium and C. koseri appear to cluster, the branch lengths separating them are almost as long as those separating E. coli and Salmonella, and C. freundii is more distantly related. These data indicate that Citrobacter as a genus is polyphyletic. This is consistent with the whole-genome ANI shared between C. rodentium and C. koseri (92.2%), E. coli K-12 (90.7%), Salmonella Typhi CT18 (90.7%), and C. freundii (88.9%).
Reciprocal FASTA comparisons showed that the number of genes shared between C. rodentium, EPEC O127:H6 E2348/69 (referred to here as EPEC) (42), and EHEC O157:H7 Sakai (referred to here as EHEC) (37) is 2,940 (Fig. 2). This compares to 3,305 shared between E. coli strains K-12 MG1655 (referred to here as K-12) (6), EPEC, and EHEC.
The numbers of CDSs unique to C. rodentium and those shared exclusively between C. rodentium and either EPEC or EHEC were also high: 1,585, 230, and 229 CDSs, respectively (Fig. 2). We analyzed these three CDS groups in more detail. We subtracted from the groups those genes that were also shared with K-12 (Fig. 2). This comparison has been used previously to differentiate core E. coli genes from pathotype-specific functions (42). Almost half of the CDSs C. rodentium shares with either EPEC (75/117 CDSs) or EHEC (67/146 CDSs), but without an orthologue in K-12, were found to be clustered and located on GIs or prophages (Fig. 3). Of the genes shared with either EPEC or EHEC that were in the “backbone” of C. rodentium, many had putative functions associated with metabolism and sugar transport and utilization. Of particular note, C. rodentium carries all of the genes in the propanediol utilization operon, which are also found in EPEC but are absent in many other E. coli strains, including EHEC and K-12 (42). This supports the suggestion that the cob-pdu locus originated in a common ancestor of the Enterobacteriaceae and has been lost repeatedly by different lineages (42, 79). Also, among the genes shared only with EHEC was a urease gene cluster. The lack of urease production in EHEC has been linked to a premature stop codon in ureD (38, 65); however, C. rodentium encodes the full-length UreD and can produce urease (84). Interestingly, also among the EHEC and C. rodentium shared genes were those encoding the clustered regularly interspaced short palindromic repeats (CRISPR)-associated proteins, which were absent from EPEC and divergent from those found in K-12. The C. rodentium genome sequence also contains two CRISPRs, located in genomic locations equivalent to those in EHEC but which contain different numbers of repeats and spacers. No CRISPRs were apparent within the genome of EPEC. Figure 3 shows that C. rodentium-specific genes were located throughout the chromosome, although 414 of the 1,578 unique genes (genes not shared with EPEC, EHEC, or K-12) were located on genomic islands or prophages and are likely to have been acquired by recent horizontal gene transfer.
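The partitioning of C. rodentium CDSs into shared and unique groups, followed by subtraction of those with a K-12 orthologue, is a sequence of set operations. A minimal Python sketch is given below; it assumes that orthologue calls are already available as sets of C. rodentium CDS identifiers, and the function name and toy identifiers are hypothetical rather than taken from the annotation.

```python
def partition_cds(all_cds, in_epec, in_ehec, in_k12):
    """Partition CDSs by the genomes in which they have an orthologue.

    Each of in_epec, in_ehec, and in_k12 is the set of CDS identifiers
    with an orthologue in that genome.
    """
    core = in_epec & in_ehec
    epec_only = in_epec - in_ehec
    ehec_only = in_ehec - in_epec
    unique = all_cds - in_epec - in_ehec
    return {
        "core": core,
        "epec_only": epec_only,
        "ehec_only": ehec_only,
        "unique": unique,
        # Same groups after removing CDSs that also occur in K-12.
        "core_not_k12": core - in_k12,
        "epec_only_not_k12": epec_only - in_k12,
        "ehec_only_not_k12": ehec_only - in_k12,
        "unique_not_k12": unique - in_k12,
    }


# Toy example with eight hypothetical CDS identifiers.
all_cds = {f"g{i}" for i in range(1, 9)}
groups = partition_cds(
    all_cds,
    in_epec={"g1", "g2", "g3", "g4"},
    in_ehec={"g1", "g2", "g5", "g6"},
    in_k12={"g1", "g3", "g5", "g7"},
)
for name in sorted(groups):
    print(name, sorted(groups[name]))
```

The groups without a K-12 orthologue are the ones of interest when separating pathotype-associated functions from the shared E. coli-like backbone.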
Of the 2,940 core genes present in C. rodentium, EPEC, and EHEC, only 99 had no orthologues in K-12 (see Table S2 in the supplemental material), and of these 76 were located on GIs and prophages. Over half (49) of these 76 genes were associated with virulence and included those found on the LEE, as well as other non-LEE-encoded T3SS effectors. Of note, among the remaining shared genes was a cluster with a putative role in the production of 4-hydroxybenzoate decarboxylase, which is found sporadically in other members of the Enterobacteriaceae, as well as elsewhere, and may be involved in metabolism of aromatic compounds under anaerobic conditions (20, 53, 56).
The genome of C. rodentium ICC168 has 188 pseudogenes, 43% of which are found in prophages, GIs, or plasmids (summarized in Table S3 in the supplemental material). Almost a quarter of the pseudogenes (42/188 CDSs) have been caused by prophage- or IS-mediated insertional inactivation, including disruption of flagellum biogenesis genes and other core and accessory functions (see Table S3 in the supplemental material). The remainder of the pseudogenes are truncated by frameshift mutations or premature stop codons or are gene remnants identified by similarity to full-length genes in other bacteria. Fifty-six of 188 pseudogenes had only a single frameshift mutation or premature stop codon, indicating that these inactivating mutations were relatively recent events. There are a number of sugar uptake and energy generation pseudogenes (see Table S3 in the supplemental material), which may be indicative of metabolic streamlining. Also, a number of pseudogenes are associated with virulence factors, including several T3SS effector proteins and essential functions within three distinct fimbrial operons (Table 4; see Tables S3 and S4 in the supplemental material).
Through targeted investigations and comparison with EPEC and EHEC, a number of factors have previously been shown to be important for C. rodentium virulence. Here, we describe the full repertoire of the known and potential virulence-associated determinants that were identified in the genome sequence of ICC168 (summarized in Table S4 in the supplemental material).
C. rodentium has a type II secretion system (T2SS) encoded in a single locus (ROD_44871 to ROD_45001). Nine putative substrates for this secretion system were also identified, including four likely polysaccharide-degrading enzymes with similarity to chitinases encoded adjacent to the T2SS itself and four additional similar enzymes (including one encoded by a pseudogene) encoded elsewhere in the C. rodentium genome. C. rodentium also carries an orthologue of yodA, the product of which has been shown to be secreted by the pO157-encoded T2SS and to promote virulence in EHEC (40).
The LEE pathogenicity island, along with its associated T3SS, is conserved among the genomes of EPEC, EHEC, and C. rodentium, but with some notable differences. While the LEEs of EPEC and EHEC are inserted into the SelC tRNA locus, the C. rodentium LEE is in a nonsynonymous location flanked by IS elements, adjacent to another genomic island (GI8). Moreover, in relation to EPEC and EHEC, the rorf1 and espG genes of C. rodentium are located at the opposite end of the LEE (19).
C. rodentium possesses a repertoire of 35 genes (including six pseudogenes) (Table 4) encoding proteins that show significant sequence homology to known T3SS effector proteins in other systems or that we have shown to be translocated by the Citrobacter T3SS. In comparison, EPEC E2348/69 encodes 27 T3SS effectors (42), EPEC B171 has 40 (42), and EHEC Sakai has 62 effector proteins (88) (Table 4). All of the C. rodentium effectors are encoded in regions thought to have been laterally acquired. Of the 29 intact C. rodentium effectors, seven are encoded on the LEE; five are present on the prophage remnants denoted CRPr13, CRPr17, and CRPr33; and 14 effector proteins are encoded on five GIs (GI1, GI2, GI4, GI11, and GI14) (Table 3). The three remaining effectors are located within regions with low GC content. The three prophage-like elements, CRPr13, CRPr17, and CRPr33, each consist of two genes encoding putative phage-related functions, along with three different T3SS effectors each. It is notable that these regions share similarity with and are integrated at the same sites as lambdoid prophages identified within the genomes of EPEC and EHEC (Table 3) (37, 42). CRPr13 encodes the effectors EspX7, EspN1, and EspK, which are homologues of effectors encoded on EHEC prophage Sp6. CRPr17 encodes an Sp10-like integrase and homologues of the effectors NleC, NleB, and NleG, which are found encoded on different lambdoid prophages in EPEC and EHEC, and CRPr33 shares similarities with prophage Sp17 and encodes homologues of three of the four effectors found on this prophage. It therefore seems likely that these three regions are remnants of lambdoid prophages and that the nine associated effector proteins have been carried into the C. rodentium genome on these phages by specialized transduction.
C. rodentium genes encode at least one intact member of each of the core set of effector families, consisting of the seven LEE-encoded effectors plus the non-LEE-encoded NleA/EspI, NleB, NleE, NleF, NleG, NleH, and EspL, previously suggested to represent the minimum set required to facilitate infection (42), as they are common to all LEE-encoding species sequenced to date. All the effector families that are not represented in the C. rodentium genome are also absent from at least one of the other sequenced A/E pathogens, E2348/69, B171, and Sakai (Table 4).
In addition to homologues of EPEC and EHEC effectors, we identified a new T3SS translocated protein in C. rodentium, named EspS, which has some similarity to OspB of Shigella. Using a β-lactamase fusion, we showed that EspS is translocated in an EspA-dependent manner (Fig. 4). Mouse inoculation showed that a ΔespA mutant was severely attenuated in virulence (Fig. 5). C. rodentium also encodes the effector EspT, found only rarely in EPEC (1), which subverts the eukaryotic cytoskeleton by activating the small Rho GTPases Rac1 and Cdc42 (10), and an intact copy of EspV, which is considered to be a pseudogene in EHEC Sakai due to an N-terminal truncation (88). Interestingly, two identical copies of NleD are encoded within the genome of C. rodentium adjacent to identical transposases, and three EspN family members are present (though two of these are encoded by pseudogenes), whereas E2348/69, B171, and Sakai have only one or no representatives from these four families (Table 4). This indicates that there has been an expansion of the NleD-like gene repertoire in C. rodentium.
A new class of secretion system, the type VI secretion system (T6SS), has recently been described in a variety of bacterial species. T6SS gene clusters commonly consist of between 15 and 25 genes and were initially identified based on the presence of a gene encoding a homologue of the Legionella pneumophila type IV secretion system (T4SS) protein IcmF (17). A contribution to virulence was shown for T6SSs of several pathogenic bacteria (28). The T6SS apparatus is not well characterized yet, but recent studies indicate that several core components resemble bacteriophage tail proteins (49). The Hcp and VgrG proteins have been demonstrated to be secreted in several organisms; however, a possible dual role of these proteins as a structural part of the T6SS and secreted effector molecules is becoming evident (78).
There are two distinct T6SS gene clusters in the genome of C. rodentium, which we have designated Citrobacter rodentium type six secretion system cluster 1 (CTS1) and 2 (CTS2). CTS1 is composed of 39 genes, 18 of which are conserved components of the T6SS (Fig. 6). CTS1 is almost identical to a T6SS gene cluster carried by Enterobacter cancerogenus and also shows limited similarity to components of SPI-6 (also known as Salmonella enterica centisome 7 genomic island [sci]) (29). Like the T6SS cluster encoded by E. cancerogenus and the sci locus, CTS1 also carries a chaperone-usher fimbrial biogenesis operon (29). CTS1 contains all of the components that have been determined to be necessary for a functional T6SS in other bacteria (95). Interestingly, the icmF homologue in CTS1 has a frameshift at a region containing a polyadenosine tract, which may allow regulation of the gene via transcriptional slippage in a manner analogous to that of mixE of Shigella or the pyrBI locus in E. coli (52, 74).
CTS2 consists of 16 genes, 13 of which are conserved within other T6SS clusters (Fig. 6). Although CTS2 encodes all the components required for assembly of a functional T6SS, the icmF homologue in this cluster has a bona fide frameshift, which makes it unlikely that CTS2 is sufficient to produce a functional T6SS. CTS2 is highly similar to a putative T6SS in C. freundii, which has conserved gene order and an intact icmF homologue in a single frame (data not shown). CTS2 also shares limited similarity with components of the T6SS clusters in Pseudomonas species (59, 85).
The differing architectures and lack of homology between CTS1 and CTS2 suggest that these two T6SS clusters have distinct evolutionary histories. In addition to CTS1 and CTS2, which each encode one Hcp family protein, there are four other Hcp homologues encoded in the C. rodentium chromosome, which share little homology to each other and thus may also have been acquired independently.
EHEC also encodes a T6SS, which we have designated the enterohemorrhagic E. coli type six secretion system cluster (EHS), which is encoded in O island 7 and has high homology to the Shigella sonnei T6SS (85). EHS is distinct from both C. rodentium T6SSs and consists of 33 genes, 18 of which are conserved components of the T6SS (Fig. 6). Unlike in CTS1 and CTS2, the icmF gene in the EHS operon is not interrupted by a frameshift. Additionally, downstream of ehsG there is a region including genes that are associated with Rhs proteins (51, 90). EHEC encodes only two Hcp homologues, EhsH1 and EhsH2, which are both encoded in the EHS cluster and have very limited similarity to each other. In contrast to the T3SS, which is conserved in all A/E pathogens and is essential for virulence, the overall lack of homology between EHS, CTS1, and CTS2 suggests that the T6SS may facilitate more subtle strain-specific adaptations to divergent host or environmental niches.
Type I secretion systems (T1SS) are involved in the transport of a wide range of substrates, and in common with many other enteric bacteria, C. rodentium encodes multiple T1SSs. Of particular note are four genomic islands (GI2, GI6, GI8, and GI13) (Table 3) that are predicted to each encode T1SS apparatus, as well as a presumptive secreted substrate, a large (1,637- to 5,979-amino-acid [aa]) repetitive protein (Table 3). The gene products forming the T1SSs share ~50% amino acid identity with each other. The associated repetitive proteins show higher variation in protein length and lower overall sequence conservation, with ~25% amino acid identity between ROD_08971, ROD_25701, ROD_29581, and ROD_48171.
The arrangement, nature, and proposed functions of CDSs seen in the four C. rodentium T1SS genomic islands appear to suggest that they are functional modules that have been laterally acquired. Wider comparisons also showed that these GIs are similar in gene content, arrangement, and, in some instances, sequence to GIs previously seen in other enteric bacteria. Comparisons with these other loci showed that the large repetitive protein encoded by GI6 shares 47% identity over 3,790 aa with the large repetitive protein of S. Typhi encoded on SPI-9. Both GIs are inserted in the same intergenic region between ssrA and smpB.
GI2, GI8, and GI13 also have some similarity to SPI-9 and the related SPI-4; however, GI13 has a high degree of similarity to O island 28 in EHEC EDL933 (51% identity over 5,392 aa to the large repetitive protein), and the large repetitive protein of GI2 is similar to a putative hemagglutinin/hemolysin-related protein in Ralstonia solanacearum. SPI-9 has been shown to be required for intestinal colonization in mice and in vitro biofilm formation in S. Enteritidis (48); therefore, GI2, GI6, and GI13 may encode factors that play roles in the virulence of C. rodentium. However, transcription of the CDS for the large repetitive protein of GI2 is unlikely, owing to the insertion of prophage CRP99 in the middle of this CDS. In addition, the CDS for the large repetitive protein on GI13 has been truncated by a premature stop codon; however, as the stop codon is located close to the C terminus, it is possible that a 5,884-aa protein may be expressed as opposed to the 5,979-aa full-length protein.
There are 20 putative type V secretion systems, or autotransporters, encoded in the C. rodentium genome. Fourteen of these, including one encoded by a pseudogene, were identified as putative autotransporter adhesins and were related to AIDA-I, TibA, or pertactin, which are involved in virulence in EPEC, enterotoxigenic E. coli (ETEC), and Bordetella pertussis, respectively (5, 8, 27). Two other autotransporters were predicted to belong to the serine protease autotransporter of the Enterobacteriaceae (SPATE) family of proteins, members of which have been shown to contribute to virulence in other pathogens (7, 23, 70). The majority of the autotransporters are located in the chromosomal backbone; however, one, AdcA, which was shown to function as an adhesin but had no obvious role in colonization of mice (36), is encoded on a genomic island. In addition, the two SPATE autotransporters are encoded by genes on the large plasmid pCROD1, along with a TibA-like autotransporter adhesin and its associated glycosyltransferase.
C. rodentium carries a total of 19 fimbrial biogenesis operons, including four incomplete operons (Table 5). Of the intact operons, the product of the Kfc chaperone-usher fimbrial operon and CFC type IV fimbriae were previously determined to be involved in gastrointestinal colonization (36, 63), whereas long polar fimbriae had no discernible role in the virulence of C. rodentium (87).
Of the nonfimbrial adhesins encoded by C. rodentium genes, LifA (47) and intimin (83) are essential for colonization. We also found two LifA homologues that are divergent from LifA at the N terminus but almost identical at the C terminus and that had 97% amino acid identity to each other. In addition, two intact genes and one pseudogene encoding intimin-like proteins were identified outside of the LEE (see Table S4 in the supplemental material).
Other predicted proteins encoded by C. rodentium genes that are potentially involved in virulence include homologues of the dispersin Aap of enteroaggregative E. coli (EAEC) (58) and virulence factor CexE of ETEC (76), the virulence-related outer membrane protein PagC (66), and the Yersinia enterocolitica porin required for colonization, SpfA (36).
Other factors previously reported to be important for the virulence of C. rodentium include the transcription factors Ler, GrlA, GrlR, RegA, and RpoS (2, 19, 21, 36); the quorum-sensing locus (CroIR); and the Pho regulon (15, 16). In addition, C. rodentium genes encode three putative hemolysin expression regulatory proteins, two on the chromosome and the third on pCROD2, and the putative hemolysin activator HlyBC, which is presumably involved in the regulation of the putative hemolysin/hemagglutinin encoded by the downstream ROD_49941.
C. rodentium carries four plasmids (Table 2), the largest of which, pCROD1, is nonconjugative; encodes two toxin/antitoxin (TA) addiction systems, PemIK and CcdAB; and has a replication locus similar to those of plasmids belonging to the incompatibility group IncFII, such as the E. coli plasmid R100 (NR1) (94). In addition to functions associated with plasmid replication and maintenance, pCROD1 genes, including a colicin S4-like biosynthesis operon and one intact and one degenerate chaperone-usher fimbrial operon, encode proteins associated with virulence, including three putative autotransporters (Table 5).
The backbone of the 39-kb plasmid, pCROD2, is syntenic with those of plasmids R6K (IncX1) and pOLA52 (IncX2) (67), sharing 22 CDSs (average 47% amino acid identity) with the former and 31 CDSs (average 59% amino acid identity) with the latter out of the 51 pCROD2 CDSs. Included in the shared regions is the plasmid replication locus, suggesting that pCROD2 also belongs to incompatibility group IncX. pCROD2 carries a 16-kb conjugative plasmid transfer locus (ROD_p2371-ROD_p2531); however, transfer of this plasmid might be impaired due to the disruption of triD (ROD_p2461) by IS102. Two addiction systems are likely to be involved in the stability of this plasmid, the TA system stbDE and the putative TA locus hicAB. pCROD2 also encodes a putative H-NS DNA-binding protein (ROD_p2271) that has been linked to mitigating the impact of low-G+C DNA acquired by lateral gene transfer in other systems (22).
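The shared-CDS counts given above for pCROD2 can be expressed as fractions of the plasmid's 51 CDSs. The short sketch below does only that arithmetic, using the counts stated in the text; the variable and function names are illustrative assumptions, and the percentages it prints are derived here rather than reported by the analysis.

```python
# Counts taken from the text: 51 pCROD2 CDSs in total, of which
# 22 are shared with R6K (IncX1) and 31 with pOLA52 (IncX2).
TOTAL_PCROD2_CDS = 51
SHARED_CDS = {
    "R6K (IncX1)": 22,
    "pOLA52 (IncX2)": 31,
}


def shared_percentage(n_shared: int, total: int) -> float:
    """Return the percentage of CDSs shared, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= n_shared <= total:
        raise ValueError("n_shared must lie between 0 and total")
    return round(100.0 * n_shared / total, 1)


shared_pct = {
    plasmid: shared_percentage(count, TOTAL_PCROD2_CDS)
    for plasmid, count in SHARED_CDS.items()
}

for plasmid, pct in shared_pct.items():
    print(f"{plasmid}: {SHARED_CDS[plasmid]}/{TOTAL_PCROD2_CDS} CDSs shared ({pct}%)")
```

On these counts, roughly two-fifths of the pCROD2 CDSs have counterparts in R6K and roughly three-fifths in pOLA52, which is consistent with the higher average amino acid identity reported for the pOLA52 comparison.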
Plasmids pCROD3 and pCROD4 are small (3.9 and 3.2 kb, respectively) and nonconjugative. The sequence of the smallest plasmid is identical to that of the previously sequenced 3,172-bp pCRP3 from C. rodentium strain DBS100 (19).
The data that we have described here highlight several clear messages. The phylogenetic analysis of C. rodentium showed that although it clusters with members of the Enterobacteriaceae, C. rodentium is more distantly related to E. coli than E. coli is to Salmonella. Moreover, the genomic evidence we have presented also shows that the sequenced species currently within the genus Citrobacter do not cluster, and our analysis indicates that the genus is in fact polyphyletic.
It is apparent from the comparisons of the gene sets of C. rodentium, EPEC, and EHEC that a large percentage (~32%) of the C. rodentium genome is unique. These distinguishing functions include those known to be important for virulence in other systems, including T6SSs and associated effectors and multiple fimbrial operons, as well as other exported adhesins. Also included in this group are some novel T3SS effector proteins.
The majority of functions shared between C. rodentium, EPEC, and EHEC are also found in the nonpathogenic K-12, which suggests that these genes encode core enterobacterial functions. However, there are a significant number of genes found in C. rodentium, EPEC, and EHEC that are absent from K-12. The majority of these are located on mobile genetic elements and are also recognizable as key virulence determinants, including lifA, LEE, and many of the C. rodentium T3SS effector proteins. It is the acquisition of these factors that is likely to have been responsible for their convergently evolved common virulence strategy.
Specialized transduction by lambdoid phages has been shown to be important for the acquisition of virulence determinants by both EPEC and EHEC (88). It is clear from this analysis that this is also the likely mechanism by which many of the C. rodentium effectors have been acquired. It is important to note that although the cargo genes are found at analogous sites in the genome, comparisons of the phage genes thought to have brought these shared effectors into the genome show that they probably came from distinct phages and so are likely to be part of a large pool of temperate phages disseminating a related set of effector genes to a diverse set of hosts, underscoring their importance in pathogen evolution. It is access to this gene pool that has allowed C. rodentium, at least in part, to convergently evolve with the pathogenic E. coli strains.
It is likely that the acquisition of the LEE and its associated effector proteins had a dramatic effect on the pathogenic potential of C. rodentium in the mouse and perhaps even the niche it occupies within this host. Certainly the presence of such a large number of pseudogenes and IS elements in other bacteria, such as S. Typhi and Yersinia pestis, has been tightly linked to a change in lifestyle associated with the occupation of a new niche (71, 73). Although the numbers of pseudogenes in C. rodentium and EPEC are comparable, 60% of EPEC pseudogenes are in regions that are thought to have been laterally acquired. The inverse is true of C. rodentium, where 57% of pseudogenes fall in the core regions.
It is not just the relative number of pseudogenes lost by C. rodentium that draws parallels with S. Typhi and Y. pestis; C. rodentium has also lost metabolic, colonization, and virulence-associated functions. If there had been a change of niche, it would be expected that genes important for the previous lifestyle would be lost. Certainly the observation that the majority of pseudogenes lie in the C. rodentium backbone is consistent with this. A specific example that illustrates this point is the loss of function in the fimbrial genes that has occurred exclusively in operons unique to C. rodentium compared to EPEC and EHEC.
C. rodentium is the etiological agent of transmissible colonic hyperplasia in mice. It colonizes the gut mucosa via A/E lesions and may also reach deeper tissues (e.g., the liver and spleen). We have recently shown that espT can mediate invasion of C. rodentium into mammalian cells (10a); it is not yet known if EspT plays a role in in vivo dissemination or in the hyperplastic or immunological responses. C. rodentium is nontoxigenic, using an infection strategy that appears to rely mainly on its T3SS effector repertoire. Although we have shown that C. rodentium is genetically distinct from EPEC and EHEC and does not share the same host, the three bacteria have similar infection strategies and share virulence genes that are found on mobile genetic elements. Accordingly, it is unlikely that C. rodentium acquired the virulence loci directly from EPEC or EHEC, and the data point to the existence of a common ancestral source. Nonetheless, the availability of the complete C. rodentium genome annotation will enhance future studies of virulence mechanisms in A/E pathogens using a natural and unique host-pathogen interaction model.
We thank Simon Harris for assistance with the phylogenetic analyses. We thank the core sequencing and informatics teams at the Sanger Institute for their assistance and the Wellcome Trust for its support of the Sanger Institute Pathogen Genomics group.
Published ahead of print on 6 November 2009.
†Supplemental material for this article may be found at http://jb.asm.org/.
‡The authors have paid a fee to allow immediate free access to this article.