Francisella novicida is a close relative of Francisella tularensis, the causative agent of tularemia. The genomes of F. novicida-like clinical isolates 3523 (Australian strain) and Fx1 (Texas strain) were sequenced and compared to F. novicida strain U112 and F. tularensis strain Schu S4. The strain 3523 chromosome is 1,945,310 bp and contains 1,854 protein-coding genes. The strain Fx1 chromosome is 1,913,619 bp and contains 1,819 protein-coding genes. NUCmer analyses revealed that the genomes of strains Fx1 and U112 are mostly colinear, whereas the genome of strain 3523 has gaps, translocations, and/or inversions compared to genomes of strains Fx1 and U112. Using the genome sequence data and comparative analyses with other members of the genus Francisella, several strain-specific genes that encode putative proteins involved in RTX toxin production, polysaccharide biosynthesis/modification, thiamine biosynthesis, glucuronate utilization, and polyamine biosynthesis were identified. The RTX toxin synthesis and secretion operon of strain 3523 contains four open reading frames (ORFs) and was named rtxCABD. Based on the alignment of conserved sequences upstream of operons involved in thiamine biosynthesis from various bacteria, a putative THI box was identified in strain 3523. The glucuronate catabolism loci of strains 3523 and Fx1 contain a cluster of nine ORFs oriented in the same direction that appear to constitute an operon. Strains U112 and Schu S4 appeared to have lost the loci for RTX toxin production, thiamine biosynthesis, and glucuronate utilization as a consequence of host adaptation and reductive evolution. In conclusion, comparative analyses provided insights into the common ancestry and novel genetic traits of these strains.
Francisella tularensis is an intracellular pathogen that causes tularemia in humans, and the public health importance of this bacterium has been well documented in recent history (81). The genomes of several F. tularensis isolates from disparate geographic origins have been sequenced to date (4, 6, 12, 13, 41). Comparative genome sequence analyses have provided insights into the taxonomy, physiology, and pathogenic evolution of different subspecies of F. tularensis (33, 91). Francisella novicida strain U112, which is β-galactosidase negative but citrulline ureidase and glycerol positive, originally was isolated from water collected in the Ogden Bay Bird Refuge in Utah (39). Early comparative studies indicated that F. novicida strain U112 is less fastidious than F. tularensis, and it differs from the latter in antigenic composition as well as virulence (64). The lipopolysaccharide (LPS) of F. novicida strain U112 also is structurally distinct and biologically more active than F. tularensis LPS (87, 92). Nevertheless, F. novicida strain U112 has been used as a surrogate of F. tularensis in scores of laboratory studies, primarily due to its pathogenicity for rodents and amenability to genetic manipulation (24).
F. novicida was formally included in the genus Francisella in 1959, and strain U112 was the sole member of the species until 1989 (62). Since then, there have been at least seven reports of F. novicida or F. novicida-like bacteria isolated from humans. These include the earliest human isolates described by the Centers for Disease Control and Prevention from Louisiana and California (30), two isolates from Texas (15), and an isolate each from Australia (95), Thailand (42), and Arizona (8).
Some of these reports were from immunocompromised patients who manifested a localized, relatively mild illness, which is consistent with F. novicida being an opportunistic pathogen. Despite these reports, F. novicida is thought to constitute an environmental lineage along with Francisella philomiragia, which also has been associated with human disease (30, 94).
The genome of F. novicida strain U112 has been sequenced and compared to the genome sequences of F. tularensis pathogenic to humans (75). The draft genome sequences of Francisella isolates from Louisiana and California (GA99-3548 and GA99-3549, respectively) also have been compared to the genome sequences of F. tularensis strains (12). The draft genome sequences of F. novicida strains FTE (a mouse passage of strain U112; GenBank project number 30717) and FTG (a human isolate; GenBank project number 55313) are available. Whereas F. tularensis is a highly infectious zoonotic agent, the mechanism of transmission of F. novicida among vertebrate or invertebrate species is unknown. Despite the presence of ≥97.7% average nucleotide identities among the genome sequences of F. novicida and F. tularensis strains, they have been proposed to constitute separate species based on evolutionary analyses (40). However, amending the genus Francisella and the reclassification of F. novicida as a subspecies of F. tularensis have been formally proposed (31).
Francisella isolate Fx1, which is β-galactosidase, β-lactamase, and citrulline ureidase positive but glycerol negative, was cultured from the blood of a diabetic patient in the Galveston Bay area of Texas (15). The patient was thought to have a disseminated infection due to isolate Fx1, manifesting as bacteremia, pneumonia, and brain abscesses. Virulence studies in laboratory animals performed by standard methods showed no difference between F. tularensis subsp. tularensis and isolate Fx1 (15). Subsequent studies have designated this isolate F. tularensis subsp. novicida strain Fx1 and F. novicida strain Fx1 (22, 66). Francisella isolate 3523, the first reported Francisella from the Southern Hemisphere, was cultured from a patient who had cut a toe in brackish water in the Northern Territory of Australia (95). The patient was afebrile, had no other clinical symptoms, and recovered after antibiotic treatment. Studies using Swiss-Webster mice showed that isolate 3523 was less virulent than F. tularensis subsp. tularensis (95). The true nature of this isolate, which is glycerol and β-galactosidase positive, is unknown, and the isolate has been tentatively designated a novicida-like subspecies of F. tularensis (95). Since these isolates were from different continents/hemispheres, they were desirable candidates for genome sequencing and comparative analyses. The objectives of the present study were to decipher the genomes of F. novicida-like strains Fx1 and 3523 and to identify the genetic differences between these strains and F. tularensis subsp. novicida strain U112 by comparative genome analyses. Since genetic relationships and evolutionary contexts could be better understood by whole-genome analyses of conserved operons, it was envisaged to include the genomes of different Francisella species and strains in the comparisons when available.
Bacterial cultivation and chromosomal DNA extraction were performed at the Centers for Disease Control and Prevention, Fort Collins, CO, using standard protocols (57, 66). Genomic library construction, sequencing, and finishing were performed at the Genome Science Facilities of Los Alamos National Laboratory, Los Alamos, New Mexico, as described previously (4, 6, 98). The prediction of the number of subsystems and pairwise BLAST comparisons of protein sets within strains Fx1 and 3523 were performed using Rapid Annotation using Subsystems Technology (RAST), which is a fully automated, prokaryotic genome annotation service (3). Proteins deemed to be specific to each strain were compared against the NCBI nonredundant protein database to determine whether they were hypothetical or conserved hypothetical. If there was no adequate alignment with any protein (less than 25% identity or the aligned region is less than 25% of the predicted protein length), the translated open reading frame (ORF) was designated a hypothetical protein.
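The designation rule stated above (no adequate alignment means less than 25% identity or an aligned region shorter than 25% of the predicted protein length) can be written as a small predicate. The following is a minimal sketch, not the pipeline used in the study; the function name, argument names, and the treatment of a missing hit are illustrative assumptions.

```python
def is_hypothetical(percent_identity, aligned_length, protein_length,
                    min_identity=25.0, min_coverage=25.0):
    """Return True if the best database hit fails the alignment criteria.

    percent_identity: identity of the best hit, in percent (None if no hit).
    aligned_length:   number of residues of the query covered by the alignment.
    protein_length:   length of the predicted protein, in residues.
    The 25% thresholds follow the text; everything else is an assumption.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    if percent_identity is None:
        # Assumption: no hit at all is treated as an inadequate alignment.
        return True
    coverage = 100.0 * aligned_length / protein_length
    return percent_identity < min_identity or coverage < min_coverage


# Hypothetical example values, not taken from the study.
print(is_hypothetical(30.0, 200, 400))  # adequate identity and coverage
print(is_hypothetical(40.0, 50, 400))   # coverage is only 12.5%
```

Either criterion alone is sufficient to label an ORF hypothetical, which is why the two tests are joined with a logical OR.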
Multiple genome comparisons were performed using the progressive alignment option available in the program MAUVE, version 2.3.0. Default scoring and parameters were used for generating the alignment. A synteny plot was generated using the program NUCmer. The program uses exact matching, clustering, and alignment extension strategies to create a dot plot based on the number of identical alignments between two genomes. Prophage regions (PRs) were identified using Prophinder (http://aclame.ulb.ac.be/Tools/Prophinder/), an algorithm that combines similarity searches, statistical detection of phage gene-enriched regions, and genomic context for prophage prediction. Insertion sequences (ISs) were identified by whole-genome BLAST analysis of strains Fx1, 3523, and U112 using the IS finder (http://www-is.biotoul.fr/). Gene acquisition and loss among the three strains were determined by comparing gene order, orientation of genes (forward/reverse), GC content of genes (the percentage above or below the whole-genome average), features of intergenic regions (e.g., remnants of IS elements, integration sites, etc.), and the similarity of proteins encoded by genes at a locus of interest (>90% identity at the predicted protein level).
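One of the criteria listed above is the GC content of a gene expressed as the percentage above or below the whole-genome average. A minimal sketch of that calculation is shown here; the function names are hypothetical, and the choice to ignore ambiguous bases such as N is an assumption rather than something stated in the text.

```python
def gc_percent(seq):
    """GC content of a nucleotide sequence, in percent.

    Only unambiguous A, C, G, and T bases are counted (an assumption).
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


def gc_deviation(gene_seq, genome_gc_percent):
    """Percentage points by which a gene is above (+) or below (-) the genome average."""
    return gc_percent(gene_seq) - genome_gc_percent


# Toy sequences for illustration only.
print(gc_percent("ATGC"))          # 50.0
print(gc_deviation("AATT", 32.0))  # -32.0
```

A strongly positive or negative deviation is only one line of evidence; the text combines it with gene order, orientation, intergenic features, and protein similarity.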
DNA and protein sequences were aligned using the ClustalW (http://www.ebi.ac.uk/Tools/clustalw2/index.html) and BOXSHADE (http://www.ch.embnet.org/software/BOX_form.html) programs as described previously (80). Multiple-sequence alignments for phylogenetic analyses of strains 3523 and Fx1 were performed using the program MUSCLE, which is available at the website http://www.phylogeny.fr/ (20, 21). The alignment was followed by a bootstrapped (n = 100) neighbor-joining method for inferring the phylogenies (85). Only full-length 16S rRNA and succinate dehydrogenase (sdhA) gene sequences from high-quality, finished Francisella genomes were included in these comparisons.
The chromosome of strain 3523 was 31,692 bp larger than that of strain Fx1. Although the chromosomes of strains 3523, Fx1, and U112 differed in size, their average GC content and the percentage of sequence that encodes proteins were similar (Table 1). Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin (38). Comparative genomic analysis indicated that strain 3523 contained two GIs (GI1, 4,352 to 15,978 bp, 28.8% GC; GI2, 1,012,013 to 1,056,964 bp, 34.2% GC). Strain U112 also contained two genomic islands (GI1, 4,244 to 15,027 bp, 29.2% GC; GI2, 369,322 to 376,086 bp, 30.7% GC). However, strain Fx1 contained a single genomic island (GI1, 4,244 to 13,232 bp, 27.9% GC). Whereas GI2 of strain 3523 was unrelated to GI2 of strain U112, GI1 of strains 3523, Fx1, and U112 were related to each other, suggesting a common lateral origin. Interestingly, none of these GIs contained genes that have a role in pathogenicity.
Short sequence repeats, which include insertion sequences (IS), are the hallmark of F. tularensis genomes, and the IS elements are thought to be generally stable among different isolates despite their diverse geographical origins (75, 86). Whole-genome BLASTN and BLASTX analyses using IS finder showed that strain 3523 contained a single copy of an IS481 family element (FN3523_0714, 348 amino acids [aa]) that had no homologs in strains Fx1 and U112. Strain Fx1 contained six copies of the IS110 family element (FNFX1_0028, FNFX1_1255, FNFX1_1555, and FNFX1_1557, 315 aa each; FNFX1_0715 and FNFX1_0718, 290 aa each) that had no homologs in strains 3523 and U112. Strains 3523 and Fx1 contained IS982 family elements (FN3523_1458, 258 aa; FNFX1_0226, 187 aa) that had no homologs in strain U112. Whereas strain 3523 lacked ISFtu1, ISFtu3, and ISFtu5 sequences, strain Fx1 lacked only ISFtu5 sequences. Nevertheless, both strains contained full-length or partial homologs of other ISFtu elements in various copy numbers (data not shown).
Whole-genome alignment using MAUVE showed the presence of extensive blocks of homologous regions among strains 3523, Fx1, and U112 (Fig. 1A). NUCmer analyses revealed that the genomes of strains Fx1 and U112 were mostly colinear, whereas the genome of strain 3523 had gaps, translocations, and/or inversions compared to the genomes of strains Fx1 and U112 (Fig. 1B). In-depth sequence examination indicated that some of these gaps and/or inversions were associated with integrative and conjugative elements, including the IS elements mentioned above. The occurrence of IS elements at genomic breakpoints also has been observed in comparisons of the genomes of different subspecies of F. tularensis (75).
A three-way comparison of strains 3523, Fx1, and U112 revealed that they contained 1,583 orthologous protein-coding genes (bidirectional best hits). A similar comparison indicated that strain 3523 contained 149 protein-coding genes with no homologs in strains Fx1 and U112. Strain Fx1 contained 70 protein-coding genes with no homologs in strains 3523 and U112. Strain U112 contained 69 protein-coding genes with no homologs in strains 3523 and Fx1. A two-way comparison of protein-coding genes of strains 3523, Fx1, and U112 is shown in Table S1 in the supplemental material. In strain 3523, 494 genes could not be assigned a function based on BLAST analysis and therefore had been annotated as encoding hypothetical or conserved hypothetical proteins. In strain Fx1, 447 genes had been annotated as encoding hypothetical or conserved hypothetical proteins.
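The orthologs above were defined as bidirectional best hits. As a rough illustration of that idea for a single pair of genomes, the sketch below keeps a gene pair only when each gene is the other's top-scoring hit. The input format (a dictionary of pairwise scores), the function names, and the gene identifiers are hypothetical; the study itself used RAST for these comparisons.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> similarity score (e.g., bit score).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up identifiers and scores for illustration.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 80, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 210, ("b2", "a2"): 75, ("b2", "a1"): 60}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

A three-way ortholog set could then be taken as the genes that form bidirectional best hits in every pairwise comparison, although that extension is an assumption and is not described in the text.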
A cluster of 17 to 19 genes has been proposed to constitute the Francisella pathogenicity island (FPI), and it is found in a single copy in F. tularensis subsp. novicida strain U112 (FTN_1309 to FTN_1325), but it is duplicated in the genomes of F. tularensis subsp. tularensis (75, 91). Genome comparisons revealed that strains 3523 and Fx1 also contained a single copy of the FPI (FN3523_1373 to FN3523_1389 and FNFX1_1347 to FNFX1_1363, respectively). The order and orientation of genes within the FPI of strains 3523, Fx1, and U112 were identical. The predicted proteins within the putative FPIs of strains 3523 and Fx1 had average identities of 87 and 97%, respectively, to those from strain U112. Furthermore, strains 3523, Fx1, and U112 contained the ferric uptake regulator gene (fur) as well as the fslABCDEF operon (FN3523_1749 to FN3523_1755, FNFX1_1722 to FNFX1_1728, and FTN_1681 to FTN_1687, respectively), which is implicated in the biosynthesis of a polycarboxylate siderophore in F. tularensis subsp. tularensis strain Schu S4 and F. tularensis subsp. novicida strain U112 (37, 71). An ortholog of strain Schu S4 fupA (FTT0918), whose product is required for the efficient utilization of siderophore-bound iron (47), also was found in strains 3523, Fx1, and U112 (FN3523_0408, FNFX1_0437, and FTN_0444, respectively). The presence of FPI, fur, fslABCDEF, and fupA in almost all Francisella genomes suggests that these functions are essential for their survival in the environment and/or host.
Several pathogenic Gram-negative bacteria produce potent pore-forming cytotoxins that contain calcium-binding glycine and aspartate-rich repeat regions (93). The genetic determinants of RTX toxin synthesis and transport usually consist of a single operon (rtxCABD) and an unlinked gene (tolC) encoding an outer membrane channel (46). Although some researchers have alluded to the presence of toxins in cellular preparations of F. tularensis, there is no consensus opinion on the synthesis of toxins, and genes encoding toxins have not been found in this species (69, 90). However, homologs of tolC (FTT_1095c and FTT_1724c) have been characterized in F. tularensis (25) and also are found in strains U112 (FTN_0779 and FTN_1703), 3523 (FN3523_1096 and FN3523_1775), and Fx1 (FNFX1_0783 and FNFX1_1744).
The predicted prophage region of strain 3523 contained an operon (10,160 bp, 35.5% GC) related to those involved in RTX toxin synthesis and secretion (Fig. 2A). The first ORF (FN3523_1037, rtxC) had no homologs in the databases. Based on its location within the operon and predicted protein molecular mass (356 aa, 41.81 kDa), rtxC may encode a protein involved in fatty acylation during the activation of protoxin to cytotoxin. The second ORF (FN3523_1036, rtxA) encoded a putative protein (1,829 aa, 195 kDa) that had 30% identity to the RTX cytotoxin-related FrpC protein (1,829 aa, 197 kDa) of Neisseria meningitidis FAM20 (88) and α-hemolysin HlyA (1,024 aa, 110 kDa) of Escherichia coli (82). The third ORF (FN3523_1035, rtxB) encoded a putative ABC transporter (703 aa, 79.83 kDa) that had 50% identity to the α-hemolysin translocator ATP-binding protein HlyB (706 aa, 79.83 kDa) of E. coli (29). The fourth ORF (FN3523_1034, rtxD) encoded a putative membrane fusion protein (490 aa, 56.41 kDa) that had 32% identity to the type 1 translocator protein HlyD (478 aa, 54.48 kDa) of E. coli (67). The toxin-encoding ORF (FN3523_1036, 5,490 bp) was the largest among all protein-coding genes predicted in strain 3523. Several other Francisella genomes contained truncated genes related to rtxA of strain 3523 (e.g., FTM_1222, FTL_1124, FTT_1077c, FTH_1098, and FTA_1184; 76 to 94% identity at the predicted protein level). However, homologs of rtxB, rtxC, and rtxD were not found in any of these genomes.
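The relationship between the reported ORF length (5,490 bp) and protein length (1,829 aa) for rtxA can be checked with simple arithmetic, assuming the ORF length includes the stop codon. The sketch below also gives a rough mass estimate using an average residue mass of about 110 Da, which is a common rule of thumb and not the method used in the study; the function names are hypothetical.

```python
def orf_to_residues(orf_length_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bp // 3
    # The stop codon does not encode a residue.
    return codons - 1 if includes_stop else codons


def approx_mass_kda(residues, avg_residue_da=110.0):
    """Very rough protein mass estimate (assumes ~110 Da per residue)."""
    return residues * avg_residue_da / 1000.0


residues = orf_to_residues(5490)
print(residues)                   # 1829
print(approx_mass_kda(residues))  # rough estimate, in kDa
```

The residue count matches the 1,829 aa reported for the putative RtxA, and the rough mass estimate falls in the same range as the reported 195 kDa; an exact value would require the actual amino acid composition.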
The genomic analysis of strain Fx1 revealed a locus (12,973 bp, 32.6% GC) encoding proteins related to RTX toxins (Fig. 2B). The first ORF (FNFX1_0799, rtxC) had no homologs in the databases. Based on its location within the operon and predicted protein molecular mass (337 aa, 39 kDa), rtxC may encode a protein involved in fatty acylation during the activation of protoxin to cytotoxin. The second and third ORFs (FNFX1_0800 and FNFX1_0801, rtxA1 and rtxA2, respectively) encoded putative proteins (1,234 and 1,292 aa, 134.7 and 140 kDa, respectively) that are unrelated to α-hemolysin HlyA of E. coli but had 28 to 30% identity to the RTX cytotoxin-related proteins FrpA and FrpC of Neisseria meningitidis and the putative protein encoded by rtxA of strain 3523. Furthermore, the putative proteins encoded by rtxA1 and rtxA2 had 30% identity to each other. Interestingly, the rtx locus of strain Fx1 also contained rtxB and rtxD ORFs found in strain 3523. Whereas the rtxB ORFs (FNFX1_0802 and FN3523_1035) were 96% identical between the two strains, the rtxD ORF of strain Fx1 was fragmented due to a transposition event (FNFX1_0803 and FNFX1_0805). Within this region, strain Fx1 contained a gene encoding a hypothetical protein (FNFX1_0797, 141 aa) that had 86% identity to the putative RTX toxin (1,829 aa, encoded by rtxA) of strain 3523. An operon (1,487 bp; 31% GC) related to the hipBA locus of multidrug-tolerant bacteria also was found in this region (Fig. 2B), and putative HipA (FNFX1_0806, 418 aa) of strain Fx1 was 30% identical to the HipA persistence factor (2WIU_C, 446 aa) of E. coli. Genomes of several other F. tularensis subsp. tularensis strains contained a similar hipBA operon, but strain U112 contained only a hipB ortholog (FTN_0795) and strain 3523 lacked hipBA.
Although the α-hemolysin of E. coli is the prototype of RTX toxins, genes encoding RTX toxins appear to be more common in pathogenic members of the Pasteurellaceae and are almost always associated with genes encoding a type I secretion system. The horizontal transfer of genes encoding RTX toxins across different bacterial families has been suggested based on phylogenetic analyses (23). The identification of a locus encoding putative RTX toxins in strains 3523 and Fx1 is especially intriguing, since such genes have not been reported thus far in any of the francisellae genomes. The bacteriocin ABC transporters of some bacteria have similarities to ABC transporters of type I secretion systems (99). Although bacteriocins of Francisella have been investigated previously and designated tularecins (1, 2, 5), very little is known about the genetic basis of their synthesis/secretion. The putative RTX toxin of strain 3523 is also related to the bacteriocins of Rhizobium leguminosarum and Nitrococcus mobilis Nb-231 (data not shown). Since the bacteriocin of R. leguminosarum has similarities to RTX toxins (63) and some cytolysins can function as bacteriocins (11), the putative RTX toxin of strain 3523 also may possess properties of bacteriocins and afford environmental fitness. The biochemical characterization of rtxCABD loci of strains 3523 and Fx1 is required to verify these hypotheses. Furthermore, the presence of a vestigial RTX toxin/bacteriocin-encoding ORF in several Francisella genomes, including strains 3523 (FN3523_1038), Fx1 (FNFX1_0797), and U112 (FTN_0793), indicates that the common ancestor of these bacteria was toxigenic/bacteriocinogenic. It is possible that the ancestral locus was truncated during habitat restriction and/or reductive evolution of these bacteria, and that it was reacquired by strains 3523 and Fx1 through horizontal transfer.
Arsenic is an environmental pollutant, and some microorganisms have evolved mechanisms of resistance to this cytotoxic agent. Arsenic exists in two oxidation states, arsenite [As(III)] and arsenate [As(V)], in biological systems. In most bacteria, the minimal arsenical resistance operon contains three ORFs (arsRBC), wherein the conversion of arsenate to arsenite is accomplished by a reductase (product of arsC), arsenite is transported out of the cell by a membrane-bound efflux pump (product of arsB), and arsR encodes an arsenic resistance regulatory protein. The plasmid- or transposon-mediated horizontal transfer of genes that confer arsenic resistance has been well documented (58). F. tularensis subsp. novicida strain U112 contained a two-gene operon (1,218 bp, 30.4% GC) that encoded a putative transcriptional regulator and an arsenite exporter (arsRB) (Fig. 2C). At the protein level, strain U112 ArsB (FTN_0800, 342 aa) was 61% identical to the arsenite efflux transporter of Bacillus subtilis (BSU25790, 346 aa), and ArsR (FTN_0801, 116 aa) was 38% identical to the ArsR repressor of B. subtilis (BSU25810, 105 aa). A gene encoding a putative IS4 family transposase (247 aa) was found adjacent to the arsRB operon of strain U112. Within this region, strain U112 also contained a gene encoding a putative protein (FTN_0799, 109 aa) that was 44% identical to the small multidrug resistance antiporter EmrE of E. coli. The arsRB operon and homologs of emrE were present in F. tularensis subsp. novicida strain GA99-3548 and F. philomiragia strains ATCC 25015 and 25017, but not in strains 3523 and Fx1. In strain U112, the transposase gene associated with the arsRB operon was separated from an identical gene upstream by genes encoding a putative fatty acid hydroxylase (FTN_0797, 182 aa), an rRNA methyltransferase (FTN_0798, 718 aa), and a hipB gene. 
Adjacent to this transposase gene, strain U112 contained two ORFs encoding hypothetical proteins (FTN_0792 and FTN_0793, 140 and 189 aa, respectively) (Fig. 2C). Homologs of FTN_0792 were present in F. tularensis subsp. novicida strains GA99-3548 and GA99-3549 but not in strains 3523 and Fx1. Furthermore, FTN_0793 had 91% identity to the putative RTX toxin (1,829 aa, encoded by rtxA) of strain 3523. The occurrence of different loci (prophage region containing rtxCABD in strain 3523, transposon associated with rtxCABD in strain Fx1, and transposon associated with arsenite resistance genes in strain U112) flanked by conserved genes implies that this variable region is a genomic hot spot among different Francisella strains.
The LPS of Francisella spp. has several unique features and has been demonstrated to undergo antigenic variation (28). In contrast to F. tularensis subsp. tularensis, F. tularensis subsp. novicida strain U112 expresses a single chemotype of LPS which has been proposed to contribute to virulence in a mouse model of infection (34). Immunobiological studies have demonstrated the activation of complement by pathogenic strains of F. tularensis subsp. tularensis as well as F. tularensis subsp. novicida strain U112 and implicated their LPS O-antigen in mediating resistance to complement-mediated lysis (16). It also has been shown that mice immunized with LPS from strain U112 were protected against strain U112 but not against F. tularensis subsp. holarctica, and immunization with LPS from F. tularensis subsp. tularensis protected against F. tularensis subsp. holarctica but not against strain U112 (87). The wbt gene cluster of F. tularensis subsp. tularensis strain Schu S4 (17,378 bp, 31% GC) is involved in LPS biosynthesis and contained 15 ORFs (87). In contrast, a similar gene cluster of F. tularensis subsp. novicida strain U112 (13,880 bp, 30.6% GC) contained only 12 ORFs (87). The LPS O antigens of F. tularensis subsp. novicida and F. tularensis subsp. tularensis have been shown to be structurally and immunologically distinct, due in part to the differences in wbt genes involved in their biosynthesis (73, 87).
Comparative genomic analyses revealed that strains 3523 and Fx1 contained a cluster of 21 (23,452 bp, 31% GC) and 16 (15,841 bp, 30.6% GC) ORFs, respectively, that were related to the wbt gene cluster of F. tularensis subsp. tularensis strain Schu S4. In addition, these strains contained conserved ORFs encoding proteins putatively involved in mannose modification adjacent to the wbt gene cluster (FN3523_1475 to FN3523_1476 and FNFX1_1451 to FNFX1_1452). Table S2 in the supplemental material contains a comprehensive list of the annotated ORFs found in the wbt gene clusters of these bacteria. The 10 strain-specific ORFs found in the wbt gene cluster of strain 3523 were organized into groups of five (FN3523_1480 to FN3523_1484, 4,575 bp, 29.6% GC), two (FN3523_1486 and FN3523_1487, 2,328 bp, 29.8% GC), and three (FN3523_1492 to FN3523_1494, 3,232 bp, 28.9% GC) contiguous ORFs. Of the six strain-specific ORFs found in the wbt gene cluster of strain Fx1, four were contiguous (FNFX1_1457 to FNFX1_1460, 3,888 bp, 28.6% GC). Of the five strain-specific ORFs found in the wbt gene cluster of strain Schu S4, four were contiguous (FTT_1452c to FTT_1455c, 4,150 bp, 30% GC). The wbt gene cluster of F. tularensis subsp. novicida strain U112 contained three noncontiguous ORFs that were strain specific (FTN_1422, FTN_1424, and FTN_1428). The wbt gene clusters of F. novicida-like strains 3523 and Fx1 contained four contiguous orthologous ORFs (FN3523_1488 to FN3523_1491, 4,175 bp, 30.7% GC, and FNFX1_1461 to FNFX1_1464, 4,204 bp, 30.6% GC, respectively) that had no homologs in strains U112 and Schu S4. The wbt gene clusters of strains U112 and Schu S4 contained three contiguous orthologous ORFs (FTN_1425 to FTN_1427, 3,380 bp, 32.5% GC, and FTT_1459c to FTT_1461c, 3,391 bp, 32.5% GC, respectively) that had no homologs in strains 3523 and Fx1.
The wbt gene clusters of strain Fx1 and F. tularensis subsp. tularensis Schu S4 contained two contiguous orthologous ORFs (FNFX1_1466 and FNFX1_1467, 1,225 bp, 32.7% GC, and FTT_1462c and FTT_1463c, 1,411 bp, 31% GC, respectively) that had no homologs in strains 3523 and U112. The wbt gene clusters of strains 3523 and U112 contained two contiguous orthologous ORFs (FN3523_1495 and FN3523_1496, 2,209 bp, 34% GC, and FTN_1429 and FTN_1430, 1,735 bp, 34.3% GC, respectively) that had no homologs in strains Fx1 and Schu S4. The strain-specific ORFs in these strains were flanked by highly conserved orthologous ORFs putatively encoding WbtM (FTT_1450c, FN3523_1477, and FNFX1_1453) and WbtA (FTT_1464c, FN3523_1497, FNFX1_1468, and FTN_1431). Furthermore, the wbt gene cluster of strain Schu S4 was flanked by genes encoding ISFtu1/IS630 transposases (126 aa each), and the wbt gene cluster of strain U112 contained a copy of ISFtu3/IS1016 (233 aa) that appears to have truncated the ORF encoding a putative dTDP-d-glucose 4,6-dehydratase (FTN_1420c; WbtM). However, the wbt gene clusters of strains 3523 and Fx1 lacked transposase genes (see Table S2 in the supplemental material).
Since the wbt gene clusters of strains U112, Fx1, 3523, and Schu S4 display a cassette/mosaic structure with an outer conserved region and an inner variable region, it can be hypothesized that genes in the outer region encode functions common to all strains, whereas genes in the inner region encode serogroup-specific functions. If the number of genes in the inner variable region is an indicator of the complexity of LPS, then the LPS of strains 3523 and Fx1 likely differs considerably from that of strains U112 and Schu S4. A similar chimeric arrangement has been observed in the gene clusters encoding polysaccharide antigens in Salmonella enterica, and it has been proposed that genes in the outer conserved region mediate the conspecific exchange of genes in the inner variable region (44).
Biosynthesis of polysaccharides requires several glycosyltransferases (GTs), which catalyze the transfer of sugars from an activated donor to an acceptor molecule and are usually specific for the glycosidic linkages created (48). The genomes of strains U112, Fx1, 3523, and Schu S4 contained yet another cluster of 10 to 17 ORFs oriented in the same direction that encoded putative GTs and other proteins related to enzymes involved in polysaccharide biosynthesis or cell wall/membrane biogenesis. This cluster has been tentatively designated psl (polysaccharide synthesis locus). Table S3 in the supplemental material contains a list of the annotated ORFs found within this gene cluster. Strain 3523 had 11 contiguous strain-specific ORFs (FN3523_1278 to FN3523_1288, 10,580 bp, 30.4% GC) within the psl cluster. Strain Fx1 had two contiguous strain-specific ORFs (FNFX1_1260 and FNFX1_1261, 1,897 bp, 26.8% GC) within the psl cluster. Strain Schu S4 had three contiguous strain-specific ORFs (FTT_0794 to FTT_0796, 2,675 bp, 27% GC) within the psl cluster. The psl clusters of strains Schu S4 and Fx1 contained two orthologous ORFs that were not found in strain 3523 (FTT_0797/FNFX1_1259 and FTT_0793/FNFX1_1262). These strain-specific ORFs were flanked by three contiguous orthologous ORFs at both ends. Furthermore, strain Fx1 contained a copy of the IS110 family transposase (FNFX1_1255; 315 aa) adjacent to the ORF that encoded a putative HAD family hydrolase (FNFX1_1256; see Table S3 in the supplemental material). Based on gene content and organization, it may be surmised that the psl gene cluster is involved in LPS and/or exopolysaccharide (EPS) biosynthesis. Since the psl gene cluster of strain 3523 contained more genes than those of strains Fx1, U112, and Schu S4, it is possible that the LPS/EPS of this strain is more complex.
Vitamin B1 (thiamine pyrophosphate) is involved in several microbial metabolic functions (74). Prokaryotes have evolved elaborate mechanisms to either synthesize this important cofactor de novo or acquire it from their niche (7). Thiamine biosynthesis in most bacteria is accomplished by two major pathways; one involves the formation of hydroxymethylpyrimidine pyrophosphate (HMP-PP) from aminoimidazole ribotide using ThiC and ThiD, and the other involves the formation of hydroxyethylthiazole phosphate (HET-P) using ThiS, ThiF, ThiG, and ThiO. The enzyme thiamine phosphate synthase (ThiE) combines HMP-PP and HET-P to produce thiamine phosphate, which is phosphorylated by thiamine monophosphate kinase (ThiL) to produce thiamine pyrophosphate (7). The in vitro growth of most Francisella species is achieved by supplementing culture media with thiamine hydrochloride or thiamine pyrophosphate, indicating the absence of the thiamine biosynthesis (TBS) pathway in these bacteria (59).
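The pathway outlined above can be summarized as a small lookup of which genes each step requires, which makes it easy to see what a given gene set lacks. This is a simplified sketch based only on the steps named in the text; the dictionary layout, step labels, and function name are illustrative assumptions, and real genomes may use fused or alternative enzymes for some steps.

```python
# Steps and enzymes as described in the text; gene names follow the
# usual thi nomenclature and the grouping is a simplification.
PATHWAY_STEPS = {
    "HMP-PP formation": {"thiC", "thiD"},
    "HET-P formation": {"thiS", "thiF", "thiG", "thiO"},
    "coupling to thiamine phosphate": {"thiE"},
    "phosphorylation to thiamine pyrophosphate": {"thiL"},
}


def missing_genes(present):
    """Return, for each incomplete step, the sorted list of genes not found."""
    present = set(present)
    return {
        step: sorted(required - present)
        for step, required in PATHWAY_STEPS.items()
        if required - present
    }


# Hypothetical gene set containing only the HMP-PP branch.
print(missing_genes({"thiC", "thiD"}))
```

An empty result means every listed step has all of its genes; a bifunctional protein would need to be entered under both of the gene names it replaces.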
Strain 3523 and F. philomiragia strain ATCC 25017 contained an operon with six ORFs (thiCOSGDF; FN3523_1212 to FN3523_1217, 6,035 bp, 32% GC) encoding proteins related to enzymes involved in thiamine biosynthesis in several prokaryotes (Table 2). This gene cluster was not found in other Francisella genomes and was associated with a transposable element in F. philomiragia strain ATCC 25017 (Fig. 3A). The genetic organization of the strain 3523 thiCOSGDF locus was similar to that of the plasmid-encoded thiCOGE locus involved in thiamine biosynthesis in Rhizobium etli (56). An analogous cluster (thiOGF) is found within plasmid pEA29 of the plant pathogen Erwinia amylovora strain Ea88 (50). The chromosome of lithoautotrophic bacterium Ralstonia eutropha H16 also contains a thiCOSGE locus that is proposed to be involved in the de novo synthesis of thiamine (68). At the protein level, strain 3523 ThiC was 68% identical to ThiC of R. etli (AAC45972) and R. eutropha (H16_A0235), whereas strain 3523 ThiF was 35% identical to ThiF of E. amylovora (NP_981993; E value, 3e−24). Furthermore, strain 3523 ThiO and ThiS were 27 and 31% identical to ThiO (H16_A0236) and ThiS (H16_A0237) of R. eutropha, respectively (E values, 9e−27 to 0.001). Putative thiazole synthase ThiG of strain 3523 was ~50% identical to ThiG of R. etli (AAC45974; E value, 4e−66) and R. eutropha (H16_A0238; E value, 8e−77).
In strain 3523, FN3523_1213 appeared to encode a putative fused protein containing hydroxy-phosphomethylpyrimidine kinase and thiamine-phosphate pyrophosphorylase domains. In some bacteria, these functions are encoded by two different ORFs (thiD and thiE, respectively). Homologs of FN3523_1213 were found in several bacteria (e.g., Legionella pneumophila, Coxiella burnetii, Geobacter sulfurreducens, and Colwellia psychrerythraea; ~30% protein identity) and plants (e.g., Arabidopsis thaliana, Zea mays, and Brassica napus; ~29% protein identity). It has been proposed that these bifunctional enzymes are involved in the synthesis of HMP-PP as well as the condensation of HMP-PP and HET-P to produce thiamine monophosphate (36, 72, 74). The 5′ untranslated regions of operons involved in thiamine biosynthesis and transport have been shown to contain a regulatory element called the THI box sequence (55). Based on the alignment of conserved sequences upstream of operons involved in thiamine biosynthesis from various bacteria, a putative THI box sequence was identified upstream of thiC in strain 3523 and F. philomiragia strain ATCC 25017 (Fig. 3B). The identification of a putative THI box upstream of thiC in strain 3523 suggests thiamine-dependent regulation of this gene, as in other bacteria that have TBS genes (74).
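The pathway logic described above can be illustrated with a minimal Python sketch that checks which enzymatic steps are covered by the functions named for the strain 3523 thiCOSGDF operon. The step labels, the variable names, and the function `unassigned_steps` are hypothetical and introduced only for illustration. ThiE is counted as present on the assumption that the fused FN3523_1213 product supplies both the ThiD and ThiE activities. A step reported here is one whose enzyme is simply not mentioned in this section, which is not the same as the gene being absent from the genome.

```python
# Enzymes required for each step, as described in the text.
PATHWAY = {
    "HMP-PP formation": {"ThiC", "ThiD"},
    "HET-P formation": {"ThiS", "ThiF", "ThiG", "ThiO"},
    "HMP-PP + HET-P -> thiamine phosphate": {"ThiE"},
    "thiamine phosphate -> thiamine pyrophosphate": {"ThiL"},
}

# Functions named in the text for the strain 3523 thiCOSGDF operon.
# Assumption: ThiE is listed because FN3523_1213 is described as a
# putative fused ThiD/ThiE protein.
STRAIN_3523_FUNCTIONS = {"ThiC", "ThiO", "ThiS", "ThiG", "ThiD", "ThiE", "ThiF"}


def unassigned_steps(pathway, functions):
    """Map each incomplete step to the sorted enzymes not found in `functions`."""
    return {
        step: sorted(required - functions)
        for step, required in pathway.items()
        if required - functions
    }


if __name__ == "__main__":
    for step, missing in unassigned_steps(PATHWAY, STRAIN_3523_FUNCTIONS).items():
        print(f"{step}: not assigned in this section -> {', '.join(missing)}")
```

In this sketch only the final phosphorylation step (ThiL) is left unassigned, because that enzyme is not discussed for strain 3523 in this section.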
In bacteria that lack a TBS pathway, thiamine kinases may facilitate the salvage of dephosphorylated thiamine intermediates from the environment or growth medium (32). A gene that encodes a putative thiamine pyrophosphokinase (TPK) was found in most members of Francisella, including strains 3523, Fx1, and U112 (FN3523_0611, FNFX1_0669, and FTN_0662, respectively). F. novicida strain U112 TPK was 27% identical to Bacillus subtilis TPK (THIN_BACSU; E value, 3e−10), which catalyzes the direct conversion of thiamine to thiamine pyrophosphate (52). In Listeria monocytogenes, which has an incomplete thiamine biosynthesis pathway, it has been shown that proliferation within host cells is reduced upon the deletion of thiD and thiT, encoding an HMP salvage protein and a thiamine transporter, respectively (77). In addition, strains of E. amylovora that lack the thiOSGF operon-containing plasmid pEA29 have been shown to be altered in exopolysaccharide biosynthesis and less virulent (51). Aspergillus nidulans, the filamentous ascomycete which induces a fatal systemic mycosis in mice, has been shown to be less pathogenic when its thiamine biosynthetic pathway was blocked by mutation (70). It also has been suggested that Brucella abortus and R. etli, which have similar intracellular lifestyles, require increased thiamine in the stationary phase (56, 76). In view of these observations, the TBS gene cluster of strain 3523 could serve to enhance its survival in the environment/host and also may have an indirect role in pathogenesis. Furthermore, the TBS genes of strain 3523 could be useful in accelerating the growth of thiamine auxotrophs of F. tularensis in the laboratory.
Some bacteria have evolved mechanisms for the metabolism of uronic acids and uronates using the Entner-Doudoroff pathway (65). In this pathway, α-D-glucuronic acid (GlcUA) is converted into 2-keto-3-deoxygluconate (KDG) by a three-step process. The subsequent phosphorylation of KDG yields 2-keto-3-deoxy-6-phosphogluconate (KDPG), which is finally cleaved to produce glyceraldehyde 3-phosphate (G3P) and pyruvate. Comparative genomic analyses indicated that strains 3523 and Fx1 contained a cluster of nine ORFs that appear to constitute a polycistronic operon (FN3523_0892 to FN3523_0900 and FNFX1_0904 to FNFX1_0912, respectively, 11,784 bp, 31.7% GC). These ORFs encoded putative proteins related to enzymes involved in GlcUA catabolism (Table 2). Furthermore, genes encoding putative transposases of IS1106 and IS1016 families were found near the GlcUA utilization gene cluster of strain Fx1 but not in strain 3523 (Fig. 4A). This gene cluster was not found in other Francisella genomes, with the exception of F. philomiragia strain ATCC 25015, which appeared to contain the entire cluster (79 to 89% identity at the protein level).
The predicted mannonate dehydratase, 2-keto-3-deoxygluconokinase, KDPG aldolase, and glucuronate isomerase proteins from strain 3523 had 57, 37, 37, and 28% identities to E. coli UxuA (ECs5281), KdgK (ECs4406), KdgA (ECs2560), and UxaC (ECs3974), respectively (E values, 5e−132 to 9e−26). These enzymes catalyze the dehydration of D-mannonate to KDG, phosphorylation of KDG, cleavage of KDPG to pyruvate and G3P, and the conversion of D-glucuronate to D-fructuronate, respectively (65). The GlcUA utilization gene clusters of strains 3523 and Fx1 had some similarities to that of Bacillus stearothermophilus T-6, which has been predicted to metabolize GlcUA akin to E. coli and Bacillus subtilis (79). Furthermore, one end of the GlcUA utilization gene cluster of B. stearothermophilus T-6 also contains an ORF encoding a protein that has similarities to transposases of the IS481 family (49), indicating the possible horizontal transfer of this locus.
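The order of the GlcUA catabolic reactions described above can be sketched in Python as a list of steps that is traced from substrate to end products. The names `STEPS` and `trace` are hypothetical. The enzyme for the D-fructuronate to D-mannonate step is left as `None` because it is not named in this section, and the placement of that step between the isomerase and dehydratase reactions is an assumption made to complete the three-step route to KDG.

```python
# (substrate, enzyme, products) for each reaction; E. coli enzyme names
# are those given in the text.
STEPS = [
    ("D-glucuronate", "UxaC", ["D-fructuronate"]),
    # Assumption: intermediate step inferred; enzyme not named in this section.
    ("D-fructuronate", None, ["D-mannonate"]),
    ("D-mannonate", "UxuA", ["KDG"]),
    ("KDG", "KdgK", ["KDPG"]),
    ("KDPG", "KdgA", ["pyruvate", "G3P"]),
]


def trace(start, steps):
    """Follow the reactions from `start`; return (enzymes used, end products)."""
    by_substrate = {substrate: (enzyme, products) for substrate, enzyme, products in steps}
    frontier, enzymes, end_products = [start], [], []
    while frontier:
        metabolite = frontier.pop(0)
        if metabolite in by_substrate:
            enzyme, products = by_substrate[metabolite]
            enzymes.append(enzyme)
            frontier.extend(products)
        else:
            # No reaction consumes this metabolite, so it is an end product.
            end_products.append(metabolite)
    return enzymes, end_products


if __name__ == "__main__":
    enzymes, products = trace("D-glucuronate", STEPS)
    print("enzymes:", enzymes)
    print("end products:", products)
```

Tracing from D-glucuronate ends at pyruvate and G3P, matching the end products stated in the text.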
The ORFs encoding a putative inositol oxygenase in strains 3523, Fx1, and ATCC 25015 had no bacterial homologs in the public databases outside the genus Francisella. However, strain 3523 inositol oxygenase had 38% identity to Mus musculus myo-inositol oxygenase (56727 Miox; E value, 7e−48), which catalyzes the conversion of myo-inositol to GlcUA (9). myo-Inositol and its derivatives are ubiquitous among eukaryotes and archaea, but their synthesis and metabolism are believed to be less common among bacteria (54). Although none of the francisellae genomes sequenced to date contained ORFs encoding proteins putatively involved in the transport and/or metabolism of myo-inositol, most of them, including strains 3523 and Fx1, had a suhB homolog (FN3523_1410 and FNFX1_1384, respectively). This evolutionarily conserved gene encoded inositol-1-monophosphatase, which hydrolyzes myo-inositol-1-phosphate to yield free myo-inositol (60, 61). Thus, it appears that most members of Francisella can convert myo-inositol-1-phosphate to free myo-inositol. However, only strains 3523 and Fx1 as well as F. philomiragia strain ATCC 25015 are able to utilize myo-inositol to synthesize GlcUA, which then is metabolized using the Entner-Doudoroff pathway.
Polycations such as putrescine and spermidine are precursors for several cellular components, and pathways for the biosynthesis of these polyamines have been described in a number of species (97). Polyamines also have been implicated in bacterial oxidative stress response and virulence (78). The ubiquitous presence of genes encoding putrescine transport systems (potFGHI) in bacterial genomes suggests that these small molecules play crucial roles in cellular physiology. Strains 3523, Fx1, and U112 as well as F. tularensis subsp. tularensis strains Schu S4, WY96-3418, and OSU18 contained a cadA gene (FN3523_0462, FNFX1_0489, FTN_0504, FTT_0406, FTW_1667, and FTH_0474, respectively). This gene encoded a putative protein that had 55% identity to E. coli lysine decarboxylase (NP_418555; E value, 0), which is involved in cadaverine biosynthesis (53). The strains mentioned above also contained a potGHI operon (e.g., FN3523_1141 to FN3523_1143) and an unlinked potF gene (e.g., FN3523_0514).
Furthermore, strain 3523 contained a cluster of five ORFs (FN3523_0489 to _0493, 4,873 bp, 32.6% GC) encoding putative proteins related to enzymes involved in the biosynthesis of spermidine and putrescine (Table 2, Fig. 4B). This gene cluster was not found in strains Fx1 and U112 but was present in other F. tularensis subsp. tularensis genomes (e.g., strains Schu S4, OSU18, and FSC147). In F. tularensis subsp. tularensis strain WY96-3418, fragmented ISFtu1 elements were found near this gene cluster (Fig. 4B). Strain 3523 S-adenosylmethionine decarboxylase had 45% identity to SpeD (MJ_0315; E value, 9e−27) of Methanocaldococcus jannaschii, but it is unrelated to SpeD of E. coli. Strain 3523 spermidine synthase and arginine decarboxylase had 55 and 28% identities to E. coli SpeE (3O4F_A; E value, 1e−92) and SpeA (3NZQ_B; E value, 3e−60), respectively. Strain 3523 agmatine deiminase and N-carbamoylputrescine amidohydrolase had 37 and 57% identities to Pseudomonas aeruginosa PAO1 AguA (PA0292; E value, 1e−58) and AguB (PA0293; E value, 8e−94), respectively. These proteins are key components in the biosynthesis of polyamines in M. jannaschii, E. coli, and P. aeruginosa (14, 35, 45). The acquisition of genes encoding polyamine biosynthesis/metabolism functions may enhance the adaptive capabilities of a bacterium (78, 97). The presence of spermidine biosynthesis genes in strain 3523 as well as F. tularensis subsp. tularensis strains, but their absence in strains Fx1, U112, and F. philomiragia strain ATCC 25017, is a notable feature from a pathogenic and evolutionary standpoint.
The lac operon, which contains the structural genes encoding proteins that facilitate lactose metabolism, is found in a variety of bacteria. Lactose is imported into the cell as a free sugar by means of a permease, and the enzyme β-galactosidase hydrolyzes this disaccharide into galactose and glucose (96). Genome comparisons revealed that strain 3523 contained a cluster of two ORFs (3,165 bp, 32% GC) adjacent to the spermidine biosynthesis locus that appear to constitute an operon (FN3523_0487 to FN3523_0488) (Fig. 4B). The predicted LacZ of strain 3523 had 29% identity to the β-galactosidase (AAF16519, BgaB; E value, 2e−79) of Carnobacterium maltaromaticum (17) and 26% identity to the β-galactosidase (O07012.2, GanA; E value, 8e−76) of Bacillus subtilis (19), but it is unrelated to the β-galactosidase of E. coli. Strain 3523 LacY had 25% identity to the putative oligogalacturonide transporter (NP_752266; E value, 3e−07) of E. coli CFT073 and 22% identity to the putative sugar transporter (YP_081973; E value, 1e−08) of Bacillus cereus. A similar lac operon was found in strain Fx1 and F. philomiragia strains ATCC 25015 and 25017. However, genes encoding the galactoside O-acetyltransferase (lacA) and the regulatory protein (lacI) were not found in any of these bacteria. In F. philomiragia strain ATCC 25017, an ORF encoding a putative transposase/integrase-like protein was found near the lac operon (Fig. 4B). Furthermore, other F. tularensis subsp. tularensis genomes (e.g., strains Schu S4, WY96-3418, OSU18, and FSC147) contained a truncated lacZ. Based on the alignment of conserved sequences upstream of lacZ from nine different Francisella genomes, a putative lac promoter also was identified (Fig. 4C). This promoter had high AT content (78 to 86%) and was similar to the lac promoters of several Gram-negative bacteria.
Although the classic lac operon of E. coli consists of three ORFs (lacZYA), which are regulated by the product of the adjacent lacI repressor gene, bacteria containing only lacZY or lacZ have been identified (27, 43, 83). While the evolutionary origin of the E. coli lac operon is uncertain (83), the occurrence of lac genes near integrative and conjugative elements and the identification of E. coli-like lac operons in some Gram-positive bacteria suggest their lateral mobility (18, 26, 89). From genome comparisons, it appeared that the last common lactose-utilizing ancestor of strains Fx1 and 3523, F. philomiragia strain ATCC 25017, and F. tularensis subsp. tularensis strains acquired the lacZY operon by transposon-mediated horizontal transfer. The loss of this operon in strain U112 and most F. tularensis subsp. tularensis strains may be due to niche selection or genetic drift, and a similar mechanism for some members of Enterobacteriaceae has been proposed (83). The ability to metabolize lactose probably affords strains Fx1 and 3523 a growth advantage in environments where the sugar is present, and these may represent a subset of Francisella species that have retained an ancestral copy of the lac operon. Although detailed phylogenetic analyses are required to establish the evolutionary origin of the Francisella lac operon, the lacZ gene and its promoter identified in this study should be useful in genetic studies requiring a reporter/marker within this group of bacteria.
Based on whole-genome comparisons of different Francisella species and strains, the following genetic acquisitions and losses are evident. While strain Fx1 contained the lac operon in addition to genes for GlcUA utilization, strain 3523 contained the lac operon along with GlcUA utilization, thiamine biosynthesis, and spermidine biosynthesis genes. Whereas most F. tularensis subsp. tularensis strains had lost loci for GlcUA utilization and thiamine biosynthesis, they contained a vestigial lacZ gene and a complete speDE system. Furthermore, strains 3523 and Fx1 were the only ones containing the loci for RTX toxin production. Notably, strain U112 lacked all of these loci but contained an arsRB operon, which was absent from strains 3523 and Fx1 as well as F. tularensis subsp. tularensis strains. A summary of the presence/absence of these loci is presented in Fig. 5A. It is possible that strain U112, an environmental isolate of F. tularensis subsp. novicida, had retained an arsRB operon but had lost loci for lactose and GlcUA utilization as well as genes for thiamine biosynthesis because of niche selection. Strains 3523 and Fx1 appeared to be in transition from being environmental to pathogenic, since they have lost the arsRB operon but contain loci for lactose and GlcUA utilization, in addition to genes for RTX toxin production. Extant strains of F. tularensis subsp. tularensis appeared to have lost the arsRB operon and loci for lactose and GlcUA utilization, in addition to genes for thiamine biosynthesis, as a consequence of host adaptation and reductive evolution. Since strain 3523 and most F. tularensis subsp. tularensis strains contain a complete speDE system, the role of spermidine in the pathogenesis of tularemia needs to be carefully studied. This is especially important in light of recent reports implicating spermine and spermidine in the stimulation of host cells by F. tularensis subsp. tularensis (10).
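The acquisitions and losses described in this paragraph can be collected into a small presence/absence table. The Python sketch below is a reconstruction from the text only and is not a reproduction of Fig. 5A; the locus labels and the helper names `strains_with` and `unique_to` are hypothetical. The vestigial lacZ of F. tularensis subsp. tularensis is not counted as an intact lac operon, and F. philomiragia is omitted because it is not covered in this paragraph.

```python
# Loci reported as present in each strain or group, as stated in the text.
LOCI = {
    "3523": {"lac", "GlcUA", "thiamine", "spermidine", "RTX"},
    "Fx1": {"lac", "GlcUA", "RTX"},
    "U112": {"arsRB"},
    # Most strains; a vestigial (truncated) lacZ is present but not counted here.
    "F. tularensis subsp. tularensis": {"spermidine"},
}


def strains_with(locus, table=LOCI):
    """Return the set of strains in which `locus` is listed as present."""
    return {strain for strain, loci in table.items() if locus in loci}


def unique_to(strain, table=LOCI):
    """Return the loci listed for `strain` and for no other strain in the table."""
    others = set().union(*(loci for name, loci in table.items() if name != strain))
    return table[strain] - others


if __name__ == "__main__":
    print("RTX toxin loci:", sorted(strains_with("RTX")))
    print("Unique to U112:", sorted(unique_to("U112")))
```

Within this reduced table, the RTX locus is shared only by strains 3523 and Fx1, and arsRB is the only locus unique to strain U112.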
Based on these observations, it may be surmised that the genes/loci for arsenite resistance, GlcUA utilization, lactose utilization, and thiamine biosynthesis are more ancient. It is possible that genes for RTX toxin production and spermidine biosynthesis were acquired later by an F. tularensis/novicida-like clade. Although strains 3523 and Fx1 were isolated in different parts of the globe, both were associated with marine environments, and their recent common ancestry cannot be ruled out. Deletion-based phylogenetic analyses have indicated F. tularensis subsp. novicida to be the oldest and that both the acquisition and loss of genes have occurred during the evolution of F. tularensis (84). Phylogenetic analyses based on full-length 16S rRNA and sdhA genes support the proposition that F. tularensis subsp. novicida is very closely related to F. tularensis subsp. tularensis and indicate that strain 3523 represents a new lineage parental to the other F. tularensis subsp. novicida and F. tularensis subsp. tularensis strains (Fig. 5B and C).
In conclusion, previous genome comparisons have indicated that subspecies of F. tularensis have independently and intermittently acquired and lost genes, and that the net loss in F. tularensis subsp. tularensis is more than the net loss in F. tularensis subsp. novicida (4, 75). Based on the analyses of unidirectional genomic deletion events and single-nucleotide variations, it has been suggested that the different subspecies of F. tularensis have evolved by vertical descent (84). Analyses of the genomes of strains 3523 and Fx1 imply that these strains represent new links in the chain of evolution from the F. novicida-like ancestor to the extant strains of F. tularensis. The presence of genes encoding novel biochemical properties appeared to have contributed to the metabolic enrichment and niche expansion of F. novicida-like strains. The inability to acquire new genes coupled with the loss of ancestral traits and the consequent reductive evolution may be a cause for, and an effect of, niche restriction of F. tularensis subsp. tularensis. Although numerous previous studies have discovered the genetic basis for some of the biochemical and antigenic dissimilarities between F. tularensis subsp. novicida and F. tularensis subsp. tularensis strains, comparative genome sequence analyses have provided a comprehensive account of innate and acquired genetic traits in this important group of bacteria.
This study was funded by the United States Department of Homeland Security (Chemical and Biological Division, Science and Technology Directorate) through an interagency agreement (grant number HSHQDC-08-X-00790) and by the Centers for Disease Control and Prevention.
We thank Karen Davenport, Christine Munk, and other members of the Genome Sequencing and Finishing Team at the Los Alamos National Laboratory for help with sequencing strains Fx1 and 3523.
†Supplemental material for this article may be found at http://aem.asm.org/.
Published ahead of print on 10 June 2011.