The FNP genome consists of a single circular chromosome containing 2,429,698 bp (accession number CM000440) and a single circular plasmid of 11,934 bp (accession number CP111710). The FNP genome is larger than the FNN genome (2,174,499 bp) and the unfinished FNV genome (2,118,259 bp). The GC content of the chromosome is 26.84%, similar to FNN and FNV, and the GC content of the plasmid is 24.53%. There are 2,433 predicted ORFs, 42 pseudogenes, 45 tRNA, 15 rRNA, and 11 ncRNA genes in the FNP genome.
Map of the FNP ATCC 10953 genome.
General genome statistics for FNP, FNN and FNV.
Forty-two of the tRNAs lie in one of seven clusters, each containing from two to fourteen tRNAs. Identical clusters, in both content and internal gene order, are found in FNN. However, the relative position of the clusters in the genome is different. In addition, FNN has two further asparagine tRNAs, which are associated with rRNA gene regions that have not been fully sequenced in ATCC 10953, so it is possible that these tRNAs are also present in FNP. The tRNAs represent all twenty amino acids. In five cases (Ala, Cys, His, Trp, Tyr), there is only a single tRNA. Many of the tRNAs are duplicated (Asn, Gln, Gly, Leu, Pro, Lys, Ser, Thr, Val) or triplicated (Asp, Glu). The clusters include duplicates as well as singleton tRNAs, with no clear pattern in the distribution of singletons, duplicates and triplicates. Some tRNAs have multiple copies, only one of which is unique, e.g. Gly, Lys, Ser, Val. All four Arg tRNAs, the three Met tRNAs (one is identified as the likely initiator) and the two Phe tRNAs have unique sequences.
A 131 bp repeat sequence occurs seventeen times within the genome. In several cases, the repeat regions have the potential to encode small hypothetical peptides, which we believe were incorrectly annotated as ORFs in FNN and FNV. All of the 131 nt repeats from FNP were aligned and used to generate a consensus sequence for additional BLASTN searches. Five nearly complete repeats (≥90% of the element's length) and eight repeats with gaps were observed. Sequences corresponding to the 3′ half of the element were also found in six locations. In most cases, the repeat sequence was found in intergenic regions and not within coding sequence. One complete copy of the repeat and numerous subsequences were identified in FNN, and twenty-eight complete copies plus numerous subsequences were identified in FNV. As in FNP, these fell within intergenic regions. Because the long repeats occur within intergenic regions, it is possible that the sequence is involved in gene regulation, though no particular regulatory motifs were found within the sequence.
Many examples of conservation of gene order, operon structure and gene clustering were observed in the genome. Most of the ribosomal protein genes are organized into operons similar to the L11, L10, str, S10, spc and α operons in Escherichia coli; each encodes between two and eleven ribosomal proteins. Several non-r-protein clusters, similar to conserved non-r-protein gene clusters described in E. coli and other bacteria, were also identified in the FNP genome. The table of conserved gene clusters/operons shows the conservation of these gene regions between FNP, FN, Clostridium difficile (NC_009089), Bacillus anthracis str. Ames and E. coli MG1655 (NC_000913). The conserved gene clusters were categorized into five groups according to protein function. Clusters 1–11 in group I are primarily RNA and protein constituents of the ribosome. Group II encodes subunits of the F-type two-sector ATPase. Proteins encoded by the group III clusters are involved in RNA synthesis, modification, transcription, and translation. Cluster 17 in group IV encodes a spermidine/putrescine ATP binding cassette (ABC) superfamily transporter, while group V contains cluster 18, which codes for the molecular chaperones GroEL and GroES.
Conserved gene clusters/operons in FNP.
Slightly more than 62% of the FNP genes (1514) are found in both of the previously sequenced fusobacterial genomes. Thus, nearly 38% of the genome is either wholly unique to FNP or is shared by FNP and only one of the other two genomes. In terms of coding potential, these 919 genes account for the differences between FNP, FNN, and FNV and thus serve to distinguish FNP. Three comparative lists have been generated: ORFs unique to FNP with respect to FNN and FNV; ORFs common to FNP and FNV, but absent in FNN; and ORFs common to FNP and FNN, but absent in FNV (Table S1).
Of the FNP ORFs, 627 have no ortholog in either FNN or FNV (Table S1a), including 106 conserved hypothetical proteins, 287 hypothetical proteins and 9 pseudogenes. Thirty-eight ORFs functioning in transport, including transporters of amino acids, oligopeptides, a siderophore, and divalent metals (Hg2+), are also unique. Seven additional membrane proteins, two phosphotransferases, and two beta-lactamases are also present only in FNP. Twenty-seven ORFs related to transcriptional regulation are unique to FNP, including a LuxS autoinducer ortholog and two sensor histidine kinases. Additionally, FNP encodes several unique proteins related to DNA modification. These include such functions as methylation, histone acetylation, recombination, integration, topoisomerase, and type I restriction and modification. FNP also contains numerous prophage and transposase genes not found in FNN or FNV. Four Tra conjugation genes (FNP_1868–1871) were found only in FNP. These are adjacent to a region encoding two proteins resembling Type IV secretion components, an outer membrane protein, and four hypothetical proteins. Thus, FNP may have obtained a Type IV secretion system via HGT. This region and the prophage sequences are discussed in more detail below.
Ninety-six ORFs (5 of which are pseudogenes) are shared between FNP and FNV but are missing from FNN (Table S1b), including 20 conserved hypothetical proteins and 36 fusobacterial conserved hypothetical proteins. Citrate lyase, glutamate–ammonia lyase, serine-pyruvate aminotransferase, cholinephosphate cytidylyltransferase, sulfate reductase and malate dehydrogenase enzymes are shared by FNV and FNP. They also each contain two N-acetylmuramoyl-L-alanine amidases not found in FNN. Five regulatory proteins are common to FNP and FNV, including transcriptional regulators of the MarR, MerR, LysR, and Crp families, and the RNA polymerase sigma factor σ54.
Two hundred thirteen ORFs (5 of which are pseudogenes) are found in both FNP and FNN but not in FNV (Table S1c). The total may be inflated, however, because the FNV genome is incomplete and some ORFs may have been missed. This set includes 35 conserved hypothetical proteins and 49 fusobacterial conserved hypothetical proteins. There are 30 transport proteins in this group, including iron transporters, a siderophore transporter, amino acid symporters, ion symporters, and a formate/nitrate transporter. Twelve ORFs related to genetic regulation are in this group, including RpiR and TetR family regulators, Fur, an iron-dependent transcriptional regulator, and a response regulator and sensor histidine kinase for the ethanolamine utilization pathway. Also common to FNP and FNN are the GroESL chaperonins, a cold shock protein, a ribosome-related heat shock protein, and a translation inhibitor. Other proteins of interest in this group are a beta-lactamase, bacterioferritin, at least 12 genes encoding the ethanolamine utilization gene family, an autotransporter/adhesin, an O-antigen assembly gene, 5 glycosyltransferases and the gene for a possible immunosuppressive protein, FipA.
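The three comparative lists described above amount to a partition of the FNP ORFs by where an ortholog was found. The following is a minimal sketch in Python, assuming each ORF has already been assigned a pair of flags (ortholog in FNN, ortholog in FNV). The ORF identifiers, the `classify_orfs` helper and the `percent` helper are hypothetical and are not the procedure used to build Table S1; only the counts 1514 and 2,433 come from the text.

```python
def classify_orfs(orthologs):
    """Partition FNP ORFs by which other genomes hold an ortholog.

    `orthologs` maps an ORF id to a (in_fnn, in_fnv) pair of booleans.
    Returns a dict with four sets: core, unique, fnv_only, fnn_only.
    """
    groups = {"core": set(), "unique": set(), "fnv_only": set(), "fnn_only": set()}
    for orf, (in_fnn, in_fnv) in orthologs.items():
        if in_fnn and in_fnv:
            groups["core"].add(orf)        # found in all three genomes
        elif in_fnv:
            groups["fnv_only"].add(orf)    # shared with FNV, absent in FNN
        elif in_fnn:
            groups["fnn_only"].add(orf)    # shared with FNN, absent in FNV
        else:
            groups["unique"].add(orf)      # no ortholog in FNN or FNV
    return groups


def percent(part, total):
    """Percentage of `total` represented by `part`, to one decimal place."""
    return round(100.0 * part / total, 1)


# Toy input with made-up ORF ids, for illustration only.
toy = {
    "orf_a": (True, True),
    "orf_b": (False, False),
    "orf_c": (False, True),
    "orf_d": (True, False),
}
groups = classify_orfs(toy)

# Counts quoted in the text: 1514 shared ORFs out of 2,433 predicted ORFs.
shared_pct = percent(1514, 2433)
non_shared = 2433 - 1514
```

With the counts from the text, `shared_pct` evaluates to 62.2 and `non_shared` to 919, which agrees with the "slightly more than 62%" and "919 genes" figures given above.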
Horizontal Gene Transfer
HGT can be detected by several parametric methods based on deviant nucleotide composition, dinucleotide frequencies, codon usage biases or patterns inferred by Markov chain analysis. Phylogenetic methods determine a gene's unusual similarity or distribution among organisms by comparing phylogenetic trees of different genes from the genome and assessing the significance of any resulting incongruities. Alternative phylogenetic methods that do not reconstruct phylogenetic trees also exist, such as Clarke's phylogenetic discordance test and Lawrence's rank correlation test. Another reliable means of inferring recent HGT events is the anomalous phylogenetic distribution method, wherein a gene is present in one genome but not found in several closely related genomes. This is the approach used here to examine genes that had no top BLAST hit to either of the two sequenced fusobacterial genomes, FNN and FNV.
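As an illustration of the parametric family mentioned above (not the anomalous-distribution approach used in this study), dinucleotide frequencies can be compared through a simple odds ratio, ρ(xy) = f(xy) / (f(x)·f(y)). The sketch below is one simplified form of that idea; the function names are hypothetical, and the cited methods may differ in detail, for example by also counting the reverse-complement strand.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_odds(seq):
    """Return rho[xy] = f(xy) / (f(x) * f(y)) for all 16 dinucleotides."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Only unambiguous bases contribute to the totals.
    n_mono = sum(mono[b] for b in BASES)
    n_di = sum(di[x + y] for x, y in product(BASES, repeat=2))
    if n_mono == 0 or n_di == 0:
        raise ValueError("need at least two adjacent A/C/G/T bases")
    rho = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho


def signature_distance(seq_a, seq_b):
    """Mean absolute difference between two dinucleotide odds profiles."""
    ra, rb = dinucleotide_odds(seq_a), dinucleotide_odds(seq_b)
    return sum(abs(ra[k] - rb[k]) for k in ra) / 16
```

A candidate region whose profile lies far from that of the rest of the chromosome, as measured by `signature_distance`, would be flagged for closer inspection. Any threshold would have to be chosen empirically.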
Based on BLASTP similarity searches, a total of 1235 ORFs, composed of the 621 FNP ORFs and 9 pseudogenes with no top hits to FNN or FNV (Table S1a) and 608 hypothetical or conserved hypothetical proteins, were graphically plotted to identify clusters that could represent regions of HGT. About 21% of these, or 255 ORFs, mapped within gene clusters. There were 28 specific regions or islands of interest with clusters of 5 or more genes. Top BLASTP hits for each cluster (Table S2) were examined to determine a consensus genus and species.
Whole genome display of FNP illustrating clustering of genes without hits in FNN or FNV.
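The clusters described above were identified from a graphical plot. A programmatic equivalent can be sketched as a search for runs of neighbouring flagged genes. In the sketch below, genes are represented by their integer order along the chromosome; the `find_clusters` helper and its `max_gap` tolerance are assumptions introduced for illustration, and only the minimum cluster size of 5 comes from the text.

```python
def find_clusters(flagged, min_size=5, max_gap=0):
    """Group flagged gene positions into runs of neighbouring genes.

    flagged:  iterable of integer gene positions (order along the chromosome).
    max_gap:  number of unflagged genes tolerated between two flagged genes.
    Returns (first, last, n_flagged) tuples for runs containing at least
    `min_size` flagged genes.
    """
    clusters = []
    run = []
    for pos in sorted(set(flagged)):
        # Close the current run when the next flagged gene is too far away.
        if run and pos - run[-1] > max_gap + 1:
            if len(run) >= min_size:
                clusters.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_size:
        clusters.append((run[0], run[-1], len(run)))
    return clusters
```

For example, `find_clusters([1, 2, 3, 4, 5, 9, 10, 20, 21, 22, 23, 24, 25])` returns `[(1, 5, 5), (20, 25, 6)]`: the pair at positions 9 and 10 is too short to count as a cluster.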
One hundred forty of the 255 ORFs (55%) were hypothetical proteins with no matches to other bacterial proteins. Of the remaining 115, 20 ORFs (17%) had top hits to the Clostridia. The most common top hits in this class were to Clostridium tetani, Clostridium thermocellum, Clostridium perfringens, and Desulfitobacterium hafniense. Other ORFs had top hits to other Firmicutes, including Bacillus, Streptococcus, Listeria and Enterococcus species. Hits to the archaea Methanosarcina mazei, Methanococcoides burtonii, and Methanothermobacter species, as well as to the cyanobacteria Nostoc punctiforme, Trichodesmium erythraeum and Synechocystis sp., were also observed.
A 10 kilobase (kb) region of the FNP genome from nt. 27349 to 37954 (FNP_2111–FNP_2124) appears to have arisen via HGT, since this region is not found in the other published fusobacterial genomes and its GC content (30.4%) is higher than that of the remainder of the genome (26.8%). Among the Firmicutes, the clostridial GC content ranges from 28.5% to 30.9%, so the DNA may have been acquired from this genus. The region includes 14 predicted genes, 5 of which compose Cluster I (Table S2). Twelve of the genes have no orthologs in FNN or FNV. The genes in this region are homologous to the propanediol utilization locus (pdu) of Salmonella enterica serovar Typhimurium, which also arose via HGT. Propanediol is a byproduct of the fermentation of fucose, which has been shown to be present in saliva and metabolized by oral bacteria. Although it appears that FNP is missing the fucose catabolism (fuc) operon, some bacteria such as E. coli secrete propanediol, so it is possible that FNP can utilize this propanediol pool.
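The GC comparison used above to support an HGT origin for this region (30.4% against 26.8% for the rest of the genome) can be reproduced with a few lines of Python. This is a minimal sketch under the assumption that the chromosome is available as a plain string and that coordinates are 1-based and inclusive, as in the text. The helper names are hypothetical.

```python
def gc_percent(seq):
    """GC content as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def region_gc(genome, start, end):
    """GC% of the 1-based, inclusive interval [start, end]."""
    return gc_percent(genome[start - 1:end])


def gc_excess(genome, start, end):
    """Difference between the region's GC% and the whole-sequence GC%."""
    return region_gc(genome, start, end) - gc_percent(genome)


# Toy sequence for illustration only: an AT-rich background with a GC-rich insert.
toy_genome = "AT" * 10 + "GC" * 5 + "AT" * 10
toy_excess = gc_excess(toy_genome, 21, 30)
```

For the region discussed in the text, the corresponding call would be `gc_excess(chromosome, 27349, 37954)`, where `chromosome` is a placeholder for the FNP chromosome sequence. Note that this sketch compares the region with the whole sequence rather than with the remainder only, which makes little difference for a 10 kb region in a 2.4 Mb genome.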
According to Kapatral et al., the genomes of FNN and FNV lack the necessary enzymes for valine, isoleucine, and leucine biosynthesis. A region in the FNP genome carries the ilv operon, which is responsible for the biosynthesis of these amino acids. The predicted products encoded by this locus include dihydroxy-acid dehydratase (IlvD, FNP_0059), threonine ammonia-lyase (IlvA, FNP_0060), acetolactate synthase (IlvB and IlvN, FNP_0061 and FNP_0062), 2-isopropylmalate synthase (LeuA, FNP_0063), 3-isopropylmalate dehydratase (LeuC and LeuD, FNP_0064 and FNP_0065), isopropylmalate dehydrogenase (LeuB, FNP_0067), and ketol-acid reductoisomerase (IlvC, FNP_0069). The cluster of unique genes (Cluster IV) also includes a small hypothetical protein gene between leuD and leuB (FNP_0066). With the exception of ilvA, all of these genes are missing from the genomes of the other two sequenced fusobacteria. Three additional ilv genes are located at non-adjacent loci in the genome, including an additional copy of ilvA (FNP_1302), which is not in the other genomes, and two copies of ilvE (branched-chain-amino-acid transaminase), one that is unique to F. nucleatum ATCC 10953 (FNP_1952) and one that is also found in the other two genomes (FNP_1165).
A prophage genome was identified immediately downstream of an arginine tRNA gene between coordinates 2024189 and 2053649 (28.9% GC). The ORFs are not found in FNN or FNV (Table S1a). Forty-two open reading frames (FNP_1662–1703) were predicted in this region spanning four clusters of unique genes (XXI–XXIV), including genes encoding integrase, DNA polymerase, antirepressor, helicase, and terminase proteins. Several genes encoding bacteriophage structural components, such as capsid and tail proteins, were also identified in the region, though 20 of the open reading frames encode hypothetical proteins. The predicted tail proteins were most similar to those in a potential prophage genome in C. tetani, while the terminase and packaging proteins were most similar to those in the C. perfringens phage phi3626. High scoring matches to the non-structural proteins were found in other gram-positive genomes, such as Bacillus halodurans, Streptococcus mitis, and C. thermocellum. Homologs of 10 of the phage proteins were found in FNV, though only one (a helicase, FNP_1671) was found in FNN. An additional block of phage-like genes maps between coordinates 1775962 and 1786106 (FNP_1415–1432) (Table S2, Cluster XVIII). Only one of the 19 proteins encoded in this region had orthologs in the other fusobacteria, and only two of the proteins (a replication protein and an integrase) matched other bacteriophage sequences.
Linear map of prophage located between nts 2,024,189 and 2,053,649 in FNP.
The region between nts. 2174023 and 2218775 in the FNP genome (FNP_1820–1879), containing 59 genes that include Clusters XXV–XXVII (25.4% GC), is predicted to contain a large conjugal plasmid (Table S1a, Table S2). Fifty-five unique genes, not found in FNN or FNV, encode a primase/helicase that could function as a replication initiation protein, topoisomerase, integrase, recombinase, a plasmid partitioning protein, and pseudogenes of a mobilization protein and a plasmid addiction system. This region also carries genes encoding homologs of seven Type IV secretion system (T4SS) proteins. T4SS can translocate DNA and proteins out of the bacterial cell to recipient cells; bacterial conjugation systems are a subset of this family. Full-length copies of genes encoding the T4SS proteins VirB4, VirB8, VirB10, VirB11 and VirD4 (FNP_1868–1871, 1873, 1875) are found within the conjugal plasmid region, as are truncated versions of VirB6 and VirB9. These proteins are most similar to orthologs in Mesorhizobium. The T4SS proteins identified in FNP could constitute the inner membrane and periplasmic components of the transporter, but genes encoding components for biogenesis of the T-pilus (VirB1, VirB2, VirB3, VirB5 and VirB7) are missing. A different set of proposed Type IV pilus genes is present in FNP. A cluster of eleven genes (FNP_2389–2399) plus an unlinked pilT gene may encode the pilus, as suggested by Desvaux et al.
Five composite ribozyme/transposons, similar to the CdISt IStrons described in C. difficile, were identified in the genome. The consensus IStron in FNP is 1811 nt long and contains a 477 nt intron followed by an open reading frame encoding the transposase-like protein, TlpB. The FNP IStron is 31% identical to CdISt-C34 from C. difficile and contains four conserved RNA sequences that form the catalytic core of group I introns. All five IStrons in FNP are inserted directly downstream of the pentanucleotide TTGAT, which is the conserved site of insertion of the IS8301 family of transposons. The IStrons have an average G+C content of 29%, consistent with that of both fusobacteria and clostridia. In C. difficile, the IStron self-splices to remove itself from the mRNA into which it is inserted. As a result, the insertion does not disrupt expression of the gene. In FNP, we predict that only one copy of the IStron (1361160 to 1362978) is a fully functional element with self-splicing and transposition activities, since the other copies have mutations in either the ribozyme or tlpB regions of the element. Three additional sequences with homology to portions of the ribozyme were identified in the FNP genome, and ten additional copies of tlpB-like genes occur in the genome. Homologs of this element were not found in any other organism, including the two strains of Fusobacterium that have been previously sequenced, though TlpB sequences are found in a variety of organisms, including cyanobacteria, Bacillus cereus, Enterococcus, Deinococcus. Thus, it appears that a unique exchange between C. difficile and FNP has occurred.
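The observation that every IStron sits directly downstream of TTGAT is straightforward to check once element coordinates are known. The sketch below assumes a forward-strand sequence held as a string and 1-based element start coordinates; the helper names are hypothetical, and elements on the reverse strand would need the reverse complement, which is not handled here.

```python
TARGET = "TTGAT"


def upstream_matches(genome, element_start, target=TARGET):
    """True if `target` ends immediately before the 1-based `element_start`."""
    begin = element_start - 1 - len(target)
    if begin < 0:
        return False
    return genome[begin:element_start - 1].upper() == target


def motif_sites(genome, target=TARGET):
    """0-based start positions of every (possibly overlapping) occurrence."""
    genome = genome.upper()
    sites = []
    i = genome.find(target)
    while i != -1:
        sites.append(i)
        i = genome.find(target, i + 1)
    return sites


# Toy sequence for illustration only: an element assumed to start at base 9.
toy = "AAATTGATCCCGGG"
```

Here `upstream_matches(toy, 9)` is true because bases 4 to 8 of the toy sequence spell TTGAT. `motif_sites` lists all candidate target sites, which gives a sense of how many potential insertion points a sequence offers.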
ATCC 10953 harbors a single plasmid, pFN3, which is 11,934 bp in length and has a GC content of 24.53%. Eleven pFN3 ORFs were identified: two possible replication protein genes, a possible resolvase/recombinase gene, a DNA relaxation protein gene and seven hypothetical protein genes. The two replication protein genes (FNP_pFN3g01 and FNP_pFN3g05) have predicted protein sequences with 20–22% identity and 27–32% similarity to the putative replication protein of the F. nucleatum native plasmid pFN1. The region upstream of the pFN3 replication protein gene at 1315 contains a sequence (1007 to 1136) characterized by clusters of two overlapping 18 bp repeats (repeat 1: TAATAGTACAAATTTCCC; repeat 2: TAGTACAAATTTCCCGAT). Several of the repeats are spaced at 22 bp intervals, suggesting that they may represent replication protein binding sites that are characteristic of the replication origin of iteron-regulated plasmids. The resolvase (FNP_pFN3g09) was identified based on the presence of an N-terminal resolvase domain (pfam02796). The DNA relaxation protein (relaxase) (FNP_pFN3g07) has a relaxase domain and contains the conserved consensus motifs defined for relaxase proteins. The pFN3 resolvase and relaxase genes both have potential significance for HGT. Resolvases are important in DNA recombination events, including excision and integration of mobile DNA elements. Relaxase proteins mediate the initiation of conjugal transfer of plasmid DNA. Plasmids that encode relaxases but are not themselves conjugative may be mobilized when the additional conjugative functions are provided in trans. Two other native F. nucleatum plasmids, pFN1 (AF159249) and pPA52 (AF022647), which are 98% identical, also carry relaxase genes. The occurrence of the relaxase genes suggests the possibility that these plasmids were introduced into F. nucleatum by conjugative processes. Consistent with this mechanism of HGT is the finding that plasmids or DNA sequences related to pPA52 have been detected in 18% of F. nucleatum strains examined.
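The iteron-like arrangement described above for the region upstream of the pFN3 replication gene (two overlapping 18 bp repeats, several spaced at 22 bp intervals) can be examined by locating each repeat and measuring the distance between successive copies. The sketch below uses the two repeat sequences given in the text; the helper names and the toy sequence are hypothetical, and the pFN3 sequence itself is not reproduced here.

```python
REPEAT_1 = "TAATAGTACAAATTTCCC"
REPEAT_2 = "TAGTACAAATTTCCCGAT"


def find_all(seq, motif):
    """0-based start positions of every exact occurrence of `motif`."""
    seq = seq.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def spacings(positions):
    """Distances between successive start positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


# The two repeats overlap: repeat 2 starts 3 bases into repeat 1.
overlap = REPEAT_1[3:] == REPEAT_2[:15]

# Toy sequence for illustration only: repeat 1 followed by a made-up
# 4 bp spacer, tiled three times, giving a 22 bp period.
toy = (REPEAT_1 + "ACGT") * 3
toy_spacings = spacings(find_all(toy, REPEAT_1))
```

In the toy sequence the repeat starts at positions 0, 22 and 44, so `toy_spacings` is `[22, 22]`. Applied to bases 1007 to 1136 of pFN3, the same two calls would list the actual repeat positions and intervals. Exact matching is an assumption of this sketch, so degenerate copies would require an approximate search.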
We identified 132 predicted proteins that may play a role in fusobacterial virulence. Most of these are found in FNN and FNV, though there are a few notable exceptions. As in the two previously sequenced fusobacterial genomes, we identified a VacJ homolog (FNP_0314). This protein has been shown to play a role in the intracellular spread of Shigella flexneri. Although its mechanism of action has not been examined, VacJ may play a similar role in FNP, since recent evidence suggests that F. nucleatum can invade epithelial cells, which may allow dissemination throughout the host to cause infections at non-oral sites. Other previously known virulence factors were identified in FNP, including the porin FomA (FNP_0972; not in FNV), MviN (FNP_1360), which plays a role in virulence in Salmonella typhimurium, TraT (FNP_1881), which provides resistance to complement, and VacB (FNP_1921), a ribonuclease involved in virulence gene expression in S. flexneri. The strain also carries genes for butyrate fermentation (FNP_0790, 0791, 0969, 0970, 0971, 1762 and either 1467 or 2146). The production of butyrate has been associated with mouth odor and gingival inflammation. We also include FipA (described above) as a virulence factor because of its immunosuppressive properties, though it is most similar to an acetyl-CoA transferase of the butyrate fermentation system. The fipA gene is not present in FNV.
Potential virulence factor genes.
The acquisition of iron from the host environment is an important function of most bacterial pathogens. We have identified 26 predicted proteins involved in iron uptake in FNP. Three proteins, HmuV (FNP_2267), HmuU (FNP_2266), and HmuT (FNP_2269), form a heme ATP binding cassette (ABC) superfamily transporter, while two additional proteins (FNP_2270 and FNP_1765) are probable TonB-dependent heme receptors. There are also two additional iron ATP binding cassette transporters (FNP_0428–0430 and FNP_1451–1454) and a cobalamin/iron ATP binding cassette transporter (FNP_0398–0341). An Nramp family iron transporter (FNP_1660) and an OfeT family oxidase-dependent iron transporter (FNP_0531) were also annotated. Three hemolysin genes were identified (FNP_0006, FNP_0159, and FNP_0999); two of these (FNP_0159 and FNP_0999) have associated TPS family two-partner secretion proteins (FNP_0155/0156 and FNP_1246/1247, respectively). This is similar to what is seen in FNV, but different from FNN, which has three such pairs. Several of the iron transporters (FNP_0339, FNP_0426, FNP_0428, FNP_0531, and FNP_0769) are present in FNN but are missing in FNV; thus FNV may have a diminished requirement for iron or may occupy a different niche. As mentioned previously, a homolog of the ferric uptake regulator, Fur (FNP_2353), was identified in the FNP genome, though it is not present in FNV.
Sixteen possible drug transporters were annotated. These included 7 MOP/MATE family multidrug efflux pumps (FNP_0174, FNP_0640, FNP_0890, FNP_1162, FNP_1207, FNP_1299, and FNP_1596), 2 DMT superfamily drug/metabolite transporters (FNP_0388 and FNP_0622) and 2 RND family antiporters (FNP_0507 and FNP_0508). Our annotation did not permit us to predict the substrates of these transporters but it is likely that many of them are antibiotic transporters. With respect to antibiotic resistance, we also annotated 4 genes predicted to encode beta-lactamases. One of these (FNP_0627) is unique to FNP.
We identified all but one (FN0387) of the 14 outer membrane protein genes described by Kapatral et al. (two, FNP_1046 and FNP_2283, have been re-annotated as AT family autotransporters) and we discovered a gene encoding OmpW (FNP_1248), which is not found in FNN or FNV. Four potential adhesion proteins were identified, including a fibronectin-binding protein homolog (FNP_1337), a possible autotransporter adhesin (FNP_1391), and two proteins (FNP_1880 and FNP_1888) containing von Willebrand factor (vWF) type A domains. In addition to the Type IV secretion system discussed previously, FNP also carries genes that belong to the Type V secretion system. These include ten autotransporter genes (8 class 1, Type Va; 2 class 2, Type Vc) and the Tps secretion genes tpsA (Type Vb). These are a subset of the genes found in FNN and FNV.
Twenty-five ORFs predicted to be involved in the biosynthesis of LPS were identified. This is of interest because F. nucleatum has been shown to have endotoxin activity. Unlike FNN, however, FNP does not possess the lic operon, which is predicted to attach choline residues to the LPS; only the licC genes, encoding phosphocholine cytidylyltransferase and a phosphotransferase, respectively, are present. In contrast, FNP does contain genes (FNP_1105–1107) that encode N-acylneuraminate cytidylyltransferase, N-acetyl neuraminate synthase, N-acetylneuraminate synthase and a possible lipooligosaccharide sialyltransferase (FNP_1109) that might incorporate sialic acid into LPS, like FNV (note, however, that FNP_1109 is not found in FNN or FNV). This may facilitate evasion of the host immune response.