General genomic features. The
S. sanguinis genome consists of a 2,388,435-bp circular DNA molecule, which is 7 to 24% larger than previously described streptococcal genomes (Table ). The genome start point was assigned to the putative origin of replication, as determined by GC skew (
61), the location of the
dnaA gene, and similarity to other genomic sequences (
54). The putative replication termination region is ~1.2 Mbp downstream from the origin of replication (Fig. ). The G+C content of the genome is 43.40%, which is higher than the G+C content of any of the 21 other completed streptococcal genomes (35.62 to 39.72%) (Table ). For protein-encoding genes, the G+C contents are 53.55, 35.46, and 44.35% for positions 1, 2, and 3, respectively. Based on the relationship between the G+C contents of whole genomes and the G+C contents of position 3 of coding sequences, which were recently determined for 232 eubacterial genomes (
93), the expected value for position 3 in
S. sanguinis is 42.5%, in good agreement with the observed value. This observation suggests that unlike the findings for
Lactobacillus bulgaricus, the higher overall G+C content of
S. sanguinis is not due to an ongoing process of compositional change or to a different relationship of whole-genome and third-position G+C values. There are four rRNA operons containing the 5S, 16S, and 23S rRNA genes, which is fewer than in most other streptococci (Table ), despite the larger genome size and in contrast to a reported correlation between the numbers of rRNA and tRNA genes and the genome sizes in the
Firmicutes (
93). The 61 predicted tRNA genes cover all 20 amino acids, but wobble rules are required for several abundant codons (
www.sanguinis.mic.vcu.edu/supplemental.htm). Most tRNA genes are clustered near the rRNA operons; 48 of the 61 genes are less than 1 kb from an rRNA operon (Fig. ), as in
Streptococcus pneumoniae (
88).
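The compositional measures discussed above (overall G+C content, G+C content at each codon position, and GC skew along the chromosome) can be illustrated with a minimal Python sketch. The function names, the window size, and the toy sequences are assumptions introduced here for illustration; they are not the software or parameters used for the S. sanguinis analysis.

```python
from itertools import accumulate


def gc_content(seq: str) -> float:
    """Return the percent G+C of a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def codon_position_gc(cds: str) -> tuple:
    """Return percent G+C at codon positions 1, 2, and 3 of an in-frame CDS."""
    usable = cds[: len(cds) - len(cds) % 3]  # drop any trailing partial codon
    return tuple(gc_content(usable[i::3]) for i in range(3))


def cumulative_gc_skew(genome: str, window: int = 1000) -> list:
    """Return the running sum of (G - C)/(G + C) over non-overlapping windows."""
    skews = []
    for start in range(0, len(genome), window):
        chunk = genome[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return list(accumulate(skews))


# Toy data only: a C-rich half followed by a G-rich half.
toy_genome = "C" * 50 + "G" * 50
cum = cumulative_gc_skew(toy_genome, window=10)
turning_window = cum.index(min(cum))  # window where the skew changes sign
toy_positions = codon_position_gc("GCAGCAGCA")
```

In many bacterial chromosomes the extremes of the cumulative skew curve lie near the replication origin and terminus, which is why the turning point is the quantity of interest in this sketch.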
TABLE 1. Comparison of S. sanguinis SK36 genome with other streptococcal genomes
The genome contains genes that encode 2,274 predicted proteins and account for more than 90% of the sequence (
www.sanguinis.mic.vcu.edu/supplemental.htm). About 86% (1,965) of these genes are transcribed in the direction of replication, like the genes in other streptococci (
2,
87,
88). The average gene contains 935 bp of coding sequence, and the average intergenic region is 115 bp long. The latter value is smaller than the values for other streptococcal genomes that have been sequenced, whose average intergenic region lengths are 130 to 177 bp, or for the
E. coli genome, whose average intergenic region length is 139 bp. This observation suggests that
S. sanguinis has a more compact genome, although differences in annotation methods may also explain the differences. Of the predicted proteins, 89% exhibit significant similarity to proteins from other organisms. About 22% are conserved hypothetical proteins (present in multiple species but having unknown functions), and approximately 645 of the predicted proteins were confirmed by MS (
www.sanguinis.mic.vcu.edu/supplemental.htm).
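As a rough consistency check on the figures in this paragraph, the short Python sketch below recomputes the strand-bias percentage and compares the genome length with the length implied by the reported averages. All numbers are taken from the text; the variable names are illustrative, and because the averages are rounded the agreement can only be approximate.

```python
GENOME_BP = 2_388_435        # genome length reported in the text
N_GENES = 2_274              # predicted protein-encoding genes
MEAN_CDS_BP = 935            # average coding sequence per gene
MEAN_INTERGENIC_BP = 115     # average intergenic region
SAME_STRAND_GENES = 1_965    # genes transcribed in the direction of replication


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Length implied by one average gene plus one average intergenic region per gene.
implied_bp = N_GENES * (MEAN_CDS_BP + MEAN_INTERGENIC_BP)
difference_bp = GENOME_BP - implied_bp
same_strand_pct = percent(SAME_STRAND_GENES, N_GENES)

print(f"implied length: {implied_bp} bp (difference {difference_bp} bp)")
print(f"genes in direction of replication: {same_strand_pct:.1f}%")
```

The implied length falls within 1 kb of the reported genome size, and the strand-bias value matches the "about 86%" stated in the text.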
The
S. sanguinis SK36 genome was compared with other genomes to identify the proteins that are conserved among streptococci. Figure shows the homologous proteins that are shared by
S. sanguinis,
S. mutans, and
S. pneumoniae. This analysis indicated that
S. sanguinis shares 23 more proteins with
S. mutans than with
S. pneumoniae and that the latter two species share only 19 proteins not present in
S. sanguinis. Previous analyses based on rRNA (
41) and our more broadly based phylogenetic analysis confirmed that
S. sanguinis is more closely related to
S. pneumoniae than to
S. mutans, suggesting that the similarity with
S. mutans reflects the shared oral niche of these two species. The proteins shared by only
S. sanguinis and
S. mutans include 60 proteins that are hypothetical or have unknown functions and, interestingly, 34 putative transcriptional regulators. All proteins in the
S. sanguinis genome were functionally categorized and compared (Fig. ) essentially as previously described (
98).
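The three-way comparison described above amounts to set operations on groups of homologous proteins. The following Python sketch shows the idea with made-up ortholog group identifiers; the identifiers, the helper name `pairwise_only`, and the way homologs are grouped are all assumptions for illustration and do not reproduce the criteria used in this study.

```python
def pairwise_only(a: set, b: set, c: set) -> set:
    """Return ortholog groups present in both a and b but absent from c."""
    return (a & b) - c


# Hypothetical ortholog group IDs, for illustration only.
sanguinis = {"og1", "og2", "og3", "og4", "og5"}
mutans = {"og1", "og2", "og3", "og6"}
pneumoniae = {"og1", "og4", "og6", "og7"}

core = sanguinis & mutans & pneumoniae                      # shared by all three
ss_sm_only = pairwise_only(sanguinis, mutans, pneumoniae)   # S. sanguinis + S. mutans
ss_sp_only = pairwise_only(sanguinis, pneumoniae, mutans)   # S. sanguinis + S. pneumoniae
sm_sp_only = pairwise_only(mutans, pneumoniae, sanguinis)   # S. mutans + S. pneumoniae

# Excess of groups shared with S. mutans over those shared with S. pneumoniae.
excess_with_mutans = len(ss_sm_only) - len(ss_sp_only)
```

With real data, `excess_with_mutans` would correspond to the difference of 23 proteins reported in the text, and `sm_sp_only` to the 19 proteins absent from S. sanguinis.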
Energy and metabolism. Consistent with previous observations (
43),
S. sanguinis can apparently use a broad range of carbohydrate sources for survival. We identified more than 50 putative carbohydrate transporters, including phosphotransferase system enzymes specific for transport of glucose, fructose, mannose, cellobiose, glucosides, lactose, trehalose, galactitol, and maltose (see Table S1 in the supplemental material). Thus, this bacterium seems to possess a robust system for energy generation by fermentation of sugars and other carbohydrates.
Similar to
S. mutans (
2) and other streptococci,
S. sanguinis has an incomplete citrate cycle and contains only the enzymes to convert oxaloacetate into 2-oxoglutarate. Although clearly incapable of direct ATP production, this pathway fragment likely generates intermediates in the synthesis of aspartate and glutamate.
Our analysis suggests that
S. sanguinis has a robust biosynthetic capacity. All key enzymes for gluconeogenesis are present. This bacterium has both pyruvate-phosphate dikinase (EC 2.7.9.1) (encoded by SSA_1053) found in other streptococci and phosphoenolpyruvate synthase (EC 2.7.9.2) (encoded by SSA_1012 and SSA_1016) that is absent in other streptococci. There is also a
Firmicutes-specific fructose-1,6-bisphosphatase (EC 3.1.3.11) (encoded by SSA_1056) that is present in
Streptococcus agalactiae but not in
S. pneumoniae,
S. mutans,
Streptococcus pyogenes, or
Streptococcus thermophilus. Phyletic pattern analyses suggested that the genes for these enzymes were acquired by HGT (
www.sanguinis.mic.vcu.edu/supplemental.htm). Similarly, enzymes in the pentose phosphate pathway and enzymes in the purine and pyrimidine pathways, which are required for de novo synthesis of nucleotides with the possible exception of dTTP, seem to be available. Enzymes necessary for converting glutamate and glutamine to intermediates in purine and pyrimidine synthesis are also present. However, as in
S. mutans (
2), the gene for nucleoside diphosphate kinase (EC 2.7.4.6), which phosphorylates dTDP to dTTP, could not be identified. Since these enzymes are highly conserved in other streptococci, it is unlikely that we missed identifying their genes, assuming that they are derived from common progenitors.
S. sanguinis seems to have the ability to synthesize de novo all essential amino acids except the branched-chain amino acids (leucine, isoleucine, and valine), lysine, and tryptophan (
www.sanguinis.mic.vcu.edu/supplemental.htm). This conclusion is in agreement with our finding that
S. sanguinis cannot grow in a semidefined biofilm medium (
52) if supplemental amino acids are not included (data not shown). Synthesis of asparagine likely relies on a two-step process in which aspartate is bound to tRNA(Asn) by a nondiscriminating Asp-tRNA synthetase, followed by conversion of the aspartate to asparagine via a three-subunit aspartyl/glutamyl-tRNA amidotransferase, as has been shown for
Deinococcus radiodurans (
62). The latter enzyme is probably also responsible for conversion of Glu-tRNA(Gln) to Gln-tRNA(Gln), thus explaining the lack of a gene encoding glutaminyl-tRNA synthetase in the genome (
72). As noted above, enzymes for gluconeogenesis are present and could permit the bacterium to convert some amino acids (e.g., serine) into fructose-6-phosphate, an entry point of the pentose phosphate pathway. In this way, amino acids can be converted into the precursors of nucleotide biosynthesis. Marri et al. (
58) recently reported that among the streptococci,
S. mutans is unique in possessing the genes responsible for biosynthesis of histidine and that
S. pyogenes is unique in its apparent ability to convert histidine to glutamate.
S. sanguinis possesses the genes for both of these processes.
Lipid biosynthesis apparently follows the classical bacterial type II fatty acid synthase complex pathway (
34). As shown previously for
S. pneumoniae (
33,
57),
S. sanguinis encodes the enoyl-(acyl-carrier protein) reductase (EC 1.3.1.9) FabK instead of the widespread and conserved FabI type enzyme of other bacteria and plants. The FabK enzyme of
S. pneumoniae is less sensitive to inhibition by the antimicrobial triclosan than FabI is (
33,
57). Therefore,
S. sanguinis is probably more resistant than FabI-containing bacteria to inhibition of lipid biosynthesis by the triclosan used in some toothpastes. Fatty acids can be generated from amino acids since enzymes needed for the conversion of some amino acids (e.g., serine) into acetyl coenzyme A are present (
www.sanguinis.mic.vcu.edu/supplemental.htm).
As expected, the
S. sanguinis genome contains the genes required for cell wall sugar, peptidoglycan, and teichoic acid biosynthesis and degradation (
www.sanguinis.mic.vcu.edu/supplemental.htm). Single copies of the genes encoding homologs of the
S. mutans signal recognition particle components Ffh, FtsY, and small cytoplasmic RNA are present in
S. sanguinis, as are single copies of the genes encoding the secretion components YidC1, YidC2, YajC, SecA, and SecYEG (
31).
HGT. In contrast to
S. pneumoniae, in which ~5% of the genome is composed of insertion sequences (IS) (
88), we found only two apparently functional IS elements (SSA_0265 and SSA_0266; SSA_1361 and SSA_1362) in
S. sanguinis. These elements are flanked by 4-bp direct repeats and are ~80% identical at the nucleotide level to IS3 elements flanked by 3-bp repeats in
S. mutans (
55). Neither IS interrupts a known gene or open reading frame (ORF). Other evidence of transposable elements includes remnants of IS elements (SSA_1477 to SSA_1479 and SSA_0732) and a truncated transposase (SSA_2029). No intact prophages were found, although some apparent remnants (SSA_0235, SSA_2032, and SSA_2295 encoding an integrase/recombinase; SSA_2383 encoding a prophage maintenance system killer protein; and SSA_2282 encoding a phage infection protein) are present (
www.sanguinis.mic.vcu.edu/supplemental.htm). No evidence of the presence of integrons was found. Homologs of the
dpnM,
dpnA, and
dpnB genes of
S. pneumoniae encoding the DpnII restriction-modification system are present in the
S. sanguinis genome (SSA_1716 to SSA_1718). This system reduces the efficiency of HGT by phage infection, conjugative transfer, and transformation by plasmid (but not chromosomal) DNA (
47). We did not find genes for the R.StsI and M.StsI components previously found in
S. sanguinis 54 (
44).
In spite of the relative paucity of transposon- and phage-related genes, at least 270
S. sanguinis genes (12% of the genes) were identified as candidates for HGT by observing the phyletic pattern of gene distribution (
www.sanguinis.mic.vcu.edu/supplemental.htm) (see Materials and Methods). The apparent lack of phage genes and conjugative transposable elements suggests that transformation is the predominant method by which HGT occurs in
S. sanguinis. Like certain other streptococci,
S. sanguinis is naturally competent for transformation (
25). In
S. pneumoniae, 22 proteins necessary for chromosomal transformation have been identified (
70). We found that 20 of these proteins have apparent orthologs in
S. sanguinis (
www.sanguinis.mic.vcu.edu/supplemental.htm). Neither ComW, an 80-amino-acid protein which stabilizes and activates the alternative sigma factor ComX (
84) and for which there are no database matches in any other bacterium in the GenBank database, nor ComB, which functions with ComA to cleave and export competence-stimulating peptide (CSP), was identified. The SSA_1100 product exhibits similarity to ComA. However, the best matches for SSA_1100 in the GenBank database were matches to genes encoding transporters for RTX-type toxins from gram-negative bacteria (
94). Since the adjacent gene encodes a putative RTX toxin, it appears that this protein transports the toxin rather than CSP. Therefore, it appears that ComA and ComB are not present in
S. sanguinis. This absence may be related to the previous observation that ComC, the CSP precursor in
S. sanguinis, is unique among all 125 ComC sequences from 13 streptococcal species in the GenBank database in that it lacks a double-glycine cleavage site (
32). This unique cleavage site could be paired with unique proteins for processing and export.
One 70-kb cluster of 68 HGT candidates (SSA_0463 to SSA_0541) encodes an anaerobic cobalamin (vitamin B12) biosynthetic (cob) pathway, as well as propanediol utilization (pdu) and ethanolamine utilization (eut) pathways (Fig. ; see Table S2 in the supplemental material). Many of the proteins in this cluster were identified by MS, demonstrating that these genes are expressed.
Vitamin B12 is an important nutrient for human health; a deficiency leads to pernicious anemia. However, synthesis of this compound in prokaryotes (
40) occurs only by two alternative routes: an aerobic pathway that incorporates molecular oxygen during biosynthesis and an anaerobic pathway that incorporates chelated cobalt ions in the absence of oxygen (
78). All genes required for anaerobic cobalamin biosynthesis are present in
S. sanguinis. It appears that the complete vitamin B12 biosynthesis pathway is available. If so, this is the first time that the complete B12 biosynthesis pathway has been identified in streptococci, although three proteins involved in cobalamin biosynthesis and cobalt transport (
cbiMQO) have been found in
Streptococcus salivarius 57.I and
S. thermophilus (
18).
Cobalamin-dependent utilization of 1,2-propanediol via the
pdu pathway plays an important role in
Salmonella enterica serovar Typhimurium infection (
20), and the
pdu genes are correlated with cobalamin biosynthetic genes in terms of both location and coregulation. The
S. enterica serovar Typhimurium
pdu pathway contains 23 genes for the coenzyme B12-dependent catabolism of 1,2-propanediol (
12).
S. sanguinis has all of these genes except
pduM and
pduS, which encode proteins with unknown functions, and
pduN, which encodes polyhedral bodies that may not be directly related to the catabolism of 1,2-propanediol (
12) (see Table S2 in the supplemental material).
The
eut pathway in
S. enterica serovar Typhimurium is required for utilization of ethanolamine as a carbon and nitrogen source (
75). Only 4 (
eutB,
eutC,
eutD, and
eutE) of the 17 genes in the
S. enterica serovar Typhimurium
eut operon have been correlated directly with an enzymatic activity known to be required for ethanolamine utilization (
79). Three of these four genes,
eutB (SSA_0519),
eutC (SSA_0520), and
eutE (SSA_0523), have homologs in
S. sanguinis. eutD encodes a protein with phosphotransacetylase activity (
14) and exhibits 40% identity with the
S. sanguinis SSA_1207 ORF, which is annotated as a phosphate acetyltransferase. A two-component system (SSA_0516 and SSA_0517) that may regulate ethanolamine utilization in response to environmental factors is upstream of
eutA. Since ethanolamine and propanediol sources in the environment seem largely man-made (e.g., toothpaste, mouthwash, and antifreeze) and their utilization is dependent on vitamin B12, it is interesting to speculate that this large ~70-kb gene cluster may have been selected in
S. sanguinis by exposure to these man-made products.
Although very few of these cobalamin-related genes are present in previously published streptococcal genomes, many are present in other oral pathogens, including
Porphyromonas gingivalis,
Treponema denticola, and
Fusobacterium nucleatum (see Table S2 in the supplemental material). Our analyses suggest that the 70-kb cluster of HGT genes has an origin similar to the origin of orthologs in
Listeria (
www.sanguinis.mic.vcu.edu/supplemental.htm), but a more in-depth phylogenetic analysis involving more prokaryotic genomes is necessary to confirm its origin.
Two small discrete blocks of HGT candidate genes (SSA_1012 to SSA_1017 and SSA_1053 to SSA_1056) contain three genes involved in gluconeogenesis. The two genes in the second block (SSA_1053 and SSA_1056), encoding EC 2.7.9.1 and EC 3.1.3.11, are sufficient, in combination with other apparently native genes, to enable gluconeogenesis. These two genes are also found in
S. agalactiae, theoretically enabling gluconeogenesis in this organism, while all other streptococcal genomes that have been sequenced seem to lack the complete set of genes required for gluconeogenesis. The results of our analysis (see Materials and Methods) are consistent with the hypothesis that these genes were transferred by HGT to these streptococci from other bacteria belonging to the phylum
Firmicutes (
www.sanguinis.mic.vcu.edu/supplemental.htm).
Putative virulence factors and adhesins. Several proteins potentially relevant to adhesion in the oral cavity or to virulence in invasive disease were identified in the
S. sanguinis genome (see Table S3 in the supplemental material). Perhaps the most surprising is the protein encoded by SSA_1099 (Stx), which exhibits homology to RTX-type toxins in gram-negative bacteria (
94). To our knowledge, this is the first occurrence of this class of toxin gene in a gram-positive bacterium. Consistent with this unique setting, orthologs of the HlyB ATPase and HlyD “membrane fusion protein” components of an RTX toxin export system are encoded by adjacent ORFs (SSA_1100 and SSA_1101, respectively), but no homolog of the TolC outer membrane component (
36) was found. Both Stx and the putative ATPase transporter component, encoded by SSA_1100, were detected in the proteomic analysis (
www.sanguinis.mic.vcu.edu/supplemental.htm). Although the leukotoxin from the oral bacterium
Actinobacillus actinomycetemcomitans is a well-known ortholog of the Stx protein, the products of SSA_1099 to SSA_1101 are, as a whole, most similar to proteins in plant-pathogenic pseudomonads. Thus, the origin of these
S. sanguinis genes and their functions are unclear.
The genes associated with pathogenicity in
S. sanguinis also include genes encoding orthologs of the major known adhesins in other viridans species. SspC and SspD are orthologs of the SspA and SspB adhesins of
Streptococcus gordonii (
39,
53). Whereas the latter proteins are encoded by adjacent genes in
S. gordonii, this is not true in
S. sanguinis. Conversely, the
cshA and
cshB adhesin genes are not contiguous in
S. gordonii (
60), whereas the
S. sanguinis crpABC orthologs are contiguous. The ligand specificity of SspA orthologs in viridans streptococci is determined by their sequences (
39,
53). Neither SspC nor SspD is closely related to any SspA homolog that has been characterized previously. As determined by BLASTP analysis (
3), SspC has only 55% identity with its closest relative (SspA), and SspD has 33% identity with its closest relative (PaaA of
Streptococcus criceti). Therefore, it is not clear what ligand(s), if any, SspC and SspD bind. However, the 27-amino-acid region of SspB that has been shown to mediate binding of
S. gordonii to
P. gingivalis is conserved in SspC (18 identical residues and five similar residues), including perfect identity of the critical NITVK subsequence (
21). This observation suggests that SspC may also adhere to
P. gingivalis.
Lipoproteins (LP) and cell-wall anchored proteins (CWA), two classes of proteins that are surface exposed and prevalent among reported virulence factors, were predicted (
www.sanguinis.mic.vcu.edu/supplemental.htm). The
lgt and
lspA genes expected for LP processing are present (SSA_1546 and SSA_1069, respectively), as are genes encoding three sortases (SSA_0022, SSA_1219, and SSA_1631) for CWA processing. Interestingly, the numbers of these surface proteins (60 LPs and 33 CWAs) are striking compared to the numbers in related species. As determined by the same search criteria used for
S. sanguinis,
S. mutans has only 29 LPs and six CWAs.
S. pneumoniae TIGR4 possesses 40 LPs and 12 CWAs, while R6 has 39 LPs and 13 CWAs. However, many of the additional ORFs in
S. sanguinis appear to be redundant. Thus,
S. sanguinis contains nine paralogous CWAs in three families and seven paralogous LPs in three families. In addition, functional redundancy may occur in the absence of overall sequence similarity; five CWAs possess the collagen-binding domain, Pfam05737 (
23). This vast array of surface proteins may contribute to the ability of
S. sanguinis to colonize the tooth and interact with a diverse group of oral bacteria (
46) and may account for its predominance as a cause of streptococcal endocarditis (
66).
Fibrils or pili are involved in streptococcal adherence and virulence (
7,
59,
82).
S. sanguinis strains possess both short fibrils and long fibrils (
30). Fap1 of
Streptococcus parasanguinis, an ortholog of the CWA encoded by SSA_0829 or SrpA, is thought to be the structural component of long fibrils (
82), and its orthologs are important for adhesion to platelets (
9), saliva-coated hydroxyapatite (
96), and salivary agglutinin (
39). The products of SSA_0830 to SSA_0841 exhibit homology to the proteins shown to be required for the glycosylation and export of SrpA orthologs in
S. parasanguinis and
S. gordonii (
9,
17,
85). In fact, the 11 genes downstream from
srpA are most similar in terms of sequence to, and are in the same order as, the 11 genes that form the export locus of the SrpA ortholog, GspB, in
S. gordonii (
85). Shorter fibrils in
S. gordonii are composed of CshA and possibly also CshB (
59), which are orthologs of CWAs encoded by SSA_0904 to SSA_0906. The fact that
S. sanguinis has both classes of proteins, as well as the locus dedicated to SrpA export, could account for the apparent presence of both short and long fibrils. In addition, recent studies have identified long pili in
S. agalactiae (
49),
S. pyogenes (
63), and
S. pneumoniae (
7). In these bacteria, a single locus contains three putative pilin subunit genes encoding CWA motifs and one to three sortase genes that are required for assembly of the pili (
7,
49,
63).
S. sanguinis also contains an apparent pilus locus, with SSA_1632 to SSA_1635 encoding LPXTG proteins and SSA_1631 encoding a sortase. SSA_1632 to SSA_1634 also each contain a conserved “E box” domain found in many pilin genes (
90).
The SSA_2302 to SSA_2318 sequences exhibit homology to ORFs required for production of type IV pili. Such pili were originally believed to exist only in gram-negative bacteria, although the gram-positive bacterium
Ruminococcus albus appears to possess a type IV pilus that serves as an adhesin (
73). Our analysis suggests that the
S. sanguinis ORFs were acquired by HGT, perhaps from a clostridial species, and are distinct from the ORFs in
S. sanguinis that apparently encode the pseudopilus involved in genetic competence (data not shown).
Cell wall polysaccharides (CWP) serve as important receptors for agglutination and coaggregation in oral streptococci (
19,
45,
46).
S. sanguinis SK36 is similar to type strain ATCC 10556 in that it coaggregates with numerous species of
Streptococcus,
Actinomyces, and
Fusobacterium (
38,
45) (Kolenbrander and Andersen, personal communication). These interactions are inhibited by addition of 60 mM
N-acetyl-D-galactosamine, confirming the polysaccharide composition of the receptor (
45). Six structures have been defined for CWP in oral streptococci (
19), and the loci responsible for synthesis of one of these structures have been characterized in
S. gordonii (
97). Orthologs of these genes are located mostly in two genomic segments in
S. sanguinis, SSA_1509 to SSA_1519 and SSA_2211 to SSA_2225. However, these segments also contain apparent CWP synthesis genes that have close orthologs in
S. thermophilus,
Streptococcus suis,
S. pneumoniae, or
Streptococcus iniae but no orthologs in
S. gordonii. These CWP loci, therefore, appear to be unlike any loci characterized previously, and it is not clear whether they direct the synthesis of a type 1
N-acetylgalactosamine-β1→3-galactose CWP like that found in previously characterized
S. sanguinis strains (
19).
Other interesting features. The
S. sanguinis genome contains only two homologs of the twin-arginine translocation (Tat) system, which exports folded proteins with the characteristic N-terminal twin-arginine motif across the cytoplasmic membrane (
65). SSA_1132 and SSA_1133 apparently encode the TatC Sec-independent protein translocase and the TatA Sec-independent protein secretion pathway component, respectively. Of the streptococcus genomes examined to date, this system has been found only in
S. thermophilus. Our analysis showed that three genes, encoding a periplasmic lipoprotein involved in iron transport (SSA_1129), an iron-dependent peroxidase (SSA_1130), and a high-affinity Fe2+/Pb2+ permease (SSA_1131) associated with the Tat genes in
S. sanguinis, are similarly associated in other genomes, including the genomes of
S. thermophilus,
Staphylococcus aureus MRSA252, and
Staphylococcus haemolyticus. Using the TatP server (
8) to search for Tat secretion substrates, we found that the iron-dependent peroxidase gene SSA_1130 was the only ORF in the genome that encoded both a consensus Tat motif and a Tat signal peptide.
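The motif part of such a search can be sketched with a regular expression. The Python example below scans the N-terminal region of each protein for a simplified twin-arginine pattern (two arginines, any residue, then two residues drawn from small or hydrophobic sets). The pattern, the 35-residue window, and the example sequences are assumptions made for illustration; the sketch does not reproduce the TatP server, which also evaluates the signal peptide itself.

```python
import re

# Assumed, simplified twin-arginine pattern; not taken from the text.
TAT_MOTIF = re.compile(r"RR.[FGAVML][LITMVF]")


def find_tat_motifs(protein: str, n_terminal_window: int = 35) -> list:
    """Return 0-based start positions of the motif within the N-terminal window."""
    region = protein[:n_terminal_window].upper()
    return [match.start() for match in TAT_MOTIF.finditer(region)]


# Hypothetical sequences, for illustration only.
candidates = {
    "with_motif": "MSNDKSRRDFLKGAAALGLTAAGSTW",
    "without_motif": "MKKLLAAGSTPEELLKKAVEDNG",
}

hits = {name: find_tat_motifs(seq) for name, seq in candidates.items()}
putative_substrates = [name for name, positions in hits.items() if positions]
```

A motif hit alone is weak evidence; requiring both a motif and a predicted Tat signal peptide, as described above for SSA_1130, is the more stringent criterion.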
Two glucosyltransferases (GTF) were found in
S. sanguinis. The SSA_0613 product is a homolog of GtfR of
Streptococcus oralis ATCC 10557, which synthesizes water-soluble glucans with no primer dependence (
24). The SSA_1006 product is a homolog of GtfA, an enzyme that, in the presence of inorganic phosphate, converts sucrose to fructose and glucose-1-phosphate (
4). Furthermore, the products of several ORFs exhibit homology to
S. mutans non-GTF glucan-binding proteins (GBP), including the products of SSA_0019, SSA_0303, and SSA_0956. Non-GTF GBPs are cell surface receptors for glucan or secreted proteins that can become cell associated when glucan coats the bacterial cells. Although all GBPs have glucan-binding properties, they are a heterogeneous group of proteins with variations in size, glucan-binding domains, glucan-binding affinity, and function (
4).
More than 100 putative transcriptional regulators were identified in the
S. sanguinis genome (
www.sanguinis.mic.vcu.edu/supplemental.htm). Like the genomes of some other streptococci, the
S. sanguinis genome contains genes encoding the major sigma factor, sigma 70 (SSA_0825,
rpoD) and an ortholog of the competence-specific sigma factor, ComX (SSA_0016). Genes encoding NusA (SSA_1900), NusB (SSA_0452), and NusG (SSA_2205) were found, although no obvious Rho protein was identified. This was also true for the other streptococcal genomes examined. Two genes, SSA_1187 and SSA_1695, code for additional putative antitermination proteins. Two-component regulatory systems, composed of a sensor histidine kinase and a transcriptional response regulator, provide a mechanism for bacteria to sense and respond to environmental signals. We found 29 genes that apparently comprise 14 two-component regulatory systems (
www.sanguinis.mic.vcu.edu/supplemental.htm). This number is comparable to the numbers found in other streptococci (
2,
26,
37,
80,
87). The “orphan” two-component response regulator encoded by SSA_1810 is an ortholog of the tissue-specific virulence factor RitR that represses the hemin-iron transport system in
S. pneumoniae (
92) and of the virulence factor CsrR in
S. pyogenes (
29), suggesting that this regulator may have a similar role in virulence in
S. sanguinis.
S. sanguinis is one of the pioneer colonizers of the oral cavity and may initiate biofilm formation on tooth surfaces. Several putative biofilm-related genes are found in
S. sanguinis and most other streptococci. For example, SSA_0135 to SSA_0137 are clustered in an arrangement similar to that observed for their orthologs in the
adc operon, which is involved in biofilm formation in
S. gordonii (
52). Genes of the inducible fructose phosphotransferase operon, which is also related to biofilm formation in
S. gordonii (
51), are similarly clustered in
S. sanguinis (SSA_1080 to SSA_1082). The SSA_1909 product is more than 60% identical to biofilm regulatory protein A (BrpA) in
S. mutans. BrpA is a predicted surface-associated protein that functions not only in biofilm formation, autolysis, and cell division but also in the regulation of acid and oxidative stress tolerance in
S. mutans (
95).
SSA_1853 is an ortholog of the luxS gene in
S. oralis 34, whose product is responsible for the catabolism of
S-ribosylhomocysteine, producing autoinducer 2, a universal signal molecule that mediates cell-cell and interspecies communication (quorum sensing) among bacteria and influences biofilm formation and virulence (
74).
Conclusion. S. sanguinis is one of the most frequently recognized pioneering inhabitants of human oral plaque (
76). Completion of its genome sequence provided unique insight into the biology, virulence, and pathogenesis of this important bacterium. The greater size and G+C content of the
S. sanguinis genome reflect the differences between this organism and other streptococci. The genome has clearly been molded by HGT, and the mechanisms by which the large cluster of genes in the
cob,
pdu, and
eut pathways were transferred and confer a selective advantage to
S. sanguinis are rich subjects for future investigations. Our analysis of the genome also provided fundamental genetic data for investigating the etiology of caries by comparison with cariogenic
S. mutans. The biology and metabolism of this important bacterium have been described so that new prophylactic and therapeutic strategies can now be explored. Finally, previous studies have used many different strains of
S. sanguinis, several of which would now be classified as
S. gordonii,
S. parasanguinis, or other species. The availability of the SK36 sequence, as well as the bacterium, which has been deposited in the American Type Culture Collection (catalog no. BAA-1455), should facilitate future studies with this species.