The genome of Streptococcus sanguinis is a circular DNA molecule consisting of 2,388,435 bp and is 177 to 590 kb larger than the other 21 streptococcal genomes that have been sequenced. The G+C content of the S. sanguinis genome is 43.4%, which is considerably higher than the G+C contents of other streptococci. The genome encodes 2,274 predicted proteins, 61 tRNAs, and four rRNA operons. A 70-kb region encoding pathways for vitamin B12 biosynthesis and degradation of ethanolamine and propanediol was apparently acquired by horizontal gene transfer. The gene complement suggests new hypotheses for the pathogenesis and virulence of S. sanguinis and differs from the gene complements of other pathogenic and nonpathogenic streptococci. In particular, S. sanguinis possesses a remarkable abundance of putative surface proteins, which may permit it to be a primary colonizer of the oral cavity and agent of streptococcal endocarditis and infection in neutropenic patients.
Streptococcus sanguinis (formerly known as “S. sanguis,” but renamed for grammatical correctness) is an indigenous gram-positive bacterium that has been recognized for a long time as a key player in colonization of the human oral cavity (81). Like most oral streptococci, this bacterium produces alpha-hemolysis on blood agar, a characteristic linked to the ability of viridans streptococci to oxidize hemoglobin in erythrocytes by secretion of H2O2 (6). S. sanguinis binds directly to saliva-coated teeth, probably by a variety of mechanisms (46). Studies employing saliva-coated hydroxyapatite as a tooth model have revealed both lectin-carbohydrate and nonlectin interactions (27, 38, 42, 64). Some of the salivary components to which S. sanguinis binds have been identified, including salivary immunoglobulin A and α-amylase (27). Once bound, S. sanguinis serves as a tether for the attachment of other oral microorganisms that colonize the tooth surface, form dental plaque, and contribute to development of caries and periodontal disease (46). S. sanguinis may also interfere with colonization of the tooth by Streptococcus mutans, the primary species associated with dental caries (16), and its presence therefore may also be beneficial for oral health.
The viridans streptococci are the most common cause of native-valve infective endocarditis, and S. sanguinis is the viridans streptococcus most commonly implicated in this disease (66). S. sanguinis and other viridans streptococci are also emerging as important bloodstream pathogens in infections that threaten neutropenic patients (1), and these infections may be complicated by an increasing frequency of antibiotic resistance (71). The reasons underlying this previously unrecognized virulence are unknown, and antibiotic resistance is disquieting because viridans streptococci, including S. sanguinis, have been classified historically as penicillin sensitive and for many years were believed to be unable to become resistant to β-lactam antibiotics.
Here, we describe the sequence and an analysis of the genome of S. sanguinis strain SK36, which was originally isolated from human dental plaque (43). Analysis of the predicted proteins yielded new insights into potential pathogenicity and virulence factors in this important bacterium, which allowed comparison with virulence mechanisms in other streptococci. Furthermore, about 28% of the predicted proteins were confirmed with high confidence by mass spectrometry (MS).
S. sanguinis SK36 was isolated from human dental plaque (38, 42). This strain was selected because it (i) has the defining features of S. sanguinis as determined by accepted diagnostic tests, (ii) can aggregate human platelets (35), (iii) is naturally competent (69), (iv) binds to saliva-coated hydroxyapatite (38, 42), (v) coaggregates with other oral bacteria (R. N. Andersen and P. E. Kolenbrander, personal communication), and (vi) is virulent in the rat and rabbit models of infective endocarditis (69). For genomic DNA isolation, cells were grown in an atmosphere containing 10% H2, 10% CO2, and 80% N2 at 37°C in brain heart infusion broth (Difco Inc., Detroit, MI).
The genome was sequenced using a modified whole-genome shotgun strategy, as previously described (98). In short, two shotgun libraries (1- to 2-kb and 2- to 4-kb inserts) and one BAC library (~500 clones, 25- to 100-kb inserts) were constructed, and approximately 74,000 sequences were generated (~15-fold coverage of the genome) by using an ABI 3700 96-lane capillary DNA sequencer (Applied Biosystems). The genomic sequence was assembled as previously described (98). Gaps were closed by genome walking (Clontech), alignment with BAC clones, long-distance PCR, and multiplex PCR (89). All remaining low-quality sequence regions were amplified and resequenced for finishing. About 5,000 sequences were added during gap closing and finishing. Genome annotation was performed automatically essentially as previously described (98). Gene predictions were based on Glimmer (77), database searches, and manual verification in Apollo (50). rRNA boundaries were set based on predicted structural criteria (15).
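As a rough consistency check on the sequencing figures above, the standard relation between coverage, read count, read length, and genome size (c = N × L / G) can be rearranged to give the mean usable read length implied by the reported numbers. The sketch below is illustrative only: the function name is hypothetical, and because the read count and coverage are stated only approximately in the text, the result (roughly 480 bp) is an estimate, not a reported value.

```python
def implied_mean_read_length(genome_bp: int, n_reads: int, fold_coverage: float) -> float:
    """Mean read length L implied by the coverage relation c = N * L / G."""
    return fold_coverage * genome_bp / n_reads


# Values taken from the text; the last two are approximate ("~").
GENOME_BP = 2_388_435   # finished genome size
N_READS = 74_000        # shotgun sequences generated
FOLD_COVERAGE = 15.0    # reported coverage

mean_len = implied_mean_read_length(GENOME_BP, N_READS, FOLD_COVERAGE)
print(f"implied mean usable read length: {mean_len:.0f} bp")
```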
To select candidates for horizontal gene transfer (HGT), the phyletic patterns of gene distribution were analyzed. First, S. sanguinis proteins were compared to the NCBI nonredundant protein database using BLASTP. Significant matches (E < 1e-6) were analyzed to find genes without streptococcal sequences among the top six species matching the S. sanguinis protein. The same analysis was performed with Escherichia coli K-12, considering Salmonella and Yersinia the “same” genus (these genera were chosen as the genera that were closest phylogenetically to E. coli since no other species of Escherichia have been sequenced). This analysis overestimated the number of HGT candidates in E. coli due to the narrow sampling of genetic diversity in the genus compared to the broad sampling available for streptococci.
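The phyletic-pattern filter described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the `Hit` record and both function names are hypothetical, hits are assumed to exclude the query organism itself, the genus is taken as the first word of the species name, and a protein with no significant matches is treated as a non-candidate (the text does not say how such proteins were handled).

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class Hit:
    """One BLASTP match (hypothetical record, not a real BLAST output format)."""
    species: str    # e.g. "Listeria monocytogenes"
    evalue: float


def top_species(hits: Iterable[Hit], max_evalue: float = 1e-6,
                n_species: int = 6) -> List[str]:
    """First n distinct species among significant hits, best E value first."""
    species: List[str] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue >= max_evalue:      # significance cutoff: E < 1e-6
            break
        if hit.species not in species:
            species.append(hit.species)
            if len(species) == n_species:
                break
    return species


def is_hgt_candidate(hits: Iterable[Hit],
                     own_genera: Sequence[str] = ("Streptococcus",)) -> bool:
    """True if none of the top-matching species belongs to the query's own genus."""
    species = top_species(hits)
    if not species:
        return False  # assumption: no significant match means no evidence of HGT
    return all(name.split()[0] not in own_genera for name in species)
```

For the E. coli K-12 control, the same function would be called with `own_genera=("Escherichia", "Salmonella", "Yersinia")`, mirroring the grouping of these genera described in the text.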
Total protein was extracted from S. sanguinis grown overnight in brain heart infusion broth. Cells were harvested by centrifugation, washed twice in ice-cold phosphate-buffered saline, and suspended in 20 mM morpholinepropanesulfonic acid (MOPS)-62.5 mM NaCl-0.5 mM MgSO4 (pH 7.8) with a protease inhibitor cocktail (Sigma-Aldrich). The cells were mechanically disrupted with an FP120 FastPrep cell disruptor (Bio 101 Systems, Qbiogen, Inc.) by using three 30-s cycles of homogenization at the maximum speed with 1-min intervals between cycles in ice. The suspension was centrifuged (5,000 × g for 15 min at 4°C) to remove unbroken cells and large cellular debris. The supernatant was suspended in solubilization buffer as previously described (68) and was precipitated with a 2D clean-up kit (GE Healthcare). After reduction with dithiothreitol and iodoacetamide alkylation, proteins (~75 μg) were digested overnight with trypsin. The resulting tryptic peptides were desalted using C8 cartridges (Michrom BioResources) and were subjected to two-dimensional nano liquid chromatography-MS/MS analyses with a Michrom BioResources Paradigm MS4 multidimensional separation module, a Michrom NanoTrap platform, and an LCQ Deca XP Plus ion trap mass spectrometer. The mass spectrometer was operated in the data-dependent mode, and the four most abundant ions in each MS spectrum were selected and fragmented to produce tandem mass spectra. The MS/MS spectra were recorded in the profile mode. Proteins were identified by searching the MS/MS spectra against our S. sanguinis database using Bioworks v3.2. Peptide and protein hits were scored and ranked using the new probability-based scoring algorithm incorporated in Bioworks v3.2. Only peptides identified as possessing fully tryptic termini with cross-correlation scores greater than 1.9 for singly charged peptides, 2.3 for doubly charged peptides, and 3.75 for triply charged peptides were used for peptide identification. 
In addition, the delta-correlation scores had to be greater than 0.1, and for increased stringency, a protein was accepted only if its probability score was <0.0001.
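The acceptance criteria above amount to a charge-dependent threshold filter, which can be sketched as follows. This is an illustration only and does not reproduce Bioworks: the `PeptideHit` record and function names are hypothetical, peptides with charge states other than 1 to 3 are rejected because the text gives no threshold for them, and requiring at least one passing peptide per protein is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable

# Minimum cross-correlation score by peptide charge state, as stated in the text.
XCORR_MIN = {1: 1.9, 2: 2.3, 3: 3.75}
DELTA_CN_MIN = 0.1          # delta-correlation score must exceed this
PROTEIN_P_MAX = 1e-4        # protein probability score must be below this


@dataclass(frozen=True)
class PeptideHit:
    """One peptide-spectrum match (hypothetical record)."""
    charge: int
    xcorr: float
    delta_cn: float
    fully_tryptic: bool


def peptide_passes(p: PeptideHit) -> bool:
    """Apply the tryptic-termini, Xcorr, and delta-correlation criteria."""
    threshold = XCORR_MIN.get(p.charge)
    if threshold is None:
        return False  # assumption: no threshold given for other charge states
    return p.fully_tryptic and p.xcorr > threshold and p.delta_cn > DELTA_CN_MIN


def protein_accepted(probability: float, peptides: Iterable[PeptideHit]) -> bool:
    """Accept a protein only if its probability score and a peptide both pass."""
    return probability < PROTEIN_P_MAX and any(peptide_passes(p) for p in peptides)
```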
The S. sanguinis SK36 genome sequence has been deposited in the GenBank database under accession no. CP000387.
The S. sanguinis genome consists of a 2,388,435-bp circular DNA molecule, which is 7 to 24% larger than previously described streptococcal genomes (Table 1). The genome start point was assigned to the putative origin of replication, as determined by GC skew (61), the location of the dnaA gene, and similarity to other genomic sequences (54). The putative replication termination region is ~1.2 Mbp downstream from the origin of replication (Fig. 1). The G+C content of the genome is 43.40%, which is higher than the G+C content of any of the 21 other completed streptococcal genomes (35.62 to 39.72%) (Table 1). For protein-encoding genes, the G+C contents are 53.55, 35.46, and 44.35% for positions 1, 2, and 3, respectively. Based on the relationship between the G+C contents of whole genomes and the G+C contents of position 3 of coding sequences, which were recently determined for 232 eubacterial genomes (93), the expected value for position 3 in S. sanguinis is 42.5%, in good agreement with the observed value. This observation suggests that unlike the findings for Lactobacillus bulgaricus, the higher overall G+C content of S. sanguinis is not due to an ongoing process of compositional change or to a different relationship of whole-genome and third-position G+C values. There are four rRNA operons containing the 5S, 16S, and 23S rRNA genes, which is less than the number in most other streptococci (Table 1), despite the larger genome size and in contrast to a reported correlation between the numbers of rRNA and tRNA genes and the genome sizes in the Firmicutes (93). The 61 predicted tRNA genes encode all 20 amino acids, but wobble rules are required for several abundant codons (www.sanguinis.mic.vcu.edu/supplemental.htm). Most tRNA genes are clustered near the rRNA operons; i.e., 48 of 61 of these genes were less than 1 kb from an rRNA operon (Fig. 1), as in Streptococcus pneumoniae (88).
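The compositional statistics discussed above (overall G+C content, G+C content at each codon position, and GC skew) are simple to compute from sequence. The sketch below shows one way to do so; it is not the software used in this study, the function names are hypothetical, and GC skew is taken here as (G − C)/(G + C) over non-overlapping windows, which is a common convention but an assumption about the method of reference 61.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """G+C fraction at codon positions 1, 2, and 3 of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    cds = cds[:usable]
    return gc_content(cds[0::3]), gc_content(cds[1::3]), gc_content(cds[2::3])


def gc_skew_windows(seq: str, window: int) -> List[float]:
    """(G - C) / (G + C) for consecutive non-overlapping full windows."""
    seq = seq.upper()
    skews: List[float] = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews
```

In a circular bacterial chromosome, the sign of the skew typically changes near the origin and terminus of replication, which is why a skew profile can help place the genome start point.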
The genome contains genes encoding 2,274 predicted proteins and accounting for more than 90% of the sequence (www.sanguinis.mic.vcu.edu/supplemental.htm). About 86% (1,965) of these genes are transcribed in the direction of replication, like the genes in other streptococci (2, 87, 88). The average gene contains 935 bp of coding sequence, and the average intergenic region is 115 bp long. The latter value is smaller than the values for other streptococcal genomes that have been sequenced, whose average intergenic region lengths are 130 to 177 bp, or for the E. coli genome, whose average intergenic region length is 139 bp. This observation suggests that S. sanguinis has a more compact genome, although differences in annotation methods may also explain the differences. Of the predicted proteins, 89% exhibit significant similarity to proteins from other organisms. About 22% are conserved hypothetical proteins (present in multiple species but having unknown functions), and approximately 645 of the predicted proteins were confirmed by MS (www.sanguinis.mic.vcu.edu/supplemental.htm).
The S. sanguinis SK36 genome was compared with other genomes to identify the proteins that are conserved among streptococci. Figure 2 shows the homologous proteins that are shared by S. sanguinis, S. mutans, and S. pneumoniae. This analysis indicated that S. sanguinis shares 23 more proteins with S. mutans than with S. pneumoniae and that the latter two species share only 19 proteins not present in S. sanguinis. Previous analyses based on rRNA (41) and our more broadly based phylogenetic analysis confirmed that S. sanguinis is more closely related to S. pneumoniae than to S. mutans, suggesting that the similarity with S. mutans reflects the shared oral niche of these two species. The proteins shared by only S. sanguinis and S. mutans include 60 proteins that are hypothetical or have unknown functions and, interestingly, 34 putative transcriptional regulators. All proteins in the S. sanguinis genome were functionally categorized and compared (Fig. 3) essentially as previously described (98).
Consistent with previous observations (43), S. sanguinis can apparently use a broad range of carbohydrate sources for survival. We identified more than 50 putative carbohydrate transporters, including phosphotransferase system enzymes specific for transport of glucose, fructose, mannose, cellobiose, glucosides, lactose, trehalose, galactitol, and maltose (see Table S1 in the supplemental material). Thus, this bacterium seems to possess a robust system for energy generation by fermentation of sugars and other carbohydrates.
Similar to S. mutans (2) and other streptococci, S. sanguinis has an incomplete citrate cycle and contains only the enzymes to convert oxaloacetate into 2-oxoglutarate. Although clearly incapable of direct ATP production, this pathway fragment likely generates intermediates in the synthesis of aspartate and glutamate.
Our analysis suggests that S. sanguinis has a robust biosynthetic capacity. All key enzymes for gluconeogenesis are present. This bacterium has both pyruvate-phosphate dikinase (EC 2.7.9.1) (encoded by SSA_1053) found in other streptococci and phosphoenolpyruvate synthase (EC 2.7.9.2) (encoded by SSA_1012 and SSA_1016) that is absent in other streptococci. There is also a Firmicutes-specific fructose-1,6-bisphosphatase (EC 3.1.3.11) (encoded by SSA_1056) that is present in Streptococcus agalactiae but not in S. pneumoniae, S. mutans, Streptococcus pyogenes, or Streptococcus thermophilus. Phyletic pattern analyses suggested that the genes for these enzymes were acquired by HGT (www.sanguinis.mic.vcu.edu/supplemental.htm). Similarly, enzymes in the pentose phosphate pathway and enzymes in the purine and pyrimidine pathways, which are required for de novo synthesis of nucleotides with the possible exception of dTTP, seem to be available. Enzymes necessary for converting glutamate and glutamine to intermediates in purine and pyrimidine synthesis are also present. However, as in S. mutans (2), the gene for nucleoside diphosphate kinase (EC 2.7.4.6), which phosphorylates dTDP to dTTP, could not be identified. Since these enzymes are highly conserved in other streptococci, it is unlikely that we missed identifying their genes, assuming that they are derived from common progenitors.
S. sanguinis seems to have the ability to synthesize de novo all essential amino acids except the branched amino acids (leucine, isoleucine, and valine), lysine, and tryptophan (www.sanguinis.mic.vcu.edu/supplemental.htm). This conclusion is in agreement with our finding that S. sanguinis cannot grow in a semidefined biofilm medium (52) if supplemental amino acids are not included (data not shown). Synthesis of asparagine likely relies on a two-step process in which aspartate is bound to tRNAAsn by a nondiscriminating Asp-tRNA synthetase, followed by conversion of the aspartate to asparagine via a three-subunit aspartyl/glutamyl-tRNA amidotransferase, as has been shown for Deinococcus radiodurans (62). The latter enzyme is probably also responsible for conversion of Glu-tRNAGln to Gln-tRNAGln, thus explaining the lack of a gene encoding glutaminyl-tRNA synthetase in the genome (72). As noted above, enzymes for gluconeogenesis are present and could permit the bacterium to convert some amino acids (e.g., serine) into fructose-6-phosphate, an entry point of the pentose phosphate pathway. In this way, amino acids can be converted into the precursors of nucleotide biosynthesis. Marri et al. (58) recently reported that among the streptococci, S. mutans is unique in possessing the genes responsible for biosynthesis of histidine and that S. pyogenes is unique in its apparent ability to convert histidine to glutamate. S. sanguinis possesses the genes for both of these processes.
Lipid biosynthesis apparently follows the classical bacterial type II fatty acid synthase complex pathway (34). As shown previously for S. pneumoniae (33, 57), S. sanguinis encodes the enoyl-(acyl-carrier protein) reductase (EC 1.3.1.9) FabK instead of the widespread and conserved FabI type enzyme of other bacteria and plants. The FabK enzyme of S. pneumoniae is less sensitive to inhibition by the antimicrobial triclosan than FabI is (33, 57). Therefore, S. sanguinis is probably more resistant than FabI-containing bacteria to inhibition of lipid biosynthesis by the triclosan used in some toothpastes. Fatty acids can be generated from amino acids since enzymes needed for the conversion of some amino acids (e.g., serine) into acetyl coenzyme A are present (www.sanguinis.mic.vcu.edu/supplemental.htm).
As expected, the S. sanguinis genome contains the genes required for cell wall sugar, peptidoglycan, and teichoic acid biosynthesis and degradation (www.sanguinis.mic.vcu.edu/supplemental.htm). Single copies of the genes encoding homologs of the S. mutans signal recognition particle components Ffh, FtsY, and small cytoplasmic RNA are present in S. sanguinis, as are single copies of the genes encoding the secretion components YidC1, YidC2, YajC, SecA, and SecYEG (31).
In contrast to S. pneumoniae, in which ~5% of the genome is composed of insertion sequences (IS) (88), we found only two apparently functional IS elements (SSA_0265 and SSA_0266; SSA_1361 and SSA_1362) in S. sanguinis. These elements are flanked by 4-bp direct repeats and are ~80% identical at the nucleotide level to IS3 elements flanked by 3-bp repeats in S. mutans (55). Neither IS interrupts a known gene or open reading frame (ORF). Other evidence of transposable elements includes remnants of IS elements (SSA_1477 to SSA_1479 and SSA_0732) and a truncated transposase (SSA_2029). No intact prophages were found, although some apparent remnants (SSA_0235, SSA_2032, and SSA_2295 encoding an integrase/recombinase; SSA_2383 encoding a prophage maintenance system killer protein; and SSA_2282 encoding a phage infection protein) are present (www.sanguinis.mic.vcu.edu/supplemental.htm). No evidence of the presence of integrons was found. Homologs of the dpnM, dpnA, and dpnB genes of S. pneumoniae encoding the DpnII restriction-modification system are present in the S. sanguinis genome (SSA_1716 to SSA_1718). This system reduces the efficiency of HGT by phage infection, conjugative transfer, and transformation by plasmid (but not chromosomal) DNA (47). We did not find genes for the R.StsI and M.StsI components previously found in S. sanguinis 54 (44).
In spite of the relative paucity of transposon- and phage-related genes, at least 270 S. sanguinis genes (12% of the genes) were identified as candidates for HGT by observing the phyletic pattern of gene distribution (www.sanguinis.mic.vcu.edu/supplemental.htm) (see Materials and Methods). The apparent lack of phage genes and conjugative transposable elements suggests that transformation is the predominant method by which HGT occurs in S. sanguinis. Like certain other streptococci, S. sanguinis is naturally competent for transformation (25). In S. pneumoniae, 22 proteins necessary for chromosomal transformation have been identified (70). We found that 20 of these proteins have apparent orthologs in S. sanguinis (www.sanguinis.mic.vcu.edu/supplemental.htm). Neither ComW, an 80-amino-acid protein which stabilizes and activates the alternative sigma factor ComX (84) and for which there are no database matches in any other bacterium in the GenBank database, nor ComB, which functions with ComA to cleave and export competence-stimulating peptide (CSP), was identified. The SSA_1100 product exhibits similarity to ComA. However, the best matches for SSA_1100 in the GenBank database were matches to genes encoding transporters for RTX-type toxins from gram-negative bacteria (94). Since the adjacent gene encodes a putative RTX toxin, it appears that this protein transports the toxin rather than CSP. Therefore, it appears that ComA and ComB are not present in S. sanguinis. This absence may be related to the previous observation that ComC, the CSP precursor in S. sanguinis, is unique among all 125 ComC sequences from 13 streptococcal species in the GenBank database in that it lacks a double-glycine cleavage site (32). This unique cleavage site could be paired with unique proteins for processing and export.
One 70-kb cluster of 68 HGT candidates (SSA_0463 to SSA_0541) encodes an anaerobic cobalamin (vitamin B12) biosynthetic (cob) pathway, as well as propanediol utilization (pdu) and ethanolamine utilization (eut) pathways (Fig. 4; see Table S2 in the supplemental material). Many of the proteins in this cluster were identified by MS, proving that these genes are expressed.
Vitamin B12 is an important nutrient for human health; a deficiency leads to pernicious anemia. However, synthesis of this compound in prokaryotes (40) occurs only by two alternative routes: an aerobic pathway that incorporates molecular oxygen during biosynthesis and an anaerobic pathway that incorporates chelated cobalt ions in the absence of oxygen (78). All genes required for anaerobic cobalamin biosynthesis are present in S. sanguinis. It appears that the complete vitamin B12 biosynthesis pathway is available. If so, this is the first time that the complete B12 biosynthesis pathway has been identified in streptococci, although three proteins involved in cobalamin biosynthesis and cobalt transport (cbiMQO) have been found in Streptococcus salivarius 57.I and S. thermophilus (18).
Cobalamin-dependent utilization of 1,2-propanediol via the pdu pathway plays an important role in Salmonella enterica serovar Typhimurium infection (20), and the pdu genes are correlated with cobalamin biosynthetic genes in terms of both location and coregulation. The S. enterica serovar Typhimurium pdu pathway contains 23 genes for the coenzyme B12-dependent catabolism of 1,2-propanediol (12). S. sanguinis has all of these genes except pduM and pduS, which encode proteins with unknown functions, and pduN, which encodes polyhedral bodies that may not be directly related to the catabolism of 1,2-propanediol (12) (see Table S2 in the supplemental material).
The eut pathway in S. enterica serovar Typhimurium is required for utilization of ethanolamine as a carbon and nitrogen source (75). Only 4 (eutB, eutC, eutD, and eutE) of the 17 genes in the S. enterica serovar Typhimurium eut operon have been correlated directly with an enzymatic activity known to be required for ethanolamine utilization (79). Three of these four genes, eutB (SSA_0519), eutC (SSA_0520), and eutE (SSA_0523), have homologs in S. sanguinis. eutD encodes a protein with phosphotransacetylase activity (14) and exhibits 40% identity with the S. sanguinis SSA_1207 ORF, which is annotated as phosphate acetyltransferase ORF. A two-component system (SSA_0516 and SSA_0517) that may regulate ethanolamine utilization in response to environmental factors is upstream of eutA. Since ethanolamine and propanediol sources in the environment seem largely man-made (e.g., toothpaste, mouthwash, and antifreeze) and their utilization is dependent on vitamin B12, it is interesting to speculate that this large ~70-kb gene cluster may have been selected in S. sanguinis by exposure to these man-made products.
Although very few of these cobalamin-related genes are present in previously published streptococcal genomes, many are present in other oral pathogens, including Porphyromonas gingivalis, Treponema denticola, and Fusobacterium nucleatum (see Table S2 in the supplemental material). Our analyses suggest that the 70-kb cluster of HGT genes has an origin similar to the origin of orthologs in Listeria (www.sanguinis.mic.vcu.edu/supplemental.htm), but a more in-depth phylogenetic analysis involving more prokaryotic genomes is necessary to confirm its origin.
Two small discrete blocks of HGT candidate genes (SSA_1012 to SSA_1017 and SSA_1053 to SSA_1056) contain three genes involved in gluconeogenesis. The two genes in the second block (SSA_1053 and SSA_1056), encoding pyruvate-phosphate dikinase (EC 2.7.9.1) and fructose-1,6-bisphosphatase (EC 3.1.3.11), respectively, are sufficient, in combination with other apparently native genes, to enable gluconeogenesis. These two genes are also found in S. agalactiae, theoretically enabling gluconeogenesis in this organism, while all other streptococcal genomes that have been sequenced seem to lack the complete set of genes required for gluconeogenesis. The results of our analysis (see Materials and Methods) are consistent with the hypothesis that these genes were transferred by HGT to these streptococci from other bacteria belonging to the phylum Firmicutes (www.sanguinis.mic.vcu.edu/supplemental.htm).
Several proteins potentially relevant to adhesion in the oral cavity or to virulence in invasive disease were identified in the S. sanguinis genome (see Table S3 in the supplemental material). Perhaps the most surprising is the protein encoded by SSA_1099 (Stx), which exhibits homology to RTX-type toxins in gram-negative bacteria (94). To our knowledge, this is the first occurrence of this class of toxin gene in a gram-positive bacterium. Consistent with this unique setting, orthologs of the HlyB ATPase and HlyD “membrane fusion protein” components of an RTX toxin export system are encoded by adjacent ORFs (SSA_1100 and SSA_1101, respectively), but no homolog of the TolC outer membrane component (36) was found. Both Stx and the putative ATPase transporter component, encoded by SSA_1100, were detected in the proteomic analysis (www.sanguinis.mic.vcu.edu/supplemental.htm). Although the leukotoxin from the oral bacterium Actinobacillus actinomycetemcomitans is a well-known ortholog of the Stx protein, the products of SSA_1099 to SSA_1101 are, as a whole, most similar to proteins in plant-pathogenic pseudomonads. Thus, the origin of these S. sanguinis genes and their functions are unclear.
The genes associated with pathogenicity in S. sanguinis also include genes encoding orthologs of the major known adhesins in other viridans species. SspC and SspD are orthologs of the SspA and SspB adhesins of Streptococcus gordonii (39, 53). Whereas the latter proteins are encoded by adjacent genes in S. gordonii, this is not true in S. sanguinis. Conversely, the cshA and cshB adhesin genes are not contiguous in S. gordonii (60), whereas the S. sanguinis crpABC orthologs are contiguous. The ligand specificity of SspA orthologs in viridans streptococci is determined by their sequences (39, 53). Neither SspC nor SspD is closely related to any SspA homolog that has been characterized previously. As determined by BLASTP analysis (3), SspC has only 55% identity with its closest relative (SspA), and SspD has 33% identity with its closest relative (PaaA of Streptococcus criceti). Therefore, it is not clear what ligand(s), if any, SspC and SspD bind. However, the 27-amino-acid region of SspB that has been shown to mediate binding of S. gordonii to P. gingivalis is conserved in SspC (18 identical residues and five similar residues), including perfect identity of the critical NITVK subsequence (21). This observation suggests that SspC may also adhere to P. gingivalis.
Lipoproteins (LP) and cell-wall anchored proteins (CWA), two classes of proteins that are surface exposed and prevalent among reported virulence factors, were predicted (www.sanguinis.mic.vcu.edu/supplemental.htm). The lgt and lspA genes expected for LP processing are present (SSA_1546 and SSA_1069, respectively), as are genes encoding three sortases (SSA_0022, SSA_1219, and SSA_1631) for CWA processing. Interestingly, the numbers of these surface proteins (60 LPs and 33 CWAs) are striking compared to the numbers in related species. As determined by the same search criteria used for S. sanguinis, S. mutans has only 29 LPs and six CWAs. S. pneumoniae TIGR4 possesses 40 LPs and 12 CWAs, while R6 has 39 LPs and 13 CWAs. However, many of the additional ORFs in S. sanguinis appear to be redundant. Thus, S. sanguinis contains nine paralogous CWAs in three families and seven paralogous LPs in three families. In addition, functional redundancy may occur in the absence of overall sequence similarity; five CWAs possess the collagen-binding domain, Pfam05737 (23). This vast array of surface proteins may contribute to the ability of S. sanguinis to colonize the tooth and interact with a diverse group of oral bacteria (46) and may account for its predominance as a cause of streptococcal endocarditis (66).
Fibrils or pili are involved in streptococcal adherence and virulence (7, 59, 82). S. sanguinis strains possess both short fibrils and long fibrils (30). Fap1 of Streptococcus parasanguinis, an ortholog of the CWA encoded by SSA_0829 or SrpA, is thought to be the structural component of long fibrils (82), and its orthologs are important for adhesion to platelets (9), saliva-coated hydroxyapatite (96), and salivary agglutinin (39). The products of SSA_0830 to SSA_0841 exhibit homology to the proteins shown to be required for the glycosylation and export of SrpA orthologs in S. parasanguinis and S. gordonii (9, 17, 85). In fact, the 11 genes downstream from srpA are most similar in terms of sequence to, and are in the same order as, the 11 genes that form the export locus of the SrpA ortholog, GspB, in S. gordonii (85). Shorter fibrils in S. gordonii are comprised of CshA and possibly also CshB (59), which are orthologs of CWAs encoded by SSA_0904 to SSA_0906. The fact that S. sanguinis has both classes of proteins, as well as the locus dedicated to SrpA export, could account for the apparent presence of both short and long fibrils. In addition, in recent studies workers have identified long pili in S. agalactiae (49), S. pyogenes (63), and S. pneumoniae (7). In these bacteria, a single locus contains three putative pilin subunit genes encoding CWA motifs and one to three sortase genes that are required for assembly of the pili (7, 49, 63). S. sanguinis also contains an apparent pilus locus, with SSA_1632 to SSA_1635 encoding LPXTG proteins and SSA_1631 encoding a sortase. SSA_1632 to SSA_1634 also each contain a conserved “E box” domain found in many pilin genes (90).
The SSA_2302 to SSA_2318 sequences exhibit homology to ORFs required for production of type IV pili. Such pili were originally believed to exist only in gram-negative bacteria, although the gram-positive bacterium Ruminococcus albus appears to possess a type IV pilus that serves as an adhesin (73). Our analysis suggests that the S. sanguinis ORFs were acquired by HGT, perhaps from a clostridial species, and are distinct from the ORFs in S. sanguinis that apparently encode the pseudopilus involved in genetic competence (data not shown).
Cell wall polysaccharides (CWP) serve as important receptors for agglutination and coaggregation in oral streptococci (19, 45, 46). S. sanguinis SK36 is similar to type strain ATCC 10556 in that it coaggregates with numerous species of Streptococcus, Actinomyces, and Fusobacterium (38, 45) (Kolenbrander and Andersen, personal communication). These interactions are inhibited by addition of 60 mM N-acetyl-d-galactosamine, confirming the polysaccharide composition of the receptor (45). Six structures have been defined for CWP in oral streptococci (19), and the loci responsible for synthesis of one of these structures have been characterized in S. gordonii (97). Orthologs of these genes are located mostly in two genomic segments in S. sanguinis, SSA_1509 to SSA_1519 and SSA_2211 to SSA_2225. However, these segments also contain apparent CWP synthesis genes that have close orthologs in S. thermophilus, Streptococcus suis, S. pneumoniae, or Streptococcus iniae but no orthologs in S. gordonii. These CWP loci, therefore, appear to be unlike any loci characterized previously, and it is not clear whether they direct the synthesis of a type 1 N-acetylgalactosamine-β1→3-galactose CWP like that found in previously characterized S. sanguinis strains (19).
The S. sanguinis genome contains only two homologs of the twin-arginine translocation (Tat) system, which exports folded proteins with the characteristic N-terminal twin-arginine motif across the cytoplasmic membrane (65). SSA_1132 and SSA_1133 apparently encode the TatC Sec-independent protein translocase and the TatA Sec-independent protein secretion pathway component, respectively. Of the other streptococcal genomes examined to date, this system has been found only in that of S. thermophilus. Our analysis showed that the three genes associated with the Tat genes in S. sanguinis, which encode a periplasmic lipoprotein involved in iron transport (SSA_1129), an iron-dependent peroxidase (SSA_1130), and a high-affinity Fe2+/Pb2+ permease (SSA_1131), are similarly associated with Tat genes in other genomes, including the genomes of S. thermophilus, Staphylococcus aureus MRSA252, and Staphylococcus haemolyticus. Using the TatP server (8) to search for Tat secretion substrates, we found that the iron-dependent peroxidase gene SSA_1130 was the only ORF in the genome that encoded both a consensus Tat motif and a Tat signal peptide.
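The two-part criterion described above (a consensus Tat motif plus a signal peptide) can be illustrated with a crude heuristic. This sketch is not the TatP method and should not be read as a substitute for it: it assumes the commonly cited consensus S/T-R-R-x-F-L-K, and the hydrophobic residue set, the window sizes, the thresholds, and the toy sequences are all hypothetical choices made for illustration.

```python
import re

# Commonly cited twin-arginine consensus: S/T-R-R-x-F-L-K.
TAT_MOTIF = re.compile(r"[ST]RR.FLK")
# Assumed hydrophobic residue set for this sketch only.
HYDROPHOBIC = set("AVLIMFWGC")

def looks_like_tat_substrate(protein, n_region=35, h_len=12, h_min=9):
    """Crude check: Tat motif near the N terminus, then a hydrophobic stretch.

    All numeric parameters are illustrative assumptions.
    """
    m = TAT_MOTIF.search(protein[:n_region])
    if m is None:
        return False
    # Residues immediately after the motif stand in for the signal peptide h-region.
    h_region = protein[m.end():m.end() + h_len]
    if len(h_region) < h_len:
        return False
    return sum(aa in HYDROPHOBIC for aa in h_region) >= h_min

# Hypothetical toy sequences (not real S. sanguinis proteins).
with_motif = "MSDNKSRRDFLKGAAALGAAAVLAGCSSNDE" + "Q" * 40
no_motif = "MKKLLAVAGLLAAAVSAQA" + "Q" * 40
motif_no_hregion = "MSDNKSRRDFLK" + "DEDEDEDEDEDE" + "Q" * 20
```

Requiring both conditions mirrors the logic in the text: a motif alone, or a hydrophobic N terminus alone, is not treated as sufficient.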
Two glucosyltransferases (GTF) were found in S. sanguinis. The SSA_0613 product is a homolog of GtfR of Streptococcus oralis ATCC 10557, which synthesizes water-soluble glucans with no primer dependence (24). The SSA_1006 product is a homolog of GtfA, an enzyme that, in the presence of inorganic phosphate, converts sucrose to fructose and glucose-1-phosphate (4). Furthermore, the products of several ORFs exhibit homology to S. mutans non-GTF glucan-binding proteins (GBP), including the products of SSA_0019, SSA_0303, and SSA_0956. Non-GTF GBPs are cell surface receptors for glucan or secreted proteins that can become cell associated when glucan coats the bacterial cells. Although all GBPs have glucan-binding properties, they are a heterogeneous group of proteins with variations in size, glucan-binding domains, glucan-binding affinity, and function (4).
More than 100 putative transcriptional regulators were identified in the S. sanguinis genome (www.sanguinis.mic.vcu.edu/supplemental.htm). Like the genomes of some other streptococci, the S. sanguinis genome contains genes encoding a major sigma factor 70 (SSA_0825, rpoD) and an ortholog of the competence-specific sigma factor, ComX (SSA_0016). Genes encoding NusA (SSA_1900), NusB (SSA_0452), and NusG (SSA_2205) were found, although no obvious Rho protein was identified. This was also true for the other streptococcal genomes examined. Two genes, SSA_1187 and SSA_1695, code for additional putative antitermination proteins. Two-component regulatory systems, composed of a sensor histidine kinase and a transcriptional response regulator, provide a mechanism for bacteria to sense and respond to environmental signals. We found 29 genes that apparently comprise 14 two-component regulatory systems (www.sanguinis.mic.vcu.edu/supplemental.htm). This number is comparable to the numbers found in other streptococci (2, 26, 37, 80, 87). The “orphan” two-component response regulator encoded by SSA_1810 is an ortholog of the tissue-specific virulence factor RitR that represses the hemin-iron transport system in S. pneumoniae (92) and of the virulence factor CsrR in S. pyogenes (29), suggesting that this regulator may have a similar role in virulence in S. sanguinis.
S. sanguinis is one of the pioneer colonizers of the oral cavity and may initiate biofilm formation on tooth surfaces. Several putative biofilm-related genes are found in S. sanguinis and most other streptococci. For example, SSA_0135 to SSA_0137 are clustered in an arrangement similar to that observed for their orthologs in the adc operon, which is involved in biofilm formation in S. gordonii (52). Genes of the inducible fructose phosphotransferase operon, which is also related to biofilm formation in S. gordonii (51), are similarly clustered in S. sanguinis (SSA_1080 to SSA_1082). The SSA_1909 product is more than 60% identical to biofilm regulatory protein A (BrpA) in S. mutans. BrpA is a predicted surface-associated protein with functions not only in biofilm formation, autolysis, and cell division but also in the regulation of acid and oxidative stress tolerance in S. mutans (95).
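A percent identity value such as the one quoted above for the SSA_1909 product depends on how aligned positions are counted. The following minimal sketch assumes two sequences that have already been aligned, with "-" as the gap character, and counts identity over columns in which neither sequence has a gap. That denominator is an assumption of this sketch; alignment programs differ in their conventions, and the toy sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments: 7 ungapped columns, 5 identical.
identity = percent_identity("MKT-LLAV", "MKSALLGV")
```

Counting gapped columns in the denominator instead would lower the value, so a reported threshold is only meaningful together with the convention used.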
SSA_1853 is an ortholog of the luxS gene of S. oralis 34. LuxS is responsible for the catabolism of S-ribosylhomocysteine, which produces autoinducer 2, a universal signal molecule that mediates cell-cell and interspecies communication (quorum sensing) among bacteria, as well as biofilm formation and virulence (74).
S. sanguinis is one of the most frequently recognized pioneering inhabitants of human oral plaque (76). Completion of its genome sequence provided unique insight into the biology, virulence, and pathogenesis of this important bacterium. The greater size and G+C content of the S. sanguinis genome reflect the differences between this organism and other streptococci. The genome has clearly been molded by HGT, and the mechanisms by which the large cluster of genes in the cob, pdu, and eut pathways was transferred and confers a selective advantage on S. sanguinis are rich subjects for future investigations. Our analysis of the genome also provided fundamental genetic data for investigating the etiology of caries by comparison with cariogenic S. mutans. The biology and metabolism of this important bacterium have been described so that new prophylactic and therapeutic strategies can now be explored. Finally, previous studies have used many different strains of S. sanguinis, several of which would now be classified as S. gordonii, S. parasanguinis, or other species. The availability of the SK36 sequence, as well as the bacterium, which has been deposited in the American Type Culture Collection (catalog no. BAA-1455), should facilitate future studies with this species.
This work was supported by PHS grants DE12882 from the National Institute of Dental and Craniofacial Research (to F.L.M. and G.A.B.) and AI47841 and AI054908 from the National Institute of Allergy and Infectious Diseases (to T.K.) and by grant J743 from the Jeffress Trust (to P.X.).
Sequence analysis was performed in the Nucleic Acids Research Facilities at Virginia Commonwealth University.
Published ahead of print on 2 February 2007.
†Supplemental material for this article may be found at http://jb.asm.org/.