Streptococcus gallolyticus (formerly known as Streptococcus bovis biotype I) is an increasing cause of endocarditis among streptococci and frequently associated with colon cancer. S. gallolyticus is part of the rumen flora but also a cause of disease in ruminants as well as in birds. Here we report the complete nucleotide sequence of strain UCN34, responsible for endocarditis in a patient also suffering from colon cancer. Analysis of the 2,239 proteins encoded by its 2,350-kb-long genome revealed unique features among streptococci, probably related to its adaptation to the rumen environment and its capacity to cause endocarditis. S. gallolyticus has the capacity to use a broad range of carbohydrates of plant origin, in particular to degrade polysaccharides derived from the plant cell wall. Its genome encodes a large repertoire of transporters and catalytic activities, like tannase, phenolic compounds decarboxylase, and bile salt hydrolase, that should contribute to the detoxification of the gut environment. Furthermore, S. gallolyticus synthesizes all 20 amino acids and more vitamins than any other sequenced Streptococcus species. Many of the genes encoding these specific functions were likely acquired by lateral gene transfer from other bacterial species present in the rumen. The surface properties of strain UCN34 may also contribute to its virulence. A polysaccharide capsule might be implicated in resistance to innate immunity defenses, and glucan mucopolysaccharides, three types of pili, and collagen binding proteins may play a role in adhesion to tissues in the course of endocarditis.
Several studies have reported that the proportion of infective endocarditis due to Streptococcus gallolyticus has increased during the last decades, concomitantly with a decrease of cases due to oral streptococci (35). S. gallolyticus is now becoming the leading cause of infective endocarditis among streptococci in Europe (16). Furthermore, S. gallolyticus endocarditis is associated with rural residency, suggesting transmission from animals (29). However, the reasons for the emergence of this pathogen remain poorly understood. S. gallolyticus belongs to the Streptococcus bovis group known for more than 60 years to cause endocarditis (45). Recently, the former species S. bovis has been divided into four major species (50, 53). S. gallolyticus corresponds to S. bovis biotype I (mannitol fermentation positive), the closely related species S. pasteurianus to biotype II/2 (mannitol negative and β-glucuronidase positive), and the more distantly related species S. infantarius to biotype II/1 (mannitol negative and β-glucuronidase negative). S. macedonicus, the fourth species, commonly found in cheese, is nonpathogenic and also considered a S. gallolyticus subspecies (53, 62). Within the former S. bovis group, the majority of endocarditis cases were due to S. gallolyticus strains (4).
Multiple studies have shown that endocarditis due to S. gallolyticus as well as positive blood culture for this species is often associated with gastrointestinal malignancy (4, 6). This association has led to a strong indication for gastrointestinal investigation and endoscopic follow-up in the case of S. gallolyticus infections (66). The association of S. gallolyticus infection with colon cancer is a major but still unsolved issue. It may be just incidental, as the alteration of the digestive mucosa may favor the translocation of the bacteria into the bloodstream. Alternatively, the tumor may contribute to the proliferation of S. gallolyticus in close proximity to the gut epithelium, increasing its probability of translocating through the gut barrier. It has also been suggested that the bacterium itself contributes to carcinogenesis (60, 69). In addition to human disease, S. gallolyticus may also cause diseases in animals, like septicemia in pigeons (19), outbreaks in broiler flocks (11), or bovine mastitis (28).
Independent of its association with disease, S. gallolyticus has been isolated as a tannin-resistant bacterium from the feces of different mammalian herbivores, including the koala (48) and the Japanese large wood mouse (52), and it is also a normal inhabitant of the rumen (39). Its resistance to tannins is linked to its tannase activity, a characteristic which also led this bacterium to be named “gallolyticus” as it is able to decarboxylate gallate, an organic acid derived from tannin degradation. S. gallolyticus is also known to express other degradative functions unique among streptococci, like a bile salt hydrolase or an amylase. These properties allow its multiplication outside the animal host, as S. gallolyticus was isolated from a digester fed with shea cake (derived from the nuts of the African tree Vitellaria paradoxa) rich in tannins and aromatic compounds (12). S. gallolyticus is a commensal of the human intestinal tract but remains a rarely detected (2.5 to 15%) low-abundance species (10, 40). In herbivores, overgrowth of S. bovis may become deleterious. For example, ingestion of large amounts of rapidly fermented cereal grains leads to a destabilization of the rumen flora and to the proliferation of acid-tolerant bacteria, including S. gallolyticus. This is accompanied by the overproduction of mucopolysaccharides that stabilize the foam, resulting in feedlot bloat, a significant cause of economic loss (14).
Virulence and colonization factors of S. gallolyticus in humans are largely unknown. Studies of the bird host have shown that this Streptococcus species expresses a capsular polysaccharide, and five different serotypes have been described (19). In addition, electron microscopy studies have revealed the presence of fimbria-like structures on the surface of S. gallolyticus. It was hypothesized that capsules and/or fimbriae are involved in virulence (63). S. gallolyticus isolates responsible for endocarditis exhibited heterogeneous patterns of adherence to extracellular matrix (ECM) proteins, which suggests that they produce different surface components (55). Recently, a collagen binding adhesin together with 10 putative ECM binding proteins were identified in the draft genome sequence of a human isolate of S. gallolyticus (54).
Here we describe the sequence and analysis of the genome of S. gallolyticus strain UCN34 isolated from a human case of endocarditis associated with colon cancer. Analysis of the predicted proteins revealed unique metabolic and cell surface features among streptococci, which contribute to its adaptation to the rumen and to its ability to cause endocarditis. We showed by comparative genomics that many of the corresponding genes were probably acquired by lateral gene transfer (LGT) from other Firmicutes of the gut microbiota.
S. gallolyticus subsp. gallolyticus strain UCN34 was isolated in 2001 at the Hospital in Caen (Calvados, France) and is resistant to tetracycline. It was recovered from blood cultures of a 70-year-old man with a 1-month history of intermittent fever. The patient was found to have endocarditis of his native aortic valve. He was successfully treated with amoxicillin and gentamicin. Subsequent digestive endoscopy revealed a colon cancer, which led to a partial colectomy. For the construction of the large insert library and the shotgun libraries, pSYX34 and XL10-blue Kanr (Stratagene) or pcDNA2.1 (Invitrogen) and XL2-blue were used as vectors or recipient strains, respectively. Escherichia coli and S. gallolyticus strains were grown in Luria and brain heart infusion (BHI) broth, respectively.
Genome sequencing was performed using the conventional whole-genome shotgun strategy (24, 26). Two libraries (1- to 2-kb and 2- to 3-kb inserts) were generated by random mechanical shearing of genomic DNA and cloning into pcDNA2.1 (Invitrogen). A scaffold was obtained by end sequencing clones from a medium-size insert library (5 to 10 kb) in the low-copy-number vector pSYX34 (68). Recombinant plasmids were used as templates for cycle sequencing reactions consisting of 35 cycles (96°C for 30 s; 50°C for 15 s; 60°C for 4 min) in a thermocycler. Samples were precipitated and loaded onto 96-lane automatic capillary 3700 and 3730 DNA sequencers (Applied Biosystems). In an initial step, 30,240 sequences from the three libraries were assembled into 144 contigs using the Phred/Phrap/Consed software (22, 32). CAAT Box (25) was used to predict links between contigs. PCR products amplified from UCN34 chromosomal DNA were used to fill gaps and to resequence low-quality regions using primers designed by Consed. If no link was predicted between contigs, direct sequencing with chromosomal DNA as template was performed. These sequencing reactions consisted of 99 cycles (96°C for 30 s; 50°C for 15 s; 60°C for 4 min) in a thermocycler, with betaine (250 mM) added to the reaction mixture.
The CAAT Box environment (25) was used for genome annotation. Coding sequences (CDS) were defined by combining GeneMark predictions (38) with visual inspection of each open reading frame (ORF) for the presence of a start codon with an upstream ribosome binding site and BlastP similarity searches of the Nrprot database (2). The GeneMark predictions were trained on a set of ORFs longer than 300 codons encoding proteins similar to proteins with known function present in public databases. Initially, only CDS longer than 80 codons were retained. Subsequently, all CDS between 40 and 80 codons were searched using the same matrix, but only those with a high coding probability, as predicted by GeneMark, were retained. In a final step, all intergenic regions were searched for short or truncated genes by BlastX comparisons with protein sequence libraries. Limits of the rRNA operons were identified by homology with the other streptococcal genomes; tRNAs were searched using tRNAscan-SE (41).
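The two-tier length rule described above can be sketched as a small filter. This is an illustration only, not the CAAT Box or GeneMark implementation: the `Orf` record, the example ORFs, and the `HIGH_PROB` cutoff are hypothetical, since the text states only that short CDS needed "a high coding probability" without giving a value.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    name: str           # hypothetical identifier
    codons: int         # ORF length in codons
    coding_prob: float  # assumed GeneMark-style coding probability in [0, 1]


# Hypothetical cutoff: the source does not state a numeric threshold.
HIGH_PROB = 0.9


def retain(orf: Orf, high_prob: float = HIGH_PROB) -> bool:
    """Apply the two-tier rule: keep long CDS, keep short CDS only if well supported."""
    if orf.codons > 80:
        return True
    if 40 <= orf.codons <= 80:
        return orf.coding_prob >= high_prob
    return False


# Made-up example ORFs, for illustration only.
orfs = [
    Orf("orf_a", 306, 0.55),  # long: retained in the first pass
    Orf("orf_b", 60, 0.95),   # short but high coding probability: retained
    Orf("orf_c", 60, 0.40),   # short and weakly supported: dropped
    Orf("orf_d", 25, 0.99),   # below 40 codons: left to the later BlastX search
]
kept = [o.name for o in orfs if retain(o)]
print(kept)
```

ORFs shorter than 40 codons are not rescued by this filter; in the procedure above they would only be recovered by the final BlastX search of intergenic regions.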
All predicted CDS were examined visually. Function predictions were based on BlastP similarity searches and on the analysis of motifs using the PFAM databases. Toppred2 was used to identify transmembrane domains (15), SignalP version 2.0 to predict signal peptide regions (44), and LipPred to predict lipoproteins (58). Orthologs between S. gallolyticus UCN34 and Streptococcus agalactiae (strain NEM316) (31), Streptococcus suis (strain 05ZYH33) (13), Streptococcus thermophilus (strain CNRZ1066) (7), Streptococcus mutans (strain UA159) (1), Streptococcus sanguinis (strain SK36) (67), Streptococcus uberis (strain 0140J) (64), Streptococcus pyogenes (strain M1) (23), Streptococcus pneumoniae (strain TIGR4) (59), and Streptococcus zooepidemicus (strain MGCS10565) (5) were defined as genes showing bidirectional best hits by S. gallolyticus proteome BlastP comparisons. The threshold was set to a minimum of 50% sequence identity and a ratio of 0.8 to 1.25 of the protein length. Multiple sequence alignments were done using T-Coffee, and the phylogenetic trees by a Bayesian analysis using MrBayes at the phylogeny website (http://www.phylogeny.fr/) (20).
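The ortholog definition above (bidirectional best hits, at least 50% identity, and a protein length ratio between 0.8 and 1.25) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline actually used: the hit tuples, gene names, and lengths are invented, the best hit is taken as the highest bit score, and identity is read from the forward hit.

```python
def best_hits(hits):
    """Map each query to its top-scoring subject.

    hits: iterable of (query, subject, bitscore, percent_identity) tuples.
    """
    best = {}
    for query, subject, score, identity in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score, identity)
    return best


def orthologs(a_vs_b, b_vs_a, len_a, len_b, min_identity=50.0, ratio=(0.8, 1.25)):
    """Return (a, b) pairs that are reciprocal best hits and pass both thresholds."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    pairs = []
    for a, (b, _score, identity) in forward.items():
        if reverse.get(b, (None,))[0] != a:
            continue  # not a bidirectional best hit
        length_ratio = len_a[a] / len_b[b]
        if identity >= min_identity and ratio[0] <= length_ratio <= ratio[1]:
            pairs.append((a, b))
    return sorted(pairs)


# Invented toy data: proteome A versus proteome B and the reverse search.
a_vs_b = [("g1", "x1", 400, 72.0), ("g1", "x2", 150, 41.0),
          ("g2", "x2", 300, 65.0), ("g3", "x3", 90, 38.0)]
b_vs_a = [("x1", "g1", 400, 72.0), ("x2", "g2", 300, 65.0), ("x3", "g3", 90, 38.0)]
len_a = {"g1": 300, "g2": 500, "g3": 200}
len_b = {"x1": 310, "x2": 250, "x3": 205}

result = orthologs(a_vs_b, b_vs_a, len_a, len_b)
print(result)
```

In this toy set only g1/x1 survives: g2/x2 is reciprocal but fails the length ratio (500/250 = 2.0), and g3/x3 fails the identity threshold.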
The S. gallolyticus UCN34 genome sequence is available from DDBJ/GenBank/EMBL under accession number FN597254.
The genome of S. gallolyticus strain UCN34 consists of a single circular chromosome of 2,350,911 bp (Fig. 1). It is second in size among streptococcal strains for which complete genome sequences are publicly available, being only 37 kb smaller than the S. sanguinis SK36 genome (67). The G+C content of the genome is 37.6%. It contains six rRNA operons, 71 tRNA genes, and 2,239 protein coding genes, 13 of which are pseudogenes. The average gene size is 306 codons, accounting for 87% of the genome. These values are similar to the average values reported for the genus Streptococcus (5). We assigned functional annotations to 1,586 proteins, whereas 574 proteins were annotated as conserved hypothetical proteins. S. gallolyticus UCN34 encodes only 79 proteins without similarity to proteins described in the sequence databases (E value threshold of e-3). This small number of orphan genes probably reflects both the large number of sequenced streptococcal genomes and the relative paucity of UCN34 in mobile genetic elements, which commonly carry orphan genes (17). However, 454 genes, i.e., one-fifth of the predicted genes, encode proteins showing a best BlastP hit with proteins from nonstreptococcal species. Most of them are clustered in genomic regions of up to 61 genes (in black in Fig. 1). The best BlastP hits were predominantly with other Firmicutes species, principally with enterococci, lactobacilli, bacilli, and clostridia, bacterial species also present in the rumen microbiota and in the human gut (Fig. 2). This result suggests that these genes were acquired by LGT. Analysis of the functional categories of these genes revealed that they are enriched in functions related to transport, carbohydrate metabolism, cofactor biosynthesis, detoxification, and regulation, which may account for the specific adaptation of S. gallolyticus to the rumen environment (see Fig. S1 in the supplemental material).
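As a quick consistency check on the figures quoted above, the coding density and the share of genes with a nonstreptococcal best hit can be recomputed from the reported counts. The variable names are ours; the numbers are taken directly from the text, and the calculation treats the average gene size as applying to all 2,239 protein coding genes.

```python
genome_bp = 2_350_911   # chromosome length in bp
protein_genes = 2_239   # protein coding genes
mean_codons = 306       # average gene size in codons
non_strep_best_hit = 454  # genes whose best BlastP hit is nonstreptococcal

# Coding density: genes x codons x 3 nt per codon, over genome length.
coding_fraction = protein_genes * mean_codons * 3 / genome_bp

# Share of predicted genes with a nonstreptococcal best hit.
non_strep_fraction = non_strep_best_hit / protein_genes

print(f"coding fraction: {coding_fraction:.1%}")
print(f"nonstreptococcal best hits: {non_strep_fraction:.1%}")
```

Both values agree with the text: roughly 87% of the chromosome is coding, and about one-fifth of the genes have a nonstreptococcal best hit.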
To get a better insight into the relationship of S. gallolyticus with other streptococcal species, we compared its genome sequence with those of 10 other completely sequenced streptococcal species (Table 1). S. gallolyticus showed the largest number of orthologous genes with S. agalactiae, a gut-adapted pyogenic Streptococcus species and a major cause of neonatal infections that also causes mastitis in the bovine host, and with S. mutans, a Streptococcus species of the viridans group responsible for dental caries. A high proportion of best BlastP hits was also found with S. salivarius, a commensal Streptococcus species from the dorsum of the tongue and the saliva belonging also to the viridans group (Fig. 2). Therefore, S. gallolyticus shares properties mainly with two oral streptococci from the viridans group and with the gut-associated species S. agalactiae. As these different species do not cluster phylogenetically (57), this gene content conservation likely reflects conserved lifestyle and niche adaptation rather than phylogenetic proximity. In contrast, 687 genes of strain UCN34 did not show any ortholog among the 10 other sequenced streptococci. Among these genes, 244 have, nevertheless, a best BlastP hit with a streptococcal gene. This observation indicates that they have been acquired by LGT from other streptococci or that they are paralogous genes. In addition to having transport, carbohydrate metabolism, and regulatory functions, they are enriched in genes coding for surface proteins and proteins implicated in cell wall biosynthesis (see Fig. S2 in the supplemental material). These functions are probably involved in the interaction of S. gallolyticus with the gut environment and its ability to cause disease in humans (see below).
The most remarkable feature of the catabolism of S. gallolyticus as deduced from the genome sequence is that it should have the capacity to degrade diverse complex polysaccharides (Table 2). Strain UCN34 was predicted to utilize plant storage carbohydrates, like various forms of alpha glycoside polymers, including starch and glycogen. Furthermore, the genome encodes nine members of the maltogenic alpha amylase family (glycosidase family 13), including a secreted alpha amylase homologous to the Bacillus subtilis amyE gene product. In addition, S. gallolyticus UCN34 should be able to utilize polymers of fructose, levan, and inulin, which are storage polysaccharides in plants, as we identified a gene encoding a secreted membrane-bound fructan hydrolase (fruA). Unique among streptococci is the presence of genes encoding secreted enzymes predicted to be involved in the degradation of insoluble polysaccharides of the plant cell wall. The gene gallo_0162 encodes a putative secreted mannanase with a C-terminal domain similar to known Bacillus mannanases, whereas the N-terminal domain is a carbohydrate-binding domain, and gene gallo_0330 encodes a secreted protein highly similar to Ruminococcus albus endoglucanase V with cellulase activity (47). Strain UCN34 also synthesizes two putative secreted pectate lyases (Gallo_1577 and Gallo_1578). The catalytic domain of these two paralogous proteins is similar to that of the pectate lyase from Clostridium acetobutylicum. Finally, gallo_0189 encodes a putative secreted arabinogalactan endo-1,4-beta-galactosidase similar to a protein from Enterococcus faecium. These genes encoding glycosyl hydrolases are located in chromosomal regions sharing few orthologs with other streptococci, suggesting that they might have been gained by LGT.
It is known that S. gallolyticus is able to use a broad range of carbohydrates, like cellobiose, fructose, galactose, glucose, lactose, maltose, mannitol, melibiose, raffinose, and trehalose (12). In order to investigate the predictions made from the genome analysis, we further tested the capacity of S. gallolyticus to utilize different carbon sources using the API-50 assay (bioMérieux). This showed that strain UCN34 is indeed able to ferment mannose, N-acetylglucosamine, salicin, sucrose, inulin, starch, glycogen, and gentiobiose (data not shown). In agreement with these observations, genome analysis revealed an extremely broad range of sugar permeases, as among the 322 genes possibly involved in transport, 80 were predicted as specific for carbon sources. As an example, strain UCN34 expresses 25 transporters of the phosphotransferase systems (PTS), whereas S. mutans and S. uberis, two species known for their ability to ferment numerous sugars, express only 14 and 15 PTS transporters, respectively (1, 64). These 25 PTS transporters belong to the seven categories defined by Barabote and Saier according to their specificity for different sugars (3): glucose (10), lactose (5), ascorbate (3), fructose (2), mannose (2), galactitol (2), and glucitol (1). Strain UCN34 encodes a great diversity of enzymes to metabolize these carbohydrates, including 33 hydrolases belonging to nine different families. In particular, it encodes 11 phospho-beta-glucosidases (family 1) involved in degrading plant-derived polysaccharides, which is a number similar to that reported for S. uberis, which is known to utilize metabolites deriving from the degradation of plant polysaccharides (64). In addition to the ability to utilize and degrade plant sugars, S. gallolyticus is adapted to use other nutrients of plant origin, as it is the only streptococcus able to utilize malate, a dicarboxylic acid that occupies a central role in plant metabolism. Strain UCN34 expresses a malolactic enzyme (encoded by gallo_2048) associated with a malate transporter (encoded by gallo_2049).
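The PTS inventory given above can be tallied to confirm that the seven family counts add up to the 25 transporters reported, and to compare this total with the counts cited for S. mutans and S. uberis. The dictionary layout and variable names are ours; all numbers come from the text.

```python
# PTS transporters of strain UCN34 by family, as listed in the text.
pts_by_family = {
    "glucose": 10,
    "lactose": 5,
    "ascorbate": 3,
    "fructose": 2,
    "mannose": 2,
    "galactitol": 2,
    "glucitol": 1,
}

total_pts = sum(pts_by_family.values())
n_families = len(pts_by_family)

# Totals cited in the text for two other sugar-fermenting streptococci.
other_species = {"S. mutans": 14, "S. uberis": 15}
excess = {species: total_pts - n for species, n in other_species.items()}

print(total_pts, n_families, excess)
```

The glucose family alone (10 transporters) accounts for 40% of the PTS repertoire of strain UCN34.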
Taken together, the specific catabolic capacities of S. gallolyticus identified from its genome sequence suggest that they provide a selective advantage to this bacterium for life in the gut environment of herbivores, as it contains a large diversity of compounds of plant origin. S. gallolyticus should be able to degrade them and does not depend on other microorganisms for the initial degradation of insoluble plant polysaccharides which are abundant in this environment.
Although most streptococci are auxotrophic for several amino acids and vitamins, previous experiments have shown that most S. bovis strains show an absolute requirement only for biotin, while thiamine stimulates growth, and no requirement for any amino acid (45). In agreement with this observation, we identified from the genome of strain UCN34 the complete biosynthetic pathways for the 20 amino acids (aa) and for selenocysteine. Comparison of the distribution of these genes among streptococci (Table 3) showed that streptococci isolated from the oral cavity, such as S. mutans, S. sanguinis, S. gordonii, and S. pneumoniae, have the capacity to synthesize most amino acids. However, S. gallolyticus and S. mutans are the only two Streptococcus species expressing the glutamate synthase genes (gltAB) clustered in an operon with the glutamine synthetase gene (glnA). The glutamate synthase is a key enzyme in conjunction with glutamine synthetase for the assimilation of ammonium ions into cellular metabolic pathways. The genetic organizations of the amino acid biosynthetic pathways are identical in S. gallolyticus and in S. mutans, which is also a prototroph for all amino acids. The analysis of the distribution of the different amino acid biosynthetic pathways among the 11 Streptococcus species shown in Table 3 suggests that the common ancestor of the genus was proficient in synthesizing all amino acids, and then during evolution, a heterogeneous gene loss took place in the different Streptococcus species, leading to different synthesis capacities. Among nonoral streptococci, S. gallolyticus is particular, as it has kept the ability to synthesize all 20 amino acids.
In agreement with the growth requirements, we identified the complete pathways for the synthesis of multiple vitamins, including riboflavin (B2), nicotinamide (NAD, B3), pantothenate (B5), pyridoxine (B6), and folic acid (B9), and partial biosynthetic pathways for biotin (B8) and thiamine (B1) (Table 4). S. gallolyticus is the only Streptococcus species that possesses complete biosynthetic pathways for pantothenate and NAD, whereas riboflavin biosynthesis is present only in S. gallolyticus, S. agalactiae, and S. pneumoniae. Pantothenate biosynthesis involves four enzymatic steps from α-ketoisovalerate and L-aspartate catalyzed by the products of panB, -C, -D, and -E. The panBCD operon of S. gallolyticus is highly similar to those present in various clostridial species, bacteria that are also present in the rumen. In contrast, the panE gene encoding a 2-dehydropantoate 2-reductase is located elsewhere on the chromosome and has an ortholog only in S. thermophilus. The high identity of the two panE genes (97% at the nucleotide level) indicates a recent event of LGT between the two streptococcal species. S. gallolyticus is also able to synthesize NAD from aspartate in three steps. The nadABC operon, encoding a quinolinate synthetase, an L-aspartate oxidase, and a nicotinate-nucleotide pyrophosphorylase, respectively, is organized again similarly to counterparts found in various clostridial species present in the rumen. Finally, phylogenetic analysis of the ribDEAH operon, encoding the enzymes catalyzing the four steps of riboflavin biosynthesis in S. gallolyticus, S. agalactiae, and S. pneumoniae revealed that they do not cluster on the evolutionary tree and that they were thus probably gained independently by LGT (see Fig. S3 in the supplemental material). These genes, like the genes involved in pantothenate and NAD biosynthesis, are clustered in regions predicted to have been acquired by LGT. Although we cannot rule out that all other species have lost these genes, we suggest that unlike with the genes coding amino acid biosynthesis pathways, the above-described functions were acquired from other gut bacteria by LGT and the common ancestor was likely auxotrophic for these vitamins.
The genome analysis suggests that due to the very few nutritional requirements of S. gallolyticus it can grow in an environment containing a diverse range of carbohydrates and poor in amino acids and vitamins, probably enabling it to outcompete auxotrophic bacteria in the rumen and in the human colon.
S. gallolyticus has a particularly versatile lifestyle, with the capacity to adapt to different environments and to survive harsh conditions. It colonizes different mammalian and bird hosts, causes a broad range of diseases, and was also isolated outside a host as the major species in a continuous digester fed with shea cake (12). In agreement with its adaptation capacities, the genome of strain UCN34 encodes more regulatory proteins than any other streptococcal genome sequenced to date, as 177 genes (7.7% of all predicted genes) are devoted to regulatory functions. This proportion is slightly higher than that of Listeria monocytogenes (7.3%) (30) and close to that found in Pseudomonas aeruginosa (8.4%) (56), which are both environmental opportunistic pathogens characterized by their capacity to colonize diverse environments.
Another specific characteristic of S. gallolyticus that is important in hostile environments is its ability to degrade tannins, which are toxic polyphenolic compounds that form strong complexes with proteins and other macromolecules. Indeed, a gene, tanA, encoding a 596-aa-long protein 43% identical to the tannase recently described for Staphylococcus lugdunensis (46), is present in its genome. Like S. lugdunensis tannase, TanA is predicted to be exported and contains a conserved lipobox motif (LTACS) at the cleavage site of the signal peptide, indicating that it may be a lipoprotein that remains attached to the cell membrane. Strain UCN34 also expresses a nonsecreted protein similar to TanA (Gallo_1609). This protein has homologs in diverse Firmicutes species and may have a similar hydrolase activity. tanA and gallo_1609 have no counterpart in other streptococcal genomes and were probably gained by LGT.
Degradation of hydrolyzable tannins by S. gallolyticus produces a phenolic compound, gallic acid, which may also be toxic. However, S. gallolyticus is able to decarboxylate gallate and to use it as an alternative carbon supply (12). This capacity may be achieved by two decarboxylases encoded in the genome that have no orthologs in other sequenced streptococcal genomes. Gallo_2106 (PadC) is predicted to be a phenolic acid decarboxylase similar to those present in lactobacilli and bacilli. For example, the B. subtilis PadC protein decarboxylates phenolic compounds and is implicated in the phenolic acid stress response (61). Gallo_0906 belongs to the carboxymuconolactone decarboxylase family. Both activities may account for the ability of S. gallolyticus to decarboxylate not only gallate but also other phenolic compounds, like protocatechuic, p-coumaric, caffeic, and ferulic acids, as previously described (12).
Another important feature enabling S. gallolyticus to multiply in the gut environment is its capacity to hydrolyze bile salts, which confers resistance to these detergents. This capacity is probably linked to the bsh (gallo_0818) gene encoding a protein highly similar to Listeria monocytogenes bile salt hydrolase (63% identity) (21). This activity, not yet described for another streptococcus, is commonly found in diverse bacteria dominant in the gut community, like Clostridia, Lactobacillus, or Bacteroidetes organisms.
In addition to these enzymatic activities, S. gallolyticus expresses a broad range of transport systems possibly involved in the adaptation to diverse environments. We predicted 25 genes encoding efflux proteins likely involved in detoxification. Strain UCN34 expresses six efflux proteins belonging to the multidrug and toxic compound extrusion (MATE) family, whereas other streptococci (S. gordonii and S. uberis) encode not more than three efflux proteins of this family. Furthermore, strain UCN34 encodes 58 proteins probably transporting inorganic compounds. Among those, eight have no counterpart in other sequenced streptococcal genomes (including a sulfate family and two heavy-metal transporters). It also expresses three different iron transport systems: gallo_0590 encodes an NRAMP Mn2+/Fe2+ transporter, gallo_1771-1774 an iron ABC transporter, and gallo_0619-0620 a ferrous iron transporter. Strain UCN34 is, among the sequenced streptococci, the only one to express all three systems, which probably confers on S. gallolyticus a higher capacity to capture iron, a compound rare in the digestive tract and the environment.
In the gut environment, bacteria need to defend themselves against diverse viruses and other mobile genetic elements (MGE). CRISPR small RNAs are important in the bacterial response to these invaders. Strikingly, strain UCN34 carries two CRISPR loci located next to each other on the chromosome that belong to the two major classes of CRISPR loci (8). The first one, with 16 spacer sequences, is associated with three cas (CRISPR-associated) genes, gallo_1439 to gallo_1437. The second locus contains 14 spacers and four cas genes, gallo_1443 to gallo_1446. Together with S. thermophilus, S. gallolyticus is the only Streptococcus species whose genome sequence is known that encodes multiple CRISPR systems. They may contribute to its resistance to phages and other mobile elements.
S. gallolyticus is known to produce an extracellular capsule considered a virulence factor. We identified an operon containing 12 genes encoding proteins similar to enzymes involved in the biosynthesis of capsular polysaccharides of different streptococci, in particular of S. pneumoniae and S. thermophilus exopolysaccharides (Fig. 3A). Its organization and encoded proteins showed a high degree of similarity with those of the S. pneumoniae serotype 23F capsule, although the best BlastP hits for the polymerase (encoded by cpsH) and for the undecaprenyl-phosphate glycosyl-1-phosphate transferase (encoded by cpsE) were to the serotype VIII capsular proteins of S. agalactiae. Thus, this operon probably encodes the functions for the synthesis of a capsular structure of S. gallolyticus, which might confer its resistance to complement and to innate immunity, allowing survival in blood as described for these two pathogenic streptococci (37, 43). It has also been shown that S. gallolyticus expresses the human sialyl Lewis antigen, which may contribute to the ability of this bacterium to cross the vascular endothelium (34). The above-described locus may encode the polysaccharide mimicking the human sialyl Lewis antigen.
It was shown that the production of mucopolysaccharides by S. bovis increases the viscosity of ruminal fluid and stabilizes the foam implicated in frothy feedlot bloat (14). A second locus identified in the UCN34 genome encodes three different glycosyl transferases where each structural gene is associated with a regulatory gene (Fig. 3B). gallo_1053, encoding a nonsecreted transferase, is preceded by an abrB-like regulatory gene, and the two paralogous genes gallo_1055 and gallo_1057, encoding secreted transferases, highly similar to the three glycosyltransferases, GtfA, GtfB, and GtfC, of S. mutans, are each preceded by an rgg-like regulatory gene. In S. mutans, the three extracellular glucosyltransferases are involved in the biosynthesis of insoluble glucans from sucrose. These glucans mediate the adherence of the bacterial cells to the tooth surface and contribute to biofilm formation. The similarity of the genetic organization and the clustering of the three glycosyl transferase genes in S. gallolyticus strongly support the hypothesis that these genes were acquired by LGT from oral streptococci. In order to learn whether strain UCN34 is able to produce glucans, we compared growth on Todd-Hewitt (TH) medium supplemented either with glucose or sucrose (Fig. 3C). Whereas the colonies on TH-glucose medium were small and nonshiny, they were extremely mucoid on medium supplemented with sucrose. This suggests that the glucans produced by these glycosyl transferases represent the mucopolysaccharides produced during feedlot bloat in cattle (14). Thus, among streptococci, S. gallolyticus is the only one known to produce both a capsule and a mucoid extracellular glucan.
Finally, we also identified an operon (gallo_0364-0367) possibly involved in the biosynthesis of hemicellulose (Fig. 3D). These four genes encode two glycosyltransferases, including a putative bacterial cellulose synthase (Gallo_0366), a transmembrane (TM) protein (Gallo_0367), and a putative diguanylate cyclase (Gallo_0364). Gallo_0364 contains the characteristic GGDEF motif and carries, in addition, a PAS domain, indicating that it is probably a sensory protein. Diguanylate cyclases are shown to have important regulatory roles in many bacteria. However, Gallo_0364 is the first example of such a protein described for a streptococcal species. Furthermore, cellulose biosynthesis in Gram-negative bacteria has been shown to depend on cyclic di-GMP (51) and is involved in biofilm formation (27). The polysaccharide produced by this biosynthetic pathway may be part of the external matrix of a S. gallolyticus biofilm.
Together with the diversity of surface-exposed polysaccharides, a large repertoire of genes encoding surface proteins belonging to different families is present in the S. gallolyticus genome. Many of them may be involved in its colonization capacity and virulence. For example, strain UCN34 encodes a putative fibrinogen/fibronectin binding protein (Fbp) and four proteins related to staphylococcal collagen binding proteins (Gallo_0577, Gallo_1570, Gallo_2032, and Gallo_2179). However, among these four proteins, only Gallo_2179 carries the collagen binding motif and likely binds collagen. Collagen binding has been shown to be required for the capacity of Staphylococcus aureus to cause endocarditis (33). Similarly, these proteins may contribute to the capacity of strain UCN34 to colonize the endocardium.
Nineteen proteins possess both an N-terminal signal peptide and a C-terminal LPXTG sorting motif. Among these proteins, three have predicted enzymatic functions: a subtilisin-like serine protease (Gallo_0748), a pullulanase (Gallo_1462), and a fructan hydrolase (FruA). Furthermore, three operons encoding a sortase C and two LPXTG motif proteins (gallo_1570-1568, gallo_2040-2038, and gallo_2179-2177) were identified. These gene clusters present several characteristics of pilus operons, indicating that strain UCN34 may have the capacity to synthesize three different types of pilus appendages. Unlike Gram-negative bacteria, Gram-positive bacteria polymerize pilus subunits by transpeptidylation reactions catalyzed by sortase C. These pili of Gram-positive bacteria have been shown to contribute to the colonization of specific host tissues, the modulation of host immune responses, and the development of bacterial biofilms (42). In S. gallolyticus UCN34, two of these pilus operons encode a protein similar to collagen binding proteins (Gallo_1570 and Gallo_2179), predicted to be adhesins. Similarly, S. gallolyticus strain TX20005 was recently reported to encode three pilus operons as well as collagen binding proteins (54). The locus gallo_2179-2177 is identical to the acb locus of strain TX20005, and the locus gallo_2040-2038 is identical to the sbs15 locus. In contrast, the third locus (gallo_1570-1568) is only distantly related to the third locus of strain TX20005 (sbs13). This indicates diversity in the surface protein repertoire among S. gallolyticus isolates.
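The two sequence criteria used above (an N-terminal signal peptide and a C-terminal LPXTG sorting motif) can be illustrated with a minimal Python sketch. This is not the procedure used for the annotation of strain UCN34: the signal peptide test is a deliberately crude heuristic standing in for a dedicated predictor, and the function names, window sizes, thresholds, and toy sequences are all assumptions introduced for illustration.

```python
import re

# LPXTG sorting motif, where X is any residue.
LPXTG = re.compile(r"LP.TG")
# Residues counted as hydrophobic by this toy heuristic (assumption).
HYDROPHOBIC = set("AVLIMFWC")


def has_c_terminal_lpxtg(seq, window=50):
    """True if an LPXTG motif lies within the last `window` residues."""
    return LPXTG.search(seq.upper()[-window:]) is not None


def has_crude_signal_peptide(seq, n_len=8, scan_len=35, min_run=8):
    """Toy stand-in for a signal peptide predictor (assumed thresholds).

    Requires an initial Met, at least one K/R among the next residues,
    and an uninterrupted hydrophobic run near the N terminus.
    """
    s = seq.upper()
    if not s.startswith("M"):
        return False
    if not any(r in "KR" for r in s[1:n_len]):
        return False
    run = 0
    for r in s[1:scan_len]:
        run = run + 1 if r in HYDROPHOBIC else 0
        if run >= min_run:
            return True
    return False


def candidate_sortase_substrates(proteins):
    """Names of proteins that satisfy both criteria."""
    return [name for name, seq in proteins.items()
            if has_crude_signal_peptide(seq) and has_c_terminal_lpxtg(seq)]


# Invented toy sequences, not taken from the UCN34 genome.
toy = {
    "toyA": "MKKRLLAVLALLAVASG" + "T" * 40 + "LPETG" + "AAVLLLAGLLKRRK",
    "toyB": "MSTEDQ" + "T" * 40 + "LPETG" + "AAVLL",   # no signal peptide
    "toyC": "MKKRLLAVLALLAVASG" + "T" * 60,            # no LPXTG motif
}
hits = candidate_sortase_substrates(toy)
```

Restricting the motif search to the C-terminal window avoids counting internal LP.TG matches that could not act as a sorting signal; the window length of 50 residues is an arbitrary choice for this sketch.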
Expression of multiple sortase C proteins associated with pilus loci is common in streptococci. However, strain UCN34 presents the unique property of expressing three highly similar sortase A-encoding genes. In addition to the housekeeping srtA gene located downstream of the gyrA gene, we identified two paralogous genes both located on mobile genetic elements (MGEs). Gallo_1651 is carried by a predicted integrative and conjugative element (ICE) (ICESgal1; encoded by gallo_1646 to gallo_1703) that also carries a Tn916-like conjugative transposon (encoded by gallo_1676-int to gallo_1700) expressing the tet(M) determinant. Furthermore, the previously described plasmid pBC16, which carries the tet(L) determinant, is inserted within Tn916 (49). The presence of tet(M) and tet(L) explains the resistance of strain UCN34 to tetracycline. Interestingly, ICESgal1 expresses two LPXTG proteins (Gallo_1649 and Gallo_1675), which may contribute to the conjugative transfer of the element. The third sortase gene, gallo_0299, is also located on an MGE, TnGallo1. TnGallo1 also carries three genes encoding LPXTG proteins and is similar to the recently described conjugative transposon TnGBS2 of S. agalactiae (9). Phylogenetic analysis of the amino acid sequences of these three sortases indicates that they are closely related and clustered within the housekeeping SrtA sequences of streptococci. Therefore, the two additional sortase genes were probably acquired by LGT of srtA genes from related species (Fig. 4). It is tempting to suggest that these additional sortase A genes contribute to the propagation of MGEs by anchoring surface proteins encoded by these elements and involved in their conjugative transfer; however, they may also contribute to the cell wall anchoring of proteins encoded by the genome backbone and to the general fitness of the cell.
Another important class of surface proteins is lipoproteins. The genome of strain UCN34 encodes 42 lipoproteins (see Table S1 in the supplemental material). Twenty-six are ABC transporter substrate binding proteins, one is the above-described tannase, one is a peptidyl-prolyl cis-trans isomerase, and 14 are conserved hypothetical proteins. Further analysis of the deduced protein sequences revealed the presence of a serine-rich motif following the lipid-modified cysteine residue in 27 of the 42 lipoproteins (Table S1). For example, the cysteine residue in Gallo_1845 is followed by 11 serine residues. Interestingly, alignments with similar proteins from other streptococci revealed that the serine-rich domain is in most cases not conserved. This indicates that this motif might be specific for S. gallolyticus lipoproteins and recently acquired. We thus searched for such motifs in the other S. gallolyticus proteins. We identified 16 proteins containing one or several serine-rich domains that were not conserved in orthologous proteins of other streptococci. These serine-rich regions are systematically found in domains predicted to be extracellular (Table S1). Similar polyserine repeat regions have been identified in plant cell wall-degrading enzymes of environmental bacteria, like Cellvibrio japonicus (18), Microbulbifer degradans (36), or Saccharophagus degradans (65). It has been proposed that these flexible spacer regions enhance the accessibility of the substrate for degradation. Interestingly, in S. gallolyticus, these polyserine repeats are found in surface-exposed proteins with diverse predicted functions not directly related to carbohydrate degradation. Some of them are involved in cell wall biosynthesis (penicillin binding proteins 1A and 2C, D-alanyl-D-alanine-carboxypeptidase, and polyglycerol synthase) or in regulation (protein kinase). They are thus possibly linked to specific interactions of the bacteria with polysaccharides from the environment or with those that the bacteria themselves produce.
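The kind of serine-rich stretch described above, directly following the lipid-modified cysteine of a lipoprotein, can be quantified with a short Python sketch. It is only an illustration under stated assumptions: the lipid-modified cysteine is taken to be the first cysteine within the first 35 residues (no lipobox prediction is performed), the 15-residue window is arbitrary, and the function names and toy sequences are invented rather than taken from Table S1.

```python
def first_cysteine(seq, max_pos=35):
    """Index of the first Cys in the N-terminal region, or -1 (assumed
    here to be the lipid-modified cysteine of a lipoprotein)."""
    return seq.upper().find("C", 0, max_pos)


def serine_run_after_cysteine(seq, max_pos=35):
    """Number of consecutive Ser residues immediately after the Cys."""
    s = seq.upper()
    pos = first_cysteine(s, max_pos)
    if pos == -1:
        return 0
    run = 0
    for residue in s[pos + 1:]:
        if residue != "S":
            break
        run += 1
    return run


def serine_fraction_after_cysteine(seq, max_pos=35, window=15):
    """Fraction of Ser in the `window` residues after the Cys, or None
    if there is no Cys or no residue after it."""
    s = seq.upper()
    pos = first_cysteine(s, max_pos)
    region = s[pos + 1:pos + 1 + window] if pos != -1 else ""
    return region.count("S") / len(region) if region else None


# Invented toy sequences: one with 11 serines after the cysteine,
# mimicking the pattern reported for Gallo_1845, and one without.
toy_rich = "MKKLLALAGC" + "S" * 11 + "GNTDAKEVTLW"
toy_plain = "MKKLLALAGCGNTDAKEVTLWQ"

run_rich = serine_run_after_cysteine(toy_rich)
frac_rich = serine_fraction_after_cysteine(toy_rich)
run_plain = serine_run_after_cysteine(toy_plain)
```

Reporting both the uninterrupted run and the windowed fraction distinguishes a pure polyserine tract from a region that is merely enriched in serine; which measure and cutoff define a "serine-rich motif" is left open here.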
S. gallolyticus is an important cause of endocarditis; still, very little is known about the genetic basis of its virulence and niche adaptation. Analysis of the S. gallolyticus genome sequence has revealed important features that might help to understand its virulence and survival strategies. We identified an impressive number of functions that are specific to the species S. gallolyticus among streptococci but that are shared with Lactobacillus, Bacillus, and clostridial species belonging to the normal rumen flora, suggesting that S. gallolyticus acquired different functions by LGT from other inhabitants of the rumen to better adapt to this environment. These functions may also provide some explanation for the association of S. gallolyticus and colon cancer. In a healthy human gut, the low prevalence of S. gallolyticus compared to that observed in ruminants is probably linked to the difference in diet and to the different environments in the rumen and the human gut. However, alterations of the gut due to colon cancer or neoplastic polyps may affect the flux of intestinal content, leading to the accumulation of material, including plant-derived fibers, in close proximity to the epithelial cells and to the tumor. This material is metabolically poor but rich in fiber carbohydrates and possibly tannins. Thus, it may represent a favorable microenvironment for the proliferation of S. gallolyticus. Our genome analysis therefore substantiates the model of overgrowth of S. gallolyticus linked to colon dysplasia and also supports the observation that the prevalence of S. bovis in fecal cultures from patients with carcinoma of the colon was significantly increased compared to that in controls (10, 40). Furthermore, this proliferation together with the alteration of the gut epithelium may favor translocation of the bacteria to the bloodstream. Subsequently, the predicted capsular polysaccharides will protect S. gallolyticus from innate immunity responses, and the surface proteins and pili showing ECM binding properties, including collagen binding, will aid its adherence to endothelial cells as a preliminary step in the development of infective endocarditis.
The genome sequence of strain UCN34 reported here, isolated from human blood, is the first obtained for a S. gallolyticus isolate. It will provide the genomic information for designing a multilocus sequence typing scheme to study the population of S. gallolyticus strains in more detail. This will lead to a better definition of the relationship between strains isolated from humans, bovines, and birds. It will also be the basis for a rational strain selection for further genome sequencing using high-throughput methods to identify putative genomic and functional specificities associated with the different hosts.
This work was supported by the Institut Pasteur Genopole program. V.D. is supported by a grant from the Ministère Délégué à l'Enseignement Supérieur et à la Recherche (France).
We also wish to thank Mathieu Brochet, Carmen Buchrieser, Maria-Jose Lopez, Harold Tjalsma, and Isabelle Rosinski-Chupin for critical comments.
Published ahead of print on 5 February 2010.
†Supplemental material for this article may be found at http://jb.asm.org/.