The lactic acid bacteria (LAB) might be the most numerous group of bacteria linked to humans. They are naturally associated with mucosal surfaces, particularly the gastrointestinal tract, and are also indigenous to food-related habitats, including plant (fruits, vegetables, and cereal grains), wine, milk, and meat environments (60, 61). The LAB include both important pathogens, e.g., several Streptococcus species, and extremely valuable nonpathogenic species that are used for industrial fermentation of dairy products, meats, and vegetables, and they are also critical for the production of wine, coffee, silage, cocoa, and sourdough (13, 60, 61). In addition, the LAB are a priceless source of antimicrobial agents, the bacteriocins (reference 10 and references therein).
The term LAB mainly refers to the defining feature of the basal metabolism of these bacteria, the fermentation of hexose sugars yielding, primarily, lactic acid. Various aspects of LAB biology and application are thoroughly covered in several books by Wood and Holzapfel and Wood and Warner (60, 61) and numerous reviews, including those in a recent specialized issue of FEMS Microbiology Reviews (12, 14, 18, 20, 23, 24, 31, 39, 42, 49, 58). The definition of LAB is biological rather than taxonomical, i.e., the LAB do not comprise a monophyletic group of bacteria. Most of the LAB belong to the order Lactobacillales, a group of nonsporulating, gram-positive bacteria, but a few LAB species belong to the Actinobacteria (60).
The early sequencing of LAB genomes involved mostly bacteria of the genus Streptococcus, which encompasses most of the pathogenic LAB (50, 60). Currently, 19 complete genomes of streptococci are available, covering different strains of five species. A program aimed at extensive sequencing of the genomes of nonpathogenic LAB was announced in 2002 by the Lactic Acid Bacteria Genome Sequencing Consortium (19), but the actual breakthrough occurred only in the last 2 years (2005 and 2006). At the time of writing (August 2006), 18 complete genome sequences of the nonpathogenic LAB representing 14 species from the order Lactobacillales were available (Table 1). The Lactobacillales have relatively small genomes for nonobligatory bacterial parasites or symbionts (characteristic genome size, ~2 megabases, with ~2,000 genes), with the number of genes in different species spanning the range from ~1,600 to ~3,000. This variation in the number of genes suggests that the evolution of LAB involved active processes of gene loss, duplication, and acquisition. The current collection of LAB genomes is a unique data set that includes multiple related genomes with a gradient of divergence in sequences and genome organizations. This set of related genomes is amenable to detailed reconstruction of genome evolution, which is not yet attainable with other groups of bacteria.
This review is largely based on recent work on comparative genomic analysis of 12 nonpathogenic LAB from the order Lactobacillales (Table 1) (28). We briefly discuss the customized comparative-genomic framework that was developed for the LAB and its application to evolutionary reconstruction and functional genomics.
Robust identification of sets of orthologs (genes derived from the same gene in the last common ancestor of the compared species) is a prerequisite for informative evolutionary-genomic analysis of any group of organisms (25). Construction of orthologous gene clusters for a compact taxon, such as the Lactobacillales, results in much finer granularity than is attainable for broader groups of organisms (e.g., all bacteria), with a greater fraction of clusters containing a single member from all or most of the analyzed genomes. Lactobacillales-specific clusters of orthologous protein coding genes (LaCOGs) in 12 sequenced Lactobacillales genomes were built using the computational procedures that were previously employed for the construction of clusters of orthologous groups of proteins (COGs) from the sequenced prokaryotic and eukaryotic genomes (52, 53). As with the original COG database, manual curation was undertaken to refine the results of the automatic procedure, including tree reconstruction for LaCOGs that had multiple representatives for several genomes, searching for potential missing LaCOG members in untranslated regions of the LAB genomes, merging two or more LaCOGs when they appeared to have evolved from the same ancestral gene, using genomic context for LaCOG validation and refinement, and several additional analyses (37, 52).
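The automatic step of COG-style clustering can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the procedure used to build the LaCOGs: it keeps only symmetric best hits between genomes, finds triangles of such hits spanning three different genomes, and merges triangles that share a side. The function names, the `best_hit` input format, and the toy gene identifiers are hypothetical; the manual curation steps described above are not modeled.

```python
from itertools import combinations


def find(parent, item):
    """Union-find lookup with path halving."""
    while parent[item] != item:
        parent[item] = parent[parent[item]]
        item = parent[item]
    return item


def build_clusters(best_hit):
    """Cluster genes from triangles of symmetric best hits.

    best_hit maps (gene, other_genome) -> best-scoring gene in other_genome,
    where every gene is a (genome, gene_id) tuple (hypothetical format).
    """
    # Keep only symmetric (bidirectional) best hits as edges.
    edges = set()
    for (gene, _target_genome), hit in best_hit.items():
        if best_hit.get((hit, gene[0])) == gene:
            edges.add(frozenset((gene, hit)))

    neighbours = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # A triangle is three mutually linked genes from three different genomes.
    triangles = set()
    for a, linked in neighbours.items():
        for b, c in combinations(linked, 2):
            if c in neighbours[b] and len({a[0], b[0], c[0]}) == 3:
                triangles.add(frozenset((a, b, c)))

    # Merge triangles that share a side.
    parent = {tri: tri for tri in triangles}
    side_owner = {}
    for tri in triangles:
        for side in combinations(sorted(tri), 2):
            key = frozenset(side)
            if key in side_owner:
                parent[find(parent, tri)] = find(parent, side_owner[key])
            else:
                side_owner[key] = tri

    clusters = {}
    for tri in triangles:
        clusters.setdefault(find(parent, tri), set()).update(tri)
    return sorted(sorted(members) for members in clusters.values())
```

In this sketch a gene whose best hit is not reciprocated stays outside every cluster, which is one reason a curation pass that searches for missing members is still needed.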
Figure 1 shows the projection of the LaCOGs on the set of COGs that includes 63 prokaryotic genomes (52). The finer granularity of ortholog identification in the LaCOGs is apparent from the fact that 42% (1,359) of the LaCOGs correspond to 390 COGs. Not unexpectedly, many LaCOGs in this group represent duplications of widely conserved bacterial genes in the common ancestor of Bacilli or at a later stage of evolution. For example, there are three LaCOGs corresponding to the single COG that includes 23S RNA-specific pseudouridylate synthases from other bacteria (COG0564) and two LaCOGs for SpoU-like rRNA methylases (COG0566), TrmA-like tRNA (uracil-5-)-methyltransferases (COG2265), SrmB-like RNA helicases (COG0513), FtsW-like bacterial cell division membrane protein (COG0772), and many other COGs. Several paralogous genes (i.e., genes that evolved by duplication) in this group appear to be characteristic of Lactobacillales and deserve a more detailed discussion. One notable case involves two paralogous enolases (two LaCOGs), a nearly ubiquitous glycolytic enzyme that is present in a single copy in other bacteria. Phylogenetic analysis showed that one of the enolases in Lactobacillales is the ancestral version of that in gram-positive bacteria, whereas the other had apparently been acquired by the ancestor of the Lactobacillales from a different bacterial lineage, most likely Actinobacteria (28). Thus, the enolases of LAB actually appear to be pseudoparalogs, i.e., genes that evolved via acquisition of one of the related genes by horizontal transfer from a distant lineage rather than via duplication within the given lineage (25). It has been shown that both enolases of Lactococcus lactis subsp. lactis have enzymatic activity (15); however, their specific physiological functions remain unknown.
Another case analysis revealed an even more complex evolutionary history within the family of enzymes involved in peptidoglycan biosynthesis. The gene in question is present in a single copy in the majority of bacteria and encodes the enzyme UDP-N-acetylmuramyl tripeptide synthase (MurE), which catalyzes the addition of meso-diaminopimelate to the peptidoglycan biosynthesis intermediate UDP-N-acetylmuramoyl-l-alanyl-glutamate. The phylogenetic tree suggests that two paralogs (LaCOG00743 and LaCOG01200) of this enzyme from another group of bacteria (perhaps, again, actinobacteria) have replaced the ancestral gene in all Lactobacillales, with the exception of Lactobacillus plantarum, with subsequent functional diversification (Fig. 2). The proteins of LaCOG01200 apparently changed the substrate specificity to lysine and continue to function in the peptidoglycan biosynthesis pathway (4). The function of LaCOG00743 remains unknown, but examination of the genomic context yields some clues. The gene is located in a conserved operon with the gene for α-carboxyl amidase, which acts on the second amino acid (d-Glu) of the stem peptide precursors; in addition, the LaCOG00743 protein lacks the catalytic residues. Thus, it can be inferred that this protein is the substrate recognition subunit of the amidase.
Another group of interest consists of 707 (22%) LaCOGs that have no counterparts in the general COG set. Of these, 338 (~11%) were shared with one or more non-Lactobacillales bacterial genomes among those reported recently and not yet included in the COGs, whereas 369 (~11%) appeared to be specific to Lactobacillales and could be considered the genomic signature of the group.
Considerable effort has been invested in assigning reliable functional annotations to LaCOGs to reflect recent experimental findings on LAB genes and to predict functions of uncharacterized proteins on the basis of sequence similarity to proteins with known functions and domain architectures. Thanks to the broad representation of various lineages of Lactobacillales in the set of genomes used for LaCOG construction, LaCOGs provide excellent coverage of new genomes from this taxon, up to 90% of the genes (Fig. 3). Thus, LaCOGs are expected to become an essential, evolving resource for annotation of new genomes from the order Lactobacillales. The complete annotated list of LaCOGs is available at ftp://ftp.ncbi.nih.gov/pub/wolf/lacto (file LaCOGS_table.xls); other detailed results of this analysis, including reconstructions of gene loss and gain, are available upon request.
An important issue that can be readily addressed with the help of LaCOGs is the identification of the conserved gene core of Lactobacillales, i.e., the set of genes that are present in all sequenced genomes from this taxon and, by inference, are likely to be essential for these bacteria. There are 567 LaCOGs (18%) that are present in all 12 analyzed genomes. Not surprisingly, the functional distribution of these LaCOGs shows that the majority encode components of the information-processing systems (translation, transcription, and replication). However, the core also includes 50 genes for which only a general prediction of biochemical activity is available and 41 genes without known or predicted functions. This observation emphasizes the current lack of understanding of even some of the central cellular functions of relatively simple bacteria. Two core genes have no detectable orthologs outside Lactobacillales and thus may be considered unique genomic markers of Lactobacillales. One of these unique genes encodes a protein containing a peptidoglycan-binding LysM domain (LaCOG01826) (Fig. 4A). In several genomes, this gene is located next to the genes for ribosomal proteins and cytidylate kinase and might be coregulated with these housekeeping genes. The second genomic marker of Lactobacillales, the highly conserved LaCOG01237, contains no characterized domains (Fig. 4B). However, this gene is located in a conserved genomic neighborhood encoding two enzymes implicated in 4-thiouridine modification of tRNA [(5-methylaminomethyl-2-thiouridylate) methyltransferase and a predicted sulfurase] (LaCOG00578 and LaCOG01188), suggesting a role of LaCOG01237 proteins in specific modulation of this essential modification (27).
The LAB considered here belong to the phylum Firmicutes, class Bacilli, and order Lactobacillales, a sister taxon to the order Bacillales. Prior to the recent, extensive genome sequencing, classification of Lactobacillales remained an unresolved issue, particularly because the phenotypic classification, which was traditionally based on the type of fermentation, did not match the rRNA-based phylogeny (56). Whole-genome DNA and DNA-RNA hybridization and GC content analysis led to the delineation of three closely related lineages of Lactobacillales: the Leuconostoc group (Leuconostoc mesenteroides and Oenococcus oeni), the Lactobacillus casei-Pediococcus group (L. plantarum, L. casei, Pediococcus pentosaceus, and Lactobacillus brevis), and the Lactobacillus delbrueckii group (L. delbrueckii, Lactobacillus gasseri, and Lactobacillus johnsonii); streptococci (Streptococcus thermophilus) and lactococci (L. lactis subsp. lactis and Lactococcus lactis subsp. cremoris) formed a separate branch (48). With the availability of complete genomes for all major branches of Lactobacillales, phylogenetic trees can be built by using concatenated protein sequences encoded by genes that are unlikely to be transferred horizontally. This approach has been shown to improve the resolution and increase the robustness of phylogenetic analyses (59). A tree of Lactobacillales constructed by this approach from concatenated sequences of ribosomal proteins and RNA polymerase subunits had the same topology and was supported by high bootstrap values but disagreed in some important respects with the above classification (28). Specifically, the new tree suggests that the Streptococcus-Lactococcus branch is basal in the Lactobacillales tree and the Pediococcus group is a sister to the Leuconostoc group within the Lactobacillus clade. Thus, the genus Lactobacillus appears to be paraphyletic with respect to the Pediococcus-Leuconostoc group. 
Lactobacillus casei is confidently placed at the base of the L. delbrueckii group. Figure 5 shows the phylogenetic tree of concatenated RNA polymerase subunits for all species of Lactobacillales whose genomes are currently available; this tree, made for an expanded species set, was fully compatible with the previous one (28).
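The concatenation step that precedes tree building can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-gene alignments are taken as already computed, the input format (gene name mapped to a dictionary of species and aligned sequence) and the function name are hypothetical, and only species present in every alignment are retained. Tree inference itself is left to dedicated phylogenetic software.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix row per species.

    alignments maps gene name -> {species: aligned sequence}; the format is
    a hypothetical one chosen for this sketch.
    """
    for gene, rows in alignments.items():
        # Every row of one alignment must have the same number of columns.
        if len({len(seq) for seq in rows.values()}) > 1:
            raise ValueError(f"rows of alignment {gene!r} differ in length")

    # Keep only species represented in every gene alignment.
    shared = set.intersection(*(set(rows) for rows in alignments.values()))

    # Concatenate in a fixed gene order so columns line up across species.
    gene_order = sorted(alignments)
    return {
        species: "".join(alignments[gene][species] for gene in gene_order)
        for species in sorted(shared)
    }
```

A fixed gene order matters here: if the order differed between species, columns of the concatenated alignment would no longer be homologous.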
A molecular-clock test performed for the phylogenetic tree based on multiple alignment of concatenated ribosomal proteins (51) revealed a high heterogeneity of evolutionary rates among Lactobacillales, including confirmation of the previously reported (62) accelerated evolution of the Leuconostoc group by a factor of 1.7 to 1.9 relative to the sister Pediococcus group (28). Similarly, O. oeni was found to evolve substantially faster (by a factor of 1.6) than Leuconostoc. This finding is in accord with the experimental observation of an increased mutation rate in O. oeni (D. A. Mills, unpublished observation) and the absence in the species of the key enzymes of mismatch repair, MutL and MutS, which is unique among the Lactobacillales (Table 2).
Analysis of phyletic (phylogenetic) patterns, i.e., patterns of gene presence/absence in a particular set of genomes, is a valuable approach both for the detection of evolutionary trends and for functional prediction (36, 44). A straightforward examination of frequent phyletic patterns in LaCOGs immediately reveals several trends in the evolution of Lactobacillales that mostly reflect gene losses, especially of genes that encode biosynthetic enzymes (Table 2). However, genes shared by distinct sets of bacteria are also of interest. Some of these shared genes apparently reflect recent gene exchanges between distantly related species within the order Lactobacillales. Several cases are obvious, e.g., 11 genes that are shared by L. johnsonii and L. lactis subsp. cremoris and are located adjacent to a prophage and therefore in all likelihood have been transferred by the phage vehicle. Another set of genes disseminated via horizontal transfer is the CRISPR-related system (CASS) implicated in the defense against integrative phages and plasmids (6, 33) in L. delbrueckii and L. casei. In these LAB, the CASS includes a unique gene that encodes a protein with a Cas1 domain fused to a 3′-5′ exonuclease domain (29). Other phyletic patterns reflect specific sets of genes shared by related species, often poorly understood in functional terms. Not surprisingly, the second largest gene set (246 genes), after the conserved core, is shared by the two most closely related genomes, those of Lactococcus lactis subsp. lactis and Lactococcus lactis subsp. cremoris. Many genes in this list (>50) apparently belong to prophages that are shared by these two subspecies and have probably integrated into the genome of their common ancestor. Among the 88 genes that are specifically shared by L. johnsonii and L. gasseri, 48 are uncharacterized; many of them encode secreted and membrane proteins that are likely to be involved in the interaction with mucosal surfaces of the gastrointestinal tract, which these bacteria colonize. Similar trends have been observed for three related species, L. johnsonii, L. gasseri, and L. delbrueckii, that specifically share 39 genes, 27 of which are uncharacterized.
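Tabulating phyletic patterns is simple enough to show as a short Python sketch. It assumes a hypothetical input in which each cluster identifier maps to the set of genomes that contain at least one member. Each cluster is encoded as a string of 1s and 0s in a fixed genome order, the patterns are counted, and the clusters present in every genome (the conserved core) are listed. The function names and toy identifiers are illustrative only.

```python
from collections import Counter


def phyletic_patterns(membership, genomes):
    """Encode each cluster as a presence/absence string over `genomes`."""
    return {
        cluster: "".join("1" if genome in present else "0" for genome in genomes)
        for cluster, present in membership.items()
    }


def summarize_patterns(membership, genomes):
    """Return (pattern counts, sorted list of clusters found in all genomes)."""
    patterns = phyletic_patterns(membership, genomes)
    counts = Counter(patterns.values())
    core = sorted(c for c, pattern in patterns.items() if "0" not in pattern)
    return counts, core
```

Calling `counts.most_common()` on the first return value ranks the patterns by frequency, which is the kind of listing from which sets such as those shared only by the two lactococcal genomes would stand out.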
Phyletic patterns of LaCOGs, together with the phylogenetic tree of Lactobacillales, can be used for explicit reconstruction of the events that occurred during the evolution of this group after its divergence from the common ancestor with the rest of the Bacilli. For the purpose of this reconstruction, we employed a modification of a previously developed method based on the weighted-parsimony approach (32). The results of the reconstruction (28) suggest that the common ancestor of Lactobacillales had at least ~2,100 to 2,200 genes, having lost 600 to 1,200 genes (~25 to 30%) and gained <100 genes after the divergence from the Bacilli ancestor, for which the genome size of ~2,700 to 3,700 genes was estimated (Fig. 6). Many of the changes mapped to this stage of evolution seem to be related to the transition made by the LAB to existence in nutritionally rich medium. Thus, a number of genes for biosynthesis of cofactors, such as heme, molybdenum coenzyme, and pantothenate, were lost, and conversely, some cofactor transporters were acquired, e.g., nicotinamide mononucleotide transporter. Another notable acquisition is a group of diverse peptidases which are obviously an important commodity in the protein-rich environments inhabited by the LAB. The loss of heme/copper-type cytochrome/quinol oxidase-related genes (CyoABCDE) and catalase (KatE), characteristic enzymes of aerobic bacteria, suggests that the ancestor of Lactobacillales was a microaerophile or an anaerobe.
Lineage-specific gene loss was extensive in the evolution of all lineages of Lactobacillales, but several species stand out as especially notable “losers.” In particular, S. thermophilus not only lost numerous genes but also has many fresh pseudogenes, suggesting an active and ongoing process of genome decay, which has been reported for two different strains of the same species (5). Moreover, substantial gene loss (368 genes, according to the present reconstruction) also occurred at the base of the Streptococcus-Lactococcus branch, including several genes involved in cell division that are conserved in most bacteria, such as CrcB, MreB, MreC, and MinD. This is reminiscent of the trends of gene loss that are observed in other symbiotic and pathogenic bacteria (21, 35). The lineages of Lactobacillales that are particularly prone to gene loss are P. pentosaceus; the Leuconostoc and Oenococcus branch, with considerable additional loss in each species; and the L. delbrueckii group (L. delbrueckii, L. gasseri, and L. johnsonii), with further genome reduction in L. gasseri and L. johnsonii (Fig. 6). In the species with larger genomes, such as L. plantarum and L. casei, the loss of ancestral genes was counterbalanced by the emergence of many new genes via duplication and horizontal gene transfer (HGT) (Fig. 6).
Horizontal gene transfer via bacteriophage-mediated or conjugative pathways has been extensively documented in Lactobacillales and appears to be important for niche-specific adaptation in the lactococci (61). Signs of HGT are particularly notable in L. lactis subsp. cremoris SK11, which harbors a conjugative plasmid (pLAC3) and several additional plasmids carrying genes related to growth in milk (47).
Horizontal gene transfer definitely played an important role in shaping the common ancestor of Lactobacillales. As many as 84 genes that were inferred to have been acquired by the Lactobacillales ancestor (Fig. 6) most probably were horizontally transferred from different sources (only 2 of the 86 acquired genes currently do not have orthologs outside Lactobacillales). In some cases, the ancestor acquired an additional, pseudoparalogous copy of a gene by HGT (e.g., the aforementioned enolases), whereas on other occasions, xenologous displacement (acquisition of genes via HGT, followed by the loss of the ancestral orthologous gene) (26) apparently took place, as in the case of the two forms of the MurE-like UDP-N-acetylmuramyl tripeptide synthase.
Recently, a simple approach for the detection of violations of the molecular clock in individual COGs has been developed (41). The approach is based on comparing the evolutionary distances within a set of orthologs to a standard intergenomic distance measured for genes known to be less prone than others to HGT and deviations in evolutionary rates (e.g., ribosomal proteins and subunits of RNA polymerase). Most often, statistically significant deviations from the molecular clock are best explained by HGT (41). This test was applied to the LaCOGs, and significant violations of the molecular clock were detected in at least 25% of the LaCOGs, suggesting a high level of HGT and/or major local accelerations of evolution. Molecular-clock violations are particularly common in certain functional groups of genes, such as those encoding enzymes of sugar metabolism, including key enzymes, such as phosphoketolase, transketolase, and various components of phosphotransferase systems.
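The logic of comparing within-cluster distances to a standard intergenomic distance can be conveyed with a deliberately simplified Python sketch. It is not the statistical test of reference 41. It assumes that, under a clock, the ratio of a gene's pairwise distance to the reference distance is roughly constant across genome pairs, and it flags a cluster when any pair deviates from the median log ratio by more than a threshold. The function names, the input dictionaries keyed by genome pair, and the twofold threshold are all hypothetical.

```python
import math
from statistics import median


def clock_deviations(gene_dist, ref_dist):
    """Log ratio of gene distance to reference distance, centered on the median.

    Both arguments map a genome pair, e.g. ("A", "B"), to a positive distance;
    ref_dist must contain every pair present in gene_dist.
    """
    log_ratio = {
        pair: math.log(gene_dist[pair] / ref_dist[pair]) for pair in gene_dist
    }
    # The median absorbs the gene-specific overall rate.
    centre = median(log_ratio.values())
    return {pair: value - centre for pair, value in log_ratio.items()}


def violates_clock(gene_dist, ref_dist, threshold=math.log(2)):
    """True if any pair is off by more than `threshold` (default: twofold)."""
    deviations = clock_deviations(gene_dist, ref_dist)
    return any(abs(value) > threshold for value in deviations.values())
```

A pair of orthologs that is far more similar than the reference distance predicts shows up as a large negative deviation, the signature expected for a recent transfer between the two lineages. A real test would also need a significance estimate, which this sketch omits.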
Furthermore, most of the unique genes that are present in the individual genomes (not covered by LaCOGs) and that have homologs outside Lactobacillales are probable products of recent HGT. Among such examples are two copies of the Mn-containing catalase (COG3546) in P. pentosaceus that are shared with a different strain of L. plantarum (3) and that were probably transferred to the ancestral lineage of the Pediococcus group from a source belonging to Bacilli (but not Lactobacillales), with subsequent loss in other members of the Pediococcus group; a distinct form of the predicted polymerase of CASS (COG1353) that is most often present in thermophiles and was detected in all strains of S. thermophilus (29); the urease complex that is present in S. thermophilus but absent in other Lactobacillales and that has been shown to influence the rate of milk acidification (34); and the propanediol utilization operon, which is present in L. brevis and so far has been found only in a few species from different lineages, including Lactobacillus collinoides, a heterofermentative LAB found in cider, where the enzymes encoded in this operon are thought to be involved in glycerol degradation (45).
Like all other bacterial lineages (16), Lactobacillales have a substantial number of expanded gene families that evolved either by lineage-specific gene duplication or by acquisition of pseudoparalogous genes via HGT (26). A closer examination of these families indicates that adaptation to growth in nutrient-rich environments was the major driving force behind the fixation of duplications and acquisitions during the evolution of the Lactobacillales.
Many genes encoding proteins involved in sugar metabolism and transport were duplicated or acquired early in the evolution of the Lactobacillales, including those encoding enolase, several phosphotransferase systems, β-galactosidase, GpmB-family sugar phosphatases, galactose mutarotase, and others. In addition, expansion of peptidases and amino acid transporters seems to have occurred in several lineages of Lactobacillales. Interestingly, several expanded families include proteins involved in antibiotic resistance in other bacteria, such as β-lactamases and penicillin V acylases, despite the fact that most LAB species are sensitive to common antibiotics and, after centuries of consumption by humans, have been accordingly “generally recognized as safe” (17, 54). Conceivably, the homologs of antibiotic resistance genes are involved in normal cell wall biosynthesis in the Lactobacillales. In the same vein, expansion of a distinct family of tyrosine/serine phosphatases, which are often located in the same operon with a serine/threonine protein kinase fused to β-lactam-binding (PASTA) domains, is likely to be important for regulation of cell wall biosynthesis (63). Furthermore, Lactobacillales encode a paralog of class II lysyl-tRNA synthetase that is fused to a membrane-associated domain (COG2898) implicated in oxacillin-like antibiotic resistance (40) and is probably involved in cell wall biosynthesis.
Bifidobacterium longum is a LAB that belongs to a different major bacterial branch, the actinobacteria. The complete genome sequence of B. longum has been reported (46). This bacterium also inhabits the gastrointestinal tract and so shares the environment with several Lactobacillales (Table 1). Identification of common genes between B. longum and Lactobacillales is of interest because it has the potential to delineate a “genomic cognate” of the LAB phenotype. However, only seven genes that are present in B. longum but not in the genomes of non-LAB actinobacteria are shared specifically with Lactobacillales. Only one of these genes, which encodes a functionally uncharacterized membrane protein, is present in seven genomes of Lactobacillales (LaCOG00453), whereas the rest are present only in B. longum and two Lactobacillales species. By contrast, common trends of gene loss among LAB with different taxonomic affinities are obvious (Table 3). The majority of these common losses map to the last common ancestor of Lactobacillales. These observations indicate that convergent evolution of the LAB phenotype in different bacterial lineages was accompanied by similar processes of extensive loss of genes, primarily those encoding a variety of metabolic capabilities, whereas acquisition of new genes was much less extensive and involved different gene sets. At least in the case of B. longum and the Lactobacillales, there is no evidence of substantial HGT between taxonomically distant LAB.
Lactobacillales are known for producing specific antimicrobial peptides, the bacteriocins (38, 55). Several proteins responsible for the modification and export of bacteriocins and regulation of bacteriocin biosynthesis are often encoded in the same operon with the bacteriocins themselves (38, 55). Since bacteriocins are small proteins with highly diverged sequences, they are often difficult to identify by amino acid sequence conservation. A comparative-genomic approach is likely to be effective for a more complete characterization of the bacteriocin repertoire of Lactobacillales. Indeed, in seven Lactobacillales genomes, we identified clustered genes for putative bacteriocins and associated proteins. Several of these newly detected candidate bacteriocins clearly belong to two families characterized previously: homologs of pediocin from P. pentosaceus, homologs of which are also present in L. mesenteroides and L. casei, and homologs of divercin V41 (30), which are present in P. pentosaceus and L. johnsonii. In addition, numerous small open reading frames located in the immediate vicinity of the genes for bacteriocins and associated proteins might encode novel bacteriocins, despite the lack of sequence similarity to known ones (Fig. 7). Recently, a similar strategy for genome mining in search of new bacteriocins has been implemented in a specialized online server (11).
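The context-based search for candidate bacteriocins can be sketched in Python as a filter on gene coordinates. This is an illustration under stated assumptions rather than the procedure used in the analysis or in the online server: the `Gene` record, the function names, the 100-codon size limit, and the 3,000-bp window are hypothetical, and genes flagged as `associated` stand for already annotated modification, export, or regulatory genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int                 # 1-based coordinates on one replicon
    end: int
    associated: bool = False   # annotated bacteriocin-associated gene


def gap(a, b):
    """Number of bases between two genes (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def candidate_bacteriocins(genes, max_codons=100, window=3000):
    """Names of small ORFs lying within `window` bp of an associated gene."""
    anchors = [g for g in genes if g.associated]
    hits = []
    for gene in genes:
        if gene.associated:
            continue
        codons = (gene.end - gene.start + 1) // 3
        if codons > max_codons:
            continue  # too long to be a typical small bacteriocin precursor
        if any(gap(gene, anchor) <= window for anchor in anchors):
            hits.append(gene.name)
    return hits
```

Because the filter uses only size and position, it can recover candidates with no sequence similarity to known bacteriocins, at the price of false positives that need follow-up.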
The sequencing of multiple complete genomes has created unprecedented opportunities for evolutionary genomics of LAB, particularly those of the Lactobacillales lineage. The clusters of orthologous genes from Lactobacillales (LaCOGs) provide a convenient, flexible framework for both functional annotation of new genomes of Lactobacillales and evolutionary reconstruction. Loss of ancestral genes, primarily those for various metabolic enzymes, comes across as the central theme in the evolution of Lactobacillales, with a clear connection with the adaptation of these bacteria to their nutrient-rich habitats. Substantial gene loss had already occurred at an early stage of the evolution of Lactobacillales, after their divergence from the common ancestor with the rest of the Bacilli but before the radiation of the extant species. Additional differential gene loss took place during the subsequent evolution of each lineage of Lactobacillales. The trends of gene loss are very similar in taxonomically distant LAB, particularly between Lactobacillales and B. longum, an actinomycete, conceivably due to similar environmental pressures. However, the repertoires of genes that were acquired via HGT are quite different, indicating that the LAB phenotype evolved convergently in diverse lineages of bacteria as opposed to horizontal transfer of a unique suite of genes. Some of the genes acquired by Lactobacillales are clearly adaptations to existence in the nutrient-rich habitats of these bacteria. Comparative-genomic analysis substantially facilitates functional annotation of LAB genomes. In particular, this analysis helps to predict new bacteriocins, antimicrobial peptides that are typically produced by LAB, apparently reflecting their long-term existence in complex microbial communities.
We thank the members of the Lactic Acid Bacteria Genome Sequencing Consortium and, personally, David Mills and Sergay Kozyavkin for extensive collaboration and numerous helpful discussions in the course of the comparative-genomic analysis of LAB.
Published ahead of print on 3 November 2006.