Lactobacillus acidophilus NCFM
NCFM was originally isolated from humans and has been used extensively in industrial applications as a probiotic culture. Its complete genome sequence and content have been described [6
], highlighting various genome regions harboring potentially significant genes and gene clusters. These analyses were performed with regular similarity and gene synteny searches against standard databases. To investigate strain-specific traits within lactobacilli of human origin and other LAB, custom databases dbLB and dbLAB were utilized and revealed the presence of several unique regions, including three previously described genetic elements designated as pauLA-I through pauLA-III (potential autonomous unit; PAU). These elements resemble features from both plasmids and phages and were not found in other lactic acid bacteria to date. Because the core region of each PAU is mostly unique with regard to amino acid sequence similarities to LAB, use of the custom databases dbLB and dbLAB highlighted all three regions on the respective genome atlas (Fig. a, I, III, and VI, blue shading). In contrast, the genome regions adjacent to the core for pauLA-II and pauLA-III were identified by DBA due to the presence of a Type II and a Type III restriction endonuclease (LBA332 and 475), respectively, that were not present in the other three lactobacilli. However, homologs to these nucleases can be found in Streptococcus thermophilus, Bacillus subtilis
and to a lesser degree, in the draft phase sequence of Lactobacillus bulgaricus
. The Doc-Phd system featured by pauLA-I remained unique for L. acidophilus
NCFM, with no close homologs among LAB. Strikingly, despite the lack of apparent core sequence similarities, more PAUs were identified by functional classifications and synteny analyses in other LAB and a detailed analysis was performed in reference to L. johnsonii
Another L. acidophilus
NCFM-specific locus at ~250 kb features a potential DNA repair system (Fig. a, II). The gene cluster encodes a multidrug resistance protein that may resemble a permease (Lba251 to 253) and a 6-O-methylguanine DNA methyltransferase (Lba255), likely to be arranged in an operon-like structure. O-6-alkylguanine is one of the major mutagenic lesions in DNA [48
]. Such lesions are repaired by DNA S-methyltransferases, which transfer the alkyl group from the O-6 position to one of their own cysteine residues, consequently inactivating the enzyme in a suicide reaction. These enzymes feature two distinct domains: an N-terminal methyltransferase and a C-terminal DNA-binding domain. Lba255 revealed a well-conserved DNA-binding domain and, although the methyltransferase domain was only weakly conserved, most of the absolutely conserved residues were maintained [48
], indicating a preservation of the methylase function. An alternative function of the methyltransferase is that of a transcriptional activator which is mediated by self-methylation at Cys-38 [60
]. The immediate presence of a predicted multidrug efflux system might point to a potential export mechanism for (conjugated) alkylating agents [63
]. The genetic arrangement of this locus may suggest a dual functionality of the methyltransferase, acting both as a DNA repair system and a chemosensor for adaptive response. Although L. johnsonii
NCC533 features a related methyltransferase (Ljo252) at a similar genome position, no transport mechanism could be identified upstream of this gene, implicating this as a unique DNA repair mechanism in L. acidophilus
NCFM. However, the transport system may be inactivated by a premature stop-codon and its functionality remains to be verified.
Located upstream of the terminus of replication, two large genome regions were identified (1.012–1.055 Mb and 1.073–1.092 Mb), predominantly specific to L. acidophilus
NCFM (Fig. a, X and XI). Within the first region, a unique arginase (Lba1022, EC 3.5.3.1) was identified, preceded by a transcriptional regulator of the merR family (Lba1021). This enzyme might catalyze the hydrolysis of arginine into l
-ornithine and urea. As reported earlier, L. acidophilus
NCFM encodes an ornithine decarboxylase (Lba996) that decarboxylates ornithine to putrescine and carbon dioxide [9
], consuming a proton and potentially raising the cytoplasmic pH. The presence of a putrescine export system (Lba709 to 712) further strengthens the model of a specific internal mechanism to compensate for the falling pH during acidification. The presence of a periplasmic binding protein for putrescine (Lba713) suggests that under normal circumstances this ABC transporter is responsible for the uptake of this metabolite. However, it is noteworthy that under certain conditions ABC transporters may also act bi-directionally and are capable of generating ATP in the process [13].
Within the second L. acidophilus NCFM–specific region (Fig. a, XI), a gene cluster was identified, consisting of 16 genes (LBA1100 to 1115). This cluster might be divided into three groups, each consisting of a flavodoxin, an associated oxidoreductase, and some accessory proteins. Although no immediate functional assignment within the metabolic network could be performed, the close proximity of these genes to each other could indicate an intimate cooperation between the single gene sets. Furthermore, the COG classification of all three groups referred to energy production and conversion and the presence of two membrane transporters further suggests a strain-specific mechanism for metabolite uptake and conversion. Despite the presence of several predicted premature stop-codons and frameshifts within the first set of genes, it certainly would be intriguing to further characterize this unusual genome region.
A ppGpp regulated growth inhibitor protein of the PemK family (Lba1405) has been described for L. acidophilus
NCFM (Fig. a, XIV). PemK is part of a plasmid maintenance system in E. coli
, whereby PemK inhibits growth of host cells. PemK has also been reported to show DNA-binding capabilities, autoregulating its own synthesis (PFam2452). The antagonist, PemI, binds to PemK thus inactivating the protein [46
]. However, the corresponding suppressor PemI was not identified in the L. acidophilus
NCFM genome. Alternatively, the presence of a predicted bacterial transcription activator (LBA1408) upstream of LBA1405 could indicate a possible interaction of LBA1405 and LBA1408 in transcriptional regulation of this genome region. While this system is unique to L. acidophilus
NCFM, a toxin–antitoxin system was recently described for the L. johnsonii
NCC2761 (closely related to L. johnsonii
NCC533) prophage LJ771. There is evidence that the maz
K system contributes to prophage stability within L. johnsonii
NCC2761, although the specific role of this system for both the phage and its host remains to be elucidated [20].
A putative l
-alanine conversion system was identified in L. acidophilus
NCFM that was unique among the LAB examined (LBA1695 and 1696) (Fig. a, XXIII). LBA1695 has been annotated as aspartate aminotransferase, converting l
-aspartate into oxaloacetate (EC 2.6.1.1). However, metabolic pathway mapping using PathwayVoyager [5
] and the KEGG database [33
] also revealed a second highly conserved hit to an l
-aspartate beta-decarboxylase (EC 4.1.1.12) that converts l
-aspartate into l
-alanine while releasing carbon dioxide. A membrane-bound antiporter (LBA1696) then mediates the exchange of l
-alanine and l
-aspartate, generating a physiological membrane potential. This system might be involved in yielding metabolic energy by generating a proton motive force and regulating internal pH. Similar systems have been described in Tetragenococcus halophila
D10 and are generally referred to as proton motive metabolic cycles [1
]. It is noteworthy that related systems were described for other lactobacilli (i.e. L. sakei
) that are not considered commensals.
The above-described genome regions highlighted strain-specific traits. DBA, however, identifies areas that are group-specific and may represent features shared by all or some members of this group, whereas the same regions are absent in another set of organisms. Among the most prominent regions identified by DBA are those harboring surface-, mucus-, and fibrinogen-binding proteins (LBA1019, 1020, 1611, 1612, and 1634) (Fig. a, IX and XXI). All of these regions (green) are generally conserved in at least one of the other three lactobacilli but not found in other LAB, indicating their importance in strain/group-specific cell–cell interactions. Functional analyses of selected surface genes confirmed their role in cell adhesion and potentially cell–host interactions in vitro [16
]. It might be noteworthy that besides these group-specific genes, a number of strain-specific genes, potentially involved in cell adhesion (LBA1495, 1496, and 1654) or located on the cell surface (LBA1654), were also identified (Fig. a, XXII). Strain specificity of cell surface proteins appears to be a widespread and important factor mediating particular cell–cell interactions. Recently, a novel genetic locus, spa
ABCDEF coding for LPXTG-like pilins and a pilin-dedicated sortase in Lactobacillus rhamnosus
GG, was identified [34
]. The functional characterization of this locus revealed a mucus adhesion mechanism not previously described in lactobacilli. Specifically, presence of the cell wall–bound SpaC pilin was of critical importance to mucus binding, directly affecting retention time of L. rhamnosus
GG in the intestinal tract [68
]. Because of the biological significance, all four genomes analyzed here were screened for the presence of such a pilus cluster. A PSI-Blast over three iterations against the non-redundant database provided by NCBI and a BlastP analysis against the ORFeomes of L. acidophilus
NCFM, Lactobacillus gasseri
ATCC33323, Lactobacillus johnsonii
NCC533, and Lactobacillus plantarum
WCFS1 were carried out. While hits against other L. rhamnosus
and Lactobacillus casei
strains were identified for the spa
ABCDEF cluster, no significant matches were detected against L. acidophilus
NCFM, L. johnsonii
NCC533, L. gasseri
ATCC33323 or L. plantarum
WCFS1—with the exception of hits against L. gasseri
strain 224-1 (spa
B and spa
E) and one weak hit to spa
F in L. plantarum
WCFS1. Similar results were found when the spaABCDEF cluster was compared to the four genomes directly. Only L. plantarum
WCFS1 showed some weak matches, most of them nonspecific hits to surface proteins. Notably, PSI-Blast analyses revealed an extensive number of matches throughout the gene cluster to numerous enterococci, supporting the proposed model of acquisition of this gene cluster via horizontal gene transfer [34].
Directly adjacent to the strain-specific gene cluster LBA251 to LBA255 (Fig. a, II), a three gene operon was identified potentially involved in drug resistance (Lba246 to Lba248). The operon consists of an ABC transport system (ATPase and permease component) and a LytR-type response regulator. Although the ATPase component was annotated as a Daunorubicin resistance protein, alignments of the Daunorubicin (DNR) resistance genes DrrA and DrrB from Streptomyces peucetius
] did not reveal a high degree of similarity between the respective protein sequences (DrrA: 32% identity, 53% positive; DrrB: no similarities detected) and it appears to be more likely for this operon to be involved in other types of drug transport. Only L. johnsonii
NCC533 showed significant homologies to these genes, whereas other LAB merely displayed limited levels of conservation for certain conserved domains.
Oxalic acid is a strong dicarboxylic acid that can cause pathological disorders [49
]. At this point, degradation of oxalic acid in the gastrointestinal tract has been reported to be linked exclusively to the bacterial commensal Oxalobacter formigenes
]. The reaction is carried out by two enzymes, a formyl CoA-transferase (frc
) and an oxalyl CoA-decarboxylase (oxc
). L. acidophilus
NCFM and L. gasseri
ATCC33323, but not the closely related L. johnsonii
NCC533, show highly conserved homologs to frc
(LBA395; LGA130244) and oxc
(LBA396; LGA130245). Flanking genes in L. acidophilus
NCFM constitute a second acyl CoA-transferase (LBA394) and the ATPase component of an ABC transporter (LBA397) (Fig. a, V). Although the genetic organization might indicate an operon-like structure spanning from LBA397 to LBA394, only LBA395 and LBA396 were inducible and expressed upon acid stress [10
]. Functional characterization and gene expression analyses in L. acidophilus
NCFM demonstrated the capability to degrade oxalate, potentially identifying one probiotic role [10
]. Interestingly, both CoA-transferases were grouped into the PFam CoA-transferase family 3, a novel family distinctively different from the two other, previously described families [30
]. Based on gene synteny, functional classification, and the slightly higher GC content of 38.64 and 39.53% in L. acidophilus
NCFM and L. gasseri
ATCC33323 respectively, one might speculate that this region was acquired via horizontal gene transfer (HGT) and then subjected to subsequent deletion events that left only frc
functional. To further investigate the possibility of HGT, we have analyzed dinucleotide frequencies and codon usage for both genes and compared them to the average found in the L. acidophilus
NCFM genome. A chi-squared test was used to determine whether the observed forward strand dinucleotide counts for the genes of interest were similar to the expected counts under the assumption that the genes came from the genome of interest. Results clearly indicate a different dinucleotide frequency when compared to the average genome (chi-squared value: 100.6, degrees of freedom = 15, p
-value = 1.00E-14). Similarly, a chi-squared test was used to compare the codon usage frequencies for the two genes to the overall genome frequencies. Rare codons (expected values <5) were pooled to create a single category before the test was done. Again, results indicate a different codon usage of both genes (chi-squared value of 224.8, degrees of freedom = 46, and p
-value = 2.14E-25). Both the dinucleotide frequencies and codon usage differences support the hypothesis of HGT acquisition.
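The goodness-of-fit comparison described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: the function names, the overlapping forward-strand counting, and the toy sequences in the usage note are assumptions made for the example. Expected counts are obtained by scaling the genome-wide dinucleotide frequencies to the number of dinucleotides in the genes of interest, and rare categories (expected value below 5) can be pooled into a single category, as was done for the codon usage test.

```python
from collections import Counter
from itertools import product

# All 16 dinucleotides over the unambiguous DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_counts(seq):
    """Count overlapping forward-strand dinucleotides; non-ACGT pairs are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return {d: counts.get(d, 0) for d in DINUCLEOTIDES}


def pool_rare(observed, expected, min_expected=5.0):
    """Merge all categories with an expected count below min_expected into one."""
    obs_out, exp_out = [], []
    pooled_obs = pooled_exp = 0.0
    for o, e in zip(observed, expected):
        if e < min_expected:
            pooled_obs += o
            pooled_exp += e
        else:
            obs_out.append(o)
            exp_out.append(e)
    if pooled_exp > 0:
        obs_out.append(pooled_obs)
        exp_out.append(pooled_exp)
    return obs_out, exp_out


def chi_squared(observed, expected):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def dinucleotide_test(gene_seq, genome_seq):
    """Compare gene dinucleotide counts with counts expected from the genome.

    Assumes every dinucleotide occurs at least once in genome_seq, so that
    no expected count is zero.
    """
    gene = dinucleotide_counts(gene_seq)
    genome = dinucleotide_counts(genome_seq)
    n_gene = sum(gene.values())
    n_genome = sum(genome.values())
    observed = [gene[d] for d in DINUCLEOTIDES]
    expected = [n_gene * genome[d] / n_genome for d in DINUCLEOTIDES]
    return chi_squared(observed, expected)
```

With 16 dinucleotide categories the test has 15 degrees of freedom, matching the value reported above. A p-value can then be obtained from the chi-squared survival function, for example with `scipy.stats.chi2.sf(stat, df)` if SciPy is available. For codon usage, the same `chi_squared` function would be applied after passing the observed and expected codon counts through `pool_rare`.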
Carbohydrate metabolism is a central anchor of energy generation in living organisms. Transport and metabolism systems are often specific to certain sugars and rely on dedicated enzymes for processing [35
]. DBA partially highlighted two divergently oriented gene clusters in L. acidophilus
NCFM, LBA1364 to 1367 and LBA1368 to 1369, respectively, likely to be involved in group-specific carbohydrate metabolism (Fig. a, XIII and Fig. , cluster 1 and 2). The first gene cluster consists of three alpha- and beta-galactosidases and a transcriptional regulator of the ROK family, predicted to be involved in galactose metabolism. Although the three structural genes feature many homologs throughout LAB, it is interesting to note that the transcriptional regulator LBA1367 is much less conserved and only L. johnsonii
NCC533 revealed a close homolog (LJO735) within the same gene synteny. The second gene cluster consisted of the cellobiose-specific phosphotransferase component EIIC, cel
B, (LBA1369) and a transcriptional regulator of the ROK family (LBA1368), similar to LBA1367. CelB is one of eight cellobiose-specific PTS EIIC components identified in L. acidophilus
NCFM, responsible for the phosphorylation during translocation of the sugar. Both genes of this cluster show only very limited similarities throughout LAB, again with the exception of L. johnsonii
NCC533 (Ljo733 and 734). It is tempting to speculate that the specific transcriptional regulators found for these two gene clusters are involved in modulating strain-specific degradation of complex sugars and moiety uptake. Based on gene synteny at this locus, it might be noteworthy to highlight a third gene cluster found exclusively in L. johnsonii
NCC533 (Fig. , cluster 3). This L. johnsonii
specific cluster is located immediately downstream of Ljo733 and consists of three ORFs; two phosphatidyl-serine decarboxylases (Ljo730 and 732) and a predicted permease/antiporter (Ljo731). Detailed analysis of the intergenic region between Ljo732 and 733 revealed the presence of a predicted rho-independent terminator (−16.09 kcal/mol) and a highly conserved promoter-like structure, likely to control expression of the gene cluster independently from the upstream transcriptional regulator Ljo734. In silico analyses using PathwayVoyager [5
] and the KEGG database revealed a partial metabolic pathway for aminophospholipids, originating from serine and terminating at phosphatidylethanolamine. Phosphatidylserine decarboxylases (EC 4.1.1.65) mediate the conversion of an activated serine (phosphatidyl-l
-serine) into phosphatidylethanolamine and CO2
. Both decarboxylases identified in this cluster are highly conserved to each other at their respective N and C-termini (75% identity, 86% similarity) and show similarities only to LBA1607 in L. acidophilus
NCFM. However, neither gene synteny nor gene content is conserved in L. acidophilus
NCFM and LBA1607 could not be placed into the same context. No genes could be identified in L. johnsonii
NCC533 that might further convert phosphatidylethanolamine. However, the presence of the predicted permease/antiporter (Ljo731) might point to a different functionality than aminophospholipid metabolism, thus employing an otherwise dead-end metabolic reaction. Decarboxylation reactions have the general potential of raising the internal pH. Hence, the decarboxylation of the activated serine might change the pH, and ethanolamine could subsequently be exported from the cell to avoid accumulation of a dead-end product in the organism. Based on this model, the gene cluster shows a possible involvement in internal pH regulation, unique to L. johnsonii
NCC533. The presence of two decarboxylase genes has been reported for other organisms as well and both enzymes have the capability to complement each other [65
]. Interestingly, the KEGG database does not currently list the required serine-activating phosphatidyltransferase (EC 2.7.8.8) for L. johnsonii
NCC533. However, reverse BlastP using a customized database featuring prokaryotic orthologs to the phosphatidyltransferase listed in the KEGG database and the complete proteome of L. johnsonii
NCC533 as queries revealed the presence of a predicted glycerophosphate phosphatidyltransferase (Ljo838). Further analyses strengthened the functional classification of Ljo838 by disclosing the presence of a CDP-alcohol phosphatidyltransferase domain (PFam 1066) and showing similarities to COG class 558, harboring phosphatidylglycerophosphate synthases. Therefore, Ljo838 might complete this strain-specific pathway, possibly giving L. johnsonii
NCC533 the ability to utilize more substrates for internal pH regulation.
Fig. 2 Schematic representation of a pACT analysis of two homologous genome regions in L. johnsonii NCC533 (upper panel) and L. acidophilus NCFM (lower panel). Genomes are indicated by the black horizontal lines (not to scale). Predicted ORFs are shown as red ...
Other prominently highlighted group-specific regions represent specific transport systems for other carbohydrates and amino acids (e.g. an auxin transporter LBA367, an l-lactate permease LBA1768, an ABC-type multidrug exporter LBA1859, and an amino acid permease LBA1902 (Fig. a, IV, XXVI, XXVIII, and XXIX)). Considering their similar ecological niches, the intestinal lactobacilli may share a large array of similar transport systems, not found in other LAB, reflecting their adaptation to the intestine.
A region likely involved in pentose uptake and conversion shows an intriguing mosaic structure of a gene cluster conserved in L. acidophilus
NCFM and other LAB (red), flanked by two loci conserved only in at least one of the other three lactobacilli of human origin (green) (Fig. a, XVI, XVII, and XVIII). The central region (LBA1481 to 1485) encodes an ABC-type ribose transporter and a ribose kinase, likely activating the sugar moiety for further conversion after uptake [62
]. Immediately downstream of this cluster, an ORF was identified that is likely to encode a cellulase (LBA1480). Only L. johnsonii
NCC533 showed a closely related homolog (LJO1263). This particular cellulase was classified within the glycosyl hydrolase family 5, a widespread group of enzymes that hydrolyze the glycosidic bond between two or more carbohydrates and are known to act on both hexoses (cellulose) and pentoses (xylans) [31
]. While it has been shown that L. acidophilus
NCFM is able to grow on cellobiose and weakly on xylobiose (van Santen and Lahtinen, personal communication), growth on cellulose has not yet been experimentally validated. Upstream of the central cluster, a predicted l
-fucose isomerase was identified (LBA1486) and only L. johnsonii
NCC533 revealed a closely related homolog (LJO1264). However, the gene synteny between the two strains is not preserved. Whereas in L. johnsonii
NCC533 the cellulase and the isomerase genes are located adjacent to each other and appear to be organized in one operon, the two genes flank the sugar ABC transporter (LBA1481–LBA1485) in L. acidophilus
NCFM. More interestingly, l
-arabinose isomerase revealed a very similar domain structure to l
-fucose isomerases. In conjunction with the other genes present in this cluster, a partial pathway for arabinose uptake and utilization could be proposed. The predicted cellulase might be involved in xylan or arabinan degradation; both molecules are essential components of plant and mycobacterial cell walls. Arabinose molecules are then transported into the cell by the ABC-type transporter, predicted to be specific for pentoses. l
-arabinose can then be converted into l
-ribulose by the l
-arabinose isomerase and further activated by the ribose kinase. Sa-Nogueira et al. [58
] reported the utilization of arabinose by Bacillus subtilis
to depend on three enzymes, namely an l
-arabinose isomerase, an l
-ribulokinase, and an l
-ribulose-5-phosphate 4-epimerase. Although the first two enzyme activities were identified in L. acidophilus
NCFM, no 4-epimerase was predicted. However, the presence of a transposase (LBA1487) directly adjacent to the isomerase could indicate a previous genome rearrangement resulting in the loss of the epimerase gene. It might be interesting to investigate this predicted partial pathway by providing the missing enzyme activity in trans and analyzing the organism's potential capability to utilize xylans or arabinans.
On the other hand, there are a number of genome regions conserved between L. acidophilus
NCFM and at least one other member of the LAB database dbLAB that are not, or only weakly conserved in the other three lactobacilli (red). The multiple sugar metabolism (msm) operon (LBA500 to 507) (Fig. a, VII) has been extensively analyzed and described previously [14
]. A similar operon also involved in sugar uptake and metabolism has been identified and was designated msmII (LBA1437 to 1443) [6
] (Fig. a, XV). The enzymatic part (sucrose phosphorylase, LBA1437 and alpha galactosidase, LBA1438) and the energy conversion unit of the ABC transporter (ATPase, LBA1439) of the msmII operon are highly conserved in both the lactobacilli of human origin and many other LAB. However, the substrate-specific genes (permease subunits LBA1440 and 1441 and the sugar-binding component LBA1442) are unique to L. acidophilus
NCFM and two Streptococcus
strains, namely S. mutans
and S. pneumoniae
. The unique uptake and binding proteins may indicate the possibility of varying substrate specificities for these systems.
Pullulanases are involved in the hydrolysis of (1→6)-alpha-D-glucosidic linkages in pullulan, amylopectin, and glycogen. The smallest sugar released by their enzymatic activity is maltose. This enzyme (Lba1710) appears to be unique to L. acidophilus
NCFM among the other lactobacilli (Fig. a, XXIV). A BlastP search against the non-redundant database revealed further similarities to firmicutes, in particular to bacilli. Based on these results, a phylogenetic tree was constructed using the Cobalt multiple sequence alignment tool and the Fast Minimum Evolution algorithm [50
] (data not shown). The closest homologs to Lba1710 were found in Lactobacillus amylolyticus
DSM11664 (beer malt and beer wort), Lactobacillus crispatus
JV-V01 and Lactobacillus iners
DSM 13335 (both are considered part of the normal human microbial flora), whereas bacilli and bifidobacteria exhibited weaker similarities. In particular, pullulanases from bifidobacteria were nearly twice the size of Lba1710, with an extended N-terminal part of the protein and a more conserved C-terminus. Further detailed analyses will be required to examine the differences in functionality between those enzyme groups. A SignalP analysis [22
] revealed the presence of a signal peptide with a predicted cleavage site between position 35 and 36 of the deduced amino acid sequence, indicating an extracellular location and thus further supporting the proposed enzymatic activity. The presence of this unique enzyme in L. acidophilus
NCFM could reflect an adaptation to the nutritional content of the host's GI tract. Amylopectin is a highly branched polymer of glucose found in plants and is one of the two components of starch. Glycogen is a polysaccharide and the principal storage form of glucose in animal cells. Pullulan is a polysaccharide polymer consisting of maltotriose units and is produced from starch by the yeast Aureobasidium pullulans
. Because of its physical and biochemical properties, it is increasingly used in the food industry as a non-digestible carbohydrate [70
]. Hence, all three components are likely to be present in the GI tract as integral parts of a diet. Accordingly, L. acidophilus
NCFM might be able to utilize some or all of these complex carbohydrates. However, in vitro growth experiments have failed to show that NCFM can grow on pullulan (O’Flaherty & Klaenhammer, unpublished data).
As reported earlier, L. acidophilus
NCFM is auxotrophic for many amino acids, including the branched chain amino acids leucine, isoleucine, and valine [6
]. An ABC transport system was identified (Lba1943 to 1946), likely to be specific for uptake of these amino acids (Fig. a, XXX). Interestingly, no other Lactobacillus
of human origin revealed a system with significant amino acid similarities. Only weak similarities were identified to the periplasmic and ATPase components. The substrate-specific permeases, however, were unique to L. acidophilus
NCFM. In contrast, the branched chain amino acid ABC transporter was well conserved throughout LAB, most notably in Lactobacillus delbrueckii
. Both gene synteny and content are highly conserved at the amino acid level. Interestingly, L. delbrueckii
features a high overall GC content of 51.37%, and the ABC transporter exhibits an even higher GC content of 52.26%. L. acidophilus
NCFM on the other hand belongs to the low GC branch of LAB and has an overall GC content of only 34.71%, with the ABC transporter featuring an only slightly higher GC content of 35.44%. This is also reflected by a DNA alignment of both regions that shows only a few regions to be conserved (data not shown). The observation that this system is, apart from L. delbrueckii
, almost exclusively conserved in low to medium GC content LAB, such as E. faecalis
, S. pyogenes
, S. thermophilus
, S. agalactiae
, O. oeni
, or L. mesenteroides
, might indicate an ancient genetic transfer to L. delbrueckii
, with a subsequent GC and codon adaptation, while preserving the secondary structure.
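The GC comparison above amounts to computing the GC percentage of a region and of the whole genome and taking the difference. A minimal sketch follows; the function names are hypothetical and the handling of ambiguous bases (they are simply excluded) is an assumption, since the text does not state how GC content was calculated.

```python
def gc_content(seq):
    """Return the GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


def gc_deviation(region_seq, genome_seq):
    """Return the difference (in percentage points) between region and genome GC content."""
    return gc_content(region_seq) - gc_content(genome_seq)
```

Applied to the values reported here, the transporter region deviates from its genome average by roughly 0.7 percentage points in L. acidophilus NCFM (35.44% versus 34.71%) and roughly 0.9 percentage points in L. delbrueckii (52.26% versus 51.37%), that is, each copy closely matches the base composition of its own host.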
Lactobacillus johnsonii NCC533
As described previously, L. johnsonii
NCC533 and L. gasseri
ATCC33323 share a significant amount of genetic information [55
] and are considered closely related [24
]. Despite this high level of similarity, a number of regions were identified unique to L. johnsonii
NCC533 when compared to dbLB and dbLAB. Most prominent were the two prophages of L. johnsonii
NCC533 (Lj965 ranging from Ljo288 to Ljo330 and Lj928, ranging from Ljo1418 to Ljo1465 [67
]) (Fig. b, VIII and XXII). Both elements were recognized by BlastP and DBA analyses, since some phage genes displayed similarities to LAB phages unrelated to those of lactobacilli (data not shown).
Furthermore, a putative arsenite-efflux transport system was identified to be unique to L. johnsonii NCC533 among LAB (Fig. b, VII). This system is likely to be organized in an operon and consists of an operon repressor ArsR (Ljo230), an arsenite efflux transporter ArsB (Ljo231), and an arsenate reductase ArsC (Ljo232). This operon might provide an efficient detoxification system for arsenate by reducing it to arsenite, which is subsequently exported from the cell. This might reflect an interesting lifestyle adaptation to the increasing concentrations of these substances in the environment (e.g. through the use of metal-containing pesticides), which have been shown to accumulate in higher organisms.
A region unique to L. johnsonii NCC533 and predicted to be involved in exopolysaccharide biosynthesis was also identified (Ljo1707 to Ljo1711) (Fig. b, XXV). In this small gene cluster, four genes (Ljo1707 to Ljo1710) showed similarities to glycosyl transferases of family 8. This family includes enzymes that transfer sugar residues to acceptor molecules. Based on genome context analysis, the gene product of Ljo1711, a 3039 aa protein with an LPXTG membrane anchor and a 10 amino acid repeat structure, might act as the acceptor protein.
A region with strain-specific characteristics, analyzed previously [36
], encodes an EPS cluster. Highly conserved genes (eps
A, B, C, D, E, J, and I) flank a unique core consisting of sugar transferases and polymerases (Fig. b, XVII).
Lastly, a genome region (Ljo1748 to Ljo1755) encoding a potential autonomous unit (PAU) has been identified (Fig. b, XXVII), previously believed to be unique to L. acidophilus
]. PAUs in L. acidophilus
NCFM resembled elements from both bacteriophage and plasmids. Analysis of this region in L. johnsonii
NCC533 revealed a striking similarity in both functional classification and gene synteny to pauLA-I, pauLA-II, and pauLA-III of L. acidophilus
NCFM (Fig. ). Consequently, this region was designated as pauLjo-I. Interestingly, the amino acid similarities of the conserved core genes (fts
A and int
G) were very low to the respective L. acidophilus
NCFM elements. Further analyses (gene ortholog neighborhoods) using the Integrated Microbial Genomes system (IMG) provided by the DOE Joint Genome Institute (http://img.jgi.doe.gov/pub/main.cgi
) revealed the presence of three more genetic elements with significant similarities to the core region of PAUs. These elements comprise pauSaga-I (ORFs2064 to 2075) found in Streptococcus agalactiae
NEM316, and pauLlc-I (scaffold18, ORFs1220 to 1228, of the current draft phase genome) and pauLlc-II (scaffold6, ORFs 3359 to 3364, of the current draft phase genome) found in Lactococcus lactis
SK11 (Fig. ). Gene synteny and functional classification remain highly conserved and little variation was observed in the presence or absence of the two smaller hypothetical core genes. Like L. johnsonii
NCC533, the amino acid similarities of the classified core elements remain very weak and do not cluster consistently among the three proteins (Supplemental Fig. 1). This suggests either divergence from a common ancestor or distinctly different origins of these elements. Adjacent to the core, PAUs displayed genes that might stabilize or maintain the respective PAU in the chromosome. In the case of L. acidophilus
NCFM, the death on curing (doc) system and several restriction/modification (R/M) systems were identified in close proximity to the core genes [6
]. Similarly, pauLlc-II exhibited the presence of an R/M system, an adenine-specific DNA methylase, and a putative abortive infection (abi) system alpha, upstream of FtsK. Interestingly, pauSaga-I exhibited a cadmium export system (CadD and CadX), similar to the one determined in pauLA-I (CadB and CadX). This system consists of a cadmium exporter (CadD family) and a regulatory protein (CadX). Amino acid alignments of CadD and CadB revealed 60% identity (75% similarity), indicating similar transporter functionalities for both gene products. Similarly, both regulatory proteins exhibited 44% sequence identity (67% similarity).
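The identity and similarity percentages quoted above are derived from pairwise amino acid alignments. The following is a minimal sketch of how such values can be computed from two already-aligned sequences. It is not the procedure used in this study: the conservative-substitution groups (`SIMILAR_GROUPS`), the function name, and the choice to divide by the number of alignment columns (gaps included) are all assumptions made for illustration; alignment tools normally derive similarity from a scoring matrix.

```python
# Hypothetical conservative-substitution groups (an assumption for this
# sketch); real tools count a pair as "similar" when its substitution-matrix
# score is positive.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) for two aligned sequences.

    Both strings must have the same length and use '-' for gaps.
    Percentages are relative to the number of alignment columns,
    ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    identical = similar = columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # empty column, not part of the pairwise alignment
        columns += 1
        if a == "-" or b == "-":
            continue  # gap column: counts in the denominator only
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1

    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * identical / columns, 100.0 * similar / columns
```

Because identical residues are also counted as similar, the similarity value is always at least as large as the identity value, which matches the way the two figures are reported together in the text.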
Fig. 3 Potential autonomous units identified in LAB. Alignment of previously described and newly identified PAUs in L. acidophilus NCFM, L. johnsonii NCC533, Lactococcus lactis ssp. cremoris SK11, and Streptococcus agalactiae NEM316. Elements of the core region (more ...)
With the exception of pauLjo-I and pauLlc-I, every investigated PAU harbored genes adjacent to its core that might promote stabilization of the PAU in the chromosome. The discovery of seven different PAUs in four different organisms suggests that PAUs are not a curiosity but represent a new class of potentially mobile elements that might be further classified into distinct families. However, no functional analyses have been performed to date and the in vivo activities remain to be investigated.
DBA analysis highlighted a large number of genes found to be shared between L. acidophilus NCFM, L. gasseri ATCC33323, and L. johnsonii NCC533 but which are not highly conserved within other LAB (green color shading). As described earlier, a significant portion of these regions consists of cell surface proteins mediating host–cell interactions (Ljo46, 47, 48, 484, and 1128), transport systems (Ljo733 and Ljo734), and metabolism islands (Ljo730, 731, 732 and Ljo1263 to Ljo1268) (Fig. b, II, X, XII, XIX, and XXI).
A genome region located at ~60 kbp (Ljo59 to Ljo63) was identified to harbor a partial exopolysaccharide (EPS) biosynthesis and transport cluster (Fig. b, III). This cluster was highly conserved in L. gasseri ATCC33323 and L. acidophilus NCFM and, with significantly lower similarity and synteny, in Lactobacillus bulgaricus. Interestingly, Ljo60 to Ljo62 show similarities to glycosyl transferases of family 8. Despite their chromosomal distance, this gene cluster and the L. johnsonii NCC533–specific cluster (Ljo1707 to Ljo1711) described earlier may act synergistically by providing different moieties to the EPS structure.
Directly adjacent to the EPS cluster, a smaller set of genes was identified, coding for a bile salt hydrolase (Ljo56) and two bile salt transporters (Ljo57 and Ljo58) (Fig. b, II). Not surprisingly, the bile salt hydrolase is highly conserved in L. acidophilus NCFM, L. gasseri ATCC33323, and L. plantarum WCFS1, likely reflecting their common adaptation to the intestinal environment. Other LAB, in particular bifidobacteria, show only a limited degree of similarity to Ljo56. In contrast, both bile salt transporters are only conserved in L. gasseri ATCC33323 and L. johnsonii NCC533. It is not clear if bile salt hydrolases have a wide substrate spectrum, allowing the deconjugation of many different bile salts. In contrast, it is more likely that bile salt transporters are dedicated to certain classes of bile salts.
Despite the exceptional degree of similarity to L. gasseri ATCC33323, several genome regions were identified that were uniquely shared between L. johnsonii NCC533 and other LAB, but not present in any of the other three lactobacilli (red color shading).
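The color shading used throughout (blue for strain-specific regions, green for ORFs shared among the lactobacilli but not highly conserved in other LAB, red for ORFs shared with other LAB but not with the other three lactobacilli) can be thought of as a decision rule applied to the best hit of each ORF in the two custom databases. The following is a minimal sketch of such a rule, not the DBA implementation itself. The function name, the two e-value cut-offs, the `"unshaded"` fallback, and the assumption that hits are split into "other three lactobacilli" and "other LAB" are all illustrative choices.

```python
def classify_orf(evalue_lactobacilli, evalue_other_lab,
                 strong=1e-100, weak=1e-10):
    """Assign a shading class to one ORF from its best BlastP e-values.

    evalue_lactobacilli: best e-value against the other three lactobacilli,
        or None if no hit was reported.
    evalue_other_lab: best e-value against other LAB, or None if no hit.
    strong / weak: hypothetical cut-offs for "highly conserved" and "absent".
    """
    def conserved(evalue):
        return evalue is not None and evalue <= strong

    def absent(evalue):
        return evalue is None or evalue > weak

    if absent(evalue_lactobacilli) and absent(evalue_other_lab):
        return "blue"    # no meaningful hit in either group
    if conserved(evalue_lactobacilli) and not conserved(evalue_other_lab):
        return "green"   # shared among lactobacilli, weaker elsewhere
    if conserved(evalue_other_lab) and absent(evalue_lactobacilli):
        return "red"     # shared with other LAB, missing in lactobacilli
    return "unshaded"    # conserved everywhere or intermediate similarity
```

For example, an ORF with a best hit of 1e-150 among the other lactobacilli and only 1e-20 among other LAB would fall into the green class under these assumed cut-offs.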
Interestingly, a genome region close to the origin of replication was originally annotated to harbor an enzyme involved in thiamine biosynthesis (Ljo20, 21, 22) (Fig. b, I). However, when analyzed in context, these ORFs are likely to encode different functionalities. Ljo20 was annotated as a transcriptional regulator and is likely to control expression of Ljo21 and Ljo22. The gene product of Ljo21 shows significant similarities to COG0476. This cluster represents dinucleotide-utilizing enzymes involved in molybdopterin and thiamine biosynthesis, as indicated by the original annotation. The presence of Ljo22, a predicted efflux transporter, rendered the initial functional classification of Ljo21 questionable. Further analyses of Ljo21 and Ljo22 revealed similarities to the microcin C51 production gene mcc
B in Escherichia coli
] and the microcin C7 secretion protein mcc
C in Helicobacter pylori
], respectively. Interestingly, the identified similarity of MccB to COG0476 has been described for both MccB of microcin C7 [26
] and C51 [23
]. It has been proposed that the adenylation conferred by MccB plays a role in the substitution of the C-terminus of the heptapeptide by AMP [23
] and is not involved in molybdopterin or thiamine biosynthesis. MccC has been shown to provide partial immunity, complemented by MccE. However, in L. johnsonii
NCC533, the microcin operon appears to be only partially conserved, as mcc
A, the gene coding for the heptapeptide moiety, mcc
D, and mcc
E (immunity) are absent from the operon. Furthermore, the presence of the divergently oriented transcriptional regulator Ljo20 does not comply with the reported operon structure of microcins. The low GC content of 25.78% found for this region might indicate horizontal gene transfer (HGT) as the source of acquisition. This is further supported by the absence of this gene cluster in all other lactobacilli present in dbLB. Analysis of dbLAB revealed homologous proteins only in Streptococcus thermophilus
CNRZ1066 (CP000024: ORF 1944, mcc
B and 1943, pmr
B). A similarly low GC content also indicates HGT as a possible source of acquisition; however, the predicted transcriptional regulator (ORF 1947) is separated from the proposed mcc
B and pmr
B by two predicted transposase genes (ORFs1945 and 1946). It might be speculated that this genome region represents an acquired resistance mechanism to microcin-like substances giving it a strain-specific competitive advantage. Alternatively, the gene cluster could resemble the remnants of a bacteriocin-producing operon, similar in structure to microcins.
An uptake system found in L. johnsonii NCC533, but not in any other Lactobacillus, consisted of a PTS sugar transporter (predicted mannose specificity) and a two-component regulatory system (2CRS) sensor (Ljo1652 to 1660) (Fig. b, XXIII). Ljo1653 to Ljo1656 represent PTS components A to D. These genes are relatively highly conserved in Enterococcus faecalis, Lactobacillus casei, Streptococcus mutans, and Leuconostoc mesenteroides, with decreasing levels of similarity from PTS EIID to EIIA. Located in the same operon-like gene cluster, Ljo1652 was found to share similarities to membrane proteins and permeases, potentially acting as an accessory protein to the PTS system. Homologous genes were identified in L. casei, S. mutans, and L. mesenteroides. Of the other lactobacilli in dbLB, only L. acidophilus NCFM showed very weak levels of similarity to this gene cluster (except for Ljo1652), indicating only a general functional similarity that is unlikely to cover the same substrate spectrum. This uptake system might give L. johnsonii NCC533 the advantage of using additional sugars as carbon and energy sources in shared ecosystems. Genes likely to be involved in gene regulation were oriented in the opposite direction (Ljo1657 to Ljo1660) (Fig. b, XXIII). Ljo1658 and 1659 encoded a predicted 2CRS, with similarities of the histidine kinase indicating sugar specificity (COG2851). Interestingly, the flanking genes Ljo1657 and 1660 revealed similarities to periplasmic sugar-binding proteins. In addition, Ljo1657 revealed conserved domains indicating a possible role as primary receptor for sugar transport and transcriptional repressor (PFam00532 and COG1609). These four genes do not feature homologs in the other three lactobacilli and the closest relatives were found in streptococci.
Lastly, a gene likely to encode a cell-envelope-associated proteinase was identified in L. johnsonii
NCC533 (Ljo1840) (Fig. b, XXXI). Cell-envelope proteinases (CEP) facilitate the proteolysis of casein in lactic acid bacteria [53
] and are often essential for growth in milk [25
]. Furthermore, CEPs were also associated with periodontal disease in Bacteroides forsythus
]. Most CEPs require the presence of chaperones for maturation (lactococcal PrtP [69
] and PrtH from Lactobacillus helveticus
]), whereas others are capable of an autocatalytic process that substitutes the chaperone protein PrtM with an intramolecular chaperone (PrtB of L. delbrueckii
]). The identified CEP of L. johnsonii
NCC533 is a unique gene among lactobacilli of human origins. Only L. acidophilus
NCFM harbored a weakly conserved homolog (Lba1512). However, as described earlier, the homologs are present and conserved in other LAB, further implicating a key role in extracellular protein processing. Most notably, several lactobacilli (L. casei
, L. bulgaricus)
and L. lactis
showed conserved homologs of PrtH. A phylogenetic tree featuring previously described and predicted cell-envelope proteinases from several lactobacilli and lactococci revealed a clear clustering of PrtH from L. helveticus
and CEP of L. johnsonii
NCC533 (Fig. ), likely positioning Ljo1840 into the same functional group (Group I) as PrtH, as was noted previously [55
]. This is further supported by the presence of the divergently oriented maturation protein PrtM (Ljo1841). However, two of the five reported substrate-binding regions differ significantly in amino acid composition in Ljo1840, indicating a different specificity than previously reported CEPs. Interestingly, PrtP of L. acidophilus
NCFM is the phylogenetically most divergent protein included in the analysis, although many conserved regions remain intact. In contrast to Ljo1840, no adjacent maturation protein could be identified; the closest chaperone-like protein was found 70 kb downstream (PrtM, Lba1588) and it is unknown whether PrtM is involved in PrtH processing.
Fig. 4 Phylogenetic tree of cell-envelope proteinases (CEP) of selected LAB. Deduced amino acid sequences of CEPs identified in other LAB were aligned using ClustalW, and the phylogenetic tree was calculated and visualized using Mega3.1. The evolutionary (more ...)
Lactobacillus gasseri ATCC33323
Originally isolated from the human GI tract, L. gasseri
ATCC33323 occupies a similar ecological niche as L. acidophilus
NCFM and L. plantarum
]. However, overall genome synteny and similarity were found to be significantly more conserved between L. gasseri
ATCC33323 and L. johnsonii
NCC533, than between L. gasseri
ATCC33323 and the other two lactobacilli (Fig. ). More than 50% of the predicted ORFs in L. gasseri
ATCC33323 share similarities to L. johnsonii
NCC533 ORFs at a level of 1e-100 and below (Fig. a). This likeness is not surprising as phylogenetically these two species are the most closely related of the members of the L. acidophilus
NCFM complex. However, it is noted that the two species are often isolated from distinctly different hosts: L. gasseri
ATCC33323 from humans and L. johnsonii
NCC533 from the crop of chickens [2
]. These are distinctive environments with unique metabolic challenges. The L. johnsonii
NCC533 strain that was sequenced and used in this analysis, however, was isolated from a human [55
Fig. 5 Protein similarity and genome synteny of four Lactobacillus genomes. pACT was used to simultaneously analyze four complete genomes on amino acid level. Double lines featuring pointed boxes indicate predicted ORFs in their respective orientation. Red and (more ...)
Fig. 6 Differences in e-value distribution between the four lactobacilli of human origin when compared to the two custom databases dbLB and dbLAB. Based on the best BlastP hit, similarities were grouped within the assigned e-value ranges. Solid bars represent (more ...)
A significant percentage of ORFs are very similar between L. gasseri ATCC33323, L. johnsonii NCC533, and L. acidophilus NCFM and only to a lesser degree conserved in other LAB (green). Among the most dominant features identified are exoproteins like mucus- and fibrinogen-binding proteins (Lga130040, 130041, 130042, 130139, 130140, 131623, and 131641), as described earlier (Fig. c, I, III, XXII, and XXIII). One of the largest proteins in the genome, a specific mucus-binding protein (Lga130041), was found in L. johnsonii NCC533 (Ljo0047) and L. acidophilus NCFM (Lba1019 and 1020), but was absent in L. plantarum WCFS1. Homologs in L. gasseri ATCC33323 and L. johnsonii NCC533 were located close to the origin of replication, whereas the corresponding ORF in L. acidophilus NCFM was found close to the terminus of replication. No evidence of apparent genome rearrangements was observed. On the contrary, the region close to the origin shows a relatively high degree of synteny between all three genomes (Fig. ).
Adjacent to this surface protein, a bile salt export system was identified (Fig. c, II). This system consists of a bile salt hydrolase and two permeases. As described for the corresponding system in L. johnsonii NCC533, closely related genes to this export system can only be found in these two strains. Interestingly, one of the permease genes (Lga130054) appears to be truncated by a premature stop codon and its integrity remains to be verified.
Furthermore, a tagatose-diphosphate aldolase was identified (Lga130142), present only in L. johnsonii NCC533, L. gasseri ATCC33323 and, to a lesser degree, in certain LAB (Fig. c, III). This enzyme is able to convert tagatose-1,6-diphosphate into glycerone phosphate and glyceraldehyde-3-phosphate. Both metabolites can be directly fed into glycolysis for further conversions. Although all genes necessary for lactose uptake and conversion are present in the L. gasseri ATCC33323 genome (data not shown), the presence of a sugar phosphate permease (Lga130141) directly adjacent to the aldolase could indicate an additional tagatose-specific sugar uptake and metabolism system, not present in L. acidophilus NCFM and L. plantarum WCFS1. Tagatose is widely used as a commercial low-calorie sweetener in soft drinks and fruit juices but is otherwise present in only small amounts in dairy products. The increasing consumption of low-calorie foods and the resulting rising presence of tagatose in the GIT might represent a new source of energy, promoting the growth of L. gasseri ATCC33323 and L. johnsonii NCC533.
A gene pair that is conserved in L. gasseri ATCC33323 and L. acidophilus NCFM, but not in any other LAB, consists of an oxalyl CoA-decarboxylase (Lga130245) and a formyl CoA-transferase (Lga130244) (Fig. c, IV). As described for L. acidophilus NCFM above, this system is likely involved in oxalic acid degradation.
The synthesis and modification of bacterial cell walls and the incorporated teichoic acids require the coordinated interaction of many different enzyme groups. The most predominant group is composed of glycosyltransferases, catalyzing the transfer of sugar moieties from activated donor molecules to specific acceptor molecules. A group of three genes (Lga131543 to 131545), presumably organized in an operon-like structure, has been identified in L. gasseri ATCC33323 that is likely involved in teichoic acid biosynthesis (Fig. c, XIX). All three genes were predicted to be glycosyltransferases and classified into separate families. Lga131544 represents a glycosyltransferase of group 1 which is highly conserved in L. acidophilus NCFM (Lba520), L. johnsonii NCC533 (Ljo1736), and many other LAB. However, close homologs cannot be found in L. plantarum WCFS1. A glycosyltransferase of the WecB/TagA/CpsF family (Lga131543) might be involved in the synthesis of cell wall polymers and was highly conserved in many LAB, including the four lactobacilli. Notably, the last gene of this operon-like gene cluster, Lga131543, a glycosyltransferase of group 2 that transfers sugar moieties to teichoic acid, is unique to L. gasseri ATCC33323 and L. johnsonii NCC533. A comparison to other LAB revealed that this region appears to be highly variable. L. acidophilus NCFM only harbors two genes (Lba519 and 520), L. johnsonii NCC533 and L. gasseri ATCC33323 each feature a three-gene cluster, whereas the region in L. bulgaricus consists of five genes (ORFs 111644 to 111648). One could speculate that the components of the cell wall synthesized by the proteins encoded in this operon harbor a conserved core that then features strain-specific entities, possibly altering cell surface properties and, consequently, immunological responses.
Despite the predominant similarity to L. johnsonii NCC533, several genome regions were revealed in L. gasseri ATCC33323 that exhibited more significant similarities to other LAB (red).
Two loci, separated by ~40 kb, each harbored a carbohydrate transport system (Fig. c, V and VI). The first region (Lga130338 to Lga130341) consisted of four genes, composing a lactose-specific ABC transporter. LacE (Lga130340), representing the sugar-specific permease component, featured only very weak homologs in the other three lactobacilli. Similarly, LacG (Lga130341), a 6-phospho-galactosidase, was only weakly conserved in this dataset, with homologs displaying only certain conserved domains. However, highly conserved homologs were identified in several streptococcal strains. The second locus (Lga130393 to 130395) also represents an ABC transporter. However, the permease (Lga130395) and sugar-specific hydrolase (Lga130396) feature relatively close homologs in both databases. Interestingly, BglX (Lga130394), a beta-glucosidase and the first structural gene of this operon-like structure, cannot be found in the three other lactobacilli but features close homologs in several E. faecalis
and other LAB strains. Glycoside hydrolases of this specific family are known to accept larger carbohydrates as substrates and release beta-D-glucosides. In E. coli
BglX homologs were found to be located in the periplasm or the cytoplasm [71
], and the absence of a signal peptide sequence in Lga130394 could indicate a cytoplasmic location. Substrate specificities for this predicted carbohydrate transporter remain unresolved, and the presence of a BglX homolog might point to a different, more complex substrate for Lga130395 than glucose or maltose.
Another sugar metabolism and uptake system is located at ~500 kb (Fig. c, VII). It mainly consists of two PTS systems, likely to be specific for galactitol and cellobiose, respectively, and an adjacent operon featuring both subunits of a galactose isomerase. A PTS system consisting of the three genes gatA, gatB, and gatC (Lga130491 to 130494) is likely to be specific for galactose uptake. In succession, the adjacent isomerase operon (Lga130487 to 130489), induced by the presence of galactose, may catalyze the conversion of D-galactose 6-phosphate to D-tagatose 6-phosphate in the tagatose 6-phosphate pathway of galactose catabolism. Both systems are weakly conserved or not present in the other three Lactobacillus strains, but can be readily found in many other LAB. The second PTS system identified in this genome region (Lga130496 to 130499) is likely to be specific for lactose or cellobiose. Similar to the other PTS system, significant similarities can be found almost exclusively outside the Lactobacillus group.
Centered on a phage-related integrase (Lga130904), a type I restriction/modification (R/M) system (Lga130902 to 130906) was identified via DBA, not found in this form in the other three lactobacilli of human origin (Fig. c, X). The main function of an R/M system lies in the protection of the bacterial organism against introduced foreign DNA. The foreign DNA is degraded by endonucleolytic cleavage, whereas the host's DNA is protected by a system-specific pattern of modifications. The R/M system consists of three different components: a DNA methylase (family M), a target recognition domain (family S), and a restriction unit (R), present in varying quantities. Here, one N-6 adenine-specific DNA methylase (Lga130902) mediates the methylation of specific DNA sequences. Two target recognition peptides (Lga130903 and 130905) then interact either with the DNA methylase to protect the host DNA or with the restriction enzyme (Lga130906) to degrade unprotected DNA. Interestingly, a DEAD/H box helicase (Lga130900) was also present in the gene cluster and showed close homologs only within the LAB group. This helicase might play a role in unwinding the DNA prior to its methylation. This gene complex also features a sharp decline in GC content (29.9%), possibly indicating an acquisition via horizontal gene transfer. Alternatively, the region could represent mobile DNA or a remnant thereof, as indicated by the presence of the integrase gene. It might be interesting to investigate the range of protection this system may provide against foreign DNA, including DNA introduced by bacteriophage or electroporation.
A 5′-nucleotidase (Lga131078) was identified that is highly conserved in LAB, but not found in L. acidophilus NCFM, L. johnsonii NCC533, or L. plantarum WCFS1 (Fig. c, XIII). This gene is partly annotated as a secreted or peptidoglycan-bound enzyme. There, it might function by degrading free DNA molecules and providing nucleotides for ready uptake into the cell.
Threonine dehydratase (Lga131436) usually mediates the conversion of l-threonine to 2-oxobutanoate and NH3 (Fig. c, XVI). Although L. gasseri ATCC33323 was predicted to harbor the complete pathway to convert l-aspartate into l-threonine (data not shown), the subsequent pathway to further utilize 2-oxobutanoate in the valine, leucine, and isoleucine metabolism appears to be absent. However, this enzyme is also able to act as an l-serine ammonia-lyase, mediating the conversion from l-serine to pyruvate and NH3. Lga131436 is not conserved in any of the remaining Lactobacillus strains of human origin and only E. faecalis strain V583 revealed a protein with some degree of similarity. Interestingly, Lactobacillus strains do harbor an l-serine dehydratase, mediating the same reaction. This enzyme consists of two subunits and does not show any sequence similarities to Lga131436. Furthermore, the presence of an adjacent permease (Lga131437) and a transcriptional regulator (Lga131438) could indicate a different metabolic role for this operon-like structure in L. gasseri ATCC33323. One could speculate that the permease, upon induced expression, would transport l-serine into the cell, where it is converted into pyruvate by the serine lyase. Pyruvate is then further metabolized to create ATP. This model would suggest a unique energy-gaining mechanism for L. gasseri ATCC33323 dependent on the presence of free extracellular serine.
A second restriction/modification system (Lga133002 to 131482) largely unique to L. gasseri ATCC33323 was identified at ~1.45 Mbp (Fig. c, XVII). In contrast to the previously described type I system, this one is likely to represent a type III R/M system, consisting of a chromosome segregation ATPase (Lga131480), two DNA methylases (Lga131477 and 131478), a type III restriction endonuclease (Lga131476), and a DNA helicase (Lga131474). This system does not feature any homologs in the other three lactobacilli, and it might provide a unique protection for L. gasseri ATCC33323 on the nucleotide level. Although single components are conserved in other LAB, most notably Pediococcus pentosaceus and Lactobacillus brevis, no organism represented in either database appears to harbor a complete homologous protection system.
Adjacent to the unique R/M system, a high-GC region harbors a phage remnant (the terminase, structural module, and lysis module are partially present), oriented against the main coding direction (Fig. c, XVII). Interestingly, this phage remnant also appears to feature a DNA helicase which additionally exhibits a type III endonuclease domain (Lga131490) and a DNA methylase (Lga131492). It is unknown whether this system might interact with other R/M systems or if it represents a phage-specific system.
One of the most distinctive features of L. gasseri ATCC33323 is the presence of a tandem phage, exactly duplicated in the genome. These two phages (Lga130573 to 130635 and Lga130636 to 130698), located at ~600 kbp, appear to be genetically complete (Fig. c, IX). Both phages are identical at the nucleotide level and are directly adjacent to each other, with no other intermediate genes present (publication pending). At this point, the occurrence of a tandem phage represents a novel genome structure, and further experiments are required to investigate the functionality of these phages and the impact of the tandem organization on their life cycle.
A second L. gasseri ATCC33323-specific genome region was identified close to the terminus of DNA replication. A set of ORFs (Lga130942 to 130947) consists mostly of mucus-binding proteins, which could indicate strain-specific cell-binding or adhesion properties (Fig. c, XI).
Lactobacillus plantarum WCFS1
WCFS1 is by far the largest Lactobacillus
genome. With more than 3050 predicted ORFs, it is approximately 50% larger than the other three genomes. This is, in part, reflected by the large number of L. plantarum
specific genes (Fig. a), when compared to genes similar to other LAB in dbLAB (Fig. b). More than 60% of the ORFeome showed BlastP hits at 1e-10 and above. However, less than 15% of its genome shares similarities to the other three lactobacilli at a level of 1e-100 or below. In contrast, approximately 40% of the predicted ORFs of L. acidophilus
NCFM reside within this similarity range. As previously pointed out by Boekhorst et al. [15
], L. plantarum
WCFS1 does not appear to be significantly related to either one of the other lactobacilli and in particular to L. johnsonii
]. Figure further illustrates these differences by highlighting the complete lack of genome synteny found between L. plantarum
WCFS1 and the other three genomes.
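The percentages given above summarize how the best BlastP hit of each ORF is distributed over e-value ranges. The following is a minimal sketch of such a binning step, assuming a list with one best e-value per ORF (`None` where no hit was reported). The function name and the intermediate boundary of 1e-50 are assumptions for illustration; only the 1e-100 and 1e-10 boundaries are mentioned in the text.

```python
def evalue_distribution(best_evalues, edges=(1e-100, 1e-50, 1e-10)):
    """Return the percentage of ORFs falling into each e-value range.

    best_evalues: iterable with the best BlastP e-value per ORF, or None
        when an ORF had no hit at all.
    edges: ascending upper bounds of the ranges; anything above the last
        edge (or without a hit) goes into the final range.
    """
    best_evalues = list(best_evalues)
    if not best_evalues:
        raise ValueError("no ORFs supplied")

    labels = [f"<= {edges[0]:g}"]
    labels += [f"{low:g} to {high:g}" for low, high in zip(edges, edges[1:])]
    labels.append(f"> {edges[-1]:g} or no hit")

    counts = dict.fromkeys(labels, 0)
    for evalue in best_evalues:
        index = len(edges)  # default: weakest range
        if evalue is not None:
            for i, edge in enumerate(edges):
                if evalue <= edge:
                    index = i
                    break
        counts[labels[index]] += 1

    total = len(best_evalues)
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Applied separately to the hits against dbLB and dbLAB, a table of this kind would correspond to the paired bars described for the e-value distribution figure.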
Not surprisingly, most of the results obtained through DBA analysis represent genome regions conserved in L. plantarum
WCFS1 and other LAB, but which are not present in the other three lactobacilli analyzed (red). These include the previously described nonribosomal peptide synthesis module (Lp0578 to 584) [39
] which could not be identified within the other three lactobacilli but reveals close homologs in Bacillus subtilis
and other LAB (Fig. d, IV). Interestingly, this gene cluster displays a significantly lower average GC content (35.9%) than the overall genome (45.2%), possibly indicating gene acquisition by horizontal gene transfer.
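Deviations in GC content such as the one noted above are typically found by comparing the GC content of a sliding window with the genome-wide average. The following is a minimal sketch of that comparison, not the method used in this study: the function names, window size, step size, and the 5-percentage-point threshold are all assumptions chosen for illustration.

```python
def gc_content(sequence):
    """Return the GC content in percent, ignoring non-ACGT characters."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / acgt


def low_gc_windows(genome, window=1000, step=500, drop=5.0):
    """List windows whose GC content lies at least `drop` percentage
    points below the genome average.

    Returns tuples of (start, end, window_gc) with 0-based, end-exclusive
    coordinates. Window size, step and threshold are hypothetical defaults.
    """
    genome_gc = gc_content(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        segment = genome[start:start + window].upper()
        if not any(base in segment for base in "ACGT"):
            continue  # skip windows made up entirely of ambiguous bases
        window_gc = gc_content(segment)
        if genome_gc - window_gc >= drop:
            hits.append((start, start + window, window_gc))
    return hits
```

A low-GC window is only an indication of possible horizontal acquisition; as in the text, it would need to be weighed together with gene content and the distribution of homologs.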
The respiratory nitrate reductase (Lp1498 to 1501) consists of three subunits (NarG, NarH, and NarI) and reduces nitrate to nitrite under anaerobic conditions [39
] (Fig. d, VIII). This forms a redox loop that in turn aids in generating the proton motive force of the organism. Additionally, a chaperone (NarJ) might be present to aid in protein maturation and assembly. This complex was not found in the other lactobacilli and comparison to other LAB revealed an interesting pattern of similarities for the different subunits. The alpha and beta chains NarG (Lp1498) and NarH (Lp1499), respectively, are highly conserved in Bacillus subtilis
168, but do not share homologs in other LAB strains. In contrast, the gamma chain NarI (Lp1502) and the chaperone NarJ (Lp1501) are only weakly conserved in B. subtilis
168, whereas the gene synteny is still maintained. Although no significant changes in GC content could be observed for this gene cluster (47.6%), two low GC spikes upstream of Lp1482 and downstream of Lp1503 identify potential hotspots for genome rearrangements. Interestingly, the genes enclosed by the two spikes comprise not only those of the nar
GHJI cluster, but also those of the molybdopterin biosynthesis cluster moe
B (Lp1496), moa
B (Lp1495), moe
A (Lp1494), mob
A (Lp1493), moa
C (Lp1492), and mob
A (Lp1491). This cluster, divergently oriented to the nar
GHJI operon, could provide the co-factor required for the nitrate reductase alpha chain, NarG. Notably, adjacent to the proximal low GC spike, the nitrite extrusion protein NarK (Lp1481) and the molybdopterin biosynthesis proteins MoaA (Lp1480), MoaD (Lp1479), and MoaE (Lp1478) were identified, further strengthening the hypothesis of an ancient genome insertion (see also http://www.cmbi.ru.nl/plantarum/supplementary/supplementary_text.html#molyb
for more information on this genome region).
Another low-GC region (Lp3129 to 3138) featuring a sugar uptake (specific for glucose or maltose) and metabolism system was identified, which was not present in the other three Lactobacillus strains (Fig. d, XIV). However, it is likely that similar systems for these essential sugars are widespread throughout all LAB. The previously described low-GC sugar metabolism island [39
] (genome position ~3.1 to ~3.25 Mbp) also shares more similarities to other LAB than to the other three lactobacilli (Fig. d, XV).
Despite the low level of protein similarities to the other lactobacilli of human origin, some genome regions were identified that are shared between the four lactobacilli (green). A fumarate reductase (Lp55), predicted to mediate the conversion from fumarate to succinate in the citrate cycle, was highly conserved in L. gasseri ATCC33323 (Lga130908) (Fig. d, I). Other LAB only share the N-terminal part of this enzyme. FMN reductases are members of the flavoprotein clan, whose protein families have arisen from a single evolutionary origin. Interestingly, a second predicted fumarate reductase (Lp952) was also identified, sharing extensive similarities with the previously described ORF Lp55 in L. plantarum WCFS1, FrdA (Lga130908) from L. gasseri ATCC33323, and other homologs within the Lactobacillus group (Fig. d, VI). Because different functionalities are so closely related, the exact substrate specificity might be difficult to predict, and further in vitro characterization is required to determine the true function of these proteins.
A predicted ATPase (Lp1520) that might be related to the helicase subunit of the Holliday junction resolvase is highly conserved in all four lactobacilli, but only to a much lesser extent in other LAB (Fig. d, IX). Whether this enzyme will act as a resolvase on unresolved Holliday junctions or is otherwise involved in DNA repair or modification remains to be verified.
Finally, a cation antiporter (Lp2674) was found to be highly conserved only within the four lactobacilli (Fig. d, XIII). Na+/H+ antiporters of the NhaP-type are important for maintaining the pH of metabolizing cells and may promote survival during gastro-intestinal passage. Interestingly, an efflux permease likely to be specific for arabinose (Lp2675) was identified immediately upstream of the antiporter. These proteins are also partially annotated as an H+ antiporter, possibly indicating a functional link between the two proteins and providing an alternative model. One could speculate that the import of arabinose, at the cost of exporting protons, might lead to changes in the internal pH. These would then be compensated by the cation antiporter, thus maintaining a physiological state within the cell.