General features of the B. licheniformis genome
The genome of B. licheniformis ATCC 14580 consists of a circular chromosome of 4,222,336 base pairs (bp) with an average G+C content of 46.2% (Table 1). No plasmids were found during the genome analysis, and none were detected by agarose gel electrophoresis (data not shown). Using a combination of several gene-finding programs and manual inspection, 4,208 protein-coding sequences (CDSs) were predicted. These CDSs constitute 87% of the genome and have an average length of 873 bp (ranging from 78 to 10,767 bp). They are oriented on the chromosome primarily in the direction of replication (Figure ), with 74.4% of the genes on the leading strand and 25.6% on the lagging strand. Among the 4,208 protein-coding genes, 3,948 (94%) have significant similarity to proteins in PIR, 3,187 (76%) contain Interpro motifs, and 2,895 (69%) contain protein motifs found in PFAM. The number of hypothetical and conserved hypothetical proteins in the B. licheniformis genome with hits in the PIR database was 1,318 (including 212 conserved hypothetical proteins). Of these hypothetical and conserved hypothetical gene products, 683 (52%) have protein motifs contained in PFAM (including 148 conserved hypothetical proteins). There are seven rRNA operons and 72 tRNA genes, the latter representing all 20 amino acids.
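The 87% coding fraction can be checked directly from the CDS count, mean CDS length and chromosome size quoted above. The following is a minimal sketch; the function name is illustrative, and the calculation assumes non-overlapping CDSs, which would slightly overestimate coverage if any genes overlap.

```python
def coding_fraction(n_cds: int, mean_cds_bp: float, genome_bp: int) -> float:
    """Fraction of the chromosome covered by CDSs, assuming no overlaps."""
    return n_cds * mean_cds_bp / genome_bp


# Values quoted in the text for B. licheniformis ATCC 14580.
fraction = coding_fraction(n_cds=4208, mean_cds_bp=873, genome_bp=4_222_336)
print(f"coding fraction: {fraction:.1%}")
```

Under these assumptions the result agrees with the reported value to within rounding.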
| Table 1. Features of the B. licheniformis genome and comparison with genomes of other Bacillus species |
The likely origin of replication (Figure ) was identified by similarities to several features of the corresponding regions in B. subtilis and other bacteria. These included co-localization of four genes (rpmH, dnaA, dnaN, and recF) near the origin, GC nucleotide skew ((G-C)/(G+C)) analysis, and the presence of multiple dnaA-boxes and AT-rich sequences immediately upstream of the dnaA gene [10-12]. On the basis of these observations we assigned a cytosine residue of the BstBI restriction site between the rpmH and dnaA genes to be the first nucleotide of the B. licheniformis genome. The replication termination site was localized near 2.02 megabases (Mb) by GC skew analysis. This region lies roughly opposite the origin of replication (Figure ). Unlike in B. subtilis, no gene encoding a replication terminator protein (rtp) was apparent in B. licheniformis. The Bacillus halodurans genome also lacks an obvious rtp function [13]; therefore, it seems likely that B. subtilis acquired the rtp gene following its divergence from B. halodurans and B. licheniformis.
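The GC skew analysis used to place the origin and terminus can be illustrated with a short sketch. It is not the authors' pipeline: the window size, function names and toy sequence are assumptions. For a sequence numbered from the origin, the skew (G-C)/(G+C) tends to be positive along the first replichore and negative along the second, so the cumulative skew peaks near the terminus.

```python
def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) for one stretch of sequence; 0.0 if it has no G or C."""
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_skew(seq: str, window: int) -> list:
    """GC skew in consecutive, non-overlapping windows."""
    seq = seq.upper()
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def cumulative_peak(skews: list) -> int:
    """Index of the window at which the cumulative skew is maximal."""
    total, best_total, best_idx = 0.0, float("-inf"), -1
    for idx, value in enumerate(skews):
        total += value
        if total > best_total:
            best_total, best_idx = total, idx
    return best_idx


# Toy "chromosome": G-rich first half, C-rich second half (hypothetical data).
toy = "GGGCAT" * 50 + "CCCGAT" * 50
window = 60
skews = windowed_skew(toy, window)
peak = cumulative_peak(skews)
print("skews:", skews)
print("cumulative skew peaks at the end of position", (peak + 1) * window)
```

On a real chromosome the windows would span kilobases, and the skew signal is noisier than in this toy case, so the peak only localizes the terminus approximately.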
Transposable elements, prophages and atypical regions
The genome of B. licheniformis ATCC 14580 contains nine identical copies of a 1,285-bp insertion sequence element termed IS3Bli1 [9]. This sequence shares a number of features with other IS3 family elements [9] including direct repeats of 3-5 bp, a 10-bp left inverted repeat, and a 9-bp right inverted repeat (Figure ). IS3Bli1 encodes two predicted overlapping CDSs, designated orfA and orfB, in relative translational reading frames of 0 and -1. The presence of a 'slippery heptamer' motif, AAAAAAG, before the stop codon in orfA may indicate that programmed translational frameshifting occurs between these two coding sequences, resulting in a single gene product [14]. The orfB gene product harbors the DD(35)E(7)K motif, a highly conserved pattern among insertion sequences. Eight of these IS3Bli1 elements lie in intergenic regions, and one interrupts the comP gene as noted previously [9]. In addition to these insertion sequences, the genome encodes a putative transposase that is most closely related (E = 1.8 × 10⁻¹¹) to one identified in the Thermoanaerobacter tengcongensis genome [15]; however, similar transposase genes are also found in the chromosomes of B. halodurans [13], Oceanobacillus iheyensis [16], Streptococcus agalactiae [17] and Streptococcus pyogenes [18].
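A simple scan for the AAAAAAG heptamer illustrates how a candidate frameshift site can be located relative to the stop codon of an upstream reading frame. The sequence below is a made-up toy ORF, not IS3Bli1, and the helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_motif(seq: str, motif: str = "AAAAAAG") -> list:
    """Return all 0-based start positions of motif (overlapping hits allowed)."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def first_stop_in_frame(seq: str, frame: int = 0):
    """Return the 0-based position of the first stop codon in the given frame."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical ORF: ATG GCT GCT AAA AAA GGC TGA, followed by two extra bases.
toy_orf = "ATGGCTGCTAAAAAAGGCTGACC"
hits = find_motif(toy_orf)
stop = first_stop_in_frame(toy_orf)
upstream_hits = [p for p in hits if stop is not None and p < stop]
print("heptamer starts:", hits, "| first in-frame stop:", stop)
```

A heptamer lying upstream of the stop codon, as in this toy case, is only a candidate site; whether frameshifting actually occurs has to be shown experimentally.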
The presence of several bacteriophage lysogens or prophage-like elements was revealed by Smith-Waterman comparisons to other bacterial genomes and by their AT-rich signatures (Figure , Table ). Prophage sequences, designated NZP1 and NZP3 (similar to B. subtilis prophages PBSX and φ-105), were discovered by noting the presence of nearby genes that code for the large subunit of terminase, a signature protein that is highly conserved among prophages [19]. Interestingly, a terminase gene was not observed in the third putative prophage, termed NZP2 (similar to B. subtilis phage SPP1); however, its absence may be the result of genome deterioration during evolution. We also observed that regions in which the G+C content is less than 39% usually encode proteins that have no B. subtilis ortholog and share identity only with hypothetical and conserved hypothetical genes. Two of these AT-rich segments correspond to the NZP2 and NZP3 prophages.
| Table 2. Gene sequences corresponding to isochore peaks shown in Figure 3 |
An isochore plot (Figure 3) also revealed the presence of a region with an atypically high (62%) G+C content. This segment contains two hypothetical genes whose sizes (3,831 and 2,865 bp) greatly exceed the size of an average CDS in B. licheniformis. The first gene encodes a protein of 1,277 amino acids for which Interpro predicts 16 collagen triple-helix repeats, and the amino acid pattern TGATGPT is repeated 75 times within the polypeptide. The second CDS is smaller and encodes a protein with 11 collagen triple-helix repeats; the same TGATGPT motif recurs 56 times. The primary translation products from these genes do not contain canonical signal peptides for secretion, and they do not contain motifs for the twin-arginine or sortase-mediated translocation pathways. Therefore, it is not likely that they are exported to the cell surface or the extracellular medium. The chromosomal region (19 kb) adjacent to these genes is clearly non-colinear with the B. subtilis genome, and virtually all of the predicted genes encode hypothetical or conserved hypothetical proteins. A number of bacterial proteins listed in PIR also contain collagen triple-helix repeat regions, including two from Mesorhizobium loti (accession numbers NF00607049 and NF00607035) and three from B. cereus (accession numbers NF01692528, NF01269899 and NF01694666). These putative orthologs share 53-76% amino-acid sequence identity with their counterparts in B. licheniformis, and their functions are unknown.
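The isochore analysis amounts to computing G+C content in windows along the chromosome and flagging windows that deviate from the genome average. A minimal sketch follows; the 39% lower cutoff comes from the text, while the window size, the 60% upper cutoff and the toy sequence are assumptions chosen for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(seq: str, window: int, low: float = 0.39, high: float = 0.60):
    """Return (start, gc_fraction, label) for windows outside [low, high]."""
    flagged = []
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_fraction(seq[start:start + window])
        if gc < low:
            flagged.append((start, gc, "AT-rich"))
        elif gc > high:
            flagged.append((start, gc, "GC-rich"))
    return flagged


# Toy sequence (hypothetical): one typical, one AT-rich and one GC-rich window.
toy = "ATGC" * 25 + "AAAT" * 20 + "GC" * 10 + "GGCC" * 20 + "AT" * 10
flagged = atypical_windows(toy, window=100)
for start, gc, label in flagged:
    print(f"window at {start}: G+C = {gc:.0%} ({label})")
```

Flagged windows can then be intersected with gene coordinates to list the CDSs that fall under each peak or trough.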
Extracellular enzymes and metabolic activities
In the Bacillus licheniformis genome, 689 of the 4,208 gene models have signal peptides predicted by SignalP [20]. Of these, 309 have no transmembrane domain predicted by TMHMM [21], and 134 of those are hypothetical or conserved hypothetical genes. Based on a manual examination of the remaining 175 genes, at least 82 are likely to encode secreted proteins and enzymes. Moreover, there are 27 predicted extracellular proteins encoded by the B. licheniformis ATCC 14580 genome that are not found in B. subtilis 168. In accordance with its saprophytic lifestyle, the secretome of B. licheniformis includes numerous enzymes that hydrolyze polysaccharides, proteins, lipids and other nutrients.
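The successive filters that narrow the SignalP-positive gene models to the 175 candidates examined manually can be written as a small pipeline. This sketch uses hypothetical record fields and a handful of toy gene models; it does not call SignalP or TMHMM, whose outputs are assumed to have been parsed already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    name: str
    has_signal_peptide: bool  # assumed to be parsed from SignalP output
    tm_helices: int           # assumed to be parsed from TMHMM output
    hypothetical: bool        # hypothetical or conserved hypothetical annotation


def secretome_candidates(models):
    """Keep models with a signal peptide, no TM helix, and a functional annotation."""
    with_signal = [m for m in models if m.has_signal_peptide]
    no_tm = [m for m in with_signal if m.tm_helices == 0]
    return [m for m in no_tm if not m.hypothetical]


# Toy gene models (hypothetical names and values).
toy_models = [
    GeneModel("geneA", True, 0, False),   # passes every filter
    GeneModel("geneB", True, 2, False),   # removed: predicted TM helices
    GeneModel("geneC", True, 0, True),    # removed: hypothetical protein
    GeneModel("geneD", False, 0, False),  # removed: no signal peptide
]
candidates = secretome_candidates(toy_models)
print([m.name for m in candidates])

# Counts reported in the text: 309 without TM domains, 134 of them hypothetical.
remaining_for_manual_review = 309 - 134
print(remaining_for_manual_review)
```

The final step in the text, manual examination of the remaining genes, is a curation step and is not modeled here.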
Cellulose is the most abundant polysaccharide on Earth, and microorganisms that hydrolyze cellulose contribute to the global carbon cycle. Interestingly, two gene clusters involved in cellulose degradation and utilization were discovered in B. licheniformis, and there are no counterparts in B. subtilis 168. The enzymes encoded by the first gene cluster include two putative endoglucanases belonging to glycoside hydrolase families GH9 and GH5, a probable cellulose-1,4-β-cellobiosidase of family GH48, and a putative β-mannanase of family GH5. The β-mannanase (GH5) and endoglucanase (GH9) both harbor carbohydrate-binding motifs. With the exception of the cellulose-1,4-β-cellobiosidase (GH48), all of the gene products encoded in this cluster have secretory signal peptides, and all have homologs in Bacillus species other than B. subtilis. The overall G+C content of this cluster (48%) does not appear to differ appreciably from the genome average (46%). The second gene cluster encodes a putative β-glucosidase (GH1) and three components of a cellobiose-specific PTS transport complex. A second β-glucosidase (GH3) gene is present at an unlinked locus in the genome. Collectively, the genes in these two clusters should enable B. licheniformis to utilize cellulose as a carbon and energy source, converting it to cellobiose and ultimately glucose. In this regard, we have confirmed that B. licheniformis ATCC 14580 is capable of growth on carboxymethyl cellulose as a sole carbon source (not shown).
The chromosome of B. licheniformis ATCC 14580 encodes a number of additional carbohydrase activities that may allow the organism to grow on a broad range of polysaccharides. These include xylanase, endo-arabinase and pectate lyase that may be involved in degradation of hemicellulose, α-amylase and α-glucosidase for starch hydrolysis, chitinases for the breakdown of chitooligosaccharides from fungi and insects, and levanase for utilization of β-D-fructans (levans). Several of these activities are marketed as industrial enzymes.
Saprophytic organisms must utilize a variety of nitrogenous compounds as nutrients for growth and metabolism. On the basis of the information encoded in its genome, B. licheniformis ATCC 14580 possesses the ability to acquire nitrogen from exogenous proteins, peptides, amino acids, ammonia, nitrate and nitrite. As in B. subtilis, the repertoire of extracellular proteases produced by B. licheniformis includes serine proteases (aprE, epr, vpr), a metalloprotease (mpr), and an assortment of endo- and exopeptidases (yjbG, ydiC, gcp, ykvY, ampS, bpr (two copies), yfxM, yuiE, yusX, ywaD, pepT). However, B. licheniformis also has the capacity to produce a number of additional proteases and peptidases that are not encoded in the B. subtilis genome. These include a clostripain-like protease, a zinc-metallopeptidase, a probable glutamyl endopeptidase, an aminopeptidase C homolog, two putative dipeptidases and a zinc-carboxypeptidase.
B. licheniformis also has the ability to utilize amino and imino nitrogen from arginine, asparagine and glutamine via arginine deiminase, arginase, asparaginase and glutaminase activities. Interestingly, there appear to be two genes each for arginase, asparaginase and glutaminase. Presumably, the arginine deiminase activity is expressed during anaerobic growth on arginine, whereas the arginase activities are predominant during aerobic growth. The occurrence of putative arginase genes is somewhat of an enigma in B. licheniformis, because there are no genes encoding urease activity for the hydrolysis of urea that is generated by the arginase reaction. In addition to the absence of urease gene homologs (ureABC) in B. licheniformis, the glutamine ABC transporters (glnH, glnM, glnP, glnQ gene products) are also lacking.
It appears that nitrogen assimilation and transport pathways may be coordinated similarly in B. licheniformis and B. subtilis owing to the presence of key genes such as glnA, glnR, tnrA and nrgA in both species. Likewise, the pathways for nitrate/nitrite transport and metabolism in B. licheniformis appear to be analogous to the corresponding pathways in B. subtilis as suggested by the presence of nasABC (nitrate transport), narGHIJ (respiratory nitrate reductase), and nasDEF (NADH-dependent nitrite reductase) genes. Unlike B. subtilis, B. licheniformis evidently possesses the capability for anaerobic respiration using nitric oxide reductase. Moreover, the gene encoding this activity lies in a cluster that includes CDSs for narK (nitrite extrusion protein), two putative fnr proteins (transcriptional regulators of anaerobic growth), and a dnrN-like gene product (nitric oxide-dependent regulator). These observations are consistent with previous findings that certain B. licheniformis isolates are capable of denitrification [22]. While denitrification is a process of major ecological importance, the contribution of B. licheniformis may be small as the species exists predominantly as endospores in soil [1].
Microbial D-hydantoinase enzymes have been applied to the industrial production of optically pure D-amino acids for synthesis of antibiotics, pesticides, sweeteners and therapeutic amino acids [23]. This enzyme catalyzes the hydrolysis of cyclic ureides such as dihydropyrimidines and 5-monosubstituted hydantoins to N-carbamoyl amino acids. Hydantoinase activities have been detected in a variety of bacterial genera, and a cluster of six genes in B. licheniformis appears to confer a similar capability. This gene cluster encodes N-methylhydantoinase (ATP-hydrolyzing), hydantoin utilization proteins A and B (hyuAB homologs), a possible transcriptional regulator (TetR/AcrR family), a putative pyrimidine permease, and a hypothetical protein that contains an Interpro domain (IPR004399) for phosphomethylpyrimidine kinase.
Protein export, sporulation and competence pathways
Kunst et al. [10] listed 18 genes that have a major role in the secretion of extracellular enzymes by the classical (Sec) pathway in B. subtilis 168. This list includes several chaperonins, signal peptidases, components of the signal recognition particle and protein translocase complexes. All members of this list have B. licheniformis counterparts. In addition to the Sec pathway, some B. subtilis proteins are directed into the twin-arginine (Tat) export pathway, possibly in a Sec-independent manner. Curiously, the B. licheniformis genome encodes three tat gene orthologs (tatAY, tatCD, and tatCY), but two others (tatAC and tatAD) are conspicuously absent. Furthermore, specific proteins may be exported to the cell surface via lipoprotein signal peptides or sortase factors. Lipoprotein signal peptides are cleaved by a specific signal peptidase (Lsp) encoded by the lspA gene in B. subtilis. An lspA homolog can be found in B. licheniformis as well, suggesting that this species may possess the ability to export lipoproteins via a similar mechanism. Lastly, surface proteins in Gram-positive bacteria are frequently attached to the cell wall by sortase enzymes, and genome analyses have revealed that more than one sortase is often produced by a given species. In this regard, three possible sortase gene homologs were detected in the genome of B. licheniformis ATCC 14580. Together these observations suggest that the central features of the protein export machinery are principally conserved in B. subtilis and B. licheniformis.
From the list of 139 sporulation genes tabulated by Kunst et al. [10], all but six have obvious counterparts in B. licheniformis. These six exceptions (spsABCEFG) comprise an operon involved in synthesis of a spore coat polysaccharide in B. subtilis. In addition, the response regulator gene family (phrACEFGI) appears to have a low level of sequence conservation between B. subtilis and B. licheniformis.
Natural competence (the ability to take up and process exogenous DNA in specific growth conditions) is a feature of few B. licheniformis strains [24]. The reasons for variability in competence phenotype have not been explored at the genetic level, but the genome data offer several possible explanations. Although the B. licheniformis genome encodes all of the late competence functions described for B. subtilis (for example, comC, comEFG operons, comK, mecA), it lacks an obvious comS gene, and the comP gene is interrupted by an insertion sequence element (IS3Bli1), suggesting that the early stages of competence development have been pre-empted in B. licheniformis ATCC 14580. Whether these early functions can be restored by introducing the corresponding genes from B. subtilis is unknown. In addition to an apparent deficiency in DNA uptake, two type I restriction-modification systems were discovered that may also contribute to diminished transformation efficiencies. These are distinct from the ydiOPS genes of B. subtilis, and could participate in degradation of improperly modified DNA from heterologous hosts used during construction of recombinant expression vectors. Each of these loci in B. licheniformis (designated as BliI and BliII) encodes putative HsdS, HsdM and HsdR subunits that share significant amino-acid sequence identity with type I restriction-modification proteins in other bacteria. Curiously, the G+C content of these loci (37%) is substantially lower than the overall genome average (46%), which may hint that they are the result of gene acquisitions. Lastly, the synthesis of a glutamyl polypeptide capsule has also been implicated as a potential barrier to transformation of B. licheniformis strains [25]. While laboratory strains of B. subtilis usually do not produce significant capsular material, the genome sequence of B. subtilis 168 indicates that they may harbor the genes required for synthesis of polyglutamic acid. In contrast, many B. licheniformis isolates produce copious amounts of capsular material, giving rise to colonies with a wet or slimy appearance. Six genes were predicted (ywtABDEF and ywsC orthologs) that may be involved in the synthesis of polyglutamic acid capsular material in B. licheniformis.
Antibiotics, secondary metabolites and siderophores
Bacitracin is a cyclic peptide antibiotic that is synthesized non-ribosomally by some B. licheniformis isolates [26]. While there is variation in the prevalence of bacitracin synthase genes among laboratory strains of this species, one study suggested that up to 50% may harbor the bac operon [27]. Interestingly, the bac operon is not present in the type strain (ATCC 14580) genome. Seemingly, the only non-ribosomal peptide synthase operon encoded by the B. licheniformis type strain genome is that responsible for lichenysin biosynthesis. Lichenysin structurally resembles surfactin from B. subtilis [28], and their respective biosynthetic operons are highly similar. Surprisingly, we found no B. licheniformis counterparts for the pps (plipastatin synthase) and polyketide synthase (pks) operons of B. subtilis. Collectively, these two regions represent sizeable portions (80 kb and 38 kb, respectively) of the chromosome in B. subtilis, although they are reportedly dispensable [29].
Unexpectedly, a cluster of 11 genes was found encoding a lantibiotic, with its associated modification and transport functions. We designated this peptide of 75 amino acids as lichenicidin, and its closest homolog is mersacidin from Bacillus sp. strain HIL-Y85/54728 [30]. Lantibiotics are ribosomally synthesized peptides that are modified post-translationally so that the final molecules contain rare thioether amino acids such as lanthionine and/or methyl-lanthionine [31]. Like mersacidin, lichenicidin appears to be a type B lantibiotic, comprising a rigid globular peptide with no net charge (7 acidic residues, 7 basic residues) and a leader peptide with a conserved double glycine cleavage site (GG-type leader peptide). These antimicrobial compounds have attracted much attention in recent years as models for the design and genetic engineering of improved antimicrobial agents [32]. However, since several post-translational modifications and product-specific export functions are required, a dedicated expression system is a prerequisite to provide all the factors necessary to synthesize, modify and transport the lantibiotic peptide. With its history of use in industrial microbiology, B. licheniformis may be an attractive candidate for the development of such an expression system.
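The "no net charge" description of lichenicidin rests on a simple count of acidic and basic residues. The sketch below shows that tally for a hypothetical peptide string (not the lichenicidin sequence); treating only D/E as acidic and K/R as basic, and ignoring histidine, the termini and post-translational modifications, are simplifying assumptions.

```python
ACIDIC = set("DE")  # aspartate, glutamate
BASIC = set("KR")   # lysine, arginine (histidine deliberately ignored here)


def charge_tally(peptide: str):
    """Return (acidic_count, basic_count, net_charge) from residue counts alone."""
    peptide = peptide.upper()
    acidic = sum(1 for aa in peptide if aa in ACIDIC)
    basic = sum(1 for aa in peptide if aa in BASIC)
    return acidic, basic, basic - acidic


# Hypothetical peptide used only to exercise the function.
toy_peptide = "MKDEGSRKTEDLAR"
acidic, basic, net = charge_tally(toy_peptide)
print(f"acidic={acidic} basic={basic} net={net:+d}")
```

Applied to a peptide with 7 acidic and 7 basic residues, the same tally gives a net charge of zero.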
Like B. subtilis 168, the B. licheniformis ATCC 14580 chromosome harbors a siderophore biosynthesis gene cluster (dhbABCEF), and the organization of the cluster is similar to the corresponding chromosomal segment in B. subtilis. In addition, the B. licheniformis genome contains a second gene cluster of four genes (iucABCD) that show significant similarity to proteins involved in aerobactin biosynthesis in E. coli. Surprisingly, a gene encoding the receptor protein (iutA homolog) was not found in B. licheniformis. The B. halodurans genome also contains genes that are homologous to iucABCD, but like B. licheniformis, no iutA homolog could be found using BLAST or Smith-Waterman searches.
Comparison of the B. licheniformis genome with those of other bacilli
The B. licheniformis ATCC 14580 gene models were compared to the list of essential genes in B. subtilis [33]. Predictably, all of the essential genes in B. subtilis have orthologs in B. licheniformis, and most are present in a wide range of bacterial taxa. In pairwise BLAST comparisons, 66% of the predicted B. licheniformis genes have orthologs in B. subtilis, and 55% of the gene models are represented by orthologous sequences in B. halodurans (E-value threshold of 1 × 10⁻⁵; Figure ). Using a reciprocal BLASTP analysis we found 1,719 orthologs that are common to all three species (E-value threshold of 1 × 10⁻⁵).
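Reciprocal best-hit analysis of the kind described above can be sketched as follows. The hit tuples, gene identifiers and helper names are hypothetical, and in practice the hits would be parsed from BLASTP output; only the 1 × 10⁻⁵ E-value threshold comes from the text. This is one common way to call orthologs, not necessarily the authors' exact procedure.

```python
def best_hits(hits, max_evalue=1e-5):
    """Map each query to its lowest-E-value subject among hits passing the threshold.

    hits: iterable of (query, subject, evalue) tuples.
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy hit tables (hypothetical identifiers and E-values).
bli_vs_bsu = [
    ("bli1", "bsu1", 1e-50), ("bli1", "bsu2", 1e-20),
    ("bli2", "bsu2", 1e-30), ("bli3", "bsu3", 1e-3),
]
bsu_vs_bli = [
    ("bsu1", "bli1", 1e-48), ("bsu2", "bli1", 1e-25),
    ("bsu2", "bli2", 1e-10), ("bsu3", "bli3", 1e-3),
]
orthologs = reciprocal_best_hits(bli_vs_bsu, bsu_vs_bli)
print(orthologs)
```

In the toy data, bli2 is not paired with bsu2 because bsu2's best hit is bli1, and bli3/bsu3 are dropped because their E-value exceeds the threshold. Extending the comparison to three species would require the same test to hold for every pair of genomes.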
As noted by Lapidus et al. [9], there are broad regions of colinearity between the genomes of B. licheniformis and B. subtilis (Figure ). Less conservation of genome organization exists between B. licheniformis and B. halodurans, and substantial genomic segments have been inverted in B. halodurans with respect to B. licheniformis and B. subtilis. These observations clearly support previous hypotheses [8] that B. subtilis and B. licheniformis are phylogenetically and evolutionarily closer to each other than to B. halodurans.