We report novel features of the genome sequence of Leptospira interrogans serovar Copenhageni, a highly invasive spirochete. Leptospira species colonize a significant proportion of rodent populations worldwide and produce life-threatening infections in mammals. Genomic sequence analysis reveals the presence of a competent transport system with 13 families of genes encoding major transporters, including a three-member component efflux system compatible with the long-term survival of this organism. The leptospiral genome contains a broad array of genes encoding regulatory systems, signal transduction and methyl-accepting chemotaxis proteins, reflecting the organism’s ability to respond to diverse environmental stimuli. The identification of a complete set of genes encoding the enzymes for the cobalamin biosynthetic pathway and of novel coding genes related to lipopolysaccharide biosynthesis should shed new light on the study of Leptospira physiology. Genes related to toxins, lipoproteins and several surface-exposed proteins may facilitate a better understanding of Leptospira pathogenesis and may serve as potential vaccine candidates.
Spirochetes are motile, helically shaped bacteria which include the genera Leptospira, Leptonema, Borrelia and Treponema. Borrelia and Treponema are the causative agents of Lyme disease, relapsing fever and syphilis. Leptospira consists of a genetically diverse group of pathogenic and non-pathogenic or saprophytic species (1). Leptospirosis is a widespread zoonotic disease: transmission to humans occurs through contact with domestic or wild animal reservoirs or an environment contaminated by their urine. Infection produces a wide spectrum of clinical manifestations. The early phase of illness is characterized by fever, chills, headache, and severe myalgias. The disease progresses in 5 to 15% of the clinical infections to produce severe multisystem complications such as jaundice, renal failure and hemorrhagic manifestations (2). In developed countries, leptospirosis is associated with recreational activities (1) while in developing countries it produces large urban epidemics with mortality mainly during the rainy season (3). Leptospirosis also represents a major economic problem producing abortions, stillbirths, infertility, failure to thrive, reduced milk production, and death in animals such as cows, pigs, sheep, goats, horses, and dogs (1). Environmental control measures are difficult to implement because of the long-term survival of pathogenic leptospires in soil and water and the abundance of wild and domestic animal reservoirs (1). Leptospira are classified according to serovar status - more than 200 pathogenic serovars have been identified. Structural heterogeneity in lipopolysaccharide (LPS) moieties appears to be the basis for the large degree of antigenic variation observed among serovars (1). The development of vaccines has been pursued as a strategy for the prevention of leptospirosis. At present, vaccines are based on inactivated whole cell or membrane preparations of pathogenic leptospires which induce immune responses against leptospiral LPS (1). 
However, these vaccines do not induce long-term protection against infection and do not provide cross-protective immunity against heterologous leptospiral serovars. Protein antigens conserved among pathogenic serovars may contribute to overcoming the limitations of the currently available vaccines.
The genome sequence of Leptospira interrogans serovar Lai was recently published (4) and comparative genome analysis with L. interrogans serovar Copenhageni has been performed. We report here new features of L. interrogans serovar Copenhageni that should contribute to understanding the molecular mechanisms of leptospiral physiology and pathogenesis and facilitate the identification of candidates for broad-range vaccines.
The sequenced strain, Fiocruz L1-130, was isolated as described by Nascimento et al. (5). The sequencing strategy adopted follows the basic outline of the Xylella genome project (6). Library construction, sequencing, assembly, and finishing were carried out by the Agronomical and Environmental Genomes consortium [http://aeg.lbi.ic.unicamp.br] and by Instituto Butantan. The genome was assembled using phrap from shotgun reads, cosmid reads and PCR-product sequences. Scaffolding was performed using in-house software. The finishing criteria required a consensus base phred quality of at least 20 and coverage of each consensus base by at least one read from each DNA strand (6). The first base of the sequence was chosen based on our hypothesis for the location of the origin of replication, which was in turn based on the presence of certain genes and on GC-skew variation. Genome annotation and comparative genomics were done as previously described (7). Detection of potentially surface-exposed integral membrane proteins was carried out as described by Nascimento et al. (5). Sequences from 16S rDNA were manually assembled using ESEE 3.2. Phylogenetic analyses were performed based on two matrices (34 taxa and 1255 positions; 24 taxa and 1375 positions) using the program PAUP 4.0b8 (8). Divergence time was estimated based on 1445 positions of the 16S rRNA sequences. A constant rate of 1 to 2% per 50 million years was assumed (9).
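The two finishing criteria stated above can be expressed as a simple per-base check. The following is a minimal sketch in Python, not the consortium's actual pipeline: the function names and the tuple layout (consensus quality, forward-strand read count, reverse-strand read count) are assumptions made for illustration. It uses the standard phred definition, in which a quality Q corresponds to an error probability of 10^(−Q/10), so that Q = 20 means roughly one expected error per 100 bases.

```python
def phred_error_probability(quality):
    """Error probability implied by a phred quality score (standard definition)."""
    return 10 ** (-quality / 10)


def base_is_finished(quality, forward_reads, reverse_reads, min_quality=20):
    """Apply both finishing criteria to a single consensus base."""
    return quality >= min_quality and forward_reads >= 1 and reverse_reads >= 1


def unfinished_positions(bases, min_quality=20):
    """Return the 0-based positions that fail the criteria.

    `bases` is a hypothetical list of (quality, forward_reads, reverse_reads)
    tuples, one per consensus base.
    """
    return [i for i, (q, fwd, rev) in enumerate(bases)
            if not base_is_finished(q, fwd, rev, min_quality)]


# Made-up example: the second base has low quality and the third
# lacks reverse-strand coverage, so both would need further finishing.
example = [(45, 3, 2), (15, 4, 4), (30, 2, 0), (20, 1, 1)]
print(unfinished_positions(example))
```

Positions flagged by such a check would be the ones targeted for additional reads before the sequence is considered finished.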
The sequences have been deposited in Genbank under accession numbers AE016823 (chromosome I) and AE016824 (chromosome II).
The Leptospira genome consists of two circular chromosomes with a total of 4,627,366 base pairs (bp), chromosome I with 4,277,185 bp and chromosome II with 350,181 bp (5). Circular representations of both chromosomes are depicted in Figure 1. The origin of replication of the large replicon was identified between the dnaA and dnaN genes, as in other bacterial genomes (10). GC nucleotide skew ((G − C)/(G + C)) analysis (11) confirmed the origin of replication of the large replicon and indicated two putative sites for the small replicon.
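To make the skew statistic concrete, the sketch below computes (G − C)/(G + C) in non-overlapping windows and accumulates it along the sequence. It is an illustration only: the window size, the function names and the toy sequence are assumptions, and the paper does not state which implementation was used. In a circular bacterial chromosome the sign of the skew typically switches at the origin and terminus of replication, so the minimum of the cumulative skew is commonly taken as a candidate origin.

```python
def gc_skew(seq, window):
    """Return (window_start, skew) pairs for non-overlapping windows."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without G or C get a skew of 0 to avoid division by zero.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


def cumulative_skew(skews):
    """Running sum of the per-window skew values."""
    total = 0.0
    running = []
    for _, value in skews:
        total += value
        running.append(total)
    return running


# Toy sequence (not real data): C-rich first half, G-rich second half.
toy = ("C" * 30 + "G" * 10) * 2 + ("G" * 30 + "C" * 10) * 2
skews = gc_skew(toy, window=40)
running = cumulative_skew(skews)
# Index of the window where the cumulative skew reaches its minimum.
switch_window = running.index(min(running))
print(skews)
print(running, switch_window)
```

In the toy sequence the cumulative skew bottoms out at the end of the second window, which is where the strand composition switches.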
As previously described (12), rRNA genes in L. interrogans are not organized into operons, as in most other bacteria, but are scattered over chromosome I (Figure 1). L. interrogans serovar Copenhageni has one rrf gene, two rrl genes and two rrs genes coding for 5S, 23S and 16S rRNA, respectively. As in other parasitic strains, L. interrogans serovar Copenhageni has only one rrf (5S) gene, which is located close to the origin of replication, as described before for other strains of L. interrogans (12). The complete rrs (16S) sequences of L. interrogans serovars Copenhageni, Lai and Canicola are 99.9 to 100% identical. Between Lai and Copenhageni, the rrf (5S) sequences are 100% identical and the rrl (23S) sequences are 99% identical. Based on ribosomal genes, Copenhageni and Lai are closely related, as supported by the whole genome comparison (5).
The Spirochaetes are divided into three major phylogenetic groups, or families: Spirochaetaceae, which includes Borrelia and Treponema among others; Brachyspiraceae; and Leptospiraceae, which contains two genera, Leptospira and Leptonema (13). Phylogenetic analyses based on 16S rDNA sequences, using Leptonema as an outgroup, show that Leptospira are split into two well-supported monophyletic groups (Figure 2), one of them formed by the pathogenic strains (e.g., L. interrogans) and the other formed by the non-pathogenic strains (e.g., L. biflexa). At the base of the clade of the pathogenic strains, L. inadai and L. fainei form a well-supported assemblage. Similar results were obtained by Postic et al. (14) based on 16S rDNA analyses. In these analyses L. interrogans serovars formed a well-supported monophyletic cluster closely related to L. kirschneri (Figure 2B). Considering a constant divergence rate of 1 to 2% per 50 million years for the 16S rDNA (9), the separation time between the two main assemblages (L. interrogans versus L. biflexa) was estimated at 295 to 590 million years.
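The arithmetic behind this estimate can be sketched as follows. The 16S rDNA divergence between the two assemblages is not stated in the text; the value of 11.8% used below is an assumption back-calculated from the reported range (590 million years at 1% per 50 million years), and the function name is hypothetical.

```python
def divergence_time_range(divergence_pct, slow_rate_pct=1.0,
                          fast_rate_pct=2.0, interval_myr=50.0):
    """Return (youngest, oldest) separation times in million years.

    Assumes a constant clock of `slow_rate_pct` to `fast_rate_pct`
    percent sequence divergence per `interval_myr` million years.
    """
    oldest = divergence_pct / slow_rate_pct * interval_myr
    youngest = divergence_pct / fast_rate_pct * interval_myr
    return youngest, oldest


# 11.8% is inferred from the reported range, not taken from the paper.
youngest, oldest = divergence_time_range(11.8)
print(round(youngest), round(oldest))
```

With these assumptions the slower rate gives the older bound and the faster rate the younger one, which reproduces the two ends of the reported interval.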
L. interrogans belongs to a growing number of multichromosomal prokaryotes, including Vibrio cholerae (15) and Ralstonia solanacearum (16). The small replicon of L. interrogans was previously suggested to be a second chromosome based upon the localization of the metF gene which encodes an essential biosynthetic pathway enzyme, methylene tetrahydrofolate reductase (17). The genome sequence reveals that genes encoding enzymes for metabolic pathways, such as glycolysis and the tricarboxylic acid cycle, as well as the enzymes for the biosynthesis pathways of amino acids and co-factors, are also distributed between the two chromosomes. Sequence analysis of chromosome II shows that an almost complete operon of genes coding for the protoheme biosynthesis pathway is present (hemAIBCENYH). Although no homologue of the gene coding for uroporphyrinogen III synthetase (hemD) was found, experimental evidence has shown that the hemC gene is able to cope with hemD activity (18). Therefore, L. interrogans has the ability to synthesize protoheme de novo. In addition, 13 genes clustered in chromosome II coding for the cobalamin biosynthesis pathway were identified (cobC, cobD, cbiP, cobP, cobB, cobO, cobM, cobJ, cbiG, cobI, cobL, cobH, cobF) (Figure 3). Orthologues of the cobGKN genes, known to be involved in the cobalamin pathway (19), were not found. However, there are two predicted coding sequences inside this operon in chromosome II that could perform these steps. One has an oxidoreductase NAD-binding domain (LIC20133) and the other is a [2Fe-2S] ferredoxin involved in electron transfer (LIC20135). In addition, other genes present in the genome coding for reductases such as LIC11145, LIC13354, LIC12391, and LIC10522 could also cope with these activities. The product of cysG, a gene in chromosome I that encodes a multifunctional protein with methylase, oxidase and ferrochelatase activities (20), may also act as a cobalt-inserting enzyme in the B12 pathway.
Other genes involved in this biosynthesis pathway were found in chromosome I (cysG/hemX/cobA, cobT/cobU, cobS) (Figure 3). In fact, experimental evidence has recently shown that leptospires can grow in medium in the absence of B12 (Hartskeerl RA, unpublished results). This contradicts the earlier view that L. interrogans is unable to synthesize B12, which is why B12 is included as an essential component of the EMJH semi-synthetic medium (1,4). Thus, L. interrogans, unlike the spirochetes Borrelia burgdorferi and Treponema pallidum, has the complete repertoire of genes for de novo synthesis of protoheme and cobalamin. The functional link between the two replicons supports the view that the small replicon is in fact a second chromosome.
Among the 220 Leptospira transport proteins, we found 148 proteins comprising 108 different major primary and secondary transporters (Figure 4), as defined by Paulsen et al. (21). There are 34 major primary ATP-driven transporters including 30 ABC-transporters (53 proteins), the largest protein family in L. interrogans, as usually is the case for bacteria (22). The ABC superfamily contains both uptake and efflux transport systems, and ATP hydrolysis energizes the transport. The porters of the ABC superfamily consist of two integral membrane proteins (that determine specificity of the transported solute) and two cytoplasmic ATP-hydrolyzing proteins present as homodimers or heterodimers. The uptake systems (but not the efflux systems) additionally possess extra-cytoplasmic solute-binding receptors (one or more per system), which in Gram-negative bacteria are found in the periplasm. We found 21 ABC efflux systems, including drug and heavy metal export and detoxification, lipoprotein-releasing and hemolysin export systems. The remaining 9 are ABC uptake systems, including iron, sulfate, nickel, phosphate, dipeptide, amino acid, and carbohydrate uptake. There is one F-type ATP-synthase system (8 proteins), as mentioned above, and 3 P-type cation-transport ATPases (5 proteins).
We found 59 secondary electrochemical potential-driven transporters (65 proteins) including the largest family of secondary transporters, the resistance-nodulation-cell division superfamily with 12 members, 5 of which belong to the heavy metal efflux family (with 7 proteins) and 7 to the multiple drug efflux family (with 8 proteins). These secondary efflux transporters are energized by proton-motive force and show the widest substrate specificity among all known multi-drug pumps, ranging from most of the currently used antibiotics, disinfectants, dyes, and detergents to simple solvents. The second largest secondary transporter family is the major facilitator superfamily with 9 members including drug:cation antiporters (8 proteins) and a glycerol-phosphate:inorganic phosphate antiporter (1 protein). Additional secondary transporters include three sodium-solute symporters (one sodium-glucose co-transporter), three sodium-bile-acid symporters, and symporters for phosphate, di-tripeptides, glutamate, amino acids, uracil, sulfate as well as an ammonium:potassium antiporter, a glycerol-3-P:Pi antiporter, two drug-sodium antiporters and two polysaccharide exporters. Secondary transporters also include 7 members of the TonB family of auxiliary proteins for energizing the outer membrane receptor-mediated active transport. Leptospira has many iron-transporting proteins in addition to the cobalamin/iron periplasmic binding protein component of the ABC-transport system (LIC13403) mentioned above. Four members of the outer membrane receptor family have been identified, which are probably involved in the transport of iron dicitrate (LIC10714), hemin (LIC10964), ferric hydroxamate (LIC11345) and ferrienterobactin (LIC12998). In addition, an orthologue of the ferrous iron uptake (FeoB) protein of E. coli was found (LIC11402). 
FeoB has been characterized as an Fe2+ uptake system and possesses an ATP/GTP binding motif at its N-terminal hydrophilic domain, therefore being probably energized by ATP or GTP hydrolysis (23).
The Leptospira genome contains several genes encoding enzymes with peroxidase activities, such as catalase, glutathione peroxidase and thiol peroxidase. However, superoxide dismutase orthologues and two important regulons, SoxRS and OxyR, normally responsible for the defense against oxidative stress, are absent (5). It is possible that metalloporphyrins (24) could provide defense against oxidative damage, since L. interrogans is competent for porphyrin biosynthesis and has metal uptake transport proteins. Genes coding for co-migratory bacterioferritin (LIC20093 and LIC10732), thiol peroxidase (LIC12765) and peroxiredoxin (LIC11219) with alkyl hydroperoxide reductase (AhpC) and thiol-specific antioxidant activities, respectively, were also identified. Two predicted coding sequences for bacterioferritin (LIC11310 and LIC13209), which may have functions analogous to animal ferritin, are also present and may provide both iron detoxification and storage, that would prevent free iron in Leptospira from driving oxidative reactions.
There are many genes encoding signal transduction proteins in Leptospira, indicating that this organism has developed a vast array of regulatory systems that enable it to respond to environmental signals. This variety of regulatory domains is found in non-obligate parasitic bacteria, indicating a much greater need to interpret the signals from the environment in order to respond properly, while obligate parasites have a much lower number and variety of domains in their signal transduction proteins (25). There are 80 genes encoding components of the phosphorylation-mediated signal transduction pathway: 29 histidine kinases (HK), 30 response regulators (RR) and 18 hybrid kinase/regulators (HK/RR). These members of the two-component systems present several domains organized into different arrangements (Figure 5). Nineteen of the histidine kinases are located in the inner membrane, nine are cytoplasmic and one is probably found in the periplasm, as predicted by the PSORT program [http://psort.nibb.ac.jp/].
On the other hand, only a third of the hybrid HK/RR are membrane-bound, suggesting that these hybrid proteins are most likely to be involved in phosphorylation cascades, although some of them have the sensor PAS domain. The PAS domain has been reported to sense different environmental stimuli, like oxygen, redox state, nitrogen availability or light, and it may or may not be associated with co-factors such as heme or FAD (26). A two-component protein containing a PAS sensor domain was found to be required for virulence of M. tuberculosis in mice, probably because it senses radical oxygen intermediates generated by macrophage phagocytosis (27), and probably this is also the case for Leptospira.
The response regulators are the cytoplasmic effectors of the message, which become functional after being phosphorylated at an aspartate residue by the cognate histidine kinase. The RRs may possess a second effector domain (Figure 5), which will perform its ultimate function, such as the DNA-binding helix-turn-helix domain that allows the RR to regulate transcription. Other noticeable domains found in L. interrogans RRs are the GGDEF and EAL motifs, which correspond to putative diguanylate cyclase and phosphodiesterase domains, respectively, and a phosphatase domain similar to the mammalian phosphatase 2C that may be involved in the phosphorelay.
Cyclic nucleotides are likely to have a major regulatory role in Leptospira. There are 19 homologues of adenylate/guanylate cyclases, two specific phosphodiesterases and 7 cyclic nucleotide-binding proteins that probably have a regulatory function. There are 12 genes encoding proteins containing the GGDEF motif, seven of which are organized in tandem in chromosome I (LIC11131 to LIC11125), and they also have a PAS/PAC sensor domain in the amino-terminal region, suggesting that they are the product of gene duplication. The diguanylate cyclase activity of the GGDEF domain is required for a novel regulatory mechanism involving bis-(3′,5′)-cyclic diguanylic acid (c-di-GMP) as an allosteric activator (28). Two of the GGDEF-containing proteins also have the Cache domain, which is involved in chemotaxis signal transduction (29), an important feature for Leptospira. There are eight putative diguanylate phosphodiesterases containing an EAL domain, with three of them being associated with an RR domain (Figure 5). The L. interrogans genome also encodes three serine/threonine kinases and two hybrid HK/RR with a GAF domain, which is a binding domain for cGMP (30).
Other interesting findings in the genome include three related genes encoding putative DNA-binding proteins (LIC20104, LIC20105 and LIC20178), which contain 6 transmembrane domains in the amino-terminus and one helix-turn-helix domain of the AraC type at the carboxyl-terminus. These genes are present in chromosome II, and two of them are clustered. Orthologues with a C-terminal DNA-binding domain and a hydrophobic N-terminal region were described in other bacteria, including Borrelia and Treponema (31), but their function is unknown.
The L. interrogans genome comprises a relatively large number of motility and chemotaxis genes. Enteric bacteria usually have about 50 genes coding for structural and functional proteins involved in motility (32). A similar number of genes have been identified for the spirochetes T. pallidum and B. burgdorferi (33,34). Apparently, the motility and chemotaxis apparatus of L. interrogans is more complex since its genome contains at least 79 putative motility-associated genes. All genes are well conserved among L. interrogans, T. pallidum and B. burgdorferi and 42 genes were found to be common to all three genera. However, the leptospiral genome contains multiple copies of a number of motility-associated genes, accounting partly for the higher number. For example, the genome of serovar Copenhageni contains 5 flaB genes, 4 motB genes and 2 motA genes compared to 3 flaB genes and one copy of both motA and motB in the T. pallidum genome and one copy each of flaB, motA, and motB in the B. burgdorferi genome. In addition, the L. interrogans genome contains 11 putative genes encoding methyl-accepting chemotaxis proteins, which is roughly twice as many as in T. pallidum and B. burgdorferi. Forty-eight of the 79 motility-associated genes are positioned into 14 gene clusters varying in size from 2 to 8 genes. Thus, like in T. pallidum and B. burgdorferi, the majority of the structural and functional motility genes are positioned in potential operons. However, the operons probably underwent extensive rearrangements as they are generally smaller, often corresponding to only parts of the major Treponema and Borrelia operons. For example, the flgB operon in B. burgdorferi consisting of 26 genes (35) has been disrupted into 6 fragments dispersed at 5 positions in the leptospiral genome. 
Differences in number of genes and operon organization might be associated with the high flexibility of pathogenic leptospires enabling them to survive and adapt to a variety of environments and hosts.
The primary lesion caused by Leptospira is the damage to the endothelium of small blood vessels, leading to hemorrhage and localized ischemia in multiple organs. As a consequence, renal tubular necrosis, hepato-cellular damage, meningitis, and myositis may occur in the infected host (1). Hemolysins may play a fundamental role in this toxic process. Several genes coding for predicted hemolysins were identified in the L. interrogans genome. Some of them are structurally related to sphingomyelinases C (LIC10657, LIC12631, LIC12632, LIC11040, and LIC13198). Although generically called sphingomyelinases, it is possible that these genes code for hydrolases that act not only on sphingomyelins but also on other sphingolipids. Erythrocytes may represent a target for these enzymes since they are rich in glycosphingolipids like the antigenic determinants of the ABO system. Interestingly, LIC12631 and LIC12632 are organized as an operon, which may suggest a concerted expression and action.
The other class of genes coding for hemolysins, tlyABC, was identified. Although they were assigned to the same TlyABC class, they do not present structural similarity. These putative hemolysins were first identified in the spirochete Brachyspira hyodysenteriae. Hemolytic and cytotoxic activities of the recombinant TlyA, TlyB and TlyC proteins, expressed in E. coli, were detected (36). TlyB belongs to the family of Clp ATP-dependent proteases (37) and there are 3 genes coding for structurally related proteins (LIC10339, LIC11814 and LIC12017). Thus, 5 genes of the tlyABC class (LIC10284, LIC10339, LIC11814, LIC12017, and LIC13143) were identified in the L. interrogans genome.
LIC12134 codes for an HlpA-related protein which was characterized as a hemolysin in Prevotella intermedia, a common oral bacterium associated with periodontitis (38). Another identified gene (LIC10325) is related to the hlyX gene which codes for a predicted hemolysin previously identified in L. borgpetersenii serovar Hardjo (Accession number AAF09252, unpublished results). LIC11352 should also be mentioned as the gene which codes for LipL32 or HAP-1, a ubiquitous lipoprotein of pathogenic Leptospira (see Lipoproteins section below) with hemolytic activity (39).
Pathogenic leptospires require several types of surface-exposed proteins for the purpose of colonization and survival in the mammalian host. Surface-exposed proteins may be categorized as nonspecific porins, specific channels for nutrient acquisition, efflux channels, lipoproteins, adhesins, S-layer glycoproteins, peripheral membrane proteins, or surface-maintenance proteins. Aside from S-layer proteins and peripheral membrane proteins, these surface-exposed proteins would be expected to be integrated into the outer membrane via transmembrane regions or lipid modification. The leptospiral genome contains homologues of SecY and other secretory proteins involved in exporting proteins with signal peptides across the cytoplasmic membrane. The leptospiral genome also contains both the standard signal peptidase and the lipoprotein signal peptidase. The standard signal peptidase is involved in the hydrolysis of signal peptides of transmembrane outer membrane proteins and periplasmic proteins, releasing them from the cytoplasmic membrane. Lipoprotein signal peptidase hydrolyzes the signal peptides of lipoproteins prior to lipidation of the N-terminal cysteine.
Analysis of the L. interrogans genome reveals 83 beta-sheet transmembrane outer membrane proteins (OMPs) and all except one (OmpL1, LIC10973) (40) were previously unknown. An example of a newly identified leptospiral protein with a transmembrane beta-sheet structure is LIC10642 which is predicted to be a member of the OMP superfamily.
Among these newly identified OMPs is a family of predicted coding sequences (LIC20202, LIC10995, LIC11465, and LIC11103) belonging to the superfamily of alpha/beta hydrolases, which includes bacterial lipases. Another transmembrane OMP is LIC11623, a member of a family of highly conserved proteins of Gram-negative bacteria, including N. meningitidis Omp85, which is thought to be a chaperonin involved in the assembly of OMPs in the outer membrane (41). OmpL1 belongs to the class of nonspecific porins allowing passage of small molecules (<1000 Daltons) across the leptospiral outer membrane (40). Nonspecific porins allow both influx of nutrients and efflux of products of metabolism. Another example is LIC11458, which is predicted to be a member of the porin superfamily.
Survival in the mammalian host environment by bacterial pathogens requires the acquisition of certain trace nutrients. For example, iron is essential for leptospiral growth and is bound by a number of high-affinity binding proteins in mammals, which restrict its availability. Bacteria scavenge trace nutrients, including iron, heme, and vitamin B12, from their environment utilizing the cytoplasmic membrane protein TonB and a series of “TonB-dependent” OMPs (Figure 6). After binding, transport of the nutrient across the outer membrane into the periplasm by this type of channel is an energy-dependent step requiring interaction of the TonB-dependent OMP with TonB. The L. interrogans genome contains a TonB orthologue (LIC10889) and a large number of TonB-dependent OMPs: LIC20151, LIC20214, LIC10998, LIC10964, LIC10714, LIC12374, LIC12898, LIC12998, LIC11694, LIC11345, LIC10882, LIC10881, and LIC10896.
The leptospiral genome also contains OMPs involved in efflux pathways (Figure 6). The type I efflux system is represented by an orthologue of TolC (LIC13135), the outer membrane exporter of hemolysin and drugs, along with an orthologue of CusC (LIC11941), an outer membrane exporter of copper ion. A second type of efflux pathway found in the leptospiral genome belongs to the resistance-nodulation-cell division superfamily, a three-member complex that catalyzes substrate efflux via an H+ antiport mechanism. The three-member complex consists of a resistance-nodulation-cell division transporter, a membrane fusion protein, and an outer membrane factor involved in the export of proteins, carbohydrates, drugs or toxic heavy metals (Figure 6). The leptospiral orthologues are most closely related to those of the Alcaligenes eutrophus cobalt/cadmium/zinc export system consisting of the resistance-nodulation-cell division transporter CzcA (LIC12224, LIC15510 and LIC11938), CzcB (LIC12306 and LIC11940), and CzcC (LIC12307 and LIC11941). In addition, L. interrogans has two orthologues of the cation efflux system protein CzcD (LIC11714 and LIC13205), which are members of the cation diffusion facilitator family. CzcD of Bacillus subtilis has been shown to catalyze divalent cation (Zn2+ or Cd2+) efflux in exchange for the uptake of two monovalent cations (K+ and H+) in an electroneutral process energized by the transmembrane pH gradient (42).
Experimental evidence for fatty acid modification of leptospiral lipoproteins has been described for the outer membrane lipoproteins LipL32 (LIC11352) (43), LipL36 (LIC13060) (44), and LipL41 (LIC12966) (45). The cytoplasmic membrane also contains lipoproteins, as demonstrated for LipL31 (LIC11456) and LipL71 (LIC11003) (46). A total of 184 predicted coding sequences in the L. interrogans genome were found to have a lipoprotein signal peptidase cleavage site (5). All proposed lipoprotein-coding sequences conform to the rule of having an L, I, V, or F in the −3 and/or −4 position relative to cysteine and most of them have A, G, S, or N in the −1 position relative to cysteine (47).
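The positional rule described above can be written as a small check on the residues preceding the lipidated cysteine. This is a minimal sketch of that rule only, not the prediction method used for the genome annotation; the function name and the example peptides are made up for illustration, and a real predictor would also evaluate the rest of the signal peptide.

```python
HYDROPHOBIC = set("LIVF")   # allowed at position -3 and/or -4
SMALL = set("AGSN")         # typical at position -1


def check_lipobox(sequence, cys_index):
    """Check the residues upstream of the cysteine at `cys_index` (0-based).

    Returns (hydrophobic_ok, minus1_typical), or None when the position is
    not a cysteine or there are fewer than three residues before it.
    """
    sequence = sequence.upper()
    if cys_index < 3 or cys_index >= len(sequence) or sequence[cys_index] != "C":
        return None
    minus1 = sequence[cys_index - 1]
    minus3 = sequence[cys_index - 3]
    # Position -4 exists only when at least four residues precede the cysteine.
    minus4 = sequence[cys_index - 4] if cys_index >= 4 else None
    hydrophobic_ok = minus3 in HYDROPHOBIC or minus4 in HYDROPHOBIC
    minus1_typical = minus1 in SMALL
    return hydrophobic_ok, minus1_typical


# Hypothetical signal-peptide fragments ending at the candidate cysteine.
print(check_lipobox("MKKLLSLFAC", 9))   # L at -3, A at -1
print(check_lipobox("MKKTTSDEQC", 9))   # D at -3, S at -4, Q at -1
```

The first flag corresponds to the rule that all proposed lipoproteins satisfy, and the second to the preference at position −1 that holds for most but not all of them.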
Proteolytic functions can be assigned to some of the newly identified lipoproteins: four lipoproteins are thermolysin homologues (LIC10715, LIC13320, LIC13321, and LIC13322), and one is a predicted metallo-protease (LIC11834). Several lipoproteins may be involved in hemolysis: two lipoproteins are sphingomyelinase homologues (LIC10657 and LIC12632), and one is a phospholipase D homologue (LIC11754). Lipoprotein LIC10972 is predicted to be located in the outer membrane and is a MauG homologue belonging to a family of cytochrome c peroxidases that may be involved in defense against hydrogen peroxide. Other lipoproteins with putative enzymatic functions are homologues of glucose dehydrogenase (LIC12294) and cholesterol oxidase (LIC13202).
S-layers are thought to provide structural integrity to the bacterial surface (48). Although an S-layer has not been observed in pathogenic leptospires, the genome contains at least two proteins with S-layer homology (LIC10868 and LIC12952). Experimental evidence is needed to determine whether these proteins are actually S-layer components.
Like S-layer proteins, peripheral membrane proteins are not integral membrane proteins. LipL45 is processed to a peripheral membrane protein associated with the outer membrane, P31LipL45 (49). P31LipL45 expression is dramatically up-regulated in stationary phase cultures, and for this reason it may have a membrane-stabilizing function. At this time it is unclear whether P31LipL45 is surface-exposed. Interestingly, the leptospiral genome contains a number of predicted coding sequences with homology to LipL45 (LIC20102, LIC20114, LIC13414, and LIC10123).
Bacteria are likely to have a variety of proteins involved in maintaining surface structural integrity. One protein in this category, glycerophosphoryl diester phosphodiesterase, has been found in all spirochetal genomes studied to date. E. coli has two forms of this enzyme, a cytosolic form, ugpQ, and a periplasmic form, glpQ. The leptospiral genome contains two homologues of this enzyme, LIC13182 and LIC10293. The former should be the cytosolic form because it lacks a signal peptide while the latter should be the exported form because it has an N-terminal signal peptide. In other spirochetes, GlpQ is associated with the outer membrane (50).
The leptospiral genome contains a number of proteins that belong to a large family of prokaryotic proteins with homology to the peptidoglycan-associated portion of E. coli OmpA (51). These proteins are predicted to be either cytoplasmic membrane (LIC20250, LIC10592, LIC13479, and LIC10050) or periplasmic proteins (LIC10537 and LIC10191), rather than OMPs, which is consistent with experimental evidence that the spirochetal cell wall is more closely associated with the cytoplasmic membrane than the outer membrane. An interesting protein family is the mechanosensitive ion channel (LIC20069 and LIC12671). Two members of this protein family of M. jannaschii have been functionally characterized and both form mechanosensitive ion channels (52). Therefore, this family is likely to also encode mechanosensitive channel proteins in L. interrogans, playing a physiological role in bacterial osmoregulation.
Lipopolysaccharides distinguish the leptospiral surface from those of the other invasive spirochetes. Changes in genes involved in the LPS biosynthesis apparatus are thought to account for serovar diversity among leptospires (53). It has been shown that the leptospiral LPS resembles a typical LPS from Gram-negative bacteria, containing pentoses, hexoses and heptoses. Although the predominant sugar is rhamnose, a large variety of other sugar residues are found (1). Antibodies raised against LPS from different Leptospira strains during infections are related to polysaccharide structure in terms of its sugar composition, number, repetitiveness, and ramification (1). In Leptospira, as in many other bacteria, at least part of the genes coding for enzymes of the polysaccharide biosynthesis pathway are found clustered in a region of chromosomes named O-antigen gene cluster (rfb locus) (53).
The complete genome analysis reveals a large region of about 119 kb (genome position 2.538.470–2.554.255), in which all 108 predicted coding sequences are transcribed in the same orientation (Figure 7). Interestingly, this region is dense in genes related to LPS biosynthesis (Table 1) and includes the O-antigen gene cluster previously described in L. interrogans serovar Copenhageni (53,54). In the first 14 kb of this region, the predicted genes seem to be related not to LPS biosynthesis but to DNA replication, and some code for ribosomal proteins. In the other 105 kb of this region there are 94 predicted genes, 56 of which are clearly related to LPS biosynthesis. These predicted genes encode enzymes for nucleotide sugar biosynthesis, such as the enzymes for dTDP-rhamnose biosynthesis (54) and CMP-N-acetylneuraminic acid biosynthesis, and perosamine synthase. In addition, many genes coding for sugar transferases, for sugar modifications and for O-antigen processing (Wzx-flippase and Wzy-polymerase, involved in oligosaccharide export and assembly of the LPS) were identified. Some genes encoding proteins of lipid A biosynthesis (lpxD) and transport (msbA) are also found in this region.
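The tallies quoted for this region can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the function and variable names are illustrative and not part of the original analysis.

```python
def region_tally(region_kb, first_part_kb, total_cds, second_part_cds, lps_cds):
    """Derive the implied sizes and counts for the two parts of the region."""
    second_part_kb = region_kb - first_part_kb     # remainder of the region
    first_part_cds = total_cds - second_part_cds   # CDSs left for the first part
    lps_fraction = lps_cds / second_part_cds       # share clearly LPS-related
    return second_part_kb, first_part_cds, lps_fraction


# Values as reported in the text: 119 kb, 14 kb, 108 CDSs, 94 CDSs, 56 LPS-related
second_kb, first_cds, fraction = region_tally(119, 14, 108, 94, 56)
print(second_kb, first_cds, round(fraction * 100, 1))
```

Under these figures, the first 14 kb would hold the 14 predicted coding sequences not counted among the 94, and roughly 60% of the genes in the remaining 105 kb are clearly LPS-related.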
The comparison with the corresponding region in the genome of L. interrogans serovar Lai (4) revealed only three distinct predicted coding sequences: two extra copies of genes for aminotransferase (LIC12197 and LIC12198) and the absence of a gene encoding galactoside O-acetyltransferase (LA1622).
Seventy-seven other genes, probably related to LPS biosynthesis, were identified along the genome. All the genes related to lipid A biosynthesis described in E. coli were identified in the Leptospira genome (lpxA, lpxC, lpxD, lpxB, lpxK). However, the predominant fatty acids in lipid A of L. interrogans are dodecanoic and hexadecanoic acid instead of hydroxymyristoyl (14 carbons), which is the signature of Gram-negative bacteria (1). Comparative analysis between LpxA of E. coli and P. aeruginosa (55) showed that a few structural differences in this enzyme determine changes in the length of the fatty acid chain incorporated during lipid A biosynthesis. The lower endotoxicity of leptospiral LPS as compared to LPS from Gram-negative bacteria has been reported (56). It was also reported that leptospiral LPS induces a TLR2-dependent cell activation, instead of acting through TLR4, the receptor frequently involved in the LPS immune response (57). The lower endotoxicity and the difference in the mechanism of immune cell activation are thought to be related to differences in the lipid A structure.
Genes encoding enzymes of Kdo biosynthesis (kdsA, kdsB), as well as Kdo-transferase (waaA or kdtA), which catalyzes the binding of Kdo to lipid IVA, were identified. Although typical Kdo was found in Leptospira LPS, it was reported to be substituted at different carbon positions (1).
Four paralogues encoding MsbA, the flippase that transfers lipid A across the inner membrane from the cytoplasmic side to the periplasmic side, were identified. One of the msbA genes is located in the 119-kb region. Another msbA gene is located upstream of lpxK, as reported in many other genomes (55). It will be interesting to compare the set of genes coding for enzymes related to LPS biosynthesis identified in the Leptospira genome to the orthologues in other microorganisms, in order to correlate LPS structural differences and endotoxic activity.
A three-way comparison between the L. interrogans genome and the genomes of B. burgdorferi and T. pallidum yields the following results: 1167 (31%) of the genes in L. interrogans Copenhageni are found in B. burgdorferi and/or in T. pallidum, 666 (41%) of the genes in B. burgdorferi are found in the Copenhageni genome, and 589 (57%) of the genes in T. pallidum are found in the L. interrogans genome. A total of 362 predicted genes were found to be shared by all three spirochetes, 45 of which are hypothetical (detailed list in our project website: http://aeg.lbi.ic.unicamp.br/world/lic/). These results show that a thorough analysis of the genome can significantly contribute to the understanding of spirochete biology.
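A comparison of this kind can be sketched with set operations. The example below is a minimal illustration, assuming each genome has already been reduced to a set of ortholog-group identifiers. The identifiers are toy values and the function name is hypothetical; this is not the pipeline used for the analysis, and real gene counts would also have to account for paralogues.

```python
def three_way_summary(reference, other_a, other_b):
    """Count reference genes shared with either, and with both, other genomes."""
    in_either = reference & (other_a | other_b)   # found in A and/or B
    in_all = reference & other_a & other_b        # found in all three genomes
    return {
        "shared_with_either": len(in_either),
        "fraction_of_reference": len(in_either) / len(reference),
        "shared_by_all": len(in_all),
    }


# Toy ortholog-group identifiers (hypothetical, not real data)
l_interrogans = {"og1", "og2", "og3", "og4", "og5"}
b_burgdorferi = {"og1", "og2", "og6"}
t_pallidum = {"og2", "og3", "og7"}

summary = three_way_summary(l_interrogans, b_burgdorferi, t_pallidum)
print(summary)
```

Calling the same function with a different genome as the reference gives the reciprocal percentages, which is why each of the three figures in the text is expressed relative to its own genome.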
Our analysis of the L. interrogans genome revealed the presence of the previously described insertion sequence (IS) elements IS1500 and IS1501 (58), remnant insertions of IS1533 (59) and IS1502 (60), and a recently identified IS that we designated ISlin1 (5) (Table 2). IS1500, IS1501 and ISlin1 account for 24 insertions distributed throughout chromosome I. Together, IS elements and transposases comprise 2% of the genome. These elements are related to major IS families such as IS110 and IS3, which are defined by the conserved amino acid motif (DDE) in their transposases. So far, all the insertions have been found in intergenic regions, where they have no mutagenic effects on L. interrogans genes.
The L. interrogans genome provides new insights into biosynthetic pathways, transport families, environmental response, and pathogenesis. A broad array of regulatory system proteins enables this organism to respond to environmental signals. A new group of genes involved in LPS biosynthesis may contribute to the elucidation of serovar diversity among leptospires. Several categories of surface-exposed proteins required for colonization and survival in the mammalian host were identified. These proteins may have a role in mechanisms of leptospiral pathogenesis and protective immunity. Available vaccines are serovar-specific and have low efficacy (1). Surface-exposed proteins that are conserved among pathogenic serovars may be used for vaccine development for the prevention of leptospirosis.
Research supported by FAPESP and CNPq.
We are deeply indebted to Dr. I. Raw (Fundação Butantan, São Paulo, SP, Brazil) for use of laboratory facilities and valuable support. We thank Dr. Albert I. Ko (Fiocruz, Salvador, BA, Brazil) for the strain Fiocruz L1-130, the Agronomical and Environmental Genomes (AEG) Consortium of the network Organization for Nucleotide Sequencing and Analysis (ONSA) for the genome sequencing data, and Dr. L.C.C. Leite (Instituto Butantan) for a critical reading of this manuscript.