The complete 2,343,479-bp genome sequence of the gram-negative, pathogenic oral bacterium Porphyromonas gingivalis strain W83, a major contributor to periodontal disease, was determined. Whole-genome comparative analysis with other available complete genome sequences confirms the close relationship between the Cytophaga-Flavobacteria-Bacteroides (CFB) phylum and the green-sulfur bacteria. Within the CFB phylum, the genomes most similar to that of P. gingivalis are those of Bacteroides thetaiotaomicron and B. fragilis. Outside of the CFB phylum, the genome most similar to that of P. gingivalis is that of Chlorobium tepidum, supporting previous phylogenetic studies that indicated that the Chlorobia and CFB phyla are related, albeit distantly. Genome analysis of strain W83 reveals a range of pathways and virulence determinants that relate to the novel biology of this oral pathogen. Among these determinants are at least six putative hemagglutinin-like genes and 36 previously unidentified peptidases. Genome analysis also reveals that P. gingivalis can metabolize a range of amino acids and generate a number of metabolic end products that are toxic to the human host or human gingival tissue and contribute to the development of periodontal disease.
Periodontal diseases are a group of infections that affect the structures surrounding the teeth. If allowed to progress, periodontal disease can destroy the supporting connective tissue and bone, ultimately resulting in tooth loss. The initiation and progression of periodontal diseases result from a complex interaction between the bacteria colonizing the gingival crevice and the host's immune and inflammatory responses. Because the species most strongly implicated in periodontal disease pathogenesis are usually also present in low numbers in healthy people, the distinction between pathogenic and commensal bacteria in the human host is not clearly defined.
The gram-negative anaerobe Porphyromonas gingivalis belongs to the family Porphyromonadaceae, order Bacteroidales, in the phylum Bacteroidetes, previously known as the Cytophaga-Flavobacteria-Bacteroides (CFB) group (6). The bacterium is a major causative agent in the initiation and progression of severe forms of periodontal disease. P. gingivalis is a late or secondary colonizer of the oral cavity, a process facilitated by other microbial species that provide attachment sites, supply growth substrates, and reduce oxygen tension to levels optimal for the growth of P. gingivalis. Among the early plaque organisms to which P. gingivalis adheres are the oral streptococci (35, 36) and Actinomyces naeslundii (19). Adherence is facilitated by a variety of bacterial surface proteins, including fimbriae, hemagglutinins, and proteinases. P. gingivalis also binds to late colonizers such as Fusobacterium nucleatum, Treponema denticola, and Bacteroides forsythus (now renamed Tannerella forsythensis) (20, 30, 72). The use of a variety of metabolic strategies appears to enable the success of this microbial community. Once established, P. gingivalis cells participate in intercellular communication networks with other oral prokaryotic cells, as well as with eukaryotic cells (37).
P. gingivalis is the third oral pathogen (1, 28) and the second member of the CFB group to be sequenced. The genome of P. gingivalis strain W83 (also known as strain HG66) is presented here. Strain W83 was isolated in the 1950s by H. Werner (Bonn, Germany) from an undocumented human oral infection (41, 46) and was brought to The Pasteur Institute by Madeleine Sebald during the 1960s. The strain was subsequently obtained by Christian Mouton (Quebec, Canada) during the late 1970s. It is anticipated that the availability of the complete genome sequence from this oral pathogen will give tremendous insight into the mechanisms that result in disease progression in an ecological niche deep within the oral cavity.
P. gingivalis strain W83 was obtained from Christian Mouton, Laval University, Quebec City, Quebec, Canada. Genomic DNA was extracted twice with buffered phenol and once with 25:24:1 phenol-chloroform-isoamyl alcohol and precipitated with alcohol. Cloning, sequencing, and assembly were performed as described previously for genomes sequenced by The Institute for Genomic Research (TIGR) (13). One small-insert plasmid library (1.5 to 2.5 kb) was generated by random mechanical shearing of genomic DNA. One large-insert library was generated by partial Tsp509I digestion and ligation to the λ-DASHII/EcoRI vector (Stratagene). In the initial random sequencing phase, ~8-fold sequence coverage was achieved with 39,623 sequences (average read length, 534 bases). The plasmid and λ sequences were jointly assembled by using TIGR Assembler. Sequences from both ends of 506 λ clones served as a genome scaffold, verifying the orientation, order, and integrity of the contigs. Sequence gaps were closed by editing the ends of sequence traces and/or primer walking on plasmid clones. Physical gaps were closed by direct sequencing of genomic DNA or combinatorial PCR, followed by sequencing of the PCR products. The final molecule has 8.35× sequence redundancy.
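As a rough cross-check of these coverage figures, the nominal redundancy of a random shotgun phase is often estimated as c = N × L / G (N reads of mean length L over a genome of length G), and the Poisson (Lander-Waterman) model gives e^(-c) as the expected fraction of bases left uncovered. The Python sketch below is illustrative only, and its function names are hypothetical. With the numbers reported above it returns about 9.0x, slightly above the ~8-fold stated; that is not surprising for this naive product, which counts every base of every read. If low-quality read ends were excluded from the coverage count (an assumption, since the text does not say how coverage was computed), the usable figure would be lower.

```python
import math


def nominal_coverage(num_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Naive shotgun redundancy c = N * L / G, counting every base of every read."""
    return num_reads * mean_read_len / genome_len


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) probability that a given base is hit by no read."""
    return math.exp(-coverage)


# Values reported in the text for strain W83.
GENOME_BP = 2_343_479
READS = 39_623
MEAN_READ_LEN = 534

c = nominal_coverage(READS, MEAN_READ_LEN, GENOME_BP)
print(f"nominal coverage: {c:.2f}x")  # prints: nominal coverage: 9.03x

# Expected number of bases missed purely by chance at the final redundancy.
gap_bp = expected_uncovered_fraction(8.35) * GENOME_BP
print(f"expected uncovered bases at 8.35x: {gap_bp:.0f}")
```

Under this idealized model, even 8.35x leaves a few hundred bases uncovered by chance alone, which is one reason directed gap closure (primer walking, combinatorial PCR) is still required.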
An initial set of open reading frames (ORFs) likely to encode proteins was identified with GLIMMER (57); ORFs shorter than 90 bp, as well as some of those with overlaps, were eliminated. (For more details on the annotation process used to identify all of the ORFs in the P. gingivalis genome, see reference 70.) A region containing the likely origin of replication was identified, and bp 1 was designated adjacent to the dnaA gene in this region. ORFs were searched against a nonredundant protein database as previously described. Frameshifts and point mutations were detected and corrected where appropriate as described previously (51). Remaining frameshifts and point mutations are considered authentic, and the corresponding regions were annotated as an “authentic frameshift” or an “authentic point mutation,” respectively. ORF prediction and gene family identification were completed by using the methodology described previously (49). Two sets of hidden Markov models (HMMs) were used to determine ORF membership in families and superfamilies: 721 HMMs from Pfam v2.0 and 631 HMMs from the TIGR ortholog resource. TMHMM (33) was used to identify membrane-spanning domains in proteins.
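The 90-bp length filter described above can be illustrated with a naive six-frame ORF scan. This is a minimal sketch, not GLIMMER: GLIMMER scores coding potential with interpolated Markov models, whereas the hypothetical find_orfs function below only reports ATG-to-stop stretches of at least 90 bp. It assumes the standard bacterial stop codons and ignores alternative start codons such as GTG and TTG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 90) -> list[tuple[int, int, str]]:
    """Report ATG..stop ORFs of at least min_len bp in all six frames.

    Coordinates are 0-based, half-open, on the forward strand; the stop codon is included.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:  # drop ORFs shorter than min_len
                        if strand == "+":
                            orfs.append((start, end, "+"))
                        else:  # map reverse-strand coordinates back to the forward strand
                            orfs.append((n - end, n - start, "-"))
                    start = None
    return sorted(orfs)
```

Overlap resolution, which the annotation pipeline also performed, is deliberately left out of this sketch.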
All genes and predicted proteins from this genome, as well as from all other completed genomes, were compared by using the basic local alignment search tool (BLAST). For the identification of recent gene duplications, all genes from the P. gingivalis genome were compared to each other. A gene was considered recently duplicated if the most similar gene (as measured by P value) was another gene within the same genome (relative to genes from the two other genomes).
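The duplication criterion above amounts to a best-hit rule: a gene is flagged when its most significant non-self match lies in its own genome. A minimal sketch, assuming BLAST results have already been parsed into records (the Hit class, field names, and gene labels such as "PG_a" are hypothetical placeholders, not real locus tags):

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str           # P. gingivalis gene
    subject: str         # matching gene
    subject_genome: str  # genome that the matching gene belongs to
    p_value: float


def recently_duplicated(hits: list[Hit], own_genome: str) -> list[str]:
    """Query genes whose best non-self hit (lowest P value) lies in their own genome."""
    best: dict[str, Hit] = {}
    for h in hits:
        if h.query == h.subject:
            continue  # a gene trivially matches itself
        current = best.get(h.query)
        if current is None or h.p_value < current.p_value:
            best[h.query] = h  # ties keep the first hit seen
    return sorted(q for q, h in best.items() if h.subject_genome == own_genome)


hits = [
    Hit("PG_a", "PG_b", "Pg", 1e-50),
    Hit("PG_a", "BT_x", "Bt", 1e-30),
    Hit("PG_c", "BF_y", "Bf", 1e-40),
    Hit("PG_c", "PG_d", "Pg", 1e-20),
]
print(recently_duplicated(hits, "Pg"))  # ['PG_a']
```

Here PG_a is flagged because its closest relative is another P. gingivalis gene, whereas PG_c matches a gene in another genome more strongly than its within-genome paralog.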
The nucleotide sequence of the whole genome of P. gingivalis was submitted to GenBank under accession number AE015924.
The genome of P. gingivalis is 2,343,479 bp, with an average G+C content of 48.3% (Fig. 1). There are four ribosomal operons (5S-23S-tRNA(Ala)-tRNA(Ile)-16S) and two structural RNA genes, as well as 53 tRNA genes with specificity for all 20 amino acids. A total of 1,990 ORFs were identified in the genome (13). Of these, 1,075 (54%) could be assigned to biological role categories (54), 184 (9.2%) encoded conserved hypothetical proteins or conserved domain proteins, 208 (10.5%) encoded proteins of unknown function, and 523 (26.3%) encoded hypothetical proteins. More than 85% of the genome is protein coding.
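The four annotation categories above partition the 1,990 ORFs exactly, and the rounded percentages can be reproduced with a few lines of Python (the category labels are paraphrased, and percent_breakdown is a hypothetical helper):

```python
TOTAL_ORFS = 1_990
ORF_CATEGORIES = {
    "assigned to biological role categories": 1_075,
    "conserved hypothetical / conserved domain": 184,
    "unknown function": 208,
    "hypothetical": 523,
}


def percent_breakdown(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of `total` in each category, rounded to one decimal place."""
    if sum(counts.values()) != total:
        raise ValueError("categories do not partition the total")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for name, pct in percent_breakdown(ORF_CATEGORIES, TOTAL_ORFS).items():
    print(f"{name:45s} {pct:5.1f}%")
```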
Repetitive elements occupy ca. 6% of the P. gingivalis genome and fall into two major classes: DNA repeats and transposable elements. The DNA repeats include uninterrupted direct repeats (Table 1) and a subclass of dispersed repeats known as clustered regularly interspaced short palindromic repeats (CRISPRs) (Table 2). Strain W83 does not appear to contain other classes of dispersed repetitive DNA sequence elements, such as ERIC and REP elements. The transposable elements include insertion sequence (IS) elements and miniature inverted-repeat transposable elements (MITEs), which are summarized in Table 3, as well as large stretches of genes that, based on sequence similarity to elements previously described in Bacteroides species (11), resemble remnants of conjugative and mobilizable transposons. The locations of transposon-associated genes are shown in Fig. 1. Although strain W83 carries 96 complete or partial copies of IS elements and MITEs, occupying more than 94 kb of the genome, these transposable elements rarely interrupt functional genes. Instead, they have inserted almost exclusively into intergenic regions and into other copies of transposable elements. The one exception is an insertion into a putative outer membrane protein gene (PG0176/PG0178) that is intact in at least four other strains of P. gingivalis (accession numbers AB069977 to AB069980). Analysis of the IS elements reveals two possible chromosomal inversions that most likely arose by homologous recombination between identical copies of elements at widely separated insertion sites. These potential DNA rearrangements are revealed by inspection of both the duplicated target site sequences (direct repeats) that flank some of the IS elements and a transposon gene disrupted by one IS insertion. The sites of these putative inversions are shown in Fig. 1.
In one case, 821 kb (35%) of the chromosome appears to be inverted between one copy of ISPg2 (PG1746) 512 kb before the origin and another copy of ISPg2 (PG0277) 309 kb after the origin (red dots). In the second case, 103 kb (4.4%) of the chromosome appears to be inverted between two copies of ISPg4, which are located on opposite sides of the origin (PG2194 and PG0050) (green dots). Since inversions about the origin do not invert the direction of transcription relative to replication of genes on the segment, such inversions may be selectively neutral. It will be interesting to determine whether strain W83 and other strains of P. gingivalis share a common genetic structure, or whether the proposed chromosomal inversions are relatively recent events.
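The argument that inversions about the origin preserve gene orientation relative to replication can be checked with a simple model. The sketch assumes bidirectional replication from oriC, a terminus directly opposite the origin, and positions given as signed offsets (in kb) from oriC; all function names are hypothetical. A gene counts as co-directional when its strand matches the direction of the replication fork on its replichore. Under this model, orientation is preserved for every gene only when the two breakpoints are equidistant from the origin. With unequal arms, such as the 512-kb and 309-kb arms of the first inversion, genes within roughly 203 kb of the origin on the longer arm would stay on the same replichore but end up on the opposite strand.

```python
def replichore(pos_kb: float) -> int:
    """+1 for the replichore after oriC (positive offsets), -1 for the one before it."""
    return 1 if pos_kb > 0 else -1


def codirectional(pos_kb: float, strand: int) -> bool:
    """True when a gene's strand (+1/-1) matches the direction of the fork replicating it."""
    return strand == replichore(pos_kb)


def invert(pos_kb: float, strand: int, left_kb: float, right_kb: float) -> tuple[float, int]:
    """Invert the segment from -left_kb to +right_kb (offsets from oriC).

    A point at p inside the segment moves to (right_kb - left_kb) - p and changes strand.
    """
    if not -left_kb <= pos_kb <= right_kb:
        return pos_kb, strand  # outside the inverted segment: unchanged
    return (right_kb - left_kb) - pos_kb, -strand


# Symmetric inversion: every gene keeps its orientation relative to replication.
print(codirectional(*invert(50, 1, 100, 100)))     # True
# Unequal arms (512 kb before and 309 kb after oriC, as in the first case above).
print(codirectional(*invert(-100, -1, 512, 309)))  # False
print(codirectional(*invert(-400, -1, 512, 309)))  # True
```

The model ignores the actual terminus position and replichore asymmetry of W83, so it indicates only which genes would be worth inspecting, not the real orientation changes.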
There are 21 regions of the genome that display an atypical nucleotide composition by χ2 analysis (67) and that also have higher or lower G+C content than the rest of the genome. These regions range in size from 11 to 68 kb and in G+C content from 29.4 to 61.6%. They encode a variety of genes that this bacterium may have acquired through lateral gene transfer, including three restriction system proteins (PG0971 and PG0968, both most similar to proteins of Anabaena sp. strain PCC 7120, and PG1469, most similar to a protein of Agrobacterium tumefaciens); hemagglutinin proteins B and C (HagB, PG1972, and HagC, PG1975, both P. gingivalis specific); many capsular biosynthesis proteins; 20 transposase genes; two large mobile elements, PG1473 to PG1480, which resembles only a conjugative element of Bacteroides thetaiotaomicron, and PG0868 to PG0875, whose sequence and gene organization most closely resemble the antibiotic resistance mobilizable transposon Tn4555 of Bacteroides fragilis (68); and a thiamine biosynthesis operon (PG2107 to PG2111) that is most similar to the thiamine biosynthesis operon of Escherichia coli. These atypical regions in the P. gingivalis genome also encode many hypothetical and conserved hypothetical proteins, which undoubtedly contribute to the unique biology of this organism.
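The text does not specify the exact statistic used in reference 67. As an illustration only, the sketch below slides a window along a chromosome, reports each window's G+C fraction, and computes a Pearson χ2 statistic comparing the window's base counts with genome-wide base frequencies. The 10-kb window and 5-kb step are arbitrary assumptions, the function names are hypothetical, and compositional methods used in practice often work with codon or oligonucleotide frequencies rather than single bases.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seq: str) -> dict[str, float]:
    """Genome-wide frequency of each base (non-ACGT characters ignored)."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}


def chi_square(window: str, expected_freqs: dict[str, float]) -> float:
    """Pearson chi-square of a window's base counts against genome-wide frequencies."""
    counts = Counter(b for b in window.upper() if b in BASES)
    n = sum(counts.values())
    stat = 0.0
    for b in BASES:
        expected = n * expected_freqs[b]
        if expected > 0:
            stat += (counts[b] - expected) ** 2 / expected
    return stat


def scan_composition(seq: str, window: int = 10_000, step: int = 5_000):
    """Yield (start, G+C fraction, chi-square) for each sliding window."""
    seq = seq.upper()
    genome_freqs = base_frequencies(seq)
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        w = seq[start:start + window]
        acgt = sum(w.count(b) for b in BASES)
        gc = (w.count("G") + w.count("C")) / acgt if acgt else 0.0
        yield start, gc, chi_square(w, genome_freqs)
```

Windows with both an extreme G+C fraction and a large χ2 value would be the candidates for closer inspection as possible laterally acquired islands.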
Comparison of the predicted proteome of P. gingivalis with those of other completely sequenced genomes confirms the close relationship of P. gingivalis to other members of the CFB group, including B. fragilis and B. thetaiotaomicron. Outside of the CFB phylum, the genome most similar to that of P. gingivalis is that of Chlorobium tepidum, supporting previous phylogenetic studies indicating that the Chlorobia and CFB phyla are related, albeit distantly. The proteomes most similar to that of P. gingivalis (in terms of the number of proteins with best-scoring matches) were those of B. thetaiotaomicron and B. fragilis, with 572 and 437 best-scoring matches (P < 10^-5), respectively.
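The best-match tally behind these numbers can be sketched as follows, assuming BLAST results have already been parsed into (query, subject genome, P value) tuples; the parsing step, the function name, and the toy data are hypothetical. Each P. gingivalis protein contributes one count to the genome containing its single best match, provided that match has P < 10^-5 and hits to P. gingivalis itself are excluded.

```python
from collections import Counter


def best_hit_genome_counts(hits, p_cutoff=1e-5, exclude=("P. gingivalis",)):
    """Count, per subject genome, the query proteins whose best hit lies in that genome.

    hits: iterable of (query_id, subject_genome, p_value) tuples.
    Hits to excluded genomes (e.g. the query genome itself) or with p >= p_cutoff are ignored.
    """
    best = {}
    for query, genome, p in hits:
        if genome in exclude or p >= p_cutoff:
            continue
        if query not in best or p < best[query][1]:
            best[query] = (genome, p)
    return Counter(genome for genome, _ in best.values())


toy_hits = [
    ("q1", "B. thetaiotaomicron", 1e-30),
    ("q1", "B. fragilis", 1e-20),
    ("q2", "B. fragilis", 1e-12),
    ("q3", "C. tepidum", 1e-3),     # above the cutoff: ignored
    ("q4", "P. gingivalis", 1e-80),  # self-genome hit: ignored
    ("q4", "C. tepidum", 1e-9),
]
print(best_hit_genome_counts(toy_hits))
```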
A total of 332 genes were identified as putatively duplicated in the P. gingivalis lineage. The retention of these duplicated genes likely indicates some selective evolutionary advantage. Among them are 10 genes that encode DNA-binding histone-like proteins with a domain architecture distinct from that of HU and related histone-like proteins. These DNA-binding proteins have been designated a superfamily (i.e., a set of proteins that share a given domain architecture; TIGRFAMs family TIGR01201). Outside of P. gingivalis, the only known member of this superfamily is found in the gut bacterium B. fragilis. All members of this superfamily are distantly related to the bacterial DNA-binding protein HU family (Pfam family PF00216, five members of which are also found in the P. gingivalis genome) but differ in architecture, sharing both an N-terminal extension and a glycine-rich C terminus. Among other DNA-binding functions, HU has been shown to assist the unwinding of oriC DNA by the DNA replication initiation protein DnaA (4). Interestingly, all 10 members of the TIGR01201 family in P. gingivalis have direct repeats upstream of their genes. These repeats may act as binding sites for the encoded DNA-binding proteins, which could thereby regulate their own expression; alternatively, the repeats may coordinate expression of the other chromosomal genes that they flank.
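A search for upstream direct repeats of this kind could start from a sketch like the following. The 200-bp upstream window and 12-bp repeat length are hypothetical parameters chosen for illustration, not values from this study, and both function names are placeholders. The first function extracts the region 5′ of a gene in the gene's own orientation; the second returns every k-mer occurring at least twice in the same orientation, with its offsets.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def upstream_region(seq: str, start: int, end: int, strand: str, length: int = 200) -> str:
    """Return `length` bp 5' of a gene, read in the gene's own orientation.

    start/end are 0-based, half-open forward-strand coordinates of the gene.
    """
    if strand == "+":
        return seq[max(0, start - length):start]
    # Minus-strand gene: its 5' side lies beyond `end`; reverse-complement that stretch.
    return seq[end:end + length].translate(COMPLEMENT)[::-1]


def direct_repeats(region: str, k: int = 12, min_copies: int = 2) -> dict[str, list[int]]:
    """k-mers occurring at least `min_copies` times in the same orientation, with start offsets."""
    positions = defaultdict(list)
    for i in range(len(region) - k + 1):
        positions[region[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_copies}
```

Exact k-mer matching misses imperfect repeats; tolerating mismatches would require an alignment-based approach instead.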
The microbial species that exist in supragingival plaque of the oral cavity are exposed to the host's dietary intake, and many of these bacteria, including the oral streptococci, ferment carbohydrates to acidic end products such as lactic acid for the purpose of energy production. On the other hand, anaerobic species in the subgingival plaque are exposed to crevicular fluid and to the host tissue proteins (61). The availability of the complete genome sequence of P. gingivalis W83 allowed for an analysis of the physiological potential of this species. Based on this analysis, the range of transport capabilities and metabolic pathways that could be identified is presented in Fig. 2.
Genome analysis suggests that P. gingivalis has a limited capacity for the uptake and metabolism of organic nutrients. Glucose utilization by P. gingivalis is known to be very poor, and carbohydrates in general do not appear to readily support growth (61). Strain W83 does, however, contain putative ORFs for all enzymes of the glycolytic pathway, as well as ORFs for a putative glucose/galactose transporter and a glucose kinase. Sequence analysis shows that the glucose kinase is encoded in a split ORF generated by a point mutation, which is a likely explanation for the poor utilization of glucose to support growth. Four putative ORFs for the pentose phosphate pathway were identified, and this pathway likely plays a role in the generation of precursor metabolites during anaerobic growth (Fig. 2).
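A gene split by a point mutation can be flagged by checking a candidate coding region, for example the stretch aligned to the full length of an intact homolog, for premature in-frame stop codons. The sketch below uses hypothetical function names and assumes the standard bacterial stop codons; it also treats a region whose length is not a multiple of three as a possible frameshift. It is an illustration, not the procedure used to annotate this genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(cds: str) -> list[int]:
    """0-based codon indices of in-frame stop codons before the final codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [i for i in range(n_codons - 1) if cds[3 * i:3 * i + 3] in STOP_CODONS]


def looks_split(cds: str) -> bool:
    """Flag a candidate coding region that cannot be read as one intact ORF."""
    frameshift_suspected = len(cds) % 3 != 0  # length not a whole number of codons
    return frameshift_suspected or bool(internal_stops(cds))


intact = "ATG" + "GCT" * 5 + "TAA"
broken = "ATG" + "GCT" * 2 + "TGA" + "GCT" * 2 + "TAA"  # premature TGA at codon 3
print(internal_stops(intact), internal_stops(broken))  # [] [3]
```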
Whole-genome analysis suggests that P. gingivalis can metabolize several sugars, including melibiose, galactose, starch, and maltodextrin. The bacterium also possesses enzymes for the degradation of complex amino sugars, in the form of hexosaminidases. It is still unclear whether these complex sugars are metabolized; one possibility is that the removal of amino sugars from host glycoproteins renders these proteins more susceptible to degradation by bacterial proteinases. In addition, at least 11 amino acids may serve as substrates for energy production (Fig. 2). These amino acids are most likely derived from the degradation of host tissues (see the virulence section below) or from the breakdown of other bacterial cells in the oral cavity. Pathways for glutamate and aspartate utilization have been characterized by enzyme assays (65), and ORFs coding for all of these activities were found in the W83 genome. Intracellular glutamate is deaminated to 2-oxoglutarate by glutamate dehydrogenase and then oxidatively decarboxylated to succinyl coenzyme A (succinyl-CoA) by a CoA-dependent 2-oxoglutarate oxidoreductase. This activity is somewhat unusual among bacterial species (23, 25). It has been established that two-thirds of the succinyl-CoA produced in this reaction is converted to butyryl-CoA and then to butyrate. The remaining third may be converted to propionate by a pathway that involves the enzymes methylmalonyl-CoA mutase and acyl-CoA:acetate-CoA transferase, as reported for other propionate-producing bacteria (18). This glutamate pathway appears to be unique to P. gingivalis, since other anaerobes catabolize glutamate through the hydroxyglutarate, methylaspartate, and/or aminobutyrate pathways (5, 17, 18). No activity was detected in P. gingivalis for three key enzymes of these pathways: hydroxyglutarate dehydrogenase, 3-methylaspartate ammonia lyase, and 4-aminobutyrate aminotransferase (65).
Peptide-derived aspartate is deaminated to fumarate by aspartate ammonia lyase and then either oxidized to acetate or reduced to propionate and butyrate (65).
Results from Takahashi et al. (65) suggest that P. gingivalis prefers to utilize arginine and lysine as free amino acids rather than in peptide form; carboxy-terminal arginine and lysine residues could therefore be released from proteins by carboxypeptidase activities. Masuda et al. (45) found such an activity in culture supernatants, and an ORF coding for an unspecified carboxypeptidase (PG0232) was identified in the genome. A report that P. gingivalis produces citrulline and ornithine from denatured protein (14) implies that the bacterium degrades arginine through the arginine deiminase pathway. Indeed, a gene with homology to arginine deiminase from Bacillus licheniformis (43) was identified. In addition, two contiguous genes, pyrB and pyrI (PG0357 and PG0358), share homology with aspartate/ornithine transcarbamylase catalytic and regulatory chains from Vibrio sp. strain 2693 and Pyrococcus abyssi, respectively.
The lysine catabolic pathways appear to be very similar to those found in Clostridium sp. ORFs were identified for the first steps of both L- and D-lysine catabolism; thus, the isomers are apparently degraded by two different pathways that yield butyric acid, acetic acid, and ammonia. Lysine 2,3-aminomutase (KamA) catalyzes the interconversion of L-lysine and L-β-lysine, the first step in the lysine degradation pathway in Clostridium subterminale SB4 (56). In P. gingivalis W83, kamA was found clustered with the genes kamD and kamE (PG1070, PG1073, and PG1074) that encode subunits of D-lysine 5,6-aminomutase, the first enzyme of the D-lysine degradative pathway. Genes encoding enzymes for the subsequent conversion of lysine to butyrate and acetate were located 3′ to kamE. It is not yet known whether these genes are transcribed as an operon.
Little is known about serine and threonine catabolism in P. gingivalis; however, an ORF with homology to serine dehydratase (PG0084), which converts serine to pyruvate and ammonia, was detected. Threonine may be cleaved to glycine and acetaldehyde by threonine aldolase, for which an ORF was also detected (PG0474). In summary, P. gingivalis appears to catabolize amino acids through pathways that generate ammonia. The organism has a growth pH optimum above 7.5, and ammonia generation may have evolved as a strategy to shift the local pH toward this favored alkaline range.
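The link between ammonia production and alkalinization can be made explicit with the Henderson-Hasselbalch relation for the ammonium/ammonia pair. Assuming a pKa of about 9.25 for NH4+ (its value near 25°C), nearly all ammonia released at pH 7.5 is protonated, so each ammonia molecule formed removes roughly one proton from its surroundings:

```latex
\mathrm{NH_3} + \mathrm{H^+} \rightleftharpoons \mathrm{NH_4^+},
\qquad
f_{\mathrm{NH_4^+}}
  = \frac{[\mathrm{NH_4^+}]}{[\mathrm{NH_3}] + [\mathrm{NH_4^+}]}
  = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}
  \approx \frac{1}{1 + 10^{\,7.5 - 9.25}}
  \approx 0.98
```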
Several studies have shown that P. gingivalis preferentially uses peptides as sources of carbon and nitrogen (60, 65, 71) and, in addition to the previously described proteinases that are known to degrade host proteins, a number of peptidases that may be involved in the further digestion of protein fragments to smaller peptides and amino acids could be identified from the genome.
There are two carboxylate transporters, possibly for lactate and formate, and no sugar transporters other than the aforementioned glucose/galactose importer. Although P. gingivalis possesses a broad assortment of secreted peptidases and pathways for the metabolism of amino acids, the bacterium appears to rely on two predicted peptide uptake systems and has only one amino acid transporter, the characterized sodium ion-driven serine/threonine uptake protein SstT (12). A LysE-type amino acid efflux protein is also present and may protect the organism from toxic concentrations of amino acids.
The major fermentation products that can be produced based on whole-genome analysis and in vitro end product analyses are propionate, butyrate, isobutyrate, isovalerate, acetate, ethanol, and butanol (27). Many of these end products are probably toxic to human host tissues (see virulence section below).
Nucleosides and nucleobases may represent a hitherto unsuspected but important nutrient source for P. gingivalis; they might be used either as building blocks for nucleic acid biosynthesis or catabolized as carbon and energy sources. There are three predicted purine uptake systems, a NupG nucleoside uptake system, and a homolog of the Salmonella enterica serovar Typhimurium nicotinamide mononucleotide transporter PnuC. In addition, there are four homologs of E. coli DinF, a DNA damage-inducible protein related to sodium ion-driven drug efflux transporters, which are hypothesized to play a role in nucleoside and/or nucleotide efflux (8).
As in most human pathogens, iron acquisition appears to be an important priority in P. gingivalis: there are two iron chelate ABC uptake systems, two TonB-dependent iron receptors, and two FeoB ferrous iron uptake systems. There is also an array of metal ion homeostasis transporters, including three sodium ion/proton exchangers, which may be important because a significant number of P. gingivalis transporters are predicted to be sodium ion driven.
The availability of the complete genome sequence of P. gingivalis facilitates the identification of putative virulence factors associated with the establishment and survival of the bacterium in the gingival crevice and its subsequent penetration into host cells (Table 4). Initially, the bacterium must navigate the oral cavity, where, as an obligate anaerobe, it is exposed to limited amounts of oxygen before it establishes itself in an anaerobic environment. A cluster of genes (PG1582 to PG1586) was identified with high similarity to the recently described aerotolerance operon of B. fragilis (66). These functions promote the survival of B. fragilis upon exposure to oxygen, and their presence in P. gingivalis suggests that this system may also confer oxygen tolerance in the oral cavity. The genome also encodes a superoxide dismutase (PG1545), an alkyl hydroperoxide reductase (PG0618 and PG0619) (55), a thiol peroxidase (PG1729), and a Dps homolog (PG0090), a protein involved in protecting nucleic acids from oxidative damage (33).
The bacterium uses fimbriae to adhere to other bacterial species and to host tissues. Hemagglutinins and various proteases (gingipains) are also involved in tissue colonization through adhesion to extracellular matrix proteins (38, 53, 59). Hemagglutinins in particular may mediate the binding of bacteria to receptors on human cells (21), and the gene sequences for six newly identified putative hemagglutinin-like proteins (PG0411, PG1326, PG1674, PG1427, PG1548, and PG2198) were identified. Four of these appear to be recent duplications, within the genome, of sequences related to the HagA and HagD adhesin domains. A total of 42 proteinases were identified in the genome sequence; these may enable adherence of the bacterium to host tissues and to other bacterial cells, and may also degrade host proteins (as discussed above). In vitro experiments have demonstrated that proteases attack a range of host proteins, including extracellular matrix proteins (32, 38, 53, 59) and cell adhesion molecules (29), the destruction of which leads to a loss of cell surface receptors (59) and of tissue integrity (29). Protease destruction of cytokines (15, 42, 74) and gamma interferon (73) can disrupt polymorphonuclear leukocyte function (48) and ultimately affect the host immune response.
A single hemolysin for the release of iron and protoporphyrin IX (PG1875) was identified. Its sequence has full-length homology only to the characterized hemolysin gene of another periodontal pathogen, Prevotella melaninogenica (3). These two hemolysins show no detectable homology to any other biochemically characterized hemolysin and have only weak homology to a conserved hypothetical protein/putative hemolysin fusion protein sequence from Vibrio cholerae.
In P. gingivalis, metabolic end products from the catabolism of various substrates include short-chain carboxylic acids that can affect the host defense system in a variety of ways. When applied directly to healthy human gingival tissue, short-chain carboxylic acids have been shown to stimulate a gingival inflammatory response and inflammatory cytokine release (50). Short-chain carboxylic acids have also been shown to alter cell function and gene expression and may contribute to the initiation and prolongation of gingival inflammation (50).
The capsule of P. gingivalis is most likely involved in evasion of the host response and has been shown to be one of the important virulence determinants of this bacterium (34). Whole-genome analysis reveals at least four capsular biosynthesis gene clusters located across the genome (PG0106 to PG0120, PG0435 to PG0437, PG1140 to PG1149, and PG1560 to PG1565). Closer investigation of these gene clusters suggests that mannose, glucose, and rhamnose may be among the sugars present in the capsule of P. gingivalis strain W83. In several pathogens, the secretion of virulence factors targeted to host cells is mediated by type III protein secretion systems. The complete genome of P. gingivalis was searched for the nine conserved Sct (Hrc/Ysc) genes known to be components of type III protein secretion systems (24), and no BLAST matches to them were found. Although several sec gene homologs are present in the genome, including SecA, SecY, SecD, and SecF, the main terminal branch of the general secretory pathway (type II) could not be identified, suggesting that this pathway is not functional in this bacterium.
It is estimated that 35% of the U.S. population has some form of periodontitis (2). Traditional methods for the prevention or treatment of periodontal disease include the mechanical removal of plaque and the use of antimicrobial agents. The availability of the genome sequences of at least three oral pathogens and recent investigations into the microbial composition of the human oral microbiome will afford new opportunities to investigate ways to alter the composition of the subgingival biofilm. Potential strategies include those that would decrease opportunities for biofilm formation, reduce attachment among species in the biofilm, or limit the availability of required nutrients. Inhibition of the primary colonizers could prevent the successful establishment of late colonizers such as P. gingivalis.
Recently, specific bacteria have been associated with systemic diseases: for example, Helicobacter pylori has been identified as the etiological agent of gastric ulcers, and the presence of Chlamydia pneumoniae in atherosclerotic plaques has been tentatively associated with cardiovascular disease (9). Stimulated by these studies and by the detection of P. gingivalis and other oral pathogens in atherosclerotic plaques (10, 62), new research is assessing associations between periodontal infection and cardiovascular disease (22, 47, 52, 58). The possibility that the pathogenicity of oral bacteria extends beyond their known ecological niche to other organ systems introduces an important and exciting new dimension to defining the genetic complement of these organisms. Although the genome sequence has not revealed many of the classical virulence factors associated with pathogens, it is anticipated that the newly described putative virulence factors of this bacterium, together with those of other oral pathogens that have recently been sequenced or whose sequencing is nearing completion, will enable the development of antimicrobial agents against one of the major causative agents of periodontal disease. Ultimately, the genome sequence of P. gingivalis will facilitate an increased understanding of the virulence of this periodontal pathogen and will enable the development of improved diagnostics and therapeutics.
We thank Michael Heaney, Michael Holmes, Vadim Sapiro, and Emmanuel Mongodin for informatics support at TIGR. We thank J. F. Tomb for assistance in the early stages of the project.
This work was supported by the National Institute of Dental and Craniofacial Research at the National Institutes of Health (R01DE-13914).