Porphyromonas gingivalis is implicated in the etiology of chronic periodontitis. Genotyping studies suggest that genetic variability exists among P. gingivalis strains; however, the extent of variability remains unclear and regions of variability remain largely unidentified. To assess P. gingivalis strain diversity, we previously used heteroduplex analysis of the ribosomal operon intergenic spacer region (ISR) to type strains in clinical samples and identified 22 heteroduplex types. Additionally, we used ISR sequence analysis to determine the relatedness of P. gingivalis strains to one another and demonstrated a link between ISR sequence phylogeny and the disease-associated phenotype of the strains. In the current study, heteroduplex analysis of the ISR was used to determine the worldwide genetic variability and distribution of P. gingivalis, and microarray-based comparative genomic hybridization (CGH) analysis was used to more comprehensively examine the variability of major heteroduplex type strains by using the entire genome. Heteroduplex analysis of clinical samples from geographically diverse populations identified 6 predominant, geographically widespread heteroduplex types (prevalence, ≥5%) and 14 rare heteroduplex types (prevalence, <2%) that were found in one or a few locations. CGH analysis of the genomes of seven clinically prevalent heteroduplex type strains identified 133 genes from strain W83 that were divergent in at least one of the other strains. The relatedness of the strains to one another determined on the basis of genome content (microarray) analysis was highly similar to their relatedness determined on the basis of ISR sequence analysis, and a striking correlation between the genome contents and disease-associated phenotypes of the strains was observed.
Porphyromonas gingivalis is a gram-negative anaerobe that has been strongly implicated as a pathogen in adult (chronic) periodontitis (3, 15, 20, 32, 39, 40), a destructive disease that affects the gingiva and supporting structures of the teeth. The bacterium is found under conditions of both health and disease, with reported prevalences of 10% to 25% in healthy individuals and 79% to 90% in individuals with periodontitis (20, 23, 24). Previous epidemiologic studies have demonstrated that P. gingivalis strains vary with respect to their levels of human disease association (4, 5, 21). Studies have also demonstrated that P. gingivalis strains vary in their virulence (soft tissue destruction and death) in animal models, with some strains being classified as virulent, e.g., strains W83, W50, ATCC 49417, and A7A1, and others being classified as avirulent, e.g., strains 381, 33277, and 23A4 (19, 25, 37).
Many studies have assessed the genetic diversity that exists among P. gingivalis strains, resulting in the finding of a high degree of diversity in some cases (17, 29, 31, 34) and a considerably lower degree of diversity in others (2, 28). The amount of variability found may be due to the different techniques used in the studies. In a previous clinical study, heteroduplex analysis of the intergenic spacer region (ISR) between the 16S and 23S rRNA genes was used to identify the clonal types of P. gingivalis that were present in 661 subgingival plaque samples (28). The rRNA ISR was used because, unlike the 16S gene, it possesses enough variability to distinguish between strains within the same species. Heteroduplex analysis allows similar but nonidentical DNA fragments, in this case, ISR sequences, to be distinguished by polyacrylamide gel electrophoresis (41). By that assay, 22 distinct heteroduplex types of P. gingivalis were identified in the samples, with many of them matching previously characterized laboratory strains. Multiple heteroduplex types were found in 34% of the samples (28). Additional rare heteroduplex types not found in this study have been identified in other study populations.
Various methods have been used to examine the evolutionary relationships among P. gingivalis strains and to try to correlate specific clonal types with disease status (17, 30, 34, 43). In a previous clinical study conducted to determine the distribution of P. gingivalis heteroduplex types in chronic periodontitis and health, 130 adults with clear indicators of periodontitis and 181 healthy, age-matched controls were sampled, and the heteroduplex types present in each sample that was positive for P. gingivalis were identified (21). Six heteroduplex types were present at levels high enough for statistical analysis. By using a multivariate model for the relationship of heteroduplex type to disease status, heteroduplex type hW83 was by far the most strongly associated with periodontitis (P = 0.0000), and two additional types, h49417 and hHG1691, were also statistically significantly associated with disease. The prevalence of the remaining types, h23A4, h381, and hA7A1, in health or disease was not statistically different, suggesting that they are equally likely to be detected in both groups.
Phylogenetic reconstruction of the P. gingivalis heteroduplex type strains on the basis of the sequence of the ribosomal operon ISR correlated with these findings, placing the two type strains most associated with disease (strains W83 and ATCC 49417) in a clade that was well separated from the others (42). The proximity of the two type strains most strongly associated with disease and their separation from type strains less associated with disease suggest that there is a link between ISR sequence phylogeny and the disease-associated phenotype of P. gingivalis heteroduplex types.
All these studies demonstrate that there is genetic variability among P. gingivalis strains; however, the degree of variability that exists remains unclear and the regions of variability remain largely unidentified. The availability of P. gingivalis W83 whole-genome microarrays allows a more comprehensive assessment of the genetic heterogeneity that exists within the species and can identify regions of variability by using the entire genome. Comparative genomics by microarray analysis has been used to examine the genetic diversity among strains of numerous bacterial species (10, 11, 14, 18, 26, 36, 44, 45, 47, 49), and in some cases, this has led to the identification of genes that are associated with pathogenic strains of the species (10, 14, 47).
The objectives of this study were to examine the worldwide genetic variability of P. gingivalis using heteroduplex analysis, to examine the relatedness of clinically prevalent P. gingivalis heteroduplex type strains on the basis of the entire genome (i.e., comparative genomic hybridization [CGH] analysis), to compare the relatedness of the strains on the basis of their genome content to phylogeny on the basis of the sequence of a single locus (ISR), and to identify the genes that are associated with strongly disease-associated, virulent strain W83 that could be important for virulence. Heteroduplex analysis showed that six P. gingivalis heteroduplex types (types hW83, h49417, h381, h23A4, hA7A1, and hHG1691) that matched previously characterized laboratory strains were predominant (prevalence, ≥5%) in a diverse sample population and were geographically widely distributed. Comparative genomic analysis of seven clinically prevalent P. gingivalis heteroduplex type strains with W83 whole-genome microarrays identified 133 W83 genes that were divergent in at least one of the test strains. The results of hierarchical cluster analysis of the P. gingivalis strains on the basis of their genome content (microarray data) correlated well with phylogenetic predictions from ISR sequence analysis, and a striking correlation between the genome content of the strains and their disease-associated phenotype was observed.
Human volunteers for this institutionally approved study were recruited and randomly selected, with no correlation to disease status, as follows: subjects from the United States included dental patients from the dental clinics of the Ohio State University College of Dentistry (n = 311), the University of Texas Health Sciences Center at San Antonio (n = 104), and the University of California at San Francisco (n = 74). Subjects from outside the United States were newly arriving foreign university students entering the United States from multiple locations around the world (n = 292). The foreign students would have been present in the United States for no more than 30 days prior to their arrival at the Ohio State University and were sampled during their initial health screening, which occurred within 1 week of their arrival at the university. Pooled subgingival dental plaque samples were collected from each subject. Excess saliva was removed with a cotton roll or gauze pad to minimize the collection of transient contaminating bacteria. A sterile, medium endodontic paper point (Caulk-Dentsply, Milford, DE) was placed in the mesial sulcus of each tooth for 10 s. All teeth present were sampled, and the paper points were pooled in a sterile 2-ml microcentrifuge tube and frozen. Of the 781 samples collected, 549 were positive for P. gingivalis and were used for this study.
The following P. gingivalis strains were used in this study: strain W83 (from Margaret Duncan, Boston, MA), which was used as the control strain; strain 23A4 (from Mike Curtis, London, United Kingdom); and strains W50, A7A1, 381, ATCC 33277, and ATCC 49417 (from Joseph Zambon, Buffalo, NY). All strains were originally isolated from individuals with oral infections. The bacteria were grown on brucella blood agar (Anaerobe Systems) at 37°C in a chamber with an anaerobic atmosphere (85% N2, 10% H2, 5% CO2).
DNA was isolated directly from the paper points, as described previously (27), in a final volume of 20 μl. The samples were analyzed for the presence of P. gingivalis by use of a nested, two-step PCR amplification procedure. The primers and conditions used have been well described previously (20, 27, 28, 33). Heteroduplex analysis was performed as previously well described (28). Briefly, heteroduplexes formed by the annealing of nonidentical sequences were generated by mixing PCR-amplified ISR DNA fragments from the samples with amplified ISR DNA fragments from laboratory/reference strains, denaturing the mixture at 95°C, and then allowing it to cool to 25°C at a rate of 1°C per min in a thermal cycler. For the detection of multiple strains in the samples, PCR-amplified ISR DNA fragments from the samples were heated and cooled to form heteroduplexes in the absence of added reference strain DNA. As the formation of each heteroduplex can result in either one or two new bands, the presence of one or two bands, in addition to the homoduplex band, indicates that two strains are present in the sample, while the presence of three to six bands indicates that three strains are present in the sample. Heteroduplexes were detected by polyacrylamide gel electrophoresis in 1× Tris-borate-EDTA. The gels were stained with ethidium bromide and visualized with UV light. Characteristic migration patterns were used to identify the heteroduplex types present in the samples.
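The band-counting rule for samples run without added reference DNA can be written as a small lookup. The following is only a sketch of the rule as stated above; the function name is hypothetical, and treating a lane with no extra bands as a single strain is an assumption that the text implies but does not state.

```python
def strains_from_extra_bands(extra_bands: int) -> int:
    """Estimate the number of strains from bands seen in addition to the homoduplex band.

    Rule from the text: 1-2 extra bands -> two strains, 3-6 extra bands -> three strains.
    Assumption (not stated in the text): 0 extra bands -> one strain.
    """
    if extra_bands < 0:
        raise ValueError("band count cannot be negative")
    if extra_bands == 0:
        return 1  # assumption: homoduplex band only
    if extra_bands <= 2:
        return 2
    if extra_bands <= 6:
        return 3
    # The text gives no rule beyond six extra bands, so the sketch refuses to guess.
    raise ValueError("more than six extra bands is outside the stated rule")
```

In practice the heteroduplex types themselves were identified from characteristic migration patterns, which this count does not capture.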
Genomic DNA was isolated by using cetyltrimethylammonium bromide and phenol-chloroform-isoamyl alcohol extractions, followed by ethanol precipitation, as described previously (6), and the DNA pellet was dissolved in 10 mM Tris chloride (pH 8).
Whole-genome microarrays provided by the Pathogen Functional Genomics Resource Center (The Institute for Genomic Research [TIGR], Rockville, MD) were used in this study. The microarrays were designed on the basis of the 1,990 W83 open reading frames (ORFs) annotated by TIGR. All 70-mer oligonucleotides were selected from within the ORFs, and no intergenic regions/elements were included in the design. Two ORFs that are less than 70 bp and 2 ORFs with a low level of sequence complexity were excluded from the design. By removing redundancy in the remaining ORFs (duplicate ORFs and ORFs that could not be differentiated by unique 70-mers), 1,907 70-mer oligonucleotides were designed to represent the unique set of remaining ORFs. Five hundred Arabidopsis thaliana 70-mers were used as controls. The oligonucleotides were printed four times onto the surface of aminosilane-coated slides. Additional array information can be viewed at http://pfgrc.jcvi.org/index.php/microarray/array_description/porphyromonas_gingivalis/version1.html.
Test and control (strain W83) DNAs were labeled and hybridized to the microarray slides according to the protocols supplied by TIGR (http://pfgrc.tigr.org/protocols/protocols.shtml). Flip dye replicates were performed for each test strain.
The microarrays were scanned with a GenePix 4000 scanner (Axon Instruments), and the slide images were processed with GenePix Pro (version 6.0) software (Axon Instruments). Spots that could not be identified by the software or by visual inspection, as well as spots with slide abnormalities or hybridization aberrations (dust, scratches, etc.), were excluded from further analysis. For further analysis, the GenePix Pro software output data were uploaded into the Gene Traffic (version 3.2) program (Iobion Informatics). The spots were filtered for a low signal (control signal less than the background or control signal less than 100 units), and the data were normalized by using LOWESS intensity-based normalization. The log2 (test/control [T/C]) ratios were computed and averaged across replicates (within and then between slides). Only genes with at least two valid measurements in each of the flip dye hybridizations were included in the final data set.
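The filtering and averaging steps described above can be illustrated with a minimal sketch. It is not the Gene Traffic implementation: LOWESS normalization is omitted, the function names and the tuple layout are hypothetical, and flooring the test signal at 1.0 to avoid taking the logarithm of zero is an assumption made only for this example.

```python
import math
from statistics import mean

MIN_CONTROL = 100.0  # low-signal threshold for the control channel, as given in the text


def valid_log_ratios(spots):
    """Return log2(test/control) for spots that pass the low-signal filter.

    spots: list of (test, control, background) tuples for one gene on one slide.
    """
    ratios = []
    for test, control, background in spots:
        # Filter from the text: control below background or below 100 units.
        if control < background or control < MIN_CONTROL:
            continue
        # Assumption: floor the test signal at 1.0 so log2 is always defined.
        ratios.append(math.log2(max(test, 1.0) / control))
    return ratios


def gene_log_ratio(slide_a, slide_b, min_valid=2):
    """Average within each flip dye slide, then between slides.

    Returns None when either slide has fewer than min_valid usable spots.
    """
    a = valid_log_ratios(slide_a)
    b = valid_log_ratios(slide_b)
    if len(a) < min_valid or len(b) < min_valid:
        return None
    return mean([mean(a), mean(b)])
```

With the oligonucleotides printed four times per slide, each gene would contribute up to four spots to each of the two lists.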
Forty genes (see Table S1 in the supplemental material) with a range of log2 (T/C) ratios were randomly selected, and their presence or absence in the test strains was confirmed by PCR and Southern blot analyses (14, 22), as follows: PCRs were performed in a final reaction mixture volume of 100 μl. The reaction mixture consisted of 1× PCR buffer (Promega), 3 mM MgCl2 (Promega), 0.2 mM each deoxynucleoside triphosphate (Amersham Biosciences), 0.5 μl Taq polymerase, and 0.3 ng/μl of each primer. The cycling conditions were as follows: 27 cycles of 94°C for 1 min, 52 to 63°C for 1 min, and 72°C for 1 min. Genes that were absent from the test strains by PCR were subjected to further verification by Southern blot hybridization. For Southern analyses, 2 μg of genomic DNA was digested with HhaI, Sau3A1, or RsaI and transferred to Hybond-N+ nylon membranes (Amersham Biosciences) by using a blotter (Turboblotter; Schleicher Schuell), according to the manufacturer's instructions. Digoxigenin-labeled probes were generated from strain W83 genomic DNA with a PCR DIG probe synthesis kit (Roche) and were hybridized to the blots at 50°C in DIG Easy Hyb solution (Roche), according to the manufacturer's instructions. Bound probes were detected with antidigoxigenin antibody conjugated to alkaline phosphatase (Roche) and 5-bromo-4-chloro-3-indolylphosphate-nitroblue tetrazolium substrate, according to the manufacturer's instructions. The results from these analyses were used to estimate the error rates (i.e., the number of genes that were divergent in test strains on the basis of the microarray data but that were confirmed to be nondivergent by PCR and Southern analyses) for potential cutoffs. Once a log2 (T/C) ratio cutoff was selected, the log2 (T/C) ratios of the genes were converted into a binary format by using the constant cutoff analysis tool CCACK (from the laboratory of Stanley Falkow [http://falkow.stanford.edu/whatwedo/software/software.html]). 
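How a candidate cutoff might be scored against the PCR and Southern blot results can be sketched as follows. This is not the CCACK tool. The function names are hypothetical, the example values are invented for illustration and are not data from this study, and calling a gene divergent when its log2 (T/C) ratio is at or below the cutoff is an assumption of the sketch.

```python
def call_divergent(log_ratio, cutoff):
    """Binary call: True (divergent) when the log2 (T/C) ratio is at or below the cutoff."""
    return log_ratio <= cutoff


def mismatch_rate(measurements, cutoff):
    """Fraction of measurements where the array call disagrees with PCR/Southern.

    measurements: list of (log2_ratio, confirmed_divergent) pairs.
    """
    disagreements = sum(
        1 for ratio, confirmed in measurements
        if call_divergent(ratio, cutoff) != confirmed
    )
    return disagreements / len(measurements)


# Invented illustration values, not results from the study.
example = [(-3.0, True), (-2.0, True), (-1.5, False),
           (-1.4, False), (-1.2, True), (-0.1, False)]
for cutoff in (-1.0, -1.76):
    print(cutoff, round(mismatch_rate(example, cutoff), 3))
```

Scanning a list of candidate cutoffs this way and keeping the one with the lowest disagreement is one plausible reading of the selection step; the text does not spell out the exact criterion.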
The locations of divergent genes in the test strains were mapped relative to the sequence of the genome of strain W83 by using the CLUSTER program and were visualized by using the Java Treeview program (http://jtreeview.sourceforge.net/).
Hierarchical clustering of the binary coded data of genes that gave valid results for all test strains (18) was performed by using the JMP statistical program (SAS Institute Inc.). Parsimony analysis was performed by using Paup*4 software (Sinauer Associates, Inc.), as described previously (42).
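The input to such a clustering can be pictured as a matrix of pairwise differences between binary gene profiles, restricted to genes with valid calls in every strain. The sketch below covers only that distance step, not the clustering performed in JMP or the parsimony analysis in PAUP*. The strain and gene labels are placeholders, and counting mismatched calls as the distance is an assumption of the example.

```python
def complete_genes(calls):
    """Genes with a valid (non-None) call in every strain.

    calls: {strain: {gene: 0, 1, or None}}, where 1 marks a divergent gene.
    """
    per_strain = [
        {gene for gene, value in profile.items() if value is not None}
        for profile in calls.values()
    ]
    return sorted(set.intersection(*per_strain))


def distance(calls, a, b, genes):
    """Number of genes whose binary call differs between strains a and b."""
    return sum(calls[a][g] != calls[b][g] for g in genes)


# Placeholder profiles for illustration only.
calls = {
    "strainA": {"gene1": 0, "gene2": 1, "gene3": 1, "gene4": None},
    "strainB": {"gene1": 0, "gene2": 1, "gene3": 0, "gene4": 1},
    "strainC": {"gene1": 1, "gene2": 0, "gene3": 0, "gene4": 1},
}
genes = complete_genes(calls)  # gene4 is dropped: no valid call in strainA
```

Any standard agglomerative method could then be applied to the resulting distances; the text does not state which linkage was used.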
The CGH data from these experiments are deposited under accession number GSE13128 in the Gene Expression Omnibus repository at the National Center for Biotechnology Information.
Heteroduplex typing of P. gingivalis was performed with samples obtained from subjects from three U.S. cities and locations around the world by heteroduplex analysis of the ISR. The subjects were randomly chosen with no correlation to disease status. A total of 781 subjects from 61 countries on six continents were sampled. Samples were screened for the presence of P. gingivalis by PCR, and the heteroduplex types present in the P. gingivalis-positive samples were identified. P. gingivalis was detected by PCR in 549 (70%) of the samples collected. Of the samples in which P. gingivalis was found, 251 (39%) contained multiple heteroduplex types, which were detected and identified as described previously (28).
Table S2 in the supplemental material shows the total heteroduplex type distribution of P. gingivalis by country and region. Figure 1A shows the total count of each P. gingivalis heteroduplex type identified in this study, and Fig. 1B shows the relative abundance of the heteroduplex types of P. gingivalis in sample subpopulations for which we had more than 40 subjects. Twenty different heteroduplex types were found, with the heteroduplex types matching laboratory strains W50 (type hW50; includes strain W83), 23A4 (type h23A4), A7A1 (type hA7A1), 381 (type h381; includes ATCC 33277), HG1691 (type hHG1691), and ATCC 49417 (type h49417) being the most common (see Table S2 in the supplemental material; Fig. 1A). These six major heteroduplex types (prevalence in the entire sample population, ≥5%) were geographically widespread. Three different U.S. populations were sampled (see Table S2 in the supplemental material); and although the same six heteroduplex types were predominant, their relative abundances differed, with hW50 being the most abundant in Columbus, OH, and h381 being the most abundant in San Francisco, CA. Similarly, the relative abundances of the six major heteroduplex types differed when the data were examined by continent (Fig. 1B). An additional 14 rare heteroduplex types (prevalence, <2%) were found at low frequency in one or a few locations (see Table S2 in the supplemental material; Fig. 1B).
The genome contents of seven P. gingivalis heteroduplex type strains were compared by using strain W83 whole-genome microarrays. Competitive hybridizations to microarrays were performed with W83 as the control strain and strains ATCC 49417, W50, 23A4, 381, ATCC 33277, and A7A1 as the test strains. All seven strains are commonly used laboratory strains that were originally isolated from individuals with oral infections. The strains were selected for this study because their heteroduplex types were identified in previous epidemiologic studies (21, 28) as being prevalent in U.S. populations, and we have shown in the current study that they are also geographically widespread.
For the designation of genes as nondivergent or divergent, 40 genes with various log2 (T/C) ratios were randomly selected for further analysis by PCR and Southern blot analyses (14). The presence or absence of the 40 genes (represented by 109 different log2 ratios) in the respective test strain genomes was first confirmed by PCR. The absence of the genes from the test strain genomes, as determined by PCR—which suggests divergence at the primer(s) binding site(s)—was further verified by Southern blot hybridization. Further screening of PCR-negative genes by Southern analysis identified 12 loci at which the target DNA fragments in some test strains were of a size different from the size of the corresponding fragments in the control strain (strain W83) DNA. As the restriction enzymes used to digest the genomes cut within the genes of interest, the different target band sizes in some test strains could be due to the gain/loss of a restriction site due to a sequence divergence, partial gene deletion, insertions, inversions, or other rearrangements. All genes that fell into this category (i.e., absent by PCR and a Southern hybridization pattern different from that of W83), as well as genes that were absent from the test strains by both PCR and Southern blot analyses, were designated divergent. The data were analyzed by using 12 potential log2 (T/C) cutoffs, which ranged from −1 to −2.75, and the error rates were determined for each cutoff by using the results of the PCR and Southern blot screenings.
A log2 (T/C) ratio of −1.76 (test signal, ≤29.5% of the control signal) was selected as a cutoff for the designation of genes as divergent in the test strains. Use of this cutoff gave different answers 10.7% of the time when the results from the microarray versus PCR and Southern blot hybridization analyses were compared. Besides the error inherent in microarray analysis, variation in the sequence matching the 70-mer probe could cause a decrease in hybridization that would not be seen with the longer probes used for Southern analysis or PCR. Application of the selected cutoff to the microarray data resulted in the designation of 133 W83 genes (6.4% of ORFs) as divergent in at least one test strain.
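The correspondence between the log2 cutoff and the percentage of control signal quoted above follows directly from the definition of the ratio. A short check is given here as a sketch, with a hypothetical helper name.

```python
def cutoff_as_percent_of_control(log2_cutoff):
    """Convert a log2 (T/C) cutoff into test signal as a percentage of control signal."""
    return 100.0 * 2.0 ** log2_cutoff


percent = round(cutoff_as_percent_of_control(-1.76), 1)
print(percent)  # 29.5
```

A cutoff of −1 would correspond to 50% of the control signal and a cutoff of −2 to 25%, which gives a sense of how stringent the selected value is.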
Selected characteristics of the strain W83 genes that were divergent in the test strains on the basis of CGH analysis are shown in Table S3 in the supplemental material, and their distribution in each of the test strains (on a W83 genome map) is shown in Fig. 2A. The divergent genes were present at 79 different locations on the genome of W83 and ranged in size from a single gene to 11 genes (PG1473 to PG1483), although not all genes at a given location may be divergent in any given test strain. On the basis of the results of CGH analysis, 12 genes (PG0219, PG0458, PG0742, PG0819, PG0838, PG0841, PG1110, PG1112, PG1471, PG1497, PG1499, and PG2134) were divergent in the genomes of all test strains except strain W50.
The functional classifications of the divergent genes are shown in Fig. 2B. Fifty-five genes (41.3% of divergent genes) encode hypothetical proteins presently found only in P. gingivalis. Eighteen of the genes in the hypothetical group are known to encode expressed proteins (46, 50; A. Progulske-Fox, personal communication), and 26 of them encode proteins that are predicted to have signal peptide sequences or transmembrane domains (Los Alamos National Laboratory [http://www.oralgen.lanl.gov]). The most abundant functional group of divergent genes that code for proteins with known functions is mobile and extrachromosomal elements (28 genes, 21% of divergent genes). The products of the genes in this group are involved in the exchange of DNA among bacteria and may promote the genetic diversification of the species.
Twenty-eight genomic islands have been identified in the genome of strain W83 on the basis of base composition analysis and BLAST taxonomy data (38; http://www.oralgen.lanl.gov). Seventy-five of the divergent genes identified in this study were distributed among 17 of the islands (see Table S3 in the supplemental material). Most notably, 44%, 50%, and 84% of the genes at genomic islands 7 (65 kb, mobilization cluster), 2 (15.3 kb, capsular biosynthesis cluster), and 13 (15.7 kb, Bacteroides conjugative transposon/tra gene cluster), respectively, were divergent in at least one test strain. Figure 3 shows the locations of the divergent genes in these three islands. The divergence of genes at the conjugative transposon/tra gene cluster (Fig. 3C) was observed only in the two closely related strains ATCC 33277 and 381.
Table 1 shows the percentages of the strain W83 ORFs that are divergent in each of the test strains on the basis of microarray analysis. Between 0% and 5.1% of the W83 ORFs were found to be divergent in the test strains. Hierarchical clustering was used to construct a dendrogram on the basis of the divergence of W83 genes in the test strains (Fig. 4A), and a parsimony tree was constructed by using the ribosomal ISR sequences of all seven strains (Fig. 4B). On the basis of the genome content, the test strains fall, according to their relatedness to W83, in the order W50-ATCC 49417-23A4-381 and ATCC 33277-A7A1 (Fig. 4A); and on the basis of ISR sequence analysis, the strains fall, according to their relatedness to W83, in the order W50-ATCC 49417-381-ATCC 33277-A7A1-23A4 (Fig. 4B). Both trees show that the pair of strains that are indistinguishable by heteroduplex analysis (strains ATCC 33277 and 381) (28) and the pair of strains that are indistinguishable by both heteroduplex analysis and ISR sequence analysis (strains W83 and W50) (28, 42) are more similar to each other than to any other strain. The trees show considerable agreement with respect to the relatedness of the strains to one another; however, strain 23A4 is predicted to be more distant from the other strains on the basis of its ISR sequence than it is when the whole genomes are compared.
When heteroduplex analysis was used to examine the genetic variability of P. gingivalis in geographically diverse populations, groups of strains represented by the well-characterized heteroduplex type strains W83 (which includes strain W50), 23A4, A7A1, 381 (which includes strain ATCC 33277), HG1691, and ATCC 49417 were found to be the most prevalent. A previous clinical study demonstrated that these heteroduplex types differ in their levels of association with human disease (21). It is important to note that although the subject population is not a comprehensive survey of the entire world population, the six major heteroduplex types were found not only in the United States but also around the world, making them clinically prevalent. Although the six heteroduplex types were predominant in all three U.S. locations sampled (see Table S2 in the supplemental material), their relative abundances differed. The same was observed when the data were examined by continent (Fig. 1B). The 14 other heteroduplex types were found at low numbers and in only a few locations.
When the genomes of seven clinically prevalent heteroduplex type strains of P. gingivalis were compared, the genome of control strain W83 was largely conserved among the other strains. Between 0% (strain W50) and 5.1% (102 genes, strain ATCC 33277) of the W83 genes were found to be divergent in the test strains. The correlation between the results obtained from the whole-genome comparison and the ISR sequence analysis of the P. gingivalis strains suggests that variation in the ISR sequence is closely representative of the variation within the genome. Both ISR sequence analysis (43) and whole-genome comparisons (this study) predict that strains W50 and ATCC 49417 are the strains that are the most closely related to W83. Strains ATCC 33277 and 381 have previously been unresolvable by several different techniques (7, 12, 16, 17, 31, 34, 42) except ISR sequence analysis (43), and the current study detected a number of differences in the genomes of 381 and ATCC 33277 with respect to which W83 genes were divergent. Strains W50 and W83 have also previously been unresolvable by several different techniques, including ISR sequence analysis (12, 17, 29, 31, 34, 42, 43), and this study found no W83 genes that were divergent in strain W50. Thus, W83 and W50 are very closely related, if not identical, strains. The strong correlation between the results of ISR sequence analysis and those of genome content analysis suggests that ISR sequence analysis can be useful for predicting the relatedness of P. gingivalis strains to one another. Additionally, as the ISR is present in multiple copies (four) at different locations in the P. gingivalis genome, sequence divergence at the locus may be representative of divergence in the genome.
We previously demonstrated a clear association between specific heteroduplex types of P. gingivalis and periodontal disease status (42, 43). Different levels of disease association of different heteroduplex types of P. gingivalis were found, with type hW83 (which includes strain W50) showing the strongest association, followed in order by h49417, hHG1691, h23A4, h381 (includes ATCC 33277), and hA7A1. h23A4, h381, and hA7A1 were not statistically significantly disease associated (21). The relatedness of the two most strongly disease-associated heteroduplex types (hW83 and h49417) and their separation from less disease-associated and non-disease-associated heteroduplex types in an ISR sequence-based phylogenetic tree suggested that there was a link between ISR sequence phylogeny and the disease-associated phenotype of P. gingivalis heteroduplex types (42). The strong correlation between the relatedness of the strains to one another on the basis of ISR sequence and genome content analyses suggests that the amount of divergence seen at the ISR locus is representative of the amount of divergence in the rest of the genome. Together with the observation that the more closely related that a strain is to W83 by genome content (W50-ATCC 49417-23A4-381 and ATCC 33277-A7A1) the higher that the disease association level of its heteroduplex type is, this strongly supports our prior suggestion that there is a correlation between ISR sequence phylogeny and the disease-associated phenotype of P. gingivalis heteroduplex types.
Chen et al. previously compared the genomes of P. gingivalis strains W83 and ATCC 33277 using microarrays (13), and a comparison of those results with the results of the present study shows general agreement but a few differences. The current study found that 5.1% of the W83 genome was divergent in ATCC 33277, while Chen et al. (13) found that ~7% of the W83 genome was divergent in ATCC 33277. Both studies detected a high number of divergent genes in the regions encoding the capsular biosynthesis cluster (PG0106 to PG0120; Fig. 3B) and the mobilization cluster (PG0819 to PG0844) in strain ATCC 33277. The current study detected additional divergent genes throughout the mobilization cluster (which spans from PG0812 to PG0875; see Table S3 in the supplemental material) and the Bacteroides conjugative transposon/tra gene cluster (Fig. 3C) in strain ATCC 33277. The difference in the amount of genome divergence detected in the two studies could be due to the use of different types of arrays (70-mer oligonucleotide arrays in the current study and PCR amplicon arrays in the previous study [13]) or the use of different types of analyses to determine the cutoffs to be used for the detection of divergent ORFs. During the preparation of the manuscript, Naito et al. published the complete genome sequence of strain ATCC 33277 (35), and using a reannotated W83 genome sequence, they performed a comprehensive genomic comparison of the two strains. As expected, they found a larger number of W83 genes that were divergent in strain ATCC 33277 than the number found in the present study or the study by Chen et al. (13).
Using restriction fragment length polymorphism analysis, Aduse-Opoku et al. previously observed significant variation at the capsular biosynthesis cluster (PG0106 to PG0120; Fig. 3B) among the capsular serotypes of P. gingivalis strains (1). This was expected, as the genes encode the biosynthetic machinery for different carbohydrate polymers, and the microarray data from the current study show that a significant number (50%) of the genes at this cluster are divergent in at least one of the test strains, supporting the earlier observations.
The majority of divergent W83 genes that encode mobile and extrachromosomal elements are present at a genomic island (http://www.oralgen.lanl.gov) that is highly homologous to the transfer (tra) region of a Bacteroides self-transmissible conjugative transposon (CTnDOT) (8, 9). The data show that a majority of the genes at this island are divergent in closely related strains 381 and ATCC 33277 (see Table S3 in the supplemental material; Fig. 3C), as has been observed previously (48). Published data suggest that this locus plays a role in the transfer of chromosomal DNA from strain W83 to other P. gingivalis strains (48).
The microarray-based CGH technique used in this study is powerful; however, the results require verification by other independent methods, e.g., PCR or Southern blot analysis. The technique can be used to perform an initial survey to examine the relatedness of strains to one another and to identify potential functional differences. Sequencing of the genomes of more P. gingivalis strains would permit more thorough genomic comparisons to be made. While the CGH approach used in this study did not identify all genes in strain W83 that are divergent in the test strains, enough divergent genes were identified to give a representative picture of the amount of genetic diversity that exists among the strains.
This study has demonstrated that the major heteroduplex types of P. gingivalis are geographically widespread. Other heteroduplex types are limited to a few locations or are present at levels too low to be detected. The genomes of the widespread, clinically prevalent heteroduplex type strains have not previously been comprehensively compared. The findings reported here provide a more complete understanding of the amount of genetic variability that exists within the species and of the relatedness of the strains to one another. The results of this study support the use of ISR sequence analysis and sequencing of other loci for the identification and differentiation of P. gingivalis strains, as well as our previous hypothesis that there is a link between ISR sequence phylogeny and the disease-associated phenotype of P. gingivalis heteroduplex types. It is likely that the genomic differences identified in this study account in part for the interstrain variation in disease-associated phenotype and virulence of P. gingivalis strains. The large number of divergent genes that encode hypothetical proteins are promising candidates for further analysis, as some of them may encode proteins with functions (some of which may be novel) that allow strain W83 to colonize the oral cavity and contribute to disease to a greater extent than other P. gingivalis strains.
We thank Jeremy Wanzer for work done on the heteroduplex typing of strains. We gratefully acknowledge TIGR for providing the microarrays and some protocols. We gratefully acknowledge Herbert Auer (Ohio State University) and Robin Cline (TIGR) for technical assistance and Dave Armbruster and Robert Munson (Ohio State University) for help with data analysis.
This work was supported by grant DE10467 from the National Institutes of Health.
Published ahead of print on 12 August 2009.
†Supplemental material for this article may be found at http://jcm.asm.org/.