The genome of Treponema paraluiscuniculi strain Cuniculi A was compared to the genome of the syphilis spirochete Treponema pallidum subsp. pallidum strain Nichols using DNA microarray hybridization, whole-genome fingerprinting, and DNA sequencing. A DNA microarray of T. pallidum subsp. pallidum Nichols containing all 1,039 predicted open reading frame PCR products was used to identify deletions and major sequence changes in the Cuniculi A genome. Using these approaches, deletions, insertions, and prominent sequence changes were found in 38 gene homologs and six intergenic regions of the Cuniculi A genome when it was compared to the genome of T. pallidum subsp. pallidum Nichols. Most of the observed differences were localized in tpr loci and the vicinity of these loci. In addition, 14 other genes were found to contain frameshift mutations resulting in major changes in protein sequences. Analysis of restriction target sites representing 0.34% of the total genome length and DNA sequencing of three PCR products (0.46% of the total genome length) amplified from Cuniculi A chromosomal regions and comparison to the Nichols genome revealed a sequence similarity of 98.6 to 99.3%. These results are consistent with a close genetic relationship among the T. pallidum strains and subspecies and a strong, but relatively divergent connection between the human and rabbit pathogens.
The genus Treponema comprises five noncultivable species and subspecies showing various degrees of invasiveness and pathogenicity to humans (18). Treponema pallidum subsp. pallidum is the highly invasive causative agent of syphilis and can cause infection of the central nervous system, cardiovascular system, and almost any other tissue. T. pallidum subsp. pertenue and T. pallidum subsp. endemicum are moderately invasive pathogens that cause yaws and endemic syphilis (bejel), respectively; they cause lesions in skin and bone but rarely affect other internal organs. Treponema carateum is the causative agent of the noninvasive human disease pinta (25), and Treponema paraluiscuniculi is not infectious to humans (10) but causes venereal spirochetosis in rabbits.
The T. pallidum subspecies and T. paraluiscuniculi cannot be distinguished by morphology, protein content, or physiology (12, 17), suggesting that they are closely related. Serum from rabbits infected with T. paraluiscuniculi cross-reacted with 21 of 22 proteins recognized by rabbit antibodies raised against T. pallidum subsp. pallidum (1). However, in rabbits (which are susceptible to both T. pallidum subsp. pallidum and T. paraluiscuniculi infection) there is no immunological cross-protection against these species (23, 25). In addition to a lack of cross-immunity, these bacterial species differ in their host specificity and the clinical manifestations of the diseases that they cause. Human syphilis is a sexually transmitted disease characterized by infection of a wide spectrum of tissues and organs, multiple stages, persistent infection for years to decades, and various clinical manifestations (18), whereas rabbit venereal spirochetosis is characterized by genital lesions (17). These findings suggest that there are important differences between the two species in terms of antigens and virulence factor expression. Genetic differences between T. pallidum subsp. pallidum and T. paraluiscuniculi must account for the observed differences in immunity, host specificity, and clinical manifestations.
Neither T. pallidum subsp. pallidum nor T. paraluiscuniculi has been cultured continuously in vitro, and this fact prevents the use of common molecular genetic approaches to study these pathogens. Sequencing and in silico analysis of the T. pallidum subsp. pallidum Nichols genome (8, 26) allowed comparison of these genomes by use of comparative genomics methods.
In this study we compared the genomes of T. pallidum subsp. pallidum Nichols and T. paraluiscuniculi Cuniculi A using DNA microarray hybridization, whole-genome fingerprinting (WGF), and sequencing of chromosomal regions.
T. pallidum subsp. pallidum Nichols and T. paraluiscuniculi Cuniculi A were maintained by rabbit inoculation and purified by Hypaque gradient centrifugation as described previously (2, 8). Chromosomal DNA was prepared as described by Fraser et al. (8).
Preparations of T. pallidum subsp. pallidum and T. paraluiscuniculi chromosomal DNA (0.25 to 0.75 μg) were labeled fluorescently using the Klenow enzyme (New England Biolabs, Ipswich, MA) and random nonamers with a CyScribe First-Strand cDNA labeling kit (Amersham Pharmacia Biotech, Piscataway, NJ) according to the protocol described previously (24). Microarrays containing PCR products representing the 1,039 T. pallidum subsp. pallidum Nichols open reading frames (ORFs) were prepared as described by Šmajs et al. (24). The pretreated slides (24) were hybridized simultaneously with labeled DNA using the CyScribe First-Strand cDNA labeling kit (Amersham Pharmacia Biotech). Quantitation of hybridization, exclusion of outliers, and data normalization were performed using the TIGR Spotfinder and TIGR MIDAS software (21). Combining the results of four independent experiments, including dye swapping in two separate hybridizations, yielded 12 possible values for each gene. From these data points, average signal ratios (ASR) and standard deviations were calculated. From these data, a set of genes with mean ASR of labeled T. paraluiscuniculi Cuniculi A DNA to labeled T. pallidum subsp. pallidum Nichols DNA (ASRCuniculi A/Nichols) less than 0.7 (average log2 ratio less than −0.51) was derived. This set comprised 22 genes that are likely to contain deletions and/or major sequence changes. No genes with a mean ASR greater than 1.43 (average log2 ratio greater than 0.51) were identified.
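The thresholding step described above can be sketched as follows. This is a minimal illustration, not the TIGR Spotfinder or MIDAS pipeline: the function name `summarize_gene` and the replicate ratios are hypothetical, and the sketch takes the arithmetic mean of the ratios as a simplification. Only the cutoffs 0.7 and 1.43 come from the text.

```python
import math
from statistics import mean, stdev

# Cutoffs quoted in the text; they are approximately reciprocal (1 / 0.7 ~ 1.43).
LOW_ASR = 0.7
HIGH_ASR = 1.43


def summarize_gene(ratios):
    """Return (mean ratio, standard deviation, call) for one gene.

    `ratios` holds the normalized Cuniculi A / Nichols signal ratios
    from the replicate measurements (up to 12 per gene in the study).
    """
    asr = mean(ratios)
    sd = stdev(ratios) if len(ratios) > 1 else 0.0
    if asr < LOW_ASR:
        call = "low"      # candidate deletion or major sequence change
    elif asr > HIGH_ASR:
        call = "high"
    else:
        call = "similar"
    return asr, sd, call


# Hypothetical replicate ratios for a single gene (not data from the study).
example_ratios = [0.45, 0.52, 0.48, 0.50, 0.47, 0.55,
                  0.49, 0.51, 0.46, 0.53, 0.50, 0.54]
asr, sd, call = summarize_gene(example_ratios)
print(f"ASR = {asr:.2f}, SD = {sd:.3f}, call = {call}")

# The log2 values of the cutoffs are close to -0.51 and +0.51.
print(f"log2(0.7) = {math.log2(LOW_ASR):.3f}, log2(1.43) = {math.log2(HIGH_ASR):.3f}")
```

A gene is flagged only on its mean ratio here; a fuller treatment would also consider the spread across replicates before calling a difference.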
WGF was performed as described previously (27). The chromosomal DNA was amplified in 97 overlapping regions with a median length of 12,307 bp (range, 1,778 to 24,758 bp) using a GeneAmp XL PCR kit (Applied Biosystems, Foster City, CA). The primer pairs used for these amplifications are shown in Table S1 in the supplemental material. Each PCR product was digested with BamHI, EcoRI, or HindIII or combinations of these enzymes. To thoroughly assess the possible presence of deletions and insertions in restriction fragments, additional digestions were performed as needed to reduce the length of each restriction fragment to ≤4 kb. This was achieved by additional digestion with AccI, ClaI, EcoRV, KpnI, MluI, NcoI, NheI, RsrII, SacI, SpeI, XbaI, or XhoI (NEB) or combinations of these enzymes. The resulting fingerprints for T. pallidum subsp. pallidum Nichols were compared to those for the T. paraluiscuniculi Cuniculi A genome.
Standard methods were used for PCR amplification from a chromosomal DNA template and agarose gel electrophoresis (22). For sequencing of PCR products, XL PCR was used to minimize the number of PCR errors. Oligonucleotide primers were designed with Primer3 software (20). The resulting PCR products were purified using a QIAquick PCR purification kit (QIAGEN) and were sequenced using a Taq DyeDeoxy terminator cycle sequencing kit (Applied Biosystems). Complete sequences of amplified regions were finished using specifically designed synthetic oligonucleotides as primers. Computer-assisted sequence analysis was performed using the LASERGENE program package (DNASTAR, Madison, WI). Three XL PCR products comprising regions TPI12, TPI25A, and TPI25B (see Table S1 in the supplemental material) were purified and subjected to mechanical shearing to obtain smaller fragments (0.5 to 1 kb) that were cloned into the pUC18 vector. The resulting recombinant plasmids (96 plasmids for each XL PCR product) of the small insert library were isolated and sequenced using forward and reverse pUC18 primers to obtain multiple coverage (i.e., 2 × 96 sequencing reactions per XL PCR product).
The nucleotide sequences reported in this study have been deposited in the GenBank database under accession numbers EF057750, EF137736 to EF137743, and EF419245 to EF419253.
In this analysis, we used a T. pallidum subsp. pallidum Nichols DNA microarray containing PCR products corresponding to all 1,039 predicted ORFs (24). T. pallidum subsp. pallidum Nichols and T. paraluiscuniculi Cuniculi A chromosomal DNA were labeled with the Cy3 and Cy5 dyes and hybridized simultaneously on individual arrays. The hybridizations were performed four times, including dye swapping in two hybridizations, resulting in a total of 12 hybridizations for each ORF. Hybridization of labeled T. pallidum subsp. pallidum Nichols DNA probes yielded a detectable signal in ≤6 of the 12 reactions for 11 ORFs (TP0161, TP0224, TP0490, TP0573, TP0645, TP0753, TP0777, TP0795, TP0818, TP0932, and TP1032), and therefore analysis of these ORFs was not performed. All of these ORFs are relatively short (93, 105, 189, 93, 177, 285, 225, 159, 153, 93, and 432 bp, respectively) and code for a conserved hypothetical protein (TP0490) or hypothetical proteins (TP0161, TP0224, TP0573, TP0645, TP0753, TP0777, TP0795, TP0818, TP0932, and TP1032). Thus, data were calculated for 1,028 of 1,039 genes (99%) by determining the ASRCuniculi A/Nichols representing the average, normalized ratio of T. paraluiscuniculi Cuniculi A DNA fluorescent signals to T. pallidum subsp. pallidum Nichols DNA fluorescent signals for replicate spots on each microarray and in replicate experiments. A value of 1.0 corresponded to the mean signal for all genes of the array.
The results of the DNA microarray hybridizations are shown in Table 1. Use of the Cuniculi A probe yielded significantly lower signals for 22 genes, with ASRCuniculi A/Nichols ranging from 0.14 to 0.7 (Table 1). These genes were not randomly distributed throughout the genome (Fig. 1) and tended to be clustered in regions containing tpr genes and genes in the vicinity of tpr genes. All but four putative genes (TP0128, TP0129, TP0896, and TP0970) belonged to paralogous gene family 2 (PGF2), PGF14, and PGF15. PGF2 represents tpr genes encoding Tpr proteins, which are T. pallidum-specific proteins of unknown function with similarity to the Treponema denticola membrane protein Msp (7). Eight tpr genes (tprC, tprD, tprF, tprG, tprI, tprJ, tprK, and tprL) had significantly lower ASRCuniculi A/Nichols values, indicating that there were deletions or sequence diversity in the T. paraluiscuniculi Cuniculi A genes. In contrast, signals for the tprA, tprB, tprE, and tprH genes were similar in the two genomes, indicating that these genes are present in the Cuniculi A genome and that the Cuniculi A genes are highly homologous to their Nichols counterparts. Genes belonging to PGF14 and PGF15 encode hypothetical proteins with unknown functions and are located in the vicinity of tpr genes. The ASRCuniculi A/Nichols values for PGF14 and PGF15 are considerably lower (range, 0.14 to 0.56) than the ASRCuniculi A/Nichols values for tpr genes (0.49 to 0.68). The average lengths of the tpr genes and the PGF14 and PGF15 genes listed in Table 1 were 1,779 bp and 658 bp, respectively, so the PCR products used in the microarray were longer for the tpr genes. Therefore, this approach may be less sensitive to detection of sequence changes when larger genes are examined.
The genomes of T. pallidum subsp. pallidum strain Nichols and T. paraluiscuniculi strain Cuniculi A were analyzed using 97 overlapping XL PCR amplicons covering the entire treponemal genomes. Restriction mapping of 97 XL PCR products revealed 20 chromosomal regions (in 18 TPI intervals) with detectable insertions or deletions (indels) in the Cuniculi A genome (Table 2). Subsequent sequencing of heterologous parts of these TPI intervals revealed 10 deletions ranging from 15 to 2,609 bp and eight insertions ranging from 41 to 885 bp, as well as two regions (TPI12-13 and TPI66A) with both deletions and insertions. In addition, eight genes were found to contain small indels and multiple single nucleotide polymorphisms (SNPs) (TP0133, TP0136, TP0137, TP0315, TP0548, TP0617, TP0618, and TP1031 [Table 3]), and three tpr genes contained multiple SNPs and single nucleotide indels that resulted in frameshift gene mutations (TP0131, TP0313, and TP0621 [Table 3]). The largest deletions comprised TP0127 to TP0129, TP0133 to TP0135, tprF (TP0316) and tprG (TP0317), and TP0619 and tprI (TP0620). Deletions in the tprF and tprG genes (Table 2) and a frameshift mutation in tprG (Table 3) resulted in a new ORF encoding a 429-amino-acid TprI-like protein. A similar finding in the T. paraluiscuniculi Cuniculi A genome was described by Giacani et al. (9). The largest insertion was localized in the intergenic region (IGR) between the TP0126 and TP0129 loci (the TP0127 and TP0128 genes are missing in the Cuniculi A genome) and included 1,451 bp containing three ORFs showing similarity to the tprK sequence. Another large insertion (885 bp) comprising 14 complete repetitions (length, 60 bp) and one incomplete repetition (45 bp) was found in overlapping parts of the TP0433 and TP0434 genes. The expansion of repetitive sequences was shown by gel electrophoresis and draft sequencing of tandem repeats (data not shown).
Because of the tandem repeats, this region was not sequenced completely, and the number of repeats was estimated based on gel electrophoresis. The described insertion resulted in fusion of the two genes (TP0433 and TP0434) to form a single acidic repeat protein gene (arp) (19). Except for deletions of TP0104 (encoding a 5′ nucleotidase), the tprF, tprG, and tprI genes, and TP0545 (encoding MglB-1, a methylgalactoside ABC transporter, periplasmic galactose-binding protein), all other indels were localized in genes encoding hypothetical proteins or in intergenic regions.
Taken together, the indels identified resulted in complete deletion of five genes (TP0127, TP0128, TP0134, TP0619, and TP0620), 13 partial gene deletions (TP0104, TP0129, TP0135, TP0136, TP0315, TP0316, TP0317, TP0470, TP0545, TP0733, TP0865, TP0967, and TP1029), seven gene elongations (TP0133, TP0487, TP0548, TP0616, TP0860, TP0923, and TP1031), two gene fusions (TP0433-TP0434 and TP0617-TP0618), two deletions in IGR (TP0135-TP0136 and TP0545-TP0546), and four insertions in IGR (TP0009-TP0010, TP0126-TP0129, TP0135-TP0136, and TP0548-TP0549). In addition, 14 frameshifts were identified in TP0126, TP0131, TP0132, TP0309, TP0310, TP0311, TP0313, TP0317, TP0318, TP0487, TP0621, TP0922, TP0968, and TP1030, leading to premature termination of 10 genes, elongation of three genes, and a reading frame change in one gene (Table 3).
Hypothetical proteins were characterized by searching the InterPro and Pfam databases and constructing hydrophobicity plots. Of the 25 hypothetical genes described in this study (Tables 1 to 3), 17 were completely sequenced in both the Cuniculi A and Nichols strains. The corresponding 17 hypothetical proteins were analyzed to predict cellular localization. Signal sequences were predicted for six Nichols proteins (encoded by TP0133, TP0134, TP0135, TP0136, TP0548, and TP0733) and five Cuniculi A proteins (encoded by TP0315, TP0470, TP0548, fused genes TP0617 and TP0618, and TP0733). In both strains, transmembrane regions were predicted for TP0733. Three putative protein domains (UPF0164, TPR, and DbpA) were found in five hypothetical proteins (encoded by TP0470, TP0548, TP0860, TP0865, and TP1029). No differences between the Nichols and Cuniculi A strains were found in domain distribution.
To determine the level of sequence identity between the Nichols and Cuniculi A genomes, three chromosomal regions that also included IGR were sequenced. Analysis of these regions (5,289 bp; 0.46% of the genome), comprising genes TP0798 to TP0800 (accession number EF419251), TP0933 and TP0934 (accession number EF419252), and TP0961 and TP0962 (accession number EF419253), revealed 37 SNPs that resulted in 11 amino acid substitutions in the corresponding proteins. The average density of SNPs represented one nucleotide change per 143 bp (99.3% identity).
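The SNP density and identity figures above follow from simple arithmetic, sketched here. The region length and SNP count are taken from the text; the Nichols genome length of 1,138,006 bp is an assumption drawn from the published Nichols genome sequence and is not stated in this section.

```python
# Values reported in the text.
SEQUENCED_BP = 5289   # total length of the three sequenced regions
SNP_COUNT = 37        # SNPs found in those regions

# Assumption: Nichols genome length from the published genome sequence;
# this number does not appear in the section above.
NICHOLS_GENOME_BP = 1_138_006

bp_per_snp = SEQUENCED_BP / SNP_COUNT
identity_pct = 100 * (1 - SNP_COUNT / SEQUENCED_BP)
genome_fraction_pct = 100 * SEQUENCED_BP / NICHOLS_GENOME_BP

print(f"one SNP per {bp_per_snp:.0f} bp")
print(f"identity: {identity_pct:.1f}%")
print(f"fraction of genome sequenced: {genome_fraction_pct:.2f}%")
```

The identity computed this way counts only substitutions in the sequenced regions, so it says nothing about indels or about regions that were not sampled.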
Using both DNA microarray and WGF approaches, deletions, insertions, and prominent sequence changes in 38 T. paraluiscuniculi Cuniculi A gene homologs were found when genes of this strain were compared to annotated genes of T. pallidum subsp. pallidum Nichols (8). In addition, 14 genes were found to contain frameshift mutations suggesting inactivation or changed functions of the genes. DNA microarray hybridization of labeled chromosomal DNA revealed 22 ORFs (predicted genes) with a lower signal for the T. paraluiscuniculi Cuniculi A genome than for the Nichols genome, indicating possible deletions or sequence diversity of the corresponding chromosomal loci. An alternative approach, WGF with subsequent sequencing, revealed 20 chromosomal regions (in 18 TPI intervals) with indels larger than 10 bp in the coding regions (22 genes with detectable indels), and six indels were found in the intergenic regions of the T. paraluiscuniculi Cuniculi A genome. An additional 11 genes were found to contain small indels and multiple SNPs (TP0131, TP0133, TP0136, TP0137, TP0313, TP0315, TP0548, TP0617, TP0618, TP0621, and TP1031). Seventeen of the gene deletions or sequentially diverse genes (i.e., genes with multiple nucleotide changes) were detected both by DNA microarray analysis (17 of 22 gene deletions and sequentially diverse genes) and by WGF (17 of 27 gene deletions and sequentially diverse genes) (Fig. 1). Compared to the DNA microarray approach, WGF detected 10 additional deletions or sequentially diverse genes (TP0104, TP0133, TP0313, TP0470, TP0545, TP0548, TP0733, TP0865, TP0967, and TP1029). With the exception of TP0133 and TP0470, relatively small deletions (i.e., deletions ranging from 0.74 to 8.58% of the total gene length) were detected by WGF and missed by DNA microarray hybridization. It is likely that such deletions cannot be detected by DNA microarray hybridization under the conditions used.
The sequences present in the TP0136 locus in the Cuniculi A genome are similar to the Nichols TP0133 sequences. Therefore, this false-negative result for TP0133 obtained with the Cuniculi A DNA microarray was likely due to DNA cross-hybridization of labeled TP0136 DNA. The deleted region of TP0470 (23.8% of the gene length) comprises a chromosomal region containing tandem repetitions (length, 24 nucleotides [nt]); i.e., the deletion resulted in a decreased number of tandem repetitions. This region also showed interstrain genetic heterogeneity within T. pallidum strains (data not shown). In this locus, PCR products of variable lengths were also observed after amplification from the Cuniculi A DNA, suggesting possible intrastrain heterogeneity or PCR artifacts. Populations of spirochetes containing different numbers of tandem repetitions may distort the results of DNA microarray hybridization analyses.
Compared to the WGF results, the DNA microarray approach identified five additional genes (TP0117, TP0462, TP0896, TP0897, and TP0970) with lower hybridization signals. Two of these genes belonged to PGF2, one belonged to PGF15, and two were unique (TP0896 and TP0970). Sequence diversity of these genes was identified as the reason for the lower hybridization signals on the DNA microarray. In these genes, the sequence diversity was dispersed throughout the entire genes and thus had the potential to affect hybridization to a DNA microarray (P. Matějková, unpublished results). DNA microarray and WGF approaches thus represent complementary methods; DNA microarray analysis allows selective detection of diverse chromosomal regions, and WGF allows selective identification of insertions within the genes and indels in intergenic regions.
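The counts quoted for the two methods can be cross-checked with a short consistency sketch. The two gene lists are copied from the text; the 17 genes detected by both methods are not enumerated in this section, so only their count is used, and the variable names are illustrative.

```python
# Genes reported as detected by WGF but missed by the microarray.
wgf_only = ["TP0104", "TP0133", "TP0313", "TP0470", "TP0545",
            "TP0548", "TP0733", "TP0865", "TP0967", "TP1029"]

# Genes reported as detected by the microarray but not by WGF.
array_only = ["TP0117", "TP0462", "TP0896", "TP0897", "TP0970"]

# Number of genes detected by both methods (the list itself is not
# given in the text, so only the count is available here).
SHARED = 17

array_total = SHARED + len(array_only)
wgf_total = SHARED + len(wgf_only)

print(f"microarray: {array_total} genes, {100 * SHARED / array_total:.0f}% also found by WGF")
print(f"WGF: {wgf_total} genes, {100 * SHARED / wgf_total:.0f}% also found by microarray")
```

Both totals match the figures given in the text (22 and 27), which supports reading the two methods as partly overlapping and partly complementary.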
Several of the observed indels and sequence changes were identified in the family of tpr genes (in 8 of 12 tpr genes). The T. pallidum repeat (tpr) genes encode paralogous proteins with sequence similarity to the major outer sheath protein (Msp) of T. denticola (7). The tpr genes are specific for T. pallidum and T. paraluiscuniculi, and several of them show heterogeneity both within and between the T. pallidum subspecies and strains examined (3, 4, 5). It is believed that the Tpr proteins are involved in pathogenesis and/or immune evasion. The TprK protein was found to induce a strong humoral and cellular immune response (3, 14, 15), and variable regions of TprK are responsible for the specificity of the antibody response (16). Moreover, sequences of variable regions of TprK change during infection and passage of T. pallidum subsp. pallidum strains (6) by a gene conversion mechanism with donor sites in the vicinity of tpr genes (e.g., in TP0137 and in TP0126 to TP0130). Thus, some of the observed genetic differences in the tprK locus of the T. paraluiscuniculi genome may also be due to this gene conversion mechanism. In addition, three new ORFs in the T. paraluiscuniculi genome with tprK-like sequences were identified.
With the exception of tpr genes, the TP0104 gene (5′ nucleotidase), and the TP0545 gene (periplasmic galactose-binding protein), all other detected indels or sequence changes were localized in the genes encoding a conserved hypothetical protein (TP0470) or hypothetical proteins. The average transcription rate of these genes in T. pallidum subsp. pallidum cultivated in rabbit testes is considerably higher (1.74) than the average transcription rate of all genes of T. pallidum subsp. pallidum Nichols (1.0) (24). In addition, 8 of 29 (27.6%) of the proteins encoded by these genes were found to be recognized by serum antibodies derived from rabbits 84 days after infection with the Nichols strain (13). Both of these findings indicate that several of the putative genes identified are transcribed and translated and suggest that these T. pallidum subsp. pallidum genes are important during infection of rabbits. Most of these genes (17 of 29) were localized in the vicinity of tpr genes. Insertions identified in the T. paraluiscuniculi genome indicated that the sequences were tprK-like, tprA or tprB sequences, or unique sequences with no homologous sequences identified by the BLAST search. Deletion of the signal sequence peptide in MglB-1 encoded by TP0545 in the Cuniculi A strain may result in aborted export of this protein to the periplasm.
Seventeen hypothetical proteins were analyzed to predict cellular localization. Signal sequences were predicted in six and five proteins encoded in the Nichols and Cuniculi A genomes, respectively. Except for two hypothetical proteins (TP0548 and TP0733), signal sequences were predicted for different Nichols and Cuniculi A proteins. Possible localization of these proteins outside the cytoplasm may contribute to the different host ranges and pathogenicities of the Nichols and Cuniculi A strains.
A portion of the TPI12 region of the T. paraluiscuniculi genome sequenced in this study was nearly identical to a previously sequenced 2,792-nt region (accession number AY685237) comprising a nonfunctional tprD2 gene (9). Differences in 9 nt were found. Other regions of near identity with previously sequenced regions (9) were found in TPI2 and the accession number AY685232 sequence (tprA, 1,003 nt), in TPI2 and the accession number AY685233 sequence (tprB, 838 nt), in TPI25A and the accession number AY685239 sequence (nonfunctional tprG1, 3,255 nt), in TPI25B and the accession number AY685238 sequence (nonfunctional tprG and tprI, 2,449 nt), in TPI48 and the accession number AY685240 sequence (nonfunctional tprG2, 3,018 nt), and in TPI77 and the accession number AY685235 sequence (tprL, 1,331 nt). Within these regions, two, zero, seven, four, nine, and three nucleotide differences were found, respectively. These results could reflect differences accumulated in the Cuniculi A genome during independent cultivation in different laboratories; they potentially could also be due to PCR errors. It was previously shown that in the tprK locus (TP0897), sequence changes occurred during infection and passage of T. pallidum subsp. pallidum strain Chicago (6).
Altogether, 639 target restriction sites (representing 3.8 kb of the genomic sequence or 0.34% of the Nichols genome) in the Cuniculi A genome were analyzed with three enzymes (BamHI, HindIII, and EcoRI). Assuming that the majority of additional or missing restriction target sites were due to single nucleotide changes, the sequence similarity of the Cuniculi A and Nichols genomes could be predicted to be 98.6%. Sequencing of three chromosomal regions representing 0.46% of the Cuniculi A genome revealed a sequence identity of 99.3%. However, the latter result is a rather high estimate of the sequence identity because the value could be distorted by a number of factors, including nonrandom distribution of sequenced DNA and the fact that the sequentially divergent regions in the Cuniculi A strain appear to be localized in certain chromosome regions.
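The restriction-site figures can be reproduced with the sketch below, under explicitly stated assumptions. BamHI, HindIII, and EcoRI each recognize a 6-bp site, which gives the 3.8 kb total. The genome length of 1,138,006 bp is taken from the published Nichols genome sequence rather than from this section. The function `similarity_pct` is a hypothetical helper encoding the text's assumption of one nucleotide change per altered site; the number of altered sites is not reported here, so the final value is only a back-calculation from the stated 98.6%.

```python
SITES_ANALYZED = 639   # restriction target sites, from the text
SITE_LENGTH_BP = 6     # BamHI, HindIII, and EcoRI recognize 6-bp sites

# Assumption: Nichols genome length from the published genome sequence.
NICHOLS_GENOME_BP = 1_138_006

covered_bp = SITES_ANALYZED * SITE_LENGTH_BP
genome_pct = 100 * covered_bp / NICHOLS_GENOME_BP


def similarity_pct(changed_sites, sites=SITES_ANALYZED, site_len=SITE_LENGTH_BP):
    """Estimate sequence similarity, assuming each gained or lost site
    reflects exactly one nucleotide change (the assumption in the text)."""
    return 100 * (1 - changed_sites / (sites * site_len))


print(f"sequence sampled by sites: {covered_bp} bp ({genome_pct:.2f}% of genome)")

# Back-calculation only: the count of changed sites implied by 98.6%
# similarity under this model. It is not a value reported in the text.
implied_changed_sites = (1 - 0.986) * covered_bp
print(f"implied changed sites: about {implied_changed_sites:.0f}")
```

Because a site can be lost through more than one substitution or through an indel, this model gives at best a rough estimate of the true similarity.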
The data presented indicate that the genomes of T. pallidum subsp. pallidum and T. paraluiscuniculi are very closely related and that most of the observed differences are localized in tpr loci and in the vicinity of these loci, suggesting their possible role in the host range and pathogenicity of T. pallidum subsp. pallidum. The high degree of sequence similarity of the genomes tested could be used for planning an optimal genome sequencing strategy. In further studies, the high level of relatedness of the T. pallidum subsp. pallidum, T. pallidum subsp. pertenue, and T. paraluiscuniculi genomes could be used for identifying and deciphering T. pallidum subsp. pallidum virulence determinants.
We thank S. Lukehart for providing the T. paraluiscuniculi Cuniculi A strain.
This work was supported by Public Health Service grants to G.M.W. (grants R01 DE12488 and R01 DE13759) and S.J.N. (grants R01 AI49252 and R03 AI69107) and by grants 310/04/0021 and 310/07/0321 from the Grant Agency of the Czech Republic, grant NR8967-4/2006 from the Ministry of Health of the Czech Republic, and grant VZ MSM0021622415 from the Ministry of Education of the Czech Republic to D.S.
Editor: A. Camilli
Published ahead of print on 24 September 2007.
†Supplemental material for this article may be found at http://iai.asm.org/.