Using both DNA microarray and WGF approaches, deletions, insertions, and prominent sequence changes were found in 38 T. paraluiscuniculi Cuniculi A gene homologs when the genes of this strain were compared to the annotated genes of T. pallidum subsp. pallidum Nichols (8). In addition, 14 genes were found to contain frameshift mutations, suggesting inactivation or altered function of these genes. DNA microarray hybridization of labeled chromosomal DNA revealed 22 ORFs (predicted genes) with a lower signal for the T. paraluiscuniculi Cuniculi A genome than for the Nichols genome, indicating possible deletions or sequence diversity of the corresponding chromosomal loci. An alternative approach, WGF with subsequent sequencing, revealed 20 chromosomal regions (in 18 TPI intervals) with indels larger than 10 bp in the coding regions (22 genes with detectable indels), and six indels were found in the intergenic regions of the T. paraluiscuniculi Cuniculi A genome. An additional 11 genes were found to contain small indels and multiple SNPs (TP0131, TP0133, TP0136, TP0137, TP0313, TP0315, TP0548, TP0617, TP0618, TP0621, and TP1031). Seventeen deleted or sequence-diverse genes (i.e., genes with multiple nucleotide changes) were detected by both methods, accounting for 17 of the 22 such genes found by DNA microarray analysis and 17 of the 27 found by WGF (Fig. ). Compared to the DNA microarray approach, WGF detected 10 additional deleted or sequence-diverse genes (TP0104, TP0133, TP0313, TP0470, TP0545, TP0548, TP0733, TP0865, TP0967, and TP1029). With the exception of TP0133 and TP0470, the deletions detected by WGF and missed by DNA microarray hybridization were relatively small (ranging from 0.74 to 8.58% of the total gene length). It is likely that such small deletions cannot be detected by DNA microarray hybridization under the conditions used. The sequences present in the TP0136 locus in the Cuniculi A genome are similar to the Nichols TP0133 sequences. Therefore, the false-negative result for TP0133 obtained with the Cuniculi A DNA microarray was likely due to cross-hybridization of labeled TP0136 DNA. The deleted region of TP0470 (23.8% of the gene length) lies in a chromosomal region containing tandem repeats (length, 24 nucleotides [nt]); i.e., the deletion decreased the number of tandem repeats. This region also showed interstrain genetic heterogeneity among T. pallidum strains (data not shown). In this locus, PCR products of variable lengths were also observed after amplification from Cuniculi A DNA, suggesting possible intrastrain heterogeneity or PCR artifacts. Populations of spirochetes containing different numbers of tandem repeats may distort the results of DNA microarray hybridization analyses.
Compared to the WGF results, the DNA microarray approach identified five additional genes (TP0117, TP0462, TP0896, TP0897, and TP0970) with lower hybridization signals. Two of these genes belong to PGF2, one belongs to PGF15, and two are unique (TP0896 and TP0970). Sequence diversity was identified as the reason for the lower hybridization signals on the DNA microarray: in these genes, the sequence changes were dispersed throughout the entire gene and thus had the potential to affect hybridization to a DNA microarray (P. Matějková, unpublished results). The DNA microarray and WGF approaches are thus complementary; DNA microarray analysis allows selective detection of diverse chromosomal regions, whereas WGF allows selective identification of insertions within genes and of indels in intergenic regions.
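The overlap between the two methods can be tallied with simple set arithmetic. The following Python sketch is illustrative only: the 17 genes detected by both methods are not listed individually in this section, so they are represented here by a count, and all variable names are hypothetical.

```python
# Genes detected only by WGF (as listed in the text).
wgf_only = {"TP0104", "TP0133", "TP0313", "TP0470", "TP0545",
            "TP0548", "TP0733", "TP0865", "TP0967", "TP1029"}

# Genes detected only by DNA microarray hybridization (as listed in the text).
array_only = {"TP0117", "TP0462", "TP0896", "TP0897", "TP0970"}

# Genes detected by both methods; only the count is given in the text.
n_shared = 17

# A gene cannot be exclusive to both methods at once.
assert wgf_only.isdisjoint(array_only)

n_wgf = n_shared + len(wgf_only)      # all genes flagged by WGF
n_array = n_shared + len(array_only)  # all genes flagged by the microarray

print(f"WGF total: {n_wgf}, microarray total: {n_array}")
print(f"shared fraction of microarray hits: {n_shared / n_array:.1%}")
print(f"shared fraction of WGF hits: {n_shared / n_wgf:.1%}")
```

With the gene lists above, the totals come out to 27 genes for WGF and 22 for the DNA microarray, matching the counts given in the text.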
Several of the observed indels and sequence changes were identified in the family of tpr genes (in 8 of 12 tpr genes). The T. pallidum repeat (tpr) genes encode paralogous proteins with sequence similarity to the major outer sheath protein (Msp) of T. denticola (7). The tpr genes are specific for T. pallidum and T. paraluiscuniculi, and several of them show heterogeneity both within and between the T. pallidum subspecies and strains examined (3, 4, 5). It is believed that the Tpr proteins are involved in pathogenesis and/or immune evasion. The TprK protein was found to induce a strong humoral and cellular immune response (3, 14, 15), and variable regions of TprK are responsible for the specificity of the antibody response (16). Moreover, sequences of variable regions of TprK change during infection and passage of T. pallidum subsp. pallidum strains (6) by a gene conversion mechanism with donor sites in the vicinity of tpr genes (e.g., in TP0137 and in TP0126 to TP0130). Thus, some of the observed genetic differences in the tprK locus of the T. paraluiscuniculi genome may also be due to this gene conversion mechanism. In addition, three new ORFs in the T. paraluiscuniculi genome with tprK-like sequences were identified.
With the exception of the tpr genes, the TP0104 gene (5′ nucleotidase), and the TP0545 gene (periplasmic galactose-binding protein), all detected indels or sequence changes were localized in genes encoding a conserved hypothetical protein (TP0470) or hypothetical proteins. The average transcription rate of these genes in T. pallidum subsp. pallidum cultivated in rabbit testes is considerably higher (1.74) than the average transcription rate of all genes of T. pallidum subsp. pallidum Nichols (1.0) (24). In addition, 8 of the 29 proteins (27.6%) encoded by these genes were found to be recognized by serum antibodies derived from rabbits 84 days after infection with the Nichols strain (13). Both findings indicate that several of the putative genes identified are transcribed and translated and suggest that these T. pallidum subsp. pallidum genes are important during infection of rabbits. Most of these genes (17 of 29) were localized in the vicinity of tpr genes. The insertions identified in the T. paraluiscuniculi genome were tprK-like sequences, tprA or tprB sequences, or unique sequences with no homologous sequences identified by the BLAST search. Deletion of the signal peptide sequence of MglB-1, encoded by TP0545, in the Cuniculi A strain may prevent export of this protein to the periplasm.
Seventeen hypothetical proteins were analyzed to predict cellular localization. Signal sequences were predicted in six proteins encoded in the Nichols genome and in five proteins encoded in the Cuniculi A genome. Only two hypothetical proteins (TP0548 and TP0733) had a predicted signal sequence in both strains; otherwise, signal sequences were predicted for different proteins in the Nichols and Cuniculi A strains. The possible localization of these proteins outside the cytoplasm may contribute to the different host ranges and pathogenicities of the Nichols and Cuniculi A strains.
A portion of the TPI12 region of the T. paraluiscuniculi genome sequenced in this study was nearly identical to a previously sequenced 2,792-nt region (accession number AY685237) comprising a nonfunctional tprD2 gene (9); the two sequences differed at 9 nt. Other regions of near identity with previously sequenced regions (9) were found between TPI2 and the accession number AY685232 sequence (tprA, 1,003 nt), between TPI2 and the accession number AY685233 sequence (tprB, 838 nt), between TPI25A and the accession number AY685239 sequence (nonfunctional tprG1, 3,255 nt), between TPI25B and the accession number AY685238 sequence (nonfunctional tprG and tprI, 2,449 nt), between TPI48 and the accession number AY685240 sequence (nonfunctional tprG2, 3,018 nt), and between TPI77 and the accession number AY685235 sequence (tprL, 1,331 nt). Within these regions, two, zero, seven, four, nine, and three nucleotide differences were found, respectively. These differences could have accumulated in the Cuniculi A genome during independent cultivation in different laboratories; they could also be due to PCR errors. It was previously shown that in the tprK locus (TP0897), sequence changes occurred during infection and passage of T. pallidum subsp. pallidum strain Chicago (6).
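To put the nucleotide differences listed above in proportion, they can be converted to percent identity per region. The Python sketch below uses only the lengths and difference counts reported in the text; it assumes (this is not stated explicitly) that each difference corresponds to one mismatched position over the full reported length, and the function name is hypothetical.

```python
# (region, accession number, length in nt, nucleotide differences), all from the text.
regions = [
    ("TPI12 / tprD2", "AY685237", 2792, 9),
    ("TPI2 / tprA", "AY685232", 1003, 2),
    ("TPI2 / tprB", "AY685233", 838, 0),
    ("TPI25A / tprG1", "AY685239", 3255, 7),
    ("TPI25B / tprG and tprI", "AY685238", 2449, 4),
    ("TPI48 / tprG2", "AY685240", 3018, 9),
    ("TPI77 / tprL", "AY685235", 1331, 3),
]


def percent_identity(length_nt, differences):
    """Percent of positions that are identical, assuming one mismatch per difference."""
    return 100.0 * (length_nt - differences) / length_nt


for label, accession, length_nt, diffs in regions:
    print(f"{label:24s} {accession}  {percent_identity(length_nt, diffs):.2f}%")

# Pooled totals across all seven regions.
total_len = sum(r[2] for r in regions)
total_diff = sum(r[3] for r in regions)
print(f"pooled: {total_diff} differences in {total_len} nt "
      f"({percent_identity(total_len, total_diff):.2f}% identity)")
```

Pooled over the seven regions, this gives 34 differences in 14,686 nt, which is consistent with the suggestion that the differences reflect laboratory passage or PCR errors rather than substantial divergence.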
Altogether, 639 target restriction sites (representing 3.8 kb of the genomic sequence or 0.34% of the Nichols genome) in the Cuniculi A genome were analyzed with three enzymes (BamHI, HindIII, and EcoRI). Assuming that the majority of additional or missing restriction target sites were due to single nucleotide changes, the sequence similarity of the Cuniculi A and Nichols genomes was estimated to be 98.6%. Sequencing of three chromosomal regions representing 0.46% of the Cuniculi A genome revealed a sequence identity of 99.3%. However, the latter result is a rather high estimate of the sequence identity because the value could be distorted by a number of factors, including the nonrandom distribution of the sequenced DNA and the fact that the sequence-divergent regions in the Cuniculi A strain appear to be localized in certain chromosomal regions.
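The arithmetic behind the restriction site survey can be sketched as follows. This is a simplified illustration, not necessarily the calculation used in the study. It relies on the fact that BamHI, HindIII, and EcoRI each recognize a 6-bp site. The Nichols genome size (1,138,006 bp) is taken from the published genome sequence and is not stated in this section. The number of altered sites used in the example is hypothetical, chosen only to show how a value near 98.6% could arise.

```python
SITE_LENGTH_BP = 6             # BamHI, HindIII, and EcoRI each recognize a 6-bp site
N_SITES = 639                  # restriction sites analyzed (from the text)
NICHOLS_GENOME_BP = 1_138_006  # assumption: published Nichols genome size

# Total sequence sampled by the restriction sites and its share of the genome.
surveyed_bp = N_SITES * SITE_LENGTH_BP
coverage_pct = 100.0 * surveyed_bp / NICHOLS_GENOME_BP


def identity_from_sites(n_sites, n_changed_sites, site_length=SITE_LENGTH_BP):
    """Percent identity if every changed site carries exactly one nucleotide change."""
    return 100.0 * (1 - n_changed_sites / (n_sites * site_length))


print(surveyed_bp)             # 3834 bp, i.e., about 3.8 kb
print(round(coverage_pct, 2))  # 0.34

# Hypothetical count of altered sites, for illustration only (not reported in the text).
print(round(identity_from_sites(N_SITES, 54), 1))  # 98.6
```

Because each altered site is assumed to contain a single substitution, the estimate is an upper bound on identity within the sampled sites; sites with several changes would lower it.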
The data presented indicate that the genomes of T. pallidum subsp. pallidum and T. paraluiscuniculi are very closely related and that most of the observed differences are localized in tpr loci and in the vicinity of these loci, suggesting their possible role in the host range and pathogenicity of T. pallidum subsp. pallidum. The high degree of sequence similarity of the genomes tested could be used for planning an optimal genome sequencing strategy. In further studies, the high level of relatedness of the T. pallidum subsp. pallidum, T. pallidum subsp. pertenue, and T. paraluiscuniculi genomes could be used for identifying and deciphering T. pallidum subsp. pallidum virulence determinants.