Obtaining the complete genome sequence of a second syphilis spirochete (SS14) demonstrates the utility of the CGS strategy for treponemal comparative genomics. This is the first application of this approach to sequence an entire genome. The approach can be used when highly similar genomes are investigated and the genome sequence of one closely related organism is known. The CGS strategy represents a rapid (days to weeks) and scalable methodology for sequencing multiple syphilitic strains and clinical isolates. In the present study some variable regions required further investigation, but the directed DDT sequencing involved was much less than that needed to sequence a whole genome, thus lowering the total cost of obtaining the genome sequence.
This approach to whole genome sequencing has some TPA-specific limitations. Because the CGS strategy uses genomic DNA as a probe, accuracy is affected by the presence of repeated sequences. Repeat regions hybridize to more than one oligonucleotide on a tiling array, resulting in both reduced sensitivity for detecting changes and ambiguity in assigning locations to the variants detected. Precautions have to be taken when inspecting tpr regions and others (arp gene, TP0470) that cross-react because of sequence similarity. Such regions, together with highly variable regions, need to be analyzed by WGF and sequenced by DDT to reveal the true nucleotide changes and the numbers of repeated regions. Another possible restriction of this methodology arises from the character of the TPA population. Multiple sequence variants in the Nichols strain population were both described previously and identified in this work, and hybridization-based discovery of sequence changes in these regions is influenced by the ratio between or among the different sequence variants in the population. Finally, the accuracy of the genome sequence produced by CGS depends on the accuracy of the reference genome sequence. As the two newly revealed frameshifts in the Nichols strain sequence suggest, discovered sequence changes have to be verified in the Nichols sequence to confirm that they represent real differences from the Nichols genome.
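The ambiguity caused by repeated sequences can be illustrated with a minimal sketch. It assumes exact string matching as a stand-in for hybridization (real probes also bind with mismatches), an arbitrary probe length of 8, and a made-up toy reference; the function names `probe_positions` and `ambiguous_probes` are hypothetical and are not part of the CGS procedure described here.

```python
from collections import defaultdict


def probe_positions(reference: str, probe_len: int) -> dict[str, list[int]]:
    """Map every probe-length substring of the reference to its start positions."""
    positions = defaultdict(list)
    for start in range(len(reference) - probe_len + 1):
        positions[reference[start:start + probe_len]].append(start)
    return dict(positions)


def ambiguous_probes(reference: str, probe_len: int) -> dict[str, list[int]]:
    """Keep only probe sequences that occur at more than one position."""
    return {
        seq: starts
        for seq, starts in probe_positions(reference, probe_len).items()
        if len(starts) > 1
    }


# Toy reference (hypothetical data): the motif ACGTTGCA occurs twice,
# separated and flanked by unique sequence.
toy_reference = "GGATT" + "ACGTTGCA" + "TTAGC" + "ACGTTGCA" + "CCTAG"

# Probes listed here cannot be assigned to a single genomic location.
repeats = ambiguous_probes(toy_reference, probe_len=8)
print(repeats)
```

In this simplified model, a signal change at a probe listed in `repeats` could originate from either copy of the motif, which is why such regions need to be resolved by directed sequencing.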
The SS14 genome provides a first insight into whole genome variability within TPA. Both Nichols and SS14 cause infection in rabbits and are not believed to be attenuated in their ability to infect humans; it is therefore very probable that none of the differences affects the ability of the bacteria to cause disease. The examples of interstrain heterogeneity and of multiple alleles in a population of haploid organisms are candidates for antigenic variation, contingency genes and other types of SSR (short sequence repeats) [30]. Changes resulting in significant differences in protein sequences (frameshifts and sequence changes causing protein length shifts) and hypervariable regions affected novel genes, membrane antigens and Tpr proteins. The Tpr protein family includes 12 Treponema pallidum repeat proteins, which are found uniquely in this bacterium and show sequence similarity to the major sheath protein of Treponema denticola. Eight of the 12 tpr genes (66%) were affected by sequence changes, a higher proportion than the whole genome rate (13.1%). Positions showing interstrain heterogeneity, intrastrain heterogeneity or both were found in tpr genes. Altogether, 53 SNPs and 38 intrastrain variable nucleotide positions, with at least one allele identical to the sequence of the Nichols genome, were found in tpr genes (the V1–V7 regions of tprK were excluded from this analysis). Because tpr genes share a high degree of similarity at the DNA level, we expect that these differences could be underestimated owing to the limitations of the hybridization method for repeated sequences. Multiple alleles of tpr genes were described among and within TPA strains [22], and some TPA repeated regions (tpr gene) were used as loci for typing of clinical isolates [34]. Newly identified hypervariable regions (Table ) represent candidate sequences for screening clinical isolates and have the potential to be used as typing markers of strains and isolates. In addition, different strains of TPA have already been tested for association with a higher risk of neuroinvasion in rabbits [39], and identification of the underlying sequence changes will enable prediction of such risks. The identified variation in novel genes suggests that targets other than tpr genes could be responsible for antigenic variation in TPA; alternatively, in the absence of supporting expression and antigenicity data, these genes could represent pseudogenes.