The complete genome sequence of the TPA Mexico A strain was determined. The genome size, G+C content and gene order were identical to those of other already sequenced TPA genomes. The Mexico A genome was most closely related to the SS14 genome and differed from it by fewer than 300 substitutions and indels. Since the Nichols and SS14 genomes have been reported to contain about 200 nt errors, a lower number of nucleotide changes differentiating the Mexico A and SS14 genomes can be expected. In fact, the number of nucleotide differences between the Mexico A and SS14 genomes (except for differences present in the tprD genes) is probably lower than one hundred (Pětrošová, unpublished results). In all of these comparisons, the identified differences were more frequently present in (i) genes encoding putative virulence factors, (ii) genes involved in cell structure and processes and (iii) genes coding for DNA replication, repair and recombination. In contrast, genes encoding components of general metabolism, transcription, translation, gene regulation and transport appear to be conserved.
The observed mosaic character of the Mexico A TPAMA_0326 (tp92) and TPAMA_0488 (mcp2-1) loci, combining both TPA- and TPE-specific nucleotide sequences, can be, in principle, explained by six independent mechanisms including i) an ancestral position of the Mexico A strain with respect to both TPA and TPE strains, ii) rapid accumulation of nucleotide changes during evolution of TPA strains from TPE strains with the Mexico A as an intermediate, iii) intra-strain recombination between paralogous sequences, iv) artifacts during PCR amplification (as a result of contamination with TPE genomic DNA) and/or contamination with TPE-amplified DNA, v) convergent evolution and vi) inter-strain recombination between TPA and TPE strains during simultaneous infection of one host.
i) The first explanation can be ruled out because only two chromosomal loci (TPAMA_0326 and TPAMA_0488) showed demonstrable similarity to TPE strains. Moreover, the number of Mexico A-specific mutations (i.e., mutations that are present only in the Mexico A genome and not in other sequenced TPA genomes) is not significantly different from the number of strain-specific mutations in other TPA genomes (data not shown). In a predicted common ancestor, one would expect a considerably higher number of ancestor-specific mutations than in its descendants. ii) The second hypothesis is illustrated in . The hypothetical evolution scheme comprises TPA, TPE and TEN strains arranged according to their relatedness to other TP strains (see also ). We sequenced the TP0326 (tp92) and TP0488 (mcp2-1) loci in the TEN strain Bosnia A (GenBank acc. no. JX392330.1 and JX392331.1, respectively; our TP0326 sequence is identical to the partial tp92 sequence of Bosnia A published by Harper et al.). The sequencing data showed that the TEN strain Bosnia A contains the same nucleotide mosaic in the TP0488 (mcp2-1) locus as Mexico A (with the exception of 2 single nucleotide substitutions); similarly, some TPA isolates belonging to the SS14-like group of TPA strains show a TEN-specific pattern in the TP0326 (tp92) locus. It was impossible to propose an evolutionary model based only on accumulation or loss of nucleotide changes (see ), and this fact supports the recombination hypothesis. iii) The third hypothesis was rejected because we failed to identify potential recombinant (donor) sites for the TPAMA_0326 and TPAMA_0488 genes in the Mexico A genome, despite several attempts to identify such regions using several computer programs and algorithms (RDP3, EditSeq (DNASTAR), BLAST). iv) While it is known that PCR amplification of two templates with related sequences can result in the production of chimeric DNA amplicons
, contamination of the Mexico A genomic DNA with TPE genomic DNA can be ruled out because recombinant sequences were found in only two genes of the genome. Contamination with TPE-amplified DNA (corresponding to the TPAMA_0326 and TPAMA_0488 genes) was excluded based on careful analysis of Illumina reads, where no TPA- or TPE-specific Illumina reads were found in either of these regions. In fact, the 15-bp deletion in the TPAMA_0326 gene was present in all 169 individual Illumina reads covering this region. Similar analysis of the TPAMA_0488 region revealed no TPA- or TPE-specific Illumina reads; all 37 reads covering regions with both TPA and TPE molecular signatures showed the Mexico A consensus sequence. Since Illumina technology sequences individual DNA molecules, contamination of Mexico A genomic DNA with TPE PCR product can be excluded. To exclude artifacts arising during REPLI-g kit amplification of the Mexico A genomic DNA, three different REPLI-g amplifications were used for TPAMA_0326 and TPAMA_0488 sequencing. No discrepancies were identified during analysis of Sanger reads in these regions. Moreover, Harper et al. sequenced a partial tp92 locus of the Mexico A strain (obtained directly from CDC, Atlanta) and the sequenced region (960 nt, GenBank acc. no. EU102088.1, containing TPE-like sequence in three nucleotide positions and a 15-bp deletion) was identical to our sequence. Sequences of TP0326 (tp92) from various TPE isolates published by Harper et al. contained the 15-bp TPE-like deletion and also corresponded to TPE-like changes in the South Africa treponemal isolate. All 21 South Africa partial nucleotide sequences available in GenBank
were 100% identical to the corresponding sequences of Mexico A published by Harper et al. Therefore, the South Africa strain appears to be another strain that is identical, or very closely related, to the Mexico A strain. Nevertheless, we found 3 nucleotide changes differentiating the South Africa and Mexico A sequences published by Harper et al. from our own sequences of Mexico A. Two of these differences were found in homopolymeric stretches (in the fliG-tp0027 and tp0347 regions) and one SNP (C→T) was found in the rpiA-tp0617 region. Since both Mexico A strains came from the same laboratory (D. L. Cox, CDC Atlanta), the data suggest that sequencing errors in the sequences published by Harper et al. may explain these differences. To further assess the frequency of strains similar to Mexico A/South Africa, we screened clinical samples published by Flasarová et al. for Mexico A-specific mutations. No such nucleotide changes were found in 49 genotyped samples, indicating that the Mexico A/South Africa group of strains is not prevalent in central Europe. v) Since convergent evolution assumes acquisition of the same biological trait in unrelated lineages (operating at the level of biological function), it is extremely unlikely that it would result in exactly the same amino acid sequence of the relevant proteins. Owing to the degeneracy of the genetic code, it is even more unlikely that convergent evolution would produce two identical nucleotide sequences. vi) In contrast to the previous alternatives, inter-strain recombination cannot be ruled out, even though the probability of such an event is relatively low. Moreover, the mosaic character of the TPAMA_0326 and TPAMA_0488 loci, combining both TPA- and TPE-specific nucleotide sequences, is a typical result of a recombination event after horizontal gene transfer. Also, patterns found in TEN strains indicate that the observed mosaics in the Mexico A genome are not artifacts, but rather the results of recombination events in the common ancestor of TPA and TEN strains (see ).
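The read-level contamination check described under iv) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `(start, sequence)` read representation, the function names and the toy diagnostic positions are all assumptions made for illustration. The idea is only that a read spanning every diagnostic site should be called "mosaic" if it carries TPA-type bases at some sites and TPE-type bases at others, whereas a contaminating TPE amplicon would appear as "TPE-only".

```python
from collections import Counter

def classify_read(read_start, read_seq, signatures):
    """Label one aligned read by the diagnostic bases it carries.

    signatures maps a 0-based locus position to a (tpa_base, tpe_base) pair.
    Reads that do not span every diagnostic position are called "partial".
    """
    calls = set()
    for pos, (tpa_base, tpe_base) in signatures.items():
        offset = pos - read_start
        if not 0 <= offset < len(read_seq):
            return "partial"
        base = read_seq[offset]
        if base == tpa_base:
            calls.add("TPA")
        elif base == tpe_base:
            calls.add("TPE")
        else:
            calls.add("other")
    if calls == {"TPA", "TPE"}:
        return "mosaic"
    if calls == {"TPA"}:
        return "TPA-only"
    if calls == {"TPE"}:
        return "TPE-only"
    return "ambiguous"

def summarize(reads, signatures):
    """Count how many reads fall into each class."""
    return Counter(classify_read(start, seq, signatures) for start, seq in reads)

# Toy data, invented for illustration only (not real treponemal sequence).
signatures = {2: ("A", "G"), 7: ("C", "T")}
reads = [
    (0, "ACAGGTTTAA"),  # TPA base at position 2, TPE base at position 7
    (1, "CAGGTTTA"),    # same mosaic pattern, shifted start
    (5, "TTTAA"),       # does not reach position 2
    (0, "ACGGGTTTAA"),  # TPE base at both positions (contaminant-like)
]
summary = summarize(reads, signatures)
print(summary)
```

In the analysis reported above, the counterpart of the "TPA-only" and "TPE-only" classes was empty for both loci; the contaminant-like toy read is included only to show how such a read would be flagged.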
Evolutionary relationships among TP strains and hypothetical evolutionary schemes.
There are several possible molecular mechanisms that could lead to the formation of the mosaic structure seen at the TPAMA_0326 and TPAMA_0488 loci. We propose two models () that are based on the incorporation of TPE double-stranded DNA. In the first model, dsDNA was integrated into the chromosome of the Mexico A ancestor through homologous recombination. The resulting DNA heteroduplex was block-repaired via mismatch repair mechanisms. Similar repair patterns have been observed after DNA transformation of Escherichia coli and Helicobacter pylori
. In other bacteria, mismatch repair involves the cleavage of the daughter strand by MutH, which recognizes hemimethylated GATC sequences (methylated at the adenine). Since TPA contains neither a MutH orthologue nor methyltransferases, the mechanism of DNA cleavage remains unknown. Both mutS
and mutL have been annotated in the TPA genome.
Two possible molecular mechanisms resulting in formation of the mosaic structure of the TPAMA_0326 locus.
The second mechanism is based on gene conversion events following internalization of dsDNA. Gene conversion is a common mechanism for producing antigenic variability in TPA. Since TPA possesses only the RecF recombination pathway, gene conversion in TPA is likely to follow the successive half crossing-over model, as shown in . However, the mosaic structure observed at the TPAMA_0326 and TPAMA_0488 loci would require multiple successive gene conversion events in both loci, which is unlikely. One possible explanation would presume a partial mosaic structure () in both loci in the TPE donor DNA prior to crossing-over. Assuming this, the observed mosaic sequence at the TPAMA_0326 and TPAMA_0488 loci could result from a single gene conversion/recombination event.
Alternatively, there is a possibility of active DNA uptake across the cell membrane, which is more efficient than the natural competence of bacteria. Although no orthologs of genes involved in natural competence have been identified in the TPA genomes, one cannot exclude that one or more genes of unknown function provide this activity. Internalization of TPE ssDNA would follow the model of mismatch repair.
TPAMA_0326 and TPAMA_0488 are mosaics resulting from interchromosomal recombination/gene conversion between TPA and TPE strains, while tprC alleles are the results of intrachromosomal recombination in tprC. Therefore, the similarities to TPE strains seen in the tprC locus and in the TPAMA_0326 and TPAMA_0488 loci arose via different mechanisms. Apart from the TPAMA_0326 and TPAMA_0488 loci, two additional nucleotide positions (2 out of 1,192 single nucleotide changes differentiating TPA and TPE strains; i.e. 0.168%) were found in the TP0314 locus and the TPAMA_0319 gene. In these cases the Mexico A sequence was identical to the TPE sequences. These two positions appear to represent changes that occurred by chance. For a single nucleotide position, the theoretical probability is (1,192/1,140,038) × 1/3 (i.e. 0.035%), where 1/3 is the probability that a particular nucleotide would be changed into a TPE nucleotide. Moreover, since the set of 1,192 single nucleotide changes that differentiate TPA and TPE strains is based only on comparisons of three TPA and three TPE strains, it is likely that the number of nucleotide positions differentiating all TPA and TPE strains will decrease as whole genome sequences from other TPA and TPE strains are reported.
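The two percentages quoted above can be reproduced with a few lines of Python. This is only an arithmetic check using the figures given in the text; the variable names are illustrative, and the reading of 1,140,038 as the number of genome positions is an assumption, since the text gives the number without a label.

```python
# Figures taken from the text.
TPA_TPE_DIFFERENCES = 1_192    # single nucleotide changes separating TPA and TPE
GENOME_POSITIONS = 1_140_038   # denominator used in the text (assumed: genome length in nt)
P_TPE_BASE = 1 / 3             # chance that a substitution yields the TPE nucleotide

# Observed: 2 of the 1,192 differentiating positions carry the TPE base in Mexico A.
observed_fraction = 2 / TPA_TPE_DIFFERENCES

# Theoretical per-position probability, exactly as written in the text.
per_position_probability = TPA_TPE_DIFFERENCES / GENOME_POSITIONS * P_TPE_BASE

print(f"observed fraction:        {observed_fraction:.3%}")         # 0.168%
print(f"per-position probability: {per_position_probability:.3%}")  # 0.035%
```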
Horizontal gene transfer (HGT) is an important process in bacterial evolution, and the most frequently transferred genes usually bring a selective advantage to the host cell. The TPA genome contains no prophages, IS-elements or plasmids. Nevertheless, the absence of restriction-modification systems together with the presence of genes for homologous recombination in TPA strains appears to allow uptake of foreign DNA molecules with subsequent integration into the chromosomal DNA. DNA transformation is commonly used in the cultivable Treponema denticola and the related Borrelia burgdorferi. Moreover, natural gene transfer among Borrelia burgdorferi has been observed. In fact, 77 (8.32%) TPA genes were identified as horizontally transferred by analysis of G+C content, codon and amino acid usage, and gene position. In our analysis, we did not find DNA regions of different G+C content to be associated with regions that differentiate TPA and TPE strains, nor were such associations found in tpr regions, indicating that the genome rearrangements took place before the diversification of these strains. It is therefore likely that the diversification of TPA and TPE strains was due to an accumulation of more subtle changes.
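As an illustration of the G+C-based screening mentioned above, the following Python sketch flags sequence windows whose G+C fraction deviates from the overall value. It is a simplified stand-in, not the method of the cited study (which also used codon usage, amino acid usage and gene position) and not the analysis performed here; the function names, window size, step and tolerance are assumptions chosen for the toy example.

```python
def gc_fraction(seq):
    """Return the G+C fraction of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0

def windowed_gc(seq, window, step):
    """Yield (start, G+C fraction) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])

def flag_atypical(seq, window, step, tolerance):
    """Return windows whose G+C fraction differs from the overall value by more than tolerance."""
    overall = gc_fraction(seq)
    return [(start, gc) for start, gc in windowed_gc(seq, window, step)
            if abs(gc - overall) > tolerance]

# Toy sequence, invented for illustration: a GC-rich background with an AT-only insert.
toy = "GC" * 10 + "AT" * 10 + "GC" * 10
flagged = flag_atypical(toy, window=20, step=20, tolerance=0.5)
print(flagged)  # [(20, 0.0)]
```

On real genomes the window size and tolerance would need to be chosen with care, since short windows give noisy G+C estimates.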
As shown by Centurion-Lara et al., recombination mechanisms are more active during treponemal infection and gene conversion events represent important mechanisms for avoiding the host immune response. Therefore, uptake of TPE DNA by a TPA strain during a simultaneous TPA and TPE infection of a single host, with subsequent integration into the TPA chromosome, appears to be a plausible explanation. Simultaneous infection with TPA and TPE is certainly possible during the early stages of syphilis infection. It has been shown that experimental infection with either TPA or TPE strains did not result in complete cross-protection, which suggests differences in the pathogenesis of syphilis and yaws. Although syphilis is preferentially transmitted sexually among adults, and yaws is preferentially transmitted via direct skin contact among children, simultaneous infection in a single host cannot be ruled out. The Haiti B strain, originally classified as a TPE strain because it was isolated from “typical yaws lesions” in an 11-year-old child, has recently been reclassified as a TPA strain. Moreover, the Mexico A strain was isolated in a geographic region where both TPA and TPE infections occurred. Nevertheless, recombination could also have taken place outside Mexico.
The mosaic TPAMA_0326 protein (Tp92) belongs to a relatively small group of treponemal outer membrane proteins and is an ortholog of the BamA protein involved in outer membrane biogenesis. The BamA protein was identified as a TPA antigen exhibiting reactivity with sera from patients with syphilis, and antibodies against this protein have opsonized living treponemes. The 15-bp (TPE-like) deletion in TPAMA_0326 affects the polyserine tract in a predicted large extracellular loop of the TPAMA_0326 protein, which serves as a potential site for attachment to host cells. TPAMA_0488 encodes the methyl-accepting chemotaxis protein (Mcp2-1). Mcp2-1 is strongly expressed during experimental rabbit infections and elicits a humoral response. In the Mcp2-1 protein, there are 18 TPE-like changes, 8 of which are localized in the Cache domain, which binds small molecules during chemotaxis. All of these 8 TPE-like changes cause amino acid changes (7 non-conservative and 1 conservative). Taken together, owing to the described changes in extracellular/sensory protein domains, both proteins may exhibit different antigenic epitopes and/or ligand-binding activities.
Both the TPAMA_0326 and TPAMA_0488 genes are under positive selection within TPA strains, as well as between TPA and TPE strains (genes were tested using codon-based testing by Čejková et al.). The recombinant TPA strain (Mexico A) may thus possess a selective advantage in an infected host, possibly through evasion of the host's immune system. However, it was recently shown that β-barrel structures, including surface-exposed loops of TPAMA_0326, where the TPE-like deletion is present, do not induce an antibody response in humans. On the other hand, positive selection need not be driven solely by the production of antibodies and may also involve a T-cell mediated cellular response, similar to the case of TprK. In addition, positive selection operating on the periplasmic Cache domain of TPAMA_0488, which recognizes small molecules, could reflect the changed tissue tropism of TPE bacteria in comparison to TPA.
Despite a selective advantage in the infected host (evasion of the immune response, changed tissue tropism), these changes could result in the observed lower growth ability of the Mexico A strain compared to the Nichols strain under in vitro conditions. Under positive selection, such a change can still be advantageous overall, given the selective pressure exerted by the host's immune system.
In summary, the mosaic character of the TPA Mexico A genome is likely the result of inter-strain recombination between TPA and TPE strains during simultaneous infection of one host, and similar patterns can be observed among other TP strains. These findings suggest the importance of horizontal gene transfer in the evolution of pathogenic treponemes.