Chicago and Nichols differ in their origins of isolation (primary chancre vs CSF), durations of propagation in the rabbit host, gene expression levels, induction of antibody and cellular immune responses to some antigens, and rates of TprK variation, the latter being higher in Chicago than in the Seattle Nichols 
. With respect to the published Nichols genome sequence, a 1204 bp insertion was found in the intergenic region downstream of TPChic0126. This large insertion contains 19 putative donor sequences used by T. pallidum
to generate variability within all of the seven tprK
V regions, especially V3 and V6 
. Although this insertion might be speculated to be a reason for Chicago's higher tprK
variability, this 1204 bp fragment is also present in the Nichols strain currently propagated in our laboratory 
, which is slow to develop tprK
variants. Therefore, the number of donor sites alone cannot explain the relative hypervariability of Chicago tprK
. The Nichols strain has been extensively propagated in rabbits and this might have selected for a tprK
sequence that is optimal for survival and rapid growth in rabbit tissues. Frequent passage of the Nichols strain (every 9–12 days) for routine propagation, virtually in the absence of an adaptive immune response, might have permitted the reduction in Nichols' propensity to vary tprK
. Comparative analysis between the two strains did not show differences in the genes coding for the recombination machinery typically involved in gene conversion (i.e. ruv
genes, genes encoding site-specific recombinases or hypermutation homologues; data not shown). Structural predictions of the TPChic0899 ORF obtained using the Bio Info Bank Metaserver (http://meta.bioinfo.pl
), however, found the encoded protein to be similar to an AddB-like deoxyribonuclease, a component of the counterpart of the E. coli
RecBCD enzyme in Gram-positive bacteria. TPChic0899 spans Nichols' TP0899 and TP0900 (originally annotated as separate hypothetical proteins) 
. The two ORFs annotated in Nichols result from a single G deletion that brings a TGA triplet into frame, introducing a premature stop codon. Because of the possible involvement of this enzyme in homologous recombination, we further explored this difference between Chicago and Nichols. DT-sequencing of the region containing this single-G difference was performed in a total of 16 T. pallidum
isolates, including Nichols strains obtained from several laboratories and the SS14 strain (also reported carrying the deletion; GenBank accession number CP000805.1) 
. Our sequencing data revealed that the G nucleotide is actually present in all isolates, confirming that the annotation of two separate ORFs, TP0899 and TP0900, in Nichols 
and SS14 
is indeed erroneous. Because this gene appears to be functional in all T. pallidum
strains, it is therefore unlikely to be associated with the increased rates of tprK
variation that Chicago exhibits with respect to Nichols. Nonetheless, this example underscores that comparative genome-wide studies among T. pallidum
strains are likely to encounter inaccuracies in available sequences.
TPChic0899 sequence alignment in T. pallidum isolates.
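The effect of the single G deletion described above can be illustrated with a minimal sketch. The sequence below is a hypothetical toy example, not T. pallidum sequence, and the function name is an assumption introduced for illustration. It shows how removing one G from a homopolymeric run shifts the reading frame so that a downstream TGA triplet becomes an in-frame stop codon, splitting one longer ORF into a shorter one.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(dna: str) -> int:
    """Count codons read from position 0 until the first in-frame stop codon."""
    count = 0
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Hypothetical toy ORF: ATG GCT GGG ATG ACC GCT TAA
with_g = "ATGGCTGGGATGACCGCTTAA"

# Delete one G from the GGG run (index 6); the frame shifts by one base,
# so the sequence now reads ATG GCT GGA TGA ...
without_g = with_g[:6] + with_g[7:]

print(codons_before_stop(with_g))     # full-length toy ORF
print(codons_before_stop(without_g))  # truncated by the in-frame TGA
```

In this toy case the intact sequence is read for six codons, whereas the sequence lacking the G terminates after three, mirroring how a single missing base in a published sequence can lead to annotation of two short ORFs in place of one.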
TPChic0924, which encodes the Tex transcriptional regulator, could potentially explain reported differences in transcription of some tpr
genes in Chicago vs. Nichols 
. The Chicago Tex protein is predicted to be 250 aa shorter than in Nichols. Tex was first isolated and characterized in Bordetella pertussis
by virtue of its negative effect on the transcription and expression of toxin genes ptx
. Tex homologs were then identified in a wide variety of bacterial species 
and were shown to contain domains involved in nucleic acid binding 
. Interestingly, studies conducted on the Pseudomonas aeruginosa
Tex protein showed that presence of the carboxyl-terminal domain (present in Nichols but not in Chicago) permits Tex to bind nucleic acids 
and thus inhibit transcription. The presence or absence of a complete Tex protein in T. pallidum
could affect a strain's ability to express virulence factors. Supporting the “Nichols-specific” nature of this change, all examined non-Nichols T. pallidum
isolates carry the same A/C transversion that would truncate the Tex protein in Chicago, in sharp contrast with the five Nichols isolates (Seattle, Houston, Dallas, Farmington, and UCLA), where the ORF encoding the Tex protein would not be truncated.
tex gene sequence alignment in T. pallidum isolates.
When the Chicago genome was first released 
, we reported that 44 coding sequences, annotated as independent ORFs in Nichols, are fused in Chicago, leading to 21 considerably longer genes. TPChic0006, for instance, was predicted to be 417 aa long, and to span Nichols' TP0006-0008 (51, 216, and 89 aa, respectively). It is now evident, however, that these initial observations were a result of sequencing errors in the original Nichols genome, and not the result, as initially postulated, of gene inactivation of original longer sequences by frameshift or nonsense mutations. Recently, Šmajs and collaborators 
suggested that genomic decay might have played a central role in T. paraluiscuniculi
's adaptation to the rabbit host and loss of infectivity to humans 
, and the hypothesis that gene inactivation in the Nichols strain could reflect its adaptation to rapid passage in rabbits for nearly a century also appeared plausible. Resequencing of the Nichols (Houston) genomic regions containing mutations hypothetically responsible for inactivation of these genes, however, clearly revealed that these annotation differences are also due to sequencing errors in the Nichols genome. It is therefore very likely that the annotation of the resequenced Nichols genome will be considerably more similar to that currently reported for Chicago. Similar findings were described by Cejková et al.
. A complete list of predicted gene fusions is reported in File S6.
Indels falling within homopolymeric nucleotide sequences were found in three Chicago ORFs (TPChic0127, TPChic0479, and TPChic0618), and within three intergenic regions (3′ of TPChic0026, TPChic0121, and TPChic0621). Growing evidence suggests that changes in the length of these homopolymeric repeats, likely induced by slipped-strand mispairing during DNA replication, might be involved in transcriptional or translational control of T. pallidum
genes. For example, the poly-G repeat upstream of TPChic0621 (TprJ) was shown to control transcription of this gene through a phase variation mechanism that allows transcription only when the poly-G tract is eight (or fewer) nucleotides long 
. The poly-G repeat upstream of TPChic0026 (the fliG1
gene) could have a similar role, although evidence of intra-strain variability of this homopolymeric tract is currently not available. Furthermore, recent evidence suggests that changes in the poly-G repeat within TPChic0127 could either cause a frameshift that prematurely truncates the putative TP0127 protein, or change its reading frame, resulting in a novel protein of approximately equal length but with a different amino acid sequence (unpublished data). Variation in the homopolymeric tracts associated with TPChic0479 and TPChic0618 can also influence the annotation of these ORFs.
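The length-dependent phase variation described for the poly-G tract upstream of TPChic0621 can be sketched as a simple rule. The function names and the flanking sequences below are hypothetical, introduced only for illustration. The threshold of eight nucleotides is the one stated above. Treating the longest G run in a region as "the" tract is a simplifying assumption.

```python
import re


def longest_poly_g(seq: str) -> int:
    """Return the length of the longest run of consecutive G residues."""
    runs = re.findall(r"G+", seq.upper())
    return max((len(run) for run in runs), default=0)


def transcription_permitted(upstream: str, max_tract: int = 8) -> bool:
    """Toy ON/OFF rule: transcription is allowed only when the poly-G
    tract is max_tract nucleotides long or shorter (simplified model)."""
    return longest_poly_g(upstream) <= max_tract


# Hypothetical upstream regions differing only in poly-G length
phase_on = "ATTC" + "G" * 8 + "TATAAT"
phase_off = "ATTC" + "G" * 9 + "TATAAT"

print(longest_poly_g(phase_on), transcription_permitted(phase_on))
print(longest_poly_g(phase_off), transcription_permitted(phase_off))
```

A one-base expansion of the tract, as could arise from slipped-strand mispairing, is enough to switch the toy model from the permissive to the non-permissive state.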
Analysis of SNPs in protein-coding genes showed only nonsynonymous mutations, suggesting the presence of recent diversification favoring structural changes in T. pallidum genomes. Overall, significantly higher rates of nonsynonymous changes in the Nichols genome indicate positive selection pressures in 16 protein-coding genes throughout the genome. The limited number of polymorphic genes did not permit us to determine whether these genes with recent structural changes could be grouped into specific functional categories of proteins. However, we found a strong clustering of polymorphic genes into two functional groups – membrane proteins and DNA-binding proteins. Within the set of genes with defined functions, the single “Chicago-specific” SNP accumulated in an ATP-binding protein-coding gene, while most of the “Nichols-specific” SNPs were found to be in membrane protein-coding genes mostly related to transport and proteolysis.
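The distinction between synonymous and nonsynonymous SNPs drawn above can be made concrete with a minimal sketch. It assumes the standard genetic code (in which TGA is a stop codon, consistent with the text above) and uses hypothetical codons rather than actual T. pallidum loci. The function name `classify_snp` is introduced here for illustration only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change within one codon.

    position is 0, 1, or 2 (index within the codon).
    Returns 'synonymous', 'nonsynonymous', or 'nonsense'.
    """
    mutant = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"


# Hypothetical examples
print(classify_snp("GCT", 2, "C"))  # GCT (Ala) -> GCC (Ala)
print(classify_snp("GCT", 1, "A"))  # GCT (Ala) -> GAT (Asp)
print(classify_snp("TAC", 2, "A"))  # TAC (Tyr) -> TAA (stop)
```

Applied across all SNPs in coding regions, a tally of these classes gives the kind of nonsynonymous-to-synonymous comparison on which inferences about selection are based; the formal tests used for such inferences are beyond this sketch.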
Our study suggests that genetic variability likely influences the phenotypic differences seen between the Nichols and Chicago strains of T. pallidum
, even though definitive evidence for the correlation between specific genomic change(s) and phenotypic differences will require further investigation. This study also raises an important concern regarding the selection process that led to these mutations, believed to result from the adaptation of the Nichols strain to the rabbit host. Our comparative analysis incorporating 12 more T. pallidum
strains for the regions carrying SNP changes in Nichols and Chicago indeed initially suggested that this might be the case, and that the SNPs identified in Chicago and Nichols might reflect pathoadaptive changes the Nichols strain acquired following years of growth in the laboratory animal where it has been propagated so far. Interestingly, however, in the DAL-1 genome (GenBank accession number NC_016844) 
, a T. pallidum
strain recently isolated from the amniotic fluid of a pregnant woman 
, most of the Chicago/Nichols polymorphic loci were identical to Nichols sequences. Based on this evidence, we cannot exclude that Nichols and DAL-1 represent a separate naturally-occurring clonal lineage within T. pallidum
. The significant predominance of nonsynonymous polymorphisms between Chicago and Nichols strains strongly suggests a role for positive selection in the microevolution of T. pallidum
strains, whether due to differential adaptation during rabbit passage or pathoadaptation of individual strains in the human host.
Support for the mutational evolution of Nichols from an ancestral T. pallidum
lineage also comes from the published genome of T. paraluiscuniculi
(Cuniculi A strain, GenBank accession number NC_015715.1), closely related to T. pallidum
. In the Cuniculi A strain, nine of the Chicago/Nichols polymorphic loci (TP0051, TP0265, TP0430, TP0443, TP0488, TP0584, TP0748, TP0790, and TP0978) are identical to those of the non-Nichols strains analyzed here, confirming the “Nichols-specific” nature of the mutations. Ongoing research in our laboratories using comparative genomics on a population-wide scale will provide insight into the phylogenetic relationships of T. pallidum
clonal populations and likely will help explain the role of such sequence changes during syphilis infection.