The raw microarray data of the five Leptospira strains were processed by background subtraction, outlier removal, and normalization with a global Loess method. Probes for the tiling array were designed to cover the entire genome, so each gene was represented by 1 to 27 probes, depending on its length. A total of 43,235 probes were designed for the L. interrogans serovar Copenhageni and Lai genomes. Five strains (L. interrogans serovars Bratislava, Canicola, and Hebdomadis and L. kirschneri serovars Grippotyphosa and Cynopteri) from two Leptospira species () were hybridized to the tiling array. The three L. interrogans serovars (Bratislava, Canicola, and Hebdomadis) belong to the same species as the reference genomes. They hybridized to much higher percentages of the probes (96.23%, 95.45%, and 94.28%, respectively) than the two L. kirschneri serovars Grippotyphosa and Cynopteri (64.54% and 67.84%, respectively). In addition, the genomes of all five tested strains were more similar to the reference genome of L. interrogans serovar Copenhageni than to that of L. interrogans serovar Lai (data not shown).
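The normalization step can be illustrated with a minimal sketch of intensity-dependent Loess normalization. It assumes a two-color design in which every probe has a background-corrected test intensity and a reference intensity. The section does not state which channels were normalized or which Loess settings were used, so the function names and the `frac` value are illustrative only. "Global" is taken here to mean a single curve fitted across all probes of an array. The lowess below also omits the robustness iterations of the full algorithm.

```python
import math


def _tricube(u):
    """Tricube weight used by lowess; zero outside |u| < 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x, y, frac=0.5):
    """Local linear (lowess) fit of y on x, without robustness iterations."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # points per neighbourhood
    fitted = []
    for xi in x:
        # Bandwidth: distance to the k-th nearest point (xi itself included).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [_tricube((xj - xi) / h) for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))  # weighted least-squares line at xi
    return fitted


def loess_normalize(test, reference, frac=0.5):
    """Hypothetical helper: Loess-normalized log2 ratios (M) for paired intensities."""
    m = [math.log2(t / r) for t, r in zip(test, reference)]        # log ratio
    a = [0.5 * math.log2(t * r) for t, r in zip(test, reference)]  # mean log intensity
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]  # remove intensity-dependent bias
```

A dye bias that changes smoothly with intensity is captured by the fitted curve and subtracted. After normalization, probes with equal copy number in both channels should have M close to 0.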
Genes were then classified as present, absent, or partially present. A gene was considered present in a tested strain if all probes representing it had hybridization intensities equal to or greater than 500. It was considered absent if the intensities of all its probes were less than 500. If at least one, but not all, of its probes had an intensity less than 500, the gene was considered partially present (or partially absent).
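The calling rule above can be written as a short function. This is a sketch: the 500 cut-off comes from the text, but the function name and the input format (a list of probe intensities per gene) are assumptions.

```python
THRESHOLD = 500  # hybridization-intensity cut-off stated in the text


def classify_gene(probe_intensities, threshold=THRESHOLD):
    """Call a gene present, absent, or partially present from its probe signals."""
    if not probe_intensities:
        raise ValueError("gene has no probes on the array")
    passing = [signal >= threshold for signal in probe_intensities]
    if all(passing):
        return "present"            # every probe >= threshold
    if not any(passing):
        return "absent"             # every probe < threshold
    return "partially present"      # some, but not all, probes < threshold


print(classify_gene([812, 1540, 650]))  # → present
print(classify_gene([812, 320, 650]))   # → partially present
```

A gene represented by a single probe can only be called present or absent. The partially present category therefore depends on multi-probe coverage of longer genes.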
Analysis based on gene function categories. Using these definitions, the percentages of present, absent, and partially present genes in each functional category (based on published annotation) were determined for each strain (). Gene content was broadly conserved in L. interrogans serovars Bratislava, Canicola, and Hebdomadis, with 90 to 99.29% of reference genes detected in each functional category. In L. kirschneri serovars Grippotyphosa and Cynopteri, the highest percentages of reference genes detected were 66.99% and 70.83%, respectively, for protein synthesis. The lowest were 21.88% and 24.22%, respectively, for mobile and extrachromosomal element functions (MEEF). Transposases predominated in the MEEF category. A new insertion sequence (IS) element, ISlin1, was identified in L. interrogans serovar Copenhageni, whereas IS1500, IS1501, and IS1533 had been discovered previously (19). IS1500 was detected in all tested strains. The other IS elements were present in L. interrogans serovars Bratislava, Canicola, and Hebdomadis but were either absent or partially present in L. kirschneri serovar Grippotyphosa. Only 4 of the 31 ISlin1 elements were present in L. kirschneri serovar Cynopteri. Transposases contribute to genetic diversity within species and to adaptability to changing living conditions. This suggests that L. kirschneri serovars Grippotyphosa and Cynopteri might have less genetic diversity than the three L. interrogans serovars.
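The per-category percentages can be tabulated from the per-gene calls. In this sketch each gene is assumed to carry exactly one functional category from the reference annotation; genes with several role assignments would need extra handling. The gene identifiers in the usage lines are hypothetical.

```python
from collections import Counter, defaultdict

CALLS = ("present", "partially present", "absent")


def category_percentages(gene_calls, gene_category):
    """Percentage of present / partially present / absent genes per category.

    gene_calls: {gene_id: call} for one strain
    gene_category: {gene_id: functional category} (one category per gene assumed)
    """
    counts = defaultdict(Counter)
    for gene, call in gene_calls.items():
        counts[gene_category[gene]][call] += 1
    result = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        result[category] = {c: 100.0 * counter[c] / total for c in CALLS}
    return result


# Hypothetical gene IDs, for illustration only.
calls = {"g1": "present", "g2": "absent", "g3": "present"}
cats = {"g1": "MEEF", "g2": "MEEF", "g3": "Protein synthesis"}
print(category_percentages(calls, cats)["MEEF"]["present"])  # → 50.0
```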
Comparison of tested strains to reference genomes. Relative to the reference genomes, the percentage of genes present in the tested serovars ranged from 51.23% (L. kirschneri serovar Grippotyphosa) to 95% (L. interrogans serovar Bratislava). The percentage of partially present genes ranged from 1.70% (L. interrogans serovar Canicola) to 27.90% (L. kirschneri serovar Grippotyphosa), and the percentage of absent genes ranged from 3.82% (L. interrogans serovar Bratislava) to 20.87% (L. kirschneri serovar Grippotyphosa) (). These results suggest that L. interrogans serovar Bratislava is the closest to the reference genome of L. interrogans serovar Copenhageni. They also suggest that L. interrogans serovar Lai and L. kirschneri serovar Grippotyphosa have the least similarity to L. interrogans serovar Copenhageni.
| Table 2. Percentage of genes detected in the five serovars based on the reference genomes |
Comparison among tested strains. A total of 3,957 genes were detected in all five tested strains. Of these, 54.18% had unclassified, hypothetical, or unknown functions or had no assigned function. The rest were predominantly housekeeping genes involved in the following categories:

- transport and binding
- regulatory functions
- transcription
- purines, pyrimidines, nucleosides, and nucleotides
- protein synthesis
- protein fate
- energy metabolism
- central intermediary metabolism
- DNA metabolism
- cellular processes
- cell envelope
- amino acid biosynthesis
- biosynthesis of cofactors and prosthetic groups

Most of these were consistent with the core leptospiral genes identified by comparing the genomes of the saprophyte L. biflexa and the pathogenic species L. interrogans and L. borgpetersenii (3, 19, 20, 23). Eighty-four genes were not detected in any of the five strains, and none of them had an assigned function. Excluding the L. kirschneri serovar Grippotyphosa strain, the numbers of genes unique to each tested strain were 120, 78, 30, and 4 for L. interrogans serovars Bratislava, Canicola, and Hebdomadis and L. kirschneri serovar Cynopteri, respectively. Genes unique to each species were also observed. A total of 993 genes were detected only in the three L. interrogans strains, whereas only five genes, none with assigned functions, were detected only in the two L. kirschneri strains. Of the 993 genes, only 183 had known functions. These were dominated by genes involved in MEEF (47 transposases) and in the cell envelope (44 lipoproteins and membrane proteins), which are involved in nutrition and signal transduction. In addition, three genes responsible for fruiting body development, which supports long-term survival (28), were detected only in L. interrogans. This observation suggests that L. interrogans might adapt better to multiple environments than L. kirschneri (3, 19, 20).
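The shared, strain-unique, and species-unique gene counts come from set operations on each strain's detected genes. In this sketch two choices are assumptions, since the section does not specify them. First, the input sets are taken to contain only genes called present. Second, a "species-unique" gene is defined as one detected in every strain of a species and in no strain of the other species. The strain labels and gene IDs in the usage lines are hypothetical.

```python
def compare_strains(present_by_strain, species_of):
    """Core, strain-unique, and species-unique gene sets.

    present_by_strain: {strain: set of gene IDs detected in that strain}
    species_of: {strain: species name}
    """
    strains = list(present_by_strain)
    # Genes detected in every tested strain.
    core = set.intersection(*(present_by_strain[s] for s in strains))
    # Genes detected in one strain and in no other strain.
    unique = {}
    for s in strains:
        others = set().union(*(present_by_strain[o] for o in strains if o != s))
        unique[s] = present_by_strain[s] - others
    # Genes shared by all strains of a species but absent from the other species.
    species_unique = {}
    for sp in set(species_of.values()):
        members = [s for s in strains if species_of[s] == sp]
        outsiders = [s for s in strains if species_of[s] != sp]
        shared = set.intersection(*(present_by_strain[s] for s in members))
        elsewhere = set().union(*(present_by_strain[s] for s in outsiders))
        species_unique[sp] = shared - elsewhere
    return core, unique, species_unique


# Hypothetical toy data: two strains of species X, one of species Y.
present = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g2", "g4"}, "C": {"g1", "g5"}}
core, unique, by_species = compare_strains(present, {"A": "X", "B": "X", "C": "Y"})
print(sorted(core), sorted(by_species["X"]))  # → ['g1'] ['g2']
```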
PCA and clustering analysis. Principal component analysis (PCA) of signal intensities was used to group strains with similar genetic properties and to separate those with dissimilar ones. The two L. kirschneri serovars Grippotyphosa and Cynopteri grouped closely together, separate from the three L. interrogans serovars (). Clustering analysis based on microarray signal intensities divided the strains into two groups, I and II (data not shown). In group I, L. kirschneri serovars Grippotyphosa and Cynopteri clustered closely together. Group I then joined group II, formed by L. interrogans, in which serovars Bratislava and Canicola grouped first, followed by serovar Hebdomadis. Together, the PCA and clustering results show that strains within a species were genetically closer to one another than to strains of the other species.

Previous publications also disagree on the taxonomy of serovar Grippotyphosa. Yasuda et al. (32) assigned it to L. interrogans, whereas Ramadass et al. (22) assigned it to L. kirschneri. Our tiling array results, which cover most of the genome, support assigning serovar Grippotyphosa to L. kirschneri rather than L. interrogans. This is consistent with the more recent DNA hybridization result (22).
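The PCA step can be sketched in plain Python. With 43,235 probes and only five strains, it is cheaper to diagonalize the 5 × 5 strain-by-strain Gram matrix G = Xc Xc^T of the probe-centered data than the probe covariance matrix. If (λ, u) is the leading eigenpair of G, the first principal-component scores are √λ·u. The input layout (one row of log intensities per strain) and the use of power iteration are assumptions, and the toy matrix in the usage lines is hypothetical. The sign of a principal component is arbitrary, so only the relative signs of the scores are meaningful.

```python
import math


def pc1_scores(matrix, iterations=200):
    """First principal-component score of each row (strain) of `matrix`.

    matrix: one row per strain, each a list of probe signals (e.g., log2
    intensities) in the same probe order.
    """
    n, p = len(matrix), len(matrix[0])
    # Center each probe (column) across strains.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Gram matrix between strains: G = Xc Xc^T (n x n).
    gram = [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(n)]
            for i in range(n)]
    # Power iteration. The all-ones vector lies in the null space of G after
    # centering, so start from a non-constant vector instead.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # no variation between strains
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                     for i in range(n))
    # PC1 scores equal sqrt(lambda) * u for the leading eigenpair of G.
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]


# Hypothetical toy data: three strains of one group, two of another.
toy = [[10, 10, 2, 2], [10, 9, 2, 3], [9, 10, 3, 2], [2, 2, 10, 10], [3, 2, 10, 9]]
print([round(s, 2) for s in pc1_scores(toy)])
```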
Pathogenic factors. (i) Confirmed or potential factors. Genes previously identified as involved in pathogenesis were also detected in the tested strains (). The ligA, ligB, and ligC genes have previously been reported as probably involved in host-pathogen interactions (19). They were present or partially present in the L. interrogans strains, whereas their detection in the L. kirschneri strains varied. ligB and ligC were present in the three L. interrogans strains, while ligB was partially present in the L. kirschneri strains. ligA was present in the L. kirschneri serovar Cynopteri strain and partially present in all other tested strains. This is consistent with the previous results of McBride et al., who used L. interrogans serovar Canicola strain Kito. The ligA, ligB, and ligC genes of the L. interrogans serovar Canicola strain used in this study share 90.3%, 96.7%, and 98.5% DNA sequence identity, respectively, with those of L. interrogans serovar Copenhageni. The corresponding values for L. kirschneri serovar Grippotyphosa are 91.4%, 93.2%, and 90.5%.
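Percent identity values such as these depend on how the alignment is scored. The section does not state the aligner or the identity definition. This sketch therefore assumes sequences that have already been aligned (gaps written as "-"). It uses one common convention: identical positions divided by alignment columns, with a gap opposite a base counted as a mismatch. Other tools instead divide by the length of the shorter sequence or exclude gap columns entirely.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned DNA sequences ('-' = gap).

    Identity = identical positions / columns in which at least one sequence
    has a base; columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:  # a gap facing a base never matches
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # → 90.0
```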
| Table 3. Summary of previously confirmed or potential virulence factors detected in the five tested serovars |
Three integrin alpha-like proteins from L. interrogans serovar Copenhageni (LIC12259, LIC10021, and LIC13101) and three from L. interrogans serovar Lai (LA1499, LA0022, and LA3881) were identified as candidate leptospiral adhesins (19). Except for LA0022, which was not represented on the microarray, all of these genes were present or partially present in all tested strains ().
Eshghi et al. (6) used global proteome analysis to compare L. interrogans serovar Copenhageni grown under conventional in vitro conditions with growth under conditions mimicking in vivo conditions. They identified four novel proteins (LIC12575, LIC13050, LIC12032, and LIC13166) and related virulence factors, all of which were present in the five tested strains (). Among the sequenced strains, the lipoproteins LipL21, LipL45, and LipL36 are unique to pathogenic Leptospira (20). LipL21 and LipL45 were present in all tested strains. LipL36, however, varied among the serovars: it was present in L. interrogans serovars Bratislava and Hebdomadis, partially present in L. kirschneri serovars Grippotyphosa and Cynopteri, and absent in L. interrogans serovar Canicola ().
Two proteins, Lsa63 (LIC10314) and Lp95 (LIC12690), were detected in all tested strains. They have been confirmed to bind laminin and collagen (29) and extracellular matrix components (1), respectively, and this binding is related to invasion of the host.
(ii) Possible pathogenic factors. A genomic comparison between the saprophyte L. biflexa and the pathogenic species L. borgpetersenii and L. interrogans showed that 1,431 genes were unique to the pathogens. Because these genes have no orthologs in L. biflexa, they may play a role in pathogenesis (20). The array used in this study contained 1,083 of the 1,431 genes, of which only 323 had assigned functions. These 323 genes were clustered according to their status in the tested strains, with present coded as 1, partially present as 0.5, and absent as 0. Five clusters were formed (see Fig. S1 in the supplemental material):

- Cluster I (see Fig. S1a): gene presence varied among strains, possibly reflecting how each strain survives in the environment.
- Clusters II and III (see Fig. S1b and c): genes were present in all tested strains or partially present in L. kirschneri serovars Grippotyphosa and Cynopteri. These clusters included sphingomyelinase, phage-related protein, leucine-rich repeat protein, and methylase/methyltransferase genes, which have been reported to be related to pathogenesis.
- Clusters IV and V (see Fig. S1d): genes were present only in L. interrogans serovars Bratislava, Canicola, and Hebdomadis (cluster IV) or only in L. interrogans serovar Canicola (cluster V).

The pathogenic roles of most of these genes remain unclear, even for those with assigned functions. Nevertheless, the CGH microarray data provide basic genomic information that can serve as a reference for further studies of Leptospira.
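The numeric coding described above (present = 1, partially present = 0.5, absent = 0) turns each gene into a profile across the five strains, which can then be clustered. The coding comes from the text. The Euclidean distance, average linkage, and stopping at a fixed number of clusters are assumptions, because the section does not say how the five clusters in Fig. S1 were obtained. Gene IDs in the usage lines are hypothetical.

```python
import math

CODE = {"present": 1.0, "partially present": 0.5, "absent": 0.0}


def encode(calls_by_gene, strains):
    """Numeric profile per gene: 1 / 0.5 / 0 for each strain, in `strains` order."""
    return {g: [CODE[calls[s]] for s in strains] for g, calls in calls_by_gene.items()}


def _euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering (average linkage), stopped at n_clusters."""
    clusters = [[g] for g in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean pairwise distance between members of the two clusters.
                d = sum(_euclid(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters


strains = ["Bratislava", "Canicola", "Hebdomadis", "Grippotyphosa", "Cynopteri"]
P, H, A = "present", "partially present", "absent"
calls = {  # hypothetical genes
    "g1": dict(zip(strains, [P, P, P, P, P])),
    "g2": dict(zip(strains, [P, P, P, H, H])),
    "g3": dict(zip(strains, [P, P, P, A, A])),
    "g4": dict(zip(strains, [A, P, A, A, A])),
    "g5": dict(zip(strains, [P, P, P, A, A])),
}
print(average_linkage(encode(calls, strains), 3))
```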
This study used a tiling array designed to cover the whole genome. The advantage of a tiling array over an expression array is that it covers not only open reading frames (ORFs) but also intergenic DNA, potentially providing more information. In addition, a tiling array can represent each gene with multiple probes according to its size. This allows genes to be classified confidently as present, partially present, or absent in the tested strains. We introduced the concept of partially present genes in this study so that gene variation during strain evolution could be identified. Furthermore, genes reported to be involved in pathogenesis were detected in all five strains. However, an array based on reference genomes has a limitation: genes present in the tested strains but absent from the reference genomes cannot be detected, because they are not represented on the array. Our results also showed that the tiling CGH array could clearly distinguish species and identify differences in the genetic content of each strain. Thus, the tiling CGH array designed for this study is appropriate for high-throughput genome screening of Leptospira.