The human pathogen Chlamydia trachomatis exists as multiple serovariants that have distinct organotropisms for different tissue sites. Culture and epidemiologic data have demonstrated that serovar G is more prevalent, while serovar E is less prevalent, for rectal isolates from men having sex with men (MSM). The relative prevalence of these serovars is the opposite for isolates from female cervical infections. In contrast, the prevalence of serovar J isolates is approximately the same at the different tissue sites, and these isolates are the only C-class strains that are routinely cultured from MSM populations. These correlations led us to hypothesize that polymorphisms in open reading frame (ORF) sequences correlate with the different tissue tropisms of these serovars. To explore this possibility, we sequenced and compared the genomes of clinical anorectal and cervical isolates belonging to serovars E, G, and J and compared these genomes with each other, as well as with a set of previously sequenced genomes. We then used PCR- and restriction digestion-based genotyping assays performed with a large collection of recent clinical isolates to show that polymorphisms in ORFs CT144, CT154, and CT326 were highly associated with rectal tropism in serovar G isolates and that polymorphisms in CT869 and CT870 were associated with tissue tropism across all serovars tested. The genome sequences collected were also used to identify regions of likely recombination in recent clinical strains. This work demonstrated that whole-genome sequencing along with comparative genomics is an effective approach for discovering variable loci in Chlamydia spp. that are associated with clinical presentation.
Chlamydia trachomatis is an obligate intracellular human pathogen that is the leading cause of preventable blindness worldwide and is the most common sexually transmitted infectious bacterium in humans. The study of the biology of chlamydiae is complicated by their obligate intracellular development and the lack of a routine system for directed mutagenesis. Chlamydial isolates are differentiated into serovars based on serospecificity for the chlamydial major outer membrane protein (MOMP) (7), which is encoded by ompA (37). The serovars fall into biological groups associated with trachoma (serovars A to C), sexually transmitted noninvasive disease (serovars D to K), and invasive lymphogranuloma (serovars L1 to L3) (35). Comparative genomic analysis of ocular and urogenital chlamydial species has proven to be an effective approach for discovering genetic loci that are associated with observed tissue tropism (9, 10).
Studies conducted in Seattle, WA, and Birmingham, AL, have shown that serovar G rectal isolates are prevalent in men having sex with men (MSM), while serovar E rectal isolates are less prevalent (1, 5, 17). This prevalence of serovar G and rectal tropism differs from what has been observed in studies of female cervical populations in the same geographical regions, where the prevalence of serovar E was significantly higher than the prevalence of serovar G (38). It is not clear whether these differences are behavioral, resulting from network bottlenecks, or whether there are genuine biological differences between rectotropic and cervicotropic strains.
The limited examples of horizontally acquired DNA in chlamydial species suggest that lateral gene transfer and recombination are rare in these organisms. However, sequencing efforts have identified clear examples of recombination at a limited number of chlamydial loci, including ompA (6, 19-21, 27, 29). Recent studies have shown that chlamydiae contain the necessary machinery for recombination (4, 37) and that lateral gene transfer can be selected for in cell cultures following coinfection with strains carrying dissimilar drug markers both within and among chlamydial species (4, 13, 14, 42). The mechanisms of recombination and the role of recombination in chlamydial fitness in vivo remain to be investigated.
The distributions of serovar G strains in the heterosexual and MSM populations led us to hypothesize that strains with rectal tropism have variable genes or loci compared to other urogenital isolates. To test this hypothesis, we sequenced eight chlamydial isolates representing serovars D, E, F, J, and G that were collected from the female cervix, male urethra, or male rectum. PCR and restriction fragment length polymorphism (RFLP) assays were then developed to determine if candidate open reading frames (ORFs) identified in the genome sequence analysis were associated with the observed tropism for the rectal site of infection. A set of candidate ORFs that were associated with rectal tropism in serovar G isolates were discovered, and polymorphisms in the pmp genes were correlated with rectal tropism across all serovars tested. Analysis of the genomes also demonstrated that recombination appears to be common in clinical isolates and occurs at locations across the chlamydial genome.
C. trachomatis clinical isolates Ds/2923, E/11023, E/150, F/70, G/9301, G/9768, G/11222, G/11074, and J/6276 were propagated from frozen samples stored at the University of Washington Chlamydia Repository (39). Isolate collection, clonal isolation, serotyping, and elementary body (EB) purification were conducted as previously described (41, 43). Purified EBs were incubated for 60 min with 4 U/ml RQ1 DNase (Promega), which was followed by treatment with 2 mM EGTA (RQ1 Stop solution; Promega) to inactivate the enzyme. DNA was then extracted from purified, DNase-treated EBs using a Qiagen genomic tip kit (Qiagen, Valencia, CA) by following the manufacturer's instructions. The initial suspension buffer used for these purifications was supplemented with dithiothreitol (5 mM) to facilitate EB lysis.
Isolates Ds/2923, E/11023, E/150, G/9301, G/9768, and G/11222 were sequenced using classical Sanger sequencing methods at the Joint Genome Institute (Walnut Creek, CA). DNA from isolates J/6276, G/11074, and F/70 was processed for Illumina-based sequencing using commercial DNA preparation kits (Illumina Inc., San Diego, CA) by following the manufacturer's instructions. Illumina-derived genomes were first assembled using the reference-guided assembly program Maq (28). Regions in reference-guided assembled genomes where Maq could not resolve the sequence were then compared to contiguous sequences assembled using the VCAKE de novo assembly software (22), and a single contiguous draft sequence was produced.
Whole-genome phylogenetic analysis was performed using the alignment program MAFFT with the default settings (24, 25). The sequences compared included sequences generated in this study, as well as previously published genomes for serovar D, A, and L2 strains (strains D/UW3 [GenBank accession number AE001273], A/HAR-13 [GenBank accession number CP000051], and L2-434/Bu [GenBank accession number AM884176]). Pairwise genome alignments were produced using MAFFT with the following settings: iterative refinement, 2; default gap opening penalty, 1.53; and default gap extension penalty, 0.123. These alignments were used to determine the total number of substitutions and insertions or deletions (indels) in genome sequences. Regions where there was high variability between selected sequences were analyzed manually using the MacVector sequence analysis software (MacVector, Cary, NC), and counts were adjusted accordingly. Isolate genome sequences were compared with the previously published C. trachomatis D/UW3 genome sequence (GenBank accession number AE001273) (37) using Diffseq from the Emboss Bioinformatics suite (33), and an in-house single-nucleotide polymorphism (SNP) parsing program (Diffsort; http://people.oregonstate.edu/~rockeyd/Diffsort) was used to determine the locations and translational effects of polymorphisms that were identified. Any gene variation not resolved by Diffsort was manually analyzed using the MacVector sequence analysis software.
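The counting step described above can be illustrated with a minimal sketch. It assumes that the two genomes are already available as equal-length gapped strings taken from a pairwise alignment, and it assumes that each contiguous run of gap columns counts as a single indel event; the text does not state how gap runs were tallied in the whole-genome totals. The function name `count_differences` is hypothetical and is not part of MAFFT, Diffseq, or Diffsort.

```python
from typing import Tuple


def count_differences(aligned_a: str, aligned_b: str) -> Tuple[int, int]:
    """Return (substitutions, indel_events) for two gapped, equal-length sequences.

    Assumption (not stated in the paper): a contiguous run of gap columns
    in one sequence is counted as one indel event, whatever its length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    substitutions = 0
    indel_events = 0
    previous_gap = None  # which sequence held the gap in the previous gap column

    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            # Column with gaps in both sequences carries no information; skip it.
            continue
        if a == "-" or b == "-":
            current_gap = "a" if a == "-" else "b"
            if current_gap != previous_gap:
                indel_events += 1  # a new gap run starts here
            previous_gap = current_gap
            continue
        previous_gap = None
        if a != b:
            substitutions += 1

    return substitutions, indel_events
```

For the toy alignment `ACGT--ACGT` versus `ACCTGGAC-T`, the function reports one substitution and two indel events. Real counts would additionally require the manual review of highly variable regions described above.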
DNA sequences were computationally extracted from selected isolates using sliding windows (1,000-nucleotide windows, 800-nucleotide slides) and were used as bioinformatic probes with a database consisting of template genome sequences (D/UW3, J/6276, G/9768, E/11023, and/or F/70 sequences). A comparison of the BLAST raw scores for each window was performed based on whether the window was more similar to a clade containing serovar J, G, or D or a clade containing serovar E or F or whether the probe sequence matched all template genomes equally. The following rules were used to assign a window to a genome sequence or clade. Queries that matched all serovars equally were plotted along the “All” line. Queries that did not match all genomes equally but in which one matched template was either a serovar E template or a serovar F template were grouped with the “E or F” clade. Queries that matched serovar J, G, or D more closely than either serovar E or serovar F were grouped with the “J, G, or D” clade. A single query did not match any of the template genomes and was categorized as “No Hit” in this analysis. Whole-genome results from this parsing were then graphed using the complete D/UW3 genome as a reference, beginning with ORF CT001 (37).
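The windowing and assignment rules above can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: the BLAST search itself is not shown, the raw scores are assumed to arrive as a dictionary keyed by template name, and a tie for the best score between the two clades is resolved in favor of "E or F" (the text does not specify tie handling). The names `sliding_windows`, `assign_window`, `EF_CLADE`, and `JGD_CLADE` are hypothetical.

```python
# Template genomes named in the text, grouped by clade.
EF_CLADE = {"E/11023", "F/70"}
JGD_CLADE = {"D/UW3", "J/6276", "G/9768"}


def sliding_windows(sequence, size=1000, step=800):
    """Yield (start, window) pairs: 1,000-nt windows advanced by 800 nt.

    Assumption: a trailing stretch too short to fill a whole window is
    skipped; a sequence shorter than one window yields one short window.
    """
    last_start = max(len(sequence) - size, 0)
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + size]


def assign_window(scores):
    """Assign one query window to "All", "E or F", "J, G, or D", or "No Hit".

    scores maps template name -> BLAST raw score, with None for no hit.
    """
    hits = {name: s for name, s in scores.items() if s is not None}
    if not hits:
        return "No Hit"
    best = max(hits.values())
    # "All": every template was hit and every score is identical.
    if len(hits) == len(scores) and all(s == best for s in hits.values()):
        return "All"
    best_templates = {name for name, s in hits.items() if s == best}
    # Assumption: if an E or F template is among the best matches, the
    # window goes to the "E or F" clade, even when tied with J, G, or D.
    if best_templates & EF_CLADE:
        return "E or F"
    return "J, G, or D"
```

With these defaults, consecutive windows overlap by 200 nucleotides, so a crossover point falling near a window boundary is still covered by at least one window.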
Regions where there was apparent recombination in strain Ds/2923 were then characterized using the ClustalW program, and identified loci were aligned with corresponding sequences from strains D/UW3 and E/11023. The resulting alignments were used to determine the number of informative sites shared by each strain. For this purpose, an informative site was any position in the sequences examined where there was a polymorphism in the template genomes analyzed. Insertions and deletions of any size were counted as one informative site. An identical approach was used for analysis of recombination in strain F/70, using strains D/UW3, J/6276, G/9768, and E/11023 as templates.
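One possible reading of the informative-site count is sketched below. It assumes that the query and all templates come from a single multiple alignment and are therefore gapped strings of equal length, that a site is informative when the templates disagree at that column, and that a contiguous run of polymorphic gap columns is collapsed into one site. The function name `shared_informative_sites` and the toy sequences in the usage note are hypothetical.

```python
def shared_informative_sites(query, templates):
    """Count informative sites and how many the query shares with each template.

    query: gapped query sequence (e.g., from Ds/2923).
    templates: dict of template name -> gapped sequence of the same length.
    Returns (total_sites, {template_name: shared_count}).
    """
    names = list(templates)
    length = len(query)
    if any(len(templates[n]) != length for n in names):
        raise ValueError("all aligned sequences must have equal length")

    shared = {n: 0 for n in names}
    total = 0
    col = 0
    while col < length:
        column = [templates[n][col] for n in names]
        if len(set(column)) == 1:
            col += 1  # templates agree here: not informative
            continue
        end = col + 1
        if "-" in column:
            # Collapse a contiguous run of polymorphic gap columns into one site.
            while end < length:
                nxt = [templates[n][end] for n in names]
                if len(set(nxt)) > 1 and "-" in nxt:
                    end += 1
                else:
                    break
        total += 1
        for n in names:
            # The query shares the site with a template if it matches
            # that template over the whole site (one column or a gap run).
            if query[col:end] == templates[n][col:end]:
                shared[n] += 1
        col = end
    return total, shared
```

For a toy query `ACGTTTAC` aligned against templates `ACGA--AC` and `ATGTTTAC`, the sketch finds three informative sites, of which the query shares one with the first template and two with the second.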
Clinical isolate genome sequence variation data were used to design PCR and RFLP assays for genotypic variation in a population of clinical isolates stored in the repository, using oligonucleotide primers and restriction endonucleases listed in Table S1 in the supplemental material. The isolates used in this study included chlamydiae collected in King County, WA (55 isolates), Lima, Peru (2 isolates) (34), and Birmingham, AL (6 isolates). The genes analyzed were selected based on variation between serovar G cervical and rectal isolates or variation between serovar G, E, and D isolates. This approach was also used to confirm that each ompA genotype was consistent with the MOMP phenotype identified by immunofluorescence (data not shown). McCoy cells in six-well trays were infected with cloned clinical isolates, and chlamydiae were grown for 48 h. Genomic DNA was extracted using a Qiagen DNeasy blood and tissue kit by following the manufacturer's instructions, using an initial suspension buffer supplemented with 5 mM dithiothreitol. Sequences in the polymorphic regions selected for SNP analysis were confirmed by traditional Sanger sequencing, using primers shown in Table S1 in the supplemental material. The Fisher exact test was used to determine any statistical association of identified SNPs with observed phenotypes. The null hypothesis in these statistical analyses was that there was no correlation between genotype and phenotype at each of the loci tested. Statistical significance was expressed using a P value of <0.01 or <0.001, and the significance data supported the hypothesis that differences at the loci tested correlated with tissue tropism for either the rectal or cervical site of infection.
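For readers unfamiliar with the test, a two-sided Fisher exact test on a 2x2 genotype-by-site table can be sketched with the Python standard library alone. The function name `fisher_exact_two_sided` is hypothetical, and the counts in the usage note are invented for illustration; they are not data from this study. The two-sided P value is computed here by summing the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Example layout (hypothetical): rows = rectal / cervical isolates,
    columns = insertion present / insertion absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is as likely as, or less likely than, the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)
```

As a purely hypothetical example, a table in which all 10 rectal isolates carry an insertion and all 10 cervical isolates lack it gives `fisher_exact_two_sided(10, 0, 0, 10)`, a P value of about 1.1 x 10^-5, which would meet the P < 0.001 threshold used here.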
The C. trachomatis clinical isolate genome sequences sequenced at the Joint Genome Institute have been deposited in the DDBJ/EMBL/GenBank database under the following accession numbers: D(s)/2923, ACFJ01000001; E/11023, CP001890; E/150, CP001886; G/9301, CP001930; G/9768, CP001887; and G/11222, CP001888. The strains sequenced using Illumina sequencing as part of the Whole Genome Shotgun Project have been deposited in the DDBJ/EMBL/GenBank database under the following accession numbers: J/6276, ABYD01000001; F/70, ABYF01000001; and G/11074, CP001889.
Pairwise alignment analysis using MAFFT (24, 25) of the genomes of 12 recent clinical isolates demonstrated that there were different levels of heterogeneity among strains (Table 1). The number of nucleotide substitutions between urogenital strains belonging to different serovars (not including L2-434/Bu) ranged from 6,494 (for G/11222 and E/11023) to 1,638 (for D/UW3 and G/11222), and the number of nucleotide differences between strains belonging to the same serovar ranged from 1,287 (for G/11074 and G/11222) to 3 (for G/9301 and G/9768). The serovar G strains showing the highest level of similarity were cultured from the male urethra and male rectum of different patients, and their collection dates were separated by more than 1 year. The sequence of a cervical nonfusogenic isolate, Ds/2923, was more similar to the sequences of serovar E and F isolates than to the published serovar D sequence (Table 1). A comparison of strains E/11023 and Ds/2923 identified 1,211 substitutions, while a comparison of strains D/UW3 and Ds/2923 identified 5,764 substitutions. The highest number of differences between genomes (8,811 nucleotides, 326 indels) was the number of differences between the genomes of the publicly available sequenced ocular strain A/HAR-13 (10) and the publicly available sequenced lymphogranuloma strain L2-434/Bu. This number represented a maximum level of variability of 0.87%.
The whole-genome phylogenetic tree shown in Fig. 1 indicated that our sequenced urogenital serovars fell into at least two clades, one group containing serovars D, G, and J and a second group containing serovars E and F. These two groups were distinct from ocular strain A/HAR-13 and lymphogranuloma strain L2-434/Bu. In this analysis, the genome of strain Ds/2923, which was originally serotyped based on reactivity with serovar D-specific monoclonal antibodies, grouped in the clade with the serovar E and F strains. These data parallel the data shown in Table 1 and confirm that the genome of Ds/2923 is more similar to serovar E or F genomes than to the published serovar D genome.
Diffsort was used to determine the number of substitutions per ORF in comparisons of selected clinical isolates. This study was undertaken to determine if variation is localized to specific regions in the chlamydial genome and to identify possible loci containing variable regions in different serovars or within serovars (Fig. 2). These analyses identified specific regions throughout the genome that exhibited higher levels of variation than the overall genome. These regions included possible recombination targets, including ORFs CT049 to CT051 (20), the plasticity zone ORFs CT144 to CT176, ompA, and the pmp genes (5, 9-11, 16, 32, 45, 46).
Similar pairwise analysis of clinical isolates identified a variety of insertions or deletions (indels) in the different strains (Fig. 3). The serovar G rectal strains had a 430-nucleotide insertion in the CT154 gene, which led to an N-terminal truncation of CT154 and a putative new ORF (CT154.1). A second change found in the serovar G rectal strains was an in-frame 111-nucleotide insertion in gene CT326. These polymorphisms were limited to the sequenced serovar G rectal isolates; the cervical serovar G isolate G/11222 contained neither of these insertions. Both rectal and cervical serovar E isolates lacked the insertion in CT154 but had the 111-nucleotide insertion in CT326. The structure of CT326 was complicated by a 25-nucleotide deletion in serovar E strains compared to D/UW3. This deletion led to a truncated N-terminal CT326 ORF and a C-terminal CT326 ORF (Fig. 4). Strain J/6276 and previously sequenced strains L2-434/Bu and A/HAR-13 also had the 111-nucleotide insertion in CT326, as well as similar but unique insertions in the CT154 region (Fig. 4). Consistent with the findings of Carlson et al. (10), insertions and deletions were located in ORF CT456 (encoding Tarp) in several of the sequenced isolates.
The apparent similarity of the isolate Ds/2923 genome to genomes of serovar E and F strains led us to hypothesize that regions other than ompA might show evidence of recombination in this strain. A BLAST-based similarity approach using sliding windows consisting of 1,000 nucleotides across the entire genome was used to uncover additional regions of recombination. These analyses confirmed that the majority of the Ds/2923 genome is similar to the genomes of serovar E and F isolates (Fig. 5A), while ompA and nearby sequences are most similar to D/UW3 sequences. Fine mapping of these regions in Ds/2923 demonstrated that the apparent upstream crossover point adjacent to ompA is in the rs2 gene (CT680), at a position previously described as a hot spot for recombination in chlamydial genomes (20). The downstream crossover point for this recombination event is located within ompA and results in a hybrid MOMP protein with variable domains from different serovars (48). These studies also uncovered several additional regions exhibiting higher levels of similarity to the genome of D/UW3 than to the genomes of serovar E or F strains. The clearest examples of this are in ORFs CT171 to CT183 and ORFs CT360 to CT388 (Fig. 5A). The differences between genomes in these regions included SNPs, indels as large as 308 nucleotides (CT171), and ORF fusions. These data support the conclusion that these regions were involved in recombination between chromosomes, leading to the mosaic Ds/2923 genome.
To determine if genomes of other sequenced isolates exhibited a mosaic structure similar to that of Ds/2923, a BLAST similarity analysis was performed with isolate F/70 (Fig. 5B). For this experiment, the genome of F/70 was removed as a template in the serovar E-serovar F clade used for analysis of Ds/2923 and instead used as a probe of the remaining genomes. This analysis uncovered a set of regions of apparent recombination that were different from the regions observed in Ds/2923. One of the loci (ORFs CT153 to CT166) included the plasticity zone. This region also contained a sequence homologous to the TC0438 (tox) sequence found in Chlamydia muridarum (39). Another possible site of exchange in F/70 included ORFs CT859 through CT868. This region is upstream of the pmp genes (CT869 to CT872 and CT874) and has previously been hypothesized to be a locus where there was lateral gene transfer in C. trachomatis (19, 21).
To determine if polymorphisms found in the isolates sequenced were associated with tropism for the rectal site of infection, loci with various degrees of difference were used for PCR or RFLP analysis. These loci included genomic regions with nucleotide substitutions (CT144, CT869, and CT870), regions with indels that affected the predicted protein length (CT154 and CT326) (Fig. 4), and regions with single-nucleotide polymorphisms that affect open reading frame length (CT158 and CT159). Analyses were conducted with a collection of clinical isolates representing serovars E, G, and J that were obtained from the cervix, male rectum, or male urethra. While in most cases there was no apparent or statistically significant association between an SNP and tissue tropism, there were ORFs in which correlations were observed. Insertions in CT154 and CT326 (P < 0.001 and P < 0.01, respectively, Fisher's exact test) were associated with rectal tropism within serovar G (Fig. 4 and 6). Statistical analysis of these regions in serovar E and J isolates indicated that there was no association between genotype and tropism, but it is possible that some of these polymorphisms might be found to be significant (e.g., combinations of CT144, CT154, CT158, CT159, and CT326 in serovar J or pmp genes in serovars E and J) if higher numbers of isolates were investigated. Because a high proportion of our samples were collected in the Seattle area, it was possible that we examined a serovar G population that was restricted with respect to social network or geographic clustering. To address this concern, we included a limited number of rectal isolates belonging to serovar G collected in Alabama and Peru. Each of these isolates (n = 4) had the genotype associated with rectal infection for both ORF CT154 and ORF CT326.
No significant association was found between polymorphisms in CT869 and CT870 and tropism for serovar G strains, but a comparison of rectal and cervical sites of infection across all isolates identified a significant association between selected polymorphisms in these two ORFs (P < 0.001) (Fig. 6C).
While these analyses demonstrated that certain SNP patterns were associated with tropism for either the cervical or rectal site of infection, overall the chromosomes were mosaics of a variety of different SNP patterns. The mosaicism observed is most apparent in the serovar G cervical isolates shown in Fig. 6, but it is also evident in the serovar G rectal isolates (strains Gr-15P, Gr-16P, Gr-30, and Gr-30AB). This mosaicism also supports the conclusion that the analyses of MSM populations did not simply identify clonal expansion in a geographically restricted area or a closed social network.
The study of chlamydiae has benefitted greatly from advances in genome sequencing technology. These bacteria have small (~1-Mb), largely syntenic, AT-rich genomes with very few repeat regions and almost no genomic islands. We (39) and other workers (10, 23) have explored the possibility that genome sequence analysis can be used to characterize functional roles of individual genes in the chlamydiae, particularly since there are no practical genetic tools for this system.
In the present study, we used genome sequencing and PCR-based analyses of polymorphisms to examine chlamydial recombination in clinical isolates and to develop a technology for correlating the chlamydial genotype with the clinical phenotype. The genomes sequenced were collected at Seattle/King County sexual health clinics, and the isolation dates ranged from October 1994 to August 2005. Initial genome sequence data demonstrated that the maximum sequence divergence between clinical strains was 0.87% and that there was a minimum difference of three substitutions and a single nucleotide indel. The two strains showing the highest degree of similarity were serovar G isolates collected from the rectal (G/9768) and male urethral (G/9301) sites of infection, and more than a year separated the collection dates of these strains (November 2001 and May 2000, respectively). The largest insertion in any clinical strain sequenced was 4,668 bp long, and it was found in the previously sequenced strain F/70 and was shown to be variable in serovar J strains (9, 39). The overall level of variation observed among C. trachomatis strains is similar to the level of variation observed for members of the same genomic group of the obligate intracellular bacterium Coxiella burnetii (2), as well as for Rickettsia rickettsii isolates (15).
There are technical issues associated with generation of precise SNP counts for different strains. For example, we found 3,696 substitutions when the published sequences of strains D/UW3 and A/HAR-13 were compared; this number is slightly higher than the number determined by Carlson et al. (10). The differences can be attributed to the specific aspects of the programs used to count SNPs. One example of this is the settings used in the MAFFT software to generate alignments for determining numbers of substitution and indel events. The default MAFFT settings, which were used in our study, result in a higher penalty for inserting a gap than for extending a gap. Adjusting the settings for a lower gap insertion penalty and a higher extension penalty resulted in slightly fewer substitutions at the cost of increasing the number of indel events. Such differences in the analysis programs lead to slight differences in counts, but the overall relationships among genomes are conserved.
Phylogeny-based characterization of the genomes revealed two clades for our sequences; one clade contained serovars E and F, and the second clade contained serovars D, G, and J. These relationships are in agreement with previously described relationships determined by phylogenetic analysis either by using a set of housekeeping genes (31) or by performing comparative studies of chlamydial phylogeny (9, 10, 32).
A BLAST-based analysis was used to examine if recombination is common in chlamydiae and could be reflected in the genomes collected. Strain Ds/2923 is a cervical isolate that was our original example of an IncA-negative, nonfusogenic strain (40), and its genome showed the best evidence of recombination in clinical strains. This strain, similar to all other IncA-negative strains that have been characterized, has a lesion in IncA that is correlated with the inactivated incA open reading frame, and inclusions formed by such strains do not fuse with inclusions formed by either IncA-negative or IncA-positive C. trachomatis (40). Although serovar-specific monoclonal antibodies identified Ds/2923 as a serovar D strain, our analyses demonstrated that most of its chromosome is more similar to the chromosomes of serovar E or F strains. A set of ORFs that may have been recombination targets in this strain includes ompA, the gene encoding MOMP, the major serovariant antigen in the chlamydiae. Possible recombination events involving ompA have been found in other studies. Early in the analysis of chlamydial gene sequences, hybrid ompA coding sequences were identified by several investigators (6, 27, 29, 48), and recent work by Gomes et al. (20) demonstrated that a recombination hot spot is just upstream of the ompA coding sequence. The Ds/2923 chromosome had an apparent recombination event exactly at the position identified by Gomes and colleagues. The downstream recombination event occurred between variable domains 3 and 4, which is also consistent with the results of other workers who have identified ompA sequences encoding mosaic MOMPs with variable domains in members of different prototype serovars (27, 48).
Studies of possible recombination sites in Ds/2923 were expanded by performing a BLAST-based similarity analysis of sequenced genomes. This analysis identified additional regions in the Ds/2923 chromosome that were targets for recombination between strains. These candidate recombination loci involve several sites, including the plasticity zone (3, 9, 10, 16, 32, 45, 46) and ORFs CT360 to CT389, a region encoding a set of metabolic pathways and hypothetical genes. Both of these genomic regions in Ds/2923 are more similar to the prototypic serovar D sequence than to our serovar E or F sequences. ORFs CT360 to CT389 include aaxABC (CT372 to CT374), which encode proteins that participate in an arginine-agmatine exchange system (36). Recently, Giles et al. used an Escherichia coli system to show that polymorphisms in aaxB led to inactivation of this gene in strain D/UW3 and serovar L2 and that function might be restored in D/UW3 by an R115G replacement (18). Genomic analysis demonstrated that isolates E/150 and E/11023 had this R115G replacement. Therefore, it is possible that there is phenotypic discrepancy at this locus, with serovar E and F strains being “wild type” and D/UW3 and the mosaic strain Ds/2923 being deficient in this exchange pathway.
A subsequent analysis of the genome of strain F/70 identified similar examples of likely recombination events, but at different loci. The data obtained are consistent with the results of in vitro analyses by DeMars et al. (13, 14) and our laboratory (42) and support the hypothesis that recombination is very common and involves many different locations across the chlamydial chromosome.
Our second hypothesis addressed the concept that genome sequencing of clinical strains could identify and help characterize genes and gene products important in the biology of the pathogen in vivo. Pioneering work in this area was conducted by Caldwell et al., who used sequence analysis of a limited number of ORFs to correlate the presence of trp synthesis genes with ocular or genital tropism in C. trachomatis (8). We chose a different clinical phenotype for analysis, tropism for the rectal site of infection, as a trait for study. Our analyses identified four loci that were statistically associated with a particular tropism, only one of which, the CT869-CT870 locus, was associated with rectal tropism across all serovars (P < 0.001). These ORFs encode two Pmp proteins, which are members of a family of chlamydial autotransporters important in different aspects of chlamydial biology (12, 26, 44, 47). The nucleotide variation found in these genes leads to amino acid changes that are not randomly distributed across the coding sequence, indicating that the variable regions may be parts of domains important in chlamydial biology. The variation and molecular evolution of the pmp genes is an active area of research (30), but the possible function of the different Pmp variants in attachment or intracellular development remains to be investigated.
Alterations in three other ORFs were statistically correlated with tropism only in serovar G. CT144 encodes a hypothetical 285-amino-acid protein that varies at 14 amino acids in the strains tested. Eleven of the 14 amino acid changes in CT144 were clustered in a 26-amino-acid region of the gene product (data not shown). Further study of CT144 might determine if this variable domain plays a role in tissue tropism or pathogenesis. ORF CT154.1 is the result of a 430-nucleotide insertion that was also found in the genomes of serovar A and L2 isolates (Fig. 4) and encodes a protein with no predicted function. Finally, the presence of a 111-nucleotide in-frame insertion in uncharacterized hypothetical gene CT326 was statistically correlated with tropism for the rectal culture site. Our association of these ORFs with tissue tropism is complicated by the fact that none of these ORFs were statistically associated with rectal tropism in serovar E or serovar J strains. It is possible that production of the different proteins leads to phenotypic differences in the context of serovar G, facilitating the apparent tissue tropism. Alternatively, it is possible that these SNP differences are in or linked to regions of the genome that encode unidentified proteins that collectively are important for this tropism. Possible functions of candidate proteins identified in this study are currently being explored in our laboratory.
Connie Celum and Will M. Geisler are gratefully acknowledged for supplying chlamydia isolates collected from individuals in Lima, Peru, and Birmingham, AL. Paul Richardson and Alla Lapidus are acknowledged for the genome sequencing and assembly conducted at the Joint Genome Institute. We thank Kelsi Sandoz for advice and consultation regarding PCR and RFLP analysis. Sara Weeks is acknowledged for technical assistance and editing of the manuscript.
This research was supported by grants A148769 and A1031448 from the National Institutes of Health.
Editor: R. P. Morrison
Published ahead of print on 22 March 2010.
†Supplemental material for this article may be found at http://iai.asm.org/.