Chlamydia trachomatis is an obligate intracellular bacterium that causes a wide range of severe and debilitating diseases worldwide. Sporadic and ongoing outbreaks of lymphogranuloma venereum (LGV) strains among men who have sex with men (MSM) support the need for research on virulence factors associated with these organisms. Previous analyses have been limited to single genes or to the genomes of laboratory-adapted reference strain L2/434 and outbreak strain L2b/UCH-1/proctitis. We characterized an unusual LGV strain, termed L2c, isolated from an MSM with severe hemorrhagic proctitis. Unlike the LGV strains described to date, L2c developed nonfusing, grape-like inclusions and a cytotoxic phenotype in culture. Deep genome sequencing revealed that L2c was a recombinant of L2 and D strains with conserved clustered regions of genetic exchange, including a 78-kb region and a partial, yet functional, toxin gene that was lost with prolonged culture. Indels (insertions/deletions) were discovered in an ftsK gene promoter and in the tarp and hctB genes, which encode key proteins involved in replication, inclusion formation, and histone H1-like protein activity, respectively. Analyses suggest that these indels affect gene and/or protein function, supporting the in vitro and disease phenotypes. While recombination has been known to occur in C. trachomatis based on gene sequence analyses, we provide the first whole-genome evidence for recombination between a virulent, invasive LGV strain and a noninvasive common urogenital strain. Given the lack of a genetic system for producing stable C. trachomatis mutants, identifying naturally occurring recombinants can clarify gene function and provide opportunities for discovering avenues for genomic manipulation.
Lymphogranuloma venereum (LGV) is a prevalent and debilitating sexually transmitted disease in developing countries, although there are significant ongoing outbreaks in Australia, Europe, and the United States among men who have sex with men (MSM). Relatively little is known about LGV virulence factors, and only two LGV genomes have been sequenced to date. We isolated an LGV strain from an MSM with severe hemorrhagic proctitis that was morphologically unique in tissue culture compared with other LGV strains. Bioinformatic and statistical analyses identified the strain as a recombinant of L2 and D strains with highly conserved clustered regions of genetic exchange. The unique culture morphology and, more importantly, disease phenotype could be traced to the genes involved in recombination. The findings have implications for bacterial species evolution and, in the case of ongoing LGV outbreaks, suggest that recombination is a mechanism for strain emergence that results in significant disease pathology.
Chlamydia trachomatis is responsible for a broad spectrum of diseases in males and females of all age groups worldwide. C. trachomatis is the leading cause of preventable blindness in tropical developing countries and the leading global cause of bacterial sexually transmitted diseases (STDs) (1). Over 92 million individuals are infected with C. trachomatis in the urogenital tract annually (1). In the United States alone, more than 1 million cases are reported each year, although the actual number is estimated to be 2.8 million because of deficiencies in screening and reporting (2). The Centers for Disease Control and Prevention (CDC) has reported that these infections cost Americans over $10 billion annually.
The complications of urogenital infection include tubal factor infertility, ectopic pregnancy, and chronic pelvic pain. The lymphogranuloma venereum (LGV) strains of C. trachomatis are considered biological variants of the organism and cause invasive disease such as genital ulcers, hemorrhagic proctitis, rectal fistulae, and suppurative lymphadenitis (3). LGV is prevalent in developing countries and represents a neglected tropical disease. To date, five different strains of LGV have been identified: L1 to L3, L2a, and L2b.
Infection is initiated at the mucosal epithelium by host cell endocytosis of the infectious but metabolically inert elementary body (EB). Following cell entry, the EB differentiates into the replicative form, termed the reticulate body (RB), within a nonacidified inclusion. Over 30 C. trachomatis proteins are secreted into the inclusion membrane, including a family of inclusion (Inc) proteins of which IncA is the best described (4, 5). IncA is thought to be critical for the fusion of independent inclusions that form when more than one EB enters the same host cell.
The mechanism of ascent to upper genital tract tissues remains poorly understood. While LGV strains have not been identified in endometrial or fallopian tube tissue, they are known to cause ulceration, invade the basal layers, and travel via the lymphatic system to regional draining lymph nodes (3). There have been sporadic and ongoing outbreaks of LGV among MSM in many developed countries worldwide where, historically, the rates of LGV have been extremely low. The first reported outbreak occurred in Rotterdam, Netherlands, in 2003 (6). Additional outbreaks were subsequently reported in the United States, Europe, and Australia (7). The majority of the infected men had engaged in high-risk behavior, were HIV positive, and presented with painful hemorrhagic proctitis, discharge, and, in some cases, constipation. Surprisingly, genital ulcers and the inguinal syndrome were absent.
In 2005, a novel LGV strain, termed L2b, which was associated with both symptomatic and asymptomatic disease, was identified (8); its genome was subsequently sequenced (9). While a similar strain based on ompA genotyping was detected among isolates dating back to the 1980s in San Francisco, CA (10), the degree of genome homology between the two is not known. L2b strains have also recently been reported in urethritis cases (11). These cumulative findings, especially the association of L2b with different disease presentations, and the availability of only two LGV genomes (L2/434 and L2b/UCH-1/proctitis) suggest that further analyses of the morphological, molecular, and genetic characteristics of clinical LGV strains are warranted to identify virulence factors and understand the microbial nature of the outbreaks. The purpose of the current study was to advance these objectives by characterizing the cellular biology and genomics of an LGV isolate from an HIV-negative MSM who had severe hemorrhagic proctitis.
We isolated a unique LGV strain from the MSM that was discovered to be a recombinant of invasive L2 and noninvasive D strains. We present the first computational and statistical whole-genome evidence for recombination in this pathogen. Our findings have implications for bacterial evolution and, in the case of ongoing LGV outbreaks, suggest that recombination is a mechanism for strain emergence that results in significant disease pathology. While we do not know the molecular clock for these events, the results suggest either that the recombinant was produced in the rectal mucosa of our patient prior to sampling or that a clonal population of L2 and D recombinants had already emerged and was circulating among the patient’s core sexual group. Methods for direct genome sequencing of the organisms present in various tissues will be required to fully ascertain the diversity and recombinant nature of C. trachomatis strain infections and their associated disease phenotypes.
We compared the morphological characteristics of reference strains D/UW3, L1/440, L2/434, L2a/TW-396, and L3/404, clinical strain L2b, and a clinical isolate from the rectal mucosa of an MSM from this study, termed L2c, in culture using HeLa229 cells and confocal fluorescence microscopy (see Materials and Methods). Reference strains refer to C. trachomatis organisms that were isolated many decades ago, have been propagated since then, and are considered laboratory adapted. At 24 h, L2/434 (Fig. 1A) and L1/440, L2a/TW-396, L3/404, and L2b (not shown) produced large inclusions with IncA-containing fibers extending to secondary inclusions (Fig. 1A, arrowheads). D/UW3 had smaller inclusions than L2/434 at 24 h but a similar-sized inclusion at 48 h (Fig. 1B), with IncA staining at all time points. In contrast, L2c formed multiple smaller inclusions that did not fuse (Fig. 1C, arrows); there was no evidence for IncA staining, IncA-containing fibers, or fusion of inclusions during the course of development at 12, 18, or 24 h for L2c (Fig. 1C and 1D) (data not shown for 12 and 18 h). We found that incA mRNA was expressed for strains L2/434, D/UW3, and L2c (Fig. 1E), despite the absence of IncA staining for L2c.
L2c was more difficult to grow and, at the same multiplicity of infection (MOI), yielded 20% fewer infected and surviving cells than the other LGV strains at 24 and 36 h. In the growth curve (Fig. 1F), relative 16S rRNA expression for L2/434 peaked at 24 h and tapered at 36 and 48 h. L2c also peaked at 24 h, but at a much lower level than L2/434, whereas D/UW3 peaked at 48 h, at a level lower than that of L2/434 but higher than that of L2c.
Because of the difference in morphological characteristics between L2c and the other LGV strains, and previous reports of nonfusing strains being less virulent (12), we hypothesized that the other LGV strains would be more cytotoxic than L2c. We also assayed the D/UW3 strain, given the probability that the partial toxin gene was acquired from a D strain (described below). The cytopathic effect of each strain was determined by infecting HeLa cell monolayers with each strain independently in a toxicity assay (see Materials and Methods), and infections were visualized by light microscopy after 4 h. Surprisingly, L2/434 was far less cytotoxic (Fig. 2A) than L2c (Fig. 2B) or D/UW3 (Fig. 2C); both L2c and D/UW3 produced significantly more cells that underwent rounding, detachment, and lysis (score, 3+) than L2/434 (100% versus <10%; P < 0.01 for both). Western blot analysis showed toxin protein production for D/UW3 and L2c but not for L2/434 (Fig. 2G).
Genomic DNA was extracted from L2c grown in HeLa cells after plaque purification and propagation (13) (see Materials and Methods). To verify clonal purity, ompA and MLST genes (14) for 10 clones were Sanger sequenced following amplification by PCR and cloning using a TOPO TA kit (Invitrogen, Carlsbad, CA) (see Materials and Methods). The sequences of the eight genes were identical for each clone.
Genome sequencing was performed using 454 pyrosequencing (15). The genome sequences for the L2c chromosome (1,038,313 nucleotides [nt]) and plasmid (7,499 nt) were obtained following assembly and in silico closure (see Materials and Methods). The genome was annotated by the Integrative Services for Genomics Analysis pipeline (16). There were 1,005 putative protein-coding genes. While the overall sequence was similar to that of L2/434, we noticed unusual patterns of localized variation. Using an in-house program called Q-plotGenome (see Materials and Methods) and BLAST score ratio analysis (17), we identified a recombinant region of 78 kb and other smaller regions that were similar in sequence to D/UW3 (Fig. 3 and Table 1, region 1; see also Fig. S1 in the supplemental material). Phylogenetic reconstructions (Fig. 4) confirmed the recombinant regions to have originated recently from an ancestor similar to D/UW3, while the rest of the chromosome and plasmid were closely related to L2 sequences. Using Q-plotGenome, we identified indels for the cytotoxin region, the tarp and hctB genes, and the intergenic region (IGR) upstream of the ftsK gene (Fig. S1).
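BLAST score ratio (BSR) analysis divides each alignment's bit score by the query's bit score against itself. This puts windows of the L2c genome on the same 0-to-1 scale against both references. The sketch below is a minimal illustration and assumes the bit scores for each window (against L2/434, against D/UW3, and against itself) have already been obtained from BLAST. The `margin` cutoff and the example scores are hypothetical. This is not Q-plotGenome or the exact procedure used in the study.

```python
def blast_score_ratio(hit_bits: float, self_bits: float) -> float:
    """Bit score of a window's best hit against a reference genome, divided
    by the window's self-alignment bit score (1.0 = identical)."""
    if self_bits <= 0:
        raise ValueError("self-alignment bit score must be positive")
    return hit_bits / self_bits


def classify_window(bsr_l2: float, bsr_d: float, margin: float = 0.05) -> str:
    """Label a window by the reference it matches more closely.

    `margin` is a hypothetical cutoff chosen for illustration only.
    """
    if bsr_d - bsr_l2 > margin:
        return "D-like"
    if bsr_l2 - bsr_d > margin:
        return "L2-like"
    return "ambiguous"


# Hypothetical bit scores for one L2c window: vs L2/434, vs D/UW3, vs itself.
hit_l2, hit_d, self_score = 1810.0, 1995.0, 2000.0
label = classify_window(
    blast_score_ratio(hit_l2, self_score),
    blast_score_ratio(hit_d, self_score),
)
print(label)  # prints "D-like"
```

Normalizing by the self-score keeps long and short windows comparable. A fixed margin is the simplest decision rule; a real analysis would more likely examine runs of adjacent windows.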
A puzzling feature of the recombinant regions was a persistent background of reads with bases matching those of L2/434. We plotted the number of nucleotide variants at each of the 7,982 single nucleotide polymorphisms (SNPs) between D/UW3 and L2/434 (see Materials and Methods). At 6,527 (82%) of the known SNP positions, all reads carried the L2/434 base rather than the D/UW3 base. At the remaining 18% of known SNP positions, the frequency of D-like variants was either approximately 50% or significantly above zero (Fig. 3, red bars, and Table 1, regions 2, 3, 5, and 6). These SNPs were probably introduced through gene conversion of localized segments with a D-like ancestor. The Circos plot highlights candidate recombinant regions of L2c that match D/UW3 or L2/434 sequences (Fig. 5). A further 235 variant positions in two regions were not known variants between L2/434 and D/UW3 (Fig. 3, blue bars, and Table 1, regions 1 and 7). These may have arisen through recombination with a strain significantly different from D/UW3 or with a D strain that has variant sequences in those regions. However, comparison of L2c with another recently sequenced D strain [D(s)/2923; GenBank accession no. NZ_ACFJ00000000] (18) gave similar results: the distribution of SNPs was almost identical to that for D/UW3, and fewer D(s)/2923 SNPs matched (7,565 SNPs, versus 7,982 for D/UW3).
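The read-level tally described above can be expressed compactly. The following sketch assumes per-site read counts have already been extracted from the read alignment. The `SnpSite` record and the bin thresholds are hypothetical choices, not the study's pipeline. The last line reproduces the 82% figure from the reported counts.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SnpSite:
    position: int  # coordinate in the L2c assembly
    depth: int     # reads covering the site
    d_like: int    # reads carrying the D/UW3 base rather than the L2/434 base


def d_like_fraction(site: SnpSite) -> float:
    """Fraction of reads at a known D/UW3-vs-L2/434 SNP that carry the D base."""
    return site.d_like / site.depth if site.depth else 0.0


def bin_site(site: SnpSite) -> str:
    """Coarse, hypothetical bins for reporting the D-like read fraction."""
    f = d_like_fraction(site)
    if f == 0.0:
        return "absent"
    if f < 0.35:
        return "low"
    if f <= 0.65:
        return "about half"
    return "high"


def percent_absent(sites: Iterable[SnpSite]) -> float:
    """Percentage of known SNP sites with no D-like reads at all."""
    sites = list(sites)
    absent = sum(1 for s in sites if s.d_like == 0)
    return 100.0 * absent / len(sites)


# Reported counts: 6,527 of 7,982 known SNP sites carried only the L2/434 base.
print(round(100 * 6527 / 7982))  # prints 82
```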
The likely explanation for these results is that, in the patient, an L2 strain acquired DNA segments through gene conversion from a D strain and from either one other C. trachomatis strain or a D strain variant. The SNP patterns could not have been produced by accidental mixing of strain D and L2 genomic DNA in the laboratory before genome sequencing. If that were the case, all D-like SNPs would have had the same frequency across the chromosome and plasmid. Instead, more than 80% of the SNPs were not seen at all, and 18% of the variants were found at frequencies greater than 15%. Furthermore, under a contamination scenario, the observed extent of recombination and the conservation of the recombinant regions would have had to arise within relatively few culture generations. Any contaminating D or other donor strains would also have had to be eliminated, since no donor strain was detected even with high-redundancy (75-fold) sequencing. Also, all SNPs between L2c and L2/434 fell within the boundaries of the seven discrete recombinant regions (Table 1). Outside these regions, there was no intrinsic genetic variation between the genome backbones of L2c and L2/434, LGV strains isolated in California more than 30 years apart (10). Finally, all 10 plaque-purified clones screened by ompA sequencing and multilocus sequence typing (MLST) were typed as L2 and showed no minor contaminating base peaks.
As described above, we identified seven putative recombinant regions within the L2c genome (Table 1). The regions were independently analyzed and confirmed using Sanger sequencing (see Materials and Methods).
Region 1 contained SNPs in ~20% of the reads that match a mixture of D-like and non-D-like variants, suggesting that the DNA source was not a D strain. Comparison of L2c with D(s)/2923 also did not reveal a match in this region. However, since only two D strains [reference D/UW3 and clinical D(s)/2923] have been genome sequenced to date and the distributions of SNPs are almost identical to each other, it is possible that other strains within the D cluster may match this region.
In region 2, beginning in the middle of the sufB gene, the proportion of reads with SNPs rose to 70 to 90% for variants that were entirely D-like. This region continued for 78 kb and included a 16S rRNA operon. The ftsK gene, which is involved in cytokinesis and is encoded on the complementary strand, lay in this region, reading from the 3′-to-5′ side of the genome origin of replication. The IGR upstream of ftsK was annotated using the Bacterial PROMoter and BDGP programs. Two putative promoters were identified at alignment coordinates 45 to 168 and 310 to 484 upstream of the start codon (see Fig. S3 in the supplemental material), and both predictions had significant scores. Compared with L2/434 and L2b/UCH-1/proctitis, L2c, D/UW3, and ocular strains had a 33-nt deletion in the promoter at nt 406 (Fig. S3), although the significance of this finding for gene regulation is unknown.
Region 3 was a small recombinant region centered on the glgB gene, part of the glycogen biosynthetic pathway.
Region 4 contained a large insertion of 1,915 nt representing a partial toxin gene (CT166 in D/UW3) at the 5′ end not present in any LGV strains (Fig. 6). ClustalW alignment revealed that the region of CT165 to CT168 was almost identical to that of D/UW3. In vitro experiments revealed expression of the toxin with a cytotoxic phenotype in HeLa cells for L2c (Fig. 2B) but not for the other LGV strains (Fig. 2A) (see Materials and Methods). After 20 passages of L2c in tissue culture, the toxin insertion was lost. The cytotoxicity assay was repeated for the passaged isolate, and there was no observed cytopathic effect for L2c (Fig. 2E) (see Materials and Methods) or L2 (Fig. 2D); D/UW3 maintained a cytotoxic effect (Fig. 2F). However, other genes that could have contributed to the change in phenotype may have also been lost.
Region 5 contained a 216-nt (72-amino-acid) deletion in the hctB gene, which encodes the histone H1-like protein Hc2. This region of hctB differentiates the ocular, noninvasive, and LGV disease groups from one another (19, 20) (see Fig. S4 in the supplemental material). Compared with LGV strains, D/UW3 and D(s)/2923 have a 20-amino-acid indel, while ocular strains have a 56-amino-acid deletion (Fig. S5). The basic amino acids arginine and lysine (Fig. S5, blue and purple letters) and pentapeptide motifs (e.g., TAARK, VAAKK, and TVAKR) are highly repetitive (Fig. S5; red stars denote separation of pentapeptides). The motifs consist primarily of three aliphatic residues followed by two basic residues.
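The sketch below makes the motif description concrete. It counts nonoverlapping pentapeptides of the form described and the basic residues in a protein string. The regular expression is an assumption: it treats A, V, L, I, and T as the leading class because two of the listed motifs (TAARK, TVAKR) begin with threonine, which is not strictly aliphatic. The test string is the three example motifs concatenated and is not an Hc2 sequence. The last function checks that a 216-nt in-frame deletion removes 72 codons.

```python
import re

# Hypothetical motif definition: three residues from {A, V, L, I, T}
# followed by two basic residues (K or R).
PENTAPEPTIDE = re.compile(r"[AVLIT]{3}[KR]{2}")


def count_pentapeptides(protein: str) -> int:
    """Count nonoverlapping motif matches, scanning left to right."""
    return len(PENTAPEPTIDE.findall(protein.upper()))


def count_basic(protein: str) -> int:
    """Count lysine (K) and arginine (R) residues."""
    seq = protein.upper()
    return seq.count("K") + seq.count("R")


def codons_removed(deletion_nt: int) -> int:
    """Number of codons removed by an in-frame deletion."""
    if deletion_nt % 3:
        raise ValueError("deletion is not in frame")
    return deletion_nt // 3


example = "TAARKVAAKKTVAKR"  # the three motifs named in the text, concatenated
print(count_pentapeptides(example), count_basic(example), codons_removed(216))
# prints "3 6 72"
```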
Region 6 contained two in-frame deletions in the tarp gene, which encodes the translocated actin-recruiting phosphoprotein TARP (21). Based on the alignment with L2/434, the deletions lie at residues 379 to 687 and at residues 1084 to 1239 (see Fig. S6 in the supplemental material). Both deletions are similar to D/UW3 and represent areas of sequence divergence from LGV strains (Fig. S6). The deletions might have arisen from a crossover event, as suggested by SimPlot informative sites and by phylogenetic and bootscan analyses (Fig. 7A to D) (see Materials and Methods). Using SimPlot, we located recombination breakpoints at residues 990 to 991 and 1218 to 1219, equivalent to residues 996 to 997 and 1689 to 1690 when alignment gaps are included (Fig. 7B and C; Fig. S6).
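Converting between ungapped sequence positions and alignment columns is what relates a breakpoint reported without gaps to its coordinate with gaps included. This is a simple walk along the aligned string. The minimal sketch below assumes a gapped sequence in which "-" marks gaps and uses 1-based coordinates. The toy alignment is hypothetical.

```python
from typing import Optional


def ungapped_to_column(aligned: str, residue: int, gap: str = "-") -> int:
    """Return the 1-based alignment column holding the residue-th
    (1-based) non-gap character of an aligned sequence."""
    if residue < 1:
        raise ValueError("positions are 1-based")
    seen = 0
    for column, char in enumerate(aligned, start=1):
        if char != gap:
            seen += 1
            if seen == residue:
                return column
    raise ValueError("position exceeds the ungapped sequence length")


def column_to_ungapped(aligned: str, column: int, gap: str = "-") -> Optional[int]:
    """Return the 1-based ungapped position at an alignment column,
    or None if the sequence has a gap there."""
    if not 1 <= column <= len(aligned):
        raise ValueError("column out of range")
    if aligned[column - 1] == gap:
        return None
    return sum(1 for char in aligned[:column] if char != gap)


toy = "AC--GT-A"  # hypothetical aligned sequence with three gap columns
print(ungapped_to_column(toy, 3), column_to_ungapped(toy, 8))  # prints "5 5"
```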
The deletions represent an area of tyrosine-rich repeats in Tarp (21). While L2/434 and L2b/UCH-1/proctitis contain six tandems of tyrosine-rich repeat regions, L2c, A/2497, B/HAR36, C/TW3, and D/UW3 contain three partial tandems of tyrosine-rich regions (see Fig. S7, purple letters, tyrosine in bold, in the supplemental material). Other polymorphisms include indels, SNPs, and C-terminal repeat regions in B/HAR36, C/TW3, and D/UW3 that differentiate these strains from L2/434, L2b/UCH-1/proctitis, and L2c (Fig. S7, cyan and grey highlights, respectively). No polymorphisms in the proline-dense and actin-binding domains were identified.
Region 7 contained non-D-like SNPs at a frequency of 10 to 20% of the total reads. Comparison of D(s)/2923 with L2c in this region showed similar results.
Although neither IncA expression nor large fused inclusions were observed in vitro for L2c (Fig. 1), all LGV strains, including L2c, had identical incA sequences (data not shown). The promoter region, encompassing 1,500 nt upstream of the incA start codon, was highly conserved. It showed ≥99% nucleotide identity among reference strains A to K, Ba, Da, Ia, Ja, and L1 to L3 and clinical strains L2b and L2c, but not D(s)/2923. Moreover, five hypothetical proteins in L2/434 (CTL0475 to CTL0540) carry the characteristic IncA protein domains and family signatures and might therefore function like IncA. Their promoter and coding sequences were also highly conserved (data not shown).
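Percent identity depends on how gaps are treated. The sketch below assumes one common convention: identical columns are counted among those where neither sequence has a gap. Other tools count gaps as mismatches, so results can differ. Over 1,500 compared positions, a ≥99% identity threshold allows at most 15 differences.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def max_mismatches(length: int, min_identity_pct: int) -> int:
    """Largest number of differences still meeting an integer identity threshold."""
    return length * (100 - min_identity_pct) // 100


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # prints 90.0
print(max_mismatches(1500, 99))                      # prints 15
```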
The sequence of the L2c ompA gene differed from the L2/434 sequence in an SNP upstream of variable segment 4 (VS4) that is conserved among L2b/UCH-1/proctitis and L2′ (see Fig. S8 in the supplemental material). The mutation results in a synonymous codon change.
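Whether a single-nucleotide change is synonymous can be checked by translating the reference and mutant codons. The sketch below assumes the standard genetic code; NCBI translation table 11, used for bacteria, has the same amino acid assignments as table 1. The coding sequence and positions in the example are hypothetical, because the exact ompA coordinate is not given here.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; amino acid assignments are the same in
# NCBI translation tables 1 and 11 ("*" marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def snp_is_synonymous(cds: str, position: int, alt_base: str) -> bool:
    """True if replacing the base at a 1-based in-frame CDS position with
    alt_base leaves the encoded amino acid unchanged."""
    cds = cds.upper()
    start = (position - 1) // 3 * 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete codon")
    offset = position - 1 - start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


toy_cds = "ATGCTGAAA"  # hypothetical: Met-Leu-Lys
print(snp_is_synonymous(toy_cds, 6, "A"))  # CTG -> CTA, both Leu: True
print(snp_is_synonymous(toy_cds, 5, "C"))  # CTG -> CCG, Leu -> Pro: False
```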
The last decade has seen a cumulative increase in the evidence for recombination and lateral gene transfer (LGT) among intracellular bacteria, and Chlamydia is no exception (14, 22–31). Data on recombination in C. trachomatis have come solely from extensive comparative analyses of multiple genes dispersed throughout the genome for reference and clinical strains. However, two recent studies that sequenced the genomes of different C. trachomatis clinical isolates concluded that some genomic regions were consistent with interstrain recombination (9, 18). Here, we provide the first analytically confirmed whole-genome evidence for recombination between C. trachomatis strains and, surprisingly, between invasive (LGV) and noninvasive urogenital strains. This prompts a reassessment of how we think about C. trachomatis infections and their evolution in vivo. Many diverse microbes are known to inhabit the rectum. However, unlike coinfection of the urethra or cervix (32), coinfection of the rectal mucosa by C. trachomatis strains at any relevant frequency, followed by LGT to produce recombinants, has not been considered.
Our study also showed how laboratory processing of C. trachomatis strains after isolation from the patient can skew understanding of the nature of the infection. Initially, the L2c culture was toxic to HeLa cells and expressed the toxin protein. In addition, an intact D-like toxin gene was recovered from the L2c genome by PCR and Sanger sequencing. These data suggested a correlation with the clinical severity of disease. After multiple passages in cell culture, the cytotoxic phenotype was lost (Fig. 2), likely reflecting selection against the cytopathic clone in culture, and we could not detect the toxin gene in the genome sequencing data from the passaged isolate. To fully grasp the complexity of C. trachomatis infections, including the frequency and diversity of recombinants, techniques for direct genome sequencing (without prior amplification) of the organisms present in infected tissue will ideally be needed.
IncA is an important constituent of the inclusion membrane that facilitates the fusion of inclusions within the cell. Variants of C. trachomatis strains B, D to H, Ia, and J that completely lack incA or lack a portion of the gene produce multiple inclusions that do not fuse (5, 33). The possible benefits of fusogenic inclusions include interaction with host cell vesicle trafficking and genetic exchange between the DNA of different RBs (4). Nonetheless, nonfusogenic strains have been reported to undergo recombination (18), and recombination between nonfusogenic and wild-type strains has been demonstrated in vitro (27). Interestingly, patients infected with naturally occurring incA “knockouts” or mutants have fewer signs and symptoms, and these strains have lower proliferative capacity and yield fewer inclusion-forming units (IFUs) in culture than wild-type fusing strains (12). L2c failed to express IncA at any time during development and produced many small nonfusogenic inclusions (Fig. 1). In contrast to the mostly asymptomatic cases caused by other nonfusogenic strains (12), L2c was hypervirulent in terms of clinical signs and symptoms, producing severe hemorrhagic proctitis. The patient nevertheless did not exhibit an inguinal syndrome, which may in part be due to the presence of a functional cytotoxin (Fig. 2) (discussed below). The L2c incA sequence was identical to that of L2/434, and incA mRNA expression levels were similar for both strains (Fig. 1E), suggesting a possible disruption in the regulation of protein processing. Recent studies of the transcriptional profiles and cell culture kinetics of naturally occurring IncA knockout and wild-type strains suggest that the IncA-negative phenotype may arise from multistep events involving a decrease in transcription level and/or a partial or complete inactivation of translation (5).
Alternatively, host environmental cues may be necessary for regulation, as has been discovered for certain proteins of other Gram-negative human pathogens, such as Burkholderia (34).
While the genomes of C. trachomatis strains to date are relatively conserved and share a high degree of synteny, an exception to this is found in the plasticity zone (PZ) (28). The PZ is typically rich in heterogeneity, with evidence for genome rearrangement as well as LGT for many bacterial species, including Bartonella grahamii, Helicobacter pylori, and Shigella flexneri (28, 35–37). The chlamydial PZ contains metabolic and virulence factors associated with tissue tropism and immune evasion, including the toxin loci (38, 39). Here, we consider it likely that the partial toxin gene was acquired from coinfection with a D strain, since no LGV sequences (n = 6) to date contain a partial or complete toxin gene. Indeed, D strains are prevalent in rectal infections among MSM. A recent study in Sweden of C. trachomatis infections among 197 MSM identified high prevalences of strains G (45%), D (27%), and J (26%) in the rectum (40), although there was no information on the presence of mixed infections or coinfection with LGV strains. Importantly, the genomic uptake of DNA by transformation can occur during coinfection or sequential infection (41). Transformation is a likely mechanism employed by C. trachomatis, which would provide vast opportunities for genetic exchange.
We also found that the toxin acquired by L2c was functional (Fig. 2). In our cytotoxicity assay, L2c had a profound effect on cell morphology and death, as did D/UW3, compared with L2/434. This effect was similar to that noted previously for strain D (42). The C. trachomatis toxin is known to play an important role in damaging host cell actin microfilaments, likely facilitating growth of the intracellular inclusion (38, 42). Analyses of the toxin loci of Chlamydia muridarum, a mouse pathogen, have suggested that the toxin may function during an early phase of infection to inactivate GTPases near sites of EB entry, resulting in innate immune evasion (28, 39). Consequently, introduction of the functional toxin into L2c may promote survival by allowing the organism to escape immune surveillance and replicate sufficiently in nonfusing inclusions to cause severe localized mucosal disease, as in our patient. Moreover, the relatively high degree of cytotoxicity may impede dissemination to regional lymph nodes and, thereby, account for the lack of an inguinal syndrome. While L2c does not appear to be more cytotoxic than D/UW3, some degree of cytotoxicity may limit the ability to culture these and other strains with a partial or complete toxin gene. This could hinder our ability to detect emerging LGV strains that contain the toxin and to characterize them further.
Tarp is secreted via the type III secretion system (TTSS), which is present in both EBs and RBs (43), at sites of entry into the host cell. There it mediates pathogen-directed actin polymerization and cytoskeletal rearrangement, an event that coincides with EB entry into the cell (44). While the L2c tarp gene sequence alignment and phylogenetic tree imply a close genetic relationship with the L2/434 and L2b/UCH-1/proctitis strains, L2c contains a large in-frame deletion. Based on our analyses, the L2c tarp gene is likely a recombinant of L2 and D strains (Fig. 7).
According to functional studies of the chlamydial Tarp protein (21), tyrosine phosphorylation has been associated with actin recruitment and inclusion development, while the number of tyrosine-rich repeat regions has been associated with functionality. However, inhibition of Tarp tyrosine kinase activity had no effect on EB entry (21). The L2c Tarp contains only two partial and two complete regions of tyrosine-rich repeats, compared to six complete regions in other LGV strains (see Fig. S7 in the supplemental material). This would therefore not prevent pathogen entry but might affect cytoskeletal rearrangement in a way that impairs inclusion development and results in smaller inclusions, as observed for L2c (Fig. 1C). If this is found to be a common recombinant region among clinical strains, it would highlight an evolutionary mechanism for diversifying the number of tyrosine residues to affect intracellular growth and infection outcomes. Indeed, a recent sequence analysis of the tarp gene from numerous clinical strains found similar mutations among strains causing the same disease, and phylogenetic analysis suggested that tarp is one of the few genes responsible for C. trachomatis-specific disease phenotypes (45). Inferior actin recruitment may also be correlated with a lack of IncA expression and function, which would affect inclusion fusion as in L2c.
During the late stage of the developmental cycle, C. trachomatis expresses two histone H1 homologues that are involved in the RB-to-EB transition through nucleoid compaction and downregulation of gene expression (19, 46). As shown in Fig. S5 in the supplemental material, the molecular mass and the number of repetitive pentapeptide motifs of Hc2 are inversely correlated with the size of the deletion, so both vary among C. trachomatis strains (19, 20). The deletion in L2c Hc2 resulted in a substantial decrease in pentapeptide motifs and in the number of positively charged amino acids in two-thirds of the Hc2 amino terminus. In vitro studies have shown DNA-binding activity for L2/434 Hc2 expressed in Escherichia coli (20). By extension, the lower number of repetitive motifs in L2c Hc2 may weaken electrostatic and hydrogen-bonding interactions of Hc2 with DNA, which may affect transcriptional regulation. Moreover, ocular and D/UW3 strains contain a proline at coordinate 153 (Fig. S5, bold and underlined), which creates a “kink” in the protein structure that allows Hc2 to participate in stronger interactions (20, 47). The lack of this proline in L2c Hc2 (P153T) suggests weaker DNA-binding activity. Consequently, inefficient repression of gene expression would likely impede RB-to-EB differentiation and may in part explain the poor propagation rate and morphological characteristics of L2c in culture.
A significant fraction of C. trachomatis genes have unknown function, and the putative function of many genes is inferred from sequence similarity to homologous genes from other prokaryotes and a few eukaryotes. Numerous attempts to produce stable mutants of C. trachomatis have been unsuccessful (48), which has hindered our ability to obtain unambiguous information about gene function. The discovery of naturally occurring recombinants such as L2c can therefore help clarify the functional importance of specific genes and disease phenotypes. It can also pave the way toward identifying the underlying mechanisms of LGT for developing a gene transfer system for Chlamydia.
The study was approved by the Institutional Review Board of Children’s Hospital and Research Center at Oakland in accordance with the Declaration of Helsinki. The clinical sample was obtained from the rectal mucosa of a male who had a history of sex with men and presented with severe hemorrhagic proctitis. Briefly, a 26-year-old male presented to a San Francisco Bay area clinic with a complaint of severe rectal pain with blood on defecation. He described a series of encounters with men involving unprotected receptive anal intercourse during the prior month, until the onset of rectal pain 5 days before the clinic visit. The rectal bleeding had begun 1 day earlier. The man had a history of gonorrhea but no other known STDs, no known exposure to men with known STDs, and no history of illicit drug use. He was reported to be HIV negative, had no other medical conditions, was not taking any over-the-counter or prescription medications, and appeared to be in excellent health. On physical examination, he was a well-appearing, afebrile, normotensive male with no evidence of an ulcerative lesion on the glans or shaft of the penis and no inguinal adenopathy. The anus was inflamed, and proctoscopy revealed extensive mucosal bleeding with evidence of a purulent discharge. Four swabs, one from each quadrant of the rectum, were obtained and placed in transport medium. Three swabs were sent to the clinical laboratory for standard detection of Neisseria gonorrhoeae by commercial PCR and culture and of C. trachomatis by commercial PCR. The fourth swab was sent to the Chlamydia Research Laboratory at CHORI for in-house C. trachomatis culture.
Reference strains D/UW3, L1/440, L2/434, L2a/TW-396, and L3/404 (49), a clinical L2b strain from Amsterdam (a kind gift from Servaas Morré), and the clinical sample, referred to as L2c, from the above-described case, were analyzed. Each strain was propagated in the human cervical adenocarcinoma cell line HeLa229 using our previously described protocols (13, 50, 51). The clinical sample was diluted and directly plaque purified using sequential plaque purifications per our referenced protocols (13, 50, 51). To verify the clonal purity of the plaques, ompA and MLST genes (14) were amplified by PCR and cloned separately using a TOPO TA cloning kit (Invitrogen) (52); 10 clones of each were randomly selected for Sanger sequencing using techniques we have described previously (24). The confirmed clonal plaques were then individually propagated in tissue culture. A total of two passages were performed for L2c to generate sufficient gDNA for genome sequencing. The EBs for each C. trachomatis isolate were purified from contaminating human cells using DNase treatment followed by gradient ultracentrifugation, and genomic DNA was purified from each isolate using a High Pure PCR template preparation kit (Roche Diagnostics, Indianapolis, IN) as we previously described (13, 53).
For determination of MOI based on IFUs, duplicate serial 10-fold dilutions of purified EBs were used to infect HeLa cells in 24-well plates. After 24 to 48 h, the cells were fixed in methanol, washed with phosphate-buffered saline (PBS), and stained using the Pathfinder Chlamydia culture confirmation monoclonal antibody (MAb) (Kallestad Diagnostics, Chaska, MN) in accordance with the manufacturer’s directions. The number of IFUs per well was divided by the number of cells per well; an average was taken for duplicate wells to arrive at the MOI per strain.
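The MOI calculation described above reduces to a ratio averaged over duplicate wells. The sketch below assumes, for illustration only, that inclusions are counted in a well receiving a 10-fold serial dilution of the stock and scaled back by the dilution factor; the function names, counts, and cell number are hypothetical.

```python
from statistics import mean


def ifu_from_count(inclusions: int, dilution_factor: int) -> int:
    """Back-calculate IFUs in the undiluted inoculum from the inclusions counted
    in a well that received a 1/dilution_factor dilution (assumption)."""
    return inclusions * dilution_factor


def moi_from_duplicates(ifu_per_well: list[float], cells_per_well: float) -> float:
    """MOI = IFUs per well / cells per well, averaged over duplicate wells."""
    return mean(ifu / cells_per_well for ifu in ifu_per_well)


# Hypothetical counts: 48 and 52 inclusions in duplicate wells at a 10^-4 dilution,
# with an assumed 5 x 10^5 HeLa cells per well.
duplicates = [ifu_from_count(48, 10**4), ifu_from_count(52, 10**4)]
print(moi_from_duplicates(duplicates, 5e5))
```

Averaging the per-well ratios, rather than pooling counts first, mirrors the order of operations stated in the text.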
HeLa cell monolayers were grown to 80% confluence on 12-mm coverslips (Fisher Scientific, Pittsburgh, PA) in 24-well plates in minimal essential medium (MEM) containing 10% fetal bovine serum (UCSF Cell Culture Facility, San Francisco, CA) and 1 µg/ml gentamicin (MP Biomedicals, Solon, OH). The monolayers were infected with reference strain L1/440, L2/434, L2a/TW-396, L3/404, or D/UW3, the clinical L2b strain from Amsterdam, or the clinical L2c strain from this study in sucrose-phosphate-glutamine (219 mmol/liter sucrose, 3.82 mmol/liter KH2PO4, 8.59 mmol/liter Na2HPO4, 4.26 mmol/liter glutamic acid, 10 µg/ml gentamicin [MP Biomedicals], 100 µg/ml vancomycin [Acros Organics, Morris Plains, NJ], and 25 U/ml nystatin [MP Biomedicals] in distilled water, pH 7.4) at an MOI of 1, unless otherwise indicated, for 2 h on an orbital shaker at room temperature. The inocula were aspirated, and the infected monolayers were cultured in a humidified incubator at 37°C with 5% CO2 in Dulbecco’s modified MEM (Cellgro, Manassas, VA) with GlutaMAX-1 (Life Technologies, Rockville, MD) supplemented with 10% fetal bovine serum (UCSF Cell Culture Facility), 0.45% glucose solution (Cellgro), 20 mM HEPES (UCSF Cell Culture Facility), 0.08% NaHCO3, and 1 µg/ml cycloheximide (13). At 12, 18, 24, 36, and 48 h (48 h for D/UW3 only) postinfection, the coverslips were fixed with methanol for 10 min, rinsed in PBS, and incubated for 30 min with anti-C. trachomatis-specific lipopolysaccharide (LPS) MAb (Virostat, Portland, ME) and anti-IncA MAb 3H7 (gift from Daniel D. Rockey) or polyclonal anti-IncA (gift from Ted Hackstadt). The secondary antibodies were Cy-3-conjugated IgG (Jackson ImmunoResearch, West Grove, PA) for LPS and fluorescein isothiocyanate (FITC)-labeled IgG (Jackson ImmunoResearch) or Alexa 488 (Invitrogen) for IncA and chlamydial heat shock protein 60. DAPI (4′,6-diamidino-2-phenylindole dihydrochloride) (Vector Laboratories, Burlingame, CA) was used to stain DNA.
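The sucrose-phosphate-glutamine concentrations above are given in mmol/liter; converting them to masses to weigh is simple arithmetic. The sketch below uses standard molar masses and assumes anhydrous Na2HPO4 (a hydrated salt would need a different molar mass); the antibiotics and nystatin are omitted because they are given in mass or unit concentrations already.

```python
# Standard molar masses (g/mol); Na2HPO4 is assumed to be the anhydrous salt.
MOLAR_MASS_G_PER_MOL = {
    "sucrose": 342.30,
    "KH2PO4": 136.09,
    "Na2HPO4": 141.96,
    "glutamic acid": 147.13,
}

# Concentrations stated in the text for the sucrose-phosphate-glutamine buffer.
SPG_MMOL_PER_L = {
    "sucrose": 219.0,
    "KH2PO4": 3.82,
    "Na2HPO4": 8.59,
    "glutamic acid": 4.26,
}


def grams_to_weigh(volume_l: float) -> dict[str, float]:
    """Mass per component: (mmol/l / 1000) * (g/mol) * (liters)."""
    return {
        name: mmol / 1000 * MOLAR_MASS_G_PER_MOL[name] * volume_l
        for name, mmol in SPG_MMOL_PER_L.items()
    }


for name, grams in grams_to_weigh(1.0).items():
    print(f"{name}: {grams:.3f} g")
```

For 1 liter this works out to roughly 75 g of sucrose, with each of the remaining components at about 0.5 to 1.2 g.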
The inclusions formed by the reference LGV and D/UW3 strains and the clinical L2c strain were visualized on a Zeiss 510 confocal microscope. Light microscopy was used to examine the D/UW3, LGV, and clinical L2c strains at 36 h, but no inclusions were observed for any LGV strains at this time point.
We examined incA expression using our previously described techniques, with slight modifications (54, 55). HeLa cells were infected with D/UW3, L2/434, or L2c at an MOI of 5 and harvested at 0, 2, 12, 24, and 48 h postinfection. Total RNA was extracted using an RNeasy minikit (Qiagen, Valencia, CA) per the manufacturer’s instructions; on-column DNase (Qiagen) treatment was performed to remove contaminating DNA. cDNA was generated from 2 µg of total RNA using TaqMan reverse transcriptase (RT) reagents and random hexamers (Applied Biosystems, Foster City, CA). Quantitative real-time PCR was performed in replicate using SYBR green chemistry, reagents, primers (see Table S1 in the supplemental material), thermocycling, and standard curves as we previously described (54, 55). 16S rRNA was used for normalization. Negative controls and standard curves for each gene were included as previously described (54, 55). Dissociation-curve analysis was used to verify the specificity of the amplified products. The relative amounts of target and control gene transcripts were determined by converting the mean threshold cycle values using the respective standard curves. Three independent experiments were performed.
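Converting threshold cycles through a standard curve and normalizing to 16S rRNA can be sketched as follows. This assumes the usual linear standard curve Ct = slope × log10(quantity) + intercept for each gene; the slopes, intercepts, and Ct values in the example are hypothetical, not values from this study.

```python
from statistics import mean


def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert a standard curve Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """Standard-curve efficiency; a slope near -3.32 corresponds to ~100%."""
    return 10 ** (-1 / slope) - 1


def relative_expression(target_cts, reference_cts, target_curve, reference_curve):
    """Mean Ct of replicates -> quantity from each gene's own standard curve,
    then target (e.g. incA) divided by reference (e.g. 16S rRNA)."""
    target = quantity_from_ct(mean(target_cts), *target_curve)
    reference = quantity_from_ct(mean(reference_cts), *reference_curve)
    return target / reference


# Hypothetical standard curves as (slope, intercept) and replicate Ct values.
inca_curve = (-3.32, 38.0)
rrna_curve = (-3.35, 30.0)
print(relative_expression([28.0, 28.1], [18.0, 18.2], inca_curve, rrna_curve))
```

Using a separate curve per gene, rather than assuming equal efficiencies as in the 2^-ΔΔCt shortcut, matches the per-gene standard curves described in the text.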
Growth curves for strains D/UW3, L2/434, and L2c were generated using quantitative PCR as we previously described (52, 56). Briefly, each strain was grown in HeLa cells at an MOI of 1 and harvested at 0, 12, 18, 24, 36, and 48 h, and total RNA was extracted and reverse transcribed as described above. Each quantitative PCR consisted of 1× SYBR green master mix (Applied Biosystems), 1 µl of cDNA, and 5 pmol of each primer (16S rRNA and glyceraldehyde-3-phosphate dehydrogenase [GAPDH], as described in reference 56) in a total reaction volume of 25 µl and was run on an ABI 7900 (Applied Biosystems) using our previously described thermocycling profile (56); standard curves were generated, and negative controls for each primer pair were included in each run as we previously described (56). GAPDH was used for normalization. Melting curves were used to verify the specificity of the reactions and the absence of primer dimers. Samples were amplified in triplicate, and the mean was used for analysis. Three independent experiments were performed.
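One way to turn the triplicate quantities into a growth curve is to divide the mean chlamydial 16S rRNA quantity by the mean host GAPDH quantity at each time point and express the result relative to the earliest time point. The fold-change presentation and the example numbers are assumptions for illustration.

```python
from statistics import mean


def normalized_growth_curve(rrna_16s, gapdh):
    """rrna_16s, gapdh: dicts mapping time (h) -> triplicate quantities.
    Returns time -> (mean 16S / mean GAPDH) as fold change over the earliest time."""
    ratios = {t: mean(rrna_16s[t]) / mean(gapdh[t]) for t in rrna_16s}
    baseline = ratios[min(ratios)]
    return {t: ratios[t] / baseline for t in sorted(ratios)}


# Hypothetical triplicate quantities read off standard curves.
rrna = {0: [10, 12, 11], 24: [1_000, 1_100, 1_200], 48: [9_000, 9_500, 10_000]}
gapdh = {0: [100, 100, 100], 24: [100, 110, 90], 48: [95, 100, 105]}
for t, fold in normalized_growth_curve(rrna, gapdh).items():
    print(f"{t:>2} h: {fold:.1f}-fold")
```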
Reference strains D/UW3 and L2/434 and the clinical L2c strain from this study were analyzed in a cytotoxicity assay adapted from the method of Belland et al. (42). HeLa cells were grown to 80% confluence in 24-well plates as described above. The monolayers were treated with 1 ml DEAE-dextran (45 µg/ml) in Hanks’ balanced salt solution (HBSS) for 15 min at 37°C. The cells were then infected in duplicate with each strain at an MOI of 100 (~5 × 10⁷ IFU), or mock infected, for 4 h at room temperature on an orbital shaker. The inocula were removed, and the cells were washed with HBSS and incubated for an additional 4 h in the growth medium described above, supplemented with 10 µg/ml gentamicin (MP Biomedicals) and 25 µg/ml vancomycin (Fisher) to prevent any nonchlamydial bacterial growth. The monolayers were examined by light microscopy at ×400 magnification. Cells were scored for rounding, detachment, and lysis relative to uninfected control cells using the following scale: −, same as the cell control; 1+, 25% of cells affected; 2+, 75% of cells affected; and 3+, 100% of cells affected.
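The inoculum size and the scoring scale above can be expressed in a few lines. The stated MOI of 100 at ~5 × 10⁷ IFU implies roughly 5 × 10⁵ cells per well; the rule for observations falling between the stated anchors (nearest anchor, ties to the lower score) is a hypothetical choice, since the text defines only the anchor points.

```python
# Anchor points of the scoring scale given in the text: percent of cells affected -> score.
SCALE = [(0, "-"), (25, "1+"), (75, "2+"), (100, "3+")]


def inoculum_ifu(moi: float, cells_per_well: float) -> float:
    """IFUs needed to reach a target MOI."""
    return moi * cells_per_well


def cytotoxicity_score(percent_affected: float) -> str:
    """Hypothetical rule: assign the scale anchor closest to the observed
    percentage (ties go to the lower score because SCALE is ascending)."""
    return min(SCALE, key=lambda anchor: abs(anchor[0] - percent_affected))[1]


# ~5 x 10^5 cells per well is implied by MOI 100 ~ 5 x 10^7 IFU in the text.
print(inoculum_ifu(100, 5e5))
print([cytotoxicity_score(p) for p in (0, 20, 60, 95)])
```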
The above-described experiments were repeated with cells pretreated with 1 µg/ml of doxycycline before infection. In addition, the assay was repeated with L2/434 and with L2c at a higher passage number (passage 20). Two independent experiments were performed for each of these studies.
To detect the toxin protein, a Western blot analysis was performed as we have previously described, with modifications (55). Briefly, L2/434, D/UW3, and L2c were independently grown in HeLa cells as described above at an MOI of 100. Purified EBs from each isolate were solubilized in Laemmli buffer, run on NuPAGE Novex 4 to 12% bis-Tris precast gels (Invitrogen) with a BenchMark prestained protein ladder (Invitrogen), and electrophoretically transferred onto a BioTrace polyvinylidene difluoride (PVDF) membrane (Pall Life Science, Port Washington, NY) at 120 V for 2 h in 1× NuPAGE transfer buffer (Invitrogen) with 10% methanol. The membrane was incubated with 5% skim milk in 1× Tris-buffered saline (TBS) and 0.1% Tween 20, washed with 1× TBS-0.1% Tween 20, and reacted overnight with a polyclonal antiserum against recombinant CT166 (gift from Harlan Caldwell) at a 1:500 dilution. The membrane was washed, and secondary antibody (goat anti-rabbit IgG conjugated to horseradish peroxidase [HRP]; Bio-Rad, Hercules, CA) was applied at a 1:1,000 dilution, incubated at room temperature, and washed in 1× TBS prior to detection using the SuperSignal West Pico chemiluminescent substrate (enhanced chemiluminescence [ECL] detection system; Thermo Scientific, Rockford, IL) and CL-XPosure Film (Thermo Scientific).
The purified genomic DNA from the clinical case was shotgun sequenced using a combination of the 454/Roche GS-FLX Titanium and GS Junior instruments (454 Life Sciences, Branford, CT). A total of 809,874 reads were generated, with an average read length of 404 nt. Preliminary analysis suggested an unusually high proportion of contaminating DNA from the HeLa cell culture. Therefore, 618,844 reads of human origin (identified by mapping to the hg19 golden path release of the human genome [http://genome.ucsc.edu/cgi-bin/hgGateway]) were removed using Newbler gsMapper 2.5 software.
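The human-read screen amounts to removing every read ID flagged by the hg19 mapping. In the minimal sketch below, the function name and toy IDs are hypothetical, and the mapping step itself (done with gsMapper in the study) is not reproduced; the totals are those reported above.

```python
def screen_reads(read_ids, human_ids):
    """Drop reads whose IDs were flagged as human (e.g. by mapping to hg19);
    return the retained IDs and the fraction removed."""
    kept = [rid for rid in read_ids if rid not in human_ids]
    return kept, 1 - len(kept) / len(read_ids)


# Totals reported in the text.
total_reads, human_reads = 809_874, 618_844
print(total_reads - human_reads)  # 191030 human-screened reads
print(f"{human_reads / total_reads:.1%} of reads were human")
```

The subtraction reproduces the 191,030 human-screened reads used for verification mapping and deposited in the SRA, and shows that roughly three-quarters of the raw reads were of human origin.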
We assembled a random subset of 50,000 of the remaining reads de novo using the 454 gsAssembler software program, version 2.0.01.14, with default parameters. The order of the contigs in the final genome sequence was determined using a combination of the possible connections between contigs suggested by overlapping reads (57) and information from mapping contigs against the L2/434 reference genome (9). After determination of the correct path through the contig graph, the contigs were assembled to form the final sequence. To accomplish this, the ends of adjoining contigs were matched using BLAST (58) against the database of reads. Reads found in both contigs were assembled into a consensus sequence that bridged the two contigs to form a single, larger contig. The majority of gaps were small enough to be spanned by a single read. The plasmid was a single contig with reads overlapping the beginning and end, indicating the expected circular redundancy.
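The contig-bridging step can be illustrated with exact string overlaps. This is a simplified stand-in for the BLAST-based matching described above: it assumes a single error-free read whose start overlaps the end of one contig and whose end overlaps the start of the next, and the function names and minimum overlap are hypothetical.

```python
def longest_overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`
    (at least min_len), or 0 if there is none."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def bridge_contigs(contig_a: str, contig_b: str, read: str, min_overlap: int = 20):
    """Join contig_a and contig_b through a read whose start overlaps the end of
    contig_a and whose end overlaps the start of contig_b. Returns None if the
    read does not span the gap under this exact-match rule."""
    k_a = longest_overlap(contig_a, read, min_overlap)
    k_b = longest_overlap(read, contig_b, min_overlap)
    if not k_a or not k_b or k_a + k_b > len(read):
        return None
    # Keep contig_a, add only the read bases that fill the gap, then contig_b.
    return contig_a + read[k_a:len(read) - k_b] + contig_b


print(bridge_contigs("AAACCC", "TTTAAA", "CCCGGGTTT", min_overlap=3))  # AAACCCGGGTTTAAA
```

Real 454 reads carry homopolymer errors, which is one reason an alignment tool such as BLAST is preferable to exact matching in practice.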
For verification of the sequences, we aligned the original reads to the consensus chromosomal and plasmid templates using gsMapper 2.3; of the 191,030 human-screened reads (average length, 436 nt), 91.6% mapped to the chromosome and 6.8% mapped to the plasmid. The remaining unmapped reads did not assemble de novo into any contig of more than 2 reads, suggesting that they were mostly contamination that had not matched the human reference sequence. The mapped assemblies were inspected to remove a small number of errors. The final assembly, with an average coverage of approximately 75-fold, was used for annotation. The identity of each base in the final consensus sequence was determined by majority vote. The gsMapper software revealed two possible structural variants, each supported by a minority of sequence reads: a deletion between coordinates 133542 and 155549, present in 4 reads, and a deletion between coordinates 849581 and 849942, present in 19 reads.
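Majority-vote base calling and the average-coverage estimate can both be sketched briefly. The pileup representation (one string of base calls per position) and the tie-breaking rule are assumptions; the ~1.04-Mb chromosome length used in the example is an approximation based on the size of C. trachomatis L2 chromosomes.

```python
from collections import Counter


def majority_consensus(pileup_columns):
    """Each column holds the base calls of all reads covering one position; the
    most frequent call wins (ties go to the call seen first, an arbitrary choice)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in pileup_columns)


def mean_coverage(n_reads: float, mean_read_length: float, genome_length: float) -> float:
    """Average fold coverage = total sequenced bases / genome length."""
    return n_reads * mean_read_length / genome_length


print(majority_consensus(["AAAC", "GGTG", "TTTT"]))
# Reads mapped to the chromosome (91.6% of 191,030; mean 436 nt) over an assumed ~1.04-Mb chromosome.
print(round(mean_coverage(0.916 * 191_030, 436, 1.04e6)))
```

With these inputs the estimate comes to roughly 73-fold, in line with the approximately 75-fold coverage reported for the final assembly.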
The L2c consensus chromosome and plasmid sequences were annotated automatically using the Integrative Services for Genomics Analysis pipeline (16). Promoter 2.0 prediction (http://www.cbs.dtu.dk/services/Promoter/), Bacterial PROMoter prediction (http://www.softberry.ru/berry.phtml), and the Berkeley Drosophila Genome Project neural network (http://www.fruitfly.org/seq_tools/promoter.html) were used to identify the ftsK putative promoter region, including the initiation site, −10 and −35 promoter elements, and ribosome binding sites. Each program's default parameters were used as the significance thresholds for prediction scores.
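The general idea behind −35/−10 promoter prediction can be shown with a simple consensus scan. This is not how BPROM, Promoter 2.0, or the BDGP neural network work internally; it is a simplified sketch that assumes the textbook E. coli σ70 consensus boxes (TTGACA and TATAAT) separated by a 15- to 19-bp spacer, with a hypothetical mismatch threshold.

```python
MINUS_35 = "TTGACA"  # textbook E. coli sigma-70 consensus (illustrative assumption)
MINUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(seq: str, max_mismatches: int = 2, spacers=range(15, 20)):
    """Return (start of -35 box, start of -10 box, total mismatches) for every
    -35/-10 pair within max_mismatches of consensus, best hits first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box_10 = seq[j:j + 6]
            if len(box_10) < 6:
                break
            total = m35 + mismatches(box_10, MINUS_10)
            if total <= max_mismatches:
                hits.append((i, j, total))
    return sorted(hits, key=lambda hit: hit[2])


# Toy sequence with perfect boxes separated by a 17-bp spacer.
toy = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GC"
print(scan_promoters(toy))  # [(2, 25, 0)]
```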
To verify the regions of diversity (L2-D hybrids) that were identified as divergent from L2/434 by the SAS program Q-plotGenome and by genomic analyses (both described below), single amplicons of these regions (the tarp, incA, and hctB genes, the toxin locus genes, and the intergenic region [IGR] upstream of ftsK) were amplified, cloned using a TOPO TA cloning kit (Invitrogen), and Sanger sequenced using primers (see Table S1 in the supplemental material) designed to amplify and sequence the full length of each region as previously described (16). Ten clones were sequenced for each. The sequences were compared with those of the following available reference strains (GenBank accession numbers are given in parentheses): A/2497 (EU121607), B/HAR36 (EU121608), C/TW3 (EU121609), D/UW3 (AE001273), L2/434/Bu (AM884176), and L2b/UCH-1/proctitis (AM884177) for the tarp gene; A/HAR13 (CP000051), B/Jali20/OT (FM872308), C/TW3 (EU121596), D/UW3 (AE001273), L2/434/Bu (AM884176), and L2b/UCH-1/proctitis (AM884177) for hctB; A/HAR13 (CP000051), B/TW5 (DQ064209), B/TZ1A828/OT (FM872307), Ba/Apache-2 (DQ064210), C/TW3 (DQ064211), D/UW3, E/Bour (DQ064214), F/IC-CAL3 (DQ064215), G/UW57 (EU247624), H/UW4 (EU247625), I/UW12 (DQ064218), Ia/870 (DQ064219), J/UW36 (DQ064220), K/UW31 (DQ064221), L1/440 (DQ064222), L2/434/Bu, L2b/UCH-1/proctitis, and L3/404 (DQ064224) for incA; C/TW3 (AY647994), D/UW3, H/UW4 (AY647999), and L2/434 for the cytotoxin locus; and A/HAR13, B/Jali20/OT, D/UW3, L2/434, and L2b/UCH-1/proctitis for the IGR upstream of ftsK.
To detect recombination, we developed a program called Q-plotGenome, written as a set of macros in the SAS 9.2 language (SAS Institute, Inc., Cary, NC), to compare the genome sequence of L2c with those of reference strains L2/434 and D/UW3. The core of the program is similar to that of SimPlot (59), except that it covers the entire 1-Mb genome rather than a limited region scanned with a sliding window: it samples a series of fixed-length subsequences (windows) from one genome and compares them with windows from the other genome. Q-plotGenome is thus a tool for comparing DNA sequences of 1 Mb or more and for displaying the results graphically. Q-plotGenome is described in detail in Text S1 in the supplemental material.
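The window-by-window comparison that underlies both Q-plotGenome and SimPlot can be sketched as follows. This is not the SAS implementation: it assumes the query and reference sequences are already aligned to a common coordinate system (gaps as "-"), and the function names, window size, and step are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def window_similarity(query: str, references: dict[str, str], window: int = 1000, step: int = 500):
    """Slide fixed-length windows along an aligned query and report per-window
    identity to every aligned reference."""
    rows = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        rows.append((start, {name: percent_identity(q, ref[start:start + window])
                             for name, ref in references.items()}))
    return rows


aligned = {"L2/434": "AAAAGGGG", "D/UW3": "TTTTCCCC"}
for start, scores in window_similarity("AAAACCCC", aligned, window=4, step=4):
    print(start, max(scores, key=scores.get), scores)
```

A switch in which reference scores highest from one window to the next is the kind of signal that flags a candidate L2-D recombination boundary.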
To identify statistically significant recombination breakpoints or regions, SimPlot software 3.5.1 (http://sray.med.som.jhmi.edu/SCRoftware/) was used as we have previously described (25). The parameters were a window size of 200 base pairs (bp), a step size of 20 bp, neighbor-joining trees calculated using the Kimura 2-parameter distance model (60), and 1,000 bootstrap replicates to determine the confidence of each branch (Fig. 7).
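The Kimura 2-parameter distance used above separates transitions (proportion P) from transversions (proportion Q): d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q). The sketch applies it in 200-bp windows with a 20-bp step between two aligned sequences. SimPlot additionally builds neighbor-joining trees and bootstraps them per window, which is not reproduced here; the function names are hypothetical.

```python
import math

PURINES = set("AG")


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance over aligned columns containing only A/C/G/T."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    if not sites:
        return float("nan")
    n = len(sites)
    # Transition: a change within purines (A<->G) or within pyrimidines (C<->T).
    transitions = sum(1 for x, y in sites if x != y and (x in PURINES) == (y in PURINES))
    transversions = sum(1 for x, y in sites if (x in PURINES) != (y in PURINES))
    p, q = transitions / n, transversions / n
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return float("inf")  # too divergent for the model
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def sliding_k2p(a: str, b: str, window: int = 200, step: int = 20):
    """(window start, K2P distance) for each window along two aligned sequences."""
    return [(s, k2p_distance(a[s:s + window], b[s:s + window]))
            for s in range(0, len(a) - window + 1, step)]


print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in 10 sites
```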
For a global analysis, we used a version of the BLAST score ratio approach (17) to identify recombinant regions. We aligned (i) the raw L2c reads and (ii) 100-nt windows along the L2c chromosome sequence, separated by 50 nt, against the L2/434 genome as the reference and against the other complete genomes listed in the previous section. Matches with a BLAST score ratio of <0.95 were plotted using Circos software (Fig. 5) (61).
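A BLAST score ratio is commonly computed as a query's alignment score against a target divided by the query's score against itself. The sketch below tiles 100-nt windows with a 50-nt step (one reading of "separated by 50 nt") and flags windows whose ratio falls below 0.95. Running BLAST and parsing its output are not shown; the scores passed in are hypothetical.

```python
def tile_windows(seq: str, size: int = 100, step: int = 50):
    """(start, subsequence) for fixed-size windows whose starts are `step` nt apart."""
    return [(s, seq[s:s + size]) for s in range(0, len(seq) - size + 1, step)]


def blast_score_ratio(score_vs_reference: float, self_score: float) -> float:
    """Score of a window against the reference, normalized by its self-hit score."""
    return score_vs_reference / self_score


def flag_divergent(window_scores, threshold: float = 0.95):
    """window_scores: (start, score_vs_reference, self_score) tuples, e.g. bit
    scores taken from BLAST tabular output (parsing not shown)."""
    return [start for start, ref, self_score in window_scores
            if blast_score_ratio(ref, self_score) < threshold]


print(len(tile_windows("A" * 250)))
print(flag_divergent([(0, 185.0, 185.0), (50, 150.0, 185.0)]))
```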
For analysis of specific regions, nucleotide sequences were aligned by ClustalW 1.8, MUSCLE (62), or MAUVE (63). Gaps were removed using GBLOCKS (64) with default parameters. Phylogenetic inference and tree plotting were performed using the MEGA 3.1 (65) package as described previously (13, 24, 25) or the PHYLIP (66) programs Dnapars, Seqboot, and Consense. Neighbor-joining trees were calculated using the Kimura 2-parameter model, which assumes equal nucleotide frequencies and uniform rates across sites while allowing transitions and transversions to occur at different rates (60), with 1,000 bootstrap replicates.
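Bootstrap replicates of the kind generated by Seqboot are built by resampling alignment columns with replacement; a tree is then inferred from each pseudo-alignment and branch support is the percentage of replicate trees containing a clade. The sketch covers only the resampling and the support percentage (tree building is omitted), and its function names are hypothetical.

```python
import random


def bootstrap_alignments(alignment: dict[str, str], replicates: int = 1000, seed=None):
    """Yield pseudo-alignments built by resampling alignment columns with replacement."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_support(clade_found: list[bool]) -> float:
    """Percentage of replicate trees that contain a given clade."""
    return 100.0 * sum(clade_found) / len(clade_found)


for replicate in bootstrap_alignments({"L2c": "ACGT", "D/UW3": "ACGA"}, replicates=3, seed=1):
    print(replicate)
```

Because whole columns are resampled, each replicate preserves the pairing of bases across sequences at every site.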
For detection of strain D-specific SNPs within the L2c data, we first identified SNPs between the D/UW3 and L2/434 genomes using the show-snps tool of the MUMmer package (67). We then mapped L2c against L2/434 using gsMapper 2.3 and extracted the number of “D”-like and “L2”-like nucleotides at each variant position from the 454AllDiffs.txt output using a custom script. Similar analyses were performed for D(s)/2923.
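The allele-counting step can be sketched with plain dictionaries. The custom script and the 454AllDiffs.txt and show-snps output formats are not reproduced here; the sketch assumes the SNPs have already been reduced to a table of L2/434 coordinates with their D and L2 bases, and the per-position read calls to strings. The majority threshold for calling a position D-like is likewise an assumption.

```python
def classify_variant_reads(snp_table, read_bases):
    """snp_table: position -> (D/UW3 base, L2/434 base), in L2/434 coordinates.
    read_bases: position -> string of base calls from L2c reads at that position."""
    summary = {}
    for pos, (d_base, l2_base) in snp_table.items():
        calls = read_bases.get(pos, "")
        summary[pos] = {"D": calls.count(d_base), "L2": calls.count(l2_base)}
    return summary


def d_like_positions(summary, min_fraction: float = 0.5):
    """Positions where D-like calls exceed min_fraction of the D + L2 calls."""
    hits = []
    for pos, counts in summary.items():
        total = counts["D"] + counts["L2"]
        if total and counts["D"] / total > min_fraction:
            hits.append(pos)
    return sorted(hits)


summary = classify_variant_reads({100: ("A", "G"), 200: ("C", "T")}, {100: "AAAG", 200: "TTTT"})
print(summary, d_like_positions(summary))
```

Runs of adjacent D-like positions would then mark candidate regions of exchange with the D lineage.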
The complete chromosome and plasmid sequences were submitted to the NCBI GenBank database (accession no. CP002024). The NCBI genome project identification number is 47581. The raw data from the 191,030 human-screened L2c reads were deposited in the NCBI Short Read Archive (SRA; accession no. SRP002231). The sequence of the L2c toxin gene from the initial plaque purification was submitted to GenBank (accession no. 2981755).
Text S1. Supplemental methods.
Table S1. Real-time and quantitative PCR primers and PCR primers for multiple-sequence comparisons and sequence verification of L2c.
Figure S1. Q-plotGenome of the L2c query genome sequence in comparison with those of reference strains L2/434 (red) and D/UW3 (blue). The figure is divided into seven panels (A to G). The left y axis represents similarity to the query genome; the right y axis represents the number of nucleotides displaced by a recombination event or indel. The x axis shows the nucleotide position within the genome based on the genome sequence of L2/434 (GenBank accession no. AM884176).
Figure S2. Plot of sequence read variation within the L2c chromosome, comparing D/UW3 and D(s)/2923. The x axis shows the coordinates of the L2c consensus sequence; the y axis shows 454 read coverage, expressed as the number of reads per base; the gray line is a smoothed curve of overall read redundancy (produced using the R “lowess” function with the smoothing parameter f set to 0.2). Green bars show the number of variant reads that match D(s)/2923 at the 7,565 known SNPs, overlaid on the 7,982 known SNP positions (red bars) between D/UW3 and L2/434.
Figure S3. Alignment of the full-length intergenic region (IGR) upstream of the ftsK gene for clinical strains L2c and L2b/UCH-1/proctitis and reference strains L2/434, D/UW3, B/HAR36, and C/TW3. Two putative promoter regions are highlighted in green, and the sequences encoding tRNA-His are in blue font. The transcription initiation sites (+1) and the −10 and −35 promoter elements proposed by the BPROM and BDGP promoter prediction programs for each of the two predicted promoter regions are shown in red, purple, and gray typefaces, respectively. The putative Shine-Dalgarno ribosome binding sequences are underlined.
Figure S4. Alignment of the full-length hctB nucleotide sequence for clinical strains L2c and L2b/UCH-1/proctitis and reference strains L2/434, D/UW3, B/HAR36, and C/TW3. Codons for arginine and lysine residues are shown in blue and purple, respectively. The repetitive pentapeptide motifs are separated by red stars.
Figure S5. Alignment of the full-length amino acid sequence of the histone H1-like protein Hc2 for clinical strains L2c and L2b/UCH-1/proctitis and reference strains L2/434, D/UW3, B/HAR36, and C/TW3. Arginine and lysine residues are shown in blue and purple, respectively. Repetitive pentapeptide motifs are separated by red stars. Dots at the ends of each translated protein sequence represent stop codons. The number above the end of the alignment indicates the amino acid position relative to the start codon.
Figure S6. Alignment of the full-length tarp nucleotide sequence for clinical strains L2c and L2b/UCH-1/proctitis and reference strains L2/434, D/UW3, A/2497, B/HAR36, and C/TW3. The dashes represent a deleted region. The nucleotides that encode the ~50-amino-acid tyrosine-rich regions are shown in purple, with the tyrosine codon in bold purple. Each tyrosine-rich repeat contains 4 or 5 tyrosine residues. The nucleotides at the 3′ end that encode the ~120-amino-acid C-terminal repeat regions are shown in red. The nucleotides at the 3′ end that encode the proline-dense and actin-binding domains are highlighted in cyan and green, respectively.
Figure S7. Alignment of the full-length Tarp amino acid sequence for clinical strains L2c and L2b/UCH-1/proctitis and reference strains L2/434, D/UW3, B/HAR36, and C/TW3. Dashes represent deletions. The first 8 amino acids of the ~50-amino-acid tyrosine-rich regions are shown in purple, with tyrosines in bold. Each tyrosine-rich repeat contains 4 or 5 tyrosine residues. The first 8 amino acids of the ~120-amino-acid C-terminal repeat regions are shown in red. Proline-dense and actin-binding domains in the C-terminal half of Tarp are highlighted in cyan and green, respectively. Dots at the ends of each translated protein sequence represent stop codons.
Figure S8. Partial ompA sequence alignment of clinical strains L2c and L2b/UCH-1/proctitis and reference strains L2/434, L2′, and L2a/TW396. The numbers above the sequences indicate the nucleotide positions in ompA. The box indicates variable segment two (VS2) of ompA. Nonsynonymous amino acid substitutions are shown below the nucleotide change relative to the L2/434 sequence.
This research was supported by Public Health Service grants R01 AI39499 and R01 AI059647 (to D.D.) from the National Institute of Allergy and Infectious Diseases and by National Science Foundation-U.S. Department of Agriculture grant 2009-65109-05760 (to D.D.).
We thank Mark Driscoll and Brian Desany for their generous help with the 454 Junior instrument.
Citation Somboonna N, et al. 2011. Hypervirulent Chlamydia trachomatis clinical strain is a recombinant between lymphogranuloma venereum (L2) and D lineages. mBio 2(3):e00045-11. doi:10.1128/mBio.00045-11.