Rickettsia typhi, the causative agent of murine typhus, is an obligate intracellular bacterium with a life cycle involving both vertebrate and invertebrate hosts. Here we present the complete genome sequence of R. typhi (1,111,496 bp) and compare it to the two published rickettsial genome sequences: R. prowazekii and R. conorii. We identified 877 genes in R. typhi encoding 3 rRNAs, 33 tRNAs, 3 noncoding RNAs, and 838 proteins, 3 of which are frameshifts. In addition, we discovered more than 40 pseudogenes, including the entire cytochrome c oxidase system. The three rickettsial genomes share 775 genes: 23 are found only in R. prowazekii and R. typhi, 15 are found only in R. conorii and R. typhi, and 24 are unique to R. typhi. Although most of the genes are colinear, there is a 35-kb inversion in gene order, which is close to the replication terminus, in R. typhi, compared to R. prowazekii and R. conorii. In addition, we found a 124-kb R. typhi-specific inversion, starting 19 kb from the origin of replication, compared to R. prowazekii and R. conorii. Inversions in this region are also seen in the unpublished genome sequences of R. sibirica and R. rickettsii, indicating that this region is a hot spot for rearrangements. Genome comparisons also revealed a 12-kb insertion in the R. prowazekii genome, relative to R. typhi and R. conorii, which appears to have occurred after the typhus (R. prowazekii and R. typhi) and spotted fever (R. conorii) groups diverged. The three-way comparison allowed further in silico analysis of the SpoT split genes, leading us to propose that the stringent response system is still functional in these rickettsiae.
The rickettsiae are small (ca. 0.4 by 0.9 μm), gram-negative, aerobic, coccobacillary α-proteobacteria. They are obligate intracellular parasites with a life cycle that involves both vertebrate and invertebrate hosts. Rickettsiae depend on hematophagous arthropods as vectors and primary reservoirs, although small mammals, such as rats and opossums, also serve as amplifying hosts (8, 10). Rickettsiae are classified into two groups: the spotted fever group (SFG), which includes R. conorii, R. sibirica, and R. rickettsii, and the typhus group (TG), which includes R. prowazekii and R. typhi. Two rickettsiae (R. prowazekii and R. rickettsii) are on the list of agents of the Antiterrorism and Effective Death Penalty Act of 1996 because they are stable, highly infectious agents that cause severe disease (7). Both Japan, during World War II, and the Soviet Union, during the Cold War, investigated the use of rickettsiae as biological weapons (9, 35, 65).
Rickettsia typhi is the causative agent of murine typhus (endemic typhus). Infection with R. typhi causes fever, headache, and myalgia and leads to disseminated, multisystem disease, including infection of the brain, lung, liver, kidney, and heart endothelia, lymphohistiocytic vasculitis of the central nervous system, diffuse alveolar damage and hemorrhage, interstitial pneumonia, pulmonary edema, interstitial myocarditis and nephritis, portal triaditis, and cutaneous, mucosal, and serosal hemorrhages (63, 64). The nonspecificity and nonuniformity of symptoms and the lack of specific diagnostic tests that are effective during the acute stage of the illness often lead to misdiagnosis, delaying appropriate treatment. Although the mortality rate is low (1% of reported cases), in severe cases R. typhi can cause meningoencephalitis, interstitial pneumonia, and disseminated vascular lesions (12). Without specific treatment, 99% of those infected will clear the disease within weeks, making proper accounting of R. typhi infections difficult (28).
Although R. typhi can be transmitted to the mammalian host by the bite of an infected flea or louse (the rat flea Xenopsylla cheopis, the cat flea Ctenocephalides felis, or the rat louse Polyplax spinulosa), the more important mechanism of inoculation is through the feces of the vector. R. typhi multiplies in the epithelium of the flea midgut, is shed in the feces, and is then deposited during feeding. Organisms in the feces enter the host through irritated, abraded skin. The bacterium is then hematogenously spread and ultimately invades endothelial cells (12). Transmission can also occur via inhalation of aerosolized fecal particles. To enter the host cell, R. typhi induces phagocytosis by an unknown mechanism. Once within the cell, the organisms rapidly escape the phagosome, multiply within the cytoplasm, and then exit the host cell by burst lysis, allowing subsequent spread to other endothelial cells (65).
R. typhi is one of the leading causes of rickettsioses in the world. Although distributed worldwide, it is most common in warm coastal areas with large rat populations (12). Prior to World War II, 2,000 to 5,000 cases were reported annually in the United States. Due to intensive efforts to control rat fleas, as well as rat populations, the number of reported cases dropped to 9 to 72 per year from 1980 to 1998 (12, 28). Outbreaks continue to occur both around the world and in the United States. Murine typhus is considered endemic to the Hawaiian Islands, with five to six cases reported per year. However, 47 cases were reported in 2002, leading the Hawaii Department of Health to institute active surveillance measures and mandatory reporting of R. typhi test requests. Due to the nonspecific clinical signs associated with murine typhus, it is expected that the actual number of cases was greater than the number reported (40).
Rickettsial organisms have comparatively small genomes (1.1 to 1.3 Mb) that have arisen through reductive evolution as they developed dependence on the host cell for necessary functions (6). As a result, their genomes are littered with pseudogenes. The rickettsiae have a close evolutionary relationship with the progenitor of the mitochondria (5). The genomes of R. prowazekii (1.11 Mb) Madrid E, an attenuated strain of the causative agent of epidemic typhus, and R. conorii (1.27 Mb), the causative agent of boutonneuse fever and a member of the SFG of rickettsiae, have been sequenced previously (5, 52). A draft genome of R. sibirica, the causative agent of North Asian tick typhus and a member of the SFG, has recently been deposited in GenBank (accession no. NZ_AABW01000001) as has the unpublished complete sequence of R. rickettsii (accession number AADJ01000001).
We present here the complete genome of R. typhi and compare it to previously completed rickettsial genomes. We anticipate that the complete genome sequence of R. typhi will enhance the opportunities for investigation of virulence factors, pathogenesis, attenuation, and novel targets for antimicrobial therapy or blocking of pathogenic pathways. Of the five rickettsial genomes now available, this is only the second in the TG, the other three being SFG organisms. This second TG sequence will allow insight into phenotypic differences between R. prowazekii and R. typhi by genome comparison. This is likely to yield important leads regarding the study of pathogenicity since the case-fatality rate of louse-borne typhus fever is an order of magnitude greater than that of murine typhus.
R. typhi strain Wilmington (ATCC VR-144) was cultivated in the yolk sacs of chicken embryos. The rickettsial inoculum was adjusted so that most embryos died ca. 4 to 5 days after inoculation. Infected eggs were incubated at 37°C until the embryos died. The yolk sacs were then harvested and frozen at −80°C for later isolation of bacteria. The yolk sacs were homogenized and diluted in 10 volumes of sucrose phosphate glutamate (SPG) buffer (0.22 M sucrose, 0.01 M potassium phosphate [pH 7.0], 0.005 M potassium glutamate) (16). The suspension was centrifuged at 200 × g for 10 min to remove debris, and then the supernatant was centrifuged at 10,000 × g for 20 min to pellet the rickettsial cells. The aqueous phase and fat layer were discarded, and the pellet was resuspended in SPG buffer. The suspension was sonicated three times at 40 W for 10 s on ice. The bacteria were further purified by two rounds of discontinuous renografin gradient centrifugation by using 32, 36, and 42% steps (66) at 87,275 × g for 60 min. Light bands containing intact organisms were collected and mixed with 2 volumes of SPG. The suspension was centrifuged at 11,400 × g for 20 min, and the pellet containing the purified rickettsiae was suspended in SPG buffer for storage. The rickettsial DNA was extracted from the purified bacteria by using an IsoQuick nucleic acid extraction kit (ORCA Research, Inc., Bothell, Wash.) according to the manufacturer's instructions.
Genomic DNA was sheared to a size of ~2 kb by nebulization (CIS-US, Inc., Bedford, Mass.) and cloned into a derivative of pUC18 by the double adapter method (2). These clones were used for whole genome shotgun DNA sequencing by using dye terminator chemistry, and data were collected on ABI 3700 sequencers (Applied Biosystems, Foster City, Calif.).
The quality for each base sequenced was determined by PHRED (26, 27). Initial assembly of 21,831 whole-genome shotgun reads by using PHRAP (P. Green, unpublished data) gave 78 contigs. Linkages among contigs were built based on the read-pair information, and gaps between contigs were filled by template walking with the existing clone templates. Template walking reduced the assembly to a total of 17 contigs. Further ordering of the contigs was accomplished by comparison of the contigs to the finished genomes of R. prowazekii and R. conorii. Gaps between contigs were then closed by sequencing products from PCRs across these regions. Low-quality regions on the contigs were sequenced by template walking or by sequencing of PCR products. In the final assembly, the quality of each base was greater than or equal to a PHRAP value of 30. The average PHRAP value for the entire genome is 87.6.
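The quality threshold quoted above can be read as an error probability. The following is a minimal sketch, assuming the standard Phred scale Q = -10 log10(P) and assuming that the PHRAP consensus values are reported on that same scale. The function names are illustrative and are not part of PHRED or PHRAP.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality value to an estimated error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Convert an error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(p)

# Minimum per-base quality reported for the final assembly (from the text).
min_quality = 30
print(f"Q{min_quality} error probability: {phred_to_error_prob(min_quality):.0e}")
```

Under this assumption, a value of 30 corresponds to an expected error rate of about 1 in 1,000 bases at the worst position, and higher values correspond to proportionally lower error rates.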
Identical repeats in the R. typhi genome were detected by REPFIND (14), and BLASTN (1) was used for the identification of nonidentical repeats. A 124-kb inversion of the R. typhi sequence was observed compared to the finished genomes of R. prowazekii and R. conorii. The inversion was examined by PCR with two pairs of primers across each junction region and confirmed to be authentic. In addition, the frameshift within dnaX (RT0856) and a premature stop codon within the tRNA nucleotidyltransferase gene (RT0619) were verified by resequencing the regions.
Glimmer (23) and GeneMark (48) were used independently to predict open reading frames (ORFs). For GeneMark, we used the complete genomes of R. conorii and R. prowazekii as training models. We also used BLASTX to search for proteins that the hidden Markov model programs missed. This proved advantageous since it allowed us to discover pseudogenes not predicted by the other programs. The annotation process was performed in four stages. The first was an automated step, in which we compared the entire sequence to many different databases including GenBank nr (67), SwissProt (15), COGs (58), EXProt (61), and Pfam (11). These results were then loaded into a database for use by annotators. The next stage was a double-blind annotation in which regions of the genome were assigned to different annotators for analysis. After each ORF had been annotated by two different persons, we reconciled the two annotations. This was followed by a final review and cleanup of the annotation. Visualization of gene predictions and sequence markers was performed by using the Genboree system (www.genboree.org).
We created a set of rules to keep the annotation as consistent as possible since 10 individuals were involved. First, each identified region was assigned a class: rRNA, tRNA, ncRNA, protein, frameshift, pseudogene, other, sequencing error, or error. An ORF was considered a frameshift if a single base change would restore the frame to encode a complete protein that matched a known protein. Pseudogenes were defined as regions with one or more stops, typically in-frame, that interrupted the reading frame. These were typically detected by BLASTX, and the ends were identified by alignment of the DNA with the analogous regions in R. conorii and R. prowazekii. The “other” class was used as a flag for the annotator to reinspect the ORF and seek assistance in determining gene organization or function. The sequencing error class was used when the sequence quality was low, leading to ambiguity, or when additional sequencing of a region was required. The class “error” (not used in this study) was to be used where our data conflicted with known biological data. For proteins with an identified biochemical function and an Enzyme Commission (EC) number, we used the official EC name as provided by ExPasy (http://us.expasy.org/). One example is the tRNA amino acid ligases, which are commonly known as aminoacyl-tRNA synthetases. We also included known alternative gene names to allow for text searching on previously used names. Additional descriptive information was added when useful (e.g., a subunit designation). We used the term subunit instead of chain in all cases of multiprotein functional units. For proteins where there was not enough evidence to be certain of the designation, we chose to use the terms possible and probable (as opposed to putative, predicted, etc.) to designate those that we believed were likely to be correct (probable) and those for which we were less confident (possible).
Gene names followed the Demerec standard (24), except for duplicated proteins where the gene name was followed by a number to indicate its order in the genome (e.g., iscA1 and iscA2) or in cases where meaning would be lost by removing the number that had been added by previous groups (e.g., virBN and scaN). Genes whose closest nonrickettsial matches were eukaryotic genes were designated “XYZ-like.”
Predicted proteins with unknown functions were placed in one of three categories. A “hypothetical protein” is an ORF that has no database matches in any other organism. A “conserved hypothetical protein” is defined as an ORF that has matches to other proteins of unknown function in organisms outside of the rickettsiae. Proteins that have significant matches only to other rickettsial sequences are labeled “rickettsial conserved hypothetical proteins.”
The annotated R. typhi genome sequence has been deposited in GenBank and assigned accession number AE017197. The annotated genome and supplemental data are also available at http://www.hgsc.bcm.tmc.edu/microbial/Rtyphi.
The genome of R. typhi is a single circular chromosome consisting of 1,111,496 bp with an average G+C content of 28.9% (Table 1). The origin of replication (near base pair number 1) was predicted based on this region being one of the switching points of G+C skew (Fig. 1) and by searching for potential DnaA-binding sequences. In addition, this region contains sequences found at replication origins of other bacteria (see below). A 124-kb inversion with respect to the R. prowazekii and R. conorii genomes (see below), shaded in Fig. 1, disrupts the continuity of the appearance of the G+C skew, making this criterion less clear than in other bacterial genomes. As in the other sequenced rickettsiae, the putative origin of replication in R. typhi is not located at the dnaA region (coordinate 762,664) but lies ca. 350 kb from dnaA between uroporphyrinogen decarboxylase (hemE) and a conserved hypothetical protein gene.
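The G+C content and skew criteria described above can be illustrated with a short calculation. This is a minimal sketch, not the procedure used in this study: it assumes the common convention of computing (G - C)/(G + C) in non-overlapping windows and accumulating it along the chromosome, where the minimum and maximum of the cumulative curve are candidate switching points. The window size, function names, and toy sequence are all illustrative assumptions.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def cumulative_gc_skew(seq, window):
    """Cumulative (G - C) / (G + C) over consecutive non-overlapping windows."""
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # skip windows with no G or C to avoid dividing by zero
            total += (g - c) / (g + c)
        curve.append(total)
    return curve

def skew_extremes(curve):
    """Window indices of the minimum and maximum of the cumulative skew."""
    lowest = min(range(len(curve)), key=curve.__getitem__)
    highest = max(range(len(curve)), key=curve.__getitem__)
    return lowest, highest

# Toy sequence: a C-rich half followed by a G-rich half.
toy = "ACCT" * 5 + "AGGT" * 5
curve = cumulative_gc_skew(toy, window=4)
print(gc_content(toy), skew_extremes(curve))
```

A large inversion, such as the 124-kb segment described in the text, flips the sign of the skew inside the inverted block, which is one way to see why the curve becomes harder to interpret.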
The region from 19 to 142 kb in the R. typhi genome is inverted with respect to R. prowazekii and R. conorii (Fig. 2) (5, 52) and to R. sibirica and R. rickettsii (results not shown). R. conorii, R. sibirica, and R. rickettsii are all in the SFG, whereas R. prowazekii, like R. typhi, is in the TG. Thus, it is likely that this is a recent inversion, occurring after the divergence of R. typhi and R. prowazekii. No repetitive sequences were found near the junctions of the inversion, and no gene was deleted in the inverted region or in the flanking junction regions.
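An inversion of this kind shows up as a run of genes whose order is reversed relative to a reference genome. The sketch below is only an illustration of that idea on a toy gene order, not the comparison method used in this study: it assumes each gene in the query genome has already been assigned the rank of its ortholog in the reference, and the function name and input are hypothetical.

```python
def inverted_blocks(order, min_len=2):
    """Find runs where reference ranks decrease by exactly one per gene.

    `order` lists, in query-genome order, the rank of each gene's ortholog
    in the reference genome. Returns (first_index, last_index) pairs.
    """
    blocks = []
    start = 0
    for i in range(1, len(order) + 1):
        # The run continues while each rank is one less than the previous rank.
        if i < len(order) and order[i] == order[i - 1] - 1:
            continue
        if i - start >= min_len:
            blocks.append((start, i - 1))
        start = i
    return blocks

# Toy example: reference genes 4..7 appear in reversed order in the query.
toy_order = [1, 2, 3, 7, 6, 5, 4, 8, 9]
print(inverted_blocks(toy_order))
```

Real genome comparisons also have to handle missing orthologs, duplicated genes, and the circular chromosome, none of which this sketch attempts.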
A second region of multiple rearrangements was located from 612 to 644 kb (Fig. 2). This region is similar in the two TG bacteria but differs from the SFG bacteria. Thus, the inversion likely occurred after the divergence of TG and SFG rickettsiae but before the divergence of R. typhi and R. prowazekii. Analysis of the inversion junction regions in the TG bacteria showed one short repeat in R. typhi (33 bp [three copies]) and two short repeats in R. prowazekii (46 and 28 bp [two and three copies, respectively]). Although the TG had a similar structure in this region, the SFG bacteria were all different, indicating continued rearrangements occurring after their divergence (Fig. 2). The R. rickettsii region could have been derived by an additional inversion from the R. sibirica region, suggesting that it was the most recent derivative. Comparison of R. sibirica and R. conorii shows an 81-kb inversion in the 660- to 743-kb region of R. conorii which contains 47 ORFs. This region is next to where the TG bacteria have inversions (corresponding to 618 to 657 kb on R. conorii), indicating that this region is a hot spot for rearrangements. Examination of the inversion junction regions in R. conorii revealed three repeats (114, 73, and 30 bp [two, three, and two copies, respectively]) in the junction region at around 618 kb and seven repeats (114, 96, 92, 74, 54, 42, and 33 bp [55, 62, 44, 22, 3, 11, and 4 copies, respectively]) in the junction region at around 657 kb. The sequences of these repeats do not show sequence similarity to any known transposon or IS sequence. It is possible that repeats at the inversion junction regions contributed to the inversions that occurred in R. typhi, R. prowazekii, and R. sibirica.
Further comparison of the TG bacteria shows R. typhi is missing a 12-kb segment of DNA (at 898 kb) compared to R. prowazekii (Fig. 2). As a result, nine R. prowazekii ORFs (RP709 to RP717) are not present in the R. typhi chromosome. This region is also absent from R. conorii (Fig. 2), as well as R. sibirica and R. rickettsii (not shown). This unique R. prowazekii DNA may have been acquired by horizontal transfer, since three of the R. prowazekii pseudogenes within this region (RP710, RP711, and RP717) share similarity with transposon proteins.
Like R. prowazekii, the R. typhi genome has very few repetitive sequences (5). Five major classes of repeats were found in the R. typhi genome. The repeated sequences in R. typhi and R. prowazekii have similarities in sequence and locations within their genomes, although they are not similar to other known repeat sequences.
Most predicted proteins in the R. typhi genome have matches to ORFs in R. prowazekii and R. conorii (Fig. 3), so the exceptions (Table 2) deserve attention in determining the distinguishing characteristics of these three organisms. Most of the unique ORFs encode proteins of unknown function. A total of 24 R. typhi ORFs do not match any ORFs in either R. conorii or R. prowazekii. All but two of these are hypothetical proteins with no database matches. One is a rickettsial conserved hypothetical protein (RT0301) that had been previously annotated in R. typhi (accession number CAC33749) and matches hypothetical proteins in R. rickettsii and R. montanensis. The other is a conserved hypothetical protein found within the 23S rRNA gene that matches only a cell-wall-associated hydrolase of Vibrio vulnificus.
Fifteen ORFs common to R. typhi and R. conorii are absent in R. prowazekii. Thirteen are conserved hypothetical proteins, one is a thermostable carboxypeptidase, and one is the type IV secretion system outer membrane component VirB3.
Twenty-three ORFs common to R. typhi and R. prowazekii are absent in R. conorii. Seven are rickettsial conserved hypothetical proteins, and three are conserved hypothetical proteins. The others include an acetate kinase, a GTP-binding protein EngB and two proteins likely involved in type IV secretion. Only one type IV secretion substrate-binding protein (VirB4) is found in R. conorii, but two were annotated in R. typhi, suggesting that R. typhi and R. prowazekii may transport a substrate that R. conorii does not. The Sec7 domain-containing protein, a possible substrate of the type IV secretion system, also has no R. conorii homolog. A group of three sugar transferases found within 5.4 kb of each other and possibly functioning in lipopolysaccharide synthesis were among the proteins with no R. conorii homolog. In addition, the SpoTb and SpoTc proteins, which were annotated as pseudogenes in R. prowazekii (see below), are absent in R. conorii. Finally, R. typhi and R. prowazekii share an Na+/H+ antiporter, a methionine adenosyltransferase, and a poly-β-hydroxybutyrate polymerase that are absent in R. conorii.
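The shared and unique ORF categories discussed above amount to a set partition over ortholog assignments. The sketch below illustrates that bookkeeping under the assumption that each ORF has already been mapped to an ortholog group label; the labels, function name, and toy data are hypothetical and do not reproduce the counts reported here.

```python
def partition_orthologs(typhi, prowazekii, conorii):
    """Split R. typhi ortholog groups by presence in the other two genomes."""
    t, p, c = set(typhi), set(prowazekii), set(conorii)
    return {
        "all_three": t & p & c,
        "typhi_prowazekii_only": (t & p) - c,
        "typhi_conorii_only": (t & c) - p,
        "typhi_unique": t - p - c,
    }

# Toy ortholog group labels (hypothetical, for illustration only).
typhi_groups = {"g1", "g2", "g3", "g4", "g5"}
prowazekii_groups = {"g1", "g2", "g3", "g6"}
conorii_groups = {"g1", "g2", "g4", "g7"}

for category, members in partition_orthologs(
    typhi_groups, prowazekii_groups, conorii_groups
).items():
    print(category, sorted(members))
```

The four categories are disjoint and together cover every R. typhi group, so their sizes should sum to the number of R. typhi groups supplied.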
Seventeen hypothetical proteins that were present in the R. prowazekii genome but missing in R. typhi were identified. One contained an ankyrin domain (41), suggesting that it could provide a unique adhesive or structural feature to R. prowazekii. Sixteen additional R. prowazekii genes are present in R. typhi as pseudogenes. Eight of these are pseudogenes of hypothetical, conserved hypothetical, or rickettsial conserved hypothetical proteins. Six are pseudogenes of cytochrome c oxidase proteins (discussed below). The R. prowazekii Surf1 and Sco2 proteins were also found as pseudogenes in R. typhi.
The recently released whole-genome shotgun sequence of R. sibirica (accession number NZ_AABW01000001) was also compared to R. typhi, R. prowazekii, and R. conorii. R. sibirica was most similar to R. conorii in genome size, ORF sequences, codon G+C composition (Table 1), and ORF order. All predicted R. sibirica ORFs had hits to R. conorii ORFs. However, 131 R. conorii ORFs did not have a match in R. sibirica.
A total of 39 noncoding RNA genes have been identified, and a single set of rRNA genes was present. As in the other rickettsiae, the 16S rRNA gene was separated from the 23S and 5S rRNA genes. The coordinates were initially identified by locating highly conserved sequence and secondary structural features that are proximal to the terminus of each rRNA. Final extension to predicted start and stop sites relied on distances from these conserved features seen in Escherichia coli, where the ends have been experimentally verified. Although an exhaustive search was not undertaken, three additional noncoding RNA genes were identified. These were genes for tmRNA, which is associated with trans-translation, the RNA component of RNase P, and the 4.5S RNA, which is associated with the signal recognition particle. All of these genes are present in the other rickettsial genomes, but not all were annotated. Thirty-three tRNA genes were initially identified with tRNAscan-SE (47) and subsequently verified for compliance with known tRNA structural features. The R. typhi tRNA genes have annotated equivalents in the other rickettsiae, although the assigned amino acid acceptor is different in two cases (RT510 [Val-GAC versus RP523, Phe-GAA] and RT0757 [Met-CAT versus RP770, Ile-CAT]). All amino acids are represented, but no tRNA genes were found that would be expected to read the proline codons CCC and CCU. These codons are present in R. typhi, so it is possible that the rickettsiae have a proline tRNA (gene) of unusual structure. The three arginine tRNA genes have the anticodons ACG, CCG, and UCU. It is possible the A of the ACG anticodon may be converted to inosine to allow wobble reading of three codons (CGU, CGC, and CGA). Finally, there are three methionine tRNA genes. One of these was identified as the initiator based on the presence of the characteristic mispair in the CCA stem. Since no tRNA was found that could obviously decode AUA, it is possible that one of the methionine tRNAs may be an isoleucine tRNA.
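The codon-reading argument above can be made concrete with Crick's wobble rules. This is a minimal sketch under simplifying assumptions: it considers only the unmodified bases A, C, G, and U plus inosine (written I) at the first anticodon position, and ignores all other tRNA modifications. The function and table names are illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Crick wobble rules: base at the 5' end of the anticodon -> bases it can
# pair with at the third codon position.
WOBBLE = {
    "A": "U",
    "C": "G",
    "G": "CU",
    "U": "AG",
    "I": "UCA",  # inosine
}

def codons_read(anticodon):
    """Codons (5'->3') readable by an anticodon written 5'->3'."""
    wobble_base, middle, last = anticodon.upper().replace("T", "U")
    # Codon positions 1 and 2 pair strictly with anticodon positions 3 and 2.
    prefix = COMPLEMENT[last] + COMPLEMENT[middle]
    return [prefix + third for third in WOBBLE[wobble_base]]

for anticodon in ("ACG", "ICG", "CCG", "UCU", "CAT"):
    print(anticodon, codons_read(anticodon))
```

Under these rules, an inosine-containing ICG anticodon together with CCG and UCU covers all six arginine codons, whereas an unmodified CAU anticodon reads only AUG and not AUA, which is consistent with the reasoning in the text.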
Like R. prowazekii and R. conorii, R. typhi encodes four subunits of RNA polymerase core enzyme (α, β, β′, and ω) and two predicted sigma factors, the major housekeeping sigma factor, σ70, and a heat shock sigma factor, σ32, encoded by rpoH. Madan Babu (49) suggested that loss of expressed sigma factor genes can initiate the accumulation of pseudogenes by eliminating entire sets of coregulated genes. No additional sigma factor-like sequences were detected in R. typhi, suggesting that most expressed genes are coregulated at this level.
No repressor-like proteins were found to be encoded by the R. typhi genome, and only a limited repertoire of other transcription factors that can respond to changes in the environment was present. The genome potentially contains four two-component signal transduction systems. An EnvZ-OmpR system (RT0412, RT0413), like those that function in osmoregulation in other bacteria, was found. An NtrY-NtrX-like two-component system (RT0603 and RT0550) involved in nitrogen metabolism in some organisms was also observed. The envZ-ompR genes are adjacent, as is common for such sensor kinase and response regulator systems, but the ntrY and ntrX genes are separated by more than 90 kb. Two additional, nonadjacent, response regulator genes and two sensor histidine kinase genes were also annotated in the R. typhi genome. One of the response regulators, CzcR (RT0061), is similar to the Caulobacter crescentus CtrA protein, which binds the Caulobacter origin of replication at TTAAN7TTAA sequences and represses DNA replication in swarmer cells (53). CtrA binds to similar sites in the R. prowazekii origin of replication and CzcR can complement CtrA function in C. crescentus (17). Presumably, the R. typhi CzcR protein has a similar DNA-binding activity since at least four TTAAN7TTAA-like sequences map within the putative origin of replication region. The histidine kinase partner for CzcR is uncharacterized but may be encoded by one of the two “unpaired” sensor kinases annotated in the R. typhi genome (RT0221 and RT0452). The last response regulator (RT0229) is similar to putative response regulators in the α-proteobacteria; its histidine kinase partner is also unknown. Beyond transcription initiation, R. typhi encodes proteins involved in transcript elongation (NusA, NusB, NusG, and GreA) and termination (Rho), as do R. prowazekii and R. conorii.
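A search for the TTAAN7TTAA motif (TTAA, any seven bases, TTAA) can be sketched with a regular expression. This is only an illustration: it finds exact matches, whereas the "-like" sites mentioned in the text presumably tolerate some mismatches, and the function name and toy sequences are hypothetical. Because TTAA is its own reverse complement, the motif pattern is identical on both strands, so scanning one strand is sufficient.

```python
import re

# TTAA, any seven bases, TTAA (15 bp in total). Wrapping the pattern in a
# lookahead lets overlapping sites be reported as separate matches.
CTRA_SITE = re.compile(r"(?=(TTAA[ACGT]{7}TTAA))")

def find_ctra_like_sites(seq):
    """Return (0-based start, matched 15-mer) for each exact motif match."""
    return [(m.start(), m.group(1)) for m in CTRA_SITE.finditer(seq.upper())]

# Toy sequences, not taken from any rickettsial genome.
print(find_ctra_like_sites("GGTTAACGCGCGCTTAAGG"))
print(find_ctra_like_sites("TTAACCCCCCCTTAAGGGGGGGTTAA"))
```

The second toy sequence contains two sites that share the middle TTAA, which is the case the lookahead is there to handle.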
Little is known about the genetic determinants required for the rickettsiae to invade a host and cause disease. Analysis of the finished genome has suggested 42 genes that may be involved in virulence (Table 3). These include hemolysins, enzymes that allow intracellular survival, and phospholipases that permit escape of the organism from the phagolysosome. Hemolysis is thought to play a role in the pathogenesis of TG but not SFG rickettsiae (53, 68). R. prowazekii has two potential hemolysin genes, tlyA and tlyC (5), and the cloned tlyC from R. typhi can complement hemolytic activity in a nonhemolytic Proteus mirabilis (54). Both tlyA (RT0543) and tlyC (RT0725) were found in R. typhi. Another protein implicated in the virulence of R. prowazekii is the dinucleoside polyphosphate hydrolase, InvA. This protein is thought to function by hydrolyzing toxic dinucleoside oligophosphates within the host cell to produce ATP, thus providing an environment more conducive to growth (30, 31). ORF RT0228 is 98% identical to the R. prowazekii InvA. Oxidative stress due to reactive oxygen species has been implicated in both host cell injury caused by Rickettsia and in host defense to infection (65). ORF RT0523, sodB, is an ortholog of an iron-associated superoxide dismutase that is predicted to neutralize reactive oxygen species, allowing survival in the host cell. It has been suggested that the rickettsiae use phospholipase A2 to exit the phagosome and the cell (69). A putative phospholipase-like protein (RT0590) with significant similarity to the patatin family of proteins was found in the genome. Patatin is the main storage protein found in potatoes but has also been found to have phospholipase A activity (39, 43). It is possible that this protein acts as a phospholipase in R. typhi. In R. conorii, the pldA gene has been shown to encode a protein with phospholipase D activity (55). An ortholog of pldA is present in R. typhi (RT0807).
Outer membrane proteins, which are in contact with the host milieu and could function in virulence, have also been identified. These include omp1 (RT0150), a predicted antigenic lipoprotein precursor nlpD (RT0291), and an ortholog of a Salmonella enterica serovar Typhimurium virulence factor mviN (RT0579) (20). Further, 12 lipopolysaccharide biosynthesis genes have been identified. Most are genes predicted to be involved in lipid A and core biosynthesis. Finally, we have identified 16 genes associated with a type IV secretion system and 5 genes of an autotransporter family that also may be involved in the pathogenicity of R. typhi. These latter groups of genes will be discussed in more detail below.
Previous studies (36) have shown that R. typhi undergoes actin-based intracellular motility like the SFG rickettsiae. However, R. typhi seems to be deficient (or inefficient) in its directional control of motility since its movement occurs in circular paths with many changes in speed and direction, unlike the SFG rickettsiae, which tend to move in straight paths with constant speeds (37). It has been shown that the actin tail formed by R. typhi is shorter and more hook-shaped than those formed by the SFG rickettsiae (62). Ogata et al. originally identified a protein in R. conorii (RC0909) that has a domain organization similar to the actin organizing protein, ActA, of Listeria monocytogenes, and they hypothesized that this protein may play a role in rickettsial actin tail formation (52, 69). Further work showed that this protein, now called RickA, can activate Arp2/3 and induce actin polymerization (33). We were unable to identify a homolog of RickA in the R. typhi genome, and the absence of this protein may explain the actin tail defects and inefficient motility seen in R. typhi.
Doxycycline is the antibiotic of choice for treating R. typhi, since the organism is resistant to many other antibiotics, such as beta-lactams, gentamicin, and trimethoprim-sulfamethoxazole (56). Analysis of the genome has provided some potential mechanisms for resistance to these antibiotics, both specific and nonspecific.
One metallo-β-lactamase ortholog was found in R. typhi (RT0428) and in the other two finished rickettsial genome sequences. Rickettsiae produce a peptidoglycan cell wall containing penicillin-binding proteins (R. typhi has six penicillin-binding proteins) and show relative resistance to penicillin and other beta-lactam antimicrobial agents (50).
Several orthologs of multidrug transporters were found. An ortholog (RT0146) of the emrB multidrug transporter (46) was found in the R. typhi genome. In E. coli, EmrB confers resistance to nalidixic acid. It also shares sequence similarity with QacA, a multidrug resistance pump of Staphylococcus aureus that belongs to a family that includes tetracycline resistance pumps of gram-positive bacteria. An ortholog of the EmrB regulator, EmrA, was also found in R. typhi, R. prowazekii, and R. conorii. It is unlikely that EmrB functions as a tetracycline pump since the rickettsiae are sensitive to tetracycline. Nor is it likely that it acts as a quinolone efflux pump since this class of antibiotics is moderately active against most rickettsiae (50). In addition, two copies of bcr gene orthologs (RT0591 and RT0693) were found in each of R. typhi, R. prowazekii, and R. conorii. In E. coli, Bcr confers resistance to diketopiperazine antibiotics (13), usually used in therapy of Vibrio and Mycobacterium infections. Like EmrB, Bcr also resembles a multidrug transporter. Besides the potential EmrB and Bcr transporters, 10 additional potential ABC transporters were annotated in the R. typhi genome. Resistance to trimethoprim-sulfamethoxazole is likely due to the absence of enzymes of the folic acid pathways, since synthesis of folic and folinic acids is the target of these antibiotics. Gentamicin resistance is likely due to the fact that it remains extracellular, whereas the rickettsiae are intracellular pathogens.
Two other possible resistance genes, the acriflavin resistance gene (acrD, RT0161) and a tellurite resistance gene (terC, RT0774) were found in the R. typhi genome. These two genes are also present in R. prowazekii and R. conorii. Although the clinical relevance of these resistances is unknown, the possible presence of resistance to acriflavin (a mutagen) and tellurite (a reducing agent) in all three rickettsial genomes may suggest that they provide a selective advantage to the organism.
Autotransporters are proteins that are responsible for their own secretion across the outer membrane of gram-negative bacteria. The carboxy-terminal β-domain is thought to form a β-barrel outer membrane porin that allows the amino-terminal passenger domain to pass through it in an unfolded state. The secreted passenger domain is then released from the transporter as a functional mature protein (38). Analysis of the R. typhi genome revealed the presence of five previously described autotransporter proteins: Sca1 (RT0015), Sca2 (RT0052), Sca3 (RT0438), Sca4 (RT0485), and OmpB (RT0699) (5, 52). Work in the SFG rickettsiae showed that both OmpB and a related protein, OmpA (not found in the TG), play a role in adhesion to host cells (45, 60). Although no function of the Sca proteins was demonstrated, it was suggested that some autotransporters have adhesion or protease activities and could play a role in rickettsial virulence (38, 69). A characteristic of bacterial adhesins that bind to eukaryotic extracellular matrix proteins is the presence of amino acid repeats (57). A 50-residue repeat occurs six times within the carboxy-terminal portion of the Sca2 passenger domain, suggesting that this protein may be involved in binding of R. typhi to host cells.
In R. typhi, each of the four Sca proteins is found within a single predicted ORF, while at least one Sca protein has degraded into multiple ORFs in both R. conorii and R. prowazekii. Sca1, Sca2, and Sca4 are intact in R. conorii, but portions of the Sca3 sequence are found in two different ORFs (RC0630 and RC0631). The R. typhi Sca1 sequence corresponds to three different ORFs in R. prowazekii (RP016, RP017, and RP018), whereas the Sca2-like sequence is found in four different ORFs (RP081, RP082, RP083, and RP084). Sca1 and Sca2 contain only the β-domain and may function to transport other proteins across the outer membrane (69). Sca4 is found in two large ORFs in R. prowazekii (RP498 and RP499).
Several components of a type IV secretion system (19) found in R. conorii (52), R. prowazekii (5), and other α-proteobacteria, were present in R. typhi. Type IV secretion systems use a complex of transmembrane proteins and a pilus to export protein or DNA substrates across bacterial and eukaryotic cell membranes and into target cells (18, 19).
Fifteen ORFs related to the Agrobacterium tumefaciens VirB type IV secretion system were found in R. typhi. These include orthologs to the inner membrane components VirB4 (RT0033 and RT0771) and VirB11 (RT0283). In A. tumefaciens, VirB4 binds to the DNA substrate as it is translocated across the membrane (19); the presence of two VirB4 proteins in R. typhi may indicate versatility in substrate selection. The substrate-binding protein VirD4 (19) was also present (RT0283). The periplasmic components VirB6, VirB8, and VirB10 (RT0282) were identified. These included two ORFs resembling VirB8 (RT0278 and RT0280), in addition to five different ORFs with similarity to VirB6 (RT0028, RT0029, RT0030, RT0031, and RT0032). VirB6 functions in the stabilization of VirB5 and VirB7 (36), neither of which was found in R. typhi. VirB6 also stabilizes VirB3 (RT0034) (36), which was found in R. typhi. Multiple VirB6-like ORFs may be the result of inactivation of one or more genes or the development of new functions not yet described. Outer membrane components VirB3 and VirB9 (RT0277 and RT0281) were also found, but an ortholog of VirB7, which bridges the pilus to the transmembrane complex, was not (36). Neither VirB2 nor VirB5, the major and minor pilus components, was identified (19). Because R. typhi lives within the host cell cytoplasm, there may be no need to secrete substrates across a eukaryotic host cell membrane, hence the lack of pilus components.
A potential protein substrate for a type IV secretion system was also found. R. typhi has a Sec7 domain-containing protein (RT0362) that is similar to RalF in Legionella pneumophila. RalF is secreted by the L. pneumophila dot/icm type IV secretion system (51). Along with an R. prowazekii ortholog, these are the only three known examples of Sec7 family proteins in bacteria (51). Sec7 proteins, named for Saccharomyces cerevisiae Sec7p, are guanine-nucleotide exchange factors and cofactors for ADP-ribosylation factors (ARFs), which function in the regulation of membrane traffic and organelle structure in eukaryotic cells (25, 42). The similarity to the L. pneumophila RalF protein suggests that RT0362 may be secreted by the R. typhi type IV system. In Legionella, the function of RalF is related to recruiting host ARFs to bacterium-containing phagosomes (51) but, since rickettsiae do not live in phagosomes, the role of the RalF-like protein in R. typhi is unknown.
In addition to the autotransporters, drug efflux pumps, and VirB secretion system, R. typhi has proteins that function in transporting substances across the cell membrane. There were five copies of an ADP/ATP translocase, a transporter that allows R. typhi to acquire host cell ATP (5). As in other rickettsiae, there are multiple copies of the osmosensory proline-betaine transporter, ProP. We found 7 copies of ProP in R. typhi compared to 11 copies in R. conorii and 7 copies in R. prowazekii (52). Transport proteins for cell membrane and cell wall components were found, including three copies of the murein peptide permease AmpG, two heme exporters CcmAB (RT0781 and RT0259), and two O-antigen exporter proteins (RT0002 and RT0003). Also found was a homolog of the SAM transporter of R. prowazekii (RT0056) that has recently been shown to transport S-adenosylmethionine into the bacterial cell (59). A variety of other transport proteins, including seven possible ABC transport systems, were found. These transporters include a potential zinc/manganese transporter (RT0013 and RT0612), a magnesium transporter (RT0572), potassium transporter (RT0798), a glutamine transporter (RT0118, RT0139, and RT0859), a putrescine/ornithine transporter (RT0470), a ribonucleotide transporter (RT0040), and a sodium/proton symporter (RT0549). Four members of the Tol family—TolC (RT0216), TolB (RT0293), TolQ (RT0299), and TolR (RT0300)—were identified. Transporters with less clear possible functions were given generic descriptions such as metal transporter and cation transporter. Still others with no known or postulated substrate were also found. In addition, a PSORT-B (29) analysis of coding regions identified 39 hypothetical, conserved hypothetical, or rickettsial conserved hypothetical proteins that could be membrane localized. Some of these may also function in transport.
In response to starvation and other stress signals, bacteria increase their intracellular concentration of GDP 3′-diphosphate and GTP 3′-diphosphate, or (p)ppGpp, in a process called the stringent response (21, 22). This leads to a downregulation of transcription and translation. In E. coli, the RelA and SpoT proteins modulate (p)ppGpp levels in the cell. No intact R. typhi ORF with significant similarity to either RelA or SpoT was found, although four small ORFs with significant similarity to SpoT were found. These ORFs align to two different regions of SpoT (Fig. 4) and have been previously described and classified as pseudogenes or “split genes”, named spoTa, spoTb, spoTc, and spoTd (4, 5, 52). Andersson and Andersson suggested that spoT underwent a gene duplication event and then each copy degraded into two pseudogenes (4). They postulated that spoTa and spoTd arose from one copy and that spoTb and spoTc arose from the other. In agreement with this, amino acid sequences of the R. typhi SpoTa and SpoTd roughly correspond to the same region of SpoT from E. coli, whereas SpoTb and SpoTc both correspond to a different region of SpoT. The authors of that study compared the lengths and arrangements of these four ORFs in R. prowazekii, R. typhi, R. rickettsii, and R. montanensis and concluded that they were not functional, although the study also indicated that SpoT proteins corresponding to the amino-terminal catalytic domain could be functional (4).
We suggest that the SpoT proteins in R. typhi may be functional and have chosen to adopt the term “split gene” as proposed by Ogata et al. (52), thus classifying them as coding ORFs instead of pseudogenes. The E. coli SpoT protein has a two-domain structure, with an amino-terminal (p)ppGpp degradation domain and a separate, but overlapping, (p)ppGpp synthetase domain (32). The degradation domain consists of no more than the first 203 amino acids, whereas the synthetase domain lies between residues 67 and 374 (32). The R. typhi SpoTa and SpoTb proteins align within the first 200 amino acids of E. coli SpoT (SpoTa, amino acids 20 to 196; SpoTb, amino acids 5 to 182) and thus, if translated, could code for functional (p)ppGpp hydrolases. The R. typhi SpoTc and SpoTd proteins correspond to the (p)ppGpp synthetase domain (SpoTc, amino acids 233 to 342; SpoTd, amino acids 211 to 343), so these proteins could synthesize (p)ppGpp. These proteins may not function singly but may instead form complexes, e.g., SpoTa/SpoTd and SpoTb/SpoTc, to become active.
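The domain assignments above follow from simple interval arithmetic on the E. coli SpoT coordinates. The following minimal Python sketch is illustrative only and is not part of the original analysis. It uses the residue ranges quoted in the text, treats the degradation domain as residues 1 to 203 (its stated upper bound), and introduces the hypothetical helper names `overlap` and `domains_containing`.

```python
# Domain boundaries on E. coli SpoT as quoted in the text (1-based, inclusive).
# Assumption: the degradation (hydrolase) domain is taken at its stated
# maximum extent, residues 1 to 203.
DOMAINS = {
    "hydrolase": (1, 203),
    "synthetase": (67, 374),
}

# Alignment ranges of the R. typhi SpoT fragments on E. coli SpoT (from the text).
FRAGMENTS = {
    "SpoTa": (20, 196),
    "SpoTb": (5, 182),
    "SpoTc": (233, 342),
    "SpoTd": (211, 343),
}


def overlap(a, b):
    """Return the number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def domains_containing(fragment_name):
    """Return the domains that fully contain the named fragment's range."""
    start, end = FRAGMENTS[fragment_name]
    return sorted(
        name
        for name, (d_start, d_end) in DOMAINS.items()
        if d_start <= start and end <= d_end
    )


if __name__ == "__main__":
    for name in FRAGMENTS:
        shared = {d: overlap(FRAGMENTS[name], rng) for d, rng in DOMAINS.items()}
        print(name, domains_containing(name), shared)
```

Because the two E. coli domains overlap, SpoTa and SpoTb also share residues with the synthetase range, but only the degradation domain contains them entirely. SpoTc and SpoTd lie wholly downstream of residue 203 and fall inside the synthetase range only.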
Further support of SpoT functionality is the high level of conservation of the genes among the Rickettsia species. In agreement with previous results, we found the most conserved ORF to be spoTa and the least conserved ORF to be spoTd (4). The DNA sequences of spoTa are 95% identical between R. typhi and R. prowazekii and 91% identical between R. typhi and R. conorii, whereas the spoTd genes are 86% identical between R. typhi and R. prowazekii and 90% identical between R. typhi and R. conorii. The spoTd genes vary in length, with an R. typhi homolog of 573 bp, an R. conorii homolog of 607 bp, and an R. prowazekii homolog of only 264 bp. The sequences for spoTb and spoTc are 98 and 100% identical between R. typhi and R. prowazekii, respectively, although homologs were not found in R. conorii. With the exception of spoTd, these levels of identity rank with those of highly conserved genes such as the 54 shared ribosomal proteins (94.8% identity between R. typhi and R. prowazekii and 91% identity between R. typhi and R. conorii), while the overall conservation rate between the genomes is 94.3% identity between R. typhi and R. prowazekii and 88.8% identity between R. typhi and R. conorii. Thus, there appears to be selective pressure on the spoT gene sequences.
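The comparison of spoT identities against the genome-wide averages can be tabulated as in the minimal Python sketch below. It is illustrative only: the identity values are those reported in the text, whereas the function names are hypothetical. The `percent_identity` helper assumes one common definition (matches divided by ungapped alignment columns), which may differ from the definition used to obtain the published figures.

```python
# Percent DNA identity relative to R. typhi, as reported in the text.
# None marks a gene for which no homolog was found in that species.
IDENTITY = {
    "spoTa": {"R. prowazekii": 95.0, "R. conorii": 91.0},
    "spoTb": {"R. prowazekii": 98.0, "R. conorii": None},
    "spoTc": {"R. prowazekii": 100.0, "R. conorii": None},
    "spoTd": {"R. prowazekii": 86.0, "R. conorii": 90.0},
}

# Overall genome conservation relative to R. typhi, as reported in the text.
GENOME_WIDE = {"R. prowazekii": 94.3, "R. conorii": 88.8}


def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences.

    Assumption: columns containing a gap ('-') in either sequence are
    excluded from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def above_genome_average(species):
    """List the spoT ORFs whose identity exceeds the genome-wide value."""
    return sorted(
        gene
        for gene, values in IDENTITY.items()
        if values[species] is not None and values[species] > GENOME_WIDE[species]
    )


if __name__ == "__main__":
    for species in GENOME_WIDE:
        print(species, above_genome_average(species))
```

With the reported values, spoTa, spoTb, and spoTc exceed the genome-wide identity for R. prowazekii, and spoTa and spoTd exceed it for R. conorii. Only spoTd in the R. prowazekii comparison falls below the average.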
Several additional lines of evidence suggest that the SpoTa to -d ORFs may be functional in R. typhi. First, two other components of the stringent response pathway, Ndk (nucleoside diphosphokinase) and GppA (pppGpp-5′-phosphohydrolase), are found in the genome. Ndk is involved in the addition of phosphates to nucleoside diphosphates and should be necessary even in the absence of the stringent response, whereas GppA is active only on GTP derivatives. Thus, without a ppGpp synthetase there would be no reason to maintain the gppA function. Second, Ogata et al. reported that in R. conorii some split genes may be transcribed and retain functionality (52). Lastly, there is evidence that the rickettsiae do possess a functional stringent response since bacteria in blood-starved ticks downregulate protein synthesis (68, 69).
The rickettsiae are well known for containing DNA that is no longer capable of producing a functional protein (pseudogenes) (3). We discovered 41 sequences that appear to be pseudogenes (Table 4). Although several pseudogenes matched proteins in other bacteria, most appear only in other rickettsiae (rickettsial conserved hypothetical proteins). For example, R. typhi is missing the entire cytochrome c oxidase pathway. Both R. prowazekii and R. conorii contain intact electron transport systems, including the genes necessary for NADH dehydrogenase (complex I), succinate dehydrogenase (complex II), cytochrome reductase (complex III), and cytochrome oxidase (complex IV) (5, 52). Many bacteria contain more than one terminal oxidase for aerobic respiration under differing oxygen concentrations (44). Both R. prowazekii and R. conorii contain the genes coxABC, which encode a putative cytochrome c oxidase, and cydAB, which encode a putative cytochrome d ubiquinol oxidase. This situation is similar to Mycobacterium smegmatis in which an aa3-type cytochrome c oxidase is expressed during normal aerobic growth and a cydAB-encoded bd-type oxidase is expressed during low oxygen conditions (44). In R. typhi, all three genes encoding cytochrome c oxidase (coxCAB; RTPG10, RTPG22, and RTPG23) have decayed and are likely nonfunctional. In addition, the genes encoding the three cytochrome c oxidase assembly proteins, CoxW, CtaG, and CtaB (RTPG14, RTPG15, and RTPG18), have also decayed. As a result, it appears that R. typhi retains only one terminal oxidase for aerobic respiration. The remaining oxidase, CydAB (RT0207 and RT0208), is predicted to be of the d-type that is commonly expressed during low-oxygen conditions. Such cytochrome d oxidases generally have a higher affinity for oxygen than do cytochrome c oxidases and thus mediate the transfer of electrons to water at a slower rate (34). This may lead to physiological effects that play a role in the attenuated disease manifestations of R. typhi compared to R. prowazekii.
The genome of R. typhi is nearly identical to that of its close relative R. prowazekii and highly similar to those of R. conorii and other SFG bacteria. The few differences between the two TG rickettsiae include a 12-kb insertion in the genome of R. prowazekii, a large inversion close to the origin of replication with no loss of genes in the region, and the fact that R. typhi has lost the complete cytochrome c oxidase system. In addition, R. typhi has several pseudogenes for which functional homologs are found in R. prowazekii. The similarity of the genome sequence of R. typhi to that of R. prowazekii further illustrates the delineation between the TG and SFG rickettsiae. Based on comparisons of these three genomes, it appears that rickettsiae may have a functional stringent response system that utilizes split genes of the SpoT protein as previously described. Further comparisons that include R. sibirica and R. rickettsii, as briefly mentioned here, should provide additional insight into the evolutionary relationship between these important pathogens.
The rickettsiae appear to maintain nonfunctional DNA for a much longer period than do other bacteria. It has been proposed that slow growth can account for this. However, there are many slow-growing bacteria that do not accumulate pseudogenes. It is possible that the rickettsiae retain this DNA for a reason or that their pathway for removing nonfunctional DNA is different. By comparing this genome to others that lose nonfunctional genes faster, we may discover the mechanisms by which organisms lose DNA. It is interesting that most of the pseudogenes in R. typhi were found in regions with deletions in R. prowazekii and R. typhi with respect to R. conorii, and in fact many appear to be remnants of the deletion event. The pseudogenes may be functional in other rickettsiae or indicative of regions that are not functional in any of the rickettsiae.
The high degree of similarity between R. typhi and R. prowazekii illustrates the small differences that can affect virulence in different hosts. Despite their close genetic relatedness, R. prowazekii is highly pathogenic in humans, whereas R. typhi is a milder pathogen. Future studies should identify which of the differences in these genomes account for these phenotypes.
This research was supported by a grant from the National Institute of Allergy and Infectious Diseases (AI49040). Michael P. McLeod was supported in part by a training grant from the Keck Center for Computational Biology (NLM grant LM07093). George E. Fox was supported in part by a Ruth L. Kirschstein National Research Service Award (F33-HGO2551).
We are grateful to Virginia Vera, Baize Montgomery, and Regina Wleczyk for technical assistance. We thank Manuel Gonzalez-Garay, Andrei Volkov, Andrew R. Jackson, and Aleksandar Milosavljevic for hosting annotation on the Genboree server.