Chromosome and plasmid nucleotide features. To gain insight into C. burnetii genetic diversity and pathogenetic potential, the genome sequences of the K, G, and Dugway isolates were determined and compared to the sequenced genome of NM. Chromosomal and plasmid features of the isolates are summarized in Tables 1 and 2, respectively. Each isolate contains a single circular chromosome of roughly 2 Mb. Relative to NM, the genomes of K, G, and Dugway have 13, 5, and 17 novel chromosomal insertions encompassing 51,414, 45,378, and 175,046 bp of DNA, respectively (see Table S1 in the supplemental material). Dugway has 10 unique insertions totaling 120,747 bp, while K and G do not contain unique DNA.
| TABLE 1. Chromosomal features of C. burnetii isolates |
| TABLE 2. Features of C. burnetii plasmids |
NM, K, and Dugway carry a moderately sized plasmid, while G has plasmid-like sequences integrated into its chromosome. The nucleotide sequence of the QpRS plasmid of K (39,280 bp) has 17 polymorphisms affecting eight ORFs (four ORFs being frameshifted) relative to the previously sequenced QpRS plasmid of the Priscilla (Q177) isolate, another genomic group IV isolate (
60) (see Table S2 in the supplemental material). The Dugway plasmid, termed QpDG (
65), is 54,179 bp. The larger size of QpDG relative to other
C. burnetii plasmids is consistent with a previous description (
65) and contrasts with a report claiming that QpDG is nearly identical to the NM plasmid QpH1 (37,393 bp) (
54). The G isolate has 17,532 bp of QpRS-like plasmid sequence integrated into its chromosome between two ORFs (CbuG0070 and CbuG0090) encoding hypothetical proteins (
8). This sequence has three SNPs relative to the previously sequenced integrated plasmid-like sequences of the S (Q217) isolate of genomic group V (
111) (see Table S2 in the supplemental material). At the nucleotide level, QpH1, QpRS, QpDG, and the integrated plasmid sequences of G are 99% identical within 14,218 bp of common DNA, while QpH1, QpRS, and QpDG are 99% identical within 28,421 bp of common DNA. QpH1, QpRS, and QpDG harbor 3,685, 2,677, and 15,423 bp, respectively, of unique sequence (see Table S3 in the supplemental material). At the nucleotide level, QpRS and QpDG are most similar in sharing 34,940 bp of common sequence.
Conserved and novel gene content. To obtain consistency in genomic comparisons, the previously sequenced NM genome (
96) was subjected to gene prediction and annotation procedures as described for the K, G, and Dugway isolates (see Materials and Methods). Including pseudogenes, but not insertion sequence (IS) element-associated genes, reannotation of the NM chromosome and plasmid (QpH1) resulted in the identification of 111 and 1 previously uncalled ORF(s), respectively (see Table S4 in the supplemental material). Dugway, with the largest chromosome and plasmid, encodes the most full-length ORFs (2,052), with 145 and 13 unique ORFs encoded by the chromosome and plasmid, respectively. G, with the smallest chromosome and lacking an autonomously replicating plasmid, encodes the fewest full-length ORFs (1,816). G encodes only 31 novel intact ORFs relative to the other
C. burnetii isolates, consistent with its lack of novel DNA (Fig. and Tables and ; see Tables S1 and S5 in the supplemental material). A detailed comparison of the four genomes revealed 1,503 chromosomal and 22 plasmid ORFs shared by
C. burnetii isolates (Fig. ; see Fig. S1 and Table S5 in the supplemental material). The lack of extensive novel gene content between isolates is in agreement with the organism's obligate intracellular lifestyle that limits opportunities for genetic exchange.
C. burnetii lacks obvious bacteriophage, although there are some phage-like genes carried by the plasmids (
96). Moreover, all
C. burnetii isolates contain pseudogenes associated with natural competence (e.g.,
comA) and lack genes encoding a conjugal apparatus. Intact chromosomal ORFs with functional annotation that are missing in NM but intact in K, G, and/or Dugway are listed in Table S6 in the supplemental material. Intact conserved and unique ORFs encoded by plasmid and plasmid-like sequences of
C. burnetii isolates are listed in Table S7 in the supplemental material. Isolate-specific genes and pseudogenes with functional annotations related to metabolism and virulence are discussed in more detail below.
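The shared and isolate-specific ORF counts reported above amount to set operations over ortholog families. The following is a minimal sketch of that bookkeeping, not the annotation pipeline used for this study; the function name `core_and_unique` and the family labels in the toy input are hypothetical, and ortholog family assignment is assumed to have been done beforehand.

```python
from typing import Dict, Set, Tuple


def core_and_unique(
    families: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (core, unique).

    core   : ortholog families present in every isolate
    unique : for each isolate, the families found in that isolate alone
    """
    isolates = list(families)
    # Families shared by all isolates (the conserved gene content).
    core = set.intersection(*(families[i] for i in isolates))
    unique = {}
    for iso in isolates:
        # Union of the families seen in every other isolate.
        others = set().union(*(families[j] for j in isolates if j != iso))
        unique[iso] = families[iso] - others
    return core, unique


# Hypothetical toy input; real input would hold one family ID per intact ORF.
toy = {
    "NM": {"fam1", "fam2", "fam3"},
    "K": {"fam1", "fam2", "fam4"},
    "G": {"fam1", "fam2"},
    "Dugway": {"fam1", "fam2", "fam3", "fam5", "fam6"},
}
core, unique = core_and_unique(toy)
print(sorted(core))              # families shared by all four isolates
print(sorted(unique["Dugway"]))  # families found only in Dugway
```

In this toy case an isolate such as G, which carries no family absent from the others, yields an empty unique set.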
As originally described for NM (
96),
C. burnetii isolates cumulatively encode an unusually high proportion (39.2%) of hypothetical and conserved hypothetical proteins (i.e., without assigned function), most of which are conserved among the four isolates (Tables and ; see Table S5 in the supplemental material). Isolates encode one copy each of 5S, 16S, and 23S rRNA genes, with the latter containing two self-splicing group I introns (
86).
In a recent study using comparative genomic hybridization, Beare et al. (
9) identified genetic polymorphisms of the Dugway isolates 7E9-12 and 5G61-63 relative to NM. These Dugway isolates were recovered in the same field study as the Dugway 5J108-111 isolate sequenced for this report (
102). The nucleotide sequence of Dugway 5J108-111 revealed the same plasmid and chromosomal polymorphisms as 7E9-12 (e.g., deletion of NM Cbu0881), suggesting that these isolates are genetically very similar and unlike the Dugway 5G61-63 isolate, which has no polymorphisms relative to NM (
9).
C. burnetii genome architecture and gene content. Although once considered rare in obligate intracellular bacterial pathogens, IS elements have now been described in at least four species of
Rickettsia (
12), with large numbers present in
Orientia tsutsugamushi (
19). Combined, the four
C. burnetii isolates harbor eight distinct families of IS elements with associated transposases: the IS
1111A, IS
30, IS
As1, IS
652, and IS
4 families, as well as three unknown transposase types (see Table S8 in the supplemental material). K has 59 IS elements, with 31 containing an intact transposase. G has 40 IS elements, with 33 containing an intact transposase. NM and Dugway have roughly the same number of IS elements (31 and 33, respectively); however, transposases are intact in only 5 Dugway IS elements while being intact in 28 NM IS elements. A single IS element (IS
4 family) was found in the QpDG plasmid of Dugway. Other
C. burnetii plasmids lack IS elements, although an IS element (IS
1111A family) is found adjacent to the integrated plasmid-like sequences of G.
The movement of IS elements clearly contributes to
C. burnetii genomic plasticity. Chromosomal rearrangements have resulted in 21, 6, and 13 syntenic blocks (defined as having the same gene order and gene content as NM) in K, G, and Dugway, respectively (Fig. and ; see Table S9 in the supplemental material). Two syntenic blocks are shared between K, G, and Dugway, with 5 shared between K and Dugway. G contains four novel syntenic blocks. Cumulatively, the syntenic blocks of K, G, and Dugway represent 40 chromosomal breakpoints relative to the NM chromosome. Of these, 30 (75%) have an intact or remnant IS element within 100 bp of the breakpoint (see Table S9 in the supplemental material), suggesting an important role for homologous recombination between IS elements in
C. burnetii genome rearrangements. Homologous recombination has been demonstrated in the NM isolate (
103). Moreover, intact
recA is present in all four isolates, with functionality recently demonstrated for the NM ortholog Cbu1054 (
67).
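The association between breakpoints and IS elements (30 of 40 breakpoints, or 75%, with an intact or remnant IS element within 100 bp) can be illustrated with a simple interval-distance check. This is a sketch under stated assumptions rather than the procedure used to build Table S9: coordinates are treated as linear (the circular chromosome origin is ignored), and the function name and toy coordinates are hypothetical.

```python
from typing import List, Tuple


def fraction_near_is(
    breakpoints: List[int],
    is_elements: List[Tuple[int, int]],
    window: int = 100,
) -> float:
    """Fraction of breakpoints lying within `window` bp of any IS element.

    Each IS element is a (start, end) interval with start <= end.
    """
    def near(pos: int) -> bool:
        for start, end in is_elements:
            # Distance is 0 when the breakpoint falls inside the element.
            dist = max(start - pos, pos - end, 0)
            if dist <= window:
                return True
        return False

    if not breakpoints:
        return 0.0
    hits = sum(1 for bp in breakpoints if near(bp))
    return hits / len(breakpoints)


# Hypothetical toy coordinates, for illustration only.
toy_breakpoints = [1000, 5000, 9000, 20000]
toy_is_elements = [(1050, 2500), (8000, 8950)]
print(fraction_near_is(toy_breakpoints, toy_is_elements))
```

Here two of the four toy breakpoints fall within 100 bp of an IS element, so the function returns 0.5; applied to real breakpoint and IS element coordinates, the same ratio would correspond to the 30/40 figure given in the text.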
Figure depicts a syntenic chromosomal region shared by the four isolates that was presumably rearranged by recombination of flanking IS1111A elements. The region contains multiple genes encoding hypothetical proteins and housekeeping enzymes, such as prlC (Cbu0039 in NM) encoding oligopeptidase A, a protein involved in signal peptide degradation. In all isolates, the region is flanked at the 3′ end by a full-length or frameshifted IS1111A element (Cbu0040 in NM). The next gene at the 3′ end in G and Dugway is an ortholog of NM Cbu1960 encoding a hypothetical cytosolic protein. In K it is an ortholog of NM Cbu1778 encoding fructose-bisphosphatase. At the 5′ end, only NM and G maintain the flanking IS element (Cbu0006 in NM). However, the upstream gene in G is an ortholog of Cbu1896 encoding a macrolide efflux pump. This gene also constitutes the 5′ end of the syntenic block in K and Dugway, with the IS element and the NM Cbu0006a ortholog deleted in Dugway. A larger deletion in K eliminates the IS element and orthologs of NM Cbu0006a, Cbu0007, Cbu0008, and a piece of Cbu0008a.
Expansion of IS elements, accumulation of pseudogenes (defined as genes disrupted by IS elements, small indels, or nonsense mutations), and numerous genomic rearrangements are associated with pathogens that have recently emerged from nonpathogens (
82). An example is the facultative intracellular bacterium
Francisella tularensis (
89). A pathoadaptive evolutionary process is thought to result from bottlenecks encountered by small, isolated populations of a newly emerged pathogen whereby the new niche promotes gene decay by genetic drift (
89). The obligate intracellular nature of
C. burnetii, with its exploitation of host metabolic processes and limited opportunity for genetic exchange, would be expected to accelerate this process (
2). Although genome reduction is clearly occurring in
C. burnetii (
96), it is nowhere near the extent of other obligate intracellular bacteria, such as
Rickettsia prowazekii and
Chlamydia trachomatis. These pathogens are apparently in the final stages of host cell adaptation and have cleared most pseudogenes from their respective genomes (
2). A nonpathogenic progenitor of
C. burnetii has not been identified; however,
Coxiella-like endosymbionts of ticks are highly prevalent and may represent nonpathogenic ancestors of virulent
C. burnetii (
57).
The original NM genome annotation identified 83 pseudogenes, including those encoding transposases (
96). Using cross-genome comparisons of the four isolate genomes and pseudogene criteria described in Materials and Methods, an additional 124 NM pseudogenes were revealed (see Tables S5 and S10 in the supplemental material). These data are consistent with recent findings that bacterial pseudogenes are frequently underannotated (
79). Most new NM pseudogenes (78%) were originally annotated as ORFs encoding hypothetical proteins. The 207 total pseudogenes of NM represent 10.1% of NM ORFs (see Table S11 in the supplemental material). The K isolate has the highest percentage of pseudogenes (11.7%), while the Dugway isolate has the lowest percentage (6.6%) (see Table S11 in the supplemental material). Isolate pseudogenes are all caused by small indels or nonsense mutations, with none directly attributed to insertional disruption by an IS element. Sixty-five pseudogenes are conserved among
C. burnetii isolates (see Tables S5 and S10 in the supplemental material), representing genes likely inactivated in a common ancestor.
Similar to a scenario recently proposed for pathogenic
Francisella tularensis (
89), IS element-mediated genome rearrangements may drive pseudogene development in
C. burnetii. For example, in K, with 21 chromosomal breakpoints relative to NM, pseudogenes are enriched within 3 kb of a breakpoint (21.2%) (see Table S11 in the supplemental material). A proposed mechanism for IS element-mediated pseudogene formation is recombination between elements to result in transcriptional units that are no longer transcribed. Genes within these units then lack selective pressure and consequently accumulate mutations by genetic drift that result in their inactivation (
89). Isolates display substantial heterogeneity in pseudogenes associated with virulence, such as the ankyrin-repeat protein (Ank)-encoding genes (discussed in more detail below), a factor that likely contributes to isolates' virulence potential and other phenotypes.
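The enrichment of pseudogenes near breakpoints can be expressed as a comparison of two rates: the pseudogene fraction among genes within 3 kb of a breakpoint and the fraction across all genes. The sketch below assumes linear coordinates and a hypothetical record layout of (start, end, is_pseudogene); it is illustrative only and is not the calculation behind Table S11.

```python
from typing import List, Tuple

# Hypothetical record layout: (start, end, is_pseudogene)
Gene = Tuple[int, int, bool]


def pseudogene_rates(
    genes: List[Gene], breakpoints: List[int], window: int = 3000
) -> Tuple[float, float]:
    """Return (rate near breakpoints, genome-wide rate) of pseudogenes."""
    def near(start: int, end: int) -> bool:
        # A gene counts as near if any breakpoint lies within `window` bp
        # of the gene interval (distance 0 when the breakpoint is inside).
        return any(max(start - bp, bp - end, 0) <= window for bp in breakpoints)

    def rate(subset: List[Gene]) -> float:
        if not subset:
            return 0.0
        return sum(1 for g in subset if g[2]) / len(subset)

    near_genes = [g for g in genes if near(g[0], g[1])]
    return rate(near_genes), rate(genes)


# Hypothetical toy annotation with a single breakpoint at position 1500.
toy_genes = [
    (100, 900, True),
    (2000, 2800, False),
    (10000, 11000, False),
    (20000, 21000, True),
    (30000, 31000, False),
]
near_rate, overall_rate = pseudogene_rates(toy_genes, [1500])
print(near_rate, overall_rate)
```

A near-breakpoint rate that exceeds the genome-wide rate, as with the 21.2% versus 11.7% values reported for K, is what the text describes as enrichment.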
Phylogenetics. Several lines of evidence suggest that Dugway is more primitive than NM, K, or G. Dugway, with the largest chromosome and plasmid, appears to have undergone the least amount of genome reduction and has more unique ORFs (Tables and ; see Tables S1 and S3 in the supplemental material). The Dugway isolate also has the fewest pseudogenes (Tables and ) and the fewest IS1111A elements, a family that has multiplied extensively within other C. burnetii genomes (see Table S8 in the supplemental material). Of Dugway's 12 IS1111A elements, 9 have genomically conserved positions in at least two other isolates. Moreover, 17 of Dugway's 33 insertion elements have genomically conserved positions in all isolates, with 6 of the 11 uniquely positioned insertion elements belonging to the IS30 family, which has multiplied solely within the Dugway genome.
A multiprotein phylogenetic analysis was employed to test the hypothesis that Dugway is more primitive than NM, K, and G. Also included in this analysis were the recently completed genome sequence of Henzerling (RSA331), a human acute disease isolate, and the partially completed genome sequence of Priscilla (Q177), a goat abortion isolate (
51). Comparisons were made to the most-closely related outgroup genera
Rickettsiella (R. grylli) and
Legionella (
L. pneumophila) (
92). Bayesian analysis of 1,402 families that contained one and only one representative from each
C. burnetii isolate was conducted to gauge the vertical inheritance pattern of the genus (see Materials and Methods). While alignments of
C. burnetii protein sequences yielded a supermatrix with a very large number (425,592) of amino acid characters, only a small percentage (0.82%) were informative.
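For scale, 0.82% of 425,592 characters corresponds to roughly 3,490 informative positions. The sketch below shows one way such a fraction could be counted from an alignment. It assumes that "informative" means parsimony informative (at least two different residues, each present in at least two sequences), which the text does not state explicitly; the function name, gap symbols, and toy alignment are hypothetical.

```python
from collections import Counter
from typing import List


def informative_fraction(alignment: List[str], gap_chars: str = "-?X") -> float:
    """Fraction of alignment columns that are parsimony informative.

    A column is counted when at least two different residues each occur
    in at least two sequences; gap and ambiguity symbols are ignored.
    """
    if not alignment:
        return 0.0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if length == 0:
        return 0.0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(res for res in column if res not in gap_chars)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return informative / length


# Hypothetical toy alignment of four sequences and five columns.
toy_alignment = ["MKTAL", "MKTAL", "MRTAV", "MRSAV"]
print(informative_fraction(toy_alignment))
```

In the toy alignment only columns 2 and 5 qualify, giving 2/5 = 0.4; a singleton change such as the S in column 3 is variable but not informative under this definition.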
The tree was rooted based on a separate study of 102 diverse
Gammaproteobacteria which found the
Coxiella/
Rickettsiella/
Legionella clade robustly supported, with no intervening genera (Fig. ) (K. P. Williams, J. J. Gillespie, E. E. Snyder, J. M. Shallom, E. K. Nordberg, A. W. Dickerman, and B. W. Sobral, unpublished data). The phylogenetic relatedness of these three genera correlates with conservation of homologous genes that likely accommodate common features of their intracellular lifestyles. For example, all carry a close homolog (
e greater than −87) of Cbu0515, a major facilitator superfamily (MFS) transporter (
94). This protein may transport a vacuolar nutrient that overcomes a common auxotrophy of these bacteria (
94). Genes are also exclusively shared between
C. burnetii isolates and
L. pneumophila, such as the
enhABC cluster (Cbu1136-1138) that is implicated in macrophage invasion (
21).
For C. burnetii isolates, a consensus tree showed 100% support for each node, except for the node grouping Dugway with the K and Priscilla pair, which received 99% support (Fig. ). Because multiprotein datasets can receive exaggerated Bayesian support values and the number of informative characters was relatively low, the robustness of the tree was tested by two different resampling methods, one generating trees by a maximum likelihood program and the other by Markov chain Monte Carlo. The consensus trees from both tests reproduced the original tree topology and again gave all nodes 100% support, except for that placing the Dugway branch, which in these tests received 55 to 62% support. Based on these data, the designation “ancestral” for Dugway is not directly supported since it does not subtend all other isolates on the tree. However, “primitive” is an accurate designation for the Dugway isolate, because it has the shortest distance to the root of the tree and has the previously mentioned features that are presumed to have been lost during the pathoadaptation process of more-virulent isolates.
Comparative metabolomics. C. burnetii is metabolically complex relative to other obligate intracellular bacteria, with pathways of central carbon metabolism and bioenergetics largely intact (
96). However, some notable deficiencies exist. All
C. burnetii isolates encode a putative glucose transporter (Cbu0265), and biochemical evidence exists for conversion of glucose to pyruvate via glycolysis (
44). However, they lack a hexokinase responsible for converting glucose to glucose-6-phosphate, the first step in glycolysis. As an alternative,
C. burnetii isolates may phosphorylate glucose by a transphosphorylation reaction involving carbamoyl phosphate and a predicted inner-membrane-bound glucose-6-phosphatase (Cbu1267). A key pathway that appears inoperative is the oxidative branch of the pentose phosphate pathway. All isolates lack glucose-6-phosphate dehydrogenase and 6-phosphogluconate dehydrogenase. Thus,
C. burnetii may not rely on this pathway to replenish reducing equivalents in the form of NADPH. This biochemical deficiency could contribute to low biosynthetic capacity and the slow growth rate of
C. burnetii (
24). All
C. burnetii isolates lack the nonmevalonate (i.e., glyceraldehyde 3-phosphate-pyruvate) pathway for isoprenoid biosynthesis that is common in gram-negative bacteria. Instead, they encode the mevalonate pathway (Cbu0607, Cbu0608, Cbu0609, and Cbu0610) that is found almost exclusively in gram-positive cocci and considered horizontally acquired from a primitive eucaryote (
88,
110).
Isolate-specific gene polymorphisms are evident that may affect metabolic function. For example, isolate heterogeneity occurs within the MFS transporter family whose members transport a variety of molecules, including amino acids (
18). NM contains 13 intact transporters, including three paralog groups (Cbu0906-Cbu1162, Cbu0902-Cbu0515, and Cbu0566-Cbu2067-Cbu2068) that presumably resulted from gene duplication. All NM MFS transporter ORFs are conserved in other isolates, although some are frameshifted (e.g., Cbu0432 is frameshifted in both K and G). K, G, and Dugway have 11, 12, and 13 intact transporter genes, respectively, and share CbuD1564, which is frameshifted in NM. Interestingly, most
C. burnetii MFS transporters have homologs (
e greater than −39) in
L. pneumophila, such as
phtJ that transports valine (
18). This observation is consistent with
C. burnetii's auxotrophy for this amino acid (
96).
Gene polymorphisms in metabolic genes may also directly impact isolates' virulence potential. Clearance of
C. burnetii during acute infection requires macrophage activation by gamma interferon (
4). Among the gamma interferon-induced macrophage effector functions that limit bacterial replication is the upregulation of indoleamine-2,3-dioxygenase (IDO). This enzyme degrades
L-tryptophan to
L-kynurenine (
33), and a role for IDO in limiting
C. burnetii growth has been suggested (
13). Dugway may be less susceptible to IDO activity because, unlike NM, K, and G, it appears to be a tryptophan prototroph and capable of synthesizing the amino acid from chorismate via a putative
trp operon encoding intact
trpE (CbuD1249),
trpG (CbuD1249a),
trpD (CbuD1251),
trpC (CbuD1251), a fused
trpBF (CbuD1253), and intact
trpA (CbuD1255). Other isolates apparently lack TrpD. Fused TrpBF is present in NM and K, while G instead appears to have intact TrpF and fused TrpAB. An unlinked tryptophan operon repressor, TrpR, is present in all isolates.
Secretion systems. All
C. burnetii isolates appear capable of type I secretion while lacking prototypical proteins required for type II secretion (
20). Isolates contain a number of Pil genes that are involved in type IV pilus biogenesis and evolutionarily related to components of type II secretion systems (T2SSs) (
84). Type IV pili are important virulence factors in a number of gram-negative bacteria and act by promoting host cell adherence, twitching motility, biofilm formation, and secretion (
16,
45).
C. burnetii encodes core genes for type IV pilus biosynthesis, including
pilA (Cbu0156; major prepilin),
pilE (Cbu0412; minor prepilin),
fimT (Cbu0453; minor prepilin),
pilD (Cbu0153; peptidase/methylase),
pilB (Cbu0155; ATPase),
pilQ (Cbu1891; outer membrane secretin),
pilC (Cbu0154; multispanning transmembrane protein),
pilF (Cbu1855; uncharacterized envelope protein), and
pilN (Cbu1889; uncharacterized envelope protein) (
84). However,
C. burnetii lacks a key gene required to synthesize a functional type IV pilus as all isolates lack a homolog to the ATPase PilT that presumably acts in concert with PilB to promote the pilus assembly and disassembly required for twitching motility (
16). As recently suggested for
Francisella novicida, the incomplete repertoire of
C. burnetii type IV pilus genes may constitute a secretion system (
45). Polymorphisms are found in Pil genes of
Francisella spp. and are associated with virulence potential (
36).
C. burnetii isolates also display genetic heterogeneity in Pil genes, with apparent frameshifts in
pilN of NM,
pilC of K and G, and
pilQ of G and Dugway which disrupt the functional domains of the latter two genes.
Substrates of
L. pneumophila type II and
F. novicida type IV pili secretion systems are biased toward signal sequence-containing enzymes (e.g., peptidases, glycosylases, phospholipases, and phosphatases) (
29,
45). All
C. burnetii isolates encode abundant enzymes with predicted signal sequences including phospholipase A1 (Cbu0489), phospholipase D (Cbu0968), acid phosphatase (Cbu0335), Cu-Zn superoxide dismutase (Cbu1822), and
D-alanine-
D-alanine carboxypeptidase (Cbu1261). Isolate variation is also observed in this group of genes; for example, a gene encoding a predicted secreted chitinase (CbuD1225) is intact only in Dugway. Along with PV detoxification,
C. burnetii exoenzymes could presumably degrade macromolecules into simpler substrates that could then be transported by the organism's numerous transporters.
While
C. burnetii lacks a T3SS, it does encode a Dot/Icm T4SS homologous to that of
L. pneumophila (
97). All
C. burnetii isolates contain 23 of the 26
L. pneumophila dot/icm genes. While all isolates lack a sequence homolog of IcmR, a predicted chaperone for the pore-forming protein IcmQ (
35), they contain a functional homolog of IcmR (Cbu1634a) immediately upstream of IcmQ (Cbu1634) (
35). Dot/Icm secretion substrates that are translocated directly into the host cell cytosol are essential for the establishment of the
L. pneumophila replication vacuole (
107), and a similar scenario has been invoked for
C. burnetii (
108).
L. pneumophila translocates over 50 proteins with its Dot/Icm T4SS, and these effector proteins target a variety of host cell functions (
58,
78). With the possible exception of Cbu1063 and Cbu0414 (
58),
C. burnetii lacks homologs of these proteins, which is consistent with the pathogen's biologically distinct vacuolar niche (
95). However, using
L. pneumophila as a surrogate host and a well-established adenylate cyclase-based translocation assay, four
C. burnetii ankyrin repeat domain-containing proteins (discussed in more detail below) were recently identified as Dot/Icm substrates (
83). Finally,
C. burnetii lacks autotransporter proteins indicative of type V secretion (
50) and a newly described gram-negative T6SS (
11).
Eucaryotic-like proteins. A common property of bacterial virulence factors is their ability to functionally mimic the activity of host cell proteins (
98). For example, it is clear that many predicted and documented T2SS and T4SS substrates of
L. pneumophila are most similar to eucaryotic proteins and/or contain eucaryotic-like domains and were likely acquired via interdomain horizontal gene transfer (
14,
29,
30). Similar to
L. pneumophila, C. burnetii isolates collectively encode multiple eucaryotic-like proteins predicted to modulate host cell functions (Table 3).
| TABLE 3. Eucaryotic-like ORFs of C. burnetii isolates |
C. burnetii isolates encode two eucaryotic-like sterol reductases. One reductase (Cbu1206), annotated as a

-sterol reductase, displays the highest overall similarity to a eucaryotic protein, with no matches to procaryotic proteins. The other reductase (Cbu1158), annotated as a sterol delta-7-reductase, is most similar to a reductase of “
Candidatus Protochlamydia amoebophila” UWE25, a
Parachlamydia-related obligate endosymbiont of free-living amoebae (
e greater than −180) (
52). Cbu1158 has no additional matches to procaryotic proteins, with the next highest identity to a reductase from
Arabidopsis thaliana (
e greater than −135). While all
C. burnetii isolates encode orthologs of Cbu1206, Cbu1158 is frameshifted in G. De novo synthesis of cholesterol or ergosterol by
C. burnetii is improbable as the organism lacks the terminal enzymes of these pathways. Alternative scenarios include modification of a host cholesterol intermediate that could serve as a sterol-based signaling molecule or structural component of the PV membrane. Indeed,
C. burnetii's infectious cycle is severely disrupted by pharmacological agents that disrupt host cell cholesterol metabolism (
53). The maintenance of a sterol delta-7-reductase in modern-day
Protochlamydia and
Coxiella suggests that the enzyme functions similarly in some key aspect of the host-parasite relationship, a hypothesis supported by the observation that vacuoles harboring
Parachlamydia acanthamoebae in human macrophages are superficially similar to the
C. burnetii PV in displaying endolysosomal characteristics (e.g., acidic and LAMP-1 positive) (
40).
Eucaryotic domains identified in
C. burnetii proteins include ankyrin repeats, F boxes, serine/threonine protein kinases (STPK), tetratricopeptide repeats (TPR), leucine-rich repeats (LRR), and coiled-coil domains (CCD).
C. burnetii isolates collectively encode 15 ankyrin repeat domain-containing proteins (Anks), although this protein family shows considerable heterogeneity among isolates in terms of frameshifting and truncation. Anks typically contain at least two tandem 33-residue ankyrin repeat motifs but can contain up to 34 repeats (
74). Anks mediate protein-protein interactions that influence a variety of cellular processes, including transcription, endocytosis, and cytoskeletal rearrangements (
74). The Dugway isolate encodes 11 full-length Anks, while the NM isolate encodes only 5. Four intact Ank genes (
ankC, -
F, -
G, and -
K) are conserved between the 4 isolates. Intact versions of
ankD, -
H, and -
O are found only in Dugway, with
ankO unique to this isolate. Intact versions of
ankN are found only in K and G, and
ankB, -
J, and -
L appear to be disrupted in all isolates. Both
L. pneumophila and
C. burnetii Anks are translocated into the host cytosol by a Dot/Icm-dependent mechanism (
83). Of interest is the transcription and translocation of the C-terminal portion of frameshifted NM AnkB (Cbu0145) (
83), suggesting that this and other disrupted effectors may still be functional.
Modulation of eucaryotic ubiquitin signaling pathways is an emerging theme in bacterial pathogenesis. Indeed, many bacterial F-box proteins are thought to possess ubiquitin ligase activity (
5).
C. burnetii isolates collectively encode three proteins (CbuA0014, Cbu0355, and Cbu0814) with predicted F boxes, a finding also made in a previous bioinformatic screen (
5). This ~50-amino-acid domain is typically N-terminally located and involved in ubiquitination processes that target proteins for degradation by the proteasome (
56). Moreover, bacterial F-box-containing proteins are known substrates of T3SS and T4SS (
5). Consistent with other F-box proteins, Cbu0355 and Cbu0814 contain additional C-terminal motifs involved in protein-protein interactions (
56) in the form of ankyrin repeats and regulator of chromatin condensation (RCC) domains, respectively. The F-box domain comprises the majority of CbuA0014, which is only 77 amino acids long. An additional
C. burnetii protein that is potentially ubiquitin-related is Cbu1217, with Hect-like E3 ubiquitin ligase domain similarity in its N terminus and multiple C-terminal RCC domains. Like the Anks, F-box proteins display considerable heterogeneity among
C. burnetii isolates, with apparently full-length Cbu0814 and Cbu0355 present only in K and Dugway, respectively, and CbuA0014 specific to the QpH1 plasmid of NM. Moreover, Cbu1217 appears to be full-length in NM and K but frameshifted in G and Dugway.
C. burnetii isolates collectively encode three eucaryotic-like domain proteins with similarity to STPKs (Cbu0175, Cbu1168, and Cbu1379) that may directly impact host cell signal transduction.
Mycobacterium tuberculosis secretes an STPK that is critical for the generation of its replication vacuole in macrophages (
109), and a similar scenario may be associated with
C. burnetii infection. Again, variation is observed in this family of proteins, with Cbu1168 orthologs apparently full-length only in Dugway and Cbu1379 full-length only in K.
The TPR is a 34-amino-acid motif, with the Sel-1-type TPR (SLR) displaying a variable consensus length of 36 to 44 amino acids (
68). TPR/SLR repeats are arranged in tandem arrays and form antiparallel α-helices that promote folding of proteins into a solenoid tertiary structure (
68). Proteins of this nature are frequently involved in signal transduction pathways, and the working model for TPR/SLR-containing proteins is that they function as adaptor proteins in building signaling complexes (
68). Together,
C. burnetii isolates encode seven TPR and four SLR proteins. As with the Ank proteins, Dugway encodes the most full-length TPR/SLR proteins, with two unique chromosomal TPR proteins and one unique QpDG plasmid SLR protein. Interestingly,
L. pneumophila encodes three annotated SLR proteins (EnhC, LidL, and LpnE) that all appear to function in the early stages of pathogen uptake to establish the organism's vacuolar replicative niche (
21,
25,
64,
75). As discussed earlier, only
C. burnetii and
L. pneumophila encode EnhC, containing 21 and 18 SLRs, respectively, suggesting that the protein was acquired from a common source to mediate replication vacuole biogenesis. EnhC is conserved among all
C. burnetii isolates, although the Dugway version has a 34-amino-acid extension at the C terminus.
C. burnetii encodes one LRR protein (Cbu0820) that appears to be full-length only in K and Dugway.
C. burnetii isolates also encode numerous hypothetical proteins with predicted CCDs (
P > 85%) (data not shown), a structure that consists of interacting heptad α-helices (
15). Of particular relevance to
C. burnetii is the prevalence of these domains in SNARE (soluble
N-ethylmaleimide-sensitive factor attachment protein receptor) proteins that control vesicular fusion (
15). Continuous fusion between the
C. burnetii PV and endolysosomal/autophagosomal compartments is considered necessary for PV biogenesis and maintenance (
108), and it is logical to suspect that the organism secretes a CCD protein(s) that modulates host regulators of these processes.
Because most genes encoding eucaryotic-like proteins are conserved, as intact genes or pseudogenes, between
C. burnetii isolates, they were likely present in a common ancestor of isolate lineages. Given that
C. burnetii lacks a system for conjugal gene transfer, interdomain transfer of at least some of these genes to an ancestral
Coxiella organism may have occurred via two horizontal gene transfer events, the first occurring between a eucaryote and an intracellular bacterium with gene transfer capability that secondarily transferred the gene to the ancestral
Coxiella organism (
26). For example,
C. burnetii may have acquired its sterol delta-7-reductase from “
Ca. Protochlamydia amoebophila” UWE25, which encodes a potentially functional F-like DNA transfer system (
39), after this amoebal symbiont, or ancestor with an expanded host range, acquired the enzyme via interdomain transfer with a eucaryotic host. Supporting the latter scenario is the observation that
C. burnetii trpA, -
B, and -
C, which are tightly linked to the gene encoding the sterol delta-7-reductase, show unusually high degrees of identity with their counterparts in pathogenic chlamydiae. Pathogenic chlamydiae and “
Ca. Protochlamydia amoebophila” UWE25 share a common ancestor (
52); however, only the former encodes Trp biosynthesis genes. Thus,
C. burnetii may have coincidently acquired Cbu1158 and Trp genes in a single horizontal gene transfer event that occurred with the common chlamydial ancestor and not “
Ca. Protochlamydia amoebophila” UWE25.
Free-living amoeba-like single-cell protozoa have been proposed to serve as bacterial “melting pots” where promiscuous horizontal gene exchange can occur between internalized bacteria (
80). This process has recently been proposed for
Rickettsia bellii, whose genome contains a disproportionate number of genes from amoebal parasites, including
L. pneumophila and “
Ca. Protochlamydia amoebophila” UWE25 (
80). Aided by lateral gene transfers, amoebae are furthermore speculated to serve as evolutionary “training grounds” where ancestral amoeba-associated bacteria evolved to become pathogens of multicellular eucaryotes, a prototypic example being
L. pneumophila (
69). Laboratory studies show resistance of
C. burnetii to destruction by the free-living amoeba
Acanthamoeba castellanii (
59); however, a niche for
C. burnetii in environmental amoebae has not been demonstrated (
41).
Signal transduction and gene regulation. Relative to gram-negative facultative intracellular bacteria,
C. burnetii has a paucity of potential two-component regulatory systems. This likely reflects a stable environmental intracellular niche (
73) and is observed in most (
3,
100) but not all (
19) obligate intracellular bacteria.
C. burnetii encodes only four obvious two-component systems: PhoB-PhoR (Cbu0367-Cbu0366), QseB-QseC (Cbu1227-Cbu1228), GacA-GacS (LemA) (four potential response regulators and Cbu0760), and an unclassified response regulator-RstB-like system (Cbu2005-Cbu2006). Four CsrS-like sensory kinases are also present in isolates, with Cbu0634 truncated in NM and K. The stimulus of RstB is unknown, while PhoB-PhoR senses phosphate in
Escherichia coli (
7). GacA-GacS regulates the production of multiple virulence factors in gram-negative bacteria (
46). Moreover, the activation of GacA-GacS homologs in
L. pneumophila (LetA-LetS) during stationary phase derepresses the activity of the mRNA binding protein CsrA and results in pathogen differentiation to a stress-resistant transmission phase (
70,
71). In
L. pneumophila, nutrient limitation results in production of the alarmone ppGpp by SpoT and RelA that, in addition to LetA-LetS, activates the stationary-phase sigma factor RpoS, which can also upregulate transmission-phase genes (
70,
71). It is conceivable that a
C. burnetii GacA-GacS pair functions similarly to
L. pneumophila LetA-LetS. The
C. burnetii SCV developmental form is biologically reminiscent of the
L. pneumophila transmission phase, and the conservation in all
C. burnetii isolates of
spoT (Cbu0303),
relA (Cbu1375),
rpoS (Cbu1609), and
csrA (Cbu0024 and Cbu1050) suggests similar roles for these genes in
C. burnetii biphasic development. Other developmentally regulated genes, such as
hcbA and
scvA, that encode SCV-specific DNA binding proteins (
47,
48), are also conserved in all isolates. The sensor kinase QseC has recently been described as a bacterial adrenergic receptor that recognizes bacterial autoinducers and the eucaryotic hormones epinephrine/norepinephrine (
22). Interestingly,
C. burnetii QseB-QseC has also been classified as a PmrA-PmrB-type two-component system (
113). In
Salmonella enterica, PmrA-PmrB acts coordinately with PhoP-PhoQ to regulate resistance to cationic peptides and Fe³⁺ and is activated by submillimolar Fe³⁺ and low pH (~5.8) (
85). Moreover, PmrA has been shown directly and indirectly to regulate Dot/Icm type IV secretion in
L. pneumophila and
C. burnetii, respectively (
113). CpxA-CpxR, another two-component regulator of the
L. pneumophila Dot/Icm T4SS (
34), is lacking in
C. burnetii.
In summary, the four-way genome comparison in this report provides a comprehensive view of C. burnetii's genome architecture and gene content. Evidence of interdomain horizontal gene transfer highlights C. burnetii's obligate relationship with a eucaryotic host. Gene loss in the form of pseudogene formation appears to be the major source of genomic diversity among C. burnetii isolates, an evolutionary process facilitated by IS element-mediated chromosomal rearrangements. Thus, isolate-specific repertoires of pseudogenes, such as those in the Ank gene family, may impact isolates' virulence potential. The observation that Dugway has the largest genome with the fewest pseudogenes suggests that this lineage is the least pathoadapted, a hypothesis that is consistent with the lack of human disease isolates in Dugway's genomic group and the isolate's attenuated virulence in animal models of Q fever. The pathogenetic correlates of disease potential described in this report provide the foundation for testable hypotheses related to gene function and C. burnetii virulence.