The facultative intracellular bacterial pathogen Brucella infects a wide range of warm-blooded land and marine vertebrates and causes brucellosis. Currently, there are nine recognized Brucella species based on host preferences and phenotypic differences. The availability of 10 different genomes consisting of two chromosomes and representing six of the species allowed for a detailed comparison among themselves and relatives in the order Rhizobiales. Phylogenomic analysis of ortholog families shows limited divergence but distinct radiations, producing four clades as follows: Brucella abortus-Brucella melitensis, Brucella suis-Brucella canis, Brucella ovis, and Brucella ceti. In addition, Brucella phylogeny does not appear to reflect the phylogeny of Brucella species' preferred hosts. About 4.6% of protein-coding genes seem to be pseudogenes, which is a relatively large fraction. Only B. suis 1330 appears to have an intact β-ketoadipate pathway, responsible for utilization of plant-derived compounds. In contrast, this pathway in the other species is highly pseudogenized and consistent with the “domino theory” of gene death. There are distinct shared anomalous regions (SARs) found in both chromosomes as the result of horizontal gene transfer unique to Brucella and not shared with its closest relative Ochrobactrum, a soil bacterium, suggesting their acquisition occurred in spite of a predominantly intracellular lifestyle. In particular, SAR 2-5 appears to have been acquired by Brucella after it became intracellular. The SARs contain many genes, including those involved in O-polysaccharide synthesis and type IV secretion, which if mutated or absent significantly affect the ability of Brucella to survive intracellularly in the infected host.
Brucellosis is a disease caused by bacteria of the genus Brucella. This disease is zoonotic and endemic in many areas throughout the world, causing chronic infections with common outcomes being abortion and sterility in infected animals. In humans, it is a severe acute febrile disease, producing focal lesions in bones, joints, the genitourinary tract, and other organs. Complications may include arthritis, sacroiliitis, spondylitis, and central nervous system effects. Brucella can cause abortions in women (as can other bacteria), mostly in the first and second trimesters of pregnancy (21, 27), and men can exhibit epididymo-orchitis (37).
Currently, there are nine recognized species of Brucella, based on host preferences and phenotypic differences. Six classically recognized species are Brucella abortus (cattle), Brucella canis (dogs), Brucella melitensis (sheep and goats), Brucella neotomae (desert wood rats), Brucella ovis (sheep), and Brucella suis (pigs, reindeer, and hares). These six species have been subdivided into 18 biovars based on a panel of culture and biochemical characteristics (41). Recently, three additional species have been identified, namely Brucella microti from voles (49), “Brucella pinnipediae” from pinnipeds, and Brucella ceti from cetaceans (20).
The genome from B. melitensis was the first to be sequenced (16), followed by those from strains of B. suis and B. abortus (9, 11, 24, 44). New genome sequences for B. canis, B. ceti, B. melitensis, and B. suis, as well as the recent release of the B. ovis genome, allow a more detailed look into this group. Furthermore, the increasing number of genomes for Brucella relatives from the order Rhizobiales allows examination of this genus in a broader context.
The main objectives of this study were to examine the phylogeny of Brucella, to examine differences among the different genomes and clades, and to do a detailed comparison between the Brucella genomes and those of their closest relatives in Rhizobiales. Techniques used to examine these differences included structural analysis of the Brucella chromosomes, an in-depth study of areas of possible horizontal transfer into the Brucella genomes, and a comparison of known genes and pseudogenes present in other Brucella genomes that correspond to them.
Ten different strains from six of the Brucella species were used in this comparison. Three strains with complete genomes (B. canis ATCC 23365, B. melitensis ATCC 23457 [bv. 2], and B. suis ATCC 23445 [bv. 2]) were sequenced by Los Alamos National Labs and the Joint Genome Institute. They also sequenced B. ceti, which has an incomplete genome with seven contigs. All were given their primary annotation by PATRIC, which is the NIAID/PathoSystems Resource Integration Center, a major repository for Brucella genomic data (51). Six additional strains that had been annotated previously (B. abortus S19, B. abortus bv. 1 strain 9-941, B. melitensis 16 M, B. abortus 2308, B. ovis ATCC 25840, and B. suis 1330) were reannotated by PATRIC prior to the comparison to ensure uniformity.
Chromosomal DNA sequences from nine Brucella genomes (all except B. ceti) were aligned using Mauve 2.2.0 (14).
We used OrthoMCL (32) to create groups of orthologous proteins. To create a representative set of ortholog groups (OGs) for the order Rhizobiales, 37 complete or nearly complete genomes were used (Table 1), incorporating 8 of the 11 families in the order.
In this study, a pseudogene is defined as a gene containing one or more in-frame stop codons and/or frameshifts (FS) compared to those of its orthologs. Three methods were used to identify potential pseudogenes within Brucella. The first method was based on the program GenVar, an analytical pipeline used to examine closely related species or strains and identify missed gene calls as well as split genes or indels (62). The second method aligns neighboring pairs of protein predictions using BLASTP (3) against the National Center for Biotechnology Information (NCBI) nonredundant protein database. Neighbors with alignments to the same target sequence with an E value of <10^-5 were further evaluated by manual curation. If the pseudogene prediction from either method proved to be correct upon manual examination, the original gene and coding sequence (CDS) features were deleted, and a new gene feature spanning both gene predictions was created and marked with the pseudogene qualifier.
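The neighbor-pair screen used by the second method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `candidate_split_genes`, the input layout (an ordered gene list plus a per-gene dictionary of BLASTP targets and E values), and all identifiers are hypothetical. Only the E value cutoff of 10^-5 comes from the text, and every candidate would still require manual curation.

```python
from typing import Dict, List, Tuple

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text


def candidate_split_genes(
    gene_order: List[str],
    hits: Dict[str, Dict[str, float]],
    cutoff: float = E_VALUE_CUTOFF,
) -> List[Tuple[str, str, str]]:
    """Flag adjacent gene predictions that both align to the same target.

    gene_order: gene identifiers in chromosomal order (hypothetical layout).
    hits: gene identifier -> {target protein identifier: E value}.
    Returns (gene_a, gene_b, shared_target) tuples for manual review.
    """
    candidates = []
    for gene_a, gene_b in zip(gene_order, gene_order[1:]):
        # Keep only targets whose E value passes the cutoff for each neighbor.
        targets_a = {t for t, e in hits.get(gene_a, {}).items() if e < cutoff}
        targets_b = {t for t, e in hits.get(gene_b, {}).items() if e < cutoff}
        for target in sorted(targets_a & targets_b):
            candidates.append((gene_a, gene_b, target))
    return candidates
```

A pair is reported only when both neighbors pass the cutoff against the same target, which is the pattern expected when a frameshift or in-frame stop codon has split one gene into two predictions.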
Once a first set of pseudogenes was identified by the above-described two methods, a third method was used to identify additional pseudogenes based on the first set. The DNA sequences of pseudogenes in the first set were first aligned to the bacterial subdivision of NCBI's nonredundant protein database using BLASTX and subjected to cutoffs of 165 bits and an E value of 10^-9 or, to ensure alignments to very short pseudogenes are not missed, greater than 85% identity (at the protein level) for 50% of the query length. For each pseudogene, the protein sequence with the highest-scoring alignment (by bit score) was retrieved for use in the next step. These retrieved protein sequences were used as queries in a TBLASTN search of the nine Brucella genomes to identify new genes or pseudogenes by orthology. The resulting alignments were processed to merge overlapping or nearby (within 30 bp) high-scoring segment pairs to form meta-alignments to determine the approximate coordinates of the new (pseudo)gene. To identify its endpoints more precisely and determine the number of FS and in-frame stop (nonsense) codons relative to the functional homolog used as a query, the program estwise from the Wise2.0 package (6) was used to generate an alignment spanning the FS and nonsense features. The command line option “-alg 333” was used to select the simplest FS-tolerant alignment algorithm instead of using the hidden Markov model, with states for intron identification (which is enabled by default and intended for processing eukaryotic sequences). Note that this third method in effect also computes groups of genes related by similarity. While our primary method for computing OGs was OrthoMCL, as noted above, we used the method described here to identify pseudogenes that are “genome specific.” A pseudogene is genome specific if it is the only pseudogene in a gene similarity group containing at least one other member.
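Two steps of the third method lend themselves to a short sketch: merging high-scoring segment pairs (HSPs) into meta-alignments, and applying the genome-specific rule. The code below is illustrative only. The function names, the tuple layouts, and the choice to treat "within 30 bp" as an inclusive gap are assumptions, and the HSPs are assumed to lie on the same replicon and strand.

```python
from typing import Dict, List, Tuple

MAX_GAP_BP = 30  # "within 30 bp" from the text; inclusive bound is an assumption


def merge_hsps(
    hsps: List[Tuple[int, int]], max_gap: int = MAX_GAP_BP
) -> List[Tuple[int, int]]:
    """Merge overlapping or nearby (start, end) intervals into meta-alignments."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hsps):
        if merged and start - merged[-1][1] <= max_gap:
            # Overlapping or close enough: extend the previous meta-alignment.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def genome_specific_pseudogenes(
    groups: Dict[str, List[Tuple[str, bool]]]
) -> List[Tuple[str, str]]:
    """Return (group, genome) where the group has exactly one pseudogene
    and at least one other member.

    groups: similarity group -> list of (genome, is_pseudogene) members.
    """
    result = []
    for group_id, members in groups.items():
        pseudo = [genome for genome, is_pseudo in members if is_pseudo]
        if len(members) >= 2 and len(pseudo) == 1:
            result.append((group_id, pseudo[0]))
    return result
```

For example, `merge_hsps([(100, 200), (220, 300), (400, 500)])` joins the first two intervals (gap of 20 bp) and leaves the third separate, giving `[(100, 300), (400, 500)]`.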
We employed Alien Hunter (AH) (59), a program that identifies regions that may have been laterally transferred. These are regions that have unusual sequence composition in terms of k-mers for various values of k (called interpolated variable order motifs in the terminology used in reference 59). An anomalous region is one whose AH score is above a genome-dependent and automatically calculated threshold that takes into account the sequence composition of the whole genome (termed background composition). AH was run on all 10 Brucella genomes. We called the regions identified by AH anomalous regions.
Because AH has been noted to have low specificity (29), we applied additional filters to the regions detected by AH. Anomalous regions that contained syntenic protein-coding genes in different Brucella genomes as given by OrthoMCL ortholog data and double checked by BLAST2seq (56) were labeled shared anomalous regions (SARs). We then compared the SARs obtained to those of the Ochrobactrum anthropi genome using MUMmer, option PROmer, which compares translated nucleotide sequences in all six frames (28). Using a SAR as the query and the whole genome of O. anthropi as the subject, we computed the coverage of that SAR in the O. anthropi genome by adding up the total length of all matches found by PROmer, regardless of their location in the O. anthropi genome, and dividing the result by the SAR length. Note that this approach is conservative, because matches found by PROmer may be disjointed and therefore may not correspond to a contiguous region in the O. anthropi genome (as would be expected if O. anthropi did in fact share that region). SARs that were absent or less than 50% complete in O. anthropi were selected for further analysis. Finally, SARs were cross-referenced with previously published studies. Several of these interrupt a tRNA gene and were originally named (36) to designate the size of the region in kilobases and the tRNA identity (e.g., 8T is an 8-kb region that interrupts a tRNA that codes for a threonine). SARs are labeled by chromosome and region order within the chromosome (e.g., SAR 1-8 is the eighth shared anomalous region on chromosome 1).
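The coverage filter described above reduces to a simple ratio, sketched here. The function names and input layout are hypothetical, and the match lengths are assumed to have been extracted from PROmer output beforehand (no PROmer parsing is shown). As in the text, match lengths are summed regardless of their location, so overlapping matches are not de-duplicated.

```python
from typing import Dict, List

COVERAGE_THRESHOLD = 0.5  # SARs below 50% coverage were kept, per the text


def sar_coverage(sar_length: int, match_lengths: List[int]) -> float:
    """Total matched length divided by the SAR length."""
    if sar_length <= 0:
        raise ValueError("sar_length must be positive")
    return sum(match_lengths) / sar_length


def select_sars(
    sar_lengths: Dict[str, int],
    matches: Dict[str, List[int]],
    threshold: float = COVERAGE_THRESHOLD,
) -> List[str]:
    """Return SARs that are absent or less than `threshold` covered
    in the comparison genome."""
    selected = []
    for sar, length in sar_lengths.items():
        # A SAR with no recorded matches has coverage 0 and is treated as absent.
        coverage = sar_coverage(length, matches.get(sar, []))
        if coverage < threshold:
            selected.append(sar)
    return selected
```

Because the summed matches need not be contiguous in the subject genome, this ratio tends to overestimate how much of a SAR is truly shared, which is why the text describes the approach as conservative.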
Protein sequences for the 10 Brucella genomes and four outgroup species (Ochrobactrum intermedium , O. anthropi ATCC 49188, Bartonella quintana Toulouse, and Mesorhizobium loti MAFF 303099) were clustered by applying OrthoMCL (32) to all-versus-all BLAST data, yielding 2,246 protein families with one and only one representative from each Brucella genome. Each protein family was made representative for the outgroup strains by excluding strains with more than one member in the family, leaving O. anthropi represented in 1,970 families, O. intermedium in 1,924, B. quintana in 851, and M. loti in 1,699. The protein sequences from each family were aligned using MUSCLE (18), and ambiguous portions of the alignment were removed using Gblocks (8). The concatenation of these alignments contained 671,030 amino acid characters, though only 8,004 were Brucella informative (for which at least two Brucella genomes differed from the others or one Brucella genome differed from the others and an outgroup was present). RAxML (53) was used with the PROTGAMMAWAGF model to prepare a maximum likelihood tree and in its quick mode to prepare 100 bootstrap trees.
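The family-selection rule in this paragraph (exactly one member from every Brucella genome, with an outgroup strain dropped from a family if it contributes more than one member) can be sketched as below. The function name and the `(genome, protein)` tuple layout are assumptions made for illustration; the clustering itself (OrthoMCL on all-versus-all BLAST data) is not reproduced.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def single_copy_family(
    members: List[Tuple[str, str]],
    ingroup: List[str],
    outgroup: List[str],
) -> Optional[Dict[str, str]]:
    """Return genome -> protein for a usable family, or None if rejected.

    members: (genome, protein identifier) pairs in one protein family.
    ingroup: genomes that must each contribute exactly one protein.
    outgroup: genomes kept only when they contribute exactly one protein.
    """
    counts = Counter(genome for genome, _ in members)
    # Reject the family unless every ingroup genome has one and only one member.
    if any(counts.get(genome, 0) != 1 for genome in ingroup):
        return None
    # Outgroup strains with zero or multiple members are left out of the family.
    keep = set(ingroup) | {g for g in outgroup if counts.get(g, 0) == 1}
    return {genome: protein for genome, protein in members if genome in keep}
```

Under this rule, an outgroup strain is represented in fewer families than the ingroup, consistent with the differing family counts reported for the four outgroup strains.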
All nine Brucella genomes studied have two circular chromosomes. Chromosome 1 is the larger chromosome, with a median length of 2.1 Mb, and chromosome 2 has a median length of 1.2 Mb. Both have similar G+C content, averaging 57.1% for chromosome 1 and 57.3% for chromosome 2. The total number of genes per genome (about 3,460) is very similar among the nine complete genomes studied, as is the number of protein-coding genes (about 3,180). These results are summarized in Table 2 on a per-genome and per-chromosome basis.
Multiple replicon alignments were done for 9 of the 10 genomes. (B. ceti was excluded because it is an unfinished genome.) Chromosome 1 is similarly arranged among all nine genomes, with the only major difference being the B. suis ATCC 23445 genome (Fig. 1). Examination of both chromosomes of this species indicated that a 210-kb segment of chromosome 1 has been translocated to chromosome 2. Chromosome 2 appears to be more plastic than chromosome 1, with more internal rearrangements. A segment of approximately 700 kb in chromosome 2 is a shared inversion among the three B. abortus genomes (Fig. 1), with respect to the others.
The results of a maximum likelihood phylogenetic analysis of the 10 Brucella strains plus four outgroup species are shown in Fig. 2. This tree sorts the Brucella genomes studied here into four clades, as follows: (i) the B. melitensis-B. abortus clade; (ii) the B. ovis clade; (iii) the B. suis-B. canis clade; and (iv) the B. ceti clade. Each node received 100% bootstrap support except for two extremely short internal branches. Although the tree is nominally bifurcating, the shortness and suboptimal support of those two branches suggest caution in assigning a strict evolutionary branching order to the four Brucella clades; they appear to have radiated explosively. The generated tree (Fig. 2A) also shows, as expected, that Ochrobactrum is the closest relative to Brucella (48).
We identified an average of 40 anomalous regions in the Brucella strains (range, 32 to 51 regions). Chromosome 1 had an average of 17.4 regions (range, 13 to 21 regions), and chromosome 2 had an average of 23 (range, 14 to 38 regions). This variation is explained in part by variation in the genome-specific threshold score determined by AH, which was the main reason that led us to adopt the concept of the SAR. Seventeen SARs were absent or nearly absent in O. anthropi and were examined further (Table 3). These SARs ranged in size from 2 to 19 kb, with SAR 1-12 being the smallest and SAR 1-17 being the largest (see Table S1 in the supplemental material). Four of the 17 SARs showed the hallmark pattern of genomic islands, flanked on one side by an intact tRNA gene and on the other side by a fragment of that tRNA gene. Three of these, 8T (SAR 1-2), 15G (SAR 1-7), and 2I (SAR 1-12), have been described previously in chromosome 1 (36), and we identified a novel genomic island; SAR 2-10 is found in chromosome 2 and is 14 kb in length. This island is integrated into a tRNA-Thr and contains a type I restriction-modification system. For several additional SARs with tRNA gene neighbors, no tRNA fragment was identified at the other end (SARs 1-3, 1-5, 1-6, 1-8, 1-14, 1-16, 2-7, and 2-11). These may be older genomic islands that have lost the tRNA fragment, or the association with a tRNA gene may be accidental. It was more difficult to assign the endpoints of these SARs; the end of the sequence of the last shared ortholog identified by AH was used. Complete information on the 17 SARs examined, including the genes carried by them, is provided (see Table S1 in the supplemental material).
The translocation of a 210-kb segment from chromosome 1 to chromosome 2 in B. suis ATCC 23445 also moved SAR 1-16, and the inversion on chromosome 2 in the B. abortus genomes inverted SARs 2-8 and 2-10 (Fig. 1). Other SARs of interest that were not near tRNA genes include 1-17, 2-1, 2-4, 2-8, and the previously described IncP island (30), corresponding to SAR 2-5. Interestingly, SAR 1-2 (8T) contains a three-gene segment that is also found in SAR 1-8, including a resolvase family site-specific recombinase. Either these genes entered Brucella twice independently or there was an insertion into one of the sites from either SAR 1-2 or SAR 1-8. SAR 2-5 is also interesting, as it had been noted previously that this region, the IncP island, was found only in B. suis, B. canis, B. neotomae, and in some of the marine strains (30). In this study, we found SAR 2-5 in B. ceti, but as noted previously, it is missing from B. ovis and from all the B. abortus and B. melitensis genomes.
SAR 1-7, first identified as 15G by Mantri and Williams (36) and later examined experimentally (45, 46), contains 15 genes, 2 of which (wboA and wboB) are of particular interest, since they help determine the smooth phenotype (see below).
We obtained 15,986 OGs from 37 Rhizobiales genomes (this number does not include singleton proteins that failed to group with others). Within the genus Brucella, there were 747 OGs that contained any combination of the 10 Brucella genomes but none of the other Rhizobiales genomes. Of these, 140 OGs had at least one representative from each of the 10 Brucella genomes (see Table S2 in the supplemental material). Using this set of 747 OGs, we identified a region that is found in all the Brucella genomes except for the three B. abortus genomes. This 23-kb segment contains a number of important genes, including those encoding glycosyl transferase and glycerol kinase (Table 4); this region was not identified as anomalous. It should be noted that the glycosyl transferase and glycerol kinases are the second copies of these genes. The B. abortus genomes have only a single copy of each gene.
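The two OG categories used here (groups restricted to Brucella, and the subset present in every Brucella genome) amount to set comparisons on the genome membership of each group. The sketch below assumes a hypothetical mapping from OG identifier to the set of genomes represented in it; the function name and identifiers are illustrative and do not reflect the OrthoMCL output format.

```python
from typing import Dict, Set, Tuple


def classify_ogs(
    og_genomes: Dict[str, Set[str]], brucella: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Split ortholog groups by genome membership.

    Returns (restricted, core):
      restricted: OGs whose members all come from Brucella genomes.
      core: restricted OGs with at least one member from every Brucella genome.
    """
    restricted = {
        og for og, genomes in og_genomes.items()
        if genomes and genomes <= brucella  # non-empty subset of the genus
    }
    core = {og for og in restricted if og_genomes[og] == brucella}
    return restricted, core
```

A restricted OG that is missing from a consistent subset of genomes, such as the three B. abortus genomes, is the kind of pattern that pointed to the 23-kb segment described above.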
A single protein representative from each of the 747 OGs was used to query the NCBI nonredundant protein database. Of these, 688 OGs had no BLASTP hits to any genome other than Brucella, accounting for 21.5% of all Brucella proteins (see Table S3 in the supplemental material). The majority of these Brucella-specific OGs are annotated as hypothetical proteins; over 50% have either 9 or all 10 of the genomes represented. There were 59 OGs with BLASTP hits (E value cutoff, >10^-10) for Brucella and for genomes outside of Rhizobiales (see Table S4 in the supplemental material), indicating that the proteins with the nearest homology are not present in the closest relatives of Brucella; this is a small percentage (1.7%) of the Brucella proteins.
Notable among these 59 OGs are the components of the type IV secretion system (maps to SAR 2-1), tra genes (map to SAR 2-5), and the wbk gene cluster (maps to SAR 1-3), which was previously identified (22).
In a comparison of the AH and ortholog/BLASTP data, the observation that one of the regions contained a housekeeping gene led to the identification of SAR 1-17 as a composite. It contains a five-gene region shared with O. anthropi but flanked on both sides by genes unique to Brucella. Figure 3 shows an annotation of the 17 SARs in the B. suis 1330 genome.
A total of 1,396 pseudogenes were identified (this analysis excludes the unfinished B. ceti genome). Of these, 222 were found to be genome specific. Many such genome-specific pseudogenes may simply be the result of a sequencing error. The other identified pseudogenes are members of 522 OGs (see Table S5 in the supplemental material). The ratio of pseudogenes to total genes carried by a genome (the pseudogene fraction) was used as a benchmark for comparison between organisms. Values ranged from a low of 3.9% for B. melitensis 16 M to a high of 7.3% for B. ovis. The average value for the nine complete Brucella genomes was 4.6%. The highest number of genome-specific pseudogenes is found in B. ovis, with 107 (Table 2). The next highest number is found in B. suis ATCC 23445, which has 34, followed by B. canis, with 22. Pseudogene fractions were also calculated on a per-chromosome basis. Chromosome 2 had a higher percentage of pseudogenes than chromosome 1 for all nine genomes studied; on average, the pseudogene fraction was 3.9% for chromosome 1 and 6.0% for chromosome 2.
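The pseudogene fraction defined above is a plain ratio, and a small helper makes the per-genome and per-chromosome versions explicit. The function name is hypothetical, and the counts in the usage comment are illustrative placeholders, not values from Table 2.

```python
def pseudogene_fraction(n_pseudogenes: int, n_total_genes: int) -> float:
    """Ratio of pseudogenes to total genes for a genome or a single chromosome."""
    if n_total_genes <= 0:
        raise ValueError("n_total_genes must be positive")
    if not 0 <= n_pseudogenes <= n_total_genes:
        raise ValueError("n_pseudogenes must be between 0 and n_total_genes")
    return n_pseudogenes / n_total_genes


# Illustrative counts only (not taken from the paper's tables):
# 173 pseudogenes out of 3,460 genes gives a fraction of 0.05, i.e. 5.0%.
example = pseudogene_fraction(173, 3460)
```

The same function applies per chromosome if the counts passed in are restricted to the genes annotated on that replicon.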
In the initial analysis of the B. suis 1330 genome sequence, Paulsen et al. (44) noted an unexpected capacity of this organism to use plant-derived compounds as an energy source. The β-ketoadipate pathway takes two aromatic compounds, protocatechuate and catechol, which are produced by the degradation of plant-derived molecules, and metabolizes them to intermediates that can enter the tricarboxylic acid cycle (34). There are 12 protein-coding genes that have been identified as being part of this pathway in B. suis 1330 (44); all of them are found on chromosome 2. In the case of Agrobacterium tumefaciens C58, the enzymes involved in this pathway are organized into two distinct operons (43); Brucella seems to have a similar arrangement, as do both Ochrobactrum genomes. Examination of all 10 Brucella genomes showed that at least 1 of the 12 genes carried by every genome except B. suis 1330 has become a pseudogene and that both of these operons are completely missing in B. suis ATCC 23445 (Fig. 4).
The 10 different Brucella genomes examined here are quite similar in genome size and the numbers of genes and proteins. They are also similar in the structural organization of the chromosomes, with the exceptions being a 210-kb translocation seen in B. suis ATCC 23445 and a 700-kb inversion in chromosome 2 shared by the B. abortus genomes (Fig. 1).
The combined phylogenomic analysis of 2,377 ortholog families shows that the depth of divergence for these 10 Brucella strains is quite shallow (Fig. 2B). Despite this low level of divergence, with few characters differing among the genomes, the branching order seems to be clear and well supported, as reflected by support values. The major structure is a radiation producing a B. abortus-B. melitensis clade, a B. suis-B. canis clade, and B. ovis and B. ceti clades. A recent phylogenetic analysis (7) shows the same four subgroups of Brucella observed here, but our use of an outgroup further shows that these are four clades that radiated explosively. The B. abortus-B. melitensis clade segregated into two branches, one containing the B. abortus genomes and the other containing the B. melitensis genomes. B. canis nests within the B. suis clade, suggesting that there may have been a host switch. Genome sequences for B. neotomae and B. pinnipediae are not currently available, but previously presented evidence (7) indicates that a similar host switch may have occurred in these two species.
Assuming current knowledge of host preference is accurate, we can ask whether Brucella phylogeny reflects the phylogeny of their hosts. The mammalian taxa that have been identified as the preferred Brucella hosts belong to three distinct groups, all at the level of order in mammals. Neotoma is a genus of cricetid rodent found in the order Rodentia. The genus Canis (dogs, wolves, and coyotes) and the family Phocidae (seals) are in the order Carnivora. Bos (cattle and oxen), Ovis (sheep), Capra (goats), Sus (pigs), and the cetacean group (whales and dolphins) are all united in Cetartiodactyla. Humans (Primates) have also been infected but are not preferred hosts. These three mammalian orders representing the hosts are all well separated phylogenetically (42). The host and pathogen phylogenies are distinct and not similar. Although our phylogenetic data closely reflect the data found previously (7), our conclusions differ. The phylogeny of the Brucella isolates does not match that of their nominal mammalian hosts. This is especially clear from the inclusion of B. canis in our study. Considering the fact that most of the Brucella isolates have been identified in cetartiodactylid hosts, one could speculate that the ancestor of Brucella species infected a member of early cetartiodactylids and radiated within this group, with host switches to Carnivora and Rodentia occurring later.
The species concept in bacteria is a subject of debate (10, 52), as is the definition of different species within Brucella (40). The high degree of similarity of all these genomes, in comparison to other bacterial groups, suggests a close phylogenetic relationship. However, clear differences in host preference might still justify the separate species designations as they presently exist. For example, cattle have been described as the natural or primary hosts for B. abortus, and yet it has also been found in horses, pigs, sheep, goats, Bactrian camels, dromedary camels, water buffalo, yaks, elk, dogs (12), and humans (5, 55). It has also been isolated from rodents on occasion, although it was noted that these infections seem to be from areas where there was a large number of infected cattle (15). This list alone represents five different orders of mammalian hosts. A survey of the literature shows that the host range exhibited by B. abortus strains also extends to different degrees in the other Brucella clades. However, the isolation frequencies of different Brucella species from infected hosts are consistent with some type of host preference (58).
Genome reduction, or reductive evolution, involves gene loss through mutational inactivation and deletion (4, 19). It has been noted in a number of intracellular pathogenic bacteria, including Rickettsia prowazekii (4), Mycobacterium leprae (19), Shigella flexneri, and Salmonella enterica serovar Typhi (13). All of these bacteria are obligate intracellular pathogens, whereas Brucella is a facultative intracellular pathogen that can survive outside the host under certain conditions (12). Are the Brucella genomes undergoing reductive evolution? Based on genome size alone, the answer seems to be yes. Brucella genomes are all similar in size, with an average size of 3.29 Mb. Their nearest sequenced relatives are O. anthropi (5.22 Mb) and O. intermedium (4.6 Mb), which are both markedly larger. Pseudogene fractions can also be an indication of a genome reduction process. Excluding genome-specific pseudogenes, the average fraction determined here was 4.6%. This is low compared to the 50% estimate for Mycobacterium leprae (19), 24% for Rickettsia prowazekii (4), 15% for Shigella flexneri (13), 14% for Bartonella quintana (2), and 9% for Bartonella henselae (2). Of these, only the species of the Bartonella genus are in the order Rhizobiales. On the other hand, in three free-living Agrobacterium species, also in the order Rhizobiales, the fraction is less than 2% (50). Because the pseudogene fractions in these other studies were obtained using different methodologies, it is difficult to compare these numbers. However, using the general estimate that bacterial genomes have between 1 and 5% pseudogenes (33), the 4.6% fraction observed in Brucella can be considered relatively high and suggestive of genome degradation. Moreover, we did note more pseudogenes on chromosome 2 than on chromosome 1. Together with the higher degree of rearrangements observed on chromosome 2, this supports the conclusion that chromosome 2 is more dynamic, perhaps owing to its hypothesized origin as a plasmid (50).
The presence of many pseudogenes in the β-ketoadipate pathway is striking and reminiscent of the proposed “domino theory” of gene death (13), where after a crucial gene within a complex pathway becomes nonfunctional, a mass gene extinction is triggered. The B. suis genome is anomalous in retaining this gene cluster intact, as the cluster is entirely absent in the B. suis ATCC 23445 genome, and one or more of its genes have become pseudogenes in other genomes. It is likely that in its adaptation to an intracellular milieu, Brucella no longer requires this pathway that allows soil bacteria to break down plant compounds. We suspect that the preservation of this gene cluster in B. suis 1330 is anomalous and that over time it will succumb to pseudogenization; however, it is also possible that this particular strain (unlike other members of the B. canis-B. suis clade) makes use of these genes during periods of existence outside an animal host.
From examining the regions of potential lateral transfer, we note that many regions are unique to Brucella and not shared with Ochrobactrum. It is likely that these regions (Table 3; see also Table S1 in the supplemental material) entered Brucella after diverging from the ancestor it shared with Ochrobactrum, indicating that lateral transfer does happen despite intracellular preferences. Dobrindt et al. (17) suggest that horizontal transmission is more likely to occur in niches that contain diverse bacterial species and not as likely to occur in sparsely populated environments, which include intracellular niches like the host macrophage, the ultimate destination of Brucella (35). Of course, it is possible that these regions entered the genome at some point before Brucella committed to an intracellular preference. But when one considers that the journey to the macrophage takes Brucella through a complex series of environments that are inhabited by a wide variety of organisms with which they might interact, it seems plausible that Brucella has the opportunity to experience lateral transfer. The mammalian gut has been recognized as one of the most densely populated ecosystems on earth (38), and it has been documented that one of the most common means of transmission of Brucella involves ingestion of forage or water contaminated with genital discharge (54) or ingestion of raw milk or milk products (47). Passage through the gut would provide ample opportunity for different species of bacteria to interact, and it is plausible that Brucella experienced lateral transfer in this environment. In addition, Brucella might also interact with bacteria in the soil on which the blood, tissues, and aborted fetus of the host lies. Crawford et al. (12) report that Brucella can survive for up to 66 days in moist soil and up to 185 days in cold soil.
Our study contains strong indications that Brucella has acquired genes by lateral transfer. In particular, SAR 2-5, the IncP island, appears to have entered Brucella after it diverged from Ochrobactrum and after the individual species began to separate. This SAR contains the Tra proteins, known to be a type IV secretion system (31), and it is found in B. suis, B. canis, and B. neotomae (30) and is here identified in B. ceti. The fact that the phylogenetic tree clusters the B. suis-B. canis clade and the B. ceti clade together makes it seem likely that SAR 2-5 was acquired by their common ancestor. A complete genome from B. neotomae is not yet available, but a previous study shows that this species is phylogenetically close to B. suis and the marine Brucella spp. Thus, we hypothesize that there was a common ancestor to these three clades and that SAR 2-5 was laterally transferred into it. The fact that it is in the same location in the genomes studied here gives further weight to a single, ancestral acquisition (Fig. 1). Because it is shared only among some of the Brucella genomes, it could be argued that it was acquired after the ancestor had begun living intracellularly, as it is unlikely that this type of lifestyle developed twice independently. However, it is also possible that the ancestor that gave rise to the B. ovis and the B. melitensis-B. abortus clades lost this region or that each of the clades lost it independently.
Some of the genes indicated as having been acquired by lateral transfer play an important role in the survival of this pathogen in its host. These include the enzymes involved in producing the smooth phenotype in Brucella (26, 45). Lipopolysaccharide (LPS) is the major structural component of the outer membrane of gram-negative bacteria. It is composed of a lipid core, a core oligosaccharide, and a distal O-polysaccharide (O-PS) side chain (22). A phenotypic characteristic used to distinguish between Brucella species is the presence of the O-PS. Isolates of B. abortus, B. suis, and B. melitensis have a smooth morphology with the O-PS intact, while B. canis and B. ovis are rough, as they have the lipid core and the core oligosaccharide but lack O-PS. The O-PS is a major contributor to the antigenic variation of the bacterial envelope as well as the ability of Brucella to survive in macrophages (26). Several studies have indicated specific genes as being important for the development of the smooth phenotype in Brucella (1, 22, 23, 25, 39, 61). Recently, Gonzalez et al. (23) looked at 19 genes that had been indicated as being important in producing smoothness and found that disruption of 13 genes (wboA, wboB, wa**, wbhE, manB, wbkA, gmd, per, wzm, wbkF, wbkD, prm, and manBcore) resulted in a rough phenotype in B. melitensis, with an additional 6 genes identified as playing roles that were not fully determined. Rajashekara et al. (45) demonstrated that mutations of two genes, BMEI0997 (wboB) and BMEI0998 (wboA), resulted in a rough phenotype. Furthermore, they showed that BMEI0999, a hypothetical protein whose function is unknown, was necessary to restore a smooth LPS in rough strains. However, we have found that other smooth strains (B. abortus 2308 and B. melitensis ATCC 23457) are completely missing this hypothetical gene.
There are two well-established species of Brucella, B. canis and B. ovis, that are naturally rough and yet fully infective, and these 19 LPS-associated genes were specifically examined in these two genomes. B. ovis is missing two genes, wboA and wboB, that encode enzymes that polymerize N-formylperosamine (23), and without them, B. ovis is unable to complete the distal O-PS. Both of these genes reside in SAR 1-7 (15G) of B. suis 1330; their loss in B. ovis has been previously reported (46, 60). The B. ovis genome also has a truncated wzt; this gene encodes a protein that functions as a part of an ABC transporter, with its partner encoded by wzm. This specific enzyme (Wzt) is found in SAR 1-3 of B. suis 1330 and most likely entered Brucella by lateral transfer. This enzyme could be functional even if truncated. However, it could also indicate that the genes involved in LPS synthesis in SAR 1-3 are in a process of decay because the pathway is no longer complete in B. ovis. Only direct experimental evidence will determine if these genes are functional in B. ovis.
B. canis has truncations in 2 of the 19 LPS synthesis genes, wbkF, an undecaprenyl-glycosyltransferase, and wbkD, an epimerase/dehydratase. A truncated gene could still be functional, but the fact that B. canis is rough and that all other genes appear normal indicates that at least one of these genes is responsible for producing the rough phenotype. It is interesting that the rough phenotype results from different mutations in these two genomes, as follows: B. canis with mutations in wbkF and wbkD and B. ovis missing wboA and wboB and having a truncated wzt. It appears that roughness independently developed twice.
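The comparison behind this inference can be written out as a small lookup. The following is only an illustrative sketch: the dictionary name `lps_lesions` and the function `shared_lesions` are hypothetical, and the "missing"/"truncated" labels simply restate the observations given above for B. ovis and B. canis.

```python
# LPS gene lesions as described in the text for the two naturally rough species.
# The variable and function names are illustrative, not from any published tool.
lps_lesions = {
    "B. ovis": {"wboA": "missing", "wboB": "missing", "wzt": "truncated"},
    "B. canis": {"wbkF": "truncated", "wbkD": "truncated"},
}


def shared_lesions(species_a, species_b):
    """Return the set of genes affected in both species."""
    return set(lps_lesions[species_a]) & set(lps_lesions[species_b])


if __name__ == "__main__":
    shared = shared_lesions("B. ovis", "B. canis")
    # No affected gene is common to the two genomes.
    print(f"Genes affected in both species: {sorted(shared)}")
```

An empty intersection means that no single lesion listed here is common to both rough species, which is the observation underlying the suggestion that roughness arose independently.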
All known isolates of B. ceti are smooth (A. Whatmore, personal communication), and yet when the enzymes involved in LPS synthesis were examined in this species, manB, a phosphomannomutase whose function has not yet been determined (23), was found to be truncated due to a naturally occurring transposon insertion. Apparently, this does not affect the smooth phenotype of this organism. However, a rough phenotype was produced in B. melitensis when manB was experimentally mutated by a transposon (23).
Many of the 19 genes considered necessary for complete LPS synthesis are in SARs. Eighteen genes are located on chromosome 1, and one is found on chromosome 2. Of the genes found on chromosome 1, 12 are located on SAR 1-3 (Fig. 3), with one additional gene, wbkD (VBI0007BS1_0525), directly adjacent to this region. Two additional genes are found in SAR 1-7 (Fig. 3). These SARs are not adjacent on chromosome 1, with 402 kb between them, making it likely that these SARs represent genomic islands laterally transferred into Brucella genomes in separate events.
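The counts in this paragraph can be tallied as follows. This is a minimal sketch using only the numbers stated above; the names `chromosome_counts`, `chr1_placement`, and `summarize` are hypothetical, and because the text does not say whether the single chromosome 2 gene lies within a SAR, that gene is not counted as SAR-located here.

```python
# Gene counts restated from the paragraph above; names are illustrative only.
TOTAL_LPS_GENES = 19
chromosome_counts = {"chromosome 1": 18, "chromosome 2": 1}
chr1_placement = {
    "SAR 1-3": 12,
    "adjacent to SAR 1-3 (wbkD)": 1,
    "SAR 1-7": 2,
}


def summarize():
    """Tally the chromosome 1 LPS genes inside and outside the two SARs."""
    # Sanity check: the per-chromosome counts must add up to the 19 genes.
    assert sum(chromosome_counts.values()) == TOTAL_LPS_GENES
    in_sars = chr1_placement["SAR 1-3"] + chr1_placement["SAR 1-7"]
    # Chromosome 1 genes that are neither in nor directly adjacent to a SAR.
    chr1_elsewhere = chromosome_counts["chromosome 1"] - sum(chr1_placement.values())
    return {
        "in_sars": in_sars,
        "chr1_elsewhere": chr1_elsewhere,
        "fraction_in_sars": in_sars / TOTAL_LPS_GENES,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Under these counts, 14 of the 19 genes fall within SAR 1-3 or SAR 1-7, and 3 chromosome 1 genes lie neither in nor directly adjacent to either region.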
Genes of particular interest that we hypothesize to have entered the genome horizontally include the type IV secretion system, the tra genes, and the enzymes responsible for LPS synthesis that give Brucella a smooth phenotype. Mutation or absence of these LPS genes is responsible for the rough phenotype of both B. ovis and B. canis. All these observations lead us to believe that Brucella, despite its preference for an intracellular milieu (e.g., phagocytic cells), has the ability and opportunity to interact with other bacteria in its environment and has acquired useful genes that facilitate its intracellular lifestyle.
We thank Sohan Nagrani (VBI) for his careful analysis of the literature and genes previously described in this genus.
This work is funded through NIAID contract HHSN266200400035C to Bruno Sobral. Funding to pay the Open Access publication charges for this article was provided by NIAID contract HHSN266200400035C to Bruno Sobral. We also thank Dennis Dean (Fralin Life Science Institute, Virginia Tech) for providing financial support to aid in resolving the number of contigs in the B. ceti genome.
Published ahead of print on 3 April 2009.
§Supplemental material for this article may be found at http://jb.asm.org/.