When the genomes of Caulobacter isolates NA1000 and K31 were compared, numerous genome rearrangements were observed. In contrast, similar comparisons of closely related species of other bacterial genera revealed only minimal rearrangements. A phylogenetic analysis of the 16S rRNA indicated that K31 is more closely related to Caulobacter henricii CB4 than to other known Caulobacters. Therefore, we sequenced the CB4 genome and compared it to all of the available Caulobacter genomes to study genome rearrangements, discern the conservation of the NA1000 essential genome, and address concerns about using 16S rRNA to group Caulobacter species. We also sequenced the novel bacterium Brevundimonas DS20, a representative of the genus most closely related to Caulobacter, and used it as part of an outgroup for phylogenetic comparisons. We expected to find fewer rearrangements when comparing more closely related Caulobacters. However, we found that relatedness was not correlated with the amount of observed “genome scrambling”. We also discovered that nearly all of the essential genes previously identified for C. crescentus are present in the other Caulobacter genomes and in the Brevundimonas genomes as well. However, a few of these essential genes were found only in NA1000, and some were missing in one or more of the other species, while other proteins were 100% identical across species. Also, phylogenetic comparisons of highly conserved genomic regions revealed clades similar to those identified by 16S rRNA-based phylogenies, verifying that 16S rRNA sequence comparisons are a valid method for grouping Caulobacters.
Alphaproteobacteria comprise a large and metabolically diverse group of bacteria that includes the genus Caulobacter. Caulobacters are found in essentially all habitats, including fresh and salt water, soil, root systems, and water treatment plants. They thrive in low nutrient conditions and exhibit a rare dimorphic phenotype consisting of a stalked non-motile cell and a motile swarmer cell produced at cell division. The motile cell is immature and must first shed its flagellum and differentiate into the stalked form before it replicates its chromosome and divides asymmetrically to regenerate itself and produce a flagellated daughter cell, thus continuing its life cycle. The ability to synchronize this cell cycle has allowed great advances in understanding the genetic regulatory network and signal transduction pathway controlling the C. crescentus cell cycle [12, 29].
Compared to the wealth of information available to support cell cycle research, the amount of research dedicated to understanding the environmental and evolutionary biology of Caulobacters is minimal. However, as the genomic sequences of more Caulobacters are becoming available, a significant opportunity has arisen to add to this literature. Ribosomal RNA analyses show that bacteria previously defined as Caulobacter are actually grouped into two separate branches consisting of freshwater and marine species, Caulobacter and Maricaulis, respectively [1, 27]. Further 16S rDNA comparisons by Abraham et al. revealed that the freshwater branch is clearly divided into two genera, Caulobacter and Brevundimonas. Thus, Brevundimonas genomes are ideal for use as an outgroup for the analysis of Caulobacter genomes. The genus Caulobacter can be divided into two branches as well, based on its 16S rRNA gene sequences. One branch contains C. crescentus and C. segnis while the other contains C. henricii and Caulobacter sp. K31 (Fig. 1). This separation prompted us to compare these genomes to the essential genome that has been experimentally defined for the C. crescentus strain NA1000. The genomic DNA sequences of both C. crescentus strain CB15 and its derivative NA1000 and C. segnis strain TK0059 have been published [5, 23]. In addition, Caulobacter strain K31, a groundwater isolate of particular interest for its ability to tolerate and degrade chlorophenols, has also been sequenced. To provide a fourth strain for this genome comparison, the genome nucleotide sequence of C. henricii strain CB4 was determined as part of this study. Although many other Caulobacter isolates are listed in the IMG genome database, no other Caulobacter genome sequences have been fully assembled. Similarly, Brevundimonas subvibrioides strain CB81 is the only Brevundimonas with an available genome sequence.
Therefore, we have determined the nucleotide sequence of the Brevundimonas DS20 genome to provide a second Brevundimonas genome. Comparing these six genomes, we found that the experimentally determined C. crescentus essential genome  was conserved in all six species with minor exceptions. However, we found extensive genome rearrangements among these six genomes.
The Caulobacter henricii strain CB4 (ATCC 15253) was obtained from the American Type Culture Collection. It was grown at 30°C for 48 hours in PYE medium  that contains 2 g Bacto Peptone, 1 g Yeast Extract, 0.5M MgSO4, and 0.5M CaCl2 per L. In addition, we isolated a Caulobacter-like bacterium from a contaminated culture of Caulobacter FWC20 . It was grown at 30°C under the same conditions as CB4. Based on genome comparisons, we determined this bacterium to be a novel member of the genus Brevundimonas. We named this isolate Brevundimonas DS20.
Genomic DNA from Caulobacter henricii CB4 and Brevundimonas DS20 was isolated from a saturated PYE culture using the Qiagen DNeasy Tissue Kit following the manufacturer’s protocol. Primers 16S_533F (GTGCCAGCMGCCGCGGTAA) and 16S_U1492R (GGTTACCTTGTTACGACTT) were used to amplify the 16S rRNA region of the genome, and the amplified DNA was sequenced using Sanger sequencing on an ABI 3730 sequencer. Genomic DNA library construction and nucleotide sequencing were carried out by the University of Washington PacBio Sequencing Services using the Pacific Biosciences RSII sequencing system. The library prep template for the 10 kb protocol was used, but the DNA was sheared to 20 kb fragments using a Covaris tube, with a final 0.4x bead wash for the finished library. The collection protocols for the P4-C2 chemistry were:
Standard Seq v2
Movie Time: 120 min
Insert Size (bp): 20000
Stage Start: True
Control: DNA Control 3kb-10 kb.
The average read length for the CB4 DNA was 4289 bp with approximately 55X coverage . The nucleotide sequence reads were assembled using HGAP2.0 as previously described . For both strains the nucleotide sequence data were assembled into a single chromosomal contig. In addition, a 97894 bp circular plasmid was predicted to be present in the CB4 strain. To verify the chromosome assemblies, genomic DNA cut with the PmeI, SnaBI, or SwaI restriction enzymes was separated by Pulsed Field Gel Electrophoresis (PFGE) using the protocols described by . For both genomes, the observed bands exactly matched the bands predicted from the assembled chromosomal sequence. For the CB4 genome, a SspI digest contained an extra band that was approximately 100 kb. This band confirmed the existence of the CB4 plasmid since it corresponded to the predicted 98 kb size of the plasmid. Annotations were performed using The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST) .
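The PFGE check described above compares observed bands with fragment sizes predicted from the assembled sequence. As a rough illustration only, and not the authors' actual procedure, the following Python sketch predicts fragment sizes for a circular sequence digested with a single enzyme. The function name `circular_fragments` and the `SITES` table are hypothetical helpers. The recognition sequences are the standard palindromic sites for these enzymes, and the sketch assumes complete digestion and no methylation effects.

```python
# Standard recognition sequences (all palindromic, so one strand suffices).
SITES = {
    "PmeI": "GTTTAAAC",
    "SwaI": "ATTTAAAT",
    "SnaBI": "TACGTA",
    "SspI": "AATATT",
}


def circular_fragments(seq, site):
    """Return fragment lengths (largest first) for a circular sequence
    cut at every occurrence of `site`."""
    seq = seq.upper()
    site = site.upper()
    n = len(seq)
    # Append the first len(site)-1 bases so that sites spanning the
    # origin of the circular sequence are also found.
    doubled = seq + seq[: len(site) - 1]
    positions = []
    start = doubled.find(site)
    while start != -1:
        positions.append(start)
        start = doubled.find(site, start + 1)
    if not positions:
        return [n]  # no site: the circle stays uncut
    # Fragment lengths are the gaps between consecutive sites, wrapping
    # around the origin; a single site linearizes the circle (length n).
    count = len(positions)
    frags = [
        (positions[(i + 1) % count] - positions[i]) % n or n
        for i in range(count)
    ]
    return sorted(frags, reverse=True)


if __name__ == "__main__":
    toy = "AATATT" + "C" * 10 + "AATATT" + "G" * 4  # toy 26 bp circle
    print(circular_fragments(toy, SITES["SspI"]))
```

Because the cut position within a recognition site is a constant offset, it does not change the predicted fragment lengths, so the sketch measures site-to-site distances directly.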
The resulting genome sequences and annotation are available in the NCBI database: Caulobacter henricii CB4 (accession numbers CP013002 and CP013003) and Brevundimonas DS20 (accession number CP012897).
Whole genome comparisons were performed using Progressive MAUVE Multiple Genome Alignment. A BLAST comparison of the 480 experimentally identified C. crescentus essential genes to the predicted genes in the other five genomes was performed using the BlastStation version 1.3 software to determine if homologous genes were present in the genomes of the Caulobacter strains CB4, TK0059, and K31 and of Brevundimonas subvibrioides CB81 and Brevundimonas DS20. BLAST matches with an e-value of less than 1e−5 were considered significant. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 5.1. Phylogenetic trees were constructed using the maximum likelihood method. All branches were also recovered in both neighbor-joining and maximum-parsimony trees [11, 25].
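The presence/absence screen described above amounts to filtering BLAST hits by e-value and recording which query genes have at least one significant match. The Python sketch below illustrates that logic under stated assumptions: it assumes the hits have been exported in the common 12-column tab-separated BLAST layout, with the query ID in column 1 and the e-value in column 11, which may differ from what BlastStation actually produces. The function names and the toy gene IDs are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # significance threshold stated in the text


def genes_with_homologs(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return the set of query IDs with at least one hit below the cutoff.

    Assumes 12-column tab-separated BLAST output:
    query ID in column 1, e-value in column 11.
    """
    found = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, evalue = row[0], float(row[10])
        if evalue < cutoff:
            found.add(query)
    return found


def missing_genes(essential_ids, tabular_text, cutoff=EVALUE_CUTOFF):
    """Return essential gene IDs (in input order) with no significant hit."""
    found = genes_with_homologs(tabular_text, cutoff)
    return [gene for gene in essential_ids if gene not in found]


if __name__ == "__main__":
    # Toy data only: q1 has a strong hit, q2 a weak hit, q3 no hit at all.
    toy_hits = (
        "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
        "q2\ts2\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20\n"
    )
    print(missing_genes(["q1", "q2", "q3"], toy_hits))
```

Running the same screen once per target genome and intersecting the results would give the set of essential genes conserved in all five comparison genomes.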
Caulobacter henricii CB4 forms yellow colonies and can grow on minimal media glucose plates in the presence of vitamin B12. Brevundimonas DS20 forms yellow mucoidal colonies which are round, smooth, slightly raised, and glistening. It is able to grow on minimal media, and the cells are rod shaped and lack the curved phenotype found in many Caulobacters. At 30°C, CB4 had a doubling time of 190 minutes when grown in PYE. In addition, it had a doubling time of two weeks at 10°C. DS20 grew faster, with a doubling time of 120 minutes at 30°C and 12 days at 10°C. When the strains were grown at 10°C, light microscopy indicated that both appeared healthy with highly motile swarmer cells.
The assembly of the Caulobacter henricii CB4 genome resulted in a 3,864,204 bp chromosome and a 97,894 bp plasmid (Table 1). The CB4 chromosome contains 3751 genes and has a GC content of 66.4%. As such, the codons in the protein coding regions should have a high G+C content, especially in the third codon position (GC3). Indeed, 29 of the 30 most used codons contain either a G or a C in the third position. The overall GC3 percentage for CB4 is 83.2%. The plasmid has a GC content of 65.4% and a GC3 content of 80.8%. It contains one integrase gene but no transposases among its 97 genes. Since the K31 genome contains two megaplasmids, we compared the predicted amino acid sequences of the CB4 plasmid genes to those of the K31 plasmids and found that only four of the CB4 plasmid genes are homologous to any of the genes in either of the two K31 plasmids. However, the CB4 plasmid contains 11 heavy metal resistance genes (czc) in addition to the 15 czc genes on the chromosome. Despite the extra czc genes, CB4 is resistant to the same concentrations of cadmium and zinc as NA1000, which has only 13 czc genes. Nearly all of the other plasmid genes that code for proteins with predicted functions code for other types of metal resistance or for plasmid functions. No other plasmids have been reported in any other Caulobacter genome.
The Brevundimonas DS20 genome consists of 3,457,610 bp and does not include a plasmid (Table 1). It has a GC content of 67% and contains 3411 genes. As in CB4, 29 of the 30 most used codons contain either a G or a C in the third position, and the overall GC3 percentage is 86.3%. The pattern of codon usage is similar to that of CB4 as well. The crescentin gene responsible for the crescent cell shape in most Caulobacter species was not found in the DS20 genome.
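For readers unfamiliar with the GC3 statistic used above, the following Python sketch shows one straightforward way to compute overall GC content and GC3 from a coding sequence. It is an illustration, not the authors' pipeline: the function names are hypothetical, and it assumes an in-frame coding sequence containing only A, C, G, and T.

```python
def gc_percent(seq):
    """Percentage of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc3_percent(cds):
    """Percentage of G or C at the third position of each complete codon.

    Assumes `cds` starts in frame; a trailing partial codon has no
    third position and is therefore ignored automatically.
    """
    third_positions = cds.upper()[2::3]
    return gc_percent(third_positions)


if __name__ == "__main__":
    toy_cds = "ATGGCCAAG"  # toy codons: ATG GCC AAG
    print(round(gc_percent(toy_cds), 1), round(gc3_percent(toy_cds), 1))
```

A genome-wide GC3 value would be obtained by pooling the third positions of all annotated coding sequences before taking the percentage, rather than averaging per-gene values.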
The gene order of closely related species is usually very similar and often the assembled genome of one species can be used as the template for the alignment of closely related species. However, a previous comparison of NA1000 and K31 revealed rearrangements an order of magnitude greater than previously described in other bacteria . When the K31 chromosome was aligned to that of NA1000, more than 60 inversions and 45 large translocations were readily observed. Since this level of genome scrambling makes it difficult to identify the endpoints of individual inversion events, we hypothesized that the level of observed genome scrambling would decrease in comparisons of more closely related genomes based on phylogenetic comparisons (Fig. 1).
When the C. segnis TK0059 genome was compared to the NA1000 genome, only 35 inversions and 11 translocations were estimated from the Mauve comparison (Fig. 2). Since TK0059 is more closely related to NA1000 than to K31 , this reduced level of genome scrambling was consistent with the hypothesis that two closely related strains might have a small enough number of gene inversions that individual events could be accurately identified. However, when we compared the Caulobacter henricii CB4 genome to the closely related K31 genome, we observed more than 75 inversions and over 45 translocation events (Fig. 3). Most of these translocations were small with only five being over 100,000 bp. The rearrangements were also mostly organized around the origin of replication as shown previously for the NA1000 and K31 comparison . Intriguingly, when compared to the more distantly-related TK0059 genome, the CB4 genome had only three large translocations and fewer inversions than we observed in the CB4/K31 comparison (data not shown). Thus the number of inversions and translocations appears to be unrelated to genetic distance. The two Brevundimonads also exhibit these high levels of genome rearrangement (Fig. 4).
Previous experimental work had identified 480 protein coding regions that are essential for the growth of C. crescentus strain NA1000 in a nutrient rich medium. We hypothesized that genes that were essential for growth of NA1000 in PYE would also be essential and highly conserved in the other five bacterial strains in this study. We used the BlastStation software to BLAST the 480 essential genes against the CDS regions of the other genomes and determined that 94% of the genes coded for proteins that had homologs in the other five species. In fact, the SSU ribosomal protein S10P had 100% amino acid identity in all four Caulobacter species, another ribosomal protein and IF-1 had 100% identity in three species, and 17 more were 100% identical in two Caulobacter species. Most of these highly conserved genes code for ribosomal proteins, where amino acid sequence conservation is expected because these proteins bind to an rRNA and to each other to form a very precise protein manufacturing machine.
There were nine NA1000 essential genes that were absent in all five of the other genomes (Table 2). Four of these essential genes, CCNA 761, 1304, 2841, and 3307, have an unknown function, CCNA 2844 codes for an antitoxin protein, and CCNA 465, 466, 467, and 469 code for proteins involved in cell wall synthesis. In addition, eight other C. crescentus essential genes are present in at least one other species but are missing in at least one other species. Three of the eight are present in all four of the Caulobacter genomes but not in the two Brevundimonas genomes. Two other genes that are present in only some of the Caulobacter genomes also code for antitoxin proteins. It is unsurprising that an antitoxin protein would be essential. These genes code for proteins that neutralize a specific toxin. The absence of these antitoxin genes paired with the presence of the corresponding toxin gene would prove fatal for the organism, but the gene is not needed if the toxin gene is not present. The four essential genes involved in cell wall synthesis (CCNA 465, 466, 467, and 469) are contained in a region annotated as a prophage element (coordinates 473044 to 499074) in the NA1000 genome. It is unlikely that genes gained from a prophage would become essential unless they are needed to protect against lethality due to some other part of the prophage. In fact, numerous antisense transcripts have been detected in this region (Schrader et al. 2014), so complex regulatory circuits may be present. This phenomenon appears to have occurred in C. segnis TK0059 with the NA1000 essential gene IF-2, which is found in every genome that we compared except for the TK0059 genome. However, the TK0059 genome contains an alternate gene, Cseg_3298, that codes for a protein that was predicted to function as IF-2 even though it has an unrelated amino acid sequence. In addition, the gene has a GC content of only 55%, suggesting that it was recently obtained by horizontal gene transfer.
However, more testing needs to be done to verify that this TK0059 protein can actually function as an IF-2 translation initiation factor.
Comparative sequence analysis of the 16S ribosomal RNA genes is currently the most widely used approach for the reconstruction of microbial phylogeny since the rRNA operon size, nucleotide sequence, and secondary structures of the three rRNAs (16S, 23S, 5S) are highly conserved within a bacterial species . The 16S rRNA is the most conserved of these subunits and has been used widely as a sort of “evolutionary clock” . A comparison of the 16S rRNA nucleotide sequences shows within genus differences ranging up to 3% and between genus differences in the range of 5-7% (Table 3). However, a single gene cannot be used to assess genome divergence since different parts of the genome diverge at different rates.
Similarly, whole genome comparisons only provide an average rate of divergence. Therefore, we decided to perform a phylogenetic analysis of two large clusters of genes that span thousands of base pairs: the divisional cell wall (dcw) cluster containing 26 genes (CCNA_2622-2647) involved in cell division and cell wall synthesis, and a ribosomal protein region containing 28 genes (CCNA_1304-1332). These conserved regions are large enough for robust statistical comparisons and provide the opportunity to study the divergence of individual functional units. The nucleotide sequences of the Caulobacter dcw operon differ by as much as 19% in pairwise comparisons of the Caulobacter species and by as much as 27% when the Caulobacter operons were compared with those of the two Brevundimonads (Table 4). However, the phylogenetic tree of the dcw gene cluster was essentially the same as the 16S tree.
The Caulobacter ribosomal protein operons differed by as much as 10% in pairwise comparisons of the nucleotide sequences and by as much as 21% when compared with those of the Brevundimonads (Table 5). Thus, this region is more highly conserved than the dcw operon, probably because the amino acid sequences of ribosomal proteins are more constrained since they are involved in complex intermolecular interactions. In both comparisons, the nucleotide sequences were much more diverse than the 16S rDNA sequences, but the phylogenetic trees were essentially the same.
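The pairwise percentages quoted for the 16S, dcw, and ribosomal protein comparisons are differences between aligned nucleotide sequences. The Python sketch below shows one simple way to compute such a value. It is a minimal illustration under an assumption the text does not confirm, namely that alignment columns containing a gap in either sequence are excluded, and the function names are hypothetical. MEGA offers several distance models and gap treatments, so its output may differ.

```python
from itertools import combinations


def percent_difference(aligned_a, aligned_b):
    """Percent of differing sites between two aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared


def pairwise_table(alignment):
    """Map each pair of names to its percent difference.

    `alignment` is a dict of {name: aligned sequence}.
    """
    return {
        (a, b): percent_difference(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    toy = {"s1": "ACGTACGTAC", "s2": "ACGAACGTAC", "s3": "ACGT-CGTAA"}
    for pair, value in pairwise_table(toy).items():
        print(pair, round(value, 1))
```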
We also compared a conserved prophage region that spans approximately 20 genes (CCNA_2861-2880) and found that the nucleotide sequence differs by as much as 17% in pairwise comparisons among the Caulobacters (Table 6). Thus, the divergence of the Caulobacter prophage gene nucleotide sequences appears to be slightly less than that of the dcw operon even though a selective constraint is unknown. However, none of the Caulobacter prophage nucleotide sequences had significant identity (e-value < 1e−5) to the corresponding Brevundimonas sequences.
Upon closer inspection, we found that there was significant amino acid identity among the genes in this region in all six genomes. Part of the explanation for the disparity between the nucleotide and amino acid sequence identities may be that the Caulobacter phage regions display a codon usage bias for CTG (Leucine), GGG (Glycine), GCG (Alanine), and CGG (Arginine), in contrast to the Brevundimonas phage regions, which display a bias towards CTC (Leucine), GGC (Glycine), CGC (Arginine), and GCC (Alanine). Interestingly, this difference in codon usage bias was not observed when the genomic codon usage was compared. This difference in codon usage would facilitate diversity at the nucleotide level while conserving the amino acid sequence. There may be some environmental or evolutionary pressure that is influencing the conservation of these genes, but it is not obvious. Also, even though the Brevundimonas subvibrioides genome contained all genes of the conserved phage operon, we detected two translocations of genes to locations away from this region. Three genes were grouped together in what appears to be a four gene operon along with a recombinase gene that is absent in four of the strains in this study but is found in K31. A fourth gene was found in a different four gene operon along with three other genes not found in any of the other bacteria in our study. Since these translocated genes were found 21,084 and 25,619 base pairs before the start of the prophage region, it is also possible that there was a single translocation event followed by an insertion between the first and second genes. In either case, we can conclude that the prophage region was present in the common ancestor of Caulobacter and Brevundimonas and has remained intact until recently despite the high level of other genome rearrangements observed in these species.
Furthermore, when phylogenetic trees were produced using this set of 20 prophage genes, the resulting trees were similar to the ribosomal RNA gene trees except that the K31 and CB4 genes did not form a monophyletic group.
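The codon usage bias described for the prophage regions can be examined by counting how often each synonymous codon is used for a given amino acid. The Python sketch below illustrates the idea for the four amino acids named in the text (leucine, glycine, alanine, and arginine) using the standard genetic code. It is an illustration rather than the authors' method: the function names are hypothetical, and the input is assumed to be a list of in-frame coding sequences.

```python
from collections import Counter

# Synonymous codon families (standard genetic code) for the four
# amino acids discussed in the text.
SYNONYMOUS = {
    "Leu": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def codon_counts(cds_list):
    """Count every complete in-frame codon across a list of sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def preferred_codons(cds_list):
    """Return {amino acid: (most used codon, its share of that family)}.

    Amino acids that never occur in the input are omitted.
    """
    counts = codon_counts(cds_list)
    preferred = {}
    for amino_acid, codons in SYNONYMOUS.items():
        total = sum(counts[c] for c in codons)
        if total:
            best = max(codons, key=lambda c: counts[c])
            preferred[amino_acid] = (best, counts[best] / total)
    return preferred


if __name__ == "__main__":
    toy_region = ["CTGCTGCTCGGGGCG"]  # toy codons: CTG CTG CTC GGG GCG
    print(preferred_codons(toy_region))
```

Comparing the output for the prophage genes with the output for all genes in the same genome would show whether the bias is specific to the prophage region, which is the contrast the text draws.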
In summary, we found that despite the extensive scrambling of the Caulobacter and Brevundimonas genomes, most genes shown previously to be essential for C. crescentus  are highly conserved in other species of Caulobacter and Brevundimonas. Also, a prophage present in the genomes of both genera exhibits conserved amino acid sequences in the protein coding regions, but altered codon use between the two genera even though the overall codon usage patterns are very similar. Thus, the prophage genes may be subject to some unrecognized evolutionary pressure.
This work was funded in part by a fellowship from The Southern Region Educational Board (SREB) to DS and by NIH grant GM076277 and NSF grant EF-0826792 to BE.