Two groups independently sequenced the Agrobacterium tumefaciens C58 genome in 2001. We report here consolidation of these sequences, updated annotation, and additional analysis of the evolutionary history of the linear chromosome, which is apparently limited to the biovar I group of Agrobacterium.
Agrobacterium tumefaciens C58 has an unusual genome structure consisting of one circular chromosome (chromosome I), one linear chromosome (chromosome II), and two plasmids (1–5). Previous studies showed that the linear chromosome is derived from a plasmid (4, 5). Isolates of Agrobacterium spp. have traditionally been subdivided into three different groups, called biovars, based on differences in physiology and host range. Biovar I can be further subdivided into genomovars, with C58 belonging to genomovar 8 (6–10).
C58 was originally isolated in 1958 by Robert Dickey from a cherry gall in upstate New York (11). Lead authors of this article independently sequenced the genomes of two isolates of A. tumefaciens C58 in 2001 (4, 5). Wood et al. (5) sequenced a C58 strain stored in frozen glycerol in the laboratory of Eugene Nester at the University of Washington (hereafter designated C58UW). Goodner et al. (4) sequenced the ATCC 33970 isolate obtained from the American Type Culture Collection (ATCC) in 1999. This strain, also originating from the Nester lab via John Kleyn, was deposited in 1981 and subcultured three times by ATCC and once by researchers at the Monsanto Company prior to sequencing. The number of passages separating these strains from each other or the original strain isolated by Dickey is unknown.
A comparison of the two independent genome sequences identified 52 differences, including two insertion/deletions (indels) (see Table S1 in the supplemental material). All disparate loci were resequenced following PCR amplification (see the supplemental Materials and Methods in the supplemental material). Twenty-two of these apparent differences were base-calling errors, and 30 were true differences. Of the 30 true differences, 16 were single base changes residing in the 16S rRNA and tRNA-Ile region near 58.3 kbp on chromosome I, apparently resulting from recombination between rRNA loci. C58UW also contains two deletions relative to ATCC 33970. The first is a 90-bp in-frame deletion within a putative two-component response regulator gene (atu5121). The second is a 111-bp symmetrical intergenic deletion on the circular chromosome that removes part of a short repeat sequence called CIR2 (12, 13).
The latter result prompted a broader search for short repeated palindromic sequences within the C58 genome, resulting in the identification of three classes of repeats (Fig. 1). Two of these sequences, AgroCIR1 and AgroCIR2, were previously identified in a search for conserved motifs containing a binding site (GANTC) for the essential methylase CcrM (12, 13). The third element is herein designated AgroKE3 and bears no resemblance to the CIR repeats. A full KE3 repeat consists of 29-bp inverted repeats bracketing a variable region containing 49 to 76 bp (Fig. 1). Like AgroCIR1 and AgroCIR2, KE3 elements are preferentially found on chromosome I, consistent with the evolution of these repeats on the ancestral chromosome during the radiation of the Rhizobiaceae prior to the origins of chromosome II. Table S2 in the supplemental material summarizes the distribution of KE3 repeats in several closely related, fully sequenced relatives of A. tumefaciens C58, including the recently sequenced biovar I strain Agrobacterium sp. strain H13-3 (14). Table S3 in the supplemental material lists locations where these sequence repeats overlap a predicted open reading frame (ORF) in the C58 genome. The biological function of the KE3 repeats has not yet been determined.
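The KE3 geometry described above (29-bp inverted repeats bracketing a variable region of 49 to 76 bp) can be illustrated with a minimal Python sketch. This is not the search procedure used in this study, which is not described here. The function names are hypothetical, and the sketch assumes perfect, exact-match arms, whereas genuine repeat copies may carry mismatches. Because no KE3 consensus sequence is given in the text, the sketch reports any inverted repeat with this geometry rather than KE3 elements specifically.

```python
# Illustrative only: exact-match scan for inverted repeats with the arm and
# spacer lengths stated in the text. Function names are hypothetical.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT characters are left unchanged."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=29, min_gap=49, max_gap=76):
    """Return (start, gap) pairs where a left arm of length `arm` is followed,
    after `gap` bases, by its exact reverse complement.

    `start` is the 0-based position of the left arm.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm - min_gap + 1):
        left = seq[i:i + arm]
        # Skip windows with ambiguous bases to avoid spurious matches.
        if set(left) - set("ACGT"):
            continue
        target = reverse_complement(left)
        for gap in range(min_gap, max_gap + 1):
            j = i + arm + gap  # start of the candidate right arm
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, gap))
    return hits
```

A practical search would likely tolerate mismatches in the arms and would be run on both replicons so that the chromosome I bias noted above could be checked against replicon length.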
All true variant loci between C58UW and ATCC 33970 were compared to the same loci in other A. tumefaciens C58 culture lines obtained from laboratories in the United States and Europe (see Tables S1 and S4 in the supplemental material). In 12 of 14 cases, including both indels, the ATCC 33970 sequence was identical to each of the C58 comparison strains, while in two cases, all reference strains matched C58UW. While the cause of the additional variation in C58UW is unclear, it may be that the strain was passaged more frequently or that one of the acquired variations resulted in a higher mutation rate.
The telomeres of the C58 linear chromosome are covalently closed hairpin loops (4). This unusual structure meant that neither of our original studies was able to provide a complete sequence for its ends; similarly, the telomeres have not yet been sequenced for H13-3 (14). Recently, however, the C58 telomere sequences, along with a biochemical characterization of the protelomerase enzyme that maintains them (TelA, encoded by atu2523), have been published (15). The updated GenBank submission has been modified to include these data (see below).
We hypothesize that linearization of chromosome II was a seminal event in the divergence of biovar I strains, such as A. tumefaciens C58, from biovar III strains, such as Agrobacterium vitis S4 (16). The simplest model for its linearization involves a single crossover between the ancestral circular chromosome II and a linear phage or plasmid, thereby incorporating both telomeres and telA into the genome in one event. Surprisingly, however, the telA gene is located on the circular chromosome I (4, 16). Comparison of the C58 and S4 genomes shows significant synteny between a single region of S4 chromosome I and three regions of the C58 chromosome. Our analysis of these relationships suggests that multiple recombination events in the atu2521-atu2523 (telA) region transferred telA to chromosome I and initiated two large DNA transfers to chromosome II (Fig. 2). The breakpoint for the translocation of genes atu3507 through atu8188 (0.558 to 0.575 Mbp on chromosome I of C58) from the circular to the linear chromosome is adjacent to telA. A similar translocation breakpoint occurs on chromosome I immediately upstream of atu2521 and extends through atu4172 (lysC) into the adjacent rRNA loci (1.311 to 1.292 Mbp on chromosome I of C58). These genomic reorganizations transferred an rRNA operon and several essential genes to chromosome II while placing telA on chromosome I. Intriguingly, atu2521 and atu2522 are more similar to orthologs in Rhizobium and Sinorhizobium, respectively, than they are to their orthologs in S4 (avi3961 and avi3963, respectively), suggesting that atu2521, atu2522, and telA may have entered the C58 genome together, perhaps as part of a linear plasmid.
We surveyed a large number of strains that have historically been classified as Agrobacterium, including biovar I (A. tumefaciens), biovar II (Agrobacterium radiobacter), and biovar III (A. vitis), for the presence of a linear mega-size DNA molecule by pulsed-field gel electrophoresis (PFGE) and for telA and its adjacent ORF, atu2522 (acvB), by PCR or Southern blotting (see Table S5 and Fig. S1 in the supplemental material). It is important to note in considering this analysis that strong evidence supports the reclassification of biovar II strains as Rhizobium (6, 10, 16–19). Our survey data indicate that linear chromosomes are unique to biovar I strains (see Table S5 in the supplemental material) (14, 20). Based on this comparison, we can now define the unique genomic content of biovar I as containing a linear replicon accompanied by a telA gene, in addition to other diagnostic genes (see Table S6 in the supplemental material).
We have added the recently published telomere sequences and consolidated our two earlier versions of the C58 genome sequence into a single version with updated annotation from our own work and that of others. ATCC 33970 was chosen as the standard sequence because it is most similar to other reference strains analyzed (see Table S1 in the supplemental material). Notations indicating the variations found in the C58UW strain are included in this update. The gene identifiers (locus tags) referring to genes kept from the original annotations are the same as those defined by Wood et al. (format, atuXXXX) (5). Newly predicted protein-coding genes were given the locus tag pattern atu8XXX, as were a number of genes that were initially predicted only by Goodner et al. (4) or by analyses subsequent to the initial genome deposit (21). Newly predicted small RNA genes (22) were designated with the locus tag pattern atu9xxx. Details of the reannotation are provided in the supplemental Materials and Methods in the supplemental material.
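The locus tag conventions above can be summarized in a short Python sketch. It assumes that every tag consists of the prefix atu followed by exactly four digits, as the atuXXXX format and the tags cited in this article (for example, atu2523 and atu8188) suggest; tags in other forms are reported as unrecognized. The function name and the category labels are hypothetical and are not part of the GenBank records.

```python
import re

# Assumption: tags are "atu" plus exactly four digits (format atuXXXX).
_LOCUS_TAG = re.compile(r"atu(\d{4})", re.IGNORECASE)


def classify_locus_tag(tag):
    """Map a C58 locus tag to the annotation category described in the text."""
    match = _LOCUS_TAG.fullmatch(tag.strip())
    if match is None:
        return "unrecognized"
    first_digit = match.group(1)[0]
    if first_digit == "8":
        # atu8XXX: newly predicted protein-coding genes, plus genes predicted
        # only by Goodner et al. or by analyses after the initial deposit.
        return "protein-coding, added after the original annotation"
    if first_digit == "9":
        # atu9xxx: newly predicted small RNA genes.
        return "small RNA, newly predicted"
    # All other tags are retained from the original annotations.
    return "retained from the original annotation"
```

For example, classify_locus_tag("atu2523") places telA among the retained tags, whereas classify_locus_tag("atu8188") falls in the atu8XXX group.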
The GenBank accession numbers for the consolidated sequences are as follows: chromosome I, AE007869; chromosome II, AE007870; pTiC58, AE007871; and pAtC58, AE007872. These sequence files replace the two original versions of the A. tumefaciens C58 sequence files submitted by our research groups (4, 5).
This work was supported by National Science Foundation grants 0333297, 0603491, and 0736671, by a grant from the M. J. Murdock Charitable Trust life sciences program (2004262:JVZ), by a science education grant from the Howard Hughes Medical Institute (52005125), and by the Monsanto Fund.
We thank Kelly Williams for pointing out the group I intron within the tRNA gene. We thank the hundreds of students at Hiram College, Seattle Pacific University, the University of Arizona, and the Mesa (Arizona) Biotechnology Academy who have contributed to the annotation of this genome.
Published ahead of print 14 December 2012
Supplemental material for this article may be found at http://dx.doi.org/10.1128/AEM.03192-12.