Appl Environ Microbiol. 2010 March; 76(5): 1604–1614.
Published online 2010 January 4. doi:  10.1128/AEM.02039-09
PMCID: PMC2832352

Conserved Symbiotic Plasmid DNA Sequences in the Multireplicon Pangenomic Structure of Rhizobium etli


Strains of the same bacterial species often show considerable genomic variation. To examine the extent of such variation in Rhizobium etli, the complete genome sequence of R. etli CIAT652 and the partial genomic sequences of six additional R. etli strains having different geographical origins were determined. The sequences were compared with each other and with the previously reported genome sequence of R. etli CFN42. DNA sequences common to all strains constituted the greater part of these genomes and were localized in both the chromosome and large plasmids. About 700 to 1,000 kb of DNA that did not match sequences of the complete genomes of strains CIAT652 and CFN42 was unique to each R. etli strain. These sequences were distributed throughout the chromosome as individual genes or chromosomal islands and in plasmids, and they encoded accessory functions, such as transport of sugars and amino acids, or secondary metabolism; they also included mobile elements and hypothetical genes. Sequences corresponding to symbiotic plasmids showed high levels of nucleotide identity (about 98 to 99%), whereas chromosomal sequences and the sequences with matches to other plasmids showed lower levels of identity (on average, about 90 to 95%). We concluded that R. etli has a pangenomic structure with a core genome composed of both chromosomal and plasmid sequences, including a highly conserved symbiotic plasmid, despite the overall genomic divergence.

It is becoming clear that bacterial genomes of strains of the same species vary widely both in size and in gene composition (39). An unexpected degree of genomic diversity has been found by comparing whole genomes (39). For instance, in Escherichia coli strains, differences of up to 1,400 kb account for some strain-specific pathogenic traits (5, 56). The extent of intraspecies genome diversity varies in different bacterial lineages. Some species have a wide range of variation; these species include E. coli (42), Streptococcus agalactiae (53), and Haloquadratum walsbyi (34). Other bacteria display only limited gene content diversity; an example is Ureaplasma urealyticum (1, 54). Tettelin and colleagues have suggested that bacterial species can be characterized by the presence of a pangenome consisting of a core genome containing genes present in all strains and a dispensable genome consisting of partially shared and strain-specific genes (53, 54). This concept is rooted in the earlier ideas of Reanney (43) and Campbell (7) concerning the structure of bacterial populations, and it indicates both that there is a pool of accessory genetic information in bacterial species and that strains of the same or even different species can obtain this information by horizontal transfer mechanisms (7, 43).

Genome size and diversity are related to bacterial lifestyle. Small genomes are typical of strict pathogens such as Rickettsia prowazekii (2) and endosymbionts such as Buchnera aphidicola (44a). In contrast, free-living bacteria, such as Pseudomonas syringae and Streptomyces coelicolor, have large genomes (4, 6). The bacteria with the largest genomes are common inhabitants of heterogeneous environments, such as soil, where energy sources are limited but diverse (32). An increase in genome size is attributable mainly to expansion of functions such as secondary metabolism, transport of metabolites, and gene regulation. All these features are common to the nitrogen-fixing symbiotic bacteria of legumes, which are collectively known as rhizobia, and their close relative the plant pathogen Agrobacterium. The genomes of such bacterial species have diverse architectures with circular chromosomes of different sizes or linear chromosomes, like that in Agrobacterium species, and the organisms contain variable numbers of large plasmids (31, 49). Comparative genomic studies have highlighted the conservation of gene content and order among the chromosomes of some species of rhizobia (22, 23, 25, 40). Furthermore, Guerrero and colleagues (25) observed that most essential genes occur in syntenic arrangements and display a higher level of sequence identity than nonsyntenic genes. In contrast, plasmids, including symbiotic plasmids and symbiotic chromosomal islands (like those in Mesorhizobium loti and Bradyrhizobium japonicum), are poorly conserved in terms of both gene content and gene order (21). It is not clear what evolutionary advantage, if any, is provided by multipartite genomes, but some authors have speculated that such genomes may allow further accumulation of genes independent of the chromosome. Recently, Slater and coworkers (46) proposed a model for the origin of secondary chromosomes. Their idea is based on the notion of intragenomic gene transfers that might occur from primary chromosomes to ancestral plasmids of the repABC type. Observations of conservation of clusters of genes in secondary chromosomes or in large plasmids that retain synteny with respect to the main chromosome support this hypothesis (46).

We have been studying Rhizobium etli as a multipartite genome model species (23). This organism is a free-living soil bacterium that is able to form nodules and fix nitrogen in the roots of bean plants. The genome of R. etli is partitioned into several replicons: a circular chromosome and several large plasmids. In the reference strain R. etli CFN42, the genome is composed of a circular chromosome consisting of about 4,381 kb and 6 large plasmids whose total size is 2,148 kb (23). A 371-kb plasmid, termed pSym or the symbiotic plasmid, contains most of the genes required for symbiosis (21). Previous studies have described the high level of genetic diversity among geographically different R. etli isolates (41). The strains are also variable with respect to the number and size of plasmids. Nevertheless, there has been no direct measurement of diversity at the genomic level, nor have comparative studies of shared and particular genomic features of R. etli strains been reported. Therefore, to assess the degrees of genomic difference and genomic similarity in R. etli, we obtained the complete genomic sequence of an additional R. etli strain and partial genomic sequences of six other R. etli strains isolated worldwide. Our results support the concept of a pangenomic structure at the multireplicon level and show that a highly conserved symbiotic plasmid is present in divergent R. etli isolates.


Bacterial strains.

Seven strains of R. etli were chosen for this study (Table 1), and in our analysis we also used the complete genomic sequences of R. etli CFN42 (referred to below as RetCFN42) and R. leguminosarum biovar viciae 3841 (referred to below as Rlv3841) described previously (23, 58). R. etli strains were previously classified by using the following taxonomic criteria: the ability to nodulate and fix nitrogen in common bean plants, the presence of reiterations of the nifHDK operon encoding nitrogenase, and demonstration of species-specific growth characteristics. The strains were isolated from distinct locations worldwide and had different plasmid contents (Table 1).

TABLE 1.
Genome features and sequence data for the R. etli strains used in this work

DNA sequencing.

Random genomic sequences were obtained from the seven strains by the shotgun method, and this was followed by capillary DNA sequencing using an ABI3730XL automatic DNA sequencing machine (Applied Biosystems, Foster City, CA). We determined the complete genomic sequence for one strain, R. etli CIAT652 (referred to below as RetCIAT652), and partial genomic sequences (coverage, about 1×) for the other six strains (Table 1). Assemblies were obtained using Phred-Phrap-Consed software (15, 16, 24). Gaps were filled in by use of appropriate PCR amplification. Partial assemblies were obtained for the low-coverage genomes using only high-quality readings. An ad hoc Perl script was used to trim at least the first 20 low-quality bases at the 5′ and 3′ ends of each read. Reads with Phred values lower than Q20 were discarded.
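The read-cleaning step can be illustrated with a minimal Python sketch. It is not the authors' Perl script, which is not reproduced in the text. It assumes a fixed trim of 20 bases at each end and interprets "Phred values lower than Q20" as a mean Phred score below 20 over the remaining bases; the function name `trim_and_filter` and both defaults are hypothetical.

```python
def trim_and_filter(bases, quals, trim=20, min_mean_q=20):
    """Trim `trim` bases from both ends, then reject low-quality reads.

    Assumptions (not from the paper): the trim length is fixed, and a read
    is rejected when the mean Phred score of the retained bases is below
    `min_mean_q`. Returns (bases, quals) or None if the read is rejected.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quality scores must have the same length")
    if len(bases) <= 2 * trim:
        return None  # nothing would remain after trimming both ends
    end = len(bases) - trim
    kept_bases = bases[trim:end]
    kept_quals = quals[trim:end]
    if sum(kept_quals) / len(kept_quals) < min_mean_q:
        return None  # read fails the quality threshold
    return kept_bases, kept_quals


# Hypothetical 60-base read: poor ends, good middle.
read = "A" * 20 + "ACGT" * 5 + "T" * 20
scores = [5] * 20 + [30] * 20 + [5] * 20
cleaned = trim_and_filter(read, scores)
```

A sliding-window trimmer would follow the phrase "at least the first 20" more closely; the fixed window is used here only to keep the sketch short.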

Genome size prediction.

We used the formula of Fraser and Fleischmann (19) to estimate genome size (L): L = −nw/ln(c/n), where n is the number of clones, w is the average read length, and c is the number of contigs. To predict the fraction (percentage) (p) of DNA which was not sequenced and to estimate genome coverage, we employed the formula of Lander and Waterman, p = e^(−nw/L) (33). To evaluate the accuracy of our predictions, we assembled 13,000 random reads for the complete genomes of RetCFN42 and RetCIAT652 and then predicted the genome sizes of the other strains using the formula described above and the partial genomic sequences. We found good agreement between the predictions and the experimentally determined lengths of the complete genomes (Table 1).
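The two formulas are linked: substituting the genome size estimate into the Lander-Waterman expression gives p = c/n. The Python sketch below shows the arithmetic as a round trip. The read length of 500 bases and the genome length of 6,500 kb are hypothetical values chosen for illustration, not data from Table 1, and the function names are ours.

```python
import math


def estimate_genome_size(n_reads, mean_read_len, n_contigs):
    """Genome size L = -n*w / ln(c/n) (formula quoted in the text)."""
    if not 0 < n_contigs < n_reads:
        raise ValueError("need 0 < contigs < reads for a finite estimate")
    return -n_reads * mean_read_len / math.log(n_contigs / n_reads)


def unsequenced_fraction(n_reads, mean_read_len, genome_len):
    """Fraction of the genome not covered, p = exp(-n*w/L)."""
    return math.exp(-n_reads * mean_read_len / genome_len)


# Hypothetical round trip: 13,000 reads of 500 bases on a 6.5-Mb genome
# give 1x coverage, so about e^-1 (roughly 37%) remains unsequenced.
n, w, true_len = 13_000, 500, 6_500_000
p = unsequenced_fraction(n, w, true_len)
expected_contigs = n * p            # c = n * exp(-n*w/L)
recovered_len = estimate_genome_size(n, w, expected_contigs)
```

At 1× coverage the expected contig count is still a large fraction of the read count, which is why the estimate stays usable for the low-coverage strains.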

Protein families.

We performed pairwise comparisons of the whole proteomes of RetCFN42, RetCIAT652, and Rlv3841 using BLASTp with an E value of <1e-7 without filtering for low complexity and enabling the algorithm of Smith and Waterman (47). Orthologs were defined by constructing a similarity matrix using OrthoMCL (35). Similarity matrices were used to run the MCL program (14) to cluster orthologs (and recent paralogs) into families using the following parameters: inflation (I) = 1.5, initial iteration (i) = 1.4, initial inflation (l) = 5, scheme (K) = 7, and centering (c) = 1.2. Proteins that were not grouped into any family were classified as single-member families.

Distribution profiles of single-copy protein families.

Genes encoding members of single-member protein families were located in the replicons of RetCFN42 (seven replicons), RetCIAT652 (four replicons), and Rlv3841 (seven replicons). To identify replicons with common gene contents in the three genomes, we constructed a profile for each family using genome localization. The profiles were given the following numbers: 1 for the chromosomes of the three genomes; 2 to 7 for plasmids of RetCFN42 and Rlv3841, starting with the smallest plasmid; and 2 to 5 for plasmids of RetCIAT652, starting with the smallest plasmid. The absence of a gene in any replicon was encoded 0. As expected, linked genes in a replicon had the same profile. For instance, the profile 1-1-1 reflected genes present exclusively in chromosomes, and the profile 1-1-2 meant that the gene was located in the chromosome in two species but in plasmid 2 in the other species. To visualize the profiles, we used the program E-Burst V3 (17, 50).
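The profile encoding can be sketched in a few lines of Python. The family names and replicon assignments below are hypothetical; only the coding scheme (1 for a chromosome, 2 and above for plasmids ordered by size, 0 for absence) follows the description above, and the genome order in each profile string is an assumption.

```python
from collections import Counter

# Assumed genome order for the three positions of a profile string.
GENOMES = ("RetCFN42", "RetCIAT652", "Rlv3841")


def build_profile(locations, genomes=GENOMES):
    """Join the replicon codes of one family; a missing genome is coded 0."""
    return "-".join(str(locations.get(g, 0)) for g in genomes)


# Hypothetical single-member families mapped to replicon codes.
families = {
    "famA": {"RetCFN42": 1, "RetCIAT652": 1, "Rlv3841": 1},
    "famB": {"RetCFN42": 1, "RetCIAT652": 1, "Rlv3841": 2},
    "famC": {"RetCFN42": 7, "RetCIAT652": 4},  # absent from Rlv3841
    "famD": {"RetCFN42": 1, "RetCIAT652": 1, "Rlv3841": 1},
}

profiles = {name: build_profile(loc) for name, loc in families.items()}
profile_counts = Counter(profiles.values())  # how many families share a profile
```

Counting families per profile gives the cluster sizes that a tool such as E-Burst would then draw as circles.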

Annotation and comparative genomics.

The RetCIAT652 genome was annotated manually by following the gene model constructed for the previously reported genomic sequence of RetCFN42 (23). Open reading frames were predicted by using GLIMMER 2.0 (10, 44), and annotations were obtained by analysis of BLASTx hits with the nonredundant databases of GenBank and Interpro. To compare partial genomic sequences with the nonredundant database of GenBank, BLASTx searches were performed, and the top hits were classified with respect to organisms with which they matched. Additional comparisons of the complete genomes of RetCFN42, RetCIAT652, and Rlv3841 and the collection of shotgun genomic sequences of strains of R. etli were performed using either BLASTn, BLASTx, or Mummer (12). To be considered homologous contigs, genes, or proteins, alignments had to be 30% (60% in the case of contigs) identical over 60% of the length of the largest contig, gene, or protein. Chromosomal islands were defined as contiguous groups of genes that were present in one strain but not in another. Predictions of chromosomal islands were obtained with the aid of the Alien-Hunter program (55). Pangenomic prediction was performed using the power law fit method as described by Tettelin and colleagues (54).
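The homology criterion can be written as a small predicate. This Python sketch assumes that identity is given in percent and that coverage is the aligned length divided by the length of the longer sequence; the function and parameter names are hypothetical.

```python
def is_homologous(identity_pct, aligned_len, len_a, len_b, is_contig=False):
    """Apply the identity and coverage thresholds quoted in the text.

    Identity must reach 30% (60% for contigs) and the alignment must span
    at least 60% of the longer of the two sequences. Treating coverage as
    aligned_len / max(len_a, len_b) is an assumption.
    """
    min_identity = 60.0 if is_contig else 30.0
    coverage = aligned_len / max(len_a, len_b)
    return identity_pct >= min_identity and coverage >= 0.60


# Hypothetical hits between sequences of 1,000 and 900 residues.
hit_ok = is_homologous(35.0, 700, 1000, 900)            # passes both tests
hit_short = is_homologous(35.0, 500, 1000, 900)         # covers only 50%
hit_contig = is_homologous(45.0, 700, 1000, 900, True)  # below 60% identity
```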

Genome tree construction.

R. etli complete genomes and the contigs of partial genomes were compared all versus all by BLASTn. Then we used the median percentage of identity for bidirectional best hits between pairs of genomes as a similarity measure, making no distinction between coding and noncoding regions (48). Further, this value was used to estimate the evolutionary distances based on the numbers of substitutions per site, to construct a distance matrix, and subsequently to construct an unrooted tree by the neighbor-joining method (57). For estimation of evolutionary distances we used the Poisson distance (d), determined as follows: d = −lnμ, where μ is (q − 0.05)/0.95 and q is the percentage of identity.
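The distance calculation can be sketched as follows. The Python sketch assumes that q enters the formula as a fraction between 0 and 1 (so a BLASTn identity of 95% becomes 0.95), which is our reading of the formula rather than something stated in the text; the function names are hypothetical, and tree building is left out.

```python
import math
from statistics import median


def poisson_distance(identity_fraction):
    """d = -ln(mu), with mu = (q - 0.05) / 0.95 and q given as a fraction."""
    mu = (identity_fraction - 0.05) / 0.95
    if mu <= 0:
        raise ValueError("identity too low for a defined distance")
    return -math.log(mu)


def genome_distance(best_hit_identities_pct):
    """Distance between two genomes from the median identity (in percent)
    of their bidirectional best hits."""
    return poisson_distance(median(best_hit_identities_pct) / 100)


# Hypothetical identities for two genome pairs.
d_close = genome_distance([98.0, 99.0, 100.0])   # median 99%
d_far = genome_distance([88.0, 90.0, 95.0])      # median 90%
```

Pairwise values computed this way would fill the distance matrix passed to a neighbor-joining implementation.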

Nucleotide sequence accession numbers.

The complete sequences of the RetCIAT652 chromosome (accession number NC_010994) and plasmids pCIAT652a (NC_010998), pCIAT652b (NC_010996), and pCIAT652c (NC_010997) have been deposited in the GenBank database. Draft genomes of R. etli strains 8C-3 (NZ_ABRA00000000), Brasil5 (NZ_ABQZ00000000), CIAT894 (NZ_ABRD00000000), GR56 (NZ_AABRD00000000), IE4771 (NZ_ABRD00000000), and Kim5 (NZ_ABQY00000000) have also been deposited in the GenBank database.


General features of the genome of R. etli CIAT652.

R. etli strains have diverse genomic architectures highlighted by disparities in plasmid size and number (Table 1). To study the degree of intraspecies genomic similarity and divergence, we obtained the complete genomic sequence of RetCIAT652, a strain isolated in Costa Rica, and the partial sequences of six other R. etli strains isolated from various sites worldwide (23). RetCIAT652 had a circular chromosome consisting of about 4,513 kb and three plasmids (designated pCIAT652a, pCIAT652b, and pCIAT652c) consisting of about 414 kb, 429 kb, and 1,091 kb, respectively. The chromosome of RetCIAT652 is 131 kb larger than the chromosome of RetCFN42. In RetCIAT652, most of the genes required for symbiosis were carried on the pCIAT652b plasmid, which is 58 kb larger than the equivalent plasmid of R. etli CFN42, p42d. Annotation of the CIAT652 genome yielded 4,072 protein-encoding sequences (CDS) in functional classes and a substantial number (about 2,220) of hypothetical and orphan CDS. Compared with the CFN42 genome, the CIAT652 genome contained 473 more CDS with unknown functions and 215 fewer CDS for which functional annotations were available. Like the CFN42 genome, the CIAT652 genome contained a large number of CDS involved in transport and transcriptional regulation. There were 20 sigma subunits encoded in the CIAT652 genome, compared with the 23 sigma subunits encoded in the CFN42 genome.

Structural correspondence among RetCIAT652 and RetCFN42 replicons.

To establish structural correspondence among the replicons of the two R. etli strains, the chromosomal and plasmid sequences were aligned using Mummer (11). The chromosomes of both strains showed a straight line of synteny interrupted by several gaps of different sizes but without inversions or any other large rearrangements (Fig. 1). The plasmids were structurally heterogeneous, but some of them seemed to be equivalent. For instance, pCIAT652a had several large segments in common with p42e, as did pCIAT652b with p42d (pSym), and pCIAT652c with p42f and p42b. There were no matches with plasmids p42a and p42c of RetCFN42, indicating that these plasmids were not present in RetCIAT652.

FIG. 1.
Synteny relationships between RetCIAT652 and RetCFN42. (a) Nucmer plots of the chromosomes. (b) Nucmer plots of the concatenated plasmids.

Previous genomic comparisons of RetCFN42 and its close relative Rlv3841 showed that there is extensive chromosomal synteny and, to a lesser degree, synteny between some pairs of plasmids (9). A similar result was obtained when RetCIAT652 and Rlv3841 were compared (data not shown). Despite the divergence of these species, plasmid pCIAT652a showed conservation with pRL11 and plasmid pCIAT652c showed conservation with pRL9 and pRL12. The symbiotic plasmids of R. etli (p42d and pCIAT652b) are not related to any replicon in Rlv3841, except for 20 common genes required for symbiosis (9). Recently, the complete genome sequences of two strains of R. leguminosarum biovar trifolii (RtrWSM1325 and RtrWSM2304) were deposited in the GenBank database; for RtrWSM1325 the accession number for the chromosome was NC_012850 and the accession numbers for the plasmids were NC_12848, NC_12858, NC_12853, NC_12852, and NC_12854, and for RtrWSM2304 the accession number for the chromosome was NC_011369 and the accession numbers for the plasmids were NC_11366, NC_011368, NC_011370, and NC_011371. Strain RtrWSM1325 has 5 plasmids, and strain RtrWSM2304 contains 4 plasmids. We looked for plasmid equivalence between these strains and RetCFN42. Nucmer comparisons showed that the plasmids of the RtrWSM1325 and RtrWSM2304 strains have large syntenic regions in common with plasmids p42b, p42c, p42e, and p42f but not with the pSym (p42d) plasmid. This observation suggests that there may be a common plasmid pool for R. leguminosarum and R. etli.

Core genome of Rhizobium.

Synteny relationships among the replicons of the two R. etli strains and the single Rlv3841 strain indicate that the core genome of Rhizobium might not be confined to the chromosome but may extend to some plasmids. To test this possibility, we used a clustering method to group the whole predicted proteomes encoded by the three complete genomes into protein families and then examined the distributions of these families in replicons. We found that a set of 3,971 protein families was encoded in the three genomes; 3,753 of these protein families were protein products of single genes, whereas 218 families corresponded to families with two or more protein homologs encoded in the genomes (Fig. 2). The genes encoding the 3,753 single-protein families were localized in the replicons of the three genomes by constructing presence-absence profiles, as described in Materials and Methods. Genes encoding core protein families were found predominantly in chromosomes but were also present in plasmids common to the three genomes (Fig. 3). There were two main clusters that contained plasmid-encoded proteins. One cluster corresponded to p42f, pCIAT652c, and pRL12 genes encoding 242 proteins and represented 42% of the coding capacity of p42f, 23% of the coding capacity of pCIAT652c, and 30% of the coding capacity of pRL12. The second cluster consisted of 237 proteins encoded by genes in p42e (51% of the total coding capacity), pCIAT652a (59%), and pRL11 (37%). Furthermore, most genes encoding these proteins were arranged in syntenic segments common to the three plasmids (data not shown). Another cluster consisted of 91 proteins common to plasmid pCIAT652c, the largest plasmid in the CIAT652 genome, and plasmids p42b and pRL9. Since, as shown above, plasmid pCIAT652c is related to p42f and pRL12, this cluster shows that pCIAT652c might represent a chimeric structure that originated by interreplicon recombination. In addition, several other minor profiles appeared in the analysis, indicating that intragenomic recombination also involves small DNA segments (Fig. 3, smallest circles).

FIG. 2.
Common protein families in R. etli and Rlv3841. A total of 19,085 predicted proteins encoded by the three genomes were clustered into families using the OrthoMCL and MCL algorithms (14, 35) (see Materials and Methods). The total numbers of families are ...
FIG. 3.
Core genome profiles and their distribution in Rhizobium replicons. Single-member protein families were used to construct presence-absence profiles for replicons of the three genomes. Large circles represent the most common profiles, and small circles ...

Core genome stability is affected by other recombination processes, like gene duplications and gene loss. For construction of profiles we only used single-member proteins; thus, we were unable to observe the effect of gene duplications. However, gene loss was illustrated well by the presence of profiles that included proteins encoded by only two genomes in common clusters. Proteins encoded by only two genomes are subgroups of core replicons (data not shown). For example, proteins represented by the pCIAT652c-pCFN42f, pCIAT652c-pRL12, and pCFN42f-pRL12 profiles, as well as proteins encoded by two chromosomes, fall into this category (data not shown). A particular but important case in this category is the proteins encoded by the symbiotic plasmids that have a unique profile for the two R. etli strains and are grouped separate from the symbiotic plasmid pRL10 of Rlv3841 together with R. etli plasmid p42c (data not shown) (9). These data suggest that the genes that encode proteins involved in symbiosis are not part of the core genome of Rhizobium (Fig. 3). Quite a few of the symbiosis genes that have been identified are common among Rhizobium species (9, 22). In RetCIAT652, RetCFN42, and Rlv3841, only 11 genes were maintained in the symbiotic plasmids (cluster 18) (Fig. 3). These genes are nodA, nodB, nodC, fdxB, nodI, nodJ, fixX, fixA, fixB, fixC, and nifN.

Estimation of the size of the core genome of Rhizobium was performed by using the methods of Tettelin et al. (54) and the three complete Rhizobium genome sequences available (RetCFN42, RetCIAT652, and Rlv3841 sequences). This estimation yielded 3,220 core genes (with a 99% confidence interval) and predicted that about 99 new genes might be added to the pangenome by every new complete genome sequence (data not shown). Although the analysis was limited by the small number of complete Rhizobium genome sequences available, the estimate for the α parameter was 0.6. Since α represents the proportion of new genes discovered as more genomes are sampled, the fact that in this case α is <1 suggests that Rhizobium might have an open pangenome.
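A power-law fit of this kind can be illustrated with a short Python sketch. It assumes the common form n(N) = κN^(−α), where n is the number of new genes found when the Nth genome is added, and it estimates κ and α by ordinary least squares on log-transformed values. This is a simplification chosen for illustration and is not necessarily the fitting procedure of reference 54. The input data are synthetic, not results from this study.

```python
import math


def fit_power_law(n_genomes, new_genes):
    """Fit new_genes = kappa * N**(-alpha) by least squares in log-log space.

    The functional form and the log-log regression are assumptions made for
    this sketch. Returns (kappa, alpha).
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic series generated with kappa = 500 and alpha = 0.6.
genome_counts = [2, 3, 4, 5, 6]
synthetic_new_genes = [500 * n ** -0.6 for n in genome_counts]
kappa, alpha = fit_power_law(genome_counts, synthetic_new_genes)
is_open = alpha < 1  # under this form, alpha below 1 indicates an open pangenome
```

With only three complete genomes, as here, any such fit rests on very few points, which is the limitation noted in the text.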

Accessory genome of Rhizobium.

The difference between R. etli and R. leguminosarum can be measured by estimating the numbers of species-specific proteins. Rlv3841 has the highest number of individual protein families (2,071), whereas the two R. etli strains were significantly different for this characteristic (Fig. 2). There were 698 protein families of RetCFN42 that were not present in RetCIAT652 and 994 protein families in RetCIAT652 that were not present in RetCFN42. Most members of these families are hypothetical conserved proteins with unknown functions or orphans (23% and 35% for RetCFN42 and RetCIAT652, respectively). In contrast to core genes, which have an average G+C content of 61%, genes unique to each genome had low G+C contents (on average, 57 to 58%).

To further examine genomic differences between R. etli strains, we analyzed samples of genome sequences from six R. etli strains having distinct geographical origins. The data were obtained by random shotgun sequencing of whole genomes with coverage of about 1× that of the predicted genome length (Table 1). This coverage allowed estimation of the genome lengths for these strains of R. etli. The genome size varied over a wide range; strain GR56, an isolate from Spain, had the smallest genome (about 5,000 kb), and RetCFN42 and RetCIAT652 had the largest genomes. Thus, the differences in genome sizes among the strains of R. etli examined here were on the order of hundreds of kilobases to 1,500 kb (Table 1).

To determine genomic relationships among R. etli strains, BLASTn analysis was used to compare the contigs of each partial genomic sequence with the whole-genome sequences of RetCFN42 and RetCIAT652 and the GenBank nonredundant database. As expected, most sequences of R. etli strains were present in the complete genomes (Fig. 4, red and orange bars). These sequences were equivalent to approximately 3,000 kb of common DNA and thus represented less than 50% of the total genome of RetCFN42 or RetCIAT652. In addition, there was about 1,000 kb in each strain for which no homologous sequences were detected in the complete RetCFN42 genome. A smaller, but still substantial, amount of extra DNA was found when similar comparisons were performed using the RetCIAT652 genome as the reference (Fig. 4). A proportion of this extra DNA showed matches to sequences of other organisms deposited in the GenBank database. However, most of the extra DNA did not match any other sequence in the database. A small proportion of the extra DNA was similar to sequences present in at least one other R. etli strain (Fig. 4, dark blue and light blue bars).

FIG. 4.
DNA common to or unique in R. etli strains. DNA readings from each partial genome sequence assembled into contigs were compared, using BLASTn, with the complete genomic sequences of RetCIAT652 (red bars) and RetCFN42 (orange bars), using the parameters ...

Chromosomal islands in R. etli.

When the collection of shotgun genomic sequences was aligned with the sequence of the chromosome of either RetCFN42 or RetCIAT652, the distribution of sequences along the chromosomes was found to be essentially random. Almost every chromosomal region of RetCFN42 and RetCIAT652 contained sequences present in at least one strain. Nevertheless, some chromosomal regions in RetCFN42 and RetCIAT652 contained sequences with no matches with any sequence from either incompletely or completely sequenced R. etli strains or Rlv3841 (Fig. 5). A prediction of the Alien Hunter program suggested that many of these regions are chromosomal islands that were acquired by horizontal transfer. Such regions differ from the average with respect to nucleotide composition and codon usage. We analyzed 13 chromosomal islands that were present exclusively in RetCFN42 and 12 chromosomal islands that were unique to RetCIAT652. As Fig. 5 shows, these islands were for the most part not present in the other R. etli strains examined or in Rlv3841. The chromosomal islands were variable in length (range, about 8 kb to 69 kb), and the largest chromosomal island was found in the RetCFN42 chromosome. The islands were dispersed throughout the chromosomes, and only three locations appeared to be preferentially used for island insertion. Islands 3, 4, and 6 of RetCFN42 had the same locations as islands in the RetCIAT652 chromosome (islands 3, 4, and 5), but the islands differed in gene composition. Island 3, between the tRNAHis and tRNAGln genes in the RetCFN42 chromosome, contains genes already described as the α-lps region, which is involved in synthesis, maturation, and transport of the O antigen (8, 13). Although mutations in some genes of this locus affect the symbiotic capabilities of R. etli CE3 (a streptomycin-resistant strain derived from RetCFN42), the presence of this island is not widespread in R. etli. In contrast, an island at the same position was found in the chromosome of RetCIAT652, and it also seemed to contain genes involved in polysaccharide synthesis and transport; however, none of these genes was homologous to genes of the α-lps loci. Island 4 of both chromosomes was highly variable in terms of genetic content, whereas island 6 in RetCFN42 (island 5 in RetCIAT652) was located near the putative terminus of replication, a region of the bacterial chromosome thought to undergo frequent genetic rearrangement. The other islands harbored most genes unique to RetCFN42 or RetCIAT652. These include genes related to a variety of enzyme activities, genes with unknown functions, and mobile elements (insertion sequences and genes of phage and plasmid origin). Genomic comparisons of B. japonicum strains using macroarrays of the reference strain B. japonicum USDA110 showed the presence of 14 genomic islands, some of which were associated with the symbiotic performance of strain USDA110 (29). Thus, the presence of genomic islands in R. etli might also be related to some symbiotic capabilities, but this possibility was not addressed here.

FIG. 5.
Chromosomal islands in R. etli. Readings for every partial genomic sequence were aligned, using BLASTn, with the chromosomes of RetCIAT652 (a) and RetCFN42 (b). The outer blue circles represent sequences from strains 8C-3, Brasil 5, CIAT894, GR56, IE4771, ...

DNA conservation among R. etli pSym plasmids.

We showed that the two complete R. etli genome sequences displayed a high degree of conservation at the chromosomal level and that large syntenic segments were present in plasmids. To evaluate the level of DNA divergence between homologous replicons of the two completely sequenced R. etli genomes, we compiled all local alignments made by BLASTn. The aligned regions of plasmids pCIAT652a and pCIAT652c, as well as those of the chromosomes, had levels of nucleotide identity of about 85 to 95% (Fig. 6). In marked contrast, sequences that the pSym plasmid of RetCFN42 (pCFN42d) and the pSym plasmid of RetCIAT652 (pCIAT652b) had in common showed the highest levels of nucleotide identity for these genomes (98 to 100%) (Fig. 6). Mummer alignments of the collection of partial genomic sequences with the complete genome sequence of RetCFN42 showed a similar pattern of nucleotide identity in the pSym plasmids, in contrast to the rest of the genomic sequences (Table 2). An exception to this pattern was strain IE4771, an isolate from Puebla, México, which had the most divergent pSym sequences compared with the pSym plasmid sequences of both RetCFN42 and RetCIAT652. Strain IE4771 also had a low proportion of pSym sequences (less than 10%) compared with other strains, which clearly contained at least 50% of the pSym sequence (Table 2). This indicates that the IE4771 strain lost pSym or that a different type of pSym plasmid is present. The latter suggestion seems more plausible because BLASTx comparisons of the partial sequence of strain IE4771 yielded matches with some pSym genes of RetCFN42 or RetCIAT652, including nifH, nifA, fixN, fixA, and fixB.

FIG. 6.
Nucleotide identities for the replicons of RetCIAT652 and RetCFN42. Local alignments constructed by BLASTn for the two genomes were computed using individual maximal segment pairs (MSPs). All MSPs more than 200 bp long with levels of nucleotide identity ...
TABLE 2.
Nucleotide identities of pSym sequences

When the R. etli CIAT652 genome was used as a reference in a comparison of the partial genome sequences, other conservation patterns were observed. Strains 8C-3 and Brasil 5 exhibited a high level of nucleotide conservation compared to RetCIAT652, and there were no appreciable differences in nucleotide identities among the pSym sequences and the rest of their genomes. Two strains (IE4771 and Kim5) had the lowest levels of nucleotide identity in the genome as a whole, and only strains CIAT894 and GR56 had pSym sequences that were more conserved than the chromosome (Table 2).

To evaluate whether there is a relationship between the overall genomic divergence in R. etli strains and the conservation of pSym, we constructed a distance matrix tree based on BLASTn comparisons performed in an all-versus-all manner, using both the two complete genomes and the partial genomic sequences of R. etli and including Rlv3841 as the outgroup (Fig. 7). This tree showed that RetCIAT652, 8C-3, and Brasil 5 are very closely related but the rest of the strains have diverged to different degrees. In particular, RetCFN42 appeared to have no close relatives. Based on all of these observations, we concluded that pSym sequences are highly conserved and dispersed in variable genomic backgrounds.

FIG. 7.
Genetic relatedness of R. etli strains. A distance matrix derived from all-versus-all BLASTn alignments was used to estimate the degrees of divergence among R. etli strains and Rlv3841. Next, an unrooted tree was constructed using the neighbor-joining ...


Previous theories of bacterial evolution emphasized that bacteria have evolved a “strategy to expand the effective genome size of the species without imposing on each individual the burden of reproducing the entire genome” (7). Campbell and other researchers hypothesized that a bacterial genome is composed of a “euchromosome” and “accessory elements” (7, 43). The terms have been changed in the modern genomic era and are now the “core genome” and “pangenome” (3, 37, 52). The core genome is defined as the set of genes shared by all members of a monophyletic group (1). In contrast, the pangenome is an expanded version of the repertoire of genes found in a species, and accessory genes are not present in all individual bacteria. Genome sequences of a number of strains of species such as S. agalactiae, E. coli, Haemophilus influenzae, and Streptococcus pneumoniae have provided support for this concept (27, 28, 42, 52).
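The definitions above map directly onto set operations. As a minimal sketch, assuming that each strain has already been reduced to a set of gene-family labels (the function name `partition_pangenome`, the strain names, and the family labels are all hypothetical), the pangenome is the union of the sets, the core genome is their intersection, and the accessory component is the difference between the two.

```python
def partition_pangenome(strain_genes):
    """Split gene families into pangenome, core, and accessory sets.

    strain_genes maps a strain name to an iterable of gene-family labels.
    Also returns, per strain, the families found in that strain only.
    """
    if not strain_genes:
        raise ValueError("at least one strain is required")
    gene_sets = {strain: set(genes) for strain, genes in strain_genes.items()}
    pangenome = set().union(*gene_sets.values())   # found in any strain
    core = set.intersection(*gene_sets.values())   # found in every strain
    accessory = pangenome - core                   # missing from some strain
    strain_specific = {}
    for strain, genes in gene_sets.items():
        others = set().union(*(g for s, g in gene_sets.items() if s != strain))
        strain_specific[strain] = genes - others
    return pangenome, core, accessory, strain_specific


# Hypothetical strains and gene families, for illustration only.
toy_strains = {
    "strainA": ["fam1", "fam2", "fam3"],
    "strainB": ["fam1", "fam2", "fam4"],
    "strainC": ["fam1", "fam2", "fam3", "fam5"],
}
pan, core, accessory, specific = partition_pangenome(toy_strains)
print(sorted(core), sorted(accessory))
```

With partial, low-coverage sequences, a family that is absent from a strain's set may simply be unsequenced, so a core estimated this way is a lower bound.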

We have previously sequenced the complete genome of RetCFN42, the reference strain. Here we approached an understanding of the pangenomic structure of R. etli by sequencing another complete genome and by low-coverage sequencing of six other genomes of strains of R. etli. We found a set of genes that might represent the core genome of Rhizobium by comparing sequences of the two complete R. etli genomes and the sequence of the close relative Rlv3841. Furthermore, we found that in R. etli (and Rlv3841) the core genome is not limited to the chromosome but also extends to some large plasmids. It is known that members of the rhizobia have multipartitioned genomes composed of the chromosome and a variable number of plasmids (31, 49). Genes of the core genome are commonly carried on the chromosome and are maintained in syntenic blocks in closely related species (25, 51). In contrast, neither symbiotic plasmids nor other plasmids are conserved, except for the presence of a few common genes (21, 23). We found here that plasmids p42f, pCIAT652c, and pRL12 are highly related in terms of gene content, as are plasmids p42e, pCIAT652a, and pRL11, and that these plasmids might be considered part of the core genome. The large plasmids and the linear chromosome of Agrobacterium, chromosome II of Brucella, and the pSymB plasmid of Sinorhizobium meliloti might also be viewed as secondary chromosomes (46). The sizes of the replicons, the G+C content, and the presence of certain classes of genes normally found in primary chromosomes make this possibility plausible. Recently, Slater and colleagues suggested that secondary chromosomes originated via intragenomic gene transfers from primary chromosomes to an ancestral repABC replicon (46). Evidence for this hypothesis comes from the conservation of several syntenic blocks of genes, such as minCDE (cell division proteins), hutIGU (histidine utilization), and pcaGHID (protocatechuate degradation), in secondary chromosomes and plasmids across members of the rhizobia (46). These three blocks of genes are located in the p42e-pCIAT652a-pRL11 cluster that is included in the set of genes encoding core proteins. Our comparison in the present work revealed that a substantial proportion of about 479 single-member protein families are encoded in the largest plasmids common to the genomes of R. etli and Rlv3841. Furthermore, most genes encoding these proteins were arranged in common syntenic segments (data not shown).

We found that a substantial proportion of DNA in the newly partially sequenced strains of R. etli was not present in the model strain RetCFN42 and in RetCIAT652, whose new complete genome sequence is reported here. There were 738 and 1,002 different protein-encoding genes in RetCFN42 and RetCIAT652, respectively, for which complete genome sequences are available. Similar amounts of extra DNA were detected when partial genomic sequences of various R. etli strains were compared to sequences of the complete genomes of RetCFN42 and RetCIAT652. The extra DNA represents the accessory component of the R. etli pangenome. This DNA has a low G+C content and contains numerous hypothetical genes and mobile elements that are also common in the accessory components of other bacterial species. Strain-specific chromosomal islands, which were shown here to be present in the chromosomes of RetCFN42 and RetCIAT652, are some of the locations of such extra DNA. The plasmid pool of R. etli is variable and also contributes importantly to the extra DNA, but it is not known to what extent.

A striking result of the present work was the high level of nucleotide identity in homologous segments of pSym of R. etli, in contrast to the more divergent sequences seen in the rest of the genome. In these segments there are 210 very conserved CDS (98 to 100% nucleotide identity), which represent 60% of the coding capacity of the pSym plasmid of RetCFN42, including known symbiosis genes (nif, nod, fix, fdx), as well as other genes not involved in symbiosis (vir, tra) and hypothetical genes. Only strain IE4771 displayed a low level of nucleotide identity and a low level of coverage compared with pSym sequences of RetCFN42 and RetCIAT652. According to 16S rRNA gene data and nodulation tests, strain IE4771 belongs to an R. etli group that is distinct because it has two copies, not three copies, of nifH (three copies are usual in the more common R. etli strains) (45). These data suggest that at least two types of symbiotic plasmid may exist in the R. etli population. One plasmid would be highly conserved, often found in R. etli isolates, and prototypically defined by three nifH reiterations. This plasmid is exemplified by the pSym plasmids of RetCFN42 and RetCIAT652. The other, more divergent pSym plasmid, which is present in isolates from Puebla, México, has not been characterized at the genomic level yet. A recent origin of pSym is the simplest explanation for the high level of conservation of pSym in very divergent R. etli isolates. Alternatively, nodulation performance might provide strong selection pressure, selecting against any variation in pSym. The latter hypothesis seems improbable, as pSym genes with roles in nodulation and genes having unknown functions both have identical nucleotide sequences. In previous work, we analyzed the patterns of single-nucleotide polymorphisms in DNA sequences of the pSym plasmids of several R. etli strains (some of which were included in this study) (18). The data indicate that most of the nucleotide substitutions are spread over the population by recombination and that the contribution of mutations to polymorphism is relatively low. In agreement with this model, very few nucleotide variations were found in the pSym sequences compared here.

Several years ago Palacios and colleagues (38) asked how many genotypes would be capable of conferring the R. etli phenotype. Our data indicate that a unique pSym genotype (or perhaps very few pSym genotypes) might be responsible for the ability of R. etli to nodulate the common bean. Other comparative genomics studies using microarrays have shown that the symbiotic plasmid pSymA is the most variable replicon in strains of S. meliloti (20, 26). This result contrasts with the pSym conservation in the R. etli strains compared here. Since it was demonstrated that the ability to nodulate is encoded by plasmids (30), it has been common to call these plasmids “symbiotic.” New genome sequencing technologies and comparative genomics have revealed that a variety of mechanisms have led to symbiosis with legumes (36) and that pSym plasmids in rhizobial species are not comparable despite the fact that they have some common nod and nif genes. It should be emphasized that as genome sequence technology is becoming more accessible, it is now feasible to analyze many more R. etli genomes to understand diversification and evolution in R. etli pSym plasmids and the genome of the species as a whole. Future work should also help determine more accurately the sizes of the core and accessory genomes, as well as the size of the plasmid pool of R. etli. Lastly, a clear picture of the evolutionary relationships among the different genome components of Rhizobium should emerge from studies performed with this kind of approach.


We thank José Espíritu and Víctor del Moral for help with computational resources and Miguel A. Cevallos for critical reading of the manuscript. We also thank the anonymous reviewers for their valuable suggestions.

This work was supported by grants from CONACyT (grant U4633) and PAPIIT-UNAM (grants IN215908 and IN223005).

V.G. was responsible for the experimental design and manuscript preparation; J.L.A. was responsible for the comparative genomic analysis; R.I.S. and P.B. were responsible for genome sequencing and annotation; I.L.H.G. was responsible for bioinformatic analysis; J.L.F. and R.D. were responsible for genome sequencing; M.F., R.P., and G.D. were responsible for discussion of the data and provision of supporting materials; and J.M. was responsible for genome sequencing and participated in discussions. All authors read and approved the final manuscript.


▿ Published ahead of print on 4 January 2010.


1. Abby, S., and V. Daubin. 2007. Comparative genomics and the evolution of prokaryotes. Trends Microbiol. 15:135-141. [PubMed]
2. Andersson, S. G., A. Zomorodipour, J. O. Andersson, T. Sicheritz-Ponten, U. C. Alsmark, R. M. Podowski, A. K. Naslund, A. S. Eriksson, H. H. Winkler, and C. G. Kurland. 1998. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396:133-140. [PubMed]
3. Bentley, S. 2009. Sequencing the species pan-genome. Nat. Rev. Microbiol. 7:258-259. [PubMed]
4. Bentley, S. D., K. F. Chater, A. M. Cerdeno-Tarraga, G. L. Challis, N. R. Thomson, K. D. James, D. E. Harris, M. A. Quail, H. Kieser, D. Harper, A. Bateman, S. Brown, G. Chandra, C. W. Chen, M. Collins, A. Cronin, A. Fraser, A. Goble, J. Hidalgo, T. Hornsby, S. Howarth, C. H. Huang, T. Kieser, L. Larke, L. Murphy, K. Oliver, S. O'Neil, E. Rabbinowitsch, M. A. Rajandream, K. Rutherford, S. Rutter, K. Seeger, D. Saunders, S. Sharp, R. Squares, S. Squares, K. Taylor, T. Warren, A. Wietzorrek, J. Woodward, B. G. Barrell, J. Parkhill, and D. A. Hopwood. 2002. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417:141-147. [PubMed]
5. Bergthorsson, U., and H. Ochman. 1995. Heterogeneity of genome sizes among natural isolates of Escherichia coli. J. Bacteriol. 177:5784-5789. [PMC free article] [PubMed]
6. Buell, C. R., V. Joardar, M. Lindeberg, J. Selengut, I. T. Paulsen, M. L. Gwinn, R. J. Dodson, R. T. Deboy, A. S. Durkin, J. F. Kolonay, R. Madupu, S. Daugherty, L. Brinkac, M. J. Beanan, D. H. Haft, W. C. Nelson, T. Davidsen, N. Zafar, L. Zhou, J. Liu, Q. Yuan, H. Khouri, N. Fedorova, B. Tran, D. Russell, K. Berry, T. Utterback, S. E. Van Aken, T. V. Feldblyum, M. D'Ascenzo, W. L. Deng, A. R. Ramos, J. R. Alfano, S. Cartinhour, A. K. Chatterjee, T. P. Delaney, S. G. Lazarowitz, G. B. Martin, D. J. Schneider, X. Tang, C. L. Bender, O. White, C. M. Fraser, and A. Collmer. 2003. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc. Natl. Acad. Sci. U. S. A. 100:10181-10186. [PubMed]
7. Campbell, A. 1981. Evolutionary significance of accessory DNA elements in bacteria. Annu. Rev. Microbiol. 35:55-83. [PubMed]
8. Carlson, R. W., B. Reuhs, T. B. Chen, U. R. Bhat, and K. D. Noel. 1995. Lipopolysaccharide core structures in Rhizobium etli and mutants deficient in O-antigen. J. Biol. Chem. 270:11783-11788. [PubMed]
9. Crossman, L. C., S. Castillo-Ramírez, C. McAnnula, L. Lozano, G. S. Vernikos, J. L. Acosta, Z. F. Ghazoui, I. Hernández-González, G. Meakin, A. W. Walker, M. F. Hynes, J. P. Young, J. A. Downie, D. Romero, A. W. Johnston, G. Dávila, J. Parkhill, and V. González. 2008. A common genomic framework for a diverse assembly of plasmids in the symbiotic nitrogen fixing bacteria. PLoS One 3:e2567. [PMC free article] [PubMed]
10. Delcher, A. L., D. Harmon, S. Kasif, O. White, and S. L. Salzberg. 1999. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 27:4636-4641. [PMC free article] [PubMed]
11. Delcher, A. L., A. Phillippy, J. Carlton, and S. L. Salzberg. 2002. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30:2478-2483. [PMC free article] [PubMed]
12. Delcher, A. L., S. L. Salzberg, and A. M. Phillippy. 2003. Using MUMmer to identify similar regions in large sequence sets. Chapter 10, unit 10.3. Curr. Protoc. Bioinformatics 2003:10.3.1-10.3.18. doi:10.1002/0471250953.bi1003s00 [PubMed] [Cross Ref]
13. Duelli, D. M., A. Tobin, J. M. Box, V. S. Kolli, R. W. Carlson, and K. D. Noel. 2001. Genetic locus required for antigenic maturation of Rhizobium etli CE3 lipopolysaccharide. J. Bacteriol. 183:6054-6064. [PMC free article] [PubMed]
14. Enright, A. J., S. Van Dongen, and C. A. Ouzounis. 2002. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30:1575-1584. [PMC free article] [PubMed]
15. Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8:186-194. [PubMed]
16. Ewing, B., L. Hillier, M. C. Wendl, and P. Green. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8:175-185. [PubMed]
17. Feil, E. J., B. C. Li, D. M. Aanensen, W. P. Hanage, and B. G. Spratt. 2004. eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data. J. Bacteriol. 186:1518-1530. [PMC free article] [PubMed]
18. Flores, M., L. Morales, A. Avila, V. González, P. Bustos, D. García, Y. Mora, X. Guo, J. Collado-Vides, D. Piñero, G. Dávila, J. Mora, and R. Palacios. 2005. Diversification of DNA sequences in the symbiotic genome of Rhizobium etli. J. Bacteriol. 187:7185-7192. [PMC free article] [PubMed]
19. Fraser, C. M., and R. D. Fleischmann. 1997. Strategies for whole microbial genome sequencing and analysis. Electrophoresis 18:1207-1216. [PubMed]
20. Giuntini, E., A. Mengoni, C. De Filippo, D. Cavalieri, N. Aubin-Horth, C. R. Landry, A. Becker, and M. Bazzicalupo. 2005. Large-scale genetic variation of the symbiosis-required megaplasmid pSymA revealed by comparative genomic analysis of Sinorhizobium meliloti natural strains. BMC Genomics 6:158. [PMC free article] [PubMed]
21. González, V., P. Bustos, M. A. Ramírez-Romero, A. Medrano-Soto, H. Salgado, I. Hernández-González, J. C. Hernández-Celis, V. Quintero, G. Moreno-Hagelsieb, L. Girard, O. Rodríguez, M. Flores, M. A. Cevallos, J. Collado-Vides, D. Romero, and G. Dávila. 2003. The mosaic structure of the symbiotic plasmid of Rhizobium etli CFN42 and its relation to other symbiotic genome compartments. Genome Biol. 4:R36. [PMC free article] [PubMed]
22. González, V., L. Lozano, S. Castillo-Ramírez, I. Hernández-González, P. Bustos, R. I. Santamaría, J. L. Fernández, J. L. Acosta, and G. Dávila. 2008. Evolutionary genomics of the nitrogen-fixing symbiotic bacteria, p. 183-198. In P. Dion and C. S. Nautiyal (ed.), Molecular mechanisms of plant and microbe coexistence, vol. 15. Springer, Heidelberg, Germany.
23. González, V., R. I. Santamaría, P. Bustos, I. Hernández-González, A. Medrano-Soto, G. Moreno-Hagelsieb, S. C. Janga, M. A. Ramírez, V. Jimenez-Jacinto, J. Collado-Vides, and G. Dávila. 2006. The partitioned Rhizobium etli genome: genetic and metabolic redundancy in seven interacting replicons. Proc. Natl. Acad. Sci. U. S. A. 103:3834-3839. [PubMed]
24. Gordon, D., C. Abajian, and P. Green. 1998. Consed: a graphical tool for sequence finishing. Genome Res. 8:195-202. [PubMed]
25. Guerrero, G., H. Peralta, A. Aguilar, R. Díaz, M. A. Villalobos, A. Medrano-Soto, and J. Mora. 2005. Evolutionary, structural and functional relationships revealed by comparative analysis of syntenic genes in Rhizobiales. BMC Evol. Biol. 5:55. [PMC free article] [PubMed]
26. Guo, H., S. Sun, B. Eardly, T. Finan, and J. Xu. 2009. Genome variation in the symbiotic nitrogen-fixing bacterium Sinorhizobium meliloti. Genome 52:862-875. [PubMed]
27. Hiller, N. L., B. Janto, J. S. Hogg, R. Boissy, S. Yu, E. Powell, R. Keefe, N. E. Ehrlich, K. Shen, J. Hayes, K. Barbadora, W. Klimke, D. Dernovoy, T. Tatusova, J. Parkhill, S. D. Bentley, J. C. Post, G. D. Ehrlich, and F. Z. Hu. 2007. Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome. J. Bacteriol. 189:8186-8195. [PMC free article] [PubMed]
28. Hogg, J. S., F. Z. Hu, B. Janto, R. Boissy, J. Hayes, R. Keefe, J. C. Post, and G. D. Ehrlich. 2007. Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains. Genome Biol. 8:R103. [PMC free article] [PubMed]
29. Itakura, M., K. Saeki, H. Omori, T. Yokoyama, T. Kaneko, S. Tabata, T. Ohwada, S. Tajima, T. Uchiumi, K. Honnma, K. Fujita, H. Iwata, Y. Saeki, Y. Hara, S. Ikeda, S. Eda, H. Mitsui, and K. Minamisawa. 2009. Genomic comparison of Bradyrhizobium japonicum strains with different symbiotic nitrogen-fixing capabilities and other Bradyrhizobiaceae members. ISME J. 3:326-339. [PubMed]
30. Johnston, A. W. B., J. L. Beynon, A. V. Buchanan-Wollaston, S. M. Setchell, P. R. Hirsch, and J. E. Beringer. 1978. High frequency transfer of nodulating ability between strains and species of Rhizobium. Nature 276:634-636.
31. Jumas-Bilak, E., S. Michaux-Charachon, G. Bourg, M. Ramuz, and A. Allardet-Servent. 1998. Unconventional genomic organization in the alpha subgroup of the Proteobacteria. J. Bacteriol. 180:2749-2755. [PMC free article] [PubMed]
32. Konstantinidis, K. T., and J. M. Tiedje. 2004. Trends between gene content and genome size in prokaryotic species with larger genomes. Proc. Natl. Acad. Sci. U. S. A. 101:3160-3165. [PubMed]
33. Lander, E. S., and M. S. Waterman. 1988. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2:231-239. [PubMed]
34. Legault, B. A., A. López-López, J. C. Alba-Casado, W. F. Doolittle, H. Bolhuis, F. Rodríguez-Valera, and R. T. Papke. 2006. Environmental genomics of “Haloquadratum walsbyi” in a saltern crystallizer indicates a large pool of accessory genes in an otherwise coherent species. BMC Genomics 7:171. [PMC free article] [PubMed]
35. Li, L., C. J. Stoeckert, Jr., and D. S. Roos. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178-2189. [PubMed]
36. Masson-Boivin, C., E. Giraud, X. Perret, and J. Batut. 2009. Establishing nitrogen-fixing symbiosis with legumes: how many rhizobium recipes? Trends Microbiol. 17:458-466. [PubMed]
37. Medini, D., C. Donati, H. Tettelin, V. Masignani, and R. Rappuoli. 2005. The microbial pan-genome. Curr. Opin. Genet. Dev. 15:589-594. [PubMed]
38. Palacios, R., M. Flores, S. Brom, E. Martínez, V. González, S. Frenk, C. Quinto, M. A. Cevallos, L. Girard, D. Romero, A. Garciarrubio, D. Piñero, and G. Dávila. 1986. Organization of the Rhizobium phaseoli genome, p. 151-156. In D. P. S. Verma and N. Brisson (ed.), Molecular genetics of plant-microbe interactions. Martinus Nijhoff Publishers Organization, Dordrecht, the Netherlands.
39. Pallen, M. J., and B. W. Wren. 2007. Bacterial pathogenomics. Nature 449:835-842. [PubMed]
40. Paulsen, I. T., R. Seshadri, K. E. Nelson, J. A. Eisen, J. F. Heidelberg, T. D. Read, R. J. Dodson, L. Umayam, L. M. Brinkac, M. J. Beanan, S. C. Daugherty, R. T. Deboy, A. S. Durkin, J. F. Kolonay, R. Madupu, W. C. Nelson, B. Ayodeji, M. Kraul, J. Shetty, J. Malek, S. E. Van Aken, S. Riedmuller, H. Tettelin, S. R. Gill, O. White, S. L. Salzberg, D. L. Hoover, L. E. Lindler, S. M. Halling, S. M. Boyle, and C. M. Fraser. 2002. The Brucella suis genome reveals fundamental similarities between animal and plant pathogens and symbionts. Proc. Natl. Acad. Sci. U. S. A. 99:13148-13153. [PubMed]
41. Piñero, D., E. Martinez, and R. K. Selander. 1988. Genetic diversity and relationships among isolates of Rhizobium leguminosarum biovar phaseoli. Appl. Environ. Microbiol. 54:2825-2832. [PMC free article] [PubMed]
42. Rasko, D. A., M. J. Rosovitz, G. S. Myers, E. F. Mongodin, W. F. Fricke, P. Gajer, J. Crabtree, M. Sebaihia, N. R. Thomson, R. Chaudhuri, I. R. Henderson, V. Sperandio, and J. Ravel. 2008. The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J. Bacteriol. 190:6881-6893. [PMC free article] [PubMed]
43. Reanney, D. 1976. Extrachromosomal elements as possible agents of adaptation and development. Bacteriol. Rev. 40:552-590. [PMC free article] [PubMed]
44. Salzberg, S. L., A. L. Delcher, S. Kasif, and O. White. 1998. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26:544-548. [PMC free article] [PubMed]
44a. Shigenobu, S., H. Watanabe, M. Hattori, Y. Sakaki, and H. Ishikawa. 2000. Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407:81-86. [PubMed]
45. Silva, C., P. Vinuesa, L. E. Eguiarte, V. Souza, and E. Martinez-Romero. 2005. Evolutionary genetics and biogeographic structure of Rhizobium gallicum sensu lato, a widely distributed bacterial symbiont of diverse legumes. Mol. Ecol. 14:4033-4050. [PubMed]
46. Slater, S. C., B. S. Goldman, B. Goodner, J. C. Setubal, S. K. Farrand, E. W. Nester, T. J. Burr, L. Banta, A. W. Dickerman, I. Paulsen, L. Otten, G. Suen, R. Welch, N. F. Almeida, F. Arnold, O. T. Burton, Z. Du, A. Ewing, E. Godsy, S. Heisel, K. L. Houmiel, J. Jhaveri, J. Lu, N. M. Miller, S. Norton, Q. Chen, W. Phoolcharoen, V. Ohlin, D. Ondrusek, N. Pride, S. L. Stricklin, J. Sun, C. Wheeler, L. Wilson, H. Zhu, and D. W. Wood. 2009. Genome sequences of three Agrobacterium biovars help elucidate the evolution of multichromosome genomes in bacteria. J. Bacteriol. 191:2501-2511. [PMC free article] [PubMed]
47. Smith, T. F., and M. S. Waterman. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195-197. [PubMed]
48. Snel, B., M. A. Huynen, and B. E. Dutilh. 2005. Genome trees and the nature of genome evolution. Annu. Rev. Microbiol. 59:191-209. [PubMed]
49. Sobral, B. W., R. J. Honeycutt, and A. G. Atherly. 1991. The genomes of the family Rhizobiaceae: size, stability, and rarely cutting restriction endonucleases. J. Bacteriol. 173:704-709. [PMC free article] [PubMed]
50. Spratt, B. G., W. P. Hanage, B. Li, D. M. Aanensen, and E. J. Feil. 2004. Displaying the relatedness among isolates of bacterial species—the eBURST approach. FEMS Microbiol. Lett. 241:129-134. [PubMed]
51. Tamames, J. 2001. Evolution of gene order conservation in prokaryotes. Genome Biol. 2:RESEARCH0020. [PMC free article] [PubMed]
52. Tettelin, H., V. Masignani, M. J. Cieslewicz, C. Donati, D. Medini, N. L. Ward, S. V. Angiuoli, J. Crabtree, A. L. Jones, A. S. Durkin, R. T. Deboy, T. M. Davidsen, M. Mora, M. Scarselli, I. Margarit y Ros, J. D. Peterson, C. R. Hauser, J. P. Sundaram, W. C. Nelson, R. Madupu, L. M. Brinkac, R. J. Dodson, M. J. Rosovitz, S. A. Sullivan, S. C. Daugherty, D. H. Haft, J. Selengut, M. L. Gwinn, L. Zhou, N. Zafar, H. Khouri, D. Radune, G. Dimitrov, K. Watkins, K. J. O'Connor, S. Smith, T. R. Utterback, O. White, C. E. Rubens, G. Grandi, L. C. Madoff, D. L. Kasper, J. L. Telford, M. R. Wessels, R. Rappuoli, and C. M. Fraser. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome.” Proc. Natl. Acad. Sci. U. S. A. 102:13950-13955. [PubMed]
53. Tettelin, H., V. Masignani, M. J. Cieslewicz, J. A. Eisen, S. Peterson, M. R. Wessels, I. T. Paulsen, K. E. Nelson, I. Margarit, T. D. Read, L. C. Madoff, A. M. Wolf, M. J. Beanan, L. M. Brinkac, S. C. Daugherty, R. T. DeBoy, A. S. Durkin, J. F. Kolonay, R. Madupu, M. R. Lewis, D. Radune, N. B. Fedorova, D. Scanlan, H. Khouri, S. Mulligan, H. A. Carty, R. T. Cline, S. E. Van Aken, J. Gill, M. Scarselli, M. Mora, E. T. Iacobini, C. Brettoni, G. Galli, M. Mariani, F. Vegni, D. Maione, D. Rinaudo, R. Rappuoli, J. L. Telford, D. L. Kasper, G. Grandi, and C. M. Fraser. 2002. Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae. Proc. Natl. Acad. Sci. U. S. A. 99:12391-12396. [PubMed]
54. Tettelin, H., D. Riley, C. Cattuto, and D. Medini. 2008. Comparative genomics: the bacterial pan-genome. Curr. Opin. Microbiol. 11:472-477. [PubMed]
55. Vernikos, G. S., and J. Parkhill. 2006. Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands. Bioinformatics 22:2196-2203. [PubMed]
56. Welch, R. A., V. Burland, G. Plunkett III, P. Redford, P. Roesch, D. Rasko, E. L. Buckles, S. R. Liou, A. Boutin, J. Hackett, D. Stroud, G. F. Mayhew, D. J. Rose, S. Zhou, D. C. Schwartz, N. T. Perna, H. L. Mobley, M. S. Donnenberg, and F. R. Blattner. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 99:17020-17024. [PubMed]
57. Wolf, Y. I., I. B. Rogozin, N. V. Grishin, R. L. Tatusov, and E. V. Koonin. 2001. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 1:8. [PMC free article] [PubMed]
58. Young, J. P., L. C. Crossman, A. W. Johnston, N. R. Thomson, Z. F. Ghazoui, K. H. Hull, M. Wexler, A. R. Curson, J. D. Todd, P. S. Poole, T. H. Mauchline, A. K. East, M. A. Quail, C. Churcher, C. Arrowsmith, I. Cherevach, T. Chillingworth, K. Clarke, A. Cronin, P. Davis, A. Fraser, Z. Hance, H. Hauser, K. Jagels, S. Moule, K. Mungall, H. Norbertczak, E. Rabbinowitsch, M. Sanders, M. Simmonds, S. Whitehead, and J. Parkhill. 2006. The genome of Rhizobium leguminosarum has recognizable core and accessory components. Genome Biol. 7:R34. [PMC free article] [PubMed]

Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)