This study illustrates the dangers of drawing universal conclusions when strains are selected on the basis of specific criteria, such as phenotypic differences. No two strains in this study are truly representative of the population as a whole. Additionally, the large number of differences in pair-wise comparisons of any two strains illustrates the difficulty of associating genes with phenotypic differences, and the utility of sequencing multiple strains to increase the power of these associations. While our initial selection of the Florida strain was based on a phenotypic difference (tick transmissibility), the subsequent strains (PR, VA, and MS) were selected to minimize bias based on that phenotype and to span a wider geographic range of isolates, increasing interstrain diversity. Interestingly, when the pyrosequenced strains are compared to Florida, the Mississippi strain has the most high-quality polymorphisms relative to Florida, even though neither Florida nor Mississippi is tick-transmissible by D. andersoni. (A high-quality polymorphism was defined as a difference contained in four reads, each with at least 20 base pairs flanking the polymorphic site, with at least one read in each direction.) Further, St. Maries appears to be an outlier sequence, as there are at least 6,000 differences between St. Maries and every other sequenced strain.
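The high-quality polymorphism criterion above can be expressed as a simple read filter. The following is a minimal sketch, not the pipeline used in this study: the `SupportingRead` record, the `is_high_quality` function, and the interpretation that "at least 20 base pairs flanking" means at least 20 aligned bases on both sides of the site are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportingRead:
    """Hypothetical record for one read that carries the variant base."""
    start: int        # 0-based reference position of the first aligned base
    end: int          # 0-based exclusive end of the aligned span
    is_reverse: bool  # True if the read aligned to the reverse strand


def is_high_quality(site: int, reads: list[SupportingRead],
                    min_reads: int = 4, min_flank: int = 20) -> bool:
    """Return True if the variant at `site` meets the stated thresholds.

    Assumption: each supporting read must have at least `min_flank`
    aligned bases on BOTH sides of the polymorphic site.
    """
    # Bases to the left of the site are start..site-1; to the right, site+1..end-1.
    flanked = [r for r in reads
               if site - r.start >= min_flank and r.end - site - 1 >= min_flank]
    if len(flanked) < min_reads:
        return False
    # Require at least one qualifying read in each orientation.
    has_forward = any(not r.is_reverse for r in flanked)
    has_reverse = any(r.is_reverse for r in flanked)
    return has_forward and has_reverse
```

For example, three forward reads spanning positions 50 to 150 plus one reverse read spanning 60 to 140 would pass at site 100, whereas four forward-only reads would not. Thresholds are exposed as parameters so stricter or looser definitions can be compared.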
The level of SNP diversity in these strains, coupled with the high degree of gene content conservation, also sheds light on the concept of the "core genome" described for Streptococcus agalactiae. For S. agalactiae, approximately 90.5% of genes were considered part of the "core genome" (constant between strains), and each new strain contributed additional strain-specific genes to the "pan-genome". This contrasts with Bacillus anthracis, for which no new strain-specific genes were found after four strains were compared. The strains of A. marginale sequenced here present an interesting data point: A. marginale has not been hypothesized to be a clonal population derived from another organism (as has been postulated for B. anthracis), yet it has a closed core genome. The accumulation of large numbers of SNPs might indicate a greater evolutionary distance; however, the closed core genome could be due to other factors. One is the isolated intracellular niche occupied by A. marginale, which may have driven reductive evolution to the point that the organism is approaching its minimal gene complement; another is that, despite our efforts, the result reflects the strains selected for sequencing. If long-term reductive evolution is responsible, however, it raises the question of the origin of the six split ORFs between the Florida and St. Maries genomes, as these are thought to represent early reductive changes. A further possibility is that transmission of the organism among animals in a relatively restricted geographic area (i.e., within a herd) promotes a relatively clonal population of organisms through isolation in a similar environment.
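The core/pan-genome distinction, and what "closed" means here, can be made concrete with set operations on per-strain gene content. The sketch below is illustrative only: the function names and the toy gene identifiers are hypothetical, and real analyses first require clustering genes into orthologous groups, which is assumed to have been done already.

```python
def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Core genome = genes present in every strain; pan-genome = genes in any strain."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


def new_genes_per_strain(order: list[str], genomes: dict[str, set[str]]) -> list[int]:
    """Number of previously unseen genes contributed by each strain, in the given order.

    A pan-genome looks "closed" when this count drops to zero and stays there
    as further strains are added.
    """
    seen: set[str] = set()
    added = []
    for name in order:
        added.append(len(genomes[name] - seen))
        seen |= genomes[name]
    return added
```

With toy strains A = {g1, g2, g3}, B = {g1, g2, g4}, and C = {g1, g2, g3, g4}, the core is {g1, g2}, the pan-genome is {g1, g2, g3, g4}, and the per-strain additions are 3, 1, 0; the final zero is the signature of a closed pan-genome in this toy case.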
Analysis of the level of SNP diversity in several bacterial genomes calls into question previous conclusions about the variability of obligate intracellular pathogens. Previous studies [6] have found relatively large numbers of SNPs between strains of intracellular organisms. It was therefore hypothesized that the relatively isolated intracellular niche limits opportunities for genetic exchange, and that increased numbers of SNPs provide a compensatory source of diversity to drive evolution. Our results suggest this is unlikely, as there is no correlation between lifestyle (obligate intracellular, facultative intracellular, or free-living) and the level of diversity. With few exceptions, the degree of variability ranges widely among all the strains compared. Additionally, the two organisms with the highest rates of variability, Pseudomonas syringae and Rhodopseudomonas palustris, are both free-living. There is also significant variation at the genus and family level. These data suggest that the factors governing the retention of SNPs, and thus bacterial diversity, are likely complex and multifactorial.
While the gene content of the pan-genome is obviously important, this study reveals another characteristic that needs examination: the level of sequence diversity within the pan-genome. The minimum of 20,028 variable sites found among these five genomes corresponds to approximately 1.67% of the estimated size of the pan-genome. The large proportion of SNPs unique to each strain (24.1% in the St. Maries genome, 6.0% in the Puerto Rico genome, 10.8% in the Virginia genome, and 25.5% in the Mississippi genome) suggests that while A. marginale has a closed core genome, the SNP profile of the core genome is moderately "open". When several strains of Streptococcus agalactiae (CJB111, COH1, A909, and 515) are compared to strain 2603 VR, 99.18% of the 46,579 SNPs detected are unique to an individual strain, and none are common to all four strains. Similarly, 100% of SNPs among three strains each of Bacillus anthracis (Ames, Ames Ancestor, and Sterne) and Mycobacterium tuberculosis (F11, H37Ra, and H37Rv), 98.8% of SNPs among three strains of Neisseria meningitidis (FAM18, MC58, and Z2491), and 99.9% of SNPs among four strains of Chlamydophila pneumoniae (AR39, CWL029, J138, and TW-183) are unique to one strain. This suggests that these genomes have open SNP profiles regardless of whether they are open or closed at the gene-content level. Further, SNP diversity does not correlate with lifestyle; with limited exceptions, variation is high both between strains and within genera. However, because most strains were selected on the basis of phenotypic traits or previous work with each strain, these comparisons are unlikely to represent the true diversity of these organisms. Additionally, most organisms have only two sequenced strains, making it impossible to analyze variation within a species. Additional work will be required to build a picture of genomic diversity.
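The "open SNP profile" argument rests on counting, for each variable site, which strains carry the variant. A minimal sketch of that bookkeeping follows, assuming each site has already been reduced to the set of strains carrying the non-reference allele; the function name, the input shape, and the toy data are hypothetical and are not the procedure used to produce the figures above.

```python
from collections import Counter


def snp_profile(sites: dict[int, frozenset[str]],
                strains: list[str]) -> tuple[dict[str, float], float, int]:
    """Summarize how variable sites are distributed among strains.

    `sites` maps a reference position to the set of strains carrying the
    variant there (an assumed input format). Returns:
      - percent of all variable sites unique to each strain,
      - percent of all variable sites unique to any single strain,
      - number of sites where every listed strain carries the variant.
    """
    total = len(sites)
    # A site is "unique" to a strain when exactly one strain carries the variant.
    unique = Counter(next(iter(c)) for c in sites.values() if len(c) == 1)
    per_strain = {s: 100 * unique[s] / total for s in strains}
    overall_unique = 100 * sum(unique.values()) / total
    shared_by_all = sum(1 for c in sites.values() if c == set(strains))
    return per_strain, overall_unique, shared_by_all
```

On a toy input of four sites where A alone varies at two, B alone at one, and all three strains at the fourth, the per-strain values are 50%, 25%, and 0%, 75% of sites are strain-unique, and one site is shared by all. A profile dominated by strain-unique sites is what the text calls "open".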
The genome of A. marginale is highly recombinogenic, which, in spite of the highly conserved gene content, leads to increased plasticity. There are between five and nine functional msp2 pseudogenes in the strains examined to date [11], and these can recombine in whole or in part into the msp2 expression site (or with each other) to generate new antigenic variants [26]. Symmetrical inversions around the origin are thought to be quite common in bacteria [28] and have been noted in the Anaplasmataceae, often using repeated genes such as msp2 to mediate the inversion. These inversions are highlighted by comparisons between A. marginale and Ehrlichia ruminantium and between A. marginale and Anaplasma phagocytophilum. Many of these repetitive sequences flank ori, as does another duplicated gene, rho. A smaller-scale inversion, not symmetrical around the origin, was found between two strains of A. marginale; it is flanked by msp3 pseudogenes close to ori. Another highly plastic genomic region is the AAAP locus [23], which appears to be expanding and contracting within and between strains. In addition to changes in gene number, the sequences are highly variable (Table ). Further research will be needed to determine the significance of these differences, as well as the function of this locus.
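Whether an inversion is "symmetrical around the origin" on a circular chromosome can be checked from its endpoints, the position of ori, and the chromosome length. The sketch below is a hypothetical helper, not a method from this study: it treats the inverted segment as the arc traversed forward from `start` to `end`, and the 10% `tolerance` for unequal arm lengths is an arbitrary assumption.

```python
def spans_ori(start: int, end: int, ori: int, length: int) -> bool:
    """True if ori lies on the arc traversed forward from start to end (circular coordinates)."""
    return (ori - start) % length <= (end - start) % length


def is_symmetric_about_ori(start: int, end: int, ori: int, length: int,
                           tolerance: float = 0.1) -> bool:
    """Hypothetical test for an inversion roughly symmetrical around ori.

    The inverted arc must contain ori, and the two arms (start to ori and
    ori to end) must differ in length by at most `tolerance` times the
    longer arm. `tolerance` is an assumed parameter, not a published cutoff.
    """
    if not spans_ori(start, end, ori, length):
        return False
    left_arm = (ori - start) % length   # bases from the start breakpoint up to ori
    right_arm = (end - ori) % length    # bases from ori to the end breakpoint
    return abs(left_arm - right_arm) <= tolerance * max(left_arm, right_arm)
```

On a toy 1,000 bp chromosome with ori at position 0, an inversion from 900 to 100 (wrapping through ori) has equal 100 bp arms and is classed as symmetrical; one from 950 to 200 spans ori but with arms of 50 and 200 bp, so it is not; and one from 100 to 300 does not include ori at all. Using modular arithmetic throughout avoids special-casing segments that wrap past coordinate zero.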