In this report, we present a comprehensive aCGH analysis for a large series of natural Bp isolates. We found that the accessory (variably present) portion of the Bp genome corresponds to ~14% of the whole genome content, which is broadly similar to other γ-proteobacteria. Since this approach is limited to the detection of elements present in the Bp K96243 genome, and novel elements in query genomes are not detected, this estimated fraction of the accessory genome should be regarded as a lower bound.
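As a purely illustrative sketch (not the analysis pipeline used in this study), the accessory fraction can be computed from a table of binary presence/absence calls in which each reference gene is scored in each strain. The function name, gene labels, strain names and calls below are hypothetical toy data. Because only genes represented in the reference genome are scored, the resulting fraction is a lower bound in the sense described above.

```python
def accessory_fraction(calls):
    """Return (fraction, accessory_genes) for a presence/absence table.

    calls: dict mapping gene -> dict mapping strain -> bool (True = present).
    A gene is treated as 'core' if it is present in every strain and as
    'accessory' otherwise.
    """
    if not calls:
        raise ValueError("no genes supplied")
    accessory = sorted(
        gene for gene, by_strain in calls.items() if not all(by_strain.values())
    )
    return len(accessory) / len(calls), accessory


# Hypothetical toy data: 4 reference genes scored in 3 strains.
toy_calls = {
    "geneA": {"s1": True, "s2": True, "s3": True},
    "geneB": {"s1": True, "s2": False, "s3": True},
    "geneC": {"s1": True, "s2": True, "s3": True},
    "geneD": {"s1": False, "s2": False, "s3": True},
}

fraction, accessory = accessory_fraction(toy_calls)
print(f"accessory fraction (lower bound): {fraction:.2f}")
print("accessory genes:", accessory)
```

In this toy table, two of the four genes are absent from at least one strain, so the accessory fraction is 0.50. Genes present only in a query strain and absent from the reference would never appear in such a table, which is why the estimate cannot exceed the true value.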
In the only published study of a Bp genome sequence to date, Holden et al. (2004) computationally identified 16 GIs comprising 6% of the K96243 genome, and our data confirm that most of these islands are indeed highly variable between strains. However, two GIs (7 and 14) were found in all strains and should thus be regarded as part of the Bp core genome. Furthermore, our data revealed the variable presence of several other small genomic islets/indels across the two chromosomes, which might contribute to the phenotypic diversity of Bp. Notably, we observed that several indels (n6, n12 and n19) were related to LPS biology. Currently, the exact contribution of LPS to Bp virulence is unclear. For example, DeShazer et al. (1998) showed that Bp type II O-PS is essential for serum resistance and virulence, and mice pre-immunized with Bp LPS displayed enhanced survival following a subsequent challenge. In contrast, other groups have reported that Bp LPS exhibits a reduced ability to activate immune cells compared to E. coli LPS, suggesting that LPS might play only a minimal role in Bp virulence. It is possible that these conflicting results reflect heterogeneity in LPS pathways resulting from the variable presence of these indels, and that this heterogeneity represents an important mechanism for host adaptation. Interestingly, while it was recently shown that type III O-PS mutants (indel n12) do not appear to exhibit significant virulence attenuation in mouse infection assays, we have found in preliminary work that Bp strains lacking the indel n19 LPS cluster generally exhibited lower levels of virulence compared to strains where this cluster was present (SSH, data not shown). In the AGC tree, n19 was absent both from three strains segregating as a single branch in the A clade, and from five strains in the C clade that segregated across multiple branches. This suggests that n19 may have been recurrently lost in different Bp lineages. Further experiments are clearly required to understand the role of these LPS clusters in Bp virulence.
We also found that the Bp strains could be clustered into distinct clades based on the presence or absence of specific accessory genes. Of primary interest, strains belonging to the C clade of clinical isolates were largely defined by the presence of 218 genes, of which 85% are localized to the GIs. These findings provide evidence for a distinct repertoire of Bp genes that may predispose strains to cause human disease, and indicate that these genes tend to be located on GIs. Although many of the genes encoded on the GIs are of unknown function, we present experimental evidence that a strain mutated in one of these genes exhibited decreased adherence to human buccal endothelial cells, supporting a role in virulence potential. We also observed coordinated growth-associated expression of several GI genes, which is also consistent with the view that they play an important biological role. What might this biological role be? At present, we consider it most likely that this “virulent” combination of genes has emerged for reasons other than to cause human disease, particularly since cases of human (or animal) infection are relatively rare compared to the density of Bp in the soil. In contrast to bacteria that are obligately associated with eukaryotic hosts, soil bacteria such as Bp commonly face extreme and unpredictable biotic and abiotic challenges, including extreme temperature shifts, solar radiation, variable humidity, competition for nutrients, and the need to survive ingestion by predatory protozoa and nematodes, bactericides produced by other bacteria, and phage infection. It thus seems entirely plausible that genes facilitating survival against these environmental challenges might have also indirectly enhanced the microbe's ability to colonize and “accidentally” infect a human host, particularly when the host is immunocompromised.
Another possibility that might explain the enrichment of GIs in the clinical isolates is that Bp is undergoing cryptic cycling through normal human hosts (as opposed to the immunodeficient host), and that these GIs are selected during this host-pathogen interaction. In melioidosis-endemic NE Thailand, the majority of healthy individuals have antibodies to Bp by the age of 4 years, indicating a constant exposure to the bacterium that may occur by inoculation, inhalation or ingestion. Within these normal hosts, Bp is likely to spend a period of time exposed to the effects of the host immune response, after which the microbe may be killed, persist, or be expelled from the host in a viable state and subsequently return to the environment. This latter process might occur through skin desquamation or urine and stool, since human excrement commonly finds its way back to the environment. Such cryptic cycling of Bp through the normal human host population could also lead to the selection of factors that promote survival in vivo. However, as we consider the human host to be a relatively minor component of Bp ecology, we argue that this scenario is, on balance, less likely.
The availability of both MLST and aCGH data for a representative sub-sample of isolates also gave us the opportunity to compare clade distributions defined either by accessory genome content or by allelic variation in the core genome. We found that the animal-associated strains largely corresponded to a single MLST clone (ST51). These isolates were assembled from three distinct sources: the Singapore zoo, the University of Malaya and a pig abattoir in Singapore. The soil isolates corresponding to ST51 (which also clustered in the A clade) were not isolated from soil samples in proximity to the animal ST51 isolates, which suggests that this genotype is also present in the environment. The homogeneity of these isolates is therefore striking and cannot be explained simply by sampling bias. The consistency between the microarray and MLST data strongly suggests that this clade is monophyletic, and that the strains harbour similar gene repertoires by virtue of common descent.
In contrast, we also observed clear discrepancies between the MLST and aCGH clades. For example, three ST51 isolates clustered within the clinical aCGH clade, and ST423 was split between the clinical and environmental aCGH clades. There are three possibilities to explain these discrepancies: i) The MLST data represent the ancestral state, which is inherited by descent into two AGC-defined clades. This is unlikely for the animal cluster, as the vast majority of isolates are ST51, but might conceivably explain the ST423 split between the clinical and environmental clades. ii) Convergence of the MLST alleles. This would imply that isolates with the same ST are not identical by descent but happen to share the same combination of alleles. The presence of a few very common alleles for each gene, combined with high rates of recombination in Bp, makes this possibility more likely. iii) Independent convergence of gene content to one of the three clusters. Unless large numbers of genes can be transferred in single events, this possibility seems less parsimonious than (ii). More data are required to examine which of these hypotheses is most likely.
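Discordance of this kind can be enumerated by cross-tabulating sequence type against aCGH clade. The following is a minimal sketch under stated assumptions, not the procedure used in this study: the function name, isolate identifiers, clade labels, the placeholder "ST_other" and all counts are hypothetical toy values chosen only to mirror the pattern described above.

```python
from collections import defaultdict


def split_sequence_types(isolates):
    """Map each ST found in more than one aCGH clade to its per-clade counts.

    isolates: iterable of (isolate_id, sequence_type, acgh_clade) tuples.
    STs confined to a single clade are omitted from the result.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for _isolate_id, st, clade in isolates:
        counts[st][clade] += 1
    return {
        st: dict(by_clade) for st, by_clade in counts.items() if len(by_clade) > 1
    }


# Hypothetical toy records; names, labels and counts are invented for illustration.
toy_isolates = [
    ("iso1", "ST51", "animal"),
    ("iso2", "ST51", "animal"),
    ("iso3", "ST51", "clinical"),
    ("iso4", "ST423", "clinical"),
    ("iso5", "ST423", "environmental"),
    ("iso6", "ST_other", "environmental"),
]

discordant = split_sequence_types(toy_isolates)
for st, by_clade in sorted(discordant.items()):
    print(st, by_clade)
```

A table of this form only identifies which STs span multiple clades; it cannot by itself distinguish between shared ancestry, allelic convergence and convergence of gene content.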
In summary, our study provides direct experimental confirmation that the Bp genome is highly plastic, and that gene acquisition and deletion are major drivers of this variability. This variability is far from random, and is functionally biased towards mobile element genes, hypothetical and paralogous genes, and genes involved in LPS biosynthesis. Furthermore, genes on mobile elements may predispose individual strains, either directly or indirectly, towards causing human disease. We believe this latter result is significant in that most Bp research to date has focused on virulence components in the Bp core genome rather than genes on mobile elements. We conclude by noting that most of the Bp genome sequences currently available have been obtained from human clinical isolates. Given our results, it might be highly informative to subject a panel of animal and environmental Bp isolates to similarly detailed genome analysis.