The approaches described here provide a framework for phylogenetic analysis of species such as B. pseudomallei with high levels of LGT and homoplasy. Selecting only SNPs from whole genome comparisons eliminated faster-evolving loci that are more prone to homoplasy [49], and sampling >14,000 SNPs spread across the genome reduced the confounding effects of convergent evolution and LGT. Deleted and duplicated genomic regions are frequent in Burkholderia [11] and can lead, respectively, to missing data and to the sampling of paralogous rather than orthologous loci [51]. We therefore selected loci that were present in all of the sequenced genomes and duplicated in none of them. High clade credibility values, together with a pattern of homoplastic SNPs that did not conflict with the inferred tree, provided confidence in the phylogenetic hypotheses presented here. The inferred phylogenies are a meaningful approximation of descent, not simply a depiction of the stochastic variation in similarity that any finite random sample of an infinitely panmictic population would inevitably show.
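The locus filter described above (keep regions present in every genome and duplicated in none) can be sketched as follows. This is a minimal illustration, assuming that per-genome copy numbers for each locus have already been obtained by some upstream step; the input format, the function names `single_copy_core_loci` and `filter_snps`, and the toy genome and locus names are hypothetical and do not describe the pipeline actually used in this study.

```python
from __future__ import annotations

from collections import Counter


def single_copy_core_loci(copy_counts: dict[str, Counter]) -> set[str]:
    """Return loci present exactly once in every genome.

    copy_counts maps a genome name to a Counter of locus -> copy number
    (a hypothetical input format produced by some upstream search step).
    """
    genomes = list(copy_counts.values())
    if not genomes:
        return set()
    # Candidate loci: any locus observed in at least one genome.
    all_loci = set().union(*genomes)
    return {
        locus
        for locus in all_loci
        # A Counter returns 0 for absent keys, so "== 1" rejects both
        # deletions (0 copies) and duplications (more than 1 copy).
        if all(counts[locus] == 1 for counts in genomes)
    }


def filter_snps(snps: list[tuple], core: set[str]) -> list[tuple]:
    """Keep SNP records (locus, position, ref, alt) that fall in core loci."""
    return [snp for snp in snps if snp[0] in core]
```

With toy data in which one locus is missing from a genome and another is duplicated in a different genome, only the locus that occurs once everywhere survives, and SNPs called in the other two loci are discarded.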
The validity of using phylogenetic trees to depict the evolutionary history of organisms exhibiting LGT has been hotly debated, with some authors championing web-like structures to depict instances of reticulate evolution [53] and others emphasizing the importance and appropriateness of discerning patterns of vertical inheritance [54]. Certainly, intra- and interspecific genetic exchange has shaped the genomes of extant Burkholderia isolates. However, although a large proportion of the genome may have been shaped in this way over evolutionary time, only a very small portion of the genome is acquired laterally in any single generation. Thus, a phylogenetic tree remains a valid way of representing the major patterns of descent for these species. Onto such a tree, the fine connecting threads that represent LGT and discordant individual gene phylogenies can subsequently be added as individual genes are studied.
It is likely that the most recent common ancestor of B. pseudomallei existed on the Australian continent. Our phylogenetic analyses indicate a tendency for Australian B. pseudomallei isolates to be associated with a more ancient common ancestor than other isolates. This pattern is also supported by independent MLST results from 599 B. pseudomallei STs, which showed that the Australasian population is characterized by greater allelic diversity and fewer shared alleles. The presence of B. thailandensis isolates in Australia and the phylogenetic position of Burkholderia sp. MSMB43 raise the possibility that Burkholderia sp. MSMB43, B. thailandensis, B. pseudomallei, and B. mallei isolates all descend from an Australian B. thailandensis-like ancestor, although this pattern is based on very few B. thailandensis and Burkholderia sp. isolates. As more B. thailandensis isolates are discovered, their phylogenetic and geographic associations will be critical for confirming or rejecting this provisional hypothesis.
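The two MLST summaries mentioned above, allelic diversity within a population and alleles shared between populations, reduce to simple set operations on allelic profiles. The sketch below is illustrative only: it assumes each sequence type is stored as a mapping from locus name to allele number, uses distinct-allele counts per locus as a deliberately simple stand-in for "allelic diversity", and the profiles in the example are invented, not data from this or any published MLST database.

```python
from __future__ import annotations

from collections import defaultdict

Profile = dict[str, int]  # MLST locus name -> allele number (assumed format)


def alleles_by_locus(profiles: list[Profile]) -> dict[str, set[int]]:
    """Collect the distinct alleles observed at each locus in a population."""
    seen: dict[str, set[int]] = defaultdict(set)
    for profile in profiles:
        for locus, allele in profile.items():
            seen[locus].add(allele)
    return dict(seen)


def allelic_richness(profiles: list[Profile]) -> dict[str, int]:
    """Number of distinct alleles per locus (a simple diversity measure)."""
    return {locus: len(alleles) for locus, alleles in alleles_by_locus(profiles).items()}


def shared_allele_count(pop_a: list[Profile], pop_b: list[Profile]) -> int:
    """Count (locus, allele) pairs observed in both populations."""
    a, b = alleles_by_locus(pop_a), alleles_by_locus(pop_b)
    return sum(len(a[locus] & b[locus]) for locus in a.keys() & b.keys())
```

For example, with two hypothetical profiles per population typed at the `ace` and `gltB` loci, the function reports per-locus richness for each population and the number of allele identities the two populations have in common; a population with higher richness but few shared alleles would match the pattern described for the Australasian isolates.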
The monophyletic B. mallei clade diverged from B. pseudomallei before the current Southeast Asian population was established (Figure ). The long branch leading to B. mallei strains suggests a long passage of time before a rapid radiation led to the extant population. A high consistency index among SNPs from whole genome comparisons of B. mallei strains provides evidence for a completely clonal mode of descent for this species since its relatively recent radiation, in contrast with B. pseudomallei. The lack of LGT among B. mallei isolates is not surprising given the loss of recombination opportunities associated with host sequestration and inability to thrive in the environment; it is likely that LGT between B. mallei and B. pseudomallei has not occurred for these same reasons. Although host specialization may account for the differential rates of LGT between the B. pseudomallei and B. mallei populations, other barriers may influence LGT among B. pseudomallei populations.
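A high consistency index (CI) means that nearly every SNP changes state only once on the tree, which is what strictly clonal descent predicts. As an illustration only, the sketch below computes the ensemble CI, the minimum possible number of changes (number of observed states minus one, summed over sites) divided by the parsimony number of changes on a given tree, obtained here with Fitch's algorithm. The nested-tuple tree format and the function names are assumptions for this example; this is not necessarily how the CI values in this study were computed.

```python
from __future__ import annotations


def fitch_length(tree, states: dict[str, str]) -> int:
    """Fitch parsimony length of one character on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("a", "b"), ("c", "d")).
    states: leaf name -> nucleotide at this SNP position.
    """
    def visit(node):
        if isinstance(node, str):  # leaf: its observed state, no changes yet
            return {states[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = map(visit, node)
        common = left_set & right_set
        cost = left_cost + right_cost
        if common:
            return common, cost
        # Disjoint child state sets force one change on this node's branches.
        return left_set | right_set, cost + 1

    return visit(tree)[1]


def consistency_index(tree, snp_columns: list[dict[str, str]]) -> float:
    """Ensemble CI: sum of minimum changes / sum of parsimony changes."""
    minimum = sum(len(set(col.values())) - 1 for col in snp_columns)
    observed = sum(fitch_length(tree, col) for col in snp_columns)
    return minimum / observed if observed else 1.0
```

On the four-leaf tree `(("a", "b"), ("c", "d"))`, a SNP that splits {a, b} from {c, d} needs one change and gives CI = 1, whereas a SNP shared by a and c needs two changes (homoplasy), lowering the combined CI of the two sites to 2/3.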
The mechanistic basis for the high recombination frequencies observed in Southeast Asian populations of B. pseudomallei, compared with Australian populations, is of considerable interest. As sequences diverge, the likelihood of homologous recombination decreases [55]. Therefore, the greater genetic distances among Australian B. pseudomallei strains may, in part, explain the lower levels of LGT in this population relative to the more closely related and more connected Southeast Asian population. However, B. thailandensis shares more alleles with the Southeast Asian population of B. pseudomallei than with the Australian population (7:1), providing some evidence that LGT between species does occur despite genetic divergence. Alternatively, different levels of LGT among populations may be due to the greater abundance of B. thailandensis in Southeast Asia, which would provide more opportunities for physical contact and LGT. In Australia, the typically lower abundance of B. pseudomallei in the environment [59] may account for lower rates of LGT compared with the Southeast Asian population [60]. Large, intensively farmed artificial wetlands such as the rice paddy fields of Thailand may favor high cell densities and the mobility of strains. Conversely, the largely tropical savannah of northern Australia, which extends over vast distances and supports only sparse grazing and low-density human populations, would be expected to impede gene flow [61]. A third scenario is that these populations have evolved different intrinsic LGT rates; however, we have no evidence to support this hypothesis.
B. pseudomallei is subdivided into two subpopulations with distinct geographic distributions that are separated by Wallace's Line. For more than a century, naturalists have noted a tendency for plant and animal populations to be divided along Wallace's Line [62] but, to our knowledge, no prokaryotic examples have been reported. Two mutually exclusive hypotheses, both resting on the geological history of the region, may explain the biogeographic separation of the Australian B. pseudomallei population from the more recent Asian population along Wallace's Line. Islands on the western side of Wallace's Line are part of the Eurasian tectonic plate, whereas those on the eastern side are on the Australian plate [63]. First, B. pseudomallei may have been introduced into Southeast Asia after the late Miocene collision of these two plates (approximately 12 million years ago (Ma)) in the vicinity of Wallace's Line. Alternatively, as for other species, the biogeographic separation may have begun with the divergence of an ancestral population living in Gondwanaland. This initial divergence would be linked to the tectonic motion approximately 140 Ma, when the Indian subcontinent split from Gondwanaland. Populations could subsequently have been introduced into Asia during the collision of the Indian and Eurasian plates that began approximately 55 Ma [64] and then spread to the western edge of Wallace's Line. It was previously postulated that B. pseudomallei may have originated in Gondwanaland and dispersed with the breakup of that ancient supercontinent (the Gondwana hypothesis), or alternatively dispersed from Australia to Southeast Asia via the later Miocene land bridges that partially linked those regions [23]. However, low MLST allelic diversity and the sharing of prevalent alleles between strains from Australia and Southeast Asia suggest that B. pseudomallei may actually be a much younger species [65]. A founding population must therefore have crossed Wallace's Line more recently than the late Miocene. Such an event would have to be rare for genetic divergence to occur; indeed, B. pseudomallei does not survive well in sea water [66].
Although all molecular clock estimates are subject to potential inaccuracies in mutation fixation rates and generation times, these two dispersal hypotheses differ by more than an order of magnitude (<12 Ma versus >140 Ma), so even a rough estimate of divergence times is likely to discriminate between them. Indeed, using a range of mutation rates and generation times similar to those determined for other bacterial species, our molecular clock estimates support the hypothesis that a founding population of B. pseudomallei crossed Wallace's Line and became isolated from the larger population, with subsequent spread throughout Southeast Asia (Additional file 6). Our estimates for the time of divergence between the two populations (16 thousand years ago (Ka) to 225 Ka) coincide with recent glacial periods, when low sea levels would have maximized the potential for dispersal among what are now islands of the Malay Archipelago. We also dated the last common B. pseudomallei ancestor to between 24.9 Ka and 346 Ka, and the divergence of B. thailandensis and B. pseudomallei to between 307 Ka and 4.27 Ma.
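The reasoning behind these date ranges can be illustrated with a strict-clock sketch. Because substitutions accumulate independently along both lineages after a split, the per-site divergence is d = 2μt, so t = d / (2μ) generations, which is then multiplied by the generation time to give years. The sketch below sweeps a grid of mutation rates and generation times and reports the minimum and maximum estimates. It is a simplified stand-in, not the model used for Additional file 6, and every numerical input in it is a placeholder rather than a value from this study.

```python
from __future__ import annotations

from itertools import product


def divergence_time_years(snp_diffs: float, sites: int,
                          mu_per_site_per_gen: float,
                          gen_time_years: float) -> float:
    """Strict-clock divergence time between two lineages, in years.

    Differences accumulate along both lineages since the split,
    so d = 2 * mu * t_generations and t_generations = d / (2 * mu).
    """
    d = snp_diffs / sites  # observed differences per site
    generations = d / (2 * mu_per_site_per_gen)
    return generations * gen_time_years


def divergence_range(snp_diffs: float, sites: int,
                     mus: list[float], gen_times: list[float]) -> tuple[float, float]:
    """Min and max estimates over every (mutation rate, generation time) pair."""
    estimates = [divergence_time_years(snp_diffs, sites, mu, g)
                 for mu, g in product(mus, gen_times)]
    return min(estimates), max(estimates)
```

Because the estimate scales with 1/μ and linearly with generation time, uncertainty in either input carries straight through to the final range. A span of plausible inputs therefore yields a range rather than a point estimate, and such a range can still discriminate between hypotheses whose predicted ages differ by more than the spread of the inputs.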
Our results demonstrate that, given large amounts of molecular data and extensive sampling, past evolutionary and biogeographic events can be reconstructed despite relatively high levels of LGT. Our use of evolutionarily informative SNPs derived from WGSs was essential for maximizing phylogenetic resolution and reduced the likelihood that individual LGT events would distort the overall phylogeny, as can happen with limited genomic sampling. Despite the problems of using limited genomic sampling schemes to determine fine-scale phylogenetic relationships in B. pseudomallei, such schemes are widely accessible and thus generate large data sets. Fortunately, the resolution of MLST data is sufficient for determining broad patterns of population dynamics and distribution for B. pseudomallei, and these data add this species to the growing list of bacterial species in which biogeographic structuring has been demonstrated [68]. More comprehensive phylogenetic and population studies will set the stage for framing and addressing further questions about single gene evolution, dispersal, and population substructure.