During acute and chronic infections, bacterial pathogens can accumulate mutations that allow them to better adapt to their human host1,2
, evade the immune response12,13
, and become more resistant to antibiotic therapy3,4
. The spectrum of beneficial mutations that arise during the course of a bacterial infection is likely to reflect genetic pathways critical to bacterial pathogenesis in vivo
, and therefore may inspire new therapeutic directions. Recent advances in high-throughput sequencing make it possible to follow the genome evolution of bacterial pathogens7-9,14-17
, but it is still difficult to tease apart the adaptive driver mutations from the neutral passenger mutations that have been fixed by chance9-11
. In the laboratory, this is addressed by following several populations grown in parallel cultures under identical conditions; the adaptive role of mutations is indicated by their recurrence in replicate experiments17-19
. In natural and clinical environments, such studies are more difficult and have not yet been systematically performed on a genome-wide scale. As a result, global patterns of adaptive evolution that underlie bacterial pathogenesis in humans are not well characterized.
Here, we systematically identify recurrent patterns of evolution implicated in pathogenesis by comparing the genetic adaptation of a single bacterial strain in multiple human subjects during the spread of an epidemic. The airways of people with cystic fibrosis (CF, a lethal genetic disorder) are particularly prone to long-term bacterial infections. Most individuals with CF become colonized by a dominant bacterial strain that persists for many years20
, allowing significant time for genetic adaption2
. In the 1990s, a small epidemic of Burkholderia dolosa
– a rare CF pathogen21,22
that can be transmitted from person to person23
– broke out among individuals with CF in Boston24,25
. A total of 39 individuals were infected with B. dolosa
(); and all were followed in a Boston hospital, where bacteria isolated during normal care were routinely frozen.
Whole-genome sequencing of 112 Burkholderia dolosa isolates recovered from 14 epidemic patients indicates steady accumulation of mutations over years
We conducted a retrospective study of 112 B. dolosa
isolates from 14 individuals with Cystic Fibrosis from this epidemic outbreak – including the first infected subject in the Boston area (patient zero) – over the course of 16 years ( and Supplementary Table 1
). During this period, five of these individuals received a lung transplant, and eight died. Most of the 112 isolates were recovered from the subjects' airways; a few were recovered from the blood of subjects with bacteremia. This collection covers the epidemic with high temporal resolution and enables us to study the parallel evolution of the same strain in multiple individuals (Supplementary Fig. 1
We sequenced the whole-genome of these 112 B. dolosa
isolates on an Illumina GAIIx sequencer (75bp single-end reads, average read depth 37×; Supplementary Fig. 2
) and aligned the reads onto a B. dolosa
. We focused our analysis on SNPs; although structural variants and mobile elements may also be important, they are beyond the scope of this study. Our analysis identified 492 polymorphic loci (Supplementary Table 2
). These mutations accumulated at a steady rate of ~2 SNP/year (r=0.79) (), with no discernible difference between subjects (Supplementary Fig. 3
). This rate of mutation accumulation in the presence of selection within the human body is consistent with bacterial mutation fixation rates reported in long-term human infections2,27
. The steady accumulation of mutations generated enough genetic diversity to resolve evolutionary relationships between isolates, which were investigated through the creation of a maximum-likelihood phylogenetic tree ().
Bacterial phylogeny reveals a likely network of transmission between patients, and between organs
At the epidemic level, the phylogeny suggests a network of transmission between subjects. Isolates from the same subject tend to form genetically related clusters in the phylogeny (, Supplementary Fig. 3
). These clusters define subject-specific genetic fingerprints, from which transmission history can be inferred. We constructed the last common ancestor (LCA) for each subject's set of isolates, which bears the subject-specific fingerprints. The phylogenetic relationships between these inferred strains indicated the likely network of transmissions among the 14 subjects (). Because these data account for only 14 of the 39 subjects in this epidemic, we cannot determine whether transmission occurred directly from one subject to another, or indirectly through a subject not in our study, via a healthcare worker, or through a medical device. Nevertheless, this analysis shows that this specific epidemic was transmitted through several people during its spread and demonstrates the strength of the approach in identifying the infection network of an epidemic.
At the level of individual subjects, the phylogenetic analysis evidenced the transfer of multiple B. dolosa
clones to the subject's bloodstream during bacteremia. We examined isolates from the three subjects for whom we obtained more than one blood isolate (subjects H, K, and N). In two of these individuals, we found pairs of blood isolates that evolved from distinct lung isolates (), which is inconsistent with the transmission of a single clone from lungs to blood (Supplementary Fig. 4a
). This evidence for multiclonal transmission is consistent either with a punctuated transmission of multiple clones from the genetically diverse lung28-31
(Supplementary Fig. 4b
), or with multiple transmissions occurring over time. These different possibilities would lead to recommendations for distinct therapeutic actions: whereas a lung transplant might be effective in preventing the continuous leak of bacteria through lesions of the lung, it would not block the proliferation of bacteria already within the bloodstream. This analysis thus brings into focus unresolved questions about the mechanistic basis of bacteremia.
Finally, we investigated evolution at the gene level. We looked for genetic correlates of known pathogenic phenotypes. We first assayed resistance to ciprofloxacin, a fluoroquinolone frequently prescribed to CF subjects (). Resistance among the 112 isolates varied over two orders of magnitudes (Supplementary Fig. 5a
). We scored each gene for correlation between the presence of mutations and drug resistance (, inset). This genome-wide association study implicated a single gene in the phenotype, BDAG_02180, homolog to Escherichia coli gyrA
. All the genotypes associated with resistance had nonsynonymous mutations in T83 or D87, known for their role in fluoroquinolone resistance4,32,33
. Mutations in these residues occurred in six subjects. In each case, phylogenetic analysis indicated that mutations were independently acquired within the subject, after initial infection (Supplementary Fig. 5b
). In some cases, we even found in the same subject mutations in both residues, each carried by a different isolate. These findings support the presence of a strong selective pressure from fluoroquinolones but suggest that there are only few genetic paths to resistance in vivo
Pathogenic phenotypes are associated with point mutations in key genes
We then focused on a second pathogenic phenotype, the presentation of O-antigen repeats in the lipopolysaccharide (LPS) of the bacterium's outer membrane, known for its importance to virulence in related species34-36
. We found that some of our isolates present the O-antigen while others do not (). A single nucleotide in the glycosyltransferase gene BDAG_02317 correlated exactly with the presentation of O-antigen repeats (, inset). The ancestral genotype at this locus, a stop codon, corresponds to the absence of O-antigen repeats; two different mutations at the same amino acid position – each restoring a full-length protein – are associated with presence of the repeats. We confirmed this association experimentally; we found that complementation with the full-length glycosyltransferase gene could restore O-antigen presentation (Supplementary Fig. 6a, Supplementary Note
). Harnessing the phylogenetic information, we determined that the last common ancestors of strains from each subject presented the truncated genotype. Thus, the gain-of-function mutations occurred in nine subjects independently (Supplementary Fig. 6b
), highlighting the strength of the selective pressure for O-antigen presentation during the infection. These results identify a previously uncharacterized genetic mechanism for O-antigen switching and hint at a tradeoff during person-to-person transmission.
We recognize that the human body challenges bacteria with many selective pressures beyond those discussed above. We therefore developed a systematic approach for identifying genes under positive selection without prior knowledge of the phenotypes being selected. At the genome level, we found no evidence for selection in coding regions (dN/dS~1) and no significant intragenic bias (Supplementary Note
). However, we reasoned that genes under selection would be mutated independently in different subjects17-19
. We leveraged the phylogeny to calculate the number of mutations each gene received, distinguishing genes mutated multiple times from those mutated once but appearing in several subjects through the expansion of the lineage that carried them. We counted 561 independent mutational events in 304 genes (Supplementary Table 3
). Assuming neutral evolution, we would expect that these mutations would distribute randomly among the 5,014 B. dolosa
genes, and that genes would rarely acquire more than a single mutation. Instead, we observed that many more genes than expected contained multiple mutations (, inset). Seventeen genes were found to have three or more different mutations (neutral expectation: ~1, Online Methods), and four genes had over ten different mutations (expected: 0 genes).
Figure 4 Parallel evolution identifies a set of genes under strong selection during pathogenesis. a, Inset, The number of genes that acquired at least m mutations across the epidemic is plotted as a function of m (gray bars). This distribution contrasts sharply (more ...)
To determine whether genes that acquired multiple mutations were under positive selection, or were merely sites of mutational bias, we calculated the canonical measure for selection, dN/dS (). The large subset of 247 genes which contained only one mutation exhibited a weak but significant signal for purifying (i.e., negative) selection (dN/dS=0.63, p<10-3, Online Methods). The set of genes with two mutations did not show evidence of selection (dN/dS=1.4, CI:0.7-3.1); this set may include a combination of genes that are under some selection and genes that fixed two mutations by chance (22 expected under neutral drift; 28 observed). By contrast, the 17 genes that acquired three or more mutations received 18 times as many non-synonymous mutations as expected by neutral drift, and are under strong positive selection (dN/dS=18, CI:4.9-152.7). This suggests that these seventeen genes are not neutral mutational hotspots; they are undergoing adaptive evolution under the pressure of natural selection.
The 17 genes under positive selection (, Supplementary Table 4
), which are mostly conserved across the Burkholderia
genus (Supplementary Fig. 7
), indicate genetic pathways that may be involved in pathogenesis. The presence of the two genes previously identified in connection with antibiotic resistance and O-antigen presentation (gyrA
: 11 mutations; glycosyltransferase wbaD
: 10 mutations) provides a further connection of these genes to pathogenic phenotypes under selection. Eleven of the 17 genes belong to functional categories related to pathogenicity: membrane synthesis (4 genes, including 2 in LPS biosynthesis), secretion (2), and antibiotic resistance (5). The presence of a second glycosyltransferase in the O-antigen cluster (6 mutations) stresses the importance of this pathway to the disease. Notably, the remaining 6 genes had not previously been implicated in pathogenesis of lung infections. Three of these – a glucoamylase, a methyltransferase, and a sigma factor – have no well-annotated close homologs and their roles in pathogenicity are thus unclear. Another gene trio (homologs of fnr, fixL
, and fixJ
), including the gene most mutated gene (BDAG_01161, a homolog of fixL
, that had 17 nonsynonymous mutations), can be linked through homology to oxygen-dependent gene regulation37
. The large number of mutations in this pathway resonates with reports of lowered oxygen tension in CF mucus38
and of ties between oxygen sensing and virulence modulation in the gastrointestinal tract39
. Homologs of these three genes have been implicated in diverse regulatory processes37,39
, but their function and the genes they regulate in B. dolosa
are currently unknown. The identification of 17 B. dolosa
genes that underwent selective pressure during infection in subjects with cystic fibrosis highlights key pathways involved in pathogenesis and may suggest new therapeutic targets for this and other lung infections.
Tracking the genomic evolution of bacterial pathogens during the infection of their human host provides a direct method for observing evolutionary mechanisms in vivo and identifying genes central to pathogenesis. This study, which harnesses the combination of high-throughput sequencing and parallel evolution in the clinical settings, is a step towards a comprehensive understanding of genetic adaptation during pathogenesis. Systematically identifying selective pressures acting on pathogens within their hosts may help point to new therapeutic directions.