Current methods for differentiating isolates of predominant lineages of pathogenic bacteria often do not provide sufficient resolution to define precise relationships. Here, we describe a high-throughput genomics approach that provides a high-resolution view of the epidemiology and microevolution of a dominant strain of methicillin-resistant Staphylococcus aureus (MRSA). This approach reveals the global geographic structure within the lineage, its intercontinental transmission through four decades, and the potential to trace person-to-person transmission within a hospital environment. The ability to interrogate and resolve bacterial populations is applicable to a range of infectious diseases, as well as microbial ecology.
The development of molecular typing techniques has been instrumental in studying the population structure and evolution of bacterial pathogens. Sequence-based approaches, such as multilocus sequence typing (MLST) (1), have resulted in large searchable databases of the most clinically important species. However, MLST defines variation within a very small sample of the genome and cannot distinguish between closely related isolates. Full-genome sequencing provides a complete inventory of microevolutionary changes, but this approach is impractical for large population samples. The use of next-generation sequencing technologies, such as Illumina Genome Analyzer, bridges this gap by mapping genome-wide single-nucleotide polymorphisms (SNPs) and insertions or deletions (indels) to a reference sequence. The use of index adapters to create individually tagged genomic libraries provides the means to generate data for multiple bacterial isolates on a single sequencer lane and makes it feasible to rapidly generate whole-genome DNA sequence data for large population samples of bacteria.
Health care–associated, methicillin-resistant Staphylococcus aureus (HA-MRSA) is a globally important human pathogen. Current typing methods resolve the majority of HA-MRSA isolates into a small number of widely disseminated clonal lineages (2). One such clone, defined by MLST as sequence type 239 (ST239), is multiply antibiotic-resistant and accounts for at least 90% of HA-MRSA throughout China (3), Thailand (4), Turkey (5), and probably much of mainland Asia (6). ST239 has been detected in South America (7, 8) and is currently circulating in Eastern Europe (9-11). Variants of ST239 correspond to the epidemic MRSA (EMRSA)-1, -4, and -11 clones and to the Brazilian, Portuguese, Hungarian, and Viennese clones, which are distinguished on the basis of variation within the large type III SCCmec element, spa data, and subtle differences by pulsed-field gel electrophoresis (PFGE). Despite this variation, current typing methods provide little discriminatory power for subtyping ST239 isolates within a given region because single variants that undergo clonal expansion can dominate in hospitals throughout a large geographic area.
To investigate the utility of a second-generation DNA sequencing platform for high-resolution genotyping and investigation of the microevolutionary events within MRSA, we analyzed 63 ST239 isolates (table S1) from two distinct samples (12). The first sample, consisting of 43 isolates from a global collection recovered between 1982 and 2003, provides a snapshot of the global ST239 population. One of these isolates (TW20) was sequenced to completion to provide a reference for analysis. The second sample of 20 isolates, derived from patients at the Sappasithiprasong hospital in northeast Thailand within a 7-month period, provides a very closely related group, potentially linked via a chain of transmission.
Mapping reads for each isolate against TW20 (table S2) identified 6714 high-quality SNPs. These SNPs had a markedly uneven distribution across the genome (fig. S1A), largely related to whether the SNP resided in the core (present in all sample isolates) or accessory regions of the genome. The accessory genome primarily comprised mobile genetic elements (MGEs) such as phage, transposons, SCCmec, and genomic islands that are known to constitute a major source of variation between S. aureus genomes (13). Because MGEs have an inherent potential for horizontal transfer between isolates, which could confound phylogenetic interpretations, we distinguished between the “core” and “noncore” genome for subsequent analysis.
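The core/noncore partition described above can be sketched as a simple interval lookup. This is a minimal illustration, not the pipeline used in the study: the function name, the interval representation, and all coordinates are hypothetical, and the accessory regions are assumed to be non-overlapping.

```python
from bisect import bisect_right

def split_core_noncore(snp_positions, accessory_regions):
    """Partition SNP positions into (core, noncore) lists.

    accessory_regions: non-overlapping (start, end) intervals, inclusive,
    marking MGEs such as prophage, transposons, or SCCmec.
    """
    regions = sorted(accessory_regions)
    starts = [start for start, _ in regions]
    core, noncore = [], []
    for pos in snp_positions:
        # Index of the last region that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= regions[i][1]:
            noncore.append(pos)
        else:
            core.append(pos)
    return core, noncore

# Hypothetical coordinates for illustration only (not taken from TW20).
accessory = [(1000, 2000), (5000, 5500)]
snps = [150, 1200, 2000, 2001, 5250, 9000]
core, noncore = split_core_noncore(snps, accessory)
print(core)     # positions retained for phylogenetic reconstruction
print(noncore)  # positions inside mobile genetic elements
```

Only the core list would then feed the phylogenetic reconstruction, so that horizontally transferred MGE variation does not distort the tree.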
The maximum likelihood phylogeny presented in Fig. 1 was reconstructed by using the 4310 variable sites in the core genome (table S3). We are confident that our approach has resulted in a robust tree. First, we noted little evidence of homoplasy (convergent evolution); of the 4310 sites that exhibited a SNP, only 38 (0.88%) were homoplasic (cannot be explained without convergence when mapped onto the tree) (Table 1). Notably, many of the homoplasic SNPs were in genes involved in drug resistance, with 10 corresponding to mutations known to confer resistance. Secondly, the tree showed a striking consistency with geographic source (Fig. 1). The South American isolates, with one exception, clustered tightly within a highly distinct and uniform clade, which may reflect a recent expansion of a single variant throughout the continent. Similarly, the Thai and Chinese isolates formed a single, although more diverse, Asian clade. The European isolates were more diverse still, with most positioned basally on the tree, consistent with a possible European origin for ST239. Within the European isolates, there was also evidence of geographical clustering.
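One way to flag a homoplasic site, given a tree, is to count the minimum number of changes the site requires and compare it with the number of alleles observed. The sketch below uses Fitch parsimony for that count; this is a standard technique chosen here for illustration, and the text above does not state which method the authors used. The tree encoding (nested 2-tuples of tip names) and the tip names are hypothetical.

```python
def fitch_changes(tree, states):
    """Return (possible_state_set, minimum_changes) for one site.

    tree: a tip name (str) or a 2-tuple of subtrees (rooted, binary).
    states: dict mapping each tip name to its base at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_changes(tree[0], states)
    right_set, right_cost = fitch_changes(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one change is required on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1

def is_homoplasic(tree, states):
    """A site with k alleles needs at least k - 1 changes; more implies convergence."""
    _, changes = fitch_changes(tree, states)
    return changes > len(set(states.values())) - 1

# Hypothetical four-tip tree and two illustrative sites.
tree = (("A", "B"), ("C", "D"))
compatible_site = {"A": "T", "B": "T", "C": "C", "D": "C"}
convergent_site = {"A": "T", "B": "C", "C": "T", "D": "C"}
print(is_homoplasic(tree, compatible_site))
print(is_homoplasic(tree, convergent_site))

# Proportion reported in the text: 38 homoplasic sites out of 4310.
print(round(100 * 38 / 4310, 2))
```

The first site is explained by a single change on the branch separating the two pairs, whereas the second requires two independent changes and so counts as homoplasic.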
There were several exceptions to this geographical structure that illustrate the intercontinental spread of MRSA. Two PFGE-distinguishable clones of ST239 are known to have dominated in Portuguese hospitals during the 1990s: the Portuguese clone in the early 1990s and the Brazilian clone that appeared in 1997. All seven Portuguese clone isolates recovered between 1990 and 1993 clustered together, whereas the three Brazilian clone isolates clustered within the South American clade, strongly supporting the hypothesis that this second wave in Portugal resulted from the introduction of a South American variant.
More intriguing were two European isolates that clustered within the Thai clade: DEN907, isolated in Denmark, and TW20, from a large 2-year outbreak at a London hospital (14). In addition to the core SNPs, both isolates contain the SPβ-like (TW20) prophage characteristic of the Asian clade (fig. S1B). Records for the Danish isolate indicated that the patient was Thai, consistent with its position on the tree. The position of TW20 is less readily explained and potentially points to a single intercontinental transmission event, most likely from southeast Asia, that sparked the London outbreak.
Although the current isolate collection did not permit a robust temporal analysis, a linear regression of root-to-tip distances against the year of sampling showed a strong correlation, with older isolates positioned more basally (fig. S2). The estimated mutation rate for the isolate collection was 3.3 × 10⁻⁶ [95% confidence interval (CI) from 2.5 × 10⁻⁶ to 4.0 × 10⁻⁶] per site per year and would date the most recent common ancestor of ST239 to the mid to late 1960s, a period contemporaneous with the emergence of MRSA in Europe (15). This rate is about 1000 times faster than the canonical substitution rate estimate for E. coli (16) but more in line with recent rate estimates based on analyses of more closely related bacterial genomes (17, 18). Potential explanations for this could include a reduction in effective population size, leading to increased accumulation of mutations (although we have no evidence of this), or the possibility that some of the core SNPs were transferred by recombination, although the low level of homoplasy suggests that recombination has been rare. Alternatively, it may be that the greater resolution of our analysis allows us to determine the rate of mutation in the population before selection has had time to purify out those that are detrimental. This explanation implies that purifying selection acts on all mutations, including intergenic and synonymous sites, but over longer time periods, as suggested by Moran et al. (17) and shown for nonsynonymous mutations by Rocha et al. (19).
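The root-to-tip regression can be illustrated with an ordinary least-squares fit: the slope estimates the rate in substitutions per site per year, and the x-intercept estimates the date of the most recent common ancestor. The sketch below uses synthetic data, not the study's measurements. The sampling years are hypothetical values within the 1982 to 2003 range, and the distances are generated from the reported rate with an assumed root year of 1965, so the fit simply recovers those inputs.

```python
def root_to_tip_regression(years, distances):
    """Least-squares fit of root-to-tip distance against sampling year.

    Returns (slope, intercept, x_intercept), where slope is the rate in
    substitutions per site per year and x_intercept is the year at which
    the fitted distance is zero (the estimated root date).
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, -intercept / slope

# Synthetic, noise-free data (assumptions: rate from the text, root year 1965).
ASSUMED_RATE = 3.3e-6
ASSUMED_ROOT_YEAR = 1965
years = [1982, 1990, 1997, 2003]
distances = [ASSUMED_RATE * (year - ASSUMED_ROOT_YEAR) for year in years]

slope, intercept, tmrca = root_to_tip_regression(years, distances)
print(f"rate = {slope:.2e} per site per year, root date = {tmrca:.0f}")
```

With real data the points scatter around the line, and the width of the confidence interval on the slope carries over directly to the uncertainty in the root date.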
In addition to providing evidence for intercontinental transmission of ST239 variants, these data also hold the promise of revealing fine-scale transmission events between or within single hospitals. Our data included 20 isolates collected over 7 months at a single hospital in Thailand. These isolates were surprisingly divergent when compared with the South American clade (which encompasses isolates from Brazil, Chile, Argentina, and Uruguay). However, five isolates were differentiated by only 14 SNPs: four isolates (S21, S24, S39, and S42) obtained within a 16-day period and the fifth (S81) isolated 11 weeks later. These times of isolation are consistent with our estimated mutation rate of one core SNP every 6 weeks. We examined the possibility of an epidemiological link between these five isolates and noted that the patients were located in wards in adjacent blocks of the hospital and that these wards were not represented in the more divergent isolates. This result has important implications for infection control and generates invaluable information for interventions to target MRSA transmission.
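The comparison made above rests on two quantities: the number of SNP differences between pairs of isolates, and the number expected to accumulate over a given interval at roughly one core SNP per 6 weeks. A minimal sketch of both follows. The isolate labels and allele strings are hypothetical placeholders, not data from the study, and ambiguous calls are assumed to be coded as N and skipped.

```python
from itertools import combinations

def snp_distance(a, b):
    """Count differing sites between two equal-length allele strings, skipping N."""
    if len(a) != len(b):
        raise ValueError("allele strings must be aligned to the same sites")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))

def pairwise_distances(alleles):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {
        (p, q): snp_distance(alleles[p], alleles[q])
        for p, q in combinations(sorted(alleles), 2)
    }

def expected_core_snps(weeks, weeks_per_snp=6.0):
    """Expected SNPs on one lineage, using the rate quoted in the text."""
    return weeks / weeks_per_snp

# Hypothetical alleles at six variable core sites.
alleles = {"iso_A": "ACGTAC", "iso_B": "ACGTAT", "iso_C": "TCGNAT"}
distances = pairwise_distances(alleles)
for pair, d in distances.items():
    print(pair, d)
print(expected_core_snps(12))
```

Pairs whose observed distance greatly exceeds the expectation for the time between sampling dates are unlikely to be linked by recent direct transmission, which is the logic used to separate the closely related ward cluster from the more divergent isolates.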
Typing methods, such as spa and PFGE, are routinely used for epidemiological studies of S. aureus and other bacteria and can distinguish between different ST239 variants. We explored the extent to which the variation assayed by these methods is consistent with the high-resolution SNP data. Overall, we found high levels of consistency between spa type and phylogenetic position (Fig. 2), with only a single example of a spa type being shared by unrelated isolates (GRE317 and HU25). This finding contrasts with the study of Nübel et al. (20), who noted inconsistencies between the spa data and SNP data for the ST5 lineage. One possible explanation for this discrepancy is that there has been insufficient time to accumulate numerous spa homoplasies within the younger ST239 clone.
PFGE data for the isolates (excluding the Thai isolates) divided the collection into 10 clusters (fig. S3). Again, there was a large degree of consistency between the PFGE clusters and the tree (Fig. 2). However, there were some incompatibilities. For example, cluster 6 was found in unrelated European and Asian isolates. Although certain prophage and MGEs are associated with specific clades [e.g., SPβ-like (TW20) prophage with the Asian clade], the inconsistencies here are likely to be due to the frequent gain and loss of MGEs, which can have dramatic effects on PFGE patterns.
By analyzing whole-genome data of a collection of MRSA ST239, we have gained new insights into fundamental processes of evolution in an important human pathogen. By creating a precise and robust phylogeny for the collection, we now have a highly informative perspective on the evolution of the clone.
These observations point to a limited number of successful intercontinental transmission events and expansion of subclonal variants that in some cases have become dominant in their new geographical region. The potential to detect these new introductions and target heightened infection control interventions, as occurred in the London TW20 outbreak, has clear public health implications and highlights the need for more informed global surveillance strategies. Equally important is the achievement of absolute discrimination of isolates within a single clinical setting, even those recovered only days apart, and the ability to use these SNP data to inform epidemiological analysis. Multiple additional costly infection control interventions, supported by patient, staff, and environmental screening programs, are often used to reduce MRSA transmission. The estimated rate of core genome divergence (1 SNP per ~6 weeks) should provide sufficient diversity to separate recent from distant transmission events, thereby dramatically improving contact tracing in endemic and outbreak settings and allowing targeting of diagnostics and interventions according to need. The additional variation from noncore regions provides supplementary discriminatory power and may inform the design of bespoke typing schemes for specific clones and locales.
From these data, we have described an estimated time frame for the emergence of a bacterial pathogen clone and how it has subsequently evolved. Of particular importance is the observation that over a quarter (28.9%) of the homoplasies detected can be directly related to evolution of resistance to antibiotic drugs currently in use (21-26), confirming clinical practice as a major driver of pathogen evolution and lending heightened importance to understanding the relevance of other homoplasies. Such insights inform future surveillance strategies for the detection of emerging clones and management of epidemic spread. We fully anticipate that, as the technology and analytical methods improve, the approach described here will underpin the next wave of molecular data for epidemiological and microevolutionary studies in bacteria.
The Sanger Institute is core funded by the Wellcome Trust. We thank C. Milheiriço and J. D. Cockfield for preparation of genomic DNA and G. Dougan and the Sanger Institute Sequencing and Informatics groups for general support. S.G. and A.T. were supported by grants SFRH/BPD/25403/2005 and SFRH/BD/44220/2008, respectively, from Fundação para a Ciência e Tecnologia, Portugal. E.K.N., N.C., N.D., and S.J.P. were funded by the Wellcome Trust. Funding for the sequencing of the TW20 genome was provided by Guy's and St. Thomas' Charity. J.D.E. receives funding from the Department of Health via the National Institute for Health Research's comprehensive Biomedical Research Centre award to Guy's and St. Thomas' National Health Service Foundation Trust in partnership with King's College London. The Illumina Genome Analyzer reads are deposited in the Short Read Archive (National Center for Biotechnology Information) under the accession no. ERA000102. The annotated chromosome of TW20 has been submitted to European Molecular Biology Laboratory with the accession number FN433596.
Supporting Online Material
www.sciencemag.org/cgi/content/full/327/5964/469/DC1
Materials and Methods
Figs. S1 to S4
Tables S1 to S4