Molecular pathogenomic analysis of the human bacterial pathogen group A Streptococcus has been conducted for a decade. Much has been learned as a consequence of the confluence of low-cost DNA sequencing, microarray technology, high-throughput proteomics, and enhanced bioinformatics. These technical advances, coupled with the availability of unique bacterial strain collections, have facilitated a systems biology investigative strategy designed to enhance and accelerate our understanding of disease processes. Here, we provide examples of the progress made by exploiting an integrated genome-wide research platform to gain new insight into molecular pathogenesis. The studies have provided many new avenues for basic and translational research.
Group A Streptococcus (GAS) is a Gram-positive bacterium that causes several diseases in humans, including pharyngitis and/or tonsillitis, skin infections (including impetigo, erysipelas, and other forms of pyoderma), acute rheumatic fever (ARF), scarlet fever, poststreptococcal glomerulonephritis (PSGN), a toxic shock–like syndrome, and necrotizing fasciitis (NF) (1–3). The organism is a human-adapted pathogen; that is, humans are the natural host, and there is no animal or environmental reservoir that contributes to the life cycle of GAS. ARF and the rheumatic heart disease that follows are the most common cause of preventable pediatric heart disease worldwide (4). There are 25–35 million cases of GAS pharyngitis per year in the United States, which are responsible for about 1–2 billion dollars per year in direct health care costs, and 600 million cases globally each year (Table 1) (2, 3). The continued high morbidity and mortality caused by GAS in developing nations, the substantial financial burden attributable to GAS-related health care costs in the United States, the development and spread of antibiotic resistance in this pathogen (5–7), and the lack of a licensed human vaccine (8, 9) highlight the need for a fuller understanding of the molecular pathogenesis of GAS. Especially vital is the need for information bearing on the molecular events contributing to pathogen-host interaction, clone emergence, and epidemics.
In recent years, a paradigm shift has occurred in the manner in which microbial pathogenesis problems are studied. The confluence of data derived from genome-sequencing projects and the development of microarray technology permit a genome-wide strategy to be used for pathogenesis research, now commonly referred to as molecular pathogenomics investigation (10). Comparative genome sequencing and other types of analyses of multiple isolates of the same pathogenic bacterial species have revealed unexpectedly large amounts of intraspecies variation in gene content and uncovered strategies used in genome diversification (11–19). Strains of some bacterial species differ in gene content by up to 25% and have extensive allelic variation (11, 13, 15). This striking magnitude of genetic differentiation provides tremendous adaptive flexibility and influences the spectrum of clinical disease caused by distinct clones of the species. For example, the genome of E. coli serotype O157:H7, which is responsible for hemolytic uremic syndrome, is approximately 25% larger (i.e., approximately 1 Mb of DNA larger) than that of many other strains of E. coli that do not cause significant disease in humans (13).
A molecular pathogenomics approach has been applied to GAS for nearly a decade and has yielded new information about the genetic basis of GAS pathogenesis, clone emergence, and strain genotype–disease phenotype relationships. Herein we highlight key findings that illustrate that molecular pathogenomics can be used to greatly accelerate the rate of obtaining new information about long-standing and previously intractable infectious disease questions. Only by applying hypothesis-driven research and using new technologies will the elucidation of molecular events underlying infectious disease processes proceed with maximum efficiency.
The full-genome sequences that are now available for 13 strains of GAS make it one of the more deeply sequenced species of human pathogens (15, 20–28). GAS strains can be classified by either M protein or T protein serotype, depending on the composition of bacterial antigens expressed on their cell surface, and the 13 genome sequences represent serotypes responsible for more than 70% of M protein serotypes that commonly cause GAS pharyngitis and invasive infections in the Western Hemisphere (29–31). The strains selected for sequencing were chosen mainly because they represent abundantly occurring M protein serotypes or because they had a distinct clinical phenotype of special interest, such as high virulence, antimicrobial agent resistance, or association with a particular disease (15, 20–28). Several common themes have emerged from comparison of the genome sequences (15). Each genome is approximately 1.9 Mb in size, with approximately 10% of the overall gene content encoded on variably present exogenous genetic elements such as prophages and integrated conjugative elements (15), with the former accounting for most of the variably present gene content (15, 20–28). Importantly, the marked heterogeneity of gene content is even observed among GAS strains of the same M protein serotype. This extensive variation in gene content stands in stark contrast to humans and many other higher eukaryotic species, which have less than 0.1% variation in gene content among different members of a species. The 13 GAS genome sequences provide a critical resource for initiating investigation into how variation in gene content influences pathogenesis.
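The distinction between conserved and variably present gene content can be made concrete with a small set computation. The following is only a minimal sketch under stated assumptions: the strain names, gene names, and gene sets are hypothetical toy data, not taken from any of the 13 sequenced genomes, and real comparisons depend on ortholog calling that is not shown here.

```python
# Hypothetical gene-presence sets for three strains (illustrative names only).
strains = {
    "strain_A": {"core1", "core2", "core3", "core4", "phage_x"},
    "strain_B": {"core1", "core2", "core3", "core4", "phage_y", "ice_z"},
    "strain_C": {"core1", "core2", "core3", "core4"},
}


def core_and_accessory(gene_sets):
    """Return (core, accessory): genes present in every strain vs. only some."""
    all_sets = list(gene_sets.values())
    core = set.intersection(*all_sets)
    pan = set.union(*all_sets)
    return core, pan - core


def variable_fraction(genes, core):
    """Fraction of one strain's genes that fall outside the shared core set."""
    return len(genes - core) / len(genes)


core, accessory = core_and_accessory(strains)
fractions = {name: variable_fraction(g, core) for name, g in strains.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} of genes are variably present")
```

In this toy layout, the prophage-like and conjugative-element-like genes end up in the accessory set, mirroring the observation that such exogenous elements account for most of the variably present gene content.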
As recently as a decade ago, only relatively few GAS virulence factors had been identified. These included M protein, hyaluronic acid capsule, and the extracellular cysteine protease streptococcal pyrogenic exotoxin B (SpeB). Within a few years of the first GAS genome sequence becoming available, no fewer than 13 new proteins that contribute to pathogenesis had been described (32–45). Novel pathways of GAS host-pathogen interaction also had been elucidated. For example, Edwards et al. demonstrated that supernatants derived from GAS cleaved IL-8, resulting in reduced neutrophil activation and migration (37). Using the GAS genome data, the IL-8–degrading activity was shown to be encoded by a previously uninvestigated gene (spy0416, using the serotype M1 numbering system for strain SF370), and the proteinase encoded by this gene was subsequently termed “Streptococcus pyogenes cell envelope proteinase” (SpyCEP) (46, 47). In a short time, it has been learned that SpyCEP-mediated cleavage of IL-8 decreases neutrophil endothelial transmigration and that degradation of additional chemokines by SpyCEP retards neutrophil activation (46–48). Together, these discoveries indicate a key role for SpyCEP in GAS host-pathogen interaction (46–50). Similar rapid progress, facilitated by the availability of the GAS genome sequences, has been made in identifying and characterizing the function of novel GAS immunoglobulin-degrading enzymes (39, 40, 51), collagen-like proteins (41, 44, 52), and several superantigens (53–55). Given that numerous open reading frames in the GAS genome that encode putative cell surface proteins have yet to be investigated, it seems highly likely to us that heretofore unidentified GAS virulence factors will continue to be discovered in the coming years.
As noted above, the GAS genome encodes a broad range of virulence factors that are critical to the diverse array of infections that the bacterium causes (1). One important impact of GAS genome-wide investigations has been to provide an enhanced molecular understanding of how the pathogen coordinates virulence factor production (56). There are 13 conserved two-component regulatory systems (TCSs) in the completed GAS genomes, only one of which had been studied prior to the availability of the first GAS genome sequence in 2001 (57). TCSs regulate multiple unlinked chromosomal genes and control coordinated expression of genes encoding virulence factors, such as toxins, degradative enzymes, and immune-modulating molecules, in response to environmental stimuli. Since 2001, each of the 13 TCSs has been studied by genome-wide analyses to some degree, many in detail (58–62). Similarly, there are more than 100 putative stand-alone transcriptional regulators encoded within the GAS genome, only a few of which had been identified before the availability of a genome sequence. Of the 100 putative stand-alone transcriptional regulators, 12 have now been investigated in detail, and the transcriptomes that eight of these regulate in vitro have been determined (17, 63–68).
Genome-wide investigations of GAS regulatory pathways have transformed our understanding of pathogenesis. A key finding has been that global gene transcription varies greatly depending on the environment, growth conditions, and stage of growth of the bacterium (69–72). For example, Voyich et al. (73) have shown that the GAS transcriptome is substantially altered in response to phagocytosis by human polymorphonuclear leukocytes (PMNs), a key step in invasive GAS disease. Similarly, interaction with pharyngeal epithelial cells and human saliva was found to induce marked alterations in the GAS transcriptome, thereby providing new information about potential strategies used for infection of and persistence in the oropharynx (69, 74). Longitudinal analysis of changes in the GAS transcriptome over time in nonhuman primates has revealed that the temporal pattern of GAS gene transcription in pharyngitis is very closely linked to three distinct phases of infection, namely colonization, acute infection, and asymptomatic carriage (75). During the colonization phase, when GAS CFUs were low, the expression of genes involved in carbohydrate metabolism was greatly increased, suggesting that carbon source acquisition is a key step in initial GAS growth and establishment of infection. Genes encoding GAS virulence factors with known roles in GAS survival, dissemination, inhibition of PMN recruitment, and induction of host cytokines were highly expressed in the acute phase of infection, concomitant with an increase in GAS CFUs and host inflammation. Thus, the in vivo infection data are in strong alignment with ex vivo results (i.e., those obtained using saliva, epithelial cells, and PMNs).
A key finding of repeated GAS transcriptome analyses has been the elucidation of previously unappreciated connections between distinct gene categories. For example, recent work has shown that transcript levels of genes encoding proteins involved in carbohydrate catabolism and genes encoding virulence factors change in concert in response to environmental stimuli (74, 75). Such changes are at least partially due to a regulatory circuit controlled by catabolite control protein A (CcpA) (Figure 1) (59, 74–78). Similarly, a genome-wide investigation of SpeB regulation led to the discovery that a lactose catabolism enzyme (LacD.1) has evolved to coordinate alterations in GAS virulence factor production as a result of changes in carbon source availability (Figure 1) (79). Analysis of completed GAS genomes led to the discovery that one of the two GAS lactose operons has retained a catabolic role, whereas the other has evolved a regulatory function (Figure 1) (80).
In addition to discovering ties between central metabolic processes and pathogenesis, GAS genome-wide studies have also revealed interactions between metal regulation, oxidative stress, and pathogenesis. Analysis of the completed genomes indicates that GAS encodes two highly conserved metalloregulators, MtsR and PerR, that regulate proteins involved in iron and manganese uptake (81, 82). Animal studies have found that PerR is needed for full GAS virulence in skin and soft tissue infection and in oropharynx infection (65, 82, 83). Surprisingly, the PerR regulon was found to include numerous carbohydrate utilization genes, suggesting links among central metabolic processes, oxidative stress response, and virulence (65). The key role for MtsR in the development of NF is discussed below in “Genome-wide dissection of the molecular events underlying GAS epidemics.”
The two major sites of GAS infection are the human throat and skin (84). It has long been recognized that particular M protein serotypes mainly cause pharyngitis, whereas others predominate in skin infection, leading to the idea of skin-specialist and throat-specialist GAS strains (85, 86). However, the molecular basis for these observations has been unclear (86). Bioinformatic study of GAS genomes led to the identification of an area of the genome that is highly variable between M protein serotypes, referred to as the fibronectin-binding, collagen-binding T antigen (FCT) region (85). Lately, evidence has accumulated that genetic heterogeneity within the FCT region may be a major factor determining why particular GAS strains colonize and infect distinct host regions (87). For example, the FCT region contains genes encoding the recently discovered cell surface pili that are critical to GAS epithelial cell adhesion (32, 33, 36). Thus, strain-to-strain differences in pili composition may contribute to the predisposition of particular GAS M protein serotypes for certain host sites in an analogous fashion to that observed for E. coli (88). Elucidation of the molecular basis for why particular GAS strains colonize and infect particular host environments holds the promise for developing novel preventive and therapeutic targets.
GAS thrives at human mucosal sites and also causes devastating invasive infections. Strains isolated from mucosal sites are genetically indistinguishable from invasive strains by standard assays, such as M protein serotyping and multi-locus sequence typing (89, 90). However, these techniques index only a very small part of the genome, which means that they greatly underestimate the amount of genetic variation present. Using newly developed genome resequencing techniques, Sumby et al. (91) analyzed the complete genomes of GAS isolates recovered from the spleen of mice that had been infected subcutaneously. Compared with the strain used to inoculate the mouse skin, the invasive GAS isolates (i.e., those obtained from the spleen after skin inoculation) had mutated forms of the control of virulence (CovR/S) TCS (91). Mutation of this TCS resulted in derepression of numerous virulence factors critical for combating the host immune system components encountered during bloodstream infection (Figure 2) (91). For example, GAS secretes a potent DNase that is involved in escape from neutrophil extracellular traps (NETs), a host immune defense mechanism generated by dying PMNs (92). The DNase is upregulated in CovR/S mutants, thereby contributing to the development of invasive disease. The clinical relevance of these findings was confirmed by the discovery that GAS strains causing invasive infections in humans often have function-altering mutations in the genes that encode the components of the CovR/S TCS (91, 93). Thus, it is currently thought that GAS mucosal isolates have an intact CovR/S system that limits GAS virulence factor production, whereas interaction with the host immune system and/or deep tissues selects for strains with CovR/S mutations, leading to a hypervirulent phenotype and the serious manifestations of invasive GAS disease (Figure 2).
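The logic of comparing an inoculating strain with an isolate recovered after infection can be illustrated with a toy comparison of two pre-aligned sequences. This is a minimal sketch only: the sequences are invented, the function name is hypothetical, and it is not the resequencing method used in ref. 91, which involves far more than a position-by-position comparison.

```python
def point_differences(reference, isolate):
    """List (position, ref_base, isolate_base) for two aligned, equal-length sequences."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, q)  # report 1-based positions
        for i, (r, q) in enumerate(zip(reference, isolate))
        if r != q
    ]


# Invented sequences standing in for a locus in the inoculum and in a recovered isolate.
inoculum = "ATGACCGATAAA"
spleen_isolate = "ATGACCGTTAAA"

diffs = point_differences(inoculum, spleen_isolate)
for pos, ref_base, iso_base in diffs:
    print(f"position {pos}: {ref_base} -> {iso_base}")
```

A genome-wide version of the same idea is what allows differences to be found in loci, such as those encoding a regulatory system, that serotyping and multi-locus sequence typing never index.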
Several types of molecular events contribute to the evolution and emergence of bacterial strains with enhanced virulence. The best understood and by far the most studied process is horizontal gene transfer (HGT), which involves bacterial viruses known as bacteriophages (transduction), plasmids (conjugation), and genomic DNA (transformation) (16, 94). HGT events create new strain genotypes by moving blocks of genetic material (sometimes large pieces of DNA that exceed 40–60 kb in size) between strains. Thus, HGT events represent a quantum evolutionary leap that can increase bacterial fitness by enhancing antimicrobial agent resistance, immune avoidance, and capacity to colonize or infect a new ecological niche. As in eukaryotes, bacterial evolution also occurs by more subtle processes, including accumulation of point mutations and small genomic changes such as those created by slipped-strand mispairing. Several of these molecular processes have contributed to the recent emergence and intercontinental dissemination of a new clone of serotype M1 GAS with distinct virulence properties (22, 95–97). This understanding was revealed by several lines of work, including comparative genome characterization conducted in several laboratories in a span of almost 20 years (22, 46, 95, 96, 98, 99). Comparative pathogenomic analysis resulted in two particularly important findings (22). First, low- and high-virulence serotype M1 strains differ in bacteriophage content and chromosomal integration site (22). Second, it was also unexpectedly discovered that another HGT event, involving reciprocal recombination of a 36-kb chromosomal region encoding the secreted toxins streptolysin O and NAD+-glycohydrolase, was a critical evolutionary event that shaped the genome of contemporary virulent M1 strains (22). The likely mechanism underlying this event was generalized transduction, a process involving inadvertent packaging of a random chromosomal segment from a donor strain into a bacteriophage capsid head, followed by transfer to a new recipient bacterial strain. Importantly, contemporary virulent M1 strains produce high levels of these two toxins compared with older, less virulent M1 strains, but the underlying molecular mechanism of increased expression is not yet understood.
As in other infectious diseases, the clinical manifestations of GAS infection reflect interaction between the bacterium and the host. In contrast to situations in which infection with the pathogen is the prime determinant of disease, such as occurs for Bacillus anthracis or Ebola virus, most humans are repeatedly exposed to and colonized by GAS without developing clinical symptoms. Moreover, GAS isolates that cause superficial infections such as pharyngitis and impetigo can be genetically closely related to those of the same M protein serotype that cause lethal infections such as NF and toxic shock syndrome (90, 100). Therefore, it is highly probable that the development of severe clinical manifestations following GAS infection has a strong host susceptibility component. Most investigations of host susceptibility to severe GAS infection have focused on the role of HLA polymorphisms (101, 102). The interaction of GAS superantigens with HLA class II molecules on antigen-presenting cells can result in the activation of up to 25% of all T cells at a given time, although the number of T cells activated by a particular GAS superantigen varies substantially from person to person (103, 104). The protective and deleterious roles of particular HLA alleles in the development of streptococcal toxic shock syndrome were observed in an epidemiologic investigation and confirmed using transgenic mice (101, 102, 105).
In addition to focused research on HLA polymorphisms, recent studies have also begun to dissect the molecular basis of host susceptibility to serious GAS infection in mice using a genome-wide approach (106, 107). Researchers have exploited differences in susceptibility to GAS infection among strains of genetically defined mice to begin to localize protective and deleterious host genetic polymorphisms (106, 107). For example, a heightened inflammatory response to GAS was associated with worse outcomes in a mouse model of infection and linked to alterations in the expression of genes involved in apoptosis, macrophage activation, and prostaglandin synthesis (106). A genome-wide transcriptome analysis of murine macrophages also identified genes encoding proteins involved in prostaglandin synthesis as being upregulated during interaction with GAS, and the use of inhibitors of prostaglandin synthesis has been associated with severe GAS infection in humans (108, 109). There are significant experimental design and execution barriers to extrapolating the genome-wide approach to enhance understanding of human susceptibility to GAS infections. However, the key point is that the increasing availability of tools such as genome-wide SNP analysis holds significant promise for increasing our understanding of human host determinants of GAS infection.
Although group B Streptococcus (GBS) is a well-known cause of serious neonatal or maternal infections, GAS can also be responsible for these infections. Genome sequencing and pathogenesis studies have provided unexpected clues as to why serotype M28 GAS strains are repeatedly overrepresented in puerperal sepsis (childbed fever) cases (23).
It was hypothesized (23) that analysis of the genome sequence of a serotype M28 GAS strain causing puerperal sepsis would identify novel genetic elements that contributed to the overrepresentation of strains of this serotype in this infection type. Analysis of the genome sequence of a serotype M28 strain has borne this out, providing the highly unexpected discovery that these GAS strains have a genome that is a chimera, composed of largely GAS genetic material onto which has been molecularly grafted a large piece of foreign DNA shared with GBS strains (23) (Figure 3). The foreign DNA is 37.4 kb in size, was acquired by HGT, and encodes seven secreted proteins that are produced in human infections (35). With very few exceptions, this genetic element is not present in other GAS strains. Importantly, one of the secreted proteins mediates attachment of M28 strains to human urogenital epithelium; another binds to GP340, a large human glycoprotein abundantly present in vaginal and oral secretions (35, 110). Thus, the M28 genome has been shaped by acquisition of a foreign genetic element that assisted in creating a disease-specialist GAS strain. As with other GAS molecular pathogenomics studies, these discoveries have served to catalyze downstream pathogenesis experiments. Studies are ongoing that are designed to provide a deeper understanding of the precise molecular processes involved in this niche adaptation and subsequent infection.
GAS has been used as a model system to study the molecular processes contributing to epidemics. Since 1992, more than 350 serotype M3 strains have been recovered in a prospective population-based surveillance study of GAS invasive infections being conducted in Ontario, Canada (27, 111, 112). These strains have caused two temporally distinct epidemic waves, centered in 1995 and 2000 (19). A molecular pathogenomics approach allowed the identification of key contributors to the episodic behavior of the GAS invasive isolates. First, the distinct epidemics were shown to be caused by a heterogeneous array of serotype M3 subclones, rather than recycling of a single clone (19). These distinct clones were characterized by the acquisition or loss of specific prophages that encode known GAS virulence factors. Second, host selective pressure appears to have resulted in a highly successful clone that rose to dominance in the second epidemic wave. DNA sequence analysis, coupled with immunologic studies, identified a four–amino acid duplication in the amino terminus of M protein in the new subclone responsible for many of the invasive cases in the 2000 epidemic (19). This duplication resulted in alterations in linear B cell epitopes, which produced substantial differences in the ability of human PMNs to phagocytose and kill strains with the variant M protein (19). This key finding indicated that subtle or relatively minor allelic variation may participate in clone emergence and perpetuation of the epidemics.
To identify microbial genes or specific allelic variants that influence the outcome of host-pathogen interactions, it is possible to use a strategy analogous to that commonly practiced when undertaking genome-wide association studies (GWASs) — using genetic methods such as high-density SNP analysis — on humans with particular disease phenotypes, such as type 2 diabetes mellitus, macular degeneration, and schizophrenia (113). In many regards, GWASs are considerably simpler in bacterial infectious disease studies because, compared with humans, prokaryotic organisms have very small genomes and isogenic mutant strains can be generated and used to confirm the findings from the genetic association study. A novel bacterial GWAS conducted on strains recovered in the Ontario GAS epidemics (17) led to the discovery that a single nucleotide mutation in the gene encoding the MtsR metalloregulatory protein implicated in uptake of iron or manganese results in a decrease in the ability of GAS to cause devastating NF. The mechanism underlying this decreased ability to cause severe disease involves dysregulation of the control circuit responsible for wild-type levels of SpeB (our unpublished observations), a potent broad-spectrum protease that degrades extracellular matrix proteins, inactivates innate immune molecules, and destroys host tissue (114, 115). Inasmuch as single nucleotide mutations are the most abundant cause of genetic variation among members of the same species (13–15, 17, 18, 116), this discovery has broad implications for the confluence of bacterial molecular population genomics and pathogenesis research.
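The core statistical step of a genetic association study, testing whether an allele is unevenly distributed between disease phenotypes, can be sketched with a two-sided Fisher's exact test on a 2x2 table. This is an illustration under stated assumptions: the counts are hypothetical, the choice of Fisher's exact test is ours, and the text above does not describe which statistical procedure was used in ref. 17.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: rows are allele (variant, wild type),
# columns are phenotype (NF cases, non-NF invasive cases).
p_value = fisher_exact_two_sided(1, 9, 8, 2)
print(f"p = {p_value:.4f}")
```

As the paragraph notes, an association found this way is only a starting point in bacteria; isogenic mutant strains can then be constructed to test whether the allele actually causes the phenotype.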
In the long term, the key to the control of most human pathogens is the use of efficacious vaccines, and GAS is no exception. Transformation of vaccine research and development by genome-wide analyses is expertly summarized in this Review series by Rinaudo et al. (117). Nevertheless, we believe that a few points related to GAS vaccine research need to be stressed here. Although the quest for a successful human vaccine against GAS pharyngitis and invasive infections has been ongoing for many decades (reviewed in ref. 8), and much progress has been made, a product licensed for use in the United States or elsewhere is still lacking. The many complete genome sequences of GAS that are publicly available have been used to assist the decades-long search for GAS vaccine candidates. For example, Lei et al. (118) analyzed the extracellular proteome and identified several previously undescribed proteins that have been the subject of subsequent vaccine research. Similarly, bioinformatic analysis of multiple GAS genomes identified 16 highly conserved cell surface lipoproteins. Subsequent biochemical, immunological, and mouse model research identified five lipoproteins as potential new vaccine candidates (119). Using a proteomics approach, Cole et al. showed that the GAS cell surface contains numerous immunogenic proteins not previously suspected of being associated with the cell wall, thus expanding the number of potential GAS vaccine candidates (120). The presence of proteins on the GAS cell surface or in the culture supernatant that lack traditional motifs associated with cell surface and secreted proteins has been a consistent finding of proteomic studies, stressing the need for experimental validation of predictions based on bioinformatic analyses (118, 120–123). Improved modalities for separating GAS cellular versus cell surface constituents may provide additional insights into novel GAS proteins that are critical for host-pathogen interaction and are thus potential vaccine candidates (124). In the most extensive GAS vaccine candidate study to date, Rodriguez-Ortega et al. (50) analyzed the surface-exposed proteome of GAS and identified one new antigen (SpyCEP) that conferred the ability to protect mice against lethal infection.
Enormous strides have been made in the last decade in our understanding of GAS-host interactions, and molecular pathogenomics studies have contributed substantially. However, many gaps remain in our knowledge. Of note, many aspects of the host response during GAS infection remain largely unknown, but application of the genome-wide integrative strategies discussed above to the host side of the equation is likely to yield useful data. Analysis of patients with ARF may be particularly useful in this regard, as elucidating the host genetic factors that contribute to susceptibility may aid in the development of novel diagnostics and treatments for this devastating disease. In addition, DNA sequencing costs continue to decrease dramatically, opening the door to many types of projects previously impossible due to financial constraints. For example, it is now reasonable to consider projects that involve genome sequence analysis of many hundreds or more GAS strains, similar to the 1000 Genomes Project that is well underway (http://www.1000genomes.org/). Finally, we are hopeful that the new avenues of basic and translational research made possible by molecular pathogenomics studies will ultimately provide strategies for ameliorating human morbidity and mortality caused by GAS.
We thank K. Stockbauer for assistance with figures and members of our laboratories and anonymous reviewers for suggestions to improve the manuscript. The restricted length of the review prohibited us from citing all relevant work.
Conflict of interest: J.M. Musser received research support from Novartis Vaccines.
Citation for this article: J. Clin. Invest. 119:2455–2463 (2009). doi:10.1172/JCI38095