A pneumococcal serotyping/genotyping system (PSGS) was developed based upon targeted PCR, followed by electrospray ionization mass spectrometry and amplicon base composition analysis. Eight multiplex PCRs comprising 40 primer pairs, 32 targeting serotype-determining capsular biosynthetic loci and 8 targeting multilocus sequence typing (MLST) loci, were employed for each of 229 highly diverse Streptococcus pneumoniae isolates. The most powerful aspect of the PSGS was the identification of capsular serotypes accounting for the majority of invasive and carried pneumococcal strains. Altogether, 45 different serotypes or serogroups were correctly predicted among the 196 resolvable isolates, with only 2 unexpected negative results. All 33 isolates that represented 23 serotypes not included in the PSGS yielded negative serotyping results. A genotyping database was constructed using the base compositions of 65- to 100-bp sections of MLST alleles compiled within http://www.mlst.net. From this database, one or more MLST sequence types (STs) that comprised a PSGS genotype were identified. The number of PSGS genotypes (163) exceeded the number of conventional STs actually tested (155), primarily because of amplification failures of 1 to 3 targets. In many instances, the PSGS genotype could provide resolution of single- and double-locus variants. This molecular serotyping/genotyping scheme is well suited to rapid characterization of large sets of pneumococcal isolates.
Disease-causing Streptococcus pneumoniae strains are genetically highly diverse and collectively cause a staggering global disease burden. The most important single parameter for epidemiological typing of this pathogen is the capsular serotype, for which there are 93 known types (5, 7, 16, 20, 22). The capsule is the organism's most important virulence factor, and current licensed vaccines effectively target subsets of specific capsular serotypes. Analysis of the sequences of the known capsular loci (4) has led to the development of useful PCR-based methods for deducing important serotypes (2, 10, 17, 18, 21).
Pneumococcal genotyping is also important for effective epidemiological monitoring of the pathogen. Multilocus sequence typing (MLST) provides unambiguous digital identifiers of specific clones (13). These MLST sequence types (STs) can reveal important insights into the emergence of specific clones within specific serotypes. The emergence of specific serotype-ST combinations can sometimes be traced back to serotype switch events or recent imports of internationally successful pathogenic strains (3, 6, 19).
The PCR/electrospray ionization mass spectrometry (ESI-MS) genetic analysis approach using the Ibis T5000 platform involves distributing a DNA extract into wells of a microtiter plate, each of which contains one or more PCR primer sets (11). The PCR product is electrosprayed into a mass spectrometer, allowing the derivation of the base composition of each DNA strand from the mass/charge ratio. In addition, a synthetic internal positive control targeted by one of the primer pairs is included in each well. Thus, if signal originates only from the positive control, it is established that amplification conditions were satisfactory but the sample DNA concentration was too low or the primers did not hybridize to the sample DNA. Using combined analysis of base compositions from multiple PCRs, evaluation of the genomic sample can be made. DNA base composition determination of specifically targeted PCR fragments through PCR/ESI-MS methodology has tremendous resolving power in the identification and genotyping of specific pathogens (12).
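The step from a measured strand mass to a base composition can be illustrated with a minimal sketch. It is not the T5000 software: it treats a single strand in isolation, uses commonly quoted average residue masses, and assumes a strand with 5'-hydroxyl and 3'-hydroxyl ends (hence the end correction). The function names, the tolerance, and the length cap are illustrative assumptions.

```python
# Average masses (Da) of the deoxynucleotide monophosphate residues in a DNA chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# Assumed end-group correction for a strand with 5'-OH and 3'-OH termini.
END_CORRECTION = -61.96


def strand_mass(comp):
    """Average mass of one strand with base counts comp = {"A": n, "C": n, ...}."""
    return sum(RESIDUE_MASS[b] * n for b, n in comp.items()) + END_CORRECTION


def compositions_for_mass(mass, tol=0.5, max_len=150):
    """Enumerate base compositions whose calculated mass is within tol of mass."""
    target = mass - END_CORRECTION
    hits = []
    for a in range(max_len + 1):
        m_a = a * RESIDUE_MASS["A"]
        if m_a > target + tol:
            break
        for g in range(max_len + 1 - a):
            m_g = m_a + g * RESIDUE_MASS["G"]
            if m_g > target + tol:
                break
            for c in range(max_len + 1 - a - g):
                m_c = m_g + c * RESIDUE_MASS["C"]
                if m_c > target + tol:
                    break
                # The T count is fixed by whatever mass remains.
                t = round((target - m_c) / RESIDUE_MASS["T"])
                if t < 0 or a + g + c + t > max_len:
                    continue
                if abs(m_c + t * RESIDUE_MASS["T"] - target) <= tol:
                    hits.append({"A": a, "C": c, "G": g, "T": t})
    return hits


example = {"A": 5, "C": 4, "G": 6, "T": 5}
candidates = compositions_for_mass(strand_mass(example), tol=0.01)
```

With a tight mass tolerance the candidate list is short; widening `tol` shows how quickly ambiguity grows, which is one reason accurate mass measurement matters for this approach.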
Current methods for serotyping and genotyping pneumococci are laborious and time-consuming. Here, we present a PCR/ESI-MS approach for obtaining both parameters concurrently and rapidly. It relies on targeted PCR followed by mass spectrometry, which determines the base compositions of the separate PCR fragment strands; these compositions are then matched against a database of known sequences.
A highly diverse and well-characterized collection of 229 pneumococcal disease isolates and reference strains, altogether representing 76 different serotypes and 155 multilocus STs, was tested. They included 161 invasive isolates recovered through CDC's Active Bacterial Core surveillance (15) that represented 57 different serotypes. All serotypes had been previously determined using the Quellung reaction with typing sera prepared in the CDC Respiratory Diseases Branch Streptococcus Laboratory. STs were assigned for the 229 S. pneumoniae isolates by performing MLST as previously described (13) with modifications (19). In addition, six strains within the non-pneumococcal streptococcal species Streptococcus mitis, Streptococcus pseudopneumoniae, Streptococcus oralis, and Streptococcus pyogenes were tested.
Crude DNA extracts were prepared from fresh pneumococcal cultures grown on sheep blood agar plates, as previously described (21).
All aspects of the highly automated PCR/ESI-MS methodology using the T5000 unit have been previously described (11, 12). PCR reagents were distributed in a 96-well format to provide the capability of querying up to 12 DNA extracts with the 40 different primer pairs (Table 1). The PCR/ESI-MS pneumococcal serotyping and genotyping system (PSGS) employed 40 primer pairs, 32 of which targeted serotype-specific genes within pneumococcal capsular biosynthetic loci corresponding to the serotypes most frequently associated with invasive disease. The serotypes targeted include all of those contained in the recently licensed 13-valent conjugate vaccine (9), as well as the 23 serotypes included in the purified capsular polysaccharide vaccine (23). Additional serotypes commonly found in pneumococcal carriage were also targeted by the PSGS. The remaining 8 primer pairs used for the PSGS targeted highly discriminating segments of the 7 pneumococcal MLST loci (13), with the ddl locus targeted twice. These 40 primer pairs were distributed into eight 5-plex PCRs, as shown in Table 1 (Multiplex PCRs 1 to 8). The T5000 instrument allows the analysis of any combination of these 40 amplicons using PCR-electrospray ionization mass spectrometry (11). A positive control, designed to be targeted by each of the eight MLST primer pairs, was included in all wells.
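The plate arithmetic described above (8 multiplex reactions of 5 primer pairs each, 12 extracts per 96-well plate) can be sketched as follows. The assignment of multiplex reactions to rows and extracts to columns, and the `m-5` label for the MLST primer pair of each reaction, are assumptions made for illustration; Table 1 is the authoritative source for the actual layout.

```python
ROWS = "ABCDEFGH"   # assumed: one plate row per multiplex PCR (1 to 8)
N_COLUMNS = 12      # assumed: one plate column per DNA extract


def primer_pairs(multiplex):
    """Primer pair labels for one 5-plex reaction, e.g. '1-1' .. '1-5'.

    Positions 1-4 are serotyping pairs; position 5 is the MLST pair
    (the '-5' label itself is a hypothetical naming choice).
    """
    return [f"{multiplex}-{k}" for k in range(1, 6)]


def plate_layout(extracts):
    """Map each well (e.g. 'A1') to its extract and multiplex reaction."""
    if len(extracts) > N_COLUMNS:
        raise ValueError("a 96-well plate holds at most 12 extracts")
    layout = {}
    for col, extract in enumerate(extracts, start=1):
        for multiplex, row in enumerate(ROWS, start=1):
            layout[f"{row}{col}"] = {
                "extract": extract,
                "multiplex": multiplex,
                "primer_pairs": primer_pairs(multiplex),
            }
    return layout


layout = plate_layout([f"extract_{i}" for i in range(1, 13)])
all_pairs = {p for m in range(1, 9) for p in primer_pairs(m)}
serotyping_pairs = {p for p in all_pairs if not p.endswith("-5")}
```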
The PSGS serotyping signature database corresponded to the base compositions of 46- to 110-bp amplicons derived from the targeting of 4 to 56 bp of known capsular biosynthetic loci by 32 distinct primer pairs (the first 4 primer pairs listed for each well of multiplex reactions 1 to 8 in Table 1). The PSGS MLST signature database corresponded to the base compositions of 117- to 142-bp amplicons derived from the targeting of 65- to 100-bp segments of the 7 MLST alleles included in the MLST database (the 5th primer pair listed for each of multiplex reactions 1 to 8 in Table 1); they included 6,158 STs as of 28 January 2011 (http://spneumoniae.mlst.net/). STs were scored by the best matches of base composition to the 8 different segments within the sequence database. STs consistently shared between all amplified targets were included within the assigned PSGS genotype. Thus, the PSGS genotype is defined as the aggregate of STs determined in this manner, with each ST defined as a PSGS genotype component.
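The definition of a PSGS genotype as the set of STs consistent with every amplified target can be sketched as a simple filter over a signature database. This is a simplified illustration, not the scoring used by the instrument software: it uses exact matching rather than best-match scoring, and the ST labels, target names, and base compositions below are invented placeholders.

```python
def psgs_genotype(observed, database):
    """Return the STs whose signatures agree with every amplified target.

    observed: {target: base composition or None if the target failed}
    database: {st: {target: base composition}}
    Base compositions are (A, G, C, T) count tuples.
    """
    amplified = {t: comp for t, comp in observed.items() if comp is not None}
    return sorted(
        st
        for st, signature in database.items()
        if all(signature.get(t) == comp for t, comp in amplified.items())
    )


# Hypothetical three-target database (the real assay uses eight targets).
DATABASE = {
    "ST_x": {"xpt": (20, 25, 18, 22), "ddl_1": (30, 20, 21, 24), "ddl_2": (25, 25, 20, 20)},
    "ST_y": {"xpt": (20, 25, 18, 22), "ddl_1": (30, 20, 21, 24), "ddl_2": (25, 24, 21, 20)},
    "ST_z": {"xpt": (21, 24, 18, 22), "ddl_1": (30, 20, 21, 24), "ddl_2": (25, 25, 20, 20)},
}

complete = {"xpt": (20, 25, 18, 22), "ddl_1": (30, 20, 21, 24), "ddl_2": (25, 25, 20, 20)}
partial = dict(complete, ddl_2=None)  # one target failed to amplify

full_genotype = psgs_genotype(complete, DATABASE)
partial_genotype = psgs_genotype(partial, DATABASE)
```

In this toy case the complete signature resolves to a single ST, whereas losing one target leaves two component STs, since a failed target simply removes a constraint.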
The eBURST algorithm (14) was used as described at http://spneumoniae.mlst.net/eburst/. The program resolves 7 locus genotypes into related groups, with the simple requirement that each genotype included within a group share 6 of 7 identical loci with at least one other member of the group. Further, the algorithm predicts that the founder genotype of the group exhibits the greatest number of single-locus variants present within the group. For the purposes of this work, eBURST was used to divide all conventionally obtained MLST types into individual groups and singletons. eBURST was additionally employed to analyze the PSGS genotype components in order to relate them to the MLST type (see Table S1 in the supplemental material).
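The grouping rule described above can be sketched in a few lines. This is a simplified illustration of the 6-of-7 rule and of the founder as the member with the most single-locus variants; it omits other features of the published eBURST program (for example, statistical support for founder assignments), and the ST labels and allele numbers are invented.

```python
from itertools import combinations


def shared_loci(p, q):
    """Number of loci at which two 7-locus allelic profiles are identical."""
    return sum(x == y for x, y in zip(p, q))


def eburst_groups(profiles):
    """Group STs linked by chains of single-locus variants (>= 6 of 7 loci shared).

    Returns a list of (sorted members, predicted founder); singletons get None.
    """
    slv = {st: set() for st in profiles}
    for a, b in combinations(profiles, 2):
        if shared_loci(profiles[a], profiles[b]) >= 6:
            slv[a].add(b)
            slv[b].add(a)

    groups, seen = [], set()
    for st in profiles:
        if st in seen:
            continue
        members, stack = set(), [st]
        while stack:  # depth-first walk over single-locus-variant links
            cur = stack.pop()
            if cur in members:
                continue
            members.add(cur)
            stack.extend(slv[cur] - members)
        seen |= members
        founder = None
        if len(members) > 1:
            # Founder: the member with the most single-locus variants.
            founder = max(sorted(members), key=lambda s: len(slv[s]))
        groups.append((sorted(members), founder))
    return groups


PROFILES = {
    "ST_F": (1, 1, 1, 1, 1, 1, 1),
    "ST_S1": (2, 1, 1, 1, 1, 1, 1),   # single-locus variant of ST_F
    "ST_S2": (1, 2, 1, 1, 1, 1, 1),   # single-locus variant of ST_F
    "ST_S3": (1, 1, 2, 1, 1, 1, 1),   # single-locus variant of ST_F
    "ST_D1": (1, 1, 2, 2, 1, 1, 1),   # SLV of ST_S3, double-locus variant of ST_F
    "ST_X": (5, 6, 7, 8, 9, 10, 11),  # unrelated singleton
}
groups = eburst_groups(PROFILES)
```

Note that a double-locus variant such as `ST_D1` joins the group only because it is a single-locus variant of another member, which mirrors the chained relationships described for eBURST group 1 in the Results.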
Altogether, the PSGS assigned 45 serotypes or serotype groups to 194/229 S. pneumoniae isolates (84.7%). The following 32 individual serotypes (number of isolates tested in parentheses) were correctly assigned to 116 isolates: 1 (3), 3 (3), 4 (4), 5 (4), 8 (2), 9L (3), 9N (3), 10A (6), 10B (1), 11F (1), 13 (3), 14 (6), 15A (6), 16F (4), 17F (3), 18A (3), 19A (10), 19B (2), 19C (1), 19F (7), 20 (3), 23A (2), 23B (2), 23F (10), 25F (1), 31 (3), 34 (2), 35B (6), 35F (4), 37 (4), 38 (4), and 40 (2). The PSGS correctly assigned 1 of 13 combinations of 2 or 3 serotypes to 78 isolates, as some targeted gene segment sequences were shared between closely related serotypes (Table 2). Each multiplex PCR included at least one primer pair that allowed the resolution of multiple serotypes through base composition analysis of the resultant amplicon (serotypes separated by a semicolon in Table 1). In each instance, distinctive signatures were captured by ESI-MS, and the respective serotypes were correctly determined. In particular, the primer pair 1-1 fully resolved the 20 isolates of serogroup 19A/B/C/F into the 4 component serotypes 19A, 19F, and the rarely encountered serotypes 19B and 19C; for all 17 isolates belonging to the main serotype 19A or 19F, the serotypes were also independently confirmed by amplicon 4-2 analysis. No serotype was predicted for the 33 isolates that represent 23 rarer serotypes not covered by the PSGS (number of isolates tested in parentheses): 2 (1), 10F (2), 11B (1), 11C (2), 21 (1), 24A (1), 24B (2), 24F (2), 28A (2), 28F (2), 32A (2), 33B (1), 33C (1), 35A (1), 35C (2), 41A (2), 41F (1), 42 (2), 43 (1), 45 (1), 47A (1), 47F (1), and 48 (1). We noted only two failures in the serotyping assay: no serotype result was provided for one of three serotype 13 isolates and one of two serotype 40 isolates. Retrospective attempts to amplify the 2 strains with individual PCRs (Table 1, primer sets 7-4 and 8-1) were unsuccessful, indicating sequence divergence from the available known capsular biosynthetic loci from serotypes 13 and 40 (4).
The genotyping component of the PSGS assay is based on amplicon mass signatures retrieved in 8 targets representing the 7 MLST loci (ddl is targeted twice). Therefore, the reported genotypes consist of lists of 1 to 77 component STs (10.5 on average) that are compatible with the aggregate signatures that were observed. In the present study, the PSGS assigned one of 163 distinct genotypes to all 229 isolates (see Table S1 in the supplemental material). The conventionally obtained isolate ST was a PSGS genotype component in 217/229 instances (94.8%), including 42 instances (18.3%, representing 39 different STs) in which the isolate ST was uniquely retrieved. Of the 12 isolates for which the ST was not retrieved, 4 belonged to STs that were not represented in the PSGS signature database at the time of analysis; the remaining 8 isolates were not properly identified because of incorrect or missing signature determinations for one of the eight genotyping targets. Nonetheless, in 6 of those 12 instances, the isolate ST shared high relatedness with the PSGS output STs (see Table S1 in the supplemental material for PSGS genotypes 8, 29, 35, 109, and 157), while in the remaining instances the retrieved component STs differed from the isolate ST by 2 alleles or more.
Complete, 8-target signatures are normally expected for all STs; however, nine rare STs share a deletion in xpt, which would result in a 7-target signature. One of those STs was surveyed in the present study (ST695) and was correctly represented as a component of PSGS genotype 40 (see Table S1 in the supplemental material). PSGS genotypes were based on complete 8-target signatures for 167 isolates (representing 112 STs) and on partial 7-, 6-, and 5-target signatures for 50, 9, and 3 isolates, respectively. The retrieval of those partial signatures is the main reason why the number of PSGS genotypes (163) is, counterintuitively, higher than the number of STs that are actually present (155). For 164 isolates, there was a strict correlation between the actual ST and the PSGS genotype: each one of the corresponding 128 STs (representing 1 to 4 isolates) was associated with a particular PSGS genotype and vice versa (see Table S1 in the supplemental material for the complete detail for each isolate). The influence of partial signatures (meaning a failure of 1 or more of the 8 targets) on PSGS genotyping is illustrated in Table 3, where 45 isolates, belonging to 14 STs, are characterized by no fewer than 30 PSGS genotypes. As predicted, these incomplete signatures contained more PSGS genotype components because of decreased specificity (this holds true for 11 of the 14 STs in Table 3, i.e., STs 100, 447, 62, 199, 13, 191, 220, 236, 376, 568, and 63). For 3 STs, however, differences in PSGS genotypes originated instead in a slightly imprecise mass signature in a single locus (PSGS genotypes 47/48 [ST5979], 156/157 [ST81], and 108/109 [ST393]). Conversely, 23 isolates belonged to 15 STs but shared 1 of only 7 PSGS genotypes due to instances of 2 or 3 related STs being encompassed within the same PSGS genotype (Table 4).
We found that eBURST groups in which two or more group members were either identical to or single-locus variants of the isolate ST were evident within the PSGS output for 161 of the 229 (70.3%) isolates (see Table S1 in the supplemental material). In 159 of these, the isolate ST was included in the PSGS output and showed clear relatedness to the major eBURST group members. For 104 of these 159 (65.4%) instances of deduced eBURST groups within the output, the isolate ST was the predicted founder of the major eBURST group. Table 4 and Fig. 1A and B depict the various capabilities of PSGS genotyping to distinguish between certain single-locus variants. This is shown by the eBURST group 1 member ST156, together with ST156 single-locus variants ST334, ST166, and ST162. Here, we observed that ST334 and ST166 shared PSGS genotype 1 with ST156, while ST162 shared PSGS genotype 2 with its single-locus variant ST1269 (a double-locus variant of ST156). The single-locus variant of ST1269 (and the double-locus variant of ST162), ST643, was assigned PSGS genotype 3 (Fig. 1A shows diagrammatic results corresponding to eBURST group 1). In addition, Fig. 1B illustrates the resolution of all 5 eBURST group 2 STs into 5 distinct PSGS genotypes and their relationship with the 5 PSGS genotype components. More examples of the capability of PSGS genotyping to resolve closely related isolates are listed in Table S1 in the supplemental material.
Nonpneumococcal species uniformly yielded no amplicon with the 32 serotyping primer pairs and at most one amplicon with the eight MLST primer pairs. The detection by PCR/ESI-MS of the positive control in each well showed that PCR conditions were satisfactory.
The analysis of the 229 pneumococcal test strains using the Ibis T5000 biosensor-based PSGS provided a robust demonstration of its capability in identifying and resolving serotypes. The PSGS was designed to segregate 62 out of the 93 currently recognized serotypes into 48 distinct sets of one to three related serotypes. Of 51,498 isolates in our invasive-isolate collection that were assigned Quellung reaction-based serotypes from 1995 to 2011, we found that only 166 (0.3%) were assigned 19 serotypes (2, 10F, 11B, 11C, 16A, 17A, 24A, 24B, 24F, 21, 27, 28A, 28F, 35A, 35C, 36, 39, 41A, and 43) not covered by the T5000 PSGS serotype deduction platform. Of these serotypes, type 21 was the most abundant (57 isolates).
In the present study, 53 serotypes (out of the 76 distinct serotypes actually tested) were, as expected, segregated into 45 distinct sets. For our purposes, it is important to note that within each of the serotype sets 7A/7F, 9A/9V, 11A/11D, 12F/12B/44, 18B/18C, 22F/22A, and 33A/33F shown in Table 2, there is only one predominant serotype associated with invasive disease in the United States (unpublished observations). For example, out of our collection of 51,498 invasive isolates serotyped over the past 16 years through Active Bacterial Core surveillance (3, 6, 8, 15, 19), serotype 7F is documented from 3,806 isolates while serotype 7A has been found in only 9 isolates. Within this same collection, 1,903 serotype 12F isolates are documented, while only 4 serotype 12B isolates and no serotype 44 isolates have been found. The only primer pair listed in Table 1 that is incapable of resolving important serotypes from the vantage point of our invasive-disease surveillance is 1-4 (6A/6C-6B/6D), which does not resolve the recently discovered serotype 6C from serotype 6A. Similarly, it cannot resolve serotypes 6B and 6D; however, we have found only two serotype 6D isolates within U.S. invasive-disease surveillance within the past 5 years. There are now PCR assays available that can resolve the 4 serogroup 6 serotypes (8, 20), which could be added to the PSGS assay.
While the PSGS genotyping component is far more complex than the serotyping component, it has obvious potential. The genotyping component of the PSGS assay is naturally less resolving than conventional sequence-based MLST, since each locus signature captures only a fraction of the information content of the targeted allele. The loss of resolving power is, however, limited, partly because of the compounding effect of the multilocus analysis and partly because the conventional MLST sequence scheme is itself highly redundant. The PSGS was quite efficient in assigning concordant MLST-based genotypes, especially when conventional genotypes were members of the major pneumococcal complexes that are well represented within the global MLST database. It must be noted that we tested the PSGS with extremely rare and unusual strains, often with serotypes that we have never observed outside of our serotype reference strains. There are a number of parameters that dictate the relationship of a PSGS signature to the existing MLST database. These parameters include the relative relatedness of the isolate ST to previously existing STs. This can be reflected by a PSGS signature consisting of a small number of ST components that are distantly related (differing in 2 to 5 loci), as in categories 6 and 7 (Table 5). Under these circumstances, we found the actual target sequences within the distantly related STs to be identical or highly conserved (data not shown). All 217 PSGS signatures where the isolate ST was included were found to share highly related target sequences and therefore provided useful genetic identifiers. Only 6 of the total 229 results (shown in Table 5, category 9) were found to inaccurately reflect the isolate genotype. The scope of the genetic diversity of the 229 strains used in this study is evidenced by the fact that they were represented by 155 different conventional STs. 
Forty-one of these genotypes were added to the genotyping database during the past 3 years, and 35 of the study isolate STs were discovered during this work. Nonetheless, although the PSGS genotype output relies upon the completeness of the genotyping database, we feel that its ability to provide reasonably specific genotypes for strains in a labor-efficient manner makes it well suited for large-scale surveillance efforts. This process could be simplified if a database of reference PSGS genotype designations corresponding to specific signatures of varying complexity could be provided as output.
In addition to its utility for pneumococcal isolate typing, we also feel that the PSGS has potential for determining serotypes and genotypes from culture-negative clinical specimens, such as cerebrospinal fluid, that yield ample pneumococcal DNA for conventional PCR assays (1, 24). It seems particularly suited for detecting mixed serotypes associated with nasopharyngeal carriage of pneumococci using a broth enrichment culture step for specimens prior to DNA extraction (10). Mixed carriage of multiple pneumococcal serotypes is quite common, and while their detection using conventional PCR assays is very effective, the process requires a great deal of manual manipulation.
Many of the isolates used were from the Active Bacterial Core surveillance program (http://www.cdc.gov/abcs/index.html), and we thank all of its participants who made these invasive isolates available. We also thank Dick Facklam for his care and maintenance of pneumococcal reference strains representing the rarely encountered serotypes included in this study.
Published ahead of print 21 March 2012
Supplemental material for this article may be found at http://jcm.asm.org/.