Neisseria meningitidis causes invasive meningococcal disease in infants, toddlers, and adolescents worldwide. DNA sequence-based typing, including multilocus sequence typing, analysis of genetic determinants of antibiotic resistance, and sequence typing of vaccine antigens, has become the standard for molecular epidemiology of the organism. However, PCR of multiple targets followed by Sanger sequencing imposes logistic constraints on reference laboratories. Taking advantage of the recent development of benchtop next-generation sequencers (NGSs) and of BIGSdb, a database accommodating and analyzing genome sequence data, we therefore explored the feasibility and accuracy of Ion Torrent Personal Genome Machine (PGM) sequencing for genomic typing of meningococci. Three strains from a previous meningococcus serogroup B community outbreak were selected to compare conventional typing results with data generated by semiconductor chip-based sequencing. In addition, sequencing of the meningococcal type strain MC58 provided information about the general performance of the technology. The PGM technology generated sequence information for all target genes addressed. The results were 100% concordant with conventional typing results, with no further editing being necessary. In addition, the amount of typing information, i.e., nucleotides and target genes analyzed, could be substantially increased by the combined use of genome sequencing and BIGSdb compared to conventional methods. In the near future, affordable and fast benchtop NGS machines like the PGM might enable reference laboratories to switch to genomic typing on a routine basis. This will reduce workloads and rapidly provide information for laboratory surveillance, outbreak investigation, assessment of vaccine preventability, and antibiotic resistance gene monitoring.
Neisseria meningitidis is a Gram-negative facultative pathogen which causes invasive disease mostly in infants, toddlers, and adolescents (34). Despite a relatively low incidence in industrialized countries, the disease is considered to be of highest priority (3), because of a substantial case fatality rate and the risk of secondary cases, outbreaks, and even epidemics. Many countries therefore established reference laboratories for laboratory surveillance of disease, in addition to statutory notification. Typing of the organism is a major task of reference laboratories (47). DNA sequence-based typing has replaced earlier immunotyping (46). Capsular serogroup determination in conjunction with antigen sequence typing of PorA and FetA, immunodominant outer membrane proteins, is highly discriminatory (11, 44). Multilocus sequence typing (MLST) (28) provides information about the clonal descent of an isolate (27). Sequence-based typing has, furthermore, been developed to determine variations in antibiotic resistance genes associated with reduced antimicrobial susceptibility (37, 41). Sequence typing of subcapsular antigens, in addition, has become a major aspect of molecular typing of meningococci, with increasing attention being drawn toward protein-based vaccines against serogroup B meningococci (17, 18, 26). This includes sequence typing of the genes which encode the factor H binding protein (fHbp) (15, 33), the neisserial heparin binding antigen (36), and the neisserial adhesin A (8). Other protein-based vaccines in development comprise PorA (30, 45), ZnuD (38), and Opc (24). Therefore, simultaneous assessment of a large variety of target genes by DNA sequencing is highly desirable.
The meningococcal field is especially well suited as a paradigm for typing by whole-genome sequencing because of the development of sequence databases for meningococcal typing, which probably represent the most powerful resource in the bacterial world (20, 22, 23). As a latest development, the typing data are now powered by BIGSdb, an open-source software accommodating genome sequence data (23). The new database engine flexibly integrates numerous genomic loci for genetic analysis and serves for phylogenetic analysis, bacterial typing, and functional assessment. Most importantly for this project, submission of discontinuous genome sequence data from whatever source can be used to query MLSTs with 7 but also 20 loci comprising antigen sequences not only of variable regions but also of the whole coding gene, multiple vaccine antigens, and antibiotic resistance genes.
The introduction of affordable and fast benchtop next-generation sequencer (NGS) machines like the Ion Torrent Personal Genome Machine (PGM) or the Illumina MiSeq apparatus makes bacterial whole-genome sequencing (WGS) feasible for small and medium-sized laboratories (35). As the price per base continues to drop, these machines will soon be applicable for routine surveillance WGS. In this proof-of-principle study, we assessed the performance of PGM for meningococcal typing with three strains from a serogroup B meningococcal outbreak (12) and a genome-sequenced type strain as examples. We took advantage of the Neisseria PubMLST database (http://pubmlst.org/neisseria/) to rapidly analyze data. We were interested in the accuracy of data, whether editing of raw data was necessary, how much information beyond the standard typing scheme can be achieved, and what the time from DNA preparation to the result is.
Serogroup B strains DE9622, DE9686, and DE9938 were isolated in 2003 and 2004 in the neighboring German counties Düren, Aachen, and Heinsberg as part of a long-lasting community outbreak caused by strains of sequence type 42 (ST-42) (ST-41/44 clonal complex [cc]), with PorA variable region 1 (VR1) being 7-2, VR2 being 4, and FetA being F1-5 (12). The German outbreak strains were highly similar to the New Zealand outbreak strain and were identical upon analysis by multiple-locus variable-number tandem-repeat analysis (MLVA) (2, 12). The strains used in this study were isolated from children aged 2 to 4 years and were sent to the national reference laboratory in Würzburg, Germany, for typing as part of national laboratory surveillance. Strain MC58 (serogroup B, ST-74, ST-32 cc) was chosen as a reference strain because it was the first meningococcal isolate to be completely sequenced (43). The strain was isolated from a case of invasive meningococcal disease in the United Kingdom in the 1980s and was kindly provided by E. R. Moxon (Oxford, United Kingdom). MICs of penicillin and rifampin were determined by Etest (bioMérieux, France) according to the manufacturer's instructions. Bacteria were grown on Mueller-Hinton agar supplemented with 5% sheep blood (Becton Dickinson, Germany). Breakpoints defined by CLSI (http://www.clsi.org/) and EUCAST (http://www.eucast.org/) were applied.
Meningococci were incubated on sheep blood agar (bioMérieux, France) overnight at 37°C in 5% CO2. DNA isolation was performed as described previously (32). The quality of the genomic DNA was checked by gel electrophoresis, the purity was measured with a NanoDrop 1000 apparatus (NanoDrop Products), and the quantity was estimated by a fluorescence-based method using a Qubit double-stranded DNA BR assay kit and a Qubit fluorometer (Life Technologies, Germany) according to the manufacturer's instructions.
The genome sequences of the selected strains were determined on the PGM (Life Technologies, Germany). Libraries were generated using 1 μg of the genomic DNA and an Ion Xpress Plus fragment library kit comprising the Ion Shear chemistry according to the user guide. After dilution of each library to 2.66 × 10⁷ molecules/μl, 4.5 × 10⁸ molecules were used as the templates for clonal amplification on Ion Sphere particles during the emulsion PCR according to the Ion Xpress Template 200 kit manual. The quality of the amplification was estimated on a Guava easyCyte 5 system (Millipore, Germany); the Ion Sphere particles were then loaded onto an Ion 316 chip and sequenced using 105 sequencing cycles according to the Ion Sequencing 200 kit user guide. One hundred five sequencing cycles result in an average read length of approximately 200 nucleotides.
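The template input stated above implies a pipetting volume that is easy to check. The following Python sketch does the arithmetic for a library diluted to 2.66 × 10⁷ molecules/μl and a target of 4.5 × 10⁸ molecules; the function name is hypothetical and the calculation is an illustration, not a step taken from the kit manual.

```python
def template_volume_ul(target_molecules: float, conc_per_ul: float) -> float:
    """Return the volume (in microliters) of diluted library that
    contains the requested number of template molecules."""
    if conc_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_molecules / conc_per_ul


# Values quoted in the text: 4.5e8 molecules from a 2.66e7 molecules/ul dilution.
volume = template_volume_ul(4.5e8, 2.66e7)
print(f"{volume:.1f} ul of diluted library per emulsion PCR")
```

Under these figures, roughly 17 μl of the diluted library would be used per emulsion PCR.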
The MIRA program (version 3.4.0) was used for de novo assembly of all four genomes (7). Subsequently, the draft genomes were uploaded to the BIGSdb website (http://pubmlst.org/software/database/bigsdb/) and analyzed (23). DNA sequences in FASTA format were submitted online to the neisserial locus/sequence definitions database at http://pubmlst.org/neisseria. The database was interrogated for each locus in succession, and exact matches were searched for and recorded. Single nucleotide polymorphisms and indels of the MC58 genome sequence were extracted with the newest version of the CLC Genomics Workbench software (CLCbio, Aarhus, Denmark). To compare the syntenic conservation of the chromosomal location of multiple genes between two genomes, synteny plots were generated using the MUMmer (version 3.0) software suite (25).
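The exact-match query described above can be illustrated locally. The sketch below is not the BIGSdb implementation; it is a minimal Python example, under the assumption that an "exact match" means the complete allele sequence occurs unchanged on either strand of an assembled contig. All function names, locus names, and sequences are hypothetical toy values.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_allele_matches(
    contigs: Dict[str, str], alleles: Dict[str, str]
) -> List[Tuple[str, str, int, str]]:
    """Report (allele_id, contig_id, start, strand) for every allele that
    occurs, without any mismatch, in a contig on either strand."""
    hits = []
    for allele_id, allele_seq in alleles.items():
        query = allele_seq.upper()
        for contig_id, contig_seq in contigs.items():
            target = contig_seq.upper()
            for strand, probe in (("+", query), ("-", reverse_complement(query))):
                start = target.find(probe)  # first occurrence only
                if start != -1:
                    hits.append((allele_id, contig_id, start, strand))
    return hits


# Toy data: two short "contigs" and three candidate "alleles".
contigs = {"contig1": "TTTACGTGGAAC", "contig2": "AATCCACCTT"}
alleles = {"toy_1": "ACGTGG", "toy_2": "GGTGGA", "toy_3": "CCCCCC"}
hits = exact_allele_matches(contigs, alleles)
print(hits)
```

A plain substring search is sufficient here because only identical alleles are of interest; finding novel or closely related alleles would require an alignment-based search instead.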
Genome projects have been registered as NCBI Bioprojects PRJNA78229, PRJNA78227, and PRJNA78225. The draft genome projects have been deposited in DDBJ/EMBL/GenBank under accession numbers AJJW00000000, AJJX00000000, and AJJY00000000. The versions of the draft genomes described in this paper are the first versions of these entries, i.e., AJJW01000000, AJJX01000000, and AJJY01000000, respectively.
The turnaround time from DNA isolation to sequence reads was about 32 h. The MIRA de novo assembly on an ordinary personal computer then took approximately 3 h per strain, and the following BIGSdb analysis needed another 20 min per genome. The accuracy of the sequencing protocol was first assessed by comparison of the MC58 draft sequence with the published genome sequence of the strain retrieved from GenBank accession number AE002098.2 (43). The draft genome had a 49-fold sequencing coverage, resulting in 181 contigs. The sequence length of the consensus was 2,194,618 nucleotides, the length of the largest contig was 132,117 nucleotides, and the median length was 25,538 nucleotides. Twelve single nucleotide polymorphisms were determined; four nucleotides differed, and at eight loci the chip technology revealed ambiguous results. Resequencing of 8 of the 12 loci by PCR and Sanger sequencing on both strands revealed identity to the original genome sequence in four cases and to the semiconductor technology-derived sequence also in four cases. Furthermore, 1,538 indels were recorded. We selected 10 loci for control by Sanger sequencing. All indels identified by semiconductor technology turned out to be false. However, this did not affect any of the typing loci described below. The accuracy of the sequence assembly was further demonstrated by the syntenic dot plot comparing the published MC58 sequence (43) with the de novo assembled MC58 sequence generated herein (Fig. 1). Ninety-two percent of the published open reading frames were identified in the MC58 draft genome. The syntenic dot plot, in addition, demonstrates a satisfactory agreement of the assemblies with very few discontinuities.
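Summary figures of the kind reported above (number of contigs, consensus length, largest contig, median contig length, fold coverage) can be derived from a list of contig lengths. The following Python sketch shows one way to do so. It assumes that coverage is estimated as the total number of sequenced bases divided by the consensus length, which the text does not state explicitly; the function name and the toy numbers are hypothetical.

```python
from statistics import median
from typing import Dict, Optional, Sequence


def assembly_summary(
    contig_lengths: Sequence[int], total_read_bases: Optional[int] = None
) -> Dict[str, float]:
    """Summarize a draft assembly from its contig lengths.

    If total_read_bases is given, fold coverage is estimated as
    total_read_bases / consensus length (an assumed definition)."""
    if not contig_lengths:
        raise ValueError("at least one contig is required")
    total = sum(contig_lengths)
    summary = {
        "n_contigs": len(contig_lengths),
        "consensus_length": total,
        "largest_contig": max(contig_lengths),
        "median_contig": median(contig_lengths),
    }
    if total_read_bases is not None:
        summary["fold_coverage"] = total_read_bases / total
    return summary


# Toy example: four contigs and 49,000 sequenced bases.
stats = assembly_summary([100, 300, 200, 400], total_read_bases=49_000)
print(stats)
```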
Three meningococcus B strains from a previous community outbreak in western Germany (12) and the type strain MC58 (43) were analyzed by semiconductor-based sequencing. De novo assembled genomes were analyzed by querying sequences in batch against BIGSdb in order to compare the data with our previous conventional typing results (12) and, in addition, to assess the multitude of typing information retrievable by the novel sequencing approach. The accuracy of extended typing beyond the data achieved with our previous conventional typing results (12) was assessed by strain comparison assuming a very high strain identity. For this purpose, we employed BIGSdb (23), to which whole-genome data can be uploaded and which allows comparison to genes of various functional categories. Table 1 summarizes the results. For all four strains (three outbreak strains with the prefix DE and reference strain MC58), conventional typing results available at the laboratory were confirmed without any further editing of the sequences. In addition to MLST and antigen sequence typing data, typing information was retrieved for the following: 13 additional housekeeping gene loci (extended MLST [eMLST]), complete porA gene, partial and complete porin B (porB) gene, vaccine antigen gene fhbp (6), and antimicrobial resistance genes penA (41) and rpoB (40). Twelve of the additional 13 housekeeping loci of the eMLST (9) were identical among the outbreak strains, suggesting that they were correctly assessed. One locus differed in one strain by 14 nucleotide exchanges. This finding was highly suggestive of a recombination event, which is not unlikely even in highly related and epidemiologically linked strains (21, 39). This recombination event was independently confirmed by Sanger sequencing on both strands (Fig. 2). The complete porA and porB genes were also fully identical among the outbreak strains, as were the sequences of the fhbp gene.
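The strain comparison described above amounts to checking, locus by locus, whether all strains carry the same allele. A minimal Python sketch of that check follows; the function name, strain labels, locus names, and allele numbers are hypothetical toy values and do not reproduce the data in Table 1.

```python
from typing import Dict, Hashable


def discordant_loci(
    profiles: Dict[str, Dict[str, Hashable]]
) -> Dict[str, Dict[str, Hashable]]:
    """Return the loci at which the strains do not all share one allele.

    profiles maps strain name -> {locus name: allele designation}.
    A locus missing from a strain is reported as None for that strain."""
    loci = set().union(*(p.keys() for p in profiles.values()))
    result = {}
    for locus in sorted(loci):
        calls = {strain: p.get(locus) for strain, p in profiles.items()}
        if len(set(calls.values())) > 1:
            result[locus] = calls
    return result


# Toy profiles: the strains agree at locus1 and differ at locus2.
profiles = {
    "strainA": {"locus1": 3, "locus2": 7},
    "strainB": {"locus1": 3, "locus2": 7},
    "strainC": {"locus1": 3, "locus2": 9},
}
diffs = discordant_loci(profiles)
print(diffs)
```

Loci flagged in this way are candidates for confirmation by an independent method, as was done here by Sanger sequencing.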
Finally, sequence analyses of the antimicrobial resistance genes penA and rpoB revealed identical alleles (alleles 1 and 18, respectively) in strains DE9622, DE9686, and DE9938. The phenotypic susceptibility, determined as the MIC by Etest, was in line with the molecular analyses. For DE9622, DE9686, and DE9938, the penicillin MIC was 0.047 μg/ml and the rifampin MICs were 0.004, 0.006, and 0.008 μg/ml, respectively. The strains were thus determined to be sensitive by use of the Etest breakpoints defined by CLSI (http://www.clsi.org/) and EUCAST (http://www.eucast.org/). In the penA database at www.Neisseria.org (41), penA allele 1 is associated with a susceptible phenotype, because it lacks the typical mutations associated with reduced susceptibility. The same held true for rifampin, where rpoB allele 18 is in line with a susceptible phenotype (http://pubmlst.org/neisseria/).
Meningococcal typing serves a variety of purposes (16). Whereas the so-called fine type, which includes serogroup, PorA type, and FetA type (11), and the sequence type provide a framework for strain discrimination and phylogenetic assignment (19), prediction of antimicrobial resistance (37, 41) and vaccine strain coverage (18, 26) assists with clinical management and preventive measures. Vaccine antigen typing especially needs to be flexible due to various approaches to meningococcus serogroup B subcapsular antigen vaccine development (42, 48).
In Europe, for logistic and financial reasons, many national reference laboratories seem to be overburdened by the effort to fulfill the typing requirements of the European Centre for Disease Prevention and Control (ECDC), which include serogroup, PorA and FetA types, and the sequence type (16, 47). Running seven PCRs and 14 sequencing reactions for a complete MLST scheme, despite all possibilities of automation, is an obvious challenge if done on hundreds of isolates. Not surprisingly, with the advent of deep sequencing technologies such as the Illumina technology (5) and 454 technology (29), interest in replacing time-consuming PCR combined with Sanger sequencing by genome data acquisition has increased considerably but until now was hampered by cost and demands for rapid data processing. A further major advantage of whole-genome sequencing would be to archive abundant strain information for rapid retrospective reanalysis, if necessary.
This report describes the first application of 200-base reads for the Ion PGM platform. Increased read length substantially improves the results of the de novo assembly. Longer read length and scalability are the discriminating features in comparison to the just-released Illumina MiSeq platform. Because the PGM, like pyrosequencing, adds one nucleotide species per flow and infers the length of a homopolymer from the signal intensity, quite a high number of indel errors due to homopolymers were observed. However, indels did not affect the results of this study, and further editing was not necessary. Furthermore, this systematic error can, in principle, be well compensated when employing a genome-wide gene-by-gene analysis coupled with allele reference databases like BIGSdb. It is important to note that substitution errors, which are not compensable by comparison to allele reference databases, appeared at a very low rate.
The semiconductor technology and the related 454 technology are sensitive to homopolymeric tracts. We therefore wondered to what extent homopolymeric tracts in the meningococcal genome would cause difficulties for the typing approach. In fact, the meningococcal genome contains a variety of long intra- and intergenic homopolymeric tracts whose erroneous replication causes phase variation (31). Fortunately, none of the typing loci addressed in this study belonged to the category of contingency loci. Most surprisingly, the de novo sequence was highly robust and no further editing of sequences was required. Furthermore, the detection of numerous indels, which were incorrectly identified by the novel sequencing technology, was without consequence for the typing of the three strains, as no indels were identified in the numerous loci addressed for typing. This is an important finding, as manual editing subsequent to genome sequence assembly would otherwise be detrimental to broad application of genomic typing.
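To judge whether a locus is likely to be affected by homopolymer-associated indel errors, its sequence can be screened for long single-base runs. The Python sketch below does this for a DNA string. The threshold of five bases is an arbitrary illustrative choice, not a value taken from this study, and the function name and example sequence are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for each run of one repeated base
    that is at least min_len long (0-based start positions)."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Toy sequence with one G6 tract and one C5 tract.
runs = homopolymer_runs("ACGGGGGGTTACCCCCA")
print(runs)
```

A typing locus without any reported runs is less likely to need manual inspection for this error class, which is consistent with the observation that none of the typing loci were affected.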
The availability of the Neisseria sequence typing home page powered by BIGSdb greatly facilitated the approach (23). Typing data for thousands of strains have been compiled herein, and the query platform allows the interrogation of multiple loci within a negligible amount of time. The concept behind BIGSdb was a prerequisite for this study. It should be highlighted that comparable database structures are also needed for the application of genomic typing to other organisms of public health importance.
Genomic typing of microorganisms such as Neisseria meningitidis does not alter the typing philosophies per se; it simply facilitates data acquisition. Typing of meningococci by MLST with seven loci is sufficient to define the clonal framework of a strain. eMLST of 20 loci mostly serves refined phylogenetic analyses (9). The combination of serogroup, PorA type, and FetA type has been validated for its discriminatory power to identify possible epidemiological links between cases (10). For 1,616 strains isolated over a period of 42 months, Simpson's index was high, at 0.963. Discriminatory power does not need to be extended by inclusion of other targets for most purposes. However, genomic typing greatly facilitates the portfolio for on-the-fly analysis of vaccine antigens and antimicrobial resistance genes. It provides the unique possibility of data storage for retrospective analysis of strains with regard to antigen-encoding genes included in future generations of vaccines. Currently, retrospective analyses of this kind are regularly initiated for investigational vaccines and require novel repetitive sequencing of hundreds of strains (4). It will greatly facilitate the search for specific markers in the event of emergence of a new, highly virulent clone, such as the so-called electrophoretic type 15 (ET-15) clone (1), which is typed by a single nucleotide polymorphism and an insertion element (13, 49). Maintaining physical strain collections will continue to be an indispensable requirement in the future, because besides the antigenic variant, protein expression is another predictor of strain coverage by bactericidal antibodies elicited by vaccines. Nevertheless, constant readdressing of stored genome data will greatly speed up analyses and facilitate vaccine implementation.
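The discriminatory power quoted above is expressed as Simpson's index. A minimal Python sketch of the calculation is given below, assuming the commonly used formulation D = 1 − Σ n_j(n_j − 1) / [N(N − 1)], where n_j is the number of strains of type j and N is the total number of strains. Whether the cited validation used exactly this variant is an assumption, and the function name and toy type labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def simpsons_index_of_diversity(types: Iterable[Hashable]) -> float:
    """Probability that two strains drawn at random without replacement
    belong to different types: 1 - sum(n_j*(n_j-1)) / (N*(N-1))."""
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two strains are required")
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Toy example: four strains, two of which share the same type.
index = simpsons_index_of_diversity(["typeA", "typeA", "typeB", "typeC"])
print(round(index, 4))
```

An index close to 1 means that two randomly chosen strains will almost always be assigned different types, which is the sense in which a value of 0.963 is described as high.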
Taken together, our first experience with the use of the Ion Torrent PGM for genomic typing of meningococci was very positive with respect to speed, accuracy, and the lack of necessity of further data editing. For broad use in many reference laboratories, cost for both hardware and consumables must be within the range of average budgets for laboratories. Laboratory partnering is a possible model to offer service to small countries. Alternatively, efficient networking, as exemplified by PulseNet (14), might help to distribute the technology at a large scale in the future. Furthermore, bioinformatics tools that enable nonspecialists to perform data processing at all steps of the procedure need to be developed.
We are indebted to the many senders of strains supporting the laboratory surveillance of meningococcal disease at the Reference Laboratory in Würzburg. We thank Johannes Elias for helpful discussions. Craig A. Cummings from Life Technologies is thanked for giving advice for bioinformatics analysis. Finally, we thank Anjali Shah from Life Technologies for inclusion and support for the PGM 200-nucleotide-read-length early access program.
The Reference Laboratory is funded by the Robert Koch Institute, Berlin, Germany. This publication made use of the Neisseria Multilocus Sequence Typing website (http://pubmlst.org/neisseria/) developed by Keith Jolley and sited at the University of Oxford (23). The development of this site has been funded by the Wellcome Trust and European Union.
Published ahead of print 29 March 2012