A real-time PCR assay with the ability to rapidly identify all pathogenic bacteria would have widespread medical utility. Current real-time PCR technologies cannot accomplish this task due to severe limitations in multiplexing ability. To this end, we developed a new assay system which supports very high degrees of multiplexing. We developed a new class of mismatch-tolerant “sloppy” molecular beacons, modified them to provide an extended hybridization range, and developed a multiprobe, multimelting temperature (Tm) signature approach to bacterial species identification. Sloppy molecular beacons were exceptionally versatile, and they were able to generate specific Tm values for DNA sequences that differed by as little as one nucleotide to as many as 23 polymorphisms. Combining the Tm values generated by several probe-target hybrids resulted in Tm signatures that served as highly accurate sequence identifiers. Using this method, PCR assays with as few as six sloppy molecular beacons targeting bacterial 16S rRNA gene segments could reproducibly classify 119 different sequence types of pathogenic and commensal bacteria, representing 64 genera, into 111 Tm signature types. Blinded studies using the assay to identify the bacteria present in 270 patient-derived clinical cultures including 106 patient blood cultures showed a 95 to 97% concordance with conventional methods. Importantly, no bacteria were misidentified; rather, the few species that could not be identified were classified as “indeterminate,” resulting in an assay specificity of 100%. This approach enables highly multiplexed target detection using a simple PCR format that can transform infectious disease diagnostics and improve patient outcomes.
Human bloodstream infections (BSI) must be treated rapidly and effectively in order to avoid significant morbidity and mortality (17). A rising incidence of drug-resistant infections has complicated antibiotic selection (33), emphasizing the importance of rapidly determining the identity and drug susceptibility profile of each infecting bacterial species. Unfortunately, conventional microbiological identification and drug susceptibility determination methods are often too time-consuming to allow quick treatment decisions. Many bacterial species have specific antibiotic indications while most have local antibiotic resistance patterns that can be predicted by periodic examination of antibiotic resistance profiles (10, 19). Consequently, species identification may be used to guide antibiotic therapy pending final antibiotic susceptibility tests. A rapid bacterial species identification method would dramatically speed up the diagnosis of serious diseases, enabling rapid and definitive treatment, and concomitantly decrease the use of broad-spectrum antibiotics.
Ideally, a rapid assay for BSI should be functionally equivalent to the current diagnostic standard, blood culture (15), in being able to identify all clinically significant pathogens as well as commensals. To this end, molecular assays have targeted bacterial 16S rRNA genes or 16S-23S rRNA gene spacer regions (11, 24, 27, 29). These DNA segments contain hypervariable sequences that can be used to identify virtually all known bacterial species. Hypervariable sequences are flanked by highly conserved DNA sequences that permit universal amplification of the diagnostic targets, utilizing a limited primer set. DNA sequencing methods, including some used in clinical diagnostics (2, 16), have confirmed the utility of this approach although the clinical matrix has usually been culture-positive material rather than uncultured patient blood. The roadblock to its wider clinical use has been an inability to develop user-friendly real-time PCR methods that can distinguish among the same diverse set of sequences as differentiated by DNA sequencing. DNA probes, such as those used in real-time PCR assays, are able to bind to only a limited range of DNA sequences. Thus, currently available PCR-based detection methods are usually limited to differentiating among bacterial species within a single genus (8, 11, 20, 26). The number of bacterial species identified can be moderately increased by multiplexing several probes to perform different species detection assays simultaneously (18, 30); however, the fundamental limitation caused by a narrow probe binding range remains the same. Moreover, it is difficult to distinguish more than six different fluorophores in a single real-time PCR assay because of the limited discriminatory capacity of current real-time PCR instruments. This places a practical limitation on the number of real-time PCR probes that can be used in a bacterial species identification system designed to be carried out in a single assay well.
Our approach for identifying bacterial species utilizes modified molecular beacon probes termed “sloppy molecular beacons” (SMBs). These probes, by virtue of their greater length (40 to 60 nucleotides) and the choice of their probe sequence, are able to form probe-target hybrids with PCR amplicons generated from a wide range of bacterial species. When the resulting hybrids are melted apart, the temperature at which they dissociate (Tm) is characteristic of the particular bacterial DNA sequence that is present in the sample. Although a single Tm value is not sufficiently precise to unambiguously identify each species that is present, a unique set of these specially designed probes, each with a unique sequence, provides a set of Tm values (Tm signature) that distinctively identifies the species that is present (Fig. 1A). This approach, which was initially utilized in the speciation of mycobacteria (9), differs fundamentally from conventional real-time PCR assays that use a single DNA probe to detect a single (or closely related) target (Fig. 1B). Here, we demonstrate how the variable hybridization signals generated by six specially modified SMB probes can be combined with a pattern-based analytic approach to reproducibly identify and distinguish several hundred different sequences, thereby classifying a wide range of bacteria into clinically relevant genera and species. This novel approach enabled us to develop a universal bacterial identification assay and to validate this assay on cultured clinical isolates and positive patient blood cultures.
Initial studies were performed on a reference set of bacterial isolates and a limited number of artificial DNA templates representing 64 genera and 133 species obtained from various sources (see Table S1 in the supplemental material). The identity of each isolate was confirmed by sequencing a 700-bp region of the 16S rRNA gene, which included the hypervariable regions V3 through V6 (3). To perform confirmatory studies, we also created a test set of clinical isolates by randomly selecting 164 clinical isolates cultured from 158 patients' clinical cultures of blood, sputum, and urine from the New Jersey Medical School University Hospital's clinical microbiology laboratory. An additional 81 positive and 25 negative blood cultures, selected at random over an 8-month period, were obtained from the same laboratory. The investigator performing confirmatory studies on test set or blood culture-derived isolates was blinded to all prior species assignments. All clinical samples were stripped of personal identifiers prior to study entry, and the use of these samples for the current study was approved by the New Jersey Medical School Institutional Review Board. Additional details on clinical specimen and isolate collection and characterization are provided in Methods in the supplemental material.
Bacterial DNA was isolated from reference and test set isolates by boiling a loop of pure bacterial culture in Insta Gene Matrix solution (Bio-Rad Laboratories, Hercules, CA) for 20 min. DNA used for limit-of-detection analysis was subjected to RNase digestion followed by phenol-chloroform purification and ethanol precipitation. DNA was isolated from blood culture bottles by mixing 1 ml of the blood culture with an equal volume of 8% NaOH and incubating the samples for 15 min with occasional mixing. The solution was then transferred to a Cepheid GeneXpert open sample processing cartridge (Cepheid, Sunnyvale, CA) and then automatically processed for DNA by a GeneXpert DX system (Cepheid).
We had previously identified the 16S rRNA gene V3 and V6 hypervariable segments that provided a high degree of sequence variation as most appropriate for bacterial species identification using DNA probes (3). We designed universal primer sets to amplify each region from all clinically relevant bacterial species using linear-after-the-exponential (LATE)-PCR (23) (detailed in Methods in the supplemental material). SMBs were designed using the in silico DNA folding program mfold (http://frontend.bioinfo.rpi.edu/applications/mfold/cgi-bin/dna-form1.cgi), and the probe-target hybrid folding program DINAMelt (http://dinamelt.bioinfo.rpi.edu/twostate.php) was used to predict the possible hybrid structures and Tms. Unlike conventional molecular beacons, we designed the SMBs so that their probe lengths ranged from 40 to 60 nucleotides, and stem lengths varied between 5 and 7 nucleotides. All the probes were purchased from Sigma Genosys (St. Louis, MO). Two probes, SMB45 and V6P3, were used to test melting temperature variation as a function of probe-target mismatches. Six probes were designed for species identification by making them partially complementary to the V3 (two probes) and V6 (four probes) hypervariable regions of the 16S rRNA gene. Details of the molecular beacon sequences are listed in Methods in the supplemental material.
PCR assays of the reference and clinical isolates used 2 to 10 ng of DNA or 10^8 to 10^6 copies of synthetic template, except where noted for limit-of-detection analyses. Individual SMBs were tested in individual PCR wells. The finalized assay was performed in six wells. For the assay, all six wells contained DNA from a given sample and a PCR master mix. One of the six sloppy molecular beacons used in the assay was then added to each well. Each PCR contained 1× PCR buffer, a 200 mM concentration of each deoxyribonucleoside triphosphate, 2.5 mM (for V3) or 4 mM (for V6) MgCl2, 0.5 μM excess primer (V3R or V6F), 0.0167 μM limiting primer (V3F or V6R, respectively), 0.06 U/μl of Taq Polymerase Stoffel fragment (Applied Biosystems, Foster City, CA), 1 ng/μl of any one of the six probes, and bacterial DNA or synthetic target in a final volume of 20 μl in each well of a 384-well plate. PCR was performed using an ABI Prism 7900HT Sequence Detection System (Applied Biosystems) with the following cycling parameters: 94°C for 2 min followed by 45 cycles of 94°C for 15 s, 55°C for 30 s, and 72°C for 15 s. A post-PCR Tm analysis was then performed by heating the resulting amplicons to 94°C for 5 min, followed by gradual cooling to 35°C. After a 10-min hold at 35°C, the temperature was gradually raised in increments of 1°C to 90°C, and the reaction mixture was held at each temperature for 1 min, with fluorescence monitored at each temperature for the last 30 s. The Tm data were plotted as the first derivative of fluorescence versus temperature after first smoothing the data by calculating a four-temperature-point rolling average and normalizing each curve to the fluorescence at 85°C. Tm values were identified by selecting the peak of each curve, and a six-point Tm signature was generated for each DNA sample.
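The Tm-calling steps described above (normalize to the 85°C reading, smooth with a four-point rolling average, take the first derivative, and select the peak) can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in the study: the function names are hypothetical, the sign convention (peak of the negative derivative, since fluorescence falls as the probe-target hybrid melts) is an assumption, and so is the choice to normalize before smoothing. The synthetic curve at the end is invented purely to exercise the code.

```python
import math


def rolling_mean(values, window=4):
    """Average each run of `window` consecutive points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def melt_peak_tm(temps, fluorescence, window=4, norm_temp=85.0):
    """Return the temperature at the peak of -dF/dT for one melt curve.

    Assumes `temps` contains `norm_temp` exactly and is sorted ascending.
    """
    # Normalize the curve to its fluorescence at norm_temp.
    reference = fluorescence[temps.index(norm_temp)]
    normalized = [f / reference for f in fluorescence]

    # Smooth fluorescence and temperature with the same rolling window so
    # that each smoothed point keeps a matching (centered) temperature.
    smooth_f = rolling_mean(normalized, window)
    smooth_t = rolling_mean(temps, window)

    # Finite-difference derivative between adjacent smoothed points.
    # Assumption: the hybrid melt appears as a drop in fluorescence, so the
    # peak of the negative derivative is reported as the Tm.
    best_tm, best_slope = None, float("-inf")
    for i in range(len(smooth_f) - 1):
        dt = smooth_t[i + 1] - smooth_t[i]
        slope = -(smooth_f[i + 1] - smooth_f[i]) / dt
        if slope > best_slope:
            best_slope = slope
            best_tm = (smooth_t[i] + smooth_t[i + 1]) / 2
    return best_tm


# Synthetic example: a sigmoidal melt centered at 60 degrees C, read in
# 1-degree steps from 35 to 90 degrees C as in the protocol above.
temps = [float(t) for t in range(35, 91)]
curve = [1.0 + 1.0 / (1.0 + math.exp((t - 60.0) / 1.5)) for t in temps]
print(melt_peak_tm(temps, curve))
```

Running the function once per probe well and collecting the six results in a fixed probe order would give the six-point Tm signature for a sample.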
Bacterial species in the test set of clinical isolates or in the blood culture study were identified by finding the closest match between the Tm signature of each unknown bacterium and the Tm signatures of the known bacterial species in a reference set look-up table. A program was created to express each signature as a single point in six-dimensional space (where each axis is used to plot one of the six Tm values). The distance between that point and the point determined for each of the known species was then calculated to produce a series of distance indexes, or D values. The known species that was at the closest distance to the unknown species (i.e., which had the smallest D value) was then identified as the correct match. The calculated distance was usually in the range of 0 to 3 for matching species. An unknown species generating a D value of ≥5 was considered “indeterminate.” The program used to calculate D values from experimentally derived Tms is available in Excel format (see Data in the supplemental material).
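The matching rule described above can be sketched as follows. The text does not state the distance metric, so Euclidean distance in six-dimensional space is assumed here; the function names, the species labels, and every Tm value in the look-up table are hypothetical placeholders, not data from the study. A probe that gives no melt peak is represented as 0.0, which is also an assumption.

```python
import math


def distance_index(signature, reference):
    """Assumed D value: Euclidean distance between two six-point signatures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(signature, reference)))


def classify(signature, lookup, threshold=5.0):
    """Return (best match or 'indeterminate', smallest D value)."""
    best_name, best_d = min(
        ((name, distance_index(signature, ref)) for name, ref in lookup.items()),
        key=lambda pair: pair[1],
    )
    # D >= threshold means no reference signature is close enough.
    if best_d >= threshold:
        return "indeterminate", best_d
    return best_name, best_d


# Hypothetical look-up table: one mean Tm (degrees C) per probe, in a fixed
# probe order. These numbers are invented for illustration only.
lookup = {
    "species_A": (62.0, 55.0, 48.0, 66.0, 59.0, 51.0),
    "species_B": (58.0, 55.0, 0.0, 70.0, 61.0, 47.0),
}

print(classify((62.3, 54.8, 48.1, 66.2, 58.7, 51.0), lookup))
print(classify((40.0, 40.0, 40.0, 40.0, 40.0, 40.0), lookup))
```

Because an unmatched signature is reported as indeterminate rather than forced onto the nearest species, a sample that is far from every reference entry produces no species call at all.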
We explored modifications in conventional molecular beacon design to enable them to hybridize to a wide range of sequences and generate sequence-specific Tm values. We were able to design probes with stable stem-and-loop conformations, avoiding aberrant secondary structures by introducing a small number of nucleotide substitutions into the probe sequence. The substitutions were chosen to correspond to one or more of the bacterial sequences in our reference set so that at the same time they also increased the range of species to which the probe would hybridize.
We created two sloppy molecular beacons, a 45-nucleotide-long SMB45 with a 5-bp stem and a 60-nucleotide-long V6P3 with a 6-bp stem (see Methods in the supplemental material) which were highly complementary to a segment within the V6 hypervariable region of the bacterial 16S rRNA gene. We then examined their hybridization characteristics utilizing sets of artificial PCR amplicons. The amplicons either had core sequences that were fully complementary to the probes or contained from 1 to 17 nucleotide mismatches. Mismatches were introduced to span the entire target at roughly regular intervals. In addition, to explore the effect of the position of the mismatch within the target sequence, mismatches were omitted from one end, creating an “anchor” sequence over which the SMB probe and targets were perfectly complementary (see Tables S2, S3, and S4 in the supplemental material).
Each sloppy molecular beacon hybridized and generated distinguishable Tm values in the presence of targets that differed by as many as 8 nucleotides for SMB45 with a 45-nucleotide probe (Fig. 2A) and 14 nucleotides for V6P3 with a 60-nucleotide probe region (Fig. 2B); the increased probe lengths of the molecular beacons resulted in a larger number of mismatches being tolerated and still resulted in measurable probe-target hybrid Tm values. Tm values for each probe-target hybrid decreased in a regular fashion with an increasing number of mismatches between the probe and the target, thereby generating distinct Tm values for different probe-target hybrids with various numbers of mismatches for each probe (Fig. 2A and B). We also observed that the presence of a 15-nucleotide-long anchor sequence stabilized the binding of the probe to its targets, increasing Tm values for each mismatch by 1 to 5°C, which increased the number of detectable mismatches by SMB45 to nine (Fig. 2C). Experiments with different anchor lengths and mismatches incorporated into the anchors themselves enabled us to determine that 15- to 18-nucleotide-long anchors were optimal for a balance between the versatility of probe binding and discrimination of the different mismatched targets by the SMBs (data not shown). These results indicate that both sloppy molecular beacon probes with semiconserved anchor sequences and relatively large nonanchored probes generate a wide range of Tm values in the presence of divergent target sequences.
In examining the target sequences in the V6 hypervariable regions of the 133 species included in the test set of our study, we noticed that the high degree of sequence diversity present in this region would require that we use hybridization probes that were able to stably bind to very divergent sequences. Based on our above observations with artificial amplicons, we reasoned that both probes with very long sequences and probes with partial sequence extensions to the relatively conserved regions of the target sequence (i.e., anchored probes) are necessary to generate measurable Tm values from a reasonable number of target sequences. In addition to the long nonanchored V6P3 SMB, which virtually encompassed the entire V6 hypervariable region, we designed the relatively short V6P1 SMB including a 40-nucleotide-long probe with partially conserved sequences that served as a 15-nucleotide anchor. Each of these SMBs produced measurable Tm values in the presence of 57% and 95%, respectively, of the reference set bacteria (see Table S5 in the supplemental material). Both V6P1 and V6P3 tolerated up to 23 mismatches, producing Tm values of 44°C (Burkholderia mallei) and 46°C (Providencia alcalifaciens), respectively, which is attributable to their anchor sequence and long probe length, respectively. In spite of the longer probe lengths, each probe also retained the ability to distinguish between targets that differed by a small number of nucleotides. As expected from our studies with artificial amplicons, there was generally an inverse relationship between the number of probe-target mismatches and Tm value. However, the positional dependence of the mismatches, involvement of the probe stem in forming the probe-target hybrids, and formation of G-T pairings in the bacterial DNA samples introduced an additional level of Tm variability. 
Thus, in some cases, two targets with identical numbers of mismatches produced different Tm values, and several targets with large numbers of mismatches produced higher Tm values than targets with comparatively lower numbers of mismatches (as shown in Table 1 for V6P3). This enhanced variability improved the ability of each SMB probe to differentiate among bacterial targets.
Concatenating and aligning the target regions of the V3 and V6 hypervariable stretches probed in our assay revealed that the 133 reference set bacteria were represented by 119 distinguishable V3-plus-V6 sequence types. We hypothesized that a small set of SMB probes could be selected so that each target sequence type would generate a unique Tm signature when the Tms produced by each individual SMB were combined. We designed a set of six different SMB probes seeking the maximum amount of Tm information from these 119 sequence types. Thus, in addition to V6P1 and V6P3, we designed four probes: V3P1 and V3P2, targeting the V3 region of the 16S rRNA gene, and V6P2 and V6P4, targeting the V6 region of the 16S rRNA gene. Each SMB probe, except V6P3, contained an anchor sequence within the probe, and each probe, except V6P1 (which was perfectly complementary to its target in Staphylococcus aureus), contained deliberately introduced mutations so that the probes were not perfectly complementary to specific bacterial sequences.
We tested the six-probe assay against DNA (or, in a few cases, artificial PCR target amplicons) from each of the 133 bacterial species in the reference set. This procedure was repeated six different times with different master mixes over a period of 3 months in order to introduce as much experimental variability as possible. The results were used to create a Tm signature look-up table, which listed the mean Tm value that each SMB generated in the presence of each target (see Table S5 in the supplemental material). The assay was then repeated blindly using coded DNA aliquots, and the identity of the samples was determined by calculating the distance index (D) between each unknown Tm signature and the signatures in the look-up table. The reference species with the smallest D value identified the blinded sample (Table 2). The assay reproducibly generated 111 Tm signatures out of the 119 distinct sequence types (Fig. 3). A total of 104 of the Tm signatures corresponded to unique sequence types while 7 Tm signatures could not completely resolve DNA sequences but instead specifically identified six groups of two sequence types each and one group of three sequence types. With two exceptions, each group contained bacterial species from the same genus. The exceptions were the species pairs (i) Bartonella henselae and Campylobacter jejuni and (ii) Enterobacter cloacae and Klebsiella oxytoca, which produced indistinguishable Tm signatures even though their target sequences differed marginally. Thus, the assay was highly successful in detecting DNA sequence differences within the region probed by the SMBs. As a consequence, virtually all of the bacterial species within the reference set were either individually identified or else categorized into clinically useful groups of related species (Table 2).
Significantly, the Tm values generated by each SMB probe hybridizing to DNA from the same bacterial species were highly reproducible, usually varying less than ±0.5°C (Fig. 3). The species-specific signatures were stable regardless of the quantity of DNA added to the PCR, to a lower limit of detection of ~500 genome equivalents for different bacteria spiked into blood, or if large amounts (up to 50 mg) of interfering human DNA were present (data not shown).
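As an illustration of how a look-up table of mean Tm values could be assembled from replicate runs, and how run-to-run spread could be checked against the roughly ±0.5°C figure quoted above, consider the following sketch. The function names and the replicate values are hypothetical; the study's own table was maintained in a spreadsheet, and nothing here reproduces its data.

```python
from statistics import mean


def build_lookup(replicates):
    """Map each species to the per-probe mean Tm across its replicate runs.

    `replicates` maps a species label to a list of six-point signatures,
    one signature per run, all in the same probe order.
    """
    return {
        name: tuple(mean(column) for column in zip(*runs))
        for name, runs in replicates.items()
    }


def max_deviation(runs):
    """Largest absolute difference between any replicate Tm and its probe mean."""
    return max(
        abs(value - mean(column))
        for column in zip(*runs)
        for value in column
    )


# Invented replicate data for one hypothetical species (three runs shown for
# brevity; the study used six).
replicates = {
    "species_A": [
        (62.0, 55.2, 48.1, 66.0, 59.0, 51.0),
        (62.4, 55.0, 47.9, 66.2, 58.8, 51.2),
        (61.6, 54.8, 48.0, 65.8, 59.2, 50.8),
    ],
}

lookup = build_lookup(replicates)
print(lookup["species_A"])
print(max_deviation(replicates["species_A"]))
```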
We performed a blinded evaluation of 164 clinical isolates from 158 patients in the course of their clinical diagnostic evaluations. Subsequent decoding showed that these samples represented 19 genera and 41 species, which included virtually all of the bacterial species commonly isolated in clinical laboratories (12, 13) (Fig. 4). Six separate PCR assays, each containing a different SMB probe, were performed with the DNA from each isolate. The resulting Tm signatures were used to generate D values for each isolate. In the relatively rare cases where the Tm signatures were identical in more than one species, D values were used to assign each unknown to a species “group.” To account for the clinical isolates that were not present in our species reference set, we added an indeterminate category to this study, defined as samples generating D values of >5.0. We were able to make correct assignments for 159 out of 164 isolates (a concordance of 97%; 95% confidence interval [CI], 94.3% to 99.6%). The five isolates that could not be correctly identified were assigned to the indeterminate group due to high D values, and no isolate was wrongly identified (specificity, 100%). Examining the five indeterminate samples showed that four consisted of species (two Pseudomonas aeruginosa and one each of E. cloacae and Neisseria gonorrhoeae) that should have been identified. In each case, misassignment to the indeterminate category was due to a manual error in deciphering the low Tm values by the blinded data interpreter, causing one or more Tm values to be erroneously entered as zero. DNA sequence analysis of the fifth isolate revealed it to be Streptococcus pseudoporcinus, which was absent from our reference set and, thus, not expected to be detected by our assay. When this one isolate was excluded from our study, the corrected assay concordance was 97.5% (95% CI, 95.2% to 99.9%).
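The text does not say how the confidence intervals were computed. The reported figures are, however, consistent with a simple normal-approximation (Wald) interval for a proportion, and the sketch below works through that calculation under this assumption. The function name is hypothetical.

```python
import math


def concordance_ci(correct, total, z=1.96):
    """Proportion with an assumed normal-approximation (Wald) 95% interval."""
    p = correct / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot near the ends.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


for correct, total in [(159, 164), (159, 163)]:
    p, low, high = concordance_ci(correct, total)
    print(f"{correct}/{total}: {p:.1%} (95% CI, {low:.1%} to {high:.1%})")
```

With 159 of 164 correct, this calculation gives an interval of about 94.3% to 99.6%, matching the values quoted above; with the one out-of-reference isolate excluded (159 of 163) it gives about 95.2% to 99.9%. For proportions this close to 100%, a Wilson or exact interval would usually be a more conservative choice.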
As this assay will be particularly useful for rapidly identifying a bacterial species present in a blood culture sample, we performed a blinded assay on 81 positive and 25 negative clinical blood cultures. We expected to find occasional blood cultures that were positive for more than one bacterial species. Therefore, we added a “mixed” category for this study, defined as samples generating more than one Tm peak in any of the six SMB probes. All suspected mixed cultures with double Tm profiles for any probe were visually deconvolved to generate the species-specific signatures of the individual bacterial species present; otherwise, automated species assignments were made using the D value threshold. For deconvolution of the double Tm profiles, all the possible Tm signature patterns resulting from the presence of the double Tm peaks were checked for their D values against the reference set of Tm signatures. The two Tm signature patterns generating the smallest D values identified the bacteria in the mixture. Overall, we were able to correctly identify 76/86 (concordance of 88.4%; 95% CI, 81.6% to 95.1%) of the bacterial species present in the 81 positive cultures (Fig. 5). The assay made the correct species assignment for 67/76 (88%) of the blood cultures which contained a single bacterial species and for 9/10 (90%) of the bacterial species present in the five mixed blood cultures. Of the 10 indeterminate instances, four bacterial species (two P. aeruginosa and one each for Proteus mirabilis and group G Streptococcus) were called indeterminate when they should have been identified by the assay. An additional bacterial isolate (Staphylococcus hominis) was called negative. This single false-negative result occurred in a mixed culture which also contained a Streptococcus dysgalactiae species that was correctly identified. Finally, six isolates, five Pseudomonas putida and one Bacillus megaterium, were called indeterminate.
However, we noted that these two species were absent from our reference set and, thus, were not expected to be detected. Excluding these last six isolates from our study, the corrected assay concordance was 95.0% (95% CI, 90.2% to 99.8%). Importantly, the specificity of the assay was also high as none of the 25 negative blood cultures resulted in a bacterial species assignment, and there were no misidentifications.
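The deconvolution of double Tm profiles in mixed cultures was performed visually in this study. The enumeration it describes (try every signature pattern that the observed peaks allow, score each against the reference set, and keep the two best-scoring species) can nonetheless be sketched programmatically. The sketch below is an assumption-laden illustration: it uses Euclidean distance for D, hypothetical function names, and invented reference signatures and peak values.

```python
import math
from itertools import product


def distance_index(signature, reference):
    """Assumed D value: Euclidean distance between two six-point signatures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(signature, reference)))


def deconvolve(peaks_per_probe, lookup, threshold=5.0):
    """Return up to two (species, D) pairs explaining a mixed melt profile.

    `peaks_per_probe` holds, for each of the six probes, a tuple with the
    one or two Tm peaks observed in that probe's well.
    """
    best = {}
    # Every way of choosing one peak per probe is a candidate signature.
    for pattern in product(*peaks_per_probe):
        for name, ref in lookup.items():
            d = distance_index(pattern, ref)
            if d < best.get(name, math.inf):
                best[name] = d
    ranked = sorted(best.items(), key=lambda item: item[1])
    # Keep the two closest species, dropping any at or above the threshold.
    return [(name, d) for name, d in ranked[:2] if d < threshold]


# Invented reference signatures and an invented mixed profile.
lookup = {
    "species_A": (62.0, 55.0, 48.0, 66.0, 59.0, 51.0),
    "species_B": (58.0, 55.0, 44.0, 70.0, 59.0, 47.0),
    "species_C": (50.0, 50.0, 50.0, 50.0, 50.0, 50.0),
}
mixed = [(62.1, 58.0), (55.0,), (48.2, 44.1), (66.0, 69.8), (59.0,), (51.1, 47.0)]

for name, d in deconvolve(mixed, lookup):
    print(name, round(d, 2))
```

With at most two peaks per probe there are no more than 2^6 = 64 candidate patterns per sample, so exhaustive enumeration is cheap. A species present at very low abundance may produce no visible second peak at all, in which case no enumeration of this kind would recover it.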
This study presents a paradigm shift in PCR-based molecular diagnostics. Our approach uses a combination of probes and Tm signature to identify target sequences. Compared to assays which use individual real-time PCR probes to identify individual DNA targets, this approach makes it possible to identify a very large number of different target sequences with a very small number of probes. Our novel probe design strategies which included the use of anchor sequences or very long probe sequences (up to 60 nucleotides in length) were critical to its success. These modifications expanded the range of sequences to which the sloppy probes could hybridize while providing sufficient probe-target stability to generate reproducible Tm values and permit single nucleotide sequence discrimination. The variable length of the anchor sequences in the different probes and presence of mismatches in the anchor sequences themselves added to the differential sequence discrimination and hybridization versatility of the probes. The “sloppiness” of the probe hybridizations did not affect the overall assay precision, which resulted in a high level of sensitivity and specificity in our validation studies on clinical isolates and blood cultures, with not a single bacterial isolate being misidentified. This is attributable to the novel Tm signature principle combined with the distance index paradigm that we used in identifying the bacterial species. Thus, each sequence was identified by a distinct and unique combination of six different Tm values. In blinded assays, any deviant or missing Tm values in the Tm signature for any sequence always resulted in unusual D values, which easily enabled us to identify the assay as indeterminate and prevented false positive misinterpretations.
Our assay also identified most instances where two different bacterial species were present in the same blood culture by deconvolving double peaks into Tm values that were characteristic for two different bacterial species. This is because judicious designing of probes caused most bacteria to generate distinctly different Tm values for the same probe, generating discrete double peaks in the case of mixed DNA samples. Mixed blood cultures are uncommon, and when they occur, mixtures are often due to the presence of a contaminating commensal skin organism plus the true pathogen (18). We have confirmed in DNA spiking studies that we can identify the presence of common commensals when they are mixed with any of the bacterial species frequently seen in blood cultures (data not shown). It is interesting that the only bacterial species that we could not detect in a blood culture mixture was also the commensal S. hominis, which we suspect to be a contaminant present at a very low level in the clinical blood culture.
Sloppy molecular beacons represent one of several new approaches that have been proposed for rapid pathogen detection and speciation. A number of commercial and in-house molecular diagnostic PCR, ligase chain reaction, nucleic acid sequence-based amplification (22, 32), and pyrosequencing assays (4) for bloodstream infections have recently been described. These approaches work well in research settings or in clinical laboratories with substantial technical capacity; however, none of these methods combine the range of bacterial species identification, robustness, and the ease of use that is possible using our approach. Newer technologies including matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectroscopy, DNA microarrays, deep sequencing, and technologies like PCR coupled with electrospray ionization mass spectrometry may provide additional diagnostic modalities (1, 6, 7, 21, 31). However, these technologies are currently too costly, complex, and/or time-consuming for clinical use; and it is not yet clear whether they will ever offer a substantial advantage over our diagnostic method that requires little more than a real-time PCR.
Our current assay has some limitations. Several pairs of bacterial species could not be distinguished. These few pairs of organisms were identified as small groups rather than as individual species. A seventh sloppy molecular beacon designed to probe an additional target or sequence-specific modification of the existing probe sequences would be expected to resolve these remaining groups into individual species. The assay is clearly not able to identify a bacterial species which is not present in the reference set, which as a matter of fact increases its specificity. This would recur rarely once large collections of clinical isolates are tested and new species-specific signatures are added to the reference set. The assay was also not able to identify a few clinical isolates that were present in our reference set. This was attributable to manual errors in Tm determination, usually in the few Tm values that were below 45°C. Improved instrumentation and automated Tm identification would resolve this only true source of error. The assay has a sensitivity and specificity that rival conventional species identification performed in clinical laboratories (5, 14, 25, 28) even with the current limitations. However, the assay is far more rapid and has the capability of being far more sensitive than conventional assays, suggesting that our universal bacterial identification system can provide a significant advance to infectious disease diagnostics.
This work was supported by the National Institutes of Health grants U0I-AI-056689 and U0I-AI-175490.
D.A., F.R.K., and S.A.E.M. are among a group of inventors who earn royalties for molecular beacon usage.
Published ahead of print on 18 November 2009.
†Supplemental material for this article may be found at http://jcm.asm.org/.