Every technology has its advantages and disadvantages. There are two important challenges in detecting bacteria by amplifying rDNA and sequencing it with BigDye-terminator (Sanger) chemistry. (1) rDNA genes are present in multiple copies per genome, and the copy number differs among bacteria [6, 7]. (2) The "universal" primers have mismatches to the rDNAs of highly relevant bacteria [8, 9]. The negative impact of a mismatch between primer and template is substantial [9, 10]. Baker et al. [11] found that no primer pair had good matches to all bacterial rDNA. Therefore, bacterial genomes with few ribosomal RNA genes and/or with rDNA sequence mismatches to the primers will likely be under-represented in the sequencing library. The same considerations make determining the minimum detection limit problematic. In earlier work, we performed extensive modeling of the cost/benefit ratio for BigDye-terminator sequencing [12]. We concluded that four 96-well plates of sequence reads gave the best cost/benefit ratio, which is what we used for these vaginal swabs: 4 × 96 = 384 reads, of which ~5% failed, leaving 365 reads. BigDye-terminator sequencing has a very low error rate. Nevertheless, our rule of thumb is to require 10 BigDye-terminator reads (~3% of the sequence reads) to securely detect a bacterium.
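The read budget and the under-representation argument above can be illustrated with a short sketch. The first part only restates the arithmetic from the text. The second part is a toy model that we assume purely for illustration: expected reads are taken to be proportional to cell number × rDNA copy number × primer efficiency. The taxon names, cell numbers, copy numbers, and efficiencies are hypothetical and are not measurements from this study.

```python
# Read budget for one sample, using the figures quoted in the text.
PLATES = 4
WELLS_PER_PLATE = 96
FAILED_FRACTION = 0.05   # "~5% failed reads"
MIN_READS_TO_CALL = 10   # rule-of-thumb detection threshold

total_wells = PLATES * WELLS_PER_PLATE                      # 384
usable_reads = round(total_wells * (1 - FAILED_FRACTION))   # 365
min_fraction = MIN_READS_TO_CALL / usable_reads             # roughly 3%


def expected_reads(cells, rdna_copies, primer_efficiency, total_reads):
    """Expected reads per taxon under a proportional toy model (an assumption).

    Each taxon's weight is cells * rDNA copies * primer efficiency, and the
    fixed read total is divided among taxa in proportion to those weights.
    """
    weights = {
        taxon: cells[taxon] * rdna_copies[taxon] * primer_efficiency[taxon]
        for taxon in cells
    }
    total_weight = sum(weights.values())
    return {t: total_reads * w / total_weight for t, w in weights.items()}


# Hypothetical community: taxon_B is a sizeable minority of cells, but has a
# single rDNA copy and a poorer primer match than taxon_A.
cells = {"taxon_A": 1000, "taxon_B": 300}
rdna_copies = {"taxon_A": 7, "taxon_B": 1}
primer_efficiency = {"taxon_A": 1.0, "taxon_B": 0.5}

reads = expected_reads(cells, rdna_copies, primer_efficiency, usable_reads)
detected = {t: r >= MIN_READS_TO_CALL for t, r in reads.items()}

for taxon in cells:
    print(taxon, round(reads[taxon], 1), detected[taxon])
```

Under these assumed values, taxon_B makes up about a quarter of the cells yet falls below the 10-read threshold, which is the kind of under-representation the paragraph describes.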
Our molecular probe technology requires a reasonably secure genome sequence for each bacterium and the synthesis of long oligonucleotides. Second-generation sequencing is providing bacterial genome sequences faster and more cheaply than BigDye-terminator sequencing. The cost of synthesizing oligonucleotides is coming down, while the achievable length is going up.
For the molecular probes, the Homers are based upon single-copy sequences. Thus, unlike rDNA-based detection, there is no copy number variation among bacterial genomes that could confound the results. However, to design the Homers, we started with complete genome sequences of specific strains of any given bacterial species. The bacterial genome sequence section of GenBank (presumably) contains only a fraction of the genome sequences of all of the strains for any given species. Thus, a molecular probe may be correctly positive for one strain's genome and correctly negative for another's. At the species level, this situation would give rise to false negatives in detecting bacteria. We have attempted to minimize this possibility by employing multiple probes per genome and by deriving the Homers from different parts of the genome sequence.
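The benefit of multiple probes per genome can be made concrete with a simple calculation. The sketch below assumes, purely for illustration, that each probe's target region is absent from a given strain independently and with the same probability; neither assumption is established in this work, and the function name and the value 0.2 are hypothetical.

```python
def prob_strain_missed(p_probe_absent, n_probes):
    """Probability that every probe target is absent from a strain.

    Assumes (hypothetically) that probes fail independently and with the
    same probability p_probe_absent, so all n_probes fail with p ** n.
    """
    if not 0.0 <= p_probe_absent <= 1.0:
        raise ValueError("p_probe_absent must be between 0 and 1")
    if n_probes < 1:
        raise ValueError("n_probes must be at least 1")
    return p_probe_absent ** n_probes


# Illustrative value only: a 20% chance that any one target region is absent.
for n in (1, 2, 4):
    print(n, prob_strain_missed(0.2, n))
```

Under these assumptions the chance of missing a strain entirely falls geometrically with the number of probes. In practice, probes drawn from neighboring regions of the genome are unlikely to fail independently, which is one reason for taking Homers from different parts of the genome sequence.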
We have employed two very different assays for the molecular probes: Tag4 array and SOLiD sequencing. There was an apparent lack of good, relative quantitation for both assays, as seen for the simulated clinical samples. With the Tag4 assay, fluorescence intensity is an exponential function of mass and, thereby, inherently difficult to quantitate. However, the assay for each sample requires an individual Tag4 array, and, therefore, each Tag4 assay is independent of the other Tag4 assays. The SOLiD assay requires only counting the number of reads supporting the presence of each bacterium. However, as with any multiplex sequencing, the samples are not independent, as there is a limit to the total number of reads.
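The non-independence of multiplexed samples follows directly from the fixed read total, as the following sketch shows. It assumes, for illustration only, that reads are divided among samples in proportion to the amount of probe product each contributes; the sample names, amounts, and run size are hypothetical.

```python
def share_reads(template_amounts, total_reads):
    """Divide a fixed read total among samples in proportion to their input.

    Proportional allocation is an assumption made for illustration.
    """
    total = sum(template_amounts.values())
    return {s: total_reads * a / total for s, a in template_amounts.items()}


TOTAL_READS = 1_000_000  # hypothetical size of one multiplexed run

# sample_1 contributes the same amount in both runs; only sample_2 changes.
run_1 = share_reads({"sample_1": 100, "sample_2": 100}, TOTAL_READS)
run_2 = share_reads({"sample_1": 100, "sample_2": 300}, TOTAL_READS)

print(run_1["sample_1"], run_2["sample_1"])
```

Although sample_1 is unchanged between the two runs, its read count drops when sample_2 contributes more, so read counts from one run are best interpreted as relative rather than absolute quantities.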
Our goal is to produce a technology that detects bacteria without culture, uses commercially available reagents, is highly multiplexed, and will ultimately be fast and inexpensive. Other investigators have invented or adapted technologies toward what is likely the same goal. Several examples follow. The Insignia system is closest to our technology [13, 14]. The system is in two parts. The first part is the publicly available software that defines oligonucleotides unique to the target genome of interest [13]. The second part is a quantitative PCR (qPCR) assay [14]. The software is definitely useful. The qPCR assay cannot be multiplexed. Nikolaitchouk et al. [15] applied "checkerboard DNA-DNA hybridization" to detect the microbes in the human female genital tract and achieved a 13-plex reaction. Given the complexity of this technology, it is unlikely that very high multiplexing can be achieved. DeSantis et al. [16] designed and successfully employed a microarray containing 297,851 oligonucleotide probes derived from the rDNA of 842 subfamilies of prokaryotes. Willenbrock et al. [17] designed and tested a microarray that contained genome sequences from seven Escherichia coli genomes. Their microarray is not commercially available and is unlikely to accommodate very high multiplexing. Dumonceaux et al. [18] coupled microbe-specific oligonucleotides to fluorescently labeled microspheres and detected and counted the fluors by flow cytometry, achieving a 9-plex reaction. At present, it is not clear which, if any, of these technologies will turn out to be widely used for detecting bacteria.
While we have concentrated on the detection and identification of bacteria, our molecular probe technology is not limited to that function. Archaea, viruses, and even individual genes (such as antibiotic-resistance genes or bacterial toxin genes) could also be detected. The only requirement is sufficient genome sequence to design the unique sequence similarity region of the molecular probe. Because of the multiplex nature of both assays for the molecular probe technology, thousands more probes, representing thousands more entities, may be added at any time [4]. Eventually, the entire human microbiome, in health and in disease, may be assayed in a single reaction tube using only commercially available reagents.