We have adapted molecular inversion probe technology to identify microbes in a highly multiplexed procedure. This procedure does not require growth of the microbes. Rather, the technology employs DNA homology twice: once for the molecular probe to hybridize to its homologous DNA and again for the 20-mer oligonucleotide barcode on the molecular probe to hybridize to a commercially available molecular barcode array. As proof of concept, we have designed, tested, and employed 192 molecular probes for 40 microbes. While these particular molecular probes are aimed at our interest in the microbes in the human vagina, this molecular probe method could be employed to identify the microbes in any ecological niche.
The substantial majority of the earth's bacteria have not been grown in culture and do not form colonies on agar plates (for examples, see references 19 and 21). These statements are particularly true of the bacteria living in or on human beings (for examples, see references 2, 6, and 7). The Human Microbiome Project (for examples, see references 2, 5, 24, and 27) is employing DNA sequencing and other genome-based technologies to reveal the plethora of microbes living in or on humans. Our goal was to develop a massively multiplex method employing currently commercially available reagents to identify those microbes at the species level.
Several useful approaches to the identification of microbes that do not form colonies on agar plates have been published. Many scientists have employed dideoxy sequencing of the PCR-amplified 16S rRNA gene to identify microbes (for an example, see reference 22). Dideoxy sequencing is expensive, cumbersome, and slow. “Next-generation” sequencing reduces the cost of sequencing but produces much shorter read lengths. As examples, Tarnberg et al. (26), Jonasson et al. (11), and Sundquist et al. (25) employed pyrosequencing of small portions of the 16S rRNA gene to identify microbes. These scientists could not always identify the microbes down to the species level. “Checkerboard DNA-DNA hybridization” (CDH) is a technology that is more than a decade old (23). Nikolaitchouk et al. (14) have applied CDH to identify the microbes in the human female genital tract and achieved a 13-plex reaction. Dumonceaux et al. (4) coupled microbe-specific oligonucleotides to fluorescently labeled microspheres. Subsequently, the microbes were detected and counted by flow cytometry. Dumonceaux et al. (4) achieved a 9-plex reaction. DeSantis et al. (3) designed and employed a microarray containing 297,851 oligonucleotide probes derived from the 16S rRNA gene from 842 subfamilies of prokaryotes. DeSantis et al. (3) concluded that their microarray revealed greater prokaryotic diversity than dideoxy sequencing of a “typically sized clone library.”
None of these ingenious methods meets the requirements for a robust, commercially available, highly multiplex technology. Therefore, we have adapted an independent method to identify microbes: molecular inversion probes (8) coupled with a commercial molecular barcode array. This method does not require growth of the microbes. Rather, molecular probe technology is a nucleic acid-based technology employing DNA homology twice: once for the molecular probe to hybridize to its homologous DNA and again for the 20-mer oligonucleotide barcode on the molecular probe to hybridize to a commercially available oligonucleotide barcode array. We present here data demonstrating proof of concept in which molecular probes were designed, tested, and employed to detect microbes in simulated clinical samples. Because of our ongoing interest in the bacteria that inhabit the adult human vagina (10), we focus on that ecological niche. However, this method is sufficiently general that it can be applied to detect the microbes in any ecological niche, e.g., soil and the ocean.
The design of our molecular probes was based upon the previously published designs of padlock probes (15) and molecular inversion probes (8). There are three domains within our molecular probes (Fig. 1A). The first domain is a contiguous 40-base sequence (a “homer”) unique to the genome of the microbe of choice, divided into two 20-mers. The second domain is a 20-base barcode from the Affymetrix GenFlex Tag16K array v2 (hereinafter the “Tag4 array”; Affymetrix, San Carlos, CA). The Tag4 array contains 8-μm features (also known as 20-mers or barcodes), each replicated and dispersed five times on the array (16). The third domain is a 36-base universal PCR amplification sequence (1). Thus, these molecular probes are 96 bases in length. Six examples of molecular probe design are given in Fig. 2. The designs for all 192 molecular probes are given in Table S1 in the supplemental material. Since the oligonucleotide probes contain fewer than 100 bases, they can be purchased commercially. We purchased the 5′-phosphorylated, PAGE-purified molecular probes from Integrated DNA Technologies (Coralville, IA). We arbitrarily chose to give the molecular probe designs sequential “ED” numbers. The list of microbial DNAs employed in these experiments is given in Table S2 in the supplemental material.
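As an illustration of the three-domain layout, the following minimal Python sketch assembles a 96-base linear probe and checks that joining its 3′ end to its 5′ end restores the contiguous 40-base homer. It assumes that the homer is written 5′ to 3′ in the sense of the probe strand, so that base 20 becomes the 3′ terminus and base 21 the 5′ terminus. The relative order of the barcode and the universal amplification sequence in the middle of the probe is also an assumption (the actual order is shown only in Fig. 1A), and the class name, method names, and sequences are hypothetical placeholders.

```python
from dataclasses import dataclass

HOMER_LEN, BARCODE_LEN, UNIVERSAL_LEN = 40, 20, 36


@dataclass(frozen=True)
class MolecularProbe:
    homer: str      # 40-base target-specific sequence, probe-strand sense (assumed)
    barcode: str    # 20-base array barcode
    universal: str  # 36-base universal PCR amplification sequence

    def __post_init__(self):
        expected = {"homer": HOMER_LEN, "barcode": BARCODE_LEN,
                    "universal": UNIVERSAL_LEN}
        for name, length in expected.items():
            if len(getattr(self, name)) != length:
                raise ValueError(f"{name} must be {length} bases")

    def linear_sequence(self):
        """Linear probe, 5' to 3': homer bases 21-40, middle domains, homer bases 1-20."""
        first_half, second_half = self.homer[:20], self.homer[20:]
        # Order of universal sequence and barcode is an assumption.
        return second_half + self.universal + self.barcode + first_half

    def ligated_junction(self):
        """The 40 bases spanning the ligation site once the probe is circularized."""
        seq = self.linear_sequence()
        return seq[-20:] + seq[:20]


# Placeholder sequences for illustration only; they are not real probe domains.
probe = MolecularProbe(homer="GAAC" * 10, barcode="ACGT" * 5, universal="AC" * 18)
```

Under these assumptions, `len(probe.linear_sequence())` is 96 and `probe.ligated_junction()` equals the original homer, which is why ligation can occur only when both 20-base arms are annealed side by side on the target.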
We have written custom software, Blaster.rb, to identify 40-base homers unique to the genome of the microbe of interest. Blaster.rb is freely available at http://med.stanford.edu/sgtc/research/blaster.html. Blaster.rb divides an input sequence into 40-base oligomers, offset by 20 bases. Then, Blaster.rb applies the following screens. (i) The 40-mer must be between 45% and 55% G+C, such that multiplexed molecular probes will hybridize at the maximum rate at the same temperature. (ii) The 40-mer cannot contain more than three of any base in a row, because purines, especially guanine, tend to stack, and stacking interferes with hybridization. (iii) The 20th and 21st bases of the homer must be a G or a C. The later ligation event will take place between these two bases. We want three, rather than two, hydrogen bonds holding each base in place. (iv) The 40-mer cannot contain an inverted repeat. Such a repeat could form a stem-and-loop structure and reduce or abolish the homer's ability to form hydrogen bonds with its homologous sequence in the target DNA. (v) Since human DNA was the most likely contaminating DNA in a vaginal swab, the 40-mers were BLASTed against the GenBank Human Genome database. Any homer with a good match to human DNA was excluded. A “good match” was defined as more than 17 perfectly matched bases in a row or a single mismatch within 20 bases. (vi) The 40-mers that passed all of the previous screens were BLASTed against the GenBank Other Genomes database. Blaster.rb reported the 40-mers unique to the species in one list. A second list reported 40-mers unique to the species plus other members of the genus. Blaster.rb is not restricted to the rRNA genes or even to genes. As examples, for the six Lactobacillus gasseri molecular probes (Fig. 2), the homers were derived from the nucleotidyltransferase gene, the lactate dehydrogenase gene, a transcriptional regulator, and two different hypothetical protein genes, respectively.
(vii) Blaster.rb does not require a complete genome sequence. For Atopobium vaginae (GenBank accession number ADNA01000000) and Gardnerella vaginalis (GenBank accession number ADNB01000000), we had robust contigs but not complete genome sequences. For each, we linked together about a dozen contigs, placing 20 As between each contig. (viii) Since sequences are added to GenBank daily, identifying unique 40-mers was a moving target. BLAST was repeated over time.
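The sequence-only screens (i) through (iv), together with the windowing and the contig linking, can be sketched in Python as follows. This is not the Blaster.rb source. All function names are hypothetical, and the inverted-repeat test uses an assumed minimum stem of 6 bases and minimum loop of 3 bases, because the text does not state the thresholds that were applied. The BLAST screens (v) and (vi) are not reproduced.

```python
import re

HOMER_LEN = 40
OFFSET = 20
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def join_contigs(contigs, spacer="A" * 20):
    """Link contigs into one input sequence with 20 As between them."""
    return spacer.join(contigs)


def windows(seq, size=HOMER_LEN, step=OFFSET):
    """Yield (start, 40-mer) pairs, each window offset by 20 bases."""
    seq = seq.upper()
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def gc_fraction(s):
    return (s.count("G") + s.count("C")) / len(s)


def has_long_run(s, max_run=3):
    """True if any base occurs more than max_run times in a row."""
    return re.search(r"(.)\1{%d,}" % max_run, s) is not None


def revcomp(s):
    return s.translate(_COMPLEMENT)[::-1]


def has_inverted_repeat(s, min_stem=6, min_loop=3):
    """True if a min_stem-base arm is followed, after at least min_loop
    bases, by its reverse complement (thresholds are assumptions)."""
    for i in range(len(s) - 2 * min_stem - min_loop + 1):
        arm = s[i:i + min_stem]
        if revcomp(arm) in s[i + min_stem + min_loop:]:
            return True
    return False


def passes_sequence_screens(homer):
    """Apply screens (i)-(iv) to one candidate 40-mer."""
    return (
        len(homer) == HOMER_LEN
        and set(homer) <= set("ACGT")
        and 0.45 <= gc_fraction(homer) <= 0.55        # screen (i)
        and not has_long_run(homer)                   # screen (ii)
        and homer[19] in "GC" and homer[20] in "GC"   # screen (iii)
        and not has_inverted_repeat(homer)            # screen (iv)
    )


def candidate_homers(seq):
    """Return the (start, 40-mer) windows that survive screens (i)-(iv)."""
    return [(start, h) for start, h in windows(seq) if passes_sequence_screens(h)]
```

Candidates returned by `candidate_homers` would still need the BLAST comparisons against human and other genomes before being accepted as unique homers.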
The 96-base molecular probe design was entered into the Rensselaer bioinformatics web server (http://frontend.bioinfo.rpi.edu/applications/mfold/) and folded into all possible secondary structures (28). Any design for which any form had 8 or more contiguous base pairs anywhere among the 96 bases was rejected. Such a structure could compete with the annealing process. Any design that could form six or more contiguous base pairs within 20 bases of the 5′ end or of the 3′ end was rejected. (GT was counted as a base pair.) Approximately half of our designs were rejected by these criteria.
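The two rejection criteria can be expressed as a small filter over the base pairs of each predicted structure. The Python sketch below assumes that each structure has already been parsed into a list of 1-based `(i, j)` pairs, with GT pairs included. Parsing of the mfold output is not shown, and the function names are hypothetical. It also interprets “within 20 bases of the 5′ end or of the 3′ end” as meaning that any paired base of the helix lies in the first or last 20 positions, which is one reading of the criterion.

```python
def helix_runs(pairs):
    """Group base pairs into runs of contiguous (stacked) pairs:
    (i, j), (i + 1, j - 1), (i + 2, j - 2), ..."""
    pair_set = set(pairs)
    runs = []
    for i, j in sorted(pair_set):
        if (i - 1, j + 1) in pair_set:
            continue  # this pair continues a run that starts earlier
        run = [(i, j)]
        while (run[-1][0] + 1, run[-1][1] - 1) in pair_set:
            run.append((run[-1][0] + 1, run[-1][1] - 1))
        runs.append(run)
    return runs


def reject_structure(pairs, length=96, anywhere=8, near_end=6, end_window=20):
    """True if one predicted structure violates either criterion."""
    for run in helix_runs(pairs):
        if len(run) >= anywhere:
            return True
        if len(run) >= near_end:
            positions = [p for pair in run for p in pair]
            if any(p <= end_window or p > length - end_window for p in positions):
                return True
    return False


def reject_design(structures):
    """A design is rejected if any of its predicted structures is rejected."""
    return any(reject_structure(pairs) for pairs in structures)
```

For example, a run of seven stacked pairs confined to positions 30 to 36 and 64 to 70 would be tolerated, whereas a run of six pairs involving positions 5 to 10 would cause rejection.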
Fig. 1B presents an outline of the molecular probe procedure. In the first step (step a), a pool of molecular probes was annealed to the denatured target DNAs in Ampligase buffer (Epicentre Technologies, Madison, WI). We titrated most molecular probes and concluded that the optimal concentration for each molecular probe in the stock solution was 100 attomol/μl (60 × 10^6 molecules/μl; 10 times higher for Lactobacillus crispatus and Lactobacillus jensenii probes). One microliter of the mixed-probe stock solution was diluted into the final volume of the annealing reaction mixture (7.8 μl). The annealing reaction mixture was heated to 95°C for 5 min. Then, the temperature was lowered 1°C at a time, with 1 min at each temperature, until 65°C was reached; the mixture was then held at 65°C overnight. For step b, where there was sufficient homology between the homer of the molecular probe and the target DNA, the probe hybridized to the target and yielded 40 bp of duplex DNA. That hybridization brought the 5′-phosphorylated end of the probe adjacent to the 3′ end of the probe with no bases missing. For step c, ligation formed a phosphodiester bond between the 5′ and 3′ bases of the molecular probe. Ampligase (1 unit; Epicentre Technologies) was added, and the reaction mixture was heated to 58°C for 2 min. The volume was 14.2 μl. For step d, all linear DNA was digested by exonuclease treatment: 0.65 units of exonuclease I and 3.3 units of exonuclease III (Epicentre Technologies) were added to each reaction mixture, with a negligible change in reaction mixture volume. The reaction conditions were 37°C for 15 min, followed by 80°C for 15 min to abolish the enzymatic activities. The circular molecular probes remained. The fifth step (step e) was a conventional PCR. We employed AmpliTaq Gold kits purchased from ABI (Foster City, CA) and followed the manufacturer's instructions.
We purchased the 5′-biotinylated and high-pressure liquid chromatography-purified universal forward and reverse amplification primers (1) 5′-biotin-TACTGAGGTCGGTACACTCT and 5′-biotin-AGTAGCCGTGACTATCGACT from Integrated DNA Technologies. Two units of AmpliTaq DNA polymerase were added per final 50-μl amplification reaction mixture. The cycling conditions were as follows: 95°C for 10 min; 25 cycles of 94°C for 30 s, 63°C for 30 s, and 72°C for 30 s; and 72°C for 5 min. For step f, we used 10-μl amounts to electrophorese through a 3% composite gel. The remaining 40-μl amounts (without purification) of the 5′-biotinylated PCR products were hybridized to a Tag4 array (17; also see Materials and Methods in the supplemental material).
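The probe-stock arithmetic and the annealing temperature program of step a can be checked with a short Python sketch. The function names are hypothetical, and the program assumes that the 1-min steps run from 94°C down to 66°C before the overnight hold at 65°C, which is one reading of the description.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def attomol_to_molecules(attomol):
    """Convert an amount in attomoles (1e-18 mol) to a number of molecules."""
    return attomol * 1e-18 * AVOGADRO


def annealing_program(start_c=95, end_c=65, denature_min=5, step_min=1):
    """Return (temperature in deg C, minutes) steps; minutes is None for the
    final overnight hold."""
    steps = [(start_c, denature_min)]
    for temp in range(start_c - 1, end_c, -1):
        steps.append((temp, step_min))
    steps.append((end_c, None))  # held overnight
    return steps


stock_molecules_per_ul = attomol_to_molecules(100)  # about 6.0e7 per microliter
per_probe_in_reaction = 100 / 7.8  # attomol/ul after 1 ul is diluted into 7.8 ul
program = annealing_program()
```

With these assumptions, 100 attomol corresponds to roughly 60 × 10^6 molecules, each probe is present at about 13 attomol/μl in the annealing mixture, and the ramp takes about 34 min before the overnight hold begins.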
Additional information is in the Materials and Methods section in the supplemental material.
The Atopobium vaginae Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number ADNA00000000. The version described in this paper is the first version, ADNA01000000. The Gardnerella vaginalis Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number ADNB00000000. The version described in this paper is the first version, ADNB01000000.
To test the specificity of our molecular probes, we reacted the probe set with different “dropout” pools of genomic DNAs (see Materials and Methods in the supplemental material). Each of the 36 pools was lacking one of the 36 target DNAs recognized by our probe set. The positive control contained all DNAs. For example, the data for the Bifidobacterium longum DNA dropout pool are shown in Fig. 3. The only negative probes were the five probes directed against B. longum DNA. These molecular probes could not hybridize to the absent B. longum DNA and did not hybridize to any of the other 35 prokaryotic genomic DNAs in the dropout pool. Included in these tests for specificity were molecular probes for different species of the same genus. As one example, Neisseria gonorrhoeae probes did not hybridize to Neisseria meningitidis DNA and, conversely, N. meningitidis probes did not hybridize to N. gonorrhoeae DNA (data not shown). As a second example, there were three species of the Pseudomonas genus in the probe mix and the dropout pools: P. aeruginosa, P. fluorescens, and P. putida. The molecular probes for each species hybridized only to the species for which they were designed and not to the DNA of the other two species in the genus (data not shown).
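The logic of the dropout design can be summarized in a few lines of Python. The sketch below is illustrative only: the function names and the probe labels in the usage note are hypothetical, and the observed calls would come from thresholded array signals, which are not modeled here.

```python
def dropout_pools(targets):
    """Build one pool per target, each lacking exactly that target's DNA."""
    return {missing: [t for t in targets if t != missing] for missing in targets}


def expected_calls(probe_targets, pool):
    """A probe is expected to be positive only if its target DNA is in the pool."""
    present = set(pool)
    return {probe: target in present for probe, target in probe_targets.items()}


def specificity_errors(probe_targets, pool, observed):
    """Compare observed positive/negative calls with expectation.
    Returns (false_positives, false_negatives) as lists of probe names."""
    expected = expected_calls(probe_targets, pool)
    false_pos = [p for p, seen in observed.items() if seen and not expected[p]]
    false_neg = [p for p, seen in observed.items() if not seen and expected[p]]
    return false_pos, false_neg
```

For instance, with a hypothetical mapping `{"probe_1": "B. longum", "probe_2": "N. gonorrhoeae"}` and the pool lacking B. longum, a specific probe set yields a negative call for `probe_1`, a positive call for `probe_2`, and two empty error lists.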
Because of the importance of Lactobacillus bacteria in the adult vagina, additional experiments were performed to test the specificity of the Lactobacillus species probes. As an example, a mixture of six L. gasseri molecular probes was hybridized separately to each of seven Lactobacillus DNAs: L. acidophilus, L. brevis, L. crispatus, L. delbrueckii, L. gasseri, L. jensenii, and L. plantarum. The results are shown in Fig. 4. The only strong fluorescence resulted from each molecular probe reacted with L. gasseri genomic DNA.
When vaginal swabs are collected, human cells are sometimes inadvertently collected as well, leading to the presence of contaminating human DNA. During our annealing reaction, we did not want any of our molecular probes to hybridize to human DNA, since such hybridization would compromise the interpretation of the data. Therefore, three tests were undertaken. (i) All potential 40-base homers were BLASTed against the human DNA sequence in GenBank. Any potential homer with more than negligible homology was discarded (Materials and Methods). (ii) The sequences of the two universal amplification primers were BLASTed against the human DNA sequence in GenBank. “No significant similarity” was found. (iii) An experiment was undertaken to determine if human DNA would act as a target for any of the molecular probes (see Materials and Methods in the supplemental material). The resulting fluorescence signals are depicted in Fig. S1 in the supplemental material. No probe produced a significant signal consistently with all three amounts of human DNA. Only one probe was found to react significantly with the highest concentration of human DNA. However, even this probe yielded a very low signal. Therefore, we concluded that the reactivity of our molecular probes with the assayed amounts of human DNA was negligible.
Fig. S2 in the supplemental material presents a histogram of the minimum detection limit for the molecular probes (see Materials and Methods in the supplemental material). Notably, molecular probes targeting the same whole-genome amplified (WGA) microbial DNA did not necessarily have the same minimum detection limit. Among the molecular probes with a minimum detection limit of, at most, 0.5 pg were probes for Burkholderia mallei, Corynebacterium glutamicum, L. brevis, L. gasseri, Stenotrophomonas maltophilia, and Treponema pallidum.
ATCC genomic Escherichia coli DNA was employed to measure the minimum detection limit without the complication of WGA. A mixture of the three E. coli molecular probes (ED35, ED36, and ED39) was hybridized to decreasing concentrations of E. coli DNA and taken through our procedure. The Tag4 array results are shown in Fig. 5. ED35 had a minimum detection limit of 1 pg (326 yoctomol; approximately 200 molecules). ED36 and ED39 had a minimum detection limit of 10 pg (3.26 zeptomol; approximately 2,000 molecules). For the same amount of E. coli genomic DNA, the three molecular probes produced different amounts of fluorescence. For example, at apparent saturation (100 ng), ED35 produced 3,984 fluorescence units (FU), ED36 produced 5,513 FU, and ED39 produced 2,914 FU (Fig. 5).
To simulate the total DNA derived from vaginal swabs, we constructed five mixtures of DNAs wherein each mixture had very different mass amounts of 3 to 5 WGA genomic DNAs relevant to the human vagina, called simulated clinical samples A through E (see Table S3 in the supplemental material). As an example, the data for simulated clinical sample E are shown in Fig. 6. The data for the other four simulated clinical samples are shown in Fig. S3 to S6 in the supplemental material. In all cases, the molecular probes detected only the DNAs known to be in the simulated clinical sample. There were no false positives, and there were no false negatives.
While there was excellent qualitative agreement, quantitative agreement was elusive. For example, in simulated sample E (Fig. 6), there were 0.45 ng (0.15 attomol) of WGA E. coli DNA and 10 ng (13 attomol) of WGA T. pallidum DNA. The range of fluorescence achieved for the five T. pallidum probes extends both higher and lower than the range of fluorescence of the three E. coli probes (Fig. 6), despite the ~100-fold difference in the molar amount of target DNA. The explanation probably lies in the well-known facts that standard PCR is not quantitative and fluorescence intensity is not a linear function of mass.
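The molar amounts quoted for sample E can be approximated from the DNA masses with a short Python sketch. The genome sizes (about 4.64 Mb for E. coli and about 1.14 Mb for T. pallidum) and the average mass of 650 g/mol per base pair are assumptions introduced here for illustration. The text does not state which values were used, so the results agree with the quoted figures only approximately.

```python
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of duplex DNA

# Assumed genome sizes in base pairs; not taken from the text.
GENOME_BP = {
    "E. coli": 4.64e6,
    "T. pallidum": 1.14e6,
}


def genome_attomol(mass_ng, genome_bp):
    """Convert a mass of genomic DNA (ng) to attomoles of whole genomes."""
    grams = mass_ng * 1e-9
    moles = grams / (genome_bp * BP_MASS_G_PER_MOL)
    return moles / 1e-18


ecoli_amol = genome_attomol(0.45, GENOME_BP["E. coli"])
tpal_amol = genome_attomol(10, GENOME_BP["T. pallidum"])
fold_difference = tpal_amol / ecoli_amol
```

Under these assumptions, the E. coli DNA corresponds to about 0.15 attomol and the T. pallidum DNA to about 13 attomol, a difference of roughly 90-fold, which is consistent with the ~100-fold difference cited above.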
We have described a method that combines molecular probes with molecular barcode array technology and is capable of detecting any microbe for which adequate DNA sequence information is available. Custom software (which we have made publicly available) was used to identify the 40 contiguous bases of unique target DNA sequence that is essential for the specificity of each molecular probe. We verified the specificity of our molecular probes by testing them against DNA from many other microbial species and against human genomic DNA. Finally, we successfully applied this method to identify microbes in simulated clinical samples.
There are several important benefits in this technology. A key benefit of the molecular probe method is that it does not require growth of the microbe. Another benefit is that the molecular probes are 96 bases in length and, therefore, can be purchased commercially. The molecular probe method employs a commercially available oligonucleotide array. Quality control has been built into the Tag4 array, so that results are reproducible from scientist to scientist and from laboratory to laboratory (17).
Another key benefit is that the method is designed to identify microbes in multiplex. The Tag4 array contains 16,000 features (also known as 20-mers or barcodes). In this work, we have used fewer than 200 features. Thus, molecular probes for a nearly unlimited number of microbes can be added at any time (9). Sequence data produced by the ongoing Human Microbiome Project will only increase the number of organisms that can be interrogated by this method. In the near future, it should be possible to interrogate the entire human microbiome on one Tag4 array. It is widely recognized that the complex population of microbes living in or on humans can reflect the health of the individual (for examples, see references 12, 13, 18, and 20). Therefore, the method presented herein has potential applications for both basic research and clinical medicine.
We have benefited from discussions with Johan Baner, Lisa Diamond, Simon Fredrickson, Sujatha Krishnakumar, Jochen Kumm, Michael Mindrinos, Curtis Palm, and Nader Pourmand.
We are deeply grateful to 47 generous colleagues who sent us gifts of a combined 87 genomic DNAs: C. Arvidson, A. Blanchard, F. Biville, E. Carretto, T. Chang, J. Christensen, J.-A. Dillon, L. Drago, D. Edmonson, E. Eribe, H. Falentin, M. Ferris, D. Forster, S. Harvey, N. Jones, T. Klaenhammer, S. Kleinsteuber, S. Lau, M. Muller, R. Nichols, H. Nojiri, S. Norris, S. Patrick, B. Peyton, W. Picking, S. Pournaras, J. Ramos, V. Rodwell, G. Rossolini, T. Sandrin, M. Schneegurt, V. Sivadon, Y. Song, G. Spear, T. Stanton, B. Stark, J. Steele, R. Stephens, H. Suzuki, C. Vadeboncoeur, A. Viale, W. Wade, G. Welling, J. West, K. Williams, E. Worobec, and K. Yuen.
This work was supported by National Human Genome Research Institute grant PO1 HG000205.
R.W.H. designed the experiments. R.W.H. and R.P.S. wrote the manuscript. E.A.A. wrote the custom software Blaster.rb. M.F. entered the sequences into the custom software and retrieved the results. R.W.H. designed the molecular probes and performed the molecular probe reactions. M.M. hybridized the molecular probes to the arrays. A.M.A. washed and scanned the arrays. R.P.S. and R.W.H. analyzed the array data. R.P.S. and M.F. produced the graphs thereof. R.W.D. provided the intellectual, physical, and financial milieu for these experiments.
The authors declare no competing financial interests. Affymetrix (San Carlos, CA) currently holds the commercial license to molecular inversion probe technology.
Published ahead of print on 23 April 2010.
†Supplemental material for this article may be found at http://aem.asm.org/.