To evaluate PCR-generated artifacts (i.e., chimeras, mutations, and heteroduplexes) with the 16S ribosomal DNA (rDNA)-based cloning approach, a model community of four species was constructed from alpha, beta, and gamma subdivisions of the division Proteobacteria as well as a gram-positive bacterium, all of which could be distinguished by HhaI restriction digestion patterns. The overall PCR artifacts were significantly different among the three Taq DNA polymerases examined: 20% for Z-Taq, with the highest processivity; 15% for LA-Taq, with the highest fidelity and intermediate processivity; and 7% for the conventionally used DNA polymerase, AmpliTaq. In contrast to the theoretical prediction, the frequency of chimeras for both Z-Taq (8.7%) and LA-Taq (6.2%) was higher than that for AmpliTaq (2.5%). The frequencies of chimeras and of heteroduplexes for Z-Taq were almost three times higher than those of AmpliTaq. The total PCR artifacts increased as PCR cycles and template concentrations increased and decreased as elongation time increased. Generally the frequency of chimeras was lower than that of mutations but higher than that of heteroduplexes. The total PCR artifacts as well as the frequency of heteroduplexes increased as the species diversity increased. PCR artifacts were significantly reduced by using AmpliTaq and fewer PCR cycles (fewer than 20 cycles), and the heteroduplexes could be effectively removed from PCR products prior to cloning by polyacrylamide gel purification or T7 endonuclease I digestion. Based upon these results, an optimal approach is proposed to minimize PCR artifacts in 16S rDNA-based microbial community studies.
The detection, identification, and characterization of microbial populations and their activities in environments are a great challenge to microbiologists. The application of culture-independent nucleic acid techniques has greatly advanced the detection and identification of microorganisms in natural habitats. In the last decade, the use of 16S rRNAs or ribosomal DNAs (rDNAs) as molecular markers has become routine for microbial ecologists. Several different rRNA-based approaches have been used to characterize microbial communities, such as cloning plus sequencing (31), amplified rDNA restriction analysis (23), terminal restriction fragment length polymorphism (RFLP) analysis (1, 21), RFLP analysis (25), denaturing gradient gel electrophoresis (DGGE) (16, 26), temperature gradient gel electrophoresis (TGGE) (27), single-strand conformation polymorphism analysis (19), and heteroduplex mobility assay (12). Nearly every study applying these approaches reveals novel microbial groups, and many of them are still undetectable by cultivation (2, 3, 7, 8, 11, 13–15, 17, 29, 39, 41). Use of these methods with PCR, however, can cause bias (32, 33) and artifacts that lead to overestimation of community diversity (38).
Although PCR-generated chimeras have received much attention (4, 17, 18, 20, 28, 34, 35, 38), PCR-generated heteroduplexes and mutations have largely been ignored. It is impossible to avoid the formation of heteroduplexes in the PCR products when a mixture of homologous genes is used as PCR templates. When a heteroduplex molecule is cloned and transformed, two homoduplex molecules of 16S rRNA genes will be produced and segregated as a result of plasmid propagation (30). When the mixed 16S rRNA genes are subjected to RFLP, DGGE or TGGE analysis, artificial RFLP patterns or DGGE or TGGE conformations will be generated.
PCR-generated mutations pose another potential problem. Compared to other DNA polymerases, Taq DNA polymerase has a higher intrinsic misincorporation rate during synthesis (6). Such errors can accumulate and be enlarged during PCR amplification (38). When an error occurs at the restriction enzyme-recognizing site, an artificial RFLP pattern will occur. Although the phenomenon of PCR-induced mutations is well known, their effects on community diversity studies have not been adequately addressed. In addition, when the amplified target gene has secondary structure, deletion mutations may exist (5). Since the 16S rRNA gene has a stable secondary structure, deletion mutations could be produced during PCR amplifications.
Theoretically, PCR-generated chimeras should be fewer in amplifications with DNA polymerases with higher processivity and should decrease as elongation time increases and cycle number decreases. The PCR-induced mutations should be lower for DNA polymerases with either higher fidelity (point mutation) (38) or higher processivity (deletion mutation) (5) and should decrease as the cycle number decreases. The frequency of heteroduplex formation in the PCR products should decrease when smaller amounts of PCR products are synthesized. In addition, there is a potential increase of total PCR artifacts as species diversity increases. To test these hypotheses and minimize the three types of PCR artifacts, we evaluated PCR artifacts under different amplification conditions, and here we provide a general approach for reducing artifacts in 16S rRNA gene-based cloning studies.
The 16S rRNA genes were amplified from more than 40 different gram-negative and gram-positive bacteria with the eubacterium-specific primers FD1 and R1540 (36, 40) and digested with HhaI. Ten strains that showed distinct HhaI digestion patterns were selected for constructing model microbial communities (Table 1).
The selected 16S rRNA genes were cloned into plasmid vector pCR II with the TA cloning kit (Invitrogen BV, San Diego, Calif.). The cloned 16S rRNA genes were amplified from plasmid DNAs using primers TA-F (5′GCCGCCAGTGTGCTGGAATT3′) and TA-R (5′TAGATGCATGCTCGAGCGGC3′), which are specific to the polylinker of the vector pCR II. The amplified 16S rDNA fragments were purified by using the Wizard PCR Preps DNA Purification System (Promega, Madison, Wis.) and quantified with PicoGreen dye (Molecular Probes, Inc. Eugene, Oreg.) by using a LB50 Luminescence Spectrometer (Perkin-Elmer, Branchburg, N.J.). The amounts of template were calculated to yield equivalent genomic DNA under the assumption of seven rRNA copies per genome (4.64 Mb) as in Escherichia coli.
The effects of various PCR amplification parameters, including DNA polymerase, cycle number, elongation time, and template concentration, on the formation of PCR artifacts were examined in a four-species community, which contained C1-4, To1-4, B9-12, and P39, one each from the alpha, beta, and gamma subdivisions of the division Proteobacteria, and a gram-positive bacterium. All treatments were carried out in triplicate. For the amplifications above 20 cycles, three amplifications were combined, whereas for the amplifications below 20 cycles, 10 amplifications were combined.
To determine the effect of DNA polymerase on the formation of PCR artifacts, three Taq DNA polymerases were examined: TaKaRa Z-Taq (Pan Vera Corporation, Madison, Wis.), with the highest processivity; LA-Taq (Pan Vera Corporation), with the highest accuracy and intermediate processivity; and the conventionally used AmpliTaq DNA polymerase (Perkin-Elmer). PCR amplifications were carried out under the optimum conditions for each Taq as suggested by the manufacturers: in 1× Z-Taq buffer containing 3 mM Mg2+ and a 200 μM concentration of each deoxynucleoside triphosphate (dNTP) with Z-Taq; in 1× GC buffer I, including 2.5 mM Mg2+ and a 400 μM concentration of each dNTP with LA-Taq; and in a buffer of 10 mM Tris-HCl (pH 8.3 at 25°C)–50 mM KCl, containing 1.5 mM Mg2+ and a 200 μM concentration of each dNTP with AmpliTaq. The standard reaction mixture contained 20 pmol of each forward and reverse primer, 230 fg of cloned 16S rDNA (equivalent to 100 pg of genomic DNA) from each strain, bovine serum albumin (100 μg/ml), and 1 U of the Taq DNA polymerase in a final volume of 20 μl. Reaction mixtures were incubated in a thermocycler (model Gene Amp PCR system 9700; PE Applied Biosystem, Branchburg, N.J.) at 95°C for 5 min, followed by 30 cycles at 95°C for 40 s, 58°C for 30 s, and 72°C for 4 min and then by a final extension at 72°C for 7 min.
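The template equivalence quoted above (230 fg of cloned 16S rDNA for 100 pg of genomic DNA) can be checked with a short calculation. The sketch below is only an illustration: it uses the seven rRNA copies per 4.64-Mb genome stated in the text and assumes a fragment length of about 1.5 kb; the function name is hypothetical.

```python
# Values taken from the text: E. coli genome size and rRNA copy number.
GENOME_SIZE_BP = 4.64e6
RRN_COPIES_PER_GENOME = 7
# Assumption for illustration: the cloned 16S rDNA fragment is about 1.5 kb.
AMPLICON_LENGTH_BP = 1500


def rdna_mass_equivalent_fg(genomic_dna_pg):
    """Mass of 16S rDNA (fg) carrying as many gene copies as the given
    mass of genomic DNA (pg), assuming mass is proportional to length."""
    rdna_fraction = RRN_COPIES_PER_GENOME * AMPLICON_LENGTH_BP / GENOME_SIZE_BP
    return genomic_dna_pg * rdna_fraction * 1000.0  # convert pg to fg


if __name__ == "__main__":
    print(f"{rdna_mass_equivalent_fg(100):.0f} fg per 100 pg genomic DNA")
```

Under these assumptions the result is roughly 226 fg, which agrees with the 230 fg used in the standard reaction mixture.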
To examine the effect of elongation time on the formation of PCR artifacts, the 16S rDNAs were amplified with Z-Taq as described above, except the template concentration from each strain was equivalent to 30 ng of genomic DNA. The extension times compared were 20 s, 2 min, and 4 min. To examine the effect of template concentration on the formation of PCR artifacts, the 16S rDNAs were mixed in equal ratios with an equivalent of 100 pg, 1 ng, or 10 ng of genomic DNA per strain and amplified with Z-Taq. In addition, the effect of PCR cycle number on the formation of PCR artifacts was evaluated with Z-Taq for 22, 25, and 28 cycles as well as with AmpliTaq for 15, 20, 25, and 30 cycles. Both amplifications used a mixture of four strains of an equivalent of 100 pg of genomic DNA per strain.
To examine the effect of species diversity on the formation of PCR artifacts, model communities were constructed consisting of 4, 7, and 10 species (Table 1). The total template concentration used in each community was equivalent to 500 pg of genomic DNA, with equal ratios of each strain. 16S rDNAs were amplified with AmpliTaq for 15 cycles.
Multiple amplifications (3 or 10) were combined and purified from low-melting-point agarose gel by using the Wizard PCR Preps DNA Purification System prior to ligation. For amplifications with fewer than 20 cycles, PCR products were concentrated by ethanol precipitation. The ratio of 16S rDNAs to pCR II vector was 1:1. Two microliters of each ligation reaction mixture was transformed by heat pulse into E. coli Top10F' competent cells (Invitrogen).
The 16S rDNA inserts were amplified directly from transformant cells in 20 μl with primers TA-F and TA-R (41). A 5-μl sample of the amplification mixture was digested with 1 U of HhaI (Gibco BRL, Gaithersburg, Md.) in a final volume of 15 μl at 37°C overnight. The resulting products were resolved by electrophoresis in 1.6% agarose gel in 1× Tris-borate-EDTA (TBE) buffer at 94 V for 4 h. The gel was stained with 0.5 μg of ethidium bromide per ml and visualized by UV excitation. PCR artifacts were detected by comparing the RFLP patterns to those of the reference strains using the Molecular Analyst Program (Bio-Rad, Hercules, Calif.).
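The pattern comparison above was done on gel images. As a sequence-level analogue, the following minimal sketch digests a sequence in silico with HhaI (recognition site GCGC, cut after the third base on the top strand) and flags a clone whose fragment pattern matches none of the references. The function names and the toy sequences are hypothetical, and exact fragment-length matching is a simplification of what a gel can resolve.

```python
import re

CUT_OFFSET = 3  # HhaI cuts GCG^C, i.e. 3 bases into the recognition site


def hhai_fragments(seq):
    """Return HhaI fragment lengths (top strand), longest first."""
    seq = seq.upper()
    # The lookahead lets overlapping sites (e.g. GCGCGC) all be found.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(r"(?=GCGC)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def is_artifact_pattern(clone_seq, reference_seqs):
    """True if the clone's pattern differs from every reference pattern."""
    pattern = hhai_fragments(clone_seq)
    return all(pattern != hhai_fragments(ref) for ref in reference_seqs.values())


if __name__ == "__main__":
    reference = {"toy_ref": "AAAAGCGCTTTTTTTT"}   # toy sequence, one HhaI site
    mutant = "AAAAGCACTTTTTTTT"                   # single substitution removes the site
    print(hhai_fragments(reference["toy_ref"]), hhai_fragments(mutant))
    print(is_artifact_pattern(mutant, reference))
```

The toy example shows how a single misincorporated base inside a recognition site is enough to change the restriction pattern.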
DNA sequencing was carried out on an automated sequencer (model 373A; Applied Biosystems, Foster City, Calif.). Primers FD1 (E. coli 16S rRNA gene position 8 to 27), F925 (position 906 to 925), R529 (position 529 to 512), and R1392 (position 1406 to 1392) were used to obtain the sequences of both ends from both strands. About 200 bp from each individual primer was compared to the database containing the reference 16S rRNA gene sequences using the FASTA program (Genetics Computer Group Sequence Analysis Software Package; University of Wisconsin, Madison). If the sequences of the two ends showed the highest similarity to different reference strains but the sequences of both strands at the same end showed the highest similarity to the same reference strain, this clone was considered a chimera.
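The decision rule for calling a chimera can be written down compactly. The sketch below assumes that the best-matching reference strain has already been determined for each of the four reads (two strands at each end); the function name and the "ambiguous" category for clones whose two strands disagree at one end are illustrative additions, not part of the described procedure.

```python
def classify_clone(five_prime_hits, three_prime_hits):
    """Classify a clone from the best-matching reference of each read.

    Each argument is a pair of reference names, one per strand, for the
    reads obtained at that end of the insert.
    """
    five_consistent = len(set(five_prime_hits)) == 1
    three_consistent = len(set(three_prime_hits)) == 1
    if not (five_consistent and three_consistent):
        # Strands disagree at one end: the rule in the text does not apply.
        return "ambiguous"
    if five_prime_hits[0] != three_prime_hits[0]:
        # Both strands agree at each end, but the two ends differ.
        return "chimera"
    return "non-chimeric"


if __name__ == "__main__":
    print(classify_clone(("C1-4", "C1-4"), ("P39", "P39")))
    print(classify_clone(("C1-4", "C1-4"), ("C1-4", "C1-4")))
```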
Primers F270 (position 246 to 261), F519 (position 512 to 529), F1099 (position 1099 to 1114), R350 (position 342 to 357), R925 (position 925 to 906), and R1540 (position 1541 to 1525) were used to obtain full 16S rRNA gene sequences. The sequences were assembled with PhredPhrap and Consed (University of Washington, Seattle) and compared to reference sequences using MAP, MAPSORT, and GAP (Genetics Computer Group). The error rate was the percentage of total examined clones that have a misincorporated nucleotide at the HhaI sites.
To detect the heteroduplexes, an 8-μl aliquot of the PCR product from the clone with altered RFLP pattern was mixed with 2 μl of 50 mM EDTA (final concentration, 10 mM), denatured at 95°C for 5 min, and renatured at 25°C for 40 min (9, 35). The sample was separated on a 5% nondenatured polyacrylamide (49:1 ratio of acrylamide to bis) gel (16 by 20 cm) with a D Gene System (Bio-Rad) in 1× TBE buffer at 250 V for at least 3 h. The gel was stained with 0.5 μg of ethidium bromide per ml for 15 min, destained in 1× TBE for 30 min, and visualized by UV excitation. A heteroduplex was identified by comparing its banding pattern to those of reference homoduplex molecules. The clones showing extra bands that migrated more slowly than the homoduplex molecules but faster than single-stranded DNA molecules were considered heteroduplexes.
To remove heteroduplexes prior to cloning, 45 μl of amplified 16S rDNAs (Z-Taq) was mixed with 5 μl of 10× loading buffer (0.25% bromophenol blue, 0.25% xylene cyanol, 25% Ficoll [type 400]) and separated on a 5% nondenatured polyacrylamide gel. The bands corresponding to homoduplexes of 16S rDNAs were excised. The 16S rDNAs were recovered using a QIAEX II gel extraction kit (QIAGEN Inc., Valencia, Calif.) according to the manufacturer's instructions, except that the crushed gel strips were diffused twice with a 1.5× volume of diffusion buffer (0.5 M ammonium acetate, 10 mM manganese acetate, 1 mM EDTA, and 0.1% sodium dodecyl sulfate) at 50°C for 30 min. 16S rDNAs were eluted in 40 μl of TE (10 mM Tris-HCl [pH 8.0], 0.1 mM EDTA) and concentrated to 20 μl by ethanol precipitation. The final concentration of the recovered 16S rDNAs was estimated on a 0.8% agarose gel, and about 20 ng was used for ligation.
To digest heteroduplexes, 60 μl of amplified 16S rDNAs (LA-Taq) was concentrated to 6 μl by ethanol precipitation, and incubated with 1 μl (30 U) of T7 endonuclease I in a volume of 20 μl in a buffer containing 50 mM Tris (pH 8.0), 50 mM potassium glutamate, 10 mM MgCl2, 5 mM dithiothreitol, and 5% glycerol at 37°C for 20 min and then purified with the QIAEX II gel extraction kit and eluted in 20 μl of TE buffer (10 mM Tris [pH 8.0], 0.1 mM EDTA). Addition of 3′-A overhangs postamplification was carried out in a 20-μl reaction mixture containing 1× PCR buffer, 1.5 mM MgCl2, 0.1 mM dATP, and 0.5 U of Taq DNA polymerase at 65°C for 20 min. The resulting 16S rDNAs (about 20 ng) were directly used for ligation.
To test whether heteroduplex molecules of nearly entire 16S rRNA genes (1.5 kb) can be detected by polyacrylamide gel electrophoresis (PAGE), cloned 16S genes from two strains were mixed equally and subjected to PCR amplification. The migration of heteroduplexes was retarded due to the “bubble” formation between mismatches (Fig. 1). The decrease of the heteroduplex mobility was inversely proportional to the sequence similarity of the two parental molecules (Fig. 1A, lanes 6 to 11). These results suggested that heteroduplexes, even those formed between distantly related 16S rRNA genes (76% similar), could be detected by PAGE. Some heteroduplexes of 16S rDNA formed between clones with misincorporated nucleotides and their parental strains were also detectable on the 5% nondenatured polyacrylamide gel (Fig. 1B, lanes 6 to 9).
This PAGE method may fail to detect heteroduplex clones if the two 16S rDNAs are not equally abundant, which occurs very often due to the random partition of plasmids among daughter cells (30). To determine the effects of the ratios of two parental molecules on heteroduplex detection, the 16S rDNAs were coamplified from the mixture of C10-5 and B9-12 (similarity, 89%) and C10-5 and P-39 (similarity, 76%), respectively. The ratios of the two strains in the mixture were examined from 9:1 to 1:9, which caused the artificial RFLP patterns. The heteroduplexes formed from different ratios were all successfully detected by PAGE (data not shown).
The overall PCR-generated artifacts were enzyme dependent and significantly different among the three Taq DNA polymerases examined. The proportion of the overall PCR artifacts of LA-Taq was lower than that of Z-Taq but significantly higher than that of AmpliTaq (Table 2). No significant difference in chimera frequency was observed between Z-Taq (8.7% ± 2.2%) and LA-Taq (6.2% ± 2.2%). However, the frequency of chimeras for AmpliTaq was significantly lower than those for Z-Taq and LA-Taq (Table 2).
Significant differences in the PCR-induced mutations were observed among the three enzymes. While the error rate for LA-Taq (6.8% ± 0.6%) was lower than that for Z-Taq (9.3% ± 0.0%), it was higher than that for AmpliTaq (3.7% ± 1.1%) (Table 2). (Results are given as means ± standard deviations.) No significant differences in the percentages of heteroduplexes were observed among the three Taq DNA polymerases. The frequencies of heteroduplexes (1.2 to 3.7%) were lower than the frequencies of chimeras and mutations for all three Taq DNA polymerases (Table 2).
The total PCR artifacts decreased as the elongation time increased (Table 2). The highest percentage of PCR artifacts (25.5% ± 3.9%) was observed at an elongation time of 20 s. When the extension time increased to 4 min, the total PCR artifacts decreased to 16.1% ± 3.2%. Among the three types of PCR artifacts, the percentage of chimeras decreased dramatically from 10.3 to 3.7% as the elongation time increased. While the PCR-induced mutations varied from 9.6 to 12.8%, they were not significantly different among the three treatments. In addition, no significant difference in the percentage of heteroduplexes was observed when elongation time changed. About 2 to 3% heteroduplexes was observed for all three treatments (Table 2).
Since DNA template concentration will affect PCR amplification kinetics, it is expected that template concentration will have considerable effect on the formation of PCR artifacts. We found that the percentage of total PCR artifacts decreased from 17.3 to 10.5% as the template concentration of each strain decreased from 10 ng to 100 pg (Table 2). This decrease was mainly due to the decrease in the error rates (from 8.0 to 2.5%). No significant difference in the percentages of heteroduplexes or chimeras was observed (Table 2).
As expected, a positive correlation was observed between the formation of PCR artifacts and the number of PCR cycles, with PCR artifacts increasing from 13.0 to 20.8% as the cycles increased from 22 to 28. While the percentage of chimeras increased significantly as the cycle number increased, there was no significant change in the frequency of heteroduplexes. A slight increase in mutations was observed (Table 2).
Since AmpliTaq from Perkin-Elmer gave the lowest proportion of PCR artifacts, the effect of PCR cycle number on the formation of PCR artifacts was also examined with this enzyme. No difference in PCR artifacts was observed when the number of PCR cycles was 20 or less, while the total PCR artifacts increased dramatically as the cycle number increased above 20. No chimera was detected when the cycle number was 15, whereas 4.3% chimeras was observed for 30 PCR cycles. Slight increases were observed in both mutations (1.9 to 3.7%) and heteroduplexes (0.6 to 2.5%). Similar to that described above, the overall PCR artifacts were significantly lower in the amplifications with AmpliTaq than in those with Z-Taq (Table 2).
Total PCR artifacts increased as the species diversity increased (Table 3). Among 162 clones examined, four PCR artifacts (2.5%) were detected in the four-species community. All four of these clones were mutations of P39. Six artifacts (3.7%) were found in the seven-species community; three were mutations, and the other three were heteroduplexes. There were nine PCR artifacts (5.6%) in the 10-species community. Three were mutations, and six were heteroduplexes. No chimeric molecule was detected in any of the three model communities.
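The percentages in this paragraph follow directly from the clone counts. The sketch below reproduces them under the assumption that 162 clones were examined for each community, which is what the reported percentages imply; the dictionary layout and function name are illustrative only.

```python
# Assumption: 162 clones were screened per community (implied by the
# reported percentages). Counts are taken from the text.
CLONES_PER_COMMUNITY = 162
ARTIFACT_COUNTS = {
    4: {"mutations": 4, "heteroduplexes": 0, "chimeras": 0},
    7: {"mutations": 3, "heteroduplexes": 3, "chimeras": 0},
    10: {"mutations": 3, "heteroduplexes": 6, "chimeras": 0},
}


def artifact_percentage(counts, clones=CLONES_PER_COMMUNITY):
    """Total artifacts as a percentage of clones examined, to one decimal."""
    return round(100.0 * sum(counts.values()) / clones, 1)


if __name__ == "__main__":
    for n_species, counts in ARTIFACT_COUNTS.items():
        print(f"{n_species}-species community: {artifact_percentage(counts)}%")
```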
Since heteroduplex molecules migrate more slowly than homoduplex molecules in polyacrylamide gel, the heteroduplex molecules should be separated from the PCR products by PAGE. To test the effectiveness of this purification method, a clone library was constructed using the 16S rDNAs purified from the polyacrylamide gel. No heteroduplexes were observed in this library, whereas 3.7% was detected in the control library (Table 4).
T7 endonuclease I can cut the single-stranded DNA bubbles in a heteroduplex molecule and chop them into small DNA fragments. The heteroduplexes formed from C10-5 and B9-12 were incubated with 30 and 60 U of T7 endonuclease I at 37°C for different time periods (Fig. 2). The intensity of the bands corresponding to heteroduplexes decreased for both samples incubated with 30 and 60 U of T7 endonuclease I at 37°C for 10 min and for the sample incubated with 60 U of enzyme on ice for 60 min (Fig. 2, lane 16). The smears resulting from the digested heteroduplex molecules were observed on PAGE; however, the intensity of the bands corresponding to 16S rDNA homoduplex molecules remained the same even after incubation with the enzyme at 37°C for 60 min (Fig. 2). These results suggested that T7 endonuclease I was effective in removing heteroduplexes, whereas the homoduplex molecules were protected under the conditions examined. The effectiveness of T7 endonuclease I in the elimination of heteroduplexes was further examined by cloning. A 16S rDNA library was constructed using the PCR products incubated with 30 U of T7 endonuclease I at 37°C for 20 min. No heteroduplexes were detected in this library, whereas 3.1% heteroduplexes was observed in the control library (Table 4).
PCR-based cloning approaches are powerful tools for analyzing microbial community diversity despite intrinsic problems of bias and artifacts. With careful planning and experimental condition control, the artifacts can be minimized. Although the frequencies of artifacts observed in this study cannot be extrapolated to other studies due to differences in experimental conditions, they do provide valuable information for improving the methodologies of PCR-based cloning studies.
The effects of the three types of PCR artifacts on 16S-based cloning studies depend on the experimental purpose. Single-base mutations may have little or no effect on the overall tree topology when the entire 16S rRNA gene sequences are compared. Clean sequences may not be obtained if the clone resulted from a heteroduplex; thus, it should not be a concern when the sequence is required for the analysis. However, all three types of PCR artifacts can have serious impacts when RFLP, terminal RFLP, DGGE, or TGGE analysis, etc., is used for microbial community analysis.
The existence of heteroduplexes in a cloned 16S rRNA gene library has been an unappreciated problem and can lead to overestimating the diversity of a microbial community. We found that the occurrence of heteroduplexes indirectly correlated with DNA polymerases. More heteroduplexes were observed in the amplifications with Z-Taq than with AmpliTaq. This could be due to the fact that more PCR products were synthesized by Z-Taq than by AmpliTaq. High PCR product production will favor heteroduplex formation. This explanation is further supported by the observed increase in proportion of heteroduplexes as the cycle number and template concentrations increased. Heteroduplex frequency also appeared to be a function of species diversity. The frequency of heteroduplexes in the 10-species community was about two to four times higher than those of 7- or 4-species communities. This could be due to the fact that the probability of annealing between the two strands from the same origin decreases as the number of the heterogeneous genes increases. The corresponding increase in heteroduplexes with species diversity is potentially a problem for analyzing natural microbial communities, which may have hundreds to thousands of phylotypes. Besides, there is usually more than one rrn copy in a genome. Theoretically, the frequency of forming heteroduplex molecules between different copies of the 16S rRNA gene in the same genome is higher than that between different 16S rRNA genes, because a heteroduplex should be more stable when the two parental genes have higher sequence similarity.
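The argument that same-origin annealing becomes less likely as more heterogeneous genes are present can be illustrated with a deliberately simple model. The sketch below assumes that all strands are fully denatured and then pair at random, with no preference for perfectly matched partners. This is an idealization that ignores sequence similarity and PCR kinetics, so it gives an upper bound for the trend rather than a prediction of the observed frequencies; the function name is hypothetical.

```python
def random_pairing_heteroduplex_fraction(template_fractions):
    """Expected fraction of heteroduplexes if strands reanneal at random.

    template_fractions: relative abundance of each distinct gene (sums to 1).
    A duplex is a homoduplex only if both strands come from the same gene,
    which happens with probability sum(p_i ** 2).
    """
    total = sum(template_fractions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("template fractions must sum to 1")
    return 1.0 - sum(p * p for p in template_fractions)


if __name__ == "__main__":
    for n in (4, 7, 10):
        frac = random_pairing_heteroduplex_fraction([1.0 / n] * n)
        print(f"{n} equimolar templates: {frac:.3f}")
```

For equimolar mixtures the homoduplex probability under this model is 1/n, so the heteroduplex share rises with the number of distinct templates, in line with the direction of the trend described above.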
We proved that heteroduplex molecules of the entire 16S rRNA gene (1.5 kb) could be effectively detected by PAGE in a wide similarity range of 16S rRNA genes. However, the conditions for detecting such long fragments are different from those for DGGE, single-strand conformation polymorphism analysis, or heteroduplex mobility assay. We found that a gel with a lower cross-linking ratio (49:1), which yields a bigger pore size, should be used. Including 10% glycerol in the polyacrylamide gel helped to detect heteroduplexes, but a low concentration of urea in the gel did not. In addition, the conditions for forming heteroduplexes by denaturation-renaturation were also critical. We found that quickly cooling to 25°C for renaturation was much better than slowly cooling to 25°C (37) or quickly cooling on ice (9). Renaturation on ice resulted in more single-stranded DNA fragments. Including 10 mM EDTA in denaturation-renaturation buffer was helpful. However, we had difficulty forming heteroduplexes by denaturation at 98°C for 7 min and renaturation at 60°C for 40 min (12). We could detect the heteroduplexes formed between both closely (98.5% similar) and distantly (76% similar) related 16S rRNA genes by 5% nondenatured polyacrylamide gel.
We recommend the elimination of heteroduplexes prior to cloning. It is possible to use an enzyme such as T7 endonuclease I to cut the bubble in a heteroduplex and further destroy it (22). However, the experimental conditions for this treatment are critical. Low concentrations of enzyme or short incubation times do not remove all heteroduplexes, whereas high concentrations of enzyme or long incubation times can digest the homoduplex molecules. Also, postamplification with Taq is required for this approach to generate an A overhang for TA cloning. PAGE was also effective in removing heteroduplex molecules; however, this approach may be difficult for separating the heteroduplexes formed between highly related strains or heteroduplexes having very close conformations to the parental homoduplex molecules.
PCR-generated mutation is another little-recognized problem for 16S rRNA gene-based cloning studies. In general, misincorporated nucleotides in the PCR products are not a big concern since the errors are distributed randomly over the amplified fragment. Theoretically, less than one misincorporated nucleotide is expected when the entire 16S rRNA gene is amplified using an enzyme with an average fidelity such as 8 × 10^−6/base/replication. However, the error rate observed in this study was much higher than predicted. The highest error rate was observed for Z-Taq; however, the fidelity of Z-Taq (8.6 × 10^−6/base/duplication) is very close to that of AmpliTaq. These results indicated that the PCR-generated errors were not merely the consequences of infidelity of Taq polymerases. Since both Z-Taq and LA-Taq have higher processivity than AmpliTaq, we suspected that the higher error rate might be caused by a lack of PCR reagents, especially dNTPs. This explanation is supported by two observations: first, more PCR products were synthesized by Z-Taq and LA-Taq compared to AmpliTaq when equal units of enzymes were used; second, more PCR-induced mutations were observed when more templates were used (Table 2).
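The statement that less than one misincorporated nucleotide is expected per amplified gene can be checked with a back-of-the-envelope estimate. The sketch below uses the fidelity quoted in the text together with two assumptions: a 1.5-kb product and one duplication per cycle for the 30 cycles of the standard program. The linear formula (error rate × length × duplications) is a common approximation and is used here only for illustration; the function name is hypothetical.

```python
def expected_errors_per_molecule(error_rate_per_base, length_bp, duplications):
    """Approximate mean number of misincorporated bases per final molecule,
    assuming errors accumulate linearly with each duplication."""
    return error_rate_per_base * length_bp * duplications


if __name__ == "__main__":
    # 8e-6 errors/base/replication is quoted in the text; 1,500 bp and
    # 30 duplications are assumptions based on the amplicon and cycle number.
    estimate = expected_errors_per_molecule(8e-6, 1500, 30)
    print(f"expected errors per 16S rDNA molecule: {estimate:.2f}")
```

Under these assumptions the estimate is about 0.36 errors per molecule, spread over the whole fragment, so errors that happen to fall within an HhaI site would be expected to be rarer still.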
PCR amplification fidelity is affected by many factors, including not only the enzyme used but also buffer conditions, divalent metal cations, and thermal cycling parameters. It was reported that Taq fidelity decreased when the concentration of Mg2+ was in great excess compared to total dNTPs (11). The Mg2+ and dNTP concentrations used in this study were the optimum concentrations recommended by the manufacturers. The excesses of Mg2+ over total dNTPs with Z-Taq, LA-Taq, and AmpliTaq were 2.2, 0.9, and 0.7 mM, respectively. All of these values were within the range of high-fidelity conditions described by Eckert et al. (11). Whether the Mg2+ excess has a true impact on the high error rate needs to be further examined.
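The Mg2+ excess values follow from the buffer compositions given for each enzyme: the Mg2+ concentration minus the summed concentration of the four dNTPs. A minimal sketch of that arithmetic, with a hypothetical function name and data layout, is:

```python
# Buffer compositions as stated in the amplification conditions:
# (Mg2+ in mM, concentration of each dNTP in mM).
BUFFERS = {
    "Z-Taq": (3.0, 0.2),
    "LA-Taq": (2.5, 0.4),
    "AmpliTaq": (1.5, 0.2),
}
N_DNTPS = 4  # dATP, dCTP, dGTP, dTTP


def mg_excess_mm(mg_mm, each_dntp_mm):
    """Mg2+ in excess of total dNTPs (mM), rounded to one decimal."""
    return round(mg_mm - N_DNTPS * each_dntp_mm, 1)


if __name__ == "__main__":
    for enzyme, (mg, dntp) in BUFFERS.items():
        print(f"{enzyme}: {mg_excess_mm(mg, dntp)} mM excess Mg2+")
```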
We also observed that certain types of artificial RFLP patterns that were caused by a misincorporated single nucleotide appeared in many independent amplifications with each of the three Taq DNA polymerases. To better understand this phenomenon, eight clones, each with a distinct artificial RFLP pattern that most frequently arose, were studied in detail. Clones 16, 6, 34, and 7 gained an HhaI site whereas clones 15, 40, 14, and 35 lost an HhaI site, all due to a base substitution (Table 5). Moreover, both error sites of clone 15 (mutation of C1-4) and clone 14 (mutation of B9-12) were at the E. coli position 1109, where C was in a loop (Table 5). The 16S rDNA sequences of the two clones were significantly different (79.4% similar). Hence, we suspect that the secondary structure of the 16S rRNA gene contributed to the high error rate observed.
Consistent with the results of Wang and Wang (34, 35), we found that longer extension times and fewer PCR cycles decreased the frequency of chimeras. The percentages of chimeras found in this study, however, were more than three times lower than those observed by Wang and Wang (34, 35), probably due to different experimental systems and conditions. For example, they used pairs of cloned 16S rRNA genes with sequence similarity of 99.3, 86, and 82%, whereas sequence similarity varied from 76 to 89% in our four-species model community. Also, our chimera detection method could be less sensitive since we detected only an altered RFLP pattern. In theory, using Taq DNA polymerase with higher processivity should lower the frequency of chimeras, since chimeric molecules are mainly caused by incomplete synthesis during the PCR cycle (24, 38). In contrast, the highest frequency of chimeras was observed for Z-Taq, which has the highest processivity. Other factors likely contributed to the formation of chimeras. For example, an undenatured region or secondary structure in the templates will make it difficult for DNA polymerase to read through, causing termination of DNA synthesis.
While some tools are available for detecting chimeras and heteroduplexes after cloning, no tools are available for identifying PCR-generated single-base mutations in natural samples. Thus, it is critical to minimize PCR artifact formation prior to cloning. PCR cycling is one key parameter to reducing all three types of PCR artifacts. We suggest that the PCR amplification for any cloning-based community studies be performed with as few cycles as possible (http://www.esd.ornl.gov/people/zhou/zhou.html). The appropriate cycle number will depend on the amount of template used, amplification efficiency, and existence and degree of inhibitory substances and thus should be determined experimentally. To minimize PCR artifacts, we suggest using the PCR products prior to or during the exponential period for cloning. To obtain enough products for cloning, we suggest combining multiple amplifications followed by concentration with ethanol precipitation. Mixing PCR products from independent amplifications can also help to minimize experimental errors and amplification bias. The concentrated sample can then be quantified and used for constructing 16S rRNA gene library. Because the extent of 16S rRNA gene artifacts can never be known in a natural sample, we suggest that interpretations be focused on comparative studies with replication and under identical PCR conditions.
We thank Joe-Chang Cho for discussion and advice on heteroduplex detection.
This research was supported by the Natural and Accelerated Bioremediation Research and the Biotechnology Investigations-Ocean Margins Program, Office of Biological and Environmental Research, The United States Department of Energy. Oak Ridge National Laboratory is managed by University of Tennessee-Battelle LLC for the Department of Energy under contract DE-AC05-00OR22725, and the Center for Microbial Ecology is funded by the National Science Foundation, DEB-9120006.