PCR-based cloning approaches are powerful tools for analyzing microbial community diversity despite intrinsic problems of bias and artifacts. With careful planning and experimental condition control, the artifacts can be minimized. Although the frequencies of artifacts observed in this study cannot be extrapolated to other studies due to differences in experimental conditions, they do provide valuable information for improving the methodologies of PCR-based cloning studies.
The effects of the three types of PCR artifacts on 16S-based cloning studies depend on the experimental purpose. Single-base mutations may have little or no effect on the overall tree topology when entire 16S rRNA gene sequences are compared. Clean sequences may not be obtained if a clone resulted from a heteroduplex; thus, heteroduplexes should not be a concern when sequence data are required for the analysis. However, all three types of PCR artifacts can have serious impacts when RFLP, terminal RFLP, DGGE, TGGE, or similar methods are used for microbial community analysis.
The existence of heteroduplexes in a cloned 16S rRNA gene library has been an underappreciated problem and can lead to overestimation of the diversity of a microbial community. We found that the occurrence of heteroduplexes was indirectly correlated with the DNA polymerase used. More heteroduplexes were observed in amplifications with Z-Taq than with AmpliTaq, possibly because more PCR product was synthesized by Z-Taq than by AmpliTaq and high product yield favors heteroduplex formation. This explanation is further supported by the observed increase in the proportion of heteroduplexes as the cycle number and template concentration increased. Heteroduplex frequency also appeared to be a function of species diversity: the frequency of heteroduplexes in the 10-species community was about two to four times higher than that in the 7- or 4-species community. This could be because the probability of annealing between two strands of the same origin decreases as the number of heterogeneous genes increases. The corresponding increase in heteroduplexes with species diversity is potentially a problem for analyzing natural microbial communities, which may have hundreds to thousands of phylotypes. In addition, there is usually more than one rrn copy in a genome. Theoretically, the frequency of heteroduplex formation between different copies of the 16S rRNA gene in the same genome is higher than that between 16S rRNA genes from different genomes, because a heteroduplex should be more stable when the two parental genes have higher sequence similarity.
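The dilution argument above can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that during the final renaturation every strand pairs at random with any complementary strand in the pool, regardless of sequence similarity. Under that assumption, the chance that a strand reanneals with a partner of the same origin is the sum of the squared template proportions. This is an idealized upper bound and not a model used in this study; the function name and the equal-abundance inputs are hypothetical.

```python
def random_pairing_heteroduplex_fraction(proportions):
    """Fraction of duplexes expected to be heteroduplexes if strands
    pair completely at random (an illustrative assumption only)."""
    total = sum(proportions)
    # Probability that both strands of a duplex come from the same template.
    same_origin = sum((p / total) ** 2 for p in proportions)
    return 1.0 - same_origin


if __name__ == "__main__":
    # Equal-abundance model communities of 4, 7, and 10 templates.
    for n_species in (4, 7, 10):
        fraction = random_pairing_heteroduplex_fraction([1.0] * n_species)
        print(f"{n_species} templates: {fraction:.3f}")
```

Under random pairing the fraction rises from 0.75 for 4 templates to about 0.86 for 7 and 0.90 for 10. The calculation only shows the direction of the trend; it does not reproduce the two- to fourfold difference observed, since real heteroduplex formation also depends on product concentration and sequence similarity.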
We demonstrated that heteroduplex molecules of the entire 16S rRNA gene (1.5 kb) could be effectively detected by PAGE over a wide range of 16S rRNA gene similarities. However, the conditions for detecting such long fragments are different from those for DGGE, single-strand conformation polymorphism analysis, or heteroduplex mobility assay. We found that a gel with a lower cross-linking ratio (49:1), which yields a bigger pore size, should be used. Including 10% glycerol in the polyacrylamide gel helped to detect heteroduplexes, but a low concentration of urea in the gel did not. In addition, the conditions for forming heteroduplexes by denaturation-renaturation were also critical. We found that quickly cooling to 25°C for renaturation was much better than slowly cooling to 25°C (37) or quickly cooling on ice (9). Renaturation on ice resulted in more single-stranded DNA fragments. Including 10 mM EDTA in the denaturation-renaturation buffer was helpful. However, we had difficulty forming heteroduplexes by denaturation at 98°C for 7 min and renaturation at 60°C for 40 min (12). We could detect the heteroduplexes formed between both closely (98.5% similar) and distantly (76% similar) related 16S rRNA genes on a 5% nondenaturing polyacrylamide gel.
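For readers preparing such a gel, the monomer amounts follow from the usual definitions of %T (grams of total monomer per 100 ml) and the acrylamide:bisacrylamide ratio. The sketch below is a minimal calculation under those standard definitions; the 10-ml gel volume and the function name are hypothetical and are not taken from this study.

```python
def gel_monomer_masses(percent_t, acrylamide_to_bis, volume_ml):
    """Return (acrylamide_g, bis_g, percent_c) for a polyacrylamide gel.

    percent_t         -- total monomer concentration, g per 100 ml
    acrylamide_to_bis -- mass ratio, e.g. 49 for a 49:1 gel
    volume_ml         -- gel volume in ml (hypothetical input)
    """
    total_g = percent_t / 100.0 * volume_ml
    bis_g = total_g / (acrylamide_to_bis + 1.0)
    acrylamide_g = total_g - bis_g
    # %C is the cross-linker share of the total monomer mass.
    percent_c = 100.0 * bis_g / total_g
    return acrylamide_g, bis_g, percent_c


if __name__ == "__main__":
    acrylamide_g, bis_g, percent_c = gel_monomer_masses(5.0, 49.0, 10.0)
    print(f"acrylamide: {acrylamide_g:.3f} g, bis: {bis_g:.3f} g, %C: {percent_c:.1f}")
```

A 49:1 ratio corresponds to 2% cross-linker, compared with 5% for a 19:1 gel, which is consistent with the larger pore size noted above.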
We recommend eliminating heteroduplexes prior to cloning. It is possible to use an enzyme such as T7 endonuclease I to cut the bubble in a heteroduplex and further destroy it (22). However, the experimental conditions for this treatment are critical. Low concentrations of enzyme or short incubation times do not remove all heteroduplexes, whereas high concentrations of enzyme or long incubation times can digest the homoduplex molecules. Also, postamplification with Taq is required for this approach to generate an A overhang for TA cloning. PAGE was also effective in removing heteroduplex molecules; however, this approach may have difficulty separating heteroduplexes formed between highly related strains or heteroduplexes whose conformations are very close to those of the parental homoduplex molecules.
PCR-generated mutation is another little-recognized problem for 16S rRNA gene-based cloning studies. In general, misincorporated nucleotides in PCR products are not a big concern, since the errors are distributed randomly over the amplified fragment. Theoretically, less than one misincorporated nucleotide is expected when the entire 16S rRNA gene is amplified using an enzyme with an average fidelity such as 8 × 10⁻⁶/base/replication. However, the error rate observed in this study was much higher than predicted. The highest error rate was observed for Z-Taq; however, the fidelity of Z-Taq (8.6 × 10⁻⁶/base/duplication) is very close to that of AmpliTaq. These results indicated that the PCR-generated errors were not merely the consequences of infidelity of Taq polymerases. Since both Z-Taq and LA-Taq have higher processivity than AmpliTaq, we suspected that the higher error rate might be caused by a shortage of PCR reagents, especially dNTPs. This explanation is supported by two observations: first, more PCR products were synthesized by Z-Taq and LA-Taq than by AmpliTaq when equal units of enzymes were used; second, more PCR-induced mutations were observed when more templates were used (Table ).
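The theoretical expectation quoted above can be checked with a back-of-the-envelope calculation. The sketch below assumes a simple linear approximation in which the expected number of errors per product molecule is the error rate multiplied by the fragment length and the number of template doublings, and it uses a Poisson assumption to estimate the fraction of molecules carrying at least one error. The 1,500-bp length follows the text; the figure of 30 doublings and the function names are hypothetical.

```python
import math


def expected_errors_per_molecule(error_rate, length_bp, doublings):
    """Linear approximation: errors accumulate with each template doubling."""
    return error_rate * length_bp * doublings


def fraction_with_error(mean_errors):
    """Poisson probability that a molecule carries at least one error."""
    return 1.0 - math.exp(-mean_errors)


if __name__ == "__main__":
    # 8e-6 errors/base/replication over a 1.5-kb gene; 30 doublings is assumed.
    mean_errors = expected_errors_per_molecule(8e-6, 1500, 30)
    print(f"expected errors per molecule: {mean_errors:.2f}")
    print(f"fraction with >=1 error: {fraction_with_error(mean_errors):.2f}")
```

With these inputs the expectation is 0.36 errors per molecule, which agrees with the statement that less than one misincorporation is expected per full-length gene.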
PCR amplification fidelity is affected by many factors: not only the enzyme used but also buffer conditions, divalent metal cations, and thermal cycling parameters. It was reported that Taq fidelity decreased when the concentration of Mg2+ was in great excess compared to total dNTPs (11). The Mg2+ and dNTP concentrations used in this study were the optimum concentrations recommended by the manufacturers. The excesses of Mg2+ over total dNTPs with Z-Taq, LA-Taq, and AmpliTaq were 2.2, 0.9, and 0.7 mM, respectively. All of these conditions were within the range of high-fidelity conditions described by Eckert et al. (11). Whether the Mg2+ excess has a true impact on the high error rate needs to be examined further.
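The excess of Mg2+ over total dNTPs is simply the Mg2+ concentration minus the summed concentration of the four dNTPs. The sketch below shows that arithmetic; the input concentrations are illustrative values chosen for the example and are not the concentrations reported for this study, and the function name is hypothetical.

```python
def mg_excess_mM(mg_mM, dntp_each_mM, n_dntps=4):
    """Excess of Mg2+ (mM) over the total concentration of all dNTPs (mM)."""
    total_dntp_mM = n_dntps * dntp_each_mM
    return mg_mM - total_dntp_mM


if __name__ == "__main__":
    # Illustrative reaction: 1.5 mM Mg2+ with 0.2 mM of each dNTP (assumed values).
    print(f"Mg2+ excess: {mg_excess_mM(1.5, 0.2):.1f} mM")
```

A negative result would indicate that the total dNTP concentration exceeds the Mg2+ concentration.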
We also observed that certain types of artificial RFLP patterns caused by a single misincorporated nucleotide appeared in many independent amplifications with each of the three Taq DNA polymerases. To better understand this phenomenon, eight clones, each with a distinct artificial RFLP pattern that arose most frequently, were studied in detail. Clones 16, 6, 34, and 7 gained an HhaI site, whereas clones 15, 40, 14, and 35 lost an HhaI site, all due to a base substitution (Table ). Moreover, the error sites of both clone 15 (mutation of C1-4) and clone 14 (mutation of B9-12) were at E. coli position 1109, where C was in a loop (Table ). The 16S rDNA sequences of the two clones were significantly different (79.4% similar). Hence, we suspect that the secondary structure of the 16S rRNA gene contributed to the high error rate observed.
Nature of mutation in most frequently detected clones
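How a single base substitution creates or destroys an HhaI site, and therefore changes the RFLP pattern, can be sketched in a few lines. HhaI recognizes the sequence GCGC; the short sequences below are made-up examples and not sequences from the clones in this study, and the helper names are hypothetical.

```python
HHAI_SITE = "GCGC"  # HhaI recognition sequence


def find_sites(sequence, site=HHAI_SITE):
    """Return the 0-based start positions of every (possibly overlapping) site."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def substitute(sequence, position, base):
    """Return a copy of the sequence with one base replaced."""
    return sequence[:position] + base + sequence[position + 1:]


if __name__ == "__main__":
    # Made-up example: an A-to-G substitution creates a GCGC site.
    parent_gain = "AAGCACTT"
    mutant_gain = substitute(parent_gain, 4, "G")
    print(find_sites(parent_gain), find_sites(mutant_gain))

    # Made-up example: a C-to-T substitution destroys a GCGC site.
    parent_loss = "TTGCGCAA"
    mutant_loss = substitute(parent_loss, 3, "T")
    print(find_sites(parent_loss), find_sites(mutant_loss))
```

Comparing the site lists of a parental sequence and a clone in this way identifies which substitution accounts for a gained or lost restriction fragment.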
Consistent with the results of Wang and Wang (34), we found that longer extension times and fewer PCR cycles decreased the frequency of chimeras. The percentages of chimeras found in this study, however, were more than three times lower than those observed by Wang and Wang (34), probably due to different experimental systems and conditions. For example, they used pairs of cloned 16S rRNA genes with sequence similarities of 99.3, 86, and 82%, whereas sequence similarity varied from 76 to 89% in our four-species model community. Also, our chimera detection method could be less sensitive, since we detected only an altered RFLP pattern. In theory, using Taq DNA polymerase with higher processivity should lower the frequency of chimeras, since chimeric molecules are mainly caused by incomplete synthesis during the PCR cycle (24). However, the highest frequency of chimeras was observed for Z-Taq, which has the highest processivity. Other factors likely contributed to the formation of chimeras. For example, an undenatured region or secondary structure in the templates will make it difficult for DNA polymerase to read through, causing termination of DNA synthesis.
While some tools are available for detecting chimeras and heteroduplexes after cloning, no tools are available for identifying PCR-generated single-base mutations in natural samples. Thus, it is critical to minimize PCR artifact formation prior to cloning. The number of PCR cycles is one key parameter for reducing all three types of PCR artifacts. We suggest that the PCR amplification for any cloning-based community study be performed with as few cycles as possible (http://www.esd.ornl.gov/people/zhou/zhou.html). The appropriate cycle number will depend on the amount of template used, the amplification efficiency, and the existence and degree of inhibitory substances and thus should be determined experimentally. To minimize PCR artifacts, we suggest using PCR products collected prior to or during the exponential period for cloning. To obtain enough product for cloning, we suggest combining multiple amplifications followed by concentration with ethanol precipitation. Mixing PCR products from independent amplifications can also help to minimize experimental errors and amplification bias. The concentrated sample can then be quantified and used for constructing a 16S rRNA gene library. Because the extent of 16S rRNA gene artifacts can never be known in a natural sample, we suggest that interpretations be focused on comparative studies with replication and under identical PCR conditions.
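One way to determine the cycle number experimentally is to quantify product yield at several cycle numbers and find the last sampled cycle at which amplification is still close to exponential. The sketch below is one possible approach under stated assumptions, not a procedure from this study: the yields are hypothetical measurements, and the threshold of a 1.8-fold increase per cycle is an arbitrary cutoff chosen for illustration.

```python
def last_exponential_cycle(cycles, yields, min_fold_per_cycle=1.8):
    """Return the last sampled cycle reached with a per-cycle fold increase
    of at least min_fold_per_cycle, or None if no interval qualifies."""
    last_cycle = None
    samples = list(zip(cycles, yields))
    for (c1, y1), (c2, y2) in zip(samples, samples[1:]):
        # Average fold increase per cycle over this sampling interval.
        fold = (y2 / y1) ** (1.0 / (c2 - c1))
        if fold < min_fold_per_cycle:
            break
        last_cycle = c2
    return last_cycle


if __name__ == "__main__":
    # Hypothetical yields (ng) measured after 10, 15, 20, 25, and 30 cycles.
    sampled_cycles = [10, 15, 20, 25, 30]
    sampled_yields = [1.0, 30.0, 900.0, 9000.0, 12000.0]
    print(last_exponential_cycle(sampled_cycles, sampled_yields))
```

With these made-up data the per-cycle fold increase drops below the cutoff between cycles 20 and 25, so cycle 20 would be the latest sampled point still within the exponential period.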