The use of Roche 454 GS FLX next-generation sequencing has played an instrumental role in introducing the concept of a Rare Biosphere, with this long tail of rare taxa being reported for nearly every community characterized using 454 pyrosequencing. Although the presence of rare taxa in various environments has been shown using a variety of independent methods, the true frequencies of these taxa, particularly as characterized using pyrosequencing data, remain in question. Moreover, little is known about the accuracy of community structural information derived from the frequency distribution of 16S rRNA gene amplicons within 454 pyrosequencing libraries obtained using the newer Titanium chemistry with longer read lengths.
Overall, our findings show that the 454-Ti sequencing platform provides useful information about microbial community structure, since observed and expected frequencies of error-free reads exhibited good correlations. Effects of pyrosequencing-specific biases (based on “P” iv-SCs) were exceeded by the impact of PCR biases in mixed template samples (“T” and “E” iv-SCs). For example, nearly half of the error-free reads in V3V4T originated from a single sequence (1216C) at only 1% relative abundance within the template DNA (Table S2), and the positive PCR bias for this sequence in the V3–V4 regions resulted in a significant skew in recovered community structure information. Therefore, the presence of one or a few sequences prone to PCR bias can drastically skew observed relative abundances, but the rank frequency distribution of other sequences appears to be preserved. Meanwhile, the V6 iv-SCs did not appear to have been subject to significant PCR bias.
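The contrast drawn above, between a skewed overall abundance profile and a preserved rank order among the remaining sequences, can be illustrated with a minimal sketch. The five-member community and its frequencies below are hypothetical values chosen for illustration and are not data from this study; the helper names `pearson`, `ranks`, and `spearman` are likewise our own.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical template fractions (assumption, not study data).
expected = [0.40, 0.30, 0.20, 0.09, 0.01]
# Hypothetical read fractions: the 1% member is strongly over-amplified,
# while the other four keep their relative order.
observed = [0.22, 0.16, 0.11, 0.05, 0.46]

r_all = pearson(expected, observed)                    # all five members
rho_rest = spearman(expected[:-1], observed[:-1])      # biased member excluded

print(f"Pearson r, all members: {r_all:.2f}")
print(f"Spearman rho, biased member excluded: {rho_rest:.2f}")
```

In this toy case a single over-amplified sequence is enough to make the overall correlation negative, while the rank correlation among the other members remains perfect.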
Error-containing reads comprised a significant portion of the total reads for all iv-SCs. However, this was not due to quality issues with the sequencing process, as the observed proportions are in fact consistent with a high per-base accuracy (>99.5%, Table S1). Therefore, it would have been impossible to systematically isolate the error-containing reads without a priori knowledge of the community. To resolve this issue, “de-noising” algorithms that employ clustering techniques were used to assign error-containing sequences to the true sequences from which they arose. Our findings showed that these approaches occasionally inferred the wrong “true” sequence from clusters of mixed error-free and error-containing reads, and invariably produced low-abundance false OTUs that were indistinguishable from real ones. These false OTUs can lead to an overestimation of the total number of OTUs in the iv-SCs. In some cases, over-clustering by the de-noising algorithm compensated, albeit incorrectly, for this OTU inflation. Nevertheless, these de-noising algorithms represent a marked improvement over simple, arbitrary quality filters in that they effectively reduce the number of unique error-containing reads that can be mistaken for real sequences.
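The compatibility of a high per-base accuracy with a large share of error-containing reads follows from simple arithmetic, sketched below. The sketch assumes that errors occur independently and uniformly along a read, which is a simplification (pyrosequencing errors are known to concentrate in homopolymer runs), and the read lengths are illustrative values rather than the amplicon lengths used here.

```python
def error_free_fraction(per_base_accuracy, read_length):
    """Expected fraction of reads with no errors, assuming independent,
    uniformly distributed per-base errors (a simplifying assumption)."""
    return per_base_accuracy ** read_length

# 0.995 is the lower bound on per-base accuracy quoted in the text;
# the read lengths are illustrative assumptions.
for length in (100, 250, 400):
    frac = error_free_fraction(0.995, length)
    print(f"{length} nt: {frac:.1%} error-free, {1 - frac:.1%} with at least one error")
```

Under these assumptions, a 400-nt read at 99.5% per-base accuracy is error-free only about 13% of the time, so most error-containing reads carry one or a few errors and cannot be singled out by a read-level quality threshold.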
Although these de-noising pipelines were evaluated in their respective primary publications for the accuracy of recovered richness and relative abundances, this study provides the first independent, explicitly quantitative assessment of their performance using carefully constructed and well quantified in vitro-simulated communities. Given that researchers interpreting results from these pipelines inevitably treat them as quantitatively representative of the biological communities, the results presented here provide a useful assessment of information obtained and disseminated using such methodology. A step-by-step comparison between the three de-noising algorithms was unfeasible due to their integrated pipeline design.
The process of clustering sequencing reads into OTUs traditionally involves three distinct steps: quality filtering, alignment, and clustering. In SLP and AmpliconNoise, the de-noising step constitutes an independent procedure that occurs after quality filtering but before alignment. PyroTagger instead combines de-noising, alignment and clustering into a single, final step. It should be noted that PyroTagger’s authors pointed out that it may not be suitable for 454-Ti data due to supposedly lower read quality, but given that 454-Ti has become the de facto technology for amplicon sequencing, we felt that an assessment of the unique approach employed by PyroTagger needed to be included. AmpliconNoise was chosen over alternative flowgram-based clustering algorithms for several reasons: 1) it incorporates a number of significant performance improvements over PyroNoise; 2) its implementation allows it to be run on a computer cluster to speed up analysis; 3) it does not incorporate a greedy/heuristic step and thus has better reproducibility (vs. Qiime Denoiser, Figure S1). We note that the two central components of AmpliconNoise, PyroNoise and SeqNoise, have recently been re-implemented in Mothur as the Shhh.flows command, which was shown to perform comparably to AmpliconNoise under similar circumstances.
Correlations between OTU frequencies calculated from de-noised reads and expected OTU relative abundances were similar to those calculated from error-free reads, indicating that these methods can effectively recover error-containing reads while maintaining approximate community structure. All three de-noising approaches identified similar numbers of OTUs that reflected real iv-SC taxa (i.e., true, miscalled and near-known OTUs), but differed in the numbers of false OTUs detected, with PyroTagger outperforming both SLP and AmpliconNoise. However, PyroTagger produced the poorest correlation between observed and expected relative abundances and incorrectly merged reference V3–V4 sequences, indicating a tendency to over-cluster. The stringent quality-based filtering used by PyroTagger also discarded a greater number of raw sequencing reads (data not shown), resulting in the absence of several expected low-abundance taxa from the de-noised dataset.
SLP performed similarly to PyroTagger in predicting species richness within the V3V4P community, but did so through an over-aggressive de-noising procedure that resulted in several real taxa being erroneously grouped into one OTU. This occurred at the de-noising step and was not related to post-de-noising clustering procedures (data not shown). Moreover, SLP inferred abundant (>1%) OTUs composed entirely of error-containing reads in the reconstruction of the V6P iv-SC. By comparison, false-derived OTUs were observed at much lower frequencies (<0.1%) in the V6P iv-SC reconstructed using either PyroTagger or AmpliconNoise. Although more computationally intensive, AmpliconNoise models the distribution of pyrosequencing errors at the flowgram level and is able to robustly assign error-containing reads to their parent error-free reads. AmpliconNoise appears to be free from the over-clustering effect observed with both PyroTagger and SLP, and therefore tends to overestimate OTU richness. However, it incorrectly identified the highest number of OTU representative sequences with the V3V4P iv-SC, which may have ramifications for downstream analyses that rely on precise phylogenetic resolution.
Because AmpliconNoise includes a built-in chimera checker, Perseus, it bypasses the need for multiple sequence alignment (MSA) or reference sequences, as recommended for PyroTagger. For typical pyrosequencing amplicon datasets containing thousands of unique sequences, MSA is impractical, as are the use of reference sequences and a priori assumptions about the identity of environmental sequences. The outcome of our analyses shows that AmpliconNoise is the de-noising algorithm least likely to allow chimeric reads to be “absorbed” into read predictions, where they would distort abundance estimates. This may partially explain why the correlation between the expected and the observed frequencies of relevant OTUs was highest for the AmpliconNoise pipeline.
Rather than using mixtures of genomic DNA preparations, plasmids containing cloned 16S rRNA genes were used for this study. This approach avoided the issues of inter-genomic variations in rrn operon copy numbers, intra-genomic variation in rrn operon sequences, and quantification inaccuracies due to genome size differences, thus allowing greater quantitative accuracy. We limited the richness of the iv-SCs to twenty sequences to allow reliable quantification of libraries using both mixed plasmids and PCR products. Given the high proportion of artifactual rare OTUs recovered by all three de-noising pipelines with these relatively simple communities, it is unlikely that a more complex simulated community would have improved their performance. Nineteen of the twenty clones included in the study were from Cyanobacteria isolated from similar environments and are therefore comparatively similar in sequence. This resulted in some of the reference sequences being clustered together, even by the most lenient clustering approach, but it also exposed PyroTagger’s tendency to over-cluster and mask genuine diversity. The inclusion of one Actinobacteria clone (1216C) allowed us to explore the effects of primer bias on different phylogenetic groups.
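The quantification advantage of plasmids over genomic DNA can be made concrete with a short sketch of the mass-to-copy conversion. The value of 650 g/mol per base pair is a commonly used approximation, and the plasmid size, genome sizes, and rrn copy numbers below are hypothetical values, not those of the clones or organisms in this study.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
BP_MOLAR_MASS = 650.0         # g/mol per base pair (common approximation)

def gene_copies(mass_ng, molecule_bp, genes_per_molecule=1):
    """16S rRNA gene copies contained in a given mass of double-stranded DNA."""
    molecules = mass_ng * 1e-9 * AVOGADRO / (molecule_bp * BP_MOLAR_MASS)
    return molecules * genes_per_molecule

# Hypothetical templates, 1 ng of each (all sizes and copy numbers are assumptions).
plasmid = gene_copies(1.0, 4_500)                            # one cloned gene per plasmid
genome_a = gene_copies(1.0, 2_000_000, genes_per_molecule=2)
genome_b = gene_copies(1.0, 8_000_000, genes_per_molecule=2)

print(f"plasmid:  {plasmid:.2e} copies/ng")
print(f"genome A: {genome_a:.2e} copies/ng")
print(f"genome B: {genome_b:.2e} copies/ng")
```

With plasmids of near-identical size carrying one gene each, equal masses give equal gene copies. For the two hypothetical genomes, equal masses differ fourfold in 16S rRNA gene copies, and that gap depends on genome sizes and rrn copy numbers that are often not known precisely.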
Although we had a priori knowledge of the iv-SC sequences, we elected not to customize PCR primers to account for known mismatches and performed the experiment using “universal” primers commonly used for microbial community analyses. Thus, our analyses were subject to the same biases as any study that applies these universal primers to environmental DNA. We also avoided using primers with degenerate bases since primer degeneracy can reduce specificity, lead to exhaustion of effective primers as the reaction progresses, and impose biases of its own. Recently, the alternative of using a mixture of non-degenerate primers has been proposed, which may significantly increase “universality” while avoiding the pitfalls of degenerate primers.
Numerous mechanisms can contribute to PCR bias, including polymerase error, formation of chimeric and heteroduplex molecules, and differential amplification efficiency. Our study incorporated many of the wet-bench techniques known to be effective toward reducing these biases, including low cycle numbers (30 cycles), pooling multiple reactions (3×30 µl), high template concentration (>4 ng of 16S rRNA gene clones), and the use of a proofreading DNA polymerase. Differential primer annealing efficiency provides another mechanism for PCR bias, and although factors such as annealing temperature and primer GC content can influence the outcome of PCR, primer mismatch may have the greatest impact for PCR studies of 16S rRNA gene diversity.
The lack of a truly “universal” pair of 16S rRNA gene PCR primers has long been acknowledged. Although some have suggested that the number of taxa recovered is not necessarily linked to the taxonomic specificity (i.e., universality) of a primer set, our findings suggest that mispriming is a major, if not the main, factor leading to errors in the observation frequency of taxa within a community (Table S3). Mispriming near the 5′ end of the priming region is thought to have little effect on PCR since extension occurs from the 3′ end. However, it has been reported that 454 Fusion primers containing the 454 adapter sequence at the 5′ end may be more susceptible to the effects of mispriming, resulting in the over-representation of templates that are not misprimed. The adoption of a two-step PCR for amplicon pyrosequencing may ameliorate this issue. Moreover, our findings highlight the complications associated with comparing community structures obtained using different primer sets.
Certain aspects of our experimental protocol may have exacerbated effects of PCR primer mismatch. For example, preferential amplification of perfectly matching template would be expected since the annealing temperature in our PCR protocol started high and decreased with each cycle (see Information S1) rather than starting at a lower temperature. Our modified PCR protocol was chosen because it resulted in an increased DNA yield and thus enabled accurate quantification of PCR amplicons (a prerequisite of pyrosequencing of PCR amplicons). This limitation can be addressed by new instruments that enable small quantities of DNA to be precisely characterized (e.g., Agilent 2100 Bioanalyzer, Agilent Technologies), fractionated (e.g., LabChip XT, Caliper Life Sciences), and quantified (e.g., Kapa Library Quant Kits, Kapa Biosystems). Although these methods were not available for this study, we recommend that they be adopted for the preparation of 16S rRNA gene amplicon libraries for 454-Ti sequencing in addition to adopting PCR conditions such as very low Tm and low (<25) PCR cycles (in conjunction with higher template quantity where possible).
Our results have shown that while de-noising methods for pyrosequencing data need further development, they are an essential processing step for the recovery of usable community structure information. Overall, the largest hurdle to accurate estimation of microbial community structure appears to be PCR bias, which is independent of sequencing technology. Although a variety of measures may be taken to reduce the impact of PCR bias, it cannot be eliminated outright, and our findings highlight the need to better characterize this phenomenon using simulated communities. Another source of error arises from PCR in the form of chimeric sequences, which are difficult to eliminate. Even though Perseus was able to effectively remove a large portion of chimeric sequences, a small portion of chimeric sequences contributed disproportionately to the number of OTUs observed, especially the infrequent (i.e., rare) OTUs (Table S4). Therefore, chimeras can significantly inflate OTU estimates, even with short PCR amplicons generated from presumably “immune” 16S regions such as the V6 hypervariable region. These realities, combined with the observed prevalence of artifactual rare OTUs, caution against singular interpretations of community structure, especially those that involve within-sample relative OTU frequencies or estimations of Rare Biosphere diversity. Instead, the strength of the 454-Ti platform more likely lies in comparative studies and identifying the presence of specific rare taxa. Lastly, our findings highlight the dangers in quickly adopting technological advances without statistically robust validation, given that substantial portions of the Rare Biosphere identified using up-to-date de-noising algorithms are still artifacts. The impressively high microbial diversities reported by some past studies based on less developed pyrosequencing quality filters should therefore be re-examined.