High-throughput molecular technologies can profile microbial communities at high resolution even in complex environments like the intestinal microbiota. Recent improvements in next-generation sequencing technologies allow for even finer resolution. We compared phylogenetic profiling based on longer (454 Titanium) sequence reads with profiling based on shorter, but more numerous, paired-end reads (Illumina). For both approaches, we targeted six tandem combinations of 16S rRNA gene variable regions, in microbial DNA extracted from a human faecal sample, in order to investigate their limitations and potentials. In silico evaluations predicted that the V3/V4 and V4/V5 regions would provide the highest classification accuracies for both technologies. However, experimental sequencing of the V3/V4 region revealed significant amplification bias compared to the other regions, emphasising the necessity for experimental validation of primer pairs. The latest developments of 454 and Illumina technologies offered higher resolution compared to their previous versions, and showed relative consistency with each other. However, the majority of the Illumina reads could not be classified down to genus level due to their shorter length and higher error rates beyond 60nt. Nonetheless, with improved quality and longer reads, the far greater coverage of Illumina promises unparalleled insights into highly diverse and complex environments such as the human gut.
Complex microbial communities, like the human gastro-intestinal tract (GIT) and other bacterium-dense environments, are currently receiving increasing interest, due in large part to technological advances in culture-independent methods in recent years. Compared to capillary sequencing and non-sequence-based molecular methods, high-throughput sequencing provides unparalleled insight into community structures. Typically carried out by pyrosequencing on a 454 Genome Sequencer FLX machine (1), amplicons (sequence reads) of a single variable 16S rRNA gene region are quantified and subsequently assigned to microbial phylogenies (and thence to taxonomies). The nine different variable 16S rRNA gene regions are flanked by conserved stretches in most bacteria (2), and they can be used as targets for PCR primers with near-universal bacterial specificity (3,4). Although less discriminatory than the full-length 16S rRNA gene, massively parallel sequencing of the shorter reads offers either much higher coverage per sample (5) or many more samples per instrument run by means of innovative bar-coding techniques (6,7). The trade-off with the longer, but fewer, reads generated by traditional capillary sequencing means a lower proportion of amplicons that can be classified at genus or species levels. In contrast, the resolution of the community composition with amplicon pyrosequencing is potentially several orders of magnitude larger than clone library sequencing, and can be achieved at a significantly lower cost.
Different variable regions have been targeted in different studies. Generally, this selection has not been dependent on the sampled environment, but rather on published or unpublished recommendations and/or experimental familiarity with a certain region in the author’s laboratory. A few comparative studies have focused on assessing region suitability: after using different methodological approaches Sundquist et al. favored the V1/V2/V4 regions (8); Liu et al. the V2/V3/V4 regions (9); Wang et al. the V2/V4 regions (10) and Chakravorty et al. the V2/V3 regions (11). Recently, we compared high and low pyrosequencing coverage of the V4 and V6 regions and concluded that the RDP-Classifier consistently assigned more V4 than V6 reads from the human GIT down to genus-level (5). Furthermore, a lower coverage (40000 reads per sample) was sufficient to capture the majority of the bacterial diversity that was identified by five times greater sequencing depth. Pyrosequencing of the V4 region also yielded compositional profiles that were consistent with HITChip analysis (12), whereby we hybridised full-length 16S rRNA genes from two samples onto a phylogenetic array containing probes of concatenated V1 and V6 sequences.
As compositional studies like these depend on amplicon generation, they are subject to PCR bias of varying degrees (13,14). Although it was recently shown that amplicon length somewhat affected phylotype richness when comparing pyrosequencing reads of the V1–V2 and V8 regions of microbiota from the termite hindgut, the choice of region had a much larger impact on diversity values such as community richness and evenness (15). Moreover, parallel work from the same group suggested that far too relaxed quality filtering of raw pyrosequencing reads had been applied in many previous studies, thereby inflating previously reported diversity estimates measured at predefined phylotype similarity levels (16), as was also alluded to in an earlier study (17). However, pyrosequencing errors seem to have a lesser impact on both phylogenetic assignment rates (18) and methods comparing diversity across different communities (19). In a separate study, Wang and colleagues highlighted the lack of coverage and scope estimates for known 16S rRNA primers, and in response generated a comprehensive list of tested primers, along with recommendations of a few with particularly high coverage and universal properties (20).
Another massively parallel sequencing technology, that was first described the same year as 454 Pyrosequencing, is the Illumina technology [then Solexa Ltd. (21)]. Since then, the Illumina Genome Analyzer instruments have routinely been producing more than ten times the number of reads per run as the 454 GS FLX machines, albeit of much shorter lengths (typically between 36 and 76bp). Lazarevic and colleagues sequenced over 1.3 million single reads of the latter size from the 16S V5 region, in order to explore the human oral microbiota (22). The higher coverage allowed the identification of low-abundance genera not detected in earlier studies of oral microbial flora. However, the limitations to the application of the Illumina technology for compositional studies were noted in that study, and future enhancements were predicted to increase its suitability for environments of even higher complexity. Recently, Illumina sequencing has also been applied to other single variable 16S rRNA regions, such as the V4 region in various environmental communities (23), or the V6 region in vaginal microbiota (24).
In this study, we took advantage of recent performance improvements in both the Illumina (paired-end 101bp reads) and Pyrosequencing (>400bp reads) technologies, and applied these on the human gut microbiota. We targeted the highly diverse microbial community within a single human faecal sample by separately sequencing both entire amplicons from six tandem variable 16S regions, and flanking ends thereof. In addition to evaluating the effects of biases imposed by targeting different variable regions and commonly used primers, we discuss the parameters for what may become the future methods of choice for microbial community composition analysis.
To explore the potential of microbial community composition analysis using Illumina and 454 Titanium sequencing, as well as how classification accuracy varies with reads of different length and quality, we compiled a high-quality reference set. This was based on the SILVA SSURef database release 100 (25), comprising 409907 near full-length 16S rRNA sequences. To increase its quality and make it representative for bacterial communities within the GIT, the following filtering criteria were applied: (i) only bacterial sequences with no known anomalies, e.g. chimeras (pintail-score=100); (ii) sequence length at least 1300bp; (iii) existing RDP-classification not containing unclassified bacteria; (iv) sequence quality at least 90 (out of 100) and (v) isolated from samples of human or animal GIT or faeces. This filtering process resulted in a high-quality reference set of 27 013 full-length 16S rRNA sequences, whereof 98% originated from uncultured bacteria.
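The five filtering criteria above can be expressed as a single predicate over reference records. The following is a minimal sketch only: the `RefSeq` record, its field names and the `GIT_KEYWORDS` list are hypothetical and do not reflect the actual SILVA export format or the way isolation sources were matched in this study.

```python
from dataclasses import dataclass


@dataclass
class RefSeq:
    """Hypothetical reference record; field names are illustrative only."""
    domain: str
    pintail: int        # anomaly score, 100 = no known anomalies
    length: int         # sequence length in bp
    rdp_lineage: str    # existing RDP classification string
    seq_quality: int    # sequence quality, 0-100
    source: str         # free-text isolation source


# Assumed keywords for recognising GIT or faecal isolation sources.
GIT_KEYWORDS = ("gut", "intestin", "faec", "feces", "fecal", "stool")


def passes_filter(rec: RefSeq) -> bool:
    """Apply criteria (i)-(v) from the text to one record."""
    return (
        rec.domain == "Bacteria"
        and rec.pintail == 100                               # (i)
        and rec.length >= 1300                               # (ii)
        and "unclassified" not in rec.rdp_lineage.lower()    # (iii)
        and rec.seq_quality >= 90                            # (iv)
        and any(k in rec.source.lower() for k in GIT_KEYWORDS)  # (v)
    )
```

A reference set would then be built with a comprehension such as `[r for r in records if passes_filter(r)]`.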
By using annealing locations for primers listed in Table 1, sequences for the six variable tandem regions were extracted in silico from the full-length sequence reference set, mimicking data from 454 Titanium reads (Figure 1). To also simulate paired-end and variously sized Illumina reads from the same 16S regions, 150/100/75/50bp fragments were in turn extracted from both ends of these Titanium-length reads, filling the interior regions with 20 N residues. The RDP Probe Match program was used for calculating coverage among 16S rRNA sequences in the RDP database. In addition to simulating reads with perfect quality, we introduced stochastic errors along the read lengths according to error rates provided by the sequencing vendors for 454 Titanium (Figure 2a), and Illumina (average of data from forward and reverse reads) using the two most recent sequencing kits (Figure 2b). The simulated reads were then taxonomically assigned using the RDP-classifier (10) with a bootstrap cut-off of 50%. We had previously found this cut-off value to achieve the optimal balance between achieved accuracy and retained number of reads (5). As the RDP-classifier ignores any 8-mer words with Ns, the interior regions have no impact on classification results. Classifications of both the simulated reads and of their originating full-length 16S rRNA sequences were imported into a MySQL database. This allowed fast and precise comparisons with the reference set, resulting in measurements of classification accuracy for each set of simulated reads.
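The read simulation described above can be sketched as follows. This is an illustrative reconstruction, not the scripts used in the study: the function names are hypothetical, errors are modelled as substitutions only, and the per-position error rates are supplied by the caller rather than taken from the vendor data in Figure 2.

```python
import random

SPACER = "N" * 20  # interior filler, ignored by 8-mer based classification


def simulate_paired_end(amplicon: str, read_len: int) -> str:
    """Keep read_len bases from each end of the amplicon, joined by 20 Ns."""
    if len(amplicon) <= 2 * read_len:
        # Assumption: when the two reads would overlap, keep the amplicon whole.
        return amplicon
    return amplicon[:read_len] + SPACER + amplicon[-read_len:]


def add_errors(read: str, error_rates: list[float], rng: random.Random) -> str:
    """Substitute base i with probability error_rates[i] (substitutions only)."""
    if len(read) != len(error_rates):
        raise ValueError("need one error rate per base")
    out = []
    for base, rate in zip(read, error_rates):
        if base in "ACGT" and rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)
```

In such a sketch, `add_errors` would be applied to each simulated end before joining, so that the error profile runs from the start of each read towards the interior of the amplicon.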
A faecal sample was collected from an 87-year-old female [subject D in our previous study (5)], who was a member of a larger cohort of elderly subjects recruited for the ELDERMET project (http://eldermet.ucc.ie). The Clinical Research Ethics Committee of the Cork Teaching Hospitals (CREC) granted full approval to the ELDERMET project on the 19th February 2008 [Ref: ECM 3 (a) 01/04/08]. Formal written consent was obtained, on the basis of an Information Sheet/Safety Statement, following an ethics protocol that was approved by CREC, in compliance with pertaining local, national and European ethics legislation and guidelines to best practice. The subject was taking an unknown antibiotic at the time of sampling. The sample was processed from fresh stool the same day as collection and DNA was extracted according to standard protocol (Qiagen, West Sussex, UK). Six amplicon libraries were created of variable 16S rRNA tandem regions using primers in Table 1. Standard PCR reaction conditions were employed for reactions with Taq polymerase: 2mM MgCl2, 200nM each primer and 200μM dNTPs. The PCR conditions were 94°C for 50s (initialization and denaturing) followed by 40°C for 30s (annealing), 72°C for 60 s in 35 cycles (extension), and a final elongation step at 72°C for 5min. Two negative control reactions containing all components, but water instead of template, were performed alongside all test reactions, and were routinely free of PCR product, demonstrating lack of contamination with post-PCR product. The optimal annealing temperature for the primers, which included either the 454 adapters or the standard paired-end Illumina adaptors, was empirically determined by gradient PCR using control reactions with initially purified bacterial genomic DNA, and validated on faecal microbial community DNA (data not shown). The usage of region-specific 16S rRNA primers made additional barcodes redundant. 
All six amplicons were pooled and subsequently sequenced on a 454 Genome Sequencer FLX Titanium one-quarter picoliter plate (Cogenics, Essex, UK) according to 454 protocols. In addition, the same pool of samples was sequenced on one Illumina GA-IIx lane (Fasteris, Geneva, Switzerland) for 101 cycles from both ends of paired-end library preparations, using sequencing kit version 3.0 followed by base-calling using the GAPipeline version 1.4.0.
Raw pyrosequencing reads were quality trimmed according to published recommendations (26) using a locally installed version of the RDP Pyrosequencing Pipeline (27): sequences with inexact matches to both primer sequences, having poor quality, one or more ambiguous bases or read-lengths at least 20bp shorter than the electropherogram peaks for each set of amplicon, were filtered off. Chimera sequences were detected with ChimeraSlayer (28). With the exception of the last criterion, the same criteria were applied to Illumina reads. Prior to this, purity filtering with ‘chastity’ values >0.6 and a maximum of one failed base call in the first 24 bases was applied to the raw reads. Additional filtering criteria were also applied and evaluated (Supplementary data). Once filtered, the reverse Illumina read for each amplicon was reverse-complemented and merged with the corresponding forward read, inserting 20 Ns in between. Both forward and reverse 16S rRNA primers were removed from all pyrosequencing and Illumina reads. The primer sequences carry by definition very little phylogenetic information, so their removal did not have an adverse effect on taxonomic classifications. This was also supported by tests with and without primers on the simulated reference set (data not shown).
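The merging of paired Illumina reads described above can be illustrated with a short sketch. It is a simplification under stated assumptions: primers are matched exactly at the start of each read (degenerate primer positions are not handled), primers are stripped before merging, and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strip_primer(read: str, primer: str):
    """Remove an exactly matching primer from the read start; None if absent."""
    if read.startswith(primer):
        return read[len(primer):]
    return None


def merge_pair(fwd: str, rev: str, fwd_primer: str, rev_primer: str,
               spacer_len: int = 20):
    """Merge a read pair into forward + Ns + reverse-complemented reverse.

    Returns None when a primer does not match or a read holds ambiguous bases.
    """
    f = strip_primer(fwd, fwd_primer)
    r = strip_primer(rev, rev_primer)
    if f is None or r is None or "N" in f or "N" in r:
        return None
    return f + "N" * spacer_len + reverse_complement(r)
```

For example, `merge_pair("ACGTAAAA", "TTGGCCCC", "ACGT", "TTGG")` yields `AAAA`, then 20 Ns, then `GGGG`.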
The Naïve Bayesian Classifier (RDP-Classifier) was used for assigning reads into the new Bergey’s bacterial taxonomy (29) with a bootstrap cut-off of 50%. Trimmed sequences along with their classifications were imported into a MySQL database for efficient storage and advanced querying. To explore an alternative assignment method, a hierarchical tree summarising read assignments into the NCBI taxonomy was constructed using MEGAN (30) on BLAST searches against a previously published 16S rRNA-specific database (31) (with a bit-score threshold of 86, allowing ten hits per read). Pyrosequencing reads were aligned using Infernal (32) and associated covariance models obtained from the Ribosomal Database Project Group. Phylotype clusters of 97% similarity were obtained by applying the furthest neighbour approach using the Complete Linkage Clustering application of the RDP pyrosequencing pipeline. From these, rarefaction curves, Shannon diversities and Chao1 richness estimations were calculated using RDP software. Good’s coverage was calculated as G=1−n/N, where n is the number of singleton phylotypes and N is the total number of sequences in the sample.
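As a worked illustration of the diversity measures named above, the sketch below computes Good's coverage exactly as defined in the text, together with textbook forms of the Shannon index and the Chao1 estimator. The latter two are standard formulas and are assumptions here; the RDP software used in the study may differ in detail (for example in bias correction). Input is a list of read counts, one per phylotype cluster.

```python
import math


def goods_coverage(counts):
    """G = 1 - n/N: n singleton phylotypes, N total sequences."""
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / total


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over phylotype proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Chao1 richness: S_obs + F1^2 / (2 * F2), with a fallback when F2 = 0."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2   # bias-corrected form
    return s_obs + f1 * f1 / (2 * f2)
```

With counts `[10, 5, 2, 2, 1, 1, 1]`, there are 3 singletons among 22 reads, so Good's coverage is 1 − 3/22 ≈ 0.864, and Chao1 is 7 + 9/4 = 9.25.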
In silico predictions show that classification accuracies are highly dependent on choice of region, sequencing technology and sequence quality
The complete sequences of six tandem variable regions, extracted from the 27 013 high-quality full-length 16S rRNA genes reference set, were used to simulate 454 Titanium reads, whereas the 150/100/75/50bp of the flanking ends were used to simulate corresponding paired-end Illumina reads of varying lengths (Figure 1). The coverage of both single and paired primers, as measured by the RDP Probe Match and matches against the reference set (Table 1), was generally high except for the V1/V2, V2/V3 and V7/V8 regions, in large part due to poor coverage of the single V1-for, V2-for and V8-rev primers. Many full-length sequences used in the two reference sets have truncated ends, thereby lacking complete sequences covering the V1-for and V8-rev primer regions, which is a likely reason for poor coverage with these primers.
When ignoring sequencing errors, the 2×150bp Illumina reads were almost as accurate as the longer Titanium reads, owing to their concatenated lengths approaching full Titanium read lengths (Figure 3). Not surprisingly, the genus-level accuracies for Illumina reads dropped as their read lengths decreased. Titanium reads were, however, still far from full-length 16S rRNA gene assignment accuracy. Regardless of sequencing technology and quality, the V3/V4 and V4/V5 regions were the most accurate. While in silico-induced errors, as modelled by error rates provided by the sequencing vendors (Figure 2), had little effect on the classification accuracies for pyrosequencing reads, they had a significantly negative impact for the longer Illumina reads which had increasingly deteriorating quality after 60bp. Paradoxically, the longer Illumina reads were actually less accurate for genus-level assignment than the shorter ones, since error rates increase exponentially with read length. For example, although the accuracy for the V4/V5 regions increased with error-free read lengths, the 2×75bp reads with induced KIT-v4 errors (dashed lines) were slightly more accurate at genus-level than the corresponding longer reads. With or without sequencing error, paired-end 50bp Illumina reads are, however, not worth pursuing for these types of compositional studies.
Since this analysis was based on GIT-related 16S rRNA genes, we also wanted to investigate whether we would obtain similar results if this criterion was removed, i.e. by not restricting reference sequences to GIT environments. Consequently, the reference set was increased to 60 000 high-quality full-length sequences on which we repeated the simulations. As displayed in Supplementary Figure S5, accuracy values for all regions and read lengths are lower at all taxonomic levels compared to the GIT-restricted sequences. Interestingly, the V4/V5 region in particular showed inferior performance and was here marginally better than the V7/V8 region. The RDP-classifier is trained on well-characterized 16S rRNA genes sequenced from compositional studies of diverse microbial environments. As such, GIT-related microbiota are over-represented relative to non-GIT environments (e.g. 45% of the 60000 reference high-quality full-length sequences originate from the GIT) which could thus explain the different patterns in classification accuracy.
As a reference, we also calculated similar accuracies for single variable regions (Supplementary Figure S6). Reassuringly, the accuracies of single V3/V4/V5 regions were consistent with, and slightly lower than, corresponding tandem regions of Titanium length derived from simulations of the GIT-related reference set. The V1 and V9 regions showed the poorest results, followed by the V7/V8 region.
The same regions modelled above were sequenced on both the 454 Titanium and Illumina platforms, using primers from Table 1. Our filtering approach decreased the numbers of accepted reads significantly, notably more for the Illumina reads. As quality deteriorates dramatically with increasing read length for Illumina (Figure 2b), we investigated the effect of additional quality filtering criteria (Supplementary Data). We concluded that the standard criteria provided the best balance of good classification efficiency, high retention of reads, even composition between regions and high similarity to composition as derived by 454 Titanium reads.
Sequencing and diversity characteristics (based on the 97% phylotype similarity level) are outlined in Table 2. The variations in amplicon mean lengths for 454 Titanium are a reflection of the differently sized tandem regions, while Illumina reads are consistently of the same length (2×101bp). Length deviations for the latter technology are instead due to trimming of the variously sized primer sequences. The observed richness levels varied dramatically depending on the sequenced region and the technology used; for Titanium reads, the range was 349 to 1146 phylotypes, and from 97670 to 173857 phylotypes for Illumina reads. Interestingly, the richness values that were the highest/lowest for Titanium were not necessarily the highest/lowest for Illumina reads. This is probably because the shorter Illumina reads sometimes cover regions of variability different from the overall variability as sequenced by the longer Titanium reads for the same region.
Figure 4A shows similarly deviating rarefaction curves for the six different 454 Titanium and Illumina amplicons. Similar curves for the complete set of Illumina reads were omitted due to computational difficulties, and for their apparent lack of reliability. Instead we calculated curves from random sub-samplings of 229 048 reads, equal in size to the region with fewest reads. We included curves for random sub-samplings of 8277 reads (amplicon V2/V3) to examine the underestimating effect we had previously observed (5), and which was also pronounced here for all amplicons except V4/V5. The inflated richness levels for the Illumina sequenced amplicons, as well as the nearly linear rarefaction curves, seemed unrealistic at best and are presumably artefacts of the high error rates in combination with the vast number of reads for each amplicon. This is also supported by the fact that richness values derived from random sub-samplings of Illumina reads (equal in numbers to the corresponding 454 regions) were substantially higher than for the 454 reads at the same read number levels (Table 2 and inset Figure 4A). Likewise, Good’s coverage values from Illumina reads are relatively small compared to the corresponding Titanium reads. This parameter is an estimator of the completeness of sampling (33) and should not be mistaken for sequencing coverage. High error rates produce many singleton phylotypes, which result in lower Good’s coverage values (see formula in Methods). Thus, for the same reasons that the rarefaction curves are nearly linear and the Chao1 richness estimations are extremely high, Good’s coverage is relatively low; the V3/V4 and V7/V8 amplicon data produced the highest richness estimations and also the lowest Good’s coverage values.
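The sub-sampling behind such rarefaction curves can be sketched as below. This is an illustrative re-implementation rather than the RDP software used in the study; function names, the number of repetitions and the seed are assumptions. Each read is represented by the identifier of the 97% phylotype cluster it belongs to.

```python
import random


def rarefy(cluster_ids, depth, rng):
    """Count distinct phylotypes in a random sub-sample of `depth` reads."""
    return len(set(rng.sample(cluster_ids, depth)))


def rarefaction_curve(cluster_ids, depths, reps=10, seed=0):
    """Mean observed richness at each depth, averaged over `reps` sub-samples."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        mean = sum(rarefy(cluster_ids, depth, rng) for _ in range(reps)) / reps
        curve.append((depth, mean))
    return curve
```

If errors turn nearly every read into its own singleton phylotype, `rarefy` returns a value close to `depth` at every depth, which is the nearly linear curve shape described above.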
When comparing classification efficiencies (CE), defined as the proportion of all reads confidently classified to a certain taxonomic level (genus-level from here on), we observed acceptable values (>87%) for all Titanium reads (Figure 4B). We included reads from the single V4 region (5) as a comparison, and its CE was grouped in the middle of the variable tandem regions, below V4/V5 and, interestingly, above V3/V4. The Illumina technology had far worse CE for all regions, owing to a combination of shorter read lengths and poorer quality. Overall, the V4/V5 region showed best performance for both technologies, while the V3/V4 region was the worst.
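Classification efficiency as defined above reduces to a simple proportion. The sketch below assumes each read carries a genus-level bootstrap confidence between 0 and 1 and that a read counts as confidently classified when its confidence is at or above the cut-off; whether the boundary is inclusive is an assumption, and the function name is hypothetical.

```python
def classification_efficiency(genus_calls, cutoff=0.5):
    """Proportion of reads with genus-level bootstrap confidence >= cutoff.

    genus_calls: list of (genus_name, confidence) tuples, one per read.
    """
    if not genus_calls:
        raise ValueError("no reads supplied")
    confident = sum(1 for _, conf in genus_calls if conf >= cutoff)
    return confident / len(genus_calls)
```

For four reads with confidences 0.9, 0.5, 0.49 and 0.1, the function returns 0.5.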
It has been reported in some studies that reverse reads on the Illumina instrument are inferior in quality to the forward reads (34). Here, however, we only noted a slight difference in the average quality values of 15.4 and 16.6 for forward and reverse reads, respectively. To investigate if this quality difference had an effect on the classification potential, we RDP-classified the forward and reverse reads separately, and found that there was a clear difference between forward and reverse reads from all six tandem regions in terms of representative sequences in the RDP reference database: classification efficiencies at the genus level were between 12 and 36 percentage points higher for the forward reads than for the corresponding reverse reads. It is unclear whether these discrepancies are due to the slight quality difference between forward and reverse reads, or the fact that the partial variable regions covered by the reverse reads are all less discriminatory. However, the fact that the phenomenon was evident for all six tandem regions suggests that the former explanation is the more likely one.
In order to compare resolution levels provided by the two sequencing technologies, we quantified the number of unique genera that could be identified by the RDP-classifier for the single V4 and tandem V4/V5 region (Figure 5). Even though the V4/V5 region was pyrosequenced with just over a quarter of the reads used for the single V4 region, 74% of all the V4 genera could be captured by the longer reads. Furthermore, the significantly higher Illumina coverage yielded only a disproportionately small increase in identified genera that were not detected by pyrosequencing: these were Sporobacterium, Paludibacter, Oribacterium, Campylobacter, Abiotrophia and Johnsonella. This demonstrates that resolution is ultimately dependent on not only sequence coverage, but also classification efficiency, i.e. choice of region and sequence quality.
The two sequencing technologies revealed relatively similar profiles at phylum level, while they were more different at genus level (Figure 6). This is probably due to the much lower CE for Illumina, manifested by the significantly larger numbers of unclassified genera. It is not unlikely that genera which are classified to a higher extent with Titanium reads, such as Lachnospiraceae Incertae Sedis, are found within the larger cohort of unclassified Illumina reads.
Since the different sequencing targets and technologies used in this study were all applied on a single sample, we also wanted to investigate how phylum and genus profiles varied between replicates. Based on pyrosequencing of duplicate V4 amplicon libraries from four separate individuals (Supplementary Data), we found that even though taxonomic profiles between samples were not identical at phylum/genus level, all replicates still group together when compared to each other at the finest possible level of resolution (unique sequences). Thus, seemingly large variations in e.g. phylum distributions between samples may not necessarily reflect large differences in the overall microbiota, which should be taken into consideration when comparing the slighter variations in taxonomic profiles observed between the six sets of amplicons (Figure 6).
Intriguingly, the relative taxa abundances from the previously sequenced V4 region (5) were much more similar to V4/V5 than the V3/V4 region. The V3/V4 region also had by far the most deviating composition profile compared to the other regions, followed to some extent by V7/V8. This discrepancy was observed across the two technologies at both phylum and genus levels. Neither the RDP Probe Match (86% coverage) nor simulation (91% coverage) estimates (Table 1) implied any bias for the V3/V4 region. Similarly, Pearson correlations between genus classifications of full-length 16S rRNA sequences and the in silico-extracted variable regions from the reference set did not reveal any such bias either: V1/V2 (r=81%), V2/V3 (r=96%), V3/V4 (r=98%), V4/V5 (r=97%), V5/V6 (r=83%), V7/V8 (r=93%), and single regions V3 (r=85%) and V4 (r=87%). To further investigate reasons behind V3/V4 and V7/V8 deviations, we compared family classifications between these amplicons and HITChip hybridisations of full-length 16S rRNA from our previous study (5). From Supplementary Figure S7 it is evident that both these regions, and especially V3/V4, have much poorer correlations with HITChip hybridisations. Following comparisons with the sequenced single V4 region we already know that the V4-rev primer used is not responsible for the said bias. To finally exclude the possibility that the V3-for primer is the sole error-causing source we compared aggregates of full-length 16S rRNA gene sequences with V3 reads, sequenced with capillary Sanger sequencing and 454 Pyrosequencing, respectively (35). The ratios of the two largest phyla Bacteroidetes and Firmicutes were 0.66 for the full-length sequences and 0.77 for the V3 reads, thus relatively close, and definitely not as disparate as for the V3/V4 and V4/V5 regions. Only 41 chimera sequences were detected among the V3/V4 Titanium reads, which again would not explain the observed difference.
Altogether, these data are conclusive evidence that the V3/V4 deviations are due to bias associated with the experimental amplification process occurring when these particular V3-for and V4-rev primers are combined, rather than to uneven primer coverage.
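The Pearson correlations between genus classifications used in the comparison above can be computed from two genus-count profiles as in the following sketch. The helper names are hypothetical, and the choice to count genera absent from one profile as zero is an assumption about how profiles are aligned.

```python
import math


def aligned_counts(profile_a, profile_b):
    """Align two {genus: count} profiles on the union of their genera."""
    genera = sorted(set(profile_a) | set(profile_b))
    x = [profile_a.get(g, 0) for g in genera]
    y = [profile_b.get(g, 0) for g in genera]
    return x, y


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)
```

A value of r close to 1 indicates that a variable region reproduces the genus profile obtained from the full-length sequences.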
The RDP-classifier uses Bayesian probability theory for observing eight-character sub-sequences within each unknown query sequence, and has been trained on over 7000 bacterial full-length 16S rRNA genes. To investigate if a common alternative assignment approach would generate similar results, we applied the MEGAN tool (36) on BLAST searches of the trimmed Titanium reads against the RDP database. We deemed BLAST searches of the 4.6 million Illumina reads as being too computationally intensive, and therefore performed the analysis on subsets of 40000 sequences per region instead. Correlations between genus classifications of 10 different sub-samplings were all consistently high (r>0.99), suggesting that any of these sub-sampled sets were representative for the complete set of Illumina reads. Figure 7 shows the comparison tree generated by MEGAN using all the Titanium reads and the subsets of Illumina reads, with relative taxonomic abundances for the six variable tandem regions at various taxonomic levels. Similarly to results from the RDP-classifier, composition profiles based upon the V3/V4 and V7/V8 regions indicated larger proportions of the Firmicutes phylum for both sequencing technologies. In contrast, there are several significant differences between the two assignment approaches at genus level; perhaps most strikingly, Bacteroides reads account for a large fraction of the community only for the V1/V2 and V3/V4 regions according to the MEGAN analysis of the Titanium reads.
To further investigate these discrepancies we generated correlation plots (Supplementary Figures S8 and S9) between phylum and genus classifications for the two approaches and sequencing technologies. For Titanium, only the V1/V2 and V4/V5 regions showed good correlations between the two classification methods, with Pearson correlations of 0.97 and 0.98, respectively. The reason behind the genus discrepancy was revealed from closer examinations of the MEGAN data for the Bacteroides assignments; in order to assign a read to the Bacteroides genus, all 10 first BLAST hits had to be against Bacteroides species. As it happened, in many cases the ninth hit was against a group of bacteria labelled ‘uncultured bacterium adhufec’ (acronym for adult human faeces). These bacteria were, however, classified as belonging to the Bacteroidales family, and were, according to additional BLAST searches, unambiguous Bacteroides species (data not shown). Moreover, BLAST hits against the genera Clostridium, Roseburia and Ruminococcus are in many cases indistinguishable, which thus explains these genus deviations. In comparison, MEGAN analysis of the Illumina reads showed better consistency with the corresponding RDP-classifications, especially for the Bacteroides genus (Figure 7B and Supplementary Figure S6). The problematic ninth BLAST hit against the incorrectly labelled Bacteroides species was simply not an issue for the Illumina reads, since the reads had fewer hits with high scores. It is also important to note that the average classification efficiency for the RDP-classified Illumina reads was nearly twice that for the reads classified with MEGAN (59 versus 30%). To summarize, the deviating compositions of the V3/V4 and V7/V8 reads did not seem to be caused by poor performance of the RDP-classifier relative to the MEGAN approach.
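The effect of a single coarsely labelled BLAST hit on a lowest-common-ancestor assignment can be illustrated with a simplified sketch. This is not MEGAN's implementation, which applies additional parameters to the hit list; the function name and the example lineages are illustrative assumptions.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by all hit lineages.

    lineages: list of tuples ordered from the root towards the leaf.
    """
    if not lineages:
        return ()
    shared = []
    # zip stops at the shortest lineage, so a hit labelled only to a
    # higher rank caps the assignment at that rank.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)


# Illustrative lineages: nine genus-level hits plus one coarsely labelled hit.
BACTEROIDES = ("Bacteria", "Bacteroidetes", "Bacteroidia",
               "Bacteroidales", "Bacteroidaceae", "Bacteroides")
COARSE_HIT = ("Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales")

hits = [BACTEROIDES] * 9 + [COARSE_HIT]
assignment = lowest_common_ancestor(hits)  # ends at "Bacteroidales"
```

With ten genus-level hits the read would be assigned to Bacteroides; replacing one of them with the coarsely labelled hit moves the assignment up to Bacteroidales, so the read no longer counts towards the genus.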
The number of compositional studies of complex microbial communities that use high-throughput sequencing of partial 16S rRNA amplicons is increasing rapidly, encouraged by earlier successful studies and by the growing output-per-cost-ratio. Nonetheless, to obtain as accurate results as possible it is of paramount importance to minimize the amplification bias inherent in this approach, and to select variable 16S rRNA gene regions and sequencing primers with utmost care. Our main aim in this comparative study was not to investigate the primers with the highest performance expected, nor to test as many as possible. It was rather to investigate anomalous data generated with previously published primers, while at the same time evaluating their suitability, in new variable region combinations, in conjunction with recent sequencing technology improvements.
For sequencing by synthesis on the Illumina platform, standard paired-end linkers were ligated to the amplicons that had been generated by universal 16S rRNA gene variable region primers. This does not significantly affect the yield of sequence data. Although it is theoretically possible that the ligation step might introduce a bias, such an effect has not been noted in the multiple genome re-sequencing projects completed on this platform (Fasteris, personal communication). Furthermore, analysis of the first base sequenced in any particular Illumina run did not identify a bias towards a particular nucleotide (data not shown), which would be expected if there were a bias in the ligation, and the GC bias of the genome was maintained.
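A first-base check of this kind could be sketched as follows. This is only an illustration under stated assumptions, not the analysis actually performed: the function name `first_base_fractions` and the toy reads are hypothetical, and ambiguous first bases (such as N) are ignored.

```python
from collections import Counter
from typing import Dict, Iterable


def first_base_fractions(reads: Iterable[str]) -> Dict[str, float]:
    """Fraction of reads starting with each of A, C, G and T.

    Reads that are empty or start with any other symbol are skipped.
    """
    counts = Counter(
        read[0].upper() for read in reads if read and read[0].upper() in "ACGT"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts.get(base, 0) / total for base in "ACGT"}


# Toy reads (hypothetical): one read starting with each nucleotide.
toy_reads = ["ACGT", "CGTA", "GTAC", "TACG"]
fractions = first_base_fractions(toy_reads)
```

A strong departure of these fractions from the composition expected for the template would hint at a ligation preference; what counts as "expected" depends on the library and is not specified here.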
Based on simulation accuracies, classification efficiencies and consistency between two different classification approaches (RDP-Classifier and MEGAN based on BLAST searches), the V4/V5 region showed the best performance across the two sequencing technologies. Somewhat surprisingly, however, we noted that sequenced reads of the V3/V4 region performed the worst; this was in spite of its high simulated accuracy (primer coverage and regional classification potential), and previous indications of good classification consistency for its constituent V3 and V4 parts (5,35). Hence, the bias was not associated with the selected individual primers or with the choice of sequencing method, but rather with amplification artefacts arising from the combination of these two specific V3-forward and V4-reverse primers. This emphasises that we should not blindly trust in silico predictions of primers, nor known results from separate components of the variable region in question. Instead, support from actual amplification experiments using the proposed primer combination is absolutely necessary.
Moreover, even with longer variable regions, further developed sequencing technologies and higher coverage, it was evident that the microbial diversities measured from the same sample differed significantly depending on the choice of variable region(s). We could therefore confirm the highly region-specific behaviour across datasets observed by other groups (16,17,37), and thereby reiterate the weakness of comparing diversities between communities based on different ribosomal gene regions. Comparisons of additionally sequenced V4 amplicons also highlighted that although microbiota compositions may not be identical at phylum and genus level, their overall composition revealed at finer resolution could still have a better discriminatory effect.
The extremely inflated diversity metrics derived from the Illumina reads could in large part be explained by the high error rates beyond 60 bp. The exponentially deteriorating quality after this point was also the source of the poor accuracy and classification efficiency of the shorter Illumina reads. It is possible that a more suitable alternative to these paired-end reads, which flank the variable tandem regions, would be shorter-insert fragments in which the poor-quality read ends partly overlap, resulting in improved consensus quality in the critical sequence region. At present, neither taxonomic classifications nor community diversity as derived from Illumina reads are reliable enough, and the coverage improvement over pyrosequencing does not result in an equivalently increased insight into the rare community members. Because of this limitation, subsequent analysis of beta diversity (between subjects or along time series) would also produce unreliable results. Nevertheless, the technology has enormous potential, and when quality improves further, Illumina sequencing may reveal unprecedented diversity in even the most complex microbial environments on Earth.
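The idea of letting the low-quality ends of two paired reads overlap can be illustrated with a minimal Python sketch. It was not part of this study and makes strong simplifying assumptions: the overlap length is taken as known (real read-merging tools find it by alignment), qualities are plain Phred integers, and at each overlapping position the base with the higher quality is kept. The names `revcomp` and `merge_overlap` and the toy reads are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_overlap(
    fwd: str, fwd_q: List[int], rev: str, rev_q: List[int], overlap: int
) -> Tuple[str, List[int]]:
    """Merge a forward read with its mate, given a known overlap length.

    The mate is reverse-complemented so both reads share one orientation;
    within the overlap the higher-quality base call is kept.
    """
    if overlap <= 0 or overlap > min(len(fwd), len(rev)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    rc, rc_q = revcomp(rev), rev_q[::-1]
    bases, quals = [], []
    pairs = zip(fwd[-overlap:], fwd_q[-overlap:], rc[:overlap], rc_q[:overlap])
    for b1, q1, b2, q2 in pairs:
        if q1 >= q2:
            bases.append(b1)
            quals.append(q1)
        else:
            bases.append(b2)
            quals.append(q2)
    merged = fwd[:-overlap] + "".join(bases) + rc[overlap:]
    merged_q = fwd_q[:-overlap] + quals + rc_q[overlap:]
    return merged, merged_q


# Toy 10-bp fragment ACGTACGTAC; each 7-bp read has an error at its 3' end.
fwd_read, fwd_qual = "ACGTACT", [30, 30, 30, 30, 30, 20, 5]
rev_read, rev_qual = "GTACGTC", [30, 30, 30, 30, 30, 20, 5]
merged, merged_q = merge_overlap(fwd_read, fwd_qual, rev_read, rev_qual, overlap=4)
```

In this toy case both 3'-end errors fall inside the 4-bp overlap and are replaced by the mate's higher-quality call, so the merged sequence matches the original fragment.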
Supplementary Data are available at NAR Online.
M.J.C. is funded by a fellowship from the Health Research Board of Ireland. Q.W. and J.R.C. were supported by National Research Initiative number 2008-35107-04542 from the US Department of Agriculture (USDA) National Institute of Food and Agriculture. Funding for open access charge: This paper is funded, as part of the ELDERMET project (http://eldermet.ucc.ie), by the Government of Ireland through the Department of Agriculture, Fisheries and Food, and the Health Research Board, through the Food-Health Research Initiative, 2007–2011.
Conflict of interest statement. None declared.
The authors are grateful to Nessa Gallwey, Karen O'Donovan and Ann O'Neill for technical and clinical help, and to Siobhan Cusack for project management. This study is an output of the Eldermet consortium (http://eldermet.ucc.ie), which has the following additional principal investigators: Ted Dinan, Daniel Falush, Gerald Fitzgerald, Tony Fitzgerald, Albert Flynn, Colin Hill, Denis O'Mahony, Fergus Shanahan, Catherine Stanton, Cillian Twomey and Douwe van Sinderen.