Sample acquisition and storage
To compare methods for fecal storage and DNA preparation, ten participants were enrolled and studied, of whom 40% were female and 30% were African American (Table ). Each participant provided a single stool specimen that was sampled multiple times and then used for DNA extraction. Samples were processed immediately (Table , condition 8) or were first frozen at -80°C (Table , conditions 1-3, 7 and 9), placed on ice for 24 hours and then frozen at -80°C (Table , condition 4), placed on ice for 48 hours and then frozen at -80°C (Table , condition 5), or placed in PSP® (Invitek) buffer at room temperature for 48 hours and then frozen at -80°C (Table , condition 6).
Characteristics of participants
Sampling methods compared in this study.
Cell lysis and DNA purification
Four methods were used for DNA isolation from stool. Three commercial kits were used to isolate DNA from fecal samples-- QIAamp DNA Stool Minikit, PSP Spin Stool DNA Plus Kit, and the MoBio Powersoil DNA Isolation Kit. DNA was also purified by a fourth particularly harsh method in which cells were lysed by bead beating in the presence of hot phenol and then processed with the QIAamp DNA Stool Minikit. All samples for a single individual were from a single piece of stool.
When compared to the QIAamp kit, the DNA yields from the MoBio PowerSoil kit were approximately 10-fold less whereas the yields were the greatest for the PSP kit. The yield after bead beating in hot phenol was comparable to that obtained from the standard QIAamp DNA Stool Minikit isolation. With the QIAamp kit, yields were not affected by different storage methods.
454/Roche pyrosequence analysis
To compare how 16S rRNA gene sequence recovery was affected by storage and purification methods, total DNA from stool samples was PCR amplified using primers targeting regions flanking the variable regions 1 through 2 of the bacterial 16S rRNA gene (V1-2), gel purified, and analyzed using the 454/Roche GS FLX technology. The V1-2 region was chosen based on published simulations [25
]. Each primer set used for PCR amplification also contained an eight base DNA bar code that indexed each subject, storage method, and DNA purification method [28
]. PCR products were pooled, and a total of 473,169 sequence reads of average length 260 bases with correct bar codes and primer sequences were obtained for 57 samples (Additional File 1
Subsequent analysis was carried out using the QIIME pipeline [31
]. The pipeline takes in bar coded sequence reads, separates them into individual communities by bar code, and utilizes a suite of external programs to make taxonomic assignments (e. g. RDP [23
]) and estimate phylogenetic diversity. These data are used to generate taxonomic summaries and as input to UniFrac cluster analysis (described below) [33
Bacterial taxa detected
Figure shows the bacterial taxa detected summarized as a heat map. The most abundant genera are shown together with their Phylum-level assignments. For each subject, two identically processed samples taken 1 cm apart are shown (methods 1 and 2 in Table ). Overall there is good reproducibility between the two adjacent "gold standard" samples--of the taxa present as greater than 1% of the total, all were detected in the paired sample. However, low abundance taxa were detected sporadically--of the samples present at 0.2%-0.4% of the total in one replicate (red in Figure ), 35% were not detected in the second replicate. Statistical tests for significant differences are described below.
Figure 1 Composition of the gut microbiome in the ten subjects studied. Bacterial taxonomic assignments are indicated to the right of the heat map at the Phylum and Genus level except in cases where small numbers were detected (e. g. Proteobacteria), in which (more ...)
Communities were dominated by members of the Bacteriodetes
phyla, with lower amounts of Proteobacteria
, and others, as has been reported previously [5
]. Pronounced differences among the subjects were evident--for example, Fusobacteria
were particularly abundant in Subject 1003.
Bacterial taxa recovered using the different storage and DNA isolation procedures
The bacterial taxa recovered using the different methods are summarized in Figure . For each panel, all samples were pooled for subjects analyzed using each of the methods. Replicate samples (Table , methods 1 and 2) are included in each panel to show variation within biological replicates. Figure shows that bead-beating in phenol (Table , method 9) led to improved recovery of some Firmicutes compared to the Qiagen method. Figure shows that results were more similar between the MoBio method and the Qiagen method, though some differences were detected. Figure shows that most of the storage methods yielded indistinguishable results, at least for proportional recovery within the major groups. Storage in PSP (Figure ) was associated increased proportions of several Firmicutes, though the increase was not as pronounced as with the phenol and beat-beating method. For both the phenol/bead-beating and PSP methods, the Bacteriodetes declined in abundance, likely because of the proportional increase in Firmicutes. Thus storage method had little effect, but use of phenol bead-beating or PSP led to increased recovery of some Firmicutes.
Figure 2 Comparison of the recovery of different bacterial taxa with use of different stool storage and DNA isolation methods. 473,169 sequence reads were used to characterize the 57 communities analyzed. All subjects tested for each method were pooled for comparison (more ...)
UniFrac cluster analysis
We next sought to investigate the significance of the differences observed. In many studies of human subjects the question of interest centers on whether a biological factor (disease state, treatment, host genotype etc.) results in a measurable difference on a gut bacterial community against the background of the naturally occurring differences among humans. We thus asked whether the effects of the sample storage and DNA isolation methods were discernable against the background of variation among subjects.
The 16S rRNA gene sequence reads from the 57 communities were aligned to generate a phylogenetic tree using FastTree2 [35
]. Communities were then compared in a pair-wise fashion by means of the UniFrac distance metric, which quantifies the proportion of the branch length on the tree unique to each community in each pair. Pairwise UniFrac distances were used to generate a matrix of all distances between pairs of communities, and principal coordinate analysis used for the cluster analysis (Figures and ). All steps were carried out in an automated fashion within QIIME [36
]. UniFrac analysis was carried out either unweighted, using only presence-absence information, or weighted, which takes in to account the relative proportions of each group.
Figure 3 Comparison of the presence or absence of different bacterial taxa under the different storage conditions or DNA isolation methods tested using unweighted UniFrac. Unweighted UniFrac was used to generate a matrix of pairwise distances between communities, (more ...)
Figure 4 Comparison of the relative abundance of different bacterial taxa under the conditions tested using weighted UniFrac. Weighted UniFrac was used to generate a matrix of pairwise distances between communities, then a scatterplot was generated from the matrix (more ...)
The UniFrac distances matrices were first used to test for significant differences between "gold standard" samples (taken 1 cm apart on a single piece of stool). We tested the difference between pairs using distance based NP-MANOVA, which yielded p = 0.085 for unweighted UniFrac and p = 0.197 for weighted UniFrac. Thus the two gold standards were not significantly different.
Figure shows the unweighted UniFrac analysis colored to distinguish communities from the 10 individuals studied. Figure shows the same scatter plot colored by storage method, and Figure shows the plot colored by extraction method. The data emphasizes that individuals differ substantially from each other, and that storage and extraction methods have less pronounced effects. Also present in each individual cluster are the two replicates from 1 cm apart, emphasizing the reproducibility of the method. Statistical analysis was carried out by asking whether unweighted UniFrac distances were greater within groups than between groups, then 10,000 label permutations were used to generate an empirical P-value. Clustering by subject was highly significant (P < 0.0001). No significance was seen for clustering by extraction method (P = 0.16) or storage method (P = 0.98). We conclude that overall clustering, when analyzed for presence or absence of different bacterial groups, is dominated by differences between individuals.
Figure shows the weighted UniFrac analysis, which takes into account information on relative abundance, comparing the influence of individual of origin (Figure ), extraction method (Figure ), or storage method (Figure ). Again the differences among subjects were highly significant (P < 0.0001), but now the differences due to extraction methods were also significant (P = 0.001). Differences due to storage method were not significant. Thus when the proportional representation of different taxa is taken in to account, both the subject of origin and the extraction method exert significant effects.
We next investigated whether significant clustering could be detected when each extraction method was compared individually to the collection of other extraction methods. Again UniFrac distances were analyzed for within group and between group comparisons, and an empirical P-value generated from 10,000 permutations. No significant clustering was seen in the unweighted analysis. However, using weighted UniFrac significant clustering was seen for the phenol-bead beating method (P = 0.041) and the Qiagen method (P = 0.0014). The strong effect of the Qiagen method was driven in part by the fact that the most samples were analyzed using the Qiagen method, so the sample size was relatively large. Comparison of each method to the two gold standards using NP-MANOVA showed that the phenol bead beating and PSP methods both achieved p = 0.001. Given the significant differences for these methods, we investigated which taxa were most strongly affected, and found that in both cases the recovery of Firmicutes was increased while the recovery of Bacteriodetes was decreased (asterisks in Figure and ), possibly a result of improved lysis of Firmicutes with these methods (discussed below).
Comparison of the 454 GS FLX versus 454 Titanium sequencing methods and the effect of 16S rRNA gene region sequenced
454/Roche recently introduced Titanium chemistry, which results in longer sequence reads than the GS FLX method (~450 nt versus ~260 nt). We thus wished to compare the results of taxonomic assignments for the same samples using the two methods. Two of the DNA specimens analyzed above were resequenced using the Titanium chemistry and results compared by compiling the proportions of all taxa (Figure ).
Figure 5 Analysis of community composition determined using different recovery and sequencing strategies. A) Results of analysis of Subjects 3 and 7 are shown comparing sequencing using 454/Roche GS FLX versus Titanium, and use of different variable region primers. (more ...)
Analysis of longer 16S rRNA gene region also necessitated use of different primer pairs to amplify longer segments of the 16S rRNA gene. Several regions of the bacterial 16S rRNA gene are highly conserved, and multiple different primer sets have been used in published studies [4
]. Previous literature has shown that 16S PCR amplification can be biased [24
], so we sought to analyze this point in the context of 454/Roche pyrosequencing. To analyze the importance of primer choice for 454 Titanium pyrosequencing, we compared six primer sets, which amplified the 16S gene variable regions V1-3, V3-5, and V6-9. For each primer pair, two slightly different sequences were used. All reads were from right to left as drawn in Figure , with dark gray indicating the region of sequence determination. A total of 295,946 sequence reads were used to characterize the different primers (Additional File 2
). The GS FLX primers used for comparison amplified the V1-V2 region. Primer sequences are compiled in Additional File 3
In general, communities subjected to Titanium sequencing after amplification with the V1-V3 and V3-V5 regions resembled communities analyze with GS FLX sequencing after amplification with the V1-V2 region (Figure ). The communities recovered with Titanium sequencing after V6-V9 primer amplification were the most discordant. The V6-V9 primers consistently showed the lowest percentage of taxonomic assignments at the genus level (Figure ). We note that our choice of V6-V9 primer and sequencing direction did not cover the V6 regions efficiently, so results from others focusing on the V6 region specifically may differ from those reported here. Our data indicates that when pooling data from many experiments for meta-analysis, complications may arise when mixing V6-V9 data with data from other 16S rRNA gene regions. In this study, we also compared the effects of 20 versus 30 PCR cycles and the effects of PCR product purification using gels or binding to and elution from beads. Both were found to have little effect on the results (Additional File 2
and analysis not shown).
Comparison of recovery of 10 cloned 16S rRNA gene sequences after 454/Roche pyrosequencing
One question in analyzing microbial communities by 16S rRNA gene pyrosequencing centers on whether the amplification and sequencing methods result in recovery of sequences in proportion to their representation in the original community [24
]. As a first step in addressing this issue, we prepared and analyzed a mock DNA community composed of ten bacterial plasmids encoding near full length 16S rRNA gene fragments. The mixture was PCR amplified using primers that amplified the V1-V2 region of the 16S rRNA gene, and sequences were acquired using the GS FLX technology. Sequences were acquired for both an even mixture of the ten plasmids, and a staggered mixture (a total of 28,161 sequence reads; Additional File 4
Figure shows the distribution of sequences. In the even mixture, all ten sequences were recovered in roughly equal proportions, and the staggered communities showed differential recoveries in the expected directions. The two staggered mock communities were sequenced after amplification with two different DNA polymerase mixtures (GreenTaq and AmpliTaq), which did not result in major differences. Figure compares the input and observed proportions for both the even and staggered communities, showing that recovery was close to the input proportions (P < 0.0001). Thus we conclude that the pyrosequencing procedure used here resulted in proportions of sequence reads that closely matched the known input when cloned 16S rRNA genes are used as PCR templates.
Figure 6 Analysis of recovery efficiency after 454/Roche GS FLX sequencing of a cloned DNA mock community. A) Bar graph illustrating proportional recovery of 16S rRNA gene pyrosequence reads from a plasmid DNA mock community. A total of 28,161 sequence reads were (more ...)