Storage conditions are considered to be a critical component of DNA-based microbial community analysis methods. However, whether differences in short-term sample storage conditions affect the assessment of bacterial community composition and diversity requires systematic and quantitative assessment. We therefore used barcoded pyrosequencing of bacterial 16S rRNA genes to survey communities harvested from a variety of habitats (soil, human gut (feces) and human skin) and subsequently stored at 20°, 4°, −20° and −80°C for 3 and 14 days. Our results indicate that the phylogenetic structure and diversity of communities in individual samples were not significantly influenced by storage temperature or duration of storage. Likewise, the relative abundances of most taxa were largely unaffected by temperature even after 14 days of storage. These findings indicate that environmental factors and biases in molecular techniques likely impart more variation to microbial communities than do differences in short-term storage conditions, including storage for up to two weeks at room temperature. These results suggest that many samples collected and stored under field conditions without refrigeration may be useful for microbial community analyses.
The treatment and handling of samples after collection is a critical aspect of study design when DNA-based methods are used to compare the composition and diversity of microbial communities from environmental samples. It is widely assumed that microbial DNA must be extracted from samples immediately after collection or, if this is not possible, that samples must be frozen (Rochelle et al., 1994). Samples stored at room temperature, even briefly, before DNA extraction are often considered unfit for downstream analyses because the microbial community may change. Although these assumptions are widespread, only a few studies have directly tested the influence of storage conditions on DNA-based bacterial community analyses, and their results conflict. For example, Dolfing et al. (2004) and Klammer et al. (2005) used DNA fingerprinting methods to show that the overall structure of soil bacterial communities was not strongly affected by storage conditions. Likewise, Roesch et al. (2009) reported modest shifts in bacterial diversity in only one of four human gut samples after 72 hours of storage at room temperature. In contrast, Tzeneva et al. (2009) and Ott et al. (2004) observed significant effects of storage conditions on the composition and diversity of microbial communities in soil and human gut samples, respectively. Nechvatal et al. (2008) tested a series of room-temperature methods for preserving stool for up to 5 days and found considerable variability among methods; however, each method was tested on a different stool specimen, which limits the generality of their conclusions given the large variability among stool specimens from different human subjects.
Because of the diversity of samples, conditions tested and analytical methods employed in these studies, we still lack a comprehensive understanding of whether and how storing samples prior to DNA extraction affects bacterial community analyses, and of the magnitude of any such storage effects. In particular, we do not know whether variation in storage conditions (temperature and length of storage) influences our ability to resolve differences in bacterial community composition and diversity between samples. To address these knowledge gaps, we analyzed bacterial communities in soil, human skin and human fecal samples stored for different lengths of time and at different temperatures, using a barcoded pyrosequencing procedure and analyzing the sequence data from each sample with both phylogenetic and taxonomic community analysis procedures.
Microbial communities were sampled from three distinct habitats: surface soils, human skin and human feces. Fecal samples (Fecal 1 and 2; c. 100 g each) were donated by two anonymous male participants. Immediately after collection, each sample was homogenized in a sterile container by stirring with a sterile spatula, without added buffer. Replicate sub-samples (n = 24) of each homogenized fecal sample were obtained by inserting sterile cotton swabs into the sample and then placing each swab in a separate dry, sterile 15 ml conical tube. Soil was collected (three 2.5 × 10 cm cores) from two locations on the campus of the University of Colorado (40° 0′ N, 105° 16′ W) in June 2009. One set of cores was collected from underneath a Pinus ponderosa tree (Soil 1) and the other from an irrigated lawn (Soil 2). Replicate cores were composited, sieved through 2 mm mesh and thoroughly homogenized by hand. From these two soil samples, forty-eight 1 g sub-samples (n = 24 per sample) were each placed in 1.5 ml centrifuge tubes. Skin samples were taken from the axillae (armpits) of one male and one female volunteer using sterile cotton swabs that had been pre-moistened in a sterile solution of 0.15 M NaCl and 0.01% TWEEN 20 (Paulino et al., 2006; Fierer et al., 2008). The axillary surface was swabbed for 30 s with all 24 swabs per individual at the same time. The swabs were then placed in sterile 15 ml conical tubes for storage.
Replicate sub-samples of each sample (n = 3 per treatment) were subsequently stored at 20°, 4°, −20° or −80°C for either 3 or 14 days before DNA extraction. All sample-treatment combinations (4 storage temperatures × 2 storage times × 6 unique samples) were analyzed in triplicate as described in the next paragraph. Participants in the study gave informed consent under the sampling protocol approved by the University of Colorado Human Research Committee (protocol 1007.39).
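The factorial design described above can be enumerated as a quick bookkeeping check on the number of sub-samples. This is an illustrative sketch only: the sample labels are taken from the Results, while the tuple layout and variable names are assumptions, not the authors' actual sample sheet.

```python
from itertools import product

# Factor levels as stated in the text; labels follow the Results section.
SAMPLES = ["Fecal 1", "Fecal 2", "Skin 1", "Skin 2", "Soil 1", "Soil 2"]
DAYS = [3, 14]
TEMPERATURES_C = [20, 4, -20, -80]
REPLICATES = [1, 2, 3]

def build_design():
    """One (sample, day, temperature, replicate) tuple per sub-sample."""
    return list(product(SAMPLES, DAYS, TEMPERATURES_C, REPLICATES))

design = build_design()
per_sample = len(DAYS) * len(TEMPERATURES_C) * len(REPLICATES)
print(len(design), per_sample)  # 144 24

# The Results report one sub-sample dropped for fungal growth.
analyzed = [d for d in design if d != ("Fecal 1", 14, 20, 2)]
print(len(analyzed))  # 143
```

The 24 sub-samples per sample produced by this enumeration match the n = 24 sub-samples collected from each fecal, soil and skin sample.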
Genomic DNA was extracted from all samples with the MoBio PowerSoil DNA extraction kit (MoBio, Carlsbad, CA, USA) as described previously (Fierer et al., 2008; Lauber et al., 2009). Samples were placed in bead tubes containing solution C1 and incubated at 65°C for 10 min, followed by 2 min of bead beating with the MoBio vortex adapter; the remaining extraction steps were performed as directed by the manufacturer. Bacterial 16S rRNA genes were PCR-amplified with primers targeting variable regions V1 and V2 (positions 27 to 338 in the E. coli 16S rRNA gene numbering scheme), following the protocol described in our earlier publications (Fierer et al., 2008; Lauber et al., 2009). Briefly, amplicons from three PCR reactions per sample were pooled to reduce per-PCR variability, purified with the MoBio UltraClean PCR clean-up kit according to the manufacturer’s instructions and quantified (PicoGreen; Invitrogen, Carlsbad, CA, USA). No-template PCR controls were also performed. The PCR products from each sub-sample carried a sub-sample-specific, error-correcting barcode, which allowed us to assemble a single composite sample for pyrosequencing by combining equal amounts of amplicon from each sub-sample. The composite sample was then gel purified (QIAquick gel clean-up kit, Qiagen, Valencia, CA, USA) and precipitated with ethanol to remove any remaining contaminants. DNA was sequenced on a Roche 454 FLX pyrosequencer.
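Combining "equal amounts of amplicon from each sub-sample" amounts to dividing a common target mass by each sub-sample's PicoGreen concentration. A minimal sketch, assuming concentrations in ng/µl; the 50 ng target and the sub-sample names below are hypothetical values, not figures from this study.

```python
def pooling_volumes(conc_ng_per_ul, target_ng):
    """Volume (µl) of each barcoded amplicon that contributes target_ng of DNA."""
    volumes = {}
    for sub_sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sub_sample}: concentration must be positive")
        volumes[sub_sample] = target_ng / conc
    return volumes

# Hypothetical PicoGreen readings (ng/µl) and a hypothetical 50 ng target.
readings = {"Fecal1_D3_20C_r1": 10.0, "Skin2_D14_-80C_r3": 4.0, "Soil1_D3_4C_r2": 25.0}
vols = pooling_volumes(readings, target_ng=50.0)
for name, v in vols.items():
    print(f"{name}: {v:.1f} µl")  # 5.0, 12.5 and 2.0 µl respectively
```

Dilute products need proportionally larger volumes, so in practice the target mass would be chosen so that the weakest amplicon can still supply it.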
16S rRNA sequences were processed according to methods described in our previous publications (Fierer et al., 2008; Hamady et al., 2008). Briefly, sequences shorter than 200 nt or longer than 300 nt, or with average quality scores < 25, were removed from the dataset, as were sequences with uncorrectable barcodes, ambiguous bases or no bacterial 16S rRNA gene-specific primer. Sequences were then assigned to sub-samples based on their unique 12 nt barcode and grouped into phylotypes at 97% sequence identity using cd-hit (Li & Godzik, 2006) with a minimum coverage of 97%. We chose to group phylotypes at 97% identity because this matches the limits of resolution of pyrosequencing (Kunin et al., 2010) and because the branch length omitted by this grouping contributes little to the tree, and therefore to phylogenetic estimates of beta diversity (Hamady et al., 2009b). A representative for each phylotype was chosen by selecting the most abundant sequence in the phylotype, with ties broken by choosing the longest sequence. A phylogenetic tree of the representative sequences was constructed using the Kimura 2-parameter model in FastTree (Price et al., 2009) after sequences were aligned with NAST (minimum 150 nt at 75% minimum identity) (DeSantis et al., 2006a) against the Greengenes database (DeSantis et al., 2006b). Hypervariable regions were screened out of the alignment using the PH Lane mask (http://greengenes.lbl.gov/). Differences in community composition for each pair of samples were determined from the phylogenetic tree using the weighted and unweighted UniFrac algorithms (Lozupone & Knight, 2005; Lozupone et al., 2006). UniFrac is a tree-based metric that measures the distance between two communities as the fraction of branch length in a phylogenetic tree that is unique to one of the communities (as opposed to being shared by both).
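The read-processing steps above (length, quality, ambiguity and primer filters, barcode assignment, and choice of phylotype representatives) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the pipeline the authors used: the read layout (12 nt barcode immediately followed by the primer) is assumed, and barcode correction is approximated by nearest-barcode matching within a Hamming distance of 1 rather than by true decoding of the error-correcting codes. Clustering at 97% identity is left to cd-hit and is not reimplemented. The barcodes, sub-sample names and primer string are hypothetical.

```python
from collections import Counter

MIN_LEN, MAX_LEN, MIN_MEAN_QUAL = 200, 300, 25  # thresholds from the text
BARCODE_LEN = 12

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(read_barcode, barcodes, max_dist=1):
    """Simplified correction: return the sub-sample whose barcode is uniquely
    closest within max_dist mismatches, else None (treated as uncorrectable)."""
    dists = sorted((hamming(read_barcode, bc), name) for bc, name in barcodes.items())
    best_dist, best_name = dists[0]
    if best_dist > max_dist:
        return None
    if len(dists) > 1 and dists[1][0] == best_dist:
        return None  # two equally close barcodes: ambiguous
    return best_name

def process_read(read, quals, primer, barcodes):
    """Return (sub_sample, insert) for a read passing all filters, else None.
    Assumed layout: 12 nt barcode, then primer, then the 16S insert."""
    if not MIN_LEN <= len(read) <= MAX_LEN:
        return None
    if sum(quals) / len(quals) < MIN_MEAN_QUAL:
        return None
    if set(read) - set("ACGT"):
        return None  # ambiguous base calls (N or other IUPAC codes)
    sub_sample = assign_barcode(read[:BARCODE_LEN], barcodes)
    if sub_sample is None:
        return None
    rest = read[BARCODE_LEN:]
    if not rest.startswith(primer):
        return None  # primer absent
    return sub_sample, rest[len(primer):]

def representative(cluster_seqs):
    """Most abundant sequence in a phylotype; ties go to the longest."""
    counts = Counter(cluster_seqs)
    return max(counts, key=lambda s: (counts[s], len(s)))

BARCODES = {"ACGTACGTACGT": "Soil1_D3_20C_r1", "TTTTGGGGCCCC": "Skin2_D14_4C_r2"}
PRIMER = "GATCCTGGCTCAG"  # hypothetical primer string
read = "ACGTACGTACGA" + PRIMER + "A" * 200  # one error in the last barcode base
print(process_read(read, [30] * len(read), PRIMER, BARCODES)[0])  # Soil1_D3_20C_r1
print(representative(["AC", "ACG", "AC", "ACG", "A"]))  # ACG
```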
This method of community comparison accounts for the relative similarities and differences among phylotypes (or higher taxa) rather than treating all taxa at a given level of divergence as equal (Lozupone & Knight, 2008). Although UniFrac depends on a phylogenetic tree, it is relatively robust to differences in tree reconstruction method or to the approximation of using phylotypes to represent groups of very similar sequences (Hamady et al., 2009b).
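To make the UniFrac definition concrete, the sketch below computes unweighted UniFrac (branch length unique to one community divided by branch length observed in either) and a weighted variant (each branch length multiplied by the absolute difference in the two communities' relative abundances below that branch) on a toy rooted tree. The tree, tip names and counts are hypothetical, and the weighted value is left unnormalized; published implementations typically add a normalization step, so this illustrates the idea rather than replacing the UniFrac software used in the study.

```python
# Toy rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.2),
    "t1": ("n1", 0.1),
    "t2": ("n1", 0.3),
    "t3": ("root", 0.4),
}

def subtree_counts(tree, tip_counts):
    """Number of sequences descending from each node."""
    totals = {node: 0 for node in tree}
    for tip, count in tip_counts.items():
        node = tip
        while node is not None:
            totals[node] += count
            node = tree[node][0]
    return totals

def unifrac(tree, a_counts, b_counts, weighted=False):
    a = subtree_counts(tree, a_counts)
    b = subtree_counts(tree, b_counts)
    a_tot, b_tot = sum(a_counts.values()), sum(b_counts.values())
    unique = observed = weighted_sum = 0.0
    for node, (parent, length) in tree.items():
        if parent is None:
            continue  # the root has no branch above it
        in_a, in_b = a[node] > 0, b[node] > 0
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length  # branch leads only to one community
        weighted_sum += length * abs(a[node] / a_tot - b[node] / b_tot)
    return weighted_sum if weighted else unique / observed

A = {"t1": 3, "t3": 1}  # hypothetical community A
B = {"t2": 2, "t3": 2}  # hypothetical community B
print(round(unifrac(TREE, A, B), 3))                 # ≈ 0.4
print(round(unifrac(TREE, A, B, weighted=True), 3))  # ≈ 0.375
```

Identical communities give a distance of 0, and communities sharing no branches give an unweighted distance of 1.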
UniFrac distances were calculated for each pair of samples in the data set, in each case as the unique fraction of branch length in the phylogenetic tree. Because the UniFrac metric is a phylogenetic estimate of community similarity, it avoids some of the problems associated with analyses that compare communities at arbitrarily defined levels of sequence similarity (Lozupone & Knight, 2008; Hamady & Knight, 2009a). The phylogenetic diversity of each sample was determined from 1000 randomly selected sequences per sample using Faith’s phylogenetic diversity metric (Faith’s PD; Faith, 1992), which sums the branch lengths of the relaxed-neighbor-joining tree spanned by the phylotypes present in each sample. The taxonomic identity of each phylotype was determined using the RDPII taxonomy (60% minimum threshold) (Cole et al., 2005). All sequences have been deposited in the GenBank Short Read Archive (accession number SRA012078.1).
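Faith's PD with rarefaction to 1000 sequences can be sketched in the same style: draw 1000 sequences at random without replacement, then sum the branch lengths on the union of root-to-tip paths of the phylotypes that were drawn. The toy tree and counts are hypothetical, and the rooted-path convention shown is one common way of computing PD; software packages may differ in details.

```python
import random

TREE = {  # node -> (parent, branch length); hypothetical toy tree
    "root": (None, 0.0),
    "n1": ("root", 0.2),
    "t1": ("n1", 0.1),
    "t2": ("n1", 0.3),
    "t3": ("root", 0.4),
}

def faith_pd(tree, present_tips):
    """Sum of branch lengths on the union of root-to-tip paths of observed tips."""
    covered = set()
    for tip in present_tips:
        node = tip
        # Walk toward the root; stop early once an already-counted path is reached.
        while tree[node][0] is not None and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in covered)

def rarefy(tip_counts, depth, rng):
    """Set of tips observed in `depth` sequences drawn without replacement."""
    pool = [tip for tip, count in tip_counts.items() for _ in range(count)]
    if depth > len(pool):
        raise ValueError("fewer sequences than the rarefaction depth")
    return set(rng.sample(pool, depth))

rng = random.Random(1)
counts = {"t1": 700, "t2": 1300, "t3": 4}  # hypothetical sub-sample, 2004 reads
observed = rarefy(counts, 1000, rng)
print(sorted(observed), round(faith_pd(TREE, observed), 3))
```

Rarefying every sub-sample to the same depth matters here because PD grows with the number of sequences examined, and rare phylotypes such as `t3` may or may not appear in a given draw.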
The effect of temperature and length of storage on relative taxon abundance (minimum 1% abundance per sample-treatment combination) was assessed using the Kruskal-Wallis test in SYSTAT 11.0 for sequences classified to the level of order (fecal and skin) or family (soil). Differences in overall community composition (UniFrac distances) were tested within each sample type using PERMANOVA in PRIMER v6, with Sample, Day and Temperature as main factors and the Day × Temperature interaction. Pairwise UniFrac distances were visualized by non-metric multidimensional scaling (NMDS) in PRIMER v6 (Clarke & Warwick, 2001). Differences in Faith’s PD due to temperature and length of storage were assessed using the Kruskal-Wallis test.
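For the PERMANOVA tests, the sketch below implements the one-way pseudo-F statistic on a distance matrix, with a P value obtained by permuting group labels. It is a deliberately simplified, single-factor version written for illustration (the analyses above used PRIMER v6 with several factors and an interaction term), and the eight-sub-sample distance matrix is synthetic. For the Kruskal-Wallis comparisons, `scipy.stats.kruskal` is one readily available implementation.

```python
import random
from itertools import combinations

def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, labels, permutations=999, seed=0):
    """Observed pseudo-F and permutation P value (labels shuffled, distances fixed)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)

# Synthetic example: sub-samples from two hosts, close within host, far between.
labels = ["Fecal 1"] * 4 + ["Fecal 2"] * 4
dist = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.8)
         for j in range(8)] for i in range(8)]
f_stat, p_value = permanova(dist, labels)
print(round(f_stat, 1))  # 253.0
```

With only eight sub-samples, the smallest attainable P value is limited by the number of distinct label arrangements, which is one reason triplicate sub-samples across several treatments are pooled within each sample type.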
After eliminating low-quality sequences, the number of reads ranged from 1304 to 3022 per sub-sample, with an average of 2019 sequences per sub-sample and a total of 290,696 sequences for the data set. One sub-sample (Fecal 1, Day 14, 20°C, replicate 2) was omitted from the data set because of visible fungal growth prior to DNA extraction. Each sample type yielded a similar total number of bacterial 16S rRNA sequences (97,943 for feces, 97,527 for skin and 95,226 for soil). The three sample types harbored communities that were distinct in composition and diversity (Figs 1 and 2, Tables 1 and 2). The human-derived samples tended to be dominated by a few phyla (three per sample type) accounting for 95 to 98% of the sequences; in contrast, the six most abundant phyla in the soils accounted for only c. 83% of the sequences. Fecal samples were dominated by members of the phyla Bacteroidetes (62%) and Firmicutes (35%), while skin samples had a relatively even distribution of Firmicutes (39%), Bacteroidetes (31%) and Actinobacteria (25%). Soil samples contained many phyla, including the Bacteroidetes (32%), Acidobacteria (27%) and Proteobacteria [Alphaproteobacteria (10%), Betaproteobacteria (6.5%), Gammaproteobacteria (5.2%) and Deltaproteobacteria (2.7%)]. These distinct phylum distributions were also reflected in overall community composition: NMDS visualization of pairwise UniFrac distances showed clustering by individual sample rather than by storage temperature or duration (Fig. 1). Sample types also differed in community diversity, with soil bacterial communities harboring the highest levels, followed by fecal and skin samples (Faith’s PD = 40, 11 and 10, respectively). Within each sample type, the two samples (e.g. Fecal 1 and Fecal 2) harbored bacterial communities that differed in composition and diversity, irrespective of storage conditions (Table 1; see below).
Bacterial communities in the fecal samples did not change appreciably under different storage conditions and retained their distinct composition even after 14 days of storage (Fig. 1 and Table 1). Fecal 2 was dominated by the Bacteroidaceae (c. 75%), while Fecal 1 had a more even distribution of the six most abundant taxa across all temperatures (Fig. 2). Although the relative abundances of a few individual taxa were affected by temperature (Fig. 2), this did not significantly affect overall community composition. The host of origin had a significant effect on both weighted and unweighted UniFrac distances (PERMANOVA P < 0.001), whereas storage temperature and duration did not (P > 0.1; Table 1). This minimal effect of storage on overall community structure is evident in Fig. 1, in which replicate samples cluster by host. Likewise, the phylogenetic diversity of the fecal samples remained consistent across temperatures (Table 2). Our results extend those of Roesch et al. (2009), who found minimal differences in community composition and relative taxon abundances after 72 h of storage at the single temperature tested (room temperature) for 3 of the 4 samples in their study.
In summary, even though we did observe shifts in the abundance of some taxa in our small sample set under different storage conditions, this did not mask interpersonal differences in overall fecal bacterial community composition, and did not affect our ability to differentiate the host origin of the two fecal samples.
As with the fecal communities, each skin sample was distinctive in terms of dominant taxa and overall community composition. Skin 1 had relatively more Bacteroidales and Clostridiales (c. 20-30%), while Skin 2 had greater abundances of the Actinobacteridae and Bacillales (c. 35-55%) (Fig. 2). These person-to-person differences in taxon abundance were also evident in the UniFrac analyses, as sub-samples clustered by host rather than by temperature or length of storage (Fig. 1). Weighted pairwise UniFrac distances differed significantly between the two individuals (P < 0.001), but storage temperature (P = 0.93) and duration (P = 0.53) had no significant effect. Similarly, phylogenetic diversity did not differ significantly between replicate sub-samples analyzed after 3 days versus 14 days of storage, irrespective of storage temperature (P > 0.05 in all cases). The highly personalized nature of skin-associated bacterial communities (Gao et al., 2007; Grice et al., 2009; Costello et al., 2009) was still apparent after 14 days of storage across a range of temperatures, with storage conditions having relatively little impact on community composition or diversity. This has important implications for mass sampling efforts sponsored by international human microbiome projects, which aim to relate microbial community structure and function to physiologic and pathophysiologic features in people with a range of lifestyles in a variety of geographic locations, some of them remote (Turnbaugh et al., 2007).
The two soils harbored unique bacterial communities, with temperature and length of storage having little effect on overall community composition (Figs 1 and 2, Table 1). Soils retained similar abundances of the six most numerous taxa across the range of storage temperatures tested, except for the Burkholderiales, which were marginally affected by temperature in Soil 1 (P = 0.05, Fig. 2). Although each sample had similar abundances of most taxa, the two soil communities were clearly distinct regardless of storage conditions (P < 0.001, Fig. 1 and Table 1). Analysis of UniFrac pairwise distances showed no significant effect of Day, Temperature or Day × Temperature on overall community composition for sub-samples immediately frozen at −80°C and those stored at 20°C for 14 days (P > 0.05 in all cases). Likewise, phylogenetic diversity was unaffected by temperature or length of storage (P > 0.05 in all cases, Table 2).
We surveyed microbial communities from multiple environments under a broad range of storage conditions and found that bacterial community composition in the samples was largely unaffected by differences in short-term storage conditions. Although pyrosequencing cannot currently resolve changes in bacteria at the species or strain level, given the limitations of read length and error rate (Kunin et al., 2010), our results are consistent with several earlier studies (e.g. Dolfing et al., 2004; Klammer et al., 2005; Roesch et al., 2009) and indicate that the provenance of samples has a greater effect on microbial community structure than the conditions under which samples are stored before DNA extraction. Critically, these differences persist both at a broad level (e.g. between soil and skin) and at the finer level of individual samples (e.g. different soils, or skin from different people). Sub-samples stored under different conditions did not have identical bacterial communities, perhaps because of insufficient sample homogenization or the inherent variability of DNA extraction and PCR amplification among sub-samples. Importantly, these other potential sources of variability contributed more variation than differences in storage temperature and duration, even after 14 days of storage at room temperature. Although specific taxa may change in relative abundance under different storage conditions, our data suggest that the types of samples in this study can be stored and shipped at room temperature without significantly affecting the assessment of overall community composition or the relative abundances of most major bacterial taxa.
We thank Donna Berg-Lyons for her help with the sample processing, Jill Manchester for her help with DNA sequencing, and Micah Hamady and Elizabeth Costello for assistance with the bioinformatics analyses. We would also like to thank members of the Fierer lab group for help on previous drafts of this manuscript. This work was supported by grants from the National Science Foundation (EAR 0724960), the U.S. Department of Agriculture (2008-04346) (N.F.), and the Howard Hughes Medical Institute (R.K.), the Bill and Melinda Gates Foundation, and the Crohn’s and Colitis Foundation of America, and NIH (R01 HG004872) (R.K. and J.I.G.).