Recent studies have highlighted the surprising richness of soil bacterial communities; however, bacteria are not the only microorganisms found in soil. To our knowledge, no study has compared the diversities of the four major microbial taxa, i.e., bacteria, archaea, fungi, and viruses, from an individual soil sample. We used metagenomic and small-subunit RNA-based sequence analysis techniques to compare the estimated richness and evenness of these groups in prairie, desert, and rainforest soils. By grouping sequences at the 97% sequence similarity level (an operational taxonomic unit [OTU]), we found that the archaeal and fungal communities were consistently less even than the bacterial communities. Although total richness levels are difficult to estimate with a high degree of certainty, the estimated number of unique archaeal or fungal OTUs appears to rival or exceed the number of unique bacterial OTUs in each of the collected soils. In this first study to comprehensively survey viral communities using a metagenomic approach, we found that soil viruses are taxonomically diverse and distinct from the communities of viruses found in other environments that have been surveyed using a similar approach. Within each of the four microbial groups, we observed minimal taxonomic overlap between sites, suggesting that soil archaea, bacteria, fungi, and viruses are globally as well as locally diverse.
Soil microorganisms represent a considerable fraction of the living biomass on Earth (63), with surface soils containing 10³ to 10⁴ kg of microbial biomass per hectare (7). Despite this abundance and the importance of soil microorganisms for key ecosystem functions (35, 37, 62), the diversity and structure of soil microbial communities remain poorly studied. With the advent of molecular techniques, we can now begin to survey the full extent of microbial diversity, including the vast majority of microorganisms which cannot be identified using traditional taxonomic approaches (47).
Of the microbial groups that are abundant in soil, the bacteria have been the most extensively studied. With an estimated 10³ to 10⁷ bacterial “species” per individual soil sample (15, 23, 59, 60), they are often considered to be the most diverse group of soil microorganisms (13). However, bacteria are not the only microorganisms found in soil; archaea, fungi, and viruses are also numerically abundant (58). To our knowledge, no previous studies have examined the sequence diversity of soil viruses, and no studies have compared the levels of genetic diversity found in the different taxonomic groups of soil microorganisms (bacteria, archaea, fungi, and viruses) inhabiting a given soil sample.
We propose that soil fungal, archaeal, and viral communities are likely to be as taxonomically diverse as soil bacterial communities. Although soil fungi have been studied for centuries, recent DNA-based surveys suggest that fruiting body and cultivation-based surveys have underestimated the total richness of soil fungal communities (33, 43, 54). Recent research also indicates that soil archaea are phylogenetically diverse (44, 46, 61) and are undersurveyed despite their apparent importance in soil processes (37). Soil viruses are known to be abundant, to be morphologically diverse, and to span a wide range of genome sizes (48, 64), but there are currently no published reports describing the genomic diversity of soil viral communities.
For this study, our goal was not to identify every individual microorganism found in soil. To do so would be prohibitively difficult given the magnitude of the required sequencing effort (17, 55). Rather, our goal was to compare the phylogenetic diversities of the four dominant taxonomic groups of soil microorganisms in soils collected from a tallgrass prairie, an arid desert, and a tropical rainforest. These sites were chosen because they represent globally dominant ecosystem types and span a broad gradient in aridity and productivity. We analyzed partial sequences of amplified 16S and 18S rRNA genes to characterize the phylogenetic diversity of archaeal, fungal, and bacterial communities in each soil. Because viruses lack ubiquitously conserved genetic elements, we assessed viral diversity by sequencing randomly chosen clones from viral DNA metagenomic libraries.
Soil was collected from three sites: Manu National Park in Peru (Amazonian terra firme forest; 12.65°S, 71.23°W), Mojave Desert in California (desert shrubland; 33.97°N, 116.07°W), and the long-term ecological research site at Konza Prairie in Kansas (tallgrass prairie; 39.10°N, 96.60°W). Additional soil and site information is given in Table 1. At each site, mineral soil (the upper 5 cm) was collected from 10 locations within a single 100-m² plot using a stratified random sampling approach. The individual soil samples from each plot were homogenized together, and the composited sample was sieved to 2 mm and stored either at 4°C for extraction of viral DNA or at −80°C for extraction of fungal, bacterial, and archaeal DNA.
For the bacterial, fungal, and archaeal clone libraries, DNA was extracted from each of the three soil samples using the MoBio PowerSoil DNA kit (MoBio Laboratories, Carlsbad, CA). DNA was extracted from 10 replicate subsamples (of 1.0 g soil) from each of the three composited soil samples (one from each plot). These replicate DNA extractions provided the templates for the construction of the bacterial, archaeal, and fungal clone libraries.
Viral community DNA was extracted from the soils using methods similar to those described elsewhere (8, 10). Soil samples (~200 g [wet weight]) were resuspended in 0.02-μm-filtered 1× phosphate-buffered saline solution and shaken vigorously to dislodge the viruses from the soil particles. The sediments were pelleted, and the supernatant was then filtered through a 0.2-μm Sterivex filter to remove all nonviral organisms. Viruses in the filtrate were concentrated by polyethylene glycol precipitation with polyethylene glycol 8000 added to a final concentration of 10%, and the samples were incubated for 12 h at 4°C (11). The samples were then centrifuged at 13,000 × g for 30 min on an SW41 rotor to pellet the viral particles. The viral pellet was resuspended in 0.02-μm-filtered phosphate-buffered saline solution and loaded onto a cesium chloride step gradient consisting of 1 ml each of 1.7, 1.5, and 1.35 g ml⁻¹. The gradient was centrifuged for 2 h at 22,000 rpm on an SW41 rotor (average of 60,000 × g), and the DNA was isolated from the 1.35 to 1.5 g ml⁻¹ fraction (which contains most of the viral particles) using formamide and cetyltrimethylammonium bromide extraction (53).
For the analysis of small-subunit rRNA genes, individual bacterial, archaeal, and fungal clone libraries were constructed from each soil sample. For each library, three replicate PCRs were conducted per soil DNA template (for a total of 30 replicate PCRs per library) using group-specific primers. The bacterial clone library was constructed using a universal eubacterial primer set, Bac8f (5′-AGAGTTTGATCCTGGCTCAG-3′) and Univ529r (5′-ACCGCGGCKGCTGGC-3′) (5, 36, 49). The archaeal clone library was constructed using the archaeon-specific primer Arc21f (5′-TTCCGGTTGATCCTGCCGGA-3′) (5) and Univ529r. The fungal library was constructed with the EF4 (5′-GGAAGGGRTGTATTTATTAG-3′) and fung5 (5′-GTAAAAGTCCTGGTTCCCC-3′) primer set (57), which has previously been shown to amplify 18S rRNA genes from most fungal groups (3, 24, 26, 32). Each 50-μl PCR mixture contained 1× HotStarTaq master mix (QIAGEN, Valencia, CA), 0.5 μM of each primer, and 50 ng of template DNA. The amplification protocol consisted of 15 min at 95°C, followed by 25 cycles of 60 s at 94°C, 30 s at the appropriate annealing temperature, and 60 s at 72°C and a final 10-min extension step at 72°C. The annealing temperatures for the bacterial, archaeal, and fungal amplifications were 54°C, 55°C, and 48°C, respectively.
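Two of the primers listed above contain degenerate positions (K in Univ529r, R in EF4). As a minimal illustration, the sketch below expands a degenerate primer into the unambiguous sequences it encodes, using the standard IUPAC meanings K = G or T and R = A or G. The function name `expand_primer` is hypothetical and is not part of the published protocol.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes. K and R are the only degenerate
# codes that occur in the primers listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "R": "AG"}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(bases) for bases in product(*choices)]


# Primer sequences copied from the text (5' to 3').
univ529r = "ACCGCGGCKGCTGGC"
ef4 = "GGAAGGGRTGTATTTATTAG"

for name, seq in (("Univ529r", univ529r), ("EF4", ef4)):
    print(name, expand_primer(seq))
```

Each of these primers has a single two-fold degenerate position, so each corresponds to a mixture of two oligonucleotides.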
The amplified products from the replicate PCRs were pooled together and cloned using the TOPO-TA PCR cloning kit (Invitrogen). Clones were picked and unidirectionally sequenced following standard protocols (SymBio, Menlo Park, CA). Sequences were screened for chimeras using Bellerophon (29), trimmed at conserved motifs, and aligned using either NAST (available at http://greengenes.lbl.gov) or ARB (available at http://www.arb-home.de). Figure 1 and Table 2 indicate the number of sequences included in each library.
Because viruses lack ubiquitously conserved genetic elements, viral diversity was assessed by sequencing randomly chosen clones from viral DNA metagenomic libraries. The viral clone libraries were constructed using a linker-amplified shotgun library technique, as described by Breitbart et al. (11). Construction of the linker-amplified shotgun libraries was performed by Lucigen Corp. (Middleton, WI), with sequencing conducted at SymBio (Menlo Park, CA) and Agencourt (Beverly, MA). The total viral community DNA was randomly sheared using a HydroShear and end repaired, and double-stranded DNA linkers were ligated to the ends. The fragments were then amplified using the high-fidelity Vent DNA polymerase, ligated into the pSMART vector, and electroporated into MC12 cells. This method circumvents problems associated with modified nucleotides and deadly genes in viral genomes, as well as the low DNA concentrations in environmental samples.
We confirmed that the sequences from each library matched the targeted taxonomic group by comparing the sequences to those in the GenBank database using the BLAST algorithm (1). The archaeal, fungal, and bacterial libraries were dereplicated into operational taxonomic units (OTUs) using Fastgroup II (65). An OTU was defined as a group of sequences sharing ≥97% identity in the small-subunit rRNA gene, following the conventional definition of a microbial “species” (52). Due to the computational challenges associated with estimating diversity indices and the associated error around these estimates, we used only a single OTU definition for this study. After grouping sequences into OTUs at the ≥97% sequence similarity level, we used EstimateS (version 7; R. K. Colwell, http://purl.oclc.org/estimates) to produce rarefaction curves (Fig. 1). Because none of the rarefaction curves approached an asymptote, we know that we have undersampled the total diversity of each microbial group, and therefore the rarefaction curves cannot be used to compare the diversities of the microbial communities.
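To make the idea of a rarefaction curve concrete, the following sketch computes the expected number of OTUs in a random subsample of n clones drawn without replacement from a library. It uses the classical individual-based (hypergeometric) rarefaction formula, which is an assumption made here for illustration; it is not necessarily the calculation that EstimateS performs. The OTU counts in the example are hypothetical.

```python
from math import comb


def rarefy(otu_counts, n):
    """Expected number of OTUs in a subsample of n clones.

    otu_counts[i] is the number of clones assigned to OTU i. Clones are
    drawn without replacement, so the chance that OTU i is missed
    entirely is C(N - c_i, n) / C(N, n), where N is the library size.
    """
    total_clones = sum(otu_counts)
    if not 0 <= n <= total_clones:
        raise ValueError("n must lie between 0 and the library size")
    all_subsamples = comb(total_clones, n)
    return sum(1 - comb(total_clones - c, n) / all_subsamples
               for c in otu_counts)


# Hypothetical library: 4 OTUs observed 5, 3, 1, and 1 times.
library = [5, 3, 1, 1]
curve = [rarefy(library, n) for n in range(sum(library) + 1)]
print(curve)
```

A curve that is still rising at the full library size, as in this toy example, is the signature of undersampling described above.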
For each of the nine libraries, the rank-abundance data (where the observed OTUs are ordered from most to least abundant on the x axis and the abundance of each OTU is plotted on the y axis) were fit to four possible models: logarithmic, log-normal, exponential, and power law models. The equations for these four models are provided in reference 4. These equations describe the community structure by expressing the fraction f_i of the community in the ith ranked OTU in terms of the model parameters a, b, and M. As an example, the equation describing the community structure of the power law model is
$$ f_i = a\, i^{-b}, \qquad i = 1, \ldots, M, $$
where M is the total predicted richness of the population, a is the proportional abundance of the most abundant genotype, and b⁻¹ is a measure of the evenness of the population. The other models also express the f_i in terms of model parameters a, b, and M, although the functional dependence is different. Note that since the sum of the f_i is equal to 1, any two of the three parameters a, b, and M determine the third.
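The constraint that the abundances sum to 1 can be illustrated with a short sketch. It assumes the power law takes the usual form f_i = a·i^(−b) for i = 1, …, M, so that fixing b and M determines a. The function name and the parameter values are hypothetical.

```python
def power_law_abundances(b, M):
    """Return (a, [f_1, ..., f_M]) for a power law rank-abundance curve.

    Assumes f_i = a * i**(-b). Because the f_i must sum to 1, a is fixed
    by b and M: a = 1 / sum_{i=1..M} i**(-b).
    """
    weights = [i ** -b for i in range(1, M + 1)]
    a = 1.0 / sum(weights)
    return a, [a * w for w in weights]


# Hypothetical community of 4 OTUs with exponent b = 1.
a, f = power_law_abundances(1.0, 4)
print(a, f, sum(f))
```

Note that f_1 equals a, matching the description of a as the proportional abundance of the most abundant genotype.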
The parameters for all the models for all of the libraries were estimated using maximum-likelihood methods. The estimates for the viral communities followed the procedure described by Breitbart et al. (11) and are further described in “Viral sequence analysis” below. The maximum-likelihood estimates of M and a for the other communities proceeded by minimizing the variance-weighted sum of squared deviations Y between the observed and the predicted number of OTUs sampled exactly k times in a sample of size n:
$$ Y = \sum_{k} \frac{[d(k) - m(k)]^{2}}{v(k)} $$
In this formula, d(k) is the number of OTUs that were actually observed k times, m(k) is the expected number of such OTUs, and v(k) is the variance of the number of such OTUs based on the model. The corresponding likelihood (the probability of the data given the hypothesis) is proportional to exp(−Y/2), which is maximized by minimizing Y. The minimum values of Y are shown in Table 3 (model error).
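To calculate m(k) and v(k), we make use of random variables X_{i,k} that take value 1 or 0, depending on whether or not the ith OTU is sampled exactly k times. Since the probability of this happening is
$$ P(X_{i,k} = 1) = \binom{n}{k} f_i^{\,k} (1 - f_i)^{\,n-k}, $$
it follows that this same expression also represents the expected value
$$ E[X_{i,k}] = \binom{n}{k} f_i^{\,k} (1 - f_i)^{\,n-k}. \tag{4} $$
The variables X_{i,k} are useful for us since the number of OTUs observed exactly k times is just the sum of the X_{i,k} over all OTUs i. Thus, knowing the expected value of the X_{i,k} enables us to calculate (treating the X_{i,k} for different OTUs as uncorrelated)
$$ m(k) = \sum_{i=1}^{M} E[X_{i,k}], \qquad v(k) = \sum_{i=1}^{M} \left( E[X_{i,k}^{2}] - E[X_{i,k}]^{2} \right) = \sum_{i=1}^{M} E[X_{i,k}] \left( 1 - E[X_{i,k}] \right), $$
where we have made use of the fact that since X_{i,k} takes only the value 1 or 0, X_{i,k} = X_{i,k}². For practical evaluation, it is convenient to invoke the Poisson approximation for equation 4:
$$ E[X_{i,k}] \approx \frac{(n f_i)^{k}}{k!}\, e^{-n f_i} $$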
Confidence intervals for the predicted number of OTUs sampled k times can be obtained using the likelihood ratio test (27). Since the likelihood of any given pair (a,M) of parameter values is proportional to exp(−Y/2), we can determine the ratio, R, of that likelihood to the maximum likelihood. The likelihood ratio test relates the distribution of log(R) to a certain chi-square distribution. In fact, −2log(R) follows a chi-square distribution with N − 1 degrees of freedom, where N is the number of parameters involved (two in this case). This fact enables us to establish a region in the (a,M) plane containing the point of maximum likelihood as well as all points with a value of −2log(R) of <1, corresponding to 68.26% confidence (standard error). The standard error intervals shown in Fig. 2a and b represent, respectively, the projections onto the M and a axes of the confidence regions so determined.
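The fitting procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes a power law of the form f_i = a·i^(−b), uses the Poisson approximation for the probability that an OTU is sampled exactly k times, treats OTUs as uncorrelated when computing v(k), and searches a coarse grid over (b, M) rather than (a, M), since the normalization fixes the third parameter. Because the likelihood is proportional to exp(−Y/2), the quantity −2log(R) equals Y − Y_min, so the confidence region is taken as all grid points with Y − Y_min < 1. All function names, grids, and data are hypothetical.

```python
from math import exp, lgamma, log


def power_law_abundances(b, M):
    """Normalized power law abundances f_i = a * i**(-b), i = 1..M."""
    weights = [i ** -b for i in range(1, M + 1)]
    a = 1.0 / sum(weights)
    return a, [a * w for w in weights]


def expected_counts(f, n, kmax):
    """Return lists m, v with m[k-1] = m(k) and v[k-1] = v(k), k = 1..kmax."""
    m, v = [], []
    for k in range(1, kmax + 1):
        # Poisson approximation: P(OTU i sampled exactly k times)
        p = [exp(k * log(n * fi) - n * fi - lgamma(k + 1)) for fi in f]
        m.append(sum(p))
        # Variance of a sum of uncorrelated 0/1 variables (assumption)
        v.append(sum(pi * (1.0 - pi) for pi in p))
    return m, v


def model_error(d, f, n):
    """Y = sum_k (d(k) - m(k))^2 / v(k); d[k-1] = OTUs observed k times."""
    m, v = expected_counts(f, n, len(d))
    return sum((dk - mk) ** 2 / vk for dk, mk, vk in zip(d, m, v) if vk > 0)


def fit(d, n, b_grid, M_grid):
    """Grid search for the minimum of Y and the region with Y - Y_min < 1."""
    errors = {}
    for b in b_grid:
        for M in M_grid:
            _, f = power_law_abundances(b, M)
            errors[(b, M)] = model_error(d, f, n)
    best = min(errors, key=errors.get)
    y_min = errors[best]
    region = [point for point, y in errors.items() if y - y_min < 1.0]
    return best, y_min, region


# Hypothetical library: 6 OTUs seen once, 2 seen twice, 1 seen three times.
d = [6, 2, 1]
n = sum(k * dk for k, dk in enumerate(d, start=1))  # 13 clones in total
best, y_min, region = fit(d, n, [0.5, 1.0, 1.5], [10, 50, 100])
print(best, y_min, region)
```

Projecting the points in `region` onto the M axis (or onto a, computed from b and M) gives intervals analogous to the standard error intervals described in the text. A real analysis would use a much finer grid or a numerical optimizer.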
For the viral clone libraries, all sequences (average sequence length, 493 bp) were trimmed and assembled with Sequencher 4.0 (Gene Codes Corp., Ann Arbor, MI). Since the viral sequences did not originate from a single locus, standard gene alignments could not be used to differentiate viral genotypes. Instead, we used a metagenomic definition of a “species” where an OTU has ≥98% identity over a minimum of 20 bp, as per Breitbart et al. (11). The contig spectra were as follows: rainforest [980, 8, 3, 1, 0, 0, 0…], desert [1,592, 24, 0, 0, 0…], and prairie [1,899, 13, 1, 0, 0, 0…]. The resulting contig spectra were mathematically modeled to predict community structure using PHACCS (4) and Monte Carlo simulations as described previously (8, 11). To determine the identities of the environmental viruses, the viral metagenomic sequences were compared against the GenBank nonredundant database using TBLASTX. Significant hits to GenBank entries (E value of <0.001) were classified into groups based on sequence annotation in the nonredundant database. To determine the types of phages found in the soils, the sequences were compared against a database containing 510 complete phage genomes (51) using TBLASTX (http://phage.sdsu.edu/oceanviruses). Hits with an E value of <10⁻⁶ against this database (approximately equivalent to an E value of 0.001 against the nonredundant database) were considered significant.
The nonredundant sequences from this study have been deposited in the GenBank nonredundant database and have accession numbers EF429664 through EF431845 (bacteria, archaea, and fungi). All viral sequences from this study have been deposited in the GenBank GSS database with accession numbers ER781257 through ER785833.
The rarefaction results (Fig. 1) indicate that only a portion of the richness in the bacterial, fungal, and archaeal communities (at the ≥97% sequence similarity level) was surveyed with the clone libraries, as none of the curves reached an asymptote. However, coarse estimates of microbial diversity can be obtained without sampling every individual OTU in a given community (15, 28), and we can compare relative levels of community richness and evenness in the targeted microbial taxa. Nonparametric estimators (i.e., Chao I and ACE) (41) are frequently used to estimate the total number of OTUs in a given community (6, 30). However, in all cases, the nonparametric estimates of total OTU richness failed to stabilize or reach an asymptote (data not shown), so they cannot be used to estimate the total number of OTUs within each community (34). Instead, we used a parametric technique, based on the observed OTU abundance distribution, to predict the community-level diversity of these three groups, assuming that the form of the OTU abundance distribution is the same for both the libraries and the communities as a whole. For the viral communities, which were surveyed by constructing metagenomic libraries, the OTU abundance distribution was predicted by mathematically modeling the contig spectra.
We tested four different models that are commonly used to describe microbial community structure (23, 28) and used the most appropriate model (a power law function [Table 3]) to estimate the OTU richness and evenness of each community. While more complex parametric models have been used to estimate OTU richness (23, 55), these models were not tested because there is no a priori reason to choose one type of model over another and because less parsimonious models (those with a larger number of parameters) are likely to underestimate model error. The power law model yielded the lowest model error in 9 of the 12 cases (Table 3). Table 2 shows the close correspondence between the observed number of OTUs and the power law model prediction of OTU numbers for each library. The second-best-performing model, the log-normal model, yielded estimates of OTU richness across soils and taxonomic groups that were generally similar to the estimates obtained using the power law model (Table 3). Since the levels of diversity are estimated from the OTU abundance curve, the estimates of OTU richness should be relatively robust to changes in library size (Table 2). However, for some of the OTU richness estimates, there was a wide range in the 70% confidence regions around the maximum-likelihood values (Fig. 2). This high degree of uncertainty in richness estimates reflects the difficulties associated with reliably fitting the tail of a given distribution. This is readily apparent in Table 3 and in the extremely high richness estimates for the desert archaeal and prairie fungal communities (Fig. 2). Although our clone libraries are larger than most clone libraries published to date, they are still minuscule considering the overwhelming complexity of the soil microbial communities, making it difficult to estimate the exact number of OTUs in each taxonomic group.
Due to this high degree of uncertainty, the richness estimates should be considered carefully, as they are likely to be more useful for comparing richness levels between taxonomic groups than for defining the exact number of OTUs in each of the collected soil samples. However, it is worth noting that there is far less uncertainty associated with the estimates of evenness for the individual communities (Fig. 2), as the evenness estimates are less susceptible to errors associated with predicting the specific shape of the tail end of the OTU distribution.
The model results suggest that the total OTU-level richness of bacteria, archaea, fungi, and viruses was extremely high at all sites (Fig. 2a), with the estimated richness of the last three groups equaling or exceeding the richness of soil bacteria in all habitats. The desert archaeal, prairie fungal, and rainforest viral communities were particularly OTU rich, with a minimum estimate of >10⁶ unique OTUs each (Fig. 2a), more than an order of magnitude higher than bacterial richness at the same sites. Of course, given the caveats detailed above, it is important to recognize the high degree of uncertainty inherent in these richness estimates.
The estimated differences in evenness between taxa are likely to be more robust than our estimates of total OTU richness (Fig. 2). Of the four taxonomic groups, the archaeal communities were the least even, with a single OTU accounting for >8% of the population in a given community (Fig. 2b). The fungal and archaeal communities had lower evenness levels than bacterial communities, an observation consistent with results reported elsewhere (43, 46, 61). There was no apparent correlation between the estimated evenness and richness of the communities (r² = 0.05; P > 0.5). Interestingly, the estimated probabilities of selecting two individuals of the same OTU from a community (Simpson's diversity index) (41) were relatively consistent within each taxonomic group regardless of soil type (Fig. 3). This consistency suggests that the overall structure of each of these communities is controlled by the type of microbe in question rather than the specific features of the soil environment.
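The probability of selecting two individuals of the same OTU can be computed directly from a set of proportional abundances. The sketch below uses the large-community form of the index, the sum of f_i² over all OTUs, which is an assumption about how the value was calculated here; the two example communities are hypothetical.

```python
def simpson_index(abundances):
    """Probability that two randomly chosen individuals share an OTU.

    abundances are proportional abundances f_i that sum to 1. For a very
    large community the probability is approximately sum(f_i ** 2).
    """
    return sum(f * f for f in abundances)


# Hypothetical communities with the same richness but different evenness.
even_community = [0.25, 0.25, 0.25, 0.25]
uneven_community = [0.70, 0.10, 0.10, 0.10]
print(simpson_index(even_community), simpson_index(uneven_community))
```

With richness held fixed, the less even community gives the higher probability, which is why this index responds strongly to the dominance of the most abundant OTUs.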
Although the slopes of the rarefaction curves were lower for archaea and fungi than for bacteria (Fig. 1), the differences in slopes reflect a lower community-level evenness in these groups (Fig. 2b), not necessarily a lower overall OTU richness. This point is worth reiterating; the slopes of rarefaction curves reflect both the richness and evenness of communities, and therefore, in most cases, rarefaction analyses alone cannot be used to compare richness levels of different microbial communities (30).
Not only are soil bacteria, archaea, fungi, and viruses locally diverse, but our results indicate that these groups are also globally diverse, as we observed little phylogenetic overlap between soils. None of the identified archaeal, fungal, or bacterial OTUs was found at more than one site, and we observed only one instance of an overlapping viral sequence (≥98% identity over 20 bp) between sites when all viral sequences (4,577 in total) were assembled together. While we have no way of estimating the global richness of these groups, the lack of overlap in observed OTUs between sites tells us that the global diversity of each of these groups must be very high. The century-old speculation that the global diversity of the smallest organisms should be relatively low (22) appears to be incorrect.
The estimated number of bacterial OTUs in the three plots (≈10⁴ unique OTUs [Fig. 2a]) closely matches the estimates obtained in other studies (59, 60). Our estimates of fungal richness are substantially higher than estimates obtained using classical taxonomic approaches (a maximum of 3,000 fungal species identified from a single 400-ha site) (25), confirming the results of other studies showing that molecular surveys can uncover a large pool of fungal diversity that has been overlooked (2, 33, 40, 43). Soil archaea also appear to have an equivalent, if not greater, OTU richness than soil bacterial communities, consistent with the high levels of phylogenetic diversity observed in other studies of soil archaea (46, 61). To our knowledge, there are no comparable studies of phylogenetic richness in soil viral communities. However, it is important to note that because we examined only viruses with double-stranded DNA, the true richness of viral communities at each site is likely to be even higher than our estimates.
Of the three soils examined, no individual soil harbored the most diverse community of microorganisms. The estimated number of OTUs was highest in the desert soil for archaea, the prairie soil for fungi, and the rainforest soil for viruses, while the richness of bacterial OTUs was very similar across the three soils (Fig. 2a). Due to a paucity of studies comparing microbial diversity across soils from different ecosystems and the large number of possible mechanisms that may influence levels of taxonomic richness, it is unclear how to interpret these results. Fierer and Jackson (21) found the lowest levels of bacterial diversity in rainforest soils, but their study (which estimated diversity by terminal restriction fragment length polymorphism fingerprinting) was not necessarily examining diversity at the same level of taxonomic resolution as in this study. The high estimated richness of archaeal OTUs in the desert soil is surprising considering the challenging nature of this environment, but other studies have also observed high levels of archaeal diversity in soils and other environments that are likely to be suboptimal for microbial growth (50, 61). The fungal results (Fig. 2a) are consistent with a study by Jumpponen and Johnson (33) in which high fungal diversity was also observed in soils collected from Konza Prairie, KS.
To our knowledge, this is the first study to use sequencing to characterize soil viral communities. TBLASTX comparison of the soil sequences against the GenBank nonredundant database revealed that the majority of the viral sequences showed no significant similarity to previously described sequences (E value of <0.001). Among the identifiable hits, there were numerous similarities to phages (viruses that infect bacteria) (Table 4) and to herpesviruses (data not shown). While there was very little overlap in viral sequences (≥98% identity over 20 bp) between sites (see above), comparison of the sequences against a database containing the genomes of 510 completely sequenced phages demonstrated that similar types of phages were found in all three soil types (Table 4; Fig. 4). The most abundant phage types observed in the soil samples were similar to phages that infect the soil bacteria Actinoplanes, Mycobacterium, Myxococcus, and Streptomyces, as well as the halophilic archaeon Haloarcula (Table 4). The phage types observed in the soil samples were significantly different from the dominant types found in marine or fecal samples (8, 9, 11) (Table 4; Fig. 4), suggesting that distinct habitat types harbor distinct viral communities.
A number of mechanisms may contribute to the surprising local richness of soil microbial communities (Fig. 2a). Such factors may include a high degree of microscale variability in soil properties, rapid rates of speciation, high immigration rates, and low rates of extinction (14, 18, 21, 22, 31, 66). In addition, it is important to recognize that small body size alone may partially account for the high diversity of soil microorganisms at individual sites. Since richness is often correlated with the abundance of a taxon in a given area (16, 56), which is largely a function of body size (42, 45), surveying microbial diversity in individual soils may be similar in magnitude to surveying the diversity of “macro-organisms” at continental scales. For example, estimating microbial richness in our 100-m² plots is likely to be analogous in terms of scale to estimating bird species richness (assume a body size of 10⁻³ m³) in a 10⁸-km² area. While body size alone is not likely to account for the high diversity of soil microorganisms, once we reconcile differences in spatial scale, the local richness of soil microorganisms may be more comparable to the observed levels of plant and animal richness.
Together our results confirm that we have only begun to explore the diversity of soil microorganisms. In an individual sample, our data suggest that the actual number of archaeal, fungal, bacterial, and viral “species” (or OTUs) exceeds the total number of microbial species that have been named to date (≈7,500 named archaea and bacteria combined, ≈80,000 fungi, and ≈2,000 viruses) (12, 19, 20). Clearly, the majority of the microbial diversity on Earth remains undiscovered.
We are grateful to J. Blair and M. Silman for their help with soil collection and to P. Holden, W. Cook, and M. Wallenstein for their valuable assistance on this project. We thank P. Adler, A. Martin, and two anonymous reviewers for comments on previous drafts of the manuscript.
This work was supported by grants from the Mellon Foundation and NSF to N.F.; grants from the Mellon Foundation, NIGEC/NICCR/DOE, IAI, and NSF to R.B.J.; and grants from the Gordon and Betty Moore Foundation and NSF to F.R.
Published ahead of print on 7 September 2007.