The application of 454 sequencing to protistan communities in two anoxic marine basins revealed three significant findings. First, even a sampling effort one to two orders of magnitude larger than that achieved by environmental clone library construction and Sanger sequencing did not retrieve all unique SSU rRNA gene sequences present in a single sample (Figure ). Up to 5,600 unique tags could be identified in a 7-L water sample from the Cariaco Basin without reaching saturation (sample CAR1). However, this is unlikely to reflect the true species richness, because (i) not all SSU rRNA gene copies within a species are necessarily identical [34, 35], (ii) some of the observed tag variability may be due to extreme variability of the V9 region in specific taxonomic groups, and (iii) even when the effect of sequencing and PCR errors is minimized using a systematic trimming procedure (see Methods section and [14]), the accuracy of the 454 pyrosequencing strategy (GS technology) is 99.5% to 99.75% for small subunit rRNA genes [37]. Indeed, in sample CAR1 the number of OTUs drops from 5,600 to ca. 2,600 when phylotypes are clustered based on one nucleotide difference (corresponding to ca. 0.8% sequence divergence). Thus, about half of the unique protistan tags retrieved from this sample are potentially afflicted with an error and/or represent the same taxon, and the number of unique tags most likely overestimates taxon richness. On the other hand, clustering OTUs at ten nucleotide differences (OTUs called at 10 nt, reflecting ca. 8% sequence divergence) most likely results in an underestimation, because different taxa may be lumped together into the same OTU. Consequently, it is reasonable to assume that the true taxon richness lies in the range between OTUs called at 1 nt (ca. 1,700 in sample CAR1) and OTUs called at 5 nt (ca. 1,200 in CAR1).
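To illustrate how calling OTUs at a fixed number of nucleotide differences collapses unique tags, the following Python sketch clusters a handful of toy sequences. It is not the procedure used in this study (see Methods section). The single-linkage rule, the use of edit distance, the function names and the example tags are all assumptions made for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def cluster_tags(tags, max_diff):
    """Assumed single-linkage rule: tags within max_diff edits share an OTU."""
    unique = sorted(set(tags))
    parent = list(range(len(unique)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(unique)), 2):
        if edit_distance(unique[i], unique[j]) <= max_diff:
            parent[find(i)] = find(j)

    clusters = {}
    for i, tag in enumerate(unique):
        clusters.setdefault(find(i), []).append(tag)
    return list(clusters.values())


# Hypothetical toy tags, far shorter than real V9 tags.
toy_tags = ["ACGTACGT", "ACGTACGA", "ACGTACAA", "TTTTCCCC"]
for k in (0, 1):
    print(k, "nt:", len(cluster_tags(toy_tags, k)), "OTUs")
```

With these toy tags, four unique sequences collapse to two OTUs at one nucleotide difference. Single linkage chains neighbouring tags together, so it tends to yield fewer OTUs than stricter linkage rules. The all-against-all comparison is quadratic in the number of unique tags and is meant only for small examples.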
Interestingly, even the number of OTUs called at 10 nt exceeded previous parametric and non-parametric richness estimates from the same sites, which were based on clone library-derived OTUs called at 99% or 98% sequence similarity, respectively [10, 38, 39]. There may be several explanations for this: (i) even though the sample sizes obtained from previous Cariaco and Framvaren clone libraries were relatively large, they may still have been too small to resolve these complex communities adequately, in which case previous clone library-based richness estimates are severe underestimations; (ii) the statistical error of previous richness estimates may be too large, which cannot be assessed due to a lack of good confidence intervals; (iii) abundance-based richness estimates may not reflect the true community richness or relative species abundance in a sample but rather the PCR-amplicon richness. The reasoning for the latter is that, in contrast to bacteria, the copy number of SSU rRNA genes varies widely among protists [8, 40, 41]. Thus, the relative amplicon copy number after PCR does not necessarily reflect the relative abundance of a specific taxon in a sample, rendering abundance-based species richness estimates highly erroneous. It is likely that these factors, and probably others that we cannot account for at present, resulted in severe richness underestimations. We hypothesize that protistan richness in marine anoxic waters by far exceeds previous estimates, and that anaerobic protistan communities are substantially more complex than previously reported. It will be interesting to investigate further how sequence divergence of a hypervariable SSU rRNA gene region translates into taxonomic entities. This will help in interpreting the vast diversity of tags generated by massively parallel tag sequencing.
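The sensitivity of abundance-based estimates to amplicon counts can be shown with a small Python sketch. It uses the bias-corrected Chao1 estimator, chosen here only because it is a widely used non-parametric estimator. This section does not state which estimators the earlier studies applied, and the counts below are invented for illustration.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                       # observed OTUs
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical amplicon counts per OTU for the same seven taxa.
even_copy_number = [10, 5, 2, 2, 1, 1, 1]
# Same community, but one rare taxon carries many rRNA gene copies,
# so it is no longer counted as a singleton.
high_copy_taxon = [10, 5, 2, 2, 20, 1, 1]

print(chao1(even_copy_number))   # 8.0
print(chao1(high_copy_taxon))    # about 7.33
```

The number of taxa is identical in both cases, yet the estimate shifts because the estimator depends on how many OTUs appear once or twice among the amplicons, not among the organisms.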
Most of the observed complexity was found in the low-abundance populations. Even when OTUs are called at five nucleotide differences, the proportion of rare OTUs (represented by fewer than 10 tags) ranges between 71% and 81% in FV samples and between 78% and 83% in CAR samples (data not shown), indicating that the high number of rare taxa is not an artifact of high intra-species heterogeneity in the V9 region. This corroborates, to a somewhat lesser extent, previous findings in the bacterial world [14, 15, 18]. The origin and meaning of this complexity are still unclear [42, 43]. Indeed, to date there is no evidence that this high frequency of low-abundance genotypes describes a true diversity. It could result from the amplification of detrital or extracellular DNA. On the other hand, it is reasonable to assume that a liter of water is inhabited by only a few individuals of a protist species, which never meet in this volume and are therefore subject to allopatric speciation. The result would be tremendous microheterogeneity that is reflected in these rare genotypes. One hypothesis suggests that these rare genotypes (if real) may represent a large genomic pool, which helps the protistan community to react to any biotic or abiotic changes [43]. In this seed-bank scenario, the species that are best adapted to prevailing environmental conditions would always be abundant in a community.
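The proportion of rare OTUs quoted above is a simple ratio, sketched here in Python. The threshold of fewer than 10 tags follows the text. The function name and the OTU sizes are hypothetical and do not come from the study's data.

```python
def rare_otu_fraction(otu_sizes, threshold=10):
    """Fraction of OTUs represented by fewer than `threshold` tags."""
    if not otu_sizes:
        raise ValueError("at least one OTU is required")
    rare = sum(1 for n in otu_sizes if n < threshold)
    return rare / len(otu_sizes)


# Hypothetical tag counts per OTU in one sample.
example_sizes = [500, 120, 9, 4, 3, 1, 1, 1, 1, 1]
print(rare_otu_fraction(example_sizes))  # 0.8
```

Note that the ratio counts OTUs, not tags: in this toy sample 80% of the OTUs are rare, although they hold only a small share of all tags.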
The second significant finding is the phylum richness of protistan communities that is missed by the clone library/Sanger sequencing approach. Previous environmental protistan diversity surveys at the same sites in the Framvaren Fjord ([10] and Behnke et al. unpublished, accession numbers [DQ310187 to DQ310369 and EF526713 to EF527205]) did not retrieve any sequences assigned to Apusozoa, Chrysomerophytes, Centroheliozoa, Eustigmatophytes, hyphochytriomycetes, Ichthyosporea, Oikomonads, Phaeothamniophytes, and rhodophytes, all of which have been recovered with the massively parallel tag sequencing approach. Similarly, a vast array of higher taxon ranks detected in this tag-sequencing project could not be detected with extensive clone library sampling in Cariaco ([26, 30], Edgcomb et al. in preparation). Interestingly, the tags assigned to taxonomic groups not detected via clone libraries all account for <1% of the unique protistan tags, which explains why they were missed with the clone library approach [26, 30]. For taxonomic groups represented by large relative abundances of tags (e.g. alveolates and stramenopiles), the 454 data sets agree well with clone library-derived data. Evidence of and tentative explanations for the dominance of these taxonomic groups in anoxic marine systems have been discussed intensively elsewhere (e.g. [30, 44, 45]).
The broad taxonomic representation of 454 tags demonstrates the efficiency of the primers used to target the hypervariable V9 region of eukaryote SSU rRNA genes. However, up to 50% of unique 454 tag sequences in our data sets were metazoan. This is a general problem also observed in SSU clone libraries (even though probably to a lesser extent) and is not specific to 454 technology [46-48]. Consequently, this large proportion of potential non-target tags has to be taken into account when designing protistan diversity studies using 454 technology. Either sequencing effort needs to be increased 1.5-fold to obtain the desired number of protistan tags, or group-specific 454 primers need to be applied subsequently to focus on selected protistan groups.
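The extra sequencing effort needed to offset non-target tags can be approximated with a back-of-the-envelope calculation, sketched below in Python. It assumes that the non-target share of reads stays constant as effort increases. The percentage given above refers to unique tags, so the share of total reads, which is what matters here, may differ. The function name is hypothetical.

```python
def required_fold_increase(non_target_fraction):
    """Fold increase in reads needed to keep the number of target reads."""
    if not 0 <= non_target_fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return 1 / (1 - non_target_fraction)


# A non-target share of one third of all reads implies a 1.5-fold increase;
# a share of one half implies a 2-fold increase.
for share in (1 / 3, 0.5):
    print(round(share, 2), "->", round(required_fold_increase(share), 2))
```

Under this assumption, the 1.5-fold figure corresponds to roughly one third of reads being non-target, while a sample at the upper bound of 50% would need about twice the effort.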
Our findings also reveal that higher sampling efforts can be achieved in a cost- and time-efficient way with pyrosequencing, which therefore paints a substantially more comprehensive picture of protistan communities. The degree of undersampling inherent in most published clone library-based studies may be so high that they cannot be compared in a meaningful manner to other equivalent surveys of diversity. Obtaining a comprehensive picture of a microbial community is critical to addressing fundamental questions in protistan ecology on the basis of molecular diversity surveys. Such questions include, for example, determining the true richness and evenness of microbial communities, which is important in defining microbial ecosystem dynamics [15], and determining the biogeographic distribution of specific taxonomic groups, the stability of protistan communities over time, and the local patchiness of protists. All of these community attributes are cornerstones for understanding microbial diversity, ecology, and evolution [16, 49, 50].
Some of these subjects frame the third important finding of this study. The eight sites sampled differed markedly in community composition. Based on community membership, it appears that protistan communities from the supersulfidic Framvaren Fjord, with an interface located in the photic zone, are distinct from those of a less sulfidic anoxic deep-sea site. Similarly, anaerobic protistan communities exposed to hydrogen sulfide are distinct from those that thrive in sulfide-free oxygen-depleted habitats. Even though we cannot unequivocally identify H2S as the single most important driving force shaping these protistan communities using this dataset, this observation is not unexpected: H2S detoxification requires specific adaptations that are not necessarily present in all facultative or strictly anaerobic protists [51, 52]. For example, Atkins et al. [53] found a significant difference in the hydrogen sulfide tolerance of different hydrothermal vent species they isolated, including the closely related sister taxa Cafeteria and Caecitellus. Cafeteria strains isolated by these authors could tolerate up to 30 mM sulfide under anoxic conditions over the 24 h course of their experiment, Rhynchomonas nasuta could tolerate up to 5 mM sulfide, and Caecitellus could only tolerate up to 2 mM sulfide. Symbioses between protists and sulfide-oxidizing bacteria are another adaptive strategy observed in micro-oxic environments with high hydrogen sulfide concentrations. For example, the peritrich ciliate Zoothamnium niveum, found in mangrove channels of the Caribbean Sea, depends on its sulfur-oxidizing ectobionts for detoxification of its immediate environment [54]. Scanning electron microscopy has revealed a visible diversity of ectobiotic prokaryotic associations with ciliates in the anoxic water column of Cariaco, and these associations are likely to depend on the distinct chemical nature of the basin's water column (see Additional file 1). The environmental selection pressure that acts on the phylogenetic composition of protistan communities can be of interest for the design of environment-specific phylo-chips (for an example of application see Sunagawa et al. [55]) that may help to monitor the global distribution of specific protistan communities.
The temporal and spatial resolution of our sampling strategy is insufficient to deduce temporal and spatial patterns in the protistan communities under study. Yet there is an obvious possible explanation for the observation that, in the Cariaco deep-sea basin, samples collected from the same depth at two different points in time are distinctly less similar to each other (samples CAR2 and CAR4 in Figure , UPGMA) than those from the shallow Framvaren Fjord (samples FV2 and FV4): surface waters of the Cariaco Basin are subject to strong seasonal upwelling, driving as much as 13-fold excursions in net primary production (NPP) between upwelling and non-upwelling seasons [22]. This causes significant seasonal variations in vertical carbon fluxes, which seem to be very important not only for the dynamics of viral [27] and bacterial communities [56] in such systems, but also for protistan communities, even though the exact mechanisms by which vertical carbon flux variations may act on protistan communities are largely unknown. One possibility is that, because of selective interactions of protists with specific bacteria [57-59], changes in vertical carbon flux that directly influence bacteria act indirectly on protistan communities.
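How similarity based on community membership can be quantified before a UPGMA-type clustering is sketched below in Python. The Jaccard distance is assumed here as one common membership-based measure. This section does not state which index underlies the UPGMA analysis, and the sample names and OTU identifiers are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |shared OTUs| / |all OTUs| for two samples' OTU sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(a & b) / len(union)


# Hypothetical presence/absence data: OTU identifiers found in each sample.
samples = {
    "site_A_t1": {"otu1", "otu2", "otu3", "otu4"},
    "site_A_t2": {"otu1", "otu2", "otu3", "otu5"},
    "site_B_t1": {"otu1", "otu6", "otu7", "otu8"},
}

for x, y in combinations(sorted(samples), 2):
    print(x, y, round(jaccard_distance(samples[x], samples[y]), 2))
```

A matrix of such pairwise distances is the usual input to average-linkage (UPGMA) clustering. Because only presence and absence enter the calculation, the measure is unaffected by differences in amplicon copy number between taxa.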
At first glance it seems disturbing that metazoa accounted for up to ca. 50% of all eukaryote tags (Figure ). Because most metazoans are very sensitive to anoxia and hydrogen sulfide, this raises the question of whether these tags represent organisms that could plausibly live in the geochemical environments under study or rather represent contamination. On careful consideration, such high proportions of unique metazoan tags are not unexpected: body parts, eggs or planktonic larvae of an individual taxon that may have been present in the 5 to 10 liter water samples used for DNA extraction would contribute tremendous amounts of genomic DNA compared to the few individuals of a protistan taxon. Therefore, the SSU rRNA gene copies of this individual metazoan taxon would by far outnumber the SSU rRNA gene copies of any protistan taxon, resulting in high proportions of metazoan tags. For example, one individual copepod contributes almost 9,000 nearly identical amplicons to the FV1 amplicon library (Additional file 2). To account for the intrinsic error rate of the pyrosequencing technique (see above) and for intraspecies SSU rDNA polymorphisms, as described above for the protistan data, we also clustered all metazoan tags at one to five nucleotide differences in a separate analysis. Indeed, the proportion of unique metazoan tags decreased decisively (Additional file 3), accounting for only 3.9% to 11.4% (Additional file 4) of total eukaryote tags when clustered at five nt differences (ca. 2% sequence divergence). Data serving as the basis for the relative distribution of taxonomic groups presented in Figures 4-9 can be found in Additional file 5.
Only a few taxa accounted for most of these metazoan tags, which belonged predominantly to copepods, cnidaria, ctenophores, molluscs and polychaetes (Additional file 2). Copepods can survive anoxia and high hydrogen sulfide concentrations for long periods of time [60]. Several molluscs [61], cnidarians, ctenophores [62] and polychaetes are also tolerant of anoxia [63]. Even Bryozoa, which were detected in three of the samples (Additional file 2), are capable of thriving under anoxic conditions [64]. Thus, the detection of metazoan sequences in anoxic environments by domain (Eukarya)-specific PCR primers is not surprising. Yet, with the exception of copepods, which we frequently observe at least in the oxic-anoxic interfaces of our sampling sites, we did not confirm the presence of these metazoan taxa in the water samples under study by visual inspection. This is mainly because we only screened 20-μl aliquots microscopically (for protistan target taxa). As a result, small forms (life stages) of larger metazoans, or small metazoans like bryozoa represented in our amplicon libraries, may easily have been overlooked. It is reasonable to assume that the metazoan amplicons represent a mixture of allochthonous material (see the detection of a hymenopteran phylotype in FV4 that is represented by nearly 5,000 amplicons) and autochthonous organisms. However, taking into account the low proportion of unique metazoan taxa when clustered at 5 nt differences and the high likelihood that most of the metazoans represented by the non-protistan tags are indigenous, it is reasonable to consider contamination in general an insignificant issue.
This study shows that 454 pyrosequencing of the V9 region, when paired with rigorous downstream data processing, is more time- and cost-efficient and produces a much more comprehensive picture of the protistan community than Sanger sequencing of clone libraries, allowing for better estimates of community complexity. While direct comparison of the Framvaren and Cariaco communities is complicated by multiple physico-chemical differences between these two sampling locations, protistan communities in the supersulfidic Framvaren Fjord, with an interface located in the photic zone, can be distinguished on the basis of community composition from those in the deep-sea anoxic and less sulfidic environment. Furthermore, protistan populations in the sulfide-free oxic/anoxic interface in both Framvaren and Cariaco are distinct from those that are exposed to hydrogen sulfide. However, the specific environmental factors structuring protistan communities remain unknown.