|Home | About | Journals | Submit | Contact Us | Français|
Microbial eukaryotes (nematodes, protists, fungi, etc., loosely referred to as meiofauna) are ubiquitous in marine sediments and likely play pivotal roles in maintaining ecosystem function. Although the deep-sea benthos represents one of the world’s largest habitats, we lack a firm understanding of the biodiversity and community interactions amongst meiobenthic organisms in this ecosystem. Within this vast environment key questions concerning the historical genetic structure of species remain a mystery, yet have profound implications for our understanding of global biodiversity and how we perceive and mitigate the impact of environmental change and anthropogenic disturbance. Using a metagenetic approach, we present an assessment of microbial eukaryote communities across depth (shallow water to abyssal) and ocean basins (deep-sea Pacific and Atlantic). Within the 12 sites examined, our results suggest that some taxa can maintain eurybathic ranges and cosmopolitan deep-sea distributions, but the majority of species appear to be regionally restricted. For OCTUs reporting wide distributions, there appears to be a taxonomic bias towards a small subset of taxa in most phyla; such bias may be driven by specific life history traits amongst these organisms. In addition, low genetic divergence between geographically disparate deep-sea sites suggests either a shorter coalescence time between deep-sea regions or slower rates of evolution across this vast oceanic ecosystem. While high-throughput studies allow for broad assessment of genetic patterns across microbial eukaryote communities, intragenomic variation in rRNA gene copies and the patchy coverage of reference databases currently present substantial challenges for robust taxonomic interpretations of eukaryotic datasets.
The deep-sea benthos harbors vast numbers of eukaryotic meiofauna (organisms 38μm-1mm in size, such as nematodes, protists, fungi, etc.). Yet, there exists a well-recognized gap in the taxonomic understanding of their biodiversity. As a consequence, while we know that there are large numbers of individuals (50,000–5 million individuals per square meter (Vanhove et al. 1995; Danovaro et al. 2000)) from many different phyla (generally >15) in deep ocean sediments, we do not understand the general patterns of species-level distribution (cosmopolitan vs. local) for these communities. Knowledge of biogeographic patterns for these microscopic taxa directly impacts our understanding of global biotic diversity, and of the evolutionary mechanisms that shape their distribution. In addition, meiofaunal communities perform key ecosystem roles such as nutrient cycling and sediment stability in benthic marine habitats. Our continuing ignorance of species distributions and community structure in microbial eukaryote taxa currently precludes any informed mitigation and remediation of marine habitats in the wake of anthropogenic disturbance (e.g. oil spills).
The Baas-Becking hypothesis ‘everything is everywhere, but the environment selects’ (E is E) is among the most broadly discussed hypotheses for the biodiversity of small organisms (<2 mm) (Beijerinck 1918; Baas-Becking 1934; Finlay 2002; Fenchel and Finlay 2004; Kellogg and Griffin 2006). In this hypothesis small creatures with the potential for dispersal will have cosmopolitan distributions and will be found in common habitats with little or no evidence of historical constraints. Although this generalization has been challenged for some organismal groups (Lachance 2004; Foissner 2006; Telford et al. 2006; O’Malley 2007) it remains to be tested broadly across large numbers of meiofaunal taxa. Much of the argument pits traditional taxonomic assessment against molecular methods, where traditional approaches are viewed as ineffective at distinguishing closely related species, resulting in large-scale distributions of morphospecies (Todaro et al. 1996; Westheide and Schmidt 2003). The inherent nature of the E is E question requires intensive examination of phyla with severely underdeveloped taxonomies and an undescribed diversity so great that it would be unfathomable to surmount using traditional methods alone. Thus, molecular studies become the most viable option for assessing truly global patterns amongst meiofaunal communities. Limited genetic evidence has so far suggested contrasting patterns for meiofaunal phyla; some taxa appear to maintain broad, or even global species ranges (bdelloid rotifers, nematodes (Fontaneto et al. 2006; Derycke et al. 2008; Fontaneto et al. 2008)), while other evidence implies that geographic speciation may restrict microbial eukaryotes to specific regions or habitats (tardigrades, rotifers (Mills et al. 2007; Guil et al. 2009; Robeson et al. 2011)) However, the limited taxonomy of these small bodied, diverse phyla has limited traditional approaches and hindered large-scale assessments of their biogeographic patterns.
To address these questions we have applied a metagenetic approach (Sogin et al. 2006; Creer et al. 2010) to characterize the biodiversity of microbial eukaryote communities (typically, organisms that are retained on a 25–64μm sieve but pass through a 0.5mm mesh) across bathymetric gradients and oceanic basins. Here, we distinguish between the terms ‘meiofauna’ and ‘microbial eukaryotes’. Although sediment extraction protocols are typically biased towards the recovery of metazoan species (Creer et al. 2010), metagenetic studies additionally recover eukaryotic groups (fungi, protists, algae) not traditionally classified as meiofaunal taxa. This terminology thus reflects the broadened taxonomic coverage of high-throughput sequencing approaches. Two regions of the 18S rRNA gene (~400bps each) were independently sequenced from 12 environmental samples representing five deep-sea Pacific sites, five deep-sea Atlantic sties, and two shallow water sites: one in the Gulf of California and another in the Gulf of Mexico (Supp. Table S1).
A total of 12 samples were utilised in this study (Figure S2, Table S1B), representing ten deep-sea cores (5 Pacific sites and 5 Atlantic sites). Deep-sea samples were collected using a multicore; the top 0–5cm section was taken from one core tube of a multicorer deployment at each location. An approximately equal volume of sediment was collected from intertidal sample sites using a plexiglass coring tube. All sediment was immediately frozen upon collection. The meiofauna fraction of all samples was subsequently extracted according to standard protocols (Somerfield et al. 2005) for decantation and flotation in Ludox™ using a 45μm sieve.
Per sample, environmental DNA was extracted from 200μl of sediment via bead beating using a Disruptor Genie (Zymo Research, Orange, CA). Two diagnostic regions of the 18S gene were amplified from environmental extracts using MID-tagged fusion primers SSU_F04/SSU_R22 (Blaxter et al. 1998) and NF1/18Sr2b (Porazinska et al. 2009). All reactions were carried out using 2μl of environmental genomic templates and the DyNAzyme EXT PCR kit (New England Biolabs) under the following reaction conditions: 95°C for 2 min followed by 30 cycles of denaturation at 95°C for 1 min, annealing at 50°C for 45 sec, extension at 72°C for 3 min, with a final extension of 72°C for 10 min. All PCR products were visualized on a 1.5% agarose gel containing Ethidium Bromide. Amplicons were purified using the QIAquick PCR purification kit (QIAGEN), and equimolar concentrations of all samples were submitted for sequencing. Sequencing was carried out on the GS FLX Titanium platform, returning a total of 1,316,988 sequence reads that averaged 350–450 bps in length; raw data has been deposited in Dryad (http://dryad.org) for public download.
Raw sequence reads were processed and clustered using both the OCTUPUS pipeline (Fonseca et al. 2010; available at http://octupus.sourceforge.net/), and UCLUST within the QIIME toolkit (Caporaso et al. 2010). In each workflow, short reads (<200 bp) were eliminated and sequences were de-multiplexed and trimmed for quality; quality checks were carried using Lucy-trim (Chou and Holmes 2001) in OCTUPUS and split_libraries.py in QIIME, returning 1,258,077 and 969,343 processed sequence reads, respectively, across the 12 marine sites (Table S1A); Reads were subsequently separated into two datasets reflecting sequencing direction (forward-primer and reverse-primer datasets) containing both 18S gene regions. Forward and reverse sequence reads were independently seeded and clustered into Operational Clustered Taxonomic Units (OCTUs) using MegaBLAST (Zhang et al. 2000) in OCTUPUS and UCLUST in QIIME with a defined pairwise sequence identity cutoff value (95–99%). In OCTUPUS, MUSCLE alignments (Edgar 2004) were used to generate consensus sequences from each OCTU; in UCLUST, a representative sequence read was chosen to represent each clustered OCTU. Taxonomic assignments were generated for the final list of OCTU consensus/representative sequences, using MegaBLAST to retrieve the top scoring hit existing in GenBank as of May 2011. OCTU sequences that did not return any significant hits (<90% sequence identity) were labelled as ‘no match’. Chimeric OCTUs were identified using the chimera.pl script in OCTUPUS, and the Blast Fragments approach (identify_chimeric_seqs.py in QIIME) for UCLUST data.
Base error rates for OCTU clustering in OCTUPUS were calculated according to the proportion of OCTUs containing obviously mis-assigned MID-tag reads; estimates were very low, ranging from 0.015–0.026% of total sequence reads. Baseline error rates were further used as a cutoff to eliminate OCTUs with potentially erroneous MID-tag assignments and eliminate incorrect geographic inferences in OCTUPUS; the number of OCTUs falling below baseline error estimates ranged from 0.07–1.94% of total OCTUs per dataset. OCTUs that showed a high identity to human DNA or common lab contaminants were eliminated from datasets. Putatively chimeric sequences were both removed and included in analyses in order to assess biodiversity patterns under varying degrees of stringency; removal of chimeras did not effect the inferred biogeographic patterns in sequence datasets. Taxonomic proportions were calculated using forward sequence reads, excluding all flagged chimeras; proportional changes across datasets were assessed using a two-tailed Z-test for proportion (n=2). Phylogeographic comparisons were conducted using ‘well-sampled’ OCTUs, defined as those which contained a minimum number of sequence reads per cluster (with a cutoff set at 3 Standard Deviations below the mean); for 95% clustered datasets, this ‘well-sampled’ cutoff was defined as OCTUs containing ≥47reads in OCTUPUS data and ≥40 reads in UCLUST data, while for 99% clustered datasets the defined cutoff was OCTUs containing ≥2 reads in OCTUPUS and ≥4 reads in UCLUST. This subset of OCTUs was used as a statistically defined estimate of regionally restricted versus putatively cosmopolitan taxa (Table 2) amongst marine eukaryotes.
Phylogenetic community analyses were conducted using the Fast UniFrac online toolkit (http://bmf2.colorado.edu/fastunifrac)(Hamady et al. 2010). Consensus OCTU sequences (representing both diagnostic regions of the 18S gene) were aligned to curated 18S structural alignments using the SINA aligner available on the SILVA rRNA database (http://www.arb-silva.de). Relaxed Neighbor-Joining trees were subsequently constructed using the ARB software suite (Ludwig et al. 2004) and Clearcut program available within the Xplor program (Frank 2008). Principle Coordinate Analyses and Jackknifed cluster analyses were run using both weighted (using normalized abundance values for sequences reads per OCTU) and unweighted (no abundance information) datasets; default sequence counts and a range of bootstrap permutations (100–1000) were used to calculate Jackknife replicates. Observed biogeographic patterns were concordant across all UniFrac tools and parameters. A subset of OCTUs (those containing >10 sequence reads) whose assigned taxonomy was denoted as “Environmental” or “No Match” was manually examined in NJ tree topologies; the phylogenetic placement of these sequences was as a basis for subsequent taxonomic inferences.
Rarefaction curves were estimated using the parallel_multiple_rarefaction.py script in the QIIME pipeline. Prior to calculation of alpha diversity, high quality, trimmed sequence reads were clustered via UCLUST (99% similarity cutoff), with no separation of forward and reverse sequence reads. Rarefaction curves were calculated by subsampling the resulting OCTUs, with pseudoreplicate datasets containing between 10 and 20682 OTU sequences (in steps of 2067) with 10 repetitions performed per pseudoreplicate.
Environmental amplicons from 12 marine sites were pooled and sequenced on a full plate run of the GS FLX Titanium platform (Roche), returning ~1.3 million raw sequence reads. The highest quality reads were assigned to operationally clustered taxonomic units (OCTUs) using pairwise distance-based clustering in multiple computational pipelines (UCLUST and OCTUPUS); in each toolkit, clustering was repeatedly carried out under a range of sequence identity values (95–99%). A subset of reads (the F04/R22 locus from Atlantic sties 22#1 and 25#2) were subjected to stringent denoising in AmpliconNoise followed by Perseus (Quince et al. 2011); denoising pipelines were found to be too computationally intensive to run on the full 454 dataset. Empirical evidence suggests that 95% clustering tends to lump closely related biological species and genera, while 99% clustering effectively splits species into multiple OCTUs representing dominant and minor 18S variant sequences within individual species (Porazinska et al. 2010); thus, we focused on investigating biological patterns using these two parameters (‘relaxed’ clustering at 95% and ‘stringent’ clustering at 99%) as representing extreme ends of the clustering spectrum. Taxonomic assignments for each OCTU were derived from the top-scoring BLAST match (exhibiting >90% pairwise identity) recovered in public sequence databases. All analyses, including denoised datasets, recovered a diversity of eukaryotic taxa and suggested high levels of species richness across the marine samples analyzed. Denoising dramatically reduced the number of OCTUs (Table 1) but did not impact taxonomic inferences for the subset of sites analysed; the identities of abundant taxa were consistent irrespective of denoising. Similar taxonomic information was recovered from independently sequenced 18S gene regions (Table S2), illustrating the broad taxonomic coverage obtainable with conserved metazoan 18S primers. Our protocols were able to recover a substantial number of unicellular eukaryotes and 25 metazoan phyla (Table S2), including two of the most recently discovered and enigmatic lineages: Gnathostomulida and Loricifera (Littlewood et al. 1998; Sorensen et al. 2008). Despite this seemingly comprehensive coverage, it is likely that experimental biases (loss of taxa during sediment processing, failed primer binding) inherently prevented the recovery of all eukaryotic taxa present in marine samples. Rarefaction curves (Chao1 and Observed Species metrics, Figure 1) indicate that eukaryotic diversity was not exhaustively characterized, despite a deep sequencing effort across sample locations. Taxonomic proportions recovered at each sample site (Figure 2) further reveal a high variability in eukaryotic community assemblages, regardless of habitat; this variability is also supported by denoised data (Figure S1)
A notable portion of OCTUs (~10%) recovered no significant match (sequence identity <90%) to known ribosomal sequences. Although these taxa potentially represent novel eukaryotic lineages, the failure to recover a close sequence match likely reflects taxonomic gaps in public databases (Berney et al. 2004). To explore this phenomenon further, we manually investigated the phylogenetic placement of ‘Environmental’ and ‘No Match’ OCTUs (containing >10 raw sequence reads in forward-read datasets clustered at 99% identity) within Neighbour-Joining tree topologies. All examined sequences display similarity to taxa within known eukaryotic groups, although our focused analysis suggests that many of these OCTUs represent deep lineages in known phyla. Out of 986 unknown OCTUs examined, the majority (67.8%) represented unicellular eukaryotes, while the remaining OCTUs were assigned as nematodes (12.2%), algae (7.8%), stramenopiles (6.8%), fungi (3.7%), or other metazoa (1.7%). Very few unknown OCTUs grouped within the Arthropoda (0.5%), or Annelida (0.1%), suggesting that 18S data for these groups is relatively robust compared to other taxa.
Statistically-defined cutoffs (see materials and methods) were used to extract ‘well-sampled’ OCTUs and infer patterns of species distributions. At both stringent (99%) and relaxed (95%) clustering values, a subset of OCTUs appear to have cosmopolitan distributions spanning disparate geographic locales (present in both deep-sea Pacific and Atlantic sites, Table 2) or large depth gradients (present in intertidal and deep-sea sediments, Table 2). The proportion of these putatively cosmopolitan and eurybathic taxa drops dramatically with increasing clustering stringency; under relaxed clustering (95% sequence identity) in the OCTUPUS pipeline 75% of OCTUs were recovered as cosmopolitan and 37% appeared eurybathic, while these proportions drop to 9.08% and 1.5%, respectively, in stringently clustered datasets (Table 2). Similar patterns were evident after independent OTU clustering in UCLUST (Table 2). These results confirm cosmopolitan distributions amongst meiofaunal eukaryotes, although it appears to be the exception rather than the rule for marine taxa. Under the most stringent clustering parameters, the number of putatively eurybathic OCTUs is six to eight-fold lower than cosmopolitan deep-sea OCTUs; this distinct separation of deep-sea and shallow water taxa may reflect the significant physical differences between these two habitats—most taxa probably lack the physiological adaptations required to surmount large bathymetric gradients. Conversely, the deep-sea environment is largely stable, perhaps allowing increased survival rates and encouraging long-distance dispersal across a physically homogenous habitat.
In a phylogenetic analysis of community structure, Principle Coordinate Analyses (Figure 3A, 3C) and Jackknifed Clustering Analyses (Figure 3B, 3D) further supported a distinction between intertidal and deep-water sample sites as well as a notable separation of deep-sea Pacific and Atlantic sites; the same patterns were observed for OCTUs clustered using both stringent (99%) and relaxed (95%) pairwise identity cutoffs. In phylogenetic diversity analyses, deep-sea sites showed a higher degree of similarity in eukaryotic community structure, although there is an overall separation between Atlantic and Pacific Ocean basins. Since the present study included a limited number of intertidal sampling sites (2 locations), we further incorporated an expanded dataset including Fonseca et al.’s (2010) SSU pyrosequencing data (homologous to Region 1 in this study) from nine additional intertidal sites along the UK coastline. In this independent analysis, the observed geographic patterns remained consistent and we additionally recovered the same clustering patterns amongst UK sites as reported in the Fonseca et al. study (Figure S2).
Although the literature is peppered with attempts to address “Everything is Everywhere”, the hypothesis itself remains virtually untestable. The globally exhaustive sampling strategy and concurrent sampling depth required to determine if E is E for microbial eukaryotes is fundamentally prohibitive. Nevertheless, the explosion of high-throughput sequencing methods now enables us—for the first time—to test broad scale patterns of species distributions and community assemblages for these understudied taxa. In marine habitats, there is ample evidence to suggest that depth gradients limit species distributions (Howell et al. 2002; Aldea et al. 2008) seemingly far more than geographic distance (Pawlowski et al. 2007). Although our sample set was limited to 12 sites, our phylogenetically-informative analyses overwhelmingly supported a distinct separation of shallow water and deep-sea communities; despite the additional inclusion of relatively proximate shallow water sites (Littlehampton and Prestwick, UK), eukaryotic communities at deep-sea Atlantic sites strongly cluster with deep-sea Pacific sites located ~9000 kilometers away (95% bootstrap, Figure S3). Furthermore, geographic proximity did not always correlate with similar deep-sea community structures. Pacific sites 422 and 528 had similar community assemblages (Figure 2) and exhibited a close association in Jackknife cluster analyses (>99.9% support in Figure 3D) despite being separated by ~400km and 1000m in depth; Pacific site 321 and 422 are physically closer (only ~36km apart), yet show different taxon abundances (Figure 2) and are distinctly separated in both Jackknife cluster analysis and PCoA. These observations were concordant across both 18S gene regions, and suggest a complex pattern of species distributions at smaller scales in the deep-sea.
The assigned taxonomic identities of OCTUs suggest an inherent bias amongst microbial eukaryotes displaying cosmopolitan and eurybathic distributions. Such bias was evident regardless of clustering methods applied to raw data, although clustering method appeared to have some effect on the reported taxonomies of eurybathic and cosmopolitan deep-sea OTU subsets. In OCTUPUS and UCLUST, fungal OCTUs consistently account for a significantly larger proportion of putatively eurybathic and cosmopolitan deep-sea OCTUs (Figure 4 and Figure S4; Z-test, p < 0.003 in all datasets), implying that fungal species may be particularly adept at dispersing in marine environments. In UCLUST, unicellular eukaryotes always represented a smaller proportion of ubiquitous taxa (Z-test, p < 0.005 in all datasets) compared to the full environmental dataset (Figure S4B, S4E), although these taxa appeared to show increased dominance amongst cosmopolitan deep-sea OCTUs (Z-test, p < 0.000 amongst cosmopolitan taxa at 95% and 99%). These observations may reflect differences in the two computational approaches; the process of OTU clustering is not well understood for eukaryote taxa with pervasive intragenomic rRNA variation, and thus our biological interpretations of high-throughput data may be easily skewed without extensive investigation of such dubious patterns (e.g. by tracking OTU membership via raw sequence reads). Overall proportions OCTUs from UCLUST datasets generally corresponded to the taxonomic distributions observed in OCTUPUS data at both 95% and 99% clustering cutoff (Figure 4A, 4D and Figure S4A, S4D). Amongst cosmopolitan taxa, a striking taxonomic bias was observed within the Phylum Nematoda (Figure S5) where the order Enoplida was the sole group to show a significant proportional increase under both stringent and relaxed clustering in OCTUPUS (Z-tests, p=0.008 at 95%, p=0.000 at 99%), constituting 85% of cosmopolitan nematode OCTUs under the most stringent parameters (99% clustering). These observed taxonomic biases amongst cosmopolitan taxa may reflect specific life history traits; many unicellular eukaryotes are known to produce dispersive propagule stages (Alve and Goldstein 2003), and recent evidence suggests mechanisms for long-distance dispersal in Enoplid nematodes (Bik et al. 2010).
Exploring OCTU clustering in the context of geography offers substantial insight into the relationship between shallow and deepwater eukaryote communities. In comparisons of datasets clustered under different parameters (stringent vs. relaxed pairwise cutoffs), deep-sea species inhabiting Pacific and Atlantic sites show lower overall genetic divergence, versus taxa found across depth gradients (shallow to deep) where genetic divergence appears more pronounced. Half to three-quarters of deep-sea OCTUs are recovered as ‘cosmopolitan’ taxa using a relaxed 95% cutoff, whereby OCTUs contain sequences representing both Pacific and Atlantic locations; however, under stringent clustering parameters, the majority of OCTUs (88–90%) contain sequence reads from only one ocean basin (Table 2). This implies that within phyla, many closely related (but geographically isolated) deep-sea taxa show less than 5% divergence in rRNA genes. Increasing the maximum pairwise identity cutoff to 99% during OCTU clustering discourages lumping of reads and returns many regionally restricted OCTUs that may reflect a more accurate picture of biological species distributions. In contrast, a significant genetic divergence is apparent across bathymetric gradients; the number of ‘eurybathic’ taxa (OCTUs containing raw reads isolated from both shallow water and deep-sea sites) is low even under relaxed parameters (19–37% of OCTUs clustered at 95% identity) and drops to a minute subset of taxa (~1.5%) under stringent clustering parameters (Table 2). These patterns suggest that sister taxa in shallow water habitats exhibit a larger divergence in rRNA loci (>5%), and this reduces lumping of OCTUs under relaxed clustering. The lower pairwise distance amongst deep-sea OCTUs is further supported by full-length 18S sequence data from Enoplid nematodes (Bik et al. 2010); while nematodes exhibited consistent, deep divergences between shallow water and deep-sea clades, the internal structure of deep-sea groups showed no correlation with geography. These observations support an overall lower genetic divergence across deep-sea communities. These patterns may be indicative of a relatively recent geographic isolation between many microbial eukaryote taxa, with a shorter coalescence time between closely related deep-sea species compared to their shallow water counterparts. The extreme conditions and static nature of deep-sea habitats could also plausibly lead to slower rates of evolution in meiofaunal rRNA genes—a scenario that could provide an alternative explanation for the observed results.
Metagenetic methods offer much promise for gaining deep insight into diverse (and historically understudied) benthic communities in marine environments. In addition to piecing together global biogeographic patterns for microbial eukaryote species, the relative abundance of taxa can offer powerful insight. Although the average proportions of taxa were consistent when all samples were pooled (e.g. total OCTUs in OCTUPUS and UCLUST clustering, Figures 4 and S4), a fine-scale resolution becomes apparent when the eukaryotic community structure is examined at individual sample sites. In taxonomic studies, nematodes are often cited as the most abundant members of sediment communities (Lambshead 2004), but our deep sequencing results suggest an equal or more dominant role for other taxonomic groups at some sites. Recent high-throughput studies have uncovered similar hidden abundances (Platyhelmithes, (Fonseca et al. 2010) ), likely reflecting historical artefacts from traditional preservation methods. In our results, many sites exhibited lower relative abundances of nematodes and specific dominance of other taxonomic groups (e.g. Unicellular Eukaryotes at Pacific237, Platyhelminthes at the Shallow California site; Figure 2). It is presently unclear whether these patterns represent stochastic variation, or correlate with biological assemblages or specific habitat characteristics. Deep-sea Atlantic samples were collected from submarine canyon ecosystems off the Iberian margin; previous evidence suggests that continual disturbance in canyons results in lower species richness in nematodes, with dominance of predatory and scavenging genera (Ingels et al. 2009). Our results from canyon sediments reported a lower number of sequence reads equating to fewer OCTUs (Atlantic 22.1 and 25.2; Figure 2, Table S1A), an increased dominance of fungal taxa (Atlantic sites 25.2, 43, 45; Figure 2), and many predatory nematode taxa amongst the most abundant OCTUs at Atlantic sites. These observations may be indicative of increased physical disturbance in these canyon habitats compared to Pacific deep-sea locations, although further work is needed to fully characterize these preliminary insights.
Regardless of habitat, microbial eukaryote communities appear to be extremely complex and diverse across marine habitats. Rarefaction curves did not reach asymptote at any site (despite our deep sampling effort), suggesting a substantial amount of undiscovered biodiversity. Understanding ecological interactions through taxon presence/absence, relative abundance data, and phylogenetic lineages is the ultimate goal of high-throughput studies. However, there currently no robust bioinformatic pipelines for teasing out subtle associations between habitat metadata and high-throughput sequence data. Developing tools to provide more detailed taxonomic insight and a deeper exploration of community diversity analysis (e.g. within QIIME (Caporaso et al. 2010)) is an active, and exciting, area of research. Future high-throughput studies will additionally need to adopt a more detailed sampling regime (e.g. intensive assessment at multiple spatial scales) to clarify the scale and extent of community variation across marine ecosystems.
Assigning accurate taxonomy to eukaryotic OCTUs is inherently more difficult than popular approaches utilized in bacterial studies. While there are a number of alignment-based tools in existence which utilize secondary structure information for classification (Greengenes and RDP classifier, both implemented in QIIME (Caporaso et al. 2010)), these are currently restricted to bacterial and archaeal taxa; thus, utilizing pairwise identity scores (via BLAST) is currently the most informatically feasible option for large eukaryotic rRNA datasets. For eukaryotes, BLAST-derived taxonomy can provide a robust overview of communities at higher taxonomic levels, but such assignments data should generally be treated with caution. Our approach required a minimum pairwise identity score >90% for OCTUs to receive a taxonomic assignment from BLAST searches. The relative paucity of eukaryotic sequence data in public databases (versus known taxonomic diversity) results in many OCTUs displaying no significant hit, or the recovery of top BLAST matches with very low sequence identity (often below 95%); thus, many OCTU assignments can only be trusted down to phylum level, at best. “Environmental” OCTUs or those with no sufficient BLAST match represented nearly a quarter of clusters at under both stringent and relaxed clustering parameters (Figures 4 and S4). Aligning these sequences and manually importing them into a SILVA reference phylogeny can confirm the biological reality and taxonomic identity of these OCTUs; however, interpreting OCTU data in such an evolutionary context is labor intensive, with no automated approaches currently in existence for high-throughput eukaryotic data. Database modifications can also significantly impact analyses—for example, the increase in ‘environmental’ hits in denoised datasets (Figure S1, versus prior UCLUST analysis in Figure 2) likely stems from new depositions in GenBank, and adds a frustrating layer of ambiguity when attempting subsequent comparisons. Furthermore, even if OCTUs exhibit high BLAST scores to classified sequences, the hierarchical NCBI taxonomy and user-submitted annotation cannot always be trusted as accurate. Manual examination of BLAST results and subsequent phylogenetic examination of OCTUs reveals the presence of Acoela and Nemertodermatida (two groups of early-branching bilaterians) in our marine dataset. Accepting the assigned phylum-level NCBI taxonomy currently lumps these flat worms into the Platyhelminthes and ignores their distinct evolutionary origins (Hejnol et al. 2009; Mwinyi et al. 2010; Philippe et al. 2011), potentially hindering ecological inferences from high-throughput datasets. Because of these persistent issues, interpreting taxonomy within a phylogenetic framework will be critical for future studies.
Although our sampling scheme was biased and not globally intensive, the included variables have produced exciting insight into global distributions of microbial eukaryotes. OCTUs cannot necessarily be reconciled with biological species; lower cutoffs are known to lump taxonomic genera or even orders together, while the most stringent cutoffs (e.g. 99%) can substantially oversplit well-established species (Porazinska et al. 2010). Despite this proviso, a subset of eurybathic and cosmopolitan taxa were consistently recovered across a range of OCTU clustering parameters (95–99% identity in UCLUST and OCTUPUS) and these patterns withstood stringent, statistically-determined filtering measures. Although each OTU may not represent a separate species, the surprising consistency of biogeographic patterns must correlate (at least to some degree) with true biological species. High-throughput environmental datasets are often plagued by sequence chimeras (PCR artefacts) generated during initial gene amplification; in clustered metagenetic datasets, it is thought that these chimeras tend to appear as sample-specific OCTUs (e.g. restricted to single PCR reactions) containing relatively low numbers of sequence reads. This fact, coupled with our stringent pruning of putatively chimeric OTU sequences, means that chimeras cannot be used to discount the cosmopolitan and eurybathic distributions we observed in a subset of meiofaunal taxa. In addition, the cladogram of marine sites (Figure 3B,3D) was inferred on the basis of phylogenetic lineages shared between geographic locations (thus, non-chimeric OCTUs); these striking patterns would not be recovered if PCR artefacts were substantially impacting our analyses. These results strongly support cosmopolitan distributions for some microbial eukaryotes in marine habitats.
Finally, we acknowledge the taxonomic bias which may have been imposed during our extraction protocols; although the use of a 45um sieve is established practice for meiofauna studies, mesh pores are still relatively wide and may favor the recovery of larger species or life stages, particularly in finer deep-sea sediments. Despite this methodological bias, our interpretation of cosmopolitanism is not obviously skewed—organisms with both small and large body sizes were recovered within our subsets of cosmopolitan and eurybathic taxa (Figure 4). High-throughput studies are only now scratching the surface of global eukaryotic biodiversity; biogeographic patterns recovered in this study are essentially preliminary insights into an emerging field of research. As more locations are scrutinized and sequencing effort becomes increasingly deeper with newer technology, a focus on metadata and sophisticated computational tools will allow us to develop an organic understanding of marine ecosystems.
Table S1: Sample metadata and distribution of sequence reads across 12 marine sites. Table S1A: Number of raw reads obtained after independent quality checks and filtering in the QIIME and OCTUPUS pipelines. Table S1B: Corresponding metadata for each marine site analysed, including sediment type, collection date, GPS coordinates and depth information.
Table S2: Taxonomic Breakdown of all OCTUs recovered at 95% and 99% sequence identity cutoffs in the OCTUPUS pipeline, based on top-scoring BLAST matches obtained from two 18S gene regions. Comparable taxonomic diversities were observed in both independently analyzed gene regions. OCTUs denoted as ‘no match’ did not recover any significant BLAST hits (exhibiting <90% pairwise sequence identity to sequences in public databases); contaminant OTUs (Human and common lab contaminants) removed from dataset.
Figure S1: Eukaryotic community assemblages at Atlantic sites 22#1 and 25#2 following noise removal. Taxonomic assemblages inferred from raw pyrosequencing data denoised in AmpliconNoise followed by Perseus (Quince et al. 2011); proportions calculated from non-chimeric OCTUs recovered after clustering in OCTUPUS analysis at a 99% pairwise similarity cutoff.
Figure S2: Map displaying sample sites utilized in this study, representing Pacific (blue) and Atlantic (red) locations. Solid circles represent deep-sea sites, while torus shapes reflect shallow-water locations.
Figure S3: Jackknife Cluster Analysis using an expanded SSU pyrosequencing dataset. OCTUs represent Region 1 of SSU gene, clustered at 95% in the OCTUPUS pipeline. OCTU dataset was compiled using pooled raw data (forward and reverse reads) from the present study and published data obtained from Fonseca et al. (Fonseca et al. 2010).
Figure S4: Proportions of eukaryotic taxa recovered under relaxed (95%) and stringent (99%) clustering with UCLUST. Proportions calculated from non-chimeric OCTUs representing forward sequencing reads from both 18S gene regions. Only significant proportional increases (+) and decreases (−) are outlined across datasets; Black symbols represent changes observed between differential clustering of non-chimeric sequence reads at 95% (A) versus 99% (D) cutoffs. White symbols represent proportional changes between all non-chimeric OCTUs and subsets of OCTUs with observed eurybathic (B, E) and cosmopolitan (C, F) distributions.
Figure S5: Proportional changes observed for non-chimeric nematode OCTUs (BLAST-derived taxonomy) representing forward sequencing reads from both 18S loci. Significant proportional increases (+) and decreases (−) are denoted. Black symbols represent changes in taxonomic membership observed between clustering of non-chimeric sequence reads at 95% (A) versus 99% (D) cutoffs. Brown symbols represent proportional changes between all non-chimeric nematode OCTUs and subsets of nematode OCTUs with observed eurybathic (B, E) and cosmopolitan (C, F) distributions.
We thank D. Thistle (NSF grant OCE-0727243), V. Huvenne, T.J. Pereira and L. Hyde for collecting sediment cores. This work was supported by NSF (DEB 0228962 and DEB 0315829) and NIH (NIH-1P20RR030360-01)
Raw 454 reads and processed OTU data: DRYAD entry doi:10.5061/dryad.qm8f3