Environmental amplicons from 12 marine sites were pooled and sequenced on a full plate run of the GS FLX Titanium platform (Roche), returning ~1.3 million raw sequence reads. The highest-quality reads were assigned to operationally clustered taxonomic units (OCTUs) using pairwise distance-based clustering in multiple computational pipelines (UCLUST and OCTUPUS); in each toolkit, clustering was repeated under a range of sequence identity values (95–99%). A subset of reads (the F04/R22 locus from Atlantic sites 22#1 and 25#2) was subjected to stringent denoising in AmpliconNoise followed by Perseus (Quince et al. 2011); denoising pipelines were found to be too computationally intensive to run on the full 454 dataset. Empirical evidence suggests that 95% clustering tends to lump closely related biological species and genera, while 99% clustering effectively splits species into multiple OCTUs representing dominant and minor 18S variant sequences within individual species (Porazinska et al. 2010); thus, we focused on investigating biological patterns using these two parameters (‘relaxed’ clustering at 95% and ‘stringent’ clustering at 99%) as representing extreme ends of the clustering spectrum. Taxonomic assignments for each OCTU were derived from the top-scoring BLAST match (exhibiting >90% pairwise identity) recovered in public sequence databases. All analyses, including those of denoised datasets, recovered a diversity of eukaryotic taxa and suggested high levels of species richness across the marine samples analysed. Denoising dramatically reduced the number of OCTUs () but did not impact taxonomic inferences for the subset of sites analysed; the identities of abundant taxa were consistent irrespective of denoising. Similar taxonomic information was recovered from independently sequenced 18S gene regions (Table S2), illustrating the broad taxonomic coverage obtainable with conserved metazoan 18S primers. Our protocols were able to recover a substantial number of unicellular eukaryotes and 25 metazoan phyla (Table S2), including two of the most recently discovered and enigmatic lineages: Gnathostomulida and Loricifera (Littlewood et al. 1998; Sorensen et al. 2008). Despite this seemingly comprehensive coverage, it is likely that experimental biases (loss of taxa during sediment processing, failed primer binding) prevented the recovery of all eukaryotic taxa present in marine samples. Rarefaction curves (Chao1 and Observed Species metrics, ) indicate that eukaryotic diversity was not exhaustively characterized, despite a deep sequencing effort across sample locations. Taxonomic proportions recovered at each sample site () further reveal high variability in eukaryotic community assemblages, regardless of habitat; this variability is also supported by denoised data (Figure S1).
Effect of denoising on OTU number
Rarefaction Curves compiled using Chao1 estimation (top) and observed species counts (bottom)
Eukaryotic community assemblages present across marine sites
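The two rarefaction metrics named above can be illustrated with a minimal sketch. It assumes the bias-corrected form of the Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OCTUs observed exactly once and twice; the text does not state which Chao1 variant or software was used, and the function names and read counts below are hypothetical.

```python
import random


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OCTU read counts.

    Assumption: the bias-corrected variant is used here for illustration;
    the source does not specify which form was applied.
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # observed OCTUs
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def observed_at_depth(counts, depth, seed=0):
    """Observed OCTUs in a random subsample of `depth` reads (no replacement)."""
    # Expand counts into one label per read, then subsample the reads.
    reads = [octu for octu, c in enumerate(counts) for _ in range(c)]
    rng = random.Random(seed)
    return len(set(rng.sample(reads, depth)))


# Hypothetical read counts for five OCTUs at one site.
site_counts = [1, 1, 1, 2, 5]
estimate = chao1(site_counts)
curve = [observed_at_depth(site_counts, d) for d in (2, 5, 10)]
```

A curve of observed OCTUs that is still rising at full sequencing depth, or a Chao1 estimate well above the observed count, is the pattern interpreted in the text as incomplete sampling of diversity.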
A notable portion of OCTUs (~10%) recovered no significant match (sequence identity <90%) to known ribosomal sequences. Although these taxa potentially represent novel eukaryotic lineages, the failure to recover a close sequence match likely reflects taxonomic gaps in public databases (Berney et al. 2004). To explore this phenomenon further, we manually investigated the phylogenetic placement of ‘Environmental’ and ‘No Match’ OCTUs (containing >10 raw sequence reads in forward-read datasets clustered at 99% identity) within Neighbour-Joining tree topologies. All examined sequences display similarity to taxa within known eukaryotic groups, although our focused analysis suggests that many of these OCTUs represent deep lineages in known phyla. Out of 986 unknown OCTUs examined, the majority (67.8%) represented unicellular eukaryotes, while the remaining OCTUs were assigned as nematodes (12.2%), algae (7.8%), stramenopiles (6.8%), fungi (3.7%), or other metazoa (1.7%). Very few unknown OCTUs grouped within the Arthropoda (0.5%) or Annelida (0.1%), suggesting that 18S data for these groups are relatively robust compared to other taxa.
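The assignment rule described in this section (taxonomy taken from the top-scoring BLAST match when it exceeds 90% pairwise identity, otherwise ‘No Match’) can be sketched as follows. This is an illustrative reading of the rule, not the pipeline actually used: the hit tuples, the use of bit score to rank hits, and the treatment of a hit at exactly 90% identity as ‘No Match’ are all assumptions.

```python
def assign_taxonomy(hits, min_identity=90.0):
    """Return the taxon of the top-scoring hit, or 'No Match'.

    hits: list of (taxon, percent_identity, bit_score) tuples for one OCTU.
    Assumption: hits are ranked by bit score, and the top hit must exceed
    `min_identity` (strictly) to be accepted.
    """
    if not hits:
        return "No Match"
    taxon, identity, _score = max(hits, key=lambda h: h[2])
    return taxon if identity > min_identity else "No Match"


# Hypothetical BLAST hits for three OCTUs.
octu_hits = {
    "octu_1": [("Nematoda", 97.5, 410.0), ("Annelida", 99.0, 200.0)],
    "octu_2": [("Fungi", 88.0, 150.0)],
    "octu_3": [],
}
assignments = {name: assign_taxonomy(h) for name, h in octu_hits.items()}
```

Note that only the single best-scoring hit is tested against the threshold, so an OCTU whose top hit falls below 90% identity is labelled ‘No Match’ even if a lower-scoring hit has higher identity.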
Statistically defined cutoffs (see Materials and Methods) were used to extract ‘well-sampled’ OCTUs and infer patterns of species distributions. At both stringent (99%) and relaxed (95%) clustering values, a subset of OCTUs appears to have cosmopolitan distributions spanning disparate geographic locales (present in both deep-sea Pacific and Atlantic sites, ) or large depth gradients (present in intertidal and deep-sea sediments, ). The proportion of these putatively cosmopolitan and eurybathic taxa drops dramatically with increasing clustering stringency; under relaxed clustering (95% sequence identity) in the OCTUPUS pipeline, 75% of OCTUs were recovered as cosmopolitan and 37% appeared eurybathic, while these proportions drop to 9.08% and 1.5%, respectively, in stringently clustered datasets (). Similar patterns were evident after independent OTU clustering in UCLUST (). These results confirm cosmopolitan distributions amongst meiofaunal eukaryotes, although such distributions appear to be the exception rather than the rule for marine taxa. Under the most stringent clustering parameters, the number of putatively eurybathic OCTUs is six- to eight-fold lower than the number of cosmopolitan deep-sea OCTUs; this distinct separation of deep-sea and shallow-water taxa may reflect the significant physical differences between these two habitats, as most taxa probably lack the physiological adaptations required to surmount large bathymetric gradients. Conversely, the deep-sea environment is largely stable, perhaps allowing increased survival rates and encouraging long-distance dispersal across a physically homogeneous habitat.
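The two distribution categories used above can be expressed as a short sketch: an OCTU is treated as cosmopolitan if it occurs in deep-sea sites of both the Pacific and the Atlantic, and as eurybathic if it occurs in both intertidal and deep-sea sediments. The site names, metadata layout, and function name are hypothetical, and the statistical cutoffs that define ‘well-sampled’ OCTUs are not reproduced here.

```python
def classify_distribution(octu_sites, site_info):
    """Flag an OCTU as cosmopolitan and/or eurybathic.

    octu_sites: set of site names where the OCTU was detected.
    site_info:  dict mapping site name -> (ocean, habitat), with habitat
                either 'deep-sea' or 'intertidal' (hypothetical layout).
    """
    deep_oceans = {
        site_info[s][0] for s in octu_sites if site_info[s][1] == "deep-sea"
    }
    habitats = {site_info[s][1] for s in octu_sites}
    return {
        # Present in deep-sea sites of both ocean basins.
        "cosmopolitan": {"Atlantic", "Pacific"} <= deep_oceans,
        # Present in both intertidal and deep-sea sediments.
        "eurybathic": {"intertidal", "deep-sea"} <= habitats,
    }


# Hypothetical site metadata and OCTU occurrences.
site_info = {
    "atl_deep": ("Atlantic", "deep-sea"),
    "pac_deep": ("Pacific", "deep-sea"),
    "atl_shore": ("Atlantic", "intertidal"),
}
both_basins = classify_distribution({"atl_deep", "pac_deep"}, site_info)
across_depth = classify_distribution({"atl_deep", "atl_shore"}, site_info)
```

Because the categories are computed from presence alone, splitting one 95% OCTU into several 99% OCTUs can only keep or reduce the number of sites each cluster occupies, which is consistent with the drop in cosmopolitan and eurybathic proportions under stringent clustering.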
In a phylogenetic analysis of community structure, Principal Coordinate Analyses () and Jackknifed Clustering Analyses () further supported a distinction between intertidal and deep-water sample sites as well as a notable separation of deep-sea Pacific and Atlantic sites; the same patterns were observed for OCTUs clustered using both stringent (99%) and relaxed (95%) pairwise identity cutoffs. In phylogenetic diversity analyses, deep-sea sites showed a higher degree of similarity in eukaryotic community structure, although there was an overall separation between the Atlantic and Pacific Ocean basins. Since the present study included a limited number of intertidal sampling sites (2 locations), we further incorporated an expanded dataset including Fonseca et al.’s (2010) SSU pyrosequencing data (homologous to Region 1 in this study) from nine additional intertidal sites along the UK coastline. In this independent analysis, the observed geographic patterns remained consistent, and we additionally recovered the same clustering patterns amongst UK sites as reported in the Fonseca et al. study (Figure S2