The world's oceans contain a complex mixture of microorganisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
Marine microbes remain elusive and mysterious, even though they are the most abundant life form in the ocean, form the base of the marine food web, and drive energy and nutrient cycling. We know so little about the vast majority of microbes because only a small percentage can be cultivated and studied in the lab. Here we report on the Global Ocean Sampling expedition, an environmental metagenomics project that aims to shed light on the role of marine microbes by sequencing their DNA without first needing to isolate individual organisms. A total of 41 different samples were collected from a wide variety of aquatic habitats over 8,000 km. The resulting 7.7 million sequencing reads provide an unprecedented look at the incredible diversity and heterogeneity in naturally occurring microbial populations. We have developed new bioinformatic methods to reconstitute large portions of both cultured and uncultured microbial genomes. Organism diversity is analyzed in relation to sampling locations and environmental pressures. Taken together, these data and analyses serve as a foundation for greatly expanding our understanding of individual microbial lineages and their evolution, the nature of marine microbial communities, and how they are impacted by and impact our world.
The Sorcerer II GOS expedition, data sampling, and analysis are described. The immense diversity in the sequence data required novel comparative genomic assembly methods, which uncovered genomic differences that marker-based methods could not.