Uncultured unicellular eukaryotes have critical roles in global CO2 fixation in the oceans. The dilemma is that as Earth systems undergo climate change, the responses of these elusive organisms and other uncultured taxa are nearly impossible to study or predict. Researchers are now investigating uncultured microbes by sequencing their genomes directly from the environment. The approach itself sounds straightforward: use fluorescence-activated cell sorting (FACS) to separate populations, amplify their DNA by multiple displacement amplification (MDA), and sequence individual genes or genome fragments. But artifacts can be introduced at each step of the process, and careful consideration is required when interpreting the data.
For many years, oceanographers inferred from pigment distributions that a group of unicellular eukaryotic phytoplankton belonging to the prymnesiophytes was important, and that these prymnesiophytes were tiny (picoplankton, up to 2 to 3 μm in diameter). Yet no such prymnesiophyte existed in culture, at least none that matched the 18S rDNA sequences commonly retrieved from the ocean by PCR. Two recent publications showed that uncultured pico-prymnesiophytes are responsible for 25 ± 9% of overall primary production (photosynthetic uptake of CO2 into plant-like biomass) in the North Atlantic [1]. Moreover, pico-prymnesiophytes form a significant portion of picoplanktonic photosynthetic biomass in biogeographical provinces stretching from the tropics to high-latitude seas [1].
In 2010 we sequenced a targeted metagenome from a wild pico-prymnesiophyte population that had been sorted at sea by FACS on the basis of its natural photosynthetic pigments and size characteristics, and then amplified by MDA [2]. MDA is a non-PCR-based DNA amplification technique that uses random hexamer primers and the bacteriophage-derived φ29 polymerase. The resulting partial genome assemblies revealed densely packed genomes with sparse intergenic regions, as well as features that were novel for phytoplankton. Since then, a second study has explored the 18S rDNA diversity of discrete eukaryotic phytoplankton populations, again using FACS and MDA, followed by PCR and construction of 18S rRNA gene clone libraries [3].
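At its core, the FACS step described above keeps only those events whose signals fall inside a gate drawn on size and pigment fluorescence. The sketch below illustrates that selection, assuming a simple rectangular gate on forward scatter (a size proxy), red chlorophyll fluorescence and orange phycoerythrin-like fluorescence. The `Event` and `Gate` classes, the channel names and all thresholds are hypothetical placeholders and are not taken from [2].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One flow-cytometry event; values are in arbitrary instrument units."""
    fsc: float        # forward scatter, a rough proxy for cell size
    chl_red: float    # red (chlorophyll) fluorescence
    pe_orange: float  # orange (phycoerythrin-like) fluorescence

@dataclass(frozen=True)
class Gate:
    """Hypothetical rectangular sort gate; thresholds are illustrative only."""
    fsc_min: float
    fsc_max: float
    chl_min: float
    pe_max: float

    def contains(self, e: Event) -> bool:
        return (self.fsc_min <= e.fsc <= self.fsc_max
                and e.chl_red >= self.chl_min
                and e.pe_orange <= self.pe_max)

def sort_events(events, gate):
    """Keep only the events that fall inside the gate."""
    return [e for e in events if gate.contains(e)]

# Chlorophyll-positive, phycoerythrin-negative, picoplankton-sized cells.
pico_gate = Gate(fsc_min=100.0, fsc_max=800.0, chl_min=50.0, pe_max=10.0)

events = [
    Event(fsc=300.0, chl_red=200.0, pe_orange=2.0),    # pigmented, small: kept
    Event(fsc=150.0, chl_red=120.0, pe_orange=90.0),   # strong orange signal
    Event(fsc=2000.0, chl_red=900.0, pe_orange=1.0),   # too large
    Event(fsc=250.0, chl_red=5.0, pe_orange=0.0),      # not pigmented
]
kept = sort_events(events, pico_gate)
print(len(kept))  # 1
```

Real sort gates are usually drawn interactively on instrument-specific plots; a rectangular gate is simply the easiest case to express in code.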
In addition to phytoplankton such as pico-prymnesiophytes, seawater contains rare eukaryotes. An intriguing uncultivated group of such eukaryotes, the biliphytes, has been an elusive target because they are sparse in marine samples and statistically supported data have been difficult to obtain. However, a new study [4] reports sequencing of biliphytes using FACS and MDA. Biliphytes were initially considered a possible unique group of red algae; however, more comprehensive phylogenetic analysis placed them elsewhere in the eukaryotic tree of life [5]. Moreover, microscopy indicated that they exhibited orange fluorescence resembling that of phycobilins (photosynthetic pigments found in cyanobacteria and some eukaryotic algae) and were picoplanktonic in size; hence they were named 'picobiliphytes' [5]. The same study also suggested that they contained a remnant nucleus (a nucleomorph) from a eukaryote engulfed in an ancient secondary endosymbiotic event [5]. This was very exciting because only two lineages are known to have nucleomorphs, and evolutionary relationships between different ancestral host eukaryotes are difficult to trace because genes from the hosts have become mixed with those of the photosynthetic eukaryotes they engulfed long ago. A subsequent study [6] more tentatively suggested that they were photosynthetic and placed the group in a similar (although not statistically supported) phylogenetic position, but found no evidence for a nucleomorph. The uncultured cells were more abundant in this study and appeared larger, about 3.5 to 4 μm in diameter, so the group was renamed 'biliphytes'.
The recent FACS/MDA study of uncultured marine eukaryotes examined single biliphyte cells [4]. Biliphyte genome fragments were sequenced along with those of co-associated entities. The study [4] found no nucleomorph-like genes, supporting inferences from microscopy results [2] that no nucleomorph was present. The results also supported inferences [7] that biliphytes may not be photosynthetic but instead facultative mixotrophs or phagotrophs, in which case the transient detection of orange fluorescence could reflect ingested prey (for example, Synechococcus).
Inferring that a characteristic is absent from a genome on the basis of partial genome sequences from single cells or populations hinges on how much of the genome has actually been recovered. Such absence arguments can be tested using the binomial distribution, the probability distribution of the number of successes in a series of independent yes/no (Bernoulli) trials, each with the same probability of success, but only after the critical task of estimating genome recovery has been accomplished. Inherent biases in MDA reactions lead to uneven and incomplete coverage of genomes, and this problem is compounded by the possibility that a single FACS gating event can include more than one organism. Bacteria and viruses often reside in close extracellular association with eukaryotic cells [8]. Diverse uncultured fungi have recently been discovered; these attach to diatoms and presumably to other microbes too [9]. MDA itself can also introduce artificial contaminants.
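Before an absence argument can be made, genome recovery has to be estimated. One common proxy, assumed here rather than taken from the studies discussed, is the fraction r of a reference set of conserved single-copy marker genes found in the assembly. If each gene is then treated as an independent Bernoulli trial recovered with probability r (an assumption that MDA bias violates in practice), the chance of missing all k genes that are truly present is the zero-success term of a Binomial(k, r) distribution, (1 - r)^k. The marker names and counts below are hypothetical.

```python
def estimate_recovery(recovered_genes, reference_markers):
    """Fraction of reference marker genes present among the recovered genes,
    used as a crude proxy for overall genome recovery r."""
    reference = set(reference_markers)
    if not reference:
        raise ValueError("reference marker set is empty")
    return len(reference & set(recovered_genes)) / len(reference)

def prob_all_missed(r, k):
    """P(none of k genes present in the genome is recovered), treating each
    gene as an independent Bernoulli trial with success probability r,
    i.e. P(X = 0) for X ~ Binomial(k, r), which equals (1 - r) ** k."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("recovery must lie in [0, 1]")
    return (1.0 - r) ** k

# Hypothetical data: 100 reference markers, 40 of them found in the assembly,
# plus one non-marker gene that the estimate simply ignores.
reference = [f"marker_{i}" for i in range(100)]
recovered = [f"marker_{i}" for i in range(40)] + ["unrelated_gene"]

r = estimate_recovery(recovered, reference)   # 0.4
for k in (1, 5, 20):
    print(k, f"{prob_all_missed(r, k):.2e}")
```

With 40% recovery, failing to find one particular gene says little (there is a 60% chance of missing it anyway), whereas missing 20 independent genes would be far less likely; the independence assumption is precisely what amplification bias undermines.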
[...] and Yoon et al. both highlight the confounding influence of natural or artificial genomic contaminants in FACS/MDA-derived data. In [4], in addition to biliphyte sequences, assemblies were generated from viruses and from a bacterium that was hypothesized to be ingested prey but may alternatively have been attached to the surface of the sorted biliphyte cell. Each contaminating genome fragment, regardless of its origin, enlarges the total genome pool that is sampled, reducing the probability of sampling the targeted taxon. That Yoon et al. recovered none of the 150 plastid-encoded genes one would expect to find if the organism were photosynthetic, in 6,000 independent sampling events from a pool of 12,000 genes, is unlikely, but not implausible given MDA biases. The chances of missing specific genes increase as the gene pool grows, and the biliphyte gene pool may be larger than assumed, because the 12,000-gene estimate was generated by comparison with complete genomes of smaller eukaryotes.
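The arithmetic behind the plastid-gene argument can be sketched under two simple sampling models, both assumptions for illustration: independent draws with replacement (binomial) and recovery of 6,000 distinct genes without replacement (hypergeometric). Neither model captures MDA bias, which breaks the independence assumption. The 20,000-gene pool is a hypothetical value showing how a larger pool, whether from a bigger genome or from co-sorted contaminants, changes the result.

```python
import math

def p_miss_binomial(k, n_draws, pool):
    """With replacement: each of n_draws independent draws hits one of the
    k target genes with probability k / pool; returns P(no hits)."""
    return (1.0 - k / pool) ** n_draws

def p_miss_hypergeometric(k, n_genes, pool):
    """Without replacement: n_genes distinct genes recovered from `pool`;
    returns P(none of the k target genes is among them)."""
    return math.comb(pool - k, n_genes) / math.comb(pool, n_genes)

K, N, POOL = 150, 6_000, 12_000   # figures quoted in the text

print(f"binomial:       {p_miss_binomial(K, N, POOL):.1e}")
print(f"hypergeometric: {p_miss_hypergeometric(K, N, POOL):.1e}")

# Hypothetical larger pool (bigger genome and/or co-sorted contaminant genes).
print(f"binomial, 20,000-gene pool: {p_miss_binomial(K, N, 20_000):.1e}")
```

Both models put a complete miss far below one in 10^30 for the quoted figures; the larger hypothetical pool raises that probability by many orders of magnitude, although it remains small.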
A hurdle for future efforts is implementing bioinformatic methods that separate a heterogeneous genome population into its individual constituents. Genome sorting is further hindered by the chimeric nature of many eukaryotic genomes, which contain phylogenetic signals from other lineages and even bacterial phyla [10]. Rigorous approaches are required to classify sequences confidently as genetic material from target cells or from other co-sorted entities. Phylogenomic filters can help identify bona fide scaffolds assembled from reads of the target taxa, so that comparative analyses can be conservatively restricted to them. For example, Yoon et al. reduced their data set using BLASTX, selecting 7.9 Mbp of contigs of eukaryotic, and possibly biliphyte, origin from approximately 28 Mbp of assembly derived from just over 3 Gbp of raw sequence. Further, they reported globally on the taxonomic content of open reading frames within contigs using BLASTX combined with phylogenomic profiling; their analyses returned 5,231 phylogenetic trees. Of the putative picobiliphyte proteins that could be classified, the majority were phylogenetically most similar to Metazoa, Viridiplantae or Stramenopiles [4]. An alternative approach is to include contig-level phylogenetic classification and analyses of expected versus recovered gene family distributions, from which genomic properties such as gene size and density can be inferred [2].
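A phylogenomic filter of the kind described above can be approximated, in a much simplified form, by labelling each contig with the majority taxon of its open reading frames' best BLASTX hits. The sketch below parses BLAST+ tabular output (`-outfmt 6`, whose 12 default columns are named in `FIELDS` after the classic field list) and applies such a vote. The `contig|orf` naming of query IDs, the `subject_taxon` lookup table, the e-value cutoff and the 60% majority threshold are all hypothetical; a real pipeline would take taxonomy from a curated reference database and would typically add phylogenetic trees rather than relying on best hits alone.

```python
import csv
import io
from collections import Counter, defaultdict

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(tsv_text, max_evalue=1e-5):
    """Return {query: best-scoring subject}, ignoring hits above max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}

def classify_contigs(orf_hits, subject_taxon, min_fraction=0.6):
    """Label each contig with the majority taxon of its ORFs' best hits,
    or 'unclassified' if no taxon reaches min_fraction of the votes.
    Assumes hypothetical ORF ids of the form 'contig|orf'."""
    votes = defaultdict(Counter)
    for orf, subject in orf_hits.items():
        contig = orf.split("|", 1)[0]
        votes[contig][subject_taxon.get(subject, "unknown")] += 1
    labels = {}
    for contig, counts in votes.items():
        taxon, n = counts.most_common(1)[0]
        share = n / sum(counts.values())
        labels[contig] = taxon if share >= min_fraction else "unclassified"
    return labels

# Toy BLASTX output and taxonomy table (hypothetical IDs and values).
blast_tsv = (
    "c1|orf1\tsA\t45.0\t200\t110\t0\t1\t600\t1\t200\t1e-30\t150\n"
    "c1|orf2\tsB\t50.0\t150\t75\t0\t1\t450\t1\t150\t1e-20\t120\n"
    "c1|orf3\tsC\t40.0\t100\t60\t0\t1\t300\t1\t100\t1e-10\t80\n"
    "c2|orf1\tsD\t90.0\t300\t30\t0\t1\t900\t1\t300\t1e-80\t400\n"
    "c2|orf1\tsA\t35.0\t300\t195\t0\t1\t900\t1\t300\t1e-8\t60\n"
)
subject_taxon = {"sA": "Eukaryota", "sB": "Eukaryota",
                 "sC": "Bacteria", "sD": "Bacteria"}

labels = classify_contigs(best_hits(blast_tsv), subject_taxon)
eukaryotic = sorted(c for c, t in labels.items() if t == "Eukaryota")
print(labels)       # {'c1': 'Eukaryota', 'c2': 'Bacteria'}
print(eukaryotic)   # ['c1']
```

Here contig `c1` is kept because two of its three ORFs have eukaryotic best hits, while `c2` is set aside because its only ORF's best hit is bacterial; the weaker eukaryotic hit for that ORF is ignored.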
An ongoing difficulty is the paucity of appropriate reference genomes for the phylogenomic filtering database. In studies of genome sequences from cultured eukaryotes, confident phylogenetic classification of genes from distant lineages (for example, genes of bacterial or viral origin) [10] derives from genome assembly: a gene's placement on a well-assembled scaffold alongside the organism's own genes shows that it belongs to that genome. Such assembly in turn depends on sufficient sequence coverage at each nucleotide position. This type of coverage, and the corresponding assembly, has been difficult to achieve for MDA material generated from eukaryotic nuclear DNA. This problem should be solved soon given initiatives by several single-cell genomics groups.
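Why per-position coverage matters can be made concrete with a back-of-the-envelope calculation. Under the idealized Lander-Waterman assumption of uniformly random reads, per-base depth is Poisson with mean c and the expected fraction of uncovered positions is e^(-c). Replacing the Poisson with a gamma-Poisson (negative binomial) depth of the same mean gives (k/(k+c))^k, where a small shape k produces uneven depth. Using the negative binomial as a stand-in for MDA bias, as well as the genome size, read total and shape values below, are assumptions for illustration only.

```python
import math

def mean_coverage(total_bases, genome_size):
    """Average sequencing depth c = total sequenced bases / genome size."""
    return total_bases / genome_size

def fraction_uncovered_poisson(c):
    """Idealized Lander-Waterman (Poisson) model: P(depth = 0) = exp(-c)."""
    return math.exp(-c)

def fraction_uncovered_negbin(c, shape):
    """Negative binomial (gamma-Poisson) depth with mean c.
    Smaller `shape` means more uneven depth; a crude stand-in for MDA bias.
    P(depth = 0) = (shape / (shape + c)) ** shape."""
    return (shape / (shape + c)) ** shape

# Hypothetical numbers: 1 Gbp of reads on a 50 Mbp nuclear genome.
c = mean_coverage(1_000_000_000, 50_000_000)   # 20.0
print(f"mean depth: {c:.0f}x")
print(f"uncovered, uniform (Poisson): {fraction_uncovered_poisson(c):.1e}")
for shape in (2.0, 0.5, 0.2):
    print(f"uncovered, shape={shape}: {fraction_uncovered_negbin(c, shape):.2f}")
```

With these made-up inputs the uniform model leaves essentially no position unsequenced, whereas the most uneven setting leaves roughly 40% of positions uncovered despite a 20x mean depth, which is one way to see why MDA-derived nuclear DNA has been hard to assemble.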
Perhaps the most exciting result of the latest publication on sequencing of uncultured eukaryotes [4] was the retrieval of genomes from viruses that were either attached to or infecting biliphytes. These genomes had high coverage, assembled well and represented viruses for which no data were previously available. Viruses are enormously abundant in marine environments and have important roles in shaping the population dynamics of prokaryotic and eukaryotic microbes; their genomes can contain genes that seem to be 'stolen' from their hosts and can reveal especially important adaptations and environmental pressures.
Genomics of uncultured eukaryotic microbes [2] is providing new information on their biology and their interactions with other microbes. The findings of Yoon et al. also highlight that the diversity of organism-organism interactions might drive the low cohesiveness among the three biliphyte cells investigated. The future holds much excitement: new discoveries will come from scaling up and refining current approaches for sorting fact from fiction about uncultured eukaryotes.