Networks of microorganisms constitute the chemical infrastructure of Earth’s biosphere. Microbial communities varying in complexity and vigor entwine every ecosystem on the planet. Humans depend on these microbial systems for global primary production (Liu et al., 1997
), ecosystem services (Brussaard, 1997
; Matson et al., 2011
), industrial processes (Prescott & Dunn, 1949
; Bai et al., 2008
), and most intimately in regard to the human microbiome (Proctor, 2011
; Relman, 2011
). As a result, there is tremendous interest in describing the physiology of these microorganisms, their relationships to one another, and their impact on human society. Presumably, our ability to predict the response of microbial communities to perturbation (anthropogenic and otherwise) will improve in accord with the depth of our understanding. The standard of knowledge is higher yet for efforts to engineer the function of microbial systems.
The ongoing revolution in genomic science and sequencing technology has strongly impacted environmental genomics, providing increasingly comprehensive ‘shotgun’ (random) coverage of DNA in environmental samples (Berry et al., 2003
; Tyson et al., 2004
; Venter et al., 2004
) and making the genome sequencing of bacterial and archaeal isolates routine (Fleischmann et al., 1995
; Bult et al., 1996
). Next-generation sequencing has been applied broadly in metagenomic sequencing studies and has helped the field advance beyond single-gene PCR-based studies. While providing rapid access to the catalog of community genes and enabling comparisons among different communities, the analysis of metagenomic data has remained largely gene- and pathway-centric (notable exceptions are discussed below).
Single-gene studies, metagenomic assemblies, and the genome sequences of a limited number of cultured isolates are not a sufficient basis on which to accurately model the responses of natural microbial networks or engineer the function of artificial communities as we desire. For example, gene catalogs and composite genomes assembled from metagenomic data do not presently distinguish between genes that are tightly coupled within the context of the same organism and genes that are coupled across different organisms. This is a critical limitation, because only gene products encoded by the same organism can freely come into contact with one another to form complexes, drive signaling pathways, or carry out multi-step enzymatic transformations of diffusible substrates at maximum rates. A systems-level predictive understanding of microbial physiology absolutely demands the interpretation of genes and pathways in a full genomic context. Furthermore, individual organisms (indeed single cells) encoding full genomes are the basic replicating unit of biology and an important unit of evolutionary selection, factors that cannot be ignored in understanding the development of microbial networks in larger populations as a function of time.
The need for whole genomes from microbial communities is clear. Although bringing novel isolates into axenic culture remains important to enable functional studies of new microorganisms, the traditional isolate sequencing paradigm falls short in three respects as a general approach to genome sequencing. First, although data on failed attempts are not often reported, the yield of axenic cultures from randomly targeted environmental organisms is understood to be low. Second, the distribution of successful cases is strongly biased. While an effort bias explains the preponderance of database genomes from heterotrophic human pathogens, other intrinsic biases have been recognized that favor the isolation of organisms similar to those already cultured, of faster-growing organisms, and of organisms that depend less on interactions with the community network (Wu et al., 2009
). Finally, traditional isolation techniques are labor-intensive and slow, sometimes requiring years of effort due to the need for serial enrichment culture and the slow growth of organisms under suboptimal conditions, prompting the invention of automated systems (Connon & Giovannoni, 2002).
The demand for greater numbers of more diverse genomes than are being delivered through isolate sequencing can be satisfied by an emerging spectrum of cultivation-agnostic approaches for genome sequencing (Fig. 1). These methods, although not requiring culture-based isolation, can be carried out in parallel with culture-based studies if conditions for growth are known. The lack of a requirement for axenic culture allows very broad application, with some schemes leveraging DNA present in the sample at the time of collection exclusively, permitting limited fixation of samples. The span of culture-agnostic genomics techniques ranges from new ways of processing standard metagenomic data sets to single-cell sequencing. These methods have variable requirements and result in genomic data sets with differing properties; as such, particular communities and target organisms are best served by different combinations of approaches. Such ambitious study designs will become increasingly tractable as sample preparation procedures are streamlined and bespoke instrumentation is commercialized.
Fig. 1. Methods for microbial genomics. (a) Standard metagenomics and sequence 'binning' to produce composite microbial genomes. (b) Targeted metagenomics and sequence 'binning' to produce composite microbial genomes. (c) Targeted enrichment of an organism to …
Composite genomes can be amassed from metagenomic contigs by classifying (or ‘binning’) reads according to the abundance of related reads and lineage-specific signatures such as nucleotide content signatures (Tyson et al., 2004
; Woyke et al., 2006
; Dick et al., 2009
; Hess et al., 2011
; Luo et al., 2011
; Tanaseichuk et al., 2011
; Wang et al., 2012b
). Although challenging in data sets from more complex microbial communities and for organisms with significant strain heterogeneity, this approach is expected to scale favorably with increased sequencing depth and advancements in assembly of metagenomic data (Mavromatis et al., 2007
; Namiki et al., 2011
; Peng et al., 2011
; Treangen et al., 2011
; Wrighton et al., 2012
). One way to enhance the targeting of organisms associated with a particular community function is through the use of enrichment culture under a condition designed to bloom organisms associated with a function of interest and/or to select against other organisms prior to the collection of a ‘targeted metagenomic’ data set (Hess et al., 2011
). Another avenue is the processing of tiny consortia that exhibit lower diversity simply because they contain a limited number of cells. Although not yet explored to a great extent, analysis of such microconsortia is now accessible thanks to advances in whole-genome amplification (WGA) technology; note, however, that the relative abundances of sequences from different strains are likely to change during WGA (Yilmaz et al., 2010
). When such physically aggregated groups of cells are selected, additional information about functional couplings can be accessed.
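The composition- and abundance-based binning described above can be illustrated with a minimal sketch. It assumes tetranucleotide frequencies as the nucleotide content signature and read coverage as the abundance measure; the function names, the nearest-reference assignment rule, and the `coverage_weight` parameter are hypothetical choices made for illustration and are not taken from any of the cited tools.

```python
from collections import Counter
from itertools import product
import math

K = 4  # tetranucleotides; an assumed, commonly used k-mer length
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 256 tetramers


def kmer_profile(seq, k=K):
    """Return the normalized k-mer frequency vector of a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A/C/G/T are counted; windows with N are ignored.
    total = sum(counts[m] for m in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[m] / total for m in KMERS]


def profile_distance(p, q):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_bin(seq, coverage, bins, coverage_weight=0.1):
    """Assign a contig to the closest reference bin.

    bins maps a bin name to (k-mer profile, log10 coverage).
    The score adds the composition distance to a weighted difference
    in log10 coverage; the weighting is an illustrative assumption.
    """
    profile = kmer_profile(seq)
    log_cov = math.log10(coverage)

    def score(item):
        ref_profile, ref_log_cov = item[1]
        return (profile_distance(profile, ref_profile)
                + coverage_weight * abs(log_cov - ref_log_cov))

    return min(bins.items(), key=score)[0]
```

In use, each bin would be seeded from contigs of known or inferred origin, and every remaining contig would be passed to `nearest_bin` together with its mean read coverage. Real binning methods replace this nearest-reference rule with clustering or probabilistic models, but the two inputs are the same.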
A third method, termed ‘targeted enrichment’ (Stein et al., 1996
; Hallam et al., 2006a
; Bergquist et al., 2009
), seeks to segregate a single organism by physically enriching for a target cell population based on combinations of phenotypic characteristics such as size, shape, density, and the spectral characteristics of native and applied fluorophores (Wallner et al., 1997
; Sekar et al., 2004
). The selected cells are then the basis for sequencing and assembly of a composite genome. This approach faces the challenge that the cell types are not uniquely delineated by measured properties in more complex samples. Even so, limited enrichment may improve the results of efforts to ‘bin’ genomes from mixed-organism data sets. Low yield of selected cells in targeted enrichment is now readily ameliorated by the application of modern WGA methods and high-yield sequence library preparation procedures (Podar et al., 2007).
Notwithstanding its limited applicability, isolate sequencing remains the gold standard method for microbial genome sequencing. In isolate sequencing, microgram quantities of high-quality DNA are purified from large-scale axenic cultures, which can be further propagated for functional studies of the microorganism. In many cases, isolate cultures originate from a single cell, facilitating construction of a clonal consensus genome, as all the cells sequenced are descendants of a single original cell. However, lengthy procedures for enrichment culture under artificial conditions may select for new variants that are not representative of natural populations present in the original sample. Unlike genomes produced by the other methods discussed, isolate genomes are fully verifiable in the sense that additional equivalent material can typically be produced for confirmatory analyses and have the further advantage that the genome sequences obtained are insensitive to the sequencing and informatics methodologies used.
Single-cell sequencing is unique among current genomic approaches in yielding access to the genomes of individual cells without the complications of culture or compositing data from multiple cells or strains. Formally, single-cell genomic data sets are free of uncertainty in the grouping of sequence reads according to the strain of origin and can resolve extremely fine strain structure, hyper-variable loci, and phase variation at the whole-genome level without ambiguity.
The ability to resolve fine-scale heterogeneity is important in sequencing the genomes of asexual microorganisms, which undergo recombination less frequently than they reproduce. Populations of such organisms have the potential to rapidly diversify in a variety of patterns, as new mutations are not necessarily mixed through the population and there is no immediate imperative to maintain compatibility for recombination with the rest of the population (Koeppel et al., 2008
; Caro-Quintero & Konstantinidis, 2012
). The resulting mutational ‘fuzziness’, variation in the relative importance of mutation vis-à-vis recombination, and variation in the rate of genetic exchange between geographically or ecologically distinct populations complicate phylogenetic analysis and efforts to circumscribe microbial species (Koonin et al., 2001
; Gevers et al., 2005
; Smillie et al., 2011
; Denef & Banfield, 2012
; Shapiro et al., 2012
). Single-cell sequencing promises to provide direct access to fine-scale heterogeneity in complex microbial populations by resolving and linking ‘fuzzy’ diversity across whole genomes sequenced from individual cells (Pamp et al., 2012).
More generally, cells are the fundamental quanta of biology, representing the most granular level where biological entities can command the full spectrum of biochemical activities. Not coincidentally, cells are both biologically relevant units of living matter and containers that physically subdivide biological samples—this is a convenience to experimental biology that should be utilized to the best effect possible. Analyses targeted to individual cells are an ideal approach to capitalize on this opportunity in biological science. Today, technologies enabling single-cell analysis are progressing and diversifying rapidly, as well as powering discovery in many fields of biological science. Besides environmental microbiology, many groups are currently active in applying single-cell genomic methods including WGA and shotgun sequencing to human cells. Although such applications to human genetics are not a focus of this review, many of the methods employed in these studies are related to those used for microbial samples and will be referenced where appropriate.
Single-cell single-gene sequencing studies were the first single-cell sequencing experiments. These studies, first in human cells (Küppers et al., 1993
; Sucher & Deitcher, 1995
; Maryanski et al., 1996
; Findlay, 1998
; Dietmaier et al., 1999
), then in microorganisms (Ruiz Sebastián & O’Ryan, 2001
), depended on the physical (often manual) isolation of individual cells and the use of these cells as templates for PCR amplification and sequencing of specific genomic loci. The advent of higher-throughput automated systems enabled the application of this approach to larger numbers of bacterial cells. Such techniques have been applied for multiplex PCR, product recovery, and sequencing of multiple loci per cell in uncultivated organisms to link phenotype-determining functional genes with phylogenetic markers, identifying phenotype–phylotype relationships (Ottesen et al., 2006
). Alternatively, this approach can be taken to correlate phage and host marker genes to establish phage–host relationships (Tadmor et al., 2011
). Nevertheless, such methods depend on targeting highly conserved or previously characterized genes with specific primers, which imposes strong biases and limits the scope of the approach for discovery.
Of all the genome-sequencing methods, single-cell whole-genome-sequencing workflows require the most demanding sample preparation: (1) single cells must be isolated with high confidence, (2) each cell’s envelope must be compromised such that (3) the DNA inside can be amplified by WGA free of contaminants to produce enough material to support (4) library preparation and (5) high-throughput DNA sequencing. WGA is necessary despite the advent of commercial single-molecule sequencing technologies (Helicos, PacBio) and reliable protocols for the PCR amplification of finished sequence libraries. Fundamentally, this is the case because library preparation methods are not sufficiently conservative of the starting material. Given the low efficiency of library creation procedures, each locus in the raw material must be present in high copy number to avoid dropout of that locus in the finished sequence library. In this light, manufacturers currently specify minimum inputs in the nanogram range to prevent the loss of sequence information present in the original sample and to ensure that machine capacity can be utilized while minimizing redundant coverage of the raw input molecules.
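The copy-number argument above can be made concrete with a simple probabilistic sketch. It assumes that each template copy of a locus is converted into a sequenceable library molecule independently and with the same probability (`efficiency`); this independence model, the function names, and any efficiency values supplied to them are illustrative assumptions rather than measured properties of a particular library protocol.

```python
import math


def locus_dropout_probability(copies, efficiency):
    """Probability that no copy of a locus survives library preparation.

    Assumes each of `copies` template molecules is captured independently
    with probability `efficiency` (a simplifying assumption).
    """
    if copies < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("copies must be >= 0 and efficiency within [0, 1]")
    return (1.0 - efficiency) ** copies


def copies_needed(efficiency, max_dropout):
    """Smallest copy number whose dropout probability is <= max_dropout."""
    if not 0.0 < efficiency <= 1.0 or not 0.0 < max_dropout < 1.0:
        raise ValueError("efficiency in (0, 1], max_dropout in (0, 1)")
    if efficiency == 1.0:
        return 1  # every copy is captured, so one copy suffices
    # Closed-form estimate, then corrected for floating-point rounding.
    n = max(1, math.ceil(math.log(max_dropout) / math.log(1.0 - efficiency)))
    while n > 1 and locus_dropout_probability(n - 1, efficiency) <= max_dropout:
        n -= 1
    while locus_dropout_probability(n, efficiency) > max_dropout:
        n += 1
    return n
```

Under this model a single cell, which supplies one copy of most loci, loses a locus with probability `1 - efficiency`, so the required copy number rises steeply as library efficiency falls. This is the quantitative reason that amplification must precede library construction.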
In practical terms for single microbial cell sequencing, this means a million-fold amplification of the DNA present at the time of cell selection is required. Such high fold-amplification from subnanogram samples (Dean et al., 2002
) and individual bacteria (Raghunathan et al., 2005
) with good representation of the genome was first achieved by the multiple displacement amplification (MDA) WGA chemistry, although the amplified material had undesirable characteristics such as uneven representation and dislocated sequences. Nonetheless, investigators shortly succeeded in assembling shotgun sequence reads from single WGA-amplified Escherichia coli (Zhang et al., 2006) and TM7 (Marcy et al., 2007b) cells, and in sequencing multiple genes from E. coli (Marcy et al., 2007a), single marine bacteria (Stepanauskas & Sieracki, 2007), and soil and cultivated archaea (Kvist et al., 2007).
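The million-fold figure quoted above follows from simple arithmetic, sketched here under stated assumptions: an average mass of about 650 g/mol per base pair of double-stranded DNA (a standard approximation), a single genome copy per cell, and a target of 5 ng for library input (an illustrative value within the nanogram range mentioned earlier). The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one base pair of dsDNA


def genome_mass_fg(genome_bp, copies=1):
    """Mass in femtograms of `copies` genome equivalents of double-stranded DNA."""
    grams = genome_bp * BP_MASS_G_PER_MOL * copies / AVOGADRO
    return grams * 1e15  # 1 g = 1e15 fg


def fold_amplification(genome_bp, target_ng, copies=1):
    """Fold amplification needed to reach `target_ng` from a single cell."""
    target_fg = target_ng * 1e6  # 1 ng = 1e6 fg
    return target_fg / genome_mass_fg(genome_bp, copies)


if __name__ == "__main__":
    # An E. coli-sized genome of about 4.6 Mbp, amplified to 5 ng.
    mass = genome_mass_fg(4.6e6)
    fold = fold_amplification(4.6e6, target_ng=5.0)
    print(f"genome mass: {mass:.2f} fg; required amplification: {fold:.2e}-fold")
```

A genome of about 4.6 Mbp weighs roughly 5 fg, so reaching a few nanograms requires amplification on the order of 10^6-fold. Smaller genomes, or cells caught with more than one chromosome copy, shift the figure proportionally.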
Several reviews on single-cell genomics are available that describe popular approaches and catalog the most recent examples that apply these methods (Lasken, 2007
; Binga et al., 2008
; Ishoey et al., 2008
; de Jager & Siezen, 2011
; Kalisky & Quake, 2011
; Kalisky et al., 2011
; Yilmaz & Singh, 2011; Kamke et al., 2012
; Stepanauskas, 2012
; Lecault et al., 2012
; Fritzsch et al., 2012
). The bulk of this review will present emerging approaches to single-cell genome sequencing in depth from a fundamental point of view, highlighting and placing in context the unique features of different methods and potential pitfalls, with the goal of facilitating forward-thinking experimental design for those new to this rapidly developing field.
Configuring experimental approaches for single-cell genomics is now wonderfully complex due to the diversity of experimental techniques and tremendous potential for synergy within integrative approaches considering different data sets or data types. This can be realized in parallel workflows, where comparative analyses evaluate single-cell genomes versus isolate genome sequences and/or composite genomes of differing flavors, or even metagenomic and transcriptomic data sets. More exciting still is the prospect of combined workflows, where, for example, metagenomic reads are incorporated into an assembly of single-cell data (Blainey et al., 2011
), and single-cell data are used to guide target selection or resolve phase variation in targeted enrichment. A promising approach is the use of single-cell analysis to parameterize and validate the binning of genomes from metagenomic data (Hess et al., 2011
; J.A. Dodsworth, P.C. Blainey, S.K. Murugapiran, W.D. Swingley, C.A. Ross, del Rio, S.G. Tringe, S.R. Quake and B.P. Hedlund, unpublished data). Conversely, metagenomic data sets can be used to overcome data quality limitations in the assembly of single-cell data sets (Blainey et al., 2011
; J.A. Dodsworth, P.C. Blainey, S.K. Murugapiran, W.D. Swingley, C.A. Ross, del Rio, S.G. Tringe, S.R. Quake and B.P. Hedlund, unpublished data) or to place single-cell data sets in the broader context of an entire microbial community.
The future of single-cell microbial sequencing is bright, particularly as technical approaches mature and diversify. Single-cell genomic data provide useful insights by themselves and particularly in combination with genomic and metagenomic data sets.