A myriad of uncultivated bacteria are found in environments ranging from surface ocean [1] to the human body [2]. Advances in DNA amplification technology have enabled genome sequencing directly from individual cells without requiring growth in culture. These genome-centric culture-independent studies are a powerful complement to gene-centric metagenomics studies.
Genome sequencing requires that the femtograms of DNA present in a single cell be amplified into the micrograms of DNA necessary for existing sequencing technologies. Genomic sequencing from single bacterial genomes was first demonstrated [3] with cells isolated by flow cytometry, using multiple displacement amplification (MDA) [4–6] to prepare the template. MDA is now the preferred method for whole genome amplification from single cells [7, 8]. The first attempt to assemble a complete bacterial genome from one cell [9] further explored the challenges of assembly from MDA DNA, including amplification bias and chimeric DNA rearrangements. Amplification bias results in orders of magnitude difference in coverage [3], and absence of coverage in some regions. Chimera formation occurs during the DNA branching process by which the phi29 DNA polymerase generates DNA amplification in MDA [10], but increased sequencing coverage helps to alleviate this problem.
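To make the scale of the required amplification concrete, the following is a rough back-of-the-envelope sketch rather than a calculation taken from the cited studies. It assumes an average molar mass of about 650 g/mol per base pair of double-stranded DNA, a hypothetical 4.6 Mbp bacterial genome present as a single copy, and a hypothetical target of 1 microgram of DNA; the function names are illustrative.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MOLAR_MASS = 650.0    # g/mol per base pair (common approximation, assumed here)


def genome_mass_fg(genome_size_bp: float) -> float:
    """Mass of one double-stranded genome copy, in femtograms."""
    grams = genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return grams * 1e15


def fold_amplification(genome_size_bp: float, target_ug: float) -> float:
    """Fold increase needed to go from one genome copy to target_ug micrograms."""
    start_grams = genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return (target_ug * 1e-6) / start_grams


if __name__ == "__main__":
    size_bp = 4.6e6  # hypothetical genome size, for illustration only
    print(f"one genome copy: {genome_mass_fg(size_bp):.1f} fg")
    print(f"fold amplification to reach 1 ug: {fold_amplification(size_bp, 1.0):.1e}")
```

Under these assumptions a single genome copy weighs a few femtograms, so reaching microgram quantities takes an amplification of roughly eight to nine orders of magnitude.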
Single cell sequencing methods have enabled investigation of novel uncultured microbes [11–13]. However, while recent studies have continued to improve assemblies [14–18], the full potential of single cell sequencing has not yet been realized. The challenges facing single cell genomics are increasingly computational rather than experimental [17]. All previous single cell studies used standard fragment assembly tools [19, 20], developed for data models characteristic of standard (rather than single cell) sequencing. These algorithms are not ideal for use with non-uniform read coverage. Most existing fragment assembly tools implicitly assume nearly uniform coverage, and most produce erroneous contigs (linking non-contiguous genomic fragments) when the rate of chimeric reads (or chimeric read pairs) exceeds a certain threshold. Thus, there is a need to adapt existing fragment assembly tools for single cell sequencing.
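The cost of assuming uniform coverage can be illustrated with a toy example. The sketch below is not the method of any particular assembler: it assumes that the uniform-coverage assumption appears as a single global coverage cutoff below which sequence is discarded as likely erroneous. The coverage values are made up for illustration, and the function name is hypothetical.

```python
def retained_fraction(coverage, cutoff):
    """Fraction of genomic positions whose coverage is at least `cutoff`."""
    kept = sum(1 for depth in coverage if depth >= cutoff)
    return kept / len(coverage)


# Two synthetic coverage profiles over 1000 positions, both with mean depth 100x.
# Uniform: every position is covered equally (idealized standard sequencing).
uniform = [100] * 1000

# Skewed: coverage spans orders of magnitude, loosely mimicking amplification bias.
skewed = [1] * 300 + [10] * 300 + [50] * 200 + [433] * 100 + [434] * 100

if __name__ == "__main__":
    cutoff = 20  # hypothetical global cutoff, chosen as a fraction of the mean depth
    for name, profile in (("uniform", uniform), ("skewed", skewed)):
        mean_depth = sum(profile) / len(profile)
        kept = retained_fraction(profile, cutoff)
        print(f"{name}: mean depth {mean_depth:.0f}x, retained {kept:.0%}")
```

Both profiles have the same average depth, yet the fixed cutoff keeps every position of the uniform profile and discards the genuinely sequenced but poorly amplified positions of the skewed one. Lowering the cutoff would recover those positions, but would also let through more of the low-coverage erroneous sequence that the cutoff is meant to remove.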
We developed a specialized software tool for assembling sequencing reads from single cell MDAs. Applying it to single cell datasets from two known genomes and an unknown marine genome yielded valuable assemblies that identified the majority of genes, without any effort to close gaps or resolve repeats.