De novo assembly of complete microbial genomes using new DNA sequencing technologies and without the aid of Sanger sequencing has been an unsolved challenge. We have developed a bi-level integrative approach, the Meta-Assembly, that answers this challenge.
Our Meta-Assembly strategy is composed of four key phases. In phase one, we integrated Illumina and 454 reads at the very beginning of the assembly process to generate hybrid contigs, instead of using Illumina reads only for error correction of an assembly generated from 454 reads alone. This early integration was important for the overall quality of the assembly: it reduced the number of degenerate nucleotide positions to ~41,000 N's, compared with ~90,000 N's when the Illumina reads were used only for error correction of the assembly generated by Newbler. In addition, we used EULER-SR instead of VCAKE as the short-read assembler, in contrast to an earlier report. An earlier study documented that de Bruijn graph-based algorithms such as EULER-SR and Velvet outperform VCAKE, and we found the same trend with our data. Since assembly of Illumina reads is the first step of the hybrid assembly phase, the quality of this initial assembly has the greatest impact on the outcome of the entire process. Moreover, EULER-SR is also capable of performing a de novo assembly with a mixture of Illumina and 454 reads, but its performance does not degrade with increasing read length. This proved to be a significant advantage, for we were able to exploit the complementary nature of EULER-SR and Newbler to develop the Scaffold Bridging and Finishing Phase, enabling us to resolve all of the degenerate nucleotides.
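The degenerate-position counts quoted above are the kind of figure that can be tallied directly from an assembly's FASTA output. The following is a minimal sketch, not the procedure used in this work: the function name `count_degenerate` and the choice to treat every non-ACGT character (N as well as other IUPAC ambiguity codes) as degenerate are assumptions made for illustration.

```python
from io import StringIO

UNAMBIGUOUS = set("ACGT")


def count_degenerate(fasta_handle):
    """Return {sequence_name: count of non-ACGT characters} for a FASTA stream.

    Assumption: any character outside A, C, G, T (case-insensitive) is
    counted as a degenerate position.
    """
    counts = {}
    name = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            counts.setdefault(name, 0)
        elif name is not None:
            counts[name] += sum(1 for base in line.upper() if base not in UNAMBIGUOUS)
    return counts


if __name__ == "__main__":
    # Hypothetical two-scaffold assembly used only to exercise the function.
    demo = StringIO(">scaffold1 demo\nACGTNNAC\nRYACGT\n>scaffold2\nACGT\n")
    for scaffold, n_degenerate in count_degenerate(demo).items():
        print(scaffold, n_degenerate)
```

Running the same count on two assemblies of the same reads gives a simple, if coarse, way to compare how many positions each strategy leaves unresolved.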
Comparison of Meta-Assembly to other assembly programs.
In the second phase, we maximized the complementary information provided by different assembly algorithms. This component is a key distinguishing aspect of our approach. Although Newbler alone was able to assemble the reads into five scaffolds, the resulting assembly had a considerable number of degenerate positions that could not be resolved merely by applying an error correction step with Illumina reads. Similarly, while EULER-SR and Velvet both generated high-quality contigs, they did not perform as well as Newbler in leveraging the paired-end information in the 454 reads. Our results clearly show that integrating more than one assembly algorithm is very important for enhancing the quality of the assembly.
In the third phase, the simple PCR-based search strategy allowed us to quickly order and orient the scaffolds into a circular genome. This is another unique aspect of our approach, in that we address the relative orientation of the scaffolds as well as their ordering with just a few PCRs. While we used the PCRs to order the scaffolds into a circular genome, we did not fill any gaps, as no sequence information is obtained from the PCRs. We note that as technology improvements allow paired-end sequencing reads with longer inserts, the necessity of this PCR step will decrease.
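To illustrate how a handful of positive PCR results can fix both the order and the relative orientation of scaffolds, the sketch below walks end-to-end links into a single circle. It is an illustration under stated assumptions, not the search strategy used in this work: the function `order_scaffolds`, the "L"/"R" end labels, and the scaffold names S1 to S5 are all hypothetical, and each positive PCR is assumed to join exactly one end of one scaffold to one end of another.

```python
def order_scaffolds(scaffolds, links):
    """Walk PCR-confirmed end-to-end links into one circular ordering.

    scaffolds: list of scaffold names (the first one anchors the circle).
    links: iterable of ((name_a, end_a), (name_b, end_b)) pairs; each end is
           "L" or "R", and a pair means a PCR product joined those two ends.
    Returns a list of (name, orientation), where "+" keeps the scaffold as
    given and "-" means it must be reverse-complemented.
    """
    partner = {}
    for a, b in links:
        if a in partner or b in partner:
            raise ValueError(f"scaffold end used in more than one link: {a}, {b}")
        partner[a] = b
        partner[b] = a

    start = scaffolds[0]
    path = []
    name, orientation = start, "+"
    while True:
        path.append((name, orientation))
        if len(path) > len(scaffolds):
            raise ValueError("links reference scaffolds outside the given list")
        # A forward scaffold is left through its R end, a reversed one through L.
        exit_end = "R" if orientation == "+" else "L"
        if (name, exit_end) not in partner:
            raise ValueError(f"no PCR link found for {name} end {exit_end}")
        next_name, entry_end = partner[(name, exit_end)]
        if next_name == start:
            if entry_end != "L":
                raise ValueError("circle does not close on the start scaffold's L end")
            break
        # Entering through L keeps the scaffold forward; entering through R flips it.
        name, orientation = next_name, "+" if entry_end == "L" else "-"

    if len(path) != len(scaffolds):
        raise ValueError("links do not join all scaffolds into a single circle")
    return path


if __name__ == "__main__":
    # Hypothetical positive PCRs for five scaffolds.
    pcr_links = [
        (("S1", "R"), ("S3", "L")),
        (("S3", "R"), ("S2", "R")),
        (("S2", "L"), ("S5", "L")),
        (("S5", "R"), ("S4", "R")),
        (("S4", "L"), ("S1", "L")),
    ]
    print(order_scaffolds(["S1", "S2", "S3", "S4", "S5"], pcr_links))
```

For n scaffolds, n positive links are enough to close the circle. This matches the point that the PCRs supply ordering and orientation but no sequence for the gaps.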
In the fourth and final phase, we aligned Illumina reads against the ordered scaffold to correct indels and errors introduced during the scaffold finishing phase.
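As a rough illustration of how aligned short reads can correct a finished scaffold, the sketch below applies a per-position majority vote. It is a simplified stand-in, not the procedure used in this work: it assumes ungapped read placements, so it can fix substitutions and degenerate bases but not indels, and the function name `polish_substitutions` and the thresholds `min_depth` and `min_fraction` are hypothetical.

```python
from collections import Counter


def polish_substitutions(reference, alignments, min_depth=3, min_fraction=0.8):
    """Replace reference bases that aligned reads consistently contradict.

    reference: scaffold sequence as a string.
    alignments: iterable of (start, read_sequence) ungapped placements, with
                start a 0-based offset into the reference.
    A base is replaced only when at least min_depth reads cover the position
    and at least min_fraction of them agree on the same base.
    """
    piles = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                piles[pos][base] += 1

    polished = []
    for ref_base, pile in zip(reference, piles):
        depth = sum(pile.values())
        if depth >= min_depth:
            base, count = pile.most_common(1)[0]
            if count / depth >= min_fraction:
                polished.append(base)
                continue
        # Too little or too inconsistent coverage: keep the original base.
        polished.append(ref_base)
    return "".join(polished)


if __name__ == "__main__":
    # Hypothetical scaffold with one degenerate position and four covering reads.
    scaffold = "ACGTNCGTA"
    reads = [(0, "ACGTAC"), (2, "GTACGT"), (3, "TACGTA"), (1, "CGTACG")]
    print(polish_substitutions(scaffold, reads))
```

Requiring both a minimum depth and a minimum agreement keeps poorly covered positions unchanged rather than overwriting them on thin evidence.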
To our knowledge, this is the first reported de novo assembly of a complete genome using next-generation sequencing technologies. Furthermore, our comprehensive comparative analysis of genomic characteristics of 895 microbial genomes reveals that KN400 is a characteristic microbial genome and is not an outlier in the space of all microbial genomes. We view our result as the demonstration of a general strategy for assembling genomes, wherein multiple data types are integrated at specific steps in the process to maximize the potential of their complementary nature, and wherein multiple assembly programs are utilized such that deficiencies in one algorithmic approach are compensated by the strengths of another. As new sequencing technologies and new assembly programs become available, they can be readily incorporated into this framework. Genome assembly will remain challenging for the foreseeable future, and we view such a readily extensible meta approach as one of the most promising ways to meet this challenge.