The Genome Analyzer is a powerful sequencing technology, but it is still relatively new and has not yet reached its full sequencing potential. Here we have described a number of modifications that allow more efficient library preparation and enable a stable workflow in a production environment.
At the Sanger Institute, in addition to a sequencing research and development team we have several teams who are responsible for keeping the production instruments running. A library-making group processes samples, and generates, QCs and quantifies libraries. A production group, working in shifts, prepares and QCs flowcells by SybrGreen staining, prepares reagents for sequencing, and manages washing, priming and loading the instruments seven days per week. Informatics teams are responsible for facilitating sample tracking, for handling the sequence data and for performing pipeline analyses. All steps in the process are recorded using custom-written lab-tracking and run-tracking database software. All Genome Analyzers are networked and the generated image data is continually uploaded to a large compute and disk-storage cluster for image/base-calling analysis, alignment/assembly and other informatics tasks. Images are kept for about 1 month on a disk server, whereas run QC and other run details are stored in a database and short-read sequences are deposited permanently in a large repository. Individual sequencing projects are coordinated and overseen by a team of project managers.
We have recently upgraded all of our Genome Analyzers to the model 2. The wider flowcells used by upgraded machines offer a 40% greater imaging area, with the potential for increased read lengths (>70 bases) of higher quality (error rate below 1% for bases 1-50). Combined with improvements to the image analysis software and a faster run time, both of which we are currently testing, a conservative prediction is that by Christmas 2008 our output will reach 6-10 terabases of high-quality sequence per year - equivalent to 180 human genomes at 15-fold coverage, or approximately 200,000 bases per second.
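The throughput equivalences quoted above can be checked with a short calculation. The following is a minimal sketch, assuming a haploid human genome of roughly 3 gigabases and a 365-day year; the function names and the genome-size constant are illustrative assumptions and do not come from our pipeline.

```python
# Back-of-envelope check of the projected yearly output.
# Assumptions (not stated in the text): ~3e9 bases per human genome,
# 365-day year, instruments running continuously.

SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000 seconds
HUMAN_GENOME_BASES = 3e9             # assumed haploid genome size


def genomes_per_year(bases_per_year, coverage=15, genome_size=HUMAN_GENOME_BASES):
    """Number of genomes sequenced per year at the given fold coverage."""
    return bases_per_year / (genome_size * coverage)


def bases_per_second(bases_per_year):
    """Average sustained output in bases per second."""
    return bases_per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    for terabases in (6, 8.1, 10):
        yearly = terabases * 1e12
        print(
            f"{terabases} Tb/year: "
            f"{genomes_per_year(yearly):.0f} genomes at 15x, "
            f"{bases_per_second(yearly):,.0f} bases/s"
        )
```

Under these assumptions, 180 genomes at 15-fold coverage corresponds to about 8.1 terabases per year, which falls inside the predicted 6-10 terabase range, and the figure of approximately 200,000 bases per second corresponds to the lower end of that range.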
The improved workflow and high yield should maintain the Genome Analyzer as our next-generation sequencing platform of choice for the immediate future. How long this remains true depends upon the performance of existing rival technologies: 454, ABI’s SOLiD, Helicos’ ‘True Single Molecule Sequencing’ (http://www.helicosbio.com) and Dover Systems’ Polonator (http://www.polonator.org), and those that are on the horizon, such as nanopore technologies, for example Oxford Nanopore Technologies (http://www.nanoporetech.com) and the Harvard Nanopore Group (http://golgi.harvard.edu/branton/index.htm), and Pacific Biosciences’ Single Molecule Real Time technology (http://www.pacificbiosciences.com), which promise to bring us closer to the eagerly anticipated $1,000 genome.