Environmental and clinical sampling for diagnostic, forensic, and metagenomic applications often yields mere nanograms of genetic material, an amount presently considered insufficient to support next-generation library preparation. Common practice is to amplify the material using PCR or whole genome amplification, methods that bias the overall representation of the sample, whether intentionally or not. There is a clear need for a straightforward and reliable method to bring nanogram and subnanogram samples onto next-generation sequencing platforms.
Quantifying sequencing libraries by mass, as recommended in the sequencing protocols, presents three major stumbling blocks that make the quantification inaccurate enough to compromise the sequencing results. First, mass-based quantification requires an accurate estimate of the length of the molecules to determine the molar concentration of DNA fragments. Second, degraded and damaged molecules that cannot be amplified in the massively parallel amplification step are counted. Third, methods of measuring DNA mass lack sensitivity and are inaccurate at or below low-nanogram quantities.
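The dependence of mass-based quantification on fragment length can be illustrated with a short calculation. The sketch below is not taken from any sequencing protocol; the function name is hypothetical, and the figure of 650 g/mol per double-stranded base pair is a commonly used approximation that is assumed here.

```python
AVOGADRO = 6.02214076e23    # molecules per mole
BP_MASS_G_PER_MOL = 650.0   # assumed average mass of one double-stranded base pair


def molecules_from_mass(mass_ng: float, mean_length_bp: float) -> float:
    """Estimate the number of dsDNA molecules in a sample from its mass.

    Hypothetical helper: it assumes every fragment has the stated mean
    length and that all of the measured mass is intact double-stranded DNA.
    """
    if mass_ng < 0 or mean_length_bp <= 0:
        raise ValueError("mass must be non-negative and length must be positive")
    grams = mass_ng * 1e-9
    moles = grams / (mean_length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


if __name__ == "__main__":
    # The same 1 ng of library, assuming two different mean fragment lengths.
    for length_bp in (400, 500):
        count = molecules_from_mass(1.0, length_bp)
        print(f"{length_bp} bp: {count:.3e} molecules")
```

Because the molecule count scales inversely with the assumed length, taking the mean fragment length to be 500 bp when it is actually 400 bp underestimates the number of molecules by 20%, before any error from damaged molecules or from the mass measurement itself is considered.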
Quantitative real-time PCR, and especially digital PCR, are ideal candidate techniques for this application because of their exquisite sensitivity. Some detection chemistries for real-time PCR, such as TaqMan, have the property of counting molecules rather than measuring DNA mass, although in the real-time modality, the measurements are relative and the methods by which standards are established often tie the real-time PCR results back to mass.
Recently, Meyer et al. developed a SYBR Green real-time PCR assay that allows the user to estimate the number of amplifiable molecules in sequencing libraries [11]. This was the first report of PCR-based quantification of sequencing libraries, and extended the sensitivity of library quantification significantly – although to an unknown extent, since the source material used to make the Neandertal (presumably the lowest input quantity) libraries was not quantified. However, the SYBR Green assay presents several disadvantages: SYBR Green I dye is an intercalating fluorochrome that gives signal in proportion to DNA mass, not molecule number; SYBR Green assays rely on external standards that limit the absolute accuracy and are not universal to all sample types; finally, intercalating fluorochromes give signal from nonspecific PCR products. After this manuscript was submitted, a report from the Sanger Center appeared describing the use of real-time TaqMan PCR to quantify sequencing libraries [12]. While this eliminates some of the problems related to SYBR Green, it was not applied to trace libraries and suffers from the same drawbacks as all real-time assays.
In a real-time assay, the standard must have the same amplification efficiency and molecular weight distribution as the unknown library sample. This means the user must have on hand a bulk sequencing library very similar to the trace library being made and that the molecular weight distributions of both the standard and the new library be known – often an impractical requirement for low-concentration shotgun libraries. Furthermore, this standard library must be of extremely high quality if mass-based quantification is to be used to calibrate the assay for amplifiable molecules. If not, the concentration of all the unknown samples will be overestimated, and the yield of enriched beads or clusters will be poor. For this reason, Roche and Illumina recommend carrying out a four-point titration run on their sequencers to empirically determine the quantity of DNA to be used before carrying out a bulk sequencing run with a new library. In addition, Illumina recommends that the user check the library quality with traditional Sanger sequencing before its application to high-throughput sequencing.
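The requirement that standard and unknown share the same amplification efficiency can be illustrated with an idealized model of exponential amplification. The sketch below is an assumption-laden illustration, not the calibration procedure of any particular instrument: the function names, the detection threshold of 1e10 molecules, and the efficiency values are all hypothetical.

```python
import math


def threshold_cycle(n0: float, efficiency: float, n_threshold: float = 1e10) -> float:
    """Cycle at which n0 templates reach n_threshold copies.

    Idealized model: every cycle multiplies the copy number by (1 + efficiency).
    """
    return math.log(n_threshold / n0) / math.log(1.0 + efficiency)


def estimate_from_standard(ct: float, std_efficiency: float, n_threshold: float = 1e10) -> float:
    """Back-calculate the starting copy number using the standard's efficiency."""
    return n_threshold / (1.0 + std_efficiency) ** ct


if __name__ == "__main__":
    true_n0 = 1e5
    # Unknown library amplifies at 90% efficiency, standard at 100% (assumed values).
    ct = threshold_cycle(true_n0, efficiency=0.9)
    estimate = estimate_from_standard(ct, std_efficiency=1.0)
    print(f"Ct = {ct:.2f}, estimated = {estimate:.3e}, true = {true_n0:.3e}")
```

In this model, a library that amplifies at 90% efficiency but is read against a standard amplifying at 100% is underestimated by roughly 2.5-fold at 1e5 starting molecules; if the standard is the less efficient of the two, the bias is in the opposite direction. The estimate is exact only when the two efficiencies match.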
Lastly, sequence-nonspecific detection chemistries like SYBR Green give signal from all dsDNA products generated, including primer dimers and nonspecific amplification products, which can be a severe issue in complex samples. In particular, side products can compete with specific amplification from low numbers (<1000) of template molecules, limiting the accuracy of SYBR Green quantification for dilute samples [13]. Although the presence of these side products can often be discerned by analysis of the product melting curve, opportunities to optimize the primers are limited due to the short length of the adaptor sequences and the specific nucleotide sequences required for compatibility with proprietary sequencing reagents. Sensitivity to side products gives SYBR Green a tendency toward overestimation of the sample quantity.
The characteristics of the quantification methods discussed are summarized in Table . The digital PCR method eliminates the issues associated with mass-based quantification and real-time PCR, as well as the requirement for titration, significantly reducing the cost of preparing a library for bulk sequencing. For example, the marginal cost of titrating a 454 library on the sequencer according to the manufacturer's protocol is $1500 – $2000, while the cost to quantify a sequencing library on the digital PCR chips is $30 – $90, depending on the number of panels dedicated to each library (typically 1 – 3 panels per library). In addition, PCR-based quantification saves time and leaves the expensive sequencing instrument free to carry out bulk sequencing runs.