J Biomol Tech. 2009 April; 20(2): 116–127.
PMCID: PMC2685608

Identification of Optimal Protocols for Sequencing Difficult Templates: Results of the 2008 ABRF DNA Sequencing Research Group Difficult Template Study 2008

Abstract

The 2008 ABRF DNA Sequencing Research Group (DSRG) difficult template sequencing study was designed to identify a general set of guidelines that would constitute the best approaches for sequencing difficult templates. This was a continuation of previous DSRG difficult template studies performed in 1996, 1997, and 2003. The distinguishing factors in the present study were the number of DNA templates used, the number of different types of difficult regions tested, and the inclusion of a follow-up phase of the study to identify optimal protocols for each type of difficult template. DNA templates with associated sequencing primers were distributed to participating laboratories and each laboratory returned their sequencing results along with descriptions of the experimental conditions used. The data were analyzed and the best protocols were identified for each difficult template. This information was subsequently distributed to the participating laboratories for a second round of sequencing to evaluate the general applicability of the optimized protocols. The average improvements in sequencing results were 11% overall, with a range of −25% to +43% using the optimized protocols. The full results from this study are presented here and they demonstrate that general experimental protocols and common additives can be used to improve the sequencing success for many difficult templates.

Keywords: difficult template, DNA sequencing, research group study

INTRODUCTION

The “classical” Sanger DNA sequencing technique1 is a well-established and mature technology used very successfully in many core facilities and large sequencing centers. Until 2005, when the first of the new, highly parallel sequencing instruments were introduced,2 the Sanger method was the dominant sequencing technology used globally. Throughout the 31 years since its introduction, almost every step in the DNA sequencing process has been optimized, and re-optimized, as technology changed from radioactivity to fluorescence, and from slab gels to capillary-based systems.3–12 With all of these improvements, it is now routinely possible to obtain over 900 Q ≥ 20 bases13,14 for most typical DNA templates. However, this is not necessarily true when encountering a difficult region, operationally classified as such if sequencing is impeded using the standard ABI protocol.3 The complexity lies in the fact that there are many types of difficult regions,15 and each situation requires a unique treatment.15,16 Previously published papers have addressed either singular situations or a narrow range of templates,17,18–22 and therefore have limited applications for the broader scope. Confounding factors are that individual laboratories are not standardized in terms of reaction conditions, cleanup methods, instrumentation, and laboratory protocols.

This study was designed to address some of the shortcomings of the previous studies, and to assist core facilities, commercial laboratories, and other units encountering such situations to deal more effectively with a variety of nonstandard templates. Although the next generation sequencing technologies are making tremendous strides in all aspects of sequencing applications, the Sanger methodology will remain viable for many years to come, and the ability to effectively sequence a variety of difficult templates will be of great and lasting importance to the success of any sequencing project.

METHODOLOGY

The DNA Sequencing Research Group (DSRG) designed this study to identify a general set of guidelines that would constitute the best approaches for sequencing of difficult templates. This was a continuation of previous DSRG research group studies performed in 1996, 1997, and 2003.23,24 The distinguishing factors in the present study were the number of DNA templates tested, the number of different types of difficult regions tested, and the inclusion of a follow-up phase in the study. This follow-up phase involved the generation of consensus “optimal protocols” for each difficult template.

The DSRG distributed a set of 8 templates containing a variety of difficult regions along with a control DNA (pGem3zf) to each participating laboratory (refer to Table 1 for characteristics of these DNAs). Participants were requested to sequence each template, in triplicate, employing as many different conditions as they wished. The resultant electropherograms were collected along with the associated conditions and formulations used by individual laboratories. This was designated phase I of the study. The data from phase I were analyzed and the two best protocols were identified for each difficult template and control sample. In phase II of the study, this information was distributed to participating laboratories, and each laboratory was requested to re-sequence the samples in a second round to evaluate the general applicability of the optimized protocols. Results from this second round (phase II) were then collected and analyzed. In both phase I and phase II, the participants were asked to record a number of parameters, including the amounts of DNA and primers, reaction conditions (volumes and dilutions of reagents and additives), cycling parameters, thermocycler used, cleanup methods, sequencing instrument, etc. For phase II of this study, very specific polymerase chain reaction instruments and cleanup methods were recommended for each optimized protocol; however, we were aware that it was unreasonable to expect that laboratories could follow all specific instructions, due to inherent restrictions in availability of instrumentation or technologies. We assumed that polymerase chain reaction instruments and cleanup protocols would have only negligible effects on the overall quality and read length of sequencing data, and that the most critical components would be the sequencing chemistry (mix of various dye-terminators and other additives) as well as precycling steps and cycling conditions.

TABLE 1
DNA Characteristics and Q ≥ 20 Ranges for Data from Phase I and Phase II

DNA Preparation—Quality and Distribution

To ensure the uniformity of all DNAs, a single laboratory prepared all the sequencing templates and distributed aliquots to participating laboratories. The samples were transformed using electrocompetent TOP10 cells (Invitrogen, Carlsbad, CA, One Shot TOP10 Electrocomp E. coli, catalog no. C4040-52) and large-scale DNA preps were performed using the High Purity Plasmid Maxiprep System (Marligen Biosciences, Ijamsville, MD, catalog no. 11452-026). The DNA concentration and A260/280 ratio were measured using a Nanodrop ND-1000 spectrophotometer (Nanodrop, Wilmington, DE). All DNAs had A260/280 ratios > 1.8, indicating a high quality of DNA.25 In addition, aliquots of approximately 200 ng of all DNAs were run on a 1% agarose gel26 to visually assess their quality and integrity (Fig. 1) using a low DNA mass ladder and a supercoiled DNA ladder (Invitrogen, Carlsbad, CA, catalog nos. 10068-013 and 15622-012, respectively) for comparison.

FIGURE 1
Assessing the quantity and integrity of difficult templates used in the DSRG study. Aliquots of about 200 ng of each DNA were run on a 1% agarose gel using standard molecular biology protocols.26 Lanes 1–8 indicate DNAs 1–8. Lane 9 is ...

Data Analysis

The Q ≥ 20 values were calculated using Sequence Scanner v1.0 (Applied Biosystems; Foster City, CA). We also evaluated signal strength (using the same software), but these data were of limited value because the various BigDye dilutions and cleanup protocols used by participants rendered signal strength comparisons inconsequential. We also evaluated the electropherograms using the “contiguous read length” metric in Sequence Scanner. This is defined as the longest contiguous read length with quality higher than a specified limit, and we observed no substantial difference between Q ≥ 20 and contiguous read length parameters. The data were also analyzed using other software and algorithms including the KB basecaller (Applied Biosystems), LongTrace (Nucleics, Bendigo VIC, Australia), and PHRED.13,14 The Q ≥ 20 values were not significantly different; therefore, we report the data as Q ≥ 20 values using the Sequence Scanner software package. The Q ≥ 20 values are an accepted measure of quality for sequencing traces for standard DNAs.13,14 However, the visual inspection of chromatograms for all templates used in this study quite often indicated that the usable trace region was shorter than the reported Q ≥ 20 value.
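
To make the two metrics concrete, the following minimal Python sketch counts Q ≥ 20 bases and finds the longest unbroken run of such bases in a list of per-base quality values. It is an illustration only and not the algorithm used by Sequence Scanner, which may apply windowing or trimming rules not described here. The function names and the example quality values are hypothetical. The only outside fact relied on is the Phred definition Q = −10 log10(P), under which Q = 20 corresponds to a 1% probability of a wrong base call.

```python
from typing import Sequence


def count_q20(quals: Sequence[int], threshold: int = 20) -> int:
    """Number of bases whose quality value is at or above the threshold."""
    return sum(1 for q in quals if q >= threshold)


def longest_contiguous_run(quals: Sequence[int], threshold: int = 20) -> int:
    """Length of the longest unbroken stretch of bases at or above the threshold.

    Simplified stand-in for a "contiguous read length" metric (assumption:
    no sliding window or averaging is applied).
    """
    best = current = 0
    for q in quals:
        current = current + 1 if q >= threshold else 0
        best = max(best, current)
    return best


def error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Hypothetical per-base quality values for a very short read (illustration only).
example_quals = [12, 25, 31, 40, 18, 22, 35, 35, 36, 9, 28]

print(count_q20(example_quals))               # total bases with Q >= 20
print(longest_contiguous_run(example_quals))  # longest unbroken Q >= 20 stretch
print(error_probability(20))                  # error probability at Q20
```

In this toy read the count of Q ≥ 20 bases exceeds the longest contiguous run because low-quality bases interrupt the read. A gap of this kind is one way the usable region of a trace can be shorter than its Q ≥ 20 value.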

Additives

In addition to BigDye v3.1 (Applied Biosystems, catalog no. 4337455) used for cycle sequencing, common additives were dGTP v3.0 (Applied Biosystems, catalog no. 4390229), betaine (available as a 5 M solution from various distributors, such as catalog no. B-0300 from Sigma Aldrich, St. Louis, MO, or catalog no. 77507 from USB, Cleveland, OH), and DMSO (Sigma-Aldrich, catalog no. D2650).

RESULTS AND DISCUSSION

Table 1 shows the characteristics of the DNA templates and the range of Q ≥ 20 values for phase I and phase II of this study. In phase I over 50 different protocols were tested from 21 different laboratories, and the participants submitted data with a wide range of Q ≥ 20 values (zero to greater than 1000 bases). The protocols producing the best results were identified and were selected for phase II. Table 2 shows the compilation of the 10 most optimal protocols submitted by the study participants, and these protocols are assigned to specific templates, shown in Table 3. Often, multiple protocols differed only in the number of cycles, so the protocol with a median number of cycles was chosen as the representative for the group. These protocols were distributed to the participants so that each laboratory could re-sequence the templates using one or both of the optimized protocols.
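
The selection of a representative protocol by median cycle number can be sketched as follows. This is a hypothetical illustration: the `Protocol` type, the laboratory names, and the cycle counts are invented, and the source does not say how a group with an even number of members was handled, so the lower middle entry is taken here as an assumption.

```python
from typing import List, NamedTuple


class Protocol(NamedTuple):
    name: str    # hypothetical label for a submitted protocol
    cycles: int  # number of thermal cycles


def representative(group: List[Protocol]) -> Protocol:
    """Return the protocol whose cycle count is the median of the group."""
    if not group:
        raise ValueError("group must contain at least one protocol")
    ordered = sorted(group, key=lambda p: p.cycles)
    # For an even-sized group, take the lower of the two middle entries
    # (assumption; the source does not state a tie-breaking rule).
    return ordered[(len(ordered) - 1) // 2]


# Hypothetical protocols that differ only in the number of cycles.
group = [Protocol("lab A", 25), Protocol("lab B", 40), Protocol("lab C", 30)]
print(representative(group).name)
```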

TABLE 2
Most Optimal DNA Sequencing Protocols Selected from Phase I
TABLE 3
Assignment of Protocols from Phase I to Specific DNA Templates

Somewhat surprising was the wide spread in read length for control DNA distributed with each set of difficult templates. This may indicate that there were factors not controlled for in this study, perhaps unknown experimental procedures that affected the sequencing results.

Of the original 21 laboratories, 12 submitted results for phase II (see Table 4 for the results from these laboratories for both phase I and phase II). Most results (90%) from phase II showed improvements over phase I results, but in 18 of the 180 different results analyzed, the participating laboratories produced better data using an in-house protocol than the recommended protocols distributed in phase II (Table 4, blue). The Q ≥ 20 scores from 36 of the 180 different results improved significantly using the protocols provided for phase II (Table 4, red). It is worth noting that, for the most part, those laboratories that submitted poor or average data in phase I demonstrated the most improvement in phase II (Table 4), and that most laboratories already included various combinations of dGTP, betaine, and DMSO in their initial formulations. Overall, the average Q ≥ 20 scores from phase I to phase II for the 12 participating laboratories improved by an average of 11% (Fig. 2). The maximum improvement was observed for the DNA1:forward primer, which showed an average of 43% improvement in Q ≥ 20 score. Only one template:primer combination (DNA5:Reverse) showed a decrease in average Q ≥ 20 scores in the phase II results (−25%). What was unexpected, however, was the wide range of scores also observed for the data submitted in phase II. It is likely that this individual variation is due to the fact that most laboratories, although following a standard set of reaction conditions, by necessity, could not standardize every aspect of the protocols (e.g., individual labs used cleanup protocols available to them, often not the protocol identified as best), as noted above.
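
A percentage change of the kind reported above can be computed as in the short sketch below. The scores are hypothetical, and the sketch assumes the change is taken between the mean phase I score and the mean phase II score for one template:primer combination. The source does not state whether this, or an average of per-laboratory changes, was the calculation used.

```python
from statistics import mean
from typing import Sequence


def percent_change(phase1: Sequence[float], phase2: Sequence[float]) -> float:
    """Percentage change of the mean Q >= 20 score from phase I to phase II."""
    base = mean(phase1)
    if base == 0:
        raise ValueError("phase I mean is zero; percentage change is undefined")
    return 100.0 * (mean(phase2) - base) / base


# Hypothetical Q >= 20 scores from three laboratories for one template:primer pair.
phase1_scores = [400, 600, 500]
phase2_scores = [550, 650, 600]
print(percent_change(phase1_scores, phase2_scores))
```

A negative return value corresponds to a decrease, as was seen for DNA5:Reverse.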

FIGURE 2
The average Q ≥20 scores for each template/primer combination from laboratories returning data for both phase I and phase II. The average Q ≥20 scores for each template/primer combination from Table 4 are shown graphically. The overall ...
TABLE 4
Comparison of Sequencing Results for Each Participant During Phase I and Phase II

Examples of Sequencing of a Few Difficult Templates

Figure 3 shows average (A) and best (B) chromatograms for DNA 1 (a very GC-rich template). Figure 3C shows different, potentially impeding DNA sequencing motifs in this DNA, including a region that is greater than 95% GC, CCG trinucleotide repeats, and dinucleotide nonrepeats. The mixed dye-terminator (BigDye v3.1/dGTP v3.0/5 M betaine: 1.5 μL/0.5 μL/2 μL) and the standard ABI cycling protocol offered the most optimal conditions for sequencing this template. Figure 4 shows average (A) and best (B) chromatograms for DNA 3 (containing a strong hairpin structure, as seen in Fig. 4C). Again, the protocol used for DNA 1 was the most optimal. Sequencing this DNA, and similar templates with strong hairpins, can be tricky, as one can often get clean but incorrect data. Li et al.27 described a detailed analysis of this phenomenon. Using the protocol with the Sequence Resolver kit28 produced very clean and correct data (J. Kieleczawa, data not shown; see also ref. 15). DNA 5 contained a very long (456 bases) stretch of C/T dinucleotide nonrepeat, as shown in Figure 5C. When using typical sequencing conditions, the average read length is less than 400 bases (Fig. 5A), and for the best conditions (protocol 1) this read length exceeds 600 bases (Fig. 5B). DNA 8 has an Alu and inverted repeat (Fig. 5C). On average, participants were able to obtain sequence with read lengths of approximately 600 bases (Fig. 6A), and the best data (Fig. 6B) were obtained using protocol 1. In each case, adding a heat denaturation step to the most optimal protocol15 improved the data quality (J. Kieleczawa, data not shown; see also ref. 15). A few years ago a new and powerful biochemical tool28 was developed to help sequence through many types of difficult templates (Sequence Resolver Kit from GE Healthcare), but it seems that its use is currently limited.
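
For laboratories setting up several reactions at once, the mixed dye-terminator volumes quoted above (1.5 μL BigDye v3.1, 0.5 μL dGTP v3.0, 2 μL 5 M betaine) can be scaled to a master mix. The sketch below assumes these are per-reaction volumes and adds a pipetting overage, which is a common bench practice but not part of the study protocol. The 10% default, the function name, and the dictionary layout are hypothetical. Template, primer, and water are not included because their amounts are not given in this passage.

```python
from typing import Dict

# Per-reaction volumes in microlitres, taken from the mix quoted in the text
# (assumed here to be per single sequencing reaction).
PER_REACTION_UL: Dict[str, float] = {
    "BigDye v3.1": 1.5,
    "dGTP v3.0": 0.5,
    "5 M betaine": 2.0,
}


def scale_mix(n_reactions: int, overage: float = 0.10) -> Dict[str, float]:
    """Scale the additive mix to n reactions plus a fractional overage.

    The overage (default 10%) is a hypothetical allowance for pipetting loss.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    if overage < 0:
        raise ValueError("overage must not be negative")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


# Example: volumes for 8 reactions with the default overage.
for reagent, volume in scale_mix(8).items():
    print(f"{reagent}: {volume} uL")
```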

FIGURE 3
The average (A) and the most optimal (B) results for DNA 1. Note that for an average protocol, the sequence stops after approximately 300 bases, whereas the most optimal protocol yields sequence data past 900 bases. Horizontal red bars indicate the ...
FIGURE 4
The average (A) and the most optimal (B) results for DNA 3. Note that for an average protocol, the sequence deteriorates after approximately 200 bases. The most optimal protocol yields sequence data past 800 bases, although significant background noise ...
FIGURE 5
The average (A) and the most optimal (B) results for DNA 5. Note that for an average protocol, the sequence deteriorates after approximately 300 bases, and the most optimal protocol yields sequence data to approximately 600 bases. Horizontal red bars ...
FIGURE 6
The average (A) and the most optimal (B) results for DNA 8. Note that for an average protocol, the sequence deteriorates after approximately 600 bases. The most optimal protocol yields sequence data over 1000 bases, although significant background noise ...

In this study we have evaluated many different protocols used in DNA sequencing centers for their ability to effectively sequence various difficult templates. Given that the characteristics of each difficult template are ultimately determined by the primary structure of the DNA molecule and may appear in a nearly infinite variety of combinations, it is not surprising that we were unable to identify a single protocol that would be effective for all types of difficult regions. However, the results of this study can be used as general guidelines and approaches to serve as starting points for troubleshooting difficult template sequencing. For example, protocol 1 in Table 2 had the widest application and should be considered as the initial procedure for resolving the sequencing of troublesome regions. In addition, these data also demonstrated the validity of incorporating additives and reagents to aid in sequencing through the most difficult templates.

With the renewed commitments from Applied Biosystems (M. Rosoff, S. Santhanam, personal communication, Salt Lake City 2008) to the continued support of capillary-based systems, and the development of new dye-terminator chemistries,29 it is reasonable to expect that further and significant progress will soon be possible. In addition, bioinformatics tools30 used to predict various DNA sequencing impeding structures may enable more effective resolution of most nonstandard sequences. Such tools cannot, however, predict problems that may arise from templates of unknown sequence. Currently there are very limited data analyzing the effectiveness of the next generation of sequencing technology for sequencing through difficult templates. We hope to explore this subject in the near future.

Acknowledgments

We would like to thank Kim Marquette, Michelle Mader, and Erica Mazaika of Wyeth Research, Cambridge, MA for the sample preparation and distribution efforts. Without the dedication and great effort of all participating laboratories, we would not have succeeded in providing the community with the best currently available protocols.

REFERENCES

1. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA. 1977;74:5463–5467.
2. Margulies M, Egholm M, Altman WE, et al. Genome-sequencing in micro-fabricated high-density picolitre reactors. Nature. 2005;437:376–380.
3. ABI PRISM® BigDye™ Terminator v3.1 Cycle Sequencing Kit: Protocol: Rev A. Part number 4337035. Applied Biosystems, Foster City, CA, 2002.
4. Automated DNA Sequencing Chemistry Guide. Part number 4305080B. Applied Biosystems, Foster City, CA, 2000.
5. Azadan RJ, Fogleman JC, Danielson PB. Capillary electrophoresis sequencing: Maximum read length at minimal cost. BioTechniques. 2002;32:24–28.
6. Taylor GO, Dunn IS. Automated cycle sequencing of PCR templates: Relationships between fragment size, concentration and strand renaturation rates on sequencing efficiency. DNA Sequence—The Journal of Sequencing and Mapping. 1994;5:9–15.
7. Tabor S, Richardson CC. Effect of manganese ions on the incorporation of dideoxynucleotides by bacteriophage DNA polymerase and E. coli DNA polymerase I. Proc Natl Acad Sci USA. 1989;86:4076–4080.
8. Naeve CW, Buck GA, Niece RL, Pon RT, Robertson M, Smith AJ. Accuracy of automated DNA sequencing: A multi-laboratory comparison of sequencing results. BioTechniques. 1995;19:448–453.
9. Seto D. An improved method for sequencing double stranded plasmid DNA from minipreps using DMSO and modified template preparation. Nucleic Acids Res. 1990;18:5905.
10. Yamakawa H, Ohara O. A DNA cycle sequencing reaction that minimizes compressions on automated fluorescent sequencers. Nucleic Acids Res. 1997;25:1311–1312.
11. Kieleczawa J, editor. DNA Sequencing: Optimizing the Process and Analysis. Sudbury, MA: Jones and Bartlett; 2005.
12. Kieleczawa J, editor. DNA Sequencing II: Optimizing Preparation and Cleanup. Sudbury, MA: Jones and Bartlett; 2006.
13. Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8:175–185.
14. Ewing BG, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998;8:186–194.
15. Kieleczawa J. Fundamentals of sequencing of difficult templates—An overview. J Biomol Tech. 2006;17:207–217.
16. Kieleczawa J. Simple modifications of the standard DNA sequencing protocol allow for sequencing through siRNA hairpins and other repeats. J Biomol Tech. 2005;16:220–223.
17. Gerstner A, Sasvari-Szekely M, Kalasz H, Guttman A. Sequencing difficult DNA templates using membrane-mediated loading with hot sample application. BioTechniques. 2000;28:628–630.
18. Ducat DC, Herrera FJ, Triezenberg SJ. Overcoming obstacles in DNA sequencing of expression plasmids for short interfering RNAs. BioTechniques. 2003;34:1140–1144.
19. Esposito D, Gillette W, Hartley JL. Blocking oligonucleotides improve sequencing through inverted repeats. BioTechniques. 2003;35:914–920.
20. Langan JE, Rowbottom L, Liloglou T, Field JK, Risk JM. Sequencing of difficult templates containing poly (A/T) tracts: Closure of sequencing gaps. BioTechniques. 2002;33:276–280.
21. Thomas MG, Hesse SA, McKie AT, Farzaneh F. Sequencing of cDNA using anchored oligo dT primers. Nucleic Acids Res. 1993;21:3915–3916.
22. Zhao X, Haqqi T, Yadav SP. Sequencing telomeric DNA templates with short tandem repeats using dye terminator cycle sequencing. J Biomol Tech. 2000;11:111–121.
23. Adams PS, Dolejsi MK, Hardin S, et al. DNA sequencing of a moderately difficult template: Evaluation of the results from a Thermus thermophilus unknown test sample. BioTechniques. 1996;21:678.
24. Hawes JW, Escobar H, Hunter T, et al. DNA Sequencing Research Group Difficult Repeat Sequence Study. 2003. http://www.abrf.org/ResearchGroups/DNASequencing/EPosters/DSRG2003Study.pdf
25. Wilfinger WW, Mackey K, Chomczynski P. Assessing the quantity and integrity of RNA and DNA following nucleic acid purification. In: Kieleczawa J, editor. DNA Sequencing II: Optimizing Preparation and Cleanup. Sudbury, MA: Jones and Bartlett; 2006. pp. 291–312.
26. Sambrook J, Russell DW. Molecular Cloning. 3rd ed. Cold Spring Harbor, NY: CSH Laboratory Press; 2001.
27. Li T, Ait-Zahra M, Wu P, Kieleczawa J. Sequencing through various secondary structures: Detailed studies of pDEST vectors and other DNA templates with hairpins. In: Kieleczawa J, editor. DNA Sequencing III: Dealing with Difficult Templates. Sudbury, MA: Jones and Bartlett; 2008. pp. 109–123.
28. Xiao H, Fuller CW. Improving sequence results from difficult templates with Phi 29 DNA polymerase and nucleotide analogs: The TempliPhi Sequence Resolver Kit. In: Kieleczawa J, editor. DNA Sequencing III: Dealing with Difficult Templates. Sudbury, MA: Jones and Bartlett; 2008. pp. 91–108.
29. Yang A. Solutions for sequencing difficult regions. In: Kieleczawa J, editor. DNA Sequencing III: Dealing with Difficult Templates. Sudbury, MA: Jones and Bartlett; 2008. pp. 65–90.
30. Kieleczawa J, Koffman D, Lakshmanan B, Kitzmiller A. Bioinformatics tools to aide sequencing of difficult templates. In: Kieleczawa J, editor. DNA Sequencing III: Dealing with Difficult Templates. Sudbury, MA: Jones and Bartlett; 2008. pp. 163–177.

Articles from Journal of Biomolecular Techniques : JBT are provided here courtesy of The Association of Biomolecular Resource Facilities