Despite enormous progress, the DNA sequencing of difficult regions, whether they are part of complex genomes or individual constructs, still presents a significant challenge and may require many trials, brute force, or the intervention of a very experienced sequencer (and sometimes all of the above). Very early on, it was realized that sequencing of various types of difficult templates requires distinct treatments. To sequence through GC-rich regions, the addition of DMSO, NP-40/Tween-20 detergents, or a 4:1 mix of BD3.0:dGTP3.0 was sometimes helpful. To get through long poly-A/T tails, one would sometimes be successful using tailored poly-A/T (V/B) N primers or primers that spanned part of the pre-tail and tail regions. A few years ago, Invitrogen introduced a set of sequencing additives that proved to be useful for many different types of difficult templates.
In this paper, we present data using a modified ABI DNA sequencing protocol that, in combination with Invitrogen’s and others’ reagents, allows for a clean read through many complex DNA regions. The major part of this modification is the incorporation of a 5-min controlled-heat denaturation step of the template in a low-salt buffer and in the presence or absence of various additives. A collection of over twenty difficult templates was tested under several conditions, and in 7 out of 22 cases no data at all were obtained using a standard ABI-like sequencing protocol. By adding just a heat-denaturation step, we were able to get 300–800 good-quality bases.
The last three decades have witnessed amazing progress in the development of many tools and techniques in molecular biology, one of which is DNA sequencing. Since its humble beginnings,1,2 when it still took a considerable amount of skill and effort to sequence even a relatively small molecule, DNA sequencing is now almost as routine as, for example, gene cloning or polymerase chain reaction (PCR) technology. Quite often, the only barrier to performing any sequencing experiment is the cost of a sequencing instrument itself (which ranges from below $100,000 to about $500,000, depending on manufacturer and required throughput). In the simplest incarnation of sequencing, all one needs to do is to combine a few components (DNA, primer, water, additives, if any, and detection mix), perform cycle sequencing, purify, and run on a detection instrument. If all is carried out correctly, one can expect up to 1000 readable bases/reaction. This is true most of the time, but only when the starting DNA template is of good quality, sufficient quantity, and does not contain regions that are considered difficult. For the purposes of this paper, we consider a template (or just some part of it) to be difficult when it cannot be sequenced using a standard ABI-like DNA-sequencing protocol.3
The difficult templates can be classified into the following categories:
One of the complicating factors in sequencing many different categories of difficult templates is the increasingly apparent realization that there may not be a one-method-fits-all solution and that each category requires a separate approach (or even set of approaches). A number of papers describe modified sequencing protocols. However, they appear to apply only to specific types of difficult templates at best.4–7,15–19
On the other hand, a single approach that incorporates a 5-min heat-denaturation step (with or without some other additives) was successfully applied to many different categories of difficult templates.7,13,20
In this paper, we review both general and specific approaches to sequencing many different kinds of difficult templates, starting with the description of the heat-denaturation step and its impact on the quality of sequencing data.
All materials and methods presented in this review were extensively described in earlier publications7,13,20,27,29,31 and will not be repeated here unless there is a notable deviation. However, for convenience, we will define the standard and modified sequencing protocols. The standard sequencing protocol refers to a basic protocol recommended by Applied Biosystems technical literature, as described, for example, in reference 3, with the exception that the final volume is 10 μL. Briefly, DNA, primer, water, and dye-terminator mix are combined and cycled 25 times: 96°C/10 sec, 50°C/5 sec, 60°C/4 min. In the modified sequencing protocol, we combine DNA, primer, and 10 mM Tris (pH 8.0), heat-denature samples for 5 min at 98°C, and then add dye-terminator mix. If additives are used, they are included in the heat-denaturation step.
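The two protocols defined above can be written down as data, which makes the only difference between them (the 5-min pre-step at 98°C) explicit. The following is a minimal sketch, not part of the published method: the names `Step`, `STANDARD_CYCLE`, and `hold_time_minutes` are hypothetical, the calculation ignores thermal-cycler ramp times, and it assumes the modified protocol uses the same 25-cycle program after the dye-terminator mix is added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One fixed-temperature hold (hypothetical helper type)."""
    temp_c: int
    seconds: int


# Cycle parameters as quoted in the text: 96°C/10 sec, 50°C/5 sec, 60°C/4 min.
STANDARD_CYCLE = (Step(96, 10), Step(50, 5), Step(60, 4 * 60))
N_CYCLES = 25

# Extra pre-step of the modified protocol: 5 min at 98°C in 10 mM Tris (pH 8.0).
HEAT_DENATURATION = Step(98, 5 * 60)


def hold_time_minutes(cycle, n_cycles, pre_steps=()):
    """Total programmed hold time in minutes; ramp times are NOT included."""
    cycling = n_cycles * sum(step.seconds for step in cycle)
    pre = sum(step.seconds for step in pre_steps)
    return (cycling + pre) / 60


standard = hold_time_minutes(STANDARD_CYCLE, N_CYCLES)
# Assumption: the same cycling program follows the heat-denaturation step.
modified = hold_time_minutes(STANDARD_CYCLE, N_CYCLES, (HEAT_DENATURATION,))
print(f"standard: {standard} min, modified: {modified} min")
```

Under these assumptions the denaturation step adds 5 min to roughly 106 min of programmed cycling, so its cost in instrument time is small.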
In the early days, DNA sequencing was dominated by non-thermostable polymerases, primarily the T7-based Sequenase,21–23 and to sequence any double-stranded template, the first step needed to be strand separation for the efficient annealing of the primer. There were many protocols designed for strand separation, all including heating (22°C to 100°C) in the presence or absence of 0.1 to 0.3 N NaOH followed by neutralization, precipitation, and re-suspension in the desired solution.24–27 These protocols were time-consuming, cumbersome, and insufficiently reliable for routine use. Instead, a simple heat-denaturation step in low-salt buffers and at elevated temperatures was suggested to convert supercoiled plasmid DNA efficiently to a single-stranded form amenable to sequencing.20 The denaturation can be carried out in water, but it occasionally produces additional bands that effectively reduce the amount of template available for sequencing (J. Kieleczawa, unpublished observation). The time needed to convert DNA effectively from the supercoiled form to the single-stranded (ss) form depends on the size of the plasmid; the bigger the plasmid, the shorter the time needed for this conversion. It takes 7.5 min (10 mM Tris-Cl, pH 8.0 buffer, 98°C) to convert 75% of pGem3zf (3.2 kbp) from supercoiled to ss form. (The 75% conversion level is selected to avoid potential degradation of DNA.) For any plasmid bigger than pGem3zf, one needs to subtract 1 min per multiple of 2.5 kbp from 7.5 min to achieve a similar level of denaturation. In fact, there is a linear relationship between the size of a plasmid, at least in the 3- to 20-kbp size range, and the time needed for efficient conversion to a form suitable for sequencing.20 As a rule of thumb, the time needed to achieve a similar level of denaturation in water should be halved compared to the time in 10 mM Tris-Cl buffer.
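The rule of thumb above can be turned into a small calculator. This is an illustrative sketch only: the function name `denaturation_minutes` is hypothetical, and it assumes that "1 min per multiple of 2.5 kbp" means 1 min for every 2.5 kbp of size in excess of the 3.2-kbp pGem3zf reference, applied linearly. That reading keeps the estimate positive up to the 20-kbp limit quoted in the text; the source does not spell out the exact formula.

```python
REF_SIZE_KBP = 3.2      # pGem3zf, the reference plasmid in the text
REF_TIME_MIN = 7.5      # 75% conversion in 10 mM Tris-Cl, pH 8.0, at 98°C
KBP_PER_MINUTE = 2.5    # subtract 1 min per 2.5 kbp (assumed: of excess size)
MAX_SIZE_KBP = 20.0     # upper end of the range where linearity is reported


def denaturation_minutes(size_kbp, in_water=False):
    """Estimate heat-denaturation time for ~75% conversion to ss form.

    Assumes a linear decrease with plasmid size above the reference and
    applies only to plasmids at least as large as pGem3zf and up to 20 kbp.
    """
    if not REF_SIZE_KBP <= size_kbp <= MAX_SIZE_KBP:
        raise ValueError("rule of thumb is stated only for 3.2-20 kbp plasmids")
    minutes = REF_TIME_MIN - (size_kbp - REF_SIZE_KBP) / KBP_PER_MINUTE
    # In water, the text suggests halving the time used in Tris-Cl buffer.
    return minutes / 2 if in_water else minutes


for size in (3.2, 8.2, 13.2):
    print(size, "kbp:", round(denaturation_minutes(size), 2), "min in Tris-Cl")
```

The estimate is a starting point for plasmids without difficult regions; templates outside that case should be timed empirically.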
Furthermore, the linear relationship between the size of the plasmid and the time needed for conversion holds true only for plasmids that do not contain any difficult regions. For example, templates rich in GC or with CTT require 30 and 20 min, respectively, for 75% conversion (J. Kieleczawa, unpublished observation). Surprisingly, plasmids with long poly-A/T tracts (70–80 bases) also require up to 20 min for effective conversion to ss form under similar conditions. In the presence of 2 mM MgCl2 (the final concentration of Mg ions under optimal cycle sequencing conditions using ABI’s dye-terminator chemistry3), there is no, or very little, conversion of supercoiled to ss form either during prolonged heat denaturation at 98°C or during cycle sequencing (Figure 1). It appears that the primary part that is transformed to a form amenable to sequencing is the nicked form; hence, one needs to use more DNA than necessary to compensate for the partial conversion. Table 1 shows the sequence data for different NaOH-induced plasmid denaturation protocols and the denaturation conditions recommended in this work. Based on the data presented in references 7, 20, and 28, there are the following advantages to incorporation of the heat denaturation step into a sequencing protocol:
The kinetics of reversal from ss form back to supercoil-like form is extremely slow (measured in days) both in the presence and in the absence of 2 mM MgCl2 and at temperatures ranging from 0 (ice) to 22°C (Kieleczawa, unpublished observation). So, even if the assembly of sequencing reactions takes a few hours, as it could in a factory-like sequencing pipeline, there is no need to take any special precautions after the initial heat denaturation (performed with DNA, primer, low-salt buffer, or water) and before the addition of dye-terminator mix. Furthermore, as pointed out by Kieleczawa and Wu,29 the dye-terminator mix can be stored for several days at room temperature (or even at 37°C) with no adverse influence on the quality of the resulting sequences. Hence, all assembly and pre-treatment steps can be safely performed at room temperature, with the only necessary precaution being control of evaporation of the sequencing mix to avoid excessive imbalance in the dNTPs and Mg ion concentrations.
In this section we briefly describe possible modifications that present the best chance(s) to sequence through specific types of difficult templates.
It is well established that the most important factor influencing good sequencing data is the quality of the DNA preparation, assuming the amount of DNA is sufficient. As the chemistry and sample treatment vary among commercial kits, it is reasonable to expect that there will be differences in the quality of sequence data obtained when the same DNA is prepared using different kits.30 In our studies31 we prepared twelve different difficult DNAs using six conventional DNA preparation methods and a Templiphi-based protocol to evaluate whether there is any advantage to using one kit over another. Two well-established commercial DNA kits (from EdgeBiosystems and Qiagen) seem to be most consistent in producing high-quality DNAs leading to the longest reads,31 as shown in Table 3. Templiphi produces branched and somewhat unstructured molecules32,33 that we expected would be easier to read through, but unfortunately this expectation turned out to be incorrect. The technical literature from Amersham34 (currently part of GE Healthcare) claims that certain types of difficult templates, prepared using Templiphi, are easier to sequence compared to those isolated with conventional methods. However, this claim needs to be independently verified.
In the era of Yahoo and Google, it is only natural that one would be tempted to scan the Web for information/advice regarding sequencing of difficult templates. Typing, e.g., “difficult DNA templates” into Google’s search engine results in about 341,000 hits (211,000 in Yahoo’s search engine), which leads to about 1000 sites as of January 2006. Rarely, though, does one get any advice beyond “DMSO or glycerol” for GC-rich samples, or some unspecified proprietary treatment in a few commercial sites. Also quite unusual is the fact that almost all sites cite the same or similar amounts of DNA needed for optimal read length, concentrations of interfering agents, etc., without providing references to any published or unpublished data. An example of such unsubstantiated advice is the recommended amount of PCR fragment needed for optimal read length: use 10 ng of DNA/100 bases. In our experiments28 we demonstrate that even for a PCR fragment of 1250 base pairs, 1 ng of DNA is fully sufficient for optimal read length. In fact, we carried out similar experiments on PCR fragments of about 500 and 900 bases (with the amount of DNA/reaction varied from 0.1 ng to over 500 ng) and the results did not support the 10 ng/100 bases claim (J. Kieleczawa, unpublished observation); in all cases, just 1 ng of DNA gave the optimal read length.
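To put the gap between the commonly quoted rule and the observations above into numbers, the short sketch below computes what the "10 ng of DNA/100 bases" rule would prescribe for the three fragment sizes mentioned. The function name `rule_of_thumb_ng` is hypothetical; the fragment lengths and the 1-ng observed amount are taken from the text, with "about 500 and 900 bases" treated as exact for the arithmetic.

```python
OBSERVED_SUFFICIENT_NG = 1.0  # amount reported in the text as giving optimal reads


def rule_of_thumb_ng(length_bp, ng_per_100_bases=10.0):
    """DNA amount (ng) prescribed by the '10 ng per 100 bases' Web advice."""
    return length_bp / 100 * ng_per_100_bases


for length_bp in (500, 900, 1250):
    prescribed = rule_of_thumb_ng(length_bp)
    excess = prescribed / OBSERVED_SUFFICIENT_NG
    print(f"{length_bp} bp: rule says {prescribed:.0f} ng, "
          f"{excess:.0f}x the amount found sufficient")
```

For the 1250-bp fragment the rule prescribes 125 ng, two orders of magnitude more than the 1 ng that was sufficient in the experiments described.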
Therefore, a reader (PC user) seeking fast Web advice concerning any aspect of DNA sequencing needs to be highly skeptical and selective, as the advice one gets may not necessarily be helpful (although in most cases it will not hurt either).
In this paper, we have reviewed past and current methods to sequence through many types of difficult templates. Although a number of solutions were suggested over the last few years, they all seem to be quite specific to a particular type of difficult template. The possible exception is the heat-denaturation modification, which appears to be more broadly applicable to several different difficult templates and, in combination with several additives, gives the best chance to obtain good-quality data. Data presented in Table 2 show that for 32% of difficult templates, adding just a heat-denaturation step was the only possible way to obtain any sequencing data. For 22% of templates, the improvement was on the order of 5–10% in read length. An increase of 10–50% in read length was evident for another 14% of templates, and for the remaining 32% the increase in read length was on the order of 50–175%. It is apparent that more experiments need to be carried out to develop more general rules, and it is entirely possible that, in the interest of time, any given very difficult template may need to be sequenced using a few different chemistries in parallel. One of the potential avenues to speed the development of general rules for sequencing of difficult templates is to organize a bank of well-characterized DNAs and involve the broad community in applying a range of technologies. The model of such a community effort is well established; for example, the DNA Sequencing Research Group (DSRG) conducted a study on the effect of DMSO on sequencing of some GC-rich templates.5,6 In collaboration with bioinformaticians, who now have extensive computational tools, it should be possible to look for a more detailed correlation between sequence patterns, especially right before and after difficult regions, and the success of a given chemistry.
It is also conceivable that one of the big sequencing centers, NIH or some other entity, will establish a special unit that will investigate in greater detail the sequencing of all types of difficult templates, and the art will become science.
As we approach an era of $1000-a-genome sequencing, one still-unanswered question is how these new technologies will deal with more complex regions. It is possible that difficult regions will not present any challenges, as most of these methods rely on assembly of very short overlapping fragments. Hopefully, researchers who are directly involved in the development of new technologies will be able to answer such questions relatively soon.
I wish to thank Dr. Laird Bloom of Wyeth Research for critical review of this manuscript and many valuable suggestions. The support and encouragement by the management of the Biological Technologies department of Wyeth Research is also greatly appreciated. I would like to thank Jones and Bartlett Publishers, Sudbury MA, for permission to reprint table 1-1 from (J. Kieleczawa, ed) DNA Sequencing: Optimizing the Process and Analysis, 2005, and tables 1-1 and 1-4 from (J. Kieleczawa, ed) DNA Sequencing: Optimizing the Preparation and Cleanup, 2006.