Anecdotal evidence suggests that the proximity of a sequencing primer to a difficult region may affect read length and sequence quality. In this paper we sequenced many different categories of difficult regions, with primers located at various distances from such regions, and found only a weak correlation, if any, between primer proximity and read length or sequence quality. The occasional improvements observed in some studies could instead be related to better primers or better-quality DNA. We suggest that, instead of designing primers at varying distances from a difficult region, sequence finishers concentrate on applying modified chemistries appropriate to the given difficult region.
Despite tremendous advances using next generation sequencing technologies,1–3 the elucidation of DNA sequence by the Sanger protocol4 is still the method of choice in most DNA core facilities as well as in small and large sequencing centers. Advances in sequencing chemistries,5–8 optimization of auxiliary protocols,9–12 and improvements in instrumentation have made this technology flexible, reliable, and easy to use. Assuming that the quality and the quantity of the DNA preparation are acceptable, one can easily obtain over 900 bases of good quality for most nondifficult templates. However, if the DNA template is difficult (defined as one that cannot be sequenced using the standard ABI-like protocol5), more advanced protocols are needed to get a clean read through. In the last few years, significant progress has been made in sequencing through many kinds of difficult templates.13–22 However, one almost unexplored aspect of sequencing through any difficult region is the effect of primer proximity to that region on the read length. Currently, to the best of our knowledge, only anecdotal evidence exists (e.g., Ref. 23, personal communications) that primer proximity to a difficult region could affect read length and sequence quality. In this paper, we systematically explore the effect of primer proximity to a number of difficult regions on the ability to obtain clean and long reads through such regions.
The fourteen DNA templates used throughout this study contained a variety of difficult-to-sequence regions and were primarily collected through standard submission of sequencing requests to a DNA sequencing group at Wyeth, Cambridge. All of these templates were prepared using Marligen’s PowerPrep HP Plasmid Maxiprep System (Ijamsville, MD), and some DNAs were also prepared using the Sequence Resolver Kit.11
DNA sequencing (in triplicate for each primer), cleanup of sequencing reactions, and electrophoresis were carried out as described before.14,15 Modifications to a standard DNA sequencing protocol are described in the legend to Figure 6. All dye terminator mixes were purchased from Applied Biosystems (Foster City, CA) and betaine was from Sigma (Sigma-Aldrich, St. Louis, MO). Data were analyzed using the Sequencher program (Gene Codes, Ann Arbor, MI). For assembly into contigs, only the trace with the median read length of the three replicates was used for each primer.
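The median-of-three selection described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the `Trace` record, its field names, and the choice of read-length metric are assumptions made for the example.

```python
from typing import NamedTuple


class Trace(NamedTuple):
    name: str         # hypothetical trace identifier
    read_length: int  # read length in bases (metric assumed, e.g. Q20 bases)


def pick_median_trace(replicates: list[Trace]) -> Trace:
    """Return the replicate whose read length is the middle value of three."""
    if len(replicates) != 3:
        raise ValueError("expected exactly three replicate traces")
    # After sorting by read length, index 1 holds the median of three.
    return sorted(replicates, key=lambda t: t.read_length)[1]


if __name__ == "__main__":
    # Placeholder values for illustration only, not data from this study.
    triplicate = [Trace("rep1", 700), Trace("rep2", 650), Trace("rep3", 720)]
    print(pick_median_trace(triplicate))
```

Using the median rather than the best replicate avoids biasing the assembly toward an unusually long or unusually short read.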
Primer selection for this study was greatly facilitated by the Find Primer algorithm, which is part of the DNA sequencing LIMS developed at the Wyeth core facility.24–26 Briefly, this algorithm matches all primers available in our library against a reference sequence and displays the positions and orientations of the primers found. If needed, new primers can be designed at specified intervals by using another algorithm developed at the Wyeth core facility. An example of such a primer match is shown in Figure 1. In each case presented in this paper, several primers (from 3 to 28) were selected on both sides of a difficult region (Table 1). To predict various potentially difficult-to-sequence regions in templates, we developed the “Examine Repeats” algorithm,26 which can calculate up to seven different types of structure. In addition, the GC module calculates GC content in a reference sequence at specified intervals (Fig. 1). Examples of such predictions are shown in Figures 2 and 3.
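To make the two ideas above concrete, the sketch below matches a small primer library against a reference sequence on both strands and computes GC content in fixed windows. It is not the Find Primer or GC module code from the Wyeth LIMS, whose internals are not described here; the function names, the exact-match rule, and the non-overlapping windows are all assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer_sites(reference: str, primers: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (primer name, orientation, 0-based start) for every exact match.

    'forward' means the primer itself occurs in the reference as given;
    'reverse' means its reverse complement does. Starts are in reference coordinates.
    """
    reference = reference.upper()
    hits = []
    for name, seq in primers.items():
        seq = seq.upper()
        for orientation, pattern in (("forward", seq), ("reverse", reverse_complement(seq))):
            start = reference.find(pattern)
            while start != -1:
                hits.append((name, orientation, start))
                start = reference.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


def gc_content_windows(reference: str, window: int) -> list[tuple[int, float]]:
    """GC percentage for consecutive, non-overlapping windows of the reference."""
    if window <= 0:
        raise ValueError("window must be positive")
    reference = reference.upper()
    result = []
    for start in range(0, len(reference), window):
        chunk = reference[start:start + window]
        gc = sum(base in "GC" for base in chunk)
        result.append((start, 100.0 * gc / len(chunk)))
    return result


if __name__ == "__main__":
    # Toy reference and primers, invented for this example.
    ref = "GATTACACCCCCCTGCATGG"
    library = {"F1": "GATTACA", "R1": "CCATGCA"}
    print(find_primer_sites(ref, library))
    print(gc_content_windows(ref, 10))
```

A production tool would likely tolerate mismatches and report the 3′ end position directly; exact string search is used here only to keep the example short.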
Note: All primers used in this study passed the primer design criteria specified by Primer Designer software from Scientific and Educational Software (Cary, NC): Tm 54–70°C, GC% = 55±10, stability > 1.3 kcal/mole (3′ vs 5′), matches at 3′ end < 3, hairpin separation < 7, base runs < 4, adjacent homologous bases < 7, and repeats:dinucleotide pairs < 3.
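Two of the criteria listed above are simple enough to check directly. The sketch below tests only GC% = 55±10 and base runs < 4; it is not Primer Designer's implementation, and it assumes that "base runs" means the longest stretch of one repeated base. The remaining criteria (Tm, stability, 3′ matches, hairpins) depend on that software's own models and are deliberately left out.

```python
from itertools import groupby


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    if not primer:
        raise ValueError("primer must not be empty")
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def longest_base_run(primer: str) -> int:
    """Length of the longest stretch of a single repeated base."""
    if not primer:
        raise ValueError("primer must not be empty")
    return max(len(list(group)) for _, group in groupby(primer.upper()))


def passes_basic_criteria(primer: str) -> bool:
    """Check only two of the listed criteria: GC% within 55 +/- 10 and base runs < 4."""
    return 45.0 <= gc_percent(primer) <= 65.0 and longest_base_run(primer) < 4


if __name__ == "__main__":
    # Invented sequences, not primers from this study.
    for candidate in ("GCATGCCATGGCATCGTACG", "GGGGCATGCATGCATGCATG"):
        print(candidate, passes_basic_criteria(candidate))
```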
The characteristics of the difficult regions in each of the 14 templates used in this study, as well as the number of primers used in the forward and reverse directions, are presented in Table 1. The forward/reverse range indicates the distance (in bases) from the 3′ end of the sequencing primers to the beginning of a difficult region.
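The distance defined above can be computed from reference coordinates, as in the sketch below. The conventions are assumptions for this example, since the text does not state them: positions are 0-based on the reference strand, the distance is the coordinate difference between the primer's 3′ base and the first region base the read would reach, and for a reverse primer that first base is the last base of the region in reference coordinates.

```python
def forward_distance(primer_start: int, primer_length: int, region_start: int) -> int:
    """Distance from a forward primer's 3' base to the first base of the region.

    primer_start and region_start are 0-based reference coordinates.
    """
    three_prime = primer_start + primer_length - 1
    return region_start - three_prime


def reverse_distance(site_start: int, region_last: int) -> int:
    """Distance from a reverse primer's 3' base to the region as the read meets it.

    site_start is the leftmost reference coordinate of the primer's binding site,
    which is where the reverse primer's 3' base pairs. region_last is the last
    (inclusive) reference coordinate of the difficult region.
    """
    return site_start - region_last


if __name__ == "__main__":
    # Hypothetical coordinates: a 20-base forward primer at 0, a region spanning 120..179,
    # and a reverse primer whose binding site starts at 400.
    print(forward_distance(0, 20, 120))
    print(reverse_distance(400, 179))
```

A zero or negative result would mean the primer's 3′ end lies at or inside the difficult region under these conventions.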
There are two general cases observed in this study. Case 1: The forward and reverse reads stop at the beginning of a difficult motif (DNAs 4–6) or at some distance into such a region (DNAs 2, 13) without completely getting through, with the consequence that there is no assembly into a single contig. In this case read length necessarily depends on the distance of a primer from a difficult region, but in no situation was it possible to sequence through such a region. Case 2: The forward and reverse primers read through a difficult region and assemble into a single contig (all other DNAs in this study). Reads are somewhat shorter (with few exceptions) than typical read lengths of over 900 bases, and the relatively small standard deviations (1–15%, with a median of about 4.5%) for reads in either the forward or reverse direction indicate that primer position has no significant effect on the ability to obtain better-quality and longer reads. Figure 4 shows an example of a Sequencher assembly for DNA template 5, which contains a strong 24-base hairpin. All 11 sequences, regardless of the distance to the hairpin, terminated at the beginning of the hairpin and did not overlap with sequences generated using reverse primers (not shown here). In Figure 5 (DNA 8) the forward and reverse reads assemble into a single contig, but there is no significant effect of primer proximity to CA/GT dinucleotide repeats on the read length. Table 2 shows individual Q20 read-length values corresponding to the data presented in Figure 5.
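The two statistics behind this argument, the relative standard deviation of read lengths and the correlation between primer distance and read length, can be sketched as below. Whether the study used sample or population standard deviation, and which correlation measure (if any) was computed, is not stated; the sample standard deviation and Pearson's r are assumptions chosen for illustration, and the numbers in the usage example are placeholders, not values from Table 2.

```python
from math import sqrt
from statistics import mean, stdev


def relative_sd_percent(read_lengths: list[float]) -> float:
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(read_lengths) / mean(read_lengths)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    distances = [50, 150, 250, 350, 450]      # primer 3' end to region, in bases
    read_lengths = [780, 810, 765, 800, 790]  # Q20 read lengths, in bases
    print(f"relative SD: {relative_sd_percent(read_lengths):.1f}%")
    print(f"Pearson r:   {pearson_r(distances, read_lengths):.2f}")
```

A small relative standard deviation together with an r close to zero would be consistent with read length being insensitive to primer position.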
In all cases presented in this work (107 forward and 150 reverse primers tested on 14 different difficult templates), we did not observe any significant effect of primer proximity to a difficult region, either on the ability to read through the region (DNAs in case 1) or on read length and sequence quality (DNAs in case 2). A much better option for successfully sequencing through any kind of difficult template is to use modified chemistry,14,15 as shown in Figure 6A,B, or a template that was prepared with a different preparation method.11,27 The data in this figure show significant variations in read length (for the same primer) depending on the type of chemistry used. It is also evident that the optimal type of chemistry depends on the direction of sequencing. This phenomenon is explored more deeply in an upcoming paper based on the interlaboratory study conducted by the DNA Sequencing Research Group on a much larger set of difficult templates (J. Kieleczawa et al., accepted for publication in JBT).
I wish to thank Drs. L. Bloom and B. Ulmer for critical reading and numerous suggestions during the preparation of this manuscript.