Herein we present Gene-Collector, a method for multiplex amplification of nucleic acids. The procedure has been employed to successfully amplify the coding sequence of 10 human cancer genes in one assay with uniform abundance of the final products. Amplification is initiated by a multiplex PCR, in this case with 170 primer pairs. Each PCR product is then specifically circularized by ligation on a Collector probe capable of juxtaposing only the perfectly matched cognate primer pairs. Amplification artifacts typically associated with multiplex PCR using many primer pairs, such as false amplicons and primer-dimers, are not circularized and are degraded by exonuclease treatment. Circular DNA molecules are then further enriched by randomly primed rolling circle replication. Amplification was successful for 90% of the targeted amplicons as seen by hybridization to a custom resequencing DNA micro-array. Real-time quantitative PCR revealed that 96% of the amplification products were within 4-fold of the average abundance. Gene-Collector has utility for numerous applications such as high throughput resequencing, SNP analyses, and pathogen detection.
DNA analysis instruments are becoming increasingly powerful in their capacity for sequence analysis. DNA resequencing microarrays (1,2) and high throughput parallel sequencing instruments (3,4) are currently used for whole genome analyses of low complexity genomes down to single nucleotide resolution. However, the human genome remains too large to access without complexity reduction by directed amplification of specific sequences. To match the throughput of these instruments, the amplification bottleneck needs to be addressed with more efficient technologies.
To increase assay throughput and allow for more efficient use of precious DNA samples, simultaneous amplification of many targets can be carried out by combining many specific primer pairs in individual PCRs (5,6). However, a crucial problem with PCR is that when large numbers of specific primer pairs are added to the same reaction, both correct and incorrect amplicons are formed. At a later stage, this skews the uniformity of the products to the point where many amplicons drop out in favor of artifacts. Even with careful attention paid to the design of the primers, PCR is usually limited to 10–20 simultaneous reactions before yield and evenness are compromised by the accumulation of irrelevant amplification products (7,8). Therefore, large numbers of separate PCRs are typically performed whenever many genomic sequences need to be analyzed.
The correct amplicons in a multiplex PCR have a unique feature compared to the false ones in that their end sequences are composed of a cognate primer pair, as opposed to a primer from one pair combined with a primer from another pair. The method we present herein takes advantage of this feature and specifically circularizes only the cognate paired ends through hybridization and ligation on a so-called Collector oligonucleotide probe. After the specific circularization reaction, two measures are used to enrich for circular DNA: exonuclease treatment for selective degradation of linear DNA, and rolling circle amplification. The method is thereby not limited by the primer cross-reaction-based amplification artifacts typically associated with multiplexed PCR.
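The distinction between cognate and mismatched amplicon ends can be illustrated with a minimal Python sketch. It is not part of the published method: the primer identifiers, the `primer_to_target` lookup and the function name are hypothetical, and the sketch only models which primers sit at the two ends of a product, not the hybridization and ligation chemistry or any size constraint on circularization.

```python
from typing import Dict


def is_cognate_pair(end_a: str, end_b: str, primer_to_target: Dict[str, str]) -> bool:
    """Return True when the two end primers are the two different primers
    designed for the same target (a cognate pair)."""
    target_a = primer_to_target.get(end_a)
    target_b = primer_to_target.get(end_b)
    # Unknown primers, the same primer at both ends, or primers from
    # different targets all count as non-cognate.
    return end_a != end_b and target_a is not None and target_a == target_b


# Hypothetical primer names for two targets (illustration only).
primer_to_target = {
    "T1_F": "T1", "T1_R": "T1",
    "T2_F": "T2", "T2_R": "T2",
}

print(is_cognate_pair("T1_F", "T1_R", primer_to_target))  # correct amplicon ends
print(is_cognate_pair("T1_F", "T2_R", primer_to_target))  # cross-pair artifact ends
```

In this simplified picture, only products for which the function returns `True` would be candidates for circularization on a Collector probe.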
Gene-Collector is related to the previously published Selector technology (9). Instead of circularizing multiplexed PCR-amplified DNA targets, the Selector technology circularizes specific genomic DNA targets derived from restriction enzyme digestions. As a consequence, the Selector technology requires a unique probe design for every specific set of target sequences (10), which renders it less modular in comparison to Gene-Collector, where new sets of Collector oligonucleotides can be mixed with any previously existing ones, making all Collector probes compatible with each other. We demonstrate the specificity and flexibility of Gene-Collector by multiplex amplification of 170 targets located in the coding regions of 10 human cancer genes: EGFR, AKT1, AKT2, APC, FRAP1, KRAS, MARK3, SMAD4, TGFBR2 and TP53.
All oligonucleotides were synthesized at the Stanford Genome Technology Center; see Supplementary Table 1 for primers, probes and target amplicon sequences. The thymidines in the Collector probes were substituted with uracil bases to allow degradation of the probes by uracil-DNA glycosylase. However, this enzymatic step was later found to be unnecessary and was removed from the protocol.
First, multiplex PCR was run in 50μl with all 340 primers (170 pairs) at 100nM concentration each using 10 units pfu polymerase in 1 × pfu buffer (Stratagene), 200μM each dNTP and 200ng human genomic DNA, at 95°C for 5min, then [(95°C for 30s; 55°C for 2min; 72°C for 8min) × 8], followed by 72°C for 10min. Excess primers were removed by adding exonuclease I and incubating for 30min at 30°C, followed by removal of enzymes on a Qiagen PCR purification column. Amplicon circularization by ligation was performed on 20nM of each collector probe in 1× Ampligase buffer (Epicentre), 5 units Ampligase, 5 units OptiKinase (USB), 1mM ATP, 1mM DTT at 37°C for 30min, then [(95°C for 30s; 65°C for 2min; 55°C for 1min; 60°C for 5min) × 10] in 50μl. Linear DNA was then degraded with a combination of exonuclease I, exonuclease T7 gene 6 and λ-exonuclease for 45min at 37°C, and the reaction was stopped by heating for 20min at 80°C. The circular DNA was concentrated on a second Qiagen PCR purification column, eluted in the supplied elution buffer and left to evaporate for ~45min at 65°C. One microliter of the 10-fold concentrated circles was added to a 10μl TempliPhi reaction (GE) supplemented with 10% DMSO and run at 30°C for 16h, then inactivated at 65°C for 10min.
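As a rough check on the quantities in this protocol, the short Python sketch below adds up the programmed hold times of the two thermocycling programs and the primer amounts in the multiplex PCR. The helper `program_minutes` is a hypothetical name introduced for illustration, and the totals ignore instrument ramp times, so actual run times would be somewhat longer.

```python
def program_minutes(pre_s, cycle_steps_s, n_cycles, post_s=0):
    """Sum of programmed hold times in minutes (ramp times not included)."""
    return (pre_s + n_cycles * sum(cycle_steps_s) + post_s) / 60


# Multiplex PCR: 95°C 5 min, 8 x (30 s, 2 min, 8 min), 72°C 10 min.
pcr_min = program_minutes(5 * 60, [30, 120, 480], 8, 10 * 60)

# Ligation: 37°C 30 min, 10 x (30 s, 2 min, 1 min, 5 min).
ligation_min = program_minutes(30 * 60, [30, 120, 60, 300], 10)

# 340 primers at 100 nM each in a 50 µl reaction.
total_primer_uM = 340 * 100 / 1000        # nM -> µM
pmol_per_primer = 100 * 50 / 1000         # nM x µl = fmol; /1000 -> pmol

print(pcr_min, ligation_min, total_primer_uM, pmol_per_primer)
# -> 99.0 115.0 34.0 5.0
```

The combined primer concentration of 34μM gives a sense of how much primer must be removed before circularization.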
A 50-kb high-density DNA array was designed by Affymetrix to match the 10-gene reference sequence. The collector amplified product was purified in a PCR purification column (Qiagen). One hundred and fifty nanograms of purified product was fragmented, labeled and finally hybridized according to the protocol provided by Affymetrix (GeneChip CustomSeq Resequencing Array Protocol). The array was washed and stained using the Affymetrix GeneChip Fluidics Station 450 and scanned using GeneChip Scanner 3000 according to the protocol. The scanned probe array image was analyzed using Affymetrix GeneChip Sequence Analysis Software.
Ten microliter reactions containing 400nM of qPCR primers specific for the individual amplicons with 2μl of the TempliPhi reaction diluted 1000-fold in TE buffer were performed to assay their relative abundance. Bio-Rad Sybr Green master mix (1×) was used on an ABI 7900 instrument, see Supplementary Table 1 for primers.
Coding-sequence-specific PCR primer pairs were designed using ExonPrimer (http://ihg.gsf.de/ihg/ExonPrimer.html) for 10 cancer genes, see Supplementary Table 1. The resulting 170 primer pairs were synthesized and pooled into one tube. A multiplexed PCR was then run for eight cycles using pfu polymerase which generates blunt-end PCR products suitable for circularization by ligation (11). Excess primers were then removed using a single strand-specific exonuclease followed by a Qiagen PCR product purification column. A pool of Collector probes, each specific to one correct amplicon then guided a circularization reaction of matched PCR primer pair ends and closed circles were formed by a DNA ligase enzyme. The ligation reaction also involved a pre-step at 37°C for phosphorylation of 5′-ends by a kinase enzyme prior to ligation. Circularization was then followed by the addition of an exonuclease cocktail to degrade linear DNA such as amplification artifacts, genomic DNA and excess Collector probes. The circularized sequences were finally amplified using hyper-branched rolling circle amplification with random hexamers and phi-29 polymerase, TempliPhi (12). An outline of the Gene-Collector procedure is displayed in Figure 1.
The success rate of the amplification was assessed by hybridizing the final product on an Affymetrix custom-designed resequencing array containing probes scanning the coding sequence of these 10 genes with four variant probes for each nucleotide position, A, T, G and C. The array revealed that 90% of the target sequences had been successfully amplified as assessed by providing accurately read sequence for at least 30% of the nucleotides in each individual amplicon located in continuous stretches of sequence. The performance of the resequencing array itself will be reported elsewhere (Dahl et al. in preparation). Using real-time PCR with primers specific to the individual amplicons, we evaluated the failed amplifications and at which stage of the Collector protocol they had dropped out, see Table 1. Several sequences could probably be recovered through re-design of the initial multiplex PCR primers or by using prevalidated primer sets.
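One possible reading of the success criterion above (at least 30% of an amplicon's nucleotides called, counted within continuous stretches) is sketched below in Python. This is an assumption-laden illustration and not the analysis actually used: the no-call symbol `N`, the minimum stretch length `min_run` and both function names are hypothetical, since the text does not specify how a continuous stretch was defined.

```python
def called_fraction_in_runs(calls: str, min_run: int = 10) -> float:
    """Fraction of positions that are called (A/C/G/T) and lie in a run of
    at least `min_run` consecutive calls. `min_run` is an assumed parameter."""
    if not calls:
        return 0.0
    covered = 0
    run = 0
    # A trailing no-call acts as a sentinel so the last run is counted.
    for base in calls.upper() + "N":
        if base in "ACGT":
            run += 1
        else:
            if run >= min_run:
                covered += run
            run = 0
    return covered / len(calls)


def amplicon_passes(calls: str, min_fraction: float = 0.30, min_run: int = 10) -> bool:
    """Apply the 30% threshold from the text to the run-based call fraction."""
    return called_fraction_in_runs(calls, min_run) >= min_fraction


# Hypothetical base calls for one 40-nt amplicon: 20 called, 20 no-calls.
example = "ACGTACGTACGTACGTACGT" + "N" * 20
print(called_fraction_in_runs(example), amplicon_passes(example))
# -> 0.5 True
```

Scattered single calls do not count toward the fraction under this definition, which is the intent of restricting the criterion to continuous stretches.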
Uniform abundance of each product is an important feature of any multiplex amplification protocol, especially when used as a sample preparation step for the next generation high-throughput sequencing instruments, to avoid over- or under-sampling of target amplicons. The initial multiplex PCR is conducted under very non-stringent conditions in order to give all target sequences the best chance of efficient amplification. This would normally generate many amplification artifacts but these are efficiently removed by circularization and exonuclease degradation. To ensure uniformity of the multiplex-PCR, extension times were required to be long at 8min, with primer hybridizations conducted at 55°C for 2min. Each stage of the reaction was analyzed for evenness by quantitative PCR, see Figure 2. Surprisingly, some primer pairs which did not work in individual PCRs under standard conditions as analyzed by agarose gel, did produce the correct product with the Gene-Collector procedure (data not shown). The final amplification by TempliPhi was supplemented with a 10% final DMSO concentration to reduce the skewing effects of varied amplicon GC content. The average abundance of each final product was estimated to be at ~10nM in a 10μl reaction volume with 96% of all amplicons having no less than one-fourth of the average abundance.
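The evenness figure quoted above can be expressed as a simple calculation on qPCR data. The Python sketch below is illustrative only and rests on assumptions not stated in the text: it converts threshold cycle (Ct) values to relative abundance as 2^(-Ct), which presumes perfect doubling per cycle for every amplicon, and the Ct values and function names are made up for the example.

```python
from typing import List


def relative_abundance(ct_values: List[float]) -> List[float]:
    """Convert Ct values to relative abundance, assuming 100% PCR efficiency."""
    return [2.0 ** -ct for ct in ct_values]


def fraction_at_least_quarter_of_mean(abundances: List[float]) -> float:
    """Fraction of amplicons with no less than one-fourth of the mean abundance."""
    mean = sum(abundances) / len(abundances)
    return sum(a >= mean / 4 for a in abundances) / len(abundances)


# Hypothetical Ct values for four amplicons (not data from this study).
cts = [20.0, 20.0, 20.0, 25.0]
evenness = fraction_at_least_quarter_of_mean(relative_abundance(cts))
print(evenness)
# -> 0.75
```

Here the amplicon with Ct 25 is 32-fold less abundant than the others and falls below the one-fourth threshold, so three of the four amplicons pass.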
In order to measure the levels of false amplification products generated by the Gene-Collector protocol, the final product was cloned and sequenced. The TempliPhi reaction produces concatemeric products of ~10kb each, which were fragmented by sonication, gel purified and cloned into a sequencing vector. Ninety-six colonies were picked and Sanger sequenced, yielding 93 reads, of which 58% were of expected products, see Table 2. As cloning selects the sequence representation randomly, it provides an additional measure of frequency distribution. Most amplicons appeared only once, showing even representation. Nine amplicons appeared twice and two of the targets three times. No non-specific products appeared more than once. The fraction of paired matched primers found among the non-specific products was much lower than for the specific ones. As can be seen in Table 2, few non-specific products were formed by two matched primer pairs amplifying a non-target sequence. This type of false product would still become circularized by the Collector probe but is not the main source of errors. A complete list of sequences is available in Supplementary Table 2. As expected from cloned rolling-circle-amplified material, many sequencing reactions produced concatemeric reads of repeated elements. Interestingly, this provided redundant sequencing within one and the same read with up to 3-fold coverage.
We have amplified all the coding sequences located in 10 cancer genes using a multiplexed procedure termed Gene-Collector. Resequencing of large numbers of cancer-related genes has recently been shown to provide important biological insights into the disease (13). Even with extensive optimization, standard multiplex PCR is not a feasible approach to large-scale genetic studies as the failure rate is too high due to the many false amplicons outcompeting the correct ones for the amplification reagents. However, even though these false amplicons do result, the correct products are also present and at uniform abundance early in the amplification. Gene-Collector reduces the presence of false products, enabling further amplification of the correct ones.
The initial multiplex PCR presented here used very relaxed conditions, with a low hybridization temperature and long hybridization time, in order to give all primer pairs the ability to hybridize. Polymerization of all templates was assured by a long extension time and an ample amount of DNA polymerase. These conditions were suitable for all amplicons as the Collector procedure removes artifacts by exonuclease degradation. Primer-dimer artifacts, which are a major problem in traditional multiplexed PCR, are of little concern for Gene-Collector, as circularization is not possible for such short DNA strands due to the lower size limit of partially double stranded circular DNA (14).
Alternatively, one may use PCR in the final amplification of the circularized amplicons, which then gives distinct bands on standard agarose gel (Baner and Fredriksson in preparation). This version of the Gene-Collector protocol includes a general primer pair motif within the Collector probe and generates a purer product than the randomly primed RCA. This could, for example, be suitable for rapid multiplex pathogen detection using electrophoretic separation.
The relative abundance of products from the rolling circle reaction was very even. The rarely observed unevenness of this final product could be due to various factors. Amplicon lengths span from 160 to 800bp and GC content varies, possibly resulting in different circularization efficiency and/or final amplification efficiency. As only a few of the amplification artifacts found by cloning and Sanger sequencing contained a primer sequence, we believe these to be mainly associated with the randomly primed RCA, which is known to also amplify linear DNA but with a much lower efficiency. The impurities may also be derived from remaining fragments of genomic DNA and, if so, their relative presence should decrease with increased levels of multiplexing. Further improvement of the final product purity is desired for certain applications and is under development. One may also note that target sequences could be arrayed if the circularization is performed on immobilized Collector probes.
Gene-Collector should be of great value for a wide range of amplification-based applications, particularly in combination with highly parallel DNA analysis platforms. The level of further multiplexing achievable with the Gene-Collector protocol will probably be limited more by how many primer pairs one can use in the initial multiplex PCR than by the circularization process. One class of parallel DNA analysis is large-scale sequencing and resequencing platforms (15), such as sequencing by hybridization (1,2), sequencing by ligation (4) or sequencing by synthesis (3) systems. The Collector technology also shows promise for combination with PCR-intense genotyping methods (7,16), like mini-sequencing (17,18) and primer extension-based methods in concert with mass spectrometry analysis (19), as well as high throughput pathogen detection. Gene-Collector could also be combined with genetic variation detection techniques that require many single PCRs (20,21) to increase assay throughput.
In summary, the presented multiplexed protocol enables analysis of small and precious sample materials, reduces enzyme consumption and offers higher throughput of DNA amplification.
Supplementary Data is available at NAR online.
This work was supported by the Swedish Research Council, The Swedish Society for Medical Research, The Wenner-Gren Foundations, and the NIH (Center Grant 2P01HG000205). Special thanks to Keith Anderson and Mike Jensen at the Stanford Genome Technology Center for synthesis of oligonucleotides. Funding to pay the Open Access publication charge was provided by NIH (P01HG000205).
Conflict of interest statement. S.F. and F.D. are inventors on a patent application describing the published method.