Next-generation DNA sequencing (NGS) technologies have driven a dramatic increase in intermediate-scale, targeted resequencing applications. Targeted resequencing is a generally useful approach for discovering polymorphisms and mutations of interest among candidate regions and for validating novel variants and mutations from complete genomes and exomes (1). NGS-based targeted resequencing also has immediate application as a clinical diagnostic for identifying pathogenic mutations in medical conditions such as inherited diseases and cancer. Therefore, it has become increasingly important to develop accessible, cost-effective and flexible methods for designing customized capture assays that target any region of the human genome, including regions beyond coding sequences. Currently, few resources are available for targeted resequencing of non-coding regions of the human genome. We present a genome-wide solution for targeted resequencing of loci from the human genome. Relying on a capture technology we developed, our genome-wide design covers the human genome with in-solution capture probes. As a result, it both provides exome coverage and facilitates the analysis of non-coding regions such as promoters and regulatory sequences. These non-coding regions are of increasing interest with regard to disease-related polymorphisms and mutations.
As recently described by Natsoulis et al. (3), this in-solution capture approach enables targeted resequencing of large sets of genomic loci, covering up to 1 Mb and potentially more. Using highly multiplexed pools of single-stranded 80-mer capture oligonucleotides to circularize target genomic regions en masse, this capture assay enables the amplification of target-specific regions with a universal set of PCR primers common to all targets. A capture oligonucleotide contains two single-stranded capture arm sequences that mediate circularization by hybridizing specifically to the complementary flanking sequences of a genomic target. A fixed-sequence general motif links the two capture arms to form a complete capture oligonucleotide 80 bp in length. Each circularization reaction also incorporates a 40-bp universal vector oligonucleotide that complements the capture oligonucleotide's general motif. This vector provides the universal primer sequences necessary for downstream amplification. Previously, we designed a set of capture oligonucleotides spanning the human exome (http://oligoexome.stanford.edu) and demonstrated that customized capture assays could be easily developed using this resource (3).
Figure 1. Schema for target-specific capture and amplification by selective genomic circularization. This schema for the Natsoulis et al. (3) capture protocol describes the major steps for conducting capture and amplification of a target region.
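The layout of a capture oligonucleotide described above (two capture arms joined by a fixed general motif, 80 bases in total, with the motif complementary to the 40-bp vector) can be illustrated with a minimal sketch. Several details here are assumptions rather than statements from the text: the 20-base arm length is inferred from 80 minus a 40-base motif, the arm order is derived from simple base-pairing geometry, and the function names and the example vector sequence are hypothetical.

```python
# Minimal sketch of capture oligonucleotide assembly (illustrative only).
OLIGO_LEN = 80   # stated in the text
MOTIF_LEN = 40   # assumed equal to the 40-bp vector it pairs with
ARM_LEN = 20     # assumed: (80 - 40) / 2; not stated in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_capture_oligo(five_prime_site: str, three_prime_site: str, vector: str) -> str:
    """Assemble an 80-base capture oligonucleotide.

    five_prime_site  : genomic bases (target strand) bound by the 5' capture arm
    three_prime_site : genomic bases at the 3' terminus of the restriction fragment
    vector           : 40-base universal vector (placeholder sequence in examples)

    Assumed geometry: the circle reads 3' site, vector, 5' site along the
    target strand, so the bridging oligonucleotide is its reverse complement.
    """
    expected = (
        ("five_prime_site", five_prime_site, ARM_LEN),
        ("three_prime_site", three_prime_site, ARM_LEN),
        ("vector", vector, MOTIF_LEN),
    )
    for name, seq, length in expected:
        if len(seq) != length:
            raise ValueError(f"{name} must be {length} bases, got {len(seq)}")
    motif = reverse_complement(vector)  # general motif pairs with the vector
    oligo = (
        reverse_complement(five_prime_site)
        + motif
        + reverse_complement(three_prime_site)
    )
    assert len(oligo) == OLIGO_LEN
    return oligo
```

Under these assumptions, the reverse complement of the returned oligonucleotide reads as the 3′ site, then the vector, then the 5′ site, which is the junction that ligation would close.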
In brief, the full protocol includes the following key steps (Figure 1): (i) genomic DNA is subjected to restriction enzyme digestion by a single enzyme; (ii) genomic targets are circularized by adding the vector sequence together with capture oligonucleotide pools that are specific to the given restriction enzyme; (iii) a 5′ exonuclease cleaves the unbound 5′ sequence (the 5′ flap); (iv) circularization is completed by ligation; and (v) a uracil glycosylase reaction linearizes the circularized molecules to produce capture regions flanked by universal primer sequences. These molecules can then be uniformly amplified by PCR and prepared for sequencing.
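Step (i) determines which restriction fragment carries a given target, which in turn fixes where capture arms can be placed. The sketch below models a single-enzyme digestion in silico on one strand and looks up the fragment containing a target interval. It is a simplified illustration, not the authors' design pipeline: the function names are hypothetical, coordinates are 0-based and half-open, and the recognition site and cut offset are supplied by the caller because the text does not name the enzymes used.

```python
from __future__ import annotations


def digest(sequence: str, site: str, cut_offset: int) -> list[tuple[int, int]]:
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the strand
    is cut. Returns half-open (start, end) fragment coordinates.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop empty fragments that arise when a cut falls at either end.
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def fragment_containing(
    fragments: list[tuple[int, int]], start: int, end: int
) -> tuple[int, int] | None:
    """Return the fragment that fully contains [start, end), or None."""
    for frag_start, frag_end in fragments:
        if frag_start <= start and end <= frag_end:
            return (frag_start, frag_end)
    return None
```

A target that spans a cut site returns `None`, which signals that a different enzyme (and therefore a different capture oligonucleotide pool) would be needed for that target.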
As has been described, this assay successfully targets up to 1 Mb of human sequence and can accommodate the highly multiplexed capture of thousands of loci (3). Additionally, the technology achieves both high-sensitivity and high-specificity human genomic capture across target regions up to 800 bp in length. On-target regions with >10-fold coverage make up >85% of the original targets. Off-target capture, as we recently demonstrated, was <5%. Based on a published cost assessment (3), the overall assay is significantly less costly than common capture methods such as multiplex PCR and in-solution capture. The procedure uses <100 ng of DNA per individual sample, which makes it ideal for clinical applications with limited sample material, and the capture assay can be completed in several days. Finally, this capture assay can be adapted for multiple sequencing platforms. The most recent application, as described by Natsoulis et al. (3), uses next-generation Illumina technology for downstream sequencing, but it may be adapted for use with other next-generation sequencers, as we have previously demonstrated with Roche's 454 sequencer (4).
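The two performance figures quoted above (the share of targets with >10-fold coverage and the off-target fraction) can be computed from sequencing output along the following lines. This is only a sketch under assumed definitions: coverage is evaluated per target base and off-target capture is measured as a fraction of reads. The text does not specify either definition, and the function and parameter names are hypothetical.

```python
def capture_metrics(target_depths, on_target_reads, total_reads, min_depth=10):
    """Summarize capture performance.

    target_depths   : read depth at each target base (assumed per-base)
    on_target_reads : number of reads mapping to target regions
    total_reads     : number of mapped reads overall
    min_depth       : depth that must be exceeded to count as covered
    """
    if not target_depths:
        raise ValueError("target_depths must not be empty")
    if total_reads <= 0 or not 0 <= on_target_reads <= total_reads:
        raise ValueError("read counts are inconsistent")
    covered = sum(1 for depth in target_depths if depth > min_depth)
    return {
        "fraction_covered": covered / len(target_depths),
        "off_target_fraction": (total_reads - on_target_reads) / total_reads,
    }
```

With these definitions, the reported performance would correspond to `fraction_covered` above 0.85 and `off_target_fraction` below 0.05.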
This selective capture protocol introduces several molecular constraints that must be considered in identifying capture arm sequences. To complete the ligation in step (iv), the termini of a captured DNA sequence must lie flush with the 40-bp vector oligonucleotide. The 3′ capture arm of a successful capture oligonucleotide must therefore hybridize precisely at the 3′ terminus of a restriction fragment containing the genomic region of interest. The 5′ exonuclease in step (iii) allows flexible placement of the 5′ capture arm by removing the 5′ flap, that is, the genomic DNA that extends beyond the capture arm. These molecular mechanisms complicate capture arm design and make it intractable for standard primer design software. Designing capture arms that cover any given human genome target therefore represents a major challenge to disseminating this technology to interested users.
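The asymmetry between the two arms can be made concrete with a small placement routine: the 3′ arm is pinned to the 3′ terminus of the restriction fragment, while the 5′ arm may sit anywhere upstream of the target, with the leftover upstream DNA forming the flap. The sketch below is an illustration under stated assumptions, not the design method used for the resource: the 20-base arm length is inferred, the 800-bp limit on the captured span is borrowed from the target length quoted earlier, the 5′ arm is simply placed directly upstream of the target, and all names are hypothetical. Coordinates are 0-based and half-open on the target strand.

```python
from __future__ import annotations

from dataclasses import dataclass

ARM_LEN = 20  # assumed arm length; not stated in the text


@dataclass(frozen=True)
class ArmPlacement:
    five_prime_arm: tuple[int, int]   # genomic site bound by the 5' arm
    three_prime_arm: tuple[int, int]  # genomic site bound by the 3' arm
    flap_length: int                  # bases removed by the 5' exonuclease
    captured_length: int              # span from 5' arm site to fragment end


def place_arms(
    fragment: tuple[int, int],
    target: tuple[int, int],
    arm_len: int = ARM_LEN,
    max_capture: int = 800,
) -> ArmPlacement | None:
    """Place capture arms for `target` inside a restriction `fragment`.

    Returns None when the constraints cannot be met.
    """
    frag_start, frag_end = fragment
    tgt_start, tgt_end = target
    if not (frag_start <= tgt_start < tgt_end <= frag_end):
        raise ValueError("target must lie inside the fragment")

    # Constraint 1: the 3' arm must end exactly at the fragment's 3' terminus.
    three_start = frag_end - arm_len
    if tgt_end > three_start:
        return None  # target overlaps the bases needed by the 3' arm

    # Constraint 2: the 5' arm is flexible; here it sits just upstream of the target.
    five_start = tgt_start - arm_len
    if five_start < frag_start:
        return None  # not enough fragment upstream of the target

    captured = frag_end - five_start
    if captured > max_capture:
        return None
    return ArmPlacement(
        five_prime_arm=(five_start, tgt_start),
        three_prime_arm=(three_start, frag_end),
        flap_length=five_start - frag_start,
        captured_length=captured,
    )
```

Because the 3′ arm position is dictated entirely by the restriction site, a target that sits too close to the fragment's 3′ end cannot be rescued by moving the arm; in this model the only recourse is a fragment produced by a different enzyme.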
To facilitate the design of customized targeted resequencing assays for any human genome region, we have created the Stanford Human OligoGenome Resource, a database of oligonucleotide capture sequences that span the human genome. Drawing on our previous experience in designing and implementing assays, we improved the design method to avoid issues that decrease capture efficiency (3). This unique resource has extensive utility because it provides coverage for capture targets beyond the 3% of the human genome that is coding sequence (i.e. the exome). The OligoGenome website (http://oligogenome.stanford.edu/) provides an interface for browsing, filtering and downloading capture oligonucleotide sequences based upon user-queried genomic regions and annotation-based constraints. The capture oligonucleotide designs and the search tools expedite the experimental design of customized capture assays and provide researchers with the ability to query both inside and outside of the coding regions of the human genome. Given its low cost and limited infrastructure requirements (3), this resource greatly improves the accessibility of highly multiplexed genomic target capture and resequencing for researchers.
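A region-based query of the kind described above amounts to an interval-overlap filter over capture oligonucleotide records. The sketch below shows that logic applied to records held locally, for example after download. The record fields, names and the optional enzyme filter are hypothetical and chosen for illustration; they do not describe the website's actual schema or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OligoRecord:
    # Hypothetical record layout; the resource's real fields may differ.
    oligo_id: str
    chrom: str
    start: int      # 0-based, inclusive start of the captured region
    end: int        # exclusive end of the captured region
    enzyme: str     # restriction enzyme the oligonucleotide was designed for
    sequence: str


def query_region(records, chrom, start, end, enzyme=None):
    """Return records whose captured region overlaps [start, end) on `chrom`.

    If `enzyme` is given, only oligonucleotides designed for that enzyme
    are returned, since each capture reaction uses a single enzyme.
    """
    hits = [
        rec
        for rec in records
        if rec.chrom == chrom
        and rec.start < end
        and start < rec.end
        and (enzyme is None or rec.enzyme == enzyme)
    ]
    return sorted(hits, key=lambda rec: (rec.start, rec.end))
```

Filtering by enzyme before pooling reflects the protocol's use of one restriction enzyme per digestion, so oligonucleotides designed for different enzymes would be grouped into separate reactions.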