Despite the ever-increasing throughput and steadily decreasing cost of next
generation sequencing (NGS), whole genome sequencing of humans is still not a
viable option for the majority of genetics laboratories. This is particularly
true in the case of complex disease studies, where large sample sets are often
required to achieve adequate statistical power. To fully leverage the potential
of NGS technology on large sample sets, several methods have been developed to
selectively enrich for regions of interest. Enrichment reduces both monetary and
computational costs compared to whole genome sequencing, while allowing
researchers to take advantage of NGS throughput. Several targeted enrichment
approaches are currently available, including molecular inversion probe ligation
sequencing (MIPS), oligonucleotide hybridization based approaches, and PCR-based
strategies. To assess how these methods performed when used in conjunction with
the ABI SOLiD 3+, we investigated three enrichment techniques: Nimblegen
oligonucleotide hybridization array-based capture; Agilent SureSelect
oligonucleotide hybridization solution-based capture; and Raindance
Technologies' multiplexed PCR-based approach. Target regions were selected
from exons and evolutionarily conserved areas throughout the human genome. Probe
and primer pair design was carried out for all three methods using their
respective informatics pipelines. In all, approximately 0.8 Mb of target space
was shared by all three methods. SOLiD sequencing results were analyzed for
several metrics, including consistency of coverage depth across samples,
on-target versus off-target efficiency, allelic bias, and genotype concordance
with array-based genotyping data. Agilent SureSelect exhibited superior
on-target efficiency and correlation of read depths across samples. Nimblegen
performance was similar at read depths of 20× and below. Both Raindance
and Nimblegen SeqCap exhibited tighter distributions of read depth around the
mean, but both suffered from lower on-target efficiency in our experiments.
Raindance demonstrated the highest versatility in assay design.
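
Two of the metrics named above, on-target efficiency and the correlation of read depths across samples, can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the `(chrom, start, end)` interval layout with half-open coordinates, and the rule that a read counts as on-target if it overlaps any target by at least one base are all assumptions made here.

```python
from math import sqrt

def on_target_fraction(reads, targets):
    """Fraction of reads that overlap at least one target interval.

    reads, targets: lists of (chrom, start, end), half-open coordinates
    (an assumed layout, chosen only for this sketch).
    """
    if not reads:
        return 0.0
    hits = 0
    for chrom, start, end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(chrom == t_chrom and start < t_end and t_start < end
               for t_chrom, t_start, t_end in targets):
            hits += 1
    return hits / len(reads)

def pearson(xs, ys):
    """Pearson correlation of two equal-length depth vectors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy data: four reads, one target region.
reads = [("chr1", 100, 150), ("chr1", 500, 550),
         ("chr2", 100, 150), ("chr1", 190, 240)]
targets = [("chr1", 120, 200)]
fraction = on_target_fraction(reads, targets)

# Per-target depths in two hypothetical samples.
depth_correlation = pearson([1, 2, 3], [2, 4, 6])
```

The linear scan over targets keeps the sketch short; for real capture designs with many thousands of targets an interval index would be the more usual choice.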
Next generation sequencing (NGS) technologies offer the possibility of mapping entire genomes at affordable cost. This brings genetic testing to a higher level of complexity. On the positive side, NGS makes it easier to handle the complex diagnosis of genetically heterogeneous disorders and to identify novel disease genes. Concerns arise from the management of too many DNA variations of unpredictable meaning and from incidental findings that can cause ethical and clinical dilemmas. Enrichment technology makes it possible to focus sequencing on the exome or on a more specific DNA target. This is being used to provide insights into the genetics underlying Mendelian traits involved in myopathies and to set up cost-effective diagnostic tests. This huge potential makes it likely that NGS applications will soon become the first-line approach in genetic diagnostic laboratories.
Next generation sequencing; NGS; neuromuscular disorders
Microarray-based enrichment of selected genomic loci is a powerful method for genome complexity reduction for next-generation sequencing. Since the vast majority of exons in vertebrate genomes are smaller than 150 nt, we explored the use of short fragment libraries (85–110 bp) to achieve higher enrichment specificity by reducing carryover and adverse effects of flanking intronic sequences. High enrichment specificity (60–75%) was obtained with relatively even base coverage. Up to 98% of the target sequence was covered more than 20× at an average coverage depth of about 200×. To verify the accuracy of SNP/mutation detection, we evaluated 384 known non-reference SNPs in the targeted regions. At ∼200× average sequence coverage, we were able to survey 96.4% of 1.69 Mb of genomic sequence with only 4.2% false negative calls, mostly due to low coverage. Using the same settings, a total of 1197 novel candidate variants were detected. Verification experiments revealed only eight false positive calls, indicating an overall false positive rate of less than 1 per ∼200 000 bp. Taken together, short fragment libraries provide highly efficient and flexible enrichment of exonic targets and yield relatively even base coverage, which facilitates accurate SNP and mutation detection. Raw sequencing data, alignment files and called SNPs have been submitted to the GEO database (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18542.
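
The false positive rate quoted above can be checked with a short calculation. The sketch below assumes that the denominator is the surveyed portion of the target (96.4% of 1.69 Mb); the abstract does not state the denominator explicitly, and the function name is hypothetical.

```python
def bp_per_false_positive(false_positives, target_bp, surveyed_fraction):
    """Surveyed bases per false positive call.

    Assumes the rate is taken over surveyed bases only, which is an
    interpretation of the abstract and not something it spells out.
    """
    surveyed_bp = target_bp * surveyed_fraction
    return surveyed_bp / false_positives

# Figures taken from the abstract: 8 false positives, 1.69 Mb, 96.4% surveyed.
rate = bp_per_false_positive(8, 1_690_000, 0.964)
```

Under that assumption the result is a little over 200 000 surveyed bases per false positive, in line with the stated rate of less than 1 per ∼200 000 bp.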
Targeted genome enrichment is a powerful tool for making use of the massive throughput of novel DNA-sequencing instruments. We herein present a simple and scalable protocol for multiplex amplification of target regions based on the Selector technique. The updated version exhibits improved coverage and compatibility with next-generation-sequencing (NGS) library-construction procedures for shotgun sequencing with NGS platforms. To demonstrate the performance of the technique, all 501 exons from 28 genes frequently involved in cancer were enriched for and sequenced in specimens derived from cell lines and tumor biopsies. DNA from both fresh frozen and formalin-fixed paraffin-embedded biopsies was analyzed, and 94% specificity and 98% coverage of the targeted region were achieved. Reproducibility between replicates was high (R² = 0.98) and readily enabled detection of copy-number variations. The procedure can be carried out in <24 h and does not require any dedicated instrumentation.
Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily upon effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available for targeting approximately 33 Mb or ∼180,000 coding exons across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We have evaluated a 2.1 M feature human exome capture array on eight individuals from a three-generation family pedigree. We were able to cover up to 98% of the targeted bases at a long-read sequence read depth of ≥3 and 86% at a read depth of ≥10, and over 50% of all targets were covered with ≥20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. Applying the conservative genotype calling approach HCDiff, the average rate of detection of a variant allele based on Illumina 1 M BeadChips genotypes was 95.2% at ≥10x sequence coverage. Further, we propose an advantageous genotype calling strategy for low-coverage targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. Application of this method was able to detect >99% of SNPs covered ≥8x. Our results offer guidance for “real-world” applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.
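
Coverage figures of the kind reported above (fraction of targeted bases at read depth ≥3, ≥10 and ≥20) are breadth-of-coverage values. A minimal sketch of that calculation follows; the per-base depth list and the function name are assumptions for illustration, not the authors' implementation.

```python
def breadth_at_depth(depths, thresholds=(3, 10, 20)):
    """Fraction of targeted bases covered at or above each depth threshold.

    depths: one read-depth value per targeted base (an assumed input format).
    """
    if not depths:
        raise ValueError("depths must contain at least one targeted base")
    n = len(depths)
    return {t: sum(d >= t for d in depths) / n for t in thresholds}

# Toy example with four targeted bases.
breadth = breadth_at_depth([0, 3, 10, 25])
```

For the toy input, three of four bases reach depth 3, two reach depth 10 and one reaches depth 20.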
Targeted next-generation sequencing is becoming a common tool in the molecular diagnostic laboratory. However, currently available methods to enrich for regions of interest in the DNA sequence suffer from drawbacks such as high cost, complex protocols, lack of clinical-level accuracy and uneven target coverage. A target-enrichment approach using complementary long padlock probes described in a recent article significantly improves on previous methods in most of these areas.
See related Research: http://genomemedicine.com/content/5/5/50
Enriching target sequences in sequencing libraries via capture hybridization to bait/probes is an efficient means of leveraging the capabilities of next-generation sequencing for obtaining sequence data from target regions of interest. However, homologous sequences from non-target regions may also be enriched by such methods. Here we investigate the fidelity of capture enrichment for complete mitochondrial DNA (mtDNA) genome sequencing by analyzing sequence data for nuclear copies of mtDNA (NUMTs). Using capture-enriched sequencing data from a mitochondria-free cell line and the parental cell line, and from samples previously sequenced from long-range PCR products, we demonstrate that NUMT alleles are indeed present in capture-enriched sequence data, but at low enough levels to not influence calling the authentic mtDNA genome sequence. However, distinguishing NUMT alleles from true low-level mutations (e.g. heteroplasmy) is more challenging. We develop here a computational method to distinguish NUMT alleles from heteroplasmies, using sequence data from artificial mixtures to optimize the method.
In highly copy number variable (CNV) regions such as the human defensin gene locus, comprehensive assessment of sequence variations is challenging. PCR approaches are practically restricted to tiny fractions of such regions, and next-generation sequencing (NGS) of whole individual genomes, e.g. by the 1000 Genomes Project, is limited by the affordable sequence depth. Combining target enrichment with NGS may represent a feasible approach.
As a proof of principle, we enriched a ~850 kb section comprising the CNV defensin gene cluster DEFB, the invariable DEFA part and 11 control regions from two genomes by sequence capture and sequenced it by 454 technology. 6,651 differences to the human reference genome were found. Comparison to HapMap genotypes revealed sensitivities and specificities in the range of 94% to 99% for the identification of variations.
Using error probabilities for rigorous filtering revealed 2,886 unique single nucleotide variations (SNVs) including 358 putative novel ones. DEFB CN determinations by haplotype ratios were in agreement with alternative methods.
Although currently labor-intensive and costly, target-enriched NGS provides a powerful tool for the comprehensive assessment of SNVs in highly polymorphic CNV regions of individual genomes. Furthermore, it reveals considerable amounts of putative novel variations and simultaneously allows CN estimation.
Next generation sequencing (NGS) provides a valuable method to quickly obtain sequence information from non-model organisms at a genomic scale. In principle, if sequencing is not targeted for a genomic region or sequence type (e.g. coding region, microsatellites) NGS reads can be used as a genome snapshot and provide information on the different types of sequences in the genome. However, no study has ascertained if a typical 454 dataset of low coverage (1/4-1/8 of a PicoTiter plate leading to generally less than 0.1x of coverage) represents all parts of genomes equally.
Partial genome shotgun sequencing of total DNA (without enrichment) on a 454 NGS platform was used to obtain reads of Apis mellifera (454 reads hereafter). These 454 reads were compared to the assembled chromosomes of this species in three different aspects: (i) dimer and trimer compositions, (ii) the distribution of mapped 454 sequences along the chromosomes and (iii) the numbers of different classes of microsatellites. Highly significant chi-square tests for all three types of analyses indicated that the 454 data are not a perfect random sample of the genome. Only the number of 454 reads mapped to each of the 16 chromosomes and the number of microsatellites pooled by motif (repeat unit) length were not significantly different from the expected values. However, a very strong correlation (correlation coefficients greater than 0.97) was observed between most of the 454 variables (the number of different dimers and trimers, the number of 454 reads mapped to each one-Mb chromosome fragment, the number of 454 reads mapped to each chromosome, the number of microsatellites of each class) and their corresponding genomic variables.
The results of the chi-square tests suggest that 454 shotgun reads cannot be regarded as a perfect representation of the genome, especially if the comparison is done on a finer scale (e.g. chromosome fragments instead of whole chromosomes). However, the high correlation between the 454 and genome variables tested indicates that a high proportion of the variability of 454 variables is explained by their genomic counterparts. Therefore, we conclude that using 454 data to obtain information on the genome is biologically meaningful.
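
The dimer-composition comparison can be illustrated with a small sketch: count overlapping dimers in the reads, then compute a chi-square goodness-of-fit statistic against the proportions expected from the assembled genome. The function names and toy values are assumptions; the study's own counting rules and test settings are not given in the abstract.

```python
from collections import Counter

def dimer_counts(sequence):
    """Count overlapping dinucleotides in a sequence."""
    return Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))

def chi_square_stat(observed, expected_props):
    """Chi-square goodness-of-fit statistic.

    observed: counts per category (e.g. per dimer in the 454 reads).
    expected_props: proportions per category in the reference genome,
    in the same order and summing to 1.
    """
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed, expected_props))

counts = dimer_counts("ACGT")
stat_equal = chi_square_stat([50, 50], [0.5, 0.5])
stat_skewed = chi_square_stat([60, 40], [0.5, 0.5])
```

Because the statistic scales with the total count, very large read sets can give highly significant chi-square results even when observed and expected proportions are strongly correlated, which is consistent with the pattern the abstract reports.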
Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions).
We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP-calling with a new, faster approach: target-region mapping with subsequent ‘read-backmapping’ to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole-genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach.
We recommend applying our general ‘two-step’ mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP-concordances and inspecting read alignments in order to attain more confident results.
Two-stage mapping; Read-backmapping; Software performance; SNP discovery; Multiplexed targeted next-generation sequencing
The linkage of disease gene mapping with DNA sequencing is an essential strategy for defining the genetic basis of a disease. New massively parallel sequencing procedures will greatly facilitate this process, although enrichment for the target region before sequencing remains necessary. For this step, various DNA capture approaches have been described that rely on sequence-defined probe sets. To avoid making assumptions on the sequences present in the targeted region, we accessed specific cytogenetic regions in preparation for next-generation sequencing. We directly microdissected the target region in metaphase chromosomes, amplified it by degenerate oligonucleotide-primed PCR, and obtained sufficient material of high quality for high-throughput sequencing. Sequence reads could be obtained from as few as six chromosomal fragments. The power of cytogenetic enrichment followed by next-generation sequencing is that it does not depend on earlier knowledge of sequences in the region being studied. Accordingly, this method is uniquely suited for situations in which the sequence of a reference region of the genome is not available, including population-specific or tumor rearrangements, as well as previously unsequenced genomic regions such as centromeres.
genomic selection; enrichment; microdissection; next-generation sequencing
The dramatic increase in throughput of sequencing data from next generation sequencing platforms has enabled scientists to study the genome with unprecedented depth and accuracy. Nevertheless, routine genetic screens in large numbers of individuals remain cost-prohibitive with these approaches. Agilent Technologies' SureSelect platform for targeted exome capture, combined with massively parallel sequencing, provides a more affordable method to gain novel insights into the genetic causes of inherited disorders. In addition, identification of both common and rare polymorphisms implicated in complex diseases like cancer is greatly facilitated by selectively sequencing the protein-coding regions of the genome. In collaboration with the Broad and Sanger Institutes, Agilent Technologies has continued to expand the number of SureSelect target enrichment catalog products in order to enable a more comprehensive view of the protein-coding regions in humans and model organisms. We discuss the SureSelect Human All Exon v2 (44Mb) and SureSelect Human All Exon 50Mb designs. We also introduce the SureSelect Mouse All Exon target enrichment system, which improves the ability to study genetic variation between strains in greater detail, and significantly increases the efficiency of screening for causative mutations in N-ethyl-N-nitrosourea (ENU)-mutagenized mice. We demonstrate high performance with respect to capture efficiency, uniformity, reproducibility of enrichment, and ability to detect SNPs, insertion/deletions, and CNVs across Illumina (Genome Analyzer IIx and HiSeq2000) and SOLiD platforms. We highlight the utility of the SureSelect All Exon product portfolio for a wide variety of applications, primarily due to the high specificity and excellent cross-platform sequence coverage. SureSelect All Exon designs also provide a means for standardization, consistency of performance, and reliability across multiple laboratories.
High-throughput sequencing opens avenues to find genetic variations that may be indicative of an increased risk for certain diseases. Linking these genomic data to other “omics” approaches bears the potential to deepen our understanding of pathogenic processes at the molecular level. To detect novel single nucleotide polymorphisms (SNPs) for glioblastoma multiforme (GBM), we used a combination of specific target selection and next generation sequencing (NGS). We generated a microarray covering the exonic regions of 132 GBM associated genes to enrich target sequences in two GBM tissues and corresponding leukocytes of the patients. Enriched target genes were sequenced with Illumina and the resulting reads were mapped to the human genome. With this approach we identified over 6000 SNPs, including over 1300 SNPs located in the targeted genes. Integrating the genome-wide association study (GWAS) catalog and known disease associated SNPs, we found that several of the detected SNPs were previously associated with smoking behavior, body mass index, breast cancer and high-grade glioma. Particularly, the breast cancer associated allele of rs660118 SNP in the gene SART1 showed a near doubled frequency in glioblastoma patients, as verified in an independent control cohort by Sanger sequencing. In addition, we identified SNPs in 20 of 21 GBM associated antigens providing further evidence that genetic variations are significantly associated with the immunogenicity of antigens.
Only a small fraction of large genomes such as that of the human contains the functional regions such as the exons, promoters, and polyA sites. A platform technique for selective enrichment of functional genomic regions will enable several next-generation sequencing applications that include the discovery of causal mutations for disease and drug response. Here, we describe a powerful platform technique, termed “functional genomic fingerprinting” (FGF), for the multiplexed genomewide isolation and analysis of targeted regions such as the exome, promoterome, or exon splice enhancers. The technique employs a fixed part of a uniquely designed Fixed-Randomized primer, while the randomized part contains all the possible sequence permutations. The Fixed-Randomized primers bind with full sequence complementarity at multiple sites where the fixed sequence (such as the splice signals) occurs within the genome, and multiplex amplify many regions bounded by the fixed sequences (e.g., exons). Notably, validation of this technique using cardiac myosin binding protein-C (MYBPC3) gene as an example strongly supports the application and efficacy of this method. Further, assisted by genomewide computational analyses of such sequences, the FGF technique may provide a unique platform for high-throughput sample production and analysis of targeted genomic regions by the next-generation sequencing techniques, with powerful applications in discovering disease and drug response genes.
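
The Fixed-Randomized primer idea can be sketched in a few lines: enumerate every sequence for the randomized part, attach the fixed part, and locate where the fixed sequence occurs in a genome. Everything specific here is an assumption made for illustration, including the function names, the placement of the randomized bases 5' of the fixed part, and the toy sequences; the abstract does not describe primer orientation or lengths.

```python
from itertools import product

def fixed_randomized_primers(fixed, n_random):
    """All primers made of n_random randomized bases followed by the fixed part.

    The 5'-randomized / 3'-fixed layout is an assumption of this sketch.
    """
    return ["".join(bases) + fixed for bases in product("ACGT", repeat=n_random)]

def fixed_sites(genome, fixed):
    """Start positions (0-based) of every occurrence of the fixed sequence,
    including overlapping occurrences."""
    sites = []
    start = genome.find(fixed)
    while start != -1:
        sites.append(start)
        start = genome.find(fixed, start + 1)
    return sites

primers = fixed_randomized_primers("AG", 2)   # 4**2 = 16 primers
sites = fixed_sites("AAGTTAGAG", "AG")
```

The primer pool grows as 4 to the power of the randomized length, so the length of the randomized part is a direct trade-off between binding specificity and pool complexity.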
DNA methylation is one of the most important epigenetic alterations involved in the control of gene expression. Bisulfite sequencing of genomic DNA is currently the only method to study DNA methylation patterns at single-nucleotide resolution. Hence, next-generation sequencing of bisulfite-converted DNA is the method of choice to investigate DNA methylation profiles at the genome-wide scale. Nevertheless, whole genome sequencing for analysis of human methylomes is expensive, and a method for targeted gene analysis would provide a good alternative in many cases where the primary interest is restricted to a set of genes.
Here, we report the successful use of a custom Agilent SureSelect Target Enrichment system for the hybrid capture of bisulfite-converted DNA. We prepared bisulfite-converted next-generation sequencing libraries, which are enriched for the coding and regulatory regions of 174 ADME genes (i.e. genes involved in the metabolism and distribution of drugs). Sequencing of these libraries on Illumina’s HiSeq2000 revealed that the method allows a reliable quantification of methylation levels of CpG sites in the selected genes, and validation of the method using pyrosequencing and the Illumina 450K methylation BeadChips revealed good concordance.
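
Quantifying methylation at a CpG site from bisulfite sequencing data usually comes down to a ratio: reads that still show a C at the site count as methylated, and reads that show a T count as unmethylated, because bisulfite conversion turns unmethylated cytosines into uracil, which is read as thymine. The sketch below assumes per-site C and T read counts are already available; the function name is hypothetical, and it is not the authors' analysis code.

```python
def methylation_level(c_count, t_count):
    """Methylation level at one CpG site as C / (C + T).

    Returns None when the site has no informative reads.
    """
    total = c_count + t_count
    if total == 0:
        return None
    return c_count / total

level = methylation_level(30, 10)      # 30 methylated reads, 10 unmethylated
uncovered = methylation_level(0, 0)    # site without coverage
```

Returning `None` for uncovered sites keeps them distinguishable from sites that are covered but fully unmethylated, which would give 0.0.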
Next-generation sequencing (NGS) is arguably one of the most significant technological advances in the biological sciences of the last 30 years. The second generation sequencing platforms have advanced rapidly to the point that several genomes can now be sequenced simultaneously in a single instrument run in under two weeks. Targeted DNA enrichment methods allow even higher genome throughput at a reduced cost per sample. Medical research has embraced the technology and the cancer field is at the forefront of these efforts given the genetic aspects of the disease. World-wide efforts to catalogue mutations in multiple cancer types are underway and this is likely to lead to new discoveries that will be translated to new diagnostic, prognostic and therapeutic targets. NGS is now maturing to the point where it is being considered by many laboratories for routine diagnostic use. The sensitivity, speed and reduced cost per sample make it a highly attractive platform compared to other sequencing modalities. Moreover, as we identify more genetic determinants of cancer there is a greater need to adopt multi-gene assays that can quickly and reliably sequence complete genes from individual patient samples. Whilst widespread and routine use of whole genome sequencing is likely to be a few years away, there are immediate opportunities to implement NGS for clinical use. Here we review the technology, methods and applications that can be immediately considered and some of the challenges that lie ahead.
Genomic enrichment methods and next-generation sequencing produce uneven coverage for the portions of the genome (the loci) they target; this information is essential for ascertaining the suitability of each locus for further analysis. lociNGS is a user-friendly accessory program that takes multi-FASTA formatted loci, next-generation sequence alignments and demographic data as input and collates, displays and outputs information about the data. Summary information includes the parameters coverage per locus, coverage per individual and number of polymorphic sites, among others. The program can output the raw sequences used to call loci from next-generation sequencing data. lociNGS also reformats subsets of loci in three commonly used formats for multi-locus phylogeographic and population genetics analyses – NEXUS, IMa2 and Migrate. lociNGS is available at https://github.com/SHird/lociNGS and is dependent on installation of MongoDB (freely available at http://www.mongodb.org/downloads). lociNGS is written in Python and is supported on MacOSX and Unix; it is distributed under a GNU General Public License.
Next-generation sequencing (NGS) technologies are highly affordable and powerful tools for biomedical research. Methods for exon enrichment provide a means for focused study of protein-coding regions that may be involved in diseases. For this study, we combined Agilent exon capture and Applied Biosystems SOLiD sequencing technologies to determine single nucleotide polymorphisms (SNPs) specifically in the kinase genes of human lung tumors. Quantity and quality of genomic DNA from paired samples of normal and tumor lung tissue (N=5) were assessed with the Qubit® 2.0 fluorometer and Agilent 2100 Bioanalyzer. A Covaris® S2 system was utilized for shearing DNA (∼165 bp). After ligation of specific adaptors, the kinase exome library was enriched using the Agilent SureSelect human kinome capture system utilizing 120-bp biotinylated RNA probes. Bound DNA was purified and barcoded for sample identification. Kinase-captured libraries were quantified and pooled for amplification by emulsion PCR. Template beads for all 5 tumor-normal paired samples were pooled on a single slide and sequenced on the SOLiD 4 platform. Image analysis and base calling were performed with the SOLiD System Analysis Pipeline tool. Read length was 50 bp. GenomeQuest NGS analysis tools were used to map SOLiD reads to the reference human genome (Build-hg19) as well as for SNP identification. An average of ∼11 million reads was mapped per sample. Data analyses showed that 88.8% of the kinome probes are fully covered at a depth 1, whereas 95.55% of the probes had a depth of coverage greater than 10X. In addition, 72.2% of the sequence reads were either within or overlapping the target. Off-target reads (27.8%) were evenly distributed among chromosomes with no bias toward GC-rich regions. Overall, exon capture and NGS technologies are reliable and cost-effective approaches for SNP detection and suitable for other applications in biomedical research.
Target enrichment technologies utilize single-stranded oligonucleotide probes to capture candidate genomic regions from a DNA sample before sequencing. We describe target capture using double-stranded probes, which consist of single-stranded, complementary long padlock probes (cLPPs), each selectively capturing one strand of a genomic target through circularization. Using two probes per target increases sensitivity for variant detection and cLPPs are easily produced by PCR at low cost. Additionally, we introduce an approach for generating capture libraries with uniformly randomized template orientations. This facilitates bidirectional sequencing of both the sense and antisense template strands during one paired-end read, which maximizes target coverage.
Screening large numbers of target regions in multiple DNA samples for sequence variation is an important application of next-generation sequencing but an efficient method to enrich the samples in parallel has yet to be reported. We describe an advanced method that combines DNA samples using indexes or barcodes prior to target enrichment to facilitate this type of experiment. Sequencing libraries for multiple individual DNA samples, each incorporating a unique 6-bp index, are combined in equal quantities, enriched using a single in-solution target enrichment assay and sequenced in a single reaction. Sequence reads are parsed based on the index, allowing sequence analysis of individual samples. We show that the use of indexed samples does not impact on the efficiency of the enrichment reaction. For three- and nine-indexed HapMap DNA samples, the method was found to be highly accurate for SNP identification. Even with sequence coverage as low as 8x, 99% of sequence SNP calls were concordant with known genotypes. Within a single experiment, this method can sequence the exonic regions of hundreds of genes in tens of samples for sequence and structural variation using as little as 1 μg of input DNA per sample.
next-generation sequencing; enrichment; capture; SNP; index
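
Parsing pooled reads by index, as described in the abstract above, can be sketched as a simple lookup. The sketch assumes, purely for illustration, that the 6-bp index is the first six bases of each read and that only exact matches are assigned; real platforms may deliver the index as a separate read, and the function name and sample labels are hypothetical.

```python
from collections import defaultdict

INDEX_LENGTH = 6  # the abstract describes a unique 6-bp index per sample

def demultiplex(reads, index_to_sample):
    """Group reads by sample using an exact match on the leading index.

    Reads whose index is not in index_to_sample go to "unassigned".
    The index is trimmed from the returned sequences.
    """
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:INDEX_LENGTH], read[INDEX_LENGTH:]
        sample = index_to_sample.get(index, "unassigned")
        bins[sample].append(insert)
    return dict(bins)

pooled = ["ACGTACTTTT", "GGGGGGAAAA", "NNNNNNCC"]
by_sample = demultiplex(pooled, {"ACGTAC": "s1", "GGGGGG": "s2"})
```

Exact matching is the strictest choice; tolerating one mismatch would recover more reads at the cost of a small risk of misassignment when indexes are similar.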
Phenotype-driven forward genetic experiments are powerful approaches for linking phenotypes to genomic elements, but they still involve a laborious positional cloning process. Although sequencing of complete genomes has now become available, discriminating causal mutations from the enormous amounts of background variation remains a major challenge.
To improve this, we developed a universal two-step approach, named 'fast forward genetics', which combines traditional bulk segregant techniques with targeted genomic enrichment and next-generation sequencing technology.
As a proof of principle we successfully applied this approach to two Arabidopsis mutants and identified a novel factor required for stem cell activity.
We demonstrated that the 'fast forward genetics' procedure efficiently identifies a small number of testable candidate mutations. As the approach is independent of genome size, it can be applied to any model system of interest. Furthermore, we show that experiments can be multiplexed and easily scaled for the identification of multiple individual mutants in a single sequencing run.
The emergence of next-generation sequencing technology presents tremendous opportunities to accelerate the discovery of rare variants or mutations that underlie human genetic disorders. Although the complete sequencing of the affected individuals' genomes would be the most powerful approach to finding such variants, the cost of such efforts makes it impractical for routine use in disease gene research. In cases where candidate genes or loci can be defined by linkage, association, or phenotypic studies, the practical sequencing target can be made much smaller than the whole genome, and it becomes critical to have capture methods that can be used to purify the desired portion of the genome for shotgun short-read sequencing without biasing allelic representation or coverage. One major approach is array-based capture, which relies on the ability to create a custom in-situ synthesized oligonucleotide microarray for use as a collection of hybridization capture probes. This approach is used routinely by our group and others, and we are continuing to improve its performance.
Here, we provide a complete protocol optimized for large aggregate sequence intervals and demonstrate its utility with the capture of all predicted amino acid coding sequence from 3,038 human genes using 241,700 60-mer oligonucleotides. Further, we demonstrate two techniques by which the efficiency of the capture can be increased: by introducing a step to block cross hybridization mediated by common adapter sequences used in sequencing library construction, and by repeating the hybridization capture step. These improvements can boost the targeting efficiency to the point where over 85% of the mapped sequence reads fall within 100 bases of the targeted regions.
The complete protocol introduced in this paper enables researchers to perform practical capture experiments, and includes two novel methods for increasing the targeting efficiency. Coupled with the new massively parallel sequencing technologies, this provides a powerful approach to identifying disease-causing genetic variants that can be localized within the genome by traditional methods.
DNA methylation is a critical epigenetic mark that is essential for mammalian development and aberrant in many diseases including cancer. Over the past decade multiple methods have been developed and applied to characterize its genome-wide distribution. Of these, Reduced Representation Bisulfite Sequencing (RRBS) generates nucleotide resolution Illumina-based libraries that enrich for CpG-dense regions by methylation-insensitive restriction digestion. Here we provide an extensive, optimized protocol for generating RRBS libraries and discuss the power of this strategy for methylome profiling. We include information on sequence analysis and the relative coverage over genomic regions of interest for a representative mouse MspI generated RRBS library. Contemporary sequencing and array-based technologies are compared against sample throughput and coverage, highlighting the variety of options available to investigate methylation on the genome-scale.
Next-generation sequencing (NGS) combined with enrichment of target genes enables highly efficient and low-cost sequencing of multiple genes for genetic diseases. The aim of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection in autism spectrum disorder (ASD). We assessed the performance of the bench-top Ion Torrent PGM and Illumina MiSeq platforms as optimized solutions for mutation detection, using microdroplet PCR-based enrichment of 62 ASD associated genes. Ten patients with known mutations were sequenced using NGS to validate the sensitivity of our method. The overall read quality was better with MiSeq, largely because of the increased indel-related error associated with PGM. The sensitivity of SNV detection was similar between the two platforms, suggesting they are both suitable for SNV detection in the human genome. Next, we used these methods to analyze 28 patients with ASD, and identified 22 novel variants in genes associated with ASD, with one mutation detected by MiSeq only. Thus, our results support the combination of target gene enrichment and NGS as a valuable molecular method for investigating rare variants in ASD.
Large-scale genetic screens in Arabidopsis are a powerful approach for molecular dissection of complex signaling networks. However, map-based cloning can be time-consuming or even hampered by low chromosomal recombination. Current strategies using next generation sequencing for molecular identification of mutations require whole genome sequencing and advanced computational devices and skills, which are not readily accessible or affordable to every laboratory. We have developed a streamlined method using massively parallel sequencing for mutant identification in which only targeted regions are sequenced. This targeted parallel sequencing (TPSeq) method is more cost-effective, straightforward enough to be easily done without specialized bioinformatics expertise, and reliable for identifying multiple mutations simultaneously. Here, we demonstrate its use by identifying three novel nitrate-signaling mutants in Arabidopsis.
Next generation sequencing; EMS; PCR-amplified genomic library; Nitrate signalling; Positional cloning