Genetic evaluation of cardiomyopathies poses a challenge. Multiple genes are involved but no clear genotype–phenotype correlations have been found so far. In the past, genetic evaluation for hypertrophic (HCM) and dilated (DCM) cardiomyopathies was performed by sequential screening of a very limited number of genes. Recent developments in sequencing have increased the throughput, enabling simultaneous screening of multiple genes for multiple patients in a single sequencing run.
Development and implementation of a next generation sequencing (NGS) based genetic test as replacement for Sanger sequencing.
Methods and Results
In order to increase the number of genes that can be screened in a shorter time period, we enriched all exons of 23 of the most relevant HCM and DCM related genes using on-array multiplexed sequence capture followed by massively parallel pyrosequencing on the GS-FLX Titanium. After optimisation of array based sequence capture it was feasible to reliably detect a large panel of known and unknown variants in HCM and DCM patients, whereby the unknown variants could be confirmed by Sanger sequencing.
The rate of detection of (pathogenic) variants in both HCM and DCM patients was increased due to a larger number of genes studied. Array based target enrichment followed by NGS showed the same accuracy as Sanger sequencing. Therefore, NGS is ready for implementation in a diagnostic setting.
Cardiomyopathy; Diagnostics; Genetics; Molecular genetics
Next-generation sequencing (NGS) is widely used in biomedical research, but its adoption has been limited in molecular diagnostics. One application of NGS is the targeted resequencing of genes whose mutations lead to an overlapping clinical phenotype. This study evaluated the comparative performance of the Illumina Genome Analyzer and Roche 454 GS FLX for the resequencing of 16 genes associated with hypertrophic cardiomyopathy (HCM). Using a single human genomic DNA sample enriched by long-range PCR (LR-PCR), 40 GS FLX and 31 Genome Analyzer exon variants were identified using ≥30-fold read-coverage and ≥20% read-percentage selection criteria. Twenty-seven platform concordant variants were Sanger-confirmed. The discordant variants segregated into two categories: variants with read coverages ≥30 on one platform but <30-fold on the alternate platform and variants with read percentages ≥20% on one platform but <20% on the alternate platform. All variants with <30-fold coverage were Sanger-confirmed, suggesting that the coverage criterion of ≥30-fold is too stringent for variant discovery. The variants with <20% read percentage were identified as reference sequence based on Sanger sequencing. These variants were found in homopolymer tracts and short-read misalignments, specifically in genes with high identity. The results of the current study demonstrate the feasibility of combining LR-PCR with the Genome Analyzer or GS FLX for targeted resequencing of HCM-associated genes.
sequence analysis; DNA; next-generation sequencing
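The selection criteria described in the HCM resequencing abstract above (≥30-fold read coverage and ≥20% read percentage) can be sketched as a simple filter. This is a minimal illustration only: the `VariantCall` record, its field names, and the example calls are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical record for one candidate variant (not the study's data model)."""
    name: str
    coverage: int        # total reads covering the position
    variant_reads: int   # reads supporting the variant allele


def read_percentage(call: VariantCall) -> float:
    """Percentage of covering reads that carry the variant."""
    if call.coverage == 0:
        return 0.0
    return 100.0 * call.variant_reads / call.coverage


def passes(call: VariantCall, min_coverage: int = 30, min_read_pct: float = 20.0) -> bool:
    """Apply both thresholds named in the abstract; both bounds are inclusive."""
    return call.coverage >= min_coverage and read_percentage(call) >= min_read_pct


# Made-up example calls, one per outcome discussed in the abstract.
calls = [
    VariantCall("var_a", coverage=120, variant_reads=58),  # passes both criteria
    VariantCall("var_b", coverage=22, variant_reads=11),   # fails coverage only
    VariantCall("var_c", coverage=200, variant_reads=18),  # fails read percentage only
]
selected = [c.name for c in calls if passes(c)]
print(selected)
```

A call like `var_b` corresponds to the first discordant category in the abstract: it has a plausible heterozygous read percentage but is rejected on coverage alone, which is the situation the authors describe as evidence that the ≥30-fold criterion is too stringent for variant discovery.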
We show that massively parallel targeted sequencing of 19 genes provides a new and reliable strategy for molecular diagnosis of Usher syndrome (USH) and nonsyndromic deafness, particularly appropriate for these disorders, which are characterized by high clinical and genetic heterogeneity and a complex structure of several of the genes involved. A series of 71 patients, including Usher patients previously screened by Sanger sequencing plus newly referred patients, was studied. Ninety-eight percent of the variants previously identified by Sanger sequencing were found by next-generation sequencing (NGS). NGS proved to be efficient, as it offers analysis of all relevant genes, which is laborious to achieve with Sanger sequencing. Among the 13 newly referred Usher patients, both mutations in the same gene were identified in 77% of cases (10 patients) and one candidate pathogenic variant in two additional patients. This work can be considered a pilot for implementing NGS for genetically heterogeneous diseases in clinical service.
Bioinformatics; next-generation sequencing; NSHL; Usher syndrome; variant prioritization
Treatment of EGFR-mutant non-small cell lung cancer patients with the tyrosine kinase inhibitors erlotinib or gefitinib results in high response rates and prolonged progression-free survival. Despite the development of sensitive mutation detection approaches, a thorough validation of these in a clinical setting has so far been lacking. We performed, in a clinical setting, a systematic validation of dideoxy ‘Sanger’ sequencing and pyrosequencing against massively parallel sequencing as one of the most sensitive mutation detection technologies available. Mutational annotation of clinical lung tumor samples revealed that of all patients with a confirmed response to EGFR inhibition, only massively parallel sequencing detected all relevant mutations. By contrast, dideoxy sequencing missed four responders and pyrosequencing missed two responders, indicating a dramatic lack of sensitivity of dideoxy sequencing, which is widely applied for this purpose. Furthermore, precise quantification of mutant alleles revealed a low correlation (r² = 0.27) of histopathological estimates of tumor content and frequency of mutant alleles, thereby questioning the use of histopathology for stratification of specimens for individual analytical procedures. Our results suggest that enhanced analytical sensitivity is critically required to correctly identify patients responding to EGFR inhibition. More broadly, our results emphasize the need for thorough evaluation of all mutation detection approaches against massively parallel sequencing as a prerequisite for any clinical implementation.
A new priority in genome research is large-scale resequencing of genes to understand the molecular basis of hereditary disease and cancer. We assessed the ability of massively parallel pyrosequencing to identify sequence variants in pools. From a large collection of human PCR samples we selected 343 PCR products belonging to 16 disease genes and including a large spectrum of sequence variations previously identified by Sanger sequencing. The sequence variants included SNPs and small deletions and insertions (up to 44 bp), in homozygous or heterozygous state.
The DNA was combined in 4 pools containing from 27 to 164 amplicons and from 8.9 to 50.8 kb to sequence, for a total of 110 kb. Pyrosequencing generated over 80 million base pairs of data. Blind searching for sequence variations with a specifically designed bioinformatics procedure identified 465 putative sequence variants, comprising 412 true variants and 53 false positives (in or adjacent to homopolymeric tracts), with no false negatives. All known variants in positions covered with at least 30× depth were correctly recognized.
Massively parallel pyrosequencing may be used to simplify and speed the search for DNA variations in PCR products. Our results encourage further studies to evaluate molecular diagnostics applications.
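The counts reported for the blind variant search (465 putative variants, 412 true, 53 false positives, no false negatives) can be turned into the usual summary metrics. The sketch below is plain arithmetic on those published counts; the function names are illustrative, and the recall figure should be read alongside the abstract's qualification that known variants were recognized at positions covered with at least 30× depth.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported variants that are real."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real variants that were reported."""
    return true_positives / (true_positives + false_negatives)


# Counts as stated in the abstract.
tp, fp, fn = 412, 53, 0
putative = tp + fp  # total putative variants reported by the pipeline

print(f"putative variants: {putative}")
print(f"precision: {precision(tp, fp):.3f}")
print(f"false discovery rate: {fp / putative:.3f}")
print(f"recall: {recall(tp, fn):.3f}")
```

Roughly one in nine putative calls was a false positive, all of them in or next to homopolymeric tracts, so a confirmation step focused on those contexts would address most of the error.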
The DNA Sequencing and Genotyping Facility of the Cornell University Life Sciences Core Laboratories Center (CLC) provides an array of shared research resources and services to the university community and to outside investigators. The facility offers a concentration of advanced instrumentation and expertise in their applications. Services include the ABI3730xl platform for Sanger sequencing of plasmid and PCR products. Resources for massively-parallel, “next generation” sequencing technology applications include the Roche 454 GS FLX and the Illumina Genome Analyzer IIx. The Sequenom MassArray is available for SNP genotyping and methylation studies. The ABI 7900HT is available for RT-PCR studies. The facility also provides support for SNP genotyping using automated sample processing pipelines for ABI SNPlex and Illumina array-based platforms, including both the Illumina BeadStation and BeadXpress readers. The goal of the facility is to provide rapid and accurate DNA sequencing and genotyping services.
Surveillance for HIV transmitted drug resistance (TDR) is performed using HIV genotype results from individual specimens. Pyrosequencing, through its massive parallel sequencing ability, can analyze large numbers of specimens simultaneously. Instead of using pyrosequencing conventionally, to sequence a population of viruses within an individual, we interrogated a single combined pool of surveillance specimens to demonstrate that it is possible to determine TDR rates in HIV protease from a population of individuals.
The protease region from 96 treatment-naïve, HIV+ serum specimens was genotyped using the standard Sanger sequencing method. The 462 bp protease amplicons from these specimens were pooled in equimolar concentrations and re-sequenced using the GS FLX Titanium system. The nucleotide (NT) and amino acid (AA) differences from the reference sequence, along with TDR mutations, detected by each method were compared. In the protease sequence, there were 212 nucleotide and 81 AA differences found using conventional sequencing and 345 nucleotide and 168 AA differences using pyrosequencing. All nucleotide and amino acid polymorphisms found at frequencies ≥5% in pyrosequencing were detected using both methods, with the rates of variation highly correlated. Using Sanger sequencing, two TDR mutations, M46L and I84V, were each detected as mixtures at a frequency of 1.04% (1/96). These same TDR mutations were detected by pyrosequencing with a prevalence of 0.29% and 0.34%, respectively. Phylogenetic analysis established that the detected low-frequency mutations arose from the same single specimens that were found to contain TDR mutations by Sanger sequencing. Multiple clinical protease DR mutations present at higher frequencies were concordantly identified using both methods.
We show that pyrosequencing pooled surveillance specimens can cost-competitively detect protease TDR mutations when compared with conventional methods. With few modifications, the method described here can be used to determine population rates of TDR in both protease and reverse transcriptase. Furthermore, this pooled pyrosequencing technique may be generalizable to other infectious agents where a survey of DR rates is required.
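The relationship between the per-specimen rate from Sanger sequencing (1/96, or 1.04%) and the lower pooled read frequencies (0.29% and 0.34%) can be illustrated with a small calculation. This is a sketch under strong assumptions that the abstract does not state: perfectly equimolar pooling, no amplification or sequencing bias, and a mutation carried by exactly one specimen. The within-specimen fractions it prints are back-of-the-envelope inferences, not results reported by the study.

```python
def pooled_frequency(within_specimen_fraction: float, n_specimens: int) -> float:
    """Expected read frequency in an equimolar pool when one specimen carries
    the mutation in the given fraction of its viral population."""
    return within_specimen_fraction / n_specimens


def implied_within_specimen_fraction(pooled_freq: float, n_specimens: int) -> float:
    """Invert the relationship above (same idealized assumptions)."""
    return pooled_freq * n_specimens


N_SPECIMENS = 96  # pool size stated in the abstract

# A specimen in which every virus carries the mutation would give 1/96 of reads.
print(f"ceiling for one carrier: {100 * pooled_frequency(1.0, N_SPECIMENS):.2f}%")

# Pooled prevalences reported in the abstract, in percent.
reported = {"M46L": 0.29, "I84V": 0.34}
for mutation, pooled_pct in reported.items():
    fraction = implied_within_specimen_fraction(pooled_pct / 100, N_SPECIMENS)
    print(f"{mutation}: implied within-specimen fraction of about {100 * fraction:.1f}%")
```

Under these assumptions a pooled frequency below 1.04% is what one would expect when the carrier specimen holds the mutation as a mixture rather than as the sole variant, which is consistent with the Sanger result describing both mutations as mixtures.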
Bacterial diversity in endodontic infections has not been sufficiently studied. The use of modern pyrosequencing technology should allow for more comprehensive analysis than traditional Sanger sequencing. This study investigated bacterial diversity in endodontic infections through taxonomic classification based on 16S rRNA gene sequences generated by 454 GS-FLX pyrosequencing and conventional Sanger capillary sequencing technologies. Sequencing was performed on 7 specimens from endodontic infections. On average, 47 vs. 28,590 sequences were obtained per sample for Sanger sequencing vs. pyrosequencing, representing a 600-fold difference in “depth-of-coverage”. Based on Ribosomal Database Project (RDP II) Classifier analysis, pyrosequencing identified 179 bacterial genera in 13 phyla, which was significantly more than Sanger sequencing. The phylum Bacteroidetes was the most prevalent bacterial phylum. These results indicate that bacterial communities in endodontic infections are more diverse than previously demonstrated. In addition, deep-coverage pyrosequencing of the 16S rRNA gene revealed low-abundance micro-organisms with potential clinical implications.
endodontic infection; pyrosequencing; 16S rRNA gene; bacterial diversity; taxonomy
Pyrosequencing is a relatively recent method for sequencing short stretches of DNA. Because both Pyrosequencing and Sanger dideoxy sequencing were recently used to characterize and validate DNA molecular barcodes in a large yeast gene-deletion project, a meta-analysis of those data allows an excellent and timely opportunity for evaluating Pyrosequencing against the current gold standard, Sanger dideoxy sequencing. Starting with yeast genomic DNA, parallel PCR amplification methods were used to prepare 4747 short barcode-containing constructs from 6000 Saccharomyces cerevisiae gene-deletion strains. Pyrosequencing was optimized for average read lengths of 25–30 bases, which included in each case a 20-mer barcode sequence. Results were compared with sequence data obtained by the standard Sanger dideoxy chain termination method. In most cases, sequences obtained by Pyrosequencing and Sanger dideoxy sequencing were of comparable accuracy, and the overall rate of failure was similar. The DNA in the barcodes is derived from synthetic oligonucleotide sequences that were inserted into yeast-deletion-strain genomic DNA by homologous recombination and represents the most significant amount of DNA from a synthetic source that has been sequenced to date. Although more automation and quality control measures are needed, Pyrosequencing was shown to be a fast and convenient method for determining short stretches of DNA sequence.
DNA sequencing; Pyrosequencing technology; Sanger dideoxy sequencing; Synthetic oligonucleotides; Yeast deletion strains
Human mitochondrial DNA (mtDNA) research has entered a massively parallel sequencing (MPS) era, providing deep insight into mtDNA genomics and molecular diagnostics. Analysis can simultaneously include coding and control regions, many samples can be studied in parallel, and even minor heteroplasmic changes can be detected. We investigated heteroplasmy using 16 different tissues from three unrelated males aged 40–54 years at the time of death. mtDNA was enriched using two independent overlapping long-range PCR amplicons and analysed by Illumina paired-end sequencing. Point mutation heteroplasmy at position 16,093 (m.16093T > C) in the non-coding regulatory region showed great variability within one of the studied individuals; heteroplasmy extended from 5.1 % in red bone marrow to 62.0 % in the bladder. Red (5.1 %) and yellow bone marrow (8.9 %) clustered into one group and two arteries and two aortas from different locations into another (31.2–50.9 %), giving an ontogenetic explanation for the formation of somatic mitochondrial heteroplasmy. Our results demonstrate that multi-tissue screening using MPS provides surprising data even when there is a limited number (3) of study subjects, and they give reason to speculate that mtDNA heteroplasmic frequency, distribution, and even its possible role in complex diseases or phenotypes seem to be underestimated.
Electronic supplementary material
The online version of this article (doi:10.1007/s00294-013-0398-6) contains supplementary material, which is available to authorized users.
Human tissue-specific mtDNA heteroplasmy; Illumina sequencing
The cobas 4800 BRAF V600 Mutation Test is a CE-marked and FDA-approved in vitro diagnostic assay used to select patients with metastatic melanoma for treatment with the selective BRAF inhibitor vemurafenib. We describe the pre-approval validation of this test in two external laboratories.
Melanoma specimens were tested for BRAF V600 mutations at two laboratories with the: cobas BRAF Mutation Test; ABI BRAF test; and bidirectional direct sequencing. Positive (PPA) and negative (NPA) percent agreements were determined between the cobas test and the other assays. Specimens with discordant results were tested with massively parallel pyrosequencing (454). DNA blends with 5% mutant alleles were tested to assess detection rates.
Invalid results were observed in 8/116 specimens (6·9%) with Sanger, 10/116 (8·6%) with ABI BRAF, and 0/232 (0%) with the cobas BRAF test. PPA was 97·7% for V600E mutation for the cobas BRAF test and Sanger, and NPA was 95·3%. For the cobas BRAF test and ABI BRAF, PPA was 71·9% and NPA 83·7%. For 16 cobas BRAF test-negative/ABI BRAF-positive specimens, 454 sequencing detected no codon 600 mutations in 12 and variant codon 600 mutations in four. For eight cobas BRAF test-positive/ABI BRAF-negative specimens, four were V600E and four V600K by 454 sequencing. Detection rates for 5% mutation blends were 100% for the cobas BRAF test, 33% for Sanger, and 21% for the ABI BRAF. Reproducibility of the cobas BRAF test was 111/116 (96%) between the two sites.
It is feasible to evaluate potential companion diagnostic tests in external laboratories in parallel with the pivotal clinical trial validation. The health authority approved assay had substantially better performance characteristics than the two other methods. The overall success of the cobas BRAF test is a proof of concept for future biomarker development.
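Positive and negative percent agreement, as used in the BRAF validation above, are computed from the 2×2 table of paired results between the test and a comparator assay. The sketch below shows the standard definitions; the function names and the 2×2 counts in the example are hypothetical, because the abstract reports only the resulting percentages. The final lines recompute three rates for which the abstract does give raw counts.

```python
def ppa(both_positive: int, test_negative_comparator_positive: int) -> float:
    """Positive percent agreement: share of comparator-positive specimens
    that the test also calls positive."""
    comparator_positive = both_positive + test_negative_comparator_positive
    return 100.0 * both_positive / comparator_positive


def npa(both_negative: int, test_positive_comparator_negative: int) -> float:
    """Negative percent agreement: share of comparator-negative specimens
    that the test also calls negative."""
    comparator_negative = both_negative + test_positive_comparator_negative
    return 100.0 * both_negative / comparator_negative


# Hypothetical 2x2 counts, for illustration only (not from the study).
print(f"PPA: {ppa(both_positive=90, test_negative_comparator_positive=10):.1f}%")
print(f"NPA: {npa(both_negative=95, test_positive_comparator_negative=5):.1f}%")

# Rates whose raw counts are stated in the abstract.
invalid_sanger_pct = round(100 * 8 / 116, 1)      # invalid results with Sanger
invalid_abi_pct = round(100 * 10 / 116, 1)        # invalid results with ABI BRAF
reproducibility_pct = round(100 * 111 / 116)      # cobas agreement between sites
print(invalid_sanger_pct, invalid_abi_pct, reproducibility_pct)
```

Percent agreement is used instead of sensitivity and specificity when the comparator is not assumed to be a perfect reference, which matches the design described above, where discordant specimens were resolved by a third method.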
Massively parallel DNA sequencing instruments are enabling the decoding of whole genomes at significantly lower cost and higher throughput than classical Sanger technology. Each of these technologies has been estimated to yield assemblies with more problematic features than the standard method. These problems are of a different nature depending on the techniques used. An appropriate mix of technologies may therefore help resolve most difficulties, and eventually provide assemblies of high quality without requiring any Sanger-based input.
We compared assemblies obtained using Sanger data with those from different inputs from New Sequencing Technologies. The assemblies were systematically compared with a reference finished sequence. We found that the 454 GSFLX can efficiently produce high continuity when used at high coverage. The potential to enhance continuity by scaffolding was tested using 454 sequences from circularized genomic fragments. Finally, we explore the use of Solexa-Illumina short reads to polish the genome draft by implementing a technique to correct 454 consensus errors.
High quality drafts can be produced for small genomes without any Sanger data input. We found that 454 GSFLX and Solexa/Illumina show great complementarity in producing large contigs and supercontigs with a low error rate.
The spread of highly pathogenic avian influenza A virus (HPAIV) of subtype H5N1 demands fast and reliable methods for in-depth, full-length sequence analysis. For this purpose, we designed a simple and sensitive method for the preparation of sequencing libraries from H5N1 HPAIV diagnostic RNA samples for sequencing with the Genome Sequencer FLX instrument. The method presented seamlessly integrates high-throughput pyrosequencing with the Roche/454 instrument into diagnostics without the need for additional equipment or molecular biological techniques besides standard PCR and the Genome Sequencer FLX sample preparation and sequencing pipeline.
The Yellowstone geothermal complex has yielded foundational discoveries that have significantly enhanced our understanding of the Archaea. This study continues on this theme, examining Yellowstone Lake and its lake floor hydrothermal vents. Significant Archaea novelty and diversity were found associated with two near-surface photic zone environments and two vents that varied in their depth, temperature and geochemical profile. Phylogenetic diversity was assessed using 454-FLX sequencing (∼51 000 pyrosequencing reads; V1 and V2 regions) and Sanger sequencing of 200 near-full-length polymerase chain reaction (PCR) clones. Automated classifiers (Ribosomal Database Project (RDP) and Greengenes) were problematic for the 454-FLX reads (wrong domain or phylum), although BLAST analysis of the 454-FLX reads against the phylogenetically placed full-length Sanger sequenced PCR clones proved reliable. Most of the archaeal diversity was associated with vents, and as expected there were differences between the vents and the near-surface photic zone samples. Thaumarchaeota dominated all samples: vent-associated organisms corresponded to the largely uncharacterized Marine Group I, and in surface waters, ∼69–84% of the 454-FLX reads matched archaeal clones representing organisms that are Nitrosopumilus maritimus-like (96–97% identity). Importance of the lake nitrogen cycling was also suggested by >5% of the alkaline vent phylotypes being closely related to the nitrifier Candidatus Nitrosocaldus yellowstonii. The Euryarchaeota were primarily related to the uncharacterized environmental clones that make up the Deep Sea Euryarchaeal Group or Deep Sea Hydrothermal Vent Group-6. The phylogenetic parallels of Yellowstone Lake archaea to marine microorganisms provide opportunities to examine interesting evolutionary tracks between freshwater and marine lineages.
archaea; Yellowstone Lake; pyrosequencing; N cycling; spatial distribution
The Colorado potato beetle (Leptinotarsa decemlineata) is a major pest and a serious threat to potato cultivation throughout the northern hemisphere. Despite its high importance for invasion biology, phenology and pest management, little is known about L. decemlineata from a genomic perspective. We subjected European L. decemlineata adult and larval transcriptome samples to 454-FLX massively-parallel DNA sequencing to characterize a basal set of genes from this species. We created a combined assembly of the adult and larval datasets including the publicly available midgut larval Roche 454 reads and provided basic annotation. We were particularly interested in diapause-specific genes and genes involved in pesticide and Bacillus thuringiensis (Bt) resistance.
Using 454-FLX pyrosequencing, we obtained a total of 898,048 reads which, together with the publicly available 804,056 midgut larval reads, were assembled into 121,912 contigs. We established a repository of genes of interest, with 101 out of the 108 diapause-specific genes described in Drosophila montana; and 621 contigs involved in insecticide resistance, including 221 CYP450, 45 GSTs, 13 catalases, 15 superoxide dismutases, 22 glutathione peroxidases, 194 esterases, 3 ADAM metalloproteases, 10 cadherins and 98 calmodulins. We found 460 putative miRNAs and we predicted a significant number of single nucleotide polymorphisms (29,205) and microsatellite loci (17,284).
This report of the assembly and annotation of the transcriptome of L. decemlineata offers new insights into diapause-associated and insecticide-resistance-associated genes in this species and provides a foundation for comparative studies with other species of insects. The data will also open new avenues for researchers using L. decemlineata as a model species, and for pest management research. Our results provide the basis for performing future gene expression and functional analysis in L. decemlineata and improve our understanding of the biology of this invasive species at the molecular level.
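The counts given in the transcriptome results can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Contigs per insecticide-resistance category, as listed in the abstract.
resistance_contigs = {
    "CYP450": 221,
    "GSTs": 45,
    "catalases": 13,
    "superoxide dismutases": 15,
    "glutathione peroxidases": 22,
    "esterases": 194,
    "ADAM metalloproteases": 3,
    "cadherins": 10,
    "calmodulins": 98,
}
total_resistance_contigs = sum(resistance_contigs.values())

# Read counts: newly generated 454-FLX reads plus public midgut larval reads.
new_reads = 898_048
public_midgut_reads = 804_056
total_reads = new_reads + public_midgut_reads

# Share of the Drosophila montana diapause-specific genes that were recovered.
diapause_recovered = 101 / 108

print(f"insecticide-resistance contigs: {total_resistance_contigs}")
print(f"reads entering the assembly: {total_reads:,}")
print(f"diapause genes recovered: {100 * diapause_recovered:.1f}%")
```

The category counts sum to the 621 contigs reported, so the list in the abstract is a complete breakdown of that total.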
Molecular detection of viruses has been aided by high-throughput sequencing, permitting the genomic characterization of emerging strains. In this study, we comprehensively screened 500 respiratory secretions from children with upper and/or lower respiratory tract infections for viral pathogens. The viruses detected are described, including a divergent human parainfluenza virus type 4 from GS FLX pyrosequencing of 92 specimens. Complete full-genome characterization of the virus followed, using Single Molecule, Real-Time (SMRT) sequencing. Subsequent “primer walking” combined with Sanger sequencing validated the RS platform's utility in viral sequencing from complex clinical samples. Comparative genomics reveals that the divergent strain clusters with the only completely sequenced HPIV4a subtype. However, it also exhibits various structural features present in one of the HPIV4b reference strains, opening questions regarding the lifecycle of, and evolutionary relationships among, these viruses. Clinical data from patients infected with the strain, as well as viral prevalence estimates using real-time PCR, are also described.
The main objective of this study was to assess the abundance and diversity of chitin-degrading microbial communities in ten terrestrial and aquatic habitats in order to provide guidance to the subsequent exploration of such environments for novel chitinolytic enzymes. A combined protocol which encompassed (1) classical overall enzymatic assays, (2) chiA gene abundance measurement by qPCR, (3) chiA gene pyrosequencing, and (4) chiA gene-based PCR-DGGE was used. The chiA gene pyrosequencing is unprecedented, as it is the first massive parallel sequencing of this gene. The data obtained showed the existence across habitats of core bacterial communities responsible for chitin assimilation irrespective of ecosystem origin. Conversely, there were habitat-specific differences. In addition, a suite of sequences was obtained that are as yet unregistered in the chitinase database. In terms of chiA gene abundance and diversity, typical low-abundance/diversity versus high-abundance/diversity habitats were distinguished. From the combined data, we selected chitin-amended agricultural soil, the rhizosphere of the Arctic plant Oxyria digyna and the freshwater sponge Ephydatia fluviatilis as the most promising habitats for subsequent bioexploration. Thus, the screening strategy used is proposed as a guide for further metagenomics-based exploration of the selected habitats.
Electronic supplementary material
The online version of this article (doi:10.1007/s00253-012-4057-5) contains supplementary material, which is available to authorized users.
chiA; Bacterial community; Environment; Functional screening; chiA pyrosequencing
454 pyrosequencing, a massively parallel sequencing (MPS) technology, is often used to study HIV genetic variation. However, the substantial mismatch error rate of the PCR required to prepare HIV-containing samples for pyrosequencing has limited the detection of rare variants within viral populations to those present above ~1%. To improve detection of rare variants, we varied PCR enzymes and conditions to identify those that combined high sensitivity with a low error rate. Substitution errors were found to vary up to 3-fold between the different enzymes tested. The sensitivity of each enzyme, which impacts the number of templates amplified for pyrosequencing, was shown to vary, although not consistently across genes and different samples. We also describe an amplicon-based method to improve the consistency of read coverage over stretches of the HIV-1 genome. Twenty-two primers were designed to amplify 11 overlapping amplicons in the HIV-1 clade B gag-pol and env gp120 coding regions to encompass 4.7 kb of the viral genome per sample at sensitivities as low as 0.01-0.2%.
New high throughput pyrosequencers such as the 454 Life Sciences GS 20 are capable of massively parallelizing DNA sequencing, providing an unprecedented rate of output data as well as potentially reducing costs. However, these new pyrosequencers bear a different error profile and provide shorter reads than those of a more traditional Sanger sequencer. These facts pose new challenges regarding how the data are handled and analyzed. In addition, the steep increase in the sequencers' throughput calls for much computation power at a low cost.
To address these challenges, we created an automated multi-step computation pipeline integrated with a database storage system. This allowed us to store, handle, index and search (1) the output data from the GS20 sequencer; (2) analysis projects, possibly multiple on every dataset; (3) final results of analysis computations; and (4) intermediate results of computations (these allow hand-made comparisons and hence further searches by the biologists). Repeatability of computations was also a requirement. In order to access the needed computation power, we ported the pipeline to the European Grid: a large community of clusters, load balanced as a whole. In order to better achieve this Grid port we created Vnas: an innovative Grid job submission, virtual sandbox manager and job callback framework.
After some runs of the pipeline aimed at tuning the parameters and thresholds for optimal results, we successfully analyzed 273 sequenced amplicons from a cancerous human sample and correctly found point mutations confirmed by either Sanger resequencing or NCBI dbSNP. The sequencing was performed with our 454 Life Sciences GS 20 pyrosequencer.
We handled the steep increase in throughput from the new pyrosequencer by building an automated computation pipeline associated with database storage, and by leveraging the computing power of the European Grid. The Grid platform offers a very cost effective choice for uneven workloads, typical in many scientific research fields, provided its peculiarities can be accepted (these are discussed). The mentioned infrastructure was used to analyze human amplicons for mutations. More analyses will be performed in the future.
Motivation: DNA sequence reads from Sanger and pyrosequencing platforms differ in cost, accuracy, typical coverage, average read length and the variety of available paired-end protocols. Both read types can complement one another in a ‘hybrid’ approach to whole-genome shotgun sequencing projects, but assembly software must be modified to accommodate their different characteristics. This is true even of pyrosequencing mated and unmated read combinations. Without special modifications, assemblers tuned for homogeneous sequence data may perform poorly on hybrid data.
Results: Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline called CABOG (Celera Assembler with the Best Overlap Graph) is robust to homopolymer run length uncertainty, high read coverage and heterogeneous read lengths. In tests on four genomes, it generated the longest contigs among all assemblers tested. It exploited the mate constraints provided by paired-end reads from either platform to build larger contigs and scaffolds, which were validated by comparison to a finished reference sequence. A low rate of contig mis-assembly was detected in some CABOG assemblies, but this was reduced in the presence of sufficient mate pair data.
Availability: The software is freely available as open-source from http://wgs-assembler.sf.net under the GNU Public License.
Supplementary information: Supplementary data are available at Bioinformatics online.
Rehmannia glutinosa, one of the most widely used herbal medicines in the Orient, is rich in biologically active iridoids. Despite their medicinal importance, no molecular information about the iridoid biosynthesis in this plant is presently available. To explore the transcriptome of R. glutinosa and investigate genes involved in iridoid biosynthesis, we used massively parallel pyrosequencing on the 454 GS FLX Titanium platform to generate a substantial EST dataset. Based on sequence similarity searches against the public sequence databases, the sequences were first annotated and then subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) based analysis. Bioinformatic analysis indicated that the 454 assembly contained a set of genes putatively involved in iridoid biosynthesis. Significantly, homologues of the secoiridoid pathway genes that were only identified in terpenoid indole alkaloid producing plants were also identified, whose presence implied that route II iridoids and route I iridoids share common enzyme steps in the early stage of biosynthesis. The gene expression patterns of four prenyltransferase transcripts were analyzed using qRT-PCR, which shed light on their putative functions in tissues of R. glutinosa. The data explored in this study will provide valuable information for further studies concerning iridoid biosynthesis.
Rehmannia glutinosa; iridoid biosynthesis; transcriptome
Sequence capture enrichment (SCE) strategies and massively parallel next generation sequencing (NGS) are expected to increase the rate of gene discovery for genetically heterogeneous hereditary diseases, but at present, there are very few examples of successful application of these technologic advances in translational research and clinical testing. Our study assessed whether array based target enrichment followed by re-sequencing on the Roche Genome Sequencer FLX (GS FLX) system could be used for novel mutation identification in more than 1000 exons representing 100 candidate genes for ocular birth defects, and as a control, whether these methods could detect two known mutations in the PAX2 gene. We assayed two samples with heterozygous sequence changes in PAX2 that were previously identified by conventional Sanger sequencing. These changes were a c.527G>C (S176T) substitution and a single basepair deletion c.77delG. The nucleotide substitution c.527G>C was easily identified by NGS. A deletion of one base in a long polyG stretch (c.77delG) was not registered initially by the GS Reference Mapper, but was detected in repeated analysis using two different software packages. Different approaches were evaluated for distinguishing false positives (sequencing errors) and benign polymorphisms from potentially pathogenic sequence changes that require further follow-up. Although improvements will be necessary in accuracy, speed, ease of data analysis and cost, our study confirms that NGS can be used in research and diagnostic settings to screen for mutations in hundreds of loci in genetically heterogeneous human diseases.
next generation sequencing; sequence capture; GS FLX; anophthalmia; microphthalmia; coloboma
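The missed c.77delG call illustrates why single-base indels inside homopolymer stretches deserve extra scrutiny on pyrosequencing platforms. The sketch below is illustrative only and is not part of the study's pipeline: it locates homopolymer runs in a reference sequence so that candidate indels falling inside them can be flagged for re-analysis. The function names and the `min_len=5` threshold are assumptions chosen for the example.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=5):
    """Return (start, end, base) for each run of a single base that is at
    least min_len long. Coordinates are 0-based and end-exclusive.
    min_len=5 is an arbitrary illustrative threshold, not a value from the study."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, pos + length, base))
        pos += length
    return runs


def in_homopolymer(position, runs):
    """True if a 0-based position falls inside any of the given runs."""
    return any(start <= position < end for start, end, _base in runs)


# Toy reference with a 7-base polyG stretch (hypothetical sequence).
reference = "ATGGGGGGGCTAAAT"
runs = homopolymer_runs(reference)
flag_inside = in_homopolymer(4, runs)    # position within the polyG run
flag_outside = in_homopolymer(12, runs)  # position within a short polyA run
```

Indel calls flagged this way could then be re-examined with a second aligner or confirmed by Sanger sequencing, in line with the repeated analysis described above.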
We present dial-out PCR, a highly parallel method for retrieving accurate DNA molecules for gene synthesis. A complex library of DNA molecules is modified with unique flanking tags before massively parallel sequencing. Tag-directed primers then enable the retrieval of molecules with desired sequences by PCR. Dial-out PCR enables multiplex in vitro clone screening and is a compelling alternative to in vivo cloning and Sanger sequencing for accurate gene synthesis.
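The selection step can be sketched in a few lines, assuming each sequenced read has already been parsed into its two flanking tags and the insert between them. This is a minimal illustration of the idea, not the authors' software: the function name, the tuple layout, and the rule that a tag pair is rejected when it is linked to more than one insert sequence are all assumptions.

```python
from collections import defaultdict


def pick_dialout_tags(reads, targets):
    """Choose tag pairs whose tagged molecule matches a desired sequence.

    reads:   iterable of (left_tag, right_tag, insert) tuples (hypothetical format)
    targets: dict mapping a construct name to its desired sequence
    Returns a dict mapping construct name to a list of (left_tag, right_tag)
    pairs that could be used to design retrieval primers.
    """
    inserts_by_tag = defaultdict(set)
    for left_tag, right_tag, insert in reads:
        inserts_by_tag[(left_tag, right_tag)].add(insert)

    name_by_sequence = {seq: name for name, seq in targets.items()}
    picks = defaultdict(list)
    for tag_pair, inserts in inserts_by_tag.items():
        # Assumption: a tag pair seen with conflicting inserts is not trusted.
        if len(inserts) != 1:
            continue
        (insert,) = inserts
        if insert in name_by_sequence:
            picks[name_by_sequence[insert]].append(tag_pair)
    return dict(picks)


# Toy data: one perfect molecule, one with a synthesis error, one ambiguous tag pair.
reads = [
    ("AAAC", "GGGT", "ATGCCC"),
    ("AAAC", "GGGT", "ATGCCC"),
    ("TTTG", "CCCA", "ATGCCA"),
    ("ACAC", "GTGT", "ATGCCC"),
    ("ACAC", "GTGT", "ATGCCG"),
]
targets = {"geneA": "ATGCCC"}
picks = pick_dialout_tags(reads, targets)
```

Only the tag pair linked exclusively to the correct insert is returned; primers directed at those tags would then retrieve that molecule from the pool by PCR.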
Direct population sequencing and reverse hybridization (line probe assay [LiPA])-based methods are the most common methods for detecting hepatitis B virus (HBV) drug resistance mutations, although only mutations present in viral quasispecies with a prevalence of ≥20% can be detected by sequencing, and only known mutations are detected by LiPA. Massively parallel ultradeep pyrosequencing (UDPS; GS FLX platform) was used to analyze HBV quasispecies in reverse transcriptase (RT) and hepatitis B S antigen (HBsAg) from five drug-naive patients and eight drug-resistant patients. Eight primer pairs were used to obtain partially overlapping amplicons, covering the RT gene from codons 1 to 288 and the complete overlapping HBsAg sequence. A 1% mutation frequency was selected as the cutoff based on an error rate estimated on plasmid DNA. This technology enabled simultaneous analysis of between 2,852 and 18,016 clonally amplified fragments from each patient. The results indicate that UDPS has a much higher relative sensitivity than both direct sequencing and LiPA. In addition, the UDPS results are quantitative, allowing establishment of the relative frequency of both known mutations and novel substitutions. Some of the detected RT substitutions also led to changes in HBsAg. On the whole, genotype D presented a higher heterogeneity than genotype A. Considering the large amount of information that can be provided by a single test from one patient, the short turnaround time, the information on substitution frequency, and the detection of rare variants, UDPS confers strong advantages, and the new method could play a relevant role in the clinical management of HBV infection and therapy.
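The frequency cutoff can be illustrated with a short sketch that turns per-position read counts into substitution frequencies and reports those at or above 1%. This is an assumption-laden example rather than the study's analysis code: the input layout, the function name, and the choice of an inclusive (≥) comparison are hypothetical, and the counts are invented toy values.

```python
def minor_variants(counts, reference, cutoff=0.01):
    """Report non-reference residues whose read frequency reaches the cutoff.

    counts:    dict mapping codon position to {residue: read count}
    reference: dict mapping codon position to the reference residue
    cutoff:    minimum frequency to report (1% here; inclusive by assumption)
    Returns a list of (position, residue, frequency) tuples.
    """
    calls = []
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        if depth == 0:
            continue  # no coverage, nothing to call
        for residue, n in sorted(counts[pos].items()):
            if residue == reference[pos]:
                continue
            freq = n / depth
            if freq >= cutoff:
                calls.append((pos, residue, freq))
    return calls


# Toy read counts at two RT codons (hypothetical numbers).
counts = {
    180: {"L": 4940, "M": 60},
    204: {"M": 9000, "V": 900, "I": 50, "L": 50},
}
reference = {180: "L", 204: "M"}
calls = minor_variants(counts, reference)
```

In this toy case the substitutions at 1.2% and 9% are reported, while the two at 0.5% are treated as indistinguishable from sequencing error. Both reported variants would lie below the roughly 20% detection limit of direct population sequencing mentioned above.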
Despite the clinical utility of genetic diagnosis to address idiopathic sensorineural hearing impairment (SNHI), the current strategy for screening mutations via Sanger sequencing is limited in that only a small number of DNA fragments associated with common deafness mutations can be genotyped. Consequently, a definitive genetic diagnosis cannot be achieved in many families with a discernible family history. To investigate the diagnostic utility of massively parallel sequencing (MPS), we applied the MPS technique to 12 multiplex families with idiopathic SNHI in which common deafness mutations had previously been ruled out. A NimbleGen sequence capture array was designed to target all protein coding sequences (CDSs) and 100 bp of the flanking sequence of 80 common deafness genes. We performed MPS on the Illumina HiSeq2000, and applied BWA, SAMtools, Picard, GATK, Variant Tools, ANNOVAR, and IGV for bioinformatics analyses. Initial data filtering with allele frequencies (<5% in the 1000 Genomes Project and 5400 NHLBI exomes) and PolyPhen2/SIFT scores (>0.95) prioritized 5 indels (insertions/deletions) and 36 missense variants in the 12 multiplex families. After further validation by Sanger sequencing, segregation pattern, and evolutionary conservation of amino acid residues, we identified 4 variants in 4 different genes, which might lead to SNHI in 4 families compatible with autosomal dominant inheritance. These included GJB2 p.R75Q, MYO7A p.T381M, KCNQ4 p.S680F, and MYH9 p.E1256K. Among them, KCNQ4 p.S680F and MYH9 p.E1256K were novel. In conclusion, MPS allows genetic diagnosis in multiplex families with idiopathic SNHI by detecting mutations in relatively uncommon deafness genes.
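The initial filtering step can be sketched as a simple two-criterion filter over annotated variants. The example below is a hedged illustration, not the pipeline used in the study: the `Variant` record, its field names, and the toy values are hypothetical. It also assumes that the highest allele frequency across the reference populations is compared against the 5% threshold, that a single damage score oriented so that higher means more damaging is compared against 0.95, and that indels are kept on frequency alone because missense predictors do not score them. The abstract does not specify these details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str
    change: str
    kind: str                             # "missense" or "indel"
    max_pop_af: float                     # highest allele frequency in reference populations
    damage_score: Optional[float] = None  # assumed: higher means more damaging


def prioritise(variants, af_cutoff=0.05, score_cutoff=0.95):
    """Keep rare variants; for missense variants also require a high damage score."""
    kept = []
    for v in variants:
        if v.max_pop_af >= af_cutoff:
            continue  # too common to explain a rare dominant disorder
        if v.kind == "missense":
            if v.damage_score is None or v.damage_score <= score_cutoff:
                continue  # not predicted damaging under the assumed orientation
        kept.append(v)
    return kept


# Hypothetical annotated calls; gene names and values are placeholders.
candidates = [
    Variant("GENE1", "p.A10T", "missense", 0.001, 0.99),
    Variant("GENE2", "p.R20Q", "missense", 0.120, 0.99),
    Variant("GENE3", "p.L30V", "missense", 0.002, 0.40),
    Variant("GENE4", "c.100delA", "indel", 0.000),
]
shortlist = prioritise(candidates)
```

Variants that survive such a filter are only candidates; as described above, they would still need Sanger confirmation, segregation analysis, and a check of amino acid conservation before being considered causal.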