Charcot-Marie-Tooth (CMT) disease is a clinically and genetically heterogeneous distal symmetric polyneuropathy. Whole-exome sequencing (WES) of 40 individuals from 37 unrelated families with CMT-like peripheral neuropathy refractory to molecular diagnosis identified apparent causal mutations in ~45% (17/37) of families. Three candidate disease genes are proposed, supported by a combination of genetic and in vivo studies. Aggregate analysis of mutation data revealed a significantly increased number of rare variants across 58 neuropathy-associated genes in subjects versus controls, a finding confirmed in a second, ethnically discrete neuropathy cohort, suggesting that mutation burden potentially contributes to phenotypic variability. Neuropathy genes shown to have highly penetrant Mendelizing variants (HMPVs) and implicated by burden in families were shown to interact genetically in a zebrafish assay, exacerbating the phenotype established by the suppression of single genes. Our findings suggest that the combinatorial effect of rare variants contributes to disease burden and variable expressivity.
Left ventricular noncompaction (LVNC) is an autosomal dominant, genetically heterogeneous cardiomyopathy with variable severity, which may co-occur with cardiac hypertrophy.
Methods and Results
Here, we generated whole-exome sequencing (WES) data from multiple members of five families with LVNC. In four out of five families, the candidate causative mutation segregates with disease in known LVNC genes MYH7 and TPM1. Subsequent sequencing of MYH7 in a larger LVNC cohort identified seven novel likely disease-causing variants. In the fifth family, we identified a frameshift mutation in NNT, a nuclear-encoded mitochondrial protein not implicated previously in human cardiomyopathies. Resequencing of NNT in additional LVNC families identified a second likely pathogenic missense allele. Suppression of nnt in zebrafish caused early ventricular malformation and contractility defects, likely driven by altered cardiomyocyte proliferation. In vivo complementation studies showed that mutant human NNT failed to rescue nnt morpholino-induced heart dysfunction, indicating a probable haploinsufficiency mechanism.
Together, our data expand the genetic spectrum of LVNC and demonstrate how the intersection of WES with in vivo functional studies can accelerate the identification of genes that drive human genetic disorders.
noncompaction cardiomyopathy; genetics; human; genomics; left ventricular noncompaction
Genomic deletions, inversions, and other rearrangements known collectively as structural variations (SVs) are implicated in many human disorders. Technologies for sequencing DNA provide a potentially rich source of information in which to detect breakpoints of structural variations at base-pair resolution. However, accurate prediction of SVs remains challenging, and existing informatics tools predict rearrangements with significant rates of false positives or negatives.
To address this challenge, we developed ‘Structural Variation detection by STAck and Tail’ (SV-STAT) which implements a novel scoring metric. The software uses this statistic to quantify evidence for structural variation in genomic regions suspected of harboring rearrangements. To demonstrate SV-STAT, we used targeted and genome-wide approaches. First, we applied a custom capture array followed by Roche/454 and SV-STAT to three pediatric B-lineage acute lymphoblastic leukemias, identifying five structural variations joining known and novel breakpoint regions. Next, we detected SVs genome-wide in paired-end Illumina data collected from additional tumor samples. SV-STAT showed predictive accuracy as high as or higher than leading alternatives. The software is freely available under the terms of the GNU General Public License version 3 at https://gitorious.org/svstat/svstat.
SV-STAT works across multiple sequencing chemistries, paired and single-end technologies, targeted or whole-genome strategies, and it complements existing SV-detection software. The method is a significant advance towards accurate detection and genotyping of genomic rearrangements from DNA sequencing data.
Electronic supplementary material
The online version of this article (doi:10.1186/s13029-016-0051-0) contains supplementary material, which is available to authorized users.
Algorithm; Genome; Sequencing; Structural variation; Genotype; Translocation; Cancer
Genome-wide data are increasingly important in the clinical evaluation of human disease. However, the large number of variants observed in individual patients challenges the efficiency and accuracy of diagnostic review. Recent work has shown that systematic integration of clinical phenotype data with genotype information can improve diagnostic workflows and prioritization of filtered rare variants. We have developed visually interactive, analytically transparent analysis software that leverages existing disease catalogs, such as the Online Mendelian Inheritance in Man database (OMIM) and the Human Phenotype Ontology (HPO), to integrate patient phenotype and variant data into ranked diagnostic alternatives.
Our tool, “OMIM Explorer” (http://www.omimexplorer.com), extends the biomedical application of semantic similarity methods beyond those reported in previous studies. The tool also provides a simple interface for translating free-text clinical notes into HPO terms, enabling clinical providers and geneticists to contribute phenotypes to the diagnostic process. The visual approach uses semantic similarity with multidimensional scaling to collapse high-dimensional phenotype and genotype data from an individual into a graphical format that contextualizes the patient within a low-dimensional disease map. The map proposes a differential diagnosis and algorithmically suggests potential alternatives for phenotype queries—in essence, generating a computationally assisted differential diagnosis informed by the individual’s personal genome. Visual interactivity allows the user to filter and update variant rankings by interacting with intermediate results. The tool also implements an adaptive approach for disease gene discovery based on patient phenotypes.
We retrospectively analyzed pilot cohort data from the Baylor Miraca Genetics Laboratory, demonstrating performance of the tool and workflow in the re-analysis of clinical exomes. Our tool assigned clinically reported variants a median rank of 2, placing causal variants in the top 1% of filtered candidates across the 47 cohort cases with reported molecular diagnoses of exome variants in OMIM Morbidmap genes. Our tool outperformed Phen-Gen, eXtasy, PhenIX, PHIVE, and hiPHIVE in the prioritization of these clinically reported variants.
Our integrative paradigm can improve efficiency and, potentially, the quality of genomic medicine by more effectively utilizing available phenotype information, catalog data, and genomic knowledge.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-016-0261-8) contains supplementary material, which is available to authorized users.
Disease gene discovery; Exome; Semantic similarity; Variant prioritization
Gliomas are the most common brain tumors, with several histological subtypes of varying malignancy grade. The genetic contribution to familial glioma is not well understood. Using whole exome sequencing of 90 individuals from 55 families, we identified two families with mutations in POT1 (p.G95C, p.E450X), a member of the telomere shelterin complex, shared by both affected individuals in each family and predicted to impact DNA binding and TPP1 binding, respectively. Validation in a separate cohort of 264 individuals from 246 families identified an additional mutation in POT1 (p.D617Efs), also predicted to disrupt TPP1 binding. All families with POT1 mutations had affected members with oligodendroglioma, a specific subtype of glioma more sensitive to irradiation. These findings are important for understanding the origin of glioma and could have importance for the future diagnostics and treatment of glioma.
The Brain Tumor Epidemiology Consortium (BTEC) is an open scientific forum, which fosters the development of multi-center, international and inter-disciplinary collaborations. BTEC aims to develop a better understanding of the etiology, outcomes, and prevention of brain tumors (http://epi.grants.cancer.gov/btec/). The 15th annual Brain Tumor Epidemiology Consortium Meeting, hosted by the Austrian Societies of Neuropathology and Neuro-oncology, was held on September 9 – 11, 2014 in Vienna, Austria. The meeting focused on the central role of brain tumor epidemiology within multidisciplinary neuro-oncology. Knowledge of disease incidence, outcomes, as well as risk factors is fundamental to all fields involved in research and treatment of patients with brain tumors; thus, epidemiology constitutes an important link between disciplines, indeed the very hub. This was reflected by the scientific program, which included various sessions linking brain tumor epidemiology with clinical neuro-oncology, tissue-based research, and cancer registration. Renowned experts from Europe and the United States contributed their personal perspectives stimulating further group discussions. Several concrete action plans evolved for the group to move forward until next year’s meeting, which will be held at the Mayo Clinic at Rochester, MN, USA.
brain tumor; epidemiology; clinical research; tissue-based research; risk factor research
Besides its growing importance in clinical diagnostics and understanding the genetic basis of Mendelian and complex diseases, whole exome sequencing (WES) is a rich source of additional information of potential clinical utility for physicians, patients and their families. We analyzed the frequency and nature of single nucleotide variants (SNVs) considered secondary findings and recessive disease allele carrier status in the exomes of 8554 individuals from a large, randomly sampled cohort study and 2514 patients from a study of presumed Mendelian disease having undergone WES.
We used the same sequencing platform and data processing pipeline to analyze all samples and characterized the distributions of reported pathogenic (ClinVar, Human Gene Mutation Database (HGMD)) and predicted deleterious variants in the pre-specified American College of Medical Genetics and Genomics (ACMG) secondary findings and recessive disease genes in different ethnic groups.
In the 56 ACMG secondary findings genes, the average number of predicted deleterious variants per individual was 0.74, and the mean number of ClinVar reported pathogenic variants was 0.06. We observed an average of 10 deleterious and 0.78 ClinVar reported pathogenic variants per individual in 1423 autosomal recessive disease genes. By repeatedly sampling pairs of exomes, 0.5% of the randomly generated couples were at 25% risk of having an affected offspring for an autosomal recessive disorder based on the ClinVar variants.
By investigating reported pathogenic and novel, predicted deleterious variants we estimated the lower and upper limits of the population fraction for which exome sequencing may reveal additional medically relevant information. We suggest that the observed wide range for the lower and upper limits of these frequency numbers will be gradually reduced due to improvement in classification databases and prediction algorithms.
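The couple-sampling estimate described above can be illustrated with a minimal sketch. It assumes each individual is reduced to the set of autosomal recessive genes in which they carry a reported pathogenic variant, and it counts a couple as at risk when the two sets overlap. The function name, the toy cohort, and the simplification of ignoring sex and zygosity are assumptions made for illustration, not the study's pipeline.

```python
import random

def at_risk_fraction(carriers, n_couples, seed=0):
    """Estimate the share of randomly drawn couples in which both partners
    carry a pathogenic variant in the same autosomal recessive gene.

    carriers: list of sets, one per individual, holding gene symbols.
    """
    rng = random.Random(seed)
    at_risk = 0
    for _ in range(n_couples):
        # Draw two distinct individuals (sex is ignored in this sketch).
        a, b = rng.sample(carriers, 2)
        if a & b:  # shared carrier gene -> 25% risk per pregnancy
            at_risk += 1
    return at_risk / n_couples

# Hypothetical toy cohort, not data from the study.
toy_cohort = [{"CFTR"}, {"CFTR", "GJB2"}, set(), {"GJB2"}, set(), {"HBB"}]
fraction = at_risk_fraction(toy_cohort, n_couples=10_000)
print(f"Estimated at-risk couples: {fraction:.1%}")
```

In this toy cohort, 2 of the 15 possible pairs share a carrier gene, so the estimate should settle near 13%. A real analysis would also need to handle variant classification and pairing constraints.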
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0171-1) contains supplementary material, which is available to authorized users.
CLP1 is an RNA kinase involved in tRNA splicing. Recently, CLP1 kinase-dead mice were shown to display a neuromuscular disorder with loss of motor neurons and muscle paralysis. Human genome analyses have now identified a CLP1 homozygous missense mutation (p.R140H) in five unrelated families, leading to a loss of CLP1 interaction with the tRNA splicing endonuclease (TSEN) complex, largely reduced pre-tRNA cleavage activity, and accumulation of linear tRNA introns. The affected individuals develop severe motor-sensory defects, cortical dysgenesis and microcephaly. Mice carrying kinase-dead CLP1 also displayed microcephaly and reduced cortical brain volume due to the enhanced cell death of neuronal progenitors that is associated with reduced numbers of cortical neurons. Our data elucidate a novel neurological syndrome defined by CLP1 mutations that impair tRNA splicing. Reduction of a founder mutation to homozygosity illustrates the importance of rare variations in disease and supports the clan genomics hypothesis.
Glioma is a rare, but highly fatal, cancer that accounts for the majority of malignant primary brain tumors. Inherited predisposition to glioma has been consistently observed within non-syndromic families. Our previous studies, which involved non-parametric and parametric linkage analyses, both yielded significant linkage peaks on chromosome 17q. Here, we use data from next generation and Sanger sequencing to identify familial glioma candidate genes and variants on chromosome 17q for further investigation. We applied a filtering schema to narrow the original list of 4830 annotated variants down to 21 very rare (<0.1% frequency), non-synonymous variants. Our findings implicate the MYO19 and KIF18B genes and rare variants in SPAG9 and RUNDC1 as candidates worthy of further investigation. Burden testing and functional studies are planned.
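The filtering step described above (keeping very rare, non-synonymous variants) can be sketched as follows. The record layout, the consequence labels, and the placeholder gene names are assumptions for illustration; the abstract does not specify the annotation fields or the full filtering schema that was used.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str           # placeholder gene symbol
    consequence: str    # annotation label (assumed vocabulary)
    allele_freq: float  # population allele frequency, 0.0 to 1.0

# Assumed working definition of "non-synonymous" for this sketch.
NONSYNONYMOUS = {"missense", "nonsense", "frameshift"}

def filter_rare_nonsynonymous(variants, max_freq=0.001):
    """Keep variants below max_freq (0.1% by default) that change the protein."""
    return [
        v for v in variants
        if v.allele_freq < max_freq and v.consequence in NONSYNONYMOUS
    ]

# Hypothetical records, not variants from the study.
toy_variants = [
    Variant("GENE_A", "missense", 0.0004),
    Variant("GENE_B", "synonymous", 0.0002),
    Variant("GENE_C", "missense", 0.0300),
    Variant("GENE_D", "frameshift", 0.0),
]
kept = filter_rare_nonsynonymous(toy_variants)
print([v.gene for v in kept])
```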
We undertook a gene identification and molecular characterization project in a large kindred originally clinically diagnosed with SCA-X1. While presenting with ataxia, this kindred also had some unique peripheral nervous system features. The implicated region on the X chromosome was delineated using haplotyping. Large deletions and duplications were excluded by array comparative genomic hybridization. Exome sequencing was undertaken in two affected subjects. The single identified X chromosome candidate variant was then confirmed to co-segregate appropriately in all affected, carrier and unaffected family members by Sanger sequencing. The variant was confirmed to be novel by comparison with dbSNP and by filtering for a minor allele frequency of <1% in the 1000 Genomes Project, and it was not present in the NHLBI Exome Sequencing Project or a local database at the BCM HGSC. Functional experiments on transfected cells were subsequently undertaken to assess the biological effect of the variant in vitro. The variant identified was a previously unidentified non-synonymous variant, GJB1 p.P58S, in the Connexin 32/Gap Junction Beta 1 gene. Segregation studies with Sanger sequencing confirmed the presence of the variant in all affected individuals and one known carrier, and the absence of the variant in unaffected members. Functional studies confirmed that the p.P58S variant reduced the number and size of gap junction plaques, but the conductance of the gap junctions was unaffected. Two X-linked ataxias have been associated with genetic loci, with the first of these recently characterized at the molecular level. This represents the second kindred with molecular characterization of X-linked ataxia, and is the first instance of a previously unreported GJB1 mutation with a dominant and permanent ataxia phenotype, although different CNS deficits have previously been reported.
This pedigree has also been relatively unique in its phenotype due to the presence of central and peripheral neural abnormalities. Other X-linked SCAs with unique features might therefore also potentially represent variable phenotypic expression of other known neurological entities.
Whole-exome sequencing is a diagnostic approach for the identification of molecular defects in patients with suspected genetic disorders.
We developed technical, bioinformatic, interpretive, and validation pipelines for whole-exome sequencing in a certified clinical laboratory to identify sequence variants underlying disease phenotypes in patients.
We present data on the first 250 probands for whom referring physicians ordered whole-exome sequencing. Patients presented with a range of phenotypes suggesting potential genetic causes. Approximately 80% were children with neurologic phenotypes. Insurance coverage was similar to that for established genetic tests. We identified 86 mutated alleles that were highly likely to be causative in 62 of the 250 patients, achieving a 25% molecular diagnostic rate (95% confidence interval, 20 to 31). Among the 62 patients, 33 had autosomal dominant disease, 16 had autosomal recessive disease, and 9 had X-linked disease. A total of 4 probands received two nonoverlapping molecular diagnoses, which potentially challenged the clinical diagnosis that had been made on the basis of history and physical examination. A total of 83% of the autosomal dominant mutant alleles and 40% of the X-linked mutant alleles occurred de novo. Recurrent clinical phenotypes occurred in patients with mutations that were highly likely to be causative in the same genes and in different genes responsible for genetically heterogeneous disorders.
Whole-exome sequencing identified the underlying genetic defect in 25% of consecutive patients referred for evaluation of a possible genetic condition. (Funded by the National Human Genome Research Institute.)
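The reported diagnostic rate (62 of 250 probands) and its 95% confidence interval can be reproduced approximately with a short calculation. The sketch below uses the Wilson score interval; this choice is an assumption, since the abstract does not state which interval method was applied.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Counts taken from the text: 62 diagnosed probands out of 250.
low, high = wilson_interval(62, 250)
print(f"rate = {62 / 250:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```

Under this assumption the bounds fall close to the reported 20 to 31 after rounding to whole percentages.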
In a large cohort of osteogenesis imperfecta type V (OI type V) patients (17 individuals from 12 families), we identified the same mutation in the 5' UTR of the IFITM5 gene by whole exome and Sanger sequencing (IFITM5 c.-14C>T) and provide a detailed description of their phenotype. This mutation leads to the creation of a novel start codon adding 5 residues to IFITM5 and was recently reported in several other OI type V families. The variability of the phenotype was quite large even within families. Whereas some patients presented with the typical calcification of the forearm interosseous membrane, radial head dislocation and hyperplastic callus (HPC) formation following fractures, others had only some of the typical OI type V findings. Thirteen had calcification of interosseous membranes, fourteen had radial head dislocations, ten had HPC, nine had long bone bowing, eleven could ambulate without assistance, and one had mild unilateral mixed hearing loss. The bone mineral density varied greatly, even within families. Our study thus highlights the phenotypic variability of OI type V caused by the IFITM5 mutation.
Breast cancer is one of the most commonly diagnosed cancers in women. While there are several effective therapies for breast cancer and important single gene prognostic/predictive markers, more than 40,000 women die from this disease every year. The increasing availability of large-scale genomic datasets provides opportunities for identifying factors that influence breast cancer survival in smaller, well-defined subsets. The purpose of this study was to investigate the genomic landscape of various breast cancer subtypes and its potential associations with clinical outcomes. We used statistical analysis of sequence data generated by the Cancer Genome Atlas initiative including somatic mutation load (SML) analysis, Kaplan–Meier survival curves, gene mutational frequency, and mutational enrichment evaluation to study the genomic landscape of breast cancer. We show that ER+, but not ER−, tumors with high SML associate with poor overall survival (HR = 2.02). Further, these high mutation load tumors are enriched for coincident mutations in both DNA damage repair and ER signature genes. While it is known that somatic mutations in specific genes affect breast cancer survival, this study is the first to identify that SML may constitute an important global signature for a subset of ER+ tumors prone to high mortality. Moreover, although somatic mutations in individual DNA damage genes affect clinical outcome, our results indicate that coincident mutations in DNA damage response and signature ER genes may prove more informative for ER+ breast cancer survival. Next generation sequencing may prove an essential tool for identifying pathways underlying poor outcomes and for tailoring therapeutic strategies.
Mutation load; Breast cancer; DNA damage repair; Estrogen receptor
Although familial susceptibility to glioma is known, the genetic basis for this susceptibility remains unidentified in the majority of glioma-specific families. An alternative approach to identifying such genes is to examine cancer pedigrees, which include glioma as one of several cancer phenotypes, to determine whether common chromosomal modifications might account for the familial aggregation of glioma and other cancers.
Germline rearrangements in 146 glioma families (from the Gliogene Consortium; http://www.gliogene.org/) were examined using multiplex ligation-dependent probe amplification. These families all had at least 2 verified glioma cases and a third reported or verified glioma case in the same family or 2 glioma cases in the family with at least one family member affected with melanoma, colon, or breast cancer. The genomic areas covering TP53, CDKN2A, MLH1, and MSH2 were selected because these genes have been previously reported to be associated with cancer pedigrees known to include glioma.
We detected a single structural rearrangement, a deletion of exons 1-6 in MSH2, in the proband of one family with 3 cases with glioma and one relative with colon cancer.
Large deletions and duplications are rare events in familial glioma cases, even in families with a strong family history of cancers that may be involved in known cancer syndromes.
CDKN2A/B; family history; glioma; MLH1; MSH2; TP53
The debate regarding the relative merits of whole genome sequencing (WGS) versus exome sequencing (ES) centers around comparative cost, average depth of coverage for each interrogated base, and their relative efficiency in the identification of medically actionable variants from the myriad of variants identified by each approach. Nevertheless, few genomes have been subjected to both WGS and ES, using multiple next generation sequencing platforms. In addition, no personal genome has been so extensively analyzed using DNA derived from peripheral blood as opposed to DNA from transformed cell lines that may either accumulate mutations during propagation or clonally expand mosaic variants during cell transformation and propagation.
We investigated a genome that was studied previously by SOLiD chemistry using both ES and WGS, and performed six additional independent ES assays (Illumina GAII (x2), Illumina HiSeq (x2), Life Technologies' Personal Genome Machine (PGM) and Proton), and one additional WGS assay (Illumina HiSeq).
We compared the variants identified by the different methods and provide insights into the differences among variants identified between ES runs in the same technology platform and among different sequencing technologies. We resolved the true genotypes of medically actionable variants identified in the proband through orthogonal experimental approaches. Furthermore, ES identified an additional SH3TC2 variant (p.M1?) that likely contributes to the phenotype in the proband.
ES identified additional medically actionable variant calls and helped resolve ambiguous single nucleotide variants (SNV) documenting the power of increased depth of coverage of the captured targeted regions. Comparative analyses of WGS and ES reveal that pseudogenes and segmental duplications may explain some instances of apparent disease mutations in unaffected individuals.
Exome sequencing; Whole-genome sequencing; Incidental findings; SH3TC2; Personal genomes; Precision medicine
Since the initial report of targeted enrichment (Albert et al., 2007), we have been evolving the design and utility of capture reagents and methods, while taking advantage of the parallel advances in sequencing platforms. New exome designs target a comprehensive set of coding exons from 6 different gene databases, as well as computationally predicted coding and non-coding elements: regulatory regions and conserved UTRs. Library automation, reduction of DNA input samples, capture hybridization multiplexing and application of faster read mapping tools such as BWA together allow a rate of >4,300 libraries/captures per month, with >40,000 exome and regional capture libraries completed to date. In addition, a fully integrated informatics and analysis pipeline (Mercury) supports all aspects of data flow and analysis, from the initial data production on the sequencing instrument to annotated variant calls (SNPs and small indels). These laboratory methods and analysis pipelines have been production hardened at the Human Genome Sequencing Center (HGSC) and have now been applied toward clinical exome sequencing. Through a collaboration between the Human Genome Sequencing Center and the Medical Genetics Laboratories (MGL) of the Department of Molecular and Human Genetics, clinical exome sequencing and interpretation are now provided through the CAP/CLIA certified Whole Genome Laboratory (WGL). To date, the WGL has completed exome sequencing of 650 patient samples and final interpretation for over 450 patients, with causative deleterious mutations identified in 25% of cases. Performance has been maintained to a high standard of 95% of the exome target bases represented at 20X coverage. Overall exome performance metrics, LIMS support, variant analysis and validation of the clinical pipeline for a CAP/CLIA environment will be presented.
Polymicrogyria is a disorder of neuronal development resulting in structurally abnormal cerebral hemispheres characterized by over-folding and abnormal lamination of the cerebral cortex. Polymicrogyria is frequently associated with severe neurologic deficits including intellectual disability, motor problems, and epilepsy. There are acquired and genetic causes of polymicrogyria, but most patients with a presumed genetic etiology lack a specific diagnosis. Here we report using whole-exome sequencing to identify compound heterozygous mutations in the WD repeat domain 62 (WDR62) gene as the cause of recurrent polymicrogyria in a sibling pair. Sanger sequencing confirmed that the siblings both inherited 1-bp (maternal allele) and 2-bp (paternal allele) frameshift deletions, which predict premature truncation of WDR62, a protein that has a role in early cortical development. The probands are from a non-consanguineous family of Northern European descent, suggesting that autosomal recessive PMG due to compound heterozygous mutation of WDR62 might be a relatively common cause of PMG in the population. Further studies to identify mutation frequency in the population are needed.
malformations of cortical development; high-throughput nucleotide sequencing; genetic testing; epilepsy; intellectual disability
Molecular diagnostics can resolve locus heterogeneity underlying clinical phenotypes that may otherwise be co-assigned as a specific syndrome based on shared clinical features, and can associate phenotypically diverse diseases to a single locus through allelic affinity. Here we describe an apparently novel syndrome, likely caused by de novo truncating mutations in ASXL3, which shares characteristics with Bohring-Opitz syndrome, a disease associated with de novo truncating mutations in ASXL1.
We used whole-genome and whole-exome sequencing to interrogate the genomes of four subjects with an undiagnosed syndrome.
Using genome-wide sequencing, we identified heterozygous, de novo truncating mutations in ASXL3, a transcriptional repressor related to ASXL1, in four unrelated probands. We found that these probands shared similar phenotypes, including severe feeding difficulties, failure to thrive, and neurologic abnormalities with significant developmental delay. Further, they showed less phenotypic overlap with patients who had de novo truncating mutations in ASXL1.
We have identified truncating mutations in ASXL3 as the likely cause of a novel syndrome with phenotypic overlap with Bohring-Opitz syndrome.
Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation, remain serious bottlenecks. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues.
We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via Amazon Web Services using a Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set.
We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) and our ability to interrogate regions outside of the CCDS regions is consequently less well understood.
We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher than expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS.
We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5-times higher than that observed in the CCDS.
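Variant density, as defined above, is the number of variants per base pair of interrogated sequence, and the fold difference between two region types is the ratio of their densities. The sketch below shows that arithmetic; the counts are hypothetical placeholders chosen only to make the calculation concrete and are not values from the study.

```python
def variant_density(n_variants, n_bases):
    """Variants per base pair across an interrogated region."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_variants / n_bases

# Hypothetical counts for illustration only.
ccds_density = variant_density(n_variants=1_000, n_bases=2_000_000)
predicted_density = variant_density(n_variants=300, n_bases=200_000)

fold = predicted_density / ccds_density
print(f"CCDS: {ccds_density:.2e} variants/bp")
print(f"Predicted exons: {predicted_density:.2e} variants/bp")
print(f"Fold difference: {fold:.1f}x")
```

Because coverage differs between region types, a fair comparison would restrict both counts to bases that reach a common minimum depth.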
Whole-genome sequencing of patient DNA can facilitate diagnosis of a disease, but its potential for guiding treatment has been under-realized. We interrogated the complete genome sequences of a 14-year-old fraternal twin pair diagnosed with dopa (3,4-dihydroxyphenylalanine)–responsive dystonia (DRD; Mendelian Inheritance in Man #128230). DRD is a genetically heterogeneous and clinically complex movement disorder that is usually treated with l-dopa, a precursor of the neurotransmitter dopamine. Whole-genome sequencing identified compound heterozygous mutations in the SPR gene encoding sepiapterin reductase. Disruption of SPR causes a decrease in tetrahydrobiopterin, a cofactor required for the hydroxylase enzymes that synthesize the neurotransmitters dopamine and serotonin. Supplementation of l-dopa therapy with 5-hydroxytryptophan, a serotonin precursor, resulted in clinical improvements in both twins.
Next-generation DNA sequencing is opening new avenues for genetic association studies in common diseases that, like deep vein thrombosis (DVT), have a strong genetic predisposition still largely unexplained by currently identified risk variants. In order to develop sequencing and analytical pipelines for the application of next-generation sequencing to complex diseases, we conducted a pilot study sequencing the coding area of 186 hemostatic/proinflammatory genes in 10 Italian cases of idiopathic DVT and 12 healthy controls.
A molecular-barcoding strategy was used to multiplex DNA target capture and sequencing, while retaining individual sequence information. Genomic libraries with barcode sequence-tags were pooled (in pools of 8 or 16 samples) and enriched for target DNA sequences. Sequencing was performed on ABI SOLiD-4 platforms. We produced > 12 gigabases of raw sequence data to sequence at high coverage (average: 42X) the 700-kilobase target area in 22 individuals. A total of 1876 high-quality genetic variants were identified (1778 single nucleotide substitutions and 98 insertions/deletions). Annotation on databases of genetic variation and human disease mutations revealed several novel, potentially deleterious mutations. We tested 576 common variants in a case-control association analysis, carrying the top-5 associations over to replication in up to 719 DVT cases and 719 controls. We also conducted an analysis of the burden of nonsynonymous variants in coagulation factor and anticoagulant genes. We found an excess of rare missense mutations in anticoagulant genes in DVT cases compared to controls and an association for a missense polymorphism of FGA (rs6050; p = 1.9 × 10⁻⁵, OR 1.45; 95% CI, 1.22-1.72; after replication in > 1400 individuals).
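An odds ratio with a 95% confidence interval, of the kind reported for rs6050, can be obtained from a 2×2 table in several ways. The sketch below uses the generic Wald (Woolf) interval on the log odds ratio; this is an assumed, textbook method and not necessarily the model the authors fitted, and any counts passed to it would be hypothetical rather than the study's data.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls.
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

An interval that excludes 1, as the reported 1.22-1.72 does, is consistent with an association at the chosen confidence level.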
We implemented a barcode-based strategy to efficiently multiplex sequencing of hundreds of candidate genes in several individuals. In the relatively small dataset of our pilot study we were able to identify bona fide associations with DVT. Our study illustrates the potential of next-generation sequencing for the discovery of genetic variation predisposing to complex diseases.
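The idea of retaining individual sequence information in a pooled run can be sketched as a simple demultiplexing step. This is a deliberately simplified illustration: it assumes a fixed-length barcode at the start of each base-space read and exact matching, which is not how the SOLiD platform handles barcodes and is not described in the text. The barcodes and sample names a caller would supply are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by an exact-match leading barcode and trim the barcode.

    Reads whose prefix matches no known barcode are kept, untrimmed,
    under the key "unassigned".
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes one fixed barcode length")
    k = lengths.pop()
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:k])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[k:])
    return dict(by_sample)
```

A production pipeline would typically also tolerate sequencing errors in the barcode, which this sketch does not attempt.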
Deep vein thrombosis; venous thromboembolism; next-generation sequencing; target capture; multiplexing; FGA; rs6025; haemostateome; DVT; VTE
We have developed a solution-based method for targeted DNA capture-sequencing that is directed to the complete human exome. This approach allows the discovery of greater than 95% of all expected heterozygous single base variants, requires as little as 3 Gbp of raw sequence data and constitutes an effective tool for identifying rare coding alleles in large scale genomic studies.
We have applied a high-throughput pyrosequencing technology for transcriptome profiling of Caenorhabditis elegans in its first larval stage. Using this approach, we have generated a large amount of data for expressed sequence tags, which provides an opportunity for the discovery of putative novel transcripts and alternative splice variants that could be developmentally specific to the first larval stage. This work also demonstrates the successful and efficient application of a next generation sequencing methodology.
We have generated over 30 million bases of novel expressed sequence tags from first larval stage worms utilizing high-throughput sequencing technology. We have shown that approximately 14% of the newly sequenced expressed sequence tags map completely or partially to genomic regions where there are no annotated genes or splice variants, implying that these are novel genetic structures. Expressed sequence tags that map to intergenic (around 1000) and intronic (around 580) regions may represent novel transcribed regions, such as unannotated or unrecognized small protein-coding or non-protein-coding genes or splice variants. Expressed sequence tags that map across intron-exon boundaries (around 300) indicate possible alternative splice sites, while those that map near the ends of known transcripts (around 600) suggest extension of the coding or untranslated regions. We have also discovered that intergenic and intronic expressed sequence tags that are well conserved across different nematode species are likely to represent non-coding RNAs. Lastly, we have incorporated available serial analysis of gene expression data generated from first larval stage worms in order to predict novel transcripts that might be specifically or predominantly expressed in the first larval stage.
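The mapping categories described above (intergenic, intronic, intron-exon boundary, transcript end) can be sketched as a classification of one alignment against one annotated transcript. The sketch assumes a single contiguous alignment block, half-open (start, end) coordinates and sorted, non-overlapping exons; the category labels and the function name are illustrative assumptions, and the rules are a simplification rather than the criteria used in the study.

```python
def classify_est(est, exons):
    """Classify an EST alignment relative to one transcript's exons.

    est   : (start, end), half-open.
    exons : sorted, non-overlapping half-open (start, end) tuples.
    """
    if not exons:
        raise ValueError("exons must not be empty")
    start, end = est
    tx_start, tx_end = exons[0][0], exons[-1][1]
    # No overlap with the transcript span at all.
    if end <= tx_start or start >= tx_end:
        return "intergenic"
    overlapping = [(s, e) for s, e in exons if start < e and end > s]
    # Inside the transcript span but touching no exon.
    if not overlapping:
        return "intronic"
    # Overlaps an exon and runs past an annotated transcript end.
    if start < tx_start or end > tx_end:
        return "transcript_end_extension"
    # Entirely contained in one annotated exon.
    if any(s <= start and end <= e for s, e in overlapping):
        return "exonic"
    # Overlaps an exon and also extends into an adjacent intron.
    return "intron_exon_boundary"
```

In practice each alignment would be compared against every annotated transcript, and an alignment labelled intergenic here is intergenic only with respect to the single transcript tested.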
We have demonstrated the use of a high-throughput sequencing methodology to efficiently produce a snapshot of transcriptional activities occurring in the first larval stage of C. elegans development. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. This study provides a more complete C. elegans transcriptome profile and, furthermore, gives insight into the evolutionary and biological complexity of this organism.