Glioma is a rare, but highly fatal, cancer that accounts for the majority of malignant primary brain tumors. Inherited predisposition to glioma has been consistently observed within non-syndromic families. Our previous studies, which involved non-parametric and parametric linkage analyses, both yielded significant linkage peaks on chromosome 17q. Here, we use data from next generation and Sanger sequencing to identify familial glioma candidate genes and variants on chromosome 17q for further investigation. We applied a filtering schema to narrow the original list of 4830 annotated variants down to 21 very rare (<0.1% frequency), non-synonymous variants. Our findings implicate the MYO19 and KIF18B genes and rare variants in SPAG9 and RUNDC1 as candidates worthy of further investigation. Burden testing and functional studies are planned.
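The filtering step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record fields (`allele_freq`, `consequence`), the placeholder gene labels, and the set of consequence terms treated as non-synonymous are hypothetical. The abstract itself states only the <0.1% frequency threshold and the non-synonymous requirement.

```python
from dataclasses import dataclass


# Hypothetical record layout; the abstract does not specify annotation fields.
@dataclass(frozen=True)
class Variant:
    gene: str
    allele_freq: float  # population allele frequency as a fraction (0.001 == 0.1%)
    consequence: str    # e.g. "missense", "synonymous", "nonsense"


# Assumed labels for protein-altering (non-synonymous) consequences.
NON_SYNONYMOUS = {"missense", "nonsense", "frameshift"}

# "<0.1% frequency" from the abstract, expressed as a fraction.
MAX_FREQ = 0.001


def filter_rare_nonsynonymous(variants):
    """Keep variants that are both very rare and non-synonymous."""
    return [
        v for v in variants
        if v.allele_freq < MAX_FREQ and v.consequence in NON_SYNONYMOUS
    ]


# Placeholder records, not data from the study.
example = [
    Variant("GENE_A", 0.0004, "missense"),    # rare, non-synonymous: kept
    Variant("GENE_B", 0.0200, "missense"),    # too common: dropped
    Variant("GENE_C", 0.0001, "synonymous"),  # synonymous: dropped
    Variant("GENE_D", 0.0000, "nonsense"),    # unobserved in population: kept
]
kept = filter_rare_nonsynonymous(example)
print([v.gene for v in kept])
```

In practice the two conditions would be applied to annotated variant calls, and the order of the filters does not affect the result.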
The Brain Tumor Epidemiology Consortium (BTEC) is an open scientific forum, which fosters the development of multi-center, international and inter-disciplinary collaborations. BTEC aims to develop a better understanding of the etiology, outcomes, and prevention of brain tumors (http://epi.grants.cancer.gov/btec/). The 15th annual Brain Tumor Epidemiology Consortium Meeting, hosted by the Austrian Societies of Neuropathology and Neuro-oncology, was held on September 9 – 11, 2014 in Vienna, Austria. The meeting focused on the central role of brain tumor epidemiology within multidisciplinary neuro-oncology. Knowledge of disease incidence, outcomes, as well as risk factors is fundamental to all fields involved in research and treatment of patients with brain tumors; thus, epidemiology constitutes an important link between disciplines, indeed the very hub. This was reflected by the scientific program, which included various sessions linking brain tumor epidemiology with clinical neuro-oncology, tissue-based research, and cancer registration. Renowned experts from Europe and the United States contributed their personal perspectives stimulating further group discussions. Several concrete action plans evolved for the group to move forward until next year’s meeting, which will be held at the Mayo Clinic at Rochester, MN, USA.
brain tumor; epidemiology; clinical research; tissue-based research; risk factor research
We undertook a gene identification and molecular characterization project in a large kindred originally clinically diagnosed with SCA-X1. While presenting with ataxia, this kindred also had some unique peripheral nervous system features. The implicated region on the X chromosome was delineated using haplotyping. Large deletions and duplications were excluded by array comparative genomic hybridization. Exome sequencing was undertaken in two affected subjects. The single identified X chromosome candidate variant was then confirmed to co-segregate appropriately in all affected, carrier and unaffected family members by Sanger sequencing. The variant was confirmed to be novel by comparison with dbSNP, and filtering for a minor allele frequency of <1% in 1000 Genomes project, and was not present in the NHLBI Exome Sequencing Project or a local database at the BCM HGSC. Functional experiments on transfected cells were subsequently undertaken to assess the biological effect of the variant in vitro. The variant identified consisted of a previously unidentified non-synonymous variant, GJB1 p.P58S, in the Connexin 32/Gap Junction Beta 1 gene. Segregation studies with Sanger sequencing confirmed the presence of the variant in all affected individuals and one known carrier, and the absence of the variant in unaffected members. Functional studies confirmed that the p.P58S variant reduced the number and size of gap junction plaques, but the conductance of the gap junctions was unaffected. Two X-linked ataxias have been associated with genetic loci, with the first of these recently characterized at the molecular level. This represents the second kindred with molecular characterization of X-linked ataxia, and is the first instance of a previously unreported GJB1 mutation with a dominant and permanent ataxia phenotype, although different CNS deficits have previously been reported. 
This pedigree has also been relatively unique in its phenotype due to the presence of central and peripheral neural abnormalities. Other X-linked SCAs with unique features might therefore also potentially represent variable phenotypic expression of other known neurological entities.
Whole-exome sequencing is a diagnostic approach for the identification of molecular defects in patients with suspected genetic disorders.
We developed technical, bioinformatic, interpretive, and validation pipelines for whole-exome sequencing in a certified clinical laboratory to identify sequence variants underlying disease phenotypes in patients.
We present data on the first 250 probands for whom referring physicians ordered whole-exome sequencing. Patients presented with a range of phenotypes suggesting potential genetic causes. Approximately 80% were children with neurologic phenotypes. Insurance coverage was similar to that for established genetic tests. We identified 86 mutated alleles that were highly likely to be causative in 62 of the 250 patients, achieving a 25% molecular diagnostic rate (95% confidence interval, 20 to 31). Among the 62 patients, 33 had autosomal dominant disease, 16 had autosomal recessive disease, and 9 had X-linked disease. A total of 4 probands received two nonoverlapping molecular diagnoses, which potentially challenged the clinical diagnosis that had been made on the basis of history and physical examination. A total of 83% of the autosomal dominant mutant alleles and 40% of the X-linked mutant alleles occurred de novo. Recurrent clinical phenotypes occurred in patients with mutations that were highly likely to be causative in the same genes and in different genes responsible for genetically heterogeneous disorders.
Whole-exome sequencing identified the underlying genetic defect in 25% of consecutive patients referred for evaluation of a possible genetic condition. (Funded by the National Human Genome Research Institute.)
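The reported diagnostic rate and its confidence interval can be checked with a short calculation from the counts given above (62 of 250). The abstract does not say which interval method was used; the sketch below assumes a Wilson score interval with z = 1.96, which gives roughly 20% to 31% when rounded to whole percentages.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    The choice of the Wilson method is an assumption; the source
    does not state how its 95% confidence interval was computed.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


diagnosed, probands = 62, 250  # counts taken from the abstract
rate = diagnosed / probands
low, high = wilson_interval(diagnosed, probands)
print(f"rate={rate:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

A simple normal-approximation (Wald) interval would come out slightly lower at both ends, which is one reason the Wilson form is used here for illustration.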
In a large cohort of osteogenesis imperfecta type V (OI type V) patients (17 individuals from 12 families), we identified the same mutation in the 5' UTR of the IFITM5 gene by whole exome and Sanger sequencing (IFITM5 c.-14C>T) and provide a detailed description of their phenotype. This mutation leads to the creation of a novel start codon adding 5 residues to IFITM5 and was recently reported in several other OI type V families. The variability of the phenotype was quite large even within families. Whereas some patients presented with the typical calcification of the forearm interosseous membrane, radial head dislocation and hyperplastic callus (HPC) formation following fractures, others had only some of the typical OI type V findings. Thirteen had calcification of interosseous membranes, fourteen had radial head dislocations, ten had HPC, nine had long bone bowing, eleven could ambulate without assistance, and one had mild unilateral mixed hearing loss. The bone mineral density varied greatly, even within families. Our study thus highlights the phenotypic variability of OI type V caused by the IFITM5 mutation.
Breast cancer is one of the most commonly diagnosed cancers in women. While there are several effective therapies for breast cancer and important single gene prognostic/predictive markers, more than 40,000 women die from this disease every year. The increasing availability of large-scale genomic datasets provides opportunities for identifying factors that influence breast cancer survival in smaller, well-defined subsets. The purpose of this study was to investigate the genomic landscape of various breast cancer subtypes and its potential associations with clinical outcomes. We used statistical analysis of sequence data generated by the Cancer Genome Atlas initiative including somatic mutation load (SML) analysis, Kaplan–Meier survival curves, gene mutational frequency, and mutational enrichment evaluation to study the genomic landscape of breast cancer. We show that ER+, but not ER−, tumors with high SML associate with poor overall survival (HR = 2.02). Further, these high mutation load tumors are enriched for coincident mutations in both DNA damage repair and ER signature genes. While it is known that somatic mutations in specific genes affect breast cancer survival, this study is the first to identify that SML may constitute an important global signature for a subset of ER+ tumors prone to high mortality. Moreover, although somatic mutations in individual DNA damage genes affect clinical outcome, our results indicate that coincident mutations in DNA damage response and signature ER genes may prove more informative for ER+ breast cancer survival. Next generation sequencing may prove an essential tool for identifying pathways underlying poor outcomes and for tailoring therapeutic strategies.
Mutation load; Breast cancer; DNA damage repair; Estrogen receptor
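For readers unfamiliar with the Kaplan–Meier curves mentioned above, a minimal product-limit estimator is sketched below. The follow-up times and event flags are toy values, not TCGA data, and the function name is hypothetical. In a study like this one, the estimator would be applied separately to the high and low somatic mutation load groups and the resulting curves compared.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at time t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy cohort (hypothetical): five patients, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients lower the at-risk count without producing a step in the curve, which is why the last follow-up time in the toy cohort adds no point.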
Although familial susceptibility to glioma is known, the genetic basis for this susceptibility remains unidentified in the majority of glioma-specific families. An alternative approach to identifying such genes is to examine cancer pedigrees, which include glioma as one of several cancer phenotypes, to determine whether common chromosomal modifications might account for the familial aggregation of glioma and other cancers.
Germline rearrangements in 146 glioma families (from the Gliogene Consortium; http://www.gliogene.org/) were examined using multiplex ligation-dependent probe amplification. These families all had at least 2 verified glioma cases and a third reported or verified glioma case in the same family or 2 glioma cases in the family with at least one family member affected with melanoma, colon, or breast cancer. The genomic areas covering TP53, CDKN2A, MLH1, and MSH2 were selected because these genes have been previously reported to be associated with cancer pedigrees known to include glioma.
We detected a single structural rearrangement, a deletion of exons 1-6 in MSH2, in the proband of one family with 3 cases with glioma and one relative with colon cancer.
Large deletions and duplications are rare events in familial glioma cases, even in families with a strong family history of cancers that may be involved in known cancer syndromes.
CDKN2A/B; family history; glioma; MLH1; MSH2; TP53
The debate regarding the relative merits of whole genome sequencing (WGS) versus exome sequencing (ES) centers around comparative cost, average depth of coverage for each interrogated base, and their relative efficiency in the identification of medically actionable variants from the myriad of variants identified by each approach. Nevertheless, few genomes have been subjected to both WGS and ES, using multiple next generation sequencing platforms. In addition, no personal genome has been so extensively analyzed using DNA derived from peripheral blood as opposed to DNA from transformed cell lines that may either accumulate mutations during propagation or clonally expand mosaic variants during cell transformation and propagation.
We investigated a genome that was studied previously by SOLiD chemistry using both ES and WGS, and now perform six independent ES assays (Illumina GAII (x2), Illumina HiSeq (x2), Life Technologies' Personal Genome Machine (PGM) and Proton), and one additional WGS (Illumina HiSeq).
We compared the variants identified by the different methods and provide insights into the differences among variants identified between ES runs in the same technology platform and among different sequencing technologies. We resolved the true genotypes of medically actionable variants identified in the proband through orthogonal experimental approaches. Furthermore, ES identified an additional SH3TC2 variant (p.M1?) that likely contributes to the phenotype in the proband.
ES identified additional medically actionable variant calls and helped resolve ambiguous single nucleotide variants (SNV) documenting the power of increased depth of coverage of the captured targeted regions. Comparative analyses of WGS and ES reveal that pseudogenes and segmental duplications may explain some instances of apparent disease mutations in unaffected individuals.
Exome sequencing; Whole-genome sequencing; Incidental findings; SH3TC2; Personal genomes; Precision medicine
Since the initial report of targeted enrichment (Albert et al., 2007), we have been evolving the design and utility of capture reagents and methods, while taking advantage of the parallel advances in sequencing platforms. New exome designs target a comprehensive set of coding exons from 6 different gene databases, as well as computationally predicted coding and non-coding elements: regulatory regions, and conserved UTRs. Library automation, reduction of DNA input samples, capture hybridization multiplexing and application of faster read mapping tools such as BWA together allow a rate of >4,300 libraries/captures per month, with >40,000 exome and regional capture libraries completed to date. In addition, a fully integrated informatics and analysis pipeline (Mercury) supports all aspects of data flow and analysis from the initial data production on the sequencing instrument to annotated variant calls (SNPs and small Indels). These laboratory methods and analysis pipelines have been production hardened at the Human Genome Sequencing Center (HGSC) and have now been applied toward clinical exome sequencing. Through a joint collaboration between the Human Genome Sequencing Center and the Medical Genetics Laboratories (MGL) of the Department of Molecular and Human Genetics, clinical exome sequencing and interpretation are now provided through the CAP/CLIA certified Whole Genome Laboratory (WGL). To date, the WGL has completed exome sequencing of 650 patient samples and final interpretation for over 450 patients, with causative deleterious mutations identified in 25% of cases. Performance has been maintained to a high standard of 95% of the exome target bases represented at 20X coverage. Overall exome performance metrics, LIMS support, variant analysis and validation of the clinical pipeline for a CAP/CLIA environment will be presented.
Polymicrogyria is a disorder of neuronal development resulting in structurally abnormal cerebral hemispheres characterized by over-folding and abnormal lamination of the cerebral cortex. Polymicrogyria is frequently associated with severe neurologic deficits including intellectual disability, motor problems, and epilepsy. There are acquired and genetic causes of polymicrogyria, but most patients with a presumed genetic etiology lack a specific diagnosis. Here we report using whole-exome sequencing to identify compound heterozygous mutations in the WD repeat domain 62 (WDR62) gene as the cause of recurrent polymicrogyria in a sibling pair. Sanger sequencing confirmed that the siblings both inherited 1-bp (maternal allele) and 2-bp (paternal allele) frameshift deletions, which predict premature truncation of WDR62, a protein that has a role in early cortical development. The probands are from a non-consanguineous family of Northern European descent, suggesting that autosomal recessive PMG due to compound heterozygous mutation of WDR62 might be a relatively common cause of PMG in the population. Further studies to identify mutation frequency in the population are needed.
malformations of cortical development; high-throughput nucleotide sequencing; genetic testing; epilepsy; intellectual disability
Molecular diagnostics can resolve locus heterogeneity underlying clinical phenotypes that may otherwise be co-assigned as a specific syndrome based on shared clinical features, and can associate phenotypically diverse diseases to a single locus through allelic affinity. Here we describe an apparently novel syndrome, likely caused by de novo truncating mutations in ASXL3, which shares characteristics with Bohring-Opitz syndrome, a disease associated with de novo truncating mutations in ASXL1.
We used whole-genome and whole-exome sequencing to interrogate the genomes of four subjects with an undiagnosed syndrome.
Using genome-wide sequencing, we identified heterozygous, de novo truncating mutations in ASXL3, a transcriptional repressor related to ASXL1, in four unrelated probands. We found that these probands shared similar phenotypes, including severe feeding difficulties, failure to thrive, and neurologic abnormalities with significant developmental delay. Further, they showed less phenotypic overlap with patients who had de novo truncating mutations in ASXL1.
We have identified truncating mutations in ASXL3 as the likely cause of a novel syndrome with phenotypic overlap with Bohring-Opitz syndrome.
Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation, remain a serious bottleneck. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues.
We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via Amazon Web Services using a Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set.
We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) and our ability to interrogate regions outside of the CCDS regions is consequently less well understood.
We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher than expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS.
We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5-times higher than that observed in the CCDS.
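Variant density, as defined above, is the number of variants per base pair, and the comparison between region types reduces to a ratio of two densities. The sketch below illustrates the arithmetic only. The counts are hypothetical placeholders chosen for readability, not values from the study, and the function names are assumptions.

```python
def variant_density(n_variants, n_bases):
    """Variants per base pair for one region type."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_variants / n_bases


def relative_density(region, baseline):
    """Ratio of a region's variant density to a baseline density.

    Each argument is a (n_variants, n_bases) pair.
    """
    return variant_density(*region) / variant_density(*baseline)


# Hypothetical counts, for illustration only.
ccds = (1_000, 2_000_000)          # baseline: CCDS exons
predicted_exons = (300, 200_000)   # computationally predicted exons

ratio = relative_density(predicted_exons, ccds)
print(f"relative variant density: {ratio:.2f}x")
```

Normalizing by the number of interrogated bases matters because the region types differ greatly in total size, so raw variant counts alone would not be comparable.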
Whole-genome sequencing of patient DNA can facilitate diagnosis of a disease, but its potential for guiding treatment has been under-realized. We interrogated the complete genome sequences of a 14-year-old fraternal twin pair diagnosed with dopa (3,4-dihydroxyphenylalanine)–responsive dystonia (DRD; Mendelian Inheritance in Man #128230). DRD is a genetically heterogeneous and clinically complex movement disorder that is usually treated with l-dopa, a precursor of the neurotransmitter dopamine. Whole-genome sequencing identified compound heterozygous mutations in the SPR gene encoding sepiapterin reductase. Disruption of SPR causes a decrease in tetrahydrobiopterin, a cofactor required for the hydroxylase enzymes that synthesize the neurotransmitters dopamine and serotonin. Supplementation of l-dopa therapy with 5-hydroxytryptophan, a serotonin precursor, resulted in clinical improvements in both twins.
Next-generation DNA sequencing is opening new avenues for genetic association studies in common diseases that, like deep vein thrombosis (DVT), have a strong genetic predisposition still largely unexplained by currently identified risk variants. In order to develop sequencing and analytical pipelines for the application of next-generation sequencing to complex diseases, we conducted a pilot study sequencing the coding area of 186 hemostatic/proinflammatory genes in 10 Italian cases of idiopathic DVT and 12 healthy controls.
A molecular-barcoding strategy was used to multiplex DNA target capture and sequencing, while retaining individual sequence information. Genomic libraries with barcode sequence-tags were pooled (in pools of 8 or 16 samples) and enriched for target DNA sequences. Sequencing was performed on ABI SOLiD-4 platforms. We produced > 12 gigabases of raw sequence data to sequence at high coverage (average: 42X) the 700-kilobase target area in 22 individuals. A total of 1876 high-quality genetic variants were identified (1778 single nucleotide substitutions and 98 insertions/deletions). Annotation on databases of genetic variation and human disease mutations revealed several novel, potentially deleterious mutations. We tested 576 common variants in a case-control association analysis, carrying the top-5 associations over to replication in up to 719 DVT cases and 719 controls. We also conducted an analysis of the burden of nonsynonymous variants in coagulation factor and anticoagulant genes. We found an excess of rare missense mutations in anticoagulant genes in DVT cases compared to controls and an association for a missense polymorphism of FGA (rs6050; p = 1.9 × 10⁻⁵, OR 1.45; 95% CI, 1.22-1.72; after replication in > 1400 individuals).
We implemented a barcode-based strategy to efficiently multiplex sequencing of hundreds of candidate genes in several individuals. In the relatively small dataset of our pilot study we were able to identify bona fide associations with DVT. Our study illustrates the potential of next-generation sequencing for the discovery of genetic variation predisposing to complex diseases.
Deep vein thrombosis; venous thromboembolism; next-generation sequencing; target capture; multiplexing; FGA; rs6025; haemostateome; DVT; VTE
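The barcode-based multiplexing described above relies on assigning each pooled read back to its sample of origin. A minimal demultiplexing sketch follows. It assumes, purely for illustration, that the barcode is a fixed-length prefix of each read and that only exact matches are accepted. The barcode sequences, sample names, and reads are hypothetical, and the actual platform workflow may read barcodes differently.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; not taken from the study.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}


def demultiplex(reads, barcodes, barcode_len):
    """Split pooled reads by their leading barcode tag.

    Reads with a recognized tag are stored without the tag under the
    sample name; all other reads are kept intact under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample)


# Toy pooled reads (hypothetical).
pooled = ["ACGTGGGCCC", "TGCAAAATTT", "ACGTTTTAAA", "NNNNGGGG"]
assigned = demultiplex(pooled, BARCODES, barcode_len=4)
for sample, seqs in sorted(assigned.items()):
    print(sample, seqs)
```

Exact matching is the simplest policy. Tolerating one mismatch in the tag would recover more reads at the cost of a small risk of misassignment, so that trade-off depends on how distinct the barcodes are.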
We have developed a solution-based method for targeted DNA capture-sequencing that is directed to the complete human exome. This approach allows the discovery of greater than 95% of all expected heterozygous single base variants, requires as little as 3 Gbp of raw sequence data and constitutes an effective tool for identifying rare coding alleles in large scale genomic studies.
We have applied a high-throughput pyrosequencing technology for transcriptome profiling of Caenorhabditis elegans in its first larval stage. Using this approach, we have generated a large amount of data for expressed sequence tags, which provides an opportunity for the discovery of putative novel transcripts and alternative splice variants that could be developmentally specific to the first larval stage. This work also demonstrates the successful and efficient application of a next generation sequencing methodology.
We have generated over 30 million bases of novel expressed sequence tags from first larval stage worms utilizing high-throughput sequencing technology. We have shown that approximately 14% of the newly sequenced expressed sequence tags map completely or partially to genomic regions where there are no annotated genes or splice variants and therefore, imply that these are novel genetic structures. Expressed sequence tags, which map to intergenic (around 1000) and intronic regions (around 580), may represent novel transcribed regions, such as unannotated or unrecognized small protein-coding or non-protein-coding genes or splice variants. Expressed sequence tags, which map across intron-exon boundaries (around 300), indicate possible alternative splice sites, while expressed sequence tags, which map near the ends of known transcripts (around 600), suggest extension of the coding or untranslated regions. We have also discovered that intergenic and intronic expressed sequence tags, which are well conserved across different nematode species, are likely to represent non-coding RNAs. Lastly, we have incorporated available serial analysis of gene expression data generated from first larval stage worms, in order to predict novel transcripts that might be specifically or predominantly expressed in the first larval stage.
We have demonstrated the use of a high-throughput sequencing methodology to efficiently produce a snap-shot of transcriptional activities occurring in the first larval stage of C. elegans development. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. This study provides a more complete C. elegans transcriptome profile and, furthermore, gives insight into the evolutionary and biological complexity of this organism.
High throughput sequencing-by-synthesis is an emerging technology that allows the rapid production of millions of bases of data. Although the sequence reads are short, they can readily be used for re-sequencing. By re-sequencing the mRNA products of a cell, one may rapidly discover polymorphisms and splice variants particular to that cell.
We present the utility of massively parallel sequencing by synthesis for profiling the transcriptome of a human prostate cancer cell-line, LNCaP, that has been treated with the synthetic androgen, R1881. Through the generation of approximately 20 megabases (MB) of EST data, we detect transcription from over 10,000 gene loci, 25 previously undescribed alternative splicing events involving known exons, and over 1,500 high quality single nucleotide discrepancies with the reference human sequence. Further, we map nearly 10,000 ESTs to positions on the genome where no transcription is currently predicted to occur. We also characterize various obstacles with using sequencing by synthesis for transcriptome analysis and propose solutions to these problems.
The use of high-throughput sequencing-by-synthesis methods for transcript profiling allows the specific and sensitive detection of many of a cell's transcripts, and also allows the discovery of high quality base discrepancies and alternative splice variants. Thus, this technology may provide an effective means of understanding various disease states, discovering novel targets for disease treatment, and discovering novel transcripts.
In a large cohort of osteogenesis imperfecta type V (OI type V) patients (17 individuals from 12 families), we identified the same mutation in the 5′ untranslated region (5′UTR) of the interferon-induced transmembrane protein 5 (IFITM5) gene by whole exome and Sanger sequencing (IFITM5 c.–14C > T) and provide a detailed description of their phenotype. This mutation leads to the creation of a novel start codon adding five residues to IFITM5 and was recently reported in several other OI type V families. The variability of the phenotype was quite large even within families. Whereas some patients presented with the typical calcification of the forearm interosseous membrane, radial head dislocation and hyperplastic callus (HPC) formation following fractures, others had only some of the typical OI type V findings. Thirteen had calcification of interosseous membranes, 14 had radial head dislocations, 10 had HPC, 9 had long bone bowing, 11 could ambulate without assistance, and 1 had mild unilateral mixed hearing loss. The bone mineral density varied greatly, even within families. Our study thus highlights the phenotypic variability of OI type V caused by the IFITM5 mutation.
OSTEOGENESIS IMPERFECTA; IFITM5; UNTRANSLATED REGION; HYPERPLASTIC CALLUS