Medulloblastoma comprises four distinct molecular variants: WNT, SHH, Group 3, and Group 4. We analyzed alternative splicing usage in 14 normal cerebellar samples and 103 medulloblastomas of known subgroup. Medulloblastoma samples have a statistically significant increase in alternative splicing as compared to normal fetal cerebella (2.3-fold; P<6.47E-8). Splicing patterns are distinct and specific between molecular subgroups. Unsupervised hierarchical clustering of alternative splicing events accurately assigns medulloblastomas to their correct subgroup. Subgroup-specific splicing and alternative promoter usage were most prevalent in Group 3 (19.4%) and SHH (16.2%) medulloblastomas, and were observed less frequently in WNT (3.2%) and Group 4 (9.3%) tumors. Functional annotation of alternatively spliced genes reveals over-representation of genes important for neuronal development. Alternative splicing events in medulloblastoma may be regulated in part by the correlative expression of antisense transcripts, suggesting a possible mechanism affecting subgroup-specific alternative splicing. Our results identify additional candidate markers for medulloblastoma subgroup affiliation, further support the existence of distinct subgroups of the disease, and demonstrate an additional level of transcriptional heterogeneity between medulloblastoma subgroups.
medulloblastoma; alternative splicing; neuronal development; molecular subgroup; pediatric cancer
Leukemia stem cells (LSC) play a pivotal role in chronic myeloid leukemia (CML) tyrosine kinase inhibitor (TKI) resistance and progression to blast crisis (BC), in part through alternative splicing of self-renewal and survival genes. To elucidate splice isoform regulators of human BC LSC maintenance, we performed whole transcriptome RNA sequencing, splice isoform-specific qRT-PCR, nanoproteomics, stromal co-culture and BC LSC xenotransplantation analyses. Cumulatively, these studies show that alternative splicing of multiple pro-survival BCL2 family genes promotes malignant transformation of myeloid progenitors into BC LSC that are quiescent in the marrow niche and contribute to therapeutic resistance. Notably, a novel pan-BCL2 inhibitor, sabutoclax, renders marrow niche-resident BC LSC sensitive to TKIs at doses that spare normal progenitors. These findings underscore the importance of alternative BCL2 family splice isoform expression in BC LSC maintenance and suggest that combinatorial inhibition of pro-survival BCL2 family proteins and BCR-ABL may eliminate dormant LSC and obviate resistance.
Pathways defining susceptibility of normal cells to oncogenic transformation may be valuable therapeutic targets. We characterized the cell of origin and its critical pathways in MN1-induced leukemias. Common myeloid progenitors (CMP), but not granulocyte-macrophage progenitors (GMP), could be transformed by MN1. Complementation studies of CMP-signature genes in GMPs demonstrated that MN1-leukemogenicity required the MEIS1/AbdB-like HOX-protein complex. ChIP-sequencing identified common target genes of MN1 and MEIS1, and demonstrated identical binding sites for a large proportion of their chromatin targets. Transcriptional repression of MEIS1 targets in established MN1 leukemias demonstrated antileukemic activity. As MN1 relies on but cannot activate expression of MEIS1/AbdB-like HOX proteins, transcriptional activity of these genes determines cellular susceptibility to MN1-induced transformation, and may represent a promising therapeutic target.
PMID: 21741595 CAMSID: cams3759
Recent sequencing efforts have described the mutational landscape of the pediatric brain tumor medulloblastoma. Although MLL2 is among the genes most frequently affected by somatic single nucleotide variants (SNV), the clinical and biological significance of these mutations remains uncharacterized. Through targeted re-sequencing, we identified mutations of MLL2 in 8% (14/175) of MBs, the majority of which were loss of function. Notably, we also report mutations affecting the MLL2-binding partner KDM6A in 4% (7/175) of tumors. While MLL2 mutations were independent of age, gender, histological subtype, M-stage or molecular subgroup, KDM6A mutations were most commonly identified in Group 4 MBs, and were mutually exclusive with MLL2 mutations. Immunohistochemical staining for H3K4me3 and H3K27me3, the chromatin effectors of MLL2 and KDM6A activity, respectively, demonstrated alterations of the histone code in 24% (53/220) of MBs across all subgroups. Correlating these MLL2- and KDM6A-driven histone marks with prognosis, we identified populations of MB with improved (K4+/K27−) and dismal (K4−/K27−) outcomes, observed primarily within Group 3 and 4 MBs. Group 3 and 4 MBs demonstrate somatic copy number aberrations and transcriptional profiles that converge on modifiers of H3K27-methylation (EZH2, KDM6A, KDM6B), leading to silencing of PRC2-target genes. As PRC2-mediated aberrant methylation of H3K27 has recently been targeted for therapy in other diseases, it represents an actionable target for a substantial percentage of medulloblastoma patients with aggressive forms of the disease.
MLL2; KDM6A; Histone lysine methylation; Medulloblastoma; PRC2
Chromosomal translocations are critically involved in the molecular pathogenesis of B-cell lymphomas, and highly recurrent and specific rearrangements have defined distinct molecular subtypes linked to unique clinicopathological features [1,2]. In contrast, several well-characterized lymphoma entities still lack disease-defining translocation events. To identify novel fusion transcripts resulting from translocations, we investigated two Hodgkin lymphoma cell lines by whole-transcriptome paired-end sequencing (RNA-seq). Here we show a highly expressed gene fusion involving the major histocompatibility complex (MHC) class II transactivator CIITA (MHC2TA) in KM-H2 cells. In a subsequent evaluation of 263 B-cell lymphomas, we also demonstrate that genomic CIITA breaks are highly recurrent in primary mediastinal B-cell lymphoma (PMBCL) (38%) and classical Hodgkin lymphoma (cHL) (15%). Furthermore, we find that CIITA is a promiscuous partner of various in-frame gene fusions, and we report that CIITA gene alterations impact survival in PMBCL. As functional consequences of CIITA gene fusions, we identify downregulation of surface HLA class II expression and overexpression of ligands of the receptor molecule programmed cell death 1 (CD274/PDL1 and CD273/PDL2). These receptor–ligand interactions have been shown to impact anti-tumour immune responses in several cancers [3], whereas decreased MHC class II expression has been linked to reduced tumour cell immunogenicity [4]. Thus, our findings suggest that recurrent rearrangements of CIITA may represent a novel genetic mechanism underlying tumour–microenvironment interactions across a spectrum of lymphoid cancers.
Transposable element (TE) derived sequences comprise half of our genome and DNA methylome, and are presumed densely methylated and inactive. Examination of the genome-wide DNA methylation status within 928 TE subfamilies in human embryonic and adult tissues revealed unexpected tissue-specific and subfamily-specific hypomethylation signatures. Genes proximal to tissue-specific hypomethylated TE sequences were enriched for functions important for the tissue type and their expression correlated strongly with hypomethylation of the TEs. When hypomethylated, these TE sequences gained tissue-specific enhancer marks including H3K4me1 and occupancy by p300, and a majority exhibited enhancer activity in reporter gene assays. Many such TEs also harbored binding sites for transcription factors that are important for tissue-specific functions and exhibited evidence for evolutionary selection. These data suggest that sequences derived from TEs may be responsible for wiring tissue type-specific regulatory networks, and have acquired tissue-specific epigenetic regulation.
The extent to which a genomic test will be used in practice is affected by factors such as ability of the test to correctly predict response to treatment (i.e. sensitivity and specificity of the test), invasiveness of the testing procedure, test cost, and the probability and severity of side effects associated with treatment.
Using discrete choice experimentation (DCE), we elicited preferences of the public (Sample 1, N = 533 and Sample 2, N = 525) and cancer patients (Sample 3, N = 38) for different attributes of a hypothetical genomic test for guiding cancer treatment. Samples 1 and 3 considered the test/treatment in the context of an aggressive curable cancer (scenario A) while the scenario for sample 2 was based on a non-aggressive incurable cancer (scenario B).
In aggressive curable cancer (scenario A), everything else being equal, the odds ratio (OR) of choosing a test with 95% sensitivity was 1.41 (versus a test with 50% sensitivity) and willingness to pay (WTP) was $1331, on average, for this amount of improvement in test sensitivity. In this scenario, the OR of choosing a test with 95% specificity was 1.24 (versus a test with 50% specificity; WTP = $827). In non-aggressive incurable cancer (scenario B), the OR of choosing a test with 95% sensitivity was 1.65 (WTP = $1344), and the OR of choosing a test with 95% specificity was 1.50 (WTP = $1080). Reducing severity of treatment side effects from severe to mild was associated with large ORs in both scenarios (OR = 2.10 and 2.24 in scenario A and B, respectively). In contrast, cancer patients had a very strong preference for 95% sensitivity of the test (OR = 5.23).
The type and prognosis of cancer affected preferences for genomically-guided treatment. In aggressive curable cancer, individuals placed more emphasis on the sensitivity than on the specificity of the test. In contrast, for a non-aggressive incurable cancer, individuals put similar emphasis on sensitivity and specificity of the test. While the public expressed a strong preference for lowering the severity of side effects, improving the sensitivity of the test had by far the largest influence on patients’ decision to use genomic testing.
Pharmacogenomics; Genomic medicine; Personalized medicine; Genetic testing; Discrete choice experiment; Conjoint analysis; Preference elicitation; Cancer treatment
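The odds ratios and willingness-to-pay figures reported for the discrete choice experiment above can be related to each other under a standard conditional logit model, in which OR = exp(β) for an attribute coefficient β and marginal WTP = −β_attribute / β_cost. The abstract does not state the model or the cost coefficient, so the sketch below is an illustration under that assumption only: the cost coefficient is hypothetical and is back-solved from the reported sensitivity figures for scenario A.

```python
import math


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logit coefficient."""
    return math.exp(beta)


def willingness_to_pay(beta_attr: float, beta_cost: float) -> float:
    """Marginal WTP: the price change that offsets the attribute's utility gain."""
    if beta_cost >= 0:
        raise ValueError("the cost coefficient is expected to be negative")
    return -beta_attr / beta_cost


# Reported scenario A odds ratios (95% versus 50% sensitivity or specificity).
beta_sensitivity = math.log(1.41)
beta_specificity = math.log(1.24)

# Hypothetical cost coefficient (per dollar), back-solved so that the
# sensitivity WTP matches the reported $1331. Not a value from the study.
beta_cost = -beta_sensitivity / 1331.0

wtp_specificity = willingness_to_pay(beta_specificity, beta_cost)
print(f"implied WTP for specificity: ${wtp_specificity:.0f}")
```

The implied specificity WTP can then be compared with the reported $827; any gap would reflect rounding of the published odds ratios or a model that differs from the one assumed here.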
Next-generation sequencing is making sequence-based molecular pathology and personalized oncology viable. We selected an individual initially diagnosed with conventional but aggressive prostate adenocarcinoma and sequenced the genome and transcriptome from primary and metastatic tissues collected prior to hormone therapy. The histology-pathology and copy number profiles were remarkably homogeneous, yet it was possible to propose the quadrant of the prostate tumour that likely seeded the metastatic diaspora. Despite a homogeneous cell type, our transcriptome analysis revealed signatures of both luminal and neuroendocrine cell types. Remarkably, the repertoire of expressed but apparently private gene fusions, including C15orf21:MYC, recapitulated this biology. We hypothesize that the amplification and over-expression of the stem cell gene MSI2 may have contributed to the stable hybrid cellular identity. This hybrid luminal-neuroendocrine tumour appears to represent a novel and highly aggressive case of prostate cancer with unique biological features and, conceivably, a propensity for rapid progression to castrate-resistance. Overall, this work highlights the importance of integrated analyses of genome, exome and transcriptome sequences for basic tumour biology, sequence-based molecular pathology and personalized oncology.
RNA sequencing; DNA sequencing; prostate cancer; fusion genes; neuroendocrine; personalized medicine; cancer genetics
Summary: Despite recent progress, computational tools that identify gene fusions from next-generation whole transcriptome sequencing data are often limited in accuracy and scalability. Here, we present a software package, BreakFusion, which combines the strength of reference alignment followed by read-pair analysis and de novo assembly to achieve a good balance in sensitivity, specificity and computational efficiency.
Supplementary data are available at Bioinformatics online.
Castrate-resistant prostate cancer (CRPC) and neuroendocrine carcinoma of the prostate are invariably fatal diseases for which only palliative therapies exist. As part of a prostate tumour sequencing program, a patient tumour was analyzed using Illumina genome sequencing and a matched renal capsule tumour xenograft was generated. Both tumour and xenograft had a homozygous 9p21 deletion spanning the MTAP, CDKN2 and ARF genes. It is rare for this deletion to occur in primary prostate tumours, yet approximately 10% express decreased levels of MTAP mRNA. Decreased MTAP expression is a prognosticator for poor outcome. Moreover, it appears that this deletion is more common in CRPC than in primary prostate cancer. We show for the first time that treatment with methylthioadenosine and high-dose 6-thioguanine causes marked inhibition of growth of a patient-derived neuroendocrine xenograft while protecting the host from 6-thioguanine toxicity. This therapeutic approach can be applied to other MTAP-deficient human cancers since deletion or hypermethylation of the MTAP gene occurs in a broad spectrum of tumours at high frequency. The combination of genome sequencing and patient-derived xenografts can identify candidate therapeutic agents and evaluate them for personalized oncology.
massively parallel sequencing; MTAP; patient-derived xenograft; genitourinary cancers: prostate; animal models of cancer; gene expression profiling; functional genomics; xenograft models
Biallelic mutations of the DNA annealing helicase SMARCAL1 (SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily a-like 1) cause Schimke immuno-osseous dysplasia (SIOD, MIM 242900), an incompletely penetrant autosomal recessive disorder. Using human, Drosophila and mouse models, we show that the proteins encoded by SMARCAL1 orthologs localize to transcriptionally active chromatin and modulate gene expression. We also show that, as found in SIOD patients, deficiency of the SMARCAL1 orthologs alone is insufficient to cause disease in fruit flies and mice, although such deficiency causes modest diffuse alterations in gene expression. Rather, disease manifests when SMARCAL1 deficiency interacts with genetic and environmental factors that further alter gene expression. We conclude that the SMARCAL1 annealing helicase buffers fluctuations in gene expression and that alterations in gene expression contribute to the penetrance of SIOD.
The current paradigm of cancer care relies on predictive nomograms which integrate detailed histopathology with clinical data. However, when predictions fail, the consequences for patients are often catastrophic, especially in prostate cancer where nomograms influence the decision to therapeutically intervene. We hypothesized that the high dimensional data afforded by massively parallel sequencing (MPS) is not only capable of providing biological insights, but may aid molecular pathology of prostate tumours. We assembled a cohort of six patients with high-risk disease, and performed deep RNA and shallow DNA sequencing in primary tumours and matched metastases where available. Our analysis identified copy number abnormalities, accurately profiled gene expression levels, and detected both differential splicing and expressed fusion genes. We revealed occult and potentially dormant metastases, unambiguously supporting the patients’ clinical history, and implicated the REST transcriptional complex in the development of neuroendocrine prostate cancer, validating this finding in a large independent cohort. We massively expand on the number of novel fusion genes described in prostate cancer; provide fresh evidence for the growing link between fusion gene aetiology and gene expression profiles; and show the utility of fusion genes for molecular pathology. Finally, we identified chromothripsis in a patient with chronic prostatitis. Our results provide a strong foundation for further development of MPS-based molecular pathology.
molecular pathology; massively parallel sequencing; neuroendocrine prostate cancer; REST repressor; chromothripsis
CrkRS (Cdc2-related kinase, Arg/Ser), or cyclin-dependent kinase 12 (CDK12), is a serine/threonine kinase believed to coordinate transcription and RNA splicing. While CDK12/CrkRS complexes were known to phosphorylate the C-terminal domain (CTD) of RNA polymerase II (RNA Pol II), the cyclin regulating this activity was not known. Using immunoprecipitation and mass spectrometry, we identified a 65-kDa isoform of cyclin K (cyclin K1) in endogenous CDK12/CrkRS protein complexes. We show that cyclin K1 complexes isolated from mammalian cells contain CDK12/CrkRS but do not contain CDK9, a presumed partner of cyclin K. Analysis of extensive RNA-Seq data shows that the 65-kDa cyclin K1 isoform is the predominantly expressed form across numerous tissue types. We also demonstrate that CDK12/CrkRS is dependent on cyclin K1 for its kinase activity and that small interfering RNA (siRNA) knockdown of CDK12/CrkRS or cyclin K1 has similar effects on the expression of a luciferase reporter gene. Our data suggest that cyclin K1 is the primary cyclin partner for CDK12/CrkRS and that cyclin K1 is required to activate CDK12/CrkRS to phosphorylate the CTD of RNA Pol II. These properties are consistent with a role of CDK12/CrkRS in regulating gene expression through phosphorylation of RNA Pol II.
Neuroblastoma is a childhood extracranial solid tumour that is associated with a number of genetic changes. Included in these genetic alterations are mutations in the kinase domain of the anaplastic lymphoma kinase (ALK) receptor tyrosine kinase (RTK), which have been found in both somatic and familial neuroblastoma. Treating patients accordingly requires characterisation of these mutations in terms of their response to ALK tyrosine kinase inhibitors (TKIs). Here, we report the identification and characterisation of two novel neuroblastoma ALK mutations (A1099T and R1464STOP), which we have investigated together with several previously reported but uncharacterised ALK mutations (T1087I, D1091N, T1151M, M1166R, F1174I and A1234T). In order to understand the potential role of these ALK mutations in neuroblastoma progression, we have employed cell culture-based systems together with the model organism Drosophila as a readout for ligand-independent activity. Mutation of ALK at position 1174 (F1174I) generates a gain-of-function receptor capable of activating intracellular targets such as ERK (extracellular signal regulated kinase) and STAT3 (signal transducer and activator of transcription 3) in a ligand-independent manner. Analysis of these previously uncharacterised ALK mutants and comparison with ALKF1174 mutants suggests that ALK mutations observed in neuroblastoma fall into three classes. These classes are: (i) gain-of-function ligand-independent mutations such as ALKF1174I, (ii) kinase-dead ALK mutants, e.g. ALKI1250T (Schönherr et al., 2011a) and (iii) ALK mutations that are ligand-dependent in nature. Irrespective of the nature of the observed ALK mutants, in every case the activity of the mutant ALK receptors could be abrogated by the ALK inhibitor crizotinib (Xalkori/PF-02341066), albeit with differing levels of sensitivity.
Oligodendroglioma is characterized by unique clinical, pathological, and genetic features. Recurrent losses of chromosomes 1p and 19q are strongly associated with this brain cancer but knowledge of the identity and function of the genes affected by these alterations is limited. We performed exome sequencing on a discovery set of 16 oligodendrogliomas with 1p/19q co-deletion to identify new molecular features at base-pair resolution. As anticipated, there was a high rate of IDH mutations: all cases had mutations in either IDH1 (14/16) or IDH2 (2/16). In addition, we discovered somatic mutations and insertions/deletions in the CIC gene on chromosome 19q13.2 in 13/16 tumours. These discovery set mutations were validated by deep sequencing of 13 additional tumours, which revealed 7 others with CIC mutations, thus bringing the overall mutation rate in oligodendrogliomas in this study to 20/29 (69%). In contrast, deep sequencing of astrocytomas and oligoastrocytomas without 1p/19q loss revealed that CIC alterations were otherwise rare (1/60; 2%). Of the 21 non-synonymous somatic mutations in 20 CIC-mutant oligodendrogliomas, 9 were in exon 5 within an annotated DNA interacting domain and 3 were in exon 20 within an annotated protein interacting domain. The remaining 9 were found in other exons and frequently included truncations. CIC mutations were highly associated with oligodendroglioma histology, 1p/19q co-deletion and IDH1/2 mutation (p<0.001). Although we observed no differences in the clinical outcomes of CIC mutant versus wild-type tumors, in a background of 1p/19q co-deletion, hemizygous CIC mutations are likely important. We hypothesize that the mutant CIC on the single retained 19q allele is linked to the pathogenesis of oligodendrogliomas with IDH mutation. Our detailed study of genetic aberrations in oligodendroglioma suggests a functional interaction between CIC mutation, IDH1/2 mutation and 1p/19q co-deletion.
Glioma; Oligodendroglioma; Next Generation Sequencing; Capicua; IDH1
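The counts in the oligodendroglioma abstract above lend themselves to a quick arithmetic check: 13/16 discovery cases plus 7/13 validation cases give 20/29 (69%) CIC-mutant oligodendrogliomas, against 1/60 in tumours without 1p/19q loss. The sketch below recomputes the rate and runs a two-sided Fisher exact test on that 2x2 table. The abstract does not say which test produced its p<0.001, so Fisher's test is an assumption chosen for illustration, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Counts taken from the abstract.
mutant_oligo, total_oligo = 13 + 7, 16 + 13
mutant_other, total_other = 1, 60

rate = 100 * mutant_oligo / total_oligo
p_value = fisher_exact_two_sided(
    mutant_oligo, total_oligo - mutant_oligo,
    mutant_other, total_other - mutant_other,
)
print(f"CIC mutation rate: {mutant_oligo}/{total_oligo} ({rate:.0f}%)")
print(f"Fisher exact p-value: {p_value:.3g}")
```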
Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its ~20-megabase genome, which contains ~6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.
Somatic hypermutation (SHM) in the variable region of immunoglobulin genes (IGV) naturally occurs in a narrow window of B cell development to provide high-affinity antibodies. However, SHM can also aberrantly target proto-oncogenes and cause genome instability. The role of aberrant SHM (aSHM) has been widely studied in various non-Hodgkin's lymphomas, particularly in diffuse large B-cell lymphoma (DLBCL). Although it has been speculated that aSHM targets a wide range of genomic loci, so far only twelve genes have been identified as targets of aSHM through the targeted sequencing of selected genes. A genome-wide study aiming at identifying a comprehensive set of aSHM targets recurrently occurring in DLBCL has not been previously undertaken. Here, we present a comprehensive assessment of the somatic hypermutated genes in DLBCL identified through an analysis of genomic and transcriptome data derived from 40 DLBCL patients. Our analysis verifies that there are indeed many genes that are recurrently affected by aSHM. In particular, we have identified 32 novel targets that show the same or a higher level of aSHM activity than genes previously reported. Amongst these novel targets, 22 genes showed a significant correlation between mRNA abundance and aSHM.
Aberrant somatic hypermutation; Genome wide study; Diffuse large B-cell lymphoma; Genomic rearrangements
Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of SNVs that overlap copy number alterations. The method is based on modelling the notion that genomic regions of segmental duplication and amplification induce an extended genotype space where a subset of genotypes will exhibit heavily skewed allelic distributions in SNVs (and therefore render them undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models where the number of mixtures for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it is able to discover 21 experimentally revalidated somatic non-synonymous mutations that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain where increased sensitivity was conferred.
Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
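The idea of an extended genotype space informed by copy number can be illustrated with a toy calculation. The sketch below is not the CoNAn-SNV implementation: it assumes a uniform prior over genotypes, a fixed sequencing error rate, and one binomial component per possible number of variant copies (0 to the copy number), all of which are simplifications chosen for illustration. It shows how a skewed allelic ratio that a diploid model attributes to the reference genotype is assigned to a variant genotype once the locus is known to be amplified.

```python
from math import comb


def genotype_posteriors(ref_count, alt_count, copy_number, error=0.01):
    """Posterior over genotypes, indexed by the number of variant copies.

    Assumes a uniform prior and a binomial likelihood whose variant-allele
    fraction is k / copy_number, clipped by a hypothetical error rate.
    """
    depth = ref_count + alt_count
    likelihoods = []
    for k in range(copy_number + 1):
        fraction = min(max(k / copy_number, error), 1 - error)
        likelihoods.append(
            comb(depth, alt_count) * fraction**alt_count * (1 - fraction)**ref_count
        )
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def most_likely_genotype(posteriors):
    """Index (number of variant copies) with the highest posterior."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)


# A locus with 35 reference reads and 5 variant reads (12.5% variant fraction).
diploid = genotype_posteriors(35, 5, copy_number=2)
amplified = genotype_posteriors(35, 5, copy_number=4)

print("diploid model, variant copies:", most_likely_genotype(diploid))
print("copy number 4 model, variant copies:", most_likely_genotype(amplified))
```

Under these assumptions the diploid model favours zero variant copies, while the four-copy model favours a single variant copy, which is the kind of call a diploid-only method would miss.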
Malformations of the cardiovascular system are the most common type of birth defect in humans, frequently affecting the formation of valves and septa. During heart valve and septa formation, cells from the atrio-ventricular canal (AVC) and outflow tract (OFT) regions of the heart undergo an epithelial-to-mesenchymal transformation (EMT) and invade the underlying extracellular matrix to give rise to endocardial cushions. Subsequent maturation of newly formed mesenchyme cells leads to thin stress-resistant leaflets. TWIST1 is a basic helix-loop-helix transcription factor expressed in newly formed mesenchyme cells of the AVC and OFT that has been shown to play roles in cell survival, cell proliferation and differentiation. However, the downstream targets of TWIST1 during heart valve formation remain unclear. To identify genes important for heart valve development downstream of TWIST1, we performed global gene expression profiling of AVC, OFT, atria and ventricles of the embryonic day 10.5 mouse heart by tag-sequencing (Tag-seq). Using this resource we identified a novel set of 939 genes, including 123 regulators of transcription, enriched in the valve forming regions of the heart. We compared these genes to a Tag-seq library from the Twist1 null developing valves revealing significant gene expression changes. These changes were consistent with a role of TWIST1 in controlling differentiation of mesenchymal cells following their transformation from endothelium in the mouse. To study the role of TWIST1 at the DNA level we performed chromatin immunoprecipitation and identified novel direct targets of TWIST1 in the developing heart valves. Our findings support a role for TWIST1 in the differentiation of AVC mesenchyme post-EMT in the mouse, and suggest that TWIST1 can exert its function by direct DNA binding to activate valve specific gene expression.
The issue of heterozygosity continues to be a challenge in the analysis of genome sequences. In this article, we describe the use of allele ratios to distinguish biologically significant single-nucleotide variants from background noise. An application of this approach is the identification of lethal mutations in Caenorhabditis elegans essential genes, which must be maintained by the presence of a wild-type allele on a balancer. The h448 allele of let-504 is rescued by the duplication balancer sDp2. We readily identified the extent of the duplication when the percentage of read support for the lesion was between 70 and 80%. Examination of the EMS-induced changes throughout the genome revealed that these mutations exist in contiguous blocks. During early embryonic division in self-fertilizing C. elegans, alkylated guanines pair with thymines. As a result, EMS-induced changes become fixed as either G→A or C→T changes along the length of the chromosome. Thus, examination of the distribution of EMS-induced changes revealed the mutational and recombinational history of the chromosome, even generations later. We identified the mutational change responsible for the h448 mutation and sequenced PCR products for an additional four alleles, correlating let-504 with the DNA-coding region for an ortholog of a NFκB-activating protein, NKAP. Our results confirm that whole-genome sequencing is an efficient and inexpensive way of identifying nucleotide alterations responsible for lethal phenotypes and can be applied on a large scale to identify the molecular basis of essential genes.
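Filtering variant calls by allele ratio, as described above, can be sketched in a few lines. The record layout, the minimum depth, and the function names below are hypothetical; the 70 to 80% window and the G→A / C→T signature of EMS mutagenesis are taken from the abstract, but how the authors actually applied them is not specified, so this is only one plausible reading.

```python
# Canonical EMS-induced changes named in the abstract.
EMS_CHANGES = {("G", "A"), ("C", "T")}


def allele_ratio(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads supporting the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def candidate_variants(variants, low=0.70, high=0.80, min_depth=10):
    """Keep EMS-type variants whose read support falls inside [low, high]."""
    kept = []
    for variant in variants:
        if variant["depth"] < min_depth:
            continue  # too few reads for a stable ratio (hypothetical cut-off)
        ratio = allele_ratio(variant["alt_reads"], variant["depth"])
        is_ems = (variant["ref"], variant["alt"]) in EMS_CHANGES
        if is_ems and low <= ratio <= high:
            kept.append(variant)
    return kept


# Made-up calls for illustration only.
calls = [
    {"pos": 101, "ref": "G", "alt": "A", "alt_reads": 30, "depth": 40},
    {"pos": 202, "ref": "C", "alt": "T", "alt_reads": 3, "depth": 40},
    {"pos": 303, "ref": "A", "alt": "G", "alt_reads": 30, "depth": 40},
    {"pos": 404, "ref": "C", "alt": "T", "alt_reads": 6, "depth": 8},
]

selected = candidate_variants(calls)
print([variant["pos"] for variant in selected])
```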
Motivation: Identification of somatic single nucleotide variants (SNVs) in tumour genomes is a necessary step in defining the mutational landscapes of cancers. Experimental designs for genome-wide ascertainment of somatic mutations now routinely include next-generation sequencing (NGS) of tumour DNA and matched constitutional DNA from the same individual. This allows investigators to control for germline polymorphisms and distinguish somatic mutations that are unique to the tumour, thus reducing the burden of labour-intensive and expensive downstream experiments needed to verify initial predictions. In order to make full use of such paired datasets, computational tools for simultaneous analysis of tumour–normal paired sequence data are required, but are currently under-developed and under-represented in the bioinformatics literature.
Results: In this contribution, we introduce two novel probabilistic graphical models called JointSNVMix1 and JointSNVMix2 for jointly analysing paired tumour–normal digital allelic count data from NGS experiments. In contrast to independent analysis of the tumour and normal data, our method allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework.
Availability: The JointSNVMix models and four other models discussed in the article are part of the JointSNVMix software package available for download at http://compbio.bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
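The benefit of analysing tumour and normal allelic counts jointly, as the JointSNVMix abstract above describes, can be shown with a toy model over the nine joint genotypes (normal genotype paired with tumour genotype). This is not the JointSNVMix1 or JointSNVMix2 model: the variant-allele fractions, the prior weights, and all names below are hypothetical fixed values, whereas a real probabilistic graphical model would typically learn its parameters from data. A position is treated as somatic when the normal is homozygous reference (AA) and the tumour carries the variant (AB or BB).

```python
from itertools import product
from math import comb

GENOTYPES = ("AA", "AB", "BB")
# Hypothetical expected variant-allele fraction for each genotype.
VARIANT_FRACTION = {"AA": 0.01, "AB": 0.5, "BB": 0.99}


def binomial(alt: int, depth: int, fraction: float) -> float:
    """Binomial probability of alt variant reads out of depth reads."""
    return comb(depth, alt) * fraction**alt * (1 - fraction)**(depth - alt)


def joint_posterior(normal_alt, normal_depth, tumour_alt, tumour_depth,
                    matched_weight=1.0, mismatched_weight=0.001):
    """Posterior over the nine (normal, tumour) genotype pairs."""
    scores = {}
    for g_normal, g_tumour in product(GENOTYPES, repeat=2):
        # Hypothetical prior: matching genotypes are far more likely a priori.
        prior = matched_weight if g_normal == g_tumour else mismatched_weight
        scores[(g_normal, g_tumour)] = (
            prior
            * binomial(normal_alt, normal_depth, VARIANT_FRACTION[g_normal])
            * binomial(tumour_alt, tumour_depth, VARIANT_FRACTION[g_tumour])
        )
    total = sum(scores.values())
    return {pair: score / total for pair, score in scores.items()}


def somatic_probability(normal_alt, normal_depth, tumour_alt, tumour_depth):
    """P(normal is AA and tumour is AB or BB) under the toy model."""
    posterior = joint_posterior(normal_alt, normal_depth, tumour_alt, tumour_depth)
    return posterior[("AA", "AB")] + posterior[("AA", "BB")]


# Variant reads only in the tumour: consistent with a somatic event.
print(somatic_probability(0, 30, 12, 40))
# Variant reads in both samples: consistent with a germline polymorphism.
print(somatic_probability(15, 30, 20, 40))
```

Scoring both samples in one posterior lets the normal counts directly discount germline variants, rather than calling each sample independently and intersecting the results afterwards.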
Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof of concept studies published, the field is now moving into the next phase where the importance of method standardization and rigorous quality control are becoming paramount. In this review we will describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We will discuss these in terms of library preparation, sequence platforms and analysis techniques.
epigenomics; next generation sequencing
Motivation: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge.
Results: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth ‘false positive’ predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study.
Availability: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca.
Supplementary information: Supplementary data are available at Bioinformatics online.