Next-generation sequencing is making sequence-based molecular pathology and personalized oncology viable. We selected an individual initially diagnosed with conventional but aggressive prostate adenocarcinoma and sequenced the genome and transcriptome from primary and metastatic tissues collected prior to hormone therapy. The histology-pathology and copy number profiles were remarkably homogeneous, yet it was possible to propose the quadrant of the prostate tumour that likely seeded the metastatic diaspora. Despite a homogeneous cell type, our transcriptome analysis revealed signatures of both luminal and neuroendocrine cell types. Remarkably, the repertoire of expressed but apparently private gene fusions, including C15orf21:MYC, recapitulated this biology. We hypothesize that the amplification and over-expression of the stem cell gene MSI2 may have contributed to the stable hybrid cellular identity. This hybrid luminal-neuroendocrine tumour appears to represent a novel and highly aggressive case of prostate cancer with unique biological features and, conceivably, a propensity for rapid progression to castrate-resistance. Overall, this work highlights the importance of integrated analyses of genome, exome and transcriptome sequences for basic tumour biology, sequence-based molecular pathology and personalized oncology.
RNA sequencing; DNA sequencing; prostate cancer; fusion genes; neuroendocrine; personalized medicine; cancer genetics
Summary: Despite recent progress, computational tools that identify gene fusions from next-generation whole transcriptome sequencing data are often limited in accuracy and scalability. Here, we present a software package, BreakFusion, that combines the strengths of reference alignment followed by read-pair analysis and de novo assembly to achieve a good balance of sensitivity, specificity and computational efficiency.
Supplementary data are available at Bioinformatics online.
Castrate resistant prostate cancer (CRPC) and neuroendocrine carcinoma of the prostate are invariably fatal diseases for which only palliative therapies exist. As part of a prostate tumour sequencing program, a patient tumour was analyzed using Illumina genome sequencing and a matched renal capsule tumour xenograft was generated. Both tumour and xenograft had a homozygous 9p21 deletion spanning the MTAP, CDKN2 and ARF genes. It is rare for this deletion to occur in primary prostate tumours, yet approximately 10% express decreased levels of MTAP mRNA. Decreased MTAP expression is a prognosticator for poor outcome. Moreover, it appears that this deletion is more common in CRPC than in primary prostate cancer. We show for the first time that treatment with methylthioadenosine and high dose 6-thioguanine causes marked inhibition of the growth of a patient-derived neuroendocrine xenograft while protecting the host from 6-thioguanine toxicity. This therapeutic approach can be applied to other MTAP-deficient human cancers since deletion or hypermethylation of the MTAP gene occurs in a broad spectrum of tumours at high frequency. The combination of genome sequencing and patient-derived xenografts can identify candidate therapeutic agents and evaluate them for personalized oncology.
massively parallel sequencing; MTAP; patient-derived xenograft; genitourinary cancers: prostate; animal models of cancer; gene expression profiling; functional genomics; xenograft models
Biallelic mutations of the DNA annealing helicase SMARCAL1 (SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily a-like 1) cause Schimke immuno-osseous dysplasia (SIOD, MIM 242900), an incompletely penetrant autosomal recessive disorder. Using human, Drosophila and mouse models, we show that the proteins encoded by SMARCAL1 orthologs localize to transcriptionally active chromatin and modulate gene expression. We also show that, as found in SIOD patients, deficiency of the SMARCAL1 orthologs alone is insufficient to cause disease in fruit flies and mice, although such deficiency causes modest diffuse alterations in gene expression. Rather, disease manifests when SMARCAL1 deficiency interacts with genetic and environmental factors that further alter gene expression. We conclude that the SMARCAL1 annealing helicase buffers fluctuations in gene expression and that alterations in gene expression contribute to the penetrance of SIOD.
The current paradigm of cancer care relies on predictive nomograms which integrate detailed histopathology with clinical data. However, when predictions fail, the consequences for patients are often catastrophic, especially in prostate cancer where nomograms influence the decision to therapeutically intervene. We hypothesized that the high dimensional data afforded by massively parallel sequencing (MPS) is not only capable of providing biological insights, but may aid molecular pathology of prostate tumours. We assembled a cohort of six patients with high-risk disease, and performed deep RNA and shallow DNA sequencing in primary tumours and matched metastases where available. Our analysis identified copy number abnormalities, accurately profiled gene expression levels, and detected both differential splicing and expressed fusion genes. We revealed occult and potentially dormant metastases, unambiguously supporting the patients’ clinical history, and implicated the REST transcriptional complex in the development of neuroendocrine prostate cancer, validating this finding in a large independent cohort. We massively expand on the number of novel fusion genes described in prostate cancer; provide fresh evidence for the growing link between fusion gene aetiology and gene expression profiles; and show the utility of fusion genes for molecular pathology. Finally, we identified chromothripsis in a patient with chronic prostatitis. Our results provide a strong foundation for further development of MPS-based molecular pathology.
molecular pathology; massively parallel sequencing; neuroendocrine prostate cancer; REST repressor; chromothripsis
CrkRS (Cdc2-related kinase, Arg/Ser), or cyclin-dependent kinase 12 (CDK12), is a serine/threonine kinase believed to coordinate transcription and RNA splicing. While CDK12/CrkRS complexes were known to phosphorylate the C-terminal domain (CTD) of RNA polymerase II (RNA Pol II), the cyclin regulating this activity was not known. Using immunoprecipitation and mass spectrometry, we identified a 65-kDa isoform of cyclin K (cyclin K1) in endogenous CDK12/CrkRS protein complexes. We show that cyclin K1 complexes isolated from mammalian cells contain CDK12/CrkRS but do not contain CDK9, a presumed partner of cyclin K. Analysis of extensive RNA-Seq data shows that the 65-kDa cyclin K1 isoform is the predominantly expressed form across numerous tissue types. We also demonstrate that CDK12/CrkRS is dependent on cyclin K1 for its kinase activity and that small interfering RNA (siRNA) knockdown of CDK12/CrkRS or cyclin K1 has similar effects on the expression of a luciferase reporter gene. Our data suggest that cyclin K1 is the primary cyclin partner for CDK12/CrkRS and that cyclin K1 is required to activate CDK12/CrkRS to phosphorylate the CTD of RNA Pol II. These properties are consistent with a role of CDK12/CrkRS in regulating gene expression through phosphorylation of RNA Pol II.
Neuroblastoma is a childhood extracranial solid tumour that is associated with a number of genetic changes. Included in these genetic alterations are mutations in the kinase domain of the anaplastic lymphoma kinase (ALK) receptor tyrosine kinase (RTK), which have been found in both somatic and familial neuroblastoma. Treating patients accordingly requires characterisation of these mutations in terms of their response to ALK tyrosine kinase inhibitors (TKIs). Here, we report the identification and characterisation of two novel neuroblastoma ALK mutations (A1099T and R1464STOP), which we have investigated together with several previously reported but uncharacterised ALK mutations (T1087I, D1091N, T1151M, M1166R, F1174I and A1234T). In order to understand the potential role of these ALK mutations in neuroblastoma progression, we have employed cell culture-based systems together with the model organism Drosophila as a readout for ligand-independent activity. Mutation of ALK at position 1174 (F1174I) generates a gain-of-function receptor capable of activating intracellular targets such as ERK (extracellular signal regulated kinase) and STAT3 (signal transducer and activator of transcription 3) in a ligand-independent manner. Analysis of these previously uncharacterised ALK mutants and comparison with ALKF1174 mutants suggests that ALK mutations observed in neuroblastoma fall into three classes. These classes are: (i) gain-of-function ligand-independent mutations such as ALKF1174I, (ii) kinase-dead ALK mutants, e.g. ALKI1250T (Schönherr et al., 2011a) and (iii) ALK mutations that are ligand-dependent in nature. Irrespective of the nature of the observed ALK mutants, in every case the activity of the mutant ALK receptors could be abrogated by the ALK inhibitor crizotinib (Xalkori/PF-02341066), albeit with differing levels of sensitivity.
Oligodendroglioma is characterized by unique clinical, pathological, and genetic features. Recurrent losses of chromosomes 1p and 19q are strongly associated with this brain cancer but knowledge of the identity and function of the genes affected by these alterations is limited. We performed exome sequencing on a discovery set of 16 oligodendrogliomas with 1p/19q co-deletion to identify new molecular features at base-pair resolution. As anticipated, there was a high rate of IDH mutations: all cases had mutations in either IDH1 (14/16) or IDH2 (2/16). In addition, we discovered somatic mutations and insertions/deletions in the CIC gene on chromosome 19q13.2 in 13/16 tumours. These discovery set mutations were validated by deep sequencing of 13 additional tumours, which revealed 7 others with CIC mutations, thus bringing the overall mutation rate in oligodendrogliomas in this study to 20/29 (69%). In contrast, deep sequencing of astrocytomas and oligoastrocytomas without 1p/19q loss revealed that CIC alterations were otherwise rare (1/60; 2%). Of the 21 non-synonymous somatic mutations in 20 CIC-mutant oligodendrogliomas, 9 were in exon 5 within an annotated DNA interacting domain and 3 were in exon 20 within an annotated protein interacting domain. The remaining 9 were found in other exons and frequently included truncations. CIC mutations were highly associated with oligodendroglioma histology, 1p/19q co-deletion and IDH1/2 mutation (p<0.001). Although we observed no differences in the clinical outcomes of CIC mutant versus wild-type tumors, in a background of 1p/19q co-deletion, hemizygous CIC mutations are likely important. We hypothesize that the mutant CIC on the single retained 19q allele is linked to the pathogenesis of oligodendrogliomas with IDH mutation. Our detailed study of genetic aberrations in oligodendroglioma suggests a functional interaction between CIC mutation, IDH1/2 mutation and 1p/19q co-deletion.
Glioma; Oligodendroglioma; Next Generation Sequencing; Capicua; IDH1
Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its ~20-megabase genome, which contains ~6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.
Somatic hypermutation (SHM) in the variable region of immunoglobulin genes (IGV) naturally occurs in a narrow window of B cell development to provide high-affinity antibodies. However, SHM can also aberrantly target proto-oncogenes and cause genome instability. The role of aberrant SHM (aSHM) has been widely studied in various non-Hodgkin's lymphomas, particularly in diffuse large B-cell lymphoma (DLBCL). Although it has been speculated that aSHM targets a wide range of genomic loci, so far only twelve genes have been identified as targets of aSHM through the targeted sequencing of selected genes. A genome-wide study aiming to identify a comprehensive set of aSHM targets recurrently occurring in DLBCL has not previously been undertaken. Here, we present a comprehensive assessment of the somatically hypermutated genes in DLBCL identified through an analysis of genomic and transcriptome data derived from 40 DLBCL patients. Our analysis verifies that there are indeed many genes that are recurrently affected by aSHM. In particular, we have identified 32 novel targets that show the same or a higher level of aSHM activity than genes previously reported. Amongst these novel targets, 22 genes showed a significant correlation between mRNA abundance and aSHM.
Aberrant somatic hypermutation; Genome wide study; Diffuse large B-cell lymphoma; Genomic rearrangements
Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of SNVs that overlap copy number alterations. The method is based on modelling the notion that genomic regions of segmental duplication and amplification induce an extended genotype space where a subset of genotypes will exhibit heavily skewed allelic distributions in SNVs (and therefore render them undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models where the number of mixture components for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it is able to discover 21 experimentally revalidated somatic non-synonymous mutations that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain where increased sensitivity was conferred.
Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
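The idea of a copy-number-informed genotype space described in the CoNAn-SNV abstract above can be illustrated with a small sketch. This is not the CoNAn-SNV implementation: it assumes fixed (not learned) Binomial parameters, a uniform prior over genotypes, and a hypothetical sequencing error rate of 0.01. The function names `genotype_posteriors` and `variant_probability` are invented for illustration. For a locus with copy number c, the genotypes are taken to be 0..c variant copies, each with expected variant allele fraction k/c.

```python
from math import comb


def genotype_posteriors(alt, depth, copy_number, error=0.01):
    """Posterior over the number of variant copies (0..copy_number).

    Assumptions (illustrative only): uniform prior, fixed Binomial
    parameters k/copy_number clipped to [error, 1 - error].
    """
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    likelihoods = []
    for k in range(copy_number + 1):
        # Expected variant allele fraction for k variant copies,
        # kept away from 0 and 1 to allow for sequencing error.
        p = min(max(k / copy_number, error), 1 - error)
        likelihoods.append(comb(depth, alt) * p**alt * (1 - p) ** (depth - alt))
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def variant_probability(alt, depth, copy_number, error=0.01):
    """Probability that at least one copy carries the variant."""
    return 1.0 - genotype_posteriors(alt, depth, copy_number, error)[0]


# Hypothetical locus: 2 variant reads out of 20.
p_diploid = variant_probability(2, 20, copy_number=2)
p_amplified = variant_probability(2, 20, copy_number=5)
```

Under these assumptions the same read counts favour the reference genotype when diploidy is assumed, but favour a variant genotype when the locus is annotated with five copies, which is the kind of skewed allelic distribution the abstract refers to.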
Malformations of the cardiovascular system are the most common type of birth defect in humans, frequently affecting the formation of valves and septa. During heart valve and septa formation, cells from the atrio-ventricular canal (AVC) and outflow tract (OFT) regions of the heart undergo an epithelial-to-mesenchymal transformation (EMT) and invade the underlying extracellular matrix to give rise to endocardial cushions. Subsequent maturation of newly formed mesenchyme cells leads to thin stress-resistant leaflets. TWIST1 is a basic helix-loop-helix transcription factor expressed in newly formed mesenchyme cells of the AVC and OFT that has been shown to play roles in cell survival, cell proliferation and differentiation. However, the downstream targets of TWIST1 during heart valve formation remain unclear. To identify genes important for heart valve development downstream of TWIST1, we performed global gene expression profiling of AVC, OFT, atria and ventricles of the embryonic day 10.5 mouse heart by tag-sequencing (Tag-seq). Using this resource we identified a novel set of 939 genes, including 123 regulators of transcription, enriched in the valve forming regions of the heart. We compared these genes to a Tag-seq library from the Twist1 null developing valves revealing significant gene expression changes. These changes were consistent with a role of TWIST1 in controlling differentiation of mesenchymal cells following their transformation from endothelium in the mouse. To study the role of TWIST1 at the DNA level we performed chromatin immunoprecipitation and identified novel direct targets of TWIST1 in the developing heart valves. Our findings support a role for TWIST1 in the differentiation of AVC mesenchyme post-EMT in the mouse, and suggest that TWIST1 can exert its function by direct DNA binding to activate valve specific gene expression.
The issue of heterozygosity continues to be a challenge in the analysis of genome sequences. In this article, we describe the use of allele ratios to distinguish biologically significant single-nucleotide variants from background noise. An application of this approach is the identification of lethal mutations in Caenorhabditis elegans essential genes, which must be maintained by the presence of a wild-type allele on a balancer. The h448 allele of let-504 is rescued by the duplication balancer sDp2. We readily identified the extent of the duplication when the percentage of read support for the lesion was between 70 and 80%. Examination of the EMS-induced changes throughout the genome revealed that these mutations exist in contiguous blocks. During early embryonic division in self-fertilizing C. elegans, alkylated guanines pair with thymines. As a result, EMS-induced changes become fixed as either G→A or C→T changes along the length of the chromosome. Thus, examination of the distribution of EMS-induced changes revealed the mutational and recombinational history of the chromosome, even generations later. We identified the mutational change responsible for the h448 mutation and sequenced PCR products for an additional four alleles, correlating let-504 with the DNA-coding region for an ortholog of a NFκB-activating protein, NKAP. Our results confirm that whole-genome sequencing is an efficient and inexpensive way of identifying nucleotide alterations responsible for lethal phenotypes and can be applied on a large scale to identify the molecular basis of essential genes.
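The allele-ratio approach described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the site records, field names and function names are hypothetical, and the default 70 to 80% band is simply borrowed from the read-support range quoted in the abstract. The check for G→A or C→T changes follows the EMS signature stated in the text.

```python
def variant_fraction(ref_reads, alt_reads):
    """Fraction of reads at a site that support the variant allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / depth


def is_ems_type(ref_base, alt_base):
    """True for the G>A and C>T changes the abstract attributes to EMS."""
    return (ref_base.upper(), alt_base.upper()) in {("G", "A"), ("C", "T")}


def candidate_sites(sites, low=0.7, high=0.8):
    """Keep EMS-type sites whose variant fraction lies in [low, high]."""
    return [
        site
        for site in sites
        if is_ems_type(site["ref"], site["alt"])
        and low <= variant_fraction(site["ref_reads"], site["alt_reads"]) <= high
    ]


# Hypothetical read counts at three sites.
sites = [
    {"pos": 1, "ref": "G", "alt": "A", "ref_reads": 5, "alt_reads": 15},
    {"pos": 2, "ref": "G", "alt": "A", "ref_reads": 10, "alt_reads": 10},
    {"pos": 3, "ref": "A", "alt": "T", "ref_reads": 5, "alt_reads": 15},
]
kept = candidate_sites(sites)
```

Only the first site survives: the second falls outside the allele-ratio band and the third is not an EMS-type change.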
Motivation: Identification of somatic single nucleotide variants (SNVs) in tumour genomes is a necessary step in defining the mutational landscapes of cancers. Experimental designs for genome-wide ascertainment of somatic mutations now routinely include next-generation sequencing (NGS) of tumour DNA and matched constitutional DNA from the same individual. This allows investigators to control for germline polymorphisms and distinguish somatic mutations that are unique to the tumour, thus reducing the burden of labour-intensive and expensive downstream experiments needed to verify initial predictions. In order to make full use of such paired datasets, computational tools for simultaneous analysis of tumour–normal paired sequence data are required, but are currently under-developed and under-represented in the bioinformatics literature.
Results: In this contribution, we introduce two novel probabilistic graphical models called JointSNVMix1 and JointSNVMix2 for jointly analysing paired tumour–normal digital allelic count data from NGS experiments. In contrast to independent analysis of the tumour and normal data, our method allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework.
Availability: The JointSNVMix models and four other models discussed in the article are part of the JointSNVMix software package available for download at http://compbio.bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof of concept studies published, the field is now moving into the next phase, where method standardization and rigorous quality control are becoming paramount. In this review we will describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We will discuss these in terms of library preparation, sequence platforms and analysis techniques.
epigenomics; next generation sequencing
Motivation: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge.
Results: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth ‘false positive’ predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study.
Availability: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca.
Supplementary information: Supplementary data are available at Bioinformatics online.
The diagnosis of medulloblastoma likely encompasses several distinct entities, with recent evidence for the existence of at least four unique molecular subgroups that exhibit distinct genetic, transcriptional, demographic, and clinical features. Assignment of molecular subgroup through routine profiling of high-quality RNA on expression microarrays is likely impractical in the clinical setting. The planning and execution of medulloblastoma clinical trials that stratify by subgroup, or which are targeted to a specific subgroup requires technologies that can be economically, rapidly, reliably, and reproducibly applied to formalin-fixed paraffin embedded (FFPE) specimens. In the current study, we have developed an assay that accurately measures the expression level of 22 medulloblastoma subgroup-specific signature genes (CodeSet) using nanoString nCounter Technology. Comparison of the nanoString assay with Affymetrix expression array data on a training series of 101 medulloblastomas of known subgroup demonstrated a high concordance (Pearson correlation r = 0.86). The assay was validated on a second set of 130 non-overlapping medulloblastomas of known subgroup, correctly assigning 98% (127/130) of tumors to the appropriate subgroup. Reproducibility was demonstrated by repeating the assay in three independent laboratories in Canada, the United States, and Switzerland. Finally, the nanoString assay could confidently predict subgroup in 88% of recent FFPE cases, of which 100% had accurate subgroup assignment. We present an assay based on nanoString technology that is capable of rapidly, reliably, and reproducibly assigning clinical FFPE medulloblastoma samples to their molecular subgroup, and which is highly suited for future medulloblastoma clinical trials.
Electronic supplementary material
The online version of this article (doi:10.1007/s00401-011-0899-7) contains supplementary material, which is available to authorized users.
Medulloblastoma; Molecular classification; Clinical trials; NanoString
Skin-derived precursors (SKPs) are multipotent dermal stem cells that reside within a hair follicle niche and that share properties with embryonic neural crest precursors. Here, we have asked whether SKPs and their endogenous dermal precursors originate from the neural crest or whether, like the dermis itself, they originate from multiple developmental origins. To do this, we used two different mouse Cre lines that allow us to perform lineage tracing: Wnt1-cre, which targets cells deriving from the neural crest, and Myf5-cre, which targets cells of a somite origin. By crossing these Cre lines to reporter mice, we show that the endogenous follicle-associated dermal precursors in the face derive from the neural crest, and those in the dorsal trunk derive from the somites, as do the SKPs they generate. Despite these different developmental origins, SKPs from these two locations are functionally similar, even with regard to their ability to differentiate into Schwann cells, a cell type only thought to be generated from the neural crest. Analysis of global gene expression using microarrays confirmed that facial and dorsal SKPs exhibit a very high degree of similarity, and that they are also very similar to SKPs derived from ventral dermis, which has a lateral plate origin. However, these developmentally distinct SKPs also retain differential expression of a small number of genes that reflect their developmental origins. Thus, an adult neural crest-like dermal precursor can be generated from a non-neural crest origin, a finding with broad implications for the many neuroendocrine cells in the body.
Dermis; Stem cells; Lineage tracing; Neural crest; Somites
Small non-coding RNAs, such as microRNAs (miRNAs), are involved in diverse biological processes including organ development and tissue differentiation. Global disruption of miRNA biogenesis in Dicer knockout mice disrupts early embryogenesis and primordial germ cell formation. However, the role of miRNAs in early folliculogenesis is poorly understood. In order to identify a full transcriptome set of small RNAs expressed in the newborn (NB) ovary, we extracted the small RNA fraction from mouse NB ovary tissues and subjected it to massively parallel sequencing using the Genome Analyzer from Illumina. Sequencing produced 4 655 992 reads of 33 bp each, representing a total of 154 Mbp of sequence data. The Pash alignment algorithm mapped 50.13% of the reads to the mouse genome. Sequence reads were clustered based on overlapping mapping coordinates and intersected with known miRNAs, small nucleolar RNAs (snoRNAs), piwi-interacting RNA (piRNA) clusters and repetitive genomic regions; 25.2% of the reads mapped to known miRNAs, 25.5% to genomic repeats, 3.5% to piRNAs and 0.18% to snoRNAs. Three hundred and ninety-eight known miRNA species were among the sequenced small RNAs, along with 118 isomiR sequences that are not in the miRBase database. The let-7 family was the most abundantly expressed miRNA family, and the mmu-mir-672, mmu-mir-322, mmu-mir-503 and mmu-mir-465 families were the most abundant X-linked miRNAs detected. The X-linked mmu-mir-503, mmu-mir-672 and mmu-mir-465 families showed preferential expression in testes and ovaries. We also identified four novel miRNAs that are preferentially expressed in gonads. Gonadal selective miRNAs may play important roles in ovarian development, folliculogenesis and female fertility.
miRNA; ovary; oocyte; microRNA; ncRNA
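The clustering step mentioned in the small RNA abstract above (grouping reads by overlapping mapping coordinates) can be sketched as a sort-and-merge pass. This is an illustrative sketch only, not the authors' code: the function name `cluster_reads`, the tuple layout and the example coordinates are assumptions, and intervals are treated as half-open (start inclusive, end exclusive).

```python
def cluster_reads(alignments):
    """Merge alignments whose coordinates overlap on the same chromosome.

    alignments: iterable of (chrom, start, end) tuples, half-open intervals.
    Returns a list of clusters with their span and supporting read count.
    """
    clusters = []
    for chrom, start, end in sorted(alignments):
        last = clusters[-1] if clusters else None
        if last is not None and last["chrom"] == chrom and start < last["end"]:
            # The read overlaps the current cluster: extend it.
            last["end"] = max(last["end"], end)
            last["count"] += 1
        else:
            # No overlap (or new chromosome): start a new cluster.
            clusters.append({"chrom": chrom, "start": start, "end": end, "count": 1})
    return clusters


# Hypothetical 33 bp alignments.
reads = [
    ("chr1", 100, 133),
    ("chr1", 120, 153),
    ("chr1", 500, 533),
    ("chrX", 100, 133),
]
clusters = cluster_reads(reads)
```

Each resulting cluster could then be intersected with annotation tracks such as known miRNAs or repeats; that step is omitted here.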
Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and have trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
Genome rearrangements and associated gene fusions are known to be important oncogenic events in some cancers. We have developed a novel computational method called deFuse for detecting gene fusions in RNA-Seq data and have applied it to the discovery of novel gene fusions in sarcoma and ovarian tumors. We assessed the accuracy of our method and found that deFuse produces substantially better sensitivity and specificity than two other published methods. We have also developed a set of 60 positive and 61 negative examples that will be useful for accurate identification of gene fusions in future RNA-Seq datasets. We have trained a classifier on 11 novel features of the 121 examples, and show that the classifier is able to accurately identify real gene fusions. The 45 gene fusions reported in this study represent the first ovarian cancer fusions reported, as well as novel sarcoma fusions. By examining the expression patterns of the affected genes, we find that many fusions are predicted to have functional consequences and thus merit experimental followup to determine their clinical relevance.
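The deFuse abstract above reports classifier accuracy as the area under the ROC curve. As a minimal sketch of what that number measures, the area can be computed directly from classifier scores as the fraction of (positive, negative) pairs in which the positive example scores higher, with ties counted as half. The function name `roc_auc` and the scores below are hypothetical and are not taken from the deFuse study.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the probability that a randomly chosen positive
    example scores higher than a randomly chosen negative one.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for validated and rejected fusion calls.
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The pairwise form is quadratic in the number of examples, which is acceptable for a training set on the scale of the 121 examples described above; a rank-based formulation would be preferable for larger sets.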
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
DNA methylation; Sequencing; Bisulfite
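As an illustration of what concordance of binary methylation calls means, the following minimal Python sketch computes the fraction of shared regions on which two methods agree. The 0.5 threshold, the region names and the toy methylation levels are assumptions made for this example and are not taken from the study.

```python
def binary_call(level, threshold=0.5):
    """Return True (methylated) when the level meets the threshold.

    The 0.5 threshold is an illustrative assumption.
    """
    return level >= threshold


def concordance(calls_a, calls_b):
    """Fraction of regions assessed by both methods whose binary calls agree."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no regions assessed by both methods")
    agree = sum(calls_a[region] == calls_b[region] for region in shared)
    return agree / len(shared)


# Hypothetical methylation levels (0 to 1) per region for two methods.
method_a = {"region1": 0.9, "region2": 0.1, "region3": 0.7, "region4": 0.2}
method_b = {"region1": 0.8, "region2": 0.3, "region3": 0.4, "region5": 0.6}

calls_a = {region: binary_call(level) for region, level in method_a.items()}
calls_b = {region: binary_call(level) for region, level in method_b.items()}

# Only region1, region2 and region3 are covered by both methods.
result = concordance(calls_a, calls_b)
```

Restricting the comparison to regions covered by both methods matters: a region that only one method assesses says nothing about agreement.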
Clinical laboratories are adopting array genomic hybridization as a standard clinical test. A number of whole genome array genomic hybridization platforms are available, but little is known about their comparative performance in a clinical context.
We studied 30 children with idiopathic mental retardation (MR) and both unaffected parents of each child using Affymetrix 500 K GeneChip SNP arrays, Agilent Human Genome 244 K oligonucleotide arrays and NimbleGen 385 K Whole-Genome oligonucleotide arrays. We also determined whether CNVs called on these platforms were detected by Illumina Hap550 beadchips or SMRT 32 K BAC whole genome tiling arrays and tested 15 of the 30 trios on Affymetrix 6.0 SNP arrays.
The Affymetrix 500 K, Agilent and NimbleGen platforms identified 3061 autosomal and 117 X chromosomal CNVs in the 30 trios. 147 of these CNVs appeared to be de novo, but only 34 (22%) were found on more than one platform. Performing genotype-phenotype correlations, we identified 7 most likely pathogenic and 2 possibly pathogenic CNVs for MR. All 9 of these putatively pathogenic CNVs were detected by the Affymetrix 500 K, Agilent, NimbleGen and the Illumina arrays, and 5 were found by the SMRT BAC array. Both putatively pathogenic CNVs identified in the 15 trios tested with the Affymetrix 6.0 were identified by this platform.
Our findings demonstrate that different results are obtained with different platforms and illustrate the trade-off that exists between sensitivity and specificity. The large number of apparently false positive CNV calls on each of the platforms supports the need for validating clinically important findings with a different technology.
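To make the idea of a CNV being "found on more than one platform" concrete, the sketch below counts the platforms that support a call. It assumes a 50% reciprocal-overlap rule for matching calls; this rule, the platform labels and the coordinates are illustrative assumptions, not the matching criteria used in the study.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for calls a and b.

    Each call is (chrom, start, end) with half-open coordinates.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def platforms_supporting(cnv, calls_by_platform, min_overlap=0.5):
    """Set of platforms with at least one call matching cnv.

    min_overlap=0.5 is an assumed threshold for this sketch.
    """
    return {
        platform
        for platform, calls in calls_by_platform.items()
        if any(reciprocal_overlap(cnv, call) >= min_overlap for call in calls)
    }


# Hypothetical calls from three platforms.
calls_by_platform = {
    "platform_A": [("chr1", 1000, 2000), ("chr2", 500, 900)],
    "platform_B": [("chr1", 1100, 2100)],
    "platform_C": [("chr3", 100, 400)],
}

support = platforms_supporting(("chr1", 1000, 2000), calls_by_platform)
single = platforms_supporting(("chr2", 500, 900), calls_by_platform)
```

Under these assumptions the chr1 call is supported by two platforms, while the chr2 call is seen on one platform only and would be a candidate for validation with a different technology.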
The H3K9me3 histone modification is often found at promoter regions, where it functions to repress transcription. However, we have previously shown that 3′ exons of zinc finger genes (ZNFs) are marked by high levels of H3K9me3. We have now further investigated this unusual location for H3K9me3 in ZNF genes. Neither bioinformatic nor experimental approaches support the hypothesis that the 3′ exons of ZNFs are promoters. We further characterized the histone modifications at the 3′ ZNF exons and found that these regions also contain H3K36me3, a mark of transcriptional elongation. A genome-wide analysis of ChIP-seq data revealed that ZNFs constitute the majority of genes that have high levels of both H3K9me3 and H3K36me3. These results suggested the possibility that the ZNF genes may be imprinted, with one allele transcribed and one allele repressed. To test the hypothesis that the contradictory modifications are due to imprinting, we used a SNP analysis of RNA-seq data to demonstrate that both alleles of certain ZNF genes having H3K9me3 and H3K36me3 are transcribed. We next analyzed isolated ZNF 3′ exons using stably integrated episomes. We found that although the H3K36me3 mark was lost when the 3′ ZNF exon was removed from its natural genomic location, the isolated ZNF 3′ exons retained the H3K9me3 mark. Thus, the H3K9me3 mark at ZNF 3′ exons does not impede transcription and it is regulated independently of the H3K36me3 mark. Finally, we demonstrate a strong relationship between the number of tandemly repeated domains in the 3′ exons and the H3K9me3 mark. We suggest that the H3K9me3 at ZNF 3′ exons may function to protect the genome from inappropriate recombination rather than to regulate transcription.
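The SNP-based test for transcription of both alleles can be sketched as follows, assuming RNA-seq read counts for the reference and alternate allele are available at a heterozygous SNP. The function names, the minimum depth of 20 reads and the minimum minor-allele fraction of 0.1 are hypothetical choices for illustration, not the thresholds used in the study.

```python
def minor_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads carrying the less-represented allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return min(ref_reads, alt_reads) / total


def is_biallelic(ref_reads, alt_reads, min_fraction=0.1, min_depth=20):
    """True when both alleles appear to be transcribed.

    Requires enough coverage (min_depth) and a minor-allele fraction of
    at least min_fraction; both defaults are assumptions for this sketch.
    """
    total = ref_reads + alt_reads
    if total < min_depth:
        return False
    return minor_allele_fraction(ref_reads, alt_reads) >= min_fraction


# Hypothetical read counts at three heterozygous SNPs.
both_alleles = is_biallelic(30, 25)   # balanced expression
one_allele = is_biallelic(50, 0)      # consistent with monoallelic expression
low_depth = is_biallelic(3, 2)        # too few reads to decide
```

A gene whose SNPs pass this check is transcribed from both alleles, which argues against imprinting as the explanation for carrying both H3K9me3 and H3K36me3.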