PMCC

1.  Conserved Role of Intragenic DNA Methylation in Regulating Alternative Promoters 
Nature  2010;466(7303):253-257.
While the methylation of DNA in 5′ promoters suppresses gene expression, the role of DNA methylation in gene bodies is unclear [1–5]. In mammals, tissue- and cell type-specific methylation is present in a small percentage of 5′ CpG island (CGI) promoters, while a far greater proportion occurs across gene bodies, coinciding with highly conserved sequences [5–10]. Tissue-specific intragenic methylation might reduce [3] or, paradoxically, enhance transcription elongation efficiency [1,2,4,5]. Cap analysis of gene expression (CAGE) experiments also indicate that transcription commonly initiates within and between genes [11–15]. To investigate the role of intragenic methylation, we generated a map of DNA methylation from human brain encompassing 24.7 million of the 28 million CpG sites. From the dense, high-resolution coverage of CpG islands, the majority of methylated CpG islands were revealed to be in intragenic and intergenic regions, while less than 3% of CpG islands in 5′ promoters were methylated. The CpG islands in all three locations overlapped with RNA markers of transcription initiation, and unmethylated CpG islands also overlapped significantly with trimethylation of H3K4, a histone modification enriched at promoters [16]. The general and CpG-island-specific patterns of methylation are conserved in mouse tissues. An in-depth investigation of the human SHANK3 locus [17,18] and its mouse homologue demonstrated that this tissue-specific DNA methylation regulates intragenic promoter activity in vitro and in vivo. These methylation-regulated, alternative transcripts are expressed in a tissue- and cell type-specific manner, and are expressed differentially within a single cell type from distinct brain regions. These results support a major role for intragenic methylation in regulating cell context-specific alternative promoters in gene bodies.
doi:10.1038/nature09165
PMCID: PMC3998662  PMID: 20613842
Intragenic DNA methylation; alternate promoters; comparative epigenomics; SHANK3
2.  Mutational Analysis Reveals the Origin and Therapy-driven Evolution of Recurrent Glioma 
Science (New York, N.Y.)  2013;343(6167):189-193.
Tumor recurrence is a leading cause of cancer mortality. Therapies for recurrent disease may fail, at least in part, because the genomic alterations driving the growth of recurrences are distinct from those in the initial tumor. To explore this hypothesis, we sequenced the exomes of 23 initial low-grade gliomas and recurrent tumors resected from the same patients. In 43% of cases, at least half of the mutations in the initial tumor were undetected at recurrence, including driver mutations in TP53, ATRX, SMARCA4, and BRAF, suggesting recurrent tumors are often seeded by cells derived from the initial tumor at a very early stage of their evolution. Notably, tumors from 6 of 10 patients treated with the chemotherapeutic drug temozolomide (TMZ) followed an alternative evolutionary path to high-grade glioma. At recurrence, these tumors were hypermutated and harbored driver mutations in the RB and AKT-mTOR pathways that bore the signature of TMZ-induced mutagenesis.
doi:10.1126/science.1239947
PMCID: PMC3998672  PMID: 24336570
3.  Cell of origin in AML: Susceptibility to MN1-induced transformation is regulated by the MEIS1/AbdB-like HOX protein complex 
Cancer cell  2011;20(1):39-52.
Summary
Pathways defining susceptibility of normal cells to oncogenic transformation may be valuable therapeutic targets. We characterized the cell of origin and its critical pathways in MN1-induced leukemias. Common myeloid progenitors (CMPs), but not granulocyte-macrophage progenitors (GMPs), could be transformed by MN1. Complementation studies of CMP-signature genes in GMPs demonstrated that MN1-leukemogenicity required the MEIS1/AbdB-like HOX-protein complex. ChIP-sequencing identified common target genes of MN1 and MEIS1, and demonstrated identical binding sites for a large proportion of their chromatin targets. Transcriptional repression of MEIS1 targets in established MN1 leukemias demonstrated antileukemic activity. As MN1 relies on but cannot activate expression of MEIS1/AbdB-like HOX proteins, transcriptional activity of these genes determines cellular susceptibility to MN1-induced transformation, and may represent a promising therapeutic target.
doi:10.1016/j.ccr.2011.06.020
PMCID: PMC3951989  PMID: 21741595 CAMSID: cams3759
4.  Use of mutation profiles to refine the classification of endometrial carcinomas 
The Journal of pathology  2012;228(1):20-30.
The classification of endometrial carcinomas is based on pathological assessment of tumour cell type; the different cell types (endometrioid, serous, carcinosarcoma, mixed, and clear cell) are associated with distinct molecular alterations. This current classification system for high-grade subtypes, in particular the distinction between high-grade endometrioid (EEC-3) and serous carcinomas (ESC), is limited in its reproducibility and prognostic abilities. Therefore, a search for specific molecular classifiers to improve endometrial carcinoma subclassification is warranted. We performed target enrichment sequencing on 393 endometrial carcinomas from two large cohorts, sequencing exons from the following nine genes: ARID1A, PPP2R1A, PTEN, PIK3CA, KRAS, CTNNB1, TP53, BRAF and PPP2R5C. Based on this gene panel, each endometrial carcinoma subtype shows a distinct mutation profile. EEC-3s have significantly different frequencies of PTEN and TP53 mutations when compared to low-grade endometrioid carcinomas. ESCs and EEC-3s are distinct subtypes with significantly different frequencies of mutations in PTEN, ARID1A, PPP2R1A, TP53, and CTNNB1. From the mutation profiles we were able to identify subtype outliers, i.e. cases diagnosed morphologically as one subtype but with a mutation profile suggestive of a different subtype. Careful review of these diagnostically challenging cases suggested that the original morphological classification was incorrect in most instances. The molecular profile of carcinosarcomas suggests two distinct mutation profiles for these tumours: endometrioid-type (PTEN, PIK3CA, ARID1A, KRAS mutations) and serous-type (TP53 and PPP2R1A mutations). While this nine-gene panel does not allow for a purely molecularly based classification of endometrial carcinoma, it may prove useful as an adjunct to morphological classification and serve as an aid in the classification of problematic cases. If used in practice, it may lead to improved diagnostic reproducibility and may also serve to stratify patients for targeted therapeutics.
doi:10.1002/path.4056
PMCID: PMC3939694  PMID: 22653804
Endometrial carcinoma; uterine; mutation profiles; endometrioid; serous; carcinosarcoma; classification
5.  Draft Genome Sequences of Six Rhodobacter capsulatus Strains, YW1, YW2, B6, Y262, R121, and DE442 
Genome Announcements  2014;2(1):e00050-14.
Rhodobacter capsulatus is a model organism for studying a novel type of horizontal gene transfer mediated by a phage-like gene transfer agent (RcGTA). Here we report the draft genome sequences of six R. capsulatus strains that exhibit different RcGTA properties, including RcGTA overproducers, RcGTA nonproducers, and/or RcGTA nonreceivers.
doi:10.1128/genomeA.00050-14
PMCID: PMC3924369  PMID: 24526637
6.  Vitamin C induces Tet-dependent DNA demethylation in ESCs to promote a blastocyst-like state 
Nature  2013;500(7461):222-226.
DNA methylation is a heritable epigenetic modification involved in gene silencing, imprinting, and the suppression of retrotransposons [1]. Global DNA demethylation occurs in the early embryo and the germline [2,3] and may be mediated by Tet (ten-eleven-translocation) enzymes [4–6], which convert 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC) [7]. Tet enzymes have been extensively studied in mouse embryonic stem cells (ESCs) [8–12], which are generally cultured in the absence of Vitamin C (VitC), a potential co-factor for Fe(II) 2-oxoglutarate dioxygenase enzymes like Tets. Here we report that addition of VitC to ESCs promotes Tet activity leading to a rapid and global increase in hmC. This is followed by DNA demethylation of numerous gene promoters and up-regulation of demethylated germline genes. Tet1 binding is enriched near the transcription start site (TSS) of genes affected by VitC treatment. Importantly, VitC, but not other antioxidants, enhances the activity of recombinant Tet1 in a biochemical assay and the VitC-induced changes in hmC and mC are entirely suppressed in Tet1/2 double knockout (Tet DKO) ESCs. VitC has the strongest effects on regions that gain methylation in cultured ESCs compared to blastocysts and in vivo are methylated only after implantation. In contrast, imprinted regions and intracisternal A-particle (IAP) retroelements, which are resistant to demethylation in the early embryo [2,13], are resistant to VitC-induced DNA demethylation. Collectively, this study establishes VitC as a direct regulator of Tet activity and DNA methylation fidelity in ESCs.
doi:10.1038/nature12362
PMCID: PMC3893718  PMID: 23812591
Vitamin C; DNA methylation; hydroxymethylcytosine; Tet enzymes; epigenetics; embryonic stem cells; germ cells
7.  DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape 
Nature genetics  2013;45(7):10.1038/ng.2649.
Introduction
Transposable element (TE) derived sequences comprise half of our genome and DNA methylome, and are presumed densely methylated and inactive. Examination of the genome-wide DNA methylation status within 928 TE subfamilies in human embryonic and adult tissues revealed unexpected tissue-specific and subfamily-specific hypomethylation signatures. Genes proximal to tissue-specific hypomethylated TE sequences were enriched for functions important for the tissue type and their expression correlated strongly with hypomethylation of the TEs. When hypomethylated, these TE sequences gained tissue-specific enhancer marks including H3K4me1 and occupancy by p300, and a majority exhibited enhancer activity in reporter gene assays. Many such TEs also harbored binding sites for transcription factors that are important for tissue-specific functions and exhibited evidence for evolutionary selection. These data suggest that sequences derived from TEs may be responsible for wiring tissue type-specific regulatory networks, and have acquired tissue-specific epigenetic regulation.
doi:10.1038/ng.2649
PMCID: PMC3695047  PMID: 23708189
8.  DNA methylation and SETDB1/H3K9me3 regulate predominantly distinct sets of genes, retroelements and chimaeric transcripts in mouse ES cells 
Cell stem cell  2011;8(6):10.1016/j.stem.2011.04.004.
Summary
DNA methylation and histone H3 lysine 9 trimethylation (H3K9me3) play important roles in silencing of genes and retroelements. However, a comprehensive comparison of genes and repetitive elements repressed by these pathways has not been reported. Here we show that in mouse embryonic stem cells (mESCs), the genes up-regulated following deletion of the H3K9 methyltransferase Setdb1 are distinct from those de-repressed in mESCs deficient in the DNA methyltransferases Dnmt1, Dnmt3a and Dnmt3b, with the exception of a small number of primarily germline-specific genes. Numerous endogenous retroviruses (ERVs) lose H3K9me3 and are concomitantly de-repressed exclusively in SETDB1 knockout mESCs. Strikingly, ~15% of up-regulated genes are induced in association with de-repression of promoter proximal ERVs, half in the context of “chimaeric” transcripts that initiate within these retroelements and splice to genic exons. Thus, SETDB1 plays a previously unappreciated yet critical role in inhibiting aberrant gene transcription by suppressing the expression of proximal ERVs.
doi:10.1016/j.stem.2011.04.004
PMCID: PMC3857791  PMID: 21624812 CAMSID: cams3765
9.  Integrated genome and transcriptome sequencing identifies a novel form of hybrid and aggressive prostate cancer 
The Journal of pathology  2012;227(1):53-61.
Next-generation sequencing is making sequence-based molecular pathology and personalized oncology viable. We selected an individual initially diagnosed with conventional but aggressive prostate adenocarcinoma and sequenced the genome and transcriptome from primary and metastatic tissues collected prior to hormone therapy. The histology-pathology and copy number profiles were remarkably homogeneous, yet it was possible to propose the quadrant of the prostate tumour that likely seeded the metastatic diaspora. Despite a homogeneous cell type, our transcriptome analysis revealed signatures of both luminal and neuroendocrine cell types. Remarkably, the repertoire of expressed but apparently private gene fusions, including C15orf21:MYC, recapitulated this biology. We hypothesize that the amplification and over-expression of the stem cell gene MSI2 may have contributed to the stable hybrid cellular identity. This hybrid luminal-neuroendocrine tumour appears to represent a novel and highly aggressive case of prostate cancer with unique biological features and, conceivably, a propensity for rapid progression to castrate-resistance. Overall, this work highlights the importance of integrated analyses of genome, exome and transcriptome sequences for basic tumour biology, sequence-based molecular pathology and personalized oncology.
doi:10.1002/path.3987
PMCID: PMC3768138  PMID: 22294438
RNA sequencing; DNA sequencing; prostate cancer; fusion genes; neuroendocrine; personalized medicine; cancer genetics
10.  The genetic landscape of high-risk neuroblastoma 
Nature genetics  2013;45(3):279-284.
Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50% [1]. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 cases using a combination of whole exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per megabase (0.48 non-silent), and remarkably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, an additional 7.1% had focal deletions), MYCN (1.7%, a recurrent p.Pro44Leu alteration), and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1, and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies reliant upon frequently altered oncogenic drivers.
doi:10.1038/ng.2529
PMCID: PMC3682833  PMID: 23334666
11.  Genetic alterations activating kinase and cytokine receptor signaling in high-risk acute lymphoblastic leukemia 
Cancer cell  2012;22(2):153-166.
SUMMARY
Genomic profiling has identified a subtype of high-risk B-progenitor acute lymphoblastic leukemia (B-ALL) with alteration of IKZF1, a gene expression profile similar to BCR-ABL1-positive ALL and poor outcome (Ph-like ALL). The genetic alterations that activate kinase signaling in Ph-like ALL are poorly understood. We performed transcriptome and whole genome sequencing on 15 cases of Ph-like ALL, and identified rearrangements involving ABL1, JAK2, PDGFRB, CRLF2 and EPOR, activating mutations of IL7R and FLT3, and deletion of SH2B3, which encodes the JAK2 negative regulator LNK. Importantly, several of these alterations induce transformation that is attenuated with tyrosine kinase inhibitors, suggesting the treatment outcome of these patients may be improved with targeted therapy.
doi:10.1016/j.ccr.2012.06.005
PMCID: PMC3422513  PMID: 22897847
12.  Next Generation Sequencing of Prostate Cancer from a Patient Identifies a Deficiency of Methylthioadenosine Phosphorylase (MTAP), an Exploitable Tumor Target 
Molecular cancer therapeutics  2012;11(3):775-783.
Castrate-resistant prostate cancer (CRPC) and neuroendocrine carcinoma of the prostate are invariably fatal diseases for which only palliative therapies exist. As part of a prostate tumour sequencing program, a patient tumour was analyzed using Illumina genome sequencing and a matched renal capsule tumour xenograft was generated. Both tumour and xenograft had a homozygous 9p21 deletion spanning the MTAP, CDKN2 and ARF genes. It is rare for this deletion to occur in primary prostate tumours, yet approximately 10% express decreased levels of MTAP mRNA. Decreased MTAP expression is a prognosticator for poor outcome. Moreover, it appears that this deletion is more common in CRPC than in primary prostate cancer. We show for the first time that treatment with methylthioadenosine and high-dose 6-thioguanine causes marked inhibition of the growth of a patient-derived neuroendocrine xenograft while protecting the host from 6-thioguanine toxicity. This therapeutic approach can be applied to other MTAP-deficient human cancers since deletion or hypermethylation of the MTAP gene occurs in a broad spectrum of tumours at high frequency. The combination of genome sequencing and patient-derived xenografts can identify candidate therapeutic agents and evaluate them for personalized oncology.
doi:10.1158/1535-7163.MCT-11-0826
PMCID: PMC3691697  PMID: 22252602
massively parallel sequencing; MTAP; patient-derived xenograft; genitourinary cancers: prostate; animal models of cancer; gene expression profiling; functional genomics; xenograft models
13.  DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution 
Nature methods  2012;9(11):1107-1112.
DNA rearrangements such as sister chromatid exchanges (SCEs) are sensitive indicators of genomic stress and instability, but they are typically masked by single-cell sequencing techniques. We developed Strand-seq to independently sequence parental DNA template strands from single cells, making it possible to map SCEs at orders-of-magnitude greater resolution than was previously possible. On average, murine embryonic stem (mES) cells exhibit eight SCEs, which are detected at a resolution of up to 23 bp. Strikingly, Strand-seq of 62 single mES cells predicts that the mm9 mouse reference genome assembly contains at least 17 incorrectly oriented segments totaling nearly 1% of the genome. These misoriented contigs and fragments have persisted through several iterations of the mouse reference genome and have been difficult to detect using conventional sequencing techniques. The ability to map SCE events at high resolution and fine-tune reference genomes by Strand-seq dramatically expands the scope of single-cell sequencing.
doi:10.1038/nmeth.2206
PMCID: PMC3580294  PMID: 23042453
16.  Concurrent CIC mutations, IDH mutations and 1p/19q loss distinguish oligodendrogliomas from other cancers 
The Journal of pathology  2011;226(1):7-16.
Oligodendroglioma is characterized by unique clinical, pathological, and genetic features. Recurrent losses of chromosomes 1p and 19q are strongly associated with this brain cancer but knowledge of the identity and function of the genes affected by these alterations is limited. We performed exome sequencing on a discovery set of 16 oligodendrogliomas with 1p/19q co-deletion to identify new molecular features at base-pair resolution. As anticipated, there was a high rate of IDH mutations: all cases had mutations in either IDH1 (14/16) or IDH2 (2/16). In addition, we discovered somatic mutations and insertions/deletions in the CIC gene on chromosome 19q13.2 in 13/16 tumours. These discovery set mutations were validated by deep sequencing of 13 additional tumours, which revealed 7 others with CIC mutations, thus bringing the overall mutation rate in oligodendrogliomas in this study to 20/29 (69%). In contrast, deep sequencing of astrocytomas and oligoastrocytomas without 1p/19q loss revealed that CIC alterations were otherwise rare (1/60; 2%). Of the 21 non-synonymous somatic mutations in 20 CIC-mutant oligodendrogliomas, 9 were in exon 5 within an annotated DNA interacting domain and 3 were in exon 20 within an annotated protein interacting domain. The remaining 9 were found in other exons and frequently included truncations. CIC mutations were highly associated with oligodendroglioma histology, 1p/19q co-deletion and IDH1/2 mutation (p<0.001). Although we observed no differences in the clinical outcomes of CIC mutant versus wild-type tumors, in a background of 1p/19q co-deletion, hemizygous CIC mutations are likely important. We hypothesize that the mutant CIC on the single retained 19q allele is linked to the pathogenesis of oligodendrogliomas with IDH mutation. Our detailed study of genetic aberrations in oligodendroglioma suggests a functional interaction between CIC mutation, IDH1/2 mutation and 1p/19q co-deletion.
doi:10.1002/path.2995
PMCID: PMC3246739  PMID: 22072542
Glioma; Oligodendroglioma; Next Generation Sequencing; Capicua; IDH1
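The mutation rates quoted in the abstract above can be re-derived from the counts it reports. The short check below is only a sketch of that arithmetic; the variable names are illustrative, and the counts are taken directly from the abstract (13/16 in the discovery set, 7/13 in the validation set, 1/60 in tumours without 1p/19q loss).

```python
from fractions import Fraction

# Counts as reported in the abstract; variable names are illustrative only.
discovery_mutant, discovery_total = 13, 16
validation_mutant, validation_total = 7, 13

# Pooled CIC mutation rate across the discovery and validation sets.
overall = Fraction(discovery_mutant + validation_mutant,
                   discovery_total + validation_total)

# CIC alteration rate in astrocytomas/oligoastrocytomas without 1p/19q loss.
control = Fraction(1, 60)


def percent(fraction):
    """Return a fraction as a percentage rounded to the nearest integer."""
    return round(float(fraction) * 100)


if __name__ == "__main__":
    print(overall, percent(overall))   # 20/29 and 69
    print(control, percent(control))   # 1/60 and 2
```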
17.  Mutation Discovery in Regions of Segmental Cancer Genome Amplifications with CoNAn-SNV: A Mixture Model for Next Generation Sequencing of Tumors 
PLoS ONE  2012;7(8):e41551.
Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of single nucleotide variants (SNVs) that overlap copy number alterations. The method is based on modelling the notion that genomic regions of segmental duplication and amplification induce an extended genotype space where a subset of genotypes will exhibit heavily skewed allelic distributions in SNVs (and therefore render them undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models where the number of mixtures for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it is able to discover 21 experimentally revalidated somatic non-synonymous mutations that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain where increased sensitivity was conferred. Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
doi:10.1371/journal.pone.0041551
PMCID: PMC3420914  PMID: 22916110
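The idea of a genotype space that grows with copy number can be illustrated with a small sketch. This is not the CoNAn-SNV implementation: the uniform prior, the fixed error rate, and all function names are assumptions made for illustration, and the published method fits its parameters rather than fixing them. The sketch only shows why a skewed allelic ratio that looks ambiguous under a diploid model can be well explained once an amplified state is allowed.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def expected_variant_fraction(variant_copies, copy_number, error=0.01):
    """Expected fraction of variant reads for a genotype.

    The genotype is the number of variant copies out of `copy_number`.
    `error` is a hypothetical sequencing error rate that keeps the
    fraction away from exactly 0 or 1.
    """
    fraction = variant_copies / copy_number
    return min(max(fraction, error), 1.0 - error)


def genotype_posteriors(variant_reads, depth, copy_number, error=0.01):
    """Posterior over genotypes 0..copy_number variant copies.

    Assumes a uniform prior over genotypes (an illustrative choice).
    `copy_number` must be at least 1.
    """
    likelihoods = [
        binomial_pmf(variant_reads, depth,
                     expected_variant_fraction(g, copy_number, error))
        for g in range(copy_number + 1)
    ]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def variant_probability(variant_reads, depth, copy_number, error=0.01):
    """Probability that at least one copy carries the variant allele."""
    return 1.0 - genotype_posteriors(variant_reads, depth, copy_number, error)[0]


if __name__ == "__main__":
    # 6 variant reads out of 40: weak evidence if the locus is assumed
    # diploid, strong evidence if the locus is known to have 4 copies.
    print(variant_probability(6, 40, copy_number=2))
    print(variant_probability(6, 40, copy_number=4))
```

With these assumed settings, the same read counts receive a much higher variant probability at copy number 4 than at copy number 2, because a one-in-four genotype predicts an allelic fraction near 0.25.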
18.  JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data 
Bioinformatics  2012;28(7):907-913.
Motivation: Identification of somatic single nucleotide variants (SNVs) in tumour genomes is a necessary step in defining the mutational landscapes of cancers. Experimental designs for genome-wide ascertainment of somatic mutations now routinely include next-generation sequencing (NGS) of tumour DNA and matched constitutional DNA from the same individual. This allows investigators to control for germline polymorphisms and distinguish somatic mutations that are unique to the tumour, thus reducing the burden of labour-intensive and expensive downstream experiments needed to verify initial predictions. In order to make full use of such paired datasets, computational tools for simultaneous analysis of tumour–normal paired sequence data are required, but are currently under-developed and under-represented in the bioinformatics literature.
Results: In this contribution, we introduce two novel probabilistic graphical models called JointSNVMix1 and JointSNVMix2 for jointly analysing paired tumour–normal digital allelic count data from NGS experiments. In contrast to independent analysis of the tumour and normal data, our method allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework.
Availability: The JointSNVMix models and four other models discussed in the article are part of the JointSNVMix software package available for download at http://compbio.bccrc.ca
Contact: sshah@bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts053
PMCID: PMC3315723  PMID: 22285562
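A joint analysis of paired allelic counts, as described in the abstract above, can be sketched as a posterior over pairs of genotypes (normal, tumour). The code below is an illustration under stated assumptions, not the JointSNVMix1 or JointSNVMix2 model: the read probabilities, the prior weights, and every function name are hypothetical. The coupling between samples enters through the joint prior over genotype pairs; with a uniform prior the calculation would reduce to two independent analyses.

```python
from itertools import product
from math import comb

GENOTYPES = ("AA", "AB", "BB")

# Hypothetical probability of drawing a reference (A) read under each genotype.
REF_READ_PROB = {"AA": 0.99, "AB": 0.5, "BB": 0.01}


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def default_prior(concordant_weight=1000.0):
    """Joint prior favouring matching normal/tumour genotypes.

    The weight is an illustrative assumption, not a fitted value.
    """
    weights = {
        (gn, gt): (concordant_weight if gn == gt else 1.0)
        for gn, gt in product(GENOTYPES, repeat=2)
    }
    total = sum(weights.values())
    return {pair: weight / total for pair, weight in weights.items()}


def joint_posterior(normal_ref, normal_depth, tumour_ref, tumour_depth, prior=None):
    """Posterior over the nine (normal, tumour) genotype pairs."""
    if prior is None:
        prior = default_prior()
    unnormalised = {}
    for (gn, gt), weight in prior.items():
        unnormalised[(gn, gt)] = (
            weight
            * binomial_pmf(normal_ref, normal_depth, REF_READ_PROB[gn])
            * binomial_pmf(tumour_ref, tumour_depth, REF_READ_PROB[gt])
        )
    total = sum(unnormalised.values())
    return {pair: value / total for pair, value in unnormalised.items()}


def somatic_probability(posterior):
    """Probability that the normal is homozygous reference and the tumour is not."""
    return posterior[("AA", "AB")] + posterior[("AA", "BB")]


if __name__ == "__main__":
    # Normal: 30/30 reference reads; tumour: 18/30 reference reads.
    print(somatic_probability(joint_posterior(30, 30, 18, 30)))
    # Both samples near 50% reference reads: consistent with a germline variant.
    print(somatic_probability(joint_posterior(15, 30, 15, 30)))
```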
19.  Next generation sequencing based approaches to epigenomics 
Briefings in Functional Genomics  2011;9(5-6):455-465.
Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof-of-concept studies published, the field is now moving into the next phase, where method standardization and rigorous quality control are becoming paramount. In this review we describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We discuss these in terms of library preparation, sequencing platforms and analysis techniques.
doi:10.1093/bfgp/elq035
PMCID: PMC3080743  PMID: 21266347
epigenomics; next generation sequencing
20.  Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data 
Bioinformatics  2011;28(2):167-175.
Motivation: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge.
Results: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth ‘false positive’ predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study.
Availability: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca.
Contact: saparicio@bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr629
PMCID: PMC3259434  PMID: 22084253
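The cross-validation framework mentioned in the abstract above can be sketched in a few lines. This is a generic illustration, not the MutationSeq code: stratifying the folds by class is an assumption made here, the function names are hypothetical, and the majority-class baseline in the usage example merely stands in for a real classifier such as a random forest.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(features, labels, k, train, predict, seed=0):
    """Return per-fold accuracy for a train/predict pair of callables."""
    accuracies = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def train_majority(x, y):
    """Baseline 'model': remember the most frequent training label."""
    return max(set(y), key=y.count)


def predict_majority(model, row):
    """Baseline prediction: always return the remembered label."""
    return model


if __name__ == "__main__":
    # Class sizes from the abstract: 1015 somatic, 2354 non-somatic positions.
    labels = [1] * 1015 + [0] * 2354
    features = [[0.0]] * len(labels)  # placeholder features
    print(cross_validate(features, labels, 5, train_majority, predict_majority))
```

A majority-class baseline like this gives the floor that any of the four learning algorithms would need to beat on the same folds.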
21.  Retrotransposon-Induced Heterochromatin Spreading in the Mouse Revealed by Insertional Polymorphisms 
PLoS Genetics  2011;7(9):e1002301.
The “arms race” relationship between transposable elements (TEs) and their host has promoted a series of epigenetic silencing mechanisms directed against TEs. Retrotransposons, a class of TEs, are often located in repressed regions and are thought to induce heterochromatin formation and spreading. However, direct evidence for TE–induced local heterochromatin in mammals is surprisingly scarce. To examine this phenomenon, we chose two mouse embryonic stem (ES) cell lines that possess insertionally polymorphic retrotransposons (IAP, ETn/MusD, and LINE elements) at specific loci in one cell line but not the other. Employing ChIP-seq data for these cell lines, we show that IAP elements robustly induce H3K9me3 and H4K20me3 marks in flanking genomic DNA. In contrast, such heterochromatin is not induced by LINE copies and only by a minority of polymorphic ETn/MusD copies. DNA methylation is independent of the presence of IAP copies, since it is present in flanking regions of both full and empty sites. Finally, such spreading into genes appears to be rare, since the transcriptional start sites of very few genes are less than one kb from an IAP. However, the B3galtl gene is subject to transcriptional silencing via IAP-induced heterochromatin. Hence, although rare, IAP-induced local heterochromatin spreading into nearby genes may influence expression and, in turn, host fitness.
Author Summary
Transposable elements (TEs) are often thought to be harmful because of their potential to spread heterochromatin (repressive chromatin) into nearby sequences. However, there are few examples of spreading of heterochromatin caused by TEs, even though they are often found within repressive chromatin. We exploited natural variation in TE integrations to study heterochromatin induction. Specifically, we compared chromatin states of two mouse embryonic stem cell lines harboring polymorphic retrotransposons of three families, such that one line possesses a particular TE copy (full site) while the other does not (empty site). Nearly all IAP copies, a family of retroviral-like elements, are able to strongly induce repressive chromatin surrounding their insertion sites, with repressive histone modifications extending at least one kb from the IAP. This heterochromatin induction was not observed for the LINE family of non-viral retrotransposons and for only a minority of copies of the ETn/MusD retroviral-like family. We found only one gene that was partly silenced by IAP-induced chromatin. Therefore, while induction of repressive chromatin occurs after IAP insertion, measurable impacts on host gene expression are rare. Nonetheless, this phenomenon may play a role in rapid change in gene expression and therefore in host adaptive potential.
doi:10.1371/journal.pgen.1002301
PMCID: PMC3183085  PMID: 21980304
22.  deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data 
PLoS Computational Biology  2011;7(5):e1001138.
Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and have trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
Author Summary
Genome rearrangements and associated gene fusions are known to be important oncogenic events in some cancers. We have developed a novel computational method called deFuse for detecting gene fusions in RNA-Seq data and have applied it to the discovery of novel gene fusions in sarcoma and ovarian tumors. We assessed the accuracy of our method and found that deFuse produces substantially better sensitivity and specificity than two other published methods. We have also developed a set of 60 positive and 61 negative examples that will be useful for accurate identification of gene fusions in future RNA-Seq datasets. We have trained a classifier on 11 novel features of the 121 examples, and show that the classifier is able to accurately identify real gene fusions. The 45 gene fusions reported in this study represent the first ovarian cancer fusions reported, as well as novel sarcoma fusions. By examining the expression patterns of the affected genes, we find that many fusions are predicted to have functional consequences and thus merit experimental follow-up to determine their clinical relevance.
doi:10.1371/journal.pcbi.1001138
PMCID: PMC3098195  PMID: 21625565
23.  ARID1A Mutations in Endometriosis-Associated Ovarian Carcinomas 
The New England Journal of Medicine  2010;363(16):1532-1543.
BACKGROUND
Ovarian clear-cell and endometrioid carcinomas may arise from endometriosis, but the molecular events involved in this transformation have not been described.
METHODS
We sequenced the whole transcriptomes of 18 ovarian clear-cell carcinomas and 1 ovarian clear-cell carcinoma cell line and found somatic mutations in ARID1A (the AT-rich interactive domain 1A [SWI-like] gene) in 6 of the samples. ARID1A encodes BAF250a, a key component of the SWI–SNF chromatin remodeling complex. We sequenced ARID1A in an additional 210 ovarian carcinomas and a second ovarian clear-cell carcinoma cell line and measured BAF250a expression by means of immunohistochemical analysis in an additional 455 ovarian carcinomas.
RESULTS
ARID1A mutations were seen in 55 of 119 ovarian clear-cell carcinomas (46%), 10 of 33 endometrioid carcinomas (30%), and none of the 76 high-grade serous ovarian carcinomas. Seventeen carcinomas had two somatic mutations each. Loss of the BAF250a protein correlated strongly with the ovarian clear-cell carcinoma and endometrioid carcinoma subtypes and the presence of ARID1A mutations. In two patients, ARID1A mutations and loss of BAF250a expression were evident in the tumor and contiguous atypical endometriosis but not in distant endometriotic lesions.
CONCLUSIONS
These data implicate ARID1A as a tumor-suppressor gene frequently disrupted in ovarian clear-cell and endometrioid carcinomas. Since ARID1A mutation and loss of BAF250a can be seen in the preneoplastic lesions, we speculate that this is an early event in the transformation of endometriosis into cancer. (Funded by the British Columbia Cancer Foundation and the Vancouver General Hospital–University of British Columbia Hospital Foundation.)
doi:10.1056/NEJMoa1008433
PMCID: PMC2976679  PMID: 20942669
24.  Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications 
Nature Biotechnology  2010;28(10):1097-1105.
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage (genome-wide and in transposons), resolution, cost, and concordance, as well as the relationship of concordance with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
doi:10.1038/nbt.1682
PMCID: PMC2955169  PMID: 20852635
DNA methylation; Sequencing; Bisulfite
25.  Characterization of the Contradictory Chromatin Signatures at the 3′ Exons of Zinc Finger Genes 
PLoS ONE  2011;6(2):e17121.
The H3K9me3 histone modification is often found at promoter regions, where it functions to repress transcription. However, we have previously shown that 3′ exons of zinc finger genes (ZNFs) are marked by high levels of H3K9me3. We have now further investigated this unusual location for H3K9me3 in ZNF genes. Neither bioinformatic nor experimental approaches support the hypothesis that the 3′ exons of ZNFs are promoters. We further characterized the histone modifications at the 3′ ZNF exons and found that these regions also contain H3K36me3, a mark of transcriptional elongation. A genome-wide analysis of ChIP-seq data revealed that ZNFs constitute the majority of genes that have high levels of both H3K9me3 and H3K36me3. These results suggested the possibility that the ZNF genes may be imprinted, with one allele transcribed and one allele repressed. To test the hypothesis that the contradictory modifications are due to imprinting, we used a SNP analysis of RNA-seq data to demonstrate that both alleles of certain ZNF genes having H3K9me3 and H3K36me3 are transcribed. We next analyzed isolated ZNF 3′ exons using stably integrated episomes. We found that although the H3K36me3 mark was lost when the 3′ ZNF exon was removed from its natural genomic location, the isolated ZNF 3′ exons retained the H3K9me3 mark. Thus, the H3K9me3 mark at ZNF 3′ exons does not impede transcription and it is regulated independently of the H3K36me3 mark. Finally, we demonstrate a strong relationship between the number of tandemly repeated domains in the 3′ exons and the H3K9me3 mark. We suggest that the H3K9me3 at ZNF 3′ exons may function to protect the genome from inappropriate recombination rather than to regulate transcription.
doi:10.1371/journal.pone.0017121
PMCID: PMC3039671  PMID: 21347206
