Chromatin is composed of DNA and a variety of modified histones and non-histone proteins, which impact cell differentiation, gene regulation and other key cellular processes. We present a genome-wide chromatin landscape for Drosophila melanogaster based on 18 histone modifications, summarized by 9 prevalent combinatorial patterns. Integrative analysis with other data (non-histone chromatin proteins, DNaseI hypersensitivity, GRO-seq reads produced by engaged polymerase, short/long RNA products) reveals discrete characteristics of chromosomes, genes, regulatory elements, and other functional domains. We find that active genes display distinct chromatin signatures that are correlated with disparate gene lengths, exon patterns, regulatory functions, and genomic contexts. We also demonstrate a diversity of signatures among Polycomb targets that include a subset with paused polymerase. This systematic profiling and integrative analysis of chromatin signatures provides insights into how genomic elements are regulated, and will serve as a resource for future experimental investigations of genome structure and function.
Microsatellites - simple tandem repeats present at millions of sites in the human genome - can shorten or lengthen due to a defect in DNA mismatch repair. We present here the first comprehensive genome-wide analysis of the prevalence, mutational spectrum and functional consequences of microsatellite instability (MSI) in cancer genomes. We analyzed MSI in 277 colorectal and endometrial cancer genomes (including 57 microsatellite-unstable ones) using exome and whole-genome sequencing data. Recurrent MSI events in coding sequences showed tumor type-specificity, elevated frameshift-to-inframe ratios, and lower transcript levels than wildtype alleles. Moreover, genome-wide analysis revealed differences in the distribution of MSI versus point mutations, including overrepresentation of MSI in euchromatic and intronic regions compared to heterochromatic and intergenic regions, respectively, and depletion of MSI at nucleosome-occupied sequences. Our results provide a panoramic view of MSI in cancer genomes, highlighting their tumor type-specificity, impact on gene expression, and the role of chromatin organization.
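The frameshift-to-inframe ratio mentioned above can be illustrated with a minimal sketch. It assumes each coding MSI event is summarized as a signed length change in base pairs (negative for contraction, positive for expansion); the function name and the input values are hypothetical and are not taken from the study. An event counts as a frameshift when its length change is not a multiple of three.

```python
from typing import Iterable, Optional


def frameshift_to_inframe_ratio(length_changes: Iterable[int]) -> Optional[float]:
    """Ratio of frameshift to in-frame indels; None if there are no in-frame events."""
    frameshift = 0
    inframe = 0
    for delta in length_changes:
        if delta == 0:
            continue  # no length change, so not an MSI event
        if delta % 3 == 0:
            inframe += 1  # multiple of 3 bp: reading frame preserved
        else:
            frameshift += 1  # any other length change shifts the reading frame
    if inframe == 0:
        return None
    return frameshift / inframe


# Hypothetical length changes (bp) at coding microsatellites.
example_changes = [-1, 1, -2, -3, 3, -1]
ratio = frameshift_to_inframe_ratio(example_changes)
```

With these made-up values, four events are frameshifts and two are in-frame, so the ratio is 2.0.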
Gastric cancer is a leading cause of cancer deaths, but analysis of its molecular and clinical characteristics has been complicated by histological and aetiological heterogeneity. Here we describe a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project. We propose a molecular classification dividing gastric cancer into four subtypes: tumours positive for Epstein–Barr virus, which display recurrent PIK3CA mutations, extreme DNA hypermethylation, and amplification of JAK2, CD274 (also known as PD-L1) and PDCD1LG2 (also known as PD-L2); microsatellite unstable tumours, which show elevated mutation rates, including mutations of genes encoding targetable oncogenic signalling proteins; genomically stable tumours, which are enriched for the diffuse histological variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins; and tumours with chromosomal instability, which show marked aneuploidy and focal amplification of receptor tyrosine kinases. Identification of these subtypes provides a roadmap for patient stratification and trials of targeted therapies.
Short bowel syndrome remains a condition of high morbidity and mortality, and current therapeutic options carry significant side effects. To identify new treatments we focused on postresection changes in microRNAs—short noncoding RNAs, which suppress target genes—and suggest a previously undiscovered role for microRNA-125a (mir-125a) in intestinal adaptation.
Rats underwent either 80% massive small bowel resection or transection and were harvested after 48 hours. Jejunum was harvested for microRNA microarrays, laser capture microdissection, and RNA and protein analysis. Mir-125a was overexpressed in intestinal epithelium–6 (crypt-derived) cells (IEC-6) and effects on proliferation and apoptosis determined using MTS and flow cytometry. Expression of potential targets of mir-125a in rat jejunum and IEC-6 cells was determined using quantitative real-time polymerase chain reaction (RNA) and Western blotting (protein).
Resection upregulated mir-125a and mir-214 by 2.4-fold and 3.2-fold, respectively. Highest levels of expression were noted in the crypt fraction. Mir-125a overexpression induced apoptosis and resultant growth arrest in IEC-6 cells. The expression of the prosurvival Bcl-2 family member Mcl-1 was downregulated in both mir-125a-overexpressing IEC-6 cells and in jejunum of resected rats, confirming Mcl-1 as a previously undiscovered target of mir-125a.
Upregulation of mir-125a suppresses the prosurvival protein Mcl-1, producing the increase in apoptosis known to accompany the proliferative changes characteristic of intestinal adaptation. Our data highlight a potential role for microRNAs as mediators of the adaptive process and may facilitate the development of new therapeutic options for short bowel syndrome.
pluripotent stem cells; reprogramming; epigenetics; chromatin interactions; chromosome conformation capture
While the chromatin state of pluripotency genes has been extensively studied in embryonic stem cells (ESCs) and differentiated cells, their potential interactions with other parts of the genome remain largely unexplored. Here, we identified a genome-wide, pluripotency-specific interaction network around the Nanog promoter by adapting circular chromosome conformation capture-sequencing (4C-seq). This network was rearranged during differentiation and restored in induced pluripotent stem cells. A large fraction of Nanog-interacting loci were bound by Mediator or cohesin in pluripotent cells. Depletion of these proteins from ESCs resulted in a disruption of contacts and the acquisition of a differentiation-specific interaction pattern prior to obvious transcriptional and phenotypic changes. Similarly, the establishment of Nanog interactions during reprogramming often preceded transcriptional upregulation of associated genes, suggesting a causative link. Our results document a complex, pluripotency-specific chromatin "interactome" for Nanog and suggest a functional role for long-range genomic interactions in the maintenance and induction of pluripotency.
Spt6 is a highly conserved histone chaperone that interacts directly with both RNA polymerase II and histones to regulate gene expression. To gain a comprehensive understanding of the roles of Spt6, we performed genome-wide analyses of transcription, chromatin structure, and histone modifications in a Schizosaccharomyces pombe spt6 mutant. Our results demonstrate dramatic changes to transcription and chromatin structure in the mutant, including elevated antisense transcripts at >70% of all genes and general loss of the +1 nucleosome. Furthermore, Spt6 is required for marks associated with active transcription, including trimethylation of histone H3 on lysine 4, previously observed in humans but not Saccharomyces cerevisiae, and lysine 36. Taken together, our results indicate that Spt6 is critical for the accuracy of transcription and the integrity of chromatin, likely via its direct interactions with RNA polymerase II and histones.
Identification of somatic rearrangements in cancer genomes has accelerated through analysis of high-throughput sequencing data. However, characterization of complex structural alterations and their underlying mechanisms remains inadequate. Here, applying an algorithm to predict structural variations from short reads, we report a comprehensive catalog of somatic structural variations and the mechanisms generating them, using high-coverage whole-genome sequencing data from 140 patients across ten tumor types. We characterize the relative contributions of different types of rearrangements and their mutational mechanisms, find that ~20% of the somatic deletions are complex deletions formed by replication errors, and describe the differences between the mutational mechanisms in somatic and germline alterations. Importantly, we provide detailed reconstructions of the events responsible for loss of CDKN2A/B and gain of EGFR in glioblastoma, revealing that these alterations can result from multiple mechanisms even in a single genome and that both DNA double-strand breaks and replication errors drive somatic rearrangements.
Aneuploidy has been recognized as a hallmark of cancer for over 100 years, yet no general theory to explain the recurring patterns of aneuploidy in cancer has emerged. Here we develop Tumor Suppressor and Oncogene (TUSON) Explorer, a computational method that analyzes the patterns of mutational signatures in tumors and predicts the likelihood that any individual gene functions as a tumor suppressor (TSG) or oncogene (OG). By analyzing >8200 tumor-normal pairs we provide statistical evidence suggesting many more genes possess cancer driver properties than anticipated, forming a continuum of oncogenic potential. Integrating our driver predictions with information on somatic copy number alterations, we find that the distribution and the potency of TSGs (STOP genes), OGs and essential genes (GO genes) on chromosomes can predict the complex patterns of aneuploidy and copy number variation characteristic of cancer genomes. We propose that the cancer genome is shaped through a process of cumulative haploinsufficiency and triplosensitivity.
Summary: We have developed Nozzle, an R package that provides an Application Programming Interface to generate HTML reports with dynamic user interface elements. Nozzle was designed to facilitate summarization and rapid browsing of complex results in data analysis pipelines where multiple analyses are performed frequently on big datasets. The package can be applied to any project where user-friendly reports need to be created.
Availability: The R package is available on CRAN at http://cran.r-project.org/package=Nozzle.R1. Examples and additional materials are available at http://gdac.broadinstitute.org/nozzle. The source code is also available at http://www.github.com/parklab/Nozzle.
Supplementary data are available at Bioinformatics online.
Glioblastoma contains a hierarchy of stem-like cancer cells, but how this hierarchy is established is unclear. Here, we show that asymmetric Numb localization specifies glioblastoma stem-like cell (GSC) fate in a manner that does not require Notch inhibition. Numb is asymmetrically localized to CD133-hi GSCs. The predominant Numb isoform, Numb4, decreases Notch and promotes a CD133-hi, radial glial-like phenotype. However, upregulation of a novel Numb isoform, Numb4 delta 7 (Numb4d7), increases Notch and AKT activation while nevertheless maintaining CD133-hi fate specification. Numb knockdown increases Notch and promotes growth while favoring a CD133-lo, glial progenitor-like phenotype. We report the novel finding that Numb4 (but not Numb4d7) promotes SCF(Fbw7) ubiquitin ligase assembly and activation to increase Notch degradation. However, both Numb isoforms decrease epidermal growth factor receptor (EGFR) expression, thereby regulating GSC fate. Small molecule inhibition of EGFR activity phenocopies the effect of Numb on CD133 and Pax6. Clinically, homozygous NUMB deletions and low Numb mRNA expression occur primarily in a subgroup of proneural glioblastomas. Higher Numb expression is found in classical and mesenchymal glioblastomas and correlates with decreased survival. Thus, decreased Numb promotes glioblastoma growth, but the remaining Numb establishes a phenotypically diverse stem-like cell hierarchy that increases tumor aggressiveness and therapeutic resistance.
In a chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiment, an important consideration in experimental design is the minimum number of sequenced reads required to obtain statistically significant results. We present an extensive evaluation of the impact of sequencing depth on identification of enriched regions for key histone modifications (H3K4me3, H3K36me3, H3K27me3 and H3K9me2/me3) using deep-sequenced datasets in human and fly. We propose to define sufficient sequencing depth as the number of reads at which detected enrichment regions increase <1% for an additional million reads. Although the required depth depends on the nature of the mark and the state of the cell in each experiment, we observe that sufficient depth is often reached at <20 million reads for fly. For human, there are no clear saturation points for the examined datasets, but our analysis suggests 40–50 million reads as a practical minimum for most marks. We also devise a mathematical model to estimate the sufficient depth and total genomic coverage of a mark. Lastly, we find that the five algorithms tested do not agree well for broad enrichment profiles, especially at lower depths. Our findings suggest that sufficient sequencing depth and an appropriate peak-calling algorithm are essential for ensuring robustness of conclusions derived from ChIP-seq data.
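The proposed definition of sufficient sequencing depth can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes a saturation curve obtained by subsampling reads, given as (depth in millions of reads, number of enriched regions detected) pairs, and it approximates the gain per additional million reads by the average relative increase between consecutive subsampling points. The function name, the 1% default written as 0.01, and the example numbers are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sufficient_depth(
    curve: Sequence[Tuple[float, float]],
    threshold: float = 0.01,
) -> Optional[float]:
    """Return the first depth (in millions of reads) at which the relative
    increase in enriched regions per additional million reads falls below
    `threshold`, or None if the curve never saturates.

    `curve` holds (depth_in_millions, enriched_region_count) pairs sorted
    by increasing depth.
    """
    for (d0, n0), (d1, n1) in zip(curve, curve[1:]):
        if d1 <= d0:
            raise ValueError("depths must be strictly increasing")
        if n0 <= 0:
            continue  # relative gain is undefined without any detected region
        # Average relative gain per additional million reads on this interval.
        gain_per_million = (n1 - n0) / n0 / (d1 - d0)
        if gain_per_million < threshold:
            return d0
    return None


# Hypothetical saturation curve from subsampling a deeply sequenced dataset.
saturating = [(5, 10000), (10, 14000), (15, 15500), (20, 15600), (25, 15650)]
not_saturating = [(10, 1000), (20, 2000), (30, 3000)]

depth = sufficient_depth(saturating)         # 15 for these made-up values
no_depth = sufficient_depth(not_saturating)  # None: no saturation point
```

Whether "enrichment regions" should be counted as the number of regions or as the total genomic coverage is a choice left open here; the same function works for either quantity as long as it is passed consistently.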
Polycomb group (PcG)-mediated repression is an evolutionarily conserved process critical for cell fate determination and maintenance of gene expression during embryonic development. However, the mechanisms underlying PcG recruitment in mammals remain unclear since few regulatory sites have been identified. We report two novel prospective PcG-dependent regulatory elements within the human HOXB and HOXC clusters and compare their repressive activities to a previously identified element in the HOXD cluster. These regions recruited the PcG proteins BMI1 and SUZ12 to a reporter construct in mesenchymal stem cells and conferred repression that was dependent upon PcG expression. Furthermore, we examined the potential of two DNA-binding proteins, JARID2 and YY1, to regulate PcG activity at these three elements. JARID2 has differential requirements, whereas YY1 appears to be required for repressive activity at all 3 sites. We conclude that distinct elements of the mammalian HOX clusters can recruit components of the PcG complexes and confer repression, similar to what has been seen in Drosophila. These elements, however, have diverse requirements for binding factors, which, combined with previous data on other loci, speaks to the complexity of PcG targeting in mammals.
Increased levels of hypoxia and hypoxia-inducible factor 1α (HIF-1α) in human sarcomas correlate with tumor progression and radiation resistance. Prolonged anti-angiogenic therapy of tumors can delay tumor growth but may also increase hypoxia and HIF-1α activity. In our recent clinical trial, treatment with the anti-vascular endothelial growth factor A (VEGF-A) antibody, bevacizumab, followed by a combination of bevacizumab and radiation led to near complete necrosis in nearly half of sarcomas. Gene set enrichment analysis of microarrays from pre-treatment biopsies found the Gene Ontology category “Response to hypoxia” was upregulated in poor responders, and hierarchical clustering based on 140 hypoxia-responsive genes reliably separated poor responders from good responders. The most commonly used chemotherapeutic drug for sarcomas, doxorubicin (Dox), was recently found to block HIF-1α binding to DNA at low metronomic doses. In 4 sarcoma cell lines, HIF-1 shRNA or Dox at low concentrations blocked HIF-1α induction of VEGF-A by 84–97%, and carbonic anhydrase 9 by 83–93%. HT1080 sarcoma xenografts had increased hypoxia and/or HIF-1α activity with increasing tumor size and with anti-VEGF receptor antibody (DC101) treatment. Combining DC101 with HIF-1α shRNA or metronomic Dox had a synergistic effect in suppressing growth of HT1080 xenografts, at least in part via induction of tumor endothelial cell apoptosis. In conclusion, sarcomas respond to increased hypoxia by expressing HIF-1α-target genes which may promote resistance to anti-angiogenic and other therapies. Adding HIF-1α inhibition blocks resistance and augments destruction of the tumor vasculature.
ChIP-seq has become the primary method for identifying in vivo protein–DNA interactions on a genome-wide scale, with nearly 800 publications involving the technique appearing in PubMed as of December 2012. Individually and in aggregate, these data are an important and information-rich resource. However, uncertainties about data quality confound their use by the wider research community. Recently, the Encyclopedia of DNA Elements (ENCODE) project developed and applied metrics to objectively measure ChIP-seq data quality. The ENCODE quality analysis was useful for flagging datasets for closer inspection, eliminating or replacing poor data, and for driving changes in experimental pipelines. There had been no similarly systematic quality analysis of the large and disparate body of published ChIP-seq profiles. Here, we report a uniform analysis of vertebrate transcription factor ChIP-seq datasets in the Gene Expression Omnibus (GEO) repository as of April 1, 2012. The majority (55%) of datasets scored as being highly successful, but a substantial minority (20%) were of apparently poor quality, and another ∼25% were of intermediate quality. We discuss how different uses of ChIP-seq data are affected by specific aspects of data quality, and we highlight exceptional instances for which the metric values should not be taken at face value. Unexpectedly, we discovered that a significant subset of control datasets (i.e., no immunoprecipitation and mock immunoprecipitation samples) display an enrichment structure similar to successful ChIP-seq data. This can, in turn, affect peak calling and data interpretation. Published datasets identified here as high-quality comprise a large group that users can draw on for large-scale integrated analysis. 
In the future, ChIP-seq quality assessment similar to that used here could guide experimentalists at early stages in a study, provide useful input in the publication process, and be used to stratify ChIP-seq data for different community-wide uses.
ChIP-seq; chromatin immunoprecipitation; cross-correlation; quality assessment; transcription factor
Aberrant activation of the Hedgehog (Hh) pathway can drive tumorigenesis [1]. To investigate the mechanism by which glioma-associated oncogene family zinc finger-1 (GLI1), a crucial effector of Hh signaling [2], regulates Hh pathway activation, we searched for GLI1-interacting proteins. We report that the chromatin remodeling protein SNF5 (encoded by SMARCB1, hereafter called SNF5), which is inactivated in human malignant rhabdoid tumors (MRTs), interacts with GLI1. We show that Snf5 localizes to Gli1-regulated promoters and that loss of Snf5 leads to activation of the Hh-Gli pathway. Conversely, re-expression of SNF5 in MRT cells represses GLI1. Consistent with this, we show the presence of a Hh-Gli–activated gene expression profile in primary MRTs and show that GLI1 drives the growth of SNF5-deficient MRT cells in vitro and in vivo. Therefore, our studies reveal that SNF5 is a key mediator of Hh signaling and that aberrant activation of GLI1 is a previously undescribed targetable mechanism contributing to the growth of MRT cells.
Dosage compensation in Drosophila is mediated by the MSL complex, which increases male X-linked gene expression approximately two-fold. The MSL complex preferentially binds the bodies of active genes on the male X, depositing H4K16ac with a 3′ bias. Two models have been proposed for the influence of the MSL complex on transcription: one based on promoter recruitment of RNA polymerase II (Pol II), and a second featuring enhanced transcriptional elongation. Here we utilize nascent RNA-sequencing to document dosage compensation during transcriptional elongation. We also compare X and autosomes from published data on paused and elongating polymerase to assess the role of Pol II recruitment. Our results support a model for differentially regulated elongation, starting with release from 5′ pausing and increasing through X-linked gene bodies. Our results highlight facilitated transcriptional elongation as a key mechanism for coordinated regulation of a diverse set of genes.
Sequence-directed mapping of nucleosome positions is of major biological interest. Here, we present a web interface for estimating the affinity of the histone core for DNA and predicting nucleosome arrangement on a given sequence. Our approach is based on assessment of the energy cost of imposing the deformations required to wrap DNA around the histone surface. The interface allows the user to specify a number of options, such as selecting from several structural templates for threading calculations and adding random sequences to the analysis.
mRNA expression profiling has suggested the existence of multiple glioblastoma subclasses, but their number and characteristics vary among studies and the etiology underlying their development is unclear. In this study, we analyzed 261 microRNA expression profiles from the Cancer Genome Atlas (TCGA), identifying five clinically and genetically distinct subclasses of glioblastoma that each related to a different neural precursor cell type. These microRNA-based glioblastoma subclasses displayed microRNA and mRNA expression signatures resembling those of radial glia, oligoneuronal precursors, neuronal precursors, neuroepithelial/neural crest precursors or astrocyte precursors. Each subclass was determined to be genetically distinct, based on the significant differences they displayed in terms of patient race, age, treatment response and survival. We also identified several microRNAs as potent regulators of subclass-specific gene expression networks in glioblastoma. Foremost among these is miR-9, which suppresses mesenchymal differentiation in glioblastoma by downregulating expression of JAK kinases and inhibiting activation of STAT3. Our findings suggest that microRNAs are important determinants of glioblastoma subclasses through their ability to regulate developmental growth and differentiation programs in several transformed neural precursor cell types. Taken together, our results define developmental microRNA expression signatures that both characterize and contribute to the phenotypic diversity of glioblastoma subclasses, thereby providing an expanded framework for understanding the pathogenesis of glioblastoma in a human neurodevelopmental context.
MicroRNA; Microarray; Glioma; Tumor; Genome
MicroRNAs (miRNAs) are endogenous noncoding RNA molecules that are involved in post-transcriptional gene silencing. Using global miRNA expression profiling, we found miR-21, -155, and -18a to be highly upregulated in rat kidneys following tubular injury induced by ischemia/reperfusion (I/R) or gentamicin administration. miR-21 and -155 also showed decreased expression patterns in blood and urinary supernatants in both models of kidney injury. Furthermore, urinary levels of miR-21 increased 1.2-fold in patients with clinical diagnosis of acute kidney injury (AKI) (n = 22) as compared with healthy volunteers (n = 25) (p < 0.05), and miR-155 decreased 1.5-fold in patients with AKI (p < 0.01). We identified 29 messenger RNA core targets of these 3 miRNAs using the context likelihood of relatedness algorithm and found these predicted gene targets to be highly enriched for genes associated with apoptosis or cell proliferation. Taken together, these results suggest that miR-21 and -155 could potentially serve as translational biomarkers for detection of AKI and may play a critical role in the pathogenesis of kidney injury and the tissue repair process.
MicroRNAs; kidney; biomarker; ischemia/reperfusion injury; nephrotoxicity.
Variation in chromatin composition and organization often reflects differences in genome function. Histone variants, for example, replace canonical histones to contribute to regulation of numerous nuclear processes including transcription, DNA repair and chromosome segregation. Here we focus on H2A.Bbd, a rapidly evolving variant found in mammals but not in invertebrates. We report that in human cells, nucleosomes bearing H2A.Bbd form unconventional chromatin structures enriched within actively transcribed genes and characterized by shorter DNA protection and nucleosome spacing. Analysis of transcriptional profiles from cells depleted for H2A.Bbd demonstrated widespread changes in gene expression with a net down-regulation of transcription and disruption of normal mRNA splicing patterns. In particular, we observed changes in exon inclusion rates and increased presence of intronic sequences in mRNA products upon H2A.Bbd depletion. Taken together, our results indicate that H2A.Bbd is involved in formation of a specific chromatin structure that facilitates both transcription and initial mRNA processing.
Structural variations are widespread in the human genome and can serve as genetic markers in clinical and evolutionary studies. With the advances in the next-generation sequencing technology, recent methods allow for identification of structural variations with unprecedented resolution and accuracy. They also provide opportunities to discover variants that could not be detected on conventional microarray-based platforms, such as dosage-invariant chromosomal translocations and inversions. In this review, we will describe some of the sequencing-based algorithms for detection of structural variations and discuss the key issues in future development.
copy number variations; paired-end sequencing; chromosomal alterations; translocations; indels
Urothelial carcinoma (UC) causes substantial morbidity and mortality worldwide. However, the molecular mechanisms underlying urothelial cancer development and tumor progression are still largely unknown. Using informatics analysis, we identified Sh3gl2 (endophilin A1) as a bladder urothelium-enriched transcript. The gene encoding Sh3gl2 is located on chromosome 9p, a region frequently altered in UC. Sh3gl2 is known to regulate endocytosis of receptor tyrosine kinases implicated in oncogenesis, such as the epidermal growth factor receptor (EGFR) and c-Met. However, its role in UC pathogenesis is unknown. Informatics analysis of expression profiles as well as immunohistochemical staining of tissue microarrays revealed Sh3gl2 expression to be decreased in UC specimens compared to nontumor tissues. Loss of Sh3gl2 was associated with increasing tumor grade and with muscle invasion, which is a reliable predictor of metastatic disease and cancer-derived mortality. Sh3gl2 expression was undetectable in 19 of 20 human UC cell lines but preserved in the low-grade cell line RT4. Stable silencing of Sh3gl2 in RT4 cells by RNA interference 1) enhanced proliferation and colony formation in vitro, 2) inhibited EGF-induced EGFR internalization and increased EGFR activation, 3) stimulated phosphorylation of Src family kinases and STAT3, and 4) promoted growth of RT4 xenografts in subrenal capsule tissue recombination experiments. Conversely, forced re-expression of Sh3gl2 in T24 cells and silenced RT4 clones attenuated oncogenic behaviors, including growth and migration. Together, these findings identify loss of Sh3gl2 as a frequent event in UC development that promotes disease progression.