Oligodendroglioma is characterized by unique clinical, pathological, and genetic features. Recurrent losses of chromosomes 1p and 19q are strongly associated with this brain cancer but knowledge of the identity and function of the genes affected by these alterations is limited. We performed exome sequencing on a discovery set of 16 oligodendrogliomas with 1p/19q co-deletion to identify new molecular features at base-pair resolution. As anticipated, there was a high rate of IDH mutations: all cases had mutations in either IDH1 (14/16) or IDH2 (2/16). In addition, we discovered somatic mutations and insertions/deletions in the CIC gene on chromosome 19q13.2 in 13/16 tumours. These discovery set mutations were validated by deep sequencing of 13 additional tumours, which revealed 7 others with CIC mutations, thus bringing the overall mutation rate in oligodendrogliomas in this study to 20/29 (69%). In contrast, deep sequencing of astrocytomas and oligoastrocytomas without 1p/19q loss revealed that CIC alterations were otherwise rare (1/60; 2%). Of the 21 non-synonymous somatic mutations in 20 CIC-mutant oligodendrogliomas, 9 were in exon 5 within an annotated DNA interacting domain and 3 were in exon 20 within an annotated protein interacting domain. The remaining 9 were found in other exons and frequently included truncations. CIC mutations were highly associated with oligodendroglioma histology, 1p/19q co-deletion and IDH1/2 mutation (p<0.001). Although we observed no differences in the clinical outcomes of CIC mutant versus wild-type tumors, in a background of 1p/19q co-deletion, hemizygous CIC mutations are likely important. We hypothesize that the mutant CIC on the single retained 19q allele is linked to the pathogenesis of oligodendrogliomas with IDH mutation. Our detailed study of genetic aberrations in oligodendroglioma suggests a functional interaction between CIC mutation, IDH1/2 mutation and 1p/19q co-deletion.
Glioma; Oligodendroglioma; Next Generation Sequencing; Capicua; IDH1
Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of SNVs that overlap copy number alterations. The method is based on modelling the notion that genomic regions of segmental duplication and amplification induce an extended genotype space where a subset of genotypes will exhibit heavily skewed allelic distributions in SNVs (and therefore render them undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models where the number of mixtures for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it discovers 21 experimentally revalidated somatic non-synonymous mutations that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain, where increased sensitivity was conferred.
Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
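The extended genotype space described in this abstract can be illustrated with a minimal sketch. It is not the published CoNAn-SNV implementation: the function names, the uniform prior over genotypes and the fixed error rate of 0.01 are assumptions made only for illustration. For a locus with copy number c, the sketch considers c + 1 genotypes (0 to c variant copies), each with its own expected variant allele fraction.

```python
from math import comb


def genotype_posteriors(variant_reads, depth, copy_number, error_rate=0.01):
    """Posterior over the number of variant copies (0..copy_number) at one locus.

    Illustrative assumptions (not from the paper): a uniform prior over
    genotypes and a fixed sequencing error rate.
    """
    likelihoods = []
    for k in range(copy_number + 1):
        # Expected variant allele fraction for k variant copies out of
        # copy_number, kept away from exactly 0 and 1 by the error rate.
        frac = k / copy_number
        p = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods.append(
            comb(depth, variant_reads)
            * p ** variant_reads
            * (1 - p) ** (depth - variant_reads)
        )
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def prob_variant(variant_reads, depth, copy_number):
    """Probability that at least one copy carries the variant allele."""
    return 1.0 - genotype_posteriors(variant_reads, depth, copy_number)[0]


# A skewed site: 3 variant reads out of 40.
diploid_call = prob_variant(3, 40, copy_number=2)    # diploid assumption
amplified_call = prob_variant(3, 40, copy_number=5)  # copy number annotated
```

Under these assumptions the same read counts look like sequencing noise when diploidy is assumed, but receive substantial support as a single variant copy when the locus is annotated with five copies.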
Motivation: Identification of somatic single nucleotide variants (SNVs) in tumour genomes is a necessary step in defining the mutational landscapes of cancers. Experimental designs for genome-wide ascertainment of somatic mutations now routinely include next-generation sequencing (NGS) of tumour DNA and matched constitutional DNA from the same individual. This allows investigators to control for germline polymorphisms and distinguish somatic mutations that are unique to the tumour, thus reducing the burden of labour-intensive and expensive downstream experiments needed to verify initial predictions. In order to make full use of such paired datasets, computational tools for simultaneous analysis of tumour–normal paired sequence data are required, but are currently under-developed and under-represented in the bioinformatics literature.
Results: In this contribution, we introduce two novel probabilistic graphical models called JointSNVMix1 and JointSNVMix2 for jointly analysing paired tumour–normal digital allelic count data from NGS experiments. In contrast to independent analysis of the tumour and normal data, our method allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework.
Availability: The JointSNVMix models and four other models discussed in the article are part of the JointSNVMix software package available for download at http://compbio.bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
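The joint analysis of tumour–normal allelic counts described in this abstract can be sketched as a posterior over the nine possible (normal, tumour) genotype pairs. This is only an illustration and not the JointSNVMix1 or JointSNVMix2 model: the fixed per-genotype variant rates, the prior that favours matching genotypes, and all function names are hypothetical choices.

```python
from math import comb

GENOTYPES = ("AA", "AB", "BB")
# Assumed probability of reading the variant (B) allele under each genotype.
VARIANT_RATE = {"AA": 0.01, "AB": 0.5, "BB": 0.99}


def binomial_pmf(k, n, p):
    """Probability of k variant reads out of n at variant rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def joint_posterior(normal_var, normal_depth, tumour_var, tumour_depth):
    """Posterior over the nine (normal, tumour) joint genotypes at one site."""
    weights = {}
    for g_normal in GENOTYPES:
        for g_tumour in GENOTYPES:
            # Hypothetical prior: 0.9 spread over matching pairs,
            # 0.1 spread over the six mismatching pairs.
            prior = 0.9 / 3 if g_normal == g_tumour else 0.1 / 6
            weights[(g_normal, g_tumour)] = (
                prior
                * binomial_pmf(normal_var, normal_depth, VARIANT_RATE[g_normal])
                * binomial_pmf(tumour_var, tumour_depth, VARIANT_RATE[g_tumour])
            )
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def prob_somatic(posterior):
    """Somatic here means: normal is AA while the tumour carries the variant."""
    return posterior[("AA", "AB")] + posterior[("AA", "BB")]


somatic_site = prob_somatic(joint_posterior(0, 30, 12, 30))
germline_site = prob_somatic(joint_posterior(14, 30, 15, 30))
```

Because both samples enter one likelihood, a variant seen in the tumour is scored as somatic only when the normal counts also support a homozygous reference genotype.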
Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof of concept studies published, the field is now moving into the next phase, where method standardization and rigorous quality control are becoming paramount. In this review we will describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We will discuss these in terms of library preparation, sequence platforms and analysis techniques.
epigenomics; next generation sequencing
Motivation: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge.
Results: We present the comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on this data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth ‘false positive’ predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts illuminating important directions for future study.
Availability: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca.
Supplementary information: Supplementary data are available at Bioinformatics online.
The “arms race” relationship between transposable elements (TEs) and their host has promoted a series of epigenetic silencing mechanisms directed against TEs. Retrotransposons, a class of TEs, are often located in repressed regions and are thought to induce heterochromatin formation and spreading. However, direct evidence for TE–induced local heterochromatin in mammals is surprisingly scarce. To examine this phenomenon, we chose two mouse embryonic stem (ES) cell lines that possess insertionally polymorphic retrotransposons (IAP, ETn/MusD, and LINE elements) at specific loci in one cell line but not the other. Employing ChIP-seq data for these cell lines, we show that IAP elements robustly induce H3K9me3 and H4K20me3 marks in flanking genomic DNA. In contrast, such heterochromatin is not induced by LINE copies and only by a minority of polymorphic ETn/MusD copies. DNA methylation is independent of the presence of IAP copies, since it is present in flanking regions of both full and empty sites. Finally, such spreading into genes appears to be rare, since the transcriptional start sites of very few genes are less than one Kb from an IAP. However, the B3galtl gene is subject to transcriptional silencing via IAP-induced heterochromatin. Hence, although rare, IAP-induced local heterochromatin spreading into nearby genes may influence expression and, in turn, host fitness.
Transposable elements (TEs) are often thought to be harmful because of their potential to spread heterochromatin (repressive chromatin) into nearby sequences. However, there are few examples of spreading of heterochromatin caused by TEs, even though they are often found within repressive chromatin. We exploited natural variation in TE integrations to study heterochromatin induction. Specifically, we compared chromatin states of two mouse embryonic stem cell lines harboring polymorphic retrotransposons of three families, such that one line possesses a particular TE copy (full site) while the other does not (empty site). Nearly all IAP copies, a family of retroviral-like elements, are able to strongly induce repressive chromatin surrounding their insertion sites, with repressive histone modifications extending at least one kb from the IAP. This heterochromatin induction was not observed for the LINE family of non-viral retrotransposons and for only a minority of copies of the ETn/MusD retroviral-like family. We found only one gene that was partly silenced by IAP-induced chromatin. Therefore, while induction of repressive chromatin occurs after IAP insertion, measurable impacts on host gene expression are rare. Nonetheless, this phenomenon may play a role in rapid change in gene expression and therefore in host adaptive potential.
Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and have trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
Genome rearrangements and associated gene fusions are known to be important oncogenic events in some cancers. We have developed a novel computational method called deFuse for detecting gene fusions in RNA-Seq data and have applied it to the discovery of novel gene fusions in sarcoma and ovarian tumors. We assessed the accuracy of our method and found that deFuse produces substantially better sensitivity and specificity than two other published methods. We have also developed a set of 60 positive and 61 negative examples that will be useful for accurate identification of gene fusions in future RNA-Seq datasets. We have trained a classifier on 11 novel features of the 121 examples, and show that the classifier is able to accurately identify real gene fusions. The 45 gene fusions reported in this study include the first fusions reported in ovarian cancer, as well as novel sarcoma fusions. By examining the expression patterns of the affected genes, we find that many fusions are predicted to have functional consequences and thus merit experimental follow-up to determine their clinical relevance.
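The area under the ROC curve used to summarize the fusion classifier can be computed directly from classifier scores for validated positive and negative examples. The sketch below is a generic pairwise (Mann-Whitney) formulation and is not taken from the deFuse software; the function name and the example scores are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve from scores of true and false examples.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score, with ties counted as one half.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical scores for two validated fusions and two false predictions.
example_auc = roc_auc([0.9, 0.3], [0.5, 0.1])
```

With a curated set such as the 60 positive and 61 negative examples mentioned above, the same calculation would run over 60 × 61 score pairs.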
Ovarian clear-cell and endometrioid carcinomas may arise from endometriosis, but the molecular events involved in this transformation have not been described.
We sequenced the whole transcriptomes of 18 ovarian clear-cell carcinomas and 1 ovarian clear-cell carcinoma cell line and found somatic mutations in ARID1A (the AT-rich interactive domain 1A [SWI-like] gene) in 6 of the samples. ARID1A encodes BAF250a, a key component of the SWI–SNF chromatin remodeling complex. We sequenced ARID1A in an additional 210 ovarian carcinomas and a second ovarian clear-cell carcinoma cell line and measured BAF250a expression by means of immunohistochemical analysis in an additional 455 ovarian carcinomas.
ARID1A mutations were seen in 55 of 119 ovarian clear-cell carcinomas (46%), 10 of 33 endometrioid carcinomas (30%), and none of the 76 high-grade serous ovarian carcinomas. Seventeen carcinomas had two somatic mutations each. Loss of the BAF250a protein correlated strongly with the ovarian clear-cell carcinoma and endometrioid carcinoma subtypes and the presence of ARID1A mutations. In two patients, ARID1A mutations and loss of BAF250a expression were evident in the tumor and contiguous atypical endometriosis but not in distant endometriotic lesions.
These data implicate ARID1A as a tumor-suppressor gene frequently disrupted in ovarian clear-cell and endometrioid carcinomas. Since ARID1A mutation and loss of BAF250a can be seen in the preneoplastic lesions, we speculate that this is an early event in the transformation of endometriosis into cancer. (Funded by the British Columbia Cancer Foundation and the Vancouver General Hospital–University of British Columbia Hospital Foundation.)
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
DNA methylation; Sequencing; Bisulfite
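The concordance figures quoted in the abstract above compare binary methylation calls at sites assessed by more than one method. A minimal sketch of such a calculation follows; the data layout (a mapping from site position to a methylated/unmethylated call) and the function name are assumptions, and how each method arrives at a binary call is outside the sketch.

```python
def binary_concordance(calls_a, calls_b):
    """Fraction of shared sites where two methods give the same binary call.

    calls_a, calls_b: dicts mapping a site identifier to True (methylated)
    or False (unmethylated). Only sites present in both are compared.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no sites assessed by both methods")
    agree = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return agree / len(shared)


# Hypothetical calls from two methods at a handful of CpG positions.
method_one = {101: True, 250: False, 377: True}
method_two = {250: False, 377: False, 412: True}
example_concordance = binary_concordance(method_one, method_two)
```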
The H3K9me3 histone modification is often found at promoter regions, where it functions to repress transcription. However, we have previously shown that 3′ exons of zinc finger genes (ZNFs) are marked by high levels of H3K9me3. We have now further investigated this unusual location for H3K9me3 in ZNF genes. Neither bioinformatic nor experimental approaches support the hypothesis that the 3′ exons of ZNFs are promoters. We further characterized the histone modifications at the 3′ ZNF exons and found that these regions also contain H3K36me3, a mark of transcriptional elongation. A genome-wide analysis of ChIP-seq data revealed that ZNFs constitute the majority of genes that have high levels of both H3K9me3 and H3K36me3. These results suggested the possibility that the ZNF genes may be imprinted, with one allele transcribed and one allele repressed. To test the hypothesis that the contradictory modifications are due to imprinting, we used a SNP analysis of RNA-seq data to demonstrate that both alleles of certain ZNF genes having H3K9me3 and H3K36me3 are transcribed. We next analyzed isolated ZNF 3′ exons using stably integrated episomes. We found that although the H3K36me3 mark was lost when the 3′ ZNF exon was removed from its natural genomic location, the isolated ZNF 3′ exons retained the H3K9me3 mark. Thus, the H3K9me3 mark at ZNF 3′ exons does not impede transcription and it is regulated independently of the H3K36me3 mark. Finally, we demonstrate a strong relationship between the number of tandemly repeated domains in the 3′ exons and the H3K9me3 mark. We suggest that the H3K9me3 at ZNF 3′ exons may function to protect the genome from inappropriate recombination rather than to regulate transcription.
B cell lymphoma 6 (BCL6), which encodes a transcriptional repressor, is a critical oncogene in diffuse large B cell lymphomas (DLBCLs). Although a retro-inverted BCL6 peptide inhibitor (RI-BPI) was recently shown to potently kill DLBCL cells, the underlying mechanisms remain unclear. Here, we show that RI-BPI induces a particular gene expression signature in human DLBCL cell lines that included genes associated with the actions of histone deacetylase (HDAC) and Hsp90 inhibitors. BCL6 directly repressed the expression of p300 lysine acetyltransferase (EP300) and its cofactor HLA-B–associated transcript 3 (BAT3). RI-BPI induced expression of p300 and BAT3, resulting in acetylation of p300 targets including p53 and Hsp90. Induction of p300 and BAT3 was required for the antilymphoma effects of RI-BPI, since specific blockade of either protein rescued human DLBCL cell lines from the BCL6 inhibitor. Consistent with this, combination of RI-BPI with either an HDAC inhibitor (HDI) or an Hsp90 inhibitor potently suppressed or even eradicated established human DLBCL xenografts in mice. Furthermore, HDAC and Hsp90 inhibitors independently enhanced RI-BPI killing of primary human DLBCL cells in vitro. We also show that p300-inactivating mutations occur naturally in human DLBCL patients and may confer resistance to BCL6 inhibitors. Thus, BCL6 repression of EP300 provides a basis for rational targeted combinatorial therapy for patients with DLBCL.
Adenocarcinomas of the tongue are rare and represent the minority (20 to 25%) of salivary gland tumors affecting the tongue. We investigated the utility of massively parallel sequencing to characterize an adenocarcinoma of the tongue, before and after treatment.
In the pre-treatment tumor we identified 7,629 genes within regions of copy number gain. There were 1,078 genes that exhibited increased expression relative to the blood and unrelated tumors, and four genes contained somatic protein-coding mutations. Our analysis suggested the tumor cells were driven by the RET oncogene. Genes whose protein products are targeted by the RET inhibitors sunitinib and sorafenib correlated with being amplified and/or highly expressed. Consistent with our observations, administration of sunitinib was associated with stable disease lasting 4 months, after which the lung lesions began to grow. Administration of sorafenib and sulindac provided disease stabilization for an additional 3 months, after which the cancer progressed and new lesions appeared. A recurring metastasis possessed 7,288 genes within copy number amplicons, 385 genes exhibiting increased expression relative to other tumors and 9 new somatic protein-coding mutations. The observed mutations and amplifications were consistent with therapeutic resistance arising through activation of the MAPK and AKT pathways.
We conclude that complete genomic characterization of a rare tumor has the potential to aid in clinical decision making and identifying therapeutic approaches where no established treatment protocols exist. These results also provide direct in vivo genomic evidence for mutational evolution within a tumor under drug selection and potential mechanisms of drug resistance accrual.
RNA-seq; WTSS; PRC2; genome; transcriptome; epigenetic
Starting with SAGE-libraries prepared from C. elegans FAC-sorted embryonic intestine cells (8E-16E cell stage), from total embryos and from purified oocytes, and taking advantage of the NextDB in situ hybridization data base, we define sets of genes highly expressed from the zygotic genome, and expressed either exclusively or preferentially in the embryonic intestine or in the intestine of newly hatched larvae; we had previously defined a similarly expressed set of genes from the adult intestine. We show that an extended TGATAA-like sequence is essentially the only candidate for a cis-acting regulatory motif common to intestine genes expressed at all stages. This sequence is a strong ELT-2 binding site and matches the sequence of GATA-like sites found to be important for the expression of every intestinal gene so far analyzed experimentally. We show that the majority of these three sets of highly expressed intestinal-specific/intestinal-enriched genes respond strongly to ectopic expression of ELT-2 within the embryo. By flow-sorting elt-2(null) larvae from elt-2(+) larvae and then preparing Solexa/Illumina-SAGE libraries, we show that the majority of these genes also respond strongly to loss-of-function of ELT-2. To test the consequences of loss of other transcription factors identified in the embryonic intestine, we develop a strain of worms that is RNAi-sensitive only in the intestine; however, we are unable (with one possible exception) to identify any other transcription factor whose intestinal loss-of-function causes a phenotype of comparable severity to the phenotype caused by loss of ELT-2. Overall, our results support a model in which ELT-2 is the predominant transcription factor in the post-specification C. elegans intestine and participates directly in the transcriptional regulation of the majority (> 80%) of intestinal genes. We present evidence that ELT-2 plays a central role in most aspects of C. elegans intestinal physiology: establishing the structure of the enterocyte, regulating enzymes and transporters involved in digestion and nutrition, responding to environmental toxins and pathogenic infections, and regulating the downstream intestinal components of the daf-2/daf-16 pathway influencing aging and longevity.
Motivation: Next-generation sequencing (NGS) has enabled whole genome and transcriptome single nucleotide variant (SNV) discovery in cancer. NGS produces millions of short sequence reads that, once aligned to a reference genome sequence, can be interpreted for the presence of SNVs. Although tools exist for SNV discovery from NGS data, none are specifically suited to work with data from tumors, where altered ploidy and tumor cellularity impact the statistical expectations of SNV discovery.
Results: We developed three implementations of a probabilistic Binomial mixture model, called SNVMix, designed to infer SNVs from NGS data from tumors to address this problem. The first models allelic counts as observations and infers SNVs and model parameters using an expectation maximization (EM) algorithm and is therefore capable of adjusting to deviation of allelic frequencies inherent in genomically unstable tumor genomes. The second models nucleotide and mapping qualities of the reads by probabilistically weighting the contribution of a read/nucleotide to the inference of a SNV based on the confidence we have in the base call and the read alignment. The third combines filtering out low-quality data in addition to probabilistic weighting of the qualities. We quantitatively evaluated these approaches on 16 ovarian cancer RNASeq datasets with matched genotyping arrays and a human breast cancer genome sequenced to >40× (haploid) coverage with ground truth data and show systematically that the SNVMix models outperform competing approaches.
Availability: Software and data are available at http://compbio.bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
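The first SNVMix variant described above fits a Binomial mixture to allelic counts with expectation maximization. The sketch below shows the general technique for a three-component mixture (homozygous reference, heterozygous, homozygous variant). It is a plain maximum-likelihood EM written for illustration, not the SNVMix code: the starting values, the clamping of the Binomial parameters and the function name are assumptions, and it does not model base or mapping qualities.

```python
from math import comb


def em_binomial_mixture(counts, mu=(0.01, 0.5, 0.99), pi=(0.9, 0.05, 0.05),
                        iterations=50):
    """Fit a 3-component Binomial mixture to (variant_reads, depth) pairs.

    Returns (pi, mu, responsibilities), where responsibilities come from the
    last E-step. Starting values for mu and pi are illustrative assumptions.
    """
    mu, pi = list(mu), list(pi)
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each genotype at each position.
        resp = []
        for k, n in counts:
            weights = [
                pi[g] * comb(n, k) * mu[g] ** k * (1 - mu[g]) ** (n - k)
                for g in range(3)
            ]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: re-estimate mixing weights and variant allele rates.
        for g in range(3):
            weight = sum(r[g] for r in resp)
            reads = sum(r[g] * n for r, (k, n) in zip(resp, counts))
            variants = sum(r[g] * k for r, (k, n) in zip(resp, counts))
            pi[g] = weight / len(counts)
            if reads > 0:
                # Clamp so that no component collapses to exactly 0 or 1.
                mu[g] = min(max(variants / reads, 1e-6), 1 - 1e-6)
    return pi, mu, resp


# Hypothetical counts: heterozygous sites skewed to about 33% variant reads.
counts = [(0, 30)] * 20 + [(10, 30)] * 5 + [(30, 30)] * 3
pi_hat, mu_hat, resp = em_binomial_mixture(counts)
```

In this toy input the heterozygous component moves from its starting rate of 0.5 towards the observed skewed fraction, which is the kind of adjustment to deviating allelic frequencies that the abstract describes.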
A method for de novo assembly of a eukaryotic genome using Illumina, 454 and Sanger generated sequence data
Sequencing-by-synthesis technologies can reduce the cost of generating de novo genome assemblies. We report a method for assembling draft genome sequences of eukaryotic organisms that integrates sequence information from different sources, and demonstrate its effectiveness by assembling an approximately 32.5 Mb draft genome sequence for the forest pathogen Grosmannia clavigera, an ascomycete fungus. We also developed a method for assessing draft assemblies using Illumina paired end read data and demonstrate how we are using it to guide future sequence finishing. Our results demonstrate that eukaryotic genome sequences can be accurately assembled by combining Illumina, 454 and Sanger sequence data.
Foxa2 (HNF3β) is one of three closely related transcription factors that are critical to the development and function of the mouse liver. We have used chromatin immunoprecipitation and massively parallel Illumina 1G sequencing (ChIP–Seq) to create a genome-wide profile of in vivo Foxa2-binding sites in the adult liver. More than 65% of the ∼11.5 k genomic sites associated with Foxa2 binding mapped to extended gene regions of annotated genes, while more than 30% of intragenic sites were located within first introns. Of all sites, 20.5% were further than 50 kb from any annotated gene, suggesting an association with novel gene regions. QPCR analysis demonstrated a strong positive correlation between peak height and fold enrichment for Foxa2-binding sites. We measured the relationship between Foxa2 and liver gene expression by overlapping Foxa2-binding sites with a SAGE transcriptome profile, and found that 43.5% of genes expressed in the liver were also associated with Foxa2 binding. We also identified potential Foxa2-interacting transcription factors whose motifs were enriched near Foxa2-binding sites. Our comprehensive results for in vivo Foxa2-binding sites in the mouse liver will contribute to resolving transcriptional regulatory networks that are important for adult liver function.
We have applied a high-throughput pyrosequencing technology for transcriptome profiling of Caenorhabditis elegans in its first larval stage. Using this approach, we have generated a large amount of data for expressed sequence tags, which provides an opportunity for the discovery of putative novel transcripts and alternative splice variants that could be developmentally specific to the first larval stage. This work also demonstrates the successful and efficient application of a next generation sequencing methodology.
We have generated over 30 million bases of novel expressed sequence tags from first larval stage worms utilizing high-throughput sequencing technology. We have shown that approximately 14% of the newly sequenced expressed sequence tags map completely or partially to genomic regions where there are no annotated genes or splice variants, implying that these are novel genetic structures. Expressed sequence tags that map to intergenic (around 1000) and intronic regions (around 580) may represent novel transcribed regions, such as unannotated or unrecognized small protein-coding or non-protein-coding genes or splice variants. Expressed sequence tags that map across intron-exon boundaries (around 300) indicate possible alternative splice sites, while expressed sequence tags that map near the ends of known transcripts (around 600) suggest extension of the coding or untranslated regions. We have also discovered that intergenic and intronic expressed sequence tags that are well conserved across different nematode species are likely to represent non-coding RNAs. Lastly, we have incorporated available serial analysis of gene expression data generated from first larval stage worms in order to predict novel transcripts that might be specifically or predominantly expressed in the first larval stage.
We have demonstrated the use of a high-throughput sequencing methodology to efficiently produce a snap-shot of transcriptional activities occurring in the first larval stage of C. elegans development. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. This study provides a more complete C. elegans transcriptome profile and, furthermore, gives insight into the evolutionary and biological complexity of this organism.
The eukaryotic genome is packaged as chromatin with nucleosomes comprising its basic structural unit, but the detailed structure of chromatin and its dynamic remodeling in terms of individual nucleosome positions has not been completely defined experimentally for any genome. We used ultra-high–throughput sequencing to map the remodeling of individual nucleosomes throughout the yeast genome before and after a physiological perturbation that causes genome-wide transcriptional changes. Nearly 80% of the genome is covered by positioned nucleosomes occurring in a limited number of stereotypical patterns in relation to transcribed regions and transcription factor binding sites. Chromatin remodeling in response to physiological perturbation was typically associated with the eviction, appearance, or repositioning of one or two nucleosomes in the promoter, rather than broader region-wide changes. Dynamic nucleosome remodeling tends to increase the accessibility of binding sites for transcription factors that mediate transcriptional changes. However, specific nucleosomal rearrangements were also evident at promoters even when there was no apparent transcriptional change, indicating that there is no simple, globally applicable relationship between chromatin remodeling and transcriptional activity. Our study provides a detailed, high-resolution, dynamic map of single-nucleosome remodeling across the yeast genome and its relation to global transcriptional changes.
The eukaryotic genome is packed in a systematic hierarchy to accommodate it within the confines of the cell's nucleus. This packing, however, presents an impediment to the transcription machinery when it must access genomic DNA to regulate gene expression. A fundamental aspect of genome packing is the spooling of DNA around nucleosomes—structures formed from histone proteins—which must be dislodged during transcription. In this study, we identified all the nucleosome displacements associated with a physiological perturbation causing genome-wide transcriptional changes in the eukaryote Saccharomyces cerevisiae. We isolated nucleosomal DNA before and after subjecting cells to heat shock, then identified the ends of these DNA fragments and, thereby, the location of nucleosomes along the genome, using ultra-high–throughput sequencing. We identified localized patterns of nucleosome displacement at gene promoters in response to heat shock, and found that nucleosome eviction was generally associated with activation and their appearance with gene repression. Nucleosome remodeling generally improved the accessibility of DNA to transcriptional regulators mediating the response to stresses like heat shock. However, not all nucleosomal remodeling was associated with transcriptional changes, indicating that the relationship between nucleosome repositioning and transcriptional activity is not merely a reflection of competing access to DNA.
Ultra-high-throughput sequencing is used to show that distinct, localized patterns of nucleosome repositioning at promoters underlie the genome-wide transcriptional response to a physiological stimulus.
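The idea of locating nucleosomes from the ends of sequenced fragments, and then comparing positions before and after heat shock, can be illustrated with a minimal sketch. This is not the authors' pipeline, which the summary does not describe. It assumes a nucleosome core of roughly 147 bp, so the centre (dyad) lies about 73 bp downstream of a forward-strand read start and 73 bp upstream of a reverse-strand read start. The function names, the `min_reads` threshold, and the `tolerance` window are all hypothetical choices made for illustration.

```python
from collections import Counter

# Assumption: nucleosome core DNA is ~147 bp, so the dyad sits ~73 bp from either fragment end.
NUCLEOSOME_LENGTH = 147
HALF = NUCLEOSOME_LENGTH // 2


def estimate_dyads(reads):
    """Count estimated dyad positions from (five_prime_position, strand) pairs."""
    dyads = Counter()
    for pos, strand in reads:
        if strand == "+":
            dyads[pos + HALF] += 1
        elif strand == "-":
            dyads[pos - HALF] += 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return dyads


def call_positions(dyads, min_reads=3, min_spacing=NUCLEOSOME_LENGTH):
    """Greedily keep the best-supported dyads that do not overlap one another."""
    chosen = []
    # Visit candidates from most to least supported; ties broken by coordinate.
    for pos, count in sorted(dyads.items(), key=lambda kv: (-kv[1], kv[0])):
        if count < min_reads:
            break
        if all(abs(pos - other) >= min_spacing for other in chosen):
            chosen.append(pos)
    return sorted(chosen)


def classify_changes(before, after, tolerance=20):
    """Return (evicted, appeared): calls with no partner within `tolerance` bp."""
    evicted = [p for p in before if all(abs(p - q) > tolerance for q in after)]
    appeared = [q for q in after if all(abs(q - p) > tolerance for p in before)]
    return evicted, appeared


# Toy reads (hypothetical coordinates), not real data.
toy_reads = [(100, "+")] * 3 + [(246, "-")] * 2 + [(400, "+")] * 4 + [(110, "+")] * 3
toy_calls = call_positions(estimate_dyads(toy_reads))
```

Running `classify_changes` on the calls from two conditions gives a crude list of candidate eviction and appearance events; a real analysis would also need to handle read-depth normalization, fuzzy positioning, and repositioning over short distances.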
Analysis of a 2.5 million LongSAGE sequence tag resource generated from nine human embryonic stem cell lines reveals an enrichment of RNA binding proteins and novel ES-specific transcripts.
To facilitate discovery of novel human embryonic stem cell (ESC) transcripts, we generated 2.5 million LongSAGE tags from 9 human ESC lines. Analysis of these data revealed that ESCs express proportionately more RNA binding proteins than terminally differentiated cells, and identified novel ESC transcripts, at least one of which may represent a marker of the pluripotent state.
Prostate cancer is the most frequently diagnosed cancer in American men, and few effective treatment options are available to patients who develop hormone-refractory prostate cancer. The molecular changes that occur to allow prostate cells to proliferate in the absence of androgens are not fully understood.
Subtractive hybridization experiments performed with samples from an in vivo model of hormonal progression identified 25 expressed sequences representing novel human transcripts. Intriguingly, these 25 sequences have small open reading frames and are not highly conserved through evolution, suggesting that many of them may be derived from untranslated regions of novel transcripts or from non-coding transcripts. Examination of a large metalibrary of human Serial Analysis of Gene Expression (SAGE) tags demonstrated that only three of these novel sequences had been previously detected. RT-PCR experiments confirmed that the 6 sequences tested were expressed in specific human tissues, as well as in clinical samples of prostate cancer. Further RT-PCR experiments for five of these fragments indicated that they originated from large untranslated regions of unannotated transcripts.
This study underlines the value of using complementary techniques in the annotation of the human genome. The tissue-specific expression of 4 of the 6 clones tested indicates the expression of these novel transcripts is tightly regulated, and future work will determine the possible role(s) these novel transcripts may play in the progression of prostate cancer.
High throughput sequencing-by-synthesis is an emerging technology that allows the rapid production of millions of bases of data. Although the sequence reads are short, they can readily be used for re-sequencing. By re-sequencing the mRNA products of a cell, one may rapidly discover polymorphisms and splice variants particular to that cell.
We present the utility of massively parallel sequencing by synthesis for profiling the transcriptome of a human prostate cancer cell line, LNCaP, that has been treated with the synthetic androgen R1881. Through the generation of approximately 20 megabases (Mb) of EST data, we detect transcription from over 10,000 gene loci, 25 previously undescribed alternative splicing events involving known exons, and over 1,500 high-quality single nucleotide discrepancies with the reference human sequence. Further, we map nearly 10,000 ESTs to positions on the genome where no transcription is currently predicted to occur. We also characterize various obstacles to using sequencing by synthesis for transcriptome analysis and propose solutions to these problems.
The use of high-throughput sequencing-by-synthesis methods for transcript profiling allows the specific and sensitive detection of many of a cell's transcripts, and also allows the discovery of high-quality base discrepancies and alternative splice variants. Thus, this technology may provide an effective means of understanding various disease states, discovering novel targets for disease treatment, and discovering novel transcripts.