Ovarian cancer is the fifth leading cause of cancer death in women. Almost 70% of ovarian cancer deaths are due to the high-grade serous subtype, which is typically detected only after it has metastasized. Characterization of high-grade serous cancer is further complicated by the significant heterogeneity and genome instability displayed by this cancer. Other than mutations in TP53, which are common to many cancers, highly recurrent recombinant events specific to this cancer have yet to be identified. Using high-throughput transcriptome sequencing of seven patient samples combined with experimental validation at DNA, RNA and protein levels, we identified a cancer-specific and inter-chromosomal fusion gene CDKN2D-WDFY2 that occurs at a frequency of 20% among sixty high-grade serous cancer samples but is absent in non-cancerous ovary and fallopian tube samples. This is the most frequent recombinant event identified so far in high-grade serous cancer, implying a major cellular lineage in this highly heterogeneous cancer. In addition, the same fusion transcript was also detected in OV-90, an established high-grade serous type cell line. The genomic breakpoint was identified in intron 1 of CDKN2D and intron 2 of WDFY2 in a patient tumor, providing direct evidence that this is a fusion gene. The parental gene, CDKN2D, is a cell-cycle modulator that is also involved in DNA repair, while WDFY2 is known to modulate AKT interactions with its substrates. Transfection of the cloned fusion construct led to loss of wildtype CDKN2D and wildtype WDFY2 protein expression, and a gain of a short WDFY2 protein isoform that is presumably under the control of the CDKN2D promoter. The expression of short WDFY2 protein in transfected cells appears to alter the PI3K/AKT pathway that is known to play a role in oncogenesis. CDKN2D-WDFY2 fusion could be an important molecular signature for understanding and classifying sub-lineages among heterogeneous high-grade serous ovarian carcinomas.
High-grade serous carcinoma (HG-SC) is the most common subtype of ovarian cancer observed in women. This subtype of ovarian cancer is typically detected at advanced stages due to lack of effective early screening tools. Recurrent cancer-specific gene fusions resulting from chromosomal translocations have the potential to serve as effective screening tools as well as therapeutic targets. Here we identified CDKN2D-WDFY2 as a cancer-specific fusion gene present in 20% of HG-SC tumors, by far the most frequent gene recombinant event found in this highly heterogeneous disease. We also presented evidence that the expression of this fusion may affect the PI3K/AKT pathway that is important for cancer progression. Thus CDKN2D-WDFY2 could very well represent a major cellular lineage important for detecting and classifying heterogeneous ovarian carcinomas, and could provide insight into the underlying mechanism of this deadly disease. This is critical, given that ovarian cancer kills 140,200 women worldwide each year, and few ovarian cancer-specific molecular alterations are currently available for targeting and screening.
Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered >70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² ≥ 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8–12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage.
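The concordance measure reported above (r² between the percentage methylation estimates of two mappers) can be illustrated with a minimal sketch. It assumes that each mapper's output has already been reduced to per-CpG counts of methylated and unmethylated reads; the dictionary layout and the function names are hypothetical and are not part of any of the five tools compared.

```python
from math import fsum


def percent_methylation(meth_reads, unmeth_reads):
    """Percentage of reads at one CpG that support methylation."""
    total = meth_reads + unmeth_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return 100.0 * meth_reads / total


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


def concordance(calls_a, calls_b):
    """Compare two mappers on the CpG sites that both of them cover.

    calls_a, calls_b: dict mapping (chrom, pos) -> (meth_reads, unmeth_reads).
    Returns (number of shared sites, r squared of percentage methylation).
    """
    shared = sorted(set(calls_a) & set(calls_b))
    pct_a = [percent_methylation(*calls_a[site]) for site in shared]
    pct_b = [percent_methylation(*calls_b[site]) for site in shared]
    return len(shared), r_squared(pct_a, pct_b)
```

Restricting the comparison to shared sites keeps coverage differences between mappers from leaking into the accuracy estimate; coverage can be reported separately as the fraction of CpG sites present in each dictionary.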
Loss of Dicer, an enzyme critical for microRNA biogenesis, results in lethality due to a block in mouse embryonic stem cell (mES) differentiation. Using ChIP-Seq we found increased H3K9me2 at over 900 CpG islands in the Dicer-/-ES epigenome. Gene ontology analysis revealed promoters of chromatin regulators to be among the most impacted by increased CpG island H3K9me2 in ES (Dicer-/-). We therefore extended the study to include H3K4me3 and H3K27me3 marks for selected genes. We found that the ES (Dicer-/-) mutant epigenome was characterized by a shift in the overall balance between transcriptionally favorable (H3K4me3) and unfavorable (H3K27me3) marks at key genes regulating ES cell differentiation. Pluripotency genes Oct4, Sox2 and Nanog were not impacted in relation to patterns of H3K27me3 and H3K4me3 and showed no changes in the rates of transcript down-regulation in response to retinoic acid (RA). The most striking changes were observed in genes regulating differentiation and the transition from self-renewal to differentiation. An increase in H3K4me3 at the promoter of Lin28b was associated with the down-regulation of this gene at a lower rate in Dicer-/-ES as compared to wild type ES. An increase in H3K27me3 in the promoters of differentiation genes Hoxa1 and Cdx2 in Dicer-/-ES cells was coincident with an inability to up-regulate these genes at the same rate as wild type ES upon RA-induced differentiation. We found that siRNAs against Ezh2 and post-transcriptional silencing of Ezh2 by let-7g rescued this effect, suggesting that Ezh2 up-regulation is in part responsible for increased H3K27me3 and decreased rates of up-regulation of differentiation genes in Dicer-/-ES.
Playing a central role in the maintenance of hemostasis as well as in thrombotic disorders, platelets contain a relatively diverse messenger RNA (mRNA) transcriptome as well as functional mRNA-regulatory microRNAs, suggesting that platelet mRNAs may be regulated by microRNAs. Here, we elucidated the complete repertoire and features of human platelet microRNAs by high-throughput sequencing. More than 492 different mature microRNAs were detected in human platelets, whereas the list of known human microRNAs was expanded further by the discovery of 40 novel microRNA sequences. As in nucleated cells, platelet microRNAs bear signs of post-transcriptional modifications, mainly terminal adenylation and uridylation. In vitro enzymatic assays demonstrated the ability of human platelets to uridylate microRNAs, which correlated with the presence of the uridyltransferase enzyme TUT4. We also detected numerous microRNA isoforms (isomiRs) resulting from imprecise Drosha and/or Dicer processing, in some cases more frequently than the reference microRNA sequence, including 5′ shifted isomiRs with redirected mRNA targeting abilities. This study unveils the existence of a relatively diverse and complex microRNA repertoire in human platelets, and represents a mandatory step towards elucidating the intraplatelet and extraplatelet role, function and importance of platelet microRNAs.
BACKGROUND AND AIMS
The intestinal microbiomes of healthy children and pediatric patients with irritable bowel syndrome (IBS) are not well defined. Studies in adults have indicated that the gastrointestinal microbiota could be involved in IBS.
We analyzed 71 samples from 22 children with IBS (pediatric Rome III criteria) and 22 healthy children, ages 7–12 years, by 16S rRNA gene sequencing, with an average of 54,287 reads/stool sample (average 454 read length = 503 bases). Data were analyzed using phylogenetic-based clustering (Unifrac), or an operational taxonomic unit (OTU) approach using a supervised machine learning tool (randomForest). Most samples were also hybridized to a microarray that can detect 8,741 bacterial taxa (16S rRNA PhyloChip).
Microbiomes associated with pediatric IBS were characterized by a significantly greater percentage of the class Gammaproteobacteria (0.07% vs 0.89% of total bacteria; P <.05); one prominent component of this group was Haemophilus parainfluenzae. Differences highlighted by 454 sequencing were confirmed by high-resolution PhyloChip analysis. Using supervised learning techniques, we were able to classify different subtypes of IBS with a success rate of 98.5%, using limited sets of discriminant bacterial species. A novel Ruminococcus-like microbe was associated with IBS, indicating the potential utility of microbe discovery for gastrointestinal disorders. A greater frequency of pain correlated with an increased abundance of several bacterial taxa from the genus Alistipes.
Using 16S metagenomics by PhyloChip DNA hybridization and deep 454 pyrosequencing, we associated specific microbiome signatures with pediatric IBS. These findings indicate the important association between gastrointestinal microbes and IBS in children; these approaches might be used in diagnosis of functional bowel disorders in pediatric patients.
Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation, remain serious bottlenecks. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues.
We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via Amazon Web Services, using the Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set.
We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Microbial metagenomic analyses rely on an increasing number of publicly available tools. Installation, integration, and maintenance of the tools pose a significant burden on many researchers and create a barrier to adoption of microbiome analysis, particularly in translational settings.
To address this need we have integrated a rich collection of microbiome analysis tools into the Genboree Microbiome Toolset and exposed them to the scientific community using the Software-as-a-Service model via the Genboree Workbench. The Genboree Microbiome Toolset provides an interactive environment for users at all bioinformatic experience levels in which to conduct microbiome analysis. The Toolset drives hypothesis generation by providing a wide range of analyses including alpha diversity and beta diversity, phylogenetic profiling, supervised machine learning, and feature selection.
We validate the Toolset in two studies of the gut microbiota, one involving obese and lean twins, and the other involving children suffering from the irritable bowel syndrome.
By lowering the barrier to performing a comprehensive set of microbiome analyses, the Toolset empowers investigators to translate high-volume sequencing data into valuable biomedical discoveries.
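Two of the analyses named above, alpha diversity and beta diversity, reduce to short formulas over per-sample taxon counts. The sketch below uses the Shannon index and the Bray-Curtis dissimilarity as representative measures; this choice is an assumption made for illustration, since the text does not state which measures the Toolset computes, and the function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Alpha diversity of one sample: H = -sum(p * ln p) over observed taxa."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Beta diversity between two samples: 0 = identical, 1 = no shared taxa.

    Both lists must give counts for the same taxa in the same order.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must be indexed by the same taxa")
    denominator = sum(counts_a) + sum(counts_b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b)) / denominator
```

In practice the counts would first be normalized for sequencing depth, because both measures are sensitive to the number of reads per sample.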
Dynamic changes to the epigenome play a critical role in establishing and maintaining cellular phenotype during differentiation, but little is known about the normal methylomic differences that occur between functionally distinct areas of the brain. We characterized intra- and inter-individual methylomic variation across whole blood and multiple regions of the brain from multiple donors.
Distinct tissue-specific patterns of DNA methylation were identified, with a highly significant over-representation of tissue-specific differentially methylated regions (TS-DMRs) observed at intragenic CpG islands and low CG density promoters. A large proportion of TS-DMRs were located near genes that are differentially expressed across brain regions. TS-DMRs were significantly enriched near genes involved in functional pathways related to neurodevelopment and neuronal differentiation, including BDNF, BMP4, CACNA1A, CACNA1F, EOMES, NGFR, NUMBL, PCDH9, SLIT1, SLITRK1 and SHANK3. Although between-tissue variation in DNA methylation was found to greatly exceed between-individual differences within any one tissue, we found that some inter-individual variation was reflected across brain and blood, indicating that peripheral tissues may have some utility in epidemiological studies of complex neurobiological phenotypes.
This study reinforces the importance of DNA methylation in regulating cellular phenotype across tissues, and highlights genomic patterns of epigenetic variation across functionally distinct regions of the brain, providing a resource for the epigenetics and neuroscience research communities.
While current major national research efforts (i.e., the NIH Human Microbiome Project) will enable comprehensive metagenomic characterization of the adult human microbiota, how and when these diverse microbial communities take up residence in the host and during reproductive life are unexplored at a population level. Because microbial abundance and diversity might differ in pregnancy, we sought to generate comparative metagenomic signatures across gestational age strata. DNA was isolated from the vagina (introitus, posterior fornix, midvagina) and the V5V3 region of bacterial 16S rRNA genes was sequenced (454FLX Titanium platform). Sixty-eight samples from 24 healthy gravidae (18 to 40 confirmed weeks) were compared with 301 non-pregnant controls (60 subjects). Generated sequence data were quality filtered, taxonomically binned, normalized, and organized by phylogeny and into operational taxonomic units (OTU); principal coordinates analysis (PCoA) of the resultant beta diversity measures was used for visualization and analysis in association with sample clinical metadata. Altogether, 1.4 gigabytes of data containing >2.5 million reads (averaging 6,837 sequences/sample of 493 nt in length) were generated for computational analyses. Although gravidae were not excluded by virtue of a posterior fornix pH >4.5 at the time of screening, a unique vaginal microbiome signature encompassing several specific OTUs and higher-level clades was nevertheless observed and confirmed using a combination of phylogenetic, non-phylogenetic, supervised, and unsupervised approaches. Both overall diversity and richness were reduced in pregnancy, with dominance of Lactobacillus species (L. iners, crispatus, jensenii and johnsonii), and the orders Lactobacillales (and Lactobacillaceae family), Clostridiales, Bacteroidales, and Actinomycetales.
This intergroup comparison using rigorous standardized sampling protocols and analytical methodologies provides robust initial evidence that the vaginal microbial 16S rRNA gene catalogue uniquely differs in pregnancy, with variance of taxa across vaginal subsite and gestational age.
The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR–mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.
The human genome contains many loci with high incidence of structural mutations, including insertions and deletions of chromosomal segments. This excessive mutability has accelerated evolution and contributed to human disease but has yet to be explained. Segments of DNA repeated in low-copy numbers (LCRs) have been previously implicated in promoting structural mutability in specific disease-associated loci. Lack of methylation (hypomethylation) of genomic DNA has been previously associated with high structural mutability in gibbons and in human cancer cells, but the association with structural mutability in the human germline has not been explored prior to this study. Our analyses confirm the role of LCRs in promoting structural mutability on the genome scale but also reveal a surprisingly strong association of genomic instability with hypomethylation. Specifically, evolutionary analyses reveal that methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in human sperm, harbor a tenfold higher number of structural mutations than genome-wide average. Moreover, the structural mutations in individuals diagnosed with schizophrenia, bipolar disorder, developmental delay, and autism are significantly more concentrated within hypomethylated regions. Our findings suggest a new connection between methylation of genomic DNA, selective structural mutability, evolution, and human disease.
Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data.
Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%).
We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
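The combination of a logistic regression model with a user-adjustable cutoff, described above, can be sketched as follows. The feature names, the intercept and the weights are hypothetical values chosen only to make the example run; they are not the trained Atlas2 coefficients, and the real suite may use different predictors.

```python
import math

# Hypothetical coefficients for illustration only (not the Atlas2 values).
INTERCEPT = -4.0
WEIGHTS = {
    "variant_read_fraction": 6.0,    # fraction of reads carrying the variant
    "mean_base_quality": 0.1,        # mean Phred quality of variant bases
    "near_read_end_fraction": -2.0,  # fraction of variant bases near read ends
}


def posterior(features):
    """Logistic model: probability that a candidate site is a true variant."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def call_variant(features, cutoff=0.5):
    """Accept the candidate when its posterior reaches the chosen cutoff."""
    return posterior(features) >= cutoff
```

Raising the cutoff trades sensitivity for specificity, which is the adjustment the user-facing parameter is meant to expose.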
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
DNA methylation; Sequencing; Bisulfite
Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and MacOSX, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/.
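The basic idea of inferring copy number from depth of coverage without a reference sample can be illustrated with a deliberately simplified sketch. It counts read start positions in fixed-size bins and scales each bin by the median bin count. This is not the readDepth algorithm: it omits the overdispersed statistical model, segmentation and breakpoint refinement described above, and the function names and the median baseline are assumptions made for illustration.

```python
from statistics import median


def bin_counts(read_starts, chrom_length, bin_size):
    """Count read start positions (0-based) in consecutive fixed-size bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def copy_number_estimates(counts, ploidy=2):
    """Scale each bin by the median count, assumed to reflect normal ploidy."""
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median coverage is zero; use larger bins")
    return [ploidy * c / baseline for c in counts]
```

Using the sample's own median as the baseline is what removes the need for a matched reference in this sketch; it breaks down when more than half of the bins are altered.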
Only thirteen microRNAs are conserved between D. melanogaster and the mouse; however, conditional loss of miRNA function through mutation of Dicer causes defects in proliferation of premeiotic germ cells in both species. This highlights the potentially important, but uncharacterized, role of miRNAs during early spermatogenesis. The goal of this study was to characterize on postnatal day 7, 10, and 14 the content and editing of murine testicular miRNAs, which predominantly arise from spermatogonia and spermatocytes, in contrast to prior descriptions of miRNAs in the adult mouse testis, which largely reflect the content of spermatids. Previous studies have shown miRNAs to be abundant in the mouse testis by postnatal day 14; however, through Next Generation Sequencing of testes from a B6;129 background we found abundant earlier expression of miRNAs and describe shifts in the miRNA signature during this period. We detected robust expression of miRNAs encoded on the X chromosome in postnatal day 14 testes, consistent with prior studies showing their resistance to meiotic sex chromosome inactivation. Unexpectedly, we also found a similar positional enrichment for most miRNAs on chromosome 2 at postnatal day 14 and for those on chromosome 12 at postnatal day 7. We quantified in vivo developmental changes in three types of miRNA variation including 5′ heterogeneity, editing, and 3′ nucleotide addition. We identified eleven putative novel pubertal testis miRNAs whose developmental expression suggests a possible role in early male germ cell development. These studies provide a foundation for interpretation of miRNA changes associated with testicular pathology and identification of novel components of the miRNA editing machinery in the testis.
Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing.
Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms.
We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.
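The general principle of gapped k-mer matching with hash tables can be illustrated with a small sketch. A sampling pattern such as "1101" keeps the characters under each "1" and ignores those under each "0", so a mismatch at an ignored position does not prevent a hit. This is a toy illustration of the technique only: the pattern, the function names and the diagonal-voting step are assumptions, and the sketch does not reproduce the Pash 3.0 implementation or its multi-positional hash tables.

```python
from collections import defaultdict


def gapped_key(seq, start, pattern):
    """Characters of seq under the '1' positions of pattern, from start."""
    return "".join(seq[start + i] for i, keep in enumerate(pattern) if keep == "1")


def build_index(reference, pattern):
    """Hash table: gapped k-mer -> list of reference start positions."""
    index = defaultdict(list)
    for start in range(len(reference) - len(pattern) + 1):
        index[gapped_key(reference, start, pattern)].append(start)
    return index


def candidate_offsets(read, index, pattern):
    """Count gapped k-mer hits per diagonal (reference position - read position)."""
    votes = defaultdict(int)
    for start in range(len(read) - len(pattern) + 1):
        for ref_pos in index.get(gapped_key(read, start, pattern), ()):
            votes[ref_pos - start] += 1
    return dict(votes)
```

The diagonal with the most votes is the most likely placement of the read; a full mapper would then verify it with a base-level alignment.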
Nuage are amorphous ultrastructural granules in the cytoplasm of male germ cells as divergent as Drosophila, Xenopus, and Homo sapiens. Most nuage are cytoplasmic ribonucleoprotein structures implicated in diverse RNA metabolism including the regulation of PIWI-interacting RNA (piRNA) synthesis by the PIWI family (i.e., MILI, MIWI2, and MIWI). MILI is prominent in embryonic and early post-natal germ cells, in nuage (also called germinal granules) that are often associated with mitochondria and are then called intermitochondrial cement. We find that GASZ (Germ cell protein with Ankyrin repeats, Sterile alpha motif, and leucine Zipper) co-localizes with MILI in intermitochondrial cement. Knockout of Gasz in mice results in a dramatic downregulation of MILI, and phenocopies the zygotene–pachytene spermatocyte block and male sterility defect observed in MILI null mice. In Gasz null testes, we observe increased hypomethylation and expression of retrotransposons similar to MILI null testes. We also find global shifts in the small RNAome, including down-regulation of repeat-associated, known, and novel piRNAs. These studies provide the first evidence for an essential structural role for GASZ in male fertility and in epigenetic and post-transcriptional silencing of retrotransposons by stabilizing MILI in nuage.
Many aspects of RNA processing are essential for or prominent in the differentiation of germ cells. Some RNA metabolism in animal germ cells is associated with physical structures surrounding the cell nucleus called nuage. Nuage has a distinct granular appearance prior to the meiotic divisions with unclear functions. We have identified a protein called GASZ, which plays a structural role in this early nuage. In mice lacking GASZ, retrotransposons—endogenous viral-like particles—become released from their typical repressed state in the germline by the loss of small RNAs called piRNAs, resulting in DNA damage and delayed germ cell maturation. Protection of the germline from genetic intruders may require the association of piRNA-synthesizing enzymes and other components of this nuage structure through direct or indirect associations with GASZ. Mutations in GASZ and other nuage components may contribute to infertility in men who do not produce spermatozoa.
MicroRNAs modulate tumorigenesis through suppression of specific genes. As many tumour types rely on overlapping oncogenic pathways, a core set of microRNAs may exist, which consistently drives or suppresses tumorigenesis in many cancer types. Here we integrate The Cancer Genome Atlas (TCGA) pan-cancer data set with a microRNA target atlas composed of publicly available Argonaute Crosslinking Immunoprecipitation (AGO-CLIP) data to identify pan-tumour microRNA drivers of cancer. Through this analysis, we show a pan-cancer, coregulated oncogenic microRNA ‘superfamily’ consisting of the miR-17, miR-19, miR-130, miR-93, miR-18, miR-455 and miR-210 seed families, which cotargets critical tumour suppressors via a central GUGC core motif. We subsequently define mutations in microRNA target sites using the AGO-CLIP microRNA target atlas and TCGA exome-sequencing data. These combined analyses identify pan-cancer oncogenic cotargeting of the phosphoinositide 3-kinase, TGFβ and p53 pathways by the miR-17-19-130 superfamily members.
AGO-CLIP permits the identification of miRNA target genes. Here, Hamilton et al. compile publicly available AGO-CLIP data and combine this information with miRNA analysis from The Cancer Genome Atlas, permitting the identification of an oncogenic miRNA superfamily that targets tumour suppressor genes.
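Grouping microRNAs into seed families and checking for a shared core motif, as described above for the GUGC core, can be sketched in a few lines. The sketch assumes the common convention that the seed spans nucleotides 2 to 8 of the mature sequence; the sequences in the usage example are invented toy strings, not real microRNAs, and the function names are hypothetical.

```python
from collections import defaultdict


def seed(mature_seq, start=2, end=8):
    """Seed region of a mature microRNA (1-based, inclusive positions)."""
    return mature_seq[start - 1:end]


def group_by_seed(mirnas):
    """Map each seed sequence to the sorted names of microRNAs that carry it."""
    families = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed(sequence)].append(name)
    return {s: sorted(names) for s, names in families.items()}


def with_core_motif(mirnas, core="GUGC"):
    """Names of microRNAs whose seed contains the given core motif."""
    return sorted(name for name, sequence in mirnas.items() if core in seed(sequence))


# Invented sequences for illustration only (not real microRNAs).
toy = {
    "toy-1": "UAAGUGCAUCGGAUCCGAUACG",
    "toy-2": "CAGUGCAUUGGCCAUAGGCUAA",
    "toy-3": "UACCUAGGAUCCGGAUUACGGA",
    "toy-4": "GAAGUGCAGGCCUUAAGGCCUU",
}
```

Here toy-1 and toy-4 share a seed and so form one family, while toy-2 belongs to a different family that still carries the core motif, which is the sense in which several seed families can be collected into a superfamily.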