Loss of Dicer, an enzyme critical for microRNA biogenesis, results in lethality due to a block in mouse embryonic stem cell (mES) differentiation. Using ChIP-Seq, we found increased H3K9me2 at over 900 CpG islands in the Dicer-/- ES epigenome. Gene ontology analysis revealed promoters of chromatin regulators to be among the most impacted by increased CpG island H3K9me2 in Dicer-/- ES cells. We therefore extended the study to include H3K4me3 and H3K27me3 marks for selected genes. We found that the Dicer-/- ES mutant epigenome was characterized by a shift in the overall balance between transcriptionally favorable (H3K4me3) and unfavorable (H3K27me3) marks at key genes regulating ES cell differentiation. Pluripotency genes Oct4, Sox2 and Nanog were not impacted in their patterns of H3K27me3 and H3K4me3 and showed no changes in the rates of transcript down-regulation in response to retinoic acid (RA). The most striking changes were observed in genes regulating differentiation and the transition from self-renewal to differentiation. An increase in H3K4me3 at the promoter of Lin28b was associated with down-regulation of this gene at a lower rate in Dicer-/- ES cells than in wild-type ES cells. An increase in H3K27me3 in the promoters of differentiation genes Hoxa1 and Cdx2 in Dicer-/- ES cells was coincident with an inability to up-regulate these genes at the same rate as wild-type ES cells upon RA-induced differentiation. We found that siRNAs against Ezh2 and post-transcriptional silencing of Ezh2 by let-7g rescued this effect, suggesting that Ezh2 up-regulation is in part responsible for increased H3K27me3 and decreased rates of up-regulation of differentiation genes in Dicer-/- ES cells.
Playing a central role in the maintenance of hemostasis as well as in thrombotic disorders, platelets contain a relatively diverse messenger RNA (mRNA) transcriptome as well as functional mRNA-regulatory microRNAs, suggesting that platelet mRNAs may be regulated by microRNAs. Here, we elucidated the complete repertoire and features of human platelet microRNAs by high-throughput sequencing. More than 492 different mature microRNAs were detected in human platelets, whereas the list of known human microRNAs was expanded further by the discovery of 40 novel microRNA sequences. As in nucleated cells, platelet microRNAs bear signs of post-transcriptional modifications, mainly terminal adenylation and uridylation. In vitro enzymatic assays demonstrated the ability of human platelets to uridylate microRNAs, which correlated with the presence of the uridyltransferase enzyme TUT4. We also detected numerous microRNA isoforms (isomiRs) resulting from imprecise Drosha and/or Dicer processing, in some cases more frequently than the reference microRNA sequence, including 5′ shifted isomiRs with redirected mRNA targeting abilities. This study unveils the existence of a relatively diverse and complex microRNA repertoire in human platelets, and represents a mandatory step towards elucidating the intraplatelet and extraplatelet role, function and importance of platelet microRNAs.
BACKGROUND AND AIMS
The intestinal microbiomes of healthy children and pediatric patients with irritable bowel syndrome (IBS) are not well defined. Studies in adults have indicated that the gastrointestinal microbiota could be involved in IBS.
We analyzed 71 samples from 22 children with IBS (pediatric Rome III criteria) and 22 healthy children, ages 7–12 years, by 16S rRNA gene sequencing, with an average of 54,287 reads/stool sample (average 454 read length = 503 bases). Data were analyzed using phylogenetic-based clustering (Unifrac), or an operational taxonomic unit (OTU) approach using a supervised machine learning tool (randomForest). Most samples were also hybridized to a microarray that can detect 8,741 bacterial taxa (16S rRNA PhyloChip).
Microbiomes associated with pediatric IBS were characterized by a significantly greater percentage of the class Gammaproteobacteria (0.07% vs 0.89% of total bacteria; P <.05); one prominent component of this group was Haemophilus parainfluenzae. Differences highlighted by 454 sequencing were confirmed by high-resolution PhyloChip analysis. Using supervised learning techniques, we were able to classify different subtypes of IBS with a success rate of 98.5%, using limited sets of discriminant bacterial species. A novel Ruminococcus-like microbe was associated with IBS, indicating the potential utility of microbe discovery for gastrointestinal disorders. A greater frequency of pain correlated with an increased abundance of several bacterial taxa from the genus Alistipes.
Using 16S metagenomics by PhyloChip DNA hybridization and deep 454 pyrosequencing, we associated specific microbiome signatures with pediatric IBS. These findings indicate the important association between gastrointestinal microbes and IBS in children; these approaches might be used in diagnosis of functional bowel disorders in pediatric patients.
Until recently, sequencing has primarily been carried out in large genome centers, which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation, remain a serious bottleneck. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues.
We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set.
We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Microbial metagenomic analyses rely on an increasing number of publicly available tools. Installation, integration, and maintenance of the tools pose a significant burden on many researchers and create a barrier to adoption of microbiome analysis, particularly in translational settings.
To address this need we have integrated a rich collection of microbiome analysis tools into the Genboree Microbiome Toolset and exposed them to the scientific community using the Software-as-a-Service model via the Genboree Workbench. The Genboree Microbiome Toolset provides an interactive environment for users at all bioinformatic experience levels in which to conduct microbiome analysis. The Toolset drives hypothesis generation by providing a wide range of analyses including alpha diversity and beta diversity, phylogenetic profiling, supervised machine learning, and feature selection.
We validate the Toolset in two studies of the gut microbiota, one involving obese and lean twins, and the other involving children suffering from the irritable bowel syndrome.
By lowering the barrier to performing a comprehensive set of microbiome analyses, the Toolset empowers investigators to translate high-volume sequencing data into valuable biomedical discoveries.
Dynamic changes to the epigenome play a critical role in establishing and maintaining cellular phenotype during differentiation, but little is known about the normal methylomic differences that occur between functionally distinct areas of the brain. We characterized intra- and inter-individual methylomic variation across whole blood and multiple regions of the brain from multiple donors.
Distinct tissue-specific patterns of DNA methylation were identified, with a highly significant over-representation of tissue-specific differentially methylated regions (TS-DMRs) observed at intragenic CpG islands and low CG density promoters. A large proportion of TS-DMRs were located near genes that are differentially expressed across brain regions. TS-DMRs were significantly enriched near genes involved in functional pathways related to neurodevelopment and neuronal differentiation, including BDNF, BMP4, CACNA1A, CACA1AF, EOMES, NGFR, NUMBL, PCDH9, SLIT1, SLITRK1 and SHANK3. Although between-tissue variation in DNA methylation was found to greatly exceed between-individual differences within any one tissue, we found that some inter-individual variation was reflected across brain and blood, indicating that peripheral tissues may have some utility in epidemiological studies of complex neurobiological phenotypes.
This study reinforces the importance of DNA methylation in regulating cellular phenotype across tissues, and highlights genomic patterns of epigenetic variation across functionally distinct regions of the brain, providing a resource for the epigenetics and neuroscience research communities.
While current major national research efforts (i.e., the NIH Human Microbiome Project) will enable comprehensive metagenomic characterization of the adult human microbiota, how and when these diverse microbial communities take up residence in the host and during reproductive life are unexplored at a population level. Because microbial abundance and diversity might differ in pregnancy, we sought to generate comparative metagenomic signatures across gestational age strata. DNA was isolated from the vagina (introitus, posterior fornix, midvagina) and the V5V3 region of bacterial 16S rRNA genes was sequenced (454FLX Titanium platform). Sixty-eight samples from 24 healthy gravidae (18 to 40 confirmed weeks) were compared with 301 samples from non-pregnant controls (60 subjects). Generated sequence data were quality filtered, taxonomically binned, normalized, and organized by phylogeny and into operational taxonomic units (OTU); principal coordinates analysis (PCoA) of the resultant beta diversity measures was used for visualization and analysis in association with sample clinical metadata. Altogether, 1.4 gigabytes of data containing >2.5 million reads (averaging 6,837 sequences/sample of 493 nt in length) were generated for computational analyses. Although gravidae were not excluded by virtue of a posterior fornix pH >4.5 at the time of screening, a unique vaginal microbiome signature encompassing several specific OTUs and higher-level clades was nevertheless observed and confirmed using a combination of phylogenetic, non-phylogenetic, supervised, and unsupervised approaches. Both overall diversity and richness were reduced in pregnancy, with dominance of Lactobacillus species (L. iners, L. crispatus, L. jensenii and L. johnsonii), and the orders Lactobacillales (and Lactobacillaceae family), Clostridiales, Bacteroidales, and Actinomycetales.
This intergroup comparison using rigorous standardized sampling protocols and analytical methodologies provides robust initial evidence that the vaginal microbial 16S rRNA gene catalogue uniquely differs in pregnancy, with variance of taxa across vaginal subsite and gestational age.
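The reduction in diversity and richness reported above can be illustrated with a minimal sketch. The abstract does not state which indices were used, so the choice of observed-OTU richness and the Shannon index here is an assumption, and the OTU names and counts are hypothetical.

```python
import math

def richness(otu_counts):
    """Number of OTUs observed at least once in a sample."""
    return sum(1 for count in otu_counts.values() if count > 0)

def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over observed OTUs."""
    total = sum(otu_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in otu_counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h

# Hypothetical samples: one dominated by a single OTU, one evenly spread.
dominated = {"Lactobacillus_OTU": 97, "OTU_B": 2, "OTU_C": 1}
even = {"OTU_A": 25, "OTU_B": 25, "OTU_C": 25, "OTU_D": 25}

print(richness(dominated), shannon_index(dominated))
print(richness(even), shannon_index(even))
```

A sample dominated by one taxon scores lower on both measures than an evenly distributed sample, which is the pattern the abstract describes for pregnancy.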
The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR–mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.
The human genome contains many loci with a high incidence of structural mutations, including insertions and deletions of chromosomal segments. This excessive mutability has accelerated evolution and contributed to human disease but has yet to be explained. Segments of DNA repeated in low-copy numbers (LCRs) have been previously implicated in promoting structural mutability in specific disease-associated loci. Lack of methylation (hypomethylation) of genomic DNA has been previously associated with high structural mutability in gibbons and in human cancer cells, but the association with structural mutability in the human germline has not been explored prior to this study. Our analyses confirm the role of LCRs in promoting structural mutability on the genome scale but also reveal a surprisingly strong association of genomic instability with hypomethylation. Specifically, evolutionary analyses reveal that methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in human sperm, harbor a tenfold higher number of structural mutations than the genome-wide average. Moreover, the structural mutations in individuals diagnosed with schizophrenia, bipolar disorder, developmental delay, and autism are significantly more concentrated within hypomethylated regions. Our findings suggest a new connection between methylation of genomic DNA, selective structural mutability, evolution, and human disease.
Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data.
Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%).
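As a rough illustration of how a logistic regression score combined with a user-adjustable cutoff can separate likely true variants from likely errors, consider the sketch below. The feature names, coefficients, and candidate values are hypothetical and are not taken from the Atlas2 Suite; only the general technique is described in the text.

```python
import math

# Hypothetical model parameters, for illustration only.
INTERCEPT = -4.0
WEIGHTS = {"variant_read_fraction": 6.0, "mean_base_quality": 0.1}

def variant_probability(features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(w_i * x_i))))."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def call_variants(candidates, cutoff=0.5):
    """Return the ids of candidates whose probability meets the cutoff."""
    return [c["id"] for c in candidates
            if variant_probability(c["features"]) >= cutoff]

candidates = [
    {"id": "A", "features": {"variant_read_fraction": 0.50, "mean_base_quality": 30}},
    {"id": "B", "features": {"variant_read_fraction": 0.05, "mean_base_quality": 10}},
]

print(call_variants(candidates))               # stricter default cutoff
print(call_variants(candidates, cutoff=0.05))  # more permissive cutoff
```

Lowering the cutoff trades specificity for sensitivity, which is why exposing it to the user is useful.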
We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
Small non-coding RNAs, such as microRNAs (miRNAs), are involved in diverse biological processes including organ development and tissue differentiation. Global disruption of miRNA biogenesis in Dicer knockout mice disrupts early embryogenesis and primordial germ cell formation. However, the role of miRNAs in early folliculogenesis is poorly understood. In order to identify a full transcriptome set of small RNAs expressed in the newborn (NB) ovary, we extracted the small RNA fraction from mouse NB ovary tissues and subjected it to massively parallel sequencing using the Genome Analyzer from Illumina. Sequencing produced 4 655 992 reads of 33 bp each, representing a total of 154 Mbp of sequence data. The Pash alignment algorithm mapped 50.13% of the reads to the mouse genome. Sequence reads were clustered based on overlapping mapping coordinates and intersected with known miRNAs, small nucleolar RNAs (snoRNAs), piwi-interacting RNA (piRNA) clusters and repetitive genomic regions; 25.2% of the reads mapped to known miRNAs, 25.5% to genomic repeats, 3.5% to piRNAs and 0.18% to snoRNAs. Three hundred and ninety-eight known miRNA species were among the sequenced small RNAs, along with 118 isomiR sequences that are not in the miRBase database. The let-7 family was the most abundantly expressed miRNA family, and the mmu-mir-672, mmu-mir-322, mmu-mir-503 and mmu-mir-465 families were the most abundant X-linked miRNAs detected. The X-linked mmu-mir-503, mmu-mir-672 and mmu-mir-465 families showed preferential expression in testes and ovaries. We also identified four novel miRNAs that are preferentially expressed in gonads. Gonadal selective miRNAs may play important roles in ovarian development, folliculogenesis and female fertility.
miRNA; ovary; oocyte; microRNA; ncRNA
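The clustering of reads by overlapping mapping coordinates described in the abstract can be sketched as a sort-and-merge pass. This is a minimal illustration assuming half-open (chrom, start, end) intervals; the example coordinates are hypothetical and the actual clustering procedure used in the study is not specified. The first lines also check the reported sequence yield.

```python
# Reported yield: 4 655 992 reads of 33 bp each, about 154 Mbp in total.
TOTAL_BASES = 4_655_992 * 33

def cluster_reads(reads):
    """Merge overlapping mapped reads into clusters.

    reads: iterable of (chrom, start, end) with half-open coordinates.
    Returns a list of (chrom, start, end, n_reads) tuples.
    """
    clusters = []
    for chrom, start, end in sorted(reads):
        # Extend the last cluster when the read overlaps it on the same chromosome.
        if clusters and clusters[-1][0] == chrom and start < clusters[-1][2]:
            _, c_start, c_end, n = clusters[-1]
            clusters[-1] = (chrom, c_start, max(c_end, end), n + 1)
        else:
            clusters.append((chrom, start, end, 1))
    return clusters

# Hypothetical 33 bp reads.
reads = [("chr1", 120, 153), ("chr1", 100, 133), ("chrX", 10, 43), ("chr1", 500, 533)]
print(TOTAL_BASES)
print(cluster_reads(reads))
```

Each resulting cluster can then be intersected with annotation tracks (miRNAs, snoRNAs, piRNA clusters, repeats) using the same overlap test.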
In an important model for neuroscience, songbirds learn to discriminate songs they hear during tape-recorded playbacks, as demonstrated by song-specific habituation of both behavioral and neurogenomic responses in the auditory forebrain. We hypothesized that microRNAs (miRNAs or miRs) may participate in the changing pattern of gene expression induced by song exposure. To test this, we used massively parallel Illumina sequencing to analyse small RNAs from auditory forebrain of adult zebra finches exposed to tape-recorded birdsong or silence.
In the auditory forebrain, we identified 121 known miRNAs conserved in other vertebrates. We also identified 34 novel miRNAs that do not align to human or chicken genomes. Five conserved miRNAs showed significant and consistent changes in copy number after song exposure across three biological replications of the song-silence comparison, with two increasing (tgu-miR-25, tgu-miR-192) and three decreasing (tgu-miR-92, tgu-miR-124, tgu-miR-129-5p). We also detected a locus on the Z sex chromosome that produces three different novel miRNAs, with supporting evidence from Northern blot and TaqMan qPCR assays for differential expression in males and females and in response to song playbacks. One of these, tgu-miR-2954-3p, is predicted (by TargetScan) to regulate eight song-responsive mRNAs that all have functions in cellular proliferation and neuronal differentiation.
The experience of hearing another bird singing alters the profile of miRNAs in the auditory forebrain of zebra finches. The response involves both known conserved miRNAs and novel miRNAs described so far only in the zebra finch, including a novel sex-linked, song-responsive miRNA. These results indicate that miRNAs are likely to contribute to the unique behavioural biology of learned song communication in songbirds.
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
DNA methylation; Sequencing; Bisulfite
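Concordance between two methods on binary methylation calls, as reported in the abstract, can be computed along the following lines. This is a sketch under stated assumptions: the 0.5 threshold for converting a methylation level into a binary call and the region identifiers are hypothetical, and concordance is taken as the fraction of agreeing calls over regions assessed by both methods.

```python
def binary_call(level, threshold=0.5):
    """Convert a methylation level in [0, 1] to a methylated/unmethylated call."""
    return level >= threshold

def concordance(calls_a, calls_b):
    """Fraction of regions assessed by both methods where the calls agree.

    Returns None when the two methods share no regions.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return None
    agree = sum(1 for region in shared if calls_a[region] == calls_b[region])
    return agree / len(shared)

# Hypothetical per-region methylation levels from two methods.
method_a = {r: binary_call(v) for r, v in
            {"r1": 0.9, "r2": 0.1, "r3": 0.8, "r4": 0.7}.items()}
method_b = {r: binary_call(v) for r, v in
            {"r1": 0.95, "r2": 0.2, "r3": 0.3, "r5": 0.6}.items()}

print(concordance(method_a, method_b))
```

Only regions covered by both methods enter the denominator, so concordance and coverage should be reported together.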
Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and MacOSX, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/.
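The idea of measuring depth of coverage and checking for overdispersion can be illustrated with a short sketch. It is written in Python for illustration rather than R, does not reproduce the readDepth package's statistical model, and uses hypothetical read positions. The variance-to-mean ratio is used as a simple dispersion check, since a Poisson process would give a ratio near 1.

```python
from statistics import mean, pvariance

def bin_counts(read_starts, bin_size, genome_length):
    """Count read start positions in fixed-width bins.

    Positions are 0-based and must be smaller than genome_length.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio; values well above 1 suggest overdispersion."""
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else float("nan")

# Hypothetical read start positions on a 40 bp toy genome.
read_starts = [0, 5, 12, 15, 18, 19, 31]
counts = bin_counts(read_starts, bin_size=10, genome_length=40)
print(counts)
print(dispersion_index(counts))
```

In a real analysis, bins with counts far above or below the expected depth would be candidates for copy number gains or losses.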
Only thirteen microRNAs are conserved between D. melanogaster and the mouse; however, conditional loss of miRNA function through mutation of Dicer causes defects in proliferation of premeiotic germ cells in both species. This highlights the potentially important, but uncharacterized, role of miRNAs during early spermatogenesis. The goal of this study was to characterize on postnatal day 7, 10, and 14 the content and editing of murine testicular miRNAs, which predominantly arise from spermatogonia and spermatocytes, in contrast to prior descriptions of miRNAs in the adult mouse testis, which largely reflect the content of spermatids. Previous studies have shown miRNAs to be abundant in the mouse testis by postnatal day 14; however, through Next Generation Sequencing of testes from a B6;129 background we found abundant earlier expression of miRNAs and describe shifts in the miRNA signature during this period. We detected robust expression of miRNAs encoded on the X chromosome in postnatal day 14 testes, consistent with prior studies showing their resistance to meiotic sex chromosome inactivation. Unexpectedly, we also found a similar positional enrichment for most miRNAs on chromosome 2 at postnatal day 14 and for those on chromosome 12 at postnatal day 7. We quantified in vivo developmental changes in three types of miRNA variation including 5′ heterogeneity, editing, and 3′ nucleotide addition. We identified eleven putative novel pubertal testis miRNAs whose developmental expression suggests a possible role in early male germ cell development. These studies provide a foundation for interpretation of miRNA changes associated with testicular pathology and identification of novel components of the miRNA editing machinery in the testis.
Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing.
Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms.
We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.
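To give a feel for how gapped k-mers and a positional hash table can be used to place a read on a reference, here is a minimal sketch. It is not the Pash 3.0 algorithm: the sampling pattern, the offset-voting scheme, and the sequences are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

PATTERN = "1101011"  # hypothetical sampling pattern; "1" marks a sampled position

def gapped_kmers(seq, pattern=PATTERN):
    """Yield (start, gapped k-mer) for every window of len(pattern) in seq."""
    keep = [i for i, ch in enumerate(pattern) if ch == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield start, "".join(seq[start + i] for i in keep)

def build_index(reference, pattern=PATTERN):
    """Hash table mapping each gapped k-mer to its reference positions."""
    index = defaultdict(list)
    for pos, kmer in gapped_kmers(reference, pattern):
        index[kmer].append(pos)
    return index

def best_offset(read, index, pattern=PATTERN):
    """Vote for the reference offset supported by the most k-mer matches."""
    votes = Counter()
    for read_pos, kmer in gapped_kmers(read, pattern):
        for ref_pos in index.get(kmer, ()):
            votes[ref_pos - read_pos] += 1
    return votes.most_common(1)[0] if votes else None

reference = "ACGTTGCATGCCGATAGCTAGGCTA"
index = build_index(reference)
exact_read = "TGCCGATAGCTA"    # reference[8:20]
mutated_read = "TGACGATAGCTA"  # same read with one substitution at read position 2

print(best_offset(exact_read, index))
print(best_offset(mutated_read, index))
```

Because some windows place the substituted base in an unsampled position, the mutated read still collects matching k-mers at the correct offset, which is the practical benefit of gapped sampling over contiguous k-mers.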
Nuage are amorphous ultrastructural granules in the cytoplasm of male germ cells of species as divergent as Drosophila, Xenopus, and Homo sapiens. Most nuage are cytoplasmic ribonucleoprotein structures implicated in diverse RNA metabolism including the regulation of PIWI-interacting RNA (piRNA) synthesis by the PIWI family (i.e., MILI, MIWI2, and MIWI). MILI is prominent in embryonic and early post-natal germ cells in nuage, also called germinal granules, that are often associated with mitochondria and called intermitochondrial cement. We find that GASZ (Germ cell protein with Ankyrin repeats, Sterile alpha motif, and leucine Zipper) co-localizes with MILI in intermitochondrial cement. Knockout of Gasz in mice results in a dramatic downregulation of MILI, and phenocopies the zygotene–pachytene spermatocyte block and male sterility defect observed in MILI null mice. In Gasz null testes, we observe increased hypomethylation and expression of retrotransposons similar to MILI null testes. We also find global shifts in the small RNAome, including down-regulation of repeat-associated, known, and novel piRNAs. These studies provide the first evidence for an essential structural role for GASZ in male fertility and in epigenetic and post-transcriptional silencing of retrotransposons by stabilizing MILI in nuage.
Many aspects of RNA processing are essential for or prominent in the differentiation of germ cells. Some RNA metabolism in animal germ cells is associated with physical structures surrounding the cell nucleus called nuage. Nuage has a distinct granular appearance prior to the meiotic divisions with unclear functions. We have identified a protein called GASZ, which plays a structural role in this early nuage. In mice lacking GASZ, retrotransposons—endogenous viral-like particles—become released from their typical repressed state in the germline by the loss of small RNAs called piRNAs, resulting in DNA damage and delayed germ cell maturation. Protection of the germline from genetic intruders may require the association of piRNA-synthesizing enzymes and other components of this nuage structure through direct or indirect associations with GASZ. Mutations in GASZ and other nuage components may contribute to infertility in men who do not produce spermatozoa.