The epigenome plays a pivotal role as the interface between genome and environment. True genome-wide assessments of epigenetic marks, such as DNA methylation (methylomes) or chromatin modifications (chromatinomes), are now possible, either through high-throughput arrays or, increasingly, by second-generation DNA sequencing methods. Collecting data at this resolution allows us to pose detailed questions about the changes that occur with development, lineage and tissue specificity and, significantly, about those caused by environmental influences such as ageing, stress, diet, hormones or toxins. Common complex traits are under variable levels of genetic influence and, additionally, epigenetic influence. The detection of pathological epigenetic alterations will yield additional insights into the aetiology of these traits and into how environmental modulation of this mechanism may occur. Because these marks are reversible, there is potential for sequence-specific targeted therapeutics. This review surveys recent epigenomic advances and their current and prospective application to the study of common diseases.
Genomics; epigenetics; epigenomics; common disease; complex traits; gene environment interaction
Gene Set Enrichment (GSE) is a computational technique that determines whether an a priori defined set of genes shows statistically significant differential expression between two phenotypes. Currently, the gene sets used for GSE are derived from annotation or pathway databases, which often contain computationally based and unrepresentative data. Here, we propose a novel approach for generating comprehensive and biologically derived gene sets by applying machine learning techniques to gene expression data. These gene sets can be produced for specific tissues, developmental stages or environments. They provide a powerful and functionally meaningful way to mine genome-wide association and next-generation sequencing data in order to identify disease-associated variants and pathways.
gene set enrichment; annotation database; gene expression data; machine learning; next generation sequencing
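As a hedged illustration of the enrichment question posed in the preceding abstract, the following minimal sketch asks whether a predefined gene set is over-represented among differentially expressed genes, using an upper-tail hypergeometric test. This is a simplified over-representation test swapped in for illustration, not the approach proposed in the abstract; the gene identifiers, the toy data and the function name `enrichment_p_value` are all hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, hits_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes measured
    set_size:      number of genes in the predefined gene set
    hits_size:     number of differentially expressed genes
    overlap:       number of differentially expressed genes inside the set
    """
    total = comb(universe_size, hits_size)
    upper = min(set_size, hits_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy data: 20 measured genes, a 5-gene set, 6 differential genes.
universe = {f"g{i}" for i in range(1, 21)}
gene_set = {"g1", "g2", "g3", "g4", "g5"}
differential = {"g1", "g2", "g3", "g4", "g11", "g12"}

overlap = len(gene_set & differential)
p = enrichment_p_value(len(universe), len(gene_set), len(differential), overlap)
print(f"overlap={overlap}, p={p:.4f}")
```

A small p-value here suggests only that the overlap is larger than expected by chance under random sampling; a practical analysis would also need a defensible background gene list and correction for testing many gene sets.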
Eukaryotic cells express a large variety of ribonucleic acid (RNA)-binding proteins (RBPs), which have diverse affinities and specificities towards their target RNAs and play a crucial role in almost every aspect of RNA metabolism. In addition, specific domains in RBPs impart catalytic activity or mediate protein–protein interactions, making RBPs versatile regulators of gene expression. In this review, we elaborate on recent experimental and computational approaches that have increased our understanding of RNA–protein interactions and their role in cellular function. We review aspects of gene expression that are modulated post-transcriptionally by RBPs, namely the stability of polymerase II-derived mRNA transcripts and their rate of translation into proteins. We further highlight the extensive regulatory networks of RBPs that implement combinatorial control of gene expression. Taking cues from recent developments in the field, we argue that understanding spatio-temporal RNA–protein association at the transcriptome level will provide invaluable and unexpected insights into the regulatory codes that define growth, differentiation and disease.
RNA-binding proteins; RNA-binding domains; RBP–RNA interaction; RBP regulatory networks; RBP target identification
Genomic imprinting refers to genes that are epigenetically programmed in the germline to express exclusively or preferentially one allele in a parent-of-origin manner. Expression-based genome-wide screening for imprinted genes has failed to uncover a significant number of new ones, probably because of the high tissue and developmental-stage specificity of imprinted gene expression. A very large number of technical and biological artifacts can also lead to erroneous evidence of imprinted gene expression. In this article, we focus on three common sources of potential confounding effects: (i) random monoallelic expression in monoclonal cell populations, (ii) genetically determined monoallelic expression and (iii) contamination or infiltration of embryonic tissues with maternal material. This last situation applies specifically to genes that appear to be maternally expressed in the placenta. Besides the use of reciprocal crosses, which are instrumental in confirming the parental specificity of expression, we provide additional methods for detecting and eliminating these situations, which can be misinterpreted as cases of imprinted expression.
genomic imprinting; DNA methylation; monoallelic expression; germline; placenta
Fertilization of the oocyte by the sperm results in the formation of a totipotent zygote, in which the maternal and paternal chromatin is enclosed in two pronuclei undergoing distinct programmes of transcriptional activation and chromatin remodelling. The highly packaged paternal chromatin delivered by the sperm is decondensed and acquires a number of specific epigenetic marks, but notably remains devoid of those usually associated with constitutive heterochromatin. During this period the maternal chromatin remains relatively stable, except for marks associated with transcription and/or replication such as arginine methylation and H3/H4 acetylation. The embryo then undergoes a series of mitotic divisions without significant additional growth but with differentiation, resulting in the formation of a blastocyst containing distinct cell types. The chromatin remodelling events during these stages are likely to be important in establishing the nuclear foundations required for later triggers of differentiation. Overall, we summarize three important points about these earliest reprogramming events: (i) relatively stable maternal chromatin after fertilization, (ii) rapid acquisition of specific histone marks by the paternal chromatin during the hours that follow fertilization and (iii) rapid remodelling of constitutive heterochromatic marks and modifications in the core of the nucleosome from the first mitotic division. These features are likely to be required for the creation of a chromatin environment compatible with cellular reprogramming and plasticity.
Mouse embryo; epigenetic reprogramming; cell plasticity; totipotency; chromatin; heterochromatin; methylation
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a rapidly maturing technology that draws on the power of high-throughput short-read sequencing to decipher chromatin states with unprecedented precision and breadth. Although some aspects of the experimental protocol require careful tuning, the bottleneck currently lies firmly with the downstream data analysis. We give an overview of the better-established aspects of genome mapping and data normalization, and we describe the more recent progress in peak calling and the statistical analysis of peaks. We also provide a brief overview of popular follow-up analyses such as genomic feature categorization and motif search.
ChIP-seq; high-throughput sequencing; DNA binding; transcriptional regulation; bioinformatics
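To make the peak-calling step mentioned in the preceding abstract more concrete, the sketch below flags genomic windows whose read count is improbably high under a single genome-wide Poisson background. This is a deliberately simplified assumption: practical peak callers typically also model control samples and local background, and the window counts, threshold and function names here are hypothetical.

```python
from math import exp


def poisson_upper_tail(count, lam):
    """Return P(X >= count) for X ~ Poisson(lam)."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for k in range(count):
        cdf += term            # accumulate P(X = k)
        term *= lam / (k + 1)  # advance to P(X = k + 1)
    return max(0.0, 1.0 - cdf)


def call_peaks(window_counts, threshold=1e-3):
    """Return indices of windows enriched relative to the mean count."""
    lam = sum(window_counts) / len(window_counts)  # crude global background rate
    return [
        i for i, c in enumerate(window_counts)
        if poisson_upper_tail(c, lam) < threshold
    ]


# Hypothetical read counts in eight consecutive fixed-width windows.
window_counts = [2, 3, 1, 2, 25, 3, 2, 2]
peaks = call_peaks(window_counts)
print(peaks)
```

Note that the enriched window itself inflates the background estimate in this sketch, which makes the test conservative; estimating the rate from a control sample would avoid that.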
Specific binding of transcription factors (TFs) determines in large part the connectivity of gene regulatory networks as well as the quantitative level of gene expression. A multiplicity of both experimental and computational methods is currently used to discover and characterize the underlying TF–DNA interactions. Experimental methods can be further subdivided into in vitro- and in vivo-based approaches, each emphasizing different aspects of TF-binding events. In this review we summarize the flexibility and performance of a selection of both types of experimental methods. In conclusion, we argue that a serial combination of methods with different throughputs and data types constitutes an optimal experimental strategy.
transcription factor; DNA binding; binding affinity; ChIP; MITOMI; PBM; SELEX
With the expanding availability of sequencing technologies, research previously centered on the human genome can now afford to include the study of humans’ internal ecosystem (the human microbiome). Given the scale of the data involved in this metagenomic research (two orders of magnitude larger than the human genome) and their importance in relation to human health, it is crucial to guarantee, along with appropriate data collection and taxonomy, proper tools for data analysis. We propose to adapt the approaches defined for the analysis of gene-expression microarrays in order to infer information in metagenomics. In particular, we applied SAM, a broadly used tool for the identification of differentially expressed genes among different sample classes, to a reported dataset from a research model with mice of two genotypes (a high density lipoprotein knockout mouse and its wild-type counterpart). The dataset includes two different diets (high-fat or normal chow) to ensure the onset of obesity, a prodrome of metabolic syndrome (MS). By using the 16S rRNA gene as a genomic diversity marker, we illustrate how this approach can identify bacterial populations differentially enriched among different genetic and dietary conditions of the host. This approach faithfully reproduces highly relevant results from phylogenetic and standard statistical analyses that have been used to explain the role of the gut microbiome in relation to obesity. This represents a promising proof of principle for using functional genomic approaches in the fast-growing area of metagenomics, and makes a large body of thoroughly tested and theoretically sound methodologies available to this exciting new field.
human microbiome; functional genomics; metagenomics
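The core of the SAM approach referred to in the preceding abstract is a moderated, t-like "relative difference" score per feature. The sketch below computes that score for one bacterial taxon across two host groups, as a rough illustration only: the abundance values and group labels are hypothetical, the fudge constant `s0` is fixed by hand rather than estimated from the data, and the permutation-based false discovery rate estimation that SAM performs is omitted.

```python
from math import sqrt
from statistics import mean


def relative_difference(group_a, group_b, s0=0.1):
    """SAM-style score: difference in means over (pooled standard error + s0)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)
    pooled_se = sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    # s0 damps scores for features whose variance is very small.
    return (mean_a - mean_b) / (pooled_se + s0)


# Hypothetical 16S read counts for one taxon in two groups of mice.
high_fat = [10, 12, 14]
normal_chow = [4, 6, 8]

d = relative_difference(high_fat, normal_chow)
print(f"relative difference = {d:.3f}")
```

In a fuller analysis the score would be computed for every taxon and compared against scores obtained after permuting the group labels, so that a cut-off can be chosen for a target false discovery rate.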
Protein kinase phosphorylation is central to the regulation and control of protein and cellular function. Over the past decade, the development of many high-throughput approaches has revolutionized the understanding of protein phosphorylation and allowed rapid and unbiased surveys of phosphoproteins and phosphorylation events. In addition to this technological advancement, there have also been computational improvements; recent studies on network models of protein phosphorylation have provided many insights into the cellular processes and pathways regulated by phosphorylation. This article gives an overview of experimental and computational techniques for identifying and analyzing protein phosphorylation on a systems level.
systems biology; protein phosphorylation; mass spectrometry; protein microarray; kinase; phosphoprotein
Our understanding of the genetic mechanisms of organismal aging has advanced dramatically during the past two decades. With the development of large-scale RNAi screens, the last few years have seen the remarkable identification of hundreds of new longevity genes in the roundworm Caenorhabditis elegans. The various RNAi screens revealed many biological pathways previously unknown to be related to aging. In this review, we focus on findings from the recent large-scale RNAi longevity screens, discuss the insights they have provided into the complex biological process of aging, and consider the impact that RNAi technology will continue to have on the future development of the aging field.
aging; RNAi screen; gene networks; C. elegans; lifespan
Nature is replete with examples of diverse cell types, tissues and body plans, forming very different creatures from genomes with similar gene complements. However, while the genes and the structures of the proteins they encode can be highly conserved, the production of those proteins in specific cell types and at specific developmental time points might differ considerably between species. Understanding the factors that orchestrate gene expression will be essential to fully explain evolutionary variety. Transcription factor (TF) proteins, which form gene regulatory networks (GRNs) and act in cooperative or competitive partnerships to regulate gene expression, are key components of these unique regulatory programs. Although many TFs are conserved in structure and function, certain classes of TFs display extensive species diversity. In this review, we highlight families of TFs that have expanded through gene duplication events to create species-unique repertoires in different evolutionary lineages. We discuss how the hierarchical structures of GRNs allow for flexible small- to large-scale phenotypic changes. We survey evidence that explains how newly evolved TFs may be integrated into an existing GRN and how molecular changes in TFs might impact GRNs. Finally, we review examples of traits that evolved due to lineage-specific TFs and species differences in GRNs.
transcription factors; gene regulatory network; evolution; lineage-specific genes
While initiation of transcription has attracted the most attention in the field of gene regulation, it has become clear that additional stages in the gene expression cascade, including post-transcriptional events, are under equally exquisite control. The seminal discovery that short RNAs (microRNA, small interfering RNA, Piwi-interacting RNA) play important roles in repressing gene expression has spurred a rush of new interest in post-transcriptional gene silencing mechanisms. The development of affinity tags and high-resolution tandem mass spectrometry (MS/MS) has greatly simplified the analysis of proteins that regulate gene expression. Further, the use of DNA microarrays and ‘second generation’ nucleic acid sequencing (‘deep sequencing’) technologies has facilitated the identification of their regulatory targets. These technological advancements mark a significant step towards a comprehensive understanding of gene regulatory networks. The purpose of this review is to highlight several recent reports that illustrate the value of affinity purification (immunoprecipitation) followed by mass spectrometric protein analysis and nucleic acid analysis by deep sequencing (AP-MS/Seq) for examining mRNA after it has been transcribed. The ability to identify the direct nucleic acid targets of post-transcriptional gene regulatory machines is a critical first step towards understanding the contribution of post-transcriptional pathways to gene expression.
AP-MS; mass spectrometry; deep sequencing; microRNA; post-transcriptional gene silencing; Argonaute
It is becoming increasingly apparent that the puzzles of life involve ever more molecular pieces that fit together in increasingly complex ways. Genomics and proteomics technologies now produce reliable and quantitative data that could potentially reveal all the molecular pieces of a particular puzzle. However, this is akin to opening Pandora’s box: we now face the problem of integrating this vast and highly complex body of data into a coherent whole. With the aid of engineering methods designed to build and analyze computerized man-made systems, an emerging field called ‘Executable Biology’ aims to create computer programmes that put the pieces together in ways that capture their dynamics and ultimately elucidate how molecular function generates cellular function. This review aims to highlight the main features that characterize these kinds of executable models and what makes them uniquely qualified to reason about and analyze biological networks.
computational modelling; signalling pathways