The eukaryotic cell nucleus displays a high degree of spatial organization, with discrete functional subcompartments that provide microenvironments where specialized processes take place. Concordantly, the genome also adopts defined conformations that, in part, enable specific genomic regions to interface with these functional centers. Yet the roles of many subcompartments and the genomic regions that contact them have not been explored fully. More fundamentally, it is not entirely clear how genome organization impacts function, and vice versa. The past decade has witnessed the development of a new breed of methods that are capable of assessing the spatial organization of the genome. These stand to further our understanding of the relationship between genome structure and function, and potentially assign function to various nuclear subcompartments. Here, we review the principal techniques used for analyzing genomic interactions and the functional insights they have afforded, and discuss the outlook for future advances in nuclear structure and function dynamics.
genome organization; nuclear structure and function; chromosome conformation capture; next generation sequencing
Pluripotent embryonic stem (ES) cells are specialized cells with a dynamic chromatin structure, which is intimately connected with their pluripotency and physiology. In recent years somatic cells have been reprogrammed to a pluripotent state through over-expression of a defined set of transcription factors. These cells, known as induced pluripotent stem (iPS) cells, recapitulate ES cell properties and can be differentiated to apparently all cell lineages, making iPS cells a suitable replacement for ES cells in future regenerative medicine. Chromatin modifiers play a key role in establishing and maintaining pluripotency; therefore, elucidating the mechanisms controlling chromatin structure in both ES and iPS cells is of utmost importance to understanding their properties and harnessing their therapeutic potential. In this review, we discuss recent studies that provide a genome-wide view of the chromatin structure signature in ES cells and iPS cells and that highlight the central role of histone modifiers and chromatin remodelers in pluripotency maintenance and induction.
embryonic stem cells; induced pluripotent stem cells; reprogramming; epigenetics; chromatin structure; differentiation
The mechanisms regulating the coordinate activation of tens of thousands of replication origins in multicellular organisms remain poorly explored. Recent advances in genomics have provided valuable information about the sites at which DNA replication is initiated and the selection mechanisms of specific sites in both yeast and vertebrates. Studies in yeast have advanced to the point that it is now possible to develop convincing models for origin selection. A general model has emerged, but yeast data have also revealed an unsuspected diversity of strategies for origin positioning. We focus here on the ways in which chromatin structure may affect the formation of pre-replication complexes, a prerequisite for origin activation. We also discuss the need to exercise caution when trying to extrapolate yeast models directly to more complex vertebrate genomes.
DNA replication origin; nucleosome positioning; chromatin structure; transcription factors; genome-wide studies
Chromatin modifications at both histones and DNA are critical for regulating gene expression. Mis-regulation of such epigenetic marks can lead to pathological states; indeed, cancer affecting the hematopoietic system is frequently linked to epigenetic abnormalities. Here, we discuss the different types of modifications and their general impact on transcription, as well as the polycomb group of proteins, which effect transcriptional repression and are often mis-regulated. Further, we discuss how chromosomal translocations leading to fusion proteins can aberrantly regulate gene transcription through chromatin modifications within the hematopoietic system. PML–RARα, AML1–ETO and MLL-fusions are examples of fusion proteins that mis-regulate epigenetic modifications (either directly or indirectly), which can lead to acute myeloblastic leukemia (AML). An in-depth understanding of the mechanisms behind the mis-regulation of epigenetic modifications that lead to the development and progression of AMLs could be critical for designing effective treatments.
chromatin; epigenetics; transcription; leukemia; polycomb
Structural variations are widespread in the human genome and can serve as genetic markers in clinical and evolutionary studies. With advances in next-generation sequencing technology, recent methods allow structural variations to be identified with unprecedented resolution and accuracy. They also provide opportunities to discover variants that could not be detected on conventional microarray-based platforms, such as dosage-invariant chromosomal translocations and inversions. In this review, we describe some of the sequencing-based algorithms for detection of structural variations and discuss the key issues in future development.
copy number variations; paired-end sequencing; chromosomal alterations; translocations; indels
Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof-of-concept studies published, the field is now moving into the next phase, where method standardization and rigorous quality control are becoming paramount. In this review we describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We discuss these in terms of library preparation, sequencing platforms and analysis techniques.
epigenomics; next generation sequencing
The epigenome plays a pivotal role as the interface between genome and environment. True genome-wide assessments of epigenetic marks, such as DNA methylation (methylomes) or chromatin modifications (chromatinomes), are now possible, either through high-throughput arrays or increasingly by second-generation DNA sequencing methods. The ability to collect data at this level of resolution allows us to begin to pose detailed questions about changes that occur due to development, lineage and tissue specificity and, significantly, those caused by environmental influences such as ageing, stress, diet, hormones or toxins. Common complex traits are under variable levels of genetic influence and, additionally, of epigenetic influence. The detection of pathological epigenetic alterations will reveal additional insights into the aetiology of these traits and into how environmental modulation of this mechanism may occur. Because these marks are reversible, the potential for sequence-specific targeted therapeutics exists. This review surveys recent epigenomic advances and their current and prospective application to the study of common diseases.
Genomics; epigenetics; epigenomics; common disease; complex traits; gene environment interaction
Gene Set Enrichment (GSE) is a computational technique which determines whether an a priori defined set of genes shows statistically significant differential expression between two phenotypes. Currently, the gene sets used for GSE are derived from annotation or pathway databases, which often contain computationally based and unrepresentative data. Here, we propose a novel approach for the generation of comprehensive and biologically derived gene sets, deriving sets through the application of machine learning techniques to gene expression data. These gene sets can be produced for specific tissues, developmental stages or environments. They provide a powerful and functionally meaningful way in which to mine genome-wide association and next generation sequencing data in order to identify disease-associated variants and pathways.
gene set enrichment; annotation database; gene expression data; machine learning; next generation sequencing
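The core question behind GSE in the abstract above, whether a predefined gene set differs in expression between two phenotypes, can be illustrated with a minimal sketch. This is not the authors' method and not the running-sum statistic of the GSEA software: it swaps in a simple label-permutation test on the mean expression difference of the set. All names (`set_score`, `permutation_p_value`, `expr`, `labels`) and the toy expression values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def set_score(expr, labels, gene_set):
    """Average, over the genes in the set, of (mean in phenotype A - mean in phenotype B)."""
    diffs = []
    for gene in gene_set:
        values = expr[gene]
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        diffs.append(mean(a) - mean(b))
    return mean(diffs)


def permutation_p_value(expr, labels, gene_set, n_perm=2000, seed=0):
    """Two-sided p-value obtained by shuffling phenotype labels across samples."""
    rng = random.Random(seed)
    observed = abs(set_score(expr, labels, gene_set))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_score(expr, shuffled, gene_set)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (assumed values): four samples of phenotype A, then four of phenotype B.
labels = ["A"] * 4 + ["B"] * 4
expr = {
    "g1": [5.1, 4.9, 5.3, 5.0, 1.0, 1.2, 0.9, 1.1],
    "g2": [7.8, 8.1, 8.0, 7.9, 3.0, 3.2, 2.9, 3.1],
    "g3": [2.0, 2.1, 1.9, 2.0, 2.0, 1.9, 2.1, 2.0],
}
gene_set = ["g1", "g2"]

score = set_score(expr, labels, gene_set)
p_value = permutation_p_value(expr, labels, gene_set)
print(f"set score = {score:.3f}, permutation p = {p_value:.4f}")
```

Shuffling sample labels rather than gene labels keeps gene-gene correlations within the set intact, which is one reason phenotype permutation is often preferred when enough samples are available.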
Eukaryotic cells express a large variety of ribonucleic acid (RNA)-binding proteins (RBPs), which have diverse affinities and specificities towards their target RNAs and play a crucial role in almost every aspect of RNA metabolism. In addition, specific domains in RBPs impart catalytic activity or mediate protein–protein interactions, making RBPs versatile regulators of gene expression. In this review, we elaborate on recent experimental and computational approaches that have increased our understanding of RNA–protein interactions and their role in cellular function. We review aspects of gene expression that are modulated post-transcriptionally by RBPs, namely the stability of polymerase II-derived mRNA transcripts and their rate of translation into proteins. We further highlight the extensive regulatory networks of RBPs that implement a combinatorial control of gene expression. Taking cues from recent developments in the field, we argue that understanding spatio-temporal RNA–protein association on a transcriptome level will provide invaluable and unexpected insights into the regulatory codes that define growth, differentiation and disease.
RNA-binding proteins; RNA-binding domains; RBP–RNA interaction; RBP regulatory networks; RBP target identification
Next-generation sequencing technologies are making a substantial impact on many areas of biology, including the analysis of genetic diversity in populations. However, genome-scale population genetic studies have been accessible only to well-funded model systems. Restriction-site associated DNA sequencing, a method that samples at reduced complexity across target genomes, promises to deliver high-resolution population genomic data (thousands of sequenced markers across many individuals) for any organism at reasonable cost. It has found application in wild populations and non-traditional study species, and promises to become an important technology for ecological population genomics.
RADSeq; population genetics; next-generation sequencing; genetic marker discovery; SNP discovery
Genomic imprinting refers to genes that are epigenetically programmed in the germline to express exclusively or preferentially one allele in a parent-of-origin manner. Expression-based genome-wide screening for the identification of imprinted genes has failed to uncover a significant number of new imprinted genes, probably because of the high tissue- and developmental-stage specificity of imprinted gene expression. A very large number of technical and biological artifacts can also lead to erroneous evidence of imprinted gene expression. In this article, we focus on three common sources of potential confounding effects: (i) random monoallelic expression in monoclonal cell populations, (ii) genetically determined monoallelic expression and (iii) contamination or infiltration of embryonic tissues with maternal material. This last situation applies specifically to genes that appear to be maternally expressed in the placenta. Besides the use of reciprocal crosses, which are instrumental in confirming the parental specificity of expression, we provide additional methods for detecting and eliminating situations that can be misinterpreted as cases of imprinted expression.
genomic imprinting; DNA methylation; monoallelic expression; germline; placenta
Fertilization of the oocyte by the sperm results in the formation of a totipotent zygote, in which the maternal and paternal chromatin is enclosed in two pronuclei undergoing distinct programmes of transcriptional activation and chromatin remodelling. The highly packaged paternal chromatin delivered by the sperm is decondensed and acquires a number of specific epigenetic marks, but notably remains devoid of those usually associated with constitutive heterochromatin. During this period the maternal chromatin remains relatively stable except for marks associated with transcription and/or replication such as arginine methylation and H3/H4 acetylation. The embryo then undergoes a series of mitotic divisions without significant additional growth but with differentiation, resulting in the formation of a blastocyst containing distinct cell types. The chromatin remodelling events during these stages are likely to be important in establishing the nuclear foundations required for later triggers of differentiation. Overall, we summarize three important points during these earliest reprogramming events: (i) relatively stable maternal chromatin after fertilization, (ii) rapid acquisition of specific histone marks by the paternal chromatin during the hours that follow fertilization and (iii) rapid remodelling of constitutive heterochromatic marks and modifications in the core of the nucleosome from the first mitotic division. These features are likely to be required for the creation of a chromatin environment compatible with cellular reprogramming and plasticity.
Mouse embryo; epigenetic reprogramming; cell plasticity; totipotency; chromatin; heterochromatin; methylation
Chromatin immunoprecipitation and sequencing (ChIP-seq) is a rapidly maturing technology that draws on the power of high-throughput short-read sequencing to decipher chromatin states with unprecedented precision and breadth. Although some aspects of the experimental protocol require careful tuning, the bottleneck currently lies firmly with the downstream data analysis. We give an overview of the better-established aspects of genome mapping and data normalization, describe the more recent progress in peak calling and its statistical analysis, and provide a brief overview of popular follow-up analyses such as genomic feature categorization and motif search.
ChIP-seq; high-throughput sequencing; DNA binding; transcriptional regulation; bioinformatics
Specific binding of transcription factors (TFs) determines in large part the connectivity of gene regulatory networks as well as the quantitative level of gene expression. A multiplicity of both experimental and computational methods is currently used to discover and characterize the underlying TF–DNA interactions. Experimental methods can be further subdivided into in vitro- and in vivo-based approaches, each emphasizing different aspects of TF-binding events. In this review we summarize the flexibility and performance of a selection of both types of experimental methods. In conclusion, we argue that a serial combination of methods with different throughput and data type constitutes an optimal experimental strategy.
transcription factor; DNA binding; binding affinity; ChIP; MITOMI; PBM; SELEX
With the expanding availability of sequencing technologies, research previously centered on the human genome can now afford to include the study of humans’ internal ecosystem (the human microbiome). Given the scale of the data involved in this metagenomic research (two orders of magnitude larger than the human genome) and their importance in relation to human health, it is crucial to guarantee (along with appropriate data collection and taxonomy) proper tools for data analysis. We propose to adapt the approaches defined for the analysis of gene-expression microarrays in order to infer information in metagenomics. In particular, we applied SAM, a broadly used tool for the identification of differentially expressed genes among different sample classes, to a reported dataset from a research model with mice of two genotypes (a high-density lipoprotein knockout mouse and its wild-type counterpart). The data cover two different diets (high-fat or normal chow), used to ensure the onset of obesity, a prodrome of metabolic syndrome (MS). By using the 16S rRNA gene as a genomic diversity marker, we illustrate how this approach can identify bacterial populations differentially enriched among different genetic and dietary conditions of the host. This approach faithfully reproduces highly relevant results from phylogenetic and standard statistical analyses, used to explain the role of the gut microbiome in relation to obesity. This represents a promising proof-of-principle for using functional genomic approaches in the fast-growing area of metagenomics, and makes a large body of thoroughly tested and theoretically sound methodologies available to this exciting new field.
human microbiome; functional genomics; metagenomics
Eukaryotic chromatin can be highly dynamic and can continuously exchange between an open, transcriptionally active conformation and a compacted, silenced one. Post-translational modifications of histones have a pivotal role in regulating chromatin states, thus influencing all chromatin-dependent processes. Methylation is currently one of the best characterized histone modifications and occurs on arginine and lysine residues. Histone methylation can regulate other modifications (e.g. acetylation, phosphorylation and ubiquitination) in order to define a precise functional chromatin environment. In this review we focus on histone methylation and demethylation, as well as on the enzymes responsible for setting these marks. In particular, we describe novel concepts concerning the interdependence of histone modification marks and discuss the molecular mechanisms governing this cross-talk.
histone modifications; histone methylation; cross-talk; epigenetic; chromatin
Protein kinase phosphorylation is central to the regulation and control of protein and cellular function. Over the past decade, the development of many high-throughput approaches has revolutionized the understanding of protein phosphorylation and allowed rapid and unbiased surveys of phosphoproteins and phosphorylation events. In addition to this technological advancement, there have also been computational improvements; recent studies on network models of protein phosphorylation have provided many insights into the cellular processes and pathways regulated by phosphorylation. This article gives an overview of experimental and computational techniques for identifying and analyzing protein phosphorylation on a systems level.
systems biology; protein phosphorylation; mass spectrometry; protein microarray; kinase; phosphoprotein
Our understanding of the genetic mechanisms of organismal aging has advanced dramatically during the past two decades. With the development of large-scale RNAi screens, the last few years saw the remarkable identification of hundreds of new longevity genes in the roundworm Caenorhabditis elegans. The various RNAi screens revealed many biological pathways previously unknown to be related to aging. In this review, we focus on findings from the recent large-scale RNAi longevity screens, and discuss the insights they have provided into the complex biological process of aging and the impact that RNAi technology will continue to have on the future development of the aging field.
aging; RNAi screen; gene networks; C. elegans; lifespan
Differential gene expression plays a critical role in the development and physiology of multicellular organisms. At a ‘systems level’ (e.g. at the level of a tissue, organ or whole organism), this process can be studied using gene regulatory network (GRN) models that capture physical and regulatory interactions between genes and their regulators. In recent years, significant progress has been made toward the mapping of GRNs using a variety of experimental and computational approaches. Here, we discuss gene-centered approaches that we employed to characterize GRNs and describe insights that we have obtained into the global design principles of gene regulation in complex metazoan systems.
transcription factor; differential gene expression; gene-centered; gene regulatory network; yeast one-hybrid assay; Caenorhabditis elegans
Nature is replete with examples of diverse cell types, tissues and body plans, forming very different creatures from genomes with similar gene complements. However, while the genes and the structures of proteins they encode can be highly conserved, the production of those proteins in specific cell types and at specific developmental time points might differ considerably between species. A full understanding of the factors that orchestrate gene expression will be essential to explain evolutionary variety. Transcription factor (TF) proteins, which form gene regulatory networks (GRNs) to act in cooperative or competitive partnerships to regulate gene expression, are key components of these unique regulatory programs. Although many TFs are conserved in structure and function, certain classes of TFs display extensive levels of species diversity. In this review, we highlight families of TFs that have expanded through gene duplication events to create species-unique repertoires in different evolutionary lineages. We discuss how the hierarchical structures of GRNs allow for flexible small- to large-scale phenotypic changes. We survey evidence that explains how newly evolved TFs may be integrated into an existing GRN and how molecular changes in TFs might impact the GRNs. Finally, we review examples of traits that evolved due to lineage-specific TFs and species differences in GRNs.
transcription factors; gene regulatory network; evolution; lineage-specific genes
In the last decade or so, advances in genome-scale technologies have allowed systematic and detailed analysis of gene function. The experimental accessibility of budding yeast makes it a test-bed for technology development and application of new functional genomic tools and resources that pave the way for comparable efforts in higher eukaryotes. In this article, we review advances in reporter screening technology to discover trans-acting regulators of promoters (or cis-elements) of interest in the context of a novel functional genomics approach called Reporter Synthetic Genetic Array (R-SGA) analysis. We anticipate that this methodology will enable researchers to collect quantitative data on hundreds of gene expression pathways in an effort to better understand transcriptional regulatory networks.
gene expression; reporter gene; Saccharomyces cerevisiae; cell cycle; histone gene
While initiation of transcription has attracted the most attention in the field of gene regulation, it has become clear that additional stages in the gene expression cascade, including post-transcriptional events, are under equally exquisite control. The seminal discovery that short RNAs (microRNA, small interfering RNA, Piwi-interacting RNA) play important roles in repressing gene expression has spurred a rush of new interest in post-transcriptional gene silencing mechanisms. The development of affinity tags and high-resolution tandem mass spectrometry (MS/MS) has greatly simplified the analysis of proteins that regulate gene expression. Further, the use of DNA microarrays and ‘second generation’ nucleic acid sequencing (‘deep sequencing’) technologies has facilitated the identification of their regulatory targets. These technological advancements mark a significant step towards a comprehensive understanding of gene regulatory networks. The purpose of this review is to highlight several recent reports that illustrate the value of affinity-purification (immunoprecipitation) followed by mass spectrometric protein analysis and nucleic acid analysis by deep sequencing (AP-MS/Seq) to examine mRNA after it has been transcribed. The ability to identify the direct nucleic acid targets of post-transcriptional gene regulatory machines is a critical first step towards understanding the contribution of post-transcriptional pathways to gene expression.
AP-MS; mass spectrometry; deep sequencing; microRNA; post-transcriptional gene silencing; Argonaute
As time goes by, it becomes more and more apparent that the puzzles of life involve more and more molecular pieces that fit together in increasingly complex ways. Genomic and proteomic technologies now produce reliable and quantitative data that could potentially reveal all the molecular pieces of a particular puzzle. However, this is akin to the opening of Pandora’s box, and we are now facing the problem of integrating this vast amount of data, with its incredible complexity, into some coherent whole. With the aid of engineering methods designed to build and analyze computerized man-made systems, an emerging field called ‘Executable Biology’ aims to create computer programmes that put together the pieces in ways that allow their dynamics to be captured and that ultimately elucidate how molecular function generates cellular function. This review aspires to highlight the main features characterizing these kinds of executable models and what makes them uniquely qualified to reason about and analyze biological networks.
computational modelling; signalling pathways