Deoxyribonucleic acid methylation is a long-known epigenetic mark involved in many biological processes, and the ‘readers’ of this mark belong to several distinct protein families that ‘read’ and ‘translate’ the methylation mark into a function. Methyl-CpG binding domain proteins form one of these families and are associated with transcriptional activation/repression, regulation of chromatin structure, pluripotency, development, and differentiation. Although these readers were discovered decades ago, the systematic determination of their genomic binding sites and their epigenome make-up at a genome-wide level has so far revealed only the tip of the functional iceberg. This review focuses on two members of the methyl binding proteins, namely MBD2 and MBD3, which reside in very similar complexes yet appear to have very different biological roles. We provide a comprehensive comparison of their genome-wide binding features and emerging roles in gene regulation.
DNA methylation; methyl-CpG binding domain proteins; MBD2; MBD3; transcription regulation; chromatin immunoprecipitation
Aberrant DNA methylation often occurs in colorectal cancer (CRC). In our study we applied a genome-wide DNA methylation analysis approach, MethylCap-seq, to map the differentially methylated regions (DMRs) in 24 tumors and matched normal colon samples. In total, 2687 frequently hypermethylated and 468 frequently hypomethylated regions were identified, which include potential biomarkers for CRC diagnosis. Hypermethylation in the tumor samples was enriched at CpG islands and gene promoters, while hypomethylation was distributed throughout the genome. Using epigenetic data from human embryonic stem cells, we show that frequently hypermethylated regions coincide with bivalent loci in human embryonic stem cells. DNA methylation is commonly thought to lead to gene silencing; however, integration of publicly available gene expression data indicates that 75% of the frequently hypermethylated genes were most likely already lowly or not expressed in normal tissue. Collectively, our study provides genome-wide DNA methylation maps of CRC and comprehensive lists of DMRs, and gives insights into the role of aberrant DNA methylation in CRC formation.
DNA methylation; colorectal cancer; biomarkers; H3K27me3; gene expression; Illumina sequencing
Rodent malaria parasites (RMP) are used extensively as models of human malaria. Draft RMP genomes have been published for Plasmodium yoelii, P. berghei ANKA (PbA) and P. chabaudi AS (PcAS). Although the availability of these genomes made a significant impact on recent malaria research, these genomes were highly fragmented and were annotated with little manual curation. The fragmented nature of the genomes has hampered genome-wide analysis of Plasmodium gene regulation and function.
We have greatly improved the genome assemblies of PbA and PcAS, newly sequenced the virulent parasite P. yoelii YM genome, sequenced additional RMP isolates/lines and have characterized genotypic diversity within RMP species. We have produced RNA-seq data and utilised it to improve gene-model prediction and to provide quantitative, genome-wide, data on gene expression. Comparison of the RMP genomes with the genome of the human malaria parasite P. falciparum and RNA-seq mapping permitted gene annotation at base-pair resolution. Full-length chromosomal annotation permitted a comprehensive classification of all subtelomeric multigene families including the ‘Plasmodium interspersed repeat genes’ (pir). Phylogenetic classification of the pir family, combined with pir expression patterns, indicates functional diversification within this family.
Complete RMP genomes, RNA-seq and genotypic diversity data are excellent and important resources for gene-function and post-genomic analyses and to better interrogate Plasmodium biology. Genotypic diversity between P. chabaudi isolates makes this species an excellent parasite to study genotype-phenotype relationships. The improved classification of multigene families will enhance studies on the role of (variant) exported proteins in virulence and immune evasion/modulation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-014-0086-0) contains supplementary material, which is available to authorized users.
Plasmodium chabaudi; Plasmodium berghei; Plasmodium yoelii; Genomes; RNA-seq; Genotypic diversity; Multigene families; pirs; Phylogeny
Wnt signaling activates gene expression through the induced formation of complexes between DNA-binding T-cell factors (TCFs) and the transcriptional coactivator β-catenin. In colorectal cancer, activating Wnt pathway mutations transform epithelial cells through the inappropriate activation of a TCF7L2/TCF4 target gene program. Through a DNA array-based genome-wide analysis of TCF4 chromatin occupancy, we have identified 6,868 high-confidence TCF4-binding sites in the LS174T colorectal cancer cell line. Most TCF4-binding sites are located at large distances from transcription start sites, while target genes are frequently “decorated” by multiple binding sites. Motif discovery algorithms define the in vivo-occupied TCF4-binding site as evolutionarily conserved A-C/G-A/T-T-C-A-A-A-G motifs. The TCF4-binding regions significantly correlate with Wnt-responsive gene expression profiles derived from primary human adenomas and often behave as β-catenin/TCF4-dependent enhancers in transient reporter assays.
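The consensus quoted above (A-C/G-A/T-T-C-A-A-A-G) can be written as a simple pattern and scanned on both strands. The following is a minimal sketch, not the motif discovery procedure used in the study: the function names and the test sequence are hypothetical, and a plain regular expression ignores the position-specific weights that a real motif model would carry.

```python
import re

# Regex form of the consensus quoted in the text: A-C/G-A/T-T-C-A-A-A-G.
# The lookahead lets overlapping occurrences be reported as well.
TCF4_MOTIF = re.compile(r"(?=(A[CG][AT]TCAAAG))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif_hits(seq):
    """Return sorted (start, strand, matched_sequence) tuples.

    'start' is always a 0-based coordinate on the forward strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in TCF4_MOTIF.finditer(seq)]
    for m in TCF4_MOTIF.finditer(reverse_complement(seq)):
        # Convert the reverse-strand offset back to a forward-strand start.
        start = len(seq) - m.start() - len(m.group(1))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence containing one forward-strand match.
    print(find_motif_hits("GGACATCAAAGTT"))
```

Scanning the reverse complement separately keeps the pattern readable; hits on the minus strand are reported with the sequence as read on that strand.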
Understanding the links between genetic, epigenetic and non-genetic factors throughout the lifespan and across generations, and their role in disease susceptibility and disease progression, offers entirely new avenues and solutions to major problems in our society. To overcome the numerous challenges, we have come up with nine major conclusions to set the vision for future policies and research agendas at the European level.
Genome; Epigenome; Microbiome; Environment
MBD2 is a subunit of the NuRD complex that is postulated to mediate gene repression via recruitment of the complex to methylated DNA. In this study we adopted an MBD2 tagging approach to study its genome-wide binding characteristics. We show that in vivo MBD2 is mainly recruited to CpG island promoters that are highly methylated. Interestingly, MBD2 binds around 1 kb downstream of the transcription start site of a subset of ∼400 CpG island promoters that are characterized by the presence of active histone marks, RNA polymerase II (Pol2), low to medium gene expression levels and H3K36me3 deposition. These tagged-MBD2 binding sites in MCF-7 show increased methylation in a cohort of primary breast cancers but not in normal breast samples, suggesting a putative role for MBD2 in breast cancer.
Post-translational modifications of core histones play an important role in regulating fundamental biological processes such as DNA repair, transcription and replication. In this paper, we describe a novel assay that allows sequential targeting of distinct histone modifying enzymes to immobilized nucleosomal templates using recombinant chimeric targeting molecules. The assay can be used to study the histone substrate specificity of chromatin modifying enzymes as well as whether and how certain enzymes affect each other's histone modifying activities. As such the assay can help to understand how a certain histone code is established and interpreted.
Chromatin; Histone code
Gene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression. The dynamics of pol-II moving along the transcribed region influence the rate and timing of gene expression. In this work, we present a probabilistic model of transcription dynamics which is fitted to pol-II occupancy time course data measured using ChIP-Seq. The model can be used to estimate transcription speed and to infer the temporal pol-II activity profile at the gene promoter. Model parameters are estimated using either maximum likelihood estimation or via Bayesian inference using Markov chain Monte Carlo sampling. The Bayesian approach provides confidence intervals for parameter estimates and allows the use of priors that capture domain knowledge, e.g. the expected range of transcription speeds, based on previous experiments. The model describes the movement of pol-II down the gene body and can be used to identify the time of induction for transcriptionally engaged genes. By clustering the inferred promoter activity time profiles, we are able to determine which genes respond quickly to stimuli and group genes that share activity profiles and may therefore be co-regulated. We apply our methodology to biological data obtained using ChIP-seq to measure pol-II occupancy genome-wide when MCF-7 human breast cancer cells are treated with estradiol (E2). The transcription speeds we obtain agree with those obtained previously for smaller numbers of genes with the advantage that our approach can be applied genome-wide. We validate the biological significance of the pol-II promoter activity clusters by investigating cluster-specific transcription factor binding patterns and determining canonical pathway enrichment. We find that rapidly induced genes are enriched for both estrogen receptor alpha (ER) and FOXA1 binding in their proximal promoter regions.
Cells express proteins in response to changes in their environment so as to maintain normal function. An initial step in the expression of proteins is transcription, which is mediated by RNA polymerase II (pol-II). To understand changes in transcription arising due to stimuli it is useful to model the dynamics of transcription. We present a probabilistic model of pol-II transcription dynamics that can be used to compute RNA transcription speed and infer the temporal pol-II activity at the gene promoter. The inferred promoter activity profile is used to determine genes that are responding in a coordinated manner to stimuli and are therefore potentially co-regulated. Model parameters are inferred using data from high-throughput sequencing assays, such as ChIP-Seq and GRO-Seq, and can therefore be applied genome-wide in an unbiased manner. We apply the method to pol-II ChIP-Seq time course data from breast cancer cells stimulated by estradiol in order to uncover the dynamics of early response genes in this system.
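To make the idea of reading a transcription speed out of occupancy time courses concrete, the sketch below fits a straight line to the time at which a pol-II wavefront reaches successive positions along a gene. This is a deliberately simplified least-squares toy and not the probabilistic model described above; the function name, the positions and the arrival times are all hypothetical illustrative values.

```python
def estimate_speed_kb_per_min(positions_kb, arrival_times_min):
    """Fit arrival_time = delay + position / speed by ordinary least squares.

    Returns (speed in kb/min, delay in min at position 0).
    Assumes a constant elongation speed along the gene body.
    """
    n = len(positions_kb)
    if n < 2 or n != len(arrival_times_min):
        raise ValueError("need at least two (position, time) pairs")

    mean_x = sum(positions_kb) / n
    mean_t = sum(arrival_times_min) / n
    sxx = sum((x - mean_x) ** 2 for x in positions_kb)
    sxt = sum((x - mean_x) * (t - mean_t)
              for x, t in zip(positions_kb, arrival_times_min))
    if sxx == 0:
        raise ValueError("positions must not all be identical")

    slope = sxt / sxx  # minutes needed per kb travelled
    if slope <= 0:
        raise ValueError("arrival times do not increase along the gene")

    delay = mean_t - slope * mean_x  # extrapolated arrival time at position 0
    return 1.0 / slope, delay


if __name__ == "__main__":
    # Hypothetical wavefront: reaches 0, 10, 20, 30 kb at 5, 10, 15, 20 min.
    speed, delay = estimate_speed_kb_per_min([0, 10, 20, 30], [5, 10, 15, 20])
    print(f"speed = {speed:.2f} kb/min, delay = {delay:.2f} min")
```

A Bayesian treatment as described in the text would replace the point estimate with a posterior over speed and delay; the line fit only shows which quantities are being estimated.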
Immunological memory in vertebrates is often exclusively attributed to T and B cell function. Recently it was proposed that the enhanced and sustained innate immune responses following initial infectious exposure may also afford protection against reinfection. Testing this concept of “trained immunity,” we show that mice lacking functional T and B lymphocytes are protected against reinfection with Candida albicans in a monocyte-dependent manner. C. albicans and fungal cell wall β-glucans induced functional reprogramming of monocytes, leading to enhanced cytokine production in vivo and in vitro. The training required the β-glucan receptor dectin-1 and the noncanonical Raf-1 pathway. Monocyte training by β-glucans was associated with stable changes in histone trimethylation at H3K4, which suggests the involvement of epigenetic mechanisms in this phenomenon. The functional reprogramming of monocytes, reminiscent of similar NK cell properties, supports the concept of “trained immunity” and may be employed for the design of improved vaccination strategies.
Nontypeable Haemophilus influenzae (NTHi) is one of the leading causes of noninvasive mucosal infections, such as otitis media, sinusitis, and conjunctivitis. During its life cycle, NTHi is exposed to different CO2 levels, which vary from ∼0.04% in ambient air during transmission to a new host to over 5% in the respiratory tract and tissues of the human host during colonization and disease. We used the next-generation sequencing Tn-seq technology to identify genes essential for NTHi adaptation to changes in environmental CO2 levels. It appeared that H. influenzae carbonic anhydrase (HICA), which catalyzes the reversible hydration of CO2 to bicarbonate, is a molecular factor that is conditionally essential for NTHi survival in ambient air. Growth of NTHi Δcan strains was restored under 5% CO2-enriched conditions, by supplementation of the growth medium with sodium bicarbonate, or by genetic complementation with the can gene. Finally, we showed that HICA not only is essential for environmental survival but also appeared to be important for intracellular survival in host cells. Hence, HICA is important for NTHi niche adaptation.
Genome-wide association studies (GWAS) revealed genomic risk loci that potentially have an impact on disease and phenotypic traits. This extensive resource holds great promise in providing novel directions for personalized medicine, including disease risk prediction, prevention and targeted medication. One of the major challenges that researchers face on the path between the initial identification of an association and precision treatment of patients is the comprehension of the biological mechanisms that underlie these associations. Currently, efforts to answer these questions focus on the integrative analysis of system-wide data on global genome variation, gene expression, transcription factor binding, epigenetic profiles and chromatin conformation. The generation of these data mainly relies on next-generation sequencing. However, owing to multiple recent developments, mass spectrometry-based proteomics now offers additional possibilities, so far hardly recognized by the GWAS field, for the identification of functional genome variants and, in particular, for the identification and characterization of (differentially) bound protein complexes as well as physiological target genes. In this review, we introduce these proteomics advances and suggest how they might be integrated in post-GWAS workflows. We argue that the combination of highly complementary techniques is powerful and can provide an unbiased, detailed picture of GWAS loci and their mechanistic involvement in disease.
Although carbon dioxide (CO2) is known to be essential for Streptococcus pneumoniae growth, it is poorly understood how this respiratory tract pathogen adapts to the large changes in environmental CO2 levels it encounters during transmission, host colonization, and disease. To identify the molecular mechanisms that facilitate pneumococcal growth under CO2-poor conditions, we generated a random S. pneumoniae R6 mariner transposon mutant library representing mutations in 1,538 different genes and exposed it to CO2-poor ambient air. With Tn-seq, we found mutations in two genes that were involved in S. pneumoniae adaptation to changes in CO2 availability. The gene pca, encoding pneumococcal carbonic anhydrase (PCA), was absolutely essential for S. pneumoniae growth under CO2-poor conditions. PCA catalyzes the reversible hydration of endogenous CO2 to bicarbonate (HCO3−) and was previously demonstrated to facilitate HCO3−-dependent fatty acid biosynthesis. The gene folC that encodes the dihydrofolate/folylpolyglutamate synthase was required at the initial phase of bacterial growth under CO2-poor culture conditions. FolC compensated for the growth-phase-dependent decrease in S. pneumoniae intracellular long-chain (n > 3) polyglutamyl folate levels, which was most pronounced under CO2-poor growth conditions. In conclusion, S. pneumoniae adaptation to changes in CO2 availability involves the retention of endogenous CO2 and the preservation of intracellular long-chain polyglutamyl folate pools.
Exploitation of embryonic stem cells (ESC) for therapeutic use and biomedical applications is severely hampered by the risk of teratocarcinoma formation. Here, we performed a screen of selected epi-modulating compounds and demonstrate that a transient exposure of mouse ESC to MS-275 (Entinostat), a class I histone deacetylase (HDAC) inhibitor, modulates differentiation and prevents teratocarcinoma formation. Morphological and molecular data indicate that MS-275-primed ESCs are committed towards neural differentiation, which is supported by transcriptome analyses. Interestingly, in vitro withdrawal of MS-275 reverts the primed cells to the pluripotent state. In vivo, MS-275-primed ES cells injected into recipient mice give rise only to benign teratomas with a prevalence of neural-derived structures, but not to teratocarcinomas. In agreement, MS-275-primed ESC are unable to colonize blastocysts. These findings provide evidence that a transient alteration of acetylation alters the ESC fate.
Stem cell; Epigenetic; HDACi
Identification of genes responsive to an extracellular cue enables characterization of pathophysiologically crucial biological processes. Deep sequencing technologies provide a powerful means to identify responsive genes, which creates a need for computational methods able to analyze dynamic and multi-level deep sequencing data. To answer this need we introduce here a data-driven algorithm, SPINLONG, which is designed to search for genes that match user-defined hypotheses or models. SPINLONG is applicable to various experimental setups measuring several molecular markers in parallel. To demonstrate the SPINLONG approach, we analyzed ChIP-seq data reporting PolII, estrogen receptor (ER), H3K4me3 and H2A.Z occupancy at five time points in the MCF-7 breast cancer cell line after estradiol stimulus. We obtained 777 early responsive genes and compared the biological functions of the genes having ER binding within 20 kb of the transcription start site (TSS) to genes without such a binding site. Our results show that the non-genomic action of ER via the MAPK pathway, instead of direct binding, may be responsible for early cell responses to ER activation. Our results also indicate that the responsive genes triggered by the genomic pathway are transcribed faster than those without ER binding sites. The survival analysis of the 777 responsive genes with 150 primary breast cancer tumors and in two independent validation cohorts indicated that the ATAD3B gene, which does not have an ER binding site within 20 kb of its TSS, is significantly associated with poor patient survival.
Cellular processes in mammalian cells are tightly regulated to ensure that the cells function properly as a part of an organism. Dysregulation of some of these processes, such as apoptosis, cell proliferation and growth, can lead to cancer. One of the most important regulation mechanisms for cellular processes is the activation of membrane receptors by extracellular stimuli. Such cues trigger signal cascades that lead to altered expression of a number of genes in the cell nucleus; a key challenge in biomedicine is to identify which genes respond to a specific stimulus. These so-called response genes can be investigated on a whole-genome scale with genomic sequencing, a technology that can quantify protein binding to DNA or gene activation. Analysis of such whole-genome data, however, is challenging due to the billions of data points measured in the experiments. Here we introduce SPINLONG, a widely applicable novel computational method that integrates multiple levels of deep sequencing data to produce experimentally testable hypotheses. We applied SPINLONG to breast cancer data, found early responsive genes for estrogen receptor and analyzed their regulation. These analyses identified a gene whose high activity is associated with decreased breast cancer patient survival.
The human tumour antigen PRAME (preferentially expressed antigen in melanoma) is frequently overexpressed during oncogenesis, and high PRAME levels are associated with poor clinical outcome in a variety of cancers. However, the molecular pathways in which PRAME is implicated are not well understood. We recently characterized PRAME as a BC-box subunit of a Cullin2-based E3 ubiquitin ligase. In this study, we mined the PRAME interactome to a deeper level and identified specific interactions with OSGEP and LAGE3, which are human orthologues of the ancient EKC/KEOPS complex. By characterizing biochemically the human EKC complex and its interactions with PRAME, we show that PRAME recruits a Cul2 ubiquitin ligase to EKC. Moreover, EKC subunits associate with PRAME target sites on chromatin. Our data reveal a novel link between the oncoprotein PRAME and the conserved EKC complex and support a role for both complexes in the same pathways.
The liver X receptors (LXRs) are nuclear receptors that form permissive heterodimers with retinoid X receptor (RXR) and are important regulators of lipid metabolism in the liver. We have recently shown that RXR agonist-induced hypertriglyceridemia and hepatic steatosis in mice are dependent on LXRs and correlate with an LXR-dependent hepatic induction of lipogenic genes. To further investigate the roles of RXR and LXR in the regulation of hepatic gene expression, we have mapped the ligand-regulated genome-wide binding of these factors in mouse liver. We find that the RXR agonist bexarotene primarily increases the genomic binding of RXR, whereas the LXR agonist T0901317 greatly increases both LXR and RXR binding. Functional annotation of putative direct LXR target genes revealed a significant association with classical LXR-regulated pathways as well as peroxisome proliferator-activated receptor (PPAR) signaling pathways, and subsequent chromatin immunoprecipitation-sequencing (ChIP-seq) mapping of PPARα binding demonstrated binding of PPARα to 71 to 88% of the identified LXR-RXR binding sites. The combination of sequence analysis of shared binding regions and sequential ChIP on selected sites indicates that LXR-RXR and PPARα-RXR bind to degenerate response elements in a mutually exclusive manner. Together, our findings suggest extensive and unexpected cross talk between hepatic LXR and PPARα at the level of binding to shared genomic sites.
Mouse embryonic stem (ES) cells grown in serum exhibit greater heterogeneity in morphology and expression of pluripotency factors than ES cells cultured in defined medium with inhibitors of two kinases (Mek and GSK3), a condition known as “2i” postulated to establish a naive ground state. We show that the transcriptome and epigenome profiles of serum- and 2i-grown ES cells are distinct. 2i-treated cells exhibit lower expression of lineage-affiliated genes, reduced prevalence at promoters of the repressive histone modification H3K27me3, and fewer bivalent domains, which are thought to mark genes poised for either up- or downregulation. Nonetheless, serum- and 2i-grown ES cells have similar differentiation potential. Precocious transcription of developmental genes in 2i is restrained by RNA polymerase II promoter-proximal pausing. These findings suggest that transcriptional potentiation and a permissive chromatin context characterize the ground state and that exit from it may not require a metastable intermediate or multilineage priming.
► High-resolution genome-wide transcriptome and epigenome of naive pluripotency
► Reduced H3K27me3 at promoters and fewer bivalent domains in naive ES cells
► Reduced lineage priming and increased RNA polymerase II pausing in the naive state
► Naive ES cells show no delay in differentiation
Ground state pluripotency is characterized by a permissive chromatin context, but gene expression is not promiscuous due to the high prevalence of promoter-proximal pausing of transcription.
Nucleosome translocation along DNA is catalyzed by eukaryotic SNF2-type ATPases. One class of SNF2-ATPases is distinguished by the presence of a C-terminal bromodomain and is conserved from yeast to man and plants. This class of SNF2 enzymes forms rather large protein complexes that are collectively called SWI/SNF complexes. They are involved in transcription and DNA repair. Two broad types of SWI/SNF complexes have been reported in the literature: PBAF and BAF. These are distinguished by the inclusion or not of polybromo and several ARID subunits. Here we investigated human SS18, a protein that is conserved in plants and animals. SS18 is a putative SWI/SNF subunit which has been implicated in the etiology of synovial sarcomas by virtue of being a target for oncogenic chromosomal translocations that underlie synovial sarcomas.
We pursued a proteomic approach whereby the SS18 open reading frame was fused to a tandem affinity purification tag and expressed in amenable human cells. The fusion permitted efficient and exclusive purification of so-called BAF-type SWI/SNF complexes which bear ARID1A/BAF250a or ARID1B/BAF250b subunits. This demonstrates that SS18 is a BAF subtype-specific SWI/SNF complex subunit. The same result was obtained when using the SS18-SSX1 oncogenic translocation product. Furthermore, SS18L1, DPF1, DPF2, DPF3, BRD9, BCL7A, BCL7B and BCL7C were identified. ‘Complex walking’ showed that they all co-purify with each other, defining human BAF-type complexes. By contrast, we demonstrate that human PHF10 is part of the PBAF complex, which harbors both ARID2/BAF200 and polybromo/BAF180 subunits, but neither SS18 nor the above BAF-specific subunits.
SWI/SNF complexes are found in most eukaryotes and in the course of evolution new SWI/SNF subunits appeared. SS18 is found in plants as well as animals. Our results suggest that in both protostome and deuterostome animals, a class of BAF-type SWI/SNF complexes will be found that harbor SS18 or its paralogs, along with ARID1, DPF and BCL7 paralogs. Those BAF complexes are proteomically distinct from the eukaryote-wide PBAF-type SWI/SNF complexes. Finally, our results suggest that the human bromodomain factors BRD7 and BRD9 associate with PBAF and BAF, respectively.
Chromatin Immuno Precipitation (ChIP) profiling detects in vivo protein-DNA binding, and has revealed a large combinatorial complexity in the binding of chromatin associated proteins and their post-translational modifications. To fully explore the spatial and combinatorial patterns in ChIP-profiling data and detect potentially meaningful patterns, the areas of enrichment must be aligned and clustered, which is an algorithmically and computationally challenging task. We have developed CATCHprofiles, a novel tool for exhaustive pattern detection in ChIP profiling data. CATCHprofiles is built upon a computationally efficient implementation for the exhaustive alignment and hierarchical clustering of ChIP profiling data. The tool features a graphical interface for examination and browsing of the clustering results. CATCHprofiles requires no prior knowledge about functional sites, detects known binding patterns “ab initio”, and enables the detection of new patterns from ChIP data at a high resolution, exemplified by the detection of asymmetric histone and histone modification patterns around H2A.Z-enriched sites. CATCHprofiles' capability for exhaustive analysis combined with its ease-of-use makes it an invaluable tool for explorative research based on ChIP profiling data.
CATCHprofiles and the CATCH algorithm run on all platforms and are available for free through the CATCH website: http://catch.cmbi.ru.nl/.
User support is available by subscribing to the mailing list firstname.lastname@example.org.
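The alignment step that precedes clustering can be illustrated with a small sketch that compares two enrichment profiles at every shift within a window and in both orientations, keeping the best-scoring overlap. This is only an assumed, simplified illustration of exhaustive pairwise alignment and not the CATCH algorithm itself: the function names, the mean-squared-difference score, the `min_overlap` parameter and the example profiles are all hypothetical, and no hierarchical clustering is performed.

```python
def profile_distance(a, b):
    """Mean squared difference between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_alignment(a, b, max_shift, min_overlap=4):
    """Exhaustively align profile b to profile a.

    Tries every shift in [-max_shift, max_shift] and both orientations of b,
    scoring only the overlapping bins. Returns (distance, shift, flipped);
    on ties the first candidate found (unflipped, smallest shift) is kept.
    """
    best = None
    for flipped in (False, True):
        cand = b[::-1] if flipped else b
        for shift in range(-max_shift, max_shift + 1):
            # Bin i of a is compared with bin (i - shift) of cand.
            lo = max(0, shift)
            hi = min(len(a), len(cand) + shift)
            if hi - lo < min_overlap:
                continue
            d = profile_distance(a[lo:hi], cand[lo - shift:hi - shift])
            if best is None or d < best[0]:
                best = (d, shift, flipped)
    return best


if __name__ == "__main__":
    # Hypothetical asymmetric profile and its mirror image.
    asym = [0, 1, 3, 7, 3, 0, 0, 0]
    print(best_alignment(asym, asym[::-1], max_shift=2))
```

Allowing the flipped orientation is what lets mirror-image instances of an asymmetric pattern be grouped together rather than averaged into a symmetric one.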
The discovery of the Ten-Eleven-Translocation (TET) oxygenases that catalyze the hydroxylation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) has triggered an avalanche of studies aiming to resolve the role of 5hmC in gene regulation, if any. Hitherto, TET1 is reported to bind to CpG-island (CGI) and bivalent promoters in mouse embryonic stem cells, whereas binding at DNaseI hypersensitive sites (HS) had escaped previous analysis. Significant enrichment/accumulation of 5hmC but not 5mC can indeed be detected at bivalent promoters and at DNaseI-HS. Surprisingly, however, 5hmC is not detected or present at very low levels at CGI promoters notwithstanding the presence of TET1. Our meta-analysis of DNA methylation profiling points to potential issues with regard to the various methodologies that are part of the toolbox used to detect 5mC and 5hmC. Discrepancies between published studies and technical limitations prevent an unambiguous assignment of 5hmC as a ‘true’ epigenetic mark, that is, read and interpreted by other factors, and/or as a transiently accumulating intermediary product of the conversion of 5mC to unmodified cytosines.
DNA demethylation; sixth base; Ten-Eleven-Translocation; 5-hydroxymethylcytosine; 5-methylcytosine
Imprinted macro non-protein-coding (nc) RNAs are cis-repressor transcripts that silence multiple genes in at least three imprinted gene clusters in the mouse genome. Similar macro or long ncRNAs are abundant in the mammalian genome. Here we present the full coding and non-coding transcriptome of two mouse tissues, differentiated ES cells and fetal head, using an optimized RNA-Seq strategy. The data produced are highly reproducible in different sequencing locations and detect the full length of imprinted macro ncRNAs such as Airn and Kcnq1ot1, whose length ranges from 80 to 118 kb. Transcripts show a more uniform read coverage when RNA is fragmented with RNA hydrolysis compared with cDNA fragmentation by shearing. Irrespective of the fragmentation method, all coding and non-coding transcripts longer than 8 kb show a gradual loss of sequencing tags towards the 3′ end. Comparisons to published RNA-Seq datasets show that the strategy presented here is more efficient in detecting known functional imprinted macro ncRNAs and also indicate that standardization of RNA preparation protocols would increase the comparability of the transcriptome between different RNA-Seq datasets.
DNA methylation is an epigenetic modification that plays a crucial role in a variety of biological processes. Methylated DNA is specifically bound by Methyl-CpG Binding Proteins (MBPs). Three different types of MBPs have been identified so far: the Methyl-CpG Binding Domain (MBD) family proteins, three BTB/POZ-Zn-finger proteins, and UHRF1. Most of the known MBPs have been identified via homology with the MBD and Zn-finger domains as present in MeCP2 and Kaiso, respectively. It is conceivable that other proteins are capable of recognizing methylated DNA.
For the purpose of identifying novel ‘readers’ we set up a methyl-CpG pull-down assay combined with stable-isotope labeling by amino acids in cell culture (SILAC). In a methyl-CpG pull-down with U937 nuclear extracts, we recovered several known MBPs and almost all subunits of the MBD2/NuRD complex as methylation specific binders, providing proof-of-principle. Interestingly, RBP-J, the transcription factor downstream of Notch receptors, also bound the DNA in a methylation dependent manner. Follow-up pull-downs and electrophoretic mobility shift assays (EMSAs) showed that RBP-J binds methylated DNA in the context of a mutated RBP-J consensus motif.
The SILAC/methyl-CpG pull-down described here constitutes a new approach to identify potential novel DNAme readers and will advance the unraveling of the complete methyl-DNA interactome.
Motivation: The intensification of DNA sequencing will increasingly unveil uncharacterized species with potential alternative genetic codes. A total of 0.65% of the DNA sequences currently in GenBank encode their proteins with a variant genetic code, and these exceptions occur in many unrelated taxa.
Results: We introduce FACIL (Fast and Accurate genetic Code Inference and Logo), a fast and reliable tool to evaluate nucleic acid sequences for their genetic code that detects alternative codes even in species distantly related to known organisms. To illustrate this, we apply FACIL to a set of mitochondrial genomic contigs of Globobulimina pseudospinescens. This foraminifer does not have any sequenced close relative in the databases, yet we infer its alternative genetic code with high confidence values. Results are intuitively visualized in a Genetic Code Logo.
Availability and implementation: FACIL is available as a web-based service at http://www.cmbi.ru.nl/FACIL/ and as a stand-alone program.
Supplementary information: Supplementary data are available at Bioinformatics online.
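To show what a variant genetic code means in practice, the sketch below translates the same DNA string under the standard code and under the vertebrate mitochondrial code, in which TGA encodes tryptophan, ATA encodes methionine and AGA/AGG act as stops. It does not reproduce FACIL's inference method; it only illustrates the codon reassignments such a tool has to detect, and the function names and example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (third base varies fastest); '*' marks a stop codon.
BASES = "TCAG"
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_CODE = {"".join(codon): aa
                 for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code: four codons are reassigned.
VERT_MITO_CODE = dict(STANDARD_CODE, TGA="W", ATA="M", AGA="*", AGG="*")


def translate(dna, code):
    """Translate complete codons of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def differing_codons(code_a, code_b):
    """Return the sorted codons whose meaning differs between two codes."""
    return sorted(c for c in code_a if code_a[c] != code_b[c])


if __name__ == "__main__":
    seq = "ATGTGAATATAA"  # hypothetical example sequence
    print(translate(seq, STANDARD_CODE))
    print(translate(seq, VERT_MITO_CODE))
    print(differing_codons(STANDARD_CODE, VERT_MITO_CODE))
```

Translating with the wrong table truncates the protein at the reassigned TGA codon, which is the kind of signal that code inference can exploit.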
The beginning of this century was not only marked by the publication of the first draft of the human genome but also set off a decade of intense research on epigenetic phenomena. Apart from DNA methylation, it became clear that many other factors including a wide range of histone modifications, different shades of chromatin accessibility, and a vast suite of noncoding RNAs comprise the epigenome. With the recent advances in sequencing technologies, it has now become possible to analyze many of these features in depth, allowing for the first time the establishment of complete epigenomic profiles for basically every cell type of interest. Here, we will discuss the recent advances that allow comprehensive epigenetic mapping, highlight several projects that set out to better understand the epigenome, and discuss the impact that epigenomic mapping can have on our understanding of both healthy and diseased cells.
epigenome; chromatin accessibility; DNA methylation; ChIP-seq; RNA-seq