1.  A Comparison of Peak Callers Used for DNase-Seq Data 
PLoS ONE  2014;9(5):e96303.
Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which has motivated us to assess and compare their performance. In this study, four published, publicly available peak-calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) are assessed at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results were benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each particular cell type. The level of overlap between peak regions reported by each algorithm and this ENCODE-derived reference set was used to assess the sensitivity and specificity of the algorithms. Our study suggests that F-seq has a slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq-oriented method, MACS, both perform competitively when used with their default parameters. However, the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess the accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-seq can be considerably improved by using a threshold setting that is different from the default value.
PMCID: PMC4014496  PMID: 24810143
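The overlap-based benchmark in entry 1 can be illustrated with a short sketch. It assumes that peaks and reference regions are half-open (chrom, start, end) intervals, counts a reference region as recovered when at least one called peak overlaps it by 1 bp or more, and reports the fraction of retained peaks that hit a reference region as a precision-style measure. These definitions, the function names, and the brute-force overlap test are illustrative assumptions, not the study's published procedure.

```python
def overlaps(a, b):
    """True if half-open intervals a and b lie on the same chromosome and share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_rates(peaks, reference):
    """Return (sensitivity, precision) of `peaks` against `reference` regions.

    sensitivity: fraction of reference regions overlapped by any peak.
    precision:   fraction of peaks overlapping any reference region.
    Brute force, O(len(peaks) * len(reference)); suitable for toy inputs only.
    """
    hit_ref = sum(any(overlaps(r, p) for p in peaks) for r in reference)
    hit_peaks = sum(any(overlaps(p, r) for r in reference) for p in peaks)
    sensitivity = hit_ref / len(reference) if reference else 0.0
    precision = hit_peaks / len(peaks) if peaks else 0.0
    return sensitivity, precision


def threshold_sweep(scored_peaks, reference, thresholds):
    """For each threshold t, keep peaks with score >= t and report both rates."""
    results = []
    for t in thresholds:
        kept = [(chrom, start, end) for chrom, start, end, score in scored_peaks if score >= t]
        results.append((t, *overlap_rates(kept, reference)))
    return results


if __name__ == "__main__":
    reference = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    peaks = [("chr1", 150, 250, 9.0), ("chr1", 300, 400, 2.0), ("chr2", 60, 70, 5.0)]
    for t, sens, prec in threshold_sweep(peaks, reference, [1.0, 6.0]):
        print(f"threshold={t}: sensitivity={sens:.2f}, precision={prec:.2f}")
```

In this toy example, raising the threshold from 1.0 to 6.0 lowers sensitivity from 0.67 to 0.33 while precision rises from 0.67 to 1.00, the kind of trade-off that motivates evaluating peak callers across a range of signal thresholds rather than at a single default.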
2.  Synergistic Mechanisms of DNA Demethylation during Transition to Ground-State Pluripotency 
Stem Cell Reports  2013;1(6):518-531.
Pluripotent stem cells (PSCs) occupy a spectrum of reversible molecular states ranging from a naive ground-state in 2i, to metastable embryonic stem cells (ESCs) in serum, to lineage-primed epiblast stem cells (EpiSCs). To investigate the role of DNA methylation (5mC) across distinct pluripotent states, we mapped genome-wide 5mC and 5-hydroxymethylcytosine (5hmC) in multiple PSCs. Ground-state ESCs exhibit an altered distribution of 5mC and 5hmC at regulatory elements and dramatically lower absolute levels relative to ESCs in serum. By contrast, EpiSCs exhibit increased promoter 5mC coupled with reduced 5hmC, which contributes to their developmental restriction. Switching to 2i triggers rapid onset of both the ground-state gene expression program and global DNA demethylation. Mechanistically, repression of de novo methylases by PRDM14 drives DNA demethylation with slow kinetics, whereas TET1/TET2-mediated 5hmC conversion enhances both the rate and extent of hypomethylation. These processes thus act synergistically during transition to ground-state pluripotency to promote a robust hypomethylated state.
Graphical Abstract
• Distinct genome-wide 5mC and 5hmC profiles in diverse pluripotent stem cells
• Poised enhancers and promoters are enriched in 5hmC in ESCs in serum, but not in 2i
• Prdm14 overexpression in serum ESCs promotes partial demethylation with slow kinetics
• Mutations in Tet1/Tet2 partially block DNA hypomethylation in ground-state cells
Pluripotent stem cells (PSCs) can give rise to all embryonic lineages. Hackett, Surani, and colleagues analyzed the epigenetic landscape of PSCs in distinct but interchangeable pluripotent “states” and found they are associated with discrete 5mC and 5hmC profiles at regulatory elements and genome wide. Notably, ground-state PSCs are globally hypomethylated via the synergistic effects of PRDM14-dependent repression of Dnmt3 genes and TET-mediated 5hmC conversion.
PMCID: PMC3871394  PMID: 24371807
3.  Germline DNA Demethylation Dynamics and Imprint Erasure through 5-hydroxymethylcytosine 
Science (New York, N.Y.)  2012;339(6118):10.1126/science.1229277.
Mouse primordial germ cells (PGCs) undergo sequential epigenetic changes and genome-wide DNA demethylation to reset the epigenome for totipotency. Here, we demonstrate that erasure of CpG methylation (5mC) in PGCs occurs via conversion to 5-hydroxymethylcytosine (5hmC), driven by high levels of TET1 and TET2. Global conversion to 5hmC initiates asynchronously among PGCs at embryonic day (E) 9.5-E10.5 and accounts for the unique process of imprint erasure. Mechanistically, 5hmC enrichment is followed by its protracted decline at a rate consistent with replication-coupled dilution. The conversion to 5hmC is a significant component of parallel redundant systems that drive comprehensive reprogramming in PGCs. Nonetheless, we identify rare regulatory elements that escape systematic DNA demethylation in PGCs, providing a potential mechanistic basis for transgenerational epigenetic inheritance.
PMCID: PMC3847602  PMID: 23223451
4.  The epigenetic regulator PLZF represses L1 retrotransposition in germ and progenitor cells 
The EMBO journal  2013;32(13):10.1038/emboj.2013.118.
Germ cells and adult stem cells maintain tissue homeostasis through a finely tuned program of responses to both physiological and stress-related signals. PLZF, a member of the POK family of transcription factors, acts as an epigenetic regulator of stem cell maintenance in germ cells and in hematopoietic stem cells. We identified L1 retrotransposons as the primary targets of PLZF. PLZF-mediated DNA methylation induces silencing of the full-length L1 gene and inhibits L1 retrotransposition. Furthermore, PLZF causes the formation of barrier-type boundaries by acting on inserted truncated L1 sequences in protein-coding genes. Cell stress releases PLZF-mediated repression, resulting in L1 activation/retrotransposition and impaired spermatogenesis and myelopoiesis. These results reveal a novel mechanism of action by which PLZF represses retrotransposons, safeguarding normal progenitor homeostasis.
PMCID: PMC3810588  PMID: 23727884
5.  Binding of TFIIIC to SINE Elements Controls the Relocation of Activity-Dependent Neuronal Genes to Transcription Factories 
PLoS Genetics  2013;9(8):e1003699.
In neurons, the timely and accurate expression of genes in response to synaptic activity relies on the interplay between epigenetic modifications of histones, recruitment of regulatory proteins to chromatin and changes to nuclear structure. To identify genes and regulatory elements responsive to synaptic activation in vivo, we performed a genome-wide ChIP-seq analysis of acetylated histone H3 using somatosensory cortex of mice exposed to novel enriched environmental (NEE) conditions. We discovered that Short Interspersed Elements (SINEs) located distal to promoters of activity-dependent genes became acetylated following exposure to NEE and were bound by the general transcription factor TFIIIC. Importantly, under depolarizing conditions, inducible genes relocated to transcription factories (TFs), and this event was controlled by TFIIIC. Silencing of the TFIIIC subunit Gtf3c5 in non-stimulated neurons induced uncontrolled relocation to TFs and transcription of activity-dependent genes. Remarkably, in cortical neurons, silencing of Gtf3c5 mimicked the effects of chronic depolarization, inducing a dramatic increase of both dendritic length and branching. These findings reveal a novel and essential regulatory function of both SINEs and TFIIIC in mediating gene relocation and transcription. They also suggest that TFIIIC may regulate the rearrangement of nuclear architecture, allowing the coordinated expression of activity-dependent neuronal genes.
Author Summary
In neurons, acetylation of histones and other epigenetic modifications influence gene expression in response to synaptic activity. Genes that are concomitantly expressed in response to stimulation are transcribed at specific nuclear foci, known as transcription factories (TFs), that are enriched with active RNA Polymerase II and often include specific transcription factors. Here, we show a novel regulatory role for Short Interspersed Elements (SINEs) located in the proximity of activity-regulated genes. SINEs represent a new class of regulatory sequences that function as coordinators of depolarization-dependent transcription. Binding of the general transcription factor TFIIIC to SINEs regulates activity-dependent transcription, relocation of inducible genes to transcription factories and dendritogenesis. Our study provides new fundamental insights into the mechanisms by which relocation of inducible genes to transcription factories and changes of nuclear architecture coordinate the transcriptional program in response to neuronal activity.
PMCID: PMC3744447  PMID: 23966877
6.  Chromatin Accessibility Data Sets Show Bias Due to Sequence Specificity of the DNase I Enzyme 
PLoS ONE  2013;8(7):e69853.
DNase I is an enzyme which cuts duplex DNA at a rate that depends strongly upon its chromatin environment. In combination with high-throughput sequencing (HTS) technology, it can be used to infer genome-wide landscapes of open chromatin regions. Using this technology, systematic identification of hundreds of thousands of DNase I hypersensitive sites (DHS) per cell type has been possible, and this in turn has helped to precisely delineate genomic regulatory compartments. However, to date there has been relatively little investigation into possible biases affecting this data.
We report a significant degree of sequence preference spanning sites cut by DNase I in a number of published data sets. The two major protocols in current use each show a different pattern, but for a given protocol the pattern of sequence specificity seems to be quite consistent. The patterns are substantially different from biases seen in other types of HTS data sets, and in some cases the most constrained position lies outside the sequenced fragment, implying that this constraint must relate to the digestion process rather than events occurring during library preparation or sequencing.
DNase I is a sequence-specific enzyme, with a specificity that may depend on experimental conditions. This sequence specificity is not taken into account by existing pipelines for identifying open chromatin regions. Care must be taken when interpreting DNase I results, especially when looking at the precise locations of the reads. Future studies may be able to improve the sensitivity and precision of chromatin state measurement by compensating for sequence bias.
PMCID: PMC3724795  PMID: 23922824
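The sequence-preference analysis in entry 6 amounts to tabulating base frequencies at fixed offsets around each DNase I cut. The sketch below assumes that cut positions have already been converted to 0-based boundary coordinates (the cut falls between genome[pos - 1] and genome[pos]) with a strand label, and it reverse-complements minus-strand windows so every window is read 5' to 3' on its own strand. The function names and this coordinate convention are assumptions for illustration, not the pipeline used in the study.

```python
from collections import Counter

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def cut_site_profile(genome, cuts, flank=3):
    """Base frequencies at each of the 2*flank offsets around DNase I cut sites.

    genome: uppercase DNA string.
    cuts:   (pos, strand) pairs; the cut lies between genome[pos - 1] and genome[pos].
    Returns a list of {base: frequency} dicts; the cut falls between offsets
    flank - 1 and flank. Cuts too close to either end, or whose window contains
    bases other than A/C/G/T, are skipped.
    """
    counts = [Counter() for _ in range(2 * flank)]
    for pos, strand in cuts:
        if pos - flank < 0 or pos + flank > len(genome):
            continue
        window = genome[pos - flank:pos + flank]
        if set(window) - set("ACGT"):
            continue  # ambiguous bases (e.g. N) would distort the frequencies
        if strand == "-":
            # Read minus-strand windows 5'->3' on the minus strand; the cut stays central.
            window = revcomp(window)
        for offset, base in enumerate(window):
            counts[offset][base] += 1
    profile = []
    for c in counts:
        total = sum(c.values())
        profile.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return profile


if __name__ == "__main__":
    genome = "AAACGTTT"
    cuts = [(4, "+"), (3, "-"), (1, "+")]  # the last cut is too close to the start
    for offset, freqs in enumerate(cut_site_profile(genome, cuts, flank=2)):
        print(offset, freqs)
```

A real analysis would compare each offset's frequencies with the background base composition of the genome; positions whose frequencies depart strongly from background, including offsets outside the sequenced fragment, are what the study reports as evidence of DNase I sequence specificity.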
7.  Inactive or moderately active human promoters are enriched for inter-individual epialleles 
Genome Biology  2013;14(5):R43.
Inter-individual epigenetic variation, due to genetic, environmental or random influences, is observed in many eukaryotic species. In mammals, however, the molecular nature of epiallelic variation has been poorly defined, partly due to the restricted focus on DNA methylation. Here we report the first genome-scale investigation of mammalian epialleles that integrates genomic, methylomic, transcriptomic and histone state information.
First, in a small sample set, we demonstrate that non-genetically determined inter-individual differentially methylated regions (iiDMRs) can be temporally stable over at least 2 years. Then, we show that iiDMRs are associated with changes in chromatin state as measured by inter-individual differences in histone variant H2A.Z levels. However, the correlation of promoter iiDMRs with gene expression is negligible and not improved by integrating H2A.Z information. We find that most promoter epialleles, whether genetically or non-genetically determined, are associated with low levels of transcriptional activity, depleted for housekeeping genes, and either depleted for H3K4me3/enriched for H3K27me3 or lacking both these marks in human embryonic stem cells. The preferential enrichment of iiDMRs at regions of relative transcriptional inactivity is validated in a larger independent cohort, and is reminiscent of observations previously made for promoters that undergo hypermethylation in various cancers, in vitro cell culture and ageing.
Our work identifies potential key features of epiallelic variation in humans, including temporal stability of non-genetically determined epialleles, and concomitant perturbations of chromatin state. Furthermore, our work suggests a novel mechanistic link among inter-individual epialleles observed in the context of normal variation, cancer and ageing.
PMCID: PMC4053860  PMID: 23706135
Epigenetics; DNA methylation; epialleles
8.  Distinct Epigenomic Features in End-Stage Failing Human Hearts 
Circulation  2011;124(22):2411-2422.
The epigenome refers to marks on the genome, including DNA methylation and histone modifications, that regulate the expression of underlying genes. A consistent profile of gene expression changes in end-stage cardiomyopathy led us to hypothesize that distinct global patterns of the epigenome may also exist.
Methods and Results
We constructed genome-wide maps of DNA methylation and histone-3 lysine-36 trimethylation (H3K36me3) enrichment for cardiomyopathic and normal human hearts. More than 506 Mb of sequence per library was generated by high-throughput sequencing, allowing us to assign methylation scores to ≈28 million CG dinucleotides in the human genome. DNA methylation was significantly different in promoter CpG islands, intragenic CpG islands, gene bodies, and H3K36me3-enriched regions of the genome. DNA methylation differences were present in promoters of upregulated genes but not downregulated genes. H3K36me3 enrichment itself was also significantly different in coding regions of the genome. Specifically, abundance of RNA transcripts encoded by the DUX4 locus correlated with differential DNA methylation and H3K36me3 enrichment. In vitro, Dux gene expression was responsive to a specific inhibitor of DNA methyltransferase, and Dux siRNA knockdown led to reduced cell viability.
Distinct epigenomic patterns exist in important DNA elements of the cardiac genome in human end-stage cardiomyopathy. The epigenome may control the expression of local or distal genes with critical functions in myocardial stress response. If epigenomic patterns track with disease progression, assays for the epigenome may be useful for assessing prognosis in heart failure. Further studies are needed to determine whether and how the epigenome contributes to the development of cardiomyopathy.
PMCID: PMC3634158  PMID: 22025602
genes; genome-wide analysis; genomics; heart failure
10.  Epigenome-Wide Association Studies for common human diseases 
Nature reviews. Genetics  2011;12(8):529-541.
Despite the success of genome-wide association studies (GWAS) in identifying loci associated with common diseases, a significant proportion of the causality remains unexplained. Recent advances in genomic technologies have placed us in a position to initiate large-scale studies of human disease-associated epigenetic variation, specifically variation in DNA methylation (DNAm). Such Epigenome-Wide Association Studies (EWAS) present novel opportunities but also create new challenges that are not encountered in GWAS. We discuss EWAS study design, cohort and sample selections, statistical significance and power, confounding factors, and follow-up studies. We also discuss how integration of EWAS with GWAS can help to dissect complex GWAS haplotypes for functional analysis.
PMCID: PMC3508712  PMID: 21747404
Epigenomics; Disease Genetics; DNA Methylation; Epigenetics; Quantitative Trait
11.  Genomic Targets of Brachyury (T) in Differentiating Mouse Embryonic Stem Cells 
PLoS ONE  2012;7(3):e33346.
The T-box transcription factor Brachyury (T) is essential for formation of the posterior mesoderm and the notochord in vertebrate embryos. Work in the frog and the zebrafish has identified some direct genomic targets of Brachyury, but little is known about Brachyury targets in the mouse.
Methodology/Principal Findings
Here we use chromatin immunoprecipitation and mouse promoter microarrays to identify targets of Brachyury in embryoid bodies formed from differentiating mouse ES cells. The targets we identify are enriched for sequence-specific DNA binding proteins and include components of signal transduction pathways that direct cell fate in the primitive streak and tailbud of the early embryo. Expression of some of these targets, such as Axin2, Fgf8 and Wnt3a, is downregulated in Brachyury mutant embryos and we demonstrate that they are also Brachyury targets in the human. Surprisingly, we do not observe enrichment of the canonical T-domain DNA binding sequence 5′-TCACACCT-3′ in the vicinity of most Brachyury target genes. Rather, we have identified an (AC)n repeat sequence, which is conserved in the rat but not in human, zebrafish or Xenopus. We do not understand the significance of this sequence, but speculate that it enhances transcription factor binding in the regulatory regions of Brachyury target genes in rodents.
Our work identifies the genomic targets of a key regulator of mesoderm formation in the early mouse embryo, thereby providing insights into the Brachyury-driven genetic regulatory network and allowing us to compare the function of Brachyury in different species.
PMCID: PMC3316570  PMID: 22479388
12.  Genome Wide Analysis of Acute Myeloid Leukemia Reveal Leukemia Specific Methylome and Subtype Specific Hypomethylation of Repeats 
PLoS ONE  2012;7(3):e33213.
Methylated DNA immunoprecipitation followed by high-throughput sequencing (MeDIP-seq) has the potential to identify changes in DNA methylation important in cancer development. In order to understand the role of epigenetic modulation in the development of acute myeloid leukemia (AML) we have applied MeDIP-seq to the DNA of 12 AML patients and 4 normal bone marrows. This analysis revealed leukemia-associated differentially methylated regions that included gene promoters, gene bodies, CpG islands and CpG island shores. Two genes (SPHKAP and DPP6) with significantly methylated promoters were of interest and further analysis of their expression showed them to be repressed in AML. We also demonstrated considerable cytogenetic subtype specificity in the methylomes affecting different genomic features. Significantly distinct patterns of hypomethylation of certain interspersed repeat elements were associated with cytogenetic subtypes. The methylation patterns of members of the SINE family tightly clustered all leukemic patients with an enrichment of Alu repeats with a high CpG density (P<0.0001). We were able to demonstrate significant inverse correlation between intragenic interspersed repeat sequence methylation and gene expression with SINEs showing the strongest inverse correlation (R2 = 0.7). We conclude that the alterations in DNA methylation that accompany the development of AML affect not only the promoters, but also the non-promoter genomic features, with significant demethylation of certain interspersed repeat DNA elements being associated with AML cytogenetic subtypes. MeDIP-seq data were validated using bisulfite pyrosequencing and the Infinium array.
PMCID: PMC3315563  PMID: 22479372
13.  Identification of Type 1 Diabetes–Associated DNA Methylation Variable Positions That Precede Disease Diagnosis 
PLoS Genetics  2011;7(9):e1002300.
Monozygotic (MZ) twin pair discordance for childhood-onset Type 1 Diabetes (T1D) is ∼50%, implicating roles for genetic and non-genetic factors in the aetiology of this complex autoimmune disease. Although significant progress has been made in elucidating the genetics of T1D in recent years, the non-genetic component has remained poorly defined. We hypothesized that epigenetic variation could underlie some of the non-genetic component of T1D aetiology and, thus, performed an epigenome-wide association study (EWAS) for this disease. We generated genome-wide DNA methylation profiles of purified CD14+ monocytes (an immune effector cell type relevant to T1D pathogenesis) from 15 T1D–discordant MZ twin pairs. This identified 132 different CpG sites at which the direction of the intra-MZ pair DNA methylation difference significantly correlated with the diabetic state, i.e. T1D–associated methylation variable positions (T1D–MVPs). We confirmed these T1D–MVPs display statistically significant intra-MZ pair DNA methylation differences in the expected direction in an independent set of T1D–discordant MZ pairs (P = 0.035). Then, to establish the temporal origins of the T1D–MVPs, we generated two further genome-wide datasets and established that, when compared with controls, T1D–MVPs are enriched in singletons both before (P = 0.001) and at (P = 0.015) disease diagnosis, and also in singletons positive for diabetes-associated autoantibodies but disease-free even after 12 years follow-up (P = 0.0023). Combined, these results suggest that T1D–MVPs arise very early in the etiological process that leads to overt T1D. Our EWAS of T1D represents an important contribution toward understanding the etiological role of epigenetic variation in type 1 diabetes, and it is also the first systematic analysis of the temporal origins of disease-associated epigenetic variation for any human complex disease.
Author Summary
Type 1 diabetes (T1D) is a complex autoimmune disease affecting >30 million people worldwide. It is caused by a combination of genetic and non-genetic factors, leading to destruction of insulin-secreting cells. Although significant progress has recently been made in elucidating the genetics of T1D, the non-genetic component has remained poorly defined. Epigenetic modifications, such as methylation of DNA, are indispensable for genomic processes such as transcriptional regulation and are frequently perturbed in human disease. We therefore hypothesized that epigenetic variation could underlie some of the non-genetic component of T1D aetiology, and we performed a genome-wide DNA methylation analysis of a specific subset of immune cells (monocytes) from monozygotic twins discordant for T1D. This revealed the presence of T1D–specific methylation variable positions (T1D–MVPs) in the T1D–affected co-twins. Since these T1D–MVPs were found in MZ twins, they cannot be due to genetic differences. Additional experiments revealed that some of these T1D–MVPs are found in individuals before T1D diagnosis, suggesting they arise very early in the process that leads to overt T1D and are not simply due to post-disease associated factors (e.g. medication or long-term metabolic changes). T1D–MVPs may thus potentially represent a previously unappreciated, and important, component of type 1 diabetes risk.
PMCID: PMC3183089  PMID: 21980303
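Entry 13 selects CpG sites at which the direction of the intra-pair methylation difference tracks disease status. One simple way to score how consistent that direction is across discordant pairs is an exact two-sided sign test, sketched below under the assumption that each input value is the affected twin's methylation minus the unaffected co-twin's at one CpG. The abstract does not state which statistic the authors used, so the test, the function name, and the example values are illustrative only.

```python
from math import comb


def sign_test(diffs):
    """Exact two-sided sign test on paired differences.

    diffs: affected-minus-unaffected methylation values, one per twin pair.
    Zero differences carry no direction and are dropped.
    Returns (n_positive, n_informative, p_value).
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    tail = min(k, n - k)
    # Probability of a split at least this lopsided under Binomial(n, 0.5), both tails.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return k, n, min(p, 1.0)


if __name__ == "__main__":
    # Hypothetical differences for 15 discordant pairs at a single CpG.
    diffs = [0.04, 0.02, 0.05, 0.01, 0.03, 0.02, 0.06, 0.01,
             0.02, 0.03, 0.04, -0.01, 0.02, 0.05, 0.03]
    print(sign_test(diffs))
```

Applied genome-wide, each CpG would receive its own p-value, so a correction for multiple testing would still be needed before calling any site disease-associated.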
14.  Systematic bias in high-throughput sequencing data and its correction by BEADS 
Nucleic Acids Research  2011;39(15):e103.
Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina’s Genome Analyser platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample-to-sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.
PMCID: PMC3159482  PMID: 21646344
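Entry 14 names GC content as one of the biases that BEADS corrects. The sketch below shows only the general idea of GC-stratified normalization: genomic windows are grouped by GC fraction, and each window's read count is rescaled so that every GC stratum has the same median as the genome-wide median. The number of strata, the use of medians, and the function names are assumptions; this is not the BEADS implementation and it omits the mappability and regional-bias steps.

```python
from collections import defaultdict
from statistics import median


def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases, or None if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else None


def gc_normalize(counts, seqs, n_strata=10):
    """Rescale per-window read counts so every GC stratum has the global median.

    counts: reads per genomic window; seqs: the reference sequence of each window.
    Windows with no usable GC estimate, or in a stratum whose median is 0, get None.
    """
    keys = []
    strata = defaultdict(list)
    for count, seq in zip(counts, seqs):
        gc = gc_fraction(seq)
        # Map GC fraction in [0, 1] to a stratum index 0..n_strata-1.
        key = None if gc is None else min(int(gc * n_strata), n_strata - 1)
        keys.append(key)
        if key is not None:
            strata[key].append(count)
    global_median = median(c for c, k in zip(counts, keys) if k is not None)
    stratum_median = {k: median(v) for k, v in strata.items()}
    normalized = []
    for count, key in zip(counts, keys):
        if key is None or stratum_median[key] == 0:
            normalized.append(None)
        else:
            normalized.append(count * global_median / stratum_median[key])
    return normalized


if __name__ == "__main__":
    counts = [2, 4, 10, 20, 5]
    seqs = ["ATAT", "ATAT", "GCGC", "GCGC", "NNNN"]
    print(gc_normalize(counts, seqs))
```

In the toy example the GC-rich windows start with a median five times that of the AT-rich windows; after rescaling, both strata share the same median, and the window made only of N bases is left unscored.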
15.  DNA methylation profiling of human chromosomes 6, 20 and 22 
Nature genetics  2006;38(12):1378-1385.
DNA methylation constitutes the most stable type of epigenetic modification modulating the transcriptional plasticity of mammalian genomes. Using bisulfite DNA sequencing, we report high-resolution methylation reference profiles of human chromosomes 6, 20 and 22, providing a resource of about 1.9 million CpG methylation values derived from 12 different tissues. Analysis of 6 annotation categories revealed evolutionarily conserved regions to be the predominant sites for differential DNA methylation and a core region surrounding the transcriptional start site as an informative surrogate for promoter methylation. We find 17% of the 873 analyzed genes to be differentially methylated in their 5′-untranslated regions (5′-UTR) and about one third of the differentially methylated 5′-UTRs to be inversely correlated with transcription. While our study was controlled for factors reported to affect DNA methylation such as sex and age, we did not find any significant attributable effects. Our data suggest DNA methylation to be ontogenetically more stable than previously thought.
PMCID: PMC3082778  PMID: 17072317
16.  Dalliance: interactive genome viewing on the web 
Bioinformatics  2011;27(6):889-890.
Summary: Dalliance is a new genome viewer which offers a high level of interactivity while running within a web browser. All data is fetched using the established distributed annotation system (DAS) protocol, making it easy to customize the browser and add extra data.
Availability and Implementation: Dalliance runs entirely within your web browser, and relies on existing DAS server infrastructure. Browsers for several mammalian genomes are available at, and the use of DAS means you can add your own data to these browsers. In addition, the source code (JavaScript) is available under the BSD license, and is straightforward to install on your own web server and embed within other documents.
PMCID: PMC3051325  PMID: 21252075
17.  Assessing Computational Methods of Cis-Regulatory Module Prediction 
PLoS Computational Biology  2010;6(12):e1001020.
Computational methods attempting to identify instances of cis-regulatory modules (CRMs) in the genome face a challenging problem of searching for potentially interacting transcription factor binding sites while knowledge of the specific interactions involved remains limited. Without a comprehensive comparison of their performance, the reliability and accuracy of these tools remains unclear. Faced with a large number of different tools that address this problem, we summarized and categorized them based on search strategy and input data requirements. Twelve representative methods were chosen and applied to predict CRMs from the Drosophila CRM database REDfly, and across the human ENCODE regions. Our results show that the optimal choice of method varies depending on species and composition of the sequences in question. When discriminating CRMs from non-coding regions, those methods considering evolutionary conservation have a stronger predictive power than methods designed to be run on a single genome. Different CRM representations and search strategies rely on different CRM properties, and different methods can complement one another. For example, some favour homotypical clusters of binding sites, while others perform best on short CRMs. Furthermore, most methods appear to be sensitive to the composition and structure of the genome to which they are applied. We analyze the principal features that distinguish the methods that performed well, identify weaknesses leading to poor performance, and provide a guide for users. We also propose key considerations for the development and evaluation of future CRM-prediction methods.
Author Summary
Transcriptional regulation involves multiple transcription factors binding to DNA sequences. A limited repertoire of transcription factors performs this complex regulatory step through various spatial and temporal interactions between themselves and their binding sites. These transcription factor binding interactions are clustered as distinct modules: cis-regulatory modules (CRMs). Computational methods attempting to identify instances of CRMs in the genome face a challenging problem because a majority of these interactions between transcription factors remain unknown. To investigate the reliability and accuracy of these methods, we chose twelve representative methods and applied them to predict CRMs on both the fly and human genomes. Our results show that the optimal choice of method varies depending on species and composition of the sequences in question. Different CRM representations and search strategies rely on different CRM properties, and different methods can complement one another. We provide a guide for users and key considerations for developers. We also expect that, along with new technology generating new types of genomic data, future CRM prediction methods will be able to reveal transcription factor binding interactions in three-dimensional space.
PMCID: PMC2996316  PMID: 21152003
18.  Integrated Genetic and Epigenetic Analysis Identifies Haplotype-Specific Methylation in the FTO Type 2 Diabetes and Obesity Susceptibility Locus 
PLoS ONE  2010;5(11):e14040.
Recent multi-dimensional approaches to the study of complex disease have revealed powerful insights into how genetic and epigenetic factors may underlie their aetiopathogenesis. We examined genotype-epigenotype interactions in the context of Type 2 Diabetes (T2D), focussing on known regions of genomic susceptibility. We assayed DNA methylation in 60 females, stratified according to disease susceptibility haplotype using previously identified association loci. CpG methylation was assessed using methylated DNA immunoprecipitation on a targeted array (MeDIP-chip) and absolute methylation values were estimated using a Bayesian algorithm (BATMAN). Absolute methylation levels were quantified across LD blocks, and we identified increased DNA methylation on the FTO obesity susceptibility haplotype, tagged by the rs8050136 risk allele A (p = 9.40×10⁻⁴, permutation p = 1.0×10⁻³). Further analysis across the 46 kb LD block using sliding windows localised the most significant difference to be within a 7.7 kb region (p = 1.13×10⁻⁷). Sequence level analysis, followed by pyrosequencing validation, revealed that the methylation difference was driven by the co-ordinated phase of CpG-creating SNPs across the risk haplotype. This 7.7 kb region of haplotype-specific methylation (HSM) encapsulates a Highly Conserved Non-Coding Element (HCNE) that has previously been validated as a long-range enhancer, supported by the histone H3K4me1 enhancer signature. This study demonstrates that integration of Genome-Wide Association (GWA) SNP and epigenomic DNA methylation data can identify potential novel genotype-epigenotype interactions within disease-associated loci, thus providing a novel route to aid unravelling common complex diseases.
PMCID: PMC2987816  PMID: 21124985
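The sliding-window localisation in entry 18 can be sketched as a scan of fixed-width windows over per-CpG methylation differences (risk haplotype minus non-risk haplotype), keeping the window with the largest mean difference. The study worked from BATMAN methylation estimates and reported window p-values; this sketch instead ranks windows by mean difference, and its window width, step, and function name are illustrative assumptions.

```python
def best_window(positions, diffs, width, step):
    """Return (start, end, mean_diff) for the half-open window with the largest mean difference.

    positions: CpG coordinates; diffs: methylation difference at each CpG
    (e.g. risk minus non-risk haplotype). Windows start at the first CpG and
    advance by `step`; windows containing no CpGs are skipped.
    Returns None if there are no CpGs.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    pairs = sorted(zip(positions, diffs))
    if not pairs:
        return None
    best = None
    start, last = pairs[0][0], pairs[-1][0]
    while start <= last:
        vals = [d for p, d in pairs if start <= p < start + width]
        if vals:
            mean = sum(vals) / len(vals)
            if best is None or mean > best[2]:
                best = (start, start + width, mean)
        start += step
    return best


if __name__ == "__main__":
    positions = [0, 100, 200, 300, 400]
    diffs = [0.0, 0.1, 0.6, 0.5, 0.0]
    print(best_window(positions, diffs, width=200, step=100))
```

Overlapping windows (step smaller than width) let the scan localise a signal more finely than non-overlapping bins would, at the cost of testing more, correlated windows.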
19.  Genome-Wide Identification of Targets and Function of Individual MicroRNAs in Mouse Embryonic Stem Cells 
PLoS Genetics  2010;6(10):e1001163.
Mouse Embryonic Stem (ES) cells express a unique set of microRNAs (miRNAs), the miR-290-295 cluster. To elucidate the role of these miRNAs and how they integrate into the ES cell regulatory network requires identification of their direct regulatory targets. The difficulty, however, arises from the limited complementarity of metazoan miRNAs to their targets, with the interaction requiring as few as six nucleotides of the miRNA seed sequence. To identify miR-294 targets, we used Dicer1-null ES cells, which lack all endogenous mature miRNAs, and introduced just miR-294 into these ES cells. We then employed two approaches to discover miR-294 targets in mouse ES cells: transcriptome profiling using microarrays and a biochemical approach to isolate mRNA targets associated with the Argonaute2 (Ago2) protein of the RISC (RNA Induced Silencing Complex) effector, followed by RNA–sequencing. In the absence of Dicer1, the RISC complexes are largely devoid of mature miRNAs and should therefore contain only transfected miR-294 and its base-paired targets. Our data suggest that miR-294 may promote pluripotency by regulating a subset of c-Myc target genes and upregulating pluripotency-associated genes such as Lin28.
Author Summary
Stem cells in plants and animals contain many small RNAs, which help to regulate differentiation into diverse cell types. Mutation in a gene necessary for the maturation of small RNAs in plants causes the stem cells (called meristem cells) to remain in an indeterminate, overproliferating state. Similarly, in worms, a small RNA called lin-4 miRNA prevents “stem cell–like cells” from appearing at inappropriate times. Thus, it is important to determine the precise functions of key individual small RNAs in embryonic stem cells. To address this, we created embryonic stem cells lacking all miRNAs, into which we introduced a single miRNA. We discovered that a single miRNA could affect the expression of many genes in stem cells, which in turn regulate key properties of stem cells. Together, these genes help establish an intricate network of gene regulation that defines stem cell properties. Our findings are of broad interest because different miRNAs have critical functions in diverse cell types in developing embryos. Understanding the function of these molecules is also important because misregulation of miRNA function underlies some human diseases, including cancers.
PMCID: PMC2958809  PMID: 20975942
20.  Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated 
BMC Genomics  2010;11:519.
DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS"), but chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) shows that the large majority of these are not biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding.
Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs, as well as the subset of those residing outside CpG islands, have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. Forty percent of empirical TFBS with conserved consensus motifs resided in CpG islands, whereas only 7% of all conserved consensus motifs were in CpG islands. Finally, we identified a minority subset of TFs whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs, implying that these TFs may be responsible for establishing or maintaining an un-methylated DNA state, or that their binding is not regulated by DNA methylation.
Our analysis supports the hypothesis that, at least for a subset of TFs, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation.
PMCID: PMC2997012  PMID: 20875111
21.  Metamotifs - a generative model for building families of nucleotide position weight matrices 
BMC Bioinformatics  2010;11:348.
Development of high-throughput methods for measuring DNA interactions of transcription factors, together with computational advances in short motif inference algorithms, is expanding our understanding of transcription factor binding site motifs. The resulting growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends could help both in measuring the similarity of novel computational motif predictions to previously known data and in sensitively detecting, in novel sequence, regulatory motifs similar to previously known ones.
We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input PWM motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: as a PWM prior in the Bayesian NestedMICA motif inference algorithm to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain.
We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase motif inference sensitivity. Metamotifs were also successfully applied to a motif classification problem, in which sequence motif features were used to predict the family of protein DNA binding domains that would interact with a given motif. The metamotif-based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks related to computational sequence motif inference, especially de novo inference. The metamotif methods presented have been incorporated into the NestedMICA suite.
PMCID: PMC2906491  PMID: 20579334
22.  iMotifs: an integrated sequence motif visualization and analysis environment 
Bioinformatics  2010;26(6):843-844.
Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have recently been developed for detecting regulatory factor binding sites in vivo and in vitro, and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. The development of intuitive tools for the study of sequence motifs is therefore important.
iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces.
The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided.
Availability: iMotifs is licensed under the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source code are available at and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0-licensed library, libxms, for the Perl, Ruby, R and Objective-C programming languages, for input and output of XMS-formatted annotated sequence motif set files.
PMCID: PMC2832821  PMID: 20106815
23.  DNA Methylation-mediated Down-regulation of DNA Methyltransferase-1 (DNMT1) Is Coincident with, but Not Essential for, Global Hypomethylation in Human Placenta 
The Journal of Biological Chemistry  2010;285(13):9583-9593.
The genome of extraembryonic tissue, such as the placenta, is hypomethylated relative to that in somatic tissues. However, the origin and role of this hypomethylation remain unclear. The DNA methyltransferases DNMT1, -3A, and -3B are the primary mediators of the establishment and maintenance of DNA methylation in mammals. In this study, we investigated promoter methylation-mediated epigenetic down-regulation of DNMT genes as a potential regulator of global methylation levels in placental tissue. Although DNMT3A and -3B promoters lack methylation in all somatic and extraembryonic tissues tested, we found specific hypermethylation of the maintenance DNA methyltransferase (DNMT1) gene, and hypomethylation of the DNMT3L gene, in full-term and first-trimester placental tissues. Bisulfite DNA sequencing revealed monoallelic methylation of DNMT1, with no evidence of imprinting (parent-of-origin effect). In vitro reporter experiments confirmed that DNMT1 promoter methylation attenuates transcriptional activity in trophoblast cells. However, global hypomethylation in the absence of DNMT1 down-regulation is apparent in non-primate placentas and in in vitro-derived human cytotrophoblast stem cells, suggesting that DNMT1 down-regulation is not an absolute requirement for genomic hypomethylation in all instances. These data represent the first demonstration of methylation-mediated regulation of the DNMT1 gene in any system and show that the unique epigenome of the human placenta includes down-regulation of DNMT1 with concomitant hypomethylation of the DNMT3L gene. This strongly implicates epigenetic regulation of the DNMT gene family in the establishment of the unique epigenetic profile of extraembryonic tissue in humans.
PMCID: PMC2843208  PMID: 20071334
Development Differentiation/Tissue; DNA/Methylation; DNA/Methyltransferase; Epigenetics; Gene Transcription; Extraembryonic Tissue; Placenta; Trophoblast
24.  Differential DNA Methylation Correlates with Differential Expression of Angiogenic Factors in Human Heart Failure 
PLoS ONE  2010;5(1):e8564.
Epigenetic mechanisms such as microRNA and histone modification play crucial roles in dysregulated gene expression in heart failure. In contrast, the role of DNA methylation, another well-characterized epigenetic mark, is unknown. To examine whether human cardiomyopathies of different etiologies are connected by a unifying pattern of DNA methylation, we profiled ischaemic and idiopathic end-stage cardiomyopathic left ventricular (LV) explants from patients who had undergone cardiac transplantation and compared them with normal controls. We performed a preliminary analysis using methylated-DNA immunoprecipitation-chip (MeDIP-chip), validated differentially methylated loci by bisulfite (BS) PCR and high-throughput sequencing, and identified 3 angiogenesis-related genetic loci that were differentially methylated. Using quantitative RT-PCR, we found that the expression of these genes differed significantly between cardiomyopathic (CM) hearts and normal controls (p<0.01). Moreover, for each individual LV tissue, differential methylation showed the predicted correlation with differential expression of the corresponding gene. Thus, differential DNA methylation exists in human cardiomyopathy. In this series of heterogeneous cardiomyopathic LV explants, differential DNA methylation was found in at least 3 angiogenesis-related genes. While in other systems changes in DNA methylation at specific genomic loci usually precede changes in the expression of the corresponding genes, our current findings in cardiomyopathy merit further investigation to determine whether DNA methylation changes play a causative role in the progression of heart failure.
PMCID: PMC2797324  PMID: 20084101
25.  Differential chromatin marking of introns and expressed exons by H3K36me3 
Nature Genetics  2009;41(3):376-381.
Variation in the patterns of histone tail methylation reflects and modulates chromatin structure and function¹⁻³. To provide a framework for the analysis of chromatin function in C. elegans, we generated a genome-wide map of histone H3 tail methylations. We find that C. elegans genes show distributions of histone modifications similar to those of other organisms, with H3K4me3 near transcription start sites, H3K36me3 in the body of genes, and H3K9me3 enriched on silent genes. Unexpectedly, we also observe a striking novel pattern: exons are preferentially marked with H3K36me3 relative to introns. H3K36me3 exon marking is dependent on transcription, and its level is lower in alternatively spliced exons, supporting a splicing-related marking mechanism. We further show that the difference in H3K36me3 marking between exons and introns is evolutionarily conserved in human and mouse. We propose that H3K36me3 exon marking in chromatin provides a dynamic link between transcription and splicing.
PMCID: PMC2648722  PMID: 19182803

