Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
Many human diseases are complex with multiple genetic and environmental causal factors interacting together to give rise to disease phenotypes. Such factors affect biological systems through many layers of regulations, including transcriptional and epigenetic regulation, and protein changes. To fully understand their molecular mechanisms, complex diseases are often studied in diverse dimensions including genetics (genotype variations by single nucleotide polymorphism (SNP) arrays or whole exome sequencing), transcriptomics, epigenetics, and proteomics. However, errors in sample annotation or labeling often occur in large-scale genetic and genomic studies and are difficult to avoid completely during data generation and management. Identifying and correcting these errors are critical for integrative genomic studies. In this study, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors based on multiple types of molecular data before further integrative analysis. Our results indicate that signals increased more than 100% after correction of sample labeling errors in a large lung genomic study. Our method can be broadly applied to large genomic data sets with multiple types of omics data, such as TCGA (The Cancer Genome Atlas) data sets.
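The matching idea described above can be illustrated with a minimal sketch. It is not the MODMatcher algorithm itself; it only assumes that row k of one data type is cis-linked to (and positively correlated with) row k of the other, and all names (`zscore`, `best_matches`, the toy data) are hypothetical. Each sample in the first data type is assigned the sample in the second data type whose standardized profile agrees best, and a sample whose best match carries a different label is flagged.

```python
import random
from statistics import mean, pstdev

def zscore(row):
    """Standardize one feature across samples (population standard deviation)."""
    m, s = mean(row), pstdev(row)
    return [(x - m) / s for x in row]

def best_matches(a, b):
    """For each sample (column) in a, return the index of the best-agreeing sample in b.

    a, b: lists of feature rows, one value per sample. Row k of a is ASSUMED
    to be cis-linked to, and positively correlated with, row k of b.
    """
    az = [zscore(r) for r in a]
    bz = [zscore(r) for r in b]
    n_a, n_b = len(a[0]), len(b[0])
    matches = []
    for i in range(n_a):
        # Agreement score: sum over features of the product of z-scores.
        scores = [sum(ra[i] * rb[j] for ra, rb in zip(az, bz)) for j in range(n_b)]
        matches.append(max(range(n_b), key=scores.__getitem__))
    return matches

# Toy data (simulated, not from the study): 200 paired features, 8 samples.
rng = random.Random(0)
n_features, n_samples = 200, 8
expr = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
other = [[x + rng.gauss(0, 0.3) for x in row] for row in expr]

# Simulate a labeling error: samples 2 and 5 are swapped in the second data type.
for row in other:
    row[2], row[5] = row[5], row[2]

matches = best_matches(expr, other)
mislabeled = [i for i, j in enumerate(matches) if i != j]
print(mislabeled)
```

In this toy setting the flagged samples are exactly the swapped pair, and `matches` gives the relabeling that would correct them. Real data would need a prior step to select the cis-linked feature pairs and a significance threshold for calling a match.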
Understanding the fundamental dynamics of epigenome variation during normal aging is critical for elucidating key epigenetic alterations that affect development, cell differentiation and diseases. Advances in the field of aging and DNA methylation strongly support the aging epigenetic drift model. Although this model aligns with previous studies, the role of other epigenetic marks, such as histone modification, as well as the impact of sampling specific CpGs, must be evaluated. Ultimately, it is crucial to investigate how all CpGs in the human genome change their methylation with aging in their specific genomic and epigenomic contexts. Here, we analyze whole genome bisulfite sequencing DNA methylation maps of brain frontal cortex from individuals of diverse ages. Comparisons with blood data reveal tissue-specific patterns of epigenetic drift. By integrating chromatin state information, we reveal divergent degrees and directions of aging-associated methylation in different genomic regions. Whole genome bisulfite sequencing data also open a new door to investigate whether adjacent CpG sites exhibit coordinated DNA methylation changes with aging. We identified significant ‘aging-segments’, which are clusters of nearby CpGs that respond to aging with similar DNA methylation changes. These segments not only capture previously identified aging-CpGs but also include specific functional categories of genes with implications for epigenetic regulation of aging. For example, genes associated with development are highly enriched in positive aging segments, which are gradually hyper-methylated with aging. On the other hand, regions that are gradually hypo-methylated with aging (‘negative aging segments’) in the brain harbor genes involved in metabolism and protein ubiquitination.
Given the importance of protein ubiquitination in proteome homeostasis of aging brains and neurodegenerative disorders, our finding suggests the significance of epigenetic regulation of this posttranslational modification pathway in the aging brain. Utilizing aging segments rather than individual CpGs will provide more comprehensive genomic and epigenomic contexts to understand the intricate associations between genomic neighborhoods and developmental and aging processes. These results complement the aging epigenetic drift model and provide new insights.
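The notion of an aging segment, a cluster of nearby CpGs that change methylation in the same direction with age, can be sketched as follows. This is an illustrative grouping rule only; the abstract does not specify the procedure used, and the function name `find_segments` and the parameters `max_gap` and `min_cpgs` are assumptions. Each CpG is given a direction (+1 for gain with age, -1 for loss, 0 for no association), and consecutive CpGs with the same non-zero direction are merged when they lie close together.

```python
def find_segments(positions, directions, max_gap=500, min_cpgs=3):
    """Group adjacent CpGs that share an age-association direction.

    positions:  sorted genomic coordinates of CpGs on one chromosome
    directions: +1 (gains methylation with age), -1 (loses), 0 (no association)
    max_gap, min_cpgs: hypothetical thresholds chosen for illustration
    Returns a list of (start, end, direction) tuples.
    """
    segments = []
    start = 0
    for i in range(1, len(positions) + 1):
        # A run ends at the end of the data, at a change of direction,
        # or when the next CpG is too far away.
        boundary = (
            i == len(positions)
            or directions[i] != directions[start]
            or positions[i] - positions[i - 1] > max_gap
        )
        if boundary:
            if directions[start] != 0 and i - start >= min_cpgs:
                segments.append((positions[start], positions[i - 1], directions[start]))
            start = i
    return segments

# Toy example with made-up coordinates.
positions = [100, 150, 220, 300, 2000, 2050, 2100, 2160, 5000]
directions = [1, 1, 1, -1, -1, -1, -1, 0, 1]
segments = find_segments(positions, directions)
print(segments)  # [(100, 220, 1), (2000, 2100, -1)]
```

Here the first tuple would be a positive segment and the second a negative one. A real analysis would derive the directions from a statistical test of methylation against age and assess the significance of each segment.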
This paper aims to integrate into the current understanding of the causation of adolescent idiopathic scoliosis (AIS) the etiopathogenetic information presented at two meetings during 2012, namely those of the International Research Society of Spinal Deformities (IRSSD) and the Scoliosis Research Society (SRS). The ultimate hope is to prevent the occurrence or progression of the spinal deformity of AIS with non-invasive treatment, possibly medical. This might be attained by personalised polymechanistic preventive therapy targeting the appropriate etiology and/or etiopathogenetic pathways, to avoid fusion and maintain spinal mobility. Although considerable progress has been made in the past two decades in understanding the etiopathogenesis of AIS, there is still no agreed theory of etiopathogenesis. One problem may be that AIS results not from one cause, but from several that interact with various genetic predisposing factors. One view holds that there are two other pathogenic processes for idiopathic scoliosis, namely those that initiate (or induce) the deformity and those that cause curve progression. Twin studies and observations of family aggregation have revealed significant genetic contributions to idiopathic scoliosis, which place AIS among other common diseases or complex traits with a high heritability interpreted by the genetic variant hypothesis of disease. We summarize etiopathogenetic knowledge of AIS as theories of pathogenesis including recent multiple concepts, and blood tests for AIS based on predictive biomarkers and genetic variants that signify disease risk. There is increasing evidence for the possibility of an underlying neurological disorder for AIS, a line of research that holds promise. As in brain research, most AIS workers focus on their own corner, and there is a need for greater integration of research effort.
Epigenetics, a relatively recent field, evaluates factors concerned with gene expression in relation to environment, disease, normal development and aging, with a complex regulation across the genome during the first decade of life. Research on the role of environmental factors, epigenetics and chronic non-communicable diseases (NCDs) including adiposity, after a slow start, has exploded in the last decade. Not so for AIS research and the environment where, except for monozygotic twin studies, there are only sporadic reports to suggest that environmental factors are at work in etiology. Here, we examine epigenetic concepts as they may relate to human development, normal life history phases and AIS pathogenesis. Although AIS is not regarded as an NCD, like them, it is associated with whole organism metabolic phenomena, including lower body mass index, lower circulating leptin levels and other systemic disorders. Some epigenetic research applied to Silver-Russell syndrome and adiposity is examined, from which suggestions are made for consideration of AIS epigenetic research, cross-sectional and longitudinal. The word scoliogeny is suggested to include etiology, pathogenesis and pathomechanism.
Scoliosis; Etiology; Pathogenesis; Scoliogeny; Epigenetics
CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of large-scale epigenetic datasets. Although widely used, the current CpG island criteria incur significant disadvantages: (1) reliance on arbitrary threshold parameters that bear little biological justification, (2) failure to account for widespread heterogeneity among CpG islands, and (3) apparent lack of specificity when applied to the human genome. This study is driven by the idea that a quantitative score of “CpG island strength” that incorporates epigenetic and functional aspects can help resolve these issues. We construct an epigenome prediction pipeline that links the DNA sequence of CpG islands to their epigenetic states, including DNA methylation, histone modifications, and chromatin accessibility. By training support vector machines on epigenetic data for CpG islands on human Chromosomes 21 and 22, we identify informative DNA attributes that correlate with open versus compact chromatin structures. These DNA attributes are used to predict the epigenetic states of all CpG islands genome-wide. Combining predictions for multiple epigenetic features, we estimate the inherent CpG island strength for each CpG island in the human genome, i.e., its inherent tendency to exhibit an open and transcriptionally competent chromatin structure. We extensively validate our results on independent datasets, showing that the CpG island strength predictions are applicable and informative across different tissues and cell types, and we derive improved maps of predicted “bona fide” CpG islands. 
The mapping of CpG islands by epigenome prediction is conceptually superior to identifying CpG islands by widely used sequence criteria since it links CpG island detection to their characteristic epigenetic and functional states. And it is superior to purely experimental epigenome mapping for CpG island detection since it abstracts from specific properties that are limited to a single cell type or tissue. In addition, using computational epigenetics methods we could identify high correlation between the epigenome and characteristics of the DNA sequence, a finding which emphasizes the need for a better understanding of the mechanistic links between genome and epigenome.
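For readers unfamiliar with the sequence criteria criticized above, the following sketch shows how such a threshold rule works. The default thresholds (length of at least 200 bp, GC content of at least 0.5, observed/expected CpG ratio of at least 0.6) are the commonly cited classic values and are an assumption here, since the text does not state which parameters it refers to. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence of C and G is c * g / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp

def passes_sequence_criteria(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Threshold rule; the default values are assumed, not taken from the text."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content >= min_gc and obs_exp >= min_obs_exp

print(cpg_stats("ACGT"))                    # (0.5, 4.0)
print(passes_sequence_criteria("CG" * 100)) # True
print(passes_sequence_criteria("AT" * 100)) # False
```

The rule returns a yes/no answer that flips at arbitrary cut-offs, which is the limitation that motivates a quantitative strength score.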
A key challenge for bioinformatic research is the identification of regulatory regions in the human genome. Regulatory regions are DNA elements that control gene expression and thereby contribute to the organism's phenotype. An important class of regulatory regions consists of so-called CpG islands, which are characterized by frequent occurrence of the CG sequence pattern. CpG islands are strongly associated with open and transcriptionally competent chromatin structure, they play a critical role in gene regulation, and they are involved in the epigenetic causes of cancer. In this article we make several conceptual improvements to the definition and mapping of CpG islands. First, we show that the traditional distinction between CpG islands and non-CpG islands is too harsh, and instead we propose a quantitative measure of CpG island strength to gradually distinguish between stronger and weaker regulatory regions. Second, by genome-wide comparison of multiple epigenome datasets we identify high correlation between features of the genome's DNA sequence and the epigenome, indicating strong functional interdependence. Third, we develop and apply a novel method for predicting the strength of all CpG islands in the human genome, giving rise to an improved and more accurate CpG island mapping.
Throughout most of the mammalian genome, genetically regulated developmental programming establishes diverse yet predictable epigenetic states across differentiated cells and tissues. At metastable epialleles (MEs), conversely, epigenotype is established stochastically in the early embryo then maintained in differentiated lineages, resulting in dramatic and systemic interindividual variation in epigenetic regulation. In the mouse, maternal nutrition affects this process, with permanent phenotypic consequences for the offspring. MEs have not previously been identified in humans. Here, using an innovative 2-tissue parallel epigenomic screen, we identified putative MEs in the human genome. In autopsy samples, we showed that DNA methylation at these loci is highly correlated across tissues representing all 3 embryonic germ layer lineages. Monozygotic twin pairs exhibited substantial discordance in DNA methylation at these loci, suggesting that their epigenetic state is established stochastically. We then tested for persistent epigenetic effects of periconceptional nutrition in rural Gambians, who experience dramatic seasonal fluctuations in nutritional status. DNA methylation at MEs was elevated in individuals conceived during the nutritionally challenged rainy season, providing the first evidence of a permanent, systemic effect of periconceptional environment on human epigenotype. At MEs, epigenetic regulation in internal organs and tissues varies among individuals and can be deduced from peripheral blood DNA. MEs should therefore facilitate an improved understanding of the role of interindividual epigenetic variation in human disease.
There is growing interest in the possibility that interindividual epigenetic variation plays an important role in a broad range of human diseases. The tissue-specificity of epigenetic regulation, however, will in many cases make it difficult to obtain the appropriate tissues in which to perform large-scale studies linking epigenetic dysregulation to disease. We have used an innovative two-tissue DNA methylation screen to identify genomic regions that exhibit interindividual epigenetic variation which occurs systemically—i.e. similarly in all tissues. Such regions—called metastable epialleles—have previously been identified in mice because they cause visible phenotypic variation amongst genetically identical individuals. Indeed, we found that even monozygotic twins show substantial epigenetic discordance at these loci. Further, we show that, as in mice, establishment of DNA methylation at these putative human metastable epialleles is labile to maternal environment around the time of conception. Metastable epialleles should facilitate an improved understanding both of the role of interindividual epigenetic variation in human disease and of the effects of early environment on the establishment of human epigenotype.
Cancer is a disease arising from both genetic and epigenetic modifications of DNA that contribute to changes in gene expression in the cell. Genetic modifications include loss or amplification of DNA, loss of heterozygosity (LOH) as well as gene mutations. Epigenetic changes in cancer are generally thought to be brought about by alterations in DNA and histone modifications that lead to the silencing of tumour suppressor genes and the activation of oncogenes. Other consequences of epigenetic changes, such as inappropriate expression or repression of some genes in the wrong cellular context, can also alter control and physiological systems such that a normal cell becomes tumorigenic. Excessive levels of the enzymes that act as epigenetic modifiers have been reported as markers of aggressive breast cancer and are associated with metastatic progression. It is likely that this is a common contributor to the recurrence and spread of the disease. Because of the emphasis on genetic changes, for example in genome-wide association studies and increasingly in whole genome sequencing analyses of tumours, epigenetic changes received less attention until recently. Epigenetic alterations at both the DNA and histone level are increasingly being recognised as playing a role in tumourigenesis. Recent studies have found that distinct subgroups of poor-prognosis tumours lack genetic alterations but are epigenetically deregulated, pointing to the important role that epigenetic modifications and/or their modifiers may play in cancer. In this review, we highlight the multitude of epigenetic changes that can occur and discuss how deregulation of epigenetic modifiers contributes to cancer progression.
We also discuss the off-target effects that epigenetic modifiers may have, notably the effects that histone modifiers have on non-histone proteins that can modulate protein expression and activity, as well as the role of hypoxia in epigenetic regulation.
Epigenetics; Hypoxia; Cancer; DNA methylation; Histone modifications; Acetylation; Demethylation; Transcription
Epigenetic marks such as cytosine methylation are important determinants of cellular and whole-body phenotypes. However, the extent of, and reasons for inter-individual differences in cytosine methylation, and their association with phenotypic variation are poorly characterised. Here we present the first genome-wide study of cytosine methylation at single-nucleotide resolution in an animal model of human disease. We used whole-genome bisulfite sequencing in the spontaneously hypertensive rat (SHR), a model of cardiovascular disease, and the Brown Norway (BN) control strain, to define the genetic architecture of cytosine methylation in the mammalian heart and to test for association between methylation and pathophysiological phenotypes. Analysis of 10.6 million CpG dinucleotides identified 77,088 CpGs that were differentially methylated between the strains. In F1 hybrids we found 38,152 CpGs showing allele-specific methylation and 145 regions with parent-of-origin effects on methylation. Cis-linkage explained almost 60% of inter-strain variation in methylation at a subset of loci tested for linkage in a panel of recombinant inbred (RI) strains. Methylation analysis in isolated cardiomyocytes showed that in the majority of cases methylation differences in cardiomyocytes and non-cardiomyocytes were strain-dependent, confirming a strong genetic component for cytosine methylation. We observed preferential nucleotide usage associated with increased and decreased methylation that is remarkably conserved across species, suggesting a common mechanism for germline control of inter-individual variation in CpG methylation. In the RI strain panel, we found significant correlation of CpG methylation and levels of serum chromogranin B (CgB), a proposed biomarker of heart failure, which is evidence for a link between germline DNA sequence variation, CpG methylation differences and pathophysiological phenotypes in the SHR strain. 
Together, these results will stimulate further investigation of the molecular basis of locally regulated variation in CpG methylation and provide a starting point for understanding the relationship between the genetic control of CpG methylation and disease phenotypes.
Epigenetic marks provide information that is not encoded in the primary DNA sequence itself but in modifications of genomic DNA and of the associated proteins. Methylation of genomic DNA at cytosine residues is an important epigenetic modification that is associated with developmental processes, carcinogenesis and other diseases. Genome-wide extent of, and reasons for inter-individual differences in cytosine methylation, and their association with phenotypic variation are poorly characterised. To address these questions we have determined and compared the genome-wide methylation patterns in heart tissue of two inbred rat strains, the spontaneously hypertensive rat, an animal model of human disease and a control rat strain. Comparison of methylation differences between genetically identical animals from the same strain and differences between animals from different strains allowed us to quantify association of epigenetic and genetic differences. We show that differences in an individual's germline DNA sequence are important determinants of the variability in methylation between individuals. Comparison with previous reports implicates common mechanisms for regulation of cytosine methylation that are highly conserved across species. Finally, we find correlation between a proposed blood biomarker for heart failure and variation in DNA methylation, suggesting a link between germline DNA sequence variation, methylation and a disease-related phenotype.
Recent studies have demonstrated that the DNA methylome changes with age. This epigenetic drift may have deep implications for cellular differentiation and disease development. However, it remains unclear how much of this drift is functional or caused by underlying changes in cell subtype composition. Moreover, no study has yet comprehensively explored epigenetic drift at different genomic length scales and in relation to regulatory elements.
Here we conduct an in-depth analysis of epigenetic drift in blood tissue. We demonstrate that most of the age-associated drift is independent of the increase in the granulocyte to lymphocyte ratio that accompanies aging and that enrichment of age-hypermethylated CpG islands increases upon adjustment for cellular composition. We further find that drift has only a minimal impact on in-cis gene expression, acting primarily to stabilize pre-existing baseline expression levels. By studying epigenetic drift at different genomic length scales, we demonstrate the existence of mega-base scale age-associated hypomethylated blocks, covering approximately 14% of the human genome, and which exhibit preferential hypomethylation in age-matched cancer tissue. Importantly, we demonstrate the feasibility of integrating Illumina 450k DNA methylation with ENCODE data to identify transcription factors with key roles in cellular development and aging. Specifically, we identify REST and regulatory factors of the histone methyltransferase MLL complex, whose function may be disrupted in aging.
In summary, most of the epigenetic drift seen in blood is independent of changes in blood cell type composition, and exhibits patterns at different genomic length scales reminiscent of those seen in cancer. Integration of Illumina 450k with appropriate ENCODE data may represent a fruitful approach to identify transcription factors with key roles in aging and disease.
Two well-known features of aging are the gradual decline of the body’s ability to regenerate tissues and an increased incidence of diseases like cancer and Alzheimer's. One of the most exciting recent findings that may underlie the aging process is a gradual modification of DNA, called epigenetic drift, which is effected by the covalent addition and removal of methyl groups and which in turn can deregulate the activity of nearby genes. However, this study presents the most convincing evidence to date that epigenetic drift acts to stabilize the activity levels of nearby genes. The study shows that epigenetic drift may instead act primarily to disrupt the DNA binding patterns of proteins that regulate the activity of many genes, and moreover identifies specific regulatory proteins with key roles in cancer and Alzheimer's. The study also performs the most comprehensive analysis of epigenetic drift at different spatial scales, demonstrating that epigenetic drift on the largest length scales is highly reminiscent of that seen in cancer. In summary, this work substantially supports the view that epigenetic drift may contribute to the age-associated increased risk of diseases like cancer and Alzheimer's by disrupting master regulators of genome-wide gene activity.
Advances in genomic studies have led to significant progress in understanding the epigenetically controlled interplay between chromatin structure and nuclear functions. Epigenetic modifications were shown to play a key role in transcription regulation and genome activity during development and differentiation or in response to the environment. Paradoxically, the molecular mechanisms that regulate the initiation and the maintenance of the spatio-temporal replication program in higher eukaryotes, and in particular their links to epigenetic modifications, still remain elusive. By integrative analysis of the genome-wide distributions of thirteen epigenetic marks in the human cell line K562, at the 100 kb resolution of corresponding mean replication timing (MRT) data, we identify four major groups of chromatin marks with shared features. These states have different MRT: from early to late replicating, replication proceeds through a transcriptionally active euchromatin state (C1), a repressive type of chromatin (C2) associated with polycomb complexes, a silent state (C3) not enriched in any available marks, and a gene-poor HP1-associated heterochromatin state (C4). When mapping these chromatin states inside the megabase-sized U-domains (U-shaped MRT profile) covering about 50% of the human genome, we reveal that the associated replication fork polarity gradient corresponds to a directional path across the four chromatin states, from C1 at U-domain borders followed by C2, C3 and C4 at centers. Analysis of the other genome half is consistent with early and late replication loci occurring in separate compartments: the former correspond to gene-rich, high-GC domains of intermingled chromatin states C1 and C2, whereas the latter correspond to gene-poor, low-GC domains of alternating chromatin states C3 and C4 or long C4 domains.
This new segmentation sheds new light on the epigenetic regulation of the spatio-temporal replication program in human cells and provides a framework for further studies in different cell types, in both health and disease.
Previous studies revealed spatially coherent and biologically meaningful chromatin mark combinations in human cells. Here, we analyze thirteen epigenetic mark maps in the human cell line K562 at the 100 kb resolution of MRT data. The complexity of epigenetic data is reduced to four chromatin states that display remarkable similarities with those reported in fly, worm and plants. These states have different MRT: (C1) is transcriptionally active, early replicating and enriched in CTCF; (C2) is Polycomb repressed and mid-S replicating; (C3) lacks marks and replicates late; and (C4) is a late-replicating, gene-poor, HP1-repressed heterochromatin state. When mapping these states inside the 876 replication U-domains of K562, the replication fork polarity gradient observed in these U-domains is accompanied by a remarkable epigenetic organization from C1 at U-domain borders to C2, C3 and ultimately C4 at centers. The remaining genome half displays early replicating, gene-rich and high-GC domains of intermingled C1 and C2 states segregating from late replicating, gene-poor and low-GC domains of concatenated C3 and/or C4 states. This constitutes the first evidence of epigenetic compartmentalization of the human genome into replication domains likely corresponding to autonomous units in the 3D chromatin architecture.
The recent development of whole genome association studies has led to the robust identification of several loci involved in different common human diseases. Interestingly, some of the strongest signals of association observed in these studies arise from non-coding regions located in very large introns or far away from any annotated genes, raising the possibility that these regions are involved in the etiology of the disease through some unidentified regulatory mechanisms. These findings highlight the importance of better understanding the mechanisms leading to inter-individual differences in gene expression in humans. Most of the existing approaches developed to identify common regulatory polymorphisms are based on linkage/association mapping of gene expression to genotypes. However, these methods have some limitations, notably their cost and the requirement for extensive genotyping information from all the individuals studied, which limits their application to a specific cohort or tissue. Here we describe a robust and high-throughput method to directly measure differences in allelic expression for a large number of genes using the Illumina Allele-Specific Expression BeadArray platform and quantitative sequencing of RT-PCR products. We show that this approach allows reliable identification of differences in the relative expression of the two alleles larger than 1.5-fold (i.e., deviations of the allelic ratio larger than 60∶40) and offers several advantages over the mapping of total gene expression, particularly for studying humans or outbred populations. Our analysis of more than 80 individuals for 2,968 SNPs located in 1,380 genes confirms that differential allelic expression is a widespread phenomenon affecting the expression of 20% of human genes and shows that our method successfully captures expression differences resulting from both genetic and epigenetic cis-acting mechanisms.
We describe a new methodology to identify individual differences in the expression of the two copies of one gene. This is achieved by comparing the mRNA level of the two alleles using a heterozygous polymorphism in the transcript as marker. We show that this approach allows an exhaustive survey of cis-acting regulation in the genome; we can identify allelic expression differences due to epigenetic mechanisms of gene regulation (e.g. imprinting or X-inactivation) as well as differences due to the presence of polymorphisms in regulatory elements. The direct comparison of the expression of both alleles nullifies possible trans-acting regulatory effects (that influence equally both alleles) and thus complements the findings from gene expression association studies. Our approach can be easily applied to any cohort of interest for a wide range of studies. It notably allows following up association signals and testing whether a gene sitting on a particular haplotype is over- or under-expressed, or can be used for screening cancer tissues for aberrant gene expression due to newly arisen mutations or alteration of the methylation patterns.
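The equivalence between a 1.5-fold allelic difference and a 60:40 allelic ratio stated above follows from simple arithmetic: if one allele is expressed f times as strongly as the other, its share of the total is f / (1 + f), and 1.5 / 2.5 = 0.6. The sketch below makes this explicit. The function names and the use of raw allele counts are assumptions for illustration; the platform described in the text reports intensities rather than counts.

```python
def major_allele_fraction(fold_change):
    """Share of total expression from the more highly expressed allele."""
    if fold_change < 1:
        raise ValueError("fold_change is defined as major/minor and must be >= 1")
    return fold_change / (1.0 + fold_change)

def fold_from_counts(count_a, count_b):
    """Fold difference between two allele measurements (hypothetical counts)."""
    hi, lo = max(count_a, count_b), min(count_a, count_b)
    if lo == 0:
        return float("inf")
    return hi / lo

def exceeds_threshold(count_a, count_b, min_fold=1.5):
    """True when the allelic difference is larger than min_fold."""
    return fold_from_counts(count_a, count_b) > min_fold

print(major_allele_fraction(1.5))  # 0.6, i.e. a 60:40 ratio
print(fold_from_counts(60, 40))    # 1.5
print(exceeds_threshold(70, 30))   # True
```

A 60:40 split sits exactly at the boundary, so it is not counted as exceeding the 1.5-fold threshold in this sketch, matching the wording "larger than" in the text.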
The role of epigenetic processes in the control of gene expression has been known for a number of years. DNA methylation at cytosine residues is of particular interest for epigenetic studies as it has been demonstrated to be both a long lasting and a dynamic regulator of gene expression. Efforts to examine epigenetic changes in health and disease have been hindered by the lack of high-throughput, quantitatively accurate methods. With the advent and popularization of next-generation sequencing (NGS) technologies, these tools are now being applied to epigenomics in addition to existing genomic and transcriptomic methodologies. For epigenetic investigations of cytosine methylation where regions of interest, such as specific gene promoters or CpG islands, have been identified and there is a need to examine significant numbers of samples with high quantitative accuracy, we have developed a method called Bisulfite Amplicon Sequencing (BSAS). This method combines bisulfite conversion with targeted amplification of regions of interest, transposome-mediated library construction and benchtop NGS. BSAS offers a rapid and efficient method for analysis of up to 10 kb of targeted regions in up to 96 samples at a time that can be performed by most research groups with basic molecular biology skills. The results provide absolute quantitation of cytosine methylation with base specificity. BSAS can be applied to any genomic region from any DNA source. This method is useful for hypothesis testing studies of target regions of interest as well as confirmation of regions identified in genome-wide methylation analyses such as whole genome bisulfite sequencing, reduced representation bisulfite sequencing, and methylated DNA immunoprecipitation sequencing.
Molecular Biology; Issue 96; Epigenetics; DNA methylation; next-generation sequencing; bioinformatics; gene expression; cytosine; CpG; gene expression regulation
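The quantitation step of any bisulfite sequencing assay such as BSAS rests on a simple calculation, sketched below. Bisulfite conversion turns unmethylated cytosines into bases that are read as T, while methylated cytosines remain C, so the percentage methylation at a reference cytosine is C / (C + T) among the reads covering it. The function name and the input format (a sequence of base calls already extracted from forward-strand alignments) are assumptions; the sketch does not reproduce the published analysis workflow.

```python
def methylation_percent(base_calls):
    """Percent methylation at one reference cytosine.

    base_calls: base calls from bisulfite-converted reads covering the site
                (assumed to be forward-strand reads, already aligned).
    Returns None when no read carries an informative C or T call.
    """
    calls = [b.upper() for b in base_calls]
    methylated = calls.count("C")    # protected from conversion
    unmethylated = calls.count("T")  # converted C read as T
    informative = methylated + unmethylated
    if informative == 0:
        return None
    return 100.0 * methylated / informative

print(methylation_percent("CCCTTTTTTT"))  # 30.0
print(methylation_percent("AG"))          # None
```

Calls other than C or T at the site are treated as sequencing errors and ignored, which is a design choice of this sketch.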
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
Different cells in the body are characterised by different functions and different levels of gene expression despite each sharing the same genetic code. This variation in gene activity from cell to cell is achieved by mechanisms and processes that are collectively termed epigenetics. These epigenetic changes alter gene expression without altering the DNA sequence. One epigenetic mechanism that is readily measured is DNA methylation. It is potentially reversible and heritable over rounds of cell division. Furthermore such epigenetic modification of DNA can be influenced by environment, gene interaction or by stochastic error and there is a higher rate of epimutation than DNA mutation. Variation in DNA methylation is a well-recognised cause of human disease and is likely to play a pivotal role in the cause of complex disorders. The challenge is to identify consistent epigenetic alterations of aetiological significance, given that epigenetic modification of DNA differs between tissues, occurs at different times of development within the same tissue and is sensitive to continual environmental factors. This makes it difficult to determine whether epigenetic mutations are a primary cause or secondary to the disease process. Genomic imprinting is one of the best understood examples of epigenetic regulation of gene expression. The expression patterns of imprinted genes are characterised by expression from only one allele (of the pair) in a consistent parent of origin manner. The pattern is set by targeted methylation within the male or female germ line that resists the post fertilisation waves of demethylation of the zygote. Imprinted genes are thought to play an important role in fetal growth and their carefully regulated expression is important for normal cellular metabolism and human behaviour. 
Several disorders of imprinting are well known, including Beckwith Wiedemann syndrome, Transient Neonatal Diabetes, Temple syndrome, Wang Kagami Ogata syndrome, Russell Silver syndrome, Angelman syndrome, Prader Willi syndrome and Pseudohypoparathyroidism type 1B. Only a proportion of people with these syndromes have a true epigenetic error, as uniparental disomy (inheritance of both chromosome homologues from one parent with no contribution from the other) and copy number variation are more common underlying causes. Studies to determine the cause of seemingly ‘true’ epigenetic aberrations, identified in imprinting disorders, may provide helpful insights into the causes of epigenetic mutations in general. For example, the work on imprinting disorders has led to the identification of ZFP57 as a gene essential for DNA methylation maintenance.
Epigenetic mutations; DNA methylation; imprinting disorders
DNA methylation is an important epigenetic mechanism for regulating the activity of the genome. Inter-individual differences in the epigenome, including the DNA methylome, are thought to account for the missing variance in disease susceptibility that has not been identified in Genome-Wide Association Studies (GWAS). Large-scale profiling of DNA methylation in population cohorts at sample sizes of thousands to tens of thousands is necessary to characterize the epigenetic component of disease susceptibility. Although whole genome bisulfite sequencing has been demonstrated in mammalian-size genomes, it is still too costly for a large sample size. In addition, only a very small fraction of CpG sites in the human genome is variable and carries information related to the epigenetic state of the cells, whereas the majority of CpG sites are static. For large-scale methylation sequencing projects, ideally the sequencing cost should be spent only on the informative sites in the genome.
We have previously developed a targeted bisulfite sequencing method based on bisulfite padlock probes (BSPP), which can quantify the absolute CpG methylation levels on an arbitrary set of genomic regions. In this talk I will present a second-generation BSPP method, which has a highly optimized protocol for production-scale methylation sequencing at a batch size of 96 samples. We also designed and optimized a set of ∼300,000 padlock probes targeting a set of carefully selected genomic regions (DMRs, enhancers, insulators, promoters, DNase I hypersensitive sites) throughout the entire human genome. A computational pipeline for probe design and efficient processing of methylation sequencing data was also developed. This method allows us to obtain highly accurate measurements of CpG and non-CpG methylation on >500,000 highly informative sites at the cost of <$250 per sample. Preliminary results on clinical samples processed with our second-generation BSPP method will be discussed.
Cellular processes requiring access to the DNA genome are regulated by an overlay of epigenetic modifications, including histone modification and chromatin remodeling. Similar to the cellular host, many nuclear DNA viruses that depend upon the host cell’s transcriptional machinery are also subject to the regulatory impact of chromatin assembly and modification. Infection of cells with alphaherpesviruses (herpes simplex virus [HSV] and varicella-zoster virus [VZV]) results in the deposition of nucleosomes bearing repressive histone H3K9 methylation on the viral genome. This repressive state is modulated by the recruitment of a cellular coactivator complex containing the histone H3K9 demethylase LSD1 to the viral immediate-early (IE) gene promoters. Inhibition of the activity of this enzyme results in increased repressive chromatin assembly and suppression of viral gene expression during lytic infection as well as reactivation from latency in a mouse ganglion explant model. However, available small-molecule LSD1 inhibitors were not originally designed to inhibit LSD1, but rather monoamine oxidases (MAO) in general. Thus, their specificity for and potency against LSD1 are low. In this study, a novel specific LSD1 inhibitor was identified that potently repressed HSV IE gene expression, genome replication, and reactivation from latency. Importantly, the inhibitor also suppressed primary infection of HSV in vivo in a mouse model. Based on common control of a number of DNA viruses by epigenetic modulation, it was also demonstrated that this LSD1 inhibitor blocks initial gene expression of the human cytomegalovirus and adenovirus type 5.
IMPORTANCE Epigenetic mechanisms, including histone modification and chromatin remodeling, play important regulatory roles in all cellular processes requiring access to the genome. These mechanisms are often altered in disease conditions, including various cancers, and thus represent novel targets for drugs. Similarly, many viral pathogens are regulated by an epigenetic overlay that determines the outcome of infection. Therefore, these epigenetic targets also represent novel antiviral targets. Here, a novel inhibitor was identified with high specificity and potency for the histone demethylase LSD1, a critical component of the herpes simplex virus (HSV) gene expression paradigm. This inhibitor was demonstrated to have potent antiviral potential in both cultured cells and animal models. Thus, in addition to clearly demonstrating the critical role of LSD1 in regulation of HSV infection, as well as other DNA viruses, the data extends the therapeutic potential of chromatin modulation inhibitors from the focused field of oncology to the arena of antiviral agents.
Genetic and epigenetic mechanisms may interact and together affect biological processes and disease development. However, most previous studies have investigated genetic and epigenetic mechanisms independently, and studies examining their interactions throughout the human genome are lacking. To identify genetic loci that interact with the epigenome, we performed the first genome-wide DNA methylation quantitative trait locus (mQTL) analysis in human pancreatic islets. We related 574,553 single nucleotide polymorphisms (SNPs) with genome-wide DNA methylation data of 468,787 CpG sites targeting 99% of RefSeq genes in islets from 89 donors. We identified 67,438 SNP-CpG pairs in cis, corresponding to 36,783 SNPs (6.4% of tested SNPs) and 11,735 CpG sites (2.5% of tested CpGs), and 2,562 significant SNP-CpG pairs in trans, corresponding to 1,465 SNPs (0.3% of tested SNPs) and 383 CpG sites (0.08% of tested CpGs), showing significant associations after correction for multiple testing. These include reported diabetes loci, e.g. ADCY5, KCNJ11, HLA-DQA1, INS, PDX1 and GRB10. CpGs of significant cis-mQTLs were overrepresented in the gene body and outside of CpG islands. Follow-up analyses further identified mQTLs associated with gene expression and insulin secretion in human islets. Causal inference test (CIT) identified SNP-CpG pairs where DNA methylation in human islets is the potential mediator of the genetic association with gene expression or insulin secretion. Functional analyses further demonstrated that identified candidate genes (GPX7, GSTT1 and SNX19) directly affect key biological processes such as proliferation and apoptosis in pancreatic β-cells. Finally, we found direct correlations between DNA methylation of 22,773 (4.9%) CpGs with mRNA expression of 4,876 genes, where 90% of the correlations were negative when CpGs were located in the region surrounding transcription start site. 
Our study demonstrates for the first time how genome-wide genetic and epigenetic variation interacts to influence gene expression, islet function and potential diabetes risk in humans.
Inter-individual variation in genetics and epigenetics affects biological processes and disease susceptibility. However, most studies have investigated genetic and epigenetic mechanisms independently; to uncover novel mechanisms affecting disease susceptibility, interactions between these factors need to be studied on a genome-wide scale. To identify novel loci affecting islet function and potentially diabetes, we performed the first genome-wide methylation quantitative trait locus (mQTL) analysis in human pancreatic islets including DNA methylation of 468,787 CpG sites located throughout the genome. Our results showed that DNA methylation of 11,735 CpGs in 4,504 unique genes is regulated by genetic factors located in cis (67,438 SNP-CpG pairs). Furthermore, significant mQTLs cover previously reported diabetes loci including KCNJ11, INS, HLA, PDX1 and GRB10. We also found mQTLs associated with gene expression and insulin secretion in human islets. By performing causality inference tests (CIT), we identified CpGs where DNA methylation potentially mediates the genetic impact on gene expression and insulin secretion. Our functional follow-up experiments further demonstrated that identified mQTLs/genes (GPX7, GSTT1 and SNX19) directly affect pancreatic β-cell function. Together, our study provides a detailed map of genome-wide associations between genetic and epigenetic variation, which affect gene expression and insulin secretion in human pancreatic islets.
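An mQTL test for a single SNP-CpG pair can be illustrated as a regression of methylation level on genotype dosage (0, 1 or 2 copies of one allele). The sketch below is a simplified stand-in, not the authors' pipeline: the function name and the example values are hypothetical, covariates are ignored, and the p-value and multiple-testing correction steps mentioned in the abstract are omitted.

```python
import math


def mqtl_association(genotypes, methylation):
    """Regress methylation on genotype dosage for one SNP-CpG pair.

    Returns (slope, r, t): the least-squares slope, the Pearson correlation,
    and the t-statistic for the slope with n - 2 degrees of freedom.
    """
    n = len(genotypes)
    if n != len(methylation) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mean_g = sum(genotypes) / n
    mean_m = sum(methylation) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((m - mean_m) ** 2 for m in methylation)
    sxy = sum((g - mean_g) * (m - mean_m)
              for g, m in zip(genotypes, methylation))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or methylation has no variance")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        # Perfect linear fit: the statistic is unbounded.
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return slope, r, t


# Made-up data for six donors: dosage of one allele and CpG methylation level.
dosage = [0, 0, 1, 1, 2, 2]
beta = [0.20, 0.24, 0.41, 0.39, 0.62, 0.58]
slope, r, t = mqtl_association(dosage, beta)
print(f"slope={slope:.3f} r={r:.3f} t={t:.2f}")
```

In a genome-wide setting this test would be repeated for every SNP-CpG pair considered, which is why correction for multiple testing is essential before any pair is called significant.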
Aberrant CpG methylation is a universal epigenetic trait of cancer cell genomes. However, human cancer samples or cell lines preclude the investigation of epigenetic changes occurring early during tumour development. Here, we have used MeDIP-seq to analyse the DNA methylome of APCMin adenoma as a model for intestinal cancer initiation, and we present a list of more than 13,000 recurring differentially methylated regions (DMRs) characterizing intestinal adenoma of the mouse. We show that Polycomb Repressive Complex (PRC) targets are strongly enriched among hypermethylated DMRs, and several PRC2 components and DNA methyltransferases were up-regulated in adenoma. We further demonstrate by bisulfite pyrosequencing of purified cell populations that the DMR signature arises de novo in adenoma cells rather than by expansion of a pre-existing pattern in intestinal stem cells or undifferentiated crypt cells. We found that epigenetic silencing of tumour suppressors, which occurs frequently in colon cancer, was rare in adenoma. Quite strikingly, we identified a core set of DMRs, which is conserved between mouse adenoma and human colon cancer, thus possibly revealing a global panel of epigenetically modified genes for intestinal tumours. Our data allow a distinction between early conserved epigenetic alterations occurring in intestinal adenoma and late stochastic events promoting colon cancer progression, and may facilitate the selection of more specific clinical epigenetic biomarkers.
The formation and progression of tumours to metastatic disease is driven by two major mechanisms, i.e. genetic alterations that activate oncogenes or inactivate tumour suppressor genes, and changes in the epigenome that cause variations in the expression of the genetic information. A deeper understanding of the interaction between the genetic and epigenetic mechanisms is critical for the selection of tumour biomarkers and for the future development of therapies. Human tumour specimens and cell lines contain a plethora of genetic and epigenetic changes, which complicate data analysis. In contrast, mouse tumour models such as the APCMin mouse used in this study arise by a single initiating genetic mutation, yet share key traits with human cancer. Here we show that mouse adenomas acquire a multitude of epigenetic alterations, which are recurring in mouse adenoma and in human colon cancer, representing early and advanced tumours, respectively. The use of a mouse model thus allowed us to uncover a sequence of epigenetic changes occurring in tumours, which may facilitate the identification of novel clinical colon cancer biomarkers.
With the advent of cost-effective genotyping technologies, genome-wide association studies allow researchers to examine hundreds of thousands of single nucleotide polymorphisms (SNPs) for association with human disease. Recently, many researchers applying this strategy have detected strong associations to disease with SNP markers that are either not in linkage disequilibrium with any nonsynonymous SNP or located at large distances from any annotated gene. In such cases, no well-established standard practice for effective SNP selection for follow-up studies exists. We aim to identify and prioritize groups of SNPs that are more likely to affect phenotypes in order to facilitate efficient SNP selection for follow-up studies.
Based on the annotations available in the Ensembl database, we categorized SNPs in the human genome into classes related to regulatory attributes, such as epigenetic modifications and transcription factor binding sites, in addition to classes related to gene structure and cross-species conservation. Using the distribution of derived allele frequencies (DAF) within each class, we assessed the strength of natural selection for each class relative to the genome as a whole. We applied this DAF analysis to Perlegen resequenced SNPs genome-wide. Regulatory elements annotated by Ensembl such as specific histone methylation sites as well as classes defined by cross-species conservation showed negative selection in comparison to the genome as a whole.
These results highlight which annotated classes are under purifying selection, have putative functional importance, and contain SNPs that are strong candidates for follow-up studies after genome-wide association. Such SNP annotation may also be useful in interpreting results of whole-genome sequencing studies.
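The idea behind a derived allele frequency (DAF) comparison can be sketched briefly. The DAF of a SNP is the count of the derived (non-ancestral) allele divided by the number of chromosomes sampled, and purifying selection tends to keep derived alleles rare, so a class under stronger selection shows an excess of low-DAF SNPs relative to the genome as a whole. The Python sketch below summarizes each class by the fraction of SNPs below a DAF cutoff. The function name, the 0.1 cutoff, the class labels and the counts are all illustrative assumptions; the statistic used in the study itself is not specified here.

```python
from collections import defaultdict


def low_daf_fraction(snps, cutoff=0.1):
    """Fraction of SNPs in each annotation class with DAF below `cutoff`.

    snps: iterable of (class_name, derived_allele_count, total_chromosomes)
    cutoff: hypothetical threshold separating rare from common derived alleles
    """
    low = defaultdict(int)
    total = defaultdict(int)
    for cls, derived, n_chrom in snps:
        daf = derived / n_chrom
        total[cls] += 1
        if daf < cutoff:
            low[cls] += 1
    return {cls: low[cls] / total[cls] for cls in total}


# Made-up SNPs: an annotated class and a genome-wide background set.
snps = [
    ("conserved", 1, 48), ("conserved", 2, 48),
    ("conserved", 3, 48), ("conserved", 30, 48),
    ("genome", 2, 48), ("genome", 12, 48),
    ("genome", 24, 48), ("genome", 40, 48),
]
fractions = low_daf_fraction(snps)
print(fractions)
```

A class whose low-DAF fraction clearly exceeds the genome-wide value would be a candidate for purifying selection, although a real analysis would compare the full frequency distributions and assess statistical significance.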
Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated to gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans’ lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated to expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell-types. However, this epigenetic association to gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell-types). This predominance of genetic effects can also be reflected by the observation that allele specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and interestingly, a large amount of DNA methylation correlating to alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter proximal and distant regulatory regions on alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific. 
Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.
In order to better understand how genetic differences between individuals can cause diseases, it is crucial to understand how genetic variants affect cellular functions in the different tissues that compose the human body. From the umbilical cord of 195 newborn babies, we previously obtained three different cell-types: fibroblasts, T-cells and immortalized B-cells. From every individual in each cell type we measured four features across the genome: 1) genetic differences, 2) DNA methylation, an epigenetic modification of DNA that can affect its functional state, 3) gene expression—the amount of gene activity, 4) alternative splicing—which of the different versions of a gene is manifested. We find thousands of genetic variants of the DNA sequence that affect methylation, gene expression, and splicing. We show that while these genetic effects often affect multiple cell-types, the strength of these effects varies between cell-types. Also epigenetic methylation marks of DNA associate to gene expression and particularly often to splicing. Since abnormalities in gene expression, DNA methylation and alternative splicing are associated to diseases, it is important to continue studying how these traits are inter-related and affected by genetic variation across cell-types.
DNA methylation is globally reprogrammed during mammalian preimplantation development, which is critical for normal development. Recent reduced representation bisulfite sequencing (RRBS) studies suggest that the methylome dynamics are essentially conserved between human and mouse early embryos. RRBS is known to cover 5–10% of all genomic CpGs, favoring those contained within CpG-rich regions. To obtain an unbiased and more complete representation of the methylome during early human development, we performed whole genome bisulfite sequencing of human gametes and blastocysts that covered >70% of all genomic CpGs. We found that the maternal genome was demethylated to a much lesser extent in human blastocysts than in mouse blastocysts, which could contribute to an increased number of imprinted differentially methylated regions in the human genome. Global demethylation of the paternal genome was confirmed, but SINE-VNTR-Alu elements and some other tandem repeat-containing regions were found to be specifically protected from this global demethylation. Furthermore, centromeric satellite repeats were hypermethylated in human oocytes but not in mouse oocytes, which might be explained by differential expression of de novo DNA methyltransferases. These data highlight both conserved and species-specific regulation of DNA methylation during early mammalian development. Our work provides further information critical for understanding the epigenetic processes underlying differentiation and pluripotency during early human development.
DNA methylation reprogramming after fertilization is critical for normal mammalian development. Early embryos are sensitive to environmental stresses and a number of reports have pointed out the increased risk of DNA methylation errors associated with assisted reproduction technologies. Therefore, it is very important to understand normal DNA methylation patterns during early human development. Recent reduced representation bisulfite sequencing studies reported partial methylomes of human gametes and early embryos. To provide a more comprehensive view of DNA methylation dynamics during early human development, we report on whole genome bisulfite sequencing of human gametes and blastocysts. We show that the paternal genome is globally demethylated in blastocysts whereas the maternal genome is demethylated to a much lesser extent. We also reveal unique regulation of imprinted differentially methylated regions, gene bodies and repeat sequences during early human development. Our high-resolution methylome maps are essential to understand epigenetic reprogramming by human oocytes and will aid in the preimplantation epigenetic diagnosis of human embryos.
Recent developments in genomic sequencing have advanced our understanding of the mutations underlying human malignancy. Melanoma is a prototype of an aggressive, genetically heterogeneous cancer notorious for its biologic plasticity and predilection towards developing resistance to targeted therapies. Evidence is rapidly accumulating that dysregulated epigenetic mechanisms (DNA methylation/demethylation, histone modification, non-coding RNAs) may play a central role in the pathogenesis of melanoma. Therefore, we sought to characterize the frequency and nature of mutations in epigenetic regulators in clinical, treatment-naïve, patient melanoma specimens obtained from one academic institution.
Targeted next-generation sequencing for 275 known and investigative cancer genes (of which 41 genes, or 14.9 %, encoded an epigenetic regulator) of 38 treatment-naïve patient melanoma samples revealed that 22.3 % (165 of 740) of all non-silent mutations affected an epigenetic regulator. The most frequently mutated genes were BRAF, MECOM, NRAS, TP53, MLL2, and CDKN2A. Of the 40 most commonly mutated genes, 12 (30.0 %) encoded epigenetic regulators, including genes encoding enzymes involved in histone modification (MECOM, MLL2, SETD2), chromatin remodeling (ARID1B, ARID2), and DNA methylation and demethylation (TET2, IDH1). Among the 38 patient melanoma samples, 35 (92.1 %) harbored at least one mutation in an epigenetic regulator. The genes with the highest number of total UVB-signature mutations encoded epigenetic regulators, including MLL2 (100 %, 16 of 16) and MECOM (82.6 %, 19 of 23). Moreover, on average, epigenetic genes harbored a significantly greater number of UVB-signature mutations per gene than non-epigenetic genes (3.7 versus 2.4, respectively; p = 0.01). Bioinformatics analysis of The Cancer Genome Atlas (TCGA) melanoma mutation dataset also revealed a frequency of mutations in the 41 epigenetic genes comparable to that found within our cohort of patient melanoma samples.
Our study identified a high prevalence of somatic mutations in genes encoding epigenetic regulators, including those involved in DNA demethylation, histone modification, chromatin remodeling, and microRNA processing. Moreover, UVB-signature mutations were found more commonly among epigenetic genes than in non-epigenetic genes. Taken together, these findings further implicate epigenetic mechanisms, particularly those involving the chromatin-remodeling enzyme MECOM/EVI1 and histone-modifying enzyme MLL2, in the pathobiology of melanoma.
Melanoma; Next-generation sequencing (NGS); Epigenetics; MECOM (MDS1 and EVI1 complex locus); MLL2; Ten-eleven translocation (TET); Isocitrate dehydrogenase 2 (IDH2); 5-hydroxymethylcytosine; DNA demethylation
The epigenetic activity of transposable elements (TEs) can influence the regulation of genes, although this regulation is confined to the genes, promoters, and enhancers that neighbor the TE. This local cis regulation of genes therefore limits the influence of the TE's epigenetic regulation on the genome. TE activity is suppressed by small RNAs, which also inhibit viruses and regulate the expression of genes. The production of TE heterochromatin-associated endogenous small interfering RNAs (siRNAs) in the reference plant Arabidopsis thaliana is mechanistically distinct from gene-regulating small RNAs, such as microRNAs or trans-acting siRNAs (tasiRNAs). Previous research identified a TE small RNA that potentially regulates the UBP1b mRNA, which encodes an RNA-binding protein involved in stress granule formation. We demonstrate that this siRNA, siRNA854, is under the same trans-generational epigenetic control as the Athila family LTR retrotransposons from which it is produced. The epigenetic activation of Athila elements results in a shift in small RNA processing pathways, and new 21–22 nucleotide versions of Athila siRNAs are produced by protein components normally not responsible for processing TE siRNAs. This processing results in siRNA854's incorporation into ARGONAUTE1 protein complexes in a similar fashion to gene-regulating tasiRNAs. We have used reporter transgenes to demonstrate that the UBP1b 3′ untranslated region directly responds to the epigenetic status of Athila TEs and the accumulation of siRNA854. The regulation of the UBP1b 3′ untranslated region occurs at both the post-transcriptional and translational levels when Athila TEs are epigenetically activated, and this regulation results in the phenocopy of the ubp1b mutant stress-sensitive phenotype. This demonstrates that a TE's epigenetic activity can modulate the host organism's stress response.
In addition, the ability of this TE siRNA to regulate a gene's expression in trans blurs the lines between TE and gene-regulating small RNAs.
The portion of the genome that does not encode genes is often overlooked as a source of cellular regulatory information. Here, we demonstrate that regulatory information controlling expression and protein production from a gene called UBP1b is coming from a distant non-gene transposable element (TE). TEs are fragments of DNA that, unlike genes, are capable of duplicating themselves from one location in the genome to another, and occupy nearly half of the human genome. TEs are often referred to as “junk DNA,” as the study of cellular regulation and function is focused on genes. The regulation of TEs is distinct from that of genes, as a process termed epigenetic silencing heritably represses TE expression and activity. We have demonstrated that the epigenetic status (active versus silenced) of the Athila TE family regulates the UBP1b gene through the activity of a TE small RNA. The function of the UBP1b gene is to respond to and regulate cellular stress, and the epigenetic regulatory status of the Athila TE therefore modulates this stress response. This demonstrates that the epigenetic regulation of TEs can be a source of gene regulatory information, influencing a basic cellular function such as the stress response.
Several recent studies have shown a genetic influence on gene expression variation, including variation between the two chromosomes within an individual and variation between individuals at the population level. We hypothesized that genetic inheritance may also affect variation in chromatin states. To test this hypothesis, we analyzed chromatin states in 12 lymphoblastoid cell lines derived from two Centre d'Etude du Polymorphisme Humain families using an allele-specific chromatin immunoprecipitation (ChIP-on-chip) assay with the Affymetrix 10K SNP chip. We performed the allele-specific ChIP-on-chip assays for the 12 lymphoblastoid cell lines using antibodies targeting RNA polymerase II and five post-translationally modified forms of the histone H3 protein. The use of multiple cell lines from the Centre d'Etude du Polymorphisme Humain families allowed us to evaluate variation of chromatin states across pedigrees. These studies demonstrated that chromatin state clustered by family. Our results support the idea that genetic inheritance can determine the epigenetic state of the chromatin, as shown previously in model organisms. To our knowledge, this is the first demonstration in humans that genetics may be an important factor that influences global chromatin state mediated by histone modification, the hallmark of epigenetic phenomena.
Human health and disease are determined by an interaction between genetic background and environmental exposures. Both normal development and disease are mediated by epigenetic regulation of gene expression. Epigenetic regulation causes heritable changes in gene expression that are not associated with DNA sequence changes. Instead, it is mediated by chemical modification of DNA such as DNA methylation or by protein modifications such as histone acetylation and methylation. Although much is known about epigenetic inheritance during development, little is known about the influence of the genetic background on epigenetic processes such as histone modifications. In this report the authors studied five histone modifications on a genome-wide level in cells from different families. Global epigenetic states, as measured by these histone modifications, showed a similar pattern for cells derived from the same family. This study demonstrates that genetic inheritance may be an important factor influencing global chromatin states mediated by histone modifications in humans. These observations illustrate the importance of integrating genetic and epigenetic information into studies of human health and complex diseases.
The study of genome-wide DNA methylation changes has become more accessible with the development of various array-based technologies, although when studying species other than human the choice of applications is limited and not always within reach. In this study, we adapted and tested the applicability of Methylation Specific Digital Karyotyping (MSDK), a non-array-based method, for the prospective analysis of epigenetic changes after perinatal nutritional modifications in a mouse model of allergic airway disease. MSDK is a sequence-based method that allows comprehensive and unbiased methylation profiling. The method generates sequence tags 21 base pairs long that are derived from specific locations in the genome. The resulting tag frequencies quantitatively determine the methylation level of the corresponding loci.
Genomic DNA from whole lung was isolated and subjected to MSDK analysis using the methylation-sensitive enzyme Not I as the mapping enzyme and Nla III as the fragmenting enzyme. In a pairwise comparison of the generated mouse MSDK libraries we identified 158 loci that are significantly differentially methylated (P-value = 0.05) after perinatal dietary changes in our mouse model. Quantitative methylation-specific PCR and sequence analysis of bisulfite-modified genomic DNA confirmed changes in methylation at specific loci. Differences in genomic MSDK tag counts for a selected set of genes correlated well with changes in transcription levels as measured by real-time PCR. Furthermore, serial analysis of gene expression profiling demonstrated a dramatic difference in expressed transcripts in mice exposed to perinatal nutritional changes.
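As an illustration only, a pairwise comparison of tag counts between two MSDK libraries could be sketched as follows. The abstract does not state which statistical test was used, so the two-proportion z-test here is an assumption, and the function names, tag identifiers, and counts are hypothetical.

```python
import math


def compare_tag_counts(count_a, total_a, count_b, total_b):
    """Two-sided two-proportion z-test for one tag in two libraries.

    Assumed test for illustration; not necessarily the one used in the study.
    Both library totals must be greater than zero.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # No variation to test (tag absent from both, or present in every read).
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def differential_tags(lib_a, lib_b, alpha=0.05):
    """Return tags whose frequency differs between two tag-count libraries.

    lib_a and lib_b map a tag identifier to its observed count.
    """
    total_a = sum(lib_a.values())
    total_b = sum(lib_b.values())
    hits = {}
    for tag in sorted(set(lib_a) | set(lib_b)):
        z, p = compare_tag_counts(
            lib_a.get(tag, 0), total_a, lib_b.get(tag, 0), total_b
        )
        if p < alpha:
            hits[tag] = (z, p)
    return hits


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    control = {"tag_1": 150, "tag_2": 50, "tag_3": 100}
    treated = {"tag_1": 50, "tag_2": 150, "tag_3": 100}
    for tag, (z, p) in differential_tags(control, treated).items():
        print(f"{tag}: z = {z:.2f}, p = {p:.3g}")
```

In a genome-wide survey with many loci, a multiple-testing correction would normally be applied on top of a per-tag test like this; that step is omitted from the sketch.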
The genome-wide methylation survey applied in this study allowed unbiased methylation profiling, revealing subtle changes in DNA methylation in mice maternally exposed to dietary changes in methyl-donor content. The MSDK method is applicable to mouse models of complex human diseases in a mixed cell population and might be a valuable technology for determining whether environmental exposures can lead to epigenetic changes.
Behavioral phenotyping and genome-wide profiling of the histone modifier EHMT in Drosophila reveal a mechanism through which an epigenetic writer may control cognition.
The epigenetic modification of chromatin structure and its effect on complex neuronal processes like learning and memory is an emerging field in neuroscience. However, little is known about the “writers” of the neuronal epigenome and how they lay down the basis for proper cognition. Here, we have dissected the neuronal function of the Drosophila euchromatin histone methyltransferase (EHMT), a member of a conserved protein family that methylates histone 3 at lysine 9 (H3K9). EHMT is widely expressed in the nervous system and other tissues, yet EHMT mutant flies are viable. Neurodevelopmental and behavioral analyses identified EHMT as a regulator of peripheral dendrite development, larval locomotor behavior, non-associative learning, and courtship memory. The requirement for EHMT in memory was mapped to 7B-Gal4 positive cells, which are, in adult brains, predominantly mushroom body neurons. Moreover, memory was restored by EHMT re-expression during adulthood, indicating that cognitive defects are reversible in EHMT mutants. To uncover the underlying molecular mechanisms, we generated genome-wide H3K9 dimethylation profiles by ChIP-seq. Loss of H3K9 dimethylation in EHMT mutants occurs at 5% of the euchromatic genome and is enriched at the 5′ and 3′ ends of distinct classes of genes that control neuronal and behavioral processes that are corrupted in EHMT mutants. Our study identifies Drosophila EHMT as a key regulator of cognition that orchestrates an epigenetic program featuring classic learning and memory genes. Our findings are relevant to the pathophysiological mechanisms underlying Kleefstra Syndrome, a severe form of intellectual disability caused by mutations in human EHMT1, and have potential therapeutic implications. Our work thus provides novel insights into the epigenetic control of cognition in health and disease.
Epigenetic regulators can affect gene transcription through modification of DNA and histones, which together form chromatin. The importance of such regulators for cognition is increasingly appreciated, but only a few key factors have been identified so far. Excellent candidates are histone modifiers that are involved in intellectual disability, such as EHMT1, implicated in Kleefstra Syndrome. Here, we characterized the neuronal function of EHMT in Drosophila. Flies that lack EHMT are viable but show highly selective defects in specific aspects of neuronal development and function, including learning and memory. Genome-wide analysis of EHMT-mediated histone methylation revealed that EHMT targets the majority of all currently known Drosophila learning and memory genes. It also targets genes known to be involved in the other aspects of behavior and neuronal development that are compromised in EHMT mutants. Remarkably, EHMT mutant memory deficits can be reversed in adulthood, suggesting that epigenetic influences on cognition are not always permanent. Our results provide novel insights into the epigenetic control of cognition in health and disease.