Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
Many human diseases are complex with multiple genetic and environmental causal factors interacting together to give rise to disease phenotypes. Such factors affect biological systems through many layers of regulations, including transcriptional and epigenetic regulation, and protein changes. To fully understand their molecular mechanisms, complex diseases are often studied in diverse dimensions including genetics (genotype variations by single nucleotide polymorphism (SNP) arrays or whole exome sequencing), transcriptomics, epigenetics, and proteomics. However, errors in sample annotation or labeling often occur in large-scale genetic and genomic studies and are difficult to avoid completely during data generation and management. Identifying and correcting these errors are critical for integrative genomic studies. In this study, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors based on multiple types of molecular data before further integrative analysis. Our results indicate that signals increased more than 100% after correction of sample labeling errors in a large lung genomic study. Our method can be broadly applied to large genomic data sets with multiple types of omics data, such as TCGA (The Cancer Genome Atlas) data sets.
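The matching idea described above can be illustrated with a minimal sketch. This is not the MODMatcher implementation: it assumes, purely for illustration, that every sample has a numeric profile over the same cis-linked features in two data types, and that the best partner for a sample is the profile with the highest Pearson correlation. The function names (`pearson`, `best_matches`, `flag_mismatches`) and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def best_matches(profiles_a, profiles_b):
    """Map each label in data type A to the most correlated label in data type B."""
    return {
        label_a: max(profiles_b, key=lambda label_b: pearson(prof_a, profiles_b[label_b]))
        for label_a, prof_a in profiles_a.items()
    }


def flag_mismatches(profiles_a, profiles_b):
    """Return only the samples whose best match carries a different label."""
    matches = best_matches(profiles_a, profiles_b)
    return {a: b for a, b in matches.items() if a != b}


# Toy data (hypothetical): the labels S2 and S3 are swapped in the second data type.
expr = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [4.0, 3.0, 2.0, 1.0],
    "S3": [1.0, 3.0, 2.0, 4.0],
}
geno_pred = {
    "S1": [1.1, 2.1, 2.9, 4.2],
    "S2": [1.0, 3.1, 2.1, 3.9],
    "S3": [3.9, 3.1, 2.0, 0.9],
}

mismatches = flag_mismatches(expr, geno_pred)
print(mismatches)
```

In this toy case the two swapped labels are the only ones flagged. A real data set would need many more features, a significance criterion for accepting a match, and handling of samples that match nothing.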
This paper aims to integrate into the current understanding of the causation of adolescent idiopathic scoliosis (AIS) etiopathogenetic information presented at two meetings during 2012, namely those of the International Research Society of Spinal Deformities (IRSSD) and the Scoliosis Research Society (SRS). The ultimate hope is to prevent the occurrence or progression of the spinal deformity of AIS with non-invasive treatment, possibly medical. This might be attained by personalised polymechanistic preventive therapy targeting the appropriate etiology and/or etiopathogenetic pathways, to avoid fusion and maintain spinal mobility. Although considerable progress has been made in the past two decades in understanding the etiopathogenesis of AIS, the field still lacks an agreed theory of etiopathogenesis. One problem may be that AIS results not from one cause, but from several that interact with various genetic predisposing factors. There is a view that there are two other pathogenic processes for idiopathic scoliosis, namely those that initiate (or induce) the deformity and those that cause curve progression. Twin studies and observations of family aggregation have revealed significant genetic contributions to idiopathic scoliosis that place AIS among other common diseases or complex traits with a high heritability interpreted by the genetic variant hypothesis of disease. We summarize etiopathogenetic knowledge of AIS as theories of pathogenesis including recent multiple concepts, and blood tests for AIS based on predictive biomarkers and genetic variants that signify disease risk. There is increasing evidence for the possibility of an underlying neurological disorder for AIS, research which holds promise. As in brain research, most AIS workers focus on their own corner, and there is a need for greater integration of research effort.
Epigenetics, a relatively recent field, evaluates factors concerned with gene expression in relation to environment, disease, normal development and aging, with a complex regulation across the genome during the first decade of life. Research on the role of environmental factors, epigenetics and chronic non-communicable diseases (NCDs) including adiposity, after a slow start, has exploded in the last decade. Not so for AIS research and the environment where, except for monozygotic twin studies, there are only sporadic reports to suggest that environmental factors are at work in etiology. Here, we examine epigenetic concepts as they may relate to human development, normal life history phases and AIS pathogenesis. Although AIS is not regarded as an NCD, like them, it is associated with whole organism metabolic phenomena, including lower body mass index, lower circulating leptin levels and other systemic disorders. Some epigenetic research applied to Silver-Russell syndrome and adiposity is examined, from which suggestions are made for consideration of AIS epigenetic research, cross-sectional and longitudinal. The word scoliogeny is suggested to include etiology, pathogenesis and pathomechanism.
Scoliosis; Etiology; Pathogenesis; Scoliogeny; Epigenetics
CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of large-scale epigenetic datasets. Although widely used, the current CpG island criteria incur significant disadvantages: (1) reliance on arbitrary threshold parameters that bear little biological justification, (2) failure to account for widespread heterogeneity among CpG islands, and (3) apparent lack of specificity when applied to the human genome. This study is driven by the idea that a quantitative score of “CpG island strength” that incorporates epigenetic and functional aspects can help resolve these issues. We construct an epigenome prediction pipeline that links the DNA sequence of CpG islands to their epigenetic states, including DNA methylation, histone modifications, and chromatin accessibility. By training support vector machines on epigenetic data for CpG islands on human Chromosomes 21 and 22, we identify informative DNA attributes that correlate with open versus compact chromatin structures. These DNA attributes are used to predict the epigenetic states of all CpG islands genome-wide. Combining predictions for multiple epigenetic features, we estimate the inherent CpG island strength for each CpG island in the human genome, i.e., its inherent tendency to exhibit an open and transcriptionally competent chromatin structure. We extensively validate our results on independent datasets, showing that the CpG island strength predictions are applicable and informative across different tissues and cell types, and we derive improved maps of predicted “bona fide” CpG islands. 
The mapping of CpG islands by epigenome prediction is conceptually superior to identifying CpG islands by widely used sequence criteria since it links CpG island detection to their characteristic epigenetic and functional states. And it is superior to purely experimental epigenome mapping for CpG island detection since it abstracts from specific properties that are limited to a single cell type or tissue. In addition, using computational epigenetics methods we could identify high correlation between the epigenome and characteristics of the DNA sequence, a finding which emphasizes the need for a better understanding of the mechanistic links between genome and epigenome.
A key challenge for bioinformatic research is the identification of regulatory regions in the human genome. Regulatory regions are DNA elements that control gene expression and thereby contribute to the organism's phenotype. An important class of regulatory regions consists of so-called CpG islands, which are characterized by frequent occurrence of the CG sequence pattern. CpG islands are strongly associated with open and transcriptionally competent chromatin structure, they play a critical role in gene regulation, and they are involved in the epigenetic causes of cancer. In this article we make several conceptual improvements to the definition and mapping of CpG islands. First, we show that the traditional distinction between CpG islands and non-CpG islands is too harsh, and instead we propose a quantitative measure of CpG island strength to gradually distinguish between stronger and weaker regulatory regions. Second, by genome-wide comparison of multiple epigenome datasets we identify high correlation between features of the genome's DNA sequence and the epigenome, indicating strong functional interdependence. Third, we develop and apply a novel method for predicting the strength of all CpG islands in the human genome, giving rise to an improved and more accurate CpG island mapping.
Throughout most of the mammalian genome, genetically regulated developmental programming establishes diverse yet predictable epigenetic states across differentiated cells and tissues. At metastable epialleles (MEs), conversely, epigenotype is established stochastically in the early embryo then maintained in differentiated lineages, resulting in dramatic and systemic interindividual variation in epigenetic regulation. In the mouse, maternal nutrition affects this process, with permanent phenotypic consequences for the offspring. MEs have not previously been identified in humans. Here, using an innovative 2-tissue parallel epigenomic screen, we identified putative MEs in the human genome. In autopsy samples, we showed that DNA methylation at these loci is highly correlated across tissues representing all 3 embryonic germ layer lineages. Monozygotic twin pairs exhibited substantial discordance in DNA methylation at these loci, suggesting that their epigenetic state is established stochastically. We then tested for persistent epigenetic effects of periconceptional nutrition in rural Gambians, who experience dramatic seasonal fluctuations in nutritional status. DNA methylation at MEs was elevated in individuals conceived during the nutritionally challenged rainy season, providing the first evidence of a permanent, systemic effect of periconceptional environment on human epigenotype. At MEs, epigenetic regulation in internal organs and tissues varies among individuals and can be deduced from peripheral blood DNA. MEs should therefore facilitate an improved understanding of the role of interindividual epigenetic variation in human disease.
There is growing interest in the possibility that interindividual epigenetic variation plays an important role in a broad range of human diseases. The tissue-specificity of epigenetic regulation, however, will in many cases make it difficult to obtain the appropriate tissues in which to perform large-scale studies linking epigenetic dysregulation to disease. We have used an innovative two-tissue DNA methylation screen to identify genomic regions that exhibit interindividual epigenetic variation which occurs systemically—i.e. similarly in all tissues. Such regions—called metastable epialleles—have previously been identified in mice because they cause visible phenotypic variation amongst genetically identical individuals. Indeed, we found that even monozygotic twins show substantial epigenetic discordance at these loci. Further, we show that, as in mice, establishment of DNA methylation at these putative human metastable epialleles is labile to maternal environment around the time of conception. Metastable epialleles should facilitate an improved understanding both of the role of interindividual epigenetic variation in human disease and of the effects of early environment on the establishment of human epigenotype.
Epigenetic marks such as cytosine methylation are important determinants of cellular and whole-body phenotypes. However, the extent of, and reasons for inter-individual differences in cytosine methylation, and their association with phenotypic variation are poorly characterised. Here we present the first genome-wide study of cytosine methylation at single-nucleotide resolution in an animal model of human disease. We used whole-genome bisulfite sequencing in the spontaneously hypertensive rat (SHR), a model of cardiovascular disease, and the Brown Norway (BN) control strain, to define the genetic architecture of cytosine methylation in the mammalian heart and to test for association between methylation and pathophysiological phenotypes. Analysis of 10.6 million CpG dinucleotides identified 77,088 CpGs that were differentially methylated between the strains. In F1 hybrids we found 38,152 CpGs showing allele-specific methylation and 145 regions with parent-of-origin effects on methylation. Cis-linkage explained almost 60% of inter-strain variation in methylation at a subset of loci tested for linkage in a panel of recombinant inbred (RI) strains. Methylation analysis in isolated cardiomyocytes showed that in the majority of cases methylation differences in cardiomyocytes and non-cardiomyocytes were strain-dependent, confirming a strong genetic component for cytosine methylation. We observed preferential nucleotide usage associated with increased and decreased methylation that is remarkably conserved across species, suggesting a common mechanism for germline control of inter-individual variation in CpG methylation. In the RI strain panel, we found significant correlation of CpG methylation and levels of serum chromogranin B (CgB), a proposed biomarker of heart failure, which is evidence for a link between germline DNA sequence variation, CpG methylation differences and pathophysiological phenotypes in the SHR strain. 
Together, these results will stimulate further investigation of the molecular basis of locally regulated variation in CpG methylation and provide a starting point for understanding the relationship between the genetic control of CpG methylation and disease phenotypes.
Epigenetic marks provide information that is not encoded in the primary DNA sequence itself but in modifications of genomic DNA and of the associated proteins. Methylation of genomic DNA at cytosine residues is an important epigenetic modification that is associated with developmental processes, carcinogenesis and other diseases. The genome-wide extent of, and reasons for, inter-individual differences in cytosine methylation, and their association with phenotypic variation, are poorly characterised. To address these questions we have determined and compared the genome-wide methylation patterns in heart tissue of two inbred rat strains: the spontaneously hypertensive rat, an animal model of human disease, and a control rat strain. Comparison of methylation differences between genetically identical animals from the same strain and differences between animals from different strains allowed us to quantify association of epigenetic and genetic differences. We show that differences in an individual's germline DNA sequence are important determinants of the variability in methylation between individuals. Comparison with previous reports implicates common mechanisms for regulation of cytosine methylation that are highly conserved across species. Finally, we find correlation between a proposed blood biomarker for heart failure and variation in DNA methylation, suggesting a link between germline DNA sequence variation, methylation and a disease-related phenotype.
The recent development of whole genome association studies has led to the robust identification of several loci involved in different common human diseases. Interestingly, some of the strongest signals of association observed in these studies arise from non-coding regions located in very large introns or far away from any annotated genes, raising the possibility that these regions are involved in the etiology of the disease through some unidentified regulatory mechanisms. These findings highlight the importance of better understanding the mechanisms leading to inter-individual differences in gene expression in humans. Most of the existing approaches developed to identify common regulatory polymorphisms are based on linkage/association mapping of gene expression to genotypes. However, these methods have some limitations, notably their cost and the requirement of extensive genotyping information from all the individuals studied, which limits their application to a specific cohort or tissue. Here we describe a robust and high-throughput method to directly measure differences in allelic expression for a large number of genes using the Illumina Allele-Specific Expression BeadArray platform and quantitative sequencing of RT-PCR products. We show that this approach allows reliable identification of differences in the relative expression of the two alleles larger than 1.5-fold (i.e., deviations of the allelic ratio larger than 60∶40) and offers several advantages over the mapping of total gene expression, particularly for studying humans or outbred populations. Our analysis of more than 80 individuals for 2,968 SNPs located in 1,380 genes confirms that differential allelic expression is a widespread phenomenon affecting the expression of 20% of human genes and shows that our method successfully captures expression differences resulting from both genetic and epigenetic cis-acting mechanisms.
We describe a new methodology to identify individual differences in the expression of the two copies of one gene. This is achieved by comparing the mRNA level of the two alleles using a heterozygous polymorphism in the transcript as marker. We show that this approach allows an exhaustive survey of cis-acting regulation in the genome; we can identify allelic expression differences due to epigenetic mechanisms of gene regulation (e.g. imprinting or X-inactivation) as well as differences due to the presence of polymorphisms in regulatory elements. The direct comparison of the expression of both alleles nullifies possible trans-acting regulatory effects (that influence equally both alleles) and thus complements the findings from gene expression association studies. Our approach can be easily applied to any cohort of interest for a wide range of studies. It notably allows following up association signals and testing whether a gene sitting on a particular haplotype is over- or under-expressed, or can be used for screening cancer tissues for aberrant gene expression due to newly arisen mutations or alteration of the methylation patterns.
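The equivalence between a 1.5-fold allelic difference and a 60∶40 allelic ratio follows from simple arithmetic: if one allele is expressed f times as much as the other, its share of the total is f / (1 + f). The short sketch below works this through. The function names and the example read counts are hypothetical and are not taken from the study, and a real analysis would also need a statistical test on the counts.

```python
def fold_to_allelic_fraction(fold):
    """Share of total expression contributed by the more expressed allele."""
    return fold / (1.0 + fold)


def exceeds_fold_threshold(count_a, count_b, threshold_fold=1.5):
    """True if the allelic imbalance is strictly larger than threshold_fold.

    count_a and count_b are hypothetical per-allele signal or read counts.
    """
    high, low = max(count_a, count_b), min(count_a, count_b)
    if low == 0:
        # Monoallelic signal: any non-zero expression counts as imbalanced.
        return high > 0
    return high / low > threshold_fold


fraction = fold_to_allelic_fraction(1.5)
print(f"1.5-fold corresponds to {fraction:.0%} : {1 - fraction:.0%}")
print(exceeds_fold_threshold(70, 30))  # 2.33-fold
print(exceeds_fold_threshold(55, 45))  # 1.22-fold
```

Because the threshold is strict, a ratio of exactly 60∶40 is not flagged in this sketch, in line with the wording "larger than 1.5-fold".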
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
Advances in genomic studies have led to significant progress in understanding the epigenetically controlled interplay between chromatin structure and nuclear functions. Epigenetic modifications were shown to play a key role in transcription regulation and genome activity during development and differentiation or in response to the environment. Paradoxically, the molecular mechanisms that regulate the initiation and the maintenance of the spatio-temporal replication program in higher eukaryotes, and in particular their links to epigenetic modifications, still remain elusive. By integrative analysis of the genome-wide distributions of thirteen epigenetic marks in the human cell line K562, at the 100 kb resolution of corresponding mean replication timing (MRT) data, we identify four major groups of chromatin marks with shared features. These states have different MRT: from early to late replicating, replication proceeds through a transcriptionally active euchromatin state (C1), a repressive type of chromatin (C2) associated with polycomb complexes, a silent state (C3) not enriched in any available marks, and a gene-poor HP1-associated heterochromatin state (C4). When mapping these chromatin states inside the megabase-sized U-domains (U-shaped MRT profile) covering about 50% of the human genome, we reveal that the associated replication fork polarity gradient corresponds to a directional path across the four chromatin states, from C1 at U-domain borders followed by C2, C3 and C4 at centers. Analysis of the other genome half is consistent with early and late replication loci occurring in separate compartments: the former correspond to gene-rich, high-GC domains of intermingled chromatin states C1 and C2, whereas the latter correspond to gene-poor, low-GC domains of alternating chromatin states C3 and C4 or long C4 domains.
This new segmentation sheds a new light on the epigenetic regulation of the spatio-temporal replication program in human and provides a framework for further studies in different cell types, in both health and disease.
Previous studies revealed spatially coherent and biologically meaningful chromatin mark combinations in human cells. Here, we analyze thirteen epigenetic mark maps in the human cell line K562 at the 100 kb resolution of MRT data. The complexity of epigenetic data is reduced to four chromatin states that display remarkable similarities with those reported in fly, worm and plants. These states have different MRT: (C1) is transcriptionally active, early replicating and enriched in CTCF; (C2) is Polycomb repressed and mid-S replicating; (C3) lacks marks and replicates late; and (C4) is a late-replicating, gene-poor, HP1-repressed heterochromatin state. When mapping these states inside the 876 replication U-domains of K562, the replication fork polarity gradient observed in these U-domains comes along with a remarkable epigenetic organization from C1 at U-domain borders to C2, C3 and ultimately C4 at centers. The remaining genome half displays early replicating, gene-rich and high-GC domains of intermingled C1 and C2 states segregating from late replicating, gene-poor and low-GC domains of concatenated C3 and/or C4 states. This constitutes the first evidence of epigenetic compartmentalization of the human genome into replication domains likely corresponding to autonomous units in the 3D chromatin architecture.
DNA methylation is an important epigenetic mechanism for regulating the activity of the genome. Inter-individual differences in the epigenome, including the DNA methylome, are thought to account for the missing variance in disease susceptibility that has not been identified in Genome-Wide Association Studies (GWAS). Large-scale profiling of DNA methylation in population cohorts with sample sizes of thousands to tens of thousands is necessary to characterize the epigenetic component of disease susceptibility. Although whole genome bisulfite sequencing has been demonstrated in mammalian-size genomes, it is still too costly for a large sample size. In addition, only a very small fraction of CpG sites in the human genome is variable and carries information related to the epigenetic state of the cells, whereas the majority of CpG sites are static. For large-scale methylation sequencing projects, ideally the sequencing cost should be spent only on the informative sites in the genome.
We have previously developed a targeted bisulfite sequencing method based on bisulfite padlock probes (BSPP), which can quantify the absolute CpG methylation levels on an arbitrary set of genomic regions. In this talk I will present a second-generation BSPP method, which has a highly optimized protocol for production-scale methylation sequencing at a batch size of 96 samples. We also designed and optimized a set of ∼300,000 padlock probes targeting a set of carefully selected genomic regions (DMRs, enhancers, insulators, promoters, DNase I hypersensitive sites) throughout the entire human genome. A computational pipeline for probe design and efficient processing of methylation sequencing data was also developed. This method allows us to obtain highly accurate measurements of CpG and non-CpG methylation on >500,000 highly informative sites at the cost of <$250 per sample. Preliminary results on clinical samples processed with our second-generation BSPP method will be discussed.
Cellular processes requiring access to the DNA genome are regulated by an overlay of epigenetic modifications, including histone modification and chromatin remodeling. Similar to the cellular host, many nuclear DNA viruses that depend upon the host cell’s transcriptional machinery are also subject to the regulatory impact of chromatin assembly and modification. Infection of cells with alphaherpesviruses (herpes simplex virus [HSV] and varicella-zoster virus [VZV]) results in the deposition of nucleosomes bearing repressive histone H3K9 methylation on the viral genome. This repressive state is modulated by the recruitment of a cellular coactivator complex containing the histone H3K9 demethylase LSD1 to the viral immediate-early (IE) gene promoters. Inhibition of the activity of this enzyme results in increased repressive chromatin assembly and suppression of viral gene expression during lytic infection as well as reactivation from latency in a mouse ganglion explant model. However, the available small-molecule LSD1 inhibitors were not originally designed to inhibit LSD1, but rather monoamine oxidases (MAO) in general. Thus, their specificity for and potency against LSD1 are low. In this study, a novel specific LSD1 inhibitor was identified that potently repressed HSV IE gene expression, genome replication, and reactivation from latency. Importantly, the inhibitor also suppressed primary infection of HSV in vivo in a mouse model. Based on common control of a number of DNA viruses by epigenetic modulation, it was also demonstrated that this LSD1 inhibitor blocks initial gene expression of the human cytomegalovirus and adenovirus type 5.
IMPORTANCE Epigenetic mechanisms, including histone modification and chromatin remodeling, play important regulatory roles in all cellular processes requiring access to the genome. These mechanisms are often altered in disease conditions, including various cancers, and thus represent novel targets for drugs. Similarly, many viral pathogens are regulated by an epigenetic overlay that determines the outcome of infection. Therefore, these epigenetic targets also represent novel antiviral targets. Here, a novel inhibitor was identified with high specificity and potency for the histone demethylase LSD1, a critical component of the herpes simplex virus (HSV) gene expression paradigm. This inhibitor was demonstrated to have potent antiviral potential in both cultured cells and animal models. Thus, in addition to clearly demonstrating the critical role of LSD1 in regulation of infection by HSV and other DNA viruses, the data extend the therapeutic potential of chromatin modulation inhibitors from the focused field of oncology to the arena of antiviral agents.
With the advent of cost-effective genotyping technologies, genome-wide association studies allow researchers to examine hundreds of thousands of single nucleotide polymorphisms (SNPs) for association with human disease. Recently, many researchers applying this strategy have detected strong associations to disease with SNP markers that are either not in linkage disequilibrium with any nonsynonymous SNP or located at large distances from any annotated gene. In such cases, no well-established standard practice for effective SNP selection for follow-up studies exists. We aim to identify and prioritize groups of SNPs that are more likely to affect phenotypes in order to facilitate efficient SNP selection for follow-up studies.
Based on the annotations available in the Ensembl database, we categorized SNPs in the human genome into classes related to regulatory attributes, such as epigenetic modifications and transcription factor binding sites, in addition to classes related to gene structure and cross-species conservation. Using the distribution of derived allele frequencies (DAF) within each class, we assessed the strength of natural selection for each class relative to the genome as a whole. We applied this DAF analysis to Perlegen resequenced SNPs genome-wide. Regulatory elements annotated by Ensembl such as specific histone methylation sites as well as classes defined by cross-species conservation showed negative selection in comparison to the genome as a whole.
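As an illustration of the DAF comparison described above, the following is a minimal sketch, not the authors' pipeline. It assumes each SNP is given as a hypothetical tuple of (annotation class, derived allele count, number of chromosomes sampled) and summarizes each class by the fraction of SNPs whose derived allele is rare. The function name `low_daf_fraction`, the 0.1 cutoff, and the toy data are all assumptions made for the example. A class with an excess of low-DAF SNPs relative to the genome-wide background is what one would expect under purifying selection.

```python
from collections import defaultdict


def low_daf_fraction(snps, threshold=0.1):
    """Return, per annotation class, the fraction of SNPs with DAF < threshold.

    `snps` is an iterable of (class_name, derived_count, n_chromosomes).
    The threshold is an illustrative assumption, not a value from the study.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [low-DAF SNPs, all SNPs]
    for cls, derived_count, n_chromosomes in snps:
        daf = derived_count / n_chromosomes  # derived allele frequency
        counts[cls][1] += 1
        if daf < threshold:
            counts[cls][0] += 1
    return {cls: low / total for cls, (low, total) in counts.items()}


# Toy data (hypothetical): 48 chromosomes sampled per SNP.
snps = [
    ("conserved", 1, 48),
    ("conserved", 2, 48),
    ("conserved", 30, 48),
    ("genome", 3, 48),
    ("genome", 20, 48),
    ("genome", 40, 48),
    ("genome", 12, 48),
]
fractions = low_daf_fraction(snps)
```

In this toy input the "conserved" class has a larger share of rare derived alleles than the "genome" background, which is the direction of shift the abstract interprets as negative selection.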
These results highlight which annotated classes are under purifying selection, have putative functional importance, and contain SNPs that are strong candidates for follow-up studies after genome-wide association. Such SNP annotation may also be useful in interpreting results of whole-genome sequencing studies.
Genetic and epigenetic mechanisms may interact and together affect biological processes and disease development. However, most previous studies have investigated genetic and epigenetic mechanisms independently, and studies examining their interactions throughout the human genome are lacking. To identify genetic loci that interact with the epigenome, we performed the first genome-wide DNA methylation quantitative trait locus (mQTL) analysis in human pancreatic islets. We related 574,553 single nucleotide polymorphisms (SNPs) with genome-wide DNA methylation data of 468,787 CpG sites targeting 99% of RefSeq genes in islets from 89 donors. We identified 67,438 SNP-CpG pairs in cis, corresponding to 36,783 SNPs (6.4% of tested SNPs) and 11,735 CpG sites (2.5% of tested CpGs), and 2,562 significant SNP-CpG pairs in trans, corresponding to 1,465 SNPs (0.3% of tested SNPs) and 383 CpG sites (0.08% of tested CpGs), showing significant associations after correction for multiple testing. These include reported diabetes loci, e.g. ADCY5, KCNJ11, HLA-DQA1, INS, PDX1 and GRB10. CpGs of significant cis-mQTLs were overrepresented in the gene body and outside of CpG islands. Follow-up analyses further identified mQTLs associated with gene expression and insulin secretion in human islets. Causal inference test (CIT) identified SNP-CpG pairs where DNA methylation in human islets is the potential mediator of the genetic association with gene expression or insulin secretion. Functional analyses further demonstrated that identified candidate genes (GPX7, GSTT1 and SNX19) directly affect key biological processes such as proliferation and apoptosis in pancreatic β-cells. Finally, we found direct correlations between DNA methylation of 22,773 (4.9%) CpGs with mRNA expression of 4,876 genes, where 90% of the correlations were negative when CpGs were located in the region surrounding transcription start site. 
Our study demonstrates for the first time how genome-wide genetic and epigenetic variation interacts to influence gene expression, islet function and potential diabetes risk in humans.
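The cis-mQTL idea in the abstract above can be sketched as follows. This is a hedged illustration only: the abstract does not state the cis window or the statistical model, so the 500 kb window, the additive genotype coding (0/1/2 copies of an allele), the least-squares slope, and the names `cis_pairs` and `slope` are all assumptions made for the example.

```python
def cis_pairs(snps, cpgs, window=500_000):
    """Yield (snp_id, cpg_id) pairs on the same chromosome within `window` bp.

    `snps` and `cpgs` map an identifier to (chromosome, position).
    The window size is an assumption, not taken from the study.
    """
    for snp_id, (s_chr, s_pos) in snps.items():
        for cpg_id, (c_chr, c_pos) in cpgs.items():
            if s_chr == c_chr and abs(s_pos - c_pos) <= window:
                yield snp_id, cpg_id


def slope(genotypes, methylation):
    """Least-squares slope of methylation on genotype dosage (0, 1 or 2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_m = sum(methylation) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (m - mean_m) for g, m in zip(genotypes, methylation))
    return sxy / sxx


# Toy data (hypothetical identifiers and values).
snps = {"rs_a": ("chr1", 1_000_000), "rs_b": ("chr2", 5_000_000)}
cpgs = {"cg_x": ("chr1", 1_200_000), "cg_y": ("chr1", 9_000_000)}
pairs = list(cis_pairs(snps, cpgs))

genotypes = [0, 1, 2, 0, 1, 2]
methylation = [0.2, 0.4, 0.6, 0.2, 0.4, 0.6]
effect = slope(genotypes, methylation)
```

A real analysis would test each pair for significance and correct for the number of pairs tested; the sketch only shows how cis pairs are enumerated and how a per-allele effect on methylation could be estimated.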
Inter-individual variation in genetics and epigenetics affects biological processes and disease susceptibility. However, most studies have investigated genetic and epigenetic mechanisms independently and to uncover novel mechanisms affecting disease susceptibility there is a highlighted need to study interactions between these factors on a genome-wide scale. To identify novel loci affecting islet function and potentially diabetes, we performed the first genome-wide methylation quantitative trait locus (mQTL) analysis in human pancreatic islets including DNA methylation of 468,787 CpG sites located throughout the genome. Our results showed that DNA methylation of 11,735 CpGs in 4,504 unique genes is regulated by genetic factors located in cis (67,438 SNP-CpG pairs). Furthermore, significant mQTLs cover previously reported diabetes loci including KCNJ11, INS, HLA, PDX1 and GRB10. We also found mQTLs associated with gene expression and insulin secretion in human islets. By performing causality inference tests (CIT), we identified CpGs where DNA methylation potentially mediates the genetic impact on gene expression and insulin secretion. Our functional follow-up experiments further demonstrated that identified mQTLs/genes (GPX7, GSTT1 and SNX19) directly affect pancreatic β-cell function. Together, our study provides a detailed map of genome-wide associations between genetic and epigenetic variation, which affect gene expression and insulin secretion in human pancreatic islets.
DNA methylation is globally reprogrammed during mammalian preimplantation development, which is critical for normal development. Recent reduced representation bisulfite sequencing (RRBS) studies suggest that the methylome dynamics are essentially conserved between human and mouse early embryos. RRBS is known to cover 5–10% of all genomic CpGs, favoring those contained within CpG-rich regions. To obtain an unbiased and more complete representation of the methylome during early human development, we performed whole genome bisulfite sequencing of human gametes and blastocysts that covered >70% of all genomic CpGs. We found that the maternal genome was demethylated to a much lesser extent in human blastocysts than in mouse blastocysts, which could contribute to an increased number of imprinted differentially methylated regions in the human genome. Global demethylation of the paternal genome was confirmed, but SINE-VNTR-Alu elements and some other tandem repeat-containing regions were found to be specifically protected from this global demethylation. Furthermore, centromeric satellite repeats were hypermethylated in human oocytes but not in mouse oocytes, which might be explained by differential expression of de novo DNA methyltransferases. These data highlight both conserved and species-specific regulation of DNA methylation during early mammalian development. Our work provides further information critical for understanding the epigenetic processes underlying differentiation and pluripotency during early human development.
DNA methylation reprogramming after fertilization is critical for normal mammalian development. Early embryos are sensitive to environmental stresses and a number of reports have pointed out the increased risk of DNA methylation errors associated with assisted reproduction technologies. Therefore, it is very important to understand normal DNA methylation patterns during early human development. Recent reduced representation bisulfite sequencing studies reported partial methylomes of human gametes and early embryos. To provide a more comprehensive view of DNA methylation dynamics during early human development, we report on whole genome bisulfite sequencing of human gametes and blastocysts. We show that the paternal genome is globally demethylated in blastocysts whereas the maternal genome is demethylated to a much lesser extent. We also reveal unique regulation of imprinted differentially methylated regions, gene bodies and repeat sequences during early human development. Our high-resolution methylome maps are essential to understand epigenetic reprogramming by human oocytes and will aid in the preimplantation epigenetic diagnosis of human embryos.
Aberrant CpG methylation is a universal epigenetic trait of cancer cell genomes. However, human cancer samples or cell lines preclude the investigation of epigenetic changes occurring early during tumour development. Here, we have used MeDIP-seq to analyse the DNA methylome of APCMin adenoma as a model for intestinal cancer initiation, and we present a list of more than 13,000 recurring differentially methylated regions (DMRs) characterizing intestinal adenoma of the mouse. We show that Polycomb Repressive Complex (PRC) targets are strongly enriched among hypermethylated DMRs, and several PRC2 components and DNA methyltransferases were up-regulated in adenoma. We further demonstrate by bisulfite pyrosequencing of purified cell populations that the DMR signature arises de novo in adenoma cells rather than by expansion of a pre-existing pattern in intestinal stem cells or undifferentiated crypt cells. We found that epigenetic silencing of tumour suppressors, which occurs frequently in colon cancer, was rare in adenoma. Quite strikingly, we identified a core set of DMRs, which is conserved between mouse adenoma and human colon cancer, thus possibly revealing a global panel of epigenetically modified genes for intestinal tumours. Our data allow a distinction between early conserved epigenetic alterations occurring in intestinal adenoma and late stochastic events promoting colon cancer progression, and may facilitate the selection of more specific clinical epigenetic biomarkers.
The formation and progression of tumours to metastatic disease is driven by two major mechanisms, i.e. genetic alterations that activate oncogenes or inactivate tumour suppressor genes, and changes in the epigenome that cause variations in the expression of the genetic information. A deeper understanding of the interaction between the genetic and epigenetic mechanisms is critical for the selection of tumour biomarkers and for the future development of therapies. Human tumour specimens and cell lines contain a plethora of genetic and epigenetic changes, which complicate data analysis. In contrast, mouse tumour models such as the APCMin mouse used in this study arise by a single initiating genetic mutation, yet share key traits with human cancer. Here we show that mouse adenomas acquire a multitude of epigenetic alterations, which are recurring in mouse adenoma and in human colon cancer, representing early and advanced tumours, respectively. The use of a mouse model thus allowed us to uncover a sequence of epigenetic changes occurring in tumours, which may facilitate the identification of novel clinical colon cancer biomarkers.
Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated to gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans’ lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated to expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell-types. However, this epigenetic association to gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell-types). This predominance of genetic effects can also be reflected by the observation that allele specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and interestingly, a large amount of DNA methylation correlating to alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter proximal and distant regulatory regions on alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific. 
Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.
In order to better understand how genetic differences between individuals can cause diseases, it is crucial to understand how genetic variants affect cellular functions in the different tissues that compose the human body. From the umbilical cord of 195 newborn babies, we previously obtained three different cell-types: fibroblasts, T-cells and immortalized B-cells. From every individual in each cell type we measured four features across the genome: 1) genetic differences, 2) DNA methylation, an epigenetic modification of DNA that can affect its functional state, 3) gene expression—the amount of gene activity, 4) alternative splicing—which of the different versions of a gene is manifested. We find thousands of genetic variants of the DNA sequence that affect methylation, gene expression, and splicing. We show that while these genetic effects often affect multiple cell-types, the strength of these effects varies between cell-types. Also epigenetic methylation marks of DNA associate to gene expression and particularly often to splicing. Since abnormalities in gene expression, DNA methylation and alternative splicing are associated to diseases, it is important to continue studying how these traits are inter-related and affected by genetic variation across cell-types.
The epigenetic activity of transposable elements (TEs) can influence the regulation of genes, though this regulation is confined to the genes, promoters, and enhancers that neighbor the TE. This local cis regulation of genes therefore limits the influence of the TE's epigenetic regulation on the genome. TE activity is suppressed by small RNAs, which also inhibit viruses and regulate the expression of genes. The production of TE heterochromatin-associated endogenous small interfering RNAs (siRNAs) in the reference plant Arabidopsis thaliana is mechanistically distinct from gene-regulating small RNAs, such as microRNAs or trans-acting siRNAs (tasiRNAs). Previous research identified a TE small RNA that potentially regulates the UBP1b mRNA, which encodes an RNA–binding protein involved in stress granule formation. We demonstrate that this siRNA, siRNA854, is under the same trans-generational epigenetic control as the Athila family LTR retrotransposons from which it is produced. The epigenetic activation of Athila elements results in a shift in small RNA processing pathways, and new 21–22 nucleotide versions of Athila siRNAs are produced by protein components normally not responsible for processing TE siRNAs. This processing results in siRNA854's incorporation into ARGONAUTE1 protein complexes in a similar fashion to gene-regulating tasiRNAs. We have used reporter transgenes to demonstrate that the UBP1b 3′ untranslated region directly responds to the epigenetic status of Athila TEs and the accumulation of siRNA854. The regulation of the UBP1b 3′ untranslated region occurs at both the post-transcriptional and translational levels when Athila TEs are epigenetically activated, and this regulation results in the phenocopy of the ubp1b mutant stress-sensitive phenotype. This demonstrates that a TE's epigenetic activity can modulate the host organism's stress response.
In addition, the ability of this TE siRNA to regulate a gene's expression in trans blurs the lines between TE and gene-regulating small RNAs.
The portion of the genome that does not encode genes is often overlooked as a source of cellular regulatory information. Here, we demonstrate that regulatory information controlling expression and protein production from a gene called UBP1b is coming from a distant non-gene transposable element (TE). TEs are fragments of DNA that, unlike genes, are capable of duplicating themselves from one location in the genome to another, and occupy nearly half of the human genome. TEs are often referred to as “junk DNA,” as the study of cellular regulation and function is focused on genes. The regulation of TEs is distinct from that of genes, as a process termed epigenetic silencing heritably represses TE expression and activity. We have demonstrated that the epigenetic status (active versus silenced) of the Athila TE family regulates the UBP1b gene through the activity of a TE small RNA. The function of the UBP1b gene is to respond to and regulate cellular stress, and the epigenetic regulatory status of the Athila TE therefore modulates this stress response. This demonstrates that the epigenetic regulation of TEs can be a source of gene regulatory information, influencing a basic cellular function such as the stress response.
The study of genome-wide DNA methylation changes has become more accessible with the development of various array-based technologies, though when studying species other than human the choice of applications is limited and not always within reach. In this study, we adapted and tested the applicability of Methylation Specific Digital Karyotyping (MSDK), a non-array-based method, for the prospective analysis of epigenetic changes after perinatal nutritional modifications in a mouse model of allergic airway disease. MSDK is a sequence-based method that allows comprehensive and unbiased methylation profiling. The method generates 21-base-pair sequence tags derived from specific locations in the genome. The resulting tag frequencies quantitatively determine the methylation level of the corresponding loci.
Genomic DNA from whole lung was isolated and subjected to MSDK analysis using the methylation-sensitive enzyme Not I as the mapping enzyme and Nla III as the fragmenting enzyme. In a pairwise comparison of the generated mouse MSDK libraries we identified 158 loci that are significantly differentially methylated (P-value = 0.05) after perinatal dietary changes in our mouse model. Quantitative methylation-specific PCR and sequence analysis of bisulfite-modified genomic DNA confirmed changes in methylation at specific loci. Differences in genomic MSDK tag counts for a selected set of genes correlated well with changes in transcription levels as measured by real-time PCR. Furthermore, serial analysis of gene expression profiling demonstrated a dramatic difference in expressed transcripts in mice exposed to perinatal nutritional changes.
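The pairwise comparison of MSDK libraries can be pictured with a small sketch. It is an illustration under stated assumptions rather than the authors' analysis: tag counts are keyed by hypothetical locus labels, each library is scaled to tags per million so that libraries of different depth are comparable, and a log2 ratio with a pseudocount of 1 (an arbitrary choice for the example) summarizes the difference at each locus. Because a methylation-sensitive enzyme is expected to cut only unmethylated sites, a lower tag count at a locus would be read as higher methylation; that interpretation is also an assumption here, and no significance test is shown.

```python
import math


def tags_per_million(counts):
    """Scale raw tag counts so that each library sums to one million."""
    total = sum(counts.values())
    return {tag: 1e6 * c / total for tag, c in counts.items()}


def log2_ratio(lib_a, lib_b, pseudocount=1.0):
    """Per-locus log2 ratio of normalized tag counts (lib_a over lib_b).

    The pseudocount avoids division by zero for tags missing in one library.
    """
    a = tags_per_million(lib_a)
    b = tags_per_million(lib_b)
    tags = set(a) | set(b)
    return {
        t: math.log2((a.get(t, 0.0) + pseudocount) / (b.get(t, 0.0) + pseudocount))
        for t in tags
    }


# Toy libraries (hypothetical loci and counts).
control = {"locus_A": 10, "locus_B": 90}
diet = {"locus_A": 40, "locus_B": 60}
ratios = log2_ratio(control, diet)
```

Here `locus_A` yields about four times fewer normalized tags in the control library than in the diet library, giving a log2 ratio close to -2.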
The genome-wide methylation survey applied in this study allowed for an unbiased methylation profiling revealing subtle changes in DNA methylation in mice maternally exposed to dietary changes in methyl-donor content. The MSDK method is applicable for mouse models of complex human diseases in a mixed cell population and might be a valuable technology to determine whether environmental exposures can lead to epigenetic changes.
Several recent studies have shown a genetic influence on gene expression variation, including variation between the two chromosomes within an individual and variation between individuals at the population level. We hypothesized that genetic inheritance may also affect variation in chromatin states. To test this hypothesis, we analyzed chromatin states in 12 lymphoblastoid cells derived from two Centre d'Etude du Polymorphisme Humain families using an allele-specific chromatin immunoprecipitation (ChIP-on-chip) assay with Affymetrix 10K SNP chip. We performed the allele-specific ChIP-on-chip assays for the 12 lymphoblastoid cells using antibodies targeting RNA polymerase II and five post-translationally modified forms of the histone H3 protein. The use of multiple cell lines from the Centre d'Etude du Polymorphisme Humain families allowed us to evaluate variation of chromatin states across pedigrees. These studies demonstrated that chromatin state clustered by family. Our results support the idea that genetic inheritance can determine the epigenetic state of the chromatin as shown previously in model organisms. To our knowledge, this is the first demonstration in humans that genetics may be an important factor that influences global chromatin state mediated by histone modification, the hallmark of the epigenetic phenomena.
Human health and disease are determined by an interaction between genetic background and environmental exposures. Both normal development and disease are mediated by epigenetic regulation of gene expression. Epigenetic regulation causes heritable changes in gene expression that are not associated with DNA sequence changes. Instead, it is mediated by chemical modification of DNA such as DNA methylation or by protein modifications such as histone acetylation and methylation. Although much is known about epigenetic inheritance during development, little is known about the influence of the genetic background on epigenetic processes such as histone modifications. In this report the authors studied five histone modifications on a genome-wide level in cells from different families. Global epigenetic states, as measured by these histone modifications, showed a similar pattern for cells derived from the same family. This study demonstrates that genetic inheritance may be an important factor influencing global chromatin states mediated by histone modifications in humans. These observations illustrate the importance of integrating genetic and epigenetic information into studies of human health and complex diseases.
CpG island methylation plays an important role in epigenetic gene control during mammalian development and is frequently altered in disease situations such as cancer. The majority of CpG islands are normally unmethylated, but a sizeable fraction is prone to become methylated in various cell types and pathological situations. The goal of this study is to show that a computational epigenetics approach can discriminate CpG islands that are prone to methylation from those that remain unmethylated. We develop a bioinformatics scoring and prediction method on the basis of a set of 1,184 DNA attributes, which refer to sequence, repeats, predicted structure, CpG islands, genes, predicted binding sites, conservation, and single nucleotide polymorphisms. These attributes are scored on 132 CpG islands across the entire human Chromosome 21, whose methylation status was previously established for normal human lymphocytes. Our results show that three groups of DNA attributes, namely certain sequence patterns, specific DNA repeats, and a particular DNA structure, are each highly correlated with CpG island methylation (correlation coefficients of 0.64, 0.66, and 0.49, respectively). We predicted, and subsequently experimentally examined, 12 CpG islands from human Chromosome 21 with unknown methylation patterns and found more than 90% of our predictions to be correct. In addition, we applied our prediction method to Human Epigenome Project methylation data on human Chromosome 6 and again observed high prediction accuracy. In summary, our results suggest that DNA composition of CpG islands (sequence, repeats, and structure) plays a significant role in predisposing CpG islands for DNA methylation. This finding may have a strong impact on our understanding of changes in CpG island methylation in development and disease.
DNA methylation is the only epigenetic mechanism in eukaryotes that is known to directly modify the DNA. It plays an important role for gene control during development and cell differentiation, and it is a promising therapeutic target in cancer research. While a genome-wide picture of DNA methylation patterns is currently emerging, we have only fragmentary knowledge about the linkage between DNA methylation and other genomic attributes such as DNA sequence and structure, repetitive elements, or sequence conservation. The authors fill this gap by reporting on a comprehensive bioinformatical analysis of DNA methylation on human Chromosome 21—and in part, extending to other regions of the human genome. They report new associations that will help elucidate the functions of DNA methylation along the human genome. Furthermore, the authors show that their findings can be applied to predicting DNA methylation patterns from genome sequence. Such predictions have the potential of speeding up genome-wide epigenetic profiling: It may be possible to focus experimental resources on a few selected areas while bioinformatics procedures are applied to the bulk of the genome.
Behavioral phenotyping and genome-wide profiling of the histone modifier EHMT in Drosophila reveals a mechanism through which an epigenetic writer may control cognition.
The epigenetic modification of chromatin structure and its effect on complex neuronal processes like learning and memory is an emerging field in neuroscience. However, little is known about the “writers” of the neuronal epigenome and how they lay down the basis for proper cognition. Here, we have dissected the neuronal function of the Drosophila euchromatin histone methyltransferase (EHMT), a member of a conserved protein family that methylates histone 3 at lysine 9 (H3K9). EHMT is widely expressed in the nervous system and other tissues, yet EHMT mutant flies are viable. Neurodevelopmental and behavioral analyses identified EHMT as a regulator of peripheral dendrite development, larval locomotor behavior, non-associative learning, and courtship memory. The requirement for EHMT in memory was mapped to 7B-Gal4 positive cells, which are, in adult brains, predominantly mushroom body neurons. Moreover, memory was restored by EHMT re-expression during adulthood, indicating that cognitive defects are reversible in EHMT mutants. To uncover the underlying molecular mechanisms, we generated genome-wide H3K9 dimethylation profiles by ChIP-seq. Loss of H3K9 dimethylation in EHMT mutants occurs at 5% of the euchromatic genome and is enriched at the 5′ and 3′ ends of distinct classes of genes that control neuronal and behavioral processes that are corrupted in EHMT mutants. Our study identifies Drosophila EHMT as a key regulator of cognition that orchestrates an epigenetic program featuring classic learning and memory genes. Our findings are relevant to the pathophysiological mechanisms underlying Kleefstra Syndrome, a severe form of intellectual disability caused by mutations in human EHMT1, and have potential therapeutic implications. Our work thus provides novel insights into the epigenetic control of cognition in health and disease.
Epigenetic regulators can affect gene transcription through modification of DNA and histones, which together form chromatin. The importance of such regulators for cognition is increasingly appreciated, but only few key factors have been identified so far. Excellent candidates are histone modifiers that are involved in intellectual disability, such as EHMT1, implicated in Kleefstra Syndrome. Here, we characterized the neuronal function of EHMT in Drosophila. Flies that lack EHMT are viable but show highly selective defects in specific aspects of neuronal development and function, including learning and memory. Genome-wide analysis of EHMT-mediated histone methylation revealed that EHMT targets the majority of all currently known Drosophila learning and memory genes. It also targets genes known to be involved in the other aspects of behavior and neuronal development that are compromised in EHMT mutants. Remarkably, EHMT mutant memory deficits can be reversed in adulthood, suggesting that epigenetic influences on cognition are not always permanent. Our results provide novel insights into the epigenetic control of cognition in health and disease.
DNA methylation is an epigenetic modification involved in regulatory processes such as cell differentiation during development, X-chromosome inactivation, genomic imprinting and susceptibility to complex disease. However, the dynamics of DNA methylation changes between humans and their closest relatives are still poorly understood. We performed a comparative analysis of CpG methylation patterns between 9 humans and 23 primate samples including all species of great apes (chimpanzee, bonobo, gorilla and orangutan) using Illumina Methylation450 bead arrays. Our analysis identified ∼800 genes with significantly altered methylation patterns among the great apes, including ∼170 genes with a methylation pattern unique to human. Some of these are known to be involved in developmental and neurological features, suggesting that epigenetic changes have been frequent during recent human and primate evolution. We identified a significant positive relationship between the rate of coding variation and alterations of methylation at the promoter level, indicative of co-occurrence between evolution of protein sequence and gene regulation. In contrast, and supporting the idea that many phenotypic differences between humans and great apes are not due to amino acid differences, our analysis also identified 184 genes that are perfectly conserved at protein level between human and chimpanzee, yet show significant epigenetic differences between these two species. We conclude that epigenetic alterations are an important force during primate evolution and have been under-explored in evolutionary comparative genomics.
Differences in protein coding sequences between humans and their closest relatives are too small to account for their phenotypic differences. It has been hypothesized that these differences may be explained by alterations of gene regulation rather than primary genome sequence. DNA methylation is an important epigenetic modification that is involved in many biological processes, but from an evolutionary point of view this modification is still poorly understood. To this end, we performed a comparative analysis of CpG methylation patterns between humans and great apes. Using this approach, we were able to study the dynamics of DNA methylation in recent primate evolution and to identify regions showing species-specific methylation pattern among humans and great apes. We find that genes with alterations of promoter methylation tend to show increased rates of divergence in their protein sequence, and in contrast we also identify many genes with regulatory changes between human and chimpanzee that have perfectly conserved protein sequence. Our study provides the first global view of evolutionary epigenetic changes that have occurred in the genomes of all species of great apes.
A growing body of evidence points towards epigenetic mechanisms being responsible for a wide range of biological phenomena, from the plasticity of plant growth and development to the nutritional control of caste determination in honeybees and the etiology of human disease (e.g., cancer). With the (partial) elucidation of the molecular basis of epigenetic variation and the heritability of certain of these changes, the field of evolutionary epigenetics is flourishing. Despite this, the role of epigenetics in shaping host–pathogen interactions has received comparatively little attention. Yet there is plenty of evidence supporting the implication of epigenetic mechanisms in the modulation of the biological interaction between hosts and pathogens. The phenotypic plasticity of many key parasite life-history traits appears to be under epigenetic control. Moreover, pathogen-induced effects in host phenotype may have transgenerational consequences, and the bases of these changes and their heritability probably have an epigenetic component. The significance of epigenetic modifications may, however, go beyond providing a mechanistic basis for host and pathogen plasticity. Epigenetic epidemiology has recently emerged as a promising area for future research on infectious diseases. In addition, the incorporation of epigenetic inheritance and epigenetic plasticity mechanisms into evolutionary models and empirical studies of host–pathogen interactions will provide new insights into the evolution and coevolution of these associations. Here, we review the evidence available for the role of epigenetics in host–pathogen interactions, and the utility and versatility of the epigenetic technologies available that can be cross-applied to host–pathogen studies. We conclude with recommendations and directions for future research on the burgeoning field of epigenetics as applied to host–pathogen interactions.
Epstein-Barr virus (EBV), a human gammaherpesvirus, is associated with a series of malignant tumors. These include lymphomas (Burkitt’s lymphoma, Hodgkin’s disease, T/NK-cell lymphoma, post-transplant lymphoproliferative disease, AIDS-associated lymphoma, X-linked lymphoproliferative syndrome), carcinomas (nasopharyngeal carcinoma, gastric carcinoma, carcinomas of major salivary glands, thymic carcinoma, mammary carcinoma) and a sarcoma (leiomyosarcoma). The latent EBV genomes persist in the tumor cells as circular episomes, co-replicating with the cellular DNA once per cell cycle. The expression of latent EBV genes is cell type specific due to the strict epigenetic control of their promoters. DNA methylation, histone modifications and binding of key cellular regulatory proteins contribute to the regulation of alternative promoters for transcripts encoding the nuclear antigens EBNA1 to 6 and affect the activity of promoters for transcripts encoding transmembrane proteins (LMP1, LMP2A, LMP2B). In addition to genes transcribed by RNA polymerase II, there are also two RNA polymerase III transcribed genes in the EBV genome (EBER 1 and 2). The 5′ and internal regulatory sequences of EBER 1 and 2 transcription units are invariably unmethylated. The highly abundant EBER 1 and 2 RNAs are not translated to protein. Based on the cell type specific epigenetic marks associated with latent EBV genomes one can distinguish between viral epigenotypes that differ in transcriptional activity in spite of having an identical (or nearly identical) DNA sequence. Whereas latent EBV genomes are regularly targeted by epigenetic control mechanisms in different cell types, EBV encoded proteins may, in turn, affect the activity of a set of cellular promoters by interacting with the very same epigenetic regulatory machinery. There are EBNA1 binding sites in the human genome. Because high affinity binding of EBNA1 to its recognition sites is known to specify sites of DNA demethylation, we suggest that binding of EBNA1 to its cellular target sites may elicit local demethylation and contribute thereby to the activation of silent cellular promoters. EBNA2 interacts with histone acetyltransferases, and EBNALP (EBNA5) coactivates transcription by displacing histone deacetylase 4 from EBNA2-bound promoter sites. EBNA3C (EBNA6) seems to be associated both with histone acetylases and deacetylases, although in separate complexes. LMP1, a transmembrane protein involved in malignant transformation, can affect both alternative systems of epigenetic memory, DNA methylation and the Polycomb-trithorax group of protein complexes. In epithelial cells LMP1 can up-regulate DNA methyltransferases and, in Hodgkin lymphoma cells, induce the Polycomb group protein Bmi-1. In addition, LMP1 can also modulate cellular gene expression programs by affecting, via the NF-κB pathway, levels of cellular microRNAs miR-146a and miR-155. These interactions may result in epigenetic dysregulation and subsequent cellular dysfunctions that may manifest in or contribute to the development of pathological changes (e.g. initiation and progression of malignant neoplasms, autoimmune phenomena, immunodeficiency). Thus, Epstein-Barr virus, similarly to other viruses and certain bacteria, may induce pathological changes by epigenetic reprogramming of host cells. Elucidation of the epigenetic consequences of EBV-host interactions (within the framework of the emerging new field of patho-epigenetics) may have important implications for therapy and disease prevention, because epigenetic processes are reversible and continuous silencing of EBV genes contributing to patho-epigenetic changes may prevent disease development.
Mechanisms that generate transcript diversity are of fundamental importance in eukaryotes. Although a large fraction of human protein-coding genes and lincRNAs produce more than one mRNA isoform each, the regulation of this phenomenon is still incompletely understood. Much progress has been made in deciphering the role of sequence-specific features as well as DNA- and RNA-binding proteins in alternative splicing. Recently, however, several experimental studies of individual genes have revealed a direct involvement of epigenetic factors in alternative splicing and transcription initiation. While histone modifications are generally correlated with overall gene expression levels, it remains unclear how histone modification enrichment affects relative isoform abundance. Therefore, we sought to investigate the associations between histone modifications and transcript diversity levels measured by the rates of transcription start-site switching and alternative splicing on a genome-wide scale across protein-coding genes and lincRNAs. We found that the relationship between enrichment levels of epigenetic marks and transcription start-site switching is similar for protein-coding genes and lincRNAs. Furthermore, we found associations between splicing rates and enrichment levels of H2az, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me3, H3K27ac, H3K27me3, H3K36me3, H3K79me2, and H4K20me, marks traditionally associated with enhancers, transcription initiation, transcriptional repression, and others. These patterns were observed in both normal and cancer cell lines. Additionally, we developed a novel computational method that identified 840 epigenetically regulated candidate genes and predicted transcription start-site switching and alternative exon splicing with up to 92% accuracy based on epigenetic patterning alone. Our results suggest that the epigenetic regulation of transcript isoform diversity may be a relatively common genome-wide phenomenon representing an avenue of deregulation in tumor development.
Traditionally, the regulation of gene expression was thought to be largely based on DNA and RNA sequence motifs. However, this dogma has recently been challenged as other factors, such as epigenetic patterning of the genome, have become better understood. Sparse but convincing experimental evidence suggests that the epigenetic background, in the form of histone modifications, acts as an additional layer of regulation determining how transcripts are processed. Here we developed a computational approach to investigate the genome-wide prevalence and the level of association between the enrichment of epigenetic marks and transcript diversity generated via alternative transcription start sites and splicing. We found that the role of epigenetic patterning in alternative transcription start-site switching is likely to be the same for all genes whereas the role of epigenetic patterns in splicing is likely gene-specific. Furthermore, we show that epigenetic data alone can be used to predict the inclusion pattern of an exon. These findings have significant implications for a better understanding of the regulation of transcript diversity in humans as well as the modifications arising during tumor development.
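As a rough illustration of what a "level of association" between epigenetic mark enrichment and exon inclusion could look like in practice, the sketch below computes a Spearman rank correlation between two per-exon vectors. It is not the authors' method, which is not described here; the rank correlation is a stand-in chosen for simplicity, the function names (ranks, pearson, spearman) are hypothetical, and the toy enrichment and inclusion values are invented for demonstration only.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Invented toy data: one histone mark's enrichment at five exons and
    # the fraction of transcripts that include each exon.
    enrichment = [0.2, 1.5, 0.9, 2.4, 3.1]
    inclusion = [0.10, 0.45, 0.30, 0.70, 0.95]
    print(f"Spearman rho = {spearman(enrichment, inclusion):.2f}")
```

A rank-based measure is used here because enrichment signals are typically skewed and the relationship with inclusion need not be linear. A genome-wide analysis would also need multiple-testing correction and per-gene stratification, which this sketch omits.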
Remarkable progress in the field of epigenetics has turned academic, medical and public attention to the potential applications of these new advances in medicine and various fields of biomedical research. The result is a broader appreciation of epigenetic phenomena in the aetiology of common human diseases, most notably cancer. These advances also represent an exciting opportunity to incorporate epigenetics and epigenomics into carcinogen identification and safety assessment. Current epigenetic studies, including major international sequencing projects, are expected to generate information for establishing the ‘normal’ epigenome of tissues and cell types as well as the physiological variability of the epigenome against which carcinogen exposure can be assessed. Recently, epigenetic events have emerged as key mechanisms in cancer development, and while our search of the Monograph Volume 100 revealed that epigenetics have played a modest role in evaluating human carcinogens by the International Agency for Research on Cancer (IARC) Monographs so far, epigenetic data might play a pivotal role in the future. Here, we review (i) the current status of incorporation of epigenetics in carcinogen evaluation in the IARC Monographs Programme, (ii) potential modes of action for epigenetic carcinogens, (iii) current in vivo and in vitro technologies to detect epigenetic carcinogens, (iv) genomic regions and epigenetic modifications and their biological consequences and (v) critical technological and biological issues in assessment of epigenetic carcinogens. We also discuss the issues related to opportunities and challenges in the application of epigenetic testing in carcinogen identification and evaluation. Although the application of epigenetic assays in carcinogen evaluation is still in its infancy, important data are being generated and valuable scientific resources are being established that should catalyse future applications of epigenetic testing.