Related Articles

1.  Prediction of epigenetically regulated genes in breast cancer cell lines 
BMC Bioinformatics  2010;11:305.
Background
Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis.
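The permutation step described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: a Pearson correlation stands in for their logistic regression score, and the function names and toy values are hypothetical, chosen only to show how shuffling expression labels yields an empirical p-value for a negative methylation-expression association.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def permutation_pvalue(meth, expr, n_perm=2000, seed=0):
    """One-sided empirical p-value for a negative correlation.

    Expression values are shuffled across cell lines; the p-value is the
    fraction of shuffles whose correlation is at least as negative as the
    observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = pearson(meth, expr)
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(meth, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: one methylation cluster and one gene, 8 cell lines.
meth = [0.90, 0.80, 0.85, 0.10, 0.20, 0.15, 0.70, 0.30]
expr = [1.0, 1.5, 1.2, 8.0, 7.5, 9.0, 2.0, 6.0]
r, p = permutation_pvalue(meth, expr)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

In a genome-wide setting the same test would be repeated for every candidate methylation-expression pair, so the resulting p-values would still need a multiple-testing correction.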
Results
Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation. Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes.
Conclusions
Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
doi:10.1186/1471-2105-11-305
PMCID: PMC2903569  PMID: 20525369
2.  Integrative DNA Methylation and Gene Expression Analyses Identify DNA Packaging and Epigenetic Regulatory Genes Associated with Low Motility Sperm 
PLoS ONE  2011;6(6):e20280.
Background
In previous studies using candidate gene approaches, low sperm count (oligospermia) has been associated with altered sperm mRNA content and DNA methylation in both imprinted and non-imprinted genes. We performed a genome-wide analysis of sperm DNA methylation and mRNA content to test for associations with sperm function.
Methods and Results
Sperm DNA and mRNA were isolated from 21 men with a range of semen parameters presenting to a tertiary male reproductive health clinic. DNA methylation was measured with the Illumina Infinium array at 27,578 CpG loci. Unsupervised clustering of methylation data differentiated the 21 sperm samples by their motility values. Recursively partitioned mixture modeling (RPMM) of methylation data resulted in four distinct methylation profiles that were significantly associated with sperm motility (P = 0.01). Linear models of microarray analysis (LIMMA) was performed based on motility and identified 9,189 CpG loci with significantly altered methylation (Q<0.05) in the low motility samples. In addition, the majority of these disrupted CpG loci (80%) were hypomethylated. Of the aberrantly methylated CpGs, 194 were associated with imprinted genes and were almost equally distributed into hypermethylated (predominantly paternally expressed) and hypomethylated (predominantly maternally expressed) groups. Sperm mRNA was measured with the Human Gene 1.0 ST Affymetrix GeneChip Array. LIMMA analysis identified 20 candidate transcripts as differentially present in low motility sperm, including HDAC1 (NCBI 3065), SIRT3 (NCBI 23410), and DNMT3A (NCBI 1788). There was a trend among altered expression of these epigenetic regulatory genes and RPMM DNA methylation class.
Conclusions
Using integrative genome-wide approaches we identified CpG methylation profiles and mRNA alterations associated with low sperm motility.
doi:10.1371/journal.pone.0020280
PMCID: PMC3107223  PMID: 21674046
3.  Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions 
BMC Bioinformatics  2008;9:365.
Background
Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner.
Results
We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations that show that the method is more reliable than competing nonparametric clustering approaches, and is at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on the normal tissue samples and show that the clusters are associated with tissue type as well as age.
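To make the beta mixture idea concrete, here is a minimal Python sketch of its basic building block: fitting a beta distribution to methylation fractions and assigning a new value to the class under which it is most likely. It does not implement the recursive partitioning or the EM fitting of the proposed method. The method-of-moments fit, the function names, and the toy values are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def fit_beta_moments(values):
    """Method-of-moments estimate of beta(a, b) for values in (0, 1)."""
    m, v = mean(values), variance(values)
    common = m * (1 - m) / v - 1  # requires v < m * (1 - m)
    return m * common, (1 - m) * common


def beta_logpdf(x, a, b):
    """Log density of beta(a, b) at x, for 0 < x < 1."""
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def classify(x, params_by_class):
    """Return the class label whose beta density is highest at x."""
    return max(params_by_class,
               key=lambda k: beta_logpdf(x, *params_by_class[k]))


# Hypothetical methylation fractions at one locus in two sample groups.
low = [0.05, 0.10, 0.08, 0.12, 0.07]
high = [0.85, 0.90, 0.88, 0.80, 0.92]
params = {"low": fit_beta_moments(low), "high": fit_beta_moments(high)}
print(classify(0.15, params), classify(0.75, params))
```

A full mixture model would weight each class density by a mixing proportion and sum log densities over all loci rather than using a single locus.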
Conclusion
Our proposed recursively-partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.
doi:10.1186/1471-2105-9-365
PMCID: PMC2553421  PMID: 18782434
4.  An integrative characterization of recurrent molecular aberrations in glioblastoma genomes 
Nucleic Acids Research  2013;41(19):8803-8821.
Glioblastoma multiforme (GBM) is the most common and malignant primary brain tumor in adults. Decades of investigations and the recent effort of the Cancer Genome Atlas (TCGA) project have mapped many molecular alterations in GBM cells. Alterations on DNAs may dysregulate gene expressions and drive malignancy of tumors. It is thus important to uncover causal and statistical dependency between ‘effector’ molecular aberrations and ‘target’ gene expressions in GBMs. A rich collection of prior studies attempted to combine copy number variation (CNV) and mRNA expression data. However, systematic methods to integrate multiple types of cancer genomic data—gene mutations, single nucleotide polymorphisms, CNVs, DNA methylations, mRNA and microRNA expressions and clinical information—are relatively scarce. We proposed an algorithm to build ‘association modules’ linking effector molecular aberrations and target gene expressions and applied the module-finding algorithm to the integrated TCGA GBM data sets. The inferred association modules were validated by six tests using external information and datasets of central nervous system tumors: (i) indication of prognostic effects among patients; (ii) coherence of target gene expressions; (iii) retention of effector–target associations in external data sets; (iv) recurrence of effector molecular aberrations in GBM; (v) functional enrichment of target genes; and (vi) co-citations between effectors and targets. Modules associated with well-known molecular aberrations of GBM—such as chromosome 7 amplifications, chromosome 10 deletions, EGFR and NF1 mutations—passed the majority of the validation tests. Furthermore, several modules associated with less well-reported molecular aberrations—such as chromosome 11 CNVs, CD40, PLXNB1 and GSTM1 methylations, and mir-21 expressions—were also validated by external information. 
In particular, modules constituting trans-acting effects with chromosome 11 CNVs and cis-acting effects with chromosome 10 CNVs manifested strong negative and positive associations with survival times in brain tumors. By aligning the information of association modules with the established GBM subclasses based on transcription or methylation levels, we found each subclass possessed multiple concurrent molecular aberrations. Furthermore, the joint molecular characteristics derived from 16 association modules had prognostic power not explained away by the strong biomarker of CpG island methylator phenotypes. Functional and survival analyses indicated that immune/inflammatory responses and epithelial-mesenchymal transitions were among the most important determining processes of prognosis. Finally, we demonstrated that certain molecular aberrations uniquely recurred in GBM but were relatively rare in non-GBM glioma cells. These results justify the utility of an integrative analysis on cancer genomes and provide testable characterizations of driver aberration events in GBM.
doi:10.1093/nar/gkt656
PMCID: PMC3799430  PMID: 23907387
5.  DNA methylation subgroups and the CpG island methylator phenotype in gastric cancer: a comprehensive profiling approach 
BMC Gastroenterology  2014;14:55.
Background
Methylation-induced silencing of promoter CpG islands in tumor suppressor genes plays an important role in human carcinogenesis. In colorectal cancer, the CpG island methylator phenotype (CIMP) is defined as widespread and elevated levels of DNA methylation and CIMP+ tumors have distinctive clinicopathological and molecular features. In contrast, the existence of a comparable CIMP subtype in gastric cancer (GC) has not been clearly established. To further investigate this issue, in the present study we performed comprehensive DNA methylation profiling of a well-characterised series of primary GC.
Methods
The methylation status of 1,421 autosomal CpG sites located within 768 cancer-related genes was investigated using the Illumina GoldenGate Methylation Panel I assay on DNA extracted from 60 gastric tumors and matched tumor-adjacent gastric tissue pairs. Methylation data was analysed using a recursively partitioned mixture model and investigated for associations with clinicopathological and molecular features including age, Helicobacter pylori status, tumor site, patient survival, microsatellite instability and BRAF and KRAS mutations.
Results
A total of 147 genes were differentially methylated between tumor and matched tumor-adjacent gastric tissue, with HOXA5 and hedgehog signalling being the top-ranked gene and signalling pathway, respectively. Unsupervised clustering of methylation data revealed the existence of 6 subgroups under two main clusters, referred to as L (low methylation; 28% of cases) and H (high methylation; 72%). Female patients were over-represented in the H tumor group compared to L group (36% vs 6%; P = 0.024), however no other significant differences in clinicopathological or molecular features were apparent. CpG sites that were hypermethylated in group H were more frequently located in CpG islands and marked for polycomb occupancy.
Conclusions
High-throughput methylation analysis implicates genes involved in embryonic development and hedgehog signaling in gastric tumorigenesis. GC is comprised of two major methylation subtypes, with the highly methylated group showing some features consistent with a CpG island methylator phenotype.
doi:10.1186/1471-230X-14-55
PMCID: PMC3986689  PMID: 24674026
Methylation; Gastric cancer; Microarray; CIMP; GoldenGate
6.  Epigenomic diversity of colorectal cancer indicated by LINE-1 methylation in a database of 869 tumors 
Molecular Cancer  2010;9:125.
Background
Genome-wide DNA hypomethylation plays a role in genomic instability and carcinogenesis. LINE-1 (L1 retrotransposon) constitutes a substantial portion of the human genome, and LINE-1 methylation correlates with global DNA methylation status. LINE-1 hypomethylation in colon cancer has been strongly associated with poor prognosis. However, whether LINE-1 hypomethylators constitute a distinct cancer subtype remains uncertain. Recent evidence for concordant LINE-1 hypomethylation within synchronous colorectal cancer pairs suggests the presence of a non-stochastic mechanism influencing tumor LINE-1 methylation level. Thus, it is of particular interest to examine whether its wide variation can be attributed to clinical, pathologic or molecular features.
Design
Utilizing a database of 869 colorectal cancers in two prospective cohort studies, we constructed multivariate linear and logistic regression models for LINE-1 methylation (quantified by Pyrosequencing). Variables included age, sex, body mass index, family history of colorectal cancer, smoking status, tumor location, stage, grade, mucinous component, signet ring cells, tumor infiltrating lymphocytes, CpG island methylator phenotype (CIMP), microsatellite instability, expression of TP53 (p53), CDKN1A (p21), CTNNB1 (β-catenin), PTGS2 (cyclooxygenase-2), and FASN, and mutations in KRAS, BRAF, and PIK3CA.
Results
Tumoral LINE-1 methylation ranged from 23.1 to 90.3 on a 0-100 scale (mean 61.4; median 62.3; standard deviation 9.6) and was distributed approximately normally except for extreme hypomethylators [LINE-1 methylation < 40; N = 22 (2.5%), far more than would be expected under a normal distribution]. LINE-1 extreme hypomethylators were significantly associated with younger patients (p = 0.0058). Residual plot by multivariate linear regression showed that LINE-1 extreme hypomethylators clustered as one distinct group, separate from the main tumor group. The multivariate linear regression model could explain 8.4% of the total variability of LINE-1 methylation (R-square = 0.084). Multivariate logistic regression models for binary LINE-1 hypomethylation outcomes (cutoffs of 40, 50 and 60) showed at most fair predictive ability (area under the receiver operating characteristic curve < 0.63).
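For readers unfamiliar with the area under the receiver operating characteristic curve quoted above, the following minimal Python sketch computes it directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are hypothetical and unrelated to the study data.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities of hypomethylation.
hypomethylated = [0.9, 0.6, 0.4]
not_hypomethylated = [0.5, 0.3, 0.2, 0.1]
print(auc(hypomethylated, not_hypomethylated))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values below 0.63 are described as at most fair.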
Conclusions
LINE-1 extreme hypomethylators appear to constitute a previously-unrecognized, distinct subtype of colorectal cancers, which needs to be confirmed by additional studies. Our tumor LINE-1 methylation data indicate enormous epigenomic diversity of individual colorectal cancers.
doi:10.1186/1476-4598-9-125
PMCID: PMC2892454  PMID: 20507599
7.  Aberrant DNA Methylation of OLIG1, a Novel Prognostic Factor in Non-Small Cell Lung Cancer 
PLoS Medicine  2007;4(3):e108.
Background
Lung cancer is the leading cause of cancer-related death worldwide. Currently, tumor, node, metastasis (TNM) staging provides the most accurate prognostic parameter for patients with non-small cell lung cancer (NSCLC). However, the overall survival of patients with resectable tumors varies significantly, indicating the need for additional prognostic factors to better predict the outcome of the disease, particularly within a given TNM subset.
Methods and Findings
In this study, we investigated whether adenocarcinomas and squamous cell carcinomas could be differentiated based on their global aberrant DNA methylation patterns. We performed restriction landmark genomic scanning on 40 patient samples and identified 47 DNA methylation targets that together could distinguish the two lung cancer subgroups. The protein expression of one of those targets, oligodendrocyte transcription factor 1 (OLIG1), significantly correlated with survival in NSCLC patients, as shown by univariate and multivariate analyses. Furthermore, the hazard ratio for patients negative for OLIG1 protein was significantly higher than the one for those patients expressing the protein, even at low levels.
Conclusions
Multivariate analyses of our data confirmed that OLIG1 protein expression significantly correlates with overall survival in NSCLC patients, with a relative risk of 0.84 (95% confidence interval 0.77–0.91, p < 0.001) along with T and N stages, as indicated by a Cox proportional hazard model. Taken together, our results suggest that OLIG1 protein expression could be utilized as a novel prognostic factor, which could aid in deciding which NSCLC patients might benefit from more aggressive therapy. This is potentially of great significance, as the addition of postoperative adjuvant chemotherapy in T2N0 NSCLC patients is still controversial.
Christopher Plass and colleagues find that OLIG1 expression correlates with survival in lung cancer patients and suggest that it could be used in deciding which patients are likely to benefit from more aggressive therapy.
Editors' Summary
Background.
Lung cancer is the commonest cause of cancer-related death worldwide. Most cases are of a type called non-small cell lung cancer (NSCLC). Like other cancers, treatment of NSCLC depends on the “TNM stage” at which the cancer is detected. Staging takes into account the size and local spread of the tumor (its T classification), whether nearby lymph nodes contain tumor cells (its N classification), and whether tumor cells have spread (metastasized) throughout the body (its M classification). Stage I tumors are confined to the lung and are removed surgically. Stage II tumors have spread to nearby lymph nodes and are treated with a combination of surgery and chemotherapy. Stage III tumors have spread throughout the chest, and stage IV tumors have metastasized around the body; patients with both of these stages are treated with chemotherapy alone. About 70% of patients with stage I or II lung cancer, but only 2% of patients with stage IV lung cancer, survive for five years after diagnosis.
Why Was This Study Done?
TNM staging is the best way to predict the likely outcome (prognosis) for patients with NSCLC, but survival times for patients with stage I and II tumors vary widely. Another prognostic marker—maybe a “molecular signature”—that could distinguish patients who are likely to respond to treatment from those whose cancer will inevitably progress would be very useful. Unlike normal cells, cancer cells divide uncontrollably and can move around the body. These behavioral changes are caused by alterations in the pattern of proteins expressed by the cells. But what causes these alterations? The answer in some cases is “epigenetic changes” or chemical modifications of genes. In cancer cells, methyl groups are aberrantly added to GC-rich gene regions. These so-called “CpG islands” lie near gene promoters (sequences that control the transcription of DNA into mRNA, the template for protein production), and their methylation stops the promoters working and silences the gene. In this study, the researchers have investigated whether aberrant methylation patterns vary between NSCLC subtypes and whether specific aberrant methylations are associated with survival and can, therefore, be used prognostically.
What Did the Researchers Do and Find?
The researchers used “restriction landmark genomic scanning” (RLGS) to catalog global aberrant DNA methylation patterns in human lung tumor samples. In RLGS, DNA is cut into fragments with a restriction enzyme (a protein that cuts at specific DNA sequences), end-labeled, and separated using two-dimensional gel electrophoresis to give a pattern of spots. Because methylation stops some restriction enzymes cutting their target sequence, normal lung tissue and lung tumor samples yield different patterns of spots. The researchers used these patterns to identify 47 DNA methylation targets (many in CpG islands) that together distinguished between adenocarcinomas and squamous cell carcinomas, two major types of NSCLCs. Next, they measured mRNA production from the genes with the greatest difference in methylation between adenocarcinomas and squamous cell carcinomas. OLIG1 (the gene that encodes a protein involved in nerve cell development) had one of the highest differences in mRNA production between these tumor types. Furthermore, three-quarters of NSCLCs had reduced or no expression of OLIG1 protein and, when the researchers analyzed the association between OLIG1 protein expression and overall survival in patients with NSCLC, reduced OLIG1 protein expression was associated with reduced survival.
What Do These Findings Mean?
These findings indicate that different types of NSCLC can be distinguished by examining their aberrant methylation patterns. This suggests that the establishment of different DNA methylation patterns might be related to the cell type from which the tumors developed. Alternatively, the different aberrant methylation patterns might reflect the different routes that these cells take to becoming tumor cells. This research identifies a potential new prognostic marker for NSCLC by showing that OLIG1 protein expression correlates with overall survival in patients with NSCLC. This correlation needs to be tested in a clinical setting to see if adding OLIG1 expression to the current prognostic parameters can lead to better treatment choices for early-stage lung cancer patients and ultimately improve these patients' overall survival.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040108.
Patient and professional information on lung cancer, including staging (in English and Spanish), is available from the US National Cancer Institute
The MedlinePlus encyclopedia has pages on non-small cell lung cancer (in English and Spanish)
Cancerbackup provides patient information on lung cancer
CancerQuest, provided by Emory University, has information about how cancer develops (in English, Spanish, Chinese and Russian)
Wikipedia pages on epigenetics (note that Wikipedia is a free online encyclopedia that anyone can edit)
The Epigenome Network of Excellence gives background information and the latest news about epigenetics (in several European languages)
doi:10.1371/journal.pmed.0040108
PMCID: PMC1831740  PMID: 17388669
8.  Single-CpG-resolution methylome analysis identifies clinicopathologically aggressive CpG island methylator phenotype clear cell renal cell carcinomas 
Carcinogenesis  2012;33(8):1487-1493.
To clarify the significance of DNA methylation alterations during renal carcinogenesis, methylome analysis using single-CpG-resolution Infinium array was performed on 29 normal renal cortex tissue (C) samples, 107 non-cancerous renal cortex tissue (N) samples obtained from patients with clear cell renal cell carcinomas (RCCs) and 109 tumorous tissue (T) samples. DNA methylation levels at 4830 CpG sites were already altered in N samples compared with C samples. Unsupervised hierarchical clustering analysis based on DNA methylation levels at the 801 CpG sites, where DNA methylation alterations had occurred in N samples and were inherited by and strengthened in T samples, clustered clear cell RCCs into Cluster A (n = 90) and Cluster B (n = 14). Clinicopathologically aggressive tumors were accumulated in Cluster B, and the cancer-free and overall survival rates of patients in this cluster were significantly lower than those of patients in Cluster A. Clear cell RCCs in Cluster B were characterized by accumulation of DNA hypermethylation on CpG islands and considered to be CpG island methylator phenotype (CIMP)-positive cancers. DNA hypermethylation of the CpG sites on the FAM150A, GRM6, ZNF540, ZFP42, ZNF154, RIMS4, PCDHAC1, KHDRBS2, ASCL2, KCNQ1, PRAC, WNT3A, TRH, FAM78A, ZNF671, SLC13A5 and NKX6-2 genes became hallmarks of CIMP in RCCs. On the other hand, Cluster A was characterized by genome-wide DNA hypomethylation. These data indicated that DNA methylation alterations at precancerous stages may determine tumor aggressiveness and patient outcome. Accumulation of DNA hypermethylation on CpG islands and genome-wide DNA hypomethylation may each underlie distinct pathways of renal carcinogenesis.
Abbreviations: BAMCA, bacterial artificial chromosome array-based methylated CpG island amplification; C, normal renal cortex tissue obtained from patients without any primary renal tumor; CIMP, CpG island methylator phenotype; HCC, hepatocellular carcinoma; N, non-cancerous renal cortex tissue obtained from patients with clear cell renal cell carcinomas; NCBI, National Center for Biotechnology Information; RCC, renal cell carcinoma; T, tumorous tissue; TNM, Tumor-Node-Metastasis
doi:10.1093/carcin/bgs177
PMCID: PMC3418891  PMID: 22610075
9.  The Honey Bee Epigenomes: Differential Methylation of Brain DNA in Queens and Workers 
PLoS Biology  2010;8(11):e1000506.
Using genome-wide methylation profiles in honey bee queen and worker brains to understand how contrasting organismal outputs are generated from the same genotype.
In honey bees (Apis mellifera) the behaviorally and reproductively distinct queen and worker female castes derive from the same genome as a result of differential intake of royal jelly and are implemented in concert with DNA methylation. To determine if these very different diet-controlled phenotypes correlate with unique brain methylomes, we conducted a study to determine the methyl cytosine (mC) distribution in the brains of queens and workers at single-base-pair resolution using shotgun bisulfite sequencing technology. The whole-genome sequencing was validated by deep 454 sequencing of selected amplicons representing eight methylated genes. We found that nearly all mCs are located in CpG dinucleotides in the exons of 5,854 genes showing greater sequence conservation than non-methylated genes. Over 550 genes show significant methylation differences between queens and workers, revealing the intricate dynamics of methylation patterns. The distinctiveness of the differentially methylated genes is underscored by their intermediate CpG densities relative to drastically CpG-depleted methylated genes and to CpG-richer non-methylated genes. We find a strong correlation between methylation patterns and splicing sites including those that have the potential to generate alternative exons. We validate our genome-wide analyses by a detailed examination of two transcript variants encoded by one of the differentially methylated genes. The link between methylation and splicing is further supported by the differential methylation of genes belonging to the histone gene family. We propose that modulation of alternative splicing is one mechanism by which DNA methylation could be linked to gene regulation in the honey bee. Our study describes a level of molecular diversity previously unknown in honey bees that might be important for generating phenotypic flexibility not only during development but also in the adult post-mitotic brain.
Author Summary
The queen honey bee and her worker sisters do not seem to have much in common. Workers are active and intelligent, skillfully navigating the outside world in search of food for the colony. They never reproduce; that task is left entirely to the much larger and longer-lived queen, who is permanently ensconced within the colony and uses a powerful chemical influence to exert control. Remarkably, these two female castes are generated from identical genomes. The key to each female's developmental destiny is her diet as a larva: future queens are raised on royal jelly. This specialized diet is thought to affect a particular chemical modification, methylation, of the bee's DNA, causing the same genome to be deployed differently. To document differences in this epigenomic setting and hypothesize about its effects on behavior, we performed high-resolution bisulphite sequencing of whole genomes from the brains of queen and worker honey bees. In contrast to the heavily methylated human genome, we found that only a small and specific fraction of the honey bee genome is methylated. Most methylation occurred within conserved genes that provide critical cellular functions. Over 550 genes showed significant methylation differences between the queen and the worker, which may contribute to the profound divergence in behavior. How DNA methylation works on these genes remains unclear, but it may change their accessibility to the cellular machinery that controls their expression. We found a tantalizing clue to a mechanism in the clustering of methylation within parts of genes where splicing occurs, suggesting that methylation could control which of several versions of a gene is expressed. Our study provides the first documentation of extensive molecular differences that may allow honey bees to generate different phenotypes from the same genome.
doi:10.1371/journal.pbio.1000506
PMCID: PMC2970541  PMID: 21072239
10.  Recursively partitioned mixture model clustering of DNA methylation data using biologically informed correlation structures 
DNA methylation is a well-recognized epigenetic mechanism that has been the subject of a growing body of literature typically focused on the identification and study of profiles of DNA methylation and their association with human diseases and exposures. In recent years, a number of unsupervised clustering algorithms, both parametric and non-parametric, have been proposed for clustering large-scale DNA methylation data. However, most of these approaches do not incorporate known biological relationships of measured features, and in some cases, rely on unrealistic assumptions regarding the nature of DNA methylation. Here, we propose a modified version of a recursively partitioned mixture model (RPMM) that integrates information related to the proximity of CpG loci within the genome to inform correlation structures from which subsequent clustering analysis is based. Using simulations and four methylation data sets, we demonstrate that integrating biologically informative correlation structures within RPMM resulted in improved goodness-of-fit, clustering consistency, and the ability to detect biologically meaningful clusters compared to methods which ignore such correlation. Integrating biologically-informed correlation structures to enhance modeling techniques is motivated by the rapid increase in resolution of DNA methylation microarrays and the increasing understanding of the biology of this epigenetic mechanism.
doi:10.1515/sagmb-2012-0068
PMCID: PMC4007267  PMID: 23468465
finite mixture models; epigenetics; genomic data; model-based clustering
11.  Genome-Wide DNA Methylation Analysis Predicts an Epigenetic Switch for GATA Factor Expression in Endometriosis 
PLoS Genetics  2014;10(3):e1004158.
Endometriosis is a gynecological disease defined by the extrauterine growth of endometrial-like cells that cause chronic pain and infertility. The disease is limited to primates that exhibit spontaneous decidualization, and diseased cells are characterized by significant defects in the steroid-dependent genetic pathways that typify this process. Altered DNA methylation may underlie these defects, but few regions with differential methylation have been implicated in the disease. We mapped genome-wide differences in DNA methylation between healthy human endometrial and endometriotic stromal cells and correlated this with gene expression using an interaction analysis strategy. We identified 42,248 differentially methylated CpGs in endometriosis compared to healthy cells. These extensive differences were not unidirectional, but were focused intragenically and at sites distal to classic CpG islands where methylation status was typically negatively correlated with gene expression. Significant differences in methylation were mapped to 403 genes, which included a disproportionally large number of transcription factors. Furthermore, many of these genes are implicated in the pathology of endometriosis and decidualization. Our results tremendously improve the scope and resolution of differential methylation affecting the HOX gene clusters, nuclear receptor genes, and intriguingly the GATA family of transcription factors. Functional analysis of the GATA family revealed that GATA2 regulates key genes necessary for the hormone-driven differentiation of healthy stromal cells, but is hypermethylated and repressed in endometriotic cells. GATA6, which is hypomethylated and abundant in endometriotic cells, potently blocked hormone sensitivity, repressed GATA2, and induced markers of endometriosis when expressed in healthy endometrial cells. 
The unique epigenetic fingerprint in endometriosis suggests DNA methylation is an integral component of the disease, and identifies a novel role for the GATA family as key regulators of uterine physiology: aberrant DNA methylation in endometriotic cells correlates with a shift in GATA isoform expression that facilitates progesterone resistance and disease progression.
Author Summary
Women develop endometriosis when endometrial tissue with altered sensitivity to ovarian hormones grows outside the uterus. The persistent survival of these cells results in chronic pelvic pain and infertility. Although the origin of the disease remains a mystery, it only occurs in women and menstruating primates, suggesting that the unique evolution behind primate uterine development and menstruation are linked to the disease. Epigenetic defects affecting the uterine physiological response to ovarian hormones are also involved in endometriosis, and several genes implicated in disease progression are differentially methylated. Here we compared DNA methylation with gene expression in endometriosis using large-scale arrays. By comparing healthy and diseased cells treated with or without hormones to mimic part of the menstrual cycle, we uncovered many differentially methylated genes with defective expression in endometriosis that also regulate the hormone-dependent aspects of menstruation. In addition to expanding our understanding of how methylation affects endometriosis many fold, this also led us to propose an epigenetic switch that permits GATA6 expression in endometriosis instead of GATA2, and this switch promotes the aberrant expression of many of the genes seen in endometriosis. Our work provides novel unifying insight into the cause and development of endometriosis.
doi:10.1371/journal.pgen.1004158
PMCID: PMC3945170  PMID: 24603652
12.  Method to Detect Differentially Methylated Loci with Case-Control Designs using Illumina Arrays 
Genetic epidemiology  2011;35(7):686-694.
It is now understood that virtually all human cancer types are the result of the accumulation of both genetic and epigenetic changes. DNA methylation is a molecular modification of DNA that is crucial for normal development. Genes that are rich in CpG dinucleotides are usually not methylated in normal tissues, but are frequently hypermethylated in cancer. With the advent of high-throughput platforms, the large-scale structure of genomic methylation patterns is available through genome-wide scans, and a tremendous amount of DNA methylation data has recently been generated. However, sophisticated statistical methods to handle complex DNA methylation data are very limited. Here we developed a likelihood-based Uniform-Normal-mixture model to select differentially methylated loci between case and control groups using Illumina arrays. The idea is to model the data as three types of methylation loci: one unmethylated, one completely methylated, and one partially methylated. A three-component mixture model with two Uniform distributions and one truncated normal distribution was used to model the three types. The mixture probabilities and the mean of the normal distribution were used to make inference about differentially methylated loci. Through extensive simulation studies, we demonstrated the feasibility and power of the proposed method. An application to a recently published study on ovarian cancer identified several methylation loci that are missed by the existing method.
doi:10.1002/gepi.20619
PMCID: PMC3197755  PMID: 21818777
DNA methylation; mixture model; case-control designs
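The three-component density described in entry 12 can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not give the supports of the two Uniform components, so the cut-offs `low_cut` and `high_cut`, the function names, and the example parameter values are all hypothetical, and the fitting procedure used in the paper is not reproduced here.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of an ordinary (untruncated) normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_normal_pdf(x, mu, sigma, lo=0.0, hi=1.0):
    """Normal density renormalised so that it integrates to 1 on [lo, hi]."""
    if x < lo or x > hi:
        return 0.0
    mass = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
    return normal_pdf(x, mu, sigma) / mass

def uniform_pdf(x, lo, hi):
    """Density of a Uniform distribution on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def mixture_pdf(x, weights, mu, sigma, low_cut=0.2, high_cut=0.8):
    """Density of a methylation value x in [0, 1] under the mixture.

    weights = (unmethylated, partially methylated, fully methylated).
    low_cut and high_cut are assumed supports, not values from the paper.
    """
    w_unmeth, w_partial, w_meth = weights
    return (w_unmeth * uniform_pdf(x, 0.0, low_cut)
            + w_partial * truncated_normal_pdf(x, mu, sigma)
            + w_meth * uniform_pdf(x, high_cut, 1.0))

def log_likelihood(values, weights, mu, sigma):
    """Log-likelihood of observed methylation values at one locus."""
    return sum(math.log(mixture_pdf(v, weights, mu, sigma)) for v in values)
```

Under this reading, cases and controls would each get their own mixture probabilities and normal mean, and a locus would be flagged when those estimates differ between the groups.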
13.  The Dynamics of DNA Methylation Covariation Patterns in Carcinogenesis 
PLoS Computational Biology  2014;10(7):e1003709.
Recently it has been observed that cancer tissue is characterised by an increased variability in DNA methylation patterns. However, how the correlative patterns in genome-wide DNA methylation change during the carcinogenic progress has not yet been explored. Here we study genome-wide inter-CpG correlations in DNA methylation, in addition to single site variability, during cervical carcinogenesis. We demonstrate how the study of changes in DNA methylation covariation patterns across normal, intra-epithelial neoplasia and invasive cancer allows the identification of CpG sites that indicate the risk of neoplastic transformation in stages prior to neoplasia. Importantly, we show that the covariation in DNA methylation at these risk CpG loci is maximal immediately prior to the onset of cancer, supporting the view that high epigenetic diversity in normal cells increases the risk of cancer. Consistent with this, we observe that invasive cancers exhibit increased covariation in DNA methylation at the risk CpG sites relative to normal tissue, but lower levels relative to pre-cancerous lesions. We further show that the identified risk CpG sites undergo preferential DNA methylation changes in relation to human papilloma virus infection and age. Results are validated in independent data including prospectively collected samples prior to neoplastic transformation. Our data are consistent with a phase transition model of carcinogenesis, in which epigenetic diversity is maximal prior to the onset of cancer. The model and algorithm proposed here may allow, in future, network biomarkers predicting the risk of neoplastic transformation to be identified.
Author Summary
DNA methylation is a covalent modification of DNA which can regulate how active genes are. DNA methylation is altered at many genomic loci in cancer cells, leading to widespread functional disruption. Importantly, DNA methylation alterations across the genome are seen even in early carcinogenesis. Although the pattern of DNA methylation change during carcinogenesis has been studied at individual genomic loci, no study has yet analysed how these patterns change at a systems level, specifically how DNA methylation patterns at pairs of genomic sites change during disease progression. Doing so can shed light on how the epigenetic diversity of cell populations changes during the carcinogenic process. This study performs a systems-level analysis of the dynamic changes in DNA methylation correlation pattern during cervical carcinogenesis, demonstrating that epigenetic diversity is maximal just prior to the onset of cancer. Importantly, this supports the view that the risk of cancer development is closely related to an increase in epigenetic diversity in apparently healthy cells. In addition, the study provides a computational algorithm which successfully identifies the altered genomic sites conferring the risk of cervical cancer.
doi:10.1371/journal.pcbi.1003709
PMCID: PMC4091688  PMID: 25010556
14.  A Beta-mixture model for dimensionality reduction, sample classification and analysis 
BMC Bioinformatics  2011;12:215.
Background
Patterns of genome-wide methylation vary between tissue types. For example, cancer tissue shows markedly different patterns from those of normal tissue. In this paper we propose a beta-mixture model to describe genome-wide methylation patterns based on probe data from methylation microarrays. The model takes dependencies between neighbour probe pairs into account and assumes three broad categories of methylation, low, medium and high. The model is described by 37 parameters, which reduces the dimensionality of a typical methylation microarray significantly. We used methylation microarray data from 42 colon cancer samples to assess the model.
Results
Based on data from colon cancer samples we show that our model captures genome-wide characteristics of methylation patterns. We estimate the parameters of the model and show that they vary between different tissue types. Further, for each methylation probe the posterior probability of a methylation state (low, medium or high) is calculated and the probability that the state is correctly predicted is assessed. We demonstrate that the model can be applied to classify cancer tissue types accurately and that the model provides accessible and easily interpretable data summaries.
Conclusions
We have developed a beta-mixture model for methylation microarray data. The model substantially reduces the dimensionality of the data. It can be used for further analysis, such as sample classification or to detect changes in methylation status between different samples and tissues.
doi:10.1186/1471-2105-12-215
PMCID: PMC3126746  PMID: 21619656
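The posterior probability of a methylation state mentioned in entry 14 can be illustrated with a plain three-component beta mixture in Python. This is a simplified sketch: it ignores the dependencies between neighbouring probes that the paper's 37-parameter model includes, and the weights and shape parameters in `EXAMPLE_COMPONENTS` are made-up values chosen only for illustration.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm
                    + (a - 1.0) * math.log(x)
                    + (b - 1.0) * math.log(1.0 - x))

def state_posteriors(x, components):
    """Posterior probability of each state given one probe value x.

    components maps a state name to (mixture weight, a, b).
    Bayes' rule: posterior is proportional to weight times component density.
    """
    joint = {state: w * beta_pdf(x, a, b)
             for state, (w, a, b) in components.items()}
    total = sum(joint.values())
    return {state: value / total for state, value in joint.items()}

# Hypothetical parameters, not estimates from the paper.
EXAMPLE_COMPONENTS = {
    "low": (0.5, 2.0, 12.0),
    "medium": (0.2, 6.0, 6.0),
    "high": (0.3, 12.0, 2.0),
}
```

For instance, `state_posteriors(0.05, EXAMPLE_COMPONENTS)` assigns most of the probability to the "low" state, while a value of 0.5 is assigned mostly to "medium".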
15.  Quantitation of DNA methylation by melt curve analysis 
BMC Cancer  2009;9:123.
Background
Methylation of DNA is a common mechanism for silencing genes, and aberrant methylation is increasingly being implicated in many diseases such as cancer. There is a need for robust, inexpensive methods to quantitate methylation across a region containing a number of CpGs. We describe and validate a rapid, in-tube method to quantitate DNA methylation using the melt data obtained following amplification of bisulfite modified DNA in a real-time thermocycler.
Methods
We first describe a mathematical method to normalise the raw fluorescence data generated by heating the amplified bisulfite modified DNA. From this normalised data the temperatures at which melting begins and finishes can be calculated, which reflect the less and more methylated template molecules present respectively. Also the T50, the temperature at which half the amplicons are melted, which represents the summative methylation of all the CpGs in the template mixture, can be calculated. These parameters describe the methylation characteristics of the region amplified in the original sample.
Results
For validation we used synthesized oligonucleotides and DNA from fresh cells and formalin fixed paraffin embedded tissue, each with known methylation. Using our quantitation we could distinguish between unmethylated, partially methylated and fully methylated oligonucleotides mixed in varying ratios. There was a linear relationship between T50 and the dilution of methylated into unmethylated DNA. We could quantitate the change in methylation over time in cell lines treated with the demethylating drug 5-aza-2'-deoxycytidine, and the differences in methylation associated with complete, clonal or no loss of MGMT expression in formalin fixed paraffin embedded tissues.
Conclusion
We have validated a rapid, simple in-tube method to quantify methylation which is robust and reproducible, utilizes easily designed primers and does not need proprietary algorithms or software. The technique does not depend on any operator manipulation or interpretation of the melt curves, and is suitable for use in any laboratory with a real-time thermocycler. The parameters derived provide an objective description and quantitation of the methylation in a specimen, and can be used for statistical comparisons of methylation between specimens.
doi:10.1186/1471-2407-9-123
PMCID: PMC2679043  PMID: 19393074
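The T50 parameter described in entry 15 can be illustrated with a short Python sketch. The abstract does not spell out the authors' normalisation, so the min-max scaling below is an assumption made for illustration (real melt data usually also need a baseline correction), and the function names are hypothetical.

```python
def normalise(fluorescence):
    """Scale raw fluorescence to [0, 1] (assumed min-max normalisation)."""
    lo, hi = min(fluorescence), max(fluorescence)
    return [(f - lo) / (hi - lo) for f in fluorescence]

def crossing_temperature(temps, signal, level):
    """Temperature at which a falling signal first reaches `level`.

    Uses linear interpolation between the two neighbouring readings.
    """
    for i in range(1, len(temps)):
        s0, s1 = signal[i - 1], signal[i]
        if s0 >= level >= s1 and s0 != s1:
            frac = (s0 - level) / (s0 - s1)
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("signal never crosses the requested level")

def t50(temps, fluorescence):
    """Temperature at which half of the normalised signal has been lost."""
    return crossing_temperature(temps, normalise(fluorescence), 0.5)
```

The same `crossing_temperature` helper could be called with levels close to 1 and 0 to approximate the temperatures at which melting begins and finishes; the thresholds used for that would again be a choice of the analyst, not something stated in the abstract.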
16.  Polycomb group genes are targets of aberrant DNA methylation in renal cell carcinoma 
Epigenetics  2011;6(6):703-709.
The combined effects of genetic and epigenetic aberrations are well recognized as causal in tumorigenesis. Here, we defined profiles of DNA methylation in primary renal cell carcinomas (RCC) and assessed the association of these profiles with the expression of genes required for the establishment and maintenance of epigenetic marks. A bead-based methylation array platform was used to measure methylation of 1,413 CpG loci in ∼800 cancer-associated genes and three methylation classes were derived by unsupervised clustering of tumors using recursively partitioned mixture modeling (RPMM). Quantitative RT-PCR was performed on all tumor samples to determine the expression of DNMT1, DNMT3B, VEZF1 and EZH2. Additionally, methylation at LINE-1 and AluYb8 repetitive elements was measured using bisulfite pyrosequencing. Associations between methylation class and tumor stage (p = 0.05), LINE-1 (p < 0.0001) and AluYb8 (p < 0.0001) methylation, as well as EZH2 expression (p < 0.0001) were noted following univariate analyses. A multinomial logistic regression model controlling for potential confounders revealed that AluYb8 (p < 0.003) methylation and EZH2 expression (p < 0.008) were significantly associated with methylation class membership. Because EZH2 is a member of the Polycomb repressive complex 2 (PRC2), we next analyzed the distribution of Polycomb group (PcG) targets among methylation classes derived by clustering the 1,413 array CpG loci using RPMM. PcG target genes were significantly enriched (p < 0.0001) in methylation classes with greater differential methylation between RCC and non-diseased kidney tissue. This work contributes to our understanding of how repressive marks on DNA and chromatin are dysregulated in carcinogenesis, knowledge that might aid the development of therapies or preventive strategies for human malignancies.
doi:10.4161/epi.6.6.16158
PMCID: PMC3230543  PMID: 21610323
EZH2; DNA methylation; renal cell carcinoma; polycomb; microarray
17.  Genomic Distribution and Inter-Sample Variation of Non-CpG Methylation across Human Cell Types 
PLoS Genetics  2011;7(12):e1002389.
DNA methylation plays an important role in development and disease. The primary sites of DNA methylation in vertebrates are cytosines in the CpG dinucleotide context, which account for roughly three quarters of the total DNA methylation content in human and mouse cells. While the genomic distribution, inter-individual stability, and functional role of CpG methylation are reasonably well understood, little is known about DNA methylation targeting CpA, CpT, and CpC (non-CpG) dinucleotides. Here we report a comprehensive analysis of non-CpG methylation in 76 genome-scale DNA methylation maps across pluripotent and differentiated human cell types. We confirm non-CpG methylation to be predominantly present in pluripotent cell types and observe a decrease upon differentiation and near complete absence in various somatic cell types. Although no function has been assigned to it in pluripotency, our data highlight that non-CpG methylation patterns reappear upon iPS cell reprogramming. Intriguingly, the patterns are highly variable and show little conservation between different pluripotent cell lines. We find a strong correlation of non-CpG methylation and DNMT3 expression levels while showing statistical independence of non-CpG methylation from pluripotency associated gene expression. In line with these findings, we show that knockdown of DNMT3A and DNMT3B in hESCs results in a global reduction of non-CpG methylation. Finally, non-CpG methylation appears to be spatially correlated with CpG methylation. In summary, these results contribute further to our understanding of cytosine methylation patterns in human cells using a large representative sample set.
Author Summary
Epigenetic modifications including DNA methylation at the position 5 of the cytosine base provide regulatory information to the genome sequence. The primary target of cytosine methylation in mammals is the CpG dinucleotide. However, previous studies in the mouse and more recent work in humans have highlighted the presence of non-CpG methylation in pluripotent cells. Currently, little is known about the role of this type of DNA methylation. We sought to further characterize non-CpG methylation by employing a comprehensive data set of genome-scale methylation maps across various human cell types. Our analysis reveals that non-CpG methylation varies dramatically between pluripotent cells and is closely linked to CpG methylation. Moreover, we show that depletion of the de novo DNA methyltransferases results in a global reduction of non-CpG methylation levels. Taken together, these findings further advance our understanding of cytosine methylation and describe its distribution among a large number of human cell types.
doi:10.1371/journal.pgen.1002389
PMCID: PMC3234221  PMID: 22174693
18.  Widespread Hypomethylation Occurs Early and Synergizes with Gene Amplification during Esophageal Carcinogenesis 
PLoS Genetics  2011;7(3):e1001356.
Although a combination of genomic and epigenetic alterations is implicated in the multistep transformation of normal squamous esophageal epithelium to Barrett esophagus, dysplasia, and adenocarcinoma, the combinatorial effect of these changes is unknown. By integrating genome-wide DNA methylation, copy number, and transcriptomic datasets obtained from endoscopic biopsies of neoplastic progression within the same individual, we are uniquely able to define the molecular events associated with progression of Barrett esophagus. We find that the previously reported global hypomethylation phenomenon in cancer has its origins at the earliest stages of epithelial carcinogenesis. Promoter hypomethylation synergizes with gene amplification and leads to significant upregulation of a chr4q21 chemokine cluster and other transcripts during Barrett neoplasia. In contrast, gene-specific hypermethylation is observed at a restricted number of loci and, in combination with hemi-allelic deletions, leads to downregulation of selected transcripts during multistep progression. We also observe that epigenetic regulation during epithelial carcinogenesis is not restricted to traditionally defined “CpG islands,” but may also occur through a mechanism of differential methylation outside of these regions. Finally, validation of novel upregulated targets (CXCL1 and 3, GATA6, and DMBT1) in a larger independent panel of samples confirms the utility of integrative analysis in cancer biomarker discovery.
Author Summary
The incidence of esophageal adenocarcinoma (EA) is increasing at an alarming pace in the United States. Distinct pathological stages of Barrett's metaplasia and low- and high-grade dysplasia can be seen preceding malignant transformation. These precursor lesions provide a unique in vivo model for deepening our understanding of the early steps in human neoplasia. By integrating genome-wide DNA methylation, copy number, and transcriptomic datasets obtained from endoscopic biopsies of neoplastic progression within the same individual, we are uniquely able to define the molecular events associated with progression of Barrett esophagus. We show that the predominant change during this process is loss of DNA methylation. We show that this global hypomethylation occurs very early during the process and is seen even in preinvasive lesions. This loss of DNA methylation drives carcinogenesis by cooperating with gene amplifications in upregulating proteins during this process. Finally, we uncovered proteins that are upregulated by loss of methylation or gene amplification (CXCL1 and 3, GATA6, and DMBT1) and show their relevance by validating their levels in a larger independent panel of samples, thus confirming the utility of integrative analysis in cancer biomarker discovery.
doi:10.1371/journal.pgen.1001356
PMCID: PMC3069107  PMID: 21483804
19.  CMS: A Web-Based System for Visualization and Analysis of Genome-Wide Methylation Data of Human Cancers 
PLoS ONE  2013;8(4):e60980.
Background
DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. With high-throughput sequencing technologies, genome-wide methylation profiles in cancer cells can now be determined efficiently. In addition, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters.
Methodology/Principal Findings
Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. 191 patient samples (169 tumor and 22 normal specimens) and 41 breast cancer cell-lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. This provides comprehensive and genome-wide epigenetic portraits of human breast cancer and endometrial cancer to date. Two views are proposed for users to better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, and interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework.
Conclusions/Significance
CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. CMS is freely accessible at: http://cbbiweb.uthscsa.edu/KMethylomes/.
doi:10.1371/journal.pone.0060980
PMCID: PMC3632540  PMID: 23630576
20.  Breast Cancer DNA Methylation Profiles Are Associated with Tumor Size and Alcohol and Folate Intake 
PLoS Genetics  2010;6(7):e1001043.
Although tumor size and lymph node involvement are the current cornerstones of breast cancer prognosis, they have not been extensively explored in relation to tumor methylation attributes in conjunction with other tumor and patient dietary and hormonal characteristics. Using primary breast tumors from 162 (AJCC stage I–IV) women from the Kaiser Division of Research Pathways Study and the Illumina GoldenGate methylation bead-array platform, we measured 1,413 autosomal CpG loci associated with 773 cancer-related genes and validated select CpG loci with Sequenom EpiTYPER. Tumor grade, size, estrogen and progesterone receptor status, and triple negative status were significantly (Q-values <0.05) associated with altered methylation of 209, 74, 183, 69, and 130 loci, respectively. Unsupervised clustering, using a recursively partitioned mixture model (RPMM), of all autosomal CpG loci revealed eight distinct methylation classes. Methylation class membership was significantly associated with patient race (P<0.02) and tumor size (P<0.001) in univariate tests. Using multinomial logistic regression to adjust for potential confounders, patient age and tumor size, as well as known disease risk factors of alcohol intake and total dietary folate, were all significantly (P<0.0001) associated with methylation class membership. Breast cancer prognostic characteristics and risk-related exposures appear to be associated with gene-specific tumor methylation, as well as overall methylation patterns.
Author Summary
The current standard prognostic indicator for breast cancer is tumor-node-metastasis staging; though, as population-based studies and clinical trials are conducted, molecular characterization of disease is beginning to allow improved markers of prognosis and assist clinicians in choosing the most appropriate therapies. We investigated DNA methylation profiles in over 160 well annotated breast tumor samples and found significant relationships with standard and other known predictors of prognosis, as well as established risk factors for disease: alcohol intake and dietary folate. Recently the United States National Cancer Institute Cancer Biomarkers Research Group articulated a need for a “Strategic Approach to Validating Methylated Genes as Biomarkers for Breast Cancer,” and our work is extremely responsive to this call for a national strategy. Recognizing the increasing use of pre-operative chemotherapy for patients with operable, early-stage disease, there is added complexity in breast cancer staging. Since chemotherapy can considerably decrease tumor size, it is still unclear whether pre-operative or post-operative stage best informs prognosis and treatment decisions for patients electing pre-operative chemotherapy. However, our data clearly illustrate the promise of tumor DNA methylation for augmenting tumor staging and can be attained with minimal tissue in a pre-operative context.
doi:10.1371/journal.pgen.1001043
PMCID: PMC2912395  PMID: 20686660
21.  A joint finite mixture model for clustering genes from independent Gaussian and beta distributed data 
BMC Bioinformatics  2009;10:165.
Background
Cluster analysis has become a standard computational method for gene function discovery as well as for more general explanatory data analysis. A number of different approaches have been proposed for that purpose, out of which different mixture models provide a principled probabilistic framework. Cluster analysis is increasingly often supplemented with multiple data sources nowadays, and these heterogeneous information sources should be made as efficient use of as possible.
Results
This paper presents a novel Beta-Gaussian mixture model (BGMM) for clustering genes based on Gaussian distributed and beta distributed data. The proposed BGMM can be viewed as a natural extension of the beta mixture model (BMM) and the Gaussian mixture model (GMM). The proposed BGMM method differs from other mixture model based methods in its integration of two different data types into a single and unified probabilistic modeling framework, which provides a more efficient use of multiple data sources than methods that analyze different data sources separately. Moreover, BGMM provides an exceedingly flexible modeling framework since many data sources can be modeled as Gaussian or beta distributed random variables, and it can also be extended to integrate data that have other parametric distributions as well, which adds even more flexibility to this model-based clustering framework. We developed three types of estimation algorithms for BGMM, the standard expectation maximization (EM) algorithm, an approximated EM and a hybrid EM, and propose to tackle the model selection problem by well-known model selection criteria, for which we test the Akaike information criterion (AIC), a modified AIC (AIC3), the Bayesian information criterion (BIC), and the integrated classification likelihood-BIC (ICL-BIC).
Conclusion
Performance tests with simulated data show that combining two different data sources into a single mixture joint model greatly improves the clustering accuracy compared with either of its two extreme cases, GMM or BMM. Applications with real mouse gene expression data (modeled as Gaussian distribution) and protein-DNA binding probabilities (modeled as beta distribution) also demonstrate that BGMM can yield more biologically reasonable results compared with either of its two extreme cases. One of our applications has found three groups of genes that are likely to be involved in Myd88-dependent Toll-like receptor 3/4 (TLR-3/4) signaling cascades, which might be useful to better understand the TLR-3/4 signal transduction.
doi:10.1186/1471-2105-10-165
PMCID: PMC2717092  PMID: 19480678
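The model selection criteria named in entry 21 can be written out in a few lines of Python. The AIC, AIC3 and BIC formulas are the standard ones; the ICL-BIC function follows the commonly used form (BIC plus twice the entropy of the posterior cluster memberships), which is assumed here rather than taken from the paper. The parameter names are illustrative. In all four cases a lower value indicates a preferred model.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_lik + 2.0 * n_params

def aic3(log_lik, n_params):
    """Modified AIC with a penalty of 3 per parameter: -2 log L + 3k."""
    return -2.0 * log_lik + 3.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def icl_bic(log_lik, n_params, n_obs, responsibilities):
    """BIC plus twice the entropy of the soft cluster assignments.

    responsibilities is a list of rows, one per observation, holding the
    posterior probability of each cluster (each row sums to 1).
    """
    entropy = -sum(t * math.log(t)
                   for row in responsibilities
                   for t in row if t > 0.0)
    return bic(log_lik, n_params, n_obs) + 2.0 * entropy
```

When every observation is assigned to a single cluster with probability 1, the entropy term vanishes and ICL-BIC coincides with BIC; the more the clusters overlap, the larger the extra penalty.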
22.  A novel k-mer mixture logistic regression for methylation susceptibility modeling of CpG dinucleotides in human gene promoters 
BMC Bioinformatics  2012;13(Suppl 3):S15.
Background
DNA methylation is essential for normal development and differentiation and plays a crucial role in the development of nearly all types of cancer. Aberrant DNA methylation patterns, including genome-wide hypomethylation and region-specific hypermethylation, are frequently observed and contribute to the malignant phenotype. A number of studies have recently identified distinct features of genomic sequences that can be used for modeling specific DNA sequences that may be susceptible to aberrant CpG methylation in both cancer and normal cells. Although it is now possible, using next generation sequencing technologies, to assess human methylomes at base resolution, no reports currently exist on modeling cell type-specific DNA methylation susceptibility. Thus, we conducted a comprehensive modeling study of cell type-specific DNA methylation susceptibility at three different resolutions: CpG dinucleotides, CpG segments, and individual gene promoter regions.
Results
Using a k-mer mixture logistic regression model, we effectively modeled DNA methylation susceptibility across five different cell types. Further, at the segment level, we achieved up to 0.75 in AUC prediction accuracy in a 10-fold cross validation study using a mixture of k-mers.
Conclusions
The significance of these results is threefold: 1) this is the first report to indicate that CpG methylation susceptible "segments" exist; 2) our model demonstrates the significance of certain k-mers for the mixture model, potentially highlighting DNA sequence features (k-mers) of differentially methylated, promoter CpG island sequences across different tissue types; 3) as only 3 or 4 bp patterns had previously been used for modeling DNA methylation susceptibility, ours is the first demonstration that 6-mer modeling can be performed without loss of accuracy.
doi:10.1186/1471-2105-13-S3-S15
PMCID: PMC3311103  PMID: 22536899
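One plausible reading of the k-mer features in entry 22 is sketched below in Python: count overlapping k-mers of several lengths in a sequence and pass a weighted sum through a logistic function. The abstract does not define how the "mixture" of k-mers is constructed or how the model is trained, so the choice of lengths in `ks`, the weight dictionary, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers of length k in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_features(seq, ks=(3, 4, 6)):
    """Pool k-mer counts over several k-mer lengths (assumed 3, 4 and 6)."""
    features = Counter()
    for k in ks:
        features.update(kmer_counts(seq, k))
    return features

def predict_probability(seq, weights, bias=0.0, ks=(3, 4, 6)):
    """Logistic regression score from k-mer counts.

    weights maps a k-mer to its (hypothetical) fitted coefficient;
    k-mers without a weight contribute nothing.
    """
    feats = kmer_features(seq, ks)
    z = bias + sum(w * feats.get(kmer, 0) for kmer, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))
```

With an empty weight dictionary and zero bias the predicted probability is 0.5 for any sequence, which is a quick sanity check before fitting coefficients on labelled segments.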
23.  Comprehensive Biostatistical Analysis of CpG Island Methylator Phenotype in Colorectal Cancer Using a Large Population-Based Sample 
PLoS ONE  2008;3(11):e3698.
Background
The CpG island methylator phenotype (CIMP) is a distinct phenotype associated with microsatellite instability (MSI) and BRAF mutation in colon cancer. Recent investigations have selected 5 promoters (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1) as surrogate markers for CIMP-high. However, no study has comprehensively evaluated an expanded set of methylation markers (including these 5 markers) using a large number of tumors, or deciphered the complex clinical and molecular associations with CIMP-high determined by the validated marker panel.
Methodology/Principal Findings
DNA methylation at 16 CpG islands [the above 5 plus CDKN2A (p16), CHFR, CRABP1, HIC1, IGFBP3, MGMT, MINT1, MINT31, MLH1, p14 (CDKN2A/ARF) and WRN] was quantified in 904 colorectal cancers by real-time PCR (MethyLight). In unsupervised hierarchical clustering analysis, the 5 markers (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1), CDKN2A, CRABP1, MINT31, MLH1, p14 and WRN were generally clustered with each other and with MSI and BRAF mutation. KRAS mutation was not clustered with any methylation marker, suggesting its association with a random methylation pattern in CIMP-low tumors. Utilizing the validated CIMP marker panel (including the 5 markers), multivariate logistic regression demonstrated that CIMP-high was independently associated with older age, proximal location, poor differentiation, MSI-high, BRAF mutation, and inversely with LINE-1 hypomethylation and β-catenin (CTNNB1) activation. Mucinous feature, signet ring cells, and p53-negativity were associated with CIMP-high in only univariate analysis. In stratified analyses, the relations of CIMP-high with poor differentiation, KRAS mutation and LINE-1 hypomethylation significantly differed according to MSI status.
Conclusions
Our study provides valuable data for standardization of the use of CIMP-high-specific methylation markers. CIMP-high is independently associated with clinical and key molecular features in colorectal cancer. Our data also suggest that KRAS mutation is related with a random CpG island methylation pattern which may lead to CIMP-low tumors.
doi:10.1371/journal.pone.0003698
PMCID: PMC2579485  PMID: 19002263
24.  Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates 
eLife  2013;2:e00348.
Two-thirds of gene promoters in mammals are associated with regions of non-methylated DNA, called CpG islands (CGIs), which counteract the repressive effects of DNA methylation on chromatin. In cold-blooded vertebrates, computational CGI predictions often reside away from gene promoters, suggesting a major divergence in gene promoter architecture across vertebrates. By experimentally identifying non-methylated DNA in the genomes of seven diverse vertebrates, we instead reveal that non-methylated islands (NMIs) of DNA are a central feature of vertebrate gene promoters. Furthermore, NMIs are present at orthologous genes across vast evolutionary distances, revealing a surprising level of conservation in this epigenetic feature. By profiling NMIs in different tissues and developmental stages we uncover a unifying set of features that are central to the function of NMIs in vertebrates. Together these findings demonstrate an ancient logic for NMI usage at gene promoters and reveal an unprecedented level of epigenetic conservation across vertebrate evolution.
DOI: http://dx.doi.org/10.7554/eLife.00348.001
eLife digest
DNA methylation—the addition of a methyl group to cytosine, one of the four bases found in DNA—is a central process in genetics. By preventing genes from being expressed as proteins, DNA methylation is one of a number of epigenetic mechanisms that can determine which proteins are made in different cell types without changing the underlying DNA sequence.
In warm-blooded vertebrates such as mammals, most of the genome is methylated; however, short regions of non-methylated DNA are known to be associated with gene promoters (regions of DNA that act as binding sites for the enzymes and transcription factors that transcribe the DNA in the gene into RNA). Much of our current understanding of the role of these islands of non-methylated DNA is based on computational predictions rather than experimental data. In cold-blooded vertebrates, for example, computer models often predict that non-methylated islands are not associated with gene promoters, which potentially suggests an evolutionary divergence in the role of methylation amongst vertebrates. However, this idea has not been confirmed by experimental data.
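As a rough illustration of what a sequence-based prediction of this kind can look like, the sketch below scores a DNA string by GC fraction and observed/expected CpG ratio. The thresholds (length of at least 200 bp, GC fraction of at least 0.5, observed/expected ratio of at least 0.6) follow the commonly cited Gardiner-Garden and Frommer criteria; they are an assumption made here for illustration and are not necessarily the models referred to in the digest. The function names `cpg_stats` and `looks_like_cgi` are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n
    # Expected CpG count under base independence is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(seq: str, min_len: int = 200,
                   min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed length, GC and observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    print(cpg_stats("CGCG"))
    print(looks_like_cgi("CG" * 100))
    print(looks_like_cgi("AT" * 100))
```

A score like this depends only on sequence composition, which is one reason such predictions can disagree with experimentally measured non-methylated regions.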
Long et al. have performed experiments to compare the location of non-methylated islands in seven different vertebrate species. In general, they find that computational models are not a reliable method for identifying non-methylated islands. Moreover, they find that non-methylated islands are a central epigenetic feature of gene promoters in all vertebrates analysed (three mammals, a bird, a lizard, a frog and a fish), and not just in warm-blooded vertebrates as suggested by computational models. This shows that the epigenetic function of these non-methylated islands has been conserved over more than 450 million years of evolution.
In addition to the non-methylated islands associated with gene promoters, Long et al. identify two other types: intergenic non-methylated islands that are found away from gene promoters and are said to be ‘plastic’ because the DNA in these islands can acquire methyl groups, and ‘broad’ non-methylated islands that span many of the genes that are involved in embryonic development.
By showing that the epigenetic role of non-methylated islands has been conserved over time, and identifying three specific types of island, the work of Long et al. marks an important change in our understanding of epigenetics in vertebrates.
DOI: http://dx.doi.org/10.7554/eLife.00348.002
doi:10.7554/eLife.00348
PMCID: PMC3583005  PMID: 23467541
CpG islands; DNA methylation; Epigenetics; Chromatin; Evolutionary conservation; Chicken; Human; Mouse; Xenopus; Zebrafish
25.  Profile analysis and prediction of tissue-specific CpG island methylation classes 
BMC Bioinformatics  2009;10:116.
Background
The computational prediction of DNA methylation has become an important topic in recent years due to its role in the epigenetic control of normal and cancer-related processes. While previous prediction approaches focused merely on differences between methylated and unmethylated DNA sequences, recent experimental results have shown the presence of much more complex patterns of methylation across tissues and time in the human genome. These patterns are only partially described by a binary model of DNA methylation. In this work we propose a novel approach, based on profile analysis of tissue-specific methylation, that uncovers significant differences in the sequences of CpG islands (CGIs) that predispose them to a tissue-specific methylation pattern.
Results
We defined CGI methylation profiles that not only separate constitutively methylated from unmethylated CGIs, but also identify CGIs showing a differential degree of methylation across tissues and cell types or a lack of methylation exclusively in sperm. These profiles are clearly distinguished by a number of CGI attributes including their evolutionary conservation, their significance, as well as the evolutionary evidence of prior methylation. Additionally, we assess profile functionality with respect to the different compartments of protein-coding genes and their possible use in the prediction of DNA methylation.
Conclusion
Our approach provides new insights into the biological features that determine whether a CGI has a functional role in the epigenetic control of gene expression and the features associated with CGI methylation susceptibility. Moreover, we show that the ability to predict CGI methylation is based primarily on the quality of the biological information used and the relationships uncovered between different sources of knowledge. The strategy presented here is able to predict, besides the constitutively methylated and unmethylated classes, two additional tissue-specific methylation classes while preserving the accuracy provided by leading binary methylation classification methods.
doi:10.1186/1471-2105-10-116
PMCID: PMC2683815  PMID: 19383127
