Results 1-3 (3)
 

1.  Significance analysis and statistical dissection of variably methylated regions 
Biostatistics (Oxford, England)  2011;13(1):166-178.
It has recently been proposed that variation in DNA methylation at specific genomic locations may play an important role in the development of complex diseases such as cancer. Here, we develop 1- and 2-group multiple testing procedures for identifying and quantifying regions of DNA methylation variability. Our method is the first genome-wide statistical significance calculation for increased or differential variability, as opposed to the traditional approach of testing for mean changes. We apply these procedures to genome-wide methylation data obtained from biological and technical replicates and provide the first statistical proof that variably methylated regions exist and are due to interindividual variation. We also show that differentially variable regions in colon tumor and normal tissue show enrichment of genes regulating gene expression, cell morphogenesis, and development, supporting a biological role for DNA methylation variability in cancer.
doi:10.1093/biostatistics/kxr013
PMCID: PMC3276267  PMID: 21685414
Bump finding; Functional data analysis; Multiple testing; Preprocessing; Variably methylated regions (VMRs)
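To make the contrast between testing for variability and testing for mean changes concrete, here is a minimal sketch of a generic permutation test for a difference in variance at a single site. It is not the multiple testing procedure of the paper above; the function names, the median centering step, and the example values are assumptions chosen for illustration only.

```python
import math
import random
from statistics import median, variance


def log_variance_ratio(group_a, group_b, eps=1e-12):
    """Log ratio of sample variances; eps (an assumed guard) avoids division by zero."""
    return math.log((variance(group_a) + eps) / (variance(group_b) + eps))


def variance_permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in variance.

    Each group is centered on its own median first, so that a shift in the
    mean alone does not register as a difference in variability.
    """
    med_a, med_b = median(group_a), median(group_b)
    centered_a = [x - med_a for x in group_a]
    centered_b = [x - med_b for x in group_b]
    observed = abs(log_variance_ratio(centered_a, centered_b))

    rng = random.Random(seed)
    pooled = centered_a + centered_b
    n_a = len(centered_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(log_variance_ratio(pooled[:n_a], pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical methylation fractions at one site (made-up numbers).
variable_group = [0.10, 0.90, 0.20, 0.80, 0.15, 0.85]
stable_group = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50]
p_value = variance_permutation_pvalue(variable_group, stable_group)
```

Both hypothetical groups have nearly the same mean, so a test for mean changes would see little, while the variance comparison separates them. A genome-wide analysis would additionally need region finding and multiple testing control, which this sketch does not attempt.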
2.  Accurate genome-scale percentage DNA methylation estimates from microarray data 
Biostatistics (Oxford, England)  2010;12(2):197-210.
DNA methylation is a key regulator of gene function in a multitude of both normal and abnormal biological processes, but tools to elucidate its roles on a genome-wide scale are still in their infancy. Methylation sensitive restriction enzymes and microarrays provide a potential high-throughput, low-cost platform to allow methylation profiling. However, accurate absolute methylation estimates have been elusive due to systematic errors and unwanted variability. Previous microarray preprocessing procedures, mostly developed for expression arrays, fail to adequately normalize methylation-related data since they rely on key assumptions that are violated in the case of DNA methylation. We develop a normalization strategy tailored to DNA methylation data and an empirical Bayes percentage methylation estimator that together yield accurate absolute methylation estimates that can be compared across samples. We illustrate the method on data generated to detect methylation differences between tissues and between normal and tumor colon samples.
doi:10.1093/biostatistics/kxq055
PMCID: PMC3062148  PMID: 20858772
DNA methylation; Epigenetics; Microarray
3.  Redefining CpG islands using hidden Markov models 
Biostatistics (Oxford, England)  2010;11(3):499-514.
The DNA of most vertebrates is depleted in CpG dinucleotides: a C followed by a G in the 5′ to 3′ direction. CpGs are the target for DNA methylation, a chemical modification of cytosine (C) heritable during cell division and the most well-characterized epigenetic mechanism. The remaining CpGs tend to cluster in regions referred to as CpG islands (CGI). Knowing CGI locations is important because they mark functionally relevant epigenetic loci in development and disease. For various mammals, including human, a widely used list of CGI is readily available from the UCSC Genome Browser. This list was derived using algorithms that search for regions satisfying a definition of CGI proposed by Gardiner-Garden and Frommer more than 20 years ago. Recent findings, enabled by advances in technology that permit direct measurement of epigenetic endpoints at a whole-genome scale, motivate the need to adapt the current CGI definition. In this paper, we propose a procedure, guided by hidden Markov models, that permits an extensible approach to detecting CGI. The main advantage of our approach over others is that it summarizes the evidence for CGI status as probability scores. This provides flexibility in the definition of a CGI and facilitates the creation of CGI lists for other species. The utility of this approach is demonstrated by generating the first CGI lists for invertebrates, and by the fact that we can create CGI lists that substantially increase overlap with recently discovered epigenetic marks. A CGI list and the probability scores, as a function of genome location, for each species are available at http://www.rafalab.org.
doi:10.1093/biostatistics/kxq005
PMCID: PMC2883304  PMID: 20212320
CpG island; Epigenetics; Hidden Markov model; Sequence analysis
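For readers unfamiliar with the Gardiner-Garden and Frommer definition mentioned in the abstract above, the sketch below computes the quantities it is based on for a single sequence: length, GC content, and the observed-to-expected CpG ratio. The thresholds used as defaults (at least 200 bp, GC content above 50%, ratio above 0.6) are the commonly cited values for that definition; the function names are hypothetical, and this is a hard-threshold check, not the hidden Markov model procedure the paper proposes.

```python
def cpg_stats(seq: str) -> dict:
    """Length, GC content, and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (n_c + n_g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently along the sequence.
    expected = n_c * n_g / n if n else 0.0
    obs_exp = n_cpg / expected if expected > 0 else 0.0
    return {"length": n, "gc_content": gc_content, "obs_exp": obs_exp}


def meets_threshold_definition(seq: str, min_len: int = 200,
                               min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Hard-threshold CGI check; defaults are the commonly cited values."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_len
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] > min_oe)


# Toy sequences (not real genomic data).
cpg_rich = "CG" * 100   # 200 bp, all CpG
at_rich = "AT" * 100    # 200 bp, no C or G
too_short = "CG" * 50   # 100 bp, below the length threshold
```

A yes/no rule of this kind is what the abstract contrasts with probability scores: a region falling just short of any one threshold is discarded outright, whereas a score leaves the cutoff to the user.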
