
Results 1-2 (2)

1.  Redefining CpG islands using hidden Markov models 
Biostatistics (Oxford, England)  2010;11(3):499-514.
The DNA of most vertebrates is depleted in CpG dinucleotides: a C followed by a G in the 5′ to 3′ direction. CpGs are the target for DNA methylation, a chemical modification of cytosine (C) that is heritable during cell division and is the best-characterized epigenetic mechanism. The remaining CpGs tend to cluster in regions referred to as CpG islands (CGI). Knowing CGI locations is important because they mark functionally relevant epigenetic loci in development and disease. For various mammals, including human, a widely used list of CGI is readily available from the UCSC Genome Browser. This list was derived using algorithms that search for regions satisfying a definition of CGI proposed by Gardiner-Garden and Frommer more than 20 years ago. Recent findings, enabled by advances in technology that permit direct measurement of epigenetic endpoints at a whole-genome scale, motivate the need to adapt the current CGI definition. In this paper, we propose a procedure, guided by hidden Markov models, that permits an extensible approach to detecting CGI. The main advantage of our approach over others is that it summarizes the evidence for CGI status as probability scores. This provides flexibility in the definition of a CGI and facilitates the creation of CGI lists for other species. The utility of this approach is demonstrated by generating the first CGI lists for invertebrates, and by the fact that we can create CGI lists that substantially increase overlap with recently discovered epigenetic marks. A CGI list and the probability scores, as a function of genome location, for each species are available at
PMCID: PMC2883304  PMID: 20212320
CpG island; Epigenetics; Hidden Markov model; Sequence analysis
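For orientation, the threshold-style definition that the abstract contrasts with its probability scores can be sketched as follows. This is not the paper's hidden Markov model. It is a minimal illustration of the commonly cited Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The function names and default thresholds are assumptions chosen for this example; the abstract itself does not state them.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in the window that are C or G."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cgi(seq: str, min_len: int = 200,
                   min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Hard yes/no call for one window (thresholds are assumed defaults)."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)
```

A rule like this returns only a yes/no answer per window, which is the rigidity the abstract argues against: a probability score per genome location lets the cutoff be tuned per species or application.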
2.  Modified test statistics by inter-voxel variance shrinkage with an application to fMRI 
Biostatistics (Oxford, England)  2008;10(2):219-227.
Functional magnetic resonance imaging (fMRI) is a noninvasive technique which is commonly used to quantify changes in blood oxygenation and flow coupled to neuronal activation. One of the primary goals of fMRI studies is to identify localized brain regions where neuronal activation levels vary between groups. Single voxel t-tests have been commonly used to determine whether activation related to the protocol differs across groups. Due to the generally limited number of subjects within each study, accurate estimation of variance at each voxel is difficult. Thus, combining information across voxels is desirable in order to improve efficiency. Here, we construct a hierarchical model and apply an empirical Bayesian framework for the analysis of group fMRI data, employing techniques used in high-throughput genomic studies. The key idea is to shrink residual variances by combining information across voxels and subsequently to construct an improved test statistic. This hierarchical model results in a shrinkage of voxel-wise residual sample variances toward a common value. The shrunken estimator for voxel-specific variance components on the group analyses outperforms the classical residual error estimator in terms of mean-squared error. Moreover, the shrunken test statistic decreases false-positive rates when testing differences in brain contrast maps across a wide range of simulation studies. This methodology was also applied to experimental data regarding a cognitive activation task.
PMCID: PMC3159431  PMID: 18723853
General linear model; Group analysis; Hierarchical models; Image analysis; Shrinkage estimation
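The shrinkage idea in the abstract can be illustrated with a small sketch. The abstract does not give the estimator, so the weighted-average form below is an assumption: it is the form commonly used for moderated t-statistics in genomic studies, where each sample variance is pulled toward a common prior value in proportion to the prior degrees of freedom. The names `shrink_variances` and `moderated_t` are hypothetical, and `prior_var` and `prior_df` are passed in directly here, whereas an empirical Bayes analysis would estimate them from all voxels.

```python
import math


def shrink_variances(sample_vars, df, prior_var, prior_df):
    """Pull each voxel's sample variance toward a common prior value.

    Each result is a weighted average of prior_var and the voxel's own
    variance, weighted by prior_df and the residual df respectively.
    """
    return [(prior_df * prior_var + df * s2) / (prior_df + df)
            for s2 in sample_vars]


def moderated_t(mean_diff, shrunk_var, n1, n2):
    """Two-group t-type statistic built on the shrunken variance."""
    return mean_diff / math.sqrt(shrunk_var * (1.0 / n1 + 1.0 / n2))


# Two voxels with 4 residual df each; prior values are illustrative only.
shrunk = shrink_variances([1.0, 4.0], df=4, prior_var=2.0, prior_df=4)
```

With few subjects the residual degrees of freedom are small, so the prior term dominates and extreme voxel variances are stabilized, which is the mechanism the abstract credits for the lower false-positive rate.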
