Results 1-8 (8)

author:("Taslim, kenny")
1.  A Mixture Modeling Framework for Differential Analysis of High-Throughput Data 
The inventions of microarray and next generation sequencing technologies have revolutionized research in genomics; these platforms have led to massive amounts of data on gene expression, methylation, and protein-DNA interactions. A common theme among a number of biological problems using high-throughput technologies is differential analysis. Despite the common theme, different data types have their own unique features, creating a “moving target” scenario. As such, methods specifically designed for one data type may not lead to satisfactory results when applied to another data type. To meet this challenge so that not only currently existing data types but also data from future problems, platforms, or experiments can be analyzed, we propose a mixture modeling framework that is flexible enough to automatically adapt to any moving target. More specifically, the approach considers several classes of mixture models and essentially provides a model-based procedure whose model is adaptive to the particular data being analyzed. We demonstrate the utility of the methodology by applying it to three types of real data: gene expression, methylation, and ChIP-seq. We also carried out simulations to gauge the performance and showed that the approach can be more efficient than any individual model without inflating type I error.
PMCID: PMC4095709  PMID: 25057284
2.  A Quantitative Proteomic Workflow for Characterization of Frozen Clinical Biopsies: Laser Capture Microdissection Coupled with Label-Free Mass Spectrometry 
Journal of proteomics  2012;77:10.1016/j.jprot.2012.09.019.
This paper describes a simple, highly efficient and robust proteomic workflow for routine liquid-chromatography tandem mass spectrometry analysis of Laser Microdissection Pressure Catapulting (LMPC) isolates. Highly efficient protein recovery was achieved by optimization of a “one-pot” protein extraction and digestion allowing for reproducible proteomic analysis on as few as 500 LMPC isolated cells. The method was combined with label-free spectral count quantitation to characterize proteomic differences from 3,000–10,000 LMPC isolated cells. Significance analysis of spectral count data was accomplished using the edgeR tag-count R package combined with hierarchical cluster analysis. To illustrate the capability of this robust workflow, two examples are presented: 1) analysis of keratinocytes from human punch biopsies of normal skin and a chronic diabetic wound and 2) comparison of glomeruli from needle biopsies of patients with kidney disease. Differentially expressed proteins were validated by use of immunohistochemistry. These examples illustrate that tissue proteomics carried out on limited clinical material can obtain informative proteomic signatures for disease pathogenesis and demonstrate the suitability of this approach for biomarker discovery.
PMCID: PMC3835202  PMID: 23022584
Laser Capture Microdissection; Proteomics; Label-free; Biopsy; Mass Spectrometry
3.  Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data 
PLoS Computational Biology  2013;9(11):e1003326.
Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
PMCID: PMC3828144  PMID: 24244136
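
The "controlling the false discovery rate" step listed in the abstract above can be illustrated with a short Python sketch. It uses the Benjamini-Hochberg adjustment, which is one common procedure; the abstract does not say which procedure the guidelines recommend, so the choice and the function name `benjamini_hochberg` are assumptions made for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative sketch only: the choice of procedure is an assumption,
    not something stated in the abstract.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over equal or higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Peaks (or differentially bound regions) whose adjusted p-value falls below a chosen level, for example 0.05, would then be reported; that cutoff is likewise an illustrative choice.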
4.  Integrative genome-wide chromatin signature analysis using finite mixture models 
BMC Genomics  2012;13(Suppl 6):S3.
Regulation of gene expression has been shown to involve not only the binding of transcription factors at target gene promoters but also the characterization of the histones around which DNA is wrapped. Some histone modifications, for example di-methylated histone H3 at lysine 4 (H3K4me2), have been shown to bind to promoters and activate target genes. However, no clear pattern has been shown to predict human promoters. This paper proposed a novel quantitative approach to characterize patterns of promoter regions and predict novel and alternative promoters. We utilized high-throughput data generated using chromatin immunoprecipitation methods followed by massively parallel sequencing (ChIP-seq) technology on RNA Polymerase II (Pol-II) and H3K4me2. Common patterns of promoter regions are modeled using a mixture model involving double-exponential and uniform distributions. The fitted model was then used to search for regions displaying similar patterns over the entire genome to find novel and alternative promoters. Regions with high correlations with the common patterns are identified as putative novel promoters. We used this proposed algorithm, RNA-seq data and several transcript databases to find alternative promoters in the MCF7 (normal breast cancer) cell line. We found 7,235 high-confidence regions that display the identified promoter patterns. Of these, 4,167 regions (58%) can be mapped to RefSeq regions. 2,444 regions are in a gene body or overlap with transcripts (non-coding RNAs, ESTs, and transcripts that are predicted by RNA-seq data). Some of these may be potential alternative promoters. We also found 193 regions that map to enhancer regions (represented by androgen and estrogen receptor binding sites) and other regulatory regions such as CTCF (CCCTC binding factor) sites and CpG islands.
Around 5% (431 regions) of these correlated regions do not overlap with any transcripts or regulatory regions, suggesting that these might be potential new promoters or markers for other annotations that are currently undiscovered.
PMCID: PMC3481451  PMID: 23134707
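
The pattern search described in the abstract above (a template built from a double-exponential plus uniform mixture, then a genome-wide scan for highly correlated regions) can be sketched in Python as follows. This is a toy illustration under stated assumptions: the mixture weight, scale, window width, and correlation threshold are hypothetical values, the function names are invented for the example, and the signal is a plain list of numbers rather than real ChIP-seq coverage.

```python
import math


def mixture_density(x, weight, center, scale, lo, hi):
    """Density of a double-exponential (Laplace) plus uniform mixture.

    All parameters are hypothetical; the abstract does not give fitted values.
    """
    laplace = math.exp(-abs(x - center) / scale) / (2.0 * scale)
    uniform = 1.0 / (hi - lo) if lo <= x <= hi else 0.0
    return weight * laplace + (1.0 - weight) * uniform


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def scan_for_pattern(signal, template, threshold):
    """Slide the template along the signal; keep windows with r >= threshold."""
    width = len(template)
    hits = []
    for start in range(len(signal) - width + 1):
        r = pearson(signal[start:start + width], template)
        if r >= threshold:
            hits.append((start, r))
    return hits


# Template: the mixture density evaluated at offsets -2..2 around a center.
TEMPLATE = [mixture_density(x, 0.8, 0.0, 1.0, -2.0, 2.0) for x in range(-2, 3)]
```

For example, embedding a scaled copy of `TEMPLATE` in a flat background and calling `scan_for_pattern(signal, TEMPLATE, 0.99)` should return only the window where the copy was placed, because Pearson correlation is unaffected by scaling and shifting of the signal.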
5.  DIME: R-package for identifying differential ChIP-seq based on an ensemble of mixture models 
Bioinformatics  2011;27(11):1569-1570.
Summary: Differential Identification using Mixtures Ensemble (DIME) is a package for identification of biologically significant differential binding sites between two conditions using ChIP-seq data. It considers a collection of finite mixture models combined with a false discovery rate (FDR) criterion to find statistically significant regions. This leads to a more reliable assessment of differential binding sites based on a statistical approach. In addition to ChIP-seq, DIME is also applicable to data from other high-throughput platforms.
Availability and implementation: DIME is implemented as an R-package, which is available at It may also be downloaded from
PMCID: PMC3102220  PMID: 21471015
6.  Epigenetic Silencing Mediated Through Activated PI3K/AKT Signaling in Breast Cancer 
Cancer research  2011;71(5):1752-1762.
Trimethylation of histone 3 lysine 27 (H3K27me3) is a critical epigenetic mark for the maintenance of gene silencing. Additional accumulation of DNA methylation in target loci is thought to cooperatively support this epigenetic silencing during tumorigenesis. However, molecular mechanisms underlying the complex interplay between the two marks remain to be explored. Here we demonstrate that activation of PI3K/AKT signaling can be a trigger of this epigenetic processing at many downstream target genes. We also find that DNA methylation can be acquired at the same loci in cancer cells, thereby reinforcing permanent repression in those losing the H3K27me3 mark. Because of a link between PI3K/AKT signaling and epigenetic alterations, we conducted epigenetic therapies in conjunction with the signaling-targeted treatment. These combined treatments synergistically relieve gene silencing and suppress cancer cell growth in vitro and in xenografts. The new finding has important implications for improving targeted cancer therapies in the future.
PMCID: PMC3048165  PMID: 21216892
Epigenetic silencing; H3K27me3; DNA methylation; PI3K/AKT signaling; breast cancer
7.  Integrated analysis identifies a class of androgen-responsive genes regulated by short combinatorial long-range mechanism facilitated by CTCF 
Nucleic Acids Research  2012;40(11):4754-4764.
Recently, much attention has been given to elucidating how long-range gene regulation comes into play and how histone modifications and distal transcription factor binding contribute toward this mechanism. Androgen receptor (AR), a key regulator of prostate cancer, has been shown to regulate its target genes via distal enhancers, leading to the hypothesis of global long-range gene regulation. However, despite the large volume of newly generated data, the precise mechanism with respect to AR-mediated long-range gene regulation is still largely unknown. In this study, we carried out an integrated analysis combining several types of high-throughput data, including genome-wide distribution data of H3K4 di-methylation (H3K4me2), CCCTC binding factor (CTCF), AR and FoxA1 cistrome data as well as androgen-regulated gene expression data. We found that a subset of androgen-responsive genes was significantly enriched near AR/H3K4me2 overlapping regions and FoxA1 binding sites within the same CTCF block. Importantly, genes in this class were enriched in cancer-related pathways and were downregulated in clinical metastatic versus localized prostate cancer. Our results suggest a relatively short combinatorial long-range regulation mechanism facilitated by CTCF blocking. Under such a mechanism, H3K4me2, AR and FoxA1 within the same CTCF block combinatorially regulate a subset of distally located androgen-responsive genes involved in prostate carcinogenesis.
PMCID: PMC3367180  PMID: 22344698
8.  Comparative study on ChIP-seq data: normalization and binding pattern characterization 
Bioinformatics  2009;25(18):2334-2340.
Motivation: Antibody-based Chromatin Immunoprecipitation assay followed by high-throughput sequencing technology (ChIP-seq) is a relatively new method to study the binding patterns of specific protein molecules over the entire genome. ChIP-seq technology allows scientists to obtain more comprehensive results in a shorter time. Here, we present a non-linear normalization algorithm and a mixture modeling method for comparing ChIP-seq data from multiple samples and characterizing genes based on their RNA polymerase II (Pol II) binding patterns.
Results: We apply a two-step non-linear normalization method based on the locally weighted regression (LOESS) approach to compare ChIP-seq data across multiple samples and model the difference using an Exponential-NormalK mixture model. The fitted model is used to identify genes associated with differential binding sites based on the local false discovery rate (fdr). These genes are then standardized and hierarchically clustered to characterize their Pol II binding patterns. As a case study, we apply the analysis procedure to compare a normal breast cancer (MCF7) cell line with a tamoxifen-resistant (OHT) cell line. We find enriched regions that are associated with cancer (P < 0.0001). Our findings also imply that there may be a dysregulation of cell cycle and gene expression control pathways in the tamoxifen-resistant cells. These results show that the non-linear normalization method can be used to analyze ChIP-seq data across multiple samples.
Availability: Data are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2800347  PMID: 19561022
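
The local false discovery rate (fdr) step in the abstract above can be illustrated with a minimal Python sketch. It assumes a simplified mixture with one normal component for non-differential genes and one exponential tail on each side for differential genes; this is a stand-in for, not a reproduction of, the Exponential-NormalK model named in the abstract. The parameter values and function names are hypothetical, and in practice the parameters would come from a fitted model.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exp_pdf(x, rate):
    """Density of an exponential distribution on x >= 0 (zero elsewhere)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def local_fdr(x, pi0, mu, sigma, rate_pos, rate_neg):
    """Posterior probability that a normalized difference x is non-differential.

    Assumed model (hypothetical): proportion pi0 of genes follow the normal
    null component; the remainder is split evenly between a positive and a
    negative exponential tail.
    """
    pi_tail = (1.0 - pi0) / 2.0
    null_part = pi0 * normal_pdf(x, mu, sigma)
    alt_part = pi_tail * exp_pdf(x, rate_pos) + pi_tail * exp_pdf(-x, rate_neg)
    return null_part / (null_part + alt_part)
```

A gene would be called differentially bound when `local_fdr` for its normalized difference falls below a chosen cutoff; the cutoff is an analysis decision that the abstract does not specify.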
