Results 1-6 (6)

1.  Quantifying uniformity of mapped reads 
Bioinformatics  2012;28(20):2680-2682.
Summary: We describe a tool for quantifying the uniformity of mapped reads in high-throughput sequencing experiments. Our statistic directly measures the uniformity of both read position and fragment length, and we explain how to compute a P-value that can be used to quantify biases arising from experimental protocols and mapping procedures. Our method is useful for comparing different protocols in experiments such as RNA-Seq.
Availability and implementation: We provide a freely available, open-source Python script that can be used to analyze raw read data or reads mapped to transcripts in BAM format at
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3467739  PMID: 22815359
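The abstract above does not define its uniformity statistic, so the following is only a stand-in sketch of the general idea: it tests whether read start positions, rescaled to the interval [0, 1] along a transcript, look uniform, using a one-sample Kolmogorov-Smirnov distance and the standard asymptotic approximation for its P-value. The function name `ks_uniform` and the rescaling convention are assumptions made for illustration, not part of the published tool, and fragment length is not considered here.

```python
import math


def ks_uniform(xs):
    """Kolmogorov-Smirnov test of xs (values in [0, 1]) against Uniform(0, 1).

    Returns (d, p): the KS distance and an asymptotic P-value.
    This is a generic test, NOT the statistic of the paper above.
    """
    xs = sorted(xs)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x,
    # checked just before and just after each sample point.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Small-sample correction for the asymptotic Kolmogorov distribution.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        # The alternating series converges slowly here; the true value
        # is indistinguishable from 1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Hypothetical inputs: evenly spread positions versus positions piled up
# near the start of the transcript.
even = [(i + 0.5) / 1000 for i in range(1000)]
skewed = [x * x for x in even]
d_even, p_even = ks_uniform(even)
d_skew, p_skew = ks_uniform(skewed)
```

A small P-value for a protocol's reads would suggest positional bias under this stand-in test; comparing the distances across protocols gives a rough ranking of how uniform each one is.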
2.  Differential gene expression in normal and transformed human mammary epithelial cells in response to oxidative stress 
Free radical biology & medicine  2011;50(11):1565-1574.
Oxidative stress plays a key role in breast carcinogenesis. To investigate whether normal and malignant breast epithelial cells differ in their responses to oxidative stress, we examined the global gene expression profiles of three cell types, representing cancer progression from a normal to a malignant stage, under oxidative stress. Normal human mammary epithelial cells (HMEC), an immortalized cell line (HMLER-1), and a tumorigenic cell line (HMLER-5), were exposed to increased levels of reactive oxygen species (ROS) by treatment with glucose oxidase. Functional analysis of the metabolic pathways enriched with differentially expressed genes demonstrates that normal and malignant breast epithelial cells diverge substantially in their response to oxidative stress. While normal cells exhibit the up-regulation of antioxidant mechanisms, cancer cells are unresponsive to the ROS insult. However, the gene expression response of normal HMEC cells under oxidative stress is comparable to that of the malignant cells under normal conditions, indicating that altered redox status is persistent in breast cancer cells, which makes them resistant to increased generation of ROS. This study discusses some of the possible adaptation mechanisms of breast cancer cells under persistent oxidative stress that differentiate them from the response to acute oxidative stress in normal mammary epithelial cells.
PMCID: PMC3119600  PMID: 21397008
Oxidative stress; breast cancer; human mammary epithelial cells; microarrays; glucose oxidase; GluOx
3.  Shape-based peak identification for ChIP-Seq 
BMC Bioinformatics  2011;12:15.
The identification of binding targets for proteins using ChIP-Seq has gained popularity as an alternative to ChIP-chip. Sequencing can, in principle, eliminate artifacts associated with microarrays, and cheap sequencing offers the ability to sequence deeply and obtain a comprehensive survey of binding. A number of algorithms have been developed to call "peaks" representing bound regions from mapped reads. Most current algorithms incorporate multiple heuristics, and despite much work it remains difficult to accurately determine individual peaks corresponding to distinct binding events.
Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is statistically sound and robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We validate our approach using previously published data and show that it can discover previously missed regions.
The difficulty in accurately calling peaks for ChIP-Seq data is partly due to the difficulty in defining peaks, and we demonstrate a novel method that improves on the accuracy of previous methods in resolving peaks. Our introduction of a robust statistical test based on ideas from topological data analysis is also novel. Our methods are implemented in a program called T-PIC (Tree shape Peak Identification for ChIP-Seq), which is available at
PMCID: PMC3032669  PMID: 21226895
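To make the notion of persistence mentioned in this abstract concrete, here is a minimal sketch of zero-dimensional persistence on a one-dimensional coverage profile. It is a generic textbook construction, not T-PIC itself, which the abstract describes as relying on tree-based statistics; the function name `peak_persistence` and the toy coverage values are assumptions for illustration.

```python
def peak_persistence(signal):
    """Pair each local maximum of `signal` with the level at which it merges
    into a higher peak. Returns (peak_index, birth, death, persistence)
    tuples sorted by decreasing persistence.
    """
    n = len(signal)
    # Sweep a threshold downward: visit positions from highest to lowest.
    order = sorted(range(n), key=lambda i: -signal[i])
    parent = [None] * n  # None means "not yet above the threshold"
    peak = {}            # component root -> index of its highest point

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # The component with the higher peak survives the merge.
                if signal[peak[ri]] > signal[peak[rj]]:
                    hi, lo = ri, rj
                else:
                    hi, lo = rj, ri
                if peak[lo] != i:
                    # A genuine peak dies at the current level.
                    birth = signal[peak[lo]]
                    pairs.append((peak[lo], birth, signal[i], birth - signal[i]))
                parent[lo] = hi
    # The global maximum never merges; close it at the global minimum.
    root = find(order[0])
    top, bottom = signal[peak[root]], min(signal)
    pairs.append((peak[root], top, bottom, top - bottom))
    return sorted(pairs, key=lambda t: -t[3])


# Toy coverage profile (hypothetical values).
coverage = [0, 1, 5, 2, 1, 3, 8, 3, 2, 4, 1, 0]
peaks = peak_persistence(coverage)
```

Peaks with large persistence stand well above the surrounding valleys, while low-persistence peaks are more likely to be noise; a statistical test would be needed to decide where to draw that line.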
4.  A Systems Biology View of Cancer 
Biochimica et biophysica acta  2009;1796(2):129-139.
In order to understand how a cancer cell is functionally different from a normal cell it is necessary to assess the complex network of pathways involving gene regulation, signaling, and cell metabolism, and the alterations in its dynamics caused by the several different types of mutations leading to malignancy. Since the network is typically complex, with multiple connections between pathways and important feedback loops, it is crucial to represent it in the form of a computational model that can be used for a rigorous analysis. This is the approach of systems biology, made possible by new –omics data generation technologies. The goal of this review is to illustrate this approach and its utility for our understanding of cancer. After a discussion of recent progress using a network-centric approach, three case studies related to diagnostics, therapy, and drug development are presented in detail. They focus on breast cancer, B cell lymphomas, and colorectal cancer. The discussion is centered on key mathematical and computational tools common to a systems biology approach.
PMCID: PMC2782452  PMID: 19505535
systems biology; cancer; mathematical modeling
5.  Coverage statistics for sequence census methods 
BMC Bioinformatics  2010;11:430.
We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how this can be used to detect regions with anomalous coverage. This modeling perspective is especially germane to current high-throughput sequencing experiments, where both sample preparation protocols and sequencing technology particulars can affect fragment length distributions.
Under the mild assumptions that fragment start sites are Poisson distributed and successive fragment lengths are independent and identically distributed, we observe that, regardless of fragment length distribution, the fragments produced in a sequencing experiment can be viewed as resulting from a two-dimensional spatial Poisson process. We then study the successive jumps of the coverage function, and show that they can be encoded as a random tree that is approximately a Galton-Watson tree with generation-dependent geometric offspring distributions whose parameters can be computed.
We extend standard analyses of shotgun sequencing that focus on coverage statistics at individual sites, and provide a null model for detecting deviations from random coverage in high-throughput sequence census based experiments. Our approach leads to explicit determinations of the null distributions of certain test statistics, while for others it greatly simplifies the approximation of their null distributions by simulation. Our focus on fragments also leads to a new approach to visualizing sequencing data that is of independent interest.
PMCID: PMC2940910  PMID: 20718980
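The null model described in this abstract can be illustrated with a small simulation: fragment start sites arrive as a Poisson process along the genome, fragment lengths are drawn independently from a fixed distribution, and the coverage depth is the number of fragments overlapping each site. This is a sketch under assumed parameters (genome length, start rate, and a uniform length distribution chosen purely for illustration), not the authors' code; the function name `simulate_coverage` is hypothetical. Under these assumptions the depth at an interior site has mean equal to the start rate times the mean fragment length.

```python
import random


def simulate_coverage(genome_len, rate, draw_length, seed=0):
    """Simulate coverage depth under a Poisson start-site model.

    rate        -- expected number of fragment starts per base
    draw_length -- function taking a random.Random and returning a length
    Returns (depth, n_fragments), where depth[i] is the coverage at base i.
    """
    rng = random.Random(seed)
    diff = [0] * (genome_len + 1)  # difference array of depth changes
    pos = 0.0
    n_fragments = 0
    while True:
        # Exponential gaps between starts give a Poisson process.
        pos += rng.expovariate(rate)
        start = int(pos)
        if start >= genome_len:
            break
        end = min(start + draw_length(rng), genome_len)
        diff[start] += 1  # depth rises where the fragment begins
        diff[end] -= 1    # and falls just past its last base
        n_fragments += 1
    depth = []
    running = 0
    for change in diff[:genome_len]:
        running += change
        depth.append(running)
    return depth, n_fragments


# Assumed parameters: 200 kb genome, 0.05 starts per base,
# fragment lengths uniform on 100..300 (mean 200).
depth, n_fragments = simulate_coverage(
    200_000, 0.05, lambda rng: rng.randint(100, 300)
)
interior = depth[1000:-1000]  # drop the edges, where coverage ramps up
mean_depth = sum(interior) / len(interior)
expected_depth = 0.05 * 200
```

Successive jumps of `depth` are the raw material that the paper encodes as a tree; regions whose coverage deviates strongly from what such simulations produce are candidates for anomalous coverage.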
6.  A General Map of Iron Metabolism and Tissue-specific Subnetworks 
Molecular bioSystems  2009;5(5):422-443.
Iron is required for survival of mammalian cells. Recently, understanding of iron metabolism and trafficking has increased dramatically, revealing a complex, interacting network largely unknown just a few years ago. This provides an excellent model for systems biology development and analysis. The first step in such an analysis is the construction of a structural network of iron metabolism, which we present here. This network was created using CellDesigner version 3.5.2 and includes reactions occurring in mammalian cells of numerous tissue types. The iron metabolic network contains 151 chemical species and 107 reactions and transport steps. Starting from this general model, we construct iron networks for specific tissues and cells that are fundamental to maintaining body iron homeostasis. We include subnetworks for cells of the intestine and liver, tissues important in iron uptake and storage, respectively; as well as the reticulocyte and macrophage, key cells in iron utilization and recycling. The addition of kinetic information to our structural network will permit the simulation of iron metabolism in different tissues as well as in health and disease.
PMCID: PMC2680238  PMID: 19381358
iron; liver; macrophage; reactive oxygen species; red blood cells
