Results 1-4 (4)

1.  Genome-Wide Analysis of Effectors of Peroxisome Biogenesis 
PLoS ONE  2010;5(8):e11953.
Peroxisomes are intracellular organelles that house a number of diverse metabolic processes, notably those required for β-oxidation of fatty acids. Peroxisome biogenesis can be induced by peroxisome proliferators, including fatty acids, which activate complex cellular programs that underlie the induction process. Here, we used multi-parameter quantitative phenotype analyses of an arrayed mutant collection of yeast cells induced to proliferate peroxisomes to establish a comprehensive inventory of genes required for peroxisome induction and function. The assays employed include growth in the presence of fatty acids, as well as confocal imaging and flow cytometry throughout the induction process. In addition to the classical phenotypes associated with loss of peroxisomal functions, these studies identified 169 genes required for robust signaling, transcription, normal peroxisomal development and morphology, and transmission of peroxisomes to daughter cells. These gene products are localized throughout the cell, and many have indirect connections to peroxisome function. By integrating these results with existing data sets, we present a total of 211 genes linked to peroxisome biogenesis and highlight the complex networks through which information flows during peroxisome biogenesis and function.
PMCID: PMC2915925  PMID: 20694151
2.  Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites 
Bioinformatics  2010;26(17):2071-2075.
Motivation: Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the context of the innate immune system, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation.
Results: We tested this hypothesis using a multi-evidence approach for predicting binding sites. As a training/test dataset, we used ChIP-Seq-derived TF binding site locations for five TFs in activated murine macrophages. Our model combined TF binding site motif scanning with evidence from sequence-based sources and from HAc ChIP-Seq data, using a weighted sum of thresholded scores. We find that using HAc data significantly improves the performance of motif-based TF binding site prediction. Furthermore, we find that within regions of high HAc, local minima of the HAc ChIP-Seq signal are particularly strongly correlated with TF binding locations. Our model, using motif scanning and HAc local minima, improves the sensitivity for TF binding site prediction by ∼50% over a model based on motif scanning alone, at a false positive rate cutoff of 0.01.
Availability: The data and software source code for model training and validation are freely available online at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2922897  PMID: 20663846
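The abstract above describes combining evidence sources "using a weighted sum of thresholded scores" and singles out local minima of the HAc signal within high-HAc regions. The Python sketch below illustrates one plausible reading of those two ideas and is not the authors' released code: each evidence score is reduced to a 0/1 indicator by its own threshold and the indicators are summed with weights. The function names, the evidence keys (`motif`, `sequence`, `hac_min`), the definition of a "high-HAc region" (maximum within `flank` bins reaching `high`), and every number are hypothetical.

```python
from typing import Dict, List, Sequence


def hac_local_minima(signal: Sequence[float], high: float, flank: int = 2) -> List[bool]:
    """Flag interior bins that are strict local minima of a binned HAc ChIP-Seq
    signal AND lie in a high-HAc region. "High-HAc region" is an assumption here:
    the maximum signal within `flank` bins on either side must reach `high`."""
    flags = []
    for i in range(len(signal)):
        interior = 0 < i < len(signal) - 1
        is_min = interior and signal[i] < signal[i - 1] and signal[i] < signal[i + 1]
        window = signal[max(0, i - flank): i + flank + 1]
        flags.append(is_min and max(window) >= high)
    return flags


def thresholded_sum(evidence: Dict[str, float],
                    thresholds: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of binary indicators: each evidence source contributes its
    weight only if its score reaches that source's threshold."""
    return sum(w * (1.0 if evidence[k] >= thresholds[k] else 0.0)
               for k, w in weights.items())


# Toy, made-up HAc densities in consecutive bins around a candidate region.
signal = [1, 5, 9, 3, 8, 6, 1, 1, 0.5, 1]
flags = hac_local_minima(signal, high=8)  # only bin 3 is a minimum in a high region

# Hypothetical evidence for a candidate motif hit located in bin 3.
evidence = {"motif": 12.0, "sequence": 0.4, "hac_min": float(flags[3])}
thresholds = {"motif": 10.0, "sequence": 0.5, "hac_min": 0.5}
weights = {"motif": 1.0, "sequence": 0.5, "hac_min": 1.0}
score = thresholded_sum(evidence, thresholds, weights)  # 1.0 + 0.0 + 1.0
predicted_bound = score >= 1.5  # illustrative cutoff, not from the paper
```

In a real model the weights, thresholds and final cutoff would be fit on training data (here, ChIP-Seq-derived binding sites), for example to reach a target false positive rate. The values above are invented only to make the arithmetic easy to follow. Note that the bin 8 minimum is rejected because its neighbourhood never reaches the high-HAc level.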
3.  SEQADAPT: an adaptable system for the tracking, storage and analysis of high throughput sequencing experiments 
BMC Bioinformatics  2010;11:377.
High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires.
Existing software solutions provide static, well-established algorithms in a restrictive package. However, because high-throughput sequencing is a rapidly evolving field, such static approaches cannot readily adopt the latest advances and techniques that researchers often require. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code.
The system is based on the Addama service architecture and is available at our website as a demonstration web application, as a single installable download, and as a collection of individually customizable services.
PMCID: PMC2916924  PMID: 20630057
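The loose coupling described in the SeqAdapt abstract can be illustrated generically. In the Python sketch below, pipelines refer to analysis services only by name, so a new algorithm can be registered or swapped in without touching the pipeline definition. This is a hypothetical illustration of the design idea under that assumption; it does not reproduce the Addama API or SeqAdapt's actual services. `ServiceRegistry`, `count_reads` and `naive_gc` are invented names.

```python
from typing import Callable, Dict, List

# A "service" here is any callable that takes and returns a dict of named data.
Service = Callable[[dict], dict]


class ServiceRegistry:
    """Hypothetical registry: pipelines reference services by name only,
    so implementations can be replaced independently (not the Addama API)."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        # Registering under an existing name replaces the old implementation.
        self._services[name] = service

    def run_pipeline(self, steps: List[str], data: dict) -> dict:
        # Each step receives the previous step's output.
        for name in steps:
            data = self._services[name](data)
        return data


def count_reads(data: dict) -> dict:
    return {**data, "n_reads": len(data["reads"])}


def naive_gc(data: dict) -> dict:
    # Fraction of G/C bases across all reads (toy analysis step).
    seq = "".join(data["reads"])
    return {**data, "gc": sum(base in "GC" for base in seq) / len(seq)}


registry = ServiceRegistry()
registry.register("count", count_reads)
registry.register("gc", naive_gc)
result = registry.run_pipeline(["count", "gc"], {"reads": ["ACGT", "GGCC"]})
```

Replacing the GC step with a different algorithm only requires another `registry.register("gc", ...)` call. The pipeline list `["count", "gc"]` stays unchanged, which is the property the abstract attributes to a loosely coupled design.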
4.  Probabilistic analysis of gene expression measurements from heterogeneous tissues 
Bioinformatics  2010;26(20):2571-2577.
Motivation: Tissue heterogeneity, arising from multiple cell types, is a major confounding factor in experiments that focus on studying cell types in isolation, e.g. their expression profiles. Although sample heterogeneity can be addressed by manual microdissection prior to conducting experiments, computational treatment of heterogeneous measurements has become a reliable alternative that performs this microdissection in silico. Favoring computation over manual purification has advantages such as reduced time consumption, the ability to measure responses of multiple cell types simultaneously, keeping samples free of external perturbations, and an unaltered yield of molecular content.
Results: We formalize a probabilistic model, DSection, and show with simulations as well as with real microarray data that DSection attains increased modeling accuracy in terms of (i) estimating cell-type proportions of heterogeneous tissue samples, (ii) estimating replication variance and (iii) identifying differential expression across cell types under various experimental conditions. As our reference we use the corresponding linear regression model, which mirrors the performance of the majority of current non-probabilistic modeling approaches.
Availability and Software: All code is written in Matlab and is freely available upon request as well as at the project web page∼erkkila2/. Furthermore, a web application for DSection exists at
PMCID: PMC2951082  PMID: 20631160
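The DSection abstract uses "the corresponding linear regression model" as its reference. The Python sketch below shows a common form of such a baseline, under the assumption that each heterogeneous sample's expression is a proportion-weighted mix of cell-type-specific profiles, Y ≈ P·X, with proportions P treated as known. X is then recovered by ordinary least squares. This is not DSection itself, which is probabilistic and also estimates the proportions and replication variance. `deconvolve_ls` and all numbers are illustrative.

```python
import numpy as np


def deconvolve_ls(P: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Least-squares estimate of cell-type-specific expression X from Y ≈ P @ X.

    P: (samples x cell types) mixing proportions, assumed known.
    Y: (samples x genes) measured expression of heterogeneous samples.
    Returns X: (cell types x genes).
    """
    X, *_ = np.linalg.lstsq(P, Y, rcond=None)
    return X


# Synthetic toy data: 4 samples, 2 cell types, 3 genes.
P = np.array([[0.9, 0.1],
              [0.7, 0.3],
              [0.4, 0.6],
              [0.2, 0.8]])
X_true = np.array([[10.0, 2.0, 5.0],   # cell type A profile
                   [1.0, 8.0, 5.0]])   # cell type B profile
Y = P @ X_true                          # noise-free mixtures
X_hat = deconvolve_ls(P, Y)             # recovers X_true up to rounding
```

With noise-free mixtures and full-rank P, the estimate matches the true profiles. Real measurements add noise and uncertain proportions, which is where a probabilistic treatment such as the one the abstract describes is meant to help.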
