1.  Increasing Coverage of Transcription Factor Position Weight Matrices through Domain-level Homology 
PLoS ONE  2012;7(8):e42779.
Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.
By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of represented species with PWMs over an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit comparable DNA binding specificity predictions to transcription factor pairs with completely identical sequences.
The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties. The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at http://dodoma.systemsbiology.net.
doi:10.1371/journal.pone.0042779
PMCID: PMC3428306  PMID: 22952610
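As an illustration of the PWM scanning mentioned in the abstract above, the following is a minimal sketch of log-odds scoring of a DNA sequence against a position weight matrix. The count matrix, the uniform background frequency, the pseudocount, and all function names are hypothetical choices made for this example; they are not taken from JASPAR, TRANSFAC, or the authors' software.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 1.0   # assumed smoothing value; avoids log(0)


def log_odds_pwm(counts, pseudocount=PSEUDOCOUNT, background=BACKGROUND):
    """Convert per-position base counts into log2 odds against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (offset, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, pwm):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])
```

Calling `best_hit("TTACGTAA", log_odds_pwm(COUNTS))` returns the offset of the window that best matches the hypothetical motif; in practice a score threshold would decide which windows are reported as putative binding sites.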
2.  Integrated Analysis of Gene Expression and Tumor Nuclear Image Profiles Associated with Chemotherapy Response in Serous Ovarian Carcinoma 
PLoS ONE  2012;7(5):e36383.
Background
Small sample sizes used in previous studies result in a lack of overlap between the reported gene signatures for prediction of chemotherapy response. Although morphologic features, especially tumor nuclear morphology, are important for cancer grading, little research has been reported on quantitatively correlating cellular morphology with chemotherapy response, especially in a large data set. In this study, we have used a large population of patients to identify molecular and morphologic signatures associated with chemotherapy response in serous ovarian carcinoma.
Methodology/Principal Findings
A gene expression model that predicts response to chemotherapy is developed and validated using a large-scale data set consisting of 493 samples from The Cancer Genome Atlas (TCGA) and 244 samples from an Australian report. An identified 227-gene signature achieves an overall predictive accuracy of greater than 85% with a sensitivity of approximately 95% and specificity of approximately 70%. The gene signature significantly distinguishes between patients with unfavorable versus favorable prognosis, when applied to either an independent data set (P = 0.04) or an external validation set (P<0.0001). In parallel, we present the production of a tumor nuclear image profile generated from 253 sample slides by characterizing patients with nuclear features (such as size, elongation, and roundness) in incremental bins, and we identify a morphologic signature that demonstrates a strong association with chemotherapy response in serous ovarian carcinoma.
Conclusions
A gene signature discovered on a large data set provides robustness in accurately predicting chemotherapy response in serous ovarian carcinoma. The combination of the molecular and morphologic signatures yields a new understanding of potential mechanisms involved in drug resistance.
doi:10.1371/journal.pone.0036383
PMCID: PMC3348145  PMID: 22590536
3.  DETERMINISTIC AND STOCHASTIC MODELS OF GENETIC REGULATORY NETWORKS 
Methods in enzymology  2009;467:335-356.
Traditionally molecular biology research has tended to reduce biological pathways to composite units studied as isolated parts of the cellular system. With the advent of high throughput methodologies that can capture thousands of data points, and powerful computational approaches, the reality of studying cellular processes at a systems level is upon us. As these approaches yield massive datasets, systems level analyses have drawn upon other fields such as engineering and mathematics, adapting computational and statistical approaches to decipher relationships between molecules. Guided by high quality datasets and analyses, one can begin the process of predictive modeling. The findings from such approaches are often surprising and beyond normal intuition. We discuss four classes of dynamical systems used to model genetic regulatory networks. The discussion is divided into continuous and discrete models, as well as deterministic and stochastic model classes. For each combination of these categories, a model is presented and discussed in the context of the yeast cell cycle, illustrating how different types of questions can be addressed by different model classes.
doi:10.1016/S0076-6879(09)67013-0
PMCID: PMC3230268  PMID: 19897099
4.  EPEPT: A web service for enhanced P-value estimation in permutation tests 
BMC Bioinformatics  2011;12:411.
Background
In computational biology, permutation tests have become a widely used tool to assess the statistical significance of an event under investigation. However, the common way of computing the P-value, which expresses the statistical significance, requires a very large number of permutations when small (and thus interesting) P-values are to be accurately estimated. This is computationally expensive and often infeasible. Recently, we proposed an alternative estimator, which requires far fewer permutations compared to the standard empirical approach while still reliably estimating small P-values [1].
Results
The proposed P-value estimator has been enriched with additional functionalities and is made available to the general community through a public website and web service, called EPEPT. This means that the EPEPT routines can be accessed not only via a website, but also programmatically using any programming language that can interact with the web. Examples of web service clients in multiple programming languages can be downloaded. Additionally, EPEPT accepts data of various common experiment types used in computational biology. For these experiment types EPEPT first computes the permutation values and then performs the P-value estimation. Finally, the source code of EPEPT can be downloaded.
Conclusions
Different types of users, such as biologists, bioinformaticians and software engineers, can use the method in an appropriate and simple way.
Availability
http://informatics.systemsbiology.net/EPEPT/
doi:10.1186/1471-2105-12-411
PMCID: PMC3277916  PMID: 22024252
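For context, the standard empirical approach that the abstract above contrasts with can be sketched as follows. This is not the EPEPT estimator; it is only the plain fraction-of-permutations P-value, shown here for a two-group comparison. The choice of test statistic (absolute difference in means), the function names, and the default number of permutations are assumptions made for this example.

```python
import random


def mean_difference(group_a, group_b):
    """Test statistic: difference between the two group means."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)


def empirical_p_value(group_a, group_b, n_permutations=1000, seed=0):
    """Fraction of label permutations whose statistic is at least as
    extreme as the statistic observed on the non-permuted data."""
    rng = random.Random(seed)
    observed = abs(mean_difference(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        permuted = abs(mean_difference(pooled[:n_a], pooled[n_a:]))
        if permuted >= observed:
            extreme += 1
    return extreme / n_permutations
```

Because the result is always a multiple of `1 / n_permutations`, resolving a P-value near 1e-6 this way needs on the order of a million permutations, which is the cost the abstract describes as often infeasible.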
5.  A regression model approach to enable cell morphology correction in high-throughput flow cytometry 
Large variations in cell size and shape can undermine traditional gating methods for analyzing flow cytometry data. Correcting for these effects enables analysis of high-throughput data sets, including >5000 yeast samples with diverse cell morphologies.
The regression model approach corrects for the effects of cell morphology on fluorescence, as well as an extremely small and restrictive gate, but without removing any of the cells. In contrast to traditional gating, this approach enables the quantitative analysis of high-throughput flow cytometry experiments, since the regression model can compare between biological samples that show no or little overlap in terms of the morphology of the cells. The analysis of a high-throughput yeast flow cytometry data set consisting of >5000 biological samples identified key proteins that affect the time and intensity of the bifurcation event that happens after the carbon source transition from glucose to fatty acids. Here, some yeast cells undergo major structural changes, while others do not.
Flow cytometry is a widely used technique that enables the measurement of different optical properties of individual cells within large populations of cells in a fast and automated manner. For example, by targeting cell-specific markers with fluorescent probes, flow cytometry is used to identify (and isolate) cell types within complex mixtures of cells. In addition, fluorescence reporters can be used in conjunction with flow cytometry to measure protein, RNA or DNA concentration within single cells of a population.
One of the biggest advantages of this technique is that it provides information of how each cell behaves instead of just measuring the population average. This can be essential when analyzing complex samples that consist of diverse cell types or when measuring cellular responses to stimuli. For example, there is an important difference between a 50% expression increase of all cells in a population after stimulation and a 100% increase in only half of the cells, while the other half remains unresponsive. Another important advantage of flow cytometry is automation, which enables high-throughput studies with thousands of samples and conditions. However, current methods are confounded by populations of cells that are non-uniform in terms of size and granularity. Such variability affects the emitted fluorescence of the cell and adds undesired variability when estimating population fluorescence. This effect also frustrates a sensible comparison between conditions, where not only fluorescence but also cell size and granularity may be affected.
Traditionally, this problem has been addressed by using ‘gates’ that restrict the analysis to cells with similar morphological properties (i.e. cell size and cell granularity). Because cells inside the gate are morphologically similar to one another, they will show a smaller variability in their response within the population. Moreover, applying the same gate in all samples assures that observed differences between these samples are not due to differential cell morphologies.
Gating, however, comes with costs. First, since only a subgroup of cells is selected, the final number of cells analyzed can be significantly reduced. This means that in order to have sufficient statistical power, more cells have to be acquired, which, if even possible in the first place, increases the time and cost of the experiment. Second, finding a good gate for all samples and conditions can be challenging if not impossible, especially in cases where cellular morphology changes dramatically between conditions. Finally, gating is a very user-dependent process, where both the size and shape of the gate are determined by the researcher and will affect the outcome, introducing subjectivity in the analysis that complicates reproducibility.
In this paper, we present an alternative method to gating that addresses the issues stated above. The method is based on a regression model containing linear and non-linear terms that estimates and corrects for the effect of cell size and granularity on the observed fluorescence of each cell in a sample. The corrected fluorescence thus becomes ‘free’ of the morphological effects.
Because the model uses all cells in the sample, it assures that the corrected fluorescence is an accurate representation of the sample. In addition, the regression model can predict the expected fluorescence of a sample in areas where there are no cells. This makes it possible to compare between samples that have little overlap with good confidence. Furthermore, because the regression model is automated, it is fully reproducible between labs and conditions. Finally, it allows for a rapid analysis of big data sets containing thousands of samples.
To probe the validity of the model, we performed several experiments. We show how the regression model is able to remove the morphological-associated variability as well as an extremely small and restrictive gate, but without the caveat of removing cells. We test the method in different organisms (yeast and human) and applications (protein level detection, separation of mixed subpopulations). We then apply this method to unveil new biological insights in the mechanistic processes involved in transcriptional noise.
Gene transcription is a process subjected to the randomness intrinsic to any molecular event. Although such randomness may seem to be undesirable for the cell, since it prevents consistent behavior, there are situations where some degree of randomness is beneficial (e.g. bet hedging). For this reason, each gene is tuned to exhibit different levels of randomness or noise depending on its functions. For core and essential genes, the cell has developed mechanisms to lower the level of noise, while for genes involved in the response to stress, the variability is greater.
This gene transcription tuning can be determined at many levels, from the architecture of the transcriptional network, to epigenetic regulation. In our study, we analyze the latter using the response of yeast to the presence of fatty acid in the environment. Fatty acid can be used as energy by yeast, but it requires major structural changes and commitments. We have observed that at the population level, there is a bifurcation event whereby some cells undergo these changes and others do not. We have analyzed this bifurcation event in mutants for all the non-essential epigenetic regulators in yeast and identified key proteins that affect the time and intensity of this bifurcation. Even though fatty acid triggers major morphological changes in the cell, the regression model still makes it possible to analyze the over 5000 flow cytometry samples in this data set in an automated manner, whereas a traditional gating approach would be impossible.
Cells exposed to stimuli exhibit a wide range of responses ensuring phenotypic variability across the population. Such single cell behavior is often examined by flow cytometry; however, gating procedures typically employed to select a small subpopulation of cells with similar morphological characteristics make it difficult, even impossible, to quantitatively compare cells across a large variety of experimental conditions because these conditions can lead to profound morphological variations. To overcome these limitations, we developed a regression approach to correct for variability in fluorescence intensity due to differences in cell size and granularity without discarding any of the cells, which gating ipso facto does. This approach enables quantitative studies of cellular heterogeneity and transcriptional noise in high-throughput experiments involving thousands of samples. We used this approach to analyze a library of yeast knockout strains and reveal genes required for the population to establish a bimodal response to oleic acid induction. We identify a group of epigenetic regulators and nucleoporins that, by maintaining an ‘unresponsive population,’ may provide the population with the advantage of diversified bet hedging.
doi:10.1038/msb.2011.64
PMCID: PMC3202802  PMID: 21952134
flow cytometry; high-throughput experiments; statistical regression model; transcriptional noise
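The regression idea described in the entry above (estimate the effect of cell size and granularity on fluorescence, then remove it without discarding cells) can be sketched roughly as follows. This is only an illustrative stand-in: the specific terms (intercept, linear, and squared terms for each covariate), the reference morphology, and all function names are assumptions for the example, and the published model may use different terms and fitting procedures.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def design_row(size, granularity):
    """Assumed model terms: intercept, linear, and squared (non-linear) terms."""
    return [1.0, size, granularity, size * size, granularity * granularity]


def fit_model(sizes, granularities, fluorescence):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [design_row(s, g) for s, g in zip(sizes, granularities)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, fluorescence)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, size, granularity):
    """Expected fluorescence at a given morphology under the fitted model."""
    return sum(c * x for c, x in zip(coefficients, design_row(size, granularity)))


def correct_fluorescence(sizes, granularities, fluorescence, reference=(1.0, 1.0)):
    """Keep every cell: subtract its predicted morphology effect and re-centre
    all cells on the prediction at a common reference morphology."""
    coefficients = fit_model(sizes, granularities, fluorescence)
    at_reference = predict(coefficients, *reference)
    return [
        y - predict(coefficients, s, g) + at_reference
        for s, g, y in zip(sizes, granularities, fluorescence)
    ]
```

Because the fitted model can be evaluated at any morphology, including a reference point where a given sample has no cells, samples with little morphological overlap can still be placed on a common scale, which is the property the text contrasts with gating.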
6.  Trade-off between Responsiveness and Noise Suppression in Biomolecular System Responses to Environmental Cues 
PLoS Computational Biology  2011;7(6):e1002091.
When living systems detect changes in their external environment their response must be measured to balance the need to react appropriately with the need to remain stable, ignoring insignificant signals. Because this is a fundamental challenge of all biological systems that execute programs in response to stimuli, we developed a generalized time-frequency analysis (TFA) framework to systematically explore the dynamical properties of biomolecular networks. Using TFA, we focused on two well-characterized yeast gene regulatory networks responsive to carbon-source shifts and a mammalian innate immune regulatory network responsive to lipopolysaccharides (LPS). The networks are comprised of two different basic architectures. Dual positive and negative feedback loops make up the yeast galactose network; whereas overlapping positive and negative feed-forward loops are common to the yeast fatty-acid response network and the LPS-induced network of macrophages. TFA revealed remarkably distinct network behaviors in terms of trade-offs in responsiveness and noise suppression that are appropriately tuned to each biological response. The wild type galactose network was found to be highly responsive while the oleate network has greater noise suppression ability. The LPS network appeared more balanced, exhibiting less bias toward noise suppression or responsiveness. Exploration of the network parameter space exposed dramatic differences in system behaviors for each network. These studies highlight fundamental structural and dynamical principles that underlie each network, reveal constrained parameters of positive and negative feedback and feed-forward strengths that tune the networks appropriately for their respective biological roles, and demonstrate the general utility of the TFA approach for systems and synthetic biology.
Author Summary
Biological systems constantly balance noise suppression with responsiveness. In a fluctuating environment, some changes are insignificant to living cells while others represent cues to which they must respond. These stimuli are interpreted by molecular circuits that enable the cell to strike an appropriate balance between responsiveness and noise suppression. This trade-off is governed by the structure and kinetic parameters of molecular networks, which have been tuned by evolutionary selection for different stimuli and responses. We consider three regulatory circuits (two from yeast and one from mammalian cells), which respond to different environments and involve very different physiological processes. To investigate the responses to a time varying signal, we developed a generalized time-frequency analysis framework for studying such trade-offs using mathematical models of regulatory circuits and explore how the structure and parameters of the circuit affect the trade-offs between noise suppression and responsiveness. The generalized TFA approach represents an effective tool for exploring and analyzing different systems-level dynamical properties. Making use of such properties can facilitate prediction and network control for systems- and synthetic biology applications.
doi:10.1371/journal.pcbi.1002091
PMCID: PMC3127798  PMID: 21738459
7.  Taming Data 
Cell host & microbe  2008;4(4):312-313.
A challenge in systems-level investigations of the immune response is the principled integration of disparate data sets for constructing predictive models. InnateDB (Lynn et al., 2008; http://www.innatedb.ca), a publicly available, manually curated database of experimentally verified molecular interactions and pathways involved in innate immunity, is a powerful new resource that facilitates such integrative systems-level analyses.
doi:10.1016/j.chom.2008.09.011
PMCID: PMC3074406  PMID: 18854235
8.  Genome-Wide Analysis of Effectors of Peroxisome Biogenesis 
PLoS ONE  2010;5(8):e11953.
Peroxisomes are intracellular organelles that house a number of diverse metabolic processes, notably those required for β-oxidation of fatty acids. Peroxisome biogenesis can be induced by the presence of peroxisome proliferators, including fatty acids, which activate complex cellular programs that underlie the induction process. Here, we used multi-parameter quantitative phenotype analyses of an arrayed mutant collection of yeast cells induced to proliferate peroxisomes, to establish a comprehensive inventory of genes required for peroxisome induction and function. The assays employed include growth in the presence of fatty acids, and confocal imaging and flow cytometry through the induction process. In addition to the classical phenotypes associated with loss of peroxisomal functions, these studies identified 169 genes required for robust signaling, transcription, normal peroxisomal development and morphologies, and transmission of peroxisomes to daughter cells. These gene products are localized throughout the cell, and many have indirect connections to peroxisome function. By integration with extant data sets, we present a total of 211 genes linked to peroxisome biogenesis and highlight the complex networks through which information flows during peroxisome biogenesis and function.
doi:10.1371/journal.pone.0011953
PMCID: PMC2915925  PMID: 20694151
9.  Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites 
Bioinformatics  2010;26(17):2071-2075.
Motivation: Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the innate immune system context, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation.
Results: We tested this hypothesis using a multi-evidence approach for predicting binding sites. As a training/test dataset, we used ChIP-Seq-derived TF binding site locations for five TFs in activated murine macrophages. Our model combined TF binding site motif scanning with evidence from sequence-based sources and from HAc ChIP-Seq data, using a weighted sum of thresholded scores. We find that using HAc data significantly improves the performance of motif-based TF binding site prediction. Furthermore, we find that within regions of high HAc, local minima of the HAc ChIP-Seq signal are particularly strongly correlated with TF binding locations. Our model, using motif scanning and HAc local minima, improves the sensitivity for TF binding site prediction by ∼50% over a model based on motif scanning alone, at a false positive rate cutoff of 0.01.
Availability: The data and software source code for model training and validation are freely available online at http://magnet.systemsbiology.net/hac.
Contact: aderem@systemsbiology.org; ishmulevich@systemsbiology.org
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq405
PMCID: PMC2922897  PMID: 20663846
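The phrase "weighted sum of thresholded scores" in the abstract above can be read in more than one way; the sketch below assumes the simplest reading, in which each evidence source contributes its weight only when its score reaches a per-source threshold. The evidence names, thresholds, weights, and cutoff are all hypothetical values chosen for illustration, not those trained by the authors.

```python
# Hypothetical per-source thresholds and weights (not the trained values).
THRESHOLDS = {"motif": 8.0, "hac_local_min": 0.5, "conservation": 0.7}
WEIGHTS = {"motif": 0.5, "hac_local_min": 0.3, "conservation": 0.2}


def combined_score(evidence, thresholds=THRESHOLDS, weights=WEIGHTS):
    """Sum the weights of all evidence sources whose score passes its threshold.
    Sources missing from `evidence` are treated as not passing."""
    return sum(
        weights[name]
        for name in weights
        if name in evidence and evidence[name] >= thresholds[name]
    )


def predict_sites(candidates, cutoff=0.6):
    """Return positions of candidate sites whose combined score reaches the cutoff.
    `candidates` is a list of (position, evidence_dict) pairs."""
    return [pos for pos, evidence in candidates if combined_score(evidence) >= cutoff]
```

With these example settings, a candidate supported by a strong motif match and an HAc local minimum is predicted as a binding site, whereas a motif match alone falls below the cutoff.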
10.  SEQADAPT: an adaptable system for the tracking, storage and analysis of high throughput sequencing experiments 
BMC Bioinformatics  2010;11:377.
Background
High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires.
Results
Existing software solutions provide static and well-established algorithms in a restrictive package. However, as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques, which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code.
Conclusion
The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.
doi:10.1186/1471-2105-11-377
PMCID: PMC2916924  PMID: 20630057
11.  Probabilistic analysis of gene expression measurements from heterogeneous tissues 
Bioinformatics  2010;26(20):2571-2577.
Motivation: Tissue heterogeneity, arising from multiple cell types, is a major confounding factor in experiments that focus on studying cell types, e.g. their expression profiles, in isolation. Although sample heterogeneity can be addressed by manual microdissection prior to conducting experiments, computational treatment of heterogeneous measurements has become a reliable alternative that performs this microdissection in silico. Favoring computation over manual purification has its advantages, such as reduced time consumption, the ability to measure responses of multiple cell types simultaneously, samples kept free of external perturbations, and unaltered yield of molecular content.
Results: We formalize a probabilistic model, DSection, and show with simulations as well as with real microarray data that DSection attains increased modeling accuracy in terms of (i) estimating cell-type proportions of heterogeneous tissue samples, (ii) estimating replication variance and (iii) identifying differential expression across cell types under various experimental conditions. As our reference we use the corresponding linear regression model, which mirrors the performance of the majority of current non-probabilistic modeling approaches.
Availability and Software: All code is written in Matlab and is freely available upon request as well as at the project web page http://www.cs.tut.fi/~erkkila2/. Furthermore, a web application for DSection exists at http://informatics.systemsbiology.net/DSection.
Contact: timo.p.erkkila@tut.fi; harri.lahdesmaki@tut.fi
doi:10.1093/bioinformatics/btq406
PMCID: PMC2951082  PMID: 20631160
12.  Age-Dependent Signature of Metallothionein Expression in Primary CD4 T Cell Responses Is Due to Sustained Zinc Signaling 
Rejuvenation research  2008;11(6):1001-1011.
The ability to mount adaptive immune responses to vaccinations and viral infections declines with increasing age. To identify mechanisms leading to immunosenescence, primary CD4 T cell responses were examined in 60- to 75-year-old individuals lacking overt functional defects. Transcriptome analysis indicated a selective defect in zinc homeostasis. CD4 T cell activation was associated with zinc influx via the zinc transporter Zip6, leading to increased free cytoplasmic zinc and activation of negative feedback loops, including the induction of zinc-binding metallothioneins. In young adults, activation-induced cytoplasmic zinc concentrations declined after 2 days to below prestimulation levels. In contrast, activated naïve CD4 T cells from older individuals failed to downregulate cytoplasmic zinc, resulting in excessive induction of metallothioneins. Activation-induced metallothioneins regulated the redox state in activated T cells and accounted for an increased proliferation of old CD4 T cells, suggesting that regulation of T cell zinc homeostasis functions as a compensatory mechanism to preserve the replicative potential of naïve CD4 T cells with age.
doi:10.1089/rej.2008.0747
PMCID: PMC2848531  PMID: 19072254
14.  Role of the transcription factor C/EBPδ in a regulatory circuit that discriminates between transient and persistent Toll-like receptor 4-induced signals 
Nature immunology  2009;10(4):437-443.
The innate immune system is a two-edged sword; it is absolutely required for host defense against infection but, uncontrolled, can trigger a plethora of inflammatory diseases. Here we used systems biology approaches to predict and validate a gene regulatory network involving a dynamic interplay between the transcription factors NF-κB, C/EBPδ, and ATF3 that controls inflammatory responses. We mathematically modeled transcriptional regulation of Il6 and Cebpd genes and experimentally validated the prediction that the combination of an initiator (NF-κB), an amplifier (C/EBPδ) and an attenuator (ATF3) forms a regulatory circuit that discriminates between transient and persistent Toll-like receptor 4-induced signals. Our results suggest a mechanism that enables the innate immune system to detect the duration of infection and to respond appropriately.
doi:10.1038/ni.1721
PMCID: PMC2780024  PMID: 19270711
15.  A data integration framework for prediction of transcription factor targets: a BCL6 case study 
We present a computational framework for predicting targets of transcription factor regulation. The framework is based on the integration of a number of sources of evidence, derived from DNA sequence and gene expression data, using a weighted sum approach. Sources of evidence are prioritized based on a training set, and their relative contributions are then optimized. The performance of the proposed framework is demonstrated in the context of BCL6 target prediction. We show that this framework is able to uncover BCL6 targets reliably when biological prior information is utilized effectively, particularly in the case of sequence analysis. The framework results in a considerable gain in performance over scores in which sequence information was not incorporated. This analysis shows that with assessment of the quality and biological relevance of the data, reliable predictions can be obtained with this computational framework.
doi:10.1111/j.1749-6632.2008.03758.x
PMCID: PMC2771581  PMID: 19348642
network inference; transcription factor binding site prediction; data integration
16.  Bright Field Microscopy as an Alternative to Whole Cell Fluorescence in Automated Analysis of Macrophage Images 
PLoS ONE  2009;4(10):e7497.
Background
Fluorescence microscopy is the standard tool for detection and analysis of cellular phenomena. This technique, however, has a number of drawbacks such as the limited number of available fluorescent channels in microscopes, overlapping excitation and emission spectra of the stains, and phototoxicity.
Methodology
We here present and validate a method to automatically detect cell population outlines directly from bright field images. By imaging samples with several focus levels forming a bright field z-stack, and by measuring the intensity variations of this stack over the z-dimension, we construct a new two-dimensional projection image of increased contrast. With additional information for locations of each cell, such as stained nuclei, this bright field projection image can be used instead of whole cell fluorescence to locate borders of individual cells, separating touching cells, and enabling single cell analysis. Using the popular CellProfiler freeware cell image analysis software mainly targeted for fluorescence microscopy, we validate our method by automatically segmenting low contrast and rather complex shaped murine macrophage cells.
Significance
The proposed approach frees up a fluorescence channel, which can be used for subcellular studies. It also facilitates cell shape measurement in experiments where whole cell fluorescent staining is either not available, or is dependent on a particular experimental condition. We show that whole cell area detection results using our projected bright field images match closely to the standard approach where cell areas are localized using fluorescence, and conclude that the high contrast bright field projection image can directly replace one fluorescent channel in whole cell quantification. Matlab code for calculating the projections can be downloaded from the supplementary site: http://sites.google.com/site/brightfieldorstaining
doi:10.1371/journal.pone.0007497
PMCID: PMC2760782  PMID: 19847301
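The projection idea in entry 16 (measuring intensity variation across focus levels at each pixel) can be sketched as follows. This is a minimal illustration that assumes the per-pixel population standard deviation as the variation measure; the abstract does not say which measure the authors used, and their Matlab code is the authoritative version. The function name and the nested-list image layout are assumptions.

```python
from statistics import pstdev


def stddev_projection(stack):
    """Project a stack of focus levels onto a 2D image of per-pixel variation.

    stack: list of slices, each a 2D list [row][col] of intensities, all the
    same shape. Returns a 2D list holding, for every pixel, the population
    standard deviation of its intensity across the slices.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [pstdev([plane[r][c] for plane in stack]) for c in range(cols)]
        for r in range(rows)
    ]


# Tiny made-up stack: 3 focus levels of a 1 x 2 image.
# The first pixel never changes; the second varies strongly with focus.
stack = [
    [[10, 0]],
    [[10, 10]],
    [[10, 20]],
]
projection = stddev_projection(stack)
```

Pixels whose intensity changes with focus get high values in the projection while flat background stays near zero, which is the contrast gain the abstract refers to.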
17.  Fewer permutations, more accurate P-values 
Bioinformatics  2009;25(12):i161-i168.
Motivation: Permutation tests have become a standard tool to assess the statistical significance of an event under investigation. The statistical significance, as expressed in a P-value, is calculated as the fraction of permutation values that are at least as extreme as the original statistic, which was derived from non-permuted data. This empirical method directly couples both the minimal obtainable P-value and the resolution of the P-value to the number of permutations. Thereby, it imposes upon itself the need for a very large number of permutations when small P-values are to be accurately estimated. This is computationally expensive and often infeasible.
Results: A method of computing P-values based on tail approximation is presented. The tail of the distribution of permutation values is approximated by a generalized Pareto distribution. A good fit and thus accurate P-value estimates can be obtained with a drastically reduced number of permutations when compared with the standard empirical way of computing P-values.
Availability: The Matlab code can be obtained from the corresponding author on request.
Contact: tknijnenburg@systemsbiology.org
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp211
PMCID: PMC2687965  PMID: 19477983
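The standard empirical P-value that entry 17 takes as its starting point can be sketched as below. This shows only the empirical estimate, not the generalized Pareto tail approximation proposed in the paper. The test statistic (absolute difference of group means), the function names, and the data are assumptions for illustration.

```python
import random
from statistics import mean


def empirical_p_value(observed, perm_values):
    """Fraction of permutation statistics at least as extreme as the observed one."""
    exceed = sum(1 for value in perm_values if value >= observed)
    return exceed / len(perm_values)


def permutation_test_mean_diff(x, y, n_perm=1000, seed=0):
    """Two-sample permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    perm_values = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled samples
        perm_x, perm_y = pooled[: len(x)], pooled[len(x):]
        perm_values.append(abs(mean(perm_x) - mean(perm_y)))
    return empirical_p_value(observed, perm_values)


# Made-up, well-separated groups.
p = permutation_test_mean_diff([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], n_perm=200)
```

With N permutations this estimate can only take values k/N, so it cannot resolve P-values below 1/N. That coupling between resolution and permutation count is the limitation the abstract describes.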
18.  Adaptable data management for systems biology investigations 
BMC Bioinformatics  2009;10:79.
Background
Within research, each experiment is different: the focus changes, and the data are generated from a continually evolving barrage of technologies. New techniques are continually introduced, and their usage ranges from in-house protocols to high-throughput instrumentation. To support these requirements, data management systems are needed that can be rapidly built and readily adapted for new usage.
Results
The adaptable data management system discussed is designed to support the seamless mining and analysis of biological experiment data that is commonly used in systems biology (e.g. ChIP-chip, gene expression, proteomics, imaging, flow cytometry). We use different content graphs to represent different views upon the data. These views are designed for different roles: equipment specific views are used to gather instrumentation information; data processing oriented views are provided to enable the rapid development of analysis applications; and research project specific views are used to organize information for individual research experiments. This management system allows for both the rapid introduction of new types of information and the evolution of the knowledge it represents.
Conclusion
Data management is an important aspect of any research enterprise. It is the foundation on which most applications are built, and must be easily extended to serve new functionality for new scientific areas. We have found that adopting a three-tier architecture for data management, built around distributed standardized content repositories, allows us to rapidly develop new applications to support a diverse user community.
doi:10.1186/1471-2105-10-79
PMCID: PMC2670281  PMID: 19265554
19.  Using cell fate attractors to uncover transcriptional regulation of HL60 neutrophil differentiation 
BMC Systems Biology  2009;3:20.
Background
The process of cellular differentiation is governed by complex dynamical biomolecular networks consisting of a multitude of genes and their products acting in concert to determine a particular cell fate. Thus, a systems level view is necessary for understanding how a cell coordinates this process and for developing effective therapeutic strategies to treat diseases, such as cancer, in which differentiation plays a significant role. Theoretical considerations and recent experimental evidence support the view that cell fates are high dimensional attractor states of the underlying molecular networks. The temporal behavior of the network states progressing toward different cell fate attractors has the potential to elucidate the underlying molecular mechanisms governing differentiation.
Results
Using the HL60 multipotent promyelocytic leukemia cell line, we performed experiments that ultimately led to two different cell fate attractors by two treatments of varying dosage and duration of the differentiation agent all-trans-retinoic acid (ATRA). The dosage and duration combinations of the two treatments were chosen by means of flow cytometric measurements of CD11b, a well-known early differentiation marker, such that they generated two intermediate populations that were poised at the apparently same stage of differentiation. However, the population of one treatment proceeded toward the terminally differentiated neutrophil attractor while that of the other treatment reverted back toward the undifferentiated promyelocytic attractor. We monitored the gene expression changes in the two populations after their respective treatments over a period of five days and identified a set of genes that diverged in their expression, a subset of which promotes neutrophil differentiation while the other represses cell cycle progression. By employing promoter based transcription factor binding site analysis, we found enrichment in the set of divergent genes, of transcription factors functionally linked to tumor progression, cell cycle, and development.
Conclusion
Since many of the transcription factors identified by this approach are also known to be implicated in hematopoietic differentiation and leukemia, this study points to the utility of incorporating a dynamical systems level view into a computational analysis framework for elucidating transcriptional mechanisms regulating differentiation.
doi:10.1186/1752-0509-3-20
PMCID: PMC2652435  PMID: 19222862
20.  Biochemical and Statistical Network Models for Systems Biology 
Current opinion in biotechnology  2007;18(4):365-370.
doi:10.1016/j.copbio.2007.07.009
PMCID: PMC2034526  PMID: 17681779
21.  Systems biology driven software design for the research enterprise 
BMC Bioinformatics  2008;9:295.
Background
In systems biology, and many other areas of research, there is a need for the interoperability of tools and data sources that were not originally designed to be integrated. Due to the interdisciplinary nature of systems biology, and its association with high throughput experimental platforms, there is an additional need to continually integrate new technologies. As scientists work in isolated groups, integration with other groups is rarely a consideration when building the required software tools.
Results
We illustrate an approach, through the discussion of a purpose-built software architecture, which allows disparate groups to reuse tools and access data sources in a common manner. The architecture allows for: the rapid development of distributed applications; interoperability, so it can be used by a wide variety of developers and computational biologists; development using standard tools, so that it is easy to maintain and does not require a large development effort; extensibility, so that new technologies and data types can be incorporated; and non-intrusive development, insofar as researchers need not adhere to a pre-existing object model.
Conclusion
By using a relatively simple integration strategy, based upon a common identity system and dynamically discovered interoperable services, a light-weight software architecture can become the focal point through which scientists can both get access to and analyse the plethora of experimentally derived data.
doi:10.1186/1471-2105-9-295
PMCID: PMC2478690  PMID: 18578887
22.  Critical Dynamics in Genetic Regulatory Networks: Examples from Four Kingdoms 
PLoS ONE  2008;3(6):e2456.
The coordinated expression of the different genes in an organism is essential to sustain functionality under the random external perturbations to which the organism might be subjected. To cope with such external variability, the global dynamics of the genetic network must possess two central properties: (a) it must be robust enough to guarantee stability under a broad range of external conditions, and (b) it must be flexible enough to recognize and integrate specific external signals that may help the organism to change and adapt to different environments. This compromise between robustness and adaptability has been observed in dynamical systems operating at the brink of a phase transition between order and chaos. Such systems are termed critical. Thus, criticality, a precise, measurable, and well characterized property of dynamical systems, makes it possible for robustness and adaptability to coexist in living organisms. In this work we investigate the dynamical properties of the gene transcription networks reported for S. cerevisiae, E. coli, and B. subtilis, as well as the network of segment polarity genes of D. melanogaster, and the network of flower development of A. thaliana. We use hundreds of microarray experiments to infer the nature of the regulatory interactions among genes, and implement these data into the Boolean models of the genetic networks. Our results show that, to the best of the current experimental data available, the five networks under study indeed operate close to criticality. The generality of this result suggests that criticality at the genetic level might constitute a fundamental evolutionary mechanism that generates the great diversity of dynamically robust living forms that we observe around us.
doi:10.1371/journal.pone.0002456
PMCID: PMC2423472  PMID: 18560561
23.  Inference of Boolean Networks Using Sensitivity Regularization 
The inference of genetic regulatory networks from global measurements of gene expressions is an important problem in computational biology. Recent studies suggest that such dynamical molecular systems are poised at a critical phase transition between an ordered and a disordered phase, affording the ability to balance stability and adaptability while coordinating complex macroscopic behavior. We investigate whether incorporating this dynamical system-wide property as an assumption in the inference process is beneficial in terms of reducing the inference error of the designed network. Using Boolean networks, for which there are well-defined notions of ordered, critical, and chaotic dynamical regimes as well as well-studied inference procedures, we analyze the expected inference error relative to deviations in the networks' dynamical regimes from the assumption of criticality. We demonstrate that taking criticality into account via a penalty term in the inference procedure improves the accuracy of prediction both in terms of state transitions and network wiring, particularly for small sample sizes.
doi:10.1155/2008/780541
PMCID: PMC3171400  PMID: 18604289
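The idea in entry 23 of penalizing deviation from criticality during inference can be illustrated with a small sketch. It assumes that criticality corresponds to an average sensitivity of 1 for a Boolean function, and that the penalty is the absolute deviation from 1 scaled by a hypothetical weight `lam`. The exact objective used in the paper is not given in the abstract, so the form below is an assumption.

```python
from itertools import product


def average_sensitivity(f, k):
    """Average number of single-bit flips that change f, over all 2**k inputs."""
    total = 0
    for x in product((0, 1), repeat=k):
        for i in range(k):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            total += f(x) != f(flipped)
    return total / 2 ** k


def penalized_error(f, k, samples, lam=0.5):
    """Fraction of mismatched samples plus lam * |average sensitivity - 1|."""
    mismatches = sum(1 for x, y in samples if f(x) != y)
    return mismatches / len(samples) + lam * abs(average_sensitivity(f, k) - 1.0)


# Candidate two-input functions, all consistent with the small sample below.
candidates = {
    "xor": lambda x: x[0] ^ x[1],
    "or": lambda x: x[0] | x[1],
    "const1": lambda x: 1,
}
samples = [((0, 1), 1), ((1, 0), 1)]  # made-up observations
best = min(candidates, key=lambda name: penalized_error(candidates[name], 2, samples))
```

All three candidates fit the two observations exactly, so the data alone cannot separate them. The penalty term breaks the tie in favor of the function with sensitivity 1, which illustrates why such a prior would matter most for small sample sizes.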
25.  Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources 
PLoS ONE  2008;3(3):e1820.
An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources.
Test data set, a web tool, source codes and supplementary data are available at: http://www.probtf.org.
doi:10.1371/journal.pone.0001820
PMCID: PMC2268002  PMID: 18364997
