The aerobic energy metabolism of cardiac muscle cells is of major importance for the contractile function of the heart. Because energy metabolism is very heterogeneously distributed in heart tissue, especially during coronary disease, a method to quantify metabolic fluxes in small tissue samples is desirable. Taking tissue biopsies after infusion of substrates labeled with stable carbon isotopes makes this possible in animal experiments. However, the appreciable noise level in NMR spectra of extracted tissue samples makes computational estimation of metabolic fluxes challenging and a good method to define confidence regions was not yet available.
Here we present a computational analysis method for nuclear magnetic resonance (NMR) measurements of tricarboxylic acid (TCA) cycle metabolites. The method was validated using measurements on extracts of single tissue biopsies taken from porcine heart in vivo. Isotopic enrichment of glutamate was measured by NMR spectroscopy in tissue samples taken at a single time point after the timed infusion of 13C labeled substrates for the TCA cycle. The NMR intensities for glutamate were analyzed with a computational model describing carbon transitions in the TCA cycle and carbon exchange with amino acids. The model dynamics depended on five flux parameters, which were optimized to fit the NMR measurements. To determine confidence regions for the estimated fluxes, we used the Metropolis-Hastings algorithm for Markov chain Monte Carlo (MCMC) sampling to generate extensive ensembles of feasible flux combinations that describe the data within measurement precision limits. To validate our method, we compared myocardial oxygen consumption calculated from the TCA cycle flux with in vivo blood gas measurements for 38 hearts under several experimental conditions, e.g. during coronary artery narrowing.
Despite the appreciable NMR noise level, the oxygen consumption in the tissue samples, estimated from the NMR spectra, correlates with blood-gas oxygen uptake measurements for the whole heart. The MCMC method provides confidence regions for the estimated metabolic fluxes in single cardiac biopsies, taking the quantified measurement noise level and the nonlinear dependencies between parameters fully into account.
Cardiac physiology; Metabolic modeling; Metabolomics; Sensitivity analysis; 13C metabolic flux analysis
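The Metropolis-Hastings sampling mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' five-flux TCA cycle model: it assumes a hypothetical one-parameter labeling curve y(t) = 1 - exp(-k t), a known Gaussian noise level, a flat prior on k > 0, and synthetic data values chosen only for illustration. All function and variable names are assumptions.

```python
import math
import random


def log_likelihood(k, times, observed, sigma):
    """Gaussian log-likelihood of the toy curve y(t) = 1 - exp(-k t)."""
    if k <= 0:
        return -math.inf  # flat prior restricted to positive rate constants
    sse = sum((y - (1.0 - math.exp(-k * t))) ** 2
              for t, y in zip(times, observed))
    return -0.5 * sse / sigma ** 2


def metropolis_hastings(times, observed, sigma, k_start=1.0, step=0.05,
                        n_samples=20000, burn_in=2000, seed=0):
    """Random-walk Metropolis-Hastings sampler for the single parameter k."""
    rng = random.Random(seed)
    k = k_start
    logp = log_likelihood(k, times, observed, sigma)
    chain = []
    for i in range(n_samples + burn_in):
        proposal = k + rng.gauss(0.0, step)  # symmetric proposal
        logp_new = log_likelihood(proposal, times, observed, sigma)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            k, logp = proposal, logp_new
        if i >= burn_in:
            chain.append(k)
    return chain


# Synthetic, illustrative data (not measurements from the study).
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
observed = [0.41, 0.60, 0.80, 0.85, 0.93, 0.94]
chain = sorted(metropolis_hastings(times, observed, sigma=0.05))

# Central 95% interval of the sampled ensemble.
lo = chain[int(0.025 * len(chain))]
hi = chain[int(0.975 * len(chain))]
print(f"95% interval for k: [{lo:.3f}, {hi:.3f}]")
```

The interval is read directly from the ensemble of accepted parameter values, so nonlinear dependence of the model on the parameter is reflected without a linearization step.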
In the pathogen P. aeruginosa, the formation of virulence factors is regulated via Quorum sensing signaling pathways. Due to the increasing number of strains that are resistant to antibiotics, there is great interest in developing novel anti-infectives. In combating resistant bacteria, selective blockade of the bacterial cell–to–cell communication (Quorum sensing) has gained special interest as an anti–virulence strategy. Here, we modeled the las, rhl, and pqs Quorum sensing systems by a multi–level logical approach to analyze how enzyme inhibitors and receptor antagonists affect the formation of autoinducers and virulence factors.
Our rule–based simulations reproduce the behavior expected from the literature with respect to the external level of autoinducers. In the presence of PqsBCD inhibitors, the external HHQ and PQS levels are indeed clearly reduced. The magnitude of this effect strongly depends on the inhibition level. However, it seems that the pyocyanin pathway is incomplete.
To match experimental observations, we suggest a modified network topology in which PqsE and PqsR act as receptors and an autoinducer acts as ligand, and these up–regulate pyocyanin in a concerted manner. While PQS biosynthesis is the more appropriate target for inhibiting HHQ and PQS formation, blocking the receptor PqsR, which regulates the biosynthesis, reduces the pyocyanin level more strongly.
Quorum sensing; Multi–level logical approach; Boolean network; Gene–regulatory network; Inhibitor; Pseudomonas aeruginosa; pqs system
Post-traumatic stress disorder (PTSD) is a severe anxiety disorder that affects a substantial portion of combat veterans and poses serious consequences to long-term health. Consequently, the identification of diagnostic and prognostic blood biomarkers for PTSD is of great interest. Previously, we assessed genome-wide gene expression of seven brain regions and whole blood in a social defeat mouse model subjected to various stress conditions.
To extract biological insights from these data, we have applied a new computational framework for identifying gene modules that are activated in common across blood and various brain regions. Our results, in the form of modular gene networks that highlight spatial and temporal biological functions, provide a systems-level molecular description of response to social stress. Specifically, the common modules discovered between the brain and blood emphasize molecular transporters in the blood-brain barrier, and the associated genes have significant overlaps with known blood signatures for PTSD, major depression, and bipolar disease. Similarly, the common modules specific to the brain highlight the components of the social defeat stress response (e.g., fear conditioning pathways) in each brain sub-region.
Many of the brain-specific genes discovered are consistent with previous independent studies of PTSD or other mental illnesses. The results from this study further our understanding of the mechanism of stress response and contribute to a growing list of diagnostic biomarkers for PTSD.
Despite clinical research and development in the last decades, infectious diseases remain a top global problem in public health today, being responsible for millions of morbidities and mortalities each year. Therefore, many studies have sought to investigate host-pathogen interactions from various viewpoints in attempts to understand pathogenic and defensive mechanisms, which could help control pathogenic infections. However, most of these efforts have focused predominately on the host or the pathogen individually rather than on a simultaneous analysis of both interaction partners.
In this study, with the help of simultaneously quantified time-course Candida albicans-zebrafish interaction transcriptomics and other omics data, a computational framework was developed to construct the interspecies protein-protein interaction (PPI) network for C. albicans-zebrafish interactions based on the inference of ortholog-based PPIs and the dynamic modeling of regulatory responses. The identified C. albicans-zebrafish interspecies PPI network highlights the association between C. albicans pathogenesis and the zebrafish redox process, indicating that redox status is critical in the battle between the host and pathogen.
Advancing from the single-species network construction method, the interspecies network construction approach allows further characterization and elucidation of the host-pathogen interactions. With continued accumulation of interspecies transcriptomics data, the proposed method could be used to explore progressive network rewiring over time, which could benefit the development of network medicine for infectious diseases.
Computational systems biology; Network construction; Host-pathogen interaction; Protein-protein interaction network; Infection; Multivariate dynamic modeling; Redox
Replacement of dysfunctional β-cells in the islets of Langerhans by transdifferentiation of pancreatic acinar cells has been proposed as a regenerative therapy for diabetes. Adult acinar cells spontaneously revert to a multipotent state upon tissue dissociation in vitro and can be stimulated to redifferentiate into β-cells. Despite accumulating evidence that contact-mediated signals are involved, the mechanisms regulating acinar-to-islet cell transdifferentiation remain poorly understood.
In this study, we propose that the crosstalk between two contact-mediated signaling mechanisms, lateral inhibition and lateral stabilization, controls cell fate stability and transdifferentiation of pancreatic cells. Analysis of a mathematical model combining gene regulation with contact-mediated signaling reveals the multistability of acinar and islet cell fates. Inhibition of one or both modes of signaling results in transdifferentiation from the acinar to the islet cell fate, either by dedifferentiation to a multipotent state or by direct lineage switching.
This study provides a theoretical framework to understand the role of contact-mediated signaling in pancreatic cell fate control that may help to improve acinar-to-islet cell transdifferentiation strategies for β-cell neogenesis.
Lineage conversion; Intercellular communication; Reprogramming; Pancreas; Acinar cells; Islet cells; Mathematical model; Multicellular systems biology
This paper presents a novel model for proliferating cell populations in labeling experiments. It is especially tailored to labeling with Bromodeoxyuridine (BrdU), which is taken up by dividing cells and thus accumulates with increasing division number during uplabeling. The study of the evolving label intensities of BrdU labeled cell populations is aimed at quantifying proliferation properties such as division and death rates.
In contrast to existing models, our model considers a labeling efficacy that follows a distribution, rather than a uniform value. It thereby makes it possible to account for noise as well as possibly space-dependent heterogeneity in the effective label uptake of the individual cells in a population. Furthermore, it enables more informative comparison with experimental data: the population-level label distribution is provided as a model output, thereby increasing the information content compared to existing models that give the fraction of labeled cells or the mean label intensity.
We employ our model to study some naturally arising examples of heterogeneity in label uptake, which are not covered by existing models. With simulations of noisy and spatially heterogeneous label uptake, we demonstrate that our model contributes a more realistic quantitative description of labeling experiments.
The presented model is to our knowledge the first one that predicts the full label distribution for BrdU labeling experiments. Thus, it can exploit more information, namely the full intensity distribution, from labeling measurements, and thereby opens up new quantitative insights into cell proliferation.
In bioprocess development, the needs of data analysis include (1) getting an overview of existing data sets, (2) identifying primary control parameters, (3) determining a useful control direction, and (4) planning future experiments. In particular, when multiple data sets are integrated, these needs cannot be properly addressed by regression models that assume a linear input-output relationship or unimodality of the response function. Regularized regression and random forests, on the other hand, have several properties that may prove important in this context. They are capable of, e.g., handling a small number of samples relative to the number of variables, selecting features, and visualizing response surfaces in order to present the prediction results in an illustrative way.
In this work, the applicability of regularized regression (Lasso) and random forests (RF) in bioprocess data mining was examined, and their performance was benchmarked against multiple linear regression. As an example, we used data from a culture media optimization study for microbial hydrogen production. All three methods were capable of providing a significant model when the five variables of the culture media optimization were linearly included in modeling. However, multiple linear regression failed when the products and squares of the variables were also included in modeling. In this case, the modeling was still successful with Lasso (correlation between the observed and predicted yield was 0.69) and RF (0.91).
We found that both regularized regression and random forests were able to produce feasible models, and the latter was efficient in capturing the non-linearity in the data. In this kind of bioprocess data mining task, both methods outperform multiple linear regression.
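As a rough illustration of why Lasso remains usable once products and squares of the variables are added, the following sketch expands a sample into quadratic features and fits an L1-penalized linear model by coordinate descent. It is a generic textbook formulation, not the implementation used in the study; the objective (1/(2n))·||y - Xb||² + lam·||b||₁, the function names, and the tiny orthogonal design are assumptions made for illustration.

```python
def expand_quadratic(row):
    """Append all products x_i * x_j (i <= j) to the linear terms."""
    p = len(row)
    products = [row[i] * row[j] for i in range(p) for j in range(i, p)]
    return list(row) + products


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


# Illustrative orthogonal design: only the first column influences y.
X = [[1.0, 1.0, 1.0],
     [-1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)
print(expand_quadratic([2.0, 3.0]))
```

The soft-thresholding step is what sets uninformative coefficients exactly to zero, which is the feature-selection property the abstract refers to.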
We explore whether the process of multimerization can be used as a means to regulate noise in the abundance of functional protein complexes. Additionally, we analyze how this process affects the mean level of these functional units, response time of a gene, and temporal correlation between the numbers of expressed proteins and of the functional multimers. We show that, although multimerization increases noise by reducing the mean number of functional complexes, it can reduce noise in comparison with a monomer when the abundances of the functional proteins are comparable. Alternatively, reduction in noise occurs if both monomeric and multimeric forms of the protein are functional. Moreover, we find that multimerization either increases the response time to external signals or decreases the correlation between number of functional complexes and protein production kinetics. Finally, we show that the results are in agreement with recent genome-wide assessments of cell-to-cell variability in protein numbers and of multimerization in essential and non-essential genes in Escherichia coli, and that the effects of multimerization are tangible at the level of genetic circuits.
Cancers are complex diseases arising from accumulated genetic mutations that disrupt intracellular signaling networks. While several predisposing genetic mutations have been found, these individual mutations account only for a small fraction of cancer incidence and mortality. With large-scale measurement technologies, such as single nucleotide polymorphism (SNP) microarrays, it is now possible to identify combinatorial effects that have significant impact on cancer patient survival.
The identification of synergetically functioning SNPs on a genome scale is a computationally daunting task and requires advanced algorithms. We introduce a novel algorithm, Geninter, to identify SNPs that have a synergetic effect on the survival of cancer patients. Using a large breast cancer cohort, we generate a simulator that allows assessing the reliability and accuracy of Geninter and the logrank test, which is a standard statistical method to integrate genetic and survival data.
Our results show that Geninter outperforms the logrank test and is able to identify SNP-pairs with synergetic impact on survival.
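The abstract does not describe how Geninter works, so only the baseline it is compared against can be sketched here. The following is a minimal two-group logrank test, assuming patients are split by some binary genotype criterion (for example, carrying a given SNP pair or not). The grouping, function names, and survival times are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time per patient
    events_* : True if the event was observed, False if censored
    """
    samples = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
               [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in samples if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for s, _, _ in samples if s >= t)
        at_risk_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        deaths = sum(1 for s, e, _ in samples if s == t and e)
        deaths_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        frac_a = at_risk_a / at_risk
        obs_minus_exp += deaths_a - deaths * frac_a
        if at_risk > 1:
            # Hypergeometric variance of the number of events in group A.
            variance += (deaths * frac_a * (1.0 - frac_a)
                         * (at_risk - deaths) / (at_risk - 1))
    if variance == 0.0:
        return 0.0, 1.0
    stat = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical survival times (all events observed), for illustration only.
stat, p_value = logrank_test([1, 2, 3, 4], [True] * 4,
                             [5, 6, 7, 8], [True] * 4)
same_stat, same_p = logrank_test([1, 2, 3], [True] * 3,
                                 [1, 2, 3], [True] * 3)
print(stat, p_value)
```

Testing every SNP pair this way requires one test per pair, which is one reason genome-scale searches for combinatorial effects become computationally demanding.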
Cancer is a broad group of genetic diseases which account for millions of deaths worldwide each year. Cancers are classified by various clinical, pathological and molecular methods, but even within a well-characterized disease, there is a significant inter-patient variability in survival, response to treatment, and other parameters. Especially at the molecular level, tumours of the same category can appear significantly dissimilar due to complex combinations of genetic aberrations leading to a similar malignancy. We extended the current classification methods by studying tumour heterogeneity at pathway level.
We computed the rate of alterations in 1994 pathways and 2210 tumours consisting of eight different cancers. Using gene set enrichment analysis, a pathway aberration profile reflecting its molecular state was computed for each sample. The profiles were analysed together to infer the characteristic aberration rates for each pathway within each cancer. Subgroups of tumours defined by similar pathway aberrations were identified using clustering analyses. The pathway aberration and gene expression profiles of the subgroups were subsequently compared across all eight cancer types to search for similar tumours crossing the standard classification.
We identified pathways and processes that were common to all cancers as well as traits that are unique to a cancer type or closely related cancers. Studying the gene expression patterns within the pathway context suggested potential alteration mechanisms. Clustering analysis revealed five clinically relevant subgroups of tumours in four cancers that exhibited significant differences in survival compared to others. The cross-cancer analysis of the subgroups resulted in the identification of tumours that shared potentially significant alterations.
This study represents the first effort to extend the molecular characterizations towards pathway level descriptions across the family of cancers. In addition to providing a proof-of-concept for single sample pathway aberration analysis in this context, we present a comprehensive pathway aberration dataset that can be used to study pathway aberration patterns within or across cancers. Significant similarities between subgroups of different cancers on pathway and gene expression levels provide interesting hypotheses for understanding variable drug response, or transferring treatments across diseases by identifying common druggable pathways or genes, for example.
Model development is a key task in systems biology, which typically starts from an initial model candidate and, involving an iterative cycle of hypotheses-driven model modifications, leads to new experimentation and subsequent model identification steps. The final product of this cycle is a satisfactory refined model of the biological phenomena under study. During such iterative model development, researchers frequently propose a set of model candidates from which the best alternative must be selected. Here we consider this problem of model selection and formulate it as a simultaneous model selection and parameter identification problem. More precisely, we consider a general mixed-integer nonlinear programming (MINLP) formulation for model selection and identification, with emphasis on dynamic models consisting of sets of either ODEs (ordinary differential equations) or DAEs (differential algebraic equations).
We solved the MINLP formulation for model selection and identification using an algorithm based on Scatter Search (SS). We illustrate the capabilities and efficiency of the proposed strategy with a case study considering the KdpD/KdpE system regulating potassium homeostasis in Escherichia coli. The proposed approach resulted in a final model that presents a better fit to the in silico generated experimental data.
The presented MINLP-based optimization approach for nested-model selection and identification is a powerful methodology for model development in systems biology. This strategy can be used to perform model selection and parameter estimation in one single step, thus greatly reducing the number of experiments and computations of traditional modeling approaches.
Dynamic modelling; Parameter estimation; Model discrimination; Global optimization
COnstraint-Based Reconstruction and Analysis (COBRA) methods are widely used for genome-scale modeling of metabolic networks in both prokaryotes and eukaryotes. Due to the successes with metabolism, there is an increasing effort to apply COBRA methods to reconstruct and analyze integrated models of cellular processes. The COBRA Toolbox for MATLAB is a leading software package for genome-scale analysis of metabolism; however, it was not designed to elegantly capture the complexity inherent in integrated biological networks and lacks an integration framework for the multiomics data used in systems biology. The openCOBRA Project is a community effort to promote constraints-based research through the distribution of freely available software.
Here, we describe COBRA for Python (COBRApy), a Python package that provides support for basic COBRA methods. COBRApy is designed in an object-oriented fashion that facilitates the representation of the complex biological processes of metabolism and gene expression. COBRApy does not require MATLAB to function; however, it includes an interface to the COBRA Toolbox for MATLAB to facilitate use of legacy codes. For improved performance, COBRApy includes parallel processing support for computationally intensive processes.
COBRApy is an object-oriented framework designed to meet the computational challenges associated with the next generation of stoichiometric constraint-based models and high-density omics data sets.
Genome-scale; Network reconstruction; Metabolism; Gene expression; Constraint-based modeling
Model selection and parameter inference are complex problems that have yet to be fully addressed in systems biology. In contrast with parameter optimisation, parameter inference computes both the parameter means and their standard deviations (or full posterior distributions), thus yielding important information on the extent to which the data and the model topology constrain the inferred parameter values.
We report on the application of nested sampling, a statistical approach to computing the Bayesian evidence Z, to the inference of parameters, and the estimation of log Z in an established model of circadian rhythms. A ten-fold difference in the coefficient of variation between degradation and transcription parameters is demonstrated. We further show that the uncertainty remaining in the parameter values is reduced by the analysis of increasing numbers of circadian cycles of data, up to 4 cycles, but is unaffected by sampling the data more frequently. Novel algorithms for calculating the likelihood of a model, and a characterisation of the performance of the nested sampling algorithm are also reported. The methods we develop considerably improve the computational efficiency of the likelihood calculation, and of the exploratory step within nested sampling.
We have demonstrated in an exemplar circadian model that the estimates of posterior parameter densities (as summarised by parameter means and standard deviations) are influenced predominately by the length of the time series, becoming more narrowly constrained as the number of circadian cycles considered increases. We have also shown the utility of the coefficient of variation for discriminating between highly-constrained and less-well constrained parameters.
Model selection; Parameter inference; Nested sampling; Circadian rhythms
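To make the idea of estimating log Z by nested sampling concrete, here is a minimal sketch on a toy problem rather than the circadian model of the abstract. It assumes a single parameter with a uniform prior on [-5, 5] and a standard normal likelihood, so the true evidence is close to 0.1. New live points are drawn by plain rejection sampling from the prior, which is only practical for such a toy case and is not the exploratory step developed by the authors. All names are assumptions.

```python
import math
import random


def log_likelihood(x):
    """Toy likelihood: standard normal density in one parameter."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def nested_sampling(n_live=100, n_iter=600, lower=-5.0, upper=5.0, seed=1):
    """Estimate log Z for a uniform prior on [lower, upper]."""
    rng = random.Random(seed)
    live = [rng.uniform(lower, upper) for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]
    evidence = 0.0
    x_prev = 1.0  # remaining prior mass, starts at the full prior
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        threshold = live_logl[worst]
        # Expected shrinkage of the prior mass after i replacements.
        x_i = math.exp(-i / n_live)
        evidence += math.exp(threshold) * (x_prev - x_i)
        x_prev = x_i
        # Replace the worst point by a prior draw with higher likelihood.
        while True:
            candidate = rng.uniform(lower, upper)
            if log_likelihood(candidate) > threshold:
                break
        live[worst] = candidate
        live_logl[worst] = log_likelihood(candidate)
    # Contribution of the live points that remain at termination.
    evidence += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return math.log(evidence)


log_z = nested_sampling()
print(f"estimated log Z = {log_z:.3f}, analytic value = {math.log(0.1):.3f}")
```

The discarded points, weighted by their share of prior mass, can also be reused as posterior samples, which is how parameter means and standard deviations are obtained from the same run.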
The dynamics of gene regulation play a crucial role in cellular control, allowing the cell to express the right proteins to meet changing needs. Some needs, such as correctly anticipating the day-night cycle, require complicated oscillatory features. In the analysis of gene regulatory networks, mathematical models are frequently used to understand how a network’s structure enables it to respond appropriately to external inputs. These models typically consist of a set of ordinary differential equations, describing a network of biochemical reactions, and unknown kinetic parameters, chosen such that the model best captures experimental data. However, since a model’s parameter values are uncertain, and since dynamic responses to inputs are highly parameter-dependent, it is difficult to assess the confidence associated with these in silico predictions. In particular, models with complex dynamics - such as oscillations - must be fit with computationally expensive global optimization routines, and cannot take advantage of existing measures of identifiability. Despite being difficult to model mathematically, limit cycle oscillations play a key role in many biological processes, including cell cycling, metabolism, neuron firing, and circadian rhythms.
In this study, we employ an efficient parameter estimation technique to enable a bootstrap uncertainty analysis for limit cycle models. Since the primary role of systems biology models is the insight they provide on responses to rate perturbations, we extend our uncertainty analysis to include first order sensitivity coefficients. Using a literature model of circadian rhythms, we show how predictive precision is degraded with decreasing sample points and increasing relative error. Additionally, we show how this method can be used for model discrimination by comparing the output identifiability of two candidate model structures to published literature data.
Our method permits modellers of oscillatory systems to confidently show that a model’s dynamic characteristics follow directly from experimental data and model structure, relaxing assumptions on the particular parameters chosen. Ultimately, this work highlights the importance of continued collection of high-resolution data on gene and protein activity levels, as they allow the development of predictive mathematical models.
Bootstrap; Identifiability; Oscillatory models; Circadian rhythms; Sensitivity analysis; Parameter estimation
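The bootstrap uncertainty analysis described above can be sketched on a much simpler stand-in than a limit cycle ODE model. The example below assumes a cosine with a known 24 h period and fits only its amplitude by least squares, then resamples the residuals to obtain a percentile interval. The oscillator, the synthetic data, and all names are assumptions for illustration; the authors' estimation technique is not reproduced here.

```python
import math
import random

PERIOD = 24.0  # hours; assumed known in this toy example


def basis(t):
    return math.cos(2.0 * math.pi * t / PERIOD)


def fit_amplitude(times, values):
    """Least-squares amplitude A for the model y(t) = A * cos(2 pi t / PERIOD)."""
    b = [basis(t) for t in times]
    return sum(bi * v for bi, v in zip(b, values)) / sum(bi * bi for bi in b)


def bootstrap_amplitude(times, values, n_boot=2000, seed=0):
    """Residual bootstrap: refit on fitted values plus resampled residuals."""
    rng = random.Random(seed)
    a_hat = fit_amplitude(times, values)
    fitted = [a_hat * basis(t) for t in times]
    residuals = [v - f for v, f in zip(values, fitted)]
    estimates = []
    for _ in range(n_boot):
        resampled = [f + rng.choice(residuals) for f in fitted]
        estimates.append(fit_amplitude(times, resampled))
    estimates.sort()
    return a_hat, estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]


# Synthetic hourly data over two cycles: true amplitude 2.0, noise SD 0.3.
data_rng = random.Random(42)
times = list(range(48))
values = [2.0 * basis(t) + data_rng.gauss(0.0, 0.3) for t in times]

a_hat, lo, hi = bootstrap_amplitude(times, values)
print(f"amplitude = {a_hat:.3f}, 95% bootstrap interval = [{lo:.3f}, {hi:.3f}]")
```

Each bootstrap replicate requires a full refit, which is why an efficient parameter estimation routine matters once the model is a nonlinear oscillator rather than a one-parameter curve.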
Time-course microarray experiments have been widely used to identify cell cycle regulated genes. However, the method is not effective for lowly expressed genes and is sensitive to experimental conditions. To complement microarray experiments, we propose a computational method to predict cell cycle regulated genes based on their genomic features – transcription factor binding and motif profiles.
Through integrating gene-expression data with ChIP-chip binding and putative binding sites of transcription factors, our method shows high accuracy in discriminating yeast cell cycle regulated genes from non-cell cycle regulated ones. We predict 211 novel cell cycle regulated genes. Our model rediscovers the main cell cycle transcription factors and provides new insights into the regulatory mechanisms. The model also reveals a regulatory circuit mediated by a number of key cell cycle regulators.
Our model suggests that the periodic pattern of cell cycle genes is largely encoded in their promoter regions, which can be captured by motif and transcription factor binding data. The cell cycle is controlled by a relatively small number of master transcription factors. The genomic-feature-based approach can be readily extended to the human cell cycle and to other transcriptionally regulated processes, such as tissue-specific expression.
Cell cycle regulated genes; Genomic features; Prediction
Influenza infection causes respiratory disease that can lead to death. The complex interplay between virus-encoded and host-specific pathogenicity regulators – and the relative contributions of each toward viral pathogenicity – is not well-understood.
By analyzing a collection of lung samples from mice infected by A/Vietnam/1203/2004 (H5N1; VN1203), we characterized a signature of transcripts and proteins associated with the kinetics of the host response. Using a new geometrical representation method and two criteria, we show that inoculation concentrations and four specific mutations in VN1203 mainly impact the magnitude and velocity of the host response kinetics, rather than specific sets of up- and down-regulated genes. We observed analogous kinetic effects using lung samples from mice infected with A/California/04/2009 (H1N1), and we show that these effects correlate with morbidity and viral titer.
We have demonstrated the importance of the kinetics of the host response to H5N1 pathogenesis and its relationship with clinical disease severity and virus replication. These kinetic properties imply that time-matched comparisons of ‘omics profiles from viral infections give only a limited view when differentiating host responses. Moreover, these results demonstrate that a fast activation of the host response at the earliest time points post-infection is critical for protective mechanisms against fast replicating viruses.
Influenza; Host; Response; Kinetics; Magnitude; Velocity; Transcriptomics; Proteomics; Multidimensional; Scaling
Microarray experiments can simultaneously identify thousands of genes that show significant perturbation in expression between two experimental conditions. Response networks, computed through the integration of gene interaction networks with expression perturbation data, may themselves contain tens of thousands of interactions. Gene set enrichment has become standard for summarizing the results of these analyses in terms of functionally coherent collections of genes such as biological processes. However, even these methods can yield hundreds of enriched functions that may overlap considerably.
We describe a new technique called Markov chain Monte Carlo Biological Process Networks (MCMC-BPN) capable of reporting a highly non-redundant set of links between processes that describe the molecular interactions that are perturbed under a specific biological context. Each link in the BPN represents the perturbed interactions that serve as the interfaces between the two processes connected by the link.
We apply MCMC-BPN to publicly available liver-related datasets to demonstrate that the networks formed by the most probable inter-process links reported by MCMC-BPN show high relevance to each biological condition. We demonstrate MCMC-BPN’s ability to discern the few key links in a very large solution space by comparing its results with those of two other methods for detecting inter-process links.
MCMC-BPN is successful in using few inter-process links to explain as many of the perturbed gene-gene interactions as possible. Thereby, BPNs summarize the important biological trends within a response network by reporting a digestible number of inter-process links that can be explored in greater detail.
Molecular interaction networks; Gene expression data; Networks of biological processes; Data integration; Markov chain Monte Carlo
Apoptosis is a tightly regulated process: cellular survive-or-die decisions cannot be accidental and must be unambiguous. Since the suicide program may be initiated in response to numerous stress stimuli, signals transmitted through a number of checkpoints have to be eventually integrated.
In order to analyze possible mechanisms of the integration of multiple pro-apoptotic signals, we constructed a simple model of the Bcl-2 family regulatory module. The module collects upstream signals and processes them into life-or-death decisions by employing interactions between proteins from three subgroups of the Bcl-2 family: pro-apoptotic multidomain effectors, pro-survival multidomain restrainers, and pro-apoptotic single domain BH3-only proteins. Although the model is based on ordinary differential equations (ODEs), it demonstrates that the Bcl-2 family module behaves akin to a Boolean logic gate of the type dependent on levels of BH3-only proteins (represented by Bad) and restrainers (represented by Bcl-xL). A low level of pro-apoptotic Bad or a high level of pro-survival Bcl-xL implies gate AND, which allows for the initiation of apoptosis only when two stress stimuli are simultaneously present: the rise of the p53 killer level and dephosphorylation of kinase Akt. In turn, a high level of Bad or a low level of Bcl-xL implies gate OR, for which any of these stimuli suffices for apoptosis.
Our study sheds light on possible signal integration mechanisms in cells, and spans a bridge between modeling approaches based on ODEs and on Boolean logic. In the proposed scheme, logic gate switching results from the change of relative abundances of interacting proteins in response to signals and involves system bistability. Consequently, the regulatory system may process two analog inputs into a digital survive-or-die decision.
Apoptosis; Cell survival; Signaling pathway; Bcl-2 family; Bistability; Boolean logic; Ordinary differential equations
Integrative and comparative analyses of multiple transcriptomics, proteomics and metabolomics datasets require an intensive knowledge of tools and background concepts. Thus, it is challenging for users to perform such analyses, highlighting the need for a single tool for such purposes. The 3Omics one-click web tool was developed to visualize and rapidly integrate multiple human inter- or intra-transcriptomic, proteomic, and metabolomic data by combining five commonly used analyses: correlation networking, coexpression, phenotyping, pathway enrichment, and GO (Gene Ontology) enrichment.
3Omics generates inter-omic correlation networks to visualize relationships in data with respect to time or experimental conditions for all transcripts, proteins and metabolites. If only two of three omics datasets are input, then 3Omics supplements the missing transcript, protein or metabolite information related to the input data by text-mining the PubMed database. 3Omics’ coexpression analysis assists in revealing functions shared among different omics datasets. 3Omics’ phenotype analysis integrates Online Mendelian Inheritance in Man with available transcript or protein data. Pathway enrichment analysis on metabolomics data by 3Omics reveals enriched pathways in the KEGG/HumanCyc database. 3Omics performs statistical Gene Ontology-based functional enrichment analyses to display significantly overrepresented GO terms in transcriptomic experiments. Although the principal application of 3Omics is the integration of multiple omics datasets, it is also capable of analyzing individual omics datasets. The information obtained from the analyses of 3Omics in Case Studies 1 and 2 is also in accordance with comprehensive findings in the literature.
3Omics incorporates the advantages and functionality of existing software into a single platform, thereby simplifying data analysis and enabling the user to perform a one-click integrated analysis. Visualization and analysis results are downloadable for further user customization and analysis. The 3Omics software can be freely accessed at http://3omics.cmdm.tw.
Visualization; Omics integration; Systems biology; Transcriptomics; Proteomics; Metabolomics; Analysis
Filopodia are small cellular projections that help cells to move through and sense their environment. Filopodia play crucial roles in processes such as development and wound-healing. Also, increases in filopodia number or size are characteristic of many invasive cancers and are correlated with increased rates of metastasis in mouse experiments. Thus, one possible route to developing anti-metastatic therapies is to target factors that influence the filopodia system. Filopodia can be detected by eye using confocal fluorescence microscopy, and they can be manually annotated in images to quantify filopodia parameters. Although this approach is accurate, it is slow, tedious and not entirely objective. Manual detection is a significant barrier to the discovery and quantification of new factors that influence the filopodia system.
Here, we present FiloDetect, an automated tool for detecting, counting and measuring the length of filopodia in fluorescence microscopy images. The method first segments the cell from the background, using a modified triangle threshold method, and then extracts the filopodia using a series of morphological operations. We verified the accuracy of FiloDetect on Rat2 and B16F1 cell images from three different labs, showing that per-cell filopodia counts and length estimates are highly correlated with the manual annotations. We then used FiloDetect to assess the role of a lipid kinase in filopodia production in breast cancer cells. Experimental results show that PI4KIIIβ expression leads to an increase in filopodia number and length, suggesting that PI4KIIIβ is involved in driving filopodia production.
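The segmentation step can be sketched as follows. This is the standard triangle threshold in a simplified form, not the modified variant used by FiloDetect, whose details the abstract does not give. The function names and the tiny 10-level image are hypothetical, and anchoring the line at the last non-empty bin is one of several common conventions.

```python
def histogram(image, n_levels):
    """Count how many pixels fall on each integer intensity level."""
    hist = [0] * n_levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def triangle_threshold(hist):
    """Pick the bin lying farthest below the line from the peak to the tail end."""
    peak = max(range(len(hist)), key=lambda i: hist[i])
    nonzero = [i for i, h in enumerate(hist) if h > 0]
    first, last = nonzero[0], nonzero[-1]
    # Work on the longer tail of the histogram (assumption: the foreground
    # occupies far fewer pixels than the background peak).
    end = last if (last - peak) >= (peak - first) else first
    if end == peak:
        return peak
    step = 1 if end > peak else -1
    slope = (hist[end] - hist[peak]) / (end - peak)
    best_bin, best_gap = peak, 0.0
    for i in range(peak, end + step, step):
        line_y = hist[peak] + slope * (i - peak)
        # The vertical gap is proportional to the perpendicular distance,
        # so maximizing it selects the same bin.
        gap = line_y - hist[i]
        if gap > best_gap:
            best_bin, best_gap = i, gap
    return best_bin


def segment(image, threshold):
    """Binary mask: True where a pixel is brighter than the threshold."""
    return [[value > threshold for value in row] for row in image]


# Hypothetical image: dark background (levels 1-2), bright cell (levels 5-9).
image = [
    [1, 1, 1, 2],
    [1, 1, 5, 9],
    [1, 2, 6, 9],
]
threshold = triangle_threshold(histogram(image, 10))
mask = segment(image, threshold)
```

The morphological operations that extract the filopodia would then act on `mask`. They are omitted here because the abstract does not specify them.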
FiloDetect provides accurate and objective quantification of filopodia in microscopy images, and will enable large scale comparative studies to assess the effects of different genetic and chemical perturbations on filopodia production in different cell types, including cancer cell lines.
Filopodia; Morphology; FiloDetect; Microscopy image
Apoptosis is a cell suicide mechanism that enables multicellular organisms to maintain homeostasis and to eliminate individual cells that threaten the organism’s survival. Depending on the type of stimulus, apoptosis can be propagated by the extrinsic or the intrinsic pathway. A comprehensive understanding of the molecular mechanisms of apoptotic signaling allows the development of mathematical models that aim to elucidate the dynamical and systems properties of apoptotic signaling networks. There have been extensive efforts to model apoptosis networks deterministically, accounting for the average behavior of a population of cells. Cellular networks, however, are inherently stochastic, and significant cell-to-cell variability in the apoptosis response has been observed at the single-cell level.
To address the inevitable randomness in the intrinsic apoptosis mechanism, we develop a theoretical and computational modeling framework of the intrinsic apoptosis pathway at the single-cell level, accounting for both deterministic and stochastic behavior. Our deterministic model, adapted from the well-accepted Fussenegger model, shows that an additional positive feedback between the executioner caspase and the initiator caspase plays a fundamental role in yielding the desired property of bistability. We then examine the impact of intrinsic fluctuations of biochemical reactions, viewed as intrinsic noise, and natural variation of protein concentrations, viewed as extrinsic noise, on the behavior of the intrinsic apoptosis network. Histograms of the steady-state output at varying input levels show that intrinsic noise can elicit a wider region of bistability than that of the deterministic model. However, the stochasticity due to intrinsic fluctuations, such as the noise of the steady-state response and the randomness of the response delay, shows that intrinsic noise is in general insufficient to produce significant cell-to-cell variation at physiologically relevant molecular numbers. Furthermore, the extrinsic noise, represented by random variations of two key apoptotic proteins, namely cytochrome c and inhibitor of apoptosis proteins (IAP), is modeled separately or in combination with intrinsic noise. The resulting stochasticity in the timing of the intrinsic apoptosis response shows that fluctuating protein levels can induce cell-to-cell variability at a quantitative level that agrees with experiments. Finally, simulations illustrate that the mean abundance of the fluctuating IAP protein is positively correlated with the degree of cellular stochasticity of the intrinsic apoptosis pathway.
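The distinction between intrinsic and extrinsic noise can be made concrete with a toy simulation. The abstract does not name its simulation algorithm or give its reaction network, so the sketch below assumes the Gillespie stochastic simulation algorithm applied to a single hypothetical birth-death species. Intrinsic noise comes from the random timing of reactions; extrinsic noise is imitated by drawing a different production rate for each cell. All rate constants are invented for illustration.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, rng):
    """Simulate 0 -> X (rate k_prod) and X -> 0 (rate k_deg * n) until t_end."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod, a_deg = k_prod, k_deg * n
        a_total = a_prod + a_deg
        if a_total == 0:
            break  # no reaction can fire any more
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


rng = random.Random(0)

# Intrinsic noise only: every cell shares the same rate constants.
intrinsic = [gillespie_birth_death(10.0, 0.1, 0, 50.0, rng)[1][-1]
             for _ in range(5)]

# Intrinsic plus extrinsic noise: the production rate varies from cell to cell.
combined = []
for _ in range(5):
    k_cell = 10.0 * rng.lognormvariate(0.0, 0.3)  # hypothetical spread
    combined.append(gillespie_birth_death(k_cell, 0.1, 0, 50.0, rng)[1][-1])

print("final counts, intrinsic only:", intrinsic)
print("final counts, with extrinsic variation:", combined)
```

Comparing the spread of final counts between the two groups over many cells gives a rough impression of how much variability each noise source contributes in this toy setting.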
Our theoretical and computational study shows that the pronounced non-genetic heterogeneity in intrinsic apoptosis responses among individual cells plausibly arises from extrinsic rather than intrinsic fluctuations. In addition, it predicts that the IAP protein could serve as a potential therapeutic target for suppressing cell-to-cell variation in intrinsic apoptosis responsiveness.
Intrinsic apoptosis pathway; Stochastic model; Intrinsic noise; Extrinsic noise
Despite the close association between gene expression and metabolism, experimental evidence shows that gene expression levels alone cannot predict metabolic phenotypes, indicating a knowledge gap in our understanding of how these processes are connected. Here, we present a method that integrates transcriptome, fluxome, and metabolome data using kinetic models to create a mechanistic link between gene expression and metabolism.
We developed a modeling framework to construct kinetic models that connect the transcriptional and metabolic responses of a cell to exogenous perturbations. The framework allowed us to avoid extensive experimental characterization, literature mining, and optimization problems by estimating most model parameters directly from fluxome and transcriptome data. We applied the framework to investigate how gene expression changes led to observed phenotypic alterations of Saccharomyces cerevisiae treated with weak organic acids (i.e., acetate, benzoate, propionate, or sorbate) and the histidine synthesis inhibitor 3-aminotriazole under steady-state conditions. We found that the transcriptional response led to alterations in yeast metabolism that mimicked measured metabolic fluxes and concentration changes. Further analyses generated mechanistic insights of how S. cerevisiae responds to these stresses. In particular, these results suggest that S. cerevisiae uses different regulation strategies for responding to these insults: regulation of two reactions accounted for most of the tolerance to the four weak organic acids, whereas the response to 3-aminotriazole was distributed among multiple reactions. Moreover, we observed that the magnitude of the gene expression changes was not directly correlated with their effect on the ability of S. cerevisiae to grow under these treatments. In addition, we identified another potential mechanism of action of 3-aminotriazole associated with the depletion of tetrahydrofolate.
Our simulation results show that the modeling framework provided an accurate mechanistic link between gene expression and cellular metabolism. The proposed method allowed us to integrate transcriptome, fluxome, and metabolome data to determine and interpret important features of the physiological response of yeast to stresses. Importantly, given its flexibility and robustness, our approach can be applied to investigate the transcriptional-metabolic response in other cellular systems of medical and industrial relevance.
Gene expression; Kinetic models; Metabolic networks; S. cerevisiae; Transcriptomics; Fluxomics; Metabolomics
The study of metabolism has attracted much attention in recent years due to its relevance in various diseases. Advances in metabolomics platforms allow us to detect an increasing number of metabolites at abnormally high or low concentrations in a disease phenotype. Finding a mechanistic interpretation for these alterations is important for understanding pathophysiological processes, but it is not an easy task. The availability of genome-scale metabolic networks and Systems Biology techniques opens new avenues to address this question.
In this article we present a novel mathematical framework to find enzymes whose malfunction explains the accumulation or depletion of a given metabolite in a disease phenotype. Our approach is based on a recently introduced pathway concept termed Carbon Flux Paths (CFPs), which extends the classical topological definition by including network stoichiometry. Using CFPs, we determine the Connectivity Curve of an altered metabolite, which allows us to quantify changes in its pathway structure when a certain enzyme is removed. The influence of each enzyme removal is then ranked and used to explain the accumulation or depletion of the metabolite. For illustration, we focus on the accumulation of two metabolites (L-Cystine and Homocysteine) found in high concentration in the brains of patients with mental disorders. We compared our results with the literature and found good agreement with previously reported mechanisms. In addition, we hypothesize a novel role for several enzymes in the accumulation of these metabolites, which opens new strategies to understand the metabolic processes underlying these diseases.
With personalized medicine on the horizon, metabolomic platforms are providing us with a vast amount of experimental data for a number of complex diseases. Our approach provides a novel apparatus to rationally investigate and understand metabolite alterations under disease phenotypes. This work contributes to the development of Systems Medicine, whose objective is to answer clinical questions based on theoretical methods and high-throughput “omics” data.
Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model called the protein overlap network (PON) for the entire genome of each of four species: yeast, fly, worm, and human. Each node of the network represents a protein, and two proteins are connected if they share a domain according to the InterPro database.
The function of a protein can be predicted by counting the occurrence frequency of GO (gene ontology) terms associated with domains of direct neighbors. The average success rate and coverage were 34.3% and 43.9%, respectively, for the test genomes, and were increased to 37.9% and 51.3% when a composite PON of the four species was used for the prediction. As a comparison, the success rate was 7.0% in the random control procedure. We also made predictions with GO term annotations of the second layer nodes using the composite network and obtained an impressive success rate (>30%) and coverage (>30%), even for small genomes. Further improvement was achieved by statistical analysis of manually annotated GO terms for each neighboring protein.
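The network construction and the neighbor-counting prediction described above can be sketched in a few lines. The protein names, domain identifiers and GO identifiers below are placeholders, not InterPro or Gene Ontology entries, and the tie-breaking rule and `top_n` cutoff are assumptions, since the abstract does not say how the ranked terms are turned into a final prediction.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_pon(protein_domains):
    """Connect two proteins whenever they share at least one domain."""
    by_domain = defaultdict(set)
    for protein, domains in protein_domains.items():
        for domain in domains:
            by_domain[domain].add(protein)
    neighbors = {protein: set() for protein in protein_domains}
    for members in by_domain.values():
        for a, b in combinations(sorted(members), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def predict_go_terms(protein, neighbors, protein_domains, domain_go, top_n=1):
    """Rank GO terms by how often they occur among domains of direct neighbors."""
    counts = Counter()
    for neighbor in neighbors[protein]:
        for domain in protein_domains[neighbor]:
            counts.update(domain_go.get(domain, ()))
    # Most frequent first; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Placeholder data: P1 has one domain (D1) that carries no GO annotation.
protein_domains = {
    "P1": {"D1"},
    "P2": {"D1", "D2"},
    "P3": {"D1", "D2"},
    "P4": {"D1", "D3"},
    "P5": {"D4"},
}
domain_go = {
    "D2": ["GO:0000002"],
    "D3": ["GO:0000003"],
    "D4": ["GO:0000004"],
}

pon = build_pon(protein_domains)
prediction = predict_go_terms("P1", pon, protein_domains, domain_go)
print(prediction)
```

Here P1 inherits the term attached to D2 because two of its three neighbors carry that domain, whereas the isolated protein P5 has no neighbors and receives no prediction.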
The PONs are composed of dense modules accompanied by a few long distance connections. Based on the PONs, we developed multiple approaches effective for protein function prediction.
Protein overlap network; Protein function prediction; Composite network; Functional genomics
The development of ovarian follicles hinges on the timely exposure to the appropriate combination of hormones. Follicle stimulating hormone (FSH) and luteinizing hormone (LH) are both produced in the pituitary gland and are transported via the blood circulation to the thecal layer surrounding the follicle. From there both hormones are transported into the follicle by diffusion. FSH-receptors are expressed mainly in the granulosa while LH-receptors are expressed in a gradient with highest expression in the theca. How this spatial organization is achieved is not known. Equally it is not understood whether LH and FSH trigger distinct signalling programs or whether the distinct spatial localization of their G-protein coupled receptors is sufficient to convey their distinct biological function.
We have developed a data-based computational model of the spatio-temporal signalling processes within the follicle and (i) predict that FSH and LH form a gradient inside the follicle, (ii) show that the spatial distribution of FSH- and LH-receptors can arise from the well known regulatory interactions, and (iii) find that the differential activity of FSH and LH may well result from the distinct spatial localisation of their receptors, even when both receptors respond with the same intracellular signalling cascade to their ligand.
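To see why diffusion combined with removal of the hormone produces a gradient, consider a deliberately simplified one-dimensional picture. It is not the model from the paper, whose equations are not given in the abstract. Assume a hormone enters at the thecal boundary at fixed concentration `c0`, diffuses with coefficient `D`, is removed at a linear rate `k` (for example by receptor binding), and cannot leave at the follicle centre. The steady state of D c'' = k c then has the closed form c(x) = c0 cosh((L - x)/λ) / cosh(L/λ) with λ = sqrt(D/k). All parameter values below are hypothetical.

```python
from math import cosh, sqrt


def steady_state_profile(c0, diffusion, removal_rate, depth, n_points=6):
    """Steady-state concentration from the boundary (x = 0) to the centre (x = depth)."""
    decay_length = sqrt(diffusion / removal_rate)  # lambda = sqrt(D / k)
    profile = []
    for i in range(n_points):
        x = depth * i / (n_points - 1)
        c = c0 * cosh((depth - x) / decay_length) / cosh(depth / decay_length)
        profile.append((x, c))
    return profile


# Hypothetical values in arbitrary units.
profile = steady_state_profile(c0=1.0, diffusion=10.0, removal_rate=0.1, depth=20.0)
for x, c in profile:
    print(f"x = {x:5.1f}   c = {c:.3f}")
```

The shorter the decay length λ is relative to the tissue depth, the steeper the gradient. With λ much larger than the depth, the profile is nearly flat.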
The model integrates the large amount of published data into a consistent framework that can now be used to better understand how observed defects translate into failed follicle maturation.
Ovarian follicle development; PDE model; Computational biology; Bovine