Cellular decision-making is mediated by a complex interplay of external stimuli with the intracellular environment, in particular transcription factor regulatory networks. Here we have determined the expression of a network of 18 key haematopoietic transcription factors (TFs) in 597 single primary blood stem and progenitor cells isolated from mouse bone marrow. We demonstrate that different stem/progenitor populations are characterised by distinctive TF expression states, and through comprehensive bioinformatic analysis reveal positively and negatively correlated TF pairings, including previously unrecognised relationships between Gata2, Gfi1 and Gfi1b. Validation using transcriptional and transgenic assays confirmed direct regulatory interactions consistent with a regulatory triad in immature blood stem cells, where Gata2 may function to modulate cross-inhibition between Gfi1 and Gfi1b. Single cell expression profiling therefore identifies network states and allows reconstruction of network hierarchies involved in controlling stem cell fate choices, and provides a blueprint for studying both normal development and human disease.
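Positively and negatively correlated TF pairs of this kind can be screened with a rank correlation computed over all cells. The sketch below is a minimal illustration. It assumes per-cell expression values without ties and a hypothetical cut-off of |ρ| ≥ 0.5. The abstract does not say which correlation measure or threshold was used, and the TF names and values are placeholders.

```python
import math


def ranks(values):
    """1-based ranks; assumes distinct values (no tie correction)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = float(rank)
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlated_pairs(expr, threshold=0.5):
    """expr maps TF name -> per-cell expression values (same cell order for every TF).
    Returns (tf1, tf2, rho, sign) for every pair with |rho| >= threshold."""
    names = sorted(expr)
    hits = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = spearman(expr[a], expr[b])
            if abs(rho) >= threshold:
                hits.append((a, b, rho, "positive" if rho > 0 else "negative"))
    return hits


# Hypothetical expression values for three placeholder TFs in five cells
expr = {
    "TF_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "TF_B": [2.0, 3.0, 5.0, 7.0, 11.0],
    "TF_C": [9.0, 7.0, 4.0, 2.0, 1.0],
}
pairs = correlated_pairs(expr)
```

Rank-based correlation was chosen here because it does not assume linear relationships between expression levels. Real single-cell qPCR data would additionally need a rule for non-detected transcripts.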
Due to the high complexity of biological data, it is difficult to disentangle cellular processes by relying only on intuitive interpretation of measurements. A Systems Biology approach that combines quantitative experimental data with dynamic mathematical modeling promises to yield deeper insights into these processes. Nevertheless, with growing complexity and increasing amounts of quantitative experimental data, building realistic and reliable mathematical models can become a challenging task: the quality of experimental data has to be assessed objectively, unknown model parameters need to be estimated from the experimental data, and numerical calculations need to be precise and efficient.
Here, we discuss, compare and characterize the performance of computational methods throughout the process of quantitative dynamic modeling using two previously established examples, for which quantitative, dose- and time-resolved experimental data are available. In particular, we present an approach that determines the quality of experimental data in an efficient, objective and automated manner. Using this approach, data generated by different measurement techniques, and even data from single replicates, can be reliably used for mathematical modeling. For the estimation of unknown model parameters, we systematically compared the performance of different optimization algorithms. Our results show that deterministic derivative-based optimization employing the sensitivity equations, in combination with a multi-start strategy based on Latin hypercube sampling, outperforms the other methods by orders of magnitude in accuracy and speed. Finally, we investigated transformations that yield a more efficient parameterization of the model and therefore further enhance optimization performance. We provide a freely available open-source software package that implements the algorithms and examples compared here.
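The winning combination named above can be illustrated on a toy problem. This is a minimal sketch, not the authors' software package. It makes the following assumptions:

- a hypothetical two-parameter model y(t) = a·exp(−kt), whose sensitivities are available in closed form (for ODE models they would be obtained by integrating the sensitivity equations alongside the states);
- Levenberg-Marquardt as the derivative-based local optimizer;
- starting points drawn by Latin hypercube sampling.

```python
import math
import random


def latin_hypercube(n_samples, bounds, rng):
    """Split every dimension into n_samples equal strata, draw one point per stratum,
    and shuffle the strata independently per dimension."""
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n_samples
        column = [lo + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def residuals_and_sensitivities(theta, ts, ys):
    """Residuals of y(t) = a * exp(-k t) and the sensitivities dy/da, dy/dk."""
    a, k = theta
    res, sens = [], []
    for t, y in zip(ts, ys):
        e = math.exp(-k * t)
        res.append(a * e - y)
        sens.append((e, -a * t * e))
    return res, sens


def levenberg_marquardt(theta, ts, ys, max_iter=200, lam=1e-3):
    res, sens = residuals_and_sensitivities(theta, ts, ys)
    cost = sum(r * r for r in res)
    for _ in range(max_iter):
        # Gradient J^T r and Gauss-Newton matrix J^T J, both built from the sensitivities
        g0 = sum(s[0] * r for s, r in zip(sens, res))
        g1 = sum(s[1] * r for s, r in zip(sens, res))
        h00 = sum(s[0] * s[0] for s in sens)
        h01 = sum(s[0] * s[1] for s in sens)
        h11 = sum(s[1] * s[1] for s in sens)
        a00, a11 = h00 * (1 + lam), h11 * (1 + lam)  # Marquardt damping of the diagonal
        det = a00 * a11 - h01 * h01
        if det <= 0:
            break
        step = (-(a11 * g0 - h01 * g1) / det, -(a00 * g1 - h01 * g0) / det)
        cand = (theta[0] + step[0], theta[1] + step[1])
        try:
            cand_res, cand_sens = residuals_and_sensitivities(cand, ts, ys)
            cand_cost = sum(r * r for r in cand_res)
        except OverflowError:
            cand_cost = math.inf
        if cand_cost < cost:  # accept the step, move towards Gauss-Newton
            theta, res, sens, cost = cand, cand_res, cand_sens, cand_cost
            lam = max(lam * 0.3, 1e-12)
        else:  # reject the step, move towards gradient descent
            lam *= 10
            if lam > 1e12:
                break
    return theta, cost


def multi_start_fit(ts, ys, bounds, n_starts=10, seed=0):
    """Run the local optimizer from Latin hypercube starts and keep the best optimum."""
    rng = random.Random(seed)
    fits = [levenberg_marquardt(s, ts, ys) for s in latin_hypercube(n_starts, bounds, rng)]
    return min(fits, key=lambda fit: fit[1])


ts = [0.5 * i for i in range(11)]
ys = [2.0 * math.exp(-0.7 * t) for t in ts]  # synthetic, noise-free data (a = 2, k = 0.7)
best_theta, best_cost = multi_start_fit(ts, ys, bounds=[(0.1, 5.0), (0.05, 3.0)])
```

Latin hypercube sampling places exactly one start in each stratum of every parameter range. The starts therefore cover the search space more evenly than the same number of independent uniform draws.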
Modern high-throughput methods allow the investigation of biological functions across multiple ‘omics’ levels. These levels include mRNA and protein expression profiling as well as additional knowledge on, for example, DNA methylation and microRNA regulation. The reason for this interest in multi-omics is that actual cellular responses to different conditions are best explained mechanistically when all omics levels are taken into account. To map gene products to their biological functions, public ontologies like Gene Ontology are commonly used. Many methods have been developed to identify ontology terms that are overrepresented within a set of genes. However, these methods cannot appropriately handle arbitrary combinations of several data types. Here, we propose a new method to analyse integrated data across multiple omics levels and simultaneously assess their biological meaning. We developed a model-based Bayesian method for inferring interpretable term probabilities in a modular framework. Our Multi-level ONtology Analysis (MONA) algorithm performed significantly better than conventional analyses of individual levels and yielded the best results even for sophisticated models including mRNA fine-tuning by microRNAs. The MONA framework is flexible enough to allow for different underlying regulatory motifs or ontologies. It is ready to use for applied researchers and is available as a standalone application from http://icb.helmholtz-muenchen.de/mona.
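For context, a conventional single-level analysis usually tests each ontology term separately with a one-sided hypergeometric (Fisher's exact) test. The sketch below shows only that baseline. It does not implement MONA's Bayesian multi-level model, and the gene counts are hypothetical.

```python
from math import comb


def overrepresentation_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the population, K: population genes annotated to the term,
    n: genes in the study set, k: annotated genes observed in the study set."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / total


# Hypothetical term: 40 of 10,000 genes annotated, 8 of them in a 200-gene study set
p = overrepresentation_p(10_000, 40, 200, 8)
```

Repeating this test for each level separately is exactly the single-level view that the text contrasts with a joint analysis.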
Diffusion is a key component of many biological processes such as chemotaxis, developmental differentiation and tissue morphogenesis. Recently, it has become possible to assess the spatial gradients caused by diffusion in vitro and in vivo using microscopy-based imaging techniques. The resulting time series of two-dimensional, high-resolution images, in combination with mechanistic models, enable the quantitative analysis of the underlying mechanisms. However, such a model-based analysis is still challenging due to measurement noise and sparse observations, which result in uncertainties of the model parameters.
We introduce a likelihood function for image-based measurements with log-normally distributed noise. Based upon this likelihood function, we formulate the maximum likelihood estimation problem, which is solved using PDE-constrained optimization methods. To assess the uncertainty and practical identifiability of the parameters, we introduce profile likelihoods for diffusion processes.
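A log-normal likelihood for image data can be written pixel by pixel. The sketch below assumes independent noise across pixels, so that log(observed) ~ Normal(log(predicted), σ²). The predicted image is passed in directly here rather than obtained from a PDE solver, and the 2 × 2 "image" is hypothetical.

```python
import math


def lognormal_nll(image, model, sigma):
    """Negative log-likelihood of observed pixel intensities `image` given model-predicted
    intensities `model` (equal-shaped 2-D lists of positive values), assuming
    log(image[i][j]) ~ Normal(log(model[i][j]), sigma**2) independently for every pixel."""
    nll = 0.0
    for obs_row, mod_row in zip(image, model):
        for y, u in zip(obs_row, mod_row):
            r = math.log(y) - math.log(u)  # log-residual
            nll += (math.log(y) + math.log(sigma) + 0.5 * math.log(2 * math.pi)
                    + r * r / (2 * sigma ** 2))
    return nll


def sigma_mle(image, model):
    """For fixed model predictions, the MLE of sigma is the RMS of the log-residuals."""
    sq = [(math.log(y) - math.log(u)) ** 2
          for obs_row, mod_row in zip(image, model)
          for y, u in zip(obs_row, mod_row)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical 2 x 2 "image": model prediction perturbed by a factor exp(+/-0.1)
model = [[1.0, 2.0], [3.0, 4.0]]
image = [[u * math.exp(0.1 if (i + j) % 2 == 0 else -0.1) for j, u in enumerate(row)]
         for i, row in enumerate(model)]
sigma_hat = sigma_mle(image, model)
```

In the PDE-constrained setting, `model` would be the numerical solution of the diffusion model at the imaging time point. The negative log-likelihood would then be summed over all images of the time series.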
Results and conclusion
As proof of concept, we model certain aspects of the guidance of dendritic cells towards lymphatic vessels, an example of haptotaxis. Using a realistic set of artificial measurement data, we estimate the five kinetic parameters of this model and compute profile likelihoods. Our novel approach for the estimation of model parameters from image data, as well as the proposed identifiability analysis approach, is widely applicable to diffusion processes. The profile likelihood based method provides more rigorous uncertainty bounds than local approximation methods.
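The profile likelihood idea can be shown on a deliberately small example. This sketch uses a hypothetical two-parameter decay model y = a·exp(−kt) with multiplicative log-normal noise, not the PDE model of dendritic cell guidance. For a fixed k, the remaining parameters (log a and σ²) have closed-form optima, so each profile value is exact. The cut-off of 1.92 is half the 95% quantile of the χ² distribution with one degree of freedom. This is the usual asymptotic threshold for pointwise profile likelihood intervals.

```python
import math


def profile_nll(k, ts, ys):
    """Profile negative log-likelihood of k for y = a * exp(-k t) * exp(eps),
    eps ~ N(0, sigma^2). For fixed k, log(a) and sigma^2 have closed-form optima."""
    n = len(ys)
    z = [math.log(y) + k * t for t, y in zip(ts, ys)]  # equals log(a) + eps under the model
    log_a = sum(z) / n                                 # optimal log(a) for this k
    s2 = sum((zi - log_a) ** 2 for zi in z) / n        # optimal sigma^2 for this k
    return sum(math.log(y) for y in ys) + 0.5 * n * math.log(2 * math.pi * s2) + 0.5 * n


def profile_interval(ts, ys, grid, threshold=1.92):
    """Smallest and largest grid values of k whose profile NLL lies within `threshold`
    of the minimum (approximate pointwise 95% interval)."""
    values = [profile_nll(k, ts, ys) for k in grid]
    best = min(values)
    inside = [k for k, v in zip(grid, values) if v <= best + threshold]
    return min(inside), max(inside)


# Hypothetical data: a = 2, k = 0.5, deterministic log-residuals of +/-0.05
ts = list(range(8))
signs = [1, -1, -1, 1, 1, -1, -1, 1]
ys = [2.0 * math.exp(-0.5 * t + 0.05 * s) for t, s in zip(ts, signs)]
grid = [0.2 + 0.001 * i for i in range(601)]
k_lo, k_hi = profile_interval(ts, ys, grid)
```

Unlike a local (curvature-based) approximation, the interval comes from re-optimizing the remaining parameters at every grid value of k. It can therefore be asymmetric or unbounded when a parameter is poorly identifiable.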
Serum urate, the final breakdown product of purine metabolism, is causally involved in the pathogenesis of gout, and implicated in cardiovascular disease and type 2 diabetes. Serum urate levels differ markedly between men and women; however, the biological processes underlying its regulation are still not completely understood and are assumed to result from a complex interplay between genetic, environmental and lifestyle factors. In order to describe the metabolic vicinity of serum urate, we analyzed 355 metabolites in 1,764 individuals of the population-based KORA F4 study and constructed a metabolite network around serum urate using Gaussian Graphical Modeling in a hypothesis-free approach. We subsequently investigated the effect of sex and urate-lowering medication on all 38 metabolites assigned to the network. Within the resulting network three main clusters could be detected around urate, including the well-known pathway of purine metabolism, as well as several dipeptides, a group of essential amino acids, and a group of steroids. Of the 38 assigned metabolites, 25 showed strong differences between sexes. Association with uricostatic medication intake was not confined to purine metabolism but was seen for seven metabolites within the network. Our findings highlight pathways that are important in the regulation of serum urate and suggest that dipeptides, amino acids, and steroid hormones play a role in its regulation. The findings might have an impact on the development of specific targets in the treatment and prevention of hyperuricemia.
Electronic supplementary material
The online version of this article (doi:10.1007/s11306-013-0565-2) contains supplementary material, which is available to authorized users.
Gaussian Graphical Modeling; Metabolite network; Pathway reconstruction; Allopurinol; Uric acid; Purine metabolism
miRNAs are short, non-coding RNAs that regulate gene expression post-transcriptionally through specific binding to mRNA. Deregulation of miRNAs is associated with various diseases, and interference with miRNA function has proven therapeutic potential. Most mRNAs are thought to be regulated by multiple miRNAs, and there is some evidence that such joint activity is enhanced if a short distance between sites allows for cooperative binding. Until now, however, the concept of cooperativity among miRNAs has not been addressed in a transcriptome-wide approach. Here, we computationally screened human mRNAs for distances between miRNA binding sites that are expected to promote cooperativity. We find that sites with a maximal spacing of 26 nucleotides are enriched for naturally occurring miRNAs compared with control sequences. Furthermore, miRNAs with similar characteristics, as indicated by either co-expression within a specific tissue or co-regulation in a disease context, are predicted to target a higher number of mRNAs cooperatively than unrelated miRNAs. These bioinformatic data were compared with genome-wide sets of biochemically validated miRNA targets derived by Argonaute crosslinking and immunoprecipitation (HITS-CLIP and PAR-CLIP). To ease further research into combined and cooperative miRNA function, we developed miRco, a database connecting miRNAs and their respective targets involved in distance-defined cooperative regulation (mips.helmholtz-muenchen.de/mirco). In conclusion, our findings suggest that cooperativity of miRNA-target interaction is a widespread phenomenon that may play an important role in miRNA-mediated gene regulation.
microRNA; target regulation; target prediction; cooperativity
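A distance screen of this kind can be sketched as a scan for pairs of target sites from different miRNAs that lie close together on the same transcript. The sketch rests on assumptions and simplifications, and the miRNA names, site sequences and transcript are hypothetical:

- The abstract does not define how spacing is measured. We assume it is the number of nucleotides between the end of the upstream site and the start of the downstream one.
- Exact string matching of given site sequences stands in for a real target prediction.

```python
def site_positions(mrna, site):
    """Start positions of exact matches of `site` (a target-site sequence in mRNA
    orientation) in `mrna`; overlapping matches are reported too."""
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


def cooperative_pairs(mrna, sites, max_gap=26):
    """Pairs of sites for different miRNAs whose gap (nucleotides between the end of the
    upstream site and the start of the downstream site) lies in [0, max_gap]."""
    hits = sorted((pos, pos + len(seq), name)
                  for name, seq in sites.items()
                  for pos in site_positions(mrna, seq))
    pairs = []
    for i, (_, end1, name1) in enumerate(hits):
        for start2, _, name2 in hits[i + 1:]:
            gap = start2 - end1
            if gap > max_gap:
                break  # hits are sorted by start, so later sites are even farther away
            if gap >= 0 and name1 != name2:  # overlapping sites cannot both be bound
                pairs.append((name1, name2, gap))
    return pairs


# Hypothetical miRNAs, site sequences and transcript
sites = {"miR-A": "AAAGGG", "miR-B": "CCCUUU"}
mrna = "AAAGGG" + "U" * 10 + "CCCUUU" + "U" * 40 + "AAAGGG"
pairs = cooperative_pairs(mrna, sites)
```

Here only the first miR-A site and the miR-B site, 10 nt apart, qualify. The second miR-A site lies 40 nt downstream of the miR-B site, beyond the 26-nt limit.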
Metabolomics is a relatively new high-throughput technology that aims at measuring all endogenous metabolites within a biological sample in an unbiased fashion. The resulting metabolic profiles may be regarded as functional signatures of the physiological state, and have been shown to comprise effects of genetic regulation as well as environmental factors. This potential to connect genotypic to phenotypic information promises new insights and biomarkers for different research fields, including biomedical and pharmaceutical research. In the statistical analysis of metabolomics data, many techniques from other omics fields can be reused. Recently, however, a number of tools specific to metabolomics data have also been developed. This mini review focuses on recent advances in the analysis of metabolomics data, especially those using Gaussian graphical models and independent component analysis.
MicroRNAs have emerged as key posttranscriptional regulators of gene expression during vertebrate development. We show that the miR-200 family plays a crucial role in the proper generation and survival of ventral neuronal populations in the murine midbrain/hindbrain region, including midbrain dopaminergic neurons, by directly targeting the pluripotency factor Sox2 and the cell-cycle regulator E2F3 in neural stem/progenitor cells. The lack of negative regulation of Sox2 and E2F3 by miR-200 in conditional Dicer1 mutants (En1+/Cre; Dicer1flox/flox mice) and after miR-200 knockdown in vitro leads to strongly reduced cell-cycle exit and neuronal differentiation of ventral midbrain/hindbrain (vMH) neural progenitors, whereas the opposite effect is seen after miR-200 overexpression in primary vMH cells. Expression of miR-200 is in turn directly regulated by Sox2 and E2F3, thereby establishing a unilateral negative feedback loop required for the cell-cycle exit and neuronal differentiation of neural stem/progenitor cells. Our findings suggest that the posttranscriptional regulation of Sox2 and E2F3 by miR-200 family members might be a general mechanism to control the transition from a pluripotent/multipotent stem/progenitor cell to a postmitotic and more differentiated cell.
For decades, cold-adapted, temperature-sensitive (ca/ts) strains of influenza A virus have been used as live attenuated vaccines. Given their great public health importance, it is crucial to understand the molecular mechanism(s) of cold adaptation and temperature sensitivity, which are currently unknown. Secondary RNA structures play important roles in influenza biology. We therefore hypothesized that a relatively minor change in temperature (32–39°C) can lead to perturbations in influenza RNA structures, and that these structural perturbations may be different for mRNAs of the wild type (wt) and ca/ts strains. To test this hypothesis, we developed a novel in silico method that assesses whether two related RNA molecules would undergo (dis)similar structural perturbations upon temperature change. The proposed method identifies those areas within an RNA chain where dissimilarities of RNA secondary structures at two different temperatures are particularly pronounced, without requiring knowledge of the particular RNA shapes at either temperature. We identified such areas in the NS2, PA, PB2 and NP mRNAs. However, these areas are not identical for the wt and ca/ts mutants. Differences in temperature-induced structural changes of wt and ca/ts mRNA structures may constitute a yet unappreciated molecular mechanism of the cold adaptation/temperature sensitivity phenomena.
influenza; RNA; structure; temperature; vaccine
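One generic way to locate regions whose structure responds strongly to temperature, without fixing a particular secondary structure, is to compare per-nucleotide base-pairing probabilities at the two temperatures in sliding windows. This sketch is not the authors' method. It assumes the pairing probabilities come from an external partition-function folding program run at each temperature, and the values and window size below are hypothetical.

```python
def window_means(values, width):
    """Mean of every contiguous window of `width` values."""
    return [sum(values[i:i + width]) / width for i in range(len(values) - width + 1)]


def perturbation_profile(p_low, p_high, width=20):
    """Window-averaged absolute change in per-nucleotide pairing probability between
    two temperatures; only probabilities are used, no single predicted structure."""
    return window_means([abs(a - b) for a, b in zip(p_low, p_high)], width)


def pronounced_windows(profile, threshold):
    """Start indices of windows whose mean change exceeds `threshold`."""
    return [i for i, v in enumerate(profile) if v > threshold]


# Hypothetical pairing probabilities for a 40-nt stretch at 32 and 39 degrees C
p_32 = [0.9] * 30 + [0.2] * 10
p_39 = [0.9] * 30 + [0.8] * 10
flagged = pronounced_windows(perturbation_profile(p_32, p_39, width=10), threshold=0.5)
```

Running the same comparison for the wt and ca/ts sequences and contrasting the flagged windows would then point to regions that respond differently to the temperature shift in the two strains.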
Motivation: Single-cell experiments of cells from the early mouse embryo yield gene expression data for different developmental stages from zygote to blastocyst. To better understand cell fate decisions during differentiation, it is desirable to analyse the high-dimensional gene expression data and assess differences in gene expression patterns between different developmental stages as well as within developmental stages. Conventional methods include univariate analyses of distributions of genes at different stages or multivariate linear methods such as principal component analysis (PCA). However, these approaches often fail to resolve important differences, as each lineage has a unique gene expression pattern that changes gradually over time, yielding differences in gene expression between developmental stages as well as heterogeneous distributions within a given stage. Furthermore, to date, no approach taking the temporal structure of the data into account has been presented.
Results: We present a novel framework based on Gaussian process latent variable models (GPLVMs) to analyse single-cell qPCR expression data of 48 genes from mouse zygote to blastocyst as presented by Guo et al. (2010). We extend GPLVMs by introducing gene relevance maps and gradient plots to provide interpretability as in the linear case. Furthermore, we take the temporal group structure of the data into account and introduce a new factor in the GPLVM likelihood that ensures that small distances are preserved for cells from the same developmental stage. Using our novel framework, it is possible to resolve differences in gene expression for all developmental stages. Furthermore, a new subpopulation of cells within the 16-cell stage is identified which is significantly more trophectoderm-like than the rest of the population. The trophectoderm-like subpopulation was characterized by considerable differences in the expression of Id2, Gata4 and, to a smaller extent, Klf4 and Hand1. The relevance of Id2 as an early marker for trophectoderm (TE) cells is consistent with previously published results.
Availability: The mappings were implemented based on Prof. Neil Lawrence's FGPLVM toolbox; extensions for relevance analysis and for including the structure of the data can be obtained from the homepage of one of the authors.
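For reference, a standard GPLVM maximizes the following marginal log-likelihood with respect to the latent positions X. The data are the N × D expression matrix Y (here N cells and D = 48 genes, centred per gene). The radial basis function kernel and the hyperparameter names α, γ and β are assumptions. The additional factor for the temporal group structure described above is not reproduced, because the abstract does not specify its form.

```latex
\log p(\mathbf{Y} \mid \mathbf{X}, \boldsymbol{\theta})
  = -\frac{DN}{2}\log(2\pi) - \frac{D}{2}\log\lvert\mathbf{K}\rvert
    - \frac{1}{2}\operatorname{tr}\!\left(\mathbf{K}^{-1}\mathbf{Y}\mathbf{Y}^{\top}\right),
\qquad
K_{ij} = \alpha\exp\!\left(-\frac{\gamma}{2}\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^{2}\right) + \beta^{-1}\delta_{ij}.
```

Because K depends on the latent positions only through pairwise distances, cells placed close together in the latent space are modelled as having similar expression profiles. This is the property that a distance-preserving factor for cells of the same stage would act on.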
In radiation protection, biokinetic models for zirconium processing are of crucial importance in dose estimation and further risk analysis for humans exposed to this radioactive substance. They provide limiting values for detrimental effects and form the basis for applications in internal dosimetry, the prediction of radioactive zirconium retention in various organs, and retrospective dosimetry. Multi-compartmental models are the tool of choice for simulating the processing of zirconium. Although such models are easily interpretable, determining the exact compartment structure and interaction mechanisms is generally daunting. In the context of observing the dynamics of multiple compartments, Bayesian methods provide efficient tools for model inference and selection.
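Multi-compartment biokinetic models are systems of linear ODEs with first-order transfer rates between compartments. The sketch below simulates such a system with a classical fourth-order Runge-Kutta scheme. The four compartments and all rate constants are hypothetical placeholders. They correspond neither to the ICRP model nor to the recently published zirconium model.

```python
def derivatives(q, rates):
    """dq/dt for first-order transfers; rates maps (source, target) -> rate constant."""
    dq = {c: 0.0 for c in q}
    for (src, dst), k in rates.items():
        flow = k * q[src]
        dq[src] -= flow
        dq[dst] += flow
    return dq


def rk4_step(q, rates, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base, slope, s):
        return {c: base[c] + s * slope[c] for c in base}
    k1 = derivatives(q, rates)
    k2 = derivatives(shifted(q, k1, h / 2), rates)
    k3 = derivatives(shifted(q, k2, h / 2), rates)
    k4 = derivatives(shifted(q, k3, h), rates)
    return {c: q[c] + h / 6 * (k1[c] + 2 * k2[c] + 2 * k3[c] + k4[c]) for c in q}


def simulate(q0, rates, t_end, h=0.01):
    """Return [(t, amounts)] from t = 0 to t_end with fixed step h."""
    q = dict(q0)
    trajectory = [(0.0, q)]
    for i in range(1, round(t_end / h) + 1):
        q = rk4_step(q, rates, h)
        trajectory.append((i * h, q))
    return trajectory


# Hypothetical structure and rates (per day); unit ingested amount in the gut at t = 0
rates = {("gut", "plasma"): 1.0, ("plasma", "urine"): 0.5,
         ("plasma", "bone"): 0.2, ("bone", "plasma"): 0.01}
q0 = {"gut": 1.0, "plasma": 0.0, "bone": 0.0, "urine": 0.0}
trajectory = simulate(q0, rates, t_end=30.0)
```

Every transfer removes from one compartment exactly what it adds to another, so the total amount is conserved. Competing model structures differ only in the set of (source, target) pairs and in the rate values.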
We are the first to apply a Markov chain Monte Carlo approach to compute Bayes factors for the evaluation of two competing models for zirconium processing in the human body after ingestion. Based on in vivo measurements of human plasma and urine levels, we were able to show that a recently published model is superior to the standard model of the International Commission on Radiological Protection. The Bayes factors were estimated by numerically stable thermodynamic integration in combination with a recently developed copula-based Metropolis-Hastings sampler.
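Thermodynamic integration rests on the identity log p(y) = ∫₀¹ E_t[log p(y | θ)] dt. The expectation is taken under the power posterior p_t(θ) ∝ p(y | θ)^t p(θ). The sketch below makes several assumptions:

- a hypothetical conjugate toy model (normal data with known unit variance and a normal prior on the mean), in which every expectation has a closed form, so no sampler is needed;
- the temperature schedule t_k = (k/K)^5.

In the zirconium application these expectations would instead be Monte Carlo averages from MCMC runs at each temperature; the copula-based Metropolis-Hastings sampler is not reproduced. The closed-form evidence serves as a check. The difference of two log-evidences gives a log Bayes factor.

```python
import math
import random


def expected_log_lik(t, ys, prior_sd):
    """E_t[log p(y | theta)] under the power posterior p_t(theta) ∝ p(y|theta)^t p(theta)
    for the toy model y_i ~ N(theta, 1), theta ~ N(0, prior_sd^2). Conjugacy makes this
    exact; in practice it would be a Monte Carlo average over MCMC draws at temperature t."""
    n = len(ys)
    ybar = sum(ys) / n
    var = 1.0 / (1.0 / prior_sd ** 2 + t * n)  # power-posterior variance of theta
    mean = t * n * ybar * var                   # power-posterior mean of theta
    sse = sum((y - ybar) ** 2 for y in ys)
    # E[sum_i (y_i - theta)^2] = sse + n * (ybar - mean)^2 + n * var
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * (sse + n * (ybar - mean) ** 2 + n * var)


def log_evidence_ti(ys, prior_sd, n_temps=200, power=5):
    """Thermodynamic integration of E_t[log p(y|theta)] over t in [0, 1] with the
    trapezoidal rule on the temperature schedule t_k = (k / n_temps) ** power."""
    temps = [(k / n_temps) ** power for k in range(n_temps + 1)]
    vals = [expected_log_lik(t, ys, prior_sd) for t in temps]
    return sum(0.5 * (temps[k + 1] - temps[k]) * (vals[k] + vals[k + 1]) for k in range(n_temps))


def log_evidence_exact(ys, prior_sd):
    """Closed form for checking: marginally y ~ N(0, I + prior_sd^2 * 11^T)."""
    n, s, tau2 = len(ys), sum(ys), prior_sd ** 2
    quad = sum(y * y for y in ys) - tau2 * s * s / (1 + n * tau2)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * tau2) - 0.5 * quad


# Hypothetical data and two competing priors; the log Bayes factor compares their evidences
rng = random.Random(1)
ys = [rng.gauss(0.3, 1.0) for _ in range(20)]
log_z = {sd: log_evidence_ti(ys, sd) for sd in (1.0, 3.0)}
log_bayes_factor = log_z[1.0] - log_z[3.0]  # positive values favour prior_sd = 1
```

The schedule concentrates temperatures near t = 0, where the power posterior changes fastest as it moves away from the prior. This is one reason thermodynamic integration is numerically more stable than naive estimators of the evidence.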
In contrast to the standard model, the novel model predicts lower accretion of zirconium in bones. This results in lower estimated harmful doses for exposed individuals. Moreover, the Bayesian approach allows for retrospective dose assessment, including credible intervals for the initially ingested zirconium, in a significantly more reliable fashion than previously possible. All methods presented here are readily applicable to many modeling tasks in systems biology.
Bayesian inference; Model selection; MCMC sampling; Compartmental model; Internal dosimetry; Systems biology
Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb has been applied to determine the significance of the p-gain.
Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2α) is a conservative critical value for the p-gain, where α is the level of significance and B the number of tested metabolite pairs.
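A minimal sketch of how the critical value could be applied. It assumes the usual definition of the p-gain, that is, the smaller of the two p-values for the individual metabolites divided by the p-value for their ratio. The p-values and the number of pairs in the example are hypothetical.

```python
def p_gain(p_metabolite_1, p_metabolite_2, p_ratio):
    """Smaller single-metabolite p-value divided by the p-value of the ratio."""
    return min(p_metabolite_1, p_metabolite_2) / p_ratio


def p_gain_critical_value(n_pairs, alpha=0.05):
    """Conservative critical value B / (2 * alpha), B = number of tested metabolite pairs."""
    return n_pairs / (2 * alpha)


def ratio_is_informative(p1, p2, p_ratio, n_pairs, alpha=0.05):
    return p_gain(p1, p2, p_ratio) > p_gain_critical_value(n_pairs, alpha)


# Hypothetical study testing B = 1000 metabolite pairs at alpha = 0.05 (critical value 10,000)
print(ratio_is_informative(1e-4, 1e-3, 1e-12, n_pairs=1000))  # p-gain 1e8 -> True
print(ratio_is_informative(1e-4, 1e-3, 1e-7, n_pairs=1000))   # p-gain 1e3 -> False
```

The critical value grows linearly with the number of tested pairs, so a fixed threshold becomes increasingly anti-conservative as more ratios are tested.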
We show that the p-gain is a well-defined measure that can be used to identify statistically significant metabolite ratios in association studies and provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
p-gain; Metabolomics; MWAS; GWAS; Genome-wide association studies; Metabolome-wide association studies
Social contact with fungus-exposed ants leads to pathogen transfer to healthy nest-mates, causing low-level infections. These micro-infections promote pathogen-specific immune gene expression and protective immunization of nest-mates.
Due to the omnipresent risk of epidemics, insect societies have evolved sophisticated disease defences at the individual and colony level. An intriguing yet little understood phenomenon is that social contact with pathogen-exposed individuals reduces susceptibility of previously naive nestmates to this pathogen. We tested whether such social immunisation in Lasius ants against the entomopathogenic fungus Metarhizium anisopliae is based on active upregulation of the immune system of nestmates following contact with an infectious individual or on passive protection via transfer of immune effectors among group members (that is, active versus passive immunisation). We found no evidence for involvement of passive immunisation via transfer of antimicrobials among colony members. Instead, intensive allogrooming behaviour between naive and pathogen-exposed ants before fungal conidia firmly attached to their cuticle suggested passage of the pathogen from the exposed individuals to their nestmates. By tracing fluorescence-labelled conidia we indeed detected frequent pathogen transfer to the nestmates, where they caused low-level infections as revealed by growth of small numbers of fungal colony-forming units from their dissected body content. These infections rarely led to death, but instead promoted an enhanced ability to inhibit fungal growth and an active upregulation of immune genes involved in antifungal defences (defensin and prophenoloxidase, PPO). In contrast, there was no upregulation of the gene cathepsin L, which is associated with antibacterial and antiviral defences, and we found no increased antibacterial activity in nestmates of fungus-exposed ants. This indicates that social immunisation after fungal exposure is specific, similar to recent findings for individual-level immune priming in invertebrates.
Epidemiological modeling further suggests that active social immunisation is adaptive, as it leads to faster elimination of the disease and lower death rates than passive immunisation. Interestingly, humans have also utilised the protective effect of low-level infections to fight smallpox by intentional transfer of low pathogen doses (“variolation” or “inoculation”).
Close social contact facilitates pathogen transmission in societies, often causing epidemics. In contrast to this, we show that limited transmission of a fungal pathogen in ant colonies can be beneficial for the host, because it promotes “social immunisation” of healthy group members. We found that ants exposed to the fungus are heavily groomed by their healthy nestmates. Grooming removes a significant number of fungal conidiospores from the body surface of exposed ants and reduces their risk of falling sick. At the same time, previously healthy nestmates are themselves exposed to a small number of conidiospores, triggering low-level infections. These micro-infections are not deadly, but result in upregulated expression of a specific set of immune genes and pathogen-specific protective immune stimulation. Pathogen transfer by social interactions is therefore the underlying mechanism of social immunisation against fungal infections in ant societies. There is a similarity between such natural social immunisation and human efforts to induce immunity against deadly diseases, such as smallpox. Before vaccination with dead or attenuated strains was invented, immunity in human societies was induced by actively transferring low-level infections (“variolation”), just like in ants.
Although human musical performances represent one of the most valuable achievements of mankind, the best musicians perform imperfectly. Musical rhythms are not entirely accurate and thus inevitably deviate from the ideal beat pattern. Nevertheless, computer-generated perfect beat patterns are frequently devalued by listeners due to a perceived lack of human touch. Professional audio editing software therefore offers a humanizing feature that artificially generates rhythmic fluctuations. However, the built-in humanizing units are essentially random number generators producing only simple uncorrelated fluctuations. Here, for the first time, we establish long-range fluctuations as an inevitable natural companion of both simple and complex human rhythmic performances. Moreover, we demonstrate that listeners strongly prefer long-range correlated fluctuations in musical rhythms. Thus, the favorable fluctuation type for humanizing interbeat intervals coincides with the one generically inherent in human musical performances.
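Long-range correlations of this kind are commonly quantified with detrended fluctuation analysis (DFA). Its exponent α is about 0.5 for uncorrelated fluctuations and about 1 for 1/f noise. The sketch below is an illustration, not the analysis used in the study. It compares white noise, as produced by a simple random-number humanizer, with approximately 1/f noise generated by the Voss-McCartney algorithm. The `humanize` function is a hypothetical humanizer built on either noise type.

```python
import math
import random


def white_noise(n, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def pink_noise(n, rng, n_sources=12):
    """Voss-McCartney approximation of 1/f noise: source j is redrawn every 2**j steps."""
    sources = [rng.gauss(0.0, 1.0) for _ in range(n_sources)]
    out = []
    for i in range(n):
        for j in range(n_sources):
            if i % (2 ** j) == 0:
                sources[j] = rng.gauss(0.0, 1.0)
        out.append(sum(sources))
    return out


def dfa_exponent(x, scales=(16, 32, 64, 128, 256)):
    """First-order DFA: integrate the centred series, remove a linear trend in
    non-overlapping windows, and fit log F(s) against log s."""
    mean = sum(x) / len(x)
    profile, acc = [], 0.0
    for v in x:
        acc += v - mean
        profile.append(acc)
    log_s, log_f = [], []
    for s in scales:
        n_win = len(profile) // s
        ix = [i - (s - 1) / 2 for i in range(s)]  # centred positions within a window
        sxx = sum(u * u for u in ix)
        sq_sum = 0.0
        for w in range(n_win):
            seg = profile[w * s:(w + 1) * s]
            seg_mean = sum(seg) / s
            slope = sum(u * (y - seg_mean) for u, y in zip(ix, seg)) / sxx
            sq_sum += sum((y - seg_mean - slope * u) ** 2 for u, y in zip(ix, seg))
        log_s.append(math.log(s))
        log_f.append(0.5 * math.log(sq_sum / (n_win * s)))  # log of RMS fluctuation F(s)
    ms, mf = sum(log_s) / len(log_s), sum(log_f) / len(log_f)
    return (sum((a - ms) * (b - mf) for a, b in zip(log_s, log_f))
            / sum((a - ms) ** 2 for a in log_s))


def humanize(n_beats, interval_ms, jitter_sd_ms, rng, correlated=True):
    """Hypothetical humanizer: nominal interbeat interval plus rescaled noise."""
    noise = pink_noise(n_beats, rng) if correlated else white_noise(n_beats, rng)
    m = sum(noise) / n_beats
    sd = math.sqrt(sum((v - m) ** 2 for v in noise) / n_beats)
    return [interval_ms + jitter_sd_ms * (v - m) / sd for v in noise]


rng = random.Random(42)
alpha_white = dfa_exponent(white_noise(4096, rng))
alpha_pink = dfa_exponent(pink_noise(4096, rng))
intervals = humanize(64, 500.0, 10.0, random.Random(7))  # 64 beats at 120 bpm, 10 ms jitter
```

Both humanizers produce the same jitter amplitude. Only the correlation structure across beats differs, which is the property the listening experiments varied.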
Metabolomic profiling and the integration of whole-genome genetic association data have proven to be powerful tools to comprehensively explore gene regulatory networks and to investigate the effects of genetic variation at the molecular level. Serum metabolite concentrations allow a direct readout of biological processes, and associations of specific metabolomic signatures with complex diseases such as Alzheimer's disease and cardiovascular and metabolic disorders have been shown. There are well-known correlations between sex and the incidence, prevalence, age of onset, symptoms, and severity of a disease, as well as the reaction to drugs. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by gender. This study investigated sex-specific differences of serum metabolite concentrations and their underlying genetic determination. For discovery and replication we used more than 3,300 independent individuals from KORA F3 and F4 with measurements of 131 metabolites, including amino acids, phosphatidylcholines, sphingomyelins, acylcarnitines, and C6-sugars. A linear regression approach revealed significant concentration differences between males and females for 102 out of 131 metabolites (p < 3.8×10⁻⁴; Bonferroni-corrected threshold). Sex-specific genome-wide association studies (GWAS) showed genome-wide significant differences in beta-estimates for SNPs in the CPS1 locus (carbamoyl-phosphate synthase 1; significance level: p < 3.8×10⁻¹⁰, Bonferroni-corrected threshold) for glycine. We showed that the metabolite profiles of males and females are significantly different and, furthermore, that specific genetic variants in metabolism-related genes depict sexual dimorphism. Our study provides important new insights into sex-specific differences of cell regulatory processes and underscores that studies should consider sex-specific effects in design and interpretation.
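Both significance levels quoted above are consistent with a Bonferroni correction for 131 metabolites. One is applied to α = 0.05, the other to the conventional genome-wide level of 5×10⁻⁸. That the thresholds were derived this way is our reading; the text does not state it. Sex differences in SNP effects are often tested with a z-statistic for the difference of two independent estimates. The sketch assumes that approach, and the effect sizes are hypothetical.

```python
import math

N_METABOLITES = 131


def bonferroni(alpha, n_tests):
    return alpha / n_tests


def beta_difference_z(beta_m, se_m, beta_f, se_f):
    """z-statistic for the difference between independent male and female effect estimates."""
    return (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)


def two_sided_p(z):
    """Two-sided normal p-value, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


metabolite_threshold = bonferroni(0.05, N_METABOLITES)   # about 3.8e-4
genome_wide_threshold = bonferroni(5e-8, N_METABOLITES)  # about 3.8e-10
# Hypothetical sex-specific effects of one SNP on one metabolite
p_diff = two_sided_p(beta_difference_z(beta_m=0.30, se_m=0.03, beta_f=0.05, se_f=0.03))
```

The test assumes that the male and female estimates come from non-overlapping samples, which holds for a sex-stratified analysis.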
The combination of genomic and metabolic studies in recent years has provided astonishing results. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. The investigation of 131 serum metabolite concentrations in >3,300 population-based samples (KORA F3/F4) revealed significant differences in the metabolite profiles of males and females. Furthermore, we investigated a genome-wide picture of sex-specific genetic variation in human metabolism (>2,000 subjects from the KORA F3/F4 cohorts). Sex-specific genome-wide association studies (GWAS) showed differences in the effect of genetic variants on metabolites in men and women. SNPs in the CPS1 (carbamoyl-phosphate synthase 1) locus showed genome-wide significant differences in beta-estimates of sex-specific association analysis (significance level: 3.8×10⁻¹⁰) for glycine. As global metabolomic techniques are increasingly refined to identify more compounds in single biological samples, the predictive power of this new technology will greatly increase. This suggests that metabolites, which may be used as predictive biomarkers to indicate the presence or severity of a disease, have to be used selectively depending on sex.
Hematopoiesis is an ideal model system for stem cell biology with advanced experimental access. A systems view on the interactions of core transcription factors is important for understanding differentiation mechanisms and dynamics. In this manuscript, we construct a Boolean network to model myeloid differentiation, specifically from common myeloid progenitors to megakaryocytes, erythrocytes, granulocytes and monocytes. By interpreting the hematopoietic literature and translating experimental evidence into Boolean rules, we implement binary dynamics on the resulting 11-factor regulatory network. Our network contains interesting functional modules and a concatenation of mutually antagonistic pairs. The state space of our model is a hierarchical, acyclic graph, typifying the principles of myeloid differentiation. We observe excellent agreement between the steady states of our model and microarray expression profiles of two different studies. Moreover, perturbations of the network topology correctly reproduce reported knockout phenotypes in silico. We predict previously uncharacterized regulatory interactions and alterations of the differentiation process, and outline reprogramming strategies.
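Synchronous Boolean dynamics, steady-state search and in silico knockouts can be sketched on a toy network. The three rules below are hypothetical. They were chosen to contain one mutually antagonistic pair like those described above and are not the 11-factor model.

```python
from itertools import product

# Hypothetical rules: A and B are self-activating and mutually antagonistic; C is activated by A.
RULES = {
    "A": lambda s: s["A"] and not s["B"],
    "B": lambda s: s["B"] and not s["A"],
    "C": lambda s: s["A"],
}


def step(state, rules, knockouts=()):
    """Synchronous update: every node is recomputed from the previous state;
    knocked-out nodes are clamped to False."""
    return {g: False if g in knockouts else bool(f(state)) for g, f in rules.items()}


def steady_states(rules, knockouts=()):
    """Enumerate all 2**n states and return the fixed points as tuples of active nodes."""
    genes = sorted(rules)
    fixed = []
    for values in product([False, True], repeat=len(genes)):
        state = dict(zip(genes, values))
        if any(state[g] for g in knockouts):
            continue  # states with a knocked-out node switched on are excluded
        if step(state, rules, knockouts) == state:
            fixed.append(tuple(g for g in genes if state[g]))
    return fixed


wild_type = steady_states(RULES)
a_knockout = steady_states(RULES, knockouts=("A",))
```

The wild type has three steady states: all nodes off, B on, and A and C on. Knocking out A removes the A-driven attractor. Comparing attractor sets before and after such perturbations is the kind of check against reported knockout phenotypes described above. Exhaustive enumeration is feasible for 11 factors (2048 states).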
With the advent of high-throughput targeted metabolic profiling techniques, the question of how to interpret and analyze the resulting vast amount of data becomes increasingly important. In this work we address the reconstruction of metabolic reactions from cross-sectional metabolomics data, that is, without requiring time-resolved measurements or specific system perturbations. Previous studies in this area mainly focused on Pearson correlation coefficients, which, however, are generally incapable of distinguishing between direct and indirect metabolic interactions.
In our new approach we propose the application of a Gaussian graphical model (GGM), an undirected probabilistic graphical model estimating the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is, pairwise Pearson correlation coefficients conditioned on all other metabolites. We first demonstrate the general validity of the method and its advantages over regular correlation networks with computer-simulated reaction systems. Then we estimate a GGM on data from a large human population cohort, covering 1020 fasting blood serum samples with 151 quantified metabolites. The GGM is much sparser than the correlation network, shows a modular structure with respect to metabolite classes, and is stable to the choice of samples in the data set. Using human fatty acid metabolism as an example, we demonstrate for the first time that high partial correlation coefficients generally correspond to known metabolic reactions. This feature is evaluated first manually, by investigating specific pairs of high-scoring metabolites, and then systematically on a literature-curated model of fatty acid synthesis and degradation. Our method detects many known reactions along with possibly novel pathway interactions, representing candidates for further experimental examination.
In summary, we demonstrate strong signatures of intracellular pathways in blood serum data, and provide a valuable tool for the unbiased reconstruction of metabolic reactions from large-scale metabolomics data sets.
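GGM edge weights are full-order partial correlations. They can be read off the inverse covariance (precision) matrix P as ρ_ij = −P_ij / √(P_ii P_jj). The sketch below uses plain matrix inversion, which assumes more samples than metabolites, as in the 1020 × 151 data set described above. The three-metabolite reaction chain is simulated and hypothetical. It reproduces the key point on a small scale: the end points of the chain A → B → C are strongly Pearson-correlated, but their partial correlation is close to zero, so only the direct links remain as edges.

```python
import math
import random


def covariance(data):
    """Sample covariance matrix; data is a list of samples, each a list of variables."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (for small, well-conditioned matrices)."""
    p = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[p:] for row in a]


def partial_correlations(data):
    """rho_ij = -P_ij / sqrt(P_ii * P_jj), with P the precision matrix."""
    prec = invert(covariance(data))
    p = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)] for i in range(p)]


def pearson_correlations(data):
    cov = covariance(data)
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)] for i in range(p)]


# Simulated linear chain A -> B -> C (hypothetical): A and C are linked only through B
rng = random.Random(0)
data = []
for _ in range(5000):
    a = rng.gauss(0.0, 1.0)
    b = a + 0.5 * rng.gauss(0.0, 1.0)
    c = b + 0.5 * rng.gauss(0.0, 1.0)
    data.append([a, b, c])
pearson = pearson_correlations(data)
partial = partial_correlations(data)
```

With many metabolites and fewer samples, the covariance matrix is no longer invertible, and a regularized or shrinkage estimate would be needed instead. The abstract does not say which estimator was used.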
External stimulations of cells by hormones, cytokines or growth factors activate signal transduction pathways that subsequently induce a re-arrangement of cellular gene expression. The analysis of such changes is complicated, as they consist of multi-layered temporal responses. While classical analyses based on clustering or gene set enrichment only partly reveal this information, matrix factorization techniques are well suited for a detailed temporal analysis. In signal processing, factorization techniques incorporating data properties like spatial and temporal correlation structure have been shown to be robust and computationally efficient. However, such correlation-based methods have so far not been applied in bioinformatics, because large-scale biological data rarely imply a natural order that allows the definition of a delayed correlation function.
We therefore develop the concept of graph-decorrelation. We encode prior knowledge like transcriptional regulation, protein interactions or metabolic pathways in a weighted directed graph. By linking features along this underlying graph, we introduce a partial ordering of the features (e.g. genes) and are thus able to define a graph-delayed correlation function. Using this framework as a constraint on the matrix factorization task allows us to set up the fast and robust graph-decorrelation algorithm (GraDe). To analyze alterations in the gene response in IL-6 stimulated primary mouse hepatocytes, we performed a time-course microarray experiment and applied GraDe. In contrast to standard techniques, the extracted time-resolved gene expression profiles showed that IL-6 activates genes involved in cell cycle progression and cell division. Genes linked to metabolic and apoptotic processes are down-regulated, indicating that IL-6 mediated priming renders hepatocytes more responsive towards cell proliferation and reduces expenditure on energy metabolism.
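In time-delayed decorrelation methods from signal processing, each sample is paired with its time-shifted successor, and the resulting delayed covariances are diagonalized. Graph-decorrelation replaces the time shift by the edges of the prior-knowledge graph. The sketch below computes such a graph-delayed covariance under an assumed definition. For centred data X (m observations × n genes) and weighted adjacency matrix W, we take C = X W Xᵀ / (n − 1), symmetrized as is common for delayed covariances. The exact normalization used by GraDe and the subsequent matrix factorization are not reproduced, and the tiny data set is hypothetical.

```python
def center_rows(x):
    """Subtract each row's mean (each observation's mean across features)."""
    out = []
    for row in x:
        mu = sum(row) / len(row)
        out.append([v - mu for v in row])
    return out


def graph_delayed_covariance(x, edges, symmetric=True):
    """x: m x n data (m observations such as time points, n features such as genes).
    edges: (source_feature, target_feature, weight) triples of a directed graph.
    Assumed form C = X W X^T / (n - 1): each feature is paired with its graph
    successors instead of a time-shifted sample."""
    xc = center_rows(x)
    m, n = len(xc), len(xc[0])
    c = [[0.0] * m for _ in range(m)]
    for src, dst, w in edges:
        for i in range(m):
            for j in range(m):
                c[i][j] += w * xc[i][src] * xc[j][dst]
    c = [[v / (n - 1) for v in row] for row in c]
    if symmetric:
        c = [[0.5 * (c[i][j] + c[j][i]) for j in range(m)] for i in range(m)]
    return c


# Hypothetical data: 2 time points x 3 genes, prior-knowledge edges gene0 -> gene1 -> gene2
x = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
edges = [(0, 1, 1.0), (1, 2, 1.0)]
c_graph = graph_delayed_covariance(x, edges)
```

With a chain graph over genes ordered 0, 1, 2, …, this reduces to an ordinary lag-one delayed covariance. That shows how the graph supplies the ordering that large-scale biological data otherwise lack.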
GraDe provides a novel framework for the decomposition of large-scale 'omics' data. We were able to show that including prior knowledge in the separation task leads to a much more structured and detailed separation of the time-dependent responses upon IL-6 stimulation compared to standard methods. A Matlab implementation of the GraDe algorithm is freely available at http://cmb.helmholtz-muenchen.de/grade.
Extensive and automated data integration in bioinformatics facilitates the construction of large, complex biological networks. However, the challenge lies in the interpretation of these networks. While most research focuses on the unipartite or bipartite case, we address the more general but common situation of k-partite graphs. These graphs contain k different node types and links are only allowed between nodes of different types. In order to reveal their structural organization and describe the contained information in a more coarse-grained fashion, we ask how to detect clusters within each node type.
Since entities in biological networks often have more than one function and hence participate in more than one cluster, we developed a k-partite graph partitioning algorithm that allows for overlapping (fuzzy) clusters. It determines for each node a degree of membership to each cluster. Moreover, the algorithm estimates a weighted k-partite graph that connects the extracted clusters. Our method is fast and efficient, mimicking the multiplicative update rules commonly employed in algorithms for non-negative matrix factorization. It facilitates the decomposition of networks on a chosen scale and therefore allows for analysis and interpretation of structures at various levels of resolution. Applying our algorithm to a tripartite disease-gene-protein complex network, we were able to partition this graph at a coarse scale into clusters that are functionally correlated and biologically meaningful. At a finer scale, smaller clusters enabled reclassification or annotation of the clusters' elements. We illustrate this for the transcription factor MECP2.
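The following NumPy sketch illustrates the flavour of multiplicative-update fuzzy partitioning for the simplest case, k = 2 (a bipartite graph). It is a simplification built on assumptions, not the authors' algorithm. We approximate the adjacency block as H1 B H2ᵀ with standard Lee-Seung-style updates for the squared Frobenius error. Fuzzy memberships are then obtained afterwards by row-normalizing H1 and H2. For k > 2 node types, one would sum the corresponding numerator and denominator terms over every block that involves a given node type. The function name is hypothetical.

```python
import numpy as np


def fuzzy_bipartite_clusters(A, k1, k2, n_iter=500, eps=1e-12, seed=0):
    """Fuzzy clustering of a bipartite graph by nonnegative tri-factorization.

    Approximates A (n1 x n2, nonnegative) by H1 @ B @ H2.T using multiplicative
    updates for the squared Frobenius error (a simplified, assumed scheme).
    Row-normalized H1 and H2 give each node a degree of membership in every
    cluster, and B is a weighted bipartite graph between the clusters.
    """
    rng = np.random.default_rng(seed)
    n1, n2 = A.shape
    H1 = rng.random((n1, k1)) + eps
    H2 = rng.random((n2, k2)) + eps
    B = rng.random((k1, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update is the NMF rule for one factor with the other two fixed.
        H1 *= (A @ H2 @ B.T) / (H1 @ B @ H2.T @ H2 @ B.T + eps)
        H2 *= (A.T @ H1 @ B) / (H2 @ B.T @ H1.T @ H1 @ B + eps)
        B *= (H1.T @ A @ H2) / (H1.T @ H1 @ B @ H2.T @ H2 + eps)
        errors.append(np.linalg.norm(A - H1 @ B @ H2.T) ** 2)
    # Post-hoc fuzzy memberships; nodes without any edges stay all-zero.
    M1 = H1 / np.maximum(H1.sum(axis=1, keepdims=True), eps)
    M2 = H2 / np.maximum(H2.sum(axis=1, keepdims=True), eps)
    return M1, M2, B, errors


rng = np.random.default_rng(1)
A = np.zeros((6, 8))  # hypothetical graph with two dense blocks
A[:3, :4] = 1.0
A[3:, 4:] = 1.0
A += 0.05 * rng.random(A.shape)
M1, M2, B, errors = fuzzy_bipartite_clusters(A, 2, 2)
```

The number of clusters per node type (k1, k2) sets the scale of the decomposition. Small values give coarse clusters, and larger values resolve finer structure.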
To cope with the overwhelming amount of information available from the biomedical literature, we need to tackle the challenge of finding structures in large networks with nodes of multiple types. To this end, we have presented a novel fuzzy k-partite graph partitioning algorithm that allows such networks to be decomposed in a comprehensive fashion. We validated our approach on both artificial and real-world data. It is readily applicable to other networks with multiple node types.
The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data, one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downstream of the node, including the node itself. Using a large set of microarray expression data from the bacterium Escherichia coli, we find that gene expression in different subnets exhibits a structured pattern in response to environmental changes and genotypic mutations. Subnets with fewer changes in their expression pattern have a higher fraction of feed-forward loop motifs and a lower fraction of small RNA targets. Our study implies that the TRN comprises several scales of regulatory organization: (1) subnets with more variable gene expression, controlled by both transcription factors and post-transcriptional RNA regulation, and (2) subnets with less variable gene expression, containing more feed-forward loops and less post-transcriptional RNA regulation.
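The subnet definition amounts to a reachability query on the directed TRN. Below is a minimal breadth-first-search sketch. It assumes that the network is given as (regulator, target) pairs. The function name and node labels are hypothetical.

```python
from collections import deque


def subnet(edges, root):
    """Return the subnet of `root`: root plus every node downstream of it.

    edges: iterable of (regulator, target) pairs of a directed TRN.
    """
    children = {}
    for src, dst in edges:
        children.setdefault(src, set()).add(dst)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first search along regulatory edges
        node = queue.popleft()
        for nxt in children.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy network: TF1 regulates TF2 and g2; TF2 regulates g1.
edges = [("TF1", "TF2"), ("TF2", "g1"), ("TF1", "g2"), ("TF3", "g3")]
```

Here `subnet(edges, "TF1")` contains TF1, TF2, g1 and g2, whereas the subnet of a gene with no outgoing edges is just that gene. The `seen` set also makes the search terminate on networks with cycles.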
Bacterial cells can adapt to various genomic mutations and, intriguingly, to many environmental changes. They do this by adjusting their gene expression profile to meet the requirements of a new condition. In this work, we study the interplay of different mechanisms of gene regulatory control that drive this adaptation in the bacterium E. coli. We deconstruct the network of all transcription factor mediated regulatory interactions into subnets, topologically defined subgraphs that we expect to act as information processing units. Indeed, we find that many subnets react in a coordinated manner to cellular stress and are used by the cells to compensate for mutations. These subnets also contain many small RNA targets. In contrast, subnets that do not act in a coordinated fashion are highly enriched in feed-forward loops, a 3-node network motif with important information processing properties. Our approach reveals correlations and anti-correlations among three scales of regulatory control: subnets, feed-forward loops, and small RNAs.
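A feed-forward loop consists of a regulator X that controls Y, with both X and Y regulating a common target Z. The sketch below enumerates such loops in a directed edge list. It assumes that edges are (regulator, target) pairs and ignores self-regulation. It does not attempt to reproduce the exact motif statistics used in the study, and the function name is hypothetical.

```python
def feed_forward_loops(edges):
    """Return all feed-forward loops (X, Y, Z) with X->Y, Y->Z and X->Z.

    edges: iterable of (regulator, target) pairs; self-loops are ignored.
    """
    edge_set = {(a, b) for a, b in edges if a != b}
    children = {}
    for a, b in edge_set:
        children.setdefault(a, set()).add(b)
    loops = set()
    for x, y in edge_set:  # candidate X -> Y
        for z in children.get(y, ()):  # Y -> Z
            if z != x and (x, z) in edge_set:  # direct X -> Z closes the loop
                loops.add((x, y, z))
    return loops


# Hypothetical toy network containing two overlapping feed-forward loops.
edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W"), ("Y", "W")]
```

Because the search walks each edge and then only the targets of its endpoint, the search runs in time proportional to the number of two-step paths rather than checking every node triple. Restricting the edge list to the nodes of one subnet would give per-subnet counts.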
MicroRNAs are a large class of post-transcriptional regulators that bind to the 3′ untranslated region of messenger RNAs. They play a critical role in many cellular processes and have been linked to the control of signal transduction pathways. Recent studies indicate that microRNAs can function as tumor suppressors or even as oncogenes when aberrantly expressed. To gain more general insights into disease-associated microRNAs, we analyzed their impact on human signaling pathways from two perspectives. On a global scale, we found a core set of signaling pathways with enriched tissue-specific microRNA targets across diseases. The functions of these pathways reflect the tendency of microRNAs to regulate cellular processes associated with apoptosis, proliferation or development. Comparing cancer-related and non-cancer-related microRNAs, we found no significant differences between the two groups. To unveil the interaction and regulation of microRNAs on signaling pathways locally, we analyzed the cellular location and process type of disease-associated microRNA targets and proteins. While disease-associated proteins are highly enriched in extracellular components of the pathways, microRNA targets are preferentially located in the nucleus. Moreover, in contrast to disease proteins, targets of disease-associated microRNAs preferentially exhibit an inhibitory effect within the pathways. Our analysis provides systematic insights into the interaction of disease-associated microRNAs and signaling pathways and uncovers differences in the cellular locations and process types of microRNA targets and disease-associated proteins.
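The abstract does not state which enrichment statistic was used. A common choice for this kind of question is the one-sided hypergeometric test, sketched here with the standard library only. The parameter names describe an assumed setup consisting of a background gene set, a pathway, and a set of predicted microRNA targets. The function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: genes in the pathway,
    n: microRNA targets in the background, k: targets inside the pathway.
    """
    total = comb(N, n)
    upper = min(K, n)  # at most this many targets can fall in the pathway
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, with a background of 10 genes, a pathway of 5, and 5 targets that all fall inside the pathway, the p-value is 1/252. In a real analysis, the p-values for many pathways would additionally need a multiple-testing correction.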
Phenomenological information about regulatory interactions is frequently available and can be readily converted to Boolean models. Fully quantitative models, on the other hand, provide detailed insights into the precise dynamics of the underlying system. In order to connect discrete and continuous modeling approaches, methods for the conversion of Boolean systems into systems of ordinary differential equations have been developed recently. As biological interaction networks have steadily grown in size and complexity, a fully automated framework for the conversion process is desirable.
We present Odefy, a MATLAB- and Octave-compatible toolbox for the automated transformation of Boolean models into systems of ordinary differential equations. Models can be created from sets of Boolean equations or from graph representations of Boolean networks. Alternatively, the user can import Boolean models from the CellNetAnalyzer toolbox, GINSim and the PBN toolbox. The Boolean models are transformed into systems of ordinary differential equations by multivariate polynomial interpolation and optional application of sigmoidal Hill functions. Our toolbox contains basic simulation and visualization functionalities for both the Boolean and the continuous models. For further analyses, models can be exported to SQUAD, GNA, MATLAB script files, the SB toolbox, SBML and R script files. Odefy includes a user-friendly graphical user interface for convenient access to the simulation and export functionalities. We illustrate the validity of our transformation approach, as well as the usage and benefit of the Odefy toolbox, for two biological systems: a mutual inhibitory switch known from stem cell differentiation and a regulatory network giving rise to a specific spatial expression pattern at the mid-hindbrain boundary.
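The conversion idea can be illustrated outside MATLAB with a short Python sketch. This is not Odefy's API, and all function names here are hypothetical. The sketch makes three assumptions:

- It uses the multilinear form of polynomial interpolation, in which each corner b of the unit cube contributes its Boolean value weighted by ∏ x_i^{b_i}(1−x_i)^{1−b_i}.
- It optionally composes the inputs with a Hill function that is rescaled to map 0 to 0 and 1 to 1.
- It uses the ODE dx/dt = (B̃(x) − x)/τ.

Applied to a mutual-inhibition switch (x1 = NOT x2, x2 = NOT x1) with an assumed Hill exponent of 3 and threshold of 0.5, the continuous model becomes bistable.

```python
from itertools import product


def multilinear_interp(boolean_fn, x):
    """Multilinear interpolation of a Boolean function on the unit cube.

    Sums boolean_fn(b) * prod_i x_i^b_i * (1 - x_i)^(1 - b_i) over all corners
    b, so the result equals boolean_fn at every Boolean input.
    """
    total = 0.0
    for corner in product((0, 1), repeat=len(x)):
        if boolean_fn(*corner):
            weight = 1.0
            for xi, bi in zip(x, corner):
                weight *= xi if bi else (1.0 - xi)
            total += weight
    return total


def normalized_hill(x, n=3.0, k=0.5):
    """Hill function rescaled so that 0 maps to 0 and 1 maps to 1."""
    return x ** n / (x ** n + k ** n) * (1.0 + k ** n)


def not_gate(a):
    return not a


def simulate_toggle(x1, x2, tau=1.0, dt=0.01, steps=5000):
    """Euler integration of the switch x1 = NOT x2, x2 = NOT x1 (assumed ODE form)."""
    for _ in range(steps):
        target1 = multilinear_interp(not_gate, [normalized_hill(x2)])
        target2 = multilinear_interp(not_gate, [normalized_hill(x1)])
        x1, x2 = x1 + dt * (target1 - x1) / tau, x2 + dt * (target2 - x2) / tau
    return x1, x2
```

Starting from (0.6, 0.4), the trajectory settles near the steady state (1, 0). Starting from (0.4, 0.6), it settles near (0, 1). Without the Hill step, NOT x2 interpolates to 1 − x2, and every point with x1 + x2 = 1 is a steady state, so bistability is lost. This is one reason to apply sigmoidal Hill functions.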
Odefy provides an easy-to-use toolbox for the automatic conversion of Boolean models to systems of ordinary differential equations. It can be efficiently connected to a variety of input and output formats for further analysis and investigations. The toolbox is open-source and can be downloaded at http://cmb.helmholtz-muenchen.de/odefy.
MicroRNA-mediated control of gene expression via translational inhibition has a substantial impact on cellular regulatory mechanisms. About 37% of mammalian microRNAs appear to be located within introns of protein-coding genes, linking their expression to the promoter-driven regulation of the host gene. In this study, we investigate whether this linkage extends to a relationship beyond transcriptional co-regulation.
Using measures based on both annotation and experimental data, we show that intronic microRNAs tend to support their host genes by regulating target genes whose expression patterns are significantly correlated with those of the host genes. We used expression data from three differentiating cell types and compared the gene expression profiles of host and target genes. Many microRNA target genes show expression patterns significantly correlated with the expression of the microRNA host genes. By calculating functional similarities between host genes and predicted microRNA target genes based on GO annotations, we confirm that many microRNAs link host and target gene activity in either a synergistic or an antagonistic manner.
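One simple way to score a host/target pair is to compute the Pearson correlation of their expression profiles together with a permutation p-value. The abstract does not specify the exact statistics used, so this standard-library sketch is an assumption. It also assumes that neither profile is constant. With only a handful of time points, the permutation null distribution is coarse. The function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(host, target, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the host/target Pearson correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(host, target))
    shuffled = list(target)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between host and target
        if abs(pearson(host, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

The sign of the correlation distinguishes co-expressed pairs from anti-correlated ones. The GO-based functional similarity mentioned above is a separate measure that this sketch does not cover.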
These two regulatory effects may result from fine-tuning of the expression of target genes that are functionally related to the host gene, or from knock-down of the residual expression of opposing target genes. This finding allows the common practice of mapping large-scale gene expression data to protein-associated genes to be extended with the functionality of co-expressed intronic microRNAs.