For decades, cold-adapted, temperature-sensitive (ca/ts) strains of influenza A virus have been used as live attenuated vaccines. Given their great public health importance, it is crucial to understand the molecular mechanisms of cold adaptation and temperature sensitivity, which are currently unknown. Secondary RNA structures play important roles in influenza biology. We therefore hypothesized that a relatively minor change in temperature (32–39°C) can lead to perturbations in influenza RNA structures and that these structural perturbations may differ between mRNAs of the wild-type (wt) and ca/ts strains. To test this hypothesis, we developed a novel in silico method that assesses whether two related RNA molecules would undergo similar or dissimilar structural perturbations upon temperature change. The proposed method identifies those areas within an RNA chain where dissimilarities of RNA secondary structures at two different temperatures are particularly pronounced, without knowing the particular RNA shapes at either temperature. We identified such areas in the NS2, PA, PB2 and NP mRNAs. However, these areas are not identical for the wt and ca/ts mutants. Differences in temperature-induced structural changes of wt and ca/ts mRNA structures may constitute an as yet unappreciated molecular mechanism of the cold adaptation/temperature sensitivity phenomena.
influenza; RNA; structure; temperature; vaccine
Motivation: Single-cell experiments of cells from the early mouse embryo yield gene expression data for different developmental stages from zygote to blastocyst. To better understand cell fate decisions during differentiation, it is desirable to analyse the high-dimensional gene expression data and assess differences in gene expression patterns between different developmental stages as well as within developmental stages. Conventional methods include univariate analyses of distributions of genes at different stages or multivariate linear methods such as principal component analysis (PCA). However, these approaches often fail to resolve important differences as each lineage has a unique gene expression pattern which changes gradually over time yielding different gene expressions both between different developmental stages as well as heterogeneous distributions at a specific stage. Furthermore, to date, no approach taking the temporal structure of the data into account has been presented.
Results: We present a novel framework based on Gaussian process latent variable models (GPLVMs) to analyse single-cell qPCR expression data of 48 genes from mouse zygote to blastocyst as presented by Guo et al. (2010). We extend GPLVMs by introducing gene relevance maps and gradient plots to provide interpretability as in the linear case. Furthermore, we take the temporal group structure of the data into account and introduce a new factor in the GPLVM likelihood which ensures that small distances are preserved for cells from the same developmental stage. Using our novel framework, it is possible to resolve differences in gene expression for all developmental stages. Furthermore, a new subpopulation of cells within the 16-cell stage is identified which is significantly more trophectoderm-like than the rest of the population. The trophectoderm-like subpopulation was characterized by considerable differences in the expression of Id2, Gata4 and, to a smaller extent, Klf4 and Hand1. The relevance of Id2 as an early marker for TE cells is consistent with previously published results.
Availability: The mappings were implemented based on Prof. Neil Lawrence's FGPLVM toolbox; extensions for relevance analysis and for including the structure of the data can be obtained from the homepage of one of the authors.
In radiation protection, biokinetic models for zirconium processing are of crucial importance in dose estimation and further risk analysis for humans exposed to this radioactive substance. They provide limiting values of detrimental effects and build the basis for applications in internal dosimetry, the prediction for radioactive zirconium retention in various organs as well as retrospective dosimetry. Multi-compartmental models are the tool of choice for simulating the processing of zirconium. Although easily interpretable, determining the exact compartment structure and interaction mechanisms is generally daunting. In the context of observing the dynamics of multiple compartments, Bayesian methods provide efficient tools for model inference and selection.
We are the first to apply a Markov chain Monte Carlo approach to compute Bayes factors for the evaluation of two competing models for zirconium processing in the human body after ingestion. Based on in vivo measurements of human plasma and urine levels we were able to show that a recently published model is superior to the standard model of the International Commission on Radiological Protection. The Bayes factors were estimated by means of the numerically stable thermodynamic integration in combination with a recently developed copula-based Metropolis-Hastings sampler.
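The idea behind thermodynamic integration can be illustrated with a minimal sketch. It uses the identity log Z = ∫₀¹ E_t[log L] dt, where the expectation at temperature t is taken under the power posterior (prior × likelihood^t). Everything in the example is an assumption made for illustration and is not the zirconium model: the data follow a normal distribution with unknown mean and unit variance, the prior on the mean is normal with zero mean, and the expectation is evaluated in closed form, where a real application would estimate it from MCMC samples. The two "models" differ only in their hypothetical prior variance.

```python
import math


def expected_log_likelihood(data, t, prior_variance):
    """E[log L] under the power posterior at temperature t (closed form for this toy model)."""
    n = len(data)
    # Power posterior of the mean is normal with this precision and mean.
    precision = 1.0 / prior_variance + t * n
    variance = 1.0 / precision
    mean = t * sum(data) * variance
    squared = sum((y - mean) ** 2 for y in data) + n * variance
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * squared


def log_evidence_ti(data, prior_variance, n_temperatures=1001):
    """Trapezoidal integration of E_t[log L] over the temperature t in [0, 1]."""
    ts = [i / (n_temperatures - 1) for i in range(n_temperatures)]
    values = [expected_log_likelihood(data, t, prior_variance) for t in ts]
    return sum(
        0.5 * (values[i] + values[i + 1]) * (ts[i + 1] - ts[i])
        for i in range(len(ts) - 1)
    )


def log_evidence_exact(data, prior_variance):
    """Analytic log marginal likelihood of the same toy model, used as a check."""
    n = len(data)
    s = sum(data)
    ss = sum(y * y for y in data)
    shrink = prior_variance / (1.0 + n * prior_variance)
    return (
        -0.5 * n * math.log(2.0 * math.pi)
        - 0.5 * math.log(1.0 + n * prior_variance)
        - 0.5 * (ss - shrink * s * s)
    )


# Hypothetical measurements and two competing toy models (prior variance 1 vs 4).
data = [0.8, 1.1, 0.9, 1.3, 1.0]
log_z_a = log_evidence_ti(data, prior_variance=1.0)
log_z_b = log_evidence_ti(data, prior_variance=4.0)
log_bayes_factor = log_z_a - log_z_b
print("log evidence A:", log_z_a, "exact:", log_evidence_exact(data, 1.0))
print("log Bayes factor A vs B:", log_bayes_factor)
```

The Bayes factor is the ratio of the two evidences, so its logarithm is the difference of the two integrals. With sampled rather than analytic expectations, the same integration step applies unchanged.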
In contrast to the standard model the novel model predicts lower accretion of zirconium in bones. This results in lower levels of noxious doses for exposed individuals. Moreover, the Bayesian approach allows for retrospective dose assessment, including credible intervals for the initially ingested zirconium, in a significantly more reliable fashion than previously possible. All methods presented here are readily applicable to many modeling tasks in systems biology.
Bayesian inference; Model selection; MCMC sampling; Compartmental model; Internal dosimetry; Systems biology
Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb was applied to determine the significance of the p-gain.
Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*α) is a conservative critical value for the p-gain, where α is the level of significance and B the number of tested metabolite pairs.
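As a small illustration of how the critical value might be applied, the sketch below computes a p-gain and compares it with B/(2*α). The p-gain definition used here (the smaller of the two single-metabolite p-values divided by the p-value of the ratio) is the commonly used one and is assumed, since the text above does not spell it out. The function names and the example p-values are hypothetical.

```python
def p_gain(p_metabolite_1, p_metabolite_2, p_ratio):
    """Assumed definition: smallest single-metabolite p-value divided by the ratio's p-value."""
    return min(p_metabolite_1, p_metabolite_2) / p_ratio


def critical_p_gain(alpha, n_pairs):
    """Conservative critical value B / (2 * alpha) stated in the text."""
    return n_pairs / (2.0 * alpha)


def ratio_is_significant(p_metabolite_1, p_metabolite_2, p_ratio, alpha, n_pairs):
    """True if the p-gain exceeds the conservative critical value."""
    gain = p_gain(p_metabolite_1, p_metabolite_2, p_ratio)
    return gain > critical_p_gain(alpha, n_pairs)


# Hypothetical example: 100 tested metabolite pairs at significance level 0.05.
threshold = critical_p_gain(alpha=0.05, n_pairs=100)
gain = p_gain(1e-5, 1e-3, 1e-9)
print("critical p-gain:", threshold)
print("observed p-gain:", gain)
print("significant:", ratio_is_significant(1e-5, 1e-3, 1e-9, 0.05, 100))
```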
We show that the p-gain is a well-defined measure that can be used to identify statistically significant metabolite ratios in association studies, and we provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
p-gain; Metabolomics; MWAS; GWAS; Genome-wide association studies; Metabolome-wide association studies
Social contact with fungus-exposed ants leads to pathogen transfer to healthy nest-mates, causing low-level infections. These micro-infections promote pathogen-specific immune gene expression and protective immunization of nest-mates.
Due to the omnipresent risk of epidemics, insect societies have evolved sophisticated disease defences at the individual and colony level. An intriguing yet little understood phenomenon is that social contact with pathogen-exposed individuals reduces susceptibility of previously naive nestmates to this pathogen. We tested whether such social immunisation in Lasius ants against the entomopathogenic fungus Metarhizium anisopliae is based on active upregulation of the immune system of nestmates following contact with an infectious individual or on passive protection via transfer of immune effectors among group members, that is, active versus passive immunisation. We found no evidence for involvement of passive immunisation via transfer of antimicrobials among colony members. Instead, intensive allogrooming behaviour between naive and pathogen-exposed ants before fungal conidia firmly attached to their cuticle suggested passage of the pathogen from the exposed individuals to their nestmates. By tracing fluorescence-labelled conidia we indeed detected frequent pathogen transfer to the nestmates, where the conidia caused low-level infections as revealed by growth of small numbers of fungal colony forming units from their dissected body content. These infections rarely led to death, but instead promoted an enhanced ability to inhibit fungal growth and an active upregulation of immune genes involved in antifungal defences (defensin and prophenoloxidase, PPO). In contrast, there was no upregulation of the gene cathepsin L, which is associated with antibacterial and antiviral defences, and we found no increased antibacterial activity of nestmates of fungus-exposed ants. This indicates that social immunisation after fungal exposure is specific, similar to recent findings for individual-level immune priming in invertebrates.
Epidemiological modeling further suggests that active social immunisation is adaptive, as it leads to faster elimination of the disease and lower death rates than passive immunisation. Interestingly, humans have also utilised the protective effect of low-level infections to fight smallpox by intentional transfer of low pathogen doses (“variolation” or “inoculation”).
Close social contact facilitates pathogen transmission in societies, often causing epidemics. In contrast to this, we show that limited transmission of a fungal pathogen in ant colonies can be beneficial for the host, because it promotes “social immunisation” of healthy group members. We found that ants exposed to the fungus are heavily groomed by their healthy nestmates. Grooming removes a significant number of fungal conidiospores from the body surface of exposed ants and reduces their risk of falling sick. At the same time, previously healthy nestmates are themselves exposed to a small number of conidiospores, triggering low-level infections. These micro-infections are not deadly, but result in upregulated expression of a specific set of immune genes and pathogen-specific protective immune stimulation. Pathogen transfer by social interactions is therefore the underlying mechanism of social immunisation against fungal infections in ant societies. There is a similarity between such natural social immunisation and human efforts to induce immunity against deadly diseases, such as smallpox. Before vaccination with dead or attenuated strains was invented, immunity in human societies was induced by actively transferring low-level infections (“variolation”), just like in ants.
Although human musical performances represent one of the most valuable achievements of mankind, the best musicians perform imperfectly. Musical rhythms are not entirely accurate and thus inevitably deviate from the ideal beat pattern. Nevertheless, computer generated perfect beat patterns are frequently devalued by listeners due to a perceived lack of human touch. Professional audio editing software therefore offers a humanizing feature which artificially generates rhythmic fluctuations. However, the built-in humanizing units are essentially random number generators producing only simple uncorrelated fluctuations. Here, for the first time, we establish long-range fluctuations as an inevitable natural companion of both simple and complex human rhythmic performances. Moreover, we demonstrate that listeners strongly prefer long-range correlated fluctuations in musical rhythms. Thus, the favorable fluctuation type for humanizing interbeat intervals coincides with the one generically inherent in human musical performances.
Metabolomic profiling and the integration of whole-genome genetic association data have proven to be a powerful tool to comprehensively explore gene regulatory networks and to investigate the effects of genetic variation at the molecular level. Serum metabolite concentrations allow a direct readout of biological processes, and association of specific metabolomic signatures with complex diseases such as Alzheimer's disease and cardiovascular and metabolic disorders has been shown. There are well-known correlations between sex and the incidence, prevalence, age of onset, symptoms, and severity of a disease, as well as the reaction to drugs. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. This study investigated sex-specific differences of serum metabolite concentrations and their underlying genetic determination. For discovery and replication we used more than 3,300 independent individuals from KORA F3 and F4 with metabolite measurements of 131 metabolites, including amino acids, phosphatidylcholines, sphingomyelins, acylcarnitines, and C6-sugars. A linear regression approach revealed significant concentration differences between males and females for 102 out of 131 metabolites (p-values < 3.8×10⁻⁴; Bonferroni-corrected threshold). Sex-specific genome-wide association studies (GWAS) showed genome-wide significant differences in beta-estimates for SNPs in the CPS1 locus (carbamoyl-phosphate synthase 1, significance level: p < 3.8×10⁻¹⁰; Bonferroni-corrected threshold) for glycine. We showed that the metabolite profiles of males and females are significantly different and, furthermore, that specific genetic variants in metabolism-related genes exhibit sexual dimorphism. Our study provides important new insights into sex-specific differences of cell regulatory processes and underscores that studies should consider sex-specific effects in design and interpretation.
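The two Bonferroni-corrected thresholds can be reproduced with simple arithmetic. The sketch below assumes base significance levels of 0.05 for the metabolite comparison and 5×10⁻⁸ for the genome-wide analysis, each divided by the 131 metabolites. These base levels are not stated in the text; they are chosen because they are conventional and give the reported values.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


N_METABOLITES = 131  # number of metabolites given in the text

# Assumed base levels (not stated in the text).
metabolite_threshold = bonferroni_threshold(0.05, N_METABOLITES)
gwas_threshold = bonferroni_threshold(5e-8, N_METABOLITES)

print(f"metabolite-level threshold: {metabolite_threshold:.1e}")
print(f"genome-wide threshold:      {gwas_threshold:.1e}")
```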
The combination of genomic and metabolic studies in recent years has provided astonishing results. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. The investigation of 131 serum metabolite concentrations of >3,300 population-based samples (KORA F3/F4) revealed significant differences in the metabolite profile of males and females. Furthermore, a genome-wide picture of sex-specific genetic variations in human metabolism (>2,000 subjects from KORA F3/F4 cohorts) was investigated. Sex-specific genome-wide association studies (GWAS) showed differences in the effect of genetic variations on metabolites in men and women. SNPs in the CPS1 (carbamoyl-phosphate synthase 1) locus showed genome-wide significant differences in beta-estimates of sex-specific association analysis (significance level: 3.8×10⁻¹⁰) for glycine. As global metabolomic techniques are increasingly refined to identify more compounds in single biological samples, the predictive power of this new technology will greatly increase. This suggests that metabolites, which may be used as predictive biomarkers to indicate the presence or severity of a disease, have to be used selectively depending on sex.
Hematopoiesis is an ideal model system for stem cell biology with advanced experimental access. A systems view on the interactions of core transcription factors is important for understanding differentiation mechanisms and dynamics. In this manuscript, we construct a Boolean network to model myeloid differentiation, specifically from common myeloid progenitors to megakaryocytes, erythrocytes, granulocytes and monocytes. By interpreting the hematopoietic literature and translating experimental evidence into Boolean rules, we implement binary dynamics on the resulting 11-factor regulatory network. Our network contains interesting functional modules and a concatenation of mutual antagonistic pairs. The state space of our model is a hierarchical, acyclic graph, typifying the principles of myeloid differentiation. We observe excellent agreement between the steady states of our model and microarray expression profiles of two different studies. Moreover, perturbations of the network topology correctly reproduce reported knockout phenotypes in silico. We predict previously uncharacterized regulatory interactions and alterations of the differentiation process, and outline reprogramming strategies.
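To make the notions of Boolean rules, steady states and in silico knockouts concrete, here is a minimal sketch. It uses a hypothetical two-factor network, not the 11-factor model. The factors X and Y and their rules are invented for illustration and form a single mutually antagonistic pair, and synchronous updating is an assumption.

```python
from itertools import product

# Hypothetical rules: each factor is active only if its antagonist is inactive.
RULES = {
    "X": lambda state: not state["Y"],
    "Y": lambda state: not state["X"],
}


def step(state, rules, knockouts=()):
    """One synchronous update; knocked-out factors are held at False."""
    return {
        gene: False if gene in knockouts else bool(rule(state))
        for gene, rule in rules.items()
    }


def steady_states(rules, knockouts=()):
    """Enumerate all states and keep those that the update leaves unchanged."""
    genes = sorted(rules)
    found = []
    for values in product([False, True], repeat=len(genes)):
        state = dict(zip(genes, values))
        if any(state[gene] for gene in knockouts):
            continue  # a knocked-out factor cannot be active
        if step(state, rules, knockouts) == state:
            found.append(state)
    return found


wild_type = steady_states(RULES)
x_knockout = steady_states(RULES, knockouts=("X",))
print("wild type steady states:", wild_type)
print("steady states with X knocked out:", x_knockout)
```

In this toy network the wild type has two steady states, one per factor, while knocking out X leaves only the state in which Y is active. Exhaustive enumeration is feasible for small networks such as an 11-factor model (2¹¹ = 2,048 states).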
With the advent of high-throughput targeted metabolic profiling techniques, the question of how to interpret and analyze the resulting vast amount of data becomes more and more important. In this work we address the reconstruction of metabolic reactions from cross-sectional metabolomics data, that is without the requirement for time-resolved measurements or specific system perturbations. Previous studies in this area mainly focused on Pearson correlation coefficients, which however are generally incapable of distinguishing between direct and indirect metabolic interactions.
In our new approach we propose the application of a Gaussian graphical model (GGM), an undirected probabilistic graphical model estimating the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is pairwise Pearson correlation coefficients conditioned against the correlation with all other metabolites. We first demonstrate the general validity of the method and its advantages over regular correlation networks with computer-simulated reaction systems. Then we estimate a GGM on data from a large human population cohort, covering 1020 fasting blood serum samples with 151 quantified metabolites. The GGM is much sparser than the correlation network, shows a modular structure with respect to metabolite classes, and is stable to the choice of samples in the data set. On the example of human fatty acid metabolism, we demonstrate for the first time that high partial correlation coefficients generally correspond to known metabolic reactions. This feature is evaluated both manually by investigating specific pairs of high-scoring metabolites, and then systematically on a literature-curated model of fatty acid synthesis and degradation. Our method detects many known reactions along with possibly novel pathway interactions, representing candidates for further experimental examination.
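The difference between Pearson and partial correlation can be seen in a minimal sketch. It relies on the standard identity that partial correlations conditioned on all other variables follow from the inverse of the correlation matrix. The three-variable correlation matrix is a made-up example of a reaction chain A → B → C, not data from the cohort, and the small matrix inversion routine is included only to keep the example self-contained.

```python
import math


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair, conditioned on all remaining variables."""
    precision = invert(corr)
    n = len(corr)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical chain A -> B -> C: A and C correlate (0.64) only through B.
corr = [
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
]
pcor = partial_correlations(corr)
print("Pearson A-C:", corr[0][2])
print("partial A-C given B:", round(pcor[0][2], 6))
print("partial A-B given C:", round(pcor[0][1], 6))
```

The indirect A-C association vanishes once B is conditioned on, while the direct A-B association remains. Inverting the correlation matrix requires more samples than variables, as in the cohort above (1020 samples, 151 metabolites).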
In summary, we demonstrate strong signatures of intracellular pathways in blood serum data, and provide a valuable tool for the unbiased reconstruction of metabolic reactions from large-scale metabolomics data sets.
External stimulation of cells by hormones, cytokines or growth factors activates signal transduction pathways that subsequently induce a re-arrangement of cellular gene expression. The analysis of such changes is complicated, as they consist of multi-layered temporal responses. While classical analyses based on clustering or gene set enrichment only partly reveal this information, matrix factorization techniques are well suited for a detailed temporal analysis. In signal processing, factorization techniques incorporating data properties like spatial and temporal correlation structure have been shown to be robust and computationally efficient. However, such correlation-based methods have so far not been applied in bioinformatics, because large-scale biological data rarely imply a natural order that allows the definition of a delayed correlation function.
We therefore develop the concept of graph-decorrelation. We encode prior knowledge like transcriptional regulation, protein interactions or metabolic pathways in a weighted directed graph. By linking features along this underlying graph, we introduce a partial ordering of the features (e.g. genes) and are thus able to define a graph-delayed correlation function. Using this framework as constraint to the matrix factorization task allows us to set up the fast and robust graph-decorrelation algorithm (GraDe). To analyze alterations in the gene response in IL-6 stimulated primary mouse hepatocytes, we performed a time-course microarray experiment and applied GraDe. In contrast to standard techniques, the extracted time-resolved gene expression profiles showed that IL-6 activates genes involved in cell cycle progression and cell division. Genes linked to metabolic and apoptotic processes are down-regulated indicating that IL-6 mediated priming renders hepatocytes more responsive towards cell proliferation and reduces expenditures for the energy metabolism.
GraDe provides a novel framework for the decomposition of large-scale 'omics' data. We were able to show that including prior knowledge into the separation task leads to a much more structured and detailed separation of the time-dependent responses upon IL-6 stimulation compared to standard methods. A Matlab implementation of the GraDe algorithm is freely available at http://cmb.helmholtz-muenchen.de/grade.
Extensive and automated data integration in bioinformatics facilitates the construction of large, complex biological networks. However, the challenge lies in the interpretation of these networks. While most research focuses on the unipartite or bipartite case, we address the more general but common situation of k-partite graphs. These graphs contain k different node types and links are only allowed between nodes of different types. In order to reveal their structural organization and describe the contained information in a more coarse-grained fashion, we ask how to detect clusters within each node type.
Since entities in biological networks regularly have more than one function and hence participate in more than one cluster, we developed a k-partite graph partitioning algorithm that allows for overlapping (fuzzy) clusters. It determines for each node a degree of membership to each cluster. Moreover, the algorithm estimates a weighted k-partite graph that connects the extracted clusters. Our method is fast and efficient, mimicking the multiplicative update rules commonly employed in algorithms for non-negative matrix factorization. It facilitates the decomposition of networks on a chosen scale and therefore allows for analysis and interpretation of structures on various resolution levels. Applying our algorithm to a tripartite disease-gene-protein complex network, we were able to structure this graph on a large scale into clusters that are functionally correlated and biologically meaningful. Locally, smaller clusters enabled reclassification or annotation of the clusters' elements. We exemplified this for the transcription factor MECP2.
In order to cope with the overwhelming amount of information available from biomedical literature, we need to tackle the challenge of finding structures in large networks with nodes of multiple types. To this end, we presented a novel fuzzy k-partite graph partitioning algorithm that allows the decomposition of these objects in a comprehensive fashion. We validated our approach both on artificial and real-world data. It is readily applicable to any further problem.
The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data, one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downstream of the node, including itself. Using a large set of microarray expression data of the bacterium Escherichia coli, we find that the gene expression in different subnets exhibits a structured pattern in response to environmental changes and genotypic mutation. Subnets with fewer changes in their expression pattern have a higher fraction of feed-forward loop motifs and a lower fraction of small RNA targets within them. Our study implies that the TRN consists of several scales of regulatory organization: (1) subnets with more varying gene expression controlled by both transcription factors and post-transcriptional RNA regulation and (2) subnets with less varying gene expression having more feed-forward loops and less post-transcriptional RNA regulation.
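The subnet definition above amounts to a reachability search in a directed graph. The following minimal sketch computes it with a breadth-first traversal; the toy network and its node names are hypothetical and are not taken from the E. coli TRN.

```python
from collections import deque


def subnet(network, root):
    """Return the root plus every node reachable from it along regulatory edges."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in network.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical network: regulator -> list of directly regulated genes.
toy_trn = {
    "tfA": ["tfB", "gene1"],
    "tfB": ["gene2"],
    "tfC": ["gene2", "gene3"],
}

print(sorted(subnet(toy_trn, "tfA")))
print(sorted(subnet(toy_trn, "gene3")))
```

A gene without targets forms a subnet containing only itself, and the `seen` set also keeps the traversal finite when the network contains feedback loops.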
Bacterial cells can adapt to various genomic mutations and intriguingly many environmental changes. They do this by adjusting their gene expression profile to meet the requirements of a new condition. In this work, we study the interplay of different mechanisms of gene regulatory control driving this adaptation in the bacterium E. coli. We deconstruct the network of all transcription factor mediated regulatory interactions into subnets, topologically defined subgraphs which we expect to act as information processing units. Indeed, we find that many subnets react coordinately to cellular stress, and are used by the cells to account for mutations. In these subnets, we also find many small RNA targets. In contrast, those subnets that do not act in a coordinated fashion are highly enriched with feed-forward loops, a 3-node network motif with important information processing properties. Our approach reveals correlations and anti-correlations of three scales of regulatory control: subnets, feed-forward loops, and small RNA.
MicroRNAs are a large class of post-transcriptional regulators that bind to the 3′ untranslated region of messenger RNAs. They play a critical role in many cellular processes and have been linked to the control of signal transduction pathways. Recent studies indicate that microRNAs can function as tumor suppressors or even as oncogenes when aberrantly expressed. To gain more general insight into disease-associated microRNAs, we analyzed their impact on human signaling pathways from two perspectives. On a global scale, we found a core set of signaling pathways with enriched tissue-specific microRNA targets across diseases. The function of these pathways reflects the affinity of microRNAs to regulate cellular processes associated with apoptosis, proliferation or development. Comparing cancer and non-cancer related microRNAs, we found no significant differences between the two groups. To unveil the interaction and regulation of microRNAs on signaling pathways locally, we analyzed the cellular location and process type of disease-associated microRNA targets and proteins. While disease-associated proteins are highly enriched in extracellular components of the pathway, microRNA targets are preferentially located in the nucleus. Moreover, targets of disease-associated microRNAs preferentially exhibit an inhibitory effect within the pathways, in contrast to disease proteins. Our analysis provides systematic insights into the interaction of disease-associated microRNAs and signaling pathways and uncovers differences in cellular locations and process types of microRNA targets and disease-associated proteins.
Phenomenological information about regulatory interactions is frequently available and can be readily converted to Boolean models. Fully quantitative models, on the other hand, provide detailed insights into the precise dynamics of the underlying system. In order to connect discrete and continuous modeling approaches, methods for the conversion of Boolean systems into systems of ordinary differential equations have been developed recently. As biological interaction networks have steadily grown in size and complexity, a fully automated framework for the conversion process is desirable.
We present Odefy, a MATLAB- and Octave-compatible toolbox for the automated transformation of Boolean models into systems of ordinary differential equations. Models can be created from sets of Boolean equations or graph representations of Boolean networks. Alternatively, the user can import Boolean models from the CellNetAnalyzer toolbox, GINSim and the PBN toolbox. The Boolean models are transformed to systems of ordinary differential equations by multivariate polynomial interpolation and optional application of sigmoidal Hill functions. Our toolbox contains basic simulation and visualization functionalities for both the Boolean and the continuous models. For further analyses, models can be exported to SQUAD, GNA, MATLAB script files, the SB toolbox, SBML and R script files. Odefy contains a user-friendly graphical user interface for convenient access to the simulation and exporting functionalities. We illustrate the validity of our transformation approach as well as the usage and benefit of the Odefy toolbox for two biological systems: a mutual inhibitory switch known from stem cell differentiation and a regulatory network giving rise to a specific spatial expression pattern at the mid-hindbrain boundary.
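The transformation idea can be sketched in a few lines. The sketch is written in Python for illustration only and does not use Odefy or reproduce its API. Each Boolean update function is extended to continuous inputs by multilinear interpolation between the corners of the unit cube, the inputs are first passed through a Hill function, and each species relaxes towards the interpolated value. The two-species mutual inhibitory switch, the Hill parameters (n = 3, k = 0.5), the time constant and the explicit Euler integration are all assumptions chosen for the example.

```python
from itertools import product


def hill(x, n=3.0, k=0.5):
    """Sigmoidal Hill function mapping a concentration in [0, 1] to [0, 1)."""
    xn = x ** n
    return xn / (xn + k ** n)


def interpolate(bool_fn, inputs):
    """Multilinear interpolation of a Boolean function over continuous inputs in [0, 1]."""
    total = 0.0
    for corner in product([0, 1], repeat=len(inputs)):
        if bool_fn(*corner):
            weight = 1.0
            for x, bit in zip(inputs, corner):
                weight *= x if bit else (1.0 - x)
            total += weight
    return total


# Hypothetical mutual inhibitory switch: species -> (Boolean rule, regulators).
RULES = {
    "A": (lambda b: not b, ["B"]),
    "B": (lambda a: not a, ["A"]),
}


def derivatives(state, tau=1.0):
    """dx/dt = (interpolated rule applied to Hill-transformed inputs - x) / tau."""
    result = {}
    for name, (rule, regulators) in RULES.items():
        activated = [hill(state[r]) for r in regulators]
        result[name] = (interpolate(rule, activated) - state[name]) / tau
    return result


def simulate(initial, dt=0.01, steps=2000):
    """Explicit Euler integration of the continuous model."""
    state = dict(initial)
    for _ in range(steps):
        rates = derivatives(state)
        state = {name: state[name] + dt * rates[name] for name in state}
    return state


final = simulate({"A": 0.9, "B": 0.1})
print(final)
```

At the corners of the unit cube the interpolation reproduces the Boolean rule exactly, which is what links the continuous model to its discrete counterpart. Starting with A slightly dominant, the simulation settles with A high and B low, mirroring one of the two Boolean steady states of the switch.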
Odefy provides an easy-to-use toolbox for the automatic conversion of Boolean models to systems of ordinary differential equations. It can be efficiently connected to a variety of input and output formats for further analysis and investigations. The toolbox is open-source and can be downloaded at http://cmb.helmholtz-muenchen.de/odefy.
MicroRNA-mediated control of gene expression via translational inhibition has substantial impact on cellular regulatory mechanisms. About 37% of mammalian microRNAs appear to be located within introns of protein coding genes, linking their expression to the promoter-driven regulation of the host gene. In our study we investigate this linkage towards a relationship beyond transcriptional co-regulation.
Using measures based on both annotation and experimental data, we show that intronic microRNAs tend to support their host genes by regulation of target gene expression with significantly correlated expression patterns. We used expression data of three differentiating cell types and compared gene expression profiles of host and target genes. Many microRNA target genes show expression patterns significantly correlated with the expressions of the microRNA host genes. By calculating functional similarities between host and predicted microRNA target genes based on GO annotations, we confirm that many microRNAs link host and target gene activity in an either synergistic or antagonistic manner.
These two regulatory effects may result from fine-tuning of target gene expression functionally related to the host or from knock-down of remaining opponent target gene expression. This finding makes it possible to extend the common practice of mapping large-scale gene expression data to protein-associated genes by including the functionality of co-expressed intronic microRNAs.
The isthmic organizer mediating differentiation of mid- and hindbrain during vertebrate development is characterized by a well-defined pattern of locally restricted gene expression domains around the mid-hindbrain boundary (MHB). This pattern is established and maintained by a regulatory network between several transcription and secreted factors that is not yet understood in full detail. In this contribution we show that a Boolean analysis of the characteristic spatial gene expression patterns at the murine MHB reveals key regulatory interactions in this network. Our analysis employs techniques from computational logic for the minimization of Boolean functions. This approach also allows us to predict the interplay of the various regulatory interactions. In particular, we predict a maintaining, rather than inducing, effect of Fgf8 on Wnt1 expression, an issue that remained unclear from published data. Using mouse anterior neural plate/tube explant cultures, we provide experimental evidence that Fgf8 in fact only maintains but does not induce ectopic Wnt1 expression in these explants. In combination with previously validated interactions, this finding allows for the construction of a regulatory network between key transcription and secreted factors at the MHB. Analyses of Boolean, differential equation and reaction-diffusion models of this network confirm that it is indeed able to explain the stable maintenance of the MHB as well as time-courses of expression patterns both under wild-type and various knock-out conditions. In conclusion, we demonstrate that spatial expression patterns, like temporal ones, can be used to gain information about the structure of regulatory networks. We show, in particular, that the spatial gene expression patterns around the MHB help us to understand the maintenance of this boundary on a systems level.
Understanding brain formation during development is a tantalizing challenge. It is also essential for the fight against neurodegenerative diseases. In vertebrates, the central nervous system arises from a structure called the neural plate. This tissue is divided into four regions, which continue to develop into forebrain, midbrain, hindbrain and spinal cord. Interactions between locally expressed genes and signaling molecules are responsible for this patterning. Two key signaling molecules in this process are Fgf8 and Wnt1 proteins. They are secreted from a signaling center located at the boundary between prospective mid- and hindbrain (mid-hindbrain boundary, MHB) and mediate development of these two brain regions. Here, we logically analyze the spatial gene expression patterns at the MHB and predict interactions involved in the differentiation of mid- and hindbrain. In particular, our analysis indicates that Wnt1 depends on Fgf8 for stable maintenance. A time-course analysis of Wnt1 expression after implantation of Fgf8-coated beads in mouse neural plate/tube explants experimentally validates our prediction about the interactions between these two key patterning molecules. Subsequently, we demonstrate that available data allows construction of a mathematical model able to explain the maintenance of the signaling center at the MHB. We begin to understand this small aspect of brain formation on a systems level.
The understanding of regulatory and signaling networks has long been a core objective in Systems Biology. Knowledge about these networks is mainly of qualitative nature, which allows the construction of Boolean models, where the state of a component is either 'off' or 'on'. While often able to capture the essential behavior of a network, these models can never reproduce detailed time courses of concentration levels.
Nowadays, however, experiments yield more and more quantitative data. An obvious question, therefore, is how qualitative models can be used to explain and predict the outcome of these experiments.
In this contribution we present a canonical way of transforming Boolean into continuous models, where the use of multivariate polynomial interpolation allows transformation of logic operations into a system of ordinary differential equations (ODE). The method is standardized and can readily be applied to large networks. Other, more limited approaches to this task are briefly reviewed and compared. Moreover, we discuss and generalize existing theoretical results on the relation between Boolean and continuous models. As a test case a logical model is transformed into an extensive continuous ODE model describing the activation of T-cells. We discuss how parameters for this model can be determined such that quantitative experimental results are explained and predicted, including time-courses for multiple ligand concentrations and binding affinities of different ligands. This shows that from the continuous model we may obtain biological insights not evident from the discrete one.
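As a rough illustration of the idea, the sketch below uses multilinear interpolation, one simple form of multivariate polynomial interpolation, to extend a Boolean update function from the corners of the unit cube to all of [0, 1]^n, and then integrates dx_i/dt = (B_i(x) - x_i) / tau with an explicit Euler scheme. The function names, the single shared time constant `tau`, and the Euler integrator are assumptions made for this example and are not taken from the abstract.

```python
from itertools import product

def interpolate(boolean_fn, n_inputs):
    """Return the multilinear interpolation of a Boolean function on [0, 1]^n.

    The result agrees with boolean_fn at every corner of the unit cube.
    """
    corners = list(product([0, 1], repeat=n_inputs))

    def continuous(x):
        total = 0.0
        for corner in corners:
            if not boolean_fn(*corner):
                continue  # corners where the function is 0 contribute nothing
            weight = 1.0
            for xi, ci in zip(x, corner):
                weight *= xi if ci else (1.0 - xi)
            total += weight
        return total

    return continuous

def simulate(update_fns, x0, tau=1.0, dt=0.01, steps=2000):
    """Explicit Euler integration of dx_i/dt = (B_i(x) - x_i) / tau."""
    x = list(x0)
    for _ in range(steps):
        targets = [f(x) for f in update_fns]
        x = [xi + dt * (ti - xi) / tau for xi, ti in zip(x, targets)]
    return x

# AND becomes x*y and OR becomes x + y - x*y under this interpolation.
and_c = interpolate(lambda a, b: a and b, 2)
or_c = interpolate(lambda a, b: a or b, 2)

# Toy two-component network: component 0 holds its value, component 1 copies it.
hold = interpolate(lambda a, b: a, 2)
copy = interpolate(lambda a, b: a, 2)
final_state = simulate([hold, copy], x0=[1.0, 0.0])
```

Starting from `[1.0, 0.0]`, the second component relaxes towards 1, the state the Boolean rule would reach in a single step, but does so along a continuous time course whose speed is set by `tau`.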
The presented approach will facilitate the interaction between modeling and experiments. Moreover, it provides a straightforward way to apply quantitative analysis methods to qualitatively described systems.
Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable number of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these “unknown metabolites” is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype–metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.
Genome-wide association studies on metabolomics data have demonstrated that genetic variation in metabolic enzymes and transporters leads to concentration changes in the respective metabolite levels. The conventional goal of these studies is the detection of novel interactions between the genome and the metabolic system, providing valuable insights for both basic research and clinical applications. In this study, we borrow the metabolomics GWAS concept for a novel, entirely different purpose. Metabolite measurements frequently produce signals where a certain substance can be reliably detected in the sample, but it has not yet been elucidated which specific metabolite this signal actually represents. The concept is comparable to a fingerprint: each one is uniquely identifiable, but as long as it is not registered in a database one cannot tell to whom this fingerprint belongs. Obviously, this issue tremendously reduces the usability of metabolomics analyses. The genetic associations of such an “unknown,” however, give us concrete evidence of the metabolic pathway this substance is most probably involved in. Moreover, we complement the approach with a specific measure of correlation between metabolites, providing further evidence of the metabolic processes of the unknown. For a number of cases, this even allows for a concrete identity prediction, which we then experimentally validate in the lab.
To characterise the influence of the fat free mass on the metabolite profile in serum samples from participants of the population-based KORA (Cooperative Health Research in the Region of Augsburg) S4 study.
Subjects and Methods
Analyses were based on metabolite profiles from 965 participants of the S4 and 890 weight-stable subjects of its seven-year follow-up study (KORA F4). 190 different serum metabolites were quantified in a targeted approach including amino acids, acylcarnitines, phosphatidylcholines (PCs), sphingomyelins and hexose. Associations between metabolite concentrations and the fat free mass index (FFMI) were analysed using adjusted linear regression models. To draw conclusions on enzymatic reactions, intra-metabolite class ratios were explored. Pairwise relationships among metabolites were investigated and illustrated by means of Gaussian graphical models (GGMs).
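GGMs are commonly built on partial correlations, that is, the correlation between two variables after conditioning on all others, which can be read off the inverse of the correlation matrix. The following is a minimal pure-Python sketch of that calculation; the three-variable correlation matrix is made up for illustration, the helper names are assumptions, and the sketch does not include the covariate adjustment or significance testing a real analysis would need.

```python
import math

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Partial correlation of each pair of variables given all other variables."""
    prec = invert(corr)  # precision (inverse correlation) matrix
    n = len(corr)
    return [[1.0 if i == j
             else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]

# Hypothetical chain A - B - C: A and C correlate only through B (0.64 = 0.8 * 0.8).
corr = [
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
]
pcor = partial_correlations(corr)
```

In this made-up example the pairwise correlation between A and C is 0.64, but their partial correlation vanishes once B is accounted for, so a GGM would draw edges A-B and B-C only. This is why such networks tend to highlight direct relationships rather than indirect ones.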
We found 339 significant associations between FFMI and various metabolites in KORA S4. Among the most prominent associations (p-values 4.75×10⁻¹⁶ to 8.95×10⁻⁶) with higher FFMI were increasing concentrations of the branched-chain amino acids (BCAAs), ratios of BCAAs to glucogenic amino acids, and carnitine concentrations. For various PCs, a decrease in chain length or in saturation of the fatty acid moieties could be observed with increasing FFMI, as well as an overall shift from acyl-alkyl PCs to diacyl PCs. These findings were reproduced in KORA F4. The established GGMs supported the regression results and provided a comprehensive picture of the relationships between metabolites. In a sub-analysis, most of the discovered associations did not exist in obese subjects in contrast to non-obese subjects, possibly indicating derangements in skeletal muscle metabolism.
A set of serum metabolites strongly associated with FFMI was identified and a network explaining the relationships among metabolites was established. These results offer a novel and more complete picture of the FFMI effects on serum metabolites in a data-driven network.
PhenomiR is a comprehensive database of 542 studies reporting deregulation of miRNAs allowing large-scale statistical analysis of miRNA expression changes.
In recent years, microRNAs have been shown to play important roles in physiological as well as malignant processes. The PhenomiR database http://mips.helmholtz-muenchen.de/phenomir provides data from 542 studies that investigate deregulation of microRNA expression in diseases and biological processes as a systematic, manually curated resource. Using the PhenomiR dataset, we could demonstrate that, depending on disease type, independent information from cell culture studies contrasts with conclusions drawn from patient studies.
Transcriptome analysis of a zebrafish mutant that does not respond to amphetamine identifies a network of coordinated gene regulation that may underlie the susceptibility to addiction.
Addiction is a pathological dysregulation of the brain's reward systems, determined by several complex genetic pathways. The conditioned place preference test provides an evaluation of the effects of drugs in animal models, allowing the investigation of substances at a biologically relevant level with respect to reward. Our lab has previously reported the development of a reliable conditioned place preference paradigm for zebrafish. Here, this test was used to isolate a dominant N-ethyl-N-nitrosourea (ENU)-induced mutant, no addiction (nad^dne3256), which fails to respond to amphetamine, and which we used as an entry point towards identifying the behaviorally relevant transcriptional response to amphetamine.
Through the combination of microarray experiments comparing the adult brain transcriptome of mutant and wild-type siblings under normal conditions, as well as their response to amphetamine, we identified genes that correlate with the mutants' altered conditioned place preference behavior. In addition to pathways classically involved in reward, this gene set shows a striking enrichment in transcription factor-encoding genes classically involved in brain development, which later appear to be re-used within the adult brain. We selected a subset of them for validation by quantitative PCR and in situ hybridization, revealing that specific brain areas responding to the drug through these transcription factors include domains of ongoing adult neurogenesis. Finally, network construction revealed functional connections between several of these genes.
Together, our results identify a new network of coordinated gene regulation that influences or accompanies amphetamine-triggered conditioned place preference behavior and that may underlie the susceptibility to addiction.
We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of a rich information resource allows us to search for organizational properties concerning these complexes.
As the complexity of a protein complex in terms of the number of unique subunits increases, we observed that the number of such complexes and the mean non-synonymous to synonymous substitution ratio of associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tend to decrease. These observations provide evidence relating natural selection and the organization of mammalian complexes. We also observed greater homogeneity in terms of predicted protein isoelectric points, secondary structure and substitution ratio in annotated versus randomly generated complexes. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. In particular, we found that large proteins interact preferentially with much smaller proteins.
We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with the mammalian protein complexes.