MiRNAs are short, non-coding RNAs that regulate gene expression post-transcriptionally through specific binding to mRNA. Deregulation of miRNAs is associated with various diseases and interference with miRNA function has proven therapeutic potential. Most mRNAs are thought to be regulated by multiple miRNAs and there is some evidence that such joint activity is enhanced if a short distance between sites allows for cooperative binding. Until now, however, the concept of cooperativity among miRNAs has not been addressed in a transcriptome-wide approach. Here, we computationally screened human mRNAs for distances between miRNA binding sites that are expected to promote cooperativity. We find that sites with a maximal spacing of 26 nucleotides are enriched for naturally occurring miRNAs compared with control sequences. Furthermore, miRNAs with similar characteristics as indicated by either co-expression within a specific tissue or co-regulation in a disease context are predicted to target a higher number of mRNAs cooperatively than unrelated miRNAs. These bioinformatic data were compared with genome-wide sets of biochemically validated miRNA targets derived by Argonaute crosslinking and immunoprecipitation (HITS-CLIP and PAR-CLIP). To ease further research into combined and cooperative miRNA function, we developed miRco, a database connecting miRNAs and respective targets involved in distance-defined cooperative regulation (mips.helmholtz-muenchen.de/mirco). In conclusion, our findings suggest that cooperativity of miRNA-target interaction is a widespread phenomenon that may play an important role in miRNA-mediated gene regulation.
microRNA; target regulation; target prediction; cooperativity
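The distance screen described in the abstract above can be illustrated with a minimal sketch. The 26-nucleotide threshold comes from the abstract; the function name, the input layout and the choice to measure distance between site start positions are assumptions made here for illustration and are not taken from the miRco implementation.

```python
from itertools import combinations

MAX_SPACING = 26  # nucleotides; the maximal spacing reported in the abstract


def cooperative_pairs(sites, max_spacing=MAX_SPACING):
    """Return pairs of binding sites on one mRNA that lie within max_spacing.

    `sites` is a list of (mirna_name, start_position) tuples (hypothetical
    layout). Distance is measured between start positions, which is an
    assumption; other conventions are possible.
    """
    ordered = sorted(sites, key=lambda site: site[1])
    pairs = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(ordered, 2):
        distance = pos_b - pos_a
        if 0 < distance <= max_spacing:
            pairs.append((name_a, name_b, distance))
    return pairs


# Hypothetical sites on a single 3' UTR
example_sites = [("miR-a", 100), ("miR-b", 118), ("miR-c", 200)]
print(cooperative_pairs(example_sites))
```

Only the first two sites in the example are close enough to count as a candidate cooperative pair under these assumptions.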
Metabolomics is a relatively new high-throughput technology that aims at measuring all endogenous metabolites within a biological sample in an unbiased fashion. The resulting metabolic profiles may be regarded as functional signatures of the physiological state, and have been shown to comprise effects of genetic regulation as well as environmental factors. This potential to connect genotypic to phenotypic information promises new insights and biomarkers for different research fields, including biomedical and pharmaceutical research. In the statistical analysis of metabolomics data, many techniques from other omics fields can be reused. Recently, however, a number of tools specific to metabolomics data have been developed as well. The focus of this mini review will be on recent advancements in the analysis of metabolomics data, especially by utilizing Gaussian graphical models and independent component analysis.
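Gaussian graphical models connect two variables by their partial correlation, that is, the correlation that remains after conditioning on all other variables. The following is a minimal sketch for the three-variable case, where the full-order partial correlation reduces to the familiar first-order formula. The function names and the toy data are illustrative assumptions, not material from the review.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: z drives both x and y, which are otherwise unrelated.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 4, 3, 6, 5, 8, 7]
y = [2, 3, 2, 3, 6, 7, 6, 7]
print(pearson(x, y), partial_correlation(x, y, z))
```

In the toy data the plain correlation between x and y is high, while the partial correlation is close to zero, so a Gaussian graphical model would not draw a direct edge between them. With more than three variables, the partial correlations are usually obtained from the inverse of the covariance matrix instead.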
Genome-wide association scans with high-throughput metabolic profiling provide unprecedented insights into how genetic variation influences metabolism and complex disease. Here we report the most comprehensive exploration of genetic loci influencing human metabolism to date, including 7,824 adult individuals from two European population studies. We report genome-wide significant associations at 145 metabolic loci and their biochemical connectivity regarding more than 400 metabolites in human blood. We extensively characterize the resulting in vivo blueprint of metabolism in human blood by integrating it with information regarding gene expression, heritability, overlap with known drug targets, previous association with complex disorders and inborn errors of metabolism. We further developed a database and web-based resources for data mining and results visualization. Our findings contribute to a greater understanding of the role of inherited variation in blood metabolic diversity, and identify potential new opportunities for pharmacologic development and disease understanding.
Psoriasis is characterized by an apoptosis-resistant and metabolically active epidermis, while a hallmark of allergic contact dermatitis (ACD) is T cell-induced keratinocyte apoptosis. Here, we induced ACD reactions in psoriasis patients sensitized to nickel (n = 14) to investigate underlying mechanisms of psoriasis and ACD simultaneously. All patients developed a clinically and histologically typical dermatitis upon nickel challenge even in close proximity to pre-existing psoriasis plaques. However, the ACD reaction was delayed as compared to non-psoriatic patients, with a maximum intensity after 7 days. Whole genome expression analysis revealed alterations in numerous pathways related to metabolism and proliferation in non-involved skin of psoriasis patients as compared to non-psoriatic individuals, indicating that even in clinically non-involved skin of psoriasis patients molecular events opposing contact dermatitis may occur. Immunohistochemical comparison of ACD reactions as well as in vitro secretion analysis of lesional T cells showed a higher Th17 and neutrophilic migration as well as epidermal proliferation in psoriasis, while ACD reactions were dominated by cytotoxic CD8+ T cells and a Th2 signature. Based on these findings, we hypothesized that an ACD reaction directly on top of a pre-existing psoriasis plaque might influence the clinical course of psoriasis. We observed a strong clinical inflammation with a mixed psoriasis and eczema phenotype in histology. Surprisingly, the initial psoriasis plaque was unaltered after self-limitation of the ACD reaction. We conclude that sensitized psoriasis patients develop a typical, but delayed ACD reaction which might be relevant for patch test evaluation in clinical practice. Psoriasis and ACD are driven by distinct and independent immune mechanisms.
Biological data often originate from samples containing mixtures of subpopulations, corresponding e.g. to distinct cellular phenotypes. However, identification of distinct subpopulations may be difficult if biological measurements yield distributions that are not easily separable.
We present Multiresolution Correlation Analysis (MCA), a method for visually identifying subpopulations based on the local pairwise correlation between covariates, without needing to define an a priori interaction scale. We demonstrate that MCA facilitates the identification of differentially regulated subpopulations in simulated data from a small gene regulatory network, followed by application to previously published single-cell qPCR data from mouse embryonic stem cells. We show that MCA recovers previously identified subpopulations, provides additional insight into the underlying correlation structure, reveals potentially spurious compartmentalizations, and provides insight into novel subpopulations.
MCA is a useful method for the identification of subpopulations in low-dimensional expression data, as emerging from qPCR or FACS measurements. With MCA it is possible to investigate the robustness of covariate correlations with respect to subpopulations, graphically identify outliers, and identify factors contributing to differential regulation between pairs of covariates. MCA thus provides a framework for investigation of expression correlations for genes of interest and biological hypothesis generation.
Multiresolution; Correlation; Subpopulation identification; qPCR analysis
Functional cell-to-cell variability is ubiquitous in multicellular organisms as well as bacterial populations. Even genetically identical cells of the same cell type can respond differently to identical stimuli. Methods have been developed to analyse heterogeneous populations, e.g., mixture models and stochastic population models. The available methods are, however, either incapable of simultaneously analysing different experimental conditions or are computationally demanding and difficult to apply. Furthermore, they do not account for biological information available in the literature. To overcome disadvantages of existing methods, we combine mixture models and ordinary differential equation (ODE) models. The ODE models provide a mechanistic description of the underlying processes while mixture models provide an easy way to capture variability. In a simulation study, we show that the class of ODE constrained mixture models can unravel the subpopulation structure and determine the sources of cell-to-cell variability. In addition, the method provides reliable estimates for kinetic rates and subpopulation characteristics. We use ODE constrained mixture modelling to study NGF-induced Erk1/2 phosphorylation in primary sensory neurones, a process relevant in inflammatory and neuropathic pain. We propose a mechanistic pathway model for this process and reconstruct static and dynamical subpopulation characteristics across experimental conditions. We validate the model predictions experimentally, which verifies the capabilities of ODE constrained mixture models. These results illustrate that ODE constrained mixture models can reveal novel mechanistic insights and possess a high sensitivity.
In this manuscript, we introduce ODE constrained mixture models for the analysis of population snapshot data of kinetics and dose responses. Population snapshot data can for instance be derived from flow cytometry or single-cell microscopy and provide information about the population structure and the dynamics of subpopulations. Currently available methods enable, however, only the extraction of this information if the subpopulations are very different. By combining pathway-specific ODE and mixture models, a more sensitive method is obtained, which can simultaneously analyse a variety of experimental conditions. ODE constrained mixture models facilitate the reconstruction of subpopulation sizes and dynamics, even in situations where the subpopulations are hardly distinguishable. This is shown for a simulation example as well as for the process of NGF-induced Erk1/2 phosphorylation in primary sensory neurones. We find that the proposed method allows for a simple but pervasive analysis of heterogeneous cell systems and more profound, mechanistic insights.
The balance of self-renewal and differentiation in long-term repopulating hematopoietic stem cells (LT-HSC) must be strictly controlled to maintain blood homeostasis and to prevent leukemogenesis. Hematopoietic cytokines can induce differentiation in LT-HSCs; however, the molecular mechanism orchestrating this delicate balance requires further elucidation. We identified the tumor suppressor GADD45G as an instructor of LT-HSC differentiation under the control of differentiation-promoting cytokine receptor signaling. GADD45G immediately induces and accelerates differentiation in LT-HSCs and overrides the self-renewal program by specifically activating MAP3K4-mediated MAPK p38. Conversely, the absence of GADD45G enhances the self-renewal potential of LT-HSCs. Videomicroscopy-based tracking of single LT-HSCs revealed that, once GADD45G is expressed, the development of LT-HSCs into lineage-committed progeny occurred within 36 hr and uncovered a selective lineage choice with a severe reduction in megakaryocytic-erythroid cells. Here, we report an unrecognized role of GADD45G as a central molecular linker of extrinsic cytokine differentiation and lineage choice control in hematopoiesis.
•Molecular mechanism of cytokine-mediated differentiation induction in LT-HSCs
•Cytokine-regulated GADD45G induces and accelerates differentiation in LT-HSCs
•The absence of GADD45G increases the self-renewal capacity in LT-HSCs
•GADD45G-induced program selects for myelomonocytic and lymphoid lineages
Rieger and colleagues report an unrecognized function of the tumor suppressor GADD45G as a molecular link of differentiation-promoting cytokine signaling and rapid differentiation induction in hematopoiesis. Cytokine-regulated GADD45G induces and accelerates hematopoietic stem cell differentiation and overrides the self-renewal program by specifically activating MAP3K4-mediated MAPK p38. Videomicroscopy-based single stem cell tracking further revealed a GADD45G-mediated selective lineage choice against megakaryocytic-erythroid fate.
Motivation: High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique allowing for new insights in complex cellular processes. However, the PCR reaction can be detected only up to a certain detection limit, whereas failed reactions could be due to low or absent expression, and the true expression level is unknown. Because this censoring can occur for high proportions of the data, it is one of the main challenges when dealing with single-cell qPCR data. Principal component analysis (PCA) is an important tool for visualizing the structure of high-dimensional data as well as for identifying subpopulations of cells. However, to date it is not clear how to perform a PCA of censored data. We present a probabilistic approach that accounts for the censoring and evaluate it for two typical datasets containing single-cell qPCR data.
Results: We use the Gaussian process latent variable model framework to account for censoring by introducing an appropriate noise model and allowing a different kernel for each dimension. We evaluate this new approach for two typical qPCR datasets (of mouse embryonic stem cells and blood stem/progenitor cells, respectively) by performing linear and non-linear probabilistic PCA. Taking the censoring into account results in a 2D representation of the data, which better reflects its known structure: in both datasets, our new approach results in a better separation of known cell types and is able to reveal subpopulations in one dataset that could not be resolved using standard PCA.
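The core idea of a noise model that accounts for censoring can be sketched with a one-dimensional Gaussian likelihood. This is a simplified illustration, not the Gaussian process latent variable model extension described above: the function names, the use of `None` for failed reactions and the convention that values at or beyond `limit` are censored are all assumptions made for this sketch.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a normal distribution at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_logsf(y, mu, sigma):
    """Log of P(Y >= y) for a normal distribution (log survival function)."""
    z = (y - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def censored_loglik(values, mu, sigma, limit):
    """Log-likelihood where failed or out-of-range readings are censored.

    A censored reading contributes the probability mass beyond `limit`
    instead of a density value, so it still informs mu and sigma.
    """
    total = 0.0
    for y in values:
        if y is None or y >= limit:
            total += normal_logsf(limit, mu, sigma)
        else:
            total += normal_logpdf(y, mu, sigma)
    return total


# Hypothetical cycle-threshold readings; None marks a failed reaction.
readings = [31.2, 33.8, None, 36.1, None]
print(censored_loglik(readings, mu=35.0, sigma=3.0, limit=40.0))
```

Treating censored readings this way avoids both discarding them and replacing them with an arbitrary constant, which is the motivation given in the abstract for modelling the censoring explicitly.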
Availability and implementation: The implementation was based on the existing Gaussian process latent variable model toolbox (https://github.com/SheffieldML/GPmat); extensions for noise models and kernels accounting for censoring are available at http://icb.helmholtz-muenchen.de/censgplvm.
Supplementary information: Supplementary data are available at Bioinformatics online.
Mathematical models are nowadays widely used to describe biochemical reaction
networks. One of the main reasons for this is that models facilitate the
integration of a multitude of different data and data types using parameter
estimation. Thereby, models allow for a holistic understanding of biological
processes. However, due to measurement noise and the limited amount of data,
uncertainties in the model parameters should be considered when conclusions are
drawn from estimated model attributes, such as reaction fluxes or transient
dynamics of biological species.
Methods and results
We developed the visual analytics system iVUN that supports
uncertainty-aware analysis of static and dynamic attributes of biochemical
reaction networks modeled by ordinary differential equations. The multivariate
graph of the network is visualized as a node-link diagram, and statistics of the
attributes are mapped to the color of nodes and links of the graph. In addition,
the graph view is linked with several views, such as line plots, scatter plots,
and correlation matrices, to support locating uncertainties and the analysis of
their time dependencies. As a demonstration, we use iVUN to quantitatively
analyze the dynamics of a model for Epo-induced JAK2/STAT5 signaling.
Our case study showed that iVUN can be used to perform an in-depth study
of biochemical reaction networks, including attribute uncertainties, correlations
between these attributes and their uncertainties as well as the attribute
dynamics. In particular, the linking of different visualization options turned out
to be highly beneficial for the complex analysis tasks that come with the
biological systems as presented here.
Cellular decision-making is mediated by a complex interplay of external stimuli with the intracellular environment, in particular transcription factor regulatory networks. Here we have determined the expression of a network of 18 key haematopoietic transcription factors (TFs) in 597 single primary blood stem and progenitor cells isolated from mouse bone marrow. We demonstrate that different stem/progenitor populations are characterised by distinctive TF expression states, and through comprehensive bioinformatic analysis reveal positively and negatively correlated TF pairings, including previously unrecognised relationships between Gata2, Gfi1 and Gfi1b. Validation using transcriptional and transgenic assays confirmed direct regulatory interactions consistent with a regulatory triad in immature blood stem cells, where Gata2 may function to modulate cross-inhibition between Gfi1 and Gfi1b. Single cell expression profiling therefore identifies network states and allows reconstruction of network hierarchies involved in controlling stem cell fate choices, and provides a blueprint for studying both normal development and human disease.
In recent years, high-throughput microscopy has emerged as a powerful tool to analyze cellular dynamics at unprecedented resolution. The amount of data that is generated, for example in long-term time-lapse microscopy experiments, requires automated methods for processing and analysis. Available software frameworks are well suited for high-throughput processing of fluorescence images, but they often do not perform well on bright field image data that varies considerably between laboratories, setups, and even single experiments.
In this contribution, we present a fully automated image processing pipeline that is able to robustly segment and analyze cells with ellipsoid morphology from bright field microscopy in a high-throughput, yet time efficient manner. The pipeline comprises two steps: (i) Image acquisition is adjusted to obtain optimal bright field image quality for automatic processing. (ii) A concatenation of fast performing image processing algorithms robustly identifies single cells in each image. We applied the method to a time-lapse movie consisting of ∼315,000 images of differentiating hematopoietic stem cells over 6 days. We evaluated the accuracy of our method by comparing the number of identified cells with manual counts. Our method is able to segment images with varying cell density and different cell types without parameter adjustment and clearly outperforms a standard approach. By computing population doubling times, we were able to identify three growth phases in the stem cell population throughout the whole movie, and validated our result with cell cycle times from single cell tracking.
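A population doubling time can be computed from two cell counts with the standard exponential-growth formula. The sketch below assumes exponential growth between the two time points; the function name and the example counts are illustrative and do not come from the study.

```python
import math


def doubling_time(count_start, count_end, elapsed_hours):
    """Doubling time in hours, assuming exponential growth between counts."""
    if count_start <= 0 or count_end <= count_start:
        raise ValueError("counts must be positive and increasing")
    return elapsed_hours * math.log(2.0) / math.log(count_end / count_start)


# Hypothetical counts: 100 cells grow to 400 cells within 24 hours,
# which corresponds to two doublings.
print(doubling_time(100, 400, 24.0))
```

Applying the function to consecutive time windows of a time-lapse movie gives a doubling-time curve, in which changes over time would indicate distinct growth phases.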
Our method allows fully automated processing and analysis of high-throughput bright field microscopy data. The robustness of cell detection and fast computation time will support the analysis of high-content screening experiments, on-line analysis of time-lapse experiments as well as development of methods to automatically track single-cell genealogies.
Due to the high complexity of biological data it is difficult to disentangle cellular processes relying only on intuitive interpretation of measurements. A Systems Biology approach that combines quantitative experimental data with dynamic mathematical modeling promises to yield deeper insights into these processes. Nevertheless, with growing complexity and increasing amount of quantitative experimental data, building realistic and reliable mathematical models can become a challenging task: the quality of experimental data has to be assessed objectively, unknown model parameters need to be estimated from the experimental data, and numerical calculations need to be precise and efficient.
Here, we discuss, compare and characterize the performance of computational methods throughout the process of quantitative dynamic modeling using two previously established examples, for which quantitative, dose- and time-resolved experimental data are available. In particular, we present an approach that determines the quality of experimental data in an efficient, objective and automated manner. Using this approach, data generated by different measurement techniques and even in single replicates can be reliably used for mathematical modeling. For the estimation of unknown model parameters, the performance of different optimization algorithms was compared systematically. Our results show that deterministic derivative-based optimization employing the sensitivity equations in combination with a multi-start strategy based on Latin hypercube sampling outperforms the other methods by orders of magnitude in accuracy and speed. Finally, we investigated transformations that yield a more efficient parameterization of the model and therefore lead to a further enhancement in optimization performance. We provide a freely available open source software package that implements the algorithms and examples compared here.
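The multi-start strategy can be sketched in a few lines. This is a toy illustration and not the software package mentioned above: plain gradient descent with an analytic gradient stands in for the derivative-based optimizer with sensitivity equations, and the objective function, step size and all names are assumptions chosen for the example.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)  # decouple the strata across dimensions
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def gradient_descent(grad, start, step=0.01, n_iter=2000):
    """Fixed-step gradient descent (stand-in for a real local optimizer)."""
    point = list(start)
    for _ in range(n_iter):
        g = grad(point)
        point = [p - step * gi for p, gi in zip(point, g)]
    return point


def multi_start(objective, grad, bounds, n_starts=10, seed=0):
    """Run one local optimization per Latin hypercube start, keep the best."""
    rng = random.Random(seed)
    starts = latin_hypercube(n_starts, bounds, rng)
    results = [gradient_descent(grad, s) for s in starts]
    return min(results, key=objective)


# Toy objective with a local minimum near x = 1 and a global one near x = -1.
def objective(p):
    return (p[0] ** 2 - 1.0) ** 2 + 0.3 * p[0]


def grad(p):
    return [4.0 * p[0] * (p[0] ** 2 - 1.0) + 0.3]


best = multi_start(objective, grad, bounds=[(-2.0, 2.0)])
print(best, objective(best))
```

A single local run started in the wrong basin would stop at the local minimum near x = 1. Spreading the starts evenly over the parameter range is what lets the multi-start scheme find the global minimum in this example.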
Modern high-throughput methods allow the investigation of biological functions across multiple ‘omics’ levels. Levels include mRNA and protein expression profiling as well as additional knowledge on, for example, DNA methylation and microRNA regulation. The reason for this interest in multi-omics is that actual cellular responses to different conditions are best explained mechanistically when taking all omics levels into account. To map gene products to their biological functions, public ontologies like Gene Ontology are commonly used. Many methods have been developed to identify terms in an ontology, overrepresented within a set of genes. However, these methods are not able to appropriately deal with any combination of several data types. Here, we propose a new method to analyse integrated data across multiple omics-levels to simultaneously assess their biological meaning. We developed a model-based Bayesian method for inferring interpretable term probabilities in a modular framework. Our Multi-level ONtology Analysis (MONA) algorithm performed significantly better than conventional analyses of individual levels and yields best results even for sophisticated models including mRNA fine-tuning by microRNAs. The MONA framework is flexible enough to allow for different underlying regulatory motifs or ontologies. It is ready-to-use for applied researchers and is available as a standalone application from http://icb.helmholtz-muenchen.de/mona.
Diffusion is a key component of many biological processes such as chemotaxis, developmental differentiation and tissue morphogenesis. Recently, it has become possible to assess the spatial gradients caused by diffusion in vitro and in vivo using microscopy-based imaging techniques. The resulting time series of two-dimensional, high-resolution images, in combination with mechanistic models, enable the quantitative analysis of the underlying mechanisms. However, such a model-based analysis is still challenging due to measurement noise and sparse observations, which result in uncertainties of the model parameters.
We introduce a likelihood function for image-based measurements with log-normal distributed noise. Based upon this likelihood function we formulate the maximum likelihood estimation problem, which is solved using PDE-constrained optimization methods. To assess the uncertainty and practical identifiability of the parameters we introduce profile likelihoods for diffusion processes.
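One common way to write such a likelihood and the corresponding profile likelihood is sketched below. The notation is an assumption introduced here for illustration and need not match the paper: \bar{y}_k denotes the measured intensity with index k (pixel and time point), y_k(\theta) the output of the PDE model for parameters \theta, and \sigma the scale of the multiplicative noise.

```latex
% Multiplicative log-normal noise model (assumed notation)
\bar{y}_k = y_k(\theta)\,\varepsilon_k, \qquad \log \varepsilon_k \sim \mathcal{N}(0,\sigma^2)

% Likelihood of a single measurement
p(\bar{y}_k \mid \theta) = \frac{1}{\sqrt{2\pi}\,\sigma\,\bar{y}_k}
  \exp\!\left(-\frac{\left(\log \bar{y}_k - \log y_k(\theta)\right)^2}{2\sigma^2}\right)

% Negative log-likelihood over all measurements
J(\theta) = -\sum_k \log p(\bar{y}_k \mid \theta)
  = \sum_k \left[ \log\!\left(\sqrt{2\pi}\,\sigma\,\bar{y}_k\right)
  + \frac{\left(\log \bar{y}_k - \log y_k(\theta)\right)^2}{2\sigma^2} \right]

% Maximum likelihood estimate, with y_k(\theta) obtained by solving the PDE
\hat{\theta} = \arg\min_{\theta} J(\theta)

% Profile of parameter \theta_i: re-optimize all other parameters for each fixed value c
\mathrm{PL}_i(c) = \min_{\theta \,:\, \theta_i = c} J(\theta)
```

Under this formulation a parameter would be regarded as practically identifiable if its profile rises above a chosen threshold on both sides of the optimum, whereas a profile that stays flat in one direction signals that the data do not constrain the parameter there.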
Results and conclusion
As proof of concept, we model certain aspects of the guidance of dendritic cells towards lymphatic vessels, an example for haptotaxis. Using a realistic set of artificial measurement data, we estimate the five kinetic parameters of this model and compute profile likelihoods. Our novel approach for the estimation of model parameters from image data as well as the proposed identifiability analysis approach is widely applicable to diffusion processes. The profile likelihood based method provides more rigorous uncertainty bounds in contrast to local approximation methods.
Serum urate, the final breakdown product of purine metabolism, is causally involved in the pathogenesis of gout, and implicated in cardiovascular disease and type 2 diabetes. Serum urate levels differ markedly between men and women; however, the underlying biological processes in its regulation are still not completely understood and are assumed to result from a complex interplay between genetic, environmental and lifestyle factors. In order to describe the metabolic vicinity of serum urate, we analyzed 355 metabolites in 1,764 individuals of the population-based KORA F4 study and constructed a metabolite network around serum urate using Gaussian Graphical Modeling in a hypothesis-free approach. We subsequently investigated the effect of sex and urate lowering medication on all 38 metabolites assigned to the network. Within the resulting network three main clusters could be detected around urate, including the well-known pathway of purine metabolism, as well as several dipeptides, a group of essential amino acids, and a group of steroids. Of the 38 assigned metabolites, 25 showed strong differences between sexes. Association with uricostatic medication intake was not only confined to purine metabolism but seen for seven metabolites within the network. Our findings highlight pathways that are important in the regulation of serum urate and suggest that dipeptides, amino acids, and steroid hormones are playing a role in its regulation. The findings might have an impact on the development of specific targets in the treatment and prevention of hyperuricemia.
Electronic supplementary material
The online version of this article (doi:10.1007/s11306-013-0565-2) contains supplementary material, which is available to authorized users.
Gaussian Graphical Modeling; Metabolite network; Pathway reconstruction; Allopurinol; Uric acid; Purine metabolism
The establishment of the mid-hindbrain region in vertebrates is mediated by the
isthmic organizer, an embryonic secondary organizer characterized by a
well-defined pattern of locally restricted gene expression domains with sharply
delimited boundaries. While the function of the isthmic organizer at the
mid-hindbrain boundary has been subject to extensive experimental studies, it
remains unclear how this well-defined spatial gene expression pattern, which is
essential for proper isthmic organizer function, is established during vertebrate
development. Because the secreted Wnt1 protein plays a prominent role in isthmic
organizer function, we focused in particular on the refinement of Wnt1
gene expression in this context.
We analyzed the dynamics of the corresponding murine gene regulatory network and
the related, diffusive signaling proteins using a macroscopic model for the
biological two-scale signaling process. Despite the discontinuity arising
from the sharp gene expression domain boundaries, we proved the existence of
unique, positive solutions for the partial differential equation system. This
enabled the numerical and analytical analysis of the formation and stability
of the expression pattern. Notably, the calculated expression domain of
Wnt1 has no sharp boundary in contrast to experimental evidence. We
subsequently propose a post-transcriptional regulatory mechanism for Wnt1
by miRNAs, which yields the observed sharp expression domain boundaries. We
established a list of candidate miRNAs and confirmed their expression pattern by
radioactive in situ hybridization. The miRNA miR-709 was identified as a
potential regulator of Wnt1 mRNA, which was validated by luciferase
In summary, our theoretical analysis of the gene expression pattern induction at
the mid-hindbrain boundary revealed the need to extend the model by an additional
Wnt1 regulation. The developed macroscopic model of a two-scale
process facilitates the stringent analysis of other morphogen-based patterning
Mid-Hindbrain Boundary; miRNA Modeling; Spatio-Temporal Model; Developmental Biology
MicroRNAs have emerged as key posttranscriptional regulators of gene expression during vertebrate development. We show that the miR-200 family plays a crucial role for the proper generation and survival of ventral neuronal populations in the murine midbrain/hindbrain region, including midbrain dopaminergic neurons, by directly targeting the pluripotency factor Sox2 and the cell-cycle regulator E2F3 in neural stem/progenitor cells. The lack of a negative regulation of Sox2 and E2F3 by miR-200 in conditional Dicer1 mutants (En1+/Cre; Dicer1flox/flox mice) and after miR-200 knockdown in vitro leads to a strongly reduced cell-cycle exit and neuronal differentiation of ventral midbrain/hindbrain (vMH) neural progenitors, whereas the opposite effect is seen after miR-200 overexpression in primary vMH cells. Expression of miR-200 is in turn directly regulated by Sox2 and E2F3, thereby establishing a unilateral negative feedback loop required for the cell-cycle exit and neuronal differentiation of neural stem/progenitor cells. Our findings suggest that the posttranscriptional regulation of Sox2 and E2F3 by miR-200 family members might be a general mechanism to control the transition from a pluripotent/multipotent stem/progenitor cell to a postmitotic and more differentiated cell.
Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable amount of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these “unknown metabolites” is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype–metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.
Genome-wide association studies on metabolomics data have demonstrated that genetic variation in metabolic enzymes and transporters leads to concentration changes in the respective metabolite levels. The conventional goal of these studies is the detection of novel interactions between the genome and the metabolic system, providing valuable insights for both basic research and clinical applications. In this study, we borrow the metabolomics GWAS concept for a novel, entirely different purpose. Metabolite measurements frequently produce signals where a certain substance can be reliably detected in the sample, but it has not yet been elucidated which specific metabolite this signal actually represents. The concept is comparable to a fingerprint: each one is uniquely identifiable, but as long as it is not registered in a database one cannot tell to whom this fingerprint belongs. Obviously, this issue tremendously reduces the usability of metabolomics analyses. The genetic associations of such an “unknown,” however, give us concrete evidence of the metabolic pathway this substance is most probably involved in. Moreover, we complement the approach with a specific measure of correlation between metabolites, providing further evidence of the metabolic processes of the unknown. For a number of cases, this even allows for a concrete identity prediction, which we then experimentally validate in the lab.
For decades, cold-adapted, temperature-sensitive (ca/ts) strains of influenza A virus have been used as live attenuated vaccines. Due to their great public health importance, it is crucial to understand the molecular mechanism(s) of cold adaptation and temperature sensitivity, which are currently unknown. For instance, secondary RNA structures play important roles in influenza biology. Thus, we hypothesized that a relatively minor change in temperature (32–39°C) can lead to perturbations in influenza RNA structures, and that these structural perturbations may be different for mRNAs of the wild type (wt) and ca/ts strains. To test this hypothesis, we developed a novel in silico method that enables assessing whether two related RNA molecules would undergo (dis)similar structural perturbations upon temperature change. The proposed method allows identifying those areas within an RNA chain where dissimilarities of RNA secondary structures at two different temperatures are particularly pronounced, without knowing particular RNA shapes at either temperature. We identified such areas in the NS2, PA, PB2 and NP mRNAs. However, these areas are not identical for the wt and ca/ts mutants. Differences in temperature-induced structural changes of wt and ca/ts mRNA structures may constitute a yet unappreciated molecular mechanism of the cold adaptation/temperature sensitivity phenomena.
influenza; RNA; structure; temperature; vaccine
Motivation: Single-cell experiments of cells from the early mouse embryo yield gene expression data for different developmental stages from zygote to blastocyst. To better understand cell fate decisions during differentiation, it is desirable to analyse the high-dimensional gene expression data and assess differences in gene expression patterns between different developmental stages as well as within developmental stages. Conventional methods include univariate analyses of distributions of genes at different stages or multivariate linear methods such as principal component analysis (PCA). However, these approaches often fail to resolve important differences, because each lineage has a unique gene expression pattern that changes gradually over time. This yields differences in gene expression between developmental stages as well as heterogeneous distributions within a specific stage. Furthermore, to date, no approach taking the temporal structure of the data into account has been presented.
Results: We present a novel framework based on Gaussian process latent variable models (GPLVMs) to analyse single-cell qPCR expression data of 48 genes from mouse zygote to blastocyst as presented by Guo et al. (2010). We extend GPLVMs by introducing gene relevance maps and gradient plots to provide interpretability as in the linear case. Furthermore, we take the temporal group structure of the data into account and introduce a new factor in the GPLVM likelihood which ensures that small distances are preserved for cells from the same developmental stage. Using our novel framework, it is possible to resolve differences in gene expression for all developmental stages. Furthermore, a new subpopulation of cells within the 16-cell stage is identified which is significantly more trophectoderm-like than the rest of the population. The trophectoderm-like subpopulation was characterized by considerable differences in the expression of Id2, Gata4 and, to a smaller extent, Klf4 and Hand1. The relevance of Id2 as an early marker for TE cells is consistent with previously published results.
Availability: The mappings were implemented based on Prof. Neil Lawrence's FGPLVM toolbox; extensions for relevance analysis and including the structure of the data can be obtained from one of the authors' homepage.
In radiation protection, biokinetic models for zirconium processing are of crucial importance in dose estimation and further risk analysis for humans exposed to this radioactive substance. They provide limiting values of detrimental effects and build the basis for applications in internal dosimetry, the prediction for radioactive zirconium retention in various organs as well as retrospective dosimetry. Multi-compartmental models are the tool of choice for simulating the processing of zirconium. Although easily interpretable, determining the exact compartment structure and interaction mechanisms is generally daunting. In the context of observing the dynamics of multiple compartments, Bayesian methods provide efficient tools for model inference and selection.
We are the first to apply a Markov chain Monte Carlo approach to compute Bayes factors for the evaluation of two competing models for zirconium processing in the human body after ingestion. Based on in vivo measurements of human plasma and urine levels we were able to show that a recently published model is superior to the standard model of the International Commission on Radiological Protection. The Bayes factors were estimated by means of the numerically stable thermodynamic integration in combination with a recently developed copula-based Metropolis-Hastings sampler.
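The idea behind thermodynamic integration can be illustrated with a minimal sketch. It relies on the identity log Z = ∫₀¹ E_t[log p(y | θ)] dt, where the expectation is taken under the "power posterior" proportional to p(y | θ)^t p(θ). The toy model below (normal data with unit variance, standard normal prior on the mean) is an assumption chosen only because its power posterior is available in closed form; it is not the zirconium compartment model, and the closed-form expectations stand in for the MCMC sampling used in the study. The function names `log_evidence_ti` and `log_evidence_exact` are hypothetical.

```python
import numpy as np


def log_evidence_ti(y, n_temps=2001):
    """Log marginal likelihood via thermodynamic integration (toy model).

    Assumed model: y_i ~ N(theta, 1), prior theta ~ N(0, 1).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Temperature ladder on [0, 1], denser near 0 where the integrand changes fastest.
    t = np.linspace(0.0, 1.0, n_temps) ** 5
    # Power posterior at temperature t is normal with these moments.
    post_var = 1.0 / (1.0 + t * n)
    post_mean = t * y.sum() * post_var
    # E_t[sum_i (y_i - theta)^2] = sum_i (y_i - mean)^2 + n * var
    sq = ((y[:, None] - post_mean[None, :]) ** 2).sum(axis=0) + n * post_var
    expected_loglik = -0.5 * n * np.log(2.0 * np.pi) - 0.5 * sq
    # Trapezoidal rule over the temperature ladder.
    widths = np.diff(t)
    heights = 0.5 * (expected_loglik[1:] + expected_loglik[:-1])
    return float(np.sum(widths * heights))


def log_evidence_exact(y):
    """Closed-form log marginal likelihood of the same toy model."""
    y = np.asarray(y, dtype=float)
    n = y.size
    quad = (y ** 2).sum() - y.sum() ** 2 / (1.0 + n)
    return float(-0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log(1.0 + n) - 0.5 * quad)


rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=20)  # simulated data, for illustration only
ti = log_evidence_ti(y)
exact = log_evidence_exact(y)
print(f"thermodynamic integration: {ti:.4f}, exact: {exact:.4f}")
```

A log Bayes factor between two competing models is then the difference of their log evidences. In a realistic setting the expectation at each temperature would be estimated from posterior samples rather than computed analytically.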
In contrast to the standard model, the novel model predicts lower accretion of zirconium in bones. This results in lower levels of noxious doses for exposed individuals. Moreover, the Bayesian approach allows for retrospective dose assessment, including credible intervals for the initially ingested zirconium, in a significantly more reliable fashion than previously possible. All methods presented here are readily applicable to many modeling tasks in systems biology.
Bayesian inference; Model selection; MCMC sampling; Compartmental model; Internal dosimetry; Systems biology
To characterise the influence of the fat free mass on the metabolite profile in serum samples from participants of the population-based KORA (Cooperative Health Research in the Region of Augsburg) S4 study.
Subjects and Methods
Analyses were based on metabolite profiles from 965 participants of the S4 and 890 weight-stable subjects of its seven-year follow-up study (KORA F4). 190 different serum metabolites were quantified in a targeted approach including amino acids, acylcarnitines, phosphatidylcholines (PCs), sphingomyelins and hexose. Associations between metabolite concentrations and the fat free mass index (FFMI) were analysed using adjusted linear regression models. To draw conclusions on enzymatic reactions, intra-metabolite class ratios were explored. Pairwise relationships among metabolites were investigated and illustrated by means of Gaussian graphical models (GGMs).
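As a minimal sketch of what a GGM captures, the partial correlation between two variables, conditioned on all others, can be read off the inverse covariance (precision) matrix as −P_ij / sqrt(P_ii P_jj). The example below uses simulated data for a hypothetical three-metabolite chain A → B → C; it is an illustration under these assumptions, not the estimation procedure or the data of the study, and `partial_correlations` is a hypothetical helper name.

```python
import numpy as np


def partial_correlations(X):
    """Full-order partial correlation matrix from a samples-by-variables array."""
    cov = np.cov(X, rowvar=False)
    prec = np.linalg.inv(cov)             # precision matrix P
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)         # -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Simulated chain A -> B -> C (illustrative data only).
rng = np.random.default_rng(1)
n = 5000
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
X = np.column_stack([a, b, c])

corr = np.corrcoef(X, rowvar=False)
pcor = partial_correlations(X)
print("Pearson  A-C:", round(float(corr[0, 2]), 3))
print("Partial  A-C:", round(float(pcor[0, 2]), 3))
```

In this toy chain, A and C are clearly correlated, but their partial correlation is close to zero because the association is fully mediated by B. This removal of indirect associations is what makes GGM edges easier to interpret as direct relationships than plain correlation networks.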
We found 339 significant associations between FFMI and various metabolites in KORA S4. Among the most prominent associations (p-values from 4.75×10^−16 to 8.95×10^−6) with higher FFMI were increasing concentrations of the branched-chain amino acids (BCAAs), ratios of BCAAs to glucogenic amino acids, and carnitine concentrations. For various PCs, a decrease in chain length or in saturation of the fatty acid moieties could be observed with increasing FFMI, as well as an overall shift from acyl-alkyl PCs to diacyl PCs. These findings were reproduced in KORA F4. The established GGMs supported the regression results and provided a comprehensive picture of the relationships between metabolites. In a sub-analysis, most of the discovered associations did not exist in obese subjects in contrast to non-obese subjects, possibly indicating derangements in skeletal muscle metabolism.
A set of serum metabolites strongly associated with FFMI was identified and a network explaining the relationships among metabolites was established. These results offer a novel and more complete picture of the FFMI effects on serum metabolites in a data-driven network.
Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb was applied to determine the significance of the p-gain.
Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*α) is a conservative critical value for the p-gain, where α is the level of significance and B the number of tested metabolite pairs.
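The critical value stated above can be made concrete with a short sketch. It assumes the usual definition of the p-gain as the smaller of the two single-metabolite p-values divided by the p-value of their ratio; this definition is not spelled out in the text here, and the function names and example numbers are hypothetical.

```python
def p_gain(p_a, p_b, p_ratio):
    """p-gain: smaller single-metabolite p-value divided by the ratio's p-value.

    Assumed definition; the inputs are association p-values for metabolite A,
    metabolite B, and the ratio A/B with the same trait.
    """
    return min(p_a, p_b) / p_ratio


def critical_p_gain(n_pairs, alpha=0.05):
    """Conservative critical value B / (2 * alpha) for B tested metabolite pairs."""
    return n_pairs / (2.0 * alpha)


# Hypothetical example: 100 metabolites give 100 * 99 / 2 = 4950 pairs.
n_pairs = 100 * 99 // 2
threshold = critical_p_gain(n_pairs, alpha=0.05)
gain = p_gain(p_a=1e-6, p_b=1e-4, p_ratio=1e-12)
print("significant ratio:", gain > threshold)
```

With these made-up p-values the ratio clears the threshold by a wide margin. A ratio whose p-value is only modestly smaller than that of its best single metabolite would not.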
We show that the p-gain is a well-defined measure that can be used to identify statistically significant metabolite ratios in association studies, and we provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
p-gain; Metabolomics; MWAS; GWAS; Genome-wide association studies; Metabolome-wide association studies