Background: A recent comparison showed the extensive similarities between the structural properties of metabolites in the reconstructed human metabolic network (“endogenites”) and those of successful, marketed drugs (“drugs”).
Results: Clustering indicated the related but differential population of chemical space by endogenites and drugs. Differences between the drug-endogenite similarities resulting from various encodings and judged by Tanimoto similarity could be related simply to the fraction of the bitstrings set to 1. By extracting drug/endogenite substructures, we develop a novel family of fingerprints, the Drug Endogenite Substructure (DES) encodings, based on the ranked frequency of the various substructures. These provide a natural assessment of drug-endogenite likeness, and may be used as descriptors with which to derive quantitative structure-activity relationships (QSARs).
Conclusions: “Drug-endogenite likeness” seems to have utility, and leads to a simple, novel and interpretable substructure-based molecular encoding for cheminformatics.
drug transporters; cheminformatics; endogenites; metabolomics; encodings
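The Results above relate Tanimoto similarity to the fraction of bits set to 1. The Python sketch below shows why this happens. It assumes, unrealistically, that fingerprint bits are set independently at random with a fixed density p. Under that assumption, two unrelated fingerprints have an expected Tanimoto similarity of roughly p/(2 − p), so denser encodings give higher baseline similarities even when there is no structural relationship. The 166-bit length simply matches the number of MACCS keys, and the helper names are hypothetical.

```python
import random

def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as integer bitmasks."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention for two all-zero fingerprints (an assumption)
    return bin(a & b).count("1") / union

def random_fingerprint(n_bits: int, density: float, rng: random.Random) -> int:
    """Hypothetical fingerprint: each bit is set independently with probability `density`."""
    fp = 0
    for i in range(n_bits):
        if rng.random() < density:
            fp |= 1 << i
    return fp

def mean_random_tanimoto(n_bits: int, density: float, n_pairs: int = 2000, seed: int = 0) -> float:
    """Average Tanimoto similarity between pairs of unrelated random fingerprints."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        total += tanimoto(random_fingerprint(n_bits, density, rng),
                          random_fingerprint(n_bits, density, rng))
    return total / n_pairs

if __name__ == "__main__":
    for p in (0.05, 0.2, 0.5):
        # Simulated baseline similarity vs. the approximation p / (2 - p)
        print(p, round(mean_random_tanimoto(166, p), 3), round(p / (2 - p), 3))
```

Real fingerprint bits are correlated, so this gives only the direction of the effect, not its size for any particular encoding.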
One approach to experimental science involves creating hypotheses, then testing them by varying one or more independent variables and assessing the effects of this variation on the processes of interest. We use this strategy to compare the intellectual status and available evidence for two models or views of the mechanisms of transmembrane drug transport into intact biological cells. One (BDII) asserts that lipoidal phospholipid Bilayer Diffusion Is Important. The second (PBIN) proposes that in normal intact cells Phospholipid Bilayer diffusion Is Negligible (i.e., may be neglected quantitatively), because evolution selected against it. Under PBIN, transmembrane drug transport is instead effected by genetically encoded proteinaceous carriers or pores whose “natural” biological roles and substrates are based in intermediary metabolism. Despite a recent review elsewhere, we can find no evidence able to support BDII. We can find no experiments in intact cells in which phospholipid bilayer diffusion was either varied independently or measured directly, although there are many papers in which it was inferred from the covariation of other dependent variables. By contrast, we find an abundance of evidence showing cases in which changes in the activities of named and genetically identified transporters led to measurable changes in the rate or extent of drug uptake. PBIN also has considerable predictive power. It accounts readily for the large differences in drug uptake between tissues, cells and species, for the metabolite-likeness of marketed drugs, and for findings in pharmacogenomics. It also provides a straightforward explanation for the late-stage appearance of toxicity and of lack of efficacy during drug discovery programmes despite macroscopically adequate pharmacokinetics.
Consequently, the view that Phospholipid Bilayer diffusion Is Negligible (PBIN) provides a starting hypothesis for assessing cellular drug uptake that is much better supported by the available evidence, and is both more productive and more predictive.
drug transporters; systems pharmacology; pharmacogenomics; Recon2
A major trend in recent Parkinson's disease (PD) research is the investigation of biological markers that could help in identifying at-risk individuals or in tracking disease progression and response to therapies. Central to this is the recognition that inflammation is a hallmark of PD and of many other degenerative diseases. In the current work, we focus on inflammatory signalling in PD, using a systems approach that allows us to look at the disease in a more holistic way. We discuss cyclooxygenases, prostaglandins, thromboxanes and also iron in PD. These particular signalling molecules are involved in PD pathophysiology, but are also very important in an aberrant coagulation/hematology system. We present and discuss a hypothesis regarding the possible interaction of these aberrant signalling molecules implicated in PD, and suggest that these molecules may affect the erythrocytes of PD patients. This would be observable as changes in the morphology of the RBCs of PD patients relative to healthy controls. We then show that the RBCs of PD patients are indeed rather dramatically deranged in their morphology, exhibiting eryptosis (a kind of programmed cell death). This morphological indicator may have useful diagnostic and prognostic significance.
Parkinson's disease; hypercoagulability; erythrocytes; eryptosis
We bring together fifteen nonredundant tabulated collections (amounting to 696 separate measurements) of the apparent permeability (Papp) of Caco-2 cells to marketed drugs. In some cases there are significant interlaboratory disparities, but most are quite minor. Most drugs are not especially permeable through Caco-2 cells, with the median Papp value being some 16 ⋅ 10⁻⁶ cm s⁻¹. This value is considerably lower than those (1,310 and 230 ⋅ 10⁻⁶ cm s⁻¹) recently used in some simulations that purported to show that Papp values were too great to be transporter-mediated only. Those high values are outliers. All values, and especially the comparatively low values normally observed, are entirely consistent with transporter-only mediated uptake, with no need to invoke phospholipid bilayer diffusion. The apparent permeability of Caco-2 cells to marketed drugs is poorly correlated with simple biophysical properties, with the extent of molecular similarity to endogenous metabolites (endogenites), and with any specific substructural properties. In particular, the octanol:water partition coefficient, logP, shows negligible correlation with Caco-2 permeability. The data are best explained on the basis that most drugs enter (and exit) Caco-2 cells via a multiplicity of transporters of comparatively weak specificity.
Caco-2 cells; Facilitated diffusion/transport; Permeability; Oral absorption; Transcellular transport; Mathematical models; Transporter-mediated uptake; Cheminformatics; Transporters
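Apparent permeability is conventionally computed as Papp = (dQ/dt)/(A·C0). Here dQ/dt is the rate of appearance of drug in the receiver compartment, A is the monolayer area and C0 is the initial donor concentration. The minimal sketch below uses hypothetical values throughout. The time course, the insert area (1.12 cm²) and the donor concentration (10 µM) are chosen so that the result equals the median value of 16 ⋅ 10⁻⁶ cm s⁻¹ quoted above.

```python
def apparent_permeability(times_s, receiver_mol, area_cm2, donor_conc_mol_per_cm3):
    """Papp = (dQ/dt) / (A * C0), in cm s^-1.

    dQ/dt is the least-squares slope of the amount of drug in the receiver
    compartment (mol) against time (s); A is the monolayer area (cm^2) and C0
    the initial donor concentration (mol cm^-3)."""
    n = len(times_s)
    t_mean = sum(times_s) / n
    q_mean = sum(receiver_mol) / n
    slope = (sum((t - t_mean) * (q - q_mean) for t, q in zip(times_s, receiver_mol))
             / sum((t - t_mean) ** 2 for t in times_s))
    return slope / (area_cm2 * donor_conc_mol_per_cm3)

# Hypothetical experiment: 10 uM donor = 1e-8 mol cm^-3 (1 L = 1000 cm^3)
area = 1.12                                  # cm^2, assumed insert area
donor = 10e-6 / 1000                         # mol cm^-3
times = [0, 600, 1200, 1800]                 # s
receiver = [1.792e-13 * t for t in times]    # mol; a constant 1.792e-13 mol s^-1 appearance rate
papp = apparent_permeability(times, receiver, area, donor)
print(f"Papp = {papp:.2e} cm/s")             # → Papp = 1.60e-05 cm/s
```

Because Papp is normalised by C0, a transporter working well below saturation gives a Papp that is roughly independent of concentration. Near saturation, Papp falls as C0 rises.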
A recent paper in this journal argued that reported expression levels, kcat and Km for drug transporters could be used to estimate the likelihood that drug fluxes through Caco-2 cells could be accounted for solely by protein transporters. It was in fact concluded that if five such transporters contributed ‘randomly’ they could account for the flux of the most permeable drug tested (verapamil) 35% of the time. However, the permeability values cited for verapamil were unusually high, and this and other drugs normally have much lower permeabilities. Even for the claimed permeabilities, we found that a single ‘random’ transporter could account for the flux 42% of the time, and that two transporters can achieve 10 · 10⁻⁶ cm·s⁻¹ 90% of the time. Parameter optimisation methods show that even a single transporter can account for Caco-2 drug uptake of the most permeable drug. Overall, the proposal that ‘phospholipid bilayer diffusion (of drugs) is negligible’ is not disproved by the calculations of ‘likely’ transporter-based fluxes.
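The ‘random transporter’ argument can be illustrated with a small Monte Carlo sketch. Each hypothetical transporter is assumed to obey Michaelis–Menten kinetics. Its flux per unit membrane area is then J = kcat·[T]·S/(Km + S), and its contribution to permeability is J/S. The values of kcat, Km and surface density [T] are drawn log-uniformly from placeholder ranges. These ranges are illustrative assumptions, not the expression and kinetic values used in either paper, so the fractions printed here are not expected to reproduce the 35%, 42% or 90% figures above. The sketch only shows how such a fraction is estimated for a target Papp of 10 · 10⁻⁶ cm·s⁻¹.

```python
import math
import random

# Placeholder parameter ranges (illustrative assumptions, not measured values)
KCAT_RANGE = (1.0, 1e3)          # turnover number, s^-1
KM_RANGE = (1e-9, 1e-6)          # Km in mol cm^-3 (1 uM to 1 mM)
DENSITY_RANGE = (1e-15, 1e-12)   # transporter surface density, mol cm^-2

def log_uniform(rng: random.Random, lo: float, hi: float) -> float:
    """Draw a value whose log10 is uniformly distributed between log10(lo) and log10(hi)."""
    return 10 ** rng.uniform(math.log10(lo), math.log10(hi))

def transporter_papp(kcat: float, km: float, density: float, donor_conc: float) -> float:
    """Permeability (cm s^-1) contributed by one Michaelis-Menten transporter.

    Flux J = kcat * density * S / (Km + S) in mol cm^-2 s^-1, and Papp = J / S."""
    return kcat * density / (km + donor_conc)

def fraction_reaching(target_papp: float, n_transporters: int, donor_conc: float = 1e-8,
                      n_trials: int = 20000, seed: int = 1) -> float:
    """Fraction of random draws in which n transporters together reach target_papp."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        total = sum(
            transporter_papp(log_uniform(rng, *KCAT_RANGE),
                             log_uniform(rng, *KM_RANGE),
                             log_uniform(rng, *DENSITY_RANGE),
                             donor_conc)
            for _ in range(n_transporters))
        if total >= target_papp:
            hits += 1
    return hits / n_trials

for n in (1, 2, 5):
    print(n, fraction_reaching(10e-6, n))   # donor concentration 1e-8 mol cm^-3 = 10 uM
```

With parameter ranges taken from real expression and kinetic data, the same loop gives the kind of estimate discussed above. The answer depends strongly on those ranges.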
There has been recent debate as to the relative extents to which cellular transmembrane drug transport occurs through any phospholipid bilayer region or is transporter-mediated only.
Much recent evidence suggests (perhaps surprisingly) that phospholipid bilayer diffusion is negligible.
A recent article in this journal suggested that the expression profile and kinetics of known transporters might not be adequate to explain the most active drug fluxes (of verapamil and propranolol) in Caco-2 cells via transporters only.
We show with our own simulations that this is not in fact the case, especially when evolutionary selection is taken into account, and that the Haldane relation accounts straightforwardly for directional differences, even for equilibrative transporters.
Typical protein transporters alone can easily account for measured drug fluxes in Caco-2 cells.
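The Haldane relation invoked above can be written out for the simplest case, a reversible one-substrate Michaelis–Menten mechanism. Here S is the drug on one side of the membrane and P is the same molecule on the other side. Treating the transporter this way is a simplifying assumption. V_f and V_r are the maximal rates, and K_S and K_P the Michaelis constants, in the two directions. Setting v = 0 at equilibrium gives:

```latex
v = \frac{V_f \, S / K_S \;-\; V_r \, P / K_P}{1 + S / K_S + P / K_P}
\quad\Longrightarrow\quad
K_{eq} = \frac{P_{eq}}{S_{eq}} = \frac{V_f \, K_P}{V_r \, K_S}

\text{Equilibrative transporter: } K_{eq} = 1 \;\Longrightarrow\; \frac{V_f}{V_r} = \frac{K_S}{K_P}
```

So the maximal rates in the two directions can differ whenever the apparent Km values on the two sides differ, even though the transporter cannot drive net accumulation.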
Improving enzymes by directed evolution requires the navigation of very large search spaces; we survey how to do this intelligently.
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way opens up many opportunities, both scientific and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, provide an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the ‘search space’ of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat, a property for which natural evolution rarely seeks the highest values. This applies especially to residues distant from the active site, where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the ‘best’ amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
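Simple counting makes the scale of the search space concrete. The Python sketch below assumes a 20-letter amino acid alphabet and a hypothetical 300-residue enzyme. The full space then has 20^300 members, about 10^390. Even the variants carrying exactly k substitutions number C(L, k)·19^k. This is why exhaustive testing becomes impractical beyond a few simultaneous mutations, and why exploring epistatic combinations is expensive.

```python
import math

def log10_sequence_space(length: int, alphabet: int = 20) -> float:
    """log10 of the number of distinct sequences, alphabet ** length."""
    return length * math.log10(alphabet)

def n_variants_with_k_mutations(length: int, k: int, alphabet: int = 20) -> int:
    """Sequences differing from a parent at exactly k positions:
    choose the k positions, then one of (alphabet - 1) alternatives at each."""
    return math.comb(length, k) * (alphabet - 1) ** k

n_residues = 300  # hypothetical enzyme length
print(f"full space: ~10^{log10_sequence_space(n_residues):.0f} sequences")
for k in (1, 2, 3, 4):
    print(k, f"{n_variants_with_k_mutations(n_residues, k):.2e}")
```

Requires Python 3.8 or later for `math.comb`.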
Introduction: Unliganded iron both contributes to the pathology of Alzheimer's disease (AD) and also changes the morphology of erythrocytes (RBCs). We tested the hypothesis that these two facts might be linked, i.e., that the RBCs of AD individuals have a variant morphology, that might have diagnostic or prognostic value.
Methods: We included a literature survey of AD and its relationships to the vascular system, followed by a laboratory study. Four different microscopy techniques were used, and the results were compared statistically to analyze trends between AD individuals with high and those with normal serum ferritin (SF).
Results: Light and scanning electron microscopies showed little difference between the morphologies of RBCs taken from healthy individuals and from normal SF AD individuals. By contrast, there were substantial changes in the morphology of RBCs taken from high SF AD individuals. These differences were also observed using confocal microscopy and as a significantly greater membrane stiffness (measured using force-distance curves).
Conclusion: We argue that high ferritin levels may contribute to an accelerated pathology in AD. Our findings reinforce the importance of (unliganded) iron in AD, and suggest the possibility both of early diagnosis and of some means of treating or slowing down the progress of this disease.
Alzheimer's disease; erythrocytes; iron; scanning electron microscopy; atomic force microscopy
Blood in healthy organisms is seen as a ‘sterile’ environment: it lacks proliferating microbes. Dormant or not-immediately-culturable forms are not absent, however, as intracellular dormancy is well established. We highlight here that a great many pathogens can survive in blood and inside erythrocytes. ‘Non-culturability’, reflected by discrepancies between plate counts and total counts, is commonplace in environmental microbiology. It is overcome by improved culturing methods, and we asked how common this would be in blood. A number of recent, sequence-based and ultramicroscopic studies have uncovered an authentic blood microbiome in a number of non-communicable diseases. The chief origin of these microbes is the gut microbiome (especially when it shifts composition to a pathogenic state, known as ‘dysbiosis’). Another source is microbes translocated from the oral cavity. ‘Dysbiosis’ is also used to describe translocation of cells into blood or other tissues. To avoid ambiguity, we here use the term ‘atopobiosis’ for microbes that appear in places other than their normal location. Atopobiosis may contribute to the dynamics of a variety of inflammatory diseases. Overall, it seems that many more chronic, non-communicable, inflammatory diseases may have a microbial component than are presently considered, and may be treatable using bactericidal antibiotics or vaccines.
Atopobiosis of microbes (the term describing microbes that appear in places other than where they should be), as well as the products of their metabolism, seems to correlate with, and may contribute to, the dynamics of a variety of inflammatory diseases.
‘sterile’ blood microbiome; culturability; dormancy; dysbiosis; atopobiosis; Parkinson's disease; Alzheimer disease
In previous work on a variety of inflammatory diseases in which iron dysregulation occurs, we have noted a strong tendency for erythrocytes to lose their normal discoid shape and to adopt a skewed morphology. This was judged by their axial ratios in the light microscope and by their ultrastructure in the SEM. Similarly, the polymerization of fibrinogen, as induced in vitro by added thrombin, leads not to the common ‘spaghetti-like’ structures but to dense matted deposits. Type 2 diabetes is a known inflammatory disease. In the present work, we found that the axial ratio of the erythrocytes of poorly controlled type 2 diabetics (as suggested by increased HbA1c levels) was significantly increased, and that their fibrin morphologies were again highly aberrant. As judged by scanning electron microscopy and atomic force microscopy, these changes could be reversed, to some degree, by the addition of the iron chelators deferoxamine (DFO) or deferasirox (DFX). As well as their demonstrated diagnostic significance, these morphological indicators may have prognostic value.
Type II diabetes; Erythrocytes; Deferoxamine; Deferasirox
We rehearse the processes of innovation and discovery in general terms, using as our main metaphor the biological concept of an evolutionary fitness landscape. Incremental and disruptive innovations are seen, respectively, as successful searches carried out locally or more widely. They may also be understood as reflecting evolution by mutation (incremental) versus recombination (disruptive). We also bring a platonic view, focusing on virtue and memory. We use ‘virtue’ as a measure of efforts, including the knowledge required to come up with disruptive and incremental innovations, and ‘memory’ as a measure of their lifespan, i.e. how long they are remembered. Fostering innovation, in the evolutionary metaphor, means providing the wherewithal to promote novelty, good objective functions that one is trying to optimize, and means to improve one's knowledge of, and ability to navigate, the landscape one is searching. Recombination necessarily implies multi- or inter-disciplinarity. These principles are generic to all kinds of creativity, novel ideas formation and the development of new products and technologies.
innovation; evolutionary computing; philosophy of science
Mapping the landscape of possible macromolecular polymer sequences to their fitness in performing biological functions is a challenge across the biosciences. A paradigm is the case of aptamers, nucleic acids that can be selected to bind particular target molecules. We have characterized the sequence-fitness landscape for aptamers binding allophycocyanin (APC) protein via a novel Closed Loop Aptameric Directed Evolution (CLADE) approach. In contrast to the conventional SELEX methodology, selection and mutation of aptamer sequences were carried out in silico, with explicit fitness assays for 44 131 aptamers of known sequence using DNA microarrays in vitro. We capture the landscape using a predictive machine learning model linking sequence features and function. We validate this model using 5500 entirely separate test sequences, which give a very high observed versus predicted correlation of 0.87. This approach reveals a complex sequence-fitness mapping and suggests hypotheses for the physical basis of aptameric binding; it also enables rapid design of novel aptamers with desired binding properties. We demonstrate an extension to the approach by incorporating prior knowledge into CLADE, resulting in some of the tightest binding sequences.
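To illustrate what ‘sequence features’ might look like, the Python sketch below encodes each aptamer as a vector of overlapping k-mer counts. It also scores a model by the Pearson correlation between observed and predicted fitness, the statistic behind the 0.87 quoted above. The k-mer encoding is an assumption chosen for illustration. It is not necessarily the feature set used in CLADE, and no model is trained here.

```python
import math
from itertools import product

def kmer_vector(seq: str, k: int = 2) -> list:
    """Counts of all 4**k DNA k-mers (lexicographic order AA, AC, ...) in seq."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if kmer in index:  # skip k-mers containing non-ACGT symbols
            counts[index[kmer]] += 1
    return counts

def pearson(x, y) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(kmer_vector("GGTTACGT", k=2))
```

Any regression model fitted to such vectors could then be validated on held-out sequences with `pearson(observed, predicted)`.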
We exploit the recent availability of a community reconstruction of the human metabolic network (‘Recon2’) to study how close, in structural terms, marketed drugs are to the nearest known metabolite(s) that Recon2 contains. Other encodings using different kinds of chemical fingerprints give greater differences. Using the 166 Public MDL Molecular Access (MACCS) keys, however, we find that 90% of marketed drugs have a Tanimoto similarity of more than 0.5 to the (structurally) ‘nearest’ human metabolite. This suggests a ‘rule of 0.5’ mnemonic for assessing the metabolite-like properties that characterise successful, marketed drugs. Multiobjective clustering leads to a similar conclusion, while artificial (synthetic) structures are seen to be less human-metabolite-like. This ‘rule of 0.5’ may have considerable predictive value in chemical biology and drug discovery, and may represent a powerful filter for decision-making processes.
Electronic supplementary material
The online version of this article (doi:10.1007/s11306-014-0733-z) contains supplementary material, which is available to authorized users.
Genome-wide metabolic reconstruction; Recon 2; Cheminformatics; KNIME; Metabolite-likeness; Drug-likeness
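The ‘rule of 0.5’ described above reduces to a nearest-neighbour search. For each drug, take the maximum Tanimoto similarity to any metabolite, then count the drugs whose maximum exceeds 0.5. In the minimal sketch below, fingerprints are represented as sets of ‘on’ key indices. The tiny toy fingerprints are hypothetical stand-ins for real MACCS keys, which would normally come from a cheminformatics toolkit.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def nearest_metabolite_similarity(drug_fp: frozenset, metabolite_fps) -> float:
    """Similarity of a drug to its structurally 'nearest' metabolite."""
    return max(tanimoto(drug_fp, m) for m in metabolite_fps)

def fraction_passing(drug_fps, metabolite_fps, threshold: float = 0.5) -> float:
    """Fraction of drugs whose nearest metabolite has Tanimoto similarity > threshold."""
    passing = sum(1 for d in drug_fps
                  if nearest_metabolite_similarity(d, metabolite_fps) > threshold)
    return passing / len(drug_fps)

# Toy fingerprints (hypothetical key indices, not real MACCS output)
metabolites = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
drugs = [frozenset({1, 2, 3}),          # nearest: 3/4 = 0.75
         frozenset({10, 11, 20, 21})]   # nearest: 2/5 = 0.40
print(fraction_passing(drugs, metabolites))  # → 0.5
```

For a few thousand drugs against a few thousand metabolites, this brute-force search is fast enough. Larger libraries would usually use a toolkit's bulk similarity routines.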
Blood-vessel dysfunction arises before overt hyperglycemia in type-2 diabetes (T2DM). We hypothesised that a metabolomic approach might identify metabolites/pathways perturbed in this pre-hyperglycemic phase. To test this hypothesis, and to generate hypotheses about specific metabolites, serum metabolic profiling was performed in young women at increased, intermediate and low risk of subsequent T2DM.
Participants were stratified by glucose tolerance during a previous index pregnancy into three risk groups: overt gestational diabetes (GDM; n = 18); those with glucose values in the upper quartile but below GDM levels (UQ group; n = 45); and controls (n = 43, below the median glucose values). Follow-up serum samples were collected at a mean of 22 months postnatally. Samples were analysed in a random order using Ultra Performance Liquid Chromatography coupled to an electrospray hybrid LTQ-Orbitrap mass spectrometer. Statistical analysis included principal component analysis (PCA) and multivariate methods.
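The principal component analysis mentioned above can be sketched in a few lines with NumPy, assuming a samples × metabolites intensity matrix. Two steps are assumptions here: log-transforming the intensities, and autoscaling each metabolite to unit variance. Both are common choices in metabolomics. The random matrix standing in for real data is also an assumption. Its 106 rows match the 18 + 45 + 43 participants.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int = 2):
    """PCA of a samples x features matrix via SVD of the autoscaled data.

    Returns sample scores, loadings (components x features) and the
    fraction of total variance explained by each returned component."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscaling (assumed)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, Vt[:n_components], explained[:n_components]

# Random stand-in data: 106 samples x 50 hypothetical metabolite features
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(106, 50))
scores, loadings, explained = pca(np.log(X))   # log-transform before scaling (assumed)
print(scores.shape, loadings.shape, explained.round(3))
```

With real data, plotting the first two score columns coloured by risk group (GDM, UQ, control) is the usual first look for between-group separation.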
Significant between-group differences were observed at follow-up in waist circumference (86 cm, 95% CI 79–91, vs 80 cm, 95% CI 76–84, for GDM vs controls; p<0.05), adiponectin (about 33% lower in the GDM group; p = 0.004), fasting glucose, post-prandial glucose and HbA1c, but the latter three all remained within the ‘normal’ range. Substantial differences in metabolite profiles were apparent between the two ‘at-risk’ groups and controls, particularly in concentrations of phospholipids (4 metabolites with p≤0.01), acylcarnitines (3 with p≤0.02), short- and long-chain fatty acids (3 with p≤0.03), and diglycerides (4 with p≤0.05).
Defects in adipocyte function from excess energy storage as relatively hypoxic visceral and hepatic fat, and impaired mitochondrial fatty acid oxidation may initiate the observed perturbations in lipid metabolism. Together with evidence from the failure of glucose-directed treatments to improve cardiovascular outcomes, these data and those of others indicate that a new, quite different definition of type-2 diabetes is required. This definition would incorporate disturbed lipid metabolism prior to hyperglycemia.
The de novo synthesis of genes is becoming increasingly common in synthetic biology studies. However, the inherent error rate (introduced by errors incurred during oligonucleotide synthesis) limits its use in synthesising protein libraries to short genes only. Here we introduce SpeedyGenes, a PCR-based method for the synthesis of diverse protein libraries that includes an error-correction procedure, enabling the efficient synthesis of large genes for use directly in functional screening. First, we demonstrate an accurate gene synthesis method by synthesising and directly screening (without pre-selection) a 747 bp gene for green fluorescent protein (yielding 85% fluorescent colonies) and a larger 1518 bp gene (a monoamine oxidase, producing 76% colonies with full catalytic activity, a 4-fold improvement over previous methods). Secondly, we show that SpeedyGenes can accommodate multiple and combinatorial variant sequences while maintaining efficient enzymatic error correction, which is particularly crucial for larger genes. In its first application to directed evolution, we demonstrate the use of SpeedyGenes in the synthesis and screening of large libraries of MAO-N variants. Using this method, libraries are synthesised, transformed and screened within 3 days. Importantly, as each mutation we introduce is controlled by the oligonucleotide sequence, SpeedyGenes enables the synthesis of large, diverse, yet controlled variant sequences for the purposes of directed evolution.
directed evolution; error correction; gene synthesis; protein libraries
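A simple model shows why synthesis errors limit gene length. Suppose errors occur independently at a per-base rate e. The expected fraction of error-free full-length genes of length L is then (1 − e)^L, which decays exponentially with L. In the minimal sketch below, the error rate of 1 in 500 bases is a hypothetical value chosen for illustration, not a measured SpeedyGenes or oligonucleotide-synthesis figure.

```python
def error_free_fraction(error_rate_per_base: float, length_bp: int) -> float:
    """Expected fraction of full-length genes with no errors, assuming errors
    occur independently at each base with the given probability."""
    return (1.0 - error_rate_per_base) ** length_bp

e = 1 / 500  # hypothetical per-base error rate, for illustration only
for length in (300, 747, 1518):
    print(length, round(error_free_fraction(e, length), 3))
```

Error correction acts by lowering the effective e. Because e sits in the exponent's base, a given reduction helps long genes proportionally more than short ones.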
The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research.
text mining; event extraction; semantic annotation; semantic search
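The minimal Python sketch below represents an event as a ‘categorised, structured representation’ and runs a structured query over a set of such events. The Theme/Cause argument roles follow the style of the BioNLP shared tasks. The `Event` class, the `query` function and the example events are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """Hypothetical event record: an event type, its trigger word, and
    role-labelled arguments, e.g. (("Theme", "STAT3"), ("Cause", "JAK2"))."""
    event_type: str
    trigger: str
    arguments: tuple
    source: str = ""

def query(events, event_type: str, **roles):
    """Events of the given type whose arguments include every role=entity pair requested."""
    return [e for e in events
            if e.event_type == event_type
            and all((role, entity) in e.arguments for role, entity in roles.items())]

events = [
    Event("Phosphorylation", "phosphorylates", (("Theme", "STAT3"), ("Cause", "JAK2")), "doc1"),
    Event("Binding", "binds", (("Theme", "TP53"), ("Theme", "MDM2")), "doc2"),
    Event("Phosphorylation", "phosphorylation", (("Theme", "AKT1"),), "doc3"),
]
# "Which events phosphorylate STAT3, and what causes them?"
for e in query(events, "Phosphorylation", Theme="STAT3"):
    print(e.source, dict(e.arguments).get("Cause"))   # → doc1 JAK2
```

Keyword search could not separate ‘JAK2 phosphorylates STAT3’ from a sentence that merely mentions both proteins. Keeping the role labels makes that distinction possible.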
Genomic data now allow the large-scale manual or semi-automated reconstruction of metabolic networks. A network reconstruction represents a highly curated organism-specific knowledge base. A few genome-scale network reconstructions have appeared for metabolism in the baker’s yeast Saccharomyces cerevisiae. These alternative network reconstructions differ in scope and content, and have also used different terminologies to describe the same chemical entities, which makes comparisons between them difficult. The formulation of a ‘community consensus’ network that collects and formalizes the ‘community knowledge’ of yeast metabolism is thus highly desirable. We describe how we have produced a consensus metabolic network reconstruction for S. cerevisiae. Special emphasis is laid on referencing molecules to persistent databases or using database-independent forms such as SMILES or InChI strings. These forms allow chemical structures to be represented unambiguously and in a manner that permits automated reasoning. The reconstruction is readily available via a publicly accessible database and in the Systems Biology Markup Language, and we describe the manner in which it can be maintained as a community resource. It should serve as a common denominator for systems biology studies of yeast. Similar strategies will be of benefit to communities studying genome-scale metabolic networks of other organisms.
GeneGenie, a new online tool available at http://www.gene-genie.org, is introduced to support the design and self-assembly of synthetic genes and constructs. GeneGenie allows for the design of oligonucleotide cohorts encoding the gene sequence optimized for expression in any suitable host through an intuitive, easy-to-use web interface. The tool ensures consistent melting temperatures across the overlapping regions of the oligomers, minimizes the likelihood of misannealing, and optimizes codon usage for expression in a selected host. It also allows for the specification of forward and reverse cloning sequences (for downstream ligation) and provides support for mutagenesis or directed evolution studies. Directed evolution studies are enabled through the construction of variant libraries via the optional specification of ‘variant codons’, containing mixtures of bases, at any position. For example, specifying the variant codon TNT (where N is any nucleotide) will generate an equimolar mixture of the codons TAT, TCT, TGT and TTT at that position, encoding a mixture of the amino acids Tyr, Ser, Cys and Phe. This facility is demonstrated through the use of GeneGenie to develop and synthesize a library of enhanced green fluorescent protein variants.
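The variant-codon example above can be reproduced with the standard IUPAC nucleotide ambiguity codes and the standard genetic code. In the minimal Python sketch below, `expand_variant_codon` and `CODON_TABLE` are hypothetical helpers, not GeneGenie's API, and the tool's own handling of ambiguity codes may differ.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order (* = stop)
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}

def expand_variant_codon(codon: str) -> list:
    """All concrete codons encoded by a degenerate codon such as 'TNT'."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in codon.upper()))]

codons = expand_variant_codon("TNT")
print(codons, [CODON_TABLE[c] for c in codons])
# → ['TAT', 'TCT', 'TGT', 'TTT'] ['Y', 'S', 'C', 'F']
```

The same expansion shows the trade-offs of common library designs. For example, NNK gives 32 codons, covering all 20 amino acids plus a single stop codon (TAG).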
• We now have metabolic network models; the metabolome is represented by their nodes.
• Metabolite levels are sensitive to changes in enzyme activities.
• Drugs hitchhike on metabolite transporters to get into and out of cells.
• The consensus network Recon2 represents the present state of the art, and has predictive power.
• Constraint-based modelling relates network structure to metabolic fluxes.
Metabolism represents the ‘sharp end’ of systems biology, because changes in metabolite concentrations are necessarily amplified relative to changes in the transcriptome, proteome and enzyme activities, which can be modulated by drugs. To understand such behaviour, we therefore need (and increasingly have) reliable consensus (community) models of the human metabolic network that include the important transporters. Small molecule ‘drug’ transporters are in fact metabolite transporters, because drugs bear structural similarities to metabolites known from the network reconstructions and from measurements of the metabolome. Recon2 represents the present state-of-the-art human metabolic network reconstruction; it can predict inter alia: (i) the effects of inborn errors of metabolism; (ii) which metabolites are exometabolites, and (iii) how metabolism varies between tissues and cellular compartments. However, even these qualitative network models are not yet complete. As our understanding improves so do we recognise more clearly the need for a systems (poly)pharmacology.
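Constraint-based modelling starts from the steady-state mass balance S·v = 0. Here S is the stoichiometric matrix (metabolites × reactions) and v is the vector of fluxes, and each flux also has a lower and an upper bound. Flux balance analysis then optimises an objective over this feasible set, usually by linear programming, which is not shown here. The minimal sketch below checks these constraints for a hypothetical five-reaction toy network. The network and its bounds are illustrative assumptions, not part of Recon2.

```python
# Hypothetical toy network: uptake -> A; A -> B; A -> C; B -> (export); C -> (export)
METABOLITES = ["A", "B", "C"]
REACTIONS = ["uptake", "A_to_B", "A_to_C", "export_B", "export_C"]
S = [  # rows: metabolites, columns: reactions (negative = consumed)
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  0, -1],   # C
]
BOUNDS = [(0, 10)] + [(0, 1000)] * 4   # uptake capped at 10 (arbitrary units)

def is_steady_state(S, v, tol=1e-9):
    """True if every metabolite is mass-balanced, i.e. S . v = 0."""
    return all(abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol for row in S)

def is_feasible(S, v, bounds):
    """A flux vector is feasible if it is at steady state and within its bounds."""
    return is_steady_state(S, v) and all(lo <= x <= hi for x, (lo, hi) in zip(v, bounds))

print(is_feasible(S, [10, 6, 4, 6, 4], BOUNDS))   # → True
print(is_feasible(S, [10, 6, 3, 6, 3], BOUNDS))   # → False (A accumulates)
print(is_feasible(S, [12, 6, 6, 6, 6], BOUNDS))   # → False (uptake exceeds its bound)
```

Even these constraints alone limit the total export flux to the uptake cap of 10. This is the sense in which network structure relates to fluxes before any kinetics are known.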
It is well-known that individuals with increased iron levels are more prone to thrombotic diseases, mainly due to the presence of unliganded iron, and thereby the increased production of hydroxyl radicals. It is also known that erythrocytes (RBCs) may play an important role during thrombotic events. Therefore the purpose of the current study was to assess whether RBCs had an altered morphology in individuals with hereditary hemochromatosis (HH), as well as in some individuals who displayed hyperferritinemia (HF). Using scanning electron microscopy, we also assessed means by which the RBC and fibrin morphology might be normalized. An important objective was to test, by removing excess unliganded iron through chelation, the hypothesis that the altered RBC morphology was due to its presence. Very striking differences were observed, in that the erythrocytes from HH and HF individuals were distorted and had a much greater axial ratio than the discoid cells seen in the normal samples. The response to thrombin, and the appearance of a platelet-rich plasma smear, were also markedly different. These differences could largely be reversed by the iron chelator desferal, and to some degree by the iron chelator clioquinol or by the free radical trapping agents salicylate or selenite (which may themselves also be iron chelators). These findings are consistent with the view that the aberrant morphology of the HH and HF erythrocytes is caused, at least in part, by unliganded (‘free’) iron, whether derived directly via raised ferritin levels or otherwise. Lowering it, or affecting the consequences of its action, may therefore be of therapeutic benefit. The findings also bear on the question of the extent to which accepting blood donations from HH individuals may be desirable or otherwise.
Multiple models of human metabolism have been reconstructed, but each represents only a subset of our knowledge. Here we describe Recon 2, a community-driven, consensus ‘metabolic reconstruction’, which is the most comprehensive representation of human metabolism that is applicable to computational modeling. Compared with its predecessors, the reconstruction has improved topological and functional features, including ~2× more reactions and ~1.7× more unique metabolites. Using Recon 2 we predicted changes in metabolite biomarkers for 49 inborn errors of metabolism with 77% accuracy when compared to experimental data. Mapping metabolomic data and drug information onto Recon 2 demonstrates its potential for integrating and analyzing diverse data types. Using protein expression data, we automatically generated a compendium of 65 cell type–specific models, providing a basis for manual curation or investigation of cell-specific metabolic properties. Recon 2 will facilitate many future biomedical studies and is freely available at http://humanmetabolism.org/.
Systems biology projects and omics technologies have led to a growing number of biochemical pathway models and reconstructions. However, the majority of these models are still created de novo, based on literature mining and the manual processing of pathway data.
To increase the efficiency of model creation, the Path2Models project has automatically generated mathematical models from pathway representations using a suite of freely available software. Data sources include KEGG, BioCarta, MetaCyc and SABIO-RK. Depending on the source data, three types of models are provided: kinetic, logical and constraint-based. Models from over 2 600 organisms are encoded consistently in SBML, and are made freely available through BioModels Database at http://www.ebi.ac.uk/biomodels-main/path2models. Each model contains the list of participants, their interactions, the relevant mathematical constructs, and initial parameter values. Most models are also available as easy-to-understand graphical SBGN maps.
To date, the project has resulted in more than 140 000 freely available models. Such a resource can tremendously accelerate the development of mathematical models by providing initial starting models for simulation and analysis, which can be subsequently curated and further parameterized.
Modular rate law; Constraint-based models; Logical models; SBGN; SBML
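Of the three model types provided, logical models are the simplest to illustrate. The sketch below assumes a hypothetical three-node regulatory loop with Boolean rules and synchronous updating, which is one common convention. It is not one of the Path2Models SBML models. It simulates the network until a state repeats, that is, until it reaches an attractor.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, bool]

# Hypothetical three-node negative-feedback loop: A -> B -> C, and C represses A
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],  # C represses A
    "B": lambda s: s["A"],      # A activates B
    "C": lambda s: s["B"],      # B activates C
}


def step(state: State) -> State:
    """Synchronous update: every node is computed from the previous state."""
    return {node: rule(state) for node, rule in RULES.items()}


def attractor_length(state: State) -> int:
    """Iterate until a state repeats; return the length of the cycle reached."""
    seen: Dict[Tuple[bool, ...], int] = {}
    t = 0
    key = tuple(state[n] for n in sorted(RULES))
    while key not in seen:
        seen[key] = t
        state = step(state)
        t += 1
        key = tuple(state[n] for n in sorted(RULES))
    return t - seen[key]


print(attractor_length({"A": True, "B": False, "C": False}))  # 6
```

An odd-length negative-feedback loop like this one has no fixed point, so the network settles into a six-state oscillation rather than a steady state.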
We present an experimental and computational pipeline for the generation of kinetic models of metabolism, and demonstrate its application to glycolysis in Saccharomyces cerevisiae. Starting from an approximate mathematical model, we employ a “cycle of knowledge” strategy, identifying the steps with most control over flux. Kinetic parameters of the individual isoenzymes within these steps are measured experimentally under a standardised set of conditions. Experimental strategies are applied to establish a set of in vivo concentrations for isoenzymes and metabolites. The data are integrated into a mathematical model that is used to predict a new set of metabolite concentrations and reevaluate the control properties of the system. This bottom-up modelling study reveals that control over the metabolic network most directly involved in yeast glycolysis is more widely distributed than previously thought.
Glycolysis; Systems biology; Enzyme kinetics; Isoenzyme; Modelling
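In metabolic control analysis, "control over flux" is quantified by flux control coefficients, C_i = ∂ln J/∂ln e_i. When each rate is proportional to its enzyme level, these coefficients sum to 1 (the summation theorem). The sketch below assumes a hypothetical two-step pathway S → X → P with simple mass-action kinetics and made-up parameter values, not the yeast glycolysis model. It estimates each coefficient by perturbing one enzyme level at a time.

```python
import math
from typing import Tuple


def steady_state_flux(e1: float, e2: float, k1: float = 1.0, k2: float = 1.0,
                      s: float = 10.0, keq: float = 5.0) -> float:
    """Steady-state flux J of the hypothetical pathway S -> X -> P.

    Step 1 (reversible):   v1 = e1*k1*(s - x/keq)
    Step 2 (irreversible): v2 = e2*k2*x
    Solving v1 == v2 for x gives the closed form used below.
    """
    a, b = e1 * k1, e2 * k2
    x = a * s / (b + a / keq)
    return b * x


def flux_control_coefficients(e1: float, e2: float, h: float = 1e-6) -> Tuple[float, float]:
    """C_i = dlnJ/dln(e_i), estimated by central differences in log space."""
    def ln_j(scale1: float, scale2: float) -> float:
        return math.log(steady_state_flux(e1 * scale1, e2 * scale2))

    up, down = math.exp(h), math.exp(-h)
    c1 = (ln_j(up, 1.0) - ln_j(down, 1.0)) / (2 * h)
    c2 = (ln_j(1.0, up) - ln_j(1.0, down)) / (2 * h)
    return c1, c2


c1, c2 = flux_control_coefficients(1.0, 1.0)
print(round(c1, 3), round(c2, 3), round(c1 + c2, 3))  # 0.833 0.167 1.0
```

With these parameters, most of the control sits in the first step. In a larger model, the same perturbation approach ranks the steps whose kinetics are most worth measuring next.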
A considerable number of areas of bioscience, including gene and drug discovery, metabolic engineering for the biotechnological improvement of organisms, and the processes of natural and directed evolution, are best viewed in terms of a ‘landscape’ representing a large search space of possible solutions or experiments populated by a considerably smaller number of actual solutions that then emerge. This is what makes these problems ‘hard’. They are therefore best seen as combinatorial optimisation problems, to be attacked with the heuristic methods developed in that field. Such landscapes, which may also represent or include multiple objectives, are effectively modelled in silico. Modern active learning algorithms, such as those based on Darwinian evolution, use existing knowledge to guide the choice of the ‘best’ experiment to do next. An awareness, and the application, of these methods can thereby enhance the scientific discovery process considerably. This analysis fits comfortably with an emerging epistemology that sees scientific reasoning, the search for solutions, and scientific discovery as Bayesian processes.
automation; epistemology; evolutionary computing; heuristics; scientific discovery
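A toy genetic algorithm can illustrate the idea of an evolutionary algorithm steering which experiment to run next. This sketch assumes a hypothetical bitstring landscape in which each bit is one binary experimental choice. The fitness function stands in for an assay. In practice each evaluation would be a costly real experiment, the landscape would be unknown, and results would be cached rather than recomputed.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]


def evolve(fitness: Callable[[Genome], int], n_bits: int = 20, pop_size: int = 10,
           generations: int = 30, p_mut: float = 0.05,
           seed: int = 0) -> Tuple[Genome, List[int]]:
    """Elitist genetic algorithm: keep the best half, refill by crossover and mutation.

    Returns the best genome found and the best fitness after each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        # Select the top half of the current population as parents (elitism)
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children: List[Genome] = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
        history.append(max(fitness(g) for g in pop))
    best = max(pop, key=fitness)
    return best, history


# Hypothetical landscape: fitness = number of positions matching a hidden optimum
_target_rng = random.Random(42)
TARGET = [_target_rng.randint(0, 1) for _ in range(20)]


def toy_fitness(genome: Genome) -> int:
    return sum(g == t for g, t in zip(genome, TARGET))


best, history = evolve(toy_fitness)
print(history[0], history[-1])
```

Because the parents are carried over unchanged, the best fitness never decreases between generations. This mirrors the practical rule of never discarding the best experimental result found so far.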
Following a strategy similar to that used in baker’s yeast (Herrgård et al. Nat Biotechnol 26:1155–1160, 2008; Dobson et al. BMC Syst Biol 4:145, 2010; Heavner et al. BMC Syst Biol 6:55, 2012) and in Salmonella typhimurium (Thiele et al. BMC Syst Biol 5:8, 2011), a recent paper (Thiele et al. Nat Biotechnol 31:419–425, 2013) described a much improved ‘community consensus’ reconstruction of the human metabolic network, called Recon 2. The authors (who include the present ones) have made it freely available via a database at http://humanmetabolism.org/ and in SBML format at BioModels (http://identifiers.org/biomodels.db/MODEL1109130000). This short analysis summarises the main findings, and suggests some approaches that will be able to exploit the availability of this model to advantage.
Metabolism; Modelling; Systems biology; Networks; Metabolic networks
Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge.
Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches.
Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods have been used to update our existing pathway search system, PathText.
Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/.
Supplementary data are available at Bioinformatics online.
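The Method paragraph above describes pooling documents returned by three search systems and ranking them, but the abstract does not say how the separate result lists are merged. One standard, unsupervised way to merge them is reciprocal rank fusion, shown below as a hedged illustration only. It is not necessarily what PathText 2 does, since the paper ranks with rule-based and SVM models. The system outputs and document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1; the constant k damps the weight of top positions.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical outputs of three search systems for one pathway reaction
system_a = ["doc1", "doc2", "doc3"]
system_b = ["doc2", "doc1", "doc4"]
system_c = ["doc2", "doc3"]

fused = reciprocal_rank_fusion([system_a, system_b, system_c])
print([doc for doc, _ in fused])  # ['doc2', 'doc1', 'doc3', 'doc4']
```

Documents retrieved by several systems rise to the top even when no single system ranks them first. A supervised ranker, such as the SVM described in the Results, could then reorder this pooled list using annotated relevance judgements.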