The applications of genome-scale metabolic modelling have increased over recent years, as have the number of metabolic models available and the diversity of organisms that such reconstructions cover [1]. Traditional approaches to analysing such models have focused on constraint-based modelling, including widely used techniques such as flux balance analysis (FBA) [2]. FBA relies upon specification of an objective function that the cell is assumed to optimise. Objective functions can cover a range of cellular objectives [3], such as maximisation or minimisation of ATP consumption, but frequently (and particularly in the case of microorganisms) take the form of an assumed “biomass” function: a hypothetical reaction that mimics cell growth rate [4]. Such a biomass function is used to account for the flow of materials that are necessary for building new cells, and is commonly required in constraint-based models even when maximising variables other than growth rate.
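As an illustration of the FBA formulation described above, the following minimal sketch poses FBA as a linear programme on a hypothetical four-reaction network: the flux through a biomass reaction is maximised subject to the steady-state constraint S·v = 0 and to flux bounds. The network, reaction names, bounds and the `run_fba` helper are assumptions made purely for illustration and are not taken from any published model; SciPy's `linprog` stands in for a dedicated constraint-based modelling package.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (illustrative only, not a published model).
# Rows are metabolites A and B; columns are the reactions listed below.
#   uptake:    -> A
#   convert:   A -> B
#   byproduct: A ->
#   biomass:   B ->
reactions = ["uptake", "convert", "byproduct", "biomass"]
S = np.array([
    [1.0, -1.0, -1.0,  0.0],  # metabolite A
    [0.0,  1.0,  0.0, -1.0],  # metabolite B
])
# Assumed flux bounds (arbitrary units); uptake is the limiting resource.
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]


def run_fba(S, bounds, objective_index):
    """Maximise one flux subject to S @ v = 0 and the given bounds."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimises, so negate to maximise
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


fluxes = run_fba(S, bounds, reactions.index("biomass"))
for name, value in zip(reactions, fluxes):
    print(f"{name}: {value:.3f}")
```

In this toy network all of the substrate can be routed to biomass, so the optimal biomass flux equals the uptake bound. Changing `objective_index` to another reaction shows how strongly the predicted flux distribution depends on the assumed objective.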
Maximisation of biomass yield is not generally considered a valid principle in microbiology [5]; however, it is commonly assumed to be the optimality criterion driving the evolution of microorganisms [6]. The premise is that an organism that acquires and/or redistributes resources to outgrow its competitors will be in the best position to survive [7]. Assumption of growth rate maximisation in FBA studies has led to successful predictions of the actual growth rate in a number of organisms [4]. Such an assumption, however, is likely to be invalid for individual cell types in multicellular organisms, where cellular objectives may differ greatly both between and within tissues. The assumption of maximal rates of biomass production involves an objective at the cellular level, whereas in multicellular organisms a given cell’s objective is likely to be realised via survival at the organism level, which may not necessarily be dependent upon the growth of the cell. Moreover, signals from the extracellular environment may trigger different cellular priorities and objectives. These issues have been demonstrated by the work of Gille et al., who modelled the metabolism of human hepatocytes, indicating that their objective is to preserve homeostasis of blood compounds, a process that is modulated by the extracellular availability of oxygen and other nutrients.
Another concern when applying FBA is the inadequacy of the biomass definition itself. Typically, the assumed composition of E. coli is used as a template to define the biomass of the organism of interest [4]. To do this, a cell’s macromolecular composition (in terms of protein, RNA, DNA, carbohydrate and lipid content), the metabolite content of each macromolecular class, and the biosynthetic and maintenance costs for various cellular processes are required [4]. Not only are such numbers difficult to determine, but it is also to be expected that they would change drastically under different environmental conditions. Problems associated with reliance on a biomass objective function have led to a number of studies that focus on the determination of a suitable objective function [10
This work therefore focuses on another approach, investigating the use of ‘omics data to act as a guide for the prediction of the intracellular metabolic fluxes that a given cell exhibits. A priori it may be supposed that enzymatic transcript concentrations and metabolic fluxes can be related to each other, albeit in a complex manner, since fluxes are clearly dependent on the concentrations of enzymes and/or their encoding transcripts [13]. Drawing upon previous work [14], this approach investigates how relating metabolic fluxes to enzyme-encoding gene expression levels affects the predictive power of constraint-based analysis. The hypothesis is that doing so would provide a comparable, or better, representation of intracellular fluxes than does reliance upon an assumed biomass objective alone. As stated in related work by Becker et al., “the statement of an [assumed] objective introduces a ‘user-bias’ and such objective may not be relevant to the true physiological state.” It is this user-bias that this work attempts to remove, through the application of a purely data-driven objective.
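To make the idea of a data-driven objective concrete, the sketch below replaces the biomass objective with a fit to expression data: it searches for the steady-state flux distribution that minimises the total absolute deviation between each flux and an expression-derived target. This is one plausible formulation written under stated assumptions, not necessarily the one used in this work. The toy network, the `targets` values, the assumption that expression levels have already been scaled into flux units, and the helper name `fit_fluxes_to_expression` are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: uptake -> A, A -> B, A -> byproduct, B -> biomass.
reactions = ["uptake", "convert", "byproduct", "biomass"]
S = np.array([
    [1.0, -1.0, -1.0,  0.0],  # metabolite A
    [0.0,  1.0,  0.0, -1.0],  # metabolite B
])
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

# Assumed expression-derived flux targets, one per reaction (made-up values
# that deliberately do not satisfy the steady-state constraint exactly).
targets = np.array([8.0, 6.0, 3.0, 4.0])


def fit_fluxes_to_expression(S, bounds, targets):
    """Minimise sum_i |v_i - targets_i| subject to S @ v = 0 and bounds.

    The absolute values are linearised with one deviation variable d_i per
    reaction, constrained so that d_i >= |v_i - targets_i|.
    """
    m, n = S.shape
    identity = np.eye(n)
    # Decision variables are ordered as [v_1..v_n, d_1..d_n].
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub = np.block([[identity, -identity],    #  v - d <=  target
                     [-identity, -identity]])  # -v - d <= -target
    b_ub = np.concatenate([targets, -targets])
    A_eq = np.hstack([S, np.zeros((m, n))])
    b_eq = np.zeros(m)
    all_bounds = list(bounds) + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=all_bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.fun


fitted, total_deviation = fit_fluxes_to_expression(S, bounds, targets)
for name, value in zip(reactions, fitted):
    print(f"{name}: {value:.3f}")
print(f"total deviation: {total_deviation:.3f}")
```

No objective reaction is nominated here: the data alone select the flux distribution, and the stoichiometry reconciles targets that are mutually inconsistent. In practice the deviations would likely need weighting by measurement uncertainty, and the mapping from genes to reactions is itself a modelling choice.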
This study involves the acquisition and analysis of absolute quantitative transcriptomics data from Saccharomyces cerevisiae, and the subsequent use of these data to constrain an existing genome-scale metabolic model [19]. FBA results generated through this approach are then compared with those generated by applying a more traditional biomass objective.
Such an approach may have ramifications for the genome-scale modelling of human metabolism [20], where tissue-specific microarray data are publicly available and have been exploited in the development of tissue-specific models of human metabolism by Shlomi et al. The approach of Shlomi et al. categorises numerous tissue-specific relative gene expression data sets into high, medium and low expression, with the goal of limiting the tissue-specific network to contain only those reactions for which consistently high gene expression has been observed. A disadvantage of this work is that it was performed upon a “consensus” of gene expression data, generated under a range of physiological and experimental conditions. Furthermore, the data used were generated from microarrays, which provide relative expression ratios of the same gene under different conditions. Microarrays are suited to comparative studies; the data that they produce therefore do not allow for comparison of absolute expression levels across genes, primarily due to differences in hybridisation efficiency [21]. In addition, microarray data are associated with a number of common problems, including cross-hybridisation issues, limited dynamic detection range, presence of background noise and the detection of transcripts being limited to sequences printed on the array [22
The approach presented here relies upon absolute gene expression data generated under the condition of interest, using RNA-Seq. RNA-Seq provides expression levels in terms of counts of expressed transcripts that can be related to transcripts per cell and thus to an absolute level. Therefore, the expression levels generated are comparable across the transcriptome and have been shown to be more indicative of protein concentrations than gene expression levels generated from microarrays [21]. By extension, RNA-Seq data are likely to provide a more reliable indication of enzymatic activity than those generated through relative expression techniques. Furthermore, RNA-Seq mitigates many of the limitations inherent in the use of microarrays [24
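The following minimal sketch illustrates why read counts must be normalised before expression levels can be compared across genes. It uses transcripts per million (TPM), a common length-aware normalisation chosen here for illustration rather than the procedure used in this study. The gene names, counts and lengths are hypothetical, and TPM is a within-sample proportion: relating it to transcripts per cell would additionally require an estimate of the total number of transcripts per cell.

```python
def transcripts_per_million(counts, lengths_kb):
    """Convert raw read counts to TPM.

    counts:     mapping of gene -> number of reads assigned to that gene
    lengths_kb: mapping of gene -> transcript length in kilobases
    """
    # Longer transcripts yield more reads per molecule, so divide by length.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    # Rescale so that the values sum to one million within the sample.
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


# Hypothetical example data (not measurements from this study).
counts = {"geneA": 100, "geneB": 300, "geneC": 100}
lengths_kb = {"geneA": 1.0, "geneB": 3.0, "geneC": 0.5}

tpm = transcripts_per_million(counts, lengths_kb)
for gene, value in tpm.items():
    print(f"{gene}: {value:.1f}")
```

Here geneB has three times the reads of geneA but is also three times as long, so the two receive the same TPM, whereas geneC, with the same count as geneA on a shorter transcript, is estimated to be twice as abundant.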