Here, RING, a strategy for reconciling differences between in silico predictions and in vivo observations, is applied to validate and upgrade the first reconstruction of the genome-scale metabolic model of S. pombe, SpoMBEL1693. The ability of this newly reconstructed model to represent the metabolic physiology of this yeast was analyzed by comparing the growth phenotypes obtained from single-gene knockout simulations with those observed experimentally for the single-gene knockout mutant library [23]. Using RING, the discrepancies between in silico predictions and in vivo observations were systematically and iteratively resolved. The overall scheme of the process is shown in Figure 1.
Figure 1 Diagram of the overall scheme of RING (Reconciliation of In silico/in vivo mutaNt Growth) divided into two stages. In the top stage (Blue Box), the reconstruction of the genome-scale metabolic model and the single-gene mutant growth simulations from the …

In silico growth phenotypes for the deletion of every metabolic reaction were generated, and the genes associated with each metabolic reaction were identified. These growth phenotypes were then categorized as either positive or negative (viable or lethal) using a viability threshold of 10% of the “wild-type” growth rate. The in vivo phenotype for each gene was then retrieved from the publicly available single-gene knockout mutant library [23]. Once retrieved, the in vivo phenotypes were compared against the in silico predictions, and each prediction was further categorized as true or false according to whether it matched the in vivo observation. The false predictions were then sorted and analyzed in the step-wise manner outlined in Figure until all predictions had been examined.
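The categorization step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the reaction IDs and phenotypes are hypothetical placeholders, and each in silico prediction is assumed to have already been reduced to a viable/lethal call.

```python
from enum import Enum

class Outcome(Enum):
    TP = "true positive"   # predicted viable, observed viable
    FP = "false positive"  # predicted viable, observed lethal
    TN = "true negative"   # predicted lethal, observed lethal
    FN = "false negative"  # predicted lethal, observed viable

def categorize(predicted_viable: bool, observed_viable: bool) -> Outcome:
    """Label one in silico knockout prediction against the in vivo phenotype."""
    if predicted_viable:
        return Outcome.TP if observed_viable else Outcome.FP
    return Outcome.FN if observed_viable else Outcome.TN

# Hypothetical reaction IDs: (predicted viable?, observed viable in vivo?)
comparisons = {
    "R_example_1": (True, True),
    "R_example_2": (True, False),
    "R_example_3": (False, False),
    "R_example_4": (False, True),
}
labels = {rxn: categorize(*pair) for rxn, pair in comparisons.items()}
for rxn, label in labels.items():
    print(rxn, label.name)
```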
RING was applied iteratively to ensure that the changes made to SpoMBEL1693 to reconcile a discrepancy did not alter other results or reduce the overall accuracy of the model, defined as the number of correct in silico predictions divided by the total number of predictions made. By reconciling discrepancies between in silico predictions and in vivo data, the genome-scale metabolic model was able to represent the metabolic characteristics of S. pombe accurately. Simulations were performed under YES medium conditions, and the knockout results were categorized as either positive or negative, where positive denotes a viable phenotype for a given knockout and negative denotes a lethal phenotype. When compared against the published results obtained with the mutant library, the results were categorized as true/false positives or true/false negatives according to whether the prediction agreed with the in vivo result. True results indicate that the in silico prediction matches the in vivo result, and false results indicate a discrepancy between the two. A false positive means that SpoMBEL1693 predicts a viable phenotype while the in vivo result shows a lethal phenotype (Table ). A false negative means that SpoMBEL1693 predicts a lethal phenotype while the in vivo result shows a viable phenotype (Table ). Analysis of false predictions via RING highlights gaps in the knowledge of S. pombe metabolism and improves the metabolic model by reconciling these differences between in silico predictions and in vivo observations.
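A minimal sketch of the accuracy measure defined above, together with one possible acceptance test for an iterative revision. The rule in the hypothetical helper `accept_revision` (reject a change if any previously correct prediction becomes incorrect, or if overall accuracy drops) is one reading of the criterion described in the text, not code from the study; outcome labels and reaction IDs are made up.

```python
CORRECT = {"TP", "TN"}

def accuracy(outcomes: dict[str, str]) -> float:
    """Correct predictions (TP + TN) divided by all predictions."""
    if not outcomes:
        return 0.0
    return sum(label in CORRECT for label in outcomes.values()) / len(outcomes)

def accept_revision(before: dict[str, str], after: dict[str, str]) -> bool:
    """Hypothetical rule: accept a model change only if no previously correct
    prediction becomes incorrect (or disappears) and accuracy does not drop."""
    broken = [r for r, label in before.items()
              if label in CORRECT and after.get(r) not in CORRECT]
    return not broken and accuracy(after) >= accuracy(before)

before = {"R1": "TP", "R2": "FP", "R3": "TN", "R4": "FN"}
fixes_r2 = {"R1": "TP", "R2": "TN", "R3": "TN", "R4": "FN"}
breaks_r1 = {"R1": "FP", "R2": "TN", "R3": "TN", "R4": "TP"}
print(accuracy(before), accuracy(fixes_r2))  # 0.5 0.75
print(accept_revision(before, fixes_r2))     # True
print(accept_revision(before, breaks_r1))    # False
```

The second revision is rejected even though its accuracy is higher, because it turns a previously correct prediction (R1) into a false one.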
Terms and definitions used in the analysis between in silico predictions and in vivo observations
Metabolic model characteristics
The metabolic model of S. pombe, SpoMBEL1693, consists of 1693 metabolic reactions, including 386 transport and exchange reactions, and 1744 metabolites. The model is divided into 8 compartments, representing the cellular compartments of S. pombe and its surroundings: cytoplasm, mitochondria, nucleus, peroxisome, endoplasmic reticulum, Golgi apparatus, vacuole and the extracellular environment (Additional file 1). The metabolic reactions were taken from the Kyoto Encyclopedia of Genes and Genomes (KEGG) [24] and NCBI, and supplemented with information from the S. pombe gene database GeneDB [25]. Compartmental assignment of the reactions was based on reports of protein localization experiments [26]. In total, the model covers 605 of the 4940 protein-coding genes.
An important metabolic reaction in SpoMBEL1693 is the biomass formation reaction. This “pseudo” metabolic reaction represents the synthesis of cellular biomass, i.e. cell growth. It was constructed by compiling all components necessary for biomass formation, with coefficients determined from both experimental measurements and literature data. The biomass reaction is particularly important in our analysis because it is used to indicate whether a metabolic reaction and its associated genes are essential for growth. Detailed information on the construction of the biomass reaction can be found in the Methods and in Additional file 2. To validate the reconstruction, the in silico single knockout simulations were compared against the single-gene knockout mutant library using the RING strategy, as discussed in detail here. The model was further validated by comparing its capability to utilize various carbon sources and to produce ethanol at different dilution rates (see Additional file 3).
Gene/reaction essentiality simulation
Gene knockout simulations were performed to evaluate the capability of the metabolic model to predict growth phenotypes of S. pombe.
The impact of each metabolic reaction and its associated genes on the growth phenotype was investigated using the metabolic model. As a result, 198 essential metabolic reactions corresponding to 84 genes were identified (Additional file 4). Transport reactions and metabolic reactions with no gene assignment or experimental data were excluded from the analysis. However, duplicate metabolic reactions in different compartments were included, which accounts for the large difference between the number of essential reactions and the number of genes. Note that the in silico simulations of the genome-scale metabolic model were based solely on the stoichiometry of the metabolic reactions; regulatory, signaling and other interaction information was not included.
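Stoichiometry-only growth simulations of this kind are conventionally posed as flux balance analysis (FBA), and the later references to a wild-type flux distribution and to maximizing the growth rate fit that setting. The formulation below is the standard textbook form, given as an assumption rather than the exact constraint set used for SpoMBEL1693: S is the stoichiometric matrix (metabolites by reactions), v the flux vector, and the bounds encode irreversibility and medium uptake. Deleting reaction k adds the constraint v_k = 0.

```latex
\begin{aligned}
\mu_{\mathrm{WT}} &= \max_{v}\; v_{\mathrm{biomass}}
  \quad \text{s.t.} \quad S\,v = 0,\;\; v_j^{\min} \le v_j \le v_j^{\max} \;\; \forall j, \\
\mu_{\mathrm{KO},k} &= \max_{v}\; v_{\mathrm{biomass}}
  \quad \text{s.t.} \quad S\,v = 0,\;\; v_j^{\min} \le v_j \le v_j^{\max} \;\; \forall j,\;\; v_k = 0.
\end{aligned}
```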
Lethal genes were determined by observing the change in the in silico growth rate when the corresponding metabolic reaction was removed from the model, representing the deletion of its associated genes. If the growth rate dropped to zero or below 10% of the original “wild-type” growth rate, the resulting phenotype was classified as lethal, and the reaction and its associated genes were considered essential. If the in silico growth rate was unchanged or remained at 10% or more of the “wild-type” growth rate, the resulting phenotype was classified as viable, and the reaction and its associated genes were considered non-essential. The RING analysis was performed iteratively, with the metabolic model revised based on the comparison between the in silico knockout simulations and the phenotypes observed experimentally with the single-gene knockout library [23].
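The 10% viability rule can be written as a small classifier. This sketch assumes the knockout and wild-type growth rates have already been computed by some simulation (for example FBA); the growth values and reaction IDs below are hypothetical.

```python
VIABILITY_THRESHOLD = 0.10  # fraction of wild-type growth rate, as stated in the text

def phenotype(knockout_growth: float, wild_type_growth: float) -> str:
    """Classify a knockout as 'lethal' (below 10% of wild type) or 'viable'."""
    if wild_type_growth <= 0:
        raise ValueError("wild-type growth rate must be positive")
    if knockout_growth < VIABILITY_THRESHOLD * wild_type_growth:
        return "lethal"
    return "viable"

def essential_reactions(knockout_growth: dict[str, float], wild_type_growth: float) -> set[str]:
    """Reactions whose deletion gives a lethal phenotype."""
    return {rxn for rxn, mu in knockout_growth.items()
            if phenotype(mu, wild_type_growth) == "lethal"}

# Hypothetical growth rates from single-reaction deletion simulations
wild_type = 0.20
knockouts = {"R_a": 0.0, "R_b": 0.015, "R_c": 0.05, "R_d": 0.20}
print(sorted(essential_reactions(knockouts, wild_type)))  # ['R_a', 'R_b']
```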
Resolution and analysis of false positive predictions
False results indicate that information in the metabolic model is missing or incorrect, causing a discrepancy with what is observed in vivo. These false results must therefore be resolved by adding missing information or correcting erroneous information so that the in silico predictions match the observed in vivo phenotypes. This section examines the different cases in which false positive predictions arise and the strategies used to resolve them. A false positive prediction indicates that the metabolic model incorrectly predicts a viable phenotype when a metabolic reaction (and, by association, its corresponding gene) is deleted. Analysis of the initial positive (viable) predictions of mutant phenotypes with SpoMBEL1693 showed that 65.4% of the positive predictions matched the observed in vivo phenotypes (296 false positives and 560 true positives) (Figure 2A). Strategies for resolving these inconsistencies through RING analysis are summarized in Figure 3A and outlined in this section. The strategies were implemented in stages to analyze the false positive predictions systematically.
Figure 2 Summary of the metabolic reactions in each category from the results of the in silico single-gene mutant prediction for SpoMBEL1693. A) Initial results and percentages of SpoMBEL1693 in predicting single-gene mutant growth phenotypes. B) Improved rates …
Figure 3 Diagram of possible causes and solutions for false predictions. Circles represent metabolites and arrows represent metabolic reactions. A) False positive predictions. B) False negative predictions. Red arrows indicate problems in the network and green arrows …
The first step in reconciling false positive predictions was to identify all duplicated (redundant) metabolic reactions localized in a different compartment of the metabolic network. These redundant reactions are present because localization data place the respective proteins in those compartments, which provides an alternate route through another cellular compartment (Figure 3A, Case 1). Localization data can also place an enzyme in a compartment that contains no other enzymes to balance the generation or consumption of its metabolites (an orphan reaction). If the gene is essential, knocking out this orphan copy gives a false positive prediction, whereas knocking out the duplicate reaction in the functional compartment gives a true negative prediction. A total of 41 metabolic reactions fall into this category; once resolved, they were reclassified as negative predictions. For instance, the proteins for many metabolic reactions, such as the first two steps of lower glycolysis and reactions of nicotinate and pentose metabolism, have been localized to the nucleus, where they occur individually or in small clusters isolated from other metabolic reactions rather than as complete pathways. To validate the essentiality of the genes, all compartmental copies of the reactions encoded by each gene were deleted simultaneously.
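To illustrate why all compartmental copies must be deleted together, the sketch below swaps in a simpler qualitative technique, network expansion (whether the biomass precursors are reachable from the medium), in place of the growth-rate simulations used in the study. Reactions are treated as irreversible, and every reaction, metabolite and gene name is hypothetical.

```python
# Qualitative growth check by network expansion (not FBA).
# All reaction, metabolite and gene names are hypothetical.

def rxn(substrates, products):
    """Represent an irreversible reaction as (substrates, products)."""
    return frozenset(substrates), frozenset(products)

def scope(reactions, seeds):
    """Return all metabolites producible from the seed metabolites."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def grows(reactions, media, biomass_precursors):
    """Viable if every biomass precursor is producible from the medium."""
    return set(biomass_precursors) <= scope(reactions, media)

def knockout(reactions, removed):
    """Copy of the network without the removed reactions."""
    return {r: v for r, v in reactions.items() if r not in removed}

network = {
    "T_glc": rxn({"glc_e"}, {"glc_c"}),
    "R1_c": rxn({"glc_c"}, {"g6p_c"}),   # cytoplasmic copy, part of a pathway
    "R1_n": rxn({"glc_n"}, {"g6p_n"}),   # nuclear copy: orphan, glc_n is never made
    "R2_c": rxn({"g6p_c"}, {"prec_c"}),
}
gene_reactions = {"gene_X": {"R1_c", "R1_n"}}  # one gene, two compartmental copies
media, biomass = {"glc_e"}, {"prec_c"}

print(grows(knockout(network, {"R1_n"}), media, biomass))  # True: viable, a false positive if gene_X is essential
print(grows(knockout(network, {"R1_c"}), media, biomass))  # False
print(grows(knockout(network, gene_reactions["gene_X"]), media, biomass))  # False: all copies deleted
```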
Metabolic reactions with false positive predictions were then checked for their connectivity to the metabolic network. This analysis showed that false predictions were also associated with two kinds of poorly connected reactions: dead-end reactions, which lie in pathways connected to the network at the upstream end but not at the downstream end, and non-redundant orphan reactions. The orphan metabolic reactions (Figure 3A, Case 2) account for 31 metabolic reactions in SpoMBEL1693 and include the reactions that charge tRNAs with amino acids for protein synthesis. Because tRNA composition is already incorporated into the biomass formation reaction, these reactions are redundant; they were therefore removed from the analysis but retained in the metabolic model.
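Dead-end and orphan reactions can be flagged from network topology alone. The sketch below, assuming irreversible reactions and a user-supplied set of exchanged (medium) metabolites, repeatedly removes reactions that touch a metabolite that is never produced or never consumed, so blocked reactions further upstream of a dead end are caught as well. The toy network and its names are hypothetical.

```python
def rxn(substrates, products=()):
    """Irreversible reaction as (substrates, products)."""
    return frozenset(substrates), frozenset(products)

def dead_end_metabolites(reactions, exchanged=frozenset()):
    """Metabolites that are produced but never consumed, or consumed but never produced."""
    produced, consumed = set(), set()
    for substrates, products in reactions.values():
        consumed |= substrates
        produced |= products
    # Exchanged metabolites are supplied or removed by the medium, so they are not dead ends
    never_consumed = produced - consumed - exchanged
    never_produced = consumed - produced - exchanged
    return never_consumed | never_produced

def blocked_reactions(reactions, exchanged=frozenset()):
    """Iteratively prune reactions touching dead-end metabolites."""
    remaining = dict(reactions)
    blocked = set()
    while True:
        dead = dead_end_metabolites(remaining, exchanged)
        hit = {r for r, (s, p) in remaining.items() if (s | p) & dead}
        if not hit:
            return blocked
        blocked |= hit
        for r in hit:
            del remaining[r]

network = {
    "T_glc": rxn({"glc_e"}, {"glc_c"}),
    "R1": rxn({"glc_c"}, {"pyr_c"}),
    "BIOMASS": rxn({"pyr_c"}),              # drain into biomass
    "H1": rxn({"glc_c"}, {"proto_c"}),      # dead-end pathway, step 1
    "H2": rxn({"proto_c"}, {"heme_c"}),     # dead-end pathway, heme never consumed
    "X1": rxn({"orn_n"}, {"cit_n"}),        # orphan reaction, isolated in a compartment
}
print(sorted(blocked_reactions(network, {"glc_e"})))  # ['H1', 'H2', 'X1']
```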
Metabolic reactions in dead-end pathways were reconciled by connecting the ends of the pathways to the metabolic network (Figure 3A, Case 3). In the extreme case where linking the pathway to the network failed to resolve the false positive prediction, the major downstream metabolite was incorporated into the biomass reaction representing cellular growth, directly linking the pathway to growth. The heme biosynthetic pathway is one example. Heme had no metabolic role or function in the model, resulting in false positive predictions in the knockout simulation. However, according to the single-gene mutant library, the genes encoding the reactions of the heme biosynthesis pathway are essential for growth, as evidenced by the lethal phenotypes of knockouts of these genes. Heme was therefore incorporated into the biomass reaction with a coefficient corresponding to a negligible cellular concentration, so that heme biosynthesis does not drain cellular resources. By incorporating heme into the biomass reaction, heme biosynthesis becomes linked to cellular growth. One consequence of linking heme to biomass is that iron ions had to be included in the YES medium. Sterol biosynthesis is an instance where linking the pathway to the rest of the network was sufficient to resolve the false positive predictions: gaps in the sterol biosynthetic pathway were filled (SPBC1709.07 and SPBC16E9.05) and confirmed through GeneDB. A total of 37 metabolic reactions with false positive predictions were resolved in this way and re-categorized as true negatives.
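The effect of adding a downstream metabolite to the biomass reaction can be illustrated with a qualitative reachability check, a simplification of the study's growth simulations. In this hypothetical network, a lumped heme-precursor reaction and an iron-insertion step stand in for the real pathway, and the heme coefficient of 1e-6 is an arbitrary placeholder; the reachability check ignores coefficients, which would only matter in a quantitative simulation.

```python
# Qualitative reachability check (network expansion), not FBA.
# Hypothetical, lumped toy network; names and the 1e-6 coefficient are illustrative.

def rxn(substrates, products):
    """Irreversible reaction as (substrates, products)."""
    return frozenset(substrates), frozenset(products)

def scope(reactions, seeds):
    """All metabolites producible from the seed metabolites."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def grows(reactions, media, biomass):
    """biomass maps component -> coefficient; only the components matter here."""
    return set(biomass) <= scope(reactions, media)

network = {
    "T_glc": rxn({"glc_e"}, {"glc_c"}),
    "T_fe": rxn({"fe2_e"}, {"fe2_c"}),
    "R1": rxn({"glc_c"}, {"pyr_c"}),
    "HEM1": rxn({"glc_c"}, {"proto_c"}),            # lumped heme-precursor synthesis
    "HEM2": rxn({"proto_c", "fe2_c"}, {"heme_c"}),  # iron insertion
}
biomass_old = {"pyr_c": 1.0}
biomass_new = {**biomass_old, "heme_c": 1e-6}  # heme added with a negligible coefficient
media = {"glc_e", "fe2_e"}

no_hem1 = {r: v for r, v in network.items() if r != "HEM1"}
print(grows(no_hem1, media, biomass_old))           # True: viable, a false positive
print(grows(no_hem1, media, biomass_new))           # False: lethal, as observed in vivo
print(grows(network, {"glc_e"}, biomass_new))       # False: without iron even the wild type fails
```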
The gene associations of metabolic reactions were then examined to reconcile false positive predictions from the knockout simulation. One such case is the association of multiple metabolic reactions with a single gene (Figure 3A, Case 4). Enzymes encoded by a single gene are known to participate in multiple functions in the metabolic network, so multiple reactions in the metabolic model are associated with the same gene. Hence, deleting just one of these reactions does not accurately reflect the single-gene knockout of that gene. To resolve this, all metabolic reactions associated with the target gene were deleted simultaneously. With these reactions deleted together, the false positive predictions were resolved and a lethal phenotype was predicted. Sixty-four metabolic reactions were reconciled in this manner (Figure ).
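The simultaneous-deletion rule needs a gene-to-reactions index, which can be obtained by inverting the reaction-to-genes associations. A minimal sketch with hypothetical identifiers:

```python
from collections import defaultdict

def gene_to_reactions(reaction_genes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert reaction -> genes associations into gene -> reactions."""
    mapping = defaultdict(set)
    for reaction, genes in reaction_genes.items():
        for gene in genes:
            mapping[gene].add(reaction)
    return dict(mapping)

# Hypothetical associations: gene_1 encodes an enzyme used by two reactions
reaction_genes = {"R_a": {"gene_1"}, "R_b": {"gene_1"}, "R_c": {"gene_2"}}
by_gene = gene_to_reactions(reaction_genes)

# A single-gene knockout of gene_1 deletes every reaction in this set at once
print(sorted(by_gene["gene_1"]))  # ['R_a', 'R_b']
```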
The remaining false positive predictions could not be reconciled by RING because of a lack of available information about the metabolic network.
Sixty-two metabolic reactions with false positive predictions carried no flux in the in silico wild-type flux distribution, indicating that they are not used for growth in the model, even though deletion of their corresponding genes gives a lethal phenotype in vivo. The absence of flux through these 62 reactions could be attributed to a lack of regulatory information that would direct flux through them. Thirty-seven reactions with false predictions could not be reconciled with high confidence because both viable and lethal genes are assigned to them. Eight of these 37 reactions overlap with the previous category of reactions that carry no flux in SpoMBEL1693. The remaining 29 reactions are used and carry flux when the growth rate is maximized. However, it is unclear whether deleting the reaction itself causes the lethal phenotype or whether the lethal gene(s) function in another capacity that is essential for growth but not reflected in the metabolic network. Resolving these cases with high confidence therefore requires detailed characterization of all genes associated with each reaction. Overall, after RING was applied, the correct prediction rate for viable phenotypes improved to 79.6% (61 false positive and 561 true positive predictions) (Figure 2B; Additional file 4).
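The two unresolved groups described above (reactions carrying no wild-type flux, and reactions assigned both viable and lethal genes) can be separated with simple set operations. This sketch assumes a wild-type flux distribution is available as a dictionary and that in vivo lethality is known per gene; the helpers `gene_support` and `triage`, and all identifiers and values, are hypothetical.

```python
def gene_support(genes: set[str], lethal_genes: set[str]) -> str:
    """Summarize the in vivo phenotypes of all genes assigned to one reaction."""
    lethal = genes & lethal_genes
    if not lethal:
        return "viable-only"
    if lethal == genes:
        return "lethal-only"
    return "mixed"

def triage(false_positives, wild_type_flux, reaction_genes, lethal_genes, tol=1e-9):
    """Split unresolved false positives into zero-flux and mixed-gene groups."""
    no_flux = {r for r in false_positives if abs(wild_type_flux.get(r, 0.0)) <= tol}
    mixed = {r for r in false_positives
             if gene_support(reaction_genes[r], lethal_genes) == "mixed"}
    return no_flux, mixed

false_positives = {"R1", "R2", "R3"}
wild_type_flux = {"R1": 0.0, "R2": 1.2, "R3": 0.0}
reaction_genes = {"R1": {"g1"}, "R2": {"g2", "g3"}, "R3": {"g4", "g5"}}
lethal_genes = {"g1", "g2", "g4"}

no_flux, mixed = triage(false_positives, wild_type_flux, reaction_genes, lethal_genes)
print(sorted(no_flux), sorted(mixed), sorted(no_flux & mixed))
# ['R1', 'R3'] ['R2', 'R3'] ['R3']
```

A reaction can fall into both groups, mirroring the overlap between the two categories described in the text.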
Resolution and analysis of false negative predictions
False negative predictions are cases where the growth phenotype is predicted to be lethal but is viable experimentally. Initially, 41.4% of the negative predictions were correct (55 false negative and 39 true negative predictions) (Figure 2A). These false negative predictions were also analyzed in stages and reconciled through RING (Figure 3B).
Analysis of false negative predictions started with the genes associated with the metabolic reactions that gave false negative predictions. The large majority of these reactions were found to have multiple associated genes (Figure 3B, Case 1). Eleven reactions were associated with both viable and lethal genes, and 25 reactions were associated with only viable genes. The false predictions for these reactions could not be reconciled because of insufficient information about the functional roles these genes play in the reactions. For example, for a reaction associated with both lethal and viable genes, the viable gene may be a minor or non-essential contributor to the reaction's function. For a reaction associated with multiple viable genes, the genes may be functionally redundant, each able to replace another when it is deleted. In that case, all genes associated with the reaction would have to be deleted to confirm the reaction's essentiality.
Another instance of Case 1 arises when all genes associated with a reaction are viable; it is then uncertain whether the reaction is essential to the metabolic network (a true negative) or whether the negative prediction is indeed false. If the reaction is truly essential, then knocking out all of its associated genes would give the lethal phenotype predicted by SpoMBEL1693, whereas a single-gene knockout mutant would not abolish the reaction, because the remaining genes could compensate for the deleted one. Because there is not enough information to reconcile these false predictions, the reactions were removed from the analysis and noted for future research.
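The isozyme reasoning can be made explicit by deriving reaction deletions from gene deletions. This sketch assumes OR logic only (any remaining associated gene keeps the reaction active) and ignores enzyme complexes, which would require AND logic; the helper `removed_reactions` and all gene and reaction names are hypothetical.

```python
def removed_reactions(reaction_genes: dict[str, set[str]], knocked_out: set[str]) -> set[str]:
    """Reactions lost when the given genes are deleted, assuming isozyme (OR) logic:
    a reaction is lost only if every associated gene is deleted."""
    return {rxn for rxn, genes in reaction_genes.items()
            if genes and genes <= knocked_out}

reaction_genes = {"R_x": {"gene_1", "gene_2"}, "R_y": {"gene_3"}}
print(removed_reactions(reaction_genes, {"gene_1"}))            # set()
print(removed_reactions(reaction_genes, {"gene_1", "gene_2"}))  # {'R_x'}
```

Under this rule a single-gene mutant of `gene_1` leaves `R_x` intact, which is why deleting the reaction directly can produce a false lethal prediction.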
The remaining false negative predictions were examined to determine whether the metabolic reactions affected the biosynthesis of biomass components for cellular growth. In this case, an alternate metabolic reaction is needed to resolve the false prediction (Figure 3B, Case 2). If a metabolic reaction is the only source of an essential metabolite (i.e. an essential intermediate for the biosynthesis of biomass components), strategies were investigated to supply that metabolite from other sources within the network (e.g. another compartment). For example, cytoplasmic acetyl-CoA was produced only by the reaction catalyzed by acetyl-CoA synthetase, which is non-essential for growth according to the single-gene knockout mutant library. However, knockout simulations show that cytoplasmic acetyl-CoA is essential for growth as a precursor for the synthesis of biomass components. An alternate pathway that produces acetyl-CoA in the cytoplasm is therefore needed. Alternate reactions capable of producing acetyl-CoA were found in the mitochondria; however, localization data for metabolic enzymes in S. pombe do not support the presence of the corresponding reactions in the cytoplasm [27]. To give the cytoplasm access to the acetyl-CoA produced in the mitochondria, an exchange reaction for acetyl-CoA between the mitochondria and the cytoplasm was added to test whether a viable phenotype could be attained (Figure 3B, Case 2). Adding this exchange reaction resulted in a viable phenotype, suggesting that acetyl-CoA is transported from the mitochondria to the cytosol. Direct transport of acetyl-CoA between intracellular compartments is not possible because of the compound's bulkiness and amphiphilic nature [28]; therefore, the S. pombe genome was searched for the carnitine-acetyl-CoA shuttle reported in S. cerevisiae (CAT2, YAT1 and YAT2). However, neither the genome annotation nor a BLAST search yielded any candidates for the carnitine-acetyl-CoA shuttle in S. pombe. Given the lack of candidate transport proteins for acetyl-CoA across the mitochondrial membrane and the improbability of direct acetyl-CoA transport, the inconsistency for acetyl-CoA synthetase remained unresolved. The remaining 16 metabolic reactions could not be reconciled because of insufficient information. After RING analysis of the false negative predictions, the proportion of negative predictions matching the observed in vivo phenotypes improved from 41.4% to 92.5% (17 false negative and 198 true negative predictions) (Figure 2B).
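The diagnostic test with an added exchange reaction can be sketched with a qualitative reachability check standing in for the study's growth simulations. The network below is a hypothetical, heavily lumped toy (for example, `ALD_c` merges several cytosolic steps from pyruvate to acetate), and `ACCOA_mc` is the added mitochondrion-to-cytosol exchange, which serves as a modeling probe rather than a known transporter.

```python
# Qualitative reachability check (network expansion), not FBA.
# Hypothetical, lumped toy network; names are illustrative only.

def rxn(substrates, products):
    """Irreversible reaction as (substrates, products)."""
    return frozenset(substrates), frozenset(products)

def scope(reactions, seeds):
    """All metabolites producible from the seed metabolites."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def grows(reactions, media, biomass):
    """Viable if every biomass precursor is producible from the medium."""
    return set(biomass) <= scope(reactions, media)

network = {
    "T_glc": rxn({"glc_e"}, {"glc_c"}),
    "GLY_c": rxn({"glc_c"}, {"pyr_c"}),
    "ALD_c": rxn({"pyr_c"}, {"ac_c"}),        # lumped pyruvate -> acetate
    "ACS_c": rxn({"ac_c"}, {"accoa_c"}),      # acetyl-CoA synthetase
    "PYR_t": rxn({"pyr_c"}, {"pyr_m"}),
    "PDH_m": rxn({"pyr_m"}, {"accoa_m"}),     # mitochondrial acetyl-CoA source
    "FAS_c": rxn({"accoa_c"}, {"lipid_c"}),   # lumped cytosolic use of acetyl-CoA
}
media, biomass = {"glc_e"}, {"lipid_c"}

no_acs = {r: v for r, v in network.items() if r != "ACS_c"}
print(grows(no_acs, media, biomass))  # False: predicted lethal, although viable in vivo

with_exchange = {**no_acs, "ACCOA_mc": rxn({"accoa_m"}, {"accoa_c"})}
print(grows(with_exchange, media, biomass))  # True: viable once the exchange is added
```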
Comparative analysis of the yeast metabolic models
The predictive capability of the S. pombe genome-scale metabolic model was compared with that of another reconstructed yeast metabolic model, S. cerevisiae iMM904, which was employed in similar studies to predict in silico growth phenotypes and has served as a basis for assessing how well eukaryotic metabolic models predict mutant growth phenotypes [18]. First, the overall metabolism of the two yeasts was examined, ignoring the compartmental assignment of duplicate metabolic reactions in both yeasts except where the localization of these reactions differed distinctly. One distinct difference between S. pombe and S. cerevisiae is the lack of metabolic reactions localized in the peroxisome in S. pombe, owing to the scarcity of knowledge about the peroxisome in the fission yeast, which highlights the need for additional studies of peroxisomal metabolism in S. pombe. The central metabolic networks of the two yeasts showed little structural variability, except for the absence of the glyoxylate shunt in S. pombe.
The results of using SpoMBEL1693 to predict mutant growth phenotypes were compared with those obtained with the S. cerevisiae metabolic model iMM904. In the analysis of iMM904, the statistical classification measures specificity and sensitivity were used in the essentiality simulations to represent the proportions of negative and positive (lethal and viable) phenotypes correctly predicted as negative and positive, respectively (Table ). In other words, specificity is the proportion of negative phenotypes that the model correctly predicts to be negative, TN/(TN + FP). Sensitivity is defined analogously as the proportion of positive phenotypes that the model correctly predicts to be positive, TP/(TP + FN). A specificity of 53.6% and a sensitivity of 99.1% were achieved with iMM904. For comparison, the specificity and sensitivity of SpoMBEL1693 in predicting the phenotypes of single-gene knockout mutants were calculated: SpoMBEL1693 achieved a higher specificity of 76.4% and a comparable sensitivity of 97.1%. A false viable rate, FP/(FP + TN), i.e. the proportion of experimentally lethal phenotypes that are incorrectly predicted to be viable, was also calculated for iMM904 and compared with that of SpoMBEL1693. The false viable rate obtained with SpoMBEL1693 (23.5%) was lower than that obtained with iMM904 (46.4%) (Figure ). The specificities of other metabolic models for which essentiality analysis has been performed were also calculated. The specificity of SpoMBEL1693 was similar to that of four of the seven models (70-80%), and of the remaining three, only one had a higher specificity than SpoMBEL1693 (Figure ). The metabolic model of the extensively studied bacterium Escherichia coli, iAF1260, has a reported specificity of 73.4%, placing the S. pombe metabolic model at the same level of performance as this bacterial model in predicting mutant growth phenotypes.
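The three measures can be checked directly against the final SpoMBEL1693 counts given in the preceding sections (561 TP, 61 FP, 198 TN, 17 FN). In this sketch the false viable rate is computed as FP/(FP + TN), i.e. 1 − specificity, which is the form consistent with the reported values; with these counts the code reproduces the stated specificity and sensitivity.

```python
def specificity(tn: int, fp: int) -> float:
    """Share of lethal (negative) phenotypes predicted as lethal: TN / (TN + FP)."""
    return tn / (tn + fp)

def sensitivity(tp: int, fn: int) -> float:
    """Share of viable (positive) phenotypes predicted as viable: TP / (TP + FN)."""
    return tp / (tp + fn)

def false_viable_rate(fp: int, tn: int) -> float:
    """Share of lethal phenotypes wrongly predicted viable: FP / (FP + TN)."""
    return fp / (fp + tn)

# Final SpoMBEL1693 counts reported in the text
tp, fp, tn, fn = 561, 61, 198, 17
print(f"specificity {specificity(tn, fp):.1%}")  # specificity 76.4%
print(f"sensitivity {sensitivity(tp, fn):.1%}")  # sensitivity 97.1%
print(f"false viable rate {false_viable_rate(fp, tn):.3f}")
```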
With the S. pombe genome-scale metabolic model improved through RING, its metabolic capabilities were examined and compared with those of the S. cerevisiae genome-scale metabolic model. The maximum in silico molar yields of 4 metabolites that have been targets of past metabolic engineering efforts (acetate, ethanol, lactate and succinate) were determined for each yeast using their respective genome-scale metabolic models (SpoMBEL1693 and iMM904). The results show differences in the maximum in silico yields of acetate and lactate and no difference in the yields of ethanol and succinate (Table ). The simulations show that S. pombe has a higher lactate yield than S. cerevisiae (whose yield is approximately 15% lower than that of S. pombe), suggesting that S. pombe would be a better host for producing lactate from glucose. For acetate, S. pombe shows a slightly lower yield than S. cerevisiae, which is an advantage for S. pombe because acetate is a common metabolic by-product. Furthermore, the lower acetate yield may also reflect the absence of acetate during aerobic ethanol fermentation in S. pombe, whereas acetate has been observed in S. cerevisiae.
Maximum in silico molar yields of various metabolitesᵃ