The diversity of bacterial metabolism and the prospect of engineering applications have spurred a steep increase in both the number of sequencing projects and the volume of high-throughput experiments on bacteria. The need to interpret and integrate these datasets at the systems level has triggered the development of model-based computational methods [1]. Among them, the constraint-based modeling (CBM) approach has proved particularly efficient at integrating large-scale omics datasets related to metabolism, such as growth phenotypes, metabolite concentrations, or reaction fluxes [2]. In addition to providing a structured summary of metabolism-related knowledge for a given species, a constraint-based model allows the prediction and analysis of a variety of properties that result from the topological, stoichiometric, and physiological constraints known to apply to its global metabolic network at steady state. Applications range from studies of evolutionary or physiological properties to the design of metabolic engineering strategies for biotechnological or therapeutic purposes [3]. Nearly twenty such models have been built so far [2], typically through extensive curation work and, for some of them, through iterative refinement processes in which models were progressively improved by comparison with experimental datasets [4].
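For readers new to CBM, the steady-state constraints mentioned above are most often written as the flux balance analysis (FBA) problem below. This is the standard textbook formulation, shown as an illustration rather than as the exact setup used for the A. baylyi model: S is the m × n stoichiometric matrix (metabolites × reactions), v the vector of reaction fluxes, v_min and v_max flux bounds encoding reaction irreversibility and nutrient availability, and c an objective vector, typically selecting a biomass reaction.

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & \mathbf{c}^{\top}\mathbf{v} \\
\text{subject to} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{v}_{\min} \le \mathbf{v} \le \mathbf{v}_{\max}.
\end{aligned}
```

In this setting, growth on a medium is typically predicted when the optimal objective value exceeds a small threshold, and a gene knockout is simulated by setting to zero the bounds of the reactions that can no longer be catalyzed.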
Systematic gene essentiality screens have proved to be a valuable resource for investigating gene function; knockout mutant collections have recently been built for this purpose for a number of bacteria [5]. Rigorous analysis of their results remains challenging, however, as gene essentiality depends on the environmental condition, and the link between genes and essential functions may be blurred by genetic or metabolic redundancy [9]. Genome-scale metabolic models provide a valuable framework for interpreting essentiality screens, since they both recapitulate knowledge of metabolic networks and allow prediction of gene essentiality under well-defined conditions. They have also enabled meaningful cross-validation of reconstructed metabolic networks against sets of gene essentiality results, providing insights into potentially erroneous or incomplete metabolic knowledge and into possible improvements [4]. In this article, we systematically exploit inconsistencies between model predictions and experimental results to improve a metabolic model reconstruction.
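Genome-scale models usually predict gene essentiality with FBA: a knockout disables every reaction whose gene-protein-reaction (GPR) rule is no longer satisfied, and the gene is called essential if biomass can no longer be produced. The sketch below illustrates the two kinds of redundancy mentioned above, plus condition dependence, on a hypothetical four-reaction network. To stay self-contained it swaps the linear program for a simpler Boolean reachability test (network expansion), so it only checks whether target metabolites can be produced at all, not flux stoichiometry; all gene, reaction, and metabolite names are invented for illustration.

```python
"""Toy gene-essentiality screen on a HYPOTHETICAL metabolic network.

Assumption: a Boolean reachability test (network expansion) stands in for
flux balance analysis; every gene, reaction and metabolite is invented.
"""
from typing import Dict, FrozenSet, List, Set, Tuple

# Reaction -> (substrates, products).
REACTIONS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "R1": (frozenset({"glc"}), frozenset({"g6p"})),
    "R2": (frozenset({"g6p"}), frozenset({"pyr"})),
    "R3": (frozenset({"glc"}), frozenset({"pyr"})),  # bypass: metabolic redundancy
    "R4": (frozenset({"pyr", "nh4"}), frozenset({"ala"})),
}

# GPR rules in disjunctive normal form: each inner set is an enzyme
# complex (all genes required, AND); alternatives are isozymes (OR).
GPR: Dict[str, List[FrozenSet[str]]] = {
    "R1": [frozenset({"gA"})],
    "R2": [frozenset({"gB"}), frozenset({"gC"})],  # isozymes: genetic redundancy
    "R3": [frozenset({"gD", "gE"})],               # two-subunit complex
    "R4": [frozenset({"gF"})],
}

MEDIUM: FrozenSet[str] = frozenset({"glc", "nh4"})
TARGETS: FrozenSet[str] = frozenset({"ala"})  # stand-in for biomass precursors


def active_reactions(knockouts: FrozenSet[str]) -> Set[str]:
    """Reactions with at least one complex whose genes are all still present."""
    return {
        rxn
        for rxn, complexes in GPR.items()
        if any(not (genes & knockouts) for genes in complexes)
    }


def reachable(medium: FrozenSet[str], reactions: Set[str]) -> Set[str]:
    """Network expansion: fire enabled reactions until no new metabolite appears."""
    produced = set(medium)
    changed = True
    while changed:
        changed = False
        for rxn in reactions:
            substrates, products = REACTIONS[rxn]
            if substrates <= produced and not products <= produced:
                produced |= products
                changed = True
    return produced


def is_viable(knockouts: FrozenSet[str], medium: FrozenSet[str] = MEDIUM) -> bool:
    """True if all target metabolites remain producible after the knockouts."""
    return TARGETS <= reachable(medium, active_reactions(knockouts))


def essential_genes(medium: FrozenSet[str] = MEDIUM) -> Set[str]:
    """Genes whose single knockout prevents production of the targets."""
    all_genes = set().union(*(genes for complexes in GPR.values() for genes in complexes))
    return {g for g in all_genes if not is_viable(frozenset({g}), medium)}


if __name__ == "__main__":
    print(sorted(essential_genes()))                                  # ['gF']
    print(is_viable(frozenset({"gA", "gD"})))                         # False
    print(sorted(essential_genes(frozenset({"glc", "nh4", "ala"}))))  # []
```

Here gF is the only essential gene: losing gB is buffered by its isozyme gC (genetic redundancy), and losing gA or gD is buffered by the alternative route to pyr (metabolic redundancy), although knocking out gA and gD together is lethal. Supplying ala in the medium makes gF dispensable, mirroring the condition dependence of essentiality.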
Our focus is on Acinetobacter baylyi ADP1, a strictly aerobic γ-proteobacterium. Although phylogenetically close to the pathogenic Acinetobacter baumannii strains responsible for a growing number of nosocomial infections [13], A. baylyi ADP1 is an innocuous soil bacterium. Because of its metabolic versatility and high competence for natural genetic transformation, it is a model organism of choice for genetic and metabolic investigations [14]. As a soil bacterium, A. baylyi is able to degrade a wide range of molecules, including components of suberin, a protective polymer produced by plants in response to stress. Their harmlessness, nutritional versatility, and high capacity for adaptation have led bacteria of the Acinetobacter genus to be used in a variety of biotechnological applications, including the degradation of pollutants (e.g. biphenyl, phenol, benzoate, crude oil, nitriles) and the production of valuable biochemical products such as lipases, proteases, bioemulsifiers, cyanophycin, and various biopolymers [17]. Following the sequencing and expert annotation of its genome [19], a genome-wide single-gene knockout mutant library was generated (the ADP1 mutant collection [8]), enabling high-throughput assessment of mutant phenotypes under defined growth conditions.
We report below on the reconstruction and refinement of a genome-scale metabolic model of A. baylyi, guided by high-throughput experimental data. Following an initial reconstruction using metabolic information extracted from the genome annotation and the literature, the model was iteratively assessed and improved by comparing its predictions with (1) large-scale growth phenotyping results for the wild-type strain in 190 distinct environments, (2) genome-wide gene essentiality data from the mutant collection, and (3) conditional gene essentiality data derived from growth phenotyping of A. baylyi mutants on eight defined media. We examined each inconsistency between experimental results and model predictions and corrected the model whenever sufficient supporting evidence could be collected. Across the three refinement steps, 1262 of 1412 predictions were initially consistent with experimental results. Among the 150 inconsistent cases, 65 led to improvements, increasing the completeness and accuracy of the model. The final version of the model, called iAbaylyiv4, accurately predicted (1) 91% of the wild-type growth phenotypes, (2) 94% of the genome-wide gene essentialities, and (3) 94% of the phenotypic profiles of A. baylyi mutants on the tested media.
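Each refinement step above amounts to scoring binary predictions against observations and collecting the mismatches for manual review. In those terms, the figures reported correspond to an initial consistency of 1262/1412 ≈ 89.4%, leaving 150 inconsistent cases, 65 of which (about 43%) led to model corrections. The sketch below shows that bookkeeping, assuming binary growth/no-growth phenotypes (for a mutant, growth means the gene is non-essential); the function names, category labels, and case IDs are hypothetical, and the source does not describe its own tooling.

```python
"""Tally agreement between predicted and observed growth phenotypes.

Assumption: phenotypes are binary (True = growth; for a mutant, growth means
the gene is non-essential). Case IDs and values below are HYPOTHETICAL.
"""
from collections import Counter
from typing import Dict, List, Tuple


def compare_phenotypes(
    predicted: Dict[str, bool], observed: Dict[str, bool]
) -> Tuple[Counter, List[str]]:
    """Classify each case present in both dicts; return counts and inconsistent IDs."""
    counts: Counter = Counter()
    inconsistent: List[str] = []
    for case in sorted(predicted.keys() & observed.keys()):
        pred, obs = predicted[case], observed[case]
        if pred and obs:
            counts["true_growth"] += 1
        elif not pred and not obs:
            counts["true_no_growth"] += 1
        elif pred:
            counts["false_growth"] += 1     # model predicts growth the cell does not show
            inconsistent.append(case)
        else:
            counts["false_no_growth"] += 1  # model misses a capability, e.g. a network gap
            inconsistent.append(case)
    return counts, inconsistent


def accuracy(counts: Counter) -> float:
    """Fraction of consistent predictions (NaN if there are no cases)."""
    total = sum(counts.values())
    consistent = counts["true_growth"] + counts["true_no_growth"]
    return consistent / total if total else float("nan")


if __name__ == "__main__":
    predicted = {"m1": True, "m2": False, "m3": True, "m4": False, "m5": True}
    observed = {"m1": True, "m2": False, "m3": False, "m4": True, "m5": True}
    counts, to_review = compare_phenotypes(predicted, observed)
    print(to_review)         # ['m3', 'm4']
    print(accuracy(counts))  # 0.6
```

Separating the two kinds of mismatch is useful in practice because they usually point to different fixes: a false no-growth prediction suggests a missing reaction or transporter, whereas a false growth prediction suggests an incorrect reaction or a constraint the model does not capture.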
We developed a web interface that provides easy access to both the model and the experimental data. The interface allows browsing of the metabolic network, online computation of phenotype predictions, and comparison of predictions with experimental results [20].