Metabolic engineering of microbial strains has been of great interest for producing a wide variety of chemicals including biofuels, polymer precursors, and drugs. While conventional metabolic engineering approaches often focus on modifications to the desired and neighboring pathways, recent developments in computational analysis of metabolic models allow identification of genetic modifications needed to improve production of biochemicals

[1],

[2],

[3]. Computational approaches, such as BNICE

[4] and BioPathway Predictor

[5], have been developed which enumerate novel biochemical routes for chemical production. Metabolic pathway-based approaches, such as elementary modes

[6] and extreme pathways

[7], have been used to design strains with improved chemical production (e.g., ethanol and carotenoids

[8],

[9]). Subsequent analysis of elementary modes finds those with desired behaviors and finds the genetic strategies that would force cells to utilize these desired modes

[10]. While advances have improved the efficiency of these pathway-based approaches, enumerating these pathways for genome-scale metabolic networks is still a very challenging task

[11].

To avoid this computational challenge, approaches like flux balance analysis (FBA)

[12], minimization of metabolic adjustment (MOMA)

[13], and regulatory on/off minimization (ROOM)

[14] use optimization to predict knockout mutant phenotypes. For example, MOMA was used to find knockout mutations that would improve lycopene and valine production in

*Escherichia coli* [15],

[16]. In these studies, either an exhaustive search (all possible combinations are evaluated) or sequential search (where a strategy with k+1 deletions is identified by evaluating the best strategy with k deletions combined with all single deletions) was used to find mutants with the highest predicted production using MOMA. A more recent bi-level approach based on MOMA was developed (OptGene)

[17], which uses a genetic algorithm to find mutants with improved production, and this approach was used to improve sesquiterpene production in

*Saccharomyces cerevisiae* [18].

A number of bi-level strain design approaches use mixed-integer programming (MIP) to efficiently identify the mutations needed to achieve the highest production rates, including OptKnock, OptStrain, OptReg, OptForce, and OptORF. These bi-level MIP approaches consist of an ‘outer’ problem and an ‘inner’ problem, where the outer problem optimizes an engineering objective function and the inner problem optimizes a cellular objective function. Frequently, the inner problem is FBA which is a linear programming (LP) problem. In MIP-based approaches, the inner FBA problem is converted into optimality constraints by formulating a dual LP of FBA and enforcing strong duality

[19]. OptKnock

[19] identifies a set of reaction deletions which couple cellular growth and biochemical production, so an increase in mutant growth rate requires an increase in biochemical production as predicted by FBA. Due to this coupling, adaptive evolution of these strains, where higher growth rates are selected, leads to higher biochemical production

[20]. Another approach, OptStrain

[21], uses a multi-step process to first identify non-native reactions that would improve the host organism's maximum production capabilities. Reaction deletions can then be found which couple production and growth in the modified host metabolic network. In addition to reaction deletions, OptReg

[22] identifies reaction activations or inhibitions (increase or decrease in fluxes) to suggest up-regulation or down-regulation of metabolic genes for enhancing biochemical production. Recently, this MIP problem was reformulated and solved efficiently using a successive linear programming approach (EMILiO)

[23]. Another MIP-based bi-level approach, OptForce

[24], searches for all possible reaction modulations to meet a pre-specified overproduction target and identifies a minimal set of flux changes that need to be forced through genetic manipulations. Recently, we developed an approach, OptORF

[25], that identifies gene deletion and overexpression strategies (instead of reaction deletions) by directly taking into account gene to protein to reaction association and transcriptional regulation. All of these bi-level MIP approaches, except OptForce, predict mutant behaviors by finding solutions that maximize growth, and so resulting strains would often need to undergo adaptive evolution to improve growth and chemical production, but they may have increased stability

[20]. On the other hand, strains that have been designed using MOMA would not require adaptive evolution and for some compounds non-evolutionary strategies may be needed if product and biomass formation cannot be coupled. So the choice of computational approach will likely depend on the product of interest and experimental strategies used for strain development.

In this study, we report two new MIP-based bi-level strain design approaches and solution techniques to improve their runtime performance. First, we present SimOptStrain which simultaneously considers gene deletions in a host organism and reaction additions from a universal database such as KEGG

[26] or MetaCyc

[27]. Previously, the OptStrain framework used a multi-step procedure to first identify a minimal set of non-native reactions to add to the metabolic network to achieve the theoretical maximum production (TMP) of a biochemical target (Step 1 in ), and then identify deletion strategies in the expanded metabolic network using OptKnock (Step 2 in ). The current multi-step OptStrain procedure may miss higher production strategies by not evaluating additions and deletions simultaneously. First, additions of non-native reactions that yield zero (Solution s1 in ) or suboptimal (Solution s2 in ) increases in the TMP of a host organism are not considered, even though such reactions may increase the biochemical production when coupled to cellular growth in a mutant strain. Second, addition of the minimal number of non-native reactions may not lead to the highest chemical production that can be found when coupled to cellular growth rate (Solution s3 in ). To overcome these limitations, we developed a new bi-level MIP approach which simultaneously identifies gene deletions in a host organism and reaction additions from a curated universal database, and demonstrate the utility of the approach for production of succinate and glycerol.

Second, we present a new quadratic bi-level MIP approach, BiMOMA, to identify gene deletions for improving biochemical production when MOMA is used as an inner problem (see ). MOMA has been used in metabolic engineering for predicting metabolic flux distributions in un-evolved deletion mutants, and resulting strains do not need to undergo adaptive evolution. Previous studies

[15],

[16],

[18] employed a sequential search or heuristic algorithms, such as genetic algorithms (used by OptGene), to identify gene knockout mutants with improved biochemical production. However, these approaches can be computationally expensive and may miss higher production strategies since the number of possible combinations is extremely large and the optimality of such methods is generally not guaranteed. Here, we develop a direct bi-level approach using mixed-integer quadratically constrained programming (MIQCP) and show we can efficiently identify knockout strategies for improved production of glutamate and pyruvate.

These bi-level computational approaches lead to MIP formulations that become intractable when the number of allowed modifications is large. Pre-processing and heuristic algorithms have been used to improve tractability

[17],

[28]; however, these methods sometimes converge to local optima and can miss better solutions. Here, we show how novel MIP techniques based on duality can significantly improve the performance of strain design approaches. We first illustrate the improvement in performance by applying the developed techniques to OptORF and comparing the results to those obtained using heuristic algorithms. We then apply the MIP techniques to the two new bi-level strain design approaches. In this work, we use ‘approaches’ to describe the strain design problems and ‘techniques’ to refer to the MIP solution methods used to solve the bi-level problems.