Triterpenoids are a large class of isoprenoid natural products present in higher plants. Among them, oleanane-type triterpenes, which are produced from β-amyrin, are among the most common, along with ursane-type triterpenes produced from α-amyrin. β-Amyrin in particular serves as the olefin precursor to a wide range of downstream products. The action of oxidative enzymes and glycosyltransferases converts β-amyrin to various triterpene saponins. These saponins exhibit wide structural diversity and a broad range of biological activities (for example, as antimicrobial and insecticidal agents) and are therefore regarded as important and promising sources of medicinal compounds. The effect of plant saponins on low-density lipoprotein cholesterol absorption and arterial atherosclerosis has received much attention, leading to the development of several cholesterol-reducing dietary supplements. The formation of these complex carbon skeletons through a series of protonation, cyclization, rearrangement and deprotonation reactions of 2,3-oxidosqualene is well documented in the famous biogenetic isoprene rule. Although triterpene synthases have been expressed in microbial hosts such as S. cerevisiae, there has been little effort so far to engineer the metabolism of a microbial host for enhanced production of triterpenes. Imbalances in gene expression can lead to over- or under-production of enzymes in the pathway, accumulation of toxic metabolic intermediates, and metabolic burden on the host, all of which result in suboptimal product titers. A novel metabolic engineering strategy for designing a yeast platform for triterpenoid production is presented here, based on the whole-genome sequencing of S. cerevisiae CEN.PK recently completed by Otero et al.
Non-synonymous SNPs (nsSNPs), also called non-silent SNPs, are single-nucleotide variations in coding regions that give rise to amino acid substitutions, and they are often involved in the modulation of protein function. Understanding the effect of individual amino acid mutations on the function or stability of a protein or enzyme is useful for altering its properties in a wide variety of engineering studies. Since measuring the effects of mutations experimentally is laborious, a variety of computational methods and algorithms have been devised to predict these effects in silico. Bioinformatics approaches for predicting the effect of mutations on protein stability either utilize sequence alignment information from evolutionarily related sequences or protein families, or rely on physicochemical modeling of the mutation augmented by information obtained from statistical analyses of protein sequences and three-dimensional structures. Computational approaches for predicting the effect of amino acid mutations have proven to be surprisingly successful, with a wide range of studies supporting them. Different computational algorithms provide valuable insights into the relationships between beneficial mutations and phenotypic variation and speed up both fundamental and applied industrial research.
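The distinction between silent and non-synonymous SNPs described above can be illustrated with a minimal sketch. It assumes the standard nuclear genetic code; the toy coding sequence, the positions, and the function name `classify_snp` are hypothetical and are not taken from the CEN.PK or S288C genomes or from any tool used in this study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear genes).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, pos, alt):
    """Classify a single-nucleotide change in a coding sequence.

    cds -- in-frame coding sequence (DNA, upper case)
    pos -- 0-based position of the changed nucleotide
    alt -- alternative nucleotide at that position
    Returns (kind, reference amino acid, alternative amino acid).
    """
    codon_start = pos - pos % 3          # first base of the affected codon
    offset = pos % 3                     # position of the SNP inside the codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


# Hypothetical toy sequence: ATG GCT GAA encodes Met-Ala-Glu.
toy_cds = "ATGGCTGAA"
print(classify_snp(toy_cds, 5, "C"))  # GCT -> GCC, third codon position
print(classify_snp(toy_cds, 4, "T"))  # GCT -> GTT, second codon position
```

Only changes of the second kind alter the protein sequence, and it is these that stability and function predictors take as input.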
The Erg8, Erg9 and HFA1 genes are part of sterol and fatty acid biosynthesis in S. cerevisiae. S. cerevisiae CEN.PK contains an unusually high content of ergosterol and fatty acids compared to other S. cerevisiae strains. When Otero and colleagues compared the genome-wide sequence of CEN.PK with that of S288C, they identified a number of SNPs in these three genes. Our hypothesis in this study was that these SNPs are linked to the observed phenotype in CEN.PK through the formation of more efficient Erg8, Erg9 and HFA1 proteins, which influence the flux through the two pathways. This hypothesis was supported by an array of computational tools, which predicted a positive effect of the nsSNPs on the structure, stability and function of Erg8, Erg9 and HFA1.
Erg8 codes for a phosphomevalonate kinase, an essential cytosolic enzyme that catalyzes the reaction ATP + (R)-5-phosphomevalonate → ADP + (R)-5-diphosphomevalonate. An indirect over-expression of Erg8 through enhanced activity of UPC2 (a global transcription factor regulating the biosynthesis of sterols in S. cerevisiae) for terpene production has been studied by Ro et al. However, UPC2 as a single modification had only a modest effect on amorphadiene production. A negative effect of enhanced UPC2 activity on the production of epicedrol, a sesquiterpene originating from FPP, was observed by Jackson et al. In the present study, however, direct over-expression of Erg8 resulted in a higher ergosterol content than in the control strain during growth on glucose, which was then reflected in the ethanol phase by a 1.6-fold higher production of β-amyrin compared to the control strain.
Erg9 codes for a squalene synthase that joins two farnesyl pyrophosphate moieties in the reaction 2 farnesyl diphosphate → diphosphate + presqualene diphosphate. Several studies have targeted Erg9 in an attempt to increase precursor availability for terpene production. Shimada et al. found that disruption of the Erg9 gene as a single modification in Candida utilis had no significant effect on lycopene production. On the other hand, Paradise et al. increased the production of amorphadiene 5-fold by down-regulating Erg9; however, this improvement was obtained in a strain background with several other genetic modifications. The effects of Erg9 over-expression on β-amyrin production observed here are in line with these two studies. While Erg9 over-expression as a single metabolic engineering strategy had no positive effect on β-amyrin production, in combination with Erg8 over-expression it gave a 2.4-fold improvement compared to the control strain.
HFA1 is a mitochondrial acetyl-coenzyme A carboxylase that catalyzes the production of malonyl-CoA in fatty acid biosynthesis through the reaction ATP + acetyl-CoA + HCO3− → ADP + phosphate + malonyl-CoA. Interestingly, enhancing the expression level of HFA1 improved the production level of β-amyrin 1.2-fold. Kizer et al. engineered an E. coli strain to produce high levels of terpenoids; however, further optimization led to an imbalance in carbon flux and the accumulation of the pathway intermediate 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA), which proved to be toxic to E. coli. Their results indicated that HMG-CoA inhibits fatty acid biosynthesis in the microbial host, leading to generalized membrane stress. The cytotoxic effects of HMG-CoA accumulation could be counteracted by the addition of palmitic acid and oleic acid, and it is possible that the positive effect of HFA1 over-expression on ergosterol and β-amyrin levels observed in our study reflects a mechanism by which the cell copes with high HMG-CoA concentrations. Over-expression of HFA1 with concomitant over-expression of Erg8 led to the highest production of β-amyrin among all the single and double over-expression constructs, with a final concentration 3.5-fold higher than that of the control strain. Further improvement in the β-amyrin production level was achieved by the triple over-expression construct.
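The relative β-amyrin titers quoted in the preceding paragraphs can be gathered in one place for comparison. The sketch below uses only the fold values stated in the text and assumes each is expressed relative to the same control strain; Erg9 alone is omitted because no numerical value is given for it, and the helper names are illustrative.

```python
# Fold change in β-amyrin relative to the control strain, as quoted in the text.
# Assumption: the value given for HFA1 alone (1.2) is a fold change like the others.
reported_fold = {
    "Erg8": 1.6,
    "HFA1": 1.2,
    "Erg8 + Erg9": 2.4,
    "Erg8 + HFA1": 3.5,
}


def rank_constructs(fold_by_construct):
    """Return (construct, fold) pairs sorted from highest to lowest fold change."""
    return sorted(fold_by_construct.items(), key=lambda item: item[1], reverse=True)


def percent_gain(fold):
    """Convert a fold change into a percentage increase over the control."""
    return (fold - 1.0) * 100.0


for name, fold in rank_constructs(reported_fold):
    print(f"{name:<12} {fold:.1f}-fold (+{percent_gain(fold):.0f}% vs. control)")
```

Listed this way, both double constructs outperform either single construct, which is the pattern the discussion attributes to combining Erg8 with a second target.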
In summary, we have created a strain of S. cerevisiae capable of producing 500% more β-amyrin than the control strain by the simultaneous over-expression of Erg8, Erg9 and HFA1. To the best of our knowledge, the only previous metabolic engineering work on β-amyrin production has been performed by Kirby et al. By manipulating two key enzymes in the pathway, HMG-CoA reductase and lanosterol synthase, Kirby and colleagues improved β-amyrin production by 50%. This was a 10-fold lower improvement than the one achieved through our metabolic engineering strategy. However, in the study of Kirby et al. the final titer of β-amyrin was 6 mg/L.
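The 10-fold comparison above follows from the ratio of the two percentage gains. A minimal sketch of that arithmetic is given here; it assumes that "X% more" means the control titer plus X% of it, and the variable and function names are illustrative only.

```python
def multiplier_from_percent_more(percent_more):
    """Titer relative to control when production is 'percent_more' % higher.

    Assumption: 'X% more' is read as control + X% of control.
    """
    return 1.0 + percent_more / 100.0


gain_this_study = 500.0  # % more β-amyrin than control (this study, from the text)
gain_kirby = 50.0        # % improvement reported for Kirby et al. (from the text)

# Ratio of the percentage gains, the basis of the 10-fold statement.
ratio_of_gains = gain_this_study / gain_kirby

print(f"ratio of percentage gains: {ratio_of_gains:.0f}-fold")
print(f"titer vs. control, this study: {multiplier_from_percent_more(gain_this_study):.1f}x")
print(f"titer vs. control, Kirby et al.: {multiplier_from_percent_more(gain_kirby):.1f}x")
```

Because each improvement is measured against its own control strain, this compares relative gains only and says nothing about the absolute titers reached in the two studies.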
In addition to the above modifications, a careful inspection for SNPs of the metabolic pathways that involve acetyl-CoA could reveal more targets for redirecting fluxes towards the mevalonate pathway. The supply of acetyl-CoA has been shown to be an important parameter for the production of many secondary metabolites, and of terpenoids in particular, as demonstrated by Shiba et al.
However, it is also important to stress that, despite the very encouraging results from integrating computational protein analysis with metabolic engineering, there is a clear need for further experimental verification of our hypothesis. To increase our confidence that the SNPs in the three proteins are responsible for the differences in ergosterol level observed between the strains, we should introduce point mutations into the CEN.PK genes to reconstruct the respective S288C versions and examine whether the S288C phenotype is restored in CEN.PK, and vice versa. This could demonstrate the role of the SNPs at the flux level. Additionally, isolating the S288C and CEN.PK versions of the proteins and evaluating their in vitro activity against their natural substrates would strengthen the computational predictions regarding the beneficial effects of the SNPs on the CEN.PK proteins. It would also be of interest to over-express Erg8, Erg9 and HFA1 in S288C and compare the β-amyrin levels obtained in S288C and CEN.PK, which may point to other limitations in creating a yeast β-amyrin hyper-producer.
In this work we propose that high-throughput genome sequencing of S. cerevisiae may serve as a commonplace tool, complementary to transcriptomics and physiological characterization, for extracting direct genotype-to-phenotype information. The analysis presented here serves as a foundation for comparative SNP analysis in metabolic engineering, in which reference strains may in the future be compared to their metabolically engineered derivatives obtained by directed evolution, in order to determine which changes have made a strain a preferred microbial cell factory. Future work must also expand the SNP analysis presented by Otero et al. to include all 13,787 SNPs, recognizing that phenotypic observations may not necessarily be linked directly to metabolic SNPs, but rather to SNPs affecting larger regulatory mechanisms and networks, such as those governed by transcription factors.