The genome-scale model of the yeast secretory machinery was built using a bottom-up approach. We then used the model as a scaffold to compare the secretion systems of yeast and human. Using protein abundance data for yeast, we further used the model to estimate the metabolic demands associated with the processing of clients by the secretory machinery. Finally, the specific activities of each molecular component of the machinery were calculated.
Defining Components and Subsystems of the Secretory Machinery
To integrate all available mechanistic knowledge into a scaffold for the study of the protein secretory machinery, we used a bottom-up systems biology approach, which is based on collecting, assembling and integrating all relevant information and data through a comprehensive literature survey combined with searches in different databases.
Workflow for the model reconstruction.
The resulting reconstructed network includes 162 proteins and one RNA component (SCR1). These 163 components represent the core components of the protein secretory machinery that are directly involved in the translocation, folding, post-translational modification and transport of proteins, as well as the biosynthesis pathways leading to the precursors required for glycosylation and glycosylphosphatidylinositol (GPI) attachment (Table S1).
Schematic representation of Saccharomyces cerevisiae secretory machinery model.
The properties of the yeast secretory machinery model.
To reduce the complexity, we divided the machinery into 16 subsystems (S1–S16) based on the function that each subsystem performs. To define the subsystems, we relied on knowledge obtained from classical molecular biology experiments on specific proteins such as carboxypeptidase Y (CPY), mating pheromone (alpha-factor), the plasma membrane H+-ATPase (Pma1p) and alkaline phosphatase Pho8p (ALP). The reconstruction procedure provided us with a systematic repository of mechanistic information and also highlighted knowledge gaps. The 16 subsystems cover all the secretory machinery processes such as translocation, folding, sulfation, glycosylation and sorting, and most of the subsystems are located in the ER (S1–S9).
The model contains 137 different reactions, of which 56 are template reactions, 26 are complex formation reactions, 30 are biosynthesis reactions and 25 are exchange reactions (Table S2). The template reactions are protein-specific and formulate all the post-translational modification (PTM) and sorting reactions. The complex formation reactions describe the formation of the protein complexes that are involved in the template reactions. The biosynthetic reactions belong to the dolichol and GPI biosynthesis pathways, which provide the precursors for glycosylation and for the formation of GPI-anchored proteins (Text S1; Table S2). A virtual system boundary was defined by formulating exchange reactions that separate the secretory machinery from other functional modules of the cell. These exchange reactions account for the supply of co-factors and precursors needed for the modification, sorting and biosynthetic reactions (Text S1).
In the model reconstruction, we avoided lumping reactions in order to ensure proper gene-protein-reaction links for the individual steps. This also made it possible to evaluate the role of individual steps, e.g. signal peptide recognition, which has been shown to be the rate-controlling step in translocation. The reconstructed network condenses our current knowledge of the protein secretory system, and it can be expanded and improved when new components or steps are identified.
The PSIM (Protein Specific Information Matrix): A Knowledge Package for Modeling the Protein Secretory Machinery
Each secretory protein may contain in its sequence information for seven possible features: (1) the presence or absence of a signal peptide that indicates whether the protein will be imported into the ER, (2) the number of N-linked and (3) O-linked glycosylation sites, (4) the number of disulfide bonds to be formed, (5) the presence or absence of anchoring with GPI (glycosylphosphatidylinositol), (6) the number of transmembrane spanning domains, and (7) the transport signal motif for the final localization. Once these features have been established, it is possible to determine which subsystems in the secretory machinery are required to process each specific protein along the way to its functional destination. The details and the assumptions made at this stage are given in Text S1.
The required information for some of the selected features is available in databases such as O-GlycBase, which contains the O-linked glycosylation sites, or dbPTM, which integrates information about different post-translational modifications. The information in these databases is not organism-specific and covers only proteins that have been studied experimentally. UniProt, a high-quality source of protein information, contains experimentally or computationally derived information for all the mentioned features, and it was used as our main information source. We extracted the information for the seven selected features for the whole yeast proteome (Table S7). This information was condensed into the Protein Specific Information Matrix (PSIM). Each row in the yeast PSIM (5882×7) represents a specific protein and each column represents one of the seven selected features. Therefore, each matrix cell contains information on a specific feature of a specific protein. The possible combinations of the seven features define 186 theoretical secretory classes, with each secretory class representing a unique combination of the seven features (Figure S1; see Materials and Methods and Text S1). The PSIM is organism-specific and can be extended with more features for other PTMs and protein maturation steps specific to the secretory machinery of other organisms.
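As an illustration of how a PSIM row and a secretory class relate, the following minimal Python sketch stores the seven features per protein and groups proteins that share the same feature combination. The field names, the protein identifiers, the feature values and the choice to collapse site counts to presence or absence are all assumptions made for this example; the actual class definition is given in Text S1.

```python
from collections import defaultdict
from typing import NamedTuple


class PsimRow(NamedTuple):
    """One PSIM row: the seven sequence-derived features of a protein."""
    signal_peptide: bool   # (1) N-terminal signal peptide present
    n_glyc_sites: int      # (2) number of N-linked glycosylation sites
    o_glyc_sites: int      # (3) number of O-linked glycosylation sites
    disulfide_bonds: int   # (4) number of disulfide bonds
    gpi_anchor: bool       # (5) GPI anchoring
    tm_domains: int        # (6) number of transmembrane domains
    localization: str      # (7) final localization


def class_key(row: PsimRow) -> tuple:
    """Feature combination that identifies a class.

    Assumption of this sketch: counts are reduced to presence/absence.
    """
    return (row.signal_peptide, row.n_glyc_sites > 0, row.o_glyc_sites > 0,
            row.disulfide_bonds > 0, row.gpi_anchor, row.tm_domains > 0,
            row.localization)


def group_into_classes(psim: dict) -> dict:
    """Map each observed feature combination to the proteins that share it."""
    classes = defaultdict(list)
    for protein_id, row in psim.items():
        classes[class_key(row)].append(protein_id)
    return dict(classes)


# Hypothetical proteins with made-up feature values, for illustration only.
toy_psim = {
    "protA": PsimRow(True, 2, 0, 1, False, 0, "vacuole"),
    "protB": PsimRow(True, 5, 0, 3, False, 0, "vacuole"),
    "protC": PsimRow(False, 0, 0, 0, False, 7, "plasma membrane"),
}
toy_classes = group_into_classes(toy_psim)
print(len(toy_classes))  # protA and protB share a class, protC is alone: 2
```

Keeping the raw counts in the row while deriving the class key separately means the same table can serve both for class assignment and for stoichiometry that depends on the number of sites.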
Simulation of Yeast Secretory Machinery using the y-PSIM and Template Reaction List
Using the information condensed in the template reaction list and the secretory classes, we developed an algorithm (in the Python programming language) that generates a protein-specific reaction list for each protein (Text S1). These reaction sets represent the post-translational modifications and sorting processes that each protein undergoes as it passes through the machinery in order to reach its final functional state and destination.
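The idea of expanding template reactions into a protein-specific reaction list can be sketched as follows. This is not the algorithm described in Text S1, only a minimal illustration under stated assumptions: the template names, the feature keys and the rule that a template is instantiated once per site are hypothetical.

```python
# Each hypothetical template is (name, function returning how many times the
# template is instantiated for a protein with the given features).
TEMPLATES = [
    ("translocation", lambda f: 1 if f["signal_peptide"] else 0),
    ("n_glycosylation", lambda f: f["n_glyc_sites"]),
    ("disulfide_bond_formation", lambda f: f["disulfide_bonds"]),
    ("gpi_attachment", lambda f: 1 if f["gpi_anchor"] else 0),
    ("er_to_golgi_transport", lambda f: 1),
]


def build_reaction_list(protein_id, features, templates=TEMPLATES):
    """Instantiate every applicable template for one protein."""
    reactions = []
    for name, count in templates:
        for i in range(count(features)):
            # Label each instance so individual steps stay distinguishable.
            reactions.append(f"{protein_id}:{name}:{i + 1}")
    return reactions


# Made-up feature values for a hypothetical protein.
example_features = {
    "signal_peptide": True,
    "n_glyc_sites": 2,
    "disulfide_bonds": 1,
    "gpi_anchor": False,
}
for reaction in build_reaction_list("protA", example_features):
    print(reaction)
```

Keeping each step as a separate entry, rather than one lumped reaction per protein, mirrors the choice described above of preserving gene-protein-reaction links for individual steps.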
After assigning each protein to one of the predicted secretory classes, we found that the ER-Golgi secretory machinery can potentially process 1190 proteins. The PSIM of these proteins was used as input to the algorithm, and the protein-specific reaction list for each protein was generated (for the complete genome-scale protein reaction list, see Table S9).
Secretory classes can be divided into two main categories: classes with an N-terminal signal peptide (SP+), and classes with a signal sequence in a transmembrane domain (SP-), which mostly contain plasma membrane and endomembrane proteins. This classification is important because the proteins in each category differ in their translocation mechanism, especially in the way they are targeted to the translocon complex (see Text S1). Of the 1190 proteins, 683 are in the first category (SP+), 552 of them with known localization, and they fall into 34 of the 104 secretory classes. The remaining 514 are in the second category (SP-) and fall into only 9 of the 80 theoretical classes defined for this category.
Comparative properties of the Yeast and Human secretory systems.
Notably, the SP+ secretory classes are more diverse but less populated than the SP- classes. Many of the 162 core components of the yeast secretory machinery are themselves processed by the secretory machinery: 68 of the core components belong to 13 different SP+ secretory classes and 65 belong to 5 SP- secretory classes. The remaining 30 components are cytoplasmic proteins mainly involved in vesicular transport processes (see Table S3 for more details).
Although the conventional secretory machinery is quite complex, recent investigations of eukaryotic secretion systems have shown that there are alternative secretory pathways (called unconventional pathways), which add complexity to the secretion process. For example, some yeast cell wall proteins have been confirmed to lack signal peptides (Nombela et al, 2006; Pardo et al, 1999), and in mammals fibroblast growth factor 2 (FGF2), which does not contain a signal peptide, uses an alternative pathway to reach the plasma membrane. It remains to be resolved how many of these 1190 proteins are main clients of the conventional secretory machinery, which is the focus of this study. We therefore assumed for now that they use only the conventional secretory machinery to be processed and transported to their functional destination.
Human PSIM (h-PSIM) and Human Secretory Classes
One potential application of the model is its use as a scaffold for improving our understanding of the protein secretory machinery in other eukaryotic organisms such as humans. To illustrate this, we used the same approach to generate a PSIM for the human proteome (called h-PSIM, Table S8), which has dimensions 44540×8. The human secretory machinery is far more complex, and it is also tissue specific. However, it has been shown that the secretory machinery components are well conserved from yeast to human, which justifies using the yeast model as a scaffold. As expected, human cells use more SP+ secretory classes (46 out of 186) than yeast (34 out of 186). In human, the SP+ secretory classes also contain more proteins than in yeast. The detailed relative distribution of proteins in the different classes in human and yeast is shown in the corresponding figure.
In yeast and human, the fractions of proteins in the SP+ and SP- secretory classes are similar. For example, in both human and yeast most plasma membrane transmembrane proteins lack a signal peptide, whereas almost all extracellular proteins have one. However, this similarity was not observed for the Golgi apparatus and the vacuole (or lysosome).
Comparison of secretory proteins distribution based on localization and secretory features information between yeast and human.
Interestingly, the fractions of the SP+ and SP- classes that use the different PTM features are also similar in yeast and human.
The SP- secretory classes contain transmembrane proteins that do not have signal peptides; these proteins use signal sequences in their transmembrane domains to enter the ER. On the other hand, many of the plasma membrane and endomembrane transmembrane proteins belong to SP+ classes.
Functional Properties of the Secretory System in Yeast and Human Cells
The extension of the approach to explore the protein secretory machinery in human cells provides a systematic platform to investigate the distribution of secretory proteins in the different classes for both organisms.
Having defined the yeast and human SP+ and SP- secretory classes, we performed a GO (Gene Ontology) enrichment analysis (see Materials and Methods) in order to evaluate the biological functions of the proteins in the different secretory classes. Comparing GO enrichment for yeast proteins secreted by the SP+ and SP- secretory classes, we found that GO terms related to cell wall organization and biogenesis show the most statistically significant (lowest p-value) enrichment in the SP+ secretory classes (Table S10). Yeast cells are surrounded by a rigid and thick (~200 nm) but also dynamic wall structure made of glycans and mannoproteins, which plays a key role in keeping the cell shape and integrity, maintaining osmotic stability, and enabling flocculation and adherence. The yeast cell wall comprises 15–30% of the cell dry weight and its main components are different glycans and secreted proteins. In addition, it is claimed that 20% of the yeast genome deals with cell wall biogenesis. All this evidence is consistent with the enriched GO terms in the conventional secretory machinery being related to cell wall biogenesis.
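For readers unfamiliar with how an enrichment p-value of this kind can be obtained, the sketch below computes a one-sided hypergeometric tail probability, a test commonly used for GO term over-representation. Whether this exact test was used here is an assumption; the procedure actually applied is described in Materials and Methods. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated proteins.

    population: number of proteins in the background set
    annotated:  background proteins carrying the GO term
    sample:     proteins in the secretory class being tested
    hits:       proteins in the class carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers for illustration only: 10 proteins, 3 annotated, 3 sampled,
# all 3 sampled proteins annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=3, sample=3, hits=3)
print(p_value)  # 1 / 120
```

In practice the p-values for many GO terms would also need a multiple-testing correction, which is omitted from this sketch.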
GO enrichment analysis of SP+ and SP- secretory classes in yeast and human.
GO enrichment analysis for the SP- secretory classes shows that these proteins are mainly involved in transport and localization processes, such as transmembrane transport (ion transport) and vesicle-mediated transport dealing with protein localization (COPI, COPII, SNARE complex etc.) (Table S10).
We also performed GO enrichment analysis for the human SP+ and SP- secretory classes. In contrast to yeast, where all the proteins in this group are annotated, the human SP+ secretory classes contain 2,557 non-annotated proteins with a signal peptide (about 50% of all potential secretory proteins). Among the annotated proteins, the GO terms showing statistically significant enrichment include those related to receptor binding, cytokine activity and hormone activity (see Table S14).
For the human SP- secretory classes, 3,003 proteins are not annotated (~60%), whereas GO terms related to signalling are the most enriched among the annotated proteins (see Table S13).
Energy and Metabolic Demand Estimation of the Secretory Machinery
Another important potential application of the reconstructed genome-scale network of the secretory machinery is to estimate the usage of co-factors (ATP and GTP) and of metabolic precursors for glycosylation or sulfation, such as GDP-Man or FADH2. This allows the secretory machinery to be linked with the rest of the cellular metabolic processes. Using protein abundance data for yeast, we calculated the metabolic precursor costs (per cell) for each of the proteins passing through the machinery (Table S4). GTP usage accounts for the energy needed for translocation and transport through the machinery, and therefore proteins (or their corresponding secretory classes) with high GTP usage generally have more vesicular transport steps before they reach their final localization. ATP is used for degradation and folding, and FADH2 is used in connection with disulfide bond formation (see Materials and Methods). The estimation of co-factor usage is based on the potential 11,591 protein-specific reactions needed to process the 552 SP+ proteins. However, only 259 of these proteins have available abundance data. The remaining 291 proteins are likely to be either absent or of very low abundance, and we therefore arbitrarily set their abundance to one protein per cell. In this way, these proteins are retained in the model for annotation purposes while contributing only marginally to the estimated metabolic costs. Based on this, we estimated the metabolite consumption per cell per hour for each subsystem. We considered ubiquitin (UB) as a metabolite, as it is used as a precursor for labeling misfolded proteins targeted for degradation. The dolichol pathway uses precursors from lipid metabolism (dolichol synthesized from farnesyl-PP), whereas central carbon metabolism and nucleotide metabolism provide three different nucleotide-activated sugar donors for the dolichol pathway: UDP-N-acetylglucosamine (UDP-GlcNAc) (provided by the Leloir pathway), GDP-mannose (GDP-Man) and UDP-glucose (UDP-Glc). The supply of all these metabolites has been reported to be flux controlling. To estimate the demand for dolichol pathway metabolic precursors, we calculated the amount of core glycan needed for the glycosylation of all the predicted glycosylation sites in proteins that pass through the secretory machinery.
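The bookkeeping behind such an estimate can be sketched as abundance multiplied by per-molecule stoichiometry, summed over all client proteins. The Python example below is a minimal sketch under stated assumptions: the function name, the protein identifiers, the metabolite labels and all numbers are hypothetical, and the fallback of one copy per cell follows the convention described above for proteins without abundance data.

```python
from collections import Counter


def metabolic_demand(abundance, stoichiometry, default_abundance=1.0):
    """Total metabolite demand per cell.

    abundance:     protein -> copies per cell (may be incomplete)
    stoichiometry: protein -> {metabolite: units consumed per protein copy},
                   e.g. derived from the protein-specific reaction list
    """
    total = Counter()
    for protein, costs in stoichiometry.items():
        # Proteins without abundance data are set to one copy per cell.
        copies = abundance.get(protein, default_abundance)
        for metabolite, per_copy in costs.items():
            total[metabolite] += copies * per_copy
    return dict(total)


# Made-up values for two hypothetical proteins.
toy_abundance = {"protA": 1000}
toy_stoichiometry = {
    "protA": {"GTP": 4, "core_glycan": 2},  # two N-glycosylation sites
    "protB": {"GTP": 3},                     # no abundance data available
}
toy_demand = metabolic_demand(toy_abundance, toy_stoichiometry)
print(toy_demand)
```

Converting these per-cell totals into rates per cell per hour requires an additional assumption about how fast the protein pool is produced, which this sketch does not make.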
Estimation of the secretory machinery metabolic demands.
In addition, we calculated the metabolic costs of the dolichol and GPI biosynthesis pathways separately to give a better resolution of these two biosynthetic pathways, which connect the secretory machinery to the metabolic network. Dol-P-Man (dolichyl phosphate mannose) and UDP-GlcNAc (uridine diphosphate N-acetylglucosamine) are the two metabolites that connect these pathways (Table S4). While calculating the metabolic demands for each subsystem, we also explored the most abundant proteins passing through the secretory pathway (see Table S5); interestingly, the two most abundant proteins in the yeast cell are secretory proteins. Cwp2p (UniProt: P43497) is the most abundant protein in the cell; it is a very short GPI-anchored mannoprotein (90 aa) and the major constituent of the cell wall (clustered in secretory class 102). The second most abundant protein is Pma1p (UniProt: P37367), a plasma membrane P2-type ATPase that pumps protons out of the cell (905 aa, clustered in secretory class 178) (see Figure S3 for other proteins). Notably, Pma1p does not have a signal peptide and is potentially secreted via the alternative secretory pathway. Most of the other highly abundant proteins in the yeast cell are involved in metabolism, chromatin assembly and translation. Among the machinery subsystems, the ERAD and COPI subsystems both have a high average abundance of their components compared with the other subsystems (Figure S4).
We are aware that our model is a simplification. It is therefore important to note that our estimates of precursor requirements are based on current knowledge of the yeast secretory machinery, and accordingly they are uncertain for subsystems such as folding or ERAD, for which we do not have protein-specific stoichiometry. There may also be uncertainties for glycosylation, as not all glycosylation sites are necessarily used all the time.
We also estimated the metabolic costs of processing the whole set of secretory machinery clients present in particular cellular compartments. The results show that secretory proteins connected to the cell wall through GPI anchors are the most costly proteins in terms of folding, PTMs and transport steps. This is also in accordance with the GO enrichment analysis. The ER and vacuole proteins are the second most costly group. Interestingly, the results show that single-pass membrane proteins have higher processing costs than multi-pass proteins, and that proteins targeted to the ER and vacuole membranes have higher metabolic demands than proteins targeted to the cell membrane. This ratio can change if the cost for the proteins of the SP- classes is included in the calculation. We also calculated the synthesis cost (ATP and NADPH) of the secretory proteins, which showed that the ER proteins (especially those located in the lumen) have the highest synthesis cost and that GPI-anchored proteins localized in the cell wall have the second highest. As for the metabolic costs, the single-pass transmembrane proteins have higher synthesis costs than the multi-pass transmembrane proteins. Both the ER and the cell wall have proteins with high abundance and many PTM features.
Evaluation of Engineering Strategies for Improving the Secretory Machinery
Metabolic engineering of the secretory pathway is often based on altering the expression of some of the machinery components with the objective of increasing secretion of a particular protein (often a heterologous one). Two key aspects to consider in this process are choosing the proper target(s) and optimizing the expression level. Although many improvements have been made in this area, a systems biology approach may give a holistic picture of the secretion system and thereby suggest new targets for metabolic engineering. To evaluate the activity of the individual components of the secretory pathway, we used steady-state protein abundance data together with our protein-specific reaction list. A specific activity (SA) measure for each component was defined as the number of its catalytic cycles per cell per hour in steady state (see Materials and Methods). The SA of each component is a function of its abundance and of the number of client proteins that it processes per cell per hour in steady state (Figure S2). A histogram of log(SA) for the different machinery components shows that log(SA) is approximately normally distributed (µ ≈ 2.2 and σ ≈ 0.7). Accordingly, there are few proteins with high SA, and evaluation of the proteins with the highest specific activities shows that they are not limited to a specific subsystem.
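One way to picture the SA calculation is as the flux of client molecules through a component divided by the abundance of that component. The sketch below is only an illustration and not the definition from Materials and Methods: it assumes that, in steady-state exponential growth, each client pool is renewed at the growth rate, and that the logarithm is base 10. All function names and numbers are hypothetical.

```python
import math
from statistics import mean, pstdev


def client_flux(client_abundance, uses_per_client, growth_rate):
    """Catalytic events per cell per hour demanded by one client protein.

    Assumption: the client pool (copies per cell) is renewed at the
    growth rate (per hour), and each copy needs `uses_per_client` cycles.
    """
    return client_abundance * uses_per_client * growth_rate


def specific_activity(component_abundance, total_flux):
    """Catalytic cycles per component molecule per hour."""
    return total_flux / component_abundance


def log_sa_summary(sa_values):
    """Mean and standard deviation of log10(SA) over all components."""
    logs = [math.log10(v) for v in sa_values if v > 0]
    return mean(logs), pstdev(logs)


# Made-up example: one component of 100 copies serving a single client.
flux = client_flux(client_abundance=1000, uses_per_client=2, growth_rate=0.4)
sa = specific_activity(component_abundance=100, total_flux=flux)
mu, sigma = log_sa_summary([10.0, 1000.0])
print(sa, mu, sigma)
```

Under this reading, a low-abundance component that serves many abundant clients gets a high SA, which matches the interpretation of high-SA components as potential bottlenecks.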
The specific activity (SA) network of the components of the yeast secretory machinery at exponential growth.
The components with high specific activity (with log (SA)>3).
The specific activity network is drawn as a graph representing the connectivity between the subsystems and components of the yeast secretory pathway, with the SA of each component mapped to its node color. Some of the components are involved in several subsystems (such as Kar2p), and they are expected to have a higher impact on the function of the machinery if their expression level is modified. On the other hand, overexpression of proteins with high SA (which process a high number of molecules per unit of time) is also expected to have a higher impact than overexpression of proteins with lower SA.
For example, in the protein folding subsystem, Lhs1p is the least abundant component (~139 molecules) and has the highest SA, whereas Kar2p has a high abundance (~336941 molecules) and a low SA. Kar2p is the main chaperone in the ER. Lhs1p and Sil1p (2420 molecules, with a high SA) are two NEFs (nucleotide exchange factors) which have ATPase activity and regulate the Kar2p ATP turnover. Each time Kar2p performs a catalytic cycle, it needs the presence of Lhs1p and Sil1p to start a new cycle. However, these NEFs have a high SA and much lower abundances than Kar2p, and it is therefore likely that their activity is a bottleneck for the activity of Kar2p. As the ER is crowded, over-expressing these low-abundance, high-SA proteins could therefore be more effective than over-expressing KAR2. There is some evidence that modulating these chaperones improves heterologous protein production. On the other hand, it has been shown that over-expression of KAR2 has no positive effect on the secretion level, whereas decreasing its expression has a negative effect.
In summary, for the production and secretion of a particular protein in yeast as a cell factory, the reconstructed model provides three types of information: the secretory class to which the target protein belongs, which yields a list of protein-specific mechanistic reactions together with their catalyzing components; an estimate of the metabolic demands associated with the maturation and sorting steps; and SA information about the natural capacity of the machinery components involved in the corresponding processes. This information supports the design of strategies to engineer the secretory machinery with the objective of a high production rate.
In this work, we applied, for the first time, a genome-scale modeling approach to study the complexity of the eukaryal protein secretion pathway. We used a bottom-up network reconstruction method. The model contains detailed mechanistic knowledge of the secretory machinery and can be used to integrate -omics data in order to achieve a better understanding of the eukaryal secretion system. Identifying secretory classes allowed the secretory proteins to be grouped based on their PTM and sorting features. Furthermore, generating protein-specific reaction lists and combining these with yeast protein abundances enabled estimation of the metabolic demands of the secretory machinery in a protein-specific manner. Additionally, the specific activities (SA) of the machinery components were estimated, which provides information about the natural catalytic capacity of the machinery components.
In a nutshell, the reconstruction approach and the PSIM provide a framework for (i) capturing the genome-scale mechanistic details of the secretory machinery; (ii) integrating and analysing high-throughput data to evaluate the function of different parts of the machinery, thereby increasing our knowledge of systemic properties; (iii) engineering industrial and therapeutic protein secretion strategies within a systems biology framework; and (iv) connecting the model to other cellular processes such as metabolism.