Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2800989

Formats

Article sections

Authors

Related links

J Theor Biol. Author manuscript; available in PMC 2010 December 21.

Published in final edited form as:

Published online 2009 September 16. doi: 10.1016/j.jtbi.2009.09.003

PMCID: PMC2800989

NIHMSID: NIHMS146463

The publisher's final edited version of this article is available at J Theor Biol

See other articles in PMC that cite the published article.

Post-translational modification of proteins plays a central role in cellular regulation but its study has been hampered by the exponential increase in substrate modification forms (“modforms”) with increasing numbers of sites. We consider here biochemical networks arising from post-translational modification under mass-action kinetics, allowing for multiple substrates, having different types of modification (phosphorylation, methylation, acetylation, etc) on multiple sites, acted upon by multiple forward and reverse enzymes (in total number *L*), using general enzymatic mechanisms. These assumptions are substantially more general than in previous studies. We show that the steady-state modform concentrations constitute an algebraic variety that can be parameterised by rational functions of the *L* free enzyme concentrations, with coefficients which are rational functions of the rate constants. The parameterisation allows steady states to be calculated by solving *L* algebraic equations, a dramatic reduction compared to simulating an exponentially large number of differential equations. This complexity collapse enables analysis in contexts that were previously intractable and leads to biological predictions that we review. Our results lay a foundation for the systems biology of post-translational modification and suggest deeper connections between biochemical networks and algebraic geometry.

Post-translational modification (PTM) of proteins is a central regulatory mechanism in eukaryotic cells [44]. Although phosphorylation was the first modification to be discovered [12, 24], and remains the best studied, proteins are subject to other types of modification by covalent attachment of molecules to the side chains of amino acid residues. The modifiers include other small molecules, as in methylation, acetylation and sulfation, complex sugars, as in glycosylation, and small protein moieties, as in ubiquitin-like modification [44]. It has become increasingly clear that these modifications work together to orchestrate cellular function [20].

PTMs are dynamically maintained. Forward enzymes, which transfer modifiers from donor molecules to specific residues, are usually competing against reverse enzymes, which hydrolyse modified residues, detaching the modifier and returning it to the pool from which donor molecules are synthesised. In the case of phosphorylation, these enzymes are protein kinases and phosphoprotein phosphatases, respectively. These so-called “futile cycles” of modification and demodification can use energy to keep the concentrations of modified substrates far from equilibrium, testifying to the importance of such processes as regulatory mechanisms.

A given protein may be modified on multiple sites. For instance, the transcription factor and tumour suppressor p53 has 18 serine/threonine sites that can be phosphorylated and 10 lysine sites that can accommodate acetylation, methylation and attachment of ubiquitin, SUMO and NEDD [25]. O-linked modifications like phosphorylation, on residues like serine, threonine or tyrosine, are digital—at most one phosphate group is attached to each residue—but N-linked modifications like methylation, on residues like lysine or arginine, can be more complex [44]. Ubiquitin, in particular, can form linear and branched poly-ubiquitin chains connected by iso-peptide linkages through lysine residues. In general, an individual molecule with *n* sites of modification may be in one of several global states of modification (“modforms”), whose numbers increase exponentially with *n*. Cartoon diagrams often pick one of these modforms, usually the maximally modified one, to depict the state of a protein in the cell. However, there is always a population of such molecules present and individual molecules in the population may be in different states. The state of the population is best described as a frequency distribution over these single molecule states. We call this the modform distribution. Not all modforms will necessarily be present at any one time but this begs the question of which modforms are present and of how the relevant forward and reverse enzymes cooperate to shape the modform distribution. The methods of this paper were developed in part to address such questions.

The combinatorial explosion in the number of potential modforms is a challenge for both experiment and theory. Mass spectrometry techniques have only recently begun to provide data on modform distributions [32, 33], prompted, in part, by the growing realisation that different modforms may have distinct biological effects [31, 34, 45]. The theoretical challenge lies in the complexity of any mathematical model, which must accommodate some number, *L*, of forward and reverse enzymes and some total number, *N*, of modforms targetted by those enzymes. Additionally, the biochemistry of modification and demodification usually requires intermediate enzyme-substrate complexes associated to each enzyme and its substrates. If there are *P* such intermediate complexes, then the model will have *L*+*N*+*P* state variables, where *N* and *P* grow exponentially with *n*, while *L* is relatively much smaller:

$$L\ll N,P\phantom{\rule{thinmathspace}{0ex}}\propto \phantom{\rule{thinmathspace}{0ex}}{a}^{n},\phantom{\rule{thinmathspace}{0ex}}\text{for some}\phantom{\rule{thinmathspace}{0ex}}a\ge 2.$$

.

Because the dynamical equations are non-linear, these models cannot be analytically solved for the temporal trajectories of the variables. Simulation provides the widely used alternative to analytical solution. The trajectories of the system can be calculated by numerical integration once the site-specific rate constants have been given values. These values have usually not been measured and it may be necessary to search through the space of rate constants to determine whether a particular behaviour is robust to the choice of rate constant values [18]. Calculations of this kind rapidly become infeasible with increasing *n*, which has limited simulation studies to systems with few sites. These difficulties have made it hard to see what, if any, general principles lie behind the widespread use of multisite modification systems in cellular regulation.

In this paper we show that, if attention is restricted to the steady states of a multisite PTM system, then it is not necessary to numerically integrate *L* + *N* + *P* differential equations but only to solve *L* algebraic equations (Theorem 3). Furthermore, this reduction can be carried out without having to specify the rate constant values in advance. For instance, for the case of two enzymes, the steady states can be found by an analogue of the nullcline analysis that is widely used for two-dimensional systems of differential equations [40]: the steady states correspond to the intersections of two curves in the plane, no matter what the number of sites. This exponential reduction in complexity leads to biological insights in contexts that were previously intractable, as reviewed in §4, and provides a new theoretical foundation for studying multisite PTM systems.

The present paper generalises and conceptually simplifies a method that emerged in previous work for systems with two enzymes and one substrate [27, 41]. It allows for multiple enzymes, multiple types of modification, multiple substrates and complex biochemistry of modification and demodification. These assumptions are substantially more general than any used previously in the literature, [11, 14, 16, 22, 26, 28, 37, 38]. The main restrictions in terms of applicability of our methods are the requirements that enzymes cannot also be substrates and that the recharging mechanism for each modification should keep the concentration of donor molecules constant on the time scale of steady state formation. The latter requirement is widely believed to hold for phosphorylation, in which the donor molecule is ATP, and it has been taken for granted in all mathematical models of phosphorylation, but its validity for other forms of modification appears not to have been widely studied.

The two most significant examples that currently fall outside the scope of our analysis are kinase cascades and ubiquitin-like modifications. Both violate the first requirement by relying on chains of enzymatic modification. We discuss how our results might be extended to such cases in §4. It is an interesting question whether ubiquitin-like modifications satisfy the second requirement. Peptide modifiers are synthesised by mRNA translation rather than by the cell’s central metabolism, as is the case for small molecule modifiers, and little is known about how effectively this translation process is buffered against varying demand.

Our method of proof is one of hierarchical elimination of variables from the steady-state equations, which, because of mass-action, are polynomials in the variables. The intermediate enzyme-substrate complexes are first eliminated in favour of the substrates and the enzymes (Proposition 1). Using the results of this, the substrates are then eliminated in favour of the enzymes (Proposition 2). Each elimination step is framed as the solution to a system of linear equations (Lemma 2) but is undertaken over an extension field of the real numbers, which carries the nonlinearity that is present in the underlying steady-state equations. This allows us to solve a nonlinear system of equations using essentially linear methods. The extension field also permits the rate constants to be treated symbolically during the elimination. The Matrix-Tree theorem (Theorem 1) plays a key role in identifying the nonlinear coefficients that arise in the elimination steps. This important graph-theoretic result goes back to the 19th century but was first stated in the form we use it in 1948 [43]. It does not seem to have been previously noticed that the Matrix-Tree theorem immediately implies the famous King-Altman method, developed in 1956 [23], for calculating the rate function of an enzyme [5]. We find it intriguing that the method we use to analyse multisite PTM systems is so closely related to a key result of classical biochemistry. §2 reviews the graph theoretic preliminaries, including the Matrix-Tree theorem, before the main results are presented in §3.

Our results are not derivable from any existing mathematical theories of biochemical reaction networks, such as Chemical Reaction Network Theory (CRNT) [9, 15], more recently developed injectivity methods [8, 39] or Monotone Systems Theory [1]. We believe that they are best interpreted in algebraic geometric terms, as discussed at more length in an earlier paper [27]. Under the mass-action kinetics used here, a network of biochemical reactions gives rise to a polynomial dynamical system. Hence, for given rate constant values, the steady states form a real algebraic variety [6]. Despite this, the use of algebraic geometric methods to study biochemical networks has been surprisingly limited, the most interesting exception being the use of toric varieties to reinterpret the Deficiency Zero theorem of CRNT [7, 13]. For multisite PTM systems, we show that the variety of steady-state modform concentrations can be parameterised by *L* rational functions (Theorem 4). A rational parameterisation provides an explicit description of points on a variety, in contrast to their implicit definition as solutions of polynomial equations. Rationally parameterisable varieties are rare and of considerable interest in their own right. The rationality of multisite PTM systems suggests that algebraic geometry may provide powerful tools for analysing biochemical reaction networks and overcoming molecular complexity. We may hope, thereby, to better see the biological wood for the molecular trees.

In this paper we will analyse systems of ordinary differential equations arising from networks of biochemical reactions under mass-action kinetics. The rate constants and dynamical variables are usually treated as real variables in . In our analysis, we will make use of certain directed graphs whose edges have labels like *aX*, where *a* is a rate constant and *X* is a dynamical variable in steady state. These labels must satisfy a positivity condition, expressed in Lemma 1 below. The rate constants can reasonably be taken to be positive but whether or not a dynamical variable is positive in steady state is more delicate. Even if a system is started with all its variables positive, it may not be persistent within the positive orthant and may reach a steady state on the boundary in which some variables are zero. To avoid having to rule out such situations and thereby limit our analysis, we take a more algebraic approach. We treat the rate constants and the dynamical variables in steady state as uninterpreted symbols in an appropriate extension field of . We show that the calculations can be carried out over this extension field and, having done them, we then give real values to the symbols and draw conclusions over . While this avoids the problem and brings added benefits, it incurs some technical cost. The reader will not lose much by assuming that all calculations take place over and ignoring the problems that arise with loss of positivity in the dynamical variables. The symbolic calculations only appear in §3.3 and §3.4; rate constants and dynamical variables are treated as real variables elsewhere.

For more information about the algebraic methods reviewed here see, for instance, [19]. For any finite set *Q* = {*q*_{1}, …, *q _{n}*}, [

$${q}^{\alpha}={q}_{1}^{{\alpha}_{1}}\cdots {q}_{n}^{{\alpha}_{n}},\text{with}\phantom{\rule{thinmathspace}{0ex}}{\alpha}_{i}\ge 0.$$

(*Q*) will denote the field of rational functions in the elements of *Q*: the smallest field in which the symbols in *Q* can be added, subtracted, multiplied and divided as if they were (non-zero) numbers. Equivalently, (*Q*) is the field of fractions of [*Q*]: each element *f* (*Q*) can be expressed as the ratio of two polynomials, *f* = *p*_{1}/*p*_{2} where *p*_{1}, *p*_{2} [*Q*]. [*Q*] sits inside (*Q*) in the obvious way: *p* = *p*/1.

For various finite sets *Q*, we will use elements of [*Q*] as labels in graphs, and use (*Q*) as a field over which we solve linear equations. For instance, *Q* may consist of rate constants. To relate calculations over [*Q*] or (*Q*) back to biology, the symbols ultimately have to be given real values. This is not a problem for the polynomial ring [*Q*]. Any assignment, ι : *Q* → , extends to a homomorphism of rings ι : [*Q*] → . Hence, any symbolic algebraic expression in [*Q*] gives rise to a corresponding expression in . However, the field of rational functions, (*Q*), contains elements like 1/(ι(*q*_{1}) − *q*_{1}), which become undefined in no matter what assignment of real values are made to elements of *Q*. To infer an expression over , it is essential to show that nothing blows up in this way.

We make use of S-positivity to do this. A polynomial *p* [*Q*] is said to be S-positive (“sum positive”) if it is a non-zero sum of positive monomials. That is, p is S-positive if *p* = ∑_{α}*c*_{α}*q*^{α} ≠ 0 and if *c*_{α} > 0 whenever *c*_{α} ≠ 0. A rational function in (*Q*) is said to be S-positive if it can be expressed as the ratio of two S-positive polynomials. If the elements of *Q* are given positive real values, which is biochemically realistic for rate constants, then any S-positive rational function in (*Q*) will be well defined over . Note that, if *Q* = , so that [*Q*] = , then *x* [*Q*] is S-positive if, and only if, it is positive in the usual sense.

If *p* = ∑_{α}*c*_{α}*x*^{α} [*Q*] then *p* = 0 means that *c*_{α} = 0 for all α. If ι : *Q* → , then *p* = 0 in [*Q*] implies that ι(*p*) = 0 . The converse, of course, is false: if ι(*p*) = 0 it does not imply that *p* = 0 [*Q*]. However, if *p* ≠ 0 then the variety in * ^{n}* corresponding to the set of solutions of

If ι(*p*) = 0 for all ι : *Q* → ^{+}, then *p* = 0 [*Q*].

A labelled, directed graph is a triple, (*V*, *E*, ), where *V* is a finite set of nodes, *V* = {*v*_{1}, …, *v _{n}*},

If *G* is a labelled, directed graph, *G*^{} will denote the corresponding unlabelled, undirected graph, in which the direction of the edges is forgotten and multiple edges between the same vertices are merged. We use *v _{i}* ↔

There is a bijective correspondence between matrices and labelled, directed graphs. If *A* is a *n* × *n* matrix over , let (*A*) be the associated labelled, directed graph with nodes {1, …, *n*} and labelled edges $j\stackrel{{A}_{\mathit{\text{ij}}}}{\to}i$, if, and only if, *A _{ij}* ≠ 0. Note that entry

If *G* is a labelled directed graph then its Laplacian matrix, (*G*), is given by (*G*) = (*G*) − diag(1.(*G*)). Here, 1 denotes the all 1’s row vector of the appropriate dimension, 1 = (1, …, 1), and diag(*v*), where *v* is a row vector, denotes the diagonal matrix with *v* on the main diagonal

$$\text{diag}{(v)}_{\mathit{\text{ij}}}=\{\begin{array}{cc}{v}_{i}\hfill & \text{if}\phantom{\rule{thinmathspace}{0ex}}i=j\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}$$

The Laplacian encodes much information about graph structure; see, for instance, [3], but note that the conventions used here are different. Note also that, by construction, 1.(*G*) = 0.

Let G be a labelled, directed graph on n vertices with no self-loops in which each non-zero label is a S-positive element of [Q]. If G is strongly connected then the rank of (G) over (Q) is n − 1.

Let *u* = (*u*_{1}, …, *u _{n}*) (

$$\mathcal{L}{(G)}_{\mathit{\text{ii}}}=-{\displaystyle \sum _{k\ne i}\mathcal{L}{(G)}_{\mathit{\text{ki}}}}.$$

Hence, for the *i*-th column of *v*.(*G*) = 0,

$$\sum _{k\ne i}({v}_{k}-{v}_{i})\mathcal{L}{(G)}_{\mathit{\text{ki}}}=0.$$

(1)

So far, this has all been symbolic, in [*Q*]. Now suppose that the symbols in *Q* are given positive real values and, suppressing the corresponding assignment ι : *Q* → ^{+} for readability, let us consider the corresponding system of column equations to (1) in . Since the *v _{i}* are now real numbers, there is a smallest,

$$\sum _{k\ne m}({v}_{k}-{v}_{m})\mathcal{L}{(G)}_{\mathit{\text{km}}}=0,$$

each non-zero term is the product of a nonnegative quantity, *v _{k}* −

Some condition on the labels is necessary for the conclusion of Lemma 1. The following labelled, directed graph on {1, 2, 3}, with labels in , is strongly connected

$$1\stackrel{1}{\to}2,\text{\hspace{1em}}2\stackrel{2}{\to}1,\text{\hspace{1em}}1\stackrel{1}{\to}3,\text{\hspace{1em}}3\stackrel{2}{\to}1,\text{\hspace{1em}}2\stackrel{-1}{\to}3,\text{\hspace{1em}}3\stackrel{-1}{\to}2,$$

but its Laplacian has rank 1, not 2:

$$\left(\begin{array}{ccc}-2& 2& 2\\ 1& -1& -1\\ 1& -1& -1\end{array}\right)\phantom{\rule{thinmathspace}{0ex}}.$$

If *M* is any *n* × *n* matrix, we can construct a labelled directed graph with no self-loops by ignoring the labels on the main diagonal: *G* = (*M* − diag(*M _{ii}*)). If, in addition, 1.

$$\begin{array}{cc}\mathcal{L}(G)=\hfill & [M-\text{diag}({M}_{\mathit{\text{ii}}})]-\text{diag}(1.[M-\text{diag}({M}_{\mathit{\text{ii}}})])\hfill \\ \hfill & =M-\text{diag}({M}_{\mathit{\text{ii}}})+\text{diag}({M}_{\mathit{\text{ii}}})=M.\hfill \end{array}$$

In what follows we will need to solve linear systems of the form *M*.*z* = 0, where *M* is a *n* × *n* matrix over some field , *z* is a column vector of *n* unknowns and *M* has rank *n* − 1. Recall that the adjugate matrix of *M*, adj(*M*), is defined by

$$\text{adj}{(M)}_{\mathit{\text{ij}}}={(-1)}^{i+j}{M}_{(\mathit{\text{ji}})},$$

(2)

where *M*_{(ji)} is the maximal minor given by the determinant of the (*n* − 1) × (*n* − 1) matrix obtained from *M* by removing the *j*th row and *i*th column. Note the reversal of indices in (2). The adjugate satisfies the Laplace relations

$$\text{adj}(M).M=M.\text{adj}(M)=\text{det}(M)I.$$

If rk(*M*) = *n*−1, any column vector of adj(*M*) is a basis for the column null space. Taking the first column, let

$${z}_{i}=\text{adj}{(M)}_{i1}={(-1)}^{i+1}{M}_{(1i)},$$

(3)

so that *M*.*z* = 0.

The maximal minors have a particularly striking form when *M* is a Laplacian matrix. Let *G* be a labelled, directed graph with no self loops. *T* is said to be a spanning tree of *G* if *T* is a directed subgraph which reaches each node of *G* such that *T*^{} is connected and acyclic. *T* inherits labels from *G*. *T* is said to be rooted at *v* *G* if *v* is the unique sink in *T*. That is, *v* is the only node of *T* with no edges leaving it, *v* *w*. This implies that any non-root node has exactly one edge leaving it, for otherwise there would either be an additional sink or an undirected cycle. Let Θ_{v}(*G*) denote the set of spanning trees of *G* rooted at *v*.

(The Matrix-Tree Theorem [43, §3.6]) *Let G be a directed graph on* {1, …, *n*} *with labels in the field and no self-loops. The maximal minors of the Laplacian are given by*

$$\mathcal{L}{(G)}_{(\mathit{\text{ij}})}={(-1)}^{n+i+j-1}{\displaystyle \sum _{T\in {\mathrm{\Theta}}_{j}(G)}\left({\displaystyle \prod _{k\stackrel{a}{\to}l\in T}a}\right)}\phantom{\rule{thinmathspace}{0ex}}.$$

It is remarkable that the maximal minor is, up to a sign, just a sum of products of labels, since the determinant itself is a sum with alternating signs. Results like Theorem 1 go back to the 19th century work of Kirchhoff on electrical networks and of Sylvester and others on elimination theory; see [29, Chapter 5] for references. Theorem 1 was first proved by Tutte in 1948 [43]. Combining (3) with Theorem 1, we get the following.

*Let M be a n × n matrix over a field such that* 1.*M* = 0 *and rk(M)* = *n* − 1. *Let G be the labelled, directed graph constructed as in Remark 2 for which M = (G). The one dimensional column null space of M is generated by the vector ρ = (ρ _{1}, …, ρ_{n})^{t}, where*,

$${\rho}_{i}={(-1)}^{n+1}{\displaystyle \sum _{T\in {\Theta}_{i}(G)}\left({\displaystyle \mathbf{\prod}_{k\stackrel{a}{\to}l\in T}a}\right)\phantom{\rule{thinmathspace}{0ex}}.}$$

(4)

There is a simple condition for *x* to also be in the null space of *M*.

Suppose that *x*, ρ ^{n} with ρ_{k} ≠ 0 for some 1 ≤ *k* ≤ *n*. Then, *x* = λρ if, and only if, *x _{i}* = (ρ

The quantities ρ_{i}/ρ_{k} will play an important role below; see (8), (20) and (24). In these calculations, the labels will lie in [*Q*] and will either be symbols, like *q*_{1}, or S-positive polynomials, like *q*_{1}*q*_{2}+*q*_{3}*q*_{4}. Under these conditions, we see from (4) that each ρ_{i} is either 0, if there are no spanning trees rooted at vertex *i*, of, if there are such spanning trees, ρ_{i} is (−1)^{n+1} times a S-positive polynomial. Accordingly, ρ_{i}/ρ_{k} is a S-positive rational function.

If the biochemical mechanism of an enzyme is known, its rate function is often calculated in the quasi-steady state approximation [5]. King and Altman worked out a graphical method for doing this that is widely used [5, 23]. It seems not to have been previously noticed that this is an immediate application of the Matrix-Tree Theorem. Since we will see the Matrix-Tree Theorem at work in more detail later, we illustrate the application to rate functions with the simple example of reversible Michaelis-Menten kinetics. Here, enzyme *E* and substrate *S* reversibly form an enzyme-substrate complex *Y*, which reversibly yields enzyme and product *P* (compare Figure 1a):

$$E+S\underset{b}{\overset{a}{\rightleftharpoons}}Y\underset{d}{\overset{c}{\rightleftharpoons}}E+P.$$

(5)

The rate constants, *a*, *b*, *c*, *d* are all taken to be positive. The quasi-steady state approximation assumes that *Y* reaches steady state while substrate is being converted to product. Under mass-action kinetics, the differential equations for *E* and *Y* are

$$\begin{array}{ll}\frac{\mathit{\text{dE}}}{\mathit{\text{dt}}}\hfill & =-(\mathit{\text{aS}}+\mathit{\text{dP}}).E+(b+c)Y\hfill \\ \frac{\mathit{\text{dY}}}{\mathit{\text{dt}}}\hfill & =(\mathit{\text{aS}}+\mathit{\text{dP}}).E-(b+c)Y.\hfill \end{array}$$

(6)

Note that *E* is at steady-state, if, and only if, *Y* is at steady state (compare Lemma 3). Hence, (5) is in quasi-steady state, if, and only, if, (6) is in steady state. At steady-state, the right hand side of (6) is a system of linear equations in *E* and *Y*. We can ignore the uninteresting case when *S* = *P* = 0 and assume that the coefficients of (6) are positive, thereby avoiding symbolic calculations. Let *M* be the corresponding matrix over for the basis {*E*, *Y*}. We see from (6) that 1.*M* = 0. Since rk(*M*) = 1 by inspection, we can use Lemma 2 to find solutions of *M*.*z* = 0. The labelled, directed graph formed from *M* according to Remark 2, is

$$E\underset{b+c}{\overset{\mathit{\text{aS}}+\mathit{\text{dP}}}{\rightleftharpoons}}Y.$$

(7)

This has a single spanning tree rooted at *E*, $E,Y\stackrel{b+c}{\to}E$, and a single spanning tree rooted at *Y*, $E\stackrel{\mathit{\text{aS}}+\mathit{\text{dP}}}{\to}Y$. By Lemma 2, a basis for the column null space of *M* is

$$\left(\begin{array}{c}-(b+c)\\ -(\mathit{\text{aS}}+\mathit{\text{dP}})\end{array}\right)\phantom{\rule{thinmathspace}{0ex}}.$$

By Remark 3, *M*.*z* = 0 if, and only, if

$${z}_{Y}=\left[\left(\frac{a}{b+c}\right)\phantom{\rule{thinmathspace}{0ex}}S+\left(\frac{d}{b+c}\right)P\right]\phantom{\rule{thinmathspace}{0ex}}{z}_{E}.$$

(8)

We see that, at steady state, the enzyme-substrate complex is the free enzyme times a linear combination of substrate and product. The coefficients of the linear form are reciprocals of the forward and reverse Michaelis-Menten constants, *K _{f}* = (

$$\frac{\mathit{\text{dP}}}{\mathit{\text{dt}}}=\frac{\frac{c{E}_{\mathit{\text{tot}}}}{{K}_{f}}S-\frac{d{E}_{\mathit{\text{tot}}}}{{K}_{r}}P}{1+\frac{S}{{K}_{f}}+\frac{P}{{K}_{r}}}.$$

This simple calculation may provide some orientation for the more involved treatment that now follows.

We begin by setting up a general model for a PTM system. There are three kinds of chemical species in the system: enzymes, substrates and intermediate enzyme-substrate complexes. Let Enz, denote a non-empty, finite set of enzymes, Sub, a non-empty, finite set of substrate modforms and Int, a non-empty, finite set of enzyme-substrate complexes. A non-empty, finite set of sub-networks, Net, is defined in terms of these:

$$\begin{array}{l}\text{Enz}=\{{E}_{1},\cdots ,{E}_{L}\}\hfill \\ \text{Sub}=\{{S}_{1},\cdots ,{S}_{N}\}\hfill \\ \text{Int}=\{{Y}_{1},\cdots ,{Y}_{P}\}\hfill \\ \text{Net}=\{{T}_{1},\cdots ,{T}_{M}\}.\hfill \end{array}$$

Each sub-network, *T* Net, consists of an associated enzyme, *e*(*T*) Enz, a non-empty subset of modforms, σ(*T*) Sub, a non-empty subset of enzyme-substrate complexes, γ(*T*) Int, and a reaction network, *N*(*T*), defined below. The sub-networks encode the biochemical details of how enzymes convert modforms.

This formulation allows for multiple forward and reverse enzymes, which may catalyse different types of modification and demodification. Multiple substrates are also permitted; the distinction between them will emerge in the calculation, as part of Condition 2. The combinatorics of multisite modification are not directly represented: the modforms of all substrates are simply listed 1, …, *N*. As discussed in the Introduction, *N* and *P* may be exponentially larger than *L*.

The enzyme-substrate subsets must be disjoint between distinct sub-networks: if *i* ≠ *j* then γ(*T _{i}*) ∩ γ(

$$\underset{i=1}{\overset{M}{\cup}}\sigma ({T}_{i})}=\text{Sub},\text{\hspace{1em}}{\displaystyle \underset{i=1}{\overset{M}{\cup}}\gamma ({T}_{i})}=\text{Int}.$$

Given *Y* Int, *t*(*Y*) Net denotes the (unique) sub-network containing *Y*.

For *E* = *e*(*T*), any *S _{u}* σ(

$$\begin{array}{c}E+{S}_{u}\stackrel{{a}_{u,v}^{T}}{\to}{Y}_{v}\\ E+{S}_{u}\stackrel{{b}_{u,b}^{T}}{\leftarrow}{Y}_{v}\\ {Y}_{i}\stackrel{{c}_{i,j}^{T}}{\to}{Y}_{j}.\end{array}$$

(9)

We further assume Condition 1 in §3.3 and Condition 2 in §3.4. These are strong connectivity conditions on certain graphs that allow Lemma 1 to be used. They will be stated after introducing additional concepts below.

The reactions (9) imply that enzyme is conserved—it is either free or bound in some enzyme-substrate complex—while substrate can flow between different modforms. While not much is known about the biochemical details of how enzymes modify multisite substrates, the sub-network assumptions allow considerable flexibility. They can accommodate, for instance, overlapping site preferences, arbitrary orders of modification and demodification, distributivity or processivity [11] and intricate hierarchical dependencies between enzymes [10, 35]. Some simple examples are shown in Figure 1. The main assumption behind (9) is that the donor molecules that provide the modifier are kept at constant concentration, on the time scale of the PTM dynamics, by mechanisms that are not explicitly modelled. As mentioned in the Introduction, this assumption has always been made for phosphorylation but needs further investigation for other modifications. It means that the donor molecules do not have to be treated as dynamical variables; their affects can be absorbed into the rate constants. This permits both forward and reverse reactions to be bimolecular with only enzyme and substrate. (In fact, we are implicitly making a similar assumption for the reverse reactions by ignoring the water molecules needed for hydrolysis.) Enzymes would normally be expected to give rise to tree networks, as in Figure 1a–e, but cyclic networks like Figure 1f are mathematically allowed.

The data above give rise to a polynomial dynamical system defined by mass-action kinetics on the set of chemical species, Sub Enz Int. For 1 ≤ *u* ≤ *N*; for 1 ≤ *v* ≤ *P*, *T* = *t*(*Y _{v}*),

$$\frac{{\mathit{\text{dS}}}_{u}}{\mathit{\text{dt}}}={\displaystyle \sum _{{S}_{u}\in \sigma (T)}\left({\displaystyle \sum _{e(T)+{S}_{u}\leftrightarrow {Y}_{j}\in {N}^{\star}(T)}({b}_{u,j}^{T}{Y}_{j}-{a}_{u,j}^{T}e(T){S}_{u})}\right)}$$

(10)

$$\begin{array}{ll}\frac{{\mathit{\text{dY}}}_{v}}{\mathit{\text{dt}}}=\hfill & {\displaystyle \sum _{{Y}_{v}\leftrightarrow {Y}_{j}\in {N}^{\star}(T)}({c}_{j,v}^{T}{Y}_{j}-{c}_{v,j}^{T}{Y}_{v})}\hfill \\ \hfill & +{\displaystyle \sum _{E+{S}_{j}\leftrightarrow {Y}_{v}\in {N}^{\star}(T)}({a}_{j,v}^{T}{S}_{j}E-{b}_{j,v}^{T}{Y}_{v})}\hfill \end{array}$$

(11)

$$\frac{{\mathit{\text{dE}}}_{w}}{\mathit{\text{dt}}}={\displaystyle \sum _{e(T)={E}_{w}}\left({\displaystyle \sum _{{E}_{w}+{S}_{i}\leftrightarrow {Y}_{j}\in {N}^{\star}(T)}({b}_{i,j}^{T}{Y}_{j}-{a}_{i,j}^{T}{E}_{w}{S}_{i})}\right)\phantom{\rule{thinmathspace}{0ex}}.}$$

(12)

Terms like ${c}_{j,v}^{T}{Y}_{j}-{c}_{v,j}^{T}{Y}_{v}$ in equation (11) are indexed over edges in the undirected graph *N*^{} (*T*). If one or other corresponding directed edge is not present in *N*(*T*), the associated label should be treated as if it were zero. In other words,

$$\sum _{{Y}_{v}\leftrightarrow {Y}_{j}\in {N}^{\star}(T)}({c}_{j,v}^{T}{Y}_{j}-{c}_{v,j}^{T}{Y}_{v})}={\displaystyle \sum _{{Y}_{j}\to {Y}_{v}\in N(T)}{c}_{j,v}^{T}{Y}_{j}-{\displaystyle \sum _{{Y}_{v}\to {Y}_{j}\in N(T)}{c}_{v,j}^{T}{Y}_{v}}}.$$

Indexing over *N*^{}(*T*) is purely a notational convenience, which allows for a more compact syntax.

In this section, the rate constants and dynamical variables in (10)–(12) are treated as real variables. The structure of these equations can be better seen in terms of *f _{T}*, the net flux of enzyme out of sub-network

$${f}_{T}={\displaystyle \sum _{e(T)+{S}_{i}\leftrightarrow {Y}_{j}\in {N}^{\star}(T)}{b}_{i,j}^{T}{Y}_{j}-{a}_{i,j}^{T}e(T){S}_{i},}$$

so that equation (12) can be rewritten as

$$\frac{{\mathit{\text{dE}}}_{w}}{\mathit{\text{dt}}}={\displaystyle \sum _{e(T)={E}_{w}}{f}_{T}\phantom{\rule{thinmathspace}{0ex}}.}$$

(13)

Moreover, if (11) is added up for all *Y _{v}*

$${f}_{T}=-{\displaystyle \sum _{{Y}_{v}\in \gamma (T)}\frac{{\mathit{\text{dY}}}_{v}}{\mathit{\text{dt}}}.}$$

(14)

Finally, the enzyme flux from *T* is equal to the flux of all substrate modforms from *T*. Summed over all sub-networks, this gives the total substrate flux. More formally, if equation (10) is added up for all substrates, then, rearranging the order of summation, and noting that each *E* + *S* ↔ *Y* is unique since the *Y* are not shared, we see that

$$\sum _{u\in \text{Sub}}\frac{{\mathit{\text{dS}}}_{u}}{\mathit{\text{dt}}}={\displaystyle \sum _{T\in \text{Net}}{f}_{T}\phantom{\rule{thinmathspace}{0ex}}.}$$

(15)

From (13) and (14) we see that

$$\frac{d}{\mathit{\text{dt}}}\left({E}_{w}+{\displaystyle \sum _{e(T)={E}_{w}}\left({\displaystyle \sum _{{Y}_{v}\in \gamma (T)}{Y}_{v}}\right)}\right)=0.$$

The term being differentiated is evidently the total amount of enzyme *E _{w}* in the system, which we conclude to be the same at all times. Similarly, from (14) and (15) we see that

$$\frac{d}{\mathit{\text{dt}}}\left({\displaystyle \sum _{u\in \text{Sub}}{S}_{u}}+{\displaystyle \sum _{T\in \text{Net}}\left({\displaystyle \sum _{{Y}_{v}\in \gamma (T)}{Y}_{v}}\right)}\right)=0.$$

The term being differentiated is the total amount of substrate in the system, which must also be the same at all times. Let *S _{tot}* be the total amount of substrate and

$${E}_{w}+{\displaystyle \sum _{e(T)={E}_{w}}\left({\displaystyle \sum _{{Y}_{v}\in \gamma (T)}{Y}_{v}}\right)={E}_{w,\mathit{\text{tot}}}}$$

(16)

$$\sum _{u\in \text{Sub}}{S}_{u}+{\displaystyle \sum _{T\in \text{Net}}\left({\displaystyle \sum _{{Y}_{v}\in \gamma (T)}{Y}_{v}}\right)}={S}_{\mathit{\text{tot}}},$$

(17)

Of course, these conservation laws are evident from the form of the allowed reactions in (9) but the derivation above checks the correctness of the system equations and illuminates the role of *f _{T}*. Reinforcing that, the following immediate consequence of (14) is a key result.

In any steady state, for any sub-network T, f_{T} = 0.

It is instructive to see this another way, which does not depend on the details of how (14) is proved. In any steady state, the enzyme flux out of *T* cannot be negative. In other words, there cannot be positive flux of enzyme into *T*. If there were, this flux cannot escape through any *Y _{v}*

Lemma 3 says that, at steady state, not merely is *dE _{w}*/

In this section and the next we will work symbolically. Let Con denote the set of all rate constants for all sub-networks *T* Net and all reactions (9) in *N*(*T*),

$$\text{Con}=\{{a}_{u,v}^{T},{b}_{u,v}^{T},{c}_{i,j}^{T}\}.$$

Let *T* be any sub-network and *E* = *e*(*T*). In any steady state of the system, it follows from (11) and Lemma 3 that for all *Y _{v}* γ(

$$\sum _{{Y}_{v}\leftrightarrow {Y}_{j}\in {N}^{\star}(T)}({c}_{j,v}^{T}{Y}_{j}-{c}_{v,j}^{T}{Y}_{v})+}{\displaystyle \sum _{E+{S}_{j}\leftrightarrow {Y}_{v}\in {N}^{\star}(T)}({a}_{j,v}^{T}{S}_{j}E-{b}_{j,v}^{T}{Y}_{v})=0$$

(18)

$$\sum _{E+{S}_{i}\leftrightarrow {Y}_{j}\in {N}^{\star}(T)}({b}_{i,j}^{T}{Y}_{j}-{a}_{i,j}^{T}{S}_{i}E)=0}.$$

(19)

These form a system of linear equations in the variables *E* and *Y _{v}* γ(

Let *G _{T}* be the labelled, directed graph formed from

For any sub-network *T*, *G _{T}* is strongly connected.

All the examples in Figure 1 satisfy this condition, which seems biochemically reasonable.

By Lemma 1, *M _{T}* has rank

$$\frac{{\rho}_{i}}{{\rho}_{p}}={\displaystyle \sum _{{S}_{u}\in \sigma (T)}{\mu}_{i,u}^{T}{S}_{u}.}$$

(20)

The ${\mu}_{i,u}^{T}$ are, by definition, generalised Michaelis-Menten constants (compare the discussion in §2.4). Note that, as defined here, these are reciprocals of the usual Michaelis-Menten constants [5]. We prefer this convention because it allows these constants to be 0 when necessary, in preference to having to be ∞. By construction, the Michaelis-Menten constants are either 0 or are S-positive elements of (Con). In particular, they are well defined for any positive values of the rate constants in Con. The Michaelis-Menten constants for the example in Figure 1e are shown in Table 1.

Using Remark 3, we have proved the following generalisation of (8).

*For any sub-network T* *Net, the sub-network equation (18) and equation (19) are satisfied, if, and only if,*

$${Y}_{i}=\left({\displaystyle \sum _{{S}_{u}\in \sigma (T)}{\mu}_{i,u}^{T}{S}_{u}}\right)e(T),$$

(21)

for 1 ≤ *i* < *p*.

In steady state, the enzyme-substrate complexes satisfy (21). Substituting these into the expressions for the modforms given by (10) we obtain

$$\sum _{{S}_{u}\in \sigma (T)}\left({\displaystyle \sum _{e(T)+{S}_{u}\leftrightarrow {Y}_{j}\in {N}^{\star}(T)}\left(\left({\displaystyle \sum _{{S}_{v}\in \sigma (T)}{b}_{u,j}^{T}{\mu}_{j,v}^{T}e(T){S}_{v}}\right)-{a}_{u,j}^{T}e(T){S}_{u}\right)}\right)=0}.$$

(22)

These expressions are linear in the *S _{u}* with coefficients which are S-positive polynomials in (Con)[Enz]. Note the critical need at this point for

$$\sum _{T\in \text{Net}}\left({\displaystyle \sum _{{S}_{u}+e(t)\leftarrow {Y}_{j}\in N(T)}{b}_{u,j}^{T}{\mu}_{j,v}^{T}}\right)\phantom{\rule{thinmathspace}{0ex}}e(T)\phantom{\rule{thinmathspace}{0ex}},$$

whenever that expression is non-zero. Terms like ${b}_{u,j}^{T}{\mu}_{j,v}^{T}$ are familiar to biochemists as catalytic efficiencies. For each *T* Net and any distinct pair *S _{u}*,

$${\kappa}_{u,v}^{T}={\displaystyle \sum _{{S}_{u}+e(T)\leftarrow {Y}_{j}\in N(T)}{b}_{u,j}^{T}{\mu}_{j,v}^{T}}.$$

Note that ${\kappa}_{u,v}^{T}=0$ if *S _{v}* σ(

$$\sum _{T\in \text{Net}}{\kappa}_{u,v}^{T}e(T),$$

(23)

whenever that expression is non-zero. Figure 3 shows *G _{S}* for the example in Figure 1e.

The catalytic efficiencies provide a more concise set of labels for *G _{S}* than the original rate constants. Let Cat denote the set of all non-zero generalised catalytic efficiencies:

$$\text{Cat}=\{{\kappa}_{u,v}^{T}\ne 0|{S}_{u},{S}_{v}\in \text{Sub}\}.$$

Note that (Cat) is a subfield of (Con) and that any S-positive element of (Cat) is also S-positive as an element of (Con). The labels on *G _{S}* are S-positive polynomials in [Cat Enz], which are homogeneous and linear in the

Since the modforms may include distinct substrates, we cannot assume that *G _{S}* is connected. We can now state the second condition for the systems considered here.

The connected components of *G _{S}* are strongly connected.

This condition is biochemically reasonable. For a given substrate, each modification is usually balanced by another enzyme carrying out a de-modification. This implies strong connectivity of the corresponding component of *G _{S}*. Distinct substrates will give rise to distinct components. Although Figure 1e has only a single enzyme, the reversibility of the sub-network is sufficient in this case to give a single component which is strongly connected, as in Figure 3.

Condition 2 implies that *M _{S}* is block diagonal, with each block corresponding to one of the connected components. Each block can be treated separately. To avoid further complicating the exposition, we assume from now on that

$$\frac{{\rho}_{u}}{{\rho}_{1}}={r}_{u}({E}_{1},\cdots ,{E}_{L})\phantom{\rule{thinmathspace}{0ex}},$$

(24)

whose non-zero coefficients are S-positive elements of [Cat]. Here, we have, without loss of generality, chosen *S*_{1} as a reference modform. With multiple components, a reference modform will be needed for each component. Since the ρ_{u} are each homogeneous of degree *N* − 1, it follows that, for λ ,

$${r}_{u}(\lambda {E}_{1},\cdots ,\lambda {E}_{L})={r}_{u}({E}_{1},\cdots ,{E}_{L})\phantom{\rule{thinmathspace}{0ex}},$$

(25)

so that the *r _{u}* are, in fact, inhomogeneous functions of any

$${r}_{u}({E}_{1},\cdots ,{E}_{L})={r}_{u}(\frac{{E}_{1}}{{E}_{L}},\cdots ,\frac{{E}_{L-1}}{{E}_{L}},1).$$

Using Lemma 2 and Remark 3, we deduce the following analogue of Proposition 1.

*The modform equations (22) are satisfied if, and only if*,

$${S}_{u}={r}_{u}({E}_{1},\cdots ,{E}_{L}){S}_{1}.$$

(26)

Examples of the rational functions *r _{u}* for the case of a single kinase, single phosphatase and single substrate with 2 sites are given in [27].

We can now put everything together and return from symbols to real variables. Let us consider any multisite PTM system satisfying the assumptions in §3.1 along with Condition 1 and Condition 2. We assume that the rate constants have positive real values. In this case, the maximal minor ρ_{1} appearing in (24) is a real polynomial of degree *N* − 1 in the enzymes and ρ_{1} = 0 defines a hypersurface in ^{L}. As long as the vector of enzyme values lies off this hypersurface, the rational functions in (24) are well defined. If *G _{S}* has many components, a similar proviso must be made for each of them, in respect of the maximal minor for the corresponding reference modform. Note that, since ρ

*In any steady state of the system for which the enzyme values satisfy ρ*_{1} ≠ 0, *equation (26) and equation (21) hold. Conversely, if the enzymes have values in for which ρ _{1}* ≠ 0,

We have shown the first part above. Now suppose that *E _{i}* , such that ρ

If the system has specified total amounts of substrate and enzymes, it satisfies the conservation laws (16) and (17). These provide *L* + 1 equations for the *L* enzymes and the substrate. For each sub-network *T* Net, let ϕ* _{T}* [Enz] denote the S-positive polynomial

$${\varphi}_{T}({E}_{1},\cdots ,{E}_{L})={\displaystyle \sum _{{Y}_{v}\in \gamma (T)}\left({\displaystyle \sum _{{S}_{w}\in \sigma (T)}{\mu}_{v,w}^{T}{\rho}_{w}}({E}_{1},\cdots ,{E}_{L})\right)}\phantom{\rule{thinmathspace}{0ex}}.$$

It follows from Proposition 1, that, whenever ρ_{1} ≠ 0,

$$\sum _{{Y}_{v}\in \gamma (T)}{Y}_{v}}={\varphi}_{T}e(T)\frac{{S}_{1}}{{\rho}_{1}}.$$

Let Δ [Enz] denote the S-positive polynomial,

$$\mathrm{\Delta}={\displaystyle \sum _{u\in \text{Sub}}{\rho}_{u}+{\displaystyle \sum _{T\in \text{Net}}{\varphi}_{T}e(T).}}$$

(27)

Rewriting (17) to solve for *S*_{1} in terms of *S _{tot}*, we see that, whenever Δ ≠ 0,

$${S}_{1}=\frac{{\rho}_{1}{S}_{\mathit{\text{tot}}}}{\mathrm{\Delta}}.$$

(28)

We can then rewrite (16) to get

$${E}_{w}\phantom{\rule{thinmathspace}{0ex}}\left(1+\left({\displaystyle \sum _{e(T)={E}_{w}}{\varphi}_{T}}\right)\frac{{S}_{\mathit{\text{tot}}}}{\mathrm{\Delta}}\right)={E}_{w,\mathit{\text{tot}}}.$$

(29)

These *L* equations are well-defined whenever Δ ≠ 0. With multiple components, *S _{tot}* can be apportioned among the components and each reference modform will have a corresponding equation to (28).

Let Φ : ^{L} × → ^{L} be defined by the left-hand side of (29) so that, for 1 ≤ *w* ≤ *L*,

$${\mathrm{\Phi}}_{w}({E}_{1},\cdots ,{E}_{L},S)={E}_{w}\phantom{\rule{thinmathspace}{0ex}}\left(1+\left({\displaystyle \sum _{e(T)={E}_{w}}{\varphi}_{T}}\right)\frac{S}{\mathrm{\Delta}}\right)$$

*The steady states of a multisite PTM system are given by the solutions of a system of L equations for the L free enzyme concentrations. More precisely, if A, E* ^{L}, *S* , Δ_{ρ1} ≠ 0 *and* Φ(*E, S*) = *A, then there is a steady state of the system for which E gives the free enzyme concentrations, S = S _{tot} and A_{w} = E_{w,tot}. Conversely, any steady state of the system having these totals, for which* Δ

We have shown above that any steady state of the system having the specified totals, for which Δ_{ρ1} ≠ 0, satisfies Φ(*E, S*) = *A*. Now suppose that *E* ^{L}, *S* satisfy Δ_{ρ1} 0 and that Φ(*E, S*) = *A*. Define ${S}_{1}^{*}\in \mathbb{R}$ so that

$${S}_{1}^{*}=\frac{{\rho}_{1}S}{\mathrm{\Delta}}$$

which we may do since Δ ≠ 0. Now use Theorem 2 to define a steady state of the system, ${S}_{u}^{*},{Y}_{v}^{*}$, which we may do since ρ_{1} ≠ 0. We need only check that the total amount of substrate, ${S}_{\mathit{\text{tot}}}^{*}$, and enzymes, ${E}_{w,\mathit{\text{tot}}}^{*}$, satisfy ${S}_{\mathit{\text{tot}}}^{*}=S$ and ${E}_{w,\mathit{\text{tot}}}^{*}={A}_{w}$. The total amount of substrate is

$${S}_{\mathit{\text{tot}}}^{*}={\displaystyle \sum _{{S}_{u}\in \sigma (T)}{S}_{u}^{*}+{\displaystyle \sum _{T\in \text{Net}}{\displaystyle \sum _{{Y}_{v}\in \gamma (T)}{Y}_{v}^{*}}}},$$

Since (26) and (21) are satisfied through the use of Theorem 2, this gives

$${S}_{\mathit{\text{tot}}}^{*}=\mathrm{\Delta}\frac{{S}_{1}^{*}}{{\rho}_{1}}=S,$$

as required. Similarly, the total amount of enzyme *E _{w}* is

$$\begin{array}{c}{E}_{w,\text{tot}}^{*}={E}_{w}+{\displaystyle \sum _{e(T)={E}_{w}}\left({\displaystyle \sum _{{Y}_{v}\in \gamma (T)}{Y}_{v}^{*}}\right)}\\ ={E}_{w}\left(1+\left({\displaystyle \sum _{e(T)={E}_{w}}{\varphi}_{T}}\right)\frac{{S}_{1}^{*}}{{\rho}_{1}}\right)={\mathrm{\Phi}}_{w}(E,S)={A}_{w},\end{array}$$

as required.

Since Δ_{ρ1} [Enz] is S-positive, it never vanishes for positive free enzyme concentrations, which corresponds to the biochemically realistic case.

As suggested in the Introduction, Theorem 2 should be seen as an assertion that an appropriate algebraic variety is rationally parameterisable. This implies that points on the variety can be explicitly constructed as rational functions of some auxiliary parameters, in contrast to the implicit definition of points as solutions of polynomial equations [6]. For instance, *x*^{2} + *y*^{2} = 1 provides an implicit definition of the unit circle, while the expression of *x* and *y* as

$$\begin{array}{ll}x=\frac{2t}{{t}^{2}+1},\hfill & y=\frac{{t}^{2}-1}{{t}^{2}+1}\hfill \end{array}$$

(30)

provides an explicit rational parameterisation. In general, a rational parameterisation may be undefined at points where the denominators of the rational functions vanish (which is not the case for (30)) and not all points on the variety may be represented (such as the point (0, 1) in (30)).

Recall from §3.1 that there are *L* + *N* + *P* dynamical variables in our system. Let *V* ^{L+N+P} denote the steady-state variety, corresponding to the simultaneous solutions of the system equation (10)–equation (12). *V* is a real algebraic variety. Let π: ^{L+N+P} → ^{N} denote the projection on to the space of modforms. π(*V*) may lack some points required for it to be an algebraic variety but it may be completed to one, if required, [6].

The set π(*V*) has additional structure, not present in *V* . Equation (26) gives the modforms only up to a constant. If λ , then it is easy to see from the system equation (10)–equation (12) that multiplying each modform and each enzyme-substrate complex by λ, changes all the rates by λ. In particular, if the system is at steady state, then it is still at steady state after such a change. Hence, π(*V*) is a projective set: given *x* π(*V*) the line through the origin and *x* also lies in π(*V*). We can therefore consider π(*V*) as a subset, π(*V*)_{}, of the real projective space ^{N−1}. The following is a corollary of Theorem 2

*π*(*V*)_{} *has a rational parameterisation. Specifically, the rational functions r _{u} in (24) define a surjective mapping*

In view of (25), the dimension of π(*V*)_{}, once completed to a variety, is at most *L* − 1. For instance, in the case considered in [27, 41], with a single kinase and a single phosphatase, π(*V*)_{} is a rational curve.

Our results show that the exponentially large number of state variables, *L* + *N* + *P*, of a multisite PTM system is determined at steady state by a relatively small “core” of *L* variables. We have provided in Propositions 1 and 2 a linear algebraic algorithm for calculating all steady-state variables in terms of the core. Tools like Mathematica can readily carry out linear algebra over symbolic fields like (*Q*) and an example of such a program is available as the Supplementary Information to [27].

Previous steady-state analyses of multisite PTMs have largely focussed on phosphorylation. They have either used approximations, such as Michaelis-Menten or linear kinetics [14, 26, 28, 37], which ignore sequestration effects when enzymes have multiple substrates [4], or made simplifying biological assumptions, such as small numbers of sites [27] or sequential modification [16], which limits their applicability. The results of the present paper provide a foundation for developing models that are closer to biological reality while also extending the scope of analysis to allow for multiple enzymes, multiple types of modification and multiple substrates. The permitted biochemical mechanisms are also considerably enlarged. Our method of proof reveals the significance of the Matrix-Tree theorem, which seems to play a key role in several forms of algebraic elimination in biochemical networks [7, 23].

The present paper develops the methodology. Applications of these results are found in previous papers which have focussed on phosphorylation with a single kinase, single phosphatase and single substrate [27, 41]. The capability to treat the number, *n*, of sites as a variable has allowed us to show that, for appropriate rate constants, the number of stable phospho-form distributions can be as many as (*n* + 2/2, where *x* denotes the greatest integer not greater than *x* [41]. In particular, the number of stable phospho-form distributions increases with *n*. While some of these distributions are focussed on a few phospho-forms, others are more diffuse. Since different phospho-forms may have distinct biological effects [31, 34, 45], the phospho-proteome could be capable of substantial information processing. Indeed, it has been suggested that the remarkable variety of PTMs found on histone proteins in chromatin form a “code” for transcriptional regulation [21, 42]. The analysis in [41] provides the first example of a PTM mechanism which is capable of encoding arbitrary amounts of information and gives an estimate of its information capacity.

A second application, for systems with two sites, has shown that the steady-state geometry can distinguish between different reaction networks [27]. The geometry is detected algebraically through the use of “invariants”, or polynomial functions of the steady-state phospho-form concentrations which depend only on the rate constants [17]. Invariants take the same value no matter what amounts of enzymes and substrates are present or what steady state is formed. To exploit such results we have been developing, in collaboration with Hanno Steen’s group at Children’s Hospital in Boston, mass-spectrometric methods for accurately quantifying phospho-form distributions. We are using this to map out, for the first time, the steady-state geometry of a kinase, phosphatase, substrate system: the MAP kinase Erk, which is doubly phosphorylated by the MAP kinase kinase Mek and dephosphorylated by the dual-specificity phosphatase MKP3. Our experimental studies have already shown that these proteins engage in a more complex set of reactions than is commonly described in the literature and we are developing the method of invariants as a tool to work out the missing pieces.

Our results suggest several directions for future investigation. Which biochemical reaction networks have “cores”, or small subsets of variables in terms of which all others can be calculated at steady state? If a core exists, is it unique? How can cores be identified and how can the functional relationship with non-core variables be algorithmically determined? Do these functional relationships give rational parameterisations of the steady-state variety? A particularly interesting generalisation would be to allow substrates to also be enzymes, thereby accommodating kinase cascades. The difficulty here is that those substrates that are also enzymes can, presumably, no longer be eliminated and must hence be in the “core”, while, at the same time, as substrates, these variables may have non-trivial algebraic dependencies with other core variables. In the case treated here, the variables in the core are independent: their values can be arbitrarily assigned (Theorem 2). In more general cases there may be additional algebraic constraints on the core variables. We believe that the language and methods of algebraic geometry [6, 27] will be particularly useful for unravelling such issues.

Looking further ahead, it is a tantalising question as to whether the elimination procedures developed here can be extended from the steady state to the dynamics, perhaps, initially, to the local vicinity of the steady state. Since steady-state stability is determined by the eigenvalues of the Jacobian, it does not seem implausible that differential algebraic methods could encompass both the steady state itself as well as its local vicinity. More globally, multisite PTM systems have other attractors, such as limit cycles, of which the cyanobacterial circadian oscillator is a particularly significant example [30, 36]. Hilbert’s sixteenth problem asks about the number of limit cycles of two-dimensional polynomial dynamical systems. It remains unsolved, although some lower bounds are known [2]. Several lines of evidence, including the work presented here, suggest that the polynomial dynamics arising from biochemical reaction networks has very good properties at steady state. Could the same be true for other attractors like limit cycles?

Biologists are continually striving to elicit general principles from experimental data. Mathematical methods have been of less help in this respect than in accounting for the results of individual experiments. The present paper provides the tools to reason about post-translational modification systems without having to fix in advance many of the individual details, such as the number of enzymes or the combinatorics of modification. Being able to rise above the molecular complexity while retaining biological and biochemical realism provides a complementary capability to that of simulation. If this capability can be extended to a broad range of cellular processes, we will have a powerful tool with which to articulate the principles of cellular information processing.

The research undertaken here was supported in part by NIH under grant R01-GM081578. We thank Bernd Sturmfels for pointing us to the Matrix-Tree theorem, Alicia Dickenstein for many stimulating scientific discussions and the Statistical and Applied Mathematical Sciences Institute (SAMSI) for supporting an extended visit by JG to the Program on Algebraic Methods in Systems Biology and Statistics, during which this paper was drafted.

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1. Angeli D, Ferrell JE, Sontag ED. Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems. Proc. Natl. Acad. Sci. USA. 2004;101:1822–1827. [PubMed]

2. Christopher CJ, Lloyd NG. Polynomial systems: a lower bound for the Hilbert numbers. Proc. Roy. Soc. Lond. A. 1995;450:219–224.

3. Chung FRK. Spectral Graph Theory. No. 92 in Regional Conference Series in Mathematics. American Mathematical Society. 1997

4. Ciliberto A, Capuani F, Tyson JJ. Modeling networks of coupled enzymatic reactions using the total quasi-steady state approximation. PLoS Comp. Biol. 2007;3:e45. [PMC free article] [PubMed]

5. Cornish-Bowden A. Fundamentals of Enzyme Kinetics. 2nd Edition. London, UK: Portland Press; 1995.

6. Cox D, Little J, O’Shea D. Ideals, Varieties and Algorithms. 2nd Edition. Springer; 1997.

7. Craciun G, Dickenstein A, Shiu A, Sturmfels B. Toric dynamical systems. j. Symb. Comp., to appear. 2008

8. Craciun G, Tang Y, Feinberg M. Understanding bistability in complex enzyme-driven reaction networks. Proc. Natl. Acad. Sci. USA. 2006;103:8697–8602. [PubMed]

9. Feinberg M. Lectures on Chemical Reaction Networks, lecture notes, Mathematics Research Center, University of Wisconsin, 1979. 1979 www.che.eng.ohio-state.edu/~feinberg/research/

10. Ferrarese A, Marin O, Bustos VH, Venerando A, Antonelli M, Allende JE, Pinna LA. Chemical dissection of the APC Repeat 3 multistep phosphorylation by the concerted action of protein kinases CK1 and GSK3. Biochemistry. 2007;46:11902–11910. [PubMed]

11. Ferrell JE, Bhatt RR. Mechanistic studies of the dual phosphorylation of mitogen-activated protein kinase. J. Biol. Chem. 1997;272:19008–19016. [PubMed]

12. Fischer EH. Protein phosphorylation and cellular regulation, II. In: Ringertz N, editor. Nobel Lectures, Physiology or Medicine 1991–1995. Singapore: World Scientific; 1997.

13. Gatermann K, Huber B. A family of sparse polynomial systems arising in chemical reaction systems. J. Symbolic Computation. 2002;33:273–305.

14. Goldbeter A, Koshland DE. An amplified sensitivity arising from covalent modification in biological systems. Proc. Natl. Acad. Sci. USA. 1981;78(11):6840–6844. [PubMed]

15. Gunawardena J. 2003. Chemical Reaction Network Theory for *in-silico* biologists, Lecture notes, Harvard Univ, 2003. vcp.med.harvard.edu/papers/crnt.pdf.

16. Gunawardena J. Multisite protein phosphorylation makes a good threshold but can be a poor switch. Proc. Natl. Acad. Sci. USA. 2005;102:14617–14622. [PubMed]

17. Gunawardena J. Distributivity and processivity in multisite phosphorylation can be distinguished through steady-state invariants. Biophys. J. 2007;93:3828–3834. [PubMed]

18. Gunawardena J. Models in systems biology: the parameter problem and the meanings of robustness. In: Lodhi H, Muggleton S, editors. Elements of Computational Systems Biology. Wiley Book Series on Bioinformatics. John Wiley and Sons, Inc.; 2010.

19. Herstein I. Topics in Algebra. Wiley; 1975.

20. Hunter T. The age of crosstalk: phosphorylation, ubiquitination and beyond. Mol. Cell. 2007;28:730–738. [PubMed]

21. Jenuwein T, Allis CD. Translating the histone code. Science. 2001;293:1074–1080. [PubMed]

22. Kim SY, Ferrell JE. Substrate competition as a source of ultrasensitivity in the inactivation of Wee1. Cell. 2007;128:1133–1145. [PubMed]

23. King EL, Altman C. A schematic method of deriving the rate laws for enzyme-catalyzed reactions. J. Phys. Chem. 1956;60:1375–1378.

24. Krebs EG. Protein phosphorylation and cellular regulation, I. In: Ringertz N, editor. Nobel Lectures, Physiology or Medicine 1991–1995. Singapore: World Scientific; 1997.

25. Kruse J-P, Gu W. SnapShot: p53 posttranslational modifications. Cell. 2008;133:930. [PMC free article] [PubMed]

26. Lisman JE. A mechanism for memory storage insensitive to molecular turnover: a bistable autophosphorylating kinase. Proc. Natl. Acad. Sci. USA. 1985;82:3055–3057. [PubMed]

27. Manrai A, Gunawardena J. The geometry of multisite phosphorylation. Biophys. J. 2008;95:5533–5543. [PubMed]

28. Markevich NI, Hoek JB, Kholodenko BN. Signalling switches and bistability arising from multisite phosphorylation in protein kinase cascades. J. Cell Biol. 2004;164:353–359. [PMC free article] [PubMed]

29. Moon JW. Counting Labelled Trees. No. 1 in Canadian Mathematical Monographs. Canadian Mathematical Congress. 1970

30. Nakajima M, Imai K, Ito H, Nishiwaki T, Murayama Y, Iwasaki H, Oyama T, Kondo T. Reconstitution of circadian oscillation of cyanobacterial KaiC phosphorylation in vitro. Science. 2005;308:414–415. [PubMed]

31. Park K-S, Mohapatra DP, Misonou H, Trimmer JS. Graded regulation of the Kv2.1 potassium channel by variable phosphorylation. Science. 2006;313:976–979. [PubMed]

32. Pesavento JJ, Bullock CR, LeDuc RD, Mizzen CA, Kelleher NL. Combinatorial modification of human histone H4 quantitated by two-dimensional liquid chromatography coupled with top down mass spectrometry. J. Biol. Chem. 2008;283:14927–14937. [PMC free article] [PubMed]

33. Phanstiel D, Brumbaugh J, Berggren WT, Conrad K, Feng X, Levenstein ME, McAllister GC, Thomson JA, Coon JJ. Mass spectrometry identifies and quantifies 74 unique histone H4 isoforms in differentiating human embryonic stem cells. Proc. Natl. Acad. Sci. USA. 2008;105:4093–4098. [PubMed]

34. Pufall MA, Lee GM, Nelson ML, Kang H-S, Velyvis A, Kay LE, McIntosh LP, Graves BJ. Variable control of Ets-1 DNA binding by multiple phosphates in an unstructured region. Science. 2005;309:142–145. [PubMed]

35. Roach PJ. Multisite and hierarchal protein phosphorylation. Journal of Biological Chemistry. 1991;266:14139–14142. [PubMed]

36. Rust MJ, Markson JS, Lane WS, Fisher DS, O’Shea E. Ordered phosphorylation governs oscillation of a three-protein circadian clock. Science. 2007;318:809–812. [PMC free article] [PubMed]

37. Salazar C, Höfer T. Versatile regulation of multisite protein phosphorylation by the order of phosphate processing and protein-protein interactions. FEBS J. 2007;274:1046–1060. [PubMed]

38. Shacter-Noiman E, Chock PB, Stadtman ER. Protein phosphorylation as a regulatory device. Philos. Trans. R. Soc. Lond. B. 1983;302:157–166. [PubMed]

39. Soulé C. Graphic requirements for multistationarity. ComPlexUs. 2003;1:123–133.

40. Strogatz SH. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering. Perseus Books. 2001

41. Thomson M, Gunawardena J. Unlimited multistability in multisite phosphorylation systems. Nature, to appear. 2009 [PMC free article] [PubMed]

42. Turner B. Cellular memory and the histone code. Nature. 2009;460:274–277. [PMC free article] [PubMed]

43. Tutte WT. The dissection of equilateral triangles into equilateral triangles. Proc. Camb. Phil. Soc. 1948;44:463–482.

44. Walsh CT. Roberts and Company. Colorado: Englewood; 2006. Posttranslational Modification of Proteins.

45. Wu RC, Qin J, Yi P, Wong J, Tsai SY, Tsai MJ, O’Malley BW. Selective phosphorylations of the SRC-3/AIB1 coactivator integrate genomic responses to multiple cellular signaling pathways. Mol. Cell. 2004;15:937–949. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |