|Home | About | Journals | Submit | Contact Us | Français|
Reversible phosphorylation on serine, threonine and tyrosine is the most widely studied posttranslational modification of proteins1,2. The number of phosphorylated sites on a protein (n) shows a significant increase from prokaryotes, with n≤7 sites, to eukaryotes, with examples having n≥150 sites3. Multisite phosphorylation has many roles4,5 and site conservation indicates that increasing numbers of sites cannot be due merely to promiscuous phosphorylation. A substrate with n sites has an exponential number (2n) of phospho-forms and individual phospho-forms may have distinct biological effects6,7. The distribution of these phospho-forms and how this distribution is regulated have remained unknown. Here we show that, when kinase and phosphatase act in opposition on a multisite substrate, the system can exhibit distinct stable phospho-form distributions at steady state and that the maximum number of such distributions increases with n. Whereas some stable distributions are focused on a single phospho-form, others are more diffuse, giving the phosphoproteome the potential to behave as a fluid regulatory network able to encode information and flexibly respond to varying demands. Such plasticity may underlie complex information processing in eukaryotic cells8 and suggests a functional advantage in having many sites. Our results follow from the unusual geometry of the steady-state phospho-form concentrations, which we show to constitute a rational algebraic curve, irrespective of n. We thereby reduce the complexity of calculating steady states from simulating 3×2n differential equations to solving two algebraic equations, while treating parameters symbolically. We anticipate that these methods can be extended to systems with multiple substrates and multiple enzymes catalysing different modifications, as found in posttranslational modification ‘codes’9 such as the histone code10,11. Whereas simulations struggle with exponentially increasing molecular complexity, mathematical methods of the kind developed here can provide a new language in which to articulate the principles of cellular information processing12.
A major difficulty in studying multisite phosphorylation from a systems perspective has been the lack of information regarding when different sites on the same protein are simultaneously phosphorylated13. Sites may be intricately dependent on each other14 and both the number and the position of phosphates can affect biological outcome6,7. The phospho-form distribution—the relative stoichiometries of each of the 2n phospho-forms—is thus the appropriate measure of phosphorylation state for a multisite substrate. Such distributions are starting to be measured15,16, prompted by interest in posttranslational modification (PTM) codes. A theoretical understanding will improve our ability to interpret such data, uncover biological principles and design appropriate experiments. Although we focus on phosphorylation, we hope to lay a foundation for analysing multiple PTMs.
Figure 1 summarizes a general model of multisite phosphorylation. A substrate S, with n sites, is acted on by a kinase E and a phosphatase F. Substrates may have multiple kinases and phosphatases in vivo but a single enzyme often addresses multiple sites and we focus here on the minimal enzymatic machinery needed for any n. Each enzyme may act distributively (Fig. 1a) as well as processively (Fig. 1b) using a standard biochemical scheme (Fig. 1c) and arbitrary preferences for site order. These assumptions are more general than in previous models17-22: by choosing the details appropriately, any kinase, phosphatase and substrate system can be represented. ATP is assumed to be recharged by some process that is not modelled. It is therefore not treated as a variable but its effect is absorbed into the site-specific parameters ( defined in Fig. 1c). Phosphorylation and dephosphorylation are assumed to take place on a fast time scale in comparison to synthesis and degradation of the component proteins. The model is, therefore, effectively closed: there is no flux of material through it and the total amounts of substrate, Stot, and enzymes, Etot and Ftot, remain constant at all times. With mass-action kinetics, these assumptions give rise to 3×2n nonlinear differential equations for the state variables (Fig. 1d).
With limited information on site-specific parameters, numerical simulations can be undertaken for randomly selected parameter values in an attempt to discern typical behaviours. However, doing so in a state space of dimension 3×2n rapidly becomes intractable as n increases. Here we introduce a new method of analysis, which allows strong conclusions to be drawn about steady states without having to specify parameter values in advance. Experimental evidence indicates that biological systems attain quasi-steady states in vivo23,24, including systems in which multisite phosphorylation has a significant role25. While the steady-state assumption must be confirmed in each experimental context, it has been widely used in modelling multisite phosphorylation17-22.
At steady state, the second equation in Fig. 1d yields an expression for XSu in terms of X and Su. Substituting this into the first equation in Fig. 1d makes that equation linear in Su. The coefficients are algebraic expressions in the site-specific parameters (collectively denoted a) and the auxiliary parameter t=E/F, the steady-state ratio of free kinase to free phosphatase. These expressions may be regarded as elements of a set of coefficients, (a, t), in which the a values and the t have been adjoined to the ordinary numbers, , as uninterpreted symbols that can be added, subtracted, multiplied and divided as if they were numbers (see the Supplementary Information for more details). The elements of (a, t) correspond to rational functions, or ratios of polynomials, in these parameters (Supplementary Information). By treating the parameters symbolically in this way, they can be used in calculations without their numerical values being known in advance.
The linearized equations for the Su can be solved by Gaussian elimination, which works as well over the coefficients (a, t) as over (Supplementary Information). The steady-state phospho-forms can thereby be shown to satisfy (Supplementary Information)
where S0 is the unphosphorylated phospho-form and ru(t) is a rational function of t with coefficients in (a). Although these rational functions are complex, they can be explicitly calculated for any given model (Supplementary Information).
If numerical values are to be given to the site-specific parameters, it is important to know that the rational functions ru(t) remain well defined. For instance, the rational function c/(1 – c) becomes undefined when c=1. We show that, for any positive site-specific parameter values and positive t, ru(t) is always well defined and positive (‘positivity’, Supplementary Information), ruling out such problems.
Equation (1) implies that the steady-state phospho-forms can be described through a single auxiliary variable, t, so that they form a one-dimensional geometric object, or ‘curve’. Despite increasing numbers of sites, and the exponentially increasing size of the model, the steady-state phospho-forms always remain a curve, providing the basis for an exponential reduction in complexity. What changes with n is the extent to which the curve undulates, which determines how many steady states can co-exist for given amounts of substrate and enzymes (see below). Not all curves can be described by rational functions; those that can are of considerable geometric interest, as explored in an earlier paper26 and discussed further in the Supplementary Information.
The upshot of equation (1) is that, at steady state, the 3×2n state variables are determined by S0, E and F. Because the substrate and enzyme totals remain constant, we can formulate three equations for these three unknowns. The equation for S0 can be solved directly in terms of Stot, leaving a pair of equations (defined in the Supplementary Information):
which determine the steady-state E and F values corresponding to any given substrate and enzyme totals. Equation (2) exactly characterizes the steady states of the model (Supplementary Information). To find steady states, it is no longer necessary to numerically simulate 3×2n differential equations; this can be done by only solving two algebraic equations. The complexity arising from the dynamics has been distilled away. This exponential reduction of complexity is the key to what follows.
Figure 2a shows an example with four sites and five steady states. (We assume sequentiality, as in Fig. 3a, but merely for convenience.) Only stable states are detected experimentally, or found by numerical simulation, and this example is tristable; the corresponding stable phospho-form distributions are markedly distinct. Whereas distributions 1 and 3 are each focused on a single phospho-form, distribution 2 is broader. With multiple stable states a system can encode many outcomes, or several bits of information, enabling complex information encoding and processing8. Such multistability is believed to underlie cellular differentiation and other decisions8,27 but experimental examples have, so far, only demonstrated bistability23-25. Bistability in two-site phosphorylation was previously shown by modelling20.
It remained unknown how the number of stable states depends on n. We found that multistability tends to occur when substrate is in excess over enzymes (as in Fig. 2a). In this approximation, the two equations for E and F in equation (2) can be reduced to one polynomial equation for t=E/F (Supplementary Information):
where ai (a) and N lies between n+1 and 2n depending on the model. Positivity of the ru(t) is essential here. Positive solutions of P(t)=0 correspond approximately to steady states. We show that the discrepancy between the exact steady states found by equation (2) and the approximate ones found by equation (3) can be made as small as desired by increasing the excess of Stot over Etot and Ftot (Supplementary Information).
The advantage of equation (3) over equation (2) is that we can readily construct solutions of the former. If n is even and we choose any n+1 distinct positive numbers, then we can always find a model for which the corresponding P(t) has these numbers as solutions (Supplementary Information). Moreover, Stot can be chosen arbitrarily, so that the approximation of equation (2) by equation (3) is as accurate as desired. It follows that the corresponding model has n+1 steady states. The example in Fig. 2a meets this bound for n=4. If n is odd the same can be achieved for any set of n distinct positive numbers (Supplementary Information). Note that odd numbers of steady states are constructed in both cases.
This is also what we find in simulations. Furthermore, if the steady states are ordered by their corresponding E/F values then unstable states always occur between stable ones (Supplementary Information), as in Fig. 2a. Hence, if there are 2k+1 steady states, k+1 of them are stable. It follows that a model with n sites can have as many as (n+2)/2 stable states (x being the greatest integer not greater than x). We see that the tristability in Fig. 2a is only the tip of the iceberg: the maximum number of stable states increases with increasing numbers of sites.
Experimental detection of stable states requires an understanding of how they arise dynamically. Figure 3b considers the inter-conversion of S0 and S1, for the sequential system in Fig. 3a. An informal argument based on the Michaelis–Menten approximation suggests that, for suitable parameter values, if the system is started entirely in S0 then substrate can remain trapped predominantly in that state, as in distribution 1 in Fig. 2a. A similar argument applies to trapping of S4, as in distribution 3. Simulations show that all three stable distributions can be reached by starting from suitable mixtures of S0 and S4 (Fig. 2b).
In vivo, the most likely way that a phosphorylation system is regulated is by modulating its enzymes. Because the enzymes have been assumed to be in their active states, this corresponds to altering Etot or Ftot. Figure 4a shows, for the tristable system in Fig. 2, that changes in Etot can switch the system between stable states 1 and 3. It is possible that more complex modulations, involving both Etot and Ftot, could access all three stable states or that different parameter values could facilitate additional switching capability. Figure 4b shows another option, in which changes in Etot can switch between three stable states even though there is only a narrow window of Etot values for which three stable states coexist. In other words, the system may not need to exhibit robust tristability to have access to three stable states.
DNA provides a static, structural mechanism for encoding information at a capacity of 2 bits per base pair. The idea that PTMs provide a dynamic mechanism for information encoding has been broadly influential10,11 but the mechanistic details remain a matter of debate9,28 and no estimate of information capacities has emerged for any such code. Our result provides the first demonstration of a PTM mechanism that can, in principle, encode an arbitrary amount of information, along with an estimate of its information capacity. If natural selection has found such a capability useful, that may help account for the emergence of large numbers of phosphorylation sites.
This work was supported in part by the NIH under grant R01-GM081578. We thank A. Manrai for scientific discussions, R. Ward for editorial help and HMS RITG for support with cluster computing. We acknowledge the encouragement of the late Stephen Thomson (1946–2006) and Charles Gunawardena (1929–2007).