|Home | About | Journals | Submit | Contact Us | Français|
Although characterization of the genotype has undergone revolutionary advances as a result of the successful genome projects, the chasm between our understanding of a fully characterized gene sequence and the phenotypic repertoire of the organism is as broad and deep as it was in the pre-genomic era. There are three fundamental unsolved problems that provide the context for the challenges in relating genotype to phenotype. We address one of these and describe a generic method for constructing a system design space in which qualitatively distinct phenotypes can be identified and counted, their relative fitness analyzed and compared, and their tolerance to change measured.
Interest in the physical characteristics that parents pass on to their offspring is as old as recorded history . Although Aristotle postulated that transmission occurred through the blood, the issue eventually became the scientific question of relating genetic determinants of the parents to the phenotypic characteristics expressed by the progeny, which is recounted in the history of genetics going back to Mendel . Throughout the pre-genomic era there was steady progress in our understanding of this relationship. However, the announcement of the draft sequence of the human genome in 2000 revealed the true magnitude of the problem. On the one hand, we now have a detailed genotype in the form of a sequenced genome; on the other, we know phenotypes such as hair color of cats, shape and size of flowers, height and weight of livestock, not to mention disease states in humans. The difficulty in relating these two levels of biological organization and function is hard to over-estimate. As Sydney Brenner has stated , “The problems faced by pre-and post-genomic genetics are … much the same -- they all involve bridging the chasm between genotype and phenotype”. Moreover, between the levels of genotype and phenotype of the organism there are many intervening levels that form a rich hierarchy of molecular subsystems. Although there are some intuitive notions of what is meant by phenotype at the level of the organism, it is far from clear what the term phenotype means at the level of the intervening biochemical systems and what the phenotypic repertoire of any given system might be.
The primary goal of this paper is to define what we mean by a phenotype at the level of a biochemical system, to enumerate the qualitatively distinct phenotypes according to this definition, and to compare their behaviors. Integral to this process is a generic method for the construction of the system design space.
To provide a context for our approach consider two fundamental problems that need to be solved in relating genotype to phenotype in a given environment: (1) the task of going from genome sequence to a model of the system and (2) the task of going from the model to the phenotype of the system in the given environment. The first task is the ongoing effort of experimental biology, and, whether considered from the bottom-up approach of molecular biology or the top-down approach of high-throughput technologies and computation, the magnitude of the problem is enormous . The difficulties of arriving at such a model involve issues of parameter estimation requiring model reduction  and sensitivity analysis . The second task, assuming that an explicit model or suite of models is in hand, is also a daunting problem. Exploring the full phenotypic potential of even a relative small nonlinear model, say with 15 parameters, by analytical methods is intractable. Furthermore, an empirical sampling of alternatives suffers from a combinatorial explosion. (If one were to sample just 10 values of each parameter in all combinations it would require 1015 simulations, and one might still have missed some important behavior beyond the sampled range, or between sampled points.)
We will address the second of these fundamental unsolved problems and focus on the notion of a “system design space” for relating the parameters of a model to its qualitatively distinct phenotypes. We will describe a generic method for constructing such a design space in which qualitatively distinct phenotypes can be identified and counted, their relative fitness analyzed and compared, and their tolerance to global change measured. The method is most easily understood in the context of a simple specific example. For this purpose we have selected one of the well-studied gene circuits of bacteriophage lambda.
It is important to realize that we could consider for our illustration any given model structure with any given values for its parameters. How closely the model reflects “reality” is not the issue here. Indeed, a model is simply our current working hypothesis at any stage in the scientific method. At any given stage in its development one must extract implications of the model in order to design experiments for testing the working model. Discrepancies between predictions from the model and the results from experimental tests are expected and used to modify the model and repeat the cycle. Regardless of how the model structure is determined and the values of the parameters estimated, one is faced with the task of revealing the phenotypic repertoire for the working model so that experiments can be designed for its testing.
The lytic/lysogenic decision of phage lambda has long been studied as a model of cell-fate decisions in higher organisms, and has also been explored by mathematical modeling [7, 8]. The regulator CII initially is a critical agent in this decision, but once the developmental path to lysogeny is established it no longer plays a role . The lysogeny/induction decision has received less attention as a model system, but it has also been well studied. The classical view held that CRO and CI form a double-negative bistable switch that plays an essential role in initiation of the induction process . However, there is now evidence suggesting that CRO is not required for initiation of induction [11-13], although it can influence the threshold level of DNA damage that initiates induction  and the subsequent progress of lytic development .
The core regulator of the lysogenic state is the CI protein, which as a dimer represses the early lytic transcripts from promoters pL and pR by binding to operator sites OL and OR while activating its own transcription from promoter pRM at low concentrations and repressing it at high concentrations. Following DNA damage, which leads to an SOS response, RecA protein is activated and stimulates CI monomer auto-cleavage. The lowering of CI concentration, and hence the dimer concentration, results in de-activation of cI transcription and de-repression of lytic functions.
A diagram emphasizing the core features of CI regulation is shown in Fig. 1 along with a version having symbols that simplify the mathematical notation.
Early mathematical models of this system have been review by Santillan and Mackey . Well-established features of these models include first-order loss of CI mRNA, first-order translation of the CI mRNA, dimerization of CI monomers, cooperative activation of pRM transcription by low concentrations of CI dimmers and repression at high concentrations. These well-established features of the cI gene circuit are included in a number of detailed kinetic [14-18] and stochastic [19-21] models that incorporate the recent experimental results of Dodd et al .
The following is a kinetic model that includes these conventional features of the system. It differs from existing models by the assumption of a phenomenological Hill-like function for the rate of pRM transcription and the potential for cooperative RecA* catalysis of CI monomer degradation under substrate-limited conditions. The first assumption appears to represent adequately the experimental data of Dodd et al , as shown below. The second assumption addresses the detailed kinetics of CI degradation, which is unknown because it has been difficult to study for technical reasons [J.W. Little, personal communication]. Nevertheless, fitting experimental data with this model yields an estimate of the capacity for regulation of CI degradation that is consistent with the limited experimental data available , as discussed below.
The behavior represented by these equations can be described as follows.
At very low concentrations of D, there is a basal rate of transcription given by γ M. As the concentration of D increases there is a cooperative activation of the rate, which is characterized by the Hill number p. When the concentration of D exceeds the intermediate value needed for half-maximal activation, the activation constant KD, the increase in rate diminishes as it approaches the maximum velocity given by γ Mmax. As the concentration of D approaches and then exceeds the inhibition constant KI, there is a cooperative repression of the rate, which is characterized by the Hill number n. The loss of mRNA, M, is first order with rate constant δ M.
The rate of translation is first order in the concentration of M with rate constant γ C. At very low concentrations of R, there is a basal rate of loss of C given by δ C. As the concentration of R increases there is a cooperative activation of the rate, which is characterized by the Hill number a. When the concentration of R exceeds the intermediate value needed for half-maximal activation, the activation constant KR, the increase in rate diminishes as it approaches the maximum velocity given by δ Cmax. The reversible dimerization of CI is characterized by the second-order rate constant γ D and the first-order rate constant β D.
The number of parameters and independent variables that need to be specified can be reduced without loss of generality by normalizing about a particular value, taken arbitrarily here to be the fully induced state. The result of normalizing Eqns. (1) - (3) is the following set of equations:
where , , , , , , , , , , , τ = δ Mt, , and . Note that d, κ and χ all have the same normalization factor (1 nM−1), so determining the factor relating D to d allows for the explicit determination of KD and KI from κ and χ, respectively.
Dodd et al.  produced two sets of data critical for understanding the mechanism that controls cI gene transcription in lambda lysogens. Both sets of data, which are relevant for the numerical characterization of our model, entailed the construction of Escherichia coli strains in which the structural gene for CI is fused to an IPTG-inducible promoter and the pRM promoter is fused to the structural gene lacz encoding β-galactosidase. In the first case, the pRM promoter is associated with wild-type operator sequences that allow binding of CI dimers at low concentration to the operator sites OR1-OR2 and OL1-OL2 to activate transcription, and that allow binding of CI dimers at high concentrations to the operator sites OR3 and OL3 to repress transcription. In the second case, the pRM promoter is associated with a disabled operator site OL3 that fails to bind CI dimers thereby eliminating repression at high concentrations. The concentration of CI is determined from a standard curve relating the levels of IPTG and measured levels of CI. The rate of transcription from the promoter pRM is measured in units of β-galactosidase activity as in standard reporter assays. We fit our model to these data by minimizing the root mean square deviation with constrained quasi-Newton optimization [24, 25]. We initiated the algorithm multiple times over a broad range of parameters, and consistently found that best fit was obtained by the same parameter set. The confidence intervals for the parameters were determined by repeating the process on the same data sampled with ±10% error.
The data for the strain with the mutant OL3 site are shown as blue dots in Fig. 2. Fitting the model produced the following parameter values: basal level of expression γ M = 50.1 ± 5.63 LacZ units, maximal level of expression γ Mmax = 359 ± 15.3 LacZ units, the activation constant characterizing the concentration of D for half maximal activation KD = 0.138 ± 0.0130 μM, and the Hill number characterizing activation p = 3.01 ± 0.676. We chose reasonable values γ M = 50 LacZ units, γ Mmax = 355 LacZ units, KD = 0.130 μM, and p = 3, which produced the fitted blue curve in Fig. 2 and the normalized parameter values σ = 7.1 (the capacity for regulation or ratio of maximal to basal expression) and κ = 130.
The data for the strain with the wild-type OL3 site are shown as red dots in Fig. 2. Using the parameters determined from data for the mutant OL3 site and fitting the model produced the remaining parameter values: the inhibition constant characterizing the concentration of D for half maximal inhibition KI = 0.316 ± 0.0359 μM and the Hill number characterizing inhibition n =1.45 ± 0.248. We chose reasonable values KI = 0.320 μM, and n = 1.5, which produced the fitted red curve in Fig. 2 and the normalized parameter value χ = 320.
Finally, the black line, which represents the loss of D in a stable wild-type lysogen in rich media, is obtained by assuming the first-order rate of loss to be the same for monomer and dimer (i.e., θ = 2) and the Hill number for R activation of C self cleavage to be one (i.e., a = 1, the predicted value based on maximum robustness of the local behavior, see Section 4.2), and by adjusting the parameters α and ϕ to match the values of D in the fully-induced state and in the wild-type lysogen, thus ensuring effective switching between the lytic and lysogenic states (α = 215 and ϕ =10). It is difficult to determine experimentally α, the capacity for regulation of activated RecA (i.e., its dynamic range from basal to maximal level of expression during induction), but rough estimates based on experiments with LexA stability suggest a value of ~ 100 [23a, J.W. Little, personal communication], which indicates that our estimate has the correct order of magnitude.
Other parameters that will be used in the later analysis include ψ, δ M and δ D. The value of ψ can be deduced from the previously estimated values and the equilibrium constant for the dimerization of CI (β D / γ D). The literature provides a broad range, from 0.05nM  to 5nM . Assuming a value of ~ 1 nM yields ψ = 10. For the message that has a half-life of 6 minutes  the value of δ M = ln(2) / 6 min-1, and for a culture growing exponentially in a rich media with a doubling time of 20 minutes  the value of δ D = ln(2) / 20 min-1.
Given the assumed nonlinear structure of our model (Section 2.1), and now our fitting of its parameter values to the experimental data of Dodd et al , we obtain a particular model that provides an appropriate example to address the second task described in the Introduction. Namely, how can we define the qualitatively distinct phenotypes, identify their number, and compare their behaviors? Recall that this task is separable from the first (i.e., how the particular model was obtained in the first place), which may have involved extremely difficult issues including model reduction and parameter estimation that are not the focus here.
The generic method we have proposed is based on the rate law description of chemical reactions expressed in the power-law formalism, which can be considered a canonical nonlinear formalism from four different perspectives . From the fundamental perspective, it includes the generalized mass action (GMA) representation for which traditional mass action is a special case when the exponential parameters are restricted to small positive integers. From the local perspective, it provides a general representation in logarithmic space that is guaranteed by Taylor’s theory to be accurate within a well-defined neighborhood about a nominal operating state. This representation is the most tractable and appropriate for systems in a well-regulated homeostasis. From the piecewise power-law perspective, it provides a global representation constructed of local descriptions that can be made as accurate as desired, again according to Taylor’s theory. From the recast perspective, it provides an exact global representation for an enormous scope of nonlinear equations. Thus, the power-law formalism provides a nonlinear representation that is sufficiently general to describe biochemical models composed of rate laws having essentially any form of biological interest.
Our generic construction makes use of each of the different perspectives mentioned above, as will become clear in the following sections. For the purposes of outlining the generic method for construction of the system design space we have selected a particular model, which includes both mass-action and rational-function nonlinearities, with specific values of its parameters. Extracting the behaviors of such a model, which is the task we are about to address, is required for the design of experiments to test the adequacy of a model, which need not concern us here. Additional details of the construction method and examples of other system design spaces can be found elsewhere [30, 31]. It should be emphasized that construction of the system design space facilitates the characterization of the fixed points within each of the phenotypically distinct regions. Hence, the steady-state equations play an essential role.
Creating the dimensionless parameter groups (Section 2.2) is actually the first step in the construction. The second is setting the derivatives to zero in Eqns. (4) - (6) to obtain the steady-state equations. In this case, these equations can be combined to yield the following algebraic equation involving d and the independent variable r, which may be considered the environment of this system:
This equation can be recast [32, 29] exactly into a generalized mass action (GMA) system of equations within the power-law formalism by letting x1 = d and introducing two new variables x2 = κp + dp + χ−ndn+p and . The resulting system of equations is
Each side of the steady-state equations [Eqns. (8) - (10)] is, in general, a sum of several terms. When one term on each side of each equation is dominant (i.e., is the term with the greatest magnitude on its side), the system of equations can be approximated locally by a second system of equations, the S-system representation, which has a single analytical solution that is linear in the logarithms of the concentration variables and rate constants [33, 34]. The combination of several such equations, following the classical methods of Bode as adapted for biochemical systems (see Appendix B of ref. ), produces a third system of equations, the piecewise power-law representation . The boundaries between pieces according to this method are not arbitrary, but are determined by the parameters of the original system. This third representation of the original system is the key to constructing the system design space and identifying qualitatively distinct phenotypes.
Since each term on each side of Eqns. (8) - (10) is potentially dominant, there are as many potential solutions as there are combinations of terms in the piecewise power-law representation; hence a bound on the total number of steady-state solutions (T) is provided by
where n is the number of equations and Li and Ri are the number of terms on the left- and right-hand sides of the ith equation. (In the case of Eqns. (8) - (10), the bound is T = (3 * 3) * (1 * 3) * (1 * 2) = 54.)
However, not all potential solutions are necessarily valid. The conditions for any given term to be dominant are provided by a set of linear inequalities in log space. A test of each potential solution against the inequalities necessary for its validity will determine whether or not a potential solution is in fact a valid solution. Substituting the valid solution into the corresponding dominance conditions yields a set of linear inequalities that defines the boundaries of the region in which the solution is valid. By following this strategy for the lambda cI gene circuit, and selecting the normalized parameters r and κ (see Section 2.3) for the axis of a 2-D plot that represents a slice through the design space, we obtain the result depicted in Fig.3. Note that only 10 of the 54 possibilities are valid solutions for this particular design space. Regions on the top of this design space (1,37,38) correspond to stable steady-state operating points with low values for the rate of transcription from the promoter pRM; we will henceforth refer to the qualitative nature of these as lytic-like or simply lytic states. Regions on the bottom of this design space (11,47,46,45) correspond to stable steady-state operating points with high values; we will henceforth refer to the qualitative nature of these as lysogen-like or simply lysogenic states. The regions in the middle (7,43,44) correspond to unstable steady-state operating points with intermediate values.
In each region of Fig. 3, the steady-state equations for the concentration of monomer CI (C), dimer CI (D) and mRNA (M) follow analytically from the form of the original nonlinear equations, independent of the particular numerical values for the parameters. Thus, one can in principle characterize the various regions and compare their behaviors analytically without knowing the values of the parameters. This has been done for simple systems [30, 31], but in practice it becomes intractable for large systems. The region of design space in which the system operates and the numerical positioning of the boundaries are determined when specific values for the parameters of the model are given.
The geometry of design space is determined algebraically and consists of straight-line boundaries with well-defined slopes and intersections in log space. These features are determined by the structure of the original model. In the case of Eqns. (1) - (3), these key landmarks are indicated in Fig. 3, and the x, y coordinates (log(r), log(κ) respectively) of the intersections and values of the slopes are summarized below.
Within each region there is a qualitatively distinct solution that is fully characterized  by well-established methods that quantify signal amplification (logarithmic gains ), robustness (parameter insensitivity ), and stability (eigenvalues ) involving local (small) changes in variables and parameters. Logarithmic gains, defined as L(Xi, Xk) = ln Xi / ln Xk, represent the amplification or attenuation of biochemical signals as they propagate through the system from input variables to output variables. Parameter sensitivities, defined as S(Xi, pj) = ln Xi / ln pj, represent the amplification or attenuation of changes in system variables in response to changes in the parameters that define the structure of the system. The eigenvalues, defined by the linearized system, are only a function of the system’s kinetic orders and turnover numbers. Each of these characteristics can be readily calculated and compared against a set of quantitative performance criteria to determine the relative fitness of systems represented in each region. These qualitatively distinct solutions we define as molecular phenotypes.
There are other measures of local robustness that include second-order parameter sensitivities [38, 6] and give a better indication of the synergy between parameters. These could be used to examine particular fixed points of the original equations, but at an increased computational expense. However, these second-order approaches are not relevant in the phenotypic regions we have defined because the equations in each region are already explicitly expressed in terms of linear functions in the logarithms of the independent variables and rate constants. Thus, the first-order sensitivities mentioned above are the relevant expressions and a second-order term would be zero.
Two criteria often found to correlate with the natural selection of a particular system design are the maximization (minimization) of a particular steady-state function and the response time following change. Moreover, when these values are optimal they often are robustly so (e.g., see ). Several criteria for effective local performance that reflect these expectations can be summarized as follows.
The lysogenic state should be locally robust to perturbations in the values of the N parameters that define the system, and this robustness can be quantified by the parameter sensitivities S(D, pi). It should not be influenced by fluctuations in the input signal (the level of DNA damage), which can be quantified by the logarithmic gain L(D, R). The robustness of the logarithmic gain in the face of perturbations in parameter values is also expected to be small, and this robustness can be quantified by the parameter sensitivities S[L(D, R), pi]. The dimer form of CI is responsible for the primary regulatory actions, but similar criteria hold for the monomer form of CI and the mRNA as well.
It is important to minimize the response time for restoring the system to its nominal steady state following small changes in the variables of the system, and this can be quantified in terms of the dominant eigenvalue λdominant. One could of course consider other criteria, but these will suffice for our purposes here.
The local robustness of the system in each of the phenotypic regions of system design space can be calculated analytically for the model in Fig. 1. From these results, one can predict maximum local robustness in most cases when the Hill numbers p and n are large, although tradeoffs are involved in determining the optimum values; whereas local robustness is maximal when a is small (data not shown). By inserting the parameter values estimated in Section 2.3 into these analytical results one can convert the analytical results into the numerical results summarized in Table 1. Small values imply superior performance for each of the criteria.
The local robustness in each of the phenotypic regions reveals that on average the influence of perturbations in parameters is attenuated (sensitivities less than one in the columns 2-4 in Table 1). The logarithmic gains (columns 5-7) describe the influence that changes in the environmental input signal (RecA levels) have on the output concentrations of the circuit (CI dimer, CI monomer, and mRNA). Fluctuations in the level of DNA damage have no influence on the phenotype when the circuit operates on the basal (regions 11, 7, 1) or maximal (regions 46, 44, 38) portions of the curve relating RecA activity to DNA damage. These fluctuations have an attenuated influence (logarithmic gain less than one in magnitude) on the phenotype when the circuit operates on the repressing portion (region 47) of the curve that relates rate of transcription to dimer concentration. They have an amplified influence (logarithmic gain greater than one in magnitude) on dimer concentration when the circuit operates on the upper plateau (region 45) or the lower plateau (region 37) of the curve relating rate of transcription to dimer concentration, or on the mRNA level when the circuit operates on the activating portion (region 43) of the curve relating rate of transcription to dimer concentration. In all cases, the influence that perturbations in parameters have upon the logarithmic gains (columns 8-10) is on average attenuated (sensitivities less than one). Overall, there is tendency for the local behavior to be more robust in regions with high values of the input variable (46, 44, 38) or regions with low values (11, 47, 7, 1), and to be less robust in the regions with intermediate values (45, 43, 37).
The local response times in each of the phenotypic regions of system design space are readily calculated. The half-times for the restoration of the steady states range from ~ 5 min (the fastest) to ~ 30 min (the slowest). Regions that correspond to basal levels of DNA damage (1, 11) are slower, while regions that correspond to high levels of DNA damage (38, 46) are faster. The response times shift from slow to fast over the intervening regions (37, 45, 47) as DNA damage increases. The same pattern is seen in the response time for divergence from the unstable steady states: faster in 44, slower in 7 and shifting from slow to fast in 43. On the whole, lysogenic regions (11, 47, 45, 46) are faster than their lytic counterparts (1, 37, 38), which is consistent with the experimental evidence for the local robustness of the lysogenic state .
Characterizing local (small) changes in performance is important for characterizing the qualitatively distinct phenotypes in each region, but it does not address the overall behavior when phenotypes change as a result of large environmental stimuli or mutation in components of the system. The partitioning of system design space into qualitatively different phenotypic regions provides a means, based on the boundaries, for calculating tolerances to global (large) changes in parameter values . These tolerances are defined for each parameter as the ratio of its value at the nominal steady state (normal operating point for the system) to its value on the boundary to an adjacent phenotypic region (or the inverse, depending on which value is larger). The geometry of system design space in Fig. 3 itself suggests other criteria that are relevant for assessing global behavior of the system.
Two criteria often found to correlate with the natural selection of a particular system design are large tolerances to a change in phenotype when the change represents dysfunction and fast responses to a change in phenotype when it is critical for survival. Moreover, when these values are optimal they often are robustly so (again, see ). Several criteria for effective global performance that reflect these tendencies can be summarized as follows.
The hysteretic region 43 in the design space of Fig. 3 provides a type of “safety factor”. Once a switch from lysogenic to lytic growth has been initiated, the phage becomes committed in the following sense: if there is a decrease in the SOS signal, it must be large enough to move the operation of the system completely back across the hysteretic region, any lesser decrease is insufficient to prevent completion of the switch. The hysteretic region also acts as a buffer to prevent inappropriate switching when there is only a transient small SOS signal, one that is insufficient to cause the system to cross the hysteretic region. These functions of the hysteretic region are enhanced by maximizing the horizontal distance between the inclined lines in Fig. 3, which can be quantified by the value of ΔH, which is given by σ[(2p−1)/(2pa)], or in terms of the original parameters
Again, this safety factor should be robust to parameter variations, and this robustness can be quantified by the parameter sensitivities S[ΔH, pi].
For long-term survival of the temperate life cycle, the phage must be capable of repeatedly switching from lysogenic to lytic growth and back again across the critical hysteretic region. This is facilitated by having a design that places the operating point of the system between the dashed lines in Fig. 3 and by having a large distance between these limits, which can be quantified by the value of ΔS, which is given by α2θ−2ϕ−2σ−(2p−1)/p, or in terms of the original parameters
Furthermore, this zone of operation should be robust to parameter variations, and this robustness can be quantified by the parameter sensitivities S[ΔS, pi].
The speed with which the phage can switch lysogeny OFF should be a maximum, given that this is a response to a critically deteriorating state of the host. This can be quantified by the half-time in going from a steady state with high CI concentration to one representing low CI concentrations, tOFF. Similarly, the speed with which the phage can switch lysogeny ON should be a maximum, although this may not be as critical as the need for speed of induction. This can be quantified by the half-time in going from a steady state with low CI concentration to one representing high CI concentrations, tON.
Finally, a system that has been selected to have good local as well as global performance is expected to reside within a phenotypic region of system design space far from the boundaries to other phenotypic regions in which the local performance is poorer. This corresponds to large global tolerances to parameter change. Global tolerances are quantified (according to the definition given at the beginning of Section 5) by calculating [Tlow,Thigh], where Tlow is the tolerance to a decrease and Thigh to an increase in a parameter value.
The size of the hysteretic buffer and the robustness of this buffer can be calculated analytically. The numerical results using the values of the parameters in Section 2.3 show that the breadth of this zone is represented by a 5-fold change in the normalized RecA activity. The parameter sensitivities of this ΔH value are attenuated with an average value of 0.242 ± 0.468.
The aspects of global behavior pertaining to the maintenance of a broad zone of switching across the hysteretic buffer, and the robustness of this zone in response to perturbations in parameter values, also can be calculated analytically. The numerical results, using the values of the parameters in Section 2.3, show that the breadth of this zone is represented by a 4-fold change in the normalized Michaelis constant for activation. The parameter sensitivities of this ΔS value are attenuated with an average value of 0.654 ± 0.627.
The global switching times are determined by solving the normalized equations [Eqns. (4) - (6)]. The results, using the values of the parameters in Section 2.3 and increasing the value of r from 0.00465 to 1.0, show that the half-time to switch lysogeny OFF is approximately 8 minutes for a culture growing exponentially in a rich media. Alternatively, when decreasing the value of r from 1.0 to 0.00465, the half-time to switch lysogeny ON is approximately 54 minutes, about 7-fold longer, under the same conditions.
One could use the local equations in each region to solve for the transient from an initial condition within one region until a boundary is reached, use the final values within the first region as the initial values for the solution within the second region, and solve for the transient behavior until a new steady state is achieved, or until another boundary is reached and the process repeated as necessary. Indeed, there are well-established precedents for this in the modeling hybrid systems . However, there are errors near the boundaries where the switching occurs that may be an issue is some cases. In any case, It is just as easy to solve the original equations when exploring the dynamics between the phenotypic regions that have been identified by the design space approach.
The global tolerances to large changes in parameter values are calculated from knowledge of their wild-type value within the region and their value on the boundary of the region in the system design space (Fig. 3). The values of tolerances for the combination of the best lysogenic regions (11 and 47) are summarized in Table 2. The most stringent tolerances are ~ 1.6-fold. In terms of total tolerance, the product of the fold increase and the fold decrease, the range varies from a low of ~ 2.7-fold variation to a high that is effectively infinite.
Although it had long been thought that CRO and CI form a double-negative bistable switch that plays an essential role in initiating induction, there is now evidence suggesting that CRO is not required for the initiation process. This conclusion is based on experiments that attempt to eliminate repression of transcription from pRM by CRO using mutant strains in which transcription from pRM is otherwise unaltered. In none of the experiments [11 - 13] is this ideal achieved. The evidence suggests that induction can be initiated without the involvement of CRO, although CRO does affect the threshold level of RecA activity necessary for induction  and the subsequent development of the lytic program . The results of the analysis in Section 5 support this revised view that CI itself is the core regulator governing initiation of the induction process. These results indicate that the cI gene circuit is robustly designed by natural selection when judged according to criteria for local performance (Table 1) and global performance (Table 2). Nevertheless, it is clear that factors outside the core cI gene circuit, such as the regulators CRO and CII, play roles in conjunction with CI. These roles can be interpreted in the context of the system design space in Fig. 3.
The regulator CRO influences the threshold level of DNA damage necessary to initiate induction. Atsumi and Little  have shown that without CRO, approximately 40% more DNA damage is required for induction. As damage and SOS activity increase, the level of CI is reduced. Transcription of cro then increases somewhat and the elevation in CRO leads to a partial repression of cI transcription that counteracts the derepression caused by the lowering of CI itself. The partial repression of cI transcription by CRO can be interpreted as an effective decrease in σ, the capacity for regulation of cI transcription, although this is a dynamic process and a more detailed analysis would be necessary to make quantitative predictions. When CRO is absent, this has the effect of shifting the d-e boundary in Fig. 3 to the right by the same amount as the decrease in σ. Thus, the 40% increase in the threshold level of r that is necessary for induction (or ~ 0.15 of a log unit in Fig. 3), represents approximately a 40% decrease in σ. Alternatively, when CRO is present, this has the effect of shifting the d-e boundary in Fig. 3 to the left by ~ 0.15 of a log unit, thus lowering the threshold level of r that is necessary for induction. Thus, induction and variations it its threshold are associated with movements in the horizontal direction between the dashed lines in Fig. 3.
The regulator CII assists in the establishment of lysogeny by providing an initial enhancement in the level of cI transcription from a separate promoter pRE that adds to the basal rate of transcription from the promoter pRM . The transcription from pRE is shut off once the commitment to lysogeny is established. This enhancement corresponds to a transient elevation in σ, the capacity for regulation of cI transcription. This has the qualitative effect of shifting the boundaries in Fig. 3 upward (and the d-e boundary to the right), thus favoring lysogeny by enlarging the lysogenic regions at the expense of the lytic regions. Again, this is a dynamic process and a more detailed analysis would be necessary to make quantitative predictions.
An increased multiplicity of infection leads to early elevation of CII concentrations . As just noted, this causes an increase in the parameter σ, which raises the boundary between the lytic region 1 and the lysogenic region 11. This implies an increased probability of lysogenic over lytic growth, which is consistent with the experimental observations of Kourilsky and Knapp . On the other hand, when cells are growing exponentially in poor versus rich media, CI will have an increased half-life (since this is largely dictated by the dilution rate). This causes an increase in the value of the parameter α = δ Cmax / δ C, which raises the boundary between region 7 and the lysogenic region 11. For example, the lysogenic state for cells grown in glucose minimal media with a doubling time of 60 minutes will have a value of α that is 3-fold larger than that for cells grown in LB broth . This corresponds to an upward shift of ~ 0.5 of a log unit in Fig. 3 and an increased tolerance for maintaining lysogeny, which is also consistent with the experimental observations of Kourilsky and Knapp . This result highlights the critical role played by the half-life of CI, which is discussed further below. Thus, many of the pivotal events early in the lytic versus lysogenic decision appear to be associated with movements in the vertical direction across region 7 in Fig. 3.
The ratio of mRNA to CI half-lives has a dramatic influence on the global switching times. This ratio is ~ 0.3 for the lysogen in a host cell with a doubling time of 20 minutes. During induction, when the half-life of CI is sharply reduced, this ratio rises to ~ 65. This large change in ratio would be responsible for a marked difference in switching times (as seen in Section 5.2), if it were not for the early overriding effect of CII. The time for the core circuit to turn OFF lysogeny, ~ 8 minutes, should be shorter than the time that the earliest lytic functions appear, and this is consistent with the results of Osterhout et al.  showing that transcripts of early lytic functions are first detected at around 25 minutes following induction.
The results in Sections 4 and 5 suggest that the system design of the cI gene circuit is not only robust to local (small) changes in parameters, but it is also remarkably tolerant to global (large) changes in parameters and environmental stimuli. For example, the system is capable of tolerating an approximate 4-fold range in values of the activation constant for cI transcription (KD) without losing the ability to reciprocally switch from lysogenic to lytic growth and back. This is consistent with the work of Little and colleagues [39a, 45, 46] in which they have constructed variants with major changes in the binding constants, and indeed the identity of the circuit elements, and yet many of their constructs retain the essential qualitative features of the wild-type phage. It should be noted that large values of r (major DNA damage) combined with low values of κ (including the activation constant KD) will lower the concentration of CI dimer while maintaining a high rate of transcription from the promoter pRM, which we have defined qualitatively as a lysogen-like state. If the level of dimer were lowered sufficiently, then lysogeny could not be maintained and induction would ensue.
It must be emphasized that the boundaries, which are key to the construction of the system design space, are not arbitrarily defined or selected in an ad hoc way to provide the most accurate representation of the original system. Indeed, they represent a judicious approximation that illuminates the qualitative nature of the phenotypes. The boundaries are dictated by the form of the original equations and by the kinetic parameters found therein. It is difficult to imagine another method of defining these boundaries that is as tractable and revealing as the method we have described here.
A generic approach has been used to construct a system design spaces for the core cI gene circuit of phage lambda in which qualitatively distinct phenotypes can be identified and counted, their relative fitness analyzed and compared, and their tolerance to large changes measured quantitatively. This 2-D design space is a dimensional compression of the 15-D parameter space, yet it contains all of the original parameters and independent variables that characterize the original system. Moreover, the manner in which the original parameters are combined to define landmarks in the system design space reveals important relationships among the parameters that are not evident in the larger parameter space.
The analysis within this framework reveals several interlocking principles that constitute the system design of phage lambda. The system design space provides a graphical means of representing these principles. One example is the size of the “hysteretic buffer” given by σ[(2p−1)/(2pa)]. Another example is the size of the switching zone across the hysteretic buffer given by α2θ−2ϕ−2σ(1−2p)/p. In addition to having a large size for this zone, it also is important for the normalized Michaelis constant for activation be located within this region, i.e., α−1σ2θϕ2 < κ <ασ1/p θ−1. Such relationships constitute key design principles governing the cI gene circuit. Others involving the half-lives of CI and its mRNA have been alluded to above.
While the specifics of the cI gene circuit may not be relevant to other gene circuits, the methods of constructing the system design space exhibited here are generic and might well lead to important insights when applied to the regulation of other systems. What will be needed for such an analysis is the formulation of a kinetic model that takes into account the essential interactions within the system and the appropriate experimental data that allows an estimation of the values for the model parameters.
We thank Pedro Coelho and Dean Tolla for fruitful discussions, and John Little for critical input in the construction of the model. This work was supported in part by a grant (RO1-GM30054) from the US Public Health Service to MAS, and by an Earl C. Anthony Fellowship to RAF.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.