The dynamics of how the constituent components of a natural system interact define the spatio-temporal response of the system to stimuli. Modeling the kinetics of the processes that represent a biophysical system has long been pursued with the aim of improving our understanding of the studied system. In addition to the usual difficulties faced in modeling the dynamics of physical or chemical systems, biological simulations encounter difficulties that result from the intrinsic multiscale and stochastic nature of biological processes.
This chapter discusses the implications for simulation of models involving interacting species with very low copy numbers, which often occur in biological systems and give rise to significant relative fluctuations. We present the conditions that necessitate the use of stochastic kinetic simulation methods and the mathematical foundations of the stochastic simulation algorithms. We also discuss how the well-organized structural hierarchies often seen in biological systems can lead to multiscale problems, and possible ways to address the resulting computational difficulties. We present the details of the existing kinetic simulation methods and discuss their strengths and shortcomings. A list of the publicly available kinetic simulation tools and our reflections on future prospects are also provided.
Reactions occurring among a set of reactants define a kinetic reaction network. Traditionally, the set of reactions is described by nonlinear ordinary or partial differential equations, solutions of which provide insights into the dynamics of the studied processes. The size of such networks can vary considerably. In some instances, hundreds of thousands of reactions may be needed to describe the complex, coupled phenomena. Although kinetic modeling and the underlying fundamental mathematics are used in vastly different fields of science and engineering, here we will concentrate on biological kinetic networks which model cellular activities as a set of reaction processes. Biological networks are intrinsically very rich in character; feedback loops are common, and bifurcations and oscillations can lead to interesting dynamical behavior. Although there are many commonalities, two aspects, in particular, distinguish biological networks from networks found in other fields: Firstly, in biological systems, copy numbers of many species are very low, which can give rise to significant relative fluctuations. Secondly, biological systems tend to have well-organized structural hierarchies.
The traditional way of modeling the time evolution of the molecular populations in a reacting system is to use a set of coupled, first-order, ordinary differential equations (ODEs) called the reaction rate equations (RRE). The RRE are usually inferred from phenomenological arguments under the assumption that the system is well-stirred or spatially homogeneous; that is, no persistent correlations develop among the positions of the molecules. The RRE are nonlinear in the presence of bimolecular reactions, which is almost always the case. Owing to the possibility of widely differing reaction rates, the RRE can often exhibit vastly different time scales, a condition known as dynamical stiffness. Sometimes, under certain simplifying assumptions such as quasi steady-state or partial equilibrium, the RRE reduce to a system of differential-algebraic equations (DAEs).
However, a deterministic approach alone may not be sufficient for cellular systems when the underlying problem is inherently stochastic. Fluctuations in the concentrations of biomolecules can be quite large and this may require special treatment in modeling the biological networks (1–4). This is particularly true for molecular species involved in the transcriptional regulation of gene expression. In certain such cases, as the small copy number limit is reached, differential equation-based models break down since the assumption of continuous molecular concentrations is no longer valid. In such cases a more general approach, based on stochastic methods in which natural fluctuations are treated in a direct and explicit manner, has been advocated for the quantitative analysis of biological networks (5–7).
A simple argument demonstrates why fluctuations can be considerable in cellular systems. The typical diameter of a microbial cell is a few micrometers (μm), which corresponds to a cellular volume on the order of ~10 femtoliter (fl). In this volume, the concentration of 1 molecule is roughly equal to 160 picomolar (pM). This discrete nature is, of course, more severe when the molecules are confined to even smaller substructures, which is typically the case in eukaryotic cells. Thus, given that typical binding affinities of the biomolecules are often in the pM to μM range, and ignoring whether the thermodynamics are still valid at this limit, even very small fluctuations in the molecule copy numbers can cause a regime change in biomolecular interactions. Therefore, fluctuations resulting from a few discrete reactions can significantly affect the dynamical patterns in cellular systems. In a series of papers Gillespie showed the equivalency of deterministic and stochastic formalisms in the limit as the populations of all constituent chemical species tend to infinity, and discussed the importance of stochastic fluctuations when the system sizes are small (1, 2). We note that stochastic differential equations are often used in physics, but they are usually obtained by starting with an assumed ordinary differential equation and then adding a “noise” term that has been carefully crafted to give a preconceived outcome. Although such approaches are computationally less expensive, they do not capture the true nature of the intrinsic fluctuations. Therefore, the use of discrete stochastic approaches is often more suitable in studying biological systems.
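The single-molecule concentration quoted above follows directly from Avogadro's number and the cell volume; a quick check of the arithmetic:

```python
# Check of the estimate in the text: the concentration represented by a
# single molecule in a ~10 femtoliter cell volume.
N_AVOGADRO = 6.022e23        # molecules per mole
volume_liters = 10e-15       # 10 fl expressed in liters

molar = (1 / N_AVOGADRO) / volume_liters   # moles per liter
picomolar = molar * 1e12
print(f"1 molecule in 10 fl = {picomolar:.0f} pM")
```

The result, about 166 pM, matches the "roughly 160 pM" figure quoted above.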
Although they provide a more realistic modeling frame, discrete stochastic simulations are often too slow and computationally expensive. Building upon earlier fundamental work (1, 2, 8), development of new mathematical methods and numerical algorithms have been pursued by many groups to overcome some of the computational bottlenecks (5, 6, 9–23). Development of approximate algorithms was also pursued. For example, the probability weighted dynamic Monte Carlo method (24) has been shown to enable the simulation of network models with tens of thousands of reactions and hundreds of compartments using desktop computers (25), and various tau-leaping algorithms, which allow the use of larger time steps with tolerable inexactness (26–32), can also lead to significant speed ups in the stochastic simulations.
The spatial organization and structural hierarchies that often occur in biological systems provide an additional degree of complexity. Information transfer and biochemical activity can depend on location in cellular systems. Indeed, spatial organization becomes particularly important when the kinetics of the interaction networks depend on local environmental conditions, or when material transport is required to convey the information. A good example of this is syntrophic (mutually interdependent) microbial systems, in which organisms in the community depend on each other for metabolite use and production. Here the metabolic activity necessary for cellular growth and survival can depend on the substrate availability and favorable environmental factors that are jointly established by multiple organisms. The relative concentration of the organisms and their geometrical arrangement can lead to heterogeneous distributions reflecting local variations in the microbial composition. Thus, the dynamical behavior of microbes in syntrophic systems relies on the combination of highly localized molecular events (i.e., metabolism) and macroscopic transport of products over a much larger spatial scale (i.e., between the cells of syntroph organisms). An analogous situation arises in mammalian systems, where eukaryotic cells have compartments with specific functionalities. These subcellular compartments exchange material in a concerted fashion as part of fundamental cellular processes such as synthesis, degradation, and receptor signaling. At the same time, the milieu of the cell within its tissue group is a consequence of the tissue itself. Thus, compartmentalized reactions, though many spatial scales smaller than the tissue in which they are found, simultaneously drive and respond to changes in the tissue. This hierarchy of compartmentalized activity represents a key challenge in multiscale modeling.
Particle transport dynamics can also play a central role in transferring information in many biological problems (33). This is particularly true when messenger substrate molecules or interacting proteins are transported over distances many orders of magnitude larger than their size. For example, energy metabolism of many bacteria requires cytochrome shuffling as part of the electron transfer mechanism. Oxygen availability and penetration can be very important in bacterial growth, both in batch reactors and in biofilms. Plant ecology too involves location specific activities. Most plants allow their root nodule cells to be infected by microbes or fungi for arbuscule development, which is used to obtain nutrients (particularly phosphorus) from the soil. At the same time, plants use their leaves for the necessary carbon production, some of which is also transferred to the microbial/fungal mycorrhizal symbiotic (intimate interdependent) partner. Material transport also plays a central role in determining cellular response in higher organisms, for which stress-induced Ca++/cAMP release in mammalian systems would be a good example (33).
Recent progress in experimental techniques, particularly fluorescent labeling and high-resolution microscopy, is beginning to make it possible to collect biological information about local dynamical activities (see, e.g., Refs. (34–38)). Models that represent dynamics using spatially averaged quantities cannot capture effects due to local inhomogeneities. For the types of problems described above, computational biology studies aiming to complement the experimental efforts require the development and analysis of spatially resolved multiscale models. The standard chemical master equation (CME) and stochastic simulation algorithm (SSA) assume that the system is well-stirred or spatially homogeneous (1, 4), which for some problems might be an enormous simplification. Currently, the strategy most likely to work appears to be the locally homogeneous approach, which dates back to at least the mid-1970s (39, 40). In this approach, the system volume is subdivided into K subvolumes that are small enough for each to be considered spatially homogeneous. Each subvolume forms a reactor compartment, and the compartments are linked at the higher whole-system scale. Using this framework, whenever a chemical reaction occurs in the system we now know where it occurred, at least to within the resolution of the subdivisions. However, this spatial resolution comes at a considerable cost in computational complexity. The dimensionality of the state space and the number of chemical reaction channels have been increased by a factor of K, the number of subvolumes. Moreover, since a large set of reactions describing the diffusive transfer between subvolumes is needed to complete the model, the increase in computational complexity is in fact even greater.
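The bookkeeping behind the locally homogeneous approach can be illustrated concretely. In the sketch below, diffusion of a single species A along a 1-D chain of K subvolumes is encoded as first-order "hop" reactions between adjacent compartments; the hop rate d is an illustrative constant, and in practice the full well-stirred reaction set would also be replicated in each subvolume:

```python
# Sketch of the locally homogeneous approach for one diffusing species on a
# 1-D chain of K subvolumes: diffusion is represented by first-order hop
# reactions A_i -> A_(i±1) between adjacent compartments.
def diffusion_channels(K, d):
    """Return the transfer channels A_i -> A_j for all adjacent pairs."""
    channels = []
    for i in range(K):
        for j in (i - 1, i + 1):
            if 0 <= j < K:
                channels.append({"rate": d, "from": i, "to": j})
    return channels

channels = diffusion_channels(K=10, d=0.5)
# A K=10 chain already needs 18 transfer channels on top of the K replicated
# reaction sets, illustrating the steep growth in model size noted above.
```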
From a purely mathematical point of view, the locally homogeneous approach to a spatially inhomogeneous system is essentially the same as for a spatially homogeneous system, in that one winds up with a CME and an SSA of the same "jump Markov" form (4, 41, 42). However, the resulting network model has a much higher dimensional state space and consists of a much larger set of reaction channels. The challenge of determining how to divide the system into subvolumes efficiently and without sacrificing the details of the problem must be overcome before this model can be routinely implemented. Another challenge is the development of algorithms that scale efficiently with the model dimension, as would be necessary to handle the complexity resulting from the steep increase in network sizes. For large systems, the conspicuous absence of algorithms and software capable of taking advantage of available computing resources indicates a need for the development of hybrid algorithms in which the problem is separated into deterministic and stochastic parts to increase computational efficiency without sacrificing scientific veracity.
While both deterministic and stochastic formulations are currently used in the kinetic modeling of biological systems, as has been mentioned above, discrete stochastic methods are more suitable for biological systems. With this in mind, and because the numerical simulation of ordinary differential equation models is a well-established area with excellent and available software, in the remainder of this chapter we concentrate on stochastic simulation methods. We first summarize the fundamental steps and the mathematical foundations for kinetic simulation of reaction networks. We follow that with a discussion of shortcomings and size scaling properties of existing stochastic simulation algorithms. We then conclude by briefly highlighting the future research trends from our perspective.
Kinetic simulation of biochemical systems involves the following basic steps: (1) identification of the species, the reactions among them, and the model outputs of interest; (2) development of the rate expressions for the reactions; (3) conversion of the model into the form required by the chosen simulation method, e.g., rate equations or a reaction table with propensities; and (4) execution of the simulations and analysis of the results.
The following simple example illustrates steps (1–3). Assume that we are studying a receptor (R) system that binds a ligand L. The ligand-bound receptor (RL) binds an effector/adaptor/scaffold molecule E that helps with signal transduction. Receptors that are in complex with the effector molecules are labeled as RLE. For this case, the kinetic system consists of the following two reversible reactions: L+R ⇌ RL and RL+E ⇌ RLE. The forward and reverse reactions of these two reversible reactions are second- and first-order reactions, respectively. For simplicity, let us use a mass-action formalism where the rates for first-order (A→C) and second-order (A+B→C) reactions are represented as k[A] and k[A][B], respectively, where [X] stands for the concentration (or the number of molecules with the proper volume conversion) of species X.
In this example, the species list would be L, R, RL, E, and RLE. The reaction list will contain four reactions: (i) L+R→RL, (ii) RL→L+R, (iii) RL+E→RLE, and (iv) RLE→RL+E. If RLE is the species of interest (signal transducing receptors), the output of the simulations would be [RLE], the concentration of the species RLE. This identification completes step (1).
Development of the RRE follows a simple rule: each reaction that decreases or increases the concentration of a species contributes a sink or source term, respectively, to the rate equation for that species. For example, the reaction RL+E→RLE decreases [RL] and [E], and increases [RLE]. So the term k3[RL][E] appears as a negative contribution in the rate equations for d[RL]/dt and d[E]/dt, and as a positive contribution in the d[RLE]/dt rate equation. Using this logic, the RRE for the above example would be

d[L]/dt = −k1[L][R] + k2[RL]
d[R]/dt = −k1[L][R] + k2[RL]
d[RL]/dt = k1[L][R] − k2[RL] − k3[RL][E] + k4[RLE]
d[E]/dt = −k3[RL][E] + k4[RLE]
d[RLE]/dt = k3[RL][E] − k4[RLE]
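These rate equations can be integrated numerically; a minimal forward-Euler sketch for the five-species receptor model is given below. The rate constants and initial concentrations are arbitrary illustrative values, not taken from the text:

```python
# Forward-Euler integration of the receptor-model RRE; rate constants and
# initial concentrations are arbitrary illustrative values.
def rre_derivs(state, k1, k2, k3, k4):
    L, R, RL, E, RLE = state
    v1 = k1 * L * R     # L + R  -> RL
    v2 = k2 * RL        # RL     -> L + R
    v3 = k3 * RL * E    # RL + E -> RLE
    v4 = k4 * RLE       # RLE    -> RL + E
    return (-v1 + v2,                  # d[L]/dt
            -v1 + v2,                  # d[R]/dt
            v1 - v2 - v3 + v4,         # d[RL]/dt
            -v3 + v4,                  # d[E]/dt
            v3 - v4)                   # d[RLE]/dt

def integrate(state, k, dt=1e-3, steps=20000):
    for _ in range(steps):
        derivs = rre_derivs(state, *k)
        state = tuple(s + dt * d for s, d in zip(state, derivs))
    return state

final = integrate((1.0, 1.0, 0.0, 1.0, 0.0), k=(1.0, 0.1, 1.0, 0.1))
```

A stiff system would call for an implicit integrator instead; the point here is only the mechanical translation of the reaction list into rate equations. Note that the moiety totals [L]+[RL]+[RLE] and [E]+[RLE] are conserved exactly by these equations, which provides a simple sanity check on the integration.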
Once the model is defined as above, preparation for the simulation is slightly different depending on the method to be used. If a deterministic approach is chosen, then the RRE is investigated in terms of the stiffness of the equations, and an appropriate integrator is chosen. We note that for more general formalisms, singularity of the Jacobian is another point of concern but that subject is outside the scope of this chapter. If a stochastic simulation approach is chosen, then the RRE is converted into a reaction table (Table 1) form where propensities are associated with each reaction.
In the stochastic approach, the stoichiometry matrix ν defines the change in the species populations when the corresponding reaction occurs. Entries for the reactants and the products are negative and positive, respectively. With this information at hand, at every step of the stochastic simulation one can update the system configuration x(t) as x → x + ν·ω, where the vector ω records how many times each reaction occurs.
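For the receptor example, the stoichiometry matrix and the update x → x + ν·ω can be written out explicitly (populations below are illustrative):

```python
# Stoichiometry matrix nu for the four-reaction receptor model.
# Rows: species (L, R, RL, E, RLE); columns: reactions (i)-(iv).
# Reactant entries are negative, product entries positive.
SPECIES = ["L", "R", "RL", "E", "RLE"]
NU = [
    # (i)  (ii) (iii) (iv)
    [-1,  +1,   0,   0],   # L
    [-1,  +1,   0,   0],   # R
    [+1,  -1,  -1,  +1],   # RL
    [ 0,   0,  -1,  +1],   # E
    [ 0,   0,  +1,  -1],   # RLE
]

def update_state(x, omega):
    """Apply x -> x + nu . omega, where omega[j] counts firings of reaction j."""
    return [xi + sum(NU[i][j] * omega[j] for j in range(len(omega)))
            for i, xi in enumerate(x)]

# Example: reaction (i) fires twice and reaction (iii) fires once.
x = update_state([100, 50, 0, 30, 0], [2, 0, 1, 0])
```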
Step (4), execution of the simulations, is the key step in the kinetic modeling studies. The numerical solution of ODEs or DAEs is a well established area with widely available software (43). Thus, we will focus on stochastic simulation methods in the remainder of this chapter.
The Stochastic Simulation Algorithm (SSA) developed by Gillespie (1–4) and its variants are the basis for stochastic kinetic simulation of biological systems. In SSA it is assumed that successive reactive molecular collisions in the system are separated by many non-reactive collisions, which effectively wipe out all positional correlations among the molecules. Therefore, SSA is suitable for the simulation of well stirred systems.
Defining the state of the system by the vector x(t), whose components xi(t) (i=1,...,N for N species) denote the number of molecules of species Si in the system at time t, one can establish from kinetic theory the following result: For each elemental chemical reaction channel Rj (j=1,...,M), there exists a function aj such that, if the system is in state x=(x1,…,xN) at time t, then the probability that reaction Rj will occur somewhere inside the system in the next infinitesimal time dt is aj(x)dt. This can be regarded as the fundamental premise of stochastic chemical kinetics. The function aj is called the propensity function of reaction channel Rj. Each reaction channel Rj is completely characterized by its propensity function together with its state-change vector νj, which represents the change in the populations of the molecular species caused by one Rj event, i.e., the stoichiometric coefficients of the reaction. Thus, if the system is in state x and one Rj reaction occurs, the system jumps to state x+νj. This is mathematically known as a jump Markov process, and a great deal is known about such processes (8, 41).
The stochastic evolution of the state vector x(t) is specified by the function P(x,t|x0,t0), defined as the probability that x(t) will equal x given that x(t0)=x0 for t0 ≤ t. It can be proven that this function obeys the following time-evolution equation (4, 44):

∂P(x,t|x0,t0)/∂t = Σj=1,M [ aj(x−νj) P(x−νj,t|x0,t0) − aj(x) P(x,t|x0,t0) ]    (1)
This elegant equation is known as the chemical master equation (CME). Unfortunately, it is of such a large dimension (it includes a differential equation corresponding to every possible state of the system) that its solution is almost always intractable, both analytically and numerically.
The SSA advances the system in time from one reaction event to the next, and it does so in a way that is rigorously equivalent to the CME. The SSA is therefore an exact method for simulating the time evolution of a spatially homogeneous chemical system. Because SSA processes reaction events serially, i.e., one after another, it is often too slow to be practical when applied to real systems of interest. An extremely large molecular population of even one reactant species, and/or a very fast reaction, will usually cause the time increments τ to be extremely small.
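The direct-method form of the SSA is short enough to sketch in full for the receptor model. The stochastic rate constants and initial copy numbers below are illustrative choices, not values from the text:

```python
import math, random

# Direct-method SSA (Gillespie) for the receptor model.
NU = [{"L": -1, "R": -1, "RL": +1},    # (i)   L + R  -> RL
      {"L": +1, "R": +1, "RL": -1},    # (ii)  RL     -> L + R
      {"RL": -1, "E": -1, "RLE": +1},  # (iii) RL + E -> RLE
      {"RL": +1, "E": +1, "RLE": -1}]  # (iv)  RLE    -> RL + E

def propensities(x, c):
    return [c[0] * x["L"] * x["R"], c[1] * x["RL"],
            c[2] * x["RL"] * x["E"], c[3] * x["RLE"]]

def ssa(x, c, t_end, rng):
    t = 0.0
    while True:
        a = propensities(x, c)
        a0 = sum(a)
        if a0 == 0.0:
            break                                 # no reaction can fire
        t += -math.log(1.0 - rng.random()) / a0   # exponential waiting time
        if t > t_end:
            break
        r, acc, j = rng.random() * a0, 0.0, 0
        for j, aj in enumerate(a):                # pick R_j with prob a_j/a0
            acc += aj
            if r < acc:
                break
        for s, dv in NU[j].items():               # jump: x -> x + nu_j
            x[s] += dv
    return x

x = ssa({"L": 100, "R": 50, "RL": 0, "E": 30, "RLE": 0},
        c=[0.01, 0.1, 0.01, 0.1], t_end=10.0, rng=random.Random(1))
```

Because the state changes are exact integer jumps, the moiety totals (e.g., total receptor R+RL+RLE) are conserved exactly along every trajectory.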
Several lines of research have been pursued in attempts to improve upon the exact SSA. These include more efficient implementations of the SSA and the development of approximate SSAs with improved numerical efficiencies.
The efficiency of the algorithms can strongly depend on the formulation and numerical implementation, and these are key aspects to consider when developing simulation software. Gibson and Bruck developed an efficient implementation of the exact SSA (10) that removes a key limitation of the first reaction method of Gillespie (1): the need to generate Nr new uniform random deviates following every reaction event, where Nr is the number of active reactions. Rather than a single waiting time between events, the method stores the absolute firing time of each reaction in an indexed binary heap priority queue, with times increasing with increasing depth in the heap. Assuming a static reaction network topology, a reaction dependency graph is created initially. Following an event, which is identified by the element at the top of the heap, a single uniform random deviate is generated, reaction propensities are efficiently updated using the dependency graph, and any absolute firing time requiring updating is adjusted through a linear transformation. Lastly, the heap queue is reordered. Gibson and Bruck have shown that the algorithm has a time complexity of order O(Nr+NE log Nr), where NE is the number of events, and generally requires only a single uniform random deviate per event (45). While the next reaction method can be efficient for large, very sparse networks, significant overhead may be incurred in maintaining the indexed priority queue. This has led Cao et al. to conclude that, for most kinetic systems, the most efficient formulation of the SSA is an optimized direct Gillespie method (46).
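The ingredients of the next reaction method (absolute firing times in a heap, a dependency graph, and linear rescaling of unused times) can be sketched compactly. This is a simplified illustration, not the full Gibson-Bruck implementation: stale heap entries are skipped lazily rather than re-indexed in place, `deps[j]` must include j itself, and the two-reaction isomerization network at the bottom is purely illustrative:

```python
import heapq, math, random

# Sketch of the next reaction method. prop[j] maps a state to propensity a_j,
# nu[j] is the state-change map, deps[j] lists reactions (including j) whose
# propensities change when R_j fires.
def next_reaction(x, prop, nu, deps, t_end, rng):
    a = [p(x) for p in prop]
    tau = [(-math.log(1.0 - rng.random()) / aj) if aj > 0 else math.inf
           for aj in a]                     # absolute firing times
    heap = [(tau[j], j) for j in range(len(prop))]
    heapq.heapify(heap)
    while heap:
        tj, j = heapq.heappop(heap)
        if tj != tau[j]:
            continue                        # stale entry; skip it
        if tj >= t_end:
            break
        for s, dv in nu[j].items():         # fire R_j
            x[s] += dv
        for k in deps[j]:                   # update only dependent reactions
            a_new = prop[k](x)
            if k == j or a[k] == 0:         # needs a fresh exponential deviate
                tau[k] = (tj - math.log(1.0 - rng.random()) / a_new
                          if a_new > 0 else math.inf)
            elif a_new > 0:                 # reuse old time: linear rescaling
                tau[k] = tj + (a[k] / a_new) * (tau[k] - tj)
            else:
                tau[k] = math.inf
            a[k] = a_new
            heapq.heappush(heap, (tau[k], k))
    return x

# Illustrative isomerization pair: A -> B and B -> A.
x = next_reaction({"A": 100, "B": 0},
                  prop=[lambda s: 0.5 * s["A"], lambda s: 0.2 * s["B"]],
                  nu=[{"A": -1, "B": +1}, {"A": +1, "B": -1}],
                  deps=[[0, 1], [0, 1]], t_end=5.0, rng=random.Random(7))
```

Only one new uniform deviate is drawn per event (for the fired channel); the firing times of the other affected channels are rescaled, which is the source of the method's efficiency on sparse networks.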
One way to speed up the next reaction method type algorithms is to allow multiple reactions to take place in longer time steps, which is the underlying idea behind the probability-weighted dynamic Monte Carlo (PW-DMC) method (24) and the tau-leaping based algorithms (28–32, 47). The basic premise behind the PW-DMC method is that reactions with large propensities dominate the stochastic simulation. By attaching weights that are updated as the simulation progresses, the PW-DMC approach attempts to equalize the propensities of the reactions constituting the network model. In this process, the large propensities are scaled down with an associated factor, which corresponds to how many times the involved reaction occurs when selected. It was shown that the PW-DMC algorithm solves the multiple time scale problem to a large extent (24). Efficient statistical sampling obtained with the PW-DMC algorithm made the simulation of an integrated stochastic model of the EGFR signaling network that contains tens of thousands of reactions possible using desktop computers in reasonable wall clock periods (25). The success of PW-DMC results from its effectiveness in bundling the occurring reactions, and in this aspect, it is conceptually similar to the approach pursued in tau-leaping methods. The main differences between the PW-DMC and tau-leaping methods are that in the PW-DMC the bundling of the reactions is done for each reaction type (i.e., some reactions can be selected to be left out if desired) and the integration time does not have to be set in advance, but the tolerance in the fluctuation levels for the species needs to be predefined. These features make PW-DMC suitable for reaction networks that can be partitioned into subclasses according to their spatio-temporal properties, where the allowed fluctuation levels can be dictated on a per-reaction basis. It was shown (24) that, while the fluctuation variations may be affected, the PW-DMC conserves the mean quantities.
In this respect, PW-DMC is a weak order (i.e., not a strong order) stochastic method. The fluctuations in the computed quantities depend on the achieved speed-up, which can be several orders of magnitude without any significant effect on the computed mean quantities (24).
In the tau-leaping (τL) methods (47), advances in the system configuration are determined based on a preselected time step τ during which more than one reaction event may occur. Tau-leaping is based on the same premises that underlie the CME and the SSA: If τ is chosen small enough to satisfy the leap condition, which says that none of the propensity functions may change noticeably during τ, then given x(t)=x, the system state at time t+τ can be approximately computed as

x(t+τ) ≈ x + Σj=1,M Pj(aj(x)τ) νj    (2)
The Pj are statistically independent Poisson random variables; they physically represent the number of times each reaction channel fires in time τ. The Poisson random variable P(a,τ) is in fact defined as the number of events that will occur in time τ, given that a dt is the probability of an event occurring in the next infinitesimal time dt. We emphasize that τ must be chosen small enough that none of the propensity functions change significantly during the leap. Algorithms have been developed for estimating the largest value of τ that will satisfy that condition for a prescribed level of simulation accuracy (26, 28).
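A single explicit tau-leap step then amounts to drawing one Poisson sample per channel and applying the combined update. The sketch below uses the receptor model with illustrative rates and populations; the Poisson sampler is Knuth's method, adequate for the modest means used here:

```python
import math, random

# One explicit tau-leap step: each channel fires Poisson(a_j * tau) times.
def poisson(mean, rng):
    """Knuth's method; adequate for modest means."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def tau_leap_step(x, a, nu, tau, rng):
    omega = [poisson(aj * tau, rng) for aj in a]   # reaction firing counts
    for j, wj in enumerate(omega):
        for s, dv in nu[j].items():
            x[s] += wj * dv                        # x -> x + nu . omega
    return x

nu = [{"L": -1, "R": -1, "RL": +1}, {"L": +1, "R": +1, "RL": -1},
      {"RL": -1, "E": -1, "RLE": +1}, {"RL": +1, "E": +1, "RLE": -1}]
x = {"L": 1000, "R": 500, "RL": 0, "E": 300, "RLE": 0}
a = [1e-4 * x["L"] * x["R"], 0.1 * x["RL"],
     1e-4 * x["RL"] * x["E"], 0.1 * x["RLE"]]
x = tau_leap_step(x, a, nu, tau=0.1, rng=random.Random(3))
```

Note that nothing in this update prevents a large Poisson sample from driving a population negative, which motivates the modifications discussed next.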
While the τL methods show significant gains in efficiency, the unbounded Poisson distribution can lead to nonphysical system configurations in which negative species populations arise when the expected number of events is large enough to consume the reactant populations. Cao et al. have shown how to avoid negative populations in explicit Poisson τL (PτL) methods (48). The non-negative PτL algorithm is based on the fact that negative populations typically arise from multiple firings of reactions that are only a few firings away from consuming all the molecules of one of their reactants. To focus on those reaction channels, the modified PτL algorithm introduces a control parameter nc, a positive integer that is usually set somewhere between 10 and 20. Any reaction channel that is currently within nc firings of exhausting one of its reactants is then classified as a critical reaction. The algorithm chooses τ in such a way that no more than one firing of all the critical reactions can occur during the leap. Essentially, the algorithm simulates the critical reactions using SSA, and the remaining non-critical reactions using τ-leaping (48).
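The critical-reaction classification at the heart of the non-negative PτL algorithm can be sketched as a simple population check; the populations below are illustrative, and the full algorithm would additionally skip channels with zero propensity:

```python
# Sketch of the critical-reaction test: a channel within n_c firings of
# exhausting one of its reactants is classified as critical and is then
# simulated one firing at a time (via SSA) rather than leaped over.
def classify_critical(x, nu, n_c=10):
    critical = []
    for j, changes in enumerate(nu):
        # Maximum firings before some reactant of R_j is used up.
        max_firings = min((x[s] // -dv for s, dv in changes.items() if dv < 0),
                          default=float("inf"))
        if max_firings < n_c:
            critical.append(j)
    return critical

nu = [{"L": -1, "R": -1, "RL": +1}, {"L": +1, "R": +1, "RL": -1},
      {"RL": -1, "E": -1, "RLE": +1}, {"RL": +1, "E": +1, "RLE": -1}]
crit = classify_critical({"L": 5, "R": 1000, "RL": 50, "E": 200, "RLE": 0}, nu)
```

Here channels (i) and (iv) are critical: only 5 copies of L and 0 copies of RLE remain, while RL and E are plentiful.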
An alternative approach is based on choosing the reaction numbers using binomially distributed random variables that have the appropriate finite sampling ranges. For a short time interval τ, the Poisson distribution for each reaction Rj is replaced with a binomial distribution B(pj, ωj*) with the same mean ajτ. Here ωj* is an upper bound (or limit) on the reaction number and pj is the probability that Rj will fire in each subinterval of length τ/ωj*. In the resulting binomial tau-leaping (BτL) method, the leap size is then chosen as the smaller of the times tj computed from the constraints pj = ajtj/ωj* ≤ 1, j=1:M, and the upper bound τε. Once the tau step has been determined, each network reaction number ωj is generated by sampling the corresponding binomial distribution.
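The essential property, that the binomial sample preserves the mean ajτ while never exceeding the bound ωj*, is easy to demonstrate. The bound and rate below are illustrative, and the Bernoulli-sum sampler is fine for the small ωj* used here:

```python
import random

# Binomial reaction-number sampling for tau-leaping: the mean a_j*tau is
# preserved, but a sample can never exceed the bound omega_star.
def binomial(n, p, rng):
    """Sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def btau_reaction_number(a_j, tau, omega_star, rng):
    p = min(a_j * tau / omega_star, 1.0)   # per-trial firing probability
    return binomial(omega_star, p, rng)

rng = random.Random(11)
samples = [btau_reaction_number(a_j=2.0, tau=1.0, omega_star=5, rng=rng)
           for _ in range(2000)]
mean = sum(samples) / len(samples)
# mean is close to a_j*tau = 2, and no sample exceeds omega_star = 5.
```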
Although the binomial distribution has the desirable property of having a finite sampling range, this in itself does not completely resolve the non-negativity problem. In networks with multiple channel reactant dependencies, the interdependence arising from species participating in multiple reactions may still generate negative species populations. Tian and Burrage have shown how the BτL completely resolves the non-negativity problem when there is at most a two-reactant dependence in a reaction network (31). In the BτL method of Chatterjee et al. (29, 30) the issue of multiple-channel reactant dependencies is dealt with by assigning to each reaction a binomial random variable for the number of reaction events during a step and by modifying reaction number limits through tracking currently available population sizes as reactions fire during that step. Yet, as critically noted by Chatterjee et al., in the latter approach selected reaction numbers may depend on the order in which reactions are processed (29, 30).
The multinomial tau-leaping (MτL) method of Pettigrew and Resat efficiently extends the BτL method to networks with arbitrary multiple-channel reactant dependencies (32). Improvements were achieved by a combination of three factors: First, tau-leaping steps are determined simply and efficiently using a-priori information and Poisson distribution based estimates of expectation values for reaction numbers. Second, networks are partitioned into closed groups of reactions and corresponding reactants in which no group reactant set is found in any other group. Third, product formation is factored into upper bound estimation of the number of times a particular reaction occurs. Together, these features allow for larger time leaping steps while the numbers of reactions occurring simultaneously in a multi-channel manner are estimated accurately using a multinomial distribution (32). Partitioning of the reaction network into groups with closure can be done with a graph theoretic algorithm based on the Dulmage-Mendelsohn decomposition of the stoichiometric matrix representing the network. Two methods for selecting an upper limit Ω* on the system reaction number were advocated, the rate limiting reactant and the rejection methods (32). The former is a conservative approach that guarantees that the chosen reactions do not lead to negative populations after the system’s configuration is updated according to the occurring reactions. In contrast, the selection process in the rejection method only guarantees that the system configuration after the tau-leaping step is non-negative. It however does not ensure that all possible multi reaction paths that take the system to that final configuration involve only physically feasible intermediate stages.
It was argued that, since one can only detect the relationship between the starting and end points of a move during the simulation, ensuring that the state of the system always corresponds to physically feasible configurations at the observation points (i.e., the starting and end points of the moves in a simulation run) is the most crucial aspect in obtaining accurate results (32). Therefore, allowing a simulation trajectory to temporarily cross the solution-space boundary introduces insignificant inaccuracies because statistics for intermediate states have only an indirect impact. In the MτL method, the tau-leaping step is chosen as the smaller of the time t computed from the constraint p = tΣ/Ω* ≤ 1 and the upper bound τε, where Σ is the total system propensity. The actual system reaction number Ω is then sampled from a binomial distribution as discussed above, and the reaction numbers ωj, j=1:M−1, are drawn from a multinomial distribution with corresponding probabilities αj = aj/Σ, with ωM = Ω − ω1 − ω2 − … − ωM−1. Implemented advances over the BτL method were shown to lead to significant speed-ups in the simulations of biologically relevant models (32). It was also shown that MτL is less sensitive to the error parameter of the τL methods, which is a desirable aspect of kinetic simulation algorithms.
It is well known that the Poisson random variable P(a,τ), which has mean and variance aτ, can be approximated by a normal random variable with the same mean and variance when aτ ≫ 1. Using this fact, it can be shown that if τ satisfies not only the leap condition but also the conditions aj(x)τ ≫ 1 for all j, which physically means that during time τ every reaction channel can be expected to fire many more times than once, then the basic tau-leaping Eq. (2) further approximates to (49)

x(t+τ) ≈ x + Σj=1,M aj(x)τ νj + Σj=1,M √(aj(x)τ) Nj(0,1) νj    (3)
Here the Nj(0,1) are statistically independent normal random variables with mean 0 and variance 1. This modified update formula is computationally faster than the tau-leaping formula (2) which it approximates, because samples of the normal random variable are much easier to generate than samples of the Poisson random variable. Also, since by hypothesis every reaction channel fires many more times than once during each time step τ, simulations using the updating Eq. (3) will be very much faster than the SSA.
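A Langevin leap is therefore just a deterministic drift plus one Gaussian draw per channel. The single isomerization reaction and its rate constant below are illustrative; note that aτ = 1000 ≫ 1, so the approximation regime applies:

```python
import math, random

# One Langevin leap: drift a_j*tau plus Gaussian noise of variance a_j*tau,
# appropriate when every a_j*tau >> 1.
def langevin_step(x, a, nu, tau, rng):
    for j, aj in enumerate(a):
        w = aj * tau + math.sqrt(aj * tau) * rng.gauss(0.0, 1.0)
        for s, dv in nu[j].items():
            x[s] += w * dv          # populations become real-valued here
    return x

x = {"A": 1.0e6, "B": 0.0}
a = [0.1 * x["A"]]                  # a*tau = 1000 >> 1 for tau = 0.01
x = langevin_step(x, a, nu=[{"A": -1, "B": +1}], tau=0.01,
                  rng=random.Random(2))
```

The state is no longer integer-valued: roughly 1000 ± 32 molecules of A convert to B in this leap, with the fluctuation entering through the single normal sample.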
If the arguments leading to Eq. (3) are repeated with τ replaced by a macroscopic infinitesimal dt, which by definition is small enough that no propensity function changes significantly during dt yet large enough that every reaction channel fires many more times than once during dt, then one can deduce the equation

dX(t)/dt = Σj=1,M νj aj(X(t)) + Σj=1,M νj √(aj(X(t))) Γj(t)    (4)
where the gammas are statistically independent Gaussian white noise processes (49–51). This equation is called the chemical Langevin equation (CLE), and it is an example of what is known to mathematicians as a stochastic differential equation. It is entirely equivalent to the Langevin leaping formula given by Eq. (3).
We note that stochastic differential equations are often used in physics, but they are usually obtained by starting with an assumed ordinary differential equation and then adding a “noise” term that has been carefully crafted to give some preconceived outcome. So it is noteworthy that the noise term in Eq. (4), namely the second summation there, has actually been derived from the same premises that underlie the CME and the SSA using physically transparent assumptions and approximations. Use of the Langevin formulas (3) and (4) is typically justified whenever all reactant species are present in sufficiently large numbers.
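As an illustration, the Langevin leaping formula can be iterated directly, which amounts to an Euler-Maruyama integration of the CLE. The sketch below uses a hypothetical reversible isomerization A ⇌ B; the rate constants, stoichiometry matrix, and numerical parameters are our own assumptions for demonstration, not values from the chapter.

```python
import numpy as np

def cle_step(x, nu, propensities, dt, rng):
    """One Euler-Maruyama step of the CLE:
    X(t+dt) = X(t) + sum_j nu_j a_j(X) dt + sum_j nu_j sqrt(a_j(X) dt) N_j(0,1)
    """
    # Guard against small negative excursions of the propensities
    a = np.maximum(propensities(x), 0.0)
    dW = np.sqrt(dt) * rng.standard_normal(len(a))
    return x + nu.T @ (a * dt) + nu.T @ (np.sqrt(a) * dW)

# Hypothetical reversible isomerization A <-> B with illustrative
# rate constants c1, c2 (assumed values, for demonstration only).
c1, c2 = 1.0, 0.5
nu = np.array([[-1.0, 1.0],    # channel 1: A -> B
               [ 1.0, -1.0]])  # channel 2: B -> A
propensities = lambda x: np.array([c1 * x[0], c2 * x[1]])

rng = np.random.default_rng(2)
x = np.array([10_000.0, 0.0])
for _ in range(2000):
    x = cle_step(x, nu, propensities, dt=0.005, rng=rng)
# x now fluctuates around the deterministic fixed point
# A* = c2/(c1+c2)*N ~ 3333, B* ~ 6667, with relative fluctuations
# of order 1/sqrt(N), and A + B is conserved exactly.
```

Because every channel fires many times per step, one normal draw per channel replaces thousands of individual SSA events, which is where the speed advantage over the SSA comes from.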
In the thermodynamic limit, the molecular populations of all reactants and the system volume become infinitely large, while the concentrations of all species stay constant. It can be proven that, in this limit, all propensity functions scale linearly with the system size. Thus, the deterministic terms on the right-hand sides of Eqs. (3) and (4) grow in proportion to the system size, while the fluctuating terms grow in proportion to the square root of the system size. This is another way of showing that for large systems "relative fluctuations in the species populations scale as the inverse square root of the system size", a well-known result of statistical thermodynamics. In the full thermodynamic limit, this scaling behavior implies that the fluctuating terms in Eqs. (3) and (4) become negligibly small compared to the deterministic terms, and the CLE reduces to
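This inverse-square-root scaling is easy to verify numerically. The sketch below relies on the standard fact that a linear birth-death process has a Poisson stationary distribution; the copy numbers chosen are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
# For a simple birth-death process (constant birth rate, first-order
# decay) the stationary copy-number distribution is Poisson, so the
# relative fluctuation is exactly 1/sqrt(mean copy number).
for mean in (100, 10_000, 1_000_000):
    x = rng.poisson(mean, size=200_000)
    rel = x.std() / x.mean()
    print(f"N = {mean:>9}: relative fluctuation = {rel:.4f} "
          f"(1/sqrt(N) = {1/np.sqrt(mean):.4f})")
```

At a hundred copies the fluctuations are ~10% of the mean; at a million copies they drop to ~0.1%, which is why deterministic descriptions become adequate for large systems.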
This is none other than the traditional reaction rate equation (RRE), which is more commonly written in terms of the concentration vector. Note that the RRE has not been assumed here; instead, it has been derived from the same premises that underlie the CME and the SSA, using a series of physically understandable approximations (49, 50). Thus, the discrete, stochastic CME/SSA and the continuous, deterministic RRE are connected as opposite ends of the same representation spectrum. The challenge, of course, is to develop both the theory and the software infrastructure necessary to make reliable adaptive decisions, so that complex reaction systems can be simulated accurately and efficiently by detecting the discrete or continuous regime and using an optimal blend of stochastic and deterministic methods.
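A minimal sketch of this deterministic limit, integrating the RRE dX/dt = Σj νj aj(X) for the same kind of hypothetical reversible isomerization A ⇌ B used above (assumed rate constants; forward Euler is used only for brevity, production codes would use an adaptive stiff solver):

```python
import numpy as np

# RRE for a hypothetical reversible isomerization A <-> B with
# illustrative rate constants c1, c2 (assumed values).
c1, c2 = 1.0, 0.5
nu = np.array([[-1.0, 1.0],    # A -> B
               [ 1.0, -1.0]])  # B -> A

def rre_rhs(x):
    a = np.array([c1 * x[0], c2 * x[1]])   # deterministic propensities
    return nu.T @ a

# Simple forward-Euler integration of the deterministic limit
x = np.array([10_000.0, 0.0])
dt = 0.001
for _ in range(10_000):
    x = x + dt * rre_rhs(x)
# x relaxes to the fixed point A* = c2/(c1+c2)*N, B* = c1/(c1+c2)*N
```

Unlike the CLE trajectory, repeated runs of this integration are identical: the fluctuating terms have dropped out entirely.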
In addition to the multiscale nature of the involved kinetic processes, the computational efficiency of stochastic simulation algorithms also depends on several measures of the size of the investigated network model, such as the overall complex population, the total number of reactions, and the average number of nodal interactions (connectivity) in the network. Understanding how algorithms scale with size and nodal connectivity is an important consideration in the design of numerical experiments for dynamic biological systems. As the complexity of the studied biological systems increases, such scaling issues become a serious point of concern.
To study how stochastic kinetic simulation algorithms scale with network size, the construction of scalable reaction models that can be used in benchmark studies has been advocated (42). Pettigrew and Resat used two scalable reaction models to compare the efficiency of the exact Gillespie algorithm, as implemented using the popular Gibson-Bruck method (GGB) (10), with the inexact random substrate method of Firth and Bray (FB) (52). The foundations of these two algorithms are quite distinct, and the latter has been stated to be an efficient method when the total complex population is small and the number of multistate interactions is large.
The first investigated scalable model, the Size Scalable Model (SSM), is a four major-compartment model that is typical for a signal transduction network in eukaryotic cells (42). The SSM involves 2 unimolecular and 5 bimolecular reversible reactions with mass transfer between compartments for 8 complex types including ligands, adapter complexes, and receptors (42). Upon phosphorylation the receptor can become active in signal transduction. A sub-compartmentalization process, which is equivalent to forming smaller locally homogeneous regions, allows for the creation of refined models where the size of the model increases in proportion to the number of created subcompartments while maintaining the connectivity (i.e., the topology) between the species. For SSM, it was found that the GGB algorithm, whose numerical efficiency scales with the logarithm of the total number of active reactions, performs significantly better in all cases including those characterized by a very small number (~100) of complex types, each with a very small population (42).
The second scalable model, the Variable Connectivity Model (VCM), is a single-compartment signal transduction network involving 2 unimolecular and 2 bimolecular reversible reactions for 3 complex types: ligand, receptor, and ligand-receptor complexes (42). The receptors in this model have multiple phosphorylation sites. The size and the node connectivity between the species increase in proportion to the number of phosphorylation sites N. For the VCM, it was shown that the FB random substrate method, whose efficiency scales with the square of the total complex population, can outperform the GGB algorithm when the node connectivity is sufficiently high and the total complex population sufficiently low. Performance of the FB and GGB algorithms was determined as a function of the number of phosphorylation sites for two distinct population sets. In the first set, the total complex population was initially set at 192 complexes (128 ligands and 64 receptors), while in the second set the total complex population was 10 times higher, with the 2:1 ratio of ligands to receptors unchanged. It was found that for the smaller population set the FB algorithm becomes more efficient than the GGB when the number of phosphorylation sites N reaches 6, whereas for the larger population set the GGB is clearly superior for small N up to a crossover point that occurs around N~11. Thus, it was concluded that, with the exception of special cases, GGB is a more efficient method than FB (42).
This simple study using two realistic sample networks clearly illustrates that the most efficient algorithm can depend on the problem type, and that the simulation method should be chosen only after a careful study of the topology and multiscale properties of the investigated network.
Table 14.2 lists and briefly describes some of the existing software tools that may be used in the kinetic simulations of biological systems. We note that the list is by no means complete; it represents only a small fraction of the available tools.
|BIONETGEN||Automatic generation of physicochemical models of biological systems with combinatorial complexity such as cellular signaling from user-specified rules for biomolecular interactions at the protein domain level, integrated with tools for reaction network simulation and analysis (53).|
|BIO-SPICE||Open source software for intra- and inter-cellular modeling and simulation of spatio-temporal processes, pathways and interaction-network tools for data analysis (MIAMESpice), model composition and visualization (BioSenS, Charon, JDesigner, Sal), mathematical tools including ordinary and partial differential and stochastic equation-based methods and algorithms for the simulation of reaction and diffusion-reaction networks (BioNetS, Simpathica) (54, 55).|
|CELLDESIGNER||Gene regulatory network modeling, visual representation of biochemical semantics with SBML and database support, integration with Matlab or Mathematica ODE Solvers and direct access to SBW SBML compliant simulators such as Jarnac (56, 57).|
|CELLERATOR||Open source Mathematica based package for simulation and analysis of signal transduction networks in cells and multicellular tissues, automated equation generation with arrow-based reaction notation (58).|
|COPASI||Stochastic and deterministic simulation of network pathways, steady state and metabolic control analysis including stability analysis, elementary and mass conservation analysis, optimization and parameter estimation and SBML support (59).|
|DIZZY||Stochastic and deterministic kinetic modeling of integrated large-scale genetic, metabolic, and signaling networks with domain compartmentalization, features include modular simulation framework, reusable modeling elements, complex kinetic rate laws, multi-step reaction processes, steady-state noise estimation (60).|
|E-CELL||Object-oriented software for modeling, simulation, and analysis of large scale cellular networks, multi-algorithm multi-time scale simulation method with access to Gillespie-Gibson and Langevin stochastic algorithms, Euler and higher order adaptive methods for ordinary and algebraic differential equations, parallel and distributed computing capabilities and SBML support (61, 62).|
|GRID CELLWARE||Grid-based modeling and simulation tool for biochemical pathways, integrated environment for diverse mathematical representations, parameter estimation using swarm algorithm and optimization, user-friendly graphical display and capability for large, complex models, stochastic algorithms such as Gillespie, Gibson and tau-leaping and deterministic algorithms based on ordinary differential equation solvers (63).|
|MCELL||Monte Carlo simulator for investigating cellular physiology, model description language or MDL used to specify ligand creation, destruction, and release as well as chemical reactions involving effectors such as receptors or enzymes, ligand diffusion modeled with 3D random walks, wide variety of numerical and imaging options (64).|
http://www.mcell.cnl.salk.edu and www.mcell.psc.edu
|MESORD||Open source C++ software for stochastic simulation of kinetic reactions and diffusion based on next subvolume method (NSM) for structured geometries in three dimensions, SBML compatible, three-dimensional OpenGL simulation viewer (65).|
|NWKSIM||Fortran 90/95 platform for large scale general kinetic reaction network simulation, Java swing graphical user interface, unique modeling features for computation of evolving multiple compartments with trafficking and signal transduction processes, stochastic algorithms include direct Gillespie, probability weighted dynamic Monte Carlo and weighted Gibson-Gillespie Bruck, binomial and multinomial tau-leaping algorithms (32, 42).|
|SIGTRAN||Fortran 90/95 modeling and simulation platform for large scale reaction networks, Java swing graphical user interface, dual stochastic and deterministic (ODE/DAE) simulation modes, multistate macromolecule specification and simulation using the Firth-Bray algorithm, reaction-diffusion modeling with Fricke-Wendt algorithm, molecule tagging and tracking in signal transduction networks using fully automated graph-theoretic algorithms for determination of an unambiguous set of base species, SBML file support (42).|
|SIMBIOLOGY||Matlab toolkit for modeling, simulating, and analyzing biochemical pathways, graphical user interface with visual pathway expression, manual or SBML file input, stochastic or deterministic solvers, parameter estimation and sensitivity analysis, ensemble runs and post-run analysis tools with plotting.|
|SMARTCELL||Object-oriented C++ platform for modeling and simulation of diffusion-reaction networks in whole-cell context with support for any cell geometry with different cell compartments and species localization, includes DNA transcription and translation, membrane diffusion and multistep reactions as well as cell growth, localization and diffusion modeling based on mesoscopic stochastic reaction algorithm (66).|
|STOCHKIT||Efficient C++ stochastic simulation framework for intracellular biochemical processes, stochastic algorithms include Gillespie SSA and explicit, implicit and trapezoidal tau-leaping methods, Kolmogorov distance and histogram distance for quantifying differences in statistical distribution shapes via Matlab, extensible to new stochastic and multiscale algorithms (67).|
|SBTOOLBOX||Matlab toolbox offering open and user extensible environment for prototyping new algorithms, and building applications for analysis and simulation of biological systems, deterministic and stochastic simulation, network identification, parameter estimation and sensitivity analysis, bifurcation analysis, SBML model import (68).|
|VIRTUAL CELL||Model creation and simulation of cell biological processes, associates biochemical and electrophysiological data for individual reactions with experimental microscopic image data describing subcellular locations, cell physiological events simulated within empirically derived geometries, reusable, updatable and accessible models, simulation data stored on the Virtual Cell database server and is easily exportable in variety of formats (69, 70).|
In recent years there has been rapid progress in the analysis and development of algorithms for the accelerated discrete stochastic simulation of chemically reacting systems. Although these developments provide a good foundation, existing stochastic simulation methods are still far from efficient enough to simulate detailed biological models, particularly models that involve multiple time and spatial scales while still requiring high resolution for accurate representation. Thus, the development of efficient and adaptive multiscale algorithms and software for stochastic-deterministic simulations is urgently needed. This is particularly true for the simulation of spatially resolved models, because immense recent progress in imaging is beginning to make data about the inhomogeneous distribution of molecular species available. As the expense associated with spatially resolved models makes fully stochastic simulations nearly infeasible, efforts to develop hybrid approaches will become very important. Hybrid methods (22, 71) have recently been proposed to simulate multiscale chemically reacting systems. These methods combine the traditional deterministic reaction rate equation, or alternatively the chemical Langevin equation, with the SSA. The idea is to split the system into two regimes: the continuous regime and the discrete regime. The RRE is used to simulate the fast reactions between species with large populations, and the SSA is used for slow reactions or species with small populations. The conditions for the continuous regime are that 1) the number of instances of each molecular species in a reaction in the continuous regime must be large relative to one, and 2) the number of reaction events of each reaction occurring within one time step of the numerical solver must be large relative to one (72). If either condition is not satisfied for a reaction channel, that channel must be handled in the discrete regime.
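The two regime conditions lend themselves to a simple partitioning test. The sketch below is our own illustration of such a test, with hypothetical thresholds n_min and k_min standing in for "large relative to one"; it is not the partitioning scheme of any particular hybrid method.

```python
import numpy as np

def partition_reactions(x, propensities, nu, dt, n_min=100, k_min=10):
    """Partition reaction channels into continuous vs. discrete regimes.

    Channel j is assigned to the continuous regime only if
      1) every species it involves has a population that is large
         relative to one (here: >= n_min), and
      2) its expected number of firings per solver step, a_j(x)*dt,
         is large relative to one (here: >= k_min).
    The thresholds n_min and k_min are illustrative choices, not
    values prescribed by the hybrid methods themselves.
    """
    a = propensities(x)
    continuous, discrete = [], []
    for j, nu_j in enumerate(nu):
        species_ok = all(x[i] >= n_min for i in np.nonzero(nu_j)[0])
        firings_ok = a[j] * dt >= k_min
        (continuous if species_ok and firings_ok else discrete).append(j)
    return continuous, discrete
```

For example, for two independent first-order decays with populations of 10^6 and 5 molecules, the first channel would be advanced deterministically while the second would be left to the SSA.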
The hybrid methods exploit the multiscale properties of the problem efficiently, but some fundamental problems remain unsolved. The first is the automatic partitioning of systems. This needs to be done very carefully, because both accuracy and efficiency depend on a proper partitioning. It requires a precise awareness of the assumptions under which the algorithms and approximations available in the multiscale framework are valid, and corresponding tests on the system to determine the partitioning. This is one reason why a proper theoretical foundation for each algorithm is so critical: to ensure that we use it only to make approximations for which the corresponding assumptions are valid. Concerns about automatic step size selection also remain. Another difficult problem arises when a reaction is fast but one of the corresponding reactants has a small population, thus failing to satisfy the first requirement above for the deterministic regime. Such a reaction channel will still be handled by the SSA, resulting in a very slow simulation. New approaches and approximations, such as the slow-scale SSA (73), have the potential to overcome this limitation.
Another promising area of research is the application of tau-leaping based stochastic algorithms to problems involving reaction-diffusion processes. When the SSA is applied to reaction-diffusion processes, the standard approach is to subdivide the heterogeneous system volume into a sufficiently large number of small-volume elements V0 ~ O(h^3) so that the assumption of a homogeneous, well-stirred system is recovered within each element. With diffusion considered as a first-order reaction in which molecules of a species are exchanged between neighboring elements at a rate rD ∝ D/h^2, kinetic reactions within an element occur at a rate rC ∝ V0Σ. Here D and Σ are, respectively, the diffusion constant and the total system propensity. By choosing elements sufficiently small that rC ≪ rD holds for all elements, the SSA may be applied, although under these circumstances it is not unusual to be confronted with a reaction network whose total number of processes and species is several orders of magnitude larger than that found in a single element. Moreover, it has been observed in stochastic simulations with the SSA that reaction-diffusion processes are strongly dominated by diffusion events. The increase in network size and the dominance of diffusion events both contribute to a severe reduction in efficiency, and it is highly likely that a number of more recent stochastic algorithms for reaction-diffusion suffer from the same problem. τ-leaping methods may be well suited to overcoming this issue with the stochastic simulation of reaction-diffusion problems.
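Taken at face value, the scalings rD ∝ D/h^2 and rC ∝ V0Σ imply that the ratio rC/rD shrinks as h^5, so the well-stirred condition can always be reached by mesh refinement, at the cost of many more elements and diffusion events. A small sketch of this trade-off (treating Σ as a propensity per unit volume and omitting order-one constants, both our own simplifications):

```python
def subvolume_rates(D, h, sigma):
    """Per-element diffusion vs. reaction rate scaling for a cubic
    subvolume of linear size h (volume V0 = h**3).

    D     : diffusion constant
    sigma : propensity per unit volume, so that the element's
            reaction rate is r_C ~ V0 * sigma
    Order-one proportionality constants are omitted throughout.
    """
    V0 = h ** 3
    r_D = D / h ** 2       # rate of diffusive hops to a neighboring element
    r_C = V0 * sigma       # rate of kinetic reactions within the element
    return r_C, r_D

# Refining the mesh by a factor of 2 lowers r_C / r_D by 2**5 = 32,
# so the well-stirred condition r_C << r_D is quickly satisfied, but
# the number of elements (and of diffusion events) grows as (1/h)**3.
rc1, rd1 = subvolume_rates(D=1.0, h=1.0, sigma=1.0)
rc2, rd2 = subvolume_rates(D=1.0, h=0.5, sigma=1.0)
```

The h^5 sensitivity of the ratio is exactly why SSA runs on fine meshes are dominated by diffusion events, and why leaping over many such events at once is attractive.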
As stochastic simulation is used more and more often in studying the dynamics of biological systems, and as biological models become larger and more complicated, there is a need for the development and efficient implementation of simulation algorithms that can handle the increasing complexity. Thus, the issue of how an algorithm scales with the system size is crucial in assessing the limitations of the algorithm. Scaling concerns expose the serious need for establishing realistic scalable models that can be used in benchmark studies for algorithm efficiency comparison. Pettigrew and Resat (42) have introduced two such models. It was observed (42) that the scaling qualities of the investigated algorithms break down significantly when the network reaches a certain size. We suspect that this is a common occurrence for all of the existing algorithms, which unfortunately can impose severe limitations on the utility of stochastic kinetic approaches. As a rough estimate, existing stochastic simulation algorithms can be used to investigate reaction networks containing ~10^5 reactions on a desktop machine. Utilization of high-performance resources may push this limit to ~10^8. Considering that even the smallest microbial community or physiologically relevant eukaryotic network can easily contain >10^12 reactions, it is clear that the scaling properties of the stochastic simulation algorithms need significant improvements before they can be widely used for larger problems. As discussed above, hybrid methods that reduce the computational complexity by treating the non-critical parts of the network using deterministic models may offer the most suitable solution to the scaling bottleneck in the near future.
We conclude by observing that while recent improvements in algorithm development, computational implementation, and hybrid approaches are indications of a promising future for the stochastic kinetic simulation of biological systems, there is still plenty of room for achieving significant advances.