At a molecular level, the functioning of cellular processes is unavoidably stochastic. Recent advances in real-time single cell imaging, micro-fluidic manipulation and synthetic biology have shown that gene expression and protein abundance in single cells are dynamic random variables submitted to significant variability [4
Markov processes approaches to gene networks dynamics, originating from the pioneering ideas of Delbrück [8
], capture diverse features of the experimentally observed variability, such as transcription bursts [4
], various types of steady-state distributions of RNA and protein numbers [10
], noise amplification or reduction [12
]. In the case of molecular switches, random transitions between two or several metastable states with different transcriptional activities, limit the possibility of the corresponding genetic circuits to store cellular memory and generate variability [6
Even the simplest stochastic model, such as the production module of a protein [11
] contains tens of variables and biochemical reactions, and at least the same number of parameters. Direct simulation of such networks by Stochastic Simulation Algorithm (SSA) [15
] is extremely time consuming. Network reverse engineering, model parameter identification, and design principles studies suffer from the same drawbacks when the full models are attacked with traditional tools. Various approximation methods were proposed, based on time discretization such as binomial, Poisson or Gaussian time leaping [16
], or based on fast/slow partitions [18
The time complexity of exact simulation algorithms scales with the number of jumps (reactions) per unit time. Currently available hybrid algorithms treat fast reactions as continuous variables, which significantly reduces the number of jumps [18
]. Although computationally efficient, the use of slow reactions as discrete variables, and of rapid reactions as continuous variables, does not always provide a convenient description of the hybrid stochastic process. Indeed, the discrete/continuous structure of the state space of a hybrid molecular system is better expressed in terms of species components. Activity of some promoters and production of the corresponding proteins occurs in bursts [6
]. In a bursting event, a steep increase of the number of transcripts is followed by relatively smoother exponential degradation. In this example, the natural candidates for continuous variables are the gene transcripts and products concentrations, while the states of the DNA promoters are the discrete variables. This picture also emphasizes the possibility for noise transfer from discrete, microscopic components, to continuous, mesoscopic, and closer to the phenotype, variables. In such processes, both microscopic and mesoscopic variables are noisy. At the mesoscopic scale, the stochastic fluctuations result from the intermittency of the dynamics and do not have a simple dependence on the numbers of molecules of the continuous components. Counter-intuitively, the noise amplitude could increase with the number of transcripts or proteins in a burst [6
Stochastic intermittence or bursting occurs not only in transcription, but also in many other biochemical processes, such as action potential firing [22
], blips in calcium signaling [23
], possibly happloinsufficiency [24
Thus, a large class of molecular stochastic models can be approximated by stochastic hybrid processes in which some state components are discrete, while others are continuous. Processes of this kind have been intensively studied in relation to applications in technology (air traffic management, flexible manufacturing systems, fault tolerant control systems, storage models, etc.) [25
]. Several classes of stochastic hybrid processes were studied, among which the most important are the switching diffusion processes and the piecewise deterministic processes. For switching diffusions, the continuous state component has continuous trajectories governed by stochastic differential equations, punctuated by jumps controlled by the discrete component. For piecewise deterministic processes, the continuous component follows deterministic ordinary differential equations between two jumps. The jumps can be changes of the parameters in the (stochastic) differential equations (switch of regime with no discontinuity in the trajectory), or instantaneous changes of the continuous variables (discontinuities of the trajectory), or both. Piecewise deterministic processes have been recently proposed as models for various biological systems [29
]. Diffusion approximations (without jumps) of Markov processes justify approximate versions of the Gillespie algorithm, such as the τ
-leap method [31
]. A general approach allowing to obtain and classify various hybrid approximations that can be obtained from molecular Markov jump processes is still needed and represents a first result of this paper. Hybrid stochastic processes can be obtained by applying the law of large numbers (or the central limit theorem) to the continuous components. This will be performed here by using a partial Kramers-Moyal expansion of the chemical master equation. In many cases, this procedure provides only a moderate reduction of the number of stochastic jumps to be simulated. Indeed, remaining discrete components can jump rapidly, in which case the simulation by a hybrid process performs ineffectively.
The second result of this paper allows us to cope with frequent jumps of the discrete components by using averaging. Averaging techniques are widely used in physics and chemistry to simplify models by eliminating fast microscopic variables [32
]. In our case, the microscopic variables are the discrete state components (for instance species present in low numbers) and the resulting approximations are averaged switched diffusion or averaged (piecewise) deterministic processes. Standard averaging techniques for discrete Markov chains [35
] use state aggregation. However, state aggregation algorithms are effective only for Markov chains with a finite, small number of states. A more effective averaging method, that we present here, is cycle averaging in chemical reaction networks. Our method exploits the formal analogy between the quasi-stationarity assumption for fast deterministic linear cycles [36
] and the stochastic averaging assumption for the same type of submodels.
Stochastic quasi-steady-state approximations presented in [37
] combine singular perturbations for the master equations and Van Kampen's Ω-expansion [38
] (first order Kramers-Moyal expansion) and are very close to our approach. However, we provide here a much more general setting and completely characterize the hybrid limits that result from this setting.
The stochastic dynamics of our simplified hybrid models provide good approximations of the un-reduced chemical master equation. We show rigorously elsewhere [39
] that the simplified hybrid processes are weak approximations of the un-reduced Markov process. This means that all the statistical properties of the trajectories are approximated in distribution. In particular, steady state distributions of molecular species concentrations should be accurately reproduced. Also, distributions of intermittence times like the times between bursts in the production of mRNA or proteins, or the commutation time for stochastic switches are expected to be similar for the reduced model and for the initial model.
The structure of this paper is the following. In the Methods section we obtain hybrid simplifications for pure jump Markov processes and also propose a new averaging algorithm for these models. In the "Results and Discussion" section we present several examples of hybrid models obtained from stochastic chemical kinetics models, including a medium scale protein production model for which we demonstrate averaging.