One goal of systems biology is to develop detailed models of complex biological systems that quantitatively capture known mechanisms and behaviors, and also make useful predictions. Such models serve as a basis for understanding, for the design of experiments, and for the development of clinical intervention. In support of this goal, there has been a strong push to build mechanistically correct kinetic models, often based on systems of ordinary differential equations (ODEs), that are capable of recapitulating the dynamic behavior of a signaling network. These models hold the promise of connecting biological and medical research to a class of computational analysis and design tools that could revolutionize how we understand biological processes and develop clinical therapies [1
One type of experiment for model validation involves stimulating a system with a step change in the input (typically by adding a high concentration of ligand) and then measuring the change of network readouts (the concentrations or activities of various downstream species) as a function of time. Candidate models are fit to the data and the best model is selected based on criteria such as the quality of the fit, the simplicity of the model, and other factors. While it is tempting to select a simple model consistent with the known biochemical mechanisms that fits all available data, future experimentation may prove this choice incorrect. Rather, it may be preferable to collect “all” models consistent with known mechanisms and data, and to design follow-on experiments capable of distinguishing among the model candidates. In support of this less-biased approach, here we develop an approach for designing these follow-on experiments using dynamic stimuli.
While the step-response experiment is attractive for its ease of implementation, dynamic stimuli have the potential to uncover more subtle system dynamics and to improve model selection in the cases where step-response experiments are not sufficiently discriminating. One example that illustrates the use of a dynamic stimulus to distinguish between two models is the work by Smith-Gill and co-workers on the detailed mechanism of antibody–antigen binding [3
]. Initial step-response experiments were compatible with either a one-step or two-step binding mechanism, in which the ligand and antibody first come together in a loose encounter complex before forming a fully bound complex. To resolve this ambiguity, the authors applied a series of rectangular pulses of ligand concentration to their system. The resulting binding curves produced by this dynamic stimulus were inconsistent with the one-step model but were consistent with a two-step model and suggested the existence of an encounter complex, even though such a complex could not be measured directly by the assay.
These results show that time varying inputs have the potential to distinguish closely related models of biochemical systems. For the relatively simple antibody–antigen system, an appropriate dynamic input was deduced intuitively. However this sort of intuitive design is difficult, especially in the case of more complex cell signaling pathway models, which may be described by hundreds or thousands of differential equations. An automated approach that could design experiments to test these complex systems has the potential to expand the scope of model selection experiments.
Previous work in designing dynamic stimuli for the purpose of model discrimination in systems biology has focused on choosing input trajectories that maximize the expected difference in the output trajectories of competing models [4
]. In addition to model discrimination, a rich literature exists on experimental design in systems biology for the purpose of estimating model parameters [2
]. These optimization approaches for model discrimination have been applied to small biological systems, but the nonlinearity of the models combined with the presence of many local minima has thus far limited their application [8
There is a need to extend these methods to design experiments that may not be optimal but are capable of discriminating between large pathway models. Instead of trying to design an input signal that maximizes the predicted difference between two model readouts, we recast the problem as a control problem (). We choose a target trajectory, and then challenge a model-based controller to drive the system to follow the target trajectory. The extent to which the controller based upon a given model is able to drive the physical system is a measure of the fitness of that model.
Schematic of Experimental Design
We demonstrate our methodology by applying it to the epidermal growth factor receptor (EGFR) pathway. This pathway has been extensively studied and modeled [14
]. EGFR and its family members (Erb2, Erb3, and Erb4) are known to mediate cell–cell interactions in organogenesis and adult tissues [19
]. Overexpression of EGFR family members is a marker of certain types of cancer, including head, neck, breast, bladder, and kidney [20
]. Because of their clinical importance, the EGFRs themselves, as well as various downstream proteins, are targets of therapeutic intervention [21
]. Despite clinical interest in the EGFR pathway and over 40 y of intense study, there is still much about the pathway that is not known. For example, in three recent studies [23
], a number of proteins that changed phosphorylation state in response to EGF stimulation were found that were not previously known to be part of the pathway; in addition, many of the known pathway proteins are not part of any computational model [26
The ordinary differential equation model of Hornberg et al. is a widely used mechanistic model of EGFR signaling [16
]. This model is a refinement of earlier models of the pathway [17
]. It describes signal transduction initiated at the cell surface by EGF binding to EGFR, leading eventually to the dual phosphorylation of ERK as the most downstream outcome, which then participates in a negative feedback to the top of the pathway. The elementary molecular processes modeled include bimolecular association and dissociation, phosphorylation and de-phosphorylation, synthesis and degradation, as well as endocytosis and trafficking all described with mass-action kinetics. The model contains 103 chemical species, 148 reactions, 97 independent reaction rates, and 103 initial conditions.
We applied our computational methods initially to a small portion of the EGFR model for development and demonstration purposes, and then to the full model. In both cases, we formulated a set of closely related models that exhibit similar step-response behavior. We built a controller capable of controlling each candidate model and asked the controller to drive the system output (doubly phosphorylated ERK) to a predetermined value. Finally, by applying these designed inputs based on the reference and perturbed models, we showed that it is possible to discriminate between the various model alternatives.