It is proposed that computational systems biology should be considered a biomolecular technique of the twenty-first century, because it complements experimental biology and bioinformatics in unique ways that will eventually lead to insights and a depth of understanding not achievable without systems approaches. This article begins with a summary of traditional and novel modeling techniques. In the second part, it proposes concept map modeling as a useful link between experimental biology and biological systems modeling and analysis. Concept map modeling requires collaboration between the biologist and the modeler. The biologist designs a regulated connectivity diagram of processes comprising a biological system and also provides semi-quantitative information on stimuli and measured or expected responses of the system. The modeler converts this information through methods of forward and inverse modeling into a mathematical construct that can be used for simulations and to generate and test new hypotheses. The biologist and the modeler collaboratively interpret the results and devise improved concept maps. The third part of the article describes software, BSTBox, that supports the various modeling activities.
Throughout its history, biology has been driven by two forces: the quest for new discovery and the ingenuity of toolmakers. The light microscope opened unimaginable frontiers, and so did many other inventions and techniques—the ability to manage radioactivity for labeling experiments, the crystallization of proteins, and the polymerase chain reaction, to name but a few. Rapid improvements in methodological development and subsequent biological discoveries were especially impressive during the second half of the twentieth century and the beginning of the new millennium, and with every decade, the methods of data acquisition grew immensely in sophistication. Notably, throughout most of this period, the generation of data remained slow in comparison with its analysis and interpretation. This situation shifted dramatically with the automation of techniques like the polymerase chain reaction and robotic spotting and picking in proteomics.1 Within a span of a few years, many laboratories became equipped to produce in one experiment more data points than any human would be willing or able to analyze by hand. Technology had ushered in the era of experimental systems biology. The dominant response to the flood of data was, rightly and as was to be expected, the development of computational algorithms that were by their nature as high-throughput and large-scale as the data-generating technologies. Indeed, the field of bioinformatics, which did not even exist twenty years ago, has developed rapidly, with the result that many types of genome analysis have already become routine.
While bioinformatics serves an extremely valuable purpose, it is by itself not a sufficient computational tool to yield true understanding of how biological systems function. Experimentation has given us a more complete parts list than was ever available before, and bioinformatics has allowed us to sort and manage this list with admirable efficiency. However, what is still missing is a set of tools that explain the rationale of a given biological component; that can determine with objective means why a particular, observed design in nature is superior to other designs that at first appear to be just as reasonable; that can merge diverse data and contextual pieces of information into quantitative, conceptual structures that can be analyzed with the rigor of the universal language of mathematics. The field of biological systems modeling and analysis, or “computational systems biology,” addresses these tasks.
Biological systems analysis can be traced back to several roots. Some of these may even be seen in the holistic views of antiquity2,3 or found in the early work of physiologists (e.g., see reference 4), who long ago began to investigate the nervous system, the digestive system, the cardiovascular system, and many other complex structures in the human body as integrated entities, in which diverse components had specific roles yet worked together synergistically to achieve much greater tasks than what each component could have accomplished on its own. It has also been suggested that the origins of systems biology be placed more recently, by considering it as the evolution of molecular biology into genome-wide analyses.5
While such assessments have legitimacy, more refined definitions of biological systems analysis and modeling require that purely descriptive approaches to biology be accompanied by the ability to make reliable, quantitative predictions of the responses of cells or organisms to experimentally untested situations. Moreover, as stated before, biological systems analysis must develop tools for providing objective and rigorous rationale for biological system designs and modes of operation.6 The need to make predictions and to discover general design and operating principles necessitates the consideration of a different set of roots from which modern biological systems analysis draws. This additional heritage is theoretical in nature and embodies ideas and concepts that reach beyond the possibilities afforded by traditional biological experimentation. As was true for the glorious era of physics in the early twentieth century, we are beginning to recognize the necessity to support biology with a rigorous mathematical foundation. Such a foundation not only permits bookkeeping of the large, functional assemblages of heterogeneous molecules that we encounter everywhere in biology, but is a prerequisite for the formulation of rules and general laws that will eventually form the rudimentary building blocks of a theory of biology. The roots of this crucial aspect of systems biology are evident in the work of seminal thinkers and visionaries like von Bertalanffy, who over half a century ago proposed the mathematical characterization of organisms as dynamic, open, nonlinear, complex systems.7 The work of von Bertalanffy, Lotka, Wiener, Weaver, Turing, Rosen, and many others who strove toward the quantitative description of biological systems and the discovery of general principles governing biology shaped the theoretical foundation of today’s computational systems biology and will continue to provide guideposts for its future.
Biological systems analysis will never replace hypothesis-driven and reductionistic biological research, but constitutes its conceptual and practical complement. Biological systems analysis may ultimately attain a role as central as that of theory in physics, where experiments are executed only after their theoretical underpinnings have been proven beyond doubt, but we are presently far from such prominence. In the meantime, biological systems analysis will provide additional tools and techniques for functionally organizing diverse pieces of information and data that stem from traditional biological experimentation. Therefore, computational systems biologists must collaborate closely with experimentalists who focus sharply on select biological details and mechanisms.
Biological systems analysis may be dissected into four components:
For biological systems modeling and analysis to become a standard research technique with wider appeal, it is necessary not only to develop its theoretical foundation, but also to support all major methodologies with readily available, easy-to-use computational tools. We are still far from having such tools of a quality and accessibility comparable to modern word processors or spreadsheet programs, but the number of software packages for specific types of biological systems analyses is rapidly growing. Here, we describe three aspects of computational systems biology. First, we reclassify some of the traditional methods and techniques of biological systems modeling and analysis. We then introduce a variation on these methods, namely the novel technique of concept map modeling. Finally, we demonstrate preliminary software that supports traditional and concept map modeling.
To the uninitiated, mathematical modeling is often seen as one standard set of tools, conceptually similar to a specific technique like electron microscopy or microarray analysis. Indeed, experimentalists frequently approach a modeler with the request to “model their data.” The truth is that mathematical modeling comprises an enormous repertoire of techniques, and the only real commonality is that they all lead to some mathematical representation of a biological phenomenon (the model), which is subsequently analyzed and interpreted in biological terms. To some degree, the type or classification of the mathematical representation is a technical issue. Thus, a model may be deterministic or stochastic, continuous or discrete, mechanistic-explanatory, or more like a black box. In any case, the modeling process may be exactly the same. First, a symbolic model is constructed from first principles like physical laws, as an extension of an already existing model, or from intuition. This model almost always consists of equations that contain variables and parameters. Variables could be plant or animal species in an ecological system, metabolites in a pathway model, or the expression levels of particular genes in a genome experiment, while the parameters describe more or less fixed quantities like the reproductive rate of a species, the KM of an enzyme, or the transcription rate between DNA and RNA. The analysis of the mathematical model usually requires knowledge of all parameter values, which therefore need to be identified from the body of biological knowledge. While this may sound like a straightforward task, the estimation of parameter values is often very challenging and has been, and will continue to be, the most daunting bottleneck of mathematical modeling in biology. Once the parameters are estimated, the analysis of the model is (one could say “merely”) a matter of mastery of the tricks of the trade of mathematics and computer science. 
Because of the complexity of biological systems, the analysis is typically executed by computer, sometimes with elegance, but more often with brute force, grinding out approximate solutions that are more than sufficiently accurate for most biological purposes. The interpretation of results is ideally performed in collaboration between the subject area biologist and the mathematical modeler or computer scientist.
The real obstacle to fast progress in biomathematical modeling is thus the determination of unknown parameters from biological information. Even within the same modeling framework, this task may be attacked in distinctly different ways. We illustrate this here with a generic approach to biological systems modeling and analysis, called biochemical systems theory (BST).8,9 This framework was originally designed for studying the dynamics and other features of biochemical and gene regulatory systems, but is not restricted to these application areas in terms of its mathematical foundation. BST is almost forty years old, and its development, expansions, and applications have been documented in several books10–13 and hundreds of journal articles, proceedings, and book chapters.
The basic tenets of BST are quite simple and transparent. In a nutshell, each variable that changes over time is given a name, typically X with an index, and its dynamics is formulated as an ordinary differential equation, which describes the change over time as it is governed by processes that affect this variable. The processes are functions of other variables within and outside the system and often of the variable itself. In most realistic cases, the modeler has some general information about these processes, but does not know their numerical details or even their mathematical structure. As an example, a population may grow in some sigmoid fashion, but the underlying mathematical function may not necessarily be known. BST addresses this problem by symbolically approximating each process with a product of power-law functions. One may wonder how it is possible to approximate something unknown, but this is conceptually no different than executing a linear regression on data points from a system with unknown structure. The only difference is that the use of power-law functions in BST is much more general than linear regression. In fact, one can—quite surprisingly—show with mathematical means that any relevant nonlinearity can be faithfully represented in the power-law formulation of BST,14 which implies that we are not likely to run out of mathematical representations. On the biological side, many successful analyses attest to the validity of this approach (see, e.g., reference 12).
To be specific, suppose that the variables of the system are called Xi and the processes Vi. In BST, each process Vi involving at most n dependent (state) variables and m independent variables takes the form

$$ V_i = \gamma_i \prod_{j=1}^{n+m} X_j^{f_{ij}} $$
Here, the dependent variables are governed by the dynamics of the system and change accordingly, whereas the independent variables are usually constant during each experiment and may include inputs, control variables, and constant enzyme activities. Typical examples in a metabolic setting are metabolites (dependent variables) as opposed to enzymes or a (constant) substrate feed (independent variables). The rate constant γi describes the turnover rate of the process, and each exponent fij is a kinetic order that quantifies the direct effect of variable Xj on Vi. A positive kinetic order indicates an activating or otherwise augmenting effect, while a negative kinetic order reflects inhibition. The magnitude of each kinetic order reflects the strength of the effect. If there is no direct effect, the corresponding kinetic order is 0, and the variable, raised to 0, automatically drops out of the term, because any positive value of X, raised to 0, equals 1.
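As a small illustration of the power-law form just described, the sketch below evaluates a single BST process term in Python. The function name `power_law_rate` and the numerical values are our own, chosen only to show how the rate constant and kinetic orders interact, and how a kinetic order of 0 removes a variable from the term.

```python
import numpy as np

def power_law_rate(gamma, X, f):
    """BST power-law process: V = gamma * prod_j X_j ** f_j.

    gamma : non-negative rate constant
    X     : values of the (dependent and independent) variables
    f     : kinetic orders, one per variable; 0 means no direct effect
    """
    X = np.asarray(X, dtype=float)
    f = np.asarray(f, dtype=float)
    return gamma * np.prod(X ** f)

# The first variable activates the process (f = 0.5), the second inhibits
# it (f = -1.0), and the third has no direct effect: raised to 0, it
# contributes a factor of 1 and effectively drops out of the term.
v = power_law_rate(2.0, [1.5, 3.0, 10.0], [0.5, -1.0, 0.0])
```

Because the third kinetic order is 0, changing the value of the third variable leaves the rate unchanged, mirroring the drop-out behavior described in the text.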
Equations within BST may be formulated in slightly different ways. In the generalized mass action (GMA) representation, every process is considered individually. Thus, if the change in Xi is governed by p input and q output processes, the starting point is the equation

$$ \frac{dX_i}{dt} = \sum_{k=1}^{p} v_{ik}^{+} - \sum_{k=1}^{q} v_{ik}^{-} $$

and each of these processes is represented by a distinct product of power-law functions as shown above, so that the resulting equations always have the form

$$ \frac{dX_i}{dt} = \sum_{k=1}^{K_i} \pm\, \gamma_{ik} \prod_{j=1}^{n+m} X_j^{f_{ikj}} $$

where Ki is the sum of p and q in the ith equation.
As an alternative, the S-system form first aggregates all influxes for a given variable into a single power-law term and similarly aggregates all effluxes into a second power-law term. The starting point for this aggregation may thus be formulated generically for each variable as

$$ \frac{dX_i}{dt} = \left[ \sum_{k=1}^{p} v_{ik}^{+} \right] - \left[ \sum_{k=1}^{q} v_{ik}^{-} \right] = V_i^{+} - V_i^{-} $$

As before, the first group of terms in brackets consists of fluxes entering the metabolite pool Xi; the second group in brackets consists of fluxes leaving this pool. The quantities Vi+ and Vi− are these groups of “aggregated” fluxes, which are now approximated as one product of power functions each. The corresponding S-system equations in BST therefore consist of at most one positive and one negative power-law term and have the form

$$ \frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n+m} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n+m} X_j^{h_{ij}} $$
Again, the rate constants α and β are non-negative and the kinetic orders g and h are real-valued. Variables that directly contribute to the influx into Xi are included in the alpha term and variables affecting the efflux from Xi populate the beta term. Variables that do not directly affect these terms have kinetic orders of 0 and therefore do not appear explicitly. The literature contains numerous descriptions of how to set up and analyze these types of models, as well as discussions about the similarities, differences, advantages, and disadvantages of the two alternative representations (e.g., references 12, 13).
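To make the S-system form concrete, here is a minimal simulation sketch. The article's own software is MATLAB-based; this stand-alone example uses SciPy instead, and the two-variable system, its rate constants, and its kinetic orders are invented purely for illustration, not taken from any published model.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical two-variable S-system with one constant independent variable.
alpha = np.array([2.0, 1.0])   # influx rate constants
beta = np.array([1.0, 0.5])    # efflux rate constants
X3 = 1.0                       # independent variable (e.g., a substrate feed)

def s_system(t, X):
    """dXi/dt = alpha_i * prod(Xj**g_ij) - beta_i * prod(Xj**h_ij)."""
    X1, X2 = X
    dX1 = alpha[0] * X3**0.5 - beta[0] * X1**0.8   # fed by X3, drained via X1
    dX2 = alpha[1] * X1**0.8 - beta[1] * X2**0.5   # fed by the efflux of X1
    return [dX1, dX2]

sol = solve_ivp(s_system, (0.0, 200.0), [0.1, 0.1], rtol=1e-8, atol=1e-10)
X_end = sol.y[:, -1]   # relaxes toward the steady state (2**1.25, 16.0)
```

Setting each equation to zero shows the steady state analytically (X1 = 2^1.25, X2 = 16), a small instance of the kind of algebraic tractability that makes the S-system form attractive.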
A key advantage of any representation within BST is that it is straightforward to set up equations from a diagram that shows how the components of the system affect each other. In fact, given a list of components and their influences, computer programs are able to write down the equations (see below). Again, the challenging part is the identification of suitable parameter values. At this point, three classes of methods are available for this task. They are briefly described now; in a later section we will introduce an additional class.
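The equation-writing step just mentioned can be sketched in a few lines: given which variables affect each pool's influx and efflux, the symbolic S-system equations follow mechanically. The map format, variable names, and parameter naming scheme below are hypothetical, meant only to convey the idea, not to reproduce any particular program's conventions.

```python
# Each entry maps a dependent variable to the variables affecting its
# influx and its efflux (hypothetical example map).
influences = {
    "X1": (["X3"], ["X1"]),        # X3 feeds X1; efflux depends on X1 itself
    "X2": (["X1"], ["X2", "X4"]),  # X4 modulates the removal of X2
}

def s_system_equations(influences):
    """Write one symbolic S-system equation per dependent variable."""
    equations = []
    for i, (var, (influx, efflux)) in enumerate(influences.items(), start=1):
        pos = f"a{i} * " + " * ".join(f"{v}^g{i}{v[1:]}" for v in influx)
        neg = f"b{i} * " + " * ".join(f"{v}^h{i}{v[1:]}" for v in efflux)
        equations.append(f"d{var}/dt = {pos} - {neg}")
    return equations

eqs = s_system_equations(influences)
# eqs[0] reads: dX1/dt = a1 * X3^g13 - b1 * X1^h11
```

The kinetic orders and rate constants in these strings are exactly the unknowns that the parameter-identification methods discussed in the text must then supply.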
The standard way of identifying parameters is based on “local” information. Thus, to construct a model of a metabolic pathway, one considers one enzymatic or transport step at a time, combs the literature for information about this enzyme, its cofactors and modulators, and translates this information into a mathematical rate law (such as the vjk above), which could be a Michaelis-Menten or power-law function, among a wide variety of possibilities. The collection of all rate laws governs the dynamics of the model. Comparisons of the model responses with biological observations support the validity of the model or suggest adjustments in assumptions or parameter values. If done right, this forward process can ultimately lead to model representations of the pathway that exhibit the same features as reality, at least qualitatively, if not quantitatively. Some BST examples are models of the TCA cycle,15 purine metabolism,16–18 the citric acid cycle,19,20 the Maillard-glyoxylase network with formation of advanced glycation end products,21 the trehalose cycle,22 sphingolipid metabolism,23,24 the ferredoxin system,25 and the regulation of glycolysis.26,27 In almost all of these cases, the strategy consisted of setting up a symbolic model, estimating local parameters, studying the integration of all individual rate laws into a comprehensive model, testing the model, and making refinements to some of the model structure and the parameter values.
The main disadvantage of this strategy is that a considerable amount of kinetic information is needed, but this information is often available only from differently structured experiments and often only from different species. Furthermore, the process of construction and refinement is very labor intensive and requires a combination of biological and computational expertise that is still rare.
Modern molecular biology is offering an alternative in the form of high-throughput experimental data. Particularly useful for modeling and parameter estimation are measurements of biological components (metabolites, proteins, gene expression) at a series of time points after a stimulus. As an example, which we will use throughout for illustration, Neves et al.28–30 used in vivo NMR techniques to measure the concentrations of most glycolytic metabolites in the bacterium Lactococcus lactis. This type of data allows, at least in principle, an entirely different path toward suitable parameter values. Namely, the symbolic BST model (without specified parameter values) is fitted to the time series data by means of an optimization algorithm. Thus, in contrast to the forward, “bottom up” approach described before, parameters are estimated from the observed data “globally” or “top down.” The advantages are manifold. Most important is that the data come from the same organism, are obtained under the same well-characterized experimental conditions, sometimes even in vivo, and therefore account for all processes within the organisms that could have an effect on the variables of the system.31 A significant disadvantage of this strategy is that the estimation process itself is very challenging computationally. Also, many of the processes that affect the dynamics of the system in vivo are often either unknown in detail or not even considered at all in the model. As a typical example, most standard models of glycolysis show glucose 6-phosphate as a simple intermediate in a linear chain between glucose, fructose 6-phosphate, and fructose 1,6-bisphosphate. In reality though, glucose 6-phosphate is in equilibrium with glucose 1-phosphate and also serves as the initial substrate for the pentose shunt.
The inverse strategy may also be used for time series data that stem from a pathway with a structure that is not fully known or whose regulation is obscure. At least in principle it is possible to estimate parameter values from the data and interpret them as structural and regulatory features. With good data, this is presently possible for relatively small pathways.32–34
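One common way to carry out such an inverse estimation is to decouple the system: the derivatives are replaced by slopes estimated from the (smoothed) time series, and each power-law equation is then regressed separately against those slopes. The sketch below illustrates this on manufactured pseudo-data from a one-variable system; the "true" parameters and the use of SciPy's `least_squares` are our own choices for illustration, not the estimation machinery used in the cited studies.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# "True" parameters, used here only to manufacture pseudo-data.
alpha_true, beta_true, h_true = 2.0, 0.5, 1.5

sol = solve_ivp(lambda t, X: [alpha_true - beta_true * X[0] ** h_true],
                (0.0, 5.0), [0.1], dense_output=True, rtol=1e-9, atol=1e-11)
t = np.linspace(0.0, 5.0, 60)
X = sol.sol(t)[0]

# Decoupled inverse step: estimate dX/dt numerically from the series,
# then regress the power-law right-hand side against those slopes.
slopes = np.gradient(X, t)

def residuals(p):
    alpha, beta, h = p
    return slopes - (alpha - beta * X ** h)

fit = least_squares(residuals, x0=[1.0, 1.0, 1.0],
                    bounds=([0.0, 0.0, 0.0], [10.0, 10.0, 3.0]))
alpha_est, beta_est, h_est = fit.x
```

With clean, densely sampled data the fit essentially recovers the generating parameters; with real, noisy series the slope-estimation step is the fragile part, which is one reason this task remains computationally challenging.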
A particular problem with any modeling approach arises in the form of “ubiquitous” metabolites like ATP and NAD. Again using the example of glycolysis, ATP and NAD are clearly important players, but they are also involved in dozens of other reactions that are not part of the model. In the past, some mathematical models have considered them constant, which really defeats the purpose of glycolysis. Others have employed conservation relationships between ATP, ADP, and AMP, or NAD and NADH, thus allowing for some dynamics without having to model a lot of details (e.g., reference 35). Another alternative, gleaned from biochemical laboratory experimentation, is the construction of mathematical buffers that absorb excess material, while providing material in times of high demand. The mathematical features of the buffers can be designed to adjust for dynamic variations in concentration at a predetermined rate.36 As yet another way of dealing with ubiquitous metabolites, we recently proposed a partial modeling approach,37 which allows us to mix well-defined components with components whose dynamics are known only in the form of time series that are experimentally observed but cannot be formulated in terms of other model components. As a case in point, for our recent analysis of time series data describing pyruvate and lactate production in L. lactis,27 measured ATP, Pi, NAD, and NADH concentrations over time were available, but we were not in a position to formulate their dynamics as functions of the system variables, because each of these factors is involved in dozens of reactions, most of which were not modeled. As a solution, the better-defined components were formulated as differential equations in BST, and their dynamics included ATP and NAD as variables. However, ATP and NAD were not themselves modeled with differential equations but entered the system as time-dependent input functions.
In mathematical jargon, we considered the observed data as raw or smoothed “forcing functions” to the differential equation system.
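The forcing-function idea can be sketched as follows: a measured time course is smoothed by interpolation and then simply evaluated inside the right-hand side of the differential equation. The ATP values and the power-law parameters below are invented solely to illustrate the mechanism and are not the L. lactis measurements.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

# Hypothetical "measured" ATP time course (times in arbitrary units).
t_obs = np.array([0.0, 1.0, 2.0, 4.0, 8.0, 15.0])
atp_obs = np.array([4.0, 3.2, 2.5, 2.0, 2.4, 3.8])
atp = interp1d(t_obs, atp_obs, kind="cubic")   # smoothed forcing function ATP(t)

def rhs(t, X):
    # One dependent variable whose efflux is modulated by ATP(t); the
    # power-law parameters are illustrative, not from a fitted model.
    influx = 1.5
    efflux = 0.8 * X[0] ** 0.5 * float(atp(t)) ** 0.4
    return [influx - efflux]

sol = solve_ivp(rhs, (0.0, 15.0), [1.0], rtol=1e-8)
```

Here ATP never appears on the left-hand side of an equation: its observed dynamics drive the system from the outside, exactly in the spirit of a forcing function.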
While the traditional and newer modeling strategies are very valuable, an important step is missing between the wet experiment, biological insight and intuition, and the construction of mathematical models. This step consists of the translation of heterogeneous biological diagrams into symbolic and subsequently parameterized equations. Heterogeneous means that some components in such diagrams refer to metabolites, some to genes, and some to a variety of processes, such as apoptosis, differentiation, or the initiation of a disease process. A good example is illustrated in Figure 1, which shows how surface features can trigger proliferation and differentiation events in overlaying cells. Concept map modeling attempts to fill the gap between this type of mixed explicit and implicit information and the typical mathematical models encountered today in biology.
The goal of concept map modeling is the development of novel methods for formalizing qualitative knowledge in diagrams like Figure 1, for creating conceptual models, and for making these amenable to coarse, and later refined, mathematical analysis. Typical conceptual models involve components and processes at various levels of biological organization, and the traditional response to this situation by the modeling community has often been to set up models within each level, with the implicit or explicit purpose of connecting them at a later time (e.g., reference 4). To reach its ultimate goals, this approach requires a lot of time, and it does not even fully exploit the implicit knowledge that biologists associate with such concept maps. Therefore, it seems beneficial to derive models that directly capture the biologist’s intuition and permit the inclusion of incomplete and heterogeneous information from one or several levels at once. Thus, the foremost goal of this approach is to get started with a quantitative formalization of a biological phenomenon, by integrating into a coarse mathematical model what biologists perceive to be functional systems surrounding their hypotheses.
The approach begins with the translation of a hypothesized, static biological map into a hypothetical, mathematical model structure. This step is followed by the formalization of observed or alleged local dynamic responses of system components under a defined set of external conditions. These two ingredients allow us, at least in principle, to infer a fully parameterized model, which in subsequent steps is analyzed, refined, and connected with other coarse or detailed concept map models. A flow diagram of this approach is shown in Figure 2.
The initial step of this effort is to establish, in collaboration with biologists, lists of components and processes with relationships and rules that are visualized in the concept map. At this stage, it is necessary to discuss, question, and revisit in detail how the pieces within each aspect of the diagram conceptually relate to each other and how they contribute to the overarching functional entity, which at the highest level will eventually become a comprehensive model of the topic of investigation. In many cases, these maps already exist in the heads of experienced biologists, and sometimes they have already been sketched out and published. However, the maps themselves do not allow further, quantitative analysis, and it is therefore desirable to facilitate the translation of hand-waving arguments into mathematically testable structures and analyses.
For each component of the map, one records details that might become important, as well as the biologist’s level of confidence in all pieces of qualitative and quantitative information populating the map. To mathematically trained scientists, this first formalization step might be uncomfortable and appear almost unscientific, but it is a crucial (and ultimately very rewarding) process, because biological systems analysis cannot wait until all details of a system are known and solidly quantified. To ensure that the mathematical representations of concept maps are consistent and unambiguous, it is useful to develop consistent diagramming conventions for concept maps that are extensions of ideas for biochemical maps, as well as a controlled vocabulary used to capture the map and its associated semi-quantitative data. Choosing BST as a modeling framework, the structural and regulatory information from traditional maps is already sufficient to set up symbolic equations. Indeed, this step is routine textbook material and can be accomplished with computer software (see below).
Biologists almost always know much more than what is captured in static maps. For instance, they often have at least some knowledge of the types of reactions that are possible in a single enzymatic step or the time it takes between a change in gene expression and the corresponding change in enzyme activity. This knowledge enables the biologist to convert the static concept map into a collection of Boolean or semi-quantitative dynamic (SQD) maps for given scenarios. In the Boolean case, a typical statement like “if we knock out gene X, then process Y does not occur” aids in the refinement of the static map, because it suggests the closer consideration of direct and indirect influences of components inside or outside the system. In the SQD case, a typical observation may be “if we bathe the cells in a medium containing A, B starts to rise within four or five minutes and C decreases to about half its normal level.” In the ideal case, one would be able to determine from this information accurate functions for each node over a range of scenarios, which would permit direct application of our inverse methods,32,38 as discussed before.
In realistic cases of concept map modeling we will usually not have detailed time series but rather semi-quantitative or only qualitative information on the dynamics of each node. Still, one can use this minimal information for the construction of a coarse initial model. Substituting for actual time series, one may choose a simple function that captures the dynamics at each node, according to the biologists’ observation, general experience, or intuition. The particular mathematical choice of these functions is not all that critical, because they are used only for the inverse task in lieu of smooth data. Thus, our emerging software (see below) permits the biologist to sketch the process dynamics, for instance, in the form of a saturating or sigmoidal curve. Once this curve is drawn, the software permits modifications and alterations in its shape by allowing the addition of points or free dragging of the curve. These curves capture the biologists’ semi-quantitative understanding of the effect of a given variable on the dynamics of a given node in a given scenario.
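Such a hand-sketched curve might be represented computationally as a shape-preserving spline through a handful of dragged control points, as in the sketch below. The control points are a hypothetical drawing, standing in for the statement "B starts to rise after a lag of a few minutes and saturates near 2.0"; they are not measured data.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical control points a biologist might drag out in a GUI.
t_pts = np.array([0.0, 2.0, 4.0, 6.0, 10.0, 20.0])
b_pts = np.array([0.1, 0.12, 0.4, 1.2, 1.8, 2.0])

# A monotone (shape-preserving) interpolant avoids spurious overshoot,
# which matters because the curve only encodes a qualitative impression.
sketch = PchipInterpolator(t_pts, b_pts)

t = np.linspace(0.0, 20.0, 101)
b_curve = sketch(t)   # treated downstream exactly like a measured time series
```

Because the interpolant is monotone through monotone points, the resulting pseudo-series never dips or overshoots, preserving the biologist's intended shape while providing the smooth data that the inverse methods require.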
Once the proposed dynamics of all variables is entered, the curves are treated like measured time series, and methods of inverse modeling can be used to estimate parameters. One must caution that, in contrast to the relatively straightforward symbolic model design step, this parameterization is complicated. While our current software does permit parameter estimation, it is still much too slow for an interactive exploration of the concept map, and further research will be necessary to make this step fast enough. Several methods have recently been proposed for this purpose, but none of them is at this point sufficiently reliable, stable, and efficacious.
In summary, the starting point for concept map modeling is a network model based on the known or hypothesized connectivity and regulatory information on the static concept map (forward step), as well as a set of assumed or manually entered functions that mimic the observed or hypothesized dynamic responses under a specific scenario at each node of the network. These two ingredients are sufficient for inverse modeling techniques to estimate parameters of a global BST model for one scenario at a time or for a collection of several scenarios that are assessed simultaneously. The resulting coarse models may be used in two ways. First, it is possible to test stability, sensitivities, gains, and other features that are routinely studied in biological systems analysis.10,12 These features show whether the conceptual model has a chance of being correct, because very high sensitivities, or lack of stability, are often unrealistic. Second, one can now run simulations, which are cheap to execute and often quickly lead to the discovery of weaknesses in the model or confirm assumptions made. The results of this set of analyses are shared with the subject area biologists for interpretation and revisions of their concept maps, thereby initiating the typical iterative cycle of modeling and validation (see Figure 2).
New computational techniques become appealing to a wider audience only if they are supported by user-friendly software with an intuitive graphical user interface (GUI). Of course, many mathematical packages contain algorithms for integrating differential equations and for various types of optimization. Specific software has even been developed for solving and analyzing metabolic models, once they have been formulated in the form of fully parameterized differential equations. Examples include PLAS,39 Gepasi,40 and BSTLab.41 The packages BestKit42 and Cadlive43 furthermore allow the translation of topological diagrams into symbolic equations. Here we show an interactive MATLAB module that is under development in our laboratory. This software, the Biochemical Systems Toolbox (BSTBox), is presently available as a preliminary test version that allows the user to conduct traditional and concept map modeling, as described above. The main features of this toolbox are:
Besides the basic MATLAB installation, BSTBox also requires the Optimization, Genetic Algorithm, and Spline Toolboxes; future versions may reduce these requirements. BSTBox is designed to facilitate forward, inverse, partial, and concept map modeling. Each major step of the modeling approach is supported by dedicated functionality organized in separate tabs of the toolbox, which unlock progressively as the user advances through the stages of model development and analysis.
Upon successful installation of the required libraries in MATLAB, BSTBox is invoked with the simple command “BST,” which allows the user either to: (1) create a new model (map) from a list of components and time-course data (experimental or hypothetical) specified in a Microsoft Excel file; or (2) load an existing model (map) from an earlier saved MATLAB data file. The interface is shown in Figure 3.
To illustrate the software, we consider the pathway in Figure 4, which describes the regulation of glycolysis in the bacterium Lactococcus lactis. A detailed description of the modeling process for this system was recently presented elsewhere.27,44 The flow of material in this pathway is governed by the following steps (for abbreviations, see the legend of Figure 4):
In addition to the flow of material, the regulation of the pathway is of importance. In this case study, we consider one known pair of modulators and one hypothesized inhibition process, namely:
The first step of designing a new model consists of deducing symbolic equations from a list of components. For this task, it is assumed that the user has an MS-Excel spreadsheet with true experimental or hypothetical time-course data, identified in the first row by a title for each column. BSTBox screens the Excel file and presents the user with the GUI shown in Figure 5, displaying five list categories, from which the user selects: (1) the MS-Excel sheet name containing raw data; (2) the MS-Excel sheet name containing smoothed data (if any); (3) the column title containing time data; (4) the column title(s) containing time-course data for dependent variables to be modeled; and (5) the column title(s) containing (constant or varying) time-course data for “offline” and/or independent variables.
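The bookkeeping behind this five-way selection can be made concrete with a short Python sketch (the column titles and the helper `classify_columns` are hypothetical illustrations; BSTBox itself performs this step through its MATLAB GUI, and for a real file one would load the sheet with `pandas.read_excel`):

```python
import pandas as pd

# Stand-in for one sheet of the user's spreadsheet; the column titles
# ("time", "G6P", "FBP", "glucose") are hypothetical placeholders.
sheet = pd.DataFrame({
    "time":    [0, 5, 10, 15, 20],
    "G6P":     [0.1, 1.8, 2.4, 1.9, 0.7],
    "FBP":     [0.0, 2.5, 6.1, 5.0, 1.2],
    "glucose": [40.0, 31.0, 18.0, 7.0, 0.5],   # independent/offline input
})

def classify_columns(df, time_col, offline):
    """Split the column titles into the categories the GUI asks for."""
    dependents = [c for c in df.columns if c != time_col and c not in offline]
    return {"time": time_col, "dependent": dependents, "offline": list(offline)}

selection = classify_columns(sheet, "time", offline=["glucose"])
```

Everything not selected as the time column or as an offline/independent input is treated as a dependent variable to be modeled.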
The model analysis may now proceed in different ways, depending on how much is known about the underlying pathway structure. At one extreme, the pathway structure and its regulation may be known in great detail. This is the case for our Lactococcus case study, with the possible, minor exception of whether or not G6P inhibits glucose uptake. In this situation, BST directly prescribes which variables are to be included in which processes, and this information is already sufficient to formulate equations.12 At the other extreme, little specific may be known about the pathway. In such a case, all variables enter all equations, and the later parameter estimation step will ideally identify which parameter values are zero in the optimized solution, thereby indicating which effects of variables on processes are real and which are not. This situation of structure identification has been discussed widely in the recent literature (e.g., reference 37). Not surprisingly, it is much more complex than inverse modeling with a known pathway structure.
Once all constituents of the map are identified, the user specifies, as far as contextual information allows, how these components interact and influence each other. As has been discussed many times in the literature (e.g., reference 12; also see above), a key distinction to be made here is between the flow of material among constituents, such as in the phosphorylation of glucose to glucose-6-phosphate, and the modulation of a process, such as the feedforward activation of pyruvate kinase by FBP. In many cases, a constituent may have several distinct roles. It may be the product of one reaction and the substrate for another reaction, and furthermore modulate one of the reactions modeled in the system. Often, this information has to be mined from the experimental literature or characterized in collaboration with a subject area expert. It is often convenient to organize this information as a list of processes in the system and to associate with this list the components that contribute to or modulate those processes. Table 1 summarizes this information for the glycolytic pathway.
As a specific example, the dynamics of PEP is affected by one production process, namely the degradation of 3-PGA, and four degradation processes, namely: (1) the PTS mechanism, which is affected by glucose, PEP, and G6P; (2) catalysis by pyruvate kinase into pyruvate, modulated by FBP and Pi; (3) degradation back into 3-PGA; and (4) unspecific channeling into other metabolic pathways.
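In GMA form within BST, each of these processes becomes a product of power-law functions of the variables that affect it, so the PEP balance just described might be written as follows (the rate constants γ and the kinetic orders f and g are illustrative placeholders, not fitted values):

```latex
\frac{d[\mathrm{PEP}]}{dt} =
    \gamma_1\,[\text{3-PGA}]^{f_1}
  - \gamma_2\,[\mathrm{Glc}]^{g_1}[\mathrm{PEP}]^{g_2}[\mathrm{G6P}]^{g_3}  % (1) PTS uptake
  - \gamma_3\,[\mathrm{PEP}]^{g_4}[\mathrm{FBP}]^{g_5}[\mathrm{Pi}]^{g_6}   % (2) pyruvate kinase
  - \gamma_4\,[\mathrm{PEP}]^{g_7}                                          % (3) back to 3-PGA
  - \gamma_5\,[\mathrm{PEP}]^{g_8}                                          % (4) unspecific drain
```

Each flux term contains exactly the variables listed for that process; the signs and exponents of the kinetic orders encode activation or inhibition.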
BSTBox provides a GUI (Figure 6) that permits the direct specification of all such processes of a table. It allows the user to select one metabolite at a time, in the left-most list, and then to include in the other lists the components that contribute to or modulate the influx or efflux for that particular metabolite. The buttons immediately below these lists allow the specification of the number of such fluxes as well as their inspection, one at a time. A right-click menu allows the user to add or delete metabolites to or from the list.
Another feature is the option of accounting for constraints such as precursor-product relationships. This specific GUI is shown in Figure 7. Constraints should be implemented only after due consideration, as has been discussed extensively.27 As an example of possibly unexpected complications, consider the unbranched, irreversible reaction step between G6P and FBP (Figure 4). According to the map, all material leaving the G6P pool immediately enters the FBP pool, suggesting a “hard” precursor-product constraint. However, it is known that organisms have secondary routes of generating and using G6P, which are not included in the model but are presumably active in the organism and affect the observed time courses. Thus, insistence on hard constraints might sometimes be too restrictive. On the other hand, omission of known constraints increases the parameter search space, which is often a disadvantage. No general guidelines can be given, except that cautious consideration is advised.
Given the list of fluxes and their effectors, BSTBox permits the formulation of GMA or S-system equations with a few clicks. Figure 8 shows the GUI for generating the (not yet parameterized) model system, based on the table of processes and constraints that were specified earlier. BSTBox also offers the option to view and format this system of equations in multiple formats, such as HTML, MS Word, and MS PowerPoint. For instance, the equations for the Lactococcus case study, formatted in HTML, are:
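The mechanics of turning a process table into symbolic power-law fluxes can be illustrated in a few lines of Python (the flux names and variable lists are a hypothetical fragment of Table 1, and `power_law` is an illustrative helper; BSTBox performs this step internally in MATLAB):

```python
# Hypothetical fragment of the process table: each flux lists the
# variables that contribute to or modulate it.
fluxes = {
    "PTS": ["Glc", "PEP", "G6P"],   # glucose uptake via the PTS
    "PK":  ["PEP", "FBP", "Pi"],    # pyruvate kinase step
}

def power_law(name, variables):
    """Render one GMA power-law flux as a symbolic string:
    gamma_name * X1^f_name_X1 * X2^f_name_X2 * ..."""
    factors = [f"gamma_{name}"] + [f"{v}^f_{name}_{v}" for v in variables]
    return " * ".join(factors)

for name, variables in fluxes.items():
    print(f"v_{name} = {power_law(name, variables)}")
```

Summing such terms with appropriate signs for each dependent variable yields exactly the kind of unparameterized symbolic system that the GUI in Figure 8 generates.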
As a future enhancement, the BSTBox GUI (Figure 8) will allow the transport of these symbolic equations into BSTLab,41 which permits a variety of analyses in numerical, symbolic, or semi-symbolic form.48,49
At this juncture, the user has achieved only a partial specification of the model system. It is partial since the parameter values are still unknown or unspecified. If the parameter values are unknown, which is usually the case, the user may use the “Estimate Parameters” tab, as detailed in a later section. If the parameter values are known, the user may bypass the parameter estimation phase by directly typing these values into the GUI, as shown later.
BSTBox allows the user to visualize and edit the data on which the model is to be based. Data manipulations are initiated with the “Edit Time Series Data” button (shown on the GUI in Figure 9), which in turn invokes the SplineTool (shown in Figure 10). The SplineTool allows the user to smooth time courses with predefined filters or with curve fits such as cubic-spline interpolants or a least-squares approximation. The user may also manually add or delete time points. An added feature allows the user to move a given point directly into a desired position, by increasing or decreasing the x- or y-value of the data point with the “+”/“−” buttons in the lower left section of the SplineTool GUI. After completing data entry and editing, the user has the option to export the data into an MS-Excel file from the BSTBox GUI (Figure 9). Permitting this type of inclusion of hypothetical data opens an entirely new realm of modeling possibilities, as will be illustrated later.
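The kind of smoothing the SplineTool performs can be approximated outside MATLAB with, for example, SciPy's least-squares smoothing spline (the synthetic time course and the smoothing factor `s` are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic noisy time course standing in for one measured metabolite.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 30.0, 31)
trend = np.exp(-0.1 * t)                      # assumed underlying decay
raw = trend + 0.05 * rng.standard_normal(t.size)

# Cubic least-squares smoothing spline; s bounds the sum of squared
# residuals and thus sets the smoothness/fidelity trade-off.
spline = UnivariateSpline(t, raw, k=3, s=t.size * 0.05**2)
smoothed = spline(t)
```

Toggling between raw and smoothed data, as the BSTBox GUI allows, then amounts to choosing which of the two arrays is handed to the parameter estimator.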
For our Lactococcus case study, actual time-course data characterizing all metabolites of the pathway are available in the form of in vivo NMR measurements of all involved variables.28–30 As explained above, the BSTBox GUI (Figure 9) in combination with the SplineTool (Figure 10) allows the user to view the metabolic time-series data, to edit and smooth them, for instance with a spline, and to toggle between raw and smoothed data. Smoothing is often computationally beneficial for purposes of parameter estimation. If there is suspicion that the smoothing process might introduce undue bias, the raw data may be fitted again, once the estimation based on the smoothed data has produced reasonable initial parameter estimates. Using the SplineTool, each experimental time course for the Lactococcus case study was edited by manually removing noisy data points. The semi-smoothed curves thus obtained are shown in Figure 11.
For concept map modeling, it is important to have the facility to enter and manipulate hypothetical time courses that have not been measured but are based on the biologist’s intuition. An example could be that, after a particular stimulus, variable X3 is expected to rise sharply in a sigmoidal fashion to about twice its baseline value, even though specific data are not (yet) at hand. In this case, the user could start with an empty spreadsheet and create data according to his or her qualitative knowledge about the system. For instance, suppose variable X3 is initially assumed to be at its nominal value. A data point X3(t = 0) at time zero is created with the known nominal value or with a value of 1, which corresponds to a representation of the variable in scaled form. If doubling of the variable occurs within 20 min, the user specifies the corresponding value of X3(t = 20) and has the option of creating hypothetical data connecting the two points in a sigmoidal fashion and of refining them, where necessary, with the manual editing tool.
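Generating such a hypothetical sigmoidal transition is straightforward; the following Python sketch connects the two anchor points of the X3 scenario with a logistic curve (the midpoint and steepness values are assumptions the biologist would tune by eye):

```python
import numpy as np

# Hypothetical scenario from the text: X3 starts at its scaled nominal
# value of 1 and roughly doubles by t = 20 min, rising sigmoidally.
t = np.linspace(0.0, 20.0, 21)
baseline, final = 1.0, 2.0
t_mid, steepness = 10.0, 0.6          # assumed midpoint and slope

# Logistic transition between the two anchor points.
x3 = baseline + (final - baseline) / (1.0 + np.exp(-steepness * (t - t_mid)))
```

The resulting array can then be pasted into the spreadsheet and refined point by point with the SplineTool's manual editing features.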
Given time-course data and a mathematical model in the form of a system of differential equations, the estimation of parameter values constitutes an inverse problem. Many methods of optimization and system identification have been developed over the past decades. Notable examples include nonlinear or alternating regression,34,50–52 genetic algorithms,53–55 evolutionary programming,56,57 and simulated annealing.58 Unfortunately, none of these methods is ideal, and parameter estimation continues to be the bottleneck of mathematical modeling.
Conceptually, the process of parameter estimation involves: (1) construction of an objective function, such as the sum of squared errors between model and data; (2) selection of an initial guess for each parameter; and (3) application of a numerical, iterative optimization scheme designed to minimize the objective function. This process is depicted graphically in Figure 12.
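The three steps can be sketched on a deliberately small stand-in problem, a single power-law degradation term, using SciPy (the model, the "data," and the bounds are all synthetic illustrations, not the Lactococcus system or the BSTBox estimators):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Synthetic "data" from a single power-law process dX/dt = -gamma * X^h.
true_gamma, true_h = 0.5, 0.8
t_data = np.linspace(0.0, 5.0, 21)

def simulate(gamma, h):
    # max(x, 0) guards the power law against tiny negative values
    # that the integrator may produce.
    rhs = lambda t, x: -gamma * np.maximum(x, 0.0) ** h
    sol = solve_ivp(rhs, (0.0, 5.0), [2.0], t_eval=t_data, rtol=1e-8)
    return sol.y[0]

data = simulate(true_gamma, true_h)

# Step 1: objective function -- sum of squared errors between model and data.
def sse(params):
    return float(np.sum((simulate(*params) - data) ** 2))

# Steps 2 and 3: initial guess, then bounded iterative minimization.
result = minimize(sse, x0=[1.0, 1.0], bounds=[(0.01, 5.0), (0.1, 3.0)])
gamma_est, h_est = result.x
```

Even this two-parameter toy problem requires an ODE solution per objective evaluation, which hints at why estimation for full pathway models consumes so much computer time.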
The enormous technical difficulties in executing this seemingly straightforward process have been discussed many times and will not be repeated here. Suffice it to say that much computer time is spent on this process and that, even so, a good solution is not always obtained. We recently described this estimation process for our Lactococcus case study in great detail,27,44 exploring a repertoire of methods including local gradient-based techniques, such as regular ODE-based estimation and slope-based estimation, as well as global optimization techniques like genetic algorithms. Many of these techniques are directly accessible from the BSTBox GUI, illustrated in Figure 13, which also allows the user to specify technical features such as upper and lower bounds for parameter values, initial guesses, an acceptable residual error, and the choice of a numerical ODE solver. The user may choose to estimate parameters while minimizing the error with respect to the raw experimental data or a smoothed data set, and may also select the range of time over which each parameter is to be optimized. Results for our Lactococcus case study, obtained with BSTBox, are shown in Figure 14.
Once the system has been fully identified, complete with all variables, processes, and parameter values, the BSTBox GUI shown in Figure 15 allows the user to simulate, with the click of a button, the entire system, a single component, or a subset of components. As during parameter estimation, if only a sub-system is being simulated, BSTBox automatically supplies values for the offline components via cubic spline approximation. The GUI then displays the results of the simulation together with the experimental data, which facilitates comparisons and assessments of the quality of fit. These simulation results may also be viewed in separate windows.
One of the common analyses with computational models is the investigation of changes in system dynamics in response to changes in the initial values of the system. Known as perturbation studies, such analyses entail examining whether the system returns to its original or a different steady state or possibly to a different attractor, like a stable limit cycle. To facilitate such analyses, the BSTBox GUI provides a placeholder for specifying different initial conditions for each dependent variable.
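A perturbation study of this kind can be sketched on a minimal stand-in model, a single pool with constant production and power-law degradation (the parameters are illustrative, not taken from the case study):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Minimal stand-in model: dX/dt = a - b * X^h, with analytical
# steady state X* = (a/b)^(1/h). Parameters are illustrative.
a, b, h = 1.0, 0.5, 0.8
x_ss = (a / b) ** (1.0 / h)

def rhs(t, x):
    # max(x, 0) keeps the power law defined near zero.
    return a - b * np.maximum(x, 0.0) ** h

# Perturb the initial condition well below and well above the steady
# state, simulate, and record where each trajectory ends up.
finals = []
for x0 in (0.2 * x_ss, 3.0 * x_ss):
    sol = solve_ivp(rhs, (0.0, 50.0), [x0], rtol=1e-8)
    finals.append(sol.y[0, -1])
```

Here both trajectories relax back to the same steady state; a system that instead settled at a different state, or approached a limit cycle, would be flagged by exactly this kind of comparison of final values across perturbed initial conditions.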
In the past, the primary role of mathematics in molecular biology has been bookkeeping, first as a means of recording and storing quantitative information, and more recently, with the advent of bioinformatics, as a means of mining and interpreting the enormous amounts of data generated by high-throughput methods. Greatly increased computer power, advanced mathematical techniques, and the availability of data of vastly enhanced quality and quantity are now slowly beginning to move mathematics into the more prominent role of integrating information, offering predictions, and guiding future experimentation through the generation of computationally inspired hypotheses. A crucial component of this emerging computational systems biology is the development of mathematical models. These have traditionally been constructed from the bottom up—that is, by assembling network models from representations of local features. More recently, and complementing traditional approaches, effort has been dedicated to top-down model development and parameterization from time-series data, which modern methods of molecular biology are generating with increasing frequency.
In this paper, we propose to add to the existing repertoire of modeling techniques a novel method that bridges the gap between semi-quantitative biological knowledge and the construction of detailed mathematical models. The starting point for this method is a concept map, which shows connections and interactions between components of biological systems and responses. This type of map is very prevalent in the biological literature, yet it has not really been exploited for modeling purposes. The key toward model construction is the translation of biological expertise, experience, qualitative insights, and intuition associated with the map into quantifiable temporal responses of all components. This translation subsequently allows the application of modern inverse methods for the determination of parameter values that specify the model and render it useful for analysis and simulation. The method of concept map modeling thus has the potential of converting biological insight into a concrete mathematical model that may be used to explore assumptions and generate testable hypotheses. Many details are yet to be worked out, but it seems that, supported by specific software such as BSTBox, concept map modeling could soon be added to the repertoire of biomolecular techniques.
This work was supported in part by a National Heart, Lung and Blood Institute Proteomics Initiative (Contract N01-HV-28181; D. Knapp, PI), and an endowment from the Georgia Research Alliance. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring institutions.