Home | About | Journals | Submit | Contact Us | Français |

**|**BMC Bioinformatics**|**v.11; 2010**|**PMC2918581

Formats

Article sections

- Abstract
- Background
- Methods
- Results and Discussion
- Conclusions
- Authors' contributions
- Supplementary Material
- References

Authors

Related links

BMC Bioinformatics. 2010; 11: 385.

Published online 2010 July 20. doi: 10.1186/1471-2105-11-385

PMCID: PMC2918581

Fabien Corblin: rf.asiri@nilbroc.neibaf; Eric Fanchon: rf.gami@nohcnaf.cire; Laurent Trilling: rf.gami@gnillirt.tnerual

Received 2010 January 22; Accepted 2010 July 20.

Copyright ©2010 Corblin et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article has been cited by other articles in PMC.

A growing demand for tools to assist the building and analysis of biological networks exists in systems biology. We argue that the use of a formal approach is relevant and applicable to address questions raised by biologists about such networks. The behaviour of these systems being complex, it is essential to exploit efficiently every bit of experimental information. In our approach, both the evolution rules and the partial knowledge about the structure and the behaviour of the network are formalized using a common constraint-based language.

In this article our formal and declarative approach is applied to three biological applications. The software environment that we developed allows to specifically address each application through a new class of biologically relevant queries. We show that we can describe easily and in a formal manner the partial knowledge about a genetic network. Moreover we show that this environment, based on a constraint algorithmic approach, offers a wide variety of functionalities, going beyond simple simulations, such as proof of consistency, model revision, prediction of properties, search for minimal models relatively to specified criteria.

The formal approach proposed here deeply changes the way to proceed in the exploration of genetic and biochemical networks, first by avoiding the usual trial-and-error procedure, and second by placing the emphasis on sets of solutions, rather than a single solution arbitrarily chosen among many others. Last, the constraint approach promotes an integration of model and experimental data in a single framework.

A central task in molecular systems biology is to build and analyze genetic and biochemical networks in order to decipher the properties of cellular phenomena. The emphasis is not on investigating in detail one or a few molecules at a time, as is done traditionally in molecular biology, but rather on focusing on the network level.

We are specifically interested here in gene regulatory networks (GRNs) formalized as discrete genetic networks as defined by R. Thomas [1,2]. The main goal of this formalism is to obtain a qualitative understanding of the network dynamics by reasoning on discrete entities. In GRNs the molecular players are the genes and the proteins they produce. A genetic interaction corresponds to the fact that a gene *g _{i }*produces a protein

Beyond these basic functionalities (simulation, inference of parameter values), the construction of GRN models consistent with experimental data requires more sophisticated tools. It often occurs that a proposed model displays inconsistencies with part of the data. In such cases it is necessary to critically analyze the hypotheses used in building the model and to revise them. This analysis can be done "by hand" for small networks, e.g. up to three genes, but requires the use of computational tools to cope with the complexity of larger networks. In still other situations the observational constraints are weak with respect to the number of variables, and the number of solutions is very large. In such cases, it is interesting to derive properties that are shared by all the solutions, or subsets of them, in order to get a better understanding of the model properties and to design new experiments having the potential to substantially reduce the set of solutions.

Fundamentally, we want to provide the biologist studying GRNs with a software environment allowing to perform such tasks. The available knowledge is partial and bears on both the *structure *of the network of interest (the set of interactions) and the *behaviour *of the network in various conditions. The first kind of knowledge is said to be structural, or local (each interaction is a piece of information and can be studied in itself), whereas the second kind is said to be behavioural, or global (the network as a whole is giving rise to a given behaviour). The network architecture and its behaviour are closely inter-related. This relation is implemented formally as a set of constraints in a straightforward manner in our software environment, named GNBox (Genetic Networks toolBox - Additional file 1). More precisely, the philosophy of this approach is to represent a given problem, or set of problems, as a set of formulae linking variables. In our case this entails the specification of (i) the rules defining the updating scheme (how the successors of a state are computed); (ii) the network architecture (set of interactions); (iii) the observations about the behaviours of the network (partial information about paths), or any working hypothese about the system; (iv) the query itself (e.g. number of stationnary states, possible values of initially unkown parameters). The set of constraints thus defined is then submitted to a solver whether there exists solutions or not. A distinctive feature of the constraint approach is constraint propagation. It implements deduction rules and allows in favorable conditions to reduce drastically the search space, thus limiting enumeration. Of course some amount of enumeration is usually still necessary, but the aim of the game is to reduce it as much as possible. This formal relation is "executable" and allows not only to perform basic functionalities such as simulation or reverse-engineering, but also to assert and obtain properties on both the behaviours and the interactions. More specifically, we implemented in this constraint-based setting four main functionalities: (i) proof of consistency or inconsistency of a constraint pool, (ii) constraint relaxation in case of inconsistency (model revision), (iii) prediction of properties in case of consistency, (iv) search for minimal models, with respect to the number of thresholds, for example.

In this article we present our approach and we show how it can be applied successfully to the analysis of three different biological problems. In the section Methods we present the formalism we developped. We present the formal definition of interaction graphs and of the evolution rules of Thomas networks. These notions are required to express the queries implementing the functionalities mentionned above. Other notions related the specification of interaction compositions facilitate the expression of properties involving kinetic parameters. The implementation is discussed in [3]. In the section Results and Discussion we present three applications which differ by both the type of knowledge available and the type of biological questions asked. These applications permit (i) to illustrate the different functionalities of GNBox, (ii) to show the feasibility of this constraint-based approach on realistic biological problems, and (iii) to support the idea that a formal and declarative approach is very interesting to decipher the properties of GRNs, in order to assist in their construction.

We present briefly in this section the constraint technology, the constraint-based formalization of Thomas networks, the constraint-based formalization of biological properties of these networks, and the features of our software environment GNBox to elucidate GRNs.

Below we use the following mathematical notations: an integer *x *taking values between *min *and *max *is denoted *x * *min..max*, a Boolean *b *is an integer such as *b * 0..1, *b*1 *b*2 means that the Boolean *b*1 is equal (or equivalent) to the Boolean *b*2, *b*1 implies *b*2 is denoted *b*1 *b*2, the Boolean equal to *b*1 and *b*2 is denoted *b*1 ∧ *b*2, the Boolean equal to *b*1 or *b*2 is denoted *b*1 *b*2, the Boolean equal to the conjunction of a list of Boolean *b _{i }*is denoted ∧

We propose to implement the approach (particularly the link between network structure and behaviour) using Constraint Logic Programming (CLP) technology, with a finite domain solver. CLP is a programming paradigm based on first order logic. CLP considers specific classes of logical terms and proposes efficient resolution methods of equations over these terms (constraints). A CLP program is a logic formula, and its execution is the construction of a proof of consistency (or inconsistency) of this formula. The formula is consistent when it is possible to find an instantiation of the variables which satisfies the formula. Logicians call such an instantiation a model. A CLP program is reversible in the sense that it permits to impose and obtain partial knowledge over all the variables of the formula (including in our case the variables describing the interactions and behaviours). For example, let say that *p*(*x*, *y*) is a predicate defining a relationship between two entities *x *and *y*. If a measurement allows to reduce the domain of values of *x*, this information can be added as an additional constraint, and a *query *can be submitted about the possible values of *y*. The solver will try to propagate the additional information on *x *to reduce the domain of *y*, taking into account *p*(*x*, *y*). Conversely, if the measurement has been performed on *y*, this information can be propagated to *x *through *p*(*x*, *y*). This is reversibility. It must be said that different kinds of solvers exist, characterized by the type of variables (e.g. finite domain integers, reals) and the type of propagation rules used, among other things.

As all the variables describing interactions and behaviours have finite integer domains in the discrete framework used, the use of a constraint solver over finite domains is very well suited. In addition, the expressive power of first order logic and constraints over integers allows the definition of very general properties and functionalities. Finally, in order to be able to take advantage of the very efficient Boolean Satisfiability (SAT) solvers available, the GNBox environment is able to translate the CLP formalization into a Boolean formula in Conjunctive Normal Form (conjunction of disjunctions of Boolean variables or their negation, the input format used by most SAT solvers). Details on the translation into CNF can be found in [3]. In this way we combine the expressive power of CLP with the efficiency of SAT solvers.

In this subsection we present a constraint-based formalism to impose, check and infer properties about discrete genetic networks as defined by R. Thomas. We first introduce the notions needed to define and formalize the interaction graph and the evolution rules of Thomas networks. We define in the next subsection the notions of composition of interactions, additivity and observability properties which are useful to express hypotheses about kinetic parameters. All the presented notions of this subsection and the next one are illustrated with the example of Figure Figure11 and Figure Figure2,2, and will be put into use in the biological applications of the section Results.

The structure of a GRN is represented in an abstract way by an interaction graph . The nodes of are genes. Each node is associated to a concentration variable representing the concentration of the protein produced by the corresponding gene. The oriented edges of represent *interactions *between these genes, denoted by for the interaction on the gene (component) *c *of index *r*, *r * 1..*r _{c}*, where

The network structure being defined, the next step is to define the network state and dynamics. A *state S *of the network is a list of gene product concentrations (protein or RNA). The concentrations are discretized according to the thresholds appearing in . The concentration of the product of gene *c *in state *S *is the integer *S _{c }* 0..

We are now in a position to explain how the successor states of a given state are computed. Each state *S *is associated to a so-called *focal state*, denoted by , and belonging to the same state space. The focal state gives the direction of evolution (tendency) for each concentration. Consider for instance *S *= 0, 0 and *F _{s }*= 1, 1 in a 2-dimensional system. The successor of

To implement the Thomas evolution rules we need first to specify the equations which link a state *S *to its focal state *F _{S}*. These equations are named

The system of focal equations contains different kinds of parameters: constant concentrations associated to *input genes *(that is genes that are influenced by no genes in the network and whose state is fixed by external conditions), parameters related to reaction kinetics (similar to those that would appear in a differential description), and thresholds . The set of all these parameters is denoted by *P*. The parameters are amongs the unknowns of the system of constraints because their values are in general not known, or only partially known. The evolution rules, once formalized (see below), lend to a first set of logical constraints. To this first set are added structural constraints over the parameters derived from experimental data, and working hypotheses. The set of solutions of such a system of constraints defines a set of instantiated models (i.e. models in which all parameters are instantiated). The couple composed of a focal equation system and a set of structural constraints is called a *parameterized constrained model M*, or just *model*. A typical query includes one or several structurally-related models (when data are available on several mutants), and some additional behavioural constraints. If the resulting system (set of all constraints of the query) is under-constrained this set contains a large number of solutions. If it is over-constrained it is empty. In our context, this last case is interpreted as a contradiction between, on one hand, the experimental evidence and, on the other hand, the network structure or the hypotheses. More sophisticated queries are presented in the application part below, to illustrate the high-level functionalities mentioned in the introduction.

The parameterized focal equation system of a model *M *is completely defined by an interaction graph . In fact these two entities contain exactly the same information (as long as kinetic parameters are not instantiated nor constrained). The set of interactions of having the gene *c *as target induces a partition of the concentration space according to the thresholds of these interactions. This partition defines a set of regions called the *cellular contexts *of *c*. As long as the concentration of the proteins *c' *regulating *c *do not cross one of the thresholds, the system stays in the same cellular context, because from the viewpoint of gene *c *the regulatory conditions have not changed. This means that all the states *S *belonging to the same cellular context of *c *have the same focal component *F _{c,S }*of the focal state

We have introduced the main notions unformally (interaction, threshold, state, focal state, focal equation, cellular context, discrete kinetic parameter), and will now present formal definitions which are directly usable in constraint form.

**Definition 1 ***Let c be a component, and let S be a state. The focal component of c in S, denoted by F_{c,S}, is defined by the following focal equation of c:*.

*where ** is the discrete kinetic parameter of c with index l, l *1

The above formula means that if state *S' *belongs to the cellular context of index *l' *for gene *c *(that is is true) then the focal component *F _{c,S' }*is equal to .

**Example 1 ***The row "cellular contexts" in *Figure Figure11* describes formally and graphically, for a given order of thresholds, the cellular contexts for each component x and y of the considered example. Component x is the target of only one interaction and is thus associated to two cellular contexts, y is the target of two interactions and has 4 cellular contexts. The row "discrete kinetic parameters" gives the list of these parameters. The subscripts and superscripts make the correspondence with the associated cellular contexts (** with **, etc.). Finally the row "focal component" gives the equations describing the focal components F _{x, S }and F*

The focal state defines the direction of the dynamic transitions starting in *S*. In the Thomas networks, the authorized transitions are such that:.

1. *S' *and *S *are the same state or are neighbors,

2. *S' *and *S *differ on at most one component.

3. *S' *is in the "direction" of the focal state *F _{S}*.

The first property (formally ) is due to the fact that the concentrations evolve continuously, thus jumps over states are not allowed. The second is commonly called asynchronicity. The third one is specific to the discretization of evolution equations due to Thomas. We explained above that when two concentrations are increasing in a given state, it is not known in this kind of abstraction which will reach first its next threshold, and consequently which transition will occur first. In this situation both transitions are taken into account leading to two successors for the state considered (of course this generalizes to more than two). This is intimately connected to the non-determinism inherent to abstractions based on phase space partition.

The rules have the following consequences: *S *= *S' * *F _{S }*=

It is possible to specify a knock-out or ectopic expression *mutation*. For each non-input mutated gene *c *set to a constant value *v*, the constraint ∧_{l }= *v *must be introduced. For a mutated input gene to the value *v *the input parameter of the model is set to this value *v*. In some cases it is necessary to use several models in the same query, one model corresponding to the wild type and the others to mutants. In such cases we introduce constraints specifying that for all couples of models (*M ^{α}*,

A user of GNBox must describe the structure of the studied GRN (possible interactions between genes), and can use the language *LG*1 to specify the existence of a behaviour. The language *LG*1 is composed of the predicate *path*(*M*, *Path*, *L*) which is true if *Path *is a succession of *L *states authorized by the model *M *(achieving a formal link between a model and its behaviours), and a language to impose arithmetic constraints between variables of *Path*. Language *LG*1 is used to formalize observations on the behaviour of the system. Our approach allows to specify (declare) partial information. For example only a few concentrations may have been measured. Absence of information is absence of constraints.

The interaction graph lists the interactions individually but does not contain information on the manner in which different interactions are composed when they have the same target gene. The information about the way to compose interactions is embodied in relationships linking the parameters contained in . However, the manual interpretation of instantiations or properties over parameters of *P *is not convenient, especially for users not acquainted with the formalism of Thomas networks. For this reason we designed a higher level language *LG*2 to impose, check and infer properties about the way to compose the interactions in in the manner of the traditional notion of the logic of regulation (NEG, AND, OR gates). It should be understood that this is not fundamental to the approach but merely a facility to handle relationships between parameters induced by the composition of interactions. The user always has the choice to work directly on these relationships.

We explained above that the specification of a set of interactions for a gene *c *partitions (in cellular contexts) the concentration space by hyperplanes (corresponding to thresholds of interactions acting on *c*). *LG*2 permits, for every *c*, to partition the concentration space in union of cellular contexts of *c*, named *compositions of cellular contexts*, via the definition of *interaction compositions*. Any union of cellular contexts can be specified, and in particular an union of disconnected regions. Similarly to the semantic of a set of interactions, the semantic of a set of interaction compositions is the following: all the states belonging to a given composition of cellular contexts of *c *have the same evolution tendency of the concentration of protein *c*. The borders between these regions are constituted of parts of threshold hyperplanes of interactions taking part in the composition. We name these borders interaction compositions. An interaction composition for *c*, denoted by , *rc * 1..*rc _{c}*,

• an interaction .

• . The region where the state *S *are such that is said to satisfy .

• , where *ic *and *ic' *are interaction compositions. The region where the states *S *satisfies both *ic *and *ic' *is said to satisfy .

• , where *ic *and *ic' *are interaction compositions. The region where the states *S *satisfies *ic*, or *ic'*, or both, is said to satisfy .

**Example 2 ***The sixth row in *Figure Figure22* gives three possible sets of interactions compositions for y, related to the example in *Figure Figure1.1*. The four first rows recall the context of example in *Figure Figure1.1*. The fifth row gives the set of interaction compositions over x. The seventh row shows for each of these couple of sets a graphical representation of the detailed structure of the network, with signs *+ *and - over interactions and bridges, denoted by AND, to express a conjunction between two interaction compositions. Finally, the last row shows the resulting compositions of cellular contexts for y*.

*The first case (second column of last row) leads to the same partition of the discrete concentration space of y (the areas described by the cellular contexts are the same that those described by the compositions of cellular contexts)*.

*The second case expresses with the sole interaction composition on y, , that either the concentration of x and y are above and , respectively (the tendency of y is unique in this region), or the concentration of x or y are below and , respectively (the tendency of y is unique in this region)*.

*The third case expresses quite the same of the second case but permits that x interacts on y whatever the concentration of y. So, we obtain three compositions of cellular contexts because the fact that x can interact on y all along the border of the threshold *.

**Example 3 ***The *Figure Figure33* gives an example of a set of interaction compositions and resulting composition of cellular contexts. In the first column, we can see an interaction graph with two components a and b, a set of four interactions over b, and a partition of the concentration space into nine non empty cellular contexts. The indexes l of the conditions Cell appear in circles. The other cellular contexts are empty according to the order of the values of the thresholds . Note that usually this order is not known and the values of thresholds for a same species can be equal. In the second column (to make a parallel with the interactions and cellular contexts) we assumed to have two interaction compositions. We obtain a partition of the concentration space into four non empty compositions of cellular contexts (the pink region being the union of the two disconnected cellular contexts *1 *and *12*)*.

The language *LG*2 allows to define specific effects of an interaction composition on a component *c*. Here by effect we mean a shift in the position of the focal component *F _{c }*when the border associated to the interaction composition is crossed. Biologically, an increase of the tendency of

The first one, called *additivity*, is the systematic non-strict increase of tendency of *c *when a border is crossed in some predefined direction. In other words the effect on *c *of the interaction composition adds to the effect of all other interaction compositions on *c*. The direction in which the border is crossed for this property is the one given by the passage from a state where the interaction composition is not satisfied to a state where it is satisfied.

The second property, called *observability*, is the existence of a strict increase of the tendency on *c*. This means that the effect on the tendency of *c *exists at least at one crossing point (where the border associated to the interaction composition is crossed in the same direction as the additivity property). In contrast to the additivity property, observability property requires only the existence of an effect somewhere along the border.

To define these effects more formally we introduce for each interaction composition on *c *a set, denoted by , containing all couples of states (*S*0, *S*1) such that (i) *S*0 is adjacent to *S*1, (ii) *S*0 is a state in the region where is not satisfied, and (iii) *S*1 is a state in the region where is satisfied.

**Example 4 ***For the interaction composition of the example given in *Figure Figure33* we get * = {(0, 1, 0, 0), (1, 1, 1, 0), (2, 0, 1, 0), (1, 3, 2, 3), (2, 2, 2, 3), (3, 2, 3, 3)}. *Each of the couples *(*S*0, *S*1) *of this set is represented in *Figure Figure44* by a kind of arrow symbol, where the 'o' end is associated to state S*0*, and the *'|' *end to state S*1.

*LG*2 allows to specify that an interaction composition has an additive effect, denoted by , i.e. that the difference of trend of *c *is positive or zero all along the border defined by . The exact semantics of is: for every couple (*S*0, *S*1) of the trend of *c *in *S*0 is less than or equal to the trend of *c *in *S*1. Since the trend of a state is equal to the trend of all the states in the same cellular context, the additivity constraints are expressed as relations between discrete kinetic parameters .

**Example 5 ***For the example in *Figure Figure22* (with the given order of thresholds) we have * = {(0,0, 1,0), (0,1), 1,1), (0,2, 1,2)}, *and ** due to the negative sign associated to the interaction of y on x*.

*For the first case of interaction compositions on * = {(0,0, 1,0), (1,1, 1,1, (0,2, 1,2)}, , and = {(0,1, 0,2), (1,1, 1,2)}, .

*For the second case of interaction compositions on y*, = {(0,2, 1,2), (1,1, 1,2)} *and *. *If this additivity property is true, the only case of activation of y is when x and y are above **and ** respectively*.

*For the third case ** (the same as the first one in the first case because ** is the same) and ** (the same as the first one in the second case). If these aditivity are true, there are two cases of activation of y, one above ** and one above ** and *. *Moreover, the second case of activation is greater than the first one, due to the additivity property *.

If *multi-arcs *are present in the interaction graph (several arcs with the same origin and the same target node) the cellular contexts on each side of the border defined by the interaction composition are not the same depending on the values of the thresholds associated to the multi-arc. In that case the additivity constraints are relations involving also thresholds. Briefly, the additivity constraint of the interaction composition is: ∧(*adj*(*l*0, *l*l, *rc*) ≥ ) with *adj*(*l*0*, l*1*, rc*) true if it exists a couple (*S*0*, S*1) of such that (the cellular contexts *l*0 and *l*1 are non empty, adjacent, and on each side of the border defined by ).

**Example 6 ***For the example in Figure *Figure3,3*, the additivity constraint of the composition is:*.

*according to the identifiers l of cellular contexts for b (and so the identifiers of discrete kinetic parameters ). It can be checked with the graphic representation of cellular contexts of b in *Figure Figure33* that for ** we obtain **. This example shows that specifying additivity properties can be much more compact than working at the level of parameters. Without language LG*2 *we would have to write the above formula*.

In addition to the additivity property, *LG*2 allows to specify that an interaction composition has an observable effect, denoted by , i.e. that the difference of trend of *c *is strictly positive at least at one position along the border defined by . The exact semantics of is: for at least one couple (*S*0, *S*1) of the trend of *c *in *S*0 is strictly less than the trend of *c *in *S*1. To be more explicit, an interaction can be removed from the interaction graph if neither the interaction composition (reduced to a single interaction), nor its negation is observable. Briefly the observability constraint of the interaction composition is: with *adj*(*l*0*, l*1*, rc*) true if it exists a couple (*S*0*, S*1) of such that .

**Example 7 ***For the example in *Figure Figure33* with ** the constraint ** is *.

The core functionality of the GNBox environment is to test, for a given structure of a GRN, the consistency of a set of hypotheses about the behaviours of this GRN (language *LG*1) for several mutant types, about the interaction compositions (language *LG*2), and even directly about the parameters in *P*. GN-Box is able to identify consistent solutions in terms of state variables that define the behaviour (*LG*1) and in terms of parameters of *P*. In cases where the set of hypotheses is inconsistent, it is desirable to determine the possible relaxations of hypotheses to remove the inconsistency. GNBox can identify automatically, among a defined set of questionable hypotheses, all subsets of hypotheses whose relaxation removes the inconsistency (subsets of necessarily false hypotheses). These subsets are represented as disjunctions of conjunctions of negations of hypotheses. For example, the hypotheses *H*1 and *H*2 must be relaxed or the hypothesis *H*3 must be relaxed: (¬*H*1 ∧ *¬H*2) V *¬H*3. Also GNBox automatically identifies, among a defined set of hypotheses, all subsets of hypotheses necessarily true. These subsets are represented by disjunctions of conjunctions of hypotheses. For example, the hypotheses *H*1 and *H*2 are true or the hypothesis *H*3 is true: (*H*1 ∧ *H*2) *H*3.

The analysis of this network adapted from [7] illustrates mainly the capability of GNBox (i) to express constraints about reachability of states, and (ii) to find the minimal interaction graph consistent with observations.

The λ bacteriophage (or simply λ phage) is a virus that infects the bacterium *Escherichia coli*. The infection starts by the injection of the genetic material of the virus into the cytoplasm of the bacterium. We focus here on two simple observations about the evolution of the bacterium after infection: either the viral DNA is integrated in the genetic material of the bacterium, and the cells continue to divide normally (thus reproducing the phage DNA in the same process), or the genetic material replicates in the cytoplasm of the bacterial cell to create new viral particles and then new viruses until lysis (destruction) of the cell, which leads to the release of new virus particles in the extracellular medium. The first case corresponds to the *lysogenic phase *while the second corresponds to the *lytic phase*. The decision between these two phases is made by a network of viral genes.

The model proposed in [7] contains four viral genes denoted by *cI*, *cro*, *cII *and *n*. The gene *cI *is expressed only in the lysogenic phase, *cro *is expressed only in the lytic phase and genes *cII *and *n *are not expressed in both phases. The graph of interactions between these genes is given in Figure Figure5.5. Interactions and interaction compositions (deduced from experimental data) are given in Table Table1.1. We consider the set of all additivity and observability constraints for all these interaction compositions , etc.). In the following we assume that the thresholds are ordered so That . According to the previous section the set of interactions, the hypotheses about interaction compositions (set of interaction compositions, additivity properties, observability properties) and the hypotheses on threshold values define a parameterized constrained model (couple composed of a focal equation system derived from and a set of structural constraints). We call it *M*_{λ }and it is defined formally by the predicate *model_λ*(*M*_{λ}). A state *S *for this model is represented by an ordered list *S _{CI}*,

Interactions and interaction compositions hypotheses for the model about immunity control by the λ phage

The uninfected cell does not have any viral protein and can therefore be represented by the state *S*0 = *S*0* _{cI}, S*0

The GNBox environment proves the consistency of this pool of constraints in 2 seconds. All run times mentioned in this article are obtained on a laptop with 2 GB of RAM and running at 2.4 GHz. Moreover GNBox can provide the instantiations of the parameters of *P *that satisfy the pool of constraints.

**Example 8 ***An example of instantiation is:*

*(remember that the indexes l of discrete kinetic parameters * are set at their creation from the numbering of interactions * See Definition *1*). The set of transitions from S to S', denoted by S S', for this instantiation is represented in *Figure Figure66.

An interesting question, akin to reverse-engineering of the network, is: what are the minimal numbers of interaction compositions necessary to get a model consistent with Constraint 1 without specifying any additivity or observability constraints? In other words, we search for the minimal interaction graphs, in terms of interactions, which satisfy the observed behaviors. From a constraint point of view, this problem is specified and implemented in the following way. For each interaction composition on a gene *c*, a Boolean variable is created which means "all pairs of states separated only by this interaction composition have the same evolution tendency for *c*". Then GNBox searches for consistent models such that the number of these Boolean variables which are true is maximized. GNBox finds that only two interactions on *cro *are necessary (in two seconds). So, the minimal interaction graph contains only two interactions on *cro *and no interactions on the other genes. The result is surprising at first sight, but it should be borne in mind that the query contains only poor information about behaviours and no information on the interaction graph (but the limitation to possible interactions), the goal being to infer the minimum graph implied by this information. This does not preclude the existence of other interactions, but means that those are not necessary to account for the behaviours included in the query.

Finally this application lead us to the interesting question of the length *L *of the longest path without cycle in the state space for a given set of hypotheses *Set*. We call this length the *diameter *of the network for *Set*. This knowledge permits to restrict the length of paths in subsequent queries considering a set of hypotheses including *Set*. In our case *Set *is Constraint 1. The diameter for *Set *is 43. This highly combinatorial problem is answered in two queries: one to prove the existence of a solution for a length *L *of 43 in 459 seconds, the other showing inconsistency for a length *L *of 44 in 489 seconds.

The modeling and analysis of this network is adapted from [8]. It illustrates a case of model revision coming from an inconsistency of the initial set of hypotheses. We performed a similar and more exhaustive study reported in [9]. We show that by an automatic relaxation method over biological constraints we can suggest lines of research to the biologist or, said differently, generate new hypotheses.

Populations of the bacterium *Escherichia coli *grow exponentially in favorable conditions. This state is called the *exponential phase*. In stressing conditions, when food (carbon) starts to be lacking, the populations stop growing and they enter in a state called *stationary phase*, with altered physiology and morphology. The phenomenon is reversible: the population can return to the exponential phase if the conditions become favorable again.

The model, proposed in [8] and adapted to our formalism, contains one input node *sig *(signal, 0 in the absence of stress and 1 in the presence of stress) and five species: *crp*, *cya*, *fis*, *gyr *and *top*. The interaction graph is given in Figure Figure77 where the input *sig *is represented by a dotted circle filled in blue. Interactions and interaction compositions are given in Table Table2.2. Moreover we consider the set of all additivity and observability constraints for all interaction compositions. As before, the thresholds are ordered and equal to *p*. The model obtained from all these hypotheses is denoted by *M _{coli}*. Thus we have

Interactions and interaction compositions hypotheses for the model about carbon nutritional stress in *E. coli*

The exponential phase and the stationary phase are modeled by two states, respectively *S ^{ns }*(

Note that only three components are instantiated in each state and that there is a relationship between the two others which expresses the fact that the super-coiling of DNA is higher in the exponential phase. To model the presence or the absence of stress we use two models: a model without stress (*sig *= 0) and a model with stress (*sig *= 1). These two models are the same biological model *M*_{coli }in different conditions. So they share the same discrete kinetic parameters . In absence of stress the system beginning in the stressed state *S ^{s }*can reach the non-stressed state

where *L *is the length of the third and fourth paths for the reachability of the two steady states. In the following queries we choose *L *= 10 and *L *= 100 to compare performance, but if we want a general query without any limitation on *L *we should choose *L *= 324 (the total number of states of the model) but the amount of memory needed to generate the pool of constraints becomes very large. We point out here that for such queries involving paths, this approach is limited to networks of medium size.

With GNBox we prove that the pool of constraints composed of constraints for models and Constraint 2 and Constraint 3 is inconsistent in 2 seconds for path length *L *= 10 states, and 13 seconds for path length *L *= 100. In fact just imposing the existence of two steady states gives an inconsistency in less than 1 second, thus proving that the constraint pool is inconsistent whatever the value of *L*. In [8] it is noted that the proposed instantiated model is indeed inconsistent. Here we prove in addition that there exists no other instantiation of the discrete kinetic parameters (accepting the interaction compositions hypotheses) able to restore consistency. In other words it is proved that this network architecture with these hypotheses on interaction compositions is incompatible with the observations. It is thus necessary to revise the model. In [8] the authors suggest that a regulator or an interaction may be missing in the model. Here, instead, we keep the interaction graph as it is, and try to change the way interactions are composed. The set of unreliable hypotheses is the set of all additivity and observability properties about interaction compositions. We allow the relaxation of these hypotheses and we obtain the property in 7 seconds with a path length *L *= 10, and in 830 seconds with a path length *L *= 100. Discussions with the biologist lead to the conclusion that it is not acceptable to relax the additivity property of the first composition on *gyr*, . This suggests that the composition on *top*, , is the one which is not additive and consequently that it is possible to observe an inhibition of *top *by *fis*. This inhibition effect is actually observed for another kind of stress. In [10] it is said: "when Fis levels are low, hydrogen peroxide treatment results in topA activation". This means that *fis *acts in some cellular contexts as an inhibitor of *top*. This paper shows that the protein *fis *can indeed play an inhibitory role on *top *in some contexts, and it thus gives support to the new hypothesis that *fis *plays an inhibitory role in the response to nutritional stress. It is remarkable that this pool of constraints is inconsistent given that the number of adjustable parameters is relatively high. We insist here on the fact that inspection of the constraint pool did not allow to resolve manually this inconsistency.

Finally, it appears that the hypotheses of interaction compositions on *top *are not well supported by experiments, and we propose to determine the necessarily observable compositions of the type and (). The rationale for limiting the compositions to basic types (signed interactions) is to provide easily interpretable results in terms of the interaction graph complemented with interaction signs (corresponding to the choice between activation and inhibition). This allows to determine for example whether there are unnecessary arcs in . On the other hand this restriction still allows to guide the user in the choice of hypotheses about interaction compositions. We conserve all the previous hypotheses except the ones about the interaction compositions on *top*. We consider a new set of these six interaction compositions for *top*: . Finally we challenge the observability constraints onto them to find which of them are necessary for these hypotheses. GNBox returns the property (observability property of or observability property of) in 4 seconds with a length of path *L *= 10, and the same formula in 60 seconds with a length of path *L *= 100. This indicates that any solution of all constraints (except the hypotheses of composition on *top*) has the property or the property . This result provides an essential information to help the biologist to make additional hypotheses about interaction compositions.

In the first hours after fertilization, the embryo of the fly *Drosophila melanogaster *undergoes segmentation along the anteroposterior axis (head to tail). The embryo is partitioned into segments, each segment being made of cells characterized by specific levels of a set of proteins. Segmentation takes place in several successive stages controlled by distinct genetic modules. Here we focus on the gap-gene regulatory module.

The modeling and analysis of this network illustrates the expression of steady states in several segments of the embryo for the wild type and several mutant types, and the search for the minimum number of thresholds necessary to account for all the observations. The initial model is adapted from [11,12]. Although this model is not the most recent available, it is convenient for our purpose. The resolution of this query provides a set of minimal models (in terms of number of thresholds) consistent with a set of very diverse observations. The connection between the observations for all these models (one for each mutant type and for each segment) adds a new level of complexity.

The model, proposed in [11,12], controlling the gap-gene module contains seven genes: Giant denoted by *gt*, Hunchback zygotic denoted by *hb _{z}*, Hunchback maternal denoted by

Interactions and interaction compositions hypotheses for the model about gap-gene module of the segmentation of the *D. melanogaster *embryo

The concentrations of the proteins produced by input genes *bcd*, *hb _{m }*and

Moreover, we have inequality constraints between the thresholds for this model:

The observations relate to the existence of one steady state by mutant type and by segment with some properties between these states. The steady states are represented by ordered lists of four protein concentrations: for each region *R *and each mutant type *T*. The constraint associated to the observation of the steady state of each region *R *and each type *T *is:

The gradients of maternal origin mentioned above are used by the cell to derive positional information. To represent these gradients, the antero-posterior axis is partitioned into segments, each segment being identified by a combination of values of the input molecules *bcd*, *hb _{m }*and

Table Table44 shows the constraints between the concentrations of the steady states in regions *A*, *B*, *C *and *D *for each species and for each mutant; the bold font indicates a change of constraint compared to the wild type. All these constraints come from the interpretation of data in [12]. The first column of the table gives the mutant type, the second the genes, the third, fourth, fifth and sixth represent the steady state of the region *A*, *B*, *C *and *D*, respectively. Inequality symbols appear between the columns labeled A, B, C, D. They indicate constraints between concentrations of steady states of adjacent segments. Finally the two last columns give other constraints involving concentrations in segments that are not necessarily adjacent and comments about the differences compared to the wild type.

Constraints between stationary states and between input parameters for each region and each mutant type for the model about gap-gene module of the segmentation of the *D. melanogaster *embryo

We note in the following *C _{gap }*the set of constraints associated to the existence of the steady states of the 4 regions for each of the 10 types (composed of constraints defining and linking the models

If we add to *C _{gap }*the constraint represented in Table Table55 about the instantiation of steady states for all types, input parameters for the wild type according to the second table of [12], and the constraint ∧

It appears in the second figure of [11] that there is no auto-interaction onto *hb _{z}*. In fact, after discussion with D. Thieffry, a synergy between Hunchback and Bicoid on the activation of Hunchback has been reported, and

So we consider in the following the proposed model with this auto-interaction onto *hb _{z }*(in order to have a similar model to those in [11,12]). Obviously,

In previous applications the thresholds are instantiated and equal to *p*, *p *taking a value between 1 and *max _{c}*. This implies that the concentrations of

To identify the minimum number of different thresholds needed to satisfy all the observations and hypotheses, we must build a query using the method of relaxation of constraints. But in this case, the relaxation takes place on the number of thresholds in 1*..max _{c }*for each component

To summarize, we challenge the hypotheses about the number of thresholds for all components. From a constraint point of view, this problem is specified and implemented in the following way. We introduce Boolean variables *B _{j,c }*equivalent to "the number of thresholds for

The last query gives, in 17 seconds, two possible instantiations of the accepting *C _{gap }*and the minimal number of thresholds given by the previous query: . One remarks that the three thresholds of

Our methodology is composed of two parts: (i) a declarative constraint-based approach; (ii) a formalism for the description of the dynamics of discrete networks. We have presented here applications involving gene regulatory networks whose behaviour is satisfactorily represented in the formalism of R. Thomas. But it is important to note that the methodology can be applied to many other types of dynamical rules, such as Hopfield-like networks, Boolean networks with parallel, sequential or block-sequential updating. The only requirement is that the dynamical rules should be expressed as constraints on finite-domain variables. The potential domain of application of this methodology is thus much larger than just gene regulatory networks.

The Thomas' networks have largely been applied to the analysis of GRNs, for example those described in [7,13-15] or those described in [8,16,17] which use a very similar qualitative formalism.

Several modeling and simulation tools of biological regulatory networks (for example GINsim [18], BIOCHAM [19], GNA [20]) are used in combination with model checkers (NuSMV, CADP) and based on diverse formalisms (logic, Petri nets, ODEs). The idea is to add to the simulation functionality a formal verification functionality to check, or optimize [21], the fitness between the simulated and the observed behaviours.

Three types of abstractions are available in BIOCHAM, among which ordinary differential equations and Boolean networks. The inference of parameters is based on the technique of model-checking and the definition of a continuous degree of satisfaction of a temporal logic formula formalizing some observation on behaviour. This permits to find biochemical kinetic parameter values which are optimal with respect to a set of biological properties. Moreover it is possible to find the effect of parameter variations on the robustness of a behaviour specification [22]. Our work differs significantly in that it focuses to face the problem of incomplete knowledge to produce, by a constraint-based process, a class of models from which it is expected to design new experiments.

A steady state search module, including the so-called singular states, exists in GNA based on an integration of the SAT solver SAT4J [23]. This integration of a constraint approach avoids the generation of all the transitions to identify the steady states. But in contrast to our work aiming at providing general queries, [23] focus on the search of steady states, and only in the case of completely instantiated models (kinetic parameters instantiated and order of the thresholds predefined). The work in [24] search the same steady states with a CSP (Constraint Satisfaction Problem) formalization, the performances are worse than in [23]. In our work, we can write easily queries to identify steady states, and even cycles of length smaller than some predefined value. In addition in our case the kinetic parameters and the orders between thresholds can be only partially known. As explained here, other much more sophisticated queries are available, although in the current version we do not include singular states.

The formal approach proposed here modifies deeply the way to proceed in the building and in the exploration of genetic and biochemical networks, first by avoiding the usual trial-and-error procedure, and second by putting the emphasis on sets of solutions, rather than a single consistent solution arbitrarily chosen in a set. Last, the constraint approach lends to a unified description of network architecture and network behaviour, as both are described in terms on formal constraints. The knowledge available to initiate the modeling of a given phenomenon is generally sparse with respect to the complexity of the behaviour of the underlying networks. It is thus essential to exploit consistently, efficiently, and in a joint manner, every bit of experimental information. The representation of knowledge in terms of constraints is a way to achieve, at least to some extent, this goal.

Our environment GNBox implements a wide panel of functionalities: simulations, consistency proof, relaxation in case of inconsistency, search for a minimal model, prediction of properties in case of consistency. This last functionality generates properties which are verified by all solutions of the constraint pool. In line with what we said above, note that such properties are really supported by data. This contrasts with the usual practice of using just one solution to make prediction, neglecting the existence of other solutions. Properties of the selected single model should not be considered as true predictions.

We have presented three biological applications illustrating the use of most of these functionalities. These applications involve networks containing about 5 species and 15 possible interactions, and with set of hypotheses and observations without systematic instantiation of threshold parameters, with a large range of types of behaviours. In the third application the queries involve several structurally related models in order to incorporate knowledge about wild-type and mutant behaviour, in four segments of the embryo. The set of constraints generates a dense network of dependencies between the variables. The performances of GNBox are good for the different types of queries presented in the three applications. The most computer-intensive queries are those involving paths. For such queries our approach is limited to networks of medium size.

The perspectives are governed by the biological problems. The methodologies and technologies employed must be chosen according to these problems. A first perspective is to prioritize biological experiments. For example, consider a situation in which the state of knowledge is such that the number of consistent instantiated models is still large, and it is possible to perform double knock-out experiments. In such situation it would be interesting to be able to determine the most informative choice of pairs of genes to target for knock-out, an informative experiment being one which will potentially add non redundant constraints and thus reduce the set of solutions. Another perspective is to refine the abstraction of the discrete behaviours: for example by taking into account the trajectories sliding along the thresholds [17,25], and taking into account the difference of delays of chemical reactions [26-28]. Another perspective is the extended repairing consistency techniques adding species, related to the problem of composing networks [29].

The methods and applications was mainly developed by FC on theoretical foundation and ideas provided by EF and LT. All authors equally wrote this manuscript and approved it.

This work was performed at the TIMC-IMAG laboratory. Final editing of the manuscript was made while FC was at IRISA-INRIA. The work was supported by Microsoft Research through its PhD Scholarship Programme with a grant to FC. EF thanks IXXI (Institut des Systèmes Complexes Rhône-Alpes) for partial financial support. We thank Denis Thieffry for helpful clarifications on the drosophila embryo segmentation model.

- Thomas R, D'Ari R. Biological Feedback. CRC Press; 1990.
- Thomas R, Kaufman M. Multistationarity, the Basis of Cell Differentiation and Memory. II. Logical Analysis of Regulatory Networks in Term of Feedback Circuits. Chaos. 2001;11:180–195. doi: 10.1063/1.1349893. [PubMed] [Cross Ref]
- Corblin F, Bordeaux L, Fanchon E, Hamadi Y, Trilling L. Connections and Integration with SAT Solvers: A Survey and a Case Study in Computational Biology. Hybrid Optimization: the 10 years of CPAIOR, Springer. 2010. in press .
- Kauffman S. Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology. 1969;22:437–467. doi: 10.1016/0022-5193(69)90015-0. [PubMed] [Cross Ref]
- Demongeot J, Elena A, Sené S. Robustness in regulatory networks: a multi-disciplinary approach. Acta Biotheoretica. 2008;56:27–49. doi: 10.1007/s10441-008-9029-x. [PubMed] [Cross Ref]
- Mendoza L, Alvarez-Buylla E. Dynamics of the Genetic Regulatory Network for Arabidopsis thaliana Flower Morphogenesis. Journal of Theoretical Biology. 1998;193(2):307–319. doi: 10.1006/jtbi.1998.0701. [PubMed] [Cross Ref]
- Thieffry D, Thomas R. Dynamical Behaviour of Biological Regulatory Networks -II. Immunity Control in Bacteriophage Lambda. Bulletin of Mathematical Biology. 1995;57:277–297. [PubMed]
- Ropers D, de Jong H, Page M, Schneider D, Geiselmann J. Qualitative simulation of the carbon starvation response in
*Escherichia coli*. Biosystems. 2006;84(2):124–152. doi: 10.1016/j.biosystems.2005.10.005. [PubMed] [Cross Ref] - Corblin F, Tripodi S, Fanchon E, Ropers D, Trilling L. A declarative constraint-based method for analyzing discrete gene regulation networks. Biosystems. 2009;98(2):91–104. doi: 10.1016/j.biosystems.2009.07.007. [PubMed] [Cross Ref]
- Weinstein-Fischer D, Altuvia S. Differential regulation of Escherichia coli topoisomerase I by Fis. Mol Microbiol. 2007;63(4):1131–1144. doi: 10.1111/j.1365-2958.2006.05569.x. [PubMed] [Cross Ref]
- Sánchez L, Thieffry D. A Logical Analysis of the
*Drosophila*Gap-gene System. J theor Biol. 2001;211:115–141. doi: 10.1006/jtbi.2001.2335. [PubMed] [Cross Ref] - Thieffry D, Sánchez L. Alternative epigenetic states understood in terms of specific regulatory structures. Annals of the New York Academy of Sciences. 2002;981:135–153. doi: 10.1111/j.1749-6632.2002.tb04916.x. [PubMed] [Cross Ref]
- Sànchez L, van Helden J, Thieffry D. Establishment of the dorso-ventral pattern during embryonic development of
*Drosophila melanogaster*: a logical analysis. J Theor Biol. 1997;187:377–389. doi: 10.1006/jtbi.1997.0523. [PubMed] [Cross Ref] - Fanchon E, Corblin F, Trilling L, Hermant B, Gulino D. In: LNCS, Computational Methods in Systems Biology. Danos V, Schachter V, editor. Vol. 3082. 2005. Modeling the Molecular Network Controlling Adhesion Between Human Endothelial Cells: Inference and Simulation Using Constraint Logic Programming; pp. 104–118. full_text.
- Guespin-Michel J, Bernot G, Comet JP, Mérieau A, Richard A, Hulen C, Polack B. Epigenesis and dynamic similarity in two regulatory networks in Pseudomonas aeruginosa. Acta Biotheoretica. 2004;52(4):379–390. doi: 10.1023/B:ACBI.0000046604.18092.a7. [PubMed] [Cross Ref]
- de Jong H, Geiselmann J, Batt G, Hernandez C, Page M. Qualitative Simulation of the initiation of sporulation in
*Bacillus subtilis*. Bulletin of Mathematical Biology. 2004;66(2):261–299. doi: 10.1016/j.bulm.2003.08.009. [PubMed] [Cross Ref] - de Jong H, Gouzé JL, Hernandez C, Page M, Sari T, Geiselmann J. Qualitative Simulation of Genetic Regulatory Networks Using Piecewise-Linear Models. Bulletin of Mathematical Biology. 2004;66(2):301–340. doi: 10.1016/j.bulm.2003.08.010. [PubMed] [Cross Ref]
- Naldi A, Berenguier D, Fauré A, Lopez F, Thieffry D, Chaouiya C. Logical modelling of regulatory networks with GINsim 2.3. Biosystems. 2009;97(2):134–139. doi: 10.1016/j.biosystems.2009.04.008. [PubMed] [Cross Ref]
- Calzone L, Fages F, Soliman S. BIOCHAM: an environment for modeling biological systems and formalizing experimental knowledge. Bioinformatics. 2006;22(14):1805–1807. doi: 10.1093/bioinformatics/btl172. [PubMed] [Cross Ref]
- de Jong H, Geiselmann J, Hernandez C, Page M. Genetic Network Analyser: qualitative simulation of genetic regulatory networks. Bioinformatics. 2003;19(3):336–344. doi: 10.1093/bioinformatics/btf851. [PubMed] [Cross Ref]
- Rizk A, Batt G, Fages F, Soliman S. In: Proc. of the Fourth International Conference on Computational Methods in Systems Biology (CMSB'08), LNCS. Heiner M, Uhrmacher A, editor. Vol. 5307. Springer; 2008. On a continuous degree of satisfaction of temporal logic formulae with applications to systems biology; pp. 251–268.
- Rizk A, Batt G, Fages F, Soliman S. A general computational method for robustness analysis with applications to synthetic gene networks. Bioinformatics. 2009;25(12):169–178. doi: 10.1093/bioinformatics/btp200. [PMC free article] [PubMed] [Cross Ref]
- de Jong H, Page M. Search for steady states of piecewise-linear differential equation models of genetic regulatory networks. IEEE/ACM Trans. Comput Biol Bioinform. 2008;5(02):208–222. doi: 10.1109/TCBB.2007.70254. [PubMed] [Cross Ref]
- Devloo V, Hansen P, Labbé M. Identification of All Steady States in Large Biological Systems by Logical Analysis. Bulletin of Mathematical Biology. 2003;65 (6):1025–1051. doi: 10.1016/S0092-8240(03)00061-2. [PubMed] [Cross Ref]
- Corblin F, Fanchon E, Trilling L. Modélisation de réseaux biologiques discrets en programmation logique par contraintes. TSI (Technique et Science Informatiques) 2007;26:73–98. doi: 10.3166/tsi.26.73-98. [Cross Ref]
- Siebert H, Bockmayr A. Temporal constraints in the logical analysis of regulatory networks. Theoretical Computer Science. 2008;391(3):258–275. doi: 10.1016/j.tcs.2007.11.010. [Cross Ref]
- Ahmad J, Bernot G, Comet JP, Lime D, Roux O. Hybrid modelling and dynamical analysis of gene regulatory networks with delays. ComPlexUs. 2006;3(4) doi: 10.1159/000110010. [Cross Ref]
- Batt G, Salah R, Maler O. On Timed Models of Gene Networks. LNCS. 2007;4763:38–52.
- Gössler G. Computational Methods in Systems Biology. Vol. 4210. LNCS, Springer Berlin/Heidelberg; 2006. Compositional Reachability Analysis of Genetic Networks; pp. 212–226. full_text.

Articles from BMC Bioinformatics are provided here courtesy of **BioMed Central**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |