Inference of regulatory interactions in a genetic system provides fundamental biological knowledge, and significant effort has been invested in solving this problem [1]. The method we propose in this paper improves upon previous contributions in four ways: it employs a more realistic model, it reduces the effect of noise on the solution obtained, it avoids the costly numerical integration step and, significantly, it explicitly utilizes gene expression ratios, which are typically the primary data of microarray-based gene expression analysis. Here, we use an S-system based [24] model that explicitly accounts for proteins serving as regulatory agents. It also accounts for the non-linear dependence of transcription rates on the protein concentrations. We deal solely with gene expression data because reasonably complete proteomic data are not readily available. We used the same model in our previous work [12] to develop a method for gene regulatory network inference based on steady-state gene expression ratio data. In this paper, we give a heuristic solution to the inference problem defined by the S-system based model and time-varying gene expression ratio data. The computational complexity of the method is exponential in the number of genes in the system. However, if a subset of the interactions is already known to exist, the method can be applied to networks with a larger number of genes. The impact of noise in the data is reduced by using smoothing splines as approximations to the time profiles of gene expression.
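As an illustration of the smoothing step, the following minimal sketch fits a cubic smoothing spline to a noisy time profile and reads off its slope. It uses SciPy's `UnivariateSpline`; the synthetic profile, the noise level and the choice of smoothing factor are assumptions made for illustration, not values from this work. Using the spline's derivative in place of integrating the differential equations is one common way to avoid numerical integration and is assumed here; the text does not state that the proposed method proceeds in exactly this way.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def smooth_profile(times, values, smoothing):
    """Fit a cubic smoothing spline to a single gene's time profile.

    `smoothing` is the upper bound on the residual sum of squares
    (the `s` argument of UnivariateSpline); larger values smooth more.
    """
    return UnivariateSpline(times, values, k=3, s=smoothing)


# Hypothetical data: a noisy log-expression-ratio time course.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 10.0, 25)
true_profile = np.exp(-0.3 * times) * np.sin(times)
noise_sd = 0.05  # assumed measurement noise level
noisy = true_profile + rng.normal(scale=noise_sd, size=times.size)

# Heuristic (an assumption): s of about n * sigma^2 for noise of std sigma.
spline = smooth_profile(times, noisy, smoothing=times.size * noise_sd**2)

smoothed = spline(times)             # denoised expression profile
slopes = spline.derivative()(times)  # estimated d(profile)/dt at each time
```

The estimated slopes can then be compared directly with the right-hand side of a candidate model at each time point, so that each objective-function evaluation needs only algebraic operations.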
The model used in this paper is similar to those underlying inference methods based on S-system models [11]. However, these earlier methods do not consider the effect of proteins (whose concentrations are not measured) in regulating gene expression. Also, every evaluation of the objective function set up in [11] and [13] for optimization requires the integration of a set of differential equations. This integration can be costly in terms of computational resources, as was pointed out in [28] and [29].
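For reference, the generic S-system form from the literature is written out below. This is the standard textbook form, given as a sketch; it is not necessarily the exact set of equations used in this paper, which additionally introduces protein variables. The symbols $X_i$, $\alpha_i$, $\beta_i$, $g_{ij}$ and $h_{ij}$ are the conventional ones and are assumptions with respect to this paper's notation.

```latex
\frac{dX_i}{dt}
  = \alpha_i \prod_{j=1}^{n} X_j^{\,g_{ij}}
  - \beta_i \prod_{j=1}^{n} X_j^{\,h_{ij}},
\qquad i = 1, \dots, n
```

Here $X_i$ is the concentration of species $i$, $\alpha_i \ge 0$ and $\beta_i \ge 0$ are rate constants for production and degradation, and $g_{ij}$ and $h_{ij}$ are kinetic orders. A positive $g_{ij}$ corresponds to activation of the production of species $i$ by species $j$, a negative value to repression, and zero to no interaction. Fitting such a system by simulation requires integrating all $n$ coupled equations for every candidate parameter set, which is the cost referred to above.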
Related to the methods based on S-system models are methods based on linear differential equations [16]. The methods of Refs. [17] and [19] involve a least-squares fitting approach, but their models do not involve protein concentrations. Dasika et al. used a linear regulatory model but allowed the current gene expression levels to depend on the expression levels at previous time points. This time delay in the action of an mRNA on the transcription rates may capture the delay due to the protein-translation process and possible protein modification events such as glycosylation, phosphorylation and methylation. However, the value of the time-delay parameter cannot be mapped easily to the biophysical and biochemical processes it represents. The model presented here directly accounts for the protein translation process, and thus there is an implicit time delay in the regulation of gene expression. The model used in Ref. [18] involves both mRNA and protein concentrations. However, the authors assume that all protein concentrations can be measured. The work in Refs. [20] is representative of methods that analyze time-course gene expression data using a Bayesian network framework. This framework assumes a linear model between gene expression levels at multiple time points and hence is conceptually similar to the one used in Ref. [16].
Most of the previous model-based methods (e.g., [11]) assume that the gene expression data are available as absolute concentrations; they also assume linear, additive action of the regulatory mRNAs on the transcription rates. The method presented here is tailored for the analysis of relative gene expression data, and it can be regarded as a non-linear generalization of the previous models. Apart from these models, there are model-based identification methods that give an even broader description of cellular processes by including models for metabolic processes [14]. However, the applicability of such models is restricted to smaller systems because of the experimental measurements and computational resources they require.
Here we describe a model-based approach for inferring the regulatory network of a genetic system using time-varying mRNA expression ratios obtained from experiments involving DNA microarrays. We employ an S-system approach to model the transcription and translation processes and propose an optimization-based regulatory network inference method. The method is tested using synthetic data from a model genetic network of genes, and is applied to expression data of a core subset of genes involved in the sporulation cascade of the prokaryote Bacillus anthracis.