# Related Articles

Background

Inference of gene-regulatory networks (GRNs) is important for understanding behaviour and potential treatment of biological systems. Knowledge about GRNs gained from transcriptome analysis can be increased by multiple experiments and/or multiple stimuli. Since GRNs are complex and dynamical, appropriate methods and algorithms are needed for constructing models describing these dynamics. Algorithms based on heuristic approaches reduce the effort in parameter identification and computation time.

Results

The NetGenerator V2.0 algorithm, a heuristic for network inference, is proposed and described. It automatically generates a system of differential equations modelling structure and dynamics of the network based on time-resolved gene expression data. In contrast to a previous version, the inference considers multi-stimuli multi-experiment data and contains different methods for integrating prior knowledge. The resulting significant changes in the algorithmic procedures are explained in detail. NetGenerator is applied to relevant benchmark examples evaluating the inference for data from experiments with different stimuli. Also, the underlying GRN of chondrogenic differentiation, a real-world multi-stimulus problem, is inferred and analysed.

Conclusions

NetGenerator is able to determine the structure and parameters of GRNs and their dynamics. The new features of the algorithm extend the range of possible experimental set-ups, results and biological interpretations. Based upon benchmarks, the algorithm provides good results in terms of specificity, sensitivity, efficiency and model fit.

doi:10.1186/1752-0509-7-1

PMCID: PMC3605253
PMID: 23280066

Gene-regulatory networks; Network inference; Heuristic algorithm; ODE; NetGenerator

Computational gene regulation models provide a means for scientists to draw biological inferences from time-course gene expression data. Based on the state-space approach, we developed a new modeling tool for inferring gene regulatory networks, called time-delayed Gene Regulatory Networks (tdGRNs). tdGRN takes time-delayed regulatory relationships into consideration when developing the model. In addition, a priori biological knowledge from genome-wide location analysis is incorporated into the structure of the gene regulatory network. tdGRN is evaluated on both an artificial dataset and a published gene expression data set. It not only determines regulatory relationships that are known to exist but also uncovers potential new ones. The results indicate that the proposed tool is effective in inferring gene regulatory relationships with time delay. tdGRN is complementary to existing methods for inferring gene regulatory networks. The novel part of the proposed tool is that it is able to infer time-delayed regulatory relationships.

doi:10.1155/2009/484601

PMCID: PMC3171427
PMID: 19841683

Background

The steady-state behaviour of gene regulatory networks (GRNs) can provide crucial evidence for detecting disease-causing genes. However, monitoring the dynamics of GRNs is particularly difficult because biological data only reflects a snapshot of the dynamical behaviour of the living organism. Also most GRN data and methods are used to provide limited structural inferences.

Results

In this study, the theory of stochastic GRNs, derived from G-Networks, is applied to GRNs in order to monitor their steady-state behaviours. This approach is applied to a simulation dataset which is generated by using the stochastic gene expression model, and observe that the G-Network properly detects the abnormally expressed genes in the simulation study. In the analysis of real data concerning the cell cycle microarray of budding yeast, our approach finds that the steady-state probability of CLB2 is lower than that of other agents, while most of the genes have similar steady-state probabilities. These results lead to the conclusion that the key regulatory genes of the cell cycle can be expressed in the absence of CLB type cyclines, which was also the conclusion of the original microarray experiment study.

Conclusion

G-networks provide an efficient way to monitor steady-state of GRNs. Our method produces more reliable results then the conventional t-test in detecting differentially expressed genes. Also G-networks are successfully applied to the yeast GRNs. This study will be the base of further GRN dynamics studies cooperated with conventional GRN inference algorithms.

doi:10.1186/1471-2164-10-S3-S26

PMCID: PMC2788379
PMID: 19958490

Boolean-based method, despite of its simplicity, would be a more attractive approach for inferring a network from high-throughput expression data if its effectiveness has not been limited by high false positive prediction. In this study, we explored factors that could simply be adjusted to improve the accuracy of inferring networks. Our work focused on the analysis of the effects of discretisation methods, biological constraints, and stringency of Boolean function assignment on the performance of Boolean network, including accuracy, precision, specificity and sensitivity, using three sets of microarray time-series data. The study showed that biological constraints have pivotal influence on the network performance over the other factors. It can reduce the variation in network performance resulting from the arbitrary selection of discretisation methods and stringency settings. We also presented the master Boolean network as an approach to establish the unique solution for Boolean analysis. The information acquired from the analysis was summarised and deployed as a general guideline for an efficient use of Boolean-based method in the network inference. In the end, we provided an example of the use of such a guideline in the study of Arabidopsis circadian clock genetic network from which much interesting biological information can be inferred.

doi:10.1371/journal.pone.0030232

PMCID: PMC3260258
PMID: 22272315

Background

The regulation of gene expression is achieved through gene regulatory networks (GRNs) in which collections of genes interact with one another and other substances in a cell. In order to understand the underlying function of organisms, it is necessary to study the behavior of genes in a gene regulatory network context. Several computational approaches are available for modeling gene regulatory networks with different datasets. In order to optimize modeling of GRN, these approaches must be compared and evaluated in terms of accuracy and efficiency.

Results

In this paper, two important computational approaches for modeling gene regulatory networks, probabilistic Boolean network methods and dynamic Bayesian network methods, are compared using a biological time-series dataset from the Drosophila Interaction Database to construct a Drosophila gene network. A subset of time points and gene samples from the whole dataset is used to evaluate the performance of these two approaches.

Conclusion

The comparison indicates that both approaches had good performance in modeling the gene regulatory networks. The accuracy in terms of recall and precision can be improved if a smaller subset of genes is selected for inferring GRNs. The accuracy of both approaches is dependent upon the number of selected genes and time points of gene samples. In all tested cases, DBN identified more gene interactions and gave better recall than PBN.

doi:10.1186/1471-2105-8-S7-S13

PMCID: PMC2099481
PMID: 18047712

Background

Boolean networks capture switching behavior of many naturally occurring regulatory networks. For semi-quantitative modeling, interpolation between ON and OFF states is necessary. The high degree polynomial interpolation of Boolean genetic regulatory networks (GRNs) in cellular processes such as apoptosis or proliferation allows for the modeling of a wider range of node interactions than continuous activator-inhibitor models, but suffers from scaling problems for networks which contain nodes with more than ~10 inputs. Many GRNs from literature or new gene expression experiments exceed those limitations and a new approach was developed.

Results

(i) As a part of our new GRN simulation framework Jimena we introduce and setup Boolean-tree-based data structures; (ii) corresponding algorithms greatly expedite the calculation of the polynomial interpolation in almost all cases, thereby expanding the range of networks which can be simulated by this model in reasonable time. (iii) Stable states for discrete models are efficiently counted and identified using binary decision diagrams. As application example, we show how system states can now be sampled efficiently in small up to large scale hormone disease networks (Arabidopsis thaliana development and immunity, pathogen Pseudomonas syringae and modulation by cytokinins and plant hormones).

Conclusions

Jimena simulates currently available GRNs about 10-100 times faster than the previous implementation of the polynomial interpolation model and even greater gains are achieved for large scale-free networks. This speed-up also facilitates a much more thorough sampling of continuous state spaces which may lead to the identification of new stable states. Mutants of large networks can be constructed and analyzed very quickly enabling new insights into network robustness and behavior.

doi:10.1186/1471-2105-14-306

PMCID: PMC3853020
PMID: 24118878

Boolean function; Genetic regulatory network; Interpolation; Stable state; Binary decision diagram; Boolean tree

Motivation: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series were generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different, but overlapping sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionary related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets.

Results: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one hybrid and microarray time series data to infer potential transcriptional switches in Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses.

Availability: The methods outlined in this article have been implemented in Matlab and are available on request.

Contact:
d.l.wild@warwick.ac.uk

Supplementary Information:
Supplementary data is available for this article.

doi:10.1093/bioinformatics/bts222

PMCID: PMC3371854
PMID: 22689766

Background

Biological networks are constantly subjected to random perturbations, and efficient feedback and compensatory mechanisms exist to maintain their stability. There is an increased interest in building gene regulatory networks (GRNs) from temporal gene expression data because of their numerous applications in life sciences. However, because of the limited number of time points at which gene expressions can be gathered in practice, computational techniques of building GRN often lead to inaccuracies and instabilities. This paper investigates the stability of sparse auto-regressive models of building GRN from gene expression data.

Results

Criteria for evaluating the stability of estimating GRN structure are proposed. Thereby, stability of multivariate vector autoregressive (MVAR) methods - ridge, lasso, and elastic-net - of building GRN were studied by simulating temporal gene expression datasets on scale-free topologies as well as on real data gathered over Hela cell-cycle. Effects of the number of time points on the stability of constructing GRN are investigated. When the number of time points are relatively low compared to the size of network, both accuracy and stability are adversely affected. At least, the number of time points equal to the number of genes in the network are needed to achieve decent accuracy and stability of the networks. Our results on synthetic data indicate that the stability of lasso and elastic-net MVAR methods are comparable, and their accuracies are much higher than the ridge MVAR. As the size of the network grows, the number of time points required to achieve acceptable accuracy and stability are much less relative to the number of genes in the network. The effects of false negatives are easier to improve by increasing the number time points than those due to false positives. Application to HeLa cell-cycle gene expression dataset shows that biologically stable GRN can be obtained by introducing perturbations to the data.

Conclusions

Accuracy and stability of building GRN are crucial for investigation of gene regulations. Sparse MVAR techniques such as lasso and elastic-net provide accurate and stable methods for building even GRN of small size. The effect of false negatives is corrected much easier with the increased number of time points than those due to false positives. With real data, we demonstrate how stable networks can be derived by introducing random perturbation to data.

doi:10.1186/1471-2105-12-S13-S17

PMCID: PMC3278833
PMID: 22373004

Background

Understanding gene interactions in complex living systems can be seen as the ultimate goal of the systems biology revolution. Hence, to elucidate disease ontology fully and to reduce the cost of drug development, gene regulatory networks (GRNs) have to be constructed. During the last decade, many GRN inference algorithms based on genome-wide data have been developed to unravel the complexity of gene regulation. Time series transcriptomic data measured by genome-wide DNA microarrays are traditionally used for GRN modelling. One of the major problems with microarrays is that a dataset consists of relatively few time points with respect to the large number of genes. Dimensionality is one of the interesting problems in GRN modelling.

Results

In this paper, we develop a biclustering function enrichment analysis toolbox (BicAT-plus) to study the effect of biclustering in reducing data dimensions. The network generated from our system was validated via available interaction databases and was compared with previous methods. The results revealed the performance of our proposed method.

Conclusions

Because of the sparse nature of GRNs, the results of biclustering techniques differ significantly from those of previous methods.

doi:10.1186/1742-4682-8-39

PMCID: PMC3231811
PMID: 22018164

Background

The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed.

Results

In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes.

Conclusions

A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRNs inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.

doi:10.1186/1752-0509-5-61

PMCID: PMC3117729
PMID: 21545720

Background

Identifying gene regulatory network (GRN) from time course gene expression data has attracted more and more attentions. Due to the computational complexity, most approaches for GRN reconstruction are limited on a small number of genes and low connectivity of the underlying networks. These approaches can only identify a single network for a given set of genes. However, for a large-scale gene network, there might exist multiple potential sub-networks, in which genes are only functionally related to others in the sub-networks.

Results

We propose the network and community identification (NCI) method for identifying multiple subnetworks from gene expression data by incorporating community structure information into GRN inference. The proposed algorithm iteratively solves two optimization problems, and can promisingly be applied to large-scale GRNs. Furthermore, we present the efficient Block PCA method for searching communities in GRNs.

Conclusions

The NCI method is effective in identifying multiple subnetworks in a large-scale GRN. With the splitting algorithm, the Block PCA method shows a promosing attempt for exploring communities in a large-scale GRN.

doi:10.1186/1471-2105-13-S9-S3

PMCID: PMC3372453
PMID: 22901088

Background

A number of models and algorithms have been proposed in the past for gene regulatory network (GRN) inference; however, none of them address the effects of the size of time-series microarray expression data in terms of the number of time-points. In this paper, we study this problem by analyzing the behaviour of three algorithms based on information theory and dynamic Bayesian network (DBN) models. These algorithms were implemented on different sizes of data generated by synthetic networks. Experiments show that the inference accuracy of these algorithms reaches a saturation point after a specific data size brought about by a saturation in the pair-wise mutual information (MI) metric; hence there is a theoretical limit on the inference accuracy of information theory based schemes that depends on the number of time points of micro-array data used to infer GRNs. This illustrates the fact that MI might not be the best metric to use for GRN inference algorithms. To circumvent the limitations of the MI metric, we introduce a new method of computing time lags between any pair of genes and present the pair-wise time lagged Mutual Information (TLMI) and time lagged Conditional Mutual Information (TLCMI) metrics. Next we use these new metrics to propose novel GRN inference schemes which provides higher inference accuracy based on the precision and recall parameters.

Results

It was observed that beyond a certain number of time-points (i.e., a specific size) of micro-array data, the performance of the algorithms measured in terms of the recall-to-precision ratio saturated due to the saturation in the calculated pair-wise MI metric with increasing data size. The proposed algorithms were compared to existing approaches on four different biological networks. The resulting networks were evaluated based on the benchmark precision and recall metrics and the results favour our approach.

Conclusions

To alleviate the effects of data size on information theory based GRN inference algorithms, novel time lag based information theoretic approaches to infer gene regulatory networks have been proposed. The results show that the time lags of regulatory effects between any pair of genes play an important role in GRN inference schemes.

doi:10.1186/1471-2105-11-S6-S19

PMCID: PMC3026366
PMID: 20946602

The large influx of data from high-throughput genomic and proteomic technologies has encouraged the researchers to seek approaches for understanding the structure of gene regulatory networks and proteomic networks. This work reviews some of the most important statistical methods used for modeling of gene regulatory networks (GRNs) and protein-protein interaction (PPI) networks. The paper focuses on the recent advances in the statistical graphical modeling techniques, state-space representation models, and information theoretic methods that were proposed for inferring the topology of GRNs. It appears that the problem of inferring the structure of PPI networks is quite different from that of GRNs. Clustering and probabilistic graphical modeling techniques are of prime importance in the statistical inference of PPI networks, and some of the recent approaches using these techniques are also reviewed in this paper. Performance evaluation criteria for the approaches used for modeling GRNs and PPI networks are also discussed.

doi:10.1155/2013/953814

PMCID: PMC3594945
PMID: 23509452

Inferring the topology of a gene-regulatory network (GRN) from genome-scale time-series measurements of transcriptional change has proved useful for disentangling complex biological processes. To address the challenges associated with this inference, a number of competing approaches have previously been used, including examples from information theory, Bayesian and dynamic Bayesian networks (DBNs), and ordinary differential equation (ODE) or stochastic differential equation. The performance of these competing approaches have previously been assessed using a variety of in silico and in vivo datasets. Here, we revisit this work by assessing the performance of more recent network inference algorithms, including a novel non-parametric learning approach based upon nonlinear dynamical systems. For larger GRNs, containing hundreds of genes, these non-parametric approaches more accurately infer network structures than do traditional approaches, but at significant computational cost. For smaller systems, DBNs are competitive with the non-parametric approaches with respect to computational time and accuracy, and both of these approaches appear to be more accurate than Granger causality-based methods and those using simple ODEs models.

doi:10.1098/rsfs.2011.0053

PMCID: PMC3262295
PMID: 23226586

gene-regulatory networks; inference; gene expression

Background

Inferring a gene regulatory network (GRN) from high throughput biological data is often an under-determined problem and is a challenging task due to the following reasons: (1) thousands of genes are involved in one living cell; (2) complex dynamic and nonlinear relationships exist among genes; (3) a substantial amount of noise is involved in the data, and (4) the typical small sample size is very small compared to the number of genes. We hypothesize we can enhance our understanding of gene interactions in important biological processes (differentiation, cell cycle, and development, etc) and improve the inference accuracy of a GRN by (1) incorporating prior biological knowledge into the inference scheme, (2) integrating multiple biological data sources, and (3) decomposing the inference problem into smaller network modules.

Results

This study presents a novel GRN inference method by integrating gene expression data and gene functional category information. The inference is based on module network model that consists of two parts: the module selection part and the network inference part. The former determines the optimal modules through fuzzy c-mean (FCM) clustering and by incorporating gene functional category information, while the latter uses a hybrid of particle swarm optimization and recurrent neural network (PSO-RNN) methods to infer the underlying network between modules. Our method is tested on real data from two studies: the development of rat central nervous system (CNS) and the yeast cell cycle process. The results are evaluated by comparing them to previously published results and gene ontology annotation information.

Conclusion

The reverse engineering of GRNs in time course gene expression data is a major obstacle in system biology due to the limited number of time points. Our experiments demonstrate that the proposed method can address this challenge by: (1) preprocessing gene expression data (e.g. normalization and missing value imputation) to reduce the data noise; (2) clustering genes based on gene expression data and gene functional category information to identify biologically meaningful modules, thereby reducing the dimensionality of the data; (3) modeling GRNs with the PSO-RNN method between the modules to capture their nonlinear and dynamic relationships. The method is shown to lead to biologically meaningful modules and networks among the modules.

doi:10.1186/1471-2164-10-S1-S15

PMCID: PMC2709258
PMID: 19594874

Motivation: Inferring global regulatory networks (GRNs) from genome-wide data is a computational challenge central to the field of systems biology. Although the primary data currently used to infer GRNs consist of gene expression and proteomics measurements, there is a growing abundance of alternate data types that can reveal regulatory interactions, e.g. ChIP-Chip, literature-derived interactions, protein–protein interactions. GRN inference requires the development of integrative methods capable of using these alternate data as priors on the GRN structure. Each source of structure priors has its unique biases and inherent potential errors; thus, GRN methods using these data must be robust to noisy inputs.

Results: We developed two methods for incorporating structure priors into GRN inference. Both methods [Modified Elastic Net (MEN) and Bayesian Best Subset Regression (BBSR)] extend the previously described Inferelator framework, enabling the use of prior information. We test our methods on one synthetic and two bacterial datasets, and show that both MEN and BBSR infer accurate GRNs even when the structure prior used has significant amounts of error (>90% erroneous interactions). We find that BBSR outperforms MEN at inferring GRNs from expression data and noisy structure priors.

Availability and implementation: Code, datasets and networks presented in this article are available at http://bonneaulab.bio.nyu.edu/software.html.

Contact:
bonneau@nyu.edu

Supplementary information:
Supplementary data are available at Bioinformatics online.

doi:10.1093/bioinformatics/btt099

PMCID: PMC3624811
PMID: 23525069

Coexpression of genes or, more generally, similarity in the expression profiles poses an unsurmountable obstacle to inferring the gene regulatory network (GRN) based solely on data from DNA microarray time series. Clustering of genes with similar expression profiles allows for a course-grained view of the GRN and a probabilistic determination of the connectivity among the clusters. We present a model for the temporal evolution of a gene cluster network which takes into account interactions of gene products with genes and, through a non-constant degradation rate, with other gene products. The number of model parameters is reduced by using polynomial functions to interpolate temporal data points. In this manner, the task of parameter estimation is reduced to a system of linear algebraic equations, thus making the computation time shorter by orders of magnitude. To eliminate irrelevant networks, we test each GRN for stability with respect to parameter variations, and impose restrictions on its behavior near the steady state. We apply our model and methods to DNA microarray time series’ data collected on Escherichia coli during glucose-lactose diauxie and infer the most probable cluster network for different phases of the experiment.

Electronic supplementary material

The online version of this article (doi:10.1007/s11693-011-9079-2) contains supplementary material, which is available to authorized users.

doi:10.1007/s11693-011-9079-2

PMCID: PMC3159693
PMID: 21949674

Gene regulatory networks; DNA microarray; Gene product degradation; Gene clustering; Optimization; Dynamic robustness

Coexpression of genes or, more generally, similarity in the expression profiles poses an unsurmountable obstacle to inferring the gene regulatory network (GRN) based solely on data from DNA microarray time series. Clustering of genes with similar expression profiles allows for a course-grained view of the GRN and a probabilistic determination of the connectivity among the clusters. We present a model for the temporal evolution of a gene cluster network which takes into account interactions of gene products with genes and, through a non-constant degradation rate, with other gene products. The number of model parameters is reduced by using polynomial functions to interpolate temporal data points. In this manner, the task of parameter estimation is reduced to a system of linear algebraic equations, thus making the computation time shorter by orders of magnitude. To eliminate irrelevant networks, we test each GRN for stability with respect to parameter variations, and impose restrictions on its behavior near the steady state. We apply our model and methods to DNA microarray time series’ data collected on Escherichia coli during glucose-lactose diauxie and infer the most probable cluster network for different phases of the experiment.

Electronic supplementary material

The online version of this article (doi:10.1007/s11693-011-9079-2) contains supplementary material, which is available to authorized users.

doi:10.1007/s11693-011-9079-2

PMCID: PMC3159693
PMID: 21949674

Gene regulatory networks; DNA microarray; Gene product degradation; Gene clustering; Optimization; Dynamic robustness

Despite the development of numerous gene regulatory network (GRN) inference methods in the last years, their application, usage and the biological significance of the resulting GRN remains unclear for our general understanding of large-scale gene expression data in routine practice. In our study, we conduct a structural and a functional analysis of B-cell lymphoma GRNs that were inferred using 3 mutual information-based GRN inference methods: C3Net, BC3Net and Aracne. From a comparative analysis on the global level, we find that the inferred B-cell lymphoma GRNs show major differences. However, on the edge-level and the functional-level—that are more important for our biological understanding—the B-cell lymphoma GRNs were highly similar among each other. Also, the ranks of the degree centrality values and major hub genes in the inferred networks are highly conserved as well. Interestingly, the major hub genes of all GRNs are associated with the G-protein-coupled receptor pathway, cell-cell signaling and cell cycle. This implies that hub genes of the GRNs can be highly consistently inferred with C3Net, BC3Net, and Aracne, representing prominent targets for signaling pathways. Finally, we describe the functional and structural relationship between C3Net, BC3Net and Aracne gene regulatory networks. Our study shows that these GRNs that are inferred from large-scale gene expression data are promising for the identification of novel candidate interactions and pathways that play a key role in the underlying mechanisms driving cancer hallmarks. Overall, our comparative analysis reveals that these GRNs inferred with considerably different inference methods contain large amounts of consistent, method independent, biological information.

doi:10.3389/fgene.2013.00281

PMCID: PMC3864360
PMID: 24379827

gene regulatory network; C3Net; BC3Net; Aracne; GPEA; statistical inference

Motivation

Conventional identification methods for gene regulatory networks (GRNs) have overwhelmingly adopted static topology models, which remains unchanged over time to represent the underlying molecular interactions of a biological system. However, GRNs are dynamic in response to physiological and environmental changes. Although there is a rich literature in modeling static or temporally invariant networks, how to systematically recover these temporally changing networks remains a major and significant pressing challenge. The purpose of this study is to suggest a two-step strategy that recovers time-varying GRNs.

Results

It is suggested in this paper to utilize a switching auto-regressive model to describe the dynamics of time-varying GRNs, and a two-step strategy is proposed to recover the structure of time-varying GRNs. In the first step, the change points are detected by a Kalman-filter based method. The observed time series are divided into several segments using these detection results; and each time series segment belonging to two successive demarcating change points is associated with an individual static regulatory network. In the second step, conditional network structure identification methods are used to reconstruct the topology for each time interval. This two-step strategy efficiently decouples the change point detection problem and the topology inference problem. Simulation results show that the proposed strategy can detect the change points precisely and recover each individual topology structure effectively. Moreover, computation results with the developmental data of Drosophila Melanogaster show that the proposed change point detection procedure is also able to work effectively in real world applications and the change point estimation accuracy exceeds other existing approaches, which means the suggested strategy may also be helpful in solving actual GRN reconstruction problem.

doi:10.1371/journal.pone.0074571

PMCID: PMC3792119
PMID: 24116005

Network inference deals with the reconstruction of biological networks from experimental data. A variety of different reverse engineering techniques are available; they differ in the underlying assumptions and mathematical models used. One common problem for all approaches stems from the complexity of the task, due to the combinatorial explosion of different network topologies for increasing network size. To handle this problem, constraints are frequently used, for example on the node degree, number of edges, or constraints on regulation functions between network components. We propose to exploit topological considerations in the inference of gene regulatory networks. Such systems are often controlled by a small number of hub genes, while most other genes have only limited influence on the network's dynamic. We model gene regulation using a Bayesian network with discrete, Boolean nodes. A hierarchical prior is employed to identify hub genes. The first layer of the prior is used to regularize weights on edges emanating from one specific node. A second prior on hyperparameters controls the magnitude of the former regularization for different nodes. The net effect is that central nodes tend to form in reconstructed networks. Network reconstruction is then performed by maximization of or sampling from the posterior distribution. We evaluate our approach on simulated and real experimental data, indicating that we can reconstruct main regulatory interactions from the data. We furthermore compare our approach to other state-of-the art methods, showing superior performance in identifying hubs. Using a large publicly available dataset of over 800 cell cycle regulated genes, we are able to identify several main hub genes. Our method may thus provide a valuable tool to identify interesting candidate genes for further study. Furthermore, the approach presented may stimulate further developments in regularization methods for network reconstruction from data.

doi:10.1371/journal.pone.0035077

PMCID: PMC3343044
PMID: 22570688

Background

Inferring the structure of gene regulatory networks (GRN) from a collection of gene expression data has many potential applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy.

Results

In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection, for that purpose. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (for Trustful Inference of Gene REgulation with Stability Selection), was ranked among the top GRN inference methods in the DREAM5 gene network inference challenge. In particular, TIGRESS was evaluated to be the best linear regression-based method in the challenge. We investigate in depth the influence of the various parameters of the method, and show that a fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference, in both directed and undirected settings.

Conclusions

TIGRESS reaches state-of-the-art performance on benchmark data, including both in silico and in vivo (E. coli and S. cerevisiae) networks. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on http://cbio.ensmp.fr/tigress. Moreover, TIGRESS can be run online through the GenePattern platform (GP-DREAM, http://dream.broadinstitute.org).

doi:10.1186/1752-0509-6-145

PMCID: PMC3598250
PMID: 23173819

Gene Regulatory Network inference; Feature selection; Gene expression data; LARS; Stability selection

Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenge computational problem in system biology. However, every existing inference algorithm from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of every previous algorithm is not high enough. In this work, we proposed a novel inference algorithm from gene expression data based on differential equation model. In this algorithm, two methods were included for inferring GRNs. Before reconstructing GRNs, singular value decomposition method was used to decompose gene expression data, determine the algorithm solution space, and get all candidate solutions of GRNs. In these generated family of candidate solutions, gravitation field algorithm was modified to infer GRNs, used to optimize the criteria of differential equation model, and search the best network structure result. The proposed algorithm is validated on both the simulated scale-free network and real benchmark gene regulatory network in networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and the results were used to compare with the proposed algorithm in our work. And genetic algorithm and simulated annealing were also used to evaluate gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which outperforms significantly other previous algorithms.

doi:10.1371/journal.pone.0051141

PMCID: PMC3514269
PMID: 23226565

Background

Inferring Gene Regulatory Networks (GRNs) from time course microarray data suffers from the dimensionality problem created by the short length of available time series compared to the large number of genes in the network. To overcome this, data integration from diverse sources is mandatory. Microarray data from different sources and platforms are publicly available, but integration is not straightforward, due to platform and experimental differences.

Methods

We analyse here different normalisation approaches for microarray data integration, in the context of reverse engineering of GRN quantitative models. We introduce two preprocessing approaches based on existing normalisation techniques and provide a comprehensive comparison of normalised datasets.

Conclusions

Results identify a method based on a combination of Loess normalisation and iterative K-means as best for time series normalisation for this problem.

doi:10.1371/journal.pone.0013822

PMCID: PMC2980467
PMID: 21103045

Background

Reconstruction of regulatory networks is one of the most challenging tasks of systems biology. A limited amount of experimental data and little prior knowledge make the problem difficult to solve. Although models that are currently used for inferring regulatory networks are sometimes able to make useful predictions about the structures and mechanisms of molecular interactions, there is still a strong demand to develop increasingly universal and accurate approaches for network reconstruction.

Results

The additive regulation model is represented by a set of differential equations and is frequently used for network inference from time series data. Here we generalize this model by converting differential equations into integral equations with adjustable kernel functions. These kernel functions can be selected based on prior knowledge or defined through iterative improvement in data analysis. This makes the integral model very flexible and thus capable of covering a broad range of biological systems more adequately and specifically than previous models.

Conclusion

We reconstructed network structures from artificial and real experimental data using differential and integral inference models. The artificial data were simulated using mathematical models implemented in JDesigner. The real data were publicly available yeast cell cycle microarray time series. The integral model outperformed the differential one for all cases. In the integral model, we tested the zero-degree polynomial and single exponential kernels. Further improvements could be expected if the kernel were selected more specifically depending on the system.

doi:10.1186/1752-0509-2-8

PMCID: PMC2248159
PMID: 18218091