The DREAM project provides a unique framework where network inference methods from a community of experts are collected and impartially assessed on benchmark datasets. The collection of 35 inference methods assessed here itself constitutes a unique resource, as it spans all commonly used approaches in the field. In addition, the collection includes novel approaches (including the two best individual team performers of the challenge), representing a snapshot of the latest developments in the field.
Our analyses revealed specific advantages and limitations of different inference approaches (see Supplementary Note 9 and the full description of approaches in Supplementary Note 10). Sparse linear regression methods performed well, but only when data resampling strategies such as bootstrapping were used (the best performing regression methods all used data resampling, while the worst performing methods did not). Sparsity constraints employed by these methods effectively increased performance for cascade motifs, at the cost of missing interactions in feed-forward loops, fan-in, and fan-out motifs. Bayesian network methods exhibited below-average performance in this challenge, likely because they use heuristic searches, which are often too costly for systematic data resampling and may be better suited for smaller networks. Information theoretic methods performed better than correlation-based methods, but the two approaches had similar biases in predicting regulatory relationships. Compared to regression and Bayesian network methods, both performed better on feed-forward loops, fan-ins, and fan-outs (the more densely connected parts of the network), but had an increased rate of false positives for cascades. Meta predictors performed more robustly across datasets than other categories of methods; however, they could not match the robustness and performance of the community predictions, likely because they combine methods that do not provide sufficient diversity. Among all categories, methods that made explicit use of direct transcription factor perturbations (knockout or overexpression) greatly improved prediction accuracy for downstream targets (albeit at an increased false positive rate for cascades). For improving individual inference approaches we suggest the following: (1) optimally exploit direct transcription factor perturbations; (2) employ strategies to avoid over-fitting, such as data resampling; (3) develop more effective approaches to distinguish direct from indirect regulation (feed-forward loops vs
Overall, methods performed well for the in silico and prokaryotic (E. coli) datasets; however, inferring gene regulatory networks from the eukaryotic (S. cerevisiae) dataset proved to be a greater challenge. A fundamental assumption of network inference algorithms is that mRNA levels of transcription factors and their targets tend to be correlated. We found that this is true for E. coli, but not for S. cerevisiae (Supplementary Note 5). While the lower coverage of S. cerevisiae gold standards may also play a role (E. coli has the best-known regulatory network of any free-living organism16), the poor correlation at the mRNA level in S. cerevisiae is likely due to the increased regulatory complexity and prevalence of post-transcriptional regulation in eukaryotes, suggesting that accurate inference of eukaryotic regulatory networks requires additional inputs, such as promoter sequences, transcription factor binding, and chromatin modification datasets7.
Individual studies that introduce a novel inference method naturally tend to focus on its advantages in a particular application, which can paint an over-optimistic picture of performance13. While previous studies have explored strengths and weaknesses of inference approaches2,3, the present assessment further shows that method performance is not robust across species and varies greatly even within the same category of inference methods. This implies that performance depends more on the details of implementation than on the choice of the underlying methodology.
In network inference, variation in performance presents a problem, but at the same time offers a solution. By integrating the predictions from individual methods into community networks, we show that advantages of different methods complement each other, while limitations tend to be cancelled out. Instead of relying on a single inference method with uncertain performance on a previously unseen network, integrating predictions across inference methods becomes the best strategy. We note that not all of the 29 methods are required for enhanced performance. By considering complementary methods, we have shown that performance can be significantly improved with as few as three methods.
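To make the idea of integrating predictions concrete, the following is a minimal sketch that assumes each method submits a ranked list of candidate edges and that the community network is formed by averaging ranks across methods. The aggregation rule, the function name `community_ranking`, the convention of assigning unlisted edges a rank one past the end of a list, and the toy gene names are all assumptions made for illustration; this section does not specify them.

```python
from collections import defaultdict


def community_ranking(predictions):
    """Combine ranked edge lists by average rank (illustrative assumption).

    predictions: list of lists; each inner list holds (regulator, target)
    tuples ordered from most to least confident.
    Returns all edges sorted by ascending average rank.
    """
    # Collect every edge proposed by at least one method.
    all_edges = set()
    for ranked in predictions:
        all_edges.update(ranked)

    rank_sums = defaultdict(float)
    for ranked in predictions:
        position = {edge: i + 1 for i, edge in enumerate(ranked)}
        # Assumption: an edge a method did not list gets a rank just
        # past the end of that method's list.
        missing_rank = len(ranked) + 1
        for edge in all_edges:
            rank_sums[edge] += position.get(edge, missing_rank)

    n_methods = len(predictions)
    average_rank = {edge: total / n_methods for edge, total in rank_sums.items()}
    # Ties are broken by the edge tuple itself so the output is deterministic.
    return sorted(average_rank, key=lambda edge: (average_rank[edge], edge))


# Toy example with three hypothetical methods.
method_a = [("tf1", "g1"), ("tf1", "g2"), ("tf2", "g3")]
method_b = [("tf1", "g2"), ("tf1", "g1"), ("tf3", "g4")]
method_c = [("tf1", "g1"), ("tf3", "g4"), ("tf1", "g2")]

consensus = community_ranking([method_a, method_b, method_c])
```

In this toy case the edge that two of the three methods rank first ends up at the top of the consensus, while an edge proposed by a single method falls to the bottom, which is the kind of cancelling-out of method-specific errors described above.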
Ensemble-based methods have a storied past, with applications ranging from economics1 to machine learning25. In systems biology, robust models are often constructed from ensembles of instances (e.g., different parameterizations or model structures) that are derived from experimental data via a single approach26–30, such as Monte Carlo sampling. In contrast, we formed consensus predictions from a large array of heterogeneous inference approaches. These “meta predictors” have been successful in other machine learning competitions31,32. We have observed from previous DREAM challenges anecdotal evidence that community predictions can rank amongst the top performers13, but we did not previously attempt a systematic study of prediction integration for network inference. Here we established, through rigorous assessments and experimentally derived datasets, the performance robustness of prediction integration for transcriptional gene network inference.
The shortcomings of individual methods revealed in our assessment present many opportunities for improving these methods. We also expect further improvements in performance from advanced community approaches that: (i) actively leverage the method-specific advantages with regard to the datasets and networks of interest; (ii) optimize diversity in the ensemble, e.g., by weighting methods so as to balance the contribution of different method categories or PCA clusters; and (iii) employ more sophisticated voting schemes to negotiate consensus networks. To help spur developments in these areas, we provide the GP-DREAM web platform for the community to develop and apply network inference and consensus methods (http://dream.broadinstitute.org). We will continue to expand this free toolkit with top performing methods from the DREAM challenges, as well as other methods contributed by the community.
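As a rough illustration of point (ii), the sketch below weights each method by the inverse of the number of methods in its category, so that every category contributes equally to a weighted average rank. This weighting rule, the function name `category_balanced_consensus`, the category labels, and the toy edges are hypothetical choices made for this example only; the text proposes such balancing as a direction for future work and does not prescribe a scheme.

```python
from collections import Counter


def category_balanced_consensus(ranked_lists, categories):
    """Weighted average-rank consensus with equal total weight per category.

    ranked_lists: list of edge lists, most confident edge first.
    categories:   list of category labels, one per ranked list.
    Returns all edges sorted by ascending weighted average rank.
    """
    if len(ranked_lists) != len(categories):
        raise ValueError("need exactly one category label per ranked list")

    # Assumption: a method's weight is 1 / (size of its category).
    counts = Counter(categories)
    weights = [1.0 / counts[label] for label in categories]
    total_weight = sum(weights)

    edges = set().union(*ranked_lists)
    score = {}
    for edge in edges:
        weighted_sum = 0.0
        for ranked, weight in zip(ranked_lists, weights):
            # Unlisted edges get a rank one past the end of the list.
            rank = ranked.index(edge) + 1 if edge in ranked else len(ranked) + 1
            weighted_sum += weight * rank
        score[edge] = weighted_sum / total_weight
    return sorted(score, key=lambda edge: (score[edge], edge))


# Toy example: three similar regression methods and one
# information-theoretic method that disagrees with them.
x, y, z = ("tf1", "g1"), ("tf1", "g2"), ("tf2", "g3")
lists = [[x, y, z], [x, y, z], [x, y, z], [y, z, x]]
labels = ["regression", "regression", "regression", "mutual_information"]

balanced = category_balanced_consensus(lists, labels)
# Giving every method its own category reduces to a plain average rank.
unweighted = category_balanced_consensus(lists, ["a", "b", "c", "d"])
```

With a plain average, the three near-identical regression lists dominate and their top edge wins; with category balancing, the single information-theoretic method carries as much weight as the three regression methods combined, and the ordering of the top two edges changes.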