Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different level of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or “dark matter” of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. In any case, these predictions provide a valuable guide to these experimentally elusive regions.
To model accurate protein networks we need to extend our knowledge of protein associations in molecular systems much further. Biologists believe that high-throughput experiments will fill the gaps in our knowledge. However, if these approaches perform biased screenings, leaving important areas poorly characterized, success in modelling protein networks will require additional approaches to explore these ‘dark’ areas. We assess the value of integrating bio-computational approaches to build accurate and comprehensive network models for human and yeast proteomes and compare these models with models derived by combining multiple experimental datasets. We show that the predicted networks resemble the topological and error features of the experimental networks, and contain information on true protein associations within and beyond their constitutive first order binary predictions. We suggest that the majority of predicted network space is dark matter containing important functional areas, elusive to current experimental designs. Until novel experimental designs emerge as effective tools to screen these hidden regions, computational predictions will be a valuable approach for exploring them.