The immense complexity of biological structures and processes, such as intracellular signal transduction networks, is one of the obstacles to fully understanding how these systems function. As understanding of these biochemical pathways increases, it is becoming clear that they form networks of astonishing complexity and diversity. The pathways involved in the regulation of one area of the cell (so complex that a researcher could spend an entire career working in that area alone) are so interconnected with other, equally complex areas that all of the different pathway systems must be studied together, as a whole, if any of the individual components are to be understood. However, the large scale and minute intricacy of each individual network make it difficult for cell biologists or biochemists working in one area of a cell’s biochemistry to be aware of, let alone relate their results to, findings obtained in other areas. How, then, can all of these individually complex systems be studied as an integrated biochemical “mega-system”?
To address this problem, the concept of systems biology has emerged [1]. However, there are substantial impediments to implementing a systems approach in cellular biochemistry: i) data are being generated by laboratory scientists at a staggering rate in the course of studying the individual systems; ii) these individual systems are so complicated that scientists rarely have detailed knowledge about areas outside those that they study; and iii) for laboratory scientists to fully embrace systems biology, computational tools must lend themselves to use without requiring advanced mathematical entry or programming.
Several significant advancements in the systems biology field have been made in response to the sea of data being generated at ever-increasing rates. For example, in the area of biochemical signal transduction, several community-based projects have been created to organize information about signal transduction systems, such as the Alliance for Cellular Signaling [9], the former Signal Transduction Knowledge Environment [10], UniProt [11], and the WikiPathways project [12]. These resources provide a way to organize and store important laboratory-generated data and information such as gene sequences, protein characteristics, and interaction partners, which are then easily accessible to the scientific community via the Internet. Building on these resources, tools have been developed to visualize and analyze these data and, specifically, the entities that make up the complex, network-like structures of biological processes. Among the most widely used tools to visualize biological networks is the open-source software Cytoscape [13].
The information contained in the above database resources (and visualized via Cytoscape) is limited in that it is mostly static; biological systems, however, are dynamic in nature. Hence, to fully understand the underlying mechanisms (and those of corresponding diseases), the dynamics of these processes need to be considered.
Computational modeling and simulation have been successfully adopted in a number of fields to dramatically reduce development costs. The use of these modern tools to organize and probe biological structure and function has a high potential to provide the basis for new breakthroughs in both the basic understanding of cell function and the development of disease therapies. The ability to observe the actual dynamics of large-scale biological systems increases the probability that, out of the tens of thousands of combinations of interactions, unexpected points of intervention might be identified. The Cell Collective aims to provide an environment and resource where the biomedical community, as a whole, can more effectively bring these computational approaches to bear on cellular systems. The integration of computational and laboratory research has the potential to lead to improved understanding of biological processes, mechanisms of disease, and drug development.
If a “systems approach” is to be successful, then there must be a “system” into which the thousands of laboratory scientists all over the world can incorporate their detailed local knowledge of the pathways to create a global model of biochemical pathways. With such a platform, the local information in the global model would be far more accurate, because laboratory scientists would contribute their specialized expertise directly to a system that integrates the currently dispersed knowledge. Hence, a collaborative modeling platform has the potential to substantially impact and move forward biomedical research.
This is precisely the purpose of The Cell Collective. The Cell Collective is an environment for modeling biological processes. The platform allows scientists to deposit and track dynamical information about biological processes, and to integrate and interrogate this knowledge in the context of the biological process as a whole. Laboratory scientists can directly simulate large-scale models in real time, not only to help test and form new hypotheses for their laboratory research, but also to make research more easily reproducible (through sharing their models with collaborators). Furthermore, the creation and simulation of models in The Cell Collective does not require direct use of mathematics or programming, a substantial advancement in the field [14]; the tool has been developed to bring modeling into the hands of mainstream laboratory scientists.
The role of The Cell Collective in the current landscape of systems biology technology
As a result of the constant flow of data from laboratories, the success of biomedical research now relies, more than ever, on computational and computer technologies. While a number of different technologies have already been developed and have succeeded in their purpose, The Cell Collective builds on the successes of these efforts to provide a novel technology that exploits the full potential of systems biology. Some of these technologies are discussed in this section. Note that the following is not an extensive review; rather, we aim to illustrate how The Cell Collective fits within the landscape of systems biology resources. For clarity, these resources have been categorized according to their function.
A) Biological databases (as mentioned in the Background section, Alliance for Cellular Signaling [9], STKE [10], UniProt [11], the WikiPathways project [12], KEGG [15], UniProt [16], Reactome [17], Pathway Commons [18], etc.) were developed as one of the first steps to deal with the sea of biological data being produced with high-throughput technologies. The information contained in these biological databases focuses on static cell “parts lists.” In other words, the data describe the individual entities rather than the dynamical relationships between the individual parts. In contrast, The Cell Collective, and specifically its Knowledge Base component (discussed in the Results section), extends static knowledge and data into dynamical models; hence the information contained in the Knowledge Base (which is purely qualitative) is dynamical in nature: it takes into account the dynamical relationships between all of the interacting partners.
B) Software for dynamical models that employs mathematical frameworks similar to the ones used in The Cell Collective (i.e., rule-based formalisms) also already exists (e.g., GINsim [19], BooleanNet [20], CellNetOptimizer [21], or BoolNet [22]). These tools have been built and used mainly by individual groups to study networks of a confined size. They also rely on the users’ training in computer programming and/or mathematics (and hence are first and foremost tools developed for modelers); this makes it difficult for laboratory scientists to incorporate these tools into their experimental studies. The Cell Collective provides a novel tool in the area of large-scale, whole-cell models, while extending the use of computational modeling to laboratory scientists.
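To make the notion of a rule-based (Boolean) formalism concrete, the following is a minimal sketch, not the implementation of The Cell Collective or of any of the tools cited above. The four-node network, its node names, and its update rules are hypothetical and chosen only for illustration; the synchronous update scheme is likewise an assumption.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical toy network (an assumption for illustration only):
# each node's next state is a Boolean function of the current state.
RULES: Dict[str, Callable[[State], bool]] = {
    "Ligand": lambda s: s["Ligand"],                           # external input, held constant
    "Receptor": lambda s: s["Ligand"] and not s["Inhibitor"],  # activated by ligand, blocked by inhibitor
    "Kinase": lambda s: s["Receptor"],                         # activated by receptor
    "Inhibitor": lambda s: s["Kinase"],                        # negative feedback from kinase
}


def step(state: State) -> State:
    """Apply every rule to the current state at once (synchronous update)."""
    return {node: rule(state) for node, rule in RULES.items()}


def simulate(initial: State, n_steps: int) -> List[State]:
    """Return the trajectory of states, starting with the initial state."""
    trajectory = [dict(initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


if __name__ == "__main__":
    start = {"Ligand": True, "Receptor": False, "Kinase": False, "Inhibitor": False}
    for t, s in enumerate(simulate(start, 8)):
        print(t, {k: int(v) for k, v in s.items()})
```

Even this small example shows what a static “parts list” cannot: with the ligand present, the negative feedback makes the kinase switch on and off periodically rather than settle into a fixed state.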
C) Model repositories such as the CellML repository [23] or the BioModels Database provide a central location to store models developed by the community. These models are then available to others for download and further analysis using other tools. The BioModels Database is primarily a model repository; however, it does provide simulation capabilities via the JWS simulator [24]. In addition, the PathCase systems biology tool [25] provides a central place for kinetic models from the BioModels Database and KEGG pathways to be queried, visualized, and simulated side-by-side. Similar to these resources, The Cell Collective provides the first repository (with simulation capabilities) for models based on a qualitative mathematical formalism.
D) Model exchange standards such as the Systems Biology Markup Language (SBML, [27]) or CellML [29] make it easier for models to be exchanged between different groups and simulated or analyzed by different simulation tools. For example, when a research group wants to simulate a model deposited in the BioModels Database, the model’s description in SBML or CellML ensures that it is the same model used by a different group, and hence that the generated data can be easily reproduced. While users can share their models with other users of The Cell Collective directly, without the need to import or export model files, the platform currently provides SBML export features based on the most recent version of the SBML L3 qualitative package [30].
E) Visualization and analysis tools for static interaction networks, such as the aforementioned Cytoscape [13], but also others including VisANT [31] or Gephi (http://gephi.org), have been used extensively to visualize and analyze the graph properties of networks of various types and sizes. As a complement to existing graph analyses, The Cell Collective deals with dynamical models (ones that can be put in motion via computer simulations) and hence focuses on the visualization of the dynamics of these models via simulations, and on subsequent analyses (e.g., input-output relationships). Taken together, The Cell Collective is a platform that not only provides a unique combination of successful systems biology and modeling approaches, but also offers significant innovations to these technologies. In this manuscript, the various components and features of the platform are discussed and exemplified on a previously published large-scale network model of signal transduction [32].
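As an illustration of what an input-output analysis of a dynamical model can look like, the sketch below sweeps the activity level of an external input and records the resulting activity of an output node. It is a hedged example under stated assumptions, not a description of how The Cell Collective performs this analysis: the three-node network is hypothetical, and both the interpretation of input activity as the per-step probability of the input being ON and the measurement of output activity as the fraction of steps spent ON are assumptions made here.

```python
import random
from typing import Dict, List, Tuple

State = Dict[str, bool]


def step(state: State, ligand_on: bool) -> State:
    """Synchronous update of a hypothetical three-node network with feedback."""
    return {
        "Receptor": ligand_on and not state["Inhibitor"],
        "Kinase": state["Receptor"],
        "Inhibitor": state["Kinase"],
    }


def output_activity(p_input: float, n_steps: int = 2000, seed: int = 0) -> float:
    """Fraction of steps in which the output node (Kinase) is ON.

    Assumption: the input is ON at each step with probability p_input.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    state = {"Receptor": False, "Kinase": False, "Inhibitor": False}
    on_count = 0
    for _ in range(n_steps):
        ligand_on = rng.random() < p_input
        state = step(state, ligand_on)
        on_count += int(state["Kinase"])
    return on_count / n_steps


def dose_response(levels: List[float]) -> List[Tuple[float, float]]:
    """Pair each input activity level with the measured output activity."""
    return [(p, output_activity(p)) for p in levels]


if __name__ == "__main__":
    for p, activity in dose_response([0.0, 0.25, 0.5, 0.75, 1.0]):
        print(f"input activity {p:.2f} -> output activity {activity:.3f}")
```

Plotting the resulting pairs gives a dose-response-like curve for the model, which is the kind of relationship that a graph analysis of the static network alone cannot provide.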