Innovations in biological and biomedical imaging produce complex high-content and multivariate image data. For decision-making and hypothesis generation, scientists need novel information technology tools that enable them to visually explore and analyze the data, and to discuss and communicate results or findings with collaborating experts at distant locations.
In this paper, we present a novel Web 2.0 approach, BioIMAX, for the collaborative exploration and analysis of multivariate image data. It combines the web's collaboration and distribution architecture with the interface interactivity and computational power of desktop applications, a combination recently termed a rich internet application (RIA).
BioIMAX allows scientists to discuss and share data or results with collaborating experts and to visualize, annotate, and explore multivariate image data within one web-based platform from any location via a standard web browser requiring only a username and a password. BioIMAX can be accessed at http://ani.cebitec.uni-bielefeld.de/BioIMAX with the username "test" and the password "test1" for testing purposes.
Computational modeling of biological processes is a promising tool in biomedical research. While a large part of its potential lies in the ability to integrate it with laboratory research, modeling currently generally requires a high degree of training in mathematics and/or computer science. To help address this issue, we have developed a web-based tool, Bio-Logic Builder, that enables laboratory scientists to define mathematical representations (based on a discrete formalism) of biological regulatory mechanisms in a modular and non-technical fashion. As part of the user interface, generalized "bio-logic" modules have been defined to provide users with the building blocks for many biological processes. To build or modify computational models, experimentalists provide purely qualitative information about a particular regulatory mechanism, as is generally found in the laboratory. The Bio-Logic Builder subsequently converts the provided information into a mathematical representation described with Boolean expressions/rules. We used this tool to build a number of dynamical models, including a 130-protein large-scale model of signal transduction with over 800 interactions, a model of the influenza A replication cycle with 127 species and more than 200 interactions, and models of the mammalian and budding yeast cell cycles. We also show that any and all qualitative regulatory mechanisms can be built using this tool.
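As an illustration of the underlying formalism, the following minimal Python sketch shows how a qualitative statement such as "A activates C unless B is present" can be encoded as a Boolean rule; the node names and rule representation are illustrative assumptions, not the Bio-Logic Builder's actual internals.

```python
# Hypothetical sketch: encoding the qualitative statement
# "A activates C, unless B is present" as a Boolean rule.
# The names (A, B, C) and the rule dictionary are illustrative only.

rules = {
    "C": lambda s: s["A"] and not s["B"],  # C is ON iff A is ON and B is OFF
}

state = {"A": True, "B": False, "C": False}
state["C"] = rules["C"](state)
print(state)  # {'A': True, 'B': False, 'C': True}
```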
Recent advances in automation technologies have enabled the use of flow cytometry for high throughput screening, generating large complex data sets often in clinical trials or drug discovery settings. However, data management and data analysis methods have not advanced sufficiently far from the initial small-scale studies to support modeling in the presence of multiple covariates.
We developed a set of flexible open source computational tools in the R package flowCore to facilitate the analysis of these complex data. A key component is a set of data structures that support the application of similar operations to a collection of samples or a clinical cohort. In addition, our software constitutes a shared and extensible research platform that enables collaboration between bioinformaticians, computer scientists, statisticians, biologists and clinicians. This platform will foster the development of novel analytic methods for flow cytometry.
The software has been applied in the analysis of various data sets, and its data structures have proven highly efficient at capturing and organizing the analytic workflow. Finally, a number of additional Bioconductor packages successfully build on the infrastructure provided by flowCore, opening new avenues for flow data analysis.
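To illustrate the data-structure idea (one container holding many samples so that a single operation applies uniformly across a cohort), here is a conceptual Python sketch; flowCore itself is an R/Bioconductor package, and its real classes (flowFrame, flowSet) are not reproduced here.

```python
# Conceptual sketch of the flowSet idea: one container holding many samples
# so that a single operation (e.g. a gate) applies uniformly to all of them.
# This mirrors the design described above, not flowCore's actual R API.
import numpy as np

class SampleSet:
    def __init__(self, samples):   # samples: dict of name -> (events x channels)
        self.samples = samples

    def apply(self, func):         # apply the same operation to every sample
        return {name: func(data) for name, data in self.samples.items()}

cohort = SampleSet({
    "patient_01": np.random.rand(10_000, 4),
    "patient_02": np.random.rand(12_000, 4),
})
# Example "gate": keep events whose first channel exceeds 0.5
gated = cohort.apply(lambda d: d[d[:, 0] > 0.5])
print({name: data.shape for name, data in gated.items()})
```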
Semantic interoperability between routine healthcare and clinical research is an unsolved issue, as information systems in the healthcare domain still use proprietary and site-specific data models. However, information exchange and data harmonization are essential for physicians and scientists if they want to collect and analyze data from different hospitals in order to build up registries and perform multicenter clinical trials. Consequently, there is a need for a standardized metadata exchange based on common data models. Currently this is mainly done by informatics experts instead of medical experts.
We propose to enable physicians to exchange, rate, comment and discuss their own medical data models in a collaborative web-based repository of medical forms in a standardized format.
Based on a comprehensive requirement analysis, a web-based portal for medical data models was specified. In this context, a data model is the technical specification (attributes, data types, value lists) of a medical form without any layout information. The CDISC Operational Data Model (ODM) was chosen as the appropriate format for the standardized representation of data models. The system was implemented with Ruby on Rails and applies Web 2.0 technologies to provide a community-based solution. Forms from different source systems – both routine care and clinical research – were converted into ODM format and uploaded into the portal.
A portal for medical data models based on ODM-files was implemented (http://www.medical-data-models.org). Physicians are able to upload, comment, rate and download medical data models. More than 250 forms with approximately 8000 items are provided in different views (overview and detailed presentation) and in multiple languages. For instance, the portal contains forms from clinical and research information systems.
The portal provides a system-independent repository for multilingual data models in ODM format which can be used by physicians. It serves as a platform for discussion and enables the exchange of multilingual medical data models in a standardized way.
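Since ODM is an XML standard, the item-level content of such forms can be read with standard tooling. A minimal Python sketch, assuming a hypothetical ODM 1.3 file named form.xml:

```python
# Minimal sketch: listing item definitions from a CDISC ODM file using the
# Python standard library. The namespace and element names follow the
# published ODM 1.3 schema; "form.xml" is a hypothetical input file.
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}
tree = ET.parse("form.xml")
for item in tree.getroot().iterfind(".//odm:ItemDef", NS):
    print(item.get("OID"), item.get("Name"), item.get("DataType"))
```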
Semantic interoperability; CDISC ODM; form exchange; form repository
Integrative cancer biology research relies on a variety of data-driven computational modeling and simulation methods and techniques geared towards gaining new insights into the complexity of biological processes that are of critical importance for cancer research. These include the dynamics of gene-protein interaction networks, the percolation of sub-cellular perturbations across scales, and the impact they may have on tumorigenesis in both experimental and clinical settings. Such innovative ‘systems’ research will greatly benefit from enabling Information Technology that is currently under development, including an online collaborative environment and a Semantic Web-based computing platform that hosts data and model repositories as well as high-performance computing access. Here, we present one of the National Cancer Institute’s recently established Integrative Cancer Biology Programs, the Center for the Development of a Virtual Tumor (CViT), which is charged with building a cancer modeling community, developing the aforementioned enabling technologies and fostering multi-scale cancer modeling and simulation.
Cancer; complexity; systems biology; multi-scale computational tumor modeling; semantic layered research platform; digital model repository
A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture allows scientists to develop new models and integrate them with existing models (e.g. a recursive least-squares regressor) by specifying appropriate connections in a block diagram. Adaptive middleware then transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand, flexible neuroscience research test-bed that provides the neurophysiology laboratory with extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility are important as experimental and theoretical neuroscience evolve based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper briefly describes the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo neuroscience experiment. Furthermore, a co-adaptive brain-machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavioral task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface.
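A rough Python sketch of the block-diagram idea follows: named processing stages plus an ordered set of connections that middleware could map onto remote resources. All block names and functions are hypothetical stand-ins, not the CW's actual specification format.

```python
# Illustrative block diagram: each named block is a processing stage, and the
# pipeline order defines the connections. In the CW, middleware would map
# such a specification onto remote grid-computing hardware; here each block
# is a trivial stand-in function.
blocks = {
    "spike_source":  lambda x: x + 0.5,          # stand-in for data acquisition
    "rls_decoder":   lambda x: 0.9 * x,          # stand-in for the RLS regressor
    "robot_control": lambda x: {"velocity": x},  # stand-in for actuator output
}
pipeline = ["spike_source", "rls_decoder", "robot_control"]

signal = 0.0
for name in pipeline:            # execute each block in order, passing outputs on
    signal = blocks[name](signal)
print(signal)                    # {'velocity': 0.45}
```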
cyber-workstation; distributed parallel processing; real-time computational neuroscience; brain-machine interface
Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline for computer-to-computer data exchange that gains value as users contribute. This pipeline aims to simplify data integration and creation, to enable automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of the Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.
Web 2.0; bioinformatics; scientific social community; web service; pipelines
Microscopic techniques enable real-space imaging of complex biological events and processes. They have become an essential tool to confirm and complement hypotheses made by biomedical scientists and also allow the re-examination of existing models, hence influencing future investigations. Imaging live cells is particularly crucial for an improved understanding of dynamic biological processes; hitherto, however, live cell imaging has been limited by the need to introduce probes into a cell without altering its physiological and structural integrity. We demonstrate herein that this hurdle can be overcome by effective cytosolic delivery.
We demonstrate delivery into several types of mammalian cells using nanometre-sized biomimetic polymer vesicles (also known as polymersomes) that offer both highly efficient cellular uptake and endolysosomal escape capability without any effect on cellular metabolic activity. Such biocompatible polymersomes can encapsulate various types of probes, including cell membrane probes and nucleic acid probes, as well as labelled nucleic acids, antibodies and quantum dots.
We show the delivery of sufficient quantities of probes to the cytosol, allowing sustained functional imaging of live cells over time periods of days to weeks. Finally, the combination of such effective staining with three-dimensional imaging by confocal laser scanning microscopy allows cell imaging in complex three-dimensional environments under both mono-culture and co-culture conditions. Thus cell migration and proliferation can be studied in models that are much closer to the in vivo situation.
As the “omics” revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software capable of significantly improving the research workflow. To meet these computational demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The domain's complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited.
COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a “semantic web in a box” approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer.
The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/.
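As an example of the interoperability features, a COEUS SPARQL endpoint can be queried from Python with the SPARQLWrapper library; the endpoint URL below is a hypothetical placeholder.

```python
# Sketch of querying a COEUS SPARQL endpoint with the SPARQLWrapper library.
# The endpoint URL and graph contents are hypothetical; the query form is
# standard SPARQL, as exposed by any such endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/coeus/sparql")  # placeholder URL
endpoint.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```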
Semantic web framework; Rapid application deployment; Linked data; Web services; Biomedical applications; Biomedical semantics
Since the development of the first mathematical cardiac cell model 50 years ago, computational modelling has become an increasingly powerful tool for the analysis of data and for the integration of information related to complex cardiac behaviour. Current models build on decades of iteration between experiment and theory, representing a collective understanding of cardiac function. All models, whether computational, experimental, or conceptual, are simplified representations of reality and, like tools in a toolbox, suitable for specific applications. Their range of applicability can be explored (and expanded) by iterative combination of ‘wet’ and ‘dry’ investigation, where experimental or clinical data are used to first build and then validate computational models (allowing integration of previous findings, quantitative assessment of conceptual models, and projection across relevant spatial and temporal scales), while computational simulations are utilized for plausibility assessment, hypotheses-generation, and prediction (thereby defining further experimental research targets). When implemented effectively, this combined wet/dry research approach can support the development of a more complete and cohesive understanding of integrated biological function. This review illustrates the utility of such an approach, based on recent examples of multi-scale studies of cardiac structure and mechano-electric function.
Heart; Mechano-Electric Feedback; Computational Model; Experimental Model; Multi-Scale
Whole-cell models promise to accelerate biomedical science and engineering. However, discovering new biology from whole-cell models and other high-throughput technologies requires novel tools for exploring and analyzing complex, high-dimensional data.
We developed WholeCellViz, a web-based software program for visually exploring and analyzing whole-cell simulations. WholeCellViz provides 14 animated visualizations, including metabolic and chromosome maps. These visualizations help researchers analyze model predictions by displaying predictions in their biological context. Furthermore, WholeCellViz enables researchers to compare predictions within and across simulations by allowing users to simultaneously display multiple visualizations.
WholeCellViz was designed to facilitate exploration, analysis, and communication of whole-cell model data. Together, these features help researchers use whole-cell model simulations to drive advances in biology and bioengineering.
Whole-cell modeling; Data visualization; Cell physiology; Computational biology; Mycoplasma; Bacteria; Systems biology
Much has changed in the 5 years since the responsibility for editing the JCI was transferred to Columbia University. Wars and a hurricane have conspired with other factors to overwhelm the national treasury. Support for investigator-initiated research at the NIH is now at a level that jeopardizes the nation’s ability to adequately train future scientists to maintain the country’s leadership in biomedical research. Indeed, there is insufficient support for even the best and brightest biomedical scientists to pursue the frontiers of the sciences at a time of unprecedented opportunities. Human embryonic stem cell research is still being suppressed in the United States. Economic models that enable academic health centers to flourish in the face of increasing challenges and the rising costs of health care have for the most part remained elusive. Translational research has become the buzzword, but there is widespread confusion and disagreement about how to do it. Despite all of these and other challenges to the biomedical research enterprise, the JCI remains vibrant, with record numbers of submissions and a loyal and enthusiastic readership.
“Brian” is a simulator for spiking neural networks (http://www.briansimulator.org). The focus is on making the writing of simulation code as quick and easy as possible for the user, and on flexibility: new and non-standard models are no more difficult to define than standard ones. This allows scientists to spend more time on the details of their models, and less on their implementation. Neuron models are defined by writing differential equations in standard mathematical notation, facilitating scientific communication. Brian is written in the Python programming language, and uses vector-based computation to allow for efficient simulations. It is particularly useful for neuroscientific modelling at the systems level, and for teaching computational neuroscience.
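A minimal leaky integrate-and-fire example in this equation-string style is shown below, written against the current brian2 package; the original Brian interface described above is very similar.

```python
# A minimal leaky integrate-and-fire model in Brian's equation-string style
# (using the current brian2 package; Brian 1, described above, is similar).
from brian2 import *

tau = 10*ms
eqs = 'dv/dt = (1 - v) / tau : 1'          # membrane equation written as text
group = NeuronGroup(100, eqs, threshold='v > 0.8',
                    reset='v = 0', method='exact')
spikes = SpikeMonitor(group)
run(100*ms)
print(spikes.num_spikes)
```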
Python; spiking neural networks; simulation; teaching; systems neuroscience
Recent neurophysiological research has begun to reveal that neurons encode information in the timing of spikes. Spiking neural network simulations are a flexible and powerful method for investigating the behaviour of neuronal systems. Software simulation of spiking neural networks cannot rapidly generate output spikes for large-scale networks. An alternative approach, hardware implementation of such systems, makes it possible to generate independent spikes precisely and to output spike waves simultaneously in real time, provided that the spiking neural network can take full advantage of the hardware's inherent parallelism. In this work, we introduce a configurable FPGA-oriented hardware platform for spiking neural network simulation. We aim to use this platform to combine the speed of dedicated hardware with the programmability of software, allowing neuroscientists to assemble sophisticated computational experiments with their own models. A feed-forward hierarchical network is developed as a case study to describe the operation of biological neural systems (such as orientation selectivity in the visual cortex) and computational models of such systems. This model demonstrates how a feed-forward neural network constructs the circuitry required for orientation selectivity and provides a platform for reaching a deeper understanding of the primate visual system. In the future, larger-scale models based on this framework can be used to replicate the actual architecture of the visual cortex, leading to more detailed predictions and insights into visual perception phenomena.
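For reference, the kind of per-neuron update that such hardware evaluates in parallel can be expressed in a few lines of Python; the leaky integrate-and-fire model and parameter values below are illustrative assumptions, not the paper's exact neuron model.

```python
# Software reference for the per-neuron update an FPGA evaluates in parallel:
# a simple leaky integrate-and-fire step. Parameters are illustrative only.
import numpy as np

def lif_step(v, i_in, dt=1e-3, tau=20e-3, v_thresh=1.0, v_reset=0.0):
    v = v + dt * (-v + i_in) / tau      # leaky integration of input current
    spiked = v >= v_thresh              # detect threshold crossings
    v[spiked] = v_reset                 # reset the neurons that spiked
    return v, spiked

v = np.zeros(128)                       # membrane potentials for 128 neurons
for _ in range(100):
    v, spikes = lif_step(v, i_in=np.random.rand(128) * 2.0)
```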
Spiking neural network; Visual cortex; FPGA; Configurable
Modern life sciences research increasingly relies on computational solutions, from large-scale data analyses to theoretical modeling. Among theoretical models, Boolean networks play an increasingly important role, as they are eminently suited to mapping biological observations and hypotheses into a mathematical formalism. The conceptual underpinnings of Boolean modeling are accessible even without a background in the quantitative sciences, yet the formalism allows life scientists to describe and explore a wide range of surprisingly complex phenomena. In this paper we provide a clear overview of the concepts used in Boolean simulations, present a software library that can perform these simulations based on simple text inputs, and give three case studies. The large-scale simulations in these case studies demonstrate the Boolean paradigms and their applicability, as well as the advanced features and complex use cases that our software package supports. Our software is distributed under a liberal Open Source license and is freely accessible from
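A generic synchronous Boolean network update loop, sketched in Python to illustrate the simulation paradigm (this is not the paper's own library or input syntax):

```python
# Generic synchronous Boolean network update: every node is recomputed from
# the previous state at each step. Rules and names are illustrative only.
rules = {
    "A": lambda s: s["C"],                 # A follows C
    "B": lambda s: s["A"] and not s["C"],  # B requires A, is blocked by C
    "C": lambda s: not s["B"],             # C is repressed by B
}

state = {"A": True, "B": False, "C": False}
for step in range(5):                      # synchronous update of all nodes
    state = {node: rule(state) for node, rule in rules.items()}
    print(step, state)                     # this network settles into a cycle
```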
Computational modeling of cardiac electrophysiology is a powerful tool for studying arrhythmia mechanisms. In particular, cardiac models are useful for gaining insights into experimental studies, and in the foreseeable future they will be used by clinicians to improve therapy for patients suffering from complex arrhythmias. Such models are highly intricate, both in their geometric structure and in the equations that represent myocyte electrophysiology. For these models to be useful in a clinical setting, cost-effective solutions for solving the models in real time must be developed. In this work, we hypothesized that low-cost GPGPU-based hardware systems can be used to accelerate arrhythmia simulations. We ported a two-dimensional monodomain cardiac model and executed it on various GPGPU platforms. Electrical activity was simulated during point stimulation and rotor activity. Our GPGPU implementations provided significant speedups over the CPU implementation: 18X for point stimulation and 12X for rotor activity. We found that the number of threads that could be launched concurrently was a critical factor in optimizing the GPGPU implementations.
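For context, the monodomain model class referred to above is usually written as a reaction-diffusion system; its generic form is shown below (the paper's exact cell model and parameterization may differ):

```latex
% Standard monodomain formulation (generic form of the model class named
% above; the paper's exact cell model and parameters are not reproduced here):
\[
  \chi \Bigl( C_m \frac{\partial V_m}{\partial t} + I_{\text{ion}}(V_m, \mathbf{s}) \Bigr)
    = \nabla \cdot \bigl( \sigma \, \nabla V_m \bigr) + I_{\text{stim}},
  \qquad
  \frac{d\mathbf{s}}{dt} = \mathbf{f}(V_m, \mathbf{s})
\]
```

Here \(V_m\) is the transmembrane potential, \(\chi\) the surface-to-volume ratio, \(C_m\) the membrane capacitance, \(\sigma\) the conductivity tensor, and \(\mathbf{s}\) the vector of cell-model state variables.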
Many biomedical projects would benefit from reducing the time and expense of in vitro experimentation by using computer models for in silico predictions. These models may help determine which expensive biological data are most useful to acquire next. Active Learning techniques for choosing the most informative data enable biologists and computer scientists to optimize experimental data choices for rapid discovery of biological function. To explore design choices that affect this desirable behavior, five novel and five existing Active Learning techniques, together with three control methods, were tested on 57 previously unknown p53 cancer rescue mutants for their ability to build classifiers that predict protein function. The best of these techniques, Maximum Curiosity, improved the baseline accuracy of 56% to 77%. This paper shows that Active Learning is a useful tool for biomedical research, and provides a case study of interest to others facing similar discovery challenges.
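For readers unfamiliar with the paradigm, the following Python sketch shows a generic pool-based active learning loop using uncertainty sampling with scikit-learn on synthetic data; it is not the paper's Maximum Curiosity criterion.

```python
# Generic pool-based active learning with uncertainty sampling, using
# scikit-learn on synthetic data. This illustrates the paradigm evaluated
# above; it is NOT the paper's "Maximum Curiosity" technique.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 10))                    # unlabeled pool (synthetic)
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)

# Seed the labeled set with a few examples of each class.
labeled = list(np.where(y_pool == 1)[0][:5]) + list(np.where(y_pool == 0)[0][:5])
for _ in range(20):                                    # 20 label-acquisition rounds
    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    probs = clf.predict_proba(X_pool)[:, 1]
    uncertainty = -np.abs(probs - 0.5)                 # closest to the boundary
    uncertainty[labeled] = -np.inf                     # skip already-labeled points
    labeled.append(int(np.argmax(uncertainty)))        # "run" the next experiment
print("accuracy on pool:", clf.score(X_pool, y_pool))
```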
The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services to enable the use of ontologies and terminologies in clinical and translational research; educate their trainees and the scientific community broadly about biomedical ontology and ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes available for research in computationally useful forms more than 270 of the world's biomedical ontologies and terminologies, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data.
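As an example of these web services, BioPortal's REST Annotator can be called from Python with the requests library; the endpoint below reflects the current data.bioontology.org API, and the API key is a placeholder obtained by registering with BioPortal.

```python
# Sketch of calling BioPortal's REST Annotator service. The endpoint follows
# the current data.bioontology.org API; YOUR_API_KEY is a placeholder for a
# key obtained by registering with BioPortal.
import requests

resp = requests.get(
    "http://data.bioontology.org/annotator",
    params={"text": "melanoma of the skin"},
    headers={"Authorization": "apikey token=YOUR_API_KEY"},
)
for annotation in resp.json():                 # one entry per matched class
    print(annotation["annotatedClass"]["@id"])
```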
Collaborative technologies; knowledge representations; knowledge acquisition and knowledge management; controlled terminologies and vocabularies; ontologies; knowledge bases; applications that link biomedical knowledge from diverse primary sources (including automated indexing); statistical analysis of large datasets; methods for integration of information from disparate sources; discovery; text and data mining methods; automated learning; information retrieval; HIT data standards; representing, identifying, and modeling biological structures; developing and refining EHR data standards (including image standards)
This article describes an innovative software toolkit that allows the creation of web applications that facilitate the acquisition, integration, and dissemination of multimedia biomedical data over the web, thereby reducing the cost of knowledge sharing. There is a lack of high-level web application development tools suitable for use by researchers, clinicians, and educators who are not skilled programmers. Our Web Interfacing Repository Manager (WIRM) is a software toolkit that reduces the complexity of building custom biomedical web applications. WIRM’s visual modeling tools enable domain experts to describe the structure of their knowledge, from which WIRM automatically generates full-featured, customizable content management systems.
In this paper we introduce Armadillo v1.1, a novel workflow platform dedicated to designing and conducting phylogenetic studies, including comprehensive simulations. A number of important phylogenetic and general bioinformatics tools have been included in the first software release. As Armadillo is an open-source project, it allows scientists to develop their own modules as well as to integrate existing computer applications. Using our workflow platform, complex phylogenetic tasks can be modeled and presented in a single workflow without any prior knowledge of programming techniques. The first version of Armadillo was successfully used by professors of bioinformatics at the Université du Québec à Montréal during graduate computational biology courses taught in 2010–11. The program and its source code are freely available at: .
Summary: Many important data sets in current biological science comprise hundreds, thousands or more individual results. These massive data sets require computational tools to navigate results and effectively interact with their content. Mobile device apps are an increasingly important tool in the everyday lives of scientists and non-scientists alike. Such software presents individuals with compact and efficient tools for interacting with complex data at meetings or other locations remote from their main computing environment. We believe that apps will be important tools for biologists, geneticists and physicians to review content while participating in biomedical research or practicing medicine. We have developed a prototype app for displaying gene expression data using the iOS platform. To present the software engineering requirements, we review the model-view-controller schema for Apple's iOS. We apply this schema to a simple app for querying locally developed microarray gene expression data. The challenge for this application is to balance storing content locally within the app against obtaining it dynamically via a network connection.
Availability: The Hematopoietic Expression Viewer is available at http://www.shawlab.org/he_viewer. The source code for this project and any future information on how to obtain the app can be accessed at http://www.shawlab.org/he_viewer.
Supplementary data are available at Bioinformatics online.
Cellzilla is a two-dimensional tissue simulation platform for plant modeling utilizing Cellerator arrows. Cellerator describes biochemical interactions with a simplified arrow-based notation; all interactions are input as reactions and are automatically translated to the appropriate differential equations using a computer algebra system. Cells are represented by a polygonal mesh of well-mixed compartments. Cell constituents can interact intercellularly via Cellerator reactions utilizing diffusion, transport, and action at a distance, as well as amongst themselves within a cell. The mesh data structure consists of vertices, edges (vertex pairs), and cells (and optional intercellular wall compartments) as ordered collections of edges. Simulations may be either static, in which cell constituents change with time but cell size and shape remain fixed; or dynamic, where cells can also grow. Growth is controlled by Hookean springs associated with each mesh edge and an outward pointing pressure force. Spring rest length grows at a rate proportional to the extension beyond equilibrium. Cell division occurs when a specified constituent (or cell mass) passes a (random, normally distributed) threshold. The orientation of new cell walls is determined either by Errera's rule, or by a potential model that weighs contributions due to equalizing daughter areas, minimizing wall length, alignment perpendicular to cell extension, and alignment perpendicular to actual growth direction.
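A minimal Python sketch of the wall-growth rule described above follows: a Hookean spring whose rest length grows at a rate proportional to its extension beyond equilibrium. Parameter values are illustrative, not Cellzilla's defaults.

```python
# Sketch of the edge-growth rule described above: a Hookean spring with an
# outward pressure force, whose rest length grows in proportion to its
# extension beyond equilibrium. All parameter values are illustrative.
k, growth_rate, pressure, dt = 1.0, 0.1, 0.3, 0.01

length, rest_length = 1.0, 1.0
for _ in range(1000):
    force = -k * (length - rest_length) + pressure   # spring + outward pressure
    length += dt * force                             # overdamped edge dynamics
    extension = max(0.0, length - rest_length)
    rest_length += dt * growth_rate * extension      # rest length creeps outward
print(length, rest_length)
```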
mathematical model; computational model; software; meristem; cellerator; cellzilla; wuschel; clavata
Recent advances in genomic sequencing have enabled the use of genome sequencing in standard biological and biotechnological research projects. The challenge is how to integrate the large amount of data in order to gain novel biological insights. One way to leverage sequence data is to use genome-scale metabolic models. We have therefore designed and implemented a bioinformatics platform which supports the development of such metabolic models.
MEMOSys (MEtabolic MOdel research and development System) is a versatile platform for the management, storage, and development of genome-scale metabolic models. It supports the development of new models by providing a built-in version control system which offers access to the complete developmental history. Moreover, the integrated web board, the authorization system, and the definition of user roles allow collaborations across departments and institutions. Research on existing models is facilitated by a search system, references to external databases, and a feature-rich comparison mechanism. MEMOSys provides customizable data exchange mechanisms using the SBML format to enable analysis in external tools. The web application is based on the Java EE framework and offers an intuitive user interface. It currently contains six annotated microbial metabolic models.
We have developed a web-based system designed to provide researchers a novel application facilitating the management and development of metabolic models. The system is freely available at http://www.icbi.at/MEMOSys.
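Because MEMOSys exchanges models in SBML, an exported model can be inspected programmatically, for instance with the python-libsbml bindings; "model.xml" below is a hypothetical export file.

```python
# Sketch of inspecting an SBML file exported from a system such as MEMOSys,
# using the python-libsbml bindings. "model.xml" is a hypothetical export.
import libsbml

doc = libsbml.readSBML("model.xml")
model = doc.getModel()
print(model.getNumSpecies(), "species,", model.getNumReactions(), "reactions")
for reaction in model.getListOfReactions():
    print(reaction.getId(), "reversible:", reaction.getReversible())
```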
The Chronic Lymphocytic Leukemia (CLL) Research Consortium (CRC) consists of 9 geographically distributed sites conducting a program of research including both basic science and clinical components. To enable the CRC’s clinical research efforts, a system providing for real-time collaboration was required. The resulting clinical trials management system (CTMS) provides such functionality and demonstrates that the use of novel data modeling, web-application platforms, and management strategies enables the deployment of an extensible, cost-effective solution in such an environment.
Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation.
Sidekick, a genomic data-driven analysis and decision-making framework, is a web-based tool that provides a user-friendly, intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X?" or "Does the set of genes associated with disease X also influence other diseases?" Sidekick supports the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how a complex published analysis can be accomplished in a fraction of the original time, with no computational effort, using Sidekick.
Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to approach genomic analysis that traditional single gene lists do not, particularly in areas such as interaction discovery.