Innovations in biological and biomedical imaging produce complex high-content and multivariate image data. For decision-making and hypothesis generation, scientists need novel information technology tools that enable them to visually explore and analyze the data and to discuss and communicate results or findings with collaborating experts at other locations.
In this paper, we present BioIMAX, a novel Web 2.0 approach to the collaborative exploration and analysis of multivariate image data. It combines the web's collaboration and distribution architecture with the interface interactivity and computational power of desktop applications, a combination recently termed a rich internet application.
BioIMAX allows scientists to discuss and share data or results with collaborating experts and to visualize, annotate, and explore multivariate image data within one web-based platform from any location via a standard web browser requiring only a username and a password. BioIMAX can be accessed at http://ani.cebitec.uni-bielefeld.de/BioIMAX with the username "test" and the password "test1" for testing purposes.
Recent advances in automation technologies have enabled the use of flow cytometry for high throughput screening, generating large complex data sets often in clinical trials or drug discovery settings. However, data management and data analysis methods have not advanced sufficiently far from the initial small-scale studies to support modeling in the presence of multiple covariates.
We developed a set of flexible open source computational tools in the R package flowCore to facilitate the analysis of these complex data, a key component of which is suitable data structures that support the application of similar operations to a collection of samples or a clinical cohort. In addition, our software constitutes a shared and extensible research platform that enables collaboration between bioinformaticians, computer scientists, statisticians, biologists and clinicians. This platform will foster the development of novel analytic methods for flow cytometry.
The software has been applied in the analysis of various data sets, and its data structures have proven highly efficient in capturing and organizing the analytic workflow. Finally, a number of additional Bioconductor packages successfully build on the infrastructure provided by flowCore, opening new avenues for flow data analysis.
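The central design idea described above, the same operation applied uniformly across a cohort of samples, can be sketched as follows. This is a hedged illustration only: flowCore itself is an R/Bioconductor package, so the class below is a Python analogue of its collection-oriented design, not its actual API, and the sample names and values are invented.

```python
import math

class SampleSet:
    """A collection of named samples supporting cohort-wide operations."""

    def __init__(self, samples):
        self.samples = dict(samples)  # sample name -> list of event measurements

    def apply(self, func):
        """Apply the same operation to every sample; return a new SampleSet."""
        return SampleSet({name: func(events)
                          for name, events in self.samples.items()})

def log_transform(events):
    # A typical cytometry preprocessing step: log10 scaling of fluorescence.
    return [math.log10(e + 1) for e in events]

cohort = SampleSet({"patient_a": [10, 100, 1000], "patient_b": [1, 10, 100]})
transformed = cohort.apply(log_transform)
```

The benefit of such a container is that gating, transformation and summary steps are written once and applied identically to every sample in a clinical cohort, which keeps the analytic workflow reproducible.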
Computational modeling of biological processes is a promising tool in biomedical research. While a large part of its potential lies in the ability to integrate it with laboratory research, modeling currently requires a high degree of training in mathematics and/or computer science. To help address this issue, we have developed a web-based tool, Bio-Logic Builder, that enables laboratory scientists to define mathematical representations (based on a discrete formalism) of biological regulatory mechanisms in a modular and non-technical fashion. As part of the user interface, generalized “bio-logic” modules have been defined to provide users with the building blocks for many biological processes. To build or modify computational models, experimentalists provide purely qualitative information about a particular regulatory mechanism, of the kind generally found in the laboratory. The Bio-Logic Builder then converts the provided information into a mathematical representation described with Boolean expressions/rules. We used this tool to build a number of dynamical models, including a large-scale model of signal transduction with 130 proteins and over 800 interactions, a model of the influenza A replication cycle with 127 species and more than 200 interactions, and models of the mammalian and budding yeast cell cycles. We also show that any qualitative regulatory mechanism can be built using this tool.
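The conversion from qualitative regulatory information to a Boolean rule can be sketched in a few lines. This is an assumed, simplified convention ("active iff at least one activator is ON and no inhibitor is ON"), not Bio-Logic Builder's actual internal encoding, and the protein names are hypothetical.

```python
def make_rule(activators, inhibitors):
    """Build a Boolean update rule from qualitative regulator lists."""
    def rule(state):
        # Active iff some activator is ON (vacuously true if none) and
        # no inhibitor is ON.
        activated = any(state[a] for a in activators) if activators else True
        inhibited = any(state[i] for i in inhibitors)
        return activated and not inhibited
    return rule

# Hypothetical example: protein C is activated by A or B and inhibited by D.
rule_c = make_rule(activators=["A", "B"], inhibitors=["D"])
assert rule_c({"A": True, "B": False, "D": False}) is True
assert rule_c({"A": True, "B": False, "D": True}) is False
```

Variations of this convention (e.g. requiring all activators, or letting inhibitors dominate only above a threshold) map onto different Boolean expressions, which is why a modular rule builder is useful for experimentalists.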
Integrative cancer biology research relies on a variety of data-driven computational modeling and simulation methods and techniques geared towards gaining new insights into the complexity of biological processes that are of critical importance for cancer research. These include the dynamics of gene-protein interaction networks, the percolation of sub-cellular perturbations across scales, and the impact they may have on tumorigenesis in both experimental and clinical settings. Such innovative ‘systems’ research will greatly benefit from enabling information technology that is currently under development, including an online collaborative environment and a Semantic Web based computing platform that hosts data and model repositories as well as high-performance computing access. Here, we present one of the National Cancer Institute’s recently established Integrative Cancer Biology Programs, the Center for the Development of a Virtual Tumor (CViT), which is charged with building a cancer modeling community, developing the aforementioned enabling technologies and fostering multi-scale cancer modeling and simulation.
Cancer; complexity; systems biology; multi-scale computational tumor modeling; semantic layered research platform; digital model repository
Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databases. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. Both approaches follow a user-to-computer communication model for data exchange and do not support the broader concepts of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research, and we propose a Web 2.0-based Scientific Social Community (SSC) model for their implementation. By establishing a social, collective and collaborative platform for data creation, sharing and integration, the SSC supports a pipeline of web services for computer-to-computer data exchange to which users continually add value. This pipeline aims to simplify data integration and creation, to enable automatic analysis, and to facilitate the reuse and sharing of data. The SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. Beyond its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, anticipate the next generation of the Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.
Web 2.0; bioinformatics; scientific social community; web service; pipelines
A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. a recursive least-squares regressor) by specifying appropriate connections in a block diagram. Adaptive middleware then transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed that provides the neurophysiology laboratory with extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility are important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper briefly describes the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo neuroscience experiment. Furthermore, a co-adaptive brain-machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavioral task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface.
cyber-workstation; distributed parallel processing; real-time computational neuroscience; brain-machine interface
Microscopic techniques enable real-space imaging of complex biological events and processes. They have become an essential tool for confirming and complementing hypotheses made by biomedical scientists, and they also allow the re-examination of existing models, hence influencing future investigations. Imaging live cells, in particular, is crucial for an improved understanding of dynamic biological processes; hitherto, however, live cell imaging has been limited by the need to introduce probes into a cell without altering its physiological and structural integrity. We demonstrate herein that this hurdle can be overcome by effective cytosolic delivery.
We show delivery into several types of mammalian cells using nanometre-sized biomimetic polymer vesicles (also known as polymersomes) that offer both highly efficient cellular uptake and endolysosomal escape capability without any effect on cellular metabolic activity. Such biocompatible polymersomes can encapsulate various types of probes, including cell membrane probes and nucleic acid probes, as well as labelled nucleic acids, antibodies and quantum dots.
We show the delivery of sufficient quantities of probes to the cytosol, allowing sustained functional imaging of live cells over periods of days to weeks. Finally, the combination of such effective staining with three-dimensional imaging by confocal laser scanning microscopy allows cell imaging in complex three-dimensional environments under both mono-culture and co-culture conditions. Cell migration and proliferation can thus be studied in models that are much closer to the in vivo situation.
As the “omics” revolution unfolds, the growth in data quantity and diversity is creating a need for pioneering bioinformatics software capable of significantly improving the research workflow. To meet these computational demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain, whose complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited.
COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a “semantic web in a box” approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer.
The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/.
Semantic web framework; Rapid application deployment; Linked data; Web services; Biomedical applications; Biomedical semantics
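One of the interoperability features named above, the SPARQL endpoint, is conventionally queried over HTTP by passing the query text as a request parameter. The sketch below shows that convention only; the endpoint URL, prefixes and result format are placeholders, not COEUS's actual deployment layout.

```python
from urllib.parse import urlencode

# Placeholder endpoint URL; a real COEUS instance would publish its own.
SPARQL_ENDPOINT = "http://example.org/coeus/sparql"

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
} LIMIT 10
"""

# Per the SPARQL 1.1 Protocol, a GET request carries the query in the
# 'query' parameter; JSON results are a commonly supported serialization.
request_url = SPARQL_ENDPOINT + "?" + urlencode(
    {"query": query, "format": "application/sparql-results+json"}
)
```

Because the endpoint speaks a standard protocol, the same request can be issued from web, desktop or mobile clients, which is what enables the multi-application federation layer the abstract describes.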
Since the development of the first mathematical cardiac cell model 50 years ago, computational modelling has become an increasingly powerful tool for the analysis of data and for the integration of information related to complex cardiac behaviour. Current models build on decades of iteration between experiment and theory, representing a collective understanding of cardiac function. All models, whether computational, experimental, or conceptual, are simplified representations of reality and, like tools in a toolbox, suitable for specific applications. Their range of applicability can be explored (and expanded) by iterative combination of ‘wet’ and ‘dry’ investigation, where experimental or clinical data are used to first build and then validate computational models (allowing integration of previous findings, quantitative assessment of conceptual models, and projection across relevant spatial and temporal scales), while computational simulations are utilized for plausibility assessment, hypothesis generation, and prediction (thereby defining further experimental research targets). When implemented effectively, this combined wet/dry research approach can support the development of a more complete and cohesive understanding of integrated biological function. This review illustrates the utility of such an approach, based on recent examples of multi-scale studies of cardiac structure and mechano-electric function.
Heart; Mechano-Electric Feedback; Computational Model; Experimental Model; Multi-Scale
Much has changed in the 5 years since the responsibility for editing the JCI was transferred to Columbia University. Wars and a hurricane have conspired with other factors to overwhelm the national treasury. Support for investigator-initiated research at the NIH is now at a level that jeopardizes the nation’s ability to adequately train future scientists to maintain the country’s leadership in biomedical research. Indeed, there is insufficient support for even the best and brightest biomedical scientists to pursue the frontiers of the sciences at a time of unprecedented opportunities. Human embryonic stem cell research is still being suppressed in the United States. Economic models that enable academic health centers to flourish in the face of increasing challenges and the rising costs of health care have for the most part remained elusive. Translational research has become the buzzword, but there is widespread confusion and disagreement about how to do it. Despite all of these and other challenges to the biomedical research enterprise, the JCI remains vibrant, with record numbers of submissions and a loyal and enthusiastic readership.
“Brian” is a simulator for spiking neural networks (http://www.briansimulator.org). The focus is on making the writing of simulation code as quick and easy as possible for the user, and on flexibility: new and non-standard models are no more difficult to define than standard ones. This allows scientists to spend more time on the details of their models, and less on their implementation. Neuron models are defined by writing differential equations in standard mathematical notation, facilitating scientific communication. Brian is written in the Python programming language, and uses vector-based computation to allow for efficient simulations. It is particularly useful for neuroscientific modelling at the systems level, and for teaching computational neuroscience.
Python; spiking neural networks; simulation; teaching; systems neuroscience
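The state-update loop that simulators like Brian vectorize can be illustrated with a minimal leaky integrate-and-fire model. This is a plain-Python sketch of the underlying numerical idea (forward-Euler integration of a membrane equation with threshold-and-reset), not Brian's equation-string API, and all parameter values are illustrative.

```python
def simulate_lif(n_neurons=5, steps=200, dt=0.1, tau=10.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0, drive=0.15):
    """Leaky integrate-and-fire neurons with graded constant input drives."""
    v = [v_rest] * n_neurons
    spikes = []  # (time, neuron index) pairs
    for step in range(steps):
        for i in range(n_neurons):
            # dv/dt = (v_rest - v)/tau + input; neuron i gets drive (i+1)*drive
            v[i] += dt * ((v_rest - v[i]) / tau + (i + 1) * drive)
            if v[i] >= v_thresh:
                spikes.append((step * dt, i))
                v[i] = v_reset  # threshold crossing: emit spike and reset
    return spikes

spikes = simulate_lif()
```

In a production simulator the inner loop over neurons is replaced by a single vectorized array update, which is the "vector-based computation" the abstract refers to; the per-neuron logic is otherwise identical.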
Recent neurophysiological research has begun to reveal that neurons encode information in the timing of spikes. Spiking neural network simulations are a flexible and powerful method for investigating the behaviour of neuronal systems. Software simulation of spiking neural networks, however, cannot rapidly generate output spikes for large-scale networks. An alternative approach, hardware implementation of such systems, offers the possibility of generating independent spikes precisely and outputting spike waves simultaneously in real time, provided the spiking neural network can take full advantage of the hardware's inherent parallelism. In this work we introduce a configurable FPGA-oriented hardware platform for spiking neural network simulation. We aim to use this platform to combine the speed of dedicated hardware with the programmability of software, so that neuroscientists can put together sophisticated computational experiments with their own models. A feed-forward hierarchical network is developed as a case study to describe the operation of biological neural systems (such as orientation selectivity in visual cortex) and computational models of such systems. This model demonstrates how a feed-forward neural network constructs the circuitry required for orientation selectivity and provides a platform for reaching a deeper understanding of the primate visual system. In the future, larger-scale models based on this framework can be used to replicate the actual architecture of visual cortex, leading to more detailed predictions and insights into visual perception phenomena.
Spiking neural network; Visual cortex; FPGA; Configurable
Modern life sciences research increasingly relies on computational solutions, from large-scale data analyses to theoretical modeling. Among theoretical models, Boolean networks play an increasing role as they are eminently suited to mapping biological observations and hypotheses into a mathematical formalism. The conceptual underpinnings of Boolean modeling are accessible even without a background in the quantitative sciences, yet the formalism allows life scientists to describe and explore a wide range of surprisingly complex phenomena. In this paper we provide a clear overview of the concepts used in Boolean simulations, present a software library that performs these simulations based on simple text inputs, and give three case studies. The large-scale simulations in these case studies demonstrate the Boolean paradigms and their applicability as well as the advanced features and complex use cases that our software package allows. Our software is distributed via a liberal Open Source license and is freely accessible from
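A synchronous Boolean simulation driven by simple text inputs can be sketched as follows. The rule syntax ("target = boolean expression") and the three-node network are assumptions for illustration, not the library's actual input format.

```python
RULES_TEXT = """
A = not C
B = A
C = A and B
"""

def parse_rules(text):
    """Parse 'target = expression' lines into a rules dictionary."""
    rules = {}
    for line in text.strip().splitlines():
        target, expr = line.split("=", 1)
        rules[target.strip()] = expr.strip()
    return rules

def step(state, rules):
    # Synchronous update: every node is evaluated against the previous state.
    return {node: bool(eval(expr, {}, state)) for node, expr in rules.items()}

rules = parse_rules(RULES_TEXT)
state = {"A": True, "B": False, "C": False}
trajectory = [state]
for _ in range(4):
    state = step(state, rules)
    trajectory.append(state)
```

Iterating the update until the state repeats reveals the attractors of the network, which in Boolean modeling are commonly interpreted as the stable phenotypes or cycles of the biological system.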
Computational modeling of cardiac electrophysiology is a powerful tool for studying arrhythmia mechanisms. In particular, cardiac models are useful for gaining insights into experimental studies, and in the foreseeable future they will be used by clinicians to improve therapy for patients suffering from complex arrhythmias. Such models are highly intricate, both in their geometric structure and in the equations that represent myocyte electrophysiology. For these models to be useful in a clinical setting, cost-effective solutions for solving the models in real time must be developed. In this work, we hypothesized that low-cost GPGPU-based hardware systems can be used to accelerate arrhythmia simulations. We ported a two-dimensional monodomain cardiac model and executed it on various GPGPU platforms. Electrical activity was simulated during point stimulation and rotor activity. Our GPGPU implementations provided significant speedups over the CPU implementation: 18X for point stimulation and 12X for rotor activity. We found that the number of threads that could be launched concurrently was a critical factor in optimizing the GPGPU implementations.
Many biomedical projects would benefit from reducing the time and expense of in vitro experimentation by using computer models for in silico predictions. These models may help determine which expensive biological data are most useful to acquire next. Active Learning techniques for choosing the most informative data enable biologists and computer scientists to optimize experimental data choices for rapid discovery of biological function. To explore design choices that affect this desirable behavior, five novel and five existing Active Learning techniques, together with three control methods, were tested on 57 previously unknown p53 cancer rescue mutants for their ability to build classifiers that predict protein function. The best of these techniques, Maximum Curiosity, improved the baseline accuracy from 56% to 77%. This paper shows that Active Learning is a useful tool for biomedical research, and provides a case study of interest to others facing similar discovery challenges.
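The general pool-based Active Learning loop behind such studies can be sketched compactly. This is a generic uncertainty-sampling illustration, not the paper's Maximum Curiosity method: the one-dimensional "mutants", the oracle, and the nearest-centroid learner are all toy stand-ins.

```python
def centroid(points):
    return sum(points) / len(points)

def uncertainty(x, c_pos, c_neg):
    # Highest uncertainty = smallest margin between the two class centroids.
    return -abs(abs(x - c_pos) - abs(x - c_neg))

def active_learn(pool, oracle, n_queries=4):
    """Iteratively query the oracle on the least certain unlabeled point."""
    labeled = {pool[0]: oracle(pool[0]), pool[-1]: oracle(pool[-1])}  # seeds
    for _ in range(n_queries):
        pos = [x for x, y in labeled.items() if y] or [0.0]
        neg = [x for x, y in labeled.items() if not y] or [0.0]
        c_pos, c_neg = centroid(pos), centroid(neg)
        candidates = [x for x in pool if x not in labeled]
        query = max(candidates, key=lambda x: uncertainty(x, c_pos, c_neg))
        labeled[query] = oracle(query)  # the expensive experiment happens here
    return labeled

# Toy 1-D "mutants": functional iff the feature value is at least 5.
pool = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
labels = active_learn(pool, oracle=lambda x: x >= 5)
```

Even in this toy, the queries concentrate near the true decision boundary (x = 5), which is precisely why Active Learning can reach a given accuracy with far fewer wet-lab experiments than random sampling.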
The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services to enable the use of ontologies and terminologies in clinical and translational research; educate its trainees and the scientific community broadly about biomedical ontology and ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes available for research, in computationally useful forms, more than 270 of the world's biomedical ontologies and terminologies, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data.
Collaborative technologies; knowledge representations; knowledge acquisition and knowledge management; controlled terminologies and vocabularies; ontologies; knowledge bases; applications that link biomedical knowledge from diverse primary sources (including automated indexing); statistical analysis of large datasets; methods for integration of information from disparate sources; discovery; text and data mining methods; automated learning; information retrieval; HIT data standards; representing, identifying, and modeling biological structures; developing and refining EHR data standards (including image standards)
In this paper we introduce Armadillo v1.1, a novel workflow platform dedicated to designing and conducting phylogenetic studies, including comprehensive simulations. A number of important phylogenetic and general bioinformatics tools have been included in the first software release. As Armadillo is an open-source project, it allows scientists to develop their own modules as well as to integrate existing computer applications. Using our workflow platform, different complex phylogenetic tasks can be modeled and presented in a single workflow without any prior knowledge of programming techniques. The first version of Armadillo was successfully used by professors of bioinformatics at Université du Québec à Montréal during graduate computational biology courses taught in 2010–11. The program and its source code are freely available at: .
This article describes an innovative software toolkit that allows the creation of web applications that facilitate the acquisition, integration, and dissemination of multimedia biomedical data over the web, thereby reducing the cost of knowledge sharing. There is a lack of high-level web application development tools suitable for use by researchers, clinicians, and educators who are not skilled programmers. Our Web Interfacing Repository Manager (WIRM) is a software toolkit that reduces the complexity of building custom biomedical web applications. WIRM’s visual modeling tools enable domain experts to describe the structure of their knowledge, from which WIRM automatically generates full-featured, customizable content management systems.
The Chronic Lymphocytic Leukemia (CLL) Research Consortium (CRC) consists of 9 geographically distributed sites conducting a program of research with both basic science and clinical components. To enable the CRC's clinical research efforts, a system providing real-time collaboration was required. The clinical trials management system (CTMS) described here provides such functionality, and demonstrates that the use of novel data modeling, web-application platforms, and management strategies enables the deployment of an extensible, cost-effective solution in such an environment.
Recent advances in sequencing technology have enabled the use of genome sequencing in standard biological and biotechnological research projects. The challenge is how to integrate the large amounts of data in order to gain novel biological insights. One way to leverage sequence data is to use genome-scale metabolic models. We have therefore designed and implemented a bioinformatics platform which supports the development of such metabolic models.
MEMOSys (MEtabolic MOdel research and development System) is a versatile platform for the management, storage, and development of genome-scale metabolic models. It supports the development of new models by providing a built-in version control system which offers access to the complete developmental history. Moreover, the integrated web board, the authorization system, and the definition of user roles allow collaborations across departments and institutions. Research on existing models is facilitated by a search system, references to external databases, and a feature-rich comparison mechanism. MEMOSys provides customizable data exchange mechanisms using the SBML format to enable analysis in external tools. The web application is based on the Java EE framework and offers an intuitive user interface. It currently contains six annotated microbial metabolic models.
We have developed a web-based system designed to provide researchers a novel application facilitating the management and development of metabolic models. The system is freely available at http://www.icbi.at/MEMOSys.
Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation.
Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly, intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like “What are the mechanisms for disease X?” or “Does the set of genes associated with disease X also influence other diseases?” Sidekick supports the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how a complex published analysis can be accomplished in a fraction of the original time with no computational effort using Sidekick.
Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to approach genomic analysis that traditional single gene lists do not, particularly in areas such as interaction discovery.
As the core technologies of the existing NCI SEER platform age and their operating costs increase, essential resources in the fight against cancer such as SEER will eventually have to be migrated to Grid-based systems. To investigate such a migration, this paper proposes agent-based modeling simulations to predict the performance of current and Grid configurations of the NCI SEER system integrated with the existing translational opportunities afforded by the caBIG™ project's caGrid. This modeling technique allows simulation of the complex, distributed services provided by a large-scale Grid computing platform. The model illustrates how the use of Grid technology can potentially improve system response time as the systems under test are scaled: in modeling SEER nodes accessing multiple registry silos, we show that SEER applications re-implemented in a Grid-native manner exhibit a nearly constant user response time as the number of distributed registry silos increases, whereas the current application architecture exhibits a linear increase in response time with increasing numbers of silos.
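The contrast between linear and nearly constant scaling can be sketched with a back-of-the-envelope latency model: serial silo queries add up, while fanned-out Grid-native queries are bounded by the slowest single silo. This is an illustrative simplification only, not the paper's agent-based model; the per-silo latency is an invented placeholder and real silos would have variable latencies.

```python
SILO_LATENCY = 0.2  # seconds per registry-silo query (illustrative placeholder)

def legacy_response_time(n_silos):
    # Current architecture: the application queries each silo in turn,
    # so user-perceived time grows linearly with the number of silos.
    return sum(SILO_LATENCY for _ in range(n_silos))

def grid_response_time(n_silos):
    # Grid-native re-implementation: silo queries fan out concurrently,
    # so user-perceived time is bounded by the slowest single query.
    return max(SILO_LATENCY for _ in range(n_silos))

legacy_times = [legacy_response_time(k) for k in (1, 2, 4, 8)]
grid_times = [grid_response_time(k) for k in (1, 2, 4, 8)]
```

With heterogeneous silo latencies the Grid curve would track the maximum rather than stay flat, but the qualitative gap between linear and near-constant scaling survives, which is the effect the simulation results above describe.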
Simulation-based medicine and the development of complex computer models of biological structures are becoming ubiquitous for advancing biomedical engineering and clinical research. Finite element analysis (FEA) has been widely used in the last few decades to understand and predict biomechanical phenomena. Modeling and simulation approaches in biomechanics are highly interdisciplinary, involving novice and skilled developers in all areas of biomedical engineering and biology. While recent advances in model development and simulation platforms offer a wide range of tools to investigators, the decision-making process during modeling and simulation has become more opaque. Hence, the reliability of such models used for medical decision making and for driving multiscale analysis comes into question. Establishing guidelines for model development and dissemination is a daunting task, particularly with the complex and convoluted models used in FEA. Nonetheless, if better reporting can be established, researchers will have a better understanding of a model’s value, and the potential for reusability through sharing will be bolstered. Thus, the goal of this document is to identify resources and appropriate reporting parameters for FEA studies in biomechanics. These entail various levels of reporting parameters for model identification, model structure, simulation structure, verification, validation, and availability. While we recognize that it may not be possible to provide and detail all of the reporting considerations presented, it is possible to establish a level of confidence with selective use of these parameters. More detailed reporting, however, can establish an explicit outline of the decision-making process in simulation-based analysis for enhanced reproducibility, reusability, and sharing.
standards in modeling; finite element analysis; reporting; biomechanics; tissue mechanics; joint biomechanics; musculoskeletal system; device mechanics; device evaluation
The National Center for Integrative Biomedical Informatics (NCIBI) is one of the eight National Centers for Biomedical Computing (NCBCs). NCIBI supports information access and data analysis for biomedical researchers, enabling them to build computational and knowledge models of biological systems to address the Driving Biological Problems (DBPs). The NCIBI DBPs have included prostate cancer progression, organ-specific complications of type 1 and 2 diabetes, bipolar disorder, and metabolic analysis of obesity syndrome. Collaborating with these and other partners, NCIBI has developed a series of software tools for exploratory analysis, concept visualization, and literature searches, as well as core database and web services resources. Many of our training and outreach initiatives have been in collaboration with the Research Centers at Minority Institutions (RCMI), integrating NCIBI and RCMI faculty and students, culminating each year in an annual workshop. Our future directions include focusing on the TranSMART data sharing and analysis initiative.
Integrative; bioinformatics; computational biology; biomedical informatics; genomics; national center
Biomedical research is set to greatly benefit from the use of semantic web technologies in the design of computational infrastructure. However, beyond well-defined research initiatives, substantial issues of data heterogeneity, source distribution, and privacy currently stand in the way of the personalization of medicine.
A computational framework for bioinformatic infrastructure was designed to deal with the heterogeneous data sources and the sensitive mixture of public and private data that characterize the biomedical domain. This framework consists of a logical model built with semantic web tools, coupled with a Markov process that propagates user operator states. An accompanying open source prototype was developed to serve a series of applications, ranging from collaborative multi-institution data acquisition efforts to data analysis applications that need to quickly traverse complex data structures. This report describes the two abstractions underlying the S3DB-based infrastructure, logical and numerical, and discusses its generality beyond the immediate confines of existing implementations.
The emergence of the "web as a computer" requires a formal model of the different functionalities involved in reading from and writing to it. The proposed S3DB core model was found to address the design criteria of biomedical computational infrastructure, such as those supporting large-scale multi-investigator research, clinical trials, and molecular epidemiology.