One of the requirements for a federated information system is interoperability, the ability of one computer system to access and use the resources of another system. This feature is particularly important in biomedical research systems, which need to coordinate a variety of disparate types of data. In order to meet this need, the National Cancer Institute Center for Bioinformatics (NCICB) has created the cancer Common Ontologic Representation Environment (caCORE), an interoperability infrastructure based on Model Driven Architecture. The caCORE infrastructure provides a mechanism to create interoperable biomedical information systems. Systems built using the caCORE paradigm address both aspects of interoperability: the ability to access data (syntactic interoperability) and understand the data once retrieved (semantic interoperability). This infrastructure consists of an integrated set of three major components: a controlled terminology service (Enterprise Vocabulary Services), a standards-based metadata repository (the cancer Data Standards Repository) and an information system with an Application Programming Interface (API) based on Domain Model Driven Architecture. This infrastructure is being leveraged to create a Semantic Service Oriented Architecture (SSOA) for cancer research by the National Cancer Institute’s cancer Biomedical Informatics Grid (caBIG™).
Semantic Interoperability; Model Driven Architecture; Metadata; Controlled Terminology; ISO 11179
The purpose of this document is to provide readers with a resource of different ongoing
standardization efforts within the ‘omics’ (genomic, proteomics, metabolomics)
and related communities, with particular focus on toxicological and environmental
applications. The review includes initiatives within the research community as well as
in the regulatory arena. It addresses data management issues (format and reporting
structures for the exchange of information) and database interoperability, highlighting
key objectives, target audience and participants. A considerable amount of work
still needs to be done and, ideally, collaboration should be optimized and duplication
and incompatibility should be avoided where possible. The consequence of failing to
deliver data standards is an escalation in the burden and cost of data management
Robust, programmatically accessible biomedical information services that syntactically and semantically interoperate with other resources are challenging to construct. Such systems require the adoption of common information models, data representations and terminology standards as well as documented application programming interfaces (APIs). The National Cancer Institute (NCI) developed the cancer common ontologic representation environment (caCORE) to provide the infrastructure necessary to achieve interoperability across the systems it develops or sponsors. The caCORE Software Development Kit (SDK) was designed to provide developers both within and outside the NCI with the tools needed to construct such interoperable software systems.
The caCORE SDK requires a Unified Modeling Language (UML) tool to begin the development workflow with the construction of a domain information model in the form of a UML Class Diagram. Models are annotated with concepts and definitions from a description logic terminology source using the Semantic Connector component. The annotated model is registered in the Cancer Data Standards Repository (caDSR) using the UML Loader component. System software is automatically generated using the Codegen component, which produces middleware that runs on an application server. The caCORE SDK was initially tested and validated using a seven-class UML model, and has been used to generate the caCORE production system, which includes models with dozens of classes. The deployed system supports access through object-oriented APIs with consistent syntax for retrieval of any type of data object across all classes in the original UML model. The caCORE SDK is currently being used by several development teams, including by participants in the cancer biomedical informatics grid (caBIG) program, to create compatible data services. caBIG compatibility standards are based upon caCORE resources, and thus the caCORE SDK has emerged as a key enabling technology for caBIG.
The caCORE SDK substantially lowers the barrier to implementing systems that are syntactically and semantically interoperable by providing workflow and automation tools that standardize and expedite modeling, development, and deployment. It has gained acceptance among developers in the caBIG program, and is expected to provide a common mechanism for creating data service nodes on the data grid that is under development.
Globalization and the advances in modern information and communication technologies (ICT) are changing the practice of health care and policy making. In the globalized economies of the 21 century, health systems will have to respond to the need of increasingly mobile citizens, patients and providers. At the same time the increased use of ICT is enabling health systems to systematize, process and integrate multiple data silos from different settings and at various levels. To meet these challenges effectively, the creation of an interoperable, global e-Health information infrastructure is critical. Data interoperability within and across heterogeneous health systems, however, is often hampered by differences in terminological inconsistencies and the lack of a common language, particularly when multiple communities of practice from different countries are involved.
Discuss the functionality and ontological requirements for ICF in achieving semantic interoperability of e-Health information systems.
Most solution attempts for interoperability to date have only focused on technical exchange of data in common formats. Automated health information exchange and aggregation is a very complex task which depends on many crucial prerequisites. The overall architecture of the health information system has to be defined clearly at macro and micro levels in terms of its building blocks and their characteristics. The taxonomic and conceptual features of the ICF make it an important architectural element in the overall design of e-Health information systems. To use the ICF in a digital environment the classification needs to be formalized and modeled using ontological principles and description logic. Ontological modeling is also required for linking assessment instruments and clinical terminologies (e.g. SNOMED) to the ICF.
To achieve semantic interoperability of e-Health systems a carefully elaborated overall health information system architecture has to be established. As a content standard, the ICF can play a pivotal role for meaningful and automated compilation and exchange of health information across sectors and levels. In order to fulfill this role a ICF ontology needs to be developed.
semantic interoperability; health and disability classification; ontology development
Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines.
We built a Python software library for working with phylogenetic data that is tightly integrated with Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees, performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API, providing a common set of functionality independent of the data source.
Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available through the Biopython website, http://biopython.org.
The complexity and inter-related nature of biological data poses a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community.
SADI - Semantic Automated Discovery and Integration - is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services "stack", SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers.
SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behaviour we have not observed in any other Semantic system. Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies.
Current efforts within the biomedical ontology community focus on achieving interoperability between various biomedical ontologies that cover a range of diverse domains. Achieving this interoperability will contribute to the creation of a rich knowledge base that can be used for querying, as well as generating and testing novel hypotheses. The OBO Foundry principles, as applied to a number of biomedical ontologies, are designed to facilitate this interoperability. However, semantic extensions are required to meet the OBO Foundry interoperability goals. Inconsistencies may arise when ontologies of properties – mostly phenotype ontologies – are combined with ontologies taking a canonical view of a domain – such as many anatomical ontologies. Currently, there is no support for a correct and consistent integration of such ontologies.
We have developed a methodology for accurately representing canonical domain ontologies within the OBO Foundry. This is achieved by adding an extension to the semantics for relationships in the biomedical ontologies that allows for treating canonical information as default. Conclusions drawn from default knowledge may be revoked when additional information becomes available. We show how this extension can be used to achieve interoperability between ontologies, and further allows for the inclusion of more knowledge within them. We apply the formalism to ontologies of mouse anatomy and mammalian phenotypes in order to demonstrate the approach.
Biomedical ontologies require a new class of relations that can be used in conjunction with default knowledge, thereby extending those currently in use. The inclusion of default knowledge is necessary in order to ensure interoperability between ontologies.
At the GSC11 meeting (4-6 April 2011, Hinxton, England, the GSC’s genomic biodiversity working group (GBWG) developed an initial model for a data management testbed at the interface of biodiversity with genomics and metagenomics. With representatives of the Global Biodiversity Information Facility (GBIF) participating, it was agreed that the most useful course of action would be for GBIF to collaborate with the GSC in its ongoing GBWG workshops to achieve common goals around interoperability/data integration across (meta)-genomic and species level data. It was determined that a quick comparison should be made of the contents of the Darwin Core (DwC) and the GSC data checklists, with a goal of determining their degree of overlap and compatibility. An ad-hoc task group lead by Renzo Kottman and Peter Dawyndt undertook an initial comparison between the Darwin Core (DwC) standard used by the Global Biodiversity Information Facility (GBIF) and the MIxS checklists put forward by the Genomic Standards Consortium (GSC). A term-by-term comparison showed that DwC and GSC concepts complement each other far more than they compete with each other. Because the preliminary analysis done at this meeting was based on expertise with GSC standards, but not with DwC standards, the group recommended that a joint meeting of DwC and GSC experts be convened as soon as possible to continue this joint assessment and to propose additional work going forward.
To support more informed prescribing decisions, e-prescribing systems need data on patients’ medication histories and their drug-specific insurance coverage. We used an expert panel process to evaluate the technical adequacy of two standards for delivering this information, the Medication History function of the NCPDP SCRIPT Standard and the NCPDP Formulary and Benefit Standard.
We convened a panel representing 14 organizations that had experience with these standards. Experts within each organization submitted narrative responses and ratings assessing the standards in 6 domains, including data quality, completeness, usability, and interoperability. Areas of disagreement were discussed in recorded teleconferences. Narrative was analyzed using a grounded-theory approach.
Panelists agreed that the structure of the Medication History Standard was adequate for delivering accurate and complete information but implementation problems made the data difficult to use for decision support. The panel also agreed that the Formulary and Benefit Standard was adequate to deliver formulary status lists, but other parts of the standard were not used consistently and group-level variations in coverage were not represented. A common problem for both standards was the lack of unambiguous drug identifiers; panelists agreed that RxNorm deserves further evaluation as a solution to this problem.
A panel of industry experts found the basic structure of these two standards to be technically adequate, but to enable benefits for patient care, improvements are needed in the standards’ implementation.
We design and develop an electronic claim system based on an integrated electronic health record (EHR) platform. This system is designed to be used for ambulatory care by office-based physicians in the United States. This is achieved by integrating various medical standard technologies for interoperability between heterogeneous information systems.
The developed system serves as a simple clinical data repository, it automatically fills out the Centers for Medicare and Medicaid Services (CMS)-1500 form based on information regarding the patients and physicians' clinical activities. It supports electronic insurance claims by creating reimbursement charges. It also contains an HL7 interface engine to exchange clinical messages between heterogeneous devices.
The system partially prevents physician malpractice by suggesting proper treatments according to patient diagnoses and supports physicians by easily preparing documents for reimbursement and submitting claim documents to insurance organizations electronically, without additional effort by the user. To show the usability of the developed system, we performed an experiment that compares the time spent filling out the CMS-1500 form directly and time required create electronic claim data using the developed system. From the experimental results, we conclude that the system could save considerable time for physicians in making claim documents.
The developed system might be particularly useful for those who need a reimbursement-specialized EHR system, even though the proposed system does not completely satisfy all criteria requested by the CMS and Office of the National Coordinator for Health Information Technology (ONC). This is because the criteria are not sufficient but necessary condition for the implementation of EHR systems. The system will be upgraded continuously to implement the criteria and to offer more stable and transparent transmission of electronic claim data.
Electronic Health Records; Health Level Seven; Reimbursement; Relative Value Scales
One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings.
We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later.
We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
Natural Language Processing; Standards and interoperability; Clinical information extraction; Clinical Element Models; Common type system
Biomedical researchers often have to work on massive, detailed, and heterogeneous datasets that raise new challenges of information management. This study reports an investigation into the nature of the problems faced by the researchers in two bioscience test laboratories when dealing with their data management applications. Data were collected using ethnographic observations, questionnaires, and semi-structured interviews. The major problems identified in working with these systems were related to data organization, publications, and collaboration. The interoperability standards were analyzed using a C4I framework at the level of connection, communication, consolidation, and collaboration. Such an analysis was found to be useful in judging the capabilities of data management systems at different levels of technological competency. While collaboration and system interoperability are the “must have” attributes of these biomedical scientific laboratory information management applications, usability and human interoperability are the other design concerns that must also be addressed for easy use and implementation.
OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals.
The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services. The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation.
Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way.
In Canada, federal, provincial, and territorial governments are developing an ambitious project to implement an interoperable electronic health record (EHR). Benefits for patients, healthcare professionals, organizations, and the public in general are expected. However, adoption of an interoperable EHR remains an important issue because many previous EHR projects have failed due to the lack of integration into practices and organizations. Furthermore, perceptions of the EHR vary between end-user groups, adding to the complexity of implementing this technology. Our aim is to produce a comprehensive synthesis of actual knowledge on the barriers and facilitators influencing the adoption of an interoperable EHR among its various users and beneficiaries.
First, we will conduct a comprehensive review of the scientific literature and other published documentation on the barriers and facilitators to the implementation of the EHR. Standardized literature search and data extraction methods will be used. Studies' quality and relevance to inform decisions on EHR implementation will be assessed. For each group of EHR users identified, barriers and facilitators will be categorized and compiled using narrative synthesis and meta-analytical techniques. The principal factors identified for each group of EHR users will then be validated for its applicability to various Canadian contexts through a two-round Delphi study, involving representatives from each end-user groups. Continuous exchanges with decision makers and periodic knowledge transfer activities are planned to facilitate the dissemination and utilization of research results in policies regarding the implementation of EHR in the Canadian healthcare system.
Given the imminence of an interoperable EHR in Canada, knowledge and evidence are urgently needed to prepare this major shift in our healthcare system and to oversee the factors that could affect its adoption and integration by all its potential users. This synthesis will be the first to systematically summarize the barriers and facilitators to EHR adoption perceived by different groups and to consider the local contexts in order to ensure the applicability of this knowledge to the particular realities of various Canadian jurisdictions. This comprehensive and rigorous strategy could be replicated in other settings.
In order to provide more effective and personalized healthcare services to patients and healthcare professionals, intelligent active knowledge management and reasoning systems with semantic interoperability are needed. Technological developments have changed ubiquitous healthcare making it more semantically interoperable and individual patient-based; however, there are also limitations to these methodologies. Based upon an extensive review of international literature, this paper describes two technological approaches to semantically interoperable electronic health records for ubiquitous healthcare data management: the ontology-based model and the information, or openEHR archetype model, and the link to standard terminologies such as SNOMED-CT.
Ubiquitous Healthcare; Electronic Health Record; Ontology; OpenEHR Archetype; SNOMED-CT
The Generation Challenge programme (GCP) is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. A key consortium research activity is the development of a GCP crop bioinformatics platform to support GCP research. This platform includes the following: (i) shared, public platform-independent domain models, ontology, and data formats to enable interoperability of data and analysis flows within the platform; (ii) web service and registry technologies to identify, share, and integrate information across diverse, globally dispersed data sources, as well as to access high-performance computational (HPC) facilities for computationally intensive, high-throughput analyses of project data; (iii) platform-specific middleware reference implementations of the domain model integrating a suite of public (largely open-access/-source) databases and software tools into a workbench to facilitate biodiversity analysis, comparative analysis of crop genomic data, and plant breeding decision making.
One of the most serious bottlenecks in the scientific workflows of biodiversity sciences is the need to integrate data from different sources, software applications, and services for analysis, visualisation and publication. For more than a quarter of a century the TDWG Biodiversity Information Standards organisation has a central role in defining and promoting data standards and protocols supporting interoperability between disparate and locally distributed systems.Although often not sufficiently recognized, TDWG standards are the foundation of many popular Biodiversity Informatics applications and infrastructures ranging from small desktop software solutions to large scale international data networks. However, individual scientists and groups of collaborating scientist have difficulties in fully exploiting the potential of standards that are often notoriously complex, lack non-technical documentations, and use different representations and underlying technologies. In the last few years, a series of initiatives such as Scratchpads, the EDIT Platform for Cybertaxonomy, and biowikifarm have started to implement and set up virtual work platforms for biodiversity sciences which shield their users from the complexity of the underlying standards. Apart from being practical work-horses for numerous working processes related to biodiversity sciences, they can be seen as information brokers mediating information between multiple data standards and protocols.The ViBRANT project will further strengthen the flexibility and power of virtual biodiversity working platforms by building software interfaces between them, thus facilitating essential information flows needed for comprehensive data exchange, data indexing, web-publication, and versioning. This work will make an important contribution to the shaping of an international, interoperable, and user-oriented biodiversity information infrastructure.
EDIT; Common Data Model; CDM; Scratchpads; Standards; TDWG; biowikifarm; Taxonomy; Biodiversity; Biodiversity informatics
With the deployments of Electronic Health Records (EHR), interoperability testing in healthcare is becoming crucial. EHR enables access to prior diagnostic information in order to assist in health decisions. It is a virtual system that results from the cooperation of several heterogeneous distributed systems. Interoperability between peers is therefore essential. Achieving interoperability requires various types of testing. Implementations need to be tested using software that simulates communication partners, and that provides test data and test plans.
In this paper we describe a software that is used to test systems that are involved in sharing medical images within the EHR. Our software is used as part of the Integrating the Healthcare Enterprise (IHE) testing process to test the Cross Enterprise Document Sharing for imaging (XDS-I) integration profile. We describe its architecture and functionalities; we also expose the challenges encountered and discuss the elected design solutions.
EHR is being deployed in several countries. The EHR infrastructure will be continuously evolving to embrace advances in the information technology domain. Our software is built on a web framework to allow for an easy evolution with web technology. The testing software is publicly available; it can be used by system implementers to test their implementations. It can also be used by site integrators to verify and test the interoperability of systems, or by developers to understand specifications ambiguities, or to resolve implementations difficulties.
The laboratory mouse has become the organism of choice for discovering gene function and unravelling pathogenetic mechanisms of human diseases through the application of various functional genomic approaches. The resulting deluge of data has led to the deployment of numerous online resources and the concomitant need for formalized experimental descriptions, data standardization, database interoperability and integration, a need that has yet to be met. We present here the Mouse Resource Browser (MRB), a database of mouse databases that indexes 217 publicly available mouse resources under 22 categories and uses a standardised database description framework (the CASIMIR DDF) to provide information on their controlled vocabularies (ontologies and minimum information standards), and technical information on programmatic access and data availability. Focusing on interoperability and integration, MRB offers automatic generation of downloadable and re-distributable SOAP application-programming interfaces for resources that provide direct database access. MRB aims to provide useful information to both bench scientists, who can easily navigate and find all mouse related resources in one place, and bioinformaticians, who will be provided with interoperable resources containing data which can be mined and integrated.
Database URL: http://bioit.fleming.gr/mrb
We report from the second ESF Programme on Functional Genomics workshop on
Data Integration, which covered topics including the status of biological pathways
databases in existing consortia; pathways as part of bioinformatics infrastructures;
design, creation and formalization of biological pathways databases; generating
and supporting pathway data and interoperability of databases with other external
databases and standards. Key issues emerging from the discussions were the need for
continued funding to cover maintenance and curation of databases, the importance
of quality control of the data in these resources, and efforts to facilitate the exchange
of data and to ensure the interoperability of databases.
Health-information exchange, that is, enabling the interoperability of automated health data, can facilitate important improvements in healthcare quality and efficiency. A vision of interoperability and its benefits was articulated more than a decade ago. Since then, important advances toward the goal have been made. The advent of the Health Information Technology for Economic and Clinical Health Act and the meaningful use program is already having a significant impact on the direction that health-information exchange will take. This paper describes how interoperability activities have unfolded over the last decade and explores how recent initiatives are likely to affect the directions and benefits of health-information exchange.
Clinical decision support
Recent increases in the volume and diversity of life science data and information and an increasing emphasis on data sharing and interoperability have resulted in the creation of a large number of biological ontologies, including the Cell Ontology (CL), designed to provide a standardized representation of cell types for data annotation. Ontologies have been shown to have significant benefits for computational analyses of large data sets and for automated reasoning applications, leading to organized attempts to improve the structure and formal rigor of ontologies to better support computation. Currently, the CL employs multiple is_a relations, defining cell types in terms of histological, functional, and lineage properties, and the majority of definitions are written with sufficient generality to hold across multiple species. This approach limits the CL's utility for computation and for cross-species data integration.
To enhance the CL's utility for computational analyses, we developed a method for the ontological representation of cells and applied this method to develop a dendritic cell ontology (DC-CL). DC-CL subtypes are delineated on the basis of surface protein expression, systematically including both species-general and species-specific types and optimizing DC-CL for the analysis of flow cytometry data. We avoid multiple uses of is_a by linking DC-CL terms to terms in other ontologies via additional, formally defined relations such as has_function.
This approach brings benefits in the form of increased accuracy, support for reasoning, and interoperability with other ontology resources. Accordingly, we propose our method as a general strategy for the ontological representation of cells. DC-CL is available from .
Image acquisition, processing, and quantification of objects (morphometry) require the integration of data inputs and outputs originating from heterogeneous sources. Management of the data exchange along this workflow in a systematic manner poses several challenges, notably the description of the heterogeneous meta-data and the interoperability between the software used. The use of integrated software solutions for morphometry and management of imaging data in combination with ontologies can reduce meta-data loss and greatly facilitate subsequent data analysis. This paper presents an integrated information system, called LabIS. The system has the objectives to automate (i) the process of storage, annotation, and querying of image measurements and (ii) to provide means for data sharing with third party applications consuming measurement data using open standard communication protocols. LabIS implements 3-tier architecture with a relational database back-end and an application logic middle tier realizing web-based user interface for reporting and annotation and a web-service communication layer. The image processing and morphometry functionality is backed by interoperability with ImageJ, a public domain image processing software, via integrated clients. Instrumental for the latter feat was the construction of a data ontology representing the common measurement data model. LabIS supports user profiling and can store arbitrary types of measurements, regions of interest, calibrations, and ImageJ settings. Interpretation of the stored measurements is facilitated by atlas mapping and ontology-based markup. The system can be used as an experimental workflow management tool allowing for description and reporting of the performed experiments. LabIS can be also used as a measurements repository that can be transparently accessed by computational environments, such as Matlab. Finally, the system can be used as a data sharing tool.
web-service; ontology; morphometry
Units are basic scientific tools that render meaning to numerical data. Their standardization and formalization caters for the report, exchange, process, reproducibility and integration of quantitative measurements. Ontologies are means that facilitate the integration of data and knowledge allowing interoperability and semantic information processing between diverse biomedical resources and domains. Here, we present the Units Ontology (UO), an ontology currently being used in many scientific resources for the standardized description of units of measurements.
DAS is a widely adopted protocol for providing syntactic interoperability among biological databases. The popularity of DAS is due to a simplified and elegant mechanism for data exchange that consists of sources exposing their RESTful interfaces for data access. As a growing number of DAS services are available for molecular biology resources, there is an incentive to explore this protocol in order to advance data discovery and integration among these resources.
We developed DASMiner, a Matlab toolkit for querying DAS data sources that enables creation of integrated biological models using the information available in DAS-compliant repositories. DASMiner is composed by a browser application and an API that work together to facilitate gathering of data from different DAS sources, which can be used for creating enriched datasets from multiple sources.
The browser is used to formulate queries and navigate data contained in DAS sources. Users can execute queries against these sources in an intuitive fashion, without the need of knowing the specific DAS syntax for the particular source. Using the source's metadata provided by the DAS Registry, the browser's layout adapts to expose only the set of commands and coordinate systems supported by the specific source. For this reason, the browser can interrogate any DAS source, independently of the type of data being served.
The API component of DASMiner may be used for programmatic access of DAS sources by programs in Matlab. Once the desired data is found during navigation, the query is exported in the format of an API call to be used within any Matlab application. We illustrate the use of DASMiner by creating integrative models of histone modification maps and protein-protein interaction networks. These enriched datasets were built by retrieving and integrating distributed genomic and proteomic DAS sources using the API.
The support of the DAS protocol allows that hundreds of molecular biology databases to be treated as a federated, online collection of resources. DASMiner enables full exploration of these resources, and can be used to deploy applications and create integrated views of biological systems using the information deposited in DAS repositories.