The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities.
We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted.
These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at .
The Mouse Tumor Biology Database (MTB) is a Web-based resource that provides access to information on tumor frequency and latency, genetics and pathology in genetically defined mice (transgenics, targeted mutations and inbred strains). MTB is designed to serve as an information resource for cancer genetics researchers who use the laboratory mouse as a model system for understanding human disease processes. Data in MTB are obtained from the primary scientific literature and direct submissions by the research community. MTB is accessible from the Mouse Genome Informatics Web site (http://www.informatics.jax.org). User support is available for MTB via Email at email@example.com
The Mouse Tumor Biology (MTB) Database serves as a curated, integrated
resource for information about tumor genetics and pathology in genetically
defined strains of mice (i.e., inbred, transgenic and targeted mutation
strains). Sources of information for the database include the published
scientific literature and direct data submissions by the scientific community.
Researchers access MTB using Web-based query forms and can use the
database to answer such questions as ‘What tumors have
been reported in transgenic mice created on a C57BL/6J background?’, ‘What
tumors in mice are associated with mutations in the Trp53 gene?’ and ‘What pathology
images are available for tumors of the mammary gland regardless
of genetic background?’. MTB has been available on the
Web since 1998 from the Mouse Genome Informatics web site (http://www.informatics.jax.org).
We have recently implemented a number of enhancements to MTB including new
query options, redesigned query forms and results pages for pathology
and genetic data, and the addition of an electronic data submission
and annotation tool for pathology data.
In 2012, the Health/Medical informatics profession celebrates five jubilees in Bosnia and Herzegovina: a) thirty-five years since the introduction of the first automatic manipulation of data; b) twenty-five years since the establishment of the Society for Medical Informatics BiH; c) twenty years since the establishment of the scientific and professional journal of the Society for Medical Informatics of Bosnia and Herzegovina, „Acta Informatica Medica“; d) twenty years since the establishment of the first Cathedra for Medical Informatics at biomedical faculties in Bosnia and Herzegovina; and e) ten years since the introduction of “Distance learning” in the medical curriculum. All five of these activities had special importance for, and contributed to, the development of Health/Medical informatics in Bosnia and Herzegovina.
Health/Medical informatics; Acta Inform Med; Jubilees; Bosnia and Herzegovina.
Despite the availability of community-based support services, cancer patients and survivors are not aware of many of these resources. Without access to community programs, cancer survivors are at risk for lower quality of care and lower quality of life. At the same time, non-profit community organizations lack access to advanced consumer informatics applications to effectively promote awareness of their services. In addition to the current models of print and online resource guides, new community-driven informatics approaches are needed to achieve the goal of comprehensive care for cancer survivors. We present the formulation of a novel model for synthesizing a local community’s collective wisdom of cancer-related resources through a combination of online social networking technologies and real-world collaborative partnerships. This approach can improve awareness of essential, but underutilized community resources.
Summary: MonkeySNP is a web-based resource created by the Genetic Resource and Informatics Program at the Oregon National Primate Research Center to facilitate access to non-human primate (NHP) single nucleotide polymorphism (SNP) data. MonkeySNP is a mirror of the NCBI dbSNP database and contains additional NHP subpopulation genotype data and visual genotype displays to support SNP review and selection.
Supplementary information: Supplementary data are available at Bioinformatics online.
To report the results of a needs assessment of research and training in Medical Informatics (MI) and Bioinformatics (BI) in Latin America.
Methods and results
This assessment was conducted by QUIPU: The Andean Global Health Informatics Research and Training Center. After sending email invitations to MI–BI related professionals from Latin America, 142 surveys were received from 11 Latin American countries. The following were the top four ranked MI-related courses that a training programme should include: introduction to biomedical informatics; data representation and databases; mobile health; and courses that address issues of security, confidentiality and privacy. Several new courses and topics for research were suggested by survey participants. The information collected is guiding the development of curricula and a research agenda for the MI and BI QUIPU multidisciplinary programme for the Andean Region and Latin America.
The objective of this paper is to report the results of the first needs assessment of research and training in Medical Informatics (MI) and Bioinformatics (BI) in Latin America.
Top ranked courses in biomedical informatics included: mobile health, issues on security, confidentiality and privacy, public and clinical informatics and electronic health records.
The information collected in this needs assessment is guiding the development of curricula and a research agenda for training and research in the Andean region through the Peruvian NIH funded centre QUIPU. ‘Quipu’ is a Quechua word that describes an ancient system used throughout the Andes by the Incas to record and distribute information.
Strengths and limitations of this study
The online survey included participants from 11 Latin American countries.
It is the first needs assessment in Latin America addressing issues of training and research in biomedical informatics.
The sample was, however, purposive.
In this era of complete genomes, our knowledge of neuroanatomical circuitry remains surprisingly sparse. Such knowledge is critical, however, for both basic and clinical research into brain function. Here we advocate for a concerted effort to fill this gap, through systematic, experimental mapping of neural circuits at a mesoscopic scale of resolution suitable for comprehensive, brainwide coverage, using injections of tracers or viral vectors. We detail the scientific and medical rationale and briefly review existing knowledge and experimental techniques. We define a set of desiderata, including brainwide coverage; validated and extensible experimental techniques suitable for standardization and automation; centralized, open-access data repository; compatibility with existing resources; and tractability with current informatics technology. We discuss a hypothetical but tractable plan for mouse, additional efforts for the macaque, and technique development for human. We estimate that the mouse connectivity project could be completed within five years with a comparatively modest budget.
In the biological and medical domains, the use of web services has made data and computational functionality accessible in a unified manner, which has helped automate data pipelines that were previously run manually. Workflow technology is widely used to orchestrate multiple services to facilitate in-silico research. The Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources, and caGrid is its underlying service-based computation infrastructure. caBIG requires that services be composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows.
CaGrid selected Taverna as its workflow execution system because of its integration with web service technology, its support for a wide range of web services, and its plug-in architecture, which eases the integration of third-party extensions. The caGrid Workflow Toolkit (or the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It supports users in the various phases of using workflows: service discovery, composition and orchestration, data access, and secure service invocation, which have been identified by the caGrid community as challenging in a multi-institutional and cross-discipline domain.
By extending the Taverna Workbench, the caGrid Workflow Toolkit provides a comprehensive solution for composing and coordinating services in caGrid, which would otherwise remain isolated and disconnected from each other. With it, users can access more than 140 services and are offered a rich set of features, including discovery of data and analytical services, query and transfer of data, security protections for service invocations, state management in service interactions, and sharing of workflows, experiences and best practices. The proposed solution is general enough to be applicable and reusable within other service-computing infrastructures that leverage a similar technology stack.
The Gene Expression Database (GXD) is a community resource of
gene expression information for the laboratory mouse. By combining
the different types of expression data, GXD aims to provide increasingly complete
information about the expression profiles of genes in different
mouse strains and mutants, thus enabling valuable insights into
the molecular networks that underlie normal development and disease.
GXD is integrated with the Mouse Genome Database (MGD). Extensive
interconnections with sequence databases and with databases from
other species, and the development and use of shared controlled
vocabularies extend GXD’s utility for the analysis of gene
expression information. GXD is accessible through the Mouse Genome Informatics
web site at http://www.informatics.jax.org/ or directly at
http://www.informatics.jax.org/me
Having access to current “Omics” level technology and services such as Bioinformatics, Genomics and Proteomics is critical for a program to function in a translational research setting. The Research Centers in Minority Institutions (RCMI) program of the NIH enhances the research capacity and infrastructure at colleges and universities which serve underrepresented populations. RCMI institutions have made significant inroads in addressing and advancing biomedical research in underserved populations, yet challenges such as limited resources, multiple data systems, and access to “Omics” services need to be addressed. Within this framework, the RCMI Infrastructure for Clinical and Translational Research (RCTR) and the RCMI Translational Research Network (RTRN) have established a Proteomics & Informatics Collaborative Group whose function is to connect “Omics” Service Cores across the RCMI. Initial steps in this effort revolved around establishing an inventory of Proteomics equipment, services offered and informatics flows. The overriding goal is to virtualize Proteomics and Informatics knowledge and services across multiple, geographically distinct RCMI centers, in order to maximize efficiency. With the creation of a data center as the first step, the Data Technology Coordinating Center (DTCC) at Jackson State University functions as the data hub of the Collaborative, while the Proteomics Core directors are the architects and project managers. Given differing levels of experience and scale at various institutions, this Collaborative serves as a valuable tool for information sharing, lessons learned, and, most importantly, a combined focus in order to pool strengths at RCMI institutes. Next steps will focus on a shared Proteomics data repository and establishment of best practice standard operating procedures in order to set the stage for sample sharing and group informatics analysis.
With the increased use of the World Wide Web has come an increase in the
number of Uniform Resource Locator (URL) references cited in journals. Out
of the 17,698 references we collected from five biomedical informatics
journals between 1999 and 2005, 6.8% contained URLs. Overall, 22.6% of
these URLs were inaccessible. In-press articles
had 10.8% unavailable URLs. Approaches that guarantee permanent
access to URL citations of scientific publications are needed.
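The percentages above can be turned into rough absolute counts. The following is a minimal sketch, assuming that each URL-containing reference contributes exactly one URL (the abstract does not say so) and that the published percentages are rounded; the function name `estimate_counts` is hypothetical, and the resulting counts are estimates, not figures reported by the study.

```python
def estimate_counts(total_refs, url_share, dead_share):
    """Estimate absolute counts from the reported proportions.

    Assumption (not stated in the abstract): one URL per URL-containing
    reference, so the inaccessible share applies to the same count.
    """
    url_refs = round(total_refs * url_share)    # references containing a URL
    dead_urls = round(url_refs * dead_share)    # URLs that were inaccessible
    return url_refs, dead_urls


# Figures taken from the abstract: 17,698 references, 6.8% with URLs,
# 22.6% of those URLs inaccessible.
url_refs, dead_urls = estimate_counts(17_698, 0.068, 0.226)
print(url_refs, dead_urls)
```

Under these assumptions, roughly 1,200 references contained URLs and roughly 270 of those URLs could not be reached.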
The Mouse Tumor Biology (MTB) database provides access to data about endogenously arising tumors (both spontaneous and induced) in genetically defined mice (inbred, hybrid, mutant and genetically engineered mice). Data include information on the frequency and latency of mouse tumors, pathology reports and images, genomic changes occurring in the tumors, genetic (strain) background and literature or contributor citations. Data are curated from the primary literature or submitted directly from researchers. MTB is accessed via the Mouse Genome Informatics web site (). Integrated searches of MTB are enabled through use of multiple controlled vocabularies and by adherence to standardized nomenclature, when available. Recently MTB has been redesigned and its database infrastructure replaced with a robust relational database management system (RDMS). Web interface improvements include a new advanced query form and enhancements to already existing search capabilities. The Tumor Frequency Grid has been revised to enhance interactivity, providing an overview of reported tumor incidence across mouse strains and an entrée into the database. A new pathology data submission tool allows users to submit, edit and release data to the MTB system.
Medical informatics, as a descriptive, scientific study, must be mathematically or theoretically described. Is it important to define a model for medical informatics? The answer is worth pursuing. The medical informatics profession stands to benefit three-fold: first, by clarifying the vagueness of the definition of medical informatics; secondly, by identifying the scope and content for educational programs; and, thirdly, by defining career opportunities for its graduates. Existing medical informatics curricula are not comparable. Consequently, the knowledge and skills of graduates from these programs are difficult to assess. The challenge is to promote academic programs that develop graduates who fulfill the criteria of prospective employers in the health care industry and, simultaneously, to compete with computer science programs that produce information technology graduates. To meet this challenge, medical informatics programs must have unique curricula that distinguish their graduates. The solution is to educate students in a comparable manner across the domain of medical informatics. This paper discusses a theoretical model for medical informatics.
The Gene Expression Database (GXD) is a community resource of gene expression information for the laboratory mouse. The database is designed as an open-ended system that can integrate different types of expression data. New expression data are made available on a daily basis. Thus, GXD provides increasingly complete information about what transcripts and proteins are produced by what genes; where, when and in what amounts these gene products are expressed; and how their expression varies in different mouse strains and mutants. GXD is integrated with the Mouse Genome Database (MGD). Continuously refined interconnections with sequence databases and with databases from other species place the gene expression information in the larger biological and analytical context. GXD is accessible through the Mouse Genome Informatics Web site at http://www.informatics.jax.org/ or directly at http://www.informatics.jax.org/menus/expression_menu.shtml
We describe a distributed architecture for medical informatics applications, based on the World-Wide Web (WWW) environment. After discussing previous experiences in the application of the WWW for medical purposes, we outline the features of a Common Lisp HTTP server designed to provide access to medical informatics applications using a standard Web browser. As an example of application, we describe a system for therapy planning and revision in the field of insulin-dependent diabetes. The system performs automatic data analysis and interpretation and provides advice on possible adjustments to the therapeutic protocol that the patients are following, taking advantage of the network and multimedia capabilities offered by the WWW for user interaction.
The objective of this study was to explore public health informatics (PHI) training programs that currently exist to meet the growing demand for a trained global workforce. We used several search engines, scientific databases, and the websites of informatics organizations; sources included PubMed, Google, the American Medical Informatics Organization, and the International Medical Informatics Organization. The search was conducted from May to July 2011 and from January to February 2012 using key words such as informatics, public health informatics, or biomedical informatics along with academic programs, training, certificate, graduate programs, or postgraduate programs. Course titles and catalog descriptions were gathered from the program or institution websites. Variables included PHI program categories, location and mode of delivery, program credits, and costs. Each course was then categorized based on its title and description as available on the Internet. Finally, we matched course titles and descriptions with the competencies for PHIs determined by Centers for Disease Control and Prevention (CDC). Descriptive analysis was performed to report means and frequency distributions for continuous and categorical variables. Stratified analysis was performed to explore average credits and cost per credit among both the public and private institutions. Fifteen PHI programs were identified across 13 different institutions, the majority of which were US-based. The average number of credits and the associated costs required to obtain PHI training were much higher in private as compared to public institutions. The study results suggest that a need for online contextual and cost-effective PHI training programs exists to address the growing needs of professionals worldwide who are using technology to improve public health in their respective countries.
public health informatics; training; global workforce
A critical component of the Neuroscience Information Framework (NIF) project is a consistent, flexible terminology for describing and retrieving neuroscience-relevant resources. Although the original NIF specification called for a loosely structured controlled vocabulary for describing neuroscience resources, as the NIF system evolved, the requirement for a formally structured ontology for neuroscience with sufficient granularity to describe and access a diverse collection of information became obvious. This requirement led to the NIF standardized (NIFSTD) ontology, a comprehensive collection of common neuroscience domain terminologies woven into an ontologically consistent, unified representation of the biomedical domains typically used to describe neuroscience data (e.g., anatomy, cell types, techniques), as well as digital resources (tools, databases) being created throughout the neuroscience community. NIFSTD builds upon a structure established by the BIRNLex, a lexicon of concepts covering clinical neuroimaging research developed by the Biomedical Informatics Research Network (BIRN) project. Each distinct domain module is represented using the Web Ontology Language (OWL). As much as has been practical, NIFSTD reuses existing community ontologies that cover the required biomedical domains, building the more specific concepts required to annotate NIF resources. By following this principle, an extensive vocabulary was assembled in a relatively short period of time for NIF information annotation, organization, and retrieval, in a form that promotes easy extension and modification. We report here on the structure of the NIFSTD, and its predecessor BIRNLex, the principles followed in its construction and provide examples of its use within NIF.
Neuroscience Information Framework; NIF standardized; Biomedical Informatics Research Network; Web Ontology Language
Access to consumer health informatics innovations requires availability
of certain technical resources. We discuss the use of a decision analysis
matrix in selecting technology for home use by participants in the
HeartCare II project, a web-based resource for patients managing congestive
heart failure. The matrix provided the structure for collecting, displaying, and
comparing data about the requisite criteria for the
technology. Its utility for a variety of decision problems with dissimilar
data elements is posited.
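A decision analysis matrix of this kind can be illustrated with a small sketch. The abstract does not describe how the matrix was scored, so the weighted-sum scoring below is an assumption, and the criteria, weights, option names and scores are all hypothetical values chosen only for illustration.

```python
def rank_options(scores, weights):
    """Rank options by the weighted sum of their criterion scores.

    scores:  {option: {criterion: score}}
    weights: {criterion: weight}
    Weighted-sum scoring is an assumption; the project may have compared
    options differently.
    """
    totals = {
        option: sum(weights[criterion] * score
                    for criterion, score in criteria.items())
        for option, criteria in scores.items()
    }
    # Highest total first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria and weights (higher weight = more important)
weights = {"cost": 3, "ease_of_use": 2, "connectivity": 1}

# Hypothetical candidate technologies, each scored 1 (poor) to 5 (good)
scores = {
    "device_a": {"cost": 4, "ease_of_use": 2, "connectivity": 5},
    "device_b": {"cost": 3, "ease_of_use": 5, "connectivity": 3},
}

ranking = rank_options(scores, weights)
for option, total in ranking:
    print(option, total)
```

Keeping the criteria and weights in plain dictionaries means the same function can be reused for other decision problems with different data elements, which is the kind of general utility the abstract suggests.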
The Mouse Tumor Biology (MTB) Database supports the use of the mouse as a model system of hereditary and induced cancers by providing electronic access to: (i) tumor names and classifications, (ii) tumor incidence and latency data in different strains of mice, (iii) tumor pathology reports and images, (iv) information on genetic factors associated with tumors and tumor development, and (v) references (published and unpublished data). This resource has been designed to aid researchers in such areas as choosing experimental models, reviewing patterns of mutations in specific cancers, and identifying genes that are commonly mutated across a spectrum of cancers. MTB also provides hypertext links to related on-line resources and databases. MTB is accessible via the World Wide Web at http://tumor.informatics.jax.org. User support is available for MTB by Email at firstname.lastname@example.org
Gene trapping is a method of generating murine embryonic stem (ES) cell lines containing insertional mutations in known and novel genes. A number of international groups have used this approach to create sizeable public cell line repositories available to the scientific community for the generation of mutant mouse strains. The major gene trapping groups worldwide have recently joined together to centralize access to all publicly available gene trap lines by developing a user-oriented Website for the International Gene Trap Consortium (IGTC). This collaboration provides an impressive public informatics resource comprising ∼45 000 well-characterized ES cell lines which currently represent ∼40% of known mouse genes, all freely available for the creation of knockout mice on a non-collaborative basis. To standardize annotation and provide high confidence data for gene trap lines, a rigorous identification and annotation pipeline has been developed combining genomic localization and transcript alignment of gene trap sequence tags to identify trapped loci. This information is stored in a new bioinformatics database accessible through the IGTC Website interface. The IGTC Website () allows users to browse and search the database for trapped genes, BLAST sequences against gene trap sequence tags, and view trapped genes within biological pathways. In addition, IGTC data have been integrated into major genome browsers and bioinformatics sites to provide users with outside portals for viewing this data. The development of the IGTC Website marks a major advance by providing the research community with the data and tools necessary to effectively use public gene trap resources for the large-scale characterization of mammalian gene function.
To provide an overview of the expansion in public access to electronic biomedical information over the past two decades, with an emphasis on developments to which the U.S. National Library of Medicine contributed.
Review of the increasingly broad spectrum of web-accessible genomic data, biomedical literature, consumer health information, clinical trials data, and images.
The amount of publicly available electronic biomedical information has increased dramatically over the past twenty years. Rising expectations regarding access to biomedical information were stimulated by the spread of the Internet, the World Wide Web, and advanced searching and linking techniques. These informatics advances simplified and improved access to electronic information and reduced costs, which enabled inter-organizational collaborations to build and maintain large international information resources and also aided outreach and education efforts. The demonstrated benefits of free access to electronic biomedical information encouraged the development of public policies that further increase the amount of information available.
Continuing rapid growth of publicly accessible electronic biomedical information presents tremendous opportunities and challenges, including the need to ensure uninterrupted access during disasters or emergencies and to manage digital resources so they remain available for future generations.
Access to information; genetic databases; digital libraries; consumer health information; clinical trials
As the emphasis on individuals' active partnership in health care grows, so does the public's need for effective, comprehensible consumer health resources. Consumer health informatics has the potential to provide frameworks and strategies for designing effective health communication tools that empower users and improve their health decisions. This article presents an overview of the consumer health informatics field, discusses promising approaches to supporting health communication, and identifies challenges plus direction for future research and development. The authors' recommendations emphasize the need for drawing upon communication and social science theories of information behavior, reaching out to consumers via a range of traditional and novel formats, gaining better understanding of the public's health information needs, and developing informatics solutions for tailoring resources to users' needs and competencies. This article was written as a scholarly outreach and leadership project by members of the American Medical Informatics Association's Consumer Health Informatics Working Group.
The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to ). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics.
This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000–2005 ("recent") and 1990–1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays.
We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.
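The bigram analysis described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the tokenization, the `min_count` threshold, the rule that a bigram is "emerging" when it is absent from the past corpus, and the toy texts are all assumptions made for the example.

```python
import re
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs in one document (lowercased, letters only)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))


def emerging_bigrams(past_texts, recent_texts, min_count=2):
    """Return bigrams seen at least min_count times in the recent corpus
    and never in the past corpus (a simplified, assumed criterion)."""
    past, recent = Counter(), Counter()
    for text in past_texts:
        past.update(bigrams(text))
    for text in recent_texts:
        recent.update(bigrams(text))
    return sorted(
        pair for pair, count in recent.items()
        if count >= min_count and past[pair] == 0
    )


# Toy titles standing in for journal text from the two periods
past_texts = [
    "expert systems for diagnosis",
    "expert systems in clinical practice",
]
recent_texts = [
    "text mining of clinical records",
    "text mining with support vector machines",
]

result = emerging_bigrams(past_texts, recent_texts)
print(result)
```

A real analysis would also need stop-word handling and normalization for corpus size, since the two periods are unlikely to contain the same amount of text.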
The advancement of the computational biology field hinges on progress in three fundamental directions – the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources–data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management.
We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.