As the number of neuroscience databases increases, the need for neuroscience data integration grows. This paper reviews and compares several approaches, including the Neuroscience Database Gateway (NDG), Neuroscience Information Framework (NIF) and Entrez Neuron, which enable neuroscience database annotation and integration. These approaches cover a range of activities spanning the registration, discovery and integration of a wide variety of neuroscience data sources. They also provide different user interfaces for browsing, querying and displaying query results. In Entrez Neuron, for example, four different facets or tree views (neuron, neuronal property, gene and drug) are used to hierarchically organize concepts that can be used for querying a collection of ontologies. The facets are also used to define the structure of the query results.
In an article titled ‘Data-sharing in an information age’, Insel et al. made the following prediction: ‘as we emerge from the “decade of the brain”, we are entering a decade for which data-sharing will be the currency for progress in neuroscience’. The growth in the number and diversity of neuroscience databases on the Internet (or the Web) is consistent with this statement. As the number of neuroscience databases continues to grow, the problem of database interoperability is becoming prominent in neuroinformatics. Also, as described in , the neuroinformatic ecosystem involves not only data, but also access to and re-use of data.
Data accessibility and re-use cannot be achieved without locating the data sources first. One common method for discovering Web-accessible data sources is to use a Web search engine such as Google to perform a keyword search. This approach often lacks the specificity and sensitivity the user wants. For example, if the user enters the keyword ‘neuron’, numerous hits are returned. It is tedious and time-consuming for the user to sift through a large number of hits to find the relevant ones. Some of these hits are not related to ‘neuron’ in the way expected by neuroscientists. For example, one of these unrelated sites is hosted by a company named ‘NEURON’ that specializes in manufacturing and selling magnetic and smart card readers and encoders. Conversely, if the user chooses a very specific search term (e.g. ‘hippocampal CA1 non-pyramidal neuron’), the search may not return any hits. Part of the problem is that generic Web search engines such as Google do not index the resources according to the needs of domain experts, such as neuroscientists.
To address the non-specific keyword search problem, the Neuroscience Database Gateway (NDG) (http://ndg.sfn.org/) has been created as a resource through the Society for Neuroscience (SfN) (http://www.sfn.org/). NDG provides a registry of neuroscience databases annotated with controlled keywords. Nearly 200 databases are currently listed in NDG. These databases span different neuroscience subdomains such as neurophysiology (e.g. SenseLab [5, 6]), neuroanatomy (e.g. BAMS) and neuroimaging (e.g. CCDB). They also cover a wide variety of types of data for different species (e.g. mouse, rat and human). The types of data include images (e.g. CCDB), brain atlases (e.g. Allen Brain Atlas) and neurological diseases (e.g. Alzforum). Each NDG-registered database is annotated with keywords derived from a standard vocabulary/terminology that is curated and approved by a neuroscience user committee. Categories of keywords (e.g. database access, species and clinical conditions) are provided to help the user browse the databases within the registry. In addition, NDG provides a structured search interface for the user to query the registry based on controlled keywords. Figure 1a shows the home page of NDG. The search page can be accessed by clicking the ‘Search’ button as shown in the figure. Figure 1b shows the search interface and an example search involving selection of two keywords: ‘Experimental data’ and ‘Alzheimer's disease’ in the ‘DB Category’ field and ‘Clinical conditions’ field, respectively. Each keyword was selected from a popup list of controlled terms (the popup lists are also shown in Figure 1b). The search returns a list of matching databases shown in Figure 1c. Figure 1d shows the detailed description (annotation) of one of the matching databases, ‘Whole Brain Atlas’, which contains normal and abnormal brain images (belonging to the category of experimental data) including those that are related to the study of Alzheimer's disease.
Such a controlled keyword search approach has the potential to allow a more precise and accurate annotation and discovery of neuroscience resources compared to non-specific text-based document indexing and retrieval (e.g. Google).
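To make the idea of a controlled keyword search concrete, the following is a minimal Python sketch, not NDG's actual implementation. The vocabulary contents, the `RegistryEntry` type and the `search_registry` function are assumptions introduced for illustration; only the ‘Whole Brain Atlas’ entry and its two annotations follow the example search described above, and the second entry is made up.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: category -> permitted terms.
CONTROLLED_TERMS = {
    "DB Category": {"Experimental data", "Atlas"},
    "Clinical conditions": {"Alzheimer's disease", "Epilepsy"},
}


@dataclass
class RegistryEntry:
    """One registered database and its controlled-keyword annotations."""
    name: str
    annotations: dict = field(default_factory=dict)  # category -> set of terms


# Illustrative registry; the second entry is invented for contrast.
REGISTRY = [
    RegistryEntry("Whole Brain Atlas", {
        "DB Category": {"Experimental data"},
        "Clinical conditions": {"Alzheimer's disease"},
    }),
    RegistryEntry("Hypothetical Database A", {
        "DB Category": {"Experimental data"},
        "Clinical conditions": {"Epilepsy"},
    }),
]


def search_registry(registry, criteria):
    """Return names of entries annotated with every (category, term) pair.

    Terms outside the controlled vocabulary are rejected, which is what
    separates this kind of search from free-text keyword matching.
    """
    for category, term in criteria.items():
        if term not in CONTROLLED_TERMS.get(category, set()):
            raise ValueError(f"{term!r} is not a controlled term for {category!r}")
    return [
        entry.name
        for entry in registry
        if all(term in entry.annotations.get(category, set())
               for category, term in criteria.items())
    ]


matches = search_registry(REGISTRY, {
    "DB Category": "Experimental data",
    "Clinical conditions": "Alzheimer's disease",
})
print(matches)
```

Because every query term must come from the vocabulary, a misspelled or ambiguous term fails immediately instead of silently returning unrelated hits.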
NDG is one of the first attempts to enable neuroscience database discovery by annotating and indexing databases (at a high level of abstraction) with controlled keywords. Other efforts are underway to use standard terminologies and ontologies to support both ‘shallow’ and ‘deep’ interoperability among neuroscience resources.
One major effort supporting integrative neuroscience research is the Neuroscience Information Framework (NIF), an initiative funded by the National Institutes of Health that is currently being refined by a multi-institutional consortium. The overarching goal of the NIF is to be a one-stop shop for neuroscience. A central element in the NIF is an ontology called ‘NIFSTD’ (which stands for ‘NIF standardized ontology’). This ontology represents an integration/alignment of multiple ontologies (e.g. BirnLex and Gene Ontology) that have been developed for use in various biomedical/neuroscientific domains. It is available in the Web Ontology Language (OWL) format, which is a standard ontology format used by the Semantic Web. The OWL format allows machine-based querying and reasoning. The NIFSTD allows the neuroscientist to perform searches based on neuroscience-related concepts and concept relationships. The NIFSTD uses the OBO Relation Ontology for specifying relationships between entities.
The NIF has three main components: the NIF resource registry, the NIF database mediator and the NIF document archive. All three are still under development and refinement. The NIF resource registry uses the NIFSTD to annotate neuroscience databases and tools in a fashion similar to the NDG. The NIF database mediator uses the NIFSTD to facilitate query mapping between multiple databases. The NIF document archive uses a text-mining tool called Textpresso for storing and accessing the neuroscience literature. Like the NDG, the NIF resource registry contains information about a wide range of different types of databases and other Web-based resources relevant to neuroscience. These resources are indexed at a high level of abstraction using NIFSTD's ontological concepts. Such high-level concepts are used to support ‘shallow’ interoperability. To search and/or integrate the content of the resources themselves, one has to use the NIF database mediator. The mediator allows automated searching of the contents of a set of mediated databases whose internal vocabularies have been mapped to the general and specific concepts in the NIFSTD ontology.
A prototype Web interface called ‘CBQI’ (Concept Based Query Interface) (whose components are shown in Figure 2) was developed in an early version of the NIF to demonstrate how Textpresso and the NIFSTD are used to allow: (i) text documents to be retrieved through the Textpresso text-search engine and (ii) concept-based queries to be issued against multiple databases through the NIF resource registry and the NIF database mediator. In this prototype implementation, a keyword entered by the user is mapped to the corresponding concept in the NIFSTD ontology. Through the NIF database mediator, the NIFSTD concept is mapped to the corresponding items in the underlying databases including SenseLab, CCDB, SumsDB (http://sumsdb.wustl.edu:8081/sums) and NeuroMorpho.org (http://neuromorpho.org/neuroMorpho). With these mappings, queries are issued against the local databases. The results returned from the local databases are displayed separately to the user. The process of mapping all the relevant terms in a database to the equivalent concepts in NIFSTD is a manual task. As a result, the expansion of the NIF database mediator will be slow compared to the population of the other two NIF resources, and the full power of the concept-based approach can only be achieved incrementally over a relatively extended period of time. As described in , this query interface was limited in that concept-to-database mappings were only partially implemented. The same source also points out that some meaningful display of a subset of the NIFSTD ontology should ideally be implemented to better help the user choose the appropriate concept(s) for querying the databases within a particular context.
Figure 3a shows how a search is formulated via the prototype CBQI search interface (form). The interface has four components, reflecting the four major steps involved in formulating a search. The first step is labeled ‘Search for Keywords’. Here the user has entered the text term ‘purkinje’ for this simple example search. After entering this term, the user clicks on the ‘Search for Keywords’ button. This results in a search of the NIFSTD ontology for any concepts (keywords) that match the text word ‘purkinje’. A list of the concepts found is then displayed in the box labeled ‘Select Keywords’. In this case three concepts are displayed. The user may then highlight one or more of those concepts and click ‘Select’. The selected keywords are then copied into the ‘Compose Query’ box. Figure 3b shows how search results are displayed for the NIF Database Mediator. This screen displays different databases that contain potentially relevant data and allows the user to launch a search directly into any one of those databases to retrieve that data. From left to right, we see the names of (i) the database, (ii) a table in that database and (iii) fields within that table which may contain relevant information. Each table may have up to two buttons: one (a ‘Web link out’ button labeled with the name of the database) that links to a specific page for the search concept (in this case ‘Purkinje neuron’) in the resource, and another (‘Retrieve Data’) that retrieves information directly from the resource's back-end database. Note that the search term ‘Purkinje neuron’ has been translated to its corresponding term in each database: e.g. Purkinje neuron (in CCDB, the Cell Centered Database) and Cerebellar purkinje cell (in SenseLab [5, 6]). Database term translations are performed via the NIF Mediator using mappings between those terms and concepts in the NIFSTD ontology.
For each database, the user is given the option of indicating (via checkboxes) which data fields should be retrieved (by default all fields are selected). For example, for the NeuronDB neuronal current table, clicking on the ‘Retrieve Data’ button opens a new (pop-up) screen (Figure 3c) containing data about neuronal currents that have been identified in various compartments of the Purkinje cell. The advantage of this query output format is that the data can be inspected in a generic tabular format, and could for example be copied and pasted into a spreadsheet (or into a local database) for integrated analysis with data from other sources.
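The keyword-to-concept-to-database translation performed in the CBQI walkthrough can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CONCEPT_LABELS`, `TERM_MAP`, `find_concepts` and `build_queries` are hypothetical names, the concept list is invented (the source does not name the three concepts returned for ‘purkinje’), and only the CCDB and SenseLab translations of ‘Purkinje neuron’ come from the text.

```python
# Hypothetical labels standing in for concepts in the ontology.
CONCEPT_LABELS = ["Purkinje neuron", "Purkinje cell layer", "Granule cell"]

# Manually curated concept -> {database: local term} mappings.
# Only the entries for 'Purkinje neuron' are taken from the text.
TERM_MAP = {
    "Purkinje neuron": {
        "CCDB": "Purkinje neuron",
        "SenseLab": "Cerebellar purkinje cell",
    },
}


def find_concepts(text):
    """Step 1: case-insensitive substring match of user text against concept labels."""
    needle = text.casefold()
    return [label for label in CONCEPT_LABELS if needle in label.casefold()]


def build_queries(concept):
    """Step 2: translate a selected concept into one (database, local term) pair per source.

    Concepts with no curated mapping yield no queries, which mirrors the
    limitation that mappings are created by hand and are only partially complete.
    """
    return sorted(TERM_MAP.get(concept, {}).items())


candidates = find_concepts("purkinje")
queries = build_queries("Purkinje neuron")
for database, local_term in queries:
    print(f"{database}: search for {local_term!r}")
```

Keeping the mapping table separate from the lookup code reflects the point made above: the software side is simple, while curating the mappings for each database is the slow, manual part.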
A further search enhancement is illustrated by Entrez Neuron. This is a prototype Web application (http://neuroweb3.med.yale.edu:8080) developed by our group, which provides an intuitive interface for the user to issue concept-based queries against multiple neuroscience ontologies. Entrez Neuron differs from CBQI in a number of ways including the following.
Derived loosely from the approach of Entrez Gene (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene), the prototype Entrez Neuron system supports neuron-centric data integration. Its target users are neuroscientists spanning multiple subdisciplines including neurophysiology, neuroanatomy, neuromolecular biology, neuroimaging, neuropharmacology and neurobiochemistry. It provides an interactive Web interface that allows neuroscientists to retrieve different types of information about neurons across multiple ontologies, which are implemented in OWL and stored in Oracle 11g, a database that supports SPARQL and native OWL inferencing. These ontologies constitute the SenseLab ontology collection, which includes conversions of several SenseLab databases into OWL ontologies (http://www.w3.org/TR/hcls-senselab/) and their links to other biomedical ontologies. Just as genes are one of the fundamental units that biologists use in their research studies, neurons are one of the most important elements that neurobiologists explore. Entrez Neuron provides an interface that allows queries to be issued based on concepts displayed in different hierarchically structured views (or facets) derived from the ontologies. There are currently four facets involving the following concepts: neuron, neuronal property, gene and drug. These concepts and their relationships are depicted in Figure 4. As shown in the figure, a neuronal property is located in a neuronal compartment of a neuron; a gene encodes a molecular component that enables a neuronal property; and a drug binds to a receptor.
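The relationships among the four concepts can be pictured as subject-property-object statements. The sketch below is plain Python, not SPARQL or OWL, and is not how Entrez Neuron is implemented; the predicate names, the sample facts and the helper functions are assumptions chosen for illustration, and the neuronal compartment level is omitted for brevity.

```python
# Illustrative statements as (subject, property, object) triples.
TRIPLES = [
    ("diazepam", "binds_to", "GABA-A receptor"),
    ("GABA-A receptor", "located_in", "CA1 pyramidal neuron"),
    ("hypothetical drug X", "binds_to", "hypothetical receptor Y"),
]


def match(pattern, triples=TRIPLES):
    """Return triples matching a pattern; None acts as a wildcard in any position."""
    return [
        triple for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def drugs_acting_on(neuron):
    """Join two patterns: properties located in the neuron, then drugs binding to them."""
    properties = {subject for subject, _, _ in match((None, "located_in", neuron))}
    return sorted(
        subject
        for subject, _, obj in match((None, "binds_to", None))
        if obj in properties
    )


def describe(triple):
    """Render a triple as a short readable sentence."""
    subject, prop, obj = triple
    return f"{subject} {prop.replace('_', ' ')} {obj}"


print(drugs_acting_on("CA1 pyramidal neuron"))
print(describe(TRIPLES[0]))
```

A query language such as SPARQL expresses the same kind of pattern matching and joining declaratively, and an OWL reasoner can add inferred statements that are not stored explicitly.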
Each of the facets and its associated data ontologies is described below.
These facets allow the user to browse different types of concepts organized in ways that are familiar to neuroscientists. They also represent different levels of granularity. Brain regions constitute the top layer. The next level consists of different types of neurons located in different brain regions. Below the level of neurons, there are cell membrane properties including neurotransmitters, receptors, and channels located in different neuronal compartments. Finally, there is the molecular level involving molecular entities like genes and drug molecules that interact with neuronal cell membrane properties. Such molecular interactions may relate to synaptic activities such as neurotransmission. Such multi-level organization not only allows the user to access information at any given level, but it also permits a drill-down approach to accessing different layers of information systematically.
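The drill-down through levels of granularity can be illustrated with a nested structure. This is a minimal sketch under assumptions: the `HIERARCHY` contents are a tiny illustrative sample (region, neuron, membrane property, drug), and `drill_down` is a hypothetical helper, not part of Entrez Neuron.

```python
# Illustrative levels: brain region -> neuron -> membrane property -> drug.
HIERARCHY = {
    "Hippocampus": {
        "CA1 pyramidal neuron": {
            "GABA-A receptor": {
                "diazepam": {},
            },
        },
        "CA3 pyramidal neuron": {},
    },
}


def drill_down(path, tree=HIERARCHY):
    """Follow a path of names from the top level and list what sits one level below.

    An empty path lists the top level; a name that does not exist at the
    current level raises KeyError.
    """
    node = tree
    for name in path:
        node = node[name]
    return sorted(node)


print(drill_down([]))
print(drill_down(["Hippocampus"]))
print(drill_down(["Hippocampus", "CA1 pyramidal neuron"]))
```

The same function serves both access patterns mentioned above: a user can jump straight to a level by supplying a full path, or descend one step at a time by extending the path.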
These four facets represent those presently being explored on a prototype basis. Further facets can be introduced as the granularity of user demand requires. The Neuron facet for example might be enhanced to include the interregional circuits that form the major pathways and systems of the brain, and the microcircuits that carry out the processing within each region. The Neuron Properties facet could include the levels of integrative activity that occur within the dendritic trees and dendritic branches of a neuron. Both of these facets could retrieve information on physiological recordings and neural correlates of behavior. Current steps in those directions are illustrated in NeuronDB, which provides the identification of neuron properties in relation to canonical compartments of the neuron, and in ModelDB, which provides generation of the physiological recordings by simulations based on those properties. Entrez Neuron can provide a framework for incorporating these and other levels of organization and function into hierarchies, where desired by the corresponding research community.
The facets not only facilitate concept browsing, but they also allow related types of information to be retrieved from different sources. Figure 5 shows the types of information retrieved from NeuronDB, ModelDB and PDSP ontologies. The network graph shown in Figure 5 represents a semantic neighborhood view of the following concepts: Neurons, Neuronal Property and Drug. It consists of the following types of information (with some overlap) obtained from NeuronDB, ModelDB and PDSP:
The Web interface of Entrez Neuron was implemented using Ajax (http://en.wikipedia.org/wiki/AJAX) to provide responsiveness and interactivity. Figure 6 shows the Web-based query interface of Entrez Neuron. The interface consists of three panels: (i) search panel, (ii) facet tree panel and (iii) query results panel. As shown in the figure, the user has chosen three different terms (CA1 pyramidal neuron, GABA-A receptor and diazepam) from their corresponding facets (neuron facet, neuronal property facet and drug facet). For example, the user clicks on the browse button next to the Drug search box for browsing the drug facet tree and selecting a search term from the tree. In this case, the user expanded the tree and selected the term ‘diazepam’. The selected term is automatically entered in the Drug search box. Then the user presses the ‘GO’ button in the search panel to issue the query. The query results are displayed in the query results panel. As shown in the query results, there are two data sources (NeuronDB and PDSP) containing information about the chosen drug, neuron and neuronal property. The query results returned from each data source can be seen by clicking the corresponding tab. The query results are scrollable and presented in text format that is readable to neuroscientists (e.g. ‘Diazapam binds to GABA-A …’ for the NeuronDB tab and ‘CA1 pyramidal neuron with GABA-A receptor in Dam’ for the PDSP tab). The text output corresponds naturally to the semantic web structure (e.g. subject, property, object) shown in Figure 5.
The query interface supports two search types: exact search and fuzzy search. While the former only allows exact matches of the search term, the latter allows the search to be broadened by including matches of the terms that have the same parent as the search term. Such a fuzzy search employs inferencing based on the parent-child relationship. For the example query shown in Figure 6, if the fuzzy search type were chosen, the query results would include matches of ‘CA3 pyramidal neuron’, which has the same parent (‘Hippocampus’) as ‘CA1 pyramidal neuron’. The fuzzy search results are displayed in separate tabs in the query results panel.
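The sibling-based broadening behind the fuzzy search can be sketched as follows. This is an assumption-laden illustration rather than the system's actual inferencing, which the text says relies on the ontology's parent-child relationships: the `PARENT` table and `expand_terms` function are hypothetical, and only the two hippocampal entries come from the example above.

```python
# Child -> parent links; the Cerebellum entry is an illustrative addition.
PARENT = {
    "CA1 pyramidal neuron": "Hippocampus",
    "CA3 pyramidal neuron": "Hippocampus",
    "Purkinje neuron": "Cerebellum",
}


def expand_terms(term, search_type="exact"):
    """Return the list of terms to match for the given search type.

    'exact' matches only the term itself; 'fuzzy' also includes every
    other term that shares the same parent (its siblings).
    """
    if search_type == "exact":
        return [term]
    if search_type != "fuzzy":
        raise ValueError(f"unknown search type: {search_type!r}")
    parent = PARENT.get(term)
    if parent is None:
        return [term]  # no known parent, so nothing to broaden with
    siblings = sorted(
        child for child, p in PARENT.items() if p == parent and child != term
    )
    return [term] + siblings


print(expand_terms("CA1 pyramidal neuron", "exact"))
print(expand_terms("CA1 pyramidal neuron", "fuzzy"))
```

Returning the original term first keeps exact matches separable from broadened ones, in line with the interface showing fuzzy results in their own tabs.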
We have reviewed several approaches to the problem of registering, discovering and integrating neuroscience databases. Domain-independent, text-based search engines such as Google are inadequate for meeting the specific but diverse needs of neuroscientists. More focused search strategies are needed. NDG and NIF are representative approaches to implementing such search strategies. For ontology-based querying and integration, CBQI and Entrez Neuron demonstrate how this can be achieved. They also highlight the importance of an intuitive interface in enabling the neuroscientist to issue complex integrated queries without a steep learning curve.
We have identified several future directions for Entrez Neuron.
National Institutes of Health grants P01 DC04732, R01 DA021253 and U24 NS051869, The Fidelity Foundation.
The authors thank Maryann Martone and Willy WaiHo Wong for providing the OWL version of the Subcellular Anatomy Ontology (SAO) and assisting with its integration into Entrez Neuron. They also thank Melliyal Annamalai and Alan Wu from Oracle Corporation for their technical assistance.
Kei-Hoi Cheung is an associate professor at the Center for Medical Informatics, Yale University School of Medicine.
Ernest Lim is a programmer at the Center for Medical Informatics, Yale University School of Medicine.
Matthias Samwald is a postdoctoral researcher at DERI Galway, Ireland and the Konrad Lorenz Institute for Evolution and Cognition Research, Altenberg, Austria.
Huajun Chen is an associate professor at the College of Computer Science, Zhejiang University, Hangzhou, China.
Luis Marenco is an assistant professor at the Center for Medical Informatics, Yale University School of Medicine.
Matthew E. Holford is a programmer at the Center for Medical Informatics, Yale University School of Medicine and the Department of Biostatistics, Yale University School of Public Health.
Thomas M. Morse is an associate research scientist in the Department of Neurobiology, Yale University School of Medicine.
Pradeep Mutalik is an associate research scientist at the Center for Medical Informatics, Yale University School of Medicine.
Gordon M. Shepherd is a professor in the Department of Neurobiology, Yale University School of Medicine.
Perry L. Miller is a professor and director of the Center for Medical Informatics, Yale University School of Medicine.