Commentaries by Derrfuss and Mar (2009), Nielsen (2009), Hamilton (2009), and Laird and Fox (2009) agree on the need for a comprehensive database of published stereotaxic coordinates but offer diverse views on how best to achieve this objective. Here, I summarize recent enhancements to the SumsDB database that increase its utility and decrease the impediments to data submission, thereby making it attractive as a resource that can approach comprehensive content in a realistic time frame.
In their commentary “Lost in localization: The need for a universal coordinate database”, Derrfuss and Mar (2009) argued cogently for a comprehensive database that would provide efficient access to the hundreds of thousands of stereotaxic coordinates that summarize key experimental findings in an estimated 10,000 neuroimaging studies. They noted that existing coordinate databases contain only a modest fraction of the relevant data and also that none (at that time) was matching the pace at which new coordinate data are being published. The core problems are that submitting coordinate data requires substantial time and effort and that the benefits from submitting such data have not inspired widespread voluntary participation by the neuroimaging community. The keys to alleviating this bottleneck are to reduce the effort entailed and to increase the benefits of data submission. This is precisely the objective of recent improvements to the SumsDB database (http://sumsdb.wustl.edu/sums/).
It is useful to summarize key features of SumsDB and the associated visualization software (Caret and WebCaret), especially since many enhancements were implemented after Derrfuss and Mar submitted their commentary. Features that make SumsDB useful for data mining of coordinates (‘foci’ in our terminology) fall into five main categories.
The ‘Quick-Search’ repository in SumsDB (April, 2009) currently contains ~40,000 foci from ~1,300 studies (Fig. 1A) and supports searches based on many types of metadata. These include:
This enables searches that address questions of the following types:
Each focus is associated with extensive metadata, immediately viewable by clicking on that focus (arrow in Fig. 1E). There are also direct links from each focus to PubMed and to the online article. Thus, while the initial search results typically include many foci that are irrelevant to the primary question posed, information that is close at hand allows screening of extraneous foci and selection of just the relevant foci.
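To make the screening step concrete, here is a minimal Python sketch. It is not SumsDB's actual schema or interface: the `Focus` record, its field names, the `screen` helper, and the sample values (including the placeholder PubMed identifiers) are all hypothetical. It assumes the common stereotaxic convention that negative x values lie in the left hemisphere, and it only illustrates how foci that carry per-focus metadata and a PubMed link can be narrowed from a broad initial search to the relevant subset.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Focus:
    """Hypothetical focus record; field names are illustrative only."""
    x: float
    y: float
    z: float
    study: str
    pubmed_id: str   # placeholder identifiers below, not real PubMed IDs
    task: str
    source: str      # table, figure, or page where the focus is reported

    @property
    def pubmed_url(self) -> str:
        # Link from the focus back to its PubMed record.
        return f"https://pubmed.ncbi.nlm.nih.gov/{self.pubmed_id}/"


def screen(foci: Iterable[Focus], task_keyword: str,
           hemisphere: Optional[str] = None) -> List[Focus]:
    """Keep foci whose task mentions the keyword, optionally in one hemisphere."""
    keyword = task_keyword.lower()
    kept = []
    for focus in foci:
        if keyword not in focus.task.lower():
            continue
        # Assumed convention: x < 0 is left hemisphere, x > 0 is right.
        if hemisphere == "left" and focus.x >= 0:
            continue
        if hemisphere == "right" and focus.x <= 0:
            continue
        kept.append(focus)
    return kept


# Made-up sample values, for illustration only.
sample_foci = [
    Focus(-42, 20, 8, "Study A", "00000001", "face perception", "Table 2"),
    Focus(40, -50, -18, "Study A", "00000001", "face perception", "Table 2"),
    Focus(-30, -60, 45, "Study B", "00000002", "deception", "Table 1"),
]

relevant = screen(sample_foci, "face", hemisphere="right")
for focus in relevant:
    print(focus.study, (focus.x, focus.y, focus.z), focus.source, focus.pubmed_url)
```

Keeping the publication link and the table or figure reference on each record is what lets an irrelevant focus be checked and discarded without leaving the search results.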
SumsDB is linked to online (WebCaret) and offline (Caret) software that enable visualization of search results on a human brain atlas. Important attributes of the atlas and the visualization software include:
Rapid access to the original publication is frequently important, in order to assess the relevance of various foci to whatever question the investigator has in mind or to critically evaluate exactly what experiments were done and how they were analyzed and interpreted. To facilitate this process, SumsDB, WebCaret, and Caret all provide links from specific search results to the corresponding online publications (via PubMed, and also directly when possible). For most foci, the table, figure, or page number is specified, thereby allowing the relevant section of the study to be found quickly.
SumsDB contains a ‘Study Collection Library’, in which each study collection points to a thematically unified group of studies and associated foci. Some study collections represent a formal meta-analysis linked to a published review or research article on that topic, such as ‘deception’ (Christ et al., 2008). Others represent informal, non-comprehensive collections that provide useful entrees to various topics of interest (e.g., ‘face perception’). Study collections can be kept private or made public, and they can be easily modified (e.g., to add newly identified studies related to the study collection topic).
The Neuroscience Information Framework portal (NIF, http://neuinfo.org/) allows efficient searching of a wide range of neuroscience-related resources available on the web. For SumsDB and many other resources, NIF supports ‘deep’ data mining, wherein relevant database contents (not just the home page) can be directly accessed by queries initiated in NIF. For SumsDB, NIF-initiated queries report specific search results and in addition allow users to immediately link out and view the results using WebCaret.
Data submission to SumsDB entails entering three types of data into two curated ‘libraries’.
For any given data entry, the Master Study Library and Foci Library support multiple versions that differ in their metadata content, providing useful flexibility and updating capabilities. The Quick-Search repository used to expedite routine searches is a distillation that includes a single entry for each focus and each study.
Tutorial and instruction documents (accessible via ‘Foci Data Mining’ on the SumsDB home page) show how to enter coordinate-related data into SumsDB. Training takes 5-10 hours, depending on initial familiarity with Caret software. After training, data submission typically takes 30-60 minutes per study. This is only modestly slower than the ~20 minutes needed for the AMAT database with its ‘minimal’ metadata requirements (Hamilton, 2009) and is much faster than the extensive task characterizations required by the BrainMap database (Fox et al., 2005; Laird and Fox, 2009). Thus, SumsDB provides an important middle ground, with large value added for a modest data entry effort.
Data submission to SumsDB offers multiple benefits to the submitter:
Nielsen (2009) has proposed a wiki-based approach to submitting coordinates and metadata. As noted by Hamilton (2009) and Laird and Fox (2009), this approach may encounter difficulties if it lacks an enforced, coherent metadata structure. Indeed, our experience with SumsDB is that a robust and carefully designed infrastructure is necessary for dealing with various technical complexities. These include avoiding unwanted duplication of foci and studies already in the database while allowing multiple versions of foci and studies when they differ meaningfully in metadata content (e.g., in the description of behavioral tasks). Providing for these and other important needs adds a modest overhead to the data submission process, but yields major benefits in making this a user-friendly resource.
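The distinction between an unwanted duplicate and a legitimate new version can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how SumsDB is implemented: the `FocusLibrary` class, the use of (study identifier, coordinates) as the identity key, and the metadata dictionaries are all hypothetical.

```python
class FocusLibrary:
    """Hypothetical store that rejects duplicates but keeps distinct versions."""

    def __init__(self):
        # (study_id, (x, y, z)) -> list of metadata versions, oldest first
        self._versions = {}

    def submit(self, study_id, xyz, metadata):
        key = (study_id, tuple(xyz))
        versions = self._versions.setdefault(key, [])
        if metadata in versions:
            # Same focus with identical metadata: an unwanted duplicate.
            return "duplicate"
        versions.append(dict(metadata))
        # Same focus with different metadata is kept as a further version.
        return "new focus" if len(versions) == 1 else "new version"

    def version_count(self, study_id, xyz):
        return len(self._versions.get((study_id, tuple(xyz)), []))


library = FocusLibrary()
first = library.submit("study-1", (-42, 20, 8), {"task": "deception"})
again = library.submit("study-1", (-42, 20, 8), {"task": "deception"})
revised = library.submit("study-1", (-42, 20, 8),
                         {"task": "instructed deception, yes/no responses"})
print(first, again, revised, library.version_count("study-1", (-42, 20, 8)))
```

A free-form wiki page has no equivalent of the identity key, which is one way to see why an enforced metadata structure matters for this kind of bookkeeping.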
SumsDB libraries have nearly tripled in content over the last 16 months, from ~14,000 foci (~500 studies) in January 2008 to ~40,000 foci (~1,300 studies) in April 2009. This approaches the rate at which new studies reporting coordinates are published and makes SumsDB the fastest-growing of the existing coordinate databases. We anticipate being able to sustain this pace through ongoing curation efforts in the Van Essen lab. However, to accelerate the process and substantially reduce the large backlog, it is vital to enlist volunteers from the neuroimaging community. An attractive and feasible model is for one or two individuals (students, postdocs, or knowledgeable technicians) from each of many laboratories to enter data published by their own laboratory plus selected topics related to that lab's research interests. For example, if 50 volunteers each added ~20 studies per year (15-30 hours per volunteer, including training), the current rate of submission would approximately double, and about half of the relevant literature would be covered in ~5 years.
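The arithmetic behind this scenario can be checked with a short Python sketch. All inputs are figures quoted in the text; the variable names are illustrative, and the annualized growth rate is a simple linear estimate over the 16-month interval, which is an assumption rather than a reported statistic.

```python
# Study counts and interval quoted in the text.
studies_jan_2008 = 500
studies_apr_2009 = 1300
interval_months = 16

# Linear annualization (an assumption; growth need not have been uniform).
current_rate = (studies_apr_2009 - studies_jan_2008) / interval_months * 12

# Volunteer scenario from the text.
volunteers = 50
studies_per_volunteer = 20
volunteer_rate = volunteers * studies_per_volunteer

# Hours per volunteer: 30-60 minutes per study plus 5-10 hours of training.
hours_low = studies_per_volunteer * 30 / 60 + 5
hours_high = studies_per_volunteer * 60 / 60 + 10

print(f"current rate: ~{current_rate:.0f} studies/year")
print(f"volunteer contribution: {volunteer_rate} studies/year")
print(f"hours per volunteer, including training: {hours_low:.0f}-{hours_high:.0f}")
```

Under these inputs the volunteer contribution (1,000 studies per year) exceeds the annualized current rate (about 600 studies per year), so the combined rate would be at least double the current one. The 15-30 hour estimate per volunteer follows directly from the per-study and training times.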
Psychology and neuroscience courses offer another way to promote data submission. For example, a classroom project to explore a specific aspect of brain function might include analysis of existing studies in SumsDB plus addition of relevant studies from the literature that are not yet in SumsDB. Bearing in mind that our central curation process prevents flawed or invalid entries from becoming public, such efforts can be undertaken by graduate students or even undergraduates in a supervised instructional setting.
Datasets underlying published meta-analyses constitute a particularly attractive source for database submissions. A PubMed search for ‘neuroimaging AND meta-analysis’ reports 180 meta-analyses, half of which were published since 2006. Most of these meta-analyses (based on inspection of the 20 most recent ones) involved extraction of coordinate data directly from the literature, not from any database. Instructions for handling existing meta-analyses are included in the documentation for submitting coordinate-related data into SumsDB, so as to capitalize on the work already done in extracting coordinate data and to generate a study collection that can be linked to the published meta-analysis.
Several developments could further accelerate the pace at which coordinate data are made accessible and useful to the community.
Greater sharing of data across existing coordinate databases would reduce duplication of effort. Already, SumsDB contains over 5,000 foci (228 studies) extracted from the AMAT database (with permission from and acknowledgment to A. Hamilton); similarly, AMAT includes many entries initially entered in the BREDE database (Nielsen, 2003). Data from the SumsDB Foci Library and Master Study Library are freely available for data mining or incorporation into other databases, with the expectation that usage of SumsDB will be appropriately acknowledged. Open sharing of data would allow each database to capitalize on its unique capabilities. The two largest databases (BrainMap and SumsDB) each have extensive visualization and analysis capabilities that differ greatly, making these resources more complementary than competing. Volunteers would presumably be more willing to contribute coordinate data to their preferred database if they anticipate that the data will soon populate other databases and thus be of broader utility. SumsDB allows credit to be allocated to the database of origin as well as to individual contributors, thus sharing the credit appropriately. Because databases differ in data format and metadata content, sharing can be expedited by providing a schema that characterizes the database structure, as we learned through the process of federating SumsDB with NIF. More generally, the recommendation that the neuroimaging community embrace open knowledge for data mining (Nielsen, 2009) is timely and meritorious.
Greater standardization of how results are tabulated and reported in journal articles could increase the efficiency of extracting coordinate data and metadata. This can begin with modest, incremental steps, though the ultimate objective should be to enable automatic or semi-automatic data extraction. However, without buy-in from the neuroimaging community, journal editors are unlikely to impose steps that might be even modestly burdensome to authors. On the other hand, systematization of how coordinate data are reported and described would benefit authors and journal readers alike (Poldrack et al., 2008; see the ‘Figures and tables should stand on their own’ section). In short, the potential benefits of data standardization are large, but the timing and the process for building community support need careful attention. Organizations such as the Organization for Human Brain Mapping, the Society for Neuroscience, and the International Neuroinformatics Coordinating Facility could help catalyze this process in response to inputs from their membership.
The conciseness of x,y,z coordinate data is both a strength and a limitation of this data format. Obviously, much greater information about complex spatial and temporal patterns of brain activation is available in the volume and surface data from which foci are extracted. In an explicit comparison of approaches, Salimi-Khorshidi et al. (2009) demonstrated that image-based meta-analyses (IBMA) are more sensitive than coordinate-based meta-analyses (CBMA) in revealing patterns of activation that are consistent across studies. For IBMA to become widely used for meta-analyses, a searchable database repository for image (volume) data from a large number of studies would be extremely useful. SumsDB can already handle volume as well as surface data and thus provides an existing option for this purpose. Indeed, a growing number of studies directly link to datasets in SumsDB, providing access to the underlying data and enabling WebCaret visualization of scenes that replicate the published figures (see ‘Publications with links to SumsDB’ on the home page). To facilitate efficient data mining, we intend to establish a Surface Library and a Volume Library in SumsDB, analogous to the existing Foci Library, that will house curated sets of published surface and volume data and associated metadata.
In conclusion, it is useful to place this and other emerging neuroimaging-related efforts in neuroinformatics into perspective relative to the burgeoning field of bioinformatics. Powerful data mining tools for analyzing genome and protein sequencing data have dramatically advanced our understanding of genetics and molecular biology over the past two decades. Progress has been slower in making neuroinformatics approaches useful to the neuroscience community for a variety of technical and sociological reasons (Gardner et al., 2008). Coordinate data from human neuroimaging studies represent ‘low-hanging fruit’, now ripe for exploitation using increasingly powerful neuroinformatics tools that can accelerate progress in elucidating brain function in health and disease.
I thank Erin Reid for yeoman's work in data entry and curation, John Harwell and Ping Gu for excellent software design and development, Jessica Cohen, Jeff Phillips, Shawn Christ, and Donna Dierker for comments, and Susan Danker for help in manuscript preparation. Supported by NIH Grant R01-MH-60974, funded by the National Institute of Mental Health, the National Institute for Biomedical Imaging and Bioengineering, and the National Science Foundation, and by the Neuroscience Information Framework NIH Subcontract.