The increasing ability to obtain digital information in medical and biological neuroimaging research has led to a vast increase in scientific data across a variety of spatial and temporal scales (Van Essen 2002). With each new technological advance, neuroscientific data may be collected at finer resolution per unit time and yield more detailed forms of biologically relevant information (Bandettini 2007). Occurring alongside advances in imaging technology has been the advancement of the World Wide Web, whose original purpose was to ease data exchange between collaborating scientists but which now links people, computers, and information on an unprecedented global scale. From this co-evolution of neuroscientific and computer network technology has come an increased expectation that primary scientific data be openly shared via readily accessible databases (Koslow 2000). One particularly notable example comes from the domain of human brain imaging, where large, three- and four-dimensional volumes of structural and functional brain data are obtained using high-resolution magnetic resonance scanners. In any individual research publication, several tens of gigabytes of data may be represented, collected across normal and patient populations. However, the raw and pre-processed versions of these data are very often unavailable to researchers outside the team that collected them. Concerns over sharing the primary data may exist that prohibit their availability (Koslow 2002). What is available may be only lists of local “hot spots” of activity referenced to a triplet of brain atlas spatial coordinates, perhaps tables of regional volumetric results, other summary statistics, and a few highly selective graphical renderings. Study meta-data (the data that describe how the data were obtained: acquisition parameters, experimental design, etc.) may be incomplete and limit the scope of future use. The raw and preprocessed versions of those data may be lost should the post-doc who did the work leave the lab, if the data are archived onto media that soon become outdated, or following a computer mishap.
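As a concrete illustration of the kind of study meta-data whose absence limits future use, the sketch below defines a hypothetical minimal meta-data record and a completeness check. The field names are our own illustrative choices, not any archive's actual schema.

```python
# Illustrative sketch only: a minimal study meta-data record and a
# completeness check. Field names are hypothetical, not a standard schema.

REQUIRED_FIELDS = {
    "scanner_model",        # MR scanner used for acquisition
    "field_strength_tesla",
    "sequence",             # pulse sequence name
    "voxel_size_mm",
    "tr_ms", "te_ms",       # repetition / echo times
    "experimental_design",  # task or resting-state description
}

def missing_metadata(record: dict) -> set:
    """Return the required fields absent from a study meta-data record."""
    return REQUIRED_FIELDS - record.keys()

# A partially documented study, as often deposited in practice.
study = {
    "scanner_model": "3T scanner",
    "field_strength_tesla": 3.0,
    "sequence": "EPI",
    "voxel_size_mm": [3.0, 3.0, 3.0],
    "tr_ms": 2000,
}

print(sorted(missing_metadata(study)))  # → ['experimental_design', 'te_ms']
```

A curated archive can run such a check at deposit time, flagging incomplete records before the people who could fill the gaps have moved on.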
If, on the other hand, the data from published as well as ongoing studies can be archived using a reliable and well-maintained framework, then the utility of the data can extend beyond the intent of their original collection (Van Horn and Gazzaniga 2005). Datasets from diverse subjects or patient groups can be mined to examine patterns that would otherwise go unseen in any individual investigation, and combining datasets can increase statistical power to observe more subtle effects. Using centralized (Van Horn, Woodward et al. 2002) or distributed databasing approaches (Grethe, Baru et al. 2005), research consortia can better manage work performed across distant research centers. Importantly, through the use of databases, federally funded collections of neuroimaging data can reach the widest number of researchers who can turn those data into new knowledge, thereby maximizing their utility and justifying the cost of their collection.
With the rapid advances being made in neuroimaging technology, data acquisition, and computer networks, the successful organization and management of neuroimaging data has become more important than ever before (Poliakov, Hertzenberg et al. 2007; Hasson, Jeremy et al. 2008). Technological advances in computer network throughput, disk storage, and archival capabilities can be brought to bear so that databases can truly serve as a resource for exchange and future use in computational anatomy and modeling. However, databases still suffer from some reluctance on the part of the community, who harbor doubts about their trustworthiness, the difficulties associated with sharing, and how their data will be used by others.
Neuroimage Data Flow via the LONI Imaging Data Archive (IDA)
During the early years of this decade, considerable attention was given to neuroimaging databases by the Organization for Human Brain Mapping (OHBM) (Governing Council of the Organization for Human Brain Mapping 2001), which expressed concern about the quality of brain imaging data being deposited into such archives, how such data might be re-used, and the potential for their being represented in new publications. The question of data ownership, in particular, was a primary concern in initial attempts to archive data (Editorial 2000). A recent data ownership controversy (Abbott 2008) has highlighted anew the still tenuous nature of data ownership, re-use, and research ethical standards, and the pivotal role that peer-reviewed journals play in this process (Fox, Bullmore et al. 2008). Disagreements concerning appropriate data re-use and new publication also affect the users of neuroscience data archives and how researchers might independently draw from archives and publish results. While some might view the threat of similar disputes as an argument against data sharing or large-scale archiving, we believe that this need not be the case and that open access to primary neuroscience data through curated archives can enhance collaborations, not hinder them. Leading scientific organizations, working closely with government agencies and journal publishers, are poised to enact policies that promote the use of databases while remaining sensitive to intellectual priority and research ethics. There are many benefits to databasing neuroimaging data, and it is helpful to review these and how they contribute to the health of the field and encourage new thinking.
In this commentary we examine the various roles that neuroimaging databases play in scientific data sharing and data re-use, and consider some of the characteristics of trusted data archives. A spectrum of database models has been proposed, ranging from simple FTP sites to fully curated efforts containing data from published studies. We note the importance of thorough data management and organization. We discuss population-level brain atlases as one natural outcome of databases, essential for understanding normal and abnormal brain form and function. We detail our own experiences developing databases and give examples of their successful use in several large-scale neuroimaging initiatives. The processing and examination of datasets from multiple subjects necessitates careful workflow design, optimization, and provenance, with a view toward promoting independent re-analysis and study replication. We observe that, among other metrics, databases are only as good as how they are used, and their effectiveness in generating new science and contributions to education are important benchmarks of their success. The data present in neuroimaging archives also form a basis for content-driven comparison, representing new and interesting computational challenges. As many of these resources are of immeasurable value to neuroscience, their long-term sustainability is imperative. Finally, we discuss the lessons that we, and the community, have learned in the creation and maintenance of these essential neuroscientific resources. We believe that these aspects strongly favor scientific organizations, such as the OHBM, re-examining the role of neuroimaging databases and their use in promoting a healthy research enterprise.
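To make the notion of workflow provenance concrete, the sketch below (our illustration, not the machinery of any particular archive) builds a minimal provenance record for a single processing step. The step, tool, and parameter names are hypothetical; the point is that capturing tool version, parameters, and an input digest is what lets a later researcher replicate or audit the step.

```python
# Illustrative sketch of a provenance record for one workflow step.
# Tool and step names are hypothetical examples, not real software.
import datetime
import hashlib
import json

def provenance_entry(step: str, tool: str, version: str,
                     params: dict, input_files: list) -> dict:
    """Build a minimal provenance record for a processing step."""
    return {
        "step": step,
        "tool": tool,
        "tool_version": version,
        "parameters": params,
        # For brevity we digest the input *names*; a real system
        # would digest the file contents themselves.
        "input_digest": hashlib.sha256(
            json.dumps(sorted(input_files)).encode()).hexdigest(),
        "timestamp": datetime.datetime.now(
            datetime.timezone.utc).isoformat(),
    }

entry = provenance_entry(
    step="skull_strip",
    tool="example_brain_tool",   # hypothetical tool name
    version="1.0",
    params={"threshold": 0.5},
    input_files=["sub01_T1w.nii"],
)
print(entry["step"], entry["input_digest"][:8])
```

Logging one such record per step, per subject, yields a machine-readable audit trail from raw data to published result, which is precisely what independent re-analysis and study replication require.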