Numerous psychiatric disorders can affect the development of children. Each of these disorders manifests as a distinct pattern of clinical, behavioral, etiological, neuroanatomic and neurofunctional characteristics that challenge the management of the individual patient, as well as the development of successful intervention and prevention strategies. In the area of neuroimaging, a substantial number of studies have been performed to date; while much has been learned from this investment, it represents only the tip of the iceberg of the information that can be gleaned from the data. Unfortunately, most of this additional, untapped information is lost because the data are not shared as effectively as current technologies allow.
The Child and Adolescent NeuroDevelopment Initiative (CANDI) at the University of Massachusetts Medical School is making available a series of structural brain images, together with their anatomic segmentations and demographic data, as a coordinated set of related morphometric resources. The initial data set is a release of 103 subjects (T1-weighted MRI scans and anatomic segmentations) that comprised the neuroanatomic data published in 2008 in an article by Frazier et al.1 The subjects include 57 males and 46 females, aged 4–17, and come from four diagnostic groups: Healthy Controls (N=29), Schizophrenia Spectrum (N=20), Bipolar Disorder with Psychosis (N=19), and Bipolar Disorder without Psychosis (N=35).
Both the McLean Hospital and Cambridge Health Alliance Institutional Review Boards approved the original data acquisition research protocols. All subjects signed assent forms, and their parents or legal guardians signed informed consent forms. The University of Massachusetts Medical School Institutional Review Board approved the data sharing protocol. Images were acquired between 1996 and 2006 at the McLean Hospital Brain Imaging Center on a 1.5 Tesla General Electric Signa scanner. Structural imaging was performed using a three-dimensional inversion recovery-prepared spoiled gradient recalled echo sequence in the coronal plane with 124 slices of 1.5 mm thickness, repetition time (TR) 10 ms, echo time (TE) 3 ms, flip angle 25°, field of view 24 cm, acquisition matrix 256×192 and 2 excitations.2 All scans were reviewed by a clinical neuroradiologist to rule out gross pathology. The released MR images have undergone analysis at the Center for Morphometric Analysis (CMA) at the Massachusetts General Hospital, which includes preprocessing (‘positional normalization’ to put the image into the standard orientation of the Talairach coordinate space,3 and bias field correction4) and ‘general segmentation’ following the CMA segmentation protocol.5 Segmented regions include Cerebral Cortex and White Matter, Cerebellum Cortex and White Matter, Lateral Ventricle, Thalamus, Ventral Diencephalon, Caudate, Putamen, Pallidum, Accumbens, Hippocampus and Amygdala, bilaterally, as well as Brain Stem and the 3rd and 4th ventricles. Basic demographic details for each subject include diagnosis, age, gender, and handedness (Fig. 1).
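As a quick check on the acquisition geometry described above, the nominal slab coverage and in-plane voxel size follow directly from the stated parameters. The sketch below assumes that the 24 cm field of view applies to both in-plane axes and is divided evenly by the acquisition matrix; the variable names are illustrative only, and the released images may have different voxel dimensions after reconstruction and positional normalization.

```python
# Nominal geometry implied by the stated acquisition parameters.
# Assumption: the 24 cm field of view applies to both in-plane axes.
N_SLICES = 124
SLICE_THICKNESS_MM = 1.5
FOV_MM = 240.0
MATRIX = (256, 192)  # acquisition matrix

# Extent covered along the slice direction.
slab_coverage_mm = N_SLICES * SLICE_THICKNESS_MM

# Nominal acquired voxel size along each in-plane axis.
voxel_mm = tuple(FOV_MM / n for n in MATRIX)

print(f"slab coverage: {slab_coverage_mm} mm")
print(f"nominal in-plane voxel size: {voxel_mm[0]} x {voxel_mm[1]} mm")
```

These are nominal acquisition values; the image headers in the release are the authoritative source for actual voxel dimensions.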
The release is provided under the Creative Commons Attribution license,6 and is structured as four bundles (tar) of imaging data, one for each diagnostic group. Within each bundle, there is a separate directory for each subject that contains that subject's data. Release version V1.0 includes only the imaging data and basic demographics and is accessible with no limitations. Version V1.1 adds the segmentation data as an indexed label file in register with the MR image for each subject; it requires provision of a contact email address (via the NITRC registration process) and a ‘clickthrough’ acceptance of the licensing terms. We do not regard these extra access ‘hoops’ as burdensome, and they serve a necessary purpose. While we hope that the data provided are free of errors, experience has taught us that some unforeseen errors may be present. If errors are found, we will correct them and release a new version of the data, and we are obligated to inform all prior users of the errors and corrections in order to limit their impact.
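The bundle layout described above (one tar file per diagnostic group, one directory per subject) can be inspected with a few lines of standard-library Python. This is a minimal sketch under an assumption: every data file sits directly inside its subject's directory, so the name of the parent directory identifies the subject. The function name is hypothetical, and the actual directory and file names in the release may differ.

```python
import posixpath
import tarfile
from collections import defaultdict


def files_by_subject(tar):
    """Map each subject directory name to the file names it contains.

    Assumption: every data file sits directly inside its subject's
    directory, so the parent directory name identifies the subject.
    """
    subjects = defaultdict(list)
    for member in tar.getmembers():
        if member.isfile():
            parent, name = posixpath.split(member.name)
            subjects[posixpath.basename(parent)].append(name)
    return {subject: sorted(names) for subject, names in subjects.items()}


if __name__ == "__main__":
    import sys

    # Pass the path of each downloaded bundle on the command line.
    for path in sys.argv[1:]:
        with tarfile.open(path) as tar:
            for subject, names in sorted(files_by_subject(tar).items()):
                print(subject, names)
```

Listing the contents before extraction is a cheap way to confirm that a download is complete and that each subject has the expected files.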
In addition to the image releases, the CANDIShare portal begins to establish a more richly interconnected set of related informatics resources. The first example is the linkage between each data release and its representation in the Internet Brain Volume Database (IBVD).7 The IBVD is a web-based database of published neuroanatomic volumetric observations of the brain. Because the images and segmentations in this data release support the volumetric observations in a specific publication, this linkage completes the connection between the high-level, group volumetric observation and the detailed individual data that support it. Because the IBVD is also interoperable with PubMed via the ‘Link-Out’ function, a PubMed user can get from the publication listing at PubMed, to the summary volumetric observations reported in the IBVD, to the raw data that support the observation in the CANDIShare data release. Each of the diagnostic group bundles is linked to its respective group page in the IBVD via the link exposed on the download page of the NITRC project.
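To illustrate how an indexed label file relates to the volumetric observations summarized in the IBVD, note that a structure's volume is its voxel count multiplied by the volume of one voxel. The following is a minimal sketch and not the CMA procedure: it assumes the label image has already been read into a flat sequence of integers, and the function name, label values and voxel dimensions are hypothetical placeholders. The real values would come from the release's label index and image headers.

```python
from collections import Counter


def label_volumes_mm3(labels, voxel_dims_mm):
    """Return {label: volume in cubic millimetres} for a labelled image.

    labels: iterable of integer label values, one per voxel.
    voxel_dims_mm: (dx, dy, dz) voxel edge lengths in millimetres.
    """
    dx, dy, dz = voxel_dims_mm
    voxel_volume = dx * dy * dz
    counts = Counter(labels)
    return {label: n * voxel_volume for label, n in counts.items()}


# Toy example with hypothetical label values (0 = background).
toy_labels = [0, 0, 7, 7, 7, 9]
volumes = label_volumes_mm3(toy_labels, (1.0, 1.0, 1.5))
print(volumes)
```

Group-level figures of the kind reported in a publication would then be summaries of these per-subject volumes across each diagnostic group.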
This image and segmentation data release for a specific publication is the first in a series of releases representing the pediatric brain in health and disease, drawn from data collected in our lab over the past 15 years. Data from additional publications will be added, and more resource interconnectivity will be included to maximize the utility of the released data with other classes of available data. This release of information is designed to go well beyond merely ‘making the images available’: each image is associated with substantial analytic results, many of which have been utilized in the preparation of various publications and comparisons. Moreover, these data will be most effectively shared with the research community when shared in a way that preserves the linkages between the images, the resultant analytic data and meta-data, and their relationships to other public sources of related information. The utility of this shared data includes potentially expanding the number of subjects available for other studies (when the imaging protocols are comparable), facilitating additional research findings in this specific data set through reuse by other laboratories, and providing additional community access to accurate, manually labeled datasets to promote development and testing of data analysis software. Access to data of this sort has already been valuable in the development of numerous analysis techniques.8 In short, this represents a ‘Knowledge Management’ environment that will facilitate traversal of these data and linkages.