In normal humans, relationships between cognitive test performance and cortical structure have received little study, in part, because of the paucity of tools for measuring cortical structure. Computational morphometric methods have recently been developed that enable the measurement of cortical thickness from MRI data, but little data exist on their reliability. We undertook this study to evaluate the reliability of an automated cortical thickness measurement method to detect correlates of interest between thickness and cognitive task performance. Fifteen healthy older participants were scanned four times at two-week intervals on three different scanner platforms. The four MRI datasets were initially treated independently to investigate the reliability of the spatial localization of findings from exploratory whole-cortex analyses of cortical thickness-cognitive performance correlates. Next, the first dataset was used to define cortical ROIs based on the exploratory results that were then applied to the remaining three datasets to determine whether the relationships between cognitive performance and regional cortical thickness were comparable across different scanner platforms and field strengths. Verbal memory performance was associated with medial temporal cortical thickness, while visuomotor speed/set-shifting was associated with lateral parietal cortical thickness. These effects were highly reliable—in terms of both spatial localization and magnitude of absolute cortical thickness measurements—across the four scan sessions. Brain-behavior relationships between regional cortical thickness and cognitive task performance can be reliably identified using an automated data analysis system, suggesting that these measures may be useful as imaging biomarkers of disease or performance ability in multi-center studies in which MRI data are pooled.
Remarkably specific cognitive deficits can be present in patients with focal cortical lesions (Caramazza and Hillis, 1991; Damasio et al., 1996; Stuss et al., 2001a; Rosenbaum et al., 2005). Yet the relationships, in normal persons, between individual differences in performance on neuropsychologic tests and individual variability in measures of cortical structure have received relatively little study. The lack of data on this fundamental brain-behavior correlate is in part a reflection of the paucity of tools with which to perform such measurements. Manual operator-derived region-of-interest (ROI) measurements from magnetic resonance imaging (MRI) data have demonstrated, for example, that hippocampal and entorhinal volume in patients with Alzheimer’s disease (AD) correlates with performance on particular neuropsychological tests of memory (de Toledo-Morrell et al., 2000; De Toledo-Morrell et al., 2000; Kramer et al., 2005). However, manual ROI-based approaches are limited in that they are quite laborious, and thus are typically restricted to a few brain regions. Furthermore, this procedure enables the measurement of only cortical volume, not cortical thickness, because the cortical thickness is a property that can only be properly measured if the location and orientation of both the gray/white and pial cortical surfaces are known. In addition, cortical volumetric approaches typically require an a priori definition of the ROIs, limiting the possibility of exploratory analyses of other cortical regions or of subregions within the ROIs.
Voxel-based methods have been developed that enable the exploratory analysis of MRI data with respect to clinical diagnosis, cognitive performance, or other variables, and have demonstrated relationships between the grey matter density of particular brain regions and cognitive performance measures in patients with traumatic brain injury, for example (Gale et al., 2005). Yet given the range of normal individual variance in cortical morphologic features, such as gyral and sulcal patterns, the use of voxel-based tools that transform and smooth individual MRI data into common coordinate spaces may remove the precise features of interest for studies investigating within-group correlations between cortical morphometry and cognitive performance, and reduce the ability to specifically localize findings. Furthermore, the measure typically analyzed by voxel-based techniques—“grey matter density” (Thompson et al., 2001)—is difficult to interpret quantitatively with respect to a particular morphometric property of cerebral tissue (i.e., volume, thickness, surface area).
To enable the study of morphometric properties of the human cerebral cortex and their relationship to cognitive function, disease state, or other behavioral variables, automated methods have been developed for segmenting and measuring the cerebral cortex from MRI data (Dale et al., 1999; Joshi et al., 1999; MacDonald et al., 1999; Xu et al., 1999; Zeng et al., 1999; van Essen et al., 2001; Shattuck and Leahy, 2002; Sowell et al., 2003; Barta et al., 2005; Han et al., 2005). Using such tools, relationships have been identified between regional cortical thickness and intelligence quotient (Narr et al., 2006; Shaw et al., 2006), personality measures (Wright et al., 2006; Wright et al., 2007), and memory (Walhovd et al., 2006). Although the validation of MRI-derived cortical thickness measurements has been performed against manual measurements derived from both in vivo and post-mortem MRI brain scans (Rosas et al., 2002b; Kuperberg et al., 2003a; Salat et al., 2004), the reliability of measures of this fundamental morphometric property of the brain has received relatively little systematic investigation (Fischl and Dale, 2000; Rosas et al., 2002a; Kuperberg et al., 2003b; Sowell et al., 2004; Lerch and Evans, 2005; Han et al., 2006). Most studies have investigated reliability by comparing thickness measurements across different subjects, or by performing repeated scans on a few subjects acquired within the same scan session or within very short scan intervals (e.g., the subjects were removed from the scanner and then scanned again in 5 minutes (Sowell et al., 2004)). This approach may greatly underestimate the sources of variability relevant for longitudinal studies (e.g., subject-related factors, such as hydration status, or instrument-related factors, such as scanner drift). 
Furthermore, the level of reliability that is needed for the detection of effects of interest is not clear, as none of these studies have evaluated the reliability of cortical morphometric methods for the detection of specific effects of interest, such as correlation of thickness with behavioral measures or group differences in thickness between normal and diseased populations. Thus, the feasibility of the pooling of MRI data across multiple centers for the study of cortical thickness correlates of behavioral performance, disease state, or other purposes is unknown.
We undertook this study to extend our previous investigation of the reliability of a cortical thickness measurement method both within and across different scanner platforms and field strengths (Han et al., 2006). In the previous investigation, we analyzed the test-retest and cross-platform and cross-field strength reliability of cortical thickness measurements across the entire cerebral cortex.
The goal of the present analysis, which makes use of the same MRI dataset used previously (Han et al., 2006), was to investigate the reliability of detection of correlates of interest between cortical thickness and cognitive task performance. Fifteen healthy older subjects were scanned four times at two-week intervals on three different scanner platforms (test scan on Siemens 1.5T, re-test scan on Siemens 1.5T, cross-site/manufacturer scan on GE 1.5T located at a different clinical site, and cross-field-strength scan on Siemens 3T). Older participants were studied so that anatomical variability related to atrophy and age-related signal changes was represented. The two-week interval was chosen so that elements of variability related to subject hydration status and minor instrument drift would be included, which may be artificially minimized when the test-retest interval is several minutes to ~1 day. We initially treated each of the four MRI datasets independently to investigate the reliability of the spatial localization of findings from exploratory whole-cortex analyses of cortical thickness-cognitive performance correlates. Next, we used the first of the four datasets to define cortical ROIs based on the results of the exploratory analysis that could then be applied to the remaining three datasets, in an unbiased manner, to determine how well the magnitude measures of absolute cortical thickness within these regions corresponded across the datasets, and whether the relationships (slopes of regression lines) between cognitive performance and regional cortical thickness were comparable across different scanner platforms and field strengths.
Healthy individuals older than 65 were recruited from the Boston metropolitan community via newspaper advertisements specifically for this reliability study. Respondents were pre-screened extensively using a standardized telephone interview (Go et al., 1997), and excluded based on evidence of significant medical, psychiatric, or neurologic conditions, the use of psychoactive medications, contraindications to MRI, or evidence of memory impairment. Respondents who were not excluded based on pre-screening were scheduled for a screening visit. All participants provided informed consent in accordance with the Human Research Committee of Massachusetts General Hospital.
At the screening visit, following informed consent, all participants underwent a standard medical, neurologic, and psychiatric evaluation by a board-certified behavioral neurologist (B.C.D.), as well as routine laboratory screening. Subjects were excluded based on the following criteria: evidence of significant medical disease (e.g., cancer, cardiovascular disease, diabetes, lung or kidney disease), neurologic disease (e.g., epilepsy, significant head trauma), psychiatric illness (e.g., depression, substance abuse); evidence of cognitive impairment by clinical evaluation; or contraindication to MRI. For subjects who were not excluded, a neuropsychologic test battery was administered. This battery included the following standard neuropsychological tests: the California Verbal Learning Test (CVLT) (Delis et al., 1987), the Trail Making Test (TMT) (Reitan, 1958), and the Mini-Mental State Examination. These tests were selected because they have demonstrated sensitivity to the cognitive impairment in both prodromal and established AD (Albert et al., 2001).
Each subject underwent four MRI scan sessions at approximately two-week intervals, including two sessions on a Siemens 1.5T Sonata scanner (Siemens AG, Erlangen, Germany), one on a Siemens 3T Trio scanner, and one on a GE 1.5T Signa scanner (General Electric, Milwaukee, WI). The Siemens scanners were located at the Martinos Center for Biomedical Imaging at Massachusetts General Hospital, and the GE scanner was located at the Brigham & Women’s Hospital. The first scan session followed the screening visit (during which neuropsychological testing was performed) by about 2-4 weeks.
In each Siemens scan session, the acquisition included two MPRAGE volumes (190 Hz/pixel, flip angle = 7°; 1.5T: TR/TE/TI = 2.73 s/3.44 ms/1 s; 3T: TR/TE/TI = 2.53 s/3.25 ms/1.1 s). In each GE session, a custom MPRAGE sequence was programmed with parameters as similar as possible, and two volumes were acquired. The total scanning time for the MPRAGE sequences was about 8 minutes for each acquisition. All scans were 3D sagittal acquisitions with 128 contiguous slices (imaging matrix = 256 x 192, in-plane resolution = 1 mm, and slice thickness = 1.33 mm). In each Siemens scan session, the acquisitions were automatically aligned to a standardized anatomical atlas to ensure consistent slice prescription across scans (van der Kouwe et al., 2005; Benner et al., 2006).
Reconstruction of the cortical surfaces and measurement of cortical thickness were performed using the FreeSurfer toolkit (version 2.2, which is freely available to the research community via the internet at http://surfer.nmr.mgh.harvard.edu/). This suite of methods was initially proposed in the 1990s (Dale et al., 1999; Fischl et al., 1999a), and has undergone several important improvements over the years (Fischl et al., 1999b; Fischl et al., 2001; Fischl et al., 2002a; Segonne et al., 2004). With these updates, the current method is fully automated, and the major steps are outlined below and detailed previously (Han et al., 2006). For this study, each dataset was processed independently without use of the longitudinal methods outlined in Han et al. (2006).
The preprocessing step performs motion correction and averaging of the two acquisitions into one T1 volume, resampling to 1mm isotropic, spatial normalization, intensity normalization, and skull stripping. The white matter (WM) is then segmented and subcortical grey matter and ventricles are filled following subcortical segmentation. Once the two WM volumes (one for each hemisphere) have been generated, a surface tessellation is constructed for each hemisphere. Then a surface-based automatic topology correction (Segonne et al., 2004) is performed to remove topological “defects”, also known as handles or holes (Fischl et al., 2001), from the initial surface tessellation. After the topology correction, a deformable model method is applied to refine the surface localization to obtain a subvoxel-accurate representation of the gray/white boundary and then further deform it to find the pial surface (Dale et al., 1999). The surface deformation involves a multi-scale and locally adaptive estimation of the MRI values at the gray/white and the pial surfaces. The deformation is accomplished by minimizing a constrained energy functional, with the constraint that the surface be smooth at the spatial scale of a few millimeters. In addition, an absolute constraint is placed on the surface during the deformation that maintains the natural topology of the cortical surface.
The reconstruction of the white and pial surfaces is sufficient for the goal of estimating cortical thickness. For visualization purposes and to facilitate the comparison of thickness results across different subjects, it is necessary to obtain an inflated model of the cortex and establish a common surface-based coordinate system. Both a partially inflated representation and a spherical map are generated from the gray/white surface for each hemisphere (Fischl et al., 1999a). The inflated surface allows easy visualization of cortical localization. The data were smoothed on the surface using an iterative nearest-neighbor averaging procedure. One hundred iterations were applied, which is equivalent to applying a 2-dimensional Gaussian smoothing kernel along the cortical surface with a full-width/half-maximum of 18.4 mm. The spherical map of each subject for each hemisphere is further morphed to a common spherical atlas using a nonlinear surface registration procedure (Fischl et al., 1999b), which allows high-resolution, surface-based averaging and comparison of cortical measurements across different subjects.
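The iterative nearest-neighbor averaging described above can be sketched as follows. This is a minimal illustration and not the FreeSurfer implementation: the function name `smooth_on_surface`, the ring-shaped toy mesh, and the choice to average each vertex with itself and its immediate neighbors are all assumptions made for the example.

```python
def smooth_on_surface(values, neighbors, n_iter):
    """Iterative nearest-neighbor averaging on a mesh.

    values    : list of per-vertex measurements (e.g., thickness in mm)
    neighbors : dict mapping vertex index -> list of adjacent vertex indices
    n_iter    : number of averaging passes
    Assumption: each pass replaces a vertex value with the mean of the
    vertex itself and its immediate neighbors.
    """
    current = list(values)
    for _ in range(n_iter):
        current = [
            (current[v] + sum(current[k] for k in neighbors[v]))
            / (1 + len(neighbors[v]))
            for v in range(len(current))
        ]
    return current


# Hypothetical toy "surface": a ring of 8 vertices, each with two neighbors.
n = 8
ring = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

# Uniform 2.0 mm thickness with a single outlying vertex.
thickness = [2.0] * n
thickness[0] = 4.0

smoothed = smooth_on_surface(thickness, ring, n_iter=100)
```

On this toy mesh every vertex has the same number of neighbors, so the mean value is preserved while the outlier is spread across the ring. On a real cortical mesh the effective kernel width depends on the vertex spacing.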
With the reconstructed gray/white and pial surfaces, cortical thickness estimates are obtained as follows. For each point on the gray/white surface, the shortest distance to the pial surface is first computed. Next, for each point on the pial surface, the shortest distance to the gray/white is found, and the cortical thickness at that location is set to the average of these two values. (Note that pathological point pairs in which the dot product between the vector from the gray/white boundary to the pial surface with the surface normal is negative are excluded.)
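One way to read the two-step distance procedure above is sketched below. It is an illustration under stated assumptions, not the FreeSurfer code: surfaces are represented only by their vertices (so "shortest distance to the surface" is approximated by the distance to the nearest vertex), the second distance is measured from the nearest pial point back to the gray/white surface, and the exclusion of pathological point pairs based on the surface normal is omitted. The function name `cortical_thickness` and the flat toy surfaces are hypothetical.

```python
import math


def cortical_thickness(white_vertices, pial_vertices):
    """Approximate thickness at each gray/white vertex.

    Step 1: distance from the gray/white vertex to the nearest pial vertex.
    Step 2: distance from that pial vertex back to the nearest gray/white vertex.
    The thickness estimate is the average of the two distances.
    """
    thickness = []
    for w in white_vertices:
        # Step 1: nearest pial vertex and its distance.
        d_white_to_pial, nearest_pial = min(
            (math.dist(w, p), p) for p in pial_vertices
        )
        # Step 2: shortest distance from that pial vertex to the gray/white surface.
        d_pial_to_white = min(math.dist(nearest_pial, q) for q in white_vertices)
        thickness.append(0.5 * (d_white_to_pial + d_pial_to_white))
    return thickness


# Hypothetical toy surfaces: two parallel 3 x 3 grids, 2.5 mm apart.
white = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
pial = [(float(x), float(y), 2.5) for x in range(3) for y in range(3)]

estimates = cortical_thickness(white, pial)
```

Averaging the two directions makes the estimate less sensitive to local curvature than a single one-way distance would be. The brute-force search here is quadratic in the number of vertices, so a real mesh would need a spatial index.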
Statistical surface maps were generated by computing a general linear model for the effects of the cognitive performance variable of interest on cortical thickness at each vertex point of the cortical surface model. Given the small number of subjects, the statistical threshold was set at a relatively liberal level for initial exploratory whole-cortex analyses (p< 0.01, two-tailed). For each MRI dataset, two separate general linear models were generated, one for CVLT delayed free recall and one for TMT-B timed speed of performance.
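The vertex-wise analysis can be illustrated with a simple linear regression of thickness on a performance score, fitted separately at every vertex. This is a sketch under stated assumptions rather than the analysis code used in the study: the model has a single predictor, the data are synthetic, and the significance decision compares the t statistic with a tabulated critical value (about 3.012 for a two-tailed p < 0.01 with 13 degrees of freedom, i.e., 15 subjects) instead of computing an exact p-value. All names are hypothetical.

```python
import math

# Two-tailed p < 0.01 critical value of Student's t with 13 degrees of freedom,
# taken from a standard t table (15 subjects, one predictor plus intercept).
T_CRIT_DF13_P01 = 3.012


def slope_t_statistic(x, y):
    """Least-squares slope of y on x and its t statistic (n - 2 df)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


def vertexwise_map(scores, thickness_by_vertex, t_crit):
    """Return (slope, t, significant) for each vertex."""
    results = []
    for vertex_thickness in thickness_by_vertex:
        slope, t = slope_t_statistic(scores, vertex_thickness)
        results.append((slope, t, abs(t) > t_crit))
    return results


# Synthetic data for 15 hypothetical subjects and two vertices.
scores = list(range(1, 16))
noise = [0.02 * (-1) ** i for i in range(15)]
vertex_a = [2.0 + 0.05 * s + e for s, e in zip(scores, noise)]  # related to score
vertex_b = [2.5 + e for e in noise]                             # unrelated to score

results = vertexwise_map(scores, [vertex_a, vertex_b], T_CRIT_DF13_P01)
```

Each vertex is tested independently here, which is why an exploratory map of this kind involves a large number of statistical tests.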
To explore the reliability of the detection of effects of interest across all four MRI datasets, we initially treated each dataset independently, performing two separate general linear model analyses as described above for each dataset. This yielded four maps for each cognitive performance variable, one from the ‘test’ 1.5T dataset, one from the ‘re-test’ 1.5T dataset, one from the ‘cross-platform’ 1.5T dataset, and one from the ‘cross-field-strength’ 3.0T dataset. The purpose of this analytic approach was to assess the reliability of the spatial localization of findings across all four datasets, based on the idea that lack of reliability of cortical thickness estimates would lead to differences in the spatial location of clusters of cortical regions in which thickness correlated with cognitive test performance.
In addition, we analyzed the reliability of regional cortical thickness magnitude estimates (in millimeters) across the four scan sessions, and the correlation between cortical thickness and cognitive performance, using the following approach. Once cortical regions with significant effects were identified as described above in the exploratory analysis of the ‘test’ dataset, a region of interest (ROI) based approach was used to further investigate the quantitative relationships between the four datasets. For these analyses, an ROI was generated on the template cortical surface that included all vertices in which thickness-performance correlations were observed. Specifically, in the delayed free recall analysis, an ROI was generated in the medial temporal lobe cortex. In the TMT-B analysis, an ROI was generated in the lateral parietal cortex. These ROIs are illustrated in Figures 3 and 6 below.
Using the spherical morph from each subject that transforms that subject’s cortical surface model to the average cortical surface template, these ROIs were mapped from the template to each individual subject and the mean cortical thickness within each ROI in each subject was measured. This generates, for the ‘test’ MRI dataset for each of the 15 subjects, a mean cortical thickness measure for 1) the medial temporal cortical ROI and 2) the lateral parietal cortical ROI. Next, similar procedures were performed using the other three MRI datasets for each subject, using the ROIs generated from the ‘test’ dataset. Thus, bias is avoided because the cortical thickness measurements from the last three datasets for each subject are made within ROIs defined from the first (‘test’) dataset. The ROI analysis also avoids the problem of having to correct for large numbers of statistical tests (the so-called “multiple comparison” problem). The results of these analyses were used to plot each individual subject’s mean cortical thickness within the two ROIs against the cognitive performance measure of interest. A regression line as well as correlation r and p-values were then calculated. For all ROI-based analyses, a two-tailed significance threshold of p < .05 was used. These statistical analyses were performed using SPSS 11.0 (Chicago, IL).
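The ROI-based comparison can be sketched as follows: average the thickness over the ROI vertices for each subject, then fit a regression line against the performance score separately for each dataset. This is an illustration only. The data are synthetic, the names (`roi_mean`, `regress`, the `"test"` and `"3T"` labels) are hypothetical, and the second dataset is constructed as the first plus a constant 0.1 mm offset purely to show that a uniform bias in thickness changes the intercept but not the slope. The p-value calculation is omitted.

```python
import math


def roi_mean(thickness, roi_vertices):
    """Mean thickness over the vertices belonging to the ROI."""
    return sum(thickness[v] for v in roi_vertices) / len(roi_vertices)


def regress(x, y):
    """Least-squares slope and intercept of y on x, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)


# Synthetic example: 5 hypothetical subjects, 4 vertices each, ROI = vertices 1 and 2.
roi = [1, 2]
scores = [6, 8, 10, 12, 14]
base = [2.6, 2.9, 2.95, 3.3, 3.35]
datasets = {
    "test": [[3.0, m - 0.1, m + 0.1, 3.0] for m in base],
}
# Assumed second dataset: identical anatomy with a uniform +0.1 mm bias.
datasets["3T"] = [[t + 0.1 for t in subject] for subject in datasets["test"]]

fits = {}
for name, subjects in datasets.items():
    roi_means = [roi_mean(subject, roi) for subject in subjects]
    fits[name] = regress(scores, roi_means)  # (slope, intercept, r)
```

Because the ROI is defined once and then reused, the same vertex list is applied to every dataset, which is what keeps the later measurements independent of the data that defined the region.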
Seventeen older subjects were screened, 16 were enrolled, and 15 completed the study (age between 66 - 81 years; mean: 69.5 years; std: 4.8 years). (One subject was excluded based on a history of significant head trauma, and the other was unable to complete the scanning procedures due to excessive motion in the scanner.) Demographic and cognitive performance data are provided in Table 1. Although an informant was not interviewed in this study, all subjects received a Clinical Dementia Rating score of 0 based on subject interview and office evaluation.
MRI data were obtained from all 15 subjects. Sequence-related gradient artifacts were present in MP-RAGE data obtained on the GE Signa scanner in two subjects. Both participants returned for repeat scans; one of these two scans was artifact-free but the other contained an artifact similar to the first GE scan, so GE data were unusable for that subject. Subject motion-related artifacts were present in two Siemens datasets, one from the re-test scan session on the 1.5T Sonata and one from a different subject’s 3T Trio scan session. Thus, the total number of analyzable datasets included fifteen test 1.5T Sonata sessions, 14 re-test 1.5T Sonata sessions, 14 3T Trio sessions, and 14 1.5T Signa sessions.
Results of the statistical analysis of cortical thickness predictors of CVLT delayed free recall performance demonstrated several regionally specific correlates in the ‘test’ 1.5T dataset. In the left hemisphere, the thickness of a region of the rostral medial temporal cortex correlated with delayed free recall performance. Medial paracentral/cingulate sulcus cortical thickness was inversely correlated with free recall performance. No regions on the lateral surface or on the right hemisphere were identified. These results are illustrated in Figure 1.
Results of similar statistical analyses in the ‘re-test’ 1.5T dataset, as well as the cross-platform 1.5T dataset and the cross-field-strength 3.0T dataset were remarkably consistent. When these datasets were analyzed independently, the spatial locations of cortical thickness correlates of delayed free recall performance were quite similar (based on visual inspection) to those described above, as shown in Figure 1. When these datasets were analyzed using an ROI-based approach, using the medial temporal cortical ROI derived from the ‘test’ dataset, strong correlations were found across the four datasets, with nearly identical slopes. The correlation derived from the 3.0T dataset demonstrated a bias toward slightly larger estimates of cortical thickness, but a nearly identical correlative effect with respect to delayed free recall performance data (slope of regression line) was observed in comparison to the three 1.5T datasets, as shown in Figure 2.
We also investigated the paracentral/cingulate sulcus region in which a consistent inverse relationship was found, using the same ROI approach described above (ROI defined in the ‘test’ dataset and applied to the additional datasets). The results were again very reliable across datasets in terms of slope of correlation, but the 3.0T data demonstrated a bias toward slightly larger estimates of cortical thickness (see Supplemental Figure 1).
Results of the statistical analysis of cortical thickness predictors of TMT Part B performance in the ‘test’ 1.5T dataset demonstrated, in the left hemisphere, that the thickness of a region of the lateral parietal cortex, localized rostrally within the intraparietal sulcus, was inversely correlated with TMT B performance speed. These results are illustrated in Figure 3.
Results of similar statistical analyses in the ‘re-test’ 1.5T dataset, as well as the cross-platform 1.5T dataset and the cross-field-strength 3.0T dataset were again strikingly consistent. When these datasets were treated independently and analyzed as described above, the spatial location of the cortical thickness correlate of TMT-B performance was quite similar to that described above, in the same region of the lateral parietal cortex, as shown in Figure 3. When these datasets were analyzed using an ROI-based approach, using the parietal cortical ROI derived from the ‘test’ dataset, strong correlations were found across the four datasets, again with nearly identical slopes. The correlation derived from the 3.0T dataset again demonstrated a bias toward slightly larger estimates of cortical thickness, but a nearly identical correlative effect with respect to TMT-B performance data was observed in comparison to the three 1.5T datasets, as shown in Figure 4.
In addition, an exploratory analysis was run to investigate the possibility that there may be a scan session * task performance interaction with respect to cortical thickness. This analysis was run in an exploratory fashion across the entire cortical surface within the framework of a general linear model with scan session as a class variable and task performance as a continuous variable (this analysis was run twice, once for each cognitive task variable). The contrast of interest focused on a class * performance interaction, asking the question of whether there may be a difference in the correlation slopes between performance and cortical thickness as a function of one or more of the scan sessions. The results of this analysis indicated that there were no regions showing such an effect (p<0.01, two-tailed).
In this study, we demonstrate that an automated method for the measurement of cortical thickness from MRI data is remarkably reliable for the detection and quantification of regional cortical thickness correlates of cognitive performance in normal older adults. In a small sample of subjects, dissociated effects were detected between the performance of two different cognitive tests and cortical thickness in two different brain regions. Verbal memory performance was associated with left medial temporal cortical thickness, while visuomotor speed/set-shifting was associated with thickness of a region of the lateral parietal cortex. These effects were highly reliable—not only in terms of spatial localization but also magnitude of absolute cortical thickness measurements—across four different scan sessions, including 1.5T test and re-test sessions, a 1.5T session at a different site using a scanner made by a different manufacturer, and a session on a higher-field strength (3.0T) scanner, each of which was separated by a two-week interval.
With respect to the localized regions of the cerebral cortex where thickness correlates with cognitive performance, the findings of the present study are consistent with previous data. Delayed recall of recently learned information depends critically on the integrity of the medial temporal lobe memory system (Squire et al., 2004). The rostral medial temporal cortex, encompassing entorhinal and perirhinal cortices, is known from neurophysiologic studies to be important for delayed memory performance in rats (Young et al., 1997), monkeys (Suzuki et al., 1997), and humans (Fried et al., 1997). Although some volumetric imaging studies of entorhinal cortex suggest that it is important for immediate recall (De Toledo-Morrell et al., 2000), others demonstrate its importance in delayed recall (Rodrigue and Raz, 2004; Dickerson et al., 2005). The TMT is usually thought to be a test in which performance is compromised by frontal cortical lesions, particularly those in the dorsolateral prefrontal cortex (Stuss et al., 2001b). Yet functional imaging studies have shown that performance of this task recruits parietal cortex (Asari et al., 2005), likely because it involves visuospatial attentional processing and set-shifting. A functional MRI study of an analogue of the TMT reported that this task engages, in addition to dorsolateral prefrontal and supplementary motor cortex, cortical regions within the lateral parietal cortex in the intraparietal sulci (Moll et al., 2002).
Manual-operator generated ROI measurements from MRI data have been used in hypothesis-driven studies to identify specific relationships between the volume of particular cortical regions and neuropsychological test performance measures. For example, in patients with AD, delayed verbal free recall was best predicted by left hippocampal volume, while delayed spatial free recall was best predicted by right hippocampal volume (de Toledo-Morrell et al., 2000). In a mixed group of AD, frontotemporal dementia, and semantic dementia patients, hippocampal volume was the best predictor of delayed recall while frontal cortical volume was the best predictor of semantic clustering (Kramer et al., 2005). However, because of their labor-intensive nature, manual ROI-based approaches generally lend themselves best to studies in which a priori hypotheses have enabled the selection of particular brain regions for measurement. For example, in both of the aforementioned studies, entorhinal volume was not measured, so it is unclear whether relationships would have been present with memory variables.
In contrast, voxel-based morphometry (VBM) techniques have been used in exploratory whole-brain analyses to identify regionally-specific relationships between cortical grey matter density and neuropsychological test performance (Gale et al., 2005). Yet VBM involves the voxel-based transformation and smoothing of individual MRI data into common coordinate space, which may remove the precise features of interest for studies of cortical anatomy (because transformations are performed based on voxel intensity without regard to gyral or sulcal anatomic features of the cortex). Furthermore, it is difficult to interpret the grey matter density measure employed by VBM with respect to measurable biologic properties of brain tissue. Thus, it may be useful to interpret the results of an exploratory VBM analysis in terms of the generation of a hypothesis, which then could be tested to identify the particular biologic property of brain tissue that accounts for the effect (e.g., volume, thickness, or surface area of a particular neuroanatomic structure).
The automated tools used in the present study enable both efficient exploratory whole-cortex analyses for the generation of hypotheses and the derivation of ROIs for use as a priori ROIs in the subsequent testing of these hypotheses in, for example, a separate subject sample. The measure is a morphometric property of the cortex (thickness) that is interpretable in an individual subject and may relate, at least in part, to neuronal or synaptic numbers within the cortex (Gomez-Isla et al., 1996; Regeur, 2000). The present analysis is limited, however, to the cerebral cortex, and a separate but related set of automated tools is required to generate measurements of the hippocampal formation, amygdala, basal ganglia, and other subcortical structures (Fischl et al., 2002b). Another important limitation of this study is the small sample size. Because of this, the particular cortical regions identified in this study should be interpreted cautiously and subjected to replication in larger samples; they are highlighted here primarily to illustrate reliability of thickness measures. Furthermore, it is not clear why there may be inverse correlations between performance and thickness, as was observed in the paracentral/cingulate sulcus region in which thinner cortex was associated with better performance on CVLT. Issues such as these deserve further investigation in larger samples of subjects.
Given the growth of multicenter imaging studies that seek to identify quantitative measures of brain structure or function that relate to disease (Jack et al., 2003; Mueller et al., 2005; Murphy et al., 2006; Belmonte et al., 2007), it is surprising that there has been fairly little study of the influence of MRI instrument-related factors on the reliability of such putative imaging biomarkers (Han et al., 2006; Jovicich et al., 2006). Since measures of cortical thinning are sensitive (Lerch et al., 2006) and reasonably specific (Du et al., 2007), at least in the context of particular neurodegenerative diseases, these measures are a promising candidate MRI imaging biomarker. Knowledge of the degree to which different MRI instrument-related factors—such as field strength, scanner manufacturer, and scanner software and hardware upgrades—affect the reliability of cortical thickness measures is essential for the interpretation of these measures in basic and clinical neuroscientific studies. This knowledge is critical if cortical thickness measures are to find applications as biomarkers in clinical trials of putative treatments for neurodegenerative or other neuropsychiatric diseases. Reliability of putative MRI biomarkers should be assessed not only using statistical reliability measures (Han et al., 2006), such as intraclass correlation, but also by determining how well “real world” effects of interest can be detected. The present study indicates that automated measures of cortical thickness are highly reliable within scanner systems and across manufacturers and field strengths for the localization and quantification of cortical thickness correlates of cognitive test performance. Further study of the influence of instrument-related factors on quantitative MRI-derived measures of brain anatomy will be critical if these measures are to be successfully translated into useful imaging biomarkers of disease or performance ability (Dickerson and Sperling, 2005).
Supplemental figure. Reliability of measurements of absolute magnitude of cortical thickness within left paracentral/cingulate sulcus cortical region of interest (ROI). For this analysis, the ROI was derived from the test dataset (shown on top left of Figure 1), which was applied to extract mean thickness from each of the four datasets. The correlations between these ROI cortical thickness measures (y axis) and delayed free recall (x axis) are all highly significant (p < 0.001) and have remarkably similar slopes. 3.0T Siemens (blue) estimates of thickness are slightly higher than those of Test Siemens 1.5T (red), Re-test Siemens 1.5T (green), and GE 1.5T (magenta).
This study was supported by grants from the NIA (K23-AG22509), the NCRR (P41-RR14075 R01 RR16594-01A1, the NCRR BIRN Morphometric Project BIRN002, U24 RR021382 & U24-RR021382), and the Mental Illness and Neuroscience Discovery (MIND) Institute. Additional support was provided by the National Institute for Biomedical Imaging and Bioengineering (R01 EB001550), the National Institute for Neurological Disorders and Stroke (R01 NS052585-01) and the National Alliance for Medical Image Computing (NAMIC), funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 EB005149.
We thank Jeanette Gunther and the staff of the MGH Gerontology Research Unit, as well as Mary Foley and Larry White for their technical assistance with data acquisition. We express special appreciation to the participants in this study for their valuable contributions, without which this research would not have been possible.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.