Using the SRI24 atlas-based parcellation approach for analysis, the lateral ventricles and large white matter sample had exceptionally high ICCs and simple correlations, as did the precuneus and lateral frontal cortical regions. Two subcortical structures, the globus pallidus and thalamus, had particularly low ICCs that improved appreciably with application of the regression correction function. With the exception of the globus pallidus (corrected ICC=.599) and the postcentral cortex (corrected ICC=.764), the corrected ICCs were all greater than .847 (.81 is considered “substantial” (Landis and Koch, 1977)).
In addition to the application of the individual ROI correction by RCF, we used two simple scalar approaches for global correction of volume differences between 1.5T and 3.0T measurements because of their ease of use and ready availability. Application of the scalar factor based on the ratio of the mean supratentorial volumes at the two field strengths improved the ICCs in 7 of the 24 ROIs examined, primarily subcortical regions. The regression of supratentorial volume at the two field strengths improved the ICCs in 9 ROIs. Although improvements were clearly obtained with either global correction, more consistent and greater improvements were obtained with the RCF approach. These results suggest that the 1.5T to 3.0T differences do not reflect merely a simple volumetric scalar factor but are more likely due to field strength differences in B1 inhomogeneity, tissue type conspicuity, and the ability to parcellate the tissue into gray and white matter. While the RCF may be the preferred correction approach of those examined, the global scalar approach provided some improvement in measurement consistency between the field strengths, and for certain ROIs would be a preferred adjustment over no adjustment.
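The ratio-of-means scalar correction described above can be illustrated with a minimal sketch. All volumes, ROI labels, and function names below are hypothetical and chosen only for illustration; they are not data or code from this study. The sketch shows why a single global factor helps an ROI whose difference runs in the same direction as the supratentorial difference but cannot help one whose difference runs in the opposite direction.

```python
from statistics import mean

def ratio_of_means_factor(stv_15, stv_30):
    """Global scalar: mean 1.5T supratentorial volume over mean 3.0T volume."""
    return mean(stv_15) / mean(stv_30)

def mean_abs_diff(a, b):
    """Mean absolute difference between paired measurements."""
    return mean(abs(x - y) for x, y in zip(a, b))

# Hypothetical supratentorial volumes (cc): 3.0T reads 2% smaller than 1.5T.
stv_15 = [1200.0, 1250.0, 1300.0, 1350.0]
stv_30 = [1176.0, 1225.0, 1274.0, 1323.0]
scale = ratio_of_means_factor(stv_15, stv_30)

# Hypothetical ROI A: 3.0T also reads 2% smaller, so the global scalar helps.
roi_a_15 = [10.0, 11.0, 12.0, 13.0]
roi_a_30 = [9.8, 10.78, 11.76, 12.74]

# Hypothetical ROI B: 3.0T reads 3% larger, so the same scalar makes it worse.
roi_b_15 = [10.0, 11.0, 12.0, 13.0]
roi_b_30 = [10.3, 11.33, 12.36, 13.39]

a_raw = mean_abs_diff(roi_a_15, roi_a_30)
a_scaled = mean_abs_diff(roi_a_15, [scale * v for v in roi_a_30])
b_raw = mean_abs_diff(roi_b_15, roi_b_30)
b_scaled = mean_abs_diff(roi_b_15, [scale * v for v in roi_b_30])

print(f"scale factor: {scale:.4f}")
print(f"ROI A mean |diff|: raw {a_raw:.3f}, scaled {a_scaled:.3f}")
print(f"ROI B mean |diff|: raw {b_raw:.3f}, scaled {b_scaled:.3f}")
```

In this toy setting the scalar removes the discrepancy for ROI A and enlarges it for ROI B, which is the behavior an ROI-specific correction is meant to avoid.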
Similar results were obtained for the FreeSurfer analyses. Not all regional regression correction functions were the same nor were the average 1.5T vs. 3.0T mean differences necessarily in the same direction across analysis platforms, that is, between SRI24 and FreeSurfer. This is understandable given the substantial difference in anatomical designation and the difference in quantification approaches. Nonetheless, the improvement in ICCs with regression correction functions to highly acceptable levels using either method supports the contention that data collected at different field strengths can be merged.
We assessed automated morphometry reliability at 1.5T on 69 controls measured within 2 years and randomized the effect of time. Two recent reports assessed the reliability of automated morphometry at 3T over short examination intervals (Morey et al., 2010; Wonderlick et al., 2009). Morey et al. (2010) studied 23 young adults, scanned 1 hour apart on one day and again 1 hour apart 7 to 9 days later using a GE 3.0T EXCITE system with a 3D FSPGR sequence. Of the 7 ROIs common to Morey et al. and our SRI24 results, 2 ROIs (lateral ventricles and hippocampus) had virtually the same ICC, 4 had higher ICCs (thalamus, caudate, putamen, globus pallidus), and 1 had a lower ICC (amygdala) in the Morey et al. study than in ours. Wonderlick et al. (2009) scanned 5 younger (24-year-old) and 6 older (64.3-year-old) subjects in two identical sessions two weeks apart on a Siemens 3T TIM Trio with four variations of an MPRAGE sequence. Of the 12 ROIs common to Wonderlick et al. and our study, 8 ROIs had higher ICCs, 3 ROIs had about the same ICC, and 1 ROI had a lower ICC in the Wonderlick et al. study than in ours. Our 1.5T reliability results tended to have lower ICCs when we applied totally unsupervised FreeSurfer analysis.
Differences between our 1.5T reliability results and the 3T reliability results of Morey et al. (2010) and Wonderlick et al. (2009) are probably due to processing differences, such as manual editing in the latter, rather than to primary differences in field strength. The principle of the current study was to determine whether a correction factor could be derived from independent observations, with the ultimate aim being the ability to use 1.5T data where 3.0T data were unavailable or vice versa. Thus, we conducted independent parcellations on the two sets of 1.5T data to ascertain reliability and on the split-half 1.5T-3.0T data to determine utility. It appears that Morey et al. (2010) used the FreeSurfer longitudinal analysis stream, which would likely artificially inflate the correspondence between scans by violating the assumption of independence that underlies valid scan/rescan analyses.
In an effort to examine interdependence of the measured differences across ROIs, we conducted a set of PCAs on 1.5T-3.0T difference scores. An unconstrained PCA yielded 8 factors with eigenvalues >1, but 4 accounted for trivial variance. A follow-up PCA forced a 4-factor solution, which formed clusters with anatomical similarity: “Anterior Superior Cortex” (Factor 1 accounting for 15.6% of the variance) comprised the lateral frontal, precentral, postcentral, parietal, and precuneus ROIs; “Allocortex” (Factor 2 accounting for 13.4% of the variance) comprised the anterior and posterior cingula, insula, hippocampus, corpus callosum, and Sylvian fissure ROIs; “Subcortex” (Factor 3 accounting for 11.3% of the variance) comprised the medial frontal, thalamus, caudate, putamen, centrum semiovale, and lateral and third ventricle ROIs; and “Posterior Inferior Cortex” (Factor 4 accounting for 8.1% of the variance) comprised the temporal, occipital, calcarine, and pallidal ROIs. The factor structure of the last PCA might lead to the speculation that signal intensity inhomogeneities vary across the brain, being greater in superior than in inferior cortical regions, and that variance in subcortical structure measurement could be due to accumulation of iron with age (Bartzokis et al., 2007b; Hallgren and Sourander, 1958; Pfefferbaum et al., 2010).
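The eigenvalue >1 retention rule used in the unconstrained PCA can be sketched as follows. This is only an illustration on simulated data, assuming NumPy is available: the subject count, ROI count, and two-source structure are hypothetical, and the sketch covers only the eigenvalue criterion, not the forced 4-factor solution or any rotation used in the analysis reported here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_rois = 100, 6

# Hypothetical 1.5T-3.0T difference scores: two latent sources,
# each driving three of the six simulated ROIs, plus independent noise.
latent = rng.normal(size=(n_subjects, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
diff_scores = latent @ loadings + 0.5 * rng.normal(size=(n_subjects, n_rois))

# PCA on the correlation matrix: eigenvalues sorted in descending order.
corr = np.corrcoef(diff_scores, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Retain components whose eigenvalue exceeds 1 (more variance than one ROI).
n_retained = int(np.sum(eigenvalues > 1.0))
explained = eigenvalues / eigenvalues.sum()

print("eigenvalues:", np.round(eigenvalues, 2))
print("components retained:", n_retained)
print("proportion of variance:", np.round(explained[:n_retained], 3))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks a component that carries more variance than any single standardized ROI difference score.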
The 1.5T and 3.0T data that we analyzed for each subject differed in significant ways in addition to magnetic field strength. The 1.5T data were acquired in the coronal plane with a quadrature head coil, whereas the 3.0T data were acquired in the axial plane with an 8-channel phased-array head coil. Well-recognized differences in T1-weighted images at 3.0T in contrast to 1.5T include substantial B1 inhomogeneity (accounted for in part by bias correction) and the greater iron-induced relaxivity shortening at 3.0T relative to 1.5T. The latter, especially in the basal ganglia, may have contributed to the relatively poorer correspondence between the two field strengths as well as the difficulty in segmentation and parcellation of subcortical T1-weighted data; for example, the thalamus and putamen had the lowest within-1.5T scanner reliability. Although we only had available reliability estimates for 1.5T data, the very high ICCs for repeated scans on the same subjects at 1.5T suggest that the measurement methods are adequately reliable to test the field strength differences. Across ROIs, the regional volume differences between the two field strengths were divided about equally between 3.0T estimates that were greater than and 3.0T estimates that were smaller than the corresponding 1.5T estimates. Thus, the field strength differences were not merely a scanner calibration or global scaling phenomenon.
The results of the split-half analysis lend support to the use of a regression-based correction function on other data sets for which it is desirable to combine data across field strengths. Different laboratories may, out of necessity, establish their own functions, which would be tailored to their acquisition parameters and quantification approaches. The current analysis was performed on two analysis platforms in an automated and unsupervised fashion. It is common practice to inspect and edit parcellation and segmentation results; for example, FreeSurfer has a large suite of editing tools, and edited data blind to field strength might have produced higher correspondence than observed here.
The intention of the regression approach presented here was to adjust for differences between data collected on different scanner platforms due to the scanning acquisition procedure itself (timing parameters, hardware, and field strength). The fact that some ROIs were larger and others smaller at 3.0T compared with 1.5T indicates that the differences were not due simply to global spatial scaling differences between the scanners. The correction depends on the existence of systematic linear differences in regional volume estimates between acquisitions, but it cannot account for changes in subjects, e.g., changes attributable to development, disease, or aging, between acquisitions. For this reason, the data used in this analysis were restricted to scans collected at 1.5T and 3.0T less than 3 weeks apart to avoid aging and disease influences on the correction factors.
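A regression-based correction of this kind, derived on one half of a sample and evaluated on the other, can be sketched as below. The volumes are hypothetical, the direction of correction (mapping 3.0T estimates onto the 1.5T scale) is an assumption, and the ICC form shown (two-way random effects, absolute agreement, single measure) is one common choice rather than necessarily the one used in this study.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def icc_agreement(a, b):
    """ICC(2,1) for two paired measurement series (absolute agreement)."""
    n, k = len(a), 2
    grand = (sum(a) + sum(b)) / (n * k)
    row_means = [(ai + bi) / k for ai, bi in zip(a, b)]
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in (mean(a), mean(b)))
    ss_total = sum((v - grand) ** 2 for v in list(a) + list(b))
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical volumes (cc) for one ROI; 3.0T differs linearly from 1.5T.
v15 = [6.0, 7.0, 8.0, 9.0, 6.5, 7.5, 8.5, 9.5]
v30 = [5.85, 6.55, 7.35, 8.25, 6.15, 7.05, 7.85, 8.55]

# Split-half: derive the correction on the first half, test on the second.
train_15, test_15 = v15[:4], v15[4:]
train_30, test_30 = v30[:4], v30[4:]

slope, intercept = fit_line(train_30, train_15)
test_30_corrected = [slope * v + intercept for v in test_30]

icc_raw = icc_agreement(test_15, test_30)
icc_corrected = icc_agreement(test_15, test_30_corrected)
print(f"slope {slope:.3f}, intercept {intercept:.3f}")
print(f"ICC uncorrected {icc_raw:.3f}, corrected {icc_corrected:.3f}")
```

Because the function is fitted on subjects other than those it is tested on, any gain in ICC in the held-out half reflects a systematic linear difference between acquisitions rather than a fit to the same data.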
In summary, the results indicate high correspondence between selected cortical, subcortical, and CSF-filled spaces that varied in a linear systematic fashion and were improved by the application of a regression-based correction function. Despite acquisition differences, the high correspondence argues well for the proposition that selected T1-weighted regional anatomical brain data can be reliably combined across 1.5T and 3.0T field strengths with the application of an appropriate correction procedure. A similar approach could also be used for combination of data across any two scanner systems that produced linear systematic differences in structural volume estimates.