Longitudinal brain morphometric studies designed for data acquisition at a single MRI field strength can be seriously limited by system replacements from lower to higher field strength. Merging data across field strengths has not been endorsed for a variety of reasons, yet the ability to combine such data would broaden longitudinal investigations. To determine whether structural T1-weighted MRI data acquired across MR field strengths could be merged, parcellations of archival SPGR data acquired in 114 individuals at 1.5T and at 3.0T within 3 weeks of each other were compared. The first set of analyses examined 1) the correspondence between regional tissue volumes derived from data collected at 1.5T and 3.0T and 2) whether there were systematic differences for which a correction factor could be determined and applied to improve measurement agreement. Comparability of regional volume determination at 1.5T and 3.0T was assessed with intraclass correlation (ICC) computed on volumes derived from the automated and unsupervised SRI24 atlas registration and parcellation method. A second set of analyses measured the reliability of the registration and quantification using the same approach on longitudinal data acquired in 69 healthy adults at a single field strength, 1.5T, at an interval <2 years. The mainstay of the analyses was based on the SRI24 method; to examine the potential of merging data across field strengths and across image analysis packages, a secondary set of analyses used FreeSurfer instead of the SRI24 method. For both methods, a regression-based linear correction function significantly improved correspondence. The results indicated high correspondence between most selected cortical, subcortical, and CSF-filled spaces; correspondence was lowest in the globus pallidus, a region rich in iron, which in turn has a considerable field-dependent effect on signal intensity. 
Thus, the application of a regression-based correction function that improved the correspondence in regional volume estimations argues well for the proposition that selected T1-weighted regional anatomical brain data can be reliably combined across 1.5T and 3.0T field strengths with the application of an appropriate correction procedure.
Over the last two decades, quantitative neuroimaging studies using magnetic resonance imaging (MRI) have been conducted principally at 1.5T field strength. The availability of higher field strength, notably 3.0T, for clinical research purposes at times has replaced 1.5T systems, thereby derailing studies designed to be conducted at a single field strength. Even in centers that have maintained the lower field strength systems along with higher-strength systems, the availability of both poses a dilemma for longitudinal neuroimaging studies, especially those initiated on 1.5T systems: should a study be continued at the lower field strength to ensure longitudinal comparability of signal and quantification; alternatively, should a study change mid-stream to the higher field strength system with higher signal-to-noise and contrast-to-noise ratio per imaging time? Merging data across field strengths, however, has not been endorsed for a variety of reasons, including differences in field strength effects from B1-field inhomogeneity on various brain tissue types and their locations in the magnetic field. Yet, the ability to combine 1.5T and 3.0T data could expand sample sizes and holds other possibilities for broadening longitudinal investigations of changes in regional brain morphology and tissue integrity associated with normal development, normal aging, disease progression, and spontaneous or therapeutic resolution of disease.
Some MRI studies conducted across field strengths have taken advantage of the differences in relaxivity properties of tissue, especially those related to detection of iron (Duyn, 2011), which occurs with stroke (Srinivasan et al., 2006), some degenerative disorders (Bartzokis et al., 1999; Bartzokis et al., 2007a; Bartzokis et al., 2000), and diffuse axonal injury (Luccichenti et al., 2010), and accrues in selective basal ganglia and brain stem structures with age (Bartzokis et al., 2007b; Pfefferbaum et al., 2009; Pfefferbaum et al., 2010). Signal alteration of iron-laden tissue becomes greater with increasing field strength due to the ferromagnetic property of the iron causing increased T1, T2, and T2* relaxivity. Other comparisons note greater enhancement of a target using contrast agents, for example, to detect brain tumors; an early study showed greater enhancement at higher field strength, in this case 2.0T relative to 0.5T (Chang et al., 1994), and other more recent work corroborates this general finding (Attenberger et al.; Pinker et al., 2008). Other studies indicate conflicting findings related to tissue composition and pathological markers. In a comparison of clinical readings of 1.5T and 3.0T images for detecting lesions in presurgical epilepsy patients, the two reviewing neuroradiologists had higher inter-rater agreement on the 3.0T than on the 1.5T images but identified more lesions at 1.5T than at 3.0T, and the lesions identified differed between scanners (Zijlmans et al., 2009). Differences noted in these and other studies known to influence tissue signal and border conspicuity (e.g., Bammer et al., 2007; Boss et al., 2007; Fushimi et al., 2007; Stankiewicz et al., 2011; Zhu et al., 2011) militate against combining data across field strengths for equivalent measurement of structural brain volumes unless the differences were detectable, measurable, regular, and therefore amenable to correction.
Several attempts have recently been made to assess the possibility of combining data across 1.5T and 3.0T field strengths. One study focused on measurement of intracranial volume (ICV) because of its utility in normalizing variability in regional volume estimates attributable to ICV differences (Keihaninejad et al., 2010). After correcting for B1-field inhomogeneity and applying a reverse normalization strategy for registration, notable at 3.0T, measurement reliability between magnet strengths was high, despite differences in MR system manufacturers and head coils. Another study focused on comparability of volume measurements of subcortical brain structures, also across 1.5T and 3.0T systems made by different manufacturers and using different head coils (Goodro et al., 2011). Because different subjects were scanned on the different MR systems, the comparison was based on an outcome measure, which was the similarity of relations between structural volumes and age. The high agreement between volume estimates suggested comparability of the scanners in identifying effects of normal aging on subcortical measures. Cortical thickness examined with 1.5T and 3.0T scanners resulted in highly similar measurements, with 3.0T providing slightly higher thickness estimates in the same 15 subjects scanned multiple times (Han et al., 2006). A further analysis of these data focused on regional brain volumes and indicated higher reliability across field strengths within the same manufacturer’s platform than between different manufacturers’ platforms (Jovicich et al., 2009).
In an effort to determine whether structural MRI data acquired at 1.5T and 3.0T, the two most widely available magnetic strengths for clinical and research purposes, could be merged, we assembled archival data acquired at 1.5T and at 3.0T in close temporal proximity in the same individuals to address measurement comparability. Accordingly, the first set of analyses examined 1) the correspondence between measurements derived from data collected at 1.5T and 3.0T and 2) whether systematic differences in volume estimates existed for which a linear correction could be determined and applied to improve measurement agreement. Comparability of regional volume determination at 1.5T and 3.0T was assessed using the automated and unsupervised SRI24 atlas registration and parcellation method, a 3-dimensional structural image quantification approach developed in our laboratory (Rohlfing et al., 2010). To ensure that differences between datasets were not a function of a lack of reliability of the parcellation approach, a second set of analyses measured the reliability of the quantification using the same approach on longitudinal data acquired in healthy adults at a single field strength, 1.5T, at an interval of less than 2 years. The mainstay of the analyses and presentation herein is based on the SRI24 method. To examine the potential of merging data across field strengths and across image analysis packages, a secondary set of analyses used FreeSurfer, a free, widely used analysis package based on an “inflation” approach and applied here for automated and unsupervised image parcellation and quantification (Fischl et al., 2004b).
The data reported herein were drawn from ongoing longitudinal studies of brain structure in alcoholism, HIV infection, and normal aging. Descriptions of recruitment, screening, and general demographics appear in previous publications (e.g., Pfefferbaum et al., 2007; Pfefferbaum et al., 2006; Rosenbloom et al., 2007). All subjects had provided written informed consent to participate in these studies, which were approved by the Institutional Review Boards of Stanford University and SRI International. Subjects were included in the analysis based solely on the availability of two scans, one at 1.5T and one at 3.0T, separated by no more than 3 weeks, but independent of age, sex, or diagnosis. The inclusion of such a heterogeneous group provided ample anatomical variability to assess the effects of MR field strength differences, the analysis of which was independent of subject characteristics.
The subjects with data at the two field strengths were 84 men and 30 women, age 47.5±10.2 years (21 to 66 years). The total group was composed of 16 men and 10 women who were deemed healthy controls, 19 men and 6 women with alcohol dependence, 25 men and 8 women with HIV-infection, and 24 men and 6 women with HIV infection and alcoholism. The average (±S.D.) MRI interval for the entire sample was 4.4±5.9 days, range=0 to 21 days. The MRI interval was 2.48 days (0 to 11 days) for the alcoholics and 3.9 days (0 to 21 days) for the HIV/alcoholic group. Recent alcohol consumption was monitored by breathalyzer each scan day, and date of last drink was recorded as part of the lifetime alcohol history and its update. Of the 61 patients with a history of alcoholism (with or without HIV infection), 19 were current drinkers (within the last month) at the time of the first scan, 21 were in early remission (sober for up to 1 year), and 22 were in full remission (sober for more than 1 year). No alcohol-dependent participant reported excessive drinking between the scans.
The longitudinal reliability group comprised 31 men and 38 women, all serving as healthy controls in ongoing studies, who were each scanned twice on the 1.5T system within a 2-year interval. Their average age was 50.2±15.5 years (range=20 to 81 years), and they underwent MRI scanning on average 13.6±4.42 months apart (range=5.8 to 23.2 months).
The 1.5T MRI data were collected on a GE 1.5T Signa Twin whole-body system with a quadrature head coil (General Electric Healthcare, Waukesha, WI). Two coronal structural sequences were used for the analysis: a SPoiled Gradient Recalled Echo (SPGR) sequence (TR=25 ms, TE=5 ms, flip angle=30°, matrix=256×192, thick=2 mm, skip=0 mm, 94 slices) and a dual-echo fast spin echo (FSE) sequence (TR=7500 ms, TE1/2=13.5/108.3 ms, matrix=256×192, thick=4 mm, skip=0 mm, 47 slices).
The 3.0T MRI data were collected on a GE 3.0T Signa whole-body system with an 8-channel phased-array head coil. Data were derived from T1-weighted Inversion-Recovery Prepared SPGR images (TR=7 ms, TE=2.2 ms, TI=300 ms, thick=1.25 mm, skip=0 mm, 124 slices) and dual-echo FSE images (TR=8583 ms, TE1/2=13.5/108.3 ms, thick=2.5 mm, skip=0 mm, 62 slices).
For both 1.5T and 3.0T data, all acquired structural images were first corrected for intensity inhomogeneity by applying a second-order polynomial multiplicative bias field computed via entropy minimization (Likar et al., 2001). The late-echo FSE image was corrected using the bias field computed from the corresponding early-echo image to maintain the ratio of early- and late-echo values at each pixel, which keeps quantities derived from this ratio (e.g., T2) invariant. For each subject and each session, the bias-corrected early-echo FSE image was then registered to the bias-corrected SPGR image using intensity-based nonrigid image registration (Rohlfing and Maurer, 2003) (http://nitrc.org/projects/cmtk). The SPGR, early-echo FSE, and late-echo FSE images were each skull stripped using FSL’s Brain Extraction Tool, BET (Smith, 2002). The early- and late-echo brain masks were reformatted into SPGR image space and combined with the SPGR-derived brain mask via label voting (Rohlfing and Maurer, 2005) to form the final SPGR brain mask.
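The mask-combination step can be illustrated with a minimal sketch. It assumes plain per-voxel majority voting over three binary masks already resampled into SPGR space; the cited label-voting implementation may differ in detail, and the function name `vote_masks` and the toy six-voxel masks are hypothetical.

```python
def vote_masks(masks):
    """Per-voxel majority vote over equally sized binary masks (flat lists of 0/1)."""
    n_masks = len(masks)
    n_voxels = len(masks[0])
    if any(len(m) != n_voxels for m in masks):
        raise ValueError("all masks must have the same number of voxels")
    # Keep a voxel as brain when more than half of the masks label it brain.
    return [1 if 2 * sum(m[i] for m in masks) > n_masks else 0
            for i in range(n_voxels)]


# Toy masks (assumed data): SPGR-derived, early-echo FSE, late-echo FSE.
spgr_mask = [1, 1, 0, 1, 0, 0]
early_mask = [1, 0, 0, 1, 1, 0]
late_mask = [1, 1, 0, 0, 1, 0]

final_mask = vote_masks([spgr_mask, early_mask, late_mask])
print(final_mask)  # [1, 1, 0, 1, 1, 0]
```

With an odd number of masks there are no ties, so a voxel survives whenever at least two of the three extractions agree that it is brain.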
For each subject and each scan session, the skull-stripped SPGR images were registered to the SPGR channel of the SRI24 atlas (Rohlfing et al., 2010) (http://nitrc.org/projects/sri24) via nonrigid image registration (Rohlfing and Maurer, 2003). We chose the SRI24 atlas over other available brain templates (e.g., MNI152) because of its ability to discern detailed anatomical structures, which can thus be unambiguously outlined directly in the atlas images without the need to access the images that were used to create the atlas itself. Cortical and subcortical parcellation maps for all subjects and sessions were obtained by reformatting label maps defined in SRI24 space directly into SPGR image spaces using the subject-to-atlas coordinate transformations.
All bias-corrected and skull-stripped SPGR images were segmented into three tissue compartments (gray matter, white matter, CSF) using FSL’s FAST tool (Zhang et al., 2001). As tissue priors to both initialize and guide the classification, we used the tissue probability maps provided with the SRI24 atlas, reformatted into subject SPGR space via the same transformations described above for the atlas-based parcellation.
The unprocessed SPGR data for each subject and each scan session were entered into the FreeSurfer image analysis suite for unsupervised cortical surface reconstruction and volumetric segmentation (http://surfer.nmr.mgh.harvard.edu/). The FreeSurfer suite did not make use of the FSE data used for skull stripping in the SRI24 analysis. The technical details of these procedures are described in prior publications by others (Dale et al., 1999; Dale and Sereno, 1993; Fischl and Dale, 2000; Fischl et al., 2001; Fischl et al., 2002; Fischl et al., 2004a; Fischl et al., 1999; Fischl et al., 2004b; Han et al., 2006; Jovicich et al., 2006; Morey et al., 2010; Segonne et al., 2004; Wonderlick et al., 2009). FreeSurfer morphometric procedures have been demonstrated to show good test-retest reliability across scanner manufacturers and across field strengths (Han et al., 2006).
The SRI24 analysis used 24 regions of interest (Figure 1), including cortical, subcortical, and white matter structures as well as cerebrospinal fluid (CSF)-filled spaces, as described in our prior work (Pfefferbaum et al., 2011; Pfefferbaum et al., 2006). For each subject and session, gray matter volume was computed for each cortical region, and tissue volume (gray plus white matter) for each subcortical region. Also measured were CSF-filled volumes of the lateral ventricles, third ventricle, and sylvian fissures as well as total ICV and supratentorial volume.
We selected 21 regions from the FreeSurfer atlas with similarly named, although not anatomically identical, brain structures to match the SRI24-parcellated regions (Destrieux et al., 2010).
Comparisons of regional volumes at the two MR field strengths and also between two time points at a single field strength were quantified with Pearson correlations (r) and intraclass correlations (ICC) with 95% confidence intervals. Comparisons of change in ICC before vs. after correction were conducted with t-tests. Statistical analyses were conducted with R (http://www.r-project.org) and Statview.
The scatter of the regional volumes from the identity line (Figure 2) and the ratio of mean 1.5T/3.0T volume (Table 1) indicated that 13 regions were larger at 3.0T than 1.5T by .3 to 15.0% and 11 were smaller at 3.0T than 1.5T by .2 to 19.1%. Bivariate correlations and ICCs (Table 1) were calculated for each of the 24 regional volumes from the 114 subjects scanned at both field strengths to assess the direction of discrepancies between measurement of each region and the comparability of measurements collected at 1.5T and 3.0T. All correlations were highly significant and ranged from r=.652 to .998 (mean r=.895, median r=.887). The ICC, a preferred test of reliability over simple correlation because of its sensitivity to intercept differences (Shrout, 1998), ranged from .216 for the globus pallidus to .995 for the lateral ventricles (mean ICC=.801; median ICC=.834). According to the categorization of Landis and Koch (1977), 16 were “substantial,” 5 were “moderate,” 2 were “fair,” and 1 (globus pallidus) was only “slight.”
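The sensitivity of the ICC to intercept differences can be illustrated with a small sketch. The ICC variant used in the analyses is not stated here, so the example assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); the function names and toy volumes are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_agreement(x, y):
    """ICC(2,1) for two measurements per subject (assumed variant)."""
    n, k = len(x), 2
    grand = mean(list(x) + list(y))
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [mean(x), mean(y)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy volumes: the second measurement is the first plus a constant offset.
vol_a = [10.0, 12.0, 14.0, 16.0, 18.0]
vol_b = [v + 3.0 for v in vol_a]

r = pearson_r(vol_a, vol_b)
icc = icc_agreement(vol_a, vol_b)
print(round(r, 3), round(icc, 3))  # 1.0 0.69
```

In this toy case the correlation is perfect, yet the constant offset alone pulls the agreement ICC down to 20/29, which is the kind of systematic discrepancy a correlation coefficient cannot reveal.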
To test whether the 1.5T-3.0T differences were due to simple (global) volumetric scaling, we examined the effect of expressing the regional volumes as a percentage of supratentorial volume or as standardized scores regressed on supratentorial volume. This approach did not improve overall ICCs between 1.5T- and 3.0T-derived volumes, and for most regions made them worse.
To compensate for the measurement discrepancies, we derived a linear regression correction function (RCF) for each region. For each of the 25 regional volumes, the linear fit of 3.0T on 1.5T volume was computed and the slope and intercept were used to transform the 3.0T volume.
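A minimal sketch of one such regional correction follows. It assumes ordinary least squares for the fit and assumes that the transformation inverts the fitted line so that 3.0T volumes are mapped onto the 1.5T scale; the exact form of the transformation is not spelled out above, and the names `fit_line` and `apply_rcf` and the toy volumes are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def apply_rcf(vol_30t, slope, intercept):
    """Map 3.0T volumes onto the 1.5T scale by inverting the fitted line
    (assumed form of the correction)."""
    return [(v - intercept) / slope for v in vol_30t]


# Toy regional volumes for four subjects (assumed data, arbitrary units).
vol_15t = [4.0, 5.0, 6.0, 7.0]
vol_30t = [1.1 * v + 0.5 for v in vol_15t]

slope, intercept = fit_line(vol_15t, vol_30t)   # regression of 3.0T on 1.5T
corrected = apply_rcf(vol_30t, slope, intercept)
print([round(v, 3) for v in corrected])  # [4.0, 5.0, 6.0, 7.0]
```

Because each region gets its own slope and intercept, the same procedure can shrink a region that reads larger at 3.0T and enlarge one that reads smaller.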
Application of the RCFs significantly improved the ICCs (Wilcoxon Z=3.615, p=.0003), representing an average ICC improvement of .087. The post-RCF ICCs ranged from .599 to .998 (mean=.888; median=.881). ICCs improved for 17 regions, declined insubstantially for 4 regions, and remained unchanged for 3 regions (Table 1).
To test the robustness and potential utility of the RCFs, we divided the group of 114 subjects into two random samples (a and b) of 57 subjects each. The two samples were similar in age (a=48.7±9.2 years; b=46.5±11.2 years; t(112)=1.01, n.s.) and distributions of sex (χ2=.41, n.s.) and diagnosis (χ2=3.90, n.s.). We calculated regional RCFs for each sample, and then applied the RCFs to the other sample, that is, the RCFs of sample a were applied to sample b and vice versa. For the a–b comparison, the ICC of 19 regions improved and 5 declined, which was an average increase of .079 (Wilcoxon Z=1.947, p=.0515). 21 were in the substantial range (i.e., greater than .80); 2 (putamen and postcentral cortex) were in the moderate range; and the globus pallidus ICC was fair (.585). The b-a comparisons were similar, whereby 20 improved and 4 got worse (t(23)=3.180, p=.0042), mean ICC diff=.072; only the globus pallidus (ICC=.454) failed to reach the substantial ICC range (Table 2).
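The cross-application logic of the split-half test can be sketched as follows. The sketch assumes a seeded random split, an ordinary least-squares fit, and a correction that inverts the fitted line; all names and the synthetic volumes are hypothetical, and the resulting corrected values would then be compared with the 1.5T volumes by ICC.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def split_half_cross_apply(vol_15t, vol_30t, seed=0):
    """Fit the correction in each random half and apply it to the other half."""
    idx = list(range(len(vol_15t)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    sample_a, sample_b = idx[:half], idx[half:]
    corrected = {}
    for name, train, test in (("a_to_b", sample_a, sample_b),
                              ("b_to_a", sample_b, sample_a)):
        slope, intercept = fit_line([vol_15t[i] for i in train],
                                    [vol_30t[i] for i in train])
        # Corrected 3.0T volumes for subjects that did not contribute to the fit.
        corrected[name] = {i: (vol_30t[i] - intercept) / slope for i in test}
    return corrected


# Synthetic data for 20 subjects: a linear bias plus a small deterministic wobble.
vol_15t = [float(v) for v in range(10, 30)]
vol_30t = [0.9 * v + 2.0 + 0.01 * ((i % 3) - 1) for i, v in enumerate(vol_15t)]

result = split_half_cross_apply(vol_15t, vol_30t)
worst = max(abs(c - vol_15t[i])
            for half_result in result.values()
            for i, c in half_result.items())
print(worst < 0.1)  # True
```

Fitting and applying in disjoint halves guards against a correction that merely memorizes the sample from which it was derived.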
To explore whether the regression correction factors were disproportionately larger in the patient groups than controls, we used one-way ANOVAs to test for group effects in the signed and absolute RCF differences between the 1.5T and 3.0T corrected volumes. Family-wise Bonferroni correction for 25 ROIs required p≤.002. For the signed RCFs, group differences emerged for the postcentral cortex (p=.0007), where the correction was greater for the control than either the HIV or alcoholic groups, and for the posterior cingulum (p=.0078), where the correction was greater for the controls than the comorbid group. While statistically significant, the differences were trivial (<1 mm³). In no case did the groups differ significantly in absolute differences of RCFs.
In an effort to determine whether a general, region-independent correction factor could provide as robust an adjustment as our RCF, we used two global correction factor approaches. One computed a simple scalar factor based on the ratio of the mean of supratentorial volume of 1.5T divided by the same for 3.0T. The other computed the regression of supratentorial volume of 3.0T on 1.5T and applied the slope and intercept to each ROI of the 3.0T data, with the intercept for each ROI defined as the proportion of the supratentorial volume intercept/mean 1.5T supratentorial volume multiplied by the mean volume of a given ROI. The global ratio-based factor resulted in improved ICCs relative to uncorrected volumes for 7 of 24 ROIs: thalamus, globus pallidus, hippocampus, amygdala, insula, anterior-middle cingulum, and third ventricle. The global regression-based factor resulted in improved ICCs relative to uncorrected volumes for 9 of 24 ROIs: the 7 improved by the ratio-based method plus the postcentral cortex and lateral ventricles. Overall, however, the global regression-based factor diminished the ICCs on average by 6.6% (t(23)=2.391, p=.0254); the simple scale factor diminished the ICCs on average by 4.5% (t(23)=1.850, p=.0772). By contrast, the individual ROI regression correction procedure improved the ICCs in 17 of the 24 ROIs, an average of 8.7% (t(23)=3.770, p=.001); the ICC differences in the remaining 7 ROIs were 0 to .03%. These results support the conclusion that the proposed individual ROI RCF approach provided a better 1.5T-3.0T volume adjustment than did the global supratentorial-volume-based adjustment approaches.
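The two global approaches can be sketched under stated assumptions. The description leaves open which mean ROI volume scales the intercept and how the slope and intercept are applied; the sketch assumes the mean 1.5T ROI volume and an inversion of the fitted line. Function names and toy volumes are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def ratio_correct(roi_30t, svol_15t, svol_30t):
    """Scale 3.0T ROI volumes by mean 1.5T / mean 3.0T supratentorial volume."""
    factor = mean(svol_15t) / mean(svol_30t)
    return [v * factor for v in roi_30t]


def global_regression_correct(roi_15t, roi_30t, svol_15t, svol_30t):
    """Apply the supratentorial regression to an ROI, with the intercept
    rescaled to the ROI's size (assumed: mean 1.5T ROI volume)."""
    slope, intercept = fit_line(svol_15t, svol_30t)
    roi_intercept = intercept / mean(svol_15t) * mean(roi_15t)
    return [(v - roi_intercept) / slope for v in roi_30t]


# Toy data: supratentorial volume reads 5% larger at 3.0T.
svol_15t = [1000.0, 1100.0, 1200.0, 1300.0]
svol_30t = [1.05 * v for v in svol_15t]
roi_15t = [5.0, 5.5, 6.0, 6.5]
roi_big = [1.05 * v for v in roi_15t]    # ROI follows the global trend
roi_small = [0.90 * v for v in roi_15t]  # ROI deviates in the opposite direction

fixed_big = global_regression_correct(roi_15t, roi_big, svol_15t, svol_30t)
fixed_small = ratio_correct(roi_small, svol_15t, svol_30t)
err_before = sum(abs(a - b) for a, b in zip(roi_small, roi_15t))
err_after = sum(abs(a - b) for a, b in zip(fixed_small, roi_15t))
print(err_after > err_before)  # True
```

A single global factor helps only regions whose discrepancy runs in the same direction as the whole-brain discrepancy; for a region that runs the other way, as in `roi_small`, it moves the estimate further from the 1.5T value.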
We selected 21 regions from the FreeSurfer atlas that represented similar brain structures to the SRI24-parcellated regions. The average and range of 1.5T to 3.0T measurement ICCs and correlations were as follows: median ICC=.763, mean ICC=.722, range= .258 to .997; median r =.877, mean r=.860, range=.542 to .998 (Table 3 and Figure 4).
Application of the RCFs significantly improved all 21 ICCs (Wilcoxon Z=4.015, p=.0001), representing an average ICC improvement of .126. The post-RCF ICCs ranged from .457 to .998 (mean=.847; median=.870).
The second set of analyses measured the reliability of the registration and quantification using the SRI24 parcellation on longitudinal data acquired in 69 healthy control men and women scanned at a single field strength, 1.5T, at an interval of less than 2 years. To minimize systematic age-related changes over the scan interval, we randomly alternated the temporal order of the two acquisitions so that for half the subjects the earlier scan was chosen as the first observation, and for the other half the second scan was the first observation. The correlations between MRI sessions were highly significant for each regional volume and ranged from r=.829 to .996 (mean r=.949; median r=.956). The ICCs ranged from .830 in the putamen to .996 in the lateral ventricles (mean=.949; median=.957) (Figure 5).
The FreeSurfer analysis produced similar, high reliability. The correlations between MRI sessions were highly significant for each regional volume and ranged from r=.811 to .996 (mean r=.923; median r=.921). The ICCs ranged from .801 in the posterior cingulate cortex to .995 in the lateral ventricles (mean=.919; median=.921) (Figure 5).
Using the SRI24 atlas-based parcellation approach for analysis, the lateral ventricles and large white matter sample had exceptionally high ICCs and simple correlations as did the precuneus and lateral frontal cortical regions. Two subcortical structures, the globus pallidus and thalamus, had particularly low ICCs that improved appreciably with application of the regression correction function. With the exception of the globus pallidus (ICC corrected=.599) and the postcentral cortex (ICC corrected=.764), the corrected ICCs were all greater than .847 (.81 is considered “substantial” (Landis and Koch, 1977)).
In addition to the application of the individual ROI correction by RCF, we used two simple scalar approaches for global correction of volume differences between 1.5T and 3.0T measurements because of their ease of use and ready availability. Application of the scalar factor based on the ratio of the mean of supratentorial volume at the two field strengths resulted in improvement of the ICCs in 7 of the 24 ROIs examined, which were primarily subcortical regions. The regression of supratentorial volume at the two field strengths resulted in ICC improvement in 9 ROIs. Although improvements were clearly obtained with either global correction, more consistent and greater improvements were obtained with the RCF approach. These results suggest that the 1.5T to 3.0T differences are not merely a simple volumetric scalar factor but more likely due to field strength differences in B1 inhomogeneity, tissue type conspicuity, and the ability to parcellate the tissue into gray and white matter. While the RCF may be the preferred correction approach of those examined, the global scalar approach provided some improvement in measurement consistency between the field strengths, and for certain ROIs would be a preferred adjustment over no adjustment.
Similar results were obtained for the FreeSurfer analyses. Not all regional regression correction functions were the same nor were the average 1.5T vs. 3.0T mean differences necessarily in the same direction across analysis platforms, that is, between SRI24 and FreeSurfer. This is understandable given the substantial difference in anatomical designation and the difference in quantification approaches. Nonetheless, the improvement in ICCs with regression correction functions to highly acceptable levels using either method supports the contention that data collected at different field strengths can be merged.
We assessed automated morphometry reliability at 1.5T on 69 controls measured within 2 years and randomized the effect of time. Two recent reports assessed the reliability of automated morphometry at 3T over short examination intervals (Morey et al., 2010; Wonderlick et al., 2009). Morey et al. (2010) studied 23 young adults, scanned 1 hour apart on one day and again 1 hour apart 7 to 9 days later using a GE 3.0T EXCITE system with a 3D FSPGR. Of the 7 ROIs common to Morey et al. and our SRI24 results, 2 ROIs (lateral ventricles and hippocampus) had virtually the same ICC, 4 ICCs were higher (thalamus, caudate, putamen, globus pallidus), and one ICC was lower (amygdala) in the Morey et al. study than in ours. Wonderlick et al. (2009) scanned 5 younger (24 year old) and 6 older (64.3 year old) subjects in two identical sessions two weeks apart on a Siemens 3T TIM Trio with four variations of an MPRAGE sequence. Of the 12 ROIs common to Wonderlick et al. and our study, 8 ROIs had higher ICCs, 3 ROIs had about the same ICC, and 1 ROI had a lower ICC in the Wonderlick et al. study than in ours. Our 1.5T reliability results tended to have lower ICCs when we applied totally unsupervised FreeSurfer analysis.
Differences between our 1.5T reliability results and the 3T reliability of Morey et al. (2010) and Wonderlick et al. (2009) are probably due to processing differences, such as manual editing in the latter, rather than to primary differences in field strength. The principal aim of the current study was to determine whether a correction factor could be derived from independent observations, with the ultimate goal of using 1.5T data where 3.0T data were unavailable or vice versa. Thus, we conducted independent parcellations on the two sets of 1.5T data to ascertain reliability and the split-half 1.5T-3.0T data to determine utility. It appears Morey et al. (2010) used the FreeSurfer longitudinal analysis stream, which would likely artificially inflate the correspondence between scans by violating the assumption of independence that underlies valid scan/rescan analyses.
In an effort to examine interdependence of the measured differences across ROIs, we conducted a set of PCAs on 1.5T-3.0T difference scores. An unconstrained PCA yielded 8 factors with eigenvalues >1, but 4 accounted for trivial variance. A follow-up PCA forced a 4-factor solution, which formed clusters with anatomical similarity: “Anterior Superior Cortex” (Factor 1 accounting for 15.6% of the variance) comprised the lateral frontal, precentral, postcentral, parietal, and precuneus ROIs; “Allocortex” (Factor 2 accounting for 13.4% of the variance) comprised the anterior and posterior cingula, insula, hippocampus, corpus callosum, and Sylvian fissure ROIs; “Subcortex” (Factor 3 accounting for 11.3% of the variance) comprised the medial frontal, thalamus, caudate, putamen, centrum semiovale, and lateral and third ventricle ROIs; and “Posterior Inferior Cortex” (Factor 4 accounting for 8.1% of the variance) comprised the temporal, occipital, calcarine, and pallidal ROIs. The factor structure of the last PCA might lead to the speculation that signal intensity inhomogeneities vary across the brain, being greater in the superior than in the inferior cortical regions, and that variance in subcortical structure measurement could be due to accumulation of iron with age (Bartzokis et al., 2007b; Hallgren and Sourander, 1958; Pfefferbaum et al., 2010).
The 1.5T and 3.0T data that we analyzed for each subject differed in significant ways in addition to magnetic field strength. The 1.5T data were acquired in the coronal plane with a quadrature head coil, whereas the 3.0T data were acquired in the axial plane with an 8-channel phased-array head coil. Well-recognized differences in T1-weighted images at 3.0T in contrast to 1.5T include substantial B1 inhomogeneity (accounted for in part by bias correction) and the greater iron-induced relaxivity shortening at 3.0T relative to 1.5T. The latter, especially in the basal ganglia, may have contributed to the relatively poorer correspondence between the two field strengths as well as the difficulty in segmentation and parcellation of subcortical T1-weighted data; for example, the thalamus and putamen had the lowest within-1.5T scanner reliability. Although we only had available reliability estimates for 1.5T data, the very high ICCs for repeated scans on the same subjects at 1.5T suggest that the measurement methods are adequately reliable to test the field strength differences. The differences in brain regional volumes across the two field strengths were essentially equally distributed in terms of 3.0T estimates as being greater or smaller than 1.5T estimates. Thus, the field strength differences were not merely a scanner calibration or global scaling phenomenon.
The results of the split-half analysis lend support to the use of a regression-based correction function on other data sets for which it is desirable to combine data across field strengths. Different laboratories may, out of necessity, establish their own functions, which would be tailored to their acquisition parameters and quantification approaches. The current analysis was performed on two analysis platforms in an automated and unsupervised fashion. It is common practice to inspect and edit parcellation and segmentation results; for example, FreeSurfer has a large suite of editing tools, and edited data blind to field strength might have produced higher correspondence than observed here.
The intention of the regression approach presented here was to adjust for differences between data collected on different scanner platforms due to the scanning acquisition procedure itself (timing parameters, hardware, and field strength). The fact that some ROIs were larger and others smaller at 3.0T compared with 1.5T indicates that the differences were not due simply to global spatial scaling differences between the scanners. The correction depends on the existence of systematic linear differences in regional volume estimates between acquisitions, but it cannot account for changes in subjects, e.g., changes attributable to development, disease, or aging, between acquisitions. For this reason, the data used in this analysis were restricted to scans collected at 1.5T and 3.0T less than 3 weeks apart to avoid aging and disease influences on the correction factors.
In summary, the results indicate high correspondence between selected cortical, subcortical, and CSF-filled spaces that varied in a linear systematic fashion and were improved by the application of a regression-based correction function. Despite acquisition differences, the high correspondence argues well for the proposition that selected T1-weighted regional anatomical brain data can be reliably combined across 1.5T and 3.0T field strengths with the application of an appropriate correction procedure. A similar approach could also be used for combination of data across any two scanner systems that produced linear systematic differences in structural volume estimates.
Statement on conflict of interest
No author of this manuscript has any conflicts of interest with this work, either financial or otherwise.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.