Our dataset included a total of 6 scanners and 136 subjects scanned over a 10-year period. None of the subjects was scanned more than once. All scans were acquired on the same platform, General Electric Signa 1.5 T scanners (slice thickness 1.6 mm, matrix dimensions 256 × 192). There were minor variations in TR, TE, and flip angle (see Table 1). Scanners also underwent upgrades over time. Importantly, all scanners were monitored with daily phantom quality checks, which calibrated the gradients to within ± 1 mm over a 200-mm volume centered at iso-center and monitored signal-to-noise ratio and radio frequency (RF) transmit gain. The major hardware elements (body resonance module gradient coil and birdcage head transmit–receive volume coil) were unchanged over time and across scanners, with one exception: in the oldest scanner (scanner 2), the wiring in the resistive shim set was not cooled to superconducting temperatures, whereas in the other 5 scanners the shim coils were inside the dewar and were cooled to the superconducting range.
Table 1 Scanner and subject demographics
We used voxel-based morphometry (VBM) to evaluate scanner effects on segmented, modulated grey matter images from the 6 different scanners. VBM has the advantage of assessing the whole brain without being biased toward one particular region or structure. It entails a voxel-wise comparison of local grey matter volume between groups after the images are spatially normalized into the same space, segmented, modulated, and smoothed. Voxel-wise statistical parametric maps result from statistically thresholded contrasts after correction for multiple comparisons (Ashburner and Friston, 2000) using the false discovery rate (FDR) (Benjamini and Hochberg, 1995).
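As an illustration of the FDR correction cited above, the following is a minimal sketch of the Benjamini-Hochberg step-up rule in plain Python. The function name `benjamini_hochberg` and the toy p-values are assumptions made for this example; SPM5 applies its own implementation to the voxel-wise statistic maps.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test is declared significant.

    Hypothetical helper for illustration only (not the SPM5 code).
    """
    m = len(p_values)
    # Sort p-values in ascending order, remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Toy p-values (made up for this example, not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(toy_p, q=0.05))
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be declared significant when a larger p-value further down the sorted list meets its threshold.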
Images were visually inspected for artifacts or structural abnormalities unrelated to AD. They were first segmented into white matter (WM) and grey matter (GM) using SPM5 (Wellcome Trust Centre for Neuroimaging, Institute of Neurology, UCL, London, UK; http://www.fil.ion.ucl.ac.uk/spm). The WM and GM segments were then normalized to a population template generated from the complete image set using a diffeomorphic registration algorithm. This non-linear warping technique minimizes structural variation between subjects (Ashburner, 2007). For comparison, we also repeated the analysis using the more widely used standard SPM5 segmentation code (Ashburner and Friston, 2005) instead of the diffeomorphic registration algorithm. Resolution before normalization was 0.9 × 0.9 × 1.6 mm; after normalization it was 1.5 × 1.5 × 1.5 mm for the diffeomorphic registration algorithm and 2.0 × 2.0 × 2.0 mm with the standard SPM5 procedure. A separate ‘modulation’ step (Ashburner and Friston, 2000) was used to ensure that the overall amount of each tissue class was not altered by the spatial normalization procedure. Modulation was performed by multiplying the warped tissue probability maps, voxel by voxel, by the Jacobian determinant of the warp, which represents the relative volume ratio before and after warping. Voxel intensities in the segmented grey matter map, together with the size of the voxels, therefore reflect regional volume and preserve the total grey matter volume from before the warp. Modulated grey matter scans were smoothed using a 6-mm full-width at half-maximum Gaussian kernel.
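The modulation and smoothing steps can be illustrated with a short numerical sketch. The helper names `modulate` and `fwhm_to_sigma` and the one-dimensional toy arrays are assumptions made for this example; the actual processing was carried out in SPM5 on three-dimensional images.

```python
import math
import numpy as np


def modulate(warped_prob, jacobian_det):
    """Scale warped tissue probabilities by the local volume ratio (voxel-wise)."""
    return warped_prob * jacobian_det


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full-width at half-maximum to a standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


# Toy 1-D example (assumed values): a structure covering 4 native voxels is
# compressed into 2 template voxels, so the volume ratio there is 2.
native_prob = np.array([0.0, 1.0, 1.0, 1.0, 1.0, 0.0])
warped_prob = np.array([0.0, 1.0, 1.0, 0.0])
jacobian_det = np.array([1.0, 2.0, 2.0, 1.0])

modulated = modulate(warped_prob, jacobian_det)
print(native_prob.sum(), modulated.sum())  # total tissue amount before and after

sigma_mm = fwhm_to_sigma(6.0)   # 6-mm FWHM kernel expressed in mm
sigma_vox = sigma_mm / 1.5      # in voxels at 1.5-mm isotropic resolution
print(round(sigma_mm, 3), round(sigma_vox, 3))
```

In this toy case the modulated map sums to the same total as the native map, which is the property the modulation step is meant to preserve.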
The smoothed grey matter images were analyzed in a factorial design, with scanner (SCANNER) as one factor with 6 levels and the presence of AD (GROUP) as the second factor with two levels (present and absent). Age, gender, and total intracranial volume were entered as covariates (Fig. 1). We performed F-tests, correcting for multiple comparisons across the brain (FDR correction). There were 121 degrees of freedom for all comparisons. The effect of group was strong (p < 0.001) in the left medial temporal lobe (−29, −24, −9 [x, y, z]; F = 100.34) and right medial temporal lobe (30, −27, −9 [x, y, z]; F = 91.44) (Fig. 2). The effect of scanner was significant (p < 0.05) in the right (9, −30, 0 [x, y, z]; F = 11.93) and left (−9, −11, 8 [x, y, z]; F = 9.69) thalami (Fig. 3), but this effect was smaller than the effect of group. T-tests contrasting each scanner against the others revealed that the scanner effect in the thalamus was mostly due to the oldest scanner (scanner 2). One newer scanner (scanner 6) showed an effect in the right thalamus that did not survive FDR correction (8, −5, −3 [x, y, z]; T = 4.03), whereas the oldest scanner contrasted with the others showed a significant effect (p < 0.05) in the left thalamus (−9, −11, 6 [x, y, z]; T = 5.94). Despite the effect of scanner, there was no significant interaction of scanner with group, the highest Z-score being 3.82 with a corrected p value of 0.942. We performed F-tests for each possible combination of scanners and found no significant interaction with any of these groupings.
When using the standard VBM procedure implemented in SPM5 for normalization and segmentation, the results of the various contrasts followed the same patterns as when using the diffeomorphic normalization/segmentation procedure. We found no interaction of scanner with group; the highest Z-score was 3.78 with an FDR-corrected p value of 0.954. The effect of scanner was significant (p < 0.05) in the left thalamus (−10, −10, 6 [x, y, z]; F = 9.49) and the right thalamus (8, −28, −4 [x, y, z]; F = 9.41). The effect of group was also strong (p < 0.001) in the left medial temporal lobe (−26, −14, −20 [x, y, z]; F = 101.46) and right medial temporal lobe (26, −8, −18 [x, y, z]; F = 77.98).
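The 6 × 2 factorial design with three covariates can be sketched numerically to show where the 121 degrees of freedom come from: 136 subjects minus 12 scanner-by-group cell columns minus 3 covariate columns. The sketch below is an assumption-laden illustration, not the SPM5 design itself: the cell sizes, covariate values, and single-voxel data are synthetic, and the interaction contrast is one of several equivalent ways to code the scanner-by-group interaction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_scanners, n_groups = 136, 6, 2
n_cells = n_scanners * n_groups

# Synthetic labels (assumption): cycle through the 12 cells so none is empty.
# Cell index = 2 * scanner + group, i.e. scanner 1 normals, scanner 1 AD, ...
cell = np.arange(n_subjects) % n_cells
cells = np.eye(n_cells)[cell]                        # 136 x 12 indicator columns

# Synthetic, mean-centred nuisance covariates standing in for age, sex, TIV.
covars = rng.normal(size=(n_subjects, 3))
covars -= covars.mean(axis=0)

X = np.hstack([cells, covars])                       # 136 x 15 design matrix
resid_df = n_subjects - np.linalg.matrix_rank(X)     # 136 - 15
print(resid_df)

# Interaction contrast: the (AD - normal) difference compared between
# adjacent scanners, giving 5 rows for 6 scanners.
C = np.zeros((n_scanners - 1, X.shape[1]))
for s in range(n_scanners - 1):
    C[s, 2 * s:2 * s + 4] = [-1, 1, 1, -1]

# F-statistic for the interaction at one synthetic voxel of pure noise.
y = rng.normal(size=n_subjects)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / resid_df
cb = C @ beta
cov_cb = C @ np.linalg.inv(X.T @ X) @ C.T
F = cb @ np.linalg.solve(cov_cb, cb) / (C.shape[0] * sigma2)
print(F)
```

The residual degrees of freedom depend only on the number of subjects and the rank of the design matrix, so they are the same for every contrast tested on this model.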
Fig. 1 Design matrix. Six scanners are separated by patients and controls, i.e., scanner 1 normals, scanner 1 AD, scanner 2 normals, scanner 2 AD, etc. Nuisance covariates include age, sex, and total intracranial volume (TIV).
Fig. 2 F-tests showing effect of disease group (AD vs. controls irrespective of scanner), FDR threshold of p < 0.001. Images are overlaid on a group average. Colour bar reflects the F-values.
Fig. 3 F-test showing effect of scanner (irrespective of cases or controls), FDR threshold of p < 0.05. Images are overlaid on a group average. Colour bar reflects the F-values.
Importantly, we found no significant scanner effects in the medial temporal lobe cluster (−29, −24, −9 [x, y, z]; F = 0.01), and the disease effect in the thalamus (9, −30, 0 [x, y, z]; F = 1.06) was minimal compared to the effect in the medial temporal lobe. This suggests minimal scanner effects in the areas that are most affected by AD and minimal disease effects in the areas showing scanner differences. Confidence intervals for the contrast estimates, which reflect the standard deviations, are shown in Fig. 4. At the voxel of greatest group effect, the confidence interval is small relative to the effect size for the main effect of group. The opposite is true for the effect of scanner at the area of greatest disease effect, which is further evidence of a lack of scanner effect there. For the main effect of scanner, confidence intervals are similar across the different scanner contrasts (see Fig. 4A), indirectly suggesting relatively little variance across scanners.
Fig. 4 Contrast estimates and 90% confidence intervals for: (A) Main effect of scanner in thalamus at [9, −30, 0; x, y, z] for contrasts of scanners 1 and 2; scanners 2 and 3; scanners 3 and 4; scanners 4 and 5; scanners 5 and 6. (B) Main effect ...
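A contrast estimate and its 90% confidence interval, of the kind plotted in Fig. 4, can be computed from a fitted general linear model as in the sketch below. The helper `contrast_ci`, the two-group toy design, and the simulated data are assumptions made for this example. The sketch also uses a normal quantile in place of the t quantile, an approximation that is reasonable only when the residual degrees of freedom are large.

```python
import numpy as np
from statistics import NormalDist


def contrast_ci(X, y, c, level=0.90):
    """Return (estimate, lower, upper) for the contrast c'beta.

    Hypothetical helper; uses a normal approximation to the t quantile.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / df                      # residual variance
    se = np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)      # two-sided quantile
    est = c @ beta
    return est, est - z * se, est + z * se


# Toy two-group design (assumed): 60 subjects per group, true difference 1.0.
rng = np.random.default_rng(1)
X = np.zeros((120, 2))
X[:60, 0] = 1.0
X[60:, 1] = 1.0
y = X @ np.array([0.0, 1.0]) + rng.normal(scale=0.5, size=120)
c = np.array([-1.0, 1.0])                            # group 2 minus group 1

est, lo, hi = contrast_ci(X, y, c)
print(est, lo, hi)
```

An interval that is narrow relative to the estimate indicates a well-determined effect, whereas an interval that is wide relative to an estimate near zero is consistent with no effect at that voxel.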
Because there were 4 software upgrades, we also analyzed the interaction of software version and disease. We used the same basic design matrix as described for the interaction with scanners, but the contrasts now included cases and controls from each software version, with age, gender, and intracranial volume as covariates. As with the effect of scanner, there was no significant interaction of software version with group, the highest Z-score being 3.65 with a corrected p value of 0.880.