Manual test–retest reliability
The average relative volume difference from the first to the second manual measurement of ICV in the five subjects was −0.4 ± 0.76% (range, −1.2% to 0.83%) for the images obtained from the 1.5T scanner and −0.3 ± 0.5% (range, −0.2% to 0.6%) for those obtained from the 3T scanner. The average Dice coefficients (DC) were 0.94 ± 0.01 and 0.96 ± 0.02 for subjects scanned with the 1.5T and 3T scanner, respectively. The ICVs obtained from the 3T scanner were 0.74 ± 0.3% larger than those obtained from the 1.5T scanner (paired t-test, p < 0.001). This implies a slight systematic difference between field strengths even for manual segmentation, but the difference is well within the likely calibration error.
We performed four different analyses of accuracy based on the five subjects in Group 1:
1. Relative volume differences and their magnitude between manually and automatically calculated ICV: As described in Section 2.7, the relative volume differences between the manual and automated ICV measurements, and their magnitude, were calculated (, and ), where, in Eq. 1, V2 is ICV_MANUAL and V1 is the ICV measured with an automated method (ICV_BET, ICV_SPMA, ICV_SPMB or ICV_RBM) for each of the bias correction methods (FAST, SPM and N3). The magnitude of the relative volume difference was calculated as a measure of robustness.
Relative (DIFF) and magnitude (ADIFF) volume difference, DC (Dice coefficient) and ICC using FAST bias correction along with four ICV measurement methods. Positive differences mean that the manual estimation was larger.
Relative (DIFF) and magnitude (ADIFF) volume difference, DC (Dice coefficient) and ICC using SPM bias correction along with four ICV measurement methods. Positive differences mean that the manual estimation was larger.
Relative (DIFF) and magnitude (ADIFF) volume difference, DC (Dice coefficient) and ICC using N3 bias correction along with four ICV measurement methods. Positive differences mean that the manual estimation was larger.
2. Overlap between ICV_MANUAL and the other automated methods: Overlap measures expressed as Dice coefficients (, and ).
3. Correlation between manually and automatically calculated ICV (ICC): , and show the degree of correlation between ICV_MANUAL and the automated methods.
4. Association between scanner field strength and method accuracy: The measurement of the total false-positive and false-negative error shows that the SPM-tissue class method and BET resulted in more negative error than RBM on the images obtained from the 3T scanner ( (d,f)), and the SPM-tissue class method had more positive error than RBM in images obtained at 1.5T (, ).
Fig. 6 Average positive and negative error by slice (n = 12) between manually segmented ICV and BET, SPM-tissue class, and RBM method results (Slice 19 is the left slice in the brain, Slice 129 is the top) in five subjects scanned at 1.5T and (more ...)
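The accuracy measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, masks are represented as flat 0/1 sequences rather than image volumes, and the relative difference is normalised by the manual volume, which is an assumption because Eq. 1 is not reproduced in this section. The sign convention follows the table captions (positive means the manual estimate was larger).

```python
def relative_volume_difference(v_manual, v_auto):
    """Signed relative volume difference (DIFF) in percent.

    Positive values mean the manual volume is larger.
    Assumption: normalised by the manual volume; Eq. 1 of the paper
    may use a different denominator.
    """
    return 100.0 * (v_manual - v_auto) / v_manual


def magnitude_volume_difference(v_manual, v_auto):
    """Magnitude of the relative volume difference (ADIFF) in percent."""
    return abs(relative_volume_difference(v_manual, v_auto))


def dice_coefficient(mask_a, mask_b):
    """Dice coefficient 2|A and B| / (|A| + |B|) for two binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as a perfect match.
    return 2.0 * overlap / total if total else 1.0


def false_positive_negative(manual, auto):
    """Count voxels labelled only by the automated mask (false positive)
    and only by the manual mask (false negative)."""
    fp = sum(1 for m, a in zip(manual, auto) if a and not m)
    fn = sum(1 for m, a in zip(manual, auto) if m and not a)
    return fp, fn


# Toy example with made-up values (not data from the study).
manual_mask = [1, 1, 1, 1, 0, 0]
auto_mask = [0, 1, 1, 1, 1, 0]
diff = relative_volume_difference(1400.0, 1386.0)
dice = dice_coefficient(manual_mask, auto_mask)
fp, fn = false_positive_negative(manual_mask, auto_mask)
```

Applying `false_positive_negative` to one slice at a time would give per-slice error counts of the kind summarised in Fig. 6.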
shows that FAST+SPM (A and B) in comparison with FAST+BET or FAST+RBM does not perform well and the resulting ICVs are smaller than the gold standard in images obtained from the 3T scanner. Both FAST+RBM and FAST+BET perform well on 3T images. However, the performance of FAST+BET is not as good as FAST+RBM in the case of the 1.5T scanner.
When SPM is used for bias correction, SPMB performs better than SPMA in computing ICV on 3T images, while SPMA yields better results than SPMB on 1.5T images (). The results obtained by the BET and RBM methods are comparable on both scanners, but the relative volume difference and its magnitude are smaller for RBM than for BET on 1.5T images.
When N3 is used for bias correction, ICV results on 1.5T images are too large, while ICV results on 3T images are closer to the manual reference. shows that the N3+RBM pipeline performs nominally better than any other pipeline.
, and show that RBM is the most robust and accurate method for ICV measurement on images obtained from scanners with different field strengths, regardless of which bias correction method is used. Using the other automated methods (SPM or BET) to estimate ICV yielded acceptable intraclass correlation coefficients between 1.5T and 3T images, but both show a larger systematic bias (relative volume difference) than the RBM method. Both SPMA and SPMB consistently overestimated ICV on 1.5T images and underestimated ICV on 3T images. When BET was used, this systematic bias was inverted.
The relative volume difference, its magnitude and DC were calculated between manual measurement and automated ICV measurements for two subjects in Group 2, after applying SPM's bias correction method (). confirms that RBM is the most robust and accurate method for ICV measurement even in patients with AD for images obtained from scanners with different field strengths. When different ICV measurement methods were applied to all the subjects in Group 2, RBM was more consistent between different field strengths (ICC = 0.97) in comparison with the other methods, BET (ICC = 0.96), SPMA (ICC = 0.65) and SPMB (ICC = 0.7).
Relative (DIFF) and magnitude (ADIFF) volume difference with respect to the manual ground truth, DC (Dice coefficient) and ICC using four ICV measurement methods for two subjects with AD. Positive differences mean that the manual estimation was larger.
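The consistency between field strengths reported above is expressed as an intraclass correlation coefficient. The text does not state which ICC form was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); the function name and the toy values are hypothetical.

```python
def icc_absolute_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: one row per subject, one column per measurement condition
    (for example, ICV at 1.5T and ICV at 3T).
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of conditions per subject
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy values: a constant offset between conditions lowers absolute agreement.
icc_identical = icc_absolute_agreement([[1400, 1400], [1500, 1500], [1350, 1350]])
icc_offset = icc_absolute_agreement([[1, 2], [2, 3], [3, 4]])
```

Because this form measures absolute agreement, a purely systematic offset between scanners reduces the coefficient even when the two sets of measurements are perfectly correlated.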
For further analysis, we used RBM to obtain ICV measurements on the images of Group 4. Measurements obtained on 1.5T images tended to be smaller than those obtained on 3T images (relative difference −0.1 ± 2.5%).
Influence of field strength on ICV measurement
The intensity of intraventricular and cisternal CSF was measured with the sampling method described above (see Section 2.6) in the 10 MR scans of Group 1. The average CSF intensity difference between intraventricular and cisternal CSF was 16 ± 60% in images obtained at 1.5T and 32 ± 54% in images obtained at 3T, with higher intensity in the intraventricular region than in the cisternal region before applying bias correction. Means, standard deviations, and ranges of the intraventricular and cisternal CSF intensity difference were calculated for all images after application of the different non-uniformity correction methods (FAST, SPM5 and N3) (). FAST and SPM bias correction achieved greater uniformity of CSF signal than N3. These demonstrated greater uniformity for corrected 1.5T images (smaller relative difference) than for 3T images.
Descriptive statistics of intra-ventricular and cisternal CSF intensity difference in Group 1 for three bias correction methods. Positive values indicate higher intensity in the intraventricular CSF.
shows that none of the bias correction methods was able to eliminate the problem of CSF intensity in the cisternal region being lower than that in the intra-ventricular region in images obtained at 3T.
Fig. 7 Average CSF intensity of intraventricular and cisternal CSF in five subjects (10 MRIs) obtained at 1.5T and 3T with different bias correction methods. (a) FAST, (b) SPM, (c) N3. Horizontal lines: median; boxes: interquartile ranges; whiskers: range; circle: (more ...)
N3's performance was poor in reducing the CSF intensity difference between the abovementioned regions as shown in . The imperfect result of N3 in comparison with the other two methods may have been a consequence of applying this method to the full volume instead of a skull-stripped image. SPM and FAST were able to reduce this discrepancy to an acceptable level in 1.5T images, but they too were unsuccessful for 3T images.
The intensities of both the intraventricular and the cisternal CSF at 3T and 1.5T showed somewhat skewed Gaussian distributions, which could be substantially normalized by using the log transform. Therefore, log intensity was used for determining the difference between intraventricular and cisternal CSF in images obtained at different field strengths. A Kolmogorov–Smirnov (KS) test indicated that the intensity difference between intraventricular and cisternal CSF measured at 1.5T was normally distributed (p > 0.1), whereas the intensity difference measured at 3T was significantly different from a normal distribution (p < 0.05) ().
Histograms of the subtraction distribution for a single subject scanned with (a) 1.5T and (b) 3T.
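The normality check described above can be illustrated with a small standard-library sketch. It makes several assumptions that are not stated in the text: intraventricular and cisternal samples are paired element by element, the reference normal distribution uses the sample mean and standard deviation, and the p-value comes from the asymptotic Kolmogorov series with a small-sample correction factor. Estimating the parameters from the same sample makes the p-value conservative, so this is a rough screen and not a reproduction of the reported test. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def log_intensity_difference(ventricular, cisternal):
    """Elementwise difference of log intensities (assumes paired samples)."""
    return [math.log(v) - math.log(c) for v, c in zip(ventricular, cisternal)]


def ks_normality(values):
    """One-sample KS statistic against a normal distribution fitted to the data.

    Returns (D, p), where p is an asymptotic approximation.
    """
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))

    # Largest gap between the empirical step function and the fitted CDF.
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)

    # Asymptotic Kolmogorov distribution with a small-sample correction.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly here; p is essentially 1
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Toy data: evenly spaced normal quantiles versus a bimodal sample.
near_normal = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
bimodal = [0.0] * 25 + [10.0] * 25
_, p_normal = ks_normality(near_normal)
_, p_bimodal = ks_normality(bimodal)
```

In this toy example the near-normal sample is not rejected, while the bimodal sample is, which mirrors the kind of contrast reported between the 1.5T and 3T difference distributions.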
It is conceivable that the CSF intensity difference we found could be due to the specific scanners, coils and acquisition parameters used in Group 1. We therefore applied the CSF sampling method to the subjects of Group 3 and Group 4, whose images were obtained with different scanners. On average, the CSF intensity in the intraventricular region was 11 ± 9% higher than the CSF intensity in the cisternal region after applying SPM bias correction to images obtained at 3T in Group 3, while the observed intensity difference in 1.5T images was smaller, 2 ± 16%. In Group 4, even after the multi-step corrections applied as part of the ADNI pre-processing (Gradwarp, B1 correction and N3), three out of five subjects still showed this difference, although it was not consistent across subjects. However, the relative difference was smaller than for subjects in Group 1 and Group 3. For subjects in Group 4, the observed intensity difference was 2 ± 4% in 1.5T images and 7 ± 5% in images obtained at 3T.
The total CSF volume of the brain (the combination of the CSF in the intraventricular and subarachnoid spaces) and the cisternal CSF volume were estimated with the method described in Section 2.6 in the 10 MR scans of Group 1. The total CSF volume obtained from the 1.5T scanner was 14.3 ± 6.2% larger than that obtained from the 3T scanner. Furthermore, the estimated volume of the cisternal CSF obtained from 1.5T images was 16.8 ± 7.9% larger than that obtained from 3T images. This difference implies systematic differences in ICV measurements, particularly peripherally, when using established methods. Means and standard deviations of the total, intraventricular and cisternal CSF volume were calculated for all images ().
Descriptive statistics of total, intra-ventricular and cisternal CSF volumes in Group 1.