Test-retest reliability of traditional T1 sequence

The traditional MP-RAGE sequence with anisotropic geometry (1.3×1.0×1.3 mm) served as a “gold standard” against which the new sequences were compared. Intra-class correlation analysis demonstrated a high level of reliability for all measures (). Surface-based measures were particularly reproducible, with all computed ICCs falling above 0.95. Subcortical volumes were typically reliable; only measures of the pallidum fell below 0.9.

| **Table 2** Test-retest intra-class correlation coefficients (95% confidence interval) by sequence |

Surface maps of cortical thickness reliability reveal high local ICC values across most of the cortex (). Areas of relatively low reliability include entorhinal, medial orbitofrontal, lingual, and right rostral middle frontal cortex.
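As a rough illustration of how a test-retest ICC of the kind reported above can be computed, the sketch below implements the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), described by Shrout and Fleiss. The choice of ICC form, the function name `icc_2_1`, and the thickness values are assumptions made for illustration only; they are not taken from the study.

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    x is an (n_subjects, k_sessions) array holding one morphometric measure.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-subject means
    col_means = x.mean(axis=0)   # per-session means

    # Sums of squares for a two-way layout without replication
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical mean cortical thickness (mm) for five subjects, two sessions
thickness = np.array([
    [2.41, 2.43],
    [2.55, 2.52],
    [2.30, 2.33],
    [2.62, 2.60],
    [2.47, 2.49],
])
print(f"ICC(2,1) = {icc_2_1(thickness):.3f}")
```

Because ICC(2,1) penalizes systematic shifts between sessions as well as random error, it is a conservative choice for test-retest data; a consistency form such as ICC(3,1) would ignore a constant session offset.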

Effects of voxel size

To explore the effects of voxel size on morphometric measures and their reliability, we compared the MP-RAGE sequence with isotropic voxel geometry (1.0×1.0×1.0 mm) against the standard anisotropic MP-RAGE. With respect to session-to-session (test-retest) reliability, voxel size had no obvious effect on most morphometric measures. An exception was measures of pallidum volume, where reliability was low for the isotropic sequence relative to the anisotropic sequence. Surface maps of cortical thickness reliability were similar between the anisotropic and isotropic MP-RAGE sequences ().

Means for cortical measures (both thickness and volume) were higher for the isotropic sequence than for the anisotropic MP-RAGE (*p* < 0.01, Bonferroni corrected). Bias between these sequences was observed in all cortical areas, but was particularly strong in the frontal and parietal lobes (). All values were significant at *p* < 0.01, Bonferroni corrected.

| **Table 3** Mean (SD) cortical thickness difference between isotropic and anisotropic sequences (mm) |

Effects of parallel imaging acceleration

Second, we investigated the effect of conservative parallel imaging acceleration on morphometric measures by comparing an isotropic MP-RAGE sequence acquired with an acceleration factor of two to the anisotropic MP-RAGE. We found minimal effects of acceleration on the reliability of most morphometric measures (). Parietal and cingulate cortical thickness ICCs were noticeably lower in the accelerated sequence than in the standard anisotropic MP-RAGE, but were still highly reliable. As with the non-accelerated isotropic MP-RAGE, pallidum measures were the least reliable of all morphometric measures examined. Surface maps of cortical thickness reliability for the accelerated isotropic MP-RAGE were similar to those of the anisotropic sequence (), with regions of relatively low reliability in entorhinal, lingual, and right rostral middle frontal cortex. In contrast to the anisotropic MP-RAGE, however, reliability was also low for insular cortex, cuneus, pericalcarine cortex, and superior parietal cortex.

With respect to measurement bias, cortical thickness means were significantly higher for the accelerated isotropic MP-RAGE compared to the anisotropic MP-RAGE ().

Effects of multiecho sequence

Third, we compared the reliability of measures obtained from a high-bandwidth multiecho MP-RAGE sequence to that of the relatively low-bandwidth single-echo anisotropic MP-RAGE. Reliability for multiecho morphometric measures was generally high and comparable to anisotropic MP-RAGE measures (). In particular, pallidum volume measures were more consistent than such measures from other isotropic sequences. Surface maps of cortical thickness reliability for the accelerated multiecho MP-RAGE were similar to those of the standard anisotropic sequence (), with regions of relatively low reliability in entorhinal, lingual, medial orbitofrontal, and right rostral middle frontal cortex. Additionally, the multiecho sequence was less reliable than the anisotropic sequence for insular cortex and cuneus thickness.

As with other isotropic sequences, we found a significant bias towards higher cortical thickness measures using the multiecho MP-RAGE compared to the anisotropic MP-RAGE. This bias was consistent across the cortex and was particularly strong in frontal, parietal, and temporal regions ().

Voxel-wise maps of mean cortical thickness differences show the measurement bias between isotropic and anisotropic sequences to be pervasive across the cortex (). This bias is most evident between the anisotropic and multiecho MP-RAGE sequences, especially in frontal regions. Between isotropic sequences, cortical thickness differences are more limited with no perceptible bias towards one sequence or another.

Comprehensive statistical analysis

For cortical thickness measures, the percentage of variance explained by differences between subject means was near or above 90% for all measures, indicating a high degree of overall reliability (). Differences between sequence means were the next largest contributor to variance after differences between subjects. Subject × sequence interactions also contributed some variance, particularly for the cingulate. The proportion of variance accounted for by mean differences between scanning sessions, an aspect of test-retest reliability, was below 0.01% of total variance for all measures. Other terms contributed little to negligible variance (below 2%).
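The percentage-of-variance figures above can be illustrated with a sum-of-squares partition. The sketch below is a minimal version for a balanced subject × sequence × session design. It assumes the percentages are ratios of each term's sum of squares to the total sum of squares, which may differ from the variance-component estimator actually used in the study. The function name `variance_percentages`, the array layout, and the simulated values are all hypothetical.

```python
import numpy as np

def variance_percentages(y):
    """Percent of total sum of squares attributable to each term.

    y is a balanced array indexed as [subject, sequence, session].
    """
    y = np.asarray(y, dtype=float)
    n_sub, n_seq, n_ses = y.shape
    grand = y.mean()

    sub = y.mean(axis=(1, 2))   # subject means
    seq = y.mean(axis=(0, 2))   # sequence means
    ses = y.mean(axis=(0, 1))   # session means
    sub_seq = y.mean(axis=2)    # subject-by-sequence cell means

    ss_total = np.sum((y - grand) ** 2)
    ss_sub = n_seq * n_ses * np.sum((sub - grand) ** 2)
    ss_seq = n_sub * n_ses * np.sum((seq - grand) ** 2)
    ss_ses = n_sub * n_seq * np.sum((ses - grand) ** 2)

    # Interaction: cell means with both main effects removed
    inter = sub_seq - sub[:, None] - seq[None, :] + grand
    ss_sub_seq = n_ses * np.sum(inter ** 2)

    # Everything not captured by the terms above
    ss_other = ss_total - ss_sub - ss_seq - ss_ses - ss_sub_seq

    parts = {
        "subject": ss_sub,
        "sequence": ss_seq,
        "session": ss_ses,
        "subject x sequence": ss_sub_seq,
        "other": ss_other,
    }
    return {name: 100.0 * ss / ss_total for name, ss in parts.items()}

# Simulated thickness data (mm): 10 subjects, 4 sequences, 2 sessions
rng = np.random.default_rng(0)
subject_effect = rng.normal(2.5, 0.15, size=(10, 1, 1))
sequence_effect = np.array([0.0, 0.05, 0.05, 0.07]).reshape(1, 4, 1)
noise = rng.normal(0.0, 0.02, size=(10, 4, 2))
y = subject_effect + sequence_effect + noise

for name, pct in variance_percentages(y).items():
    print(f"{name:20s} {pct:6.2f}%")
```

In this simulation the subject term dominates because between-subject spread is much larger than the sequence offsets and the noise, which mirrors the qualitative pattern described in the text.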

Consistent with the paired comparisons of sequences above, as well as the percentage variance accounted for by mean sequence differences, repeated measures ANOVAs revealed an effect of sequence for all cortical thickness measures. Bonferroni-corrected post hoc contrasts of mean differences found that the anisotropic MP-RAGE measures were lower in all cortical regions compared to all isotropic MP-RAGE measures (). In addition, the high-bandwidth multiecho MP-RAGE produced higher measures than other isotropic sequences in frontal cortex (*p* <0.05, corrected) and lower measures in occipital cortex compared to the non-accelerated isotropic MP-RAGE (*p* <0.01, corrected). The accelerated single-echo MP-RAGE produced lower measures of temporal cortical thickness than other isotropic sequences (*p* <0.01, corrected). The main effect of scanning session and the interaction between session × sequence were not significant in any cortical region.

WM volume, with greater than 99.8% of variance explained by between-subject mean differences, was the most reliable measure examined in this study (). As with measures of cortical thickness, GM volume was significantly lower for the anisotropic MP-RAGE than for all isotropic sequences (*p* < 0.001, corrected). Between-subject differences nevertheless accounted for greater than 97% of total variance in GM volume measures, with differences between sequences contributing only 2.5%. The main effect of scanning session and the interaction between session × sequence were not significant for either WM or GM volume measures.

Subcortical volume measures were generally less precise than cortical or WM measures, although precision varied greatly by structure. Measures of the caudate and thalamus were among the most reliable with between-subject differences accounting for more than 98% and 95% of total variance, respectively (). Measures of the pallidum were the least reliable. The proportion of variance attributed to between-session mean differences was below 0.6% for all measures.

Repeated measures ANOVAs revealed a significant effect of sequence for all structures except the pallidum (). Measures of amygdalar and hippocampal volume were lower for the anisotropic MP-RAGE compared to all other sequences (*p* <0.05, Bonferroni corrected). In contrast, measures of amygdalar and hippocampal volume tended to be higher for the multiecho MP-RAGE compared to all other sequences, although this difference was only significant when comparing the multiecho MP-RAGE to non-accelerated (MPR, ISO) sequences (*p* <0.01, Bonferroni corrected). Measures of thalamus volume were also lower for the anisotropic MP-RAGE compared to both single echo isotropic (ISO, 1SH) sequences (*p* <0.05, Bonferroni corrected), while the multiecho sequence produced lower measures of caudate volume compared to the accelerated isotropic single-echo MP-RAGE (*p* <0.01, Bonferroni corrected). No main effect of session or interaction between session × sequence was significant among subcortical volume measures.

No mean differences were sufficient to reach the minimal threshold required to achieve a power of 0.9. Thus, we cannot reasonably conclude that non-significant results were free of type II error, and it is therefore possible that further measurement bias exists between sequences that is undetectable given our sample size. It is possible, however, to determine through power analysis the minimum difference at which reasonable power (1-β > 0.9) remains for each measure of interest, and therefore to establish a practical threshold of reliability for each structure (). For measures of cortical thickness, most differences greater than 2% of the mean can be reliably detected given our relatively small sample size; volumetric difference thresholds vary greatly by structure and sequence, from 0.5–1% for surface-based measures of white matter volume to 16.6% for the segmented pallidum.

| **Table 4** Minimum detectable difference between groups by sequence; α = 0.05, 1-β = 0.9, two-tailed |
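To make the idea of a minimum detectable difference concrete, the sketch below computes it for a two-group, two-tailed comparison. It uses a normal approximation and assumes the conventional α = 0.05 with power 0.9. The authors' exact procedure (for example, a t-based calculation) is not specified here, so this is an approximation under stated assumptions. The function name `min_detectable_difference` and the example standard deviation, mean, and group size are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_difference(sd, n_per_group, alpha=0.05, power=0.9):
    """Smallest true mean difference detectable between two equal groups.

    Normal approximation for a two-tailed test:
        delta = (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n_per_group)
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = std_normal.inv_cdf(power)          # quantile for desired power
    return (z_alpha + z_power) * sd * sqrt(2 / n_per_group)

# Hypothetical measure: mean thickness 2.5 mm, measurement SD 0.03 mm,
# 15 subjects per group
mean_thickness = 2.5
delta = min_detectable_difference(sd=0.03, n_per_group=15)
print(f"Minimum detectable difference: {delta:.4f} mm "
      f"({100 * delta / mean_thickness:.2f}% of the mean)")
```

Expressing the threshold as a percentage of the mean, as in the last line, allows measures on very different scales (thickness in mm, volume in mm³) to be compared directly.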

A subsequent repeated measures ANCOVA introduced sex as an additional between-subjects factor and the linear effect of age as a covariate. After Bonferroni correction for multiple comparisons, putamen and pallidum volumes were found to be significantly and negatively correlated with age. No significant main effect of sex or significant interaction with sex or age was observed.