In the current study, we considered three 18F-FDG PET global indices (PALZ, HCI, and metaROI) providing objective measures of AD-related hypometabolism, and we compared them both in technical terms and in terms of diagnostic performance on several independent groups of patients at different stages of AD, taken from the 3 largest 18F-FDG PET datasets currently available (ADNI, NEST-DD, and EADC-PET).
The global metrics differed in complexity, technical requirements, and level of automation. Their diagnostic performance varied considerably with test dataset and disease stage, indicating that no single global index can be regarded as the best-performing. For all indices, diagnostic performance improved with increasing disease severity, whereas in MCI due to AD (fast and slow converters), diagnostic performance was not consistent across datasets.
In the literature, there are few reports of the diagnostic performance of 18F-FDG PET global metrics in AD patients at different disease stages. PALZ performance was recently assessed in both ADNI and NEST-DD AD patient groups (15): despite using different normative datasets to assess specificity (either ADNI or NEST-DD, according to AD patient group), the authors found similar ROC curves and AUCs; to our knowledge, PALZ diagnostic performance in MCI due to AD has never been studied. HCI performance was previously assessed in terms of its ability to distinguish between AD patients, MCI patients who converted to AD, stable MCI patients, and controls and to predict rates of progression from MCI to probable AD (16). Because in the current paper we used a modified version of HCI, current findings could not reliably be compared with previous ones. Finally, metaROI index performance was previously assessed in terms of sensitivity to detect longitudinal change in both cognitive and functional measurements within AD and MCI (17); to our knowledge, metaROI diagnostic performance in AD patients at different disease stages has never been studied.
All three 18F-FDG PET global metrics under comparison were developed specifically for the discrimination between AD patients and controls. 18F-FDG PET metrics of AD-like hypometabolism cannot be used either for differential diagnosis among various forms of dementia (any of which could nonetheless yield abnormal scores on these metrics) or for highlighting vascular damage (which should be assessed using different techniques). Thus, patients with dementing diseases other than AD were not considered for the current investigation.
The diagnostic performance of 18F-FDG PET indices was assessed in patients with AD at different stages (ranging from MCI due to AD to moderate AD), whereas MCI patients who did not convert to AD during follow-up were not considered. Although it would have been interesting to compare the ability of 18F-FDG PET indices to identify patients who will never convert to dementia (true-negatives), given that the minimum observation time required to ensure no conversion is 5–6 y (25), we could not exclude the possibility that patients who had not converted during the follow-up time (much shorter than 5 y for most available MCI patients) would have converted in the future, and we would thus have had unreliable results.
The control dataset used in ROC analyses to assess the diagnostic performance of 18F-FDG PET global indices on different AD patient datasets included controls from the NEST-DD and EADC-PET databases. Despite the many strengths of ADNI, in that study the healthy subjects may not be fully representative of the healthy population, as they have been shown (although in quite a small sample) to have a high rate of Pittsburgh compound B positivity (26), probably due to the recruitment modality. On the other hand, achieving a representative normative database is quite difficult, independently of selection modality. Although controls from the EADC-PET and NEST-DD datasets have shown homogeneous sociodemographic, clinical, and metabolic features across different enrollment centers, one should remain cautious about how well they represent the healthy elderly population. The use of the same normative dataset to assess the diagnostic performance of all 18F-FDG PET global metrics under comparison on each test dataset improved the reliability of head-to-head comparisons. Furthermore, the independence of the normative dataset from all datasets used to develop and optimize different metrics made it possible to avoid any circularity, which could have biased the comparison.
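The logic of scoring every test dataset against one shared, independent control group can be illustrated with a minimal sketch. It assumes that higher index values indicate more AD-like hypometabolism and estimates the AUC as the probability that a randomly chosen patient scores higher than a randomly chosen control (the Mann-Whitney estimate). The cohort names and all values are hypothetical and are not study data.

```python
def auc(patient_scores, control_scores):
    """Mann-Whitney AUC estimate: fraction of patient/control pairs in which
    the patient has the higher score (ties count as half)."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical index values; higher = more AD-like (an assumption of this sketch).
shared_controls = [0.2, 0.4, 0.5, 0.7]
test_sets = {
    "cohort_A": [0.6, 0.9, 1.1, 1.3],
    "cohort_B": [0.3, 0.6, 0.8, 1.0],
}

# The same control group is reused for every test dataset, so differences in
# AUC reflect the patient groups rather than the choice of normative data.
aucs = {name: auc(scores, shared_controls) for name, scores in test_sets.items()}
print(aucs)
```

Because the controls are held fixed, any difference between the two AUCs in this toy example comes only from the patient scores.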
Because each algorithm handled age differently, the age of the controls could be a potential confounding factor. Age correction embedded in PALZ and metaROI computation removed the variance due to age. Because the current implementation of HCI does not take age into account, yet a significant linear dependence on age was found in the normative dataset, further work should be done to investigate the effect of age on HCI and to properly correct for such an effect under all possible diagnostic conditions.
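As an illustration of what a linear age correction could look like, the sketch below fits a straight line of index value against age in a normative group and then removes the age-predicted component from each score. This is a generic residualization under the assumption of a purely linear age effect; it is not the correction used by PALZ or metaROI, nor a validated adjustment for HCI. The function names, the reference age, and all values are hypothetical.

```python
def fit_line(ages, scores):
    """Ordinary least-squares fit of score = intercept + slope * age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_score = sum(scores) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_age
    return slope, intercept


def age_adjust(score, age, slope, ref_age):
    """Shift a score to the value expected at ref_age, removing the linear age trend."""
    return score - slope * (age - ref_age)


# Hypothetical normative data (not study data): index rises linearly with age.
control_ages = [60, 65, 70, 75, 80]
control_scores = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_line(control_ages, control_scores)
adjusted = [age_adjust(s, a, slope, ref_age=70)
            for s, a in zip(control_scores, control_ages)]
print(slope, adjusted)
```

In this toy example the adjusted control scores no longer vary with age. Whether a slope estimated in controls is also appropriate for patients is exactly the open question about correcting under all diagnostic conditions.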
Some limitations should be considered in the interpretation of the present results. First, as visual rating by expert physicians still represents the gold standard clinical method of assessing AD-like hypometabolism on 18F-FDG PET, the diagnostic performance and accuracy of global metrics should be preliminarily compared with visual rating by independent expert raters. Second, the 3 global metrics included in this head-to-head comparison are not the only automated methods to assess AD-like hypometabolism on 18F-FDG PET images; the 3 metrics should be further compared with other available voxel-based techniques, such as single-case SPM (27) or 3-dimensional stereotactic surface projection and NEUROSTAT-based indices (28). Third, patients with MCI due to AD were disaggregated into fast and slow converters on the basis of conversion time since enrollment; however, because the time of symptom onset is unknown (fast converters could be enrolled after having symptoms for a long time), caution should be used when considering such subgroupings. Finally, considerations about the user friendliness of the three 18F-FDG PET summary metrics are based on their current implementation; however, they were all implemented for academic use. Additional programming can make them more automated and user-friendly for clinical settings.