Current neuropathologic consensus criteria for diagnosis of dementia yield a classification of processes that likely contributed to dementia in a given individual. While dementia diagnosis currently relies on clinical criteria, practicing neuropathologists and researchers might benefit from a simple, accurate risk scoring protocol for the neuropathologic diagnosis of dementia. Using 232 consecutive autopsies from the population-based Adult Changes in Thought study, we developed two logistic regression-based risk scoring systems: one using neuropathologic measures only and a second additionally including demographic information. Inverse-probability weighting was used to adjust for the selection bias inherent in autopsy-based studies of dementing illnesses. Both systems displayed high levels of predictive accuracy; bias-adjusted area-under-the-curve statistics were 0.78 (95% CI 0.71, 0.85) and 0.87 (95% CI 0.83, 0.92), respectively, indicating improved performance with the inclusion of demographic characteristics, specifically age and birth cohort. Application of the combined neuropathology/demographic model yielded bias-adjusted sensitivity and specificity of 81% each. In contrast, application of the NIA-Reagan criteria yielded sensitivity and specificity of 53% and 84%, respectively. Our proposed scoring systems provide neuropathologists with tools to make a diagnosis and to interpret it in light of known sensitivity and specificity estimates. Evaluation in independent samples will be important to verify our findings.
Current neuropathologic consensus criteria for dementia yield a classification of processes that likely contributed to dementia in a given individual. Two such processes, which are commonly co-morbid contributors to the dementia syndrome, are Alzheimer's disease (AD) and vascular brain injury (VBI) [2-7]. Neocortical Lewy bodies (nLBs) are a third independent pathologic correlate of dementia, often observed in combination with AD and/or VBI, although typically with lower prevalence in community-based samples [8-10].
Unfortunately, current neuropathologic criteria do not support a diagnosis of dementia; diagnoses of dementia currently rely on clinical criteria. As a result, neuropathologists often cannot make definitive diagnoses, because the clinical history may be remote or unavailable. We believe practicing neuropathologists and researchers might benefit from a simple protocol whereby the cumulative burden of co-morbid neuropathologic contributors to dementia can be assessed and interpreted in light of known sensitivity and specificity estimates. Neuropathologists could then make quantitatively rigorous statements about the likelihood that the observed pathologic changes warrant a diagnosis of dementia.
Previous work on neuropathologic-based diagnostic risk scoring protocols has been limited either by sample size or by a lack of generalizability to community-based settings. Newell and colleagues reported on the application of the NIA-Reagan criteria to 84 brains from the Massachusetts Alzheimer Disease Research Center Brain Bank. They showed good general agreement, with 38 of 63 (60%) clinically demented patients assigned to the ‘high likelihood’ category and 17 of 21 (81%) non-demented patients assigned to the ‘low likelihood’ category. Others have also evaluated the performance of the NIA-Reagan criteria, with similar results [8, 12-16]. More recently, Jellinger evaluated previously proposed criteria specific to individual dementing disorders, although the study sample was limited to demented individuals. Gold and colleagues proposed thresholds for a series of pathologic substrates that performed well as diagnostic criteria for mixed dementia. Clinical evaluation of dementia in that study was limited, however, in that cognitive status was based on the Clinical Dementia Rating; in addition, the study was hospital-based, which further limits generalizability.
In this manuscript we seek to develop a simple, accurate risk scoring system for a neuropathologic-based diagnosis of dementia, applicable in general neurological clinical settings. The evaluation of neuropathologic risk factors for dementia or AD, however, is well known to be subject to potential selection bias. Few studies have been in a position to adjust for selection bias, since this requires comprehensive information on individuals not selected for autopsy. As a large, population-based study of aging that collects such information on all enrollees, the Adult Changes in Thought study is well positioned to do so.
The Adult Changes in Thought (ACT) study is an ongoing population-based prospective study of incident AD and dementia among individuals aged 65 years and older. Between 1994 and 2003, ACT enrolled 3,392 participants from a population base of 23,000 members of Group Health Cooperative (GHC), a large health care provider in King County, Washington. For all enrollees, demographic, medical history, and functional status information was collected at baseline and at subsequent biennial follow-up visits. At each visit, participants were evaluated with a protocol-based examination using the Cognitive Abilities Screening Instrument (CASI) until diagnosis of dementia, withdrawal, or death. A CASI score of 85 or less triggered a comprehensive dementia workup, with a consensus-based clinical dementia diagnosis following Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria. Additional details are presented elsewhere. Enrollees were required to be dementia-free at baseline according to these criteria; subsequent diagnoses of dementia were therefore taken to be incident cases.
Participants were asked to consent to brain autopsy. Participants who had not decided whether to provide consent were asked again at subsequent biennial visits. In accordance with state law, next-of-kin were also required to provide informed consent for autopsy after death. To minimize misclassification of dementia status at the time of autopsy among participants without a dementia diagnosis at their last follow-up visit, we excluded those who died more than 2 years after their last visit. Further, individual cases were excluded from evaluation if they were found to have known, less common causes of dementing illness or delirium, or a history of chronic alcoholism.
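The screening and exclusion rules in the two paragraphs above can be written as simple predicates. The sketch below is illustrative only: the record layout and field names (`Decedent`, `years_from_last_visit_to_death`, `excluded_condition`) are hypothetical and are not ACT data structures.

```python
from dataclasses import dataclass

CASI_WORKUP_CUTOFF = 85  # a CASI score of 85 or less triggers a dementia workup


def needs_dementia_workup(casi_score: float) -> bool:
    """True when the CASI score is at or below the workup cutoff."""
    return casi_score <= CASI_WORKUP_CUTOFF


@dataclass
class Decedent:
    # Hypothetical record layout; field names are illustrative only.
    demented_at_last_visit: bool
    years_from_last_visit_to_death: float
    autopsied: bool
    # Less common dementing illness, delirium, or chronic alcoholism.
    excluded_condition: bool = False


def in_analysis_sample(d: Decedent, max_gap_years: float = 2.0) -> bool:
    """Apply the autopsy-sample exclusions described in the text."""
    if not d.autopsied or d.excluded_condition:
        return False
    # The 2-year window applies only to those without a dementia diagnosis,
    # to limit misclassification of dementia status at death.
    if not d.demented_at_last_visit and d.years_from_last_visit_to_death > max_gap_years:
        return False
    return True
```

Note that a participant with a dementia diagnosis is retained regardless of how long after the last visit death occurred, since dementia status is assumed not to revert.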
Following fixation, all autopsied brains were evaluated for gross lesions, including the extent of atherosclerosis and the number of macroscopic cystic infarcts. Formalin-fixed tissue samples were dissected and embedded in paraffin prior to sectioning and staining. We limited our evaluation to cystic infarcts, since acute and subacute infarcts were thought unlikely to have contributed to long-standing cognitive decline. Semi-quantitative neocortical neuritic plaque frequency (based on the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) scoring system), neurofibrillary tangle distribution (by Braak stage), amyloid angiopathy, neocortical and brainstem Lewy bodies, and hippocampal sclerosis were evaluated by established methods, as previously described. Cerebral microinfarcts were counted in the frontal, temporal, parietal, and occipital lobes and in the basal ganglia and thalamus. All evaluations were performed blinded to the clinical diagnosis.
We constructed two systems for the neuropathologic diagnosis of dementia: one based solely on neuropathologic measures (denoted ‘NP only’) and a second that additionally includes selected demographic characteristics known to be risk factors for dementia/AD (denoted ‘NPD’). For the latter we considered age, birth cohort, gender, education, and the presence of at least one apolipoprotein E (APOE) ε4 allele.
For both the NP only and NPD systems, we fit a logistic regression model for the binary outcome of whether or not an individual had a clinical diagnosis of dementia. While researchers have at their disposal a range of algorithms for constructing prediction models, for a binary outcome a correctly specified logistic regression model has been shown to be optimal in the sense of maximizing the receiver operating characteristic (ROC) curve at every point. For each system we began by fitting a ‘full’ model, which included all relevant covariates. We then developed a ‘final’ model using backward elimination; across all models, alternative stepwise algorithms yielded the same results.
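The fitting and backward-elimination steps can be sketched as follows. This is a minimal NumPy illustration, not the authors' R code: it assumes elimination is driven by two-sided Wald p-values at the 0.05 level (the text does not state the criterion), and the simulated data and column names (`stage`, `noise`) are hypothetical.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson. X includes an intercept column.
    Returns coefficient estimates and model-based standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def wald_p(b, se):
    """Two-sided Wald p-value, 2 * (1 - Phi(|b / se|))."""
    return math.erfc(abs(b / se) / math.sqrt(2.0))


def backward_eliminate(X, y, names, alpha=0.05):
    """Repeatedly drop the covariate with the largest p-value until all
    remaining covariates have p < alpha. Column 0 (intercept) is always kept."""
    keep = list(range(1, X.shape[1]))
    while keep:
        beta, se = fit_logistic(X[:, [0] + keep], y)
        pvals = [wald_p(b, s) for b, s in zip(beta[1:], se[1:])]
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            break
        del keep[worst]
    return [names[j] for j in keep]


# Simulated example (hypothetical data): one real predictor, one pure-noise column.
rng = np.random.default_rng(0)
n = 500
stage = rng.integers(0, 7, n).astype(float)  # stand-in for an ordinal stage 0-6
noise = rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-3.0 + 0.8 * stage)))).astype(float)
X = np.column_stack([np.ones(n), stage, noise])
selected = backward_eliminate(X, y, ["intercept", "stage", "noise"])
print(selected)
```

Likelihood-ratio tests are a common alternative to Wald tests for the elimination step; either choice fits the same loop.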
A standard strategy for developing a risk scoring system is to split the available data into two sub-samples: a model-building sub-sample and a validation sub-sample. Although the ACT autopsy sample is large compared with other autopsy-based dementia studies, it is still relatively small. To make optimal use of the available information, we built on a strategy outlined by Harrell et al. Specifically, we constructed the models using the entire ACT autopsy sample. Because the same data are used both to build and to evaluate the models, however, naïve estimates of their performance are likely optimistic owing to over-fitting, and therefore not externally generalizable. To overcome this difficulty, the bootstrap (based on 1,000 replicates) was used to estimate the optimism associated with over-fitting, and this estimate was then used to adjust the performance measures.
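A minimal sketch of the generic Harrell-style optimism correction follows; it is not the authors' implementation. The caller supplies `fit(rows)`, which must repeat the entire model-building process (including variable selection), and `metric(model, rows)`, for example an AUC. The toy "model" below, which simply memorizes its training rows, is hypothetical and chosen only because its optimism is easy to see.

```python
import random


def optimism_corrected(data, fit, metric, n_boot=1000, seed=0):
    """Apparent performance minus the average bootstrap optimism.

    fit(rows) -> model; metric(model, rows) -> float (higher is better).
    Optimism in one replicate = performance on the bootstrap sample it was
    fitted to minus performance of the same model on the original data.
    """
    rng = random.Random(seed)
    apparent = metric(fit(data), data)
    n = len(data)
    optimism = 0.0
    for _ in range(n_boot):
        boot = [data[rng.randrange(n)] for _ in range(n)]  # sample with replacement
        model = fit(boot)
        optimism += metric(model, boot) - metric(model, data)
    return apparent - optimism / n_boot


# Hypothetical toy model: it "recognizes" only rows seen during fitting, so its
# apparent accuracy is 1.0 but it is heavily over-fitted.
rows = list(range(50))
corrected = optimism_corrected(
    rows,
    fit=lambda r: set(r),
    metric=lambda model, r: sum(x in model for x in r) / len(r),
)
print(round(corrected, 3))
```

For the memorizing model the corrected value falls to roughly the share of distinct rows a bootstrap sample contains (about 0.63), illustrating how the correction removes in-sample optimism.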
To evaluate potential selection bias, we compared characteristics of ACT participants who died and did not undergo autopsy with those of participants who did undergo autopsy. The comparison focused on characteristics previously reported as being related to consent, including dementia status, age, race, gender, education, marital status, and depression. To adjust for potential selection bias we used inverse-probability weighting. The weights were obtained by fitting logistic selection models to participants who died, with the outcome taken to be whether or not an autopsy was performed. Sensitivity analyses for the choice of covariates included in the selection model were performed to ensure robustness of the results. To avoid making strong assumptions about its functional form, age was included in all selection models via a natural smoothing spline.
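The weighting step can be illustrated with simulated data. This NumPy sketch assumes a selection model with linear age and dementia status only (the text uses a natural spline for age and further covariates), and the selection mechanism is hypothetical, so the true dementia prevalence among all decedents is known by construction. Each autopsied decedent receives weight 1 / P(autopsy | covariates).

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def ipw_weights(Z, autopsied):
    """Fit P(autopsy | Z) among all decedents; return 1 / p_hat for the autopsied."""
    beta = fit_logistic(Z, autopsied.astype(float))
    p_hat = 1.0 / (1.0 + np.exp(-Z @ beta))
    return 1.0 / p_hat[autopsied]


# Simulated decedents (hypothetical selection mechanism).
rng = np.random.default_rng(1)
n = 20000
age = rng.uniform(70, 100, n)
demented = rng.random(n) < 0.3
logit_sel = -3.0 + 0.05 * (age - 70) + 0.5 * demented  # older/demented autopsied more often
autopsied = rng.random(n) < 1.0 / (1.0 + np.exp(-logit_sel))

Z = np.column_stack([np.ones(n), age, demented])
w = ipw_weights(Z, autopsied)
naive = demented[autopsied].mean()                      # biased upward by selection
weighted = np.sum(w * demented[autopsied]) / np.sum(w)  # IPW estimate
print(round(naive, 3), round(weighted, 3), round(demented.mean(), 3))
```

In the prediction models the same weights would enter a weighted logistic fit; model-based standard errors from such a fit are not valid as-is, which is one reason to rely on the bootstrap (or a robust variance) for confidence intervals.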
Given a fitted logistic model, a risk score for an individual is obtained by multiplying their covariate values by a set of scoring weights and summing the products. To obtain simpler and more practical scoring systems, the scoring weights were taken to be the estimated regression coefficients from the final (reduced) models, multiplied by 10 and rounded to the nearest integer. We investigated the impact of this simplification and found no appreciable loss in performance. The risk score may then be compared to a threshold, which serves as the basis for deciding whether or not a neuropathologic-based dementia diagnosis can be given. To evaluate the overall predictive performance of the various models across all potential thresholds, we plotted ROC curves and computed area-under-the-curve (AUC) statistics. The bootstrap, repeating the entire model construction/evaluation process 1,000 times, was used to obtain 95% confidence intervals.
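The scoring rule and the AUC can be sketched in a few lines of Python. The coefficients and covariate names below are hypothetical placeholders, not the published Table 5 weights. The text does not say how ties at .5 are rounded; this sketch assumes rounding half away from zero, because Python's built-in `round` rounds halves to even.

```python
import math


def scoring_weight(coef):
    """Integer weight: 10 x coefficient, rounded half away from zero."""
    x = 10.0 * coef
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def risk_score(coefs, covariates):
    """Sum of integer weights times covariate values; the intercept is not scored."""
    return sum(scoring_weight(b) * covariates.get(name, 0) for name, b in coefs.items())


def empirical_auc(case_scores, noncase_scores):
    """P(case score > non-case score), counting ties as 1/2 (Mann-Whitney form)."""
    total = 0.0
    for s1 in case_scores:
        for s0 in noncase_scores:
            total += 1.0 if s1 > s0 else (0.5 if s1 == s0 else 0.0)
    return total / (len(case_scores) * len(noncase_scores))


# Hypothetical coefficients for illustration only (not the published weights).
coefs = {"braak_V_VI": 1.62, "microinfarcts_gt2": 1.97, "cystic_infarct": 0.58}
person = {"braak_V_VI": 1, "microinfarcts_gt2": 1, "cystic_infarct": 0}
score = risk_score(coefs, person)  # 16 + 20 + 0
print(score)
```

The empirical AUC here is unweighted and unadjusted; the AUCs reported in the text are additionally adjusted for selection bias and bootstrap optimism.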
Despite our two-year restriction on individuals without a diagnosis of dementia at their last known visit (following the ACT study protocol), dementia status at the time of death may still be misclassified. To investigate this, we considered a series of sensitivity analyses in which some individuals without a diagnosis at their last known visit were assumed to have been truly (but undiagnosed) demented at death. We considered three schemes for the probability of misclassification: i) constant across all non-demented individuals; ii) increasing with age at last visit; and iii) increasing with time since last visit. For each scheme we examined settings where the overall rate of misclassification among non-demented individuals was 10% and, separately, 20%. Individuals with a diagnosis of dementia were assumed to remain demented. For each of the six sensitivity analyses, we repeated the random re-assignment 1,000 times to avoid dependence on a single re-assignment.
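One way to implement these re-assignment schemes is sketched below. The text does not give the functional forms for "increasing with age" or "increasing with time", so the linear forms here are assumptions, and the record fields (`demented`, `age_last_visit`, `years_since_visit`) are hypothetical. In each scheme the per-person probabilities are rescaled so that their mean equals the target rate.

```python
import random


def reassignment_probs(nondemented, rate, scheme):
    """Probability that each non-demented record was truly demented at death.

    The linear forms for 'age' and 'time' are illustrative assumptions; the
    probabilities are rescaled so their mean equals `rate` (capped at 1).
    """
    if scheme == "constant":
        raw = [1.0] * len(nondemented)
    elif scheme == "age":
        youngest = min(r["age_last_visit"] for r in nondemented)
        raw = [1.0 + r["age_last_visit"] - youngest for r in nondemented]
    elif scheme == "time":
        raw = [1.0 + r["years_since_visit"] for r in nondemented]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    mean_raw = sum(raw) / len(raw)
    return [min(1.0, rate * x / mean_raw) for x in raw]


def misclassification_average(records, rate, scheme, evaluate, n_rep=1000, seed=0):
    """Average evaluate(labels) over random re-assignments of dementia status.
    Records with a dementia diagnosis always stay demented."""
    rng = random.Random(seed)
    nd_idx = [i for i, r in enumerate(records) if not r["demented"]]
    probs = reassignment_probs([records[i] for i in nd_idx], rate, scheme)
    p_of = dict(zip(nd_idx, probs))
    total = 0.0
    for _ in range(n_rep):
        labels = [rng.random() < p_of[i] if i in p_of else True for i in range(len(records))]
        total += evaluate(labels)
    return total / n_rep
```

In the analysis itself, `evaluate` would recompute a performance measure such as the AUC of the fixed risk scores against each set of re-assigned labels.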
Finally, in addition to the proposed scoring systems, we evaluated the performance of the NIA-Reagan criteria on the ACT autopsy sample. Although these criteria were developed to provide differential diagnoses among individuals with dementia, others have considered their application in more general populations.
Throughout, statistical significance was judged via two-sided tests, at the 0.05 level. All analyses were performed in R version 2.7.0 (R Foundation for Statistical Computing, Vienna, Austria).
Table 1 provides baseline demographic characteristics of 1,076 ACT participants who died during follow-up. A total of 232 participants (21.6%) underwent autopsy. Those who underwent autopsy were more likely to be demented (39.7% vs. 29.3% among those not autopsied), tended to be older at their last visit (47.0% were older than 85 years of age vs. 31.5%), more educated (63.0% with at least some college education vs. 52.3%), and were less likely to be non-white (3.4% vs. 8.6%). The two groups were similar in terms of gender, marital status, and depression status. Among individuals with a clinical dementia diagnosis at death, the majority of diagnoses were of AD type (53.0% among those not autopsied and 57.6% among those autopsied).
Table 2 provides results from two logistic selection models. The ‘saturated’ model consists of variables identified in the published literature as being potentially related to selection in autopsy-based studies. The ‘sparse’ model further excludes variables that were not statistically significant. The results indicate that age (at last known visit), race, and education are statistically significantly associated with whether or not an individual underwent autopsy. Further, dementia status (the outcome for the main risk scoring analyses) was associated with selection; based on the saturated selection model, demented individuals were more likely to be autopsied, with an estimated adjusted odds ratio (OR) of 1.52 (95% CI 1.11, 2.09).
Autopsies were obtained on study participants between 1996 and 2007; 76.4% of autopsies on non-demented participants occurred in 2000 or later, while 92.4% of autopsies on demented participants were from 2000 onward. Table 3 provides information on autopsied individuals according to clinical dementia status. Of the 140 non-demented study participants who underwent autopsy, 82 (58.6%) did so within one year of their last known visit (approximately half the protocol-based follow-up interval for ACT). Of the 92 autopsied participants with a clinical diagnosis of dementia, 59 (65.6%) died more than two years after their last known visit.
Neuropathologic measures, according to clinical dementia status, are also provided in Table 3. Based on χ² tests, univariate analyses indicate highly statistically significant associations between dementia status and CERAD score, Braak stage, number of cerebral microinfarcts, and amyloid angiopathy (each p-value < 0.001). The number of cystic infarcts was also found to be associated with dementia status (p-value 0.033), whereas only marginal evidence of association was found for nLBs (p-value 0.087).
The results from the full logistic prediction models (i.e., prior to the application of backward elimination) are presented in Table 4. Unadjusted odds ratio estimates, which ignore potential selection bias, are presented together with adjusted estimates based on inverse probability weighting using the saturated selection model. Results based on weights from the sparse selection model did not differ substantially and are therefore not shown. Because of strong collinearity between the CERAD score and Braak stage, the CERAD score never attained statistical significance in any model that also included Braak stage. We therefore did not include CERAD score in any of our models.
For both the NP only and NPD models, Braak stage and the number of microvascular infarcts were highly statistically significant predictors of dementia, with more advanced pathology indicating increased risk (Table 4). These findings persisted when inverse probability weighting was used to adjust for potential selection bias; OR estimates and 95% CIs were similar for Braak stage, while estimates for cerebral microinfarcts increased in magnitude. For example, in the NP only model the estimated OR for having more than two cerebral microinfarcts increased from 7.17 (95% CI 2.60, 19.74) to 10.00 (95% CI 3.33, 29.96).
In unadjusted models, the associations for nLB disease, cystic infarcts, and amyloid angiopathy were either borderline or not statistically significant (Table 4). Adjustment for selection bias increased the magnitude of the effect sizes for all three measures. For example, the estimated OR for the presence of nLBs increased from 5.76 to 13.45 in the NP only model and from 4.04 to 9.79 in the NPD model. As a consequence, the presence of nLBs and of cystic infarcts achieved statistical significance in both models. Additionally, although amyloid angiopathy did not achieve our a priori threshold for statistical significance, its point estimate in the NPD model increased (OR 2.83; 95% CI 0.88, 9.08), and the observed p-value decreased from 0.190 to 0.080.
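Readers checking entries such as those above can back-calculate the standard error of a log odds ratio, and an approximate two-sided p-value, from a reported OR and 95% CI. This sketch assumes the intervals are Wald intervals on the log-odds scale, which the text does not state; for the amyloid angiopathy estimate it gives p ≈ 0.08, consistent with the reported 0.080.

```python
import math


def wald_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover log(OR), its standard error, and a two-sided Wald p-value
    from a reported odds ratio and its 95% confidence interval."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    p = math.erfc(abs(log_or / se) / math.sqrt(2.0))
    return log_or, se, p


# Amyloid angiopathy, NPD model, weighted: OR 2.83 (95% CI 0.88, 9.08).
_, se_caa, p_caa = wald_from_or_ci(2.83, 0.88, 9.08)
# More than two cerebral microinfarcts, NP only model, weighted: OR 10.00 (95% CI 3.33, 29.96).
_, se_mi, p_mi = wald_from_or_ci(10.00, 3.33, 29.96)
print(round(se_caa, 3), round(p_caa, 3), round(se_mi, 3), p_mi)
```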
Finally, Table 4 also provides details on estimated associations between various demographic characteristics and risk of dementia among individuals who died. Although point estimates suggested potentially important associations, only age and birth cohort were retained in the reduced model. Across these characteristics, incorporating weights for selection bias had little impact on either OR point estimates or 95% confidence intervals.
Table 5 provides the proposed scoring systems based on the final models. Both are presented to provide flexibility in choosing the one appropriate to data availability. Figure 1 shows the bootstrap-adjusted ROC curves for the two systems. The AUC for the NP only model was 0.78 (95% CI 0.71, 0.85); for the NPD model the AUC was 0.87 (95% CI 0.83, 0.92) indicating improved predictive performance with the inclusion of age and birth cohort information. Sensitivity analyses indicate somewhat reduced performance in the presence of misclassification. For each of the three schemes, given 10% misclassification among the non-demented participants, AUC for the NPD model decreased to approximately 0.84; given 20% misclassification, AUC decreased to approximately 0.80.
While ROC analyses evaluate the risk scoring system across all potential thresholds, in practice pathologists will need to choose a single threshold. One strategy is to stipulate a desired minimum sensitivity or minimum specificity, depending on whether priority lies in identifying cases or non-cases. For the NP only model, stipulating minimum sensitivities of 70%, 80%, and 90% corresponds to thresholds of 23, 11, and 0 (Table 5); the actual (optimism-adjusted) sensitivities/specificities for these thresholds are 75%/85%, 86%/67%, and 100%/0%, respectively, reflecting the discrete nature of the NP only scoring system (Figure 1). For the NPD model, the corresponding thresholds are 125, 117, and 107, yielding actual sensitivities/specificities of 72%/89%, 81%/81%, and 92%/56%. Similarly, stipulating minimum specificities of 80% and 90% corresponds to thresholds of 23 and 34 for the NP only model, yielding sensitivities/specificities of 75%/85% and 60%/91% (results for a minimum specificity of 70% are the same as those for 80%). For the NPD model, minimum specificities of 70%, 80%, and 90% correspond to thresholds of 113, 117, and 129, yielding actual sensitivities/specificities of 83%/70%, 81%/81%, and 66%/91%, respectively.
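The threshold rule described above can be sketched as a search over observed scores, treating `score >= threshold` as a positive diagnosis (consistent with the worked example that follows in the text). The toy scores are hypothetical, and the sketch reports apparent sensitivity and specificity, not the optimism-adjusted values given above.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when 'score >= threshold' is a positive diagnosis."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    n_pos = sum(1 for y in labels if y)
    return tp / n_pos, tn / (len(labels) - n_pos)


def threshold_min_sensitivity(scores, labels, target):
    """Highest observed score whose sensitivity is at least `target`."""
    for t in sorted(set(scores), reverse=True):
        if sens_spec(scores, labels, t)[0] >= target:
            return t
    return None


def threshold_min_specificity(scores, labels, target):
    """Lowest observed score whose specificity is at least `target` (None if unattainable)."""
    for t in sorted(set(scores)):
        if sens_spec(scores, labels, t)[1] >= target:
            return t
    return None


def diagnose(score, threshold):
    """Positive neuropathologic diagnosis when the score reaches the threshold."""
    return score >= threshold


# Toy data: four cases and four non-cases (hypothetical scores).
scores = [10, 20, 30, 40, 5, 15, 25, 35]
labels = [True, True, True, True, False, False, False, False]
t_sens = threshold_min_sensitivity(scores, labels, 0.75)
t_spec = threshold_min_specificity(scores, labels, 0.75)
print(t_sens, sens_spec(scores, labels, t_sens), t_spec, sens_spec(scores, labels, t_spec))
```

Applied to the thresholds reported in the text, `diagnose(34, 34)` is positive for the NP only model at 90% minimum specificity, while `diagnose(119, 129)` is negative for the NPD model, as in the hypothetical case described next.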
To illustrate the use of the systems, Table 5 also presents a hypothetical 70-year-old born in 1934, with at least 3 microvascular lesions and the presence of cystic infarcts. The observed risk scores for this individual are 34 and 119 for the NP only and NPD models, respectively. Assuming a required minimum specificity of 80%, a positive diagnosis would be given using either the NP only or the NPD model. Some settings, in which correct identification of non-demented individuals is a priority, may require the more stringent criterion of a minimum specificity of 90%. In this case, a positive diagnosis would be given if the NP measures were the only available information. However, a negative diagnosis would be given if age information were available, since the NPD threshold of 129 is strictly greater than the individual's risk score of 119.
Finally, applying the NIA-Reagan criteria, 137 of the 140 autopsies without a clinical dementia diagnosis were assigned to either the intermediate or low likelihood categories (98% specific), whereas only 20 of 91 autopsies from participants with clinical dementia (and complete CERAD/Braak stage data) were assigned to the high likelihood category (22% sensitive). Applying a more liberal threshold that combines the intermediate and high likelihood categories yielded specificity and sensitivity of 84% and 53%, respectively. Restricting the criteria to autopsies with a clinical AD diagnosis did not significantly change these findings.
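The NIA-Reagan proportions above follow directly from the reported counts. As an illustration only (the text reports point estimates, and this interval is not part of the authors' analysis), the sketch also attaches a Wilson score interval to each proportion to convey the uncertainty implied by 140 and 91 autopsies.

```python
import math


def wilson_interval(k, n, z=1.959964):
    """Wilson score interval for a binomial proportion k / n."""
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


specificity = 137 / 140  # intermediate/low likelihood among non-demented autopsies
sensitivity = 20 / 91    # high likelihood among demented autopsies with complete data
spec_ci = wilson_interval(137, 140)
sens_ci = wilson_interval(20, 91)
print(f"specificity {specificity:.0%} {spec_ci}, sensitivity {sensitivity:.0%} {sens_ci}")
```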
To our knowledge this is the first attempt at constructing a risk scoring system for a neuropathologic-based diagnosis of dementia applicable in general community-based settings. Two risk scoring systems were proposed to provide flexibility in clinical settings, depending on the availability of age-related information. For each model a range of thresholds is provided, along with estimates of sensitivity and specificity. Overall, both systems performed with a high degree of accuracy, and each provided a substantial improvement over the NIA-Reagan criteria. Although the 95% confidence intervals overlapped somewhat, the NPD model, with the introduction of age and birth cohort information, exhibited improved performance over the NP only model.
The combination of neuropathological and demographic variables was motivated by the observation that individuals without clinical evidence of dementia can exhibit relatively high levels of neurodegenerative disease [30, 31]. This may reflect a limitation of the criteria being applied. It may also represent underlying variation, whereby some individuals can bear a greater burden of neurodegenerative disease without clinical manifestation, while others are more susceptible to developing the dementia syndrome with relatively less disease burden. This complexity seems likely and, if true, suggests that it may never be possible, using current histopathologic approaches, to discriminate perfectly between individuals with and without dementia. In our results this phenomenon manifested as decreased specificity (i.e., incorrectly attributing a diagnosis of dementia to non-demented individuals), especially with the NP only model. With the introduction of demographic information, in particular age, thresholds for the NPD model were (relatively) higher than those for the NP only model, so that the NP-based standard for a diagnosis was somewhat higher among younger patients. Intuitively, the NPD model balances the NP information against what might be expected based on age-specific prevalence.
An important strength of this work is the statistical methodology used to address two key challenges of autopsy-based dementia studies: potential selection bias and small samples. A consequence of using inverse-probability weighting to account for selection bias was that the estimated regression parameters were larger in magnitude (Table 4); this likely created additional separation (on the risk scoring scale) between cases and non-demented subjects, thereby improving performance. Additionally, using the full dataset to estimate the components of the scoring system made maximal use of the available information, with the bootstrap ensuring honest and valid estimation of predictive performance. Nevertheless, as with all scoring systems, evaluation in an independent sample will be important. Further, evaluation of the proposed risk scoring systems in populations with differing prevalence of dementia subtypes, together with larger sample sizes, may provide an impetus for subtype-specific prediction models.
Despite performing relatively well, the systems leave room for improvement. In particular, while the presence of amyloid angiopathy appeared to be an important predictor of clinical dementia, it was not retained as a component of either scoring system. The same applied to certain demographic variables such as gender, APOE genotype, and education. Given the magnitudes of the estimated coefficients (Table 4), together with a well-established literature [33, 34], it is likely that a lack of power, rather than the lack of an effect, is responsible; future work based on larger sample sizes may resolve this. Finally, our neuropathologic measures are likely surrogates for damage to the brain structures that underlie cognition. Future measures of cortical or hippocampal synaptic density or of dendritic integrity may correlate more directly with cognitive function and provide the opportunity for improved scoring systems.
Beyond the exclusion of potentially predictive characteristics, the simplicity of our scoring systems likely does not reflect the complexity of the underlying mechanisms. For example, whereas we included age as a linear term to keep the model as simple as possible, dementia and AD incidence is known to increase with age at a faster than linear rate [20, 35]. We explored more complex functional forms for the association with age at death, such as adding a quadratic term or using spline functions, but found that none improved the predictive performance of the model. Again, future studies based on larger sample sizes may provide additional insight. Unfortunately, we did not have uniform access to intermediate outcomes such as mild cognitive impairment (MCI). Future studies based on extended outcomes, beyond our binary dementia classification, may yield prediction systems with improved ability to discriminate across a range of age-related conditions.
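The text does not specify which spline functions were tried for age. As one common choice, the sketch below builds Harrell's restricted (natural) cubic spline basis in truncated-power form. This is an unpenalized regression spline, not a penalized smoothing spline, and the knot locations are assumptions. Its columns would replace the single linear age term in a logistic model, and by construction the fitted curve is linear beyond the outer knots.

```python
import numpy as np


def _pos_cube(u):
    """Truncated cubic (u)_+^3."""
    return np.maximum(u, 0.0) ** 3


def rcs_basis(x, knots):
    """Restricted cubic spline basis: columns [x, S_1(x), ..., S_{k-2}(x)].

    Nonlinear terms are scaled by (t_k - t_1)**2 for numerical stability.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(sorted(knots), dtype=float)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")
    scale = (t[-1] - t[0]) ** 2
    cols = [x]
    for j in range(k - 2):
        term = (_pos_cube(x - t[j])
                - _pos_cube(x - t[-2]) * (t[-1] - t[j]) / (t[-1] - t[-2])
                + _pos_cube(x - t[-1]) * (t[-2] - t[j]) / (t[-1] - t[-2]))
        cols.append(term / scale)
    return np.column_stack(cols)


# Hypothetical knots for age at death.
ages = np.linspace(60, 110, 501)
B = rcs_basis(ages, [70, 80, 90, 100])
print(B.shape)
```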
An important aim of this work was to generate a simple and practical prediction system, together with honest and valid estimates of performance. The NPD model achieves this goal: its only additional requirement is date of birth (used to calculate both age and cohort membership), which is generally available in most clinicopathologic settings.
Data collection and analyses were supported by the National Institutes of Health, U01 AG06781 (E. Larson) and R01 AG02380 (T. Montine).