The neuropathological examination is considered the gold standard for the diagnosis of Alzheimer disease (AD). To determine the accuracy of currently employed clinical diagnostic methods, clinical and neuropathological data were analyzed from the National Alzheimer's Coordinating Center (NACC), which gathers information from the network of National Institute on Aging (NIA)-sponsored Alzheimer's Disease Centers (ADCs); the data were collected as part of the NACC Uniform Data Set (UDS) between 2005 and 2010. A database search initially identified 1198 subjects who had at least one UDS clinical assessment and who had died and been autopsied; 279 were excluded because they were not demented or because critical data fields were missing, leaving 919 subjects. Sensitivity and specificity were determined for "probable" and "possible" AD levels of clinical confidence and for 4 levels of neuropathological confidence based on varying neuritic plaque densities and Braak neurofibrillary stages. Sensitivity ranged from 70.9% to 87.3%; specificity ranged from 44.3% to 70.8%. Sensitivity generally increased with more permissive clinical criteria and specificity with more restrictive clinical criteria, whereas the opposite was true for neuropathological criteria. When a clinical diagnosis of AD was not confirmed by minimum levels of AD histopathology, the most frequent primary neuropathological diagnoses were tangle-only dementia or argyrophilic grain disease, frontotemporal lobar degeneration, cerebrovascular disease, Lewy body disease and hippocampal sclerosis. When dementia was not clinically diagnosed as AD, 39% of these cases met or exceeded minimum threshold levels of AD histopathology. Neurologists of the NIA-ADCs had higher predictive accuracy when they diagnosed AD in demented subjects than when they diagnosed dementing diseases other than AD. The misdiagnosis rate should be considered when estimating subject numbers for AD studies, including clinical trials and epidemiological studies.
For Alzheimer disease (AD), as for any disease, it is extremely important to know, with precision and confidence, the accuracy of currently employed clinical diagnostic methods. This information is critical for AD research, including epidemiological studies, economic impact studies and, in particular, treatment and prevention trials. With the exception of diseases caused by single-gene defects, the most accurate diagnosis is obtained from histological examination of tissue samples from affected sites. For many disorders this is possible during life through biopsy, but biopsy has long been contraindicated for AD because its risks outweigh its benefits. Nonetheless, it is generally accepted that histological examination is the best indicator of an AD diagnosis. The emergence of new biomarkers may have a major impact on clinical diagnostic practice (1); however, because measures of all biomarkers studied to date overlap considerably with those found in other types of dementia (as well as in the cognitively normal elderly population), it remains difficult to determine whether cognitive impairment is due to AD or to another, more dominant concurrent process. Therefore, for the foreseeable future, autopsy will continue to serve as the "gold standard" for determining clinical diagnostic accuracy rates.
The existing body of data on the accuracy of current clinical AD diagnostic methods shows much variability among studies (2-25). Sensitivity estimates have ranged between 41% and 100% (median, 87%), while specificity has ranged between 37% and 100% (median, 58%). Because many studies have reported only sensitivity or positive predictive value, a general impression has arisen that the clinical diagnosis of AD is extremely accurate. The interpretation of these data, however, is complicated by differences in time and setting. In particular, whereas the clinical methods for diagnosing AD have not changed substantially since the introduction of the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) criteria in 1984 (26), neuropathological diagnostic criteria have changed several times: the "Khachaturian" criteria of 1985 (27), the "Tierney" criteria of 1988 (28), the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) criteria of 1991 (29), and the National Institute on Aging (NIA)-Reagan criteria of 1997 (30). The major goal of all of these efforts has been to determine the degree of AD histopathological abnormality necessary to cause dementia. Indeed, until now the consensus of expert opinion has required that the clinically documented presence of dementia be part of the definition of AD.
This study used clinical and neuropathological data collected by the National Alzheimer's Coordinating Center (NACC), which gathers information from the network of Alzheimer's Disease Centers (ADCs) sponsored by the United States NIA. The data analyzed here have been collected as part of the NACC Uniform Data Set (UDS) since 2005 (31) and thus reflect current research practice. Because the most recent diagnostic accuracy studies using a complete ADC dataset were performed in 1998 and 1999 (4, 23), updated figures are needed for current research and clinical use.
Subject data were obtained with the assistance of NACC personnel through a search of the NACC UDS database. The NACC UDS has been collected since September 2005 from more than 30 ADCs located throughout the United States (31, 33). Most ADCs are at university medical centers in urban areas. Research subjects are generally recruited from the practices of participating neurologists, with some additional community-based recruitment. The initial data pull included all 1198 subjects who had at least one UDS clinical assessment and who had died and been autopsied. From this group, 279 subjects were excluded because they were recorded as "not demented" or because critical data fields were not filled out or were marked "missing" or "not done." Subjects without dementia were excluded because both clinical and neuropathological diagnostic definitions of AD have commonly required the presence of dementia; non-demented subjects are therefore not generally classified as AD. The critical data fields were those recording the presence or absence of clinically probable AD, the CERAD neuritic plaque density and the Braak stage. After these exclusions, the final study sample comprised 919 subjects.
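The exclusion steps above can be expressed as a simple record filter. The sketch below is illustrative only: the `Subject` class and its fields are hypothetical stand-ins, not actual NACC UDS variable names. It also assumes the input already contains only the initial pull (autopsied subjects with at least one UDS visit).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subject:
    # Hypothetical fields for illustration; not real NACC UDS variable names.
    subject_id: str
    demented: bool                          # False = recorded as "not demented"
    clinical_probable_ad: Optional[bool]    # None = missing / not done
    cerad_np: Optional[int]                 # 0-3 (none..frequent), None = missing
    braak_stage: Optional[int]              # 0-6, None = missing


def is_eligible(s: Subject) -> bool:
    """Apply the two exclusion rules: non-demented, or a critical field missing."""
    if not s.demented:
        return False
    critical_fields = (s.clinical_probable_ad, s.cerad_np, s.braak_stage)
    return all(value is not None for value in critical_fields)


def select_cohort(initial_pull: List[Subject]) -> List[Subject]:
    """Keep only eligible subjects from the autopsied initial pull."""
    return [s for s in initial_pull if is_eligible(s)]


pull = [
    Subject("a", True, True, 2, 4),
    Subject("b", False, False, 3, 5),    # excluded: not demented
    Subject("c", True, None, 1, 2),      # excluded: clinical field missing
    Subject("d", True, False, None, 3),  # excluded: CERAD score missing
    Subject("e", True, False, 0, 1),
]
print([s.subject_id for s in select_cohort(pull)])
```

A recorded "no" for clinically probable AD (subject "e") is a valid entry and is kept. Only a missing value in a critical field leads to exclusion.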
The goal of the study was to estimate the sensitivity and specificity of the clinical diagnosis of AD, using the neuropathological diagnosis as the gold standard. The clinical diagnosis used was the one given at the last assessment during life. Both the clinical and the neuropathological diagnoses were stratified by level of confidence. For the clinical diagnosis of AD, NINCDS-ADRDA criteria (26) were stratified by considering "probable AD" alone as well as "probable" plus "possible AD."
For the neuropathological diagnosis of AD, the NIA-Reagan criteria (30) are the most current guidelines; these stratify the neuropathological confidence level as “high,” “intermediate” and “low”. Due to the idiosyncratic assignment of these categories from case to case (32), we instead stratified by all relevant combinations of the deterministic histopathological scores, consisting of the CERAD-defined neuritic plaque density score (29) and the Braak neurofibrillary tangle stage (33). These scoring methods have been shown to have a reasonably high level of reproducibility between observers and among research centers (34-37).
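Stratifying by deterministic scores amounts to a threshold test on two ordinal variables. The following is a minimal sketch. It assumes CERAD neuritic plaque density is coded none < sparse < moderate < frequent and Braak stage as an integer from 0 to 6. The default minimums encode the relatively permissive threshold used elsewhere in this paper (moderate or frequent plaques with Braak stage III-VI). The other thresholds summarized in Table 2 are not reproduced here and would have to be passed in explicitly. The function name is hypothetical.

```python
from enum import IntEnum


class CeradNP(IntEnum):
    """CERAD neuritic plaque density, ordered so that comparisons work."""
    NONE = 0
    SPARSE = 1
    MODERATE = 2
    FREQUENT = 3


def meets_ad_threshold(np_density: CeradNP, braak_stage: int,
                       min_np: CeradNP = CeradNP.MODERATE,
                       min_braak: int = 3) -> bool:
    """Return True if both plaque density and Braak stage reach the minimums.

    Defaults encode "moderate or frequent plaques with Braak stage III-VI".
    """
    if not 0 <= braak_stage <= 6:
        raise ValueError("Braak stage must be between 0 and 6")
    return np_density >= min_np and braak_stage >= min_braak


print(meets_ad_threshold(CeradNP.MODERATE, 3))  # threshold just reached
print(meets_ad_threshold(CeradNP.FREQUENT, 2))  # Braak stage too low
print(meets_ad_threshold(CeradNP.SPARSE, 6))    # plaque density too low
```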
Following the determination of specificities and sensitivities associated with the levels of clinical and pathological diagnostic confidence, further analysis was directed at determining the final neuropathological diagnoses for cases having mismatched clinical and neuropathological diagnoses.
Statistical methods included calculation of sensitivity and specificity, with no adjustment for age, gender or other subject characteristics. Groups were compared with t-tests, analysis of variance and Kruskal-Wallis analysis of variance. For all tests, the significance level was set at p < 0.05.
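For reference, the measures follow the standard 2 × 2 definitions. Each subject is cross-classified by clinical diagnosis (AD vs. not AD at the chosen clinical confidence level) and by whether the chosen neuropathological threshold is met. TP counts subjects with a clinical AD diagnosis who meet the threshold, FP those with a clinical AD diagnosis who do not, FN those without a clinical AD diagnosis who meet it, TN those who do neither, and N is the total. The last expression shows why calling every demented subject AD yields a PPV equal to the prevalence of the threshold:

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP+FN}, \qquad
\text{Specificity} = \frac{TN}{TN+FP}, \qquad
\text{PPV} = \frac{TP}{TP+FP},\\[6pt]
\text{PPV}_{\text{all called AD}} &= \frac{TP+FN}{(TP+FN)+(FP+TN)} = \frac{TP+FN}{N} = \text{prevalence}.
\end{aligned}
```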
Selected subject characteristics are given in Table 1. Subjects were classified by clinical diagnosis as "probable AD," "possible AD" or "not AD." The "not AD" group comprises subjects not clinically diagnosed as either probable or possible AD; because non-demented subjects were excluded from the study, this group contained only non-AD dementias. Probable and possible AD were defined according to the NINCDS-ADRDA guidelines (26). The mean age of the "not AD" group (72.8 years) was significantly lower than that of the probable AD (81.2 years) and possible AD (83.2 years) groups. The gender distribution was generally skewed toward males but did not differ significantly among groups. The median scores for neuritic plaque density and Braak stage differed significantly, becoming progressively lower from the probable AD group through the possible AD group to the not AD group. The 3 groups did not differ significantly in the mean interval between the last clinical assessment and death.
The excluded subjects differed significantly from the 919 included subjects in age, gender distribution and AD-related histopathological scores, but not in the mean interval between the last clinical assessment and death.
Measures of agreement between stratified levels of the clinical and neuropathological diagnoses of AD are shown in Table 2. Sensitivity ranged from 70.9% to 87.3% and specificity from 44.3% to 70.8%. In general, sensitivity increased with more permissive clinical criteria and specificity increased with more restrictive clinical criteria, whereas the opposite was true for neuropathological criteria. The best balance between sensitivity and specificity (70.9% and 70.8%, respectively) was achieved when the clinical diagnosis was defined as probable AD and the neuropathological diagnosis as moderate or frequent neuritic plaques with Braak stage III-VI.
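The best-balanced result can be checked arithmetically. The text gives two cells of the corresponding 2 × 2 table directly: 438 confirmed and 88 unconfirmed cases of clinically probable AD. The other two cells are our back-calculation, not figures reported in the paper. They assume that about 618 of the 919 subjects (the reported 67.2% prevalence) met the moderate/frequent plaque plus Braak III-VI threshold. The `TwoByTwo` class is a hypothetical helper written for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # clinical AD diagnosis, pathology threshold met
    fp: int  # clinical AD diagnosis, threshold not met
    fn: int  # no clinical AD diagnosis, threshold met
    tn: int  # no clinical AD diagnosis, threshold not met

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# tp and fp are from the text; fn and tn are back-calculated (an assumption),
# taking round(0.672 * 919) = 618 threshold-positive subjects.
probable_ad = TwoByTwo(tp=438, fp=88, fn=618 - 438, tn=(919 - 618) - 88)

print(f"n = {probable_ad.total}")
print(f"sensitivity = {probable_ad.sensitivity:.1%}")
print(f"specificity = {probable_ad.specificity:.1%}")
```

The reconstruction (180 false negatives, 213 true negatives) reproduces the reported 70.9% and 70.8%, which suggests the back-calculated totals are consistent with the study data. It is still not a substitute for Table 2.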
Table 3 shows the positive predictive value (PPV) of the clinical diagnosis of AD, stratified by levels of clinical and neuropathological confidence. The results were compared with the PPV that would result if all demented subjects were diagnosed as AD. The PPV of the clinical diagnosis of AD ranged from 46.0% to 83.3%. The best result was achieved when the clinical diagnosis was defined as probable AD and the neuropathological diagnosis as moderate or frequent neuritic plaques with Braak stage III-VI (the prevalence of subjects at or above this histopathological threshold was 67.2%). The PPV for clinically probable AD was consistently about 4.5 to 5 percentage points higher than that for the combined probable-plus-possible AD category. It was also consistently about 16 percentage points higher than the PPV that would result if all demented subjects were considered to have clinical AD.
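The comparison with diagnosing every demented subject as AD reduces to comparing PPV with prevalence. Calling everyone AD makes the PPV equal to the fraction of subjects who meet the threshold. The short check below uses only the counts and prevalence given in the text. The `ppv` helper is written for this sketch.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: share of clinical AD diagnoses confirmed."""
    return true_pos / (true_pos + false_pos)


# From the text: 526 clinically probable AD subjects, 438 meeting the
# moderate/frequent plaque + Braak III-VI threshold.
ppv_probable = ppv(438, 526 - 438)

# Calling every demented subject "AD" makes PPV equal to the prevalence
# of the threshold in the cohort, reported as 67.2%.
ppv_all_demented = 0.672

gain_points = 100 * (ppv_probable - ppv_all_demented)
print(f"PPV (probable AD) = {ppv_probable:.1%}")
print(f"gain over calling all demented subjects AD = {gain_points:.1f} points")
```

The difference is absolute, about 16 percentage points. Expressed as a ratio, 83.3% is roughly 24% higher than 67.2%, so the unit matters when quoting the gain.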
Of the 526 subjects diagnosed as clinically probable AD, 438 were confirmed as neuropathological AD, as defined above, and 88 did not meet neuropathological criteria. For this analysis a relatively permissive neuropathological definition was used, i.e. CERAD neuritic plaque density of moderate or frequent in combination with any Braak neurofibrillary tangle stage between III and VI, inclusive.
The primary neuropathological findings for the cases not meeting the defined lower threshold for histopathological severity are summarized in Table 4. The most frequent primary neuropathological diagnosis among these subjects was AD, primary (NACC database code NPPAD), assigned by the neuropathologist to 17 cases despite the relatively low levels of AD histopathology. Other relatively frequent primary neuropathological findings were tangle-only dementia or argyrophilic grain disease (15 cases; NACC database code NPTAU), frontotemporal lobar degeneration (15 cases; NACC database code NPPFTLD), cerebrovascular disease (10 cases; NACC database code NPPVASC), Lewy body disease, with or without AD (9 cases; NACC database code NPPLEWY), and hippocampal sclerosis (9 cases; NACC database code NPPHIPP). A small number of cases received primary neuropathological diagnoses of progressive supranuclear palsy (3 cases), corticobasal degeneration (2 cases) and neuroaxonal dystrophy/Hallervorden-Spatz-like disease (2 cases); there were several other miscellaneous neuropathological diagnoses (1 case each) (Table 4).
There were 271 subjects who were clinically diagnosed as having neither probable nor possible AD (Table 5). Of these, 107 met a minimum histopathological threshold for AD despite their negative clinical diagnoses. This threshold was a neuritic plaque density of moderate or frequent in combination with any Braak neurofibrillary tangle stage between III and VI, inclusive. The remaining 164 cases had clinical diagnoses other than AD that were confirmed neuropathologically, in that the minimum AD histopathology threshold was not reached. The most frequent primary neuropathological diagnoses in these cases were frontotemporal lobar degeneration (60 cases), followed by Lewy body disease (39 cases), Creutzfeldt-Jakob disease (20 cases) and progressive supranuclear palsy (18 cases). Smaller numbers of cases were diagnosed as tangle-only dementia, argyrophilic grain disease, Pick disease, corticobasal degeneration, cerebrovascular disease, hippocampal sclerosis and amyotrophic lateral sclerosis. Many cases had contributing neuropathological diagnoses in addition to their primary diagnosis; the most frequent contributing diagnoses were similar to the most frequent primary diagnoses.
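The split of the 271 clinically non-AD subjects also gives the predictive value of a non-AD clinical diagnosis. The sketch below uses only the counts stated above. Labeling the confirmed fraction a negative predictive value (for the probable-plus-possible clinical criterion) is our framing, not a figure reported in the paper.

```python
# Counts stated in the text for clinically "not AD" subjects.
not_ad_total = 271
met_threshold = 107      # reached the AD histopathology threshold anyway
confirmed_non_ad = 164   # threshold not reached: non-AD diagnosis confirmed

assert met_threshold + confirmed_non_ad == not_ad_total

share_with_ad_pathology = met_threshold / not_ad_total
# Predictive value of a non-AD clinical diagnosis (an NPV, in 2x2 terms).
npv_not_ad = confirmed_non_ad / not_ad_total

print(f"met AD threshold despite non-AD diagnosis: {share_with_ad_pathology:.0%}")
print(f"non-AD diagnosis confirmed: {npv_not_ad:.1%}")
```

Roughly 61% of non-AD clinical diagnoses were confirmed, against an 83% PPV for clinically probable AD. Because the two figures refer to different clinical criteria, the comparison is only indicative. It does agree in direction with the finding that ADC neurologists were more accurate when they diagnosed AD.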
Assessment of diagnostic accuracy rates necessarily requires a "gold standard" that is ideally presumed to be 100% correct. For AD, the neuropathological diagnosis has always been considered the gold standard. However, because the neuropathological criteria for AD have changed several times over the past 30 years (27-30), the question arises: "how good is the present neuropathological gold standard?" The neuropathological criteria for AD used by all NIA ADCs since 1997 have been the NIA-Reagan criteria (30). These criteria do not give entirely specific guidelines for making the diagnosis. Instead, they assign probability levels that dementia is due to AD, based on threshold levels of histopathological severity (CERAD neuritic plaque density and Braak neurofibrillary stage). Thus there are "low," "intermediate" and "high" probability designations for classifying autopsy subjects, but no definitive recommendation is given as to which of these 3 levels must be attained to justify the conclusion that dementia is due primarily to AD. Of the 25 possible combinations of CERAD neuritic plaque density and Braak neurofibrillary stage, NIA-Reagan provides classification instructions for only 3 (32, 38). Thus, a large fraction of subjects cannot be classified by the NIA-Reagan criteria, and the neuropathologist is left to make an arbitrary assignment.
In a recent examination of NACC data, Nelson et al estimated that 18% of subjects with dementia fell outside the NIA-Reagan guidelines (34); here, we found that when non-demented controls were not included, 33% of subjects were not classifiable. The shortcomings of the NIA-Reagan criteria have been recognized, and a revised set of criteria is currently in press (39, 40). In a recent multivariate analysis, approximately two-thirds of the variation in cognitive ability among elderly subjects was accounted for by variation in CERAD neuritic plaque density and Braak stage (41). We consider our approach here cautious, as the available body of knowledge does not allow neuropathologists to be dogmatic about exactly how much plaque and tangle pathology is necessary to cause dementia. Some subjects with 90% coronary artery stenosis suffer myocardial infarctions and some do not. In the same way, individual subjects are likely to vary in their ability to withstand a given lesion density, perhaps because of differences in "cognitive reserve," genetic background or environmental exposures. We therefore thought it appropriate to offer a range of possible gold standards.
We found that the sensitivity of the clinical diagnosis of AD ranged from 70.9% to 87.3%, whereas specificity ranged from 44.3% to 70.8%. Sensitivity increased with more permissive clinical criteria, i.e. by allowing either "probable" or "possible AD" to serve as the clinical diagnosis. Specificity increased with more restrictive criteria, i.e. by accepting only "probable AD" as the clinical diagnosis. Changes in neuropathological criteria had the opposite effect: more liberal criteria decreased sensitivity but increased specificity. Consider the set of minimum neuropathological criteria perhaps most commonly used to define AD (i.e. CERAD neuritic plaque density of "moderate" or "frequent" and Braak stage III-VI). Against this standard, the sensitivity and specificity of the clinical diagnosis of "probable AD" were each approximately 71%, whereas for combined "probable" and "possible AD," sensitivity was about 83% and specificity about 55%. Comparisons with prior studies are difficult, but these data agree with the median sensitivity (87%) and specificity (58%) calculated from a large set of previous publications (2-25). There is also relatively close agreement with the last 2 prior assessments of diagnostic accuracy using the total NIA-ADC dataset. Mayeux et al estimated a sensitivity of 93% and a specificity of 55% (4), and Tsuang et al reported a sensitivity of 84% and a specificity of 50% (23).
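The combined probable-plus-possible figures can be cross-checked the same way. The 107/164 split of clinically non-AD subjects is given in the results. The totals of 618 threshold-positive and 301 threshold-negative subjects are our back-calculation from the reported 67.2% prevalence among 919 subjects. The output is therefore an approximation, not a reproduction of Table 2.

```python
# Back-calculated totals (assumption): round(0.672 * 919) = 618 subjects met
# the moderate/frequent plaque + Braak III-VI threshold; the other 301 did not.
threshold_pos, threshold_neg = 618, 919 - 618

# From the results: of 271 clinically "not AD" subjects, 107 met the
# threshold (false negatives) and 164 did not (true negatives).
false_neg, true_neg = 107, 164

# Under the probable-plus-possible criterion, every threshold-positive subject
# outside the "not AD" group counts as a true positive.
true_pos = threshold_pos - false_neg

sensitivity = true_pos / threshold_pos
specificity = true_neg / threshold_neg
print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
```

The result (about 82.7% and 54.5%) sits close to the approximate 83% and 55% quoted above.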
In most studies, sensitivity is relatively high while specificity is low. Many studies have also reported only sensitivity or positive predictive value, which has led to a false impression that the clinical diagnosis of AD is extremely accurate. While a clinical diagnosis of AD is often validated by neuropathological examination, a clinical diagnosis of a non-AD dementia often is not. For selecting subjects for clinical trials, the relatively high sensitivity is somewhat reassuring because it means that a relatively small percentage of non-AD subjects will be included in clinical trials. The positive predictive value, although not generally regarded as a good measure of diagnostic accuracy, is perhaps most relevant to clinical trial selection. Suppose the minimum neuropathological threshold for diagnosis is defined as moderate or frequent neuritic plaques together with Braak stage III-VI. The present results then indicate that the positive predictive value of the clinical diagnosis of probable AD is 83%, i.e. 83% of subjects with that clinical diagnosis were confirmed neuropathologically to have AD lesions sufficient to cause dementia. Additionally, when neurologists of the NIA ADCs diagnose clinical AD, the positive predictive value is approximately 16 percentage points higher than it would be if they diagnosed all demented subjects with AD. However, even a modest level of diagnostic misclassification could significantly affect the minimum subject number calculated for a clinical trial and should therefore be considered in trial planning. For example, assume an estimated response rate to a trial medication of 50%, and assume that non-AD dementia subjects do not respond to the medication. A 20% misdiagnosis rate would then lower the actual response rate to 40%, which would require approximately doubling subject recruitment to maintain statistical power.
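The dilution in the trial example is a weighted average of the response rates of correctly and incorrectly diagnosed subjects. This minimal sketch assumes, as in the text, that misdiagnosed subjects do not respond. It stops at the observed response rate because the resulting change in sample size depends on design details (comparator response rate, significance level, power) that are not specified here. The function name is hypothetical.

```python
def observed_response_rate(ad_response: float, misdiagnosis_rate: float,
                           non_ad_response: float = 0.0) -> float:
    """Expected response rate in a trial arm that mixes true AD and misdiagnosed subjects.

    non_ad_response defaults to 0.0, i.e. misdiagnosed subjects do not respond.
    """
    if not 0.0 <= misdiagnosis_rate <= 1.0:
        raise ValueError("misdiagnosis_rate must be between 0 and 1")
    return (1.0 - misdiagnosis_rate) * ad_response + misdiagnosis_rate * non_ad_response


# Example from the text: 50% response in true AD, 20% misdiagnosis.
print(f"{observed_response_rate(0.50, 0.20):.0%}")

# Same calculation using 1 - PPV of clinically probable AD (438 of 526 confirmed).
print(f"{observed_response_rate(0.50, 1 - 438 / 526):.1%}")
```

With the 17% misdiagnosis rate implied by the 83% PPV for clinically probable AD, the observed response rate falls from 50% to about 41.6%.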
For calculating diagnostic mismatches, we selected a relatively permissive combination, i.e. moderate or frequent neuritic plaques (NPs) with Braak stage III-VI. Lowering the lesion density threshold further (e.g. allowing sparse neuritic plaques and/or Braak stages 0-II) seemed counterintuitive because a large fraction of cognitively unimpaired elderly people have such lesion levels. The new NIA criteria use the same combination of NPs and Braak stage as the minimum lesion density necessary for attributing dementia to AD (39, 40). Some might argue that the decision whether to make a final diagnosis of AD should be left to the judgment of the individual neuropathologist rather than to rigid consensus criteria. We agree that individual neuropathologists should have some flexibility, but deviations from consensus criteria should be only occasional; otherwise, the definition of what constitutes AD becomes so variable as to hinder effective research.
Analysis of the subjects mismatched in terms of clinical and neuropathological diagnosis reveals some interesting trends. Neuropathologists not infrequently diagnosed AD in subjects clinically diagnosed as AD but not meeting a minimum histopathological threshold. This was despite the application of a relatively low minimum histopathological threshold (moderate or frequent neuritic plaques with Braak stage III-VI). This is most likely due to the lack of a specified minimum histopathological threshold for the primary neuropathological diagnosis of AD in the 1997 NIA-Reagan criteria. Other relatively frequent neuropathological findings are shown in Table 4. Of subjects clinically diagnosed as not having either probable or possible AD, the most frequent primary neuropathological diagnosis was AD, accounting for 39% of the total. While not formally analyzed here, we did note a high rate of multiple neuropathological diagnoses in demented subjects whereas clinicians rarely assign more than one diagnosis.
The reasons for the observed mismatches between clinical and neuropathological diagnoses cannot be fully understood, but we offer the following tentative explanations. First, as previously documented extensively, the clinical diagnosis of dementia is relatively nonspecific (2-25). There are many alternative causes of dementia in the elderly, and AD has generally been identified as the neuropathological cause in 65% to 75% of dementia cases (42, 43). Correlation studies between molecular genetics and neuropathology also indicate that, for several autosomal dominant dementias, the presenting clinical phenotype may vary widely. It may range from various dementia subtypes to parkinsonism to a motor neuron disease-like syndrome, even among individuals with the same causative mutation (44-47). These findings suggest that phenotype is not rigidly linked to etiology and may be shaped to a large extent by an individual's genetic and epigenetic background and/or total environmental exposure.
Because ADCs are tertiary referral centers, their diagnostic accuracy rates may differ from those achieved in secondary or primary care settings. For example, patients with a more complex clinical syndrome are more likely to be referred to a tertiary care center, so these centers may see relatively more complicated cases that are harder to diagnose. On the other hand, the greater level of neurological expertise at these centers would presumably contribute to more accurate diagnoses. Because large autopsy series are reported only from tertiary care settings, it is not possible to know for certain whether such differences exist. However, one study reported no evident selection bias between autopsied and non-autopsied subjects in an incident, community-diagnosed dementia series. This suggests that the composition of dementia cases may not differ substantially between settings (48).
The 1997 NIA-Reagan criteria for the neuropathological assessment of AD acknowledged many uncertainties in defining the neuropathological gold standard. Indeed, the criteria were intended only as a tentative starting point that would invite an organized reassessment once sufficient data had accumulated. The revision of the 1997 criteria explicitly states the combinations of plaque and tangle pathology considered sufficient to cause dementia and to qualify for the neuropathological diagnosis of AD (40, 41). Topographical whole-brain amyloid staging has been incorporated as an adjunct to the previous approach, which was limited to lobar cortical plaque density estimates. Amyloid staging may be of immediate clinical diagnostic utility when used to guide amyloid imaging, as the progression of amyloid plaques beyond the cortex appears to be a useful marker of higher Braak tangle stage and of the presence of dementia (49, 50). Additionally, more detailed advice has been given on how to assess common co-morbid neuropathology and AD subtypes, such as AD with Lewy bodies, vascular brain injury, hippocampal sclerosis and TDP-43-immunoreactive tissue elements. This is a useful addition, although other clinically and neuropathologically defined AD subtypes have not been addressed (51). A major change of direction is the tacit acceptance that the biological changes of AD begin long before the clinical presentation of dementia or even cognitive impairment, whereas the 1997 criteria defined AD as requiring the presence of dementia. The present study also provides frequency estimates for various levels of plaque and tangle pathology in non-demented subjects, giving an overview of this important disease stage.
Because there is a need for determining the biological cause of dementia, the probabilistic approach, which offers certain combinations of plaque and tangle pathology as being of “low,” “intermediate” and “high” likelihood of causing dementia, has been maintained. The approach to concurrent brain diseases of various types is a useful beginning but there is still a great need for large, multivariable clinicopathological studies that will ground probabilistic impact estimates on hard data rather than on expert opinion alone. This is especially true for the contribution of vascular brain injury, the documentation of which needs to become much more detailed before useful analyses can begin. It has become more apparent in recent years that dementia in the elderly is most often heterogeneous in origin and therefore cannot be understood solely by isolated studies of the “pure” conditions.
Although neuropathological consensus criteria for AD have changed several times over the last 3 decades, the clinical criteria have remained virtually unchanged since the NINCDS-ADRDA criteria were published in 1984 (26). New clinical diagnostic consensus criteria have now been finalized under the auspices of the NIA and the Alzheimer's Association (52). Their novel features include recognition of a preclinical stage of AD and the incorporation of imaging and laboratory-based cerebrospinal fluid biomarkers. These biomarkers are expected to increase clinical diagnostic accuracy, but verification will require the accumulation of a sufficient number of autopsied subjects for comparison with the neuropathological diagnosis.
In conclusion, between 2005 and 2010, the accuracy rate of the clinical diagnosis of AD at NIA ADCs varied depending on the exact clinical and neuropathological criteria used. Among demented subjects, ADC neurologists were more accurate when they diagnosed AD than when they diagnosed subjects with another dementing disease. Those conducting clinical trials, epidemiological studies and governmental healthcare analyses should take diagnostic misclassification into consideration when determining experimental design and data analysis strategies. Whenever possible, efforts should be made to obtain data linked to neuropathological confirmation of diagnoses. With the maturation of many longitudinal clinicopathological programs, including those represented by the NIA ADCs, the sample sizes of such autopsy-confirmed cases are now becoming adequate to undertake such analyses. While new diagnostic biomarkers hold promise for increasing the clinical diagnostic accuracy for AD, it is expected that there will continue to be overlap in these measures between AD subjects, subjects with non-AD dementias and the non-demented elderly. The present study may be useful as a baseline against which to assess future improvements.
This study was supported by National Institute on Aging grants to the National Alzheimer's Coordinating Center (U01 AG016976) and the Arizona Alzheimer's Disease Core Center (P30 AG19610). Additionally, T.G.B. was supported by grants from the Arizona Department of Health Services, the Arizona Biomedical Research Commission and the Michael J. Fox Foundation for Parkinson's Research.