The neuropsychological battery from the National Alzheimer’s Coordinating Center (NACC) is designed to provide a sensitive assessment of mild cognitive disorders for multicenter investigations. Comprising eight common neuropsychological tests (12 measures), the battery assesses cognitive domains affected early in the course of Alzheimer’s disease (AD). We examined the factor structure of the battery across levels of cognition (normal, mild cognitive impairment (MCI), dementia) based on Clinical Dementia Rating (CDR) scores to determine the cognitive domains tapped by the battery. Using data pooled from 29 NIA-funded Alzheimer’s Disease Centers, exploratory factor analysis was used to derive a general model using half of the sample; four factors representing memory, attention, executive function, and language were identified. Confirmatory factor analysis (CFA) was used on the second half of the sample to evaluate invariance between groups and within groups over one year. Factorial invariance testing included systematic addition of constraints and comparisons of nested models. The general CFA model had a good fit. As constraints were added, model fit deteriorated slightly. Comparisons within groups demonstrated stability over one year. In a range of cognition from normal to dementia, factor structures and factor loadings vary little. Further work is needed to determine if domains become more or less distinct in severely cognitively compromised individuals.
An emerging line of research seeks to identify biomarkers for individuals at risk for Alzheimer’s disease (AD) with imaging,1 cerebrospinal fluid,2 blood samples,3 and detailed neuropsychological testing.4 In dementia research, neuropsychological batteries were originally developed with the goal of facilitating diagnosis; more recently the focus has been on early diagnosis. In 1984, the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA)5 specified 7 domains of cognition that may be impaired in AD. Neuropsychological test batteries, designed to tap these areas of cognition, have been used to identify variations in cognitive performance, such as declines in memory,6 that may represent early disease pathology.7 Different cognitive domains presumably reflect regional brain functions that may be affected as part of the normal aging process, through the course of a neurodegenerative disease, or by some other underlying pathology.7 We do not know, however, whether cognitive domains change or shift over the course of disease, i.e., whether neuropsychological tests measure the same thing in normal individuals as in those suffering from dementia, or whether the variation simply reflects a general decline in performance associated with more global cognitive decline. In samples of individuals with mild impairment or AD, slightly different factor structures seem to emerge, implying that the pattern of test performance varies among normal, mildly impaired, and AD groups.8-11
Factor analysis is used to evaluate the construct validity of neuropsychological batteries and measures of invariance are used to gauge the stability of factor structures. Indications of stability or instability across groups have implications for researchers seeking to determine how cognitive domains are affected when comparing normal individuals to those suffering from neurodegenerative disease such as AD. The stability with which factor structures correspond to cognitive domains across groups and over time has not been studied extensively.
Understanding factor stability over the continuum of brain disease is important as we begin diagnosing neurodegenerative disease at earlier stages. Shifts in the factor structure between groups would suggest different relationships between the measures and may signal demise of particular underlying neural systems or cognitive processes that cut across multiple cognitive domains. Herein we evaluate the factorial invariance of the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set (UDS) neuropsychological battery to determine if the factor structure varies across groups or over time.
The NACC was established in 1999 to facilitate collection of standardized data from Alzheimer’s Disease Centers (ADCs) across the United States. A neuropsychological battery, part of the UDS, comprising 8 tests (12 measures) was developed with the goal of tapping the following cognitive domains in mild cognitive impairment (MCI) and AD: attention, speed of processing, executive function, episodic memory, and language.12 Tests in the battery (listed below) were selected to focus on markers of aging, MCI, and AD by building on tests already being administered by a majority of ADCs while keeping participant burden low.12 All protocols and procedures were approved by the Institutional Review Boards of each ADC. Informed consent was obtained from all participants. (For a more detailed description of the UDS battery and NACC methodology see Morris 2006,13 Beekley 2007,14 and Weintraub 2009.12)
Data were collected at 29 NIA-funded ADCs using a standardized protocol. Participants are volunteers from the community; each ADC has its own recruitment protocol and participants are re-evaluated annually. The data consist of general demographic characteristics (age, sex, race, and education level in years), family history, health history, behavioral and functional assessments, and clinical information. As of May 5, 2008, the ADCs participating in NACC had administered 14,428 UDS initial assessment batteries to participants across the United States.
The UDS Neuropsychological Battery12 consists of 8 neuropsychological tests which are focused on characterization of non-demented aging, mild cognitive impairment (MCI), and mild AD. The battery is fixed, i.e., administered in a standardized, uniform fashion at all ADCs. Designed to be brief and to cover major cognitive domains, the tests included in the battery are: Logical Memory Story A (Immediate and Delayed recall),15 the Boston Naming Test (30 item),16, 17 WAIS Digit Symbol,18 Trail Making Test Parts A & B,19 Digits Forward,15 Digits Backward,15 and semantic fluency (Animals,20 and Vegetables) (Table 1). The Mini Mental State Examination (MMSE)21 was administered as a global indicator of dementia severity along with the Clinical Dementia Rating (CDR).22, 23
The CDR,23, 24 administered to all participants as a part of the diagnostic process, takes into consideration decline from a prior level of function by rating 6 cognitive domains. Domains assessed include memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care. Although the CDR is used in the diagnostic process, the results of the neuropsychological battery are not specifically used to determine CDR scores.
Factor analysis is a statistical technique that attempts to explain covariation among a set of observed variables by introducing unobserved (latent) variables that are presumed to be causes of the observed variables. Conditional on the unobserved variables, or factors, the observed variables are assumed to be uncorrelated. Observed variables “load” on factors with regression parameter estimates that are referred to as factor loadings. Because latent variables are unobserved, their scale is arbitrary so it is common to presume a unit normal distribution. There are a variety of different algorithms for estimating parameters of factor analysis models. In our study we used maximum likelihood parameter estimates as implemented in MPlus software (version 5.21, Muthén & Muthén, Los Angeles CA). Our analytic approach included both exploratory factor analysis (EFA; no a priori specification of latent factors and loading matrix is unconstrained) and confirmatory factor analysis (CFA; models are estimated given a fixed factor structure and the factor loading matrix contains many constraints).
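As an illustration of the model described above, the following minimal sketch computes the covariance matrix implied by a common factor model, Sigma = Lambda Phi Lambda' + Theta, where Theta is diagonal because the observed variables are assumed uncorrelated once the factors are accounted for. The function name and the toy loadings are hypothetical and are not estimates from the UDS data.

```python
def implied_covariance(loadings, factor_cov, residual_var):
    """Return Sigma = Lambda * Phi * Lambda^T + Theta as nested lists."""
    p = len(loadings)      # number of observed variables
    m = len(factor_cov)    # number of latent factors
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # Covariance explained by the common factors.
            common = sum(
                loadings[i][a] * factor_cov[a][b] * loadings[j][b]
                for a in range(m)
                for b in range(m)
            )
            # Residuals are uncorrelated, so Theta only adds to the diagonal.
            sigma[i][j] = common + (residual_var[i] if i == j else 0.0)
    return sigma


# Hypothetical toy values: four indicators, two correlated factors.
LOADINGS = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]]
FACTOR_COV = [[1.0, 0.5], [0.5, 1.0]]     # unit factor variances
RESIDUAL_VAR = [0.36, 0.51, 0.19, 0.64]   # 1 - loading**2 for each indicator

SIGMA = implied_covariance(LOADINGS, FACTOR_COV, RESIDUAL_VAR)
```

With unit factor variances and residual variances set to one minus the squared loading, each implied variance equals 1, so the off-diagonal entries can be read as implied correlations.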
Construct validity can be assessed by evaluating the variability of the factor structure, or factorial invariance.25 Each form of invariance described below is part of a hierarchy.26, 27 Beginning with simpler forms of invariance, restrictions are added and the level of similarity or dissimilarity between groups is evaluated by checking model fit11 (see description of fit statistics below). The most basic form of invariance, dimensional invariance, is an indication of the general structure specified by each group and is present when the same number of common factors is identified in each group. Configural invariance criteria are met when the same items are associated with the same common factors in each group. Metric invariance, or weak factorial invariance, is achieved when adequate model fit is demonstrated while the factor loadings are held constant across groups. For scalar (strong) invariance, factor loadings and intercepts are held constant across groups. Finally, strict factorial invariance is met when factor loadings, intercepts, and residual variances are held constant across groups.28 We performed tests of invariance using a multiple group analysis approach, where factor analytic models are estimated in separate groups simultaneously, and hypotheses about invariance are tested by adding constraints across groups and assessing differences in global indices of model fit.
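The hierarchy described above can be summarized, as a rough sketch, by listing which parameter sets are held equal across groups at each level. The data structure and helper names below are hypothetical; dimensional invariance is omitted because it concerns the number of factors rather than equality constraints.

```python
# Parameter sets held equal across groups at each level, in order of restrictiveness.
INVARIANCE_LEVELS = [
    ("configural", frozenset()),
    ("metric (weak)", frozenset({"loadings"})),
    ("scalar (strong)", frozenset({"loadings", "intercepts"})),
    ("strict", frozenset({"loadings", "intercepts", "residual_variances"})),
]


def is_nested_hierarchy(levels):
    """True if every level keeps all earlier constraints and adds at least one."""
    return all(prev < curr for (_, prev), (_, curr) in zip(levels, levels[1:]))


def added_constraints(levels):
    """Map each level name to the constraints it adds beyond the previous level."""
    added = {}
    previous = frozenset()
    for name, constraints in levels:
        added[name] = sorted(constraints - previous)
        previous = constraints
    return added
```

Because each level is a strict superset of the one before it, the corresponding models are nested, which is what allows successive models to be compared with difference tests.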
The sample was subdivided according to CDR score rather than diagnoses because the results of the UDS neuropsychological battery are used in the diagnostic process. While the clinicians who assign CDR scores are not blinded to the results of the UDS neuropsychological battery, it is not a required source of information for staging disease severity. Individuals with CDR scores of 0.0 were considered cognitively normal. Those with scores of 0.5 were considered to have mild cognitive impairment (MCI), and those with scores greater than 0.5 were considered to have dementia. Because some of the tests in the battery typically yield skewed data (ceiling and floor effects), a Blom transformation29 was applied to normalize the data.
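For readers unfamiliar with the Blom transformation, the sketch below shows one common form: each score is replaced by the standard normal quantile of (r − 3/8)/(n + 1/4), where r is the rank of the score and n the number of scores. Assigning tied scores their average rank is an assumption made here, and missing values are not handled; the function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def blom_transform(values):
    """Rank-based normal scores using Blom's formula (r - 3/8) / (n + 1/4)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in average_ranks(values)]
```

The transformation preserves the ordering of scores while pulling skewed tails toward a normal shape; it cannot separate scores that are tied at a floor or ceiling.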
Initial EFA were performed using one half of the sample in order to develop an empirical model. A simple structure CFA model was then derived from the pattern of results found in the EFA model. CFA was used to evaluate invariance between groups. We similarly evaluated the stability of the factors over one year by comparing two time points within each group. We did not consider individuals’ group membership from one time point to the next because our focus was on the factor structure rather than individuals’ progress over time. Descriptive data analyses were performed using SAS statistical software.30 All factor analyses were completed using MPlus statistical software.31
A hierarchical set of models was produced in CFA with successively more restrictive criteria to assess levels of invariance. These nested models were tested using the Satorra-Bentler scaled χ² test32 to determine if each successive model was significantly different from the prior one. This test applies a scaling correction to the χ² to account for departures from multivariate normality. A second test, the Bayesian Information Criterion (BIC), was also applied to differentiate nested models.33 The BIC corrects for the sensitivity of the χ² to sample size by multiplying the degrees of freedom by the natural log of the sample size and subtracting the product from the χ²; values less than zero represent models that have a better fit than a fully saturated model. Finally, Modification Indices (MI) provide estimates of the change in the χ² value that would result if a given parameter were left unconstrained. MI can be inspected for sources of model misfit.
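The two model comparison statistics described above can be sketched as follows. The first function follows the BIC definition given in the text. The second uses the commonly published form of the Satorra-Bentler scaled difference for robust chi-squares; whether this exact form matches the computation performed in the analysis is an assumption, and all function and argument names are hypothetical. No p-value is computed.

```python
import math


def bic_statistic(chi_square, df, n):
    """BIC as described in the text: chi-square minus df * ln(N)."""
    return chi_square - df * math.log(n)


def scaled_chi_square_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled difference between two nested models.

    Model 0 is the more constrained (nested) model, model 1 the less constrained.
    t = robust (scaled) chi-square, c = scaling correction factor, d = degrees of freedom.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    if d0 <= d1:
        raise ValueError("model 0 must have more degrees of freedom than model 1")
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)   # scaling correction for the difference
    trd = (t0 * c0 - t1 * c1) / cd         # scaled difference statistic
    return trd, d0 - d1
```

A negative value from `bic_statistic` corresponds to the case in which the constrained model is preferred over a fully saturated one.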
The configural invariance model, the least constrained model, was used as the starting point. Individual model fit was evaluated by examining the following fit statistics. The Comparative Fit Index (CFI)34 captures the relative goodness of fit by comparing the model to the data while adjusting for complexity or parsimony35 (better fit >0.90; range 0.0 to 1.0). The Tucker-Lewis Index (TLI)36 is an incremental fit index that makes a comparison between a null model and an incrementally more complex model37 (better fit >0.90; range 0.0 to 1.0). These two indices are measures of relative fit, indicating improvement relative to a null model.38 The Root Mean Square Error of Approximation (RMSEA)39 is not very sensitive to sample size and distribution and is a good measure of practical fit (better fit <0.05). The Standardized Root Mean Square Residual (SRMR)40 summarizes the standardized differences between the observed and model-implied variances and covariances, and it is less sensitive to distribution and sample size (better fit <0.05). The RMSEA and the SRMR reflect the size of the residuals associated with the model.38
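As a rough guide to how three of these indices relate to the model and null (baseline) model χ² values, the sketch below uses textbook single-group formulas. Statistical packages differ in details (for example, N versus N − 1 in the RMSEA, or adjustments for multiple groups), so these functions are illustrative assumptions rather than the formulas used by any particular program; the SRMR is omitted because it requires the residual matrices.

```python
import math


def cfi(chi_model, df_model, chi_null, df_null):
    """Comparative Fit Index from model and null-model chi-squares."""
    numerator = max(chi_model - df_model, 0.0)
    denominator = max(chi_model - df_model, chi_null - df_null, 0.0)
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


def tli(chi_model, df_model, chi_null, df_null):
    """Tucker-Lewis Index: compares chi-square/df ratios of null and fitted models."""
    ratio_null = chi_null / df_null
    ratio_model = chi_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


def rmsea(chi_model, df_model, n):
    """Root Mean Square Error of Approximation (single group, N - 1 convention)."""
    return math.sqrt(max(chi_model - df_model, 0.0) / (df_model * (n - 1)))
```

For example, a hypothetical model with χ² = 100 on 50 degrees of freedom, compared with a null model of χ² = 5000 on 66 degrees of freedom in a sample of 6010, would exceed the 0.90 reference value on both relative indices and fall below 0.05 on the RMSEA.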
A total of 14,428 participants completed initial assessment batteries as of 5 May 2008. Individuals under the age of 55 (n=517), those who reported a primary language other than English (n=1,080), and individuals who were missing a significant portion (10 or more) of the test scores (n=811) were excluded. A total of 12,020 individuals provided sufficient information for inclusion in the analysis. Individuals who were not included in the sample tended to be younger (mean 70.3 (14.2) vs. 75.6 (8.9), p<0.001), were more often female (59.8% vs. 57.2%, p=0.02), had lower levels of education (13.0 (4.9) vs. 15.0 (3.1), p<0.001) and lower MMSE scores (20.0 (9.8) vs. 25.5 (5.4), p<0.001), and included a greater proportion of non-white participants (25.2% vs. 16.7%, p<0.001).
Among those retained in the sample, individuals classified as normal were significantly younger than individuals classified as mildly cognitively impaired (p<0.001) and those suffering from dementia (p<0.001) (Table 2). Dementia cases were in turn older than the mildly impaired individuals (p<0.001). The three groups were significantly different from each other on all other demographic characteristics. A listing of the sample means and standard deviations for all the test scores can be found in Table 3. The mean scores for normal individuals listed here are in line with means reported in a prior evaluation of the battery;12 differences may be attributed to exclusionary criteria imposed on our sample. The sample was randomly split into two subsamples of n=6010 each. The two samples were not statistically significantly different from each other in their demographic characteristics or test scores.
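A random split of this kind can be sketched as below. The seed, the function name, and the use of integer identifiers are assumptions for illustration; the procedure actually used to split the sample is not described beyond being random.

```python
import random


def split_half(ids, seed=0):
    """Shuffle identifiers with a fixed seed and return two halves."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical participant identifiers 0..12019 standing in for the 12,020 retained cases.
exploratory_ids, confirmatory_ids = split_half(range(12020), seed=2008)
```

Fixing the seed makes the split reproducible, so the exploratory and confirmatory halves can be regenerated exactly.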
Initial exploratory factor analysis (EFA) was performed using half of the sample. Because the analysis was focused on factor structure and not data reduction, we considered model fit statistics rather than factor extraction procedures such as examination of scree plots41 or the application of the “Eigenvalue greater than one” rule.42 After reviewing the fit statistics (CFI = 0.999, TLI = 0.997, RMSEA = 0.019, SRMR = 0.003) we concluded that a 5-factor solution fit the data best.43 EFA was repeated for each of the cognitive subgroups to determine if the general model was the same in each group. All three groups’ factor structures were in agreement based on visual inspection, thus meeting criteria for dimensional invariance.
Using the EFA factor structure, a confirmatory factor analysis (CFA) model was built (Figure 1). We modified the original model in CFA to improve fit by collapsing the four digits tests into one factor and allowing for a correlation between Digits Forward/Digits Forward Length, and Digits Backward/Digits Backward Length. Factor loadings, intercepts, and residual variances were free to vary across groups (CFI = 0.989; TLI = 0.985; RMSEA = 0.045; SRMR = 0.019). The sample was then stratified into three groups based on CDR scores and a multiple group CFA was performed. The normal group consisted of n=2409 individuals defined as having a CDR global score=0.0. A second group consisted of n=2037 individuals with MCI defined as having a CDR score=0.5. The dementia group (n=1564) was defined as having a CDR score >0.5. The initial multiple group CFA had only two constraints: variances were fixed at 1.0 and means were fixed at 0.0; this model had a good model fit, meeting criteria for configural invariance. There was a Heywood case (factor loading >1.0; negative variance)44 in the dementia group for the Logical Memory Immediate Recall factor loading. Evaluation of the observed data suggests this phenomenon could be explained by a lack of variability in Immediate and Delayed recall responses due to floor effects in the dementia group (data not shown). Because the value in excess of 1.0 was small, as was the negative variance, and the model converged, we chose to proceed with the analysis,45 acknowledging that, as one would expect, performance on memory measures does not follow a linear pattern among dementia cases.
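A simple screen for Heywood cases of the kind described above might look like the following sketch, which flags any indicator whose standardized loading exceeds 1.0 in absolute value or whose residual variance is negative. The function name, the tolerance argument, and the example values are hypothetical and are not taken from the fitted models.

```python
def find_heywood_cases(std_loadings, residual_variances, tol=0.0):
    """Return indicator names with |standardized loading| > 1 or negative residual variance."""
    flagged = []
    for name in std_loadings:
        loading_out_of_range = abs(std_loadings[name]) > 1.0 + tol
        negative_variance = residual_variances[name] < -tol
        if loading_out_of_range or negative_variance:
            flagged.append(name)
    return flagged


# Hypothetical standardized estimates for one group (illustrative values only).
EXAMPLE_LOADINGS = {"LM Immediate": 1.01, "LM Delayed": 0.93}
EXAMPLE_RESIDUALS = {"LM Immediate": -0.02, "LM Delayed": 0.14}
EXAMPLE_FLAGGED = find_heywood_cases(EXAMPLE_LOADINGS, EXAMPLE_RESIDUALS)
```

A flagged indicator does not by itself invalidate a model, but it signals that the estimate lies outside the admissible range and that the data for that indicator deserve closer inspection.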
Factor loadings for each group and fit statistics are shown in Table 4, Model 1. This table lists each factor and corresponding tests in the left column. For each model, three sets of standardized factor loadings are presented, one for each group. The type of invariance measured and the parameter constraints of each model are listed across the top; fit statistics for each model are listed on the bottom. The fit statistics for Model 1 suggest configural invariance. Considering only the CFI and TLI, Models 1-4 demonstrate configural, metric, strong (scalar), and strict factorial invariance. The values for the RMSEA and SRMR, however, were >0.05 for Models 2, 3, and 4. Although model fit deteriorated with the addition of constraints, Model 3 (strong invariance) still fit better than a fully saturated model (BIC = −70.33), implying some level of invariance. The non-negative BIC value for Model 4, however, indicated that a fully saturated model would fit better. Inspection of modification indices (MI) suggested that cross-loadings such as Attention by Trails B, Attention by Logical Memory Immediate Recall, and Attention by Logical Memory Delayed Recall would improve model fit. When these modifications were added, the resulting Model 4 fit statistics were as follows: CFI = 0.968, TLI = 0.968, RMSEA = 0.056, SRMR = 0.052, and BIC = −267.165.
We next considered the stability of the factors over time at the group level. For this analysis, we compared each group at baseline to the first follow-up evaluation (about a 1 year lag, on average). This analysis was performed without regard for participants who may have progressed from one group to another because our focus was on the stability of the factor structure rather than individual performance. A total of 2,449 participants who were evaluated at the first annual follow-up were classified as normal. For the MCI group, a total of 1,514 were evaluated one year later, as were 1,428 in the dementia group. Using the CFA model described above, we evaluated invariance over time for each group. In the three groups, all four levels of invariance criteria were met, indicating remarkable stability in the factors over time within group (Tables 5a, 5b, and 5c).
Our empirically derived EFA and CFA models were closely aligned with the theoretical models originally proposed in the development of the NACC UDS Battery (Table 1).12 When tested for invariance, the factor structure met criteria for dimensional and configural invariance. Findings suggest the presence of metric, strong, and strict factorial invariance although some fit statistics were not ideal (i.e., RMSEA, SRMR). When model modifications allowed for cross loadings on the Attention factor by Trails B, LM Immediate, and LM Delayed, the result suggested a level of strict invariance. The implication is that, allowing for some cross loadings of test scores, the battery is relatively invariant across groups. That is, normal, MCI, and dementia subgroups, as defined by the CDR, show slight differences in patterns of performance across cognitive domains but the factor structure of the NACC battery remains generally stable for each group. Factors (or cognitive domains) were also stable over time within each group, as criteria were met for all four classifications of invariance in an evaluation over a one year time interval.
These findings provide evidence of the stability of cognitive domains as measured by the NACC UDS battery. For researchers using data derived from a nationwide sample of participants covering a range of cognitive function, it is important to know that the same domains are being measured at different levels of cognitive function. Variation in test performance then represents quantitative change rather than qualitative. This allows for comparison across groups and over time.
Others have performed similar analyses using different neuropsychological batteries on a range of diagnostic groups with varying results.8, 11, 46, 47 In samples of individuals with mild impairment or AD, slightly different factor structures than those seen in normal samples seem to emerge, implying variation in the cognitive domains being measured. There are also differences in the number of factors extracted from various batteries. There are at least two reasons for these differences. First, the psychometric characteristics of each test and each battery are important to consider, as some may be better at cleanly measuring individual cognitive domains than others. As noted previously,7 normals’ performance on neurocognitive tests tends to be uniform, hence the one-factor solution typically found in normal subjects. Individuals suffering from neurodegenerative disease such as AD tend to perform heterogeneously.
Second, the number of factors extracted can vary depending on the objective of the analysis. If data reduction is the goal, factor extraction based on the use of Eigenvalues42 or scree plots41 may be appropriate and can yield a smaller number of factors. When more factors are extracted, the fit statistics tend to improve. It is up to the researchers to determine the appropriate number of factors that hang together and make theoretical sense. Parsimony and simplicity play important roles.
The strengths of the current study are the large sample size and the fact that the NACC battery was administered in a standardized fashion across sites. There are a number of limitations to be noted. The sample is not population-based and therefore results may not be generalizable. Centers may follow different protocols for volunteer recruitment and various centers have different areas of specialty, which may tend to attract volunteers with a particular interest or family history. Participants were classified into three groups based on the CDR. Although the battery is not a required source of information for the CDR, it is possible that test scores were considered in the assignment of CDR scores. The choice of the CDR as a grouping variable over actual diagnoses was an attempt to avoid any tautology. Because we did not rely on diagnoses, there is likely a large degree of heterogeneity within each subgroup of participants. We did not exclude participants with comorbid conditions, nor did we adjust for demographic characteristics, as our focus was on the general factor structure of the battery. The battery itself, while broad in scope, may have limitations in its ability to tap cognitive domains. Some of the test results tend to be skewed, having floor and ceiling effects. To address this, a Blom transformation was applied to normalize the data and we used an MPlus estimator that is robust to non-normality.
We found that the CFA factor structure fit all three CDR groups (i.e., dimensional and configural invariance held) and the ability of individual tests to adequately represent each factor varied only slightly between different groups defined by CDR scores. Our findings provide strong support for continued efforts to elucidate a biomarker of early dementia based on detailed neuropsychological assessment. Insofar as factor structures are consistent, differences in performance within each group suggest impairments in those cognitive domains rather than the inability of the battery to adequately measure performance. It is important to note the stability of the battery because impairments in specific cognitive domains have been correlated with neuropathology in prior studies.7, 48 In general, our findings contrast with those of others who have found singular factor structures or factor structures that vary in different patient populations.7, 49 Although this can be largely attributed to differences in batteries used and sample characteristics, our results suggest that there are quantitatively, not qualitatively, different cognitive patterns in the three diagnostic categories. In conclusion, we have found that the theoretical constructs behind the development of the UDS neuropsychological battery seem to hold. Within groups, the battery demonstrates stability over a one year period. Longer term follow-up will be useful in determining whether factor stability is maintained longitudinally at an individual level.
We would like to thank Drs. Walter Kukull and Leslie Phillips for their help in acquiring the NACC data, and Dr. Weintraub for permission to reproduce Table 1. We would also like to thank the study participants who contributed their time and effort for the advancement of Alzheimer’s disease research.
Funding Sources: This work was supported by grants from the National Alzheimer’s Coordinating Center (NACC Junior Investigator grant: 2008-JI-02) and from the National Institute on Aging (P30- AG 028377, K01 AG029336).