Decline in cognitive abilities can be an important contributor to the driving problems encountered by older adults, and neuropsychological assessment may provide a practical approach to evaluating this aspect of driving safety risk. The purpose of the present study was to evaluate several commonly used neuropsychological tests in the assessment of driving safety risk in older adults with and without neurological disease. A further goal was to identify brief combinations of neuropsychological tests that sample performance in key functional domains and thus could be used to assess driving safety risk efficiently. A total of 345 legally licensed, active drivers aged 50 or older, with no neurologic disease (N=185), probable Alzheimer's disease (N=40), Parkinson's disease (N=91), or stroke (N=29), completed vision testing, a battery of 10 neuropsychological tests, and an 18-mile drive on urban and rural roads in an instrumented vehicle. Performances on all neuropsychological tests were significantly correlated with driving safety errors. Confirmatory factor analysis was used to identify three key cognitive domains assessed by the tests (speed of processing, visuospatial abilities, and memory), and several brief batteries consisting of one test from each domain showed moderate corrected correlations with driving performance. These findings are consistent with the notion that driving places demands on multiple cognitive abilities that can be affected by aging and age-related neurological disease, and that neuropsychological assessment may provide a practical off-road window into the functional status of these cognitive systems.
Driving an automobile places demands on perceptual, motor, and cognitive systems that can be affected by normal aging and age-related neurologic disease. The population of legally licensed older drivers continues to increase, resulting in a growing number of drivers with varying degrees and profiles of safety-relevant functional limitations. Individual rights and the benefits of mobility provided by driving must be balanced against the potential public safety risk posed by the aging population of drivers. Ideally, decisions regarding cessation or limitation of driving will be guided by a rational approach informed by empirical studies. There is sufficient variability in driving performance within age groups and diagnoses to render age- or diagnosis-based restrictions unfair for some and unsafe for others. Driving history, including violations and crashes, can be a good predictor of future unsafe driving, but intervention after the fact is not ideal. On-road driving evaluations can be informative and are considered by some to be the “gold standard” for determining driving competency in older adults (e.g., Dobbs et al., 2002). However, driving during these evaluations may not be representative of a person's typical driving behavior, and such evaluations have practical limitations, including expense, particularly if repeated assessment is needed to monitor possible changes in driving safety status (Brown & Ott, 2004). Performance on tests of driver knowledge, such as state licensing exams, may not reflect how the person applies that knowledge while driving, and may be preserved even in the face of substantial acquired cognitive deficits, such as dense amnesia (e.g., Anderson et al., 2007).
Although there is evidence that declining cognitive abilities can be an important contributor to the driving problems encountered by older adults, there has been limited study of the relationships between neuropsychological test performances and on-road driving ability (for reviews, see Carr & Ott, 2010; Iverson, Gronseth, Reger, Claassen, Dubinsky & Rizzo, 2010; Reger et al., 2004; O'Neill, Rizzo, Reger & Iverson, 2010). Some studies have failed to find relationships between neuropsychological test performances and driving (e.g., Trobe et al., 1996), and the evidence to date remains insufficient to make strong practice recommendations. However, several studies have found individual neuropsychological tests and composite indices of cognitive status to be significantly correlated with measures of driving safety in a number of settings, including simulator scenarios, performance in standard on-the-road driving tests, and final driving outcomes such as driving cessation (e.g., Dawson, Anderson, Uc, Dastrup, & Rizzo, 2009; Dawson, Uc, Anderson, Johnson, & Rizzo, 2010; De Raedt & Ponjaert-Kristoffersen, 2000; Ott et al., 2008; Uc et al., 2009; Uc, Rizzo, Anderson, Shi, & Dawson, 2005).
Studies to date of neuropsychological predictors of driving safety in older drivers have focused primarily on normal elderly drivers or drivers with specific conditions such as early Alzheimer's disease, Parkinson's disease, or Mild Cognitive Impairment (MCI). Such studies are important for identifying cognitive, perceptual, or behavioral impairments that may lead to unsafe driving in a given patient group, and for guiding targeted interventions for specific populations. However, relatively little attention has been directed toward the broader issue of relationships between neuropsychological test performance and driving safety across a broad spectrum of age-related conditions. The limited research to date that has taken this approach suggests that evaluation of cognitive abilities can provide valuable information regarding driving safety risk across diagnostic categories (e.g., Grace et al., 2005; Whelihan et al., 2005).
Driving poses an equal-opportunity safety risk; in other words, the hazards inherent in the road and traffic make no concessions or accommodations for older drivers or for drivers with acquired cognitive, perceptual, or motor deficits. In this vein, Barrash et al. (2010) showed that “pure” or raw neuropsychological test scores, not adjusted for age or other demographic factors, predicted driving performance more accurately than demographically adjusted scores. Once on the road, all drivers face essentially the same challenges, regardless of age or diagnosis. A driver's level of performance in key functional domains (cognitive, perceptual, motor) is the predominant factor in driving safety, not the etiology of that performance, which in many cases is undiagnosed (Johansson, Bronge, Lundberg, Persson, Seideman & Viitanen, 1996) or has not been studied with respect to driving (Rizzo, 2011).
Furthermore, in many instances, an older driver's neurological status may be unknown or uncertain. Conditions such as Alzheimer's disease develop gradually over many years before a clinical diagnosis can be made. Also, it is becoming increasingly apparent that a large percentage of older individuals with cognitive impairments have more than one neurodegenerative condition. For example, an autopsy study of 80 people with clinically diagnosed probable AD found that more than half of the tissue samples showed evidence not only of AD but also of other brain disease, primarily infarcts and Lewy body disease (Schneider et al., 2007). Similarly, many patients with a clinical diagnosis of vascular dementia are found at autopsy to also have Alzheimer's disease pathology.
With these considerations in mind, the broad goal in this study was to examine the relationships between performances on several commonly used neuropsychological tests and driving safety risk in older adults, irrespective of diagnostic status. More specifically, we wanted to determine if one or more brief batteries of cognitive tests could provide efficient assessment of driving safety in older adults across a large and heterogeneous sample of legally licensed older drivers. The first step was to better specify the functional domains tapped by several commonly used neuropsychological tests in the assessment of driving safety with confirmatory factor analytic techniques (Bollen, 1989). Guided by the structure of the cognitive functions, the second step was to identify brief combinations of neuropsychological tests that sample performance across key functional domains, and which might be used to efficiently assess driving safety risk in clinical and research settings.
The participants were 345 (230 M, 115 F) active drivers between the ages of 50 and 85 (mean age = 68 years), including 185 with no neurologic disease, 40 with probable Alzheimer's disease (AD), 91 with Parkinson's disease (PD), and 29 with stroke. Diagnosis of AD was based on the NINCDS-ADRDA criteria (McKhann et al., 1984). Diagnosis of PD was based on the UK Parkinson's Disease Society Brain Bank clinical diagnostic criteria (Gibb & Lees, 1988; Hughes, Daniel & Clifford, 1992). Participants in the stroke group had a history of a single cerebrovascular event at least 3 months before the study, documented with CT or MR imaging. All participants held a valid state driver's license and were still driving. They were recruited from the general community through advertisements and from outpatient clinics. The data included in the current analyses were obtained from participants in our prior and ongoing studies who: a) had a diagnosis of AD, PD, or stroke, or no neurologic disease, b) were age 50 or older, c) had completed the standardized neuropsychological battery described below, and d) had subsequently completed a standard on-road driving evaluation in an instrumented vehicle. Exclusion criteria included alcohol or substance abuse, major psychiatric disease, use of sedating medication, and corrected visual acuity worse than 20/50. The methods for the neuropsychological assessment, on-road driving evaluation, and video review of driving errors were the same for all participants across studies (Dawson et al., 2009; Dawson et al., 2010; Uc et al., 2010). For some analyses, two random subsamples were formed in which the distribution of diseased (PD, stroke, or AD) and non-diseased participants was similar. The two subsamples were compared on mean safety errors during the drive and on neuropsychological functioning, and none of these independent-samples t-tests was significant at the conventional level.
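The matched random split described above can be sketched as a stratified assignment. This is a minimal illustration, not the authors' procedure. It assumes stratification by all four diagnostic groups, which is a stricter version of matching the proportions of diseased and non-diseased participants. The function name `stratified_halves` and the group labels are hypothetical.

```python
import random
from collections import Counter


def stratified_halves(group_labels, seed=0):
    """Split participant indices into two random halves with a matched group mix.

    Each group is shuffled separately and its members are dealt alternately
    into the two halves, so per-group counts differ by at most one.
    """
    rng = random.Random(seed)
    by_group = {}
    for idx, label in enumerate(group_labels):
        by_group.setdefault(label, []).append(idx)

    half_a, half_b = [], []
    for label in sorted(by_group):          # sorted for reproducibility
        members = list(by_group[label])
        rng.shuffle(members)
        half_a.extend(members[0::2])        # even positions go to half A
        half_b.extend(members[1::2])        # odd positions go to half B
    return half_a, half_b


# Group sizes from the Participants section; labels are illustrative.
labels = ["none"] * 185 + ["AD"] * 40 + ["PD"] * 91 + ["stroke"] * 29
half_a, half_b = stratified_halves(labels, seed=42)
print(Counter(labels[i] for i in half_a))
print(Counter(labels[i] for i in half_b))
```

The two halves could then be compared on driving errors and test scores with independent-samples t-tests, as the study reports doing.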
All participants provided informed consent according to the policies of the Institutional Review Board at the University of Iowa.
A battery of standardized neuropsychological tests was administered by a trained technician during a single session lasting less than 2 hours. The tests were selected on the basis of their conceptual relevance to driving and demonstrated sensitivity to brain dysfunction (for test descriptions, see Lezak, Howieson & Loring, 2004; Strauss, Sherman & Spreen, 2006). The tests included: Trail Making Test Part A (TMT-A), Judgment of Line Orientation (JLO), Complex Figure Test-Copy (CFT-Copy), Complex Figure Test-30 Minute Delayed Recall (CFT-Recall), WAIS-III Block Design, Benton Visual Retention Test (BVRT), Controlled Oral Word Association (COWA), Rey Auditory Verbal Learning Test (AVLT), Grooved Pegboard (GP; average of left and right hands), and Useful Field of View (UFOV; total loss from all four subtests) (Ball & Rebok, 1994; Ball et al., 1993).
Raw scores on the individual tests were reversed when necessary so that high scores represented better functioning on every test, and were then transformed to T-scores based on the mean and standard deviation of the 185 participants with no documented neurological condition. Hence, high scores on all neurocognitive tests reflected better functioning.
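The reversal and rescaling step can be illustrated with a short sketch. It assumes the conventional T-score metric (mean 50, SD 10) and a sample standard deviation for the reference group. The text does not specify either detail, and `to_t_scores` is a hypothetical helper.

```python
from statistics import mean, stdev


def to_t_scores(raw, reference_raw, higher_is_better=True):
    """Convert raw scores to T-scores (mean 50, SD 10) relative to a reference group.

    Scores on tests where a higher raw value means worse performance
    (e.g., completion time or error counts) are sign-reversed first, so that
    high T-scores always indicate better functioning.
    """
    sign = 1 if higher_is_better else -1
    reference = [sign * x for x in reference_raw]
    ref_mean, ref_sd = mean(reference), stdev(reference)  # sample SD (n - 1)
    return [50 + 10 * (sign * x - ref_mean) / ref_sd for x in raw]


# Hypothetical completion times in seconds (a timed test: lower raw = better).
reference_times = [30, 35, 40, 45, 50]
print(to_t_scores([40, 30], reference_times, higher_is_better=False))
```

In this example, a time equal to the reference mean maps to T = 50, and a faster time maps above 50.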
Contrast sensitivity was assessed with the Pelli-Robson chart (Pelli, Robson & Wilkins, 1988). Visual acuity was measured as the logarithm of the minimum angle of resolution (logMAR), using the Early Treatment Diabetic Retinopathy Study chart for far visual acuity and a reduced Snellen chart for near visual acuity (higher scores indicate worse acuity; Ferris, Kassoff, Bresnick & Bailey, 1982).
On a separate day following the neuropsychological testing, participants completed an 18-mile on-road driving test around Iowa City in an instrumented vehicle. The test included both urban and rural routes and was conducted on days when the weather did not cause poor visibility or poor road conditions. The test began after a brief period of acclimation to the vehicle, and a trained experimenter sat in the front passenger seat to give instructions and operate the dual controls, if needed. The vehicle is a midsized car with an automatic transmission and hidden instrumentation and sensors. Electronic data (steering wheel position, accelerator and brake pedal position, lateral and longitudinal acceleration, and vehicle speed) were recorded at 10 Hz. Four miniature lipstick-size cameras captured driver behavior (two views) and the road scene ahead of the vehicle (two views).
A certified driving instructor reviewed the videotapes of each drive to score safety errors according to the standards of the Iowa Department of Transportation (September 7, 2005 version). The scoring yielded information on the frequency and types of safety errors participants committed. The taxonomy of 76 error types (e.g., incomplete stop, straddles lane line) is organized into 15 categories (e.g., stop signs, lane observance). Thirty of these error types were classified as critical errors (e.g., entering an intersection on a red light), meaning that under a different set of circumstances such errors could lead to crashes. The remaining errors were classified as non-critical. In all analyses, the sum of critical and non-critical errors was used as the outcome measure. A single reviewer evaluated all drives in this study. To evaluate the reliability of this scoring system, a sample of 30 drives was re-reviewed by this instructor and independently reviewed by a second driving instructor. For total number of errors per drive, the primary reviewer's intra-rater correlation was .95, and the inter-rater correlation was .73.
As a preliminary step, the intercorrelations among the individual predictors and the outcomes were examined. The primary analyses then proceeded in two steps. First, the structure and dimensionality of the neuropsychological and visual sensory functioning measures were examined with confirmatory factor analyses (CFA). All model-fitting analyses were conducted with LISREL 8.72 on variance-covariance matrices (Joreskog & Sorbom, 2001) using robust maximum likelihood (ML) estimation (Satorra & Bentler, 2001). Both normal-theory ML and robust ML Satorra-Bentler rescaled (SB) chi-squares are provided in the tables; however, inferences were based on SB rescaling (Satorra & Bentler, 2001)1. All comparisons involved nested models, constituting exact tests of the specific constraints implied by the target model relative to the comparison model.
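The scaled chi-square difference test used for the nested-model comparisons can be sketched from the Satorra and Bentler (2001) formula. This is an illustrative implementation, not LISREL's code. The function names are hypothetical. We assume that the normal-theory ML chi-squares entering the formula are the same statistics LISREL rescaled. A closed-form chi-square tail probability for integer degrees of freedom avoids a SciPy dependency.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df > 0."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal-tail term plus a finite series in powers of sqrt(x).
    term, series = math.sqrt(x), 0.0
    for i in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


def sb_scaled_difference(t0_ml, t0_sb, df0, t1_ml, t1_sb, df1):
    """Satorra-Bentler (2001) scaled chi-square difference test.

    Model 0 is the more restricted (nested) model, model 1 the less restricted.
    t*_ml are normal-theory ML chi-squares; t*_sb are their SB-rescaled values.
    """
    c0, c1 = t0_ml / t0_sb, t1_ml / t1_sb          # scaling correction factors
    df_diff = df0 - df1
    cd = (df0 * c0 - df1 * c1) / df_diff           # scaling factor for the difference
    if cd <= 0:
        raise ValueError("non-positive scaling factor; scaled difference undefined")
    t_diff = (t0_ml - t1_ml) / cd
    return t_diff, df_diff, chi2_sf(t_diff, df_diff)


# Hypothetical values, not the study's Table 2.
t_d, d, p = sb_scaled_difference(100.0, 80.0, 40, 70.0, 56.0, 38)
print(f"scaled difference = {t_d:.2f}, df = {d}, p = {p:.4g}")
```

This shows why the two chi-square variants can disagree: the unscaled difference (30 on 2 df here) is divided by a pooled correction factor before it is referred to the chi-square distribution.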
In addition, we relied on several fit indices. Goodness-of-fit indices in particular allow us to evaluate whether the assumptions of each substantive model, considered in isolation, provide an adequate explanation of the observed phenomena. Goodness-of-fit indices can be classified in several ways. Here, we relied on Kaplan's framework (Kaplan, 2000) and chose three stand-alone fit indices: 1) Root Mean Square Error of Approximation (RMSEA), with a 90% confidence interval (CI), 2) Expected Cross-Validation Index (ECVI), with a 90% CI, and 3) standardized Root Mean Square Residual (sRMR). Among incremental fit indices, we examined the Comparative Fit Index (CFI) and Normed Fit Index (NFI). When the model is acceptable in the population of interest, we expect RMSEA to be .08 or less (ideally .05 or less), CFI to be .95 or higher, NFI to be .90 or higher, and sRMR to be close to .05 (Hu & Bentler, 1999). ECVI values are evaluated relative to the value this index takes for the saturated model, which necessarily fits perfectly. When the ECVI of the target model is lower than that of the saturated model, we have greater confidence that the results would hold in an independent sample of the same size.
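The point estimates of these indices, and the cutoffs stated above, can be sketched from their usual textbook formulas. This is a hedged illustration, not LISREL's exact computation. Several assumptions apply:

* RMSEA uses N − 1 in the denominator, although some programs use N.
* ECVI follows the Browne and Cudeck form (χ² + 2q)/(N − 1), where q is the number of free parameters.
* sRMR and the 90% CIs are omitted because they need the residual matrix or noncentral distributions.

All chi-square values in the example are invented.

```python
import math


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (uses n - 1; some programs use n)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_t, df_t, chi2_b, df_b):
    """Comparative Fit Index of a target model against the independence baseline."""
    num = max(chi2_t - df_t, 0.0)
    den = max(chi2_t - df_t, chi2_b - df_b, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def nfi(chi2_t, chi2_b):
    """Normed Fit Index."""
    return (chi2_b - chi2_t) / chi2_b


def ecvi(chi2, n_free_params, n):
    """Expected Cross-Validation Index, Browne & Cudeck form (assumed here)."""
    return (chi2 + 2 * n_free_params) / (n - 1)


def ecvi_saturated(n_observed, n):
    """Saturated model: chi-square 0 and p(p + 1)/2 free parameters."""
    return n_observed * (n_observed + 1) / (n - 1)


def acceptable(chi2_t, df_t, chi2_b, df_b, q, p, n):
    """Apply the cutoffs stated in the text (sRMR omitted: it needs residuals)."""
    return {
        "RMSEA <= .08": rmsea(chi2_t, df_t, n) <= 0.08,
        "CFI >= .95": cfi(chi2_t, df_t, chi2_b, df_b) >= 0.95,
        "NFI >= .90": nfi(chi2_t, chi2_b) >= 0.90,
        "ECVI < saturated": ecvi(chi2_t, q, n) < ecvi_saturated(p, n),
    }


# Hypothetical example: 13 observed indicators (10 tests + 3 vision measures),
# 231 cases, a four-factor model with factor variances fixed to 1:
# q = 13 loadings + 13 residual variances + 6 factor covariances + 1 error covariance.
print(acceptable(95.0, 58, 1800.0, 78, q=33, p=13, n=231))
```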
For all confirmatory factor models, the scales of the latent factors were defined by fixing their variances to unity. Every indicator was forced to load on only one factor and all error correlations were constrained to be zero with the exception of the error correlation between CFT-Copy and CFT-Recall scores which was freely estimated.
In the second step of the analyses, the factor structure that provided the best fit to the data was used to guide the formation of small neuropsychological composite scores that could be used as part of a brief assessment of fitness to drive in clinical and research settings. One reasonable approach to forming brief assessment batteries is to administer one of the indicator tests of each latent factor, briefly sampling each cognitive domain. We evaluated all possible small composites of this kind for their relative utility in predicting driving performance. To that end, the corrected (for age and visual sensory functioning) and uncorrected correlations of the composites with driving errors were obtained in each of two random subsamples. These correlations were Fisher z-transformed and used as outcome measures in two-way mixed-design ANOVAs.
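The text does not state exactly how the "corrected" correlations were computed. We assume they are partial correlations, obtained by regressing both the composite and the error count on age and the visual sensory measures and then correlating the residuals. The sketch below shows that computation and the Fisher r-to-z step. The function names are hypothetical, and the data are synthetic.

```python
import numpy as np


def residualize(v, covariates):
    """Residuals of v after least-squares regression on an intercept plus covariates."""
    design = np.column_stack([np.ones(len(v)), covariates])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def corrected_r(x, y, covariates):
    """Partial correlation of x and y controlling for the covariates."""
    rx, ry = residualize(x, covariates), residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])


def fisher_z(r):
    """Fisher r-to-z transform, applied before analyzing correlations as outcomes."""
    return float(np.arctanh(r))


# Synthetic illustration: composite and errors are related only through age.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(70, 8, size=n)
vision = rng.normal(0, 1, size=n)
composite = -0.5 * age + rng.normal(0, 2, size=n)
errors = 0.5 * age + rng.normal(0, 2, size=n)
covs = np.column_stack([age, vision])

print("uncorrected r:", np.corrcoef(composite, errors)[0, 1])
print("corrected r:  ", corrected_r(composite, errors, covs))
```

In this synthetic case, the strong uncorrected correlation largely disappears once age is partialled out, which is the kind of confound the correction is meant to remove.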
Table 1 presents the intercorrelations among the observed predictors and outcomes. As can be seen from Table 1, all correlations were in the expected direction. Of the 91 correlations, all but two were significant at the .05 level or better. The absolute values of the correlations between each of the 10 individual neuropsychological tests and total driving errors ranged from .18 to .42 (all p < .01).
Table 2 presents the fit statistics for the confirmatory factor models that examined the dimensionality of the test battery. As can be seen from Table 2, a single-factor model (Model 1a) was highly inadequate. The two-factor model examined whether separating visual sensory functioning (contrast sensitivity and acuity measures) from cognitive functioning would achieve acceptable fit. Although it provided a significant improvement over the single-factor model, the overall fit of this model was also highly inadequate, indicating that a more differentiated view of cognitive functioning is warranted. The three-factor model evaluated the plausibility of distinguishing speeded tests (of both motor and cognitive processing speed), such as TMT-A, Grooved Pegboard (GP), and UFOV total loss, from tests that minimize the role of speed in overall scoring, such as those tapping memory and visuospatial abilities. Again, while the fit of this model was significantly better than that of the two-factor model, the overall fit was inadequate by the RMSEA and NFI criteria, and elevated standardized residuals indicated substantial local misfit. The four-factor model further differentiated among cognitive tests, distinguishing those tapping memory (BVRT-E, COWA, AVLT-Recall, CFT-Recall) from visuospatial tasks that minimize memory demands (JLO, CFT-Copy, and Block Design). As can be seen from Table 2, the four-factor model fit significantly better than the three-factor model, and its fit was adequate by the RMSEA, CFI, and NFI criteria. We also considered the utility of differentiating verbal from nonverbal memory tests in a five-factor model. While the nested likelihood-ratio chi-square test based on normal ML theory indicated a significant improvement in fit over the four-factor model, the rescaled SB chi-square difference was not significant.
Given that previous factor analytic studies have not supported separating AVLT and COWA into latent factors that reflect primarily verbal functioning (Greenway et al., 2009; Siedlecki et al., 2008), we consider inference based on the SB chi-square to be the sounder choice in our sample.
Collectively, these tests indicate that a four-factor model, which distinguishes visual sensory functioning from cognitive functioning and further divides cognitive functioning into memory, visuospatial abilities, and speed of processing (SOP) components, provides the best fit to the data. The factor loadings and estimated latent factor intercorrelations for this model are presented in Figure 1.
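For readers who want to refit a model of this kind, the four-factor structure and the constraints described in the Methods can be written down explicitly. This is an illustration built on several assumptions, not the authors' LISREL setup:

* Each indicator is assigned to its single factor as described for the four-factor model.
* The one freed error covariance (CFT-Copy with CFT-Recall) is included.
* The three vision indicators are inferred from the Methods.
* All indicator labels are hypothetical.

The output is lavaan-style model syntax, emitted as plain text. In lavaan, fitting with `std.lv = TRUE` fixes the latent variances to 1, matching the scaling described above.

```python
# Hypothetical indicator names; the vision indicators are assumed from the Methods.
FOUR_FACTOR_MODEL = {
    "VisualSensory": ["ContrastSens", "AcuityFar", "AcuityNear"],
    "SpeedOfProcessing": ["TMT_A", "GroovedPegboard", "UFOV"],
    "Visuospatial": ["JLO", "CFT_Copy", "BlockDesign"],
    "Memory": ["BVRT_E", "COWA", "AVLT_Recall", "CFT_Recall"],
}
# The single residual covariance that was freely estimated.
CORRELATED_ERRORS = [("CFT_Copy", "CFT_Recall")]


def check_simple_structure(model):
    """Raise if any indicator is assigned to more than one factor; return the count."""
    seen = {}
    for factor, indicators in model.items():
        for ind in indicators:
            if ind in seen:
                raise ValueError(f"{ind} loads on both {seen[ind]} and {factor}")
            seen[ind] = factor
    return len(seen)


def to_lavaan_syntax(model, correlated_errors):
    """Render the measurement model and error covariances as lavaan-style text."""
    lines = [f"{factor} =~ " + " + ".join(inds) for factor, inds in model.items()]
    lines += [f"{a} ~~ {b}" for a, b in correlated_errors]
    return "\n".join(lines)


n_indicators = check_simple_structure(FOUR_FACTOR_MODEL)
print(n_indicators)
print(to_lavaan_syntax(FOUR_FACTOR_MODEL, CORRELATED_ERRORS))
```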
Next, the CFA results were used to inform possible approaches to identifying smaller sets of tests from the full battery that might provide more efficient assessment of driver fitness in aging populations. When only one test from each of the three identified cognitive domains is selected for a brief assessment, there are 36 possible three-test batteries (4 memory × 3 visuospatial × 3 SOP tests), referred to hereafter as mini-composites. We evaluated which of the 36 mini-composites would yield the best prediction of driving performance by examining both the corrected (for age and visual sensory functioning) and the uncorrected correlations between each mini-composite and errors on the on-road driving test. The four panels of Figure 2 present the corrected correlations between the mini-composites and driving errors in the two random subsamples, as well as their average (after r-to-z transforms and back-transforms), with one panel for each of the four tests from the memory domain. The y-axis shows the magnitude of the corrected correlation, and the x-axis shows which tests from the visuospatial abilities and SOP domains were selected for a given mini-composite.
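The enumeration of mini-composites, and the averaging of correlations across subsamples through r-to-z and back-transforms, can be sketched as follows. The text does not say how the three tests are combined into one score. The unit-weighted mean of T-scores in `composite_score` is a hypothetical choice, not the authors' stated rule.

```python
import math
from itertools import product

MEMORY = ["BVRT-E", "COWA", "AVLT-Recall", "CFT-Recall"]
VISUOSPATIAL = ["JLO", "CFT-Copy", "Block Design"]
SPEED = ["TMT-A", "Grooved Pegboard", "UFOV"]

# One test per cognitive domain: 4 x 3 x 3 = 36 three-test batteries.
MINI_COMPOSITES = list(product(MEMORY, VISUOSPATIAL, SPEED))


def composite_score(t_scores, battery):
    """Hypothetical scoring rule: unit-weighted mean of the battery's T-scores."""
    return sum(t_scores[test] for test in battery) / len(battery)


def average_r(rs):
    """Average correlations via r-to-z, arithmetic mean, and z-to-r back-transform."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


print(len(MINI_COMPOSITES))
print(composite_score({"CFT-Recall": 45.0, "CFT-Copy": 52.0, "UFOV": 41.0},
                      ("CFT-Recall", "CFT-Copy", "UFOV")))
print(average_r([-0.30, -0.36]))
```

Averaging in the z metric gives a slightly larger magnitude than the plain mean of the two correlations, because the transform stretches larger values.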
As can be seen from Figure 2, some mini-composites consistently showed larger absolute correlations with on-road errors and less variability in correlations across the two random samples. For example, when BVRT-E served as the memory test (panel a), using UFOV to measure SOP and CFT-C to measure visuospatial ability produced higher correlations with on-road errors in both samples, and less variability across samples (i.e., greater replicability), than choosing TMT-A and CFT-C. To enable more formal inferences about differences in the relative utility of the possible mini-composites in predicting driving performance, the correlations were z-transformed in each random sample and used as outcome measures in two-way ANOVAs. For example, there are 9 mini-composites for each of the four memory tests and hence 9 correlations with errors for each of these tests. Because replicability was evaluated in two random samples, there were 18 correlations per memory test as outcomes in the ANOVA. In total, six two-way ANOVAs were conducted: three on the z-transformed corrected correlations and three on the z-transformed uncorrected correlations. The two random samples served as the between-subjects factor in these ANOVAs, and the tests (indicators) from each of the three cognitive domains (latent factors) formed the levels of the within-subjects factor.
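The two-way mixed design described above can be sketched for a balanced layout. This is a hand-rolled illustration, not the software the authors used. It assumes that each "unit" is one pairing of tests from the other two domains, treated as independent across the two subsamples as the text describes. The function name is hypothetical, and the data are random placeholders. With this layout, the degrees of freedom match those reported for the memory domain, F(3, 48), and the speed-of-processing domain, F(2, 44).

```python
import numpy as np


def mixed_anova(data):
    """Two-way mixed ANOVA for a balanced design.

    data has shape (g, n, k): g levels of the between-units factor (random
    subsample), n units per level, and k levels of the within-units factor
    (the tests from one cognitive domain). Returns F ratios and degrees of
    freedom for the within factor and the between x within interaction.
    """
    g, n, k = data.shape
    grand = data.mean()
    group_m = data.mean(axis=(1, 2))          # subsample means, shape (g,)
    unit_m = data.mean(axis=2)                # unit means, shape (g, n)
    level_m = data.mean(axis=(0, 1))          # test means, shape (k,)
    cell_m = data.mean(axis=1)                # subsample x test means, shape (g, k)

    ss_total = ((data - grand) ** 2).sum()
    ss_between = n * k * ((group_m - grand) ** 2).sum()
    ss_units = k * ((unit_m - group_m[:, None]) ** 2).sum()
    ss_test = g * n * ((level_m - grand) ** 2).sum()
    ss_inter = n * ((cell_m - group_m[:, None] - level_m[None, :] + grand) ** 2).sum()
    ss_error = ss_total - ss_between - ss_units - ss_test - ss_inter  # test x units

    df_test, df_inter = k - 1, (g - 1) * (k - 1)
    df_error = g * (n - 1) * (k - 1)
    ms_error = ss_error / df_error
    return {
        "F_test": (ss_test / df_test) / ms_error,
        "df_test": (df_test, df_error),
        "F_interaction": (ss_inter / df_inter) / ms_error,
        "df_interaction": (df_inter, df_error),
    }


rng = np.random.default_rng(1)
# Memory domain: 2 subsamples x 9 (visuospatial, SOP) pairings x 4 memory tests.
memory_z = rng.normal(-0.3, 0.05, size=(2, 9, 4))
print(mixed_anova(memory_z)["df_test"])   # (3, 48)
# SOP domain: 2 subsamples x 12 (memory, visuospatial) pairings x 3 SOP tests.
sop_z = rng.normal(-0.3, 0.05, size=(2, 12, 3))
print(mixed_anova(sop_z)["df_test"])      # (2, 44)
```

Applying the same function to a two-test slice, such as `memory_z[:, :, [0, 3]]`, gives df of (1, 16), the df of the pairwise follow-up comparisons. The subsample × test interaction term is presumably what indexes differences in replicability across samples.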
A 4×2 mixed-design ANOVA for the memory tests indicated significant differences in the size of the correlations of the four memory tests with driving performance, F(3, 48) = 63.80, p < .001. CFT-R produced larger correlations with driving performance than BVRT-E, F(1,16) = 55.41, p < .001, and BVRT-E produced larger correlations than both AVLT-Recall, F(1,16) = 52.63, p < .001, and COWA, F(1,16) = 16.67, p < .005. Correlations with driving performance did not differ in size when COWA or AVLT-Recall was selected to tap memory, F(1,16) = 1.47, ns.
The 3×2 ANOVA for the three tests from the SOP domain also indicated differences in the relative size of correlations with driving performance, F(2,44) = 17.47, p < .001. Follow-up comparisons indicated that UFOV total loss produced higher correlations with driving performance than GP, F(1,22) = 12.01, p < .005, and that GP produced higher correlations than TMT-A, F(1,22) = 13.35, p < .001.
The 3×2 ANOVA for the three tests from the visuospatial domain also indicated differences in the relative size of correlations with driving performance, F(2,44) = 35.71, p < .001. Follow-up comparisons within the visuospatial abilities domain indicated that CFT-C produced higher correlations with driving performance than Blocks, F(1,22) = 72.36, p < .001. Although Blocks and JLO produced correlations of similar magnitude on average (i.e., across random samples), F(1,22) < 1, ns, the correlations of JLO with driving performance fluctuated to a greater extent across samples (less replicability) than those of Blocks, F(1,22) = 59.12, p < .001.
In summary, the findings showed that UFOV, followed by GP, from the SOP domain; CFT-Copy, followed by Block Design, from the visuospatial abilities domain; and CFT-Recall, followed by BVRT-E, from the memory domain yielded the highest correlations with on-road performance. Inferences were the same when uncorrected correlations were examined.
Along with perceptual and motor abilities, cognitive status is a key determinant of driving safety risk in older adults (Anstey et al., 2005). Neuropsychological assessment of relevant cognitive abilities can play an important role in screening and in more comprehensive evaluation of driving fitness. The findings of the current study showed logical and significant relationships between performance on standardized tests tapping key cognitive domains and safety errors committed while driving an automobile. Noninvasive and practical tests of vision, speed of processing, visuospatial abilities, and memory predicted older drivers’ safety risk. In clinical settings that require recommendations regarding driving fitness, such test data can be combined with other relevant medical information, the driver's history of crashes or traffic violations, current driving activity and transportation needs, and concerns raised by the patient or by family members who have observed the patient's driving.
In this analysis of relationships between neuropsychological test performance and driving ability, we attempted first to differentiate and specify the domains of cognitive functioning relevant to driving performance, and then to identify possible sets of tests that might be used in brief screening or assessment of driver fitness in clinical or research settings. Consistent with the multidimensional demands of driving, our findings suggest there is value in evaluating multiple key cognitive domains. Specifically, in addition to standard vision screening (near and far acuity and contrast sensitivity), the key domains identified for assessment in older drivers were speed of processing, visuospatial processing, and memory. Within each of these domains, two or more standardized tests showed reasonably strong correlations with the driving outcome measure (e.g., speed of processing: UFOV and GP; visuospatial abilities: CFT-Copy and Block Design; memory: CFT-Recall and BVRT-E). The choice of tests in a given evaluation of driving safety risk could take into account both the statistical relationships illustrated here and practical concerns such as time and cost.
This study provides the largest analysis of relationships between neuropsychological test performances and on-road driving to date, but it has limitations. The inclusion of only legally licensed and actively driving participants precludes the establishment of firm cutoff scores for predicting driver failure. However, this subject group reflects both the challenge clinicians face in identifying at-risk drivers, as well as the realities of recruiting research participants who may perceive their driving privileges to be at risk. Another limitation is that all drives in this study were completed in good weather conditions, daylight hours, and during non-rush hour times of day, in order to minimize safety risks to participants. Although the route included a variety of rural and urban driving challenges (e.g., controlled and uncontrolled intersections, left turns, lane changes on interstate and multi-lane city roads), the investigators’ choice of driving conditions necessarily reduced the challenges that may characterize typical driving. This design also does not allow consideration of drivers’ strategic approach to driving, which may or may not include self-restriction to good driving conditions.
Another limitation is our choice of neuropsychological tests. As a starting point, we chose a brief battery of commonly used and readily available tests with logical and/or empirical links to driving, but these tests clearly do not represent all possible choices. We anticipate that ongoing research will help to identify even more useful tests of cognition for predicting driving safety. One domain that may not be adequately represented in the current battery is executive functions, which have both theoretical and empirically demonstrated relevance to driving (e.g., Whelihan et al., 2005).
This study shows that neuropsychological abilities in specific cognitive domains are important factors in driving. Previous studies have related cognitive test scores to driver behavior, but often only as univariate predictors (Reger et al., 2004), and, in the absence of clear cutoff scores, the findings have not translated into clinical recommendations (Iverson et al., 2010). Continued study of the predictive value of established tests, along with new experimental approaches and theory, is essential. While aging and age-related medical disorders may increase the risk of driver errors that lead to vehicle crashes, the relationship between diagnosis and safety risk is unstudied or unclear for many diagnoses. The findings in this study support a general evidence-based framework for evaluating driver fitness based on a functional evaluation of multiple domains that are important for safe driving. Cognitive tests from these domains can complement evidence from other sources (such as driving simulation and road tests) in assessments of older driver safety, even in the absence of a known diagnosis. Diagnostic status, when known, can help guide recommendations regarding the timing of follow-up evaluation.
The neuropsychological tests and test batteries identified here may provide a cost-effective component of a set of evidence-based criteria for evaluating older drivers’ safety risk. Changing population demographics point to a pressing need for continued research aimed at effectively preserving mobility while reducing safety risk in older citizens.
This study was supported by awards AG 17177 and AG 15071 from the National Institute on Aging (NIA) and NS 44930 from the National Institute of Neurological Disorders and Stroke (NINDS), which provided salary support to the authors. The authors thank the entire neuroergonomics research team and all participants in the study.
1 Because Full Information Maximum Likelihood (FIML) output from LISREL provides few goodness-of-fit indices other than RMSEA, the tables and figures are based on robust maximum likelihood estimates using 231 cases with listwise deletion. However, model-fitting analyses using FIML on all 345 participants led to similar inferences regarding factor structure.