Expert knowledge may compensate for age-related declines in basic cognitive and sensory-motor abilities in some skill domains. We investigated the influence of age and aviation expertise (indexed by Federal Aviation Administration pilot ratings) on longitudinal flight simulator performance.
Over a 3-year period, 118 general aviation pilots aged 40 to 69 years were tested annually. Their flight performance was scored in terms of 1) executing air-traffic controller communications; 2) traffic avoidance; 3) scanning cockpit instruments; 4) executing an approach to landing; and 5) a flight summary score.
More expert pilots had better flight summary scores at baseline and showed less decline over time. Secondary analyses revealed that expertise effects were most evident in the accuracy of executing aviation communications, the measure on which performance declined most sharply over time. Regarding age, even though older pilots initially performed worse than younger pilots, over time older pilots showed less decline in flight summary scores than younger pilots. Secondary analyses revealed that the oldest pilots did well over time because their traffic avoidance performance improved more than that of younger pilots.
These longitudinal findings support previous cross-sectional studies in aviation as well as non-aviation domains, which demonstrated the advantageous effect of prior experience and specialized expertise on older adults’ skilled cognitive performances.
As the workforce ages in an era of accelerating technological advances, it becomes imperative to understand how aging affects performance in the workplace. In aviation, for example, an aging workforce coincident with the introduction of jet aircraft appears to have played a role in the Federal Aviation Administration’s (FAA) decision for mandatory retirement of airline pilots at age 60.1 It has been argued that age-based retirement rules are discriminatory and should be replaced with more direct methods of risk assessment.2 Simulations of occupationally relevant or hazardous activities such as driving are desirable complements to medical and neuropsychological assessments because simulations permit individuals to draw upon prior knowledge and procedural memory relevant to a skill domain. Cross-sectional studies of expert performers, including medical technologists, typists, and musicians, have found that expert knowledge may compensate for age-related declines in basic cognitive and sensory-motor abilities in some skill domains.3–8 Thus, expertise may moderate (reduce) the impact of age on occupationally relevant performance.
Flight simulator assessments provide objective, reliable performance measures that are sensitive to differences in age9 and level of aviation expertise,10 but all of the flight simulator and expertise research to date has been cross-sectional. Longitudinal studies are essential toward understanding the aging process and its interplay with putative protective factors such as expertise.11,12 Using data from the ongoing Stanford/VA longitudinal study of aviators aged 40 to 69 years at study entry, we investigated the influences of age and expertise on flight simulator performance over a 3-year period.
Participants were part of the ongoing longitudinal Stanford/VA Aviation Study approved by the Stanford University Institutional Review Board. Main inclusion criteria were age 40 to 69 years at study entry, current FAA medical certificate (Class III or higher), and current flying activity with 300 to 15,000 hours of total flight time. This range of total flight hours was designed to avoid strong collinearity between age and hours of aviation experience; older airline pilots, for example, typically have over 20,000 hours of total flight time. Retired pilots from major air carriers were excluded because decline in flight simulator performance could be explained by less opportunity to fly after retirement. Thus, we selected a group of pilots whose aviation activity did not necessarily change at age 60. All participants gave written informed consent to participate in annual testing, with the right to withdraw at any time.
At entry, each participant was classified into one of three levels of aviation expertise depending on which FAA pilot proficiency ratings had been previously attained: 1) least expertise: VFR (rated for flying under visual flight rules only); 2) moderate expertise: IFR (also rated for instrument flight); and 3) most expertise: CFII, ATP, or both CFII and ATP (certified flight instructor of IFR students or rated for flying air-transport planes). FAA ratings are a convenient yet valid indicator of expertise level because each rating requires progressively more advanced training and more hours of flight experience. Within the VFR group, all were recreational pilots, though two had aviation-related employment (airplane broker and aircraft mechanic). Within the IFR group, the majority (55/60) had careers unrelated to aviation, though a few were part-time CFIs (2) or aviation analysts (2), or had been an aviator in the army. Approximately one half (14/26) of the CFII/ATP participants were employed as full-time air transport pilots (3), part-time air transport pilots (4), or CFIIs (3), or had job duties that included aircraft piloting (4).
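The three-level classification above can be sketched as a small lookup. This is a minimal illustration, not the study's procedure: the function name and the rating labels as strings are assumptions, and the ordinal codes (−1, 0, 1) follow the coding that the statistical analysis describes for expertise.

```python
# Hypothetical helper: maps the set of FAA ratings a pilot holds to the
# three expertise levels described in the text. Labels are illustrative.

def classify_expertise(ratings):
    """Return (group_label, ordinal_code) for a collection of rating labels."""
    held = {r.upper() for r in ratings}
    if held & {"CFII", "ATP"}:
        return ("CFII/ATP", 1)   # most expertise: CFII, ATP, or both
    if "IFR" in held:
        return ("IFR", 0)        # moderate expertise: instrument rated
    return ("VFR", -1)           # least expertise: visual flight rules only


print(classify_expertise({"VFR"}))
print(classify_expertise({"VFR", "IFR"}))
print(classify_expertise({"IFR", "CFII", "ATP"}))
```

The highest rating held determines the group, which mirrors the statement that each rating requires progressively more advanced training.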
Participants completed a cognitive battery designed to test abilities relevant for piloting aircraft, including tests from the CogScreen-AE battery13 and tests of information processing speed14 (see table 1 for means and table E-1 on the Neurology Web site at www.neurology.org for descriptions of the measures). Participants with at least three annual time points of flight simulator testing were included in the longitudinal data analyses. Of 141 participants who completed baseline testing before June 1, 2001, 118 had at least three annual time points (mean = 3.8, SD = 0.43), representing an average span of 3.1 years of follow-up (SD = 0.6). Of the 23 participants who had fewer than three annual time points, 12 discontinued participation after the baseline visit (8% of 141); 10 discontinued after the first follow-up (7%); and 1 had only two time points due to missing the first annual follow-up (1%). Stepwise logistic regression modeling did not identify any participant characteristics indicative of selective attrition. The characteristics included in the model were age at entry, expertise group membership, years of education, total hours of flight time, gender, necessity of a FAA medical waiver, self-reported health, performance on five cognitive tests, and overall performance in the flight simulator at entry.
Table 1 summarizes characteristics of the longitudinal participants at entry, separated by expertise group. As shown in the table, the groups differed in mean age [F(2,117) = 5.03; p < 0.01]. Also, higher levels of expertise were associated with more total flight time [p < 0.0001; nonparametric Kruskal-Wallis F(2,117) = 35.30] as would be expected, and with more recent flight time [p < 0.01; Kruskal-Wallis F(2,117) = 5.30]. We detected no differences in cognitive test scores by expertise group (ps > 0.05; effect sizes [ES] ranged from −0.18 to 0.02). Older age was associated with lower cognitive test scores (all ps < 0.01, ESs ranged from −0.26 to −0.50), which is consistent with previous findings for the early enrollees of this study10,15 (supplementary data E-1 on the Neurology Web site at www.neurology.org lists results of models testing the effects of age, expertise, and their interaction on cognitive test scores). Finally, despite capping total flight time to avoid collinearity between age and flight time, there was a small correlation between age and total flight time (rs = 0.29, p < 0.01).
Pilots “flew” in a Frasca 141 flight simulator (Urbana, IL). The simulator was linked to a computer specialized for graphics (Silicon Graphics, Mountain View, CA) that generated a “through-the-window” visual environment and continuously collected data concerning the aircraft’s position and communication frequencies. This system simulated flying a small single-engine aircraft with fixed landing gear and fixed propeller above flat terrain with surrounding mountains and clear skies. A cockpit speaker system was used to present prerecorded audio messages that simulated an air-traffic controller speaking to the pilot.
Prior to longitudinal data collection, participants had six practice flights in the simulator to gain familiarity with the flight scenario used throughout the study. Participants typically completed their practice flights during a 1- to 3-week period, after which they had a 3-week break before returning for the baseline visit. At the baseline visit and each annual time point thereafter, the participant flew a 75-minute flight in the morning and a 75-minute flight in the afternoon. Each flight was followed by a 40- to 60-minute battery of cognitive tests. The entire test day lasted approximately 6 hours, including a 40- to 60-minute lunch break. Each flight began with the air-traffic controller’s takeoff clearance. The first air-traffic control (ATC) message was presented 3 minutes later, after participants had lifted off the runway and climbed to 1,200 ft (365.76 m). During the flight, pilots heard 16 ATC messages, presented at the rate of one message every 3 minutes, directing the pilot to fly a new heading, a new altitude, dial in a new radio frequency, and, in 50% of the legs, dial in a new transponder code. Participants were instructed to read back the ATC messages and then execute them in order and according to FAA standards. To further increase workload, pilots were confronted with randomly presented emergency situations: engine malfunctions (carburetor icing, drop of engine oil pressure in 8/16 legs), and suddenly approaching air traffic (10/16 legs). Pilots were to report engine malfunctions immediately and to avoid air traffic by veering quickly yet safely in the direction diagonal to the path of the oncoming plane. Pilots flew in severe turbulence throughout the flight, and also encountered a 15-knot crosswind during approach and landing. Multiple versions of this flight scenario were presented to reduce learning of specific maneuvers and ATC items.
The scoring system of the flight simulator-computer system produces 23 variables9,16 that measure deviations from ideal positions or assigned values (e.g., altitude in feet, heading in degrees, airspeed in knots), or reaction time (in seconds). Because these individual variables have different units of measurement, the raw scores for each variable were converted to z-scores, using the baseline visit mean and SD (scores on the morning and afternoon flights were averaged).
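The standardization step can be sketched as follows. This is an illustration under stated assumptions: the morning and afternoon flights are averaged before standardizing, the sample SD (n − 1 denominator) is used, and the raw direction of each variable is left unchanged, since the text does not say whether signs were flipped so that higher means better. The function names and the numbers are hypothetical.

```python
from statistics import mean, stdev


def visit_score(morning, afternoon):
    """Average the morning and afternoon flights of one test day."""
    return (morning + afternoon) / 2


def to_baseline_z(scores, baseline_scores):
    """Express scores in units of the baseline-visit mean and SD."""
    mu = mean(baseline_scores)
    sd = stdev(baseline_scores)  # sample SD; an assumption, see lead-in
    return [(s - mu) / sd for s in scores]


# Hypothetical altitude deviations (feet) for five pilots at baseline,
# given as (morning, afternoon) pairs.
baseline = [visit_score(m, a)
            for m, a in [(85, 95), (100, 120), (95, 105), (125, 115), (70, 90)]]

# One pilot's follow-up visit, scaled against the baseline distribution.
followup = [visit_score(110, 130)]
print(to_baseline_z(followup, baseline))
```

Because every later visit is scaled against the same baseline distribution, a change in a pilot's z-score over time reflects change relative to where the whole sample started.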
The z-scores on the individual measures were aggregated on the basis of previous principal component analyses into four component measures: 1) accuracy of executing the ATC communications; 2) traffic avoidance; 3) scanning cockpit instruments to detect engine emergencies; and 4) executing a visual approach to landing.9,16
The four component measures were averaged to create a flight summary score, which was the primary measure of performance. To elucidate further how performance changed over time, the four component measures were analyzed as secondary measures. Random effects modeling was used to examine baseline levels and annual rates of change in the primary and secondary flight measures. For each participant and measure, the participant’s scores from each test day were regressed on the age at test, yielding intercept and slope values for the primary summary measure and for each of the four secondary measures. Thus, each participant had five baseline scores (the intercepts at entry age) and five rate-of-change scores (the slopes). To test hypotheses regarding age and expertise, the baseline and rate-of-change scores were analyzed using general linear modeling (SAS Proc GLM). A separate GLM was constructed for the primary and each secondary outcome measure. The terms of these GLMs were intercept, expertise, age, and the age × expertise interaction. Expertise was coded as an ordinal variable (−1, 0, 1) and age was a continuous variable centered at the median.17 The hypotheses regarding the primary outcomes were as follows:
Table 2 lists the GLM estimates for the model terms (i.e., intercept, age, expertise, and age × expertise terms). Flight summary scores modestly declined an average of 0.025 standard units per year (p < 0.05; ES = −0.22). The average rates of change in the various component measures of simulator performance varied widely (see intercept term β0 estimates listed in table 2). Communication task performance declined the most steeply, showing a decline of 0.091 units per year (p < 0.0001; ES = −0.61). The average rate of decline in visual approach-to-landing performance was modest (p < 0.01; ES = −0.26). In contrast, traffic avoidance performance showed an improvement of 0.041 units per year (p < 0.05; ES = 0.19). There was virtually no change in the average time to report engine emergencies (t < 1; ES = 0.08).
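The per-participant rates of change averaged above come from regressing each pilot's scores on age at test. A minimal sketch of that step, assuming ordinary least squares with age expressed as years since study entry so that the intercept is the fitted score at entry age; the function name and the example values are hypothetical.

```python
def fit_participant(ages_at_test, scores):
    """Return (baseline, annual_slope) from a least-squares line.

    Age is re-expressed as years since the earliest test, so the
    intercept is the fitted score at entry age.
    """
    entry_age = min(ages_at_test)
    x = [age - entry_age for age in ages_at_test]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(scores) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical pilot tested at four annual visits (summary z-scores).
baseline_score, annual_change = fit_participant(
    [55.0, 56.0, 57.0, 58.0], [0.10, 0.05, 0.00, -0.05]
)
print(round(baseline_score, 3), round(annual_change, 3))
```

A negative slope corresponds to decline in standard units per year; the slopes and intercepts from all participants then serve as the outcomes in the group-level models.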
Beneficial effects of expertise were observed at baseline and longitudinally. The expected age differences at baseline were observed; yet, the longitudinal age patterns were quite different than expected. The GLM estimates are summarized in table 2, and described in detail below.
Advanced flight ratings and certifications were associated with better flight summary scores at baseline (β2 = 0.155, p < 0.05, ES = 0.24) and less decline over time (β2 = 0.039, p < 0.05, ES = 0.23). The average performance of CFII/ATP rated pilots was essentially flat over the duration of follow-up (mean slope of the flight summary score = 0.002 ± 0.10). VFR-rated pilots had the steepest rate of decline in flight summary scores (mean = −0.066 ± 0.13). IFR-rated pilots had an intermediate rate of change (mean = −0.015 ± 0.11). The beneficial effects associated with aviation expertise were especially apparent in the communication task (see table 2).
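The structure of the model behind these estimates (intercept, age, expertise, and age × expertise, with expertise coded −1, 0, 1 and age centered at the median) can be sketched with ordinary least squares. This is not the authors' SAS Proc GLM code: the function names are assumptions, the data are synthetic and noise-free, and the generating coefficients only loosely echo the magnitudes reported in the text, with the interaction set to zero for illustration.

```python
from statistics import median


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_age_expertise_model(ages, expertise_codes, outcomes):
    """Return [b0, b1_age, b2_expertise, b3_interaction] by least squares."""
    age_mid = median(ages)  # age is centered at the sample median
    rows = []
    for age, code in zip(ages, expertise_codes):
        age_c = age - age_mid
        rows.append([1.0, age_c, float(code), age_c * code])
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic example: slopes generated from known coefficients, no noise.
ages = [42, 45, 47, 51, 53, 55, 58, 60, 61, 64, 66, 68]
codes = [-1, 0, 1, 0, -1, 1, 0, 1, -1, 0, 1, -1]
true_b = [-0.025, 0.004, 0.039, 0.0]  # illustrative values only
mid = median(ages)
slopes = [true_b[0] + true_b[1] * (a - mid) + true_b[2] * c
          + true_b[3] * (a - mid) * c for a, c in zip(ages, codes)]

print([round(b, 4) for b in fit_age_expertise_model(ages, codes, slopes)])
```

With this coding, the intercept is the expected outcome for a median-aged pilot in the middle (IFR) group, and the expertise coefficient is the expected difference per step up in rating level.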
Because the three expertise groups differed significantly in terms of hours of flight time, it was important to examine the extent to which more hours of flight experience could also account for better flight simulator performance. Also, the differing amounts of flight time for VFR, IFR, and CFII/ATP pilots progressively widened over the 3 years of follow-up: VFR pilots accumulated an average of 63.0 ± 88.4 hours per year, IFR pilots an average of 99.6 ± 94.9 hours per year, and CFII/ATP pilots an average of 223.9 ± 203.4 hours per year [F(2,117) = 12.69, p < 0.0001]. To examine the role of flight time on pilot performance, we recomputed the baseline and longitudinal age × expertise models, replacing FAA pilot ratings with flight time. In the model of baseline performance, the total hours of flight time reported at study entry was tested (along with age and its interaction with flight time). More total flight time did not predict better flight summary scores at baseline (p > 0.10; ES = 0.14; GLM parameter estimates are listed in table E-3). Similarly, greater accumulation of hours of flight experience during follow-up was not associated with less longitudinal decline in the flight summary score (p > 0.50; ES = 0.04; see table E-3). In short, expertise, defined by advanced training and extensive time engaged in the activity, was a stronger predictor of skilled performance than amount of activity alone. These findings illustrate how expertise is distinct from amount of activity, even though the two may be intercorrelated.
Older age was associated with lower flight summary scores at baseline (β1 = −0.038, p < 0.0001, ES = −0.58). The effects of age on baseline performance were most evident in the traffic avoidance (ES = −0.60) and approach measures (ES = −0.47), though age-related differences were significant for all of the flight component measures (see table 2). Longitudinal analysis of the flight summary scores revealed, surprisingly, that older pilots showed less decline over time than younger pilots (β1 = 0.004, p < 0.01, ES = 0.25). The unexpected longitudinal age pattern primarily reflects the fact that older pilots improved their traffic avoidance performance more so than younger pilots (p < 0.01; ES = 0.25).
To illustrate the age trends, pilots were subgrouped into three age ranges: 40 to 49, 50 to 59, or 60 to 69 years of age at study entry. This grouping reveals that, in terms of overall flight simulator performance, pilots aged 40 to 49 had a mean rate of change of −0.057 standard units per year; pilots aged 50 to 59 had a mean rate of change of −0.040 units per year; and pilots aged 60 to 69 had a mean improvement of 0.018 units per year. The figure illustrates the age-related and expertise-related patterns of performance over time. Plotted are the baseline means and the directionality of annual change for pilots within the three age ranges and as a function of FAA rating. As can be seen in the figure, the annual rate of decline decreased with increasing age and with increasing expertise. We did not detect an interaction between age and expertise (p = 0.15; ES = 0.14). It should be noted that the numbers of participants in the extremes of the age and expertise distributions were modest (5 to 9).
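The descriptive subgrouping above amounts to binning pilots by decade of entry age and averaging their individual slopes. A minimal sketch, with hypothetical function names and made-up slopes that are not the study data:

```python
from collections import defaultdict
from statistics import mean


def age_group(entry_age):
    """Label the decade of entry age, restricted to the study's range."""
    if not 40 <= entry_age < 70:
        raise ValueError("entry age outside the 40 to 69 range")
    decade = int(entry_age // 10) * 10
    return f"{decade} to {decade + 9}"


def mean_slope_by_age_group(entry_ages, slopes):
    """Average the per-pilot annual slopes within each age range."""
    grouped = defaultdict(list)
    for age, slope in zip(entry_ages, slopes):
        grouped[age_group(age)].append(slope)
    return {label: mean(values) for label, values in sorted(grouped.items())}


# Hypothetical pilots: entry ages and annual change in summary score.
entry_ages = [44, 48, 52, 57, 63, 69]
slopes = [-0.06, -0.05, -0.05, -0.03, 0.01, 0.03]
for label, value in mean_slope_by_age_group(entry_ages, slopes).items():
    print(label, round(value, 3))
```

The grouping is only for illustration of the trend; the hypothesis tests themselves treat age as a continuous variable.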
Findings confirm that flight simulator assessments can detect changes in performance related to age and expertise. Over a 3-year span of testing, we observed a significant though modest decline in overall performance, which varied depending on pilot age and FAA proficiency ratings. Of the four flight components assessed, communication task performance declined the most steeply over time. The present study focused on general aviation pilots due to the difficulty in drawing conclusions about age-related performance differences among airline pilots because mandatory retirement impacts the amount and type of flight experience after age 60. Nonetheless, the population of older general aviation pilots is important in its own right because of needs for medical monitoring and because general aviation accident rates have historically been as much as 90 times the rate for air carriers.18 Remarkably, a recent epidemiologic study reported an increased risk of general aviation accidents with increasing age, beginning at age 35.18 Hours of flight experience, a variable more accessible than FAA ratings in aviation databases, is consistently found to be a relevant factor in epidemiologic studies of aircraft accident rates.18,19
In this study, more expert pilots, i.e., those with advanced FAA pilot ratings and certifications, had better baseline flight simulator performance, especially in the communication and approach-to-landing components. Several cross-sectional studies have documented the advantage of aviation expertise (and hours of flight experience) in laboratory studies of cockpit scanning,20 processing ATC communications,8,10,21 performing instrument flight maneuvers,22 and making weather-related decisions.23 More expert pilots in this study also showed less decline over time on average. This longitudinal result bolsters previous cross-sectional findings in aviation as well as non-aviation domains, which demonstrated the advantageous effect of prior experience and specialized expertise on older adults’ skilled performances.3–7
The prevailing theoretical view is that the acquisition of expertise typically requires a decade or more of deliberate, well-structured practice in a particular skill domain (such as music, athletics, or chess)24 and reflects brain plasticity,25 such that experts build an elaborate, integrated base of declarative and procedural knowledge. This specialized base of knowledge supports attention to key relationships between individual items of information,20 anticipation of likely future events,26 and coordination of motor movements4 to respond faster and more accurately. For example, expert pilots attend to the relationship between speed and direction of visual information to anticipate the ideal flight path, whereas novices have not attained this skill.27 Finally, expert knowledge has been characterized as an example of crystallized intelligence, which is more stable across the lifespan6 than fluid abilities such as episodic memory recollection and executive control.
Aviation expertise was associated with less decline in flight simulator performance over time. Multiple, interrelated mechanisms related to amount and type of aviation knowledge and frequency of use may explain this finding. We conjecture that in addition to drawing upon aviation knowledge, pilots also learned test-taking strategies specific to the flight simulator testing scenario (especially during the practice sessions). To the extent that memory for test-taking strategies fades at annual follow-up visits, pilots with basic ratings might show decline in overall flight simulator performance. In contrast, strategy recollection may be less consequential for pilots with advanced ratings. First, a pilot with advanced ratings has the benefit of a more elaborate base of knowledge. Second, this knowledge base is better adapted24 to skills measured in the flight simulator, such as precise altitude control. Third, between the annual tests, a pilot with advanced ratings may engage in flight activities that require continual access of knowledge and practice of skills (e.g., flight instructing or instrument flying to maintain close altitudes and precision runway approaches), which may help maintain some skills measured in the simulator. VFR-rated pilots are less likely to be engaged in such precise flying between annual tests. Future research should record the time spent in specific types of flight activities to address questions of how much and what types of experience promote stable or improved aviation performance.
The age differences in flight simulator performance observed at baseline are consistent with earlier cross-sectional studies, which also found that older pilots executed air-traffic controller communications less accurately on average, evaded air-traffic conflicts less adroitly, and less skillfully approached the runway for landing.9,10,28 Unexpectedly, older pilots showed less longitudinal decline in overall flight performance than younger pilots. Secondary analyses revealed that the older pilots did well over time in part because their traffic avoidance performance improved more than that of younger pilots. There are several possible reasons why older pilots maintained their levels of performance over time, including sampling bias related to hardy survivor and nonrandom drop-out effects, birth cohort differences, and truncated age range of the sample. In view of the lack of evidence for nonrandom drop-out biasing, we focus on three other possible explanations: floor effects, in which poor performers have less room to decline than high performers; regression to the mean, in which over time lower performers improve and higher performers decline; and differential practice effects, in which older participants benefit more from repeated testing than the younger participants.
Older pilots performed worse than younger pilots at baseline on average, and therefore, may have less room to decline due to floor effects. A floor effect has been noted for the transponder item of the communication measure.29 Nevertheless, older and less expert pilots’ communication performance continued to decline over time. In other measures, such as approach, it was possible to have very large deviations from ideal positions, and therefore, substantial room to decline. Thus, floor effects do not appear to be a convincing explanation for the finding of less decline on average for older participants.
Regression to the mean may partially explain the findings, particularly in the traffic avoidance task. Two conditions that together allow regression to the mean are unreliable measures and the differential selection in a pre-post design of participants who initially scored at the extremes. Because we did not use an extreme groups pre-post design, regression to the mean is not an obvious explanation. Also, the flight summary score showed excellent consistency over time (intraclass correlation or ICC = 0.79) and reliability was enhanced by having three to four annual points per participant. Importantly, regression to the mean cannot explain, in the case of the summary and the communication scores, why expertise would give rise to higher baseline scores and less decline over time. Nonetheless, the traffic avoidance results may reflect regression to the mean to some extent because traffic avoidance was the component that showed the largest negative age relations at baseline, significant age-related improvement over time, and the least consistency over time (ICC = 0.48).
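The consistency figures quoted above are intraclass correlations. The text does not state which ICC form was used, so the following is only a sketch of one common variant, the one-way random-effects ICC for a single measurement, and it assumes a balanced design (the same number of visits per pilot), which the study data did not strictly have. The function name and the numbers are hypothetical.

```python
def icc_oneway(scores_by_subject):
    """One-way random-effects ICC for single measurements.

    scores_by_subject: list of per-subject lists, all of equal length k.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n = len(scores_by_subject)
    k = len(scores_by_subject[0])
    if any(len(s) != k for s in scores_by_subject):
        raise ValueError("this sketch assumes a balanced design")
    subject_means = [sum(s) / k for s in scores_by_subject]
    grand_mean = sum(subject_means) / n
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum((x - m) ** 2
              for s, m in zip(scores_by_subject, subject_means)
              for x in s) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical summary scores: three pilots, three annual visits each.
stable = [[0.5, 0.4, 0.6], [-0.2, -0.1, -0.3], [1.1, 1.0, 1.2]]
print(round(icc_oneway(stable), 3))
```

A value near 1 means that pilots keep their rank order across visits, so extreme baseline scores are less likely to drift toward the mean by chance alone; a lower value, as for traffic avoidance, leaves more room for regression to the mean.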
Another explanation that is consistent with the pattern of results is an age difference in practice effects. Practice effects have been increasingly recognized in longitudinal studies of normal aging and preclinical dementia.30–33 Indeed, the incremental increase of taking a test the second time has been estimated to be as much as 10 to 15 times larger than the effect of aging 1 year.30,32 We attempted to minimize practice effects by familiarizing pilots with the simulator scenario prior to the baseline assessment. Nevertheless, performance of the traffic avoidance task continued to improve over time, with older pilots improving more than younger pilots. There currently are two lines of evidence, in computerized testing, for greater improvement among older adults. Older adults have shown greater improvements than younger adults in reaction time in the task-switching34 and in consistent-mapping visual search paradigms.35–37 Although older adults did not show greater improvement when working-memory load was high,34 nor did they show as much stimulus-specific learning as did younger participants,36 older adults retained what they learned up to 16 months.37 In the present study, older pilots conceivably improved their reactions to oncoming traffic by learning task switching and visual search strategies helpful to performance. Greater improvement among older adults has rarely been reported in longitudinal studies employing paper-and-pencil neuropsychological or intellectual ability tests31,32 (but see reference 38). Our atypical finding will need to be replicated in the independent cohort of aviators we are presently enrolling.
Some cross-sectional studies found that aviation expertise moderated age differences in pilot performance.8,39,40 Other studies found that while aviation expertise significantly helped performance, expertise did not significantly moderate the influence of age.10,21 The present study did not find that aviation expertise moderated the impact of age on longitudinal flight simulator performance. Because the numbers of participants in the extremes of the age and expertise distributions were modest, statistical power for the test of an age × expertise interaction was less than it would be in an extreme-groups design.41 Clearly, a longer duration of follow-up is crucial to examining an age-moderating effect of expertise on specialized skill domains.42
These findings have broader implications beyond aviation to the general issue of aging in the workplace. Several issues emerge from an aging workforce, including technological developments, training and retraining, retirement, physical capacity, health, and performance.43 Middle-aged workers, for example, can be retrained as effectively as young workers, while older workers also can be retrained but less efficiently than their younger counterparts.44 If retirement ages become increasingly delayed, objective assessments of workplace competence will become essential for older workers, especially when the occupation is viewed as a public safety concern. On the one hand, there is rising incidence of medical and neurologic problems with age.45 On the other hand, older expert workers may be able to adapt to normal age-associated changes through increased reliance on domain-specific knowledge and procedural memories, which are less age-sensitive, and by adopting strategies that help maintain successful performance and minimize errors.4,5,44,46,47 In order to fairly and objectively assess occupational competency, it is necessary to incorporate measures rich for domain-relevant knowledge and strategies.8,43,48
The authors thank Helena Kraemer, PhD, for biostatistical consulting and Katy Castile, Tiffany Doelger, and Anne Lademan for recruiting and testing participants. They also thank the aviator study participants for their donation of time and for being inspirational role models of intellectual exploration.
Supported by the Sierra-Pacific Mental Illness Research, Education, and Clinical Center (MIRECC) and the Medical Research Service of the Department of Veterans Affairs, and by NIA grants P30 AG 17824 and R37 AG 12713 (with a supplement for underrepresented minorities to Dr. Kennedy).
Disclosure: The authors report no conflicts of interest.