The results from our meta-analysis support the conclusions of previous reviews that short-term total SD has a significant deleterious effect across most cognitive domains. Our current study represents an advance over previous meta-analyses in several important respects. First, we were able to take into account the known Treatment × Subject interaction in experiments of SD (Van Dongen et al., 2004), thus improving the estimation of the sampling variance for each study. Second, we weighted each effect size on the basis of study quality, thus giving less influence to studies that may have been less well conducted. Third, we had more stringent inclusion criteria than Philibert (2005), which increased the homogeneity of our sample. Finally, and most important, we classified behavioral tests into finer-grained cognitive domains than previous meta-analyses, further increasing the similarity of studies within each subsample.
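The quality-weighting scheme described above can be sketched as a random-effects pool in which each study's inverse-variance weight is scaled by a quality score. This is a minimal illustration, not the paper's actual procedure: the quality scores, effect sizes, and variances below are hypothetical, and the paper's weighting formula may differ.

```python
import math

def pooled_effect(effects, variances, quality):
    """Quality-weighted random-effects pooling (DerSimonian-Laird sketch).

    effects   : list of study effect sizes (e.g., Hedges' g)
    variances : sampling variance of each effect size
    quality   : score in (0, 1] used to down-weight weaker studies
    """
    # Fixed-effect weights, used to estimate between-study variance tau^2
    w = [1.0 / v for v in variances]
    d_fixed = sum(wi * di for wi, di in zip(w, effects)) / sum(w)
    q_stat = sum(wi * (di - d_fixed) ** 2 for wi, di in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q_stat - df) / c)  # truncated at zero

    # Random-effects weights, each scaled by study quality
    w_re = [q / (v + tau2) for v, q in zip(variances, quality)]
    d_pooled = sum(wi * di for wi, di in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return d_pooled, se

# Hypothetical studies of simple attention after one night of total SD
d, se = pooled_effect(
    effects=[-0.9, -0.7, -0.8, -0.6],
    variances=[0.04, 0.06, 0.05, 0.08],
    quality=[1.0, 0.8, 0.9, 0.7],
)
print(round(d, 3), round(se, 3))
```

Scaling the weights by quality pulls the pooled estimate toward the better-conducted studies without excluding the weaker ones outright.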
Overall, average effect sizes appear to fall along a continuum, with tasks of greater complexity affected relatively less after total SD. The relative magnitude of effect sizes across cognitive domains was similar to that seen in the meta-analysis of Philibert (2005), although the absolute size of these effects was smaller across all categories. There are two likely reasons for this: We excluded all studies with a period of total SD greater than 48 hr, and we did not disattenuate effect sizes based on the test–retest reliability of the dependent measures.
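The disattenuation mentioned above is the standard psychometric correction, which divides an observed effect size by the square root of the dependent measure's test–retest reliability:

```latex
d_{\text{corrected}} = \frac{d_{\text{observed}}}{\sqrt{r_{xx}}}
```

Because \(r_{xx} \le 1\), applying the correction can only inflate the magnitude of an effect; declining to apply it therefore yields systematically smaller, more conservative estimates, consistent with the difference from Philibert (2005).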
The difference in the average effect size among the six cognitive domains was statistically significant, with combined effect sizes ranging from −0.125 to −0.762. As anticipated, the combined effect size for simple attention and vigilance tasks was the largest among all the categories studied. This finding is consistent with the notion that vigilance is the fundamental process affected by SD (Lim & Dinges, 2008) and the deficit for which compensation is least available. In contrast, average effect sizes for complex attention and working memory tests fell into the moderate range. Although this pattern of results has been observed in the literature, this is, to our knowledge, the first time that this difference has been systematically investigated in a large body of studies.
Several points of interest arise on inspection of the group effect sizes of the complex cognitive tasks (all categories other than simple attention). First, we note that task performance in the complex attention category is relatively spared when compared with simple attention. These data are compelling, as many of the complex attention tests differ from the simple attention tests in only a single cognitive process (e.g., two-choice reaction time vs. simple reaction time). This finding suggests that for tests of orienting or executive attention, performance is relatively preserved after SD either because of the greater salience of the bottom-up feed (and thus the reduced need for internally motivated top-down control) or because of the recruitment of additional mental operations. However, we also observe that complexity alone is an inadequate construct with which to identify tasks that may not be as affected by SD, as there were still substantial effect size differences among complex tasks in different domains. The nuances of these behavioral effects, as well as their neural correlates, should continue to be an interesting and fruitful area of study.
We failed to find significant effects in two of the categories tested. First, there was no effect of SD on accuracy measures in tests of reasoning and crystallized intelligence. Crystallized abilities (e.g., the retrieval of domain-specific knowledge) are thought to be highly stable over a range of cognitive states, and are even of use in assessing premorbid functioning following neurological insult or the onset of dementia (O’Carroll & Gilleard, 1986; Watt & O’Carroll, 1999). It is unsurprising, therefore, that outcomes on these tests are relatively unaffected by short-term SD.
Second, the average effect size of the change in accuracy measures for tests of processing speed narrowly failed to reach statistical significance. There are at least two potential explanations for this finding. Nearly all the tasks in the processing speed category were self-paced, as opposed to work-paced, and several authors have commented on the differences between these two classes of tests. Williams et al. (1959) noted that a bias toward accurate responding is commonly found in complex, self-paced assignments, a conclusion reiterated by more recent researchers who have found speed but not accuracy effects on these tasks (e.g., De Gennaro, Ferrara, Curcio, & Bertini, 2001). Koslowsky and Babkoff (1992) also found a similar effect of work- versus self-paced tasks in their meta-analysis, although this increased effect size was seen only in studies with more than 48 hr of SD. A less common explanation of the relatively preserved accuracy on processing speed tasks relates to the nature of the operations being performed in them. These operations usually involve high levels of automaticity (e.g., decoding symbols in the Digit Symbol Substitution Test), and the fidelity of such overlearned skills is probably protected even during periods of fatigue, leading to the relatively small increase in the number of errors made.
An important feature of the current meta-analysis was the separate aggregation of accuracy and reaction time measures. Although there is some evidence that lapsing and lapse duration after SD are correlated in a test of simple reaction time (Lim & Dinges, 2008), there is no a priori reason to assume that this relationship should hold across all cognitive domains. This point is not intuitive and warrants further discussion. Figure 3 illustrates the curve representing the speed–accuracy trade-off in a typical cognitive test, as well as the downward shift of this curve following a period of SD. The unexplored factor in this relationship is whether SD also biases subjects toward faster or slower responding, as represented by a shift along the lower curve. For instance, increases in the number of commission errors or false alarms on simple reaction time tests after SD have been attributed to increased disinhibition (Dorrian et al., 2005), which can be thought of as a bias toward faster (and less accurate) responding.
Figure 3. Illustration of two possible ways in which sleep deprivation (SD) can affect speed and accuracy variables. Two sources of change may potentially occur following a period of total SD: a downward shift of the performance curve and a movement along the curve.
As it turns out, the results of our analysis show remarkable agreement between accuracy and reaction time measures in each cognitive category: Overall, there was no significant effect when comparing accuracy and reaction time across the sample. This finding suggests that, on average, SD does not bias subjects toward either faster or more accurate responding, although this claim cannot be made of any individual cognitive test.
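The accuracy-versus-speed comparison can be illustrated with a simple z-test on two independently pooled estimates. The numbers below are invented for illustration only; the actual analysis used the full moderator framework described in the Method.

```python
import math

def z_test_pooled(d1, se1, d2, se2):
    """Two-sided z-test comparing two pooled effect sizes
    (e.g., accuracy vs. reaction time within one cognitive domain)."""
    z = (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value from the standard normal CDF
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

# Hypothetical pooled estimates for simple attention
z, p = z_test_pooled(-0.76, 0.10,   # accuracy
                     -0.72, 0.11)   # reaction time
print(round(z, 2), round(p, 3))
```

With overlapping confidence intervals of this kind, the test is far from significance, mirroring the null accuracy-versus-speed result reported above.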
Of the three moderator variables studied, only hours awake (homeostatic sleep drive, or sleep pressure) was a significant moderator of the effect of SD, and only for accuracy, not reaction time, variables. Because of the nature of the coding in this study, we expected homeostatic sleep pressure to be a stronger predictor than circadian time or circadian offset, as there is considerable variability in endogenous circadian phase across individuals (Horne & Ostberg, 1976). Nevertheless, the results obtained in this analysis were surprising, as both circadian factors and homeostatic sleep drive are known to modulate cognitive performance (Mallis, Mejdal, Nguyen, & Dinges, 2004; Van Dongen & Dinges, 2005). A likely explanation for this negative result is that much of the observed heterogeneity is due to the variety of cognitive tests in each sample. If this assertion is correct, it implies that the amount of impairment on tests that putatively assess the same cognitive domain may still differ considerably following SD. In other words, the validity of these tests in assessing their target cognitive process may not be as high after SD. For example, total SD is known to exacerbate the time-on-task effect (Doran et al., 2001), suggesting that test length may be a confounding variable across tests of many cognitive processes. To obtain an objective standard of impairment, therefore, it may be necessary to establish norms on several of the most commonly used tests in each domain.
Although it would have been interesting to test the moderating effect of self-paced and work-paced paradigms in this analysis, these variables were highly confounded with cognitive domain (i.e., within each category, most or all tests tended to be either self-paced or work-paced). From the data obtained in the main effects, however, we can infer that the differential effects of self-paced versus work-paced tasks on accuracy and reaction time measures are unlikely to be as significant as previous meta-analyses have suggested. Instead, it is possible that these effects are present only under certain conditions (e.g., extremely long periods of SD or for particular subsets of tests).
As stated in the introduction, the chief objective of this meta-analysis was not to rule out any particular theoretical model but to direct attention to which of these models may have the greatest importance in explaining the real-world consequences of total SD. Although total SD does produce statistically significant differences in most cognitive domains, the largest effects are seen in tests of simple, sustained attention. This form of attention is critical in many industries involving sustained operations, during which a worker’s primary task may involve long, monotonous periods of low-level monitoring and situational awareness. Moreover, relatively brief failures of vigilance may potentially lead to disastrous consequences. For example, lapses in sustained attention are the direct cause of SD-related motor vehicle accidents (Dinges, Mallis, Maislin, & Powell, 1998), in which an eyelid closure of 4 s is a sufficient amount of time for a driver to completely veer off a highway. We argue, therefore, that this cognitive module is of the greatest practical concern in combating SD-related problems in real-world situations.
A striking feature of this deficit in sustained attention is how rapidly large changes emerge. Although our analysis was restricted to subjects who had gone a single night without sleep, effect sizes were still large for both speed and accuracy measures on simple attention tasks. These findings support the data showing that deficits in sustained attention often presage the other observable cognitive effects of SD and may have considerable utility as an early warning system for imminent cognitive failure. This cognitive component should therefore be one of the primary targets of assessment for work fitness and a basis for decisions on whether subsequent countermeasures should be applied.
On the next rung of the hierarchy, we note that tests of working memory and other tests of executive attention are also robustly affected by one night of SD. Considerable research has been conducted over the past several decades to assess the effects of SD on decision making and its component subprocesses (e.g., response inhibition, updating strategies, assessing risk; Harrison & Horne, 2000), and our data suggest that further investigation into these problems is a worthwhile endeavor. Indeed, neuroimaging data on these tasks are affording us new insights into the neural processes underlying the observable behavioral changes (for a review, see Chee & Chuah, 2008) and suggesting possible neuropharmacological mechanisms through which we may intervene to ameliorate these problems in individuals who are most vulnerable to sleep loss (Chuah & Chee, 2008).

Finally, although tests of processing speed and cognitive throughput such as the Digit Symbol Substitution Test are commonly used in SD paradigms, the results of this analysis demonstrate that their effects are relatively small compared with those of other tests. Indeed, studies of partial SD have demonstrated little or no effect on cognitive throughput tasks (Casement, Broussard, Mullington, & Press, 2006; Dinges et al., 1997). The implication of this finding is that changes in processing speed may be theoretically interesting but not of great practical significance in explaining and predicting real-world cognitive failures (Monk, 2007).

This analysis contains a small number of limitations that may have affected the validity of the conclusions drawn. As we were able to obtain only a small amount of unpublished data, it is possible that there was a bias in the analysis toward effect sizes that reached statistical significance. Nevertheless, we received a 100% response rate from the laboratories surveyed, and all but one of these investigators denied possessing any unpublished data that met our inclusion criteria. We are, therefore, relatively confident that the study was not greatly affected by publication bias.
Although every effort was made in this analysis to classify studies into appropriate and meaningful categories, it is clear that, with the possible exception of simple attention, pure assays of most of the cognitive domains we have identified do not exist. Moreover, there remained numerous dissimilarities among the forms and characteristics of the tests within each category (e.g., task length, task demands), particularly within the category of complex attention. As discussed, this is the most likely reason why heterogeneity was in the moderate range for almost all categories studied. Despite these drawbacks, we propose that our taxonomy is a useful heuristic for several reasons. First, significant between-categories differences were found in the meta-analysis, suggesting that we have captured meaningful constructs with the classification we employed. Second, we have stayed faithful to categories that are well defined in the neuropsychological literature. In many cases, focal deficits on these tests have been observed in patients with specific pathologies or injuries (e.g., working memory in attention-deficit/hyperactivity disorder patients; Barkley, 1997). Finally, several of the domains studied here have relatively high external validity. For instance, the challenge in simple attention tasks is similar to the real-world demands on air traffic controllers, and tasks such as the Psychomotor Vigilance Test have been shown to correlate highly with other indicators of dangerous, drowsy driving (Dinges et al., 1998; Price et al., 2003).

We were not able to study a number of moderator effects that may be important predictors of the residual intradomain heterogeneity. Task duration is likely to be one of these factors, with longer tasks associated with greater effect sizes due to the presence of the time-on-task effect. We were unable to code this moderator chiefly because many articles did not report task length and because of the variability in time to completion for all tasks that were self-paced. As we have already mentioned, the difference between self-paced and work-paced tests was highly confounded with cognitive domain, making it unfeasible to test this as a moderator. Additionally, variables such as novelty and motivation (Jones & Harrison, 2001), though potentially important in affecting test outcomes, are not easily quantified.
Finally, a substantial number of studies entered into this meta-analysis reported only accuracy or reaction time as a dependent variable in their final published work. As a result, we could not conduct paired comparisons of these measures to assess their reliability. We encourage authors publishing in this field in the future to consider reporting both accuracy and reaction time measures where appropriate so that their relationship after SD can be better explored. We also suggest that, wherever possible, data from individual test bouts and not just omnibus F values for a series of bouts be reported, so as to enable the inclusion of more studies in future quantitative syntheses.
The results of this analysis have revealed the pattern of effects across cognitive domains and outcomes after a period of short-term total SD. Overall, there was a significant difference among cognitive domains, but not between speed and accuracy, suggesting that SD has differential effects on different cognitive processes but does not bias subjects toward either faster or more accurate responding in any of these domains. As some of the known key moderators of this effect did not explain the remaining between-studies variance, we infer that the residual heterogeneity is due to intertest differences and that test characteristics can influence the level of performance in the sleep-deprived state even when tests are ostensibly assessing the same cognitive domain.
Finally, our results indicate that simple attention is the cognitive domain most strongly affected by short-term SD. Although decrements in other cognitive modules such as decision-making and memory processes no doubt contribute to real-world errors and accidents, the results of this analysis argue that deficits in sustained attention may represent the most parsimonious explanation for these occurrences. Thus, in light of these and other data, we believe that countermeasures targeting this cognitive module may be the most efficient means of accident prevention in industries where SD poses a significant safety risk.