In an earlier analysis, we systematically compared performance on the 10-min PVT to the first 1 to 9 min of the same test, and found that the highest effect sizes were often obtained for test durations shorter than 10 min, especially for outcome variables that did not involve lapses [12]. This underlined the feasibility of PVTs shorter than 10 min and motivated us to develop a modified 3-min version of the test. A 5-min version had already been shown to reach similar degrees of sensitivity to sleep deprivation as the standard 10-min PVT, albeit only in TSD paradigms [33]. Roach et al. [34] concluded that a 90-s version of the PVT may not provide a reasonable substitute for the 10-min PVT.
This is the first study to systematically compare a modified brief 3-min version of the PVT with the standard 10-min version during both TSD and PSD. For this purpose, 74 subjects contributed 1,656 pairs of both versions of the test that were performed in close temporal proximity. However, we did not simply shorten test duration. We also decreased ISIs from the standard 2 to 10 s to 1 to 4 s for the following reasons: First, we wanted to get more precise estimates of our outcome variables by lowering ISIs and therefore by sampling more behavior. Thus, although the duration of the test decreased by 70%, the sampling rate only decreased by 32.4%. Second, we observed in our PVT databases (and were also able to show in this analysis) that short ISIs were associated with longer RTs, which we hypothesized was due to a central nervous system refractory period following a response and in preparation for the next stimulus. By capitalizing on this effect, we intended to increase sensitivity of the PVT-B. Third, we hypothesized that the higher cognitive workload associated with an increased stimulus density would increase the time on task effect, and therefore more quickly unmask sleepiness compared to a 3-min version with standard ISIs.
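The relation between test duration and the number of sampled responses can be illustrated with a back-of-the-envelope calculation. The following minimal sketch assumes uniformly distributed ISIs and a hypothetical mean RT of 250 ms per trial; both are illustrative assumptions rather than values taken from the study data, so the result only approximates the 32.4% reported above.

```python
def expected_stimuli(duration_s, isi_min_s, isi_max_s, mean_rt_s=0.25):
    """Approximate number of stimuli in one test bout.

    Assumes each trial lasts one ISI (uniform between the given bounds)
    plus one response; mean_rt_s is a hypothetical value, not study data.
    """
    mean_isi_s = (isi_min_s + isi_max_s) / 2.0
    return duration_s / (mean_isi_s + mean_rt_s)


# Standard 10-min PVT: ISIs of 2 to 10 s
pvt_stimuli = expected_stimuli(600, 2, 10)
# 3-min PVT-B: ISIs of 1 to 4 s
pvtb_stimuli = expected_stimuli(180, 1, 4)

duration_drop = 1 - 180 / 600                 # relative decrease in test duration
stimuli_drop = 1 - pvtb_stimuli / pvt_stimuli  # relative decrease in stimuli

print(f"PVT:   ~{pvt_stimuli:.0f} stimuli")
print(f"PVT-B: ~{pvtb_stimuli:.0f} stimuli")
print(f"duration decreased by {duration_drop:.1%}, stimuli by {stimuli_drop:.1%}")
```

Under these assumptions the decrease in the number of stimuli is roughly one third, far smaller than the 70% decrease in duration. The exact figure depends on the actual RT distribution.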
A comparison of RT distributions of the PVT-B and the PVT revealed that, both in alert and sleep deprived subjects, RTs were shorter, false start rate was higher, and lapse frequency was lower on the PVT-B compared to the PVT. This could be explained by differences in hardware (personal computer versus PVT-192 device), knowledge of test duration, increased stimulus density in the PVT-B, and by the fact that in > 90% of the trials the PVT-B was performed after the PVT. Even after controlling for differences in ISI, time on task, and test order, RTs on the PVT-B were still significantly faster compared to the PVT. Therefore, faster RTs on the PVT-B were likely due to hardware differences (stimulus presentation, response buttons, hardware response latencies) or knowledge of test duration. Systematic comparisons on the same hardware platform are needed to investigate the magnitude of the effect of knowledge of test duration.
We operationalized effect size as a measure of the PVT's ability or power to differentiate alert from sleep deprived subjects [12]. Effect size addresses more than just the sensitivity of the PVT, which was used as the validation criterion by other authors [32]. The PVT has to be sensitive (indicate high levels of sleepiness in sleep deprived subjects), specific (indicate low levels of sleepiness in alert subjects), and do this consistently in order to achieve high effect sizes. Our analyses showed that effect sizes of the PVT-B were consistently lower compared to the PVT. This is only partially in line with our previous work, where we compared the 10-min PVT to the first 3 min of the same 10-min PVT test bout for 10 different PVT outcome metrics [12]. That analysis found lower effect sizes for the first 3 min of the PVT in only 70% of the outcome metrics in both TSD and PSD. The fact that in this study subjects performed two distinct tests on different hardware platforms, with altered ISIs in the 3-min version of the test and with knowledge of the different test durations, may have contributed to this discrepancy. More studies using the same hardware for both tests and counterbalancing the order of test administration in a cross-over fashion are needed to elucidate the differences in effect sizes found between the two versions of the test.
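To make the notion of effect size concrete, the following minimal sketch uses one common within-subject definition: the mean of the paired differences between the sleep deprived and the alert state, divided by the standard deviation of those differences. That this matches the exact formula used in the analysis is an assumption, and the lapse counts are invented for illustration.

```python
from statistics import mean, stdev


def paired_effect_size(alert, deprived):
    """Within-subject effect size: mean difference / SD of differences.

    alert and deprived hold one value per subject, in the same order.
    This definition is an assumption made for illustration.
    """
    if len(alert) != len(deprived):
        raise ValueError("both states need one value per subject")
    diffs = [d - a for a, d in zip(alert, deprived)]
    return mean(diffs) / stdev(diffs)


# Hypothetical lapse counts for five subjects (not study data)
alert_lapses = [1, 2, 0, 3, 1]
deprived_lapses = [6, 9, 4, 8, 7]

effect = paired_effect_size(alert_lapses, deprived_lapses)
print(f"effect size: {effect:.2f}")
```

The ratio is large only if the outcome changes markedly with sleep loss (sensitivity and specificity) and does so similarly across subjects (consistency), because inconsistent changes inflate the denominator.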
Despite the above factors, effect sizes for the PVT-B were still substantial, and compared to the 70% decrease in test duration, the 22.7% average decrease in effect size was acceptable. According to Cohen's criteria [41], all outcome metrics scored large effect sizes (>0.8) on the PVT-B in TSD. In the PSD study, all outcome metrics scored large effect sizes on the PVT. On the PVT-B, only mean 1/RT and slowest 10% 1/RT still scored large effect sizes. The effect sizes of lapses (both 500 ms and 355 ms definitions) and the performance score dropped to medium (>0.5 and <0.8), while the effect size of fastest 10% RT dropped to small (>0.2 and <0.5). Thus, the utility of the PVT-B depends on the outcome metric.
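The classification used above can be expressed as a small helper. In the sketch below, the handling of values that fall exactly on a boundary and the label "negligible" for values below 0.2 are assumptions, and the effect sizes passed to `relative_decrease` are hypothetical values chosen only to reproduce a 22.7% drop.

```python
def cohen_label(effect_size):
    """Map an effect size to Cohen's conventional categories."""
    d = abs(effect_size)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"  # label for d < 0.2 is an assumption


def relative_decrease(reference, comparison):
    """Relative decrease of comparison with respect to reference."""
    return (reference - comparison) / reference


# Hypothetical effect sizes for one outcome metric (not study data)
pvt_effect, pvtb_effect = 1.50, 1.1595
print(cohen_label(pvt_effect), cohen_label(pvtb_effect))
print(f"relative decrease: {relative_decrease(pvt_effect, pvtb_effect):.1%}")
```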
Comparable to our analysis on optimal outcome metrics and task durations of the PVT [12], the highest effect sizes were observed for the reciprocal measures 1/RT and slowest 10% 1/RT for both the PVT-B and the PVT, and during both TSD and PSD (with the exception that the performance score's effect size was higher than that of the slowest 10% 1/RT on PVT-B in TSD). This highlights the favorable properties of the reciprocal outcomes, which reflect response slowing in the pre-lapse domain (i.e., RTs < 500 ms) and effectively remove the influence of outlying long RTs. The reciprocal outcomes also showed very good coherence between PVT-B and the PVT with high p-values for the interaction between test version and test time, and they were the only two variables scoring large effect sizes on the PVT-B in PSD.
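The outcome metrics discussed here can be derived from the list of valid RTs of a single test bout. The following minimal sketch assumes that response speed is expressed as 1000/RT (responses per second), that a lapse is an RT at or above the threshold, and that the 10% tails are sized by integer division; the function name and the RT values are hypothetical.

```python
from statistics import mean


def pvt_outcomes(rts_ms, lapse_threshold_ms=500):
    """Summarize the valid RTs (in ms) of one test bout."""
    if not rts_ms:
        raise ValueError("need at least one valid RT")
    ordered = sorted(rts_ms)                  # fastest RT first
    k = max(1, len(ordered) // 10)            # size of a 10% tail (assumed rounding)
    speeds = [1000.0 / rt for rt in ordered]  # response speed in 1/s
    return {
        "mean_rt": mean(ordered),
        "mean_speed": mean(speeds),
        "slowest_10pct_speed": mean(speeds[-k:]),
        "fastest_10pct_rt": mean(ordered[:k]),
        "lapses": sum(rt >= lapse_threshold_ms for rt in ordered),
    }


# Hypothetical bout: 18 fast responses, one slow response, one long outlier
rts = [200] * 18 + [400, 1000]
standard = pvt_outcomes(rts)                          # 500 ms lapse threshold
lowered = pvt_outcomes(rts, lapse_threshold_ms=355)   # 355 ms lapse threshold
print(standard)
print("lapses at 355 ms:", lowered["lapses"])
```

Because the reciprocal transform compresses long RTs, the single 1000 ms outlier in this example shifts mean response speed far less than it shifts mean RT.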
One advantage of the newly developed performance score is its easy interpretability. Although in terms of effect size it ranked only in 5th (TSD) and 6th (PSD) position on the PVT, the differences in effect size between the PVT and the PVT-B were lowest for this outcome measure, which is why it ranked in 2nd (TSD) and 3rd (PSD) position on the PVT-B. This is probably due to the fact that it takes both errors of omission (lapses) and errors of commission (false starts) into account, and therefore penalizes the bias towards faster RTs observed in PVT-B performance. Both the easy interpretability and these favorable statistical properties make the performance score a potential candidate for a primary outcome measure of the PVT-B. It is currently used to give astronauts feedback on their Reaction Self Test performance, a Microsoft Windows based version of the PVT-B, on board the International Space Station.
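A score of this kind could be computed as sketched below. The formula, 100% minus the share of lapses and false starts among all valid stimuli and false starts, is stated here as an assumption about how the score is defined; the function name and the example counts are hypothetical.

```python
def performance_score(n_valid_stimuli, n_lapses, n_false_starts):
    """Percentage of trials without a lapse or false start.

    Assumed definition: 100% minus the proportion of errors of omission
    (lapses) and commission (false starts) among all valid stimuli plus
    false starts.
    """
    total = n_valid_stimuli + n_false_starts
    if total == 0:
        raise ValueError("no trials to score")
    return 100.0 * (1 - (n_lapses + n_false_starts) / total)


# Hypothetical bout: 60 valid stimuli, 3 lapses, 2 false starts
score = performance_score(60, 3, 2)
print(f"performance score: {score:.1f}%")
```

Under this definition, a subject who avoids lapses by responding prematurely gains nothing, because each false start lowers the score just as a lapse does.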
The PVT-B tracked the PVT closely over time in both TSD and PSD, especially for the reciprocal outcome measures (i.e., response speed). The increase in the frequency of 500 ms lapses during SD was less pronounced for the PVT-B compared to the PVT, as indicated by a significant interaction between test version and test time for this outcome metric. This is most likely a side effect of the overall decrease in PVT-B response times, as lowering the lapse threshold for the PVT-B diminished the difference in the number of lapses between tests to nonsignificant levels, even though the number of stimuli was lower for the PVT-B. Also, the increase in the fastest 10% RT was less pronounced for the PVT-B during both TSD and PSD compared to the PVT, with the highest observed drop in effect size of 67.8% for this measure during PSD. This could be explained by a general response bias towards faster RTs in the PVT-B, which would even be enhanced by increased compensatory effort during SD. The latter may be sufficient to keep the fastest 10% RTs low during a 3-min, but not during a 10-min version of the test.
Several limitations have to be considered when interpreting the findings from this analysis. First, test duration, hardware, and ISIs were changed simultaneously for PVT-B relative to the PVT. Although we were able to shed some light onto the contributions of these factors to the observed differences in response times between both test versions, it would still be very valuable to perform a counterbalanced cross-over study comparing both versions of the PVT using the same hardware. Second, the 355 ms lapse threshold for the PVT-B was found post-hoc in the TSD experiment, and may depend on PVT hardware. Although its utility was confirmed in the PSD experiment, further studies are needed to confirm the adequacy of the 355 ms lapse threshold for the PVT-B. Third, the PVT was performed once every 2 hours while the PVT-B was performed only once every 4 hours in the PSD protocol, which probably affected the comparison of both tests. However, we believe that our results are conservative as the difference in test frequency most likely decreased rather than increased the agreement between both versions of the PVT. Finally, our subject sample consisted of healthy, young to middle-aged subjects. Our findings may therefore not generalize to other populations.
This is the first time a modified 3-min version of the PVT was validated against the standard 10-min PVT during both TSD and PSD. Our analyses show that the PVT-B differentiated alert from sleep deprived subjects somewhat less well than the standard 10-min PVT for all investigated outcome variables and during both TSD and PSD. However, effect sizes of the PVT-B were still large for all outcome metrics in TSD and (with the exception of fastest 10% RT) medium to large in PSD. Relative to the 70% decrease in test duration, the 22.7% average decline in effect sizes of the PVT-B was deemed an acceptable trade-off between duration and sensitivity. The reciprocal outcome metrics mean 1/RT and slowest 10% 1/RT and the performance score were identified as candidates for primary outcome metrics for the PVT-B, as they scored the largest effect sizes and/or the decrease in effect size compared to the PVT was relatively minor. Also, with the exception of fastest 10% RT in PSD and after lowering the lapse threshold for PVT-B from 500 ms to 355 ms, no statistical differences were found between both tests for all outcome variables and during both TSD and PSD. Therefore, we were able to show that the 3-min PVT-B remains a sensitive and specific assay for detecting wake-state instability induced by both total and partial sleep deprivation [14]. It may be a useful tool in applied settings where use of the standard 10-min PVT is infeasible or undesirable. The validity of the PVT-B still needs to be established in such settings.
- The Psychomotor Vigilance Test (PVT) measures behavioral alertness.
- A brief 3-min version of the PVT remains sensitive to the effects of sleep loss.
- Its utility is currently being evaluated in astronauts on board the International Space Station (ISS).
- The brief PVT may be practical for many operational and clinical environments.