Area under the ROC curve (AUC) was used to measure accuracy for detecting fractures [26]. AUC was estimated for each observer in each experimental condition, and the average areas were compared using Analysis of Variance (ANOVA). Independent variables were institution (Arizona, Iowa), level of training (Attending, Resident), and the reading session time-of-day (Early, Late). A more complex ANOVA added session order (readers assigned to Early-first-then-Late vs. Late-first-then-Early) and case difficulty (first 30 with 15 easier fractures, second 30 with 15 harder fractures) as other independent variables.
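As an illustration of what a per-observer AUC represents, the following is a minimal sketch of a rank-based (Mann-Whitney) estimate computed from one reader's confidence ratings. It is not necessarily the estimation method used in the study, which follows the cited reference; the function name `auc_from_ratings`, the 6-point rating scale, and the ratings themselves are hypothetical.

```python
def auc_from_ratings(fracture_ratings, normal_ratings):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve.

    Counts the fraction of (fracture, normal) case pairs in which the
    fracture case received the higher confidence rating; ties count 1/2.
    """
    wins = 0.0
    for f in fracture_ratings:
        for n in normal_ratings:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5
    return wins / (len(fracture_ratings) * len(normal_ratings))


# Hypothetical confidence ratings (1 = definitely normal ... 6 = definitely
# fracture) for one reader in one session; these are NOT data from the study.
fracture_ratings = [6, 5, 5, 4, 6, 3, 5, 2]
normal_ratings = [1, 2, 3, 1, 2, 4, 1, 2]

auc = auc_from_ratings(fracture_ratings, normal_ratings)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance performance and 1.0 to perfect separation of fracture and normal cases. One such value per reader per condition would then serve as the dependent variable in the ANOVA.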
There was a significant drop in detection accuracy for Late vs. Early reading. Average AUC was 0.885 for Early and 0.852 for Late reading (F(1,36) = 4.15, p = 0.049 < 0.05). There were no other significant effects. The more complex ANOVA revealed that attending radiologists and residents performed about equally on easy cases, whereas residents were, not surprisingly, somewhat less accurate on hard cases. Supplemental analyses suggest that the reduction in accuracy for late reading reflected an increase in false positives of about the same size as the decrease in true positives.
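The reported p-value can be checked from the F statistic and its degrees of freedom. The sketch below is one way to do so with the standard library only, relying on the identity that an F statistic with (1, df2) degrees of freedom is the square of a t statistic with df2 degrees of freedom. The helper names `t_pdf` and `p_value_f_1_df` are illustrative, and numerical integration stands in for a statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_value_f_1_df(f_stat, df2, steps=2000):
    """Upper-tail p-value for an F statistic with (1, df2) degrees of freedom.

    Uses F(1, df2) = t(df2)^2, so p = P(|T| > sqrt(F)), and integrates the
    t density on [0, sqrt(F)] with composite Simpson's rule (steps is even).
    """
    upper = math.sqrt(f_stat)
    h = upper / steps
    total = t_pdf(0.0, df2) + t_pdf(upper, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    central_half = total * h / 3  # P(0 < T < sqrt(F))
    return 1.0 - 2.0 * central_half


# Time-of-day effect on AUC as reported in the text: F(1,36) = 4.15.
p_time_of_day = p_value_f_1_df(4.15, 36)
print(f"F(1,36) = 4.15 -> p = {p_time_of_day:.3f}")
```

The result falls just under 0.05, in line with the reported value. The same function applies to any of the single-degree-of-freedom F tests reported in this section.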
Total inspection time for interpreting the examinations was also analyzed. The ANOVA treated total inspection time as the dependent variable, and included fracture status (no fracture, fracture), institution, fracture difficulty, training level, and cases as independent variables. Each examination took 52.1 seconds on average for early reading and 51.5 seconds for late reading. On average, each examination took attending radiologists 50.7 seconds and residents 52.8 seconds. The only main effect was a significantly greater reading time for normal examinations than for examinations with fractures (56.7 vs. 46.9 seconds, F(1,36) = 18.84, p = 0.0001 < 0.001).
To determine whether search time was affected by time of day, we studied the time to report fractures for cases in which the fracture was detected in both the early and late sessions. There was no significant difference between early and late reading time to report the fracture for all examinations (37.0 vs. 38.3 seconds), easier examinations (33.0 vs. 34.0 seconds), or harder examinations (42.5 vs. 44.2 seconds). For all examinations, average response time was 42.8 seconds for early reading and 36.0 seconds for late reading when the early session occurred first. Average response time was 31.2 seconds for early reading and 40.0 seconds for late reading when the late session occurred first (F(1,32) = 20.84, p = 0.0001 < 0.001). Similar results were obtained when easy and hard examinations were analyzed separately. These results suggest that responses in the second session were faster. This apparent practice effect is hardly surprising. The main finding was that when the fracture was found both early and late, the same amount of search time was required.
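The practice effect can be made explicit by regrouping the four reported cell means by session position (first vs. second) instead of by time of day. The following is a small worked sketch that uses only the means quoted above. Equal weighting of the cells is an assumption, since group sizes are not given here, so the unweighted marginal means need not match the reported overall means exactly.

```python
# Mean time to report the fracture (seconds), taken from the text above,
# keyed by (session order, time of day).
mean_time = {
    ("early_first", "early"): 42.8,
    ("early_first", "late"): 36.0,
    ("late_first", "early"): 31.2,
    ("late_first", "late"): 40.0,
}

# Regroup the four cells by whether the session was a reader's first or second.
first_session = [mean_time[("early_first", "early")],
                 mean_time[("late_first", "late")]]
second_session = [mean_time[("early_first", "late")],
                  mean_time[("late_first", "early")]]

# Unweighted averages of the cell means (equal weighting is an assumption).
first_mean = sum(first_session) / 2
second_mean = sum(second_session) / 2
early_mean = (mean_time[("early_first", "early")]
              + mean_time[("late_first", "early")]) / 2
late_mean = (mean_time[("early_first", "late")]
             + mean_time[("late_first", "late")]) / 2

practice_effect = first_mean - second_mean   # first minus second session
time_of_day_effect = late_mean - early_mean  # late minus early reading

print(f"first vs. second session: {first_mean:.1f} vs. {second_mean:.1f} s")
print(f"early vs. late reading:   {early_mean:.1f} vs. {late_mean:.1f} s")
```

Under this equal-weighting assumption, the difference between first and second sessions is several times larger than the difference between early and late reading, which is the pattern the text attributes to practice.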
Visual Strain Results
Recall that accommodation (an indicator of visual strain) was measured every 0.2 seconds over a number of seconds. Medians were computed for each reader before (pre) and after (post) the early and late reading sessions. An ANOVA was used to analyze the accommodation measures with the fracture and asterisk targets. For the fracture target, there was significantly greater accommodative error after the workday (−1.16 diopters late vs. −0.72 diopters early, F(1,29) = 27.01, p < 0.0001). For the asterisk target, there was also a significant main effect for session time of day (−1.04 diopters late vs. −0.64 diopters early, F(1,34) = 22.005, p < 0.0001). This suggests that readers are more myopic and are experiencing more visual strain after their workday. Overall, there was no main effect for measures before and after the reading session, or for level of training. A significant Pre vs. Post × Attending vs. Resident interaction showed that while the attending radiologists tended to have less accommodative error after the reading session than before, residents tended to have more (see the accompanying figure).
Error in accommodation for the asterisk (left) and fracture (right) targets for Pre and Post measurements made Early and Late in the day.
We further hypothesized that if readers have greater visual strain and thus have more difficulty maintaining focus after visual work, their accommodation measures would be more variable. ANOVAs on the standard deviations of the accommodation measurements were computed. For the fracture target, there were no significant main effects or two-way interactions. There was a significant three-way interaction of Pre vs. Post × Attending vs. Resident × Early vs. Late (F(1,34) = 4.35, p < 0.05). For the asterisk target, residents' accommodation was significantly more variable than that of faculty (0.13 vs. 0.17 diopters, F(1,29) = 4.72, p < 0.05). There were no other significant main effects or two-way interactions. The three-way interaction of Pre vs. Post × Attending vs. Resident × Early vs. Late was again significant (F(1,29) = 8.12, p < 0.01). Because the nature of the three-way interactions was not consistent between the two targets (fracture and asterisk), the only conclusion that can be drawn is that variability was greater for residents than for faculty. Overall, we must conclude that variability of accommodation was unaffected by visual work in our experiment.
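The two per-reader summaries used in these analyses, the median accommodation value and the standard deviation of the samples, can be sketched as follows. The traces are hypothetical illustrative values, not measurements from the study, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which form was used.

```python
import statistics

SAMPLE_INTERVAL_S = 0.2  # sampling interval stated in the text


def summarize_trace(diopters):
    """Return (median, sample standard deviation) of one accommodation trace."""
    return statistics.median(diopters), statistics.stdev(diopters)


# Hypothetical accommodative-error samples (diopters) for one reader;
# illustrative values only, NOT measurements from the study.
pre_trace = [-0.70, -0.75, -0.68, -0.72, -0.80, -0.71, -0.69]
post_trace = [-1.10, -1.20, -1.05, -1.18, -1.30, -1.12, -1.15]

pre_median, pre_sd = summarize_trace(pre_trace)
post_median, post_sd = summarize_trace(post_trace)

# Time spanned by a trace of this length at the stated sampling interval.
duration_s = (len(pre_trace) - 1) * SAMPLE_INTERVAL_S

print(f"pre:  median = {pre_median:.2f} D, SD = {pre_sd:.3f} D")
print(f"post: median = {post_median:.2f} D, SD = {post_sd:.3f} D")
```

The median summarizes the typical accommodative error in a measurement period, while the standard deviation captures how steadily focus was held, which is the quantity the variability analysis examines.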
Fatigue Survey Results
The scores for each of the five SOFI factors were analyzed with an ANOVA with session (Early vs. Late) and experience (Attending vs. Resident) as independent variables. Average rating values for each factor are shown in the accompanying table.
Means and standard deviations (in parentheses) of the SOFI and SSQ survey ratings for Attendings and Residents Early and Late in the day.
For Lack of Energy (F(1,76) = 16.19, p = 0.0001 < 0.001), Physical Discomfort (F(1,76) = 5.091, p = 0.0269 < 0.05) and Sleepiness (F(1,76) = 7.761, p = 0.0067 < 0.01), there were statistically significant differences as a function of session, but not experience. For Physical Exertion and for Motivation there were no statistically significant differences as a function of either session or experience. Additional analyses indicated that there were no statistically significant differences on any of the factors as a function of gender or site.
The scores from the seven questions on the oculomotor strain sub-scale of the Simulator Sickness Questionnaire (SSQ) were averaged and analyzed with an ANOVA as a function of session and experience (see the accompanying table). As with the SOFI, low scores represent lower levels of perceived oculomotor strain. There was a statistically significant difference in rated symptoms of oculomotor strain as a function of session (F(1,75) = 20.39, p < 0.0001), but not experience (F(1,75) = 0.99, p = 0.32).