This study examined the association between long working hours and cognitive function in middle age. Data were collected in 1997–1999 (baseline) and 2002–2004 (follow-up) from a prospective study of 2,214 British civil servants who were in full-time employment at baseline and had data on cognitive tests and covariates. A battery of cognitive tests (short-term memory, Alice Heim 4-I, Mill Hill vocabulary, phonemic fluency, and semantic fluency) was administered at baseline and at follow-up. Compared with working 40 hours per week at most, working more than 55 hours per week was associated with lower scores in the vocabulary test at both baseline and follow-up. Long working hours also predicted decline in performance on the reasoning test (Alice Heim 4-I). Similar results were obtained by using working hours as a continuous variable; the associations between working hours and cognitive function were robust to adjustments for several potential confounding factors including age, sex, marital status, education, occupation, income, physical diseases, psychosocial factors, sleep disturbances, and health risk behaviors. This study shows that long working hours may have a negative effect on cognitive performance in middle age.
Long working hours are common worldwide; for example, in the European Union member states, 12%–17% of employees worked overtime in 2001 (1). Long working hours have been found to be associated with cardiovascular and immunologic reactions, reduced sleep duration, unhealthy lifestyle (2–8), and adverse health outcomes, such as cardiovascular disease, diabetes, subjective health complaints, fatigue (2–7), and depression (8). There is increasing evidence to suggest the importance of midlife risk factors for later dementia (9). Furthermore, the link between cognitive impairment and later life dementia is clearly established (10, 11). Thus, it is important to examine risk factors for poor cognition in midlife, yet there is little research on the potential effects of long working hours on cognition among middle-aged persons.
A cross-sectional study of 248 automotive workers found an association between overtime work and impaired performance on tests of attention and executive function (12). This finding was in agreement with findings from other studies that focused on different forms of shift work or work schedule rather than on long working hours (13, 14). For example, deterioration in cognitive performance, including impaired grammatical reasoning and alertness, has been found in post versus pretest conditions among employees working 9- to 12-hour shifts compared with a traditional 8-hour shift (13). However, little is known about the health effects of long total working hours as opposed to long hours of shift work.
This study examined the relation between long working hours and cognitive function over a 5-year follow-up period in a large-scale, prospective occupational cohort of British civil servants (the Whitehall II study) (15). We were able to take into account several factors that may act as confounders or mediators of this association, such as education, occupational position, physical health status, psychological and psychosocial factors, sleep problems, and health risk behaviors (2).
The Whitehall II study sample recruitment (phase 1) took place between late 1985 and early 1988 among all office staff, aged 35–55 years, from 20 London-based Civil Service departments (15). The response rate was 73% (6,895 men and 3,413 women). Since phase 1, there have been 7 further data collection phases. Informed consent was gained from all participants. The University College London Medical School Committee on the Ethics of Human Research approved the protocol.
As cognitive performance was measured on the whole sample for the first time at phase 5, this phase is used as baseline for the present study. We included all 2,214 participants (1,694 men and 520 women) who were employed and responded to the questions on working hours and for whom the covariates and cognitive test scores were available at phase 5 (1997–1999) and phase 7 (2002–2004). A flow chart of sample selection is shown in Figure 1. The mean age of the 2,214 participants at phase 5 was 52.1 years (standard deviation, 4.2; range, 45–66). There were no major differences between the participants and all full-time employees who participated in phase 5 (n=3,597) in terms of age (52.1 vs. 52.4 years), sex (77% vs. 75% male), occupational grade (18% with the lowest occupational grade vs. 22%), and prevalence of coronary heart disease (10% vs. 11%). However, employees who participated in our study at phases 5 and 7 differed from the cohort at recruitment to the Whitehall II study (n=10,308), in that they were younger (mean age, 40.6 vs. 44.5 years at phase 1); more likely to be male (77% vs. 67%) and from the higher socioeconomic groups (10% with the lowest grade vs. 23%); and less likely to have preexisting coronary heart disease at phase 1 (2.7% vs. 4.1%).
The cognitive function test battery at phases 5 and 7 consisted of 5 standard tasks chosen to evaluate cognitive functioning in middle-aged adults. The first was verbal memory assessed by a 20-word free recall test of short-term memory. Participants were presented with a list of 20 one- or two-syllable words at 2-second intervals and were then asked to recall in writing as many of the words as possible, in any order, within 2 minutes. The Alice Heim 4-I (AH 4-I) test (16) is a measure of inductive reasoning that assesses fluid intelligence, that is, the ability to identify patterns and to infer principles and rules. This test is composed of a series of 65 items (32 verbal and 33 mathematical reasoning items) of increasing difficulty. The participants had 10 minutes to complete this section. The Mill Hill vocabulary test (17) assesses crystallized intelligence, that is, knowledge of verbal meaning, and encompasses the ability to recognize and comprehend words. We used this test in its multiple-choice format that consists of a list of 33 stimulus words ordered by increasing difficulty, with 6 response choices per word. The final 2 tests were measures of verbal fluency: phonemic and semantic (18). Phonemic fluency was assessed via “S” words, and semantic fluency was assessed via “animal” words. Subjects were asked to recall in writing as many words beginning with “S” and as many animal names as they could. One minute was allowed for each test of verbal fluency. A higher score indicated better performance in each test.
The change score was calculated for each measure of cognitive function as phase 7 score minus phase 5 score. As the time interval between clinical examination at phases 5 and 7 varied between 3.9 and 7.1 years (mean, 5.5 years), the difference in cognitive score was divided by the time in years between the 2 measures for each individual and multiplied by 5 to give everyone the same (5-year) time period between the 2 phases of cognitive data collection.
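The standardization described above can be sketched as follows. This is a minimal illustration rather than the study's actual code (the analyses were run in SAS); the function and argument names are hypothetical.

```python
def five_year_change(score_phase5, score_phase7, years_between):
    """Change score rescaled to a common 5-year interval.

    Follows the description in the text: (phase 7 - phase 5), divided by the
    individual's time between measurements in years, multiplied by 5.
    Names are illustrative, not taken from the study's code.
    """
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    raw_change = score_phase7 - score_phase5
    return raw_change / years_between * 5.0
```

For instance, a participant whose score fell by 3 points over 6.0 years would be assigned a standardized change of −2.5 points per 5 years.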
Working hours were determined at phase 5 from the following 2 questions: “How many hours do you work per average week in your main job including work brought home?” and “How many hours do you work in an average week in your additional employment?” Participants were divided into the following 3 groups by total weekly hours: 35–40 hours; 41–55 hours; and more than 55 hours per week (5–7). In addition, analyses were conducted by using total working hours as a continuous variable. Participants in the Whitehall II study are almost exclusively white-collar civil servants. The most common weekly working hours correspond to 36 hours per week net, although various flexible working arrangements can also be arranged. In the present cohort, the total mean working hours were 45.2 hours/week (standard deviation, 8.0; range, 35–120).
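The grouping of total weekly hours can be sketched as below. The group labels and function name are hypothetical, and two details are assumptions not stated in the text: fractional totals between 40 and 41 (or 55 and 56) are assigned by the upper bounds 40 and 55, and totals under 35 hours are rejected because the analytic cohort had a reported range of 35–120 hours.

```python
def working_hours_group(main_job_hours, additional_job_hours=0.0):
    """Assign a participant to one of the 3 working-hours groups.

    Total hours = main job (including work brought home) + additional
    employment, as in the two questionnaire items. Boundary handling for
    fractional values is an assumption made for this sketch.
    """
    total = main_job_hours + additional_job_hours
    if total < 35:
        # Assumption: values below 35 h/week fall outside the analytic cohort.
        raise ValueError("total hours below 35 are outside the cohort range")
    if total <= 40:
        return "35-40"
    if total <= 55:
        return "41-55"
    return ">55"
```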
Altogether, 20 sociodemographic characteristics and behavioral, psychological, psychosocial, and medical conditions known to be associated with cognitive function and/or working hours were included as covariates in the analysis (2–9, 12, 19–38). In addition to sex and age, marital status, indicators of socioeconomic position, that is, occupational grade (6 levels from which the lowest 2 levels were collapsed to obtain sufficient numbers), education (postgraduate, graduate, higher secondary school, lower secondary school, or no academic qualifications), and the participant's report of his/her annual gross salary were assessed. Employment status (working vs. not working) at follow-up was obtained from the phase 7 questionnaire.
The physical functioning component score of the Medical Outcomes Study SF-36 test (39) was used as a measure of global physical health status and divided into quartiles separately for men and women. Prevalent coronary heart disease at phase 5 included cases of nonfatal myocardial infarction and angina. In addition to definite nonfatal myocardial infarction and definite angina, our total nonfatal coronary heart disease events outcome included self-reported cases in the absence of any clinical record evidence of coronary disease. Systolic blood pressure and diastolic blood pressure were measured by using a Hawksley random-zero sphygmomanometer (Hawksley and Sons, Ltd., Lancing, United Kingdom). In keeping with standard definitions, subjects with systolic blood pressure of ≥140 mm Hg and diastolic blood pressure of ≥90 mm Hg or on antihypertensive treatment were considered to be hypertensive (40).
Psychological distress was assessed by using the 30-item General Health Questionnaire (GHQ-30) (41). The GHQ-30 has been validated in a number of diverse populations and has been validated specifically against the Clinical Interview Schedule in Whitehall II data, giving a cutoff point of 4/5 positive responses for dividing noncases from cases (42). In addition, a 5-item subscale of anxiety (e.g., feelings of constant strain, panic, nervousness) was derived from the GHQ-30 (41). Scores in the top decile were used to define anxiety cases, corresponding to the prevalence of anxiety disorders in the general population (43).
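As an illustration of the 4/5 cutoff, the sketch below treats a participant as a case when 5 or more of the 30 items are answered positively and as a noncase at 4 or fewer. The function name and the boolean input format are assumptions; how each raw item response is recoded as positive or not is not described here and is left outside the sketch.

```python
GHQ_ITEMS = 30
GHQ_CASE_THRESHOLD = 5  # 4/5 cutoff: 0-4 positives = noncase, 5+ = case


def is_ghq_case(item_positive):
    """Return True if the number of positive GHQ-30 items reaches the cutoff.

    item_positive: iterable of 30 booleans, one per item (hypothetical
    input format chosen for this sketch).
    """
    items = list(item_positive)
    if len(items) != GHQ_ITEMS:
        raise ValueError("expected exactly 30 item responses")
    return sum(items) >= GHQ_CASE_THRESHOLD
```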
Sleep was assessed in 2 ways; the first was a measure of duration with respondents identified as short sleepers if they reported sleeping less than 6 hours on an average week night (44). Sleep quality was assessed by using the “Jenkins scale” (45), which assesses sleep disturbances during the past 4 weeks. The mean response score for all 4 questions was divided into tertiles.
Of the health behaviors, alcohol consumption (units/week) was classified into 3 categories: none; >0–14 (women)/>0–21 (men) units; and >14 (women)/>21 (men) units (46). Smoking was assessed by a single question of whether the respondent was a current smoker or not. For the physical activity score, the participants were asked about the frequency and duration of their participation in physical activity (47). The amount of time spent in activities with metabolic equivalent values ranging from 0 to 6 or above was summed to allow calculation of the total number of hours per week of physical activity and divided into 3 categories: low, moderate, and high.
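The sex-specific alcohol categories can be expressed as a short function. This is a sketch only: the category labels ("none", "within_limit", "above_limit") and the sex coding are hypothetical names introduced for illustration, while the 14-unit and 21-unit thresholds come from the text.

```python
def alcohol_category(units_per_week, sex):
    """Classify weekly alcohol units into the 3 categories from the text.

    Thresholds: 14 units/week for women, 21 units/week for men.
    Labels and the "female"/"male" coding are illustrative assumptions.
    """
    if units_per_week < 0:
        raise ValueError("units_per_week cannot be negative")
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    limit = 14 if sex == "female" else 21
    if units_per_week == 0:
        return "none"
    if units_per_week <= limit:
        return "within_limit"
    return "above_limit"
```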
Social support was measured by the 15-item Close Persons Questionnaire (48), which includes questions about confiding/emotional support, practical support, and negative aspects of close relationships. The mean of all responses was divided into tertiles. Strain in family relations was measured with a single-item question of how often the participant had any worries or problems with other relatives, for example, parents or in-laws (always/often vs. sometimes/seldom/never/not applicable). Job strain was formulated by splitting the job demands score and decision latitude score at their medians. High demands and low decision latitude indicated high job strain, and other combinations indicated low job strain (49).
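The median-split definition of job strain can be sketched as follows. The text does not say how scores exactly equal to the median were handled, so the tie rule used here (demands strictly above the median count as high; decision latitude at or below the median counts as low) is an assumption, as are the function and variable names.

```python
from statistics import median


def job_strain_flags(demands, latitude):
    """Flag high job strain: high demands combined with low decision latitude.

    Both scores are split at their sample medians. Tie handling is an
    assumption made for this sketch, not taken from the study.
    """
    if len(demands) != len(latitude):
        raise ValueError("score lists must have the same length")
    demands_median = median(demands)
    latitude_median = median(latitude)
    return [
        d > demands_median and l <= latitude_median
        for d, l in zip(demands, latitude)
    ]
```

All other combinations (low demands, or high demands with high decision latitude) are returned as `False`, that is, low job strain.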
All analyses were carried out by using SAS, version 9.1, statistical software (SAS Institute, Inc., Cary, North Carolina), except missing-data analysis which was done using STATA, version 9.0, statistical software (StataCorp LP, College Station, Texas). First, we compared baseline characteristics of the participants by working hours and compared the longer-hours group (>55 hours per week) with the employees with normal working hours (35–40 hours per week) using χ2 tests. We used multiple analysis of covariance to examine whether work hours had an overall association with cognitive function, as checking for each measure of cognitive function separately increases the chance of Type 1 error. Subsequently, analysis of variance was used to assess the association between work hours and individual measures of cognitive function. When a significant difference was found in cognitive function tests at baseline and/or at follow-up between groups, additional analyses were carried out with the change score to assess temporal order and to examine whether the change was statistically significant. Sequential analyses were undertaken to see whether adjustment for covariates attenuated the association between long working hours and change in cognitive function. Age was entered into the models as a continuous variable, and all other covariates were entered as categorical variables. As recommended by Glymour et al. (50), we used baseline-unadjusted change scores for cognitive change. In order to examine linear trend in the association between working hours and cognitive function, we repeated the analysis using working hours as a continuous variable.
To explore whether selection bias might have occurred because of loss to follow-up, we undertook a sensitivity analysis in which we used multiple multivariate imputation (51) using working hours, all covariates, and cognition variables to impute values for missing values in any variables with some missing data, among all 3,163 participants free of stroke and transient ischemic attack at baseline. We used switching regression in STATA software, as described by Royston (51), carried out 20 cycles of regression switching, and generated 20 imputation data sets. The multiple multivariate imputation approach creates a number of copies of the data (in this case, we generated 20 copies), each of which has values that are missing imputed with an appropriate level of randomness using chained equations. The estimates are obtained by averaging across the results from each of these 20 data sets using Rubin's rules. The procedure takes account of uncertainty in the imputation, as well as uncertainty due to random variation, as undertaken in all multivariable analyses.
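The pooling step can be illustrated with a small sketch of Rubin's rules for a single coefficient. It assumes each imputed data set has already been analyzed and has yielded an estimate and its standard error; the function name is hypothetical, and this is not the STATA procedure used in the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least 2 imputations")
    pooled_estimate = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates (m - 1).
    between = variance(estimates)
    # Total variance adds the between-imputation part inflated by (1 + 1/m).
    total = within + (1 + 1 / m) * between
    return pooled_estimate, sqrt(total)
```

The between-imputation term is what carries the uncertainty due to imputation; with 20 imputed data sets, as here, its inflation factor would be 1 + 1/20.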
Characteristics of the study participants by working hours at baseline are shown in Table 1. A total of 853 (39%) participants reported 35–40 hours of work per week, 1,180 (53%) reported 41–55 hours, and 181 (8%) reported more than 55 hours of work per week. Compared with employees with 35–40 hours, a higher percentage of those who worked more than 55 hours were men and were married or cohabited and had a higher occupational grade, higher education, higher income, more psychological distress, shorter sleep, higher alcohol use, and more social support.
Multiple analysis of covariance revealed an overall association of working hours with cognitive function at baseline (P=0.002) and follow-up (P=0.037), as well as change in cognitive function scores between baseline and follow-up (P=0.044). Table 2 shows the associations between working hours at baseline and each cognitive function measure at baseline and at follow-up after adjustment for all the covariates measured at baseline. Compared with employees working 40 hours or less per week, employees working more than 55 hours had lower vocabulary scores at baseline and at follow-up. At follow-up, they had lower scores also on the reasoning test. No significant difference between groups was found in any other measures of cognitive function at follow-up. Repeating these analyses with working hours treated as a continuous variable largely replicated the findings and additionally showed an association between working hours and better phonemic fluency at baseline but not at follow-up.
Table 3 examines the mean difference in the change in reasoning score between those working normal hours and those working long hours. Successive models show the effects of step-by-step adjustments. The stepwise adjustments show that various adjustments produced little attenuation of the effect of working hours on the decline in reasoning score, and a clear dose-response pattern was revealed between exposure and outcome. Again, the findings were replicated in models replacing categories with a continuous measure of working hours.
To further examine whether the findings are robust, we ran a sensitivity analysis in a subgroup of participants still employed at follow-up (n=1,672, n=1,677). Consistent with the main analyses, working more than 55 hours versus 40 hours or less was associated with a greater decline in the reasoning score (difference, −1.47; P=0.002) and lower scores on the vocabulary test at baseline (difference, −0.77; P=0.009) and at follow-up (difference, −0.60; P=0.046). Corresponding P values for the continuous working hours were P=0.009, P=0.004, and P=0.023.
To examine sex differences, we conducted altogether 15 tests of interaction between sex and continuous working hours on cognitive function outcomes and found 2 statistically significant interactions: for the vocabulary test at baseline (P=0.015) and at follow-up (P=0.003). Sex-stratified analysis showed a significant negative association between working hours and vocabulary score at baseline and at follow-up among men (P<0.001) but not among women (P=0.899 and 0.339).
Finally, Table 4 repeats the analyses on those associations that were found to be robust in Tables 2 and 3, except that the results were obtained from the multiple multivariate imputation analysis for the baseline population, a total of 3,163 participants. To simplify comparison of cohorts before and after imputations, we present the effects of working hours as per 10-hour increase in a continuous measure. Imputation had little effect on the associations with vocabulary at baseline and follow-up and with reasoning at follow-up. The association with reasoning at baseline was strengthened, but otherwise the associations were similar to those before imputation. Corresponding P values for the categorical working hours variable were as follows: Between the groups of >55 hours versus ≤40 hours, P<0.001 for the vocabulary score at baseline and follow-up; P=0.068 for the reasoning score at baseline; P=0.002 for the reasoning score at follow-up; and P=0.025 for the change score in reasoning (data not shown), thus replicating the original findings.
In this study of middle-aged men and women, working more than 55 hours per week was associated with lower scores on 2 of the 5 tests of cognitive function. Long working hours at baseline were related to poorer performance on the vocabulary test at both baseline and follow-up. Furthermore, long working hours predicted decline in performance on the reasoning test over a 5-year follow-up period. These effects were robust to adjustments for 20 potential confounding factors, such as education, occupational position, physical diseases (cardiovascular dysfunction), psychosocial stress factors, sleep problems, and health risk behaviors.
We found an association between long working hours and decline in the scores for the AH 4-I reasoning test and associations with the Mill Hill vocabulary tests at baseline and at follow-up. The AH 4-I test is also recognized as a measure of fluid intelligence, that is, executive function or “meta” cognitive ability as it integrates other cognitive processes such as memory, attention, and speed of information processing. Fluid intelligence is seen to be intrinsically associated with information processing and involves short-term memory, abstract thinking, creativity, ability to solve novel problems, and reaction time. It is the aspect of intelligence most affected by aging, biologic factors, diseases, and injuries (52, 53). Fluid intelligence usually increases up to the mid-20s, after which it gradually declines until the 60s when a more rapid decline takes place.
The Mill Hill vocabulary test measures crystallized intelligence that is assumed to accumulate during the lifespan through education, occupational and cultural experience, and intellectual pursuits (52, 53). Crystallized abilities usually increase up to the sixth or seventh decade of age and may not decrease until after 80 years of age. We found the Mill Hill scores to remain relatively stable as expected for this middle-aged cohort. However, the Mill Hill scores were lower among employees with long working hours at both baseline and follow-up. This consistency across 2 separate measurements 5 years apart suggests not only a plausible finding but also stability of the far-reaching effect of long working hours on vocabulary. We did not find an interaction effect between follow-up employment status and working hours on significant outcomes, which suggests that the associations found are not dependent on employment status at follow-up. One possible explanation is that people who work long hours are exposed to a narrower range of intellectual pursuits, that is, only to those that are related to their work tasks, and therefore might not be able to develop a wide variety of functions in crystallized intelligence measured by the test. However, reverse causality is also possible: Employees with lower cognitive ability may be more prone to work overtime than workers with good cognitive ability in order to get their work done.
Previous literature, mostly cross-sectional, suggests that long working hours are associated with various health outcomes, the strongest effects being observed for cardiovascular diseases, fatigue, and sleep disturbances (2–8). These can also be hypothesized to be mediating mechanisms for the association between long working hours and cognitive decline. Hypertension is associated with cognitive dysfunction by producing subtle disturbances in cerebral perfusion and affecting brain cell metabolism (19, 20). However, we found no evidence of an association between long working hours and hypertension or coronary heart disease, suggesting that the effect of long hours on cardiovascular dysfunction, if any, is unlikely to explain cognitive decline in this study.
Another hypothesis on mediating mechanisms links long working hours with psychological stress and poor recovery from work as indicated by sleeping problems and reduced sleep. Psychological stress has been suggested as affecting the brain via 2 neuroendocrine systems: 1) the sympathetic adrenomedullary system with the secretion of epinephrine and norepinephrine and 2) the hypothalamic-pituitary-adrenocortical system with the secretion of cortisol (54). Of the few studies in the field, only 1 study has found an association between long working hours and neuroendocrinologic stress markers (55). We found that long working hours were associated with short sleep duration and psychologic distress but not with sleep disturbances. Further adjustment for these factors did not provide support for the hypothesis that psychological distress and poor recovery act as mediating mechanisms.
The third hypothesis suggests that long working hours may affect cognitive function through health risk behaviors. Evidence on the association between long working hours and unhealthy behaviors is weak, but there is stronger evidence for the relation between health behaviors and cognitive function (22–24, 26). We found that adjustment for all these health risk behaviors had no effect on the association between long working hours and cognitive function, suggesting that health risk behaviors may not be an important mediating or confounding variable.
When working hours were entered into the model as a continuous variable, we found an association between long hours and better phonemic fluency at baseline but not at follow-up. This inconsistency is also reflected in the lack of an association between the categorical working hours and phonemic fluency. More research is needed to determine whether employees with long working hours do better than other employees on tests of verbal fluency. Out of 15 analyses, we found 2 statistically significant interaction effects between working hours and sex, and sex-stratified analysis showed that long working hours were associated with poorer vocabulary performance among men but not among women. However, further research with larger samples is needed to examine potential sex differences in the association between working hours and cognition.
The strengths of this study include a large sample size and the possibility to explore prospectively the association between long working hours and a possible change in cognitive function over a 5-year interval, which has not been feasible in earlier studies. Furthermore, we used 5 separate measures of cognitive function, allowing associations with specific aspects of cognition to be observed, and we were able to adjust for a large number of covariates as potential confounding or mediating factors between the exposure and outcome.
There are also important limitations in this study. First, the period of 5 years for cognitive decline might not be sufficient to detect a significant decline in cognitive function in general. Second, the Whitehall II cohort is based on civil servants and not representative of the entire working population, limiting the generalizability of our results. Third, we used self-reported working hours, with inherent problems of recall. Fourth, middle-aged occupational cohorts, such as ours, are subject to a healthy survivor effect as the study design involves participants who are employed and gradually excludes those who develop work disability. However, all cohort studies focusing on work-related exposures at midlife are open to health-related selection because participants need to be employed. Because poor health is linked with worse cognition, the healthy survivor effect is likely to lead to conservative estimates of the associations found. The baseline of the present study was approximately 15 years after inclusion into the Whitehall II study; men, employees in the higher occupational grades, and those free from coronary heart disease were slightly overrepresented. However, the associations among work hours, vocabulary, and reasoning were robust to adjustments for sex, occupational grade, and health. Furthermore, the similarity of these associations in the complete case and multiple imputation analyses suggests that loss to follow-up after the baseline is an unlikely source of bias in this study.
Decline in cognitive function has already been shown to be present among the middle aged (9). As mild cognitive impairment predicts dementia (10, 11) and mortality (56–58), the identification of risk factors for mild cognitive impairment in middle age is important. The results of this study show that long working hours may be one of the risk factors that have a negative effect on cognitive performance in middle age. Our findings can have clinical significance, as the 0.6- to 1.4-unit difference in aspects of cognitive functioning between employees working long hours and those working normal hours is similar in magnitude to that of smoking, a risk factor for dementia (59), which has been found to affect cognition in the Whitehall II study (60). However, further research is needed to identify the potential underlying factors for the relation between long working hours and cognitive function and to examine the generalizability of our findings.
Author affiliations: Centre of Expertise for Work Organizations, Finnish Institute of Occupational Health, Helsinki, Finland (Marianna Virtanen, Markus Jokela, Jussi Vahtera, Mika Kivimäki); Department of Epidemiology and Public Health, University College London, London, United Kingdom (Jane E. Ferrie, Archana Singh-Manoux, David Gimeno, Michael G. Marmot, Mika Kivimäki); INSERM, Saint-Maurice, Cédex, France (Archana Singh-Manoux); Centre de Gérontologie, Hôpital Ste Périne, Paris, France (Archana Singh-Manoux); Division of Environmental and Occupational Health Sciences, Health Science Center at Houston, The University of Texas School of Public Health, San Antonio, Texas (David Gimeno); and the National Research and Development Centre for Welfare and Health, Helsinki, Finland (Marko Elovainio).
The Whitehall II study has been supported by grants from the British Medical Research Council; the British Heart Foundation; the British Health and Safety Executive; the British Department of Health; the US National Heart, Lung, and Blood Institute (grant HL36310); the US National Institute on Aging (grant AG13196); the US Agency for Health Care Policy and Research (grant HS06516); and the John D. and Catherine T. MacArthur Foundation Research Networks on Successful Midlife Development and Socioeconomic Status and Health. A. S-M. is supported by a “European Young Investigator Award” from the European Science Foundation. M. G. M. is supported by a British Medical Research Council research professorship. J. E. F. is supported by the British Medical Research Council (grant G8802774). J. V. and M. K. are supported by the Academy of Finland (projects 117604, 124322, and 124271).
Conflict of interest: none declared.