The European Parliament and the Council of the European Union recognize long working hours as an occupational safety and health hazard. The issue is addressed in the EU Working Time Directive, which, among other things, decrees that member states shall take the measures necessary to ensure that the average working time for each 7-day period, including overtime, does not exceed 48 hours (1).
A recent review concluded that long working hours are associated with an increased risk of depressive states, anxiety, and sleep disorders (2). It has, moreover, been suggested that not only “very much overtime” (49–100 work hours/week) but also “moderate overtime” (41–48 work hours/week) seems to be associated with an increased risk of mental distress (3).
The evidence behind the above conclusions and suggestions is, however, quite weak and mainly based on studies that are either cross-sectional (3–5) or of low statistical power (6–9). Moreover, none of the research papers concerned documented or claimed that their hypotheses and statistical models were completely defined before the data analysis began.
The underlying null hypothesis of the present project was that long working hours, to the extent they are currently practiced in Denmark, neither add to nor subtract from the national burden of mental ill health.
The hypothesis was operationalized and tested in a prospective cohort study on a random sample of the general working population of Denmark. The null hypothesis would be rejected if subsequent rates of mental ill health (manifested by the use of psychotropic medicine) among people with long working hours at baseline differed significantly from those among people with normal working hours. Special attention was paid to overtime work within the limits of the EU Working Time Directive (41–48 hours/week).
We also wanted to know: (i) if the effect was independent of age, gender, shift work, and socioeconomic status (SES); (ii) if the results would change if the analysis was controlled for self-rated mental health, job satisfaction, and job insecurity; and (iii) what the estimated effect would be if we restricted the outcome to antidepressants, anxiolytics, and hypnotics and sedatives, respectively.
We launched the study with an open mind, believing that some cases of mental ill-health would be caused while others would be prevented by long working hours (10). On the one hand, we knew that long working hours are associated with short sleep (11), which is a risk factor for psychiatric disorders (12). On the other hand, we knew that long working hours may increase income, and thereby decrease the risk of financial strain, a condition proven to be highly predictive of psychiatric disorders (13–15).
Methods
To prevent hindsight bias, the hypotheses and statistical methods of the study were completely defined, peer-reviewed, and published in a detailed study protocol (10) before the exposure and the outcome data were linked. The first two paragraphs of the study protocol’s data material section and the information that was given about the statistical methods of the primary and secondary analyses will be reproduced in their entirety in the present paper. The remaining parts of the material and methods of the study will only be described briefly. Further details are given in the study protocol.
Data material
The project data were obtained through a linkage of interview data from the Copenhagen Psychosocial Questionnaire (COPSOQ) study sample of 2004, the Danish National Working Environment Survey (DANES) of 2008, and the Danish Work Environment Cohort Study (DWECS) of 1995, 2000, 2005, and 2010 with data from the Central Person Register (CPR), the Employment Classification Module (ECM), and the Danish National Prescription Registry (DNPR). Participants’ unique personal identification numbers were used as the key in the linkage procedure. DNPR covers all purchases of prescription drugs at pharmacies in Denmark since 1995, regardless of whether or not they were reimbursed (16). The CPR has existed since 1968 and contains dates of deaths and migrations in the Danish population (17). A person’s occupation, industry, and SES are, as of 1975, registered annually in the ECM (18). SES was coded in accordance with Statistics Denmark’s official socioeconomic classification (19). The SES code among employees is based on the first digit of the Danish version of the International Standard Classification of Occupations (DISCO-88) (20), and contains the following categories: (i) legislators, senior officials, and managers (DISCO-88 group 1), (ii) professionals (DISCO-88 group 2), (iii) technicians and associate professionals (DISCO-88 group 3), (iv) workers in occupations that require skills at a basic level (DISCO-88 group 4–8), (v) workers in elementary occupations (DISCO-88 group 9), and (vi) gainfully occupied people with an unknown occupation (missing DISCO-88 code).
The COPSOQ study sample is a random sample, comprising 4732 people, 20–59 years of age, 3517 of whom are wage earners (21). DANES is based on a random sample of the Danish population in 2008. It contains responses from 6531 people, 18–59 years of age, of whom 4919 are employees. The DWECS is an open cohort study initiated in 1990 with a random sample of people, aged 18–59 years, in the Danish population. The cohort has thereafter been supplemented with young people and immigrants so as to obtain a representative cross-sectional study of ≥5000 employees every fifth year (22). The reported response rates were 80%, 75%, 60%, 62%, 66%, and 48% for DWECS 1995 (16); DWECS 2000 (23); COPSOQ 2004 (21); DWECS 2005 (24); DANES 2008 (25); and DWECS 2010 (26), respectively.
Primary analysis
Case definition
The medical products in the DNPR are coded in accordance with the Anatomical Therapeutic Chemical (ATC) system. In the present project, a person was defined as a case if and when he or she redeemed a prescription for drugs in the ATC-code category N05 (psycholeptica) or N06 (psychoanaleptica). The psycholeptic category contains antipsychotics, anxiolytics, hypnotics, and sedatives, while the psychoanaleptic category contains antidepressants, psychostimulants, and anti-dementia drugs.
Follow-up and inclusion criteria
Each of the included samples was followed for a period of 2–5 years, beginning at the start of the calendar year succeeding the one in which they were sampled. To be eligible for inclusion, people had to be aged 21–59 years at the start of the follow-up period and, according to the questionnaire, employed with ≥32 weekly working hours around the time of the interview. The sample from DWECS 2010 was followed for two years. The sample from DANES 2008 was followed for four years. The remaining samples were followed for five years. A participant was censored if and when he or she died or emigrated. Person-years at risk were calculated for each participant. People who redeemed a prescription for a medication with an ATC-code belonging to the case definition during the calendar year preceding baseline were excluded from the follow-up. A participant who reached the clinical endpoint of the study was not allowed to re-enter the follow-up, ie, there would be at most one case per person.
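The follow-up rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, its arguments, and the 365.25 days-per-year convention are hypothetical and are not taken from the study protocol, and any date passed in is assumed to fall on or after the start of follow-up (participants with a redemption in the preceding calendar year having already been excluded).

```python
from datetime import date

def person_years(sample_year, follow_up_years, first_redemption=None,
                 death_or_emigration=None):
    """Return (person_years_at_risk, is_case) for one participant.

    Follow-up starts on 1 January of the year after sampling and stops at the
    first redeemed prescription, at death or emigration, or at the
    administrative end of follow-up, whichever comes first.
    """
    start = date(sample_year + 1, 1, 1)
    end = date(sample_year + 1 + follow_up_years, 1, 1)

    # Collect every date that could terminate the follow-up.
    stops = [end]
    if first_redemption is not None and first_redemption < end:
        stops.append(first_redemption)
    if death_or_emigration is not None and death_or_emigration < end:
        stops.append(death_or_emigration)
    stop = min(stops)

    # A participant is a case only if the redemption is what ended follow-up.
    is_case = first_redemption is not None and stop == first_redemption
    # 365.25 days per year is an assumed convention, not the protocol's.
    return (stop - start).days / 365.25, is_case

# A DWECS 2010 participant (two years of follow-up) with no event.
full_py, full_case = person_years(2010, 2)
# The same participant redeeming a first prescription on 2 July 2011.
event_py, event_case = person_years(2010, 2, first_redemption=date(2011, 7, 2))
# A participant who emigrates before a later redemption is censored.
cens_py, cens_case = person_years(2010, 2, first_redemption=date(2012, 6, 1),
                                  death_or_emigration=date(2012, 1, 1))
```

Because follow-up stops at the first redemption, each participant contributes at most one case, in line with the rule stated above.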
The following strategy ensured that the follow-up periods (among workers who participated in ≥2 surveys) were disjoint: (i) 5 persons were excluded from the follow-up of the COPSOQ sample of 2004 due to participation in the follow-up of the DWECS sample of 2000, (ii) 1 person was excluded from the follow-up of the DWECS sample of 2005 due to participation in the follow-up of the COPSOQ sample of 2004, (iii) 30 persons were excluded from the follow-up of the DANES sample of 2008 due to participation in the follow-up of the COPSOQ sample of 2004 or the DWECS sample of 2005, and (iv) 20 persons were excluded from the follow-up of the DWECS sample of 2010 due to participation in the follow-up of the DANES sample of 2008.
Statistical model
Poisson regression was used to model incidence rates of redeemed prescriptions for psychotropic medicine as a function of weekly working hours (32–40; 41–48; >48 hours/week). The analysis was controlled for gender, age (10-year classes), sample (DWECS 1995; DWECS 2000; COPSOQ 2004; DWECS 2005; DANES 2008; DWECS 2010), shift work (fixed night shifts or rotational shift work schedules versus other), and SES (according to the employment classification module, during the calendar year of the baseline interview). The logarithm of person-years at risk was used as offset. The significance level was set to 0.05. A likelihood ratio test was used to test the null hypothesis, which states that the analyzed rates are independent of weekly working hours.
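The logic of the likelihood ratio test can be illustrated with a deliberately simplified Python sketch. It is not the model used in the study: it ignores all covariates and compares only crude rates across the three working-hour categories, and the counts are invented for illustration. With three categories the test has two degrees of freedom, for which the chi-square tail probability has the closed form exp(-x/2).

```python
import math

def poisson_lr_test(cases, person_years):
    """Unadjusted likelihood ratio test of equal Poisson rates in three groups."""
    if len(cases) != 3 or len(person_years) != 3:
        raise ValueError("this sketch handles exactly three exposure groups (2 df)")
    total_cases = sum(cases)
    total_py = sum(person_years)
    lr = 0.0
    for d, t in zip(cases, person_years):
        expected = total_cases * t / total_py  # cases expected under the null
        if d > 0:
            lr += 2.0 * d * math.log(d / expected)
    # Survival function of the chi-square distribution with 2 df.
    p_value = math.exp(-lr / 2.0)
    return lr, p_value

# Invented counts for 32-40, 41-48 and >48 hours/week (NOT the study's data).
lr, p = poisson_lr_test([200, 60, 40], [7000.0, 2000.0, 1000.0])

# Identical crude rates in all groups give a statistic of zero.
lr_null, p_null = poisson_lr_test([70, 20, 10], [7000.0, 2000.0, 1000.0])
```

In the full model the same statistic is obtained as twice the difference in log-likelihood between the regression with and without the working-hour terms, with the logarithm of person-years entering as an offset in both.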
Secondary analyses
Regardless of whether or not the primary research hypothesis was confirmed, we decided to perform a series of secondary analyses. The interpretation of the results from these analyses would, however, depend on the outcome of the primary hypothesis test. If the primary null hypothesis was rejected, then the secondary analyses would be regarded as nested hypothesis tests. Otherwise, they would be regarded as hypothesis-generating exploratory analyses, the results of which need to be confirmed in an independent dataset before they can be deemed statistically significant.
With the endpoint and covariates of the primary hypothesis test, we used likelihood ratio tests to check for possible two-way interaction effects between working hours and gender, age, shift work (fixed night shifts or rotational shift work schedules vs other), or SES. Subsample analyses were thereafter performed on data stratified first by gender, then age (21–39 and 40–59 years), shift work, and finally SES. In keeping with the principles of nested hypothesis testing, we did not consider perceived differences in effect sizes between strata statistically significant unless both the primary hypothesis test and the concerned two-way interaction effect were statistically significant.
In addition to the above, we performed three separate analyses in exactly the same way as we did in the primary analysis but with endpoints defined by the following subsets of ATC-codes: N05B (anxiolytics), N05C (hypnotics and sedatives), and N06A (antidepressants).
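The primary endpoint and the three subset endpoints all reduce to prefix tests on ATC codes. A minimal Python sketch, assuming that codes are stored as strings such as "N06AB04"; the function names and the example codes are illustrative and do not come from the study's own programs.

```python
# Prefixes of the primary case definition: N05 and N06.
CASE_PREFIXES = ("N05", "N06")

# Subsets used as endpoints in the three separate analyses.
SUBSET_ENDPOINTS = {
    "N05B": "anxiolytics",
    "N05C": "hypnotics and sedatives",
    "N06A": "antidepressants",
}

def is_case(atc_code):
    """True if a redeemed prescription falls under the primary case definition."""
    return atc_code.strip().upper().startswith(CASE_PREFIXES)

def subset_endpoint(atc_code):
    """Name of the matching subset endpoint, or None if the code is in none of them."""
    return SUBSET_ENDPOINTS.get(atc_code.strip().upper()[:4])

# Example codes chosen for illustration only.
examples = ["N06AB04", "N05BA01", "N05AH03", "C09AA02"]
flags = [(code, is_case(code), subset_endpoint(code)) for code in examples]
```

An antipsychotic such as the third example counts towards the primary endpoint but towards none of the three subset endpoints, which is why the subset analyses need not add up to the primary one.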
Sensitivity analysis
We performed two sensitivity analyses. The first one showed how the estimates were affected when we, in addition to excluding workers who used prescription drugs at baseline, also excluded those with a poor self-rated mental health at baseline. The second showed how the estimates were affected when we, in addition to controlling for the covariates of the primary analysis, also controlled for job satisfaction and job insecurity. The first one was based on data from DWECS 1995, DWECS 2000, DWECS 2005 and DANES 2008 while the second was based on data from DWECS 1995, DWECS 2000 and DWECS 2005. Further details about these analyses are given in the study protocol (10).
Cross-sectional auxiliary analysis
To shed light on the issue of possible prescription bias, we calculated an odds ratio for poor mental health at baseline (among employees with long versus normal working hours) in two ways: the case definition of the first calculation was based on self-rated mental health and the second on redemption of a prescription for psychotropic drugs during the calendar year of the interview. Logistic regression was used to model the odds of the outcomes as a function of weekly working hours (≥41 versus 32–40). We controlled for gender, age, SES, shift work, and sample in the same way as in the primary analysis. Generalized estimating equations were used to estimate the parameters. Observations from the same person were treated as repeated measurements and a first order autoregressive correlation structure was assumed. The following samples were included in the analysis: DWECS 1995, DWECS 2000, DWECS 2005 and DANES 2008. Further details are given in the study protocol (10).
Results
In the primary analysis, the inclusion criteria for age, employment status, and working hours were fulfilled for 29 837 observations. Of these, we excluded 2252 due to redeemed prescriptions during the calendar year of the interview, 832 due to redeemed prescriptions in a previous follow-up period, and 794 due to missing data on shift work, which left us with 25 959 observations (19 259 persons) to be included in the analysis. The included observations yielded a total of 2914 new cases of psychotropic drug use in 99 018 person-years at risk. A flow-chart for each of the included data sets is given in table 1.
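The exclusion arithmetic and the overall crude rate implied by these figures can be checked directly. The short Python sketch below uses only the numbers quoted in the paragraph above; the overall crude rate is derived from them and is not itself reported in the text.

```python
eligible = 29_837
excluded = {
    "redeemed prescription in the calendar year of the interview": 2_252,
    "redeemed prescription in a previous follow-up period": 832,
    "missing data on shift work": 794,
}
included = eligible - sum(excluded.values())

new_cases = 2_914
person_years = 99_018
crude_rate_per_1000 = 1000 * new_cases / person_years

print(included)                        # 25959
print(round(crude_rate_per_1000, 1))   # 29.4
```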
The likelihood ratio test did not reject the null hypothesis (P=0.085). For each of the work-hour categories, the estimated rate ratio (RR), person-years at risk, number of cases and crude rate (cases per 1000 person-years) are given in table 2. It should be noted that the first occurrence during the follow-up period is not necessarily the first occurrence ever, and that the crude rates given in the table should be interpreted accordingly.
Since nested hypothesis testing was used to deal with the mass-significance problem and the primary hypothesis test failed to reject its null hypothesis, none of the RR given in table 2 are regarded as statistically significant. Another consequence of the nested hypothesis testing is that the secondary analyses will be regarded as hypothesis-generating exploratory analyses, the results of which need to be confirmed in an independent dataset before they can be deemed statistically significant [cf. Hannerz and Albertsen, (10)].
The RR for incident use of psychotropic drugs (ATC-code N05 and N06) are stratified by SES in table 3 and by gender, age and shift work status, respectively, in table 4. The 95% confidence interval (95% CI) of the RR for the contrast 41–48 versus 32–40 working hours/week contains the value 1 in each of the examined strata. Hence, there is no indication of an association between moderate overtime work and use of psychotropic medicine. The stratified analyses suggest, however, that excessive overtime work (>48 hours/week) might be associated with an increased risk among night or shift workers (RR 1.51, 95% CI 1.15–1.98).
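Confidence intervals for rate ratios are usually constructed on the logarithmic scale, so a reported interval can be sanity-checked by comparing the point estimate with the geometric mean of its limits. The Python sketch below does this for the shift-worker estimate quoted above, assuming a Wald-type interval (an assumption; the text does not state how the intervals were computed). The helper name is hypothetical.

```python
import math

def log_scale_summary(lower, upper, z=1.959964):
    """Back out the log-scale standard error and midpoint of a 95% CI for a ratio."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.sqrt(lower * upper)  # geometric mean of the limits
    return se_log, midpoint

def contains_one(lower, upper):
    """True if the interval includes the null value of a ratio."""
    return lower <= 1.0 <= upper

se_log, midpoint = log_scale_summary(1.15, 1.98)
shift_workers_include_one = contains_one(1.15, 1.98)
```

The midpoint agrees with the reported RR of 1.51 to within rounding. Whether an interval excludes 1 is a nominal statement only; under the nested testing scheme described above, the stratified estimates remain exploratory.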
The RR for incident use of anxiolytics (ATC-code N05B), hypnotics and sedatives (ATC-code N05C) and antidepressants (ATC-code N06A), respectively, are given in table 5. The overlap between different medication groups is described in table 6.
The first of the sensitivity analyses (based on data from DWECS 1995, DWECS 2000, DWECS 2005 and DANES 2008) showed that exclusion of workers with poor self-rated mental health at baseline did not have an important impact on the association between working hours and use of psychotropic drugs. The RR for incident use of psychotropic drugs among employees working 41–48 versus 32–40 hours/week was 1.06 (95% CI 0.94–1.20) with and 1.07 (95% CI 0.96–1.21) without exclusion of workers with poor self-rated mental health at baseline. The corresponding RR for the contrast >48 versus 32–40 hours/week were 1.08 (95% CI 0.92–1.26) and 1.11 (95% CI 0.96–1.28).
The second sensitivity analysis (based on data from DWECS 1995, DWECS 2000 and DWECS 2005) showed that job satisfaction and job insecurity did not have an important impact on the estimates. The RR for the contrast 41–48 versus 32–40 hours/week was 1.04 (95% CI 0.91–1.19) with and 1.04 (95% CI 0.91–1.18) without control for job satisfaction and job insecurity. The corresponding RR for the contrast >48 versus 32–40 hours/week were 1.23 (95% CI 1.05–1.44) and 1.20 (95% CI 1.02–1.40).
The cross-sectional auxiliary analysis showed that long working hours were not associated with an increased propensity to seek medical treatment; the odds ratio (among employees with long versus normal working hours) was 0.84 (95% CI 0.74–0.97) for redemption of a prescription for psychotropic drugs during the calendar year of the baseline interview and 1.05 (95% CI 0.84–1.30) for poor self-rated mental health.
Discussion
The primary analysis of the present study did not find any statistically significant associations between long working hours and the incidence of psychotropic drug usage among Danish employees. Stratification by gender, age, shift work and SES, respectively, suggested that overtime work which exceeds the limit of the EU Working Time Directive (>48 hours/week) might be associated with an increased risk among shift workers, but this is a finding which needs to be reproduced in an independent data set before it can be deemed statistically significant.
Methodological considerations
In this study, we used psychotropic medication as an indicator of mental health. However, not all mental health problems can be assumed to be treated with prescribed psychotropic medicine. Some mental health problems may be less severe than others, and some people may have a stronger tendency than others to seek medical treatment for mental health problems. To address these shortcomings, we performed two kinds of supplementary analyses. The cross-sectional auxiliary analysis did not suggest that people with long working hours were more inclined to redeem prescriptions for mental health problems than people with normal working hours, and the sensitivity analysis showed no important impact on the results when respondents with poor self-rated mental health at baseline were excluded. So although psychotropic medication has some limitations as an indicator of mental health problems, there are no indications that these limitations influenced the results of the study in any important way.
A typical meta-analysis combines results from studies with different operationalization of exposure, covariates and outcome variables. Each study has its own target population while the target population of the meta-analysis is often missing or questionable. In such circumstances, it is possible to pool the results but not the data, and the variation between the studies needs to be handled as a random effect in the meta-analysis. The exposure, covariates and outcome variables of the present project were, however, operationalized in the same way in all of the included datasets. Moreover, all of the samples were drawn from the same target population through simple randomization. It was therefore possible to pool the data and circumvent the necessity to apply meta-analytic methods. The samples were, however, drawn at different time periods and, since we know that the prevalence of psychotropic drug usage changes with time (27), we had to control for this factor, and we did so by incorporating sample indicators as covariates in each of our analyses.
Follow-up periods varied from 2–5 years. It is possible that an association either weakens (due to transitions between exposure categories) or strengthens (due to latency periods) with time. The estimated strength of an association would thus depend on the length of the follow-up period. It appears, however, that the length of the follow-up periods did not play an important role in the present study. Our second sensitivity analysis was based solely on samples followed for five years and the estimates that were obtained in that analysis were very similar to those obtained in the primary analysis where all samples were included.
Strengths and weaknesses
Within-study selection bias was eliminated through our study protocol, in which all hypotheses and statistical models were completely specified, peer reviewed, and published before the questionnaire data were linked to the registers. The study population was randomly sampled from the target population and the statistical power was sufficiently large to detect important effects. The problem with reversed causality was minimized through the prospective design and the exclusion of prevalent cases. Bias from incomplete follow-up data was eliminated by the use of a clinical endpoint that was ascertained through national registers, which cover all residents of the target population.
Since the study was not a randomized controlled trial, we cannot, however, rule out the possibility that uncontrolled selection factors biased the results. The decision to work long hours might, for example, depend on personality traits (28), which are correlated with anxiety, depressive and substance abuse disorders (29). The direction of the reported correlations (28, 29) suggests, however, that a failure to control for personality traits would result in over- rather than under-estimated RR. The null finding of the present study is therefore unlikely to have been caused by uncontrolled personality traits.
Another potential source of selection bias comes from non-respondents in the baseline interviews. The decline in response rates from 80% in 1995 to 48% in 2010 is problematic. It has been shown that the response rates to public health questionnaires in Denmark tend to be especially low among young men, unmarried people, people with a low educational level and people with an ethnic background other than Danish (30, 31). It is possible that the response rates as well as the reasons for non-response in the present study differ between the exposed and unexposed workers. Long working hours imply, for example, less time to answer questionnaires. We believe, however, that any such bias was mitigated by our decision to control for age, gender and SES. We also believe that it was further mitigated by our decision to focus on relative rather than absolute rates and by our decision to exclude prevalent cases.
Previous research
We located 12 research papers, which together provided risk ratios from 42 disjoint statistical tests aimed at evaluating prospective relationships between long working hours and mental ill-health. Of these, 16 tests concerned depressive symptoms (6–8, 32–34), 11 concerned sleeping problems (9, 35), 4 concerned any type of adverse mental symptom (36), 3 concerned any type of psychotropic drug prescription (37, 38), 2 concerned anxiety symptoms (7), 2 concerned reduced psychological well-being (32), 2 concerned high alcohol consumption (32), and 2 concerned any type of medically diagnosed mental disorder (39). While 31 of the tests were restricted to workers in a single company (7–9, 7, 36–38) or occupational group (33), 11 targeted workers in a general population (6, 32, 34, 35, 38).
Since approximately 25% of male and 15% of female employees work ≥41 hours per week in the EU (40), where mental ill-health is the biggest single cause of disability benefit claims (41), a 50% increased risk of mental ill-health would clearly be a remarkable effect. Yet, only 8 of the 42 tests had an acceptable power (>80%) to detect an effect of such size [see appendix (www.sjweh.fi/data_repository.php) for further details]. Most of the tests were, in other words, underpowered to a degree where no meaningful information would be imparted by a null finding, and such tests are not only associated with large statistical errors but also with publication and within-study selection bias (42).
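To give a feel for what acceptable power to detect a 50% increased risk requires, the Python sketch below applies a common normal approximation for a two-sided Wald test on the log rate ratio, in which the variance of the log ratio is roughly the sum of the reciprocals of the expected case counts. This is not the method of the appendix, which is not reproduced here; the function name and the case counts are invented for illustration.

```python
import math
from statistics import NormalDist

def approx_power(cases_exposed, cases_unexposed, rate_ratio=1.5, alpha=0.05):
    """Approximate power of a two-sided Wald test on the log rate ratio.

    cases_exposed and cases_unexposed are the numbers of cases expected in
    each group if the true rate ratio equals rate_ratio.
    """
    norm = NormalDist()
    se = math.sqrt(1.0 / cases_exposed + 1.0 / cases_unexposed)
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(math.log(rate_ratio)) / se
    # Probability of landing in either rejection region under the alternative.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

# Invented scenarios: a small and a large study.
power_small = approx_power(cases_exposed=30, cases_unexposed=100)
power_large = approx_power(cases_exposed=150, cases_unexposed=500)
```

Under this approximation the small scenario falls well short of the 80% threshold while the large one exceeds it, which illustrates why a null finding from a study with few exposed cases carries little information.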
One of the tests with acceptable power was performed in a population-based study of workers from seven regions of France (35). The age and gender standardized odds ratio for having developed a sleeping disorder at the end of a five-year follow-up period was estimated at 0.9 (95% CI 0.8–1.1) among workers who often worked >48 hours/week versus those who had never worked >48 hours/week at baseline.
Two of the tests with acceptable power concerned the development of depressive and anxiety symptoms in a five-year follow-up on a sample of British civil servants (7). The hazard ratio among employees working 41–55 hours/week compared with employees working 35–40 hours/week was estimated at 1.02 (95% CI 0.78–1.34) for depressive symptoms and 1.02 (95% CI 0.79–1.32) for anxiety symptoms. The analyses were controlled for gender and age as well as a series of occupational and lifestyle factors.
Two of the tests with acceptable power concerned redeemed prescriptions (37, 38). One of them was performed in a population-based Canadian study, which treated working hours as a continuous variable and estimated that the incidence of psychotropic drug usage decreased with weekly working hours (RR 0.99, 95% CI 0.98–1.00) (37). The other was performed in a Finnish study, which estimated the RR for psychotropic drug usage among female employees of the City of Helsinki to be 0.96 (95% CI 0.77–1.18) for the contrast >40 versus ≤40 working hours/week (38).
The three remaining tests with acceptable power concerned workers from a large Japanese telecommunication enterprise and the odds of having developed any mental symptom (general fatigue, sleeping problems, lethargy, anxiety or worries, inability to cooperate with colleagues, feeling depressed) at the end of a four-year follow-up period (36). The odds ratio for the contrast 9–12 hours versus ≤8 hours a day was 1.06 (95% CI 0.98–1.15) for men and 1.10 (95% CI 0.95–1.27) for women. The test for the contrast >12 versus ≤8 hours had acceptable power for the men (OR 1.41, 95% CI 1.09–1.83) but not for the women.
Concluding remarks
The present study did not find any statistically significant association between moderate overtime work and mental ill-health, and this null finding aligns well with the results obtained in previously published and acceptably powered prospective studies. A statistical test can never prove that an effect is exactly equal to zero (43). The narrow confidence interval around the estimated RR between moderate overtime workers and employees with normal working hours indicates, however, that if there is an effect, then it is most likely too small to warrant further investigation [cf. Monson (44)].
Our study can neither confirm nor reject the possibility that excessive overtime is associated with a clinically important effect. Although not statistically significant, our primary analysis suggested that the average risk among employees whose overtime work exceeds the limit of the EU Working Time Directive (>48 hours/week) might be slightly higher than it is among employees with normal working hours, and our secondary analyses suggested that shift workers might be an especially vulnerable group. Although not statistically significant, the high rate ratio for long versus normal working hours among shift workers makes sense. It is well known that shift work is associated with sleeping problems (45), and it is reasonable to believe that some of the psychotropic drugs among the shift-workers were prescribed to cope with such problems. It is also reasonable to believe that the need to cope with sleeping problems would be greater among shift workers with very long working hours compared to shift workers with normal working hours.