Patterns of working hour characteristics and risk of sickness absence among shift-working hospital employees: a data-mining cohort study

Objectives: Data mining can complement traditional hypothesis-based approaches in characterizing unhealthy work exposures. We used it to derive a hypothesis-free characterization of working hour patterns in shift work and their associations with sickness absence (SA). Methods: In this prospective cohort study, complete payroll-based work hours and SA dates were extracted from a shift-scheduling register from 2008 to 2019 on 6029 employees from a hospital district in Southwestern Finland. We applied permutation distribution clustering to time series of successive shift lengths, between-shift rest periods, and shift starting times to identify clusters of similar working hour patterns over time. We examined associations of clusters spanning on average 23 months with SA during the following 23 months. Results: We identified eight distinct working hour patterns in shift work: (i) regular morning (M)/evening (E) work, weekends off; (ii) irregular M work; (iii) irregular M/E/night (N) work; (iv) regular M work, weekends off; (v) irregular, interrupted M/E/N work; (vi) variable M work, weekends off; (vii) quickly rotating M/E work, non-standard weeks; and (viii) slowly rotating M/E work, non-standard weeks. The associations of these eight working-hour clusters with risk of future SA varied. The cluster of irregular, interrupted M/E/N work was the strongest predictor of increased SA (days per year) with an incidence rate ratio of 1.77 (95% confidence interval 1.74–1.80) compared to regular M/E work, weekends off. Conclusions: This data-mining study suggests that hypothesis-free approaches can contribute to scientific understanding of healthy working hour characteristics and complement traditional hypothesis-driven approaches.


Work shift burdensomeness in FIOH shift ergonomics
In the main text, we compared our novel data-driven clusters to more traditional work overload risk scores derived from past literature by an expert panel. The work overload risk scores in the FIOH (Finnish Institute of Occupational Health) shift ergonomics recommendations were formed according to the following rules:

[Table: recommendation content scored 3, 2, 1, or 0 per item, e.g., working hours between 2 free days; see reference (14).]
Note that the FIOH recommendations are distinct from our data-driven clusters; here they were used merely as a point of comparison.
In this study, every period for which the above scores could be calculated for a given employee was scored. The scores were then averaged to obtain an employee-specific risk value (range 0-3) for the target interval (e.g., the first half of the employee's longest consecutive data stretch).
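The averaging step can be sketched as follows. This is only a minimal illustration, not the code used in the study: the function name `employee_risk_value` and the example scores are hypothetical, and the period-level scores are assumed to have been assigned beforehand according to the FIOH rules.

```python
from statistics import mean


def employee_risk_value(period_scores):
    """Average period-level risk scores (each 0-3) into one employee-level value."""
    if not period_scores:
        raise ValueError("no scorable periods in the target interval")
    if any(not 0 <= s <= 3 for s in period_scores):
        raise ValueError("each score must lie in the range 0-3")
    return mean(period_scores)


# Hypothetical scores for five scorable periods of one employee
scores = [0, 2, 3, 1, 2]
risk = employee_risk_value(scores)  # arithmetic mean, stays within 0-3
```

Because every input lies between 0 and 3, the averaged value necessarily stays in the same range.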

Supplementary information on cluster correlates and contents
The description of the clusters in the main text was necessarily limited by space constraints. Below, we supplement that description. First, we provide correlates of individual clusters by regressing them on multiple covariates (Table S1), although this remains a linear characterization with limited value for the minor clusters. Second, we provide the figures used to characterize the clusters in terms of their input data for those clusters that could not be illustrated in the main text (Figures S1-S4). Finally, to understand the highly non-parametric cluster model, it is useful to inspect some individual members of the clusters. For that purpose, the remaining text shows the weekdays of the first 25 shifts of the first 6 members (employees) in each of the 8 clusters, whereas the remaining figures (raw-data plots) show the quantitative dimensions of shift length, rest length, and shift start time for the first 100 shifts of the first 4 members in each of the 8 clusters.
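To make the non-parametric nature of the cluster model more concrete, the sketch below computes a permutation (ordinal-pattern) distribution for a single series. It is an illustration under stated assumptions, not the study's implementation: the function name, the pattern length `m = 3`, the univariate input, and the example shift lengths are all hypothetical, and the comparison of distributions between employees and the clustering step itself are omitted.

```python
from collections import Counter
from itertools import permutations


def permutation_distribution(series, m=3):
    """Relative frequencies of ordinal patterns of length m in a series."""
    n_windows = len(series) - m + 1
    if n_windows < 1:
        raise ValueError("series is shorter than the pattern length m")
    counts = Counter()
    for i in range(n_windows):
        window = series[i:i + m]
        # Ordinal pattern: window positions ordered by value
        # (sorted() is stable, so ties are broken by position)
        pattern = tuple(sorted(range(m), key=lambda k: window[k]))
        counts[pattern] += 1
    # Report every possible pattern, including those never observed
    return {p: counts[p] / n_windows for p in permutations(range(m))}


# Hypothetical shift lengths (hours) for one employee
shift_lengths = [8, 8.5, 10, 8, 12, 8]
dist = permutation_distribution(shift_lengths)
```

Only the relative ordering of neighboring values enters the distribution, which is why no parametric form for shift lengths, rest lengths, or start times needs to be assumed.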

Supplementary Tables: Correlates for cluster memberships
In the main text, we adjusted the effects of our data-driven clusters on sickness absence for other variables. Below, the reader may find how those other variables are associated with the clusters themselves.

Supplementary figures characterizing cluster contents
In the main text, for brevity, we showed figures characterizing the cluster contents for some key clusters only. For comprehensiveness, the supplementary figures below show the same information for the remaining clusters.
Supplementary Figure S1. Empirical shift-by-characteristic densities for cluster no 2 (upper row) and cluster no 3 (lower row). Brighter colors indicate more probability mass.
Supplementary Figure S4. Cluster-average autocorrelation for log-recovery length in clusters not shown in the main text.

Examples of Shift Rotation Weekdays
In the printout below, the weekdays of successive shifts are shown for the first 6 members of each cluster (mo = Monday; tu = Tuesday; we = Wednesday; th = Thursday; fr = Friday; sa = Saturday; su = Sunday).
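A weekday printout of this kind could be produced along the following lines. This is a hedged sketch only: the helper name `weekday_codes` and the example shift dates are hypothetical and do not come from the study data.

```python
from datetime import date

# Two-letter codes indexed by date.weekday(), where Monday == 0
ABBREV = ["mo", "tu", "we", "th", "fr", "sa", "su"]


def weekday_codes(shift_dates):
    """Map successive shift dates to two-letter weekday codes."""
    return [ABBREV[d.weekday()] for d in shift_dates]


# Hypothetical start dates of four successive shifts
shift_dates = [date(2019, 1, 7), date(2019, 1, 8), date(2019, 1, 10), date(2019, 1, 12)]
line = " ".join(weekday_codes(shift_dates))
print(line)  # mo tu th sa
```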

Raw-data Plots
The following plots show the data for the first 100 shifts of the first 4 employees in each cluster. The data are almost "raw" in the sense that the registered values are plotted for the data dimensions entered into the permutation distribution clustering method. They are only "almost" raw because the rest-length values were log-transformed to make individual data points distinguishable (some very large values are present in the data). A dashed line in those panels is drawn at log(11h) ≈ 2.4 log-hours to highlight which recovery lengths are shorter than the minimum daily rest in the European Working Time Directive (see main text). Each figure contains an upper-left panel plotting shift lengths against calendar days. The adjacent panel shows the same values against the running number of shifts, and the remaining panels show the two other dimensions entering into the 3-dimensional time-series clustering (i.e., rest length and shift start time).
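The log transform and the position of the dashed line can be checked with a few lines of code. The sketch assumes the natural logarithm, which is inferred from the stated value of about 2.4 for 11 hours; the function names and the example rest lengths are hypothetical.

```python
import math

MIN_DAILY_REST_H = 11.0  # minimum daily rest referred to in the text
THRESHOLD = math.log(MIN_DAILY_REST_H)  # natural log of 11, about 2.4


def log_rest(rest_hours):
    """Log-transform rest lengths given in hours."""
    return [math.log(h) for h in rest_hours]


def is_short_rest(rest_hours):
    """Flag rest periods that fall below the dashed line at log(11 h)."""
    return [math.log(h) < THRESHOLD for h in rest_hours]


# Hypothetical rest lengths in hours, including one long break
rest_hours = [9.0, 11.0, 16.0, 63.5]
flags = is_short_rest(rest_hours)  # only the 9-hour rest lies below the line
```

On the log scale the long 63.5-hour break sits at roughly 4.2, so it no longer dwarfs the ordinary between-shift rests in the plot.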