Prediction of long-term absence due to sickness in employees: development and validation of a multifactorial risk score in two cohort studies

Our findings, using two large cohorts, show that sickness absence risk can be predicted reliably with a model consisting mainly of socioeconomic and lifestyle factors. The quick-to-administer risk prediction score we have developed and validated can be used by healthcare professionals to identify individuals at risk of sickness absence.

Scand J Work Environ Health

Objectives. This study aimed to develop and validate a risk prediction model for long-term sickness absence.

Methods. Survey responses on work- and lifestyle-related questions from 65 775 public-sector employees were linked to sickness absence records to develop a prediction score for medically certified sickness absence lasting >9 days and ≥90 days. The score was externally validated using data from an independent population-based cohort of 13 527 employees. For both sickness absence outcomes, a full model including 46 candidate predictors was reduced to a parsimonious model using least absolute shrinkage and selection operator (LASSO) regression. Predictive performance was evaluated using the C-index and calibration plots.

Results. The full model explained 12.5% of the variance in ≥90-day sickness absence. The parsimonious model included self-rated health (linear and quadratic terms), depression, sex, age (linear and quadratic terms), socioeconomic position, previous sickness absences, number of chronic diseases, smoking, shift work, working night shift, and quadratic terms for body mass index and the Jenkins sleep scale. The discriminative ability of the score was good (C-index 0.74 in internal and 0.73 in external validation). Calibration plots confirmed high correspondence between predicted and observed risk. For >9-day sickness absence, the full model explained 15.2% of the variance, but the C-index of the parsimonious model was poor (<0.65).

Conclusions. An individual's risk of long-term sickness absence lasting ≥90 days can be estimated using a brief risk score. The predictive performance of this score is comparable to that of established multifactorial risk algorithms for cardiovascular disease, such as the Framingham risk score.

Sickness absence constitutes a major burden for companies and organizations in terms of lost productivity (1). Sickness absence is also a predictor of adverse health outcomes, such as premature mortality (2). Early identification of individuals at risk of long-term sickness absence would benefit both employees and employers by enabling targeted interventions to delay and possibly prevent work disability.
A number of predictors for sickness absence have already been identified. Psychological problems (3,4), unfavorable workplace characteristics (5,6), and adverse health behaviors (7) have been shown to be associated with an increased risk of sickness absence. By combining this information, attempts have been made to develop multifactorial prediction models for sickness absence (8-16). To date, however, the study samples used to develop these prediction models have often included only a limited number of occupations (8,9,11,16,17), which may limit the generalizability of the results. Furthermore, while some of these studies used official records to assess sickness absence (8,9,11,14,16), others have relied only on self-reported sickness absence (10,12,13,15). The methods used to develop prediction models have also been heterogeneous. One study examined a single indicator as a predictor of sickness absence (18), while others have used various ways to reduce the number of predictors when developing a parsimonious multifactorial model (11,13,16,19). Few prediction models to date have demonstrated an ability to accurately discriminate individuals at high risk of sickness absence (8,11,15,16).
To address some of these limitations, we developed and validated separate parsimonious prediction scores for medically certified sickness absence lasting >9 days and for sickness absence lasting ≥90 days, using two large independent cohort studies that included several demographic, lifestyle, and job characteristic variables as candidate predictors of sickness absence.

Study design and participants
Data for this study were obtained from the Finnish Public Sector (FPS) study and the Health and Social Support (HeSSup) study, Finland. Both studies are described in detail elsewhere (20-22). Briefly, the FPS study population comprised employees in the municipal services of 10 Finnish towns and 21 public hospitals, with survey data collected from employees at work in 2000-2002 or 2004. The HeSSup study population was a population-based sample of Finns surveyed in 1998 in four age groups (20-24, 30-34, 40-44, and 50-54 years). Participants from both cohorts were excluded if they were on sickness absence (≥90 days) or disability pension, or were retired, at the time of responding. Further, participants in the second FPS survey (2004) were excluded if they had taken part in the first survey (2000-2002). Thus, the final sample comprised 65 775 municipal employees from FPS and 13 527 employed adults from HeSSup. We used FPS to develop the prediction score and HeSSup for external validation in an independent population-based sample. Local ethics committees approved both studies.

Measurements of predictors
In the FPS survey, participants were asked a total of 82 questions on their sociodemographic characteristics, health status, lifestyle habits, and working conditions. These questions were grouped into 30 single- or multi-item candidate predictors, of which 14 were dichotomous and 16 multilevel/multicategorical (for a full list of items, see supplementary file 1, www.sjweh.fi/show_abstract.php?abstract_id=3713).
Sociodemographic factors. These were derived from employers' registers and included sex, age, and socioeconomic position. We used the International Standard Classification of Occupations (ISCO) to derive socioeconomic position from the participants' job titles. The ISCO has 10 categories, ranging from 1 (managers) to 9 (elementary occupations), with a separate category for armed forces occupations. Only a few participants fell into categories 6-8 (skilled agricultural workers, craft and related trades workers, plant and machine operators); as these categories refer to occupations with similar skill levels, we combined them into a single "process worker" category. As none of the participants were employed by the armed forces, our final measure of socioeconomic position had seven categories: 1=manager/higher official, 2=senior specialist, 3=specialist, 4=office worker, 5=service worker, 6=process worker, and 7=other.
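The recoding of ISCO categories into the seven-level socioeconomic position measure can be sketched as a small lookup. This is an illustrative sketch, not the authors' code; mapping ISCO category 9 (elementary occupations) to "other" is an assumption, since the text lists "other" as the only remaining category, and the function name `isco_to_sep` is hypothetical.

```python
# Hypothetical recoding of ISCO major groups (1-9) into the seven-category
# socioeconomic position (SEP) measure described in the text.
SEP_LABELS = {
    1: "manager/higher official",
    2: "senior specialist",
    3: "specialist",
    4: "office worker",
    5: "service worker",
    6: "process worker",
    7: "other",
}


def isco_to_sep(isco_major_group: int) -> int:
    """Map an ISCO major group to the 7-level SEP code.

    Groups 6-8 are pooled into 'process worker' (code 6), as in the text.
    Mapping group 9 (elementary occupations) to 'other' (code 7) is an
    assumption. Group 0 (armed forces) did not occur and is rejected.
    """
    if isco_major_group in (1, 2, 3, 4, 5):
        return isco_major_group
    if isco_major_group in (6, 7, 8):
        return 6
    if isco_major_group == 9:
        return 7
    raise ValueError(f"unexpected ISCO major group: {isco_major_group}")


print([isco_to_sep(g) for g in range(1, 10)])  # [1, 2, 3, 4, 5, 6, 6, 6, 7]
```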
Health status and sleep. Participants rated their health on a 5-point scale (1=good, 2=rather good, 3=moderate, 4=rather poor, 5=poor). Self-reported height and weight were used to calculate body mass index (BMI). The 12-item General Health Questionnaire (GHQ) was used to assess psychological distress (23). GHQ responses were given on a 4-point Likert scale (1=better than usual, 2=same as usual, 3=worse than usual, 4=much worse than usual), and the mean response across the GHQ items was used in the analysis. The Jenkins sleep scale was used to assess sleep problems (24). Answers were given on a 6-point Likert scale (1=never to 6=almost every night).
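The two derived health measures above are simple arithmetic: BMI is weight in kilograms divided by the square of height in metres, and the GHQ score is the mean of the 12 item responses. A minimal sketch, assuming height is recorded in centimetres and that all 12 GHQ items are present (missing items are not handled here; in the study, missing predictor data were imputed separately):

```python
from statistics import mean


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0  # assumption: height reported in cm
    return weight_kg / height_m ** 2


def ghq12_mean(responses: list[int]) -> float:
    """Mean of the 12 GHQ items, each scored 1-4 as described in the text."""
    if len(responses) != 12:
        raise ValueError("GHQ-12 requires exactly 12 item responses")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("GHQ items are scored 1-4")
    return mean(responses)


print(round(bmi(70, 175), 1))          # 22.9
print(ghq12_mean([1] * 6 + [3] * 6))   # 2
```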
Health behaviors. Alcohol consumption was assessed with questions on how much beer, wine, and spirits participants consumed per week. Answers were converted into units of alcohol per week. Participants also reported their smoking status (0=non-smoker/former smoker, 1=current smoker). Participants assessed their weekly leisure-time physical activity on four scales: walking, brisk walking, jogging, and running. Answers to each scale were given on a 5-point scale (1=0, 2=<0.5, 3=1, 4=2-3, and 5=≥4 hours). Participants were classified as inactive if they reported <1 hour/week of activity at least as intense as brisk walking.
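The inactivity rule can be expressed as a short function. The text does not state whether hours are summed across activities, so this sketch assumes one reading: a participant is inactive when none of brisk walking, jogging, or running reaches response category 3 (about 1 hour/week), and plain walking does not count. The function name is hypothetical.

```python
def is_inactive(walking: int, brisk_walking: int, jogging: int, running: int) -> bool:
    """Classify a participant as physically inactive.

    Each argument is the 5-point response
    (1=0 h, 2=<0.5 h, 3=1 h, 4=2-3 h, 5=>=4 h per week).
    Assumption: 'inactive' means no activity at least as intense as brisk
    walking reaches category 3; walking alone is ignored. Summing hours
    across activities would be an alternative reading of the rule.
    """
    for value in (walking, brisk_walking, jogging, running):
        if value not in range(1, 6):
            raise ValueError("responses must be on the 1-5 scale")
    return max(brisk_walking, jogging, running) < 3


print(is_inactive(5, 1, 1, 1))  # True: a lot of walking, but nothing brisker
print(is_inactive(1, 3, 1, 1))  # False: 1 hour/week of brisk walking
```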
Sickness absence history. Participants were linked to their sickness absence records from the year preceding the survey. The records, including all sickness absence spells lasting >9 days, were obtained from the Social Insurance Institution of Finland. The number of sickness absence spells ranged from 0 to 5, with only 68 individuals having 4 or 5 spells. Participants with 4 or 5 spells were therefore recoded as having 3, resulting in a range of 0-3.
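The recoding of sparse high counts described above is a simple top-coding step. A minimal sketch (the helper name `top_code` is hypothetical):

```python
def top_code(count: int, cap: int = 3) -> int:
    """Recode counts above `cap` to `cap` (e.g. 4 or 5 spells -> 3)."""
    if count < 0:
        raise ValueError("counts cannot be negative")
    return min(count, cap)


spells_previous_year = [0, 1, 2, 3, 4, 5]
print([top_code(n) for n in spells_previous_year])  # [0, 1, 2, 3, 3, 3]
```

Capping keeps the variable ordinal while avoiding estimates driven by the few individuals in the sparse upper categories.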

Chronic diseases. Participants reported physician-diagnosed diseases from a list of common ailments. We matched the diseases with the top 30 causes of global disability-adjusted life years (25). Diseases matched included bronchial asthma, myocardial infarction, angina pectoris, cerebrovascular diseases, migraine, depression, and diabetes. Diseases on the top 30 list that were not queried in the survey included sense organ diseases, lung cancer, and a range of severe communicable diseases. In addition, we formed a new variable that summed together the reported diseases that were included in the matched list. The number of diseases ranged from 0-7 with only 66 participants reporting >3 diseases. Thus, participants with ≥4 diseases were recoded as having 3 diseases, resulting in a final range of 0-3. Measures for individual diseases and number of diseases were used in the models simultaneously.
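Because the individual disease indicators and the capped disease count enter the models together, both can be derived in one step. In this sketch the disease label strings are hypothetical stand-ins for the survey items, and the count is capped at 3 as described above:

```python
# Hypothetical labels for the seven diseases matched to the top-30 DALY list.
MATCHED_DISEASES = (
    "bronchial asthma",
    "myocardial infarction",
    "angina pectoris",
    "cerebrovascular disease",
    "migraine",
    "depression",
    "diabetes",
)


def disease_features(reported: set[str], cap: int = 3) -> dict[str, int]:
    """Return 0/1 indicators for each matched disease plus a capped count.

    Diseases reported but not on the matched list are ignored.
    """
    features = {d: int(d in reported) for d in MATCHED_DISEASES}
    # The count is computed from the indicators before the key is added.
    features["n_chronic_diseases"] = min(sum(features.values()), cap)
    return features


f = disease_features({"migraine", "diabetes", "hay fever"})
print(f["n_chronic_diseases"])  # 2 (hay fever is not on the matched list)
```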
Work-related characteristics. Participants in the FPS survey were asked about shift work in general (0=no, 1=yes) and whether the shifts included night work (0=no, 1=yes), job demands (a 3-item scale), job control (a 6-item scale), job effort (1 item), and job rewards (a 3-item scale). Answers were given on a 5-point Likert scale. The mean scale scores were used to form two new measures: job strain (26) and effort-reward imbalance (ERI) (27). Job strain was scored as 1 if job demands were above the median and job control was below the median, and 0 otherwise. ERI was scored as 1 if the ratio of job effort to rewards was >1, and 0 otherwise.
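Both dichotomous work-stress measures can be derived from the mean scale scores. This sketch assumes the usual job strain definition (demands above the sample median and control below it) and uses the plain effort/reward ratio of mean scores with a threshold of 1, as stated in the text; the function names are hypothetical. Rewards are on a 1-5 scale, so the ratio never divides by zero.

```python
from statistics import median


def job_strain(demands: list[float], control: list[float]) -> list[int]:
    """1 if demands are above the sample median and control below it, else 0."""
    d_med, c_med = median(demands), median(control)
    return [int(d > d_med and c < c_med) for d, c in zip(demands, control)]


def effort_reward_imbalance(effort: list[float], reward: list[float]) -> list[int]:
    """1 if the effort/reward ratio exceeds 1, else 0."""
    return [int(e / r > 1) for e, r in zip(effort, reward)]


print(job_strain([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))    # [0, 0, 0, 1, 1]
print(effort_reward_imbalance([4, 2, 3], [2, 4, 3]))   # [1, 0, 0]
```

Note that a ratio of exactly 1 is not classified as imbalance, matching the strict ">1" in the text.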
Team work and management. Team work was assessed with a short version of the Team Climate Inventory (TCI) (28), which includes four subscales: participatory safety (4 items), support for innovation (3 items), vision (4 items), and task orientation (3 items). Management was assessed with scales measuring procedural (7 items) and relational justice (6 items) (29). Responses to these items were given on a 5-point Likert scale, and the mean response on each scale was used in the analysis.
The HeSSup survey included all those questions that were included in the risk score developed using data from FPS.

Ascertainment of sickness absence at follow-up
Survey responses were linked to electronic records of sickness absence lasting >9 days obtained from the national register kept by the Social Insurance Institution of Finland. Linkage was performed using personal identification numbers. Residents of Finland aged 16-67 years are entitled to daily allowances for medically certified sickness absence (30). After a qualifying period of 9 days from the day of falling ill, compensation is paid for a maximum of one year. All such sickness absence periods must be medically certified, and they are recorded in the register with their start and end dates. In Finland, employees receive compensation based on their salary during sickness absence for up to 300 weekdays. If a sickness absence lasts ≥90 days, the employee must provide the Social Insurance Institution of Finland with a more detailed certificate from an occupational healthcare physician stating his/her inability to work in order to be entitled to compensation.
We identified individuals with ≥1 long (>9 days) and those with ≥1 very long (≥90 days) period of sickness absence during follow-up. These cut-off points correspond to official, reliable register records. Linkage data were available for all respondents in both cohorts for the full follow-up, which ended on 31 December 2011 in the FPS cohort and on 31 December 2013 in the HeSSup cohort.
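Turning register spells into a time-to-event outcome for the Cox models can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: spell end dates are treated as inclusive, ">9 days" is taken as a duration of at least 10 days, the event time is the start of the first qualifying spell after the survey, and everyone else is censored at the end of register follow-up (other censoring events such as death or retirement are not modelled). All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Spell:
    start: date
    end: date  # last day of absence, treated as inclusive (assumption)

    @property
    def days(self) -> int:
        return (self.end - self.start).days + 1


def time_to_first_spell(baseline: date, end_of_follow_up: date,
                        spells: list[Spell], min_days: int) -> tuple[float, int]:
    """Return (follow-up time in years, event indicator).

    event = 1 at the start of the first spell lasting >= min_days that begins
    after baseline; otherwise the person is censored at end_of_follow_up.
    Use min_days=10 for '>9 days' and min_days=90 for '>=90 days'.
    """
    starts = sorted(s.start for s in spells
                    if s.days >= min_days and baseline < s.start <= end_of_follow_up)
    if starts:
        return (starts[0] - baseline).days / 365.25, 1
    return (end_of_follow_up - baseline).days / 365.25, 0


spells = [Spell(date(2005, 3, 1), date(2005, 3, 20)),   # 20 days
          Spell(date(2007, 1, 1), date(2007, 6, 30))]   # 181 days
print(time_to_first_spell(date(2004, 1, 1), date(2011, 12, 31), spells, 90))
```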

Statistical analysis
We combined the two FPS subsamples (surveys from 2000-2002 and 2004) to form the development cohort (N=65 775). Missing predictor data were imputed using single imputation with predictive mean matching (31).
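Predictive mean matching fills each missing value with an observed value taken from a "donor" whose model-predicted value is close to that of the incomplete case, so imputed values always stay within the observed range. The study used the R package mice; the NumPy sketch below is a simplified, single-variable illustration only. It uses a plain least-squares fit and does not draw regression coefficients from their posterior, as mice's default `pmm` method does, and it does not iterate over several incomplete variables (chained equations). The function name and `k` default are assumptions.

```python
import numpy as np


def pmm_impute(X, y, k=5, seed=0):
    """Single imputation of missing values in y by predictive mean matching.

    X: (n, p) fully observed predictors; y: (n,) with np.nan where missing.
    Simplified sketch: one OLS fit, k nearest donors on the predicted scale.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()  # do not modify the caller's array
    missing = np.isnan(y)
    if not missing.any():
        return y
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design[~missing], y[~missing], rcond=None)
    y_hat = design @ beta
    donor_pred = y_hat[~missing]
    donor_obs = y[~missing]
    for i in np.flatnonzero(missing):
        # k observed cases whose predictions are closest to case i's prediction
        nearest = np.argsort(np.abs(donor_pred - y_hat[i]))[:k]
        y[i] = donor_obs[rng.choice(nearest)]  # borrow one donor's observed value
    return y
```

A usage example: with `X = np.arange(10.0).reshape(-1, 1)` and `y = 2 * X[:, 0]` with two entries set to `np.nan`, `pmm_impute(X, y, k=3)` returns a complete array whose imputed entries are values actually observed in `y`.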
To develop a parsimonious prediction model, we used Cox proportional hazards models together with least absolute shrinkage and selection operator (LASSO) regression. LASSO constrains the sum of the absolute values of the regression coefficients to be less than a fixed value that depends on a tuning parameter, lambda. As lambda increases, LASSO shrinks the coefficients toward zero and sets some of them exactly to zero, leaving only the most important predictors in the model. First, we defined a full model using all 30 single- or multi-item predictors, as well as quadratic terms for all 16 non-dichotomous predictors to allow for non-linear associations; the full model thus included 46 predictors. Then, using LASSO with 20-fold cross-validation, we chose a lambda value such that the mean cross-validated error of the final model was within one standard error of that of the full model. This allowed us to derive a model that was close to the full model in terms of fit but had fewer predictors. For comparison, we also built the models using more traditional techniques, as described by Airaksinen et al (32).
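In its penalized (Lagrangian) form, the LASSO Cox estimate maximizes the log partial likelihood minus a penalty, $\ell(\beta) - \lambda \sum_j |\beta_j|$, which is equivalent to the constraint described above. The sketch below covers only the two bookkeeping steps around that fit: expanding the 30 predictors to 46 by adding squared terms for the non-dichotomous ones, and picking lambda from cross-validation output. The penalized Cox fit itself (done with glmnet in the study) is not reproduced. For the selection step, the sketch uses the standard one-standard-error rule, which in glmnet's `lambda.1se` compares against the minimum mean cross-validated error; the function names are hypothetical.

```python
import numpy as np


def add_quadratic_terms(X, names, dichotomous):
    """Append a squared column for every predictor not listed as dichotomous."""
    cols, out_names = [X], list(names)
    for j, name in enumerate(names):
        if name not in dichotomous:
            cols.append(X[:, [j]] ** 2)  # keep 2-D shape (n, 1)
            out_names.append(f"{name}^2")
    return np.hstack(cols), out_names


def lambda_one_se(lambdas, cv_mean, cv_se):
    """Largest lambda whose mean CV error is within one SE of the minimum."""
    best = np.argmin(cv_mean)
    threshold = cv_mean[best] + cv_se[best]
    return float(np.max(lambdas[cv_mean <= threshold]))


# Illustrative cross-validation output (made-up numbers, not study results)
lambdas = np.array([1.0, 0.5, 0.1, 0.05, 0.01])
cv_mean = np.array([10.0, 8.0, 7.2, 7.0, 7.1])
cv_se = np.full(5, 0.3)
print(lambda_one_se(lambdas, cv_mean, cv_se))  # 0.1
```

With 30 predictors of which 14 are dichotomous, `add_quadratic_terms` yields 30 + 16 = 46 columns, matching the size of the full model.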
We used R² to quantify the variance explained by the predictors. We evaluated the predictive performance of the final parsimonious model using Harrell's concordance index (C-index) (31). In the context of this study, the C-index gives the probability that a randomly selected individual who had ≥1 very long sickness absence during follow-up has a higher risk score than an individual who was not on sickness absence during follow-up. For a binary outcome, the C-index equals the area under the receiver operating characteristic curve (AUC); it ranges from 0.5 (no predictive ability) to 1 (perfect predictive ability). A C-index below 0.7 indicates poor, 0.7-0.8 good, and >0.8 strong discriminative ability. Internal validity was assessed using bootstrapping, after which we validated the model externally in the HeSSup cohort. Furthermore, we calculated the sensitivity and specificity of the model at various cut-points in both cohorts. Finally, model calibration was assessed with calibration plots for both the FPS and HeSSup data. All analyses were performed using R 3.2.2 (packages: mice, glmnet, hdnom, Hmisc, and leaps).
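Harrell's C-index for censored data counts, over all usable pairs, how often the person who had the event earlier also received the higher risk score. A pair is usable ("comparable") when the subject with the shorter follow-up time had the event. The O(n²) sketch below is a simplified illustration: pairs with tied follow-up times are skipped and ties in risk count as one half, whereas library implementations such as the one in Hmisc may handle ties differently.

```python
def harrell_c(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had the event and
    time[i] < time[j]. It is concordant when risk[i] > risk[j];
    tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored subjects cannot be the 'earlier event' in a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


time, event = [1, 2, 3, 4], [1, 1, 1, 0]
print(harrell_c(time, event, [4, 3, 2, 1]))  # 1.0: perfect ranking
print(harrell_c(time, event, [1, 1, 1, 1]))  # 0.5: no discrimination
```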

Results
Descriptive characteristics for both cohorts, and bivariate associations between all items and both outcomes, are provided in supplementary file 2, www.sjweh.fi/show_abstract.php?abstract_id=3713. In the development cohort (FPS, N=65 775, mean age 43.7 years), 80% of the participants were women, corresponding to the gender distribution in the Finnish public sector. Of the participants, 43 247 (66%, mean follow-up time 4.9 years) had ≥1 sickness absence lasting >9 days, and 11 858 (18%, mean follow-up time 8.1 years) had ≥1 sickness absence lasting ≥90 days. In the validation cohort (HeSSup, N=13 527, mean age 39.5 years, 57% women), the gender distribution was more equal; 7499 people (55%, mean follow-up time 6.4 years) and 2045 people (15%, mean follow-up time 9.0 years) had sickness absence lasting >9 days and ≥90 days, respectively. Follow-up times for both outcomes and cohorts are illustrated in figure 1.

Development of prediction score
The full model with all candidate predictors, including quadratic terms for all non-dichotomous predictors, explained R²=15.2% of the variance in sickness absence lasting >9 days and R²=12.5% of the variance in sickness absence lasting ≥90 days. With LASSO, we reduced the number of predictors to 17 for sickness absence lasting >9 days: self-rated health, depression, BMI, sex, socioeconomic position, previous sickness absences, relational justice, procedural justice, number of chronic diseases, job strain, smoking, shift work, working night shift, and quadratic terms for self-rated health, age, Jenkins sleep scale, and GHQ. For sickness absence lasting ≥90 days, 14 predictors remained in the model: self-rated health, depression, sex, age, socioeconomic position, previous sickness absences, number of chronic diseases, smoking, shift work, working night shift, and quadratic terms for self-rated health, BMI, age, and Jenkins sleep scale (table 1). The models developed using more traditional statistical techniques were very similar (supplementary file 3, www.sjweh.fi/show_abstract.php?abstract_id=3713).
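Once the 14 terms and their Cox coefficients are fixed, an individual's absolute risk by a time horizon t follows from the standard Cox formula $1 - S_0(t)^{\exp(\mathrm{lp})}$, where lp is the linear predictor. The sketch below shows that arithmetic only. The coefficients, covariate means, and baseline survival used in it are made-up placeholders, not the values in table 1; quadratic terms would enter as separate precomputed keys (for example a hypothetical `age_sq`).

```python
import math


def predicted_risk(x: dict[str, float], coefs: dict[str, float],
                   means: dict[str, float], baseline_survival: float) -> float:
    """Absolute risk by time t from a Cox model: 1 - S0(t) ** exp(lp).

    lp is the centred linear predictor sum(beta_k * (x_k - mean_k)), so
    baseline_survival is S0(t) for a person with average covariate values.
    """
    lp = sum(b * (x[k] - means[k]) for k, b in coefs.items())
    return 1.0 - baseline_survival ** math.exp(lp)


# Placeholder values for illustration only (NOT the published coefficients).
coefs = {"self_rated_health": 0.3, "smoking": 0.2}
means = {"self_rated_health": 2.0, "smoking": 0.2}
person = {"self_rated_health": 4.0, "smoking": 1.0}
print(round(predicted_risk(person, coefs, means, baseline_survival=0.9), 3))
```

A person with exactly average covariates gets risk $1 - S_0(t)$; higher values of risk-increasing predictors raise the risk above that baseline.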

Validation of prediction score
In internal validation, the C-index of the final model was poor for >9-day sickness absence, 0.647 (95% confidence interval (CI): 0.644-0.650), and good for ≥90-day sickness absence, 0.735 (95% CI: 0.731-0.740). Thus, further analyses were conducted only for sickness absence lasting ≥90 days. Sensitivity analyses with follow-up time varying from 1 to 5 years replicated these results (table 2). As expected, for both outcomes, the C-indices were higher when follow-up time was short and decreased gradually with increasing follow-up time.

Figure 1. Distribution of follow-up times for >9-day sickness absence (left) and ≥90-day sickness absence (right).
External validation of the model for ≥90-day sickness absence was undertaken using the HeSSup data, although there were minor differences between the development and validation cohorts in the way the predictors were assessed (see supplementary file 1 for the exact wording of the questions in both cohorts). In spite of these differences, the C-index was close to that observed in the development cohort: 0.727 (95% CI: 0.716-0.738) for ≥90-day sickness absence. In figure 2, the performance of the model for ≥90-day sickness absence is illustrated using receiver operating characteristic (ROC) curves. For details on the sensitivity and specificity of the model, see supplementary file 4, www.sjweh.fi/show_abstract.php?abstract_id=3713.
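Sensitivity and specificity at a given cut-point, and the ROC curve traced by sweeping that cut-point, can be sketched as below. For simplicity this treats the outcome as binary at a fixed horizon and ignores censoring, which is an assumption; time-dependent ROC methods handle censored follow-up properly. It also assumes both outcome classes are present (otherwise the divisions fail). Function names are hypothetical.

```python
def sens_spec(risk, outcome, cutoff):
    """Sensitivity and specificity when 'risk >= cutoff' flags high risk.

    outcome: 1 if the event occurred within the horizon, 0 otherwise
    (complete follow-up to the horizon is assumed).
    """
    tp = sum(1 for r, y in zip(risk, outcome) if r >= cutoff and y == 1)
    fn = sum(1 for r, y in zip(risk, outcome) if r < cutoff and y == 1)
    tn = sum(1 for r, y in zip(risk, outcome) if r < cutoff and y == 0)
    fp = sum(1 for r, y in zip(risk, outcome) if r >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(risk, outcome):
    """(1 - specificity, sensitivity) for each distinct risk used as cut-off."""
    points = []
    for c in sorted(set(risk), reverse=True):
        se, sp = sens_spec(risk, outcome, c)
        points.append((1 - sp, se))
    return points


risk = [0.9, 0.8, 0.3, 0.2, 0.1, 0.6]
outcome = [1, 1, 1, 0, 0, 0]
print(sens_spec(risk, outcome, 0.5))  # both 2/3 at this cut-off
```

Choosing a cut-off is then a matter of trading sensitivity against specificity to suit the intended intervention.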
We used calibration plots to evaluate the performance of the prediction score at different levels of risk. Figure 3 shows the calibration plots of the prediction score in the two cohorts. The predicted risks corresponded very well with the observed risks in the development cohort. A close match was also observed in the validation cohort, although risks in the top two deciles were underestimated by 2.4% and 6.6%, respectively.
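A decile-based calibration plot compares, within each group of predicted risk, the mean predicted risk with the observed event proportion. The sketch below computes those pairs. It uses the raw event proportion per group, which assumes complete follow-up; with censored data, an observed risk from a Kaplan-Meier estimate within each group would be the more appropriate comparison. The function name is hypothetical.

```python
def calibration_by_group(predicted, observed, groups=10):
    """Return (mean predicted risk, observed event proportion) per risk group.

    Subjects are sorted by predicted risk and split into `groups` groups of
    (nearly) equal size; groups=10 gives deciles.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        obs_rate = sum(observed[i] for i in idx) / len(idx)
        rows.append((mean_pred, obs_rate))
    return rows
```

Plotting the observed proportion against the mean predicted risk for each group, points on the diagonal indicate good calibration; points above it indicate underestimation, as reported for the top two deciles in the validation cohort.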
A nomogram that visualizes the weight of each factor in the model is shown in supplementary file 5, www.sjweh.fi/show_abstract.php?abstract_id=3713.

Discussion
We developed and externally validated a multifactorial prediction model for sickness absence lasting ≥90 days. The model had good discriminative ability (C-statistic 0.73) and included the following factors: self-rated health, depression, sex, age, socioeconomic position, previous sickness absences, number of chronic diseases, smoking, shift work, working night shift, BMI, and Jenkins sleep scale. Calibration plots confirmed high correspondence between predicted and observed risk for this prediction tool. These results suggest that sickness absence lasting ≥90 days can be predicted with an accuracy comparable to that of risk scores used in the primary prevention of common chronic conditions, such as cardiovascular disease (C-index 0.76 with the Framingham score) (33,34) and type 2 diabetes (C-index 0.80) (35).
Many prediction models have already been developed for sickness absence. Compared with those models, our model was based on data covering a wide range of occupations at all occupational levels in both sexes, rather than just one industry (16,17) or occupation (10,12,36,37), and our study is the first to develop and validate a multifactorial predictive algorithm for ≥90-day sickness absence. An advantage over many previous models is that we were able to use an independent cohort for external validation. The predictors identified agree with previous studies suggesting that poor self-rated health is a robust predictor of sickness absence (38,39). Similarly, age (17), sex (40), smoking (41), obesity (42), previous sickness absences (16,38), presence of chronic diseases (39), and socioeconomic position (43) have been associated with sickness absence in previous studies. It is noteworthy that none of the work-related factors improved prediction after the inclusion of demographic and lifestyle variables.

Developing a parsimonious prediction model that performs well in discriminating those at greater risk of sickness absence is challenging. The few studies that used the total number of sickness absence days during the previous year (including absence spells of variable length) as a predictor were not successful in accurately predicting the total number of sickness absence days during follow-up (12,14,44). Models with good discriminative ability (C-statistics >0.70) typically relate to long-term sickness absence. For example, the Dutch "Balansmeter" (8), which predicts sickness absence lasting >28 days, has good discriminative performance, although it consists of as many as 34 items, compared with 14 in our model. Our model for sickness absence lasting ≥90 days performed well in both internal and external validation, whereas the model for sickness absence lasting >9 days had poor discrimination in internal validation.
It is likely that very long sickness absences are caused by more severe health problems with stable determinants, whereas shorter sickness absences may result from a wider range of causes, some of which are episodic and have no lasting impact on health, making such absences harder to predict.
The model we developed included relatively few predictors over which people have direct control. We believe that our model is more useful for risk prediction than as a guide for treatment. Unlike the developers of risk estimators for cardiovascular disease and diabetes, we did not set specific risk cut-off points to be used as a basis for clinical decision-making. Rather, the risk prediction score for ≥90-day sickness absence could be seen as a tool for selecting target groups for a range of preventive interventions, using sensitivity and specificity estimates appropriate to the features of each specific treatment. More research is needed to identify cost-effective measures to reduce the occurrence of work disability among people at different levels of risk.
The present study has some limitations. First, it remains unclear whether the present results are generalizable across different settings and countries or whether they are valid only in the Nordic welfare state model with universal social security and employee protection. Second, the development and validation cohorts did not use identical measures for 2 of the 14 predictors, namely socioeconomic position and sleep. This means that the highest academic degree attained and current occupational status, as well as the two alternative measures of sleep disturbance, might reflect slightly different underlying concepts. However, the model performed equally well in the two cohorts, suggesting that this is not a critical difference.
Our findings have implications for future research because the good discriminative ability for ≥90-day sickness absence justifies intervention studies on the benefits of using the prediction score in practice. These studies should determine whether applying predictive tools for long-term sickness absence offers benefit particularly for people in the "grey zone", that is, those at medium risk, because an accurate assessment of the risk of prolonged sickness absence could help healthcare personnel target them with timely interventions. Predictive tools may provide less benefit for high-risk people with known health problems who already participate in preventive interventions implemented by healthcare professionals.

Concluding remarks
Using data from two cohorts and almost 80 000 participants, we developed and validated a prediction score for sickness absence lasting ≥90 days. The score performed well in Finnish settings. As a quick-to-administer score, it could help healthcare professionals identify individuals at risk of prolonged sickness absence. Further research is needed to assess the potential benefits of using this score to target interventions, as indicated by reduced rates of long-term sickness absence, and to determine whether the score also identifies at-risk groups in other countries with different sickness absence policies.