Long-term work disability is bad for an individual’s health, and returning to work is generally associated with a positive effect on the future course of the disease and work ability (1–3). Individuals who are unable to work due to a disease or disorder can apply for a work disability benefit. In most European countries, this covers both financial support to compensate loss of income and interventions supporting return to work.
Possible predictors for work disability include a broad range of external and personal factors. When conducting medical disability assessments to evaluate whether a work disability benefit should be granted, insurance physicians (IP) predominantly rely on factors relating to the claimant’s disease or disorder (4, 5). One of the main tasks of an IP is to estimate the prognosis of work disability and to determine if and when a reassessment should be planned (6). Medical reassessments are conducted to determine whether an individual’s health has improved or deteriorated to such an extent that support for return to work needs to be adjusted or continuing eligibility for the benefit has changed. In The Netherlands, most claimants receive work disability benefits for a long period, and IP consider the prognosis of work disability to be the most difficult part of the work disability assessment (7, 8). Accurate prognosis of future changes in work disability is therefore important to identify those in need of return-to-work interventions and to plan medical reassessments efficiently.
Work ability, commonly measured with the Work Ability Index (WAI), is an important concept in the context of work disability duration. It is defined as the physical, mental, and social fit of an individual with the work demands and capability to participate in work (9). Self-assessed work ability is a strong predictor of work disability duration and return to work (10, 11). Clinical decision-support systems, in which characteristics of individual patients are used to generate patient-specific assessments or recommendations that are then presented to clinicians for consideration, are designed to aid decision-making (12). They can optimize the time with the client and improve the overall quality of services (13). A prediction model for future changes in work ability could aid IP in their medical disability assessment and lead to more precise estimation of future work disability. Because resources to perform medical reassessments are limited, the model is of most added value in practice if it can sufficiently identify claimants who will improve in their work ability. This ensures that medical reassessments are planned at the time an assessment interview with an IP has the most added value. However, claimants who perceive a relevant future improvement of their work ability form only a relatively small proportion of the total number of work disability claimants.
Predicting rare events or diseases with probabilistic statistical regression is difficult, as these methods tend to be biased towards the majority class and to underestimate the probability of rare events (14). Weighted regression can account for the preponderance of claimants not experiencing a substantial change in their work ability and can focus accuracy on claimants who are most likely to experience a change. Weighted least squares has its origins in econometrics and is used in a range of application areas, such as psychology, regional science and time series analysis (15, 16). However, we are not aware of any research in occupational epidemiology using weighted analysis. The aim of this study was therefore twofold: to (i) predict changes in the work ability of claimants one year after approval of the work disability benefit by building a model based on socio-demographic, work disability, health, functional limitation and personal factors; and (ii) explore whether the accuracy of predicting which claimants have the highest probability of experiencing a relevant change in work ability could be improved by using weighted regression.
Methods
Study population
We used data from the FORWARD study, a longitudinal cohort study among 2593 individuals who applied for a work disability benefit at the Dutch Social Security Institute (SSI) between July 2014 and March 2015, after a two-year period of sick leave. Individuals were aged 18–64 years at inclusion. Claimants suffering from severe mental, cognitive, or visual disorders or those diagnosed with cancer were excluded from the FORWARD study. A more extensive description of the study cohort can be found elsewhere (Weerdesteijn et al. Does self-perceived health correlate with physician-assessed functional limitations in a medical work disability assessment? Submitted for publication, 2019).
From the FORWARD study, we retrieved data from the baseline questionnaire, completed just before the medical disability assessment, and from the questionnaire at one-year follow-up. For each participant, we combined the self-reported data of the cohort study with administrative data from SSI databases. All participants of the FORWARD study signed informed consent. The Medical Ethics Committee of the VU University Medical Center (Amsterdam, The Netherlands) approved the FORWARD study.
Inclusion and exclusion criteria
To be included in the present study, participants needed to have answered the single-item question of the WAI both at baseline and at one-year follow-up. Of the 2593 individuals included in the FORWARD study, 42 and 646 participants were excluded because they did not answer this question at baseline and one-year follow-up, respectively. We also excluded participants who were ineligible or did not apply for work disability benefits (N=701) and those who were granted a permanent work disability benefit (N=260). For the latter group, there is no prospect of returning to work, and hence no reassessments need to be scheduled. In total, 944 participants were included in the present study.
Dependent variable
The dependent variable of the model was the change in self-reported work ability at one-year follow-up compared with baseline. Work ability was measured with the first question of the WAI, also referred to as the work ability score (WAS) (17). This question asks participants to compare their current work ability with their lifetime best on a 0–10 scale, with higher scores indicating better work ability. The WAS is significantly correlated with the WAI and can therefore be used as a simple indicator of work ability (18, 19). A single-item measure takes less time to complete and analyze and is therefore preferable in terms of costs, interpretation and missing data.
In line with previous studies, we defined an improvement or deterioration in WAS of ≥2 points as the smallest detectable self-reported change likely to affect job opportunities and the work disability benefit (20, 21). Based on the change in WAS at one-year follow-up compared with baseline, we divided the participants into three groups: no relevant change ($|\mathrm{WAS}_{T1} - \mathrm{WAS}_{T0}| \le 1$), an improvement ($\mathrm{WAS}_{T1} - \mathrm{WAS}_{T0} \ge 2$), or a deterioration ($\mathrm{WAS}_{T1} - \mathrm{WAS}_{T0} \le -2$), where $\mathrm{WAS}_{T0}$ and $\mathrm{WAS}_{T1}$ are the scores at baseline and one-year follow-up. $\mathrm{WAS}_{T0}$ was also included as an independent variable in the model.
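The three-group classification can be sketched in a few lines of Python. The function name `classify_was_change` is hypothetical. The sketch assumes integer WAS values on the 0–10 scale, so that the three conditions above are exhaustive.

```python
def classify_was_change(was_t0: int, was_t1: int) -> str:
    """Classify the change in work ability score (WAS) between baseline and follow-up.

    Assumes integer scores on the 0-10 scale; a change of at least 2 points in
    either direction counts as relevant, |change| <= 1 as no relevant change.
    """
    for score in (was_t0, was_t1):
        if not 0 <= score <= 10:
            raise ValueError("WAS must lie on the 0-10 scale")
    delta = was_t1 - was_t0
    if delta >= 2:
        return "improvement"
    if delta <= -2:
        return "deterioration"
    return "no change"


# Hypothetical claimants (baseline WAS, follow-up WAS); not study data.
examples = [(2, 5), (3, 4), (6, 3)]
print([classify_was_change(t0, t1) for t0, t1 in examples])
# ['improvement', 'no change', 'deterioration']
```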
Independent variables
All independent variables were measured at baseline. The socio-demographic characteristics age, gender, marital status, and educational level, as well as the work-related characteristics work status and occupational sector, were retrieved from the SSI database. In addition, a number of health characteristics were determined: primary diagnosis, comorbidity, permanency, treatment and medication, and functional limitations as registered by the IP during the medical disability assessment in the list of functional abilities (LFA). The LFA is partly based on the World Health Organization’s International Classification of Functioning, Disability and Health (ICF) (22). It consists of 106 items indicating the presence (dichotomous) and severity (ordinal) of limitations, categorized into six sections: personal functioning, social functioning, adjusting to the physical environment, dynamic movements, static posture, and working hours. Higher scores on the ordinal rating scales indicate more severe limitations in performing activities. We considered the average number of limitations in the first five sections and the single question of the last section, regarding restrictions in working hours per day, as independent variables. If a claimant is too seriously disabled to return to work, eg, bedridden or receiving inpatient care, limitations are not registered in the LFA. This was the case for 119 (13%) of the participants in our study sample.
Besides registration data from the SSI, a number of self-reported measures from the FORWARD study baseline questionnaire were used. The Short Form Health Survey (SF-36) is a measure of health status containing 36 items on physical and mental functioning and role limitations, well-being, pain, general health, and health change. Scores range from 1 to 60, with higher scores indicating better health status (23). The Whiteley Index (WI) contains 14 items to measure health anxiety. Scores range from 0 to 14, with higher scores indicating more severe health anxiety (24). The Hospital Anxiety and Depression Scale (HADS) produces separate scales for anxiety and depression. Scores on each scale range from 0 to 21, with higher scores indicating higher distress (25). The Work and Well-being Inventory (WBI) measures symptoms, coping, support, stress, and disability with 87 items. Scores range from 0 to 84, with higher scores indicating more barriers to return to work (26). We also retrieved household and work-related characteristics, the latter regarding work demands and managerial tasks. The questionnaire also asked respondents about their expectations with respect to recovery and getting back to work.
Statistical analysis
Multinomial logistic regression analysis was used to predict changes in work ability at one-year follow-up. We obtained both standard and non-parametric multinomial logit (MNL) estimates. See figure 1 for the specification of the non-parametric MNL estimates. Because we were most interested in accurately predicting the largest improvements in WAS, we used the following linear weight function for claimants who experienced an improvement in WAS (ie, $\mathrm{WAS}_{T1} - \mathrm{WAS}_{T0} \ge 2$):
$$w_i = \tfrac{1}{2}\left(\mathrm{WAS}_{T1} - \mathrm{WAS}_{T0}\right) + 1$$
For all claimants who did not experience an improvement in WAS (ie, $\mathrm{WAS}_{T1} - \mathrm{WAS}_{T0} < 2$), the weight was set to $w_i = 1$. With this linear weight function, claimants with an improvement in WAS of 2 points were assigned twice as much weight as claimants without an improvement. Because larger weights were assigned to claimants with larger improvements in WAS, the model focuses on accurately predicting these claimants. In application areas where weighted regression is used more often, weight functions are typically linear or exponential. For instance, in geographically weighted regression, locations that are closer receive higher weights, and in time series analysis, weights decrease for observations further back in time. Hence, the linear weight function of the present study is in line with weight specifications in other research (15, 27).
Because weighted regression procedures are not commonly used in occupational epidemiology, there is no general approach to specifying the exact weights that should be given to observations. We therefore tried several weight functions and examined their effect on the performance of the prediction model. Assigning a weight of 1 to claimants with an improvement in WAS would reproduce the standard MNL model; we therefore considered weights of 1.5, 2, 2.5, or 3. We did not consider weights >3, as we felt these would place disproportionate emphasis on claimants with an improvement in WAS. Although the differences between the weight functions were small, we chose a weight of 2 in the final model because it resulted in the highest sensitivity, ie, the model that identified the most claimants with an improvement in WAS. The positive predictive value (PPV) and negative predictive value (NPV) were similar across the weight functions considered.
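The sketch below shows one way the linear weight function can enter a multinomial logit likelihood. Each claimant's log-likelihood contribution is multiplied by $w_i$ before summation, so claimants with larger improvements pull the fitted coefficients more strongly. It is written in plain Python for transparency. The function names, the coefficient layout, and the choice of "no change" as reference outcome are illustrative assumptions, not the authors' specification. The paper's analyses were run in R, where, for example, `nnet::multinom` accepts case weights through its `weights` argument; which routine the authors used is not stated.

```python
import math


def claimant_weight(was_t0: int, was_t1: int) -> float:
    """Linear weight from the text: w = 0.5 * (WAS_T1 - WAS_T0) + 1 for an
    improvement of at least 2 points, and w = 1 for everyone else."""
    delta = was_t1 - was_t0
    return 0.5 * delta + 1 if delta >= 2 else 1.0


def weighted_mnl_nll(coefs, X, y, w):
    """Case-weighted negative log-likelihood of a multinomial logit model.

    coefs: dict mapping each non-reference outcome to [intercept, b1, b2, ...];
           the reference outcome (assumed here: "no change") has a linear
           predictor fixed at 0.
    X: covariate rows; y: observed outcome labels; w: case weights.
    """
    nll = 0.0
    for xi, yi, wi in zip(X, y, w):
        # Linear predictor for every non-reference outcome.
        eta = {k: b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi))
               for k, b in coefs.items()}
        # Softmax denominator includes exp(0) = 1 for the reference outcome.
        log_denom = math.log(1.0 + sum(math.exp(v) for v in eta.values()))
        # Labels absent from coefs are treated as the reference outcome.
        log_p = eta.get(yi, 0.0) - log_denom
        nll -= wi * log_p  # each claimant's contribution is scaled by its weight
    return nll


# Hypothetical claimants: (baseline WAS, follow-up WAS).
pairs = [(2, 6), (3, 3), (5, 2)]
print([claimant_weight(t0, t1) for t0, t1 in pairs])  # [3.0, 1.0, 1.0]
```

Minimizing this function over `coefs` with a general-purpose optimizer gives weighted MNL estimates. With all weights equal to 1, it reduces to the ordinary MNL likelihood, which mirrors the remark in the text that a weight of one reproduces the standard model.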
The models were built in three steps. First, we performed univariable analyses to test the association of each independent variable with the outcome variable using likelihood ratio (LR) tests; variables with P>0.2 were not considered further. Second, the remaining variables were tested for multicollinearity using variance inflation factors (VIF), with VIF <10 considered acceptable (28). Third, we selected the subset of predictors for the final model using a hybrid approach combining forward and backward selection procedures.
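The VIF screening step can be illustrated with a small, dependency-free Python sketch. For each predictor $j$, it regresses that predictor on all other predictors, computes $R_j^2$, and sets $\mathrm{VIF}_j = 1/(1 - R_j^2)$. Values of 10 or more would be flagged under the threshold used here. The function names are hypothetical. The normal-equations solver is a simple teaching device rather than a numerically robust implementation, and the LR-test screening and hybrid selection steps are not shown.

```python
def _solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def r_squared(y, Z):
    """R^2 of an ordinary least-squares regression of y on the columns of Z plus an intercept."""
    rows = [[1.0] + list(z) for z in Z]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(X):
    """Variance inflation factor for each column of X (a list of rows).

    Perfect collinearity (R^2 = 1) would make the VIF infinite; here it
    raises ZeroDivisionError.
    """
    out = []
    for j in range(len(X[0])):
        y = [row[j] for row in X]
        Z = [[v for i, v in enumerate(row) if i != j] for row in X]
        out.append(1.0 / (1.0 - r_squared(y, Z)))
    return out


# Hypothetical predictor matrix: the second column is almost exactly twice the first.
X = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.9], [5, 10.1]]
print([round(v) for v in vif(X)])  # [557, 557]
```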
Before the analysis, we randomly split the data into a training set (80% of the study population) to fit the models and a test set (20% of the study population) to evaluate them. Because the prediction model is intended for use in practice, we need to know how well it predicts new cases. The test set, ie, the held-out sample, was therefore used to obtain an unbiased estimate of model performance.
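A reproducible 80/20 split can be sketched with Python's standard `random` module. The function name and the seed value are hypothetical. Rounding 20% of the 944 included claimants gives 189 test cases, whereas the study's test set contained 187. The authors' exact splitting mechanism (for example, an independent random draw per claimant) therefore presumably differed.

```python
import random


def train_test_split(ids, test_fraction=0.2, seed=2019):
    """Randomly split claimant IDs into a training set and a held-out test set.

    The seed is an arbitrary illustrative value; fixing it makes the split
    reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    return train, test


train, test = train_test_split(range(944))
print(len(train), len(test))  # 755 189
```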
We calculated several performance measures to compare the standard and weighted MNL models. We report both sensitivity and specificity, as these are important measures of the diagnostic accuracy of a model. However, because they are conditional on the true outcome, which is unknown at the time of the assessment, they are of no practical use when IP need to estimate the probability of an improvement in WAS for individual claimants (29). Predictive values, which are conditional on the model’s prediction, are therefore more meaningful performance measures in this context. In general, there is a trade-off between sensitivity and predictive values. The weighted model therefore shows added value if it yields predictions with both higher sensitivity and higher predictive values.
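The four measures follow directly from the 2×2 table of predicted versus observed improvement. The Python sketch below computes them from hypothetical counts, not the study's Table 3, and makes the distinction in the text concrete. Sensitivity and specificity divide by the observed outcome. PPV and NPV divide by the predicted outcome, which is what an IP has in hand when planning a reassessment. The function name is hypothetical.

```python
from fractions import Fraction


def diagnostic_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV for the 'improvement' class."""
    return {
        # Conditional on the observed outcome:
        "sensitivity": Fraction(tp, tp + fn),
        "specificity": Fraction(tn, tn + fp),
        # Conditional on the model's prediction:
        "ppv": Fraction(tp, tp + fp),
        "npv": Fraction(tn, tn + fn),
    }


# Hypothetical counts for illustration only.
m = diagnostic_measures(tp=20, fp=10, fn=30, tn=140)
print({k: round(float(v), 3) for k, v in m.items()})
# {'sensitivity': 0.4, 'specificity': 0.933, 'ppv': 0.667, 'npv': 0.824}
```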
We used McNemar’s test to assess statistically whether the standard and the weighted model had a similar proportion of errors on the test set. The test statistic is calculated from the 2×2 contingency table of the paired predictions of the two models. The test assesses whether the models have equal accuracy for predicting true improvements in WAS, ie, whether the difference between the misclassification rates of the models is statistically significant. The level of significance was set at P<0.05.
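McNemar's test can be sketched in Python as follows. The sketch assumes the continuity-corrected form $\chi^2 = (|b - c| - 1)^2/(b + c)$ that R's `mcnemar.test` applies by default, where $b$ and $c$ are the two discordant cells of the paired 2×2 table. Whether the authors used the corrected or the uncorrected statistic is not stated. The P-value uses the identity that the upper tail of a $\chi^2$ distribution with one degree of freedom equals $\mathrm{erfc}(\sqrt{\chi^2/2})$. The function name and the discordant counts in the usage line are hypothetical.

```python
import math


def mcnemar(b: int, c: int, continuity: bool = True):
    """McNemar's test for two classifiers evaluated on the same cases.

    b: cases where only model A gives the prediction of interest
    c: cases where only model B gives the prediction of interest
    Returns (chi-square statistic with 1 df, P-value).
    """
    n_discordant = b + c
    if n_discordant == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if continuity and diff > 0:
        diff -= 1  # continuity correction
    stat = diff ** 2 / n_discordant
    # Upper tail of chi-square with 1 df: P(X >= s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = mcnemar(13, 2)  # hypothetical discordant counts (15 pairs in total)
print(round(stat, 3), round(p, 4))  # 6.667 0.0098
```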
All analyses were performed in RStudio for Windows, version 0.99.902.
Results
Tables 1 and 2 show the baseline characteristics of the study population. Mean WAS was 2.5 [standard deviation (SD) 2.1] at baseline and 2.8 (SD 2.2) at one-year follow-up. The majority of the study population (N=599; 63%) did not experience a relevant change in WAS at one-year follow-up; 208 claimants (22%) experienced an improvement [mean WAS improvement 3.1 (SD 1.5)] and 137 (15%) a deterioration.
Table 1
Table 2
In this section, we mainly focus on the results for the 187 claimants who were randomly selected for the test set. In this group, the percentage experiencing a WAS improvement at one-year follow-up was slightly higher than in the training set (24% versus 21%). Of all claimants in the test set, the standard model predicted an improvement in WAS at one-year follow-up for 16 (9%) (table 3). The sensitivity was only 22%, indicating that it was difficult to identify relevant claimants with standard regression procedures. The PPV was 62% and the NPV 79%. Eight variables ended up in the standard model: WAS at baseline, work status, WBI disability, wage loss, SF36 energy, SF36 physical functioning, WBI symptoms, and WI.
Table 3
The weighted model predicted a larger number of improvements than the standard model (table 3). The number of predicted cases increased from 16 to 27, ie, from 9% to 14% of the total number of claimants, which is closer to the percentage of actually observed improvements in the study population (22%). The PPV and NPV were 63% and 82%, respectively. The weighted model contained 11 variables: the same variables as the standard model except WI, plus LFA static posture, LFA working hours, mental healthcare, and SF36 health change. All VIF scores for the multivariable models were <10, indicating no problematic multicollinearity. The last two columns of table 1 show the coefficients of the multivariable logit models.
The sensitivity, ie, the model’s ability to correctly detect claimants with an improvement in the WAS, increased from 22% to 37% when we compared the weighted to the standard model (table 4). Both the PPV and NPV of the weighted model were slightly higher as well; the PPV increased from 62% to 63%, and the NPV increased from 79% to 82%. This means that the predictions of the weighted model were correct more often than the predictions of the standard model, although the differences were small.
McNemar’s χ2 was 6.667, with a corresponding P-value of 0.0098, indicating that the two models differed in their proportion of errors on the test set. The contingency table showed that the weighted model correctly classified more claimants than the standard model. In total, 15 claimants were classified differently by the weighted and the standard model, which is sufficiently large to provide accurate P-values for McNemar’s test (minimum 10) (30).
The finding that the weighted model was better at predicting which claimants will experience an improvement in WAS at one-year follow-up in the test set was in line with the results for the training set. In the test set, the percentage of claimants identified increased from 9% to 14%.
Discussion
The aims of this study were to (i) predict changes in work ability one year after approval of the work disability benefit and (ii) explore whether weighted regression procedures could improve the accuracy of identifying claimants with the highest probability of experiencing an improvement in WAS. Only a minority (22%) of the claimants in our study population experienced an improvement in WAS. Our standard model predicted a relevant improvement in WAS for only 9% of the claimants, whereas the weighted model predicted this for 14%. Nevertheless, the PPV of the weighted model did not decrease compared with the standard model, and the NPV increased slightly. Hence, the weighted model identified more claimants who will experience a relevant improvement in WAS at one-year follow-up, while IP can be slightly more confident that the model predicts the correct outcome.
We used a weighted regression model with a linear weight function that assigns larger weights to claimants with larger improvements in WAS. Our finding that the weighted model correctly identified a larger group of individuals with an improvement in WAS in both the training and test sets suggests that our weight function could also be of added value in a population that was not used to build the models. However, as the set of possible weight functions is unlimited, other weight functions may provide similar or better results than the one we chose.
The majority of individuals in the study population (63%) did not experience a change in WAS at one-year follow-up. This is in agreement with previous research showing that changes in WAI are small for most individuals, especially for those with longer episodes of sickness absence (18, 31). Determinants of work ability have been reported in several studies. In the present study, work ability at baseline was the strongest predictor in both models. This is in line with previous research showing that, for sick-listed workers diagnosed with cancer, WAS at baseline was an important predictor of WAS at one-year follow-up (32). The present study also showed an association with wage loss: individuals with a lower level of wage loss were more likely to experience an improvement in WAS. A higher level of wage loss reflects more extensive functional limitations, which seem to have a negative effect on work ability at one-year follow-up. This relation was also found between degree of sickness absence and change in WAS at 6- and 12-month follow-up for women on sick leave for ≥60 days (18). Several studies have also found a relation between the WAI and mental and physical conditions, demands at work, individual characteristics and lifestyle (33, 34). However, these studies did not report measures of diagnostic accuracy (eg, sensitivity and predictive values) of the estimated models.
As pointed out in a recent editorial on prediction models for sickness absence, researchers should be careful when making claims about the accuracy of such models (35). Although the difference between the standard and weighted models in predicting claimants with an improvement in WAS was statistically significant, it was small, and it is therefore questionable whether this difference is relevant. However, given the limited capacity of the SSI to perform IP reassessments under current policies, and given that only a minority (22%) of individuals actually experienced an improvement in WAS at one-year follow-up, the prediction model may be a relevant tool for identifying the group of claimants with the highest probability of experiencing an improvement in WAS. Our focus was not on predictions at the individual level but at the population level. Hence, the small differences between the standard and the weighted model are regarded as useful for achieving a more effective allocation of limited occupational health care resources. The weighted model, which identifies 14% of the claimants (as opposed to 9% with the standard model) with a PPV of 63%, is considered a useful auxiliary tool for IP when planning reassessments. Likewise, when the model predicts no substantial improvement in WAS at one-year follow-up (as it does for 86% of the claimants), this could indicate that scheduling a reassessment at one-year follow-up has less added value for this group, given an NPV of 82%. If reassessments were planned at random, the expected PPV would equal the proportion of claimants who improve (22%) and the expected NPV would be 78%. The gain in PPV is therefore substantial, whereas the gain in NPV is modest. However, it could be argued that, in other applications, the differences between the two models shown in the present study are too small to be of practical relevance.
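The comparison with random planning can be made explicit. If claimants were selected for reassessment at random, the expected PPV would equal the proportion who improve, and the expected NPV would equal its complement. The short Python sketch below computes these baselines and the ratio of each model-based value to its random counterpart. The function name is hypothetical. The inputs are the weighted model's PPV and NPV and the 22% study-population improvement rate used in this section; the test-set rate (24%) would give slightly different baselines.

```python
def compare_with_random(ppv: float, npv: float, prevalence: float) -> dict:
    """Compare predictive values with those expected under random selection.

    Under random selection, a claimant flagged for reassessment improves with
    probability equal to the prevalence of improvement, so the random PPV is
    the prevalence and the random NPV is 1 - prevalence.
    """
    random_ppv = prevalence
    random_npv = 1.0 - prevalence
    return {
        "random_ppv": random_ppv,
        "random_npv": random_npv,
        "ppv_ratio": ppv / random_ppv,
        "npv_ratio": npv / random_npv,
    }


# PPV and NPV of the weighted model; improvement rate in the study population.
result = compare_with_random(ppv=0.63, npv=0.82, prevalence=0.22)
print({k: round(v, 2) for k, v in result.items()})
# {'random_ppv': 0.22, 'random_npv': 0.78, 'ppv_ratio': 2.86, 'npv_ratio': 1.05}
```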
We are not aware of any prediction model for future changes in work ability for individuals with a work disability benefit. Previous studies on long-term sickness absence in the general working population have shown that it is difficult to develop prediction models with high prediction accuracy that are relevant in practice. Studies identifying claimants at risk of work disability and long-term sickness absence showed only moderate prediction accuracy (36, 37). Studies on prediction models for individuals with specific chronic conditions such as low-back pain or common mental disorders validated their models in terms of PPV and NPV (38–40). Similar to the results of the present study, the NPVs of their models were in the range of 74–98%, which is considered high. However, they reported PPVs of 33–57%, which is lower than the PPV of our model (63%).
Strengths and limitations
A strength of the present study is that, by fitting weighted MNL models, we were better able to meet practical needs. Non-parametric models offer important advantages because they can focus accuracy on claimants who are most likely to experience a change in their entitlement to the work disability benefit. Moreover, by dividing the study sample into a training set to build our prediction models and a test set to validate them, we were able to assess the predictive accuracy and generalizability of the models.
A further strength is that we combined self-reported questionnaire data with administrative data. This enriches the understanding of a broad range of medical, social, psychological, and work-related factors that can influence future work ability.
Moreover, whereas most studies on predictors of work disability duration and return to work focus on a specific category of diagnoses, our study cohort included a broad range of diseases and disorders. A limitation of our study is that two groups of individuals were excluded from the FORWARD cohort and could therefore not be included in our study either: individuals suffering from severe mental, cognitive, or visual disorders (eg, dementia or psychosis), owing to their reduced ability to complete the questionnaires correctly, and individuals diagnosed with cancer.
A further limitation is that the FORWARD cohort questionnaires were not designed to identify the best independent variables for predicting changes in work ability. For instance, claimants’ own expectations about future changes in work ability were not covered in the questionnaire, although an individual’s own expectations are important predictors of the duration of long-term sick leave and of return to work (41, 42). Moreover, the administrative data that we used were not collected for research purposes but were registered by SSI employees for administrative purposes. However, the FORWARD cohort questionnaires are extensive and, by combining them with administrative data, we were able to cover a broad range of potential predictors. A final limitation of this study is our reliance on changes in self-reported work ability. In line with previous studies, we defined an improvement or deterioration in WAS of ≥2 points as a relevant change (10, 11, 20). However, whether this threshold also applies to our study population remains to be investigated.
Implications for research and practice
Commonly reported outcomes in epidemiological and medical research, such as the incidence of clinical events among a cohort of patients or the response rate in patients taking a certain treatment regimen, are often rare events and are therefore difficult to predict. Disease predictions can contribute to a wide range of applications, such as risk management, tailored health communication, and decision-support systems (43, 44). Weighted analysis could support these applications by improving the accuracy of predictions of rare events and diseases.
Identifying claimants with a high probability of experiencing an improvement in work ability at one-year follow-up may assist IP during the medical disability assessment when they need to predict future work ability. This can support accurate prognosis of work ability and the provision of suitable return-to-work interventions.
To be used in practice, the prediction model needs to be supported by a suitable tool that is easy for professionals to access and interpret. Future research should focus on the preferred design and content of such a decision-support tool. Next, a cost-effectiveness analysis and process evaluation should be performed to determine the added value of the model for IP in making accurate prognoses of work ability.
Concluding remarks
This study provides indications that, compared with standard MNL models, weighted regression procedures can correctly identify more claimants who experience an improvement in WAS. Our findings suggest that weighted analysis could be an effective method in epidemiology for predicting rare events or diseases. More research is needed to examine the added value of weighted regression procedures in occupational epidemiology.