Long-term sick leave is a risk factor for exclusion from the labor market (1), and return to work (RTW) is associated with better health (2). Stress-related disorders account for a large part of sick leave (3). Consequently, much research has focused on developing and studying interventions to improve vocational outcomes after sick leave. A 2018 review showed that effective interventions usually consist of several components (4), and studies have hypothesized that these components should be integrated (5). Therefore, we conducted a trial (6, 7) to test the Integrated Healthcare and Vocational Rehabilitation (IBBIS) intervention (hereafter referred to as INT). INT aimed to improve both vocational and health outcomes, and we compared it to service as usual (SAU) as well as a mental healthcare intervention (MHC). The primary outcome was time to stable RTW at 12-month follow-up. All results from the 6- and 12-month follow-ups have been published (7). Contrary to our initial hypothesis, SAU showed considerably faster RTW compared to INT [hazard ratio (HR) 1.43, P=0.002]. INT showed benefits on some exploratory outcomes but not on secondary outcomes. Compared to MHC, SAU similarly showed faster RTW (HR 1.35, P=0.008) but less favorable self-reported levels of symptoms and functioning. Thus, against expectations, SAU did not clearly imply insufficient healthcare treatment, although that assumption had been the main reason for comparing INT not only to SAU but also to MHC. In conclusion, INT was found to be inferior overall, while MHC was only partially inferior because of the observed health benefits. This paper reports the 24-month follow-up, since we hypothesized that INT would also yield long-term sustainable RTW that persisted beyond the intervention period.
Methods
Methods are reported elsewhere (6, 7) (supplementary material, www.sjweh.fi/article/4084, supplement 1). Eligible participants were adult sickness absentees (≥4 weeks) with a stress-related disorder. One outcome, recurrence of sick leave, was added at 24 months only. In INT, participants received best practice mental healthcare (BP-MH) and IBBIS vocational rehabilitation (IBBIS-VR). BP-MH was a stepped-care intervention, with treatment intensity determined by baseline symptom level. IBBIS-VR was based on the principles of the SHARP-at work intervention (8) and individual placement and support (9). The INT components were integrated through a range of activities using relational coordination (10). In SAU, participants received any primary sector healthcare and municipal VR they would have received had they not been randomized. In MHC, participants received BP-MH but VR in municipal facilities, as in SAU. In INT and MHC, the practitioners’ adherence to manuals was examined through fidelity reviews. Data were obtained through self-report and registers. Outcomes at 24 months were time-to-RTW (a secondary outcome), weeks in work, proportion in work and a range of self-reported outcome measures, including symptoms, all of which are described in supplement 1. Post-hoc, for the 24-month follow-up only, we decided to count the crude number of recurrent sick leaves but without using statistical parametrization.
Statistical analyses
Proportion-in-work outcomes were analyzed using logistic regression and time-to-RTW outcomes using Cox regression. Self-reported outcomes were analyzed using linear mixed-effects models. Throughout, we adhered to the intention-to-treat principle. Due to multiple testing, P-values <0.017 were considered statistically significant and those <0.05 borderline (the latter was a post-hoc decision). Subgroup analyses were performed according to selected baseline values and time. Sensitivity analyses were conducted by imputing all missing data in best- and worst-case scenarios and, for the stable RTW outcomes, by varying the minimum duration that constituted “stable” RTW (1, 4, 8 or 12 weeks).
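To illustrate how the minimum-duration threshold can change the time-to-stable-RTW outcome, the following is a minimal sketch and not the trial's analysis code. It assumes that work status is available as one true/false value per week and that the event time is the first week of the first sufficiently long run of working weeks; the trial's actual operational definition is given in supplement 1. The function name `weeks_to_stable_rtw` and the example history are hypothetical.

```python
from typing import Optional, Sequence


def weeks_to_stable_rtw(in_work: Sequence[bool], min_weeks: int) -> Optional[int]:
    """Return the 1-based week in which the first run of at least
    `min_weeks` consecutive working weeks begins, or None if there is
    no such run (the participant would then be censored)."""
    run = 0  # length of the current run of consecutive working weeks
    for week, working in enumerate(in_work, start=1):
        run = run + 1 if working else 0
        if run == min_weeks:
            return week - min_weeks + 1
    return None


# Hypothetical participant: 5 weeks off, 2 weeks in work, 3 weeks off,
# then 12 consecutive weeks in work.
history = [False] * 5 + [True] * 2 + [False] * 3 + [True] * 12

for threshold in (1, 4, 8, 12):
    print(threshold, weeks_to_stable_rtw(history, threshold))
```

In this made-up history, the short two-week work episode counts as RTW only under the 1-week threshold, whereas the stricter thresholds date RTW to the later, longer work period. This is the kind of sensitivity to the definition of “stable” that the analyses with different thresholds are meant to probe.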
Results
Analyses included 636 participants in total. Figure 1 depicts participant flow, while table 1 shows baseline characteristics. SAU showed faster RTW rates than both MHC, with an HR of 1.30 (P=0.013), and INT, with an HR of 1.39 (P=0.003). We did not observe a similar difference between INT and MHC. SAU showed a significantly higher number of weeks in work than both MHC, with a rate ratio (RR) of 1.21 (P=0.003), and INT, with an RR of 1.16 (P=0.016), but no differences were detected between MHC and INT. Proportions in work were similar across the groups at 24-month follow-up. Figure 2 displays the Kaplan-Meier curves for the three groups, and tables 2a and 2b show the results of all other vocational outcomes. Two years after baseline, the level of anxiety was slightly higher in the SAU group compared to the MHC group, with a borderline statistically significant mean difference of 1.58 (P=0.04). No other self-reported differences were seen at 24-month follow-up. Recurring sick leave and all self-reported outcomes are presented in supplement 2. Sensitivity analyses of all outcomes are shown in supplement 3 and subgroup analyses for all outcomes in supplement 4. We detected no statistically significant interactions or subgroup deviations. Fidelity reviews showed that BP-MH was implemented with high fidelity but IBBIS-VR with only fair fidelity; see supplement 5.
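As a hedged illustration of how the reported P-values map onto the significance thresholds stated under Statistical analyses, the sketch below classifies each of them. It assumes that the 0.017 threshold corresponds to a Bonferroni-type correction of 0.05 across the three pairwise group comparisons; the text only states that the threshold is "due to multiple testing", so this reading is an assumption. The labels and the function name `classify` are hypothetical.

```python
ALPHA = 0.05
N_COMPARISONS = 3  # assumption: SAU vs INT, SAU vs MHC, INT vs MHC
THRESHOLD = ALPHA / N_COMPARISONS  # about 0.0167


def classify(p: float) -> str:
    """Label a P-value using the two cut-offs described in the text."""
    if p < THRESHOLD:
        return "significant"
    if p < ALPHA:
        return "borderline"
    return "not significant"


# P-values as reported in the Results text for the 24-month follow-up.
reported = {
    "time to RTW, SAU vs MHC": 0.013,
    "time to RTW, SAU vs INT": 0.003,
    "weeks in work, SAU vs MHC": 0.003,
    "weeks in work, SAU vs INT": 0.016,
    "anxiety, SAU vs MHC": 0.04,
}

for label, p in reported.items():
    print(f"{label}: P={p} -> {classify(p)}")
```

Under these assumptions, the four vocational comparisons fall below the corrected threshold, while the anxiety difference falls only in the borderline range.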
Table 1
a % missing=0.3 b % missing=2.2
Table 2a
Table 2b
a Hazard ratio. b Odds ratio. c Rate ratio. * P<0.0167.
Discussion
SAU yielded consistently better vocational outcomes across multiple measures at all follow-ups compared to both INT and MHC. Regarding symptoms, the slightly lower level of anxiety (borderline statistical significance) in the MHC group compared to SAU at 24-month follow-up might be a chance finding, since no other differences were seen across many symptom scales. Both MHC and INT failed to consistently improve the RTW process as hypothesized. Possible explanations for this failure include implementation failure, theory (programme) failure, or methodological failure (11).
Indications of implementation failure (the consequence of suboptimal implementation of an otherwise truly beneficial intervention): Through fidelity reviews, we observed a range of implementation issues. IBBIS-VR and the activities integrating the two intervention components were implemented with lower fidelity, whereas BP-MH was implemented with high fidelity in both the INT and MHC groups. As IBBIS-VR and the integration activities were delivered only in INT, we can conclude that only the interventions in the MHC group were implemented with high fidelity. True integration of the intervention components in INT was challenged by lack of trust and diverging norms and goals between the different sectors, as described elsewhere (12, 13). In other studies, VR services similar to IBBIS-VR have shown positive effects. Overall, these observations suggest that the negative outcome of INT may be due to implementation failure.
Indications of theory failure (when an intervention is not beneficial for the participants, even if it is delivered exactly according to protocol): Theory failure is indicated because the MHC interventions were sufficiently implemented yet did not improve RTW. Since BP-MH was delivered in INT as well, part of the failure of INT may also be due to theory failure.
Indications of methodological failure (when an otherwise truly beneficial intervention fails to show positive outcome due to invalid outcome measures): While time-to-RTW is the most common outcome measure in this research field, there is no consensus regarding specific definitions and measures of RTW (14). Strictly defined, the measure implies that the optimal RTW time is zero. We chose RTW as the primary outcome measure on the basis of scientific precedent, but we argue that this outcome may not be the most appropriate, since RTW that occurs too quickly could imply premature exposure to workplace risk factors for relapse.
Limitations
Implementation issues severely limit both the internal and external validity of the study. Furthermore, due to the nature of the interventions, participants could not be blinded, and the study may be limited by some effects of the allocation procedure in itself.
Implications
Complex interventions involving co-location and integration of multidisciplinary teams are generally difficult to implement (15), as demonstrated by our study and other studies. Even if the interventions are truly beneficial, and the failure to show positive effects in this study rests solely on poor implementation, integration of services requires considerable management and administration. Therefore, any stakeholder intending to either trial or practice similar interventions should be duly cautioned and advised to pilot interventions more elaborately before large-scale trialing. Furthermore, any effect of integration activities in our study is probably moderated by national service legislation and properties of the organization that the study intervention is embedded within (16). Thus, international generalization of these effects may not be feasible. Regarding choice of outcomes, further research and discussions between relevant stakeholders should take place to elucidate which outcome set best captures the quality of an individual’s RTW process.
Concluding remarks
This trial compared INT and MHC to SAU. Contrary to our initial hypothesis, both INT and MHC consistently showed significantly lower RTW rates across all follow-ups, compared with SAU. MHC yielded some short-term health benefits, but they were not sustained beyond six months. However, as we cannot rule out implementation failure in INT, and because BP-MH may not have constituted an equally or more qualified mental health service than SAU, we cannot draw firm conclusions from the results.
Ethics
The trial was registered at www.clinicaltrials.gov (#NCT02885519) and evaluated and approved by the Regional Ethics Committees of the Capital Region (#H-16015724) and the Danish Data Protection Agency (#RHP-2016-006). It was conducted in accordance with Danish and European regulations. An IBBIS team member informed every participant about the objective of the study and the implications of participation, and all participants gave oral and written consent before enrolment.