Health economic evaluations of interventions to increase physical activity and decrease sedentary behavior at the workplace: a systematic review

This systematic review identified 18 economic evaluations of worksite physical activity and sedentary behavior interventions. Effects were small and the impact on costs was uncertain. Therefore, the economic evidence of these interventions remains unclear. Future studies are needed to determine which strategies work best. Economic evaluations of such interventions should be established using sound methodology and model the long-term cost-effectiveness.
Objective The workplace is an ideal setting to implement public health strategies, but economic justification for such interventions is needed. Therefore, we performed a critical appraisal and synthesis of health economic evaluations (HEE) of workplace interventions aiming to increase physical activity (PA) and/or decrease sedentary behavior (SB).
Methods A comprehensive search filter was developed using appropriate guidelines, such as the Peer Review of Electronic Search Strategies (PRESS) checklist, and published search algorithms. Six databases and hand searches were used to identify eligible studies. Full HEE of workplace interventions targeting PA/SB were included. Methodological quality was assessed using the Consensus Health Economic Criteria (CHEC) list. Two researchers independently performed all procedures. Hedges' g was calculated to compare intervention effects. Outcomes from HEE were recalculated in 2017 euros and benefit-standardized.
Results Eighteen HEE were identified that fulfilled on average 68% of the CHEC list criteria. Most studies showed improvements in PA/SB, but effects were small and thus, their relevance is questionable. Interventions were heterogeneous, and no particular intervention type was found to be more effective. HEE were heterogeneous regarding methodological approaches and the selection of cost categories was inconsistent. Indirect costs were the main cost driver. In all studies, effects on costs were subject to substantial uncertainty.
Conclusions Due to small effects and uncertain impact on costs, the economic evidence of worksite PA/SB-interventions remains unclear. Future studies are needed to determine effective strategies. The HEE of such interventions should be developed using guidelines and validated measures for productivity costs. Additionally, studies should model the long-term costs and effects because of the long pay-back time of PA/SB interventions.

The positive health effects of physical activity (PA) are undisputed. PA is well known to improve muscular and cardiorespiratory fitness and therefore decreases the risk for many non-communicable diseases such as hypertension, stroke, diabetes, coronary heart disease and various cancers (1). The World Health Organization (WHO) recommends ≥150 minutes of moderate-intensity aerobic PA, 75 minutes of vigorous-intensity aerobic PA or an equivalent combination of both, per week (2). About 31.1% of adults worldwide do not meet these criteria and are thus physically inactive (3). At the same time, sedentary behavior (SB) in today's society is increasing (4,5). SB is defined as "any waking behavior characterized by an energy expenditure ≤1.5 metabolic equivalent of task (MET) while in a sitting, reclining or lying posture" (6). Physical inactivity (PIA) and SB are not synonymous. For example, one can meet the recommendations for PA (and thus be sufficiently physically active) while being too sedentary. Furthermore, causes for SB and PIA as well as biological mechanisms affecting health may be different for SB and PIA (7). However, there is evidence of an interaction between PIA and SB in relation to health. A large-scale meta-analysis showed that PA can attenuate or even eliminate the detrimental influence of SB on health (8). Therefore, increasing PA or reducing SB are both beneficial for health and interventions should focus on both. The consequences of PIA and SB are substantial. Insufficient PA is a major cause of ≥35 chronic diseases (9) and represents the fourth leading risk factor for mortality (10). PIA is responsible for 13.4 million disability-adjusted life years (DALY) and >5 million deaths every year (11,12). Likewise, excessive SB is clearly correlated with major chronic diseases and all-cause mortality (13,14).
PIA and SB result in an important economic burden to societies. Worldwide, in 2013, the economic burden related to PIA was estimated at INT$53.8 billion (direct medical costs) and INT$13.7 billion (indirect costs due to productivity loss) (11). As epidemiologist Jerry Morris pointed out as early as 1994, PA to treat PIA is a "best buy" intervention (15). "Best buy" interventions are highly cost-effective and have substantial public health impact. Decreasing the prevalence of PIA will thus not only positively impact health but also have a high probability of counteracting the rising health care costs. As an example, a Canadian health impact analysis showed that a 10% reduction in the prevalence of PIA would save the society CAN$150 million each year (16).
The reduction of PIA by 10% by 2025 is one of the WHO's nine global non-communicable-disease targets (1). However, societal trends like urbanization, motorized transportation, electronic entertainment and internet-based communication devices, may hamper the attempts to decrease prevalence of PIA and SB. Global and national policy developments as well as intervention strategies to increase PA among populations at risk, thus far, have not worked satisfactorily (17). The 2016 Lancet series on PA pointed out that the WHO target will not be reached without an immediate increase in action (18).
A promising way to tackle PIA and SB through activities of daily living is to offer interventions at the workplace (19-21). Adults spend most of their waking time at work and many occupations are typically related to SB (22). Furthermore, productivity of employees is known to be positively influenced by higher activity levels (23-26). Thus, employers may also benefit from reduced PIA and SB through decreased absenteeism and presenteeism.
While there is evidence to support effectiveness of interventions at the workplace to counteract PIA and SB (19,27-29), consequences on costs and health effects (ie, the "efficiency") should also be considered. Making an economic case for reducing PIA and SB at the workplace may sensitize employers, the public health sector, as well as political decision-makers, to support, develop, fund and implement such interventions at the workplace (8,18,30).
To our knowledge, only one review on the cost-effectiveness (31) and one review on the cost-benefit (32) of PA and nutrition interventions at the workplace have been performed. Regarding cost-benefit analyses, the results were ambivalent as the authors found a positive return-on-investment (ROI) in non-randomized trials but a negative ROI among randomized trials. No conclusion could be made in terms of cost-effectiveness of workplace PA interventions because the methodological quality of the included studies was low and the results uncertain. Compared to these two reviews by van Dongen et al (31,32), there are three novel parts in the current study. First, PA and SB seem to have an interactive relationship with health (8). Therefore, the current study focused on both PA and SB interventions. Second, some of the interventions reviewed by van Dongen et al did not directly measure the impact on PA. To better understand the impact of the interventions, only studies which reported effects on PA/SB were included in the current review. Third, since the van Dongen et al reviews were published in 2011 and 2012, it is very likely that more recent evidence exists. As no review on the present research question is available, the goal of the present study is to perform a critical appraisal and synthesis of health economic evaluations of interventions aiming to increase PA and/or decrease SB at the workplace.

Methods
This systematic review was carried out following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA) (33) and the five-step approach for systematic reviews of economic evaluations (34). The protocol was registered in the PROSPERO database (CRD42019122063).

Eligibility criteria
Studies performing a full health economic evaluation (HEE), ie, simultaneously analyzing costs as well as health effects of an intervention to increase PA and/or decrease SB in the context of worksite health promotion (WHP) compared with one or more alternatives (ie, the comparator) were included. This includes cost-effectiveness analyses (CEA), cost-utility analyses (CUA) or cost-benefit analyses (CBA). In such analyses, costs are always expressed in monetary units, while effect sizes can be expressed in terms of natural units (CEA), quality-of-life proxies (CUA) or in monetary units (CBA). Single-study based HEE (ie, alongside an RCT/cohort study) and model-based HEE (ie, modelled costs and effects with data derived from different sources such as the literature or databases) were eligible for inclusion.
No limits were set for gender, country or type of industry in which the WHP program took place. Interventions could include education, counselling, online interventions, any form of PA (eg, lunch walks, fitness centers, exercise groups) or ergonomic interventions (eg, standing desks). Multicomponent interventions which focused on different health outcomes were included if the intervention for PA/SB constituted a main component of the WHP program.
Studies were included if they reported effects on PA and/or SB. Effects could be reported in "natural units" (eg, MET minutes, energy expenditure, time of moderate/vigorous PA, sitting/standing time etc.) or as proportions (eg, number meeting the PA guidelines, prevalence of PIA etc.). Table 1 summarizes the PICO (problem/patient/population, intervention/indicator, comparison, outcome) elements of this review.
No language limitations were set. The time horizon was set to the previous 20 years (1998-August 2019), because computers and the internet have since had a revolutionary impact on culture, communication and working conditions.

Information sources
A comprehensive literature search was performed in Medline (PubMed), Embase, EconLit, Web of Science, Scopus and NHS Economic Evaluation Database. Additionally, a keyword search in Google Scholar was carried out. In order to increase the sensitivity of the search, references of relevant reviews and from included articles were checked (backward tracking). Furthermore, screening of "cited by" articles (forward tracking) as well as expert interviews were performed. Update notifications from database searches were set and relevant studies were added throughout the process.

Search strategy and study selection
The database search strategy was established using the Peer Review of Electronic Search Strategies (PRESS) checklist (35), CADTH's Database Search Filters (36) and published recommendations to identify economic evaluations (37).
Sensitive search filters according to PICO were built. The C-element (comparison) was not further defined for the search strategy and was therefore omitted to maintain sensitivity of the search filter (see supplementary material, www.sjweh.fi/show_abstract.php?abstract_id=3871, table S1). Search results were stored in reference manager software (Zotero, version 5.0.59). After removing duplicates, titles and abstracts were screened. Full texts of relevant studies were consulted for definitive inclusion and reasons for exclusion were noted. Two independent researchers performed the search, the screening process and inclusion of studies. A consensus discussion between the researchers took place after title and abstract screening, as well as after full-text consultation.

Data collection
Two independent researchers extracted data on study characteristics and outcomes of the economic evaluations and captured these in prepared digital forms. A consensus discussion took place at the end of the data extraction process. The research team was consulted in case of discrepancies and ambiguities.

Data items
The following data were extracted from the included studies: study details (publication year, country, design, perspective, time horizon), characteristics of study participants, details of the intervention and the comparator, measurement and valuation of effects and costs, incremental costs, incremental effects and economic metrics (incremental cost-effectiveness ratios (ICER), incremental cost-utility ratios (ICUR), net monetary benefit (NMB), benefit-cost-ratio (BCR) and return-on-investment (ROI)). Where applicable, 95% confidence intervals (95% CI) were reported. Study authors were contacted in case of missing data.

Data synthesis
To the best of our knowledge, no generally accepted method to pool estimates from HEE (ie, ICER) is available. Standard deviations (SD) or CI for cost data are often lacking, which makes pooling of costs impossible (32). It is difficult to compare WHP interventions because they need to match individual and local situations in companies as well as national (health) policy regulations. Consequently, reviewers concluded that interventions, time horizons and outcome measures differ substantially among studies (38,39). Given this heterogeneity, pooling effects was not considered plausible. Thus, our analysis remains purely descriptive and studies were analyzed qualitatively. However, several steps were taken to enhance comparability of included studies. To quantify the effects, standardized effect sizes (Hedges' g) were calculated following the instructions in the Cochrane handbook (40). All costs were converted to 2017 euros. In step 1, original costs were adjusted using the gross domestic product (GDP) deflator index provided by the International Monetary Fund (IMF) World Economic Outlook Database (41). As published GDP deflator indices by the IMF are only available until 2017, all prices were adjusted to the price year 2017. If the reference year for costs was not reported in the studies, the year of publication was used in conversion.
In step 2, original currencies were converted into euros (Belgium), accounting for purchasing power parities (PPP) between countries (42). Costs in the target currency and the target price year were calculated according to a published formula (43).
Whether an intervention is cost-effective or cost-beneficial depends on the perspective and thus on which costs were considered in the HEE. To provide a more comprehensive synthesis, benefit-standardized ROI/ICER were calculated (32). If, for example, productivity costs and health care costs were considered, three ROI/ICER were calculated: one considering only productivity costs, one considering only health care costs and one considering both.
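The two-step cost conversion described above (GDP deflator adjustment, then PPP conversion) can be illustrated with a minimal sketch. The exact formula of reference (43) is not reproduced in this text, so the form below is an assumption based on the common deflate-then-convert approach; the function name, argument names and all numeric inputs are hypothetical.

```python
def convert_cost(cost_original, deflator_original_year, deflator_target_year,
                 ppp_original, ppp_target):
    """Convert a cost to the target price year and target currency.

    Assumed form (not taken from the source): PPP values are expressed as
    national currency units per international dollar.
    """
    # Step 1: inflate to the target price year within the original currency.
    inflated = cost_original * (deflator_target_year / deflator_original_year)
    # Step 2: convert to the target currency using the ratio of PPP values.
    return inflated * (ppp_target / ppp_original)


# Hypothetical inputs: a cost of 100 in the original currency, a deflator index
# rising from 90 to 100, and PPP values of 1.0 (original) and 0.8 (target).
cost_2017_eur = convert_cost(100.0, 90.0, 100.0, 1.0, 0.8)
print(round(cost_2017_eur, 2))
```

When the original and target price years and currencies coincide, both ratios equal 1 and the cost is returned unchanged.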
Costs and CBA metrics were calculated for each study and descriptively summarized by means, SD, and medians (32).
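Benefit standardization can be sketched as recomputing the same CBA metrics for each combination of cost categories. The definitions below (benefits as avoided costs, ROI as net benefit divided by intervention cost) are commonly used forms and are assumptions here, since the formulas applied in this study are not shown in the text; the function name, category names and numbers are hypothetical.

```python
def cba_metrics(intervention_cost, benefits_by_category, categories):
    """Return net benefit, benefit-cost ratio and ROI (%) for chosen categories.

    benefits_by_category maps a cost category to the avoided costs per person
    (control minus intervention); categories selects which ones to include.
    """
    benefits = sum(benefits_by_category[c] for c in categories)
    net_benefit = benefits - intervention_cost
    return {
        "net_benefit": net_benefit,
        "bcr": benefits / intervention_cost,
        "roi_pct": 100.0 * net_benefit / intervention_cost,
    }


# Hypothetical per-person figures in 2017 euros.
benefits = {"health_care": 40.0, "productivity": 150.0}
for selection in (["health_care"], ["productivity"], ["health_care", "productivity"]):
    print(selection, cba_metrics(100.0, benefits, selection))
```

In this hypothetical case the ROI is negative when only health care costs are counted and positive once productivity costs are included, which is why the choice of cost categories matters for between-study comparison.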

Methodological quality appraisal
The Consensus Health Economic Criteria (CHEC) list was used to assess the methodological quality of the HEE (45). The CHEC list is a generally accepted criteria list consisting of 19 items, which should be regarded as a minimum standard for HEE. Items can be rated as positive, negative (inadequate methodology or insufficient information) or not applicable (NA). Two independent researchers applied the CHEC list and agreement among raters was evaluated using the intraclass correlation coefficient (ICC). Discrepancies were discussed in a consensus meeting.

Literature search
Database searches yielded 3124 results of which 624 were duplicates. Additionally, 32 articles were identified through reference screening of 52 reviews in the field of WHP. After screening 2530 records, 198 full texts were assessed for eligibility and 17 studies (45-62) were included. One additional study (63) was included during the course of the work following notifications from saved database searches (figure 1). Searches in databases other than PubMed did not yield additional studies.

General study characteristics
Eleven studies were randomized controlled trials (RCT), of which seven used a cluster-randomization. All cluster-RCT randomized the clusters at once and all but one took clustering into account for the statistical analysis.
One study only randomly allocated a proportion of the participants. Non-randomized controlled trials (N-RCT) were cohort studies (N=6) of which three were partially modelled (eg, the impact of health benefits on health care costs). One study was completely model-based. Sample sizes ranged from 60 to 1260 in RCT.
[Formula box: Cost_original is the original cost in the original currency. All economic metrics were recalculated using 2017 euros. Economic metrics are often calculated using different methods (44).]
In most studies, participants were employees with no specific health condition (N=12). These studies used general exclusion criteria such as pregnancy, inability to perform PA, long-term sick leave or no regular employment contract. Three studies focused on overweight employees and one study each on older employees (45 years), employees with an unhealthy lifestyle and employees with the diagnosis of diabetes, hyperlipidemia or hypertension. See table 2 for more details.

Interventions
Five studies focused on PA, six on PA and nutrition, and one on SB. Effectiveness data of the latter study were used in the model-based study. Five studies focused on a number of different health risks (eg, PIA, smoking, high alcohol consumption, high cholesterol, blood pressure or poor nutrition) which were identified through a health risk appraisal.
All but one study used some form of education/counseling, but the techniques differed between studies. Studies used one or a combination of the following elements: written information, websites, e-mails, face-to-face coaching, group sessions, phone calls, videos or posters. Most studies reminded employees on a regular basis to implement the suggestions from the counseling sessions. Five trials also distributed pedometers and two studies provided financial incentives for performing PA. Two studies described environmental interventions such as the introduction of table tennis and exercise balls or a scan of environmental factors which may inhibit PA (eg, no shower facilities).
Fourteen studies described that the intervention included techniques of behavior change. However, it was difficult to evaluate to what extent these techniques were put into practice. Often, it was not clear to what extent employees had access to facilities to perform PA (eg, exercise groups, swimming pools, fitness centers, walking paths); such access was explicitly reported in only five studies. In one study, the intervention was actually a PA intervention consisting of one weekly yoga session, one weekly fitness workout and one weekly unsupervised training session.
The studies on SB used counseling techniques together with the implementation/installation of standing desks.
To enhance comparability of effects across studies, standardized effect sizes (Hedges' g) were computed (figure 2). For two studies, the standardized effect size could not be calculated due to insufficient data. Four effect sizes for PA were negative (-0.25 to -0.01), eleven were 0-0.3 and five were >0.3. The median effect size was 0.1 (interquartile range 0.02-0.24). There was no clear pattern for different intervention contents or type of outcome measure related to the effect size. However, the only study which applied a PA intervention (weekly yoga and fitness sessions) yielded the biggest effect size (g=1.3). Of all 20 PA-related effect sizes, six were significantly larger than zero. Three of eleven RCT and three of four N-RCT reported significant effects.

The three effect sizes regarding SB ranged from 0.06-0.29, with one being significantly larger than zero.
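As an illustration of the standardized effect sizes reported above, the following is a minimal sketch of Hedges' g for two independent groups, using the pooled standard deviation and the usual small-sample correction factor. It is assumed, not confirmed by the text, that this matches the exact computation used for the included studies; the function name and example values are hypothetical.

```python
import math


def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Standardized mean difference with small-sample correction."""
    df = n_i + n_c - 2
    # Pooled standard deviation of intervention and control groups.
    pooled_sd = math.sqrt(((n_i - 1) * sd_i ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_i - mean_c) / pooled_sd
    # Approximate correction factor J = 1 - 3 / (4 * df - 1).
    return d * (1 - 3 / (4 * df - 1))


# Hypothetical data: the same mean difference with a large and a small SD.
print(hedges_g(10.0, 4.0, 50, 8.0, 4.0, 50))  # moderate g
print(hedges_g(10.0, 1.0, 50, 8.0, 1.0, 50))  # much larger g, same raw difference
```

The second call shows why a small SD alone can produce a large standardized effect even when the raw difference is modest.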

Costs
Costs reported in the studies could be divided into three subgroups: intervention, direct medical (health care, out-of-pocket) and indirect (due to absenteeism and presenteeism) costs. All but two studies reported intervention costs. One of the latter studies did not report any of the costs separately. Descriptive analysis of the intervention costs among 16 studies yielded an arithmetic mean of €174 (SD €147, median €128) per person. Ten studies included direct medical costs of which seven found them to be lower in the intervention group during follow-up. However, these differences were uncertain due to large SD and thus, not statistically significant. Twelve studies included indirect cost in terms of presenteeism (N=1), absenteeism (N=3), or both (N=8). In ten of the twelve studies considering indirect costs, they were found to be lower in the intervention group during follow-up. Again, these differences were not significant. Indirect costs were the main cost-driver. In studies providing sufficient information on indirect costs (N=6), these represented 87.9% of the total costs. In four studies reporting absenteeism costs and presenteeism costs separately, presenteeism accounted for 82.4% of indirect costs.
The mean difference in total costs between intervention and control group was calculated for each study. Descriptive summary of these differences among studies yielded a mean difference of €0.45 (SD €752, median €31.4) per person in favor of the control group. A complete overview on costs can be found in table S3.

Methodological quality of economic evaluations
Agreement between the two raters for total scores of the CHEC list was high (ICC 0.98, 95% CI 0.94-0.99). On average, studies fulfilled 68% of the minimum-standard criteria of the CHEC list. Most studies described the study population (N=17), posed a clear research question (N=18), chose an appropriate time horizon (N=17) and identified all relevant outcomes (N=16). Less than half of the studies identified all relevant costs (N=8), measured costs in physical units (N=8), valued costs appropriately (N=7) and performed sensitivity analyses (N=7). Of eight studies with a time horizon over one year, three discounted costs. See table S2 for more detail.
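The discounting criterion mentioned above applies when costs accrue over more than one year. A minimal sketch of discounting a yearly cost stream to its present value follows; the 3% rate and the cost figures are hypothetical and not taken from any included study.

```python
def present_value(yearly_costs, discount_rate):
    """Discount a stream of yearly costs; index 0 is the undiscounted base year."""
    return sum(cost / (1 + discount_rate) ** year
               for year, cost in enumerate(yearly_costs))


# Hypothetical: 100 euros per person in each of three years, discounted at 3%.
print(round(present_value([100.0, 100.0, 100.0], 0.03), 2))
```

With a discount rate of zero the function simply returns the undiscounted sum, which is what studies that omit discounting implicitly report.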

Health Economic Evaluations
The perspective of the HEE was reported in thirteen studies. For the remaining five, the perspective was inferred from the available information. HEE of included studies used the employer's perspective (N=9), the societal perspective (N=4), the societal and the employer's perspective (N=3), the healthcare payer perspective (N=1) as well as the healthcare payer perspective and the employer's perspective (N=1). Studies consisted of CBA (N=7), CEA (N=3), CUA (N=3), CBA and CEA (N=3), CEA and CUA (N=1) and all three types (N=1). Studies reporting multiple perspectives typically performed a CEA or a CUA from the societal perspective and a CBA from the employer's perspective.
Table notes: c includes medical costs; d includes "sport costs" (eg, expenses for sport shoes); e sample size for costs and effects is different; f follow-up for costs and effects is different; g costs are not reported, but the study reported that the intervention was cost-saving (ROI = 864%); k randomized controlled trial; m non-randomized controlled trial.

Cost-effectiveness analyses
ICER for fifteen studies could be benefit-standardized, ie, they were calculated considering different combinations of cost categories. The most generalizable perspective for an HEE is the societal perspective as it includes all costs (34). ICER from the societal perspective were calculated for eight studies and were found to be heterogeneous. In three studies, the intervention was dominant, ie, more effective and less expensive than the comparator. For example, the ICER in the study by van Wier et al (61) was -€3.11/minute PA, meaning that €3.11 was saved per additional minute of PA per week. In three studies, the intervention was more effective but also more costly than the comparator. For example, the ICER in van Dongen et al's study (59) was €18.63/minute of sport, meaning that increasing participation in sport by one minute per week costs society €18.63. In one study, the intervention was less costly but also less effective. One study yielded conflicting results as there was a negative and a positive effect among the two PA-related outcome measures. In two studies, the sample size for costs and effects differed; in two other studies, the follow-up time for costs and effects differed and in one study, both differed.
None of the studies yielded significant differences in costs and effects. This indicates that ICER are subject to substantial uncertainty and should therefore be interpreted with caution. For more detail on benefit-standardized ICER, see table 3.
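The interpretation of ICER described above can be sketched as a ratio of incremental costs to incremental effects together with the quadrant of the cost-effectiveness plane. A negative ICER is ambiguous on its own (it arises both when an intervention is dominant and when it is dominated), so the quadrant is needed to read it. The function names and input values below are hypothetical and chosen only to mirror the sign patterns discussed in the text.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost per additional unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


def quadrant(delta_cost, delta_effect):
    """Locate the result on the cost-effectiveness plane (effect on x, cost on y)."""
    if delta_effect > 0 and delta_cost < 0:
        return "SE"  # more effective and less expensive (dominant)
    if delta_effect > 0 and delta_cost > 0:
        return "NE"  # more effective but more expensive
    if delta_effect < 0 and delta_cost < 0:
        return "SW"  # less effective and less expensive
    if delta_effect < 0 and delta_cost > 0:
        return "NW"  # less effective and more expensive (dominated)
    return "on axis"


# Hypothetical increments: 31.10 euros saved for 10 extra minutes of PA per week.
print(quadrant(-31.1, 10.0), icer(-31.1, 10.0))
```

For results in the NE quadrant, whether the intervention is considered cost-effective depends on the willingness to pay per unit of effect, which this sketch does not attempt to model.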

Cost-benefit analyses
In line with ICER, the ROI were also recalculated and benefit-standardized for each study providing sufficient data. As with ICER, ROI across studies presented a heterogeneous picture. When considering the societal perspective and thus, including all the costs, ROI ranged from -450.47% to 864%. There was one outlier (12 246.18%) which was due to a very small investment (difference in intervention costs between the groups was only €13.21) rather than very high benefits. The median ROI was close to zero, regardless of whether direct costs (-30.09%), indirect costs (44.64%) or all costs (31.09%) were included (supplementary figure S1). Only one study reported 95% CI of ROI estimates.
The ROI was found to be related to study design: the median ROI was -39.0% in RCT and 292.37% in N-RCT (P=0.03, figure S2). Spearman's rank correlation between CHEC list rating and ROI was -0.63 (P=0.03, figure S3). See table 3 for more detail.
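For readers unfamiliar with Spearman's rank correlation, the following is a minimal standard-library sketch: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The quality scores and ROI values in the example are hypothetical and are not the data of the included studies.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical quality scores (%) and ROI (%) for five studies.
quality = [47, 58, 63, 74, 84]
roi = [300.0, 120.0, 150.0, -20.0, -40.0]
print(spearman_rho(quality, roi))
```

A negative coefficient, as in this hypothetical example, indicates that higher-rated studies tend to report lower ROI.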

Discussion
This systematic review aimed to evaluate and synthesize the health economic evidence of workplace interventions designed to increase PA and/or decrease SB. Eighteen HEE were included and analyzed.

Effects
Most interventions improved PA across all outcome measures, but effects on PA were variable and generally small. Although most studies used some form of counseling, interventions were heterogeneous. We were unable to link particular intervention elements to higher effects. These findings are in line with the findings of previous reviews which investigated effectiveness of worksite PA interventions (19,64). However, two studies with large effects have been identified. White et al's study (62) reported that exercise, expressed as time/ week, had increased by 106 minutes (Hedges' g=1.04). In this small (N=25) study, participants received comprehensive and individual health coaching from an interprofessional intervention team, led by a pharmacist. Compared to the other included studies, this intervention corresponds more to a clinical setting rather than a typical workplace setting and was clearly higher dosed. Furthermore, this study used a pre-post design, included volunteer employees and may therefore be subject to selection bias. Finally, it should be mentioned that despite the large effect, variation among participants was large. The van Dongen et al study (59) found that employees in the intervention group increased their sport time/week by 33 minutes compared to the controls (Hedges' g=1.3). This was the only intervention which consisted of a physical activity program (ie, exercise classes) rather than counseling only. Offering concrete situations to perform PA may therefore be more effective. Interestingly, the reported intervention costs in this study were €162 and thus not different from mean intervention costs from all studies (€174). However, it is worth mentioning that the large effect size was mainly achieved by a very small SD rather than a large effect.
Table note: ICER were reported together with their location on the cost-effectiveness plane. The cost-effectiveness plane presents the effectiveness of the intervention on the x-axis and the total costs on the y-axis and consists of four quadrants. ICER in the south-east [SE] quadrant indicate that the intervention is more effective and less expensive. ICER in the south-west [SW] quadrant indicate that the intervention is less effective and less expensive. In the north-west [NW] quadrant, the intervention is less effective and more expensive while ICER in the north-east [NE] quadrant of the plane indicate that the intervention is more effective but also more expensive. In this situation, the cost-effectiveness depends on the willingness to pay for one additional unit of effect.
Three studies measured SB and all found positive effects. The only study which set SB as primary outcome found significant and relevant effects. Although PA may attenuate or even eliminate the detrimental influence of SB on health (8), SB and PA are different behaviors requiring individual management and thus, should both be addressed. Reducing SB while increasing PA may boost effectiveness of interventions, meaning that such interventions may be more likely to be cost-effective in the long-term. The present review identified only two studies in which both SB and PA were targeted. However, effects of these outcomes were not considered for the HEE and thus, no conclusion for combined interventions can be drawn. A large trial among 69 219 employees found significant changes in health outcomes in addition to significant improvements in PA and SB (65). HEE of such PA and SB interventions are needed to provide decision makers with the evidence to make informed decisions about allocation of scarce resources (18).
Six studies implemented interventions for employees with specific health conditions (eg, overweight). We found no relevant difference for effects between studies which focused on such groups (median Hedges' g=0.09) and studies which focused on healthy employees (median Hedges' g=0.15). This is somewhat surprising, as previous research showed larger effects of worksite PA-intervention when focusing on employees with specific health conditions (21). Furthermore, focusing on specific groups may also reduce intervention costs because the intervention is not directed at employees who are already physically active and thus, the intervention is likelier to be cost-effective.

Health economic evaluations
The included HEE differed in several ways. In CBA, the effect on the outcome is expressed in monetary terms. This was typical for HEE from the employer's perspective because the employer will only implement an intervention if the benefits are at least as high as the investment. HEE from the societal perspective, however, typically performed CEA, which result in an ICER. Most ICER indicated that the intervention was more effective and more costly. It is difficult to determine if such interventions are cost-effective because cost-effectiveness depends on the willingness-to-pay. To our knowledge, a willingness-to-pay threshold for PA/SB has not yet been established and would be an important subject for future studies.
We found that, even when using the same perspective and the same analytical approach, HEE included different cost categories, which hinders between-study comparisons. As in most HEE (66), our data showed that indirect costs (productivity) were the main cost-drivers. For example, in the study by Goetzel et al (50), the ROI was -42% excluding indirect costs but 103% including indirect costs. A systematic review found that PA was related to increased psychosocial health in employees (27) and there is also evidence that such health outcomes reduce presenteeism (67). Furthermore, low PA was found to be related to increased absenteeism (68). These are reasonable arguments why productivity should be considered in HEE of WHP. However, six studies did not include indirect costs. One reason may be that the methods for valuing productivity are controversial (66). However, in recent years, efforts have been undertaken to provide practical guidance on how to estimate health-related productivity costs (66). Future studies should use such guides.
Between 6% and 10% of major non-communicable diseases could be avoided by eliminating PIA (12). However, the pay-back time of PA/SB interventions is long, which represents a challenge for controlled trials. A common approach in HEE is therefore to model effects and costs over the long term. We found only one study that modelled long-term costs and effects (63). There is thus a need for model-based HEE to better understand the long-term economic value of worksite PA/SB interventions.
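Why a short trial horizon can miss the pay-back can be shown with a deliberately simple sketch. It assumes a one-off intervention cost, a constant annual net benefit and a fixed annual discount rate. All of these are simplifying assumptions with hypothetical values; the sketch does not reproduce the model used in study (63).

```python
def discounted_payback_year(upfront_cost, annual_net_benefit,
                            discount_rate, horizon_years):
    """First year in which cumulative discounted benefits cover the upfront
    cost, or None if this does not happen within the time horizon."""
    cumulative = -upfront_cost
    for year in range(1, horizon_years + 1):
        # Benefits accruing in later years are discounted more strongly
        cumulative += annual_net_benefit / (1 + discount_rate) ** year
        if cumulative >= 0:
            return year
    return None


# Hypothetical values: 100 euros invested, 30 euros net benefit per year
within_trial = discounted_payback_year(100.0, 30.0, 0.03, horizon_years=2)
long_term = discounted_payback_year(100.0, 30.0, 0.03, horizon_years=10)
print(within_trial, long_term)
```

Under these assumptions the investment is not recovered within a two-year horizon but is recovered within a ten-year horizon, so the conclusion depends on the time horizon chosen.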
N-RCT delivered more favorable ROI than RCT. Furthermore, we found an inverse relationship between CHEC scores and ROI. This is in line with previous research in this field and is often attributed to selection bias (32,44).

Strengths and limitations
The literature search yielded only eighteen studies, two of which focused on SB. This small number of studies may limit the significance of this review, especially regarding SB. The number of references retrieved from database searches was 2530, which may seem small. However, this can be explained by the search filters targeting the setting (workplace) and the study type (HEE), which made the search strategy more specific.
With the use of published guidelines and search algorithms, we aimed to maximize the comprehensiveness of our search strategy. Furthermore, intensive reference tracking as well as search notifications from databases were applied in order to reduce the risk of missing studies. Nevertheless, restriction of some keywords to title or abstract may have limited the search, possibly resulting in missing some relevant studies.
The identified HEE were heterogeneous, which limits comparison between studies and thus the conclusions that can be drawn. As a consequence of this heterogeneity, it was inappropriate to carry out a meta-analysis, although one was initially planned. Descriptive analyses of the costs were performed instead. However, as interventions were heterogeneous, the mean costs should be interpreted with caution. Regarding external validity, we tried to provide the best possible comparison of studies by reporting all costs in 2017 euros and by calculating benefit-standardized economic outcomes. However, even if a uniform methodology were to be developed and used, comparisons across studies would remain complicated because outcomes from HEE also depend on other factors such as local regulations or national health policies.
Before applying the CHEC list, the two authors discussed the items thoroughly, which may explain the high reliability for rating the HEE. Nevertheless, some items were difficult to rate. For example, we did not define a threshold for "Is the chosen time horizon appropriate?" but decided case by case, depending on the intervention and the outcome measures. Seventeen studies fulfilled this item, which is at odds with the fact that PA/SB interventions have a long pay-back time. Likewise, there are no clear criteria for "Do the conclusions follow from the data reported?", where most disagreements were found (N=5).

Concluding remarks
Although most studies showed improvements in PA/SB, effects were small and their relevance is questionable. No particular intervention type was found to be more effective. HEE were heterogeneous regarding methodological approaches and the selection of cost categories was inconsistent. Furthermore, effects on costs were subject to substantial uncertainty. Therefore, the economic evidence for worksite PA/SB interventions remains unclear.
Future studies are needed to determine which strategies work best for whom and under what circumstances. HEE of such interventions should be established using guidelines and validated, consistent measures of productivity costs as they were the main cost driver in included HEE. Additionally, studies should model the long-term costs and effects because of the long pay-back time of PA/SB interventions.