Vocational outcome of intervention for low-back pain.

Practical management guidelines for occupational health physicians are needed for the individual support of employees with low-back pain. In this study the level of evidence regarding the efficacy of intervention with vocational outcome parameters was assessed. In a systematic literature search, 40 randomized clinical trials on different types of intervention were retrieved. Their internal validity and statistical power criteria were assessed. The randomization procedure, blinding of patients, and sample size were problematic in most studies. For patients with acute low-back pain limited or moderate evidence was found for the efficacy of no bed rest, a short period of bed rest, and spinal manipulation. For chronic patients limited evidence was found for the efficacy of antidepressants. For the other types of intervention, studies with sufficient statistical power were lacking. Such studies are needed before more-detailed evidence-based guidelines can be formulated for occupational health care.

One of the main tasks of the occupational health physician in The Netherlands is to offer guidance to employees who have been sick. According to the Dutch Labour Conditions Act every employer should counsel the sick employee and contract an occupational health service (in Dutch: "Arbodienst") for assistance (1). However, even for a common problem such as low-back pain, it remains unclear what counseling or intervention can prevent prolonged absenteeism from work. In order to improve the quality of occupational health care, it is important to formulate evidence-based guidelines. In 1987, the Quebec Task Force on Spinal Disorders indicated the importance of such practical management guidelines for low-back pain (2). In the United States (3), the United Kingdom (4), and The Netherlands (5), clinical guidelines have been formulated, but specific guidelines for the occupational health physician do not exist. In the past 15 years several reviews have been published on the efficacy of various types of treatment or rehabilitation of low-back pain patients. (See, eg, references 6-10.) Most of these reviews concentrate on one form of treatment; multiple outcome variables, such as pain, muscular strength, or functional disability, have been assessed. However, the relation of these clinical outcome variables to vocational status is unclear (11). Therefore, it is important to assess the efficacy of different types of treatment with respect to vocation separately. Vocational status parameters, such as duration of sick leave and rates of return to work, are important for the occupational health physician because employers are particularly interested in these figures.
The main objective of this study was to contribute to the development of evidence-based guidelines for occupational health physicians in their approach to the sickness absence of workers with low-back pain. Using vocational status as a measure of outcome, we reviewed the literature on nonsurgical intervention for workers with low-back pain. We have confined ourselves to the outcomes of randomized clinical trials because the randomized clinical trial is considered the most appropriate design for studying the efficacy of intervention (12). Furthermore, we evaluated studies on (sub-)acute (≤3 months) and chronic (>3 months) low-back pain patients separately.

Methods

Retrieval of studies
A search was performed of the MEDLINE (from 1966), CLINPSYCH (from 1980) and NIOSHTIC (from 1966) data bases through December 1995. We used medical subject headings, indexing terms, and text words. The key word "back pain" was combined with one of the following terms: (employee) absenteeism, (re)employment, sick leave, return to work, sickness absence, occupational disability, and employment status. Relevant references in the retrieved articles and in published reviews were examined.
A study was included if the following criteria were met: (i) it concerned a randomized clinical trial; (ii) it concerned subjects with acute or chronic low-back pain; (iii) the intervention was directed towards acute low-back pain patients [nonsteroidal antiinflammatory drugs (NSAIDs), bed rest, spinal manipulation, back school or back exercises, and case management methods] or chronic patients (antidepressants, nonsteroidal antiinflammatory drugs, spinal manipulation, back school or exercises, behavioral therapy, and case management methods); (iv) the outcome parameter was rate of return to work, duration of sick leave or another measurement of vocational status; and (v) the article was published and available in English.
We selected these specific categories of intervention because using return to work in conjunction with all types of medical treatment did not seem feasible. Many treatments aim primarily at a reduction of pain or functional limitations. When there is no effect on functional limitations, an effect on vocational status is not expected (13). Therefore, we excluded types of intervention with no evidence for, or with evidence against, their efficacy in terms of pain or functional status. Since no positive evidence for them was found in the literature on intervention methods for low-back pain (5, 14-21), we excluded studies on analgesics, muscle relaxants, epidural and intraarticular injections, traction, orthoses, biofeedback, acupuncture, and transcutaneous nerve stimulation. We included one exception (ie, case management methods) because of our interest in facilitating return to work. Such methods depend on the institutionalized presence of a case manager, who assesses the client's needs and ensures, through a care plan, that suitable services are provided to meet the needs, within the limits imposed by the insurance company (22). In a return-to-work process, this job is first and foremost the task of an occupational physician. We combined "back schools" and exercise therapy into one type of intervention because they overlap, since an educational program is usually combined with an exercise program.

Assessment of evidence
Reviews and meta-analyses of randomized clinical trials have offered diverse and extensive scales for assessing the quality of studies (23). However, only the randomization of treatment allocation, double-blinding, and handling of withdrawals and dropouts have proved to be associated with bias (24). Furthermore, internal validity criteria should be distinguished from other quality aspects, such as statistical power, external validity, and presentation (25). Therefore, we used the procedure represented in figure 1 in assessing the evidence and relevance for guidelines. We distinguished trials that compare a therapeutic intervention with a placebo (placebo trials) from trials that compare 2 or more different treatments (pragmatic trials). We used a broad definition for a placebo and also categorized a waiting-list group as such.
The issue of blinding was limited to blinding of the study participants for treatment because the rate of return to work and the duration of sick leave are objective parameters. Therefore, blinding the observer seemed less relevant.
Thus the assessment of internal validity resulted in a score between 0 and 5 points. A randomized clinical trial was considered to have a high internal validity if the score was 4 or 5, and a low internal validity if it was 3 points or less. We devised a detailed classification to improve the reliability of the assessment. To examine the reproducibility of the 5 items, one of us (JV) reassessed a random sample of 10 trials unaware of the assessments made by the principal assessor (WvdW). The reproducibility was sufficient (kappa 0.76, 88% agreement). The inconsistencies were mostly due to the criterion of comparability; therefore, this criterion was slightly adjusted.
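The scoring and the agreement check described above can be illustrated with a minimal sketch. It assumes that the 5 criteria are the ones named in the results (randomization procedure, blinding of study participants, loss to follow-up, intention-to-treat, and comparability of base-line characteristics) and that each contributes 0 or 1 point; the function names and the dictionary keys are hypothetical, and the kappa shown is the ordinary unweighted Cohen's kappa, which may differ in detail from the calculation actually used.

```python
from collections import Counter

# Assumed names for the 5 internal validity criteria (1 point each).
CRITERIA = (
    "randomization",
    "blinding_participants",
    "loss_to_follow_up",
    "intention_to_treat",
    "baseline_comparability",
)


def internal_validity(criteria_met):
    """Return (score, label); a score of 4 or 5 counts as high validity."""
    score = sum(1 for name in CRITERIA if criteria_met.get(name, False))
    return score, ("high" if score >= 4 else "low")


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    # Undefined (division by zero) when chance agreement equals 1.
    return (observed - expected) / (1 - expected)
```

In this sketch each rater's item-level judgments (criterion met or not met) are passed as two lists of equal length, one entry per trial and criterion.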
Statistical power of the study. Small sample size is a major reason for erroneously concluding a finding is negative and thus missing a clinically important effect. We took this point into account with regard to studies with negative findings as a second step of the quality assessment. Closely following Moher and his colleagues, we calculated the sample size necessary to detect a 25% relative difference between the treatment and reference group, with a 1-tailed α = 0.05 and 80% power, using a z-test for proportions and a t-test for means (34). If the calculated sample size was larger than the actual sample size, the power of the study was scored as insufficient. Studies with a negative outcome were only included in the next step of determining the level of evidence when they had enough power.
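As an illustration of this power criterion, the following sketch computes the required number of participants per group with the usual normal-approximation formulas. It is an assumption that these formulas match the cited procedure; in particular, the normal approximation is used here for means as well, whereas a t-test based calculation would give slightly larger numbers. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha=0.05, power=0.80):
    """z for a 1-tailed alpha plus z for the desired power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)


def n_per_group_proportions(p_ref, relative_diff=0.25, alpha=0.05, power=0.80):
    """Participants per group needed to detect a relative reduction in a rate."""
    p_exp = p_ref * (1 - relative_diff)
    variance = p_ref * (1 - p_ref) + p_exp * (1 - p_exp)
    n = _z_sum(alpha, power) ** 2 * variance / (p_ref - p_exp) ** 2
    return math.ceil(n)


def n_per_group_means(mean_ref, sd, relative_diff=0.25, alpha=0.05, power=0.80):
    """Participants per group needed to detect a relative difference in a mean."""
    delta = relative_diff * mean_ref
    n = 2 * (_z_sum(alpha, power) * sd / delta) ** 2
    return math.ceil(n)


def has_sufficient_power(actual_n_per_group, required_n_per_group):
    """Power is insufficient when the required size exceeds the actual size."""
    return actual_n_per_group >= required_n_per_group
```

For example, with a hypothetical reference rate of no return to work of 0.40, `n_per_group_proportions(0.40)` gives the group size needed to detect a drop to 0.30; a trial with fewer participants per group would be scored as having insufficient power.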

Level of evidence.
For studies that used rates, we reported the rates and calculated the rate difference and rate ratio with a 95% confidence interval (35). We reported the rates of negative events, for example, "rate of no return to work", which means that a lower rate refers to a more effective treatment. For quantitative variables, such as sick leave, the statistical means and absolute differences between the means in the experimental and reference groups were described. We calculated confidence intervals of 95%, assuming a normal distribution.
A study can concern more than 1 outcome variable or more than 1 period in time. We presented the results for each outcome parameter. A result was defined as positive for a specific outcome parameter if there was a significant difference in favor of the experimental intervention. In other words, the 95% confidence interval did not include a rate ratio of 1 or a rate difference or absolute difference of 0. A study was considered to be negative if there was no significant difference between the 2 types of intervention or if the reference intervention was more effective than the experimental intervention.
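The calculation of a rate difference and a rate ratio with 95% confidence intervals, and the resulting positive or negative conclusion, can be sketched as follows. The sketch assumes the common Wald interval for the rate difference and the log-based interval for the rate ratio; whether these are exactly the methods of reference 35 is not confirmed here. The function names and the counts in the usage note are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96 for a 2-sided 95% interval


def rate_difference(events_e, n_e, events_r, n_r):
    """Rate difference (experimental minus reference) with a Wald 95% CI."""
    p_e, p_r = events_e / n_e, events_r / n_r
    rd = p_e - p_r
    se = math.sqrt(p_e * (1 - p_e) / n_e + p_r * (1 - p_r) / n_r)
    return rd, rd - Z95 * se, rd + Z95 * se


def rate_ratio(events_e, n_e, events_r, n_r):
    """Rate ratio with a 95% CI computed on the log scale (needs events > 0)."""
    rr = (events_e / n_e) / (events_r / n_r)
    se_log = math.sqrt(1 / events_e - 1 / n_e + 1 / events_r - 1 / n_r)
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - Z95 * se_log), math.exp(log_rr + Z95 * se_log)


def conclusion(rd_upper, rr_upper):
    """Rates are rates of negative events, so lower is better.

    Positive only if a whole interval lies below its null value
    (0 for the rate difference, 1 for the rate ratio).
    """
    return "positive" if rd_upper < 0 or rr_upper < 1 else "negative"
```

With hypothetical counts of 16 negative events among 157 experimental patients and 19 among 160 reference patients, both intervals include their null value, so `conclusion` returns "negative" for that outcome parameter.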
A meta-analysis is problematic with regard to the heterogeneity of the patient groups, the types of interventions, follow-up periods, or outcome parameters (36, 37). Therefore, we used a rating system with four levels of scientific evidence for an overall conclusion regarding the efficacy of the intervention. This system is an adaptation of that used in the US Clinical Practice Guideline for Acute Low-back Problems in Adults (3): level A: strong evidence: multiple high-quality randomized clinical trials; level B: moderate evidence: 1 high-quality randomized clinical trial and ≥1 low-quality randomized clinical trials; level C: limited evidence: 1 high-quality randomized clinical trial or multiple low-quality randomized clinical trials; level D: no evidence: 1 low-quality randomized clinical trial, no randomized clinical trial or contradictory outcomes.
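The four-level rating can be written down as a small decision rule. The sketch below is only one possible reading of the system: it assumes that each trial entering the final assessment is described by its internal validity ("high" or "low") and its result ("positive" or "negative"), and that any mixture of positive and negative results counts as contradictory. In the review itself a contradictory result was sometimes set aside on substantive grounds, which this simple rule does not capture. The function name is hypothetical.

```python
def level_of_evidence(trials):
    """Rate a list of (validity, result) pairs as level A, B, C, or D.

    validity is "high" or "low"; result is "positive" or "negative".
    """
    results = {result for _, result in trials}
    if len(results) > 1:
        return "D"  # contradictory outcomes: no evidence
    high = sum(1 for validity, _ in trials if validity == "high")
    low = sum(1 for validity, _ in trials if validity == "low")
    if high >= 2:
        return "A"  # strong evidence
    if high == 1 and low >= 1:
        return "B"  # moderate evidence
    if high == 1 or low >= 2:
        return "C"  # limited evidence
    return "D"  # a single low-quality trial or no trial at all
```

For instance, a single high-quality trial with a positive result yields level C (limited evidence), and adding a concordant low-quality trial raises the rating to level B (moderate evidence).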

Results
The number of randomized clinical trials identified for each intervention varied widely, ranging from 0 on case management methods for chronic low-back pain to 7 on back school or exercise programs for acute low-back pain.

Internal validity
Tables 1 and 2 show the internal validity scores of the studies. Only the study of Malmivaara et al (52) reached the maximum score of 5 points, while 6 randomized clinical trials had a score of 4 points. Therefore 7 randomized clinical trials had a high score for internal validity. The scores on the 5 criteria diverged widely. The randomization procedure and blinding of study participants were the most problematic criteria, being met in only 9 (27%) and 10 (30%) randomized clinical trials, respectively. "Loss to follow-up" and "intention-to-treat" scored much better, in 31 (94%) and 28 (85%) of the trials, respectively. Adequate handling of the base-line characteristics scored 1 point in 16 randomized clinical trials (48%).

Statistical power
The outcome parameters and results are presented in tables 3 and 4. The statistical power of the studies with negative results was rated as sufficient for only 5 outcome parameters in 4 trials (15% of the negative studies). For 4 trials we estimated the power by assuming a certain (reasonable) standard deviation. In 7 trials the data were not sufficient to calculate the power for 1 or more outcome parameters.

Level of evidence in acute low-back pain studies
Nonsteroidal antiinflammatory drugs. Four studies about nonsteroidal antiinflammatory drugs were identified (58, 62, 68, 69). Three studies were placebo trials. They had a negative outcome, but none passed the statistical power criterion (58, 68, 69). (See figure 1.) Therefore no evidence for or against the efficacy of nonsteroidal antiinflammatory drugs was available for patients with acute symptoms.
Hosie's (62) pragmatic trial had only 1 point for internal validity and a negative result. We could not calculate the power of this study because only median days were reported. There was also no evidence that 1 particular nonsteroidal antiinflammatory drug was more effective than another.
Limitation of bed rest. Four studies were identified that compared different periods of bed rest or the avoidance of bed rest with a short period of bed rest (42, 52, 69, 70). Three could be used after application of the power criterion. The study with high internal validity by Malmivaara et al (52) reported positive results for duration of sick leave and negative results for the scale "ability to work". These results suggest that avoiding bed rest leads to shorter sick leave, while the difference in bed rest period did not affect functional ability. The study with low internal validity by Deyo et al (42) reported fewer days of sick leave for 2 days of bed rest in comparison with 7 days of bed rest. The other study with low internal validity, by Wiesel et al (69), reported negative results with combat trainees. In fact, the reference group with a forced period of bed rest had significantly fewer days of sick leave. A likely explanation for these contradictory results is the setting of this study. The patients, who were forced to stay in bed, were military trainees. Staying in bed until return to full duty was probably appreciated less than walking around without duty, which was permitted for the experimental group until the pain disappeared. Therefore, for "civilian life" there is moderate evidence for the efficacy of avoiding bed rest or for short periods of bed rest, in terms of duration of sick leave after 3 months for patients with and without radiating pain.
Table fragment (results of a back school trial, with table footnotes):
Experimental treatment: (e1) three weeks inpatient, modified Swedish back school, refresher course after 1½ years (N = 157); (e2) modified Swedish back school, refresher course after 1½ years (N = 159)
Reference treatment: (r) written and oral instructions (N = 160)
e1 versus r: mean scores of differences of sickness allowance days between pretreatment and 1.5 years posttreatment due to spinal disorders: (e) +0.3, (r) +3.8, AD: -3.5 (NS); and 2.5 years posttreatment due to spinal disorders: (e) +1.6, (r) +4.8, AD: -3.2 (NS)
e2 versus r: mean scores of differences of sickness allowance days between pretreatment and 1.5 years posttreatment due to spinal disorders: (e) -0.2, (r) +3.8, AD: -4.0 (NS); and 2.5 years posttreatment due to spinal disorders: (e) +2.2, (r) +4.8, AD: -2.6 (NS)
e1 versus e2: mean scores of differences of sickness allowance days between pretreatment and 1.5 years posttreatment due to spinal disorders: AD: +0.5 (NS); and 2.5 years posttreatment due to spinal disorders: AD: -0.6 (NS)
Rate of disability pension after 4.5 years on the average: e1 versus r: (e) 0.10, (r) 0.12, RD: -0.02 (-0.09-0.05); RR: 0.86 (95% CI 0.46-1.61); e2 versus r: (e) 0.08, (r) 0.12, RD: -0.04 (95% CI -0.
Footnotes: + = positive conclusion (experimental treatment better than reference treatment and 95% CI of RD does not include 0 or 95% CI of RR does not include 1), - = negative conclusion. Sufficient power: sample size enough to detect a 25% difference with 1-tail α = 0.05 and 1-β = 0.80; + = sufficient power, - = insufficient power, ( ) = estimated power, ? = power could not be established. Experimental + reference groups = 25. Experimental + reference groups = 77.
Spinal manipulation. Six randomized clinical trials about spinal manipulation were identified (40, 41, 61, 65, 67, 71). In three a placebo group was used. One study qualified for the final assessment of the level of evidence. It had a high internal validity score and reported positive results after 10 days for patients with pelvic joint dysfunction (71).
Two of the 3 pragmatic trials reached the final assessment. The trial by Blomberg et al (41) had high internal validity and a positive result for patients with or without radiating pain. In a study with low internal validity Rasmussen (65) reported that spinal manipulation was more effective than short waves for patients without signs of root pressure.
In conclusion, there is limited evidence for the efficacy of spinal manipulation in comparison with placebo treatment in cases of pelvic joint dysfunction. There is moderate evidence that spinal manipulation is more effective in the short run than other conservative types of treatment, like physiotherapy, at least for patients without radiating pain.
Back schools or exercise therapy. Seven randomized clinical trials were identified in 9 articles (40, 43, 49-54, 67). Both placebo trials could be used for the assessment of the level of evidence. One randomized clinical trial had high internal validity with a negative result (43) and the other had low internal validity with a positive result (40). The differences between these 2 studies could not be explained by the intensity of the program. While the program of the negative trial had 10 sessions, the positive one had only 4. Only the content of the program differed, the program of the positive study including a worksite visit.
Five randomized clinical trials compared 2 or more alternative interventions, and 2 of these trials could be used for the level of evidence. One study with high internal validity had a negative result after 3 and 12 weeks of follow-up (52). The study by Lindstrom et al (50, 51) had low internal validity and reported a positive result with 2 years of follow-up. The interventions of these trials differed strongly in intensity in that the negative study had an exercise program of 1 session and the positive trial comprised a program of 10 sessions, on the average, and included a worksite visit.
In conclusion, there is no evidence for the efficacy of back schools or exercise therapy, and there is no evidence either that back schools or exercise therapy is more effective than usual medical care.
Case management methods. We identified 4 randomized clinical trials using case management methods (43-45, 47). The only placebo trial, a high-quality study, reported a negative result (43). Only 1 of the 3 pragmatic trials, a study with low internal validity, could be used in the final assessment of the level of evidence (47). It reported a positive outcome. In conclusion, there was evidence neither for the efficacy of case management methods nor for the greater efficacy of case management in comparison with conventional treatment.

Level of evidence in chronic low-back pain studies
Antidepressants. Only one placebo randomized clinical trial (57), a study with high internal validity and a positive result, was identified that used antidepressants. Therefore there was limited evidence of efficacy for chronic low-back pain patients after 2 months of antidepressants.
Nonsteroidal antiinflammatory drugs. Only 1 placebo trial with low internal validity and with a positive result was identified (64); therefore there was no evidence for or against the effectiveness of nonsteroidal antiinflammatory drugs for chronic patients.
Spinal manipulation. Two articles about spinal manipulation were identified (59, 60). One study used placebo treatment, but had insufficient statistical power (60). The other trial compared alternative treatments (59). It was a study with low validity and a positive outcome.
In conclusion, there is no evidence for the efficacy of spinal manipulation for chronic patients, whether compared with a placebo or other treatments.
Back schools or exercise therapy. Five randomized clinical trials were identified (46, 48, 55, 56, 63) for back schools or exercise therapy. One of these trials used a waiting list group, but did not pass the statistical power criterion.
One study had no description of the treatment for the reference group (56). This study was included in the group of pragmatic trials, with the other three. Two trials, both with low internal validity, were considered for the final assessment. One reported positive results (55), the other negative (56). Both were inpatient programs in a rehabilitation center and lasted 6 weeks. These contradictory results indicate that there is no evidence that back schools or exercise therapy is more effective than usual care.
Behavioral therapy. Three randomized clinical trials with low internal validity were identified that used behavioral therapy (38, 39, 66). In 1 randomized clinical trial, Turner (66) compared 2 groups subjected to different behavioral approaches with a group on a waiting list, but the 2 behavioral approach groups were also compared with each other. Only the last comparison could be used for the final assessment of the level of evidence. On the basis of this study it was concluded that there is no evidence that cognitive-behavioral group therapy as part of progressive-relaxation training is more effective in the long term than progressive-relaxation training alone.

Case management methods.
No randomized clinical trials about case management were identified for chronic low-back pain patients.

Discussion
We used 2 strategies for identifying relevant articles for our review. First, we studied other reviews on intervention for low-back pain patients, and, second, we searched computer data bases. An intensive search is important because evidence for guidelines is based on only 1 or 2 articles. Therefore, a conclusion can easily be reversed as 1 or 2 new articles on an intervention program are identified. Our key words were broad, but failed to select all articles with vocational status outcome parameters from MEDLINE and the other data bases. A data-base search with key words regarding specific types of intervention would be more useful. However, a key word search cannot guarantee the identification of all relevant articles either. Publication bias, the limited number of indexed journals, and the use of different key words are all problems in a literature review that should be reduced, for instance, by a registration system of clinical trials (77). To this end, the Cochrane Collaboration (25), which aims at the registration of all randomized clinical trials, is promising.
We used a stepwise quality assessment, starting with an assessment of the internal validity. Ideally, studies with low internal validity should not be included because of the association of methodological inconsistencies with bias (24). An obvious disadvantage is the lack of studies with high internal validity, as was concluded in other reviews (20). If the results of studies support each other and there is a lack of studies with high internal validity, trials with low internal validity are the second best option for establishing the level of evidence. The use of the 5 internal validity criteria is comparable to the use of the criteria that are part of the quality assessment by other authors (23). The condition that 4 or 5 criteria should be met for high internal validity is arbitrary. We argue that at least 4 criteria should be met because of the proved association between the selected criteria and bias (24). However, the subjectivity of quality assessments remains a problem. Therefore, the development of an easy, universal, and reliable scale deserves attention (23). For instance, we did not weight the criteria because the relative value of each criterion was arbitrary. The choice of the rating system for the level of evidence was also subjective. Our system corresponds as much as possible with the system of the Agency for Health Care Policy and Research (AHCPR) in the United States, although it could not be copied completely (3). We used randomized clinical trials and no other studies or expert opinions in the same manner as the AHCPR system. It should be noted that a more conservative system, such as that used by Tulder et al (20, 21), would lead to less evidence and an even greater demand for more randomized clinical trials. Therefore, we would encourage a discussion about the balance between a demand for more research and satisfactory evidence for practical guidelines.
The methodological problems most prevalent in the studies concerned adequate randomization and blinding of study participants. This result is consistent with those of reviews on other outcome parameters, such as pain and functional limitations (20). However, "intention to treat" scored much better among the retrieved studies in our review. The most likely explanation is the use of sick leave rates as an outcome parameter, which could be retrieved despite the failure to follow up the respondents.
The second step in the quality assessment was related to the number of study participants. In meta-analyses, multiple studies with low power can be combined to show one, more powerful outcome. However, heterogeneity as to the intervention, follow-up period, study population, and the like does not always permit such a combination (36, 37). In our view, the reviewed intervention studies differ too much for a meta-analysis, and we therefore looked at the studies separately. We excluded studies with low power to prevent an erroneous negative conclusion about the efficacy. Unfortunately, many identified studies were excluded because of insufficient power, despite the fact that we used a 25% difference between the experimental and reference treatment as a relevant clinical difference. In our opinion, this criterion is not very rigorous. For instance, if we had considered a smaller difference as relevant (eg, 20%) even more studies would have been excluded. Therefore, we recommend sample size calculations at the beginning of a study to minimize this problem as much as possible. These calculations give only a global estimation of the sample size because of the uncertainty about some parameters. The lack of power for the vocational outcome parameter in the retrieved studies can be due to the fact that some studies were primarily aimed at other outcome parameters, with less variance.
Most of the studies have broad confidence intervals for their outcome parameters. This great variance can be partially explained by the small number of patients included, but also by the assessment of differences in the means. Although we assumed a normal distribution for duration of sick leave, as most of the studies did, it is more appropriate to use survival techniques for analyzing time to return to work. Then fewer patients would be necessary to achieve sufficient power.
We have presented the outcome of the trials in 2 ways, as an absolute outcome (rate differences for rates) and a relative outcome (rate ratios for rates). For some studies it was not possible to calculate confidence intervals, sometimes not even the outcome itself. In these articles, the authors stated that the differences were not significant, without reporting standard deviations. In our opinion, it is important to report results uniformly to allow for comparison of the outcomes. As argued before, we recommend a standard method of reporting results for time to return to work.
Another explanation for the broad confidence intervals is the fact that return to work is a multifactorial process, with many confounders. Its outcome is not only influenced by the patient and his or her low-back pain, but also by the work situation, the possibilities for working part-time or working at a lower pace, and problems in the home situation and even the macro-economic situation (78, 79). Intervention that is directed at the patient cannot influence all these factors; it should therefore have a high impact on the patient to have a net positive outcome. Because of this multifactorial influence on sick leave, a multifactorial intervention strategy can be the most effective. The individual patient with low-back pain should be advised of effective treatment options. However, at the same time, the work situation should be adapted to make it easier for the patient to return to work. A good example of this "case management approach" is the Sherbrooke model, which includes both clinical and ergonomic approaches (80). More randomized clinical trials on this subject are needed, as has already been noted by Lechner et al (46) in their review about the effectiveness of work hardening and work conditioning programs (76). They found only 2 randomized clinical trials out of 12, most of them with promising results.
Recently, another relevant review was published by Scheer et al (81) about randomized clinical trials for acute low-back pain in relation to return to work. They assessed the methodological quality of the studies, but the relationship between this quality rating and the level of evidence remained unclear. Furthermore, we found more randomized clinical trials on spinal manipulation, which led to evidence for the efficacy of this intervention. For bed rest, exercise, back schools and case management methods, their conclusion was the same as ours. However, with our rating system, we showed that one of the main reasons for negative findings is lack of statistical power.

Concluding remarks
For most of the reviewed types of intervention, the scientific evidence of efficacy for patients with low-back pain, in terms of sickness absence rates or duration of sick leave, is limited. We have formulated the following main points from this review for the guidelines for occupational physicians: bed rest should be limited or even avoided; normal activity should be continued as much as possible; if any conservative treatment for patients with acute low-back pain is considered, spinal manipulation is the best option; and antidepressants can be helpful for chronic low-back pain patients.
Furthermore, there are promising results for exercise and education programs, especially for intensive programs in an occupational setting (6). However, these positive findings have not yet been confirmed in randomized clinical trials with vocational outcome parameters. More high-quality randomized clinical trials with sufficient statistical power and vocational outcome parameters are still needed before guidelines based on stronger evidence can be established for occupational physicians.

Table 1. Internal validity of the studies on acute low-back pain.

Table 2. Internal validity of the studies on chronic low-back pain.

Table 4. Results of studies on the efficacy of various conservative treatments for chronic low-back pain. (RD = rate difference; RR = rate ratio; AD = absolute difference; 95% CI = 95% confidence interval; NS = not significant; e = experimental treatment; r = reference treatment)