Evaluation of questionnaire-based information on previous physical work loads

Scand J Work Environ Health 1999;25(3):246-254.
Objectives The principal aim of the present study was to evaluate questionnaire-based information on past physical work loads (6-year recall).
Methods Effects of memory difficulties on reproducibility were evaluated for 82 subjects by comparing previously reported results on current work loads (test-retest procedure) with the same items recalled 6 years later. Validity was assessed by comparing self-reports in 1995, regarding work loads in 1989, with worksite measurements performed in 1989.
Results Six-year reproducibility, calculated as weighted kappa coefficients (κ_w), varied between 0.36 and 0.86, with the highest values for proportion of the workday spent sitting and for perceived general exertion and the lowest values for trunk and neck flexion. The six-year reproducibility results were similar to previously reported test-retest results for these items; this finding indicates that memory difficulties were a minor problem. The validity of the questionnaire responses, expressed as rank correlations (r_s) between the questionnaire responses and workplace measurements, varied between -0.16 and 0.78. The highest values were obtained for the items sitting and repetitive work, and the lowest and "unacceptable" values were for head rotation and neck flexion. Misclassification of exposure did not appear to be differential with regard to musculoskeletal symptom status, as judged by the calculated risk estimates.
Conclusions The validity of some of these self-administered questionnaire items appears sufficient for a crude assessment of physical work loads in the past in epidemiologic studies of the general population with predominantly low levels of exposure.
Key terms assessment, occupational, questionnaire, reproducibility, retrospective, validity.

In epidemiologic studies of work-related musculoskeletal disorders, direct measurements or systematic observations at the workplace are recommended in preference to questionnaire-based information (1)(2)(3). However, exposures that occurred in the past are often no longer available for observation. Thus, in most cases, questionnaires are the method of choice both because of their relatively low cost and because the subjects can be asked to recall past exposures. The accuracy of much self-reported information has been doubted, mainly because of presumed recall problems. The magnitude of memory difficulties is likely to be closely related to the kind of information requested; for example there may be major problems with details such as percentages of workhours with parts of the body in specified positions. It is also reasonable to presume that memory difficulties increase with time (ie, more mistakes are made when answering questions regarding work loads 25 years back in time when compared with questions on work loads a few years ago) and that the risk of differential misclassification of exposure due to symptom status may also increase with the length of the recall period.
The validity of questionnaire information on physical work loads is crucial for the accuracy of risk estimates for associations between work loads and musculoskeletal outcome. The validity of some self-reported gross activities in current work (eg, the fraction of the workday spent sitting) has been reported as sufficient for use in epidemiologic studies, while the validity of information on body postures has been found to be lower (4,5). Validity studies performed so far have been restricted to present work load, while, to our knowledge, the validity of self-reported historical physical work loads has only been evaluated once before in the literature (6).
The aim of our study was to evaluate questionnaire information about past physical work loads (6 years ago), in terms of test-retest reliability and validity, by comparing it with questionnaire responses and workplace measurements obtained 6 years earlier.

Study design
Questionnaire information on historical physical work loads was validated by comparing self-reports (collected in 1995), on physical work loads 6 years previously (in 1989), with worksite measurements obtained in 1989 and used as criterion values. Reports on perceived general exertion (RPE) were validated by heart-rate measurements.
Disagreement between measurements performed in 1989 and questionnaire information collected later (in 1995) probably has 2 main sources: recall problems and methodological differences between questionnaire items and workplace measurements. Recall difficulties were evaluated by comparing previously reported results on test-retest reproducibility (2-week test-retest) with the 6-year reproducibility results presented in this study. Disagreement between questionnaire definition and workplace measurements may arise as a result of differences in the response scales used, especially with regard to the duration of recorded as opposed to self-reported work loads, and of differences in task definition between workplace measurements and questionnaire items. The applicability of different response scales and different phrasings of items was also investigated.
Differential bias related to the musculoskeletal symptom status was analyzed by comparing the reproducibility, sensitivity, and specificity between subjects with and without low-back symptoms. The musculoskeletal symptom status both in 1989 and in 1995 was considered.

Subjects
The study group consisted of 39 men and 58 women, a subgroup of the Stockholm MUSIC 1 study. This group was originally selected in 1989, and individual workplace measurements were then performed. "MUSIC" is an acronym for Musculoskeletal Intervention Center and is a network of 10 departments in Stockholm with the aim of preventing musculoskeletal disorders. The subjects constituted 4 subgroups [furniture removers (N=12), a sample from the general male population (N=27), medical secretaries (N=13), and a sample from the general female population (N=45)] in order to cover a broad spectrum of physical work loads (7). These subjects were all working in the Stockholm area during the Stockholm MUSIC 1 study, and most of them were still doing so in 1995, when they were contacted and asked to participate in a follow-up. Their present addresses and phone numbers were obtained from national population and phone number registers. The main reasons for nonparticipation were that the subject (i) could not be located at a recent phone number or address, (ii) did not respond to phone calls or letters, or (iii) refused to participate (2 subjects).
The study was reviewed and approved by the regional Ethics Committee of Human Research at the Karolinska Institute, Stockholm, Sweden.

Data collection instruments
Measurements at the workplaces in 1989. Physical work loads were recorded individually at the workplaces during a normal workday for each of the 97 subjects. Heart rate was recorded during a whole workday by a Sport Tester PE 3000 (Polar Electro, Finland), and the percentage of time spent sitting was determined with a Posimeter device (8). Body postures, repetitive finger and hand movements, and manual handling were recorded by experienced ergonomists and physiotherapists with the portable ergonomic observation (PEO) method (9). The heart rates and the PEO results of the observation day were weighted for a "typical workweek" according to interview information on task duration during the week (7,9).
Questions in 1989. Illustrated questions about work postures and manual handling were used (appendix). Their reproducibility for present work loads has previously been evaluated with a test-retest procedure in a population study of 343 subjects (10) and validated in relation to worksite measurements comprising the same subjects as in the present study (5). The physical activities described by these items were quantified as a proportion of the workday (6-degree ordinal scale) or as the frequency per hour (5-degree ordinal scale).
Questions in 1995. In 1995 new items (appendix) were added to the previously described items in order to explore the influence of different response scales on the validity. These items have been evaluated for test-retest reliability (11) and were quantified as the proportion of the workday, using a 100-millimeter visual analogue scale (VAS), or as the number of days per month spent at a certain activity, recorded on a 5-degree ordinal scale.
Questions on physical exertion at work were asked in both 1989 and 1995 and concerned the general perceived exertion in 1989, rated on a modified RPE scale (0-14). Information on past musculoskeletal symptoms (last 12 months) requested in 1989 and in 1995 was obtained by the Nordic questionnaire (12), and additional questions were asked in 1995 regarding present employment or retirement status.

Procedure
Subjects living in the Stockholm area in 1995 were contacted by telephone and invited to meet with a member of the research team. Those living far away or too busy for a personal meeting were sent instructions and questionnaires and contacted by telephone (3 subjects). When the subjects answered the self-administered questionnaire items in 1995, they were instructed to consider their work situation as it was during the time of the Stockholm MUSIC study in 1989.
A total of 7 questionnaire items that could be evaluated in relation to workplace measurements were identified and checked for comparability with the workplace measurements performed in 1989. Thus repetitive finger and hand movements were recorded as "repetitive work" during the workplace measurements in 1989, but they were given as 1 or 2 separate questions in the questionnaires. Only the item on repetitive finger movements was chosen for the validation of "repetitive work" in the past. Manual handling of loads was measured in 1989 as the total number of lifts, independent of weight, and the answers concerning these items in the questionnaires could therefore not be validated.
Two more items were not evaluated. One (work with hands above shoulder level) was excluded because the definitions of the questionnaire item and the workplace measurement were incongruent. The other item (trunk rotation more than 45 degrees) was deleted from the analysis since too few exposed subjects (7 subjects) exceeded 0% of their worktime in this activity during an estimated normal week.
The 6-year reproducibility of the questionnaire items was analyzed with weighted Cohen's kappa coefficients (κ_w) with 95% confidence intervals (95% CI). Kappa coefficients describe agreement beyond chance and produce results identical with those of intraclass correlation coefficients, if calculated with quadratic weights on categorical responses. Kappa values exceeding 0.75 were regarded as "excellent agreement" beyond chance, values below 0.40 represented "poor agreement", and values between 0.40 and 0.75 were rated "fair to good agreement" (13). Tests of the statistical significance of differences in the kappa coefficients between independent groups (men, women; with symptoms, without symptoms) were made according to Fleiss (13).
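The weighted kappa described above can be illustrated with a minimal Python sketch. It assumes that responses are coded as integers 0..k-1 on an ordinal scale and that quadratic disagreement weights are used; the function names and the example data are hypothetical and are not taken from the study. Confidence intervals are not computed here.

```python
from collections import Counter


def weighted_kappa(first, second, n_categories):
    """Cohen's weighted kappa with quadratic disagreement weights.

    `first` and `second` hold the ordinal category (0..n_categories-1)
    chosen by each subject on the two occasions.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty response lists of equal length")
    n = len(first)
    observed = Counter(zip(first, second))  # cell counts of the cross table
    row = Counter(first)                    # marginal counts, occasion 1
    col = Counter(second)                   # marginal counts, occasion 2
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2  # the (k-1)^2 normalisation cancels out
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * row[i] * col[j] / (n * n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all answers fall in one category")
    return 1.0 - observed_disagreement / expected_disagreement


def agreement_label(kappa):
    """Classify kappa with the thresholds quoted in the text (13)."""
    if kappa > 0.75:
        return "excellent"
    if kappa < 0.40:
        return "poor"
    return "fair to good"


# Hypothetical answers of 8 subjects on a 6-degree ordinal scale.
answers_1989 = [0, 1, 1, 2, 3, 4, 5, 5]
answers_1995 = [0, 1, 2, 2, 3, 5, 5, 4]
kappa = weighted_kappa(answers_1989, answers_1995, 6)
print(round(kappa, 2), agreement_label(kappa))
```

Quadratic weights penalize a disagreement of 2 scale steps 4 times as heavily as a disagreement of 1 step, which is why this variant corresponds to the intraclass correlation coefficient mentioned in the text.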
Validity was analyzed for the retrospective questionnaire responses obtained in 1995 with worksite measurements performed in 1989 as the reference values. The response scales differed between the measurements at the workplaces in 1989 and the questionnaire items in 1995 so that an exact matching of scales was not possible, and the Spearman rank correlation coefficient (r_s) was chosen for the calculation of agreement. A correlation coefficient of at least 0.6 was regarded as an indicator of high agreement. Tests of statistical significance of differences in rank correlation coefficients between independent groups (men, women; with symptoms, without symptoms) were made according to Blalock (14).
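As an illustration of the rank correlation analysis, the following Python sketch computes a Spearman coefficient with average ranks for ties and compares 2 coefficients from independent groups. The text cites Blalock (14) for the group comparison without describing the procedure; the Fisher z-transformation used below is an assumption (a common textbook choice), and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided test of r1 versus r2 (assumed Fisher z approach).

    Returns the z statistic and the two-sided P-value.
    """
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: questionnaire category versus measured % of worktime.
reported = [0, 1, 1, 2, 3, 4]
measured = [2.0, 10.0, 5.0, 30.0, 45.0, 80.0]
print(round(spearman(reported, measured), 2))
```

Because only ranks enter the calculation, the questionnaire scale and the measurement scale do not need to share units, which is the reason given in the text for choosing this coefficient.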
For calculations of sensitivity and specificity the cutoff point between the exposed and unexposed subjects was defined as the median value because of the incomparability between the scales used, and also in order to have enough subjects in both exposure groups. However, some analyses of sensitivity and specificity, using cutoff points based on current knowledge about adverse effects on the musculoskeletal system, were also performed. The distributions of questionnaire responses and measurement values were highly skewed in some cases, with most responses at the lower end of the scales. The sensitivity and specificity were only calculated if there were 10 or more subjects in each exposure group.
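The median-based dichotomization can be sketched as follows. This is a minimal illustration under stated assumptions: a subject is treated as exposed when the value lies strictly above the median of the respective scale, the "10 or more subjects" rule is applied to the exposure groups defined by the workplace measurements, and the names and data are hypothetical.

```python
from statistics import median


def dichotomize(values):
    """Mark a subject as exposed when the value exceeds the median (assumed rule)."""
    cutoff = median(values)
    return [value > cutoff for value in values]


def sensitivity_specificity(reported, measured, min_group=10):
    """Sensitivity and specificity of questionnaire responses.

    The workplace measurements serve as criterion values. Returns None
    when either criterion exposure group has fewer than `min_group` subjects.
    """
    reported_exposed = dichotomize(reported)
    truly_exposed = dichotomize(measured)
    n_exposed = sum(truly_exposed)
    n_unexposed = len(truly_exposed) - n_exposed
    if n_exposed < min_group or n_unexposed < min_group:
        return None
    true_pos = sum(r and m for r, m in zip(reported_exposed, truly_exposed))
    true_neg = sum(not r and not m for r, m in zip(reported_exposed, truly_exposed))
    return true_pos / n_exposed, true_neg / n_unexposed


# Hypothetical data for 20 subjects; one unexposed subject overreports.
measured = list(range(20))
reported = list(range(20))
reported[9] = 15
print(sensitivity_specificity(reported, measured))
```

Splitting each scale at its own median sidesteps the incomparability of the scales, at the cost of a cutoff that has no direct biological interpretation.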

Results
Eighty-two of the 97 subjects (85%) participated in our current study (table 1). The reported percentages of neck and shoulder or low-back symptoms during the last 12 months were approximately equal in 1989 and 1995. About 1/3 of the subjects reported neck and shoulder or low-back symptoms in both of these years, 1/3 only in 1989, and 1/3 only in 1995.

Six-year reproducibility
Slightly higher work loads were reported in 1995 than in the original responses in 1989 for some of the items, such as perceived general exertion, hands above shoulder level, and trunk bent forward (table 2). The 6-year reproducibility (κ_w) of different questionnaire items varied between 0.36 and 0.86, with the highest values for the proportion of the workday spent sitting (0.86) and perceived general exertion (0.70) and the lowest values for forward bending of the trunk (0.36) and head bent forward (0.37). There was no general trend for gender differences in the kappa values. The reproducibility (κ_w) for the men significantly exceeded that for the women for the items on perceived exertion (P=0.004) and the lifting of loads between 1 and 5 kilograms (P=0.002). Higher reproducibility among the women than the men was found for the item on repetitive finger movements (P<0.001). The presence of low-back symptoms during the last 12 months before answering the questionnaire in 1995 did not seem to affect the test-retest reliability in any obvious direction (table 2), and no statistically significant differences were found between those with low-back symptoms and those without. The influence on reproducibility for neck and shoulder symptoms was also examined, with similar results. The influence of symptom status in 1995 on reproducibility did not differ when compared with the influence of symptom status in 1989.

Validity of retrospective questionnaire information
The correlation between workplace measurements and questionnaire reports was high for sitting and repetitive work, moderate for perceived exertion, trunk flexion, and kneeling and squatting, but poor for neck flexion and neck rotation (table 3).
The correlation (r_s) among the men exceeded that among the women for the item on perceived exertion (P=0.02). Higher correlation among the women than among the men was found for the item on repetitive work, especially with the 6-degree ordinal scale (P=0.07), but to a lesser extent with the 5-degree ordinal scale (P=0.27). The items on sitting, kneeling and squatting, and trunk bent forward showed only minor gender differences. The 2 questions on head postures yielded a low correlation for the women and could not be evaluated for the men because of the few measurements performed (table 3).
Among the subjects with low-back symptoms the coefficient of correlation (r_s) was higher than among those without symptoms for the item on perceived physical exertion (P=0.03). Higher (not statistically significant) agreement among those without low-back symptoms was noted for the item on forward bending. The item on kneeling and squatting showed significantly higher agreement among the subjects with low-back symptoms (P=0.01). No trend could be seen for the 1995 responses regarding sensitivity and specificity between those with or without low-back symptoms.
The distribution of the responses was highly skewed for some items, and sensitivity and specificity were therefore not calculated for all the subgroups (table 3). When cut-off points based on current knowledge about adverse effects on the musculoskeletal system were used, slightly lower sensitivity and specificity values were obtained in comparison with the corresponding values for cut-off points based on median values.

Discussion
In this study of people interviewed on 2 occasions, the agreement between self-reports of current conditions and self-reports of the same conditions after a delay of 6 years was fair to excellent (κ_w >0.40) for 9 of 13 items, namely, perceived exertion, sitting, kneeling and squatting, head rotation, hands above shoulder level, repetitive finger movements, and the 3 items on the manual handling of loads. Low reproducibility (κ_w <0.40) was found for 4 items on head and trunk posture and repetitive hand movements (table 2).
Differences between the previously reported 2-week test-retest reproducibilities of the questionnaire items used in the Stockholm MUSIC study (10) and the 6-year reproducibility results in this study were regarded as a measure of memory difficulties during the time period between 1989 and 1995. For 7 of the 13 items (sitting, kneeling and squatting, head rotation, hands above shoulder level, repetitive finger movements, manual handling of loads between 6 and 15 kilograms, and manual handling of loads between 16 and 45 kilograms), a decrease in the correlation coefficient of less than 0.10 between the 2-week and 6-year reproducibility was found, indicating only minor memory difficulties (average 2-week reproducibility 0.64, average 6-year reproducibility 0.62). The corresponding average reproducibility values for the 6 remaining items were 0.63 for 2 weeks and 0.43 for 6 years. The results are in accordance with the previously reported results, with comparisons of different time intervals between test and retest (11), and also with the results of a study of physical activities during leisure time (15). The majority of the subjects in our study (about 75%) were still working in the same occupations in 1995 as in 1989. This circumstance may lead to an underestimation of memory difficulties in questions about physical work loads 6 years back, if the subjects used current work as a proxy for their work situation in 1989. Thirteen subjects in the present study changed jobs between 1989 and 1995, 8 of them to jobs with a similar level of physical load, and 5 to higher or lower physical load, based on an evaluation of the job titles. Physical work loads often decrease with age, especially among men (16), and work loads in the past would thereby probably be underreported if the reports were influenced by the current work situation.
However, this is less likely to be a big problem in this study, which showed mainly unchanged or even slightly higher responses in 1995 for work loads 6 years earlier, compared with the responses in 1989 (table 2).
The validity of the self-reported exposures was good for the items on sitting and repetitive work, and intermediate for the items on perceived exertion and forward bending of the trunk. The items on head postures showed poor results, a finding in line with previously reported results for these items (5). The item on kneeling and squatting showed less agreement than in the study by Wiktorin et al (5).
Table 3. Agreement calculated as the Spearman rank correlation (r_s) between questionnaire responses in 1995 concerning physical load at work in 1989 and reference measurements at the workplaces in 1989. Sensitivity and specificity were calculated using the workplace measurements as criterion values, and the variables were dichotomized into 2 classes (exposed and unexposed) by the median value. The study group was divided into different groups according to gender and low-back symptom status during the last 12 months before answering the questionnaire in 1995.
A major concern in epidemiologic studies of physical work loads and musculoskeletal health is whether the musculoskeletal symptom status may cause differential misclassification of exposure. If so, the point estimates of relative risk could be either under- or overestimated, depending on the direction of the bias. In view of possible gender differences, agreement between questionnaire responses and workplace measurements in relation to musculoskeletal symptom status should preferably be evaluated in groups with a similar gender distribution. This was the case regarding low-back symptoms, for which the men constituted approximately 35% of both the group with and that without symptoms (except for the items on head posture). A comparison of the subjects with and without symptoms did not indicate any tendency for the subjects with symptoms to report higher or lower exposure levels than the symptom-free subjects (table 2). However, the precision of the questionnaire responses (random error) could still cause differential misclassification if the subjects with musculoskeletal symptoms remember their previous work loads more exactly than those without symptoms. The rank correlation coefficients for the subjects with low-back symptoms during the last 12 months differed from the corresponding correlation for those without symptoms for only a couple of items.
However, these differences in agreement between symptom groups were not associated with differences between calculated prevalence ratios based on questionnaire responses and those based on reference measurements. Thus there appears to have been no serious differential misclassification of exposure in our study. This conclusion is in accordance with a previously reported evaluation of present physical work loads obtained with the MUSIC questionnaire (5), but not with the results of a recently published study of forest industry workers in Finland (4).
The additional items used in 1995 were constructed on the basis of experiences from the Stockholm MUSIC study, but with a focus on physical work loads in the past and in the general population, and some changes in the wordings of items and the response scales were therefore considered necessary. These changes limited the items that could be validated in relation to workplace measurements performed in 1989. Both the old and the new (additional) items could be regarded as suitable for collecting information on historical physical work loads. However, the new items could be recommended in studies of the general population, as the ordinal response scale was found to generate better (ie, wider) response distributions in cases of predominantly low-exposure levels, which are often found in population studies. This finding is in accordance with the intentions when the new items were created.
Comparisons between previously reported validity of the self-reported physical work loads at present (5) and in the past (the present study) could be made for 6 items with comparable cut-off points in the 2 studies. Approximately unchanged and acceptable correlation values were found for daily time spent sitting and for forward bending of the trunk. Unchanged and "low agreement" were noted for the item on head rotation, and decreased agreement was found for kneeling and squatting and for forward bending of the head. Despite the promising results for some of the items evaluated in this study, no general recommendation for the use of these items in epidemiologic studies can be made. However, the validity of some of these self-administered questionnaire items (perceived exertion, sitting, kneeling and squatting, trunk flexion, and repetitive work) is believed to be sufficient for a rough assessment of physical work loads in the past in studies of general population groups.