Misclassification of physical work exposures as a design issue for musculoskeletal intervention studies.

OBJECTIVES
This study determined the impact of misclassification due to using job titles as surrogate variables for physical work exposures to assess confounding in a study of the preventive effect of back belts on back injury. The authors present retail merchandise data that quantify misclassification from residual confounding by physical work exposures on injury rate ratios when available administrative job titles are used.


METHODS
Job title and direct observation data on 134 workers were used to calculate the percentage to which the job-title-adjusted rate ratio for back injury accounts for confounding by the true physical work exposures, awkward postures, and heavy weight handling. Workers' compensation data, an estimate of the effect of back belts from the literature, and the percentage of adjustment of the rate ratio due to the job title variable were used to calculate the magnitude of bias from the rate ratio adjusted for job title.


RESULTS
The job title variable was found to have sensitivities of 97% and 85% and specificities of 68% and 58% for awkward postures and heavy weight handling, respectively. The magnitude of confounding bias remaining for the back-injury rate ratio when the job title surrogate was used was 24% for postures and 45% for heavy weight handling.


CONCLUSIONS
The administrative job title performed poorly in this setting; residual confounding was sufficient to bias the rate ratio from 2.0 to 1.3. The effect of additional sources of misclassification and the need for better exposure measures than job title are discussed.

Job title has been used in hundreds of occupational health studies as a surrogate for workplace exposures to environmental substances or physical factors (noise, heat, physical demands). Where substances in the workplace are causally associated with cancer or chronic diseases, job titles and work histories can frequently be transformed to exposure estimates (1). For occupational back injury studies, job title may act as a surrogate for physical work exposures, and, if there is more detail in the job title, such as work tasks or activities, the transformation to an exposure estimate is better. However, the underlying physical work exposure measures that are the actual causes of work-related injuries or disorders are generally not assessed; therefore the accuracy with which any particular job title reflects the underlying physical work exposure measure is usually unknown. Postures and weight lifted have been consistently reported as risk factors in epidemiologic studies of low-back pain and back injury (2)(3)(4)(5). Low-back disorders are multifactorial, and, while factors such as vibration and heavy work have also been reported by several studies, postures and weight lifted are invariably reported as risk factors, and they are reported in a variety of settings and industries (6).
Most occupational epidemiologic studies of back disorders have used job titles as a surrogate measure for physical work exposures. But the exclusive assessment of physical work exposures in occupational studies of musculoskeletal disorders by job titles has been extensively criticized (1,7,8,9,10,11). Job titles are used as surrogate variables under the presumption that the information in job titles is "good enough"; this may be a reasonable argument when the effect of the primary study variable is strong. However, in studies in which the effect of the primary study variable is weak when compared with the effect of a strong confounder, misclassification of the surrogate can reduce the power of the study and lead to seriously misleading inferences (12,13).
The situation of a strong confounder and a weak study variable arises in evaluations of the effect of back belts (primary study variable) while controlling for physical work exposures (confounders) among retail merchandise store workers. In large-scale epidemiologic back-belt studies, with limited control over the use of belts by participants, the need to estimate physical work exposures accurately gains in importance (14,15). During the design of a back-belt intervention study (a study of approximately 10 000 workers in 160 stores) (16) for the National Institute for Occupational Safety and Health (NIOSH), it was felt that, if administrative job titles were the sole estimate of physical work exposures, residual confounding because of incomplete control of the physical work factors might remain even after adjustment of the rate ratio. For an estimation of this bias from surrogate variables, a more direct measure of physical work exposures for this population was needed. Because direct measures of physical work exposure were available only with a great expenditure of effort, a 4-store job analysis and pilot survey of systematically observed postures and weight handling was carried out (17).
The data presented illustrate residual confounding due to misclassification bias occurring from using easily obtained administrative job titles as a surrogate for physical work exposures among retail merchandise workers. The misclassification in this setting concerns biased estimates of the effect of back-belt wearing on back injury. The data are from (i) pilot survey data from 134 workers in 4 stores providing belt-wearing data, observations of postures and weight handling, and a 3-level job classification and (ii) back-injury rates by a 2-level job-classification variable using workers' compensation data from 31 076 workers in 260 stores of the same chain. The objectives of this report are to determine the sensitivity and specificity of the job titles from the administrative data from the pilot survey for low versus high levels of physical work exposure and to estimate the amount of bias which would remain for the rate ratio in a back-belt intervention study if the job titles were used as the sole control of confounding by physical work exposures.

Administrative job titles and workers' compensation injury rates
In these retail merchandise stores, 2 job titles (department manager and stocker or unloader) have significant exposure to back stressing activities; however, stockers and unloaders typically handle much heavier case weights of merchandise and are particularly highly exposed during the unloading of trucks. Using workers' compensation and payroll data, we previously showed that these 2 job titles had incidence rates of 1.82 and 3.64 injuries per 100 workers per year, respectively, for back injury. The data came from administrative payroll records and workers' compensation reports supplied from the corporate databases of the retail merchandise company for 1994-1995 (18). The 2-category job title combined stockers and unloaders into a single entity. Because stockers and unloaders share many of the same duties, particularly on the 1000 to 0600 shift, the combination was administratively logical. However, day-shift stockers are assigned jobs with significantly lighter physical work requirements due to the lack of truck unloading during the day, and the majority of stockers are employed on the day shift. Thus there is a potential for the misclassification of back-stressing exposures when a job title combining stockers and unloaders is used.

In-store survey and job analyses
An assessment of the workers' physical work exposures was carried out in an in-store survey of 134 retail merchandise workers in 4 West Virginia stores. A detailed description of the data collection procedures, interobserver agreement, and distributions of the data has been presented by Pan et al (17). A form of systematic observation which registers work at the level of activities was needed. The authors used the PATH (postures, activities, tools and handling) observation method because it was well adapted to work with work cycles of variable lengths and permitted the registration of activities (19). The 4-store survey was carried out for the same job titles and the same chain of stores as the NIOSH back-belt intervention study in order to test the systematic observation methods. For the current report, the in-store survey provided data on physical work exposure variables, job titles, and back-belt use. The posture and weight-handled variables in the 4-store survey were used as the standard risk factors against which to calculate the sensitivity and specificity of the 2-category job titles. The sensitivity and specificity of the job titles for the dichotomized risk factors were determined from a 2 x 2 table. The sensitivity and specificity of the job title were assessed for 134 workers separately for weight handled and posture, based on 11-15 minutes of observation per person, using multiple observations per person during the observation interval (17).
For each person observed in the pilot study, the weight handled was recorded at each time point as (i) no weight, (ii) <0.4 kg, (iii) 0.4-10 kg, (iv) 10-20 kg, or (v) >20 kg. Persons observed handling weights in the 0.4 to 10 kg or heavier categories at least 25% of the time, and also handling >10 kg at least 10% of the time, were classified as "heavy lifters." These workers were thus lifting at least moderate weights some of the time in addition to handling heavy weights at least occasionally. All others were classified as "not heavy lifters." For each worker, the trunk posture was also recorded as neutral or nonneutral at each time point. Workers observed with a nonneutral posture at least 25% of the time were categorized as having a "nonneutral posture". All others were categorized as having a "neutral posture". Frequencies of job title by weight handled, job title by posture, and job title by belt wearing status have been previously published (17). Sensitivity and specificity between job title and the definitions of posture and weight handled were calculated.
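The two classification rules above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the authors' analysis code; the function names and the category labels are assumptions introduced here.

```python
# Illustrative sketch of the dichotomization rules described above.
# Function names and category labels are hypothetical, not from the study.

MODERATE_OR_HEAVIER = {"0.4-10", "10-20", ">20"}  # kg categories at or above 0.4 kg
OVER_10_KG = {"10-20", ">20"}                     # kg categories above 10 kg


def is_heavy_lifter(weights):
    """weights: one weight-category label per observation time point."""
    n = len(weights)
    if n == 0:
        raise ValueError("no observations for this worker")
    share_moderate = sum(w in MODERATE_OR_HEAVIER for w in weights) / n
    share_over_10 = sum(w in OVER_10_KG for w in weights) / n
    # Both conditions must hold: at least 25% of time points at 0.4 kg or more,
    # and at least 10% of time points above 10 kg.
    return share_moderate >= 0.25 and share_over_10 >= 0.10


def has_nonneutral_posture(postures):
    """postures: 'neutral' or 'nonneutral' per observation time point."""
    n = len(postures)
    if n == 0:
        raise ValueError("no observations for this worker")
    return sum(p == "nonneutral" for p in postures) / n >= 0.25


# Made-up observation series for two hypothetical workers.
worker_a = ["none"] * 6 + ["0.4-10"] * 2 + ["10-20"] * 2
worker_b = ["none"] * 7 + ["0.4-10"] * 3
print(is_heavy_lifter(worker_a), is_heavy_lifter(worker_b))
```

Worker A meets both thresholds; worker B handles moderate weights often enough but never exceeds 10 kg, so the second condition fails.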

Quantifying confounder misclassification
Because misclassifying confounding variables can be as damaging as misclassifying primary intervention variables, we used the method described by Savitz & Baron (20) to quantify the bias introduced by the incomplete control of confounding. The following equation calculates the completely adjusted rate ratio ($RR_{CA}$), which is the rate ratio adjusted for the true confounder:
$$RR_{CA} = \frac{RR_{PA} - RR_{C}}{PA_{RR}} + RR_{C}$$
where $RR_{PA}$ is the partially adjusted rate ratio, which is the rate ratio adjusted for the surrogate measure only, $RR_{C}$ is the overall crude rate ratio for the intervention variable, and the percentage of adjustment of the rate ratio, $PA_{RR}$, is a quantity between 0 and 1 that represents the degree to which adjustment for the surrogate variable accounts for confounding by the true confounder.
With perfect classification of the confounder (by the surrogate variable), the percentage of adjustment of the rate ratio takes the value 1 and the completely adjusted rate ratio equals the partially adjusted rate ratio. For less than perfect confounder classification, the percentage of adjustment of the rate ratio is calculated as a function of the specificity and sensitivity between the surrogate measure and the true confounder. This relationship is given by the curves in figures 1, 2, and 3 in the report by Savitz & Baron (20), which show the residual confounding that remains in the odds ratio of the study variable at predetermined levels of the primary variable and confounder odds ratios. The magnitude of bias in the partially adjusted rate ratio, when compared with the completely adjusted rate ratio, is calculated as $|RR_{PA} - RR_{CA}| / RR_{CA}$; it reflects the classification accuracy and the extent of confounding between the primary and confounder variables.
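As a minimal sketch, the adjustment and the bias measure can be written as two small Python functions. The adjustment follows the form given in the Results, completely adjusted rate ratio = (partially adjusted − crude) / percentage of adjustment + crude. The function names and the example inputs are assumptions chosen for illustration; they are not values from this study.

```python
def completely_adjusted_rr(rr_pa, rr_c, pa_rr):
    """Rate ratio adjusted for the true confounder.

    rr_pa: partially adjusted rate ratio (adjusted for the surrogate only)
    rr_c:  crude rate ratio
    pa_rr: percentage of adjustment of the rate ratio, as a fraction in (0, 1]
    """
    if not 0 < pa_rr <= 1:
        raise ValueError("pa_rr must lie in (0, 1]")
    return (rr_pa - rr_c) / pa_rr + rr_c


def bias_magnitude(rr_pa, rr_ca):
    """Relative bias of the partially adjusted rate ratio."""
    return abs(rr_pa - rr_ca) / rr_ca


# Hypothetical inputs: a surrogate that removes half of the confounding.
rr_ca = completely_adjusted_rr(rr_pa=1.5, rr_c=1.0, pa_rr=0.5)
print(rr_ca, bias_magnitude(1.5, rr_ca))

# With perfect classification (pa_rr = 1) no bias remains.
print(completely_adjusted_rr(rr_pa=1.5, rr_c=1.0, pa_rr=1.0))
```

The smaller the percentage of adjustment, the farther the completely adjusted rate ratio lies from the crude one, so a poorly classifying surrogate leaves a larger share of the confounding uncorrected.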
For this study, nonbelt wearing (as compared with belt wearing) was considered the primary intervention variable, and ergonomic exposure was the true confounder. Since determining the optimal characterization of ergonomic exposure is difficult, trunk posture and weight handled were each considered as the true confounder. The 2-category job title, obtained through company records, was defined as the surrogate measure. The partially adjusted rate ratio, $RR_{PA}$, was estimated as 1.3 on the basis of prior knowledge from 5 published studies on the effect of back belts on back injury or back pain (21)(22)(23)(24)(25). The crude rate ratio, $RR_{C}$, has been calculated as a function of (i) $RR_{PA}$, (ii) the incidence of back injury by job title and belt-wearing status, and (iii) the distribution of workers by job title. (See the appendix.) Estimates for incidence rates were obtained from the analysis of 260 stores of the same retail chain during 21 months in 1994-1995 (18).

Results
In back-injury prevention studies a confounding variable must be associated with the outcome, back injury, and may be correlated either positively or negatively with the primary study variable, wearing back belts. In the survey of workers' compensation data from 260 stores, the association between job title and back injury had a rate ratio of 2.0 for stockers or unloaders versus department managers [95% confidence interval (95% CI) 1.72-2.34]. In the pilot survey of 4 stores of the same chain, the association of the administrative job title variable with belt-wearing status had a prevalence ratio of 2.32 for stockers or unloaders versus department managers (95% CI 1.51-3.56).
The distribution of the back-stressing risk factors "weights handled" and "bending and twisting of the trunk" showed considerable heterogeneity across the job titles (17). Stockers and unloaders were observed in bent or twisted postures 43% and 63% of the time, respectively, which was 6 to 9 times as often as for department managers, who were only observed in such postures 7% of the time. However, only unloaders were observed doing heavy handling a significant percentage of the time (27%), compared with 0% for stockers and 3% for department managers. All further calculations collapse stockers with unloaders due to the limitations of the 2-level job-title variable of the payroll data.
Weight handled was defined as "heavy and not heavy" and postures as "nonneutral and neutral" for the purposes of determining the sensitivities and specificities as described. Table 1 shows the sensitivity and specificity for the 2-category job-title variable with respect to the risk factor of nonneutral postures. The sensitivity to detect nonneutral postures (97%) was very good, but the specificity for neutral postures of 68% was only fair. Similarly, table 2 shows that the sensitivity for heavy weight handling was good (85%), but the specificity for nonheavy weight handling was only fair (58%). The poorer specificities were due to the fact that stockers represented a very mixed job title, with some stockers handling exclusively light-weight merchandise.
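For readers less familiar with the calculation, sensitivity and specificity of a dichotomous job title against an observed risk factor follow directly from the four cells of a 2 x 2 table. The sketch below uses made-up counts purely for illustration; they are not the counts behind tables 1 and 2, and the function and argument names are assumptions.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity and specificity of a surrogate against a reference standard.

    tp: exposed by observation AND in the high-exposure job title
    fn: exposed by observation BUT in the low-exposure job title
    fp: unexposed by observation BUT in the high-exposure job title
    tn: unexposed by observation AND in the low-exposure job title
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (NOT study data): 50 exposed and 50 unexposed workers.
sens, spec = sensitivity_specificity(tp=45, fn=5, fp=20, tn=30)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A mixed job title, such as stockers who include both heavy and exclusively light handlers, inflates the false-positive cell and therefore lowers specificity while leaving sensitivity high.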
The Savitz & Baron equation (20) requires specifying a partially adjusted (based on job title) rate ratio for back injury, a crude rate ratio, and the percentage of adjustment of the rate ratio estimated from the sensitivity and specificity of the job title in order to estimate a completely adjusted rate ratio. Derivation of the crude rate ratio of 1.0 using the belt-wearing distribution data and injury incidence data is given in the appendix. The negative confounding (ie, crude rate ratio biased towards the null) reflects the fact that among the 134 workers, there was a tendency for those doing heavier physical work, mostly stockers and unloaders, to wear back belts (ie, prevalence ratio of 2.32 for back-belt wearing). The stockers and unloaders also had a history of the highest workers' compensation back-injury rates. Taken together, these factors bias the rate ratio for back belts towards the null. A further implication is that better statistical control of physical work-exposure variables would lead to rate ratio estimates farther away from the null than 1.3. Figure 1, reproducing figure 2 from Savitz & Baron (20), plots the sensitivities and specificities for postures and weights handled from tables 1 and 2 and derives the percentage of adjustment of the rate ratio. From the appendix we have the back injury incidence rates from reference 18 and the distribution of belt wearing by job title from reference 17. Given the estimate of a partially adjusted rate ratio ($RR_{PA}$) of 1.3, we then calculated the rates of back injury in each of the 4 groups (job title x belt wearing status). Then, given the distribution of job titles among the 134 workers in the pilot survey, we pooled the incidence rates for belt-wearing workers and analogously for nonbelt wearing workers to derive the crude rate ratio ($RR_{C}$) for back injury for nonbelt versus belt wearing.
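The pooling step can be illustrated with a short Python sketch. It is one plausible reading of the procedure described above, not a reproduction of the appendix. The incidence rates (1.82 and 3.64 per 100 workers per year) and the within-job-title rate ratio of 1.3 come from the text; the belt-wearing proportions and the job-title shares are hypothetical values chosen only so that their ratio matches the reported prevalence ratio of 2.32.

```python
def stratum_rates(incidence, p_belt, rr_pa):
    """Split a job title's overall incidence into belt and nonbelt rates.

    Assumes that within a job title the nonbelt rate is rr_pa times the belt
    rate, and that incidence = p_belt*rate_belt + (1 - p_belt)*rate_nobelt.
    """
    rate_belt = incidence / (p_belt + rr_pa * (1.0 - p_belt))
    return rate_belt, rr_pa * rate_belt


def crude_rate_ratio(jobs, rr_pa):
    """Pool over job titles and return the nonbelt vs belt crude rate ratio."""
    belt_cases = belt_workers = nobelt_cases = nobelt_workers = 0.0
    for job in jobs:
        rate_belt, rate_nobelt = stratum_rates(job["incidence"], job["p_belt"], rr_pa)
        belt_workers += job["share"] * job["p_belt"]
        belt_cases += job["share"] * job["p_belt"] * rate_belt
        nobelt_workers += job["share"] * (1.0 - job["p_belt"])
        nobelt_cases += job["share"] * (1.0 - job["p_belt"]) * rate_nobelt
    return (nobelt_cases / nobelt_workers) / (belt_cases / belt_workers)


# "incidence" values are from the text; "p_belt" and "share" are HYPOTHETICAL.
JOBS = [
    {"name": "department manager", "incidence": 1.82, "p_belt": 0.25, "share": 0.45},
    {"name": "stocker or unloader", "incidence": 3.64, "p_belt": 0.58, "share": 0.55},
]

print(crude_rate_ratio(JOBS, rr_pa=1.3))
```

Under these assumed inputs the crude ratio falls well below the within-job-title ratio of 1.3, because belt wearers are concentrated in the job title with the higher injury rate. This is the negative confounding pattern described in the paragraph above.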
Once the partially adjusted rate ratio ($RR_{PA}$) and the crude rate ratio ($RR_{C}$) are known, the completely adjusted rate ratio can be calculated by substitution as $RR_{CA} = [(RR_{PA} - RR_{C}) / PA_{RR}] + RR_{C}$. Table 3 presents the partially and completely adjusted rate ratios, the percentage of adjustment of the rate ratio (from figure 1) and the magnitude of bias in the partially adjusted rate ratio. The magnitude of bias remaining in a back-belt variable adjusted only for a 2-category job title ranged from 24% to 45%. With the use of the average of the completely adjusted rate ratios [ie, (2.4+1.7)/2], this level of residual confounding was sufficient to bias a rate ratio from 2.0 to 1.3.
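The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (partially adjusted rate ratio 1.3, crude rate ratio 1.0, completely adjusted rate ratios 1.7 and 2.4). The percentages of adjustment it prints are back-calculated by rearranging the equation and are not values read from figure 1 or table 3; pairing 1.7 with postures and 2.4 with weight handling is inferred from the reported bias magnitudes.

```python
RR_PA = 1.3  # partially adjusted rate ratio (from the text)
RR_C = 1.0   # crude rate ratio (from the text)

# Completely adjusted rate ratios quoted in the text.
reported_rr_ca = [1.7, 2.4]

for rr_ca in reported_rr_ca:
    bias = abs(RR_PA - rr_ca) / rr_ca
    # Rearranged adjustment equation; an implied value, not a published one.
    implied_pa_rr = (RR_PA - RR_C) / (rr_ca - RR_C)
    print(f"RR_CA={rr_ca}: bias={bias:.1%}, implied PA_RR={implied_pa_rr:.2f}")

mean_rr_ca = sum(reported_rr_ca) / len(reported_rr_ca)
print(f"mean RR_CA={mean_rr_ca:.2f}")
```

The bias for a completely adjusted rate ratio of 1.7 reproduces the reported 24%. For 2.4 the rounded inputs give about 46% rather than the reported 45%, presumably because the published rate ratios are themselves rounded. The mean of 2.05 corresponds to the rate ratio of 2.0 cited in the text.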

Discussion
A major problem confronting epidemiologists concerned with evaluating intervention programs to reduce musculoskeletal disorders is the difficulty of estimating relevant physical work-exposure variables for the study participants. This problem is particularly important for studies in which the intervention is applied to the individual (such as screening or the use of protective equipment). A review of exposure assessments of risk factors for back disorders conducted in occupational epidemiologic studies concluded that the quality of exposure data in most of the studies reviewed was poor (7). Thirty-eight of the 81 reviewed studies (47%) used job titles or categories as their sole measure of exposure to risk factors for back problems. As has been stated repeatedly, the methods of assessing work exposure in studies of work-related musculoskeletal disorders need to be improved (10,26,19), and job title is frequently a poor surrogate for physical work load in that it makes it difficult or impossible to establish the exposure-response relationship (15). While it can be argued that job titles present objective information, the amount of information broad job categories capture is really very limited, and in industries such as construction, agriculture, and retail store work, long and irregular work cycles further reduce the correlation between broad job categories and the underlying risk factors for back problems. Because of the pervasive danger of obtaining a biased rate ratio due to inadequate control of confounding by a job title variable, we sought to determine the magnitude of bias that might occur. An estimate of the likely direction and magnitude of this type of bias is helpful in justifying the need for additional resources to estimate variables of physical work exposure more accurately in the NIOSH intervention study of the effects of back belts (16), as well as in justifying the use of job observation in future studies.
One published study concerning back belts reported evidence of negative confounding by variables of physical work exposure (23). In that study the crude odds ratio for not wearing a belt was 1.17, which increased to 1.67 when control for estimated weight lifted per day was introduced.
A more direct measure of the underlying variables for physical work exposure, other than job title, was desired, but the alternatives (questionnaires, diaries, systematic observations and direct measurements) did not provide any obvious easy choices for replacing job titles. For a detailed discussion of the trade-offs among these 4 methods see Winkel & Mathiassen (27) and Kilbom (15). Each of the 4 alternative methods has advantages and drawbacks. Direct measurements using instruments such as lumbar motion monitors, videos for motion analysis, goniometers, and inclinometers generate highly reproducible data but cannot feasibly be employed at the worksite when thousands of workers are spread across 160 worksites. Questionnaires or interviews about postures and manual materials handling are much easier to deploy in a large study, but may have validity problems. In a large validation study, self-reports of extreme postures and heavy lifting agreed well with results obtained with direct instrumentation or systematic observation, but less extreme postures and the lifting of weights below 10 pounds (4.5 kg) had universally poor agreement (7). Two other validation studies have found generally poor agreement between questionnaire-reported postures and loads and direct observations (28,29). The agreement between self-assessment of the number of task repetitions and the true number of repetitions has been found to vary greatly (30). Diaries have not been used extensively in workplace studies, even though multiple entries may reduce measurement error. Because they record activities concurrently, diaries may suffer from less recall bias than questionnaires do (31). Systematic observations offer a compromise since they are less expensive and more suitable to a crowded workplace than direct instrumentation and they are potentially more reproducible than questionnaire estimates of posture angles and discrete weights because of the use of trained observers.
Systematic observation methods have been used in numerous workplace studies, with manual or computer-assisted data entry, using time-sampling or real-time measurements. It was felt that a time-sampled version of systematic observations on a reasonable-sized subset of the study population could supply estimates of postures and weights lifted and could be superior to a surrogate such as job title.
In the example presented, misclassification relative to systematic observations was introduced through the use of a dichotomous job title as a surrogate for physical work exposures. No simple rule can be adduced from these data to determine the proper number of levels for a better surrogate variable for work exposure. Administratively constructed data are created for the convenience of company pay and personnel management, and they bear little or no relation to what is desirable for epidemiologic studies. Therefore, it is difficult ever to know the amount of misclassification inherent in administrative data-set variables without conducting job analyses that break the job titles down into more homogeneous entities. In the example presented, a 2-level job-title variable was not good enough as a surrogate for physical work exposures; the estimated rate ratios for back injuries due to not wearing back belts would have been biased towards the null by 24% to 45%. Adding levels to a surrogate variable for work exposure is a sensible way to protect against this type of misclassification bias; general recommendations call for using 4 or 5 levels for confounding variables (14).
Misclassification of a confounder, as outlined for this population of retail merchandise workers, is more of a threat to validity when the study variable is a weak risk factor in comparison with the confounding factor (12,13). The rate ratio for the highest to lowest level of work exposure, as estimated conservatively from the workers' compensation data, is at least 2. By contrast, all the published risk ratios for back belts have been 1.6 or below. In addition to the concern raised by a weak study variable, there is the issue of whether other sources of error are likely to cancel or exaggerate residual confounding from the misclassified confounder (20). Nondifferential misclassification of the primary study variable (back-belt wearing) tends to bias the rate ratio towards the null and exacerbate the residual negative confounding by the misclassified confounder. Knowing the likely direction of the measurement biases in advance is helpful in determining which method should be used to assess physical work exposures.
While the best overall replacement (ie, direct instrumentation, self-administered questionnaires, telephone interviews, diaries, systematic observations) for administrative job title data remains unresolved, there are some features of the systematic observation method that are attractive for large epidemiologic studies and musculoskeletal disorder intervention. First, several specialized instruments for systematic observation have already been developed; these have been reviewed by Kilbom (15). Second, the PATH method is particularly well suited for use in epidemiologic studies because it was designed to quantify musculoskeletal risk factors by task (14). Following a systematic observation of a representative sample enrolled in an epidemiologic study, subsequent self-reports from the workers can quantify the individual duration of tasks and, from this information and the distribution of risk factors within tasks, the exposure to the work-related risk factors can be estimated. Such combined measurement methods have previously been recommended (19,32). The process of characterizing the entire study population with a representative sample can be formally accomplished through imputation or multiple imputation methods (33). Iterative procedures, such as the EM algorithm (34), use the subset of complete data to calculate maximum likelihood estimates. By establishing a relationship between the likelihoods of the complete and incomplete data, optimal values are then imputed for missing observations.
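The combined approach described in this paragraph, in which observed risk-factor distributions within tasks are weighted by each worker's self-reported task durations, amounts to a time-weighted average. The following is a minimal sketch under that reading; the task names, fractions, and function name are hypothetical and do not come from the PATH data.

```python
def estimated_exposure(task_time_fractions, risk_fraction_by_task):
    """Time-weighted share of the workday spent exposed to one risk factor.

    task_time_fractions:   self-reported fraction of the workday per task
    risk_fraction_by_task: observed fraction of task time with the risk
                           factor present (from systematic observation)
    """
    total = sum(task_time_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("task time fractions must sum to 1")
    return sum(
        fraction * risk_fraction_by_task[task]
        for task, fraction in task_time_fractions.items()
    )


# Hypothetical observed share of time in nonneutral trunk posture, by task.
observed_nonneutral = {"unloading": 0.6, "stocking": 0.4, "customer service": 0.1}

# Hypothetical self-report from one worker.
self_report = {"unloading": 0.5, "stocking": 0.3, "customer service": 0.2}

print(estimated_exposure(self_report, observed_nonneutral))
```

Because the task-level distributions come from a sample of observed workers, the resulting individual estimate inherits both the sampling error of the observations and the reporting error of the self-reported durations.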
There are some limitations to the use of systematic observation methods. The method we employed, the PATH method, requires instantaneous registration of all work postures, activities, and weights at a single moment in time. Therefore, the precision of work-exposure estimates with that method is still limited to only a few categories. For many epidemiologic studies, the requirements for precision are not so great, and the exposure estimates based on the imperfect standard of time-sampled systematic observations may be adequate. Another limitation with the systematic observation method is that workplaces must be accessible and the workers must be followed at close range while working. In the retail stores of this study, this was not much of an issue, but in other settings, such as in construction or agriculture, viewing workers at close range can be a problem. Finally, the accuracy of estimation for a person when self-report data are combined with the distribution of risk factors within worker subgroups is difficult to assess. These estimates are dependent on many worker factors, including intraindividual task variability, recall bias, language problems, and difficulty in correctly estimating the percentage of the workday spent in each activity (35).
In a review of exposure assessment in occupational epidemiology (1), a concluding recommendation was "to diminish misclassification, the best feasible methods should be used to assess work exposure and collect work histories" [p 27]. Because the best methods of assessing physical work exposures will continue to be debated, this statement can be restated for musculoskeletal studies to say "use methods that are good enough to measure critical musculoskeletal risk factors with enough validity that study measures (rate ratios, odds ratios) are substantially unbiased". In our opinion, the best feasible methods for musculoskeletal epidemiologic studies, particularly intervention studies, should include some combination of systematic observations, job analysis, and collection of self-report task data. An approach using these methods puts the burden on the researcher to control the quality of work exposure data and shifts the discussion of exposure assessment away from administratively determined job titles.