Reducing random measurement error in assessing postural load on the back in epidemiologic surveys

Objectives The goal of this study was to design strategies to assess postural load on the back in occupational epidemiology by taking into account the reliability of measurement methods and the variability of exposure among the workers under study. Methods Intermethod reliability studies were evaluated to estimate the systematic bias (accuracy) and random measurement error (precision) of various methods to assess postural load on the back. Intramethod reliability studies were reviewed to estimate random variability of back load over time. Results Intermethod surveys have shown that questionnaires have a moderate reliability for gross activities such as sitting, whereas duration of trunk flexion and rotation should be assessed by observation methods or inclinometers. Intramethod surveys indicate that exposure variability can markedly affect the reliability of estimates of back load if the estimates are based upon a single measurement over a certain time period. Equations are presented to evaluate various study designs according to the reliability of the measurement method, the optimum allocation of the number of repeated measurements per subject, and the number of subjects in the study. Conclusion Prior to a large epidemiologic study, an exposure-oriented survey should be conducted to evaluate the performance of measurement instruments and to estimate sources of variability for back load. The strategy for assessing back load can be optimized by balancing the number of workers under study and the number of repeated measurements per worker.

Mechanical load is regarded as a primary cause of musculoskeletal disorders. In epidemiologic studies no instruments are available with which to measure mechanical load directly on specific segments of the human body. Hence biomechanical models are commonly used to estimate the forces and moments acting on the location of interest. For example, a biomechanical model has been developed to predict compression and shear forces at the level of the L5-S1 intervertebral disc; it takes into account the weight and position of the trunk, the arms, the upper and lower legs, and any external load if present (1). Similar approaches using biomechanical models to estimate postural load on other body segments have been published for the shoulder (2) and the neck (3). When biomechanical models are applied to predict postural load, exposure to postural load can be estimated by measuring the angular position of the body segment of interest.
Several techniques have been developed to measure the distributions of angular position of body segments of workers performing their regular activities. These techniques range from subjective methods such as questionnaires to objective methods with real-time recording of the posture of the human body (4). Questionnaires can be used to collect reasonably simple data, but detailed and complex data cannot be sought without risk of substantial measurement error (5). Therefore, questionnaires may be too coarse a method with which to arrive at a quantitative conclusion on an individual's exposure. Sophisticated measurement techniques like three-dimensional video analysis may provide satisfactory information, but their applicability in epidemiologic studies is hampered because they are often elaborate, expensive or time-consuming (6). Moreover, these techniques may yield information too detailed to be summarized readily in exposure parameters useful in epidemiologic studies.
In epidemiologic studies the validity and the practicability of a method for assessing postural load must be balanced. Feasibility often necessitates the use of alternative methods that are less precise or less valid or both. This article discusses the choice of a measurement method in relation to exposure assessment strategy. The assessment of postural load on the back is taken as an example, but many considerations also hold for the assessment of postural load on other body segments. This paper starts with a brief review of the instruments of exposure assessment currently used in epidemiologic studies on associations between postural load on the back and the occurrence of back disorders. Errors in exposure assessment are examined in terms of validity and reliability. Finally, the reliability of a particular measurement method is evaluated in relation to the design of a measurement strategy in a particular epidemiologic study.

Assessing the validity and reliability of measurement methods
Validity and reliability studies are essential tools in addressing misclassification and measurement error in categorical and continuous exposure measures, respectively. Moreover, estimates of parameters of validity and reliability can be used to evaluate the attenuation of associations between exposure and disease due to nondifferential measurement error (5). A general definition of validity is the degree to which a measurement measures what it purports to measure (7). The validity of a measurement technique can be derived from a comparison with an instrument that measures accurately (ie, zero bias) and precisely (ie, small random error), like a gold standard that measures the true exposure value, on the average, with a random error sufficiently small considering its purpose.
Fairly often a perfect measurement instrument is not available or is infeasible, so that alternative methods are used to ascertain the validity of a measurement technique. An appropriate approach is to conduct a reproducibility study to obtain indirect information on error distributions (5,8). In reproducibility studies two or more separate assessments of exposure are performed on the same individuals, either by different instruments, if intermethod reliability is to be assessed, or by repeated measurements with the same instrument, if intramethod reliability is to be estimated. The rationale behind this procedure can be made more explicit by considering the relationship between true scores (T) and observed scores (X) with two imperfect measurement techniques:

$$\begin{aligned} X_{1i} &= \alpha_1 + \beta_1 T_i + \varepsilon_{1i} \\ X_{2i} &= \alpha_2 + \beta_2 T_i + \varepsilon_{2i} \end{aligned} \tag{1}$$

where $X_{1i}$ = observed score for subject i at time t with method 1, $X_{2i}$ = observed score for subject i at time t + Δt with method 2, $T_i$ = true score for subject i at time t in a random sample of n subjects, $\alpha_j$ = constant bias, $\beta_j$ = relative bias, and $\varepsilon_{ji}$ = random measurement error of method j.
In studies of intramethod reliability methods 1 and 2 are the same and Δt > 0. In intermethod studies methods 1 and 2 are different and Δt can be zero. Method 1 is considered perfect if $\alpha_1 = 0$ and $\beta_1 = 1$, with $\varepsilon_1$ being sufficiently small. In classical test theory the described parallel test model is used to obtain the reliability coefficient, $\rho_{x_1x_2}$, by correlating parallel measurements on a population of N subjects. Under strict assumptions the reliability coefficient can yield information about the validity coefficient of either method 1 or method 2, although these latter coefficients, $\rho_{x_1T}$ and $\rho_{x_2T}$, cannot be estimated directly from empirical data. In formula form (9):

$$\rho_{x_1x_2} = \rho_{x_1T}\,\rho_{x_2T} = \rho_{xT}^2 \tag{2}$$

Essential assumptions are: normally distributed parameters T, $X_1$, $X_2$, $\varepsilon_1$ and $\varepsilon_2$; $X_1$ and $X_2$ have equal true scores ($\alpha_1 = \alpha_2$ and $\beta_1 = \beta_2 = 1$); random errors with $E(\varepsilon_1) = E(\varepsilon_2) = 0$; equal error variances with $\sigma_{\varepsilon_1}^2 = \sigma_{\varepsilon_2}^2 = \sigma_\varepsilon^2$; and uncorrelated errors with $\rho_{\varepsilon_1\varepsilon_2} = 0$ and $\rho_{\varepsilon_1T} = \rho_{\varepsilon_2T} = 0$.
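Under the strict assumptions listed above, the intermethod reliability equals the squared validity coefficient ($\rho_{x_1x_2} = \rho_{xT}^2$). This can be checked numerically. The sketch below is a minimal simulation that assumes normally distributed true scores and errors, zero constant bias, and no relative bias. The parameter values (`sd_true = 1.0`, `sd_error = 0.7`) and the function name `simulate_parallel_test` are hypothetical choices for illustration and are not taken from any study.

```python
import numpy as np


def simulate_parallel_test(n=100_000, sd_true=1.0, sd_error=0.7, seed=1):
    """Simulate two parallel measurements per subject (alpha_j = 0, beta_j = 1,
    equal and uncorrelated error variances).

    Returns (r12, r1t): the intermethod reliability corr(X1, X2) and the
    validity coefficient corr(X1, T) of method 1.
    """
    rng = np.random.default_rng(seed)
    t = rng.normal(0.0, sd_true, n)        # true scores T_i
    x1 = t + rng.normal(0.0, sd_error, n)  # method 1: true score plus random error
    x2 = t + rng.normal(0.0, sd_error, n)  # method 2: same error variance
    r12 = np.corrcoef(x1, x2)[0, 1]
    r1t = np.corrcoef(x1, t)[0, 1]
    return r12, r1t


r12, r1t = simulate_parallel_test()
expected = 1.0 / (1.0 + 0.7**2)  # sigma_T^2 / (sigma_T^2 + sigma_e^2), about 0.671
print(f"r12 = {r12:.3f}, r1T^2 = {r1t**2:.3f}, theory = {expected:.3f}")
```

In a simulation both quantities are visible and agree closely. In a real reproducibility study only `r12` is observable, which is why the relation is used to infer validity indirectly.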
The assumption of equal error variances may be incorrect, particularly when an exposure measure $X_1$ of moderate precision (eg, derived from a diary) is compared with a measure $X_2$ of greater precision (eg, derived from an observation method). If the other assumptions are not violated, it can be demonstrated that (9):

$$\rho_{x_1x_2} \le \rho_{x_1T} \le \sqrt{\rho_{x_1x_2}} \tag{3}$$

This equation presents a lower and an upper boundary for the validity coefficient of exposure measure $X_1$. The intermethod reliability coefficient is estimated by the Pearson correlation coefficient r for continuous variables and by the Spearman correlation coefficient $r_s$ for categorical variables with an ordinal scale (5).
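The bounds for the validity of the less precise method are simple to evaluate once an intermethod correlation is available. The lower bound is the correlation itself and the upper bound is its square root. The following is a minimal sketch; the function name `validity_bounds` and the input value 0.60 are illustrative assumptions only.

```python
import math


def validity_bounds(r12):
    """Lower and upper bound for the validity coefficient rho_x1T of the less
    precise method X1, given the intermethod reliability coefficient r12.

    Assumes X2 is at least as precise as X1, both share the same true score,
    and the errors are uncorrelated.
    """
    if not 0.0 <= r12 <= 1.0:
        raise ValueError("r12 must lie between 0 and 1")
    return r12, math.sqrt(r12)


low, high = validity_bounds(0.60)
print(f"{low:.2f} <= rho_x1T <= {high:.2f}")  # 0.60 <= rho_x1T <= 0.77
```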
One has to bear in mind that the reliability coefficient obtained by parallel testing depends on the range of the true scores in the sample: for a given error variance, reliability increases with a wider range. Therefore, the use of the reliability coefficient as a measure of agreement has been criticized (10). Caution is also needed because the reliability coefficient in a particular sample of n subjects is only an estimate of the true reliability coefficient in the total population of N subjects. Applying the same method to another sample of subjects with a different exposure distribution may yield a different coefficient.
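The dependence on the range of true scores follows directly from the parallel test model. With a fixed error variance, the reliability of a single measurement is $\sigma_T^2/(\sigma_T^2 + \sigma_\varepsilon^2)$, so it rises as the spread of true scores widens. This is a minimal sketch with arbitrary, illustrative standard deviations.

```python
def parallel_reliability(sd_true, sd_error):
    """Reliability of one measurement under the parallel test model:
    sigma_T^2 / (sigma_T^2 + sigma_e^2)."""
    return sd_true**2 / (sd_true**2 + sd_error**2)


# Same instrument (fixed error SD) applied to populations with wider exposure ranges
for sd_true in (0.5, 1.0, 2.0):
    print(sd_true, round(parallel_reliability(sd_true, sd_error=1.0), 2))
# 0.5 0.2
# 1.0 0.5
# 2.0 0.8
```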
The parallel test method can also be used within the context of an intramethod reliability study by applying a particular instrument to the same subjects at two or more points in time. This is of special interest when exposure variability due both to imperfect measurements with random error and to variability of exposure over time is evaluated. The Pearson correlation coefficient r is not appropriate for intramethod surveys, since systematic bias (constant or relative) is not reflected in the correlation coefficient. In intramethod studies the reliability $\rho_I$ is estimated by the intraclass coefficient $r_I$ for continuous variables and by Cohen's weighted kappa $\kappa_w$ for categorical variables (5). These estimates treat systematic bias as part of the random error. When bias is absent, however, $r_I$ is equivalent to r, provided each pair of measurements is entered twice, once in each order (11). Liu and his colleagues have shown that the intramethod reliability coefficient $\rho_I$ in a study with two measurements of each worker can be expressed as (12):

$$\rho_I = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2} = \frac{1}{1 + \lambda} \tag{4}$$

where $\sigma_w^2$ denotes the intraindividual variance and $\sigma_b^2$ the interindividual variance of the exposure variable measured. The ratio $\sigma_w^2/\sigma_b^2$ is also called the variance ratio λ. Thus the reliability of a dual-measurement survey is estimated by the intraclass coefficient $r_I$, which can be rewritten as the term $1/(1 + \lambda)$. The objective of intramethod surveys is to estimate the performance of a particular measurement method in relation to its reliability and exposure variability.
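The variance components $\sigma_b^2$ and $\sigma_w^2$, the variance ratio λ, and the intraclass coefficient $r_I = 1/(1 + \lambda)$ can be estimated from a repeated-measurement survey. This sketch uses the standard one-way random-effects ANOVA estimator. It assumes a balanced design (every worker measured k times) and no systematic bias between measurement occasions, and it truncates negative between-worker estimates at zero. It is not necessarily the exact procedure used by Liu and colleagues (12). The function name `variance_components` and the simulated data are hypothetical.

```python
import numpy as np


def variance_components(x):
    """One-way random-effects ANOVA for a balanced (n_workers, k_repeats) array.

    Returns (s2_b, s2_w, lam, r_i): between-worker variance, within-worker
    variance, variance ratio lambda = s2_w / s2_b, and the single-measurement
    intraclass coefficient r_I = s2_b / (s2_b + s2_w) = 1 / (1 + lambda).
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    if n < 2 or k < 2:
        raise ValueError("need at least 2 workers and 2 repeats per worker")
    worker_means = x.mean(axis=1)
    ms_between = k * np.sum((worker_means - x.mean()) ** 2) / (n - 1)
    ms_within = np.sum((x - worker_means[:, None]) ** 2) / (n * (k - 1))
    s2_w = ms_within
    s2_b = max((ms_between - ms_within) / k, 0.0)  # negative estimates set to 0
    total = s2_b + s2_w
    if total == 0:
        raise ValueError("no variation in the data")
    lam = s2_w / s2_b if s2_b > 0 else float("inf")
    return s2_b, s2_w, lam, s2_b / total


# Hypothetical check: simulated workers with sigma_b = sigma_w = 1, so lambda = 1
rng = np.random.default_rng(7)
true_means = rng.normal(30.0, 1.0, size=(5000, 1))        # each worker's true mean
data = true_means + rng.normal(0.0, 1.0, size=(5000, 2))  # two repeats per worker
s2_b, s2_w, lam, r_i = variance_components(data)
print(f"lambda = {lam:.2f}, r_I = {r_i:.2f}")  # should be near 1 and 0.5
```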

Measurement errors in methods for assessing back load
Reviews of epidemiologic studies on back disorders have demonstrated that job title is the exposure variable most frequently used. Among the epidemiologic studies that attempted to assess postural load on the back, a questionnaire approach was the most common (4,6). In the past decade observational techniques have been increasingly used (6). These methods vary from pencil-and-paper techniques based upon multiple observations of workers at specific intervals throughout a representative period of work activities (13) to video-computerized systems for the real-time recording of trunk postures and movements (14). However, epidemiologic studies addressing the reliability of the measurement method applied are few (6). Table 1 presents findings of intermethod reliability studies on various aspects of postural load (15-21).

Although some studies refer to the validity of a specific measurement instrument, the correlations have been interpreted as intermethod reliability coefficients. Questionnaires are of limited use when postural load is assessed. Only duration of sitting during a normal shift was estimated with reasonable reliability and without systematic bias (16,21). Therefore, assessing gross postural activity by questionnaire, defined as either duration of sitting or duration of standing or walking, may be appropriate. When a questionnaire is considered for this particular purpose, a diary kept by the subjects over several shifts can offer the advantage of collecting information on variation in exposure (16,19). Postural load due to trunk posture is best assessed by direct observation or inclinometer measurements. Comparisons of both methods showed reliability coefficients of about 0.60 for duration of trunk flexion over 20 degrees (15,17). One study reported an extremely strong correlation of 0.99 for the frequency of trunk flexions over 72 degrees (20). This extraordinary result may be due to a highly skewed exposure distribution with a few highly exposed subjects.
The occurrence of systematic bias (ie, $\alpha \ne 0$ or $\beta \ne 1$) in several reliability studies is problematic since it may lead to an overestimation or underestimation of the risk per unit of exposure. Systematic bias can result from a dependency of the relative bias on the exposure magnitude. Several authors have pointed to a discrepancy in the definition of the angular position of the trunk as another origin of bias (15,17). Intramethod reliability studies focusing on changes in postural load over time are few. Harber and his colleagues demonstrated considerable variability in trunk flexion both within and between supermarket checkers, whereby the coefficients of variation for within-worker variance tended to be higher (22). Another study presented variance ratios for trunk flexion and trunk rotation among five occupational title groups. After log-transformation of the exposure data, variance ratios were reported that varied from 0.2 to 7.1 (23). When normally distributed exposure parameters are assumed and this particular range of variance ratios is used, the associated reliability coefficients, expressed as the intraclass coefficient $r_I$, would range from 0.12 to 0.83. A few studies have been published on intra- and interobserver agreement for observation methods. One study reported an interrater agreement of 86% for duration of trunk flexion and rotation (20 degrees) (24). Another study presented agreement of over 90% for the duration of trunk flexion (>15 degrees) within and between observers (25). In a study assessing the same exposure parameter, an interobserver agreement of 81% was reported (15). Recently, van Beek and his colleagues described interobserver reliabilities of more than 80% agreement for trunk postures assessed at four levels (26). These studies suggest that intraobserver and interobserver agreement are high when trained observers are used to collect data on exposure to postural load.
Although these measures of agreement were not expressed as reliability coefficients, it can reasonably be assumed that the measurement error of an observation method due to observer variability is small compared with the true variability in exposure due to workplace conditions (23).
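The conversion of the reported variance ratios (0.2 to 7.1) into intraclass coefficients (0.12 to 0.83) applies the relation $r_I = 1/(1 + \lambda)$ given earlier. It assumes normally distributed (here, log-transformed) exposure. The helper name `icc_from_variance_ratio` is chosen only for illustration.

```python
def icc_from_variance_ratio(lam):
    """Intraclass coefficient of a single measurement, r_I = 1 / (1 + lambda)."""
    return 1.0 / (1.0 + lam)


for lam in (0.2, 7.1):  # range of variance ratios reported in reference 23
    print(f"lambda = {lam}: r_I = {icc_from_variance_ratio(lam):.2f}")
# lambda = 0.2: r_I = 0.83
# lambda = 7.1: r_I = 0.12
```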

Random measurement error and the attenuation of odds ratios
The information on the reliability of a particular measurement method can be used to evaluate the influence of random measurement error on the risk estimate in epidemiologic studies. In the case of musculoskeletal disorders, cross-sectional studies are often published that present the odds ratio as a risk estimate. The amount of measurement error in a continuous exposure variable, such as percentage of work time with trunk flexion, can be expressed by the variation from either the "true" value or the same measurement repeated (11). The effect of random measurement error on the coefficient in a regression model is given by (8,11):

$$b_o = \rho_I\, b_T \tag{5}$$

$$OR_o = OR_T^{\,\rho_I} \tag{6}$$

where $b_o$ and $b_T$ are the observed and true logistic regression coefficients, $OR_o = e^{b_o}$ is the observed odds ratio per unit increment of exposure in the logistic regression model, $OR_T = e^{b_T}$ is the unbiased true odds ratio, and $\rho_I$ is the reliability coefficient estimated by the intraclass coefficient $r_I$. Figure 1 graphically illustrates the influence of four intraclass coefficients, with values 1.0, 0.8, 0.5 and 0.1, respectively, on the true odds ratio. These coefficients correspond to variance ratios of 0, 0.25, 1 and 9, respectively, and reflect the results obtained in a study on the variability of exposure to trunk flexion and rotation (23). The figure shows attenuation towards unity for all intraclass coefficients below 1.0 and a dramatic weakening of the odds ratio at intraclass coefficients of 0.5 and below. An intraclass coefficient of 0.5 corresponds to a validity coefficient of about 0.7 ($\sqrt{0.5} \approx 0.71$), under the strict assumptions of equation 2.
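The attenuation of the odds ratio described above ($OR_o = OR_T^{\rho_I}$) can be tabulated for the intraclass coefficients used in Figure 1. The sketch below applies the same assumptions as the text: one exposure variable with nondifferential random error and a rare outcome. The inverse function shows the naive correction. The text warns that such corrections are questionable when these assumptions fail, and the correction also inflates sampling error. The true odds ratio of 3 and both function names are hypothetical.

```python
def observed_or(true_or, reliability):
    """Attenuated odds ratio per unit of exposure: OR_o = OR_T ** rho_I.

    Assumes a single exposure with nondifferential random error and a rare outcome.
    """
    return true_or ** reliability


def deattenuated_or(obs_or, reliability):
    """Naive inverse correction, OR_T = OR_o ** (1 / rho_I); valid only under the
    same assumptions, and it also magnifies sampling error in OR_o."""
    return obs_or ** (1.0 / reliability)


# Intraclass coefficients from Figure 1 with the matching variance ratios
for rho in (1.0, 0.8, 0.5, 0.1):
    lam = (1.0 - rho) / rho  # inverse of r_I = 1 / (1 + lambda)
    print(f"r_I = {rho}, lambda = {lam:.2f}, OR_T = 3 -> OR_o = {observed_or(3.0, rho):.2f}")
```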
In general, this approach to estimating the attenuation in the odds ratio holds true for one exposure variable measured with random nondifferential error. In the specific situation of calculating odds ratios as risk estimates, an additional assumption is that the outcome variable of interest must be a rare disease. When more than one covariate in the logistic model is measured with error, then the estimates of any of the covariate effects can be influenced by measurement error (8,28). Appropriate models taking into account multiple variables with random nondifferential measurement error have been discussed extensively by Rosner and his colleagues (29,30). In these models a correction procedure for observed odds ratios can be complicated by the fact that both attenuation and overestimation may occur (28).

Optimizing measurement strategies
The previous section was restricted to the classical approach of a reliability study prior to the epidemiologic study that applies the same method of exposure measurement on a large scale. Such a study usually consists of an inter- or intramethod reproducibility survey of a limited number of measurements to derive a reliability coefficient. This information is then used to evaluate the influence of random error in single measurements of exposure in the epidemiologic study. Examples concerning postural load on the back are the surveys presented in table 1. An important drawback of this approach is that one measurement of exposure to postural load may not be sufficient to estimate accurately the true exposure of each subject. The few studies on patterns of exposure to back load have demonstrated considerable variability in trunk flexion both within and between workers (22,23). This problem can be addressed by choosing the appropriate number of measurements per subject to distinguish one worker from another and, at the same time, assuming that the true exposure is estimated without bias by the average value of a number of measurements. In recent publications this principle has been used to review effects of random measurement error in various study designs (5,11,12,31).
Liu and his colleagues have presented an equation to calculate the reliability coefficient of the average value or, alternatively, the number of repeats required given a particular reliability coefficient. The reliability coefficient of the average exposure, $\rho_k$, depends on the number of repeats per subject (12):

$$\rho_k = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2/k} = \frac{1}{1 + \lambda/k} \tag{7}$$

where k denotes the number of repeats per subject. This equation demonstrates that the reliability is determined by the number of measurements per subject and the ratio of the intraindividual variance, $\sigma_w^2$, to the interindividual variance, $\sigma_b^2$. Figure 2 graphically illustrates the relation between $\rho_k$ and k at four levels of the variance ratio λ, with values 0.25, 1, 4 and 9, respectively. These variance ratios correspond with intramethod reliability coefficients $r_I$ of 0.8, 0.5, 0.2 and 0.1, respectively. Figure 2 demonstrates that, in studies in which k is very large or the intraindividual variance is considerably smaller than the interindividual variance, $\rho_k$ approaches 1. Rearranging equation 7 gives the number of repeats required:

$$k = \frac{\lambda\,\rho_k}{1 - \rho_k} \tag{8}$$

This equation yields an answer to the question of how many measurements are required to achieve a specified degree of reliability. For example, to obtain $\rho_k$ = 0.8 in study populations with variance ratios of 0.25, 1, 4, and 9, the number of repeat measurements required is 1, 4, 16, and 36, respectively. Thus equations 7 and 8 are instrumental when one is designing an appropriate measurement strategy for large epidemiologic studies. Information on the variance ratio can be obtained from an exposure-oriented survey prior to the start of the epidemiologic study. In such a survey repeat measurements should be collected randomly in time on a random sample of workers.
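The two relations above can be written as small helper functions. The first gives the reliability of the mean of k repeats, $\rho_k = 1/(1 + \lambda/k)$. The second gives its rearrangement for the number of repeats, $k = \lambda\rho_k/(1 - \rho_k)$. In this sketch k is rounded up to a whole measurement. A tiny tolerance guards against floating-point noise, such as 4.000000000000001, that would otherwise add a spurious extra repeat. The function names are illustrative.

```python
import math


def reliability_of_mean(lam, k):
    """Reliability of the average of k repeats: rho_k = 1 / (1 + lambda / k)."""
    return 1.0 / (1.0 + lam / k)


def repeats_required(lam, rho_target):
    """Smallest whole number of repeats k with rho_k >= rho_target,
    from k = lambda * rho_k / (1 - rho_k)."""
    if not 0.0 < rho_target < 1.0:
        raise ValueError("rho_target must lie strictly between 0 and 1")
    k_exact = lam * rho_target / (1.0 - rho_target)
    # subtract a tiny tolerance so floating-point noise does not add a repeat
    return max(1, math.ceil(k_exact - 1e-9))


for lam in (0.25, 1, 4, 9):
    k = repeats_required(lam, 0.8)
    print(f"lambda = {lam}: k = {k}, rho_k = {reliability_of_mean(lam, k):.2f}")
# lambda = 0.25: k = 1, rho_k = 0.80
# lambda = 1: k = 4, rho_k = 0.80
# lambda = 4: k = 16, rho_k = 0.80
# lambda = 9: k = 36, rho_k = 0.80
```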
Along the same lines the optimum allocation of the number of subjects (sample size) and repeat measurements per subject can be evaluated. The familiar formula for determining the sample size when cases and referents in a 1:1 ratio are to be compared on a continuous exposure variable is (31):

$$n_0 = \frac{2\,(t_\alpha + t_\beta)^2\,\sigma^2}{\Delta^2} \tag{9}$$

The numerator contains the t-statistics for the desired level of significance, α, and the power, 1 - β, and the pooled variance of exposure, σ². The denominator denotes the square of the difference in exposure to be detected, Δ. When the intraindividual variance is equal to zero among the cases and the referents, the pooled variance is estimated by:

$$\sigma^2 = \sigma_b^2 \tag{10}$$

Substituted into equation 9, this gives the sample size $n_0$. The assumption is that the true exposure of each subject is estimated without bias by a single measurement per subject. If the intraindividual variance is greater than zero, then the unbiased exposure for each subject can be estimated by averaging k measurements for each subject. In the situation of equal variance ratios for the two groups, the pooled variance can be expressed as (12):

$$\sigma^2 = \sigma_b^2 + \frac{\sigma_w^2}{k} = \sigma_b^2\left(1 + \frac{\lambda}{k}\right) \tag{11}$$

and, consequently, the sample size $n_k$ is equal to:

$$n_k = n_0\left(1 + \frac{\lambda}{k}\right) \tag{12}$$

This formula shows that the required sample size must be increased by the factor $(1 + \lambda/k)$ due to intraindividual variance. Equation 7 shows that $1/\rho_k$ is equal to the term $(1 + \lambda/k)$. Hence a simple equation for the required sample size is derived (11):

$$n_k = \frac{n_0}{\rho_k} \tag{13}$$

The sample size calculated under the assumption of intraindividual variance equal to zero ($n_0$) need only be divided by the reliability coefficient of the average of k repeat measurements to obtain the actual sample size required ($n_k$). Figure 3 shows the relation between the sample size ratio, expressed as $n_k/n_0$, and the number of repeat measurements per subject. This figure demonstrates that a larger sample size can be counterbalanced by increasing the number of repeat measurements. The efficiency of either option depends on the actual value of the variance ratio and the study costs.
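The relation shown in Figure 3 is the sample size inflation factor $n_k/n_0 = 1/\rho_k = 1 + \lambda/k$ derived above. A minimal sketch that tabulates it is given below. The chosen values of k are arbitrary, and the function name is illustrative.

```python
def sample_size_ratio(lam, k):
    """n_k / n_0 = 1 / rho_k = 1 + lambda / k: the factor by which the sample size
    must grow when each worker's exposure is the mean of k repeats."""
    return 1.0 + lam / k


for lam in (0.25, 1, 4, 9):
    ratios = [round(sample_size_ratio(lam, k), 2) for k in (1, 2, 4, 8)]
    print(f"lambda = {lam}: n_k/n_0 for k = 1, 2, 4, 8 -> {ratios}")
```

At large variance ratios, the first few extra repeats reduce the required sample size most. Beyond that point the ratio flattens towards 1.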
Suppose the researcher is interested in the risk for low-back pain associated with trunk flexion, expressed by the average percentage of work time with trunk flexion over 20 degrees. The minimum difference considered important is set at 5% of work time, and the standard deviation of exposure to trunk flexion among workers with and without low-back pain is about 10%. A two-sided α of 5% ($t_\alpha$ = 1.96) and a power of 90% ($t_\beta$ = 1.28) are defined. According to equation 9, a sample size $n_0$ of 84 per group is obtained. In groups with a variance ratio of 0.25 the required sample size is 105 with one measurement per worker and 95 with two measurements per worker. The cost of the 95 extra measurements will usually exceed the savings from a sample size reduced by 10 workers. However, two measurements per subject in groups with a variance ratio of four reduce the sample size by 40% (from 420 to 252). If the cost of one additional measurement per worker is smaller than the cost of monitoring one additional worker, more measurements per worker could be an attractive option.
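The worked example can be reproduced with the two sample size relations, $n_0 = 2(t_\alpha + t_\beta)^2\sigma^2/\Delta^2$ and $n_k = n_0(1 + \lambda/k)$, rounding up to whole workers. To make the final cost argument concrete, the sketch adds a deliberately simple cost model that is a hypothetical assumption, not part of the original analysis. It takes the total cost per group as $n_k$ times the sum of a fixed cost per recruited worker and k times a cost per measurement. The unit costs of 100 and 50 are invented for illustration.

```python
import math


def n0_per_group(delta, sd, t_alpha=1.96, t_beta=1.28):
    """Sample size per group for a 1:1 comparison of a continuous exposure,
    n_0 = 2 (t_alpha + t_beta)^2 sd^2 / delta^2, ignoring within-worker variance."""
    return 2.0 * (t_alpha + t_beta) ** 2 * sd**2 / delta**2


def nk_per_group(n0, lam, k):
    """Sample size per group when each worker is measured k times: n_0 (1 + lambda / k)."""
    return n0 * (1.0 + lam / k)


n0 = n0_per_group(delta=5.0, sd=10.0)  # 5% difference, SD 10% of work time
print(math.ceil(n0))  # 84
for lam in (0.25, 4):
    print(lam, [math.ceil(nk_per_group(n0, lam, k)) for k in (1, 2)])
# 0.25 [105, 95]
# 4 [420, 252]


def cheapest_design(n0, lam, cost_worker, cost_measurement, k_max=10):
    """Hypothetical cost model: cost per group = n_k * (cost_worker + k * cost_measurement).
    Returns the (k, n_k, cost) combination with the lowest cost."""
    designs = []
    for k in range(1, k_max + 1):
        n_k = math.ceil(nk_per_group(n0, lam, k))
        designs.append((k, n_k, n_k * (cost_worker + k * cost_measurement)))
    return min(designs, key=lambda d: d[2])


# Illustrative unit costs only: recruiting a worker costs twice one measurement
print(cheapest_design(n0, lam=4, cost_worker=100, cost_measurement=50))  # (3, 196, 49000)
```

Under these made-up costs and λ = 4, three measurements per worker are cheapest. Different cost ratios or variance ratios shift the optimum, which is the trade-off the text describes.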

Discussion
The importance of validly assessing exposure in epidemiologic studies has been stressed by many authors (5,8,11,12,27-30). Several studies have shown that assessing exposure to postural load on the back is subject to systematic bias and random measurement error (21-23). Both can lead to spurious conclusions about the relation between postural load and the occurrence of back disorders. Therefore, exposure assessment should be a major topic in the design of epidemiologic studies on back disorders (4).
The choice of an appropriate strategy for assessing postural load on the back at the workplace is influenced, for example, by the type of epidemiologic study, the amount and detail of data required, the validity of the measurement method, and the variability of postural load within and between workers. Three methods commonly used in musculoskeletal epidemiology are the questionnaire, the diary, and the observation technique. Questionnaires have been applied to collect information on past and present exposure to various aspects of back load, such as duration of daily periods of sitting and postures maintained with a twisted trunk (6). Diaries offer the advantage of little recall bias since they focus on the assessment of present exposure. They can also capture infrequent exposure and the variability of exposure (19). Observation techniques deliver the most detailed information on exposure patterns but are expensive and time-consuming, both of which hamper their use in epidemiologic studies. Although these considerations may guide the researcher towards a particular measurement technique, the decision should also focus on the measurement error associated with the application of various measurement methods and its consequences for the risk estimate in the epidemiologic study planned.
The reliability of a measurement method can be evaluated in intermethod reproducibility studies. The results of intermethod studies, as presented in table 1, show that questionnaires and diaries often lack accuracy. The systematic bias can be estimated as the difference between the sample means of methods 1 and 2. In theory, individual exposure assessments can be adjusted when this bias is constant in time and independent of the exposure and disease status of the subjects. Generally, this will not be the case, as the magnitude and direction of systematic bias at the individual level are difficult to predict. Therefore, these studies suggest that questionnaires and diaries may be warranted only when gross postural activities such as sitting, standing and walking are assessed (18,19). Postural load due to trunk posture is best assessed with observation techniques or direct instrumentation.
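Estimating systematic bias from paired intermethod data can be sketched as follows. The mean difference between methods estimates a constant bias. Correcting individual values with it is defensible only if that bias does not depend on exposure level. As an added check that is not part of the original analysis, the sketch swaps in a Bland-Altman style regression of the differences on the pair means. A slope well away from zero flags a bias that changes with exposure magnitude, such as relative bias. The paired values are synthetic and constructed only to show the two cases.

```python
import numpy as np


def bias_summary(x1, x2):
    """Compare paired measurements of the same workers by method 1 (x1) and method 2 (x2).

    Returns (mean difference, SD of the differences, slope of the differences
    regressed on the pair means). The mean difference estimates constant bias;
    a slope far from 0 suggests the bias depends on exposure magnitude.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    diff = x1 - x2
    pair_mean = (x1 + x2) / 2.0
    slope, _intercept = np.polyfit(pair_mean, diff, 1)
    return diff.mean(), diff.std(ddof=1), slope


# Synthetic pairs: method 1 overstates every value by a constant 0.5 ...
print(bias_summary([1.5, 2.5, 3.5, 4.5, 5.5], [1, 2, 3, 4, 5]))
# ... versus a relative bias of 50%, which grows with exposure
print(bias_summary([1.5, 3.0, 4.5, 6.0, 7.5], [1, 2, 3, 4, 5]))
```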
Another approach for assessing the reliability of a measurement method is an intramethod reproducibility survey. Reliability coefficients in these surveys include the effects of instrumental features and exposure variability. The latter source of imperfect exposure assessment often surpasses the effect of instrumental characteristics (23). Again, systematic bias can introduce insuperable problems. When systematic bias is absent, the reliability coefficient yields information crucial to the design of measurement strategies. An exposure-oriented survey could combine the inter- and intramethod approach by conducting measurements with two methods, with each pair of measurements repeated once. This procedure allows both the occurrence of systematic bias (instrumental characteristics) and random error (precision and exposure variability) to be estimated.
This article does not address the important question of whether a higher frequency or a longer duration of measurement is to be preferred. A longer duration of measurement will generally decrease the variability between repeated measurements of the same worker. The answer to this question requires detailed information on the temporal variation of back load, which depends on individual behavior and the particular characteristics of the job and tasks involved. Currently, no general guidelines are available to evaluate the trade-off between repeated and prolonged measurements in a group of workers. Computer simulations based on detailed quantitative exposure distributions, such as Monte Carlo techniques, are needed to explore the effect of different sample sizes and durations on the estimation of the mean exposure of individuals.
The equations presented in this article enable the influence of random error on the odds ratio in cross-sectional studies to be evaluated. In principle, these equations apply only to situations with a rare disease. In cross-sectional studies on back disorders this rare disease assumption is seldom met, and, consequently, odds ratios overestimate risk ratios (32). Hence it makes little sense to correct an odds ratio for the amount of random measurement error if the adjusted "true" odds ratio is not a valid risk estimate. Alternative methods to correct for random measurement error that are not restricted by the rare disease assumption have been developed, based on probit regression (33) and discriminant analysis (34). However, despite the methodological drawbacks of odds ratios derived from logistic regression, the equations presented can guide the researcher towards an appropriate measurement strategy.
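Why the rare disease assumption matters can be shown with simple arithmetic. When the outcome is rare, the odds ratio is close to the risk (prevalence) ratio. When the outcome is common, as with back pain, the odds ratio lies farther from 1. The prevalences below are hypothetical and were chosen only to contrast the two situations.

```python
def risk_ratio(p1, p0):
    """Ratio of outcome prevalences in exposed (p1) and unexposed (p0) workers."""
    return p1 / p0


def odds_ratio(p1, p0):
    """Ratio of the odds of the outcome in exposed and unexposed workers."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical prevalences: a rare outcome versus a common one
for p1, p0 in ((0.02, 0.01), (0.50, 0.25)):
    print(f"p1 = {p1}, p0 = {p0}: RR = {risk_ratio(p1, p0):.2f}, OR = {odds_ratio(p1, p0):.2f}")
# p1 = 0.02, p0 = 0.01: RR = 2.00, OR = 2.02
# p1 = 0.5, p0 = 0.25: RR = 2.00, OR = 3.00
```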
An important feature of the intramethod reliability coefficient is its use in evaluating various design options regarding the sample size of the study and the number of measurements per worker. With the use of various criteria for optimizing the study, traditional power calculations can be applied to consider the best measurement strategy (35,36). This article described straightforward procedures with which to optimize the measurement strategy through the minimum attenuation of the odds ratio and maximum discriminatory power of the study. A core element in these procedures is an unbiased estimate of the intramethod reliability coefficient or variance ratio. It should be remembered that the reliability coefficient depends on the exposure distributions among the workers monitored, and this dependence hampers its transfer from one population to another (5). In order to obtain a reliability coefficient with reasonable precision, the sample size in an exposure-oriented survey must be sufficiently large. A disadvantage of the application of an intramethod reliability coefficient is that it requires a fully random measurement strategy whereby measurements are randomly sampled within workers' exposure experiences and workers are randomly sampled within the occupational population. In particular situations other measurement strategies may be more appropriate, for example, sampling in specific exposure strata or adopting a group-based exposure assessment strategy rather than assessment at the individual level.
In epidemiologic studies on back disorders the procedures described may have intuitive appeal through their focus on the role of repeat measurements. Currently, in epidemiologic studies on back disorders, fairly simple parameters of exposure to postural load are being adopted, and the characterization of exposure is usually limited to the worker's average exposure. Valid assessment of a worker's average exposure requires workers to be monitored repeatedly or, alternatively, monitored with a longer averaging time of measurement. The appropriate number of measurements per subject or the optimum measurement duration depends on the load variation in the populations under study. This paper provides a quantitative framework to evaluate various design options in measurement strategies in relation to the precision of the odds ratio.