Modeling long-term average exposure in occupational exposure-response analysis

Modeling long-term average exposure in occupational exposure-response analysis. Scand J Work Health 1995;21:504-12. Objectives Estimates of long-term average exposure to occupational hazards are often imprecise because intraindividual variability in exposure can be large and exposure is usually based on one or a few measurements. One potential result is bias of exposure-response relationships. This study examined whether a more valid measure of exposure can be obtained by modeling exposure with simple, measurable exposure surrogates, thereby increasing the number of days with exposure estimates. Methods In a group of 198 Dutch pig farmers, exposure to endotoxins was measured on one workday in summer and one day in winter. Farmers recorded activity patterns during one week in both seasons, and farm characteristics were evaluated. Relationships between farm characteristics and activities and log-transformed measured exposure levels were quantified in a multiple regression analysis. Exposure was estimated for 14 d with known activity patterns. Results The ratio of intraindividual and interindividual variance in log-transformed measured exposure was 4.7. Given this ratio, the true regression coefficient of lung function on exposure would potentially be attenuated by 70%. The variance ratio for predicted exposures was only 1.2, and the potential attenuation by variation in exposure estimates was decreased to 8%. There was no relationship between lung function and measured exposure. Modeled long-term average exposure was inversely related to base-line lung function; the relationship reached statistical significance for asymptomatic farmers. Conclusions The results suggest that the presented strategy offers a possibility to minimize measurement effort in occupational epidemiologic studies, without apparent loss of statistical power.

The validity and precision of the measures of exposure used in statistical analyses are key issues in studying exposure-response relationships. On a conceptual level, validity refers to the biological relevance of the measure of exposure. More practically, validity depends on biases occurring in the measurement process. Precision refers to the random error in the exposure estimate. A regression coefficient of a continuous outcome variable such as base-line lung function on exposure will generally be underestimated when exposure is measured with nondifferential random error (1-3). This underestimation hampers the detection of the weak associations that are more often studied in occupational epidemiology. For practical reasons, it is common practice in studies on chronic effects to estimate long-term average occupational exposure from a limited number of measurements. The resulting measure of exposure is therefore generally subject to considerable measurement error as a result of intraindividual exposure variability. Such an underestimation of exposure-response relationships, or attenuation, was demonstrated empirically for a population of 1105 coal miners (4). On average, 34 dust exposure measurements were available per worker. With three randomly chosen measurements per individual, the regression coefficient of base-line lung function on exposure was only 40% of the regression coefficient obtained when all the available measurements were used. However, three measurements per individual still involved more than 3000 exposure measurements. This result illustrates that reducing attenuation by increasing the number of measurements per individual might require substantial investments in field studies.
The magnitude of potential bias in the regression coefficient depends on the ratio of the intraindividual and interindividual components of variance and the number of repeated measurements, according to the formula (2)
$$b = \beta \left(1 + \frac{h}{n}\right)^{-1} \qquad \text{(equation 1)}$$
in which b is the empirically estimated regression coefficient, β is the true regression coefficient, h is the ratio of the intraindividual to the interindividual variance component, and n is the number of repeated measurements per individual. The application of equation 1 assumes the absence of autocorrelation between repeated measurements. Although the autocorrelation of repeated 8-h time-weighted average (TWA) exposure measurements in occupational settings is usually low and decreases with time between measurements (5), repeated measurements should preferably not be taken on consecutive days in order to estimate the day-to-day variability correctly. Equation 1 refers to stationary processes. If the processes are not stationary (eg, due to changes over time in the tasks involved), bias cannot be predicted. With a single exposure measurement or few measurements per individual in stationary processes, a large ratio h will result in considerable underestimation of the exposure-response relationship. Underestimation of a relationship can be reduced by either increasing the interindividual variability in exposure or decreasing the influence of intraindividual variability by increasing the number of measurements per individual (2,6). The interindividual variance, or contrast in exposure between individuals, can be optimized in the design stage of the study by choosing a population with large differences in exposure between subjects. The interindividual variance can also be increased by introducing an external group with extremely low or high exposure. Reducing attenuation by only increasing the number of measurements per individual will often not be feasible, since the costs and logistics of large field work campaigns are restrictive factors. Correction for the attenuation of the regression coefficient can be applied in specific cases if h is known. Such a correction does not, however, increase the power of the study and therefore can only be useful in situations in which exposure-response relationships are expected to be strong.
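The attenuation formula and the correction mentioned above can be illustrated with a minimal Python sketch. The function names and the value h = 1.0 are hypothetical and chosen only for illustration; the sketch assumes the conditions stated in the text (stationary process, no autocorrelation, h known).

```python
def attenuation_ratio(h: float, n: int) -> float:
    """Expected ratio b / beta = (1 + h/n)^-1 for variance ratio h and n repeats."""
    if h < 0 or n < 1:
        raise ValueError("h must be >= 0 and n must be >= 1")
    return 1.0 / (1.0 + h / n)


def corrected_coefficient(b_observed: float, h: float, n: int) -> float:
    """Divide the observed coefficient by the attenuation ratio (requires known h)."""
    return b_observed / attenuation_ratio(h, n)


# Illustrative variance ratio (hypothetical, not a value from the study).
h_example = 1.0
for n in (1, 2, 5, 14):
    print(f"n = {n:2d}: b/beta = {attenuation_ratio(h_example, n):.3f}")
```

The correction rescales the point estimate only. It leaves the P-value unchanged, which is why the text notes that it does not add statistical power.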
Empirical modeling of exposure can be used to predict exposure in situations where limited numbers of exposure measurements are available, by using simple, measurable exposure surrogates in the models (7,8). So far, some researchers have reported the use of statistical models to predict occupational exposures. Several studies were undertaken for industrial hygiene purposes (8-12). In some studies historical exposure was modeled by job, department, production process, and the like for application in epidemiologic exposure-response analyses (8,13-17). Application of this technique to obtain an estimate of individual, current long-term average exposure has, to our knowledge, not been reported.
Work conditions of Dutch pig farmers are characterized by large day-to-day variability in activities and in time spent in different locations with distinctive exposures. Long-term average exposure could not be assessed by a large number of actual exposure measurements for each individual since the field work was logistically complex and costly. In this study, statistical models were used to estimate long-term average exposure to endotoxins among pig farmers. Endotoxins are regarded as a major respiratory health hazard for pig farmers (18). First, farm characteristics and daily activities were related to measured exposure to endotoxins. Factors affecting exposure were identified and their association with exposure was quantified. Subsequently, information on activity patterns collected during one week in summer and one week in winter and information on farm characteristics were used to estimate exposure for each day of these two weeks. The average of the estimated 14 daily exposures was used as a measure of long-term average exposure. The number of repeated observations for each individual increased from 2 for measured exposure to 14 for estimated exposure, covering most of the variability in time activity patterns. The different measures of exposure were applied in exposure-response relationship analyses.

Population and health data
The study population comprised 198 male owners of pig farms who worked at least 5 h a day in pig farming in the southern part of The Netherlands. Ninety-eight farmers had one or more respiratory symptoms (chronic cough, chronic phlegm, shortness of breath, ever wheezing, frequent wheezing, chest tightness), and 100 farmers did not have these symptoms. The respiratory symptoms were investigated with a shortened version of a questionnaire on respiratory symptoms developed by the British Medical Research Council (19,20). Lung function was determined during a medical examination. Spirometry was performed according to methods and procedures of the European Community for Coal and Steel, as described elsewhere (21,22). Lung function tests were taken during the day between 0800 and 1515.

Measurement strategy
Exposure to inhalable dust containing endotoxins was determined by means of personal sampling. The procedure for measuring dust levels has been described in detail elsewhere (23). Endotoxins were analyzed in the dust sample with the kinetic Limulus Amoebocyte Lysate test (24).
Exposure was assessed for each participant during one work shift in summer (4 June until 3 July 1991) and during one work shift in winter (27 January until 27 February 1992). The average measurement time was 8.3 (SD 0.6) h. The measurements were made from Monday through Thursday with the day of the week randomly chosen. Meteorological data were obtained from a monitoring station in the south of The Netherlands. In the summer and winter, the farmers were requested to complete a diary on time spent in different activities during the day of the exposure measurement and the following six days. Time (in quarters of an hour) spent in 21 preselected activities in pig farming had to be recorded. Thus for each participant a two-week record of daily activity patterns was obtained. In a subgroup of six farmers, exposure measurements were performed nearly monthly during a one-year period. The procedures were similar to those described for measurements taken in the summer and winter.
The pig farms in the study were characterized by the presence of a large number of compartments in several buildings. The farm characteristics were generally very heterogeneous on a farm and depended on the type of compartment and the period of construction of the buildings. The 198 farms were visited to record the farm characteristics of all the compartments present by means of a walk-through survey. Data were recorded on the number of animals, feeding methods, heating and ventilation, type of floor, bedding material, and degree of contamination. The data set contained 95 distinct variables.

Computational and statistical methods
Data on exposure levels during 2-d periods, activity patterns during a 14-d period, farm characteristics, and outdoor temperature were used to estimate the long-term average exposure to endotoxins. First, associations between exposure and the factors affecting it were quantified. Variation in the log-transformed personal TWA [ln(TWA)] exposure was explained by time spent on activities during the sampling, farm characteristics [either as an average percentage of time dealing with the characteristic or as a dummy variable (0/1)], and outdoor temperature in a classical stepwise regression analysis. Those of the 95 characteristics and 21 activities that showed some association with endotoxin exposure in the univariate regression analysis were included in the stepwise regression analysis. Potential confounding factors were evaluated. The analyses were based on log-transformed exposure levels to standardize variance and to obtain normally distributed residuals. All of the measurements were used as independent observations in the analysis since the correlation between repeated measurements taken in the summer and winter was low (0.19). Each independent variable had to meet a significance level of at least 0.50 for entry into the model and was kept in the model at a significance level below 0.10. Model adequacy was tested with standard regression techniques such as residual plots and outlier detection.
The derived regression equations were subsequently used to estimate the ln(exposure) during the sampling (2 d) and the ln(exposure) for all the 12 to 14 d for which activity patterns were known. The estimates were based on models using activities as recorded by the individual farmers in their diaries and the farm characteristics derived from the walk-through surveys. For the modeled exposure, an outdoor temperature of 17°C was used for summer and 5°C for winter, which were the average outdoor temperatures during the field work periods. The average of the ln(exposure)s was taken as a measure of the long-term average exposure. The intraindividual and interindividual variance components were estimated by applying a one-way random effect analysis of variance in a manner similar to a method described earlier (25). The variance components were estimated for measured exposure (two measurements), predicted exposure during the sampling (two estimates), and predicted long-term average exposure (≥12 estimates), all on the log scale. An exposure-response relationship was also estimated for predicted exposure during the sampling, to demonstrate the effect of an increased number of exposure estimates per individual obtained with the use of information on surrogate variables. The intra- and interindividual variability was expressed as ${}_{w}R_{0.95}$ and ${}_{b}R_{0.95}$, the ratios between the 97.5th and 2.5th percentiles of the log-normal within- and between-worker exposure distributions, which were computed as $\exp[3.92 \cdot (\text{variance component})^{0.5}]$ (26).
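The prediction step described above can be sketched in a few lines of Python. All coefficient values, variable names, and diary entries below are hypothetical placeholders and are not the fitted values of table 2; only the seasonal temperatures (17°C and 5°C) are taken from the text. The sketch evaluates a regression equation on the ln scale for every diary day and averages the daily predictions.

```python
from statistics import mean

# Hypothetical regression coefficients on the ln(TWA) scale (NOT from table 2).
COEF = {
    "intercept": 4.0,
    "outdoor_temp_c": -0.02,   # per degree Celsius
    "feeding_h": 0.10,         # per hour spent feeding
    "castrating_h": 0.30,      # per hour spent castrating
    "slatted_floor": -0.20,    # dummy variable (0/1)
}

# Average outdoor temperatures used for the modeled exposure (from the text).
SEASON_TEMP_C = {"summer": 17.0, "winter": 5.0}


def predict_ln_exposure(activity_hours, farm, season, coef=COEF):
    """Predicted ln(exposure) for one diary day."""
    ln_e = coef["intercept"] + coef["outdoor_temp_c"] * SEASON_TEMP_C[season]
    ln_e += sum(coef[name] * hours for name, hours in activity_hours.items())
    ln_e += sum(coef[name] * value for name, value in farm.items())
    return ln_e


def long_term_average(diary, farm, coef=COEF):
    """Mean of the daily predicted ln(exposure) over all diary days."""
    return mean(predict_ln_exposure(hours, farm, season, coef)
                for season, hours in diary)


# Hypothetical farmer: 7 summer days and 7 winter days, one winter day with castrating.
farm = {"slatted_floor": 1}
diary = [("summer", {"feeding_h": 2.0})] * 7
diary += [("winter", {"feeding_h": 2.0})] * 6
diary += [("winter", {"feeding_h": 2.0, "castrating_h": 1.0})]

print(round(long_term_average(diary, farm), 3))
```

Because the average is taken over ln(exposure), the result corresponds to the logarithm of a geometric mean rather than of an arithmetic mean, in line with the log-scale analysis described in the text.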
The attenuation of exposure-response relationships due to intraindividual variability relative to interindividual variability was estimated and represented as the attenuation ratio. The attenuation ratio is the ratio between the empirically estimated regression coefficient (b) and the true regression coefficient (β), computed as $(1 + h/n)^{-1}$, in which h is the ratio of the intra- and interindividual variance components, and n is the number of repeated measurements per individual (2).
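The variance components and the ratio h that enters the attenuation ratio can be sketched as follows. This is a minimal illustration that assumes a balanced design (the same number of values per worker) and the usual method-of-moments estimators; the study itself used PROC NESTED, whose handling of unbalanced data is not reproduced here. The data values are hypothetical.

```python
import math
from statistics import mean


def variance_components(workers):
    """One-way random-effects ANOVA for balanced data.

    workers: list of equal-length lists with ln(exposure) values per worker.
    Returns (within-worker variance, between-worker variance).
    """
    k = len(workers)
    n = len(workers[0])
    if k < 2 or n < 2 or any(len(w) != n for w in workers):
        raise ValueError("need >= 2 workers, each with the same number (>= 2) of values")
    worker_means = [mean(w) for w in workers]
    grand_mean = mean(worker_means)  # equals the overall mean in a balanced design
    ms_between = n * sum((m - grand_mean) ** 2 for m in worker_means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for w, m in zip(workers, worker_means) for x in w) / (k * (n - 1))
    var_within = ms_within
    var_between = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates
    return var_within, var_between


def r95(variance_component):
    """Ratio of the 97.5th to the 2.5th percentile of a log-normal distribution."""
    return math.exp(3.92 * math.sqrt(variance_component))


# Hypothetical ln(exposure) values: three workers, two measurements each.
ln_exposure = [[1.0, 3.0], [2.0, 4.0], [6.0, 8.0]]
var_w, var_b = variance_components(ln_exposure)
h = var_w / var_b
print(var_w, var_b, round(h, 3), round(r95(var_w), 1), round(r95(var_b), 1))
```

The factor 3.92 is twice 1.96, the span in standard deviations between the 2.5th and 97.5th percentiles of a normal distribution on the log scale.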
The relationship between the base-line lung function [forced expiratory volume in 1 s (FEV1.0)] and measured and modeled endotoxin exposure was evaluated by means of a multiple linear regression analysis, with adjustment for smoking habits (pack-years), standing height, and age. The following measures of exposure were used in the analyses: average measured exposure, average predicted exposure during sampling, and predicted long-term average exposure. All the exposure measures were based on log-transformed values since exposure variability was also assessed for the log-transformed exposure measure. The statistical tests were one-sided.
The statistical analyses were performed with the program SAS/PC version 6.04 using PROC REG and PROC NESTED.
Table 2. Multiple linear regression analysis of the farm characteristics, activities, and outdoor temperature significantly related to the log-transformed personal endotoxin concentrations of 198 pig farmers (348 observations). Columns: regression coefficient, standard error, P-value.

Exposure levels
Altogether 350 endotoxin measurements were available after 46 samples (12%) were excluded because of failure of equipment during the sampling or failure in the endotoxin analysis. Table 1 gives a summary of the exposure levels. The mean exposure in winter was significantly higher than in summer (paired t-test, P < 0.05).

Modeling exposure
Outdoor temperature, 12 farm characteristics, and 8 activities in pig farming explained 37% of the variation in log-transformed endotoxin exposure (table 2). The activities and farm characteristics explained an equal amount of the variation. The activities in the model were, in general, (almost) daily activities (feeding, controlling, re-penning) and activities performed only once every few days in close contact with very active animals (iron injection, castrating, teeth cutting, ear tagging). Flooring (convex floor, fully slatted floor, fully slatted floor with piglet mat, synthetic grid, combined concrete and metal grid, floor heating, floor heating in combination with delta heating tubes) and, to a lesser extent, feeding methods (manual dosage dry feeding, pig starter, automated dry feeding in trough) were the major groups of farm characteristics in the model. Altogether 164 farmers had recorded time activity patterns during at least two periods of 6 d. For them, long-term average exposure could be estimated with the use of predictive models and data on activity patterns during 12 to 14 d. Further analyses were undertaken for the group of 125 farmers with complete data on measured and modeled endotoxin exposure. In this group, the Pearson correlations of the individual average measured exposure with the predicted long-term average exposure and with the predicted exposure during sampling were 0.53 and 0.81, respectively. Table 3 gives an overview of the different exposure indices and the results of the analyses of variance. The interindividual variance in endotoxin exposure was small for measured exposure, with a ratio of 4.1 between the 97.5th and 2.5th percentiles of the exposure distribution (${}_{b}R_{0.95}$). The intraindividual variance was considerably larger than the interindividual variance; 82% of the total variability in exposure appeared to be day-to-day variability.
The large ratio of intraindividual and interindividual variance would result in largely underestimated exposure-response relationships. With two measurements per individual available, the empirical regression coefficient of base-line lung function on exposure is expected to be only 30% of its true value. As a consequence of the modeling with constant variables for farm characteristics, the intraindividual variance for modeled exposure during sampling was much smaller than for the measured exposure. The interindividual variance was slightly smaller than for the measured exposure. The predicted long-term average exposure yielded an intraindividual variance that was similar to that of the predicted exposure during sampling, but the interindividual variance was increased. The result was a substantial reduction in the variance ratio h from 2.0 for modeled exposure during sampling to 1.2 for modeled exposure during 14 d. The potential attenuation of the regression coefficient of lung function on modeled endotoxin exposure was thereby reduced from 50% to less than 10%, partly by this decrease in h and further by the use of all data on activity patterns during 12 to 14 d.
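The attenuation figures quoted above follow from the attenuation ratio $(1 + h/n)^{-1}$ defined in the methods. As a check, the arithmetic can be written out with the variance ratios reported in the text; taking n = 14 for the modeled long-term average is an assumption, since some farmers had only 12 diary days.

```latex
\begin{align*}
\text{measured exposure } (h = 4.7,\ n = 2):\quad
  & \left(1 + \tfrac{4.7}{2}\right)^{-1} = \tfrac{1}{3.35} \approx 0.30
  && \Rightarrow\ \text{about 70\% attenuation} \\
\text{modeled, during sampling } (h = 2.0,\ n = 2):\quad
  & \left(1 + \tfrac{2.0}{2}\right)^{-1} = \tfrac{1}{2} = 0.50
  && \Rightarrow\ \text{50\% attenuation} \\
\text{modeled, long-term } (h = 1.2,\ n = 14):\quad
  & \left(1 + \tfrac{1.2}{14}\right)^{-1} \approx 0.92
  && \Rightarrow\ \text{about 8\% attenuation}
\end{align*}
```

With n = 12 instead of 14, the last ratio becomes $1/1.1 \approx 0.91$, which is still an attenuation of less than 10%.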

Lung function and exposure
Four people with incomplete data on smoking habits were excluded from the analysis, leaving 121 respondents with complete data. The associations between exposure to endotoxins and base-line FEV1.0 are presented in table 4. The results are given for the entire population with complete data, for ever smokers, and for farmers without chronic respiratory symptoms. There appeared to be a considerable decrease in base-line lung function with increasing predicted long-term average exposure. The P-values were clearly lower for the regression models with predicted (long-term) average exposure than for the regression models with measured exposure. Within the total population, the reduction in FEV1.0 was 210 ml when exposure to endotoxins increased by a factor of 2.72. Larger reductions were seen among ever smokers and asymptomatic farmers. The estimated reduction of 410 ml in FEV1.0 in asymptomatic farmers was statistically significant (P < 0.05, tested one-sided). The other presented associations with the modeled long-term average exposure were borderline statistically significant (0.05 < P < 0.10).
Among the never smokers and symptomatic farmers (not shown), and for all the other cases using predicted exposure during sampling or measured exposure, no exposure-response relationship was observed.
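Because exposure entered the models on the natural-log scale, a factor of 2.72 (e) corresponds to one unit of ln(exposure), and the reported reductions can be rescaled to other exposure contrasts. The following sketch assumes the linear-in-ln(exposure) model described in the methods; the function name is hypothetical, and the rescaled values are arithmetic consequences of the reported coefficients rather than results reported in the paper.

```python
import math


def fev_change_ml(coef_ml_per_ln_unit: float, exposure_ratio: float) -> float:
    """Change in FEV1.0 (ml) for a given multiplicative change in exposure."""
    if exposure_ratio <= 0:
        raise ValueError("exposure_ratio must be positive")
    return coef_ml_per_ln_unit * math.log(exposure_ratio)


# Reported reductions per factor 2.72 increase in exposure (from the text).
for label, coef in (("total population", -210.0), ("asymptomatic farmers", -410.0)):
    per_doubling = fev_change_ml(coef, 2.0)
    print(f"{label}: {per_doubling:.0f} ml per doubling of exposure")
```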

Discussion
Exposure to endotoxins is generally regarded as one of the major respiratory health hazards of pig farmers (18). Our study clearly showed that, despite the considerable measurement effort compared with other studies, it is difficult to detect associations between endotoxin exposure and chronic health effects. For example, in the only previous study among pig farmers that applied personal exposure monitoring to evaluate associations between chronic respiratory health effects and exposure to dust, exposure was based on a single exposure measurement (27). In other studies endotoxin exposure was assessed by static sampling during 1 d among 168 farmers (22) or during 2 d among 46 farmers (28). The large ratio between the intraindividual and interindividual variance in exposure in our study (4.7) implied that the true regression coefficient of lung function on the average measured endotoxin exposure was expected to be attenuated by 70%. With the observed value of h, 42 endotoxin exposure measurements per individual would be required to limit the attenuation of the regression coefficient to 10% (1,2); such a number need not be uncommon (4).
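The figure of 42 measurements follows from solving the attenuation ratio $(1 + h/n)^{-1}$ for n. A minimal sketch, with a hypothetical function name, is given below; it returns the unrounded value, which the text rounds to 42.

```python
def required_measurements(h: float, max_attenuation: float) -> float:
    """Number of repeats n for which (1 + h/n)^-1 equals 1 - max_attenuation.

    Solving (1 + h/n)^-1 = 1 - a for n gives n = h * (1 - a) / a.
    """
    if h < 0 or not 0 < max_attenuation < 1:
        raise ValueError("h must be >= 0 and max_attenuation must lie in (0, 1)")
    return h * (1.0 - max_attenuation) / max_attenuation


# h = 4.7 is the variance ratio for measured exposure reported in this study.
print(round(required_measurements(4.7, 0.10), 1))
```

The required number grows in proportion to h, so a population with more contrast between individuals (smaller h) needs correspondingly fewer repeated measurements for the same attenuation.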
The large value of h was not caused by an exceptionally large intraindividual variability in exposure. The intraindividual variability compared relatively well with that reported for 81 groups of industrial workers exposed to total particulate matter (25). The large value of h can mainly be attributed to a small interindividual variability in exposure. The ratio between the 97.5th and 2.5th percentiles of the exposure distribution (${}_{b}R_{0.95}$) was 4.1. In a recent study (25), this ratio exceeded 4 in 80 out of 165 industrial groups based on job title and factory. For comparison, in a population of 120 workers in eight different job categories in the Dutch animal feed industry, this ratio was 82 for dust exposure and 234 for endotoxin exposure (6). In that study h was about 0.9 for dust and endotoxin exposure, and 30% attenuation would be expected with only two exposure measurements per individual.
The modeled exposure during sampling and the modeled long-term average exposure showed clearly less intraindividual variance than the measured exposure. The ratio of the 97.5th and 2.5th percentiles of the distribution of the individual mean exposure ($R_{0.95}$) was about a factor 4.5 smaller for the modeled long-term average endotoxin exposure and modeled exposure during sampling than for measured exposure. The intraindividual variability in the modeled exposure primarily reflects variation due to the performance of activities present in the models. Day-to-day variability due to time spent on locations with different characteristics was not taken into account. This situation, together with the unmodeled variables, explains the reduction in the intraindividual variance. The interindividual variance of the modeled exposure was only slightly affected when compared with the measured exposure. Clearly, the intraindividual variance in the modeled exposure underestimated the true intraindividual variance in exposure, and the h for the modeled exposure was not a precise estimate of the h for actual exposure.
By modeling, the effect of time spent in locations with different exposures was reduced to zero. However, the variability of the modeled exposure due to activities (and season) would still result in considerable attenuation, about a factor of two, of the exposure-response relationships when modeled exposure during the 2 d of sampling is applied. Long-term average exposure was therefore estimated with the use of the predicted exposure levels for 14 d, a period covering most activities performed by the pig farmers. For this approach, the attenuation of the exposure-response relationships due to intraindividual relative to interindividual variance was estimated to be 8%. The reduction in attenuation, compared with the modeled exposure during sampling, was partially caused by a smaller variance ratio for the estimated long-term average exposure, which was probably caused by taking into account work in pig farming during the weekend and before and after the exposure measurements. The largest decrease, however, was probably a consequence of the increase in the number of repeated exposure estimates per individual, which also increased the chance that rarely occurring activities were included. This result clearly shows the benefit of estimating exposure for situations in which no actual exposure measurements can be performed and which differ in work conditions that presumably affect exposure levels.
In theory it is possible that other types of exposure have been modeled which may be causally related to lung function, but this possibility seems unlikely to be the case. Especially the characteristics of the flooring were found to be related to the exposure levels, which were probably causally related to the endotoxin concentrations. Manure is an important source of gram-negative bacteria, which in turn contain endotoxins. The amount of manure staying on the floor, as well as the possibility of the exchange of air between the pit underneath the floor and the space above, depends on the type of flooring. In addition, activities which explained the variation in dust exposure were similar to those explaining the variation in endotoxin exposure. The modeled dust exposure was, however, not related to any health outcome (not shown).
A large reduction of several hundred milliliters in FEV1.0 was expected over the range of the endotoxin exposure levels when the modeled long-term average exposure was applied. The stronger associations between the modeled long-term average exposure and response sharply contrast with the observed associations when average measured exposure or predicted exposure during sampling is used. Among ever smokers larger associations between lung function and long-term average exposure were observed; this finding suggests an interaction effect between exposure and smoking, as found in some other studies (29,30). In addition, the relationships between lung function and the long-term average exposure estimated for current work conditions seemed to correlate better within the group of asymptomatic farmers than within that of symptomatic farmers. The estimated effect of endotoxin exposure on lung function was larger for all the reported associations, and the P-values were all equal to or below 0.10. Standard errors of regression coefficients were larger with the modeled exposure than with the measured exposure, probably due to error in the modeled exposure (R2 was 37%). These results demonstrate that an exposure-response relationship can be detected by applying exposure modeling techniques in combination with a limited number of exposure measurements per individual, even when the population is almost homogeneously exposed and the day-to-day variability in exposure is relatively large.
In general, several epidemiologic concepts determine the deviation of the estimated exposure-response relationship from its true value, which may be either attenuation, inflation, or change in direction. These concepts include misclassification of disease, the measurement error being differential, selection bias, residual confounding, and the extent to which the applied exposure measure deviates from the etiologically relevant type of exposure. For the etiologically relevant type of exposure, precision and validity aspects determine the strength of the correlation between the applied and relevant exposure measure. The main objective of our study was to use modeling to obtain a more valid measure of long-term average exposure for epidemiologic purposes. This goal could not be achieved by increasing the number of actual exposure measurements for each individual to a sufficient number, since field work was logistically complex and costly. The actual effect of modeling with the effect of time spent in locations with different exposure reduced to zero cannot be assessed, but the effect of increasing the number of days with different work conditions to estimate long-term average exposure seems evident. No positive evidence can directly be derived from the data about which measure of exposure correlates best with the biologically relevant exposure, implying the type of agent and exposure measure (eg, peak exposure, long-term average exposure, or cumulative exposure), since the biologically relevant exposure was not known. In our study long-term average exposure was regarded to be the biologically relevant measure of exposure. A study among Dutch animal feed workers supports this idea (31). In that study similar procedures and techniques were applied for lung function testing and exposure measurements. The average exposure level of animal feed workers was 67 ng·m⁻³ (arithmetic mean) in the highest exposure group, which was a factor two lower than the average exposure level of 130 ng·m⁻³ among the pig farmers. A statistically significant association of 122 ml for FEV1.0 with an increase in endotoxin exposure of 24.8 ng·m⁻³ was observed in this population, which comprised 80% ever smokers. These associations compare very well with those between estimated long-term average exposure and lung function observed in our study. The difference of 97 ng·m⁻³ between the 10th and 90th percentile of the exposure distribution was almost a factor of 2.72, which corresponds to a reduction in lung function of 200 to 400 ml. In an earlier study among cotton workers, a much smaller effect of endotoxin exposure of 24 to 78 ml per 100 ng·m⁻³ in FEV1.0 was reported (32). The exposure-response relationships in our study indicated that modeled long-term average exposure probably correlates better with biologically relevant exposure than the average of the measured exposure does, despite the large proportion of about 65% of unexplained variance in the models. An indication supporting this observation can be found in the repeated monthly measurements among the subpopulation of six farmers. The number of valid exposure measurements per individual ranged from six to nine, and their average was considered to be the best estimate of long-term average exposure. The correlation (R) of the average of two exposure measurements in the summer and winter with this best estimate was 0.61, whereas the correlation of the modeled long-term average exposure with the best estimate was 0.80.
The presented modeling strategy provides an instrument to minimize the effort in exposure sampling in occupational epidemiology, without apparent loss of statistical power. Without increasing exposure monitoring effort, it allows an increase in the number of exposure estimates. The central idea is that empirical models based on a limited number of measurements can be applied to estimate exposure in other cases with the use of simple, measurable exposure surrogates in the models. Information on these surrogates can easily be obtained by interview or questionnaire. A possible application is an increase of the study population in large cohort studies, if exposure is measured in a representative sample of the cohort and estimated for other cohort members. This application will be especially effective when small working units are dealt with, which are often encountered in agriculture or small industries. In these situations complex field work logistics lead to high costs per measurement. A second application of the strategy is increasing the number of exposure estimates per individual; it can be used in retrospective and prospective studies, as well as in cross-sectional studies, as was done in this study. It can result in a more accurate estimate of long-term average exposure than when few actual exposure measurements are performed. This application can also be very effective in work situations in which the large day-to-day variability in exposure levels results in a large value of h. Recently, it was shown that large day-to-day variability can be expected for specific groups of workers, such as those working outdoors, those in places without local exhaust ventilation, those with an intermittent production process or with a local source of exposure, and mobile workers (25). On the basis of this study it seems that the gain in information on exposure through the application of empirical models can by far outweigh the loss of information due to unmeasured factors affecting exposure.