Sensitization and chronic beryllium disease at a primary manufacturing facility, part 2: validation of historical exposures

Sensitization and chronic beryllium disease at a primary manufacturing facility, part 2: validation of historical exposures. Scand J Work Environ Health 2012;38(3): 259–269. Objective The aim of this study was to evaluate the validity of a job exposure matrix (JEM) constructed for the period 1994–1999. Historical exposure estimates (HEE) for the JEM were constructed for all job and year combinations by applying temporal factors reflecting annual change in area air measurements (1994–1998) to the personal baseline exposure estimates (BEE) collected in 1999. The JEM was generated for an epidemiologic study to examine quantitative exposure–response relationships with sensitization and chronic beryllium disease. Methods The validity of the BEE and HEE was evaluated by comparing them with a validation dataset of independently collected personal beryllium exposure measurements from 1999 and 1994–1998, respectively. Agreement between the JEM and validation data was assessed using relative bias and concordance correlation coefficients (CCC). Results The BEE and HEE overestimated the measured exposures in their respective validation datasets by 8% and 6%, respectively. The CCC, which reflects the deviation of the fitted line from the concordance line, showed good agreement for both BEE (CCC=0.80) and HEE (CCC=0.72). Proportional difference did not change with exposure levels or by process area and year. Overall, the agreement between the JEM and validation estimates (from combined HEE and BEE) was high (CCC=0.77). Conclusions This study demonstrated that the reconstructed beryllium exposures at a manufacturing facility were reliable and can be used in epidemiologic studies.

Exposure to beryllium has been known to lead to beryllium sensitization (BeS) and cause chronic beryllium disease (CBD) (1). However, exposure-response relationships for BeS and CBD have been inconsistent (2, 3). Possible reasons for this inconsistency include lack of accurate and precise estimates of historical exposure leading to exposure misclassification, lack of biologically relevant exposure indices and summary measures, different bioavailability among the various forms of beryllium, exclusion of the skin as a route of exposure for sensitization, and lack of consideration of the impact of dose rate and genetic susceptibility (4). Few epidemiologic studies of BeS and CBD have utilized quantitative exposure data (3, 5-10). In these studies, exposure measurements were collected using different approaches including short-duration breathing zone samples, general area air samples, daily-weighted average measurements (a combination of the breathing zone and general area samples with activity-time data), 37 mm closed-face cassette "total" samples, size-fractionated impactor samples, and fixed air-head area samples. The daily-weighted average has also been used in other studies of beryllium exposures or for evaluating health outcomes such as cancer or changes in pulmonary function (11-14). In all these studies, estimates of past exposures have been based on historical daily-weighted average, area air sampling data, or on the assumption that past exposures are not different from current exposures. Furthermore, none of the epidemiologic studies that have utilized historical exposure data have validated their exposure estimates.
Validation of reconstructed exposures as part of epidemiologic studies is an important process that ensures the reliability of the exposure estimates and provides a degree of confidence in the resulting exposure-response relationships. Hornung et al (15) used internal validation to verify the reliability of estimates reconstructed from an exposure model by splitting the data into model building and validation datasets. However, internal validation is rarely conducted, often because the sample size is inadequate. Many exposure reconstruction studies have used external datasets such as published literature (16-18) or additional exposure data from the same or different workplaces (19-21) for validation. In instances where validation using external data is not feasible, re-sampling using Monte Carlo methods has been used (22).
A variety of statistical methods have been used to evaluate the agreement between the reconstructed (or predicted) and measured (or validation) exposures. Bland-Altman plots of differences (or Tukey mean-difference plots) are commonly employed to assess agreement between two measurement methods by plotting the difference of two measurements against their mean, together with the mean difference (or bias) and its 95% confidence interval (95% CI) (23). The plots of differences provide a valuable graphical representation of the association between two measurement methods. Absolute agreement involves two parameters: bias (distance from the unity line) and precision (variation of bias). Hornung (24) proposed the mean of individual differences between the measured and predicted values as a measure of bias and the standard deviation of individual differences as a measure of precision. A measure of "accuracy" was also proposed that reflected absolute agreement and combined the measures of bias and precision as the square root of the sum of the squared bias and the squared precision. Many validation studies have used these parameters as criteria to determine the validity of their reconstructed exposure estimates (15-17, 19, 20). However, these parameters are unscaled (ie, without fixed boundaries/ranges for measuring performance) and are calculated on the absolute scale of the data, which makes it difficult to interpret the degree of agreement or to compare across studies, among different workplaces or settings, or across different exposure ranges. The concordance correlation coefficient (CCC) provides information on scaled agreement and is often used to assess agreement in the clinical sciences and statistics (25). The CCC has two components, a precision coefficient and an accuracy coefficient, and is calculated as the product of these components. The precision coefficient measures the deviation of each pair from the best-fit line and is equal to the Pearson correlation coefficient (r_p); the accuracy coefficient is a measure of how well the fitted line coincides with the unity line (26, 27). The CCC is especially appealing because it can be separated into its components of accuracy and precision and thus provides good information on the source of disagreement (28, 29).
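The decomposition of the CCC into precision and accuracy can be illustrated with a minimal Python sketch. It assumes the usual method-of-moments form attributed to Lin, with 1/n variances and covariance; the function name `ccc_components` and the example values are hypothetical and are not taken from the study, which used a SAS macro.

```python
from math import sqrt

def ccc_components(x, y):
    """Return (ccc, precision, accuracy) for paired values x and y.

    precision is the Pearson correlation coefficient; accuracy measures how
    far the fitted line lies from the unity line; ccc = precision * accuracy.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Method-of-moments (1/n) variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    precision = cov_xy / sqrt(var_x * var_y)
    accuracy = 2.0 * sqrt(var_x * var_y) / denom
    ccc = 2.0 * cov_xy / denom
    return ccc, precision, accuracy

# Hypothetical job means (µg/m³): a constant shift leaves precision at 1
# but lowers accuracy, and therefore the CCC.
jem = [0.1, 0.4, 0.9, 2.0]
shifted = [v + 0.5 for v in jem]
ccc, precision, accuracy = ccc_components(jem, shifted)
```

In this toy case all disagreement comes from the accuracy component, which is the kind of diagnosis the decomposition is meant to support.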
The objective of this study was to evaluate a job exposure matrix (JEM) developed for an epidemiologic study at a beryllium manufacturing facility by validating the historical exposure estimates using the CCC. Measures of bias were also calculated to allow comparisons with the results published in the literature.

Methods
In 1999, a sub-cohort of 264 short-term workers hired after 1 January 1994 at a beryllium manufacturing facility was surveyed to assess the relationship between beryllium exposure and BeS and CBD (30). A JEM for the period 1994-1999 was constructed for personal total exposure (from 37 mm closed-face cassettes) as well as respirable and submicron (<1 µm) particle beryllium mass concentrations (from personal impactor samplers). Only the JEM for the personal total beryllium measurements was validated in our study because additional personal total measurements were available for the period 1994-1999 from exposure surveillance monitoring. These historical exposure data were too few to form the basis for the historical reconstruction; however, they could be used for the validation of reconstructed exposures.

Summary of the processes and jobs at the plant
A simplified schematic diagram of the manufacturing processes used in the production of beryllium oxide, beryllium metal, and beryllium alloy is shown in figure 1, details of which have been previously described (31). Briefly, jobs associated with the main product lines (ie, beryllium oxide, beryllium metal, and beryllium alloy) were classified as production jobs. Other production jobs were classified as the miscellaneous production group and included the resource recovery process, quality assurance and quality control, and research and development jobs. Jobs classified as production support included laundry, decontamination, and waste-water treatment. Non-production jobs included facility maintenance, electricians, and administrative staff. Altogether, 24 process areas and 269 jobs were defined for exposure reconstruction based on the processes and jobs described above.

Exposure reconstruction method and creation of JEM
A brief description of the exposure reconstruction process is provided here; details are presented in the companion paper (4). For the validation, historical exposures were reconstructed for the period 1994-1999 for all 269 jobs present at the facility in 1999 (not just the 89 job groups in the short-term workers' cohort). We selected all jobs for the validation study because (i) the same data and exposure reconstruction approach will be utilized in multiple epidemiologic studies of short- and long-term workers whose work tenure includes more jobs, and (ii) this study validates the approach taken to reconstruct exposures, not just the jobs in the short-term workers' cohort. A dataset of the plant-wide total beryllium exposure survey conducted in 1999 was used to obtain mean exposures of all jobs, which served as baseline exposure estimates (BEE_{job, 1999}) for the JEM construction. These survey data consisted of repeated measurements of personal full-shift (N=4026) samples collected over a two-month period (June-August) from all jobs and most workers (89%) at the facility. Historical general area air samples (N=76 349 measurements for the years 1994-1999) collected as part of the company's process surveillance monitoring from fixed locations within the 24 process areas across all production and selected non-production units were used to calculate the temporal factors (TF_{process area, year}). A large fraction (39.5%) of the general area samples were below the limit of detection (LOD), and thus the TF were estimated using the Tobit regression model, which uses the maximum likelihood estimate (MLE) method. The TF reflected the change in exposure levels in process areas for any given year (1994-1998). The HEE and the BEE are the minimum variance unbiased estimators (MVUE) of the arithmetic means for jobs in the JEM, which is the desired measure of the arithmetic mean when data are lognormal.
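The way a baseline estimate and a temporal factor combine into a JEM cell can be sketched as follows. The sketch assumes that the TF is a multiplicative factor relative to the 1999 baseline, so that HEE = BEE × TF; this form is inferred from the description here and is not quoted from the companion paper. The job name and the BEE value are hypothetical, and the two TF values for the pebbles plant are those mentioned in the Discussion.

```python
def build_hee(bee_by_job, area_by_job, tf_by_area_year):
    """Return {(job, year): estimate} assuming estimate = BEE(job) * TF(area, year)."""
    hee = {}
    for job, bee in bee_by_job.items():
        area = area_by_job[job]
        for (tf_area, year), tf in tf_by_area_year.items():
            if tf_area == area:
                hee[(job, year)] = bee * tf
    return hee

# Hypothetical baseline mean (µg/m³) for one job in the pebbles plant.
bee_by_job = {"pebbles operator": 0.5}
area_by_job = {"pebbles operator": "pebbles plant"}
# TF = 1 in the baseline year by construction under this assumption.
tf_by_area_year = {("pebbles plant", 1995): 3.2, ("pebbles plant", 1999): 1.0}

jem = build_hee(bee_by_job, area_by_job, tf_by_area_year)
```

Under these assumptions the 1995 cell for the hypothetical job is 3.2 times its 1999 baseline, while the 1999 cell is simply the BEE.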
A considerable fraction (17%) of the total beryllium exposure survey data were below the LOD, and this fraction varied by job. Thus the MVUE arithmetic means were calculated (32) using the mean and standard deviation of log-transformed exposures obtained using the MLE method to account for the measurement data below the LOD (4). The BEE and HEE were evaluated by comparing them to the MVUE arithmetic mean exposures for jobs from a surveillance (validation) dataset of personal total samples collected for the period 1994-1999 and not used in the creation of the JEM, as described in detail below.
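For readers unfamiliar with the MVUE of a lognormal arithmetic mean, a minimal Python sketch follows. It assumes the classical complete-data form, exp(mean of logs) multiplied by a series function of half the log-scale variance, often written psi_n. The study instead supplied MLE-based log-scale parameters to account for values below the LOD, so this is an illustration rather than the authors' procedure; the function names are hypothetical.

```python
from math import exp

def psi_n(t, n, tol=1e-12, max_terms=200):
    """Series 1 + (n-1)t/n + (n-1)^3 t^2 / (n^2 2! (n+1)) + ... for sample size n."""
    total = 1.0
    term = (n - 1) * t / n          # first-order term
    k = 1
    while abs(term) > tol and k < max_terms:
        total += term
        k += 1
        # Ratio of the k-th term to the (k-1)-th term of the series.
        term *= (n - 1) ** 2 * t / (n * k * (n + 2 * k - 3))
    return total

def mvue_mean(mean_log, var_log, n):
    """MVUE of the arithmetic mean from the log-scale mean and sample variance."""
    return exp(mean_log) * psi_n(var_log / 2.0, n)

# For small n the estimate is below the plug-in value exp(mean + variance/2);
# the two converge as n grows.
small_sample = mvue_mean(0.0, 1.0, 5)
large_sample = mvue_mean(0.0, 1.0, 10 ** 6)
```

The small-sample correction is the reason the MVUE is often preferred over the plug-in estimate when a job mean rests on only a few measurements.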

Validation data sources
A validation dataset of 2177 personal total measurements collected from 1994-1999 was available. At this facility, surveillance monitoring using the personal total measurements was started in 1994, although far fewer samples were collected in the earlier than in the later years. The personal total samples were also collected throughout 1999 as part of the routine surveillance monitoring, separate from the 1999 sampling campaign that was conducted for all job titles (which formed the basis for the BEE). These surveillance data had little documentation and thus underwent several quality assurance checks, summarized in figure 2. The purpose of collecting these data likely varied and included identifying problems, evaluating the efficacy of controls, and monitoring worker exposure; thus only samples taken for >6 hours, which likely represented workers' full-shift exposures, were selected for the validation dataset (N=695). The next step was to match these data to the jobs in the JEM. The personal total samples in the validation data were coded by a combination of location (process area) and a specific operation code. Ideally, each exposure code would have matched one of the 269 job codes in the JEM. However, only 12.5% (87 out of 695 measurements) of the exposure codes matched the job codes in the JEM; the remainder of the validation measurements had location codes, but the specific operation code was missing. The operation codes were missing because the job coding scheme was not fully developed until 1999, and the samples from earlier years were not recoded to reflect the final coding scheme. The dataset with confirmed job codes (N=87) will be referred to henceforth as partial validation data and represents data with the most accurate record of job title.
Job codes were retrospectively assigned to measurements that lacked a specific operation code using information on the sampling date, the name of the worker being monitored, and the worker's job title recorded in the company records or the work history records available for the epidemiologic study. Using this approach, 345 additional personal total measurements were assigned a job code, bringing the validation dataset to 432 (87+345), henceforth referred to as full validation data. Therefore, the full validation data consisted of both the confirmed and assigned job codes. These surveillance data formed the validation datasets against which the BEE and HEE in the JEM were compared. Figure 3 provides an overview of the exposure reconstruction and validation process. The JEM was evaluated at two levels, namely the BEE and the HEE. The BEE were compared to the 1999 validation data to assess the consistency between the two datasets and to evaluate potential bias in the BEE due to seasonal effects resulting from the short sampling campaign of the comprehensive survey (during summer months). The HEE were compared to the 1994-1998 validation data, which are thought to be more representative of the historical exposures than the reconstructed (calculated) estimates and were therefore considered as the target ("true") values in the validation process. Validation of the BEE was accomplished using both validation datasets, namely the partial and full validation data, to identify the most reasonable dataset for further validity testing. The validity of the HEE was assessed using only the full validation data, as this dataset had a larger sample size, included more job codes, and provided marginally better agreement statistics than the partial validation dataset for the BEE.
A large fraction of the validation data were below the LOD, and the MVUE arithmetic means of jobs in the validation datasets were calculated as previously described using the mean and standard deviation of log-transformed exposures obtained using the MLE method (4). The MVUE arithmetic means of jobs in the validation datasets were then directly compared to the MVUE arithmetic means for jobs in the JEM. All means in this study are MVUE arithmetic means, and will henceforth be referred to simply as means.
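The idea behind estimating a log-scale mean and standard deviation when some measurements fall below the LOD can be sketched in Python. The sketch assumes a single LOD, an intercept-only model, and a crude grid search in place of a proper optimizer; it is not the SAS procedure used in the study, and the data values and function names are hypothetical. Detected values contribute a normal density on the log scale, and each non-detect contributes the probability of falling below the log LOD.

```python
from math import erfc, log, pi, sqrt

def log_norm_cdf(z):
    """Log of the standard normal CDF, written with erfc for tail accuracy."""
    return log(0.5 * erfc(-z / sqrt(2.0)))

def censored_loglik(mu, sigma, log_detects, n_censored, log_lod):
    """Log-likelihood for left-censored data that are normal on the log scale."""
    ll = 0.0
    for y in log_detects:
        z = (y - mu) / sigma
        ll += -0.5 * z * z - log(sigma) - 0.5 * log(2.0 * pi)
    if n_censored:
        ll += n_censored * log_norm_cdf((log_lod - mu) / sigma)
    return ll

def grid_mle(log_detects, n_censored, log_lod):
    """Crude grid search over mu in [-6, 0] and sigma in [0.2, 3]."""
    best = None
    for i in range(301):
        mu = -6.0 + 0.02 * i
        for j in range(141):
            sigma = 0.2 + 0.02 * j
            ll = censored_loglik(mu, sigma, log_detects, n_censored, log_lod)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]

# Hypothetical job: six detected values (µg/m³) and four non-detects.
detects = [0.05, 0.08, 0.12, 0.2, 0.35, 0.6]
log_detects = [log(v) for v in detects]
mu_hat, sigma_hat = grid_mle(log_detects, n_censored=4, log_lod=log(0.04))
naive_mu = sum(log_detects) / len(log_detects)
```

Because the non-detects pull the fitted distribution downward, the estimated log-scale mean lies below the mean of the detected values alone, which is the bias that ignoring non-detects would introduce.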

Validation process
For validation, bias was first calculated as the mean difference between reconstructed and validation values on a logarithmic scale (represented by bias in log_e) (24), which was then used to calculate equivalent bias on a linear scale as described in equation 1 (20), where bias in log_e = [log_e predicted mean - log_e measured mean]. Then, relative bias was calculated as a ratio of equivalent bias and measured value, converted to a percent as shown in equation 2.

Relative bias (%) = (equivalent bias / measured value) × 100 (equation 2)

Epidemiologic studies often utilize exposure categories in their exposure-response model. To evaluate the validity of one potential exposure categorization scheme based on beryllium occupational exposure limits of 2.0 µg/m³ (33) and 0.2 µg/m³ (34), the estimates in the JEM (HEE and BEE) as well as in the validation dataset were categorized into three exposure groups: high (>2 µg/m³), medium (0.2-2 µg/m³), or low (<0.2 µg/m³). Agreement between the JEM and validation categories was then evaluated as the percent of measurements in the concordant cells. In addition, the overall agreement between the JEM (combined BEE and HEE) and the validation estimates was evaluated by generating a scatter plot and calculating quantitative agreement statistics.

Statistical analysis
All statistical analyses were conducted using SAS 9.2 (SAS Institute Inc, Cary, NC, USA). The distribution of the personal total samples was evaluated graphically using probability plots. To assess the degree of agreement between the means of jobs in the JEM and the validation dataset, scatter plots and proportional difference plots (a variation of the Bland-Altman plots) were created in SigmaPlot 11.0 (Systat Software Inc, San Jose, CA, USA). Proportional difference was calculated as the difference between the JEM and validation estimates divided by the average of the JEM and validation estimates and was plotted on the y-axis against the average of the two measurements on the x-axis. The LIFEREG procedure in SAS was used to calculate the mean and variance of the log-transformed personal total measurements in the validation dataset to account for measurements below the LOD (35). The CCC was calculated in SAS according to the method of moments described by Lin (27), using a SAS macro made available by Lin et al (36). Analysis of variance (ANOVA) tests were conducted to examine patterns of differences between the JEM and validation job means by process area and by year. Weighted kappa (κ) and percent concordance were calculated to assess agreement between the reconstructed and the validation exposure categories (ie, low, medium, and high).

Results
Table 1 presents the number of personal total measurements and job codes by sampling year in the two validation datasets. The validation datasets consisted of 34 jobs from the partial validation dataset and 65 jobs from the full validation dataset and represented a small portion of the 269 jobs in the facility. The 269 jobs included 132 and 137 jobs in the production and non-production areas of the plant, respectively. Thus, among jobs in production areas, 25% (33/132) of job codes in the partial validation dataset and 42% (56/132) of job codes in the full validation dataset could be validated, while among non-production jobs, only 1% (1/137) of job codes in the partial validation dataset and 7% (9/137) of job codes in the full validation dataset could be validated. However, of the 89 job groups present in the short-term workers' work histories, 42 jobs were also in the full validation dataset; thus the validation represents 47% of the jobs held by the study participants. Table 1 further illustrates that fewer jobs (and measurements) could be evaluated in the earlier than in the later years. The full validation dataset provides a better representation of jobs in a given year compared to the partial validation dataset. The percentages of measurements below the LOD were 60% and 33% for the partial validation and full validation datasets, respectively. There were many more production and high-exposure jobs in the full validation dataset than in the partial validation dataset.

Validation 1: validity of BEE
For the BEE (1999 data only), 24 jobs in the partial validation dataset and 47 jobs in the full validation dataset were available to validate the estimates for jobs in the JEM for 1999. The means in the JEM for jobs were plotted against the means for the corresponding (matched) jobs in the two validation datasets and showed good overall agreement (CCC≈0.8) (figure 4). The precision coefficients suggest that better agreement was achieved in the full compared to the partial validation dataset. Thus the full validation dataset was determined to be the optimum dataset to assess agreement and was used for all subsequent analyses. Figure 5 shows the distribution of the differences around the zero line (no difference or bias); a small positive bias on average was observed but no patterns by different process areas were observed based on the ANOVA results (P=0.09). The median value of equivalent bias for the BEE (1999) was 0.01 µg/m³ and overall the BEE overestimated the validation estimates by 8% (last row in table 2).
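The bias statistics reported here can be illustrated with a short Python sketch. The log-scale bias and the proportional difference follow the definitions given in the Methods. The conversion to a percent, (exp(bias) - 1) × 100, is a commonly used form and is an assumption here, since it may differ in detail from the equations used in the study. All values and function names are hypothetical.

```python
from math import exp, log

def log_bias(predicted, measured):
    """Mean of [log_e predicted mean - log_e measured mean] over matched jobs."""
    diffs = [log(p) - log(m) for p, m in zip(predicted, measured)]
    return sum(diffs) / len(diffs)

def relative_bias_percent(bias_log):
    """Assumed conversion of a log-scale bias to a percent on the linear scale."""
    return (exp(bias_log) - 1.0) * 100.0

def proportional_difference(jem, validation):
    """Difference between the two estimates divided by their average."""
    return (jem - validation) / ((jem + validation) / 2.0)

# Hypothetical job means (µg/m³) in which every JEM estimate is 8% too high.
measured = [0.1, 0.5, 2.0]
predicted = [1.08 * m for m in measured]
rel_bias = relative_bias_percent(log_bias(predicted, measured))
prop_diff = proportional_difference(1.08, 1.0)
```

Working on the log scale keeps a few high-exposure jobs from dominating the average, which is one reason the bias is computed there before being converted back.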

Validation 2: validity of HEE
For HEE, 65 job-year combinations from the full validation dataset were available from the period 1994-1998 to validate the estimates for jobs in the JEM for the same period. The comparison of reconstructed exposure estimates with the validation dataset is presented in table 2 and plotted in figure 6. Overall equivalent bias and relative bias of HEE were 0.04 µg/m³ and 6.2%, respectively (first row in table 2). Although overall the HEE were found to slightly overestimate beryllium exposure (compared to the validation data), in 1994 and 1997-1998 the HEE underestimated exposure (relative bias of approximately -47% to -5%). The 1994 data are based on very few measurements (N=4) and only two jobs. Relative bias was the highest in 1994 and lowest in 1998. The scatter plots and proportional difference plots by process area and sampling year are presented in figure 6(a) and (b), respectively. The scatter plots (on the left side) show the spread of the data around the unity line and the plots on the right side show the proportional difference. The CCC, which reflects the deviation of the fitted line from the concordance line, was 0.72 and was influenced by a lack of precision (precision coefficient=0.72) rather than a lack of accuracy (accuracy coefficient=0.99). The plots of proportional difference showed a small positive bias, and no pattern in the difference by process area or year was observed. This absence of pattern was further corroborated by the ANOVA test, which showed no significant pattern in proportional difference by process area or by sampling year (P=0.11 and 0.75, respectively). Overall, the agreement between the JEM and validation estimates (from combined HEE and BEE) was high (CCC=0.77). For the evaluation of the exposure categories, figure 7 presents the exposure grouping into the three categories by reconstructed and validation categories. The percent concordance (ie, the percent of reconstructed exposure jobs correctly assigned to the corresponding validation category) was 63% (72/114), and the proportions varied by category, with better performance in the low (76%, 34/45) and medium (59%, 30/51) categories than in the high category (44%, 8/18). All the misclassifications were just one category away from the concordance cell, with no observations in the cells two categories away from the concordance cell.
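The categorical comparison can be sketched in Python. The cut points (0.2 and 2 µg/m³) and the concordant counts are those reported in the text; the function names and the paired example values are hypothetical. Weighted kappa is not computed because it requires the full cross-tabulation, which is shown only in figure 7.

```python
def exposure_category(mean_ug_m3):
    """Assign a job mean (µg/m³) to the low, medium, or high category."""
    if mean_ug_m3 > 2.0:
        return "high"
    if mean_ug_m3 < 0.2:
        return "low"
    return "medium"

def percent_concordance(pairs):
    """Percent of (JEM mean, validation mean) pairs falling in the same category."""
    agree = sum(
        1 for jem, val in pairs
        if exposure_category(jem) == exposure_category(val)
    )
    return 100.0 * agree / len(pairs)

# Hypothetical matched job means: the second pair is discordant.
example = percent_concordance([(0.1, 0.15), (0.3, 0.1), (2.5, 3.0), (1.0, 0.4)])

# Concordant and total counts by validation category, as reported in the text.
reported = {"low": (34, 45), "medium": (30, 51), "high": (8, 18)}
concordant = sum(hit for hit, _ in reported.values())
total = sum(n for _, n in reported.values())
overall = 100.0 * concordant / total
```

Recomputing from the reported counts gives 72 concordant jobs out of 114, consistent with the overall figure of 63%.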

Discussion
The utility of occupational epidemiology studies in risk assessment depends in part on the quality (including amount, validity, and precision) of exposure data. Validation of reconstructed historical exposure data is a critical and integral component of epidemiological studies, as the reliability of the study results depends on the validity of the exposure data (37). However, only a few epidemiologic studies have dealt with the validity of reconstructed exposure estimates using independently collected measurement data (37, 38). In historical studies of BeS and CBD, inconsistent exposure-response relationships were observed when using the daily-weighted average or general area measurements as airborne exposure metrics. We reconstructed personal beryllium exposure estimates for an epidemiologic study at a beryllium manufacturing facility (4) and validated those estimates. Although reconstructed beryllium exposure tended to be slightly higher than the measured mean, it is important to note that the overestimation of reconstructed exposures would lead to underestimated risk per unit of exposure (16, 39). Furthermore, any biases in reconstructed exposure estimates in this study were non-differential (unrelated to outcome status), which would attenuate the exposure-response relationship, but the magnitude of the attenuation cannot be estimated (39). The precision coefficient of the CCC (r_p) between the JEM estimates and the validation measurements was relatively strong (r_p=0.72), which may lead to little attenuation of the exposure-disease relationship. The observed relative bias and precision coefficient of exposure estimates created in our study are comparable to those reported in other occupational exposure reconstruction studies. Astrakianakis et al (19) constructed retrospective exposure models based on data from 503 factories in the Chinese textile industry. They evaluated the validity of the reconstructed estimates from a model with external validation datasets from two factories that were not used for model building. They reported low relative bias (mean -2%, range -30% to 118% by process) with moderate correlation (r_p=0.59).
[...] specific dusts from 13 saw mills to generate HEE; the relative biases were -1% and -12%, respectively, with strong correlations (r_p=0.70 and 0.79) for wood and non-specific dusts. These study findings are similar to our results, which also showed low overall bias and moderate-to-strong precision. Stewart et al (37) developed HEE for a mortality study among acrylonitrile workers using a deterministic model incorporating four exposure determinants; predicted exposures overestimated measured mean values by 17%. In contrast, Hornung et al (15) reported an overall relative bias of -24% in a model used to predict ethylene oxide exposures when compared to an internal validation dataset. Burstyn et al (16) used external datasets obtained from different countries to validate exposure estimates from a model for benzo(a)pyrene in asphalt paving. They found weak correlation (r_p=0.28) and high relative bias (approximately -70% to -50%). Glass et al (18) [...] quantitative agreement statistics were not reported, the authors used 95% CI of the estimates and the validation data to judge agreement. They reported that the validation data generally confirmed the baseline and the modifier estimates used in the algorithm. However, when the baseline estimates were significantly different from the validation values, they adjusted their baseline estimates by averaging them with the validation values. In our study, exposures were reconstructed over a relatively short time period (1994-1999), during which exposure may not be expected to change significantly. However, this period immediately followed an epidemiologic survey (conducted in 1993-1994) that documented elevated prevalence of BeS and CBD (5). In response to those findings, the company made changes to the plant and production processes during the period 1996-1999 (41).
Table 3 shows the TF and a description of the major process changes by process area and sampling year that could potentially impact beryllium exposures. The process change information was obtained from the technical history of the plant, based on information provided by the company in monthly reports, process descriptions, and interviews with technical staff (process engineers and industrial hygienists) and long-term employees. The time trend analysis suggests that process changes such as enclosures and engineering controls corresponded to changes in the TF in some process areas. For example, from 1995-1999, the TF decreased from 3.2 to 1 in the pebbles plant. During the same period, plant history records indicated installation of new ventilation and enclosure of some processes. In other areas, it was not feasible to examine the effect of process changes on the TF, for example, in beryllium machining, some alloy production, and all non-production areas, due to the high percentage of measurements below the LOD and/or small sample size. Finally, in some work areas, the TF did not correspond to recorded changes in process or lack thereof. For example, the Whiting furnace area showed considerable reduction in exposure over time but no significant process change was documented in the monthly reports or noted during interviews with employees. Nevertheless, the HEE, which are a combination of the BEE with TF, showed good agreement with the validation data.
A potential limitation of this study may arise from the amount and quality of the validation data. Personal beryllium sampling data used to test validity were assumed to represent the "true exposure" in each job-year combination. However, a limited number of samples were included in the evaluation. The small number of measurements used for calculation of job means may lead to low precision in the estimates. In addition, only a limited number of jobs across the facility were available in each year; thus the validation result may not be representative of the entire JEM. For example, 17 jobs from 143 measurements were available in 1997, and comparatively low relative bias was also observed in the same year. In addition, there was little documentation describing the validation data; in particular, information was lacking on the purpose of each sample and the specific job codes sampled. The purpose of a sampling strategy will differ depending on the goal (eg, compliance) and will likely influence exposure measurements (37). Finally, Virji et al (4) assumed that the time trends in general area air sampling data represented the time trends in personal exposures. However, exposure factors causing time trends in area sampling might be different from those affecting personal sampling. For example, table 4 does not include information on changes in administrative controls, which may have an impact on personal beryllium exposure but not on area sampling. Nevertheless, these limitations are common to most studies relying on historical information. Despite these limitations, the availability of data from surveillance monitoring allowed validation of a JEM based on estimates of personal measurements. Thus the validated JEM offers a degree of confidence in the JEM estimates and the resulting exposure-response relation between beryllium exposure and BeS and CBD.

Ethics
Michael S Kent is employed by Materion Brush Inc, the owner of the beryllium manufacturing facility described in this report.