Use of plant- and period-specific job-exposure matrices in studies on occupational cancer.

KAUPPINEN T, PARTANEN T. Use of plant- and period-specific job-exposure matrices in studies on occupational cancer. Scand J Work Environ Health 14 (1988) 161-167. Job-exposure matrices were constructed and applied in the estimation of past exposures in a case-referent study nested within a cohort of Finnish woodworkers. The objective was to avoid bias in the risk estimates because of a misclassification of exposures. The matrices were constructed separately for each plant under study and each calendar year of follow-up. The level of exposure was incorporated in the matrices, since rather comprehensive data on exposures were available. The individual work histories were converted to exposure histories by a computer program which calculated several exposure indicators (eg, level and dose, with and without allowance for a latency period). The comparison between several indicators was thought to provide additional information for the final evaluation of results. The use of the plant- and period-specific job-exposure matrices may be considered in cohort and nested case-referent studies on occupational hazards as an alternative to other procedures used in the estimation of exposures. Specific matrices may find broader applicability along with the increasing availability of detailed hygienic data.

also in other countries even though they have not been constructed in the form of a matrix. Examples of these are the Canadian data based on interviews of cancer patients in a large case-referent study, into which 275 (initially 172) chemical agents have been incorporated (10), and the Finnish register of employees occupationally exposed to 162 (initially 50) carcinogens (3).
The main function of a job-exposure matrix is to provide information about the connections between exposures and diseases through the linkage of the job titles with the exposures in a systematic, unbiased way. Job-exposure matrices can be used as a research tool in the generation, and sometimes in the testing, of hypotheses in register-based or other epidemiologic studies. They may also be used as an instrument of preventive action because they include information about the occupational groups exposed to known hazardous agents.
One of the main problems of job-exposure matrices is the misclassification of exposures, which introduces bias, in many cases a negative one, in the measures of occurrence relation (5). The source of misclassification may be incomplete specificity or sensitivity of the exposure assessment, or both. Low specificity results particularly when the job title classes are broad and a considerable number of the workers classified as exposed are in fact unexposed to the agent because of job-, time-, or plant-specific factors. The omission of a latency period may also lead to loss of specificity, as recent exposures will be recorded as exposures even though their importance in the initiation of the observed cancers may be limited. Low sensitivity results if exposures remain unidentified in the matrix. If complete occupational histories of the subjects are not taken into account, some jobs - and consequently
Table 1. Classification of formaldehyde exposure.
technique described in this report. The results of the study have been published in detail elsewhere (14,19).

Relevant exposures and their classification
All common exposures in the industries under study were identified through a survey of chemicals used and through knowledge of the formation of some air contaminants in the manufacturing processes. Exposures to known or suspected carcinogens among the study population were assessed; these included exposure to formaldehyde, wood dust, chlorophenols, engine exhaust, and certain pesticides. Terpenes and other heating products of coniferous woods, phenol, caseinalbumin glues, melamine glues, and solvents were also included in the matrices. The exposure data for bis(chloromethyl)ether and biological air contaminants (eg, fungal spores) did not permit an accurate evaluation of exposure; they were therefore included in the matrix as potential exposures, assessed mainly by "educated guesses." The total number of exposures in the matrices thus became twelve.
An exact level- and dose-based definition of "exposure" was judged necessary, especially for formaldehyde, in order to minimize the misclassification of exposure due to nonoccupational factors. For instance, the indoor air of dwellings may be an important source of exposure to formaldehyde. The minimum criterion of exposure to formaldehyde was therefore set at a dose of 3 ppm-months and a level of 0.1 ppm, on the basis of estimated exposure in dwellings. For the remaining compounds, one month of exposure at a level clearly exceeding the nonoccupational background level was required.
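The minimum criterion for formaldehyde can be illustrated with a minimal sketch. The two thresholds are taken from the text; everything else is an assumption made for illustration: the names `ExposurePeriod` and `is_exposed_to_formaldehyde` are hypothetical, the criterion is read as requiring both the level and the dose threshold, and only periods at or above the minimum level are assumed to count toward the dose.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for formaldehyde.
MIN_DOSE_PPM_MONTHS = 3.0
MIN_LEVEL_PPM = 0.1


@dataclass
class ExposurePeriod:
    """One period of work at a roughly constant level (hypothetical structure)."""
    level_ppm: float  # estimated mean level during the period
    months: float     # duration of the period in months


def is_exposed_to_formaldehyde(periods):
    """Return True if the subject meets the minimum exposure criterion.

    Assumption: periods below the minimum level are treated as background
    and do not contribute to the cumulative dose.
    """
    qualifying = [p for p in periods if p.level_ppm >= MIN_LEVEL_PPM]
    dose = sum(p.level_ppm * p.months for p in qualifying)
    return dose >= MIN_DOSE_PPM_MONTHS
```

Under these assumptions, a year at 0.5 ppm qualifies (6 ppm-months), whereas ten years at 0.05 ppm does not, because the level never exceeds the assumed background.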
In addition to these criteria, further characteristics of exposure were incorporated into the construction of the exposure classifications. A separate indicator was assigned to agents attached to a "carrier," such as wood dust, because the carrier may affect the risk by changing the distribution of the chemical agent in the respiratory tract. An additional indicator was defined for the peak exposure to formaldehyde, because this may be a specific risk factor, as suggested by animal experiments in which a high risk of nasal cancer was found only for very high formaldehyde concentrations (2,15). Subjects with uncertain exposures were excluded from the analysis in order to reduce the misclassification of exposures. The number of categories of exposure level was decided upon after an evaluation of the available industrial hygienic data. These data allowed the division of the exposure level for formaldehyde, wood dust, and chlorophenols into the category of nonexposure and three levels of exposure. For the remaining agents, one or two categories of exposure were considered sufficient. The classification of formaldehyde exposure is shown as an example in table 1.

Study design
The respiratory cancers incident from 1957 through 1980 among a retrospective entry cohort of 3 805 male workers in Finnish particleboard, plywood, and sawmill industries were identified. The admissibility criteria for the cohort were at least one year of work at the selected plants between 1944 and 1965 and year of birth 1900 or later. The list of eligible workers from 19 plants was linked with the data of the Finnish Cancer Register. Fifty-seven verified cases of respiratory cancer were identified. Three referents matched by year of birth were selected for each case from the cohort. The referents had to be alive at the time of the diagnosis of the corresponding case. The exposures of the cases and the referents were registered for formaldehyde, wood dust, chlorophenols, terpenes, pesticides, and some other exposures with the job-exposure matrix
exposures - may be missed, and the sensitivity of exposure assessment is reduced. The problems involved in the misclassification of exposures in epidemiologic studies have been treated in several articles (6,8,11,12,17,24).
The present article describes an application of job-exposure matrices in a case-referent study on respiratory cancer nested within a cohort of woodworkers. For the purpose of increasing sensitivity and specificity, two additional dimensions, calendar time and plant, were incorporated into the matrices. A further goal was to reduce information bias by excluding the subjective elements in the determination of the 12 exposures under study. A fairly similar procedure has also recently been used in the context of a cohort study among workers exposed to formaldehyde (22).

Relevant jobs and their classification
The general occupational classifications available were not sufficiently detailed for the purposes of the study. Specific and sensitive determination of the exposures requires that the occupational categories be internally homogeneous in regard to the exposures. The manufacturing processes of the industries under study were therefore divided into homogeneous "exposure zones" (7), and the job titles were matched to these zones. This procedure resulted in 73 job categories. Some mills were visited in order to ensure that the old job titles were included and that the classification compared approximately with that in the mill records so that the occupational histories could be coded without difficulty according to the constructed classification system.

Content of the job-exposure matrices
After the basic dimensions of the matrices (the jobs and the exposures) had been defined, the next and the most laborious task was to fill in the "cells" of the matrices. Several sources of information were used. The results of previous hygienic measurements were collected from the mills and from the archives of the Finnish Institute of Occupational Health. The results were critically evaluated because the exposure data may become biased if, for example, the purpose of the measurements, the measuring methods used, the conditions during the measurements, or the sampling strategy are disregarded (20,25). Current hygienic data were used in the estimation of the recent level of exposure and as an indication of the variability of exposures among different mills. A comparison of the present and past data also showed whether any prominent changes in the exposures had taken place over time.
Next, hygienic measurements were made for the jobs with missing or unreliable data. They were carried out in two plywood mills, one particleboard mill, and one sawmill. The preliminary job-exposure matrices were prepared on the basis of the total body of material provided by past and present measurements.
After the occupational histories of the cases and the referents had been collected, the preliminary matrices were checked at the mills. Only those job titles appearing in the occupational histories of the cases and the referents were checked. The case/referent status was blinded from the occupational hygienist (TK), who had only a list of jobs held by the cases and referents. The onset and possible termination of the exposures, the exposure levels, and the changes in the exposure levels over time were registered. The evaluation of the changes over time was based on factors considered to influence the exposures, such as changes in the raw materials used, chemicals used, ventilation, and use of respirators. Information was obtained mainly in interviews with senior foremen and older workers and from surveys of the mills. Histories of the mills, old layouts, and photographs were also used. An example of a checked matrix element is shown in table 2.
The accuracy of the results depends not only on the quality of the matrices but on the occupational history data as well. These data were collected from three independent sources. First, the mill records often provided accurate information on the dates of entry and termination of employment of the worker in different occupations, but in many cases the descriptions of the jobs were not sufficiently accurate. Second, interviews of senior foremen and co-workers resulted in accurate descriptions of the jobs, but the dates were inaccurately recalled. Third, a questionnaire was mailed to the persons themselves (if alive) or to the next-of-kin. This information was used for reviewing the entire work history and for obtaining information on smoking habits. The occupational history was reconstructed as a combination of information from the three sources. Whenever contradictions between the different sources were indicated, the source likely to be the most reliable was chosen as the basis for the coding.

Exposure indicators
A computer program was constructed which linked the occupational histories (for the 19 target mills) with the plant- and period-specific matrices and calculated the exposure data for the cases and referents. The cases and referents were classified according to the level of exposure [eg, in parts per million (ppm)], duration of exposure (in years), and dose (product of the estimated level of exposure and duration of exposure, eg, in ppm-years). The calculated odds ratios were adjusted for the duration of tobacco smoking and the survival status. The computer program also allowed the use of a minimum criterion for the latency period in the calculations. This criterion was set at 10 years, but the calculations were also made without the latency period. Typical sets of results obtained with these procedures are shown in table 3.
Table 3. Odds ratio (OR) estimates and 90 % confidence intervals (90 % CI) for respiratory cancer according to different indicators of exposure to formaldehyde. All the odds ratios have been adjusted by stratification for survival status. Data from Partanen et al (19).
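The linkage described above can be sketched as follows. This is not the original program, whose details are not given here; it is a minimal illustration under stated assumptions. The names `JobSpell` and `exposure_indicators` are hypothetical, the matrix is assumed to be a dictionary keyed by (plant, job code, calendar year, agent), histories are assumed to be resolved to whole calendar years, and exposure is assumed to count only up to the reference year minus the latency period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpell:
    """One entry of an occupational history (hypothetical structure)."""
    plant: str
    job_code: str
    start_year: int  # first calendar year in the job
    end_year: int    # last calendar year in the job (inclusive)


def exposure_indicators(history, matrix, agent, reference_year, latency_years=0):
    """Summarise one subject's exposure to one agent.

    matrix maps (plant, job_code, year, agent) to an estimated level
    (eg, in ppm); a missing key is treated as no exposure.
    Assumption: years later than reference_year - latency_years are ignored.
    """
    cutoff = reference_year - latency_years
    levels = []
    for spell in history:
        for year in range(spell.start_year, spell.end_year + 1):
            if year > cutoff:
                continue  # too recent to count under the latency criterion
            level = matrix.get((spell.plant, spell.job_code, year, agent), 0.0)
            if level > 0:
                levels.append(level)
    duration = len(levels)  # exposed years
    dose = sum(levels)      # level x duration, eg, ppm-years
    mean_level = dose / duration if duration else 0.0
    return {"duration_years": duration, "dose": dose, "mean_level": mean_level}
```

Calling the function twice for the same subject, once with `latency_years=0` and once with `latency_years=10`, gives the two parallel sets of indicators compared in the analysis.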

Discussion
The described procedure is an example of the way plant- and period-specific job-exposure matrices may be constructed and applied in an epidemiologic study. In regard to the applicability of the matrix approach, the nested case-referent design has some advantages over the traditional cohort and population-based case-referent designs. The important features of a valid job-exposure matrix, high specificity and sensitivity, are easier to obtain with the nested case-referent design because only a fairly small number of plants, job titles, exposures, and occupational histories need to be considered. To be sure, this approach is applicable also in cohort studies because they are usually restricted to only one or a few exposures. However, the number of occupational histories to be collected and checked is often much higher than in a nested case-referent study, where only the occupational histories of the cases and referents have to be scrutinized. The population-based case-referent studies, on the other hand, have the disadvantage of embracing a wide spectrum of job titles and exposures. This circumstance forces the researcher to use rather general, nonspecific classifications of job titles and exposures in the matrix, which tends to increase the misclassification of exposures. In the nested case-referent design, high sensitivity and specificity of the matrices can be achieved if the matrices are constructed and checked for every plant and time period involved.
The main objective of the use of job-exposure matrices in the present study was to avoid the misclassification of exposures, which, even if nondifferential, may have a major effect on the risk estimates. This effect can be demonstrated by the following example.
Let us assume that the exposure indicator is coded in a binary fashion (no-yes) and, further, that misclassification of exposure is nondifferential, ie, the sensitivity and specificity of exposure assessment are independent of the case-referent status. This is a reasonable assumption for the present data, particularly if the analysis is done stratified according to the survival status, a likely source of nondifferential misclassification in this study.
In the simple general case of nondifferential misclassification in unstratified, unmatched data, the effect of incomplete sensitivity and specificity is bias towards the null value of the odds ratio. The magnitude of the bias depends not only on the sensitivity and specificity but also on the proportion of exposed persons in the base, estimated by a simple random sample of the base, ie, the referents.
In general, if only a low proportion of persons (say, 0.05 or less) is exposed, even a minor deviance from perfect specificity (eg, from 1.00 to 0.95) results in a marked absolute bias toward the null value (figure 1, left). The higher the proportion of the exposed, the greater the effect of sensitivity on the expected odds ratio, as illustrated in figure 1 (right), in which 50 % of the referents are assumed to be exposed and the sensitivity of the exposure assessment is set at 0.9. On the assumption of perfect specificity, a correct expected odds ratio of 6 will, in this case, be biased to a value slightly higher than 4. Loss in specificity further increases the bias, although not so drastically as loss in sensitivity. For any fixed values of specificity, sensitivity, and proportion of exposed persons, the absolute bias increases as the odds ratio increases.
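The numerical example above can be reproduced with a short calculation. The sketch below assumes the simple unstratified, unmatched setting described in the text; the function name `expected_odds_ratio` and its parameters are introduced here for illustration only.

```python
def expected_odds_ratio(true_or, p_exposed_referents, sensitivity, specificity):
    """Expected odds ratio under nondifferential exposure misclassification.

    true_or: correct odds ratio
    p_exposed_referents: true proportion exposed among referents (0 < p < 1)
    sensitivity, specificity: same values assumed for cases and referents
    """
    # True proportion exposed among cases implied by the true odds ratio.
    odds_cases = true_or * p_exposed_referents / (1.0 - p_exposed_referents)
    p_exposed_cases = odds_cases / (1.0 + odds_cases)

    def observed(p):
        # Proportion classified as exposed: true positives plus false positives.
        return sensitivity * p + (1.0 - specificity) * (1.0 - p)

    q_cases = observed(p_exposed_cases)
    q_referents = observed(p_exposed_referents)
    return (q_cases / (1.0 - q_cases)) / (q_referents / (1.0 - q_referents))


# Half of the referents exposed, sensitivity 0.9, perfect specificity:
print(expected_odds_ratio(6.0, 0.5, 0.9, 1.0))    # 4.125, up to rounding
# Low prevalence (0.05), perfect sensitivity, specificity 0.95:
print(expected_odds_ratio(6.0, 0.05, 1.0, 0.95))  # roughly 3.6
```

The first call gives the value "slightly higher than 4" quoted in the text; the second illustrates how a small loss of specificity matters when few persons are exposed.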
We back-calculated a few odds ratios derived from our data so as to correct for incomplete sensitivity and specificity of exposure assessment. The data were stratified by survival status (alive/deceased). The adjusted odds ratios were then calculated with Gart's procedure (9). We assumed sensitivities and specificities to be between 0.8 and 1, depending on the exposure variate and survival status. For the uncorrected odds ratios between 1 and 1.5, only very minor changes took place, as expected. The greatest absolute change was in the odds ratio for formaldehyde with the provision for a 10-year latency period. On the assumption of a sensitivity of exposure assessment of 0.9 for the living and 0.8 for the deceased and a specificity of 0.95 for both, the original adjusted odds ratio increased from 1.44 to 1.56 after correction.
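The idea of back-calculation can be illustrated for a single stratum. The sketch below is not Gart's procedure and does not reproduce the figures above; it applies the standard algebraic correction of observed exposed counts for assumed sensitivity and specificity, and the function names and the counts in the usage note are hypothetical.

```python
def corrected_exposed(observed_exposed, total, sensitivity, specificity):
    """Estimate the true number exposed from the number classified as exposed.

    Inverts observed = Se * true + (1 - Sp) * (total - true).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_exposed - (1.0 - specificity) * total) / denom
    # Keep the estimate within the possible range of counts.
    return min(max(estimate, 0.0), float(total))


def corrected_odds_ratio(cases_exposed, cases_total,
                         referents_exposed, referents_total,
                         sensitivity, specificity):
    """Odds ratio in one stratum after correcting both groups.

    Assumes nondifferential misclassification (same Se and Sp in both groups)
    and that no corrected cell is zero.
    """
    a = corrected_exposed(cases_exposed, cases_total, sensitivity, specificity)
    c = corrected_exposed(referents_exposed, referents_total,
                          sensitivity, specificity)
    b = cases_total - a
    d = referents_total - c
    return (a * d) / (b * c)
```

With hypothetical counts of 20 exposed among 50 cases and 45 exposed among 150 referents, the uncorrected odds ratio is about 1.56, and `corrected_odds_ratio(20, 50, 45, 150, 0.9, 0.95)` gives a somewhat larger value, in line with the direction of the correction described in the text. A stratified analysis would apply the correction within each survival stratum, with stratum-specific sensitivities, before the strata are combined.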
In the present study, the job-exposure matrix technique probably diminished differential misclassification by excluding subjective elements in the evaluation of exposures. However, the following indications of a negative bias were observed in the preliminary analysis of the data: the exposure-response relationships were often negative (the odds ratio for low exposure was higher than that for high exposure); some odds ratios were below unity; and the provision for a latency period decreased many odds ratios. The most likely reason for these anomalies appeared to be the underreporting of job tasks (and consequently also underreporting of exposures) among the cases. This bias is, however, not related to the job-exposure matrix technique; it is a consequence of asymmetry in the accuracy of occupational histories. The study design did not include matching by survival status; the proportion of living subjects was therefore higher among the referents (67 %) than among the cases (5 %). This asymmetry led to a number of incomplete occupational histories, especially among the "old" cases. Negative exposure-response relationships were to a large extent explained by the fact that remote exposures were often heavier than recent ones. This type of information bias can be corrected either by stratifying the data by survival status (as was done in the present study) or by matching the survival status in the study design phase.
The likelihood of a nondifferential misclassification of exposures will also probably remain low if the plant- and period-specific job-exposure matrices are constructed on the basis of jobs homogeneous in regard to exposures. However, should there still remain heterogeneity within some job categories, misclassification would result. A source of such misclassification is the possibility that some workers doing the same work in the same work area are not exposed because of their regular use of effective respirators. In that case, the use of respirators should also be incorporated in the construction of the job categories.
Another possible source of misclassification is the use of partial occupational histories. Complete occupational histories were elicited in the present study by a postal questionnaire, but jobs held outside the studied mills by the cases and referents were not used in the data analysis because it was too laborious to ascertain all possible exposures from the numerous workplaces reported. Partial omission of the occupational histories may thus have been a source of misclassification of exposure, but this bias was unlikely because additional (omitted) exposures to the studied chemical agents were evaluated, on the basis of the occupational histories collected, as being infrequent and rather evenly distributed among the cases and referents. However, some dose estimates (eg, for wood dust) may have been underestimated or inaccurate and, if so, would tend to level off exposure-response relationships (4).
Reliable methods of exposure assessment based on homogeneous exposure zones and a high quality of occupational histories are important prerequisites of a valid estimation of exposures. However, there is an additional problem to be considered in epidemiologic studies, ie, the choice among various indicators of exposure. An indicator which accurately measures the exposure of the target organ to the chemical agent under study and which includes adjustment for the latency period required is likely to be relevant. For formaldehyde, the most sensitive indicator might well be the estimated dose, with provision for a latency period. However, it may also be argued that some other indicator, for example, the duration of peak exposure with provision for a latency period, is preferable. The use of indicators which do not include the time dimension (eg, the mean level of exposure) has also been preferred over indicators including time (eg, dose, duration of exposure) because the latter may lead to artificial similarity of exposures among the cases and referents. This bias is likely to appear when the design is matched by several time-dependent factors, such as year of birth, year of beginning employment, and duration of employment (21).
Another problem in the choice of exposure indicators appears to be how to "weight" temporal factors, such as remote and recent exposures, the time since the termination of exposure, and the age of the person during exposure. Moreover, the answers to these questions are likely to depend on the type of cancer and on the physicochemical and toxicologic properties of the exposures. (See, eg, reference 23.) Nevertheless, the approach of the present study was to construct several exposure indicators instead of just one. This approach provides information about different aspects of cancer risk, such as the exposure-response relationship and latency dependence, which can then be used for the evaluation of the plausibility of occupational exposures as etiologic factors.
In conclusion, plant- and period-specific job-exposure matrices based on homogeneous exposure zones may be recommended for consideration in nested case-referent and cohort studies on occupational hazards as an alternative to the other procedures used in the estimation of exposures. The matrices of the type used in this study are likely to decrease misclassification, particularly differential misclassification, of exposures. The described procedure also allows for the calculation and comparison of several indicators of exposure which may contribute to the assessment of the credibility of the findings. In the future, when more hygienic data are available for epidemiologic studies and when smaller risks and risks confounded by multiple exposures will probably be studied to an increasing extent, this procedure may be a useful model to follow.