Methods of control for smoking in occupational cohort mortality studies

Methods of control for smoking in occupational cohort mortality studies. J Health 10 (1984) 143-149. Cohort mortality studies, which compare the observed mortality of a historically exposed cohort to an expected mortality based on figures from a national population, are among the most common and useful types of studies done in occupational epidemiology. Such studies are primarily based on records which lack data on smoking. Hence it is often difficult to determine whether an observed excess of a given smoking-related disease is the result of occupational exposures, an excess consumption of cigarettes among the exposed, or an interaction between the two. The purpose of this paper is to outline the different types of control which may be exercised in cohort mortality studies to control for the effects of smoking. Indirect control, which uses only existing data, may be exercised through (i) analysis of other smoking-related causes of death besides the cause-of-death of primary interest, (ii) use of an internal reference group instead of the national population, (iii) adjustment of the excess risk based on hypothetical smoking habits, or (iv) analysis for a dose-response relationship between occupational exposure and disease. Direct control, which requires interviews, may be exercised through (i) a survey of the smoking habits of currently employed cohort members, (ii) a survey of all cohort members, or (iii) a "nested" case-referent study within the cohort. The choice of which method to use depends on the cost and the degree of accuracy required by the investigator.

With no data on smoking in a cohort mortality study, it is often difficult to determine whether an observed excess of disease is a result of occupational exposures, an excess consumption of cigarettes in the exposed cohort, or an interaction between the two.
Occupational cohort mortality studies are often conducted using company or union records to assemble the cohort, which is then followed to determine vital status. These studies use a historical cohort design. The exposures have often occurred far in the past, and a large percentage of the cohort is no longer employed. The studies are usually done solely from records, which consist primarily of work histories, a list of who is alive and who is dead, and the causes of death for those who have died. Often the whereabouts of those who are still alive are unknown, and no effort is made to contact these individuals. Usually no information about the smoking habits of the cohort is available. Disease risk is usually estimated by comparing the mortality experience of the exposed cohort with that of a large national population (such as that of the United States), although occasionally other nonexposed groups are used.
Reprint requests to: Mr K Steenland, National Institute for Occupational Safety and Health, 4676 Columbia Parkway, Cincinnati, OH 45226, USA.
Many diseases are associated with smoking, and many of these same diseases are work-related. For example, lung cancer and nonmalignant respiratory disease are among the smoking-related chronic diseases which can be caused by occupational exposures.
If excess mortality in the cohort is observed for a cause of death which is known to be associated with smoking, it is important to attempt to control for the effects of smoking. There are several methods of control, both indirect methods which use existing records and direct methods which require interviews to obtain information on the smoking habits of the cohort. Most of these methods have been discussed at one point or another in the epidemiologic literature (4). Our purpose in this paper is to review all methods which have been used to date. Some would argue that control for smoking should be built into any study design from the outset, regardless of whether the disease of interest is known to be smoking-related or whether an excess of the disease of interest occurs in the exposed population. This argument is based on the assumption that almost all diseases are smoking-related and that a deficit of disease can also be due to smoking (for example a deficit of heart disease in a nitroglycerine plant). However, obtaining information about smoking for a retrospective study can be costly, and we would argue that in many cases it may be prudent to determine first if a disease known to be related to smoking is in excess before going to the expense of obtaining smoking information.
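The comparison with a national population described above is usually summarized as a standardized mortality ratio (SMR): observed deaths divided by the deaths expected if the cohort's person-years, stratum by stratum, had experienced the national rates. The sketch below is a minimal illustration under stated assumptions: all person-years, rates, and death counts are hypothetical, and the 95 % limits use Byar's approximation to the exact Poisson interval rather than any method named in this paper. Computing SMRs for several causes side by side is also what the indirect check on other smoking-related causes amounts to in practice.

```python
import math

def expected_deaths(person_years, reference_rates):
    """Expected deaths under indirect standardization: cohort person-years in each
    age stratum multiplied by the reference population's rate for that stratum."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))

def smr_with_ci(observed, expected, z=1.96):
    """SMR = observed / expected, with approximate 95% limits from Byar's
    approximation to the exact Poisson interval for the observed count."""
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return observed / expected, lower / expected, upper / expected

# Hypothetical cohort: person-years in three age strata.
person_years = [10_000, 8_000, 4_000]

# Hypothetical causes: observed deaths and national rates per person-year by age stratum.
causes = {
    "lung cancer": (25, [0.0002, 0.0008, 0.0020]),
    "emphysema": (4, [0.00005, 0.0002, 0.0008]),
}

for cause, (observed, rates) in causes.items():
    e = expected_deaths(person_years, rates)
    smr, lo, hi = smr_with_ci(observed, e)
    print(f"{cause}: O={observed} E={e:.1f} SMR={smr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An excess of lung cancer alongside no excess of emphysema, as in these made-up numbers, is the kind of pattern the indirect approach reads as weak evidence against confounding by smoking.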

Indirect control using available records
Assuming the investigators are unable to obtain any direct information on the smoking habits of the cohort in question, there are still four types of indirect control that can be exercised using available records. They are (i) an analysis of other smoking-related diseases, (ii) use of an internal nonexposed cohort, (iii) an adjustment based on hypothesized differences in smoking habits for the exposed and nonexposed populations, and (iv) an analysis of the data for a dose-response relationship. Except for the use of a nonexposed cohort, all these methods involve little or no expense. However, they all lack the precision of direct control, for which data on smoking habits are obtained.

Other smoking-related causes of death
If a given excess risk for any particular cause is in fact due to excess smoking by the exposed cohort when compared to the national population (the reference group), then other smoking-related causes of death should also be elevated. Coronary heart disease, nonmalignant respiratory disease (bronchitis and emphysema), and cancers of the lung, larynx, pancreas, esophagus and bladder are among the diseases associated with smoking. In table 1 we list the standardized mortality ratios for these diseases for smokers in comparison to nonsmokers on the basis of four large cohort studies of smokers, the Dorn study of United States veterans (15), the Hammond study (13), the Weir & Dunn study (29), and the Doll & Peto study (9). Despite the fact that these data are not quite comparable due to different definitions of smokers (current or ever), different age categories, and different periods of follow-up, for most of these diseases the data are consistent from one study to another. Esophageal cancer is an exception, perhaps due to the inclusion of pipe and cigar smokers in the nonsmoking group in the Weir & Dunn study, since the Dorn study showed cigar smokers to have an excess risk of 5.33, and pipe smokers an excess risk of 1.99, for esophageal cancer. Furthermore, for most of these diseases, a consistent dose response is apparent in relation to the amount smoked, a further confirmation that these diseases are in fact associated with smoking. If an exposed cohort has smoked substantially more than the national population, then all the smoking-related diseases should be elevated. It is clear from table 1 that the causes of death most strongly associated with smoking are lung cancer, emphysema, and laryngeal cancer. These diseases can perhaps most easily serve as indirect indicators that a given excess mortality is due to smoking rather than to exposure to a toxic agent.
As an example of this technique, in a study of workers exposed to pesticides, an excess of lung cancer was apparent; yet there were no excesses for cardiovascular disease, cancers of the esophagus, kidney, or pancreas, or for emphysema (3). Hence the authors took the lack of elevation of smoking-related causes to be indirect evidence that the excess of lung cancer was unlikely to be related to smoking.

Use of an internal reference group
Rather than using the national population as a comparison group, it is often advantageous to use another reference group, such as a cohort from a similar socioeconomic class and from the same community. It is well known that smoking habits vary by social class (26). In addition there is evidence for the United States of some slight variation according to large geographic regions (5), although there are few published data on this question. Stronger evidence of geographic variation, controlling for social class, is available for England (8). For these reasons, a local nonexposed group with the same income level as the exposed cohort may have smoking habits more similar to those of the exposed group than those of the national population are. An example of this approach is a study of welders in a local union where there was a 30 % excess rate of lung cancer mortality compared to that of the national population (2). With nonwelders in the same local union as the reference population, a similar excess risk for the welders was observed. This finding could be taken as evidence that the original observed excess was not due to smoking, assuming the nonwelders from the same local union shared the smoking habits of the welders.
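With an internal reference group, the exposed and nonexposed members of the same union can be compared directly rather than each against national rates. A minimal sketch, assuming person-years and deaths are available for both groups within age strata: it pools stratum-specific data into a Mantel-Haenszel rate ratio, with a 95 % confidence interval from the Greenland-Robins variance of its logarithm. This estimator is a standard choice swapped in for illustration, not the method of the welders study (2), and all counts are hypothetical.

```python
import math

def mh_rate_ratio(strata, z=1.96):
    """Mantel-Haenszel rate ratio (exposed vs internal reference group) pooled over
    age strata, with a 95% CI from the Greenland-Robins variance of log(RR_MH).

    strata: list of (deaths_exposed, py_exposed, deaths_reference, py_reference)."""
    num = den = var_num = 0.0
    for d1, py1, d0, py0 in strata:
        t = py1 + py0
        num += d1 * py0 / t
        den += d0 * py1 / t
        var_num += (d1 + d0) * py1 * py0 / t ** 2
    rr = num / den
    se = math.sqrt(var_num / (num * den))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

# Hypothetical lung cancer data for an exposed trade and nonexposed members
# of the same local union, in two age strata.
strata = [
    (8, 12_000, 12, 24_000),   # younger stratum
    (18, 8_000, 28, 16_000),   # older stratum
]

rr, lo, hi = mh_rate_ratio(strata)
print(f"MH rate ratio {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With a single stratum the estimate reduces to the crude rate ratio and the variance to 1/d1 + 1/d0, which is a convenient sanity check.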
However, there is evidence that smoking rates also vary among occupations, even within the same socioeconomic categories (7,21,25,28). For example, in one study (25), the proportion of current smokers within a sample of blue-collar workers ranged from 36 % (stationary engineers) to 72 % (roofers). Hence using another occupational group as a reference population may also have its pitfalls. It may be possible to avoid such pitfalls if the exposed cohort can be divided into low and high exposure groups. The low exposure group can then be considered as an internal comparison group of workers in the same occupation (see later discussion on dose response).

Adjustment based on hypothesized differences in smoking habits
Axelson has pointed out that an adjustment can be made under various assumptions about the smoking habits of the cohort, given that the relative risks of smokers for a variety of diseases are well known and the smoking habits of the reference population can be estimated (1). Numerous prospective epidemiologic studies, such as those shown in table 1, have provided relatively stable estimates of the excess risk smokers incur for a number of causes of death. Furthermore several surveys have provided estimates of the proportion of smokers in the United States population. Axelson chooses an example which considers lung cancer. He assumes that the nonexposed population is composed of 50 % nonsmokers, 40 % moderate smokers, and 10 % heavy smokers, with relative risks of lung cancer of 1, 10, and 20, respectively. He then forms several hypotheses about the smoking habits of the exposed population (see table 2). For example, one such hypothesis is that the exposed cohort is composed of 30 % nonsmokers, 50 % moderate smokers, and 20 % heavy smokers. Under such a hypothesis, the exposed population would have a relative risk of 1.43 compared to the hypothetical nonexposed population, due to smoking alone (the weighted risks are 0.3 × 1 + 0.5 × 10 + 0.2 × 20 = 9.3 for the exposed and 0.5 × 1 + 0.4 × 10 + 0.1 × 20 = 6.5 for the nonexposed, and 9.3/6.5 ≈ 1.43). Then, if a study of an exposed population finds an excess risk on the order of 5-10 and if, in fact, the smoking habits of the exposed cohort differ by the hypothesized amount from the nonexposed, one can safely conclude that smoking cannot account for more than a fraction of the risk found. The point made is that for the degree of risk often found for exposed populations, rather large deviations in the smoking habits of the exposed versus nonexposed populations would have to exist for smoking to explain the excess risk observed.
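Axelson's adjustment is a ratio of two weighted averages of smoking relative risks, so it is easy to script when exploring several hypotheses about the exposed cohort. The sketch below reproduces the lung cancer example just described; the function name is hypothetical, and the observed relative risk of 5 is an illustrative value at the low end of the 5-10 range mentioned in the text, not a result from any study.

```python
def smoking_bias_factor(exposed_dist, reference_dist, relative_risks):
    """Relative risk expected from smoking alone (Axelson-type adjustment).

    Each distribution lists the proportions of a population in the same smoking
    categories; relative_risks gives each category's risk relative to nonsmokers."""
    for dist in (exposed_dist, reference_dist):
        if abs(sum(dist) - 1.0) > 1e-9:
            raise ValueError("smoking distribution must sum to 1")
    risk_exposed = sum(p * rr for p, rr in zip(exposed_dist, relative_risks))
    risk_reference = sum(p * rr for p, rr in zip(reference_dist, relative_risks))
    return risk_exposed / risk_reference

# Axelson's lung cancer example: nonsmokers, moderate smokers, heavy smokers.
relative_risks = [1, 10, 20]
reference = [0.50, 0.40, 0.10]
exposed = [0.30, 0.50, 0.20]

factor = smoking_bias_factor(exposed, reference, relative_risks)
print(f"RR from smoking alone: {factor:.2f}")  # prints 1.43

# A hypothetical observed relative risk, divided by the smoking factor.
observed_rr = 5.0
print(f"Smoking-adjusted RR: {observed_rr / factor:.2f}")
```

Dividing the observed risk by the factor assumes the smoking and exposure effects combine multiplicatively; swapping in other hypothetical distributions for `exposed` shows how large a deviation would be needed for smoking alone to explain a given excess.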
Clearly, these sorts of adjustments are crude. Lacking any direct information on the smoking habits of the exposed cohort, the investigator can only hypothesize such smoking habits and estimate what degree of relative risk might be explained by smoking.
A recent example of the use of this technique is provided by Suta & Thompson (27), who wished to determine the influence of smoking on a previous finding that automobile workers suffered a 32 % excess of lung cancer. Using data from the annual health interview surveys made by the National Center for Health Statistics in the United States, the authors obtained data on the smoking habits of both automobile workers and the general population in the period 1965-1980. These data indicate that automobile workers smoked more than the general population. Note that these data did not allow Suta & Thompson to know the smoking habits of the automobile workers in the particular study with which they were concerned, but only allowed them to hypothesize a difference between the smoking habits of the exposed group and the national population. The authors then adjusted the observed 32 % excess risk downward, and estimated that, after smoking is taken into account, the excess risk declines to somewhere between 6 and 21 %, depending on how the adjustment is made.
Table 2 (after Axelson (1)). The percentages in the table represent the distribution of smokers among the exposed. The exposed are compared with a nonexposed group composed of 50 % nonsmokers, 40 % moderate smokers, and 10 % heavy smokers.
The adjustments used by Suta & Thompson contain a measure of imprecision due to several factors, as pointed out by Silverstein et al (23) in an accompanying editorial. One is that the smoking rates for automobile workers obtained from the health interview surveys show an appreciable variation due to sample size limitations, and the differences observed between auto workers and the general population (7-10 % difference in the percentage of current smokers) were not statistically significant. Another is that, while it is possible to adjust for current smokers and those who have never smoked, an adjustment for former smokers is more difficult. The excess risk of former smokers varies according to the age when they quit, to how much they smoked before they quit, and to how long ago they quit. This information may be unavailable in surveys of smoking habits; for the automobile workers in the health interview survey it was not known how much they smoked before they quit. It is this difficulty in adjusting for former smokers which caused Suta & Thompson to give a range of estimates for the excess risk for auto workers after adjustment.
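Why an uncertain former-smoker risk produces a range of adjusted estimates can be seen by scanning the Axelson-type factor over several assumed values. In this sketch the smoking distributions (never, former, current) and the current-smoker relative risk of 10 are hypothetical assumptions chosen for illustration, so the output does not reproduce Suta & Thompson's 6-21 % range; only the observed 32 % excess comes from the text.

```python
def smoking_bias_factor(exposed_dist, reference_dist, relative_risks):
    """Ratio of smoking-attributable risk in the exposed vs the reference population."""
    risk_exposed = sum(p * rr for p, rr in zip(exposed_dist, relative_risks))
    risk_reference = sum(p * rr for p, rr in zip(reference_dist, relative_risks))
    return risk_exposed / risk_reference

# Hypothetical distributions over (never, former, current) smokers.
reference = [0.35, 0.25, 0.40]
exposed = [0.30, 0.22, 0.48]
current_rr = 10.0      # assumed lung cancer RR for current smokers
observed_rr = 1.32     # the 32 % excess discussed above

# The former-smoker RR is poorly known, so scan a plausible range.
for former_rr in (2.0, 4.0, 6.0, 8.0):
    factor = smoking_bias_factor(exposed, reference, [1.0, former_rr, current_rr])
    adjusted = observed_rr / factor
    print(f"former-smoker RR {former_rr:.0f}: adjusted excess {100 * (adjusted - 1):.0f} %")
```

Because these hypothetical exposed workers include relatively fewer former smokers, a higher assumed former-smoker risk shrinks the smoking difference and leaves more of the excess unexplained.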

Dose-response analysis
It is customary in occupational cohort mortality studies, if an overall excess risk for a given cause is observed, to estimate different risks according to different dose categories. (Since data on dose are usually unavailable, duration of employment often serves as a surrogate for dose.) If a trend toward increasing risk is seen with increasing duration of employment, it is thought to indicate that the excess risk is in fact associated with exposure. This concept certainly has some validity. An observed dose-response relationship is frequently taken as one of the criteria for establishing a true association of an agent with a disease, in animal as well as in human studies. Such a dose-response relationship in a human epidemiologic study can be seen as confirmation of a true association rather than a spurious one due to smoking. However, when duration of employment is used as the measure of dose, it will be strongly correlated with years of cigarette consumption. Since increased years of cigarette consumption are associated with increased risk for a number of diseases, an appropriate analysis would look for an increase in risk with duration of employment stratified by age, so that workers within the same age categories are compared according to their duration of employment. Such a stratification for age is customarily done in standard life-table programs used to calculate a standardized mortality ratio. An evaluation of the dose-response relationship through a comparison of standardized mortality ratios may then be a valid way of controlling for the effect of smoking, if one assumes that the smoking habits of the exposed cohort remained fairly constant over time or at least changed uniformly among dose categories over time. In other words, the stratification for age, within the different dose categories of the exposed cohort, must be assumed to stratify broadly on amount smoked also.
Furthermore there must be no interactive effect between age (or cumulative years of smoking) and the exposure in question for the disease of interest, or else comparisons of indirectly standardized measures will be invalid.
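An age-stratified dose-response analysis of this kind can be sketched as one SMR per duration-of-employment category, with expected deaths accumulated over age strata within each category. The rows, category labels, and national rates below are hypothetical, and the sketch inherits the assumptions just stated: roughly constant smoking habits across categories and no interaction between age and exposure.

```python
from collections import defaultdict

# Hypothetical cohort person-years and deaths cross-classified by
# duration-of-employment category and age stratum.
rows = [
    # (duration category, age stratum, person-years, deaths)
    ("<10 y", "40-59", 12_000, 6),
    ("<10 y", "60-79", 3_000, 6),
    ("10-19 y", "40-59", 8_000, 6),
    ("10-19 y", "60-79", 5_000, 13),
    ("20+ y", "40-59", 3_000, 3),
    ("20+ y", "60-79", 7_000, 24),
]
# Hypothetical national rates per person-year for each age stratum.
reference_rates = {"40-59": 0.0004, "60-79": 0.0020}

def smr_by_category(rows, reference_rates):
    """One SMR per duration category; expected deaths are summed over the age
    strata within that category (indirect standardization for age)."""
    observed = defaultdict(float)
    expected = defaultdict(float)
    for category, age, py, deaths in rows:
        observed[category] += deaths
        expected[category] += py * reference_rates[age]
    return {c: observed[c] / expected[c] for c in observed}

for category, smr in smr_by_category(rows, reference_rates).items():
    print(f"{category}: SMR {smr:.2f}")
```

Note that the longer-employed categories carry more of their person-years in the older stratum; age standardization is what keeps that imbalance, and the extra years of smoking that go with it, from producing the trend by itself.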

Direct control using data on smoking obtained by interview
Direct control for the effects of smoking requires contacting members of the cohort themselves, or other individuals such as next-of-kin who can give reliable information about the smoking habits of cohort members. Such an effort usually increases the cost of the study and might be undertaken only if (i) the excess risk found is small enough that it could in fact be accounted for by smoking rather than by exposure and (ii) the much cheaper methods of indirect control are felt to be too imprecise.
It is usually impossible to contact all cohort members or their surrogates directly to determine smoking habits. Smoking data are usually collected through (i) a survey of a sample of currently employed cohort members, (ii) a survey of a sample of all cohort members, or (iii) a case-referent study of a given disease within the cohort and a survey of only cases and referents.
Generally the case-referent approach will be the most precise method of controlling for smoking. Cost considerations may play a role in the decision as to what kind of direct control should be exercised. The cost of obtaining smoking information depends on the number of people who are to be interviewed.

Survey of currently employed cohort members
Conducting a survey of a sample of currently employed cohort members has the advantage of being much easier than a survey of a sample of all cohort members, many of whom are no longer employed. However current workers may not be representative of all workers in the cohort. For example, persons who have smoked more and have had worse health may have left the workforce.
The smoking habits of currently employed workers may be compared with the smoking habits of the current national population of a similar age, which is available from national surveys. However, the national population includes many nonworkers, whose smoking habits may differ from those of the workers. As an alternative, it may be possible to compare the currently exposed with another group of workers who are current workers but are nonexposed.
A recent example will serve as illustration. In a cohort mortality study of heavy equipment operators (11), the investigators wished to assess whether the cohort's smoking habits differed from those of the national population. A survey of 107 currently working men indicated that 25 % had never smoked. Data from the 1970 health interview survey, conducted by the National Center for Health Statistics in the United States, indicated that 31 % of men in the national population were nonsmokers. The authors conducted a chi-square test, found no significant differences in smoking habits between the cohort and the United States population, and hence concluded that smoking was unlikely to have confounded their results. Another example of this approach is provided by Edling et al (10), who conducted a survey of the smoking habits of current workers and then used an Axelson-type adjustment to correct estimates of excess risk of cardiovascular disease mortality.
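The comparison in the heavy equipment operators study can be approximated with a one-sample chi-square goodness-of-fit test. This is a sketch under assumptions: the paper does not say how many smoking categories were tested, so only two (never vs ever) are used here; the count of 27 never smokers is 25 % of 107 rounded to a whole person; and the national 31 % is taken to mean never smokers. For one degree of freedom the upper-tail probability equals erfc(sqrt(x/2)), which avoids any statistics library.

```python
import math

def chi_square_gof(observed_counts, expected_props):
    """Pearson chi-square statistic comparing observed counts with the counts
    expected under a reference (e.g. national) distribution."""
    n = sum(observed_counts)
    expected = [n * p for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))

def p_value_1df(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

never, ever = 27, 80  # assumed split of the 107 surveyed workers
stat = chi_square_gof([never, ever], [0.31, 0.69])
p = p_value_1df(stat)
print(f"chi-square = {stat:.2f}, p = {p:.2f}")
```

With a sample of this size, a six-point difference in the proportion of never smokers is well within chance variation, which is consistent with the authors' conclusion but also shows that such a survey has limited power to detect moderate differences.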

Survey of past and current workers
A better method to assess the smoking habits of an exposed cohort is to survey a sample of the entire cohort. Such a survey involves tracing employees who are no longer working and who may have died or moved far away from the worksite. When the employees have died, it is necessary to interview a surrogate (such as their next-of-kin or co-workers).
The data derived from interviewing surrogates will presumably be somewhat less reliable than data obtained from cohort members themselves, but they should be relatively accurate as to whether a cohort member smoked or did not smoke. The literature on the validity of surrogate data for smoking is limited. One study (22) indicated that for about 2,000 British and Norwegian immigrants to the United States who died in the 1960s, next-of-kin showed 92 % agreement with the decedent's own information on being a regular smoker versus an occasional smoker or nonsmoker. However, when the comparison was made for more detailed information about how much an individual smoked (never, occasionally, less than one pack, one pack, and more than one pack per day), agreement between the decedent and the next-of-kin fell to 74 %. In another study of lung cancer among shipyard workers (6), the authors interviewed 24 men with lung cancer and their wives concerning the husbands' smoking habits and found 83 % agreement for smoking category. A similar degree of agreement was found by Flanders et al (12), who interviewed 15 live subjects and their nearest relatives to determine the subjects' smoking habits, and by Pershagen & Axelson (19), who compared information on smoking given by the next-of-kin of 14 deceased men with the information in the decedents' medical records. In general, however, more information is needed to determine the validity of smoking data from surrogates.
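The validity studies above report percent agreement. A minimal sketch of that calculation follows, together with Cohen's kappa, a chance-corrected measure that is not used in the studies cited here and is added only for illustration. The 2 × 2 table is hypothetical, constructed so that agreement is 92 % like the first study quoted.

```python
def agreement_and_kappa(table):
    """Percent agreement and Cohen's kappa for a square table in which
    table[i][j] counts subjects placed in category i by the subject's own report
    and category j by the surrogate."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical table: regular smoker vs occasional smoker or nonsmoker.
table = [
    [45, 5],   # subject: regular smoker;   surrogate: regular, not regular
    [3, 47],   # subject: not regular;      surrogate: regular, not regular
]
agreement, kappa = agreement_and_kappa(table)
print(f"agreement {agreement:.0%}, kappa {kappa:.2f}")
```

Kappa is lower than raw agreement because some agreement is expected by chance alone; with finer categories (such as amount smoked) both measures typically fall.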
It is difficult to trace past employees who are still alive but may have moved far away. Fortunately, in many occupational studies, the cohort is composed of individuals who, after retirement, tend to stay in the community where they worked. For example, in a recent study of smelter workers employed at eight different plants (personal communication, Edward Ricci and Phil Enterline, Graduate School of Public Health, University of Pittsburgh, Pennsylvania, United States), the investigators conducted a survey of over 400 cohort members. After cohort members were traced via letters, phone calls, and information provided by friends and relatives, a final response rate of 83 % was attained. These smelter workers came from relatively stable and small communities and were probably easier to trace than workers in large urban centers.
After the survey is conducted, other data must be used to estimate the smoking habits of the reference group. Once data are available on the smoking habits of the exposed and nonexposed, some type of adjustment similar to the one proposed by Axelson must be done to assess the possible confounding effect of smoking on any excess cause-specific risk which was found in the exposed cohort.
An example of this type of adjustment is found in a study of uranium miners (16). In that study a comparison of the smoking habits of the exposed versus those of the national population indicated that an 18 % excess in lung cancer mortality should have been expected for the exposed group due to smoking alone. In fact a 400 % excess of lung cancer mortality occurred.
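Expressed as ratios, the uranium miner figures can be combined in an Axelson-type adjustment. The sketch below assumes the smoking and radiation effects combine multiplicatively, so that the observed ratio is simply divided by the ratio expected from smoking alone; the symbol B for that smoking factor is introduced here for illustration.

```latex
\mathrm{SMR}_{\text{obs}} = 1 + 4.00 = 5.00, \qquad
B_{\text{smoking}} = 1 + 0.18 = 1.18, \qquad
\mathrm{SMR}_{\text{adj}} = \frac{\mathrm{SMR}_{\text{obs}}}{B_{\text{smoking}}} = \frac{5.00}{1.18} \approx 4.24
```

Even after this adjustment, a roughly fourfold excess remains, far more than differences in smoking could explain.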

Nested case-referent study
Once the cohort study is completed and excess risk of a given disease has been found, then an alternative way of controlling for smoking is to conduct a nested case-referent study. In such a study cases are usually all those who died of the disease of interest, and referents are a sample of cohort members who did not die of the disease of interest (or alternatively, had not died of the cause of interest at the time of the case's death). Cases and referents, or their surrogates (usually next-of-kin), must be interviewed to determine smoking habits, and cases and referents must be evaluated for their exposure to the suspected agent. Exposure odds ratios may then be calculated which adjust for smoking. However this approach usually requires that a group of nonexposed individuals is available in the cohort so that there is variability in the cases' and referents' exposure histories.
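A smoking-adjusted exposure odds ratio from a nested case-referent study can be sketched with the Mantel-Haenszel estimator, stratifying on smoking. This is a simplification under stated assumptions: the counts are hypothetical, and the analysis ignores any individual matching of referents to cases (with referents sampled from the risk set at each case's death, a conditional analysis would be more usual). In these made-up data smokers are more often exposed, so the crude odds ratio overstates the effect.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed referents, c = unexposed cases, d = unexposed referents."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over smoking strata of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical nested case-referent data, (a, b, c, d) per smoking stratum.
strata = {
    "nonsmokers": (3, 9, 6, 54),
    "smokers": (30, 30, 10, 30),
}

# Crude OR: collapse the strata by summing each cell across smoking categories.
crude = odds_ratio(*[sum(cells) for cells in zip(*strata.values())])
adjusted = mantel_haenszel_or(list(strata.values()))
for name, cells in strata.items():
    print(f"{name}: OR {odds_ratio(*cells):.2f}")
print(f"crude OR {crude:.2f}, smoking-adjusted MH OR {adjusted:.2f}")
```

When the stratum-specific odds ratios are equal, as here, the Mantel-Haenszel estimate equals that common value; the gap between it and the crude estimate is the confounding by smoking.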
A good example of this approach is a study of copper smelter workers in Sweden by Pershagen et al (20). From a cohort of some 4,000 workers, 76 lung cancer cases were identified. Two referents per case were chosen from among those who had died of diseases other than lung cancer. Interviews were conducted with next-of-kin to determine smoking status, and exposure or nonexposure to arsenic was determined from company records. The authors found a threefold risk of lung cancer for nonsmokers exposed to arsenic. An interaction between arsenic and smoking for lung cancer was noted.
In a case-referent study with dead cases, live referents could be used as well, although the question of the comparability of smoking data would arise if some information came from the subjects themselves while other information came from surrogates. Frequently this problem is avoided by interviewing surrogates for the live referents rather than interviewing the referents themselves. On the other hand there is some evidence (17) that dead referents will have smoked more than live referents, a phenomenon which may bias estimates of risk.
The case-referent approach has the advantage that the investigator can not only control for confounding due to smoking but can also determine whether there is an interaction between smoking and the exposure variable. This ability to assess interaction is the strength of the case-referent approach to controlling for smoking. Perhaps the best known interaction between an occupational exposure and smoking is for the carcinogenic effect of asbestos. Other evidence indicates that exposures to arsenic (20), radon daughters, cadmium, and toxins in the rubber industry (18) may also interact with smoking to increase health risk. A recent report has suggested that smoking may also interact with occupational exposures to decrease lung cancer (24), under the theory that smoking causes an increase in airway mucus, which protects the lung from carcinogens. However, the epidemiologic data to document this last suggestion are weak or are contradicted by other studies (14). Alternatively, if there is such a protective effect, it may be restricted to very specific situations; for example some authors (14) have noted that British coal miners with smoking-induced bronchitis have had less coal dust in their lungs.
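Interaction can be examined by estimating odds ratios for each exposure and smoking combination against a single baseline (unexposed nonsmokers) and comparing the joint effect with what independent effects would predict. The sketch below uses hypothetical counts; the relative excess risk due to interaction (RERI) is a standard measure of departure from additivity added here for illustration, not one used in the studies cited.

```python
def joint_effects(counts):
    """Odds ratios for each exposure/smoking combination relative to unexposed
    nonsmokers. counts maps (exposed, smoker) -> (cases, referents)."""
    base_cases, base_refs = counts[(False, False)]
    base_odds = base_cases / base_refs
    return {key: (cases / refs) / base_odds for key, (cases, refs) in counts.items()}

# Hypothetical case-referent counts by (exposed, smoker).
counts = {
    (False, False): (5, 50),
    (True, False): (6, 20),
    (False, True): (20, 40),
    (True, True): (30, 20),
}

ors = joint_effects(counts)
or_exp, or_smk, or_both = ors[(True, False)], ors[(False, True)], ors[(True, True)]
multiplicative_expected = or_exp * or_smk   # joint OR if effects multiply
additive_expected = or_exp + or_smk - 1     # joint OR if excess risks add
reri = or_both - or_exp - or_smk + 1        # relative excess risk due to interaction

print(f"exposure only {or_exp:.1f}, smoking only {or_smk:.1f}, both {or_both:.1f}")
print(f"expected if multiplicative {multiplicative_expected:.1f}, if additive {additive_expected:.1f}")
print(f"RERI {reri:.1f}")
```

In these invented data the joint effect matches the multiplicative prediction but exceeds the additive one, which shows that whether an "interaction" is present depends on the scale on which it is assessed.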
A disadvantage of the case-referent approach may be the necessity of having a group of nonexposed individuals for a reference group. Without such nonexposed individuals, everyone in the case-referent study will be exposed, and no risk for disease conditional on exposure can be estimated.
In some cases it may be possible to conduct a nested case-referent study to control for smoking within a cohort where everyone has been exposed, if there is sufficient variation in exposure levels. In such a case, the risk of low-exposed individuals can be compared to the risk of high-exposed individuals, smoking being controlled. Whittemore & McMillan (30) have recently provided an example of this approach in their case-referent analysis of a cohort study of uranium miners.

Discussion
It is very rare that an epidemiologist possesses information on the smoking habits of a historical occupational cohort; yet this lack of information has not prevented occupational cohort mortality studies from being undertaken. Indeed occupationally exposed cohorts have provided the definitive evidence of carcinogenicity for most known human carcinogens, including benzene, vinyl chloride, asbestos, bis(chloromethyl)ether (BCME), beta-naphthylamine, coke oven emissions, radon daughters, arsenic, nickel, and hexavalent chromium. Sometimes the excess risks observed have been for cancers which are unrelated or only slightly related to smoking (eg, hepatic angiosarcoma and vinyl chloride, leukemia and benzene). In other cases the risks have been of such magnitude that smoking could obviously not account for them (eg, BCME and radon daughters and lung cancer, beta-naphthylamine and bladder cancer). Occasionally data on smoking were available for the cohorts (eg, asbestos, arsenic, radon daughters). Finally, in some cases an internal reference group has been used (eg, coke oven emissions).
Occupationally exposed cohorts will continue to provide the most solid evidence of associations between many toxins and chronic human disease. This is so because occupationally exposed cohorts have often been exposed sufficiently long ago to relatively high concentrations of toxins so that an effect can be measured. Furthermore mortality studies, rather than morbidity studies in which smoking information can be more easily collected, will continue to be common because of the relative convenience of using routinely collected records. Therefore the problem of disentangling the effects of exposure and smoking will continue. Many investigators are already currently using some method of control when the disease in question is known to be associated with smoking and when an excess risk is observed for an occupational cohort. Our purpose in this report has been to list these methods in a systematic way and stimulate further discussion.