The case-referent (case-control) study in occupational health epidemiology

Epidemiologic studies play an important role in the elucidation of health hazards in industrial work environments. These studies have usually been undertaken as specific research projects and have methodologically often been rather expensive. Since it is desirable, however, to accomplish epidemiologic surveillance of various worker populations more often and rather routinely for the achievement of an effective identification and control of various risk factors, simple, effective, and, therefore, inexpensive methods should be applied. The intent of this communication is to discuss the case-referent (case-control) study design as an often suitable approach in occupational health epidemiology.

BASIC PRINCIPLES OF COHORT AND CASE-REFERENT STUDIES
Studies on morbidity or mortality in an industrial population are often based on the cohort approach. In occupational health epidemiology, a cohort is usually defined as a group of workers with an exposure factor in common, and it is followed up with regard to time-course of exposure and development of illness during a period of time. The general population or another particular cohort is commonly chosen as the referent with regard to the mortality or morbidity under study. Several disorders potentially related to the exposure may be studied within this single cohort.
Cohort studies tend to be rather laborious since, ordinarily, several hundreds or thousands of people have to be studied with regard to exposure and outcome as to disease or death. It is therefore wise to avoid this type of study design whenever possible. However, when the exposure is rare and represented by isolated individuals or small groups scattered over a large population, e.g., throughout a country, the cohort approach is usually the only alternative.
In many situations, the case-referent method represents an interesting and convenient alternative to the cohort design. The general principle of case-referent studies might be illustrated by the following example: If 100 cases of lung cancer are enrolled and interviewed with regard to smoking habits, one might find, say, 97 of them to be smokers. If referents (or "controls") are chosen from the total population providing the cases, or from people with another disease (which bears no relation to smoking), one might find some 60 % of these individuals to be smokers. Thus, the primary information from the case-referent study is a difference in exposure frequency between the cases of lung cancer and the referents. This difference can be utilized in the calculation of the rate ratio ("relative risk") for lung cancer, smokers versus nonsmokers. This is the essence of the information gained from a case-referent study.
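The arithmetic behind the lung cancer example can be sketched as follows. This is a minimal illustration, not part of the original study: the rate ratio is estimated by the exposure odds ratio (cross-product ratio), and the size of the referent series (100 subjects) is an assumption made only so that the 60 % figure translates into counts.

```python
def exposure_odds_ratio(exposed_cases, nonexposed_cases,
                        exposed_referents, nonexposed_referents):
    """Cross-product ratio of a 2x2 case-referent table."""
    return (exposed_cases * nonexposed_referents) / (
        nonexposed_cases * exposed_referents)

# Figures from the text: 97 of 100 cases are smokers.
cases_smokers, cases_nonsmokers = 97, 3
# Assumed referent series of 100 subjects, 60 % of whom smoke.
referents_smokers, referents_nonsmokers = 60, 40

rate_ratio = exposure_odds_ratio(cases_smokers, cases_nonsmokers,
                                 referents_smokers, referents_nonsmokers)
print(f"Estimated rate ratio, smokers versus nonsmokers: {rate_ratio:.1f}")
```

Only the exposure frequencies among cases and referents enter the calculation; the size of the source population is never needed.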
Given a reasonably common exposure, a case-referent study requires less extensive data acquisition than the cohort approach; it only requires an ample source of subjects (cases and referents). A sufficiently high exposure frequency among the subjects may be achieved through an arbitrary choice of the source population. Then, it may be preferable to find a setting in which a fairly large group of workers with the exposure in question live in a parish or a small town. The ascertainment of cases in the population might be based on a local register of deaths, e.g., in a parish or in a county, but it can also be based on the medical files of a local hospital or of an industrial health care unit. Referents are often obtainable from the same source.
In urban areas, any specific industrial exposure tends to be rare, and therefore the case-referent approach easily fails, unless there exist some company-related registers, e.g., the deaths in active ages, or sick leaves or early pensions. Trade union registers should not be forgotten in this context, since a membership might encompass disability and life insurance and result in a registration of causes of disability or deaths.
From a case-referent study only relative measures of the effect, such as the risk or rate ratio ("relative risk"), can be obtained directly. Recent development makes it possible, however, to assess also the absolute mortality (or morbidity) rate among the exposed and the nonexposed, i.e., if the overall rate is known for the source population (19).
However, there are many methodological pitfalls which can form a source of bias in a case-referent study. These various problems have been classified into four different categories, namely, those relating to the role of exposure status in the selection of cases and/or referents, inaccuracy of the information on exposure among cases and referents, possible relation of the reference entity (usually various noncase diagnoses) to the exposure, and, finally, incomplete control of confounding factors [see Miettinen (19)].

SELECTION OF SUBJECTS
A close look at a register or a similar source of subjects might reveal that some individuals have probably entered this source just because of, or in relation to, exposure or nonexposure. There might also otherwise be particular and primary relations to the exposure status. A phenomenon of this type has to be carefully considered, and certain restrictions of the source of subjects might sometimes be required or the source might turn out to be unusable. For example, in a study of neuropsychiatric disorders in relation to solvent exposure (4), it was necessary to restrict the source of subjects, a pension fund register, to encompass only a period in time when no particular concern was given to the possible relationships between solvent exposure and neuropsychiatric disorders. Later on, interest in this connection grew considerably, and one could suspect, therefore, that painters and similar workers more easily acquired a diagnosis of this character than other, nonexposed workers.
If a certain industrial exposure requires medical surveillance of the workers, a similar phenomenon may result, as symptoms, signs, or diagnoses then tend to be observed in view of the exposure. In general, however, primarily exposure-related diagnoses seem to be a fairly rare problem in occupational epidemiology as far as new areas of work-induced disorders are concerned. The situation might change in the future, when improvements in the work environment are epidemiologically evaluated with regard to known hazards and when background information therefore might influence diagnostic thinking. Even then, the assessment of a serious diagnosis of, say, cancer, cardiovascular disease, etc., will hardly be influenced by the physician's knowledge of the exposure status of the individual.

INFORMATION ON EXPOSURE
Adequacy of information on exposure is usually a minor problem in occupational health epidemiology, at least in qualitative terms, if the factory cooperates in providing information on each subject's employment in a particular job. The job history should preferably be ascertained blindly with regard to whether the subject is a case or a referent. If interviews are used for the assessment of exposure, the situation might be more problematic since there could be a tendency, conscious or not, for those suffering from a particular disease, the cases, to somewhat exaggerate their exposure in comparison with the referents. The same phenomenon might occur if relatives have to be interviewed with regard to the job of dead individuals; moreover, the interviewer might easily contribute to this type of bias since exposure among cases could attract more interest and attention than among referents. It is preferable, therefore, to use blinded interviewers and to let the questions cover various extraneous matters in addition to the exposure(s) under consideration. Diseased referents might be apt to reply in a manner more similar to that of the cases than healthy individuals, who might find little interest in answering questions about earlier work conditions.

THE REFERENCE ENTITY
The referents of a case-referent study are supposed to reflect the frequency of exposure in the source population. As has already been indicated, there might be some benefits to using diseased referents, but on the other hand one has to consider carefully what diagnoses can be suitable reference entities. Thus, if the exposure might cause not only the disease under study but also other disorders, the inclusion of these would increase the exposure frequency among the referents in comparison to the source population. The difference in exposure frequency between cases and referents would then be falsely reduced. For example, in a study of cardiovascular deaths and a possible causal role of exposure to nitroglycerine and nitroglycol in the explosives industry (9), it was rather apparent that individuals having died from explosives accidents had to be excluded from the reference series as there is an obvious primary relationship between the exposure and this cause of death. Similarly, in a study of arsenic exposure and lung cancer (3), it was suspected, from earlier observations (10), that cardiovascular deaths (among others) could not be appropriate for the reference entity due to a possible causal role of arsenic exposure.
Thus, based on a priori judgements, the reference entities should be refined to comprise only specific diagnoses unrelated to the exposure. Since there is often a lack of knowledge about such relationships, it might be preferable to use a great variety of diagnoses in the reference series, and not a single diagnosis (or interrelated diseases), to decrease the influence of an unknown connection between the exposure and a particular disorder. Should the exposure frequency be "falsely" increased among the referents through an unknown relationship between the exposure and included referent diagnoses, the result would be an underestimation of the effect of the exposure.
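The direction of this bias can be made concrete with a small numerical sketch. All figures are hypothetical, and the sketch assumes that the exposure odds among subjects with a given diagnosis equal the exposure odds in the source population multiplied by the rate ratio for that diagnosis.

```python
def odds(p):
    return p / (1 - p)

def proportion(o):
    return o / (1 + o)

def observed_odds_ratio(source_exposed, true_rate_ratio,
                        related_fraction, related_rate_ratio):
    """Odds ratio obtained when a fraction of the referents carry a
    diagnosis that is itself caused by the exposure."""
    case_odds = true_rate_ratio * odds(source_exposed)
    # Exposure frequency among referents with the exposure-related diagnosis
    related_exposed = proportion(related_rate_ratio * odds(source_exposed))
    referent_exposed = ((1 - related_fraction) * source_exposed
                        + related_fraction * related_exposed)
    return case_odds / odds(referent_exposed)

# Hypothetical setting: 20 % of the source population exposed, true rate ratio 3.
clean = observed_odds_ratio(0.20, 3.0, related_fraction=0.0, related_rate_ratio=2.0)
mixed = observed_odds_ratio(0.20, 3.0, related_fraction=0.3, related_rate_ratio=2.0)
print(f"Unrelated referents only: {clean:.2f}")
print(f"30 % exposure-related referent diagnoses: {mixed:.2f}")
```

With a varied reference series, any single exposure-related diagnosis makes up only a small fraction of the referents, so the estimate stays close to the true value.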

CONFOUNDING
Confounding factors have to be identified and accounted for. By definition such factors are related both to exposure and to the illness at issue (18). Specifically, in case-referent studies, confounders are determinants of exposure such that their distributions are different between the case and the reference subjects enrolled in the study, a view recently suggested by Miettinen (personal communication).
Confounding factors might be controlled with various methods, one being restriction of the study to encompass only a narrow range of the factor (or a certain level or category of the factor). It is more common, however, to account for the confounding factor(s) either by matching or stratification. Matching means selection for each case of one or more referents from the same category of one or several confounding factors. The matched pairs or sets should be maintained in the analysis of the data, if the matching is relevant (in which case there tends to be a correlation in the exposure pattern within the pairs or sets); further aspects on matching should be studied elsewhere (14,15). In stratification, the individuals in particular categories of the confounding factor(s) are grouped together in various confounder-specific strata. A further possibility is to apply multivariate analysis or a combination of multivariate analysis and stratification by means of a multivariate confounder score as has been suggested by Miettinen (20).
Age should be looked upon as a confounding factor in occupational health studies, but it is not always very important as a confounder. Usually, an increasing amount of exposure will be associated with older age, and the risk of almost any kind of chronic disease also increases with age. If the occupational title is taken as the exposure, age tends to be a very weak confounding factor, especially for skilled workers, since they are apt to stay in their jobs throughout life, i.e., there is little change in exposure (the job) over time (i.e., age).
Smoking is often discussed as a confounding factor in occupational health studies, e.g., in the context of lung cancer due to an industrial exposure. It should be realized, however, that it is unusual that smoking is more closely associated with any particular industrial exposure, i.e., individuals with and without exposure will only rarely have appreciably different smoking habits. It is wise to control smoking whenever possible, however, and particularly if the effect of the industrial exposure is rather weak, say, resulting in a risk or rate ratio below two to three. These aspects have been more fully evaluated elsewhere (1) by means of a quantitative model for the confounding effect of smoking in occupational health studies regarding lung cancer. It should also be observed that confounding effects might be negative or masking, e.g., explosives workers are prohibited to smoke on the job, a regulation which would offer some protection against cardiovascular disease.
Certain aspects of confounding are natural and convenient to control through restriction, as was done in the aforementioned study of neuropsychiatric disorders in relation to solvent exposure (4). Thus, it was necessary to reduce the source of subjects for this study, a pension fund register, to encompass only skilled workers, since neuropsychiatric diagnoses might occur particularly among those with some primary mental deficiency and who therefore were determined unable to achieve exposure to solvents in the skilled trades of painters, varnishers and carpet-layers. Similarly, if other registers are used, e.g., a local register from a parish surrounding a factory, contributing causes of death, diagnoses relating to severe diabetes mellitus or indicating debilitas, etc., should be noticed and the register primarily reduced by the exclusion of individuals having suffered from such disorders. Again, the reason is that these individuals might have been unable to work and therefore to achieve exposure. A problem of this type was faced in a study of cancer and cardiovascular disease in the context of arsenic exposure (3). (See the following discussion.) It was shown in this study that the result of such primary exclusions was conservative in character and tended to reduce the effect of the exposure somewhat.
Neglecting to evaluate a source of subjects in the manner just indicated and not undertaking the proper restrictions would bias the results of a study in either an overestimating or a too conservative direction. Thus an underestimation of the effect of the exposure would occur if the case entity comprises diagnoses with a primary relationship to nonexposure (as mental disorders in the aforementioned solvent study), or an overestimation would result if the reference entity includes disorders which are primarily designated as related to nonexposure (as mental disorders and severe diabetes in the arsenic study).
Other aspects of confounding in occupational health epidemiology refer to concomitant or consecutive exposures to potentially hazardous industrial substances. Such exposure situations are difficult to deal with since one or several of the exposures might fulfill the criteria of a confounder when another exposure factor is under consideration, i.e., such exposures tend to interconfound each other (1) and sometimes turn out to be practically inseparable.
Finally it should be noted in this context that there is both a prestudy situation, which requires judgement about possible confounding factors for the data collection, and an a posteriori situation, in which confounding can be evaluated in the light of the data. When an evaluation has been made, the suspected confounding factors have sometimes turned out to be weak and negligible (8,13) or even negative and masking, as was age for cardiovascular disease and explosives workers (9).

ON THE RELATIONSHIP BETWEEN PROPORTIONAL MORTALITY STUDIES AND CASE-REFERENT DATA
A rather common approach in occupational health epidemiology has been the study of proportional mortality [see Newhouse and Schilling (21)]. Thus, if the number of cases in the exposed population is a and the number of other deaths (noncases) is c, the proportional mortality among the exposed is a/(a + c). Similarly, when b denotes the cases and d the other deaths (noncases) among the nonexposed, the proportional mortality among the nonexposed is b/(b + d). Thus the risk ratio might be estimated as [a/(a + c)]/[b/(b + d)]. If the number of cases of the disease under study, a and b, respectively, is quite small in comparison to c and d, the risk ratio might be estimated as ad/bc. Since this quotient is also taken as the risk ratio (or rate ratio; see the following discussion) in case-referent studies, the similarity between the case-referent approach and the proportional mortality study is apparent as long as a and b are small, i.e., when the disease under study is rare. However, recent development has shown that the rare disease condition is not necessarily a prerequisite for case-referent studies, since, according to Miettinen (19), ad/bc gives the ratio of the incidence rates. (Incidence rate, or incidence density, is the number of incident cases per person-year of observation.) For rare diseases, the incidence rate ratio approaches the ratio of the cumulative risks. (Cumulative risk, or cumulative incidence, is the fraction of individuals falling ill during a defined period of time.) Further insight into the nature of case-referent studies can be obtained from the paper by Miettinen just referred to (19) and makes it apparent that the acquisition of dead referents or diseased referents over the study period will result in an incidence rate ratio, provided that the diagnoses among the referents are not related to the exposure (positively or negatively). Thus the referents simply represent the source population of the cases over the study period in respect of exposure or nonexposure.
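The convergence of the two quotients can be checked numerically. The following sketch uses the notation of the text (a and b for cases, c and d for other deaths, among the exposed and the nonexposed, respectively); the counts are hypothetical and chosen only to contrast a rare with a common cause of death.

```python
def proportional_mortality_ratio(a, b, c, d):
    """Ratio of the proportional mortalities, exposed versus nonexposed."""
    return (a / (a + c)) / (b / (b + d))

def cross_product_ratio(a, b, c, d):
    """The quotient ad/bc used in case-referent studies."""
    return (a * d) / (b * c)

# Hypothetical counts (a, b, c, d)
rare = (10, 5, 990, 995)        # cases are 1 % and 0.5 % of all deaths
common = (300, 150, 700, 850)   # cases are 30 % and 15 % of all deaths

for label, table in (("rare", rare), ("common", common)):
    pmr = proportional_mortality_ratio(*table)
    cpr = cross_product_ratio(*table)
    print(f"{label}: proportional mortality ratio {pmr:.2f}, ad/bc {cpr:.2f}")
```

The two measures nearly coincide when the cases are few relative to the other deaths and diverge as the cause of death under study becomes common.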
Due to the close relationship between case-referent studies and proportional mortality studies, there is rarely a need anymore for the proportional mortality approach, since the case-referent view provides better estimates and does not require the rare disease assumption. In addition to being a theoretically well-established method today, the case-referent approach now also permits the derivation of absolute measures of morbidity, i.e., the incidence rate and the cumulative risk, if the total incidence rate of exposed and nonexposed individuals is known either from the study itself or from other sources (19).
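One way to see how absolute rates follow from a rate ratio is the simple algebra sketched below. It is an illustration under stated assumptions and not necessarily the formulation given in reference (19): the overall rate is taken to be the exposure-weighted average of the two specific rates, and the exposed fraction of the source population is assumed to be estimated from the referents. All figures are hypothetical.

```python
def absolute_rates(overall_rate, rate_ratio, exposed_fraction):
    """Split an overall incidence rate into the rates among the exposed
    and the nonexposed, given the rate ratio and the exposed fraction
    of the source population."""
    rate_nonexposed = overall_rate / (
        exposed_fraction * rate_ratio + (1 - exposed_fraction))
    rate_exposed = rate_ratio * rate_nonexposed
    return rate_exposed, rate_nonexposed

# Hypothetical figures: overall rate 100 per 100,000 person-years,
# rate ratio 3, and 20 % of the referents exposed.
rate_exposed, rate_nonexposed = absolute_rates(100.0, 3.0, 0.20)
print(f"Exposed: {rate_exposed:.1f} per 100,000 person-years")
print(f"Nonexposed: {rate_nonexposed:.1f} per 100,000 person-years")
```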

BENEFITS OF THE CASE-REFERENT METHOD
As has already been pointed out, case-referent studies, whenever possible to apply, are fairly effective and do not usually require extensive resources, since few study subjects represent large source populations through this study design. There is also another advantage, namely, the possibility to study the etiologic influence of several exposures within the same series of cases and referents, e.g., both industrial risk factors and general health hazards, such as smoking, etc. Thus case-referent studies provide for the study of multiple etiologic factors for a given illness (while cohort studies permit the follow-up of multiple effects of a given exposure). It is also easy to evaluate combinations of various factors to elucidate particular risk exposure situations, such as concomitant smoking and certain industrial exposures with regard to lung cancer. Sometimes the result might be surprising, e.g., when the risk for lung cancer was found to be relatively decreased rather than increased from smoking among miners exposed to radon daughters (6).
The disadvantages of case-referent studies have already been indicated from various aspects. The need to derive the absolute measures of disease in an indirect way is one problem. A very disturbing situation occurs if the exposure is rare or scattered. For example, in the case of exposure to trichloroethylene as a possible cause of cancer, only very few individuals were found to be occupied with degreasing operations or otherwise exposed within a factory or in a given geographic area such as a town, parish or another administrative unit providing a register. Therefore the case-referent approach failed, and a cohort had to be established, exposed individuals being taken from a large number of industrial settings throughout Sweden (2).
It might be added that the case-referent approach is particularly convenient if patient registers in hospitals can be utilized as a source of subjects. However, researchers in epidemiology working outside hospitals might experience difficulties in obtaining access to hospital registers; many would probably benefit, therefore, from close cooperation between epidemiologists and clinicians.

AN EXAMPLE OF THE CASE-REFERENT APPROACH
A study on arsenic as a possible cause of various disorders, among them lung cancer, other types of malignancies and cardiovascular disease, might be utilized as a methodological example of the case-referent method (3).
First consider some different sources of cases for a study of this kind. The source could be the death records of the parish surrounding the copper smelter where the arsenic exposure occurred, but one might also have thought of utilizing a trade union register which also included members who had died. Another possibility would have been the use of a company register, either of those who had died in active years or of all people ever employed. In the latter cases it would be necessary to make a comparison with other data (official registers of causes of death) to find the cases. Interestingly, this manner of conducting a study would be to take a case-referent view within a cohort [cf. Axelson and Sundell (5) and Bayliss et al. (7)].
The approach finally chosen was to utilize the death records of the parish. In Sweden these local death registers encompass all individuals residing in the parish at the time of death (although they might have died at a more or less remote and specialized hospital). All subjects in the death register had to be classified with regard to the underlying cause of death, and those who could not be clearly defined had to be excluded. Since a case-referent study is concerned with relative frequencies of exposure, it is not necessary to include every case or potential referent; one must only ensure that excluded (cf. the following text) or missing subjects do not have special relations to exposure. (Although unlikely, one should consider the possibility of a relationship to exposure status by comparing the exposure frequency of the excluded individuals with that of the finally chosen referents.) However, if cases are primarily missing for some reason, the possibility to derive the incidence rates is jeopardized.
Secondly, one might consider the primary relationship between the exposure status and prevalence of the subjects in the register, i.e., of those not already missed due to undefinable underlying causes of death. Individuals suffering from malformations, severe diabetes mellitus, etc., might have died rather early without ever having been able to work, i.e., these diagnoses are related to nonexposure and should not be accepted as (either cases or) referents. If included in the reference entity, there will be a "dilution" of the exposure and therefore an increase of the effect. In the study referred to, these exclusions were shown to act in a conservative direction, just as expected.
Another step was to consider what diagnoses would be acceptable among the referents due to possible causation from the exposure. Thus, from earlier information [particularly the study by Lee and Fraumeni (10)], it was anticipated that individuals having died of lung cancer had to be excluded from the referents in order to evaluate cardiovascular deaths. On the other hand, cardiovascular deaths could not be accepted as a reference entity for lung cancer, nor did other cancer cases seem to be suitable referents. Through such judgements, "refined" reference entities were finally selected, including only such diagnoses which were not known or suspected to have an etiologic relationship to arsenic exposure.
For the assessment of exposure, lists (blinded with regard to whether the subject was a case or a referent) were sent to the company for identification of those individuals who had been employed in various departments. The levels of arsenic and other agents were estimated over time for the various workplaces [for details see original paper (3)]. Then, it was possible to classify the cases and referents into the various exposure categories, attention being paid to duration of exposure, as well as to the level of exposure, and the time period of exposure in relation to the induction-latency time for developing the diseases. It might be worth noticing that the case-referent approach makes it easy to account for latency time, as well as for the period of effective exposure. Thus, in cancer studies the exposure during the last, say, ten years might be disregarded (both among cases and referents).
Age was accounted for as a possible confounding factor since long-term exposure would require a comparatively old age, which also implies a high risk for cardiovascular disease, lung cancer, etc. Stratification by age was done, the data layout for cardiovascular disease appearing in table 1. A restriction to 30-74 years of age seemed reasonable because the younger ages would not provide any information (no cases) and in old ages one might suspect less adequate diagnoses, particularly with regard to cardiovascular disease. Inclusion of old ages would also have diluted an effect, since almost anybody would suffer from cardiovascular disorders in old age and the difficulty to distinguish between cases and referents would increase (and result in exclusions). (See the preceding discussion.) Once the presented aspects have been considered, the remaining analyses need merely be confined to statistical calculations. Thus p-values were obtained by the Mantel-Haenszel test (12) and the extension of it (11). The various measures of effect were calculated, such as the crude rate ratio, the SMR (standardized mortality ratio), and the SRR (standardized risk ratio), the latter measures based on principles given by Miettinen (16,17) and useful for evaluating the strength of the controlled confounding (i.e., through the deviation from unity of the quotient of the crude rate ratio to the SMR, upward deviation indicating positive confounding and downward deviation a negative or masking confounding) as well as for dose-response relationships (the SRR). These aspects are discussed in some detail in the original paper (3) but also elsewhere (1,17).
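The calculations on age-stratified data might be sketched as follows. The counts are hypothetical and are not those of table 1. The summary odds ratio and the chi-square statistic follow the usual Mantel-Haenszel formulas, here without continuity correction. The SMR is computed on the assumption that the expected number of exposed cases in each stratum is bc/d, i.e., that the nonexposed supply the stratum-specific rates; the original paper (3) should be consulted for the exact procedure used there.

```python
# Each stratum: (a, b, c, d) = exposed cases, nonexposed cases,
# exposed referents, nonexposed referents. Hypothetical counts,
# ordered from the youngest to the oldest age group.
strata = [
    (5, 10, 10, 60),
    (15, 20, 20, 50),
    (30, 20, 30, 30),
]

def crude_ratio(strata):
    """Cross-product ratio after collapsing all strata."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return (a * d) / (b * c)

def mantel_haenszel_ratio(strata):
    """Mantel-Haenszel summary odds ratio over the strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel chi-square (1 df), no continuity correction."""
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        cases, referents = a + b, c + d
        exposed, nonexposed = a + c, b + d
        observed += a
        expected += cases * exposed / n
        variance += cases * referents * exposed * nonexposed / (n * n * (n - 1))
    return (observed - expected) ** 2 / variance

def smr(strata):
    """Observed exposed cases over those expected at nonexposed rates."""
    observed = sum(a for a, b, c, d in strata)
    expected = sum(b * c / d for a, b, c, d in strata)
    return observed / expected

crude = crude_ratio(strata)
standardized = smr(strata)
print(f"Crude rate ratio: {crude:.2f}")
print(f"Mantel-Haenszel ratio: {mantel_haenszel_ratio(strata):.2f}")
print(f"Mantel-Haenszel chi-square: {mantel_haenszel_chi2(strata):.2f}")
print(f"SMR: {standardized:.2f}")
print(f"Crude ratio / SMR: {crude / standardized:.2f}")
```

In these invented data both the exposure frequency and the case frequency rise with age, so the quotient of the crude ratio to the SMR exceeds unity, the pattern described in the text as positive confounding.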

ACKNOWLEDGMENT
I am indebted to Prof. Olli Miettinen for his comments on this manuscript.