Epidemiologic study of work-related diseases: Methodological problems of register-based studies

HARRINGTON JM. Epidemiologic study of work-related diseases: Methodological problems of register-based studies. Scand J Work Environ Health 10 (1984) 353-359. The efficient execution of an occupational epidemiologic study requires accurate information on both exposure and effect. In this paper emphasis is placed on the use of registers of occupation or disease as tools in the undertaking of descriptive and analytical epidemiologic studies. While cancer registers are a common information source nowadays, their validity is not always consistently good. Other sources of information, such as industry records, professional membership listings, and disease notifications are also used. The inaccuracies of these registers - established usually for nonepidemiologic purposes - are outlined. Specific examples are used to illustrate the problems and pitfalls involved in interpreting the derived results. The study populations discussed range from the high enumeration feasible with life-long occupational groupings, such as pathologists and sea pilots, to attempts to trace individuals whose "membership" in an exposure group might be residence in the American Embassy in Moscow or recognition as short-service military personnel exposed to A-bomb testing a generation ago. The recent development of using census-based occupational information to trace mortality and morbidity is also discussed.

Occupational health is concerned with the interrelationship of health and work. In the past much emphasis has been placed on discovering the role played by work in initiating disease. More recently, concern has shifted to a broader base, namely, the role played by work in the disease process, be it initiator, contributor, or exacerbator. Epidemiologic method is an important research tool in all these cause-effect relationships. Detailed accounts of the uses of epidemiology are well documented elsewhere (21,35), but it is important to emphasize that the preventive health approach to populations of people is fundamental to occupational health practice and to the science of epidemiology.

Types of epidemiologic study
Epidemiology is the study of the distribution and determinants of disease frequency in human populations (30). In terms of occupational health, new work-disease relationships have been described through relatively simple inquiries (descriptive studies), while such observed "effects" or even anecdotal accounts have been confirmed or refuted by the use of analytical studies. Rarely can experiments be conducted to evaluate the importance and/or effectiveness of a variety of probable etiologic factors in the incidence or prevalence of work-related disease. Central to the whole procedure, however, is the establishment of a cause-effect relationship, preferably with a dose-effect component, which can then be the starting point for logical preventive action. Such a statement presupposes the acquisition of a variety of reliable pieces of information. This process of acquisition is the difficult part of epidemiology, not the subsequent statistical analysis.
The items of specific importance are a reliable record of morbidity and/or mortality in defined populations with an accurate record of relevant exposures of interest. Thereafter, the process of mathematical manipulation is comparatively straightforward. The purpose of this paper is to review some of the sources of such records and the methodological pitfalls inherent in their discovery and usage.

General
As Hernberg (26) states, "The perfect epidemiologic study still remains to be done [p 163]." No study to date is without methodological flaws, yet the magnitude and direction of these errors play an increasingly evocative role in the interpretation placed on the epidemiologic study. Errors are of two main types, systematic or random. The systematic error distorts in such a way that increasing the numbers of the study subjects or the studies themselves would not mitigate the effect. Random errors, by contrast, are amenable to such devices.
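The distinction between the two error types can be illustrated with a small simulation. The following is a minimal sketch, not part of the original paper: the true value, the size of the bias, and the noise level are hypothetical numbers chosen only to show that a larger study shrinks random error while leaving a shared systematic error untouched.

```python
import random
import statistics


def simulate_estimate(true_value, bias, noise_sd, n_subjects, seed=0):
    """Return the mean of n noisy measurements that all share one systematic bias."""
    rng = random.Random(seed)
    measurements = [
        true_value + bias + rng.gauss(0.0, noise_sd)  # bias is identical for every subject
        for _ in range(n_subjects)
    ]
    return statistics.fmean(measurements)


# Hypothetical values, chosen for illustration only.
TRUE_VALUE = 10.0
BIAS = 2.0       # systematic error: same direction and size in every measurement
NOISE_SD = 5.0   # random error: averages out as the study grows

small_study = simulate_estimate(TRUE_VALUE, BIAS, NOISE_SD, n_subjects=50)
large_study = simulate_estimate(TRUE_VALUE, BIAS, NOISE_SD, n_subjects=50_000)

# Total error of each study's estimate relative to the truth.
error_small = small_study - TRUE_VALUE
error_large = large_study - TRUE_VALUE

print(f"error with 50 subjects:     {error_small:+.2f}")
print(f"error with 50 000 subjects: {error_large:+.2f}")
```

In the large study the remaining error sits close to the assumed bias: adding subjects has removed the random scatter but not the systematic distortion.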
Within the gamut of analytical epidemiologic studies, the basic choice is between the cross-sectional or longitudinal type, while the more useful latter variety can be further divided into case-referent and follow-up studies. The case-referent approach has inherent attractions regarding cost in time, money, and personnel and has a particular value where the disorder of interest is relatively rare. These good points are often offset by the large-scale opportunities for bias in such studies. Sackett (42) catalogues a staggering 56 varieties of bias. Thus the deserved popularity of the case-referent study must be tempered by a realization that such an approach is a methodological minefield.
Furthermore less obvious but important aspects of study design are being overlooked. Anderson & Mantel (4) note the following three: (i) the distinction between analytical and descriptive surveys, whose aims are quite different; (ii) the need for stability in data-generating mechanisms, speculation being no substitute for carefully gathered facts; and (iii) the use of "operational methods of measurement," by which they mean that potential replicators of the study understand the methods used and that the measurements themselves are reproducible.
In addition specific study difficulties such as variations in latency period, verification of cause of death (or disease), proper selection of referent populations, good exposure data, and a cogent assessment of cause-effect relationships must be taken into account if an epidemiologic study is not to be fatally flawed (12).
These, and other methodological problems, are now addressed in the context of register-based studies. Such studies can be reviewed by subdividing them into (i) population-specific problems, (ii) exposure-specific problems, and (iii) problems relating

Population sources
Occupational mortality data are by far the items most frequently used for epidemiologic studies of work-related diseases. The background and many of the pitfalls of their utilization are described elsewhere in this journal (22). Morbidity data are softer and therefore less popular. Information on both the disease process and its (perhaps) fatal end points needs to be acquired from some source or other. These sources will now be reviewed.
National data by geography. Routine mortality and morbidity statistics are collected in all developed countries. They can be used to plan health care, control known diseases, and estimate the economic burden of ill health. In the past, few have been gathered for epidemiologic research alone, even fewer for an evaluation of work-related etiologies. It is not surprising therefore that some commentators bemoan the success rate of these sources as the starting point for fruitful etiologic studies (2). The astute clinician with the inspired "hunch" still carries off the prize for cause-effect discoveries (32), though, increasingly, the use of routine data to generate such hunches has acquired some justifiable vogue (1).
At the forefront of this approach has been disease mapping. Examples of this procedure are given in the article on occupational mortality elsewhere in this journal (22), but the study of disease variations between even small geographic areas may be rewarding. Gardner et al (16) analyzed the geographic variations in specific causes of mortality among 1 366 local authority areas in England and Wales (as defined in 1977) for the years 1968-1978. Their analyses noted an excess mortality from pleural mesothelioma in Barking, London, and nasal sinus cancer in High Wycombe and Rushden. These "discoveries" had already been the subject of previous investigations, but their detection by such a relatively crude technique as national mortality records is encouraging, as other, as yet undiscovered links between environment and disease could be revealed for more detailed study. The test of the method will thus be whether it leads to the detection of previously unknown etiologies. For example, the women of Weymouth appear to have an excess mortality from malignant neoplasms of the colon, ovary, and peritoneum. No etiologic agent has yet been elucidated, but a study of the women's past occupations is underway.
Similar approaches have also been used in the United States, particularly in relation to cancer morbidity (15). One specific "spin-off" from the United States cancer mapping exercise, with which I was involved, could be described in a little more detail as it provided some unexpected outcomes. The National Cancer Institute invited the Center for Disease Control in Atlanta to follow up the hunch that the excess lung cancer rates for white males on the southeastern seaboard were due to the concentration of chemical and paper industries in that area. A case-referent study of incident cases of lung cancer failed to reveal a clear-cut relationship with these industries (23), but further analysis of the data noted that shipbuilding was a particularly prominent industry among the cases (8). Asbestos was the probable etiologic agent, though, oddly, no mesothelioma excess could be detected despite intensive review of the pathological records of the hospitals included in the study. The high lung cancer rates for men and women in Charleston, South Carolina, were also followed up by Dement and his co-workers (11) in an assessment of a chrysotile textile plant in that city.
National data by profession. Despite the uses that can be made of occupations stated on death certificates, serious flaws exist as to their inherent validity, as well as to their comparability to the census-based denominator data used in the national commentaries on occupational mortality, such as the British decennial supplements (22). However, when professional groups have their own registers, the registers can be a useful source of occupational epidemiologic data. Selikoff's important series of papers on asbestos-related disease among insulators in the United States was based primarily on labor union records, though these lists were exceptionally complete for a union, the members of which also tended to remain insulators (44). In the United Kingdom I have had the opportunity of looking at two different types of occupations, both of which provided a good data base for a mortality study. First, the English Channel sea pilots, though they start such work at the relatively late age of 35 years, were all, by entry qualification, master mariners, who in practice virtually never do any other job thereafter. Moreover, in order to acquire a pension, their widows are required to furnish the pilotage authority with a copy of the death certificate. This pension rule thereby obviated many of the search problems inherent in a retrospective cohort study. All the original cohort was traced through to survival or death, and 95 % of the death certificates were immediately available without recourse to national repository searches (20). An equally rewarding source of mortality data was obtained in the course of two studies of British pathologists. The Royal College of Pathologists maintains a professional register, specialist accreditation in the subject is dependent upon membership in the College, few, if any, pathologists thereafter change jobs, and the cessation of payment of membership dues in the vast majority of cases means death of the individual under study (24).
Not all such professional registers are so easy to use, however. Kinlen (27) has cited the problem of "life membership" of the Royal Society of Chemists as a potential problem leading to an underestimation of relative risks.

National data by census linkage. Not all occupational groups are quite so easy to follow as the professions, however, and for many no reliable records exist of the members of a group of industries at the national level. Nevertheless, one attempt to overcome this shortcoming has been to link occupation, as defined in the census, with a subsequent prospective review of those individuals identified as employed in the industry in question at that time. Census-based mortality studies of this type, therefore, relate occupational characteristics defined in life by the census respondents with their subsequent mortality. Industry-based mortality studies can thus be undertaken cheaply and quickly, and several have been completed [for example, those on hairdressers (3) and fertilizer manufacturers (14)], while others are currently being completed (for instance, my own study on pharmaceutical workers). The main drawback with this approach is the fact that the occupation stated on the census form only relates to the job currently done and thus could be irrelevant to long-term or lifetime "exposure." Such an approach can only, therefore, be used for industries with relatively stable populations, and, even then, the results would need to be viewed with considerable circumspection. Nevertheless, the validity of such data can be enhanced if the individual concerned is capable of being followed through more than one census in the same industry, though such long-term survival has other implications for the self-selection and work-related hazards reviewed later in this paper.
Cancer registers. Thus far, I have considered either national-based data or fatal disease. Such an approach precludes, by definition, a consideration of nonfatal disease, even malignancy with a low fatality rate. Disease incidence data would, perforce, have considerable advantages, though the parochial nature of some registers sometimes limits their wider applicability in occupational epidemiologic studies (18). The most common types of incident data available are cancer registers. Cancer registration, as it is recognized today, began in Hamburg in 1927 and in Massachusetts a year later. The United Kingdom began such collections in 1930 ( 9 , Connecticut five years later. Nowadays over 27 countries have a cancer register of some sort or other, and the percentage of cases with a tissue diagnosis frequently exceeds 90 in Western Europe and North America (10).
Unfortunately, some registers are restricted to hospital patients rather than being population-based. In some European countries, notably Scandinavia, the linkage of population-based cancer register data with the unique number assigned to an individual at birth provides a splendid opportunity for epidemiologic research. Nevertheless the employment of cancer registers in identifying occupational or industrial carcinogenic hazards is poorly developed (49), largely due to the fact that the high quality of the histological data is not matched by similar quality in the occupational history data. This flaw clearly needs to be rectified if this invaluable source of disease incidence is to fulfill its promise as a source of etiologic hypotheses. However, the verification of the factory registers of employees as reliable has dogged investigators down the years. Mancuso & Coulter (33) tried to counter this problem by using the Bureau of Old Age and Survivors Insurance (BOASI), while Marsh (34) advocated use of the Employers Quarterly Report to the US Internal Revenue Service. Neither method is wholly satisfactory, and the investigator is often left to make the best of whatever data can be acquired.

Workplace-based data. Observations of ill health in
In this context Fletcher & Ades (13), in their investigation of English foundry workers, began by listing the members of the Steel Castings Research and Trade Association. Their final tally of foundries was, however, underrepresentative of the smaller workplaces and geographically skewed. The study results were cautiously interpreted in this light, but such methodological flaws may be unavoidable.
Even if a factory population is adequately traced, as in the McDonald cohort of Quebec chrysotile miners and millers (31), the relationship of length of service to selection factors can be apparent. While a substantial exposure effect was evident for men after five years of employment and even more clearly after 20 years, men with less than one year in the industry had relatively high standardized mortality ratios, especially when employed in jobs with low dust exposure. While those men employed for 20 years or more may well be relatively homogeneous in selection and exposure, are they substantively discrete from the early leavers?
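The standardized mortality ratio referred to above is conventionally obtained by dividing the deaths observed in a cohort by the deaths expected if reference rates had applied to the cohort's person-years, stratum by stratum. The sketch below illustrates that arithmetic only. The age bands, person-years, reference rates, and observed count are invented for the example and are not taken from the Quebec cohort or any other study cited here.

```python
def expected_deaths(person_years, reference_rates):
    """Sum over strata of cohort person-years times the reference death rate."""
    return sum(person_years[stratum] * reference_rates[stratum] for stratum in person_years)


def standardized_mortality_ratio(observed, person_years, reference_rates):
    """Observed deaths divided by expected deaths (1.0 means no excess)."""
    return observed / expected_deaths(person_years, reference_rates)


# Hypothetical cohort: person-years at risk by age band.
person_years = {"40-49": 2000.0, "50-59": 1500.0, "60-69": 1000.0}

# Hypothetical reference (for example, national) death rates per person-year.
reference_rates = {"40-49": 0.002, "50-59": 0.006, "60-69": 0.015}

observed = 42  # hypothetical number of deaths seen in the cohort

expected = expected_deaths(person_years, reference_rates)
smr = standardized_mortality_ratio(observed, person_years, reference_rates)

print(f"expected deaths: {expected:.1f}")
print(f"SMR: {smr:.2f}")
```

Some reports multiply the ratio by 100. Either way, the result depends entirely on how well the reference rates suit the cohort, which is why the short-service findings described above are difficult to interpret.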
A particularly suitable cohort of workers has been extensively studied by Veys at the Michelin Tyre Company at Stoke on Trent. He noted that the company records were good for the whole of the factory's 60-year life, that the workforce was very stable, and that, even after retirement, the employees tended to remain in the geographic vicinity, and thus their subsequent deaths were relatively easy to verify. This population of rubber workers continues to be a valuable source of information of the effects of rubber manufacture on health (41).
Similarly Newhouse (39) has followed a cohort of 4 500 asbestos textile and insulation workers for over 50 years, and, even though the company is now defunct, the records obtained continue to provide useful sources of factory and neighborhood disease patterns (40).
Ad hoc populations. The success of ad hoc studies using workplace records is heavily dependent upon the quality of the records. Occasionally the occupational health problem arises from an effect noted in a hospital clinic, as occurred with an "epidemic" of peripheral neuropathy in Ohio which was traceable to one workplace and one section therein where the introduction of a new fabric coating agent (methyl butyl ketone) showed a close temporal relationship with the onset of the neuropathic effects (7). At other times, the putative link between effect and workplace/exposure can lead to attempts to trace somewhat bizarre "workplace" populations. Caldwell and his co-workers (9) followed up 3 217 nuclear test participants on military maneuvers in the 1950s after claims that some of these servicemen were dying of leukemia, while Lilienfeld (28) was asked to trace residents of the American Embassy in Moscow after assertions that the building had been subjected to intense microwave bombardment.
Ad hoc studies may start from registers established for other purposes. Pregnancy outcome and work-related factors have considerable vogue. The epidemiologic approach to human reproductive failure assessment is, however, fraught with difficulties (25,50). The problems not only include the selection of a suitable end point (spontaneous abortion, stillbirth, live congenital malformation) but also the difficulty of deciding retrospectively what constitutes a significant occupational or environmental exposure or even a pregnancy. Even greater obstacles are encountered if infertility is to be assessed, and this question is particularly thorny regarding female infertility, because one is trying to prove a negative. Case-referent studies invariably suffer, at least potentially, from recall bias, whereas a cohort approach is costly and not particularly fruitful as the end point under review may be relatively rare. Pregnancy outcome studies may thus be suitable subjects for the cohort approach with nested case-referent studies, but the cost of such exercises frequently proves prohibitive without a clear-cut hypothesis to test.
The inspired hunch is all that is left under these circumstances, and, although some important discoveries have resulted from a hunch (29), a more rational approach to hunch collection is clearly needed (32). Such an alternative need not be costly, but is, apparently, yet to be implemented despite calls from a number of quarters.

Exposure sources
The discovery of occupational hazards requires the juxtaposition of two sets of data: information on illness or death among workers and information on their occupations. Most epidemiologic studies to date have concentrated on workplace sources and on the job or industry title. As the effect-response relationships sought become more subtle, a much more detailed account is needed of real exposures rather than of the presumed ones that may be inherent in a job title.
Obtaining occupational histories is a difficult procedure, based as it frequently is on the workers' memory of past events. Few records exist of occupational hygiene measurements of specific workplace agents, and the job exposure matrix (JEM) will only partially solve this problem (36). A novel approach which is bearing fruit is the subject by subject coding of jobs by technical experts independent of knowledge of the respondents' morbidity or mortality (45). Such a pilot exercise in Montreal has led to the confirmation of some known exposure-effect relationships and augurs well for the future discovery of new associations.
Good exposure data from the past is available in some industries, notably from the British National Coal Board, and lifetime expectancies of pneumoconiosis are now predictable from current coalface dust concentrations (38).
The "dose x time" factor is, however, a continuing source of epidemiologic confusion. Allusion was made to McDonald's data suggesting a high standardized mortality ratio for short-term workers. Whether this phenomenon is due to high exposure and selection out of the industry is not clear. Such an hypothesis has been invoked by Wagoner and his coworkers (48) to explain the excess deaths from lung cancer in a beryllium plant, some of the deaths having occurred in less than the assumed latency period and some among workers who had been employed for short periods of time (frequently less than one year). That the data shows an excess lung cancer rate for "sensitive" individuals is disputed, but there can be little doubt that the problem of the short, high exposure group and the relationship of their subsequent ill health to such exposures has not been resolved.
A similar phenomenon has been observed by Gilbert in his reworking of the Hanford nuclear plant data (17). He drew attention to the fact that, although reduced death rates were noted in the early years of follow up, especially for those of higher socioeconomic status, workers employed less than two years and especially "terminated" workers were found to have elevated death rates compared with the remainder of the study population. Thus the Hanford population could be considered to show a healthy worker effect and a short-term unhealthy worker effect.

Problems involving populations and exposures
Choice of referents. The selection of an appropriate referent population bedevils occupational epidemiologic studies (37). In cohort studies the national rates are commonly used, with or without correction for regional variation or socioeconomic class. But it is important to note that socioeconomic class data are only available in most countries up to the usual age of retirement, whereas the disease/death end point of the study population could be a decade or more later. Furthermore census data errors in socioeconomic classification (for example, the "misclassification" of company directors as socioeconomic class I and the large "unoccupied" category which is itself heavily biased towards socioeconomic classes IV and V) can greatly affect the validity of such comparisons (27). Such biases might be reduced if data from national samples like the British Longitudinal Study or the American Health and Nutrition Examination Survey (HANES) were used in future studies (43).
Comparable factory cohorts may avoid some of these problems, such as Tolonen and his coworkers' use of a neighboring factory in the assessment of cardiovascular morbidity and carbon disulfide exposure (47). Such procedures will, however, double or even treble the cost of the study, and they do not necessarily avoid the problem of migration in or out of the cohort prior to the start of the study (51).
For case-referent studies the dead/alive status of the case should preferably be matched by a similar first person or proxy for the referent (19). Blot and his co-workers (8) tested the validity of several approaches in the Georgia lung cancer study described earlier and found the quality of data from such matching procedures to be comparable. Such comparability does not, of course, necessarily address the question of validity.
Confounding. Finally, in this section, there remains the ever present problem of confounding. This issue is increasingly important as the commoner diseases are reviewed for occupational factors, when the rate ratios are frequently less than three. However, Axelson (6) has estimated that even in such cases the smoking effect for Swedish studies is unlikely to account for a rate ratio above 1.5 for relevant disease end points. Thus rate ratios above two are still likely to be related to factors other than confounders, even those as powerful as smoking.
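The reasoning behind such an estimate can be sketched numerically. If the disease rate in each smoking category is expressed relative to nonsmokers, the rate ratio produced purely by a difference in smoking habits between two populations is the ratio of their weighted average relative risks. The sketch below assumes that simple model. The smoking distributions and the relative risk of 10 are hypothetical illustrations, not Axelson's published figures.

```python
def confounding_rate_ratio(exposed_dist, referent_dist, relative_risks):
    """Rate ratio expected from differing smoking habits alone.

    Each distribution maps a smoking category to the proportion of the
    population in it; relative_risks maps the same categories to the
    disease rate relative to nonsmokers.
    """
    def relative_rate(dist):
        if abs(sum(dist.values()) - 1.0) > 1e-9:
            raise ValueError("proportions must sum to 1")
        return sum(dist[category] * relative_risks[category] for category in dist)

    return relative_rate(exposed_dist) / relative_rate(referent_dist)


# Hypothetical inputs, for illustration only.
relative_risks = {"nonsmoker": 1.0, "smoker": 10.0}
referent_dist = {"nonsmoker": 0.5, "smoker": 0.5}
exposed_dist = {"nonsmoker": 0.3, "smoker": 0.7}  # occupational group smokes more

ratio = confounding_rate_ratio(exposed_dist, referent_dist, relative_risks)
print(f"rate ratio attributable to smoking alone: {ratio:.2f}")
```

Under these assumed figures, a considerable difference in smoking prevalence yields a rate ratio of roughly 1.3, which is in keeping with the argument that an observed rate ratio above two is unlikely to be explained by smoking alone.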

Conclusions
Register-based studies form an important and growing resource for the investigation of work-related diseases. The quality of national data is increasing, but the denominator/numerator mismatch continues to act as an important bar to validity. The development of longitudinal studies and the greater use of disease registers, particularly for cancer and pregnancy outcome, can and will have important implications for work-related health studies.
Even so, the era of the ad hoc study is not over, particularly when the lead to such an investigation comes from the inspired hunch of the astute clinician. While a greater use of some system for encouraging and then processing these hunches remains, there will always be scope for the individualist. Large-scale and thus expensive record collections will never solve all of the problems.
The methodological problems of register-based studies concern both the validity of the exposure data and that of the disease identification. In addition problems of comparison and confounding are ever present. Epidemiologic studies are never cheap, but the nesting of case-referent studies in an established cohort probably offers a better return on outlay than the more accurate cohort approach alone or the cheaper but potentially more biased case-referent approach.