Pathology standards for asbestosis

(1983) Pathology standards for asbestos-associated diseases of the lungs and pleural cavities were recently developed by the Pneumoconiosis Committee of the United States College of American Pathologists under contract to the US National Institute for Occupational Safety and Health (NIOSH). The purpose of the contract was to develop standardized criteria for the pathological diagnosis of these diseases and to develop a system for grading the severity and extent of asbestosis. The results of a preliminary reading trial and the NIOSH statistical analysis of the trial are presented. These results indicate that the proposed grading schema has acceptable inter- and intra-observer variability. The variability is similar to that observed for radiologists in radiographic reading trials.

The University of Vermont and the College of American Pathologists, under contract to the National Institute for Occupational Safety and Health (NIOSH), have recently developed standardized criteria for the pathological diagnosis of asbestos-associated diseases of the lungs and pleural cavities. In addition the contract called for the development of a system for grading the severity and extent of asbestosis. The need to develop these standards grew out of several observations. First, while the pathological features of advanced asbestosis appeared to be generally well recognized, the early lesions of asbestosis are not well documented in the literature, nor are possible differences in the pathological response to different asbestos types or to asbestos combined with confounding minerals such as silica. Most importantly the minimal histological criteria for the diagnosis of asbestosis have not been defined. Second, in order to develop sound epidemiologic studies and to compare findings between different laboratories, it is necessary to have agreed definitions and a reproducible system for measuring the extent and severity of the disease. Third there has been a veritable explosion of compensation claims against the major asbestos manufacturers (2). Many of these involve the possibility of occupational lung cancer in which the presence or absence of histologically confirmed asbestosis may be pivotal.
Previous experience in developing standards for coal workers' pneumoconiosis has shown that standardized criteria are of value in all these areas (5). They also provide a basis for developing more accurate criteria or better systems of grading as more is learned about the natural history of asbestos-associated diseases.
In 1980 a committee of eight US pulmonary pathologists and one pulmonary epidemiologist was assembled under the aegis of the Pneumoconiosis Committee of the College of American Pathologists. The Committee was chaired by Dr JE Craighead of the University of Vermont. The purpose of the Committee was to review case material from asbestos-exposed individuals in both the United States and worldwide and develop a report. To our surprise very few collections of pathological case material were available. In those that were located, exposure and clinical information was usually lacking. It was also difficult to obtain specimens from workers exposed to relatively pure forms of the four major commercial forms of asbestos (chrysotile, crocidolite, amosite and anthophyllite). We were grateful therefore to obtain access to material from the Commission de la Santé et de la Sécurité du Travail du Québec, where exposure was predominantly to chrysotile asbestos, and from Drs LO Meurman and Y Collan in Finland, where exposure was predominantly to anthophyllite. Case material was also obtained from Great Britain and South Africa. The Committee met four times to review the assembled case material. The resulting report reflects our experience of the case material and extensive discussion (1). In addition the opinions of many recognized experts in the field were solicited, and these were incorporated where appropriate.
The Committee's charge was to describe only lesions of the lungs and pleural cavities; therefore neoplasms of the larynx and gastrointestinal tract were not considered. Four lesions are considered in detail in the report, ie, pleural plaques, asbestosis, lung cancer, and mesothelioma. Emphasis has been placed on the probable morphogenetic sequence of events in the evolution of asbestosis and on the minimal histological criteria to establish the diagnosis. In addition the report of the Committee devotes considerable time to the significance of asbestos bodies and fibers in the lung and recommends methods for the pathological study of a suspected case of disease due to asbestos exposure.
The monograph also includes a proposed grading system for asbestosis and the results of a preliminary reading trial. A separate analysis of this trial was conducted by NIOSH, and this analysis is presented in detail in the present communication.
Several grading systems were considered by the Committee, some of which were relatively complicated and designed with specific research goals in mind. An important consideration in developing a grading system is that it should be relatively simple to perform and have acceptable reproducibility. It is also important that the grading system measure the most significant parameters of the disease and be comparable to systems used by other disciplines, for example, the radiographic classification of pneumoconiosis developed by the UICC (Union Internationale Contre le Cancer) (7).
The system adopted fulfills many of these conditions. It is semiquantitative and measures the extent and severity of the pulmonary fibrosis of asbestosis. Certain aspects of a grading system proposed by Hinson et al (4) were incorporated into this schema. The proposed grading system is not intended to be a substitute for diagnosis, which should be made according to the criteria set out in the monograph (1). It is based on histological material only. This procedure has several advantages. First there is a permanent record. Thus the grading can be repeated by different observers should such repetition be necessary for medicolegal or scientific purposes. In addition retrospective studies can be performed. Second the system is applicable to small specimens such as biopsy samples. Third fibrosis is more easily recognized histologically than grossly. It could be argued that the small area sampled in a tissue section could seriously bias the reading in view of the irregular distribution of the disease. However, if multiple representative sections are taken from each lobe, then this bias can be minimized, and an average grade can be obtained for the whole lung or for individual lobes (1). Although inflation fixation is recommended, the grading system is applicable to uninflated lung sections also.
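As an illustration of how grades from multiple sections might be averaged, the following minimal sketch computes a mean severity grade for each lobe and for the whole lung. The grades are invented, the variable names are hypothetical, and the use of a simple arithmetic mean pooled over all sections is an assumption; the procedure recommended in the monograph (1) is not reproduced here.

```python
from statistics import mean

# Invented severity grades for sections taken from each lobe of one lung.
# These values are illustrative only and are not data from the trial.
sections_by_lobe = {
    "right upper": [1, 2],
    "right middle": [2, 2],
    "right lower": [3, 4, 3],
}

# Average grade for each individual lobe.
lobe_average = {lobe: mean(grades) for lobe, grades in sections_by_lobe.items()}

# Average grade for the whole lung, pooling every section (an assumption;
# a lobe-weighted average would be an equally plausible choice).
all_grades = [g for grades in sections_by_lobe.values() for g in grades]
lung_average = mean(all_grades)

print(lobe_average)
print(lung_average)
```

Pooling all sections gives more weight to lobes from which more sections were taken; averaging the lobe means instead would weight each lobe equally.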
As mentioned earlier, both severity and extent are measured. The criteria for severity are based on a likely pathogenic sequence of events in the evolution of asbestosis. The criteria for severity and extent are presented in table 1.

Materials and methods
The material used for testing the grading system was obtained from the Archives of the Armed Forces Institute of Pathology, courtesy of Dr L Hochholzer, Dr J Rust and Col E Cowart. The material was screened from several hundred cases by two pathologists and selected from among others because they were believed to be representative of different degrees of asbestosis. Two sections, A and B, were assessed in each of 20 cases. Slide A of each pair was read twice (once in the morning and once in the afternoon) by each of nine pathologists, while slide B was read only once by each. Each slide was read independently by every pathologist with approximately 5 min allotted to each slide. It should be noted that all nine readers were specialists in pulmonary pathology.

Table 1. Criteria for grading the severity and extent of asbestosis.

Severity (a)
1  Fibrosis involving the wall of at least one respiratory bronchiole with or without extension into the septa of the immediately adjacent layer of alveoli; no fibrosis in more distant alveoli
2  Fibrosis as in grade 1 plus involvement of alveolar ducts and/or 2 or more layers of adjacent alveoli; there still must be a zone of nonfibrotic alveolar septa between adjacent bronchioles
3  Fibrosis as in grade 2 but with coalescence of fibrotic change such that all alveoli between at least two adjacent bronchioles have thickened, fibrotic septa; some alveoli may be obliterated completely
4  Fibrosis as in grade 3 but with formation of new spaces of a size larger than alveoli ranging up to as much as 1 cm; this lesion has been termed "honeycombing"; the spaces may or may not be lined by epithelium

Extent (b)
A = 1  Only occasional bronchioles are involved; most show no lesion
B = 2  More than "occasional" but less than half of all bronchioles involved
C = 3  More than half of all bronchioles involved

a Lesions associated with individual respiratory bronchioles are evaluated. The grade is based on the most severe lesion in the slide, not a visual average of the changes found in the various individual respiratory units.
b The proportion of the respiratory bronchioles involved by the disease process is assessed. The grades pertain to the relative numbers of bronchioles, in the slide, involved by any degree of fibrosis, not just the numbers involved to the maximum degree as recorded under severity.

Fig 1. Grade I asbestosis. The fibrosis is confined to the walls of a respiratory bronchiole (RB). (Hematoxylin and eosin, 15×)

Fig 2. Grade II asbestosis. The fibrosis is centered on a respiratory bronchiole but is extending out into adjacent alveolar septa. An asbestos body is seen in the fibrosis (arrow). (Hematoxylin and eosin, 15×)

Interpathologist variation
Analysis was undertaken separately for severity and extent, as it was felt that a combined score was inappropriate. For instance, it was felt that a product score of 4 obtained from a severity of 4 and an extent of 1 had a different interpretation from the same score obtained from a severity of 2 and an extent of 2. In addition, since both grading schemes are new, it was felt desirable to evaluate both, as it might be found that one was reliable and the other was not. Modifications could then be made with greater confidence about the source of the unreliability.
Table 2 shows the distribution of the severity gradings for each pathologist. Although the distributions do not differ markedly, there is some variation between pathologists. For example, pathologist C clearly believed that the majority of the slides were grade 3 severity, while G thought that most were grade 2. Pathologist B read one of the slides as being category 0. The mean level was 2.6 with a range of 2.3-2.9. Table 3 shows the information for extent in a similar manner. Again the distributions do not differ markedly among the pathologists. In particular, all except one categorized most slides into categories B and C. If we assign the scores A = 1, B = 2, C = 3, we can obtain mean values of extent which give a rough indication of the average level. These show a minimum of 2.0 for G and a maximum of 2.6 for A.
Although two pathologists may obtain identical distributions of grades of severity or extent, this is no guarantee that they agree exactly on the classification of every slide. Indeed film readers often obtain very similar prevalences among groups of X-ray films but disagree on individual classifications. Thus, in addition to studying the distributions of severity and extent among pathologists, we must also examine agreement of assessments of individual slides.
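The distinction between agreement of distributions and agreement on individual slides can be made concrete with a small sketch. The readings below are invented for illustration and are not the trial data, and the helper name `mean_extent` is hypothetical; only the scoring A = 1, B = 2, C = 3 comes from the text.

```python
from collections import Counter

# Numeric scores for the extent categories, as assigned in the text.
EXTENT_SCORE = {"A": 1, "B": 2, "C": 3}


def mean_extent(grades):
    """Return the mean numeric extent score for a list of letter grades."""
    return sum(EXTENT_SCORE[g] for g in grades) / len(grades)


# Invented readings of the same six slides by two hypothetical readers.
reader_x = ["A", "B", "B", "C", "C", "B"]
reader_y = ["B", "A", "C", "B", "B", "C"]

# The two readers have identical distributions of grades,
# and therefore identical mean scores.
same_distribution = Counter(reader_x) == Counter(reader_y)
means = (mean_extent(reader_x), mean_extent(reader_y))

# Slide-by-slide comparison: count the slides given the same grade.
slides_agreed = sum(x == y for x, y in zip(reader_x, reader_y))

print(same_distribution, means, slides_agreed)
```

In this contrived case the two readers produce the same distribution and the same mean extent, yet they do not assign the same grade to any individual slide.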
Interobserver reproducibility is often conveniently studied through the examination of pairwise agreement and disagreement. The following three statistics have been chosen to measure reproducibility: (i) the percentage of exact agreement, (ii) the percentage of agreement to within ± one category, and (iii) the average disagreement, in categories. The first statistic has an obvious interpretation. As its name implies, it measures the degree to which pairs of pathologists agreed exactly on the assessment of slides. The second has a similar meaning except that the definition of agreement has been widened to include a minor disagreement of plus or minus one category. (Under this definition, extent scores of A and B and of B and C would count as agreement, while scores of A and C would not.) The third statistic reflects the degree, in terms of categories, to which two pathologists might differ in their interpretation of the same slide. These three statistics have been calculated for each possible pair of pathologists; there were 36 such pairs. Table 4 shows the mean and range of the percentage of exact agreement for severity over the 36 pairs. The mean percentage of exact agreement of 57 implies that random pairs of pathologists from this group would be expected to agree exactly on the assessment of one slide in two. The lowest pairwise agreement was 38 %, and the highest 71 %. Table 4 shows that the degree of difference between any two pathologists in the group averaged about half a category. The worst situation was seen for extent, for which the average disagreement for one pair of pathologists amounted to nearly one category. It is of interest to note that complete agreement on the severity scores was obtained for 9 out of the 58 slides (16 %), and for extent 7 of the 58 (12 %) (the corresponding statistics for ± one category were 57 and 43 %, respectively).
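The three pairwise statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions: the readings are invented and are not the trial data, the function name `pairwise_stats` is hypothetical, and the third statistic is taken to be the mean absolute difference in categories, which is one reading of the description in the text.

```python
from itertools import combinations


def pairwise_stats(a, b):
    """Compare two readers' numeric grades on the same slides.

    Returns (percent exact agreement, percent agreement within one
    category, mean absolute difference in categories).
    """
    diffs = [abs(x - y) for x, y in zip(a, b)]
    n = len(diffs)
    exact = 100.0 * sum(d == 0 for d in diffs) / n
    within_one = 100.0 * sum(d <= 1 for d in diffs) / n
    mean_diff = sum(diffs) / n
    return exact, within_one, mean_diff


# Invented severity grades for five slides read by three hypothetical readers.
readings = {
    "P1": [1, 2, 3, 4, 2],
    "P2": [1, 3, 3, 4, 4],
    "P3": [2, 2, 3, 3, 2],
}

# One set of statistics per unordered pair of readers.
results = {
    pair: pairwise_stats(readings[pair[0]], readings[pair[1]])
    for pair in combinations(sorted(readings), 2)
}

# Summarize exact agreement over all pairs as a mean and a range.
exact_values = [r[0] for r in results.values()]
mean_exact = sum(exact_values) / len(exact_values)

print(results)
print(mean_exact, min(exact_values), max(exact_values))
```

With nine readers, `combinations` yields 9 × 8 / 2 = 36 unordered pairs, which matches the number of pairs reported in the text.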

Intrapathologist variation
The same three statistics as for interpathologist variation were used to explore intrapathologist variation. Twenty slides were each read twice by each pathologist, a procedure giving rise to nine values of each statistic for severity and nine also for extent. The results are expressed in terms of means and ranges in table 5. The intrapathologist reproducibility on severity was much better than the interpathologist reproducibility, as might be expected.
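The same calculation applies to repeat readings, with each pathologist compared against himself or herself rather than against a colleague. The sketch below uses invented morning and afternoon grades (not the trial data) and a hypothetical helper `agreement`; it yields one value of each statistic per reader, which can then be summarized as a mean and a range.

```python
def agreement(first, second):
    """Return (percent exact, percent within one category, mean |difference|)."""
    diffs = [abs(x - y) for x, y in zip(first, second)]
    n = len(diffs)
    return (
        100.0 * sum(d == 0 for d in diffs) / n,
        100.0 * sum(d <= 1 for d in diffs) / n,
        sum(diffs) / n,
    )


# Invented first and second severity readings of the same four slides
# by three hypothetical readers.
repeat_readings = {
    "P1": ([1, 2, 3, 4], [1, 2, 3, 3]),
    "P2": ([2, 2, 3, 4], [2, 2, 3, 4]),
    "P3": ([1, 3, 3, 2], [2, 3, 1, 2]),
}

# One set of statistics per reader (first reading versus second reading).
per_reader = {name: agreement(am, pm) for name, (am, pm) in repeat_readings.items()}

# Mean and range of exact agreement across readers.
exact = [stats[0] for stats in per_reader.values()]
summary = {"mean": sum(exact) / len(exact), "range": (min(exact), max(exact))}

print(per_reader)
print(summary)
```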

Comments
The interpathologist agreement in this trial was close to 55 % for exact pairwise agreement. This value is comparable to that obtained for another established important medical tool, the ILO U/C classification of radiographs (6). For example, Felson et al (3) obtained exact pairwise agreement of 59 %, and agreement to ± one category of 98 % in their classification of coal miners' films for profusion of small rounded opacities. (Since the radiologists in that report were only using a three-point scale, we might conclude that the pathologists' agreement was possibly slightly superior.) Experience with film reading has shown that, while readers frequently agree on the presence or absence of disease, there is frequent disagreement on level of abnormality for those films which are said to be abnormal by at least one reader (3). Thus the agreement seen between pairs of readers tends to reduce as the percentage of abnormal films being examined increases. In this respect then, if this tendency can be applied to the pathologists, this trial was a severe test of their agreement as it contained only abnormal samples. In other situations, when the material being studied includes samples of normal tissue, one would expect the agreement to be better.
In conclusion the results of this preliminary trial are encouraging. However it would be invalid to generalize these results as the panel of pathologists that took part was not a random selection of all pathologists. Additional, better-controlled trials are indicated with sections carefully chosen to be equally representative of the different grades and to include normal tissue. The time interval between readings should be extended to measure intrapathologist variation better, and readings should be made by both expert and nonexpert readers.