Questionnaire reliability and validity for aluminum potroom workers

VALE JR, AALEN OO. Questionnaire reliability and validity for aluminum potroom workers. Scand J Work Environ Health 1989;15:364-370.
As a part of a study on the respiratory symptoms of aluminum potroom workers, the reliability of a self-administered questionnaire and an interview questionnaire was studied with the use of 261 and 49 employees, respectively. The validity of the self-administered questionnaire (134 persons examined) and the interview questionnaire (90 persons examined) was assessed in a comparison of the statements with the case histories. The reliability of the self-administered questionnaire was fairly high, the kappa coefficient ranging from 0.58 to 0.83, while the reliability of the interview questionnaire varied from -0.03 to 0.45. The same pattern was present with regard to validity, as the self-administered questionnaire showed the highest mean sensitivity, specificity, and agreement in a comparison with the case histories. The self-administered questionnaire seemed to discriminate well between symptomatic and asymptomatic individuals, whereas supplemental information about symptoms, as obtained by a standardized interview questionnaire, appeared to be less valid.

Standardized questionnaires on respiratory symptoms have been available since the 1960s (1). Such questionnaires are primarily aimed at recording manifestations of chronic bronchitis. While the standardized respiratory questionnaires prepared by the British Medical Research Council (BMRC) (1) and the American Thoracic Society (2) include questions about wheezing and asthma, none of the existing standard questionnaires can provide information on the possible association between asthmatic symptoms and work conditions. Using questions from the BMRC questionnaire as a model, we therefore constructed two questionnaires, one self-administered questionnaire and one interview questionnaire, to be used in a prospective survey of asthmatic symptoms among aluminum potroom workers.
Several methodological problems emerge with the use of questionnaires in epidemiologic work (3). Translating questionnaires to another language may change the individual interpretation of the questions and hence the outcome rates (4). Furthermore, in spite of a careful design of the questions and their order, there will always remain some doubt about the reliability and validity of the answers obtained.
The purpose of our study was, therefore, to examine the quality of answers to some questions with respect to their reliability and validity.

Sample and survey design
The 297 potroom workers, men and women, present in four Norwegian aluminum plants on a randomly selected date constituted the survey group (table 1).
Only one man refused to participate. Of the respondents to the screening questionnaire, 35 did not take part in a second query because they had left the plant, were performing their military duty, were on sick leave, or were not present for other reasons. No one declined to participate in the second examination. A detailed flow chart of the study is presented in figure 1. Of the 296 initial respondents, 101 reported cough or a combination of dyspnea and wheezing, and these responses qualified the persons for an interview with a standardized questionnaire. Ninety interviewed persons, mixed with 44 randomly selected persons denying all respiratory symptoms, were then examined by an experienced chest physician (JK) who did not know the responses to the questionnaire. The 11 persons who were interviewed but who did not attend the clinical examination were absent from the plant when the chest specialist visited it.

Questionnaires
Major respiratory symptoms (dyspnea, wheezing, and cough), possibly predisposing conditions in the case history (allergy, familial asthma, previous asthma), and exposure data, including smoking habits and former work exposure, were, as a first stage, recorded on a self-administered questionnaire. Workers with a pattern of respiratory complaints believed to be particularly relevant, ie, wheezing and dyspnea combined or cough, were then interviewed with a detailed questionnaire that had been standardized. This questionnaire was designed to provide supplemental information on respiratory symptoms. Some of the questions in the questionnaire interview were translations from the BMRC questionnaire. The interviewers were all experienced plant nurses who had worked with interview questionnaires for several years. They were trained by one of the authors (JK), using BMRC procedures for the training and educating of interviewers (5). It was emphasized that there should be no probing for responses and that unanswered questions should be dealt with in the same manner by all the interviewers. In the clinical, closed-ended interview, we obtained information which was comparable to that obtained by the questionnaires.
(The questionnaires are obtainable from the authors on request.)

Definitions and data analysis
Validity. Validity refers to the ability of a questionnaire to measure what it was intended to measure. It is generally expressed as sensitivity and specificity (table 2). Validity is usually assessed in a comparison of the results of a questionnaire with separate, independent criteria. For respiratory questionnaires appropriate criteria are generally unavailable. However, the questionnaire method attempts to elicit essential aspects of those symptoms that would be found in an extensive clinical history (6,7). Despite the disagreements that might exist between physicians in relation to diagnoses, Hampton et al (8) found that a correct clinical diagnosis was obtained by history taking alone in 85 % of the cases studied in a medical outpatient clinic. On the basis of this finding, we decided to test the validity of the questionnaires against clinical history taking. In addition to sensitivity and specificity we estimated the kappa agreement (table 2) between the questionnaires and the answers obtained by a chest physician (JK).

Reliability. The agreement of response between two administrations of a questionnaire is an appropriate measure of reliability (table 2) (3). However, this approach assumes that the symptoms do not change in the interval between examinations; otherwise the results will be influenced by real variation in symptom status. On the other hand, if the interval is too short, the subjects may recall their former answers and hence bias the results. The period between the test and retest was chosen to minimize both of these effects, but we were also forced to pay attention to shift schedules and vacations. Consequently, reliability was measured as the agreement between the statements obtained by two administrations of the self-administered and the interview questionnaires with an interval of three to five months. The reliability of the amount of smoking, estimated as pack-years, was expressed by the correlation coefficient (Spearman rank).

Kappa coefficient. The agreement between discrete variables in questionnaires and case histories, as well as agreement between the two administrations of a questionnaire (reliability), was estimated by the kappa coefficient (9). A kappa value of 0 corresponds to a chance expectation, while a kappa value of 1 indicates perfect agreement. Kappas ranging from 0.4 to 0.7 are considered satisfactory, while a value above 0.7 is regarded as excellent (10).

[Table residue: column headings "Registration 1", "Registration 2", "Frequency"; table body not recovered.]
Table 3. Reliability of some of the questions included in the self-administered respiratory questionnaire. (a The results are based on the data in table A1 of appendix I.)
Table 4. Reliability of some of the questions included in the interview questionnaire on respiratory symptoms. (a The results are based on the data in table A2 of appendix I. b The questions were translated from the questionnaire of the British Medical Research Council.)
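The kappa coefficient described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' analysis code: the function names `cohen_kappa` and `interpret` are hypothetical, the counts are invented for demonstration, and the interpretation bands simply restate the 0.4 and 0.7 thresholds quoted in the text.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of subjects who gave answer i on the first
    occasion and answer j on the second (or questionnaire vs. case history).
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("table contains no observations")

    # Proportion of subjects on the diagonal (same answer both times).
    observed = sum(table[i][i] for i in range(size)) / total

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when all answers fall in one category")
    return (observed - expected) / (1.0 - expected)


def interpret(kappa):
    """Label a kappa value using the thresholds quoted in the text."""
    if kappa > 0.7:
        return "excellent"
    if kappa >= 0.4:
        return "satisfactory"
    return "below the acceptable limit"


# Hypothetical yes/no counts for illustration; NOT data from the study.
example = [[45, 5],
           [5, 45]]
kappa = cohen_kappa(example)
print(round(kappa, 2), interpret(kappa))
```

The same function applies to both uses of kappa in the study: test-retest tables (first versus second administration) and validity tables (questionnaire versus case history).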

Reliability of the interview questionnaire
Table A2 in appendix I gives the comparison of the two sets of responses to the interview questionnaire. The reliability of the answers to the interview questionnaire (table 4) was generally lower than for the self-administered questionnaire.

Data analysis of interview questionnaires versus self-administered questionnaires.
Since only persons who reported symptoms were interviewed, false negative reports from the self-administered questionnaire were not discovered with the interview questionnaire. Therefore, the sensitivity and specificity of the main symptoms (dyspnea, wheezing, and cough) of the interview questionnaire could not be estimated.

Reliability of the self-administered questionnaire
The comparison of the first and second administration of the self-administered questionnaire is shown in table A1 in appendix I. The reliability of the questions varied from 0.58 to 0.86 (table 3). The major respiratory questions (dyspnea, wheezing, cough, and a combination of dyspnea and wheezing) had very acceptable values, ie, kappa = 0.63, kappa = 0.66, kappa = 0.58, and kappa = 0.61, respectively. The reliability was higher for the reporting of childhood allergy (kappa = 0.70) and familial asthma (kappa = 0.83).
The overall prevalence of symptoms from the two surveys did not differ significantly (table 1). The question on smoking habits with the crude classification of smokers versus ex-smokers and never smokers had a high reliability (kappa = 0.86). The correlation coefficient for the amount of smoking was 0.88 (Spearman rank).

Validity of the self-administered questionnaire
Table B1 in appendix II gives the comparison between the responses to the self-administered questionnaire and the case histories. The validity of the symptom questions is shown in table 5. The question about cough had the lowest overall validity of the main symptom questions with a sensitivity of 73 %, a specificity of 67 %, and a kappa of 0.39. The questions about history of asthma and childhood allergy had the lowest sensitivity of the screening questions, 50 and 53 %, respectively.
The crude classification of smoking habits had an optimal specificity, sensitivity, and agreement (98 %, 100 %, and 0.97, respectively). High kappa agreement (0.61) was also found for the question about former work exposure.

Validity of the interview questionnaire
Table B2 in appendix II gives the comparison between the responses to the interview questionnaire and the case histories. The validity of some of the questions of the interview is shown in table 6. The mean sensitivity and specificity were 58 and 77 %, respectively. The kappa agreement ranged from 0.08 for symptoms at work to very acceptable values, as for instance for the question about respiratory symptoms at night (kappa = 0.63). The questions translated from the BMRC questionnaire had an agreement of 0.57, 0.54, and 0.20. The mean kappa agreement of the questions validated was 0.34. Table B3 in appendix II shows how symptoms reported in the self-administered questionnaire were reported in the interview.

With regard to the question on dyspnea and the one on the combination of dyspnea and wheezing, only 2.7 and 3.5 %, respectively, gave negative statements in the interview. However, as many as 15.4 % of those reporting cough in the self-administered questionnaire denied such symptoms in the interview.
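The validity measures reported in this section (sensitivity, specificity, and agreement against the case history) all derive from a two-by-two table of questionnaire answer versus physician's history. The sketch below shows the arithmetic under that assumption; the function name `validity_measures` and the counts are hypothetical and are not taken from the appendix tables.

```python
def validity_measures(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity, specificity and observed agreement from a 2x2 table.

    true_pos:  questionnaire "yes", case history "yes"
    false_pos: questionnaire "yes", case history "no"
    false_neg: questionnaire "no",  case history "yes"
    true_neg:  questionnaire "no",  case history "no"
    """
    with_symptom = true_pos + false_neg      # symptomatic by case history
    without_symptom = true_neg + false_pos   # asymptomatic by case history
    if with_symptom == 0 or without_symptom == 0:
        raise ValueError("both symptomatic and asymptomatic subjects are needed")
    total = with_symptom + without_symptom
    return {
        "sensitivity": true_pos / with_symptom,
        "specificity": true_neg / without_symptom,
        "observed_agreement": (true_pos + true_neg) / total,
    }


# Hypothetical counts for illustration; NOT data from the study.
measures = validity_measures(true_pos=30, false_pos=20, false_neg=10, true_neg=80)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

The function needs all four cells. When only subjects who screened positive are followed up, the `false_neg` and `true_neg` cells are not observed, which is why sensitivity and specificity could not be estimated for the main symptoms of the interview questionnaire.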

Reliability
Agreement between the responses to the same questionnaire on two occasions is the usual measure of the reliability of a questionnaire (3). Simple observed agreement (see table 2) has, in most other studies, been used as the parameter of reliability, and it is not directly comparable with the kappa coefficient. Since the kappa index relates the observed agreement to the agreement that occurs by chance, we have found this parameter more valuable.
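Why observed agreement and kappa are not directly comparable can be illustrated numerically. The sketch below assumes a yes/no question and uses the standard chance-agreement term for two independent ratings; the function name `kappa_from_agreement` and all input values are hypothetical, chosen only to show that one observed agreement can correspond to very different kappa values depending on how common the "yes" answer is.

```python
def kappa_from_agreement(observed, p_yes_first, p_yes_second):
    """Kappa implied by an observed agreement and the two "yes" proportions.

    observed:     proportion of subjects giving the same answer twice
    p_yes_first:  proportion answering "yes" on the first occasion
    p_yes_second: proportion answering "yes" on the second occasion
    """
    # Chance agreement: both "yes" by chance plus both "no" by chance.
    expected = (p_yes_first * p_yes_second
                + (1 - p_yes_first) * (1 - p_yes_second))
    if expected >= 1.0:
        raise ValueError("kappa is undefined when chance agreement is complete")
    return (observed - expected) / (1 - expected)


# Hypothetical inputs: the same 85 % observed agreement in both cases.
common_symptom = kappa_from_agreement(0.85, 0.5, 0.5)  # half answer "yes"
rare_symptom = kappa_from_agreement(0.85, 0.1, 0.1)    # one in ten answers "yes"
print(round(common_symptom, 2), round(rare_symptom, 2))
```

Because the chance term is never negative, the kappa obtained this way cannot exceed the observed agreement, which is the sense in which a reported observed agreement implies a lower value for kappa.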
The self-administered screening questionnaire had an acceptable reliability (mean kappa = 0.70). The reliability of the main symptom questions compared favorably with others calculated from published data (11,12,13). In a study of cotton textile workers the reliability was 0.62 for a question on chest tightness and 0.31 for a question on grades of dyspnea (11). In another study of respiratory symptoms among coal miners, the reliability for questions on symptoms of phlegm and wheezing was 0.50 and 0.54, respectively (13). For a group of 30 medical patients, the reliability for questions on cough and dyspnea was 0.43 and 0.59, respectively (12). However, the reliability of our self-administered questionnaire was somewhat lower than what Mitchell & Miles found in a study of Queensland schoolchildren (14). In that study the reliability of wheezing was found to be as high as 0.86, while the reliability of the statement on productive cough was 0.81. The increased reliability in the latter study might be due to the interval between the two administrations of the questionnaire. In the latter there were nine weeks between the studies, while in our study the interval ranged from three to five months, and this time interval might give a real change in symptoms. For example, alterations in environmental exposure, which are more likely to occur for industrial workers than for schoolchildren, might give rise to real changes in symptoms. The reliability of some major questions from the interview questionnaire (dyspnea, dyspnea and wheezing, cough, and cough more than three months a year) was above the acceptable lower limit (kappa = 0.40) (table 4). The lower reliability of these questions than with the same questions of the self-administered questionnaire could be explained by response bias introduced by the interviewer.
Smoking status has been reported with higher reliability than respiratory symptoms (13,15). This was also the case in our study. The crude classification of former work exposure status showed the same reliability as smoking status. The estimation of lifetime cigarette consumption (pack-years) also had a high reliability (r = 0.88) and was fairly comparable to that reported by Samet et al (r = 0.81) in a study of asbestos-exposed workers (16).
If we accept a kappa coefficient of 0.40 as the lower limit for acceptable values, only the interview questionnaire had questions that were less reliable. These were detailed questions about symptoms and their relation to work, vacations, etc. A possible explanation for this variation could have been the influence of variations in the work environment exposure and the fact that a three-week summer vacation fell between the two examinations. However, there were no differences between the prevalence of work-related complaints and vacation disabilities on the two occasions. Furthermore, the constant prevalence of symptoms suggests that possible under- or overreporting occurred with equal frequency and should not have influenced subsequent results in either direction.
The respiratory questions in our study reviewed the preceding year. This interval may possibly introduce recall bias and could account for the lower reliability of the detailed questions of the interview questionnaire. The optimal recall period for dramatic occurrences such as motor vehicle accidents can be as short as three months (17). Large intraindividual variation in symptoms is another possible explanation for the low reliability of the interview questionnaire. This finding is in accordance with the findings of longitudinal studies by Ferris et al (18) and Sharp et al (19) suggesting that respiratory symptoms are not static; they both develop and remit, even in individuals who maintain the same smoking status. Samet et al (16), who studied a group of shipyard workers twice one year apart, also found that workers with unchanged smoking habits replied to the symptom questions with an average observed agreement of only 70 % (implying an even lower value for kappa). Holland et al (7) measured an average observed agreement of 85 % when either the BMRC questionnaire or the National Coal Board's Pneumoconiosis Field Research Questionnaire was readministered after six months, and even in this study, using thoroughly validated questionnaires, certain questions had reliabilities as low as 66 % (implying an even lower value for kappa). Preliminary results (unpublished) from a prospective study on bronchial hyperreactivity in potroom workers indicate that symptoms show considerable intraindividual variability during an interval as short as three months, and these results support the theory that the low reliability is partly due to a real change in symptoms and is not entirely related to the quality of the questions.

Validity
The self-administered screening questionnaire had, except for childhood allergy, questions that correlated well with the physician's assessment. An underestimation of allergy on the basis of the statement of childhood allergy seems likely, as allergy in the general population is estimated to be twice as high. However, allergy has long been regarded as a risk factor for the development of "potroom asthma," and the selection of nonallergic persons for such employment has been practiced.
The question on a history of asthma had low validity. But only two of the 134 clinically examined persons gave an affirmative answer to this question, and the estimation of both validity and reliability is dubious.
The questions on smoking status and on previous work exposure showed excellent validity, well comparable with that of other questionnaires (20,21).
The standardized interview questionnaire aimed at a more-detailed characterization of the respiratory symptoms. However, the quality of the information collected in the interview seemed to vary. There was close agreement between the self-administered and interview questionnaires regarding the positive answers to the questions on dyspnea and wheezing, while the question on cough showed less stability. We have no explanation why so many denied cough in the interview, while giving a positive answer in the self-administered questionnaire. Supplementary information on the character of complaints, like the relation to work and duration of symptoms, had generally a low validity in addition to the already mentioned low agreement of the interview questionnaire in the two administrations. Therefore, the interpretation of some of the supplemental questions is difficult and has to be done with care. Even questions translated from the BMRC questionnaire did not achieve better validity than the questions of the self-administered questionnaire.
The self-administered questionnaire seems to be a valuable tool for screening for respiratory symptoms among aluminum potroom workers, while the use of an interview questionnaire as a substitute for history taking by a physician seems less advisable.

a The number of subjects in each row differs because the answering of some questions was dependent on a positive answer in the preceding question.
b The "no" and "don't know" statements were coded together.
Table B3. Comparison of positive statements in the self-administered questionnaire versus the statements in the interview questionnaire.