Comparison of expert-rater methods for assessing psychosocial job strain

Scand J Work Environ Health 2001;27(1):70–75.
Objectives: This study tested the reliability and validity of industry- and mill-level expert methods for measuring psychosocial work conditions in British Columbia sawmills using the demand-control model.
Methods: In the industry-level method, 4 sawmill job evaluators estimated psychosocial work conditions at a generic sawmill. In the mill-level method, panels of experienced sawmill workers estimated psychosocial work conditions at 3 sawmills. Scores for psychosocial work conditions were developed using both expert methods and applied to job titles in a sawmill worker database containing self-reported health status and heart disease. The interrater reliability and the concurrent and predictive validity of the expert rater methods were assessed.
Results: The interrater reliability and concurrent validity were higher for the mill-level method than for the industry-level method. For all the psychosocial variables, the reliability of the mill-level method was greater than 0.90. The predictive validity results were inconclusive.
Conclusions: The greater reliability and concurrent validity of the mill-level method indicate that panels of experienced workers should be considered as potential experts in future studies measuring psychosocial work conditions.

Most studies of job strain rely on direct self-reports of psychosocial work conditions (1–5) or on self-reports pooled within occupations (6–9). However, associations between self-reported work conditions and self-reported health status may be biased by common methods variance, whereby people who report health symptoms are more likely to report exposures than people who do not (10). In spite of repeated calls for the development of expert methods for assessing psychosocial work conditions to remedy this problem (11–13), very few investigations have used expert raters (14–18).
The purpose of this study was to test the reliability and validity of 2 expert methods for measuring psychosocial work conditions in British Columbia sawmills using the demand-control model (19). In the 1st method, 4 sawmill job evaluators estimated psychosocial work conditions at a generic sawmill (industry-level expert method). In the 2nd method (mill-level expert method), panels of experienced sawmill workers were randomly selected at 3 sawmills representing the typical technological range in the industry and asked to estimate the psychosocial work conditions in each mill.
Job-title-specific scores for psychosocial work conditions were developed using both expert methods. These scores were then applied to the job titles in an existing sawmill worker data set containing self-reported health status and heart disease outcomes, and the reliability and validity of both expert methods were determined.

Subjects and methods
Over the past decade, we have assembled a cohort of approximately 28 000 British Columbia sawmill workers who worked in 1 of 14 sawmills between 1950 and 1998 (20). As part of a study on technological change, a subcohort of 9806 workers who were working in the industry in 1979 was identified. Three thousand workers randomly selected from this subcohort were approached for an interview between October 1997 and March 1999 (21).
A shortened version of the demand-control instrument was used to obtain self-report scores for control, psychological and physical demand, co-worker social support, and noise for the workers' jobs at the time of the interview (21). Self-reports of current health status (on a 5-point scale from excellent to bad) were obtained, as were self-reports of heart disease (yes or no, during the preceding 6 months). Basic demographic and lifestyle information was also obtained, as was each worker's job title at the time of the interview.
In the data set, 66 unique sawmill job titles were identified among the respondents. To obtain stable estimates by pooling self-reports within job titles, we required a minimum of 5 self-report scores per job title. Thirty-one of the 66 job titles were held by ≤4 respondents and so were not used to calculate pooled self-report scores, leaving 35 job titles for which pooled scores could be calculated. Because 7 of these 35 job titles were not estimated with the expert rating methods, a final list of 28 job titles was developed for which self-report, pooled self-report, and mill- and industry-level expert scores were available. All the analyses were based on these 28 sawmill job titles.
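The job-title filtering described above can be sketched as follows. This is a minimal illustration, assuming the self-reports are available as (job title, score) pairs; the function name, the data layout, and the job-title labels are hypothetical, and only the threshold of 5 respondents per title comes from the text.

```python
from collections import defaultdict
from statistics import mean

MIN_RESPONDENTS = 5  # minimum self-reports per job title, as stated in the text


def pooled_scores(self_reports, expert_rated_titles):
    """Pool self-report scores within job titles.

    self_reports: iterable of (job_title, score) pairs, one per respondent.
    expert_rated_titles: set of titles that were also rated by the experts.
    Returns {job_title: mean score} for titles that have at least
    MIN_RESPONDENTS respondents and were also expert-rated.
    """
    by_title = defaultdict(list)
    for title, score in self_reports:
        by_title[title].append(score)
    return {
        title: mean(scores)
        for title, scores in by_title.items()
        if len(scores) >= MIN_RESPONDENTS and title in expert_rated_titles
    }


# Hypothetical data: "edger" has 5 respondents, "sawyer" only 2,
# and "planerman" has 5 respondents but no expert rating.
reports = [("edger", s) for s in (2, 4, 3, 3, 3)]
reports += [("sawyer", 5), ("sawyer", 4)]
reports += [("planerman", 1)] * 5
print(pooled_scores(reports, {"edger", "sawyer"}))  # {'edger': 3}
```

Both exclusion rules from the text (too few respondents, no expert rating) are applied in one pass, so only titles usable by all 4 methods survive.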
Of the 3000 workers sampled, 2156 completed responses were obtained, for a response rate of 71.9%. Of these 2156 respondents, 270 had filled out a shortened version of the questionnaire with no psychosocial job strain data, leaving 1886 questionnaires with complete data. Of these workers, 711 were employed at a study mill at the time of the interview; 61 of them had experienced unemployment during the year preceding the interview and were excluded from the analysis. Eliminating the respondents for whom pooled self-reports were not available (ie, respondents with uncommon job titles) resulted in a final sample of 408 workers.
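The participant flow can be checked with simple arithmetic. The counts below are taken from the text; the derived figures (the number of eligible workers before the job-title filter and the number removed by it) are computed here rather than reported in the original.

```python
# Counts reported in the text.
sampled = 3000
responded = 2156
short_form = 270              # shortened questionnaire, no job-strain items
at_study_mill = 711           # employed at a study mill at interview
unemployed_prior_year = 61    # excluded: unemployed in the preceding year
final_sample = 408

# Derived quantities.
response_rate = 100 * responded / sampled
complete_data = responded - short_form
eligible = at_study_mill - unemployed_prior_year
# Respondents whose job title was not among the 28 analyzed titles.
job_title_exclusions = eligible - final_sample

print(f"response rate: {response_rate:.1f}%")                   # response rate: 71.9%
print(f"complete questionnaires: {complete_data}")               # complete questionnaires: 1886
print(f"eligible before job-title filter: {eligible}")           # eligible before job-title filter: 650
print(f"excluded by job-title filter: {job_title_exclusions}")   # excluded by job-title filter: 242
```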

Obtaining experienced worker estimates of psychosocial work conditions (mill-level expert method)
Three of the 14 sawmills in the cohort study, representing the current range of sawmilling technology, were selected. A list of 54 basic job titles in the sawmill industry was developed using panels of experienced sawmill workers. Using seniority lists, 15 workers with more than 20 years' experience were randomly selected at each of the 3 sawmills. A single interviewer conducted face-to-face interviews with these workers during 1997 using the modified demand-control instrument.
The 10 best interviews were selected from each of the 3 study mills, the selection criterion being the number of job titles estimated. Any rater who was unable to provide an estimate for ≥10% of the job titles was excluded. In 2 of the 3 sawmills, 10 raters were able to estimate ≥90% of the job titles. In the 3rd mill, 12 raters estimated ≥90% of the job titles, so 2 raters were randomly selected and eliminated from the panel to obtain a pool of 10 raters at each sawmill. The estimates of the resulting 30 sawmill workers were pooled for each job title.
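The panel-selection and pooling steps above can be sketched as below. The function names, rater identifiers, and example coverages are hypothetical. The text describes the cut-off both as estimating "≥90%" of titles and as excluding raters missing "≥10%", which differ only at exactly 90%; this sketch follows the "≥90%" wording.

```python
import random
from collections import defaultdict
from statistics import mean


def select_panel(coverage, panel_size=10, min_coverage=0.90, seed=None):
    """Keep raters who estimated at least min_coverage of the job titles,
    then randomly drop extras so that exactly panel_size raters remain.

    coverage: {rater_id: fraction of job titles the rater could estimate}
    """
    eligible = sorted(r for r, c in coverage.items() if c >= min_coverage)
    if len(eligible) < panel_size:
        raise ValueError("too few raters meet the coverage criterion")
    rng = random.Random(seed)
    # Sampling without replacement; with exactly panel_size eligible raters
    # this simply returns all of them.
    return sorted(rng.sample(eligible, panel_size))


def pool_estimates(panels):
    """Average the estimates of all selected raters for each job title.

    panels: {rater_id: {job_title: score}}; titles a rater could not
    estimate are simply absent from that rater's dict.
    """
    by_title = defaultdict(list)
    for ratings in panels.values():
        for title, score in ratings.items():
            by_title[title].append(score)
    return {title: mean(scores) for title, scores in by_title.items()}


# Hypothetical mill: 12 raters meet the criterion and 3 fall short.
coverage = {f"r{i:02d}": 0.95 for i in range(12)}
coverage.update({"r12": 0.80, "r13": 0.85, "r14": 0.50})
panel = select_panel(coverage, seed=1)
print(len(panel))  # 10
```

Pooling across the 3 mills then amounts to calling `pool_estimates` on the union of the three 10-rater panels.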

Obtaining estimates from job evaluators (industry-level expert method)
A union-management system of job evaluation has been in place in the British Columbia sawmill industry since the late 1960s. This job-evaluation system relies on an instrument developed within the industry to measure psychological and physical demand, control over skill use, control over decision making, and physical and other hazards. The expert evaluators were therefore familiar with the dimensions used in the demand-control model and with applying them to assessments of sawmill job titles.
Since the 1960s, a total of 6 job evaluators have been employed in the sawmill job-evaluation program. All 6, each with over 20 years' experience in sawmill job evaluation in British Columbia, were potential interviewees. Three were currently employed by industry, 1 was currently employed by the union, and 2 had recently retired from the union. All 3 industry raters agreed to participate, as did the currently employed union expert. One of the retired union experts was too ill to participate, and the other refused without giving a reason.
Using face-to-face interviews, these 4 expert raters were asked to rate a generic sawmill for current (1997) psychosocial work conditions using the modified demand-control instrument. A total of 54 job titles was estimated.

Reliability analyses
Two measures were used to assess reliability: the intraclass correlation coefficient (ICC) between individual raters and the ICC for all raters (22,23). The ICC for individual raters (individual ICC) measures how closely an individual rater's estimates of exposure agree with those of the other raters. However, the purpose of this study was not to assess individual raters, but rather to determine whether the group's mean estimate of exposure was reliable; that is, had we selected different groups of raters, would the group means of their estimates have agreed? The measure that best estimates this form of reliability is the ICC for all raters (group ICC) (23, p 273).
For the mill-level method, individual and group ICC values were calculated for the entire group of 30 workers across the 3 mills. For the industry-level method, individual and group ICC values were calculated for the 4 job evaluators. In addition, the proportions of variance attributable to raters, job titles, and the residual were calculated.
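The two reliability measures and the variance partition can be sketched from a two-way analysis of variance on a complete job-title × rater table. The text does not state which ICC model was used; this sketch assumes the two-way random-effects forms of Shrout and Fleiss (ICC(2,1) for an individual rater, ICC(2,k) for the mean of k raters). The function names and example data are hypothetical, and missing ratings are not handled.

```python
def mean_squares(ratings):
    """Two-way ANOVA mean squares for a complete targets x raters table.

    ratings: list of rows, one per job title, each with one score per rater.
    Returns (n, k, ms_rows, ms_cols, ms_error).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # job titles
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    return (n, k, ss_rows / (n - 1), ss_cols / (k - 1),
            ss_error / ((n - 1) * (k - 1)))


def icc_individual_and_group(ratings):
    """Return (ICC(2,1), ICC(2,k)): single-rater and mean-of-k-raters ICCs."""
    n, k, msr, msc, mse = mean_squares(ratings)
    individual = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    group = (msr - mse) / (msr + (msc - mse) / n)
    return individual, group


def variance_proportions(ratings):
    """Share of total variance attributable to job titles, raters, residual.

    Negative component estimates are truncated at zero (a common convention).
    """
    n, k, msr, msc, mse = mean_squares(ratings)
    components = {
        "job_title": max((msr - mse) / k, 0.0),
        "rater": max((msc - mse) / n, 0.0),
        "residual": mse,
    }
    total = sum(components.values())
    return {name: v / total for name, v in components.items()}


# Hypothetical: 3 job titles, 2 raters; rater 2 scores every title 1 point higher.
table = [[1, 2], [2, 3], [3, 4]]
print(icc_individual_and_group(table))  # approximately (0.667, 0.8)
print(variance_proportions(table))
```

In the example the raters rank the titles identically but differ by a constant offset, so the rater component absorbs a third of the variance and the individual ICC falls below 1, while averaging over both raters (the group ICC) recovers much of the reliability.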

Validity analyses
Pooled self-reports and mill- and industry-level expert scores were imputed for each individual on the basis of job title, so that each of the 408 persons in the study had self-report, pooled self-report, mill-level, and industry-level scores for each of the 5 psychosocial work conditions. As a measure of concurrent validity, the agreement between the 4 methods was assessed by calculating pairwise Pearson correlation coefficients.
As a measure of predictive validity, logistic regression analyses were conducted with the scores from each of the 4 methods and with self-reported health status and heart disease as the outcome variables. These health outcomes were chosen because other studies have shown that psychosocial work conditions are associated with them. Self-reported heart disease has been associated with control, psychological demand, co-worker social support, physical demand, and noise in several studies (4,17,24,25). Self-reported health status has been linked with control (26).
Logistic regression was used to model self-reported health status (dichotomized as 0 = excellent or good and 1 = fair, poor, or bad) and self-reported heart disease (dichotomized as 0 = absent and 1 = present) as the outcome variables. The sociodemographic variables age, education, housing status, income, and country of birth were modeled with each outcome. Controlling for these sociodemographic variables, multivariate models were fitted for control, psychological demand, physical demand, noise, and co-worker social support for each of the 4 assessment methods.

Results

Reliability
Table 1 shows the reliability of the 4 job evaluators for the industry-level expert method. The group ICC values were highest for control (0.83) and social support (0.84) and lowest for psychological demand. Table 2 shows that, when the estimates of the 30 experienced workers in 3 sawmills were pooled, the group ICC values for all 5 variables were greater than those obtained with the industry-level method. In particular, the group ICC values for the demand variables (physical and psychological demand) increased by approximately 20% with the mill-level method.

Concurrent validity
As a measure of concurrent validity, the agreement between the 4 methods was assessed by calculating pairwise Pearson correlation coefficients (table 3). For all 5 psychosocial work condition variables, agreement with the direct self-reports decreased from the pooled self-reports to the mill-level estimates and then to the industry-level estimates. Similarly, agreement with the pooled self-reports was lower for the industry-level expert method than for the mill-level method. For all the variables except psychological demand, the greatest agreement was observed between the pooled self-reports and the mill-level method. Across the variables, agreement between the methods was greatest for noise and strikingly lower for psychological demand than for the other psychosocial work condition variables.

Predictive validity
Table 4 presents the odds ratios and confidence intervals for reported heart disease. Self-reported noise and psychological demand were both associated with higher odds of reported heart disease. Increased self-reported physical demand was also associated with higher odds of heart disease, although this association was of borderline significance (P=0.06). The 3 other methods of measuring noise and psychological demand showed weak associations with heart disease. In the logistic regression for self-reported health status, control was the only psychosocial work condition variable that showed any association (table 5). Self-reports of control were associated with self-rated health in the expected direction (ie, self-rated health improved with increasing control). Pooled self-reports of control showed a weaker association with health status in the same direction. Finally, the mill-level method of measuring control indicated a similar effect size for the association between control and self-rated health status (P=0.07).
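The odds ratios in tables 4 and 5 come from multivariate logistic regression. As a simpler illustration of what an odds ratio and its confidence interval express, the sketch below computes a crude (unadjusted) odds ratio from a hypothetical 2 × 2 table with a Woolf log-scale interval, together with the health-status dichotomization described in the methods. It is not the authors' model: it omits the sociodemographic adjustment, and splitting exposure into high and low is an assumption made only for the example.

```python
import math

# Dichotomization of self-rated health as described in the methods:
# 0 = excellent or good, 1 = fair, poor, or bad.
HEALTH_CODES = {"excellent": 0, "good": 0, "fair": 1, "poor": 1, "bad": 1}


def dichotomize_health(rating):
    """Map a 5-point health rating to 0/1; unknown labels raise KeyError."""
    return HEALTH_CODES[rating]


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    All four cells must be positive.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts (not from the study): high versus low noise exposure.
print(odds_ratio_woolf(a=12, b=88, c=6, d=94))
```

With rare outcomes such as the self-reported heart disease in this sample, the small case cells dominate the standard error, which is one way to see why power was limited.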

Discussion
When the industry- and mill-level methods were compared (pooling 30 workers' scores in the latter method), reliability, as measured by the group ICC, was higher for the mill-level method. The reliabilities with the mill-level method were greater than 0.90 for all the psychosocial work conditions, which, according to Nunnally (27), is the minimum standard for reliability. The industry-level method did not attain this basic level of reliability.
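One way to see why pooling many raters can raise the group ICC is the Spearman-Brown relation, which predicts the reliability of the mean of k interchangeable raters from a single-rater reliability. This is an idealization added for illustration (it assumes parallel raters and holds exactly only for some ICC forms), and the single-rater value of 0.25 used below is hypothetical, not a figure from this study. The 0.90 threshold is the standard attributed to Nunnally in the text.

```python
NUNNALLY_MINIMUM = 0.90  # minimum reliability standard cited in the text


def spearman_brown(single_rater_reliability, k):
    """Predicted reliability of the mean of k parallel raters."""
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical single-rater reliability of 0.25.
for k in (4, 30):
    rel = spearman_brown(0.25, k)
    print(k, round(rel, 3), rel >= NUNNALLY_MINIMUM)
# 4 0.571 False
# 30 0.909 True
```

Under this idealization, 4 raters of this quality would fall short of 0.90 while 30 would exceed it. In the study, however, the two methods also differ in who the raters were and whether they rated a generic or an actual sawmill, so the comparison is not purely one of panel size.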
The concurrent validity results showed a correlation of 0.32 between self-reported control and control estimated with the industry-level expert method. In the Whitehall Study of British civil servants, correlations between self-reports of control and estimates of control made by personnel managers ranged from 0.25 to 0.35 for men and women (17). These correlations were similar to that obtained in this investigation with the industry-level expert method.
The correlation between self-reported control and the mill-level expert estimate of control was higher (0.53) than both the correlation between the self-reports and the industry-level method and the correlations found between self-reports and the expert method used in the Whitehall investigation. The greater reliability and concurrent validity of the mill-level method indicate that panels of experienced workers should be considered as potential experts in future studies measuring psychosocial work conditions.
The predictive validity results were difficult to interpret. They showed that the self-reports of psychological demand and noise were associated with reported heart disease in the expected direction, and that the self-reports of control were associated with reported health status, also in the expected direction. However, both expert rating methods showed weaker associations with the health outcomes than the direct and pooled self-report methods did.
Besides common methods variance in the self-reports (10), there are other reasons for expecting a weaker association between an expert assessment of psychosocial work conditions and health status. For example, the expert assessment of psychosocial work conditions involves cognitive and emotional processing on the part of the experts, which will tend to attenuate associations between the assessed work conditions and health outcomes (18). In addition, associations for the industry-level method should be weaker than for the mill-level method because its estimations were conducted at arm's length and were based on a generic sawmill rather than on actual sawmills. The weaker associations obtained with the expert methods than with the self-report method in this study are therefore to be expected.
There were several other limitations to this study. Because self-reported heart disease was relatively rare in this sample, the regression analyses may have lacked the statistical power to detect associations that were present. In addition, the 2 outcome measures used in the regression analyses may have limited measurement reliability.
In spite of these limitations, this study demonstrated that the mill-level method was reliable. The reliability scores (across all the psychosocial work condition variables) for the mill-level method were approximately 20% greater than for the industry-level method. Improvement in reliability with the mill-level method was particularly noteworthy in the case of psychological and physical demand.
In addition, the mill-level method showed greater concurrent validity than the industry-level method. The much higher correlation in this study between self-reported control and the mill-level estimate of control (0.53), compared with the correlations between self-reported and expert-assessed control in the Whitehall Study (0.25–0.35), also points to the utility of using panels of experienced workers to estimate psychosocial work conditions.