Validity of a single-item measure of stress symptoms.

OBJECTIVES
The objective of the study was to investigate the content, criterion, and construct validity of a single-item measure of stress symptoms. Such a concise measure would be useful in monitoring stress at work. The criteria for validity were convergence with conceptually close measures, the plausibility of associations with health and work characteristics, and the power to discriminate between groups.


METHODS
Four independent cross-sectional data sets were used. The first data set, from Finland Post, included items on symptoms of ill health and mental resources (N=1014). The second, from four Nordic countries, included well-known validated scales on exhaustion, mental health, sleep, vitality, and optimism, which made it possible to study the convergence between the measures (N=1015). The third, from a metal factory, included three indicators of health and four work characteristics (N=773). The fourth, representing the Finnish working population, described group differences in stress symptoms (N=2156) and allowed comparison with a study on emotional exhaustion in the working population. Distributions, correlations, and factor analysis were used in the analyses.


RESULTS
The stress-symptoms item converged with items on psychological symptoms and sleep disturbances and with validated measures of well-being. It had theoretically grounded associations with indicators of health and psychosocial work characteristics, and it discriminated between gender and age groups and industrial branches in accordance with the validated emotional exhaustion scale.


CONCLUSIONS
The stress-symptoms item showed satisfactory content, criterion, and construct validity for group-level analysis. It is suggested that the longer scales used to measure psychological stress can be replaced with it in survey research.

The cost, both economic and human, of occupational stress is high (1,2). Work organizations are therefore paying more attention to monitoring and preventing stress (3). Although stress at work has been given several definitions, common elements of those used in the field of psychology are an imbalance between environmental supply and individual needs and also an imbalance between environmental demands and individual motives and abilities (4,5). Stress has been considered a response to a stressful situation, the response being conducive to illness. Stress can activate a person's nonspecific defense system and lead to various symptoms of ill health if the process is prolonged (6,7).
No specific method is available with which to measure long-term stress. In worklife, long-term stress is often evaluated from experienced symptoms and work-related effects (12). These measurement scales have different names, depending on their content and purpose. Frequently, the General Health Questionnaire (13,14), the Short-Form 36-item Health Survey (15), and the Maslach Burnout Inventory (16) are used for this purpose. Although traditional psychometric principles presuppose measurement scales consisting of several questionnaire items, the questionnaires used in worklife cannot always be comprehensive, especially in repeated follow-up studies. Valid single-item methods would offer a means with which to report the results of interesting intervention trials dealing with work and worker well-being when comprehensive scales are not accepted by the target organization. The Occupational Stress Questionnaire (17,18) was developed for this purpose according to the single-item principle for use by occupational health personnel in monitoring perceived well-being and the psychosocial factors related to it. The questionnaire is based on the psychological theory of work stress (6,19), and it also includes the dimensions of job demands, control, and social support (1).
The single-item measure of stress symptoms included in the Occupational Stress Questionnaire was developed at the beginning of the 1970s on the basis of both symptom checklists used in mental health screening and clinical experience with normal patients in occupational health settings. The question refers to the general experience of stress, not to work-related stress, as follows: "Stress means a situation in which a person feels tense, restless, nervous or anxious or is unable to sleep at night because his/her mind is troubled all the time. Do you feel this kind of stress these days?" The response is recorded on a 5-point Likert scale varying from "not at all" to "very much". The question is used in individual and group screening in occupational health services, in organizational assessment (18), and in population studies (20). It was also used as a national indicator of psychosocial harm in the national profiles prepared in response to the European Office of the World Health Organization when it was developing criteria for auditing workplace health systems (21). Such long-term use of the question in different contexts indicates a priori, or face, validity, which can be interpreted as an intuitive estimate of content validity (22).
A measure is valid when it measures what it is purported to measure. According to the principles of psychological test validation, "validity" refers to the appropriateness, meaningfulness, and usefulness of specific inferences made from test scores, and test validation is the process of accumulating evidence to support such inferences (23). The conceptualization of validity varies somewhat in the fields of psychology, epidemiology, and sociology, but common to these traditions is the distinction between construct validity and criterion validity. Construct validity refers to the degree to which the measure captures the hypothetical quality or trait (ie, the construct). "The estimate of construct validity is always changing with the accumulation of further evidence about the traits and qualities that underlie the construct [p 781]" (22). Criterion validity can be established in relation to an independent validated criterion method that is concurrently available with the investigated method or in relation to a future outcome (22,24,25). Measurement reliability is a prerequisite for the empirical testing of validity.
In psychology, content validity is also emphasized, as it is easier to study it empirically than it is to study construct validity, through convergence and divergence with other measures. On the whole, investigating validity does not deviate from the general scientific procedures used to confirm theories (26,27). Empirical testing of validity is not always possible, but almost any information gathered in the process of developing or using a test (or method) is relevant to its validity (28).
The concurrent validity of a method can be investigated by comparing its results with those of a method with well-characterized properties. Generally, factor analyses of construct and content validity, especially structural equation modeling, are used for this purpose. For single items, traditional methods are available for investigating validity. The multitrait-multimethod matrix, correlations with variables assumed to measure the same concept, experimental designs, and investigations of response processes are some examples (29). Longitudinal data would offer the best means of estimating construct validity and predictive criterion validity, but the numerous problems in implementing longitudinal study designs in worklife limit the applicability of this approach (30,31). Especially in the context of method development, organizations and employees are reluctant to spend their time responding to extensive test batteries and undergoing repeated measurements. Frequent changes in modern organizations also limit the possibilities to carry out randomized reference studies and interpret the changes observed.
Borg et al (32) reported positive experience with the predictive validity of a single-item measure of self-rated health. Their results showed that, in a 5-year follow-up of a working population, repetitive work, psychological demands, low social support, job insecurity, and ergonomic exposures were significant predictors of the worsening of self-rated health. Wanous et al (33) carried out a meta-analysis on the validity of single-item measures of job satisfaction. According to their results, it is acceptably reliable and valid to use a single-item measure for a concept such as job satisfaction, which is located between factual questions and more abstract or vague psychological concepts. According to their meta-analysis, the estimated lower limit of the reliability of single-item measures of job satisfaction is 0.67. Although Wanous et al (33) recommended the use of sum scales whenever possible, they listed certain, often practical, reasons for using single-item measures. Reduced costs, increased face validity for the respondent, and problems related to the construction of sum scales support the use of single items instead of sum scales. Item bias, for example, the blurring or reversal of information, has been shown to be common with sum scales measuring work characteristics (34,35). In brief, the objective of this study was to investigate the validity of the following single-item measure of stress symptoms: "Stress means a situation in which a person feels tense, restless, nervous or anxious or is unable to sleep at night because his/her mind is troubled all the time. Do you feel this kind of stress these days?" The response was recorded on a 5-point Likert scale varying from 1 "not at all" to 5 "very much". Our detailed research questions were the following:
1. Does the single-item measure of stress symptoms have content and concurrent criterion validity on the basis of convergence with detailed questions on symptoms and mental resources and validated scales of mental well-being?
2. Does the stress-symptoms item show construct validity?
2.1. Does the stress-symptoms item have theoretically plausible associations with indicators of health and psychosocial work characteristics?
2.2. Does the stress-symptoms item discriminate coherently between gender and age groups and between industrial branches when compared with the discriminative power of a validated scale of emotional exhaustion?

Material and methods
The study was carried out using four sets of questionnaire and interview data in which the stress-symptoms item was included. The descriptive approach was selected to avoid possible data dependence in the interpretation.

Data and measurement methods
The four data sets came from different organizational, validation, and population studies. The first set (I) came from Finland Post Ltd, where 1014 employees with delivery tasks responded to the Occupational Stress Questionnaire (17). The second set of data (II), a heterogeneous data set on 1015 employees from Denmark, Finland, Norway, and Sweden, was gathered in the context of validating the Nordic Questionnaire for Psychological and Social Factors at Work (QPSNordic) in relation to six validated indicators of mental health and well-being and the stress-symptoms item (35). The six indicators were as follows: (i) the emotional exhaustion scale of the Maslach Burnout Inventory, consisting of 5 items (eg, "burn out") with responses ranging from 0 (never) to 6 (daily) (16); (ii) the mental health scale (12 items, eg, "unhappy and depressed") and (iii) the sleep disturbance scale (3 items, eg, "staying asleep") of the General Health Questionnaire (GHQ), with item responses ranging from 1 to 4 (from "better" or "more than usual" to "much worse" or "much less than usual") (13,14); (iv) the vitality scale (4 items, eg, "feel full of pep") and (v) the mental health scale (5 items, eg, "very nervous person") of the Short-Form 36-item Health Survey, with responses ranging from 1 (all of the time) to 6 (none of the time) during the past 4 weeks (15); and (vi) the optimism scale, consisting of six items of the Life Orientation Test (LOT) (eg, "in general, optimistic about my future"), with responses ranging from 1 (strongly disagree) to 5 (strongly agree) (37). All the response scales were reversed to indicate the quantity of the attribute in question.
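Reversing a response scale maps the lowest category onto the highest and vice versa: on an integer scale running from a to b, a response x becomes a + b - x. A minimal Python sketch of that rule follows; the helper name reverse_score is hypothetical, and the example simply uses the 1-4 response range stated above for the GHQ items.

```python
def reverse_score(response: int, low: int, high: int) -> int:
    """Reverse a response on an integer scale running from low to high (hypothetical helper)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}-{high}")
    # The lowest category becomes the highest and vice versa.
    return low + high - response


# GHQ-type item scored 1-4: after reversal, 1 becomes 4, 2 becomes 3, and so on.
print([reverse_score(r, 1, 4) for r in range(1, 5)])  # [4, 3, 2, 1]
```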
The third data set (III) was collected in a light metal factory, where 773 employees from all occupational levels responded to questionnaires and attended a medical examination in connection with a workplace health promotion program (38). The following measures were derived from this study: (i) the work ability index (39), a questionnaire composed of the following seven item areas: work ability compared with lifetime best (0-10 points), work ability in relation to the demands of the job (2-10 points), number of current diseases diagnosed by a physician (1-7 points), estimated work impairment due to diseases (1-6 points), sick leave during the past year (12 months) (1-5 points), own prognosis of work ability 2 years from now (1, 4, or 7 points), and mental resources (1-4 points), the range of the index being 7-49; (ii) health status as diagnosed by an occupational health physician at an individual appointment at the occupational health clinic, the status being graded from 4 to 10 on the scale of Finnish school grades; (iii) perceived health; and (iv) work characteristics (17). Perceived health was measured with the question "What is your health state compared to that of other people your age?" The response scale, from 1 (very good) to 5 (very poor), was reversed. In addition, single-item questions on the following work characteristics were included: job control ("At work, can you influence matters concerning you?"), support from supervisor ("Does your supervisor provide help and support when needed?"), social climate ("How do workmates get along at your workplace?"), and quantitative workload ("Do you have to hurry to get your work done?"). The item scales varied from 1 to 5, with 5 indicating the greatest quantity of the attribute in question.
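The item ranges listed for the work ability index account for its 7-49 range: the item minima sum to 7 and the maxima to 49. The Python sketch below sums already-scored item values and checks them against those ranges. It is an illustration under that assumption, not the official scoring procedure; the rules for deriving each item score from the questionnaire answers are not reproduced here, and all names are hypothetical.

```python
# (min, max) points per item area, as listed in the text.
WAI_ITEM_RANGES = [
    (0, 10),  # work ability compared with lifetime best
    (2, 10),  # work ability in relation to the demands of the job
    (1, 7),   # number of current diseases diagnosed by a physician
    (1, 6),   # estimated work impairment due to diseases
    (1, 5),   # sick leave during the past 12 months
    (1, 7),   # own prognosis of work ability 2 years from now
    (1, 4),   # mental resources
]
ALLOWED_PROGNOSIS = {1, 4, 7}  # item 6 only takes these values


def work_ability_index(item_scores):
    """Sum seven already-scored WAI items after checking their stated ranges."""
    if len(item_scores) != len(WAI_ITEM_RANGES):
        raise ValueError("expected 7 item scores")
    for i, (score, (lo, hi)) in enumerate(zip(item_scores, WAI_ITEM_RANGES), start=1):
        if not lo <= score <= hi:
            raise ValueError(f"item {i} score {score} outside {lo}-{hi}")
    if item_scores[5] not in ALLOWED_PROGNOSIS:
        raise ValueError("item 6 must be 1, 4 or 7")
    return sum(item_scores)


lowest = work_ability_index([lo for lo, _ in WAI_ITEM_RANGES])
highest = work_ability_index([hi for _, hi in WAI_ITEM_RANGES])
print(lowest, highest)  # 7 49
```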
The fourth data set (IV) was a stratified random sample of the Finnish working population. It comprised 2156 people aged 25-64 years who answered the stress-symptoms question in a telephone interview (20). Statistics Finland planned and carried out the sampling.
A summary of the research questions and data sets is given in table 1.

Statistical methods
In the first data set (Finland Post, N=1014) the content validity of the stress-symptoms item was investigated with the use of factor analysis. The maximum likelihood method and varimax rotation were applied.
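The factor analysis of data set I combined maximum likelihood extraction with varimax rotation. The extraction step depends on the software used and is not reproduced here. The following is a minimal sketch, assuming NumPy, of the widely used SVD-based iteration for the varimax criterion, applied to a hypothetical unrotated loading matrix; it is not the authors' code. Because the rotation matrix is orthogonal, each variable's sum of squared loadings (its communality) is unchanged by the rotation.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a p x k loading matrix orthogonally by the varimax criterion (gamma=1)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        previous = criterion
        rotated = loadings @ rotation
        # Matrix whose orthogonal polar factor gives the updated rotation.
        target = loadings.T @ (
            rotated ** 3
            - (gamma / p) * rotated @ np.diag(np.diag(rotated.T @ rotated))
        )
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt  # product of orthogonal matrices, hence orthogonal
        criterion = s.sum()
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return loadings @ rotation, rotation


# Hypothetical unrotated loadings: 4 variables on 2 factors (not study data).
unrotated = np.array([[0.6, 0.5], [0.7, 0.4], [0.5, -0.5], [0.6, -0.6]])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```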
In the second data set (Nordic data, N=1015), concurrent criterion validity was investigated by Pearson's product-moment correlations between the stress-symptoms item and six validated measures of well-being. These measures were also factor analyzed to investigate the construct validity of the stress-symptoms item.
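Concurrent criterion validity in data set II rests on Pearson's product-moment correlation. A minimal plain-Python sketch of that coefficient follows; the data are hypothetical, and the handling of missing responses (for example, pairwise or listwise deletion) is not shown because the text does not specify it. Assuming the well-being scales are scored so that higher values mean better well-being, a valid stress item is expected to correlate negatively with them, which matches the sign of the coefficients reported for this data set.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical responses: stress item (1-5) and a well-being score (higher = better).
stress = [1, 2, 2, 3, 4, 5]
vitality = [80, 70, 75, 60, 45, 30]
print(round(pearson_r(stress, vitality), 3))
```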
In the third data set (metal factory, N=773), the construct validity of the stress-symptoms item was further investigated with Pearson's product-moment correlations. These analyses were used to investigate the associations of the stress-symptoms item with health indicators and psychosocial work characteristics.
In the fourth data set (Finnish working population, N=2156), construct validity was determined on the basis of the discriminative power of the stress-symptoms item. The distributions were used to investigate the prevalence of stress symptoms according to gender, age, and industrial branch and to compare this prevalence with the prevalence of emotional exhaustion measured by the validated Maslach Burnout Inventory (36). In both studies the industrial branch was classified according to the standard industrial classification (40).
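The group comparisons rest on simple prevalence percentages. The figure legend for the Finnish working population data defines stress-symptom prevalence as the percentage of respondents answering "rather" or "very much". The Python sketch below computes that percentage per group; it assumes that "rather" is coded 4 and "very much" 5 on the 1-5 scale, uses hypothetical toy records rather than study data, and ignores any survey weights for the stratified sample.

```python
from collections import defaultdict

# Assumption: 4 = "rather", 5 = "very much" on the 1 ("not at all") to 5 scale.
HIGH_CATEGORIES = {4, 5}


def prevalence_by_group(records):
    """Return {group: percentage of respondents in the two highest categories}."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for group, response in records:
        totals[group] += 1
        if response in HIGH_CATEGORIES:
            high[group] += 1
    return {group: 100.0 * high[group] / totals[group] for group in totals}


# Hypothetical (group, response) records; not study data.
sample = [("men", 1), ("men", 4), ("men", 2), ("men", 3),
          ("women", 5), ("women", 2), ("women", 4), ("women", 1)]
print(prevalence_by_group(sample))  # {'men': 25.0, 'women': 50.0}
```

The same function can be applied with age group or industrial branch as the grouping key.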

Results

Convergence of the stress-symptoms item with other measures of well-being
Four conceptually distinct factors were extracted in the factor analysis of the items concerning psychological and physical symptoms, mental resources, and the stress-symptoms item of the Occupational Stress Questionnaire. The factors described physical symptoms, psychological symptoms, mental resources, and sleep disturbances (table 2). The total variance explained was 47.8%. The stress-symptoms item had its highest loadings on psychological symptoms (0.48) and sleep disturbances (0.45). The communalities of the variables "depressed", "active and energetic", and "stress symptoms" were above 0.60 and therefore indicated satisfactory reliability for these variables.
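For an orthogonal rotation such as varimax, a variable's communality is the sum of its squared loadings across the extracted factors. A small Python sketch, using hypothetical loadings (not values from table 2) and the 0.60 cut-off mentioned in the text:

```python
def communality(loadings):
    """Sum of squared loadings of one variable across orthogonal factors."""
    return sum(loading * loading for loading in loadings)


# Hypothetical loadings of one variable on four varimax-rotated factors.
h2 = communality([0.70, 0.30, 0.20, 0.10])
print(round(h2, 2), h2 > 0.60)
```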
The stress-symptoms item correlated strongly with the mental health scale of the Short-form 36-item Health Survey (-0.63), the content of which emphasizes depressive symptoms, and with the vitality scale (-0.58), which includes items reflecting general energy. Correlations with other indices were also high, with the exception of "optimism" (-0.24) (table 3).
A tentative factor analysis was carried out to investigate the concept of mental well-being as covered by the validated scales and the stress-symptoms item. The scales formed one factor with an eigenvalue of 3.63. The total variance explained was 51.86%. The highest loadings were found for mental health in the Short-Form 36-item Health Survey (0.88), mental health in the General Health Questionnaire (0.86), vitality in the Short-Form 36-item Health Survey (0.83), and the stress-symptoms item (-0.72).
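When a factor solution is computed from the correlation matrix of p standardized variables, the total variance equals p, so a factor's share of the variance is its eigenvalue divided by p. Assuming that convention and p = 7 (the six validated scales plus the stress-symptoms item), the reported eigenvalue and variance explained are consistent with each other:

```latex
\text{proportion of variance explained} = \frac{\lambda_{1}}{p} = \frac{3.63}{7} \approx 0.5186 = 51.86\%
```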

Plausibility of the association between the stress-symptoms item and indicators of health and psychosocial work characteristics
The stress-symptoms item correlated statistically significantly with the work ability index and diagnosed health status, but the correlations were low (-0.19 and -0.14). It correlated somewhat more strongly with perceived health (-0.31). The associations between the work characteristics and the stress-symptoms item were stronger than those between the work characteristics and the health indicators (table 4). The stress-symptoms item showed its strongest associations with quantitative overload (0.30) and social climate (-0.17). The health indicators were associated more clearly with job control, and perceived health was also associated with social climate.
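Low correlations can still be statistically significant in a sample of this size. As a worked check, assuming all 773 respondents of the metal factory data contributed to each correlation (the text does not report pairwise sample sizes), the usual t statistic for testing a Pearson correlation against zero gives:

```latex
\begin{align*}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad n = 773,\ \mathrm{df} = n - 2 = 771\\
t_{r=-0.14} &= \frac{-0.14\sqrt{771}}{\sqrt{1-0.0196}} \approx -3.93\\
t_{r=-0.19} &= \frac{-0.19\sqrt{771}}{\sqrt{1-0.0361}} \approx -5.37
\end{align*}
```

Both absolute values exceed the two-sided 5% critical value (about 1.96 at this many degrees of freedom), although the correlations account for only about 2% and 4% of the variance (r² = 0.0196 and 0.0361).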
Coherence of discrimination between groups on the basis of the stress-symptoms item and the scale of emotional exhaustion

Stress symptoms and emotional exhaustion were investigated independently in similar samples of the Finnish working population during the same time period. Stress symptoms and emotional exhaustion (36) varied slightly but coherently between the men (N=1032) and the women (N=1124). The well-being of the women was somewhat lower according to both indicators. Stress symptoms and emotional exhaustion increased with age. For the stress-symptoms item, the numbers of observations in the age groups were 487, 706, 718, and 245 from the youngest to the oldest age group, respectively (figure 1). Both indicators of mental well-being (the stress-symptoms item and the emotional exhaustion scale) discriminated between industrial branches, but the order of the branches varied somewhat according to the indicator. "Stress symptoms" had the highest ratings in education and research. It was also highly rated in hotel and restaurant work and in finance and insurance (figure 2). Emotional exhaustion was strong not only in education and research, but also in health and social welfare services, technical and business services, and finance and insurance. The differences between the industrial branches were relatively small when the small number of observations in some branches was taken into consideration (figure 2).

Discussion
The content validity of the single-item measure used for stress symptoms was satisfactory for monitoring stress in different worklife contexts. According to the factor analysis of this item together with the items measuring psychological and physical symptoms and mental resources, the stress-symptoms item most clearly reflected psychological symptoms and sleep disturbances. The content and concurrent criterion validity of the stress-symptoms item was also corroborated by its congruence with well-known validated scales measuring mental well-being. This single-item measure of stress symptoms varied in accordance with the mental health scales of the General Health Questionnaire and the Short-Form 36-item Health Survey. It also reflected general energy to some degree, as it was clearly associated with emotional exhaustion, sleep, and vitality. Psychological symptoms, which are an essential component of well-known mental health scales, and sleep disturbances can be considered early signs of stress, and the investigated single item of stress symptoms covered these early signs well.
The construct validity of the stress-symptoms item was corroborated on the basis of its convergence with and divergence from indicators of health. Its association with perceived health was moderate, whereas its relationship to diagnosed health status and work ability was low, although statistically significant. Because all diseases, including those not mediated through the stress mechanism, are taken into consideration in the objective and subjective assessment of health and in the work ability index, the single-item stress-symptoms measure should also diverge from these indicators. A more profound investigation of construct validity through the testing of cause-effect relationships was not possible, since the data of this study were cross-sectional. However, in two recent longitudinal studies, the stress-symptoms item proved to be an important predictor of the incidence of shoulder pain and sciatic pain, with a dose-response effect (41,42).
The construct validity of the stress-symptoms item was further supported by its expected associations with work characteristics. Work overload and social climate were associated with it, whereas job control was related to the illness-based health indicators. Stress symptoms have also been shown earlier to be associated with work overload (43), whereas job control has predicted health and mortality (10,11). The magnitude of the associations between stressors and stress outcomes has shown replicable patterns in research results: stressors have correlated moderately with symptom checklists reflecting dysphoria and poorly with somatic symptoms (9).
The discriminative power of the stress-symptoms item was verified by the picture of the differences between the gender and age groups, which was coherent with that obtained with the emotional exhaustion scale of the Maslach Burnout Inventory. The differences between the groups were rather small but consistent for these two indicators. The women's well-being was lower than that of the men, and well-being decreased with increasing age according to both indicators.
The similar discrimination of the stress-symptoms item and the emotional exhaustion scale with respect to the industrial branches further supported the construct validity of the stress-symptoms item. It is possible that the few differences in the order of the branches between these two indicators were due to the great variation of stress and exhaustion within the branches or to the small number of observations in some categories, even though the data were obtained from representative samples of the Finnish working population. Nor can these two indices be expected to converge completely, because the stress-symptoms item measures the general experience of stress, whereas the emotional exhaustion scale measures work-related exhaustion.

[Figure legend: Stress-symptoms item = percentage of the respondents in the response categories "rather" or "very much" (Finnish working population data, N=2156); emotional exhaustion = percentage of the respondents in the standardized category "severe" (36).]
In summary, the single-item measure of stress symptoms proved to be a valid measure for drawing group-level conclusions about mental well-being. We suggest that the longer measurement scales used for nonspecific symptoms of psychological stress can be replaced by the stress-symptoms item in monitoring stress at work and in survey research. It seems to be a more sensitive indicator of well-being in work organizational studies than are illness-based health measures. Less information is available on the validity of this questionnaire item in areas of life other than work. Further construct validation of the item would require longitudinal research. The application of this single-item measure of stress symptoms requires local validation when used in different contexts and cultures, in spite of the positive results of our study.