Visual analogue scales for detecting changes in symptoms of the sick building syndrome in an intervention study.

OBJECTIVES
This study tested questionnaires using visual analogue scales (VAS) in a cleaning intervention study and attempted to find a simple way of analyzing the replies to the questionnaires.


METHODS
A VAS questionnaire made up of 26 questions was developed and marked once a week for 28 weeks by the room occupants in 3 buildings. A total of 1248 questionnaires was used in the analysis of the results. A simple model based on the differences between a person's average responses during 2 different periods was used in the analysis.


RESULTS
No clear effect of the cleaning was found. Several significant correlations between different questions were established. Estimates for the design of future studies are given.


CONCLUSIONS
The VAS questionnaire proved to be feasible for this type of study. It is suggested that each intervention period should last 4 weeks if the questionnaire is used once a week. However, the length of the period also depends on the expected latency of the symptoms, on how long it takes for environmental conditions to be affected by the intervention, and on how quickly conditions return to "normal" during control periods.

Questionnaires using visual analogue scales offer the possibility of obtaining an indication of symptom intensity. On a linear scale, the person indicates how he or she is feeling at the moment by placing a mark between 2 statements concerning a specific condition. These questionnaires, giving a subjective evaluation of the person's condition, are easily administered by the person and quick to complete, and the result does not depend on the person's recollection of past symptoms.
Two employees of the Scott Paper Company (1) were apparently the first to record the use of a visual analogue scale (VAS). They reported high test-retest reliability (r=0.76) and high interrater reliability (r=0.71) when the VAS was used by superiors to rate subordinate workers, and they noted that the VAS freed the rater from using "direct quantitative terms" and allowed "as fine a discrimination of merit" as was desired. Freyd (2) reviewed this use of a VAS, which was known at the time as the "Graphic Rating Method". Other uses of the VAS approach began to appear in the literature with increasing frequency in the 1960s, initially for the assessment of environmental phenomena (3), but soon afterwards for the assessment of pain (4). Wewers & Lowe (5) and Ahearn (6) provide useful reviews of the historical extension of the VAS approach to the assessment of mood, anxiety, alertness, craving for cigarettes, quality of sleep, and the like, but the VAS does not appear to have been used to assess the particular set of subclinical symptoms associated with the sick building syndrome until Wyon (7) performed intervention experiments in a Swedish hospital. Investigations concerning the sick building syndrome had previously been performed epidemiologically, using questionnaires to assess the frequency of occurrence of building-related symptoms over extended periods (8), rather than a VAS to assess current symptom intensity, which is clearly of more use in assessing the impact of short-term experimental intervention in occupied buildings (9). However, traditional questionnaire methods have been successfully used for daily and weekly symptom recordings in intervention studies (10,11).
The characteristics of human responses obtained by the VAS have been well documented in connection with its use in clinical research. Bond et al (12) reported significant correlations (P<0.001) between VAS ratings of anxiety and other established measures, not only between subjects but also when anxiety was experimentally manipulated by the blinded administration of caffeine. Miller & Ferris (13) report that the reliability of VAS assessments is better when the now standard 100-mm scale is used than with a 50-mm scale and that acceptable validity was achieved for the assessment of mood and pain in a comparison with other indices and verbal scales. The validity of the VAS was also investigated by Wyon (7): VAS ratings relating to symptoms of the mouth, lips, throat, skin, nails, eyes and headaches in relation to the sick building syndrome were found to be corroborated by objective indicators of these symptoms, the indicators including measurements of tear-film stability and inter-blink interval, time to swallow, reported use of palliatives, and objective observations of each patient as recorded by a nurse. VAS assessments are sensitive to small changes (14), more sensitive than category scales for changes over time (15), and stable even when used by different ethnic or cultural groups (16).
Nyren (17) has discussed the difference between the unipolar VAS (eg, as used in pain assessment), typically ranging from absence of the symptom to some level that is defined as maximum intensity, and the bipolar VAS (eg, as used in mood assessment), ranging from "depression" at one extreme to "elation" at the other. The center of a bipolar scale may be taken by some subjects to represent their own habitual state, while others may take it to represent the "normal" state (for the population). These may or may not differ in a subject's estimation, and therefore additional variance is introduced. Wewers & Lowe (5) regard bipolar scales of this kind as more difficult for subjects to use, although subjects experience little difficulty in using bipolar scales to assess environmental dimensions such as hot and cold or dry and humid, which have a well-understood neutral point. The distribution of the distances of the marks from, for example, the left border (scale recordings) often departs from normality, and thus a nonparametric analysis is usually to be preferred. Nyren (17) points out that VAS data consistently violate the parametric assumption of constancy of "measurement error". This stricture applies only to the "error" introduced by using subjective methods of assessment. Manual measurement of the distance of VAS markings from one end, using a micrometer, was compared by Huang et al (18) with transcription of the same forms using a digitizer pad. Both methods yielded excellent accuracy, but the use of the digitizer introduced fewer errors. A digitizer was used to transcribe the VAS data obtained in the present experiment.
One practical advantage of the VAS questionnaire is that all questions fit onto 1 sheet of paper, and this feature makes it easier to administer. It is only necessary to specify the time when the questionnaire has to be filled out and point out that the answers have to reflect how the person is feeling at that particular moment. While traditional questionnaires concerning the sick building syndrome typically ask for the frequency of occurrence, at any intensity, on a 3-point scale (often, sometimes, never), a VAS questionnaire offers the possibility of giving a more accurate answer to a given question and of recording slight variations in response to the same question when presented with the questionnaire on a regular basis. In addition, the VAS questionnaire does not require a recall period. The purpose of this study was to evaluate the feasibility of using the VAS questionnaire in cleaning intervention studies. The background for the present study was that relationships between the composition of dust in indoor office environments and the occurrence of symptoms of the sick building syndrome had been reported by, for example, Raw et al (11), Gyntelberg et al (19), and Sundell et al (20). The purpose was to study whether improved cleaning could reduce symptom intensity.
Some preliminary results of this study have been presented by Kildesø & Tornvig (21).

Subjects and methods
The questionnaire shown in figure 1 was used. It consists of 26 different questions concerning general conceptions of indoor air quality and related parameters (questions 1-6), sensory irritation of the nose and throat (questions 7-10), sensory irritation of the eyes (questions 15-18), skin irritation (questions 11-14), and, finally, neurotoxic symptoms (questions 19-25). The neutral point differs for the various questions. The purpose of varying the neutral point was to make people consider their answer to each question carefully.
The cleaning intervention study was performed in 3 buildings operated by the municipality of Lyngby-Taarbæk north of Copenhagen, Denmark: a 6-story office building with a total floor area of 4800 m² built in 1973 with 240 employees, a kindergarten of 300 m² built in 1970 accommodating 10 employees and 53 children, and a school of 4500 m² built in 1958 with 22 teachers and 260 children. In the office building a ventilation system with heating and approximately 40% recirculation was in use. The floor coverings were tufted carpets in the administration building and linoleum in the school and the kindergarten. In the office building, intervention was performed on 2 floors, and 4 floors were used as a reference. For the school and kindergarten no reference was used due to limited resources. In each of the 3 buildings, 2 sampling rooms were selected for the measurement of dust.
The design of the intervention study and the measuring program have been described by Kildesø et al (22). In summary, the intervention was performed in different combinations: improvement of the vacuum cleaning by high-efficiency particulate-air (HEPA) filtering of the exhaust air and use of a brush mouthpiece, cleaning of the floors as the first step in the cleaning program, application of other cleaning agents, and cleaning of surfaces covered by papers, cables, and the like.
Every Friday, several people working in the buildings filled out a VAS questionnaire. The dust measurements and registration of other factors, such as activity, were also performed once a week but, for practical reasons, on Thursdays. It was found that the intervention had little effect on airborne dust levels, although several significant, but small effects on the levels of dust on surfaces could be shown (22).
The room occupants were asked to fill out the questionnaire half-way through their workday for practical reasons. They were instructed only to consider the way they felt at the time when they answered the questionnaire. Later, the marks on the questionnaires were transferred to a computer by means of a digitizer. A linear scale was used with a mark at the left border set at 0 and a mark at the right border set at 1. The value measured has been referred to as the scale recording in the following text.
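The conversion from a digitized mark to a scale recording can be sketched as follows. This is a minimal illustration only: the function name, the coordinate arguments, and the clamping of marks that fall slightly outside the printed line are assumptions, not details reported for the digitizer used in the study.

```python
def scale_recording(mark_x: float, left_x: float, right_x: float) -> float:
    """Map a digitized mark position to a scale recording between 0 and 1.

    left_x and right_x are the digitized positions of the left and right
    borders of the line; mark_x is the position of the respondent's mark.
    """
    if right_x <= left_x:
        raise ValueError("right border must lie to the right of the left border")
    value = (mark_x - left_x) / (right_x - left_x)
    # Assumption: marks placed just outside the line are clamped to 0 or 1.
    return min(1.0, max(0.0, value))


# A mark 50 mm along a 100-mm line whose left border was digitized at x = 12 mm.
print(scale_recording(62.0, 12.0, 112.0))
```

Normalizing by the digitized border positions of each sheet, rather than by a fixed line length, keeps the recording insensitive to how the sheet was placed on the digitizer.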
All 22 teachers at the school were asked to fill out the VAS questionnaire at the end of the third teaching class on Friday before leaving the classroom. In the kindergarten, all 10 employees were asked to fill out the VAS questionnaire at the middle of the day on Friday. In the administration building, approximately 15 people from each of the 2 intervention floors and approximately 10 from each of the reference floors were asked to fill out the VAS questionnaire. The people normally working in the dust sampling room on each floor (2-3 people) were asked, and the rest were selected at random. People were asked to write the date, room number, and their name or some personal code on the questionnaires. Use of a personal code preserves confidentiality while permitting the use of each person as their own control.
The response rate of the questionnaires varied during the study period. Some people stopped answering, and some not initially asked in the administration building started to respond. Only employees who were initially asked and who had filled out at least 10 of the 28 questionnaires were included in the analysis. Therefore, approximately 20% of the questionnaires received were excluded.
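The inclusion rule described above can be sketched as a simple filter. The data layout (one pair of personal code and week number per returned questionnaire) and all names are hypothetical and chosen only for illustration.

```python
from collections import Counter

MIN_RETURNED = 10  # at least 10 of the 28 weekly questionnaires


def included_people(returns, initially_asked, min_returned=MIN_RETURNED):
    """Return the personal codes that satisfy the inclusion rule.

    returns: iterable of (person_code, week) pairs, one per returned sheet.
    initially_asked: set of person codes asked at the start of the study.
    """
    # set() guards against the same sheet being entered twice.
    counts = Counter(person for person, _week in set(returns))
    return {
        person
        for person, n in counts.items()
        if person in initially_asked and n >= min_returned
    }


# Hypothetical example: A returned 12 sheets, B only 5, C joined later.
returns = (
    [("A", week) for week in range(1, 13)]
    + [("B", week) for week in range(1, 6)]
    + [("C", week) for week in range(1, 29)]
)
print(included_people(returns, initially_asked={"A", "B"}))
```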

Statistical methods
The Anderson-Darling normality test was used to test whether the data were normally distributed. The Kruskal-Wallis test and Mood's median test, both nonparametric tests based on the median of the scale recordings, were applied to test whether the cleaning intervention had any effect on the responses to the questionnaires. The repeated measurements within an intervention period have been assumed to be repeated measurements of the same situation. The Spearman correlation matrix was calculated.
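As an illustration of the kind of nonparametric comparison named above, the Kruskal-Wallis statistic can be computed from pooled ranks as sketched below. This is a minimal sketch with hypothetical function names and made-up scale recordings; it returns only the statistic H and its degrees of freedom, and H would then be compared with a chi-square distribution. It is not the software used in the study.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of groups of scale recordings."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        total += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
    # Standard correction for tied observations.
    correction = 1.0 - sum(t ** 3 - t for t in Counter(pooled).values()) / (
        n_total ** 3 - n_total
    )
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Made-up recordings for three hypothetical periods.
h, df = kruskal_wallis_h([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
print(h, df)
```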
The VAS questionnaire was used to detect changes in the VAS responses between 2 periods (eg, a given intervention and the corresponding control). Thus a simple mathematical model could be used based on the difference between the mean scale recording for the individual person for 2 periods. This model can be applied for the evaluation of any of the 26 questions.
Let $X_{ijk}$ denote evaluation number $k$ ($k = 1, 2, \ldots, m_{ij}$) in period $j$ ($j = 1, 2$) of person number $i$ ($i = 1, 2, \ldots, n$), where $E(X_{ijk}) = \mu_{ij}$ and $\mathrm{Var}(X_{ijk}) = \sigma_{ij}^2$. Here $\sigma_{ij}^2$ is the variance for person number $i$ in evaluating the symptom intensity during period number $j$, where conditions are assumed to be stationary. The change between the 2 periods for person number $i$ is then $\delta_i = \mu_{i1} - \mu_{i2}$, which is estimated as

$$\hat{\delta}_i = \bar{X}_{i1} - \bar{X}_{i2} = \frac{1}{m_{i1}} \sum_{k=1}^{m_{i1}} X_{i1k} - \frac{1}{m_{i2}} \sum_{k=1}^{m_{i2}} X_{i2k}.$$

This model is based on the assumption that the scale recording is an interval variable. People with only 1 or 0 responses in 1 of the 2 intervention situations evaluated were excluded.
Table 1 shows the number of people and the number of VAS questionnaires included from each of the 8 locations, where each single floor in the administration building is considered as 1 location. Between 24 and 59 questionnaires were included every week.
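The per-person difference model described above can be sketched in a few lines. The record layout and function name are hypothetical; the rule that a person needs at least 2 responses in each period follows the exclusion criterion stated in the text.

```python
from statistics import mean


def person_differences(records, min_per_period=2):
    """Estimate each person's change between two periods.

    records: iterable of (person, period, scale_recording), period 1 or 2.
    Returns {person: mean of period 1 minus mean of period 2}, leaving out
    people with fewer than min_per_period responses in either period.
    """
    by_person = {}
    for person, period, value in records:
        by_person.setdefault(person, {1: [], 2: []})[period].append(value)
    return {
        person: mean(periods[1]) - mean(periods[2])
        for person, periods in by_person.items()
        if len(periods[1]) >= min_per_period and len(periods[2]) >= min_per_period
    }


# Made-up recordings: B has only one response in period 1 and is excluded.
records = [
    ("A", 1, 0.50), ("A", 1, 0.70), ("A", 2, 0.40), ("A", 2, 0.60),
    ("B", 1, 0.30), ("B", 2, 0.20), ("B", 2, 0.40),
]
print(person_differences(records))
```

Because every person contributes a single difference, each person acts as their own control, and the resulting values can be examined with ordinary one-sample methods.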

Results
No significant results of the cleaning intervention were found when nonparametric tests were applied. The differences in the means, $\delta_i$, were found to be normally distributed. Therefore, the effects of the intervention were also analyzed on the basis of these differences by means of standard statistical methods. This analysis did not reveal any significant effect either. The results from the different intervention periods and the control were pooled. Table 2 shows the mean and median values and standard error of the mean scale recordings for some of the questions at the 8 different locations. The number of observations is shown in table 1. Significant differences between some of the locations were found for several questions using nonparametric tests. Figure 2 illustrates the mean scale recording as a function of week number for selected questions. It shows how the response to some questions varies little from week to week, while the response to other questions varies greatly. Figure 3 illustrates the distribution of the scale recordings for some of the questions for all 8 locations pooled. It shows that there is a tendency for people to place the mark in the center or close to one of the extremes. For all 26 questions the distribution of the scale recordings for each of the 8 locations was found to deviate significantly from a normal distribution (P<0.001).
A matrix with the Spearman correlation coefficients for the pooled data set is given in table 3 for scale recordings for questions 4, 7, 9, 15, 16, 18, and 19, as these were considered to be related to cleaning and had their neutral points at the right end point of the scale. The correlation matrix for all questions and factors influencing the answers was calculated, and there seems to be a tendency for a significant (P<0.05) correlation (r>0.7) between the scale recordings for questions concerning the same symptom, whereas the correlation was lower between questions concerning different symptoms. The highest correlation coefficients (0.8<r<0.9) were found between questions 9 and 10, between questions 11 and 12, between questions 16, 17 and 18 in all 3 possible combinations, and between questions 23 and 24.
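For readers who wish to reproduce this kind of analysis, the Spearman coefficient between the scale recordings of 2 questions can be sketched as the Pearson correlation of their ranks. The function names and the example recordings are made up for illustration and are not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation of two equally long lists of recordings."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up recordings for two questions answered on the same five sheets.
question_a = [0.10, 0.40, 0.35, 0.80, 0.60]
question_b = [0.20, 0.50, 0.30, 0.90, 0.70]
print(spearman(question_a, question_b))
```

Pairs of questions whose coefficient exceeds a chosen threshold (for example, 0.8) could then be listed as candidates for merging in a shorter questionnaire.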

Discussion
In general, our experience was that the VAS questionnaire was feasible for repeated questioning of the same people due to the easy administration of the questionnaire and the short time needed for completion. It was possible to record differences between the locations for some questions. Figure 2 clearly illustrates the difference in variability over time for the questions. It appears that questions with a high variation as a function of time in figure 2 also tend to have a higher coefficient of variation (see table 2) (eg, questions 9 and 15). Some questions will obviously be influenced by, for example, changes in the weather and epidemics of common cold, while others obtain a very stable response (eg, question 19 on headache).
The tendency for people to place the marks in the questionnaire in the middle or close to one of the extremes results in the nonnormal distribution of the data, as illustrated in figure 3, which makes the application of nonparametric statistics necessary. However, the introduction of the model based on the difference of the means for each participating person turns out to be an efficient way for transforming the data set into normally distributed data.
The high correlation coefficients between responses to certain questions illustrate that the number of questions in the questionnaire could be reduced without losing too much information.
The cleaning intervention did not affect the symptom intensity of the sick building syndrome of the room occupants, as measured by VAS questionnaires. This finding may not be surprising, as the effects of the cleaning intervention on dust levels, though significant, were small (22). A correlation, if any, between the symptoms of the sick building syndrome and the dust measurements may also have been reduced because there was a time difference between the dust sampling and the marking of the VAS.
An estimate of the size required for future studies, based on the variability in this study, can be found by the model based on the differences between the mean values of the scale markings in 2 different intervention periods (1 of them could be the control). As these differences were found to be normally distributed, the equation

$$n = \frac{\sigma^2 \left(z_{1-\alpha} + z_{1-\beta}\right)^2}{\delta^2}$$

can be used for finding the necessary number of persons $n$ in each intervention (23). Here $\sigma$ is the standard deviation of the differences and $\delta$ is the average difference that needs to be detected for all people with the power $1-\beta$ (being the probability of detecting an existing difference between 2 periods) and the level of significance $\alpha$. $z_{1-\alpha}$ and $z_{1-\beta}$ are the percentiles in the standard normal distribution. Questions 4, 7, and 18 have been used as examples of estimations of the number of people required in future studies, depending on the duration of the study, to detect a given difference between 2 periods (eg, intervention and control). In the estimate of $\sigma$, it has been assumed that there are no systematic differences in the scale recordings. The variability is caused by factors that are not controlled in the experiment. Thus $\sigma$ has been estimated for $\delta_i$ calculated for 2 consecutive periods of equal length T (T = 4, 6, 8, 14 weeks), starting from the beginning of the study period. T = 14 weeks covers the entire study. The result is given in table 4 for a level of significance of $\alpha = 0.05$ and a power of $1-\beta = 0.8$.
Table 4. Estimates for the design of future studies based on questions 4, 7 and 18. For different numbers of weeks in each of 2 periods (eg, intervention and control), the number of persons fulfilling the criteria of >1 response in each period is given. The total number of questionnaires on which each estimate has been based is also given. The necessary number of people in the study has been estimated for three different values of average change in the scale recordings $\delta$ which the study must be able to detect.
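The kind of calculation behind such design estimates can be sketched with the usual normal-approximation formula for paired differences, n = σ²(z_α + z_β)²/δ². This is an assumption about the form of the relation, and whether the percentile for the significance level is taken one-sided or two-sided is also an assumption, so both options are offered. The numerical inputs are made up and are not values from table 4.

```python
from math import ceil
from statistics import NormalDist


def persons_needed(sigma, delta, alpha=0.05, power=0.8, two_sided=False):
    """Number of persons needed to detect a mean difference delta.

    sigma: standard deviation of the per-person differences.
    delta: smallest average change in scale recording worth detecting.
    The one-sided percentile is the default (an assumption); set
    two_sided=True to use the 1 - alpha/2 percentile instead.
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = normal.inv_cdf(power)
    return ceil((sigma * (z_alpha + z_beta) / delta) ** 2)


# Hypothetical inputs: differences with SD 0.15, target change 0.05.
print(persons_needed(0.15, 0.05))
print(persons_needed(0.15, 0.05, two_sided=True))
```

Because n grows with the square of σ/δ, halving the change to be detected roughly quadruples the number of persons required, which is why the expected proportion of missing responses matters for the design.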
It was found that, for periods (intervention and control) shorter than 4 weeks, more than half of the people will be excluded, as a minimum of 2 responses in each period is required to include a person in the analysis. As table 4 indicates, there is no benefit in using intervention periods longer than 8 weeks. A likely explanation is that, as the period increases, more variability caused by uncontrolled factors will be introduced. Table 4 also indicates that the study design should take into account the expected ratio of missing data.
The estimates given in table 4 vary for other types of questions, but the approximate levels will not change considerably. It is of course important to note that the optimum length of an intervention period is not just a function of the statistical power of the VAS approach. It also depends on the expected latency of the symptoms (how quickly they appear and disappear), on how long it takes for environmental conditions (eg, dust) to be affected by the intervention, and on how quickly conditions (eg, dust levels) return to "normal" during control periods.
This study shows that, even though no effects of the cleaning intervention were found in the VAS responses, these questionnaires may be an expedient alternative to traditional questionnaires in studies where repeated measurements are relevant, since no recall period is required.