Validity of self-reported mechanical demands for occupational epidemiologic research of musculoskeletal disorders

Validity of self-reported mechanical demands for occupational epidemiologic research of musculoskeletal disorders. 2009;35(4):245–260. Objectives To describe the relation of the measured validity of self-reported mechanical demands (self-reports) with the quality of validity assessments and the variability of the assessed exposure in the study population. Methods We searched for original articles, published between 1990 and 2008, reporting the validity of self-reports in three major databases: EBSCOhost, Web of Science, and PubMed. Identified assessments were classified by methodological characteristics (eg, type of self-report and reference method) and by the exposure dimension measured. We also classified assessments by the degree of comparability between the self-report and the employed reference method, and by the variability of the assessed exposure in the study population. Finally, we examined the association of the published validity (r) with this degree of comparability, as well as with the variability of the exposure variable in the study population. Results Of the 490 assessments identified, 75% used observation-based reference measures and 55% tested self-reports of posture duration and movement frequency. Frequently, validity studies did not report demographic information (eg, education, age, and gender distribution). Among assessments reporting correlations as a measure of validity, studies with a better match between the self-report and the reference method, and studies conducted in more heterogeneous populations, tended to report higher correlations [odds ratio (OR) 2.03, 95% confidence interval (95% CI) 0.89–4.65 and OR 1.60, 95% CI 0.96–2.61, respectively]. Conclusions The reported data support the hypothesis that validity depends on study-specific factors often not examined. Experimentally manipulating the testing setting could lead to a better understanding of the capabilities and limitations of self-reported mechanical demands.

Instruments based on self-report (referred to as "self-reports" hereafter) are used commonly for the assessment of mechanical exposures in epidemiologic research of musculoskeletal disorders (1). Self-reports have unique advantages for a number of applications: (i) they can be less expensive than observation-based and direct measurement instruments to assess large populations (2); and (ii) they constitute a feasible method to assess exposures that occur with highly irregular patterns, for example exposures that change seasonally (3), in the past (4), or under conditions where research space for interviewing is limited or privacy must be maintained (5).
Numerous studies have warned researchers about the lack of validity of self-reports, based on the low level of association that is frequently reported between this type of instrument and observation-based or direct measurement methods for the assessment of mechanical demands (6–8). It has been argued that while self-reports effectively convey relative differences in exposures of heterogeneous populations, they are imprecise measures of the absolute levels of the exposure (9). Others, in contrast, have stated that the validity of self-reports for the assessment of mechanical exposures cannot be appropriately established with the information currently available (10, 11). These reviews have coincided in that the reported agreement between self-reports and other more objective measures of exposure assessment may be due to the methodological characteristics of the studies testing such agreement and not the true capability of self-reports to measure the exposure of a population. In their review, Stock et al (10) systematically explored the effects of question and response scale formulation, and of criterion methods, on the validity of questionnaires and interviews used to assess occupational mechanical exposures. The authors found that low measured validity may often be due to the poor formulation of questions, which limits the ability of the study population to accurately report their exposures. Also, the authors noted that many of the studies included in the review used reference methods that were untested, not necessarily able to capture the variability of work (which may have been captured by self-reports), or not comparable to the questions being tested. Stock et al's review found it difficult to establish definitive conclusions about the validity of these self-reports due to the methodological limitations in many of the studies that were reviewed.
Another recent review by Barriera-Viruet et al (11) assessed the quality of studies that tested the validity of self-reports, and the association between study quality and the measured validity. To assess the quality of each study, the authors considered the reported data, the validity of the criterion method, and the employed statistics. The authors reported that validity studies using direct measurements as the criterion method tended to have better quality scores than those based on observations. Barriera-Viruet et al's study, however, did not find a relation between the overall quality of the study and the reported agreement between the self-report and the criterion method. This finding was used to suggest that what drives the reported validity of self-reports is not the quality of the study, but rather the fact that self-assessments can be considered fundamentally a psychophysical measure of exposure, which may not be linearly related to physical stimuli. Similarly, self-reports may reflect the subject's response to a variety of stimuli simultaneously, some of which may not be controlled or measured.
Our study aims to advance this discussion by quantifying the effect of key characteristics of validity assessment methods on the measured validity of self-reported mechanical demands currently published in the literature. Specifically, we explored the hypothesis that the use of non-comparable reference methods explains, at least partially, the low-to-moderate measured validity of self-reports. Also, in recognition of the fact that the correlation, a commonly used statistical measure of validity, is affected by the amount of variability present in the variables being compared, we explored the relation between the heterogeneity of the population with regard to the exposure of interest and the reported validity of self-reported ergonomic demands. To do this, we classified all validity assessments of self-reports available in the recent literature by: (i) methodological characteristics related to the study population, (ii) reference method, (iii) statistical measure of validity employed, and (iv) exposure and exposure dimension measured by the tested self-report (12). Notably, we documented whether demographic features of the study population such as age, gender, and education level were reported in the studies. This is pertinent as demographic features may be related to the accuracy of self-reported data (13), for example, via differences in the physical capability to execute a given task (14) or in the capability to understand and process information in response to a question (15). We used the information about validity assessments both to conduct the tests mentioned above and to describe the trends in validity assessment research of self-reported mechanical demands.

Manuscript selection
One of the authors conducted a first screening of all identified articles based on titles and abstracts. This stage aimed to exclude any manuscripts that tested the validity of instruments for health outcome evaluation, scale-based self-reports for the evaluation of ergonomic demands (eg, self-reports that sum up the results of various questions to estimate a total level of exposure), or studies testing self-reports with methods other than the reference or criterion validity method (eg, studies using construct validity or face validity). If there was any doubt about eligibility, the article was retained. Those articles that passed this first stage of review were obtained in full manuscript form and read in depth to check their eligibility. At both stages, articles were retained if the answers to all three of the following questions were yes: (i) Did the study evaluate a mechanical demand (including assuming awkward postures, applying force or executing forceful movements, doing repetitive movements, and measures of physical exertion)? (ii) Did the study compare the self-report of demands with another measure of demands? (iii) Did the study test the validity of a single question?

Data extraction
We captured information that characterizes individual validity assessments of questions, including: type of self-report; type of criterion method (ie, direct measurement, observation, or other self-report-based instrument); employed question and response scale; type of population (ie, blue- versus white-collar workers); industry; period of exposure covered by the self-report and criterion method; measure of validity (eg, correlations, regression parameters, kappa statistics, and mean differences between self-report-based and criterion-method-based measures of exposure); sample size; and study population demographics (ie, age, gender, and education).

Self-report classification by exposure and exposure dimension
The classification of self-reports according to the exposure they evaluated was based on previously proposed taxonomy schemes of physical exposures (10, 16). We added a new category for the exposure "activity/task" to include validity assessments testing the occurrence of activities or tasks in the workplace (table 2).
The classification of self-reports according to the exposure dimension they address (duration, frequency, and/or magnitude) was conducted via a three-step process. First, the part of the question that defined its focus (15) and the question response were separately classified into the three dimension categories. Second, the question predicates were examined for particular words suggesting the intention to further quantify the exposure in terms of duration, frequency, or magnitude (table 3). Third, each self-report was reviewed individually in order to understand its purpose and integrate information from the previous two stages.
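As a rough illustration of the second step, the predicate scan can be thought of as a keyword lookup. The sketch below is only an assumed analogue: the keyword lists are illustrative placeholders, not the actual vocabulary of table 3, and the third step (individual review of each self-report) has no automated equivalent.

```python
# Illustrative sketch of the keyword scan used to suggest exposure
# dimensions from a question's wording. The keyword lists below are
# placeholders, NOT the actual word lists of table 3.

DIMENSION_KEYWORDS = {
    "duration": ("how long", "hours per", "minutes per", "time spent"),
    "frequency": ("how often", "times per", "how many times"),
    "magnitude": ("how heavy", "how much force", "intense", "weight of"),
}

def suggested_dimensions(question):
    """Return the set of exposure dimensions suggested by a question."""
    text = question.lower()
    return {
        dim
        for dim, words in DIMENSION_KEYWORDS.items()
        if any(word in text for word in words)
    }
```

With these placeholder lists, for example, "How many times per minute do you bend your back?" is flagged as a frequency question, while a question with no quantifying words yields an empty set and would fall to the later review stages.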

Movement
Questions about the action of squatting or moving a body part (eg, bend or rotate back). An important subcategory is all questions about manual materials handling (lifting, carrying, pushing, and pulling).

Repetition
Questions specifically asking about the duration of repetitive movements or qualifying a repetitive movement (eg, how many times per minute…). Questions about movement of the arms, hands or fingers. Questions about work in an assembly line.
In one case, the use of a keyboard was not classified as an activity/task exposure, but as repetition, because of the qualifier "intense" keying.

Physical exertion
Questions looking to rate the physical exertion of jobs or tasks.

Force
Questions asking about the use of pinch grasp; questions asking about the weight of objects; and questions asking about external force direction or body posture with the purpose of estimating internal loads (eg, compression or shear forces in the L4/L5 joint). Questions about manual material handling of objects of a given weight were classified under the movement exposure.

Vibration
Questions asking about the duration of the use of vibrating tools or questions asking about driving any or particular types of vehicles.

Activity/task
Questions asking about the duration or frequency of occurrence of activities such as computer use, walking, cleaning, maintenance, and meetings.

Other
Questions about workstation design, adjustability of the workstation, presence of pressure on the tip of the thumb, and the use of hands as a hammer.
When a self-report was said to evaluate the magnitude dimension of an exposure (ie, posture, movement, repetition, and vibration), we created two subcategories. In the first subcategory, we classified simpler questions investigating the presence of an exposure (eg, using a dichotomous response). In the second subcategory, we classified more complex questions attempting to quantify the magnitude of the exposure in greater detail (eg, using ordinal scales, or qualifying the magnitude of the exposure in the focus or predicate of the question). An illustration of this process is presented in figure 1.

Trends in validity testing of self-reports research
All identified validity-tested self-reports were organized by: type of self-report; type of criterion method; demographic information; whether the actual question was presented in the manuscript; exposure and exposure dimension; industry; and the type of population being tested.
To avoid double counting non-independent validity assessments presented in the same study, and therefore to represent more accurately the distribution of validity assessments by methodological characteristics and topics, the following sources of non-independence were considered: (i) validity assessments based on the same raw data, but transformed prior to analysis (eg, information originally collected with a larger number of categories and then collapsed in the analysis stage to fewer categories); (ii) validity assessments based on the same data but with more than one statistical measure of validity presented; and (iii) validity assessments of the same instrument in a subset (or subsets) of the original population. Assessments were considered independent when the same self-report was tested against the same criterion, but with a different time lapse between the application of the reference method and the administration of the self-report. We also considered assessments as independent when the same self-report was tested with a different reference method or with a different question-response type. Otherwise, we considered assessments coming from the same manuscript as non-independent.

Relation of the comparability between self-reports and reference methods with reported validity
The subset of validity assessments reporting correlation coefficients (both Spearman and Pearson correlation coefficients) as statistical measures of validity was included in this analysis. Initially, the comparability of self-report and reference methods was evaluated separately for the exposure time period and the exposure construct. Each factor was rated by two independent raters as "corresponds", "does not correspond", or "is arguable", the last of which included cases where the correspondence was difficult to discern given the information available in the manuscript. When disagreement occurred, consensus was reached after re-examining the available information. Then, using all potential combinations of the consensus evaluations of the time period and construct correspondence, we created an overall correspondence index (table 4) with three levels: high (+), medium (+/-), and low (-). The time period was said to correspond when it was well defined in the self-report (eg, time spent sitting during the past work shift) and it matched the measurement period used by the criterion method. The time period was also said to correspond when the time period was not specified in the self-report (or was specified as an average or usual amount of time in the present job or in the past year) and the criterion method was applied (or referred) to more than two work shifts (thereby enabling a better reflection of the variability of the exposure).
The constructs were said to correspond when the units of the self-report corresponded with the units of the reference method (eg, both self-report and criterion method measured angles, time, or frequencies in the same units). Constructs were also said to correspond when self-reports asked for physical effort or physical exertion (eg, using Borg scales or other alpha-numeric scales) and the reference method used either heart-rate-based or acceleration-unit-based (eg, from a portable accelerometer) measures of exposure. The correspondence between self-report-based measures of physical effort and acceleration units was considered appropriate because it has been shown that movement detected by acceleration-based instruments is related to muscle load and, therefore, to energy expenditure (20), a measure of physical effort.
Generalized estimating equations (GEE) (PROC GENMOD, SAS v9.1, SAS Institute, Cary, NC, USA) were used to assess the relation of the level of correspondence of self-reports and criterion methods (independent variable, rated as high or low) with their measured correlation (dependent variable, rated as high or low, depending on whether the correlation was above or below the median correlation of 0.49). GEE was used to account for the "within-study dependency" of validity assessments.
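As a numerical illustration of the kind of estimate reported in the results (not the authors' actual computation, which used GEE in SAS to handle within-study clustering), a crude odds ratio with a Wald 95% confidence interval can be computed from dichotomized counts in a 2×2 table. The counts below are invented for illustration only.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald 95% CI from a 2x2 table of counts:
        a: high correspondence, correlation above the median
        b: high correspondence, correlation below the median
        c: low correspondence, correlation above the median
        d: low correspondence, correlation below the median
    Unlike GEE, this ignores the within-study dependency of assessments."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Invented counts: 20 + 10 assessments with high correspondence,
# 10 + 10 assessments with low correspondence.
estimate, lower, upper = odds_ratio_wald_ci(20, 10, 10, 10)
```

With these invented counts the crude OR is 2.0 and the interval crosses 1.0; an interval that crosses 1.0, as in the paper's OR 2.03 (95% CI 0.89–4.65), indicates a positive association that does not reach conventional statistical significance.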

Relation of heterogeneity of study population with reported validity
For this analysis, we included reviewed validity assessments that: (i) used correlation coefficients as the statistical measure of validity, and (ii) included information about the variability of the exposure of interest (ie, the standard deviation of the exposure as measured by the criterion method or, in its absence, as measured by the self-report being tested), which was thought to be a measure of the heterogeneity of the population regarding the exposure being measured. Standard deviations were not used directly to measure the variability of the exposure in the validity studies. Instead, the coefficient of variation (CV), defined as the standard deviation divided by the mean of the exposure (or the median when the mean was not reported), was used. The CV allows for the evaluation of the variability of a construct in relation to its mean; it is most useful in comparing the variability of several different samples, each with different means (21). Therefore, this measure was considered an appropriate surrogate for variability in this analysis, where multiple studies (with multiple exposures measured in a variety of populations) were aggregated. When multiple sub-groups of workers were used as the study population and individual estimates of mean and variability were presented, we pooled them to get an estimate of the CV for the entire study population.
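The pooling step is not spelled out in detail in the text; one standard way to combine subgroup summaries (size, mean, standard deviation) into a whole-population CV is sketched below. This is an assumed scheme, not necessarily the exact formula the authors used.

```python
import math

def pooled_cv(groups):
    """Estimate a whole-population coefficient of variation (CV = SD / mean)
    from subgroup summaries given as (n, mean, sd) tuples.

    The pooled variance combines within-group variability with the spread
    of the group means around the pooled mean (one plausible pooling
    scheme; the paper does not state the exact formula used)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    pooled_sd = math.sqrt((ss_within + ss_between) / (n_total - 1))
    return pooled_sd / grand_mean
```

For a single group the formula reduces to the plain CV (eg, `pooled_cv([(10, 4.0, 2.0)])` returns 0.5), and subgroups with widely spaced means drive the pooled CV up, reflecting a more heterogeneous study population.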
Again, we used GEE to conduct the association analysis while accounting for the dependency of validity assessments. The dependent variable was the reported correlation (rated as high or low, depending on whether it was above or below the median reported correlation of 0.49). The independent variable was the CV (rated as high or low, depending on whether it was above or below the median CV of 1.00).

Results
The search identified a total of 9563 articles. A first screening based on the title and abstract resulted in 126 articles being selected for in-depth review. Of these 126 studies, 40 included at least one inter-method test of
individual questions asking about mechanical demands. These 40 studies reported a total of 490 independent individual-question validity assessments (figure 2). Three separate reports follow about (i) trends and characteristics of current validity assessments; (ii) the relation between the comparability of self-reports and criterion methods and the reported validity of self-reports; and (iii) the relation between the variability of the exposure in the study population and the reported validity of self-reports.

Trends in validity testing of self-reports research
Self-administered questionnaires were the most commonly tested type of self-report (69.2% of validity assessments), followed by diaries/logs (16.8%) and interviews (13.3%). Of all evaluated self-reports, 74% were tested against observation-based methods. Direct measurement criterion methods were used in only 18.4% of all validity assessments. In 6.5% of the cases, self-reports were tested against other self-reports. Exposures in most categories were primarily tested against observation methods. Exposures in the "movement" category used observations as the criterion method in 96.5% of the assessments. In contrast, validity assessments of self-reported vibration exposures were tested mostly against direct measurement methods (figure 3). See the appendix for an inventory of validity assessments of single questions for the evaluation of mechanical demands.
Postures and movements were the most frequently evaluated mechanical demands, accounting for 55% of all validity assessments. Overall, 35% of all the tested self-reports evaluated the magnitude of the exposure alone. Another 42% of the self-reports simultaneously evaluated the magnitude and duration of the demands. However, most of the self-reports (62.5%) evaluating both the magnitude and duration dimensions tested only the duration during which an exposure was present (figure 4).
Important demographic information such as the education level, age, and gender of the study population was frequently not included in the validity assessments. For example, only one out of the 40 studies reported the education level of the study participants (14). Twelve studies presented at least one validity assessment with missing information on the mean or the distribution of the age (eg, standard deviation or range of age) of the study population (17, 19, 22–31). Lastly, five studies did not report the gender distribution of the study population (19, 22, 23, 32, 33).
The exact wording or a condensed version of the evaluated self-report was reported in 22 studies (14, 17–19, 22, 24, 25, 29–31, 34–45). Another three studies fully presented the wording of the self-report for some of their validity assessments but not for others (26, 46, 47). Five other studies reported assessments in which either the question or the response was reported, but not both (4, 27, 33, 48, 49). For the remaining nine studies, there was little or no information about the wording of the evaluated self-report (23, 28, 32, 50–55).

Relation of the comparability between self-reports and reference methods with reported validity
Of the 40 studies, 27 presented correlations as a measure of the validity of self-reports, which resulted in a total of 244 validity assessments used in this analysis (figure 2). There was a wide range of correlations between self-reports and reference methods across exposures, exposure dimensions, types of self-report, and levels of correspondence. When these 244 assessments were classified by the reported correlation level (above or below the median correlation), we found that validity assessments of exposures in the force, repetition, and movement categories tended to result more frequently in correlations below the median (100%, 85.7%, and 53.1%, respectively). Validity assessments of self-reports evaluating other exposure categories such as physical activity, posture, activity, and vibration resulted more frequently in correlations above the median (60%, 67.7%, 71.4%, and 72.7%, respectively). Of all validity assessments reporting correlations, 62% (152 assessments) tested self-reports against reference methods measuring the same period and construct as the self-report. However, another 30% (72 assessments) of these validity assessments tested self-reports against reference methods for which one or both of the evaluated time period and construct did not match the self-report; 97% of this lack of correspondence was due to low correspondence in the time period being evaluated. Therefore, we could evaluate only the relation between time period correspondence and reported correlation. Validity assessments of exposure categories such as physical activity, force, vibration, posture, and repetition were conducted frequently under conditions where the time period evaluated by the self-report was said not to correspond with the time period evaluated by the criterion method (50%, 46.7%, 36.4%, 34.4%, and 32.1%, respectively).
There was a positive association between the level of comparability of self-reports and reference methods and the reported validity of self-reports (OR 2.03, 95% CI 0.89–4.65) (table 5). There were not enough subsets of validity assessments using the same type of self-report and criterion method to be able to adjust for these variables.

Relation of heterogeneity of study population with reported validity
Of the 40 studies, 18 presented both correlations as a measure of validity and information to estimate the CV of the exposure of interest in the study population. These 18 studies resulted in 114 validity assessments that were used in this analysis (figure 2). Validity assessments of the repetition, activity, force, and posture exposure categories were conducted frequently in populations with exposure variability below the median variability of the studies included in this analysis (95%, 83.3%, 61.5%, and 45.2%, respectively).
There was a positive association between the heterogeneity of the study population with regard to the exposure of interest (as measured by the CV) and the reported validity of self-reports (OR 1.60, 95% CI 0.96–2.61) (table 5). There were not enough subsets of validity assessments using the same type of self-report and criterion method to be able to adjust for these variables.

Main findings
This study aimed to examine whether methodological aspects of validity assessments are likely to explain findings of validity testing, and to document current trends in this topic of research. Overall, our study supports the findings of Stock et al (10) that it is difficult to evaluate the validity of self-reported exposures based on currently available validity assessment studies. We provided quantitative evidence to support the claim that the methods and study population characteristics used in current validity assessments may explain, at least partially, the low-to-moderate reported validity of self-reported mechanical demands. This evidence suggests that commonly accepted views about the poor-to-moderate validity of self-reports should be reconsidered. Furthermore, reported results cannot be easily generalized to other populations, as study populations have generally not been well described.
We found that validity assessments have primarily been conducted for self-reports measuring posture duration and movement (including manual materials handling) frequency, together representing 55% of all tested exposure dimensions. Although this may simply reflect their commonality in occupational epidemiologic research, it seems reasonable that the development and testing of self-reports assessing other exposure categories and dimension combinations should be conducted before conclusions are made about the validity of self-reported mechanical demands as a whole. Also, the fact that mainly the duration and frequency dimensions of the exposure were evaluated for postures and movements, respectively, was not surprising. These dimensions capture what is traditionally more important regarding posture and movement. However, this marked distribution highlights the fact that more elaborate patterns of exposure (eg, whether the total duration of a posture during a day is achieved during a single moment of exposure or by adding multiple moments of exposure throughout the day) were not typically assessed by the self-reports included in this review.
Self-reports were validated mainly against observation-based methods of exposure assessment. The use of this type of reference method was particularly frequent for exposure categories such as movement, repetition, and posture. This may be of concern because the validity of observational methods themselves has also been questioned for the assessment of detailed postures and of rapid movements (56, 57). Therefore, in agreement with previous reviews (10), it is recommended to interpret results about the validity of self-reported information cautiously in relation to the validity of the employed reference method. Frequently, these validity studies did not report important demographic information on the study population such as age, gender and, most importantly, education, which is one likely determinant of an individual's cognitive capability to understand and respond to questions (15). Similarly, we found that validity studies often do not report the evaluated questions in full, which is essential to corroborate and evaluate previous research. Overall, the lack of this essential information makes it difficult to establish the degree to which these studies can be generalized to other situations and working populations.

[Table 5. Unadjusted association analyses of study characteristics and reported validity of self-reports (r = correlation between the self-report and the reference method; CV = coefficient of variation). Estimates were obtained using generalized estimating equations. The correlation cut-off corresponds to the median of all 244 validity assessments reporting correlations. The comparability of the time period between self-reports and reference methods was judged "arguable or hard to tell" for 18 of the 244 assessments; only the 226 assessments judged to have high or low correspondence were included in that analysis. The CV cut-off corresponds to the median CV of the 114 assessments for which this information could be estimated.]
This study's findings highlight that methodological and population differences across studies are related to the measured validity of self-reports and should, therefore, be considered when interpreting the aggregate results from multiple validity studies. We found a high frequency of mismatch between self-reports and criterion-based methods. Most often, the discrepancy corresponded to a self-report asking for average or usual exposures while the criterion method evaluated exposures for a specific time period (eg, a few hours or a work shift). We found that this mismatch was related to lower reported correlations. We also found that the higher the heterogeneity of the study population with regard to the exposure of interest, the higher the reported correlations.
Our analyses of the effect of design and study population characteristics on the measured validity of self-reports have important limitations. First, our analyses resulted in only moderate associations and were conducted for aggregated validity reports covering multiple exposures and using various types of self-reports and reference methods. Therefore, we cannot rule out explanations of the observed correlation levels in the validity tests other than the heterogeneity of the study population and the correspondence between the self-report and reference methods. Second, only studies reporting correlations could be used in these analyses. Although correlations were by far the most-used measure of validity, it remains unknown whether the methods and study characteristics of studies reporting correlations are representative of all validity studies. Furthermore, the frequent use of correlations as a measure of validity raises another issue. Correlations measure only one aspect of the validity of self-reported information, namely, what proportion of the variability of self-report estimates is due to the exposure as measured by the reference method. In this way, high correlations can be achieved as long as large values of self-report-based estimates are associated with large values of the reference method. This may occur even in the presence of systematic bias in the self-report-based estimates. Therefore, our results should be interpreted within the limitations of correlations as a measure of validity. Third, we used a qualitative method to assess the comparability between self-reports and reference methods. Therefore, our assessment of the level of comparability may be subject to misclassification; however, efforts were made to obtain independent evaluations of comparability, which is expected to reduce error in this assessment.
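The point that correlations are insensitive to systematic bias can be made concrete with a small invented example: a self-report that consistently doubles and shifts the true exposure still correlates perfectly with the reference measure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed in pure Python."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented data: a reference exposure (eg, observed hours) and a
# self-report that overstates it systematically (doubled, plus 3).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
self_report = [2.0 * x + 3.0 for x in reference]
```

Here `pearson_r(reference, self_report)` equals 1.0 even though every self-reported value is biased upward, which is why a high correlation alone does not establish agreement in absolute terms.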
Lastly, the level of comparability between self-reports and reference methods, and the variability of the exposure in the study population, were related to correlations regardless of whether they were Pearson or Spearman correlation coefficients. This was done to maintain a larger dataset. We expect little effect from this aggregation of coefficients, as they were distributed similarly across validity assessments judged to have good and low comparability between self-reports and reference methods, and across assessments with lower and higher exposure variability.
Implications for epidemiologic research on musculoskeletal disorders

Substantial research effort has been invested in assessing the validity of self-reports as a source of information on occupational mechanical exposures. Burdorf (1) noted that very few self-reports used in occupational epidemiologic research have been tested for validity. Since then, at least 40 studies have tested the validity of 490 individual questions dealing with a variety of mechanical exposures. These efforts have not been as effective as desired in firmly establishing the circumstances under which self-reports are more likely to be valid. Our review suggests that the lack of strong conclusions about the validity of self-reported mechanical demands is largely due to two factors: (i) aggregating information from multiple observational studies is difficult, and (ii) currently employed field validity assessment methods are not optimal for learning about this type of self-reported information. Although a more radical solution could be to avoid the use of self-reported information on mechanical demands, this seems impractical both due to the cost of not using it and the difficulty of accomplishing this under certain conditions (58). Researchers may agree that, instead, self-reported information should be confined to certain pieces of information required for occupational exposure assessment (9). However, the question remains: to what parts of the exposure assessment process should self-reports be confined (59)? To answer this question, we propose a two-pronged strategy: (i) the use of agreed-upon minimum standards for both testing and results reporting in field validity testing, which would allow for better use of this type of research; and (ii) the use of more controlled conditions that would allow for a more systematic understanding of self-reported information.
The use of field validity testing is to some extent unavoidable, as researchers are compelled to validate their tools and methods in a representative sample of the population in which they expect to apply these instruments. Therefore, the validation study is used to explore the magnitude and direction of the bias that would be introduced into the association between the exposure of interest and musculoskeletal disorders; however, this information is typically not used to calibrate or formally correct the measurement error in the full population of interest. Although the results from observational validity studies are most applicable to the specific study population in which they are conducted, these studies can help stimulate important improvements by reporting complete information about factors that may influence the measured validity of self-reports.
To enable a proper meta-analysis of aggregate results, we propose that validity assessments of self-reports report, at a minimum, the following information:
· the exact wording of the body and response scale of the questions used;
· the context of the question (eg, how many and what other questions were asked, time allowed, whether additional aid was available);
· the distribution of the exposure of interest in the population, as measured with both the criterion method and the self-report measure;
· appropriate measures of validity other than (or in addition to) correlations, such as mean differences, regression parameters, and misclassification matrices;
· confidence intervals or other appropriate measures of variability of the estimates of validity;
· demographic characteristics of the evaluated population (age, gender, education, health status);
· the distribution of measurement errors as well as statistical measures of validity in relevant subgroups of the study population (eg, occupation, age, gender, musculoskeletal disorder status groups);
· key words in the title or abstract, such as "intermethod reliability" or "validity"; and
· standard additional information for scientific publications.
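As a minimal sketch of how two of the complementary measures listed above (a mean difference and a misclassification matrix) could be computed from paired data, consider the following; all values, the variable names, and the 3-hour dichotomization cut-off are hypothetical and purely illustrative.

```python
from statistics import mean

# Hypothetical paired measurements: criterion method vs. self-report
# (eg, hours per shift spent with the arms elevated)
criterion   = [1.0, 2.5, 4.0, 5.5, 7.0, 2.0]
self_report = [2.0, 3.5, 5.0, 5.0, 8.0, 1.5]

# Mean difference: the systematic bias that a correlation alone would hide
mean_diff = mean(s - c for s, c in zip(self_report, criterion))

# 2x2 misclassification matrix after dichotomizing at an illustrative
# cut-off of 3 hours; rows index the criterion, columns the self-report
CUTOFF = 3.0
matrix = [[0, 0], [0, 0]]
for c, s in zip(criterion, self_report):
    matrix[int(c >= CUTOFF)][int(s >= CUTOFF)] += 1

# Sensitivity and specificity of the self-report against the criterion
sensitivity = matrix[1][1] / (matrix[1][0] + matrix[1][1])
specificity = matrix[0][0] / (matrix[0][0] + matrix[0][1])
print(mean_diff, matrix, sensitivity, specificity)
```

Reporting the matrix alongside the correlation lets later readers judge how much exposure misclassification the self-report would introduce at a given cut-off, which the correlation alone cannot convey.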
We further propose to complement observational field-based validity information with experimental research in order to investigate explanatory mechanisms for validity findings. Experimental research would enable a more appropriate adjustment for exposure distribution and workers' differences; it would also allow the use of more accurate reference methods. Finally, it would enable investigators to systematically test the effects of question content, format, and administration. This in turn would facilitate an understanding of the relation between criterion-measured and self-reported mechanical exposures under different work conditions and working populations, and refine methods and instruments for data collection.
For a long time, experimental testing has advanced our knowledge about the relationship between workload and perceived physical exertion (60, 61). This has permitted researchers to gain better insight into what reported working demands really mean (62, 63). Recently, two studies have assessed potential determinants of the accuracy of self-reported repetitive work and self-reported task durations (58, 64). Petersson et al (64) showed that the duration and repetition of tasks can affect individuals' accuracy; more specifically, they reported that while very short tasks (2.5 minutes) can be overestimated by 100%, longer tasks (37.5 minutes) can be estimated reasonably well. In a more recent study using a working population, Barrero et al (58) demonstrated that task-related factors such as the work pattern (whether individuals execute tasks continuously or are intermittently interrupted by other tasks) and the physical and cognitive demands of the tasks can affect the accuracy of self-reported task durations, for example by up to 30% in tasks lasting 40 minutes. In contrast to validity studies for specific populations, results from systematic testing can potentially be used across multiple populations to correct bias and interpret the self-reported physical exposure information gathered (65).
In conclusion, the use of self-reported mechanical demands in occupational epidemiologic research requires further and better validity testing. We believe the full potential of self-reports in occupational epidemiologic research is still to be discovered.