Prediction of objectively measured physical activity and sedentariness among blue-collar workers using survey questionnaires

Prediction of objectively measured physical activity and sedentariness among blue-collar workers using survey questionnaires. Scand J Work Environ Health. 2016;42(3):237–245.
Objectives We aimed at developing and evaluating statistical models predicting objectively measured occupational time spent sedentary or in physical activity from self-reported information available in large epidemiological studies and surveys.
Methods Two-hundred-and-fourteen blue-collar workers responded to a questionnaire containing information about personal and work-related variables, available in most large epidemiological studies and surveys. Workers also wore accelerometers for 1–4 days measuring time spent sedentary and in physical activity, defined as non-sedentary time. Least-squares linear regression models were developed, predicting objectively measured exposures from selected predictors in the questionnaire.
Results A full prediction model based on age, gender, body mass index, job group, self-reported occupational physical activity (OPA), and self-reported occupational sedentary time (OST) explained 63% (adjusted R²) of the variance of both objectively measured time spent sedentary and in physical activity since these two exposures were complementary. Single-predictor models based only on self-reported information about either OPA or OST explained 21% and 38%, respectively, of the variance of the objectively measured exposures. Internal validation using bootstrapping suggested that the full and single-predictor models would show almost the same performance in new datasets as in that used for modelling.
Conclusions

Even in modern information societies, a considerable proportion of the working population is exposed to physical activity at work (1,2). In a national survey in 2012, 39% of the Danish workforce reported having a job where ≥75% of the time required some physical activity, such as standing and walking (3). More self-reported time spent in physical activity during work has been associated with increased risk of long-term sickness absence (2,4), premature drop-out from the workforce (5), and cardiovascular and all-cause mortality (1,6,7). On the other hand, other workers spend a large proportion of the time at work being sedentary (8–10), which has been suggested to be associated with increased all-cause mortality (11), musculoskeletal pain (12), and obesity (13).
Occupational time spent sedentary and in physical activity has so far mainly been determined using questionnaires, which are feasible to administer in large populations, such as in national surveys (14–16). However, questionnaires have been criticized for giving biased and imprecise results compared to objective measurements (17). Systematic and random measurement errors may lead to misleading results, both when documenting time spent sedentary and in physical activity and when determining associations with relevant outcomes such as health and well-being. As an alternative, objective measurements using accelerometers offer accurate information on time spent sedentary and in physical activity (18,19). Thus, accelerometer recordings have been used as the gold standard for validating questionnaire-based data on time spent sedentary and in physical activity (20,21). However, accelerometers demand more resources than questionnaires (22), which disqualifies them from most large-scale studies.
An attractive compromise would be to predict objectively measured occupational time spent sedentary and in physical activity from self-reported information that would generally be available in most large epidemiological studies and surveys. Explicit prediction models have been proposed before to predict time spent sedentary and in physical activity (23–25), but these studies have not developed models for exposures at work, which may relate to self-reported predictors differently than leisure-time exposures do. A few previous studies have, indeed, developed prediction models for time spent sedentary and in physical activity specifically at work (26–30). However, they have mainly focused on predicting one self-reported variable from other self-reported information. This approach increases the risk of correlated error or common-method bias (31).
Another limitation of previous prediction models addressing time spent sedentary and in physical activity at work is that the predictors included in the models, such as cognitive (32) and psychosocial variables, including social norms, self-efficacy and advantages of sitting less (26), are not normally available in large epidemiological studies and surveys. Developing models based on predictors that typically appear in large epidemiological studies and surveys would increase the utility of the models in the context of, for instance, public health surveys and cohort studies of occupational health.
As a general endeavor in exposure modelling, examination of simple models based on few predictors is of interest, since parsimonious models may be easier to use and more stable than models based on many predictors. In the present context, this would call for assessments of the performance of models based only on selected questionnaire variables that can be expected to be particularly predictive of sedentary behavior and physical activity at work. Thus, this study aimed at developing and evaluating statistical models predicting objectively measured time spent sedentary and in physical activity at work from self-reported variables which would generally be available in large epidemiological studies and surveys. A secondary aim was to examine the extent to which single-predictor models based on questions regarding occupational sedentary time (OST) or occupational physical activity (OPA) can predict the result of objective measurements of corresponding variables.

Study design and population
Recruitment flow of the study population is shown in Appendix A (www.sjweh.fi/data_repository.php). Workers were recruited from seven blue-collar occupations (manufacturing, assembling, construction, cleaning, garbage collection, mobile plant operation, and health services) in the cross-sectional New Method for Objective Measurements of Physical Activity in Daily Living (NOMAD) study (33) to obtain a wide range of exposures while maintaining homogeneity among workers with respect to socioeconomic status.
The Ethics Committee for the Capital Region in Denmark approved the study (journal number H-2-2011-047), which was conducted in accordance with the Helsinki declaration.

Procedure
At each workplace, data were collected continuously during a four-day period, with research staff being present at the workplace on days one and four (33,34). On day one, workers interested in participating in the study underwent anthropometric measurements and completed questionnaires addressing variables related to demographics, health, lifestyle, work and psychosocial factors. Also on day one, objective measurements of time spent sedentary and in physical activity at work were initiated by equipping the workers with two accelerometers (Actigraph GT3X, ActiGraph LLC, Florida, USA) and a diary for noting working hours. On day four, workers returned the measurement equipment and the diary. Approximately 80% of the workers declared that the objective measurements were collected during typical working days.

Occupational time spent sedentary and in physical activity
Sedentary behavior and physical activity were analyzed using the custom-made Acti4 software according to established procedures (34,35). The software identifies a number of different activity types, as well as the gross body posture. For the present study, we merged periods of sitting and lying into "sedentary time" and collapsed periods with any type of physical activity into one category, ie, "physical activity". Thus, physical activity is defined to occur whenever the worker is not sitting or lying.
Time spent sedentary and in physical activity was averaged for each specific worker across all working periods with valid measurement data. A working period was considered valid if it comprised ≥4 hours of work and corresponded to ≥75% of that individual's self-reported average working time per day. Workers with at least one valid work day were included in further analyses.
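The validity rule and per-worker averaging described above can be illustrated with a minimal sketch. The names `WorkPeriod`, `is_valid` and `worker_mean_sedentary` are hypothetical, and the unweighted mean across valid periods is an assumption; the text does not state whether periods were weighted by duration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class WorkPeriod:
    hours_worked: float   # length of the work period according to the diary (hours)
    sedentary_pct: float  # share of the period spent sitting or lying (% of time)


def is_valid(period: WorkPeriod, self_reported_daily_hours: float) -> bool:
    """Valid if >= 4 h of work and >= 75% of the self-reported average daily working time."""
    return (period.hours_worked >= 4.0
            and period.hours_worked >= 0.75 * self_reported_daily_hours)


def worker_mean_sedentary(periods: List[WorkPeriod],
                          self_reported_daily_hours: float) -> Optional[float]:
    """Average sedentary % over valid periods; None means the worker is excluded."""
    valid = [p for p in periods if is_valid(p, self_reported_daily_hours)]
    if not valid:
        return None
    # Assumption: simple unweighted mean across valid periods.
    return mean(p.sedentary_pct for p in valid)


# Illustrative (made-up) periods for one worker.
example_periods = [WorkPeriod(8.0, 40.0), WorkPeriod(3.0, 90.0), WorkPeriod(5.0, 20.0)]
print(worker_mean_sedentary(example_periods, self_reported_daily_hours=8.0))
```

With a self-reported 8-hour day, only the first period passes both criteria in this made-up example; physical activity time then follows as 100% minus the sedentary percentage.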

Predictors
The predictors used for modelling in this study were selected a priori from the questionnaire based on (i) whether they would likely predict time spent sedentary or in physical activity according to previous studies (26,29,30,32,36–38), (ii) whether they are commonly available in large epidemiological studies and surveys, and (iii) whether they showed a large relative dispersion between workers in our material. Based on these criteria, we arrived at including self-reported information on age, gender, body mass index (BMI), job type, OST, and OPA. These predictors are described in detail in Appendix B (www.sjweh.fi/data_repository.php). Selecting predictors a priori without knowing their relationship to the outcomes is a recommended approach in modern statistical literature as it reduces the risk of capitalizing on chance and arriving at spurious relationships between predictors and outcomes (39–41).

Statistical analyses
All predictors were treated as continuous variables except for gender, job type, and OPA, which were treated as categorical variables. Statistical operations were performed using the R software package "rms" (42).
The six predictors (cf. Appendix B) were modeled together against each objectively measured exposure using least-squares linear regression analyses to develop a full prediction model. The available degrees of freedom for statistical analysis in this study were sufficient to allow inclusion of all six variables.
The sensitivity of the full model to selection of predictors was analyzed by running models removing one predictor at a time and observing the resulting change in the explained variance.
In addition, single-predictor models were developed based only on those predictors that directly focused on sedentary behavior and/or physical activity. Thus, two models were developed using least-squares linear regression analyses, one based on self-reported information on OST (variable #5 in Appendix B) and another based on self-reported OPA (variable #6 in Appendix B). The performance of the resulting full and single-predictor models was evaluated by the adjusted R² and the mean squared error (MSE) of estimation. The residuals of the models were examined for normal distribution and homoscedasticity.
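The model fitting, the two performance measures, and the remove-one-predictor sensitivity check can be sketched as follows. This is an illustration in Python with NumPy, not the authors' R/"rms" code. The function names and the synthetic data are hypothetical, each predictor is assumed to occupy a single numeric column (categorical predictors would need one dummy column per category), and MSE is assumed to be the mean of squared residuals divided by n.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def adjusted_r2(y, y_hat, n_terms):
    """Explained variance adjusted for the number of terms (excluding the intercept)."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_terms - 1))


def mse(y, y_hat):
    # Assumption: residual sum of squares divided by n.
    return float(np.mean((y - y_hat) ** 2))


def drop_one_sensitivity(X, y, names):
    """Decrease in adjusted R² when each predictor column is removed in turn."""
    full = adjusted_r2(y, predict(fit_ols(X, y), X), X.shape[1])
    changes = {}
    for j, name in enumerate(names):
        X_red = np.delete(X, j, axis=1)
        reduced = adjusted_r2(y, predict(fit_ols(X_red, y), X_red), X_red.shape[1])
        changes[name] = full - reduced
    return changes


# Synthetic stand-in data (NOT the study data): 214 workers, 3 numeric predictors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(214, 3))
y_demo = 40 + 10 * X_demo[:, 0] + 2 * X_demo[:, 1] + rng.normal(scale=5, size=214)
beta_demo = fit_ols(X_demo, y_demo)
print(adjusted_r2(y_demo, predict(beta_demo, X_demo), 3), mse(y_demo, predict(beta_demo, X_demo)))
print(drop_one_sensitivity(X_demo, y_demo, ["x0", "x1", "x2"]))
```

A single-predictor model is the same fit with only one column in `X`.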
The expected ability of the full and single-predictor models in predicting objectively measured exposures in new datasets was estimated using a bootstrap resampling procedure (43). Five-hundred bootstrapped virtual datasets were drawn with replacement from the source population and were of the same size. For each virtual dataset, we fitted a model including the same predictors as in the original model. This re-fitted model was then applied to the source dataset, and the model fit parameters were compared to the parameters obtained in the original fit.
The differences, ie, the "optimism" of the original model, were averaged across all bootstrap repeats and used as an overall measure of optimism, reflecting the extent to which the original model capitalized on chance (44).
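A bootstrap estimate of optimism can be sketched as below. This is a hedged illustration, not the authors' R code: it implements a common optimism-bootstrap variant in which each re-fitted model's R² in its own bootstrap sample is compared with its R² when applied to the source data, whereas the text above describes a comparison with the original fit. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def r2(beta, X, y):
    """Explained variance of a fixed model `beta` evaluated on data (X, y)."""
    y_hat = beta[0] + X @ beta[1:]
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def bootstrap_optimism(X, y, n_boot=500, seed=0):
    """Mean of (R² in the bootstrap sample) minus (R² of the same model on source data)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)     # draw n rows with replacement
        beta_b = fit_ols(X[idx], y[idx])     # re-fit with the same predictors
        apparent = r2(beta_b, X[idx], y[idx])
        on_source = r2(beta_b, X, y)         # apply the re-fitted model to the source data
        diffs.append(apparent - on_source)
    return float(np.mean(diffs))


# Synthetic stand-in data (NOT the study data).
rng_demo = np.random.default_rng(42)
X_demo = rng_demo.normal(size=(214, 6))
y_demo = 40 + 8 * X_demo[:, 0] + rng_demo.normal(scale=10, size=214)
apparent_r2 = r2(fit_ols(X_demo, y_demo), X_demo, y_demo)
optimism = bootstrap_optimism(X_demo, y_demo)
print(apparent_r2, optimism, apparent_r2 - optimism)
```

Subtracting the averaged optimism from the apparent R² gives an estimate of the performance to be expected in new datasets.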

Results
Descriptives of the participating workers are shown in table 1. Of the 214 workers, most were engaged in manufacturing (27%) and least in mobile plant operations (5%).
In total, 4357 hours of Actigraph measurements were collected. On average, 38.1% of the working time was spent sedentary and the remaining in physical activity (table 2).
Table 3 shows the resulting coefficients of both the full and the single-predictor models estimating objectively measured sedentary time at work. Since time spent sedentary and in physical activity as defined in the present study are complementary variables (ie, they add up to 100% of the working time), we focus on presenting, in detail, only the results of modelling sedentary time at work. Results pertaining to the prediction model for time spent in physical activity are presented in Appendix C (www.sjweh.fi/data_repository.php). The full model explained 63% of the variance. Older age and higher BMI were found to be significant predictors of less objectively measured sedentary time at work. Additionally, workers who reported more OPA (answer categories 2–4 of the single four-graded question on OPA, variable #6, cf. Appendix B) were exposed to less objectively measured sedentary time at work compared to those reporting that their work was mostly sedentary and did not require strenuous physical activity (category 1 of the OPA question). Assemblers, garbage collectors and manufacturing workers had markedly less sedentary time at work than mobile plant operators. When predictors were removed from the models one at a time, explained variance decreased by between 1% (gender) and 18% (job group; cf. table 3).
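The reason why the two complementary outcomes yield models with identical performance can be written out briefly. The notation below is introduced here for illustration only: y^SED and y^PA denote the percentages of working time spent sedentary and in physical activity, x_ij the predictors, and e the residuals of a least-squares fit that includes an intercept.

```latex
\begin{align*}
y^{\mathrm{PA}}_i &= 100 - y^{\mathrm{SED}}_i, \\
\hat{y}^{\mathrm{SED}}_i = \hat\beta_0 + \sum_j \hat\beta_j x_{ij}
  \;&\Longrightarrow\;
  \hat{y}^{\mathrm{PA}}_i = (100 - \hat\beta_0) + \sum_j (-\hat\beta_j)\, x_{ij}, \\
e^{\mathrm{PA}}_i = y^{\mathrm{PA}}_i - \hat{y}^{\mathrm{PA}}_i &= -\,e^{\mathrm{SED}}_i, \\
\sum_i \bigl(e^{\mathrm{PA}}_i\bigr)^2 = \sum_i \bigl(e^{\mathrm{SED}}_i\bigr)^2,
  \qquad
  \sum_i \bigl(y^{\mathrm{PA}}_i - \bar{y}^{\mathrm{PA}}\bigr)^2
  &= \sum_i \bigl(y^{\mathrm{SED}}_i - \bar{y}^{\mathrm{SED}}\bigr)^2 .
\end{align*}
```

Because both the residual and the total sums of squares are unchanged, R², adjusted R² and MSE are the same for the two models; only the intercept and the signs of the coefficients differ.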
The developed full model (table 3) can be used to predict sedentary time at work for any worker characterized by some specific combination of the predictor variables in the model. For example, a 53-year-old female assembler, with a BMI of 29.1 kg/m², who has reported "mostly sedentary work that does not require strenuous physical activity" (response category 1 of the OPA question), and stated her sedentary time to be "almost all the time", is predicted by the model to be sedentary 67.8% of her time during working hours according to objective measurement.
Figure 1 shows predicted versus objectively measured values for time spent sedentary and in physical activity at work.
The single-predictor model predicting objectively measured sedentary time at work only from self-reported OPA showed an explained variance (adjusted R²) of 21% (MSE=260.0; table 3). The other single-predictor model, based on the worker's self-reported OST, explained 38% of the variance of objectively measured sedentary time at work (MSE=212.4; table 3).

Internal validation using bootstrapping
The bootstrap validation of the full model revealed an optimism of 5% in explained variance (R²). When models developed from the virtual bootstrap datasets were applied to the source data, MSE was 136.1, compared to 117.0 in the original fit (cf. table 3). The bootstrap validation of the single-predictor models based on self-reported answers on OPA or OST revealed an optimism of 2% and 0%, respectively, in explained variance when predicting objectively measured sedentary time at work. The corresponding MSE were 268.5 (original fit 260.0) and 215.2 (original fit 212.4).
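To make the reported MSE values easier to interpret, they can be converted to root mean squared errors. The sketch below assumes that the outcome is expressed as a percentage of working time, so that the square root of the MSE is in percentage points; the dictionary and function names are illustrative only.

```python
import math

# MSE values quoted in the text: (original fit, bootstrap-validated).
reported_mse = {
    "full model": (117.0, 136.1),
    "OPA only": (260.0, 268.5),
    "OST only": (212.4, 215.2),
}


def rmse(mse_value: float) -> float:
    """Root mean squared error; in percentage points if the outcome is % of working time."""
    return math.sqrt(mse_value)


for model, (original, validated) in reported_mse.items():
    print(f"{model}: RMSE {rmse(original):.1f} (original fit), {rmse(validated):.1f} (bootstrap)")
```

Under that assumption, the typical prediction error of the full model is roughly 11 to 12 percentage points of working time, against roughly 15 to 16 for the single-predictor models.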

Discussion
This study aimed at developing and evaluating prediction models for estimating time spent sedentary or in physical activity during working hours from self-reported variables that would normally be available in large epidemiological studies and surveys. Among blue-collar workers, a full model explained 63% (adjusted R²) of the variance of objectively measured time spent sedentary, and since we defined physical activity to occur whenever the worker was not sedentary, our model for predicting physical activity had the same performance. Bootstrap validation suggested that this performance was somewhat optimistic; the expected adjusted R² when using the model in new datasets would be 57%. Single-predictor models based only on self-reported OPA or OST explained 21% and 38%, respectively, of the variance in objectively measured exposures. The single-predictor models were very stable according to the bootstrap validation.
This study is novel in predicting objectively measured time spent sedentary or in physical activity at work from self-reported information that would normally be available in large epidemiological studies and surveys not specifically designed to predict these exposures. The performance of the full model based on age, job group, BMI, and self-reported OST and OPA was similar to the best performances of previous questionnaires on occupational sedentary and physical activity (45–47). The performance of our model was even better than previously developed models using a customized set of variables to predict self-reported OST or OPA (26,29,30). Also, these previous models produce estimates of self-reported OST or OPA, and thus do not adjust for the bias present in these self-reports, relative to objectively measured data. When used in investigations on the effects of sedentary behavior or physical activity on different health outcomes, these models may therefore lead to biased associations and misleading estimates, eg, of the health effects of being sedentary for a particular proportion of the working day. In producing estimates of objectively measured exposure, our models avoid this fallacy.
In our full model, age was observed to be negatively associated with sedentary time at work, which corroborates many previous studies (26,30,32). BMI also tended to be negatively associated with time spent sedentary, and this contradicts previous studies (26,29). However, these studies were conducted on the general working population, and the models predicted self-reported OST, not objectively measured time as in our study (26,29). More studies are needed to verify the observed negative association of BMI with objectively measured sedentary time at work, and to disclose whether this association is specific to blue-collar workers.
The predictor "job group" contributed substantially to explaining variance in objective exposure, as shown by the considerable decrease in R² when removing this particular predictor from the full model (table 3). Job group has previously proven to be an important predictor of sedentary behavior and physical activity (26). However, this previous study categorized participants mainly as white- or blue-collar workers and did not use a detailed occupational classification. Thus, our finding that even job group within the segment of blue-collar workers appears to be an important predictor of sedentary time and physical activity is novel, and encourages further research into the performance of this predictor even for other blue-collar jobs.
As expected, workers reporting more OST were also, on average, more sedentary according to objective measurements, as predicted by our full model. Similarly, workers reporting more OPA (categories 2–4 of the OPA question) generally had less objectively measured sedentary time at work, compared to those reporting that their work was mostly sedentary and did not require strenuous physical activity (category 1 of the OPA question). These results indicate that self-reported OST and OPA do have some potential to predict objective exposures. We did, indeed, find that a fair prediction of objectively measured sedentary time at work could be obtained even using single-predictor models based on a single question about either OST or OPA (cf. table 3). These results point to a similar or better predictive ability of these self-reported data than found in previous studies addressing sedentary behavior and physical activity (48,49). Still, answers to these questions cannot explain the major part of the variance of the corresponding objective measures. For the single item measuring OST, one explanation may be that response categories are defined using ambiguous terms such as "almost" and "rarely/very little", which workers may find difficult to interpret. Answers to the OPA question could only explain 21% of the variance in objectively measured exposures. One reason may be that the four response categories of the OPA question do not fully cover the range of physical activities occurring during work. For example, some workers may have a job comprising considerable sedentary time, but also occasional periods of high physical load; mixed exposures are not clearly captured by any specific response alternative to the OPA question. Notably, this ambiguity may be present even though we slightly modified the answer categories for the OPA question compared to its original form (50) so as to pursue better clarity. For example, response category 1 in the original version ("predominantly sitting") uses the term "sitting" without further specification. This may lead to confusion among workers responding to this question, and the modified OPA answer attempts to be more specific in saying, "Mostly sedentary work that does not require strenuous physical activity". In response category 4, the term "heavy manual work" used for the original OPA question was replaced with "heavy or fast moving work that is physically strenuous", which we believe is more explanatory. Due to these slight discrepancies, our prediction models may perform differently if they are applied to answers on the original OPA question.
For a prediction model to be useful, it should not only perform well for the dataset on which it was developed, but also for new datasets. We could not perform a genuine external validation of our prediction models since a new similar dataset was not available. Lacking new data for testing the models' performance, we used an alternative validation technique, ie, internal bootstrap validation, which has been recommended in statistical literature on modelling (40,44,51). The bootstrap validation suggested a 5% optimism in variance explained by the full models predicting time spent sedentary and in physical activity at work. Thus, a considerable proportion of exposure variance could be explained by the full models even after adjusting for optimism. We therefore consider our models to be useful in other datasets, even when taking into consideration that bootstrap validation may result in a too-positive impression of model performance since it inherently reflects the structure of the source data. Also, as our objective measurements were collected for a limited number of days, which leads to uncertainty in the resulting "true" mean exposure estimates, we would expect our models to perform even better in predicting mean exposures for longer periods of time, for which sampling variance will be less pronounced.
Our definition and operationalization of sedentary time is consistent with the majority of previous research on sedentary behavior (52,53). However, defining physical activity as being strictly complementary to sedentary time (ie, equivalent to "non-sedentary time") deviates from some previous studies (54,55). This reflects an inconsistency in the literature regarding how to define physical activity. For example, Smith, Hamer (55) defined physical activity on the basis of a step count while Buman, Hekler (54) used thresholds based on accelerometer counts. Due to this discrepancy in the operational definition of physical activity, we emphasize that our models are valid only when adopting equivalent definitions of physical activity.

Practical implication of the findings and future research
Our full prediction models will be particularly useful in future studies in which data can be collected on all predictors. The models may also, with some hesitation, be used to obtain post-hoc estimates of objectively measured time spent sedentary and in physical activity at work in existing studies if data on the predictor variables included in our models are available. Danish national surveys of working life regularly collect self-reported information on all predictor variables used in our model. In the Danish national survey on work and health conducted in 2010 (56), data were collected on age, BMI, gender, OST, OPA, and job group using similar questions as in the present study. Thus, we believe that our full models have the potential to be used in a retrospective reconsideration of data on exposures and occupational risks, at least in Danish national surveys. We emphasize that the prediction models developed in the present study are specific to the source population of blue-collar workers, and we recommend future studies to test our models in other occupations and, if necessary, develop adjusted models that fit those occupations better. We also encourage studies testing our models for use in the general population.
Additional predictors that we did not include in our models may have the ability to predict sedentary time and physical activity, for instance psychosocial variables (26). Thus, we encourage investigations into the contents of other large-scale survey data materials to disclose whether they do, indeed, include the predictors used in our models (or subsets thereof) and/or additional potential predictors, for instance psychosocial variables. To this end, variables that may strongly depend on the character of the job, psychosocial factors being one example, should be included with caution, due to their potentially strong correlation with occupation; and a possible predictive ability should be interpreted with this correlation in mind.
We encourage efforts to develop new, short, yet reliable questionnaires assessing occupational sedentary behavior and physical activity, and verify if they can predict "true" exposures better than the self-reported OPA or OST used in our study. If so, they could be attractive for future large-scale population surveys. In this line of development, we encourage paying consideration to the trade-off between resources required to collect predictors and the eventual performance of the model. We did not in the present study compare costs associated with collecting exposure data directly by objective measurements versus costs of developing prediction models using questionnaire data, but we emphasize that a major rationale for modeling exposures in future studies would be that they deliver more exposure data at a lower cost than objective measurements, and thus represent a favorable trade-off between cost and performance (57,58). We also emphasize the need for validating the performance of any future model in new datasets, either using a genuinely new sample or by internal bootstrap validation techniques; this has very rarely been attempted in previous modelling studies (41).

Concluding remarks
This study showed that full prediction models based on self-reported information that would normally be available in large epidemiological studies and surveys could predict objectively measured time spent sedentary and in physical activity at work with a reasonably good accuracy. Internal validation of the prediction models indicated that performance would decrease slightly if they were used in new datasets from the studied occupations, but that they may still be useful for other populations of blue-collar workers, with due caution. Single-predictor models using information specifically addressing sedentary behavior and physical activity showed lower performance than full models, but still offered an attractive opportunity to predict objectively measured exposures. We suggest that our models may be used to revisit previous studies based only on self-reports, as well as an inspiration for designing future large epidemiological studies and surveys addressing sedentary behavior and physical activity on the basis of self-reported information.

Figure 1. Objectively measured versus predicted occupational time spent sedentary (A; illustrating the full model in table 3) and in physical activity (B; illustrating the complementary full model in Appendix C). Line of identity is included in the diagrams.

Table 1.
Descriptive
a Mostly sedentary work that does not require strenuous physical activity.
b Mostly work while standing or walking but not requiring strenuous physical activity.
c Work while standing or walking with some lifting and carrying.
d Heavy or fast moving work that is physically strenuous.

Table 2. Descriptives of objectively measured time spent sedentary and in physical activity at work among the studied 214 blue-collar workers. [Min=minimum; Max=maximum; SD=standard deviation]
b Calculated using a standard one-way random effects model.

Table 3. Models predicting objectively measured sedentary time at work on the basis of answers to questions normally available in large epidemiological studies and surveys. [B=regression coefficient; 95% CI=95% confidence interval; R² adj=coefficient of determination (explained variance) adjusted for the number of terms in the model; OPA=occupational physical activity; OST=occupational sedentary time; MSE=mean squared error]
c Work while standing or walking with some lifting and carrying.
d Heavy or fast moving work that is physically strenuous.