Cross-sectional evaluation of an internet-based hearing screening test in an occupational setting.

Objectives: The Occupational Earcheck (OEC) is an internet-based test to detect high-frequency hearing loss for the purposes of occupational hearing screening. In this study, we evaluated the OEC in an occupational setting in order to assess test sensitivity, specificity, and validity.
Methods: A cross-sectional study was conducted in 2015, in which the optimized OEC was evaluated on 94 employees from the army and three different companies in construction and manufacturing. Subjects underwent OEC in an office-like room. Pure-tone air conduction audiometry was performed as a reference test. The OEC was repeated for a subset of subjects (N=19). Important test characteristics (ie, sensitivity and specificity, test validity, and test-retest reliability) were assessed.
Results: When analyzed on the individual level, the sensitivity and specificity of OEC were 90% and 77%, respectively. The speech reception threshold results correlated strongly with the pure-tone average of the frequencies 3, 4 and 6 kHz, reflecting good test validity (r=0.79). The difference between test and retest was not significant. The intra-class correlation coefficient was moderate (ICC=0.57), indicating a reasonable agreement between test and retest.
Conclusions: The OEC appears to be a suitable test for the detection of high-frequency hearing loss among noise-exposed employees, with good sensitivity and specificity values, even when performed in a semi-controlled occupational setting, though a possible learning effect should be taken into account.

High-frequency hearing loss (HFHL) caused by excessive exposure to noise in the workplace (also known as noise-induced hearing loss, NIHL) is one of the most commonly reported occupational illnesses in the Netherlands (1). Various primary preventive measures for occupational HFHL exist, from interventions to control noise at the source, to the use of personal hearing protection devices. Primary preventive measures are not always effective (2). For this reason, secondary prevention of HFHL by screening employees exposed to noise becomes important. Early identification of HFHL may prompt actions to prevent progression of the hearing loss (3).
In many European countries, including the Netherlands, professional associations recommend that employees who are exposed to noise levels greater than a time-weighted average of 80 dB(A) be provided with a periodic audiometric evaluation (4). This evaluation should be offered annually in order to monitor the employees' hearing abilities closely. In practice, however, audiometric evaluation is incorporated into the preventive occupational health examinations, which are not offered this frequently. Moreover, participation rates among the employees are often low (5).
The traditional approach for occupational hearing evaluation is pure-tone air conduction audiometry. Though pure-tone air conduction audiometry is the reference standard in clinical assessments, it is a costly and time-consuming method for screening. Hearing threshold assessment for both ears may take 15 minutes, depending on the tester's and participant's experience and motivation, and the number of frequencies measured. Moreover, obtaining reliable pure-tone hearing thresholds in an occupational setting is challenging. Pure-tone thresholds are subject to variability due to tester, participant, and environmental factors, but test procedure and equipment also play a role (6,7).
Online speech-in-noise testing (for the measurement of auditory speech recognition abilities in noise) promises to be a valuable alternative tool for hearing screening. It is easily accessible, low cost, and broadly applicable (8-13). It allows hearing assessment of at-risk employees in a remote setting, as it does not require specialized and costly technical equipment, and therefore facilitates more frequent hearing assessments (14). The test measures the speech reception threshold (SRT), a measure of the ability to understand speech in noise. The SRT is defined as the critical signal-to-noise ratio (SNR) necessary for a person to recognize 50% of speech material correctly.
Several online tests have been developed for the Dutch language. The first test was a digit triplet test: the National Hearing Test (9). Commissioned by the Netherlands Hearing Health Foundation, the department of Audiology of the Leiden University Medical Center developed the Occupational Earcheck (OEC), which is based on similar principles. It was specifically developed to monitor the hearing ability of employees in noisy occupations and raise awareness of the damaging effects of noise on hearing. The test is designed to be very precise, as it tests both ears monaurally. The OEC was optimized and validated in a well-controlled laboratory setting at our department and showed a sensitivity of 93% and a specificity of 94% for the detection of HFHL (Sheikh Rashid M, Leensen MC, de Laat JA, Dreschler WA. Evaluation of an optimized internet-based speech-in-noise test for occupational noise-induced hearing loss screening: Occupational Earcheck. Submitted for publication). However, the test should also be evaluated in a noise-exposed population in an occupational environment in order to assess whether it is appropriate for screening purposes. In this study, we evaluated the OEC further in an unselected sample of noise-exposed subjects and in more realistic occupational settings than the laboratory environment. Our main objective was to evaluate whether the improved OEC is a valid and reliable screening test to detect HFHL in a high-risk population.

Study population
The study participants were recruited from the army and three different companies in construction and manufacturing. With consent of the company management, information letters were sent to employees of several noisy departments in the companies and the army. In total, 102 employees volunteered to participate. Participants were adults (≥18 years) and Dutch speakers. The medical ethics committee of the University of Amsterdam approved the study protocol (number 2013_231). Informed consent was obtained for all subjects.

Measurement procedure
A cross-sectional study was carried out in 2015. The index test (OEC) and the reference test (pure-tone air conduction audiometry) were performed in a single test session during which the subject's demographic details (including gender, age, and occupational noise exposure) were also collected by means of a short questionnaire. The question concerning occupational noise exposure was: "How many days a week do you work in noise [noise is defined as sound levels >80 dB(A), or when talking with a raised voice at a distance of 1 m is required]?" The measurements were performed at five representative occupational test locations, in quiet office-like rooms. One of the companies had multiple sites; the measurements for that company were therefore performed at two different locations. Ambient noise level measurements were performed at the test sites prior to testing. The audiometric test conditions of all test locations met the international standards for hearing screening (ie, unmasked air conduction starting at 500 Hz; ISO 8253, part I) when sound attenuating cups are used in combination with the headphones.
Each subject completed the OEC on their own with minimal supervision by the testers. A subgroup (every 5th subject) repeated the OEC a second time. Thereafter, pure-tone air conduction audiometry was performed as a reference. Both ears were measured at the octave frequencies 500-8000 Hz, including 3000 and 6000 Hz. Pure-tone air conduction audiometry was performed by two trained test operators using an Interacoustics AC40 or AD 229b clinical audiometer in combination with TDH 39 headphones with sound attenuating cups (Amplivox audiocups). For the OEC measurements, a research laptop and Sennheiser HDA 200 headphones were used. The testers who evaluated OEC were not aware of the results of the pure-tone air conduction audiometry, and vice versa. A complete measurement including instructions and informed consent (5 minutes), questionnaire (5 minutes), OEC (5 minutes) and pure-

Occupational Earcheck
The speech material of OEC consists of a closed set of eight Dutch consonant-vowel-consonant (CVC) words (bed /bεt/, knife /mεs/, bag /tαs/, pan /pαn/, cat /pus/, book /buk/, sock /sͻk/, sun /zͻn/). They are represented by eight response buttons on a visual screen, identified by a picture and a written word. A ninth button labelled "not recognized" is included. The words were selected from the Dutch wordlist used for diagnostic speech audiometry (16) and contain matching vowels and high-frequency consonants, making the test more sensitive for the detection of HFHL. In order to acquire a precise test, the intelligibility of the individual words in noise was equalized with level adjustments. These level adjustments were derived from the slopes of word-specific psychometric functions, based on previously performed tests (17). The test is presented in a stationary masking noise, matched to the long-term average speech spectrum of the words, except for the higher frequencies: the matched masking noise is low-pass filtered (cut-off frequency 1.4 kHz), and has a noise floor of -12 dB SNR. The test consists of 25 stimuli per ear, making it a relatively short test which can be performed within five minutes.
Test presentation is monotic: both left and right ear are tested separately. The sequence of the ears is randomly assigned by the OEC. The volume level of the stimuli can be set by the user to a comfortable loudness by means of a volume scale, resulting in individual test intensities. The test is administered by means of the simple adaptive up-down procedure with a step size of 2 dB. The first stimulus is presented at a SNR of 0 dB. With every correct response, the subsequent stimulus level is decreased by 2 dB, and with every incorrect answer the stimulus is increased by 2 dB. The noise level remains fixed throughout the test. The SNR presented range from -30 to 0 dB. The actual calculation starts at the SNR of the first incorrect response, resulting in an individual starting level. The SRT is calculated by averaging the SNR of stimuli 6-25 per ear. The intra-test standard deviation (SD) is calculated using the same stimuli and gives an insight into the variation within a single test measurement. It can therefore be used as a measure of the accuracy of a test performed by an individual.

Statistical analyses
A sample size calculation was performed, indicating that ≥79 subjects were needed in order to detect a meaningful correlation of r=0.58 between SRT results and a pure-tone average (PTA) of the higher frequencies (15). This sample size would provide 80% power to discover a correlation which is statistically different from a moderate correlation of r=0.30 at the 0.05 significance level. Descriptive statistics were performed on demographic information and pure-tone thresholds. True HFHL on the basis of pure-tone air conduction audiometry was defined as a PTA of the frequencies 3, 4 and 6 kHz (PTA 346) of 25 dB HL or worse. SRT results of the OEC in dB SNR for the first ear tested were compared for HFHL and non HFHL ears by means of an independent samples t-test. To assess test validity, the OEC SRT results of the first ear measured were compared to PTA 346 in dB HL of the corresponding ear by means of a Pearson product correlation coefficient. To further assess the discriminative power of the test, a receiver operating characteristic (ROC) analysis was performed on SRT results of the first ear measured. By means of this analysis, an appropriate cut-off value for pass/fail of the OEC was estimated, and corresponding test sensitivity and specificity values for detecting HFHL were assessed monaurally. To assess the sensitivity and specificity on the individual level, true HFHL was defined as a PTA 346 of 25 dB HL or worse for at least one ear (HFHL 1+). Both ears of one subject had to have a lower score than the cut-off value of OEC in order to pass the screening test. An individual with a test result equal to or higher than the cut-off value for at least one ear would get a positive test result. To assess test reliability, test and retest results of the first ear measurement of a subgroup were compared with a paired sample t-test.
Two parameters were calculated, the intra-class correlation coefficient (ICC, two-way random, absolute agreement, single measures), and the measurement error. The ICC was calculated to get an insight into the degree of agreement between test and retest results. In order to assess the consistency of the test results, the measurement error was calculated by taking the quadratic mean of the within-subject standard deviations of the repeated measurements. Data were analyzed using SPSS statistics version 22 (IBM Corp, Armonk, NY, USA).

Results
In total, 102 subjects volunteered to participate; 6 did not attend on the day of the test and 2 were excluded from analysis due to invalid OEC measurements (OEC test was presented on both ears at the same time, instead of one ear). The remaining 94 subjects all performed the index test (OEC) and the reference test (pure-tone air conduction audiometry). The flow of the participants through the study is depicted in figure 1: 30 subjects had a HFHL (1+), of which 17 had a HFHL at both ears, 4 at the right ear only, and 9 at the left ear only; 64 subjects did not have a HFHL. Of the 30 subjects with HFHL (1+), all were male, with a mean age of 52.3 years (SD 7.3). The majority reported working in noise for at least half a working day per week, with an average of 3.8 days a week (SD 1.5) (Information concerning this question was missing for 1 subject in this group). A majority of the 64 non HFHL subjects were male (92.2%), with a mean age of 36.4 years (SD 10.6). The majority reported working in noise for at least half a working day per week, with an average of 3.1 days a week (SD 1.9) (Information concerning this question was missing for one subject in this group). Across the five test locations, only small variations in gender, age, and SRT scores were observed. The distribution of audiometric hearing threshold levels for HFHL and non HFHL ears is shown in figure 2.
In order to assess how well the OEC discriminates HFHL from non HFHL, SRT test results of HFHL ears were compared to those of non HFHL ears (for the first ear tested). The mean SRT was -11.4 dB SNR (SD 4.2) for HFHL ears, and -16.7 dB SNR (SD 2.2) for non HFHL ears. The difference of 5.3 dB SNR was statistically significant (P<0.001).
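For reference, each SRT in this comparison is the outcome of the adaptive up-down track described for the OEC (start at 0 dB SNR, 2 dB steps, SNR limited to the range -30 to 0 dB, SRT taken as the mean SNR of stimuli 6-25). The following is a minimal Python sketch of that bookkeeping, not the OEC implementation itself. The function names and the example response pattern are hypothetical, the use of the sample SD for the intra-test SD is an assumption, and the sketch numbers stimuli from the first presentation without modelling the individual starting level at the first incorrect response.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

STEP_DB = 2.0        # step size of the simple up-down rule
START_SNR_DB = 0.0   # SNR of the first stimulus
MIN_SNR_DB = -30.0   # lowest SNR presented
MAX_SNR_DB = 0.0     # highest SNR presented


def snr_track(responses: Sequence[bool]) -> List[float]:
    """Return the SNR at which each stimulus is presented.

    responses[i] is True when stimulus i+1 was identified correctly.
    A correct response lowers the next SNR by one step, an incorrect
    response raises it; the SNR is kept within the presented range.
    """
    snr = START_SNR_DB
    track = []
    for correct in responses:
        track.append(snr)
        snr += -STEP_DB if correct else STEP_DB
        snr = max(MIN_SNR_DB, min(MAX_SNR_DB, snr))
    return track


def srt_and_sd(track: Sequence[float], first: int = 6, last: int = 25) -> Tuple[float, float]:
    """Mean SNR (SRT) and SD over stimuli first..last (1-based, inclusive).

    Assumption: the intra-test SD is the sample SD of the same stimuli.
    """
    scored = list(track[first - 1:last])
    return mean(scored), stdev(scored)


# Hypothetical listener: five correct responses, then alternating
# incorrect/correct responses for the remaining 20 stimuli.
example_responses = [True] * 5 + [False, True] * 10
example_srt, example_sd = srt_and_sd(snr_track(example_responses))
print(f"SRT = {example_srt:.1f} dB SNR, intra-test SD = {example_sd:.2f} dB")
```

In this hypothetical track the SNR alternates between -10 and -8 dB from stimulus 6 onwards, so the resulting SRT is -9.0 dB SNR.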
To assess the validity of the OEC, the SRT results of the first ear tested were compared to the pure-tone audiogram of the corresponding ear. As shown in figure 3, SRT results correlated strongly with PTA 346 (r=0.79, P<0.001).
A ROC analysis was used to assess the most appropriate cut-off value for a dichotomous pass/fail outcome with the best trade-off between sensitivity and specificity values using monaural data of the first measurement. The highest agreement between hearing thresholds and OEC test results was found when the cut-off value was set at -14.9 dB SNR (figure 4). This setting resulted in a sensitivity of 83% and a specificity of 75% in order to identify HFHL with PTA 346 of 25 dB HL or worse. The area under the curve (AUC) was 0.89 [95% confidence interval (95% CI) 0.81-0.97]. Table 1 presents the OEC results (positive for at least one ear and negative for both ears) compared to pure-tone air conduction audiometry results (HFHL for at least one ear and non HFHL) on the individual level. When taking both ears into account, the sensitivity was 90% and the specificity was 77%.
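The individual-level values follow directly from the counts in Table 1 (27 true positives, 15 false positives, 3 false negatives, 49 true negatives). As a check on the arithmetic, a short Python sketch is given below. The function names are hypothetical. The pass/fail rule encodes the definition used in this study (positive when at least one ear has an SRT equal to or higher than the cut-off value of -14.9 dB SNR).

```python
from typing import Tuple

CUTOFF_DB_SNR = -14.9  # cut-off value estimated from the ROC analysis


def oec_positive(srt_left: float, srt_right: float,
                 cutoff: float = CUTOFF_DB_SNR) -> bool:
    """Individual-level result: positive if at least one ear is at or above the cut-off."""
    return srt_left >= cutoff or srt_right >= cutoff


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Counts taken from Table 1 (individual level).
sens, spec = sensitivity_specificity(tp=27, fp=15, fn=3, tn=49)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 90%, specificity 77%
```

The specificity of 49/64 is 76.6%, which is reported as 77% after rounding.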
A subgroup of 19 subjects performed the OEC twice. The mean SRT scores for test and retest (for the first ear) were compared. Performance on retest, with a mean SRT of -16.9 dB SNR (SD 2.4), was better than on the initial test, with a mean SRT of -16.0 dB SNR (SD 3.0). This indicated a learning effect of 0.9 dB SNR, but this was not statistically significant (95% CI -0.3 to 2.1, P=0.12). The test and retest results were moderately correlated, with an ICC of 0.57 (P=0.003). The measurement error was 1.8 dB SNR.
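The measurement error reported here is the quadratic mean (root mean square) of the within-subject standard deviations of the repeated measurements. The Python sketch below illustrates that calculation. The function name and the example SRT pairs are hypothetical and are not study data. For two measurements per subject, the within-subject sample SD equals the absolute test-retest difference divided by the square root of 2.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def measurement_error(pairs: Sequence[Tuple[float, float]]) -> float:
    """Quadratic mean of the within-subject SDs of (test, retest) SRT pairs."""
    within_sd = [stdev(pair) for pair in pairs]   # sample SD per subject
    return sqrt(mean([sd ** 2 for sd in within_sd]))


# Hypothetical (test, retest) SRTs in dB SNR for three subjects.
example_pairs = [(-16.0, -18.0), (-15.0, -15.0), (-14.0, -16.0)]
print(f"measurement error = {measurement_error(example_pairs):.2f} dB SNR")
```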

Discussion
The OEC distinguished well between HFHL and non HFHL ears, with a significant difference between the mean SRT results of 5.3 dB SNR for the first ear measurement. The test showed a high correlation of 0.79 between SRT results and PTA 346. In this study, a sensitivity of 83% and a specificity of 75% were found. The high AUC value (0.89) indicated good test accuracy. These analyses were based on test results of single ear measurements. Results of each of a subject's ears were studied separately in order to properly assess the OEC's test properties. In order to reduce the possible influence of a learning effect, we used the results of the first ear tested for this measurement. However, for practical screening purposes, the main focus is on the outcome at the level of the individual tested, and both ears per subject should be taken into account. The assessment of test results on the binaural level is important in order to make the correct decisions for referral, further comprehensive audiological assessment, and recommendations for the appropriate intervention. Therefore, sensitivity and specificity values were established on the individual level as well. Based on the classification of HFHL for at least one ear versus no HFHL for both ears, the sensitivity (or proportion of true positives) on the individual level increased to 90% and the specificity (or the proportion of true negatives) to 77%.
In a well-controlled laboratory-based study of the OEC with normal-hearing subjects and HFHL subjects, a sensitivity of 93% and a specificity of 94% were found, as well as a high correlation between SRT results and high frequency PTA (r=0.83) (Sheikh Rashid M, Leensen MC, de Laat JA, Dreschler WA. Evaluation of an optimized internet-based speech-in-noise test for occupational noise-induced hearing loss screening: Occupational Earcheck. Submitted for publication). We found poorer test characteristics (sensitivity, specificity and correlation with high frequency PTA) in this study in an occupational setting. A possible explanation for the differences found is that the study sample of the laboratory study consisted of young normal-hearing students on the one hand, and known HFHL cases on the other. The current study consisted of an unselected group of noise-exposed employees, classified as either having a HFHL or not. Noise-induced HFHL might have been the most probable hearing loss in this high-risk population; however, age-related hearing losses (ie, presbyacusis) could not be ruled out, as the HFHL group was significantly older than the non HFHL group. This can be attributed both to a longer period of noise exposure and to the (early) effects of presbyacusis. Furthermore, the reference standard was carried out differently in the two studies. In the lab study, clinical pure-tone air conduction audiometry was performed in a soundproof booth, while in the current study, it was performed in poorer testing conditions, which may have led to less reliable measurements.
Jansen et al (18) performed a similar study, in which noise-exposed workers completed the broadband digit triplet SRT self-test in an office-like room at five different industrial settings. Their findings were slightly more favorable relative to the findings presented in this paper. They found a higher sensitivity and specificity for detecting mild HFHL (92% and 89%, respectively), and a lower measurement error (0.8 dB). The differences in findings may be explained by their use of digit triplets in a broadband noise. The simplified speech material is less influenced by non-auditory cognitive abilities, and, in combination with the broadband noise, leads to more reliable estimations of the SRT. The use of meaningful words in a speech-in-noise test such as in OEC, however, may be valuable for screening purposes as it is representative of daily communication situations experienced by the population being screened. Also, the use of a low-pass filtered noise instead of an unfiltered broadband noise has been shown to improve the discrimination between HFHL and normal hearing/other losses (17,19). Differences in study methods (such as the chosen definition of HFHL, measurements for one or both ears, and the calculation of the measurement error) and study population may also explain the differences found between the studies.
The OEC can serve as a valuable screening method for HFHL in occupational settings. We aimed to develop a test that can reliably differentiate HFHL from normal hearing and from isolated low-frequency hearing losses. A comprehensive diagnostic audiological evaluation is only indicated when the OEC result is positive. HFHL identified by OEC is probably related to noise exposure, but may also reflect another form of HFHL. The actual type and degree of the hearing loss should then be specified in further full diagnostic audiological evaluation, after which appropriate measures can be advised.
This study showed some difficulties concerning the practical implementation of the OEC. An important issue was the only moderate test-retest reliability. The relatively large measurement error found may be due to a learning effect between both ear measurements within one test. Only a small subgroup performed the test twice, so even though we did not find a statistically significant difference between test and retest, a possible learning effect cannot be ruled out, and its influence on test results remains unknown. A learning effect may have led to higher estimated SRT values (especially for the first ear measured) and the relatively high number of false positive HFHL classifications. The 77% specificity found at the individual level would in practice result in a large proportion of employees incorrectly identified as having a HFHL, and consequently unnecessarily referred for comprehensive testing. The high false-positive rate may decrease by introducing a retest for subjects with a positive test score. It is important to further investigate the effects of a direct automatic retest on test sensitivity and specificity of the OEC applied at an individual level.
Another important limitation is that the study population consisted of volunteers, creating a risk of sample selection bias. This type of bias should not influence the comparison of pure-tone air conduction audiometry results with OEC results, as both tests were performed by all participants. However, this bias may have affected certain study population characteristics such as the prevalence and the severity of HFHL, as more health conscious employees or employees with significant hearing problems may have volunteered to participate. As the severity of hearing loss is associated with sensitivity and specificity, the values that were established in this population may not be entirely applicable to other populations of noise-exposed employees.

Table 1. OEC results compared to pure-tone air conduction audiometry results on the individual level.

OEC result (b)    HFHL 1+ (a)    Non HFHL    Total
Positive 1+       27             15          42
Negative          3              49          52
Total             30             64          94

(a) True high-frequency loss for at least one ear (HFHL 1+) is defined as a pure-tone average of the frequencies 3, 4, 6 kHz (PTA 346) according to the pure-tone air conduction audiometry test.
(b) Occupational Earcheck (OEC) result based on a cut-off value of -14.9 dB SNR to discriminate between a positive result for at least one ear (1+), and a negative result for both ears.
The study demonstrated a good agreement between test result and hearing status according to the conventional audiogram. However, this optimal cut-off value of the pass/fail outcomes was determined post hoc, and may have led to an overestimation of the accuracy of the OEC. For these reasons it is important to validate the new threshold criteria in other noise-exposed samples.
Future studies concerning the development of the OEC should focus on its applicability to specific populations, its feasibility in different testing environments, and its special requirements. For instance, the OEC may be used as a monitoring tool and be applied on an annual basis to identify small changes in hearing. Therefore, the test-retest reliability of OEC should be assessed in more detail, taking into account the learning effect between tests.

Concluding remarks
In this study, we assessed the accuracy of OEC for screening purposes in realistic occupational settings. This paper demonstrated that the OEC is able to detect HFHL, even in less optimal occupational settings. A good discriminative power was achieved, as reflected by the sensitivity and specificity values of 90% and 77%, respectively.