Using Audiometric Data to Weigh and Prioritize Factors that Affect Workers’
Hearing Loss through Support Vector Machine (SVM) Algorithm

Article, Sound & Vibration, Tech Science Press. DOI: 10.32604/sv.2020.08839
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract: Workers' exposure to excessive noise is a major work-related challenge worldwide. One of the major consequences of exposure to noise is permanent or transient hearing loss. The current study sought to use audiometric data to weigh and prioritize the factors affecting workers' hearing loss using the Support Vector Machine (SVM) algorithm. This cross-sectional descriptive study was conducted in 2017 in a mining industry in southeast Iran. The participating workers (n = 150) were divided into three groups of 50 based on the sound pressure level to which they were exposed (two experimental groups and one control group). Audiometric tests were carried out for all members of each group. The study generally entailed the following steps: (1) selecting predicting variables to weigh and prioritize factors affecting hearing loss; (2) conducting audiometric tests and assessing permanent hearing loss in each ear and then evaluating total hearing loss; (3) categorizing different types of hearing loss; (4) weighing and prioritizing factors that affect hearing loss based on the SVM algorithm; and (5) assessing the error rate and accuracy of the models. The collected data were entered into SPSS 18 and analyzed using linear regression and the paired samples t-test. The results showed that, in the first model (SPL < 70 dBA), the frequency of 8 kHz had the greatest impact (with a weight of 33%), while noise had the smallest influence (with a weight of 5%). The accuracy of this model was 100%. In the second model (70 < SPL < 80 dBA), the frequency of 4 kHz had the most profound effect (with a weight of 21%), whereas the frequency of 250 Hz had the lowest impact (with a weight of 6%). The accuracy of this model was also 100%. In the third model (SPL > 85 dBA), the frequency of 4 kHz had the highest impact (with a weight of 22%), while the frequency of 250 Hz had the smallest influence (with a weight of 3%). The accuracy of this model was also 100%. In the fourth model, the frequency of 4 kHz had the greatest effect (with a weight of 24%), while the frequency of 500 Hz had the smallest effect (with a weight of 4%). The accuracy of this model was found to be 94%. According to the modeling conducted using the

Introduction
Noise exposure is a common consequence of industrialization. Noise is sound whose pressure, frequency, and wavelength ranges bear no significant relationship to one another, and it is produced and emitted to a considerable extent in different industries [1]. In fact, exposure to excessive noise has become a widely blamed harmful agent and a major risk factor among workers in various industries [2]. Numerous studies have shown that harmful workplace factors such as noise, vibration, and shift work have detrimental effects on workers' health [3][4][5][6]. The effects of environmental noise pollution have become even more profound because of the rapid growth of the economy and urbanization [7].
In the US, over 30 million workers are exposed to dangerous noise levels (85 dB, A-weighted noise exposure level normalized to an 8-hour working day), and 7.4-10 million industrial workers are at risk of hearing loss caused by occupational noise [8]. In Michigan alone, around 86,000 people suffer from noise-induced hearing loss. Although this hazard can be prevented, hearing loss is a common occupational disease in the US [9]. In 1990, $200 million was paid in compensation for hearing loss caused by noise exposure [10]. In Europe, over the past 10 years, the proportion of the population at risk of exposure to noise above 65 dB has increased from 15% to 26% [11].
Exposure to excessive noise leads to physiological complications (high blood pressure, adrenaline secretion, a higher likelihood of heart attack, changes in respiratory rate and oxygen consumption, deterioration of the auditory system, and increased stomach and intestinal activity) as well as psychological, social, and economic ones. Furthermore, work efficiency tends to decline in people who are exposed to excessive noise. Such individuals are also more likely to face problems in communicating and in understanding warning and safety signs [12,13]. Many cases of absenteeism in industrial environments, as well as workers' reports of chronic fatigue and poorer scores on many indicators of physical health, can also be attributed to excessive noise exposure. Research has further revealed that people who are exposed to harmful noise demonstrate impaired psychological responses, especially in stressful situations, and report a reduction in the quality and quantity of their sleep [14]. One of the major consequences of exposure to excessive noise is permanent or transient hearing loss [8].
Noise-induced hearing loss commonly occurs in the first 10-15 years of work if workers are exposed to high-frequency noise (over 4000 Hz). The degree of influence of excessive noise exposure depends on various personal and environmental factors. Noise-induced hearing loss classically affects the frequencies of 3, 4, and 6 kHz first. If exposure to noise is prolonged, individuals' ability to hear higher-frequency sounds will be impaired too. Thus, audiometry should be conducted for the definitive diagnosis of hearing loss [9]. Age and work experience are believed to influence both permanent and transient noise-induced hearing loss [10].
Audiometry is a method for assessing people's hearing sensitivity, hence shedding light on the nature, degree, and probable causes of hearing impairment [15,16]. Using this technique, auditory stimuli with different intensity levels are presented to the person and his/her responses are recorded. The hearing threshold is determined in the light of the minimum intensity level stimulus to which the person responds consistently. The output of this diagnostic test, known as audiogram, can be used for adopting appropriate treatments and hearing aids [16,17].
Data mining (DM), which is a subfield of computer science and is located at the intersection of artificial intelligence, machine learning, statistics, and database systems, is a process for detecting patterns in large data sets. DM is intended to extract information from a set of data and transform it to an easily understandable structure. The extracted information may include categorization and estimation of outcomes after an intervention, detection of associations among different variables, or prediction of deviations [18]. In the current study, DM was adopted for two reasons: first, DM is different from applying algorithms blindly, hence the usefulness of the extracted information. Thus, successful application of DM needs appropriate knowledge within the studied domain and clearly stated objectives [19]. Second, the use of DM in new areas usually casts light on interesting issues that are unknown to commonly used techniques. This presents opportunities for useful algorithmic development and extensions [20].
Over the past years, since support vector machines (SVMs) have demonstrated good generalization performance in a wide array of learning problems (e.g., handwritten digit recognition, classification of web pages, and face detection), researchers in the machine learning community have become interested in their application. Problems such as multiple local minima, the curse of dimensionality, and the overfitting seen in neural networks are rarely observed in SVMs. Further, originating from statistical learning theory, the SVM has a rigorous theoretical basis. Nonetheless, training an SVM is still a challenge, particularly for a large-scale learning problem. As a result, it is necessary to devise a fast SVM training algorithm that can be applied to various engineering problems in our field [21].
Since hearing loss is a widespread disorder caused by excessive noise, it imposes huge costs on industrial workers and other individuals who are exposed to noise. However, few studies have tried to weigh and prioritize factors affecting workers' hearing loss based on audiometric findings using the SVM algorithm. Therefore, the present study aimed to: (1) determine workers' equivalent sound level and other predictor factors; (2) assess the hearing loss of both ears; (3) weigh and prioritize factors that may affect hearing loss based on the SVM algorithm; and (4) estimate the error rate and accuracy of the models emerging from the SVM algorithm.

Sampling Procedure
The study was conducted in a mining industry in southeast Iran. In the light of individuals' equivalent sound level, the findings of previous studies, and the type of algorithm used in modeling hearing loss, three groups (a control group in the office that was exposed to a low equivalent sound level and two case groups that were selected from the workshop and were exposed to high equivalent sound level) were generally involved in the study. Based on the sound equivalent level of the three groups, fifty participants were included in each group, hence having 150 participants in total [22].

Research Design
This study adopted a cross-sectional, descriptive, analytical, prospective design. It generally entailed the following steps: (1) selecting predicting variables to weigh and prioritize factors affecting hearing loss; (2) conducting audiometric tests and assessing permanent hearing loss in each ear and then evaluating total hearing loss; (3) categorizing different types of hearing loss; (4) weighing and prioritizing factors that affect hearing loss based on the SVM algorithm; and (5) assessing the error rate and accuracy of the models [23].

Selecting the Variables Influencing Hearing Loss (PREDICTORS)
Age, work experience, equivalent sound level, and frequency were taken into consideration for weighing and prioritizing factors that may influence hearing loss among individuals [23,24]. All the participants were adults in three age ranges, namely 20-35, 35-50, and over 50 years old [23]. With regard to their work experience, the participants were divided into three groups too: less than 10 years of experience, between 10 and 20 years of experience, and over 20 years of experience [22]. Following ISO 9612, equivalent sound level was measured using a TES-1345 dosimeter manufactured in Taiwan. Before measurement, a CEL 110/2 calibrator (made in the UK) was used to calibrate the dosimeter [25,26]. Pure-tone hearing thresholds were recorded at 250, 500, 1000, 2000, 4000, 6000, and 8000 Hz using a CA 120 clinical audiometer manufactured in Denmark [27,28]. After eliminating the effect of age, the hearing loss estimates at four frequencies (500, 1000, 2000, and 4000 Hz) were entered into Eq. (1) to compute noise-induced hearing loss (NIHL). Then, Eq. (2) was used to calculate permanent hearing loss [28,29].

$$\mathrm{NIHL} = \frac{TL_{500} + TL_{1000} + TL_{2000} + TL_{4000}}{4} \tag{1}$$

$$\mathrm{NIHL}_t = \frac{5\,\mathrm{NIHL}_b + \mathrm{NIHL}_p}{6} \tag{2}$$

where TL is the hearing loss of each ear at a particular frequency (dB), NIHL is the noise-induced hearing loss (dB), NIHL_t is the permanent hearing loss in both ears (dB), NIHL_b is the permanent hearing loss of the better ear (dB), and NIHL_p is the permanent hearing loss of the poorer ear (dB).

According to the classification proposed by the World Health Organization (WHO), hearing ability in the range of 0-25 dBA is regarded as normal hearing, while that in the range of 26-40 dBA is regarded as mild impairment. In addition, people with hearing thresholds of 41-60 dBA, 61-80 dBA, and over 80 dBA are considered to suffer from moderate, severe, and profound hearing loss respectively [30].
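The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-ear value is taken as the plain average of the four named frequencies, the two ears are combined with a 5:1 weighting in favour of the better ear (a common convention, assumed here rather than confirmed by the text), and the threshold values and function names are hypothetical.

```python
from typing import Dict

# Frequencies (Hz) that the text says enter the NIHL calculation.
FREQS_HZ = (500, 1000, 2000, 4000)


def ear_nihl(loss_db: Dict[int, float]) -> float:
    """Average age-corrected hearing loss of one ear over the four frequencies."""
    return sum(loss_db[f] for f in FREQS_HZ) / len(FREQS_HZ)


def total_nihl(left_db: float, right_db: float) -> float:
    """Combine both ears; ASSUMES a 5:1 weighting toward the better ear."""
    better, poorer = min(left_db, right_db), max(left_db, right_db)
    return (5 * better + poorer) / 6


def who_grade(loss_db: float) -> str:
    """Map a hearing loss value to the WHO categories quoted in the text."""
    if loss_db <= 25:
        return "normal"
    if loss_db <= 40:
        return "mild"
    if loss_db <= 60:
        return "moderate"
    if loss_db <= 80:
        return "severe"
    return "profound"


# Hypothetical audiogram of one worker (dB), not taken from the study.
left_ear = {500: 20.0, 1000: 25.0, 2000: 30.0, 4000: 45.0}
right_ear = {500: 25.0, 1000: 30.0, 2000: 40.0, 4000: 65.0}

left_loss = ear_nihl(left_ear)
right_loss = ear_nihl(right_ear)
total_loss = total_nihl(left_loss, right_loss)
grade = who_grade(total_loss)
```

Because the better ear dominates the weighted combination, a worker with one badly affected ear can still fall into a mild category, which is worth keeping in mind when reading the category counts.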

Weighing and Prioritizing Factors Affecting Hearing Loss Using SVM
Finally, modeling the hearing loss of workers was done by entering all information about age, work experience, equivalent sound level and hearing loss frequencies in IBM SPSS Modeler 18.0. The algorithm's function is described as follows.

Support Vector Machine Algorithm
A major way through which people perceive the world and acquire knowledge is learning from data. Let $\{x_i, y_i\}$, $i = 1, \ldots, l$, with $y_i \in \{-1, 1\}$ and $x_i \in \mathbb{R}^n$, be the training samples, where $x_i$ is the training vector and $y_i$ is its corresponding target value. To keep the notation consistent, lowercase bold letters refer to column vectors (for example, $\mathbf{x}_i$), while uppercase bold letters denote matrices. The notation $(\mathbf{A})_{ij}$ refers to the element in the $i$th row and $j$th column of matrix $\mathbf{A}$ [31]. According to Boser et al., training a Support Vector Machine for a pattern recognition problem can be formulated as a quadratic optimization problem [32], shown in Eq. (3):

$$\text{maximize:}\quad \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\,\boldsymbol{\alpha}^{T}\mathbf{Q}\,\boldsymbol{\alpha} \qquad \text{subject to:}\quad 0 \le \alpha_i \le C,\ \ i = 1, \ldots, l, \qquad \mathbf{y}^{T}\boldsymbol{\alpha} = 0 \tag{3}$$

where $\boldsymbol{\alpha}$ is a vector of length $l$ whose component $\alpha_i$ corresponds to the training sample $\{x_i, y_i\}$, $\mathbf{Q}$ is an $l \times l$ semidefinite kernel matrix, and $C$ is a parameter chosen by the user. A larger $C$ assigns a higher penalty to the training errors. A training vector $x_i$ whose corresponding $\alpha_i$ is nonzero is known as a support vector. The support vector machine maps the training vector $x_i$ into a high-dimensional feature space by the function $\Phi(x)$ such that $(\mathbf{Q})_{ij} = y_i y_j (\mathbf{K})_{ij} = y_i y_j K(x_i, x_j)$ and $K(x_i, x_j) = \Phi(x_i)^{T}\Phi(x_j)$. When the above optimization problem is solved, we obtain an optimal hyperplane in the high-dimensional feature space that separates the samples of the two classes. The decision function is given by Eqs. (4) and (5). In the latter algorithm, a technique presented by Keerthi et al. is used to choose the two variables for optimization and to determine the stopping conditions. First, the training patterns are split into five sets [33,34], which are then defined in Eq. (6).
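To make the dual formulation more concrete, the following minimal Python sketch evaluates the dual objective and a kernel decision rule on a two-point toy problem. It is not the solver used in the study (the models were built in IBM SPSS Modeler); the function names, the linear kernel, the toy data, and the convention of adding the bias term are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain inner product, i.e. the feature map is the identity."""
    return sum(a * b for a, b in zip(u, v))


def q_matrix(xs: List[Vector], ys: List[int], kernel: Kernel) -> List[List[float]]:
    """(Q)_ij = y_i * y_j * K(x_i, x_j)."""
    n = len(xs)
    return [[ys[i] * ys[j] * kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)]


def dual_objective(alpha: List[float], q: List[List[float]]) -> float:
    """sum_i alpha_i - 0.5 * alpha^T Q alpha."""
    n = len(alpha)
    quad = sum(alpha[i] * q[i][j] * alpha[j] for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha: List[float], ys: List[int], c: float, tol: float = 1e-9) -> bool:
    """Check the box constraint 0 <= alpha_i <= C and the equality y^T alpha = 0."""
    in_box = all(-tol <= a <= c + tol for a in alpha)
    balanced = abs(sum(y * a for y, a in zip(ys, alpha))) <= tol
    return in_box and balanced


def decide(x: Vector, xs: List[Vector], ys: List[int], alpha: List[float],
           kernel: Kernel, bias: float = 0.0) -> int:
    """Sign of sum_i y_i * alpha_i * K(x_i, x) + bias (bias convention assumed)."""
    score = sum(y * a * kernel(xi, x) for xi, y, a in zip(xs, ys, alpha)) + bias
    return 1 if score >= 0 else -1


# Toy problem: one positive point at +1 and one negative point at -1.
xs = [(1.0,), (-1.0,)]
ys = [1, -1]
q = q_matrix(xs, ys, linear_kernel)

best_alpha = [0.5, 0.5]     # maximizer of the dual for this toy problem
other_alpha = [0.25, 0.25]  # another feasible point, for comparison

best_value = dual_objective(best_alpha, q)
other_value = dual_objective(other_alpha, q)
```

For this toy problem the dual reduces to 2a - 2a² with both multipliers equal to a, so the maximum lies at a = 0.5; both points are then support vectors and the resulting rule is simply the sign of the input.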

Assessing the Accuracy of the Model Generated by SVM Algorithm
In categorization algorithms, which are used for classifying discrete output variables, criteria such as accuracy, confusion matrix, sensitivity, features, etc. are used for assessment. In the current study, accuracy and the confusion matrix were used. The confusion matrix is a square matrix whose dimensions are equal to the number of output categories. In this matrix, the main diagonal indicates the accurate predictions. According to Eq. (7), the accuracy of the model is obtained by dividing the number of correct predictions by the number of all predictions [35].

$$\text{Accuracy} = \frac{\text{True Positive cases} + \text{True Negative cases}}{\text{All cases}} \tag{7}$$
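As a small illustration of Eq. (7), the sketch below computes accuracy from a confusion matrix by summing the main diagonal and dividing by the total count. The matrix values are hypothetical and are not taken from the study's tables; the function name is likewise only illustrative.

```python
from typing import List


def accuracy(confusion: List[List[int]]) -> float:
    """Correct predictions (main diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical 2 x 2 confusion matrix: rows are observed classes,
# columns are predicted classes.
example_matrix = [
    [40, 2],  # observed class 0: 40 predicted correctly, 2 predicted as class 1
    [3, 5],   # observed class 1: 3 predicted as class 0, 5 predicted correctly
]

example_accuracy = accuracy(example_matrix)
error_rate = 1 - example_accuracy
```

The same function applies unchanged to matrices with more than two categories, since only the diagonal and the grand total are used.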

Ethical Considerations
Ethical approval was sought from the Ethics Committee of Kerman University of Medical Sciences (ID: IR.KMU.REC.1396.2458). In addition, all the participants signed a written consent form prior to the study.

Data Analysis
The collected data were entered into the Statistical Package for the Social Sciences (SPSS) version 18 (SPSS, Inc., Chicago, Illinois, USA). The obtained mean scores, standard deviations, and correlation coefficients were analyzed using linear regression and the paired samples t-test. In addition, IBM SPSS Modeler 18.0 was used to model changes in hearing loss.

Demographic Information
Tab. 1 demonstrates the participants' demographic information.

Measuring Equivalent Sound Level
The participants in the first group were exposed to an equivalent sound level of less than 70 dB, while those in the second and third groups were exposed to equivalent sound levels of 70-80 dB and over 85 dB respectively. The means and standard deviations of the equivalent sound level for the three groups were 70 ± 3 dB, 77.62 ± 4.43 dB, and 89.7 ± 3.03 dB respectively.
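For readers unfamiliar with the quantity, an equivalent sound level is an energy average rather than an arithmetic average of decibel values. The sketch below shows the standard energy-averaging formula over time segments. The paper does not describe the dosimeter's internal computation, so this is only an assumed illustration, and the segment levels and durations are hypothetical.

```python
import math
from typing import List, Tuple


def equivalent_level(segments: List[Tuple[float, float]]) -> float:
    """Energy-average level in dB for (level_dB, duration_hours) segments."""
    total_time = sum(duration for _, duration in segments)
    # Convert each level to relative energy, weight it by duration, then average.
    energy = sum(duration * 10 ** (level / 10) for level, duration in segments)
    return 10 * math.log10(energy / total_time)


# Hypothetical shift: 4 hours at 90 dB followed by 4 hours at 80 dB.
mixed_shift = equivalent_level([(90.0, 4.0), (80.0, 4.0)])

# A constant exposure returns the same level.
steady_shift = equivalent_level([(85.0, 4.0), (85.0, 4.0)])
```

In the mixed example the result is about 87.4 dB, well above the arithmetic mean of 85 dB, because the louder half of the shift dominates the energy average.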

The Results of Hearing Loss
Tab. 2 displays the results related to hearing loss in both ears of the workers. The results are divided into 5 categories according to the degree of hearing loss. The results of the paired samples t-test showed no significant difference between the hearing loss of the right and left ears of participants in the three groups at the same frequencies (P > 0.05).

Modeling Hearing Loss Changes
In this study, four different models of hearing loss changes were calculated. The first model represents the audiometric data of the first group (SPL < 70 dBA), the second model entails the audiometric data of the second group (70 < SPL < 80 dBA), the third model demonstrates the audiometric data of workers in the third group (SPL > 85 dBA), and the fourth model is based on the audiometric data of all participating workers.
Tab. 3 illustrates the confusion matrix corresponding to the first model. The model presented an accurate prediction for all workers whose hearing loss was normal or mild. The accuracy of the SVM algorithm in this modeling was equal to 100%.

The Second Model: Modeling Hearing Loss Changes Based on Audiometric Data of Workers in the Second Group (70 < SPL < 80 dBA)
Fig. 2 demonstrates the output model presented for the hearing loss changes of workers in the second group. As observed, the frequencies of 4 kHz (with a weight of 21%) and 1 kHz (with a weight of 18%) had the highest impact. The frequencies of 2 kHz and 500 Hz came third and fourth, in that order. Finally, the frequency of 250 Hz (with a weight of 6%) had the lowest influence.
This model accurately predicted the hearing loss of workers in the normal, mild, and moderate categories. The accuracy of the SVM algorithm in this modeling was found to be 100%. The results of the confusion matrix corresponding to the algorithm of the second model are presented in Tab. 4.
For the third model (SPL > 85 dBA), the results of the corresponding confusion matrix are presented in Tab. 5. Accordingly, the model accurately predicted hearing loss among workers with normal, mild, moderate, and severe hearing loss. The accuracy of the SVM algorithm in this modeling was 100%.

The Fourth Model: Modeling Hearing Loss Changes Based on Audiometric Data of Workers in all Groups
The results of the fourth model (which included the data of all 150 participants) are presented in Fig. 4. It is observed that the frequency of 4 kHz (with a weight of 24%) had the greatest impact on hearing loss.
Tab. 6 contains the results of the corresponding confusion matrix. Accordingly, the model accurately predicted hearing loss among workers with normal, moderate, and severe hearing loss. However, for workers with mild hearing loss, the model predicted that 25.71% of them had normal hearing and the rest (74.28%) had mild hearing loss. The accuracy of the algorithm in this regard is 94%, with an error rate of 6%.
Figure 4: The fourth model: the weight (%) of hearing loss predicting variables for participants in the three groups
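The percentages reported for the fourth model can be checked for internal consistency with a short calculation. The counts below are an inference, not figures from the study's tables: assuming that 35 of the 150 workers had mild hearing loss and that 9 of them were predicted as normal reproduces the reported shares and the 94% accuracy, provided every other worker was classified correctly.

```python
total_workers = 150   # stated in the text
mild_total = 35       # ASSUMPTION: inferred, not reported in the text
mild_as_normal = 9    # ASSUMPTION: inferred, not reported in the text

# Share of mild cases predicted as normal and as mild, in percent.
share_misclassified = 100 * mild_as_normal / mild_total
share_correct = 100 * (mild_total - mild_as_normal) / mild_total

# Overall accuracy if all other workers were classified correctly.
overall_accuracy = 100 * (total_workers - mild_as_normal) / total_workers
overall_error = 100 - overall_accuracy
```

Under these assumed counts the misclassified share is about 25.71%, the correct share about 74.29%, and the overall accuracy exactly 94%, which matches the reported values up to rounding.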

Discussion
This study used the SVM algorithm to weigh and prioritize the factors that affect workers' hearing loss based on audiometric findings. Age, work experience, A-weighted equivalent sound pressure level, and frequency were considered as predictor factors. More severe hearing loss (the target factor) was observed in the second and third groups. The results showed that the average equivalent sound levels to which the first, second, and third groups were exposed were 70 ± 3 dBA, 77.62 ± 4.43 dBA, and 89.7 ± 3.03 dBA respectively. It was also found that, in the first two groups, age and work experience had significant correlations with hearing loss. More specifically, the correlation coefficients for the relationship between age and hearing loss were r = 0.385 (P = 0.008) for the first group and r = 0.394 (P = 0.008) for the second one. Additionally, the correlation coefficients for the relationship between work experience and hearing loss were r = 0.362 (P = 0.014) for the first group and r = 0.32 (P = 0.038) for the second group. However, no significant association was detected between age or experience and hearing loss in the third group. That is, the correlation coefficient of the relationship between age and hearing loss was r = 0.189 (P = 0.277), while that of the association between experience and hearing loss was r = 0.28 (P = 0.076). The results of Pearson correlation and linear regression further revealed a statistically significant relationship between noise exposure and hearing loss for all the participants (n = 150) (r = 0.414, P = 0.0001).
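For reference, the Pearson coefficient reported above and the t statistic commonly used to test it can be sketched as follows. The study's analysis was run in SPSS, so this is only an illustrative reimplementation; the function names are hypothetical, and the group size of 50 used in the example is taken from the sampling description.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Perfectly linear toy data give r = 1 (increasing) or r = -1 (decreasing).
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3], [3, 2, 1])

# t statistic for the age correlation reported for the first group,
# ASSUMING all 50 members of the group entered the calculation.
t_age_group1 = t_statistic(0.385, 50)
```

A coefficient of moderate size can therefore still be clearly distinguishable from zero at this sample size, which is consistent with the significant results reported for the first two groups.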
In a descriptive, analytical, cross-sectional study, Halvani et al. [9] examined the correlation between noise exposure and hearing loss among the workers of a textile factory, Taban-e-Yazd. They investigated hearing loss in the left and right ears of 100 workers who worked in the spinning, knitting, and mechanical sectors of the factory. In that study, the workers were exposed to noise at frequencies of 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz, and 8000 Hz, and their hearing ability was measured using audiometric tests. The results showed that, keeping work experience constant, a rise of one unit in the sound pressure level would lead to 18% noise-induced hearing loss. Also, a rise of one unit in work experience would result in 37% noise-induced hearing loss [9]. In line with these findings, the results of the current study also indicated that the likelihood of hearing loss goes up with increasing work experience. In another study, Tajic et al. (2008) examined the effect of noise pollution on the auditory system of workers in a metal factory in Arak. The participants worked in the sheet metal manufacturing, assembly, and welding sectors. The results demonstrated that the highest degree of hearing loss was recorded among workers in the age range of 41-50 years with 21-30 years of work experience. They also found significant relationships between hearing loss, on the one hand, and sound pressure level, age, and work experience, on the other [36]. The results of the present study are in agreement with the findings reported by Tajic et al. [36], meaning that increases in age, work experience, and sound pressure level lead to a higher likelihood of hearing loss.
According to the models generated by the SVM algorithm, in the first model (SPL < 70 dBA), the frequency of 8 kHz had the highest weight (33%), whereas noise had the smallest weight (5%) (Fig. 1). This model accurately predicted the hearing loss of people with normal hearing ability or mild hearing loss.
Dubno et al. [37] aimed to categorize audiometric phenotypes caused by aging by using animal models. They gathered audiograms from 338 samples classified into four phenotypes (older normal, metabolic, sensory, metabolic + sensory). The accuracies of the QDA analysis, SVM algorithm, and RF were found to be 93.2%, 89.9%, and 89.3% respectively. The accuracy indices obtained in their study are relatively low compared to the ones obtained in the current research, in which the accuracy of the first three models was 100% and the accuracy of the fourth one was 94%. Acir et al. [38] conducted a study entitled "automatic classification of auditory brainstem responses using SVM-based feature selection algorithm for threshold detection". In their study, the findings were classified according to the SVM algorithm, with the reported accuracy being 96.2%. The accuracy obtained in the current study is as high as the one reported by Acir et al. [38].
Wang et al. [39] carried out another study entitled "wavelet entropy and directed acyclic graph support vector machine for detection of patients with unilateral hearing loss in MRI scanning". They scanned 49 patients using MRI and subsequently classified them into three groups: the first group included 14 people with right ear hearing loss (RHL), the second one included 15 members with left ear hearing loss (LHL), and the third group, which comprised 20 individuals, functioned as the control group (HC). They extracted wavelet entropy (WE) from the MRI of each participant and subsequently submitted it to a directed acyclic graph support vector machine (DAG-SVM) method. The accuracies of the DAG-SVM method for the RHL, LHL, and HC groups were found to be 97.14%, 96.33%, and 96.73% respectively [39]. The accuracy indices obtained in the current study are high (close to 100%), like the ones obtained by Wang et al. [39]. Taylor et al. [40] conducted another study entitled "Audio Gene: predicting hearing loss genotypes from phenotypes to guide genetic screening". They intended to describe the algorithm underlying Audio Gene, a software system that uses machine learning techniques. These techniques used phenotypic information derived from audiograms to estimate the genetic cause of hearing loss in people segregating ADNSHL. The results indicated that Audio Gene had an accuracy of 68% in predicting the causative gene within its top three predictions. In contrast, the accuracy reported for a majority classifier was found to be 44%. The accuracy indices reported in their study were relatively low, while the ones obtained in the current research were high.
Majumder et al. [22] used both unsupervised (i.e., Expectation Maximization (EM), K-means, Linear Vector Quantization (LVQ), and Self Organization Map (SOM)) and supervised (i.e., Naïve Bayes, Instance-based (IB), Back Propagation Network (BPN), and Radial Basis Function (RBF)) data mining procedures to model hearing loss changes using audiometric data among professional drivers. They concluded that, except for RBF, all data mining algorithms demonstrated relatively high adaptation and performance for the right ear. Likewise, in the current study, the models generated by the SVM algorithm showed high accuracy. Noma et al. [41] intended to predict hearing loss symptoms using audiometric data and the FP-Growth technique. The findings indicated that, in five different models, the accuracy rates were 100%, 99.5%, 98.25%, and 94.6% with frequencies higher than 10. In that study, except for the last algorithm, all the other FP-Growth algorithms had high accuracy. Likewise, in the current study, the accuracy of the first three models was 100%, while that of the last one was 94%.
In this study, some variables that had not been investigated in previous research were weighed and prioritized. They included various sound pressure levels, frequencies (250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, and 8 kHz), age, and work experience. Most of the studies investigating hearing loss have only reported the error rate and accuracy of the model, without mentioning the weight and effect size of different variables. One limitation of the current study was the difficulty the researchers encountered in convincing the industry stakeholders and the participants to cooperate with the research team.

Conclusion
This study focused on weighing factors that can cause hearing loss. According to the obtained results, in the first model, out of all the predictors (age, work experience, equivalent sound level, and frequency), the frequency of 8 kHz had the greatest impact (with a weight of 33%), while noise registered the smallest effect (with a weight of 5%). The accuracy of this model in predicting hearing loss was 100%. In the second model, the frequency of 4 kHz had the biggest impact (with a weight of 21%), whereas the frequency of 250 Hz had the smallest influence (with a weight of 6%). The accuracy of this model was also 100%. In the third model, the frequency of 4 kHz had the highest impact (with a weight of 22%), while the frequency of 250 Hz had the lowest effect (with a weight of 3%), with the accuracy of this model being 100% as well. In the fourth model, the frequency of 4 kHz had the greatest impact (with a weight of 24%), while the frequency of 500 Hz exerted the smallest effect (with a weight of 4%). The accuracy of this model was found to be 94%. Based on the accuracy indices obtained from these models, the SVM algorithm can be regarded as an appropriate and powerful instrument for predicting and modeling hearing loss.