Discrimination of Driver Fatigue Based on Distortion Energy Density Theory and Multiple Physiological Signals

Driver fatigue is an important contributor to traffic accidents, and its discrimination is significant for the safety of people's lives. To help prevent traffic accidents caused by driver fatigue, a series of real driving experiments was carried out in the present work. First, based on an analysis with respect to distortion energy density (DED) theory and the experimental results, the upper trapezius at the 6th neck vertebra was found to be more sensitive to driver fatigue and more easily fatigued than that at the 7th neck vertebra in a real driving task. Therefore, the locations 2 cm from the 6th vertebra on both sides were selected for data acquisition of the electromyography (EMG) signal. The experimental results show that the approximate entropy (ApEn) from the electroencephalography (EEG), EMG, and respiration (RESP) signals decreases with increasing driving time, indicating that the degree of fatigue increases. After approximately 90 min, the rate of decrease in ApEn becomes slow, indicating deeper driver fatigue. According to 3D analysis, principal component analysis, and fuzzy C-means clustering analysis, the EEG-EMG combination effectively reflects the state of drivers. Finally, the ApEns from EEG and EMG were selected as independent variables, and a discriminant model of driver fatigue based on Mahalanobis distance theory was built. The accuracy of the model is up to 90.92% by 10-fold cross validation. The reasons for the high accuracy are the reasonable selection of the locations of EMG data acquisition and the better degree of discrimination of EEG and EMG. The main contributions of this study are to provide a theoretical foundation for establishing internationally recognized standard locations for neck EMG data acquisition and to provide a feasible method for discriminating driver fatigue in real driving tasks.


I. INTRODUCTION
Driver fatigue is an important contributor to traffic accidents. More and more attention has been paid to the discrimination and prevention of driver fatigue, which is significant for the safety of people's lives [1]. Studies on driver fatigue detection can be divided into four categories: (1) those based on subjective questionnaires and evaluations, e.g., SOFI-25 (Swedish Occupational Fatigue Inventory-25), KSS (Karolinska Sleepiness Scale), PFC (Pearson Fatigue Coefficient), and SSS (Stanford Sleepiness Scale) [2], [3]; (2) those based on driver behavior, e.g., blinking, pupil change, yawning, nodding, and other facial features [4]; (3) those based on vehicle behavior, e.g., steering wheel control, speed variation, lateral displacement of the vehicle, and accelerator and brake operation [5]-[7]; and (4) those based on physiological signals, e.g., electroencephalography (EEG), electromyography (EMG), electrocardiography (ECG), electrooculography (EOG), and respiration (RESP) signals [8]-[31]. At present, physiological signals are considered an effective basis for discriminating driver fatigue because people usually have little control over them, and they provide the most objective and reliable driver information.
Studies on EEG. In recent years, more studies have demonstrated the deviation of EEG indicators from the normal vigilant state during fatigue in the time and frequency domains. Zhang et al. [8] considered monitoring physiological signals on a simulator to be a possible method of investigating driver fatigue, and they proposed an approach based on clustering in brain networks to improve the performance of driver fatigue detection. Tuncer et al. [9] proposed an EEG-based intelligent system for driver fatigue detection, containing preprocessing, feature generation, informative feature selection, and classification with a shallow classifier. Lees et al. [10] recorded 32-lead monopolar EEG data of drivers during a monotonous driving task, and they assessed fatigue/sleepiness using the Pittsburgh sleep quality index (PSQI), the Epworth sleepiness scale (ESS), the Karolinska sleepiness scale (KSS), and the checklist of individual strength 20 (CIS20). Additionally, the experimental results of some other researchers indicated that EEG signals have potential value for early warning and prevention of traffic accidents caused by driver fatigue [11]-[14].
Studies on EMG. Muscle fatigue, the physiological state most readily perceived by drivers, is also a significant feature of driver fatigue, and EMG has recently been used to discriminate driver fatigue. Fu et al. [15] proposed a dynamic fatigue detection model based on the combination of EMG and other signals. In this model, driver fatigue can be evaluated in a probabilistic way using EMG signals and contextual information. Hostens et al. [16] suggested that muscle activation and fatigue may provide more insight into the effects of long-term driving on the occurrence of health problems in the neck/shoulder/back area. In their work, surface EMG data were recorded from the left and right trapezius and deltoid muscles during a simulated driving task, with muscle stiffness being reported by more than half of the participants after 1 hour of driving. Chen et al. [17] examined the effects of long-time driving on the neck, shoulder, and waist muscles, and they found that the degrees of fatigue in different muscles were different during the driving tasks. Similarly, some other studies also found that EMG is an effective physiological signal for evaluating driver fatigue [18]-[20].
Studies on other signals. Zou et al. [21] recorded the RESP, ECG, and other physiological signals in a driving simulator and analyzed the physiological data; they proposed a comprehensive fatigue indicator that was used to evaluate driver fatigue. Jiao et al. [22] investigated the effects of different vibration frequencies on heart rate variability (HRV) determined with ECG and driver fatigue in simulated driving by using power spectrum analysis and subjective evaluation, and they found that the degree of driver fatigue was associated with the vibration frequencies in simulated driving. Arnedt et al. [23] compared the physiological signals with self-reporting measures, nocturnal sleep latency tests (SLTs), an auditory vigilance task, and other performance measures. Additionally, there are some investigations on psychological aspects of safe driving [24], [25].
Based on the above literature, many meaningful results and conclusions have been obtained. Different measuring methods [26], [27], different physiological signals [28], [29], and different characteristic features [30], [31] can be used to evaluate driver fatigue. However, challenges remain in the physiologically based discriminant methods for driver fatigue. Consider the following: (1) The traditional wire-based signal recording method inevitably disturbs the driver, so it is difficult to carry out experiments in a real driving task. (2) There are no internationally accepted standard locations for EMG data acquisition, which may result in a low discriminant efficiency of EMG. (3) Reasonable selection of characteristic features and their combinations is important for the accuracy of discriminant models of driver fatigue.
To better address these problems, a series of investigations was carried out in the present work. The main contributions and originality of this study include the following: (1) Based on a wireless body area network (WBAN) [32], real-time, portable, wearable sampling electrodes were used to record physiological signals in real driving tasks. (2) Reasonable locations for EMG data acquisition were determined based on distortion energy density (DED) theory and experiments. (3) The ApEns from physiological signals were extracted and analyzed by 3D analysis, principal component analysis, and fuzzy C-means (FCM) clustering analysis [33], [34]. Then, the characteristic features and their combinations with better discriminant efficiency were selected, which is also important for improving the accuracy of the discriminant model. (4) A discriminant model of driver fatigue was built based on Mahalanobis distance theory, verified by 10-fold cross validation, and discussed with respect to the outcomes of a statistical analysis. In short, the main contributions of this study are to provide a theoretical foundation for determining internationally recognized standard locations for neck EMG data acquisition, and to provide a feasible method for discriminating driver fatigue in a real driving task.

II. REAL DRIVING DESIGN
The real driving design includes the following (Figure 1): ① According to an NHTSA (National Highway Traffic Safety Administration) report, young people, particularly males, between the ages of 18 and 35 have the greatest risk of fatigued driving resulting in car crashes [35]. Therefore, 12 healthy male volunteers (18-35 years old) were selected as participants in the present work. ② Monotonous road scenarios and an automatic transmission car make it easier for drivers to feel fatigue. Thus, the participants were requested to drive continuously for 120 min on the highway from Shenyang to Dandong, with their speed kept below 100 km/h. ③ During the driving task, electroencephalography (EEG), electromyography (EMG), and respiration (RESP) signals were collected in real time by portable, wearable sampling electrodes in a wireless body area network (WBAN). Then, the physiological signals were transferred to a data processing system by Bluetooth. The sampling frequency was 200 Hz. Details on the locations of the sampling electrodes may be found in Section IV of this paper. Because the physiological signals were collected and transferred wirelessly (WBAN and Bluetooth), the drivers were not disturbed. This study was reviewed and approved by the Ethics Committee of our institute. ④ During the driving tasks, the assistants helped the participants to complete the SOFI-25 form at 10 min intervals. In this form, the participants scored their degree of fatigue according to their own feelings, including energy, physical strength, comfort, and drowsiness, where "0" indicates no fatigue and "10" indicates deep fatigue. Then, based on the subjective feedback, the alert state and fatigue state of participants were roughly determined.

A. APPROXIMATE ENTROPY
In statistics, approximate entropy (ApEn) is a technique used to quantify the amount of regularity and the unpredictability of fluctuations in time-series data [36]. ApEn was initially developed to analyze medical data [37], finance data [38], and psychology data [39]. Recently, ApEn was applied in human factor engineering [40]. In the present work, the ApEns from physiological signals were used as the characteristic feature to evaluate the fatigue state. The algorithm of ApEn is as follows [41].
① For a time series of data u(1), u(2), …, u(N), there are N raw data values from measurement equally spaced in time.
② Fix m (an integer) and r (a positive real number). The value of m represents the length of the compared run of data, and r specifies a filtering level. In the present work, m=2, and r=0.15×SD (SD is the standard deviation of the original data).
③ Form a sequence of vectors x(1), x(2), …, x(N−m+1) in $\mathbb{R}^m$, defined by
$$x(i) = [u(i), u(i+1), \ldots, u(i+m-1)] \tag{1}$$
④ For each i, 1 ≤ i ≤ N−m+1, construct
$$C_i^m(r) = \frac{\text{number of } x(j) \text{ such that } d[x(i), x(j)] \le r}{N-m+1} \tag{2}$$
in which $d[x, x^*] = \max_a |u(a) - u^*(a)|$. The u(a) are the m scalar components of x. d represents the distance between the vectors x(i) and x(j), given by the maximum difference in their respective scalar components.
⑤ Define:
$$\Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r) \tag{3}$$
⑥ The approximate entropy is then
$$\mathrm{ApEn}(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r) \tag{4}$$
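The ApEn steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `approximate_entropy`, the use of the population standard deviation for SD, and the test sequences are all hypothetical choices; only m=2 and r=0.15×SD come from the text.

```python
import math

def approximate_entropy(u, m=2, r_factor=0.15):
    """ApEn of sequence u with run length m and tolerance r = r_factor * SD."""
    n = len(u)
    mean = sum(u) / n
    # Assumption: SD is the population standard deviation of the raw data.
    sd = math.sqrt(sum((v - mean) ** 2 for v in u) / n)
    r = r_factor * sd

    def phi(k):
        # Embedding vectors x(i) = [u(i), ..., u(i+k-1)]
        vectors = [u[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for xi in vectors:
            # Count vectors within maximum-component distance r of xi
            # (the self-match is included, so the count is never zero).
            count = sum(
                1 for xj in vectors
                if max(abs(a - b) for a, b in zip(xi, xj)) <= r
            )
            total += math.log(count / len(vectors))
        return total / len(vectors)

    return phi(m) - phi(m + 1)

# A strictly alternating sequence is highly regular, so its ApEn is near zero.
regular = [0.0, 1.0] * 50

# A chaotic logistic-map sequence is far less predictable.
chaotic, x = [], 0.3
for _ in range(200):
    x = 3.99 * x * (1.0 - x)
    chaotic.append(x)
```

The double loop is quadratic in the window length, which is acceptable for short windows such as 30 s at 200 Hz but would need vectorization for long recordings.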

B. FUZZY C-MEANS CLUSTERING
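Fuzzy C-means (FCM) is a clustering method proposed by Bezdek [42], [43]. This method is used to determine the probability that a data point belongs to a certain cluster. In the present work, FCM clustering was used to evaluate the relationship between the physiological signals and drivers' states (alert state and fatigue state). The optimized objective function is as follows:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2, \quad 1 < m < \infty$$
where $x_i$ is the i-th data point, $c_j$ is the center of the j-th cluster, $u_{ij}$ is the degree of membership of $x_i$ in cluster j, and m is the fuzziness exponent.
The algorithm of the FCM clustering is as follows [44]:
① Initialize the membership matrix $U^{(0)} = [u_{ij}]$.
② At step k, calculate the cluster centers:
$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$
③ Update the membership matrix from $U^{(k)}$ to $U^{(k+1)}$:
$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{2/(m-1)}}$$
④ If $\lVert U^{(k+1)} - U^{(k)} \rVert < \varepsilon$, stop. Otherwise, go back to step ②.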
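A minimal pure-Python sketch of FCM clustering is given here for illustration. It assumes the standard Bezdek formulation (membership-weighted centers, inverse-distance membership update, stop on small membership change); the function name `fuzzy_c_means`, the fuzziness exponent m=2, the tolerance, and the six sample points are hypothetical and are not taken from the experiments.

```python
import random

def fuzzy_c_means(points, c=2, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Return (centers, memberships) for a list of equal-length tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random membership matrix whose rows sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Step 3: update memberships from the distances to the centers.
        new_u = []
        for p in points:
            dist = [max(sum((p[d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                    for ctr in centers]
            new_u.append([1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
                          for j in range(c)])
        # Step 4: stop when the largest membership change is below eps.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < eps:
            break
    return centers, u

# Hypothetical normalized (ApEn_EEG, ApEn_EMG) pairs: three high, three low.
pts = [(0.90, 0.80), (0.85, 0.90), (0.95, 0.85),
       (0.10, 0.20), (0.15, 0.10), (0.20, 0.15)]
centers, memberships = fuzzy_c_means(pts)
labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` sums to 1, so a row can be read as the graded assignment of one sample to the two clusters, which is how a transition zone between alert and fatigue can appear.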

C. MAHALANOBIS DISTANCE
The Mahalanobis distance is a measure of the distance between a point x and a distribution Q, introduced by Mahalanobis in 1936 [45]. The Mahalanobis distance is a multi-dimensional measurement of how many standard deviations away x is from the mean of Q. The distance is zero if x is at the mean of Q and grows as x moves away from the mean along each principal component axis. If each of these axes is rescaled to have unit variance, then the Mahalanobis distance corresponds to the standard Euclidean distance in the transformed space. The Mahalanobis distance is thus unitless and scale invariant and considers the correlation of the data set. In the present work, the Mahalanobis distance was used to discriminate the states (alert or fatigue) of drivers in a driving task. The algorithm of Mahalanobis distance is listed as follows [46].
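First, choose training samples. $x_1^{(1)}, x_2^{(1)}, \ldots, x_{n_1}^{(1)}$ are the training samples of the distribution of the alert state ($Q_1$), and the number of samples is $n_1$; $x_1^{(2)}, x_2^{(2)}, \ldots, x_{n_2}^{(2)}$ are the training samples of the distribution of the fatigue state ($Q_2$), and the number of samples is $n_2$. Then, the mean vector $\mu_i$ and the covariance matrix $\Sigma_i$ of each distribution $Q_i$ are estimated from its training samples, and the squared Mahalanobis distance from a sample x to $Q_i$ is $d^2(x, Q_i) = (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i)$. Then, the discriminant function is built from the distances of x to $Q_1$ and $Q_2$. The discriminant result assigns x to the state whose distribution is nearer in Mahalanobis distance.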
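The following sketch illustrates Mahalanobis distance discrimination for two features. It is an illustration under assumptions, not the model of this paper: each class uses its own sample covariance matrix, the sample is assigned to the nearer class, and the function names and training values are hypothetical.

```python
def mean_and_cov(samples):
    """Sample mean and 2x2 sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be non-zero for the inverse to exist
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Closed-form inverse of the symmetric 2x2 covariance matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def classify(x, alert_samples, fatigue_samples):
    """Assign x to the state with the smaller Mahalanobis distance."""
    d_alert = mahalanobis_sq(x, *mean_and_cov(alert_samples))
    d_fatigue = mahalanobis_sq(x, *mean_and_cov(fatigue_samples))
    return "alert" if d_alert <= d_fatigue else "fatigue"

# Hypothetical (ApEn_EEG, ApEn_EMG) training samples for Q1 and Q2.
alert = [(0.80, 0.70), (0.90, 0.75), (0.85, 0.80), (0.75, 0.72)]
fatigue = [(0.30, 0.20), (0.35, 0.30), (0.25, 0.28), (0.40, 0.22)]
```

For example, `classify((0.82, 0.74), alert, fatigue)` returns `"alert"` because the point lies almost at the mean of the alert samples. If both classes shared one pooled covariance matrix, the same rule would reduce to a linear function of the two features.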

IV. DETERMINATION OF THE LOCATIONS OF DATA ACQUISITION
Based on clinical observation, the fatigue of the neck muscles of a driver occurs most commonly at the lower segment, i.e., the 6th neck vertebra (abbreviated as C6) and the 7th neck vertebra (abbreviated as C7), in driving tasks. In the present work, (1) the mechanical analysis of the upper trapezius at C6 and C7 was carried out first, and a biomechanical model of the neck muscles was built based on the theory of distortion energy density (DED); (2) the results based on DED were then verified by driving experiments. This procedure can provide a theoretical foundation for determining internationally recognized standard locations for neck EMG data acquisition.

A. FUNDAMENTAL ANALYSIS BASED ON DED
According to biomechanical principles, the mechanics between the head and neck can be simplified as a variable-section cantilever (Figure 2). Based on the mechanical analysis of the neck muscles of the driver in Figure 2, the neck is subjected to uniaxial tensile yield. Then, the biomechanical model of a driver shown in Eq. (16) can be obtained. In Eq. (16), the neck cross section is assumed to be an approximate ellipse, and $D_A$ and $D_B$ are the longer axis and shorter axis of the ellipse, respectively. Then, the stress at any neck section can be calculated based on Eq. (16). From the physiological structure, it can be seen that both the longer axis and the shorter axis at C6 are shorter than those at C7. In Eq. (16), M(x) is the neck bending moment, and it can be calculated by the moment equation (Eq. (17)),
$$M(x) = G \cdot X + \frac{q L^{2} \cos\theta}{2} \tag{17}$$
where G is the neck load (basically the weight of the head), X is the neck arm, L is the length of the uniform neck load, and q is the load factor. According to the physiological structure of the human body, the neck arms at C6 and C7 are approximately equal, $X_{C6} \approx X_{C7}$. Therefore, the bending moments M(x) at C6 and C7 are approximately equal too ($M(C6) \approx M(C7)$). $M_T(x)$ is the neck torque, and it can be calculated by a torque equation. The values of $M_T(x)$ at C6 and C7 are approximately equal too ($M_T(C6) \approx M_T(C7)$), because the external couples T at C6 and C7 are equal.
Then, according to Eq. (16), because the bending moment and torque are approximately equal at the two levels while the cross section at C6 is smaller, the stress at C6 is higher than the stress at C7. Therefore, the neck muscle at C6, being exposed to a higher stress level for a long time, is more easily fatigued than that at C7, and the upper trapezius at C6 is more sensitive to driver fatigue.

B. VERIFICATION BY EXPERIMENTS
To verify the DED-based results, the ApEns from the upper trapezius at C6 and C7 were compared. The locations of data acquisition are shown in Figure 3, and the values of ApEn at C6 and C7 are shown in Figure 4. From Figure 4, we can conclude that the ApEns decrease with driving time at both C6 and C7. At an early time (0-30 min), the drivers had no fatigue, and there was basically no difference between C6 and C7. However, at a later time (90-120 min), the drivers were fatigued, and the values of ApEn at C6 were much lower than those at C7, indicating that the muscle at C6 is more sensitive than the muscle at C7. This experimental result is consistent with the DED-based analysis and therefore verifies it. Then, based on the above DED analysis and experimental results, the locations 2 cm on both sides of the upper trapezius at the 6th neck vertebra (C6) were selected for data acquisition of the neck EMG signal. The reference electrode was located at C7. The detailed locations of the electrodes are shown in Figure 3. Reasonable selection of the locations of EMG data acquisition is one of the important reasons for the high accuracy of the model built in the present work [47]. In addition, based on the literature review [48], the occipital region of the head is sensitive to driver fatigue. Therefore, O1 and O2 in Figure 3 were selected as the locations of EEG data acquisition. The electrodes for RESP data acquisition were located at the abdomen and positioned with an adjustable rubber belt, so that abdominal respiration was recorded.

V. EXTRACTION OF CHARACTERISTIC FEATURE APEN
In the present work, the EEG, EMG, and RESP signals of 12 participants were continuously collected for 120 min during real driving tasks on the highway. The physiological signals were de-noised by Empirical Mode Decomposition (EMD) [49]. For each participant, the characteristic feature ApEn was extracted from 30 s of EEG, EMG, and RESP data at 10 min intervals. The variation trends of the average ApEn values of the 12 participants are shown in Figure 5 for EEG, EMG, and RESP. The ApEns from the three physiological signals decrease with increasing driving time, indicating that the degree of fatigue increases. After approximately 90 min, the rate of decrease in ApEn becomes slow, indicating deeper driver fatigue. In general, the ApEn from EEG reflects the degree of order of the mutual adjustment of the nervous system. The reason for the decrease in ApEn from EEG is that the ability to self-adjust becomes weaker with increasing driving time [50]. Similarly, the muscles gradually become tense and stiff, and the ability to control motor units also becomes weaker. This is the reason for the decrease in ApEn from EMG with increasing driving time [51]. The circumstances encountered in a real driving task vary, so it is inevitable that the characteristic features fluctuate. However, the regularity of the decreasing trend is clear. Therefore, the ApEns from physiological signals are both discriminative and stable in evaluating driver fatigue. The variation trend of the averaged subjective scores of the 12 participants during the driving tasks is shown in Figure 6. The scores increase with driving time. From 0-30 min, the scores are almost 0, indicating no fatigue, and the drivers are in an alert state. From 30-90 min, the scores increase markedly, indicating the transition from an alert state to a fatigue state. From 90-120 min, the scores are mostly higher than 5, indicating that the drivers are in a fatigue state.
Then, based on (1) the ApEn values in Figure 5, (2) the subjective feedback in the questionnaire from participants in Figure 6, and (3) the literature reviews in [52] and [53], the period from 0~30 min of a real driving task is defined as the alert state, and the period from 90~120 min is defined as the fatigue state in the present work.
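The sampling scheme and the state definition above can be written down as a short sketch. The 200 Hz sampling frequency, 30 s windows, 10 min intervals, 120 min duration, and the 0~30 min and 90~120 min periods come from the text; the function names, the assumption that each window starts exactly at the beginning of its 10 min interval, and the treatment of the boundary minutes are hypothetical.

```python
FS_HZ = 200          # sampling frequency from the experiment design
WINDOW_S = 30        # length of each analysis window in seconds
INTERVAL_MIN = 10    # one window is taken every 10 min
DURATION_MIN = 120   # total driving time

def window_bounds(fs=FS_HZ, window_s=WINDOW_S,
                  interval_min=INTERVAL_MIN, duration_min=DURATION_MIN):
    """(start, stop) sample indices of each window.

    Assumption: each window starts at the beginning of its interval.
    """
    bounds = []
    for start_min in range(0, duration_min, interval_min):
        start = start_min * 60 * fs
        bounds.append((start, start + window_s * fs))
    return bounds

def state_of(minute):
    """Label a driving time by the periods defined in the text."""
    if minute < 30:
        return "alert"
    if minute >= 90:
        return "fatigue"
    return "transition"

bounds = window_bounds()
```

Under these assumptions each window holds 6000 samples per channel, which is the length of the series passed to the ApEn calculation.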

VI. DISCUSSION OF CHARACTERISTIC FEATURE APEN
In the present work, first, two analysis methods were used to discuss the discriminant effectiveness of characteristic features of ApEns from physiological signals, including (1) the distribution of ApEn in 3D space and (2) principal component analysis. Then, a combination analysis by fuzzy C-means (FCM) clustering was carried out.

A. DISTRIBUTION OF APEN IN 3D SPACE
Five minutes of data from each of the alert state and the fatigue state of the 12 participants were selected for analysis: 10~15 min for the alert state and 105~110 min for the fatigue state. The characteristic features, namely the ApEns from EEG, EMG, and RESP, were extracted every 30 seconds. In total, there are 720 ApEn measurements (2 states × (5 minutes/30 seconds) × 3 signals × 12 participants), of which half are from the alert state and the other half are from the fatigue state.
The 3D distribution of states is given in Figure 7. In the figure, the coordinate axes are the values of the normalized ApEns from EEG, EMG, and RESP. One can conclude that the ApEns from EEG and EMG separate the two states more regularly than those from RESP. For example, the alert state is distributed mainly in the higher value range of the ApEns from EEG and EMG, and the fatigue state is distributed mainly in the lower value range of the ApEns from EEG and EMG. However, along the RESP axis, the alert state and fatigue state are distributed over the whole value range, indicating that the ApEns from EEG and EMG have a better degree of discrimination than those from RESP.

B. PRINCIPAL COMPONENT ANALYSIS
In this section, the discriminant efficiencies of the ApEns from the three physiological signals were compared by principal component analysis (PCA). By PCA, three principal components (P1, P2, and P3) are obtained, and their contribution rates are shown in Figure 8. The contribution rates of the first two principal components (P1 and P2) are 51.24% and 37.51%, respectively. The cumulative contribution rate is up to 88.75% (more than 85%). Therefore, most of the useful information is in P1 and P2, and these two principal components are sufficient to evaluate driver fatigue. However, the contribution rate of P3 is only 11.25%. Therefore, most of the information in P3 is redundant, and it will not be considered in the following discussion.
Equation 18 gives the coefficient matrix of principal components, where X1, X2, and X3 represent the ApEns from EEG, EMG, and RESP, respectively. The coefficients represent the weight of each characteristic feature in a principal component, and they also reflect the discriminant efficiencies of the features used to evaluate driver fatigue. In P1 and P2, the coefficients of X1 and X2 are obviously higher than the coefficient of X3, indicating that the discriminant efficiencies of the ApEns from EEG and EMG are better than that from RESP. Therefore, the ApEns from EEG and EMG should be selected as independent variables in building the discriminant model of driver fatigue, and the ApEn from RESP can be ignored.
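The selection of principal components by cumulative contribution rate can be checked with a short sketch. The three contribution rates and the 85% threshold are taken from the text; the function name `components_to_keep` is hypothetical, and the sketch starts from the reported rates rather than recomputing the PCA itself.

```python
def components_to_keep(rates_percent, threshold=85.0):
    """Smallest number of leading components whose cumulative rate reaches threshold.

    rates_percent must be sorted from the largest contribution to the smallest.
    Returns (number_of_components, cumulative_rate_in_percent).
    """
    cumulative = 0.0
    for k, rate in enumerate(rates_percent, start=1):
        cumulative += rate
        if cumulative >= threshold:
            return k, cumulative
    # Threshold never reached: keep everything.
    return len(rates_percent), cumulative

# Contribution rates of P1, P2, and P3 reported in this section.
k, cumulative = components_to_keep([51.24, 37.51, 11.25])
```

With the reported rates, P1 alone (51.24%) is below the threshold and P1 plus P2 reaches it, which is why P3 is dropped.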

C. COMBINATION ANALYSIS BY FUZZY C-MEANS CLUSTERING
In this section, with the aim of obtaining the optimum signal combination, fuzzy C-means clustering was used to analyze different ApEn combinations. The clustering results for the ApEns from EEG-EMG, EEG-RESP, and EMG-RESP are shown in Figure 9 to Figure 11. In these figures, the probability of the alert state is given for each combination of two physiological signals. From Figure 9, one can conclude the following: (1) When both of the ApEn values from EEG and EMG are higher, the probability of an alert state is higher (more than 80%, the yellow zone in the figure). (2) However, when both of the ApEn values from EEG and EMG are lower, the probability of the alert state is lower (less than 20%, the blue zone in the figure), and the probability of the fatigue state is correspondingly higher. (3) For other ApEn values from EEG and EMG, the probability of an alert state is 20~80%, which should correspond to a transition state. (4) In general, in the case of the EEG-EMG combination, there is a clear boundary between the probabilities of an alert state and a fatigue state, which is the advantage of this combination. Therefore, the EEG-EMG combination can effectively reflect the fatigue state (or alert state) of drivers during the driving tasks. However, in the cases of the EEG-RESP combination and the EMG-RESP combination (Figure 10 and Figure 11), there is no clear boundary between the probabilities of an alert state and a fatigue state, which makes it difficult to discriminate these two states.

A. BUILDING THE DISCRIMINANT MODEL
According to the above discussions, the ApEns from EEG and EMG better discriminate between the alert state and the fatigue state. Therefore, these two characteristics (ApEns) are selected as independent variables in the discriminant model of driver fatigue. Among the characteristics, 120 sets of data in the alert state were defined as Q1, and 120 sets of data in the fatigue state were defined as Q2. Then, based on Mahalanobis distance theory, a mathematical discriminant model of driver fatigue was built, which is shown in Eq. 19. In the equation, $\mathrm{ApEn}_{EEG}$ is the ApEn from EEG, $\mathrm{ApEn}_{EMG}$ is the ApEn from EMG, and the coefficients are obtained by calculation according to the coefficient matrix in Eq. 13.
In the present work, 10-fold cross validation was used to verify the accuracy of the model. There were 120 sets of data (both the alert state and the fatigue state) divided into 10 groups. In turn, 9 groups were used as the training group, and the remaining group was used as the testing group. The results of 10-fold cross validation are shown in Figure 12. One can conclude that the average accuracy of the testing group is up to 90.92%. In the case of models built on a single physiological signal, the accuracy is only approximately 83.23% (single EEG), 75.60% (single EMG), and 43.66% (single RESP). Therefore, the discriminant model of driver fatigue built in the present work is accurate, and the model using multiple physiological signals is better than that using a single source signal.
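The 10-fold cross validation procedure can be sketched as follows. This is an illustration only: the fold assignment by shuffling, the function names, and the synthetic data are hypothetical, and a simple nearest-class-mean rule stands in for the Mahalanobis discriminant model so that the block stays short and self-contained.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(samples, labels, fit, predict, k=10, seed=0):
    """Mean test accuracy over k folds; each fold is the test group once.

    Assumes len(samples) >= k so that no fold is empty.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(samples), k, seed):
        held_out = set(test_idx)
        train_x = [s for i, s in enumerate(samples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = fit(train_x, train_y)
        hits = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k

# Stand-in classifier (NOT the paper's model): nearest class mean.
def fit_centroids(xs, ys):
    model = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        model[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return model

def predict_centroid(model, x):
    return min(model, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(x, model[lab])))

# Synthetic, well-separated (ApEn_EEG, ApEn_EMG) pairs for demonstration.
samples = ([(0.80 + 0.01 * i, 0.80 - 0.01 * i) for i in range(10)]
           + [(0.20 + 0.01 * i, 0.20 - 0.01 * i) for i in range(10)])
labels = ["alert"] * 10 + ["fatigue"] * 10
accuracy = cross_validate(samples, labels, fit_centroids, predict_centroid)
```

Passing the classifier as `fit` and `predict` callables keeps the validation loop independent of the model, so the same loop could be reused to compare the combined EEG-EMG model with the single-signal models.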

B. DISCUSSION OF THE ACCURACY OF THE MODEL
To better understand the reason for the accuracy of the model, a statistical analysis (paired t-test) of the ApEns from the physiological signals was performed. The testing results are shown in Table 1. The P values of the ApEns from EEG and EMG are below 0.05, which indicates that the differences between the alert state and the fatigue state are statistically significant when EEG and EMG are used to discriminate driver fatigue. The P value of the ApEn from RESP is above 0.05, which indicates that the difference between the alert state and the fatigue state is not significant when RESP is used to discriminate driver fatigue. This supports the conclusion that the ApEns from EEG and EMG have a better degree of discrimination for the state of drivers than that from RESP. Therefore, the reasonable selection of the locations of data acquisition for EMG and the better degree of discrimination of EEG and EMG are the reasons for the high accuracy of the discriminant model of driver fatigue built in the present work.
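The paired t-test compares each participant's ApEn in the alert state with the same participant's ApEn in the fatigue state. The sketch below computes only the t statistic and the degrees of freedom with the Python standard library; the function name and the four value pairs are hypothetical and are not data from Table 1. Converting t to a P value requires the t-distribution, for which a statistics package such as SciPy (`scipy.stats.ttest_rel`) could be used.

```python
import math
import statistics

def paired_t_statistic(alert_values, fatigue_values):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    diffs = [a - f for a, f in zip(alert_values, fatigue_values)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Hypothetical per-participant ApEn values in the two states.
alert_apen = [0.80, 0.82, 0.78, 0.85]
fatigue_apen = [0.60, 0.65, 0.55, 0.70]
t_value, dof = paired_t_statistic(alert_apen, fatigue_apen)
```

For these illustrative values the statistic is about 10.7 with 3 degrees of freedom, well above the two-sided 5% critical value of 3.182, so the difference would be judged significant.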

VIII. CONCLUSIONS
The main findings in this paper can be summarized as follows.
(1) Based on DED theory and the experimental results, the upper trapezius at the 6th neck vertebra is more easily fatigued during a real driving task and more sensitive to driver fatigue than that at the 7th neck vertebra. Therefore, the locations 2 cm on both sides of the 6th vertebra were selected for data acquisition of the neck EMG signal.
(2) The ApEns from EEG, EMG, and RESP signals decrease with increasing driving time, indicating that the degree of fatigue increases. After approximately 90 min, the rate of decrease in ApEn becomes slow, indicating deeper driver fatigue. According to 3D analysis and principal component analysis, the ApEns from EEG and EMG have a better degree of discrimination than that from RESP. According to fuzzy C-means clustering analysis, the EEG-EMG combination effectively reflects the fatigue state (or alert state) of drivers.
(3) The ApEns from EEG and EMG were selected as independent variables, and a discriminant model of driver fatigue based on the Mahalanobis distance theory was built. The accuracy of the model is up to 90.92% by 10-fold cross validation. The reasons for the high accuracy are the reasonable selection of the locations of EMG data acquisition and better degree of discrimination of EEG and EMG.
The main research contributions of this study are that it provides a theoretical foundation for determining internationally recognized standard locations for data acquisition of the neck EMG signal and that it provides a feasible method for discriminating driver fatigue during a real driving task. However, the driving tasks in the present work were carried out on a highway. If the road conditions are different, driver fatigue may occur at a different time. This remains a challenge in the field of safe driving and is a direction for our future work.