Sound Signal Based Fault Classification System in Motorcycles Using Hybrid Feature Sets and Extreme Learning Machine Classifiers

Abstract: Vehicles generate distinct sound patterns under different working conditions. These sound patterns indicate the condition of the engine, which in turn can be used to diagnose various faults. In this paper, the sound signals produced by motorcycles are analyzed to identify various faults. Important attributes that describe the statistical behavior of the signals are extracted from the sound signals in the time, frequency, and wavelet domains. The various types of faults are then classified from the extracted features using the Extreme Learning Machine (ELM) classifier. Moreover, classification performance improves when feature sets from the different domains are combined. The simulation results demonstrate that the proposed hybrid feature set together with the ELM classifier achieves higher classification accuracy than the other conventional methods.


Introduction
Asia has the world's highest concentration of motorized two- and three-wheeled vehicles. Motorcycles are the most favored means of travel among middle-class citizens because of their low cost, mileage, suitability to street conditions, and ease of mobility. Motorcycles account for more than 75% of total vehicle sales in the Asian countries. The engine is the heart of an automobile and may develop faults if the rider does not regularly check the engine and other parts. When an engine fault occurs, the sound of the engine provides major information about the problem [1]. Vehicles of a particular type produce different sound patterns in different working environments. Each sound pattern is regarded as an acoustic signature. The sounds of moving vehicles give hints about their behavior, such as possible faults, the make, and the performance of subsystems. In such circumstances, tracking faults manually becomes difficult, and automatic acoustic analysis eases the monitoring of vehicle condition and its future consequences [2]. Problematic sounds may be produced by hydraulic tappet failure, a float needle problem, a high acceleration problem, an air screw problem, alteration of the silencer, an air filter problem, etc. With the rapid development of transportation technology, fault detection and diagnosis in motorcycles based on audio signatures has become a major issue in day-to-day life. Fault diagnosis can be carried out once the fault is detected properly.
Anami et al. [3] employed an acoustic signal-based recognition technique for the detection of faults in motorcycles using time domain-based features and ANN classifiers. However, the training time required for this type of classifier was found to be long. Paulraj et al. [4] adopted an autoregressive modeling algorithm for extracting features from vehicle sound signals. However, the authors focused only on finding the distance between the vehicles and the type of moving vehicle. Pan et al. [5] introduced a fault diagnosis method using the combination of entropy and mutual information as features derived from vibration signals. Time domain-based features alone do not provide complete information, because much of the information may be embedded in the frequency content of the signal. For feature extraction, frequency domain-based signal processing techniques such as the Discrete Fourier Transform (DFT) and the Short Time Fourier Transform (STFT) are used for detecting faults in automobiles [6]. Anami et al. [7] adopted spectral techniques that exploit the variations in spectral behavior together with the chain code of the pseudo spectrum of the sound signal. Zhou et al. [8] adopted Fourier Transform based Mel-Frequency Cepstral Coefficients (MFCC) for recognizing faults. Since audio signals are non-stationary in nature, Fourier transform based methods do not provide appropriate results because of their fixed window size. DFT and STFT are therefore practical only for stationary signals. The Wavelet Transform (WT) is localized in both the time and frequency domains, whereas the Fourier Transform (FT) is localized only in the frequency domain. Generally, by using the Wavelet Transform, the frequency components can be separated based on the number of levels. The number of sub-bands grows with the number of levels of decomposition.
In the Wavelet Transform, the various frequency components can be separated, stored, and further processed for classification. The wavelet domain with its various sub-bands provides more detailed frequency information than Fourier Transform based analysis. Moreover, the Wavelet Transform is well suited to the analysis of non-stationary signals, whereas the Fourier Transform is suitable only for the analysis of stationary signals. Since most signals, such as speech, sound, music, and biomedical signals, are non-stationary in nature, WT based methods are used for their analysis. Furthermore, the local behavior of a signal can be studied better with the Wavelet Transform than with the Fourier Transform [10].
The Wavelet Transform (WT) with variable window size is used for obtaining features from non-stationary audio signals. Because of their inherent multi-resolution nature, wavelet coding schemes are especially suitable for applications where scalability and tolerable degradation are important. The Wavelet Transform decomposes the signal into a set of basis functions [9]. These basis functions are called wavelets. Wavelets are obtained from a single prototype wavelet, called the mother wavelet, by dilation and shifting. The WT is computed separately for different segments of the time-domain signal at different frequencies. In WT based methods, however, only the low-frequency components, called approximations, are considered for further analysis. This may cause problems, since important information may be located in the high-frequency components. This problem can be overcome by applying the Wavelet Packet Transform (WPT), in which the signal under consideration is divided into both low-frequency components (approximations) and high-frequency components (details), and both are considered for analysis. Wang et al. [10] employed a Wavelet Packet Transform (WPT) based approach in which the distribution of energies in the first five sub-bands of the wavelet packet decomposition levels is used as features. A two-stage Artificial Neural Network (ANN) classifier is deployed for the recognition of faults. However, the computation time required was long and the classification rate obtained was low. Berredjem et al. [11] used wavelet packet coefficients to extract features from faulty bearings. Huseyin et al. [12] adopted wavelet packet analysis to extract features from sound signals recorded from cars and used multilayer perceptron classifiers for classification. Despite the advantages of the feature sets in each of the three domains, the classification accuracy attained was still low; it can be improved further by combining the feature sets of the three domains.
This work mainly focuses on the application of hybrid feature sets.
Classification is one of the major tasks in fault identification in vehicles. Many authors have suggested Feed Forward Neural Network (FFNN) classifiers. Nevertheless, such classifiers yielded poor classification rates and Mean Square Error (MSE). ELM is a comparatively novel learning technique used for fault classification. ELM models are highly adaptable and are much faster to train than standard neural network models. The main idea behind ELM is the projection of the original input into a high-dimensional feature space. Its distinguishing feature is that this new space is entirely fixed before observing the data. Thus, the actual learning consists of a simple linear regression that can be computed efficiently in closed form. In this paper, the classification of faults in motorcycles based on acoustic signals using hybrid feature sets and ELM is addressed. The rest of the paper is organized as follows: Section 2 discusses the proposed methodology. Feature extraction in the three domains is discussed in Section 3. Results and discussion are presented in Section 4, followed by the conclusion in Section 5.

Proposed Methodology
The objective of this study is to classify different faults in motorcycles based on their sound signals. The sound signals generated by the vehicles are, in practice, in the time domain. However, the most distinguishing information may be hidden in the frequency content of the signal. The proposed methodology for fault detection incorporates the advantages of time, frequency, and wavelet domain-based features. Fig. 1 illustrates the block diagram of the proposed framework. The important stages are signal preprocessing, feature extraction, and classification.
Initially, the signals undergo preprocessing, which includes low-pass filtering and normalization. The low-pass filter limits the frequency content of the signal to below a predefined cutoff, and normalization brings all the signal values to a common scale. The appropriate features are then extracted from the preprocessed signal. The feature vectors reflect the statistical characteristics of the signal in the time, frequency, and wavelet domains. The Extreme Learning Machine (ELM) is used for classifying the faults. The classification performance is analyzed using different combinations of feature sets. The following sections describe each stage in detail.
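The preprocessing stage described above can be illustrated with a minimal sketch. The paper does not state the filter type, cutoff, or normalization rule, so the single-pole recursive filter, the smoothing factor `alpha`, and peak normalization to [-1, 1] used here are assumptions chosen only for illustration; `low_pass` and `peak_normalize` are hypothetical helper names.

```python
def low_pass(signal, alpha=0.2):
    """Single-pole low-pass filter: y[n] = alpha*x[n] + (1 - alpha)*y[n-1].

    alpha is an assumed smoothing factor; smaller values give a lower cutoff.
    """
    if not signal:
        return []
    out = []
    prev = signal[0]  # start from the first sample to avoid a startup jump
    for x in signal:
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


def peak_normalize(signal):
    """Scale the signal so that its largest absolute value becomes 1."""
    peak = max((abs(x) for x in signal), default=0.0)
    if peak == 0.0:
        return list(signal)  # an all-zero signal is returned unchanged
    return [x / peak for x in signal]


def preprocess(signal, alpha=0.2):
    """Low-pass filter followed by peak normalization."""
    return peak_normalize(low_pass(signal, alpha))
```

A rapidly alternating input is strongly attenuated by `low_pass`, while a constant input passes through unchanged, which is the behavior expected of a low-pass stage.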

Feature Extraction
The extraction of features from the input signals is vital in determining the accuracy of classification. Feeding all the coefficients of the input signal to the classifier may increase the computational load and decrease the classification rate. Thus, it is necessary to reduce the size of the input by extracting the relevant characteristics of the signal.

Time Domain-Based Features
The time-domain features give basic information about data stored in a one-dimensional format [13]. The important statistical features extracted from the audio signals are explained below, where n is the number of samples.
The variance of a signal uses the power of the signal as a feature. Generally, the variance is the mean value of the square of the deviation of the variable from its mean. Since the mean of the signal is close to zero [18], the variance approximately represents the mean power of the signal.
(c) Kurtosis (K): Kurtosis is a statistical measure that describes how strongly the tails of a distribution differ from the tails of a normal distribution. Fig. 2 shows the curves for the normal distribution and kurtosis.
The kurtosis of a distribution is defined as

$$K = \frac{E\left[(x(n) - \mu)^4\right]}{\sigma^4}$$

where $\mu$ is the mean of x(n), $\sigma$ is the standard deviation of x(n), and $E[\cdot]$ represents the expected value of the quantity in brackets.
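As an illustration of the statistical time-domain features, the following sketch computes the mean, variance, and kurtosis of a sample sequence. It assumes the population (1/n) normalization and the non-excess form of kurtosis (a normal distribution gives a value near 3); the paper does not state which conventions were used, so these are assumptions.

```python
def mean(x):
    """Arithmetic mean of the samples."""
    return sum(x) / len(x)


def variance(x):
    """Population variance: mean squared deviation from the mean."""
    mu = mean(x)
    return sum((v - mu) ** 2 for v in x) / len(x)


def kurtosis(x):
    """Fourth central moment divided by the squared variance (non-excess)."""
    mu = mean(x)
    var = variance(x)
    if var == 0.0:
        return 0.0  # kurtosis is undefined for a constant signal; 0 is a placeholder
    fourth = sum((v - mu) ** 4 for v in x) / len(x)
    return fourth / (var ** 2)
```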

Figure 1: Acoustic signal-based fault classification system

(d) Entropy: The average rate at which information is produced by a stochastic source of data is called entropy [14]. The information associated with each possible data value is the negative logarithm of the probability mass function for that value, so the entropy is

$$H = -\sum_{i} p_i \log_2 p_i$$

where p contains the normalized sample counts.
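A histogram-based entropy estimate can be sketched as follows. The number of bins (`bins=16`) and the base-2 logarithm are assumptions made for illustration; the paper only states that p contains normalized counts.

```python
import math


def histogram_entropy(signal, bins=16):
    """Shannon entropy (in bits) of the amplitude histogram of a signal."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return 0.0  # a constant signal carries no information
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in signal:
        # The maximum value falls on the upper edge, so clamp it into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(signal)
    # Empty bins contribute nothing, since p*log(p) tends to 0 as p tends to 0.
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
```

Four samples that each fall into their own bin give p = 0.25 for every bin and therefore an entropy of 2 bits.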
Some features describe the structure of the envelope of the signal. They are the zero-crossing (Z) and the waveform length (L).
(e) Zero-Crossing (Z): A zero-crossing is a point where the sign of a function changes (e.g., from positive to negative), represented by an intercept of the axis (zero value) in the graph of the function. Thus, the zero-crossing rate is the rate of sign changes along a signal, i.e., the rate at which the signal changes from positive to negative or vice versa [15].
Here, M is the total number of samples in a processing window and x(m) is the value of the m-th sample.
(f) Waveform Length (L): It is the cumulative length of the waveform over the segment. The resulting value provides a measure of waveform amplitude, frequency, and duration, all within a single parameter [12]:

$$L = \sum_{i=1}^{N-1} \left| x_{i+1} - x_i \right|$$

where $x_i$ is the value of the i-th sample of the segment and N is the length of the segment.
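The two envelope features can be sketched in a few lines. The convention that a sample equal to zero counts as non-negative, and the division by the number of adjacent sample pairs for the rate, are assumptions; other zero-crossing definitions differ in these details.

```python
def zero_crossings(x):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(x) < 2:
        return 0.0
    return zero_crossings(x) / (len(x) - 1)


def waveform_length(x):
    """Cumulative absolute difference between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))
```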

Frequency Domain-Based Features
The properties of the signal can also be described by its frequency content. The spectrum of frequencies of the signal can be visualized and analyzed using the audio spectrogram. Spectrograms are occasionally called sonographs [16]. The estimated spectrogram for a particular fault is shown in Fig. 3. The important features extracted from the spectrogram are the mean frequency and the median frequency.
(a) Mean Frequency (C): The mean frequency of an audio spectrum is calculated as the sum of the products of the audio spectrogram intensity (in dB) and the frequency, divided by the total sum of the audio spectrogram intensity. It can be expressed as follows:

$$C = \frac{\sum_{i=1}^{n} f_i I_i}{\sum_{i=1}^{n} I_i}$$

where $f_i$ is the frequency of the i-th spectral bin, $I_i$ is the corresponding intensity, and n is the number of bins.
(b) Median Frequency: It is the frequency at which the audio spectrum is divided into two regions with equal integrated power [19]. It can be expressed as

$$\sum_{i=1}^{m} I_i = \sum_{i=m}^{n} I_i = \frac{1}{2}\sum_{i=1}^{n} I_i$$

where $f_m$ is the median frequency.
Figures 4a and 4b show the mean frequency and median frequency estimates of the signal.
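The mean and median frequency can be sketched on a single analysis frame as follows. This is a simplified illustration, not the spectrogram-based procedure of the paper: it uses the linear power of one direct DFT rather than dB intensities averaged over time, and the names `power_spectrum`, `mean_frequency`, and `median_frequency` are hypothetical. The direct DFT is O(n^2) and is only meant for short frames.

```python
import cmath
import math


def power_spectrum(x, fs):
    """One-sided power spectrum of a frame via a direct DFT."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(s) ** 2)
    return freqs, power


def mean_frequency(freqs, power):
    """Power-weighted average frequency."""
    total = sum(power)
    if total == 0.0:
        return 0.0
    return sum(f * p for f, p in zip(freqs, power)) / total


def median_frequency(freqs, power):
    """Lowest bin frequency at which the cumulative power reaches half the total."""
    half = sum(power) / 2.0
    acc = 0.0
    for f, p in zip(freqs, power):
        acc += p
        if acc >= half:
            return f
    return freqs[-1]
```

For a pure tone that falls exactly on a DFT bin, both features coincide with the tone frequency.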

Wavelet Packet Transform (WPT)
The Wavelet Packet Transform provides a multi-resolution and sparse representation of the audio signal. Wavelet packets are more sensitive to small changes in the audio signal. In the DWT, each level of decomposition is evaluated by passing only the preceding wavelet approximation coefficients through discrete-time low-pass and high-pass quadrature mirror filters. In the wavelet packet decomposition, by contrast, both the detail and the approximation coefficients are decomposed, creating the full binary tree. For n levels of decomposition, $2^n$ sets of coefficients are produced in WPT [17]. The energy values distributed in the approximation and detail coefficients of the wavelet packet sub-bands are used as features. Fig. 5 shows the wavelet packet decomposition structure.
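Wavelet packets are particular linear combinations of wavelets. They form bases that retain many of the orthogonality, smoothness, and localization properties of their parent wavelets [18]. The packets are given by the following equations:

$$u_{2n}(t) = \sqrt{2}\sum_{k=0}^{m} h(k)\,u_n(2t - k), \qquad u_{2n+1}(t) = \sqrt{2}\sum_{k=0}^{m} g(k)\,u_n(2t - k)$$

where $n = 0, 1, 2, \ldots$ and $k = 0, 1, 2, \ldots, m$, and $h(k)$ and $g(k)$ are the low-pass and high-pass filter coefficients. The scaling function is $\phi(t) = u_0(t)$.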
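The sub-band energy features of a wavelet packet decomposition can be sketched with the Haar wavelet, the simplest orthonormal choice. The paper does not name the mother wavelet or the decomposition depth, so Haar and the `levels` argument are assumptions for illustration. Sub-bands are returned in natural tree order (approximation before detail at every split), not in increasing frequency order.

```python
import math


def haar_split(x):
    """One Haar analysis step: returns (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_energies(x, levels):
    """Energy of each of the 2**levels terminal nodes of the full packet tree."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            # Unlike the DWT, both branches are split again at every level.
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]
```

Because the Haar filters are orthonormal, the sub-band energies sum to the energy of the input frame, so the feature vector describes how the signal energy is distributed across the sub-bands.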

Algorithm 1 summarizes the process of feature extraction.

Extreme Learning Machine Based Classifier
ELM is a single-hidden-layer Feed Forward Neural Network (FFNN) used for a broad range of nonlinear applications. The ELM structure, comprising three layers (input, hidden, output), is shown in Fig. 6. In this structure, the input weights and hidden-layer biases are assigned arbitrarily, while only the output weights are calculated by the ELM algorithm. The learning time required for ELM is extremely short. Furthermore, the ELM structure has better generalization ability than the conventional FFNN-based learning algorithms [19]. An ELM is a model of the form

$$f(x) = h(x)^T \beta \qquad (11)$$

where $h(x) = [h_1(x), \ldots, h_M(x)]^T$ is referred to as the ELM feature vector, and $\beta$ denotes the vector of expansion coefficients. Eq. (11) represents a network with two layers, where the input is first projected into an M-dimensional space, in which the linear combination is then computed.
The basic ELM framework consists of M hidden nodes and operates with an activation function g(x). The other important characteristics of ELM are the randomly chosen input weights and hidden unit biases [20,21]. With these fixed, the network is trained by finding the least-squares solution, which analytically determines the output weights. The hidden-layer output matrix $H$, the output weights $\beta$, and the target matrix $T$ are related by the following equation:

$$H\beta = T$$

The hidden layer parameters are randomly selected before the training data are presented [22]. ELM achieves not only the minimum training error but also the smallest norm of the output weights. Tab. 1 shows the parameters used for ELM training.
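The training procedure can be sketched in plain Python as follows. This is a minimal illustration rather than the configuration used in the paper: the uniform initialization range, the `tanh` hidden activation, and the small ridge term added to keep the normal equations well conditioned are all assumptions, and the class name `ELM` and helper `solve` are hypothetical. A pseudo-inverse solution corresponds to letting the ridge term tend to zero.

```python
import math
import random


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]  # augmented matrix [A | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


class ELM:
    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Hidden parameters are drawn once and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = None

    def hidden(self, x):
        """ELM feature vector h(x) for one input."""
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, X, T, ridge=1e-6):
        """Least-squares output weights from (H^T H + ridge*I) beta = H^T T."""
        H = [self.hidden(x) for x in X]
        M = len(self.b)
        A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
              for j in range(M)] for i in range(M)]
        B = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(len(T[0]))]
             for i in range(M)]
        self.beta = solve(A, B)

    def predict(self, x):
        """Index of the output node with the largest score."""
        h = self.hidden(x)
        scores = [sum(h[i] * self.beta[i][k] for i in range(len(h)))
                  for k in range(len(self.beta[0]))]
        return scores.index(max(scores))
```

Targets are one-hot rows, so a problem with eight classes would use eight output columns. Only `fit` involves any computation on the training data, and it reduces to a single linear solve, which is why training is fast.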

Acquisition of Sound Samples
The different sound signals of motorcycles (Royal Enfield Bullet) are recorded using a digital voice recorder, under the supervision of experienced mechanics. The acoustic signals are acquired from a motorcycle with a single-cylinder, 346 cc, air-cooled, four-stroke petrol engine, a maximum torque of 28 Nm at 4000 rpm, and a five-speed manual gearbox. The signals are sampled at a frequency of 44.1 kHz. The sampled signals are then low-pass filtered and normalized. The database consists of a total of 4500 samples, comprising 500 samples from healthy motorcycles and 4000 samples from faulty motorcycles, as shown in Tab. 2. The faulty motorcycles produce problematic sounds due to the air filter problem, air screw problem, altered silencer, float needle problem, high acceleration problem, hydraulic tappet failure, and pushrod problem.

(a) Altered silencer problem
Sparks may fire in the silencer. This occurs when the exhaust/silencer is tampered with to achieve better performance. The inner circumference of the silencer becomes coated with exhaust residue, mostly unburnt fuel pushed out of the cylinder. This unburnt fuel is ignited by the hot exhaust fumes along the silencer; it appears as sparks and sounds like gunfire. Fig. 7a shows the altered silencer and the generated sound signal due to the altered silencer problem.

(b) Air filter problem
An air filter is a pleated, multi-layered, oil-impregnated mesh made from surgical cotton or other fabric. It cleans the air going to the intake system of the engine, limiting the foreign particles in that air while reducing its turbulence. A motorcycle with a problem in the air filter produces a different sound; for instance, a high-flow air filter allows more air through than a normal air filter. Fig. 7b shows the air filter and the generated sound signal due to the air filter problem.

Table 2 (fragment): Push rod problem 570 425 145
Figure 7a: Altered silencer and the generated sound signal due to the altered silencer problem

(c) Pushrod problem
A bad pushrod (usually this means it is bent) produces unwanted sound. Fig. 7c shows the pushrod and the generated sound signal due to the pushrod problem.

(d) Hydraulic tappet problem
Damaged hydraulic tappets produce tappet noise. When dirt enters the tappet cylinder, its hydraulic property is lost, resulting in some clearance with the pushrods. As a consequence, unwanted sound may be produced. Fig. 7d shows the hydraulic tappet and the generated sound signal due to the hydraulic tappet problem.

(e) Air screw problem
When the carburetor is out of adjustment, the air/fuel screw and the balance between two or more carburetors need to be adjusted. Incorrect adjustment can produce unwanted sounds.

(f) Float needle problem
The floats in a carburetor are typically made from either brass or plastic. Most of the floats in small engines do not have a metal tang to adjust them. The problems in the float needle produce different sounds.

(g) High acceleration problem
Increasing the speed of the accelerator produces different sounds. This is called a high acceleration problem. Figs. 7g and 7h show the generated sound signals of the bike at normal speed and at high speed.
The pertinent features are extracted from the recorded sound signals after preprocessing. The features are further normalized and randomized. 75% of the dataset is used for training and the remaining 25% are used for testing.
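The normalization, randomization, and 75%/25% split described above can be sketched as follows. Min-max scaling to [0, 1] and the fixed `seed` are assumptions; the paper does not state which normalization was applied to the features, and the helper names are hypothetical.

```python
import random


def min_max_scale(rows):
    """Scale every feature column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
            for row in rows]


def train_test_split(features, labels, train_fraction=0.75, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([features[i] for i in train], [labels[i] for i in train],
            [features[i] for i in test], [labels[i] for i in test])
```

Each feature vector stays paired with its label through the shuffle, since both are indexed by the same permutation.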
These features are given as input to the FFNN and ELM classifiers. In the hidden layer, the hyperbolic tangent sigmoidal activation function is used, and in the output layer, the logistic sigmoidal activation function is used. The total number of output neurons is fixed as 8 and the number of epochs is fixed as   The performance of the network is tested using the test datasets and validated by calculating different performance metrics. A 10-fold cross-validation algorithm is adopted to obtain a more reliable estimate of the test performance.
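The 10-fold cross-validation scheme can be sketched as an index generator. It assumes the samples have already been shuffled, and it assigns samples to folds in an interleaved fashion; the paper does not describe how its folds were formed, so this is only one plausible arrangement, and `k_fold_indices` is a hypothetical name.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Sample i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Every sample is used for testing exactly once, and the reported score would be the average of the k per-fold results.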

Performance Assessment
The performance of the proposed methodology is analyzed by evaluating different performance metrics given in Eqs. (18) to (20).
A classification rate of 98.5% was achieved for the hybrid feature set. Fig. 8 shows the performance of the ELM classifier with hybrid dataset for variable number of hidden nodes.
It is found that the performance improves as the number of hidden nodes is increased, with the best performance achieved at the maximum of 100 hidden nodes.
Thus, from the above discussion and the comparison results, it is concluded that the combination of the feature sets in all three domains, i.e., the time, frequency, and wavelet domains, provides better performance.

Figure 8: Performance of the neural network training

Conclusion
Fault classification using the ELM classifier is proposed for the recognition of various faults in motorcycles. Features from the time, frequency, and wavelet domains are extracted from the sound signals generated by the motorcycles. These feature sets capture important attributes of the signal. High performance is obtained by applying the ELM classifier, owing to its unique characteristics. The proposed methodology is tested in a real environment using sound signals recorded from motorcycles with different faults. Thus, the ELM classifier with hybrid feature sets can be used for predicting, identifying, and classifying various faults in motorcycles at an early stage. By identifying faults automatically, situations such as engine failure, serious accidents, and damage to other components can be avoided. Consequently, the faulty components can be replaced or repaired at an early stage, thereby extending the lifetime of the other engine components. For example, if a problem in the air filter is identified automatically, the filter can be replaced immediately, and replacing the faulty component can considerably improve the mileage of the motorcycle. The experimental results and discussion demonstrate that the proposed system provides better performance than the conventional classification systems.
Funding Statement: The author(s) received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.