Condition Monitoring of Chain Sprocket Drive System Based on IoT Device and Convolutional Neural Network



Introduction
In a mechanical system, the gear train, belt pulley drive, and chain sprocket drive (CSD) are used to transmit power using rotating shafts. When the drive shaft and driven shaft are near each other, a gear drive unit is used. For a location where the distance between the drive shaft and driven shaft is relatively short, a belt pulley drive unit or CSD unit is often used. A CSD system has the advantage of transmitting power with greater force than a belt pulley drive system and does so without slip during the transmission process. Thus, CSDs have been applied in nearly all mechanical industries, such as machine tools, marine and aerospace drives, motorcycles, and the timing systems of automotive engines. A CSD system failure during operation can cause catastrophic human and economic losses. Therefore, it is critical to identify defects in a CSD system before it breaks down. A CSD system generally consists of a chain sprocket unit and its driving system. Damage to the chain sprocket unit is mainly due to roller chain fatigue [1].
The driving system of a chain sprocket unit consists of the electric motor, gears, bearings, and rotating shaft. Damage to a driving system is caused by a defect or failure [2,3] of the bearing supporting the rotating shaft or of the gear tooth, which transmits the driving force to the chain sprocket unit. Several studies have been conducted to evaluate fault diagnosis methods for rotating machinery. For the feature extraction of fault signals, traditional fault diagnosis methods such as the time-average method [4], cepstral analysis [5], pseudo-Wigner-Ville distribution (PWVD) [6], discrete wavelet transform (DWT) [7], higher-order method [8], adaptive line enhancement [9], empirical mode decomposition (EMD) [10], and cyclostationary analysis [10,11] have been used. The continuous wavelet transform (CWT) is a nonorthogonal wavelet transform, whereas the DWT is an orthogonal transform. The Fourier transforms of the mother wavelets used for the CWT and for the DWT differ from each other. The smoothing effect of the CWT is better than that of the DWT in the frequency domain. The DWT is a useful method for the compression or recovery of a signal, but the CWT has the advantage in the time-frequency analysis of a signal to extract fault information. In the PWVD, a poorly chosen kernel function leads to a cross-term problem. EMD is a theoretically incomplete tool and is not useful for analysis in the case of multiple faults. Thus, the CWT has been widely used for time-frequency analysis of fault signals. To classify fault patterns using fault features, k-nearest neighbor algorithms [12], Bayesian classifiers [13], support vector machines (SVMs) [14], and artificial neural networks (ANNs) [15] have been used. Most recently, owing to advances in computational performance and learning algorithms, deep learning approaches that include both feature extraction and classification of fault patterns have been applied in the field of fault diagnosis [16,17] and have become the most popular diagnosis methods [18][19][20][21][22][23].
Most papers on the fault diagnosis of rotating machinery using deep learning techniques have focused on the classification of a specific single component defect, such as the bearing wear pattern [16,18,19,20], gear tooth failure pattern [17,21,22], and unbalanced rotating shaft [23]. These previous studies analyzed and classified the fault pattern of single components such as a tooth, bearing, or rotating shaft of the mechanical system. This paper presents the diagnosis of a CSD system and the multiple classification of fault patterns using a deep learning technique. In this work, CSD system components such as the electric motor shaft, bearings, and gears were artificially damaged and assembled. A total of eight fault-states were created. The signals used for fault diagnosis were the vibrational accelerations measured in the CSD system. An Internet of Things (IoT) sensor was developed for the wireless measurement of fault signals and the real-time transmission of measured data. The developed IoT sensor was first applied in a laboratory test to validate the new method, and it is now being used for fault detection in a chain conveyor system. The IoT device, which measured the acceleration, is newly developed and consists of a wireless microelectromechanical system (MEMS) accelerometer, Bluetooth function, Wi-Fi function, and battery. The wireless MEMS accelerometer is useful in IoT applications because small size and battery-powered operation are typical requirements for IoT sensors [24,25]. For the diagnosis of the fault or normal state using deep learning, image data are essential. Therefore, one-dimensional time data were converted into time-scale image data by the continuous wavelet transform (CWT). The image data include time-frequency information related to the fault types. A convolutional neural network (CNN) was employed for the multiple classification of eight fault-states and one normal state, and the image data were used as the CNN input.
The combination of CWT and CNN has already been introduced in the field of fault detection [26][27][28]. What distinguishes these algorithms from one another is the CNN structure, which differs according to the application area. CNN structure parameters such as the filter size (kernel size), number of layers, and number of inputs and outputs should be determined optimally for successful application to different systems. In this study, the combination of CWT and CNN was also employed for the condition monitoring of the CSD system. A new optimal CNN structure is presented for the diagnosis of multiple faults in the CSD system. Using this CNN, the patterns of eight fault types and a normal type were successfully classified, and their feature maps were well extracted.

Theory of CWT.
The CWT [29] is based upon a family of functions:
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right),$$
where $\psi$ is a fixed function called the "mother wavelet," which is localized in terms of both time and frequency. The function $\psi_{a,b}(t)$ is obtained by applying the operations of shifting in the time domain ($b$-translation) and scaling in the frequency domain ($a$-dilation) to the mother wavelet. The mother wavelet used throughout this study is the Morlet wavelet [30], a complex sinusoid modulated by a Gaussian envelope. Here $\omega_0$ is the center frequency of the mother wavelet when it is transformed to the frequency domain, and $B$ is the bandwidth, defined as the variance of the Fourier transform $\hat{\psi}(f)$ of the Morlet wavelet, where $f$ indicates the frequency and $*$ denotes the complex conjugate. The CWT of a signal $x(t)$ is defined by
$$W(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi^{*}_{a,b}(t)\,dt,$$
where $\psi^{*}(\cdot)$ is the complex conjugate of $\psi(\cdot)$ and the function $x(t)$ satisfies the condition
$$\int_{-\infty}^{\infty} |x(t)|^{2}\,dt < \infty.$$
Here, $\psi_{a,b}(t)$ plays a role analogous to that of $e^{j\omega t}$ in the definition of the Fourier transform. If the mother wavelet $\psi(t)$ satisfies the admissibility condition
$$C_{\psi} = \int_{-\infty}^{\infty} \frac{|\hat{\psi}(f)|^{2}}{|f|}\,df < \infty,$$
then the inverse wavelet transform can be obtained by
$$x(t) = \frac{1}{C_{\psi}} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} W(a,b)\,\psi_{a,b}(t)\,\frac{da\,db}{a^{2}}.$$

Theory of CNN.
Artificial neural networks have been widely used for the prediction and classification of sound and vibration signals. Before deep neural networks (DNNs) were introduced, ANNs with shallow neural network (SNN) structures were used [31][32][33]. An SNN uses a supervised training process with a feature vector. However, it is difficult to extract the system fault features if the dynamic system characteristics are unknown. Therefore, the DNN structure was developed for feature extraction [34]. A CNN is one of the DNN structures [35], as shown in Figure 1. A CNN uses an unsupervised training process and feature maps related to fault information, which are extracted through several stages of convolutional and pooling layers. The neurons in a CNN are arranged in the form of feature maps. The input to a convolutional layer in the CNN is the image $x$ of size $m \times n$.
The convolutional layer contains $f$ filters (kernels) of size $r \times s$, which have smaller dimensions than the input image. With a stride of one pixel, the output of the convolutional layer is a set of $f$ feature maps of size $(m - r + 1) \times (n - s + 1)$. Each filter assigns a weight $f_{ij}$ to each pixel it covers in the input image and computes a weighted sum, thereby extracting certain features contained in the image. An additive bias is then added to the weighted sum, and the result is passed through a nonlinear function to obtain a pixel in the convolutional map. Traditionally, sigmoid and hyperbolic tangent functions were used; recently, rectified linear units [35] have become popular. The activation output $y_j^{l}$ of a particular feature map $j$ in the convolutional layer $l$ is given as
$$y_j^{l} = \phi\!\left(\sum_{i \in Z_j^{l}} y_i^{l-1} \otimes f_{ij}^{l} + b_j^{l}\right),$$
where $\phi$ is the nonlinear activation function; $b_j^{l}$ is the scalar bias of feature map $j$ in the $l$th layer; $Z_j^{l}$ is the selection of feature maps $i$ in the $(l-1)$th layer that are summed to form feature map $j$ in the $l$th layer; $\otimes$ denotes the convolutional operator applied to the activation $y_i^{l-1}$ of the preceding layer; and $f_{ij}^{l}$ is the filter used to perform the convolutional operation. The filter weights are trained to detect specific features. Hence, effective feature selection in successive stages that can distinguish between different categories is necessary for the accurate classification of new input images.
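The convolution step described above can be sketched in plain Python. This is only an illustration under stated assumptions: a single input map, stride one, no padding, a rectified linear unit as the nonlinear function, and the sliding weighted sum without kernel flipping that most CNN libraries use. The function name `conv2d_valid` and the sample image and kernel are hypothetical and do not come from the study.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide an r x s kernel over an m x n image with stride 1 and no padding."""
    m, n = len(image), len(image[0])
    r, s = len(kernel), len(kernel[0])
    out = []
    for i in range(m - r + 1):           # m - r + 1 output rows
        row = []
        for j in range(n - s + 1):       # n - s + 1 output columns
            # Weighted sum of the r x s patch plus the additive bias.
            acc = bias
            for u in range(r):
                for v in range(s):
                    acc += image[i + u][j + v] * kernel[u][v]
            # Rectified linear unit as the nonlinear function (assumption).
            row.append(max(0.0, acc))
        out.append(row)
    return out


# Illustrative data: a 5 x 6 ramp image and a 3 x 3 horizontal-difference kernel.
image = [[float(i + j) for j in range(6)] for i in range(5)]
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]
feature_map = conv2d_valid(image, kernel, bias=2.5)
print(len(feature_map), len(feature_map[0]))   # rows and columns of the feature map
```

With a 5 × 6 image and a 3 × 3 kernel, the feature map has 3 × 4 pixels, which matches the size rule for unit stride.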
This is followed by a pooling layer. Each feature map is subjected to region-wise pooling, such as taking the maximum or average of nonoverlapping pixels. The output of the pooling layer leads to a dimensional reduction depending on the chosen stride. The activation output $p_d^{l}$ after downsizing the feature map $d$ in a layer $l$ is obtained by applying the downsizing function $\chi$ to the convoluted feature map $d_j^{l}$, where $\chi$ is, for example, the average or maximum function that downsizes by a factor of $N_j^{l}$.
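A minimal sketch of region-wise pooling follows, assuming square nonoverlapping regions and dropping any partial region at the edges. The function name `pool2d` and the sample feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="mean"):
    """Downsize a feature map over nonoverlapping size x size regions."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Collect the pixels of one size x size region.
            region = [feature_map[i * size + u][j * size + v]
                      for u in range(size) for v in range(size)]
            if mode == "max":
                out_row.append(max(region))
            else:
                out_row.append(sum(region) / len(region))
        pooled.append(out_row)
    return pooled


fmap = [[1.0, 2.0, 5.0, 6.0],
        [3.0, 4.0, 7.0, 8.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 4.0, 1.0, 1.0]]
mean_pooled = pool2d(fmap, 2, "mean")   # [[2.5, 6.5], [1.0, 1.0]]
max_pooled = pool2d(fmap, 2, "max")     # [[4.0, 8.0], [4.0, 1.0]]
```

Both variants halve each dimension here. Mean pooling keeps the average energy of a region, whereas max pooling keeps only its strongest response.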
As the original input passes through successive convolution and pooling processes, the network learns to efficiently represent all of the images. The last neural network layer is a fully connected layer whose output $y_p$ is given by
$$y_p = \psi\!\left(W f + b_p\right),$$
where $b_p$ is the bias for the output layer, $W$ is the weight matrix between the input and output layers of the fully connected layer, $f$ denotes the feature maps of the fully connected input layer, and $\psi$ is the softmax function [35]. The parameters $b_j^{l}$, $f_{ij}^{l}$, $b_p$, and $W$ are learned during training.
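The fully connected output stage can be illustrated as follows. This sketch assumes the feature maps have already been flattened into a vector. The weights, biases, and feature values are made-up numbers for a three-class toy case, not parameters from the trained network.

```python
import math


def softmax(z):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(features, weights, biases):
    """Compute softmax(W f + b) for a flattened feature vector f."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


f = [0.5, -1.0, 2.0]                      # flattened feature maps (illustrative)
W = [[0.2, 0.1, 0.4],                     # one row of weights per output class
     [-0.3, 0.5, 0.1],
     [0.0, 0.0, 0.0]]
b = [0.0, 0.1, 0.0]
y = fully_connected(f, W, b)
predicted_class = y.index(max(y))         # class with the highest probability
```

The outputs sum to one, so each entry can be read as the estimated probability of one class; in the study, the output layer would have nine such entries.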
The training takes place via stochastic gradient descent (SGD) with the objective of minimizing the error between the actual and desired output. The gradient is computed using the backpropagation method [36]. All the filter weights and biases are updated according to the objective function for each input sample until an optimal representation is obtained for the training samples. For the backpropagation method, the cost function $J$ measures the error between the target and actual outputs over the output neurons, where $P$ is the number of output neurons, $t_p$ is the $p$th element of the target output for the $p$th fault, and $y_p$ is the actual output of the network for the $p$th fault. A common problem encountered in training CNNs is overfitting, which results in poor performance on a holdout test set after the network is trained on a small or even large training set; this limits the ability of the model to generalize to unseen data. The SGD algorithm has been used as a learning algorithm [34] in backpropagation. To overcome the overfitting problem, adaptive moment estimation (ADAM) [37,38] was proposed, which is another method that computes adaptive learning rates for each parameter. Hinton [39] proposed the RMSprop algorithm, an unpublished adaptive learning rate method; it belongs to the family of adaptive learning rate methods that have seen growing popularity in recent years. In this study, the RMSprop algorithm was employed. For the RMSprop algorithm, the update rule is given by
$$E\!\left[(\nabla J)^{2}\right]_{t} = \beta\,E\!\left[(\nabla J)^{2}\right]_{t-1} + (1-\beta)\left(\frac{\partial J}{\partial w}\right)^{2}, \qquad w_{t} = w_{t-1} - \frac{\eta}{\sqrt{E\!\left[(\nabla J)^{2}\right]_{t}}}\,\frac{\partial J}{\partial w},$$
where $E[(\nabla J)^{2}]$ is the moving average of squared gradients, $\partial J/\partial w$ is the gradient of the cost function with respect to the weight, $\eta$ is the learning rate, and $\beta$ is the moving average parameter. A common default value for the moving average parameter is 0.9, which works well for most applications.
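As a rough illustration of the RMSprop update, the sketch below applies it to a single weight of a toy quadratic cost. The small constant `eps` in the denominator is an assumption added for numerical safety, as most implementations do; it is not part of the rule as described above. The toy cost and the learning rate of 0.01 are likewise illustrative.

```python
import math


def rmsprop_step(w, grad, avg_sq, lr=0.001, beta=0.9, eps=1e-8):
    """One RMSprop update for a single weight; returns (new_w, new_avg_sq)."""
    # Moving average of squared gradients.
    avg_sq = beta * avg_sq + (1.0 - beta) * grad * grad
    # Scale the step by the root of the moving average (eps is an assumption).
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq


# Toy cost J(w) = (w - 3)^2 with gradient dJ/dw = 2 (w - 3).
w, avg_sq = 0.0, 0.0
for _ in range(5000):
    grad = 2.0 * (w - 3.0)
    w, avg_sq = rmsprop_step(w, grad, avg_sq, lr=0.01)
```

Because the step is normalized by the recent gradient magnitude, the weight approaches the minimum at roughly the learning rate per iteration and then hovers near it.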

CSD System and Synthetic Fault Patterns for Test.
A visualization of the CSD system used for the experiment is shown in Figure 2. The CSD system used for testing is the one on the right in Figure 2(a). The technical specifications are summarized in Table 1.

Shock and Vibration
In the CSD system used for testing, there are four bearings, four gears, one chain, two sprockets, and one electric motor. The input shaft rotational speed is 1800 rpm (30 Hz), and that of the output shaft is 59.4 rpm (0.99 Hz). The speed reduction ratio is approximately 30. If the chain and sprocket are damaged, the CSD system cannot work; therefore, artificial faults were created only in the drive portion of the CSD system, such as the motor shaft, bearings, and gears.

Data Acquisition.
The experiments were performed under the conditions shown in Figure 2(c). The helical gearbox consists of two gears mounted on independent shafts. The input shaft is connected to the electric motor, which transforms electrical energy into rotational movement for transmission to the mechanical system. The output shaft is linked to the chain sprocket unit, whose chain is connected to the final output shaft and transmits the power as mechanical force, as opposed to the rotational movement of the final shaft. The final output shaft can be used to drive the conveyor system. The most important details regarding these mechanical components are listed in Table 1. Vibration data were obtained using a vibration sensor unit mounted on the gearbox housing to determine the acceleration in the vertical direction. Sound data were obtained using a microphone placed at a distance of 1 m from the CSD system to measure radiated noise, as shown in Figure 2(b). A vibration sensor unit, called "Happy-Go," was built around a wireless MEMS accelerometer with Bluetooth and Wi-Fi functions and a battery. "Happy-Go" has its own built-in low-pass filter set to 1/2 the sampling rate; the device is shown in Figure 4.
The accelerometer also has a built-in high-pass filter, but it was deactivated. The test rotor was sampled at 2000 Hz, so the cut-off frequency of the low-pass filter was 1000 Hz. The measurement range, sensitivity, noise density, and frequency bandwidth of the sensor are ±3 g, 270 mV/g, 175 μg/√Hz, and 2000 Hz, respectively. The vibration data were transmitted to a computer via Wi-Fi. Bluetooth is installed so that measured data can be transmitted to a smartphone in the future, but it was not used in this study. A ½-inch free-field microphone (B&K 4192, Denmark) was used to measure sound data, which were transmitted to the computer through a data acquisition system (NI 9233, USA). With this setup, a dataset was created incorporating the healthy and faulty conditions presented in Section 3.1. A test was performed for each fault condition, resulting in a total of 700 test runs. Each test had a runtime of 5 min, from which the last 30 s of vibration data were captured using the accelerometer.

Data Analysis Based on Vibration Theory.
The vibration data measured in the time domain were analyzed in the frequency and time-frequency domains using the MATLAB (MathWorks, USA) signal processing toolbox. Figure 5 shows the time histories of the eight fault signals and one normal signal measured by the accelerometer during one test. Figure 6 shows a comparison between the power spectrum of the normal vibration signal and those of the eight fault vibration signals. The spectrum shapes of the eight fault signals were different from that of the normal signal. According to the vibration theory of rotating machinery [40][41][42][43], under normal conditions, rotating machinery like the CSD system has major vibration sources such as impact vibration between the sprocket and chain, gear meshing vibration, bearing rolling vibration, and electric motor shaft rotor vibration. The frequencies of these sources in the CSD system were calculated using the system specifications listed in Table 1, including (vii) the structural dynamic resonance of the CSD system: 57 Hz. The spectra have several vibration peaks corresponding to these sources, as shown in Figure 6. The frequency region of these vibration peaks is under 500 Hz. The variations in vibration caused by the eight faults primarily occur in the frequency region above 500 Hz, as shown in Figure 6. The peaks under 500 Hz are related to the rotating frequency of the shaft, the teeth meshing frequency, the contact frequency of balls in bearings, and the contact frequency between chain and sprocket. The high-frequency peaks are related to mechanical faults such as broken teeth, bearing wear, and rotating shaft imbalance. The spectrum shapes of the eight fault signals also differ from one another. According to these spectrum shape differences, the eight fault types can be distinguished and classified using vibration signals.
For a more meaningful analysis of fault classification, the CWT was applied to the vibration signals, and the time-frequency information of the vibration signals was obtained. Figure 7 shows the CWT analysis results for the eight vibration signals. A detailed explanation of the major frequency components in the CWT analysis of the normal vibration signal is given in Figure 8. The vibrations at these major frequencies are related to the vibration sources and are listed in Table 2.
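To make the time-frequency step concrete, the following is a minimal standard-library sketch of a Morlet CWT magnitude (scalogram). It is not the MATLAB toolbox routine used in the study. The wavelet normalization, the center frequency parameter `w0 = 6`, the truncation of the envelope, the synthetic 300 Hz test tone, and the five analysis frequencies are all assumptions chosen for illustration.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet: Gaussian-windowed complex sinusoid (assumed form)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_magnitude(x, fs, freqs, w0=6.0):
    """Return |W(a, b)|: one row per analysis frequency, one column per sample."""
    rows = []
    for f in freqs:
        a = w0 * fs / (2.0 * math.pi * f)       # scale in samples for frequency f
        half = int(math.ceil(4.0 * a))          # truncate the Gaussian envelope
        kernel = [morlet(k / a, w0).conjugate() / math.sqrt(a)
                  for k in range(-half, half + 1)]
        row = []
        for b in range(len(x)):
            # Only visit samples inside the record (zero padding outside).
            lo = max(-half, -b)
            hi = min(half, len(x) - 1 - b)
            acc = 0j
            for k in range(lo, hi + 1):
                acc += x[b + k] * kernel[k + half]
            row.append(abs(acc))
        rows.append(row)
    return rows


fs = 2000.0                                      # sampling rate used in the study
x = [math.sin(2.0 * math.pi * 300.0 * n / fs) for n in range(400)]   # 0.2 s tone
freqs = [100.0, 200.0, 300.0, 400.0, 600.0]
scalogram = cwt_magnitude(x, fs, freqs)
mid = len(x) // 2
strongest = max(range(len(freqs)), key=lambda i: scalogram[i][mid])
```

For the synthetic tone, the row at 300 Hz carries the largest magnitude at the middle of the record. A real vibration record would instead show ridges at the source frequencies and at any fault-related components.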

Feature Extraction.
The traditional fault classification method in the field of machine learning uses a feature vector. Feature vectors extracted from raw signals have been used as the input of classifiers such as SVMs and SNNs. Therefore, numerous feature extraction methods have been studied for many years [4][5][6][7][8][9][10][11]. The major features of rotating machinery can be summarized [17,44] as follows:
(i) Peak to peak
(ii) Root mean square (time and frequency)
(iii) Standard deviation (time and frequency)
(iv) Shape factor
(v) Frequency center
(vi) Impulse factor
(vii) Crest factor
(viii) Mean, time and frequency (first moment)
(ix) Variance, time and frequency (second moment)
(x) Skewness (third moment)
(xi) Kurtosis (fourth moment)
To reduce feature dimensionality and improve classification accuracy, feature selection is critical to subsequent classification. Several researchers have proposed effective methods of feature selection [45]. Classification machines that use feature vectors have adopted the supervised training method [34]. However, it is difficult to extract the feature vectors for multiple-fault classification. Therefore, it is necessary to extract the features automatically based on the unsupervised training method. The CNN is a classification machine using an unsupervised training method. The supervised training method demands a feature vector correlated to fault characteristics as the input of a classification machine. The major feature vectors of rotating machinery were mentioned in Section 4.2. These feature vectors are useful for the fault pattern classification of one mechanical element such as the tooth, bearing, or shaft alone. However, when multiple faults occur during CSD system operation, it is difficult to find which feature vectors are correlated to the multiple faults of teeth, bearings, shaft imbalance, and their combinations. Therefore, the feature vectors for multiple classification should be self-extracted using a DNN such as a CNN.
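For reference, several of the hand-crafted time-domain features listed above can be computed as in the sketch below. It uses common textbook definitions (for example, crest factor as peak over root mean square, and kurtosis as the normalized fourth central moment). The exact definitions in [17,44] may differ, and the sine test signal is illustrative.

```python
import math


def time_domain_features(x):
    """Compute common time-domain condition indicators of a signal."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n      # second central moment
    std = math.sqrt(var)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "peak_to_peak": max(x) - min(x),
        "rms": rms,
        "std": std,
        "shape_factor": rms / mean_abs,
        "impulse_factor": peak / mean_abs,
        "crest_factor": peak / rms,
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * var ** 2),
    }


# Ten full cycles of a unit sine wave as a reference signal.
signal = [math.sin(2.0 * math.pi * n / 100.0) for n in range(1000)]
features = time_domain_features(signal)
```

A pure sine gives a crest factor of about 1.414 and a kurtosis of 1.5; impulsive faults such as a broken tooth typically raise both, which is why these indicators suit single-element faults.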
The CNN uses the raw signal instead of features and generates a feature map at each convolutional layer stage. It is regarded as an unsupervised training method because the raw signal is used as the input of the CNN-based machine learning. In general, the CNN uses images as input data. In this study, the CWT was applied to the signal recorded by a wireless MEMS accelerometer. The image data obtained by the CWT were used as the CNN input image, as shown in Figure 7. Vibration data recorded over 30 s were classified by the multiclass classifier as Fault 1, Fault 2, . . ., Fault 8, or normal. From each image covering a 30 s record (126 × 60,000), a reduced image (126 × 400) covering 0.2 s was used as the CNN input.
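The image sizes quoted above follow from the 2000 Hz sampling rate, as the short sketch below shows. The helper `crop_columns` and the all-zero placeholder image are hypothetical. How the 0.2 s window was positioned within each record is not stated in the text, so the window start used here is an assumption.

```python
def crop_columns(image, start, width):
    """Cut a time window of `width` columns out of a scales x time image."""
    return [row[start:start + width] for row in image]


fs = 2000                      # samples per second
record_seconds = 30
window_seconds = 0.2
n_scales = 126                 # rows of the CWT image

record_cols = fs * record_seconds                    # 60,000 columns per record
window_cols = round(fs * window_seconds)             # 400 columns per CNN input
windows_per_record = record_cols // window_cols      # nonoverlapping windows available

# Placeholder image (all zeros) standing in for a real 126 x 60,000 scalogram.
full_image = [[0.0] * record_cols for _ in range(n_scales)]
cnn_input = crop_columns(full_image, start=0, width=window_cols)
```

Under these assumptions, one 30 s record could supply up to 150 nonoverlapping 126 × 400 input images.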

Network Setup and Training.
After the vibration signals were processed using the CWT and converted into two-dimensional images, they were used as CNN inputs. The DNN toolbox for MATLAB (MathWorks, USA) was utilized. The network architecture is shown in Figure 9. The first layer following the input layer is a convolutional layer with eight feature maps of filter size 3 × 3. This is followed by a mean-pooling layer of size 2 × 2. The next layer is a convolutional layer with 16 feature maps of filter size 3 × 3, followed by a 2 × 2 mean-pooling layer. The output layer contains nine neurons corresponding to the eight different fault conditions and one normal condition. All the layers are fully connected. The softmax function was used as the classification function. The RMSprop method [39] was used to train the network with an initial learning rate of 0.001. The batch size was 128. Training was carried out for 1000 iterations (50 epochs). The change in accuracy during the learning iterations is shown in Figure 10. The weights reach their optimal values, with minimum error, at 390 iterations. Of the 700 samples, 500 were used for training, 100 for validation, and 100 for testing, and 10 different networks were produced.
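The layer sizes implied by this architecture can be traced with simple arithmetic. The sketch below assumes a single-channel 126 × 400 input, stride-one convolutions without padding, and pooling that discards partial edge regions. None of these details are stated in the text, so the resulting numbers are indicative only.

```python
def conv_out(h, w, k=3):
    """Output size of a stride-1, unpadded k x k convolution."""
    return h - k + 1, w - k + 1


def pool_out(h, w, p=2):
    """Output size of nonoverlapping p x p pooling (partial edges dropped)."""
    return h // p, w // p


h, w = 126, 400                 # CWT input image
h, w = conv_out(h, w)           # conv 1, 8 maps:  124 x 398
h, w = pool_out(h, w)           # pool 1:          62 x 199
h, w = conv_out(h, w)           # conv 2, 16 maps: 60 x 197
h, w = pool_out(h, w)           # pool 2:          30 x 98
flattened = 16 * h * w          # inputs to the nine-neuron output layer

# Trainable parameters under the same assumptions (weights plus biases).
conv_params = 8 * (3 * 3 * 1 + 1) + 16 * (3 * 3 * 8 + 1)
dense_params = 9 * flattened + 9
```

Under these assumptions, nearly all trainable parameters sit in the final fully connected layer, while the two convolutional layers together hold only about a thousand.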

Results and Discussion.
The classification accuracy rate is the ratio of the number of correctly classified test samples to the total number of test samples. In this case, among the 100 test samples, 97 samples were correctly classified. As shown in Figure 10, the accuracy rate reaches its maximum value and becomes stable at 390 iterations. The filter size in the convolution layer and the number of filters were optimized to ensure that the accuracy rate converges to the maximum value and is stable. The accuracy rate is defined as follows [20]:
$$\text{accuracy rate} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}.$$
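The accuracy rate and a per-class breakdown can be computed as sketched below. The labels are hypothetical: 100 test samples over nine classes with three arbitrary misclassifications, chosen only to reproduce the reported 97 correct out of 100; they are not the study's actual predictions.

```python
def accuracy_rate(predicted, actual):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def confusion_matrix(predicted, actual, n_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for p, a in zip(predicted, actual):
        matrix[a][p] += 1
    return matrix


# Hypothetical labels: classes 0..7 for the faults and 8 for the normal state.
actual = [i % 9 for i in range(100)]
predicted = list(actual)
for i in (4, 40, 77):                      # three arbitrary misclassifications
    predicted[i] = (predicted[i] + 1) % 9
rate = accuracy_rate(predicted, actual)
matrix = confusion_matrix(predicted, actual, 9)
```

The diagonal of the confusion matrix counts the correctly classified samples, so its sum divided by the total reproduces the accuracy rate.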
The network performed with a classification accuracy rate of 97%, misclassifying just three samples, as shown in Figure 11. The feature maps for the eight fault-states and one normal state were extracted from the final convolutional layer of one network among the 10 trained networks, as shown in Figure 12. The figure clearly shows different feature maps for the eight fault-states and one normal state. According to these results, the feature map of each fault shows a different time-frequency characteristic. The features must be visualized to verify the image recognition accuracy. If the data for each pixel are used as features, as shown in Figure 12, then the features are high-dimensional data that show the time-frequency characteristics related to each fault. Therefore, t-distributed stochastic neighbor embedding (t-SNE) was used for visualization by converting the high-dimensional data into two-dimensional data, as shown in Figure 13 [46]. Here, "high-dimensional" implies that the number of features is high, and "two-dimensional" indicates that the number of features is two. Figure 13(a) shows the reduced two-dimensional features for 20 input images of the CNN. For different input datasets, t-SNE was likewise used for visualization by converting the features into two-dimensional data, and the results were plotted as shown in Figure 13(c). The features of the raw images are scattered randomly, making it difficult to classify the fault and normal states. However, the features obtained by the trained network are grouped according to fault or normal conditions. Therefore, it is possible to classify the fault and normal states.

Conclusions
In this paper, a novel condition monitoring approach for the CSD system is proposed, based on the integration of CWT and CNN. In the CSD system, eight fault-states and one normal state were artificially created. An IoT device for vibration measurement was also developed. The device features a wireless MEMS accelerometer, Bluetooth function, and Wi-Fi function. The vibration data were measured using the wireless MEMS accelerometer mounted on the CSD system. One-dimensional vibration signals in the time domain were transformed into time-scale images via the CWT. These images were then classified by a CNN, which can extract the underlying, deep features embedded in the images that are closely related to fault types. As critical factors affecting the network classification accuracy, the filter size and number of filters were optimized in the convolutional and pooling layers of the CNN structure. Input images obtained by the CWT and feature maps extracted by the CNN are high-dimensional data. Thus, t-SNE was used for visualization by converting the high-dimensional data into two-dimensional data. The two-dimensional features enabled clear classification of the eight fault-states and one normal state. The results showed that the condition monitoring approach based on the integration of CWT and CNN classified the CSD system states with an accuracy rate of 97%.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.