Fault Diagnosis and Fault Frequency Determination of Permanent Magnet Synchronous Motor Based on Deep Learning

Early fault diagnosis of motors is important, and many researchers have applied deep learning to motor fault diagnosis. This paper proposes a one-dimensional convolutional neural network (1D CNN) for the diagnosis of permanent magnet synchronous motors. The model is weakly supervised and consists of multiple convolutional feature-extraction modules. Through the analysis of the torque and current signals of the motors, the motors can be diagnosed under a wide range of speeds, variable loads, and eccentricity effects. The advantage of the proposed method is that the feature-extraction modules can extract multiscale features under complex conditions. The number of training parameters was reduced to mitigate overfitting. Furthermore, a class feature map is proposed to automatically determine the frequency components that contribute to the classification through weakly supervised learning. The experimental results reveal that the proposed model can effectively diagnose three different motor states: the healthy state, the demagnetization fault state, and the bearing fault state. In addition, the model can detect eccentric effects. By combining the current and torque features, the classification accuracy of the proposed model reaches up to 98.85%, which is higher than that of classical machine-learning methods such as the k-nearest neighbor and support vector machine.


Introduction
As industrial automation becomes increasingly popular, motors are used in various mechanical systems to supply power. The advantage of automation is that it makes production lines faster and more flexible. Faults in the motors and machine elements, including bearings, gearboxes, and shafts, may result in substantial financial costs and human safety problems. Early fault diagnosis and detection are essential for maintaining the high performance and reliability of the entire mechanical system [1]. Fault diagnosis can prevent unexpected lengthy process shutdowns, damage to the mechanical system, unnecessary maintenance operations, and even expensive repairs. Therefore, to prevent catastrophic motor failure, early fault diagnosis of the motor and machine elements is important.
Generally, motor faults are divided into mechanical and electrical faults [2,3]. Mechanical faults include air-gap deformation, bearing failures, shaft misalignment, and mechanical imbalance, as presented in the literature [4][5][6]. Electrical faults usually include stator, rotor, and electrical supply faults, which were analyzed in [7][8][9]. Various diagnostic techniques for mechanical systems have been presented in the literature [10][11][12]. Thermal-image and acoustic-based methods both have the advantage of being non-invasive. In [13], the feature areas of a thermal image were determined by calculating the difference between thermal images. After the areas were found, the images were converted into binary images for fault classification. However, the thermal-image method has the disadvantage that the machine is damaged while the dataset is collected over a long time under high-temperature conditions. The acoustic-based technique, analyzed in [14,15], has a lower cost than the thermal-based method. Generally, defects in the motor are identified by analyzing the features of the frequency spectrum of the sound emission. However, in real conditions, the acoustic signal is easily mixed with other signals and disturbed by environmental factors. Many mechanical fault diagnosis applications based on vibration signal analysis are also available, especially for bearing faults [16] and gear transmission systems [17]. In [18], 12 accelerometers were used simultaneously, and the time and frequency domains of the vibration signal were analyzed separately to determine the failed components of the gearbox. Article [19] presents a time-frequency analysis of a vibration signal that handles time-varying faults under variable speed conditions. However, choosing the position at which the accelerometer is mounted is often challenging.
Motor current signature analysis (MCSA) is a fault diagnosis technique based on motor current analysis. MCSA is one of the most widely used fault diagnosis techniques for motors and has the advantage that current sensors are simple and easy to install. The fast Fourier transform (FFT) is a well-known method that computes the discrete Fourier transform of a discrete-time sequence. Because the FFT is computationally efficient, it is a powerful and simple MCSA technique. In a previous study [20], the motor current signal under a transient working condition was analyzed via discrete wavelet decomposition for gearbox fault detection. In another study [21], multiple current sensors were used to diagnose gearbox faults; the frequency-domain signals obtained by the FFT were stacked into a matrix that served as the input of a 2D convolutional neural network (CNN).
Machine learning has become a popular technique. In general, machine learning can be classified into supervised and unsupervised learning. Unsupervised learning is a training method that does not require any labels, such as the principal component analysis [22], k-nearest neighbor (KNN) algorithm [23], and generative adversarial networks [24]. Supervised learning requires a correct label for the training model, such as the support vector machine (SVM) [25], artificial neural networks [26], and linear regression [27]. Many studies have shown that machine learning can effectively solve problems associated with motor fault diagnosis [28].
Recently, the fast-growing deep learning algorithm, a part of machine learning, has been widely used in this field. In the literature [29], two motor fault diagnosis methods were proposed to detect five motor conditions. The motor conditions include a normal permanent magnet synchronous motor (PMSM), two different degrees of demagnetization fault PMSMs, a bearing with a damaged inner ring, and a bearing with aluminum powder. The fault diagnosis technique effectively detected the five conditions over a wide speed range. After data acquisition of the stator current, the discrete wavelet transform (DWT) was utilized to extract the features. The softmax classifier classified the approximation and detail coefficients obtained through the transformation. To achieve a higher accuracy and a reliable fault-detection method, a 1D CNN was proposed. By stacking the convolutional, max-pooling, and batch normalization layers, the 1D CNN automatically learned the important features from the time-domain signal. The final classification accuracy was up to 98.8%, which was 0.7% higher than that of the DWT method.
However, in real conditions, most motors are operated under loads, which was not considered in [29]. To obtain robust and reliable motor fault diagnosis and detect several fault conditions simultaneously, motor current and vibration signals were leveraged together in [30]. Both the motor current and vibration signals were converted into a time-frequency distribution using the wavelet transform. The time-frequency distributions were treated as grayscale images, which were fed to a multi-signal 2D CNN. Two architectures of the 2D CNN with similar parameters were discussed: one that takes a two-channel signal as the input and another that takes two individual signals as inputs. The results indicate that the latter model had a higher accuracy. In [31], a stacked inverted residual CNN (SIRCNN), which is a lightweight model, was proposed to diagnose rolling bearing faults. The time-domain vibration signal is transformed into a 2D image after normalization. By using depth-wise separable convolution, linear bottlenecks, and inverted residual blocks, the computations and size of the model can be decreased. Moreover, the authors of [31] indicate that SIRCNN is highly robust in different noisy environments, which were simulated by adding white Gaussian noise to the original signal. In a previous study [32], a bearing defect diagnosis model was trained based on the transfer learning methodology. The model was first trained on the source domain data; next, samples of the target domain were used to fine-tune the model. Furthermore, in the same study [32], a novel trigonometric cross-entropy function calculating the sparsity cost was developed and included in the cost function. The modified cost function can avoid the redundant activation of neurons in the hidden layer. A previous study [33], with different results than those of [30], proposed a multiresolution multisensory fusion network consisting of a 1D CNN and long short-term memory (LSTM). By combining the 1D CNN and LSTM, the model can learn features from a two-channel signal well. Moreover, the authors of [33] highlight the effectiveness of multiple kernels in finding features at different scales. The power supply frequency was considered and eliminated using the Hilbert transform. Furthermore, in contrast with other studies, [33] used three load conditions, which were closer to real applications.
In this paper, a 1D CNN, which is a weakly supervised learning model, is proposed. The 1D CNN model consists of multiscale feature-extraction modules and can automatically localize the frequency components that contribute to the classification. The experimental results demonstrate that the proposed method can effectively diagnose three different motor states under variable speeds, load conditions, and eccentric effects. The results of the proposed 1D CNN were better than those of the previous methods. Therefore, we recommend the use of the proposed 1D CNN for feature extraction and the softmax layer for classification to achieve higher classification accuracy.
The contributions of this paper are as follows: (1) Unlike the aforementioned related studies, the 1D CNN model proposed in this paper diagnoses motor faults by extracting the stator current signal and torque signal of the motor. (2) In [33], a multilevel information fusion model, combined with a 1D CNN and LSTM, was used to diagnose the motor faults. The model can detect five different motor faults by extracting the vibration and stator current signals. However, Wang et al. [33] considered only three fixed load settings. This study considered variable loads ranging from 0 to 0.24 Nm. (3) The parameters and sizes of the neural network can be reduced using the proposed feature-extraction module. Furthermore, the model remains robust and can obtain a high classification accuracy. (4) In the aforementioned studies, the exact frequency of the signal contributing to the classification was not shown. This study implemented a weakly supervised architecture and visualized the important grades of the frequency components that contribute to the classification. (5) In summary, we propose a 1D CNN model to detect motor faults, under a wide range of motor speeds from 100 to 1600 rpm and loads from 0 to 0.24 Nm. In addition, the model can detect the effect of eccentricity and identify important frequency components.
The remainder of this paper is organized as follows. Section 2 presents the design of the motor diagnosis platform. Section 3 describes the structure of the diagnosis system and explains the diagnosis steps. Section 4 presents the experimental results obtained from real-time motor data, which demonstrate the effectiveness and robustness of the proposed methods for motor condition monitoring and show that the proposed 1D CNN model can effectively classify the motor states. Section 5 presents the conclusions of this study.

Motor Diagnosis Platform and Sensors
The motor diagnosis platform built in this study is shown in Figure 1. The critical peripheral devices of the platform include two PMSMs, one torque sensor, and one Hall sensor. The PMSM used was ECM-A3L-0807, which was manufactured by Delta Electronics. The maximum speed and torque of the motor were 3000 rpm and 2.39 Nm, respectively. The torque sensor used was DATAFLEX® 16, which was developed by KTR Systems GmbH. The torque measurement error was 0.1% with angular, radial, and axial offset compensation performances. Furthermore, the torque sensor had a maximum torque measurement of 16 Nm. The Hall sensor, ACS711EX, could handle bidirectional currents from −31 to +31 A with a 100 kHz bandwidth. The data acquisition unit was a USB-2405, developed by ADLINK Technology. It used a 24-bit Sigma-Delta ADC with a built-in anti-aliasing filter and four simultaneous sampling analog input channels up to 128 kS/s.
Figure 1 shows the two motors on the platform. The motor on the left side was the power source of the mechanical system. This motor was the testing motor, which was diagnosed while it rotated. Three types of testing motors were analyzed: a healthy motor, a motor with demagnetization failure, and a motor with a bearing fault. The motor on the right side was the load motor, which provided inverse torques to the testing motor. The actuator controlled the torque of the load motor. A torque sensor was installed between the testing and load motors to measure the torque value of the load motor. Figure 2 illustrates the peripheral devices. The power supply provided 5 V to the Hall sensor. The Hall sensor was clamped to one of the three-phase wires of the testing motor for the current signal acquisition. A load disk with four holes was placed on the axis. The eccentric effect could be generated by locking a small weight on one of the holes while the motor was rotating.

Methods
In this section, the proposed motor fault diagnosis techniques are described. First, the stator current and torque signals were collected under several conditions to form the dataset. The dataset was then preprocessed to make it more suitable for the proposed 1D CNN model. Next, the feature-extraction module and 1D CNN were designed. A diagnosis model design process was utilized to confirm the reliability and classification accuracy of the proposed 1D CNN model. The hyperparameters of the 1D CNN model, obtained after testing on a real motor diagnosis platform, are given in this section. The performance of the model and the associated frequency automatically determined by the proposed 1D CNN model are discussed in Section 4.
The flowchart of the proposed method is shown in Figure 3.

Data Collection
Experiments for the motor fault diagnosis were conducted on the motor fault diagnosis platform, as illustrated in Table 1. The stator current and torque signals were simultaneously collected using a USB-2405 at a sampling rate of 12,800 Hz. The motor types included a healthy motor, a demagnetized motor, and a motor with a bearing fault. Two of the loading conditions were fixed at 0 and 0.24 Nm. To ensure that the model could diagnose motors while the loads were changing, the data for the third loading condition were collected while the load changed randomly in the range of 0–0.24 Nm. Because of the actuator setting, the discrete torque commands were represented as $T_c = [0, 0.024, \dots, 0.216, 0.24]$. The actuator continuously and randomly chose one torque value in $T_c$ and sent a step torque command to the motor. Furthermore, the frequency of the varying load was 10 Hz. The operating speed ranged from 100 to 1600 rpm, and the data were collected every 100 rpm. The third operating condition was the eccentricity introduced by the load disk, which could be controlled by locking or unlocking the weight on the load disk. For every combination of conditions, 500 measurements were obtained. Hence, the total number of samples was 144,000 (3 × 3 × 16 × 2 × 500), corresponding to three failure modes, three loading conditions, 16 rotating speeds, 2 eccentric modes, and 500 measurements. The proposed model had two output layers: one output was the classification of the failure mode of the motor, and the other was the detection of the eccentric effect. Therefore, the samples were labeled according to the failure mode of the motor and the eccentricity of the load disk for supervised learning. In this study, the ratio of training data to testing data was 4:1. To ensure that the model learned the features from every condition, the data were divided equally based on the operating conditions.
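The dataset bookkeeping described above can be checked with a short sketch. It enumerates the operating conditions, rebuilds the torque command list, and applies a 4:1 split inside every condition so that each one is represented in both sets. The label strings, the function names, and the per-condition random shuffle are assumptions made for illustration; the text does not state how the split was implemented.

```python
import itertools
import random

# Operating conditions as described in the text (label strings are hypothetical).
MOTOR_STATES = ["healthy", "demagnetization", "bearing_fault"]
LOADS = ["0 Nm", "0.24 Nm", "variable 0-0.24 Nm"]
SPEEDS_RPM = list(range(100, 1601, 100))  # 100, 200, ..., 1600 rpm
ECCENTRIC = [False, True]
MEASUREMENTS_PER_CONDITION = 500

# Discrete torque commands T_c = [0, 0.024, ..., 0.216, 0.24] for the variable load.
TORQUE_COMMANDS = [round(0.024 * i, 3) for i in range(11)]


def split_indices(n_samples, train_ratio=0.8, seed=0):
    """Shuffle the sample indices of one condition and split them 4:1."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    cut = int(n_samples * train_ratio)
    return idx[:cut], idx[cut:]


def build_split(seed=0):
    """Map every operating condition to its (train, test) index lists."""
    conditions = itertools.product(MOTOR_STATES, LOADS, SPEEDS_RPM, ECCENTRIC)
    return {
        cond: split_indices(MEASUREMENTS_PER_CONDITION, seed=seed + k)
        for k, cond in enumerate(conditions)
    }


split = build_split()
n_train = sum(len(train) for train, _ in split.values())
n_test = sum(len(test) for _, test in split.values())
print(len(split), n_train, n_test, n_train + n_test)
```

Splitting inside each condition, rather than over the pooled samples, is one simple way to keep the training and testing sets balanced across speeds, loads, and eccentric modes.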

Signal Preprocessing
The raw torque and current signals collected were 1D time series. To eliminate the DC voltage from the power supplier, the raw data, $S_r = [S_{r,0}, \cdots, S_{r,l-1}]$, went through the zero-mean operation first, as follows:

$$S_{z,k} = S_{r,k} - \frac{1}{l}\sum_{i=0}^{l-1} S_{r,i}$$

where $S_z$ is the signal after the zero-mean operation, $S_r$ is the raw signal, and $l$ is the length of the raw signal. Then, the signal was converted from the time domain to the frequency domain using the FFT method. The FFT calculation is as follows:

$$S_{f,n} = \left|\sum_{k=0}^{l-1} S_{z,k}\, e^{-j 2\pi k n / l}\right|$$

where $S_f$ is the transformed signal, and only the amplitude of the frequency signal is analyzed. Subsequently, the frequency signal was normalized into the range of [0,1] using the following expression:

$$S_N = \frac{S_f}{\mathrm{Max}(S_f)}$$

where $S_N$ is the normalized signal and $\mathrm{Max}(S_f)$ represents the maximum amplitude of the frequency components. The normalization operation has two advantages: (1) it is convenient to observe the signal difference between different failure modes, and (2) it helps the model converge faster while training the deep learning model.
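The three preprocessing steps can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the transform is a naive discrete Fourier transform instead of a fast implementation, and keeping only the one-sided spectrum is an assumption. In practice a library routine such as `numpy.fft.rfft` would replace the inner loop.

```python
import cmath
import math


def zero_mean(signal):
    """Subtract the mean so that the DC offset is removed."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]


def amplitude_spectrum(signal):
    """Naive O(l^2) DFT returning amplitudes of the one-sided spectrum."""
    n_samples = len(signal)
    amplitudes = []
    for n in range(n_samples // 2 + 1):
        acc = sum(
            s * cmath.exp(-2j * math.pi * k * n / n_samples)
            for k, s in enumerate(signal)
        )
        amplitudes.append(abs(acc))
    return amplitudes


def normalize(amplitudes):
    """Scale the amplitudes into [0, 1] by dividing by the maximum amplitude."""
    peak = max(amplitudes)
    return [a / peak for a in amplitudes] if peak > 0 else list(amplitudes)


def preprocess(raw):
    """Zero-mean, transform to the frequency domain, then normalize."""
    return normalize(amplitude_spectrum(zero_mean(raw)))


# Hypothetical test signal: a 50 Hz sine with a 2.5 V offset sampled at 12,800 Hz.
fs, n_samples = 12800, 256
raw = [2.5 + math.sin(2 * math.pi * 50 * k / fs) for k in range(n_samples)]
spectrum = preprocess(raw)
peak_bin = spectrum.index(max(spectrum))
print("dominant frequency [Hz]:", peak_bin * fs / n_samples)
```

With 256 samples at 12,800 Hz each bin spans 50 Hz, so the sine falls exactly on the first non-DC bin and the offset no longer dominates the spectrum.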

Feature-Extraction Module
As mentioned in Section 3.1.1, three motor failure modes were analyzed under variable rotating speeds of the motor, the loading effect, and the eccentric effect. These operating conditions made motor diagnosis very complex, because the features of the motor varied substantially in terms of location and size. Therefore, the fixed kernel size used in the traditional CNN model was replaced with multiple kernel sizes: a larger kernel in the convolutional layer obtains global information, whereas a smaller one obtains local information. To allow the model to learn from the complex motor conditions, a sparsely connected network architecture was used instead of a densely connected architecture. Figure 4 illustrates the difference between the two convolutional architectures. Through the application of multiple convolution filters, as shown in Figure 4b, the network learned multilevel features from the same input.
In addition to the sparsely connected convolutional layer, a 1 × 1 convolution was used in the proposed neural network. The main purpose of using a 1 × 1 convolution was to control the dimensionality of the convolutional layer [34,35]. Furthermore, adding a 1 × 1 convolution with a nonlinear activation function enabled the model to learn more complex functions. In [36,37], the residual connection was successfully used in the ResNet and Inception neural networks. The residual connection [38] helped the model prevent vanishing gradients. By adding the identity input, the convolutional layer enabled the learning of different features. In [31], the residuals were used in the stacked inverted residual convolution neural network model for bearing fault diagnosis. Through the implementation of stacks of inverted residual blocks, the model achieved a lightweight design and maintained a fast and highly accurate diagnosis.
The structure of the proposed feature-extraction module is shown in Figure 5. The module is divided into two parts: (1) feature extraction and (2) residual connection. In Figure 5, from top to bottom, the first three sets of convolutional layers are the feature-extraction parts. Set 1 consists of a 1 × 1 filter, a 1 × 3 filter, and a 1 × 3 filter. Set 2 consists of a 1 × 1 filter and a 1 × 3 filter, and Set 3 consists of a 1 × 1 filter. Set 4 at the bottom is the residual connection part, consisting of one 1 × 1 filter and one 1 × 3 filter with a stride of 2. The multiple convolution filters enabled the model to learn multiscale features from the signal. Subsequently, the three feature vector sets were concatenated and connected through linear activation and max-pooling layers. To implement a residual connection, the output dimensionality of Set 4 was set to be the same as that of the max-pooling layer. Hence, the stride of the 1 × 3 filter in Set 4 was equal to 2. Finally, the output of the module was the sum of the extraction and residual parts with the ReLU activation function.
When designing the feature-extraction module, the following concepts were satisfied. The 1 × 1 convolution filter should first be used to scale down the dimensionality before applying different kernels. The number of convolutional kernels should be larger in a more complex feature-extraction set. In Figure 5, Set 1, which had two 1 × 3 convolutional layers, had more kernels than the other sets, whereas Set 3, which only consisted of a 1 × 1 convolutional layer, had the fewest kernels. Furthermore, in Figure 5, only the filter with the purple background uses a linear activation function; the other filters in the module use the ReLU activation function.
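The length-matching requirement of the residual connection can be checked with a little arithmetic. The sketch below is an illustration, not the authors' implementation: the helper names, the pooling window and stride of 2, the length-preserving padding in the extraction sets, and the padding of 1 on the 1 × 3 filter of Set 4 are all assumptions that the text does not state.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution or pooling layer (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def module_out_lens(length):
    """Return the output lengths of the extraction and residual branches."""
    # Extraction branch: Sets 1-3 are assumed to preserve the length, then
    # max pooling (assumed window 2, stride 2) halves it.
    extraction = conv1d_out_len(length, kernel=2, stride=2)
    # Residual branch (Set 4): the 1x1 filter keeps the length, and the 1x3
    # filter with stride 2 (assumed padding 1) halves it.
    after_1x1 = conv1d_out_len(length, kernel=1)
    residual = conv1d_out_len(after_1x1, kernel=3, stride=2, padding=1)
    return extraction, residual


for input_len in (64, 512, 6400):
    extraction, residual = module_out_lens(input_len)
    print(input_len, extraction, residual)
```

Under these assumptions both branches halve an even input length, so their outputs can be summed element-wise. With a stride of 1 in Set 4 the residual branch would keep the full length and the sum would not be defined.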

Global Average Pooling
In this study, the proposed 1D CNN replaced the flatten layer with a global average pooling (GAP) layer [39]. Figure 6 illustrates a schematic of the GAP and flatten layers. It was assumed that the feature maps obtained from the last feature-extraction module had the dimensions of width $w$, height $h$, and map number $d$, and that a dense classification layer with $k$ classes was connected to them. The number of parameters, $P_g$, required by the fully connected layer after the GAP layer was calculated as follows:

$$P_g = d \times k \tag{1}$$

The output vector of the GAP layer represented only the spatial average of each feature map. However, the number of training parameters, $P_f$, required between the flatten layer and the classification layer was calculated as

$$P_f = w \times h \times d \times k \tag{2}$$

Comparing Equations (1) and (2) shows that the GAP layer required fewer training parameters. A lower proportion of fully connected layer parameters in the total can decrease the chance of overfitting to the training data.

Sensors 2021, 21, 3608

Furthermore, some researchers have demonstrated that the GAP layer can be effectively utilized in 2D object localization [40,41]. In this study, weighted global average pooling (WGAP) was adopted to determine the frequency components that contributed to the signal-level classification. The network architecture of the proposed model is illustrated in Figure 7. The network is a two-input, two-output model. The two inputs are the frequency-domain signals of the current and torque. The features of the two input signals were extracted individually using several feature-extraction modules. Subsequently, the two WGAP layers were individually used to obtain the most representative features from the feature maps $F_T$ and $F_C$. Then, the two feature vectors, WGAP1 and WGAP2, were combined to obtain the fusion feature vector $F_m$. Finally, $F_m$ was used to classify the failure mode and eccentric effect.
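The parameter comparison between the GAP and flatten layers can be made concrete with a small sketch. The function names and the example dimensions are hypothetical, and the counts cover the weights only; whether bias terms are included in the original count is not stated, so they are left out here as an assumption.

```python
def dense_params_after_gap(d, k):
    """Weights of a k-class dense layer fed by a GAP output of d values."""
    return d * k


def dense_params_after_flatten(w, h, d, k):
    """Weights of a k-class dense layer fed by a flattened w x h x d feature map."""
    return w * h * d * k


# Hypothetical dimensions: 64 feature maps of width 100 and height 1, 3 classes.
w, h, d, k = 100, 1, 64, 3
p_gap = dense_params_after_gap(d, k)
p_flatten = dense_params_after_flatten(w, h, d, k)
print(p_gap, p_flatten, p_flatten // p_gap)
```

The ratio between the two counts equals w × h, so the saving grows with the spatial size of the last feature map.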
The last feature map obtained from the signal had a length of $L$ and a depth of $D$. The feature map can be expressed by the following equation:

$$F = [F_1, F_2, \dots, F_L]$$

where $D$ and $L$ indicate the depth and length of the feature map, respectively. Each vector $F_i$ is a $D$-dimensional vector representing the feature of the spatial region $i$. The output vector of the WGAP layer, $G \in \mathbb{R}^{D \times 1}$, is expressed as

$$G = \sum_{i=1}^{L} \alpha_i F_i$$

where $\alpha \in \mathbb{R}^{L \times 1}$ is the weight matrix. The weight $\alpha_i$ evaluates the critical grade of the $i$th spatial region. The weights are normalized to the range [0,1], and their sum is equal to 1. Generally, a spatial region with a higher weight implies that it is more informative for signal-level classification. The weight matrix was determined as follows:

$$\alpha_i = \frac{\exp(f_i)}{\sum_{j=1}^{L} \exp(f_j)}$$

where $f \in \mathbb{R}^{L \times 1}$, whose element $f_i$ is the feature score of the $i$th region, is obtained by the following operation:

$$f_i = \sigma(W F_i + b)$$

where $W \in \mathbb{R}^{1 \times D}$ is the parameter vector, $b \in \mathbb{R}^{1 \times 1}$ is the bias, and $\sigma$ is the tanh activation function.
By concatenating the two outputs of the WGAP1 and WGAP2 layers, the fusion feature vector $F_m \in \mathbb{R}^{2D \times 1}$, shown in Figure 7, is obtained. Then, two FC layers with a softmax activation function are used separately to classify the fusion feature vector $F_m$. For a given class $c$, the probability distribution of classification $P_c$ is expressed as follows:

$$P_c = \frac{\exp(S_c)}{\sum_{c'} \exp(S_{c'})}$$

where $S_c$ is the input of the softmax layer, which is determined using the following equation:

$$S_c = \sum_{n} w_n^c F_{m,n}$$

where $w_n^c$ is the classification weight corresponding to class $c$ for unit $n$.
To visualize and analyze the important frequency components for a particular signal, a class feature map (CFM) was built. The CFMs $M_{C,c}$ and $M_{T,c}$ correspond to class $c$ of the current signal and the torque signal, respectively. Each CFM is the linear sum of the patterns at different spatial locations, weighted by $w_n^{c,1}$ and $w_n^{c,2}$, the classification weights of the two types of outputs. A constant value of 0.5 indicates that $w_n^{c,1}$ and $w_n^{c,2}$ equally influence the result of the CFM. To observe the critical grade of the frequency components, the CFM should be scaled to the size of the corresponding original signal, and the amplitude of the CFM should be normalized from 0 to 1. Finally, the users can identify the frequency regions that are most relevant to a particular category.
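The weighted pooling step can be illustrated with a small standard-library sketch. It scores every spatial region with a tanh unit, normalizes the scores so that the weights lie in [0,1] and sum to 1, and returns the weighted sum of the region vectors. The function name, the toy feature map, and the parameter values are hypothetical, and the use of a softmax for the normalization is an assumption consistent with the stated properties of the weights rather than a confirmed detail of the authors' model.

```python
import math


def wgap(feature_map, W, b):
    """Weighted global average pooling over L regions of depth D.

    feature_map: list of L vectors, each of length D.
    W: parameter vector of length D; b: scalar bias.
    Returns (G, alpha): the pooled D-vector and the L region weights.
    """
    # Feature score of each region: f_i = tanh(W . F_i + b)
    scores = [
        math.tanh(sum(w * x for w, x in zip(W, F_i)) + b) for F_i in feature_map
    ]
    # Assumed softmax normalization (shifted by the maximum for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Pooled output: G = sum_i alpha_i * F_i
    depth = len(feature_map[0])
    G = [
        sum(a * F_i[d] for a, F_i in zip(alpha, feature_map)) for d in range(depth)
    ]
    return G, alpha


# Toy feature map with L = 3 regions and D = 2 channels (hypothetical values).
feature_map = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
G, alpha = wgap(feature_map, W=[1.0, 1.0], b=0.0)
print(G, alpha)

# With zero parameters every region scores the same, so WGAP reduces to plain GAP.
G_uniform, alpha_uniform = wgap(feature_map, W=[0.0, 0.0], b=0.0)
print(G_uniform, alpha_uniform)
```

The second call shows the design choice behind WGAP: when all regions receive equal weights the layer behaves like ordinary GAP, and learned weights shift the pooled vector toward the regions that matter most for classification.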

Model Building
To confirm that the proposed 1D CNN model could be implemented on a real platform, a simple design flow was devised and is presented in Figure 8. The principles used to adjust the parameters are described in Section 3.2. For the feature-extraction module, the kernel number of Set 1 must exceed that of Set 2, and the kernel number of Set 3 must be the least.
To ensure that the random parameters did not construct the final neural network, the neural network was built from a comparatively small size. If the average training accuracy where D and L indicate the depth and length of the feature map, respectively. Each vector F i is a D-dimensional vector representing the feature of the spatial region i. The output vector of the WGAP layer, G R D×1 , is expressed as where α R L×1 is the weight matrix. The weight matrix evaluates the critical grade of the spatial ith region. The value of the weights is normalized to the range [0,1], and the sum of the weight values is equal to 1. Generally, a spatial region with a higher weight implies that it is more informative for signal-level classification. The weight matrix was determined as follows: where f R L×1 , which is the feature score of the ith region, is obtained by the following operation: where W R 1×D is the parameter vector and b R 1×1 is the bias. σ is the tanh activation function. By concatenating the two outputs of the WGAP1 and WGAP2 layers, the fusion feature vector F m R 2D×1 , shown in Figure 7, is obtained. Then, two FC layers with a softmax activation function are used separately to classify the fusion feature vector F c . For a given class c, the probability distribution of classification P c is expressed as follows: where S c is the input vector of the softmax layer, which is determined using the following equation: where w c n is the classification weight corresponding to class c for unit n. To visualize and analyze the important frequency component for a particular signal, a class feature map (CFM) was built. The CFMs for different signals are defined as follows: where M C,c and M T,c are the CFMs corresponding to class c of the current signal and the torque signal, respectively. CFM is the linear sum of the patterns at different spatial locations. w c,1 n and w c,2 n indicate the classification weights of the two types of outputs. 
A constant value of 0.5 indicates that w c,1 n and w c,2 n equally influence the result of the CFM. To observe the critical grade of the frequency component, the CMF should be scaled to the size of the corresponding original signal, and the amplitude of the CMF should be normalized from 0 to 1. Finally, the users can identify the frequency regions that are most relevant to a particular category.
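The WGAP pooling and the final rescaling step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the softmax normalization of the scores is assumed (the text only requires weights in [0,1] that sum to 1), the linear interpolation used for rescaling is assumed, and the names `wgap` and `rescale_map` are hypothetical.

```python
import math

def wgap(F, W, b):
    """Weighted global average pooling over L regions.

    F: list of L region vectors F_i, each of length D.
    W: parameter vector of length D; b: scalar bias.
    Returns (G, alpha): the pooled D-vector and the L region weights.
    """
    # Feature score of each region: f_i = tanh(W . F_i + b)
    scores = [math.tanh(sum(w * x for w, x in zip(W, F_i)) + b) for F_i in F]
    # ASSUMPTION: softmax normalization, so weights lie in [0, 1] and sum to 1
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    depth = len(F[0])
    # Pooled output: G = sum_i alpha_i * F_i
    G = [sum(a * F_i[d] for a, F_i in zip(alpha, F)) for d in range(depth)]
    return G, alpha

def rescale_map(values, target_len):
    """Stretch a map to target_len samples, then min-max normalize to [0, 1].

    ASSUMPTION: linear interpolation; the source does not name the method.
    """
    n = len(values)
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1) if target_len > 1 else 0.0
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    vmin, vmax = min(out), max(out)
    span = vmax - vmin
    return [(v - vmin) / span if span > 0 else 0.0 for v in out]

# Toy feature map with L = 3 regions and D = 2 channels (hypothetical values)
F = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
G, alpha = wgap(F, W=[0.5, 0.5], b=0.0)
grade = rescale_map(alpha, target_len=5)
```

In this toy case the third region receives the largest weight because its score is highest, and `grade` stretches the three weights to five samples normalized between 0 and 1, mirroring how a CFM would be aligned with the original spectrum.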

Model Building
To confirm that the proposed 1D CNN model could be implemented on a real platform, a simple design flowchart was drawn up and is presented in Figure 8. The principles used to adjust the parameters are described in Section 3.2. For the feature-extraction module, the kernel number of Set 1 must exceed that of Set 2, and the kernel number of Set 3 must be the least.
To ensure that the final neural network was not the product of randomly chosen parameters, the neural network was built up from a comparatively small size. If the average training accuracy is lower than 90%, users need to add more feature-extraction modules and follow the rules presented in Figure 8. Furthermore, once the average training accuracy is sufficiently high, users should check whether the difference between the average training and average testing accuracies is lower than 3%, implying that the overfitting problem is not significant. Subsequently, the model will be tested on a real platform. To verify that the model can be implemented on the motor diagnosis platform, 10 consecutive predictions were used to determine the motor state for one diagnostic result. The failure mode predicted most often among the 10 predictions was the diagnostic result. The eccentric mode was diagnosed using the same method.
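The design checks and the 10-prediction voting rule described above can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and the tie-breaking behaviour (the label seen first wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter

def needs_more_modules(avg_train_acc, threshold=90.0):
    """True if the average training accuracy (%) is below the 90% target."""
    return avg_train_acc < threshold

def overfitting_not_significant(avg_train_acc, avg_test_acc, max_gap=3.0):
    """True if training and testing accuracies (%) differ by less than 3 points."""
    return abs(avg_train_acc - avg_test_acc) < max_gap

def diagnose(predictions):
    """Return the label predicted most often in a window of predictions.

    ASSUMPTION: on a tie, the label that appeared first in the window wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical window of 10 consecutive failure-mode predictions
window = ["bearing", "bearing", "healthy", "bearing", "bearing",
          "demagnetized", "bearing", "bearing", "healthy", "bearing"]
result = diagnose(window)
```

Voting over a short window trades a small delay for robustness against isolated misclassifications.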
The architecture of the proposed 1D CNN model after experimental tuning is shown in Figure 7. The proposed model has two input layers: one for the current signal and the other for the torque signal. For both input signals, seven feature-extraction modules and one WGAP layer were used to learn the features. Subsequently, one concatenated layer and two output layers were connected separately. To avoid the overfitting problem and improve the generalization ability of the 1D CNN model, the dropout regularization method [42] was used between the WGAP layer and the output layer with a rate of 0.2. Tables 2 and 3 show the hyperparameter settings of the proposed motor diagnosis model and feature-extraction module, respectively. The proposed 1D CNN model was trained using the Adam optimization [43] with the mean square error function. Based on a large number of experiments and observation of the classification accuracy, a fixed learning rate of $3 \times 10^{-4}$ was assigned. Finally, the softmax function was used to classify the feature vector into the three failure modes and to detect eccentricity. The proposed 1D CNN model achieves a high classification accuracy. The experimental results and analysis are presented and discussed in Section 4.

Experimental Results
In this section, the results of the experiment are discussed. First, the formula for calculating the accuracy rate is presented. Then, the evaluation of the performance of the proposed 1D CNN model and the comparison with other algorithms, including KNN, SVM, multilayer perceptron (MLP), 1D CNNG, 1D CNNT, and 1D CNNC, are discussed. In addition, the t-SNE [44] algorithm is used to reduce the dimensionality of the feature map and visualize the separation of the features learned from the model. Then, the relevant frequency components obtained automatically by the proposed 1D CNN are analyzed.

Classification Results
To evaluate the proposed 1D CNN model, the following formulas were used to calculate the classification accuracy:
$$\mathrm{Acc}_{mode} = \frac{T_{hm} + T_{dm} + T_{bm}}{T_{hm} + T_{dm} + T_{bm} + F_{hm} + F_{dm} + F_{bm}} \quad (13)$$
$$\mathrm{Acc}_{ecc} = \frac{T_{ne} + T_{e}}{T_{ne} + T_{e} + F_{ne} + F_{e}} \quad (14)$$
$$\mathrm{Acc}_{state} = \frac{T_{state}}{T_{state} + F_{state}} \times 100\% \quad (15)$$
$$\mathrm{Acc}_{avg} = \frac{\mathrm{Acc}_{mode} + \mathrm{Acc}_{ecc}}{2} \times 100\% \quad (16)$$
where $\mathrm{Acc}_{mode}$ and $\mathrm{Acc}_{ecc}$ denote the classification accuracies of the failure mode and eccentricity, respectively. $T_{hm}$, $T_{dm}$, and $T_{bm}$ indicate the true classifications of the healthy, demagnetized, and bearing fault motors, respectively. $T_{ne}$ and $T_{e}$ represent the conditions without and with an eccentric effect, respectively. $F_{hm}$, $F_{dm}$, $F_{bm}$, $F_{ne}$, and $F_{e}$ are the corresponding false classifications. In addition, the accuracy rate of the individual state, $\mathrm{Acc}_{state}$, was calculated using Equation (15), where $T_{state}$ and $F_{state}$ represent the true and false detections, respectively. $\mathrm{Acc}_{avg}$ indicates the average accuracy of the method. The classification of the failure modes and that of the eccentricity are equally important; hence, the denominator of Equation (16) is 2.
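As a quick check of how these accuracy measures combine, the sketch below implements them in Python. The function names and the confusion counts are hypothetical; only the final call uses figures reported in this paper (failure-mode accuracy 99.66% and eccentricity accuracy 98.04% for the proposed model).

```python
def accuracy_ratio(true_counts, false_counts):
    """Fraction of correct classifications: sum(T) / (sum(T) + sum(F))."""
    t = sum(true_counts)
    f = sum(false_counts)
    return t / (t + f)

def acc_avg(acc_mode, acc_ecc):
    """Average accuracy in percent from two accuracy ratios (Equation (16))."""
    return (acc_mode + acc_ecc) / 2 * 100

# Hypothetical confusion counts, for illustration only
acc_mode = accuracy_ratio(true_counts=[98, 97, 99], false_counts=[2, 3, 1])
acc_ecc = accuracy_ratio(true_counts=[150, 140], false_counts=[0, 10])

# Reported accuracies of the proposed model, expressed as ratios
reported_avg = acc_avg(0.9966, 0.9804)
```

With the reported ratios, `reported_avg` evaluates to approximately 98.85, matching the average accuracy quoted for the proposed model.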
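The following methods were evaluated for comparison.
1. KNN classifier using the handcrafted features;
2. SVM classifier using the handcrafted features;
3. MLP using the current and torque signals;
4. 1D CNNG: a two-input, two-output model constructed according to [29,30];
5. 1D CNNC: a one-input, two-output model based on the proposed method, but that only uses the current signal;
6. 1D CNNT: a one-input, two-output model based on the proposed method, but that only uses the torque signal;
7. Proposed 1D CNN using the current and torque signals.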
Two classical machine-learning methods, KNN and SVM, were used. Both were trained using the handcrafted features listed in Table 4. The current and torque signals were analyzed using the DWT [29] with a seventh-level decomposition. The Daubechies 24 was used as the mother wavelet function. For each level, features 1-6 in Table 4 were extracted. To resolve the confusion caused by similar failure modes under different load magnitudes, the maximum amplitudes of the time-domain and frequency-domain signals were extracted as well (features 7 and 8). Therefore, the feature vector for KNN and SVM had 100 elements (8 × 6 + 2 = 50 for each of the two signals). In this study, the KNN used the KD-tree algorithm to compute the nearest neighbors, and the number of neighbors was five. In the SVM, both the linear and RBF functions were used as the kernel functions for evaluation.
Table 4. Handcrafted feature sets for k-nearest neighbor (KNN) and support vector machine (SVM).

| Index | Feature | Formula |
| --- | --- | --- |
| | Median absolute deviation | $\mathrm{MedAD} = \mathrm{median}(\lvert X_i - \mathrm{median}(X) \rvert)$ |
| | Mean absolute deviation | |
| 7 | The maximum amplitude of the time-domain signal | |
| 8 | The maximum amplitude of the frequency-domain signal | |

For the remaining learning methods, models with a similar number of parameters (approximately 170,000) were designed for the analysis and comparison. The MLP, 1D CNNG, and proposed 1D CNN models used the normalized amplitude-frequency signals of the current and torque as the input signals. To confirm the effectiveness of the fusion features, the performance of the 1D CNNC and 1D CNNT models, which use only a single signal as the input vector, was evaluated. For the MLP method, two processing streams consisting of one fully connected layer with 14 neurons were used to learn features from the signals separately. The 1D CNNG model was constructed according to [29,30] and consisted of two input layers, four densely connected convolutional layers, a flatten layer, and two output layers. In 1D CNNG, the kernel size went from large to small, and the number of kernels increased as the model went deeper. Both the 1D CNNC and 1D CNNT models consisted of one input layer, seven feature-extraction modules, a WGAP layer, and two output layers. The classification results obtained from the comparison are listed in Table 5.
Table 5 shows the classification accuracies of the individual states and the average accuracies. The average classification accuracies of KNN and SVM based on the handcrafted features were 88.96% and 94.66%, respectively. Although both classifiers exceeded 85.00% accuracy, they remained below the proposed 1D CNN model. For such a complex dataset, handcrafted features cannot comprehensively represent the data, whereas learning-based methods can effectively learn discriminative features from the input signals. For the MLP and 1D CNNG methods, the average accuracies were 89.81% and 93.36%, respectively. Compared with the MLP, the architecture of the 1D CNN was more suitable for the frequency-domain signals used in this study.
The 1D CNNG model performed well in the failure mode classification with accuracies of 100%, 99.89%, and 99.18% for the healthy, bearing fault, and demagnetized motors, respectively. However, the 1D CNNG model could not effectively determine whether the eccentric phenomenon occurred in the mechanical system: the accuracy rates were 99.60% and 74.49%.
Furthermore, based on the experimental results, the stacking of the densely connected convolutional layers and flatten layer used in 1D CNNG caused an overfitting problem. In the training stage, the failure mode and eccentricity detection accuracies of 1D CNNG reached 99.00% and 97.20%, respectively, for an average accuracy of 98.1%. However, in the testing stage, the accuracy rates of the motor failure mode and eccentricity classifications were 99.69% and 87.04%, respectively, and the average accuracy rate was 93.36%, well below the 98.1% obtained in training. The overfitting problem was solved by replacing the densely connected convolutional layers and the flatten layer with the proposed feature-extraction modules and WGAP layers. The average accuracy rates of 1D CNNT, 1D CNNC, and the proposed 1D CNN in the training stage were 90.04%, 95.05%, and 99.43%, respectively. In the testing stage, the average accuracy rates of the 1D CNNT, 1D CNNC, and proposed 1D CNN were 89.80%, 94.80%, and 98.85%, respectively, which were close to the results in the training stage. The results show that reducing the number of training parameters in the fully connected layer between the feature vector and the output layers could lower the possibility of overfitting.
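The overfitting argument above can be checked numerically against the 3% criterion from the model-building procedure. The sketch below uses only the training and testing averages reported in this section; the variable names are hypothetical, and applying the 3% threshold to these four models is an illustration rather than an analysis performed by the authors.

```python
# (average training accuracy %, average testing accuracy %) as reported above
reported = {
    "1D CNNG": (98.10, 93.36),
    "1D CNNT": (90.04, 89.80),
    "1D CNNC": (95.05, 94.80),
    "Proposed 1D CNN": (99.43, 98.85),
}

MAX_GAP = 3.0  # design-flow criterion: gap below 3 points means no significant overfitting

# Train/test gap in percentage points, rounded to two decimals
gaps = {name: round(train - test, 2) for name, (train, test) in reported.items()}

# Models whose gap violates the criterion
overfit = [name for name, gap in gaps.items() if gap >= MAX_GAP]
```

Only 1D CNNG exceeds the threshold (a gap of 4.74 points), while the three WGAP-based models stay below 1 point.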
Furthermore, a comparison between the 1D CNNT, 1D CNNC, and proposed 1D CNN models is discussed. Table 5 indicates that the accuracy rate of the proposed 1D CNN model was higher than those of the similar models using the information from a single sensor, which confirmed the effectiveness of the fusion features learned from the extraction modules. The average accuracy of the proposed model was 98.85%, which was 4.05 and 9.05 percentage points higher than those of the 1D CNNC (94.80%) and 1D CNNT (89.80%), respectively. For the 1D CNNT model, the accuracies of the failure mode and eccentricity detection were 90.41% and 89.20%, respectively. Those of the 1D CNNC model were 98.93% and 90.675%, respectively. Both models had an approximately 90% accuracy rate of eccentricity detection. However, when using the multi-signal network, this accuracy improved to 98.04%. As for the failure mode detection, 1D CNNC already achieved a 98.93% accuracy rate. After combining the current and torque signal information, the accuracy rate increased to 99.66%.
To provide an intuitive understanding of the effectiveness of the proposed method, the feature vectors learned from 1D CNNT, 1D CNNC, and the proposed model were visualized using the t-SNE algorithm, a technique for dimensionality reduction. The 248-dimensional feature vector was reduced to three dimensions, and the visualization results are shown in Figure 9. By rotating and observing the plot, it can be seen that the features roughly separate the three classes of failure modes. However, Figure 9a shows that the features of the three failure modes tended to cluster at the boundaries between the classes. The separation of the three failure modes is better in Figure 9b than in Figure 9a; however, the features of the healthy, bearing fault, and demagnetized motors are sparsely separated into several groups. In Figure 9c, most features are distributed on the left side of the plot. However, the clustering of features at the boundaries was less critical than that in Figure 9a. Furthermore, similar to Figure 9b, the features of the three failure modes were separated into several groups, but the groups were closer together than those in Figure 9b. The results shown in Table 5 and Figure 9 confirm the effectiveness of the proposed feature-extraction module and 1D CNN for multi-sensor fusion.

Important Frequency Component
To find the important frequencies contributing to specific classifications, the CFM was generated and visualized in this section. Figure 10 shows the importance grades of the frequency components for different failure modes under similar operating conditions. The evaluated operating conditions were 1000 rpm, 0.24 Nm loading, and no eccentric effect. Comparing Figure 10a with Figure 10c, the current frequency spectra and the curves of the importance grade clearly demonstrate the difference between the two types of motors. However, the torque frequency spectra and the curves of the importance grade for both motors were similar.
As shown in Figure 10a,c, frequency components smaller than 400 Hz had similar importance grades for deciding the classification. Nevertheless, the model assigned higher importance grades to the range of 400-1050 Hz to obtain the classification results of the demagnetized motor. In Figure 10a, the peak value is located at approximately 550 Hz, whereas in Figure 10c, the two peak values are located at approximately 650 and 900 Hz. Furthermore, in Figure 10b,d, the two peak values are located at approximately 120 Hz and 425 Hz, respectively. However, in Figure 10d, frequency components smaller than 120 Hz accounted for a larger importance grade, and the frequency amplitude at 425 Hz was larger than that in Figure 10b. The analyses in Sections 4.2 and 4.3 indicate that the current frequency spectrum mainly determined the classification results. Furthermore, the model could achieve a higher classification accuracy by combining the torque frequency information in the two influential regions.
Table 6 lists the training and prediction speeds of each model. The prediction time in Table 6 is the average over 1000 predictions. For the classical machine-learning methods, the training times of the KNN and SVM were 0.09 and 9.32 min, respectively. These training times were significantly shorter than those of the deep-learning methods. However, they do not account for the time required for feature extraction: it took about 6 h to extract the handcrafted features from the entire dataset. The deep-learning methods could perform feature extraction simultaneously during training. The training times of the MLP, 1D CNNG, 1D CNNT, 1D CNNC, and proposed method were 33.28, 127.92, 120.93, and 85.06 min, respectively. Although the MLP and 1D CNNG methods required less time to train the model, their performances were typically inferior.
By combining the features of the current and torque signals, the proposed model could converge faster and took less time for training than the models that used a single signal. The SVM could perform the fastest prediction. The proposed method took 0.0335 s to predict a single sample, whereas the other models took approximately 0.025 s to predict one sample. The prediction time of the proposed method was ten times longer than that of the SVM. However, on the real platform, 10 consecutive predictions were used to determine one diagnostic result. Hence, the proposed model took approximately 0.335 s (10 × 0.0335 s) to diagnose the state of the motor each time, which is sufficiently short for the motor diagnosis system. Furthermore, among the deep-learning methods, the proposed model had the highest performance.

Computation System
The computer used for training in this work was a PC equipped with an NVIDIA GeForce GTX 1080Ti and an Intel® Core™ i7-8700K. The PC had 32 GB of memory to process such a large dataset. This work used Python as a development tool owing to its convenient libraries, such as Scikit-Learn, Keras, and TensorFlow. The drawings in this article were produced in MATLAB, owing to its convenient drawing functions and the aesthetics of its figures.

Discussion of Results
This paper proposes a 1D CNN model, which is a multi-signal fault diagnosis network for PMSMs. The experimental results reveal that the current and torque signal fusion features enable the model to perform better than single-signal models. The operating conditions evaluated included a wide speed range (100-1600 rpm), loading effect (0-0.24 Nm), and eccentric effect. The variable speeds, loads, and eccentricity made the detection task more similar to actual applications. For humans, such complex operating conditions are difficult to classify and require considerable time for analysis. By using the handcrafted features, the KNN and SVM classifiers achieved classification accuracies of 88.96% and 94.66%, respectively. However, the designed 1D CNN model could automatically find the discriminative features under such complex conditions, and effectively classified the three failure modes and eccentricity with accuracies of 99.66% and 98.04%, respectively. The average accuracy rate was 98.85%. Compared with a previous study, the dataset used in this study was relatively large. The overfitting phenomenon was not significant.
Furthermore, this research extended the technique of 2D image object localization to finding the important segments of a 1D signal. The proposed model with the WGAP layer could generate the CFMs of the current and torque signals. The CFM could effectively identify the important frequency components contributing to the classification. By observing the CFMs under different operating conditions, users can determine the difference in signals between healthy and faulty motors.

Future Work
We expect the motor fault diagnosis system to be used in the status monitoring of production lines. However, the system proposed in this paper is based on a PC, and the equipment cost and electricity consumption of a PC-based system are high. In the future, we will use the PC only to develop and design the detection model, and an embedded system will be used to implement the diagnosis. We intend to port the proposed diagnosis system from a PC to an embedded board, such as the Jetson Nano. By observing the CFMs, the unimportant frequency components can be discarded and the size of the input vectors can be reduced. Moreover, by changing the computation platform and data acquisition unit, the cost of deploying the diagnosis on the production line can be reduced. In addition, the 1×1 kernel and the WGAP layer used in the proposed model can decrease the size of the model. These advantages can lower the required computational resources and make the model more suitable for embedded systems.

Conflicts of Interest:
The authors declare no conflict of interest.