Remaining Useful Life Estimation of BLDC Motor Considering Voltage Degradation and Attention-Based Neural Network

The brushless DC (BLDC) motor is widely used because it outperforms conventional DC motors. Complex operating conditions and overloading can cause several irregularities in a motor, and stator-related faults are among the most common faults in BLDC motors. Starting with a rise in local heating, a stator fault can greatly reduce motor efficiency and lead to a breakdown of the entire system. In this study, we present a deep learning-based approach to estimate the remaining useful life (RUL) of a BLDC motor affected by different stator-related faults. To analyze the motor health degradation, we investigated two types of stator faults, namely the inter-turn fault (ITF) and the winding short-circuit fault (WSC). A generator was coupled with the motor, and the generator's output voltage was monitored over the entire lifecycle using an average value rectifier (AVR). A recurrent neural network (RNN), a proven architecture for sequence modeling, is selected to train on the voltage degradation data. For a better estimation of nonlinear trends, long short-term memory (LSTM) with an attention mechanism is chosen to predict the motor RUL for both types of faults. The main motivation of the authors is that the proposed method can be used for real-time condition monitoring and health state estimation of BLDC motors. In addition, the proposed AVR-LSTM method is not affected by environmental influences, making it suitable for diverse operating conditions.


I. INTRODUCTION
A. PRELIMINARIES
An effective prognostics and health management (PHM) framework is a fundamental element of many engineering and industrial systems [1]. Besides predicting a future failure, a proper condition monitoring technique can greatly increase the overall performance of a system. Maximizing uptime, minimizing maintenance costs, and warning about system anomalies are the primary concerns of PHM technology [1]. Fault diagnosis and health prognostics are the two major components of PHM [2]. Fault diagnosis is the stage that detects the fault and finds the root cause of failure, whereas prognostics deals with forecasting the future state based on historical diagnostics data. Health prognostics is the most challenging part of PHM for several reasons [3]. Firstly, it is difficult to find a degradation parameter that accurately describes the change in health state over the lifecycle. Secondly, although there are several health indicators to diagnose a fault, such as kurtosis, variance, RMS, and entropy, these features do not always hold a distinguishable degradation trend from which health degradation can be properly estimated [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Jiajie Fan.
In motor operation, we primarily expect the motor to deliver a mechanical output for a given electrical input. Any fault that interrupts or restricts the output of the motor is a threat to the entire system [5]. In brushless DC machines, commutation relies entirely on electromagnetic induction since there is no mechanical brush. In BLDC motors, this is done by altering the current polarity in the stator coils through electronic switching. Based on the pole position of the rotor, the stator coils are converted into a set of electromagnets by energizing them with different current polarities. The stator is therefore the most important part of a BLDC motor, as the fundamental commutating operation takes place there. Many anomalies can occur in the stator coil due to operational complexity and manufacturing defects, for example, improper insulation during the build, excessive heat, aging over time, and rub impacts due to rotor eccentricity [6]. In permanent magnet machines, the most frequently occurring stator fault is a short circuit in the stator coils. A short circuit can occur at different places in the stator coil and, based on its position, can be categorized as: (a) inter-turn short circuit, and (b) winding short circuit. In both cases, the shorted sections create imbalanced impedance and induce currents in the winding that can damage the stator coil. These anomalies result in torque ripples, disturbed air-gap flux intensity, excessive heating, noise, and vibration. If the fault is not monitored properly, it can lead to the gradual breakdown of the motor, causing severe damage to property [7].
Several studies have shown different approaches to detect and diagnose stator-related faults in BLDC motors. A. Gandhi et al. presented a comprehensive review of inter-turn faults in electric motors [8]. J. K. Park et al. proposed stator current frequency analysis and input impedance monitoring for inter-turn faults of BLDC motors [9], [10]. S. Rajagopalan et al. used a time-frequency representation of motor currents to detect rotor-related faults in nonstationary conditions [11]. S. T. Lee et al. inspected the third harmonic components of the motor current to detect stator-related faults [12]. A detailed survey of BLDC motor faults can be found in [13]. These methods are well established and capable of detecting and diagnosing different motor faults. Among them, the third harmonic analysis of motor current has proven to be quite robust in diagnosing stator-related faults [14]. However, research on RUL estimation methodologies that account for stator-related faults has been very limited.
In the literature, there are three approaches to predict the remaining useful life of a rotary component: model-based, data-driven, and hybrid. A model-based approach requires a mathematical model of the degradation process. A data-driven approach requires a massive amount of data collected from the component using sensors. Hybrid approaches take advantage of both methods. Over the past few years, several model-based prognostics methods have been developed, for example, deep learning-based [15], Kalman filter [16], and Bayesian network [17] methods. A review of model-based prognostics methods can be found in [18]. However, model-based RUL prediction is troublesome since it requires knowledge of the inherent failure mechanism in order to build a suitable degradation model, and a proper mathematical model is often unavailable for a system. On the other hand, with the Industry 4.0 revolution, collecting, storing, and handling big data have become much easier [19]. The resources available to process the data and the advanced methods to extract fault-related information have grown rapidly, making data-driven prognostics a prevalent approach in machinery maintenance.
Ease of access to big data has also made deep learning approaches quite effective in RUL prediction [20]. Many studies have shown the efficient application of deep learning algorithms in machinery RUL prediction. A comprehensive review of deep learning for prognostics and health management can be found in [21], [22]. Recurrent neural networks (RNNs) have been the most popular deep learning approach for machinery RUL estimation due to their excellent time-series prediction. For example, Angela et al. used a hybrid VARMA-LSTM method to estimate the state-of-charge of electric vehicles [23]. Zhang et al. used LSTMs with deep fusion for machinery RUL prediction [24]. Wu et al. used vanilla LSTM for RUL prediction [25]. Further studies on LSTM for RUL estimation can be found in [26], [27]. In this study, we have implemented an attention mechanism with the conventional LSTM model for a more accurate and reliable RUL prediction.
Vibration signals collected from sensors throughout the entire lifecycle have been a popular measure for the fault diagnosis and RUL estimation of rotary machinery [28]. Several efficient and robust signal processing techniques have been developed to identify faults and learn the trend of fault features, for example, positive energy residuals [29], spectral kurtosis [30], spectral L1-L2 norm [31], ensemble empirical mode decomposition [32], and wavelet transform [33]. Features extracted using these methods serve as health indicators, describing the state of health at different stages. After sufficient fault-related information is extracted, the degradation trend is measured for different health indicators at different fault stages. However, the major problem of the multi-feature approach is preserving health-related information across different operating conditions. A machine operating in a laboratory environment will exhibit a marginally different vibration response in industrial applications, and so will a distinctive feature trend. This makes it challenging and inefficient to build different RUL frameworks for different operating conditions. In permanent magnet motors, the vibration response in the presence of a stator fault does not change abruptly, making it even more difficult to extract health-related information from vibration signals. In particular, at the incipient stage of a stator fault, no irregularity is observed in the motor vibration from which a conclusion could be drawn. This study aims to provide a robust prognostics framework for BLDC motors by considering the output voltage as a degradation parameter. A motor-generator setup is built, and the voltage produced by the generator is considered as a health indicator that is directly related to the mechanical output (torque) produced by the motor. Primarily, the generator voltage is not affected by surrounding noise, vibration, or heat; rather, it is constant for a specific mechanical input given to the generator. This allows us to directly monitor the performance of the BLDC motor without interrupting the motor or generator operation. The fundamental concept that motivates the authors of this paper is a robust RUL estimation framework that is not affected by environmental influences and can be monitored while the system remains in operation.

B. PROPOSED METHOD
In this paper, we propose a deep learning-based degradation estimation model to predict the RUL of BLDC motor. The output voltage of the coupled generator is considered as a degradation parameter of motor health state. Generator voltage is produced from the mechanical output of motor and free from external environmental influences like heat, noise, and vibration. Therefore, this parameter will act as a proper diagnostics index to estimate the RUL in dynamic operating conditions.
The generator's output voltage is necessarily a three-phase AC voltage. To convert this AC into an equivalent DC voltage, we used an average value rectifier (AVR) with a smoothing capacitor to suppress the ripple effect. The motor health degradation trend is recorded for two types of stator-related faults, and different fault thresholds are obtained by continuous monitoring of temperature, motor current, and speed. The motor current is analyzed in the frequency domain and the time-frequency domain to gain better intuition about the stator irregularity. Every third harmonic of the motor current (3rd, 6th, 9th, ...) is observed to detect both types of faults. Once the necessary diagnostics information is obtained, the collected data are filtered using the moving average technique to obtain an observable degradation trend. Later, an LSTM architecture based on the attention mechanism is used to estimate the RUL of the motor for both types of faults. Analysis of several regression metrics shows that both models performed effectively. However, for nonlinear degradation trends, the attention-based LSTM gives a better RUL prediction than the regular LSTM.
Several other performance metrics are also computed to evaluate the attention-LSTM and to compare it with the conventional LSTM model; these are discussed in later sections. A brief overview of the proposed prognostics framework is illustrated in Fig. 1.

II. RELATED THEORIES
A. RECURRENT NEURAL NETWORK
The recurrent neural network (RNN) was initially adopted for sequence modeling in the deep learning field, for example, predicting a sequence of music, text translation, and video activity recognition. It was mainly introduced to share features across neurons in different layers, which is difficult with an ordinary neural network [33]. Later, owing to its excellent capability of capturing time-series information, it has been widely implemented in many engineering fields. Time-series data play an important role in PHM, since data collected using sensors are primarily time series. In prognostics, we want to estimate a "value" of system health as a function of time. Therefore, RNNs can be a handy tool for an effective estimate of the remaining useful life (RUL) [34]. In RNNs, hidden layers have recurrent cells whose output is influenced by both the past input and the current input through feedback connections. Based on the layout and organization of these cells, RNNs can be of many types, such as many-to-many, many-to-one, one-to-one, and one-to-many. A basic RNN is illustrated in Fig. 2. $X_i$ denotes the input and $y_i$ denotes the output of the $i$th layer. $a_i$ is the activation of the $i$th layer, which is passed to the $(i+1)$th layer. $W_{ai}$ denotes the weight vector for activations in the $i$th layer. Therefore, while computing an output $y_i$, an RNN unit considers the input of that layer, $X_i$, as well as the activation of the previous layer, $a_{i-1}$. This information is passed throughout the entire RNN architecture. For the first layer, $a_0$ is usually a vector of zeros. The prediction of the output $y$ at time $t$ is expressed in (1), where "ReLU" refers to the activation function and $b_y$ refers to the bias associated with the neurons. Many other activation functions are available, such as sigmoid, tanh, and softmax.
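The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight names (`W_ax`, `W_aa`, `W_ya`), the tanh hidden activation, and the toy dimensions are hypothetical; only the ReLU output activation and the zero initial activation $a_0$ are taken from the text.

```python
import math

def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]

def rnn_forward(xs, W_ax, W_aa, W_ya, b_a, b_y):
    """Run a many-to-many RNN over the input sequence xs.

    Assumed hidden update: a_t = tanh(W_aa a_{t-1} + W_ax x_t + b_a)
    Output (ReLU, as in the text): y_t = ReLU(W_ya a_t + b_y)
    """
    a = [0.0] * len(b_a)  # a_0 is a vector of zeros
    ys = []
    for x in xs:
        pre = [p + q + b
               for p, q, b in zip(matvec(W_aa, a), matvec(W_ax, x), b_a)]
        a = [math.tanh(z) for z in pre]  # activation passed to the next step
        ys.append(relu([p + b for p, b in zip(matvec(W_ya, a), b_y)]))
    return ys, a
```

Each output depends on the current input and, through `a`, on all earlier inputs, which is the property that makes the architecture suitable for degradation time series.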

B. LONG SHORT-TERM MEMORY (LSTM)
The long short-term memory (LSTM) unit is used in an RNN so that it can handle long-term dependencies. For a large and complex sequence, an RNN finds it difficult to backpropagate through time, and in some cases the gradient becomes exponentially large or small. These phenomena are called the exploding gradient and vanishing gradient problems, respectively. LSTMs were introduced to solve this type of problem by creating a memory cell. Therefore, LSTM is a powerful approach in deep neural network architectures for sequence modeling [35]. An LSTM cell consists of three gates that control the flow of information throughout the network. These gates are called the input gate, forget gate, and output gate. Fig. 3 illustrates the working principle of LSTM. When the weights from the input gate take a zero value, no element can get into the block. Similarly, when the output gate takes a zero value, no element can get out of the block. If both gates are closed at the same time, the elements inside the cell are trapped and will not affect outside elements. Therefore, when backpropagation is done through an LSTM network, the gradient can propagate across many time steps without the vanishing gradient problem [36]. The mathematical representations of the weights and gates of the LSTM are given in [37], where * stands for input, forget, or output.
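The gating behavior described above can be illustrated with the commonly used LSTM cell formulation. The sketch below is an assumption: it follows the standard textbook equations rather than the exact notation of [37], and the parameter layout (`params` mapping gate names to weight and bias pairs) is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k
            for row, b_k in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step (standard formulation, assumed here).

    params maps "i", "f", "o" (gates) and "g" (candidate values) to
    (W, b) pairs that act on the concatenation [h_prev, x].
    """
    z = list(h_prev) + list(x)
    i = [sigmoid(v) for v in affine(*params["i"], z)]    # input gate
    f = [sigmoid(v) for v in affine(*params["f"], z)]    # forget gate
    o = [sigmoid(v) for v in affine(*params["o"], z)]    # output gate
    g = [math.tanh(v) for v in affine(*params["g"], z)]  # candidate values
    # Memory cell: keep part of the old state, admit part of the candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Hidden state: the output gate decides how much of the cell is exposed.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c
```

With the input and output gates driven toward zero and the forget gate toward one, the cell state is carried forward unchanged and nothing leaves the block, which matches the "trapped" behavior described in the text.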

C. PROPOSED ATTENTION-LSTM MODEL
Although LSTM can handle long-term dependencies by introducing a memory cell in the unit, it fails to pay different degrees of attention to sequences with multiple time-step predictions [38]. During the training of an LSTM network, a sliding time window is moved forward to obtain a prediction result. Therefore, predictions are affected by the sequence of events. In the added attention mechanism, a quantitative weight is assigned to each important time step to overcome the shallow attention of LSTM. Due to the additional weights, the model can automatically focus on relevant information and pay more attention to the intrinsic characteristics of the sequence data [39]. For nonlinear trends in time-series data, attention mechanisms can emphasize the inconsistent changes and improve the performance of the model. The mechanism of the attention-based LSTM is briefly described below:
Step 1: Initialize the network settings such as LSTM size, batch size, time step, and input-output size. Assume the feature vector learned by the LSTM network is $H$.
Step 2: Specify the attention weights and vectors as in (8) and (9). Here, $W_x$ is the trainable weight matrix and $b_{att}$ is the bias vector. $s_h$ is the current hidden state and $s_t$ is the target hidden state. $W_{att}$ is the attention weight vector and $a_i$ is the softmax-normalized vector. $\phi(*)$ is the activation function, which can be different for (8) and (9). In this study, both are kept as ReLU.
Step 3: Initialize the hidden state as per the weights in step 1 and step 2.
Step 4: Compute the weighted time-step vector as in (10).
Step 5: Process the weighted time steps by concatenating all the $c_t$ and compute the output vector, where $H^T$ denotes the transpose of $H$.
A concise framework of the proposed model is presented in Fig. 4. The attention layer takes the weights computed from the LSTM layers and computes a new time step as expressed in (10) [40]. In the attention mechanism, the weights of the LSTM layer, which are directly responsible for the score, are adjusted. This implies that the attention is not focused on every element of the input samples but rather on particular input sequences [41], [42]. This causes additional training time compared to LSTM regression. A regularization technique named "Dropout" is used to avoid overfitting during the training stage. The trained model is used to predict the RUL, and several performance metrics are computed to evaluate the performance of the model.
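To make the role of the attention weights concrete, the following sketch scores each LSTM hidden state, normalizes the scores with softmax, and forms a weighted sum over time steps. It is a generic attention-pooling example and an assumption throughout: the scoring function, the names `w_att` and `b_att`, and the single context vector are illustrative and are not claimed to reproduce equations (8) to (10) of the paper. The ReLU score activation follows the text.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(H, w_att, b_att=0.0):
    """Weight the hidden states in H (one vector per time step).

    score_t  = ReLU(w_att . h_t + b_att)   (assumed scoring function)
    a        = softmax(scores)             (attention weights, sum to 1)
    context  = sum_t a_t * h_t             (weighted time-step vector)
    """
    scores = [max(0.0, sum(w * h for w, h in zip(w_att, h_t)) + b_att)
              for h_t in H]
    weights = softmax(scores)
    context = [sum(a * h_t[k] for a, h_t in zip(weights, H))
               for k in range(len(H[0]))]
    return weights, context
```

Time steps whose hidden state aligns with `w_att` receive a larger weight, so abrupt changes in a degradation trend can dominate the context vector instead of being averaged out over the window.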

III. TEST BENCH AND FAULT DESCRIPTION
A. TEST RIG SETUP
Unlike a conventional DC motor, a BLDC motor lacks a mechanical commutator. Due to the lack of this mechanical part, a BLDC motor delivers higher torque than a DC motor. Other major features of the BLDC motor include higher efficiency, noiseless operation, better torque-to-body ratio, and longer lifecycle [43]. However, the control of a BLDC motor requires extra effort. An external controller with a dedicated control mechanism is needed for precise operation of a BLDC motor. Several studies have shown different control schemes for BLDC motors, such as motor-driver control, sensorless control, PID tuning, and comparator-based control [44], [45].
To perform tests on the BLDC motor for different fault types, we control the motor using a motor driver. A picture of the motor test circuit is presented in Fig. 5. The motor driver delivers a constant 24 V DC voltage as pulse width modulation with a 50% duty cycle. Hall effect sensors (HES) embedded in the stator windings monitor the pole position of the rotor, and the motor driver energizes the stator coils accordingly. We used a 26 W interior permanent magnet (IPM) type BLDC motor manufactured by DNJ Korea. Details of the motor used for this study are presented in Table 1. We built a conventional generator-motor (G-M) setup in which the BLDC motor is coupled with a generator. The G-M setup allows us to build, control, and measure experimental parameters with ease. A spider-type mechanical coupling is used to join the motor shaft and the generator shaft. The rotation of the generator's shaft depends entirely on the force applied by the motor shaft. Therefore, the electrical energy induced in the generator is necessarily driven by the motor. This is the fundamental concept of this investigation: estimating the motor's health state from the electrical energy produced at the generator's end. Both the motor and the generator are IPM-type brushless DC machines bought from the same manufacturer, which minimizes the uncertainty in electromechanical energy conversion. Also, a higher-rated generator (40 W) is used compared to the motor (26 W) to ensure that there is no overflow of current or voltage in the generator.
Besides providing the pulse width modulation (PWM) input to the motor, the motor driver serves many purposes, such as sensing the rotor pole position from the HES signals, controlling the speed of the motor, and displaying the instantaneous torque. Since it is impossible to measure the generator output accurately without a load, a 10M load structured in a delta configuration was used at the generator end. A full-bridge rectifier, the average value rectifier, is used to convert the AC output of the generator into a DC voltage.
An NI cDAQ-9178 was used as the data acquisition device along with a virtual bench. Different modules were installed in the device to acquire different sensor data: NI-9246 for current, NI-9205 for voltage, and NI-9214 for temperature. For this study, four sensor signals were monitored continuously during the motor test: motor current, generator voltage, stator temperature, and motor speed. The sampling rate was set to 5 kHz for current and voltage acquisition and to 100 Hz for temperature. LabVIEW software, a product of the same manufacturer as the DAQ, is used to configure, update, and set the different DAQ parameters.

B. STATOR-RELATED FAULTS
As mentioned earlier, the rotor is a permanent magnet in a BLDC motor. By passing three-phase currents through the coil windings, each winding is converted into an electromagnet. This electromagnet (stator) interacts with the poles of the permanent magnet (rotor) by attraction or repulsion: similar poles repel, and opposite poles attract. Therefore, after a certain amount of time, the poles of the electromagnet must be altered to keep the rotor moving. This can be done easily by switching the polarity of the current in the stator winding. This is how commutation occurs in a BLDC motor without the mechanical contact called a brush [46].
We use a motor driver to control the operation of the BLDC motor at different speeds. The motor driver converts the fixed DC voltage into a trapezoidal PWM input. The PWM signal acts like a switch and applies the voltage to the motor as a series of ON and OFF pulses. By controlling the ON period of the PWM pulses, we can control the duty cycle and hence the speed of the motor [43]. An inverter embedded in the motor driver works as a switch and energizes the motor phases based on the HES signal. Table 2 presents the commutation logic and the phase operation during each step of a 6-step commutation. The three phases of the motor are named phase A, phase B, and phase C. At any given step, each commutation logic signal is either "logic high" or "logic low", which corresponds to ON and OFF, respectively. For example, at step 1, AH and BL are 1 and the rest of the logic signals are 0. In this situation, a positive current is passed through phase A and a negative current through phase B, while phase C remains unbiased. For the next step, the inverter commutation logic changes accordingly (Table 2), and the three phases are again energized with positive, negative, and no current. The BLDC motor used in this study has four sets of three-phase windings, for a total of 12 windings. The most crucial commutating phenomenon occurs in the stator coil of the BLDC motor. Therefore, a fault in the stator creates a significant amount of irregularity in motor operation, and this can gradually lead to a complete breakdown of the motor system. In permanent magnet machines, the most frequently occurring stator faults can be categorized as: (a) inter-turn short circuit, and (b) winding short circuit. In both cases, the shorted sections create several complicated unbalances in motor operation that can develop into severe faults in the BLDC motor. In this paper, the inter-turn fault is referred to as the ITF fault, and the winding short-circuit fault is referred to as the WSC fault unless otherwise specified.
VOLUME 8, 2020
The BLDC motors used for testing are 2-pole, 12-winding motors with a 3-phase wye configuration of stator coils. This implies that every four windings are connected together, making three different phases from a total of 12 windings. The internal configuration of the BLDC motor used in this study is illustrated in Fig. 6. For the ITF fault test, we made a turn-to-turn short circuit by connecting two points of winding B2' as shown in Fig. 6(a). For the WSC fault, we created an inter-winding short circuit by joining windings A1 and C2' as shown in Fig. 6(b).
Since the windings labeled with the same letter are internally connected together, connecting any of the four windings results in a winding short circuit. To give a better intuition about the fault states and their effect on motor operation, a list of motor parameters is shown in Table 3. Some of the parameters, such as input voltage and loads, are predetermined and are kept the same for all three types of test. Other parameters, such as motor current, coil temperature, and motor speed, depend on the motor's performance. These parameters vary under the different motor faults, namely the ITF and WSC faults. For temperature monitoring, a 2-wire thermocouple was installed inside the stator coil adjacent to the defective winding. Since the rise in coil temperature is the primary evidence of stator irregularity, monitoring the temperature of the defective windings at the earliest stage provides better fault diagnosis information.

C. AVERAGE VALUE RECTIFIER
The output of the coupled generator in the G-M setup is necessarily a 3-phase AC voltage. For better relevance to the DC input voltage and for ease of real-time condition monitoring, we convert the generator AC output into a constant DC voltage by using an average value rectifier (AVR). It is a full-wave, 6-pulse rectifier with 6 diodes in operation. Typically, the output is a DC with two positive pulsation values. In the motor experiments, we used a smoothing capacitor to obtain a better DC output. The AVR used for the generator voltage measurement is illustrated in Fig. 7.
The voltages calculated from an AVR are expressed by the equations given in [47], where:
• $v_a$, $v_b$, $v_c$ are the three-phase AC voltages.
• $v_{ref}$ is the DC offset on the AC side.
• $V_{RMS}$ is the RMS component of the AC voltage.
• $v_{DC}$ is the voltage difference between the positive and negative terminals of the rectifier.
• $v_p$, $v_n$ are the voltages at the positive and negative terminals of the rectifier.
Different forms of the output voltage can be measured for different requirements. To study the degradation trend of the generator voltage, we chose to use $V_{RMS}$ in our computation. The capacitor plays an important role in the output of the AVR. In a complex operation, AC components can mix with the DC output of the AVR, creating a pulsation in the output known as ripple. Excessive ripple can lead to an erroneous approximation of the DC voltage and is therefore undesirable [48]. Based on the amount of power generated at the generator end, a suitable capacitor is used to minimize the AC ripple and obtain a stable DC output.
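The relationship between the three-phase voltages and the rectifier terminal voltages can be illustrated with an ideal diode bridge. The sketch below is an assumption and not the model of [47]: it treats the diodes as lossless, omits the smoothing capacitor and the load, and uses a hypothetical 10 V peak, 50 Hz balanced source sampled at the 5 kHz rate used in the experiments.

```python
import math

def rectifier_voltages(v_a, v_b, v_c):
    """Ideal six-pulse bridge: the most positive phase feeds the positive
    terminal (v_p), the most negative phase feeds the negative terminal
    (v_n), and v_dc is their difference."""
    v_p = max(v_a, v_b, v_c)
    v_n = min(v_a, v_b, v_c)
    return v_p, v_n, v_p - v_n

def simulate(v_peak=10.0, freq=50.0, fs=5000.0, cycles=10):
    """Rectify a balanced three-phase source; return mean, RMS, and
    peak-to-peak ripple of the unfiltered DC output."""
    n = int(fs * cycles / freq)
    v_dc = []
    for k in range(n):
        wt = 2.0 * math.pi * freq * k / fs
        phases = [v_peak * math.sin(wt - j * 2.0 * math.pi / 3.0)
                  for j in range(3)]
        v_dc.append(rectifier_voltages(*phases)[2])
    mean = sum(v_dc) / n
    rms = math.sqrt(sum(v * v for v in v_dc) / n)
    ripple = max(v_dc) - min(v_dc)
    return mean, rms, ripple
```

For an ideal bridge the unfiltered output stays between 1.5 and about 1.73 times the phase peak voltage, so the residual ripple is what the smoothing capacitor has to remove before the DC value is used as a health indicator.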

IV. RESULT ANALYSIS
A. FAULT DIAGNOSIS
Fault detection and diagnosis with proper identification of the fault magnitude is the preliminary task of prognostics. For stator-related faults in the BLDC motor, we monitored several parameters such as coil temperature, motor current, vibration, and output voltage. As motor vibration does not change during the incipient stage of failure, we present the stator coil temperature and the motor current signature analysis to understand the incipient fault characteristics. Figure 8 shows the stator coil temperature trends for the different stator faults. A 2-wire thermocouple is used to measure the coil temperature adjacent to the faulty coil. All three motors were started at the same time, when the stator coil temperature was almost equal to the experiment chamber temperature of around 28 °C. The motor with the WSC fault shows an exponential rise in temperature during the first hour. The ITF fault state also shows a rise in temperature during the first hour, but it is smaller than that of the WSC fault. The healthy motor, on the other hand, shows an almost constant temperature without any sudden rise. After 300 minutes of operation, the temperature trend reaches a plateau for both types of faults. In the faulty cases, the stator temperature reaches up to 85 °C and 125 °C for the ITF and WSC faults, respectively, whereas in the healthy state the temperature is around 40 °C. A rise in coil temperature is the primary evidence of a stator-related fault. Excessive heat produced in the stator can lead to several other severe faults, such as winding insulation breakdown, reverse magnetic flux, and demagnetization of the permanent magnet (rotor).
Since a winding short circuit was made in the stator winding of the motor, there was a significant change in the motor phase current as well. Motor current signature analysis (MCSA) is a handy tool for fault diagnosis. However, analyzing only the time-series motor current is not sufficient to detect the variations in the phase currents. Therefore, the motor current is analyzed by determining its frequency components using the fast Fourier transform (FFT). Figure 9 presents the line current trends of the BLDC motor in all three states of health. The sensor current data as a function of time for all health states are presented in Fig. 9(a), 9(b), and 9(c). Since the winding short circuit was on phase A and phase C, both phases show identical characteristics throughout the entire lifecycle of motor operation. To avoid redundancy in the fault characteristics, we present the current analysis of phase A for the WSC fault and of phase B for the ITF fault. The maximum current recorded for phase A is 2.5 A in the healthy state, whereas it increased to 4.0 A in the ITF and WSC fault states.
The motor used for testing had a 120° conduction in a six-step continuous operation of phase A, phase B, and phase C. Therefore, in normal operation, triplen harmonics (3rd, 6th, 9th, ...), which are necessarily zero-sequence components, are not present in a phase current. This follows from Kirchhoff's current law (KCL), which requires the sum of the three phase currents at any instant to be zero [12]. A violation of this balance is considered a deviation from the motor's normal operation and results in a peak at every third harmonic [14]. The frequency components of the motor current are presented in the right column of Fig. 9. The third harmonic component is not present in the healthy-state current, whereas it shows a noticeable peak in the faulty states, as shown in Fig. 9(e) and 9(f) for the ITF and WSC faults, respectively. The magnitudes of the harmonics and their frequency locations computed from the FFT are presented in Table 4.
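The third-harmonic check described above can be sketched with a single-bin discrete Fourier transform. The example is illustrative only: the 50 Hz fundamental, the unit current amplitude, and the 0.2 third-harmonic amplitude of the "faulty" signal are hypothetical values chosen for the demonstration; only the 5 kHz sampling rate comes from the text.

```python
import cmath
import math

def harmonic_magnitude(signal, fs, f0, order):
    """Amplitude of the `order`-th harmonic of f0, from a single DFT bin.

    The record should span an integer number of fundamental cycles so that
    the harmonics fall exactly on DFT bins.
    """
    n = len(signal)
    w = -2j * math.pi * order * f0 / fs
    acc = sum(x * cmath.exp(w * k) for k, x in enumerate(signal))
    return 2.0 * abs(acc) / n

def phase_current(fs, f0, n, third=0.0):
    """Synthetic phase current: unit fundamental plus an optional third
    harmonic of amplitude `third` (hypothetical test signal)."""
    return [math.sin(2.0 * math.pi * f0 * k / fs)
            + third * math.sin(2.0 * math.pi * 3.0 * f0 * k / fs)
            for k in range(n)]

fs, f0, n = 5000.0, 50.0, 1000          # 10 full cycles at 5 kHz
healthy = phase_current(fs, f0, n)
faulty = phase_current(fs, f0, n, third=0.2)
h3_healthy = harmonic_magnitude(healthy, fs, f0, 3)
h3_faulty = harmonic_magnitude(faulty, fs, f0, 3)
```

A threshold on the ratio of the third-harmonic amplitude to the fundamental is one simple way to turn this measurement into a fault flag; the appropriate threshold would have to be determined from measured data.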

B. DEGRADATION DATA
The generator's output voltage was continuously monitored as a performance measure of the BLDC motor over time. The 3-phase AC output of the generator is converted to a single DC voltage using the average value rectifier. A smoothing capacitor was placed at the output terminals to suppress the rectifier's ripple effect. The AVR output can be interpreted in several forms, such as the RMS voltage, the equivalent DC, and the positive part of the AC voltage. Since we consider the AVR output as a health indicator of the motor, we selected the RMS voltage of the AVR output as shown in (13). During measurement, the same data acquisition environment was used for both cases. For example, the sampling rate at the generator AC output and at the AVR DC output was kept at 5 kHz. Figure 10 presents the generator three-phase voltages labeled as phase U, phase V, and phase W. The corresponding DC output is shown in Fig. 10(d).
In practice, the transformed DC values contain residual variations known as ripples. Usually, the ripples are periodic in nature and their fundamental frequency is twice that of the AC voltage. Over a longer time span, ripples in the voltage data might lead to an erroneous approximation. Therefore, to reduce the ripples in the DC output, a smoothing capacitor is used with the AVR. Several configurations of smoothing capacitors were tried at the AVR end and, based on the optimum performance, a 100 µF capacitor was selected. By optimum performance, we mean the configuration that produces the least voltage ripple and the lowest possible DC peak-to-peak value. The 100 µF capacitor produced a DC peak-to-peak of around 0.4-0.6 V during conversion.
Besides, the acquired voltage data contained many redundant time-series samples and outliers. Therefore, filtering the data is necessary to obtain a discernible degradation trend. We chose a two-fold sliding moving average (MA) function to extract the diagnostics-related information from the sensor data. The MA performed on time-series voltage data can be expressed mathematically as (17):

$$\mathrm{MA} = \frac{v_1 + v_2 + v_3 + \dots + v_m}{m} \tag{17}$$

Here, $v_1 + v_2 + v_3 + \dots + v_m$ is the sum of the voltages at instances $t_1, t_2, t_3, \dots, t_m$, respectively, and $m$ is the MA order, i.e., the number of time-series elements considered in a single MA computation. Sensor data contain redundant information and short-term fluctuations that can lead to an erroneous prediction. Therefore, we performed a two-fold MA over the raw data: the first MA was performed with an order of 100 and the second with an order of 10.
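The two-fold filtering described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: only full windows are averaged (so the output is shorter than the input), the function names are hypothetical, and the noisy input series is synthetic. Only the MA orders of 100 and 10 are taken from the text.

```python
import random


def moving_average(values, order):
    """Sliding mean over `order` consecutive samples (full windows only)."""
    if order < 1 or order > len(values):
        raise ValueError("order must be between 1 and len(values)")
    window_sum = sum(values[:order])
    averaged = [window_sum / order]
    for i in range(order, len(values)):
        # Slide the window by one sample: add the new one, drop the oldest.
        window_sum += values[i] - values[i - order]
        averaged.append(window_sum / order)
    return averaged


def two_fold_moving_average(values, first_order=100, second_order=10):
    """Apply an MA of `first_order`, then an MA of `second_order`."""
    return moving_average(moving_average(values, first_order), second_order)


# Synthetic, slowly decaying voltage with additive noise (assumed data).
random.seed(0)
raw = [24.0 - 0.001 * k + random.gauss(0.0, 0.2) for k in range(2000)]
smoothed = two_fold_moving_average(raw)
print(len(raw), len(smoothed))
```

With full windows only, a series of length n yields n - 108 smoothed samples for orders 100 and 10. Whether the original processing padded the edges or aligned the windows differently is not stated, so the alignment used here is only one possible choice.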
In machine learning, normalization (or standardization) of sensor data is a popular approach to improve the performance of a model. Many researchers have adopted this technique to scale the data into a fixed range of values according to their requirements. Normalization provides a new dataset with a lower standard deviation and suppresses outliers. However, in this study, normalization was not done solely for this purpose. We applied the normalization after the training was performed on the dataset, in order to set a boundary for the maximum life ($RUL_{100\%}$) and the end of life (EOL, $RUL_{0\%}$). We used a min-max scaler to scale the voltage data to the range 0-1. Since the RUL primarily focuses on the estimation of component usage, a percent scale showing 0% (EOL) and 100% (full lifetime) gives a better intuition. Voltage normalization using the min-max scaler can be expressed as (18):

$$V_{sc} = \frac{V_i - V_{min}}{V_{max} - V_{min}} \tag{18}$$

where $V_{sc}$ is the scaled voltage after min-max scaling, $V_{max}$ and $V_{min}$ are the maximum and minimum voltages, and $V_i$ is the voltage at the $i$th instance.

Figure 11 illustrates the degradation data of the generator output voltage for both types of faults. The red scattered points represent the AVR voltage over 193 hours for the ITF fault and over 255 hours for the WSC fault. An observable trend of health state degradation was obtained by sliding the MA function over different window lengths. It can be noticed that the degradation trend of the WSC fault is more nonlinear than that of the ITF fault. Therefore, a model needs to be more robust to predict the RUL under the WSC fault.
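The min-max scaling and its interpretation as a percent RUL scale can be sketched as follows. The function names and the voltage values are hypothetical and serve only to illustrate the mapping of the highest voltage to 100% and the lowest voltage to 0%.

```python
def min_max_scale(voltages):
    """Scale voltages to the range 0-1 using the min-max rule."""
    v_min = min(voltages)
    v_max = max(voltages)
    if v_max == v_min:
        raise ValueError("constant series cannot be min-max scaled")
    span = v_max - v_min
    return [(v - v_min) / span for v in voltages]


def to_rul_percent(voltages):
    """Express the scaled voltage on a 0-100% scale (0% = end of life)."""
    return [100.0 * scaled for scaled in min_max_scale(voltages)]


# Hypothetical filtered AVR voltages in volts, not measured data.
filtered_voltage = [24.0, 23.0, 22.5, 21.0, 20.0]
rul_percent = to_rul_percent(filtered_voltage)
print(rul_percent)
```

Because the scaler depends on the minimum and maximum of the whole record, it is applied here to a complete run-to-failure series; applying it online would require the end-of-life voltage to be fixed in advance.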

C. RUL ESTIMATION
After acquiring the degradation data for the ITF and WSC fault types, the data were filtered with the moving-average filter and normalized. The filtered values are considered as the actual RUL of the BLDC motor and fed into the RNN network for training. Since finding the optimal parameters is a challenging task, we ran several trials of the LSTM model with different parameters. For example, the exponential linear unit (ELU) was used as the activation function, with a linear activation in the output layer, and several batch sizes were tested, e.g., 1300, 3500, 6900, and 8000. ELU with a batch size of 8000 was then chosen to train the models, as it provided the best performance on the validation dataset. ELU preserves the advantages of the rectified linear unit (ReLU) and reduces the bias shift problem, making the model suitable for nonlinear regression [39]. Two LSTM layers with 128 and 66 neurons were used for the ITF fault. To predict the nonlinear behavior of the WSC fault, the number of neurons in the first LSTM layer was increased to 264. The Adam optimizer, which computes adaptive learning rates based on the error gradients, was used for both models with a learning rate of 0.0001. In the attention-LSTM model, the degradation data were the inputs to the LSTM layers, and the features learned by the LSTM layers were fed into the attention layer to compute the necessary weights, as shown in (10). The weighted time steps were then processed for the regression and prediction of the degradation data. Dropout was used to mask part of the hidden states so that the masked neurons do not contribute to forward propagation during training, which helps the model avoid overfitting. During testing and prediction, however, dropout was disabled so that all neurons contribute to forecasting the future time steps. A list of model parameters is shown in Table 5. The filtered degradation data, shown in Fig. 11, are considered as the actual RUL. A best-fit linear trend representing the maximum life, $RUL_{100\%}$, and the EOL, $RUL_{0\%}$, is computed by linear regression on the actual RUL.
This linear fit line is called the true RUL. Besides representing a linear RUL trend, the true RUL is used to compute several prognostics metrics and indexes. In this study, we computed the error ratio, $ER_i$, and its standard deviation to evaluate the performance of the prognostics models. Figure 12 presents the RUL estimation of the BLDC motor subjected to the ITF fault. The total motor lifetime for the ITF fault was 193 hours. A total of 373,242 data points were used to predict the RUL, of which 70% were kept for model training and 30% for testing. The LSTM model performed quite well in predicting the motor's RUL for the ITF fault. In Fig. 12(d), the red curve shows the true RUL as a linear regression from $RUL_{100\%}$ to $RUL_{0\%}$. It can be observed that the LSTM with attention mechanism performs better than the regular LSTM. Because the entire trend can be difficult to read at once, the RUL estimation results are also presented at 25%, 50%, 75%, and 100% service time intervals. The attention-based LSTM captures the trend of the actual RUL well, and its prediction is much closer to the actual data than that of the LSTM without attention mechanism.

Figure 13 presents the RUL estimation for the WSC fault. The motor service time was 255 hours, with 429,885 data points. Of these data, 70% were used for training the models and 30% for testing. As seen from the degradation data, the WSC fault exhibits a nonlinear trend on several occasions. This is why we used 264 neurons in the first LSTM layer, unlike the ITF fault model, which used 128. The other parameters for the WSC RUL estimation were kept the same. The regular LSTM captured the degradation trend quite well, but its prediction accuracy was lower than that of the LSTM with attention. As with the ITF fault, the added attention layer resulted in a better prediction and captured the nonlinearity of the WSC fault.
To make the predicted RUL easier to interpret, both the predicted RUL and the actual RUL are plotted over shorter time frames covering 25%, 50%, 75%, and 100% of the data.
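The construction of the true RUL as a best-fit line and the 70%/30% partition of the data can be sketched as follows. This is a minimal illustration with assumptions: the split is taken to be chronological (the text does not state how the partition was drawn), the function names are hypothetical, and the actual RUL values are invented for the example. Only the 193-hour lifetime and the 70/30 proportion come from the text.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def chronological_split(series, train_percent=70):
    """Keep the first `train_percent` percent for training, the rest for testing."""
    cut = (len(series) * train_percent) // 100
    return series[:cut], series[cut:]


# Hypothetical actual RUL (percent) sampled over a 193-hour lifetime.
hours = [0.0, 48.25, 96.5, 144.75, 193.0]
actual_rul = [100.0, 78.0, 47.0, 28.0, 0.0]

slope, intercept = fit_line(hours, actual_rul)
true_rul = [slope * h + intercept for h in hours]  # linear "true RUL" line

train, test = chronological_split(list(range(10)))
print(round(slope, 4), round(intercept, 2), len(train), len(test))
```

A chronological split keeps the last part of the lifetime unseen during training, which matches the forecasting use case; a random split would be a different design choice.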

D. PERFORMANCE EVALUATION
As seen in Fig. 12 and Fig. 13, the LSTM with attention mechanism outperforms the conventional LSTM in prediction quality for both types of faults. This is because the attention mechanism emphasizes the abruptly changing portions of the degradation trend. It can be seen that the prediction of the LSTM with attention mechanism follows an almost identical trend to the voltage degradation. Even for a small change in the trend, the attention-LSTM is able to capture the change and predict accordingly. However, in terms of computational cost, the attention mechanism required more time per epoch during training due to the extra weight computations. Two regression metrics are obtained from both models, namely the root mean squared error (RMSE) and the mean absolute error (MAE). These metrics can be described mathematically as (19) and (20).
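$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(v_i - \hat{v}_i\right)^2} \tag{19}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|v_i - \hat{v}_i\right| \tag{20}$$

Here, $v_i$ is the actual voltage data, $\hat{v}_i$ is the voltage data predicted by the models, and $N$ is the number of data points. Equation (21) is used to compute the percent error of the RUL prediction for different amounts of data [49].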
The RMSE and MAE of the models are presented as percent errors in Fig. 14. MAE and RMSE give information about the accuracy of a model's prediction. $ER_i$ is further computed to give an intuition about the RUL prediction error. In Table 6, the standard deviations of $ER_i$ are shown in order to evaluate the stability of the RUL prediction results for different data sizes.
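For reference, the two regression metrics can be computed as in the following minimal Python sketch, which uses the standard definitions of RMSE and MAE. The function names and the sample values are hypothetical; the error ratio of (21) is not reproduced here because its exact form is given in [49].

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical scaled voltages (0-1), not results from the experiment.
actual_voltage = [1.0, 0.8, 0.6, 0.4]
predicted_voltage = [0.9, 0.8, 0.7, 0.4]

print(f"RMSE = {rmse(actual_voltage, predicted_voltage):.4f}")
print(f"MAE  = {mae(actual_voltage, predicted_voltage):.4f}")
```

RMSE penalizes large individual deviations more strongly than MAE, so reporting both gives a view of the typical error as well as of occasional large misses.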
A few more factors should be considered while evaluating the performance of a deep learning model, such as the computational resources and the time taken for training. All the models were trained on a computer with an AMD Ryzen 7 2700 octa-core CPU and 32 GB of RAM. For accelerated computation, an NVIDIA GTX 970 GPU with 4 GB of VRAM was used. The training times for the LSTM models were 83 and 108 minutes for the ITF and WSC faults, respectively. For the attention-LSTM model, training took around 117 and 145 minutes for the ITF and WSC training data, respectively. Therefore, it can be inferred that if computational time is not a constraint, the attention-based LSTM is preferred over the regular LSTM for a better RUL prediction.
The attention layer was added to place more focus on the informative portions of the health degradation data. Its benefit is also evident from the performance metrics of the RUL models. In both cases, the MAE and RMSE of the attention-LSTM are lower than those of the LSTM models. The error ratio, $ER_i$, is directly related to the error between the actual RUL and the predicted RUL. The standard deviation of $ER_i$ for different data sizes also indicates the stability and better prediction of the attention-LSTM over the conventional LSTM model.
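The way an attention layer places more weight on informative time steps can be illustrated with a small Python sketch. It uses a generic dot-product scoring followed by a softmax, which is an assumption made for the example and not necessarily the formulation of (10); the hidden states, the scoring vector, and the function names are all hypothetical, and in a trained model the scoring parameters would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, score_vector):
    """Weight each time step and return (weights, context vector).

    hidden_states: list of T vectors, one LSTM output per time step.
    score_vector:  vector of the same dimension (learned in practice).
    """
    scores = [
        sum(h_j * w_j for h_j, w_j in zip(h, score_vector))
        for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[j] for w, h in zip(weights, hidden_states))
        for j in range(dim)
    ]
    return weights, context


# Hypothetical hidden states for three time steps (dimension 2).
hidden = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8]]
weights, context = attention_pool(hidden, [1.0, 1.0])
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

In this toy case the last time step receives the largest weight because its score is the highest, so it dominates the context vector passed to the regression output. The extra score and weight computations per time step are also consistent with the longer training time reported for the attention-LSTM.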
An accurate estimation of the remaining useful life (RUL) can not only verify the goal of a system but also eliminate failure risks through early fault detection. This study aims to provide a real-time RUL estimation framework for BLDC motors subjected to different stator-related faults. The components used for measurement and instrumentation in this study can be easily built in an industrial environment. The method can be further extended to predict the RUL of other electric motors, as well as other rotary machinery, by coupling a generator to the rotating part and monitoring its output as a health index.

V. CONCLUSION
The BLDC motor has gained vast popularity over the past few decades due to its high efficiency and low maintenance. With increased demand and complex operating environments, a robust prognostics and health management framework for BLDC motors is essential. This paper has presented an effective RUL estimation method for BLDC motors that uses the generator output voltage as a health indicator. Two types of stator-related faults are investigated, namely the ITF fault and the WSC fault. MCSA is performed on the motor current for both fault types to understand the fault characteristics and identify the faults at the earliest stage. To acquire the generator output voltage, we used an average value rectifier to efficiently sense and acquire data. The data collected over the entire lifecycle are normalized using moving average filtering, and the ground truth of the degradation is obtained as the true RUL. A conventional LSTM model and an attention-based LSTM model were then trained for future predictions of the RUL. The proposed attention-LSTM model is found to be more effective in predicting the RUL for both types of faults. The outcomes of this paper can be summarized as follows:
(1) This RUL estimation method can be used under different operating conditions, as it is free from environmental influences such as heat, noise, and vibration.
(2) This method allows us to predict the RUL as well as estimate the state of health of the motor during operation. This makes the technique highly applicable to on-the-fly condition monitoring of BLDC motors.
(3) The model built for the prediction of the output voltage can be further applied to active power monitoring, efficiency monitoring, etc. Since these measures also depend on the output voltage and current, real-time condition monitoring can be performed using these indexes while keeping the motor in operation.

In future work, we look forward to modeling the uncertainties associated with this RUL estimation framework. The proposed AVR-LSTM model with attention mechanism can be further modified to capture fault-related information using a real-time updating method. This will further increase the robustness and reliability of the method.