Remaining Useful Life Prediction of Turbofan Engine Based on Temporal Convolutional Networks Optimized by Genetic Algorithm

Selecting the parameters of a deep neural network and improving prediction accuracy remain difficult problems in the field of remaining useful life (RUL) prediction. To address them, an RUL prediction model based on temporal convolutional networks (TCN) and optimized by a genetic algorithm (GA) is proposed. First, forward-filling sliding sampling is used to extend the time step of the samples. Then, a genetic algorithm is used to search the hyperparameters of the TCN residual blocks. Finally, the performance of the proposed method is verified on the C-MAPSS dataset. The results show that, compared with other studies, the proposed GA-TCN reduces the root mean square error (RMSE) by 8.2%–27.56% and the score function (SF) by 28.24%–79.35%. Across the four sub-datasets of the turbofan engine, the RMSE and SF of the proposed method are on average 17.10% and 54.10% lower than those of the other methods.


Introduction
The turbofan engine, as a core component of the aerospace industry, has a complex structure and operates under harsh conditions. Accurate remaining useful life (RUL) prediction of turbofan engines helps ensure the safe and reliable operation of aircraft, cut maintenance costs, and enhance economic benefits. Deep learning, a new technology in the field of artificial intelligence in recent years, has excellent deep feature extraction ability and can establish the mapping relationship between measured data and degradation trends well [1]. Wang et al. [2] feed sensor data directly into a deep separable convolutional network (DSCN) to extract sensitive features and map them to the RUL of mechanical equipment. Hsu et al. [3] and Zheng et al. [4] both use deep long short-term memory (LSTM) networks to estimate the RUL of a turbofan engine. However, the parameters within a layer of an LSTM cannot be shared, making parallel computation difficult. Furthermore, as the number of network layers increases, so does the number of LSTM parameters, resulting in very long training times. Moreover, in existing RUL prediction methods, deep neural network parameter selection is difficult, and prediction accuracy is hard to improve.
Temporal convolutional networks (TCN) [5] are a kind of convolutional neural network specially designed for time-series processing. Compared with LSTM, they also have the ability of temporal memory, and their superior performance has been verified in fields such as machine translation, speech synthesis, and time-series prediction. Therefore, a TCN model whose parameters are searched by a genetic algorithm (GA) is proposed in this paper and used to predict the RUL of the turbofan engine. On the one hand, this method makes full use of the advantages of the TCN structure: convolution-layer parameter sharing, parallel computing, and good memory ability. On the other hand, taking advantage of the good global convergence, high optimization efficiency, and small number of parameters of GA, the parameter optimization of TCN reduces the workload of manual tuning, yields a stable TCN structure, improves the prediction accuracy of the model, and provides a reliable basis for turbofan engine maintenance strategies.

Temporal convolutional networks
The temporal convolutional network (TCN) is a kind of convolutional neural network specially designed to deal with time series, built on residual networks [6] and proposed by Bai et al. [5] in 2018. Its basic residual block structure is shown in figure 1. For a one-dimensional sequence input x ∈ R^n and a convolution kernel f: {0, 1, …, k−1} → R, the dilated convolution operation F on sequence element s is defined as

F(s) = Σ_{i=0}^{k−1} f(i) · x_{s−d·i},

where d is the dilation rate and k is the kernel size. The basic structure of TCN is shown in figure 2, where each layer of TCN is a residual block (ResBlock), and the dilation rate of the N-th residual block in the series increases as 2^{N−1} from shallow to deep, giving the network a better ability to remember historical information. The biggest advantage of TCN over RNN lies in the weight sharing and local perception of the convolution layer. Weight sharing effectively reduces the number of parameters that need to be trained, while local perception faithfully reflects the local structure of the input data covered by the current convolution kernel. Therefore, TCN can not only learn long historical time dependencies of the input series but also perform parallel computing like a CNN. Although an RNN can theoretically capture an infinitely long history, TCN has been shown to be more suitable for tasks that require long-term historical dependence [5].
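As a concrete illustration of the dilated convolution above, the following is a minimal NumPy sketch (not the authors' implementation; the function name and left zero-padding choice are assumptions). Causality is obtained because the output at time s depends only on inputs at times ≤ s:

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """Compute F(s) = sum_i f(i) * x[s - d*i] with implicit zero padding
    on the left, so each output depends only on current and past inputs."""
    n, k = len(x), len(f)
    y = np.zeros(n)
    for s in range(n):
        for i in range(k):
            j = s - d * i          # index into the (zero-padded) past
            if j >= 0:
                y[s] += f[i] * x[j]
    return y

x = np.array([1., 2., 3., 4., 5.])
f = np.array([1., 1.])             # kernel size k = 2
print(dilated_causal_conv(x, f, d=2))   # → [1. 2. 4. 6. 8.]
```

Stacking such layers with dilation rates 1, 2, 4, … (2^{N−1}) makes the receptive field grow exponentially with depth, which is why a TCN can cover long histories with few layers.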

RUL Prediction Framework of GA-TCN
The improved residual block is used to build the deep TCN structure in this paper, and the genetic algorithm (GA) is used to optimize the key network parameters. The preprocessed time-series data is first used as the input of the TCN; the output of the last TCN layer is then connected to a fully connected layer with a single neuron, whose output is used directly as the RUL prediction result. Figure 3 depicts the RUL prediction process, which is divided into three modules: data processing, parameter optimization of TCN, and RUL prediction.
Figure 3. Proposed RUL prediction framework based on GA-TCN

Data processing
The original sensor data of the turbofan engine differ in dimension and order of magnitude. To improve the accuracy of RUL prediction, this paper uses min-max normalization to limit the range of the original data to [0, 1]. To reduce the interference of the engine's normal running state on the degraded-state characteristics, references [2][3][4] and [7][8][9] all adopt the assumption of "piecewise linear degradation" to construct the RUL labels. When the actual remaining life Y is greater than a certain upper boundary value T, the engine is considered to be in the normal running state and its label is held constant at T; when Y is less than or equal to T, the engine has begun to degrade and its label is the actual value Y. That is, Y_label = min(Y, T), where Y_label is the piecewise RUL label. The choice of T is closely related to the degradation state of the engine; according to the research in reference [10], T is usually between 120 and 130 cycles.
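The two preprocessing steps above can be sketched as follows (a minimal NumPy version; the function names and the epsilon guard are assumptions, not the authors' code):

```python
import numpy as np

def min_max_normalize(X):
    """Scale each sensor channel of X (shape: [time, sensors]) to [0, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min + 1e-12)  # epsilon avoids divide-by-zero

def piecewise_rul_labels(rul, T=125):
    """Clip the actual RUL at the upper boundary T (piecewise linear degradation)."""
    return np.minimum(rul, T)

rul = np.array([200, 150, 125, 80, 10])
print(piecewise_rul_labels(rul, T=125))   # → [125 125 125  80  10]
```

With T = 125, every cycle more than 125 steps from failure receives the constant label 125, so the network only has to model the degradation phase.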

Parameter Optimization of TCN
In this subsection, GA uses floating-point coding [11]; the specific search parameters and their ranges are shown in table 1, corresponding to: 1) activation function A: ReLU, ELU, LeakyReLU, or SELU; 2) optimizer O; 3) the number of network layers, i.e. the number of residual blocks L: to ensure that the TCN can learn the time dependencies of the input data while remaining easy to train, the network has a minimum of three and a maximum of seven layers, and each layer's range corresponds to the range of the number of neurons in that layer; 4) dropout rate D; 5) the parameter α of LeakyReLU: a value further randomly generated when an individual network's activation function in the initial population is LeakyReLU; 6) convolution kernel size K; 7) convolution mode P: φ1 denotes causal convolution and φ2 denotes non-causal convolution; 8) batch size B; 9) initial learning rate λ. The dilation rate of the residual block in each layer of a TCN individual is 2^{N−1}, where N is the layer index of the residual block. During training, when the prediction performance on the validation set (a randomly selected 10% of the training set) does not improve for 150 epochs, the learning rate of the individual network is reduced to 0.1 times its original value. The early-stopping patience during training is 200 epochs, the maximum number of training epochs is 300, and the loss function is the mean square error (MSE). The specific steps for optimizing the TCN parameters are as follows:
Step 1 (Initialize the population): initialize 10 TCN models, and keep the population size constant in every subsequent generation.
Step 2 (Calculate the fitness of the population): the fitness in GA is the MSE between the predicted RUL and the label value; the smaller the value, the better the fitness.
Step 3 (The selection operator): a deterministic selection operator is used; after the individuals of each generation are sorted by fitness from best to worst, the top 50% are retained, and a further 10% are selected from the remaining poorer individuals to prevent the algorithm from converging prematurely.
Step 4 (The crossover operator): a uniform crossover operator is used; the gene values at each gene position of the selected parents are exchanged with the same crossover probability to form two new individuals. The number of new individuals generated by crossover is determined jointly by the number of individuals retained by the selection operator and the initial population size.
Step 5 (The mutation operator): using the basic mutation operator, a parameter in table 1 is randomly selected and mutated within its given range; the mutation rate is set to 0.1.
Step 6 (Termination condition): when the algorithm completes 50 iterations, the operation is terminated.
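Steps 1–6 can be sketched in miniature as below. This is a hedged illustration, not the authors' code: a toy sphere function stands in for the RUL validation MSE, and the bounds, helper names, and tie-breaking details are assumptions, while the population size (10), generations (50), mutation rate (0.1), deterministic 50% + 10% selection, and uniform crossover follow the text:

```python
import random

BOUNDS = [(-5.0, 5.0)] * 4           # stand-in for the parameter ranges of table 1
POP, GENS, MUT_RATE = 10, 50, 0.1

def fitness(ind):                    # stand-in for validation MSE (smaller is better)
    return sum(g * g for g in ind)

def select(pop):                     # Step 3: keep best 50%, plus 10% of the rest
    pop = sorted(pop, key=fitness)
    return pop[: POP // 2] + random.sample(pop[POP // 2:], max(1, POP // 10))

def crossover(p1, p2):               # Step 4: uniform crossover, swap genes w.p. 0.5
    c1, c2 = p1[:], p2[:]
    for i in range(len(p1)):
        if random.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

def mutate(ind):                     # Step 5: re-draw one random gene w.p. MUT_RATE
    if random.random() < MUT_RATE:
        i = random.randrange(len(ind))
        ind[i] = random.uniform(*BOUNDS[i])
    return ind

random.seed(0)
pop = [[random.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(POP)]  # Step 1
initial_best = min(fitness(ind) for ind in pop)
for _ in range(GENS):                # Step 6: fixed number of generations
    parents = select(pop)            # Step 2 happens inside select() via fitness()
    children = []
    while len(parents) + len(children) < POP:
        c1, c2 = crossover(*random.sample(parents, 2))
        children += [mutate(c1), mutate(c2)]
    pop = parents + children[: POP - len(parents)]

best = min(pop, key=fitness)
print(round(fitness(best), 4))       # best fitness never worsens across generations
```

Because the sorted top 50% survive unmutated each generation, the best individual is implicitly elitist, so the best fitness is monotonically non-increasing over the 50 generations.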
The additional comparison method is the particle swarm optimization (PSO) algorithm [12]. Following the experience in reference [10], the learning factor of PSO is 1.2, the minimum inertia weight is 0.4, and the maximum inertia weight is 0.9; the other experimental conditions remain unchanged.

Test Results and Discussions
In this section, based on CUDA 10.0 and TensorFlow 2.0 (GPU), the algorithm is implemented in Python 3.7, and the experiments are run on a computer with an Intel Core™ i5-9300H CPU, an Nvidia GeForce GTX 1660 Ti GPU, and 16 GB of RAM.

Dataset description
The verification dataset of this paper is the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset, which is widely used in RUL prediction [13]. It consists of four sub-datasets, simulated under different operating conditions and fault modes; the monitoring data of 21 sensors and 3 operating-condition variables are collected to reflect the degradation of a turbofan engine from start to failure. The dataset description is shown in table 2. The FD001 and FD003 sub-datasets have only one operating condition, so they are relatively simple to predict, while FD002 and FD004 are difficult to predict because they have six operating conditions, especially FD004, which also involves two failure modes. Therefore, this dataset is well suited to verifying the performance of the TCN algorithm. Among the 24-dimensional data, FD001 and FD003 have a single operating condition, so their three fixed operating-condition variables would strongly interfere with the extraction of degradation features by the TCN. Because the data of sensors 1, 5, 6, 10, 16, 18, and 19 are abnormal, only the data of the remaining 14 sensors are used as the TCN input for the RUL prediction of FD001 and FD003; for the RUL prediction of FD002 and FD004, the 3 operating-condition variables are used in addition to the 14 sensors above. In addition, to facilitate comparison, experiments are carried out with different upper boundary values T of the RUL labels.
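The forward-filling sliding sampling mentioned in the abstract can be sketched as follows (a minimal NumPy version; the function name and the choice to pad short engine histories by repeating the first row are assumptions). For each engine, windows of length 36 over the 14 selected sensors yield samples of shape (36, 14):

```python
import numpy as np

def sliding_windows(series, window=36):
    """Cut a [time, features] series into overlapping windows of shape
    (window, features). If the series is shorter than the window, pad by
    repeating (forward-filling) the first row."""
    t, f = series.shape
    if t < window:                                   # forward-fill short histories
        pad = np.repeat(series[:1], window - t, axis=0)
        series, t = np.vstack([pad, series]), window
    return np.stack([series[i: i + window] for i in range(t - window + 1)])

engine = np.random.rand(100, 14)                     # 100 cycles, 14 sensors
X = sliding_windows(engine, window=36)
print(X.shape)                                       # → (65, 36, 14)
```

A 100-cycle engine thus produces 100 − 36 + 1 = 65 training samples, matching the (None, 36, 14) input shape reported later for FD001.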

Evaluation indicators
In this paper, the prediction performance of the TCN is quantitatively evaluated by the scoring function (SF) [13], the root mean square error (RMSE), and accuracy [7].
RMSE = √((1/Q) Σ_{i=1}^{Q} d_i²),   SF = Σ_{i=1}^{Q} s_i,  where s_i = e^{−d_i/13} − 1 for d_i < 0 and s_i = e^{d_i/10} − 1 for d_i ≥ 0,

in which d_i is the error between the predicted RUL value of the i-th engine and its actual RUL, and Q is the number of test samples. The SF penalizes late predictions (d_i ≥ 0) more heavily than early ones, since overestimating the remaining life is more dangerous in practice.
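Under the standard C-MAPSS definitions [13], the two main metrics can be computed as follows (a minimal sketch; function and variable names are assumptions):

```python
import numpy as np

def rmse(d):
    """Root mean square error of the prediction errors d = RUL_pred - RUL_true."""
    d = np.asarray(d, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def score_function(d):
    """Asymmetric score: late predictions (d >= 0) are penalized more heavily
    (divisor 10) than early ones (d < 0, divisor 13)."""
    d = np.asarray(d, dtype=float)
    return float(np.sum(np.where(d < 0, np.exp(-d / 13) - 1, np.exp(d / 10) - 1)))

d = np.array([-13.0, 0.0, 10.0])
print(round(rmse(d), 3))            # → 9.469
print(round(score_function(d), 3))  # → 3.437
```

Note that with the symmetric RMSE, the errors −13 and +10 contribute comparably, while the SF weights them unequally by design.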

Analysis of experimental results
The dataset used in the TCN parameter search is the FD001 sub-dataset, with upper boundary value T = 125 and time step 36; the shape of the TCN input data is therefore (None, 36, 14). The convergence behaviour of GA and PSO is shown in figure 4. The convergence value of the best individual of GA is about 135, while that of the PSO algorithm is about 150; the optimization effect of GA is thus about 11% better than that of PSO. Finally, the parameters of the best individual, listed in table 3, are used to construct the TCN model of this paper, keeping the same learning-rate decay and early-stopping strategy described above, with the maximum number of training epochs increased to 1000. The comparison methods are as follows:

Method               Year   Approach
Wang et al. [2]      2019   DSCN
Hsu et al. [3]       2018   LSTM
Zheng et al. [4]     2017   LSTM + FNN
Ellefsen et al. [7]  2019   RBM + LSTM
Li et al. [8]        2018   CNN + FNN
Liu et al. [9]       2020   AGCNN

Figure 4. Convergence curves of GA and PSO (x-axis: number of generations of evolution; y-axis: fitness of the best individual).