Industrial Time Series Data Forecasting of LSTM Neural Network Based on Attention Mechanism

At present, the maintenance of industrial equipment still relies mainly on regular maintenance and after-the-fact maintenance, while the development of intelligent industrial production has dramatically increased the volume of equipment production data. In order to plan maintenance activities reasonably during the use stage of CNC machine tools and other industrial equipment, this paper proposes an industrial time series data prediction method based on an LSTM neural network with an attention mechanism. Firstly, an LSTM recurrent neural network is applied to the complex historical data of CNC machine tools to obtain the overall characteristics of the important time series. Secondly, the attention mechanism is introduced, and the network structure design and the algorithm of the prediction process are given. Finally, the prediction accuracy is compared experimentally with that of a standard BP neural network. The experimental results show that the LSTM model based on the attention mechanism achieves higher prediction accuracy than the standard BP neural network, which verifies the effectiveness of the method.


Introduction
The development of new technologies, such as the Industrial Internet of Things and Big Data, has greatly promoted intelligent and automated industrial production. In industrial production, it is very important to find reliable indicators of production stability in the large amount of information collected by various sensors. As data with typical time series characteristics, fault time series data can be used to predict the reliability of CNC machine tools and other equipment during the use stage, which avoids blind maintenance, makes maintenance more targeted, and improves the reliability and continuity of equipment operation. This is of great significance for formulating reliable production plans and carrying out reasonable maintenance activities.
At present, there are mainly two kinds of methods for predicting equipment reliability from existing knowledge and experience such as time series data. One is based on statistical models, such as the ARMA model [1] and the ARIMA model [2]. The other is based on artificial intelligence, such as artificial neural networks, support vector machines [3], and deep learning [4]. Compared with statistical models, prediction methods based on artificial intelligence deal better with the nonlinear relationships in the data and with the overall logic of the data and information.
With the rapid development of deep learning, some deep learning models have been used in research on CNC machine tool fault prediction. The LSTM neural network is a special recurrent neural network with strong sequence processing capabilities. Considering the gradual failure of CNC machine tools during operation and the strong time series characteristics of the collected data, this paper proposes an attention-based [5] LSTM method for industrial time series data prediction, including the detailed design of a four-layer network structure (input layer, hidden layer, fully connected layer, output layer) and the algorithms for network training and network prediction. A simulation experiment is carried out on a data set from a centerless grinding machine, and the results are compared with a BP neural network model. The experimental results show that the LSTM time series prediction method with the attention mechanism has high prediction accuracy and is suitable for industrial time series data prediction.

Research process
In view of the time series characteristics of industrial data, this section presents the attention-based LSTM neural network model for industrial time series data prediction (see Figure 1). The model consists of a four-layer network structure (input layer, hidden layer, fully connected layer, output layer) and two functional modules (model training and model prediction). Supervised learning is used to train the fault prediction model. The input layer preprocesses the original fault time series data to meet the input requirements of the model. The hidden layer uses several LSTM memory cells to build a recurrent neural network; through repeated iterations, the weight parameters are adjusted to reduce the error until it converges. The attention mechanism is introduced to give different weights to the data at different points in time, and the dropout algorithm is added to alleviate overfitting when learning from multivariate data. The fully connected layer reduces the data dimension and improves the prediction accuracy. The output layer outputs the prediction results. The model training module adopts the gradient-based Adam optimization algorithm. The model prediction module applies the trained LSTM model iteratively to produce predictions.
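The dropout step mentioned above can be illustrated with a small NumPy sketch. It assumes the common "inverted dropout" formulation and a hypothetical drop rate of 0.2; the paper does not state which variant or rate was used, so this is an illustration rather than the authors' implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout (assumed variant): zero a random fraction `rate`
    of the activations during training and rescale the survivors so the
    expected activation is unchanged. At prediction time x passes through."""
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True where the unit is kept
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones(1000)            # toy hidden-layer activations
train_out = dropout(activations, rate=0.2, training=True, rng=rng)
eval_out = dropout(activations, rate=0.2, training=False, rng=rng)
```

During training roughly 20% of the toy activations are zeroed and the rest are scaled by 1/0.8; at prediction time the activations are returned unchanged.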

Network model training
Network model training mainly concerns the hidden layer. Before the data are input into the prediction model, the original time series data set is preprocessed at the input layer, where each individual value $x$ is standardized.
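A minimal sketch of the standardization step is given below. The paper's standardization formula did not survive in this text, so z-score standardization with statistics taken from the training split is an assumption, as are the function names, the toy data, and the 80/20 split.

```python
import numpy as np

def fit_standardizer(train):
    """Per-feature mean and standard deviation, computed on the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std

def standardize(data, mean, std):
    return (data - mean) / std

def destandardize(data, mean, std):
    """Map standardized values (e.g. predictions) back to original units."""
    return data * std + mean

# Toy multivariate series: 100 time points, 3 features (hypothetical values).
rng = np.random.default_rng(0)
series = rng.normal(loc=50.0, scale=5.0, size=(100, 3))
train, test = series[:80], series[80:]

mean, std = fit_standardizer(train)
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)  # reuse the training statistics
```

Reusing the training statistics for the test split keeps information from the test period out of the preprocessing.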
With window length $L$, the model input built from the standardized training set can be expressed as

$$X = \{X_1, X_2, \cdots, X_L\}$$

and the theoretical output corresponding to the model input can be expressed as

$$Y = \{Y_1, Y_2, \cdots, Y_L\}$$

The multivariate data in the standardized training set are integrated into three-dimensional data of shape [samples, time_steps, features] and input into the single-hidden-layer structure of the model. The hidden layer contains $L$ LSTM memory cells connected in series, where $H_L$ represents the output of the previous LSTM memory cell and $C_L$ represents the state information. To reduce the vanishing gradient problem and speed up training, the ReLU function is used as the threshold activation function. Adding the dropout algorithm to the hidden layer, which randomly drops cells and their connections during training, alleviates overfitting. The attention mechanism is introduced into the hidden layer: the softmax activation function is used to calculate the attention probability distribution values $\alpha_t$ assigned to each input, so that the output weights sum to 1, and an attention weight matrix and a context vector $V$ are generated. The calculation formulas are as follows:

$$e_t = \mathrm{score}(W_s h_t + b_s), \quad t \in [1, L]$$

$$\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{L} \exp(e_k)}, \quad \sum_{t=1}^{L} \alpha_t = 1$$

$$V = \sum_{t=1}^{L} \alpha_t h_t$$

In the above formulas, $h_t$ is the hidden-layer output at time step $t$, $W_s$ represents the weight matrix, $b_s$ represents the offset, and the score function is a similarity measurement function. The generated context vector $V$ is input into the softmax function for normalization, so that the sum is 1. Then, information is extracted through a fully connected layer.
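The attention computation described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the paper's implementation: the score function is taken to be tanh, $W_s$ is reduced to a weight vector so that each time step gets a scalar score, and the hidden states are random toy values.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def attention_context(hidden_states, W_s, b_s):
    """hidden_states: (time_steps, hidden_size) outputs of the LSTM cells.
    Returns the attention weights (one per time step, summing to 1)
    and the context vector (weighted sum of the hidden states)."""
    scores = np.tanh(hidden_states @ W_s + b_s)  # assumed score function
    weights = softmax(scores)
    context = weights @ hidden_states
    return weights, context

# Toy example: 6 time steps, hidden size 4 (hypothetical sizes).
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(6, 4))
W_s = rng.normal(size=4)
b_s = 0.1

weights, context = attention_context(hidden_states, W_s, b_s)
```

Because the weights are positive and sum to 1, the context vector is a convex combination of the hidden states, so time steps with higher scores contribute more to the prediction.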
After training, the model output for the standardized training set $D'_{tr}$ is denoted $P$. The error between the model output $P$ and the theoretical output $Y$ is measured by the loss function, and minimizing the loss function is taken as the optimization goal. Given the random seed, learning rate, and number of training steps with which the network is initialized, the model continuously updates and corrects the network weights through back-propagation through time (BPTT) and the Adam stochastic gradient optimization algorithm, so as to obtain a better-optimized hidden layer. The loss function is the mean square error:

$$\mathrm{loss} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2$$

In the formula, $p_i$ is the predicted value, $y_i$ is the true value, and $N$ is the total number of elements compared, that is, the window length $L$ multiplied by the number of elements in each sample sequence.
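To make the training objective concrete, the sketch below computes the mean square error and applies the standard Adam update rule to a toy linear model. The linear model stands in for the LSTM network purely for illustration, and the learning rate, step count, and data are hypothetical; gradients here are computed analytically rather than by BPTT.

```python
import numpy as np

def mse(pred, target):
    """Mean square error between predicted and true values."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def adam_step(param, grad, state, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy stand-in for the network: a linear model pred = inputs @ w.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(50, 2))
targets = inputs @ np.array([2.0, -3.0])

w = np.zeros(2)
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
losses = []
for _ in range(200):
    pred = inputs @ w
    losses.append(mse(pred, targets))
    grad = 2.0 * inputs.T @ (pred - targets) / len(targets)  # d(MSE)/dw
    w = adam_step(w, grad, state)
```

The recorded `losses` list can be inspected to confirm that the error decreases as the weights are updated, which is the behaviour the training module relies on.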

Network model prediction
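The trained model is used for prediction with the iterative method. After the predicted value $P_{m+1}$ is obtained, it is combined with the most recent values of the previous input to form the next model input, and the step is repeated to obtain the prediction sequence $P_{te}$ for the test set $D_{te}$. The training fitting sequence $P_{tr}$ can be obtained by processing the model input sequence $X$ of the training set $D_{tr}$ in the same way. By calculating the deviation between $D_{tr}$ and $P_{tr}$, and between $D_{te}$ and $P_{te}$, the fitting accuracy and prediction accuracy of the model can be given quantitatively.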
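The iterative prediction loop can be sketched as follows. The function `predict_next` is a hypothetical stand-in for the trained LSTM model, and the toy linear-extrapolation model exists only to make the example runnable; neither is taken from the paper.

```python
import numpy as np

def iterative_forecast(predict_next, last_window, n_steps):
    """Roll the model forward: each prediction is appended to the window
    and the oldest value is dropped before predicting the next point."""
    window = np.asarray(last_window, dtype=float).copy()
    preds = []
    for _ in range(n_steps):
        nxt = float(predict_next(window))
        preds.append(nxt)
        window = np.append(window[1:], nxt)  # slide the window by one step
    return np.array(preds)

def toy_model(window):
    """Hypothetical stand-in for the trained network: linear extrapolation."""
    return window[-1] + (window[-1] - window[-2])

forecast = iterative_forecast(toy_model, [1.0, 2.0, 3.0], n_steps=4)
```

Because every step after the first consumes earlier predictions, errors can accumulate over long horizons, which is one reason the deviation on the test set is reported separately from the fitting deviation on the training set.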

Simulation experiment
In this section, combined with the operation data of CNC machine tools, the prediction model proposed in Section 2 is used for simulation experiments.

Experiment preparation
This section mainly introduces the data set used in the simulation experiment and the configuration of the experimental environment.

Data set.
The data required for the experiment come from a sensor data set of a CNC machine tool in a mechanical processing plant. The data set contains data from 8 different types of sensors, such as equipment power and spindle speed, recorded over the same time period. The data set has a time span of 6 days and a collection interval of 1 minute. After filtering, 8 groups of valid data remain, 6000 in total. The 8 groups of data are divided into 2 groups of label data and 6 groups of continuous data. Figure 2 shows the visual features of 3 groups of data. Because equipment power, voltage, and current play a key role in the normal operation of CNC machine tools, these three quantities are chosen as the prediction targets. To make the input data multivariate, the 8 groups of data are cleaned and integrated into three-dimensional data of shape [samples, time_steps, features], which are input into the hidden layer with the time step as the unique index.
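One way to build the [samples, time_steps, features] array is a sliding window, sketched below. The window length of 10, the choice of the first three columns as targets, and the synthetic 6000-row array are assumptions for illustration; the paper does not state the window length or the column order.

```python
import numpy as np

def make_windows(data, time_steps, target_cols):
    """Slice a (n_points, n_features) array into overlapping windows.
    X has shape (samples, time_steps, features); y holds the target
    columns at the time point immediately after each window."""
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0] - time_steps
    X = np.stack([data[i:i + time_steps] for i in range(n_samples)])
    y = data[time_steps:, target_cols]
    return X, y

# Synthetic stand-in for the cleaned data: 6000 rows, 8 sensor channels.
data = np.arange(6000 * 8, dtype=float).reshape(6000, 8)
X, y = make_windows(data, time_steps=10, target_cols=[0, 1, 2])
```

Each sample then pairs the previous 10 multivariate readings with the next values of the three target channels, which matches the supervised learning setup described for model training.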

Experimental results
The results of the comparative experiments are shown in Table 1. Figure 3 compares the predicted and actual values of equipment power for the two models (the BP neural network is not included in the comparison because of its poor fit). Blue denotes the predicted value and red the actual value. The figure shows that the time series predictions of the attention-based LSTM model fit the actual data better and are more accurate.

Summary
Against the background of industrial big data and industrial intelligence, this paper analyzes the overall logical sequence of industrial time series data, mines the complex temporal correlations in the information, and proposes an attention-based LSTM neural network method for industrial time series data prediction, which effectively improves prediction accuracy. The results show that the proposed method predicts better than both the BP neural network model and the LSTM neural network model. Future work will improve the method by enhancing its parallel computing capability and shortening training time.