Research on Database Failure Prediction Based on Deep Learning Model

Effective database management plays an important role in developing the grid business and in reducing operation and maintenance costs. Many factors affect the performance of an Oracle database, and common methods have difficulty predicting possible problems. This paper proposes an analysis strategy based on the Oracle AWR report that introduces a self-encoding deep learning model to construct a database failure prediction mechanism. Experiments show that the resulting failure prediction model achieves better prediction performance than prediction algorithms that omit feature learning.


Introduction
The database stores and retrieves data and is an important part of an information system. Database failures may paralyze information systems, posing a great threat to enterprise operations and production safety, and data loss may have irreversible effects on an enterprise. Rough statistics on the information system failures that occurred in a large enterprise in recent years show that database failures accounted for the highest proportion, so database performance is a focus of information system operation and maintenance personnel [1] [2]. With the development of artificial intelligence technology, using deep learning, machine learning, and related methods for automatic diagnosis, analysis, and early warning of database faults has become a research hotspot for making information system operation and maintenance more automated and intelligent. Traditional shallow neural learning models depend strongly on features, so their representation ability is limited. Deep learning can build a network model with multiple hidden layers, train it on massive data, and automatically extract more complex and useful features [3].
The research and application of deep learning in regression prediction is an important direction for future research. Forecasting database failures makes timely and effective prevention and control measures possible. This paper applies a deep learning model and its improved algorithm to Oracle AWR report data and proposes a deep learning model suited to Oracle database failure prediction, which further broadens the application field of deep learning.

Related work
Deep learning is a branch of artificial intelligence that has made breakthroughs in many fields in recent years. The stacked auto-encoder (SAE) is one of the most important structural models in deep learning. The Automatic Workload Repository (AWR) is a collection of historical performance data stored in the SYSAUX tablespace.

Stacked Auto Encoder (SAE)
The auto-encoder (AE) is the basic component of the SAE [4]. An AE consists of an input layer, a hidden layer, and an output layer, connected in sequence. The AE sets its training target to reproduce the input data and is then trained with the back-propagation algorithm. Although this training procedure is that of a supervised learning algorithm, the original data need not carry classification labels, so the training process as a whole is still unsupervised. If a sparsity penalty is added during training so that the number of activated units in the network is limited, the result is a sparse auto-encoder. If random noise is added to the input data during training, the result is a denoising auto-encoder. These two variants often learn better data features in practice.
A stacked auto-encoder is a deep neural network composed of multiple layers of auto-encoders. It is widely used for dimensionality reduction [5] and feature learning [6] in deep learning methods.
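The two variants mentioned above can be illustrated with a minimal sketch. It is not taken from this paper: the masking form of the noise, the KL-divergence form of the sparsity penalty, the target activation rate `rho`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def add_masking_noise(x, drop_prob, rng):
    """Denoising variant: randomly zero a fraction of the input entries."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def kl_sparsity_penalty(hidden, rho=0.05, eps=1e-8):
    """Sparse variant: KL divergence between a target activation rate rho
    (assumed value) and the mean activation of each hidden unit."""
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rng = np.random.default_rng(0)
x = rng.random((4, 6))                      # toy batch: 4 samples, 6 features
noisy = add_masking_noise(x, drop_prob=0.3, rng=rng)

on_target = np.full((4, 3), 0.05)           # hidden units firing at the target rate
too_active = np.full((4, 3), 0.5)           # hidden units firing far too often
penalty_on_target = kl_sparsity_penalty(on_target)
penalty_too_active = kl_sparsity_penalty(too_active)
```

In such a setup the penalty would be added to the reconstruction loss, and the network would be trained to reconstruct the clean input `x` from `noisy`.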

AWR Report
The AWR report is a performance collection and analysis tool provided by Oracle. It reports the resource usage of the entire system over a period of time, giving an overall picture of how the system is running, much like a comprehensive medical report. Database performance indicators generally relate to three areas: I/O, memory, and CPU.
These three are the key information that needs attention in the AWR report.

Proposed technology
Deep learning is used to solve the big data analysis problem that arises in Oracle operation and maintenance. The data are preprocessed first. A deep model is then built and trained to extract higher-level features from the original data. Once extracted, these features can be fed to the classifier directly, or fed to the classifier together with the original information.
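As an illustration of the preprocessing step, the sketch below scales each monitoring metric to the range [0, 1]. The paper states only that the input feature vector is normalized; min-max scaling, the function name, and the three example metrics are assumptions made here.

```python
import numpy as np

def min_max_normalize(samples):
    """Scale each column (one metric per column) to [0, 1].
    Constant columns are mapped to 0 to avoid division by zero."""
    samples = np.asarray(samples, dtype=float)
    lo = samples.min(axis=0)
    span = samples.max(axis=0) - lo
    span[span == 0] = 1.0
    return (samples - lo) / span

# Hypothetical rows: [CPU busy %, physical reads per second, buffer hit %]
raw = np.array([[20.0, 1500.0, 99.0],
                [80.0, 4500.0, 99.0],
                [50.0, 3000.0, 99.0]])
features = min_max_normalize(raw)
```

Scaling to [0, 1] keeps the inputs within the output range of a sigmoid unit, which matters when the network is trained to reconstruct its own input.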

Operation and maintenance feature data
This paper uses deep learning to construct the prediction model because of the form and essential characteristics of Oracle AWR report data. A large volume of monitoring data is generated from Oracle AWR reports every day, and the problems it records recur frequently. Given its scale, such real-world data necessarily contains certain common features. The performance index parameters of the Oracle database are shown in Table 1 [7]. This paper collects the operation and maintenance data and designs a learning model that automatically finds common features in the unlabeled monitoring data to describe the samples. The database running state is then determined for only a small number of samples, which are labeled accordingly. The classifier is trained on this labeled data set and finally forms the model used for actual prediction.

Deep learning model based on self-encoding
The auto-encoder is a fast learning model in deep learning whose basic principle exploits the hierarchical structure of artificial neural networks [8]. The AE maps the input of the visible layer to the hidden layer and then reconstructs the input from the hidden layer, so that the target output of the auto-encoder is almost equal to the original input. In this study, the input and output nodes of the network have the same form, based on a synthetic feature vector constructed from the operation and maintenance monitoring parameters. Each input node corresponds to one element of the feature vector, and the output nodes likewise correspond one to one to the elements of the feature vector.
The self-encoding network used in this paper has a three-layer structure: an input layer, a single hidden layer, and an output layer. The structural model is shown in Figure 2. It differs from traditional neural networks, which are trained directly and then used for prediction: the self-encoding network focuses only on the hidden-layer weight parameters and performs no classification.
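The encode-and-reconstruct mapping can be written compactly. The following is a sketch in standard auto-encoder notation rather than this paper's own formulation: $w$ and $f$ correspond to the hidden-layer weight and transform function used in this paper, while the bias terms $b$ and $b'$, the output-layer weight $w'$, the sample count $m$, and the squared-error objective $J$ are assumptions introduced here.

```latex
\begin{aligned}
h &= f(w x + b), \\
\hat{x} &= f(w' h + b'), \\
J(w, b, w', b') &= \frac{1}{2m} \sum_{i=1}^{m} \bigl\lVert \hat{x}^{(i)} - x^{(i)} \bigr\rVert^{2}.
\end{aligned}
```

Minimizing $J$ drives the output $\hat{x}$ toward the input $x$, so the hidden activation $h$ must retain enough information to reconstruct the input.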
w: the value of the hidden-layer weight.
Deep learning theory regards this value as a new representation of the deep features of the pattern library, obtained through machine learning. Feeding the features learned by the self-encoding network to the classifier, instead of the original feature vector, can greatly improve classification accuracy and on many problems even exceeds the performance of the best current classification algorithms [9].
The model is solved with the general gradient descent method; the solution process iteratively approximates the hidden-layer weight. Since the input feature vector has been normalized, the sigmoid function $f(z) = 1/(1 + e^{-z})$ is used as the transform kernel function of the hidden layer. Once the hidden-layer weight $w$ has been obtained, the function $f(z)$ is determined and serves as the feature transform kernel of the vector. The learning performed by the self-encoding network mines the deep features of the operation and maintenance sample data; it does not provide predictive classification of new samples. The second step is therefore to design a pattern classifier that predicts the operation and maintenance state from the learned feature transform kernel.
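The training procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the experiments: the squared-error reconstruction loss, the sigmoid output layer, the bias terms, the learning rate, the iteration count, the toy data, and all names are choices made here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(x, n_hidden, lr=0.5, n_iter=2000, seed=0):
    """Three-layer auto-encoder (input, hidden, output) trained by batch
    gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = x.shape
    w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(n_iter):
        h = sigmoid(x @ w1 + b1)              # hidden-layer features
        x_hat = sigmoid(h @ w2 + b2)          # reconstruction of the input
        err = x_hat - x
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Back-propagate through the output and hidden sigmoid layers.
        d_out = err * x_hat * (1 - x_hat) / n_samples
        d_hid = (d_out @ w2.T) * h * (1 - h)
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (x.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)

    def encode(v):
        """Feature transform f(z) applied with the learned hidden weights."""
        return sigmoid(v @ w1 + b1)

    return encode, losses

# Toy data (assumption): 8 correlated metrics driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.random((40, 2))
mixing = rng.normal(size=(2, 8))
x = sigmoid(latent @ mixing)

encode, losses = train_autoencoder(x, n_hidden=x.shape[1] // 2)
learned_features = encode(x)
```

The returned `encode` function plays the role of the feature transform kernel: it is applied to new samples to produce the features that are passed to the classifier in the second step.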

Softmax-based predictive model
In the classification problem, a labeled learning sample set must be provided for reference by the learning machine. After the classifier obtains the classification ability through learning, the newly input feature samples can be classified. This paper uses the Softmax model [10] to build a predictive classifier.
(1) Constructing the training set: a class label is set for each vector based on prior knowledge. The class represents the state to be output by the prediction model, which yields a set of labeled learning vectors.
(2) Solving the model: the Softmax model is a generalization of the logistic model to multi-class problems. For each input $x$ from the given training sample set L, a hypothesis function $h(x)$ estimates the probability $p(y = j \mid x)$ of each class $j$. The model parameters are those that minimize the cost function, and the final pattern prediction classifier is obtained by determining these parameters iteratively.
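The two steps above can be sketched in code. The sketch assumes the standard form of the Softmax model, $p(y = j \mid x; \theta) = e^{\theta_j^\top x} / \sum_{l} e^{\theta_l^\top x}$, with the cross-entropy cost minimized by batch gradient descent. The parameter name `theta`, the learning rate, the iteration count, and the toy data are assumptions made here, not values from the experiments.

```python
import numpy as np

def softmax(scores):
    """Row-wise class probabilities, shifted for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_softmax(x, y, n_classes, lr=1.0, n_iter=3000):
    """Fit theta by batch gradient descent on the cross-entropy cost."""
    n_samples, n_features = x.shape
    theta = np.zeros((n_features, n_classes))
    one_hot = np.eye(n_classes)[y]
    for _ in range(n_iter):
        probs = softmax(x @ theta)
        theta -= lr * (x.T @ (probs - one_hot)) / n_samples
    return theta

def predict(theta, x):
    """Return the most probable class for each row of x."""
    return softmax(x @ theta).argmax(axis=1)

# Step (1): a small labeled learning set (toy values, three states).
points = np.array([[0.0, 0.1], [0.1, 0.0],
                   [0.9, 0.1], [1.0, 0.0],
                   [0.1, 0.9], [0.0, 1.0]])
x_train = np.hstack([points, np.ones((len(points), 1))])  # bias column
y_train = np.array([0, 0, 1, 1, 2, 2])

# Step (2): solve for the parameters, then classify.
theta = train_softmax(x_train, y_train, n_classes=3)
predicted = predict(theta, x_train)
class_probs = softmax(x_train @ theta)
```

In the proposed method the rows of `x_train` would be the features produced by the self-encoding network rather than raw two-dimensional points.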

Results and analysis
The model was simulated and verified on one month of Oracle operation and maintenance monitoring data from a power company. The predictive model requires two learning sets to be organized separately: a feature learning set for determining $f(z)$, and a classification learning set for determining the parameters of the Softmax classification model. The two learning sets are prepared as follows: a) Feature learning set: set the feature learning sample size and, to ensure the generality of feature learning, draw the samples at random from the total sample set; b) Classification learning set: set the classification learning sample size and assign a category label to each sample feature vector in the set. After the learning sets are prepared, simulation and verification follow the process shown in Figure 3. The self-encoding network adopts the three-layer structure (input, hidden layer, output) discussed above. The number of input-layer nodes equals the number of output nodes, and the number of hidden-layer nodes is set to half the number of input nodes. For each learning set size, the method of this study learns features with a set upper limit on the number of feature learning iterations, trains the Softmax classifier on the learned features, and then verifies the classification accuracy on the test set. For performance comparison, a support vector machine (SVM) and a Softmax classifier are also trained and verified directly on the original data vectors.
The prediction classification accuracy for 8 learning sets of different sizes is shown in Figure 4. It can be seen from Figure 4 that when the learning sample size is small, the SVM based on the original feature vector has the higher classification accuracy and the proposed method has the worst. However, as the number of Oracle AWR training samples increases, the prediction accuracy of the proposed method rises steadily and finally stabilizes at around 87%. By contrast, the accuracy of the SVM without self-encoding feature learning gradually decreases as the learning set grows and settles at around 68%. The Softmax classifier applied directly, without the feature learning process, shows no significant change in performance as the learning set grows.
By comparison, the prediction model proposed in this paper emphasizes secondary feature learning on database operation and maintenance monitoring data, and it learns better than direct predictive classification without feature learning. This advantage makes it well suited to this type of data mining.
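The preparation of the learning sets and the sizing of the network can be sketched as below. The sample counts, the number of metrics, the use of disjoint index sets, and the treatment of the remaining samples as the test set are assumptions for illustration; the paper specifies only random extraction for the feature learning set and a hidden layer half the size of the input layer.

```python
import numpy as np

def split_learning_sets(n_total, n_feature, n_classify, seed=0):
    """Randomly draw disjoint index sets for feature learning and for
    classification learning; the remaining indices form the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    feature_idx = order[:n_feature]
    classify_idx = order[n_feature:n_feature + n_classify]
    test_idx = order[n_feature + n_classify:]
    return feature_idx, classify_idx, test_idx

n_total = 1000      # hypothetical number of monitoring samples
n_input = 24        # hypothetical number of metrics per sample
n_hidden = n_input // 2   # hidden layer is half the input layer
n_output = n_input        # output layer mirrors the input layer

feature_idx, classify_idx, test_idx = split_learning_sets(
    n_total, n_feature=600, n_classify=200)
```

Repeating the split with different values of `n_feature` and `n_classify` would give learning sets of different sizes for an accuracy comparison.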

Conclusion
Based on deep learning, a self-encoding network is used to mine the deep features of Oracle operation and maintenance data. Combined with the Softmax regression method, a classifier is constructed, and a model suitable for Oracle database failure prediction is designed and implemented. The simulation shows that the model achieves higher prediction accuracy for operation and maintenance management than the compared methods. The long-term goal of this research is to build an automatic database operation and maintenance management system suitable for the enterprise level, and this paper covers only a small part of that work. Its value lies in constructing a higher-level deep learning network model based on its learning mechanism. Further research is needed.