Method for predicting cutter remaining life based on multi-scale cyclic convolutional network

When predicting the remaining life of a cutter, deep-learning methods such as convolutional neural networks do not consider the time correlation between different degradation states, which directly limits prediction accuracy. To extract features that carry time-series information and so predict the remaining cutter life more effectively, this article proposes a new deep neural network, the multi-scale cyclic convolutional neural network. In this network, a multi-scale cyclic convolutional layer is constructed to memorize the degradation state at different moments and to mine the timing characteristics of multi-sensor data. Multi-scale features are extracted through multi-scale convolution, and parameter convergence is improved by layer-by-layer training and fine-tuning. Finally, the remaining cutter life is predicted from the extracted features. A comparison with published prediction methods based on convolutional neural network and recurrent neural network models shows that the proposed method predicts the remaining cutter life more accurately and more precisely. The method overcomes limitations of convolutional neural network prediction models in this field and provides a theoretical basis for evaluating the remaining service life of the cutter.

1. For the sensor noise interference problem, a large-scale convolutional kernel is used to filter signal noise. 2. A multi-scale convolutional kernel is used to extract multi-scale features and maintain global and local features to improve network capacity and model feature learning capability. 3. A multi-scale cyclic convolutional layer is constructed to store degradation state information, and this layer is used to mine the time-series characteristics of the original data and improve the remaining-life-prediction accuracy of the model.

Introduction
In advanced manufacturing systems, the high performance of a machine tool is the key to producing high-quality machined surfaces, and the main cause of cutter failure is wear of the cutter contact tip. To ensure machining accuracy over the cutter life cycle, industry generally adopts overly conservative replacement strategies, which adds machining cost. If the remaining useful life (RUL) of the cutter can be estimated accurately, the cutter can be fully utilized, purchasing costs fall, and the workload is greatly reduced. However, because contact between the cutter and the workpiece is intermittent, it is difficult to capture the dynamic characteristics of the cutter wear mechanism, which severely restricts the efficiency and accuracy of cutter RUL prediction. To develop effective RUL prediction systems, scholars have carried out a growing body of research. At present, RUL prediction methods fall into two types: methods based on physical models and data-driven methods. 1,2 Physical-model methods use the wear failure mechanism of the tool to model its entire wear degradation process mathematically, optimize the model parameters with empirical formulas, and finally predict the RUL. However, because the failure mechanisms of different tools are complex, it is difficult to establish an accurate mathematical model in practical applications. In contrast, data-driven methods need only the data features as input and require neither the empirical formulas of a physical model nor knowledge of the complex failure mechanism. In recent years, data-driven RUL prediction methods have therefore become increasingly popular.
Benkedouh et al. proposed a tool RUL prediction method based on support vector regression: feature vectors are first extracted from the vibration, force, and acoustic emission signals provided by the 2010 Prognostics and Health Management (PHM) data set, and the RUL of the tool is then predicted by regression, but the results show a large error. Wu et al. 3 proposed a tool remaining-life prediction method based on a random forest model, which extracts 28 features from cutting force, vibration, and acoustic emission signals and uses them to train a random forest. The model is used to predict the tool wear value, and experiments show good predictive performance. Drouillet et al. used artificial neural networks (ANNs) to predict the RUL of the cutter from the motor spindle power. Yan and Lee 4 proposed a logistic regression model that predicts the RUL of a drill bit by establishing the relationship between the vibration signal and the wear value.
Niaki et al. 5 used wavelet analysis to extract time-domain and frequency-domain features of multi-sensor signals for tool wear prediction with a recurrent neural network (RNN) model, and improved the RNN model by applying sensor information fusion to tool wear estimation. Their research reports a generalization performance of up to 13%. Drouillet et al. 6 studied the relationship between RUL and machine tool spindle power and found that the error between the predicted and the real RUL of the tool is very small, which indicates that spindle power is a very effective feature. Corne et al. 7 fed spindle power and vibration force signal data into a neural network; the study showed that the error in predicting tool flank wear from these signals is about 0.4%-18.4%. Kong et al. 8 studied tool wear using kernel principal component analysis with an integrated radial basis function kernel together with Gaussian process regression (GPR). The study reported benefits in smoothness and in the range of the GPR confidence interval, and the method outperformed neural networks and support vector machines in tool wear prediction accuracy. The disadvantage is that these models depend largely on the sensitivity of the extracted features, which usually requires expert knowledge. 9-12 Tobon-Mejia et al. 13 used a dynamic Bayesian network (DBN) model to identify the tool state and predict its remaining service life. Kaya used neural networks to verify model reliability in the prediction of milling tool wear. Machine learning has thus been used by many scholars to extract tool degradation characteristics and predict remaining life. 14-17
As one family of data-driven methods, deep-learning methods 18 can automatically extract features from raw sensor data and build corresponding prediction models. 19 Among deep-learning techniques, the convolutional neural network (CNN) 20 has received special attention in tool RUL prediction because of its strengths in processing time-series signals. To exploit the powerful feature extraction capabilities of CNN, Babu et al. used a CNN for the first time to solve the cutter RUL prediction problem and showed experimentally that this method is significantly better than multilayer perceptron and support vector regression methods. Other researchers have also studied RUL with various signals using deep-learning methods such as CNN and RNN in recent years. 14,21,22 Although deep-learning models have great advantages in automatic feature extraction, these models cannot obtain feedback information and memory information from time-series data. When these two kinds of information in the sensor data are separated by long stretches of noisy signal, prediction may fail. During operation, the tool degenerates over time from a normal wear state to a completely blunt failure state, which is a gradual degradation process. Correspondingly, the degradation state of the cutter at different moments depends on time. However, existing research ignores this dependence in network construction, which affects the accuracy of the prediction model and limits its adoption. Scholars have proposed many methods to reduce the dimensionality of the original data to improve the accuracy of remaining-life prediction models, but they have rarely considered the influence of the time-series information in the data on prediction accuracy. Therefore, establishing correlation models of different degradation states is very important for predicting the RUL of the cutter accurately.
To effectively use the timing information hidden in the signal, this article proposes a new deep-learning method, the multi-scale cyclic convolutional neural network (MSRCNN), to predict the RUL. MSRCNN first constructs a new multi-scale cyclic convolutional layer to memorize the timing of different degradation states and to mine the timing characteristics of the original data. Then, multi-scale features are extracted through multi-scale convolution, global and local features are retained, and network parameters are optimized by layer-by-layer training and fine-tuning. Finally, the remaining service life is predicted from the extracted features.
The main contributions of this article are as follows: (1) to address sensor noise interference, a large-scale convolution kernel is used to filter out signal noise; (2) multi-scale convolution kernels are adopted to extract multi-scale features, retaining global and local features and improving the network capacity and the feature-learning ability of the model; (3) a multi-scale cyclic convolutional layer is constructed to memorize degradation state information, which is used to mine the time-series characteristics of the original data and improve the remaining-life-prediction accuracy of the model.

Cyclic convolutional layer
The convolutional layer, the core component of a CNN, extracts useful features from the input data without manual intervention. However, a convolutional layer has no recurrent connection, so the signal only flows forward in the CNN and the output cannot be fed back to the input. Consequently, a CNN considers only the current input at each time step and ignores earlier degradation information. Existing CNN-based cutter RUL prediction methods do not address this issue, which reduces their prediction accuracy and generalization ability. In this article, a new cyclic convolutional layer is therefore constructed to solve this problem and improve the prediction performance of the algorithm. Unlike the convolutional layer, the cyclic convolutional layer adds a cyclic connection between output and input, so that information is transmitted cyclically instead of in one direction. The output information is fed back to the input through the cyclic connection, and the degradation information is memorized over time. The output of the cyclic convolutional layer therefore depends on both the current input and the previous states held in memory. Through this dynamic adjustment, the time-sequence characteristics of the input data can be fully mined and a temporal correlation model of the different degradation states can be established in the cyclic convolutional layer.
In theory, the cyclic connection enables the cyclic convolutional layer to feed its output back to the input, forming a cyclic information flow rather than a one-way flow. In practice, however, the cyclic convolutional layer often encounters vanishing gradients during training. To reduce the effect of vanishing gradients and capture long-term correlations, a gated selection mechanism is introduced in the cyclic convolutional layer, 23 similar in spirit to the gating used in long short-term memory (LSTM) networks. 24,25 With this selective mechanism, the cyclic convolutional layer can appropriately forget or emphasize information from previous moments as well as from the current moment. On one hand, the reset gate determines the extent to which past information is forgotten, which allows the network to discard previously stored information that is no longer relevant. On the other hand, the update gate controls the amount of information passed from the previous state to the current state, which helps the network remember long-term information and mitigates the vanishing gradient problem. The layer can thus adaptively capture dependencies on different time scales.
As shown in Figure 1, the layer applies a nonlinear activation function, $x_t^{i-1}$ is the input time-sequence sensor data, and $h_{t-1}^{i} = x_{t-1}^{i}$ is the stored state fed back by the cyclic connection at time step $t-1$. Two gated networks are created in the gated cyclic convolutional layer, namely the reset gate $r_t^{i}$ and the update gate $u_t^{i}$, given by equations (1) and (2), where $\sigma(\cdot)$ is the logistic sigmoid function, $*$ represents the convolution operator, $k_r^{i}$, $w_r^{i}$, $k_u^{i}$, and $w_u^{i}$ are convolution kernels, and $b_r^{i}$ and $b_u^{i}$ are bias terms. At time step $t$, the state of the cyclic convolutional layer $x_t^{i}$ is expressed by equation (3), where $\tilde{h}_t^{i}$ represents the newly generated state, $\tanh(\cdot)$ is the activation function, $k_h^{i}$ and $w_h^{i}$ are convolution kernels, $b_h^{i}$ is the bias term, and $\odot$ represents the Hadamard (element-wise) product. Equation (3) shows that, at time step $t$, $x_t^{i}$ is a linear interpolation between the state at the previous moment and the newly generated state, with the reset gate and the update gate controlling the current state.
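Equations (1)-(4) are not reproduced in this text. One form that is consistent with the symbol definitions and with the stated roles of the two gates is the usual gated recurrent update, written here with convolutions. The assignment of the $k$ kernels to the input and the $w$ kernels to the stored state, and the tilde on the newly generated state, are assumptions:

$$r_t^{i} = \sigma\left(k_r^{i} * x_t^{i-1} + w_r^{i} * h_{t-1}^{i} + b_r^{i}\right) \qquad (1)$$
$$u_t^{i} = \sigma\left(k_u^{i} * x_t^{i-1} + w_u^{i} * h_{t-1}^{i} + b_u^{i}\right) \qquad (2)$$
$$x_t^{i} = u_t^{i} \odot h_{t-1}^{i} + \left(1 - u_t^{i}\right) \odot \tilde{h}_t^{i} \qquad (3)$$
$$\tilde{h}_t^{i} = \tanh\left(k_h^{i} * x_t^{i-1} + w_h^{i} * \left(r_t^{i} \odot h_{t-1}^{i}\right) + b_h^{i}\right) \qquad (4)$$

A minimal NumPy sketch of one such time step follows, for a single channel. The function names, the kernel length of 3, and the random parameters are illustrative only and do not come from the article.

```python
import numpy as np

def conv_same(signal, kernel):
    """1-D convolution that keeps the signal length ('same' padding)."""
    return np.convolve(signal, kernel, mode="same")

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_cyclic_step(x_in, h_prev, p):
    """One time step of the assumed gated cyclic convolutional layer."""
    # Reset gate: how much of the stored state enters the new candidate state.
    r = sigmoid(conv_same(x_in, p["k_r"]) + conv_same(h_prev, p["w_r"]) + p["b_r"])
    # Update gate: how much of the stored state is carried over unchanged.
    u = sigmoid(conv_same(x_in, p["k_u"]) + conv_same(h_prev, p["w_u"]) + p["b_u"])
    # Newly generated (candidate) state.
    h_new = np.tanh(conv_same(x_in, p["k_h"]) + conv_same(r * h_prev, p["w_h"]) + p["b_h"])
    # Interpolate between the previous state and the candidate state.
    return u * h_prev + (1.0 - u) * h_new

rng = np.random.default_rng(0)
params = {name: rng.normal(scale=0.5, size=3)
          for name in ("k_r", "w_r", "k_u", "w_u", "k_h", "w_h")}
params.update({"b_r": 0.0, "b_u": 0.0, "b_h": 0.0})

sequence = rng.normal(size=(5, 16))   # 5 time steps, 16 samples per step
state = np.zeros(16)                  # empty memory before the first step
for x_t in sequence:
    state = gated_cyclic_step(x_t, state, params)
```

Because the new state is a convex combination of the previous state and a `tanh` output, it stays bounded, and a large update-gate bias makes the layer simply carry its memory forward.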

Multi-scale and one-dimensional sensing data
The input data of MSRCNN are the parameters collected by multiple sensors. To use all the sensor data comprehensively, this article applies a sliding window strategy to construct multi-channel one-dimensional sensor data. The process yields the multi-channel data $I$, where $w$ is the width of the sliding window, $m$ is the life span, and $t$ is the number of channels, each representing a different sensing parameter. The detailed process of data generation is shown in Figure 2. In a convolutional network, convolution kernels of different sizes extract information at different time scales. During the tool wear degradation stage, more and more correlated degradation features are recorded as time goes by, and the monitoring data collected by different sensors, such as vibration, acoustic emission, and acceleration signals, also differ. If a single convolution kernel is used in the deep prediction network to extract feature information automatically, degradation information is lost during learning and the prediction accuracy of the model decreases. To avoid this problem, this article proposes a multi-scale learning strategy. As shown in Figure 2, three convolution kernels of different sizes, namely 1 × F, 2 × F, and 4 × F, are arranged in parallel to extract sensitive features from the input sensor signal, so that degradation information on different time scales is fully extracted during learning and the integrity of the features is ensured. Before entering the RUL prediction network, the three extracted feature vectors are concatenated as the overall input.
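The sliding-window construction and the parallel multi-scale kernels can be sketched as follows. This is an illustration under stated assumptions rather than the article's implementation: the kernel sizes are read as lengths F, 2F, and 4F along the time axis, moving-average kernels stand in for learned ones, and all names and sizes are hypothetical.

```python
import numpy as np

def sliding_windows(data, w):
    """Cut an (m, c) multi-sensor record into overlapping (w, c) windows."""
    m = data.shape[0]
    return np.stack([data[start:start + w] for start in range(m - w + 1)])

def multi_scale_features(window, kernel_lengths):
    """Filter every channel with each kernel length and concatenate the results."""
    features = []
    for length in kernel_lengths:
        kernel = np.ones(length) / length          # placeholder for a learned kernel
        per_channel = [np.convolve(window[:, ch], kernel, mode="valid")
                       for ch in range(window.shape[1])]
        features.append(np.concatenate(per_channel))
    return np.concatenate(features)

m, c, w, F = 20, 3, 8, 2                           # hypothetical sizes
data = np.arange(m * c, dtype=float).reshape(m, c)
windows = sliding_windows(data, w)                 # shape (m - w + 1, w, c)
features = multi_scale_features(windows[0], [1 * F, 2 * F, 4 * F])
```

The shortest kernel keeps local detail, while the longest one averages over the whole window and acts as a noise filter; concatenation keeps both views of the same window.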

The overall layout of MSRCNN
The architecture of the MSRCNN proposed in this article is shown in Figure 3. It comprises multi-scale cyclic convolutional layers (MSRCLs), pooling layers (PLs), and fully connected layers (FCLs). To make comprehensive use of the data measured by multiple sensors, a multi-scale learning strategy integrates the multi-sensor data as the input of the multi-scale cyclic convolutional network. Then, N MSRCLs and N PLs are created and connected alternately, so that every MSRCL is followed by a PL, to extract the degradation information in the sensor data automatically and to model the different degradation states. The MSRCLs are indexed by i = 1, 2, 3, ..., N; the ith MSRCL has $2^{i-1}M$ convolution kernels of size k × 1, and all other parameter settings are the same across layers.
The first N - 1 PLs use max pooling with non-overlapping sliding windows, that is, the pooling size p equals the stride s (p = s), and the last PL uses global max pooling. The output of the Nth MSRCL is thereby converted into a vector of size $2^{N-1}M$, which is input to the subsequent FCLs for RUL estimation. The number L of FCLs is set to 3. Each of the first two FCLs has F neurons with the rectified linear unit (ReLU) activation function, and the third FCL, the output layer of MSRCNN, has a single neuron that outputs the RUL prediction. Both the MSRCLs and the FCLs apply dropout and L2 regularization.
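The layer arithmetic above can be checked with a short sketch. It assumes that each convolution preserves the signal length ('same' padding), which the article does not state, and the values of M, k, p, and the input length are hypothetical.

```python
def msrcnn_layer_plan(N, M, k, p, input_len):
    """List kernel counts and output lengths for N MSRCL + PL pairs."""
    plan = []
    length = input_len
    for i in range(1, N + 1):
        kernels = 2 ** (i - 1) * M            # 2^(i-1) * M kernels in layer i
        if i < N:
            length = length // p              # non-overlapping max pooling, p = s
            pool = f"max pool (p = s = {p})"
        else:
            length = 1                        # global max pooling
            pool = "global max pool"
        plan.append({"layer": i, "kernels": kernels, "kernel_size": (k, 1),
                     "pool": pool, "out_len": length})
    return plan

plan = msrcnn_layer_plan(N=4, M=8, k=3, p=2, input_len=64)
flat_size = plan[-1]["kernels"] * plan[-1]["out_len"]   # equals 2^(N-1) * M
```

With these example values the kernel counts are 8, 16, 32, and 64, and the vector passed to the first FCL has 64 entries.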

Experimental setup
The experimental platform of the CNC milling machine and the installation positions of the different sensors are shown in Figure 4. The original rough surface layer of the raw material is first removed by face milling, and the workpiece is then milled. A Kistler 9265B three-component dynamometer is installed between the workpiece and the machining table to measure the cutting force as electric charge, which a Kistler 5019A charge amplifier converts into a voltage signal for storage. Three Kistler piezoelectric accelerometers are installed on the test bench to measure the vibration of the machine tool in the X, Y, and Z directions.

Data description
To demonstrate the effectiveness of the proposed method in predicting the remaining service life of the tool, this section applies MSRCNN to experimental milling cutter data. The data come from the Prognostics and Health Management (PHM) Society, which released them for its 2010 high-speed CNC machine cutter health prediction competition. The experimental conditions are shown in Table 1.

Data preprocessing
To unify the data range, all sub-data sets are standardized. In the standardization formula, $X_j^{i}$ is the original value of the jth sensing parameter in row i, and $X_j^{i*}$ is the corresponding value after standardization.
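The standardization formula itself is not reproduced in this text. The sketch below assumes the common z-score form, in which each sensing parameter (column) is shifted by its mean and scaled by its standard deviation; if the article instead uses min-max scaling, only the two statistics change. The function name and the sample values are hypothetical.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Column-wise z-score: rows are samples i, columns are sensing parameters j."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps)   # eps guards against constant columns

X = np.array([[1.0, 200.0],
              [2.0, 220.0],
              [3.0, 260.0],
              [4.0, 240.0]])
X_star = standardize(X)
```

After this step every column has zero mean and unit variance, so force, vibration, and acoustic emission channels contribute on a comparable scale.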

Evaluation index
To evaluate the performance of the proposed method in RUL prediction, this article uses four evaluation indicators: (1) Explained variance score (EVS), which measures how much of the variance of the dependent variable the regression model explains. Its best value is 1; the closer it is to 1, the better the model explains the variance, and the smaller the value, the worse the effect. (2) Mean absolute error (MAE), which measures how close the predicted results are to the real data. The smaller the value, the better the fit.
(3) Mean squared error (MSE), which is the mean of the squared errors between the fitted data and the corresponding sample points of the original data. The smaller the value, the better the fit. (4) $R^2$ score, the coefficient of determination, which also measures how much of the variance the regression model explains. Its best value is 1; the closer it is to 1, the better the model explains the variance of the dependent variable, and the smaller the value, the worse the effect. In the definitions, $y$ is the true value, $\tilde{y}$ is the predicted value, and $\bar{y}$ is the mean of the true values:
$$\mathrm{EVS}(y, \tilde{y}) = 1 - \frac{\mathrm{Var}\{y - \tilde{y}\}}{\mathrm{Var}\{y\}} \qquad (7)$$
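Only the EVS definition survives in this text; the other three indicators have standard definitions, which the following sketch implements alongside EVS. The function names and the toy RUL values are illustrative.

```python
import numpy as np

def evs(y, y_pred):
    """Explained variance score: 1 - Var{y - y_pred} / Var{y}."""
    return 1.0 - np.var(y - y_pred) / np.var(y)

def mae(y, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_pred))

def mse(y, y_pred):
    """Mean squared error."""
    return np.mean((y - y_pred) ** 2)

def r2(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy example: every prediction is too high by a constant 0.1.
y_true = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
y_hat = y_true + 0.1
scores = {"EVS": evs(y_true, y_hat), "MAE": mae(y_true, y_hat),
          "MSE": mse(y_true, y_hat), "R2": r2(y_true, y_hat)}
```

In this toy case EVS is 1 while $R^2$ is 0.875: EVS ignores a constant bias in the predictions, whereas $R^2$ penalizes it, which is one reason to report both together with MAE and MSE.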

Results and discussion
For tool RUL prediction, this article performs five-fold cross-validation on the training data set to determine the network structure parameters of MSRCNN. These include the number of convolution kernels M, the convolution kernel size 1 × K, the number of cyclic convolutional layers N, the pooling size p, and the number of neurons F. Both the FCLs and the cyclic convolutional layers use dropout and L2 regularization, and V stochastic forward passes are used to obtain the predicted mean and variance. The loss function of MSRCNN is the mean squared error, and the Adam optimizer minimizes it by iteratively updating the weights and biases of the network. The MSRCNN model is trained from scratch for 50 iterations, and its detailed configuration is shown in Table 2.
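The V random forward propagations are read here as forward passes with dropout kept active at prediction time (often called Monte Carlo dropout); this reading is an assumption. A toy single-layer sketch of how a predictive mean and variance follow from such passes, with hypothetical names, weights, and dropout rate:

```python
import numpy as np

def stochastic_forward(x, weights, drop_rate, rng):
    """One forward pass of a linear unit with inverted dropout on its inputs."""
    mask = rng.random(x.shape) >= drop_rate
    return float((x * mask) @ weights / (1.0 - drop_rate))

def mc_predict(x, weights, drop_rate=0.2, V=200, seed=0):
    """Mean and variance of V stochastic forward passes."""
    rng = np.random.default_rng(seed)
    samples = np.array([stochastic_forward(x, weights, drop_rate, rng)
                        for _ in range(V)])
    return samples.mean(), samples.var()

x = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.5, -0.25, 0.1, 0.2])
mean_rul, var_rul = mc_predict(x, weights)
mean_det, var_det = mc_predict(x, weights, drop_rate=0.0)   # dropout disabled
```

With dropout disabled all passes agree and the variance is zero; with dropout active, the spread of the V outputs gives a rough indication of prediction uncertainty.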
First, MSRCNN uses multi-scale convolution to extract multi-scale features; the cyclic convolutional layers then model the time correlation of different degradation states and mine the timing characteristics of the data, and layer-by-layer training and fine-tuning improve parameter convergence. The decrease of the loss function during MSRCNN training is shown in Figure 5, with the number of iterations set to 50. After 50 iterations, the training error approaches 0 and the model has converged.
Based on the constructed cyclic convolutional layer, this article evaluates the influence of the number of layers in MSRCNN on the model evaluation indices, as shown in Figure 6. The network depth of MSRCNN is increased by increasing the number of cyclic convolutional layers and PLs. As the network depth increases, the EVS and $R^2$ values rise and the MAE and MSE values fall. Choosing the right number of layers is therefore important for optimizing the model.
To analyze this impact more quantitatively, this article predicts the milling cutter data with five different network depths and evaluates the resulting RUL prediction models with the four indicators above, as shown in Table 3.
As the network depth increases, the EVS and $R^2$ values move closer to 1, and the MAE and MSE values move closer to 0. When the network is shallow (N = 2), the evaluation indices are poor, which may be due to insufficient fitting ability. When the network is deeper (N = 6), the computational burden and computation time increase, yet the accuracy does not improve, possibly because excessive depth leads to accuracy saturation or even over-fitting. This analysis shows that selecting an appropriate number of layers is particularly important for the prediction model. Based on these results, this article sets the number of cyclic convolutional layers N to 4.
The MSRCNN model mainly combines elements of the CNN and the RNN. To verify its effectiveness, the model is compared with a CNN model and an RNN model. Taking the data of milling cutter 1 and milling cutter 2 as the test sets, the three models are each used to predict the remaining service life. The prediction results are shown in Figure 7, where the x-axis is time and the y-axis is the remaining life of the sample at the current moment as a percentage of the life cycle. The blue, green, and brown solid lines represent the predicted life values, and the red solid line represents the actual life values. Figure 7 shows that the MSRCNN model fits best among the three models and is closest to the real RUL. Table 4 shows that the MSRCNN model proposed in this article is better than CNN and RNN in the evaluation indicators EVS, MAE, $R^2$, and MSE, which indicates that MSRCNN provides more accurate RUL predictions and that its performance is more stable. Because multi-scale convolution has a powerful feature extraction ability and cyclic convolution can mine the time-sequence characteristics of the data, introducing time-sequence features reduces the prediction error.
Figure 5. Training iteration process.
Table 2 (fragment). Weight decay coefficient: $\lambda = 10^{-5}$.

Conclusion
In this article, a new prediction framework, MSRCNN, is proposed for RUL prediction of cutters. The proposed MSRCNN takes the time-series data collected by different sensors as input and constructs a cyclic convolutional layer to model the degradation state of the cutter and mine the time-series characteristics of the data. Multi-scale features are then extracted through multi-scale convolution, and the parameters are optimized by layer-by-layer training and fine-tuning. By stacking multiple cyclic convolutional layers and max PLs alternately, feature information is automatically extracted from the input data. Finally, the learned features are input into the subsequent FCLs to estimate the RUL, and the result is compared with CNN and RNN prediction methods. Experimental results show that, compared with the existing CNN-based and RNN-based prediction models, the proposed MSRCNN has clear advantages in accuracy and convergence. We conclude that the proposed method overcomes limitations of the CNN prediction model in evaluating the remaining service life of the cutter.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported in part by the East China Jiao Tong University Fundamental Research Funds (grant no. 2003419018) and by the National Natural Science Foundation of China (grant no. 52067006).