Tool Wear Monitoring Based on Transfer Learning and Improved Deep Residual Network

Existing deep-learning models for tool wear state monitoring have complex structures and many weights, are prone to over-fitting, and require a large amount of training data. To address these problems, a monitoring method based on Transfer Learning and an Improved Deep Residual Network is proposed. First, the data are preprocessed: one-dimensional cutting force data are transformed into a two-dimensional spectrum by wavelet transform. Then, the Improved Deep Residual Network is built and the residual module structure is optimized. A Dropout layer is introduced and global average pooling is used instead of the fully connected layer. Finally, the Improved Deep Residual Network is used as the pre-training network model, and the tool wear state monitoring model is constructed by combining it with the model-based Transfer Learning method. The results show that the accuracy of the proposed monitoring method reaches 99.74%. The presented network model has a simple structure, a small number of parameters, and good robustness and reliability. The ideal classification effect can be achieved with fewer iterations.


I. INTRODUCTION
Tool wear state has a great influence on the dimensional accuracy, surface integrity and machining efficiency of the workpiece. In the cutting process, thermodynamic coupling, chatter, abnormal wear and tool breakage directly affect the service life of cutting tools, and can even lead to abnormal machine shutdown and personnel injury. Therefore, monitoring the tool wear state is of great significance: it can effectively improve tool durability, reduce production costs, increase production efficiency and processing quality, and raise the reliability and safety of the machining process.
Tool wear condition monitoring can be divided into direct and indirect measurement methods [1]. The direct measurement method directly measures the state change of the tool: the wear area is directly observed by the optical measurement method, the isotope method or the resistance method. The direct measurement method has high accuracy, but environmental constraints make direct observation inconvenient and costly in actual processing. The indirect measurement method monitors the tool wear state by analyzing the associated dynamic information during the cutting process. It has low cost and strong anti-interference ability. With the rapid development of signal processing and intelligent fault diagnosis, the indirect measurement method has become a hot topic in tool wear monitoring. The commonly used models of the indirect measurement method in tool wear state monitoring include neural network [2], fuzzy inference [3], fuzzy neural network [4], dynamic Bayesian network [5], support vector machine [5], etc. The proportion of tool condition monitoring models is shown in Fig. 1.
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
Neural networks have made great progress in tool wear monitoring. Deep learning network models, represented by the convolutional neural network and the deep residual network, have been gradually applied to tool wear monitoring in recent years. Fatemeh et al. [6] enhanced the influence of tool wear in signals. The convolutional neural network (CNN) was used to estimate tool wear and was validated on different data sets, which showed that the method was feasible. Convolutional neural networks can automatically identify features from multi-scale signal matrices; this approach was verified by end milling experiments on S45C steel workpieces with different processing parameters [7]. Wu et al. [8] built an experimental system on the machine tool. The wear image information of all blades in the machining gap was obtained by matching the frame frequency of the industrial camera with the spindle speed of the machine tool. A convolutional neural network was then used to identify the tool wear state, and the method proved effective and practical. Su [9] used the cutting force generated by a mechanical force model, instead of the experimental cutting force, to predict the tool wear state, and applied a convolutional neural network to confirm the feasibility of this method. Convolutional neural networks are also widely used in machining surface roughness estimation, bearing fault diagnosis, tool wear detection and other aspects [10].
The residual network performs well in tool wear state monitoring [11], intelligent diagnosis of rolling bearings [12] and fault diagnosis of gear boxes [13]. It has strong versatility and operability. Ma et al. [14] used a deep residual network to achieve data-driven fault diagnosis; experimental results showed that the proposed method achieved significantly higher detection accuracy on early faults. Peng et al. [15] fused multi-source information and used a deep residual network (DRNN) for fault diagnosis of rotating machinery. The presented method performed well in feature learning, model training, noise resistance, fault tolerance and fault diagnosis.
Although deep learning has made great achievements in the field of fault diagnosis, deep learning models still have some flaws. Training a deep learning model needs a lot of data; once the data are insufficient, over-fitting occurs. The maximum pooling layer is poor at retaining local information. Deep learning models also have difficulty processing missing data, have complicated network weights, and easily ignore the correlation of attributes in the data set. In addition, the number of model parameters increases exponentially as hidden layers are added. Training a deep learning network model is therefore very costly in terms of labeled samples, time and computing power.
To solve the above problems, this paper puts forward a tool wear state monitoring method based on Transfer Learning and an Improved Deep Residual Network. Firstly, wavelet transform is used to process the cutting force signals, which eliminates the influence of manual features and yields the required spectrum. Secondly, the convolution series omits a large number of training and tuning parameters. Thirdly, the residual module is improved by placing the LeakyReLU activation function before the convolution layer. Fourthly, a Dropout layer is added to randomly drop network units and the weights connected to them during training, which reduces over-fitting and improves accuracy. Fifthly, global average pooling is used to reduce the number of training parameters and the test time of the model. Finally, after the Improved Deep Residual Network is constructed, all the convolutional layers of the pre-training network are frozen and the required fully connected layer is constructed for Transfer Learning, so that tool wear state monitoring is realized.

II. DATA PREPROCESSING
The data collected in this paper were the cutting force signals of a certain type of aero-engine blisk. The signal was a one-dimensional, time-varying, non-stationary signal. When a deep learning model from the field of Computer Vision (CV) is used as the pre-training model, one-dimensional signals must be converted into two-dimensional image data. There are two transformation methods. The first method is based on data reconstruction, which directly intercepts and splices the one-dimensional cutting force signal to reconstruct a two-dimensional image. The method is simple to operate, but the frequency domain information of the signal is ignored. The second method is based on time-frequency domain conversion, including the Fourier Transform (FT) [16] and the Wavelet Transform (WT) [17]. The FT has a poor ability to characterize non-stationary signals, which may lead to information loss when extracting deep features. The WT uses a window function with fixed area and variable shape to balance time resolution and frequency resolution through multi-resolution analysis. Therefore, the wavelet transform is used in this paper to process the data.
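The first (data reconstruction) method can be illustrated with a minimal sketch. The function name `signal_to_image`, the row width and the integer samples are illustrative assumptions, not values from the experiment; the sketch only shows how consecutive segments of a one-dimensional record are stacked as image rows, which is also why no frequency information is made explicit.

```python
def signal_to_image(signal, width):
    """Cut a 1-D signal into consecutive rows of `width` samples.

    Samples that do not fill a complete row at the end are dropped.
    """
    rows = len(signal) // width
    return [signal[r * width:(r + 1) * width] for r in range(rows)]


# Placeholder samples (hypothetical, not measured cutting forces).
force = list(range(10))
image = signal_to_image(force, 4)  # 2 rows of 4 samples; the last 2 samples are dropped
```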

A. WAVELET TRANSFORM
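The French scholar Morlet [18] put forward the wavelet transform. It has been widely used in signal processing, image processing, applied mathematics and many other fields. In signal processing and analysis it is mainly used for filtering, denoising, compression and transmission. The wavelet function is defined as follows. If $\psi(t) \in L^2(\mathbb{R})$ satisfies the admissibility condition:
$$ C_\psi = \int_{-\infty}^{+\infty} \frac{|\hat{\psi}(\omega)|^2}{|\omega|} \, d\omega < \infty $$
then $\psi(t)$ is called an admissible wavelet (integral wavelet, fundamental wavelet), where $\hat{\psi}(\omega)$ is the Fourier transform of $\psi(t)$. The wavelet function generated by the fundamental wavelet can be expressed as:
$$ \psi_{a,b}(t) = \frac{1}{\sqrt{|a|}} \, \psi\!\left(\frac{t-b}{a}\right), \quad a, b \in \mathbb{R}, \; a \neq 0 $$
where $a$ is the expansion factor and $b$ is the translation factor. The area of $\psi_{a,b}(t)$ is fixed; the size of $a$ only affects the length of $\psi_{a,b}(t)$. The discrete wavelet function can be written as:
$$ \psi_{j,k}(t) = a_0^{-j/2} \, \psi\!\left(a_0^{-j} t - k b_0\right), \quad j, k \in \mathbb{Z} $$
where $a_0 > 1$ and $b_0 > 0$ are the discretization steps of the expansion and translation factors. The discrete wavelet transform coefficient of a signal $f(t)$ can be represented as:
$$ C_{j,k} = \int_{-\infty}^{+\infty} f(t) \, \psi_{j,k}^{*}(t) \, dt $$
The reconstruction formula of the discrete wavelet transform is as follows:
$$ f(t) = C \sum_{j=-\infty}^{+\infty} \sum_{k=-\infty}^{+\infty} C_{j,k} \, \psi_{j,k}(t) $$
where $C$ is a constant independent of the signal.
VOLUME 10, 2022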
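How the expansion factor a and the translation factor b turn a one-dimensional record into a two-dimensional scale diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain used in the experiments: the real-valued Morlet-type wavelet with centre frequency `w0 = 5`, the dyadic scale list, the truncation of the wavelet at four envelope widths and the 5 Hz test tone sampled at 100 Hz are all hypothetical choices. For this wavelet a scale a (in samples) responds most strongly near the frequency w0 * fs / (2 * pi * a), which is about 5 Hz for a = 16.

```python
import math


def morlet(u, w0=5.0):
    """Real-valued Morlet-type wavelet: a cosine under a Gaussian envelope."""
    return math.cos(w0 * u) * math.exp(-0.5 * u * u)


def cwt(signal, scales, support=4.0):
    """Return rows W[a][b] = (1 / sqrt(a)) * sum_n f[n] * psi((n - b) / a)."""
    n_samples = len(signal)
    rows = []
    for a in scales:
        half = int(support * a)  # psi is negligible beyond |u| > support
        offsets = range(-half, half + 1)
        kernel = [morlet(k / a) / math.sqrt(a) for k in offsets]
        row = []
        for b in range(n_samples):
            acc = 0.0
            for k, w in zip(offsets, kernel):
                n = b + k
                if 0 <= n < n_samples:  # samples outside the record count as zero
                    acc += signal[n] * w
            row.append(acc)
        rows.append(row)
    return rows


fs, f0 = 100.0, 5.0  # hypothetical sampling rate and tone frequency
signal = [math.sin(2 * math.pi * f0 * n / fs) for n in range(256)]
scales = [4, 8, 16, 32, 64]
scalogram = cwt(signal, scales)  # one row per scale, one column per shift b
energy = [sum(v * v for v in row) for row in scalogram]
best_scale = scales[energy.index(max(energy))]
```

Each row of `scalogram` corresponds to one value of a and each column to one value of b, so the array can be rendered directly as an image for a network that expects two-dimensional input.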

B. BLISK PROCESSING TEST AND SIGNAL ACQUISITION
The workpiece material was titanium alloy TC17. The disk and blades used in the test were provided by Zhuzhou Diamond Cutting Tools Co., LTD. The blades of the disk milling cutter were arranged in left, middle and right positions. The blade model is the indexable YBG212. It is made of hard alloy steel with TiAlN non-coating. The parameters of the disk and blade are shown in Table 1. The powerful compound machine developed by Northwestern Polytechnical University was used. The processing method was climb milling with emulsion cooling. The cutting forces were monitored with the Kistler 9255B piezoelectric force measuring instrument. The flank wear of the tool was measured by the IFM-G4 automatic tool measuring instrument. The dynamometer was fixed on the work table and the workpiece was fixed on the dynamometer with four 12 mm bolts. The instantaneous cutting force signal was amplified by the Kistler 5080 multi-channel amplifier and transmitted to the computer by the data acquisition card PCI-DAS602/16. The cutting force signal was collected and recorded by the DEWESoftX2 software. The data acquisition process is shown in Fig. 2.

C. PROCESS THE DATA
With a spindle speed of 70 r/min and feed speeds of 20 mm/min, 30 mm/min and 40 mm/min, the cutting force signals were divided into three categories. A total of 5164 sets of data were used to analyze tool wear at different feed rates. An all-pass filter was used to remove interference from the cutting force signal. The cutting force signal, spectrum diagram and scale diagram are shown in Fig. 3, Fig. 4 and Fig. 5, respectively. The first type of data groups (spindle speed S = 70 r/min, feed speed F = 20 mm/min) is shown in Fig. 6, the second type (S = 70 r/min, F = 30 mm/min) in Fig. 7 and the third type (S = 70 r/min, F = 40 mm/min) in Fig. 8. The input signal was decomposed into several subband signals by wavelet transform. Some of these subband signals represent the trend of the signal, and the coefficients of these subbands can be set to 0. The signal was then reconstructed by the inverse discrete wavelet transform to remove the trend of the cutting force signal. Fig. 9 shows the reconstructed signal for the first group of data of the first type.
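The detrending step (decompose, zero the trend subband, reconstruct) can be sketched with the Haar wavelet, which is chosen here only because it is short enough to write out in full; the wavelet family, the number of levels and the synthetic signal are assumptions, since the text does not state them. Zeroing the coarsest Haar approximation at level 3 removes the mean of every block of 8 samples, which is a simple stand-in for removing a slow trend.

```python
import math

S = math.sqrt(2.0)


def haar_step(x):
    """One Haar analysis level: (approximation, detail), each of half length."""
    approx = [(x[i] + x[i + 1]) / S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / S for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Exact inverse of haar_step."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / S, (a - d) / S])
    return x


def detrend(signal, levels):
    """Zero the coarsest approximation subband, then reconstruct.

    The signal length must be divisible by 2 ** levels.
    """
    details, approx = [], list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    approx = [0.0] * len(approx)  # discard the trend subband
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx


# Synthetic placeholder: slow linear drift plus a fast alternating component.
raw = [0.05 * i + (1.0 if i % 2 == 0 else -1.0) for i in range(64)]
clean = detrend(raw, levels=3)
```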

III. DEEP LEARNING MODEL FRAMEWORK

A. CONVOLUTIONAL NEURAL NETWORK
Convolutional Neural Network (CNN) is a typical deep learning network architecture inspired by biological visual perception mechanism. Typical convolutional neural networks are composed of convolutional layer, pooling layer, full connection layer and Softmax classification layer, as shown in Fig. 10.
The convolutional layer is the core of the convolutional neural network and extracts the required features according to the objective function. It extracts features by calculating the inner product between the 2D convolution kernel and the overlapping region of the input image, traversing every pixel of the whole image, and passing the result through a nonlinear activation function. The mathematical model of the convolution layer is expressed as follows:
$$ x_j^l = f\left( \sum_i x_i^{l-1} * k_{ij}^l + b_j^l \right) $$
where $x_j^l$ is the value of the $j$-th feature map of layer $l$, $f$ is the nonlinear activation function, $x_i^{l-1}$ is the value of the $i$-th feature map of layer $l-1$, $k$ is the convolution kernel, and $b_j^l$ is the $j$-th bias parameter of layer $l$. The activation function used is the Rectified Linear Unit (ReLU); its mathematical model is as below:
$$ f(x) = \max(0, x) $$
The pooling layer is usually located after the convolution layer to reduce the size of the feature map and introduce invariance. Its mathematical model is represented as follows:
$$ x_j^l = \mathrm{down}\left( x_j^{l-1} \right) $$
where $\mathrm{down}(\cdot)$ is a sub-sampling function. The fully connected layer is the last layer of the convolutional neural network and is used to perform classification and regression tasks. Its mathematical model is written as below:
$$ \hat{y} = f(Wx + b) $$
where $\hat{y}$ is the output of the fully connected layer, $W$ is the weight matrix multiplied by the input feature column vector $x$, $b$ is the offset column vector, and $f(\cdot)$ is the activation function. The fully connected layer adopts the SoftMax function as its output activation function. The SoftMax function is defined as:
$$ z_i^l = \sum_j w_{ij}^l x_j^{l-1}, \qquad \mathrm{softmax}\left(z_i^l\right) = \frac{e^{z_i^l}}{\sum_k e^{z_k^l}} $$
where $w_{ij}^l$ is the weight from the $j$-th neuron in layer $l-1$ to the $i$-th neuron in layer $l$.
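The layer models above can be traced on a toy input. The sketch below is a minimal single-channel illustration: the 5 x 5 image, the 2 x 2 difference kernel, and the choice of non-overlapping max pooling as the sub-sampling function are hypothetical. As in most deep learning frameworks, the "convolution" is computed as a sliding inner product without flipping the kernel.

```python
import math


def conv2d(image, kernel, bias=0.0):
    """Sliding inner product of a 2-D kernel over a 2-D image (no padding), plus bias."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(s + bias)
        out.append(row)
    return out


def relu(fmap):
    """f(x) = max(0, x), applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling, playing the role of down(.)."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def softmax(z):
    """Normalised exponentials; the maximum is subtracted for numerical stability."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 1, 0],
         [2, 1, 0, 0, 1],
         [1, 0, 1, 2, 2],
         [0, 2, 1, 1, 0]]
kernel = [[1, 0], [0, -1]]  # toy difference kernel
features = max_pool(relu(conv2d(image, kernel)))  # 5x5 -> 4x4 -> 2x2
probs = softmax([1.0, 2.0, 3.0])  # three class scores -> three probabilities
```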

B. IMPROVED DEEP RESIDUAL NETWORK
He et al. [19] presented the deep residual network to solve the vanishing-gradient problem that arises when convolutional neural networks extract deep features. The deep residual network [20] is composed of multiple residual modules with skip connections. The residual module can effectively alleviate gradient dispersion and gradient explosion after network deepening. The residual module is defined as follows:
$$ y_l = x_l + F(x_l, \{W_l\}) $$
$$ x_{l+1} = f(y_l) $$
where $x_l$ and $y_l$ represent the input and output of the residual module, $W_l$ is the weight matrix, $F(x_l, \{W_l\})$ is the residual mapping to be learned by the network, and $f(\cdot)$ is the Rectified Linear Unit (ReLU). For an $L$-layer residual module, the features learned from shallow to deep layers are written as follows:
$$ x_L = x_l + \sum_{i=l}^{L-1} F(x_i, \{W_i\}) $$
When the input of the Rectified Linear Unit (ReLU) [21] is less than 0, the gradient is zero, so the weight cannot be updated and the neuron remains silent in subsequent training. Therefore, the LeakyReLU activation function was used instead of the ReLU function to improve the robustness of the model. The mathematical expression of the LeakyReLU function is as follows:
$$ f(x) = \begin{cases} x, & x > 0 \\ \varepsilon x, & x \le 0 \end{cases} $$
where $\varepsilon$ is a small constant. LeakyReLU retains some values on the negative axis, so a small non-zero gradient remains when the input is negative. The residual module is shown in Fig. 11. By learning residuals, part of the original input information is transmitted directly to the next layer through the identity mapping, which alleviates the feature loss that occurs when information passes through the convolution layers. This structure improves the expressive ability of the model and avoids the degradation caused by the deepening of network layers.
The residual optimization structure is adopted to facilitate the construction of a deep network and to reduce the number of network parameters and the amount of computation, as shown in Fig. 11. Firstly, a 1 × 1 convolution layer is used for dimensionality reduction. Then, another 1 × 1 convolution layer is used for dimensionality reduction after the 3 × 3 convolution layer. Compared with the traditional residual structure, the proposed structure not only preserves accuracy, but also reduces the number of parameters and saves computing power. In Fig. 12, d is the input data dimension and the parameters in the box are the convolution kernel size and output channel of each layer.
The data set collected in actual operation often contains a certain amount of abnormal data, which the network may learn as if it were a rule; this leads to over-fitting and reduces model accuracy. Therefore, Dropout is introduced into the residual module in this paper. Dropout was proposed by Hinton et al. [22] to solve the over-fitting problem. The idea of Dropout is to randomly discard some of the hidden layer neurons, which reduces the chance that abnormal data are learned and diminishes their impact on the network. The operation diagram of Dropout is shown in Fig. 13. Fig. 13(a) shows the deep feedforward network and Fig. 13(b) shows the deep feedforward network after Dropout. Adding a Dropout layer reduces the number of intermediate features, the redundancy and the complex co-adaptive relationships between neurons. It can also increase the orthogonality between the features of each layer and avoid over-fitting.
Furthermore, the LeakyReLU activation function is placed before the convolution layer to achieve a direct connection between the input and output, which maximizes the retention of transmitted information. Since the large number of redundant parameters in the fully connected layer hinders the operation of the network, the global average pooling layer is adopted in this paper to replace the fully connected layer. Global average pooling directly pools the input features without neurons, which simplifies the network structure and omits a large number of network parameters. The internal structure of the Improved Residual Module is shown in Fig. 14. The structure of the Improved Deep Residual Network is displayed in Fig. 15.
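The difference between ReLU and LeakyReLU, and the role of the identity shortcut, can be checked numerically. In the sketch below the value `eps = 0.01` and the toy element-wise residual mapping are assumptions made only for illustration; a real residual mapping consists of convolution layers.

```python
def relu(x):
    """Standard ReLU: negative inputs are clipped to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x, eps=0.01):
    """LeakyReLU: f(x) = x for x > 0, eps * x otherwise (eps is an assumed value)."""
    return x if x > 0 else eps * x


def leaky_relu_grad(x, eps=0.01):
    """Derivative of LeakyReLU; it never becomes exactly zero."""
    return 1.0 if x > 0 else eps


def residual_block(x, weights, eps=0.01):
    """y = x + F(x), with a toy element-wise mapping F(x) = w * leaky_relu(x)."""
    return [xi + w * leaky_relu(xi, eps) for xi, w in zip(x, weights)]


x = [-2.0, 0.5]
y = residual_block(x, weights=[0.5, 0.5])
```

For the negative input, ReLU would output 0 with zero gradient, whereas LeakyReLU passes a small value and keeps a gradient of `eps`; the shortcut term additionally carries the input through unchanged.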
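The parameter saving of a 1 × 1, 3 × 3, 1 × 1 structure can be estimated with simple counting. The sketch below is a rough illustration under explicit assumptions: biases are ignored, the last 1 × 1 layer is assumed to map back to d channels so that the shortcut can be added (as in the bottleneck block of He et al. [19]), and the widths d = 256 and 64 are illustrative values, not those of Fig. 12.

```python
def conv_params(k, c_in, c_out):
    """Number of weights of a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def bottleneck_params(d, mid):
    """1x1 (d -> mid), 3x3 (mid -> mid), 1x1 (mid -> d)."""
    return (conv_params(1, d, mid)
            + conv_params(3, mid, mid)
            + conv_params(1, mid, d))


def plain_params(d):
    """Two stacked 3x3 convolutions that keep d channels."""
    return 2 * conv_params(3, d, d)


d, mid = 256, 64  # illustrative widths (assumption)
saving = plain_params(d) / bottleneck_params(d, mid)
```

Under these assumptions the bottleneck uses 69,632 weights against 1,179,648 for the plain block, roughly a 17-fold reduction.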
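The random discarding of units can be sketched as follows. This is a minimal illustration of the "inverted" dropout convention used by most current frameworks, in which surviving activations are rescaled during training so that nothing changes at test time; the drop probability 0.5 and the seeded generator are assumptions, and the text does not state which variant or rate was used.

```python
import random


def dropout(values, p, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p, rescale the rest by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)  # at test time the layer is the identity
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, rng=random.Random(0))
eval_out = dropout(activations, p=0.5, training=False)
```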
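The effect of replacing a flattening fully connected layer with global average pooling can be sketched as below. The 7 × 7 × 512 feature size is a hypothetical example; the three outputs correspond to the three data categories used in this paper. Global average pooling itself has no trainable parameters, so only the small classifier that follows it contributes weights.

```python
def global_average_pool(feature_maps):
    """Reduce each H x W feature map to its mean, giving one value per channel."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


maps = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 8.0]]]
pooled = global_average_pool(maps)  # one mean per channel

# Hypothetical 7 x 7 x 512 feature volume, three classes.
flatten_then_dense = dense_params(7 * 7 * 512, 3)
gap_then_dense = dense_params(512, 3)
```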

C. TRANSFER LEARNING
The realization methods of Transfer Learning are divided into sample transfer [23], feature transfer [24], model transfer [25] and relationship transfer [26]. The model transfer method was adopted in this paper. The Improved Deep Residual Network was used as the pre-training network model, in which all convolutional layers were frozen, the required fully connected layers were built and global fine-tuning was performed. In Transfer Learning, synthetic samples are used as the source domain $D_s$ and the training set after data enhancement is used as the target domain $D_t$. Samples in the source domain and the target domain follow the same label distribution. The model is pre-trained on the data from $D_s$ and the parameters are saved; the resulting weight matrix is recorded as $W_s$. $W_s$ was applied to initialize the parameters of the target network. The data of $D_t$ were used to retrain the whole network and a new model was obtained through fine-tuning.
Yosinski et al. [27] evaluated the transferability of convolution layers at different positions through a large number of experiments. It was found that the features extracted from the lower convolutional layers had strong transferability, while the features extracted from the higher convolutional layers, which are related to specific tasks, were not suitable for transfer and needed to be retrained on new data sets.
Therefore, in the proposed network the low-level parameters migrated from the network trained on cutting force data are frozen, while the high-level parameters are retrained. In this paper, most of the architecture of the Improved Deep Residual Network model was retained when pre-training the network model. The fully connected layer was rebuilt. The parameters of the first seven residual network modules are frozen and used as shallow feature extractors. The parameters of the 8th residual module are randomly initialized to learn the deep features. A Softmax classification layer is used to obtain the tool wear status.
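The freezing scheme described above (first seven modules frozen, eighth re-initialized) can be sketched with plain data structures. Everything named here is hypothetical: the dictionary layout, the module names `res1` to `res8`, the four placeholder weights per module and the uniform initializer stand in for whatever the actual framework provides.

```python
import random


def build_modules(n_modules=8, weights_per_module=4, seed=0):
    """Stand-in for a pre-trained network: a list of modules with placeholder weights."""
    rng = random.Random(seed)
    return [{"name": f"res{i + 1}",
             "weights": [rng.uniform(-1, 1) for _ in range(weights_per_module)],
             "trainable": True}
            for i in range(n_modules)]


def prepare_for_transfer(pretrained, n_frozen=7, seed=1):
    """Copy the pre-trained modules, freeze the first n_frozen, re-initialise the rest."""
    rng = random.Random(seed)
    target = []
    for i, module in enumerate(pretrained):
        new = {"name": module["name"], "weights": list(module["weights"])}
        if i < n_frozen:
            new["trainable"] = False  # kept as a shallow feature extractor
        else:
            new["weights"] = [rng.uniform(-1, 1) for _ in new["weights"]]
            new["trainable"] = True   # learns task-specific deep features
        target.append(new)
    return target


source_model = build_modules()
target_model = prepare_for_transfer(source_model)
trainable_names = [m["name"] for m in target_model if m["trainable"]]
```

During fine-tuning, an optimizer would then update only the modules whose `trainable` flag is set, together with the rebuilt classification layer.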

D. TRANSFER LEARNING WITH IMPROVED DEEP RESIDUAL NEURAL NETWORKS
To address the deficiencies of traditional deep learning models, a tool wear monitoring model based on Transfer Learning and the Improved Deep Residual Network was proposed. Transfer Learning is adopted to train on the target domain and establish the sample recognition and classification model, which avoids the time and computing power wasted by repeatedly training the network model. The deep residual network is used to solve the problems of gradient dispersion, gradient explosion and degradation after network deepening. The residual module is improved to enhance the robustness of the model during training. The schematic diagram of the Transfer Learning and Improved Deep Residual Network framework is shown in Fig. 16, and the basic process is shown in Fig. 17.
Firstly, the data was preprocessed as follows:
• The cutting force signals were collected.
• The one-dimensional cutting force signals were converted into two-dimensional signals by wavelet transform.
Then, the deep residual network was set up as follows:
• The LeakyReLU activation function was applied to replace the ReLU function in the residual module to improve the robustness of the model.
• The LeakyReLU activation function was placed before the convolution layer to maximally retain the transferred information.
• Dropout was introduced into the residual module to discard part of the redundant information in the network, which mitigates over-fitting.
• The global average pooling layer was used instead of the fully connected layer to simplify the network structure.
Finally, the model was migrated as follows:
• After the Improved Deep Residual Network was built, all convolutional layers were frozen and the fully connected layer was reconstructed for Transfer Learning.
• The tool wear state monitoring was completed by training the network.

IV. TOOL WEAR MONITORING APPLICATIONS

A. EXPERIMENTAL ENVIRONMENT AND PARAMETER INDEX
Matlab 2021a was adopted as the deep learning framework to verify the feasibility of the network model presented in this paper. The experimental environment is shown in Table 2.
To improve the performance of the proposed network, ablation experiments were used to determine the optimal network parameters. The parameters of the ablation experiments are shown in Table 3.

B. EXPERIMENTAL RESULTS
The cutting force data in this paper were collected under wear conditions. The spindle speed was 70 r/min and the feed speeds were 20 mm/min, 30 mm/min and 40 mm/min. A total of 5164 sets of data were divided into three categories. The ratio of training set, validation set and test set is 6:3:1.
The Improved Deep Residual Network model was trained first, and Transfer Learning was then applied. The accuracy of the ablation experiment training process is shown in Fig. 18(a). Model 1, Model 2 and Model 3 are the same network model with different parameters, and Model 4 is the Improved Deep Residual Network without Transfer Learning. The accuracy comparison is shown in Fig. 18(b). The confusion matrix of Model 1 is displayed in Fig. 19, where Fig. 19(a), Fig. 19(b), Fig. 19(c) and Fig. 19(d) are for the training set, the validation set, the test set and the overall dataset, respectively.
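As a worked example of the 6:3:1 ratio, the sketch below computes subset sizes for 5164 samples. The rounding rule (floor the first two shares and assign the remainder to the test set) is an assumption, because the text does not state how fractional counts were handled.

```python
def split_sizes(n_total, ratios=(6, 3, 1)):
    """Floor the training and validation shares; the remainder goes to the test set."""
    total = sum(ratios)
    train = n_total * ratios[0] // total
    val = n_total * ratios[1] // total
    return train, val, n_total - train - val


train_n, val_n, test_n = split_sizes(5164)  # -> (3098, 1549, 517)
```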
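A confusion matrix such as those in Fig. 19 is summarized by the overall accuracy and the per-class recall. The sketch below shows the arithmetic on a made-up three-class matrix; the counts are purely illustrative and are not the results of this paper, and rows are assumed to index the true class and columns the predicted class.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


def per_class_recall(cm):
    """Correctly classified share of each true class (rows = true class)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# Made-up counts for three classes (not experimental data).
cm = [[50, 0, 0],
      [1, 48, 1],
      [0, 2, 48]]
overall = accuracy(cm)
recalls = per_class_recall(cm)
```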

C. ALGORITHM CONTRAST
In order to verify the reliability of the proposed model in this paper, comparison experiments were conducted using

V. DISCUSSION
Zhang et al. [30] used a one-dimensional convolutional neural network to train on time signals for bearing fault diagnosis. However, the initial phase difference of the time samples interferes with the learning process of the convolutional neural network. Therefore, in this paper wavelet transform was applied to convert the collected one-dimensional, time-varying, non-stationary cutting force signals into two-dimensional data, and the two-dimensional signals were used as the input of the deep learning network model. Wavelet transform characterizes abrupt and singular signals well and is suitable for processing cutting force signals. Moreover, an all-pass filter is adopted to reduce the correlation between the different features extracted.
Tool wear monitoring based on deep learning has been studied in depth. Zhang et al. [31] adopted wavelet packet decomposition to process the cutting force data and fused the signal characteristics. The sample error was 8.2%. However, the problem of gradient dispersion or gradient explosion caused by the deepening of network layers was ignored. The deep residual network was applied to solve this problem in this paper. The residual structure was improved by replacing the traditional ReLU activation function with the LeakyReLU activation function and introducing a Dropout layer. To make full use of available resources and save computational power, the Improved Deep Residual Network was used as the pre-training model and the model migration method was applied to tool wear monitoring in the machining of an aero-engine blisk.
To verify the feasibility of the proposed network, ablation experiments were used to compare network models with different parameters. The proposed method was compared with several commonly used machine learning methods in Section IV. Among the shallow learning methods, the traditional K algorithm achieved the highest accuracy, 79.9%; among the deep learning methods, the VGG16 network achieved the highest accuracy, 95.5%. The input of shallow learning was the one-dimensional, time-varying, non-stationary cutting force signal without wavelet transform. Shallow learning has difficulty dealing with missing data, is prone to over-fitting and easily ignores the correlation of attributes in the data set, which generally leads to low accuracy. In traditional deep learning, the data were preprocessed and wavelet transform was used, but the problem of network degradation caused by gradient explosion and vanishing gradients had not been solved.
In the presented method of Transfer Learning and Improved Deep Residual Network, the spectrum graph is input directly, which avoids the complex process of feature extraction and data reconstruction and allows representative features to be learned directly from raw data. Moreover, feature extraction and fault diagnosis are combined in one model (end-to-end). The proposed network has 112 layers, whereas the traditional deep residual network ResNet50 has 177 layers. The accuracy of the proposed network was 8.07% higher than that of ResNet50. Experimental results showed that the proposed network simplified the network structure, improved the computing efficiency and enhanced the robustness.

VI. CONCLUSION
A tool wear monitoring method based on the Improved Deep Residual Network and Transfer Learning was presented in this paper. A Dropout layer was introduced to prevent over-fitting. The LeakyReLU activation function was added to improve the robustness. Transfer Learning was applied to the pre-trained model to improve the accuracy and generalization performance of the monitoring model. The method was compared with the convolutional neural network, the support vector machine and other machine learning methods. The conclusions are as follows:
• The proposed model is suitable for tool wear monitoring. The processed 2D data are input directly into the model without manual feature extraction. The "end-to-end" model structure has better operability and versatility. The model can be changed by modifying parameters, which gives it strong flexibility and extensibility.
• The accuracy of the presented model was 99.74%, whereas the best of the compared methods, the VGG16 network, reached 95.5%. The proposed model has excellent fault diagnosis performance.
• The feasibility of the proposed method was verified with cutting force signals in this paper. Other types of signals can be used to verify the method in future studies. The method can also be applied to fault diagnosis of other mechanical equipment.