An Automatic Intelligent Diagnostic Mechanism for the Milling Cutter Wear

The abrasion of milling cutters is an important factor that affects the accuracy of a workpiece. The interval between cutter changes is based on the burr condition of the edges on the finished products as well as their dimensional precision. Delayed replacement of cutters degrades workpiece quality, so it is important that cutter wear be monitored in a timely manner. In this study the actual vibration signals generated in a milling process were measured and processed by an Automatic Intelligent Diagnosis Mechanism (AIDM) to determine cutter wear. The AIDM included two feature extraction approaches and three classification methods. The first approach used a Finite Impulse Response Filter (FIR) with Approximate Entropy (ApEn) for feature extraction. The second approach was nonlinear feature mapping using a fractional order Chen-Lee chaotic system, with chaotic dynamic error centroids and chaotic dynamic error maps used for status identification. After feature extraction the results were passed to a Back Propagation Neural Network (BPNN), a Support Vector Machine (SVM), and a Convolutional Neural Network (CNN) for identification. The results of the experiments showed that a Chaotic Dynamic Error Map of the fractional order Chen-Lee chaotic system in the AIDM had an identification rate of 96.33% using a convolutional neural network. In addition, it was shown that the AIDM model could automatically select the most suitable feature extraction and classification model from the input signal and could determine the wear level of milling cutters.


I. INTRODUCTION
Tool wear is an unavoidable problem in machine tool use. Damaged or worn tools will result in defects in the finished products and will seriously affect workpiece quality. Excessive tool wear requires extra work to solve surface roughness issues and defects revealed by additional quality inspection of workpieces [1], [2]. Timely inspection and analysis of tool wear is essential to reduce the cost of manpower, cutting tools and workpieces. Tool wear is more clearly defined using ISO stipulated standards to determine the status of cutting tools. The methods for the diagnosis and lifespan analysis of tool wear are all practical. Different types of sensor, such as accelerometers [3], microphones [4], dynamometers [5]-[7], and others are used to gather the data in actual machining experiments. Some studies have used audio signals generated by the cutting tool during machining. Variations in the vibrations generated by interaction of the tool with the workpiece during machining are classified using a single signal preprocessing method. However, with this method it is difficult to detect overfitting, should it occur, when the amount of data submitted to the trained model for testing has become very large. In such a case the identification efficiency of the classification model is reduced. Most of the signal preprocessing methods presently used for tool wear detection transform the time domain signals into frequency domain signals for analysis. For example, when Fourier Transforms [8], [9] and Wavelet Transforms [10], [11] are used for feature extraction, the feature mapping capabilities of Fourier Transforms for non-stationary and time-varying signals are limited, and an accurate description of the features of local signals is not possible. Although the Wavelet Transform can accurately describe local information, real time processing is difficult to achieve because of the huge amount of computation involved [12].
Therefore, in this study an Automatic Intelligent Diagnosis Mechanism (AIDM) is proposed that captures the vibration signals generated during interaction between a milling cutter and workpiece using a single accelerometer. Feature extraction is done using two different approaches. The first involves a Finite Impulse Response Filter (FIR) [13], [14] for signal decomposition, after which feature extraction is done using Approximate Entropy (ApEn) [15]-[17]. The second approach uses a fractional-order Chen-Lee chaotic system [18], [19] to conduct nonlinear feature mapping, and the Chaotic Dynamic Error Centroid and Chaotic Dynamic Error Maps are selected as features for status identification. Finally, the features obtained by these two different methods are used for identification by (1) a Back Propagation Neural Network (BPNN) [20], [21], (2) a Support Vector Machine (SVM) [22], [23] and (3) a Convolutional Neural Network (CNN) [24], [25] respectively, to find the most suitable classification model and feature extraction method for signal testing.

The associate editor coordinating the review of this manuscript and approving it for publication was Longzhi Yang.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. EQUIPMENT, EXPERIMENTAL AND RESEARCH STRUCTURE
In this study a QUASER MV154-C CNC milling machine was used as the experimental platform, see FIGURE 1 (a). The device used for measuring the vibration signals generated by cutting action was a Metra Mess-und Frequenztechnik KS943B.100 3-axis accelerometer, see FIGURE 1 (b). The signal acquisition module used was an NI-9234, see FIGURE 1 (c). The maximum strokes of the X, Y, and Z axes of the CNC machine are 762 mm, 530 mm, and 560 mm respectively. The frequency range of the KS943B.100 accelerometer is 0.5 Hz to 22,000 Hz. The signal acquisition module has 4 channels and a sampling rate of 51.2 kS/s. The sampling frequency used in the study was 10,000 Hz. The end milling cutter used in this article is a four-flute WEENIX model with a diameter of 10 mm. The processing conditions in this article are a spindle speed of 3000 rpm, a feed rate of 300 mm/min, and a cutting depth of 0.5 mm. The computer used for analysis and identification has an Intel (R) Core (TM) i7-7700 CPU at 3.60 GHz, 16 GB of RAM, and an NVIDIA GeForce GTX-1080 graphics card. The software used was Matlab 2019a with the Machine Learning Toolbox 11.5, the Deep Learning Toolbox 12.1 and the Neural Network Training Toolbox.
The flowchart of the experiments is shown in FIGURE 2. During workpiece machining, cutter vibration signals from the 3-axis accelerometer are captured by the NI-9234 signal acquisition module. The signal data is processed and identified using the Automatic Intelligent Diagnosis Mechanism (AIDM), which will be introduced in section B of part II. The AIDM automatically applies each feature extraction method and each classification model to the input data. In this way the feature extraction method and the classification model that give the best final identification rate are selected. The AIDM used in this study utilizes two different approaches to feature extraction.
The first approach uses an FIR filter with ApEn, and the second uses the fractional order Chen-Lee chaotic system. In the first approach, the vibration signals for the three wear states of an end milling cutter (slight, moderate, and severe wear) are transformed from the time domain to the frequency domain using the Fast Fourier Transform (FFT). The bands that indicate wear status are identified as features, and Bandpass filtering is done for each condition, which can be displayed as a frequency domain response diagram. The filtering range is from 2280 Hz to 4500 Hz so that the feature bands of the three types of tool wear status are retained. The regularity of the signal feature values of the three states obtained after Bandpass filtering can be distinguished by ApEn. In the second approach nonlinear feature mapping of the time domain vibration signals of each of the three cutter wear states is done using the chaotic system. The difference between the actual test signals and the rated ideal signal is then taken to obtain a Chaotic Dynamic Error Map, with the coordinates of the Chaotic Dynamic Error Centroid as the identification features. In the analyses and comparisons done in the experiments, the y coordinate of the chaotic centroid and the chaotic dynamic error map are used as features for the identification of cutter wear, and thus the number of features is the same as in the first approach. Finally, training and identification are done using the three classification models: BPNN, SVM and CNN. The classification categories are slight, moderate, and severe wear. The experimental signals comprise two sets, each with a training set and a testing set, as shown in TABLE 1. The first set of signals consists of the pre-collected milling signals of end mill A and the milling signals of end mill B collected in real time. Feature extraction was conducted for both the pre-collected and the real-time milling signals, and the pre-collected milling signals were used for model training and verification.
The milling signals collected in real time were used as the testing set. To verify the performance of the AIDM proposed in this article, the second set of signals was processed in the same way as the first set. The second set included the pre-collected milling signals of end mills A and B, and the signals collected from end mill C in real time. The pre-collected milling signals were used for model training and verification. The milling signals collected in real time were used for model testing. According to ISO 8688-2 specifications [26], a milling cutter with flank wear VB greater than 0.6 mm is regarded as damaged. Therefore, in this study a maximum wear of 0.6 mm on an end mill was regarded as severe wear, 0.3 mm as moderate wear, and 0.1 mm as slight wear. The tool wear experiments were carried out based on this classification, and the results of the two proposed AIDM feature extraction methods were analyzed and are presented in Sections A and B of part V. Identification with these two different feature extraction methods was conducted using BPNN, SVM and CNN, and the classification results and discussion are presented in Sections C and D of part V.

B. THE AUTOMATIC INTELLIGENT DIAGNOSIS MECHANISM
The AIDM data sources were classified into two categories: the pre-collected signal data and the real-time signal data from the machine. The pre-collected data was used for model training and verification. The real-time data acquired from the machine was used for status evaluation. The AIDM data processing structure is shown in FIGURE 3. To reduce the classification model training time, the pre-collected data was first processed for feature extraction. A portion of the signal features obtained by feature extraction was used for model training and the remainder was used for model verification. After completion of model training, features were extracted from the real-time signals from the machine and used to test the trained model. The classification models were compared and ranked according to the test results. The design of the AIDM algorithm used in this study is shown in the flowchart in FIGURE 4.
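The selection step described above can be sketched as a loop over every pairing of feature extraction method and classifier, followed by a ranking on the test identification rate. The sketch below is a minimal illustration only and is not the algorithm of FIGURE 4: the function names, the two toy extractors, the `nearest_mean` classifier and the tiny data set are all hypothetical stand-ins for the FIR/ApEn and chaotic features and for the BPNN, SVM and CNN models.

```python
from itertools import product


def identification_rate(predicted, actual):
    """Percentage of test samples whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def rank_combinations(extractors, classifiers, train, test):
    """Train and test every (extractor, classifier) pair; best rate first.

    extractors:  dict name -> function(signal) -> scalar feature
    classifiers: dict name -> function(features, labels) -> predict(feature)
    train, test: lists of (signal, label) pairs
    """
    results = []
    for (ex_name, extract), (clf_name, fit) in product(
            extractors.items(), classifiers.items()):
        predict = fit([extract(s) for s, _ in train], [y for _, y in train])
        predicted = [predict(extract(s)) for s, _ in test]
        rate = identification_rate(predicted, [y for _, y in test])
        results.append((rate, ex_name, clf_name))
    return sorted(results, reverse=True)


def nearest_mean(features, labels):
    """Hypothetical stand-in classifier: pick the class with the closest mean."""
    means = {}
    for label in sorted(set(labels)):
        values = [f for f, y in zip(features, labels) if y == label]
        means[label] = sum(values) / len(values)
    return lambda f: min(means, key=lambda label: abs(f - means[label]))


# Toy signals (hypothetical): the mean absolute value separates the two
# classes, while the first sample alone does not.
extractors = {
    "mean_abs": lambda s: sum(abs(v) for v in s) / len(s),
    "first_sample": lambda s: s[0],
}
classifiers = {"nearest_mean": nearest_mean}
train = [([0.5, 0.1, -0.1], "slight"), ([0.3, 0.2, -0.2], "slight"),
         ([0.4, 1.0, -1.0], "severe"), ([0.6, 0.9, -0.9], "severe")]
test = [([0.48, 0.15, -0.15], "slight"), ([0.42, 0.95, -0.95], "severe")]

ranking = rank_combinations(extractors, classifiers, train, test)
best_rate, best_extractor, best_classifier = ranking[0]
```

The first entry of `ranking` plays the role of the combination that the AIDM would report as most suitable for the input signals.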

III. FEATURE EXTRACTION IN AIDM
A. THE FINITE IMPULSE RESPONSE FILTER
To find the frequency band features associated with slight, moderate, and severe wear of the tool, the time domain vibration signals for the three states were transformed using a Fast Fourier Transform (FFT). Bandpass filtering was then done using a Finite Impulse Response (FIR) filter. The filtering range used was from 2280 Hz to 4500 Hz to retain the wear features of the frequency bands associated with the three states. The finite impulse response filter, also known as an FIR digital filter, changes the frequency spectrum of the input signal through computation. The FIR digital filter has the distinct advantage that all its poles are within the unit circle in the Z-domain. This makes it more stable than the Infinite Impulse Response (IIR) filter [27]. Furthermore, an FIR filter does not require feedback, and hence optimization is easier than for IIR. The input-output function of the FIR filter is given in (1):

$$y(n) = \sum_{k=0}^{K} b_k\, x(n-k) \tag{1}$$

where $x(n)$ is the input signal, $y(n)$ is the output signal of the FIR filter, $K$ is the filter order, and $b_k$ is the impulse response of the filter, which also represents the filter coefficients.
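As an illustration of the input-output relation of the FIR filter, the following sketch designs a band-pass filter for the 2280 Hz to 4500 Hz range at the 10,000 Hz sampling frequency and applies it by direct summation. The windowed-sinc design with a Hamming window, the filter order of 100 and the test tones are assumptions made for this example; the filter design actually used in the study is not stated in the text.

```python
import math

FS_HZ = 10_000  # sampling frequency used in the study


def bandpass_fir(low_hz, high_hz, fs_hz, order):
    """Windowed-sinc band-pass coefficients b_0..b_K (Hamming window, assumed)."""
    def sinc(t):
        return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

    mid = order / 2.0
    coeffs = []
    for k in range(order + 1):
        t = k - mid
        # Ideal band-pass = low-pass at high_hz minus low-pass at low_hz.
        ideal = ((2 * high_hz / fs_hz) * sinc(2 * high_hz * t / fs_hz)
                 - (2 * low_hz / fs_hz) * sinc(2 * low_hz * t / fs_hz))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / order)
        coeffs.append(ideal * window)
    return coeffs


def fir_filter(b, x):
    """y(n) = sum_k b_k x(n-k), taking x(n) = 0 for n < 0."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]


def tone(freq_hz, n_samples=600):
    return [math.sin(2 * math.pi * freq_hz * i / FS_HZ) for i in range(n_samples)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


b = bandpass_fir(2280, 4500, FS_HZ, order=100)
# Discard the first `order` output samples, where the filter is still filling.
in_band = fir_filter(b, tone(3300))[100:]   # inside the pass band
out_band = fir_filter(b, tone(500))[100:]   # outside the pass band
```

A tone at 3300 Hz passes almost unchanged, whereas a tone at 500 Hz is strongly attenuated, which is the behaviour relied on to retain the wear feature bands.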

B. APPROXIMATE ENTROPY
In this study Approximate Entropy (ApEn) was used to distinguish the signal difference and regularity of the three states after filtering, and to calculate the signal feature values of the three states. ApEn is a feature extraction method and complexity indicator that quantifies the similarity between segments of signal data and expresses data features in numerical form [28], [29]. Let $x(1), x(2), \ldots, x(N)$ represent the data of a time series, where $N$ is the total number of data points. The data is formed, in order, into a set of $m$-dimensional vectors $X(i)$, as shown in Formula (2):

$$X(i) = [x(i), x(i+1), \ldots, x(i+m-1)], \quad i = 1, 2, \ldots, N-m+1 \tag{2}$$

$d[X(i), X(j)]$ is the distance between $X(i)$ and $X(j)$, defined as the maximum difference between their corresponding elements, as in Formula (3):

$$d[X(i), X(j)] = \max_{k=0,\ldots,m-1} |x(i+k) - x(j+k)| \tag{3}$$

The noise filter coefficient $r$ is given and, for each $i$, the number of vectors $X(j)$ that meet the condition $d[X(i), X(j)] \le r$ is counted. This count is then divided by $N-m+1$ to obtain the ratio of similar vectors to the total number, as defined in Formula (4):

$$C_i^m(r) = \frac{\#\{\, j : d[X(i), X(j)] \le r \,\}}{N-m+1} \tag{4}$$

The logarithm of $C_i^m(r)$ is averaged over all $i$ and recorded as $\phi^m(r)$, as defined in Formula (5):

$$\phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r) \tag{5}$$

The dimension is then increased by 1 to $m+1$ to obtain $C_i^{m+1}(r)$ and $\phi^{m+1}(r)$, and the value of ApEn can be obtained from Formula (6):

$$\mathrm{ApEn}(m, r) = \lim_{N \to \infty} \left[ \phi^m(r) - \phi^{m+1}(r) \right] \tag{6}$$

Because the length $N$ of the actual data cannot be infinite, the above formula is rewritten as:

$$\mathrm{ApEn}(m, r, N) = \phi^m(r) - \phi^{m+1}(r) \tag{7}$$
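The ApEn computation described above can be sketched directly from its definition. The embedding dimension m = 2 and a tolerance r of 0.2 times the standard deviation are common conventions assumed here, since the values used in the study are not given in the text; the sine and random test series are likewise hypothetical.

```python
import math
import random
import statistics


def approximate_entropy(x, m, r):
    """ApEn(m, r, N) = phi^m(r) - phi^(m+1)(r) for a finite series x."""
    def phi(dim):
        n_vec = len(x) - dim + 1
        vectors = [x[i:i + dim] for i in range(n_vec)]
        total = 0.0
        for xi in vectors:
            # Count vectors whose maximum element-wise distance to xi is <= r.
            # The self-match guarantees count >= 1, so the logarithm is defined.
            count = sum(
                1 for xj in vectors
                if max(abs(a - c) for a, c in zip(xi, xj)) <= r
            )
            total += math.log(count / n_vec)
        return total / n_vec

    return phi(m) - phi(m + 1)


rng = random.Random(0)
regular = [math.sin(2 * math.pi * i / 20) for i in range(200)]   # periodic
irregular = [rng.uniform(-1.0, 1.0) for _ in range(200)]         # random

apen_regular = approximate_entropy(regular, 2, 0.2 * statistics.pstdev(regular))
apen_irregular = approximate_entropy(irregular, 2, 0.2 * statistics.pstdev(irregular))
```

A regular signal gives a lower ApEn than an irregular one, which is the property used to distinguish the filtered signals of the different wear states.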

C. DYNAMIC ERROR OF THE FRACTIONAL ORDER CHEN-LEE CHAOTIC SYSTEM
Chaos theory is part of nonlinear system theory, and one of the characteristics of a chaotic system is high sensitivity to variations of the input signal: the output signal will show significant changes for even the slightest change in input signal. Chaotic systems have strange attractors and their output motion follows an orderly but non-periodic trajectory. In this study the Grünwald-Letnikov (G-L) differentiation model was used to enhance the dynamic error variation rate of the chaotic system. The Chen-Lee chaotic system and the G-L fractional order model are both discussed in this section. The Chen-Lee chaotic system is a chaotic dynamic system that is used for nonlinear conversion mapping of the extracted signals. It uses a Chaotic Dynamic Error Map and the coordinates of the Chaotic Dynamic Error Centroid as identification features. The Chen-Lee dynamic equation is as shown in (8).
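The sensitivity to small input changes can be illustrated with the integer order Chen-Lee system in the form usually given in the literature, dx/dt = -yz + αx, dy/dt = xz + βy, dz/dt = xy/3 + γz. This form, the parameter values (α, β, γ) = (5, -10, -3.8), the initial state and the Runge-Kutta integration are assumptions made for this sketch; it does not reproduce the fractional order dynamic error mapping of the vibration signals used in this study.

```python
import math

# Assumed parameter values; they satisfy 0 < ALPHA < -(BETA + GAMMA).
ALPHA, BETA, GAMMA = 5.0, -10.0, -3.8


def chen_lee(state):
    """Right-hand side of the (integer order) Chen-Lee system, assumed form."""
    x, y, z = state
    return (-y * z + ALPHA * x, x * z + BETA * y, x * y / 3.0 + GAMMA * z)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = chen_lee(state)
    k2 = chen_lee(shifted(state, k1, dt / 2))
    k3 = chen_lee(shifted(state, k2, dt / 2))
    k4 = chen_lee(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def max_separation(delta, steps=2000, dt=0.001):
    """Largest distance reached between two trajectories started delta apart."""
    first = (0.2, 0.2, 0.2)
    second = (0.2 + delta, 0.2, 0.2)
    largest = 0.0
    for _ in range(steps):
        first, second = rk4_step(first, dt), rk4_step(second, dt)
        distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(first, second)))
        largest = max(largest, distance)
    return largest, first


largest, final_state = max_separation(1e-6)
```

Two trajectories that start only 1e-6 apart separate by more than an order of magnitude within a short simulated time, which is the kind of amplification that allows small differences between vibration signals to become distinguishable features.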
The above formula is rewritten in the form of the rated ideal signal and the actual testing signal systems, to obtain the dynamic error of the Chen-Lee chaotic system, as shown in (9) and (10).
The dynamic error of the system can be obtained by subtracting the rated ideal signal system from the testing signal system. The above two formulas are rewritten in matrix form [30], [31] to facilitate subtraction of the two systems, as shown in (11) and (12).
In this article the time domain vibration signals (chaotic dynamics) of the rated signal system and the testing signal system are processed and expressed as discrete signals. On this basis, the rated signal system and the testing signal system in (11) and (12) are subtracted to obtain the dynamic error of the chaotic system, and the dynamic error equation of the chaotic system can be expressed as in (15). The approximate fractional differential equation for the dynamic error of the fractional order chaotic system can be expressed as in Formula (16), where σ is the order and m is a real number; the differential computation can be performed when 0 < σ < 1. The dynamic error equation of the fractional order Chen-Lee chaotic system can be obtained by applying the fractional differentiation of Formula (16) to Formula (15) [18], [32], as shown in Formula (17). The terms in Formula (17) are defined according to fractional calculus; α, β, and γ are the system parameters as in Formula (18), and 0 < α < −(β + γ) is required for Formula (18) to realize the dynamic error of the fractional order Chen-Lee chaotic system [32]. In this article the experimental results are compared and the y coordinates of the chaotic centroid and the chaotic dynamic error map are taken as the features for the analysis of milling cutter wear status.
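Formula (16) is not reproduced in this text. A standard result of fractional calculus that matches its description (an order σ and a real exponent m) is the power rule D^σ x^m = Γ(m+1)/Γ(m+1−σ) · x^(m−σ); the sketch below assumes this rule and should be read as an illustration of fractional differentiation rather than as the exact formula of the paper. The function name is hypothetical.

```python
import math


def fractional_power_derivative(m, sigma, x):
    """Order-sigma derivative of x**m at x > 0 via the Gamma-function power rule."""
    return math.gamma(m + 1) / math.gamma(m + 1 - sigma) * x ** (m - sigma)


# Integer order check: the first derivative of x**2 at x = 3 is 2 * 3 = 6.
first_derivative = fractional_power_derivative(2, 1, 3.0)

# Half derivative of x at x = 1, which equals 2 / sqrt(pi).
half_derivative = fractional_power_derivative(1, 0.5, 1.0)

# Applying a half derivative twice to x recovers the ordinary derivative, 1.
twice_half = half_derivative * fractional_power_derivative(0.5, 0.5, 1.0)
```

The last line shows why orders between 0 and 1 interpolate between the signal itself and its ordinary derivative, which is the degree of freedom varied when the fractional orders are compared later.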

IV. THE AIDM CLASSIFIER
A. BACK PROPAGATION NEURAL NETWORK
The Back Propagation Neural Network (BPNN) is a combination of the Multilayer Perceptron (MLP) [33] and Back Propagation (BP) [34]. It is a supervised learning network and is the most representative and widely applied neural network. The BPNN includes the forward propagation of network input and the back propagation of output errors. If the actual output does not reach the expected level, the error is propagated back from the output layer toward the input layer. Learning is achieved by successively modifying the weights to reduce the error.
The basic structure of the Back Propagation Neural Network is shown in FIGURE 5. The structure of the neural network includes the processing unit, the layers, and the network. The processing unit is the most basic computing unit. Each layer is formed by several processing units with the same function, and the input, hidden and output layers together form the network. The layers are connected through the computation of weights and errors, and the output value of the previous layer is processed as the input to the next layer with a nonlinear activation function. The formula is as follows:

$$S_j = \sum_{i} w_{ij}\, x_i + b_j \tag{19}$$

where $S_j$ is the input value of the $j$th hidden layer unit; $w_{ij}$ is the weight connecting the processing units in different layers; $x_i$ is the input value of the $i$th unit; and $b_j$ is a bias value. When an error is generated between the predicted value and the actual value, back propagation passes the error back to the hidden layer to correct the weights and bias values of the model. The input values of the training data are then passed forward again using the new weights and biases of each layer, and the correction is repeated until the error is within the acceptable range. The objective function, which reduces the difference between the output and target values of the neural network, is shown in (20), where $y^{(i)}$ is the actual value and $\hat{y}^{(i)}$ is the predicted value.
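The forward computation of the hidden layer sums and the backward correction of weights and biases can be sketched with a very small network. Everything specific in this example is an assumption: one hidden layer of four tanh units, a linear output, a half squared error loss, plain per-sample gradient descent and four toy samples. The study itself used the Scaled Conjugate Gradient training function in Matlab with 1000 neurons.

```python
import math
import random


def forward(x, w1, b1, w2, b2):
    """Hidden sums S_j = sum_i w_ij x_i + b_j, tanh activation, linear output."""
    hidden = [
        math.tanh(sum(w1[i][j] * x[i] for i in range(len(x))) + b1[j])
        for j in range(len(b1))
    ]
    output = sum(w2[j] * hidden[j] for j in range(len(hidden))) + b2
    return hidden, output


def train(samples, n_hidden=4, lr=0.05, epochs=500, seed=0):
    """Per-sample gradient descent on the half squared error (assumed loss)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, target in samples:
            hidden, output = forward(x, w1, b1, w2, b2)
            error = output - target
            epoch_loss += 0.5 * error ** 2
            for j in range(n_hidden):
                # Error passed back to hidden unit j (uses w2[j] before its update).
                grad_hidden = error * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= lr * error * hidden[j]
                for i in range(n_in):
                    w1[i][j] -= lr * grad_hidden * x[i]
                b1[j] -= lr * grad_hidden
            b2 -= lr * error
        losses.append(epoch_loss)
    return (w1, b1, w2, b2), losses


# Hypothetical one-feature samples: low values labelled 0, high values labelled 1.
samples = [([0.1], 0.0), ([0.2], 0.0), ([0.8], 1.0), ([0.9], 1.0)]
params, losses = train(samples)
```

The list `losses` records the total error per epoch, so the successive reduction of the error by weight modification can be inspected directly.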

B. SUPPORT VECTOR MACHINE
The support vector machine (SVM) is a widely applied classification technique for machine learning and pattern identification. It was first used to solve two-class classification problems. The SVM uses statistical learning theory to calculate an ideal hyperplane such that the data of the different categories are separated with the maximal margin and the minimal misclassification; the optimal plane is determined by the support vectors. H is the classification hyperplane; H1 and H2 are the planes that pass through the samples closest to the classification hyperplane in each of the two categories. They are parallel to the classification hyperplane and the distance between them is the margin. For an optimal classification hyperplane, the two types of sample must be properly separated and the classification distance maximized.
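Here $H_1$ is $w \cdot x + b = 1$ and $H_2$ is $w \cdot x + b = -1$; $x$ is the input feature vector; $w$ is the normal vector of the hyperplane; $b$ is a constant; and the interval between the two hyperplanes is $2/\|w\|$. Therefore, to maximize the distance between the two planes, $\|w\|$ needs to be minimized. The following condition must be fulfilled to make the same type of sample data fall together on the same side of the hyperplane, where $y_i \in \{+1, -1\}$ is the class label of sample $x_i$:

$$w \cdot x_i + b \ge 1 \ \text{ for } y_i = +1, \qquad w \cdot x_i + b \le -1 \ \text{ for } y_i = -1 \tag{21}$$

The above formula is combined and rewritten as:

$$y_i\,(w \cdot x_i + b) - 1 \ge 0 \tag{22}$$

To minimize $\|w\|$ and fulfill the conditional expression to achieve the optimal classification hyperplane, we combine the above expressions and obtain the following objective function:

$$\min_{w,\,b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \tag{23}$$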
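The relation between the norm of w and the margin can be checked numerically. The sketch below evaluates the separation condition and the margin width 2/‖w‖ for a few candidate hyperplanes on a hypothetical two-class data set; the labels of +1 and −1, the sample points and the candidate values of w and b are assumptions for illustration, and no optimizer is implemented.

```python
import math


def margin_width(w):
    """Distance between the planes w.x + b = 1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def satisfies_constraints(w, b, samples):
    """True if label * (w.x + b) >= 1 holds for every (x, label) pair."""
    return all(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
        for x, label in samples
    )


# Hypothetical, linearly separable samples with labels -1 and +1.
samples = [((1, 1), -1), ((2, 1), -1), ((4, 3), 1), ((5, 3), 1)]

wide = ((1.0, 0.0), -3.0)     # feasible, margin 2
narrow = ((2.0, 0.0), -6.0)   # feasible, margin 1
too_wide = ((0.5, 0.0), -1.5) # margin 4, but violates the constraints
```

Both `wide` and `narrow` separate the samples, and the one with the smaller norm of w has the wider margin; `too_wide` shows that the norm cannot be reduced further without breaking the separation condition.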

C. CONVOLUTIONAL NEURAL NETWORK
The Convolutional Neural Network (CNN) is a deep learning network that is capable of high-performance image processing. Its characteristic is that the features of each layer are obtained from a local area of the previous layer through convolution with shared weights, followed by an activation function. The CNN reduces complexity by reducing the number of neurons and by feature extraction, which also significantly reduces the number of training parameters for images within a certain dimension [35], [36]. A CNN can use an original image directly as input to the network, and this has resulted in its wide application in the field of deep learning. The convolutional neural network has Input, Convolution, Pooling, Activation, Fully Connected and Output Layers [37]. FIGURE 6 is a diagram of the convolutional neural network designed for this study. It was used as a classification model in the AIDM. The convolutional layer conducts convolution computations using masks of different sizes for spatial filtering, to enhance and extract image features. In general, the pooling layer is either a Max Pooling Layer [38] or an Average Pooling Layer [39]. The activation layer performs nonlinear conversion through the activation function to obtain the features. Commonly seen activation functions include the Sigmoid (24) and ReLU (25):

$$f(x) = \frac{1}{1 + e^{-x}} \tag{24}$$

$$f(x) = \max(0, x) \tag{25}$$

However, the Sigmoid function has a vanishing gradient problem. In contrast, the output of ReLU is relatively stable [40]. Therefore, to retain the most significant features of the images, prevent vanishing gradients and increase the computation speed of the network, max pooling and ReLU were used for parameter settings.

In FIGURE 8 (a-c) it can be seen that the feature band of slight wear was 3272 to 3344 Hz, that of moderate wear was 4292 to 4395 Hz, and that of severe wear was 3105 to 3223 Hz.
A Bandpass filter was used to pass the 2280 to 4500 Hz band of all signals to facilitate subsequent feature extraction and classifier identification. Feature extraction was done after Bandpass filtering: ApEn computation was used to calculate the ratio of similar numbers to the total number and to reduce dimensionality. FIGURE 9 is the feature signal map after feature extraction with ApEn. Although the difference between slight wear and moderate wear can be seen with FIR and ApEn, the curve traces of moderate and severe wear are very similar; these two states cannot be separated, and this may cause classifier error.
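The identification of a feature band from the frequency domain response can be sketched by locating the largest spectral component of a signal. For brevity the example uses a direct discrete Fourier transform rather than an FFT, and a synthetic 3300 Hz tone (hypothetical, chosen to lie inside the 2280 to 4500 Hz filtering range) in place of a measured vibration signal.

```python
import cmath
import math


def dominant_frequency(signal, fs_hz):
    """Frequency (Hz) of the largest DFT magnitude below the Nyquist frequency."""
    n = len(signal)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * fs_hz / n


FS_HZ = 10_000      # sampling frequency used in the study
N_SAMPLES = 500     # gives a frequency resolution of 20 Hz
synthetic = [math.sin(2 * math.pi * 3300 * t / FS_HZ) for t in range(N_SAMPLES)]
peak_hz = dominant_frequency(synthetic, FS_HZ)
```

With measured signals the same search, restricted to the pass band, would indicate where the spectral energy of each wear state is concentrated.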

B. NONLINEAR FEATURE EXTRACTION OF THE FRACTIONAL ORDER CHEN-LEE CHAOTIC SYSTEM
Nonlinear feature mapping of the time domain signals of the three states of the tool in FIGURE 7 (a-c) was conducted, and the sensitivity of the chaotic system allowed the different states to be separated by slight changes of signal intensity. The experimental results were compared and the y-axis coordinates of the chaotic centroid and the chaotic dynamic error map were taken as the physical quantities for the analysis of tool wear. The distributions of the feature signals for integer order and fractional order were also compared, see FIGURE 10. As the feature signal distribution maps from integer order 1 to fractional order 0.2 showed insignificant change, the maps for integer order 1 and fractional orders 0.9, 0.7, 0.5, 0.3 and 0.1 were all tried.
The feature distributions in the three states were compared, see FIGURE 10. When the order was 0.1, the feature distributions of the three different wear states were independent and the size and distribution interval of each could be clearly identified. This was not the case with the other orders. Therefore, the fractional order 0.1 was used as the parameter for subsequent experiments. The chaotic dynamic error map and centroids were substituted into the classifier for subsequent training and testing. The dynamic error map of the chaotic system with fractional order 0.1 for the three cutter wear states is shown in FIGURE 11(a-c). The difference between the three wear states after nonlinear mapping through the fractional order Chen-Lee chaotic system can be clearly seen and the wear state of a milling cutter can be easily determined.

C. TRAINING AND IDENTIFICATION, BPNN AND SVM
There are no fixed specifications or basis for the internal parameter settings of the Back Propagation Neural Network, so fixed settings were used for the training parameters, the number of neurons, and the amount of training data. In this study, the training function used in BPNN for AIDM was the Scaled Conjugate Gradient algorithm. The GPU was also used for auxiliary computations and the number of neurons was set to 1000. As can be seen in TABLE 2, the first set of experimental signals comprised 300 pre-collected milling data signals from end mill A for model training and 300 real-time data signals from end mill B, which were substituted into the trained model for testing. Using the same parameters, the second set of experimental signals used 600 pre-collected milling data signals from end mills A and B for model training. The 300 real-time data signals collected from end mill C were used as the testing set and substituted into the trained model for testing. The features obtained from the computation of ApEn and the y-axis coordinates of the chaotic centroids with the fractional order 0.1 were substituted into the BPNN for training and testing using the same parameters and data. The difference in identification rate between the two feature extraction methods was compared and the results can be seen in TABLE 2.
The experimental parameters for the SVM were the same as for the BPNN. The pre-collected milling signals were used as the training set for model training, and the signals collected in real time were used as the testing set. The features obtained from the computation of ApEn and the y-axis coordinates of the chaotic centroids with the fractional order 0.1 were substituted for training and testing just as for the BPNN. The difference in identification rate between these two feature extraction methods was compared, see TABLE 3.

D. TRAINING AND IDENTIFICATION OF THE CONVOLUTIONAL NEURAL NETWORK
Matlab 2019a was used in this study for verification, and the Deep Network Designer was used to implement a self-made convolutional neural network as the verification classifier, see FIGURE 6. The dynamic error maps of the chaotic system with fractional order 0.1 for the three states of the tool were substituted into the convolutional neural network for training and testing. In this section the training and testing sets from the second set of signals are used as an example. TABLE 4 shows the number of samples in the CNN training set. The training set samples were substituted into the convolutional neural network for training, and the classification identification rate and the number of iterations are shown in FIGURE 12. In FIGURE 12 it can be seen that the accuracy rate reaches 100% after 60 iterations and the loss function is 0. This indicates that the model has been completely trained. The images to be tested were substituted into the trained model, and the number of image samples in the second CNN testing set is shown in TABLE 5. The results of substituting the testing set images from the first and second sets of signals into the convolutional neural network model are shown in TABLE 6, presented as the number of test samples and the number of correct identifications. The identification rates were 96.33% and 89.33% respectively.
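The building blocks of the convolutional neural network used here, namely convolution with a mask, the ReLU and Sigmoid activations and max pooling, can be sketched in a few lines. The mask, the 4 x 4 input arrays and the pooling size are hypothetical values chosen for illustration; they are not the layers of the network in FIGURE 6, and the chaotic dynamic error maps themselves are not reproduced.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def convolve_valid(image, mask):
    """Slide the mask over the image without padding (as done in CNN layers)."""
    rows = len(image) - len(mask) + 1
    cols = len(image[0]) - len(mask[0]) + 1
    return [
        [
            sum(mask[a][b] * image[i + a][j + b]
                for a in range(len(mask)) for b in range(len(mask[0])))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 1, 0, 0],
         [1, 0, 1, 2]]
mask = [[1, 0],
        [0, -1]]  # hypothetical diagonal-difference mask

activated = [[relu(v) for v in row] for row in convolve_valid(image, mask)]
pooled = max_pool([[1, 3, 2, 0],
                   [4, 2, 1, 1],
                   [0, 1, 5, 2],
                   [2, 2, 3, 4]])

# Slope of the Sigmoid at a large input: s * (1 - s) becomes very small,
# whereas the slope of ReLU stays at 1 for any positive input.
sigmoid_slope_at_10 = sigmoid(10.0) * (1.0 - sigmoid(10.0))
```

The small Sigmoid slope at large inputs is the vanishing gradient effect that motivated the choice of ReLU, and max pooling keeps only the strongest response in each window.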

VI. CONCLUSION
In this study the AIDM framework was used to investigate the wear of end milling cutters used in production. Two different signal preprocessing approaches were used to carry out feature extraction and subsequent classifier identification. The first approach used FIR filtering with ApEn. Although this approach could successfully detect differences between the states of slight and moderate wear, the results for the wear states were not independent of one another, and this could cause errors in classification. The second approach used a fractional order Chen-Lee chaotic system. The characteristic high sensitivity of a chaotic system to small changes of input proved advantageous. Feature mapping was done and the chaotic dynamic error maps and the y-axis coordinates of the chaotic centroids were used as identification features. The features extracted with these two approaches were used in the classifiers for subsequent identification. The results showed that the AIDM designed in this study had an identification rate of 98.66% with the BPNN classifier for the first signal test set. For the second test set the identification rate was 89.33% using the CNN classifier. In both cases the fractional order Chen-Lee chaotic system was used for feature extraction. The fractional order Chen-Lee chaotic system in the AIDM designed in this article gave the best and most effective feature extraction. The fractional order Chen-Lee chaotic system combined with the convolutional neural network also showed good results with the self-designed AIDM structure. The identification rates achieved with the two signal test sets were more than 89%. The results of the experiment also showed that the AIDM method could provide the most suitable classification model as well as excellent signal preprocessing.