Tool Wear Prediction Based on Multi-Information Fusion and Genetic Algorithm-Optimized Gaussian Process Regression in Milling

Tool wear condition monitoring plays a crucial role in intelligent manufacturing systems to enhance machining quality and efficiency. The indirect methods employ various sensor signals to monitor tool wear condition, attracting wide attention in industrial applications. Multi-information fusion technologies can promote tool wear monitoring results to be more accurate and reliable. For improving the prediction accuracy and ensuring the reliability of the indirect methods, this study proposes a tool wear prediction method based on multi-information fusion and genetic algorithm (GA)-optimized Gaussian process regression (GPR). First, wavelet packet denoising (WPD)-based signal processing is adopted to suppress the noise interference of multisensor signals. Then, kernel principal component analysis (KPCA)-based dimension reduction is employed to mine the most sensitive features to flank wear from candidate multidomain features. Next, a fusion model of GPR and GA optimization is designed to establish a nonlinear mapping relationship between sensitive characteristics and flank wear width. Finally, performance evaluations under three sets of milling tests are carried out to validate the effectiveness of the proposed method. Experimental results indicate that the proposed method can lower prediction error and uncertainty of flank wear width compared with other intelligent approaches, promoting a successful application of indirect monitoring methods in milling.


I. INTRODUCTION
Numerical control milling is frequently used in various industrial fields, such as the automotive [1] and aerospace [2] industries. The machining processes involve continuous material removal, which increases tool wear and may even result in cutting tool failure [3]. Tool wear condition directly affects the surface quality of finished workpieces [4]. Furthermore, according to relevant statistics [5], cutting tool failures account for about 20% of the machine downtime, and 3%-12% of machining time is lost on tool replacement. Therefore, tool wear monitoring has become increasingly critical to enhance the workpiece surface quality, reduce maintenance downtime, and improve production efficiency.
Many studies have been conducted over the years to advance tool wear monitoring techniques, which include direct and indirect measurement methods [6]. By measuring direct indicators (such as flank wear width or area [7]) with a microscope [8] or camera [9], the direct approach can reach high measurement precision. However, it is often performed offline or is disturbed by cutting chips and coolant fluids, which limits its use in industrial applications [10]. The indirect approach mines wear information from real-time sensor signals (such as cutting force, vibration, acoustic emission (AE), spindle motor current or power, and cutting temperature) [11], which avoids interrupting continuous cutting processes. Therefore, the indirect method is more suitable for practical application and has attracted wide attention and research.

A. Limits of Prior Arts
Recently, due to the complementarity and fault tolerance of different signals, multisensor fusion approaches enabled by artificial intelligence technologies have been widely applied in indirect tool wear monitoring [12], [13]. Among them, artificial neural network (ANN)-based methods (especially deep learning-based ones) have achieved outstanding performance in tool wear monitoring, benefiting from their powerful feature mining and data mapping abilities. However, their performance relies heavily on a large number of available samples with wear labels and on a similar data distribution between testing and training samples [14]. Because of data collection costs and machining parameter variation, these two requirements are difficult to meet in practical applications, and the performance of these methods then deteriorates sharply. Moreover, these networks are generally regarded as black boxes and are trained on an empirical risk minimization principle, so they easily fall into a local minimum, especially under small samples [15].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
As another of the most frequently used algorithms in tool wear monitoring, support vector regression (SVR) has clear interpretability and takes the structural risk minimization principle as its optimization goal [16]. It holds excellent fitting and generalization abilities under small samples. However, like ANN-based approaches, SVR-based methods cannot directly provide an uncertainty estimate of predicted results [17]. Likewise, other intelligent algorithms, such as least squares SVR (LSSVR) and random forest (RF), cannot readily provide uncertainties for their predicted results in indirect tool wear monitoring.

B. Research Motivation
The uncertainty analysis of predicted results is crucial to improve the stability and reliability of indirect methods. As an intelligent decision-making algorithm solved by Bayesian inference, Gaussian process regression (GPR) can provide uncertainty quantification of predicted results, unlike other algorithms such as ANN, SVR, and LSSVR [17]. In addition, GPR adapts better to complex problems of high dimension and small samples, benefitting from its powerful nonlinear fitting ability and flexible parameter-solving approach [18]. Thus, GPR can monitor tool wear status more effectively and reliably under small samples. However, GPR is generally trained by a conjugate gradient algorithm, which makes its performance depend on the initial values and leaves it prone to falling into a local optimum [19].
As an effective global optimization algorithm, the genetic algorithm (GA) is inspired by the natural selection and evolution mechanism [20]. Since the optimizing process does not depend on gradient computations, it has strong robustness and global search ability [21], which can effectively optimize the hyperparameters of GPR. However, the GA-optimized GPR model is still under development in tool wear monitoring, especially when integrated into multisensor information fusion methods under small samples and multiple milling parameters.

C. Main Contribution
To further improve the accuracy and reliability of indirect methods under small samples, this study presents a tool wear prediction method based on multi-information fusion and GA-optimized GPR in milling operations. The major contributions are summarized as follows.
1) An indirect tool wear prediction method is developed to estimate flank wear width using multisensor signals, which increases the monitoring accuracy of tool wear conditions and reduces the prediction uncertainty of the indirect approach.
2) A fusion model based on GPR and GA optimization is designed to establish a nonlinear mapping between flank wear width and its sensitive multidomain features. It can achieve lower prediction error and provide uncertainty quantification of the predicted results.
3) Milling tests under multiple operating parameters are performed to verify the effectiveness of the proposed method, which can obtain better performance than other advanced approaches, promoting a successful practice of the indirect measurement technique under small samples.
The rest of this article is organized as follows. Section II reviews related works about indirect tool wear monitoring. The experimental setup, data acquisition, and the proposed methodology are described in Section III. Section IV elaborates on and discusses model performances and comparative results. The conclusion is drawn in Section V.

II. LITERATURE REVIEW
The indirect approach builds an intelligent prediction model for tool wear monitoring based on massive data historically collected from various sensors and then makes decisions upon the real-time signals online [22]. As shown in Fig. 1, it mainly consists of data acquisition, signal preprocessing, feature extraction and selection, feature fusion, and predictive model construction. It has received a lot of attention, and many achievements have emerged over the years.
In terms of data acquisition, cutting force [23], vibration [24], AE [25], motor current [26] or power [27], and cutting temperature [28] are the most frequent signals used to monitor tool wear. Recently, taking advantage of the complementarity and fault tolerance of various signals, multisensor signal fusion can comprehensively monitor tool wear information. It makes monitoring results more reliable and accurate and has been widely used in tool wear condition monitoring [13], [29]. Some relevant studies about tool wear prediction using multisensor signals are listed in Table I. It can be found that a combination of cutting force and vibration signals is one of the most common multisensor fusion strategies used for tool wear monitoring in machining processes. In addition, the sensitivity of cutting force and vibration signal to tool wear is different in the x-, y-, and z-directions [30]. Therefore, this study collects 3-D cutting forces and vibration signals to predict the flank wear width of milling cutters.
In terms of signal preprocessing, Kalman filtering [51], median filtering [52], bandpass filtering [53], and Wiener filtering [54] are usually used to alleviate environmental noise. Utilizing various basis functions and subdivided frequency ranges, wavelet packet denoising (WPD) has an advantage in effectively suppressing time-varying noise, thereby enhancing the signal-to-noise ratio of collected signals [55], [56]. Therefore, WPD-based signal preprocessing is employed in this study to reduce the interference of environmental noise.
In terms of feature extraction and selection, time, frequency, and wavelet domain (WD) features are first extracted to decrease the redundant information of multisensor signals. Generally, time domain (TD) features include maximum, peak-to-peak, variance, root mean square, skewness, kurtosis, and so on [34]. The frequency spectrum and its statistics are often utilized in the frequency domain (FD) [35]. Also, wavelet energy is frequently adopted in the time-frequency domain [36]. In this study, considering the complementarity of different domain features, multidomain features are extracted from multisensor signals. To further reduce the redundant information contained in initial features, various dimension reduction techniques have been applied for feature selection and fusion, such as kernel principal component analysis (KPCA) [30], [34], [57], minimum redundancy maximum relevance (MRMR) [58], and locally linear embedding (LLE) [59]. Among them, KPCA can effectively fuse principal components from numerous features, and it is a popular technique for nonlinear dimensionality reduction in tool wear monitoring [30], [34], [57]. Thus, in this study, the KPCA algorithm is adopted to fuse candidate features and obtain the most sensitive characteristics related to tool wear.
In terms of intelligent prediction models, ANN, SVR, LSSVR, multiple linear regression (MLR), RF, and GPR are commonly adopted to predict the flank wear of cutting tools, as listed in Table II. Among them, ANN is one of the most frequently used intelligent models. Gao et al. [39] employed a gated recurrent unit for multisensor fusion and lowered the tool wear prediction error. He et al. [41] used stacked sparse autoencoders to improve tool wear prediction performance. Liu et al. [42] and Xu et al. [43] developed … [45] proposed an ANN-based multisensor fusion model to predict tool wear. Feng et al. [49] introduced an attention mechanism to the convolutional neural network and improved the monitoring accuracy of tool wear using multisensor features. Although ANN-based methods (including deep learning methods) have achieved excellent performance in tool wear monitoring, their monitoring results depend heavily on two requirements. One is that a large number of training samples with wear labels are available [60]. The other is that testing samples obey the same data distribution as training samples [61]. However, these requirements are hard to meet in industrial applications due to the acquisition and labeling costs of massive data and the parameter variations of machining operations [14]. Moreover, ANN-based models are generally trained by an empirical risk minimization principle and easily fall into a local minimum under small samples [15]. In addition, taking advantage of clear interpretability and the structural risk minimization principle, SVR and its variants are frequently used algorithms in tool wear monitoring [30], [34]. For example, Benkedjouh et al. [62] used SVR to predict the life of cutting tools. Kong et al. [63] applied v-SVR to learn the correlation between flank wear width and fused features. Taking the squared error instead of the tube error as the objective function, LSSVR can reduce the computational burden and increase modeling accuracy. Zhang et al. [66] applied LSSVR to predict the flank wear width of milling cutters and improved the prediction accuracy. However, these methods cannot give uncertainty quantification of tool wear monitoring results. Wu et al. [69] and Bustillo et al. [70] utilized RF for wear-modeling processes and obtained higher prediction accuracy than ANN and SVR. Although these methods hold strong fitting and generalization abilities for tool wear monitoring under small samples, they still cannot easily provide an uncertainty estimate of predicted results [17]. Yet uncertainty analysis is vital to ensure reliability and enhance the applicability of indirect monitoring methods.
As a Bayesian inference-based intelligent algorithm, GPR can offer an uncertainty quantification to analyze the reliability of predicted results [71], [73]. Kong et al. [17] used GPR to obtain better accuracy than ANN and SVR in tool wear prediction and analyzed a confidence interval (CI) under intrinsic parameters. Zhang et al. [19] and Li et al. [71] employed GPR to predict tool wear and improved the prediction accuracy and reliability. Although these methods have better adaptability to deal with tool wear monitoring problems of high dimensions and small samples, their performances are easily affected by the intrinsic parameters. In addition, GPR is generally trained by a conjugate gradient algorithm, making it hard to obtain the global optimum [19], [74].
As a probabilistic search algorithm inspired by the natural selection and evolution mechanism, GA has strong robustness and global search ability [20], [21], which provides a potential solution to improve the performance of GPR. However, the GA-enabled GPR model is rarely investigated for monitoring tool flank wear in milling operations. Furthermore, current studies have scarcely combined it with the abovementioned WPD-based multisensor signal preprocessing and KPCA-based multidomain feature fusion strategies. Therefore, this study presents an indirect tool wear prediction method based on multi-information fusion and GA-optimized GPR in milling operations, which combines these methods to improve the prediction performance and quantitatively analyze the reliability.

A. Experimental Setup and Data Acquisition
Milling experiments on die steel with high-speed steel cutters are carried out to validate the presented tool wear prediction method. The experimental setup and data acquisition process are shown in Fig. 2. During real-time data acquisition, a Kistler 9347C dynamometer is applied to monitor cutting forces, and a Kistler 8763B accelerometer is used to monitor cutting vibrations. Meanwhile, an NI-DAQ system is adopted to collect multisensor signals at a sampling frequency of 2500 Hz. After each milling operation, a Hirox KH-7700 optical microscope is used to measure the flank wear width of the milling cutters offline.
In addition, three sets of milling tests under different machining conditions are carried out to verify the generalization ability of the proposed tool wear prediction method. The milling parameters are shown in Table III. Each set of milling tests uses 15 milling cutters. Further details of the experimental setup and data acquisition are available in the previous study [30].

B. Proposed Tool Wear Prediction Method
To further improve the prediction accuracy of tool wear and provide an uncertainty estimation of the indirect tool wear monitoring methods in milling operations, this study presents a tool wear prediction method based on multi-information fusion and GA-optimized GPR. Its overall flowchart is shown in Fig. 3.
As shown in Fig. 3, the proposed tool wear prediction method consists of four modules: 1) WPD-based sensor signal preprocessing; 2) KPCA-based multidomain feature fusion; 3) GA-GPR-based predictive model building; and 4) performance evaluation and wear prediction. Each module is elaborated on in the following.

C. WPD-Based Sensor Signal Preprocessing
During the milling operations, the raw collected signals usually include noise information, and thus, signal preprocessing is crucial to enhance the signal-to-noise ratio of the monitoring signals. As one of the most effective time-frequency analysis approaches, WPD can be adopted to suppress the noise portion and enhance the characteristic component of the monitoring signals [75]. Thus, the WPD-based sensor signal preprocessing is performed on the collected signals.
As shown in Fig. 3, the cutting signals are first segmented from the raw collected signals to eliminate the interference of the noncutting phase. Then, for the segmented cutting signals, outliers are detected and removed to control unexpected disturbances from the data acquisition system. An outlier is defined as a sampling point that lies outside the range of the mean plus or minus three times the standard deviation during a sampling period. Each outlier is replaced by the mean of the 100 sampling points adjacent to it, considering the time-varying characteristics of the collected signals. Finally, the WPD approach is adopted to suppress the environmental noise contained in the cutting signals. Its main steps are summarized as follows.
Step 1-1: The four-level wavelet packet decomposition is carried out on the input time-varying and nonstationary cutting signals after selecting a "db8" wavelet with orthogonality and discreteness.
Step 1-2: The optimal wavelet tree is calculated to determine the optimal wavelet packet basis according to the given entropy criterion.
Step 1-3: The soft thresholding is selected to quantize each decomposed coefficient, considering the continuity and smoothness of the output signals.
Step 1-4: The denoised signal is obtained by wavelet packet reconstruction using the processed coefficients.
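Steps 1-1 to 1-4 can be illustrated with a minimal NumPy sketch. It is not the configuration used in this study: for brevity it swaps the "db8" wavelet for the Haar wavelet, keeps the full four-level tree instead of selecting an entropy-based best basis (Step 1-2), and assumes a universal threshold estimated from the first-level detail coefficients. The function names (`haar_split`, `haar_merge`, `wp_denoise`) are hypothetical.

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: (low-pass, high-pass) halves of a node."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_merge(a, d):
    """Inverse of haar_split."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def soft_threshold(c, thr):
    """Shrink coefficients toward zero by thr (Step 1-3)."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

def wp_denoise(signal, levels=4, thr=None):
    x = np.asarray(signal, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    if thr is None:
        # Assumed rule: universal threshold with a robust noise estimate.
        detail = haar_split(x)[1]
        sigma = np.median(np.abs(detail)) / 0.6745
        thr = sigma * np.sqrt(2.0 * np.log(len(x)))
    # Step 1-1: full wavelet packet tree (every node is split at each level).
    nodes = [x]
    for _ in range(levels):
        nodes = [part for node in nodes for part in haar_split(node)]
    # Step 1-3: keep the lowest-frequency node, soft-threshold the others.
    nodes = [nodes[0]] + [soft_threshold(c, thr) for c in nodes[1:]]
    # Step 1-4: reconstruct by merging sibling nodes level by level.
    for _ in range(levels):
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

# Synthetic example: a slow sine wave corrupted by white noise.
rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0
clean = np.sin(2.0 * np.pi * 2.0 * t)
noisy = clean + 0.2 * rng.standard_normal(t.size)
denoised = wp_denoise(noisy)
```

With a zero threshold the routine returns the input unchanged, which is a quick check that decomposition and reconstruction are consistent.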
Finally, after performing sensor signal preprocessing based on WPD, the preprocessed signals are utilized to predict the flank wear width of the milling cutter. Correspondingly, taking vibration signals in the z-direction collected from Group 2 as an example, its WPD-based signal preprocessing procedure is given in Fig. 4.
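The outlier rule described in this subsection (mean plus or minus three standard deviations, replacement by the mean of 100 adjacent points) could be sketched as follows. The window layout is an assumption: 50 points on each side of the outlier, with other flagged points excluded from the average. The function name `replace_outliers` and the synthetic signal are hypothetical.

```python
import numpy as np

def replace_outliers(x, k=3.0, n_adjacent=100):
    """Replace points outside mean +/- k*std by the mean of nearby samples."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    is_outlier = np.abs(x - mu) > k * sd
    cleaned = x.copy()
    half = n_adjacent // 2  # assumed: half of the neighbours on each side
    for i in np.flatnonzero(is_outlier):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        window = np.arange(lo, hi)
        neighbours = window[~is_outlier[lo:hi]]  # skip flagged samples
        if neighbours.size:
            cleaned[i] = x[neighbours].mean()
    return cleaned, is_outlier

# Synthetic example: a sine wave with one injected spike.
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2.0 * np.pi * 5.0 * t)
x[500] = 50.0
cleaned, mask = replace_outliers(x)
```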

D. KPCA-Based Multidomain Feature Fusion
For the preprocessed signals, TD, FD, and WD features are first extracted to reduce the information redundancy [34]. As listed in Table IV, maximum, peak-to-peak, variance, root mean square, skewness, and kurtosis are adopted in the TD. Spectral skewness and spectral kurtosis are picked in the FD. A wavelet energy feature is chosen in the wavelet (time-frequency) domain. Thus, nine features are extracted from each signal channel within one second, and 54 candidate features (nine features from each of the six channels) are obtained for each milling feed.
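As a hedged illustration of the TD part of this step, the sketch below computes the six time-domain features for one channel window. The exact definitions in Table IV are not reproduced here, so population (biased) moments are assumed for variance, skewness, and kurtosis; the function name `time_domain_features` is hypothetical. The FD and WD features are omitted because their definitions are not given in the text.

```python
import numpy as np

def time_domain_features(x):
    """Six TD features of one signal window (population moments assumed)."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    return {
        "maximum": x.max(),
        "peak_to_peak": x.max() - x.min(),
        "variance": x.var(),
        "rms": np.sqrt(np.mean(x ** 2)),
        "skewness": np.mean((x - mu) ** 3) / sd ** 3,
        "kurtosis": np.mean((x - mu) ** 4) / sd ** 4,
    }

# Toy window; a real one-second window at 2500 Hz would hold 2500 samples.
features = time_domain_features([1.0, 2.0, 3.0, 4.0, 5.0])

# Feature count: (6 TD + 2 FD + 1 WD) per channel, for three force
# channels and three vibration channels.
n_candidate = (6 + 2 + 1) * (3 + 3)
```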
Then, the candidate features $x_c$ extracted from the multisensor signals are normalized to eliminate the effects of the physical unit in each domain. In addition, an example of multidomain feature extraction and normalization is shown in Fig. 5. It can be seen that not all candidate features are obviously sensitive to the tool wear process. Therefore, feature selection is important to obtain the most principal features [34].
As a widely used nonlinear dimensionality reduction approach, the KPCA can effectively solve the most principal components without performing the explicit calculation of the nonlinear mapping [76]. Thus, this study employs it to fuse the most sensitive features and further reduce the dimension of the candidate features. The algorithm of KPCA is briefly introduced in the following.
Step 2-1: The kernel matrix K is computed according to the selected kernel function K(·), which is expressed as follows:
$$K_{ij} = K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j), \quad i, j = 1, 2, \ldots, n$$
where n represents the number of the input samples, $x_i$ and $x_j$ are the samples of input space, and $\varphi(\cdot)$ denotes the nonlinear mapping from input space to feature space.
Step 2-2: Centralization of the kernel matrix K is carried out as follows:
$$\hat{K} = K - IK - KI + IKI$$
where I denotes an n × n matrix whose elements are all 1/n.
Step 2-3: The eigenvalues of the centralized kernel matrix $\hat{K}$ are computed by solving the following expression:
$$n\lambda\alpha = \hat{K}\alpha$$
where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^{T}$ represents the eigenvectors and $n\lambda$ denotes the corresponding eigenvalues.
In particular, the above equation can be derived from
$$\lambda v = Cv, \quad C = \frac{1}{n}\sum_{i=1}^{n}\varphi(x_i)\varphi(x_i)^{T}, \quad v = \sum_{i=1}^{n}\alpha_i\varphi(x_i)$$
where C represents the covariance matrix of the mapping data and v denotes its eigenvector.
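Step 2-4: The projection of mapping data $\varphi(x_i)$ onto the eigenvectors v is implemented to obtain the principal component $x_i^{p}$ as follows:
$$x_i^{p} = v^{T}\varphi(x_i) = \sum_{j=1}^{n}\alpha_j K(x_j, x_i)$$
Note that the number of the principal components is determined by the accumulation variance percentage of the first m eigenvalues sorted in descending order, which is expressed as follows:
$$\frac{\sum_{k=1}^{m}\lambda_k}{\sum_{k=1}^{n}\lambda_k} \times 100\%$$
In addition, the radial basis function (RBF) is selected as the kernel function, and the expression is given as follows:
$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$$
where σ is the kernel width.
After finishing multidomain feature fusion based on KPCA, the principal features related to tool wear are obtained, which are utilized to establish a predictive model for the flank wear width of the milling cutter.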
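Steps 2-1 to 2-4 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the implementation used in this study: the RBF kernel is written with a width parameter `sigma`, the cumulative variance threshold `var_ratio=0.95` is an arbitrary illustrative value, and the input is random data standing in for the 54 normalized candidate features. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """RBF kernel matrix from pairwise squared Euclidean distances."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca(X, sigma=1.0, var_ratio=0.95):
    n = X.shape[0]
    K = rbf_kernel(X, sigma)                              # Step 2-1
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n    # Step 2-2
    Kc = (Kc + Kc.T) / 2.0            # remove round-off asymmetry
    eigval, eigvec = np.linalg.eigh(Kc)                   # Step 2-3
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1e-10 * eigval[0]  # drop numerically zero modes
    eigval, eigvec = eigval[keep], eigvec[:, keep]
    ratio = np.cumsum(eigval) / eigval.sum()
    m = min(int(np.searchsorted(ratio, var_ratio)) + 1, len(eigval))
    # Scale alpha so that each feature-space eigenvector v has unit norm.
    alpha = eigvec[:, :m] / np.sqrt(eigval[:m])
    Z = Kc @ alpha                                        # Step 2-4
    return Z, eigval[:m], ratio[:m]

# Random stand-in for 30 samples of 54 normalized candidate features.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 54))
Z, eigvals, ratio = kpca(X, sigma=np.sqrt(X.shape[1]), var_ratio=0.95)
```

The columns of `Z` are the fused principal features; they are mutually orthogonal and have zero mean because the kernel matrix is centered.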

E. GA-GPR-Based Predictive Model Building
1) GPR-Based Initial Model Building: As one of the most efficient regression methods, GPR has good adaptability and generalization ability to deal with complex nonlinear problems of high dimensions and small samples [17], [19]. Moreover, as a Bayesian inference-based algorithm, GPR can offer an uncertainty quantification to analyze the reliability of predicted results. Thus, this study adopts GPR to build the initial model of tool wear prediction. The details of its modeling are introduced as follows.
In general, GPR is a kernel-based nonparametric modeling approach built on the Gaussian distribution. For a given training dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ represents the ith input sample and $y_i$ denotes the corresponding observation value, the Gaussian process can be defined as follows:
$$f(x) \sim GP\left(m(x), k(x, x')\right)$$
where $m(x)$ and $k(x, x')$ are the mean function and kernel function, respectively, which are expressed as follows:
$$m(x) = E[f(x)], \quad k(x, x') = E\left[(f(x) - m(x))(f(x') - m(x'))\right]$$
Considering the noise in the data acquisition, a general GPR model can be given as follows:
$$y = f(x) + \varepsilon$$
where $\varepsilon \sim N(0, \sigma_n^{2})$ represents the independent and identically distributed Gaussian noise with a mean of 0 and a standard deviation of $\sigma_n$.
For any finite number of the observation values, y is also a Gaussian process, which can be expressed as
$$y \sim GP\left(m(x), k(x_i, x_j) + \sigma_n^{2}\delta_{ij}\right)$$
where $\delta_{ij}$ is the Kronecker delta function.
In general, GPR first learns a prior function using the training dataset X and then predicts a posterior function value $\hat{y}^*$ of the test dataset $X^*$. Taking a zero prior mean, the joint distribution of the observed values y and predicted values $\hat{y}^*$ can be expressed as
$$\begin{bmatrix} y \\ \hat{y}^* \end{bmatrix} \sim N\left(0, \begin{bmatrix} K(X, X) + \sigma_n^{2} I_n & K(X, X^*) \\ K(X^*, X) & K(X^*, X^*) \end{bmatrix}\right)$$
where $y = [y_1, y_2, \ldots, y_n]^{T}$, $X = [x_1, x_2, \ldots, x_n]$, and $I_n$ represents an n × n identity matrix. According to the Bayesian theory, the posterior distribution is calculated as
$$\hat{y}^* \mid X, y, X^* \sim N\left(m(\hat{y}^*), \mathrm{cov}(\hat{y}^*)\right)$$
where $m(\hat{y}^*)$ and $\mathrm{cov}(\hat{y}^*)$ are the mean and variance of $\hat{y}^*$, respectively. Their mathematical expressions are given as follows:
$$m(\hat{y}^*) = K(X^*, X)\left[K(X, X) + \sigma_n^{2} I_n\right]^{-1} y$$
$$\mathrm{cov}(\hat{y}^*) = K(X^*, X^*) - K(X^*, X)\left[K(X, X) + \sigma_n^{2} I_n\right]^{-1} K(X, X^*)$$
In practice, the performance of the GPR model is affected by the kernel function. The squared exponential function has been widely used in GPR modeling and has produced satisfactory prediction results [17], [19]. Therefore, it is selected as the kernel function of the GPR model in this study. Its expression is shown as follows:
$$k(x, x') = \sigma_s^{2}\exp\left(-\frac{\lVert x - x' \rVert^{2}}{2l^{2}}\right)$$
where $\sigma_s$ represents the standard deviation of the observation signal and l denotes the characteristic length scale. It can be seen that $\theta = \{l, \sigma_s, \sigma_n\}$ is the set of key hyperparameters in the GPR model. Generally, the GPR model is trained using the conjugate gradient algorithm, but this approach is sensitive to the initial value and has difficulty converging to the global optimum [74]. To address these limitations, this study adopts an optimization algorithm to solve the hyperparameters.
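The posterior mean and variance described above can be sketched in NumPy. The example assumes a zero prior mean, a squared exponential kernel with hyperparameters named `length`, `sigma_s`, and `sigma_n`, and fixed illustrative hyperparameter values on a synthetic one-dimensional dataset. The function names are hypothetical, and a Cholesky factorization replaces the explicit matrix inverse for numerical stability.

```python
import numpy as np

def se_kernel(A, B, length, sigma_s):
    """Squared exponential kernel between the rows of A and B."""
    d2 = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return sigma_s ** 2 * np.exp(-np.maximum(d2, 0.0) / (2.0 * length ** 2))

def gpr_predict(X, y, X_star, length, sigma_s, sigma_n):
    """Posterior mean and pointwise variance at the test inputs X_star."""
    Ky = se_kernel(X, X, length, sigma_s) + sigma_n ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # Ky^{-1} y
    Ks = se_kernel(X_star, X, length, sigma_s)           # K(X*, X)
    mean = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    var = sigma_s ** 2 - np.sum(V ** 2, axis=0)          # diag of cov
    return mean, var

# Synthetic data: 20 training points plus one test point far from the data.
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X[:, 0])
X_star = np.vstack([X, [[100.0]]])
mean, var = gpr_predict(X, y, X_star, length=1.0, sigma_s=1.0, sigma_n=0.1)

# Approximate 95% band: mean +/- 2 standard deviations.
half_width = 2.0 * np.sqrt(np.maximum(var, 0.0))
lower, upper = mean - half_width, mean + half_width
```

Far from the training data the prediction reverts to the prior (mean 0, variance `sigma_s**2`), which is why the confidence band widens where the model has seen no samples.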
2) GA-Based Parameter Optimization: GA is a global optimization probabilistic search algorithm inspired by the natural selection and evolution mechanism [20]. Since the optimization process does not depend on the gradient, it has strong robustness and global search ability [21]. Thus, the hyperparameters of the GPR model are optimized using GA. As shown in Fig. 3, the main steps of the GA optimization are briefly explained as follows.
Step 3-1 (Objective Function Construction): In this study, the objective function is defined over the hyperparameters θ within the search interval $[\theta_d, \theta_u]$, where $\theta_d$ and $\theta_u$ are the lower and upper bounds of the search interval for θ, respectively.
Step 3-2 (Parameters Initialization): The parameters to be initialized are the maximum evolutionary epoch T, maximum population size M, individual length L, search interval $[\theta_d, \theta_u]$, and initial population P(0). The tth epoch population P(t) is defined as
$$P(t) = \{X \mid X(q) = a_1 a_2 \cdots a_L,\; q = 1, 2, \ldots, M\} \quad (18)$$
where X(q) represents the qth individual of the tth epoch population P(t) and $a_1 a_2 \cdots a_L$ denotes the coded value of the individual X(q).
Step 3-3 (Individual Evaluation): Calculate the fitness value of each individual according to the fitness function, which is the same as the objective function.
Step 3-4 (Selection Operation): Based on the fitness evaluation of individuals, the selection operation passes the fitter individuals on to the next generation.
Step 3-5 (Crossover Operation): The crossover operator reorganizes the optimized individuals and gradually abandons the relatively inferior ones.
Step 3-6 (Mutation Operation): The individual value is mutated by changing some gene values of individual coding in the population. After implementing selection, crossover, and mutation operations, the updated population P(t + 1) can be obtained.
Step 3-7 (Termination Condition Judgment): After completing T iterations, the individual with the best fitness value obtained in the process of evolution is taken as the optimal solution of hyperparameters in the GPR model.
Finally, the optimal parameters of the GPR model are obtained using GA optimization to establish a predictive model for the flank wear width of milling cutters.
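Steps 3-1 to 3-7 can be sketched as a small real-coded GA. Several choices below are assumptions made for illustration and are not taken from this study: the objective is the negative log marginal likelihood of the GPR model, individuals are real-valued vectors rather than coded strings, selection is a binary tournament, crossover is arithmetic, mutation resamples a gene uniformly within the bounds, and the best individual is carried over unchanged (elitism). All function names, bounds, and rates are hypothetical.

```python
import numpy as np

def neg_log_marginal_likelihood(theta, X, y):
    """Assumed GA objective for theta = (length, sigma_s, sigma_n)."""
    length, sigma_s, sigma_n = theta
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Ky = (sigma_s ** 2 * np.exp(-d2 / (2.0 * length ** 2))
          + (sigma_n ** 2 + 1e-8) * np.eye(len(X)))
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(X) * np.log(2.0 * np.pi))

def genetic_minimize(objective, lower, upper, pop_size=30, epochs=40,
                     p_cross=0.8, p_mut=0.1, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pop = rng.uniform(lower, upper, size=(pop_size, len(lower)))  # P(0)
    history = []
    for _ in range(epochs):
        fit = np.array([objective(ind) for ind in pop])   # evaluation
        elite = pop[np.argmin(fit)].copy()
        history.append(fit.min())
        # Selection: binary tournament, lower objective wins.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fit[a] < fit[b])[:, None], pop[a], pop[b])
        # Crossover: convex blend with the neighbouring parent.
        mates = np.roll(parents, 1, axis=0)
        w = rng.uniform(size=(pop_size, 1))
        cross = rng.uniform(size=(pop_size, 1)) < p_cross
        children = np.where(cross, w * parents + (1.0 - w) * mates, parents)
        # Mutation: resample selected genes inside the search interval.
        mutate = rng.uniform(size=children.shape) < p_mut
        resampled = rng.uniform(lower, upper, size=children.shape)
        children = np.where(mutate, resampled, children)
        children[0] = elite                               # elitism
        pop = children                                    # P(t + 1)
    fit = np.array([objective(ind) for ind in pop])
    history.append(fit.min())
    return pop[np.argmin(fit)], history

# Synthetic regression data standing in for fused features and wear labels.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 5.0, 25)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(25)
lower = np.array([0.1, 0.1, 0.01])   # assumed bounds on (l, sigma_s, sigma_n)
upper = np.array([5.0, 3.0, 1.0])
theta_best, history = genetic_minimize(
    lambda th: neg_log_marginal_likelihood(th, X, y), lower, upper)
```

Because the elite individual survives each epoch, the best objective value in `history` never increases, and every individual stays inside the search interval since both crossover and mutation respect the bounds.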

F. Performance Evaluation and Wear Prediction
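To validate the performance and effectiveness of the proposed tool wear prediction method, four evaluation criteria are utilized in this study, which include the Pearson correlation coefficient (PCC), mean absolute percentage error (MAPE), mean absolute error (MAE), and root-mean-squared error (RMSE) [34]. The expressions are given as follows:
$$\mathrm{PCC} = \frac{\sum_{i=1}^{N}(y_i^* - \mu_{y})(\hat{y}_i^* - \mu_{\hat{y}})}{\sqrt{\sum_{i=1}^{N}(y_i^* - \mu_{y})^{2}}\sqrt{\sum_{i=1}^{N}(\hat{y}_i^* - \mu_{\hat{y}})^{2}}}$$
$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i^* - \hat{y}_i^*}{y_i^*}\right| \times 100\%$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i^* - \hat{y}_i^*\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^* - \hat{y}_i^*\right)^{2}}$$
where $y_i^*$ represents the flank wear width measured by an optical microscope, $\hat{y}_i^*$ denotes the predicted flank wear width using the proposed tool wear prediction method, N is the number of testing samples, and $\mu_{y}$ and $\mu_{\hat{y}}$ are the means of the measured and predicted values, respectively. In addition, a 95% CI of predicted results from the GA-GPR model is given by $\bar{y}^* \pm 2 \times (\mathrm{cov}(\hat{y}^*))^{1/2}$. To quantitatively analyze the reliability and stability of the proposed tool wear prediction method, CI average width (CIAW) and CI standard deviation (CISD) are utilized [17], which are expressed as follows:
$$\mathrm{CIAW} = \frac{1}{N}\sum_{i=1}^{N} 4 \times (\mathrm{cov}(\hat{y}_i^*))^{1/2}$$
$$\mathrm{CISD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(4 \times (\mathrm{cov}(\hat{y}_i^*))^{1/2} - \mathrm{CIAW}\right)^{2}}$$
where $4 \times (\mathrm{cov}(\hat{y}_i^*))^{1/2}$ is the width of the 95% CI at the testing sample $x_i^*$. Among these evaluation criteria, the larger the PCC value and the smaller the MAE/RMSE/MAPE value, the higher the prediction accuracy. Besides, the lower the CIAW/CISD value, the more reliable and stable the predicted wear results. Finally, after implementing performance evaluation in prediction accuracy and reliability, the tool wear prediction model with excellent performance is obtained, which is utilized to predict the flank wear width using in-process multisensor signals.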
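The six criteria can be computed with a few lines of NumPy. The sketch below assumes the standard definitions of PCC, MAPE, MAE, and RMSE, takes the CI width at each test sample as four times the predictive standard deviation, and uses the population standard deviation for CISD (the normalization is an assumption). The function name `evaluate` and the numbers in the example are made up for illustration and are not experimental results.

```python
import numpy as np

def evaluate(y_true, y_pred, var_pred):
    """Accuracy (PCC, MAPE, MAE, RMSE) and reliability (CIAW, CISD)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    var_pred = np.asarray(var_pred, dtype=float)
    err = y_true - y_pred
    width = 4.0 * np.sqrt(var_pred)   # width of the 95% CI per sample
    return {
        "PCC": np.corrcoef(y_true, y_pred)[0, 1],
        "MAPE": 100.0 * np.mean(np.abs(err) / y_true),
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "CIAW": width.mean(),
        "CISD": width.std(),          # assumed: population std of widths
    }

# Made-up wear values and predictive variances, for illustration only.
scores = evaluate(y_true=[100.0, 200.0, 300.0, 400.0],
                  y_pred=[110.0, 190.0, 310.0, 390.0],
                  var_pred=[25.0, 25.0, 100.0, 100.0])
```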

IV. RESULTS AND DISCUSSION
A. Multi-Information Fusion Analysis
1) Multisensor Signal Fusion Analysis: First, comparative experiments of the proposed tool wear prediction method using different signals are conducted to analyze the necessity of multisensor signal fusion. The evaluation results under different machining parameters are shown in Fig. 6, where "All" denotes a combination of vibration signals and cutting forces in the x-, y-, and z-directions. In addition, the overall performances of tool wear prediction using different signals are summarized in Table V, where "All*" represents multisensor signals without WPD-based signal denoising. Note that quantitative results are given as "mean value ± standard deviation." As shown in Fig. 6 and Table V, the predicted results of tool flank wear differ across signals. The evaluation criteria (including MAPE, MAE, RMSE, and CISD) of predicted wear using cutting forces are lower than those using vibration signals, which indicates that cutting forces are more accurate in tool wear prediction than vibration signals. Besides, cutting forces or vibration signals in the y-direction obtain the lowest prediction error compared with other directions. However, all the single signals are significantly inferior to multisensor signal fusion in the performance of tool wear prediction. Benefitting from the complementarity and fault tolerance of different signals, multisensor signal fusion lowers the mean MAPE, MAE, RMSE, CIAW, and CISD by 60.78%, 60.37%, 58.92%, 77.01%, and 30.83%, respectively, compared with the best performance obtained by the vibration signals. Therefore, multisensor signal fusion is necessary and effective for tool wear prediction. In addition, predicted results under 95% CI of tool flank wear are shown in Fig. 7.
It can be seen from Fig. 7 that the 95% CI width of predicted results using vibration signals in the y-direction is significantly larger than that using cutting forces in the y-direction. Also, the 95% CI width of predicted results using multisensor signals is smaller than that using cutting forces in the y-direction. Besides, predicted wear using multisensor signals is closer to the measured wear than that using single signals. Concretely, as shown in Table V, compared with the best performance of the force signals, multisensor signals lower the mean MAPE, MAE, and RMSE by 40.61%, 40.51%, and 42.23%, respectively. They also reduce the mean CIAW and CISD by 49.36% and 47.38%, respectively. Therefore, these results show that multisensor signals are more accurate and reliable than a single signal in tool wear prediction.
In addition, as shown in Fig. 8, the WPD-based signal processing lowers the overall prediction error of tool flank wear when using multisensor signals. Correspondingly, predicted results under 95% CI of tool flank wear are drawn in Fig. 9. It can be found from Fig. 9 that the 95% CI width of predicted results without WPD is larger than that with WPD when using multisensor signals. Quantitatively, as shown in Table V, compared with no wavelet packet denoising (NWPD) for multisensor signals, WPD lowers the mean MAPE, MAE, RMSE, CIAW, and CISD by 14.26%, 13.75%, 12.58%, 19.07%, and 22.75%, respectively. Therefore, WPD-based signal processing can improve the accuracy and reliability of tool wear prediction using multisensor signals.
Fig. 8. Performance evaluation of tool wear prediction using WPD-based signal processing.
Fig. 9. Predicted results of tool flank wear using WPD-based signal processing (95% CI).
2) Multidomain Feature Fusion Analysis: Next, comparative experiments of tool wear prediction based on hybrid features extracted from different domains are carried out to … Table IV. The evaluation results of these experiments are shown in Fig. 10. Besides, quantitative performances of tool wear prediction are given under different criteria in Table VI. As shown in Fig. 10 and Table VI, a combination of two domains (i.e., TD + FD, TD + WD, and FD + WD) can lower the MAPE, MAE, and RMSE of predicted wear compared with a single domain (i.e., TD, FD, and WD), improving the prediction accuracy of tool flank wear. It can also lessen the CIAW and CISD of predicted wear, which enhances the reliability of tool wear prediction compared with single-domain features. Moreover, fused features from multiple domains (MD, i.e., TD + FD + WD) can further improve the accuracy and stability of predicted wear compared with combined features from two domains. Also, the 95% CI width of predicted wear using MD features is clearly smaller than that using single-domain features, as shown in Fig. 11. Specifically, compared with TD features, which obtain the best performance among the single domains, MD features lower the mean MAPE, MAE, RMSE, CIAW, and CISD by 31.06%, 30.34%, 24.63%, 20.07%, and 38.58%, respectively. In addition, compared with the two-domain combination that obtains the lowest error (i.e., TD + WD), MD features lower the mean MAPE, MAE, and RMSE by 15.69%, 16.19%, and 12.64%, respectively. They also reduce the mean CIAW and CISD by 12.06% and 28.47%, respectively, compared with the two-domain combination that obtains the best reliability (i.e., TD + FD). In conclusion, multidomain feature fusion not only reduces the prediction error of tool flank wear but also enhances the reliability and stability of tool wear prediction.

1) Comparisons With Other Signal Fusion Methods:
To evaluate its effectiveness and advancement, the proposed tool wear prediction method is compared with other schemes in each critical process. First, performance [...] Fig. 12, where Avg, Vot, and Min represent the average, voting, and minimum schemes of all the single input signals, respectively. Among them, the average scheme assigns the same weight to every single signal, the voting scheme assigns different weights according to the prediction error, and the minimum scheme uses the single signal with the lowest prediction error under each machining parameter. Besides, all accelerations (AA) denotes a fusion of all vibration acceleration signals in the x-, y-, and z-directions, and all forces (AF) represents a fusion of all cutting forces in the three directions. Correspondingly, the quantitative results are summarized in Table VII. It can be found from Fig. 12 and Table VII that the prediction error of the voting scheme is lower than that of the average scheme because it considers the sensitivity of the different single signals. The minimum scheme lowers the prediction error by using the most sensitive signal under each machining parameter. When combining cutting forces in the x-, y-, and z-directions, the AF strategy reduces the prediction error to a certain extent compared with the average, voting, and minimum schemes of a single signal. However, its prediction accuracy and reliability are restricted by the limited information of a single signal type, and its prediction error is still higher than that of the multisensor signal fusion used in the proposed method. Concretely, as shown in Table VII, multisensor signal fusion lowers the mean values of MAPE, MAE, and RMSE by 19.06%, 19.63%, and 22.11%, respectively. In addition, as shown in Fig. 13, the 95% CI of the wear predicted with the AF strategy is clearly wider than that obtained with multisensor signal fusion.
Overall, multisensor signal fusion lowers the mean values of CIAW and CISD by 48.78% and 36.06%, respectively. Therefore, the multisensor signal fusion employed in the proposed method can improve the prediction accuracy and reliability compared with other signal fusion schemes.
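The three decision-level schemes compared above (Avg, Vot, and Min) can be sketched as follows. This is only an illustrative reading of the verbal description: the signal names, the numbers, and in particular the inverse-error weighting used for the voting scheme are assumptions, since the exact weighting rule is not specified in this section.

```python
def average_scheme(predictions):
    """Avg: every single-signal prediction receives the same weight."""
    return sum(predictions.values()) / len(predictions)


def voting_scheme(predictions, errors):
    """Vot: weight each signal by the inverse of its prediction error.

    Assumption: inverse-error weights; the paper only states that the
    weights depend on the predicted error.
    """
    weights = {name: 1.0 / errors[name] for name in predictions}
    total = sum(weights.values())
    return sum(weights[name] * predictions[name] for name in predictions) / total


def minimum_scheme(predictions, errors):
    """Min: keep the prediction of the signal with the lowest error."""
    best_signal = min(predictions, key=lambda name: errors[name])
    return predictions[best_signal]


# Hypothetical single-signal wear predictions and validation errors.
predictions = {"Fx": 120.0, "Fy": 110.0, "Ax": 140.0}
errors = {"Fx": 10.0, "Fy": 5.0, "Ax": 20.0}

print(average_scheme(predictions))
print(voting_scheme(predictions, errors))
print(minimum_scheme(predictions, errors))
```

All three schemes combine predictions made separately from each signal, whereas the AA, AF, and multisensor strategies fuse the signals before prediction.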
2) Comparisons With Other Feature Fusion Methods: Then, to evaluate the effectiveness of multidomain feature fusion, the KPCA technique is compared with other feature fusion strategies, namely the no feature selection (NFS) scheme, MRMR, and LLE. These methods are applied to the presented GA-GPR prediction model using multisensor signals. The performance comparison of these methods is shown in Fig. 14. Correspondingly, the overall results are quantitatively listed in Table VIII. As shown in Fig. 14 and Table VIII, the feature selection strategies based on the MRMR and LLE techniques achieve a lower prediction error (in terms of MAPE, MAE, and RMSE) than the NFS scheme, which uses all candidate multidomain features. However, these two techniques yield a larger CIAW of the predicted wear than the NFS scheme, which reduces the reliability of tool wear prediction. In contrast, the KPCA-based scheme both reduces the prediction error and improves the reliability of the predicted results, as shown in Table VIII and Fig. 15. Concretely, compared with the NFS, MRMR, and LLE schemes, the KPCA-based strategy lowers the mean value of MAE by 13.60%, 4.56%, and 4.68%, and the mean value of RMSE by 17.44%, 2.04%, and 6.14%, respectively. Furthermore, compared with these schemes, the KPCA-based scheme lowers the mean value of CIAW by 30.14%, 39.66%, and 40.04%, and the mean value of CISD by 56.91%, 41.60%, and 27.86%, respectively. Consequently, the KPCA-based multidomain feature fusion scheme can reduce the prediction error of tool flank wear and enhance the reliability of tool wear prediction.
3) Comparisons With Other Prediction Methods: Next, to demonstrate the effectiveness of the predictive model, the presented GA-GPR model is compared with other intelligent models, including MLR, BP, RBF, SVR, LSSVR, and RF. These models are also optimized by GA. Besides, all the compared models adopt the same WPD-based signal preprocessing and KPCA-based feature fusion scheme as the GA-GPR model. The performance evaluations under different machining parameters are shown in Fig. 16, and the overall performances under different evaluation criteria are quantitatively summarized in Table IX. It can be found from Fig. 16 and Table IX that the GA-GPR model achieves the best tool wear prediction performance among the compared methods. Compared with the GA-LSSVR model, which obtains the lowest error among the other compared methods, the GA-GPR model lowers the mean values of MAPE, MAE, and RMSE by 14.11%, 14.17%, and 11.38%, respectively, and it also reduces by 9.70%, 40.65%, and 18.17% in the standard

4) Comparisons With Other Optimization Methods:
In addition, comparison experiments of predictive models under grid search (GS)- and GA-based optimizing schemes are conducted to validate the effectiveness of the model parameter optimization. Correspondingly, the overall performance of these optimized prediction models under different machining parameters is shown in Fig. 17.
Fig. 17. Overall performance of tool wear prediction under different optimization methods.
It can be seen from Fig. 17 that, as a whole, the GA-optimized prediction models have a larger PPC and lower MAPE, MAE, and RMSE than the GS-based models. Also, whether GS- or GA-based model parameter optimization is used, GPR achieves a higher correlation and a lower error of tool wear prediction than the other compared methods. Furthermore, as shown in Fig. 18, the 95% CI of the wear predicted by the GA-optimized GPR model is narrower than that of the GS-optimized GPR model.
Compared with the GS-GPR model, the GA-GPR model reduces the mean values of CIAW and CISD by 21.34% and 29.45%, respectively, which enhances the reliability of tool wear prediction. The GA-GPR model also lowers the mean values of MAPE, MAE, and RMSE by 18.03%, 18.14%, and 14.54%, respectively, which improves the prediction accuracy of tool flank wear. Ultimately, the proposed tool wear prediction method achieves better performance than the other methods in each critical process, including the compared multisensor fusion schemes, multidomain feature fusion techniques, and intelligent prediction models.
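To make the contrast between the two optimizing schemes concrete, the sketch below searches two continuous hyperparameters with a fixed grid and with a small real-coded genetic algorithm. Everything in it is an assumption for illustration: the objective is a hypothetical smooth stand-in for a validation error (a real study would evaluate the GPR model instead), and the GA operators (tournament selection, blend crossover, Gaussian mutation, elitism) are common choices rather than the configuration used in the paper.

```python
import itertools
import random

# Hypothetical search ranges for two hyperparameters.
BOUNDS = [(-3.0, 3.0), (-3.0, 3.0)]


def toy_validation_error(params):
    """Hypothetical stand-in for a validation error; minimum at (0.37, -1.21)."""
    x, y = params
    return (x - 0.37) ** 2 + (y + 1.21) ** 2


def grid_search(objective, bounds, points_per_axis=7):
    """Evaluate every point of a regular grid and return the best one."""
    axes = []
    for low, high in bounds:
        step = (high - low) / (points_per_axis - 1)
        axes.append([low + i * step for i in range(points_per_axis)])
    return min(itertools.product(*axes), key=objective)


def genetic_search(objective, bounds, pop_size=20, generations=30,
                   mutation_scale=0.3, seed=0):
    """Minimal real-coded GA; returns the best point and the error history."""
    rng = random.Random(seed)
    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    best = min(population, key=objective)
    history = [objective(best)]
    for _ in range(generations):
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = min(rng.sample(population, 3), key=objective)
            parent_b = min(rng.sample(population, 3), key=objective)
            child = []
            for (lo, hi), a, b in zip(bounds, parent_a, parent_b):
                w = rng.random()  # blend crossover weight
                gene = w * a + (1.0 - w) * b + rng.gauss(0.0, mutation_scale)
                child.append(max(lo, min(hi, gene)))  # clip to the bounds
            children.append(tuple(child))
        population = children
        best = min(population, key=objective)
        history.append(objective(best))
    return best, history


grid_best = grid_search(toy_validation_error, BOUNDS)
ga_best, ga_history = genetic_search(toy_validation_error, BOUNDS)
print("grid search:", grid_best, toy_validation_error(grid_best))
print("GA:", ga_best, ga_history[-1])
```

The grid can only return one of its predefined nodes, while the GA samples the continuous range and, thanks to elitism, never loses its best candidate. Which of the two reaches a lower error within a given budget depends on the grid resolution and on the GA settings.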

C. Discussion
Indirect methods have been widely applied for tool wear monitoring integrated into intelligent manufacturing systems. When the labeled training samples are inadequate or the data distributions of the training and testing samples differ, the monitoring accuracy of data-driven indirect methods decreases significantly. In addition to the monitoring accuracy, the stability and reliability of indirect methods are also critical to integrated manufacturing systems. To improve the monitoring accuracy and to provide prediction reliability under small samples, this study investigates an indirect tool wear prediction method that combines multi-information fusion technologies with a GA-optimized GPR model. The main innovation of the developed tool wear monitoring method relies on the integration of the following technologies. First, the 3-D cutting forces and vibration signals provide multidimensional information for tool wear monitoring. Then, the WPD-based signal preprocessing enhances the signal-to-noise ratio of the collected multisensor signals. Next, the multidomain features are extracted from the preprocessed signals to supply comprehensive wear characteristics, and the KPCA technology is employed to fuse the principal features related to tool wear. Finally, the GPR-based prediction model balances the monitoring accuracy and reliability, and the GA-based optimization enhances the monitoring performance.
The main ablation methods are implemented to demonstrate the effectiveness of each technology integrated into the proposed method. In addition to the evaluation metrics, including MAE, CIAW, and CISD, relevant research has shown that some advanced metrics, such as alpha-lambda (α-λ), prognostic horizon, and prediction distribution, can be used to verify the accuracy and reliability of prognostic methodologies effectively [77]. Considering the consistency of these metrics displayed in the previous study [78], the α-λ metric is adopted to further highlight the contribution of each module. The α-λ metric is defined as follows: [...] where λ* and λp denote the ground truth and the predicted value, respectively, and α represents an arbitrary error of the prediction model. Also, the α-λ metric is averaged over all the testing samples. The lower α and the larger α-λ, the higher the accuracy and the better the reliability of the prediction method. To highlight the prediction performance of the proposed method, this study sets α to 5%. Besides, the optimizing time is also adopted to validate the ablation methods. The performance comparisons between the ablation methods are quantitatively summarized in Table X.
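A minimal sketch of how such a metric might be computed is given below. It assumes the commonly used form of the α-λ accuracy, in which a prediction counts as a hit when it lies within ±α of the ground truth and the hit indicator is averaged over the testing samples. This form, the function name, and the sample values are assumptions and may differ from the exact definition adopted in the study.

```python
def alpha_lambda_accuracy(ground_truth, predicted, alpha=0.05):
    """Fraction of predictions within +/- alpha of the ground truth.

    Assumption: each sample scores 1 if
    (1 - alpha) * truth <= prediction <= (1 + alpha) * truth, else 0,
    and the scores are averaged. Ground-truth values must be positive.
    """
    if len(ground_truth) != len(predicted) or not ground_truth:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for truth, pred in zip(ground_truth, predicted):
        lower = (1.0 - alpha) * truth
        upper = (1.0 + alpha) * truth
        if lower <= pred <= upper:
            hits += 1
    return hits / len(ground_truth)


# Hypothetical flank wear widths (micrometres), not data from the paper.
measured_wear = [100.0, 200.0, 300.0]
predicted_wear = [103.0, 215.0, 290.0]

score = alpha_lambda_accuracy(measured_wear, predicted_wear, alpha=0.05)
print(f"alpha-lambda accuracy at alpha = 5%: {score:.3f}")
```

With α fixed at 5%, a larger score means that more predictions fall inside the tolerance band, which matches the reading that a larger α-λ value indicates higher accuracy.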
It can be found that, compared with the ablation method (NWPD + KPCA + GA-GPR), the proposed method (WPD + KPCA + GA-GPR) clearly reduces MAE, CIAW, and CISD and increases α-λ with almost no increase in optimizing time. The ablation method (WPD + NFS + GA-GPR) has higher MAE, CIAW, and CISD, a lower α-λ, and a longer optimizing time than the proposed method. Moreover, compared with the ablation method (WPD + KPCA + GS-GPR), the proposed method significantly reduces the optimizing time and prediction error and increases α-λ. Therefore, by combining the WPD-based multisensor signal processing, the KPCA-based multidomain feature fusion, and the GA-optimized GPR prediction model, the proposed method balances the accuracy and reliability of tool wear prediction. In addition, to further demonstrate the superiority of the proposed method, this study implements comprehensive comparisons of α-λ and time consumption with other advanced methods. The comparative results are listed in Table XI. Compared with the other advanced methods, the proposed method clearly lowers the prediction error and improves the α-λ metric. Although the optimizing time of the proposed method is slightly longer than that of the WPD + KPCA + GA-LSSVR method, their training and testing times are basically equal. Besides, the optimizing time of the WPD + KPCA + GA-RF method is slightly shorter than that of the proposed method, but the proposed method has an advantage in training and testing time. For the WPD + KPCA + GA-BP and WPD + KPCA + GA-RBF methods, the optimizing and training times are significantly longer than those of the proposed method. The WPD + KPCA + MLR method consumes little time, but its prediction accuracy and reliability are not sufficient for effective tool wear prediction.
Consequently, when using multisensor signals for tool wear prediction, the proposed method (WPD + KPCA + GA-GPR) not only provides higher prediction accuracy and reliability but also takes less training and testing time, which gives it great potential for online tool wear monitoring.

V. CONCLUSION
In this study, a tool wear prediction method based on multi-information fusion and GA-optimized GPR is proposed for indirectly measuring the flank wear width in milling. The multi-information fusion analysis and comprehensive comparisons with other advanced and ablation methods are carried out to verify the effectiveness of the proposed method under different milling parameters. The main work is summarized as follows.
1) The proposed indirect method can improve the accuracy and reliability of tool wear prediction using multisensor signals, benefiting from the integration of the WPD-based signal preprocessing, the KPCA-based multidomain feature fusion, and the GA-optimized GPR prediction model.
2) The WPD-based signal preprocessing strategy lowers by 14 [...] and RF models in the accuracy of tool wear prediction and additionally provides an uncertainty estimation interval to quantify the reliability of the predicted results. Furthermore, the GA-GPR model clearly improves the prediction accuracy and reliability and significantly reduces the optimizing time compared with the GS-GPR model.
Comparison results demonstrate that the proposed tool wear prediction method achieves accurate and reliable monitoring of the flank wear width, promoting a successful application of indirect methods in milling operations. The study can also provide some useful suggestions for indirectly measuring tool flank wear in other machining operations (such as turning and drilling). In future research, the proposed method will be extended to more processing parameters and machining operations, further enhancing its generalization ability. In addition, multi-information fusion methods based on other feature fusion technologies and intelligent prediction models will be developed for tool wear prediction, further improving the prediction accuracy and reliability.
Jiajie Shao received the M.S. degree in mechanical manufacture and automation from the University of Shanghai for Science and Technology, Shanghai, China, in 2021. He is currently pursuing the Ph.D. degree in mechanical engineering with the School of Mechanical Engineering, Tongji University, Shanghai.
His research is currently focused on structural health monitoring, fault diagnosis, and transfer learning.
Weicheng Guo received the Ph.D. degree in mechanical engineering from Donghua University, Shanghai, China, in 2020.
He was a Visiting Scholar with Purdue University, West Lafayette, IN, USA. He is currently a Lecturer with the School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai. His current research interests include precision machining of aerospace difficult-to-cut materials and intelligent manufacturing.
Weidong Li (Member, IEEE) received the Ph.D. degree in mechanical engineering from the National University of Singapore, Singapore, in 2002.
He is currently a Full Professor with Coventry University, Coventry, U.K., and the Dean of mechanical engineering with the University of Shanghai for Science and Technology, Shanghai, China. He has led a number of EU and U.K. projects in the areas of sustainable or digital manufacturing and cooperated with automotive, aeronautical, and manufacturing industries [e.g., Airbus, Jaguar Land Rover, Sandvik, and some manufacturing small and mid-sized enterprises (SMEs)]. He has published over 120 research articles in peer-reviewed international journals and five books (Springer and World Scientific Publisher). His primary research interests include computer-aided manufacturing, sustainable manufacturing, and big data analytics for smart manufacturing.
Dr. Li is a fellow of Institution of Engineering and Technology (IET) and Institution of Mechanical Engineers (IMechE) U.K.
Jianmin Zhu received the Ph.D. degree in mechanical manufacture and automation from the Huazhong University of Science and Technology, Wuhan, Hubei, China, in 2000.
He is currently a Professor with the School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai, China. He has published over 120 articles, five books, and 30 patents. His research interests cover process monitoring and intelligent control of the electromechanical system, precision measurement techniques, and complex system modeling and control.
Prof. Zhu has won the Second Prize of National Science and Technology Progress Award, the Second Prize of the Ministry of Education Natural Science Award, and the Second Prize of Henan Province Science and Technology Progress Award.
Qichao He received the B.S. degree in mechanical engineering from the East China University of Science and Technology, Shanghai, China, in 2021.
He is currently working as an Engineer with Shanghai Aerospace Equipment Manufacturing General Company Ltd., Shanghai. His current research interests include precision machining and process optimization.
Dianjun Fang received the Ph.D. degree in mechanical engineering from the Technical University of Dortmund, Dortmund, Germany, in 1995.
He is currently a Professor with the School of Mechanical Engineering, Tongji University, Shanghai, China, and the Chinese Director of the Qingdao Sino-German Institute of Intelligent Technologies, Qingdao, China. He is the Honorary Chief Scientist in China with the Fraunhofer Institute for Material Flow and Logistics, Dortmund. His current research interests include structural health monitoring, digital twins, and intelligent logistics system.