Rotation Error Prediction of CNC Spindle Based on Short-Time Fourier Transform of Vibration Sensor Signals and Improved Weighted Residual Network

The spindle rotation error of computer numerical control (CNC) equipment directly reflects the machining quality of the workpiece and is a key indicator of the performance and reliability of CNC equipment. Existing rotation error prediction methods do not consider the importance of different sensor data. This study developed an adaptive weighted deep residual network (ResNet) for predicting spindle rotation errors, thereby establishing an accurate mapping between easily obtainable vibration information and difficult-to-obtain rotation errors. Firstly, multi-sensor data are collected by vibration sensors, and the short-time Fourier transform (STFT) is adopted to extract the feature information in the original data. Then, an adaptive feature recalibration unit with a residual connection is constructed based on the attention weighting operation. By stacking multiple residual blocks and attention weighting units, the data of different channels are adaptively weighted to highlight important information and suppress redundant information. The weight visualization results indicate that the adaptive weighted ResNet (AWResNet) can learn a set of weights for channel recalibration. The comparison results indicate that AWResNet has higher prediction accuracy than other deep learning models and can be used for spindle rotation error prediction.


Introduction
High-precision computer numerical control (CNC) equipment is the core of the modern manufacturing industry. The spindle is the key rotating part of CNC equipment and is a complex mechanical system integrating mechanical, electrical, hydraulic and pneumatic components. Rotation error refers to the distance by which the actual rotation axis deviates from its ideal axis [1]. The spindle rotation error of CNC equipment directly reflects the machining quality of the workpiece and is a key indicator of the performance and reliability of CNC equipment [2,3]. Accurately predicting the rotation error of the spindle is of great significance for reducing machining errors and improving the reliability of CNC equipment.
A review of the existing literature shows that spindle rotation error prediction is mainly divided into the direct measurement method and the physical modeling-based method. The direct measurement method installs a standard ball or rod at the end of the tool jig, and a rotation accuracy tester is utilized to measure the spindle rotation error. Based on the direct measurement method, researchers have conducted extensive research on monitoring spindle rotation errors. For example, Castro [4] proposed a laser interferometer-based method for evaluating the rotation error of machine tool spindles, utilizing a master ball with high surface finish and accuracy to reflect the incident beam back to the interferometer. Liu et al. [5] proposed a four-point method for spindle rotation error measurement and separation by using four sensors to measure the orbit at the center of the spindle cross-section. Wang et al. [6,7] developed a spindle rotation error evaluation method based on the least squares method, which relies on a measuring system composed of a standard ball and a high-precision capacitive displacement sensor. The error characteristics are extracted by time domain and frequency domain signal analyses. At the same time, it was found that the rotation error is closely related to the spindle rotation speed. Anandan et al. [8] proposed a multi-directional error separation technique to obtain the radial axis rotation error.
The physical modeling-based approaches carry out spindle vibration analysis and rotation error prediction by establishing a spindle dynamics model. For example, Karacay et al. [9] studied the spindle vibration in the radial, axial, rocking and yawing directions, utilizing a model of the spindle dynamics of a rigid rotor grinder supported by angular contact ball bearings. Kang et al. [10] developed a physical model of a high-fidelity, high-speed spindle bearing system and realized the dynamic prediction of spindle rotation error. Bai [11] studied the formation mechanism of rotation error, concluded that the bearing, spindle and spindle shank joint face are the key components that lead to the decline in spindle rotation accuracy, and established a physical model of spindle rotation accuracy degradation based on bearing wear.
The above review shows that direct measurement methods and physical model-based methods are able to obtain the spindle rotation error. However, these methods still have some limitations. The main drawback of direct measurement methods is that the tool position is occupied by a standard ball or standard bar, so the spindle being measured cannot mount the tool and complete the normal machining process [12]. Therefore, the direct measurement methods are based on the premise of an idle spindle and cannot measure the rotation error when the spindle is loaded [13], which makes them difficult to use in the actual machining of workpieces with cutting tools. Most current studies utilize the spindle dynamics model to study the spindle stiffness, natural frequency and other dynamics parameters, and to carry out design optimization. Due to the extensive simplification of the rolling bearing dynamics model, few studies have predicted the spindle rotation error [10]. In addition, establishing a dynamic prediction model is an extraordinarily time-consuming and idealized process, which is not conducive to practical industrial applications. The real-time monitoring of machine tool spindle performance and the real-time compensation of rotation errors have become enormous challenges. Complex yet simplified dynamic models do not accurately reflect spindle rotation, while expensive measurement equipment, strict installation requirements and existing measurement techniques can interfere with regular machining tasks. These challenges create obstacles to the direct prediction of spindle rotation errors.
According to references [6,7,11], the spindle rotation error is closely related to speed and wear degree. Spindle vibration signals usually contain characteristic information about spindle speed [14] and wear level [15]. Therefore, it is reasonable and feasible to establish a functional mapping between the spindle rotation error and the vibration signal [16]. In fact, rotating machinery fault diagnosis [17][18][19] and remaining useful life prediction [20][21][22] based on vibration signals have developed rapidly. The difference from fault diagnosis is that the spindle rotation error is a non-fault state: the vibration signals are similar between categories and the discriminative features are weak, unlike the significant differences in features between different fault categories. Spindle rotation error prediction is also not treated as a regression problem like remaining useful life prediction, because the regression method does not generalize well enough for rotation error prediction at multiple speeds [12]. Following the above idea, researchers established a mapping between the vibration signal and the rotation error by means of a neural network. Song et al. [23] developed a multi-scale convolutional neural network (MSCNN) model for spindle rotation error prediction by first acquiring spindle vibration signals through multiple sensors and then extracting features using convolutional kernels of different sizes. The experimental results verify the feasibility of predicting spindle rotation error with a convolutional neural network (CNN). Further, to address the bottleneck that traditional CNN models are difficult to train when deep structures are stacked, Song et al. [24] proposed a residual network (ResNet)-based spindle rotation error prediction algorithm, which achieved good prediction accuracy.
However, existing CNN-based methods do not consider the correlation of multi-sensor data, which affects the accuracy of deep learning methods. Specifically, due to different installation locations, data from different sensors may contain various degrees of degradation information. In particular, data collected by multiple sensors are redundant, and the direct fusion of different channels without distinguishing the importance of the sensors may lead to redundant information being propagated through the network, further affecting the performance of the model. Deep learning therefore requires effective learning mechanisms that highlight the sensor data containing more degradation information in order to improve the generalization ability of the model. Moreover, the categories for spindle rotation error prediction are non-fault states, and the small inter-class distance further increases the difficulty of prediction.
To address the above problems, this article develops a new neural network model named adaptive weighted ResNet (AWResNet) for predicting spindle rotation error. The method incorporates an adaptive multi-sensor data-recalibration module in ResNet to weight the channel data, thus improving the accuracy of spindle rotation error prediction. In order to assess the effectiveness and superiority of the developed model, spindle rotation error prediction experiments were carried out. The main contributions of this article can be summarized as follows:
1. The attention weighting unit is adopted to adaptively distinguish the importance of the spindle multi-sensor vibration data, so as to emphasize the important feature information, suppress the redundant feature information, and enhance the feature extraction capability of the model.
2. The AWResNet model for spindle rotation error prediction is constructed by adding an attention weighting unit to the original residual network (ResNet). It takes the short-time Fourier transform (STFT) time-frequency domain features of the vibration signals as inputs to establish an end-to-end mapping between the vibration signals and the rotation errors.
3. Comparison tests, feature visualization, attention weight visualization, and anti-noise experiments are carried out based on the vibration data collected from the machine tool spindle reliability test bed, and the experimental results verify the effectiveness and superiority of the proposed method.
The rest of the article consists of Section 2, which introduces the fundamentals of ResNet; Section 3, which describes in detail the developed AWResNet prediction model; Section 4, which employs the spindle rotation error dataset to verify the validity and superiority of AWResNet; and Section 5, which draws conclusions.

Convolution Neural Network
The CNN [25,26] is an important branch of deep learning methods. Because of its strong feature-learning ability, it has been widely used in the manufacturing industry in equipment reliability analysis fields such as fault diagnosis [27,28] and remaining useful life prediction [29,30]. In the CNN model, a trainable convolution kernel slides over the input data to extract local features at different positions. Sparse connection and weight sharing are the main features that distinguish CNNs from traditional neural networks. Convolutional operations are performed by multiplying the convolution kernel with the corresponding positions of the input data and then summing the products to obtain the output [31]. The convolution operation can be expressed as follows:
$$x_j^{l} = f\left(\sum_{i} x_i^{l-1} * \omega_{ij}^{l} + b_j^{l}\right) \tag{1}$$
where $x$ indicates the output of the convolution layer; $\omega$ indicates the weight of the convolution kernel and $b$ is the bias; $i$, $j$ and $l$ represent the serial numbers of input channels, output channels and the convolution layer, respectively; $*$ represents the convolution operation; and $f(\cdot)$ represents the nonlinear activation function. The derivative of the rectified linear unit (ReLU) can only be 0 or 1, which makes it more effective than the traditional Sigmoid and Tanh activation functions at avoiding vanishing and exploding gradients during deep neural network training.

Pooling
Pooling is a downsampling operation that significantly compresses the data and reduces the data dimension. The pooling layer aggregates each data point with its surrounding data points. Commonly used pooling operations are average pooling and maximum pooling. Average pooling takes the average value of the data points in a specific region of the feature map with a particular step size, while maximum pooling takes the maximum value. Taking maximum pooling as an example, its mathematical description can be expressed as follows:
$$p_c(k, z) = \max_{m, n \in [1, i]} \left\{ x_c\big(m + (k-1)s,\; n + (z-1)s\big) \right\} \tag{2}$$
where $p_c(k, z)$ is the output at coordinates $(c, k, z)$; $x_c(k, z)$ is the input data at the $c$-th channel, $k$-th row and $z$-th column; $m, n \in [1, i]$, where $i$ represents the size of the pooling region; and $s$ represents the sampling interval. Global average pooling (GAP), shown in Figure 1, takes the average of all the data in each channel and is mainly used before the fully connected layer of the ResNet model, thus enabling data dimensionality reduction.
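The two pooling operations described above can be illustrated with a minimal NumPy sketch. The function names, the toy 2 × 4 × 4 input and the 2 × 2 window with stride 2 are illustrative assumptions, not values from the experiments; indices are 0-based, whereas the text uses 1-based coordinates.

```python
import numpy as np


def max_pool2d(x, size, stride):
    """Maximum pooling over a (C, H, W) array with a square size x size region."""
    c, h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((c, out_h, out_w), dtype=x.dtype)
    for k in range(out_h):
        for z in range(out_w):
            # Region whose top-left corner is (k * stride, z * stride): the
            # 0-based counterpart of x_c(m + (k-1)s, n + (z-1)s), m, n in [1, size].
            region = x[:, k * stride:k * stride + size, z * stride:z * stride + size]
            out[:, k, z] = region.max(axis=(1, 2))
    return out


def global_average_pool(x):
    """Global average pooling: one mean value per channel, shape (C,)."""
    return x.mean(axis=(1, 2))


# Toy input (assumption): 2 channels of 4 x 4 values 0..31.
x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
pooled = max_pool2d(x, size=2, stride=2)  # shape (2, 2, 2)
gap = global_average_pool(x)              # shape (2,)
```

Maximum pooling keeps the spatial layout at a reduced resolution, whereas GAP collapses each channel to a single number, which is why it is suited to feeding the fully connected layer.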

Cross-Entropy Loss Function
The loss function for classification tasks is generally the cross-entropy loss (CEL). In the classification calculation, the estimated probability $q_i(y)$ that the observation $y$ belongs to class $i$ is calculated and compared with the true probability $p_i(y)$ to obtain the loss of the CNN. In deep learning methods, CEL measures the distance between the predicted and true values. CEL can be expressed as follows:
$$L(p, q) = -\sum_{i=1}^{M} p_i(y) \log q_i(y) \tag{3}$$
In the CEL function, $M$ represents the number of classification categories.
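As a small numerical illustration of the CEL described above, the following NumPy sketch computes the loss for a one-hot true distribution. The softmax step, the example logits and the small constant `eps` are assumptions added for the illustration; M = 20 matches the number of rotation error categories used later in the experiments.

```python
import numpy as np


def softmax(logits):
    """Convert raw class scores into estimated probabilities q_i."""
    shifted = logits - logits.max()  # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def cross_entropy(p_true, q_pred, eps=1e-12):
    """Cross-entropy between true probabilities p and estimates q."""
    return float(-np.sum(p_true * np.log(q_pred + eps)))


M = 20                          # number of classification categories
logits = np.zeros(M)
logits[3] = 2.0                 # hypothetical scores favouring class 3
q = softmax(logits)

p = np.zeros(M)
p[3] = 1.0                      # one-hot true distribution: class 3

loss = cross_entropy(p, q)
uniform_loss = cross_entropy(p, np.full(M, 1.0 / M))
```

With a one-hot target, the sum reduces to the negative log-probability assigned to the true class, so a completely uninformative (uniform) prediction yields a loss of log M.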

The Proposed Prediction Method
The existing spindle rotation error tester is expensive, requires high installation accuracy, and is easily damaged in practical applications. Therefore, it is challenging to measure rotation error under load. How to monitor the performance of the spindle and compensate for the error in real time has become an urgent problem. Moreover, existing prediction methods based on vibration signals do not consider the importance of different sensor data. To solve the above issues, this section adopts the original vibration signal as the input and first extracts the time-frequency characteristics of the original multi-sensor data by using STFT. Then, in order to distinguish the importance of different sensor data and establish the correlation of multi-sensor data, a new AWResNet method is proposed. It adaptively recalibrates different sensor data, giving more weight to data containing more degradation information and less weight to data containing redundant information, so as to extract more discriminative features and improve the prediction ability of deep learning networks.

STFT Representation
STFT is a joint time-frequency transform method for non-stationary signals. It converts one-dimensional vibration signals into a two-dimensional matrix suitable for two-dimensional CNN processing, which contains both time-domain and frequency-domain characteristics. STFT divides the original vibration data into equal-length segments, then multiplies each segment by a window function in chronological order to perform a segmented Fourier transform. The resulting series of Fourier transforms are lined up to form a two-dimensional representation. Mathematically, STFT can be written as follows:
$$\mathrm{STFT}(t, \omega) = \int_{-\infty}^{+\infty} x(\tau)\, h(\tau - t)\, e^{-j\omega\tau}\, d\tau$$
where $t$ and $\tau$ are time; $\omega$ is frequency; $x(\tau)$ is the signal that needs to be transformed; $h(\tau - t)$ is the window function; and $\mathrm{STFT}(t, \omega)$ is the Fourier transform of $x(\tau)h(\tau - t)$. This paper adopts the Hann window as the window function, which can be expressed as follows:
$$h(k) = 0.5\left[1 - \cos\left(\frac{2\pi k}{K - 1}\right)\right], \quad 0 \le k \le K - 1$$
where $K$ denotes the number of data points in the output of each Fourier transform segment, which is set to 64 in this article. An input signal of length 1024 is passed through a Hann window STFT with an overlap of 32 to obtain a two-dimensional representation of 33 × 33.
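The 33 × 33 size quoted above can be checked with a short NumPy sketch. It is only a sketch under stated assumptions: the signal is zero-padded by half a window at both ends (a common STFT convention that the text does not confirm), the magnitude of the one-sided spectrum is kept, and the 2.5 kHz test tone is hypothetical. The 20 kHz sampling frequency is the one reported for the experiments.

```python
import numpy as np


def stft_hann(x, nperseg=64, noverlap=32):
    """Magnitude STFT with a Hann window; returns an array of shape (freq, time)."""
    hop = nperseg - noverlap
    window = np.hanning(nperseg)          # symmetric Hann window of length K = 64
    # Assumption: zero-pad half a window on each side so frames are centred.
    padded = np.pad(x, nperseg // 2)
    n_frames = (len(padded) - nperseg) // hop + 1
    frames = np.stack(
        [padded[i * hop:i * hop + nperseg] * window for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)  # one-sided spectrum: nperseg // 2 + 1 bins
    return np.abs(spectrum).T


fs = 20_000                               # sampling frequency in Hz
n = np.arange(1024)
signal = np.sin(2 * np.pi * 2500 * n / fs)  # hypothetical 2.5 kHz test tone
tf_map = stft_hann(signal)                # shape (33, 33)
```

With a hop of 64 − 32 = 32 points, the padded signal of 1088 points yields (1088 − 64)/32 + 1 = 33 frames, and a 64-point real FFT yields 64/2 + 1 = 33 frequency bins, which reproduces the stated 33 × 33 representation.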

The Proposed AWResNet Model
ResNet was developed to solve the problem of deep CNN training in image processing. The ResNet model adds many identity mappings to the convolution layers, which is beneficial to the backpropagation of errors and the optimization of network weights. ResNet has performed well in image recognition, image segmentation and object detection [32]. Figure 3 shows the residual building units (RBUs) of ResNet. Each RBU module consists of Conv, BN [33] and ReLU. Stacking multiple RBU modules builds the ResNet model. In Figure 3, Conv represents the convolutional layer. The output $y$ of the entire RBU can be represented as follows:
$$y = \mathrm{ReLU}\Big(x + \mathrm{BN}\big(\mathrm{Conv}\big(\mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{Conv}(x)\big)\big)\big)\big)\Big)$$
where BN stands for Batch Normalization. When the batch training method is used, the feature distribution among samples often changes during iteration, which is known as the internal covariate shift problem. In this case, the model parameters need to be constantly updated to accommodate the changing distribution. BN is a normalization method that addresses internal covariate shift.

Attention Weighting Unit
The STFT time-frequency representation of the multi-sensor data is utilized as the input to the ResNet model, with each input data channel representing a sensor signal. Data from different sensors contain information about spindle degradation to varying degrees. Specifically, some data may contain rich feature information related to spindle degradation, while others may contain very few degradation features or even only measurement noise. Therefore, in order to identify the discriminative information in multi-sensor data, it is necessary to determine the importance of different sensor data by modeling the relationship between channels along the channel dimension. To highlight the critical feature information and suppress the useless feature information, an adaptive weighting ResNet (AWResNet) model is proposed for the recalibration of spindle multi-sensor data. The core idea of AWResNet is to add the squeeze-excitation attention weighting unit [34] to the RBU. The attention weighting unit can adaptively recalibrate the weights of multi-sensor channel data according to the input, with each channel's data having a weight of varying magnitude. A larger weight indicates that the channel's data have greater influence, and a smaller weight indicates that the channel's data are less important. The attention weighting unit is displayed in Figure 4 and comprises four parts: global information extraction, channel interrelationship modeling, weight calculation and weighted output.

Sensors 2024, 24, 4244
Global information extraction: In order to establish dependencies between different channel information, it is necessary to compress the global spatial information into channel descriptors, which is achieved through the GAP operation. The statistical information $z_c$ of channel $c$ can be represented as follows:
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j)$$
where $x \in \mathbb{R}^{c \times H \times W}$; $H \times W$ represents the spatial dimension of $x_c$, and the input signal is an STFT time-frequency representation with a height of $H$ and a width of $W$; $z \in \mathbb{R}^{c \times 1 \times 1}$, and $z_c$ can be interpreted as a channel descriptor that describes the global information of each channel; $x_c$ represents the input feature information in the $c$-th channel.
Channel inter-relationship modeling: To establish the inter-relationships between channels and capture the dependencies between different channel information, this step must be able to learn the nonlinear interactions between different channels, which is implemented through a one-dimensional convolution (Conv) and the ReLU activation function. The input for this step is $z \in \mathbb{R}^{c \times 1 \times 1}$, which has $c$ channels. The convolution operation with a kernel of 1 × 1 can fuse information from different channels. In order to control the computational complexity of the attention mechanism, the number of output channels of the one-dimensional convolution is reduced by the dimension parameter $\beta$: the number of input channels of the one-dimensional convolution is $c$, and the number of output channels is $c/\beta$. The nonlinear feature transformation is implemented by applying ReLU after the one-dimensional convolution.
Weight calculation: The weight calculation must allow the feature information of multiple channels to be emphasized rather than that of a single channel, so the weight of each channel is obtained by adopting the Sigmoid function. The output value of the Sigmoid is between 0 and 1, which ensures that multiple channels can be emphasized. Before applying the Sigmoid function, it is also necessary to increase the dimension through a one-dimensional Conv with a convolution kernel of 1 × 1, so that the number of weights is consistent with the number of channels. This convolutional layer has $c/\beta$ input channels and $c$ output channels. The process of channel relationship modeling and weight calculation can be expressed as follows:
$$s = \sigma\big(W_2\, \delta(W_1 z)\big)$$
where $\sigma$ represents the Sigmoid function and $\delta$ represents the ReLU function; $s$ denotes the attention weight; $W_1$ is the weight of the first convolutional layer, and $W_2$ is the weight of the second convolutional layer.
It should be noted that the value of the attention weight $s$ changes adaptively with the input sample. By adopting the channel attention mechanism, a set of weights of different sizes is learned adaptively for each sample, thereby assigning larger weights to more important channels and smaller weights to less important channels.
Weighted output: The input feature $x$ is multiplied by the attention weight to obtain the weighted output, and the final recalibrated output $\tilde{x}_c$ can be expressed as follows:
$$\tilde{x}_c = s_c \cdot x_c$$
where $s_c$ is the attention weight of the $c$-th channel.
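The four parts of the attention weighting unit can be traced in a few lines of NumPy. This is a sketch under assumptions: since a 1 × 1 convolution applied to a c × 1 × 1 descriptor is equivalent to a matrix product, the two convolutions are written as plain matrices (biases omitted); the feature-map size (64 channels, 9 × 9) and the random weights are hypothetical. Only β = 16 is taken from the text.

```python
import numpy as np


def relu(v):
    return np.maximum(v, 0.0)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def attention_weighting(x, w1, w2):
    """Squeeze-excitation style recalibration of a (C, H, W) feature map.

    w1 has shape (C // beta, C) and w2 has shape (C, C // beta).
    """
    z = x.mean(axis=(1, 2))          # global information extraction (GAP)
    hidden = relu(w1 @ z)            # channel inter-relationship modeling
    s = sigmoid(w2 @ hidden)         # weight calculation, one weight per channel
    x_tilde = x * s[:, None, None]   # weighted output
    return x_tilde, s


rng = np.random.default_rng(0)
C, H, W, beta = 64, 9, 9, 16                       # C, H, W are illustrative
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // beta, C))     # random stand-in weights
w2 = 0.1 * rng.standard_normal((C, C // beta))
x_tilde, s = attention_weighting(x, w1, w2)
```

Because the Sigmoid is applied to each channel independently, several channels can receive weights close to 1 at the same time, which is the property the weight calculation step relies on; a softmax would instead force the channels to compete.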

AWResNet Model
By embedding an attention weighting unit in the RBU module, the adaptive weighting of data from different channels can be realized. Figure 5a shows the proposed adaptive weighting RBU module. The attention weighting unit is located after the second BN layer. The proposed AWResNet model can be constructed by stacking multiple adaptive weighting RBU modules, as shown in Figure 5b, where FC indicates the fully connected layer.
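Based on the AWResNet model proposed above, the model's parameters need to be further determined. The structure of AWResNet is illustrated in Table 1. In the table, adaptive weighting RBU1 × 2 indicates that the residual block is applied twice, that is, repeated once. The structure therefore contains 17 convolutional layers and 1 fully connected layer, which is consistent with one initial convolution plus two convolutions in each of the eight adaptive weighting RBUs used in Algorithm 1, not counting the 1 × 1 convolutions inside the attention weighting units. In the adaptive weighting RBU, the dimension parameter β is set to 16.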
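A possible PyTorch rendering of the adaptive weighting RBU is sketched below. It is not the authors' implementation: the class names, the 3 × 3 kernels, the constant channel count of 64 and the absence of downsampling are assumptions, since Table 1 is not reproduced here. What follows the text is the placement of the attention weighting unit after the second BN layer, the identity shortcut, and β = 16.

```python
import torch
from torch import nn


class AttentionWeighting(nn.Module):
    """Squeeze-excitation style channel weighting with reduction factor beta."""

    def __init__(self, channels, beta=16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # global information extraction
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // beta, kernel_size=1),  # c -> c / beta
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // beta, channels, kernel_size=1),  # c / beta -> c
            nn.Sigmoid(),
        )

    def forward(self, x):
        weights = self.excite(self.gap(x))  # shape (N, C, 1, 1)
        return x * weights                  # broadcast over H and W


class AdaptiveWeightingRBU(nn.Module):
    """Residual building unit with attention weighting after the second BN."""

    def __init__(self, channels, beta=16):
        super().__init__()
        # Kernel size 3 with padding 1 is an assumption; it keeps H and W unchanged.
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.attention = AttentionWeighting(channels, beta)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.attention(out)      # recalibrate channels after the second BN
        return self.relu(out + x)      # identity shortcut, then ReLU


block = AdaptiveWeightingRBU(channels=64)
y = block(torch.randn(4, 64, 33, 33))  # hypothetical batch of feature maps
```

Stacking several such modules between an initial Conv+BN+ReLU stage and a GAP plus FC head would give a network of the general shape shown in Figure 5b.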

AWResNet-Based Spindle Rotation Error Prediction Procedure
Figure 6 shows the AWResNet-based spindle rotation error prediction procedure, which can be summarized into four steps: data collection and preprocessing, network structure design, model offline training, and model online testing. The detailed steps are as follows:
(1) Data collection and preprocessing. Firstly, multiple vibration sensors were employed to acquire vibration data from the spindle with different rotation errors. Secondly, the vibration signals were divided into segments of a fixed length, and each segment was labeled according to its rotation error category. Meanwhile, STFT processing converted the signal into time-frequency domain features. Then, min-max normalization was used to preprocess the data, which is conducive to the convergence of the model.
(2) Network structure design. The network structure adopted in this paper is shown in Table 1.
(3) Model offline training. The prepared training set data were fed into the AWResNet model, and the model was trained by the gradient descent method. Forward propagation calculated the loss, and backpropagation updated the model parameters. The detailed algorithm flow for training the AWResNet model is presented in Algorithm 1. When the maximum training epoch was reached, the trained model weights and biases were saved as local files for model testing and for calling during online deployment.
(4) Model online testing. The saved AWResNet model structure and parameters were loaded, the prepared test set was taken as the input of the trained AWResNet algorithm, the actual classification probability was calculated, and the prediction result of the rotation error was obtained.
Algorithm 1. Training of the AWResNet model
// Feature extraction
Calculate the output of the Conv+BN+ReLU layers;
Calculate the output of the 8 adaptive weighting RBU modules in series;
Calculate the output of the GAP layer;
Calculate the output $x_i$ of the FC layer;
// Calculate the probability $p_j$ of each category
$p_j = \dfrac{e^{x_j}}{\sum_{i=1}^{M} e^{x_i}}$, where $M$ stands for the number of categories;
// Calculate loss
Calculate the cross-entropy loss $L(p(x), q(x))$ using Formula (3);
// Error backpropagation and updating parameters
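The offline training step can be sketched in PyTorch as follows. This is an illustration under assumptions rather than the training script used in the paper: the small stand-in network omits the adaptive weighting RBU stack, the batch is synthetic, and the optimizer, learning rate and number of iterations are hypothetical. The three input channels correspond to the three vibration sensors and the 20 outputs to the rotation error categories.

```python
import torch
from torch import nn

torch.manual_seed(0)
num_classes = 20  # rotation error categories

# Stand-in network (assumption): Conv+BN+ReLU -> GAP -> FC, RBU stack omitted.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, num_classes),
)

# CrossEntropyLoss applies softmax to the FC outputs and computes the CEL.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # hypothetical settings

# Synthetic batch standing in for normalized 33 x 33 STFT maps of 3 sensors.
inputs = torch.rand(8, 3, 33, 33)
labels = torch.randint(0, num_classes, (8,))

losses = []
model.train()
for epoch in range(20):
    optimizer.zero_grad()
    logits = model(inputs)            # forward propagation
    loss = criterion(logits, labels)  # cross-entropy loss
    loss.backward()                   # error backpropagation
    optimizer.step()                  # parameter update
    losses.append(loss.item())

# After the last epoch, the parameters could be written to a local file with
# torch.save(model.state_dict(), path) and restored later for online testing.
trained_state = model.state_dict()
```

In online testing, the same network would be switched to evaluation mode with `model.eval()` so that BN uses its running statistics, and the predicted category would be the index of the largest output probability.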

Experimental Verifications
To evaluate the effectiveness and superiority of the developed AWResNet model in predicting the spindle rotation error of CNC machines, vibration data and the corresponding rotation errors were collected through a spindle reliability test bed. Python 3.7 and PyTorch were adopted as the programming language and deep learning framework, and the experimental verification and analysis were carried out on a Windows computer with an i7 CPU and an RTX2060 SUPER GPU. It should be noted that, to ensure the fairness of the comparison, the experimental results of all methods were obtained with the same data acquisition platform, the same dataset and the same programming environment.

Experimental Platform
Figure 7 illustrates the spindle rotation error reliability analysis testbed, which consists of a CTB40D spindle and drive, a PCB256A14 vibration sensor, a DYMH-104 force sensor, a rotation signal processing unit, an NI PXIe-1082 data acquisition and control unit, etc. The experimental platform collects vibration signals through the vibration sensors and rotation signals through the spindle rotation signal processing unit. The vibration sensors are installed at the base end, the spindle end and the bearing end. The loading experiments were performed according to the load spectrum of the spindle to ensure that the spindle operation was similar to the actual working conditions [35]. The rotation signal processing unit collected the rotation error signal every 10 h, and wear tests were performed at other times.

Data Preprocessing
After the data acquisition was completed, the data were preprocessed and discretized. Rotation errors were rounded to the nearest 0.5 µm. The spindle speed range was 1000 r/min~4000 r/min, and the range of spindle rotation error was 5 µm~14.5 µm, so the discretized rotation error data contained a total of 20 categories. Two datasets were selected for each category. The vibration signal sampling frequency was 20 kHz, and the sampling time of each sub-dataset was 10 s. The signals in the Z-axis direction from the three vibration sensors were selected as experimental data. To simulate application in actual industry, the data were divided in time order: the first 70% of the data points of each signal were used as the training data, and the last 30% were used as the test data. Data augmentation can improve the generalization ability of deep learning algorithms [36]. Data augmentation with overlapping samples was adopted in this article, as shown in Figure 8. Each training sample shares a number of data points with the subsequent sample. The example in Figure 8 has 1024 data points per sample, of which 704 overlap with the next sample.
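The time-ordered split and the overlapping-sample augmentation can be sketched as follows. This is a minimal illustration using the figures quoted above (20 kHz for 10 s, a 70/30 split, 1024-point samples with 704 overlapping points, i.e. a stride of 320); the function names are hypothetical, and a list of integers stands in for a real vibration signal.

```python
def split_by_time(signal, train_fraction=0.7):
    """Split a signal in time order: first part for training, rest for testing."""
    cut = int(round(len(signal) * train_fraction))
    return signal[:cut], signal[cut:]


def overlapping_windows(signal, window=1024, overlap=704):
    """Cut a signal into fixed-length samples that share `overlap` points."""
    stride = window - overlap  # 1024 - 704 = 320 new points per sample
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, stride)]


# Stand-in for one sub-dataset: 10 s sampled at 20 kHz = 200,000 points.
signal = list(range(200_000))
train, test = split_by_time(signal)
train_samples = overlapping_windows(train)
```

With these settings one 140,000-point training segment yields 435 samples, compared with 136 if the windows did not overlap.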

Comparison Methods
To demonstrate the validity and superiority of AWResNet, we compared the AWResNet algorithm with several existing methods, including LeNet, CNN, convolutional bidirectional long short-term memory (CBiLSTM), MSCNN and ResNet.
LeNet: LeNet is the classical deep learning method for image classification. The improved LeNet network structure adopted in this paper consists, in order, of a Conv+ReLU layer with 6 output channels; a 2 × 2 maximum pooling layer with a stride of 2; a Conv+ReLU layer with 16 output channels; and an adaptive maximum pooling layer with an output size of 5 × 5. The LeNet model has a convolutional kernel size of 5 × 5.

CNN: One of the most significant differences from LeNet is that this CNN model adds the BN layer. The CNN model structure includes two Conv+BN+ReLU layers with 16 and 32 output channels; a 2 × 2 maximum pooling layer with a stride of 2; two Conv+BN+ReLU layers with 64 and 128 output channels; and an adaptive maximum pooling layer with an output size of 4 × 4. The CNN model has a convolutional kernel size of 3 × 3.
CBiLSTM: A CNN can extract spatial correlation features from data, while bidirectional long short-term memory (BiLSTM) can extract temporal correlation features from data. Following the literature [37], CNN+BiLSTM (CBiLSTM) was adopted in the comparison experiments.
MSCNN: According to the literature [23], by using different convolution kernel sizes, an MSCNN can extract multi-scale features from spindle vibration signals and achieve good results in spindle rotation error prediction. Therefore, an MSCNN was used in the comparison experiments.
ResNet: According to the literature [24], through identity mapping, ResNet can accurately predict spindle rotation error, so ResNet was used for comparison experiments.
In addition, to ensure the fairness of the comparison results, the same hyperparameters were used for the proposed model and the other comparative models. The batch size, which affects both the accuracy and the training speed of the model, was set to 64. Adam was the parameter optimization method, with a momentum of 0.9 and a learning rate of 0.001. L2 regularization with a coefficient of 0.00001 was employed during model training. The maximum number of epochs was set to 100. These parameters followed the benchmark settings for deep learning in mechanical fault diagnosis given in reference [37].
The performance evaluation index of the model is classification accuracy, which can be expressed as follows:

$$\mathrm{Acc} = \frac{\mathrm{Sum}(\mathrm{Test}_i == \mathrm{True}_i)}{\mathrm{Sum}(\mathrm{True}_i)}$$

where $\mathrm{Sum}(\mathrm{True}_i)$ represents the number of test samples; $\mathrm{Acc}$ represents the classification accuracy; and $\mathrm{Sum}(\mathrm{Test}_i == \mathrm{True}_i)$ is the number of test samples whose predicted labels are equal to their real labels. The larger the Acc, the better the model performance.
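As a quick illustration of this index, the sketch below counts the test samples whose predicted label equals the real label and divides by the number of test samples. The label values and the function name are hypothetical.

```python
def accuracy(predicted, true):
    """Fraction of test samples whose predicted label equals the real label."""
    if len(predicted) != len(true) or not true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted, true))
    return correct / len(true)


# Hypothetical rotation-error labels in micrometres.
true_labels = [5.0, 5.5, 6.5, 6.5]
predicted_labels = [5.0, 5.5, 6.0, 6.5]
acc = accuracy(predicted_labels, true_labels)
```

Here three of the four predictions match, so the accuracy is 0.75, which would be reported as 75%.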

Prediction Results
Five experiments were performed for each method. The prediction accuracy and standard deviation of the five experiments are presented in Figure 9 and Table 2. The average prediction accuracy of the AWResNet model was better than that of the other deep learning models. Compared with the LeNet, CNN and CBiLSTM methods, the average prediction accuracy of the AWResNet model was improved by 5.84%, 4.49% and 2.95%, respectively. These three methods have fewer convolution layers and could not extract sufficiently helpful feature information from the vibration data. Compared with the MSCNN, the average accuracy of the AWResNet model was improved by 2.07%. Since the MSCNN did not use identity mapping, its accuracy was lower than that of ResNet and AWResNet. The average accuracy of the AWResNet model was improved by 0.64% compared to the ResNet method. This was due to the attention weighting unit's ability to adaptively recalibrate the importance of multi-sensor data, assigning greater weights to channel data containing more degradation information, thereby enhancing useful information and suppressing redundant information.

Confusion Matrix
The confusion matrix was utilized to observe the classification accuracy of the network in each category. The confusion matrices of the ResNet and AWResNet models for the classification of spindle rotation errors are shown in Figure 10. Each row represents the predicted label category, and each column represents the real label category. The data in row i and column j in the figure represent the proportion of categories in row i predicted to correspond to categories in column j. As can be seen from the figure, in 14 of the 20 classes (rotation errors of 5 µm, 5.5 µm, 6.5 µm, 7 µm, 7.5 µm, 8.5 µm, 9 µm, 10 µm, 10.5 µm, 11 µm, 12 µm, 12.5 µm, 13 µm and 14 µm), the classification accuracy of the AWResNet model was higher than that of the ResNet model. For two categories (11.5 µm and 13.5 µm), the classification accuracy of the AWResNet model was equal to that of the ResNet model. Although the accuracy of the AWResNet model was slightly lower than that of the ResNet model for rotation errors of 6 µm, 8 µm, 9.5 µm and 14.5 µm, the AWResNet predictions were generally close to the actual values. For example, 14% of the 9.5 µm samples were predicted to be 9 µm, and 5% of the 14.5 µm samples were predicted to be 14 µm. This was because the spacing between rotation error categories was small, and the vibration signals of the current category and the neighboring categories were very similar, making it difficult for the model to extract the weak feature information. Such prediction results still provide valuable guidance for actual processing. The confusion matrix further demonstrates that AWResNet outperformed ResNet in classification in most categories.
In addition, this paper compared the computational complexity, inference time and number of parameters of the different methods, where the inference time was the time consumed by the 2320 test samples of the dataset. Although LeNet, CNN and CBiLSTM had relatively low computational complexity, inference time and parameter counts, their prediction accuracies were significantly lower than that of the AWResNet model. For the MSCNN, not only was the prediction accuracy lower than that of the AWResNet model, but the multi-scale structure also introduced a large number of parameters. Compared with ResNet, AWResNet introduced few additional parameters, and the inference time for 2320 samples increased by 0.07 s (about 0.03 ms per sample), which is negligible.
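As a rough illustration of how per-category proportions such as those in the confusion matrix are obtained, the sketch below counts (real, predicted) label pairs and normalises the counts of each real category. The indexing used here, `counts[real][predicted]`, is an assumption made for readability and may be the transpose of the layout in Figure 10; the labels and function names are hypothetical.

```python
def confusion_counts(true, predicted, labels):
    """Count samples for every (real label, predicted label) pair."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(true, predicted):
        counts[index[t]][index[p]] += 1
    return counts


def normalise_by_real_class(counts):
    """Turn counts into proportions of each real category."""
    proportions = []
    for row in counts:
        total = sum(row)
        proportions.append([c / total if total else 0.0 for c in row])
    return proportions


# Hypothetical example: one of four 9.5 um samples is predicted as 9 um.
labels = [9.0, 9.5, 10.0]
true = [9.5, 9.5, 9.5, 9.5, 9.0, 9.0]
predicted = [9.5, 9.5, 9.5, 9.0, 9.0, 9.0]
proportions = normalise_by_real_class(confusion_counts(true, predicted, labels))
```

In this toy case the 9.5 µm row reads 25% predicted as 9 µm and 75% predicted correctly, the same kind of reading as the 14% figure quoted above.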

Weight Visualization
To further demonstrate that the AWResNet model can learn weights of varying sizes for different channels to recalibrate the data, thereby highlighting important information and suppressing redundant information, the weights of the last batch in the second adaptive weighting RBU4 module are visualized along 512 channels in Figure 11. The weight visualization indicates that different channels had different weights. The maximum weight was 0.8109 and appeared in channel 286; the minimum weight was 0.3007 and appeared in channel 502. As shown in Figure 11, channels 7, 173, 182, 236 and 286 all had weights greater than 0.7, which corresponded to more important feature information; channels 89, 149, 177, 293, 328, 348, 385, 407, 408, 502 and 510 all had weights less than 0.4, which corresponded to relatively unimportant feature information. The attention weighting units used larger weights to reinforce important features and smaller weights to weaken unnecessary features. This was mainly due to differences in the data collected by the vibration sensors installed at the base end, the spindle end and the bearing end. Different rotation errors or samples are not equally sensitive to different sensor data: some may be more sensitive to data from the bearing end, some to the spindle end and some to the base end. Thus, the data from different channels, and the features extracted from them, differ in their importance for the prediction of the rotation error. The weight visualization further demonstrates that the attention weighting units learn weights of different sizes for different channels, emphasizing important feature information and suppressing redundant information. Thus, AWResNet can accurately extract the critical features of different rotation errors and avoid the influence of similar features.
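To make the idea of per-channel weights concrete, here is a minimal sketch of a channel attention weighting step. It assumes a squeeze-and-excitation-style design (global average pooling, two small fully connected layers, a sigmoid, then channel-wise scaling), which matches the behaviour described above but is not confirmed to be the exact unit used in AWResNet; the residual connection is omitted, and all parameter values and names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(vector, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def channel_attention(feature_maps, w1, b1, w2, b2):
    """Weight each channel by a learned importance in (0, 1)."""
    # Global information extraction: average each channel to one value.
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]
    # Channel interrelationship modelling: FC + ReLU, then FC.
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    # Weight calculation: the sigmoid keeps every weight between 0 and 1.
    weights = [sigmoid(z) for z in dense(hidden, w2, b2)]
    # Weighted output: scale every activation of a channel by its weight.
    recalibrated = [[wt * v for v in ch] for wt, ch in zip(weights, feature_maps)]
    return weights, recalibrated


# Hypothetical 3-channel feature map with 2 activations per channel.
feature_maps = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
w1, b1 = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0]
weights, recalibrated = channel_attention(feature_maps, w1, b1, w2, b2)
```

Because the weights come from a sigmoid, they always lie strictly between 0 and 1, which is consistent with the reported range of 0.3007 to 0.8109.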

Anti-Noise Experiment
In practical industrial applications, the monitoring signals collected by vibration sensors are often affected by environmental noise generated by vibration and friction, which reduces the quality of the monitoring data. To evaluate the prediction performance of AWResNet in noisy environments, we conducted anti-noise experiments by adding Gaussian and Laplace noise signals with different signal-to-noise ratios (SNRs) to the original signal. The SNR can be expressed as follows:

$$\mathrm{SNR} = 10\log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right) \tag{11}$$

where $P_{\mathrm{signal}}$ represents the original signal power and $P_{\mathrm{noise}}$ represents the noise signal power. In the experiment, Gaussian and Laplace noise with SNRs of 12 dB, 10 dB and 8 dB was added to the original signal. Table 3 shows that the prediction accuracy of the AWResNet model was 90.74%, 89.59% and 87.12% when Gaussian noise was added at SNRs of 12 dB, 10 dB and 8 dB, respectively. As the SNR decreased and the noise became stronger, the prediction accuracy of all models decreased, but AWResNet consistently outperformed the other deep learning models. In particular, compared to the ResNet model, AWResNet improved prediction accuracy by 1.41%, 1.89% and 1.11% under the three noise levels. When Laplace noise was added, the prediction accuracies were 90.91%, 89.53% and 87.71% at SNRs of 12 dB, 10 dB and 8 dB, respectively, again higher than those of the other deep learning models. Compared to the ResNet model, the accuracy was improved by 1.73%, 1.82% and 1.77%. It can be concluded that the AWResNet model has better anti-noise robustness and stability. This advantage contributes to the practical application of the AWResNet model.
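One way to generate noise at a prescribed SNR, following Equation (11), is sketched below. The noise power is derived from the signal power and the target SNR, and the drawn noise is then rescaled so that its empirical power matches the target exactly; this rescaling step and the function names are assumptions of the sketch, not details given in the paper.

```python
import math
import random


def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_noise(signal, snr_db, kind="gaussian", rng=None):
    """Return the signal plus Gaussian or Laplace noise at the given SNR in dB."""
    rng = rng or random.Random()
    target = power(signal) / (10 ** (snr_db / 10))  # from SNR = 10*log10(Ps/Pn)
    if kind == "gaussian":
        noise = [rng.gauss(0.0, 1.0) for _ in signal]
    elif kind == "laplace":
        # The difference of two unit exponentials follows a Laplace distribution.
        noise = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in signal]
    else:
        raise ValueError("kind must be 'gaussian' or 'laplace'")
    scale = math.sqrt(target / power(noise))  # match the target power exactly
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, noisy):
    noise = [n - s for s, n in zip(signal, noisy)]
    return 10 * math.log10(power(signal) / power(noise))


# Stand-in signal: a 50 Hz sine sampled at 20 kHz.
clean = [math.sin(2 * math.pi * 50 * t / 20000) for t in range(2000)]
noisy_gauss = add_noise(clean, snr_db=10, kind="gaussian", rng=random.Random(0))
noisy_laplace = add_noise(clean, snr_db=8, kind="laplace", rng=random.Random(0))
```

Lowering `snr_db` from 12 to 8 raises the noise power by a factor of about 2.5 relative to the signal, which is why accuracy drops across all models as the SNR decreases.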

Conclusions
The existing measurement methods of spindle rotation error are usually implemented with the spindle idling, which makes them difficult to use for real-time monitoring and real-time rotation error compensation in actual machining. Prediction methods based on vibration signals neither consider multi-sensor data interaction nor highlight critical sensor data. To solve this problem, a new AWResNet model for spindle rotation error prediction is proposed in this paper. The AWResNet model implements the adaptive recalibration of data in different channels mainly through the attention weighting unit embedded behind the RBU module. To evaluate the effectiveness and superiority of the AWResNet model, we carried out prediction experiments on a machine tool spindle reliability test bed and concluded the following:
(1) The results of the rotation error prediction experiment show that the prediction accuracy of the AWResNet model is 92.72%, which is higher than that of the other deep learning models.
(2) The confusion matrices show that the AWResNet model is more accurate than the ResNet model in 14 out of 20 categories.
(3) Weight visualization shows that the embedded attention weighting unit can learn different weights for each channel. The weights lie between 0.3007 and 0.8109. The AWResNet model can highlight important feature information and suppress redundant information.
(4) The anti-noise experiments indicate that the accuracy of the AWResNet model is higher than that of the ResNet model under three different levels of Gaussian and Laplace noise, and that the AWResNet model has better robustness and stability and greater potential for industrial applications.
In our future research work, we will collect more rotation errors and vibration signals from different types of spindles, which will be utilized to study the generalization and transfer ability of the model.

Figure 2 illustrates the raw vibration data and their corresponding STFT time-frequency representations. Panels (a) and (b) show the time-domain signals of the vibration sensor at the spindle end with a rotation error of 8.5 µm at a spindle speed of 1000 r/min, and of the vibration sensor at the bearing end with a rotation error of 12 µm at a spindle speed of 3000 r/min, respectively. Panels (c) and (d) are the time-frequency domain representations of (a) and (b), respectively. The time-domain signals under different rotation errors are visibly similar. In the STFT time-frequency feature map, the vertical axis indicates frequency and the horizontal axis indicates time. It is therefore difficult to distinguish different categories of rotation error simply by inspecting the time-domain or time-frequency-domain signals.

Figure 2. Time and time-frequency domain characterization of vibration data: (a) Time domain representation of spindle-end vibration signal with a rotation speed of 1000 r/min and a rotation error of 8.5 µm; (b) time domain representation of spindle-end vibration signal with a rotation speed of 3000 r/min and a rotation error of 12 µm; (c) time-frequency domain representation of spindle-end vibration signal with a rotation speed of 1000 r/min and a rotation error of 8.5 µm; (d) time-frequency domain representation of spindle-end vibration signal with a rotation speed of 3000 r/min and a rotation error of 12 µm.

and comprises four parts: global information extraction, channel interrelationship modeling, weight calculation and weighted output.

Figure 6. The spindle rotation error prediction procedure of the proposed AWResNet method.

Figure 7. Test bench for spindle rotation error.

Figure 9. Average prediction accuracy of different methods.

Figure 11. Visualization of different channel weights.

Table 3. Prediction accuracy in different noise environments.