An Integrated Health Condition Detection Method for Rotating Machinery Using Refined Composite Multivariate Multiscale Amplitude-Aware Permutation Entropy

To diagnose faults in rotating machinery effectively, this paper presents an integrated health condition detection approach based on refined composite multivariate multiscale amplitude-aware permutation entropy (RCmvMAAPE), max-relevance and min-redundancy (mRmR), and a whale optimization algorithm-based kernel extreme learning machine (WOA-KELM). The approach contains two crucial parts: health detection and fault recognition. In the health detection stage, multivariate amplitude-aware permutation entropy (mvAAPE) is proposed to detect whether there is a fault in the rotating machinery. If a fault is detected, RCmvMAAPE is then employed to extract the initial fault features that represent the fault states from the multivariate vibration signals. Built on the multivariate and multiscale extensions of amplitude-aware permutation entropy, RCmvMAAPE can effectively extract state information on multiple scales from multichannel series, thereby overcoming the information loss of traditional methods. Then, mRmR is adopted to screen the sensitive features and form sensitive feature vectors, which are input into the WOA-KELM classifier for fault classification. Two typical rotating machinery cases are used to demonstrate the effectiveness of the proposed approach. The experimental results demonstrate that mvAAPE performs well in fault detection and can effectively detect faults of rotating machinery. Meanwhile, the feature extraction method based on RCmvMAAPE and mRmR and the classifier based on WOA-KELM show superior performance in feature extraction and fault recognition, respectively. Compared with other fault identification methods, the proposed method performs better, and the average fault recognition accuracy in both cases exceeds 98%.


Introduction
As one of the most widely applied types of mechanical equipment, rotating machinery plays a vital role in industrial production. Nevertheless, it usually operates in harsh environments such as heavy load and high speed, which greatly increases the risk of faults. These faults may result in equipment shutdown and even casualties if they are not dealt with in time [1,2]. Because of the particularity of industrial machinery, direct disassembly and overhaul will affect normal production. Hence, research on nondisassembly health condition detection technology for rotating machinery has always been a hotspot. When a fault occurs, the internal structure of rotating machinery changes, which affects the frequency and amplitude of the vibration signals. This indicates that the vibration signals contain a wealth of information related to the operating states of rotating machinery [3,4]. Consequently, analyzing vibration signals is a feasible method for fault diagnosis [5].
The essence of vibration signal-based fault diagnosis lies in fault feature extraction and pattern recognition. Of these, how to extract features that represent the working states from the vibration signals is the key to fault diagnosis. In the past decades, time-frequency analysis has been widely applied to feature extraction from vibration signals. Many time-frequency analysis methods, such as empirical mode decomposition (EMD) [6], local mean decomposition (LMD) [7], wavelet packet transform (WPT) [8], and variational mode decomposition (VMD) [9], have been applied to fault diagnosis of rotating machinery. Unfortunately, the vibration signals of rotating machinery usually exhibit nonlinear and nonstationary characteristics, which cause the above methods to have some defects in practical applications. For instance, WPT requires a suitable wavelet kernel function to be chosen [8], and VMD requires the penalty factor α and the number of intrinsic mode functions (IMFs) K to be set before processing the vibration signals [10], so their self-adaptive capacity is poor. EMD has good adaptability, but it suffers from defects such as mode mixing and the end effect. In addition, applying time-frequency analysis methods alone requires operators to have a certain knowledge reserve, which limits the efficiency and application scope of these methods. Therefore, developing an efficient and accurate fault feature extraction tool is urgent and necessary.
Recently, entropy-based theory has been widely adopted as a feature extraction tool in the field of fault diagnosis because of its excellent performance in measuring the nonlinear complexity of time series [11]. Commonly applied entropy methods include approximate entropy (AE) [12], sample entropy (SE) [13], fuzzy entropy (FE) [14], and permutation entropy (PE) [15]. Among them, AE is highly dependent on the data length and is prone to undefined entropy values. SE and FE are time-consuming, so they are not suitable for processing signals with a large amount of data, while PE is favored by many scholars because of its high computational efficiency and strong antinoise ability. Zhang et al. [16] adopted PE to detect bearing faults and proposed a bearing fault diagnosis model based on PE, ensemble empirical mode decomposition, and optimized SVM. Kuai et al. [17] proposed a fault diagnosis method for planetary gears based on PE, CEEMDAN, and ANFIS. CEEMDAN is applied to decompose the vibration signal of planetary gears, and PE is used to extract the characteristics of the obtained IMFs. Finally, ANFIS is used as a classifier to complete fault identification. Nevertheless, PE also has some inherent defects. For example, it ignores the influence of the amplitude information of signals on the entropy value, which may discard crucial information.
To address this problem, Azami et al. [18] presented amplitude-aware permutation entropy (AAPE), which is sensitive not only to the frequency but also to the amplitude of signals. The excellent performance of AAPE has been verified through experiments on simulated and biological signals.
However, AAPE also has some shortcomings that cannot be ignored. Firstly, AAPE only measures the complexity of the measured signal on one temporal scale and thereby cannot capture the long-range correlation of the signal [19]. To address this issue, based on multiscale entropy theory [19], multiscale amplitude-aware permutation entropy (MAAPE) was proposed to extract the fault information of rolling bearings [20]. Unfortunately, MAAPE has poor stability, especially for short time series. This defect causes MAAPE to produce unreliable entropy values at high scales. Secondly, AAPE cannot extract fault features from multichannel vibration signals, which limits its ability to extract fault information for large equipment. For large equipment, the long transmission path attenuates the vibration impulse to a certain extent. In other words, fault information is lost. Therefore, the vibration signal collected by a single channel usually does not provide enough fault information to identify the fault type [21]. It is necessary to improve AAPE so that it can extract fault features from multichannel vibration signals synchronously.
To solve the aforementioned defects, refined composite multivariate multiscale amplitude-aware permutation entropy (RCmvMAAPE) is presented in this paper. Compared with the existing AAPE methods, the proposed RCmvMAAPE has two main improvements. Firstly, the refined composite multiscale method is employed to replace the traditional multiscale method in MAAPE to overcome the entropy instability problem [22]. In addition, on the basis of multidimensional embedding reconstruction theory [23], AAPE is extended to multivariate AAPE (mvAAPE) to measure the complexity of multichannel vibration signals. With these improvements, RCmvMAAPE overcomes the abovementioned defects and can stably measure the complexity of multichannel signals on multiple scales. The performance of RCmvMAAPE is comprehensively tested with a variety of synthetic signals in this paper, and the results indicate that RCmvMAAPE can effectively measure the complexity of multivariate signals. In view of these advantages, this paper employs RCmvMAAPE to extract the fault features of multichannel vibration signals of rotating machinery.
The fault features extracted by RCmvMAAPE are distributed over multiple scales and form a high-dimensional feature vector. Among them, some sensitive features can effectively represent the fault information, but some redundant features not only affect the accuracy of subsequent fault classification but also reduce the diagnosis efficiency. For this reason, it is necessary to compress the high-dimensional fault features to improve the fault recognition rate. Max-relevance and min-redundancy (mRmR) is a typical feature selection method based on spatial search, which uses mutual information to measure the relevance and redundancy of features [24]. Maximum relevance means that a feature has a large correlation with the sample category, that is, it reflects the sample category information to the greatest extent. Minimum redundancy means that the correlation between features is the smallest, that is, the redundancy of features is the smallest. This paper adopts mRmR to select the sensitive features and form sensitive feature vectors that represent the fault state of rotating machinery.
Afterward, different fault states of rotating machinery are identified according to the sensitive feature vectors, namely, pattern recognition. At this stage, a classifier with high computational efficiency and good generalization performance is needed. Kernel extreme learning machine (KELM) [25] is a machine learning method that combines ELM with a kernel function. While retaining the high calculation efficiency of ELM, the introduction of the kernel function gives KELM stronger generalization ability than commonly used classifiers such as BP neural network (BP) [26], support vector machine (SVM) [27], and extreme learning machine (ELM) [28] when dealing with linearly inseparable problems. However, KELM is sensitive to parameter settings because of the kernel function, so a suitable optimization algorithm is needed to determine its best parameters. Commonly used optimization algorithms include particle swarm optimization (PSO) [29], ant colony optimization (ACO) [30], and whale optimization algorithm (WOA) [31]. Among them, WOA has attracted more and more attention because of its simple operation, few adjustable parameters, and strong capability to jump out of local optima. Therefore, WOA is utilized to iteratively select the optimal parameters of KELM and build a classifier based on WOA-KELM. The low-dimensional sensitive feature vectors are input into WOA-KELM to judge the fault type of the rotating machinery.
Consequently, a new integrated health detection method for rotating machinery is proposed, which includes two parts: fault detection and fault identification. In the fault detection stage, mvAAPE is employed to extract the features of the vibration signals to determine whether the rotating machinery is malfunctioning. By introducing the key link of fault detection, unnecessary disassembly and maintenance of the equipment can be avoided, and damage to the equipment can be reduced. In the fault identification stage, the presented method based on RCmvMAAPE, mRmR, and WOA-KELM is applied to diagnose different fault types and fault severities of rotating machinery. Two examples are used to demonstrate the performance of the proposed method and its superiority over other existing methods. The rest of the paper is arranged as follows: in Sections 2 and 3, the basic theory of RCmvMAAPE and WOA-KELM is introduced in detail; Section 4 presents the steps of the proposed approach; in Section 5, two typical cases are used in experiments to verify the performance of the proposed approach; finally, this paper is summarized in Section 6.

The Basic Theory of RCmvMAAPE
AAPE is built on permutation entropy (PE) [15]. For a given time series $X = \{x_i\}$, $i = 1, 2, \ldots, N$, at any time point $t$, the $m$-dimensional reconstruction vector can be obtained as

$$X_t^{m,d} = \{x_t, x_{t+d}, \ldots, x_{t+(m-1)d}\}, \tag{1}$$

where $m$ denotes the embedding dimension and $d$ denotes the time delay.
For each reconstruction vector, arranging its elements in ascending order yields the permutation $\pi_{r_0, r_1, \ldots, r_{m-1}}$, which fulfills

$$x_{t+(j_1-1)d} \le x_{t+(j_2-1)d} \le \cdots \le x_{t+(j_m-1)d}, \tag{2}$$

where $j_*$ represents the column index of each element in the reconstructed component. Accordingly, there are $m!$ possible permutation patterns, of which the $i$-th permutation is marked as $\pi_i$. The relative frequency of $\pi_i$ can be expressed as

$$p(\pi_i) = \frac{g(\pi_i)}{N - (m-1)d}, \tag{3}$$

where $g(\pi_i)$ represents the function that counts the number of occurrences of $\pi_i$ among the vectors $X_t^{m,d}$. The value of $g(\pi_i)$ increases by 1 whenever the permutation order of the elements of $X_t^{m,d}$ is $\pi_i$. Consequently, based on the definition of Shannon entropy, PE can be defined as

$$\mathrm{PE}(X, m, d) = -\sum_{i=1}^{m!} p(\pi_i) \ln p(\pi_i). \tag{4}$$

Nevertheless, PE has some non-negligible deficiencies that limit its ability to describe the irregularity of the series. Firstly, from the theoretical point of view, the original PE algorithm only considers the effect of the ordinal structure of the time series on the entropy value, and the amplitude information of each mapped element in the series is ignored. Secondly, when there are elements with equal amplitude, their influence on the entropy value cannot be accurately estimated. In view of these defects, Azami proposed AAPE to significantly enhance the performance of PE [18]. The basic principle of the AAPE algorithm is as follows. Supposing that the starting value of $p(\pi_i)$ is 0, for the reconstruction vector $X_t^{m,d}$, as the time $t$ increases from 1 to $N - (m-1)d$, the value of $p(\pi_i)$ is updated whenever the permutation is $\pi_i$:

$$p(\pi_i) = p(\pi_i) + \frac{\alpha}{m}\sum_{k=1}^{m}\left|x_{t+(k-1)d}\right| + \frac{1-\alpha}{m-1}\sum_{k=2}^{m}\left|x_{t+(k-1)d} - x_{t+(k-2)d}\right|, \tag{5}$$
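where $\alpha \in [0, 1]$ denotes the adjustment coefficient, which adjusts the weight between the average amplitude of the time series and the deviation between the amplitudes. Thus, the probability of $\pi_i$ is

$$p(\pi_i) = \frac{p(\pi_i)}{\sum_{t=1}^{N-(m-1)d}\left[\frac{\alpha}{m}\sum_{k=1}^{m}\left|x_{t+(k-1)d}\right| + \frac{1-\alpha}{m-1}\sum_{k=2}^{m}\left|x_{t+(k-1)d} - x_{t+(k-2)d}\right|\right]}. \tag{6}$$

The AAPE of the time series $x$ can be defined as

$$\mathrm{AAPE}(x, m, d) = -\sum_{i=1}^{m!} p(\pi_i) \ln p(\pi_i). \tag{7}$$

Mathematical Problems in Engineering

2.1.2. mvAAPE.
To describe the complexity of multichannel time series, it is necessary to extend AAPE to multivariate analysis, which leads to multivariate amplitude-aware permutation entropy (mvAAPE). The definition of mvAAPE is as follows:
(1) Given a $p$-channel series $X = \{x_{c,1}, x_{c,2}, \ldots, x_{c,i}, \ldots, x_{c,N}\}$, $c = 1, 2, \ldots, p$, phase space reconstruction is performed for each channel as follows:

$$Z_{c,t}^{m,d} = \{x_{c,t}, x_{c,t+d}, \ldots, x_{c,t+(m-1)d}\}, \quad c = 1, 2, \ldots, p. \tag{8}$$

(2) Arrange the elements of each reconstruction vector $Z_{c,t}^{m,d}$ in ascending order to obtain the corresponding permutation.
(3) For the $c$-th channel, supposing that the starting value of $p(\pi_{c,i})$ is 0, for the reconstruction vectors $Z_{c,t}^{m,d}$, as $t$ gradually increases from 1 to $N - (m-1)d$, the value of $p(\pi_{c,i})$ is updated whenever $\pi_{c,i}$ appears:

$$p(\pi_{c,i}) = p(\pi_{c,i}) + \frac{\alpha}{m}\sum_{k=1}^{m}\left|x_{c,t+(k-1)d}\right| + \frac{1-\alpha}{m-1}\sum_{k=2}^{m}\left|x_{c,t+(k-1)d} - x_{c,t+(k-2)d}\right|. \tag{9}$$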
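(4) Calculate the relative frequency of the $i$-th permutation in the $c$-th channel, $\pi_{c,i}$, as follows:

$$p(\pi_{c,i}) = \frac{p(\pi_{c,i})}{\sum_{c=1}^{p}\sum_{t=1}^{N-(m-1)d}\left[\frac{\alpha}{m}\sum_{k=1}^{m}\left|x_{c,t+(k-1)d}\right| + \frac{1-\alpha}{m-1}\sum_{k=2}^{m}\left|x_{c,t+(k-1)d} - x_{c,t+(k-2)d}\right|\right]}. \tag{10}$$

For a $p$-channel time series, $p(\pi_{c,i})$ satisfies $\sum_{c=1}^{p}\sum_{i=1}^{m!} p(\pi_{c,i}) = 1$.
(5) The probability of the $i$-th pattern $\pi_i$ in the $p$-channel time series $X$ can be calculated as follows:

$$p(\pi_i) = \sum_{c=1}^{p} p(\pi_{c,i}). \tag{11}$$

(6) Based on the definition of Shannon entropy, mvAAPE is expressed as

$$\mathrm{mvAAPE}(X, m, d) = -\sum_{i=1}^{m!} p(\pi_i) \ln p(\pi_i). \tag{12}$$

mvAAPE thus extends the application of AAPE from univariate analysis to multivariate analysis. However, mvAAPE only analyzes the multichannel time series on one temporal scale, while measured time series often contain information on multiple scales. Therefore, key information is lost if only a single-scale analysis is conducted. In response to this problem, mvMAAPE, which is able to analyze time series on multiple scales, is proposed.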
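The AAPE and mvAAPE definitions above can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`aape`, `mvaape`) are hypothetical, the natural logarithm is assumed, ties between equal amplitudes are simply broken by index order, and the defaults m = 5, d = 1, α = 0.5 are the values this paper adopts later.

```python
import math
from collections import defaultdict


def _weighted_patterns(x, m=5, d=1, alpha=0.5):
    """Accumulate the amplitude-aware weight of each ordinal pattern (one channel)."""
    weights = defaultdict(float)
    for t in range(len(x) - (m - 1) * d):
        v = [x[t + k * d] for k in range(m)]
        # Ordinal pattern: the indices that sort the vector in ascending order
        # (assumption: equal values are ordered by their position).
        pattern = tuple(sorted(range(m), key=lambda k: v[k]))
        mean_amp = sum(abs(a) for a in v) / m
        mean_diff = sum(abs(v[k] - v[k - 1]) for k in range(1, m)) / (m - 1)
        weights[pattern] += alpha * mean_amp + (1 - alpha) * mean_diff
    return weights


def _shannon(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def aape(x, m=5, d=1, alpha=0.5):
    """Univariate amplitude-aware permutation entropy."""
    w = _weighted_patterns(x, m, d, alpha)
    total = sum(w.values())
    if total == 0:
        return 0.0
    return _shannon(v / total for v in w.values())


def mvaape(channels, m=5, d=1, alpha=0.5):
    """Multivariate AAPE; `channels` is a list of equal-length sequences."""
    pooled = defaultdict(float)
    for x in channels:
        for pattern, w in _weighted_patterns(x, m, d, alpha).items():
            pooled[pattern] += w  # marginal weight of the pattern over all channels
    total = sum(pooled.values())
    if total == 0:
        return 0.0
    return _shannon(w / total for w in pooled.values())
```

A monotonic ramp contains a single ordinal pattern, so its entropy is zero, whereas a signal whose patterns carry equal weight reaches the maximum for the patterns that occur.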

mvMAAPE.
The principle of mvMAAPE is as follows:
(1) For a $p$-channel series $U = \{u_{k,1}, u_{k,2}, \ldots, u_{k,i}, \ldots, u_{k,L}\}$, $k = 1, 2, \ldots, p$, the multivariate coarse-grained time series at scale factor $\tau$ is defined as follows:

$$y_{k,j}^{\tau} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} u_{k,i}, \quad 1 \le j \le \lfloor L/\tau \rfloor, \ k = 1, 2, \ldots, p. \tag{13}$$

When $\tau > 1$, the multivariate series is divided into coarse-grained time series of length $\lfloor L/\tau \rfloor$.
(2) Calculate the mvAAPE of the multivariate coarse-grained time series at each scale factor $\tau$, and the result is as follows:

$$\mathrm{mvMAAPE}(U, m, d, \tau) = \mathrm{mvAAPE}(y^{\tau}, m, d). \tag{14}$$

mvMAAPE overcomes the shortcoming that PE does not consider amplitude information; meanwhile, the combination with multivariate analysis improves the utilization of multichannel information, so mvMAAPE is essentially an assessment of the irregularity of multichannel data. The evaluation principle can be summarized in two aspects: (1) if the entropy value of the multivariate series X is greater than that of series Y on most scale factors, X is more random than Y and more prone to dynamic mutations. (2) If the entropy value of X decreases significantly as the scale factor increases, the information included in X mainly appears at smaller scale factors, as for a random white noise signal. mvMAAPE considers the interrelationship of the time series in multichannel data and comprehensively evaluates each dimension of the multichannel series. Therefore, mvMAAPE can effectively detect mutations in multichannel series. mvMAAPE realizes multivariate and multiscale analysis by extending the mvAAPE method to multiple scales, so as to obtain more useful information. However, the coarse-graining method adopted by mvMAAPE has a serious defect, which leads to incomplete information analysis. For instance, the calculation of mvMAAPE only considers the coarse-grained series starting from $u_{k,1}$ and ignores the coarse-grained series starting from other points, such as $u_{k,2}$, at scale factor $\tau$.
The remaining τ − 1 coarse-grained time series also contain key information, and neglecting them leads to insufficient analysis and degrades the analysis result. Therefore, the refined composite multiscale coarse-graining approach is employed to achieve accurate and sufficient analysis. The implementation principle of the coarse-graining method is presented in Figure 1.
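To make the difference between the two coarse-graining schemes concrete, the following small Python sketch averages nonoverlapping windows of a single channel, either from the first sample only (traditional) or from each of the τ possible starting points (refined composite). The helper name `coarse_grain` and the 0-based `offset` argument are illustrative assumptions; for a multichannel series the same operation would be applied to every channel.

```python
def coarse_grain(u, tau, offset=0):
    """Average consecutive nonoverlapping windows of length tau,
    starting at 0-based index `offset`."""
    n = (len(u) - offset) // tau
    return [sum(u[offset + j * tau: offset + (j + 1) * tau]) / tau
            for j in range(n)]


u = list(range(1, 11))                                 # toy series 1, 2, ..., 10
standard = coarse_grain(u, 3)                          # starts at u_1 only
shifted = [coarse_grain(u, 3, a) for a in range(3)]    # all tau starting points
```

Here `standard` is `[2.0, 5.0, 8.0]`, while `shifted` additionally contains `[3.0, 6.0, 9.0]` and `[4.0, 7.0]`, which are the series the traditional scheme ignores.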

Refined Composite
The detailed procedures of RCmvMAAPE are described as follows:
(1) For a $p$-channel series $U = \{u_{k,1}, u_{k,2}, \ldots, u_{k,i}, \ldots, u_{k,L}\}$, $k = 1, 2, \ldots, p$, the coarse-grained multivariate time series are computed for a given scale factor $\tau$, and the elements of the $a$-th coarse-grained time series $Y_a^{\tau} = \{y_{k,a,1}^{\tau}, y_{k,a,2}^{\tau}, \ldots\}$ are computed by

$$y_{k,a,j}^{\tau} = \frac{1}{\tau}\sum_{i=(j-1)\tau+a}^{j\tau+a-1} u_{k,i}, \quad 1 \le j \le \left\lfloor \frac{L-a+1}{\tau} \right\rfloor, \ 1 \le a \le \tau. \tag{15}$$

For the scale factor $\tau$, there are $\tau$ different coarse-grained multivariate time series.
(2) For each coarse-grained multivariate series $Y_a^{\tau}$, the marginal relative frequencies $p_a^{\tau}(\pi_j)$ are computed. Then, the average relative frequencies $\bar{p}(\pi_j)$ can be acquired by

$$\bar{p}(\pi_j) = \frac{1}{\tau}\sum_{a=1}^{\tau} p_a^{\tau}(\pi_j). \tag{16}$$

(3) The RCmvMAAPE of the original multivariate time series is computed as follows:

$$\mathrm{RCmvMAAPE}(U, m, d, \tau) = -\sum_{j=1}^{m!} \bar{p}(\pi_j) \ln \bar{p}(\pi_j). \tag{17}$$

In the RCmvMAAPE approach, there are three key parameters: $m$, $\alpha$, and $d$. If the embedding dimension $m$ is too small, the reconstructed vector includes too few states and the algorithm loses its validity and significance, whereas if $m$ is too large, the phase space reconstruction homogenizes the time series, which not only increases the amount of calculation but also fails to reflect slight changes in the time series. According to references [18,29], AAPE for univariate analysis usually sets the embedding dimension to 3-7, and the optimal parameters of the univariate and multivariate analysis methods are generally consistent, so this article sets the embedding dimension to $m = 5$. The adjustment coefficient $\alpha$ is usually set to 0.5 according to reference [18], so this article sets $\alpha = 0.5$. The time delay has little effect on the performance of the algorithm, so in this article, $d = 1$.
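The three steps above can be sketched end to end in Python. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the natural logarithm and index-order tie-breaking are assumptions, and the maximum scale of 20 in `feature_vector` is only a guess based on the scale range discussed in the performance analysis. The defaults m = 5, d = 1, α = 0.5 follow the parameter choices stated above.

```python
import math
import random
from collections import defaultdict


def coarse_grain(u, tau, offset):
    """Step (1): a-th coarse-grained series, with offset = a - 1 (0-based)."""
    n = (len(u) - offset) // tau
    return [sum(u[offset + j * tau: offset + (j + 1) * tau]) / tau for j in range(n)]


def pattern_probs(channels, m, d, alpha):
    """Marginal amplitude-aware pattern probabilities of a multichannel series."""
    w = defaultdict(float)
    for x in channels:
        for t in range(len(x) - (m - 1) * d):
            v = [x[t + k * d] for k in range(m)]
            pattern = tuple(sorted(range(m), key=lambda k: v[k]))
            amp = sum(abs(a) for a in v) / m
            diff = sum(abs(v[k] - v[k - 1]) for k in range(1, m)) / (m - 1)
            w[pattern] += alpha * amp + (1 - alpha) * diff
    total = sum(w.values())
    return {p: val / total for p, val in w.items()} if total > 0 else {}


def rcmvmaape(channels, tau, m=5, d=1, alpha=0.5):
    avg = defaultdict(float)
    for a in range(tau):
        grained = [coarse_grain(x, tau, a) for x in channels]
        # Step (2): average the pattern probabilities over the tau shifted series.
        for pattern, p in pattern_probs(grained, m, d, alpha).items():
            avg[pattern] += p / tau
    # Step (3): Shannon entropy of the averaged probabilities.
    return -sum(p * math.log(p) for p in avg.values() if p > 0)


def feature_vector(channels, max_scale=20, **kw):
    """Entropy at scales 1..max_scale (max_scale is an assumed setting)."""
    return [rcmvmaape(channels, tau, **kw) for tau in range(1, max_scale + 1)]


rng = random.Random(1)
demo = [[rng.gauss(0.0, 1.0) for _ in range(512)] for _ in range(2)]  # toy 2-channel noise
features = feature_vector(demo, max_scale=5)
```

Averaging the probabilities before taking the entropy, instead of averaging τ separate entropy values, is what distinguishes the refined composite scheme from a plain composite one.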

Performance Analysis.
To validate the performance of RCmvMAAPE, it is compared with other multivariate analysis approaches to show its advantages in measuring the complexity of multichannel signals. White Gaussian noise (WGN) and 1/f noise are two signals that are widely adopted to evaluate univariate and multivariate analysis methods. Compared with WGN, the power spectrum of 1/f noise is more complicated and includes more mode information. WGN is randomly distributed, so the entries of its state transition matrix are approximately equally probable. On the contrary, 1/f noise is a long-range correlated signal: its irregularity is lower than that of WGN, but its complexity is higher. For generality, WGN and 1/f noise are combined into three-channel signals to analyze RCmvMAAPE, mvMAAPE, RCmvMSE, and RCmvMPE. The four cases are (a) three-channel WGN; (b) three-channel 1/f noise; (c) two-channel WGN and one-channel 1/f noise; and (d) two-channel 1/f noise and one-channel WGN.
There are 25 groups (length 2048) of the synthesized signals in each case.
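For readers who want to reproduce signals of this kind, the sketch below builds the four three-channel cases with the Python standard library. The paper does not state how the 1/f noise was generated; the Voss-McCartney scheme used here (a sum of random sources, source k being redrawn every 2^k samples) is a swapped-in approximation, and `make_case`, `n_sources`, and the seeds are illustrative assumptions.

```python
import random


def white_noise(n, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def pink_noise(n, rng, n_sources=12):
    """Approximate 1/f noise (Voss-McCartney): source k is redrawn every
    2**k samples and all sources are summed."""
    sources = [rng.gauss(0.0, 1.0) for _ in range(n_sources)]
    out = []
    for i in range(n):
        for k in range(n_sources):
            if i % (2 ** k) == 0:
                sources[k] = rng.gauss(0.0, 1.0)
        out.append(sum(sources))
    return out


def make_case(n_wgn, n_pink, n=2048, seed=0):
    """One three-channel group: n_wgn WGN channels followed by n_pink 1/f channels."""
    rng = random.Random(seed)
    return ([white_noise(n, rng) for _ in range(n_wgn)]
            + [pink_noise(n, rng) for _ in range(n_pink)])


cases = {
    "a": make_case(3, 0),
    "b": make_case(0, 3),
    "c": make_case(2, 1),
    "d": make_case(1, 2),
}
```

Calling `make_case` with 25 different seeds would give the 25 groups per case described above.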
To verify the advantages of the proposed approach in measuring the complexity of multivariate signals, the RCmvMAAPE, mvMAAPE, RCmvMPE, and RCmvMSE of the four kinds of multivariate synthetic signals are calculated.
The mean and standard deviation curves of the four methods are shown in Figure 2. The standard deviation of RCmvMAAPE is significantly smaller than that of mvMAAPE and RCmvMSE, which indicates that the stability and robustness of RCmvMAAPE are stronger than those of mvMAAPE and RCmvMSE. It can be clearly seen from the figure that RCmvMAAPE effectively separates the four multivariate synthetic signals, which shows that RCmvMAAPE has better separation performance. Moreover, the fluctuation of the RCmvMPE curve is greater than that of RCmvMAAPE, and the fluctuation for case (d) is especially obvious.
This phenomenon shows that RCmvMAAPE is more stable when analyzing multivariate data and is not prone to large errors. In addition, when the scale factor is 14-20, RCmvMSE cannot effectively distinguish between (b) and (d). Similarly, mvMAAPE cannot effectively distinguish (a) and (c); meanwhile, its entropy values for the four multivariate signals fluctuate strongly, which also verifies that the traditional coarse-graining method is prone to large errors. In short, compared with the other three multivariate analysis methods, RCmvMAAPE has better separation performance and robustness and can therefore better characterize the complexity of multivariate signals.

Kernel Extreme Learning Machine.
Kernel extreme learning machine is a training algorithm based on the single-hidden-layer feedforward neural network. It does not require repeated adjustment of the hidden layer parameters [28]. In addition, the conventional single-hidden-layer feedforward neural network parameter training problem is transformed into solving linear equations, and the minimum-norm least-squares solution obtained is used as the network output weight. The whole training process is completed in one pass. Therefore, the training speed is greatly improved and the generalization performance is better.
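For input and output data, the goal of ELM is to simultaneously minimize the training error and the output weight norm, which can be expressed as follows:

$$\min \ \|H\beta - T\|^{2} \quad \text{and} \quad \min \ \|\beta\|, \tag{18}$$

where $\beta$ is the connection weight vector between the hidden layer and the output layer, $h(x_i)$ is the kernel mapping of the hidden layer, $H = [h(x_1)^{T}, \ldots, h(x_N)^{T}]^{T}$ is the hidden layer output matrix, and $T = [t_1, \ldots, t_N]^{T}$ is the target output. The optimization problem of equation (18) is simplified to the following constrained problem:

$$\min \ L = \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\xi_i^{2}, \quad \text{s.t.} \ h(x_i)\beta = t_i - \xi_i, \ i = 1, 2, \ldots, N, \tag{19}$$

where $\xi_i$ stands for the training error and $C$ denotes the penalty factor. Using the theory of orthogonal projection, the training process of ELM is equivalent to solving the following dual optimization problem:

$$L = \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\xi_i^{2} - \sum_{i=1}^{N}\alpha_i\left(h(x_i)\beta - t_i + \xi_i\right), \tag{20}$$

where $\alpha_i$ is the Lagrange multiplier. Setting the partial derivatives of $L$ to zero gives

$$\frac{\partial L}{\partial \beta} = 0 \ \Rightarrow \ \beta = \sum_{i=1}^{N}\alpha_i h(x_i)^{T} = H^{T}\alpha, \tag{21}$$

$$\frac{\partial L}{\partial \xi_i} = 0 \ \Rightarrow \ \alpha_i = C\xi_i, \tag{22}$$

$$\frac{\partial L}{\partial \alpha_i} = 0 \ \Rightarrow \ h(x_i)\beta - t_i + \xi_i = 0. \tag{23}$$

Substituting formulas (21) and (22) into formula (23) gives the equivalent form

$$\left(\frac{I}{C} + HH^{T}\right)\alpha = T. \tag{24}$$

The corresponding output function of ELM is described as follows:

$$f(x) = h(x)\beta = h(x)H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1}T. \tag{25}$$

It can be seen from formula (25) that the term $I/C$ is added to the main diagonal of $HH^{T}$, so its eigenvalues cannot be 0. Then, the weight vector $\beta$ is computed. In this way, ELM is more stable and has strong generalization ability. The KELM algorithm is obtained by introducing the kernel function into ELM. The Mercer condition is applied to define the kernel function matrix of KELM as follows:

$$\Omega = HH^{T}, \quad \Omega_{i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j), \tag{26}$$

where $K(x_i, x_j)$ denotes the kernel function and $\Omega_{i,j}$ is the element of the kernel matrix in row $i$ and column $j$. Therefore, the actual output of the KELM model is

$$f(x) = \left[K(x, x_1), \ldots, K(x, x_N)\right]\left(\frac{I}{C} + \Omega\right)^{-1}T. \tag{27}$$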
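The closed-form training step can be illustrated with a compact Python sketch: training solves the linear system with matrix I/C + Ω once, and prediction multiplies the kernel row of a new sample by the stored solution. The RBF kernel, the parameter name `gamma`, the one-hot target encoding, and the toy data are assumptions made for illustration; the paper only states that a kernel function with a kernel parameter is used.

```python
import math


def rbf(x, y, gamma):
    """Assumed kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [[M[i][n + j] / M[i][i] for j in range(len(B[0]))] for i in range(n)]


class KELM:
    def __init__(self, C=100.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, labels):
        self.X = X
        self.classes = sorted(set(labels))
        # One-hot targets T, one column per class.
        T = [[1.0 if y == c else 0.0 for c in self.classes] for y in labels]
        n = len(X)
        A = [[rbf(X[i], X[j], self.gamma) + (1.0 / self.C if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.coef = solve(A, T)  # (I/C + Omega)^{-1} T
        return self

    def predict(self, X):
        out = []
        for x in X:
            k = [rbf(x, xi, self.gamma) for xi in self.X]
            scores = [sum(k[i] * self.coef[i][j] for i in range(len(k)))
                      for j in range(len(self.classes))]
            out.append(self.classes[scores.index(max(scores))])
        return out


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]   # toy feature vectors
y_train = ["normal", "normal", "normal", "fault", "fault", "fault"]
model = KELM(C=100.0, gamma=1.0).fit(X_train, y_train)
pred = model.predict([[0.5, 0.5], [5.5, 5.5]])
```

Because I/C is added to the diagonal of the kernel matrix, the system stays solvable even when training samples are nearly identical, which is the stability argument made for formula (25).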

Whale Optimization Algorithm.
Whale optimization algorithm (WOA) is a novel heuristic search optimization algorithm [31]. Its advantages lie in its simple operation, few adjustable parameters, and strong capability to jump out of local optima. The algorithm mainly imitates three behaviors of the humpback whale: encircling prey, hunting prey, and searching for prey. WOA supposes that the current best candidate solution is the target prey or is close to the optimum. After the best search agent is defined, the other search agents try to update their positions toward it. The position update formula of WOA is as follows:

$$D = \left|C \cdot X^{*}(t) - X(t)\right|, \qquad X(t+1) = X^{*}(t) - A \cdot D,$$

where $A$ and $C$ are the coefficients; $t$ is the number of iterations; $X(t)$ represents the current position vector of the whale; and $X^{*}(t)$ denotes the best whale position vector so far. The mathematical expressions of $A$ and $C$ are as follows:

$$A = 2a \cdot r_1 - a, \qquad C = 2r_2, \qquad a = 2 - \frac{2t}{T_{\max}},$$

where $T_{\max}$ represents the maximum number of iterations and $r_1$ and $r_2$ are random numbers in the interval [0, 1]. The value of $a$ decreases linearly from 2 to 0 as the iteration number $t$ increases.
When hunting, humpback whales not only swim toward the prey along a spiral but also contract the encircling circle. The position of a whale is therefore updated with 50% probability by the contraction mechanism and with 50% probability by the spiral model; that is, with $p$ a random number in [0, 1],

$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5, \\ D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & p \ge 0.5, \end{cases}$$

where $D' = |X^{*}(t) - X(t)|$ denotes the distance between the whale and its prey; the constant $b$ is used to define the spiral shape; and $l$ is a random number in [−1, 1]. When the humpback whale attacks the prey, the parameter $a$ is reduced linearly, so the fluctuation range of $A$, which lies in the interval [−a, a], shrinks continuously as $a$ decreases. When the value of $A$ is in the interval [−1, 1], the next position of the search agent can be any position between its current position and the prey position. Simulating the attacking behavior of the humpback whale in this way provides the exploitation capability of local search. When the random value of $A$ is greater than 1 or less than −1, the search agent moves away from the prey to search for a more suitable prey, which provides the exploration capability of the whale optimization algorithm in the global search.
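The update rules above can be put together in a minimal WOA loop. The sketch below is an illustration under assumptions, not the authors' implementation: `A` and `C` are drawn as scalars per agent, positions are clipped to the search bounds, the best agent is refreshed as soon as an improvement is found, and for |A| ≥ 1 the agent moves relative to a randomly chosen agent, which follows the original WOA description [31] rather than a formula given in this section. The sphere function is only a stand-in objective.

```python
import math
import random


def woa(fitness, dim, lower, upper, n_agents=20, t_max=100, b=1.0, seed=0):
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    X = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_agents)]
    best = min(X, key=fitness)[:]
    best_fit = fitness(best)
    history = [best_fit]
    for t in range(t_max):
        a = 2.0 - 2.0 * t / t_max              # decreases linearly from 2 towards 0
        for i in range(n_agents):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: contract around the best agent (exploitation);
                # otherwise move relative to a random agent (exploration).
                ref = best if abs(A) < 1 else X[rng.randrange(n_agents)]
                X[i] = [clip(r - A * abs(C * r - x)) for r, x in zip(ref, X[i])]
            else:
                # Spiral update around the best agent.
                l = rng.uniform(-1.0, 1.0)
                X[i] = [clip(abs(r - x) * math.exp(b * l) * math.cos(2 * math.pi * l) + r)
                        for r, x in zip(best, X[i])]
            f = fitness(X[i])
            if f < best_fit:
                best, best_fit = X[i][:], f
        history.append(best_fit)
    return best, best_fit, history


def sphere(x):
    return sum(v * v for v in x)


best, best_fit, history = woa(sphere, dim=2, lower=-5.0, upper=5.0)
```

For the classifier described in this paper, the position vector would hold the penalty factor and the kernel parameter of KELM, and `fitness` would return a classification error on the training data; that wiring is not shown here.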

Whale Optimization Algorithm-Based Kernel Extreme Learning Machine (WOA-KELM).
Considering that the performance of KELM is easily affected by the penalty factor and the kernel parameters, a new method that optimizes the kernel extreme learning machine with the whale optimization algorithm is proposed. The optimization procedure is presented in Figure 3, and the detailed steps are as follows: (1) Input the training set and testing set samples and normalize the two sample sets, respectively.

The Proposed Approach
In this study, because RCmvMAAPE performs well in processing multivariate time series, it is used to extract the fault features of rotating machinery. Combining it with mRmR and WOA-KELM, an integrated health condition detection method for rotating machinery is proposed. The method includes fault detection and health condition recognition.

Fault Detection.
The ability of mvAAPE to measure the complexity of multivariate nonlinear data and the probability of dynamic mutation is the basis for fault diagnosis. Since mvAAPE is built on mvPE, it inherits the ability of mvPE to detect failures. The difference between the mvAAPE values corresponding to different states is a prerequisite for fault screening.
The mvAAPE values of the rotating machinery vibration signals in all fault states are greater than that in the normal state, and the difference is obvious. Therefore, mvAAPE can be applied for fault screening. To make the screening criterion intuitive, a threshold based on mvAAPE is set. When the mvAAPE value of the vibration signal of rotating machinery in an unknown state is less than the threshold, the state is determined to be healthy. Conversely, if it is greater than the threshold, it is determined that there is a fault.

Health Condition Recognition.
After fault detection, if a fault is detected in the rotating machinery, further analysis is required to judge the type and severity of the fault. Firstly, RCmvMAAPE is employed to acquire the nonlinear complexity information of the faulty multichannel vibration signals and form the initial fault feature vectors. However, the RCmvMAAPE values at all scales may include redundant information, so it is necessary to compress the feature dimension to obtain sensitive feature vectors. mRmR is a dimensionality reduction algorithm for nonlinear data that uses mutual information to measure the relevance and redundancy of features and thereby ranks the features by importance. Therefore, mRmR is utilized to screen the initial fault features and obtain sensitive feature vectors. Finally, the whale optimization algorithm is utilized to optimize the kernel function parameter and penalty factor of KELM to construct the optimal classification model and accomplish the health condition recognition of rotating machinery.
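The feature screening step can be illustrated by a greedy mRmR selection written with the standard library. Several details are assumptions because the paper does not specify them: features are discretized into equal-width bins before mutual information is estimated, the score is relevance minus mean redundancy (the difference form of the criterion), and the names `discretize`, `mutual_info`, and `mrmr` are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=5):
    """Equal-width binning (assumed preprocessing for the MI estimate)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_info(a, b):
    """Mutual information (in nats) of two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr(features, labels, k):
    """features: one list of values per feature. Returns k selected feature indices."""
    cols = [discretize(f) for f in features]
    relevance = [mutual_info(c, labels) for c in cols]
    selected = [max(range(len(cols)), key=lambda j: relevance[j])]
    while len(selected) < k:
        def score(j):
            redundancy = sum(mutual_info(cols[j], cols[s]) for s in selected) / len(selected)
            return relevance[j] - redundancy
        rest = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected


labels = [0, 0, 1, 1, 2, 2, 3, 3]            # toy class labels
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],                # informative
    [0, 0, 0, 0, 1, 1, 1, 1],                # exact duplicate of the first feature
    [0, 0, 1, 1, 0, 0, 1, 1],                # informative and complementary
    [5, 5, 5, 5, 5, 5, 5, 5],                # constant, carries no information
]
chosen = mrmr(features, labels, k=2)
```

In this toy example the duplicate feature is as relevant as the first one but fully redundant with it, so the complementary feature is chosen instead.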
The flowchart of the proposed approach is shown in Figure 4. In the fault detection stage, a threshold based on mvAAPE is established to determine the health condition of the rotating machinery. If the mvAAPE value of the vibration signal to be detected is less than the threshold, the rotating machinery is healthy: the output is normal and the diagnosis terminates. Otherwise, the next step is conducted to judge the fault type and severity of the rotating machinery.
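The decision rule of the detection stage is simple enough to state in a few lines of Python. The paper does not describe how the threshold is chosen, so the midpoint rule in `midpoint_threshold` is purely an illustrative assumption, and both function names are hypothetical.

```python
def detect_fault(mvaape_value, threshold):
    """Return 'normal' below the threshold, otherwise 'fault'."""
    return "normal" if mvaape_value < threshold else "fault"


def midpoint_threshold(normal_values, fault_values):
    """Assumed rule: halfway between the largest normal value and the smallest fault value."""
    return (max(normal_values) + min(fault_values)) / 2.0
```

Only samples labeled `"fault"` would be passed on to the feature extraction and classification stages.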

Experimental Analysis and Results
To verify the universality and effectiveness of the proposed health condition detection method for fault identification of general rotating machinery, experiments and analysis are conducted using two typical examples, namely, rolling bearings and gearboxes. The rolling bearing dataset was provided by CWRU [32]. The gearbox experiment data were collected on the QPZZ-II vibration analysis platform produced by Jiangsu Qianpeng Diagnostic Engineering Co., Ltd.

Experimental Rig and Data Introduction.
The data were collected by the high-precision multichannel sensor installed on the bearing experimental rig. The specific structure of the bearing experimental rig is presented in Figure 5. The experimental rig includes a motor, a torque transducer/encoder, control electronics, and a dynamometer. The acceleration sensors are installed at the 12 o'clock position at both the drive end and the fan end of the motor housing and are connected with the magnetic casing.
The collected experimental data are the vibration waveforms of the motor, which are collected by the 16-channel data recorder. Single-point faults are set on SKF rolling bearings by electrical discharge machining. The fault diameters are 0.1778 mm, 0.3556 mm, and 0.5334 mm, and the fault depth is 0.2794 mm. The three fault diameters represent different severities of the bearing fault. The experimental environment is set as follows: the motor load is 0 hp, the motor speed is 1797 r/min, and the sampling frequency is 12 kHz. In this article, the data used include 10 categories: normal bearings and inner race, outer race, and ball faults, each with fault diameters of 0.1778 mm, 0.3556 mm, and 0.5334 mm (labeled NM, IRF1, IRF2, IRF3, ORF1, ORF2, ORF3, BF1, BF2, and BF3, respectively). For each state, the synchronous vibration signals at the drive end and fan end are used as dual-channel data. Generally, in the field of bearing fault diagnosis, the vibration signals are collected at the drive end.
This is because the data quality at the drive end is higher: the signal contains less noise and directly reflects the vibration of the output part. However, for the fault diagnosis of mechanical equipment, high fault identification accuracy is the goal. Therefore, it is necessary to use all available information to improve the utilization rate of information. The fan-end data contain part of the fault information, and using them can significantly improve the feature quality and thus the fault recognition rate.
In this study, the vibration data of each working condition were divided into 58 nonoverlapping samples, each containing 2048 sampling points. To be consistent with practical engineering applications, 28 samples of each working condition are randomly selected for training, and the remaining 30 samples form the testing set. Randomly selecting the training and testing samples helps validate the effectiveness of the proposed approach. The dataset is described in detail in Table 1.
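The segmentation and random split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `segment` and `random_split`, the fixed random seed, and the synthetic placeholder record standing in for the drive-end and fan-end signals are all assumptions.

```python
import numpy as np

def segment(record, length=2048):
    """Cut a (n_points, n_channels) record into non-overlapping samples."""
    record = np.asarray(record)
    n_samples = record.shape[0] // length          # leftover points are dropped
    trimmed = record[: n_samples * length]
    return trimmed.reshape(n_samples, length, *record.shape[1:])

def random_split(n_samples, n_train, rng):
    """Return disjoint index arrays for the training and testing sets."""
    order = rng.permutation(n_samples)
    return order[:n_train], order[n_train:]

rng = np.random.default_rng(0)                     # fixed seed, for repeatability only
record = rng.standard_normal((58 * 2048, 2))       # placeholder for dual-channel data
samples = segment(record)                          # shape (58, 2048, 2)
train_idx, test_idx = random_split(len(samples), 28, rng)
```

With 58 samples per working condition, this yields the 28 training and 30 testing samples stated in the text.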

Fault Detection.
The time domain waveforms of the rolling bearing under ten working conditions are shown in Figure 6. Because the waveforms lack regularity, it is hard to recognize the different working conditions directly from the original vibration signals. According to the previous analysis, PE is able to detect faults, and mvAAPE, which is obtained from it based on the theory of multidimensional embedding reconstruction, has the same ability. Therefore, mvAAPE can be used to detect whether the equipment is faulty. Figure 7 shows the mvAAPE values for all samples. As presented in Figure 7, the mvAAPE values in the fault states are generally large, whereas the mvAAPE of the normal state is small and significantly different from that of the fault states. Consequently, this method can be used to screen out the normal state of the bearing. The value at the blue dotted line is defined as the mvAAPE threshold (2.9973). By comparing the mvAAPE value of a vibration signal with the threshold, the normal and fault states can be clearly distinguished. However, the samples of different fault types have poor separability, so mvAAPE cannot be used as the standard to judge the fault type and severity, and further analysis is needed to obtain more reliable characteristics.
The fault samples have the largest mvAAPE values, which demonstrates that they are more complex than the normal samples. When the bearing is in normal operation, the vibration mainly comes from the interaction and coupling between the mechanical parts and from ambient noise, so the vibration signal shows a certain regularity. When a fault occurs while the bearing is running, the bearing vibration contains periodic pulse components. This high-frequency vibration is mixed with the bearing vibration, which makes the frequency components and bandwidth of the vibration signal more complex. Therefore, the mvAAPE value of the normal condition is lower than that of the fault conditions.
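To make the link between signal regularity and entropy concrete, the sketch below computes a univariate amplitude-aware permutation entropy for a regular and an irregular signal. It follows the commonly cited single-channel definition, in which each embedding vector contributes a weight built from its mean absolute amplitude and its mean absolute successive difference. The multivariate embedding used for mvAAPE is not reproduced here, and the parameter values (d = 3, delay = 1, A = 0.5) and the test signals are assumptions chosen only for illustration.

```python
import numpy as np

def aape(x, d=3, delay=1, A=0.5):
    """Univariate amplitude-aware permutation entropy in nats (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (d - 1) * delay
    # Row t of the embedding is [x[t], x[t+delay], ..., x[t+(d-1)*delay]].
    emb = np.stack([x[k * delay : k * delay + n] for k in range(d)], axis=1)
    patterns = np.argsort(emb, axis=1, kind="stable")        # ordinal pattern per row
    # Amplitude-aware weight: average amplitude plus average successive difference.
    weights = (A / d) * np.abs(emb).sum(axis=1) \
        + ((1 - A) / (d - 1)) * np.abs(np.diff(emb, axis=1)).sum(axis=1)
    codes = (patterns * d ** np.arange(d)).sum(axis=1)       # unique integer per pattern
    totals = np.bincount(codes, weights=weights)
    p = totals[totals > 0] / weights.sum()                   # assumes a non-zero signal
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2048)
regular = np.sin(2 * np.pi * 5 * t)          # smooth, highly regular signal
irregular = rng.standard_normal(2048)        # broadband, irregular signal
entropy_regular = aape(regular)
entropy_irregular = aape(irregular)
```

The regular signal concentrates its weight on a few ordinal patterns and so yields the lower entropy, which mirrors the argument that fault signals with richer frequency content score higher.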
The first procedure in fault diagnosis is health detection. For a complicated mechanical system, it is necessary to first judge whether there is a fault in the component and then identify the type and severity of the fault. If no fault is detected, the system is running normally, and there is no need to disassemble and repair it.
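The health detection step then reduces to a threshold comparison. The sketch below assumes the mvAAPE values have already been computed; the threshold is the value reported for the bearing case, while the function name and the sample values are hypothetical.

```python
import numpy as np

MVAAPE_THRESHOLD = 2.9973   # threshold reported for the bearing case

def is_faulty(mvaape_values, threshold=MVAAPE_THRESHOLD):
    """Flag samples whose mvAAPE is not below the threshold as faulty."""
    return np.asarray(mvaape_values, dtype=float) >= threshold

# Hypothetical mvAAPE values for four samples (not taken from the paper).
flags = is_faulty([2.71, 2.95, 3.05, 3.12])
```

Only the samples flagged as faulty would be passed on to feature extraction and classification.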

Fault Recognition.
Once a bearing fault is detected, the proposed approach is used to distinguish the different fault types and severities. To validate the advantages of multivariate analysis, univariate analysis methods such as RCMAAPE are applied to the bearing vibration signals at the drive end. Comparison with the univariate feature extraction method intuitively verifies the advantages of multichannel analysis in terms of information utilization. Each method uses data from the 9 fault conditions. The entropy results of the univariate analysis method RCMAAPE and the multivariate analysis methods RCmvMAAPE, RCmvMPE, RCmvMSE, and mvMAAPE are shown in Figures 8(a)-8(e).
Compared with the other multivariate analysis methods shown in Figures 8(b)-8(d), RCmvMAAPE has a smaller entropy deviation and higher stability. First, when the scale factor is 5-16, RCmvMPE discriminates poorly among NM, IRF3, and ORF3. In addition, mvMAAPE generally shows poor discrimination, and its entropy deviation for each fault state is very large, which indicates that its performance is unstable and prone to large errors. Except for NM and ORF2, the RCmvMSE curves of the other states are similar on most scales and overlap heavily, making them difficult to distinguish. For the univariate analysis, the entropy deviation is significantly greater than that of the multivariate analysis methods, and the entropy curves also overlap more. This is mainly because univariate analysis uses the vibration information of only one channel, so its information utilization rate is relatively low, whereas multivariate analysis makes effective use of information by jointly considering the vibration information of multiple channels, thus improving the stability and robustness of the analysis. Therefore, RCmvMAAPE is more effective in feature extraction than RCmvMPE, RCmvMSE, mvMAAPE, and RCMAAPE, and the quality of the extracted features is also higher.
Although the features extracted by RCmvMAAPE are of high quality and represent the fault state well, the fault features on some scales have low separability and cannot achieve a satisfactory distinguishing effect. To reduce the redundancy between features and enhance the separability of the fault features, the mRmR approach is used to reduce the dimension of the original features. The distribution of the multiscale features after the rearrangement is shown in Figure 9. The dimensionality of the new multiscale fault features is set to 9 according to their correlation with the main fault information and the importance of the features. Finally, the new fault features are input into the WOA-KELM classifier to determine the fault type and severity. Figure 10 shows the fault classification results for one trial. All the faults are accurately identified and the classification accuracy reaches 100%, which indicates that the proposed approach can effectively distinguish the types and severities of faults.
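The idea behind mRmR screening can be sketched as a greedy search that rewards relevance to the class label and penalizes redundancy with the features already selected. The version below is only an illustration under stated assumptions: it uses the difference form of the criterion, estimates mutual information from equal-width histograms with 8 bins, and runs on a small synthetic dataset. None of these choices is confirmed by the text, and all function names are hypothetical.

```python
import numpy as np

def mutual_info(a, b):
    """Mutual information (nats) between two non-negative integer-coded vectors."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def discretize(X, bins=8):
    """Equal-width binning of every feature column into integers 0..bins-1."""
    out = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        out[:, j] = np.digitize(X[:, j], edges[1:-1])
    return out

def mrmr(X, y, k, bins=8):
    """Greedy selection of k features by relevance minus mean redundancy."""
    Xd = discretize(np.asarray(X, dtype=float), bins)
    y = np.asarray(y)
    relevance = np.array([mutual_info(Xd[:, j], y) for j in range(Xd.shape[1])])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(Xd.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(Xd[:, j], Xd[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Synthetic check: features 0 and 1 carry complementary label information,
# feature 2 duplicates feature 0, and the remaining columns are pure noise.
rng = np.random.default_rng(0)
a, b = rng.integers(0, 2, 400), rng.integers(0, 2, 400)
y = 2 * a + b
X = rng.standard_normal((400, 6))
X[:, 0] = a + 0.05 * rng.standard_normal(400)
X[:, 1] = b + 0.05 * rng.standard_normal(400)
X[:, 2] = X[:, 0]
selected = mrmr(X, y, k=2)
```

In this toy case the duplicated feature is passed over in favor of the complementary one, which is the behavior that motivates screening the 20 scale features down to 9.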
In addition, to limit the influence of random factors on the experimental results, the experiment is repeated for 20 trials to obtain more accurate and reliable classification results. Moreover, four other entropy-based methods are also used to diagnose the rolling bearing faults. The detailed classification results of the five approaches over the 20 trials are presented in Figure 11 and Table 2.

Mathematical Problems in Engineering
As shown in Figure 11 and Table 2, the average classification accuracy of the proposed approach, 99.96%, is higher than that of the other approaches. Moreover, the accuracy of the multivariate analysis methods (RCmvMAAPE, RCmvMPE, RCmvMSE, and mvMAAPE) is generally higher than that of the univariate analysis method (RCMAAPE), which is consistent with the previous analysis. Therefore, the comparison indicates that the proposed approach can effectively extract fault features and achieve a high fault recognition rate.
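The repeated-trial protocol can be sketched as below: each trial draws a new random split per class, evaluates a classifier, and the accuracies are then summarized. The nearest-centroid classifier is a deliberately simple stand-in and is not WOA-KELM. The synthetic features, the seed, and all function names are assumptions made for illustration.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Stand-in classifier (not WOA-KELM): assign each test sample to the closest class mean."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(classes[dist.argmin(axis=1)] == y_te))

def repeated_trials(X, y, n_train_per_class, n_trials=20, seed=0,
                    evaluate=nearest_centroid_accuracy):
    """Repeat random per-class splits and summarize the test accuracies."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_trials):
        train_mask = np.zeros(len(y), dtype=bool)
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            train_mask[idx[:n_train_per_class]] = True
        accs.append(evaluate(X[train_mask], y[train_mask],
                             X[~train_mask], y[~train_mask]))
    accs = np.array(accs)
    return {"n": len(accs), "max": accs.max(), "min": accs.min(),
            "mean": accs.mean(), "std": accs.std(ddof=1)}

# Synthetic, well-separated features: 10 classes, 58 samples each, 9 features.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(10), 58)
X = y[:, None] * 10.0 + 0.1 * rng.standard_normal((580, 9))
summary = repeated_trials(X, y, n_train_per_class=28)
```

Reporting the maximum, minimum, mean, and standard deviation over the trials separates a method's typical accuracy from its stability.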
To verify the necessity of mRmR feature selection, two-dimensional projections of two randomly selected features, chosen without the mRmR method, are presented in Figure 12(a), while the first two sensitive features obtained with the mRmR method are visualized in Figure 12(b). Comparing Figures 12(a) and 12(b) shows that RCmvMAAPE combined with mRmR gives a better recognition effect than RCmvMAAPE alone. Moreover, nine random features (τ = 8, 19, 1, 17, 9, 3, 20, 14, 6) are directly input into WOA-KELM to identify the fault type, and the identification results are presented in Table 3. According to Table 3, the fault recognition accuracy obtained without the mRmR method is lower than that obtained with it. In addition, the recognition accuracy of RCmvMAAPE is still higher than that of the other methods when mRmR is not used. Thus, the experimental results again verify that RCmvMAAPE can effectively extract fault features from multichannel signals and improve the quality of the fault information. The mRmR method can select sensitive low-dimensional features from high-dimensional fault features, which improves not only the recognition accuracy but also the classification efficiency.
This section discusses the superiority of using the WOA algorithm to optimize KELM in fault identification. Three commonly used classifiers are used for comparison, namely, support vector machine (SVM), extreme learning machine (ELM), and kernel extreme learning machine (KELM). The ratio of training samples to testing samples remains the same. The diagnostic results of the five approaches using the different classifiers are listed in Table 4. When the four classifiers are combined with the five feature extraction methods, the classification accuracy of WOA-KELM is the highest, which shows that WOA-KELM is an effective classifier.
In addition, when the features obtained by the different feature extraction methods are input into the four classifiers, the classification accuracy of RCmvMAAPE is the highest, which further verifies that the proposed RCmvMAAPE approach performs excellently in feature extraction.
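For readers unfamiliar with KELM, the classifier at the core of the comparison can be sketched in a few lines using the standard closed-form solution, in which the output weights are obtained by solving a regularized kernel system. The RBF kernel, the fixed values of the penalty factor C and kernel parameter gamma, and the synthetic data are assumptions. In the approach described here these two parameters would be tuned by WOA rather than set by hand.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Kernel extreme learning machine with an RBF kernel (illustrative sketch)."""

    def __init__(self, C=100.0, gamma=1.0):
        self.C = C          # penalty factor
        self.gamma = gamma  # kernel parameter

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        self.classes_, idx = np.unique(y, return_inverse=True)
        # One column per class, coded +1 for the true class and -1 otherwise.
        T = -np.ones((len(idx), len(self.classes_)))
        T[np.arange(len(idx)), idx] = 1.0
        K = rbf_kernel(self.X_, self.X_, self.gamma)
        # Output weights: beta = (K + I / C)^-1 T
        self.beta_ = np.linalg.solve(K + np.eye(len(K)) / self.C, T)
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_, self.gamma)
        return self.classes_[np.argmax(K @ self.beta_, axis=1)]

# Synthetic three-class example (placeholder for the selected entropy features).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
y_train = np.repeat(np.arange(3), 20)
X_train = centers[y_train] + 0.3 * rng.standard_normal((60, 2))
y_test = np.repeat(np.arange(3), 20)
X_test = centers[y_test] + 0.3 * rng.standard_normal((60, 2))
model = KELM(C=100.0, gamma=1.0).fit(X_train, y_train)
accuracy = float(np.mean(model.predict(X_test) == y_test))
```

Because training amounts to a single linear solve, evaluating many candidate (C, gamma) pairs inside an optimizer is inexpensive.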

Experimental Rig and Data Introduction.
The gearbox experiment data were collected from the QPZZ-II experimental platform built by Jiangsu Qianpeng Diagnosis Engineering Co., Ltd. The overall structure of the experimental platform is shown in Figure 13. The platform is composed of gearbox, motor, iron base, capacitance, and sensors. The sensors are installed above the gearbox. The experimental data consist of eight channels of vibration signals and one channel of tachometer signals, and the motor speed is 880 r/min. In the experiment, a total of four operating conditions were set up, including normal condition, gear pitting fault (pitting), gear tooth breaking (tooth breaking), pinion wear fault (wearing), and gear pitting fault coupling with pinion wear fault (pitting and wearing). The gearbox experimental data are described in detail in Table 5. The sampling frequency is 5.12 kHz and the sampling time is 6 s. Each health state contains 53248 data points. The selected channels are the acceleration signals collected at bearing X on the motor side of the input shaft and bearing Y on the load side of the output shaft. The collected vibration signals are divided into 26 nonoverlapping samples of length 2048. Among them, 10 samples are used for training, and the remaining 16 samples are used for testing.
It is difficult to directly judge the type of gear failure based on the amplitude and frequency changes of the waveforms. According to the previous analysis, mvAAPE can be used to detect whether mechanical equipment is faulty and was successfully used to detect the health condition of rolling bearings. Because of the complicated structure of the gearbox, it is difficult to disassemble and inspect it. Therefore, it is necessary to detect the health condition of the gearbox.
Figure 9: Distribution of multiscale features after applying the mRmR approach.
Figure 15 [...] more complicated than that of the normal samples. After the gearbox fails, the vibration signals show obvious modulation characteristics and are composed of multiple AM and FM signals. Compared with the vibration signals of the normal samples, the fault signals contain more impact components; meanwhile, because of random factors such as noise, the signal components are more complex, so the fault signals have a larger entropy value.

Fault Recognition.
After the gearbox failure is detected, the proposed approach is used to process the fault vibration signals and obtain more discriminative features, so as to identify the different fault types. Similarly, to verify the advantages of multivariate analysis, the univariate analysis method (RCMAAPE) is applied to the motor side vibration signals. In addition, to study the effectiveness of the RCmvMAAPE approach for extracting fault features, the RCmvMPE, mvMAAPE, and RCmvMSE approaches are used to analyze the multichannel vibration signals. The analysis results are shown in Figures 16(a)-16(e).
It can be observed from Figure 16 that the overall trend of the RCmvMAAPE curve is consistent with that of RCmvMPE and mvMAAPE, but RCmvMAAPE has a smaller entropy deviation, which indicates that the RCmvMAAPE method has better stability. Compared with the RCmvMSE method, the RCmvMAAPE curve fluctuates more obviously, so it can effectively highlight the earth oscillation component of the gearbox fault vibration signal and thus extract fault features more effectively. In addition, compared with the univariate analysis method RCMAAPE, the entropy deviation of RCmvMAAPE is significantly smaller, that is, its performance is better. The main reason is that the univariate analysis method makes only rough use of the fault information in the single-channel vibration signal, while the rich information in the other channels is not used. However, after the gearbox fails, the transmission path of the internal vibration is complex and has multiple directions. The vibration signals collected from each channel contain fault information, so it is impossible to fully characterize the fault state by univariate analysis alone. Based on this analysis, RCmvMAAPE can effectively analyze multichannel vibration signals and has stable performance.
It can be observed from Figure 16 that the fault features extracted by RCmvMAAPE are redundant at some scales, which indicates that not all features can be used for fault classification and that the sensitive features need to be screened out. To improve the separability of the fault features, the mRmR approach is used to process the features. The distribution of the multiscale features after the rearrangement is shown in Figure 17. The dimensionality of the new multiscale fault features is set to 9 according to their correlation with the main fault information and the importance of the features. Finally, the new fault features τ = (19, 8, 7, 16, 5, 13, 10, 3, 2) are fed into the WOA-KELM classifier to determine the fault type. Figure 18 shows the fault classification results for one trial. Except for two samples of the pitting and wear fault that are misclassified as the tooth breaking fault, all faults are accurately identified, and the overall classification accuracy reaches 95.83%, which shows that the proposed approach can effectively distinguish the different fault types of the gearbox.
Figure 13: The experimental rig of the gearbox from QPZZ-II.
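The reported single-trial accuracy can be checked with a short calculation. The confusion matrix below is an assumption inferred from the text: three fault classes with 16 test samples each, and two samples of one class assigned to the tooth breaking class. The identity of the third class and the class ordering are hypothetical.

```python
import numpy as np

# Rows are true classes, columns are predicted classes (assumed layout).
# Class order: [other fault state, tooth breaking, pitting and wearing]
confusion = np.array([
    [16,  0,  0],
    [ 0, 16,  0],
    [ 0,  2, 14],   # two "pitting and wearing" samples predicted as tooth breaking
])

correct = int(np.trace(confusion))      # correctly classified test samples
total = int(confusion.sum())            # all test samples
accuracy_percent = 100.0 * correct / total
```

Under these assumptions, 46 of 48 test samples are classified correctly, which rounds to the 95.83% quoted for the trial.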
Similarly, to reduce the randomness caused by performing only one trial, the experiment is repeated for 20 trials to obtain more reliable and accurate classification results. In addition, to intuitively verify the advantages of the RCmvMAAPE method, four other entropy-based methods are used to diagnose the gearbox faults. The detailed classification results of the five approaches over the 20 trials are shown in Figure 19 and Table 6. Table 6 shows that the average recognition accuracy of the proposed approach is the highest and its standard deviation is the smallest, which indicates that the proposed approach has stable and excellent performance. The accuracy of the RCmvMPE approach is slightly lower than that of the proposed approach, which indicates that RCmvMPE can also effectively diagnose gearbox faults, but its standard deviation is large, indicating that its recognition rate is not stable. In addition, the accuracy of the multivariate analysis methods is higher than that of the univariate analysis method, which verifies the necessity of multivariate analysis in gearbox fault diagnosis.
As before, to investigate the importance of mRmR feature selection, two-dimensional projections of two random features selected without the mRmR method and of the first two sensitive features obtained with the mRmR method are shown in Figure 20. The features without mRmR feature selection are disordered and have no obvious clustering center, which indicates that the feature quality is not high and that further processing is needed to obtain separable features. After mRmR feature selection, although no obvious clustering center is obtained, the separability of the three fault states becomes stronger. It can be concluded that mRmR feature selection improves the discriminability of the features and gives a better recognition effect. Then, nine features are randomly selected and input into the WOA-KELM classifier to determine the fault type of the gearbox. Similarly, each method is repeated 20 times. Table 7 shows the gearbox identification results of the five methods without mRmR feature selection over the 20 trials. As can be seen from Table 7, although the highest recognition rate of the RCmvMAAPE approach is lower than that of the RCmvMPE method, its average recognition rate is still the highest, which indicates that the performance of RCmvMAAPE is more stable. Consistent with the previous analysis, the recognition accuracy of the multivariate analysis approaches is higher than that of the univariate analysis approach, which directly verifies the necessity of multivariate analysis. In short, mRmR dimension reduction can significantly improve the fault recognition rate, that is, improve the reliability of fault identification.
To validate the necessity of using WOA-KELM, three commonly used classifiers are used for comparison: SVM, ELM, and KELM. The same proportion of training and test samples is employed to train and test each classifier. Table 8 shows the classification results of the five approaches using the different classifiers. The RCmvMAAPE approach still has the highest fault recognition rate with each classifier, higher than that of the RCmvMPE method. This shows that amplitude-aware permutation entropy performs better than permutation entropy because it considers the amplitude and frequency information of the time series. In addition, when the five methods are combined with the different classifiers, the WOA-KELM classifier has the highest average recognition rate, 93.33%, which is higher than that of the KELM classifier alone. The reason is that the performance of KELM is affected by the kernel parameters and the penalty factor, and manually set values cannot achieve the best classification effect. In conclusion, the WOA-KELM classifier has excellent performance, and its generalization performance is better than that of the commonly used classifiers.
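To illustrate how WOA can replace manual parameter setting, the following is a simplified sketch of the whale optimization algorithm with its three standard moves: encircling the best solution, random exploration, and the spiral update. It uses one scalar coefficient per whale and updates the best position immediately after each move, which are simplifications. The toy quadratic objective stands in for the KELM validation error as a function of the penalty factor and kernel parameter, and the population size, iteration count, and bounds are assumptions.

```python
import numpy as np

def woa_minimize(f, lower, upper, n_whales=20, n_iter=100, b=1.0, seed=0):
    """Simplified whale optimization algorithm for box-constrained minimization."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    X = rng.uniform(lower, upper, size=(n_whales, lower.size))
    fitness = np.array([f(x) for x in X])
    best, best_val = X[fitness.argmin()].copy(), float(fitness.min())
    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)                 # decreases linearly from 2 towards 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1.0:                     # exploit: encircle the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                                # explore: move relative to a random whale
                    rand = X[rng.integers(n_whales)].copy()
                    X[i] = rand - A * np.abs(C * rand - X[i])
            else:                                    # spiral update around the best whale
                l = rng.uniform(-1.0, 1.0)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best
            X[i] = np.clip(X[i], lower, upper)       # keep the whale inside the bounds
            val = float(f(X[i]))
            if val < best_val:                       # best is updated immediately (simplification)
                best, best_val = X[i].copy(), val
    return best, best_val

def toy_objective(x):
    """Placeholder for a validation error; its minimum is at (1, -2)."""
    return float(((x - np.array([1.0, -2.0])) ** 2).sum())

best, best_val = woa_minimize(toy_objective, lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

In a WOA-KELM setting, the objective would instead train a KELM for each candidate parameter pair and return its classification error on held-out training data.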

Conclusions
In this study, a novel nonlinear analysis approach called RCmvMAAPE is proposed. Various synthetic signals are analyzed and compared with RCmvMPE, mvMAAPE, and RCmvMSE. The results verify that RCmvMAAPE can effectively measure the complexity of multivariate time series and has more stable performance. In the fault detection part, mvAAPE is used to define a threshold. If the mvAAPE value of the measured sample is less than the threshold, the equipment is considered normal, which realizes fault detection of the equipment. When a fault is detected, RCmvMAAPE is employed to extract fault features and construct the initial feature vectors, and mRmR is then used to select sensitive features to form the sensitive feature vectors. Finally, the sensitive feature vectors are input into the WOA-KELM classifier to determine the type and severity of the fault. The validity of the proposed approach is verified by two typical examples, namely, rolling bearing and gearbox. The results demonstrate that the proposed approach can not only accurately detect the fault of rotating machinery but also effectively identify the fault type. In addition, RCmvMAAPE can extract higher quality fault features from multichannel vibration signals and is superior to common entropy-based methods, which verifies its effectiveness in feature extraction. From the perspective of practical application, by first detecting the state of the rotating machinery, the proposed method avoids pattern classification that is full of uncertainty and improves the effectiveness and timeliness of fault diagnosis, and is therefore more in line with actual engineering needs.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.