Fault Diagnosis of a Rolling Bearing Based on Adaptive Sparest Narrow-Band Decomposition and Refined Composite Multiscale Dispersion Entropy

Condition monitoring and fault diagnosis of rolling bearings are crucial to ensuring the reliability and safety of mechanical systems. When a local fault occurs in a rolling bearing, the complexity of the intrinsic oscillations in its vibration signals changes. Refined composite multiscale dispersion entropy (RCMDE) can quantify the complexity of a time series quickly and effectively. To measure the complexity of intrinsic oscillations at different time scales, this paper introduces adaptive sparest narrow-band decomposition (ASNBD), an improved version of adaptive sparest time-frequency analysis (ASTFA). By integrating ASNBD and RCMDE, a novel fault diagnosis model for rolling bearings is proposed. First, a collected vibration signal is decomposed by ASNBD into a number of intrinsic narrow-band components (INBCs) that represent its intrinsic modes, and the relevant INBCs are selected for feature extraction. Second, RCMDE values are calculated as nonlinear measures to reveal hidden fault-sensitive information. Third, a basic multi-class support vector machine (multiSVM) serves as a classifier to automatically identify the fault type and fault location. Finally, experimental analysis and comparisons verify the effectiveness and superiority of the proposed model. The results show that RCMDE values lead to larger differences between the various states and that the proposed model achieves reliable and accurate fault diagnosis of rolling bearings.


Introduction
The reliability of a rolling bearing plays a vital role in ensuring the stable and reliable operation of a mechanical system. If a local failure of a rolling bearing is not detected as early as possible, it is likely to cause a breakdown of the mechanical system or a major production safety accident, resulting in huge economic losses. Therefore, condition monitoring and fault diagnosis of rolling bearings have become a prevalent research topic [1][2][3][4][5][6][7].
Owing to nonlinear factors such as varying load, clearance, nonlinear stiffness, and friction, the vibration signals of a rolling bearing are nonlinear and nonstationary. Therefore, an adaptive signal analysis method is needed to extract hidden patterns or physical information. At present, various advanced signal processing techniques, including the wavelet transform (WT) [2], empirical mode decomposition (EMD) and its improved versions [8][9][10][11], local mean decomposition (LMD) [3,12], variational mode decomposition (VMD) [13,14], matching pursuit
Step 2: Search for the sparest decomposition of the original signal x(n) by the following iterative optimization:
(1) Set $i = 1$ and $r_0(n) = x(n)$.
(2) Solve the optimization problem P1, which has a nonlinear constraint, using the Gauss-Newton algorithm:
$$ \mathrm{P1:}\quad \min \ \left\| r_{i-1}(n) - \mathrm{IMF}_i(n) \right\|_2^2 \quad \text{subject to} \quad \mathrm{IMF}_i(n) \in \mathrm{Dic} \qquad (2) $$
(3) Set $r_i(n) = r_{i-1}(n) - \mathrm{IMF}_i(n)$.
(4) If $\|r_i(n)\|_2^2 \le \varepsilon$, stop and output the decomposition results; otherwise, set $i = i + 1$ and return to sub-step (2).
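The iterative structure of sub-steps (1)-(4) can be sketched in Python. This is a minimal illustration, not ASTFA itself: the Gauss-Newton solution of P1 over the dictionary Dic is replaced by a hypothetical helper, `extract_dominant_band`, that keeps a few FFT bins around the strongest spectral peak, so only the residue update and the stopping rule mirror the text. The `max_components` limit is an added safety assumption.

```python
import numpy as np

def extract_dominant_band(r, half_width=3):
    """Hypothetical stand-in for solving P1: keep only the FFT bins within
    +/- half_width of the strongest non-DC spectral peak of the residue r."""
    spec = np.fft.rfft(r)
    k0 = int(np.argmax(np.abs(spec[1:]))) + 1   # index of the strongest non-DC bin
    lo, hi = max(k0 - half_width, 0), k0 + half_width + 1
    kept = np.zeros_like(spec)
    kept[lo:hi] = spec[lo:hi]
    return np.fft.irfft(kept, n=len(r))

def peel_components(x, eps=1e-6, max_components=10):
    """Sub-steps (1)-(4): r_0 = x; extract IMF_i from r_{i-1};
    r_i = r_{i-1} - IMF_i; stop once ||r_i||_2^2 <= eps."""
    r = np.asarray(x, dtype=float).copy()        # (1) r_0(n) = x(n)
    components = []
    for _ in range(max_components):              # safety limit (assumption)
        imf = extract_dominant_band(r)           # (2) stand-in for the P1 solution
        components.append(imf)
        r = r - imf                              # (3) update the residue
        if np.sum(r ** 2) <= eps:                # (4) termination test
            break
    return components, r
```

For a sum of two pure tones the loop stops after two components, because the residue energy then falls below ε.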
The above process shows that ASTFA does not depend on the distribution of extreme points; hence, it avoids some deficiencies caused by the fitting of extreme points in EMD. Moreover, the ASTFA algorithm has a solid mathematical foundation. In [17], the Gauss-Newton algorithm was adopted to solve the optimization problem P1 and search for the sparest representation components. However, the Gauss-Newton algorithm depends heavily on the initial values: if they deviate too far from the true values, the iteration often diverges and inaccurate components may appear.

ASNBD Algorithm
To overcome the shortcomings of ASTFA, this paper introduces the ASNBD algorithm for processing nonstationary signals. In ASNBD, a filter with optimized parameters is built by solving a nonlinear optimization problem, and a regulated differential operator is used as the objective function so that each component is constrained to be a local narrow-band signal, yielding an intrinsic narrow-band component (INBC). Furthermore, the immune genetic algorithm (IGA) [30] is used instead of the Gauss-Newton algorithm to solve the nonlinear optimization problem. To describe the ASNBD method, the definition of an intrinsic narrow-band signal is given first.
For a signal expressed as $A(t)\cos(\omega t + \varphi(t))$, if its phase function $\varphi(t)$ varies slowly, its amplitude function $A(t)$ is band-limited, and the maximal frequency of $A(t)$ is much smaller than $\omega$, the signal is defined as a narrow-band signal. Furthermore, if every point of the signal has a neighborhood interval in which the signal is narrow-band, the signal can be regarded as a local narrow-band signal. A singular local linear operator converts a local narrow-band signal to zero [31]. In this paper, the singular local linear operator T developed in [31] is adopted (Equation (3)).
As in ASTFA, after a highly redundant dictionary Dic is constructed as in Equation (1), the ASNBD algorithm searches for the sparest INBCs by solving the optimization problem P2 with a nonlinear constraint. The ASNBD algorithm proceeds as follows [18]:
(1) Set $i = 1$ and $r_1(n) = x(n)$.
(2) Solve the following nonlinear constrained optimization problem P2:
$$ \mathrm{P2:}\quad \min \ \left\| T(\mathrm{INBC}_i(n)) \right\|_2^2 + \lambda \left\| D\left(r_i(n) - \mathrm{INBC}_i(n)\right) \right\|_2^2 \quad \text{subject to} \quad x(n) = \sum_{j=1}^{M} \mathrm{INBC}_j(n) + r_{M+1}(n) $$
where M is the number of INBCs, T is the differential operator of Equation (3), D is an operator that regulates the residue, and λ weights $\|T(\mathrm{INBC}_i(n))\|_2^2$ against $\|D(r_i(n) - \mathrm{INBC}_i(n))\|_2^2$; in general, λ is set to 1.
(3) Set $r_{i+1}(n) = r_i(n) - \mathrm{INBC}_i(n)$.
(4) If $\|r_{i+1}(n)\|_2^2 \le \varepsilon$, stop; otherwise, set $i = i + 1$ and go to step (2).
The objective function constrains INBC(n) to be a local narrow-band signal, so the INBCs obtained by ASNBD have explicit physical meaning. However, optimizing over all data points is computationally expensive, especially for large datasets. To reduce this burden, the optimization over all data points in step (2) can be transformed into an optimization over the parameter vector β of a filter χ [18]; in other words, the sparest INBCs can be obtained by solving the optimization problem P3 for β. IGA is an improved genetic algorithm that combines a biological immune mechanism with the genetic algorithm, which effectively improves population diversity and restrains the premature convergence of the traditional genetic algorithm. On the one hand, antibodies in the immune system promote or inhibit each other to maintain population diversity. On the other hand, large-scale optimization is carried out through immune selection, immune variation, immune update, and dynamic adjustment operations. Moreover, IGA uses an immune memory function, which improves the overall search ability and speeds up the search. In addition, IGA is insensitive to the initial values. Accordingly, IGA is used to solve the optimization problem P3 for the parameter vector β of the filter χ. The procedure is as follows.
(1) Compute the fast Fourier transform $\hat{r}_i(k)$ of $r_i(n)$.
(2) Design a filter $\chi(k \mid \beta)$ with parameter vector $\beta = [\omega, \omega_b, \omega_c]$.
(3) Solve the nonlinear unconstrained optimization problem P3 with the IGA algorithm to obtain the optimal parameter vector $\beta_0$ of the filter $\chi(k \mid \beta)$. In the IGA procedure, the initial values are generated randomly, the maximum number of generations is 200, the termination tolerance is $10^{-6}$, and the population size is 500.
(4) Obtain $\mathrm{INBC}_i(n)$ by applying the filter with the optimized parameter vector $\beta_0$ to $\hat{r}_i(k)$ and taking the inverse fast Fourier transform; in other words, $\mathrm{INBC}_i(n)$ is the output of the optimal filter designed in step (3).
The flowchart of the ASNBD algorithm is given in Figure 1.
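Steps (1), (2), and (4) amount to frequency-domain filtering of the residue. The sketch below is illustrative only: it assumes a hypothetical reading of β = [ω, ω_b, ω_c] as centre frequency, half passband width, and roll-off width (all in Hz), uses a raised-cosine band-pass that is not the actual ASNBD filter, and takes β as given instead of running IGA on P3. The names `chi` and `filter_component` are hypothetical.

```python
import numpy as np

def chi(freqs, beta):
    """Hypothetical band-pass chi(k|beta), beta = [w, w_b, w_c]: unit gain for
    |f - w| <= w_b, a raised-cosine roll-off of width w_c, zero elsewhere."""
    w, w_b, w_c = beta
    dist = np.abs(freqs - w)
    gain = np.zeros_like(freqs, dtype=float)
    gain[dist <= w_b] = 1.0
    roll = (dist > w_b) & (dist < w_b + w_c)
    gain[roll] = 0.5 * (1.0 + np.cos(np.pi * (dist[roll] - w_b) / w_c))
    return gain

def filter_component(r, fs, beta):
    """Steps (1), (2), (4) for a fixed beta: FFT of the residue, multiply by
    chi(k|beta), inverse FFT. In ASNBD, beta would come from IGA (step (3))."""
    r = np.asarray(r, dtype=float)
    spec = np.fft.rfft(r)                                   # step (1)
    freqs = np.fft.rfftfreq(len(r), d=1.0 / fs)
    return np.fft.irfft(chi(freqs, beta) * spec, n=len(r))  # steps (2) and (4)
```

With β = (200, 10, 20) Hz, a 50 Hz + 200 Hz test signal yields only the 200 Hz tone; the IGA search of step (3) would replace this hand-picked β.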

Simulation Analysis for ASNBD
A simulation signal x(t) is used to verify the effectiveness and superiority of the ASNBD technique. The signal $x(t) = x_1(t) + x_2(t)$ consists of a cosine signal $x_1(t)$ and an amplitude-modulated and frequency-modulated (AM-FM) signal $x_2(t)$. The time-domain waveforms of x(t) and its components are shown in Figure 2.

For comparison, ASNBD, ASTFA, and CEEMD are used to analyze x(t); the results are shown in Figures 3-5, respectively. In Figure 3, the first two components are clearly false high-frequency components whose weak energy appears only near the two end points, which may be an artifact of the decomposition procedure, whereas the third component INBC_3 and the fourth component INBC_4 agree with the true components; these two are therefore the useful components. Although the components obtained by ASTFA (C_2 and C_3 in Figure 4) also reflect the real components, their energies are considerably reduced and their waveforms deviate greatly from the true components. Figure 5 shows that CEEMD fails to extract the real components. Furthermore, with ASNBD, the correlation coefficient between INBC_3 and $x_1(t)$ is 0.9824 and that between INBC_4 and $x_2(t)$ is 0.9875, whereas with ASTFA, the correlation coefficient between C_2 and $x_1(t)$ is 0.9452 and that between C_4 and $x_2(t)$ is 0.8000. Therefore, ASNBD achieves more accurate decomposition results than ASTFA and CEEMD.
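The correlation coefficients quoted above are Pearson coefficients between an extracted component and a known true component. A minimal sketch of this matching, assuming all signals are equal-length NumPy arrays; the helper name `match_components` is hypothetical, and it selects by absolute correlation while reporting the signed value.

```python
import numpy as np

def match_components(components, references):
    """For each true reference signal, return (index, r) of the extracted
    component with the largest |Pearson correlation| to that reference."""
    matches = []
    for ref in references:
        rs = np.array([np.corrcoef(c, ref)[0, 1] for c in components])
        best = int(np.argmax(np.abs(rs)))
        matches.append((best, float(rs[best])))
    return matches
```

A scaled copy of a reference gives r = 1, so the coefficient measures waveform agreement; amplitude loss, such as the energy reduction noted for ASTFA, has to be judged separately.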

Dispersion Entropy
Complexity reflects meaningful structural richness. MSE and RCMSE are the most common complexity measures, but they remain problematic for short time series because undefined values may be generated when the scale factor is large. Furthermore, they are not fast enough for real-time applications. In [29], the refined composite multiscale dispersion entropy (RCMDE) algorithm was proposed to overcome these deficiencies. In this subsection, dispersion entropy (DisEn) is described as follows.
(1) First, the samples of a time series $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$ are mapped to c classes with integer indices from 1 to c using the normal cumulative distribution function (NCDF). The NCDF maps $\mathbf{x}$ to $\mathbf{y} = \{y_1, y_2, \ldots, y_N\}$:
$$ y_j = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x_j} e^{-\frac{(t-u)^2}{2\sigma^2}} \, dt $$
where σ and u are the standard deviation and mean of $\mathbf{x}$, respectively. Each $y_j$ is then converted into an integer from 1 to c by a linear mapping:
$$ z_j^c = \mathrm{round}(c \cdot y_j + 0.5) $$
where $z_j^c$ is the jth element of the classified time series and round denotes rounding to the nearest integer. As a result, the time series is mapped to class integers from 1 to c.
(2) Embedding vectors $\mathbf{z}_i^{m,c} = \{z_i^c, z_{i+d}^c, \ldots, z_{i+(m-1)d}^c\}$, $i = 1, 2, \ldots, N-(m-1)d$, are reconstructed with embedding dimension m and time delay d. Each $\mathbf{z}_i^{m,c}$ is mapped to a dispersion pattern $\pi_{v_0 v_1 \ldots v_{m-1}}$, where $z_i^c = v_0, z_{i+d}^c = v_1, \ldots, z_{i+(m-1)d}^c = v_{m-1}$.
(3) For each of the $c^m$ potential dispersion patterns, the relative frequency is computed as:
$$ p(\pi_{v_0 \ldots v_{m-1}}) = \frac{\#\{ i \mid i \le N-(m-1)d, \ \mathbf{z}_i^{m,c} \text{ has type } \pi_{v_0 \ldots v_{m-1}} \}}{N-(m-1)d} $$
(4) Finally, based on Shannon's definition of entropy, the DisEn value is calculated as:
$$ \mathrm{DisEn}(\mathbf{x}, m, c, d) = -\sum_{\pi=1}^{c^m} p(\pi_{v_0 \ldots v_{m-1}}) \ln p(\pi_{v_0 \ldots v_{m-1}}) $$
The calculation shows that when all possible dispersion patterns have equal probability, the irregularity of the data is highest and DisEn attains its maximum value, $\ln(c^m)$. Conversely, when the time series is regular or completely predictable, only one $p(\pi_{v_0 \ldots v_{m-1}})$ is nonzero and DisEn attains its smallest value [28].
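The four DisEn steps translate directly into a short NumPy function. This is a sketch under stated assumptions: σ is the population standard deviation (`np.std`, ddof = 0), the NCDF is evaluated through `math.erf`, and the function name and default parameters are illustrative rather than taken from [28] or [29].

```python
import math
import numpy as np

def dispersion_entropy(x, m=2, c=6, d=1):
    """DisEn(x, m, c, d) following steps (1)-(4) (natural logarithm)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        return 0.0  # a constant series has a single dispersion pattern
    # Step (1): NCDF mapping y_j in (0, 1), then z_j = round(c * y_j + 0.5).
    # np.round sends exact .5 ties to the even integer; clip guards y == 0 or 1.
    y = 0.5 * (1.0 + np.vectorize(math.erf)((x - mu) / (sigma * math.sqrt(2.0))))
    z = np.clip(np.round(c * y + 0.5).astype(int), 1, c)
    # Step (2): embedding vectors z_i^{m,c} with time delay d
    n_vec = len(z) - (m - 1) * d
    patterns = np.stack([z[k * d: k * d + n_vec] for k in range(m)], axis=1)
    # Step (3): relative frequency of each observed dispersion pattern
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / n_vec
    # Step (4): Shannon entropy; unobserved patterns contribute nothing
    return float(-np.sum(p * np.log(p)))
```

For Gaussian white noise with c = 6 and m = 2, the value approaches the upper bound ln(6^2) ≈ 3.58; a strictly alternating signal with c = 2 gives about ln 2, because only two patterns occur.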

Refined Composite Multiscale Dispersion Entropy
The refined composite multiscale dispersion entropy algorithm includes four main steps.
(1) The coarse-graining procedure at scale factor τ is illustrated in Figure 6: the coarse-grained sequences are obtained from different start points. The original time series $\mathbf{x}$ is divided into segments, and the jth element of the kth coarse-grained time series $\mathbf{y}_k^{(\tau)} = \{y_{k,1}^{(\tau)}, y_{k,2}^{(\tau)}, \ldots, y_{k,p}^{(\tau)}\}$, $1 \le k \le \tau$, is built as:
$$ y_{k,j}^{(\tau)} = \frac{1}{\tau} \sum_{b = k + \tau(j-1)}^{k + \tau j - 1} x_b, \qquad 1 \le j \le p = \left\lfloor \frac{N - k + 1}{\tau} \right\rfloor $$
(2) For scale factor τ, with embedding dimension m and time delay d, the relative frequencies $\{p_k^{(\tau)}, 1 \le k \le \tau\}$ of the dispersion patterns of all coarse-grained time series $\mathbf{y}_k^{(\tau)}$ are calculated as in formula (12).
(3) The relative frequencies are averaged over the τ coarse-grained series:
$$ \bar{p}(\pi_{v_0 \ldots v_{m-1}}) = \frac{1}{\tau} \sum_{k=1}^{\tau} p_k^{(\tau)}(\pi_{v_0 \ldots v_{m-1}}) $$
(4) Finally, the RCMDE value is obtained as:
$$ \mathrm{RCMDE}(\mathbf{x}, m, c, d, \tau) = -\sum_{\pi=1}^{c^m} \bar{p}(\pi_{v_0 \ldots v_{m-1}}) \ln \bar{p}(\pi_{v_0 \ldots v_{m-1}}) $$
Figure 6. Schematic illustration of the coarse-graining procedure.
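Steps (1)-(4) of RCMDE can be sketched in NumPy as follows. Assumptions: each coarse-grained series is mapped with its own mean and population standard deviation, the averaged probabilities of step (3) run over all τ start points, and the names `rcmde` and `_pattern_probs` are illustrative rather than taken from [29].

```python
import math
import numpy as np

def _pattern_probs(x, m, c, d):
    """Relative frequencies of all c**m dispersion patterns of one series."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        z = np.ones(len(x), dtype=int)  # a constant series falls into one class
    else:
        y = 0.5 * (1.0 + np.vectorize(math.erf)((x - mu) / (sigma * math.sqrt(2.0))))
        z = np.clip(np.round(c * y + 0.5).astype(int), 1, c)
    n_vec = len(x) - (m - 1) * d
    codes = np.zeros(n_vec, dtype=np.int64)
    for i in range(m):  # encode each pattern (v_0, ..., v_{m-1}) as a base-c integer
        codes = codes * c + (z[i * d: i * d + n_vec] - 1)
    return np.bincount(codes, minlength=c ** m) / n_vec

def rcmde(x, m=2, c=6, d=1, scales=range(1, 21)):
    """RCMDE value for every scale factor tau in `scales`."""
    x = np.asarray(x, dtype=float)
    values = []
    for tau in scales:
        probs = []
        for k in range(tau):  # 0-based start point k (k + 1 in the text)
            p = (len(x) - k) // tau
            y_k = x[k:k + p * tau].reshape(p, tau).mean(axis=1)  # step (1)
            probs.append(_pattern_probs(y_k, m, c, d))           # step (2)
        p_bar = np.mean(probs, axis=0)                           # step (3)
        nz = p_bar[p_bar > 0]
        values.append(float(-np.sum(nz * np.log(nz))))           # step (4)
    return np.array(values)
```

At τ = 1 the result reduces to DisEn of the original series. Averaging the pattern probabilities before taking the logarithm, rather than averaging τ separate entropies, is what keeps every log argument defined when some coarse-grained series are short.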

Parameters Selection
Selecting appropriate parameters is an important issue for entropy-based approaches. RCMDE has four parameters: the embedding dimension m, the number of classes c, the time delay d, and the maximum scale factor τ. In general, d = 1 is recommended, because when d > 1 some important frequency information may be discarded, which might lead to aliasing in practice; and c must be larger than 1, because when c = 1 there is only one dispersion pattern [28]. Moreover, to obtain reliable statistics, the number of potential dispersion patterns $c^m$ should be smaller than the signal length N ($c^m < N$). When c is too large, a slight difference between amplitudes can change their class and hence the dispersion entropy value, which makes the measure highly sensitive to noise. When c is too small, amplitudes that are far apart may be assigned to the same class, causing inaccurate values. When m is too small, dispersion entropy may fail to detect dynamic changes; a larger m captures more information but requires longer data. In general, the data length lies between $10^m$ and $30^m$. Moreover, too large a c or m increases computation time. In our experience, similar results are obtained when c is between 4 and 10, and the results are also similar when m and c vary under the condition $c^m < N$. For more information about c, m, and d, please refer to [28]. The scale factor τ should be set according to the application. For RCMDE, because coarse-graining shortens the signal to about $N/\tau$ samples, the requirement $c^m < N/\tau$ must be met.
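As a worked check of $c^m < N/\tau$ with the values used later in this paper (N = 2048, c = 9, m = 2, τ_max = 20): $9^2 = 81 < 2048/20 = 102.4$, whereas m = 3 would need $9^3 = 729$ patterns. The helper names below are hypothetical; the rule itself is the one stated above.

```python
def rcmde_params_ok(N, c, m, tau_max):
    """True if c**m < N / tau_max, i.e. the coarse-grained series at the largest
    scale still has more samples than there are potential dispersion patterns."""
    return c ** m < N / tau_max

def max_embedding_dim(N, c, tau_max):
    """Largest m with c**m < N / tau_max (0 if even m = 1 fails); requires c > 1."""
    if c < 2:
        raise ValueError("c must be larger than 1")
    m = 0
    while c ** (m + 1) < N / tau_max:
        m += 1
    return m
```

With N = 2048 and τ_max = 20, any c from 4 to 10 still admits m = 2, which is consistent with the c range discussed above.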
On the other hand, the data length N affects the estimation of RCMDE values. Too large an N reduces computational efficiency, whereas when N is too small, satisfying $c^m < N$ forces a smaller m or c, which is likely to cause the limitations described above. The capability and suitability of RCMDE for measuring complexity were evaluated and compared with RCMSE using synthetic signals and real biomedical datasets in [29]. To evaluate the sensitivity of RCMDE to the data length, we employed a synthetic signal of a faulty rolling bearing, $x(t) = x_1(t) + x_2(t) + x_3(t) + x_4(t)$, of length N and sampling frequency $f_s$ = 12 kHz. The component $x_1(t)$ simulates a faulty rolling bearing signal, with $f_1$ = 4 kHz, $f_0$ = 30 Hz, and a periodic impulse expressed by $t_1 = \mathrm{mod}(t, 1/f_0)$, where t is the simulation time; $x_2(t)$ is an AM-FM signal, $x_3(t)$ is a sine signal, and $x_4(t)$ is white noise. Following the parameter selection principles above, and considering calculation time and information richness, we choose c = 9, m = 2, and d = 1. Figure 7a,b show how the statistics of the RCMDE values change with N, from which we draw the following conclusions. First, the entropy values follow a similar trend across time scales regardless of the data length. Second, when N exceeds 2K, the results are almost the same. Lastly, Figure 7b shows that the standard deviation (Std) decreases as N increases from 1K to 5K, but when N exceeds 5000, the Std rises at some scales. Based on this analysis, we use N = 2048.

Fault Diagnosis Model Proposed
When failures occur in a mechanical system, the vibration signals acquired by sensors exhibit nonlinear and nonstationary characteristics, and their energy distribution changes with the working state, so the complexity of the time series varies. A novel fault diagnosis model is therefore developed in this paper by combining the ASNBD method with the RCMDE algorithm. First, a nonlinear and nonstationary vibration signal is decomposed into a series of INBCs. Second, the RCMDE values of the relevant INBCs are extracted as fault features. Finally, a basic multiSVM is employed as the classifier to identify the fault type and location. The proposed fault diagnosis scheme for rolling bearings is shown in Figure 8, and its specific steps are as follows.
Step 1: Collect $m_i$ vibration signals for the ith class of working state, so that $M = \sum_{i=1}^{k} m_i$ vibration signals are obtained in total for k classes.
Step 2: Decompose each vibration signal into several INBCs and select the relevant INBCs that contain rich fault information for further feature extraction.
Step 3: Extract the RCMDE values of the selected INBCs as fault features to construct feature vectors. If n is the number of selected INBCs and $\tau_{max}$ is the maximum scale factor, each feature vector has $n \times \tau_{max}$ dimensions. In principle, more features help characterize fault categories from different perspectives; however, too many features increase the computational cost and may reduce the recognition rate. Thus, fewer than four INBCs are usually used, and $\tau_{max}$ is at most 20. Here, we set $\tau_{max} = 20$.
Step 4: Randomly divide the original dataset into two groups, one for training and the other for testing. The failure pattern of an unknown test sample is identified from the output of the multiSVM classifier.
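The feature construction of Step 3 is a concatenation of per-component entropy curves. In the sketch below, `entropy_curve` is a hypothetical callable standing in for an RCMDE routine that returns $\tau_{max}$ values for one INBC; with n = 3 and $\tau_{max}$ = 20 the result has 60 entries, matching the 60 columns of the training and test matrices used later.

```python
import numpy as np

def feature_vector(inbcs, entropy_curve, n_selected=3, tau_max=20):
    """Concatenate the entropy curves (scales 1..tau_max) of the first
    n_selected INBCs into one vector of length n_selected * tau_max."""
    curves = []
    for comp in inbcs[:n_selected]:
        curve = np.asarray(entropy_curve(comp, tau_max), dtype=float)
        if curve.shape != (tau_max,):
            raise ValueError("entropy_curve must return one value per scale")
        curves.append(curve)
    return np.concatenate(curves)
```

Keeping the INBC order fixed (first component, then second, then third) matters, because the classifier compares feature positions across samples.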

Datasets Collection and Signal Decomposition
To verify its effectiveness, the proposed scheme is applied to the experimental datasets shared by the Case Western Reserve University Bearing Data Center [32]. The datasets include vibration time series collected by an accelerometer mounted on the drive-end bearing with an inner race fault (IRF), a ball fault (BF), an outer race fault (ORF), or in the normal state. Single-point faults with diameters from 0.007 in to 0.021 in were seeded on the drive-end bearings. The sampling frequency $f_s$ is 12 kHz, the motor load is 2 hp, and the shaft rotation speed is $f_r$ = 1750 rpm. Ten classes of vibration signals are used in this paper, and each signal is divided into 55 segments of length N = 2048, which serve as samples. More details about the datasets and the experimental rig are available on the Case Western Reserve University website. The datasets used are listed in Table 1. The time-domain waveforms of the vibration signals under the various conditions are shown in Figure 9; these signals are obviously nonlinear and nonstationary, and they are difficult to distinguish from one another.
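The segmentation can be sketched as follows, assuming non-overlapping segments taken from the start of each record (the text does not state how the segments were cut; `segment_record` is a hypothetical name). As a worked figure, 55 × 2048 = 112,640 samples, about 9.39 s at 12 kHz; each segment lasts 2048/12000 ≈ 0.171 s, i.e. about five shaft revolutions at 1750 rpm (≈ 29.2 Hz).

```python
import numpy as np

def segment_record(signal, n_segments=55, seg_len=2048):
    """Cut one vibration record into non-overlapping segments of seg_len samples."""
    signal = np.asarray(signal)
    needed = n_segments * seg_len
    if len(signal) < needed:
        raise ValueError(f"record has {len(signal)} samples, needs {needed}")
    return signal[:needed].reshape(n_segments, seg_len)
```

Ten classes of 55 segments give the 550 samples used in the classification stage.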

Feature Extraction by RCMDE with ASNBD
To quantify the complexity of the intrinsic modes, the original vibration signal is decomposed into a number of INBCs using ASNBD. Correlation analysis between each INBC and the original signal is then conducted to identify false components: components with small correlation coefficients are regarded as false and removed. In the end, six to eight components are retained as true INBCs for further analysis. Figures 10 and 11 show the decomposition results for the IRF vibration signal (No. 1 signal) and the BF vibration signal (No. 2 signal), respectively (see Tables 1 and 2). The calculated ball fault characteristic frequency is $f_b$ = 136 Hz.
Since ball faults are the most difficult to detect, Figure 12 shows the envelope spectrum of the first component of the BF signal; the ball fault frequency $f_b$ is easier to identify with ASNBD than with ASTFA or CEEMD.
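A common way to compute an envelope spectrum (not necessarily the exact procedure behind Figure 12) is to take the magnitude of the analytic signal from the FFT-based Hilbert transform and then the spectrum of that envelope. The NumPy-only sketch below follows that route; the function name and the test-tone parameters in the note are illustrative assumptions.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """One-sided amplitude spectrum of the Hilbert envelope of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)                 # analytic-signal weights in the frequency domain
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    env = np.abs(np.fft.ifft(np.fft.fft(x) * h))   # envelope = |analytic signal|
    env = env - env.mean()                         # remove the DC component
    spec = np.abs(np.fft.rfft(env)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec
```

For a 3 kHz carrier amplitude-modulated at 136 Hz (the $f_b$ value above) and sampled at 12 kHz for 1 s, the envelope spectrum peaks at 136 Hz.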
This result illustrates the superiority of ASNBD. The figures above also show that the fault information of a rolling bearing is mainly concentrated in the first few components, because they exhibit modulation and impulse characteristics with larger energy. Moreover, the correlation coefficients R and kurtosis values K of the first three INBCs are larger. As an example of the selection process, Table 2 lists the correlation coefficients and kurtosis values of the INBCs of the No. 1 and No. 2 signals. Therefore, the first three INBCs are selected to characterize the original signal.
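The selection rule can be sketched as follows: components weakly correlated with the original signal are discarded as false, and the first remaining components are kept in decomposition order, with R and K reported for inspection as in Table 2. The threshold `r_min` = 0.1 is a hypothetical value (the text does not state one), kurtosis uses the non-excess definition (3 for Gaussian data), and `select_inbcs` is an illustrative name.

```python
import numpy as np

def kurtosis(c):
    """Non-excess kurtosis E[(c - mean)^4] / var^2 (3 for a Gaussian)."""
    c = np.asarray(c, dtype=float)
    d = c - c.mean()
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2)

def select_inbcs(x, inbcs, r_min=0.1, n_keep=3):
    """Drop components with |R| < r_min (treated as false components), then keep
    the first n_keep survivors. Returns kept indices and (R, K) per component."""
    stats = [(float(np.corrcoef(x, c)[0, 1]), kurtosis(c)) for c in inbcs]
    kept = [i for i, (r, _) in enumerate(stats) if abs(r) >= r_min][:n_keep]
    return kept, stats
```

A pure sinusoid has K = 1.5, whereas impulsive fault components push K well above 3, which is why larger K is taken as a sign of fault information.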
The RCMDE results are shown in Figure 13, from which we draw the following conclusions. First, the RCMDE values of the normal rolling bearing are much larger at most time scales than those under faulty states, which is consistent with the fact that vibration signals in the normal state are the most complex and irregular. Failures change the system dynamics by acting as an excitation source, which causes periodic impulses, increases the self-similarity of the vibration signals, and thus lowers the entropy values.
Second, the RCMDE values of the vibration signals under the ball fault and inner race fault states are larger than those under the outer race fault state. This can be explained as follows: when a local failure occurs in a ball or the inner race, the vibration travels a long path to the sensors mounted on the bearing housing, which introduces more modulation components. The outer race, in contrast, is fixed to the bearing housing, so the path to the sensor is shortest and the vibration signals contain little interference; they therefore show more apparent periodic impulses and smaller entropy values. In addition, the RCMDE values of faulty rolling bearings decrease monotonically as the time scale increases. This is because the multiscale coarse-graining procedure progressively eliminates uncorrelated random components, so the entropy decreases monotonically with the scale factor [20]. For comparison, the RCMFE values of the third INBCs are given in Figure 14. The results in Figure 14 fluctuate greatly: although the entropy values of the normal rolling bearing are the largest for scale factors from 6 to 12, there is no clear regular pattern across all scales. Most importantly, comparing Figure 13 with Figure 14 shows that RCMDE produces larger differences between the various states than RCMFE, which is expected to improve the fault detection rate of rolling bearings; this is verified in the classification stage that follows.
As mentioned above, there are 55 samples for each state and 550 samples in total. These samples are randomly divided into two groups: 100 samples (10 per class) form the training group, giving the training matrix $T_{100 \times 60}$, and the remaining 450 samples form the test group, giving the test matrix $M_{450 \times 60}$.
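The random split with 10 training samples per class can be sketched as below; the function name and fixed seed are illustrative assumptions. With 10 classes of 55 samples and 60 features, it yields a 100 × 60 training matrix and a 450 × 60 test matrix.

```python
import numpy as np

def stratified_split(features, labels, n_train_per_class=10, seed=0):
    """Randomly pick n_train_per_class samples of every class for training;
    all remaining samples form the test set."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        train_idx.extend(rng.choice(idx, size=n_train_per_class, replace=False))
    train_mask = np.zeros(len(labels), dtype=bool)
    train_mask[train_idx] = True
    return (features[train_mask], labels[train_mask],
            features[~train_mask], labels[~train_mask])
```

The resulting matrices can be passed to any linear multiclass SVM; for example, scikit-learn's `SVC(kernel="linear")` handles multiclass data with a one-vs-one scheme, although the paper does not state which multiSVM implementation was used. Repeating the split with different seeds gives the mean accuracy and standard deviation reported in the comparison.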

Fault Diagnosis Results and Comparison
The support vector machine (SVM) has excellent classification performance on small-sample recognition tasks. However, it is inherently a binary classifier and cannot directly handle multi-class problems. Therefore, a multi-class support vector machine (MultiSVM) with a linear kernel function, an extension of the SVM, is employed in this paper. To demonstrate the necessity of the ASNBD, RCMDE values are also extracted directly from the raw signals for comparison. In addition, RCMFE values are computed and compared with the RCMDE values to validate the superiority of the proposed model. The results are listed in Table 3. The first row shows that the proposed method outperforms the other approaches, as it achieves the highest accuracy and the smallest standard deviation. The method in the second row uses RCMFE values instead of RCMDE values as features. The third and fourth rows extract the RCMDE values and the RCMFE values, respectively, from the raw signals instead of from the INBCs. Note that all of these techniques use the same classifier, MultiSVM, so that the comparison is fair. Comparing the first and third rows with the second and fourth rows shows that the fault identification rate is higher and the standard deviation is lower when RCMDE values, rather than RCMFE values, are used as the input feature vectors of the MultiSVM classifier. This is because RCMDE values produce larger differences between the bearing working states, as shown in Figure 13. On the other hand, comparing the first and second rows with the third and fourth rows shows that features extracted from INBCs are more effective than those derived from the raw signals. In other words, signal decomposition is necessary to obtain more fault-sensitive features and thereby improve the classifier's performance. To further verify the effectiveness and superiority of the ASNBD technique, we also used CEEMD and ASTFA for signal decomposition.
As in the proposed model, the first three intrinsic mode functions (IMFs) were used to extract RCMDE values as features, and the corresponding fault diagnosis results are listed in the fifth to eighth rows of Table 3. In these methods, CEEMD or ASTFA was used to preprocess the signals. Regardless of which signal processing technique was used, the entropy-based measures proved to be effective fault features and yielded satisfactory results even with a basic MultiSVM. Nevertheless, the proposed model performs best among them. In addition, several recently developed techniques are compared in Table 4, which shows that the proposed model is a promising alternative. Note that the method in [33], which combines moving-average-based multiscale fuzzy entropy (MAMFE) with partly ensemble local characteristic scale decomposition (PELCD), also achieved satisfactory classification; however, its feature extraction and selection procedure is relatively complex and time-consuming because the MAMFE algorithm employs a large number of template vectors.
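Every row of Table 3 uses the same MultiSVM classifier with a linear kernel. One common way to extend a binary SVM to several classes is one-vs-rest: train one linear SVM per class and assign each sample to the class with the largest decision score. The Python sketch below assumes that scheme and fits each binary SVM by full-batch subgradient descent on the L2-regularized hinge loss instead of a quadratic-programming solver, so it may differ from the MultiSVM implementation used in this paper. The names `train_ovr_linear_svm` and `predict`, the hyperparameters `lam`, `lr` and `epochs`, and the synthetic clusters are all hypothetical.

```python
import numpy as np


def train_ovr_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=300):
    """One-vs-rest linear SVM: one binary classifier per class, each fitted by
    full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - t * (X @ w + b)))."""
    classes = np.unique(y)
    n, d = X.shape
    W = np.zeros((len(classes), d))
    b = np.zeros(len(classes))
    for k, cls in enumerate(classes):
        t = np.where(y == cls, 1.0, -1.0)  # +1 for this class, -1 for all others
        w, bias = np.zeros(d), 0.0
        for _ in range(epochs):
            active = t * (X @ w + bias) < 1.0  # samples violating the margin
            grad_w = lam * w - (t[active][:, None] * X[active]).sum(axis=0) / n
            grad_b = -t[active].sum() / n
            w -= lr * grad_w
            bias -= lr * grad_b
        W[k], b[k] = w, bias
    return classes, W, b


def predict(model, X):
    """Assign each sample to the class whose binary SVM gives the largest score."""
    classes, W, b = model
    return classes[np.argmax(X @ W.T + b, axis=1)]


# Usage on synthetic, well-separated 2-D clusters (placeholder data, not bearing features).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [-3.464, -2.0], [3.464, -2.0]])


def make_clusters(n_per_class):
    X = np.vstack([c + 0.5 * rng.standard_normal((n_per_class, 2)) for c in centers])
    return X, np.repeat(np.arange(len(centers)), n_per_class)


X_train, y_train = make_clusters(30)
X_test, y_test = make_clusters(100)
model = train_ovr_linear_svm(X_train, y_train)
test_acc = np.mean(predict(model, X_test) == y_test)
```

In the setting of this paper, the training matrix $T_{100\times 60}$ and test matrix $M_{450\times 60}$ would take the place of the synthetic clusters. A linear kernel keeps the classifier simple, so differences between the rows of Table 3 mainly reflect the quality of the features rather than the flexibility of the classifier.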