Rolling Bearing Fault Diagnosis Using Modified Neighborhood Preserving Embedding and Maximal Overlap Discrete Wavelet Packet Transform with Sensitive Features Selection

In order to enhance the performance of bearing fault diagnosis and classification, feature extraction and feature dimensionality reduction have become increasingly important. The original statistical feature set was calculated from single branch reconstruction vibration signals obtained by using the maximal overlap discrete wavelet packet transform (MODWPT). To reduce the redundant information in the original statistical feature set, features selection by adjusted Rand index and sum of within-class mean deviations (FSASD) was proposed to select fault sensitive features. Furthermore, a modified feature dimensionality reduction method, supervised neighborhood preserving embedding with label information (SNPEL), was proposed to obtain low-dimensional representations of the high-dimensional feature space. Finally, vibration signals collected from two experimental test rigs were employed to evaluate the performance of the proposed procedure. The results demonstrate the effectiveness, adaptability, and superiority of the proposed procedure, which can serve as an intelligent bearing fault diagnosis system.


Introduction
Bearings are among the most crucial elements of rotating machinery [1,2], and bearing faults can seriously affect the safe and stable operation of rotary mechanical equipment [3,4]. If no effective action is taken, device failures will inevitably occur, and such failures may lead to serious casualties and enormous financial losses [5]. Thus, it is important to identify bearing faults in order to maintain the safety of the device and reduce maintenance costs. Vibration signals collected from rolling bearings usually carry rich information on machine operating conditions [6]. In recent years, with the rapid development of signal processing, data mining, and artificial intelligence technology, data-driven methods have become increasingly important in the fault diagnosis of rolling bearings. Methods based on vibration signal analysis involve four main steps: signal processing, feature extraction, feature reduction, and pattern recognition [7,8]. The first three steps are the foundation of pattern recognition.
In the signal processing and feature extraction phase, owing to the complexity of the equipment structure and the variety of operating conditions [5], the signals collected from rolling bearings often exhibit strong nonlinearity and nonstationarity. Therefore, purely time-domain or frequency-domain analysis approaches are often insufficient [9]. For such signals, time-frequency analysis provides an effective means of feature extraction. Representative and commonly used time-frequency analysis methods include empirical mode decomposition (EMD), the short-time Fourier transform (STFT), the Wigner-Ville distribution (WVD), and the wavelet transform (WT) [10].
Shock and Vibration
In recent years, various intelligent fault diagnosis systems based on EMD [11][12][13][14][15], STFT [16][17][18], and WVD [19][20][21] have been developed for monitoring the condition of bearings in rotating machines, with varying degrees of success. However, these time-frequency methods face several challenges in application. EMD suffers from problems such as over-envelope, end effects, and mode mixing [22][23][24]. The effectiveness of the STFT is still hampered by the limitation of a single trigonometric basis [25,26]. The WVD can produce interference terms in the time-frequency domain under certain conditions and has high computational complexity [27]. Wavelet analysis is another important time-frequency analysis method; it stands out in rotating machinery diagnosis because its multiresolution property makes it suitable for analyzing nonlinear and nonstationary signals [28]. The continuous wavelet transform (CWT) and the discrete wavelet transform (DWT) are the two categories of WT. Both have good localization properties in the time and frequency domains and can effectively preserve signal characteristics [27]. In [29], wavelet filtering was presented for detecting periodic impulse components in vibration signals. In [30], the DWT was studied for extracting rotor bar fault features. In [31], the CWT and the wavelet coefficients of signals were used to process vibration signals. However, both the CWT and the DWT have drawbacks. The CWT generates redundant data, so it is computationally expensive and time-consuming [31,32]. Although the DWT overcomes this drawback by decomposing the original complex signal into several resolutions [33,34], it requires the sample size to be an exact power of 2 for the full transform because of the downsampling step, and it has very poor frequency resolution at high frequencies [35][36][37].
To overcome these drawbacks, a wavelet-based algorithm, the maximal overlap discrete wavelet packet transform (MODWPT), has been developed [38]. It not only provides better frequency resolution but also places no restriction on the sample size [36,38]. In [36], simulated signals and gear fault vibration signals collected from a test stand were decomposed into a set of monocomponent signals by MODWPT, and the corresponding Hilbert spectrum was then applied to gear fault diagnosis; both the simulation and the practical application showed that the MODWPT-based Hilbert spectrum outperforms EMD. However, the time-frequency analysis methods mentioned above can produce high-dimensional feature vectors, which can be a primary cause of degraded fault classification accuracy [39]. Thus, feature selection or dimensionality reduction is needed to find the most useful fault features that retain the intrinsic information about the defects.
Generally, statistical properties of the signal in the time, frequency, and time-frequency domains are extracted to represent feature information, such as peak value (PV), root mean square (RMS), variance, skewness (Sw), and kurtosis. In [40], 21 time-domain statistical characteristics were extracted from different intrinsic mode functions (IMFs) obtained by EMD as the feature vectors; principal component analysis (PCA) was then employed to extract the dominant components from these statistical characteristics for gear fault detection. In [41], two time-domain and two frequency-spectrum statistical characteristics were selected as features to train an SVM with a novel hybrid parameter optimization algorithm for fault diagnosis of rolling element bearings. In [31], the statistical parameters of the wavelet coefficients at scales 1-64 were calculated for the vibration signal. In [42], 40 statistical features of wavelet packet coefficients were calculated for a single sample of each bearing condition. In [43], for each wavelet packet node, 10 statistical features were extracted from the associated wavelet packet coefficients and another 10 from the frequency spectra of those coefficients. However, given the complex mapping between bearing faults and their symptoms, it is often difficult to determine which statistical properties in the feature space best reflect the nature of a fault. Using unsuitable features may reduce the accuracy and efficiency of fault diagnosis [10,44]. Therefore, how to select fault sensitive statistical characteristics as the basis of subsequent fault analysis has attracted considerable attention and merits further study. In this paper, a feature selection method, features selection by adjusted Rand index and sum of within-class mean deviations (FSASD), is proposed.
FSASD combines the k-means method with the sum of within-class mean deviations (SWD) of the feature data to select the sensitive statistical characteristics for fault analysis.
If high-dimensional statistical characteristic data are used directly for fault classification, the result is very high computational complexity and degraded classification accuracy. Therefore, feature dimensionality reduction is another crucial stage in the fault diagnosis process [21]. Dimension reduction algorithms for machinery fault diagnosis have been intensively investigated [46,47], and many classical methods have been proposed [48]. Principal component analysis (PCA) and linear discriminant analysis (LDA), two classical linear dimensionality reduction methods, have been widely used for linear data; when the distribution of a dataset is nonlinear, however, PCA and LDA may be ineffective [49]. Therefore, several nonlinear dimensionality reduction methods, such as kernel principal component analysis (KPCA), Isomap, Laplacian eigenmaps (LE), and locally linear embedding (LLE), have been presented as solutions for the dimensionality reduction of nonlinear data [12]. Although these nonlinear methods have been successfully applied in many fields, they also have problems in practice, such as the "out-of-sample" problem caused by the lack of an explicit mapping matrix [50], overlearning of locality [51], and high computational complexity. Inspired by nonlinear dimensionality reduction methods, many linear unsupervised dimensionality reduction methods based on manifold learning have been proposed [52], such as neighborhood preserving embedding (NPE) [53], orthogonal neighborhood preserving projections (ONPP) [54], and locality preserving projections (LPP) [55]. These representative methods preserve the local geometric structure of the data manifold using a linear approximation of the nonlinear mapping [52]. In recent years, other manifold learning-based dimensionality reduction methods have also been presented.
In [56], a supervised method called locality preserving embedding (LPE) was proposed; it gives a low-dimensional embedding of discriminative multiclass submanifolds and preserves the principal structure information of the local submanifolds. In [57], maximal local interclass embedding (MLIE) was proposed; MLIE can be viewed as a linear method within a multimanifold-based learning framework that integrates neighborhood information with local interclass relationships [57]. In [52], a general sparse subspace learning framework called sparse linear embedding (SLE) was proposed, which integrates the local geometric structure to obtain sparse projections; ONPP was taken as an example in designing this sparse subspace learning framework [52]. In [58][59][60][61][62], several supervised and semisupervised dimensionality reduction methods based on NPE were proposed. NPE, a manifold learning method, is a linear approximation of LLE that replaces the nonlinear mapping with a linear one to achieve dimensionality reduction [53,63]. NPE aims to preserve the local neighborhood structure of the data manifold and works well with multimodal data. In [63], NPE was applied to bearing fault identification and classification and performed well in feature extraction. However, NPE cannot utilize label information during dimensionality reduction [57]. LDA, by contrast, is a supervised dimensionality reduction method that takes label information into account. Based on the respective attributes of NPE and LDA, a modified NPE called supervised neighborhood preserving embedding with label information (SNPEL) is proposed in this paper, in which the fault label information is considered.
The contribution of this paper is the development of an intelligent fault diagnosis system for rolling bearings based on multidomain features, systematically combining statistical analysis methods with artificial intelligence techniques. FSASD, a novel feature selection method, is proposed to select the fault sensitive statistical characteristics as the basis of subsequent fault analysis. A modified feature reduction method, SNPEL, is proposed to extract abundant and valuable information in a low-dimensional space. The proposed bearing fault diagnosis method is executed in four steps: signal processing, feature extraction, feature reduction, and fault pattern identification. In the first step, vibration signals collected from bearings are decomposed into different terminal nodes by MODWPT, and multidomain features are calculated from the reconstructed signals. In the second step, the adjusted Rand index (ARI) criterion of the clustering method and the SWD of samples are used to select fault sensitive statistical characteristics, which represent the fault peculiarities under different working conditions. In the third step, because of information redundancy and the high dimensionality of the dataset, SNPEL is applied to obtain a new lower-dimensional space in which new features are constructed by transforming the original higher-dimensional features such that certain properties are preserved. Finally, vibration signals collected from two test rigs are used to validate the effectiveness, adaptability, and superiority of the proposed method for the identification and classification of bearing faults. The first test rig is from Case Western Reserve University; four cases with 12 working conditions are employed to verify the performance of the proposed method. The second is the SQI-MFS test rig; two cases with 10 working conditions are employed to verify the performance of the proposed method.
The analysis results for the vibration signals of rolling bearings under different working conditions show the effectiveness, adaptability, and superiority of the proposed fault diagnosis approach.
The rest of this paper is organized as follows. In Section 2, a theoretical background of the LDA technique, NPE technique, and SVM is summarized. In Section 3, a description of the proposed diagnosis technique is given, and the system framework of the proposed method is illustrated. In Section 4, bearing faulty vibration signals collected from two experimental test rigs are employed to verify the proposed fault diagnosis method. Finally, some conclusions are drawn in Section 5.

Bearing Fault Effects on the Vibration in Frequency Domain.
A rolling bearing consists of an inner race, an outer race, balls, and a cage; the balls and the cage, placed in the space between the rings, make rotation possible. However, because of inappropriate lubrication of the rolling elements, inadequate bearing selection, improper mounting, indirect failure, material defects, and manufacturing errors, various defects can occur [21], such as surface fatigue damage, bonding, and wear. The most common of these faults is surface fatigue damage, which is further categorized as spalling, cracks, or other abnormal conditions [64]. When a fault appears on a bearing surface, cyclical impulsive vibration emerges. The repetition frequency of the impulsive vibration is a symptom of the fault; its value depends on the fault size, rotational speed, and damage location [65].
For different bearing components (i.e., the outer race, inner race, and ball, as shown in Figure 1), the main fault frequencies are the cage fault frequency (CFF), the inner raceway fault frequency (IRFF), the outer raceway fault frequency (ORFF), and the ball/roller fault frequency (BRFF). When the outer ring is fixed, these fault frequencies are described mathematically as
$$\mathrm{CFF} = \frac{f_r}{2}\left(1 - \frac{d}{D}\cos\varphi\right), \qquad \mathrm{IRFF} = \frac{n f_r}{2}\left(1 + \frac{d}{D}\cos\varphi\right),$$
$$\mathrm{ORFF} = \frac{n f_r}{2}\left(1 - \frac{d}{D}\cos\varphi\right), \qquad \mathrm{BRFF} = \frac{D f_r}{2d}\left[1 - \left(\frac{d}{D}\cos\varphi\right)^2\right],$$
where $f_r$ is the motor driving frequency or rotational frequency of the shaft, $d$ is the ball/roller diameter, $D$ is the pitch diameter, $n$ is the number of rolling elements, and $\varphi$ is the ball contact angle (zero for rollers) [21]. Therefore, much research on bearing fault analysis has been carried out based on vibration signals.
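The fault frequency formulas can be evaluated directly. The sketch below is illustrative only: the function name `bearing_fault_frequencies` is hypothetical, the BRFF entry uses the ball spin form $D f_r/(2d)[1-(d\cos\varphi/D)^2]$ (some references report twice this value), and the geometry (9 balls, ball diameter 0.3126 in, pitch diameter 1.537 in) is the value commonly quoted for the CWRU drive-end 6205-2RS JEM SKF bearing, which should be treated as an assumption.

```python
import math

def bearing_fault_frequencies(f_r, d, D, n, phi=0.0):
    """Characteristic fault frequencies (Hz) for a bearing with a fixed outer ring.

    f_r: shaft rotational frequency (Hz); d: ball diameter; D: pitch diameter
    (same length unit as d); n: number of rolling elements; phi: contact angle (rad).
    """
    ratio = (d / D) * math.cos(phi)
    return {
        "CFF": f_r / 2 * (1 - ratio),
        "IRFF": n * f_r / 2 * (1 + ratio),
        "ORFF": n * f_r / 2 * (1 - ratio),
        # Ball spin form; some references double this value for a ball defect
        "BRFF": D * f_r / (2 * d) * (1 - ratio ** 2),
    }

# Assumed geometry of the CWRU drive-end bearing (inches), shaft at 1797 rpm
freqs = bearing_fault_frequencies(f_r=1797 / 60, d=0.3126, D=1.537, n=9)
for name, value in freqs.items():
    print(f"{name}: {value:.1f} Hz")
```

Evaluating at $f_r = 1$ Hz gives the frequencies as multiples of shaft speed, which is a convenient way to compare them with peaks in an envelope spectrum.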

Maximal Overlap Discrete Wavelet Packet Transform (MODWPT)
WT is a fast-evolving mathematical and signal processing tool for dealing with nonstationary signals [66] and has been widely applied in many engineering fields for decomposing, denoising, and analyzing nonstationary signals [26,42]. The CWT and the DWT are the two categories of WT. One drawback of the CWT is that it generates redundant data, so it is computationally expensive and time-consuming [31,32]. The DWT overcomes this drawback by decomposing the original complex signal into several resolutions [33,34]. Let $\mathbf{g} = [g_0, g_1, \ldots, g_{L-1}]^T$ be a column vector containing the low-pass (scaling) filter coefficients of even width $L$, which satisfy
$$\sum_{l=0}^{L-1} g_l = \sqrt{2},$$
$$\sum_{l=0}^{L-1} g_l^2 = 1, \qquad \sum_{l=0}^{L-1} g_l\, g_{l+2m} = 0 \quad \text{for all nonzero integers } m. \tag{2}$$
Let $\mathbf{h} = [h_0, h_1, \ldots, h_{L-1}]^T$ contain the high-pass (wavelet) filter coefficients, with $\sum_{l} h_l = 0$. These high-pass filters are also required to satisfy (2). In addition, the low-pass and high-pass filters are chosen to be quadrature mirror filters satisfying $h_l = (-1)^l g_{L-1-l}$, or equivalently $g_l = (-1)^{l+1} h_{L-1-l}$, for $l = 0, 1, \ldots, L-1$.
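Although the DWT was developed to remedy the above drawback of the CWT [33,34], it requires the sample size to be an exact power of 2 for the full transform because of its downsampling step [35]. To overcome this limitation, the maximal overlap discrete wavelet transform (MODWT) was developed [36]. The MODWT can be considered a modified version of the DWT: whereas a level-$J$ DWT restricts the sample size to an integer multiple of $2^J$, a level-$J$ MODWT is well defined for any sample size [35][36][37]. A rescaling of the defining filters is required to conserve energy, and the MODWT filters are given by
$$\tilde{g}_l = \frac{g_l}{\sqrt{2}}, \qquad \tilde{h}_l = \frac{h_l}{\sqrt{2}}.$$
Thus, (2) becomes
$$\sum_{l=0}^{L-1} \tilde{g}_l^2 = \frac{1}{2}, \qquad \sum_{l=0}^{L-1} \tilde{g}_l\, \tilde{g}_{l+2m} = 0 \quad \text{for all nonzero integers } m,$$
and likewise for $\tilde{h}_l$. However, both the DWT and the MODWT have very poor frequency resolution at high frequencies [36], because only the low-frequency band is decomposed further at each level. To address this drawback, the maximal overlap discrete wavelet packet transform (MODWPT) further decomposes the high-frequency bands, which are not decomposed in the DWT and the MODWT. Let $W_{j,n,t}$ denote the MODWPT coefficients of node $n$ ($n = 0, 1, \ldots, 2^j - 1$) at level $j$, with $W_{0,0,t} = X_t$ for a signal $X_0, \ldots, X_{N-1}$. The coefficients are computed recursively as
$$W_{j,n,t} = \sum_{l=0}^{L-1} \tilde{u}_{n,l}\, W_{j-1,\lfloor n/2 \rfloor,\,(t - 2^{j-1} l) \bmod N}, \qquad t = 0, 1, \ldots, N-1,$$
where $\tilde{u}_{n,l} = \tilde{g}_l$ when $n \bmod 4 = 0$ or $3$, and $\tilde{u}_{n,l} = \tilde{h}_l$ when $n \bmod 4 = 1$ or $2$. Therefore, with a suitable decomposition scale and disjoint dyadic decomposition, a complicated signal can be decomposed into a number of components whose instantaneous amplitude and instantaneous frequency attain physical meaning [36,37].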
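The MODWPT recursion can be sketched with plain circular filtering. This is a minimal illustration, not the implementation used in the paper: the function name `modwpt` is hypothetical, the Haar filter stands in for the "dmey" wavelet for brevity, the high-pass filter is derived from the quadrature mirror relation, and the single branch reconstruction step is not shown. The demonstration uses N = 1000 samples, which is not a power of 2, to show that no sample-size restriction applies, and checks that energy is conserved across the 16 terminal nodes.

```python
import numpy as np

def modwpt(x, g, level):
    """Level-`level` MODWPT coefficients by circular filtering.

    g: low-pass (scaling) filter with sum(g) = sqrt(2); the high-pass filter is
    derived from the quadrature mirror relation h_l = (-1)^l g_{L-1-l}.
    Returns 2**level coefficient arrays, each the same length as x.
    """
    g = np.asarray(g, dtype=float)
    L = g.size
    h = np.array([(-1) ** l * g[L - 1 - l] for l in range(L)])
    g_t, h_t = g / np.sqrt(2.0), h / np.sqrt(2.0)   # MODWT rescaling of the filters
    x = np.asarray(x, dtype=float)
    N = x.size
    t = np.arange(N)
    nodes = [x]                                     # W_{0,0} = X
    for j in range(1, level + 1):
        children = []
        for n in range(2 ** j):
            u = g_t if n % 4 in (0, 3) else h_t     # filter choice by n mod 4
            parent = nodes[n // 2]
            w = np.zeros(N)
            for l in range(L):                      # circular filter with stride 2^(j-1)
                w += u[l] * parent[(t - 2 ** (j - 1) * l) % N]
            children.append(w)
        nodes = children
    return nodes

haar = [1 / np.sqrt(2), 1 / np.sqrt(2)]
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                       # N = 1000 is not a power of 2
W = modwpt(x, haar, level=4)
print(len(W), sum(np.sum(w ** 2) for w in W), np.sum(x ** 2))
```

Other filters can be passed as `g`; for example, PyWavelets exposes a discrete Meyer decomposition low-pass filter as `pywt.Wavelet('dmey').dec_lo`, but its coefficient ordering may be time-reversed relative to the convention above and should be checked before use.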

Linear Discriminant Analysis (LDA).
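The LDA was proposed by Fisher [67] for dimension reduction; it finds an embedding transformation such that the between-class scatter is maximized and the within-class scatter is minimized [68][69][70]. The objective of the original Fisher's LDA, namely, Fisher's criterion, is to maximize the ratio of the between-class scatter matrix $S_b$ to the within-class scatter matrix $S_w$:
$$J(\mathbf{p}) = \frac{|\mathbf{p}^T S_b \mathbf{p}|}{|\mathbf{p}^T S_w \mathbf{p}|},$$
where $\mathbf{p}$ is a projection vector, $\mathbf{p}^T S_b \mathbf{p}$ and $\mathbf{p}^T S_w \mathbf{p}$ are two scalars, and $|\cdot|$ is the absolute value operator. However, many state classes are usually present in the identification and classification of different bearing faults, so multiclass LDA is more desirable [21]. Let $\mathbf{x}_i \in \mathbb{R}^n$ ($i = 1, 2, \ldots, N$) be $n$-dimensional samples and $y_i \in \{1, 2, \ldots, c\}$ be the associated class labels, where $N$ is the number of samples and $c$ is the total number of classes. Let $N_k$ be the number of samples in class $k$. When $r > 1$, where $r = c - 1$, a projection matrix $P \in \mathbb{R}^{n \times r}$ is needed. Both $P^T S_b P$ and $P^T S_w P$ are $r \times r$ matrices, so their ratio cannot be computed directly. The determinant ratio is used instead:
$$J(P) = \frac{|P^T S_b P|}{|P^T S_w P|},$$
where $|\cdot|$ now denotes the determinant and the between-class and within-class scatter matrices are defined as
$$S_b = \sum_{k=1}^{c} N_k (\boldsymbol{\mu}_k - \boldsymbol{\mu})(\boldsymbol{\mu}_k - \boldsymbol{\mu})^T, \qquad S_w = \sum_{k=1}^{c} \sum_{i:\, y_i = k} (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^T,$$
where $\boldsymbol{\mu}_k$ is the mean of the samples in class $k$ and $\boldsymbol{\mu}$ is the mean of all samples:
$$\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{i:\, y_i = k} \mathbf{x}_i, \qquad \boldsymbol{\mu} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i.$$
With $X = [\mathbf{x}_1, \ldots, \mathbf{x}_N]$, the between-class and within-class scatter matrices also have the equivalent form [71,72]
$$S_b = X (D_b - W_b) X^T, \qquad S_w = X (D_w - W_w) X^T,$$
where $W_b = [W_{b,ij}]_{N \times N}$ and $W_w = [W_{w,ij}]_{N \times N}$ are weight matrices and $D_b$ and $D_w$ are diagonal matrices: $D_{b,ii}$, the $i$th diagonal element of $D_b$, is the sum of the elements of the $i$th row of $W_b$, and $D_{w,ii}$ is the sum of the elements of the $i$th row of $W_w$. The solution that minimizes the within-class scatter and maximizes the between-class scatter is obtained by an eigenvalue decomposition of $S_w^{-1} S_b$, taking the eigenvectors corresponding to the $r$ largest eigenvalues.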
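The scatter matrices, their graph form, and the eigenvalue solution can be checked numerically. In this sketch the function names are hypothetical, and the pairwise weights ($W_{w,ij} = 1/N_k$ and $W_{b,ij} = 1/N - 1/N_k$ for two samples of class $k$, $W_{b,ij} = 1/N$ otherwise) are an assumed choice under which the graph form reproduces the direct definitions; the text above does not state which weights are used. The generalized problem $S_b \mathbf{p} = \lambda S_w \mathbf{p}$ is solved through a Cholesky factor of $S_w$, which is equivalent to the eigendecomposition of $S_w^{-1} S_b$ when $S_w$ is positive definite.

```python
import numpy as np

def scatter_matrices(X, y):
    """Direct S_b and S_w; X is n x N (columns are samples), y holds class labels."""
    n = X.shape[0]
    mu = X.mean(axis=1, keepdims=True)
    S_b, S_w = np.zeros((n, n)), np.zeros((n, n))
    for k in np.unique(y):
        Xk = X[:, y == k]
        mu_k = Xk.mean(axis=1, keepdims=True)
        S_b += Xk.shape[1] * (mu_k - mu) @ (mu_k - mu).T
        S_w += (Xk - mu_k) @ (Xk - mu_k).T
    return S_b, S_w

def scatter_matrices_graph(X, y):
    """Graph form X (D - W) X^T with an assumed choice of pairwise weights."""
    N = X.shape[1]
    same = y[:, None] == y[None, :]
    counts = np.array([np.sum(y == label) for label in y])      # N_k of each sample's class
    W_w = np.where(same, 1.0 / counts[:, None], 0.0)
    W_b = np.where(same, 1.0 / N - 1.0 / counts[:, None], 1.0 / N)
    L_w = np.diag(W_w.sum(axis=1)) - W_w                          # D_w - W_w
    L_b = np.diag(W_b.sum(axis=1)) - W_b                          # D_b - W_b
    return X @ L_b @ X.T, X @ L_w @ X.T

def lda_projection(S_b, S_w, r):
    """Top-r solutions of S_b p = lambda S_w p (same as eigenvectors of S_w^{-1} S_b)."""
    C_inv = np.linalg.inv(np.linalg.cholesky(S_w))                # S_w = C C^T
    vals, vecs = np.linalg.eigh(C_inv @ S_b @ C_inv.T)
    order = np.argsort(vals)[::-1][:r]                            # largest eigenvalues first
    return C_inv.T @ vecs[:, order], vals[order]

rng = np.random.default_rng(1)
means = np.array([[0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1]], dtype=float)
X = np.hstack([rng.normal(m[:, None], 1.0, size=(4, 30)) for m in means])
y = np.repeat([0, 1, 2], 30)

S_b, S_w = scatter_matrices(X, y)
S_b_graph, S_w_graph = scatter_matrices_graph(X, y)
P, eigvals = lda_projection(S_b, S_w, r=2)   # r = c - 1 = 2
Z = P.T @ X                                  # projected samples, 2 x 90
```

The Cholesky route keeps the problem symmetric, so `numpy.linalg.eigh` returns real eigenvalues; forming $S_w^{-1} S_b$ explicitly would give a nonsymmetric matrix.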

Neighborhood Preserving Embedding (NPE)
NPE, proposed by He et al. [53] for dimension reduction, aims to preserve the local neighborhood structure of the data manifold and is a linear approximation of LLE. NPE avoids the sensitivity of LLE to outliers [63]. NPE not only seeks an embedding transformation that preserves the local manifold structure but can also be performed in either supervised or unsupervised mode when class information and a better weight matrix are available [53].
Given a dataset of $N$ samples assembled in a matrix $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N] \in \mathbb{R}^{n \times N}$, where each sample has dimension $n$, NPE finds a transformation matrix $A$ that maps these samples to a dataset $Y = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N] \in \mathbb{R}^{d \times N}$ of dimension $d$ ($d \ll n$), where the $i$th column of $Y$ corresponds to the $i$th column of $X$. Thus, the transformation can be expressed as $Y = A^T X$. The procedure is as follows [53,63]:
(1) Constructing an adjacency graph: calculate the Euclidean distance between samples $\mathbf{x}_i$ and $\mathbf{x}_j$,
$$d(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\|_2 .$$
The $k$-nearest neighbors (knn) are used to construct the adjacency graph $G$, in which the distance $d(\mathbf{x}_i, \mathbf{x}_j)$ is associated with the edge connecting $\mathbf{x}_i$ and $\mathbf{x}_j$.
(2) Computing the weights: let $W$ denote the weight matrix, with $W_{ij}$ the weight of the edge from node $i$ to node $j$, set to 0 if there is no such edge. The weights are computed by minimizing the reconstruction objective
$$\min_W \sum_{i=1}^{N} \Big\| \mathbf{x}_i - \sum_{j} W_{ij} \mathbf{x}_j \Big\|^2$$
with constraints
$$\sum_{j} W_{ij} = 1, \qquad i = 1, 2, \ldots, N.$$
(3) Computing the projections: a reasonable criterion for choosing the map is to minimize the cost function [72]
$$\Phi(Y) = \sum_{i=1}^{N} \Big\| \mathbf{y}_i - \sum_{j} W_{ij} \mathbf{y}_j \Big\|^2 .$$
This optimization problem can be converted to
$$\min_A \operatorname{tr}\left(A^T X M X^T A\right) \quad \text{subject to} \quad A^T X X^T A = I,$$
where $M = (I - W)^T (I - W)$, $I = \operatorname{diag}(1, \ldots, 1)$, and $\operatorname{tr}(\cdot)$ denotes the trace. $M$ is symmetric and positive semidefinite. The details of how to solve this minimization problem can be found in [72].
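The three NPE steps can be condensed into a short NumPy sketch. It is illustrative rather than the reference implementation of [53]: the function name `npe` is hypothetical, the Tikhonov regularization of the local Gram matrix (as commonly done for LLE) and the tiny ridge on $XX^T$ are assumptions added for numerical stability, and the projection is taken as the eigenvectors of $XMX^T\mathbf{a} = \lambda XX^T\mathbf{a}$ with the smallest eigenvalues.

```python
import numpy as np

def npe(X, k=5, d=2, reg=1e-3):
    """Minimal NPE sketch. X is n x N (columns are samples); returns (A, W)."""
    n, N = X.shape
    # Step 1: pairwise Euclidean distances and k-nearest-neighbour graph
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist[i])[1:k + 1]          # skip the sample itself
        # Step 2: reconstruction weights with sum_j W_ij = 1
        Z = X[:, nbrs] - X[:, [i]]                   # neighbours centred on x_i
        G = Z.T @ Z
        G += reg * np.trace(G) * np.eye(k)           # regularisation (assumed)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()
    # Step 3: min tr(A^T X M X^T A) s.t. A^T X X^T A = I
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    lhs = X @ M @ X.T
    rhs = X @ X.T + 1e-9 * np.eye(n)                 # tiny ridge keeps rhs positive definite
    C_inv = np.linalg.inv(np.linalg.cholesky(rhs))
    vals, vecs = np.linalg.eigh(C_inv @ lhs @ C_inv.T)
    A = C_inv.T @ vecs[:, :d]                        # eigh sorts ascending: smallest first
    return A, W

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 60))
A, W = npe(X, k=6, d=2)
Y = A.T @ X                                          # 2 x 60 embedding
```

Because $A$ is an explicit linear map, new samples can be projected with the same `A`, which is the property that avoids the out-of-sample problem of LLE.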

Support Vector Machine (SVM).
The key concept of SVM [73], originally developed for binary classification problems, is to use a hyperplane to define the decision boundary between data points of different classes. SVM seeks to construct the optimal hyperplane separating the two patterns, which minimizes the upper bound of the generalization error by maximizing the margin between the separating hyperplane and the nearest sample points [24]. SVM can handle both simple linear classification tasks and the classification of complex, nonlinear multiclass data [12].
Consider a dataset $\{\mathbf{x}_i, y_i\}_{i=1}^{N}$ of $d$-dimensional samples, where $\mathbf{x}_i \in \mathbb{R}^d$ ($i = 1, 2, \ldots, N$) represents the attributes and the corresponding label $y_i \in \{-1, +1\}$ defines the type. To obtain a hyperplane separating the two types of samples, a linear decision boundary $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$ can be learned from the training samples, where $\mathbf{w}$ is the normal direction of the separating plane and $b$ is the bias [12,24]. Samples of each type can be classified through the following constraints:
$$\mathbf{w}^T \mathbf{x}_i + b \ge +1 \ \text{ if } y_i = +1, \qquad \mathbf{w}^T \mathbf{x}_i + b \le -1 \ \text{ if } y_i = -1, \qquad \text{i.e.,} \quad y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1.$$
The optimal hyperplane is obtained by solving the following optimization problem:
$$\min_{\mathbf{w}, b} \ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1, \quad i = 1, \ldots, N.$$
When the data are linearly separable, this formulation works accurately. However, it becomes ineffective when the samples overlap or are nonlinearly distributed [12]. Thus, slack variables $\xi_i \ge 0$ are introduced to make the classifier more robust, allowing a certain degree of misclassification for points near the decision boundary. Furthermore, a penalty parameter $C$, which imposes a trade-off between training error and generalization [24], is introduced to control the number of misclassified points and adjust the margin between classes [12]. The optimization problem for the optimal decision boundary then becomes
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$
Using the duality theory of optimization, the final decision function of this constrained problem can be written as [24]
$$f(\mathbf{x}) = \operatorname{sgn}\Big( \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \Big),$$
where $\alpha_i$ are the Lagrange multipliers and $K(\mathbf{x}_i, \mathbf{x}_j)$ is a positive definite kernel function. Typical kernel functions [10] include the linear kernel, polynomial kernel, radial basis function (RBF) kernel, and sigmoid kernel.
For rolling element bearings, fault detection is a multiclass pattern recognition task, which can generally be solved by decomposing the multiclass problem into several binary problems [74]. In [75], multiclass pattern recognition was handled by the "one-against-one" approach. In this paper, the polynomial kernel is selected to solve the multiclass pattern recognition task.
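A polynomial-kernel SVM combined by one-against-one voting can be sketched in NumPy alone. This is a simplified stand-in, not the solver used in the paper (which is not specified here): each binary SVM omits the explicit bias $b$ and is trained by dual coordinate ascent on $\sum_i \alpha_i - \tfrac12\sum_{i,j}\alpha_i\alpha_j y_i y_j K_{ij}$ with $0 \le \alpha_i \le C$; the constant term `coef0` of the polynomial kernel plays the role of the bias. All function names and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def poly_kernel(A, B, degree=3, coef0=1.0):
    """Polynomial kernel K(a, b) = (a . b + coef0) ** degree for rows of A and B."""
    return (A @ B.T + coef0) ** degree

def fit_binary_svm(X, y, C=1.0, degree=3, epochs=100):
    """Bias-free soft-margin SVM dual solved by coordinate ascent; y is -1/+1."""
    K = poly_kernel(X, X, degree)
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            grad = 1.0 - y[i] * np.dot(alpha * y, K[:, i])   # d(dual)/d(alpha_i)
            alpha[i] = np.clip(alpha[i] + grad / K[i, i], 0.0, C)
    return alpha

def fit_one_against_one(X, labels, **kw):
    """Train one binary SVM for every pair of classes."""
    models = {}
    for a, b in itertools.combinations(np.unique(labels), 2):
        mask = (labels == a) | (labels == b)
        y = np.where(labels[mask] == a, 1.0, -1.0)
        models[(a, b)] = (X[mask], y, fit_binary_svm(X[mask], y, **kw))
    return models

def predict_one_against_one(models, X, degree=3):
    """Each pairwise classifier casts a vote; the class with most votes wins.
    `degree` must match the value used in training."""
    classes = np.unique([c for pair in models for c in pair])
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for (a, b), (X_ab, y_ab, alpha) in models.items():
        f = poly_kernel(X, X_ab, degree) @ (alpha * y_ab)   # sum_i alpha_i y_i K(x_i, x)
        winner = np.where(f >= 0, np.searchsorted(classes, a), np.searchsorted(classes, b))
        votes[np.arange(len(X)), winner] += 1
    return classes[votes.argmax(axis=1)]

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X_train = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
y_train = np.repeat([0, 1, 2], 30)
X_test = np.vstack([rng.normal(c, 0.5, size=(10, 2)) for c in centers])
y_test = np.repeat([0, 1, 2], 10)

models = fit_one_against_one(X_train, y_train, C=1.0, degree=3)
accuracy = np.mean(predict_one_against_one(models, X_test, degree=3) == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

With $c$ classes, one-against-one trains $c(c-1)/2$ binary classifiers; for the 3 synthetic classes above that is 3 models.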

Features Extraction Method FSASD (Features Selection by Adjusted Rand Index and Sum of Within-Class Mean Deviations).
In this paper, we suggest that the most sensitive statistical characteristics should be selected before the fault pattern recognition technique is applied. For this reason, the k-means method and SWD are applied to a dataset containing different statistical characteristics under the various bearing conditions. In FSASD, each statistical characteristic is clustered by the k-means method, and the adjusted Rand index (ARI) of the clustering result serves as an evaluation index of that characteristic.
For each statistical characteristic, we compute the SWD of the characteristic samples in each bearing condition and then sum the SWD over all bearing conditions. The higher the ARI of a statistical characteristic, the greater its class discriminative degree; the lower its SWD, the greater its class cohesion. Therefore, the ratio of ARI to SWD is used to indicate the sensitivity of a statistical characteristic. FSASD is summarized in the following steps.
Step 1. In the training samples, there are $c$ bearing fault types, $m$ vibration signal samples of each fault type, and $p$ statistical characteristics. By vibration signal processing, we obtain the original feature sets $F_i \in \mathbb{R}^{c \times m}$ ($i = 1, 2, \ldots, p$), whose element $f_{i,j,k}$ in row $k$ and column $j$ is the $i$th statistical characteristic of the $j$th sample of the $k$th bearing fault type. Next, the values of $F_i$ are clustered into $c$ partitions using the k-means method, and the ARI between the clustering partition $V$ and the true fault-type partition $U$ is calculated to judge the accuracy of the clustering result [76,77]:
$$\mathrm{ARI} = \frac{2\,(n_{11} n_{00} - n_{10} n_{01})}{(n_{11} + n_{10})(n_{10} + n_{00}) + (n_{11} + n_{01})(n_{01} + n_{00})},$$
where $n_{11}$ is the number of pairs of objects placed in the same class in $U$ and in the same class in $V$, $n_{10}$ is the number of pairs placed in the same class in $U$ and in different classes in $V$, $n_{01}$ is the number of pairs placed in different classes in $U$ and in the same class in $V$, and $n_{00}$ is the number of pairs placed in different classes in $U$ and in different classes in $V$.
The ARI measures the agreement between two partitions in classification problems [79]. An ARI value of 1 (its maximum) indicates that the clustering distinguishes the classes perfectly [79]. In general, the greater the ARI, the better the clustering performance; the ARI therefore reflects the discriminant power of the characteristic [79].
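A small worked example of the pair-counting form of the ARI, $\mathrm{ARI} = 2(n_{11}n_{00} - n_{10}n_{01}) / [(n_{11}+n_{10})(n_{10}+n_{00}) + (n_{11}+n_{01})(n_{01}+n_{00})]$, may help; the four objects here are constructed for illustration and are not taken from the experiments. Let the true partition be $U = \{1,2\},\{3,4\}$ and the clustering be $V = \{1,2\},\{3\},\{4\}$. Of the six pairs, (1,2) is together in both ($n_{11} = 1$), (3,4) is together only in $U$ ($n_{10} = 1$), no pair is together only in $V$ ($n_{01} = 0$), and the remaining four pairs are separated in both ($n_{00} = 4$), so

$$\mathrm{ARI} = \frac{2\,(1 \cdot 4 - 1 \cdot 0)}{(1+1)(1+4) + (1+0)(0+4)} = \frac{8}{14} \approx 0.571 .$$

Splitting one true class into two clusters is therefore penalized, but far less than an unrelated partition, whose ARI is close to 0.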
Step 2. For each statistical characteristic, the SWD of its characteristic samples is calculated for each type of bearing condition, that is, the SWD of the elements of each row of the matrix $F_i$. Therefore, for the $i$th characteristic we obtain the set $[\mathrm{SWD}_1, \mathrm{SWD}_2, \ldots, \mathrm{SWD}_c]$, where $\mathrm{SWD}_k$ measures the deviations of the samples of the $k$th bearing condition from their within-class mean.

Next, we obtain SSWD($i$), the sum of the SWD of the characteristic samples of the $i$th statistical characteristic over all bearing conditions:
$$\mathrm{SSWD}(i) = \sum_{k=1}^{c} \mathrm{SWD}_k .$$
In this paper, we presume that the SWD expresses the cohesion of the data. Thus, we obtain the standard deviation sequence $\mathrm{SSWD} = \{\mathrm{SSWD}(1), \mathrm{SSWD}(2), \ldots, \mathrm{SSWD}(p)\}$, which becomes another evaluation index for feature selection: the lower SSWD($i$), the greater the class cohesion of the characteristic.
Step 3. Obtain a new sequence $\mathrm{ASD} = \{\mathrm{ASD}(1), \mathrm{ASD}(2), \ldots, \mathrm{ASD}(p)\}$, where
$$\mathrm{ASD}(i) = \frac{\mathrm{ARI}(i)}{\mathrm{SSWD}(i)} .$$
In this paper, we presume that the greater ASD($i$), the more sensitive the corresponding statistical characteristic. The sorted ratio sequence of ARI and SWD (SASD) is then obtained by sorting ASD in descending order.
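The three FSASD steps can be sketched as a ranking function. The sketch rests on assumptions that the text does not fix: SWD is taken as the sum of absolute deviations from each class mean, the one-dimensional k-means is initialized at evenly spaced quantiles, and characteristics are z-scored beforehand so that SSWD values are comparable across characteristics of different scales. The helper names `kmeans_1d`, `adjusted_rand_index`, and `sasd_order` are hypothetical. Rows of `F` are samples and columns are statistical characteristics.

```python
import itertools
import numpy as np

def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on a 1-D array, initialised at evenly spaced quantiles."""
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new = np.array([values[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def adjusted_rand_index(u, v):
    """Pair-counting ARI between two labelings u and v."""
    n11 = n10 = n01 = n00 = 0
    for i, j in itertools.combinations(range(len(u)), 2):
        same_u, same_v = u[i] == u[j], v[i] == v[j]
        n11 += int(same_u and same_v)
        n10 += int(same_u and not same_v)
        n01 += int(not same_u and same_v)
        n00 += int(not same_u and not same_v)
    denom = (n11 + n10) * (n10 + n00) + (n11 + n01) * (n01 + n00)
    return 1.0 if denom == 0 else 2.0 * (n11 * n00 - n10 * n01) / denom

def sasd_order(F, y):
    """Rank characteristics (columns of F) by ASD = ARI / SSWD, descending."""
    classes = np.unique(y)
    asd = np.empty(F.shape[1])
    for i in range(F.shape[1]):
        col = F[:, i]
        ari = adjusted_rand_index(y, kmeans_1d(col, len(classes)))       # Step 1
        sswd = sum(np.abs(col[y == k] - col[y == k].mean()).sum()         # Step 2
                   for k in classes)
        asd[i] = ari / sswd                                               # Step 3
    return np.argsort(asd)[::-1], asd

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
noise = rng.normal(0.0, 1.0, y.size)                   # carries no class information
informative = 3.0 * y + rng.normal(0.0, 0.3, y.size)   # separates the three classes
F = np.column_stack([noise, informative])
F = (F - F.mean(axis=0)) / F.std(axis=0)               # z-score each characteristic
order, asd = sasd_order(F, y)
print(order)                                           # SASD: column indices, most sensitive first
```

The first entries of `order` identify the characteristics that would be kept; the same indices are reused, without recomputation, on the testing samples.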

Supervised Neighborhood Preserving Embedding with Label Information (SNPEL).
Although NPE preserves the local neighborhood structure of the data manifold, it is mostly used as an unsupervised dimensionality reduction method that does not take label information into account. However, label information is useful for improving dimensionality reduction performance and increasing classification accuracy. Therefore, a novel dimensionality reduction method, SNPEL, is proposed. SNPEL naturally inherits the merits of NPE and LDA. The underlying idea is to integrate the optimization objective of LDA into NPE, so that the between-class scatter is maximized and the within-class scatter is minimized.
Based on the descriptions of NPE and LDA in Section 2, the optimization objective of SNPEL is obtained by combining the optimization objectives of LDA and NPE. The combined objective function is first defined and then, according to (15), rewritten in terms of the equivalent graph form of the scatter matrices. This optimization problem can be converted to a trace ratio optimization problem, and according to (21), objective function (35) can be simplified; in this step, the matrix $M$ and the matrix $(D - W)$ need to be normalized. In the final optimization objective function, $(D - W)_{\mathrm{nor}}$ and $M_{\mathrm{nor}}$ denote the normalized matrix $(D - W)$ and the normalized matrix $M$, respectively. Finally, the dimensionality reduction projection matrix $A$ is obtained by solving a generalized eigenvalue problem. The detailed procedure of SNPEL is as follows.
Step 1. Compute the Euclidean distance between samples $\mathbf{x}_i$ and $\mathbf{x}_j$, and use the $k$-nearest neighbors (knn) to construct the adjacency graph $G$.
Step 2. Compute the weights on the edges. Let $W$ denote the weight matrix, with $W_{ij}$ the weight of the edge from node $i$ to node $j$, set to 0 if there is no such edge. The weights are computed by minimizing the weighted reconstruction objective (18).
Step 3. Compute the $n$-dimensional mean vectors of the different classes in the dataset.
Step 6. Sort the eigenvectors by decreasing eigenvalue and choose the $d$ eigenvectors with the largest eigenvalues to form the $n \times d$ projection matrix $A$.
Step 7. Compute $Y = A^T X$. The $n$-dimensional samples are thereby transformed into $d$-dimensional samples, completing the dimensionality reduction. Finally, with SNPEL, the low-dimensional feature matrices of the training and testing datasets can be obtained, containing more sensitive and less redundant information for bearing fault identification and classification.
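Because the exact SNPEL objective is not recoverable from the text above, the following is only a hypothetical sketch of the idea it describes, and `snpel_like` is not the authors' method. It assumes the combined criterion takes the form $\max_A \operatorname{tr}(A^T X L_b X^T A) / \operatorname{tr}(A^T X (L_{w,\mathrm{nor}} + M_{\mathrm{nor}}) X^T A)$, where $L_b = D_b - W_b$ and $L_w = D_w - W_w$ use the assumed pairwise LDA weights, "normalized" means divided by the Frobenius norm, and the trace ratio is approximated by the generalized eigenvalue problem, keeping the eigenvectors with the largest eigenvalues (Step 6).

```python
import numpy as np

def lle_weights(X, k):
    """NPE/LLE reconstruction weights; X is n x N (columns are samples)."""
    N = X.shape[1]
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist[i])[1:k + 1]                   # k nearest neighbours
        Z = X[:, nbrs] - X[:, [i]]
        G = Z.T @ Z
        G += 1e-3 * np.trace(G) * np.eye(k)                   # regularised local Gram (assumed)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                              # rows sum to 1
    return W

def frobenius_normalized(S):
    """Assumed normalisation: divide a matrix by its Frobenius norm."""
    return S / np.linalg.norm(S)

def snpel_like(X, y, k=5, d=2):
    """Hypothetical supervised NPE: maximise between-class scatter while minimising
    normalised within-class scatter plus the NPE reconstruction cost."""
    N = X.shape[1]
    I = np.eye(N)
    W = lle_weights(X, k)
    M = (I - W).T @ (I - W)
    same = y[:, None] == y[None, :]
    counts = np.array([np.sum(y == label) for label in y])
    W_w = np.where(same, 1.0 / counts[:, None], 0.0)
    W_b = np.where(same, 1.0 / N - 1.0 / counts[:, None], 1.0 / N)
    L_w = np.diag(W_w.sum(axis=1)) - W_w
    L_b = np.diag(W_b.sum(axis=1)) - W_b
    num = X @ L_b @ X.T
    den = X @ (frobenius_normalized(L_w) + frobenius_normalized(M)) @ X.T
    den += 1e-9 * np.eye(X.shape[0])                          # keep den positive definite
    C_inv = np.linalg.inv(np.linalg.cholesky(den))
    vals, vecs = np.linalg.eigh(C_inv @ num @ C_inv.T)
    order = np.argsort(vals)[::-1][:d]                        # largest eigenvalues first
    return C_inv.T @ vecs[:, order]

rng = np.random.default_rng(2)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(0.0, 1.0, (6, y.size))
X[0] += 4.0 * (y == 1)                                        # two discriminative dimensions
X[1] += 4.0 * (y == 2)
A = snpel_like(X, y, k=6, d=2)
Y = A.T @ X                                                   # Step 7: Y = A^T X
```

The weighting between the within-class term and the NPE term is another design choice; here both are normalized and summed with equal weight.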

System Framework.
The implementation of the proposed method is shown in Figure 2, where statistical analysis and artificial intelligence approaches are systematically blended to detect and diagnose rolling element bearing faults. The whole fault diagnosis procedure is divided into four steps: signal processing, feature extraction, feature reduction, and pattern recognition.
In the first step, vibration signals collected from bearings are decomposed into different wavelet packet nodes by MODWPT, and the single branch reconstruction signals of the terminal nodes are used to generate statistical characteristics. With the proposed FSASD, the most sensitive statistical characteristics are selected to construct feature vectors for training the classifier; the same characteristics are then extracted directly for the testing samples. Next, for feature reduction, the low-dimensional training feature space is obtained by the proposed SNPEL, which also generates a projection matrix used to reduce the dimensionality of the testing feature space, yielding the low-dimensional testing feature space. In other words, SASD and the projection matrix are obtained by processing the training set only and are then applied directly to the testing set. In the last step, the low-dimensional training feature set, together with the fault-type labels, is used to train the classifier, and the trained classifier performs fault pattern recognition on the low-dimensional testing feature set. The procedure outputs the fault identification and classification accuracy.
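The key discipline in this framework is that the feature ranking and the projection matrix are learned from the training set once and reused unchanged on the testing set. The sketch below shows only that data flow; `diagnose` is a hypothetical name, and the three plug-in components are deliberately simple stand-ins (a between/within variance ratio in place of SASD, PCA in place of SNPEL, a nearest-mean rule in place of the SVM), not the methods of the paper. Here rows are samples and columns are statistical characteristics.

```python
import numpy as np

def diagnose(F_train, y_train, F_test, rank_features, fit_projection, fit_classifier, n_keep):
    """Learn ranking, projection, and classifier on training data; apply them to test data."""
    keep = rank_features(F_train, y_train)[:n_keep]      # ranking from training set only
    A = fit_projection(F_train[:, keep], y_train)         # projection from training set only
    predict = fit_classifier(F_train[:, keep] @ A, y_train)
    return predict(F_test[:, keep] @ A)                  # test set reuses `keep` and `A`

# Stand-in components (hypothetical, for illustration only)
def variance_ratio_rank(F, y):
    classes = np.unique(y)
    between = np.var([F[y == c].mean(axis=0) for c in classes], axis=0)
    within = np.mean([F[y == c].var(axis=0) for c in classes], axis=0)
    return np.argsort(between / within)[::-1]

def pca_projection(F, y, d=2):
    _, _, Vt = np.linalg.svd(F - F.mean(axis=0), full_matrices=False)
    return Vt[:d].T

def nearest_mean_classifier(Z, y):
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    return lambda Zq: classes[np.argmin(((Zq[:, None, :] - means[None]) ** 2).sum(-1), axis=1)]

rng = np.random.default_rng(0)

def make_dataset(n_per_class):
    y = np.repeat([0, 1, 2], n_per_class)
    F = rng.normal(0.0, 1.0, (y.size, 6))
    F[:, 0] += 4.0 * (y == 1)      # only the first two characteristics carry class information
    F[:, 1] += 4.0 * (y == 2)
    return F, y

F_tr, y_tr = make_dataset(30)
F_te, y_te = make_dataset(10)
pred = diagnose(F_tr, y_tr, F_te, variance_ratio_rank, pca_projection,
                nearest_mean_classifier, n_keep=2)
print("accuracy:", np.mean(pred == y_te))
```

Refitting the ranking or the projection on the testing samples would leak label-free but distribution-specific information into evaluation, so the accuracy reported by such a pipeline would no longer reflect performance on unseen data.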

Experimental Setup and Cases. The vibration dataset is freely provided by the Bearing Data Center of Case Western Reserve University (CWRU) [45]. Figure 3 shows the test rig used to measure the data, which includes an electric motor (left), a torque transducer/encoder (center), a dynamometer (right), and control circuitry (not shown). The bearings used in this work are SKF 6205-2RS JEM deep groove ball bearings at the drive end (DE). Single faults (ball fault, inner race fault, and outer race fault) were separately seeded on normal bearings with different defect sizes (0.007 in, 0.014 in, 0.021 in, and 0.028 in) using electro-discharge machining [12]. The vibration signals were collected using accelerometers under motor loads of 0-3 hp (motor speeds of 1730 to 1797 rpm).
To evaluate the effectiveness, adaptability, and robustness of the proposed bearing fault diagnosis method, vibration signals of different fault types and degrees were employed. Detailed information on the dataset used is …

Analysis Results. According to the system framework shown in Figure 2, the first step is signal processing, in which vibration signals collected from the bearings are decomposed into different wavelet packet nodes by MODWPT. In this paper, the decomposition level is set to 4 and the discrete Meyer wavelet ("dmey") is selected as the mother wavelet. One ball fault vibration signal sample from the training set at 2 hp and the corresponding single branch reconstruction signals of the terminal nodes are presented in Figure 4. The four-level decomposition yields 16 terminal nodes and their coefficients, from which we obtain 16 single branch reconstruction signals and 16 corresponding Hilbert envelope spectra (HES). Applying the 6 statistical parameters listed in Table 2 to these 32 signals generates 32 × 6 = 192 statistical characteristics per sample. These characteristics differ in how well they discriminate between classes, as reflected in Figures 5 and 6, which give four examples: two time-domain characteristics (energy and energy entropy) and two HES statistical characteristics (standard deviation and kurtosis).
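Two parts of this step can be sketched without a MODWPT implementation: the Hilbert envelope spectrum of one reconstructed signal, and the count 2^4 × 2 × 6 = 192. The snippet assumes NumPy only and builds the analytic signal with the usual FFT-based Hilbert construction; the MODWPT decomposition itself and the full parameter list of Table 2 are not reproduced. The energy-entropy definition used here (p(n) = x(n)^2 / Σx^2, entropy −Σ p log p) is an assumption, as are all function names.

```python
import numpy as np

def hilbert_envelope_spectrum(x, fs):
    """One-sided amplitude spectrum of the Hilbert envelope of x."""
    n = len(x)
    h = np.zeros(n)                     # FFT-domain multiplier for the analytic signal
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(np.fft.fft(x) * h))  # |analytic signal|
    envelope = envelope - envelope.mean()               # remove the DC term
    return np.fft.rfftfreq(n, d=1.0 / fs), np.abs(np.fft.rfft(envelope)) / n

def energy(x):
    return float(np.sum(x ** 2))

def energy_entropy(x):
    # ASSUMED definition: p(n) = x(n)^2 / sum(x^2), entropy = -sum p log p
    p = x ** 2 / np.sum(x ** 2)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def kurtosis(v):
    v = v - v.mean()
    return float(np.mean(v ** 4) / np.mean(v ** 2) ** 2)

# 2**4 terminal nodes, each giving one reconstruction and one HES,
# with 6 statistical parameters per signal.
n_features = (2 ** 4) * 2 * 6

# Usage: an amplitude-modulated 3 kHz tone; the envelope spectrum peaks near 100 Hz.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * 100 * t)) * np.cos(2 * np.pi * 3000 * t)
freqs, spec = hilbert_envelope_spectrum(x, fs)
peak_hz = freqs[np.argmax(spec)]
hes_features = [np.std(spec), kurtosis(spec)]        # HES characteristics
time_features = [energy(x), energy_entropy(x)]       # time-domain characteristics
```

In a real pipeline, `x` would be one of the 16 single branch reconstruction signals, and the same statistics would be applied to each reconstruction and to its envelope spectrum.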
The original feature set consists of 192 statistical characteristics. FSASD is then employed to select the sensitive statistical characteristics as the input feature vectors for training the classifier. The ARI, SSWD, and ASD of the 192 statistical characteristics of the training samples are presented in Figures 7, 8, and 9, respectively. In Figure 7 …
To verify the effectiveness and adaptability of the proposed bearing fault diagnosis method, the comparative experiments are divided into two groups, described below. Furthermore, to verify the superiority of MODWPT, WPT is also applied for fault diagnosis, and its results are compared with those of MODWPT.
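FSASD ranks characteristics partly by the adjusted Rand index (ARI). A minimal implementation of the standard ARI between two labelings (contingency table, then chance correction) is sketched below. How FSASD turns a single characteristic into a partition before comparing it with the fault labels is not described in this excerpt, so that step is deliberately left out. The function name is hypothetical; with scikit-learn available, `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI: (index - expected) / (max index - expected)."""
    _, a = np.unique(labels_true, return_inverse=True)
    _, b = np.unique(labels_pred, return_inverse=True)
    a, b = a.ravel(), b.ravel()
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)                    # contingency table
    sum_comb = sum(comb(int(n), 2) for n in table.ravel())
    sum_a = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_b = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(a), 2)     # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                      # degenerate partitions
        return 1.0
    return (sum_comb - expected) / (max_index - expected)

# Usage
ari_perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # label names do not matter
ari_partial = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])  # 4/7
ari_bad = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])      # below chance, negative
```

ARI equals 1 for identical partitions, is close to 0 for random ones, and can be negative, which makes it a convenient class-consistency score for ranking characteristics.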
In the first group, FSASD is not applied: the original feature set (OFS) of 192 statistical characteristics is processed directly by several dimensionality reduction methods. OFS-SVM is an SVM-based diagnosis model in which the OFS is the input of the SVM, and OFS-PCA/NPE/LDA/SNPEL-SVM are SVM-based diagnosis models that use PCA, NPE, LDA, and SNPEL, respectively. According to Tables 3-7, each model performs better with MODWPT than with WPT.
The detailed results of all models using MODWPT are presented below. For the testing set of case 1, all models achieve good performance: the accuracy of every model exceeds 96%, and the highest accuracy reaches 100%.
Here $x(n)$ is a series of a dataset for $n = 1, 2, \ldots, N$, $N$ is the number of data points, and $p(n)$ is the energy distribution of the signal $x(n)$.
In the second group, FSASD is applied to select the sensitive statistical characteristics before feature reduction and fault diagnosis. OFS-FSASD-SVM is an SVM-based diagnosis model in which the sensitive characteristics selected from the OFS by FSASD are the input of the SVM, and OFS-FSASD-PCA/NPE/LDA/SNPEL-SVM are SVM-based diagnosis models that additionally use PCA, NPE, LDA, and SNPEL, respectively. According to Tables 8-12 and Figures 10-21, each model performs better with MODWPT than with WPT. The detailed results of all models using MODWPT are presented below.
Here, sfn denotes the number of selected characteristics. For the testing set of case 1, all models achieve good performance. For the testing set of case 2, the diagnosis accuracies of all models using FSASD improve compared with the results of the first group. The results of the second group show that selecting a suitable sfn yields a clear improvement in diagnosis accuracy. Figures 12-21 show that good diagnosis performance is obtained over a relatively wide range of sfn; for example, the highest diagnosis accuracy of OFS-FSASD-SNPEL-SVM reaches 100%. This verifies, on the one hand, the validity of the design of the related parameter and, on the other hand, the great adaptability of the proposed bearing fault diagnosis algorithm.
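The sfn study amounts to a sweep: rank the characteristics once by their FSASD score, keep the top sfn, and measure testing accuracy for each sfn. A hypothetical sketch follows. It assumes that a larger score means a more sensitive characteristic, uses a nearest-centroid rule in place of the SVM, and feeds in made-up scores and synthetic data, so the numbers it produces say nothing about the paper's results.

```python
import numpy as np

def nearest_centroid_accuracy(F_tr, y_tr, F_te, y_te):
    """Stand-in classifier (the paper uses an SVM): assign to the nearest class mean."""
    classes = np.unique(y_tr)
    C = np.array([F_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = ((F_te[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[np.argmin(dist, axis=1)] == y_te))

def sweep_sfn(scores, sfn_values, F_tr, y_tr, F_te, y_te):
    """Testing accuracy for each sfn when the sfn top-scoring features are kept.
    ASSUMPTION: a larger score marks a more sensitive feature."""
    order = np.argsort(scores)[::-1]          # ranking is computed once
    results = {}
    for sfn in sfn_values:
        keep = order[:sfn]
        results[sfn] = nearest_centroid_accuracy(F_tr[:, keep], y_tr, F_te[:, keep], y_te)
    return results

# Usage on synthetic data: 192 characteristics, the first ten informative
rng = np.random.default_rng(2)
y_tr, y_te = np.repeat([0, 1, 2], 30), np.repeat([0, 1, 2], 20)
F_tr, F_te = rng.normal(size=(90, 192)), rng.normal(size=(60, 192))
F_tr[:, :10] += 2.0 * y_tr[:, None]
F_te[:, :10] += 2.0 * y_te[:, None]
scores = rng.random(192)
scores[:10] += 1.0                            # hypothetical scores ranking them first
accuracy_by_sfn = sweep_sfn(scores, [5, 10, 20, 50, 192], F_tr, y_tr, F_te, y_te)
```

Sorting once and slicing `order[:sfn]` keeps the sweep cheap, since only the classifier is retrained for each sfn.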

Experiments Based on the Test Rig 2
Experimental Setup and Cases. To validate the adaptability of the proposed bearing fault diagnosis method, vibration signals were collected from the SQI-MFS test rig for further experiments. Figure 22 shows the test rig, and Figure 23 shows the SER205 bearings used in this work. Single faults (ball fault, inner race fault, and outer race fault) were separately seeded on normal bearings with different defect sizes (0.05 mm, 0.1 mm, and 0.2 mm) using laser machining. The vibration signals were collected using accelerometers at motor speeds of 1200 rpm and 1800 rpm, with a sampling frequency of 16 kHz.
The detailed information of the vibration dataset is presented in Table 13: ball, inner race, and outer race faults each have three fault degrees, and there is also a normal condition. Therefore, there are 3 × 3 + 1 = 10 working conditions, corresponding to 10 fault patterns. For each fault pattern, 60 samples are acquired from the vibration signals, and each sample contains 5000 continuous data points. As for test rig 1, two cases are employed in the experiments: the training samples are taken at one motor speed, and the testing samples of the two cases are taken at different motor speeds. In case 1, 40 random samples at 1800 rpm are selected as the testing samples; in case 2, 40 random samples at 1200 rpm are selected as the testing samples. For training, both cases use the same remaining 20 samples at 1800 rpm.
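The sample construction and the two cases can be sketched per fault pattern as follows. The segmentation scheme is an assumption, since the text does not say how samples are cut: here each record is split into 60 non-overlapping 5000-point samples, which would require at least 300,000 points, or 18.75 s at 16 kHz. Function names are hypothetical and the records are random placeholders. Repeating the split for all 10 fault patterns gives 200 training samples and 400 testing samples per case.

```python
import numpy as np

def segment(record, n_samples=60, length=5000):
    """Cut one long record into non-overlapping samples (ASSUMED scheme)."""
    needed = n_samples * length
    if len(record) < needed:
        raise ValueError(f"need at least {needed} points, got {len(record)}")
    return np.asarray(record[:needed]).reshape(n_samples, length)

def make_cases(record_1800, record_1200, seed=0):
    """Train/test samples for ONE fault pattern, following the text:
    both cases share 20 training samples at 1800 rpm; case 1 tests on the
    other 40 samples at 1800 rpm, case 2 on 40 random samples at 1200 rpm."""
    rng = np.random.default_rng(seed)
    s1800, s1200 = segment(record_1800), segment(record_1200)
    perm = rng.permutation(len(s1800))
    train = s1800[perm[40:]]                               # 20 shared training samples
    test_case1 = s1800[perm[:40]]                          # 40 samples at 1800 rpm
    test_case2 = s1200[rng.permutation(len(s1200))[:40]]   # 40 samples at 1200 rpm
    return train, test_case1, test_case2

# Usage with placeholder records
fs = 16_000
rng = np.random.default_rng(3)
rec_1800 = rng.normal(size=60 * 5000)   # 300,000 points = 18.75 s at 16 kHz
rec_1200 = rng.normal(size=60 * 5000)
train, test1, test2 = make_cases(rec_1800, rec_1200)
```

Drawing the case 1 testing samples and the shared training samples from one permutation guarantees that they never overlap.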

Analysis Results. The procedure of bearing fault diagnosis for the SQI-MFS test rig is the same as that for test rig 1, and MODWPT is applied for vibration signal processing. For the 192 statistical characteristics, the class discriminative degree of each characteristic is reflected in … Figure 34. Comparing the results of the second group with those of the first group shows, on the one hand, that the diagnosis models using FSASD perform better and that the number of sensitive features affects fault diagnosis accuracy. According to Figures 29-34, the fault diagnosis attains better performance when sfn lies within a certain range; for example, the highest diagnosis accuracy of OFS-FSASD-SNPEL-SVM reaches 89.83%, which verifies that selecting a suitable sfn improves diagnosis accuracy. On the other hand, different dimensionality reduction methods have different impacts on fault diagnosis accuracy, especially for the testing set of case 2. Because the proposed SNPEL preserves the local geometry of the data, works well with multimodal data, and also takes label information into account during dimensionality reduction, the low-dimensional feature space it produces is more beneficial to fault identification and classification.
Figure 20: The diagnosis results of models using MODWPT for the testing sets of two cases with the use of FSASD and different dimensionality reduction methods. The output dimension sizes of PCA, LDA, and SNPEL are 20, 11, and 20, respectively. "NO" represents the model without a dimensionality reduction method.
These comparative experiments verify the effectiveness and adaptability of the proposed bearing fault diagnosis procedure on the SQI-MFS test rig.

Conclusions
This paper proposed a novel procedure to identify and classify different bearing fault conditions. The procedure, which systematically blends statistical analysis with artificial intelligence, uses MODWPT as the multidomain feature generation approach, the proposed FSASD as the sensitive feature selection method, the modified NPE (SNPEL) as the feature dimensionality reduction technique, and SVM as the automated fault pattern recognition system. It was evaluated on experimental data collected from two test rigs, covering different bearing fault conditions such as ball fault, inner race fault, and outer race fault with different defect sizes.
According to the experimental results, the proposed bearing fault diagnosis method has great potential to be an effective and adaptable tool for precise identification and classification of bearing faults under a variety of bearing conditions. For experimental test rig 1, in the proposed … datasets collected from experimental test rig 2 (SQI-MFS) are employed. Cases 1 and 2 use testing samples at different motor speeds, 1800 rpm and 1200 rpm, respectively, and both use samples at 1800 rpm as the training samples. The experimental results also indicate that the diagnosis model using the proposed methods achieves good performance.

Conflicts of Interest
The authors declare that they have no conflicts of interest.