Research on the Feature Selection of Rolling Bearings’ Degradation Features

Degradation features are crucial for assessing the performance degradation and predicting the remaining useful life of rolling bearings. So far, numerous degradation features have been proposed, and many researchers have devoted themselves to using dimensionality reduction methods to reduce the redundancy of those features. However, they have not considered the properties and similarity of those features. In this paper, we present a simple way to reduce dimensionality by classifying different features based on their trends. The degradation features can be classified into two subdivisions, namely, uptrends and downtrends. In each subdivision, there exists a visible trend similarity, and we introduce two indexes to measure this similarity. By selecting the representative features of each subdivision, the dimensionality of the multifeatures can be reduced. The comparison shows that the root mean square and the sample entropy are good representatives of the uptrend and downtrend features, respectively. This method offers an alternative way to reduce the dimensionality of the rolling bearings' degradation features.


Introduction
Rolling bearings are widely used in rotary machinery as components that provide a near-frictionless environment to support and guide a rotating shaft, and they have an important influence on modern industry. At the same time, bearings are the most frequent cause of failure in mechanisms. An unexpected failure may cause not only loss of property but also loss of human life, and may even lead to catastrophe. Condition-based maintenance (CBM) has therefore emerged to monitor the degradation process and predict the remaining useful life of bearings. Several approaches have been reported for monitoring the degradation process of bearings, e.g., acoustic emission signals, temperature, lubricant analysis, electrical current analysis, and vibration signals. Among them, the vibration signal is believed to be the most extensively used in industry for diagnosis and prognostics because it is easy to measure and analyze. To represent the bearings' degradation process well, many signal processing techniques are applied to extract different features. Good reviews of feature extraction can be found in [1][2][3][4].
Once numerous features have been extracted, it is still difficult to judge which features trace the bearing's degradation process better. In addition, the feature set is still high-dimensional, and appropriate dimensionality reduction methods are needed to remove redundant features. Some researchers have devoted themselves to this area. In [5], logistic regression is used to convert the multidimensional features into a single health indicator. Dong and Luo [6] extracted time domain, frequency domain, and time-frequency domain features, fused them by principal component analysis (PCA), and then used the least squares support vector machine (LSSVM) optimized by particle swarm optimization (PSO) to predict the degradation process. Similarly, Lu et al. [7] applied PCA to fuse multifeatures, and the degradation trend of a slewing bearing was predicted using the LSSVM optimized by PSO. In [8], Yu employed a dynamic PCA for the dimensionality reduction of multifeatures and developed generative topographic mapping-based quantification indications for health degradation assessment. Finally, a variable replacing-based contribution analysis method was developed to verify that the fused features are effective. Kang et al. [9] proposed a state assessment method based on the relative compensation distance of multifeatures, with the dimension reduced by the locally linear embedding (LLE) algorithm. Li and Zhang [10] applied supervised locally linear embedding projection for machinery fault diagnosis. By using locality preserving projection (LPP), Yu [11] proposed a multivariate statistical process control-based bearing performance quantification index and combined it with the exponentially weighted moving average statistic for performance degradation assessment. Yu [12] proposed a local and nonlocal preserving projection (LNPP) based index for defect classification and performance assessment. Benkedjouh et al. [13] presented a prognostic method based on isometric mapping (Isomap) and support vector regression.
As introduced above, many references have contributed to reducing the dimension of the multifeatures of rolling bearings for diagnostics and prognostics. The dimensionality reduction methods can usually be classified into two categories: linear and nonlinear. A classification chart of basic dimensionality reduction methods is shown in Figure 1. In particular, manifold learning methods (i.e., LPP, LLE, and Isomap) have accelerated the development of this academic field. However, there are three questions that the references above have not mentioned or solved. The first is the feasibility of multifeature dimensionality reduction for rolling bearings. For example, when applying manifold learning methods, the first step ought to be determining whether a manifold surface exists for those high-dimensional features. The second is the persuasiveness or generalization ability of the applied method. The results of the references above are inconsistent. From comparisons on one or two specific cases, it is hard to infer which dimensionality reduction method is better. The third is the lack of a principle for choosing the number of dimensions to reduce to. The number must be predetermined, and most researchers set it to two or three, but there should be a convincing reason for the chosen number of dimensions.
Given these questions, a simple remedy comes to mind: classify the features and then select the best-performing representative of each type. The question then becomes how many types the features should be classified into. We simply put the features into two classes based on their trends, and we only need to determine which feature of each class has the best properties. In this paper, we first summarize a criterion for the degradation features of rolling bearings. Then, we discuss the differences between features under their traditional classification modes. While conducting the classification, we found a trend similarity between features and introduce two similarity indexes to approximately measure this similarity. Finally, we infer that the degradation features of rolling bearings fall into two main categories: uptrends and downtrends. By selecting the representative features of those categories, the dimensionality of the multifeatures can be reduced. The rest of the paper is organized as follows. In Section 2, a criterion of the degradation features is summarized. In Section 3, two similarity indexes are introduced to measure trend similarity. In Section 4, a new classification of multifeatures is proposed based on the discussion of the traditional classification of the multifeatures. The two cases used in this paper are stated in Section 5. The discussion is in Section 6. Finally, concluding remarks are given in Section 7.

The Criterion of Being a Degradation Feature
It should first be made clear which kinds of features are suitable for prognosis. Not all the features of rolling bearings can be treated as degradation features. For example, the mean value cannot be treated as a degradation feature. Figure 2 shows the mean value over the whole life of Case I. The details of the two cases used in this paper, namely, Case I and Case II, are given in Section 5. As the figure shows, the mean value stays flat all the time except for a slight fluctuation close to failure. The mean value can be treated as a diagnosis indicator of misalignment. Nevertheless, it cannot be a degradation feature of rolling bearings. From the relevant references, we can summarize a criterion of the degradation features as follows.

Criterion
(1) A degradation feature can be extracted from the run-to-failure data. Generally, each file yields one degradation feature point.
(2) A degradation feature must have a trend that can assess the degradation process and should have a physical significance.
(3) Generally, a degradation feature should not be a simple mathematical transformation of other features.
(4) In particular, it is better to use degradation feature extraction methods that have denoising performance and enhance the proportion of the signal that contains defect information.
Criterion 1 is the premise. With an extraction method, each file generally yields one degradation feature point. Thus, the degradation feature points constitute a time sequence, which is the degradation feature. It is worth noting that decomposition methods can produce multiple features per file; these are not discussed in this paper. Criterion 2 is the foremost item. The role of degradation features is to assess the degradation process and, further, to predict the remaining useful life. Criterion 3 is a supplement to Criterion 2. Some researchers have proposed features obtained through elementary functions (e.g., asinh and atan). These functions can make features monotonic, but it is then difficult to identify the degradation status, so the results are not yet degradation features. Criterion 4 is an additional criterion. A degradation feature with lower noise is relatively better.

Shock and Vibration
Figure 3 shows the peak-to-peak and root mean square (RMS) of Case I. Although they differ greatly in numerical value, it is easy to see a trend similarity between them. Since calculating the RMS is an averaging process, there are few burrs in its curve, so, relatively speaking, the RMS is the better choice of the two features.
Actually, both peak-to-peak and RMS measure a sort of energy of rolling bearings. They belong to the energy features, which are a subdivision of bearings' features. If we check all the subdivisions of bearings' features and select a representative for each subdivision, the dimensionality reduction problem of bearings can be solved.
Next, we introduce a method to quantitatively describe the trend similarity, which serves as an aid for feature selection.

The Similarity Index of Trend Similarity
As we can see from Figure 3, there appears to be a trend similarity between the peak-to-peak and the RMS. Many similarity indexes are based on distance measures, e.g., Manhattan distance, Euclidean distance, and Chebyshev distance. Since there is no specific definition of trend similarity, it is more difficult to measure. As a matter of experience, the first idea for measuring trend similarity is to compare the derivatives of the two sequences, which requires fitting methods. However, the selection of fitting methods and their parameters then becomes another question.
In this paper, we introduce two similarity indexes to approximately describe this trend similarity. The first one is the Fréchet distance. The Fréchet distance was first proposed by Fréchet in 1906, and it is a measure of similarity between curves that considers the location and ordering of the points along the curves [14]. An intuitive definition of the Fréchet distance is as follows. A man traverses a finite curved path while walking his dog on a leash, with the dog traversing a separate path. Assume that both can vary their speed to keep the leash as slack as possible: the Fréchet distance between the curves is the length of the shortest leash sufficient for both to traverse their separate paths. Note that the definition is symmetric with respect to the two curves [15].
A formal definition can be given as follows. Let S be a metric space with distance function d. A curve A in S is a continuous map from the unit interval into S, i.e., $A: [0, 1] \to S$. A reparameterization α of [0, 1] is a continuous, nondecreasing surjection $\alpha: [0, 1] \to [0, 1]$. Let A and B be two given curves in S. Then the Fréchet distance between A and B is defined as the infimum over all reparameterizations α and β of the maximum over all t ∈ [0, 1] of the distance in S between A(α(t)) and B(β(t)). In mathematical notation, the Fréchet distance is

$$F(A, B) = \inf_{\alpha, \beta} \max_{t \in [0, 1]} d\bigl(A(\alpha(t)), B(\beta(t))\bigr).$$

The Fréchet metric considers the flow of the two curves because the pairs of points whose distance contributes to the Fréchet distance sweep continuously along their respective curves. This makes the Fréchet distance a better measure of similarity for curves. For time series sequences, we need to use the discrete Fréchet distance (DFD), also called the coupling distance. It approximates the Fréchet metric for polygonal curves and was defined by Eiter and Mannila [16]. The DFD considers only positions of the leash where its endpoints are located at vertices of the two polygonal curves and never in the interior of an edge. This special structure allows the DFD to be computed in polynomial time by a simple dynamic programming algorithm. To display the DFD intuitively, Figure 4 shows an example. The DFD of curves P and Q is 1.8983, which is the length of the line in magenta.
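The dynamic programming computation of the DFD mentioned above can be sketched as follows. This is a minimal illustration of the coupling-distance recursion, assuming curves given as lists of (x, y) vertices and the Euclidean point distance; the function name `discrete_frechet` and the sample curves are hypothetical and are not the curves P and Q of Figure 4.

```python
import math


def discrete_frechet(p, q):
    """Discrete Frechet (coupling) distance between two polygonal curves.

    p, q: sequences of (x, y) vertices (assumed input format).
    ca[i][j] holds the coupling distance between p[:i+1] and q[:j+1];
    the table is filled in O(len(p) * len(q)) time.
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # leash length for this vertex pair
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best of: advance on p, advance on both, advance on q.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Two parallel horizontal polylines one unit apart (illustrative data only).
curve_p = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
curve_q = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(curve_p, curve_q))
```

For the two parallel polylines, the shortest sufficient leash equals their vertical offset, which matches the man-and-dog intuition given earlier.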
As for the similarity between degradation features, another example is shown in Figure 5 to illustrate the calculation of the DFD between the peak-to-peak and the RMS of Case I. It is important to normalize the ordinate first to ensure a consistent range. It is worth noting that the pairs of points whose distance contributes to the DFD are vertically aligned, i.e., in one-to-one correspondence, because we have not normalized the abscissa, where the scale interval is 1. The distance between two corresponding points is the length of the segment joining them. In this way, the DFD is equal to the maximum of all the corresponding lengths. With this method, the DFD of two curves can reach at most 1. The DFD of the example is 0.2257. In fact, however, there are just 14 corresponding lengths (CL) greater than 0.1. Those files are all concentrated at the end of the degradation process. It is normal that, when the bearing is close to failure, the peak-to-peak grows faster than the RMS since the vibration is fierce. Overall, the peak-to-peak and the RMS are similar.
Hence, we propose a close index (CI) to measure the holistic similarity.
The close index (CI) η is defined by η = num(CL < ε)/total num, i.e., the ratio of the number of files where CL < ε to the total number of files. The parameter ε is the similarity threshold; in general, we set ε = 0.1. Thus, the CI of the two curves is 98.6%. Now, we have two similarity indexes to measure the trend similarity between two features: one is the DFD, and the other is the CI. Comparatively, the CI is more intuitive, and it measures the overall similarity. Though the setting of the parameter ε is empirical, it does not interfere with judging whether one pair of features is more similar than another pair. For convenience, we have made an empirical classification of similarity degrees by CI, as shown in Table 1. The DFD, in contrast, locates the pair of corresponding points that differ the most. It describes a kind of local dissimilarity: a bigger DFD means a larger local dissimilarity.
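The two indexes can be sketched together for a pair of feature sequences. The code below is only an illustration under stated assumptions: min-max scaling to [0, 1] is assumed as the ordinate normalization (the text does not name the method), the DFD is taken as the maximum corresponding length, as described above for the one-to-one case, and the function names and sample values are hypothetical.

```python
def min_max_normalize(seq):
    """Scale a sequence to [0, 1]; assumed form of the ordinate normalization."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(v - lo) / (hi - lo) for v in seq]


def similarity_indexes(feat_a, feat_b, eps=0.1):
    """Return (DFD, CI) for two feature sequences of equal length.

    The corresponding length (CL) of file k is |a_k - b_k| after
    normalization. DFD is the largest CL; CI is the fraction of files
    whose CL is below the threshold eps.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature sequences must have the same length")
    a = min_max_normalize(feat_a)
    b = min_max_normalize(feat_b)
    cl = [abs(x - y) for x, y in zip(a, b)]
    dfd = max(cl)
    ci = sum(1 for d in cl if d < eps) / len(cl)
    return dfd, ci


# Illustrative sequences (not data from Case I or Case II).
feature_1 = [0.0, 1.0, 2.0, 3.0, 10.0]
feature_2 = [0.0, 1.5, 2.5, 8.0, 10.0]
dfd, ci = similarity_indexes(feature_1, feature_2)
print(f"DFD = {dfd:.4f}, CI = {ci:.1%}")
```

Here one file differs strongly while the others stay close, so the DFD flags a local dissimilarity while the CI still reports a high overall similarity, which is the complementary behaviour described in the text.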

A New Classification of Bearings' Features Based on Their Trends
Traditionally, the multifeatures of rolling bearings are classified into the time domain, the frequency domain, the time-frequency domain, and the complexity domain. Time domain features have been widely used. They usually measure the statistical characteristics of a signal. To extract frequency domain features, the signal needs to be converted into the frequency domain by the fast Fourier transform (FFT). The time-frequency domain features have made rapid progress recently. The complexity domain is different from the above, and it measures the degree of complexity of the signals. In this section, we search for the trend similarity of different features through the traditional classification of bearings' features and propose a new classification of bearings' features based on their trends.

Time Domain Features.
Time domain features are easy to conceive and obtain. A list of commonly used time domain features is shown in Table 2. Not all of them can be treated as degradation features, e.g., feat1 (the mean value) is not a degradation feature, as we have discussed. Feat2, feat3, and feat4 are the root amplitude, the RMS, and the absolute mean value, respectively. All three measure the average energy amplitude of the signal and have the same unit and the same order. Therefore, they can be classified as energy features. As we can see from Figure 6, the three features have an extremely similar trend. However, there are some subtle distinctions.
The trends of Case I and Case II are different. From the viewpoint of energy, we can conjecture the process of degradation. For Case I, an outer race fault example, there is no overt change before #520 (where # means the file number), and we estimate that the bearing is in normal condition. Between #520 and #700, the curve increases linearly, and we presume the bearing has a slight fault. In this stage, the accumulated stresses reach a certain value, which indicates that a denting process is developing. The dent has a specific asperity that produces stress and energy concentration and gradually becomes more deteriorated. When the stresses reach a certain threshold, the crack opens. At about #700, there is a sudden change, and we presume it is the time at which the crack occurs. From #700 to #850, the bearing should be in the severe fault condition. The asperities are smoothed by the continuous rolling contact and abrasive wear actions. That means the stress generated by the dents' asperities is reduced. As the damage spreads over a broader area, the vibration level rises again. This is called the "healing" phenomenon and has been described in [2,17,18]. In this stage, the crack continues to propagate and the stresses are still accumulating. During this time, spalling occurs. At about #850, the defect is fully formed. From #850 to the end, the condition of the bearing deteriorates severely. The damage keeps growing. The "healing" phenomenon expands, and the variance enlarges. According to [19], the whole process of degradation consists of two visible "healing" spans and two peaks. Close to failure, the feature experiences a significant increase. For Case II, the inner race fault example, though the trend seems to be monotonic, we can find two "healing" spans. There is a slight decrease at the beginning of degradation, which is considered a run-in period. Before #1200, the bearing is in the normal stage, for there is little change. From #1200 to #2750, the curves increase with enlarging variance. The bearing gradually transitions from slight to severe fault. From #2750 to the end, the curves move up sharply, and the bearing is no doubt in the failure stage. Taking an overall survey of the two cases, the energy curves are increasing, which means the fault is growing even though there are fluctuations back and forth. Feat5 and feat6 are the third and fourth central moments, and there is a minus sign in front of α (also in feat9) to make the value positive. Feat10 is the unbiased estimate of the variance and also the second central moment.
These three features have a similar trend but have a serious problem: a bigger x_i makes the curves steep.
Here, we have the third and fourth standardized moments, the skewness (feat15) and the kurtosis (feat16). The RMS can be regarded as the square root of the second central moment when the mean is close to zero. Skewness and kurtosis are dimensionless, and each has its own statistical meaning. Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. Negative skewness indicates that the tail on the left side of the probability density function is longer or fatter than that on the right side, and vice versa.
Kurtosis is another statistical measure, which weighs the "tailedness" of the probability distribution. Figure 7 shows the skewness and kurtosis together with the RMS over the whole life of Case I and Case II. A similarity with the RMS can be observed. We can see a sharp amplification near the end of the degradation in the kurtosis of Case I. As is well known, kurtosis is a good feature for diagnosis. However, its monotonicity is weaker than that of the RMS. For white Gaussian noise, the kurtosis is close to 3, but as Figure 7 shows, it is close to 3 again near the end of failure. That cannot be explained.
There are amplifications at the local peaks compared with the RMS in Case II. Feat7, feat8, and feat9, as shown in Figure 8, are the peak-to-peak, maximum, and minimum values. Their trends are similar too if the mean value is close to zero. The peak-to-peak is about twice the other two. The three features can also measure the first-order energy of the signal. However, these features have more uncertainty than the RMS, since they only measure the peaks of each file. Feat11, feat12, feat13, and feat14 are the shape factor, crest factor, impulse factor, and clearance factor, respectively, as shown in Figure 9. They are all dimensionless. As we discussed above, the RMS represents the energy of the signal, and the energy features have a similar trend. So, the shape factor fluctuates around its mean (close to 1). The same happens to the crest factor, which also fluctuates around its mean. The crest factor is sensitive to files with a large maximum value but little change in the mean energy. These files are thought to be where the defect exists. The clearance factor behaves similarly to the crest factor. The impulse factor is steep. It is sensitive to files where the mean value is very close to zero. Those four features sometimes cannot be deemed degradation features, for they cannot trace the process of degradation properly. But they can be considered diagnosis indicators.
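To make the feature families discussed above concrete, the sketch below computes several of them for one file. Table 2 is not reproduced in this text, so the formulas are common textbook definitions and are an assumption; in particular, the population (biased) moments and the use of the absolute mean in the shape and impulse factors may differ from the paper's own definitions. The function name `time_domain_features` is hypothetical.

```python
import math


def time_domain_features(x):
    """Common time domain features of one file (textbook definitions, assumed).

    Requires a non-constant, non-zero signal; otherwise the ratios
    below divide by zero.
    """
    n = len(x)
    mean = sum(x) / n
    abs_x = [abs(v) for v in x]
    abs_mean = sum(abs_x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    root_amplitude = (sum(math.sqrt(v) for v in abs_x) / n) ** 2
    peak = max(abs_x)
    variance = sum((v - mean) ** 2 for v in x) / n  # population variance
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "root_amplitude": root_amplitude,
        "rms": rms,
        "abs_mean": abs_mean,
        "peak_to_peak": max(x) - min(x),
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / std ** 4,
        "shape_factor": rms / abs_mean,
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "clearance_factor": peak / root_amplitude,
    }


# One period of a unit sine as a stand-in for a vibration file.
file_signal = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
for name, value in time_domain_features(file_signal).items():
    print(f"{name}: {value:.4f}")
```

Applying such a function to every file of a run-to-failure record yields one point per file and feature, i.e., the time sequences whose trends are compared in this section.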
As shown in Table 2, the similarity indexes based on the RMS are also calculated. The similarity indexes of the RMS with itself are of course 0 and 1. Feat2, feat4, feat7, feat8, and feat9 are in the extremely similar degree, and feat5, feat6, and feat10 are in the very similar degree. If we apply order normalization to the latter three, the normalized features are in the extremely similar degree too. It is easy to see that features extracted from a set of data that have a similar physical significance and the same order (i.e., the same dimension) should have a similar tendency. The mentioned normalized features can be called energy features. Taking a panoramic view of the degradation process, the energy features have an overall uptrend.

Frequency Domain Features.

Table 3 lists seven frequency domain features. These features are all calculated in the frequency domain by using the FFT. The feature p_1 is the mean value of the signal's frequency amplitude. Based on Parseval's theorem, we have the equation

$$\frac{1}{N} \sum_{i=1}^{N} x_i^2 = \sum_{k=1}^{K} \bigl(s(k)\bigr)^2.$$

So, p_1 represents a kind of energy of the signal. No wonder it is extremely close to the RMS. For p_2 to p_4, the equations are similar to those of δ^2, S_k, and K_v. They are the variance, skewness, and kurtosis of the frequency domain, respectively. From Figure 10, we can see that the three (normalized) have a trend similar to that of the RMS.
The unit of p_5, p_6, and p_7 is hertz; among them, p_5 is the gravity frequency. Actually, the three features measure kinds of change of frequency concentration. As shown in Figure 11, the three features show a similar trend, especially p_5 and p_6. Note that p_7 of Case I has a different trend from #500 to #800. We then take the envelope of each file's signal to calculate these three features, as displayed in Figure 12. After envelope analysis, the trends of the three features are similar. So, envelope analysis can remove interference and make the demodulated signal contain more defect information. However, these three features cannot be regarded as degradation features, for they do not have a good trend for degradation assessment. When it comes to envelope analysis, the envelope domain features, which are a subset of frequency domain features, are more commonly used. Envelope analysis is broadly used to process bearing signals. For the vibration data of bearings, the signal modulation effect is one of the problems in processing. The modulation effect can be handled by using envelope analysis. When localized defects occur on the races or a roller, the vibration signal becomes amplitude modulated. By using envelope analysis, the defect frequency can be demodulated and appears in the envelope spectrum. Usually, the defect frequencies include the ball-pass frequency of the outer ring (BPFO) f_BPFO, the ball-pass frequency of the inner ring (BPFI) f_BPFI, and the ball-spin frequency (BSF) f_BSF. Knowing the failure modes of Case I and Case II, we can extract the amplitude peaks at the characteristic frequencies from each file. Thus, we obtain the feature named "amplitude of defect frequency (ADF)" for each case. The results are shown in Figure 13. As we can see, for Case I, the outer race fault, the ADF and the RMS have a similar trend. The similarity indexes of the two normalized features are 0.2844 and 89.51%. But, for Case II, the trends of the two features differ. The RMS shows a stable rising trend from #1000 to #2748, while the ADF stays flat for a long period until it is close to failure. To examine this phenomenon in Case II, we extract the sum of the amplitudes at f_BPFI, 2 × f_BPFI, and 3 × f_BPFI, i.e., the base defect frequency, the second defect harmonic, and the third defect harmonic, respectively. We name it ADF3, and it is shown in Figure 14. It can be seen that there is a slightly increasing period from #1000 to #2748, similar to the RMS. From the above, we can infer that the ADF is the underlying determinant of the RMS, and the RMS is the outward manifestation of the ADF. Actually, there are more than eight frequency domain features. We have not listed the others because they lack an explicit physical significance. Some of the frequency domain features can be related to time domain features, just like p_1. They measure specific kinds of energy. The others are like p_5, and they measure a specific frequency concentration. Whatever the frequency domain features are, they rely on the FFT. However, the FFT has its disadvantages, e.g., truncation error and leakage error. Furthermore, the traditional Fourier transform is not suitable for processing nonstationary signals. So, the frequency domain features are not very accurate.
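The envelope-based extraction of an ADF value can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for Figures 13 and 14: the envelope is taken as the magnitude of the analytic signal, a naive DFT replaces the FFT so that the example needs no external library, and the synthetic amplitude-modulated test signal, the function names, and all numeric values are hypothetical. In practice, the defect frequency would come from the bearing geometry and shaft speed, which are not given here.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for a short demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def envelope(x):
    """Envelope as the magnitude of the analytic signal (Hilbert transform)."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    shaped = [s * g for s, g in zip(spectrum, gain)]
    analytic = [sum(shaped[k] * cmath.exp(2j * math.pi * k * t / n)
                    for k in range(n)) / n for t in range(n)]
    return [abs(z) for z in analytic]


def amplitude_at(signal, fs, freq):
    """Single-sided spectral amplitude at the DFT bin nearest to freq (Hz)."""
    n = len(signal)
    mean = sum(signal) / n
    k = round(freq * n / fs)
    coeff = sum((signal[t] - mean) * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
    return 2.0 * abs(coeff) / n


# Synthetic file: a 200 Hz carrier amplitude-modulated at a 20 Hz "defect" rate.
fs, n = 1000.0, 200
defect_hz, carrier_hz = 20.0, 200.0
x = [(1.0 + 0.5 * math.cos(2 * math.pi * defect_hz * t / fs))
     * math.cos(2 * math.pi * carrier_hz * t / fs) for t in range(n)]
adf = amplitude_at(envelope(x), fs, defect_hz)
print(f"ADF at {defect_hz} Hz: {adf:.4f}")
```

Summing `amplitude_at` over the base frequency and its second and third harmonics would give a quantity analogous to the ADF3 described above.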

Time-Frequency Domain Features.
Nowadays, time-frequency analysis is developing rapidly, and it can describe the time domain and frequency domain information of a signal at the same time. Many time-frequency signal processing techniques have been proposed for bearing diagnosis, e.g., wavelet methods, empirical mode decomposition (EMD) ([20], [21]), local mean decomposition (LMD) [22], intrinsic time-scale decomposition (ITD) [23], variational mode decomposition (VMD) [24], and empirical wavelet transform (EWT) [25]. We can classify them into two groups. One includes the first four decomposition methods, for they decompose signals in a dichotomous way. The other includes the last two methods. When carrying out VMD or EWT, the decomposed signals appear as the outputs of different band-pass filters, and neither VMD nor EWT is a recursive method. By using these methods, the signal is decomposed into several subsignals, and it is difficult to extract a degradation feature by these methods alone. So, they are usually combined with energy or complexity measures to extract degradation features. Pan et al. [26] developed an assessment model based on second-generation wavelet packet decomposition (WPD) and support vector data description (SVDD) for health assessment of bearings. The degradation features used were the energies of the wavelet packet nodes. Pan et al. [27] further proposed a new approach using second-generation WPD with fuzzy c-means (FCM) for performance degradation assessment. Wavelet packet node energies were again used to compose the feature vectors. In [28], Hong et al. utilized wavelet packet-empirical mode decomposition for feature extraction. The corresponding entropy features were extracted from the raw signal after wavelet packet decomposition. An energy feature extraction method based on ensemble empirical mode decomposition (EEMD) and a Gaussian mixture model was proposed in [29].
As revealed in the references, the time-frequency signal processing techniques are usually used for denoising the raw signal and selecting the subsignals that include the defect or degradation information. A time-frequency domain feature can then be extracted by combining them with energy or complexity measures. In this way, time-frequency domain features turn into energy features and complexity measures.
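As a rough illustration of how a decomposition is combined with an energy measure, the sketch below computes node energies of a full wavelet packet tree. It is not the second-generation WPD of the cited works: the orthonormal Haar filter is swapped in here as the simplest stand-in, and the function names and the test signal are hypothetical.

```python
import math


def haar_step(x):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_energies(x, level):
    """Energies of the 2**level terminal nodes of a full Haar packet tree.

    Nodes are returned in natural (filter-bank) order, not sorted by
    frequency. The signal length must be divisible by 2**level.
    """
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(v * v for v in node) for node in nodes]


# A slow sine plus a fast alternating component (illustrative data only).
signal = [math.sin(2 * math.pi * t / 64) + 0.3 * (-1) ** t for t in range(256)]
energies = wavelet_packet_energies(signal, level=3)
total = sum(energies)
print([round(e / total, 4) for e in energies])
```

Because the Haar transform is orthonormal, the node energies sum to the energy of the signal, so the normalized vector printed above can serve as a per-file feature vector of the kind the cited works compose.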

Complexity Features.
Complexity measures are different from the energy features. Many references have used randomness complexities for diagnosis and prognostics. Zhao et al. [30] proposed a quantitative diagnosis method for spall-like faults of bearings based on empirical mode decomposition (EMD) and approximate entropy (ApEn). Yang et al. [31] proposed a bearing diagnosis method based on EMD energy entropy and ANN. Zheng et al. [32] presented a bearing diagnosis approach based on local characteristic-scale decomposition (LCD) and fuzzy entropy (FuzzyEn). Shannon entropy (ShEn) is selected as one of the basic features for prognostics in [33]. Yan et al. applied permutation entropy (PermEn) as a feature for bearing diagnosis in [34]. A diagnosis method based on multiscale entropy and adaptive neurofuzzy inference is proposed in [35]. Pan et al. applied the correlation dimension and ApEn to the performance degradation process of bearings [36]. In the numerous relevant studies, authors have applied many randomness complexities and combined them with signal processing methods such as EMD and the wavelet transform. Whatever form the randomness complexities take, their basic principle is invariable: the greater the regularity, the lower the value of the randomness complexity. For convenience, when we talk about randomness complexity later, we simply say complexity.
Next, we apply six commonly used complexities, i.e., ShEn, ApEn, sample entropy (SampEn), FuzzyEn, PermEn, and Lempel-Ziv complexity (LZC). ShEn is the earliest proposed complexity [37]. It is sensitive to noise. In 1976, Lempel et al. proposed a complexity called LZC [38]. In 1991, Pincus gave an approximate value of the Kolmogorov-Sinai entropy, named ApEn [39]. SampEn is a modification of ApEn proposed by Richman et al. in 2000 [40]. Compared to ApEn, SampEn has a relatively trouble-free implementation and is largely independent of data length. Moreover, SampEn does not count the match of a template vector with itself. In 2002, Bandt et al. introduced PermEn, which is based on comparisons of neighboring values of a time series [41]. Chen et al. proposed FuzzyEn in 2007, replacing the "membership degree" in ApEn with a fuzzy function [42]. The calculation details are omitted in this paper. To give a fair comparison, parameters shared by different complexities should be set equally. Table 4 shows the parameters. Note that the embedding dimension m of PermEn is unlike that of ApEn and SampEn: a large m greatly increases the computation time. We set it to 6.
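Since the calculations are omitted in the text, a minimal sketch of SampEn may help to show the principle that greater regularity gives a lower value. The parameters of Table 4 are not reproduced here, so the embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are assumed common defaults; the function name and the test signals are hypothetical.

```python
import math
import random


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with Chebyshev distance and self-matches excluded.

    m and r_factor are assumed defaults, not the values of Table 4.
    Returns infinity when no template matches are found.
    """
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the first n - m templates for both lengths so counts are comparable.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):  # j > i excludes self-matches
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


rng = random.Random(0)
regular = [math.sin(2 * math.pi * t / 50) for t in range(300)]
irregular = [rng.gauss(0.0, 1.0) for _ in range(300)]
print("SampEn (sine): ", sample_entropy(regular))
print("SampEn (noise):", sample_entropy(irregular))
```

The periodic sequence gives a much lower value than the noise sequence, in line with the basic principle stated above.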
e six complexities' degradation features of Case I and Case II are calculated as shown in Figures 15 and 16

Shock and Vibration 13
deepens, the defects occur and propagate, thus making the vibration signal become more periodical so that the complexities' features show a downtrend.And we will benchmark which one has the best performance.First, it should test the periodical signals with different intensity noise.And we set up a group of simulation signals, S(t) � X(t) + e(t), where X(t) � sin(2π × 10t) and e(t) is the additive noise.e signal's sampling frequency is 10 kHz. Figure 17 shows the normalized complexities with different SNRs.
All the complexities rise as the noise increases. However, ShEn and PermEn do not show good monotonicity and perform the worst.
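The noisy test signal can be sketched as follows. The text specifies only the 10 Hz sine, the 10 kHz sampling frequency, and additive noise; the white Gaussian noise model, the 1 s duration, and the function name `noisy_sine` are assumptions made for this illustration.

```python
import numpy as np


def noisy_sine(snr_db, duration=1.0, fs=10_000, freq=10.0, seed=0):
    """Return t and S(t) = sin(2*pi*freq*t) + e(t) at the requested SNR (dB)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * fs)) / fs
    x = np.sin(2 * np.pi * freq * t)
    # SNR (dB) = 10 * log10(signal power / noise power).
    noise_power = np.mean(x**2) / 10 ** (snr_db / 10)
    e = rng.normal(0.0, np.sqrt(noise_power), size=t.size)
    # Rescale so the realized noise power matches the target.
    e *= np.sqrt(noise_power / np.mean(e**2))
    return t, x + e
```

Sweeping `snr_db` over a range and evaluating each complexity on the returned signal gives one curve per complexity of the kind compared in this test.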
Although we have tested signals with additive noise, these test signals are not general. References [43][44][45] showed that bearing signals have chaotic properties, so we also test the complexities on chaotic signals. The logistic map is a simple way to generate chaotic signals. Figure 18 shows the logistic map together with the largest Lyapunov exponent (LLE). The LLE only quantifies chaotic behavior; when the system is periodic, LLE = 0. The six complexities are shown in Figure 19. As we can see, ShEn and PermEn are the worst. FuzzyEn behaves incorrectly at the boundaries between periodic and chaotic regimes. LZC gives some wrong values around μ = 3.6. ApEn and SampEn perform better.
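A minimal sketch of the logistic map x(n+1) = μ x(n) (1 − x(n)) and a direct estimate of its largest Lyapunov exponent is given below. The initial value, burn-in length, series length, and function names are assumptions chosen for illustration, not the settings used for Figures 18 and 19.

```python
import numpy as np


def logistic_series(mu, n=5000, x0=0.4, burn_in=500):
    """Iterate x -> mu * x * (1 - x) and return n samples after a burn-in."""
    x = x0
    out = np.empty(n)
    for i in range(burn_in + n):
        x = mu * x * (1.0 - x)
        if i >= burn_in:
            out[i - burn_in] = x
    return out


def lyapunov_exponent(mu, n=5000, x0=0.4, burn_in=500):
    """Average of log|f'(x)| along the orbit, with f'(x) = mu * (1 - 2x)."""
    xs = logistic_series(mu, n, x0, burn_in)
    return float(np.mean(np.log(np.abs(mu * (1.0 - 2.0 * xs)))))
```

This direct estimate is negative in stable periodic windows and positive in chaotic regimes; clipping it at zero for plotting, as assumed here, marks only the chaotic regions.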
The length of the data can affect the complexity values. Figure 20 shows the complexities of a simulated signal whose length varies from 100 to 4000 points. As we can see, PermEn and ShEn increase with length and converge only after about 2000 data points. The others are better since they converge before 2000 points.
In summary, we have compared the six complexities with three tests. Among them, ApEn and SampEn perform better. Since SampEn is an improvement of ApEn, we regard SampEn as the best performer. It can serve as the representative of the complexity features.

The Cases of Bearings' Run-to-Failure Data
In this section, two run-to-failure datasets are used to visualize and validate the trend similarity of different features.

Case I (Outer Race Fault).
The Case I data come from the IMS center, as shown in Figure 21. The details of the test can be found in [46]. We use set no. 2, which exhibits an outer race defect, as Case I.

Case II (Inner Race Fault).
Case II comes from the IEEE PHM 2012 Prognostics Challenge data, provided by the FEMTO-ST Institute. The details of the data can be found in [47]. Figure 22 shows the experimental platform, named PRONOSTIA. We use the first dataset under the first load condition as Case II.
Since the failure mode of Case II is not given, we examine the envelope spectrum of the last data file (i.e., #2803), shown in Figure 23, to identify it. A peak appears at 218.8 Hz. From the calculation of the characteristic frequencies, the ball-pass frequency on the inner race (BPFI) is 221.66 Hz, the ball-pass frequency on the outer race (BPFO) is 168.34 Hz, and the fundamental train frequency (FTF) is 12.95 Hz. The peak is closest to the BPFI, which indicates an inner race fault.
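The comparison of the envelope-spectrum peak with the characteristic frequencies can be sketched as follows. The frequencies are those quoted above; the helper name and the consistency check are assumptions added for illustration. The check uses the standard kinematic relations BPFO = n × FTF and BPFI + BPFO = n × f_r (n rolling elements, shaft frequency f_r), and the values it infers are not stated in the text.

```python
# Characteristic frequencies quoted in the text (Hz).
candidates = {"BPFI": 221.66, "BPFO": 168.34, "FTF": 12.95}
observed_peak = 218.8  # Hz, envelope-spectrum peak of file #2803


def closest_fault_frequency(peak, freqs):
    """Return the nearest characteristic frequency and the relative deviation."""
    name = min(freqs, key=lambda k: abs(freqs[k] - peak))
    return name, abs(freqs[name] - peak) / freqs[name]


name, rel_err = closest_fault_frequency(observed_peak, candidates)

# Consistency check (assumed relations, not given in the text):
# BPFO = n * FTF and BPFI + BPFO = n * f_r.
n_elements = round(candidates["BPFO"] / candidates["FTF"])
shaft_hz = (candidates["BPFI"] + candidates["BPFO"]) / n_elements
```

With these numbers the nearest candidate is the BPFI, about 1.3% away from the observed peak, and the quoted frequencies are mutually consistent with 13 rolling elements and a shaft frequency of 30 Hz.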

Discussion
Prior work has enumerated and discussed the degradation features in the time domain, frequency domain, time-frequency domain, and complexity domain. It must be recognized that the set of possible degradation features is endless, and it is impossible to fuse all of them. As mentioned previously, many references used different methods for dimensionality reduction. However, they ignored the physical significance of the degradation features. To fuse two features, the first requirement is that the two have the same ordinate unit.

As revealed above, basic degradation features with the same ordinate unit show a trend similarity. At present, there is no precise definition of trend similarity. A natural idea is to fit curves to two features and compare their derivatives. However, both the choice of fitting method and the setting of its parameters are difficult, and computing the derivative of a time series is not accurate. So, we have used DFD and proposed CI to approximately measure this trend similarity.
As discussed earlier, we have classified the features based on their physical significance. From this classification, we can simply group the features into two classes: uptrend features and downtrend features.
Though there are many frequency features, they can hardly be regarded as degradation features. We can then use a typical feature to represent each class, and we take RMS and SampEn as the representatives. In essence, the energy and the complexity features are related. Figure 24 shows the SampEn and RMS of Case I and Case II together. We can see a synchronous, reversed trend similarity in the first 85% of the whole process. As the degradation deepens, the dent or defect concentrates stress and energy; meanwhile, it makes the signal more periodic. However, close to failure, the energy increases rapidly while the complexity does not change much. We surmise that by then the defect has fully developed on the surface through interactions and competitions, and we consider that the defect signal accounts for a large proportion of the overall signal. A simple example is that if a signal's amplitude increases proportionately, the complexity of each resulting signal during the process is unaltered. In turn, when the bearing is close to failure, the difference between the impact amplitude and the overall signal is not so high. Throughout the run-to-failure process, the RMS is bound to have an increasing trend and the SampEn a decreasing trend. In addition, we have used all the IEEE PHM 2012 Prognostics Challenge bearing data for validation. Though some of the results show a long flat period, they at least follow this pattern close to failure. It is worth mentioning that the sampling time and sampling frequency should be kept constant; otherwise, a jump appears at the point of change. Future work should focus on finding or proposing better representatives of energy and complexity features combined with new signal processing techniques, through which the fault information can be better extracted.
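The remark that a proportional increase in amplitude leaves the complexity unaltered while the energy grows can be checked with a small sketch. The SampEn parameters (m = 2, tolerance r = 0.2 times the standard deviation), the synthetic signal, and the function names are assumptions for illustration; they are not the Table 4 settings or the bearing data.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn with tolerance r = r_factor * std(x) and Chebyshev distance."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n = len(x)

    def count_matches(length):
        # Use the same n - m template start points for both lengths.
        templates = np.array([x[i : i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Compare with later templates only, so self-matches are excluded.
            dist = np.max(np.abs(templates[i + 1 :] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        raise ValueError("no template matches; SampEn is undefined")
    return float(-np.log(a / b))


def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))


rng = np.random.default_rng(0)
t = np.arange(1000) / 10_000
signal = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)
scaled = 4.0 * signal  # proportional amplitude increase

rms_ratio = rms(scaled) / rms(signal)
sampen_change = sample_entropy(scaled) - sample_entropy(signal)
```

Because the tolerance r scales with the standard deviation of the signal, the RMS grows by the scale factor while the SampEn stays the same, which is the behavior described for the stage close to failure.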

Conclusions
In this study, we have summarized the criteria for degradation features. We then listed multiple features of rolling bearings using two run-to-failure datasets and classified them by domain. Through this process, we found a trend similarity among degradation features whose dimensions are the same. We use the DFD and propose CI to approximately measure this similarity. By doing this, we can simply categorize the degradation features into two classes, namely, uptrend and downtrend features. RMS and SampEn are two good representatives of them. The degradation process can be presented through these two features. This method gives an alternative way for dimensionality reduction of the rolling bearings' degradation features.

Figure 1: The classification chart of basic dimensionality reduction methods.

Figure 6: The X_r, X_rms, and μ_|x| of the whole life of (a) Case I and (b) Case II.

Figure 8: The peak-to-peak, maximum, and minimum values of the whole life of (a) Case I and (b) Case II.

Figure 9: The shape factor, crest factor, impulse factor, and clearance factor of the whole life of (a) Case I and (b) Case II.

Figure 10: The p_1, p_2, p_3, and p_4 of the whole life of (a) Case I and (b) Case II compared with RMS (the five features are normalized).

Figure 12: The p_5, p_6, and p_7 of Case I processed after envelope analysis.

Figure 14: The ADF3 of (a) Case I and (b) Case II compared with RMS.

Figure 16: The six complexities of Case II.

Figure 20: The curves of six complexities versus data length.

Figure 24: The SampEn and RMS of (a) Case I and (b) Case II.

Table 1: An empirical classification of similarity by CI.

Table 2: The time domain features and their similarity indexes of Case I and Case II based on RMS.
Note: x_i is a signal series for i = 1, 2, ..., N, in which N is the number of data points.

Table 3: The frequency domain features and their similarity indexes of Case I and Case II based on RMS.
Note: s(k) is a spectrum for k = 1, 2, ..., K, in which K is the number of spectrum lines, s(k) ≥ 0. f_k is the frequency value of the kth spectrum line.

Table 4: The summary of the selected parameter values.