Stability Evaluation of Fault Diagnosis Model Based on Elliptic Fourier Descriptor

Performance evaluation of fault diagnosis algorithms is an indispensable link in the development and acceptance of a fault diagnosis system. Aiming at the stability evaluation of fault diagnosis models based on feature clustering, an image edge detection method based on the Elliptic Fourier Descriptor (EFDSE) is proposed to evaluate the stability of the fault diagnosis model; it applies image similarity measurement to the evaluation of fault diagnosis algorithms. The resulting quantitative index of the diagnostic capability of feature-clustering fault diagnosis models provides a reference for the acceptance and reliability of the diagnosis results. Finally, the effectiveness of the stability evaluation is verified with fault data from motor bearings.


Introduction
With the development of modern industrial and information technology, manufacturing systems in fields such as new energy, communication, computing, and industry are becoming more and more complex. Because of this structural complexity and the influence of various potential factors, hidden faults inevitably exist in such systems. Once a hidden fault is triggered, personnel and economic losses of varying degree follow. Fault diagnosis methods have therefore become a focus of research. There are three common families of fault diagnosis methods: diagnosis based on a control model [1], diagnosis based on statistical methods [2], and diagnosis based on artificial intelligence [3]. At present, a large number of studies focus on optimizing the stability of the fault diagnosis model, usually measured by the degree of diagnosis; however, the higher the stability required of the model in practical application, the higher the cost. It is therefore necessary to analyse the effect of model stability on the effectiveness of fault diagnosis.
However, there is still no unified system for measuring the accuracy of models. The main methods are error analysis methods such as relative deviation and the residual sum of squares [4], grey correlation theory [5], and methods based on statistical data such as ED train and the confidence interval [6]. Grey correlation theory can handle multiple data inputs, but it can only be applied within the same range of characteristic parameters and cannot be used to compare fault diagnosis models whose parameters have different feature ranges. The residual sum of squares is an evaluation method for regression models. Because it is influenced by the absolute values of the dependent and independent variables, it is not suited to relative comparison between different fault diagnosis models. The confidence interval index rests on the hypothesis that the diagnosis results of the training data group follow a normal distribution. It produces large errors when data are scarce, and the upper limit of the confidence interval does not converge to 1 as the accuracy increases. It is therefore not suitable for fault diagnosis models with high accuracy.
On the basis of the above research, and considering the true distribution of the fault diagnosis output, a stability evaluation method for fault diagnosis models based on the Elliptic Fourier Descriptor is proposed. It applies similarity measurement to the evaluation of fault diagnosis algorithms and can provide an objective evaluation without knowledge of the parameters of the fault diagnosis model, when using the fault

System Description and Model
In order to train and verify new technology and theory for motor bearing faults, a motor bearing state evaluation system developed by Rockwell has produced a series of motor performance databases [7], which can be used to verify or improve motor performance evaluation. Some projects that have used or are using these databases include Winsnode state assessment technology, model-based diagnosis technology, and a motor speed determination algorithm. The experimental platform is shown in Figure 1.
As shown in Figure 1, the test stand consists of a 2 hp motor (left), a torque transducer/encoder (center), a dynamometer (right), and control electronics (not shown). The test bearings support the motor shaft. Single point faults were introduced to the test bearings using electrodischarge machining, with fault diameters of 7 mils, 14 mils, 21 mils, 28 mils, and 40 mils (1 mil = 0.001 inches). See FAULT SPECIFICATIONS for fault depths. SKF bearings were used for the 7, 14, and 21 mil diameter faults, and NTN equivalent bearings were used for the 28 mil and 40 mil faults. Drive end and fan end bearing specifications, including bearing geometry and defect frequencies, are listed in the BEARING SPECIFICATIONS.
Vibration data was collected using accelerometers, which were attached to the housing with magnetic bases.
Accelerometers were placed at the 12 o'clock position at both the drive end and the fan end of the motor housing. During some experiments, an accelerometer was attached to the motor supporting base plate as well. Vibration signals were collected using a 16 channel DAT recorder and were post processed in a Matlab environment. All data files are in Matlab (*.mat) format. Digital data was collected at 12,000 samples per second, and data was also collected at 48,000 samples per second for drive end bearing faults. Speed and horsepower data were collected using the torque transducer/encoder and were recorded by hand.
Outer raceway faults are stationary faults; therefore placement of the fault relative to the load zone of the bearing has a direct impact on the vibration response of the motor/bearing system. In order to quantify this effect, experiments were conducted for both fan and drive end bearings with outer raceway faults located at

Fault Diagnosis Model Based on Feature Clustering
Clustering analysis is a kind of unsupervised learning.
In (1)-(3), the symbols denote, in order, the peak power, the instantaneous amplitude, the mean amplitude, the probability density, and the standard deviation.
Because some eigenvectors of the vibration signal may be correlated, the stability of clustering fault diagnosis models will be affected. Removing correlated eigenvectors is therefore the first step of fault diagnosis. In this paper, principal component analysis (PCA) is used to extract uncorrelated feature vectors. The calculation results are shown in Figure 3. It can be seen that PAR and KURTOSIS can represent the feature vectors of DE, as shown in Table 2.
In order to verify the reliability of the proposed evaluation method, this paper adopts the Davies-Bouldin index (DB) [12], the Dunn Validity Index (DVI) [13], and the Silhouette coefficient (SC) [14]. More than ten clustering evaluation indexes are used to pre-evaluate the optimal number of classes for the DE eigenvectors, and the motor bearing faults are accordingly classified into two categories. That is, Clara, Kmeans, and Dbscan are each used to divide the faults into two categories by means of Euclidean distance. In order to verify the effectiveness of the proposed method, the training set T is randomly extracted from the feature set F according to the proportion of 3:1, and then the training set T and the feature set F are each divided into two categories. The classification results are shown in Figure 4.
The first-class data corresponding to the training set T and the feature set F are then extracted; see Figure 5.
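The random extraction of T from F and the two-class clustering can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "3:1" means T holds three quarters of the rows of F, it uses a plain Lloyd-style k-means with a deterministic start in place of the Clara, Kmeans, and Dbscan routines used in the paper, and the function names and the synthetic two-blob data are hypothetical.

```python
import math
import random

def split_3_to_1(features, seed=0):
    """Randomly draw T holding 3 of every 4 rows of F (assumed reading of '3:1')."""
    rng = random.Random(seed)
    return rng.sample(features, (3 * len(features)) // 4)

def kmeans_two_classes(points, iterations=50):
    """Lloyd's algorithm with k = 2 and Euclidean distance; returns labels and centers."""
    # Deterministic start: the points with the smallest and largest first coordinate.
    centers = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                  for p in points]
        new_centers = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for the (PAR, KURTOSIS) feature set F: two separated blobs.
rng = random.Random(1)
F = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(40)] + \
    [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(40)]
T = split_3_to_1(F)
labels_F, _ = kmeans_two_classes(F)
labels_T, _ = kmeans_two_classes(T)
```

Clustering T and F separately, as here, is what later allows the class I contours of the two sets to be compared.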
In order to evaluate the stability of the fault diagnosis model based on feature clustering, the 2D points need to be mapped to a graphic according to a definite rule. This paper uses Dirichlet tessellation to map the points. The Dirichlet (Delaunay) tessellation, also known as the Voronoi diagram or Thiessen polygons, is a structure of computational geometry that can be used for qualitative analysis, statistical analysis, and adjacency analysis [15]. In this paper, the Euclidean distance between any two points in the first category is computed. Each point is treated as the vertex of a triangle and is connected to its two nearest points in Euclidean distance.
Step 1. The points are sorted primarily by their x coordinates and secondarily by their y coordinates.
Step 2. The construction process is as follows:
First, in order to smooth the image and reduce the influence of obvious noise on the edge detector, the image is filtered with a Gauss filter whose size is 2 × 2, as follows:

EFDSE Algorithm
If A is a window of size 3 × 3 in the image, centred on the pixel to be filtered, then after Gauss filtering the brightness value of that pixel is the sum of all elements of the matrix obtained by applying the Gauss kernel to the window with the convolution operator *. The Canny algorithm uses four operators to detect the horizontal, vertical, and diagonal edges of the image. The edge detection operator returns the first-order derivative values $G_x$ and $G_y$ in the horizontal and vertical directions, from which the gradient intensity $G$ and the gradient direction $\theta$ of each pixel can be determined.
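For reference, the smoothing and gradient step can be written in the standard Canny form below. The notation is assumed and may differ from the authors': $H$ is a normalized, symmetric Gauss kernel, $A$ the 3 × 3 image window, $e$ the filtered brightness of the centre pixel, and $G_x$, $G_y$ the horizontal and vertical first-order derivatives. For a symmetric kernel, the convolution evaluated at the window centre reduces to the element-wise weighted sum shown in the first line.

```latex
\begin{align}
e &= \operatorname{sum}(H * A) = \sum_{i=1}^{3}\sum_{j=1}^{3} H_{ij}\, A_{ij} \\
G &= \sqrt{G_x^{2} + G_y^{2}} \\
\theta &= \operatorname{atan2}(G_y,\, G_x)
\end{align}
```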
The gradient strength of the current pixel is compared with that of the two pixels along the positive and negative gradient directions. If the gradient strength of the current pixel is the largest of the three, the pixel is retained as an edge point; otherwise the pixel is suppressed. This is called non-maximum suppression. After non-maximum suppression, some edge pixels caused by noise and color variation still remain. To remove these stray responses, a high and a low threshold are selected. If the gradient value of an edge pixel is higher than the high threshold, it is marked as a strong edge pixel and is retained. If the gradient value is lower than the high threshold but higher than the low threshold, it is marked as a weak edge pixel; a weak edge pixel is retained as an edge point only if at least one of its 8 neighboring pixels is a strong edge pixel. If the gradient value is lower than the low threshold, the pixel is suppressed. Figure 6 shows the detection by the Canny operator, and the detected edge contours are shown in Figure 7.
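The double-threshold rule can be illustrated with a short sketch. It assumes the gradient magnitudes are already available as a 2D list and applies the rule exactly as worded above, in a single pass over the 8-neighborhood; it does not perform the iterative edge tracking found in some Canny implementations. The function name and the small grid of values are hypothetical.

```python
def hysteresis(gradient, low, high):
    """Double-threshold edge selection on a 2D grid of gradient magnitudes.

    Pixels above `high` are strong edges and are kept. Pixels between `low`
    and `high` are weak edges, kept only if an 8-neighbour is a strong edge.
    Pixels at or below `low` are suppressed.
    """
    rows, cols = len(gradient), len(gradient[0])
    strong = [[gradient[r][c] > high for c in range(cols)] for r in range(rows)]
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if strong[r][c]:
                edges[r][c] = 1
            elif gradient[r][c] > low:
                neighbours = [(r + dr, c + dc)
                              for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                              if (dr, dc) != (0, 0)]
                if any(0 <= nr < rows and 0 <= nc < cols and strong[nr][nc]
                       for nr, nc in neighbours):
                    edges[r][c] = 1
    return edges

gradient = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.5, 0.0],
    [0.0, 0.1, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
edges = hysteresis(gradient, low=0.3, high=0.8)
```

In this example the weak pixel next to the strong one is kept, while the isolated weak pixel in the third row is suppressed.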

Fourier Descriptor. Shape is one of the most important visual features of a target. The existing shape representation methods can be divided into two categories: shape representation based on region features and shape representation based on contour features. The contour-based method mainly uses the pixel information of the boundary of the target coverage area to describe the shape [16,17]. The Fourier Descriptor is a classical contour-based shape representation method, first proposed by Cosgriff (1960). The main idea is to describe the features of the contour with a set of numbers that represent the overall frequency content of the shape. It is invariant to rotation and translation and is the most widely used shape descriptor. On the algorithmic side, researchers have done a lot of work to improve shape representation based on the Fourier Descriptor and thereby its descriptive power. D Zhang and G Lu proposed an enhanced generic Fourier descriptor to extract the key part of the image content description; this method addresses the shortcoming that many shape descriptors are not suitable for generic shape description [18]. SS Li, YD Huang, and JW Yang proposed a region-based affine invariant ring Fourier Descriptor for affine invariant feature extraction, which can be used to extract the contour features of objects with multiple components [19]. R Kasaudhan and SH Son proposed an enhanced version of the grid distance Fourier Descriptor to calculate image similarity and improve the image matching rate. B Belkhaoui, A Toumi, and A Khalfallah combined the Fourier Descriptor with the watershed (WS) algorithm to propose a process for automatic target recognition using inverse synthetic aperture radar images, solving the target recognition problem for radar images [20].
First, we define a continuous curve $c(t)$ in order to explain the Fourier expansion (see Figure 8). $c(t)$ can be expressed as a trigonometric Fourier series in cosine and sine terms. According to Euler's formula, each cosine and sine term can be rewritten with complex exponentials. If we define the complex coefficients $c_k$ and $c_{-k}$ as combinations of the cosine and sine coefficients, then (8) can be written as a sum of complex exponentials whose coefficients $c_k$ and $c_{-k}$ are called the Fourier descriptors. $c_k$ and $c_{-k}$ can then be derived from (8) and (9). The coefficients in (9) can be obtained by considering the orthogonality property, which gives one way to compute the values of the descriptors as integrals over one period. Here $t$ is the arc length along the boundary curve. In order to describe the outline of the image, the selected starting point needs to travel once around the boundary curve, so $c(t)$ is a periodic function with period $2\pi$. In order to obtain the Elliptic Fourier Descriptor of the boundary curve, a Fourier series expansion is first carried out, using the 1D Fourier series.
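The expansion described above can be written out in its standard textbook form. The symbols below are assumed notation and the equation numbering of the paper is not reproduced: $a_k$ and $b_k$ are the cosine and sine coefficients, $c_k$ and $c_{-k}$ the complex coefficients, $T$ the period of $c(t)$, and $\omega = 2\pi/T$.

```latex
\begin{align}
c(t) &= \frac{a_0}{2} + \sum_{k=1}^{\infty} \bigl( a_k \cos(k\omega t) + b_k \sin(k\omega t) \bigr),
  \qquad \omega = \frac{2\pi}{T} \\
\cos(k\omega t) &= \frac{e^{jk\omega t} + e^{-jk\omega t}}{2}, \qquad
\sin(k\omega t) = \frac{e^{jk\omega t} - e^{-jk\omega t}}{2j} \\
c_k &= \frac{a_k - j\,b_k}{2}, \qquad c_{-k} = \frac{a_k + j\,b_k}{2}, \qquad c_0 = \frac{a_0}{2} \\
c(t) &= \sum_{k=-\infty}^{\infty} c_k\, e^{jk\omega t} \\
a_k &= \frac{2}{T}\int_{0}^{T} c(t)\cos(k\omega t)\,dt, \qquad
b_k = \frac{2}{T}\int_{0}^{T} c(t)\sin(k\omega t)\,dt
\end{align}
```

Substituting the second line into the first and collecting the terms in $e^{jk\omega t}$ and $e^{-jk\omega t}$ gives the third and fourth lines; the integrals in the last line follow from the orthogonality of the sine and cosine functions over one period.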
Then an expression for the ellipse coefficients can be computed from (12). According to the relationship between trigonometric and exponential functions, the coefficients can be written as sums over the sampled contour. The number of sampling points on the contour curve is usually half the number of pixels in the contour curve, and the sampled values are the values of $x(t)$ and $y(t)$ at each sample point. According to (16) and (17), each coefficient can be regarded as a sum of complex numbers, so that (13) can be expressed in terms of the elliptic coefficients. According to (21), the Elliptic Fourier Descriptor of the fault classification contour curve is obtained and normalized, as shown in Table 3.
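As an illustration of how elliptic Fourier coefficients can be obtained from a sampled closed contour, the sketch below uses plain discrete sums over equally spaced samples. This is one common discretization and is not claimed to be the exact form of the paper's equations; the coefficient names (a_xk, b_xk, a_yk, b_yk), the function names, and the scale-only normalization are assumptions made for the example.

```python
import math

def elliptic_fourier(contour, harmonics):
    """Return [(a_xk, b_xk, a_yk, b_yk)] for k = 1..harmonics via discrete sums.

    `contour` is a list of (x, y) samples, assumed equally spaced along a
    closed curve.
    """
    m = len(contour)
    coeffs = []
    for k in range(1, harmonics + 1):
        a_x = b_x = a_y = b_y = 0.0
        for i, (x, y) in enumerate(contour):
            angle = 2 * math.pi * k * i / m
            a_x += x * math.cos(angle)
            b_x += x * math.sin(angle)
            a_y += y * math.cos(angle)
            b_y += y * math.sin(angle)
        coeffs.append(tuple(2 * v / m for v in (a_x, b_x, a_y, b_y)))
    return coeffs

def normalise(coeffs):
    """Divide all coefficients by the magnitude of the first harmonic (scale only)."""
    scale = math.sqrt(sum(v * v for v in coeffs[0]))
    return [tuple(v / scale for v in row) for row in coeffs]

# A circle of radius 3 sampled at 64 points: only the first harmonic is non-zero.
m = 64
circle = [(3 * math.cos(2 * math.pi * i / m), 3 * math.sin(2 * math.pi * i / m))
          for i in range(m)]
descriptor = normalise(elliptic_fourier(circle, harmonics=4))
```

For the circle, the first harmonic carries all of the shape information and the higher harmonics vanish, which is a convenient sanity check before applying the same routine to a detected cluster contour.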

Stability Evaluation of Fault Diagnosis Model Based on Elliptical Fourier Descriptor. Assume that the class I contour edge descriptors of the training set T and the feature set F are denoted with subscripts 1 and 2, respectively. If the stability of a fault diagnosis model based on feature clustering is good, the cohesion around the class center is strong. That is to say, when data with the same characteristics are added to or removed from a class, the boundary shape of the cluster changes very little, and vice versa. Therefore, we define the similarity of the contour shapes of the two fault classification results as the stability evaluation criterion of the fault diagnosis model. In this criterion, cov(·) represents the covariance of the two descriptors and σ represents the standard deviation of the descriptor vectors. The range of the similarity is [0, 1]. The closer the similarity is to 1, the better the stability of the fault diagnosis model.
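A similarity built from the covariance and the standard deviations of two descriptor vectors can be sketched as a correlation coefficient. This is an assumed reading of the criterion: the function below computes the ordinary Pearson correlation, which in general lies in [-1, 1], so the stated range [0, 1] would hold only for non-negatively correlated descriptors or after an additional step not shown here. The function name and the two short vectors are made-up illustrative values, not data from Table 3.

```python
import math

def descriptor_similarity(d1, d2):
    """Pearson correlation cov(d1, d2) / (sigma_1 * sigma_2) of two equal-length vectors."""
    n = len(d1)
    mean1, mean2 = sum(d1) / n, sum(d2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(d1, d2)) / n
    sigma1 = math.sqrt(sum((a - mean1) ** 2 for a in d1) / n)
    sigma2 = math.sqrt(sum((b - mean2) ** 2 for b in d2) / n)
    return cov / (sigma1 * sigma2)

# Hypothetical flattened descriptors of the class I contour for T and for F.
d_T = [0.71, 0.02, 0.01, 0.70, 0.10, 0.05]
d_F = [0.70, 0.03, 0.02, 0.69, 0.12, 0.04]
score = descriptor_similarity(d_T, d_F)
```

A score near 1 indicates that the class I contour barely changes between T and F, which the text interprets as a stable diagnosis model.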

Experimental Results and Discussion
The clustering results are usually verified by two kinds of techniques. One is the intracluster distance, such as within.cluster.ss, the sum of squared distances within each cluster: the more similar the characteristics of the data within a cluster, the better the clustering effect. The other is the distance between clusters, such as the average silhouette coefficient calculated by avg.silwidth: the larger the value, the larger the difference between the data features of different classes and the better the discrimination of the clustering algorithm. The within.cluster.ss and avg.silwidth indexes of the three clustering results are compared with the EFDSE index proposed in this paper (shown in Table 4). As seen from the table, EFDSE indicates that the diagnosis effect of Kmeans is the best, Clara is second, and Dbscan is the worst. This is consistent with the conclusions of avg.silwidth and within.cluster.ss, which shows that the EFDSE method proposed in this paper is effective for the stability evaluation of fault diagnosis models based on feature clustering. Extending EFDSE to more fault diagnosis methods is the direction of our future work.
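The two reference indexes can be computed directly from labeled points. The sketch below uses the usual definitions of the within-cluster sum of squares and the average silhouette width; it is meant only to show what the quantities named within.cluster.ss and avg.silwidth measure, under the assumption that they follow these definitions. It needs at least two clusters, and the function names and the four-point example are hypothetical.

```python
import math

def within_cluster_ss(points, labels):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total

def average_silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    widths = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == k)
            / labels.count(k)
            for k in set(labels) if k != labels[i]
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
wss = within_cluster_ss(points, labels)
sil = average_silhouette(points, labels)
```

A smaller within-cluster sum of squares and a larger average silhouette width both indicate tighter, better separated clusters, which is the behaviour the EFDSE index is compared against.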

Figure 1: An experimental platform for fault simulation of motor bearing.
Clustering analysis does not need classes to be defined in advance, nor a training sample that indicates what the data should look like. Data sets can be divided into a number of different classes, and the data within a class have very high similarity. This is very applicable where no standard information labels can be identified, as in fault diagnosis. Because some system parameters, environmental interference, and noise are difficult to determine accurately in a real environment, it is difficult to establish an accurate fault diagnosis model. The data-driven method avoids mathematical modeling of the process: when the mechanism of the diagnosed object is not clear, it can learn and build a model from historical data to complete the fault diagnosis. Commonly used clustering algorithms are Kmeans [8], BIRCH [9], EM, DBSCAN [10], CLARANS [11], etc. This paper mainly studies the stability evaluation method of the fault diagnosis model (EFDSE). Therefore, Clara, Kmeans, and Dbscan are used directly for fault diagnosis of the motor drive end bearing (DE) based on feature clustering. Clara and Kmeans are distance-based clustering algorithms, and Dbscan is a density-based clustering algorithm. The data DE is a time series, as shown in Figure 2.
In the process of fault diagnosis of motor bearings, the quality of fault feature extraction determines the final diagnosis rate. The peak to average ratio (PAR), kurtosis (KURTOSIS), and skewness (SKEWNESS) of the vibration data cover the distribution features, statistical characteristics, and linear characteristics of the vibration, and can effectively reflect the main characteristics of vibration events. Therefore, this paper takes these three characteristics as the basis of fault diagnosis; the specific calculation methods are given in (1)-(3).
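As a rough illustration of the three features, the sketch below computes them for one vibration segment. The kurtosis and skewness follow the usual standardized-moment definitions; PAR is taken here as peak instantaneous power divided by mean power, which is an assumption about the paper's definition. The function names are hypothetical, and the sine segment stands in for real DE data.

```python
import math

def par(x):
    """Peak-to-average ratio: peak instantaneous power over mean power (assumed definition)."""
    powers = [v * v for v in x]
    return max(powers) / (sum(powers) / len(powers))

def central_moment(x, order):
    """Central moment of the given order, estimated from the samples."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)

def kurtosis(x):
    """Fourth central moment divided by the squared variance."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2

def skewness(x):
    """Third central moment divided by the cubed standard deviation."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def feature_vector(x):
    return (par(x), kurtosis(x), skewness(x))

# Ten full periods of a unit sine wave as a stand-in vibration segment.
segment = [math.sin(2 * math.pi * i / 100) for i in range(1000)]
features = feature_vector(segment)
```

For a pure sine wave these definitions give a PAR of 2, a kurtosis of 1.5, and a skewness of 0; impulsive bearing faults typically push the PAR and kurtosis well above these values.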

Figure 4: (a) Results of Clara clustering of T sets. (b) Results of Kmeans clustering of T sets. (c) Results of Dbscan clustering of T sets. (d) Results of Clara clustering of F sets. (e) Results of Kmeans clustering of F sets. (f) Results of Dbscan clustering of F sets.

Figure 5: (a) The class I of Clara clustering of T sets. (b) The class I of Kmeans clustering of T sets. (c) The class I of Dbscan clustering of T sets. (d) The class I of Clara clustering of F sets. (e) The class I of Kmeans clustering of F sets. (f) The class I of Dbscan clustering of F sets.
(i) If there are only 2 points, return.
(ii) If there are 3 points, connect them to construct a triangulation net and return.
(iii) Divide the points into two subsets on the basis of the even-split principle or the nearest-neighbor principle.
(iv) Construct the triangular net of the first subset.
(v) Construct the triangular net of the second subset.
(vi) Merge the two triangular nets and return the result.
Step 3. Merge process:
(i) For the two given triangular nets, calculate the convex hulls of the two subsets.
(ii) Obtain the top tangent and the bottom tangent.
(iii) Starting from one tangent, use the left endpoint, the right endpoint, and their adjacent points to complete the merge of the two triangular nets, until the other tangent is encountered.
The Voronoi diagram of the class I of T sets and F sets is shown in Figure 6.
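To show what the resulting tessellation looks like as an image, the sketch below rasterizes a Dirichlet (Voronoi) tessellation by brute force: every pixel is assigned to its nearest point, and a pixel is marked as a boundary where the assignment changes. This is a deliberately simpler technique than the divide-and-conquer triangulation of Steps 1 to 3 and is not the authors' procedure; the function names, grid size, and three example points are hypothetical.

```python
import math

def voronoi_raster(sites, width, height):
    """Label each pixel (col, row) with the index of its nearest site."""
    grid = []
    for row in range(height):
        line = []
        for col in range(width):
            nearest = min(range(len(sites)),
                          key=lambda i: math.dist((col, row), sites[i]))
            line.append(nearest)
        grid.append(line)
    return grid

def cell_boundaries(grid):
    """Mark pixels whose right or lower neighbour belongs to a different cell."""
    height, width = len(grid), len(grid[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            right = grid[r][c + 1] if c + 1 < width else grid[r][c]
            below = grid[r + 1][c] if r + 1 < height else grid[r][c]
            if right != grid[r][c] or below != grid[r][c]:
                edges[r][c] = 1
    return edges

# Three example points given as (x, y) = (column, row) on a 10 x 10 grid.
sites = [(2, 2), (7, 2), (4, 7)]
grid = voronoi_raster(sites, 10, 10)
edges = cell_boundaries(grid)
```

The brute-force version costs one distance evaluation per pixel and site, so it is suitable only for small illustrations; the divide-and-conquer construction scales much better with the number of points.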

Figure 7: (a) Contour of T sets Clara clustering. (b) Contour of T sets Kmeans clustering. (c) Contour of T sets Dbscan clustering. (d) Contour of F sets Clara clustering. (e) Contour of F sets Kmeans clustering. (f) Contour of F sets Dbscan clustering.

4.3. Fourier Description of the Edge Features of Fault Classification. Starting from a point $(x_0, y_0)$ on the target boundary and moving counterclockwise along it at a constant speed, the coordinates of the boundary points can be used to describe the boundary. The cluster boundary curve of the first class data set is defined as
$c(t) = x(t) + j\,y(t), \quad t = 0, 1, \ldots, m - 1$ (12)
EFDSE maps the fault classification results to a 2D graphic and applies graphic edge detection technology. By extracting the feature vectors of the contour curves of the fault classification results, the contour shape similarity is calculated to evaluate the effect of fault diagnosis. It is a new method for evaluating the stability of fault diagnosis models based on feature clustering, and it applies image similarity measurement to the evaluation of fault diagnosis algorithms. When the data samples and data methods are unknown, the stability of the fault diagnosis effect of the model is evaluated using only the visual contour feature vectors of the fault classification results. From the experimental method and principle, the evaluation is applicable to the stability evaluation of fault diagnosis models based on feature clustering. However, the clustering algorithm used should be distance-based or density-based.

Table 1: Partial drive end and fan end bearing fault data.

Table 3: Elliptical Fourier Description on the edge of fault classification.

Table 4: The index of stability (columns: Clara, Kmeans, Dbscan).