An Illustration of New Methods in Machine Condition Monitoring, Part II: Adaptive outlier detection

There have been many recent developments in the application of data-based methods to machine condition monitoring. A powerful methodology based on machine learning has emerged, in which diagnostics follow a two-step procedure: extraction of damage-sensitive features, followed by unsupervised learning (novelty detection) or supervised learning (classification). The objective of the current pair of papers is simply to illustrate one state-of-the-art procedure for each step, using synthetic data representative of reality in terms of size and complexity. This second paper in the pair deals with novelty detection. Although there has been considerable progress in the use of outlier analysis for novelty detection, most of the papers produced so far have suffered from the fact that simple algorithms break down if multiple outliers are present or if damage is already present in the training set. The objective of the current paper is to illustrate the use of phase-space thresholding, an algorithm that is able to detect multiple outliers inclusively in a data set.


Introduction
The fundamental framework describing a Condition Monitoring (CM) strategy can be outlined as a sequence of steps [1]: operational evaluation, data acquisition, feature extraction and statistical modelling for feature discrimination. The second part of this study is concerned with the last of these steps, i.e. the implementation of algorithms that analyse the extracted features for damage detection and quantification.
In general, feature discrimination can be performed using machine learning algorithms and statistical methods, which can be categorised into supervised and unsupervised learning approaches. Unsupervised learning methods for damage detection do not require class labels for all the acquired data and can indicate potential abnormality in the data being analysed. Supervised learning methods might provide much more informative results, but they require training data describing all the damage classes or operational and environmental scenarios of a monitored machine, which in most cases are not available. For this reason, unsupervised learning approaches might have an advantage over supervised ones in many practical CM applications. For unsupervised learning, one major challenge is that environmental and operational changes, which could affect the potentially damage-sensitive features, must be taken into account. Although aspects of this issue are considered in this study, it is not the main research objective of the analysis presented here; the reader is referred to [2,3,5] for further exploration of the problem and of proposed data analysis solutions. Novelty detection, known in the statistics field as outlier analysis, is one of the fundamental unsupervised learning approaches. In its classical form, it assumes that the features defining the normal condition follow a Gaussian distribution [4]. Distance measures, such as the Mahalanobis squared-distance, can then be used as discordancy measures, and procedures such as the Monte Carlo method can be employed to estimate a threshold value.
Here, a new adaptive novelty detection method is applied for damage detection on the synthetic data discussed in Part I. The approach is based on the idea of spatial adaptation, a term that, in the statistical literature, describes regression and classification methods that do not require a fixed, pre-defined basis, because they construct this (kernel) basis from the data being analysed [6,7,8]. The main advantage is better recovery of the functions that actually occur within real data [8].
The proposed method exploits the following three concepts [9]: a) differentiation enhances the high-frequency portion of a signal; b) the expected absolute maximum of a normally distributed random series is given by the universal threshold; and c) good data cluster in a dense cloud in phase space or in Poincaré maps. The threshold used to identify outliers in the data is estimated adaptively in the phase space of the dynamic system examined (an operating bearing in our case).
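Concept b) can be checked numerically. The sketch below, assuming zero-mean, unit-variance Gaussian samples, compares the universal threshold $\lambda_U = \sqrt{2\ln n}$ with a Monte Carlo estimate of the expected absolute maximum of $n$ such samples. The helper names and the number of trials are illustrative choices, not taken from the paper.

```python
import numpy as np


def universal_threshold(n):
    """Universal threshold lambda_U = sqrt(2 ln n) for n samples."""
    return np.sqrt(2.0 * np.log(n))


def simulated_expected_abs_max(n, trials=500, seed=0):
    """Average of max|xi| over many draws of n standard normal samples."""
    rng = np.random.default_rng(seed)
    maxima = [np.abs(rng.standard_normal(n)).max() for _ in range(trials)]
    return float(np.mean(maxima))


n = 10_000
lam = universal_threshold(n)
sim = simulated_expected_abs_max(n)
print(lam)   # about 4.29
print(sim)   # somewhat below lambda_U for this finite n
```

Because $\sqrt{2\ln n}$ is an asymptotic result, the simulated value sits a little below $\lambda_U$ for finite $n$, so points beyond $\lambda_U \sigma$ are unlikely to be ordinary Gaussian fluctuations.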
The aim is to show that this approach to outlier detection can be a powerful tool for certain CM applications. The layout of the paper is as follows: the next section briefly explains the basic theory of the proposed adaptive threshold method. Section Three explains the challenges faced in threshold estimation when standard novelty detection methods are not provided with good-quality training data. This is demonstrated by comparing the results of an outlier detection scheme that uses the Mahalanobis squared-distance and a Monte Carlo process for threshold estimation with those of the new adaptive threshold method. The paper ends with brief conclusions.

Theoretical background: the phase-space outlier detection method
The method iteratively constructs sequential ellipsoids in the three-dimensional phase space of the dynamical system: points lying outside the ellipsoids are designated as outliers. The three-dimensional phase space, also known as a Poincaré map or phase-space map, is a plot in which a variable and its derivatives are plotted against each other.
The threshold is defined by the universal criterion, introduced in the landmark paper given in reference [8], which forms an ellipsoid in phase space that separates inliers from outliers. For $n$ independent, identically distributed, standard normal random variables $\xi_i$, the expected absolute maximum is
$$E\left(|\xi_i|_{\max}\right) \approx \sqrt{2\ln n} = \lambda_U,$$
where $\lambda_U$ is termed the universal threshold. For a zero-mean normal random variable consisting of $n$ data points, whose standard deviation is estimated by $\sigma$, the expected absolute maximum is
$$E\left(|u_i|_{\max}\right) \approx \lambda_U \sigma.$$
The algorithm consists of a number of sequential iterations that stop when the number of good data points becomes constant (or, equivalently, when the number of new points identified as outliers, peaks in the power diagrams in this case, falls to zero). If $u_i$ is the dataset being analysed, each iteration has the following steps:
• Calculate surrogates $\Delta u_i$ and $\Delta^2 u_i$ for the first and second derivatives from the central differences
$$\Delta u_i = \frac{u_{i+1} - u_{i-1}}{2}, \qquad \Delta^2 u_i = \frac{\Delta u_{i+1} - \Delta u_{i-1}}{2}.$$
• Calculate the standard deviations of all three variables, $\sigma_u$, $\sigma_{\Delta u}$ and $\sigma_{\Delta^2 u}$, and then the expected maxima using the universal criterion.
• Calculate the rotation angle $\theta$ of the principal axis of $\Delta^2 u_i$ versus $u_i$ using the cross correlation:
$$\theta = \tan^{-1}\left(\frac{\sum_i u_i \, \Delta^2 u_i}{\sum_i u_i^2}\right).$$
• Each set of variables $\{u_i, \Delta u_i, \Delta^2 u_i\}$ determines a point in the three-dimensional phase space. For each pair of these variables an ellipse can be calculated. For $\Delta u_i$ versus $u_i$ the semi-major axis is $\lambda_U \sigma_u$ and the semi-minor axis is $\lambda_U \sigma_{\Delta u}$; for $\Delta^2 u_i$ versus $\Delta u_i$ the semi-major axis is $\lambda_U \sigma_{\Delta u}$ and the semi-minor axis is $\lambda_U \sigma_{\Delta^2 u}$; and for $\Delta^2 u_i$ versus $u_i$ the semi-major and semi-minor axes, $a$ and $b$ respectively, can be shown by geometry to be the solution of
$$(\lambda_U \sigma_u)^2 = a^2\cos^2\theta + b^2\sin^2\theta, \qquad (\lambda_U \sigma_{\Delta^2 u})^2 = a^2\sin^2\theta + b^2\cos^2\theta.$$
• For each projection, the points that lie outside the ellipse are identified and replaced with a smoothed estimate before the next iteration.
At each iteration, replacing the outliers reduces the standard deviations calculated in step two, and thus the ellipsoid shrinks, until further outlier replacement has no effect.
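The iteration described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the series is centred on its mean at each iteration, `np.gradient` supplies the central differences (one-sided at the two end points), flagged points are replaced by linear interpolation between the nearest retained points (one possible "smoothed estimate"), and the rotated ellipse is skipped if its geometry is degenerate. The function name `phase_space_outliers` and the `max_iter` cap are hypothetical.

```python
import numpy as np


def phase_space_outliers(u, max_iter=50):
    """Flag outliers by iterative phase-space thresholding.

    Returns a boolean outlier mask and a copy of the series in which the
    flagged points have been replaced (assumption: linear interpolation).
    """
    x = np.asarray(u, dtype=float).copy()
    n = x.size
    lam = np.sqrt(2.0 * np.log(n))              # universal threshold lambda_U
    idx = np.arange(n)
    flagged = np.zeros(n, dtype=bool)

    for _ in range(max_iter):
        v = x - x.mean()                         # assumption: zero-mean series
        d1 = np.gradient(v)                      # interior: (u[i+1] - u[i-1]) / 2
        d2 = np.gradient(d1)                     # same operator applied to d1
        s0, s1, s2 = v.std(), d1.std(), d2.std()

        # Axis-aligned ellipses for (du vs u) and (d2u vs du).
        out = (v / (lam * s0)) ** 2 + (d1 / (lam * s1)) ** 2 > 1.0
        out |= (d1 / (lam * s1)) ** 2 + (d2 / (lam * s2)) ** 2 > 1.0

        # Rotated ellipse for (d2u vs u): solve the two equations for a^2, b^2.
        theta = np.arctan(np.sum(v * d2) / np.sum(v ** 2))
        c2, sn2 = np.cos(theta) ** 2, np.sin(theta) ** 2
        A, B = (lam * s0) ** 2, (lam * s2) ** 2
        det = c2 ** 2 - sn2 ** 2                 # equals cos(2 * theta)
        if abs(det) > 1e-12:
            a2 = (A * c2 - B * sn2) / det        # squared semi-axis along the principal axis
            b2 = (B * c2 - A * sn2) / det
            if a2 > 0 and b2 > 0:                # skip this projection if degenerate
                xr = v * np.cos(theta) + d2 * np.sin(theta)
                yr = -v * np.sin(theta) + d2 * np.cos(theta)
                out |= xr ** 2 / a2 + yr ** 2 / b2 > 1.0

        new = out & ~flagged
        if not new.any():                        # number of good points is constant: stop
            break
        flagged |= new
        good = ~flagged
        x[flagged] = np.interp(idx[flagged], idx[good], x[good])

    return flagged, x


# Hypothetical example: Gaussian noise with a single spike at sample 700.
rng = np.random.default_rng(0)
sig = rng.standard_normal(2000)
sig[700] += 15.0
flags, cleaned = phase_space_outliers(sig)
print(np.flatnonzero(flags))
```

The closed forms for $a^2$ and $b^2$ follow from solving the two linear equations in the text; neighbours of a large spike may also be flagged, because the central differences at $i \pm 1$ involve the spike itself.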

Time-domain outlier detection
For reference: outlier analysis of the synthetic data enhanced by stochastic resonance
The classic outlier analysis approach, which computes Mahalanobis squared-distance discordancy measures for the feature matrix and compares them with a threshold, was first used to analyse the synthetic data processed with the optimised stochastic resonance approach presented in more detail in Part I.
For multivariate data, the Mahalanobis squared-distance is estimated from
$$D_\zeta = \left(\{x\}_\zeta - \{\bar{x}\}\right)^{T} [\Sigma]^{-1} \left(\{x\}_\zeta - \{\bar{x}\}\right),$$
where $\{x\}_\zeta$ is the feature vector corresponding to the candidate outlier, $\{\bar{x}\}$ is the sample mean of the normal-condition features and $[\Sigma]$ is the normal-condition feature sample covariance matrix. The threshold value that separates inlier from outlier observations can be estimated with a Monte Carlo method. Briefly, a $p \times n$ matrix ($p$ observations, $n$ dimensions, chosen so that the matrix dimensions match those of the extracted features) is generated and populated with elements drawn randomly from a zero-mean, unit-standard-deviation Gaussian distribution. The Mahalanobis squared-distance is then calculated for every observation and the largest value is stored. This is repeated a large number of times, each time storing the largest Mahalanobis squared-distance, and the stored values are sorted in order of magnitude. The threshold is assigned as a percentile of the resulting array of Mahalanobis squared-distances.
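The discordancy measure and the Monte Carlo threshold described above can be sketched as follows. The 99th percentile, the number of trials and the helper names are hypothetical choices (the text does not state them), and the simulated matrix is assumed to match the size of the training feature set.

```python
import numpy as np


def mahalanobis_sq(X, mean, cov):
    """Mahalanobis squared-distance of each row of X from `mean` under `cov`."""
    diff = np.atleast_2d(X) - mean
    # Solve cov @ z = diff.T rather than forming the explicit inverse.
    z = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, z)


def monte_carlo_threshold(p, n, trials=1000, percentile=99.0, seed=None):
    """Percentile of the largest D over many simulated p x n standard-normal matrices."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(trials)
    for t in range(trials):
        Z = rng.standard_normal((p, n))          # p observations, n dimensions
        mu = Z.mean(axis=0)
        S = np.cov(Z, rowvar=False)              # sample covariance of the simulated data
        maxima[t] = mahalanobis_sq(Z, mu, S).max()
    # np.percentile orders the values internally, so no explicit sort is needed.
    return float(np.percentile(maxima, percentile))


# Hypothetical usage: 40 training features of dimension 5.
rng = np.random.default_rng(1)
train = rng.standard_normal((40, 5))
threshold = monte_carlo_threshold(p=40, n=5, seed=2)
candidate = np.full((1, 5), 6.0)                 # a hypothetical discordant feature vector
d = mahalanobis_sq(candidate, train.mean(axis=0), np.cov(train, rowvar=False))
print(threshold, d[0], d[0] > threshold)
```

Because the mean and covariance in each trial are computed from the same simulated matrix, the largest distance can never exceed $(p-1)^2/p$, which bounds the threshold obtained for a given $p$.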
For the single-impulse case, presented in Figure 1, a 5-dimensional feature was defined as a 5-point time window, resulting in a 200 × 5 feature matrix in total. A series of 40 features was used as training data. The results are satisfactory for this particular case, but restrictions arise from the need to define appropriate feature dimensions and to choose training data that do not include outliers. The latter restriction can be a significant challenge for larger and more complex datasets.
In Figure 2, the results of the outlier analysis algorithm for the multiple-impulse signal are given. The amount and quality of the training data are important for the algorithm's performance. In this case, a 10-dimensional feature was defined as a 10-point time window, resulting in a 9600 × 10 feature matrix in total. The first 40 features were used as training data. Changing the feature dimensions and the amount of training data results in different outputs, which can hinder the robustness of the approach. A further limitation is that the algorithm is not a strong choice for time-domain signal analysis if the nonstationarities to be detected are impulse-like, because the signal's amplitude variation is of short duration. In addition, the results give only approximate information about when these changes occur, since each feature is a time window of several samples. This problem can be addressed by choosing a frequency-domain approach if the signals analysed are periodic. In many condition monitoring scenarios this could be a valid assumption; for applications in which the speed of the rotating machine changes (e.g. wind turbine bearings and gearboxes), however, it may not always be well founded.
The performance of a frequency-domain approach would therefore depend on how strongly or weakly periodic the damage-related nonstationarities of the signal are.
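The time-localisation limitation noted above can be made concrete. The sketch below assumes non-overlapping windows (consistent with, for example, 200 windows of 5 points from a 1000-sample signal, although the text does not state the signal length), treats the first `n_train` windows as healthy training data, and uses a hypothetical fixed threshold; in practice the threshold would come from a Monte Carlo procedure such as the one described earlier in this section. The function names and the synthetic signal are illustrative only.

```python
import numpy as np


def window_features(signal, width):
    """Stack consecutive, non-overlapping windows of `width` samples as feature rows."""
    x = np.asarray(signal, dtype=float)
    m = x.size // width                          # samples that do not fill a window are dropped
    return x[: m * width].reshape(m, width)


def flag_windows(signal, width, n_train, threshold):
    """Score every window against the first `n_train` windows (assumed healthy)."""
    F = window_features(signal, width)
    train = F[:n_train]
    mu = train.mean(axis=0)
    S = np.cov(train, rowvar=False)
    diff = F - mu
    d = np.einsum("ij,ji->i", diff, np.linalg.solve(S, diff.T))
    # A flagged window only localises the event to a span of `width` samples.
    spans = [(int(k) * width, (int(k) + 1) * width) for k in np.flatnonzero(d > threshold)]
    return d, spans


# Hypothetical example: 1000 noise samples with one impulse at sample 517.
rng = np.random.default_rng(0)
sig = rng.standard_normal(1000)
sig[517] += 50.0
d, spans = flag_windows(sig, width=5, n_train=40, threshold=50.0)
print(window_features(sig, 5).shape)   # (200, 5)
print(spans)
```

The impulse is detected, but only as the span of samples 515 to 519; the exact sample cannot be recovered from the window-level score, whereas a point-wise method can attach each outlier to a single time instant.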

The phase-space threshold method results and discussion
The phase-space threshold method is used in this section for outlier detection. Figures 3 and 4 show the phase-space threshold analysis results for the single impulse case and Figure 5 shows the multiple impulse case results.
The results demonstrate that the method can be very efficient at detecting outliers and offers the advantage of adaptivity: outliers are detected through an iterative procedure that at each step re-estimates the parameters describing the distribution of the analysed data, and therefore does not depend directly on the existence of a training set corresponding to healthy-state data. In practice, this iterative procedure means that the "threshold" is not necessarily a constant value but is adjusted throughout the dataset according to the output of each iteration of the algorithm. This suggests that variations in the monitored system that last longer than the short-duration variations often linked to common machine damage types in condition monitoring research might not compromise the reliability of the method. This requires further study, but it indicates the potential of the method.
Figure 3. Phase space of SR output data: demonstration of constructed ellipsoid (single impulse case).
In Figure 5, one can see the results of the method for the second dataset analysed (multiple-impulse data). Compared with the results given in Figure 2, outlier detection here has two advantages: the outliers can be linked directly to time instants, and the estimated "threshold" changes value throughout the dataset, depending on neighbouring data points and the rate of change of their values. Further study is needed to confirm the performance of the method on experimental data.

Conclusions
The second part of this paper demonstrates how a novel adaptive method for outlier detection can be used in combination with the genetically optimised stochastic resonance approach for the purposes of bearing condition monitoring. Standard machine learning or statistical algorithms for novelty detection might underperform in certain condition monitoring applications, owing to their training requirements and the complexity of the data being analysed. The analysis of pre-processed synthetic data presented here demonstrates that this risk can potentially be avoided by exploiting an adaptive approach. Further research is needed to confirm the results on experimental bearing datasets.