Fuzzy Treatment of Candidate Outliers in Measurements

Robustness against the possible occurrence of outlying observations is critical to the performance of a measurement process. Open questions relevant to statistical testing for candidate outliers are reviewed. A novel fuzzy logic approach is developed and exemplified in a metrology context. A simulation procedure is presented and discussed by comparing fuzzy versus probabilistic models.


Introduction
Measurement is intrinsically subject to uncertainty: according to the guide [1] "a measurement has imperfections that give rise to an error in the measurement result." Thus, an accurate statistical analysis is important to optimize the estimation process.
Given a set of measurements, an outlier is an element significantly different from the others (see, e.g., [2] and the standard [3]). Detecting an outlier in a data set matters because it can reveal an unforeseen phenomenon, a miscalibration or fault in instrumentation, or a reporting mistake. It is equally important to decide correctly what to do with a candidate outlier; for this reason, different tests for outlier analysis have been studied.
Statistical tests screen a dataset in order to identify any candidate outliers, using hypothesis testing. Classical tests are those of Grubbs [4] and Dixon [5]; other similar tests were formulated, for example, by David and Quesenberry [6], Ferguson [7], and Thompson [8]. All these tests provide a statistic to be compared with a critical value in order to conclude whether the doubtful observation is an outlier or not. They differ only in the construction of the statistic and the choice of the critical value. Most of them detect one outlier at a time, so they have to be repeated several times on the screened dataset to detect any further outliers. Other tests, based on similar ideas and hypotheses, such as those in [9,10], have been developed for the simultaneous detection of several outliers.
Using a Bayesian approach, some theoretical problems for the above tests are highlighted in this paper: to cope with such problems, an outlier analysis based on fuzzy logic is proposed, and a fuzzy treatment procedure is developed. Some works of related interest are [11] (general theory and application of fuzzy sets and systems), [12] (evidence theory with application to uncertainty treatment), [13-17] (fuzzy treatments of uncertainty in diverse fields, including measurement and temporal information), and [18,19] (Bayesian approach to outlier processing).
The paper is organized as follows. In Section 2, some statistical tests for the detection of outliers in a set of observations are recalled. A review of these outlier tests is provided, focusing on the Grubbs and Dixon tests. Moreover, some numerical examples are provided, and a criticism of orthodox hypothesis testing is considered from a Bayesian point of view. In Section 3, a novel outlier treatment based on fuzzy logic is developed with a simulation procedure implemented in Matlab™. Strategy and implementation aspects are detailed in Section 3.1. An application is reported in Section 3.2, where the performance of the procedure is also discussed by comparing fuzzy versus probabilistic models. Finally, in Section 4 the inference system architecture is recapitulated, and concluding remarks are drawn.

Statistical Tools: State of the Art and Open Questions
According to standard [20], an outlier is "an observation that appears to deviate markedly in value from other members of the sample in which it appears." To process outliers, diverse tests have been developed within the classical (Neyman-Pearson) hypothesis testing framework: a statistic is compared with a critical value related to a significance level α, leading to rejection of the null hypothesis H₀ = "the observation is not an outlier" if the statistic exceeds the critical value. Among others, the Grubbs and Dixon tests can be considered paradigmatic examples (see standard [20]). They are used to screen sampled datasets, aiming at detecting possible outliers one by one. Tests for simultaneous detection of several outliers at a time were also proposed (e.g., [9]); however, in these tests the exact number of suspected outliers must be specified in advance, or at least an upper bound to this number must be known [10]. The Grubbs and Dixon tests are here briefly recalled, and a few examples are elaborated in order to discuss some relevant features with application to a Gaussian distributed sample, data set = {x₁, ..., xᵢ, ..., xₙ}.
A one-sided test looks for candidate outliers on one side only of the ordered dataset (i.e., either the maximum or the minimum), whereas in a two-sided test the dataset is screened on both sides. Focusing on the two-sided Grubbs test [4], the following statistic G is compared with a critical value G₀ obtained from the Student's t-distribution:

$$G(x) = \frac{|x - \mu|}{\sigma} \qquad (1)$$

where μ and σ are the sample mean and standard deviation, respectively, and x is the value that maximizes |xᵢ − μ| over the data set. If G(x) > G₀, then x is considered an outlier at the related significance level α. Given the above data set, a Dixon test [5] yields the statistic

$$Q(x) = \frac{|x - x'|}{x_{\max} - x_{\min}} \qquad (2)$$

where x is the candidate outlier, x′ the observation closest to x, and x_max, x_min are the maximum and the minimum, respectively, among the xᵢ's. The statistic Q is compared with a critical value Q₀ that can be found in a table [21], where critical values originally calculated by Dixon [22] are corrected by use of interpolation analysis. Although the Grubbs test is not limited to small samples, the original examples presented by Grubbs [4] pertain to samples of sizes 15 and 8. Moreover, the Dixon criterion is used on samples of small size only (see the original work of Dixon [5]). Consequently, case studies focused on datasets of size 10 are well suited to highlighting features and issues peculiar to this family of classical statistical tests.
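As an illustration of how the two statistics are evaluated, the following minimal sketch computes G and Q for the most extreme value of a small sample. The data values and the function names `grubbs_statistic` and `dixon_statistic` are hypothetical and chosen only for this illustration; critical values still have to be taken from the appropriate tables.

```python
import statistics

def grubbs_statistic(data):
    """Return (candidate, G): the value farthest from the sample mean and
    its distance from the mean in units of the sample standard deviation."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample standard deviation (n - 1)
    candidate = max(data, key=lambda v: abs(v - mu))
    return candidate, abs(candidate - mu) / sigma

def dixon_statistic(data, candidate):
    """Return Q = gap to the nearest neighbour divided by the sample range.
    The candidate must be the minimum or the maximum of the data."""
    s = sorted(data)
    if candidate == s[-1]:
        neighbour = s[-2]
    elif candidate == s[0]:
        neighbour = s[1]
    else:
        raise ValueError("candidate must be an extreme value of the data set")
    return abs(candidate - neighbour) / (s[-1] - s[0])

# Hypothetical sample of size 10 (not taken from the paper).
sample = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.3, 10.9]
x, G = grubbs_statistic(sample)
Q = dixon_statistic(sample, x)
print(f"candidate = {x}, G = {G:.3f}, Q = {Q:.3f}")
```

Each statistic is then compared with its own tabulated critical value for the chosen sample size and significance level.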
and let the Grubbs and Dixon tests be mutually compared on the same observed value x = 25.9, both at the significance level α = 0.05, for the null hypothesis H₀. By applying the Grubbs test, at α = 0.05 the critical value is G₀ = 2.29. For the observation x = 25.9, G(x = 25.9) = 2.35 > G₀: thus H₀ is rejected, and x = 25.9 is considered an outlier to be removed from the data set 1. On the contrary, by applying the Dixon test, at α = 0.05 the critical value is Q₀ = 0.477, thus Q(x = 25.9) = 0.46 < Q₀; therefore, H₀ cannot be rejected: in this case the same value x = 25.9 is not considered an outlier. The two tests, even though both correctly performed in the same testing conditions, lead to divergent decisions. Moreover, if the decision is to discard the detected outlier from the dataset in view of further statistical processing, the surviving data cannot be considered a random sample of mutually independent observations: all of them are in fact associated by having passed the same selection criteria.
The Grubbs and Dixon tests can be repeated in order to detect more than one outlier, if any, in a given dataset. However, as shown in the following example where data are Grubbs-tested for multiple outliers, a test may happen to be unstable. This is a consequence of the fact that the statistic G in (1) is a function of x and of the sample mean μ and standard deviation σ: these parameters are subject to change with exclusion/inclusion of individual values.
x = 1.78 would not be considered an outlier (the initial inequality G(x = 1.78) > G₀ is reversed after duplication of the individual value x = 1.78). The conclusion is that drastic rejection or acceptance of a suspected outlier is not the best decision-making criterion: a better posed criterion might be to assign the suspected value a weight for further processing. This is the idea developed in the next section by using fuzzy logic. Before moving to fuzzy logic, it should be noted that the classical Neyman-Pearson approach to hypothesis testing is challenged by Bayesian statistics. The probability, conditioned on the involved dataset, for an observation to be a candidate outlier can be dramatically different from the probability referred to in the family of classical tests (see, e.g., [19] for a numerical example illustrating how classical hypothesis testing can be prone to misinterpretation and misuse).
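The instability caused by duplicating an extreme value can be reproduced with a minimal sketch. The sample below is hypothetical (it is not the data set of the example), and the helper name `grubbs_g` is an assumption made for this illustration; the only figure borrowed from the text is the critical value 2.29 for samples of size 10 at α = 0.05.

```python
import statistics

def grubbs_g(data):
    """Grubbs statistic of the value farthest from the sample mean."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample standard deviation (n - 1)
    return max(abs(v - mu) for v in data) / sigma

G0_N10 = 2.29  # critical value quoted in the text for n = 10, alpha = 0.05

# Hypothetical sample with a single extreme value, 5.08.
sample = [5.00, 4.99, 5.01, 5.00, 4.98, 5.02, 5.00, 5.01, 4.99, 5.08]
g_before = grubbs_g(sample)

# The same sample after the extreme value has been observed a second time.
g_after = grubbs_g(sample + [5.08])

print(f"G before duplication: {g_before:.3f} (exceeds {G0_N10}: {g_before > G0_N10})")
print(f"G after duplication:  {g_after:.3f}")
```

The second extreme value inflates the sample standard deviation and shifts the mean towards the suspected value, so G drops although the suspected value itself has not changed. Since the Grubbs critical value does not decrease when the sample grows from 10 to 11, the duplicated sample no longer triggers rejection.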
According to the Bayesian approach, the test is formalized in terms of inverse probability. Thus, the posterior probability of the hypothesis after the data have been observed is obtained by means of the Bayes rule. To develop a Bayesian model for candidate outlier testing, let the propositions H₀ (the so-called null hypothesis) and H₁ (alternative hypothesis) be two mutually exclusive and exhaustive hypotheses under test, namely, H₀ = "the observation is not an outlier"; H₁ = "the observation is an outlier". Let the proposition E = "the test result is positive for a suspected outlier" represent the available evidence.
The conditional probabilities P(E | H₀) = α and P(E | H₁) = 1 − β represent the test size (level of significance) and the power of the test, respectively. In tests based on orthodox statistics, two types of errors may occur in testing for outliers: a false detection may occur with probability α (type I error), or an outlier may be missed with probability β (type II error). A Bayesian approach to outlier testing is instead focused on computing the posterior probability of an observation being an outlier (H₁) given that the test result is positive for a suspected outlier (E), that is, P(H₁ | E). By the Bayes rule, this can be computed as follows:

$$P(H_1 \mid E) = \frac{P(E \mid H_1)\,P(H_1)}{P(E \mid H_1)\,P(H_1) + P(E \mid H_0)\,P(H_0)} = \frac{(1-\beta)\,P(H_1)}{(1-\beta)\,P(H_1) + \alpha\,P(H_0)}$$
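To show how far the posterior P(H₁ | E) can lie from the significance level, the following sketch evaluates the Bayes rule for one set of inputs. The power 1 − β = 0.8 and the prior P(H₁) = 0.01 are hypothetical values chosen only for illustration, and the function name `posterior_outlier` is likewise an assumption.

```python
def posterior_outlier(alpha, power, prior_h1):
    """Posterior probability P(H1 | E) that an observation is an outlier,
    given a positive test result E.

    alpha    -- P(E | H0), the test size
    power    -- P(E | H1) = 1 - beta
    prior_h1 -- prior probability P(H1); P(H0) = 1 - prior_h1
    """
    true_positive = power * prior_h1
    false_positive = alpha * (1.0 - prior_h1)
    return true_positive / (true_positive + false_positive)

# Hypothetical inputs: 5% test size, 80% power, 1% prior probability of an outlier.
post = posterior_outlier(alpha=0.05, power=0.8, prior_h1=0.01)
print(f"P(H1 | E) = {post:.3f}")
```

Under these assumed inputs a positive test result leaves the observation far more likely to be a regular value than an outlier, even though the test was run at α = 0.05.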
The standard for dealing with outliers [20] remarks that rejection of aberrant observations should rely preferably upon physical, rather than statistical, grounds. On the basis of this remark, the treatment of possible outliers can be developed from a fuzzy logic standpoint, aimed at capturing physical grounds and related hypotheses by means of fuzzy processing tools. The proposed approach starts from the consideration that the proposition "the observation is an outlier" and its negation can be modelled in fuzzy logic terms by assigning a truth degree varying from zero (complete falsehood) to unity (full truth): the probability measure P(H₁ | E) can thus be replaced by a fuzzy outlierness degree, purpose-built according to the strategy presented in Section 3.1.
In terms of fuzzy sets, the logical connectives of conjunction, disjunction and negation are translated by the fuzzy set-theoretic operations of intersection, union, and complementation, respectively. Using a standard model, originated by Zadeh [23] and further elaborated by Mamdani and Assilian [24], the fuzzy inference engine used in the present case study is detailed in the next section.
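A minimal sketch of the standard (min, max, complement) operators on truth degrees is given below. The function names are hypothetical, and the truth degree 0.636 is used only as a sample input.

```python
def fuzzy_and(a, b):
    """Conjunction / intersection: minimum of the two truth degrees."""
    return min(a, b)

def fuzzy_or(a, b):
    """Disjunction / union: maximum of the two truth degrees."""
    return max(a, b)

def fuzzy_not(a):
    """Negation / complementation: one minus the truth degree."""
    return 1.0 - a

# Truth degree of "the observation is an outlier" (sample value).
is_outlier = 0.636
is_not_outlier = fuzzy_not(is_outlier)

print(is_not_outlier)                         # degree of the negated proposition
print(fuzzy_and(is_outlier, is_not_outlier))  # not forced to 0, unlike in Boolean logic
print(fuzzy_or(is_outlier, is_not_outlier))   # not forced to 1, unlike in Boolean logic
```

With these operators a proposition and its negation can both hold to a partial degree, which is what allows an observation to be treated as partly an outlier rather than accepted or rejected outright.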

Strategy and Implementation.
To tackle the above open questions, an alternative treatment of candidate outlier detection and processing is here developed in the framework of fuzzy logic. Fuzzy logic is integrated in the framework of possibility theory (see, e.g., [12]), where a counterpart of the Bayes rule can be derived [25]. Thus, a fuzzy logic treatment for outliers is not prone to Bayesian criticisms, unlike tests based on orthodox statistics.
For the purposes of the present work, such a treatment is based on classical fuzzy logic rules, as introduced by Zadeh [23]. The potential of fuzzy logic as a paradigm for uncertainty treatment in measurement is studied in various works, such as [12-17, 26].
The focus here will be on criteria for transforming the outlier problem into fuzzy terms.
First of all, a definition of candidate outlying observations must be stated. According to current use in the technical literature (for distance-based approaches see, e.g., [27,28]), the following definition is considered: an observation is a candidate outlier if its distance from a predefined reference value exceeds a given threshold. In fact, this definition makes explicit the assumptions underlying classical tests; see (1) and (2).
In these terms, the reference value and the threshold value are defined as the mean and a multiple m (an integer value to be chosen for implementation) of the standard deviation, respectively, of a Gaussian probability density function (pdf).
Denoting by P(a | b) the pdf of a conditioned on b, the Bayes formula is P(a | b) = P(a)P(b | a)/P(b), where P(a | b) is the posterior pdf, P(a) the prior pdf, and P(b | a) the likelihood function. Putting P(a) = N(μ, σ²), the mean and the standard deviation are identified by μ and σ, respectively. Note that μ and σ are parameters whose values must be set by expert judgment (in metrology terms, σ is a type B uncertainty estimate [1]) to initialize the algorithm. The values of μ and σ are required to specify the prior pdf P(a), so they must be preset before starting the measurement process.
To design a suitable fuzzy strategy, some steps are required so as to introduce the notion of a fuzzy degree qualifying an observation as a candidate outlier: for short, the outlierness degree ρ = ρ(x) ∈ [0, 1]. The strategy can be detailed by introducing the distance d(x, μ) = |x − μ| and the expert's estimate of uncertainty expressed as a percentage, σ% = σ · 100/μ (the case μ = 0 is not covered in this approach). For instance, by putting m = 5 the outlierness degree of a single observation x can be computed according to the following inference scheme that includes two inputs (fuzzy distance and %uncertainty) and one output (outlierness).
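The two crisp inputs of the inference scheme can be evaluated directly. The sketch below does so for the parameter values used in the application discussed later in the paper (μ = 25.0, σ = 0.4, x = 25.9 and μ = 1.70, σ = 0.028, x = 1.78). The function name `crisp_inputs` is hypothetical, and the ratio d/σ is printed only as a convenient reading of the distance against the 5σ bound, not as a quantity defined in the text.

```python
def crisp_inputs(x, mu, sigma):
    """Return (d, sigma_pct): the distance of x from the reference value mu
    and the expert's uncertainty sigma expressed as a percentage of mu."""
    if mu == 0:
        raise ValueError("the case mu = 0 is not covered by this approach")
    d = abs(x - mu)
    sigma_pct = 100.0 * sigma / abs(mu)
    return d, sigma_pct

for x, mu, sigma in [(25.9, 25.0, 0.4), (1.78, 1.70, 0.028)]:
    d, sigma_pct = crisp_inputs(x, mu, sigma)
    print(f"x = {x}: d = {d:.3f}, %uncertainty = {sigma_pct:.2f}, d/sigma = {d / sigma:.2f}")
```

In both cases the distance is between two and three times σ, well inside the 5σ bound obtained with m = 5, so neither candidate is settled by the distance alone.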
The fuzzy distance is obtained after a fuzzification of the distance d = d(x, μ), according to:
The inference engine is the basic Mamdani model [24] (with if-then rules, min-max set operations, sum for composition of activated rules, and defuzzification based on the centroid method) available from the Matlab™ Fuzzy Logic Toolbox. (Identification of commercial products in this paper does not imply recommendation or endorsement, nor does it imply that the products identified are necessarily the best available for the purpose.) Here, the fuzzification is detailed in terms of fuzzy distance, fuzzy uncertainty, and outlierness. The membership functions (depicted by triangular or trapezoidal shapes in Figure 1) reflect expert-based choices, made by selection from a purpose-built interactive menu.
The Mamdani model is well suited to capturing and coding expert-based knowledge in view of performing targeted simulations; accordingly, the system's performance is tuned using heuristic criteria: Figures 1 and 2 illustrate its typical behaviour.
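The structure of such a 2-input/1-output Mamdani system can be sketched in a few lines. The membership breakpoints and the four rules below are hypothetical placeholders, since the expert-based choices of Figure 1 are not reproduced here; the sketch therefore illustrates the mechanism only (fuzzification, min for AND and for implication, sum for aggregation, centroid defuzzification) and is not expected to reproduce the outlierness values reported in the paper.

```python
def trap(x, a, b, c, d):
    """Trapezoidal membership (a <= b <= c <= d); b == c gives a triangle."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical output sets on the outlierness axis [0, 1].
OUT = {
    "low": (0.0, 0.0, 0.0, 0.5),
    "medium": (0.0, 0.5, 0.5, 1.0),
    "high": (0.5, 1.0, 1.0, 1.0),
}

def outlierness(x, mu, sigma, steps=1000):
    """Centroid-defuzzified outlierness degree of x (hypothetical rule base)."""
    k = min(abs(x - mu) / sigma, 5.0)        # distance in units of sigma, clipped at m = 5
    u = min(100.0 * sigma / abs(mu), 100.0)  # percentage uncertainty
    # Fuzzification of the two inputs (hypothetical breakpoints).
    small = trap(k, 0, 0, 1, 2.5)
    medium = trap(k, 1, 2.5, 2.5, 4)
    large = trap(k, 2.5, 4, 5, 5)
    u_low = trap(u, 0, 0, 1, 3)
    u_high = trap(u, 1, 3, 100, 100)
    # If-then rules: AND is min; each entry is (firing strength, output set).
    rules = [
        (small, "low"),
        (min(medium, u_low), "medium"),
        (min(medium, u_high), "low"),
        (large, "high"),
    ]
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        # Implication by min (clipping), aggregation of the fired rules by sum.
        agg = sum(min(w, trap(y, *OUT[name])) for w, name in rules)
        num += y * agg
        den += agg
    return num / den if den > 0 else 0.0

rho_near = outlierness(25.2, 25.0, 0.4)  # close to the reference value
rho_mid = outlierness(25.9, 25.0, 0.4)   # the candidate discussed in the paper
rho_far = outlierness(26.8, 25.0, 0.4)   # far from the reference value
print(rho_near, rho_mid, rho_far)
```

The centroid is approximated on a uniform grid over [0, 1]; a finer grid or exact integration of the piecewise-linear aggregate would serve equally well.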
The outlierness degree ρ = ρ(x) is obtained by application of the centroid defuzzification method. This provides the abscissa of the barycentre of the fuzzy set composed according to the activated rules. The overall functioning of the rules is summarized in the 3D graph in Figure 2.

Application and Discussion.
The resulting ρ(x) is used to determine a weight that enters the processing of the data set for estimation purposes. Each individual value in the data set is assigned a weight w = w(x), whose assignment rules are (w1) if d(x, μ) ≥ 5σ, then w(x) = 0 (fully outlier); In this way any fuzzy outlier, not being discarded from the data set, still contributes with its own weight to the final estimated value.
To assess the performance of the fuzzy treatment compared to the Grubbs and Dixon tests, a numerical example is reported with reference to the data set 1 of Example 1. According to the Grubbs test, the suspected value x = 25.9 is an outlier; according to the Dixon test, on the contrary, the same suspected value is not an outlier. Such a disagreement is successfully managed by the fuzzy treatment, which assigns an outlierness degree.
The final result of the fuzzy procedure is influenced by the values assigned to the parameters μ and σ. For example, putting μ = 25.0 and σ = 0.4, the fuzzy procedure performance is shown in Figure 3 with application to the candidate x = 25.9. The candidate turns out to be a fuzzy outlier with outlierness degree ρ(25.9) = 0.636 and, according to assignment rule (w3), it is assigned the weight w(25.9) = 1 − ρ(25.9) = 0.364.
The efficacy of this fuzzy treatment is supported by another example developed with application to the candidate outlier x = 1.78 taken from data set 2 of Example 2. Here, the Grubbs test and the Dixon test yield mutually contradictory results (the Grubbs test detects the outlier x). Moreover, when an extra candidate outlier x = 1.78 is introduced in the data set, a failure of the Grubbs test has been noted. Figure 4 shows how this candidate is detected and assigned its outlierness degree ρ(1.78) = 0.674: in this case μ = 1.70, and σ = 0.028.
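One way such a weight could enter an estimation step is through a weighted mean, sketched below. The weighted mean itself is an assumption made for illustration (the estimator is not specified here), the three regular readings are hypothetical, and only rules (w1) and (w3) as quoted in the text are implemented; the candidate's outlierness 0.636 is the value reported above.

```python
def weight(rho, d, sigma, m=5):
    """Weight of an observation from its outlierness degree rho.
    (w1): d >= m * sigma -> weight 0; (w3): weight = 1 - rho.
    Any further rule is not reproduced in this sketch."""
    if d >= m * sigma:
        return 0.0
    return 1.0 - rho

def weighted_mean(values, weights):
    """Weighted average of the values (assumed estimator)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

mu, sigma = 25.0, 0.4
values = [25.0, 25.1, 24.9, 25.9]  # three hypothetical regular readings plus the candidate
rhos = [0.0, 0.0, 0.0, 0.636]      # outlierness: 0 assumed for the regular readings
weights = [weight(r, abs(v - mu), sigma) for v, r in zip(values, rhos)]

plain = sum(values) / len(values)
robust = weighted_mean(values, weights)
print(f"plain mean = {plain:.4f}, weighted mean = {robust:.4f}")
```

The candidate is neither discarded nor taken at face value: it still pulls the estimate upwards, but less than in the unweighted mean.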

Conclusion
The presence of suspected outlying values in measurements has given rise to a long-standing problem. Its difficulty is mainly due to the lack of sharp criteria for outlier detection and treatment in an estimation process. The classical statistical approach to candidate outlier detection and treatment has been reviewed, highlighting some problems that have been discussed at a logical level. To overcome some of these problems, a novel fuzzy logic approach has been proposed and a system has been implemented. The system performance has been tuned by simulations: optimization and integration for prospective in-process metrology is envisaged for further developments.
The notion of a fuzzy outlier is introduced and specified in terms of an outlierness degree founded on metrological rather than statistical grounds (as suggested by the standard [20]). Such a degree is computed as the result of a 2-input/1-output fuzzy inference system. A Bayesian estimation process is referred to in the designed strategy. The expert-based estimates of the mean and of the standard deviation of the prior pdf in the Bayes rule are used to initialize the process. Independence is not required in the Bayes rule, and the fuzzy treatment of data is not affected by statistical independence. Thus, whereas preservation of independence may be a problem for orthodox statistical tests, it is not for the proposed treatment of outliers. Fuzzifications of a candidate outlier's distance from the mean value and of the standard deviation provide the inputs to the fuzzy inference system. The outlierness degree is obtained by centroid defuzzification.
In the light of the results of the research work presented and discussed so far, the following conclusions can be drawn: (i) compared to orthodox hypothesis testing for outliers, such as the Grubbs and Dixon tests, the developed fuzzy approach is not prone to criticisms raised by Bayesian statistics; (ii) the outlierness degree can be conveniently translated into a relative weight assigned to an outlier entering an estimation process; (iii) the efficacy of the proposed fuzzy inference system has been demonstrated on heuristic grounds, with successful management of case studies where orthodox tests would lead to mutually divergent decision-making.