Multiple outlier detection tests for parametric models

We propose a simple multiple outlier identification method for parametric location-scale and shape-scale models when the number of possible outliers is not specified. The method is based on a result giving asymptotic properties of extreme z-scores. Robust estimators of model parameters are used defining z-scores. An extensive simulation study was done for comparing of the proposed method with existing methods. For the normal family, the method is compared with the well known Davies-Gather, Rosner's, Hawking's and Bolshev's multiple outlier identification methods. The choice of an upper limit for the number of possible outliers in case of Rosner's test application is discussed. For other families, the proposed method is compared with a method generalizing Gather-Davies method. In most situations, the new method has the highest outlier identification power in terms of masking and swamping values. We also created R package outliersTests for proposed test.


Introduction
The problem of multiple outliers identification received attention of many authors. The majority of outlier identification methods define rules for the rejection of the most extreme observations. The bulk of publications have been concentrated on the normal distribution (see [1][2][3][4][5][6], see surveys in [7,8]. For non-normal case, the most of the literature pertains to the exponential and gamma distributions, see [9][10][11][12][13][14][15][16][17]. Constructing outlier identification methods, the most of authors suppose that the number s of observations suspected to be outliers is specified. These methods have a serious drawback: only two possible conclusions are done: exactly s observations are admitted as outliers or it is concluded that outliers are absent. More natural is to consider methods which do not specify the number of suspected observations or at least specify the upper limit s for it. Such methods are not very numerous and they concern normal or exponential samples. These are [1,5,18] methods for normal samples, [15,16,19]) methods for exponential samples. The only method which does not specify the upper limit s is the [2] method for normal samples.
We give a competitive and simple method for outlier identification in samples from location-scale and shapescale families of probability distributions. The upper limit s is not specified, as in the the case of Davies-Gather method. The method is based on a theorem giving asymptotic properties of extreme z-scores. Robust estimators of model parameters are used defining z-scores.
In Section 2 we present a short overview of the notion of the outlier region given by [2]. In Section 3 we give asymptotic properties of extreme z-scores based on equivariant estimators of model parameters, and introduce a new outlier identification method for parametric models based on the asymptotic result and robust estimators. In section 4 we consider rather evident generalizations of Davies-Gather tests for normal arXiv:1910.10426v1 [math.ST] 23 Oct 2019 data to location-scale families. In Section 5 we give a short overview of known multiple outlier identification methods for normal samples which do not specify an exact number of suspected outliers. In Section 6 we compare performance of the new and existing methods.

Outliers and outlier regions
Suppose that data are independent random variables X 1 , . . . , X n . Denote by F i (x) the c.d.f. of X i . Suppose that if the data are not contaminated with unusual observations, then the following null hypothesis H 0 is true: there exist θ ∈ Θ such that There are two different definitions of an outlier. In the first case outlier is an observation which falls into some outlier region out(X). The outlier region is a set such that the probability for at least one observation from a sample to fall into it is small if the hypothesis H 0 is true. In such a case the probability that a specified observation X i falls into out(X) is very small. If an observation X i has distribution different from that under H 0 then this probability may be considerably higher.
In the second case the value x i of X i is an outlier if the probability distribution of X i is different from that under H 0 , formally F i = F (x, θ). In this case outliers are often called contaminants.
So in the first case exists a very small probability to have an outlier under H 0 . In the second case outliers (contaminants) do not exist under H 0 and outliers (contaminants) do not necessary fall into the outlier region. Both definitions give approximately the same outliers if the alternative distribution is concentrated in the outlier region. Namely such contaminants can be called outliers in the sense that outliers are anomalous extreme observations. In such a case it is possible to compare outlier and contaminant search methods.
In this paper, we consider location-scale and shape-scale families. Location-scale families have the form The right-sided α-outlier region for a location-scale family is If f 0 is symmetric, then the two-sided outlier region is simpler: The value of α is chosen depending on the size n of a sample: α = α n . The choice is based on assumption that under H 0 for someᾱ close to zero The equality (3) means that under H 0 the probability that none of X i falls into α n -outlier region is 1 −ᾱ. It implies that The sequence α n decreases fromᾱ to 0 as n goes from 1 to ∞.
The first definition of an outlier is as follows: for a sample size n a realization x i of X i is called outlier if x i ∈ out(α n , F ); x i is called right outlier if x i ∈ out r (α n , F ).  Type I extreme value 1 − e −e x ln ln n e −bn Type II extreme value e −e −x ln(− ln(1 − 1/n)) e bn /(n − 1) Logistic The number of outliers D n under H 0 has the binomial distribution B(n, α n ) and the expected number of outliers in the sample under H 0 is ED n = nα n . Note that ED n → − ln(1 −ᾱ) ≈ᾱ as n → ∞. For example, ifᾱ = 0.05, then ln(1 −ᾱ) ≈ 0.05129 and for n ≥ 10 the expected number of outliers is approximately 0.051, i.e. it practically does not depend on n. So under H 0 the expected number of outliers 0.051 is negligible with respect to the sample size n.

Preliminary results
Suppose that a c.d.f. F ∈ F ls belongs also to the domain of attraction G γ , γ ≥ 0 (see [20]).
If F ∈ G 0 ∩F ls , then there exist normalizing constants a n > 0 and b n ∈ R such that lim n→∞ F n 0 (a n x+b One of possible choices of the sequences {b n } and {a n } is In the particular case of the normal distribution equivalent form a n = 1/b n can be used. Expressions of b n and a n for some most used distributions are given in Table 1.

Condition A.
Consider a model that satisfies the following conditions: a)μ andσ are consistent estimators of µ and σ; b) the limit distribution of ( Condition A c) is satisfied for many location-scale models including the normal, type I extreme value, type II extreme value, logistic, Laplace (F ∈ G 0 ), Cauchy (F ∈ G 1 ).
The following theorem is useful for right outliers detection test construction.  If F ∈ G γ ∩ F ls , γ > 0 and Conditions A hold, then the limit random vector is

Theorem 3.2
Suppose that the function f 0 is symmetric. If F ∈ G γ ∩ F ls , γ ≥ 0 and Conditions A hold, then for fixed s Suppose now that the function f 0 is not symmetric. Set For example, if type I extreme value distribution is considered, then b n = ln ln n, a n = 1 .
For the type II extreme value distribution a n , b n , a * n , b * n have the same expressions as a * n , b * n , a n , b n for the Type I extreme value distribution, respectively.

Remark 3
Similarly as in Theorem 3.1 we have that if s is fixed and F ∈ G 0 ∩ F ls , then for fixed i, i = 1, ..., s, and if F ∈ G γ ∩ F ls , γ > 0, then for fixed i, i = 1, ..., s,

Robust estimators for location-shape distributions
The choice of the estimatorsμ andσ is important when outlier detection problem is considered. The ML estimators from the complete sample are not stable when outliers exist.
In the case of location-scale families highly efficient robust estimators of the location and scale parameters µ and σ are (see [21] Expressions of K −1 0 (x) and values d for some well-known location-scale families are given in Table 2.

Right outliers identification method for location-scale families
Suppose that F ∈ G γ ∩ F ls , γ ≥ 0. Let a n , b n be defined by (5). Set

Theorem 3.3 The distribution of the statistic U + (n, s) is parameter-free for any fixed n
Denote by u + α (n, s) the α critical value of the statistic U + (n, s). Note that it is exact, not asymptotic α critical value: Our simulations showed that the below proposed outlier identification methods based on exact and approximate critical values of the statistic U + (n, s) give practically the same results, so for samples of size n ≥ 20 we recommend to approximate the α-critical level of the statistic U + (n, s) by the critical values v + α (s) which depend only on s. We shall see that for the purpose of outlier identification only the critical values v + α (5) are needed. We found that the critical values v + α (5) are: v + 0.1 (5) = 0.9677, v + 0.05 (5) = 0.9853, v + 0.01 (5) = 0.9975. Our simulations showed that the performances of the below proposed outlier identification method based on exact and approximate critical values of the statistic U + (n, 5) are similar for samples of size n ≥ 20.
We write shortly BP-method for the below considered method.
BP method for right outliers. Begin outlier search using observations corresponding to the largest values of Y i . We recommend begin with five largest. So take s = 5 and compute the values of the statistics If U + (n, 5) ≤ v + α (5), then it is concluded that outliers are absent and no further investigation is done. Under H 0 the probability of such event is approximately 1 − α.
, then it is concluded that outliers exist. Note that (see the classification scheme below) that if U + (n, 5) > v + α (5), then minimum one observation is declared as an outlier. So the probability to declare absence of outliers does not depend on the following classification scheme.
If it is concluded that outliers exist then search of outliers is done using the following steps.
. If d 1 < 5, then classification is finished at this step: d 1 observations are declared as right outliers because if the value of X (n−d1) is declared as an outlier, then it is natural to declare values of X (n) , ..., X (n−d1+1) as outliers, too.
If d 1 = 5, then it is possible that the number of outliers is higher than 5. Then the observation corresponding to i = 1 (i.e corresponding to X (n) ) is declared as an outlier and we proceed to the step 2.
Step 2. The above written procedure is repeated taking , the classification is finished and d 2 + 1 observations are declared as outliers.
If d 2 = 5, then it is possible that the number of outliers is higher than 6. Then the observation corresponding to the largestŶ (n−1) is declared as an outlier, in total 2 observations (i.e. the observations corresponding to i = 1, 2 (i.e corresponding to X (n) and X (n−1) ) are declared as outliers and we proceed to the step 3. And so on. Classification finishes at the lth step when d l < 5. So we declare (l − 1) outliers in the previous steps and d l outliers in the last one. The total number of observations declared as outliers is l − 1 + d l . These observations are values of X (n) , ..., X (n−d l −l+2) .

Left outliers identification method for location-scale families
Let a * n , b * n be the normalizing constants defined by (10).
Theorem 3.1 and Remark 3 imply that the limit distribution (as n → ∞) of the random variable U − (n, s) coincides with the distribution of the random variable V + (s). So the critical values u − α (n, s) are approximated by the critical values v − α (s) = v + α (s). The left outliers search method coincides with the right outliers search method replacing + to − in all formulas.

Outlier detection tests for location-scale families: two-sided alternative, symmetric distributions
Let a n , b n be defined by (5).
. Denote by u α (n, s) the α critical value of the statistic U (n, s). Theorem 3.1 and Remark 2 imply that the limit distribution (as n → ∞) of the random variable U (n, s) coincides with the distribution of the random variable V + (s). So the critical values u α (n, s) are approximated by the critical values v α (s) = v + α (s). The outliers search method coincides with the right outliers search method skipping upper index + in all formulas.

Outlier detection tests for location-scale families: two-sided alternative, non-symmetric distributions
Suppose now that the function f 0 is not symmetric. Let a n , b n , a * n , b * n be defined by (10).   Begin outlier search using observations corresponding to the largest and the smallest values ofŶ i . We recommend begin with five smallest and five largest. So compute the values of the statistics U − (n, 5) and U + (n, 5). If U − (n, 5) ≤ v α/2 (5) and U + (n, 5) ≤ v α/2 (5), then it is concluded that outliers are absent and no further investigation is done. (5), then it is concluded that outliers exist. If U − (n, 5) > v α/2 (5), then left outliers are searched as in Section 3.3. If U + (n, 5) > v α/2 (5), then right outliers are searched as in Section 3.2. The only difference is that α is replaced by α/2 in all formulas.

Illustrative example
To illustrate simplicity of the BP-method, let us consider an illustrative example of its application (sample size n = 20, r = 7 outliers). The sample of size n = 20 from standard normal distribution was generated. The 1st-3rd and 17th-20th observations were replaced by outliers. The observations x i , the absolute values |Ŷ i | of the z-scoresŶ i , and the ranks (i) of |Ŷ i | are presented in Table 3.
Step 1. The inequality U (16) (20) = 1.0000 > 0.9853 = v 0.05 (5) (note that U (16) (20) corresponds to the fifth largest observation in absolute value) implies that d 1 = 5. So it is possible that the number of outliers might be greater than 5. We reject the largest in absolute value 20th observation as an outlier and continue the search of outliers.
Step 2. The inequality U (15) (19) = 1.0000 > 0.9853 = v 0.05 (5) (note that U (15) (19) corresponds to the fifth largest observation in absolute value from the remaining 19 observations) implies that d 2 = 5. So it is possible that the number of outliers might be greater than 6. We declare the second largest in absolute value observation as an outlier. So two observations (19th and 20th) are declared as outliers. We continue the search of outliers.
We created R package outliersTests 2 to be able to use the proposed BP test in practice within R package.

Generalization of Davies-Gather outlier identification method
Let us consider location-scale families. Following the idea of Davies-Gather [2] define an empirical analogue of the right outlier region as a random region where g n.α is found using the condition andμ,σ are robust equivariant estimators of the parameters µ, σ.
The equation (16) is equivalent to the equation equation So g n,α is the upper α critical value of the random variableŶ (n) . It is easily computed by simulation.
Generalized Davies-Gather method for right outliers identification: ifŶ (n) ≤ g n,α , then it is concluded that right outliers are absent. The probability of such event is α. IfŶ (n) > g n,α , then it is concluded that right outliers exist. The value x i of the random variable X i is admitted as an outlier if x i ∈ OR r (α n ), i.e. if x i >μ +σg n,α . Otherwise it is admitted as a non-outlier.
An empirical analogue of the left outlier region as a random region where h n.1−α is found using the condition The equation (18) is equivalent to the equation equation So h n,α is the upper 1 − α critical value of the random variableŶ (1) . It is easily computed by simulation. (1) ≥ h n,1−α , then it is concluded that left outliers are absent. The probability of such event is α. IfŶ (1) < h n,α , then it is concluded that left outliers exist. The value x i of the random variable X i is admitted as an outlier if x i ∈ OR l (α n ), i.e. if x i <μ +σh n,1−α . Otherwise it is admitted as a non-outlier.

Generalized Davies-Gather method for left outliers identification: ifŶ
Let us consider two-sided case.
If the distribution of X i is symmetric, then the empirical analogue of the outlier region is the random region In this case

Generalized Davies-Gather method for left and right outliers identification (symmetric distributions):
if |Ŷ | (n) ≤ g n.α/2 , then it is concluded that outliers are absent. The probability of such event is α. If |Ŷ | (n) > g n.α/2 , then it is concluded that outliers exist. The value x i of the random variable X i is admitted as a left outlier if x i <μ −σg n,α/2 , it is admitted as a right outlier if x i >μ +σg n,α/2 . Otherwise it is admitted as a non-outlier.
If distribution of X i is non-symmetric, then the empirical analogue of the outlier region is defined as follows:

Generalized Davies-Gather method for left and right outliers identification (non-symmetric distributions):
if Y (1) ≥ h n,1−α/2 andŶ (n) ≤ g n,α/2 , then it is concluded that outliers are absent. The probability of such event is α. IfŶ (1) < h n,1−α/2 orŶ (n) > g n,α/2 , then it is concluded that outliers exist. The value x i of the random variable X i is admitted as a left outlier if x i <μ +σh n,1−α/2 , it is admitted as a right outlier if x i >μ +σg n,α/2 . Otherwise it is admitted as a non-outlier.

Rosner's method
Let us formulate Rosner's method in the form mostly used in practice. Suppose that the number of outliers does not exceed s and the two-sided alternative is considered. Set (see [5,22]) |Ỹ j | = |(X j −X)/S X | may be interpreted as a distance between X j andX. Remove the observation X j1 which is most distant fromX. This maximal distance is R 1 . The value of X j1 is a possible candidate for contaminant.
Recompute the statistic using n − 1 remaining observations and denote by R 2 the obtained statistic. Remove the observation X j2 which is most distant from the new empirical mean. The value of X j2 is also possible candidate for contaminant. Repeat the procedure until the statistics R 1 , · · · , R s are computed. So we obtain all possible candidates for contaminants. They are values of X j1 , . . . , X js Fix α and find λ in such that If n > 25, then the approximations are recommended (see [5]); here t p (ν) is the p critical value of the Student distribution with ν degrees of freedom.

Rosner's method for left and right outliers identification:
if R i λ in for all i = 1, · · · , s, then it is concluded that outliers are absent. If there exists i 0 ∈ {1, . . . , s} such that R i0 > λ i0n , i.e. the event ∪ s i=1 {R i > λ in } occurs, then it is concluded that outliers exist. In this case classification of observations to outliers and non-outliers is done in the following way: if R s > λ sn , then it is concluded that there are s outliers and they are values of X j1 , . . . , X js . If R j λ jn for j = s, s − 1, . . . , i + 1, and R i > λ in , then it is concluded that there are i outliers and they are values of X j1 , . . . , X ji .
If right outliers are searched, then define R + 1 = max 1 i nỸi , and repeat the above procedure taking approximations Denote by R s the Rosner's test with a fixed upper limit s. Our simulation results confirm that the true significance level is different from the level α suggested by the approximation when n is not large. Nevertheless, it is approaching α as n increases, see Figure 1. The true significance value of the BP test, which uses asymptotic values of the test statistic are also presented in Figure 1.

Bolshev's method
Suppose that the number of contaminants does not exceed s. For i = 1, · · · , n set Set τ + = min 1 i s τ + (i) /i.
In the case of left and right outliers search Bolshev's method uses τ (i) instead of τ + (i) , defining the statistic τ = min 1 i s τ (i) /i.

Bolshev's method for left and right outliers search.
If τ ≥ τ 1−α (n, s), then it is concluded that outliers are absent; here τ 1−α (n, s) is the 1 − α critical value of the statistic τ under H 0 . If τ < τ 1−α (n, s), then it is concluded that outliers exist. In such a case they are selected in the following way: if τ i /i < τ 1−α (n, s) then the observation corresponding to τ i is admitted as an outlier, i = 1, ..., s.

Hawking's method
Suppose that the number of contaminants does not exceed s. Let us consider the search for right outliers.
Hawking's method. If B + ≤ B + α (n, s) then it is concluded that outliers are absent; here B + α (n, s) is the α critical value of the statistic under H 0 . If B + > B + α (n, s), then it is concluded that outliers exist. In such a case outliers are selected in the following way: if b + i > B + α (n, s), then the value of the order statistic X (n−i+1) is admitted as an outlier, i = 1, ..., s.

Comparative analysis of outlier identification methods by simulation
In the case of location-scale classes probability distribution of all considered test statistics does not depend on µ and σ, so we generated samples of various sizes n with n − r observations having the c.d.f. F 0 and r observations having various alternative distributions concentrated in the outlier region. We shall call such observations "contaminant outliers", shortly c−outliers. As was mentioned, outliers which are not c-outliers, i.e. outliers from regular observations with the c.d.f. F 0 , are very rare.
We repeated simulations M = 100000 times and using various methods we classified observations to outliers and non-outliers and computed the mean number D OcO of correctly identified c-outliers, the mean number D ON of c-outliers which were not identified, the mean number D N O of non c-outliers admitted as outliers, and the mean number D N N of non c-outliers admitted as non-outliers.
An outlier identification method is ideal if each outlier is detected and each non-outlier is declared as a non-outlier. In practice it is impossible to do with the probability one. Two errors are possible: a) an outlier is not declared as such (masking effect); b) a non-outlier is declared as an outlier (swamping effect). We shall write shortly "masking value" for the mean number of non-detected c-outliers and "swamping value" for the mean number of "normal" observations declared as outliers in the simulated samples.
If swamping is small for two tests then a test with smaller masking effect should be preferred because in this case the distribution of the data remaining after excluding of suspected outliers should be closer to the distribution of non-outlier data.
From the other side, if swamping for Method 1 is considerably bigger than swamping of Method 2 and masking is smaller for Method 1, then it does not mean that Method 1 is better because this method rejects many extreme non-outliers from the tails of the regular distribution F 0 and the sample remaining after classification may be not treated as a sample from this regular distribution even if all c-outliers are eliminated.
For various families of distributions, sample sizes n, and alternatives we compared Davies-Gather (DG) and new (BP) methods performance. In the case of normal distribution we also compared them with Rosner's, Bolshev's and Hawking's methods.
We used two different classes of alternatives: in the first case c-outliers are spread widely in the outlier region around the mean, in the second case c-outliers are concentrated in a very short interval laying in the outlier region. More precisely, if right outliers were searched, then we simulated r observations concentrated in in the right outlier region out r (α n , F 0 ) = {x : x > x αn } using the following alternative families of distribution: 1) Two parameter exponential distribution E(θ, x αn ) with the scale parameter θ. If θ is small, then outliers are concentrated near the border of the outlier region. If θ is large then outliers are widely spread in the outlier region. If θ increases, then the mean of outlier distribution increases. Note that even if θ is very near 0 and the true number of outliers r is large, these outliers may corrupt strongly the data making tails of histogram two heavy.
2) Truncated normal distribution T N (x αn , µ, ρ) with the location and scale parameters µ, ρ (µ > x αn ). If ρ is small then this distribution is concentrated in a small interval around µ. If µ increases, then the mean of outlier distribution increases.
For lack of place we present a small part of our investigations. Note that the results are very similar for all sample sizes n ≥ 20. Multiple outlier problem is not very relevant for smaller sample sizes.

Investigation of outlier identification methods for normal data
We use notation B, H, R, DG, and BP for the Bolshev's, Hawking's, Rosner's, Davies-Gather's, and the new methods, respectively. If DG method is based on maximum likelihood estimators, then we write DG ml method, if it is based on robust estimators, we write DH rob method.
For comparison of above considered methods we fixed the significance level α = 0.05. We remind that the significance level α is the probability to reject minimum one observation as an outlier under the hypothesis H 0 which means that all observations are realizations of i.i.d. having the same normal distribution. The only test, namely R method uses approximate critical values of the test statistic, so the significance values for this test is only approximately 0.05 and depends on s and n. In Figure 1 the true significance level value for s = 5, 15 and [0.4n] in function of n are given.
The B, H, and R tests methods have a drawback that the upper bound for the possible number of outliers s must be fixed. The BP and DG tests have an advantage that they do not require it.
Our investigations showed that H,B and DG ml methods have other serious drawbacks. So firstly let us look closer at these methods. If the true number of c-outliers r exceeds s, then the B and H methods can not find them even if they are very far from the limits of the outlier region. Nevertheless, suppose that r does not exceed s and look at the performance of the H method. Set n = 100, s = 5, and suppose that c-outliers are generated by right-truncated normal distribution T N (x αn , µ, ρ) with fixed ρ and increasing µ. Note that the true number of c-outliers is supposed to be unknown but do not exceed s = 5. In Figure 2 the mean numbers of rejected non-c-outliers D N O are given in function of the parameter µ (the value of the parameter ρ = 0.1 2 is fixed) for fixed values of r see Figure 2. In Table 5 the values of D N O plus the values of the mean numbers of truly rejected c-outliers are given. The Table 5 shows that if r = 1, then if µ is sufficiently large, the c-outlier is found but the number of rejected non-c-outliers D N O increases to 4, so swamping is very large. Similarly, if r = 2, then D N O increases to 3, so swamping is large. Beginning from r = 3 not all c-outliers are found even for large µ. Swamping is smallest if the true value r coincides with s but even in this case one c-outlier is not found even for large µ. Taking into account that the true number r of c-outliers is not known in real data, The above analysis shows that the B, H, DG ml methods have serious drawbacks, so we exclude these methods from further consideration.
Let us consider the remaining three methods: R, DG, and BP. For small n the true significance level of Rosner's test differ considerably from the suggested, so we present comparisons of tests performance for n = 50, 100, 1000 (see Tables 6-7). Truncated exponential distribution was used for outliers simulation.
Remoteness of the mean of outliers from the border of the outlier region is characterized by the parameter θ.    All three methods find all c-outliers if they are sufficiently remote. For n = 50 the BP method gives uniformly smallest masking values and the DG method gives uniformly largest masking values for any considered r in all diapason of alternatives. For n = 100 and r = 2, 5 the result is the same. For n = 100 and r = 10 (it means that even for very small θ the data is seriously corrupted) the BP method is also the best except that for the most remote alternatives the Rosner [0.4n] method slightly outperforms the BP method. For n = 1000 and the most of alternatives the BP method strongly outperforms other methods, except the most remote alternatives.

Swamping values
The DG and Rosner's methods have very large masking if many outliers are concentrated near the outlier region border. In this case data is seriously corrupted, however, these methods do not see outliers.
Conclusion: in most considered situations the BP method is the best outlier identification method. The second is Rosner's method with s = [0.4], and the third is the Davies-Gather method based on robust estimation. Other methods have poor performance.

Investigation of outlier identification methods for other location-scale models
We investigated performance of the new method for location-scale families different from normal. We compare the BP method with the generalized Davies-Gather method for logistic, Laplace (symmetric, F ∈ G 0 ∩ F ls ), extreme values (non-symmetric F ∈ G 0 ∩ F ls ), and Cauchy (symmetric, F ∈ G 1 ∩ F ls ) families. C-outliers were generating using truncated exponential distribution concentrated in two-sided outlier region. Swamping values being small, masking value, see Table 8 and differences between the true number of c-outliers and the number of rejected observations, see Figures 4-5, were compared. The BP and DG rob methods find very well the most remote outliers, meanwhile, the BP method identifies much better closer outliers. The DG rob method identifies badly multiple outliers concentrated near the border of the outlier region, whereas the BP method does well. The DG M L is not appropriate for multiple outlier search.

Conclusion
In many situations, the proposed outlier identification method has superior performance as compared to existing methods. The method is based on an asymptotic result, so it should be not applied for samples of very small size n ≤ 15.
The formulated approach could be extended and used for probability distribution families different from location-scale and shape-scale.
The R package outliersTests was created for the practical usage of proposed test.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or non-profit sectors.