
Omnibus test for normality based on the Edgeworth expansion

  • Agnieszka Wyłomańska ,

    Contributed equally to this work with: Agnieszka Wyłomańska, D. Robert Iskander, Krzysztof Burnecki

    Roles Conceptualization, Formal analysis, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    agnieszka.wylomanska@pwr.edu.pl

    Affiliation Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wroclaw University of Technology, Wroclaw, Poland

  • D. Robert Iskander ,

    Contributed equally to this work with: Agnieszka Wyłomańska, D. Robert Iskander, Krzysztof Burnecki

    Roles Conceptualization, Formal analysis, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw, Poland

  • Krzysztof Burnecki

    Contributed equally to this work with: Agnieszka Wyłomańska, D. Robert Iskander, Krzysztof Burnecki

    Roles Conceptualization, Formal analysis, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wroclaw University of Technology, Wroclaw, Poland

Abstract

Statistical inference in the form of hypothesis tests and confidence intervals often assumes that the underlying distribution is normal. Similarly, many signal processing techniques rely on the assumption that a stationary time series is normal. As a result, a number of tests have been proposed in the literature for detecting departures from normality. In this article we develop a novel approach to the problem of testing normality by constructing a statistical test based on the Edgeworth expansion, which approximates a probability distribution in terms of its cumulants. By modifying one term of the expansion, we define a test statistic which includes information on the first four moments. We perform a comparison of the proposed test with existing tests for normality by analyzing different platykurtic and leptokurtic distributions, including the generalized Gaussian, mixed Gaussian, α-stable and Student’s t distributions. We show that, for some of the considered sample sizes, the proposed test is superior in terms of power for the platykurtic distributions, whereas for the leptokurtic ones it is close to the best tests, such as those of D’Agostino-Pearson, Jarque-Bera and Shapiro-Wilk. Finally, we study two real data examples which illustrate the efficacy of the proposed test.

Introduction

Testing the hypothesis of normality is one of the fundamental procedures of statistical analysis. There is a large number of normality tests. Some of them, such as the χ2 goodness-of-fit test [1] with its variants, the Kolmogorov-Smirnov (KS) one-sample cumulative probability test [2], the Shapiro-Wilk (SW) test [3], the D’Agostino-Pearson (DP) test [4] and the Jarque-Bera (JB) test [5], are nowadays considered classical. These tests are based on comparing the distribution of the observed data to the expected distribution (χ2), on measuring the distance between the empirical and analytical distribution functions (KS), on taking into account some transformations of moments of the data, such as skewness and kurtosis (DP and JB), or on calculating some function of the order statistics (SW). Other tests based on the empirical distribution function, less widespread, include the Kuiper test [6], the Watson test [7], the Cramer-von Mises (CvM) test [8] and the Anderson-Darling (AD) test [9]. Among other testing techniques, let us mention ideas based on the empirical characteristic function [10], on the dependence between moments that characterizes normal distributions [11], or on Noughabi’s entropy estimator [12].

The Edgeworth series can be used to expand an arbitrary probability distribution in terms of its cumulants. So far, the Edgeworth expansion has been utilized to design a score test for normality of errors in a regression model [13], design a normality test for the probit model [14], and to design a normality test against a specific alternative, such as the logistic distribution [15].

There have also been attempts in the literature to provide a one-sample statistical test of normality for data in a broader setting, such as a general Hilbert space [16]. Despite this variety, there have been continuing efforts to develop tests for the departure of a random sample from normality that could be considered omnibus [17, 18, 19, 20, 21, 22], that is, able to reject the null hypothesis of normality with high power for a wide range of alternatives.

Many of the normality tests consider evaluating the third and fourth order moments and, hence, the power of such tests depends on whether a symmetric or skewed alternative is being considered. In general, it is expected that symmetric alternatives, or those with a small amount of skew, are more difficult to differentiate from the null hypothesis of normality than alternatives characterized by a large skew [20]. The fourth order moment, most commonly used in the form of kurtosis, has a less obvious effect on the performance of a normality test, but here it is also important whether the distribution is leptokurtic or platykurtic [23, 24], that is, whether its kurtosis is larger or smaller than that of the normal distribution, respectively.
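The distinction between the two classes can be illustrated with the sample excess kurtosis. The following Python sketch (illustrative only, not part of the original analysis; `excess_kurtosis` is a hypothetical helper name) recovers the platykurtic character of the uniform distribution (theoretical excess kurtosis −1.2) and the leptokurtic character of the Laplace distribution (theoretical excess kurtosis 3):

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: E(X - mu)^4 / Var(X)^2 - 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2)**2 - 3

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)   # platykurtic alternative
l = rng.laplace(size=100_000)   # leptokurtic alternative

print(excess_kurtosis(u))  # close to the theoretical -1.2
print(excess_kurtosis(l))  # close to the theoretical 3
```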

There are many signal processing applications in which the underlying distribution of the data is leptokurtic [25, 26], while phenomena that can be modeled with a platykurtic distribution, with the exception of the uniform distribution, are less common [27]. Consequently, normality tests dedicated to platykurtic alternatives are scarce. Nevertheless, some effort has been made to improve the performance of normality tests across the range of symmetric platykurtic alternatives [28].

The aim of this work is to develop a novel test for normality of omnibus character that could outperform the classical tests for the case of platykurtic symmetric alternatives.

The paper is structured as follows. First, we derive a test statistic based on the second term of the Edgeworth expansion, which incorporates information on both the skewness and kurtosis. In the next part we establish the main results: we formally construct a statistical test for normality and provide information on the critical values of the test. Next, we analyze the power of the test by Monte Carlo simulations. We take into account four symmetric distributions which can be very close to the normal law, namely the generalized Gaussian, mixed Gaussian, α-stable and Student’s t-distributions. The former two serve as examples of platykurtic distributions, whereas the latter two are classical leptokurtic probability laws. We compare the results with the power of the classical normality tests. Finally, we study two real data examples taken from a collection of over 1300 datasets that were originally distributed alongside the statistical software environment R and some of its add-on packages. The examples illustrate the efficacy of the proposed test. The findings of the paper are summarized in the last section.

Derivation of the new test statistic based on Edgeworth expansion

Let X1, X2, …, XN be a random sample from a distribution with finite mean μ = E(X1) and variance σ2 = Var(X1) > 0. We define the arithmetic mean and the standardized mean by
(1)
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad T_n = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}, \qquad n = 1, 2, \ldots, N.$

The Edgeworth expansion is a series that approximates a probability distribution in terms of its cumulants [29]. For random variables Xi, i = 1, 2, …, N, with finite kth moment it has the following form
(2)
$P(T_n \le y) = \Phi(y) + \sum_{i=1}^{k} H_i(y) + o\!\left(n^{-k/2}\right),$
where
(3)
$H_i(y) = -\frac{P_i(y)\,\phi(y)}{n^{i/2}}.$

In Eqs (2) and (3), Φ(y) and ϕ(y) stand for the cumulative distribution function (CDF) and the probability density function (PDF) of the standard normal distribution, respectively, and Pi(y) is an appropriate Hermite polynomial of degree 3i − 1 [29]. The coefficients in Pi(⋅) are expressed in terms of appropriate moments of the random variable X1. For instance, the first two polynomials have the following form
(4)
$P_1(y) = \frac{\tau}{6}\left(y^2 - 1\right),$
(5)
$P_2(y) = \frac{\kappa}{24}\left(y^3 - 3y\right) + \frac{\tau^2}{72}\left(y^5 - 10y^3 + 15y\right),$
where $\tau = E(X_1 - \mu)^3\,(\operatorname{Var} X_1)^{-3/2}$ and $\kappa = E(X_1 - \mu)^4\,(\operatorname{Var} X_1)^{-2} - 3$ are the skewness and excess kurtosis (in the figures of this paper also called, in short, kurtosis) of the random variable X1.
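For concreteness, the two polynomials can be coded directly (an illustrative Python sketch; `P1` and `P2` are hypothetical helper names, and the polynomial forms are the standard Edgeworth ones consistent with the stated degrees 2 and 5):

```python
def P1(y, tau):
    """First Edgeworth polynomial (degree 2): tau * (y^2 - 1) / 6."""
    return tau * (y**2 - 1) / 6

def P2(y, tau, kappa):
    """Second Edgeworth polynomial (degree 5), combining excess kurtosis
    and squared skewness contributions."""
    return kappa * (y**3 - 3*y) / 24 + tau**2 * (y**5 - 10*y**3 + 15*y) / 72

# For a normal sample tau = kappa = 0, so both corrections vanish identically.
print(P1(1.5, 0.0), P2(1.5, 0.0, 0.0))  # 0.0 0.0
```

Note that P2 carries both the skewness (via τ2) and the excess kurtosis, which is why the second-order correction is informative about the first four moments.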

Let us now concentrate on the statistic Tn for n = 2 (see Eq (1)), which depends only on the two random variables X1 and X2. Such a choice of n allows one to obtain many realizations of the statistic from one sufficiently long trajectory by dividing the data into blocks of length 2. The choice of n = 2 appears to be optimal; to support this claim, in the simulation study we present a comparison between the results obtained for n = 2 and n = 3. By Eq (2) the CDF of the T2 statistic can be approximated by
(6)
$P(T_2 \le y) \approx \Phi(y) + H_1(y) + H_2(y).$
Also, ((n + 1)/n) T2 has a Student’s t-distribution with 1 degree of freedom when X1 follows the normal distribution. For practical reasons we assume k = 2. The functions Hi(y), i = 1, 2, contain the information about the deviation of the T2 distribution from the standard normal law. If the distribution underlying the random sample is close to normal, we expect the deviations (hence the functions) to be smaller than those for non-normal distributions. They can also be called corrections to the normal distribution, since the CDF of T2 is approximated by the CDF of the standard normal distribution corrected by these functions.

Let us now define two statistics as the maxima of the functions H1(y) and H2(y), respectively, calculated over the y values at which the empirical CDF of T2 changes, hence over the T2 values; they are given by Eqs (7) and (8). This is similar to calculating the Kolmogorov-Smirnov statistic. The arrangement of the random sample into consecutive non-overlapping blocks of length two is arbitrary, and other ways of arranging the sample into blocks of length two are permitted. The resulting form of the statistics is given in Eq (9).

To ascertain whether these statistics are sensitive to deviations from normality, we perform Monte Carlo simulations for two non-normal distributions described in more detail in the Appendix: the generalized Gaussian (GG) distribution with parameters μ = 1, β = 0.2 and ρ = 2.2, corresponding to a platykurtic distribution, and the Student’s t-distribution with ν = 16 degrees of freedom, corresponding to a leptokurtic distribution. For each of M simulated samples x1, x2, ⋯, xN of size N we calculate ⌊N/2⌋ values of the T2 statistic and evaluate the maximum of Hi(y), i = 1, 2, over all values of the standardized means T2 obtained for a given sample according to Eq (9). As a result, we obtain M realizations of each of the two statistics. In Fig 1 we present a comparison of the empirical PDFs of the two statistics for the two considered distributions with the corresponding empirical PDFs obtained for the standard normal distribution. The empirical PDFs were constructed as kernel density estimators [30]. In the simulations we considered N = 1000 and M = 5000.

Fig 1. Empirical PDFs of the statistics based on H1 and H2 for the Student’s t-distribution with ν = 16 degrees of freedom (left panels) and the generalized Gaussian (GG) distribution with μ = 1, β = 0.2, ρ = 2.2 (right panels), together with the corresponding empirical PDFs obtained for the standard normal distribution.

https://doi.org/10.1371/journal.pone.0233901.g001

We can observe that the statistic based on H1(y) is less sensitive to deviations from normality than the statistic based on H2(y). This effect is visible for both analyzed non-normal distributions. Therefore, we propose a test for normality based on the statistic built on H2(y), given by formula (10).

Testing for normality

Construction of the test

In our statistical test the null hypothesis (H0) is that the data come from a normal distribution with an unknown mean and variance. The alternative hypothesis (H1) is that the data set does not come from such a distribution. The test statistic is given by formula (10).

As explained in the previous section, for sample data x1, x2, ⋯, xN we first calculate the averages of every two consecutive observations (without overlapping) and the corresponding T2 values. In consequence, from a sample of size N we obtain ⌊N/2⌋ values of the T2 statistic. Finally, we calculate the test statistic according to Eq (10) by taking the maximum of the function defined in Eq (8) over those values. It is worth mentioning that in the formulas for T2 and H2 we replace the μ, τ and κ coefficients by the sample mean, skewness and excess kurtosis, respectively, calculated for the whole series x1, x2, ⋯, xN.
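The procedure above can be sketched in Python (the authors’ implementation was in MATLAB; this is an illustrative reimplementation under stated assumptions, since the exact forms of Eqs (8) and (10) are not reproduced in the text: the pair means are standardized as √2(x̄ − μ̂)/σ̂, the correction is H2(y) = −P2(y)ϕ(y)/2, and the statistic is taken as the maximum of |H2| over the T2 values; `edgeworth_stat` is a hypothetical helper name):

```python
import numpy as np

def edgeworth_stat(x):
    """Sketch of the proposed statistic: max of |H2| over the T2 values.

    Assumed forms (Eqs (8) and (10) are not shown in the extracted text):
    T2 = sqrt(2) * (pair mean - sample mean) / sample std,
    H2(y) = -P2(y) * phi(y) / 2, P2 being the second Edgeworth polynomial.
    """
    x = np.asarray(x, dtype=float)
    mu, sig = x.mean(), x.std()            # moments from the whole series
    d = (x - mu) / sig
    tau = np.mean(d**3)                    # sample skewness
    kappa = np.mean(d**4) - 3              # sample excess kurtosis
    m = len(x) // 2
    pairs = x[:2 * m].reshape(m, 2)        # non-overlapping blocks of length 2
    t2 = np.sqrt(2) * (pairs.mean(axis=1) - mu) / sig
    phi = np.exp(-t2**2 / 2) / np.sqrt(2 * np.pi)
    p2 = kappa * (t2**3 - 3*t2) / 24 + tau**2 * (t2**5 - 10*t2**3 + 15*t2) / 72
    h2 = -p2 * phi / 2                     # n^{-i/2} factor with i = 2, n = 2
    return np.max(np.abs(h2))

rng = np.random.default_rng(1)
print(edgeworth_stat(rng.standard_normal(1000)))  # small value under normality
```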

We reject the H0 hypothesis if the test statistic is extreme, that is, either larger than the upper critical value or smaller than the lower critical value at a given significance level c. The testing procedure is summarized in Schema 1 (Fig 2).

Fig 2. Schema 1.

Schematic algorithm of the testing procedure.

https://doi.org/10.1371/journal.pone.0233901.g002

In order to construct the critical region we advocate the use of Monte Carlo simulations. We simulate M trajectories of size N of independent identically distributed (i.i.d.) random variables from the standard normal distribution. As a result we obtain a matrix of size M × N. For each trajectory we calculate the value of the test statistic. The critical region is defined as
(11)
$C = (-\infty, Q_1) \cup (Q_2, +\infty),$
where Q1 and Q2 are the empirical quantiles of order c/2 and 1 − c/2, respectively, calculated from the M values of the statistic. Q1 and Q2 are called the lower and upper critical values.

The critical values for five different sample data sizes (N = 20, 50, 100, 200, 1000) based on M = 5000 Monte Carlo simulations for two selected significance levels c = 5% and c = 1% are presented in Table 1 in the Appendix.
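The Monte Carlo construction of the critical values can be sketched as follows (illustrative Python; `edgeworth_stat` is a simplified placeholder for the statistic of Eq (10), based on the assumed H2-correction form described earlier, not the authors’ exact MATLAB implementation):

```python
import numpy as np

def edgeworth_stat(x):
    # Simplified placeholder for the statistic of Eq (10):
    # max |H2| over the standardized non-overlapping pair means.
    x = np.asarray(x, dtype=float)
    mu, sig = x.mean(), x.std()
    d = (x - mu) / sig
    tau, kappa = np.mean(d**3), np.mean(d**4) - 3
    t2 = np.sqrt(2) * (x[:len(x)//2*2].reshape(-1, 2).mean(axis=1) - mu) / sig
    phi = np.exp(-t2**2 / 2) / np.sqrt(2 * np.pi)
    p2 = kappa * (t2**3 - 3*t2) / 24 + tau**2 * (t2**5 - 10*t2**3 + 15*t2) / 72
    return np.max(np.abs(-p2 * phi / 2))

rng = np.random.default_rng(0)
M, N, c = 2000, 100, 0.05   # smaller M than the paper's 5000, for speed

# M trajectories of i.i.d. standard normal variables -> M statistic values
stats = np.array([edgeworth_stat(rng.standard_normal(N)) for _ in range(M)])

# empirical quantiles of order c/2 and 1 - c/2 give the critical values
Q1, Q2 = np.quantile(stats, [c/2, 1 - c/2])
print(Q1, Q2)
```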

Table 1. The lower and upper critical values Q1 and Q2 for sample sizes N = 20, 50, 100, 200 and 1000 and two exemplary significance levels c: 0.05 and 0.01.

The critical values are calculated based on the 5000 Monte Carlo simulations of standard normal distributed samples.

https://doi.org/10.1371/journal.pone.0233901.t001

Power simulation study

The power of the test is the probability of rejecting the null hypothesis H0 when the alternative H1 is true, and it is an important characteristic of any statistical test. In our case the power is the probability that the test statistic falls outside the interval [Q1, Q2] (Eq (12)), where Q1 and Q2 are the lower and upper critical values, respectively. In our study, for all considered cases we calculate the power of the test by Monte Carlo simulations. More precisely, for a given sample size N we simulate M independent trajectories from the considered distribution. For each trajectory the value of the test statistic is calculated and we check whether it falls into the critical region constructed for a given significance level c. The power of the test is evaluated as the fraction of trajectories for which the value of the statistic is larger than the upper critical value Q2 or smaller than the lower critical value Q1.
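This power evaluation can be sketched as follows (illustrative Python with the same simplified placeholder statistic as before; the uniform distribution stands in as an example platykurtic alternative and is not one of the alternatives studied in the paper):

```python
import numpy as np

def edgeworth_stat(x):
    # Simplified placeholder for the statistic of Eq (10), as sketched earlier.
    x = np.asarray(x, dtype=float)
    mu, sig = x.mean(), x.std()
    d = (x - mu) / sig
    tau, kappa = np.mean(d**3), np.mean(d**4) - 3
    t2 = np.sqrt(2) * (x[:len(x)//2*2].reshape(-1, 2).mean(axis=1) - mu) / sig
    phi = np.exp(-t2**2 / 2) / np.sqrt(2 * np.pi)
    p2 = kappa * (t2**3 - 3*t2) / 24 + tau**2 * (t2**5 - 10*t2**3 + 15*t2) / 72
    return np.max(np.abs(-p2 * phi / 2))

rng = np.random.default_rng(0)
M, N, c = 1000, 200, 0.05

# Step 1: critical values under H0 (standard normal samples).
null_stats = np.array([edgeworth_stat(rng.standard_normal(N)) for _ in range(M)])
Q1, Q2 = np.quantile(null_stats, [c/2, 1 - c/2])

# Step 2: fraction of rejections under the alternative = estimated power.
alt_stats = np.array([edgeworth_stat(rng.uniform(size=N)) for _ in range(M)])
power = np.mean((alt_stats < Q1) | (alt_stats > Q2))
print(power)
```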

In the following, we perform a simulation study for four selected distributions: two belonging to the platykurtic class of distributions and two to the leptokurtic class. In the first group we choose the generalized Gaussian and the mixed Gaussian distributions, for which the JB test was found to perform poorly [24]. In the second group, the α-stable and Student’s t distributions are considered. In order to show the effectiveness of the proposed test, for each considered distribution we examine five values of the sample size, namely N = 20, N = 50, N = 100, N = 200 and N = 1000. We assume the significance level c = 0.05. For the implementation of the test and the simulation study we used MATLAB R2019a. Simulations were performed on an Intel(R) Core(TM) i7-7500U CPU @ 2.7 GHz. In this section we graphically illustrate the results for N = 50, 100, 200, 1000. The powers for all considered sizes and the graphical comparison of the N = 20 and N = 50 cases are presented in the Appendix, see Tables 2–21 and Figs 3–6.

Fig 3. Power of the introduced test and standard tests for normality for different sample sizes, N = 50, N = 100, N = 200 and N = 1000, for the generalized Gaussian distribution with respect to the excess kurtosis.

Powers were calculated on the basis of 5000 simulations. The significance level is equal to 5%.

https://doi.org/10.1371/journal.pone.0233901.g003

Fig 4. Power of the introduced test and standard tests for normality for different sample sizes, N = 50, N = 100, N = 200 and N = 1000, for the mixed Gaussian distribution with respect to the excess kurtosis.

Powers were calculated on the basis of 5000 simulations. The significance level is equal to 5%.

https://doi.org/10.1371/journal.pone.0233901.g004

Fig 5. Power of the introduced test and standard tests for normality for different sample sizes, N = 50, N = 100, N = 200 and N = 1000, for the α-stable distribution with respect to the stability index α.

Powers were calculated on the basis of 5000 simulations. The significance level is equal to 5%.

https://doi.org/10.1371/journal.pone.0233901.g005

Fig 6. Power of the introduced test and standard tests for normality for different sample sizes, N = 50, N = 100, N = 200 and N = 1000, for the Student’s t-distribution with respect to the number of degrees of freedom.

Powers were calculated on the basis of 5000 simulations. The significance level is equal to 5%.

https://doi.org/10.1371/journal.pone.0233901.g006

In Fig 7 we study the power of the proposed test for normality for the generalized Gaussian distribution. For this distribution we assume that μ = 1 and β = 0.2, and analyze the power by changing the ρ parameter. As described in the Appendix, the ρ parameter controls whether the generalized Gaussian distribution belongs to the leptokurtic or platykurtic class of distributions. Since there is a one-to-one correspondence between the ρ parameter and the excess kurtosis (see formula (14)), we present the power of the test with respect to the excess kurtosis. It is compared with the power of the most common tests for normality, namely the JB, DP, SW, KS, Kuiper, Watson, CvM, AD and χ2 tests, as well as the test of Zoubir and Arnold [22], based on the empirical characteristic function (CF), which was shown to perform well for smaller sample sizes. We can see that the proposed test is clearly superior to the other tests for N ≥ 200 and that for N = 100 it shares this superiority with the DP test, which, on the other hand, performs best among the considered tests for sample size N = 50 and very small kurtosis values. We can also observe that the least performing tests in this study are the KS test, the χ2 test, and (surprisingly) the JB test, especially for short samples.

Fig 7. The oil investment (Data1) and differentiated real earnings (Data2) datasets.

https://doi.org/10.1371/journal.pone.0233901.g007

Table 2. Comparison of the powers of the tests for normality for GG distribution and N = 20.

The following tests are taken under consideration: the new test proposed in this paper, Jarque-Bera (JB) test [5], D’Agostino-Pearson (DP) test [4], Shapiro-Wilk (SW) test [3], test based on the empirical characteristic function (CF) [22], Kolmogorov-Smirnov (KS) test [2], Kuiper test [6], Watson test [7], Cramer-von Mises (CvM) test [8], Anderson-Darling (AD) test [9] and χ2 goodness-of-fit test [1].

https://doi.org/10.1371/journal.pone.0233901.t002

Table 3. Comparison of the powers of the tests for normality for GG distribution and N = 50.

https://doi.org/10.1371/journal.pone.0233901.t003

Table 4. Comparison of the powers of the tests for normality for GG distribution and N = 100.

https://doi.org/10.1371/journal.pone.0233901.t004

Table 5. Comparison of the powers of the tests for normality for GG distribution and N = 200.

https://doi.org/10.1371/journal.pone.0233901.t005

Table 6. Comparison of the powers of the tests for normality for GG distribution and N = 1000.

https://doi.org/10.1371/journal.pone.0233901.t006

Table 7. Comparison of the powers of the tests for normality for MG distribution and N = 20.

https://doi.org/10.1371/journal.pone.0233901.t007

Table 8. Comparison of the powers of the tests for normality for MG distribution and N = 50.

https://doi.org/10.1371/journal.pone.0233901.t008

Table 9. Comparison of the powers of the tests for normality for MG distribution and N = 100.

https://doi.org/10.1371/journal.pone.0233901.t009

Table 10. Comparison of the powers of the tests for normality for MG distribution and N = 200.

https://doi.org/10.1371/journal.pone.0233901.t010

Table 11. Comparison of the powers of the tests for normality for MG distribution and N = 1000.

https://doi.org/10.1371/journal.pone.0233901.t011

Table 12. Comparison of the powers of the tests for normality for α-stable distribution and N = 20.

https://doi.org/10.1371/journal.pone.0233901.t012

Table 13. Comparison of the powers of the tests for normality for α-stable distribution and N = 50.

https://doi.org/10.1371/journal.pone.0233901.t013

Table 14. Comparison of the powers of the tests for normality for α-stable distribution and N = 100.

https://doi.org/10.1371/journal.pone.0233901.t014

Table 15. Comparison of the powers of the tests for normality for α-stable distribution and N = 200.

https://doi.org/10.1371/journal.pone.0233901.t015

Table 16. Comparison of the powers of the tests for normality for α-stable distribution and N = 1000.

https://doi.org/10.1371/journal.pone.0233901.t016

Table 17. Comparison of the powers of the tests for normality for Student’s t distribution and N = 20.

https://doi.org/10.1371/journal.pone.0233901.t017

Table 18. Comparison of the powers of the tests for normality for Student’s t distribution and N = 50.

https://doi.org/10.1371/journal.pone.0233901.t018

Table 19. Comparison of the powers of the tests for normality for Student’s t distribution and N = 100.

https://doi.org/10.1371/journal.pone.0233901.t019

Table 20. Comparison of the powers of the tests for normality for Student’s t distribution and N = 200.

https://doi.org/10.1371/journal.pone.0233901.t020

Table 21. Comparison of the powers of the tests for normality for Student’s t distribution and N = 1000.

https://doi.org/10.1371/journal.pone.0233901.t021

The second considered distribution is the mixed Gaussian, which is also a member of the platykurtic class. Here we assume μ1 = 1, σ1 = 1 and σ2 = 1 and analyze the power of the test by changing the μ2 parameter. As explained in the Appendix, there is a one-to-one correspondence between the μ2 parameter and the excess kurtosis of the mixed Gaussian distribution (see formula (16)). Therefore, in Fig 8 we present the power of the test with respect to the excess kurtosis and compare it with the power of the common tests for normality. As previously, we can see that the proposed test is clearly superior to the other tests for N > 100, while its performance diminishes for sample size N = 50. Again, the least performing tests in this study are the KS test, the χ2 test and the JB test, especially for short samples.

Fig 8. Theoretical excess kurtosis for the generalized Gaussian distribution with respect to the ρ parameter.

https://doi.org/10.1371/journal.pone.0233901.g008

The next two considered distributions, namely the α-stable and Student’s t-distributions, belong to the class of leptokurtic distributions. We note that the α-stable distribution is difficult to differentiate from the normal distribution when α is close to two, and the same is true for the Student’s t distribution when the number of degrees of freedom is large [31, 32, 17].

In Fig 9 we consider the symmetric α-stable distribution with σ = 1. Since for the α-stable distribution the excess kurtosis is infinite, in this study we analyze the power of the test with respect to the stability index α and compare the test performance with the results of the classical tests. We can see that the situation for the leptokurtic distributions is different from that for the platykurtic ones. We can clearly identify two groups of tests. The first group, which consists of the introduced test and the JB, SW and DP tests, performs much better than the second group comprising the remaining tests (only for N = 50 does the introduced test visibly fall behind the top three tests, while still being superior to the others). The SW test appears to have the best overall performance; the proposed test is last in the first group for small and moderate samples, but for N = 1000 it behaves very similarly to the SW test.

thumbnail
Fig 9. Comparison of the power of the introduced test and the standard tests for normality for the α-stable distribution with respect to the stability index α.

https://doi.org/10.1371/journal.pone.0233901.g009

The last considered distribution is the Student's t. In Fig 10 the power of the proposed test for normality is presented as a function of the number of degrees of freedom, since for the Student's t-distribution the excess kurtosis is finite only if the number of degrees of freedom exceeds 4. As in the previous cases, the power results are compared to those of the common tests for normality. The situation is very similar to that observed for the α-stable distribution: again, two groups of tests can be distinguished. The introduced test belongs to the group of tests that perform much better than the rest (again, only for N = 50 does it visibly fall behind the leaders). In this case, the JB test appears superior in most of the cases. The proposed test is the last in the first group for small and moderate samples, but for N = 1000 it becomes even better than the JB test.

thumbnail
Fig 10. Comparison of the power of the introduced test and the standard tests for normality for the Student's t-distribution with respect to the number of degrees of freedom.

https://doi.org/10.1371/journal.pone.0233901.g010

In the Appendix, in Tables 2–21, we present the powers of the considered normality tests, with the best results highlighted, also for N = 20. The situation for N = 20 is similar to the case N = 50, with two striking differences: the χ2 test fails to reject simulated samples from the generalized Gaussian, mixed Gaussian and α-stable distributions, and the CF test significantly improves its performance, being the clear winner for the generalized Gaussian and mixed Gaussian distributions; see also Figs 3–6.

In order to analyze the influence of the n parameter on the effectiveness of the proposed test, in the Appendix we present a comparison of the power of the test for n = 2 and n = 3 for all considered distributions and sample sizes. In Tables 22–25 we report the power of the test and in each considered case highlight the best results. We observe that for the platykurtic distributions the test with n = 2 generally outperforms the test with n = 3, especially for larger sample sizes and excess kurtosis much smaller than zero. For the leptokurtic distributions the power of the test is comparable in both cases, namely for n = 2 and n = 3.

thumbnail
Table 22. Comparison of the power of the test for n = 2 and n = 3—GG distribution.

https://doi.org/10.1371/journal.pone.0233901.t022

thumbnail
Table 23. Comparison of the power of the test for n = 2 and n = 3—MG distribution.

https://doi.org/10.1371/journal.pone.0233901.t023

thumbnail
Table 24. Comparison of the power of the test for n = 2 and n = 3—α-stable distribution.

https://doi.org/10.1371/journal.pone.0233901.t024

thumbnail
Table 25. Comparison of the power of the test for n = 2 and n = 3—Student’s t distribution.

https://doi.org/10.1371/journal.pone.0233901.t025

Application to real time series

In order to demonstrate how the proposed methodology can be applied to real data, in this section we consider two illustrative datasets from a collection of over 1300 datasets that were originally distributed alongside the R environment [33]. The inclusion criteria for a dataset to be considered an illustrative example in our study were: sufficient sample size, lack of obvious trend, lack of obvious correlation and platykurtosis. When necessary, the data were differenced to arrive at weak stationarity. The first dataset (Data 1) is related to oil investments. In the collection the data appear under the name “Oil Investment”. For the analysis, we took the variable “waterd”, which describes the depth of the sea in metres, and examined the first 50 available observations. A detailed description of the data can be found in Ref. [34]. The second dataset (Data 2) corresponds to the non-experimental “control” group used in various studies of the effect of a labor training program. The time series is titled “Labour Training Evaluation Data”. For the analysis we took the first 200 observations of the differenced time series “re78”, which describes the real earnings in 1978 [35, 36, 37, 38].

The two considered datasets are presented in Fig 11. For both time series we apply the proposed test to ascertain whether the hypothesis of normality can be rejected. The testing procedure is illustrated in Schema 1 (Fig 2). First, for a dataset of size N we calculate the value of the test statistic, see formula (10). Then, we check whether the calculated value falls between the lower and upper critical values constructed for samples of size N at a given significance level, see Table 1. We reject the hypothesis of a normal distribution if the value of the test statistic is smaller than the lower critical value Q1 or larger than the upper critical value Q2. The results of testing the hypothesis of normality for the two real datasets with the proposed test, as well as with the other normality tests considered here, are presented in Table 26. We report the outcome of each test at the significance levels c = 0.01 and c = 0.05: 0 means that the hypothesis of normality was not rejected, whereas 1 means that it was rejected. For the new test introduced in this paper we also give the test statistic values in parentheses.
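The two-sided decision rule just described can be sketched as follows. The statistic itself (formula (10)) is not reproduced here, so `stat_value` is a placeholder, and the critical values 1.2 and 2.4 are hypothetical stand-ins for the tabulated (Q1, Q2) of Table 1 for the given N and significance level:

```python
def normality_decision(stat_value, q1, q2):
    """Two-sided decision rule: reject normality when the test statistic
    falls outside the interval [q1, q2] of critical values."""
    return stat_value < q1 or stat_value > q2   # True means H0 (normality) is rejected

# Hypothetical illustration: statistic value 1.7 against critical values (1.2, 2.4)
rejected = normality_decision(1.7, 1.2, 2.4)   # inside the interval, so not rejected
```

The same rule applies for any sample size once the critical values for that N and significance level are available.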

thumbnail
Fig 11. The two considered real datasets: Data 1 and Data 2.

https://doi.org/10.1371/journal.pone.0233901.g011

thumbnail
Table 26. Results of the new test and the other normality tests considered here for two real-world datasets at significance levels c = 0.01 and c = 0.05.

Tests taken into consideration: the new test proposed in this paper, Jarque-Bera (JB) test [5], D’Agostino-Pearson (DP) test [4], Shapiro-Wilk (SW) test [3], test based on the empirical characteristic function (CF) [22], Kolmogorov-Smirnov (KS) test [2], Kuiper test [6], Watson test [7], Cramer-von Mises (CvM) test [8], Anderson-Darling (AD) test [9] and χ2 goodness-of-fit test [1]. “0” means that normality is not rejected, “1” that it is rejected. In parentheses, the values of the test statistic for the new test are given.

https://doi.org/10.1371/journal.pone.0233901.t026

For Data 1 we can observe that the new test rejects the hypothesis of normality at both significance levels, 1% and 5%. At the 5% significance level the same result is obtained for the SW, KS, Kuiper, Watson, CvM, AD and χ2 tests. The JB, DP and CF tests do not reject the hypothesis of normality even though the empirical kurtosis is smaller than zero (see Table 26). At the 1% significance level the introduced and χ2 tests are the only ones that lead to the rejection of the null hypothesis. Interestingly, for Data 2, only the new test rejects the hypothesis of normality at the 5% significance level.

Conclusions

Developing an omnibus test for normality of a random sample is a challenging and important task in signal processing, particularly difficult for symmetric alternatives and for alternatives close to the normal distribution. We examined the behavior of the Edgeworth expansion when the distribution of a random sample deviates from the normal assumption. Then, by appropriately utilizing the second term of this expansion, we designed a novel test of normality that can be treated as omnibus.

The test’s performance, evaluated via Monte Carlo simulations, was superior to that of the other statistical tests for normality, particularly for platykurtic distributions and sample sizes greater than or equal to 100. For these distributions, the proposed test had the highest power in almost all cases.

For the leptokurtic distributions the situation was different, but the proposed test was still among the best. For this class of distributions we were able to identify two groups of tests. The proposed test was shown to belong to the group of powerful tests that includes the well-known D’Agostino-Pearson, Shapiro-Wilk and Jarque-Bera tests. For the largest sample size it even surpassed these top competitors.

We also showed the efficacy of the introduced test by studying two datasets from the open R language data repository. We compared the results of the proposed test with those of the other normality tests considered here. For the first dataset the new test was among the few that led to the rejection of the hypothesis of normality at the 5% significance level; at the 1% level, only the proposed and χ2 tests rejected normality. For the second dataset the new test was the only one that rejected the hypothesis of normality.

Finally, we note that the test is relatively easy to use and the introduced statistic is computationally simple. To conduct the test, one needs critical values for a given significance level. In the paper we presented a simple algorithm for calculating these values and provided the critical values for typical sample sizes and significance levels.

Appendix

We review some of the classical distributions used in this work, which belong to the platykurtic and leptokurtic families of distributions.

Platykurtic distributions

1. Generalized Gaussian distribution.

The generalized Gaussian distribution GG(μ, β, ρ) is characterized by the probability density function given by the formula [39, 40] f(x) = ρ/(2βΓ(1/ρ)) exp(−(|x − μ|/β)^ρ) (13), where μ ∈ R and β, ρ > 0. The ρ parameter controls the heaviness of the tails. The excess kurtosis in this case takes the form Γ(5/ρ)Γ(1/ρ)/Γ(3/ρ)² − 3 (14). In Fig 12 we present the theoretical excess kurtosis of the GG distribution as a function of the ρ parameter. The GG distribution can be either leptokurtic or platykurtic, depending on the parameter ρ. Nevertheless, in this paper we focus on the platykurtic region of the parameter ρ.
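Formula (14) is easy to evaluate numerically. The sketch below assumes the standard GG parameterization, under which the excess kurtosis equals Γ(5/ρ)Γ(1/ρ)/Γ(3/ρ)² − 3, so ρ = 2 recovers the normal law (excess kurtosis 0) and ρ = 1 the Laplace law (excess kurtosis 3):

```python
from math import gamma

def gg_excess_kurtosis(rho):
    """Excess kurtosis of the generalized Gaussian distribution (formula (14))."""
    return gamma(5.0 / rho) * gamma(1.0 / rho) / gamma(3.0 / rho) ** 2 - 3.0

ek_normal = gg_excess_kurtosis(2.0)   # rho = 2: the normal distribution
ek_laplace = gg_excess_kurtosis(1.0)  # rho = 1: the Laplace distribution
ek_platy = gg_excess_kurtosis(4.0)    # rho > 2: platykurtic region studied in the paper
```

Values of ρ greater than 2 give negative excess kurtosis, which is the platykurtic region considered in the power study.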

thumbnail
Fig 12. Theoretical excess kurtosis for the generalized Gaussian distribution with respect to the ρ parameter.

https://doi.org/10.1371/journal.pone.0233901.g012

2. Mixture of Gaussian distributions.

A random variable M follows a mixture of m Gaussian distributions if its PDF has the form [41, 42] f(x) = p1 f1(x) + … + pm fm(x) (15), where pi ≥ 0, p1 + … + pm = 1, and fi(⋅) is the PDF of a normally distributed random variable with mean μi and variance σi².

In this paper we consider the simplest case, namely m = 2, p1 = p2 = 0.5, μ1 = 0, σ1 = σ2 = 1, and consider the MG distribution only in terms of the μ2 parameter. In this case the excess kurtosis is given by −2μ2⁴/(μ2² + 4)² (16).

The above formula can be proved as follows. Using the PDF of the MG distribution given in (15) with m = 2, p1 = p2 = 0.5, μ1 = 0 and σ1 = σ2 = 1, one can show that E(M) = μ2/2 and E(M²) = 1 + μ2²/2. Thus Var(M) = E(M²) − (E(M))² = 1 + μ2²/4. Further, the fourth central moment equals E(M − E(M))⁴ = 3 + (3/2)μ2² + μ2⁴/16.

Therefore, the excess kurtosis for the MG random variable when m = 2, p1 = p2 = 0.5, μ1 = 0 and σ1 = σ2 = 1 takes the form E(M − E(M))⁴/(Var(M))² − 3 = −2μ2⁴/(μ2² + 4)². As one can see, in the considered case the MG distribution is platykurtic. In Fig 13 we present the excess kurtosis of the MG distribution with m = 2, p1 = p2 = 0.5, μ1 = 0, with respect to the μ2 parameter.
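Formula (16) can be checked numerically against a simulated mixture. The sketch below assumes the closed form −2μ2⁴/(μ2² + 4)², which follows from the mixture moments, and compares it with the sample excess kurtosis of a large simulated sample:

```python
import numpy as np

def mg_excess_kurtosis(mu2):
    """Closed-form excess kurtosis of the 0.5*N(0,1) + 0.5*N(mu2,1) mixture (formula (16))."""
    return -2.0 * mu2**4 / (mu2**2 + 4.0) ** 2

rng = np.random.default_rng(1)
mu2, n = 3.0, 10**6
shift = rng.choice([0.0, mu2], size=n)   # select a mixture component with p1 = p2 = 0.5
sample = rng.standard_normal(n) + shift  # draw from the selected N(shift, 1) component

z = sample - sample.mean()
sample_ek = np.mean(z**4) / np.mean(z**2) ** 2 - 3.0
theory_ek = mg_excess_kurtosis(mu2)      # -2*81/169 for mu2 = 3
```

As μ2 grows the excess kurtosis decreases monotonically from 0 toward −2, the value of a symmetric two-point distribution, which gives the one-to-one correspondence with μ2 ≥ 0 used in the power study.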

thumbnail
Fig 13. Theoretical excess kurtosis for the mixed Gaussian distribution with m = 2, p1 = p2 = 0.5, μ1 = 0, as a function of the parameter μ2.

https://doi.org/10.1371/journal.pone.0233901.g013

Leptokurtic distributions

1. Student’s t-distribution.

The Student’s t-distribution is defined through its PDF given by the formula [43, 44] f(x) = Γ((ν + 1)/2)/(√(νπ) Γ(ν/2)) (1 + x²/ν)^(−(ν+1)/2) (17), where the parameter ν > 0 is called the number of degrees of freedom. In the above definition Γ(⋅) is the gamma function. The excess kurtosis for Student’s t-distribution is defined only for ν > 4 and takes the form 6/(ν − 4) (18). In Fig 14 we show the theoretical excess kurtosis for Student’s t-distribution with respect to the ν parameter.
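A small sketch of formula (18), assuming the standard 6/(ν − 4) form of the excess kurtosis of the Student's t law, illustrates how the distribution approaches the normal one as ν grows:

```python
def t_excess_kurtosis(nu):
    """Excess kurtosis of Student's t-distribution (formula (18)); finite only for nu > 4."""
    if nu <= 4:
        raise ValueError("excess kurtosis is not finite for nu <= 4")
    return 6.0 / (nu - 4.0)

ek5 = t_excess_kurtosis(5.0)    # strongly leptokurtic: excess kurtosis 6
ek30 = t_excess_kurtosis(30.0)  # close to normal: excess kurtosis ~0.23
```

The monotone decay toward 0 explains why discriminating the t-distribution from the normal one becomes hard for a large number of degrees of freedom.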

thumbnail
Fig 14. Theoretical excess kurtosis for Student’s t-distribution as a function of the parameter ν.

https://doi.org/10.1371/journal.pone.0233901.g014

2. α-stable distribution.

The α-stable random variable with parameters α, σ, β and μ is defined through its characteristic function in the following way [45]: E exp(iθX) = exp{−σ^α|θ|^α(1 − iβ sign(θ) tan(πα/2)) + iμθ} for α ≠ 1, and E exp(iθX) = exp{−σ|θ|(1 + iβ(2/π) sign(θ) log|θ|) + iμθ} for α = 1 (19), where 0 < α ≤ 2 is the stability index, σ > 0 is the scale parameter, −1 ≤ β ≤ 1 is the skewness parameter and μ ∈ R is the location parameter. The explicit formula for the PDF of an α-stable random variable is not available in an elementary form, except in three cases: the normal, Cauchy and Lévy distributions. In this work, distributions with the stability index α close to two (i.e., close to the normal distribution) are studied in detail.
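The characteristic function of formula (19) can be evaluated directly. The sketch below covers the α ≠ 1 branch only and assumes the standard parameterization, under which α = 2 recovers the Gaussian characteristic function exp(iμθ − σ²θ²) because tan(πα/2) vanishes:

```python
import cmath
import math

def stable_cf(theta, alpha, sigma=1.0, beta=0.0, mu=0.0):
    """Characteristic function of an alpha-stable law (formula (19)), valid for alpha != 1."""
    if theta == 0.0:
        return 1.0 + 0.0j
    sign = 1.0 if theta > 0 else -1.0
    # log-characteristic function: -sigma^alpha |theta|^alpha (1 - i beta sign(theta) tan(pi alpha / 2))
    psi = -((sigma * abs(theta)) ** alpha) * (1.0 - 1j * beta * sign * math.tan(math.pi * alpha / 2.0))
    return cmath.exp(psi + 1j * mu * theta)

# alpha = 2: the skewness term drops out, leaving the Gaussian characteristic function
cf_gauss = stable_cf(1.5, alpha=2.0, sigma=1.0)
```

For every admissible parameter choice the modulus exp(−σ^α|θ|^α) never exceeds 1, as required of a characteristic function.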

Comparison of the power of the test for n = 2 and n = 3.

In this part we present a comparison of the power of the test for two cases: n = 2 and n = 3. In Tables 22–25 we highlight the best results for the considered cases.

Comparison of the computational times.

In this part we present a comparison of the computational time of the proposed normality test with the computational times of the other considered tests. Here we only present the running times for one exemplary distribution, namely the Gaussian, and one sample size, namely N = 1000. The power of each test was calculated on the basis of 5000 simulations. In Table 27 we depict mean computational times calculated as the time needed to evaluate the power of the test divided by the number of Monte Carlo simulations (in our case 5000). We note that the tests based on the empirical distribution function (EDF tests) seem to be unusually slow. This is due to the fact that we calculated the powers by evaluating p-values for all samples by means of Monte Carlo simulations, as advocated by Ross [46] for goodness-of-fit testing in the case of unspecified parameters.
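The mean time per test application can be measured as total wall-clock time divided by the number of repetitions. The following generic sketch (not the authors' benchmark code) times an arbitrary test function over repeated Gaussian samples; the trivial moment-based statistic is only a stand-in:

```python
import time
import numpy as np

def mean_test_time(test_fn, n=1000, reps=200, seed=0):
    """Mean wall-clock time of test_fn over reps Gaussian samples of size n."""
    rng = np.random.default_rng(seed)
    samples = [rng.standard_normal(n) for _ in range(reps)]  # generate data outside the timed loop
    t0 = time.perf_counter()
    for x in samples:
        test_fn(x)
    return (time.perf_counter() - t0) / reps

# Example: time a simple moment-based statistic as a placeholder test
mean_t = mean_test_time(lambda x: ((x - x.mean()) ** 3).mean())
```

Generating the samples outside the timed loop keeps the measurement focused on the test evaluation itself.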

thumbnail
Table 27. Comparison of the mean computational time (in seconds) for the considered tests for the sample from Gaussian distribution of size N = 1000.

The powers of the tests were calculated based on 5000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pone.0233901.t027

Critical values of the test.

In this part we present critical values of the introduced test for the considered five sample sizes (N = 20, 50, 100, 200, 1000) and two significance levels c: 1% and 5%, see Table 1.
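The algorithm for obtaining such critical values can be sketched as follows: simulate many Gaussian samples of size N, evaluate the test statistic on each, and take the c/2 and 1 − c/2 empirical quantiles of the simulated null distribution as Q1 and Q2. Since formula (10) is not reproduced here, the sample excess kurtosis serves below as an illustrative stand-in statistic (an assumption, not the authors' statistic):

```python
import numpy as np

def critical_values(statistic, n, c=0.05, reps=10000, seed=0):
    """Monte Carlo critical values (Q1, Q2): the c/2 and 1 - c/2 quantiles of the
    statistic computed over Gaussian samples of size n."""
    rng = np.random.default_rng(seed)
    stats = np.array([statistic(rng.standard_normal(n)) for _ in range(reps)])
    return np.quantile(stats, [c / 2.0, 1.0 - c / 2.0])

def excess_kurtosis(x):
    """Placeholder statistic: sample excess kurtosis (NOT formula (10))."""
    z = x - x.mean()
    return (z**4).mean() / (z**2).mean() ** 2 - 3.0

q1, q2 = critical_values(excess_kurtosis, n=100, c=0.05)
```

The same routine, run once per sample size and significance level, produces a table of critical values analogous to Table 1.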

Comparison of the powers of the tests for normality.

In this part we present a comparison of the powers of the normality tests considered in this paper for four distributions and for five sample sizes (N = 20, 50, 100, 200, 1000). The results are presented in Tables 2–21, where we have highlighted the best results.

References

  1. 1. Chernoff H, Lehmann EL (1954) The use of maximum likelihood estimates in χ2 tests for goodness of fit. Annals of Mathematical Statistics 25(3):579–586.
  2. 2. Chakravarti IM, Laha RG, Roy J (1967) Handbook of Methods of Applied Statistics, Volume I, John Wiley and Sons.
  3. 3. Shapiro SS, Wilk MB, Chen HJ (1968) A comparative study of various tests for normality. Journal of the American Statistical Association 63(324):1343–1372.
  4. 4. Pearson ES, D’Agostino RB, Bowman KO (1977) Tests for departure from normality: Comparison of powers. Biometrika 64(2):231–246.
  5. 5. Jarque CM, Bera AK (1980) Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters 6(3):255–259.
  6. 6. Srinivasan R (1971) On the Kuiper test for normality with mean and variance unknown. Statistica Neerlandica 25(3):153–157.
  7. 7. Watson GS (1961) Goodness-of-fit tests on a circle. Biometrika 48(1/2):109–114.
  8. 8. Anderson TW (1962) On the distribution of the two-sample Cramer-von Mises criterion. Annals of Mathematical Statistics 33:1148–1159.
  9. 9. Anderson TW, Darling DA (1954) A test of goodness of fit. Journal of the American Statistical Association 49(268):765–769.
  10. 10. Koutrouvelis IA (1980) A goodness-of-fit test of simple hypotheses based on the empirical characteristic function. Biometrika, 67(1):238–240.
  11. 11. Shan G, Vexler A, Wilding GE, Hutson AD (2010) Simple and exact empirical likelihood ratio tests for normality based on moment relations. Communications in Statistics—Simulation and Computation 40(1):129–146.
  12. 12. Noughabi HA (2016) Two powerful tests for normality. Annals of Data Science 3(2):225–234.
  13. 13. Kiefer NM, Salmon M (1983) Testing normality in econometric models. Economics Letters 11:123–127.
  14. 14. Lahiri K, Song JG (1999) Testing for normality in a probit model with double selection. Economics Letters 65(1):33–39.
  15. 15. Sadooghi-Alvandi SM, Rasekhi A (2009). Testing Normality Against the Logistic Distribution Using Saddlepoint Approximation. Communications in Statistics—Simulation and Computation 38(7):1426–1434.
  16. 16. Kellner J, Celisse A (2019) A one-sample test for normality with kernel methods. Bernoulli 25(3):1816–1837.
  17. 17. D’Agostino RB (2017) Tests for the normal distribution. In Goodness-of-fit-techniques (pp. 367–420). Routledge, NY.
  18. 18. D’Agostino RB, Belanger A, D’Agostino RB Jr (1990) A suggestion for using powerful and informative tests of normality. The American Statistician 44(4):316–321.
  19. 19. Doornik JA, Hansen H (2008) An omnibus test for univariate and multivariate normality. Oxford Bulletin of Economics and Statistics 70:927–939.
  20. 20. Epps TW, Pulley LB (1983) A test for normality based on the empirical characteristic function. Biometrika 70(3):723–726.
  21. 21. Urzúa CM (1997) On the correct use of omnibus tests for normality. Economics Letters 54(3):301.
  22. 22. Zoubir AM, Arnold MJ (1996) Testing Gaussianity with the characteristic function: the iid case. Signal Processing 53(2-3):245–255.
  23. 23. Razali NM, Wah YB (2011) Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics 2(1):21–33.
  24. 24. Thadewald T, Büning H (2007) Jarque–Bera test and its competitors for testing normality–a power comparison. Journal of Applied Statistics 34(1):87–105.
  25. 25. Chapeau-Blondeau F (2000) Nonlinear test statistic to improve signal detection in non-Gaussian noise. IEEE Signal Processing Letters 7(7):205–207.
  26. 26. Mansour A, Jutten C (1999) What should we say about the kurtosis? IEEE Signal Processing Letters 6(12):321–322.
  27. 27. Banerjee S, Agrawal M (2013) Underwater acoustic noise with generalized Gaussian statistics: Effects on error performance. In OCEANS-Bergen, MTS/IEEE, 1–8.
  28. 28. Nakagawa S, Hashiguchi H, Niki N (2012) Improved omnibus test statistic for normality. Computational Statistics 27(2):299–317.
  29. 29. Hall P (1987) Edgeworth expansion for student’s t statistic under minimal moment conditions, The Annals of Probability 15:920–931.
  30. 30. Silverman BW (1986) Density Estimation for Statistics and Data Analysis. London: Chapman & Hall/CRC.
  31. 31. Burnecki K, Wyłomańska A, Chechkin A (2015) Discriminating between light-and heavy-tailed distributions with limit theorem. PloS One 10(12):e0145604. pmid:26698863
  32. 32. Burnecki K, Wyłomańska A, Beletskii A, Gonchar V, Chechkin A (2012) Recognition of stable distribution with Lévy index α close to 2. Physical Review E 85(5):056711.
  33. 33. R language data repository, https://vincentarelbundock.github.io/Rdatasets/datasets.html.
  34. 34. Favero CA, Pesaran MH, Sharma S (1994) A duration model of irreversible oil investment: theory and empirical evidence, Journal of Applied Econometrics 9(S):S95–S112.
  35. 35. Dehejia RH, Wahba S (1999) Causal effects in non-experimental studies: re-evaluating the evaluation of training programs. Journal of the American Statistical Association 94:1053–1062.
  36. 36. Dehejia RH (2005) Practical propensity score matching: a reply to Smith and Todd. Journal of Econometrics 125:355–364.
  37. 37. Lalonde R (1986) Evaluating the economic evaluations of training programs. American Economic Review 76:604–620.
  38. 38. Smith JA, Todd PE (2005) Does matching overcome LaLonde’s critique of nonexperimental estimators? Journal of Econometrics 125:305–353.
  39. 39. Nadarajah S (2005) A generalized normal distribution, Journal of Applied Statistics 32 (7):685–694.
  40. 40. Varanasi MK, Aazhang B (1989) Parametric generalized Gaussian density estimation. Journal of the Acoustical Society of America 86 (4):1404–1415.
  41. 41. Behboodian J. (1970) On the modes of a mixture of two normal distributions, Technometrics 12:131–139.
  42. 42. Robertson CA, Fryer JG (1969) Some descriptive properties of normal mixtures. Scandinavian Actuarial Journal (3-4):137–146.
  43. 43. Hazewinkel M, ed. (1994) Student distribution, Encyclopedia of Mathematics, Springer.
  44. 44. Hogg RV, Craig AT (1978) Introduction to Mathematical Statistics, New York: Macmillan.
  45. 45. Samorodnitsky G, Taqqu MS (1994) Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall/CRC.
  46. 46. Ross S (2012) Simulation. Academic Press.