Modified Greenwood statistic and its application for statistical testing

In this paper, we explore the modified Greenwood statistic, which, in contrast to the classical Greenwood statistic, is properly defined for random samples from any distribution. The classical Greenwood statistic, extensively examined in the existing literature, has found diverse and interesting applications across various domains. Furthermore, numerous modifications to the classical statistic have been proposed. The modified Greenwood statistic, as proposed and discussed in this paper, shares several key properties with its classical counterpart. Emphasizing its stochastic monotonicity within three broad classes of distributions - namely, generalized Pareto, $\alpha-$stable, and Student's t distributions - we advocate for the utilization of the modified Greenwood statistic in testing scenarios. Our exploration encompasses three distinct directions. In the first direction, we employ the modified Greenwood statistic for Gaussian distribution testing. Our empirical results compellingly illustrate that the proposed approach consistently outperforms alternative goodness-of-fit tests documented in the literature, particularly exhibiting superior efficacy for small sample sizes. The second considered problem involves testing the infinite-variance distribution of a given random sample. The last proposition suggests using the modified Greenwood statistic for testing of a given distribution. The presented simulation study strongly supports the efficiency of the proposed approach in the considered problems. Theoretical results and power simulation studies are further validated by real data analysis.


Introduction
This paper explores the modified Greenwood statistic and its use in statistical testing. The modification of the classical statistic enables its application to random samples from any distribution, whereas the original Greenwood statistic was designed specifically for random samples with positive values. The relatively simple form of the introduced statistic allows for the analysis of its theoretical properties and its application in various areas of interest.
The Greenwood statistic was introduced by M. Greenwood in 1946 [1] and was further discussed in [2], where some asymptotic properties and moments of its distribution were analysed for an exponentially distributed random sample. Since then, many authors have analyzed this statistic and its various extensions from a theoretical perspective. For instance, the Greenwood statistic and its modifications were introduced in the context of testing for the exponential and uniform distributions (see, e.g., [3]). In [4] and [5] the authors derived asymptotic properties of the moments of the Greenwood statistic. The asymptotic behaviour of the distribution of the Greenwood statistic with respect to the existence of the moments of the underlying distribution was derived in [6]. In [7] the authors proved the stochastic monotonicity of the Greenwood statistic under the assumption of star-shaped stochastic monotonicity of the underlying random sample. The Greenwood statistic was also used in [8] to introduce a test for inference on the tail index in the context of the generalized Pareto distribution. Important works related to the Greenwood statistic also include its applications in testing Taylor's law, see [9,10]. Recently, in [11] the authors introduced a generalization of the Greenwood statistic and analyzed the asymptotic properties of the modified Greenwood statistics for regularly varying distributions. Owing to their simple form, the Greenwood statistic and its modifications have been applied in various real-data problems. The classical examples include applications in the analysis of clustering events either in space or time, namely in medicine and epidemiology [1,12], genetics and genomics [13,14], biology [15], economics and insurance [16,17], hydrology [18], optimization [19], physics and materials science [20,21], anomaly detection [22], internet traffic monitoring [23] and even in athletics [24].
In this paper we discuss the theoretical properties of the introduced modification of the Greenwood statistic, paying particular attention to its stochastic ordering for three general classes of distributions, namely the generalized Pareto distribution [25], the α-stable distribution [26], and Student's t distribution [27].
Here we extend the last class, namely Student's t distribution, by the Gaussian distribution, and thus we examine the properties of the modified Greenwood statistic for such an extended class. The above-mentioned classes of distributions cover both light-tailed and heavy-tailed families of distributions. They are crucial in probability theory as well as in real data applications.
In the context of applications of the Greenwood statistic, the generalized Pareto distribution was previously discussed in [7,8]. The distribution is defined through a probability density function with three parameters, of which γ ∈ R is the most relevant one, as it is responsible for the heaviness of the tail of the distribution (see, e.g., [25]). If γ ≤ 0, the distribution has a lighter tail; in particular, for γ = 0 the generalized Pareto distribution reduces to the exponential distribution. For γ ≥ 0.5 the variance of the corresponding random variable does not exist.
The α-stable distributions are defined by four parameters, with the stability index α ∈ (0, 2] considered one of the most crucial. The stability index is responsible for the heavy-tailed behavior, and smaller values of α correspond to a higher probability of the associated random variable taking extreme values. When α < 2, α-stable distributions fall within the category of heavy-tailed distributions, and the corresponding random variable has infinite variance. On the other hand, α-stable distributions can be viewed as an extension of the Gaussian distribution, reducing to it when α = 2 (see, e.g., [28]). For more details we refer the readers to classical books on α-stable distributed signals and models, such as [29,30,31,32].
The class of Student's t distributions is defined through the probability density function with the parameter ν > 0 (number of degrees of freedom) responsible for the tail behaviour of the corresponding random variable [27]. For ν > 2, the variance of the distribution exists, while it is infinite otherwise.
As mentioned, in this paper the modified Greenwood statistic is applied to the testing problem. Here we present three different directions. The first one is related to the classical problem of testing the Gaussian distribution. In this case one can apply the modified Greenwood statistic and propose a goodness-of-fit test. For Gaussian distribution testing, a widely used approach is the Shapiro-Wilk test [33] and its several extensions, see [34,35]. Other commonly utilized tests for Gaussianity are based on skewness and kurtosis, namely the Jarque-Bera test [36] and the D'Agostino-Pearson test [37]. Several approaches assessing the empirical distribution function of the random sample have been introduced to test for the Gaussian distribution, e.g., the Kolmogorov-Smirnov test [38], Cramér-von-Mises test [39], Kuiper test [40], Watson test [41], Anderson-Darling test [42] and Lilliefors test [43] (as well as its extensions [44]). In addition, several comparative studies have analyzed the effectiveness of tests for Gaussianity, e.g., [45,46,47]. We also highlight the recently proposed approach based on the conditional variance statistic [48].
The second problem to which the modified Greenwood statistic is applied is testing for an infinite-variance distribution of a given random sample. This problem is much more general than testing a specific distribution; however, it has also been considered in the literature. A general test for infinite moments was introduced in [49], where the author constructed a statistic that diverges if the kth moment is infinite and converges otherwise. Another general bootstrap-based test for finite moments was introduced in [50]. Both tests can be utilized to verify whether the distribution's variance exists. In several studies [51,52,53] the empirical cumulative even moment statistic was analyzed in the context of testing for infinite variance, as the statistic diverges when the random sample comes from an infinite-variance distribution. However, assessing the properties of the moments for various distributions with different properties is difficult; thus some alternative methods of detecting infinite moments can be applied, e.g., based on estimating the power-law behaviour of the tail of the distribution. For various distributions, the existence of the moments depends on the tail of the distribution; thus estimating the power-law index allows one to infer the existence of the variance (for estimation see [54,55,56,57,58,59]). Let us note that the classical Greenwood statistic was also utilized in the context of testing for the tail index. Namely, in [8], the authors used the Greenwood statistic to estimate the confidence intervals for the tail index parameter in the case of the generalized Pareto distribution.
The last application of the modified Greenwood statistic is the testing of a specific distribution with given parameters. The testing procedure is described within the three above-mentioned classes. In the case of non-Gaussian distributions, goodness-of-fit tests are usually based on the empirical distribution of the test statistic, and rejection regions are obtained by Monte Carlo simulations. For the generalized Pareto distribution this approach was extensively analyzed in [60], where several goodness-of-fit tests are considered, e.g., the Kolmogorov-Smirnov, Cramér-von-Mises and Anderson-Darling tests, which in their classical versions were proposed for Gaussian distribution testing. A similar approach can be utilized in the case of Student's t distribution. In the case of the goodness-of-fit test for the α-stable distribution, an approach based on conditional moments was developed [61,52]. Moreover, in such a case a likelihood ratio test has been introduced to discriminate between the Gaussian distribution and the α-stable distribution with α < 2 (see, e.g., [32]). For other goodness-of-fit tests dedicated to α-stable distributions we refer to [62].
The main novelty of this paper is the introduction of the modified Greenwood statistic and the discussion of its main theoretical properties for the three considered classes of distributions. Additionally, our aim is to demonstrate the usefulness of the introduced statistic for the testing problem in the three different contexts discussed above. The efficiency of the proposed testing methodology is verified for simulated random samples from the three analyzed classes of distributions. Finally, the theoretical and simulation studies are supported by real data analysis from the condition monitoring area. The presented results confirm that the test based on the modified Greenwood statistic outperforms the other considered classical tests for the Gaussian distribution. This is especially evident when dealing with small sample sizes in the class of α-stable distributions. Additionally, the modified Greenwood statistic serves as a powerful tool for testing for a general infinite-variance distribution of a given sample.
The rest of the paper is organized as follows. In Section 2 we recall some important characteristics of the Greenwood statistic. In Section 3 we introduce the modification of the Greenwood statistic and present its main properties. In Section 4 we introduce the application of the modified Greenwood statistic to the testing problem. In Section 5 we present a power simulation study of the proposed tests for the three discussed classes of distributions. In Section 6 we show an application of the proposed approach to a real data case. Lastly, in Section 7 we present concluding remarks.

Greenwood statistic
In this section, we present basic facts on the Greenwood statistic $T_n$. Let $X_1, X_2, \ldots, X_n$ be an independent identically distributed (IID) random sample from a common nonnegative distribution. Then statistic $T_n$ is defined as follows
$$T_n = \frac{\sum_{i=1}^{n} X_i^2}{\left(\sum_{i=1}^{n} X_i\right)^2}. \tag{1}$$
Originated by Greenwood [1] for testing exponentiality and further used in a number of applications, statistic (1) has been intensively studied in recent years. Below, we present the most important properties of statistic $T_n$.
It should be noted that the exact distribution of statistic $T_n$ is very difficult to obtain, and a closed-form expression for the probability density function (PDF) or cumulative distribution function (CDF) is, in general, unknown even for an underlying random sample with exponential distribution. Thus, a vast literature on the Greenwood statistic is devoted to the analysis of its asymptotic behavior and the approximation of percentiles. In particular, under the assumption of finite fourth moment, $T_n$ is asymptotically normal with mean $2/n$ and variance $4/n^3$. To be specific, let $\tilde{T}_n = \sqrt{n}\left(nT_n/2 - 1\right)$; then $\tilde{T}_n \xrightarrow{d} N$, where $N$ is a standard normal random variable with distribution $N(0, 1)$ (see, e.g., [2]). Furthermore, under the assumption of regular variation of the distribution of the underlying random sample $X_1, X_2, \ldots, X_n$, the asymptotic distribution of $T_n$ was studied in [63] and [6]. In particular, in the case where the parameter of regular variation belongs to the interval (0, 1), it was shown in [6] that $T_n$ converges, under proper normalisation, to $U/V^2$, where $U$ and $V$ are independent random variables with α-stable distributions (see Theorem 2.1 and Remark 2.1 in [6]). In the case of a regular variation parameter greater than 1 and proper normalisation, $T_n$ converges to a random variable with an α-stable distribution (see Theorems 2.2-2.5 in [6]). Note that, as was pointed out in [2], the convergence is very slow even in the case of a limiting normal distribution. Thus, the usage of the asymptotic distribution of $T_n$ for constructing critical regions is of limited value. For a historical review of the most important results on the approximation of the Greenwood statistic see the supplementary material in [7].
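The normalisation quoted above can be read as an ordinary standardisation of $T_n$. The short derivation below is only a sketch of that arithmetic; it assumes the asymptotic mean $2/n$ and variance $4/n^3$ stated in the text and nothing beyond them.

```latex
\tilde{T}_n
  = \frac{T_n - 2/n}{\sqrt{4/n^{3}}}
  = \frac{n^{3/2}}{2}\left(T_n - \frac{2}{n}\right)
  = \sqrt{n}\left(\frac{n T_n}{2} - 1\right).
```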
A vast literature is devoted to the analysis of many other interesting probabilistic properties of statistic $T_n$. By the definition (1), we have $1/n \le T_n \le 1$, hence all moments of the Greenwood statistic are finite. Another important characteristic of $T_n$ that is a direct consequence of (1) is that its distribution is scale-invariant. Note that the Greenwood statistic is closely related to some other important statistics, such as the sample coefficient of variation $\sqrt{nT_n - 1}$ and the self-normalized sum $SN_n = 1/\sqrt{T_n}$ (see, e.g., [6] and references therein). Statistic $T_n$ is commonly used for testing exponentiality (see, e.g., [7] and the references therein). Equivalently, it can be used as a test statistic for uniformity, as discussed in Section 1 of the supplementary material in [7]. Moreover, statistic $T_n$ and its modifications have been used in the context of testing for the extreme domain of attraction (see, e.g., [64] and [65] and the references therein). These applications are closely related to clustering and heterogeneity detection, as was pointed out in [7]. In particular, when clustering is present, $T_n$ tends to take values closer to one, while in the opposite case of uniformly or even super-uniformly distributed data, that is, without grouping or outliers, values of $T_n$ are closer to $1/n$. We refer to Section 3 in [7] for the discussion of the connection between clustering, heavy tails and the behavior of statistic $T_n$. This property of the Greenwood statistic, together with its stochastic monotonicity discussed below, provides justification for $T_n$ being an effective tool for discriminating between light-tailed and heavy-tailed data within various classes of distributions.
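The elementary properties listed above are easy to check numerically. The following is a minimal sketch, assuming the usual form of the Greenwood statistic (sum of squares divided by the squared sum); the function name `greenwood`, the seed and the sample size are illustrative choices, and the coefficient of variation is taken with the 1/n variance, which is also an assumption.

```python
import math
import random
import statistics


def greenwood(sample):
    """Classical Greenwood statistic: sum of squares over the squared sum."""
    if any(x < 0 for x in sample):
        raise ValueError("the classical statistic assumes a nonnegative sample")
    total = sum(sample)
    if total == 0:
        raise ValueError("at least one observation must be positive")
    return sum(x * x for x in sample) / total ** 2


rng = random.Random(0)  # fixed seed, for reproducibility only
n = 25
x = [rng.expovariate(1.0) for _ in range(n)]

t_n = greenwood(x)
t_n_rescaled = greenwood([7.0 * v for v in x])  # scale invariance check

# Sample coefficient of variation with the 1/n variance (an assumption here).
cv = statistics.pstdev(x) / statistics.fmean(x)
cv_from_t = math.sqrt(n * t_n - 1.0)

print(f"T_n = {t_n:.4f}, admissible range: [{1 / n:.4f}, 1]")
print(f"CV = {cv:.6f}, sqrt(n*T_n - 1) = {cv_from_t:.6f}")
```

A constant sample attains the lower bound 1/n, while a sample with a single nonzero observation attains the upper bound 1, which matches the clustering interpretation given in the text.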

General properties
Due to its simple form and useful properties, as discussed in the previous sections, statistic $T_n$ is commonly used for solving numerous real-data problems. However, the main limitation of the Greenwood statistic $T_n$ is that it cannot be applied to general real-valued samples, which commonly appear in many important applications. To resolve this problem, in this section we propose a modification of the Greenwood statistic, called the modified Greenwood statistic, defined as
$$S_n = \frac{\sum_{i=1}^{n} X_i^2}{\left(\sum_{i=1}^{n} |X_i|\right)^2}. \tag{2}$$
Although the proposed statistic can be used for analyzing any real-valued samples, we concentrate on its value for testing within families of distributions symmetric with respect to 0. In particular, as we show in this contribution, statistic $S_n$ exhibits high efficiency in detecting whether a sample comes from an α-stable or Student's t distribution when compared with the Gaussian distribution, outperforming other traditionally used test statistics. Statistic $S_n$ preserves most of the fundamental properties of the Greenwood statistic. In particular, it is scale invariant. Moreover, note that $S_n$ is bounded, as $1/n \le S_n \le 1$. Consequently, all moments of $S_n$ exist even when this property does not hold for the underlying random sample. Additionally, under the assumption of finite fourth moment, $S_n$ is asymptotically normal. Note that, as was pointed out in the previous section, the convergence of statistic $T_n$ is very slow. The same property persists for the modified Greenwood statistic, see Fig. 1 and Fig. 2. In Fig. 1 we present the comparison of the PDF of the asymptotic distribution of the normalized $S_n$ (denoted by $\tilde{S}_n$) and the empirical PDFs of $\tilde{S}_n$ for the three considered distributions, see Appendix A. Analogously, in Fig. 2 we present the comparison of the corresponding distributions' tails (i.e., 1-CDF). These figures clearly show that the usage of the asymptotic distribution of $S_n$ for constructing critical regions is of limited value. In the following sections we present an approach to testing procedures that is in line with the one developed in [7] and [8] and is based on the stochastic monotonicity of statistic $S_n$.
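The general properties of the modified statistic can be illustrated with a few lines of code. This is a minimal sketch, assuming the modified statistic is the sum of squares divided by the squared sum of absolute values, which is consistent with the bounds and with its agreement with $T_n$ on nonnegative samples; the name `modified_greenwood` and all numerical settings are hypothetical.

```python
import math
import random


def modified_greenwood(sample):
    """Modified Greenwood statistic for a real-valued sample (assumed form)."""
    abs_total = sum(abs(x) for x in sample)
    if abs_total == 0:
        raise ValueError("at least one observation must be nonzero")
    return sum(x * x for x in sample) / abs_total ** 2


rng = random.Random(1)  # fixed seed, for reproducibility only
n = 40
z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # real-valued, symmetric sample

s_n = modified_greenwood(z)
s_n_rescaled = modified_greenwood([-2.5 * v for v in z])  # scale invariance

# On a nonnegative sample the statistic coincides with the classical one.
positive = [abs(v) for v in z]
classical = sum(v * v for v in positive) / sum(positive) ** 2

print(f"S_n = {s_n:.4f}, admissible range: [{1 / n:.4f}, 1]")
print(f"classical statistic on |z|: {classical:.4f}")
```

Because only squares and absolute values enter the formula, the statistic depends on the sample through the absolute values alone, which is why the sign of the scaling factor does not matter.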

Stochastic monotonicity
One of the important properties of statistic $S_n$, which plays a crucial role in applications and justifies its usage as a test statistic, is its stochastic ordering.
Definition 1. Let $X$ and $Y$ be independent random variables with CDFs $F_X(\cdot)$ and $F_Y(\cdot)$, respectively. We say that $X$ is stochastically smaller than $Y$ (in the standard sense), denoted by $X \le_{st} Y$, whenever, for each $t \in \mathbb{R}$,
$$F_X(t) \ge F_Y(t).$$
It is known that the stochastic ordering of the Greenwood statistic $T_n$ can be obtained under the assumption of star-shaped ordering of the distribution of the underlying random sample $X_1, X_2, \ldots, X_n$. We refer the reader to Theorem 1 in [7] for the detailed proof of this fact. Herein, we demonstrate that statistic $S_n$ exhibits the same stochastic ordering property. Before we present the main result of this section, let us recall two equivalent definitions of the star-shaped order commonly used in the literature.
Definition 2. Let $X$ and $Y$ be independent random variables with CDFs $F_X(\cdot)$ and $F_Y(\cdot)$, respectively. We say that $X$ is smaller than $Y$ in the star-shaped order, denoted by $X \le_{*} Y$, whenever $g(x) = F_Y^{-1}(F_X(x))$ is a star-shaped function, i.e., $g(\alpha x) \le \alpha g(x)$ for any $x > 0$, $\alpha \in [0, 1]$ (see Definition in Section 1 in [66]).
Equivalently, we say that $X \le_{*} Y$ whenever $F_Y^{-1}(u)/F_X^{-1}(u)$ is non-decreasing in $u$, where $F_X^{-1}(\cdot)$ and $F_Y^{-1}(\cdot)$ are quantile functions of $X$ and $Y$, respectively (see, e.g., Proposition 1 in [66]).
Based on the assumption of star-shaped ordering of the distribution of the random sample $X_1, X_2, \ldots, X_n$, the standard stochastic ordering of statistic $S_n$ can be shown.
Proposition 1. Let $\{P_\theta, \theta \in \Theta \subset \mathbb{R}\}$ be a family of star-shaped ordered, absolutely continuous, symmetric probability distributions on $\mathbb{R}$, that is, such that if $\theta_1 \le \theta_2$ we have $X^{(\theta_1)} \le_{*} X^{(\theta_2)}$, with $X^{(\theta_i)} \sim P_{\theta_i}$, $i = 1, 2$. Then, for each $n \ge 2$, we have
$$S_n^{(\theta_1)} \le_{st} S_n^{(\theta_2)},$$
where $S_n^{(\theta)}$ is given by (2) with the $\{X_i\}$ having a common distribution $P_\theta$.
The proof of Proposition 1 is presented in Appendix C.1. Moreover, observe that in the case of nonnegative random variables $X_1, X_2, \ldots, X_n$, statistics $S_n$ and $T_n$ are equivalent. Thus, in this case, by Theorem 1 in [7], the stochastic order of $S_n$ is also preserved, that is, $S_n^{(\theta_1)} \le_{st} S_n^{(\theta_2)}$ for $\theta_1 \le \theta_2$, where $S_n^{(\theta)}$ is given by (2) with the $\{X_i\}$ having a common distribution $P_\theta$.
In the subsequent part of this section we concentrate on the three analyzed classes of distributions of the underlying random sample $X_1, X_2, \ldots, X_n$: the generalized Pareto distribution GP(γ, δ), the α-stable distribution S(α, σ), and Student's t distribution T(ν) (see Appendix A for definitions). Notice that for the class GP(γ, δ) the star-shaped ordering, and thus the standard ordering of the corresponding statistic $T_n$, was proved in [7] and was extensively studied in the context of clustering detection and testing in [7] and [8]. We recall this result in Proposition 3 (i) and refer the reader to Proposition 1 (i) in [7] for its detailed proof.
Proposition 3. Let $S_n$ be the modified Greenwood statistic defined in (2), where $X_1, X_2, \ldots, X_n$ are IID random variables with a common distribution.
(i) If the common distribution is GP(γ, δ), then $S_n$ is stochastically increasing with respect to parameter γ.
(ii) If the common distribution is S(α, σ), then $S_n$ is stochastically decreasing with respect to parameter α.
(iii) If the common distribution is T(ν), then $S_n$ is stochastically decreasing with respect to parameter ν.
The proof of Proposition 3 is presented in Appendix C.2. In Fig. 3 the shift of probability mass is presented for the α-stable distribution, as an example illustrating the stochastic behavior of statistic $S_n$ demonstrated in Proposition 3 (ii). The probability mass shifts to the right as the stability index moves away from 2. Similar behavior of statistic $S_n$ can be observed for the generalized Pareto distribution of the underlying random sample and was discussed in [8]. Such behavior of the test statistic $S_n$ is strictly connected with the heaviness of the tail of the distribution of the underlying random sample and the presence of clustering in the dataset. In particular, a heavier tail in the underlying random sample leads to larger values of statistic $S_n$. We refer to Section 3 in [8] for a broad discussion of the connection between heavy tails, clustering and the Greenwood statistic. Finally, let us point out that the stochastic monotonicity proved in Proposition 3 justifies the construction of rejection regions for the tests introduced in the next section.
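The monotone behavior described above can be observed in a small Monte Carlo experiment. The sketch below is an illustration only, for the Student's t class: it assumes the modified statistic is the sum of squares divided by the squared sum of absolute values, generates Student's t variates as a Gaussian divided by the square root of a scaled chi-square, and compares empirical medians of the statistic. The helper names, the grid of ν values and the simulation sizes are hypothetical choices.

```python
import math
import random
import statistics


def modified_greenwood(sample):
    """Modified Greenwood statistic (assumed form)."""
    abs_total = sum(abs(x) for x in sample)
    return sum(x * x for x in sample) / abs_total ** 2


def student_t_sample(rng, n, nu):
    """n IID Student's t draws via Z / sqrt(chi2_nu / nu)."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu d.o.f.
        out.append(z / math.sqrt(chi2 / nu))
    return out


def simulate_statistic(rng, n, nu, repetitions):
    """Monte Carlo realisations of the statistic for T(nu) samples."""
    return [modified_greenwood(student_t_sample(rng, n, nu))
            for _ in range(repetitions)]


rng = random.Random(2024)  # fixed seed, for reproducibility only
n, repetitions = 30, 1000
medians = {}
for nu in (1, 3, 30):
    values = simulate_statistic(rng, n, nu, repetitions)
    medians[nu] = statistics.median(values)
    print(f"nu = {nu:2d}: empirical median of S_n = {medians[nu]:.4f}")
```

A comparison of medians is a much weaker check than the full stochastic ordering, but it reflects the same direction: lighter tails (larger ν) push the statistic towards its lower bound 1/n.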

Application of modified Greenwood statistic for testing problem
In this section, we demonstrate the application of the modified Greenwood statistic to the testing problem. We showcase the universality of the statistic and discuss three versions of the statistical test: a test for the Gaussian distribution, a test for an infinite-variance distribution, and a test for a given distribution with a specific value of the parameter responsible for the heavy-tailed behavior. Although the methodology presented in this section is described in a general form, in further analysis we demonstrate its usefulness for the three considered classes of distributions, namely α-stable S(α, σ), Student's t T(ν) and generalized Pareto GP(γ, δ).

Testing of Gaussian distribution
One of the proposed applications of the modified Greenwood statistic is its utilization in the problem of testing the Gaussian distribution. Let us consider the class of distributions $P_\theta$, where $\theta \in \Theta$ is the distribution's parameter and $F_\theta(\cdot)$ denotes the CDF of the corresponding distribution. Moreover, we assume that the Gaussian distribution belongs to this class and is characterized by the parameter $\theta^*$. Here we discuss the problem of testing the Gaussian distribution versus a heavy-tailed distribution. For such a case there are two possible scenarios: in scenario (1), for each $\theta \in \Theta$ we have $\theta^* \le \theta$, while in scenario (2) we have $\theta^* \ge \theta$.
In the considered problem of Gaussian distribution testing we formulate the following $H_0$ and $H_1$ hypotheses, depending on the possible scenarios, respectively
$$H_0: \theta = \theta^* \quad \text{vs.} \quad H_1: \theta > \theta^*, \tag{5}$$
$$H_0: \theta = \theta^* \quad \text{vs.} \quad H_1: \theta < \theta^*. \tag{6}$$
Knowing that $S_n$ is stochastically monotone with respect to $\theta$ within a given class, we can consider it as a test statistic. For the above-mentioned scenarios we consider a one-sided testing procedure, meaning that the test statistic obtained for a single trajectory is compared with only one critical value, and the result of this comparison is the basis for rejecting $H_0$. In both scenarios the alternative is heavier-tailed than the Gaussian distribution, and heavier tails lead to larger values of $S_n$, so the rejection region of the tests corresponding to scenarios (1) and (2) is as follows
$$\{S_n > Q_{1-c}(n)\}, \tag{7}$$
where $Q_p(n)$ is the theoretical quantile of order $p$ of the distribution of the test statistic under the $H_0$ hypothesis and $n$ is the sample length. As the distribution of $S_n$ is in general not known, in the rejection region (7) we take the empirical quantile, i.e., $\hat{Q}_{1-c}(n)$. To this end we simulate $M$ sample trajectories of $X_1, X_2, \ldots, X_n$ from the Gaussian distribution and for each of them we calculate the value of the modified Greenwood statistic. In this way we obtain $M$ realisations of $S_n$, from which the empirical quantile is calculated. In further analysis, the tests corresponding to the above-mentioned scenarios are denoted as the $MG_1$ test and the $MG_2$ test, respectively. The procedure described above can be applied to the S(α, σ) and T(ν) distributions, since in these classes the modified Greenwood statistic is stochastically monotone with respect to the appropriate parameters. Additionally, the S(2, σ) and T(∞) distributions are Gaussian (see Appendix A). For these two cases we apply the $MG_2$ test. In Propositions 4 and 5 below we present fundamental properties of the test $MG_2$ applied to the distributions S(α, σ) and T(ν), respectively.
Proposition 4. The test $MG_2$ for the hypotheses (6) applied to the distribution S(α, σ) has size $c$, is unbiased and has a decreasing power function with respect to parameter α.
The proof of Proposition 4 is presented in Appendix C.3.
Proposition 5. The test $MG_2$ for the hypotheses (6) applied to the distribution T(ν) has size $c$, is unbiased and has a decreasing power function with respect to parameter ν.
The proof of Proposition 5 shall be omitted as it is similar to the proof of Proposition 4.
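The simulation-based construction of the critical value for the Gaussianity test can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the modified statistic is the sum of squares divided by the squared sum of absolute values, that the null hypothesis is rejected for large values of the statistic (heavier tails give larger values), and it uses a plain order statistic as the empirical quantile. All function names and the values of n, c and the number of repetitions are hypothetical.

```python
import math
import random


def modified_greenwood(sample):
    """Modified Greenwood statistic (assumed form)."""
    abs_total = sum(abs(x) for x in sample)
    return sum(x * x for x in sample) / abs_total ** 2


def empirical_quantile(values, p):
    """Order statistic used as a simple empirical p-quantile."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


def gaussian_critical_value(n, c, repetitions, rng):
    """Empirical (1 - c)-quantile of the statistic under Gaussian samples."""
    stats = [modified_greenwood([rng.gauss(0.0, 1.0) for _ in range(n)])
             for _ in range(repetitions)]
    return empirical_quantile(stats, 1.0 - c)


def reject_gaussianity(sample, critical_value):
    """One-sided decision: reject when the statistic is too large."""
    return modified_greenwood(sample) > critical_value


rng = random.Random(7)  # fixed seed, for reproducibility only
n, c = 100, 0.05
critical_value = gaussian_critical_value(n, c, repetitions=2000, rng=rng)

# A Cauchy sample (Student's t with one degree of freedom) as a heavy-tailed example.
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
decision = reject_gaussianity(cauchy, critical_value)
print(f"critical value = {critical_value:.4f}, reject H0: {decision}")
```

Since the statistic is scale invariant, the standard deviation used to simulate the Gaussian samples does not affect the critical value, so only the sample length and the significance level need to be fixed.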

Testing of infinite-variance distribution
In this part, we present the application of the modified Greenwood statistic in the procedure for testing for an infinite-variance distribution, which is a much more general problem than testing for the Gaussian distribution. As in the previous case, let us assume that the random sample comes from a family of distributions with CDF $F_\theta(\cdot)$. We assume that statistic $S_n$ is stochastically monotone within this class with respect to $\theta \in \Theta$. Here we assume that $\theta$ is the distribution's parameter responsible for the finiteness of the variance. The hypotheses of the test depend on the behavior of the tail of the distribution with respect to $\theta$. Let us assume that $\theta^* \in \Theta$ is the boundary between finite-variance and infinite-variance distributions. Here we consider two tests corresponding to two possible scenarios. In scenario (3) the infinite-variance distribution corresponds to $\theta \ge \theta^*$, while in scenario (4) it corresponds to $\theta \le \theta^*$. Thus, for these two tests we have the following $H_0$ and $H_1$ hypotheses, respectively
$$H_0: \theta \ge \theta^* \quad \text{vs.} \quad H_1: \theta < \theta^*, \tag{8}$$
$$H_0: \theta \le \theta^* \quad \text{vs.} \quad H_1: \theta > \theta^*. \tag{9}$$
In this class of problems, we utilize a one-sided testing procedure. Thus, the statistic obtained for a single trajectory is compared with only one critical value. In both scenarios the alternative is a finite-variance (lighter-tailed) distribution, which leads to smaller values of $S_n$, so for the discussed tests the rejection regions are defined as follows
$$\{S_n < Q_c(n)\}, \tag{10}$$
where, similarly as in the previous case, $Q_p(n)$ is the theoretical $p$-quantile of the distribution of $S_n$ under $H_0$ and $n$ is the sample length. Here we also apply the empirical quantile in (10), i.e., $\hat{Q}_c(n)$, as the theoretical distribution of $S_n$ is unknown. In the further analysis the tests are denoted as $MG_3$ and $MG_4$, respectively. Similarly as in the previous case, the rejection regions at a given significance level $c$ are constructed based on $M$ simulated trajectories of $X_1, X_2, \ldots, X_n$ from the distribution with CDF $F_{\theta^*}(\cdot)$.
It is worth noting that a similar approach was introduced in [8] in the context of the generalized Pareto distribution. In the current paper, we extend the testing procedure to Student's t distribution. Test $MG_3$ corresponds to the test (4.3) from [8] in the generalized Pareto distribution class. In this paper, testing for an infinite-variance distribution in the Student's t class corresponds to test $MG_4$. The respective values of the $\theta^*$ parameter used to calculate rejection regions are γ = 0.5 and ν = 2 for the GP(γ, δ) and T(ν) distributions, respectively.
In Proposition 6 we present fundamental properties of the test $MG_4$ applied to the distribution T(ν).
Proposition 6. The test $MG_4$ for the hypotheses (9) applied to the distribution T(ν) has size $c$, is unbiased and has an increasing power function with respect to parameter ν.
The proof of Proposition 6 shall be omitted as it is similar to the proof of Proposition 4.
Finally, let us note that we cannot test for an infinite-variance distribution in the α-stable class strictly within the framework presented in the current section. However, in practical applications one may consider the inverse problem, i.e., testing for a finite-variance distribution, which for the α-stable class of distributions reduces to testing for the Gaussian distribution (see Section 4.1).
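For the Student's t class, the infinite-variance test can be sketched in the same simulation-based manner. The code below is an illustration under stated assumptions: the modified statistic is the sum of squares divided by the squared sum of absolute values, the critical value is simulated at the boundary ν = 2, and the null hypothesis of infinite variance is rejected for small values of the statistic. The helper names and simulation sizes are hypothetical.

```python
import math
import random


def modified_greenwood(sample):
    """Modified Greenwood statistic (assumed form)."""
    abs_total = sum(abs(x) for x in sample)
    return sum(x * x for x in sample) / abs_total ** 2


def student_t_sample(rng, n, nu):
    """n IID Student's t draws via Z / sqrt(chi2_nu / nu)."""
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
            for _ in range(n)]


def empirical_quantile(values, p):
    """Order statistic used as a simple empirical p-quantile."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


def infinite_variance_critical_value(n, c, repetitions, rng, nu_boundary=2.0):
    """Empirical c-quantile of the statistic simulated at the boundary nu = 2."""
    stats = [modified_greenwood(student_t_sample(rng, n, nu_boundary))
             for _ in range(repetitions)]
    return empirical_quantile(stats, c)


def reject_infinite_variance(sample, critical_value):
    """One-sided decision: reject infinite variance when the statistic is small."""
    return modified_greenwood(sample) < critical_value


rng = random.Random(11)  # fixed seed, for reproducibility only
n, c = 100, 0.05
critical_value = infinite_variance_critical_value(n, c, repetitions=2000, rng=rng)

light_tailed = student_t_sample(rng, n, nu=30.0)  # finite-variance example
print(f"critical value = {critical_value:.4f}, "
      f"reject H0: {reject_infinite_variance(light_tailed, critical_value)}")
```

The same structure would apply to the generalized Pareto class by simulating at γ = 0.5 instead; only the sampler changes, because in both classes lighter tails correspond to smaller values of the statistic.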

Testing of given distribution with specific value of the parameter responsible for heavy-tailed behavior
In this case, similarly as in the previous cases, we assume that the considered random sample comes from the distribution with the CDF $F_\theta(\cdot)$, where $\theta \in \Theta$, and that within this class the modified Greenwood statistic $S_n$ is stochastically monotone with respect to parameter $\theta$. In this case $H_0$ and $H_1$ are defined as
$$H_0: \theta = \theta^* \quad \text{vs.} \quad H_1: \theta \ne \theta^*.$$
Here $\theta^*$ may correspond to any member of the considered family of distributions, in contrast to the case presented in Section 4.1, where $\theta^*$ was the parameter corresponding to the Gaussian distribution. Additionally, we assume there exist $\theta_1, \theta_2 \in \Theta$ such that $\theta_1 < \theta^* < \theta_2$.
Here we apply a two-sided testing procedure. For a single sample trajectory we reject the $H_0$ hypothesis if the test statistic is extreme, that is, either larger than an upper critical value or smaller than a lower critical value, at a given significance level $c$. To construct the rejection region of the considered test at significance level $c$ based on the statistic $S_n$, we proceed similarly as in the previous cases. We simulate $M$ sample trajectories of $X_1, X_2, \ldots, X_n$ from the distribution with CDF $F_{\theta^*}(\cdot)$. For each simulated sample we calculate the value of the modified Greenwood statistic. Finally, we calculate the rejection region
$$\{S_n < Q_{c/2}(n)\} \cup \{S_n > Q_{1-c/2}(n)\},$$
where $Q_p(n)$ is the theoretical $p$-quantile of the distribution of $S_n$ under $H_0$ and $n$ is the sample length; in practice it is replaced by the empirical quantile obtained from the simulated values. The test described above can be applied to all three distributions considered in this paper, namely the GP(γ, δ), S(α, σ) and T(ν) distributions. In those cases the parameters $\theta$ used in the general description are γ, α and ν, respectively.
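The two-sided procedure can be sketched generically, with the null distribution supplied as a sampling function. The example below is a hedged illustration for the generalized Pareto class: it assumes the modified statistic is the sum of squares divided by the squared sum of absolute values, and it assumes the common parameterisation with CDF 1 - (1 + γx/δ)^(-1/γ) for inverse-CDF sampling, which may differ from the definition in Appendix A. All names and numerical settings are hypothetical.

```python
import math
import random


def modified_greenwood(sample):
    """Modified Greenwood statistic (assumed form)."""
    abs_total = sum(abs(x) for x in sample)
    return sum(x * x for x in sample) / abs_total ** 2


def generalized_pareto_sample(rng, n, gamma, delta=1.0):
    """Inverse-CDF sampling; assumes CDF 1 - (1 + gamma*x/delta)^(-1/gamma)."""
    out = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        if gamma == 0.0:
            out.append(-delta * math.log1p(-u))  # exponential limit
        else:
            out.append(delta * ((1.0 - u) ** (-gamma) - 1.0) / gamma)
    return out


def empirical_quantile(values, p):
    """Order statistic used as a simple empirical p-quantile."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


def two_sided_critical_values(sampler, n, c, repetitions):
    """Empirical c/2 and 1 - c/2 quantiles of the statistic under H0."""
    stats = [modified_greenwood(sampler(n)) for _ in range(repetitions)]
    return empirical_quantile(stats, c / 2.0), empirical_quantile(stats, 1.0 - c / 2.0)


def reject_two_sided(sample, lower, upper):
    """Reject H0 when the statistic falls outside [lower, upper]."""
    s = modified_greenwood(sample)
    return s < lower or s > upper


rng = random.Random(5)  # fixed seed, for reproducibility only
n, c, gamma_star = 100, 0.05, 0.5
lower, upper = two_sided_critical_values(
    lambda m: generalized_pareto_sample(rng, m, gamma_star), n, c, repetitions=2000)

exponential = generalized_pareto_sample(rng, n, 0.0)  # gamma = 0 alternative
print(f"acceptance interval = [{lower:.4f}, {upper:.4f}], "
      f"reject H0: {reject_two_sided(exponential, lower, upper)}")
```

Passing the sampler as a function keeps the testing logic independent of the distribution class, so the same code could be reused with an α-stable or Student's t generator.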

Power simulation study
In this section, we present results reflecting the versatility of the proposed statistic in various statistical tests. First, we apply the test for Gaussianity described in Section 4.1. Then, we show the efficiency of the test for infinite variance introduced in Section 4.2. Moreover, for each test we compare the obtained results with the results of testing methods known in the literature.

Testing of Gaussian distribution
In this part, we present results obtained for testing of the Gaussian distribution. This part is related to the test introduced in Section 4.1. In this case, $H_0$ corresponds to a Gaussian distribution of the random sample and $H_1$ corresponds to a non-Gaussian distribution. The testing procedure and quantiles are based on the distribution of $S_n$ obtained in 100000 Monte Carlo simulations; as the number of simulations is large, the empirical distribution of $S_n$ is close to the theoretical one. The quantiles obtained in the testing procedure are presented in Table B.3 (see Appendix B). In what follows, we denote the modified Greenwood statistic-based test introduced in this paper as the MG test. Moreover, we compare the results of the proposed test for Gaussianity with the tests known in the literature. We selected tests thoroughly investigated in [45] and commonly used in various applications, namely the Kolmogorov-Smirnov (KS) test [38], Kuiper test [40], Cramér-von-Mises (CM) test [39], Watson test [41], D'Agostino-Pearson (DP) test [37], Anderson-Darling (AD) test [42], Shapiro-Wilk (SW) test [33] and Jarque-Bera (JB) test [36]. In addition, we also compare the results with the Lilliefors (LF) test [43] and, in the class of α-stable distributions, with the likelihood ratio (LR) test [32]. For all of the tests, we set the significance level to c = 0.05. To assess the efficiency of the proposed test, we show the power of the tests based on 2000 Monte Carlo simulations. For α-stable distributions, we tested the procedure for samples with stability index α ∈ {1, 1.05, . . ., 2}. In the case of Student's t distribution, we selected the number of degrees of freedom in the range ν ∈ {1, 2, . . ., 100}. We simulated trajectories with sample lengths n = 10, 50, 100, 200, 500, 1000. The results are presented in Fig. 4 and Fig. 5. For the α-stable distribution (see Fig. 4), the power of the test introduced in this paper decreases with increasing value of the stability index α, that is, as the α-stable distribution tends to the Gaussian distribution. Thus, the power increases as the stability index α moves away from 2, which represents the Gaussian distribution. Moreover, for smaller sample sizes, the proposed MG test performs better than, or at least as well as, any alternative test considered in this paper. The MG test is more efficient especially for n = 10, 50, which means that when α < 2, the MG test tends to correctly reject $H_0$ more often than the other analyzed techniques. For n = 100 and α ≥ 1.85, the JB test outperforms the MG test. However, for n = 100 and α < 1.85, the MG test performs better than the other considered tests. For n = 200, 500, 1000, the JB and SW tests slightly outperform the MG test for large values of α; however, in all considered cases the MG test is more efficient than the remaining tests known from the literature.
The results obtained for the class of Student's t distributions are presented in Fig. 5. The power of the MG test decreases as ν increases, which is the expected result, since the Student's t distribution tends to the Gaussian distribution as ν → ∞. The proposed test performs as well as the JB, SW or AD tests, and outperforms the other tests considered in this paper. Thus, as the power of the MG test is large for large values of ν, the test can be utilized to distinguish between near-Gaussian Student's t distributions (namely, when ν is large) and Gaussian distributions.
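The Monte Carlo procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact implementation used for Fig. 5: we assume the statistic takes the form $S_n = \sum_i X_i^2 / (\sum_i |X_i|)^2$ (our reading of the definition given earlier in the paper), that the samples are zero-centred, and that $H_0$ is rejected for large values of $S_n$. The function names and the reduced number of null simulations are hypothetical.

```python
import numpy as np

def modified_greenwood(x):
    """Assumed form S_n = sum(x_i^2) / (sum |x_i|)^2, taken along the last axis."""
    x = np.asarray(x, dtype=float)
    return np.sum(x**2, axis=-1) / np.sum(np.abs(x), axis=-1) ** 2

def gaussianity_critical_value(n, level=0.05, n_sim=100_000, rng=None):
    """Empirical (1 - level)-quantile of S_n under H0 (standard Gaussian samples)."""
    rng = np.random.default_rng(rng)
    s_null = modified_greenwood(rng.standard_normal((n_sim, n)))
    return np.quantile(s_null, 1.0 - level)

def power_against_student_t(n, nu, crit, n_rep=2000, rng=None):
    """Fraction of Student's t samples for which H0 is rejected (S_n > crit)."""
    rng = np.random.default_rng(rng)
    s_alt = modified_greenwood(rng.standard_t(nu, size=(n_rep, n)))
    return float(np.mean(s_alt > crit))

rng = np.random.default_rng(0)
n = 50
# Fewer null simulations than in the paper, only to keep the sketch fast.
crit = gaussianity_critical_value(n, n_sim=20_000, rng=rng)
# Empirical size: rejection rate when the data really are Gaussian.
size = float(np.mean(modified_greenwood(rng.standard_normal((2000, n))) > crit))
powers = {nu: power_against_student_t(n, nu, crit, rng=rng) for nu in (1, 3, 10, 100)}
print(f"size = {size:.3f}", powers)
```

Because the assumed statistic is scale-invariant, the null distribution does not depend on the Gaussian variance, so a single table of quantiles per sample length suffices in this sketch.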

Testing of infinite-variance distribution
In this section, we present the results obtained for testing for an infinite-variance distribution in the classes of generalized Pareto and Student's t distributions. In this case, $H_0$ corresponds to an infinite-variance distribution and $H_1$ to a finite-variance distribution. For each distribution, the quantiles of the distribution of the modified Greenwood statistic used in the testing procedures were obtained from 100000 Monte Carlo simulations. The quantile of the $S_n$ statistic was calculated for γ = 0.5 for the generalized Pareto distribution, and for ν = 2 for the Student's t distribution. The quantiles used in the testing procedure are presented in Tables B.4 and B.5 (see Appendix B). The power of each test was calculated based on 2000 Monte Carlo simulations. Moreover, we compare the performance of the modified Greenwood statistic-based test with a test known in the literature, later denoted as the T test. In [49], the author proposed a statistical test to identify whether a random sample comes from a distribution with infinite variance; the test statistic of the T test has a known theoretical distribution under $H_0$. For both the MG test and the T test, the significance level was set to c = 0.05. The power of the tests was obtained for the parameters responsible for the tail of the distributions: γ ∈ {0, 0.1, . . ., 2} for the generalized Pareto distribution, and ν ∈ {1, 2, . . ., 15} for the Student's t distribution. The simulated samples had lengths n = 10, 50, 100, 200, 500, 1000.
For the generalized Pareto distribution, the variance is infinite when γ ≥ 0.5 and finite otherwise. In Fig. 6, we present the results obtained for the MG test and the T test for infinite variance in the case of the generalized Pareto distribution. The power of the MG test decreases with increasing γ, as expected. The test outperforms the T test, even for smaller sample sizes. Namely, the MG test correctly does not reject $H_0$ for γ ≥ 0.5, while for γ < 0.5 it rejects $H_0$ in favor of $H_1$. Moreover, when n = 10, the T test rejects $H_0$ for all values of γ and fails to distinguish between infinite- and finite-variance distributed samples.
The power of the MG and T tests for the Student's t distribution is presented in Fig. 7. In this case, the variance is finite for ν > 2 and infinite for ν ≤ 2. The power of the proposed test increases as ν increases, as expected. As the sample size increases, the proposed test becomes more accurate, that is, for finite-variance distributions with ν > 2 it rightly rejects $H_0$ in favor of $H_1$ more often. Compared with the T test, the MG test is more effective for all sample sizes, as it does not reject $H_0$ for ν ∈ {1, 2}, when the distribution has infinite variance. Moreover, the T test wrongly rejects $H_0$ for all ν when n = 10, 50, failing to detect infinite variance for small sample sizes, as opposed to the MG test, which does not reject $H_0$ for ν ∈ {1, 2} and rejects $H_0$ in favor of $H_1$ otherwise.
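The infinite-variance test for the generalized Pareto class can be sketched along the same lines. The snippet below is a minimal illustration under assumptions that are ours rather than taken from the paper's code: the statistic is taken as $S_n = \sum_i X_i^2 / (\sum_i |X_i|)^2$, the generalized Pareto distribution has location 0 and scale 1 with CDF $1 - (1 + \gamma x)^{-1/\gamma}$, the critical value is the lower c-quantile of $S_n$ simulated at the boundary case γ = 0.5, and $H_0$ is rejected for small values of $S_n$. The helper names are hypothetical, and the simulation sizes are reduced for speed.

```python
import numpy as np

def modified_greenwood(x):
    """Assumed form S_n = sum(x_i^2) / (sum |x_i|)^2, taken along the last axis."""
    x = np.asarray(x, dtype=float)
    return np.sum(x**2, axis=-1) / np.sum(np.abs(x), axis=-1) ** 2

def sample_gpd(gamma, size, rng):
    """Generalized Pareto sample (location 0, scale 1) by inverse transform."""
    u = rng.uniform(size=size)
    if gamma == 0.0:
        return -np.log1p(-u)                      # exponential limit as gamma -> 0
    # ((1 - u)^(-gamma) - 1) / gamma, written in a numerically stable way
    return np.expm1(-gamma * np.log1p(-u)) / gamma

rng = np.random.default_rng(1)
n, level = 1000, 0.05

# Null distribution of S_n at the boundary of the infinite-variance region.
s_null = modified_greenwood(sample_gpd(0.5, (4000, n), rng))
crit = np.quantile(s_null, level)                 # reject H0 when S_n < crit

rejection_rate = {
    g: float(np.mean(modified_greenwood(sample_gpd(g, (500, n), rng)) < crit))
    for g in (0.0, 0.5, 1.0)
}
print(rejection_rate)
```

In this sketch the rejection rate should be high for a light tail (γ = 0), close to the nominal level at γ = 0.5, and low for a heavier tail (γ = 1), mirroring the qualitative behaviour described for Fig. 6.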

Real data analysis
In this section, we investigate the efficiency of the proposed MG test for real data. The considered problem is related to condition monitoring in the mining industry and the identification of the properties of the background noise in a vibration signal. Here, we apply the modified Greenwood statistic-based tests for the Gaussian distribution and for an infinite-variance distribution of the examined random sample. This approach is especially useful in condition monitoring, where one has to select a proper tool for fault detection based on the vibration signal; see [53] for more details about the examined problem from the condition monitoring perspective. The selected dataset was analyzed in [67,53]. The signal was obtained from a hammer crusher in good condition. A hammer crusher is a machine used for fragmentation of lumps of copper ore. The dataset consists of 25500000 observations collected during 1.7 minutes.
In our analysis, we apply the testing procedure to the raw signal (time domain) and to its spectrogram representation (time-frequency domain), since the time-frequency representation is usually examined for vibration signals. As the signal in the time domain has 25500000 observations, we segmented the data into 25500 non-overlapping sub-signals, each consisting of 1000 observations. In the time domain, the proposed approach is applied to these sub-signals.
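The segmentation step and the per-segment evaluation of the statistic can be illustrated as follows. This is a sketch only: the signal below is a short synthetic stand-in (not the crusher data), the statistic is assumed to take the form $S_n = \sum_i X_i^2 / (\sum_i |X_i|)^2$, and the helper names `segment` and `rejection_percentage` are hypothetical.

```python
import numpy as np

def modified_greenwood(x):
    """Assumed form S_n = sum(x_i^2) / (sum |x_i|)^2, taken along the last axis."""
    x = np.asarray(x, dtype=float)
    return np.sum(x**2, axis=-1) / np.sum(np.abs(x), axis=-1) ** 2

def segment(signal, length):
    """Split a 1-D signal into non-overlapping rows of the given length.

    Trailing observations that do not fill a whole segment are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_seg = signal.size // length
    return signal[: n_seg * length].reshape(n_seg, length)

def rejection_percentage(stats, crit, reject_above=True):
    """Percentage of sub-signals whose statistic falls in the rejection region."""
    stats = np.asarray(stats, dtype=float)
    rejected = stats > crit if reject_above else stats < crit
    return 100.0 * float(np.mean(rejected))

rng = np.random.default_rng(2)
toy_signal = rng.standard_t(2, size=25_500)   # synthetic heavy-tailed stand-in
segments = segment(toy_signal, 1000)          # 25 rows of 1000 observations
stats = modified_greenwood(segments)          # one S_n value per sub-signal
print(segments.shape, stats.shape)
```

Given a critical value obtained by simulation under $H_0$, `rejection_percentage(stats, crit)` then yields the kind of summary reported per domain in the tables.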
Let us note that the lower frequency range usually contains some periodic components [53]. Thus, in order to analyze the signal in the time-frequency domain, we first applied high-pass filtering to the data in the time domain and analyzed the higher range of frequencies, from 1 kHz to 12.5 kHz. The spectrogram S(f, t) is defined as the squared modulus of the short-time Fourier transform of the given signal [53], namely
$$S(f, t) = \left| \sum_{k=1}^{n} x_k \, w(t - k) \, e^{-2\pi i f k / n} \right|^2,$$
where $x_1, x_2, \ldots, x_n$ are the observations in the time domain, w(•) is a window function, t ∈ T is a time point and f ∈ F is the frequency. Hence, the spectrogram is a two-dimensional map. It is worth noting that the number of time points ñ in the time-frequency domain depends on the window function, the overlapping parameter and the number of observations in the time domain. In the testing procedure, the vectors $S(f, t_1), S(f, t_2), \ldots, S(f, t_{ñ})$ are considered as the random samples. Thus, for each frequency we obtain a sub-signal of length ñ. As the MG test is valid for independent data, in our analysis the overlapping parameter is set to 0.
The selected window function was kaiser(2000, 5) in MATLAB. Thus, in the time-frequency domain we obtained 943 sub-signals, each consisting of 1275 observations. For such data, we first apply the test for the Gaussian distribution and then the test for an infinite-variance distribution, in order to confirm the results presented in [67,53]. In the modified Greenwood statistic-based test for the Gaussian distribution, $H_0$ corresponds to a Gaussian distribution and $H_1$ to a non-Gaussian distribution. The distribution of the $S_n$ statistic under $H_0$ is obtained from 10000 Monte Carlo simulations. In the MG test for infinite variance, $H_0$ corresponds to an infinite-variance distribution and $H_1$ to a finite-variance distribution. We apply the testing procedure for two classes of distributions: generalized Pareto and Student's t. The distribution of the $S_n$ statistic under $H_0$ is obtained from 10000 Monte Carlo simulations for γ = 0.5 and ν = 2 for the generalized Pareto and Student's t distributions, respectively. For the analysis in the time-frequency domain, the distribution of $S_n$ was obtained for spectrograms of simulated random samples from the Gaussian distribution, and from the generalized Pareto and Student's t distributions with γ = 0.5 and ν = 2, respectively.
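A spectrogram with zero overlap, as used above, can be sketched with NumPy alone. The snippet is illustrative and rests on our own assumptions: `np.kaiser(2000, 5.0)` is used as the counterpart of the MATLAB window kaiser(2000, 5), the FFT length `nfft=2048` is a hypothetical choice that is not stated in the text, the high-pass filtering step is omitted, and the input is synthetic Gaussian noise rather than the crusher signal.

```python
import numpy as np

def spectrogram_no_overlap(x, win_len=2000, beta=5.0, nfft=2048):
    """Squared modulus of a short-time Fourier transform with zero overlap.

    Returns an array of shape (n_freq, n_frames); each row is the sequence
    S(f, t_1), ..., S(f, t_n) for one frequency bin f.
    """
    x = np.asarray(x, dtype=float)
    n_frames = x.size // win_len
    # Non-overlapping frames: consecutive blocks of win_len observations.
    frames = x[: n_frames * win_len].reshape(n_frames, win_len)
    window = np.kaiser(win_len, beta)
    # Zero-padded real FFT of each windowed frame.
    stft = np.fft.rfft(frames * window, n=nfft, axis=1)
    return (np.abs(stft) ** 2).T

rng = np.random.default_rng(3)
x = rng.standard_normal(20_000)     # synthetic stand-in for a filtered signal
S = spectrogram_no_overlap(x)
print(S.shape)                      # (number of frequency bins, number of time points)
```

Each row of `S` can then be treated as one sub-signal and passed to the testing procedure, with the null distribution of the statistic simulated from spectrograms of samples drawn under $H_0$, as described in the text.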
The results of the testing procedures are presented in Table 1 (for Gaussian distribution testing) and Table 2 (for infinite-variance distribution testing). In testing for the Gaussian distribution, the MG test rejects $H_0$ for 89% of the sub-signals in the time domain and for all frequencies in the time-frequency domain. Thus, we conclude that the signal is not Gaussian-distributed. In the MG test for infinite variance under the assumption of the generalized Pareto distribution, $H_0$ was rejected for 45% of the sub-signals in the time domain and for 38% of the frequencies in the time-frequency domain. Thus, in the time domain, the results of the MG test with the rejection region of $S_n$ calculated for the generalized Pareto distribution are ambiguous. For the MG test based on the distribution of the $S_n$ statistic obtained for the Student's t distribution, $H_0$ was rejected for 25% of the sub-signals in the time domain and for 11% of the frequencies in the spectrogram representation. Thus, the distribution of the noise of the signal can be classified as infinite-variance, since in both the time and time-frequency domains $H_0$ is not rejected for the majority of the sub-signals. The obtained results confirm those of both [67] and [53], where the signal was also classified as infinite-variance distributed.

Summary and Conclusions
In this paper, we discuss the modified Greenwood statistic, which is a simple extension of the popular Greenwood statistic widely discussed in the literature. The modified Greenwood statistic can be applied to random samples from any distribution; thus, its applicability is much wider than that of the classical Greenwood statistic, which is defined only for samples from positive-valued distributions. Here, we discuss the main properties of the proposed statistic for general random samples, with particular attention paid to the stochastic monotonicity within three general classes of distributions: the generalized Pareto, α-stable, and Student's t distributions. The class of Student's t distributions is enriched with the Gaussian distribution, which is considered in this class as the limiting distribution. The stochastic monotonicity of the modified Greenwood statistic is discussed with respect to the parameters responsible for the tail behavior in the considered classes of distributions, namely γ (in the case of the generalized Pareto distribution), the stability index α (in the case of the α-stable distribution), and the number of degrees of freedom ν (in the case of the Student's t distribution).
The proven theoretical properties of the modified Greenwood statistic within three general classes of distributions enable us to propose it as a test statistic. In this paper, we describe three testing scenarios where the introduced statistic can be applied. In the first scenario, the modified Greenwood statistic is applied in the goodness-of-fit test for the Gaussian distribution. This is a classical problem discussed widely in the literature, with many proposed solutions. Our proposition extends the existing literature. The presented simulation study, where we analyze the power of the test for the Gaussian distribution within the α-stable and Student's t classes of distributions, indicates that the modified Greenwood statistic-based test outperforms most of the classical tests considered, especially for small sample sizes.
In the second scenario, we propose using the modified Greenwood statistic to test for the infinite-variance distribution of a given sample. This problem is much more general than testing for a specific distribution. We note that knowledge about the finiteness of the theoretical variance is crucial for selecting appropriate tools for data analysis. This point has been extensively discussed in the context of condition monitoring, as seen in [53]; however, it is also crucial in many other areas of interest. Here, the modified Greenwood statistic proves to be a well-suited tool for the considered problem. Through the presented simulation studies for two classes of distributions, namely generalized Pareto and Student's t, we demonstrate the efficacy of the proposed approach compared to the known method proposed in [49].
The last scenario proposed in this paper involves using the modified Greenwood statistic for testing a given distribution with assumed parameters. In the simulation study, this problem is discussed in the context of testing the Gaussian distribution, so we did not perform separate simulations for the third scenario. However, we highlight the usefulness of the modified Greenwood statistic in this area as well.
To demonstrate the usefulness of the proposed approach, we consider a dataset from the condition monitoring area. It has been previously examined in the literature, where its specific properties were discussed. We analyzed a vibration signal to identify possible infinite-variance distributions within the data. Vibration signals are typically analyzed in the time-frequency domain, often using spectrograms, and our analysis was conducted accordingly. The signal under examination was also discussed in [53], where the authors proposed a simple technique to identify non-Gaussian heavy-tailed behavior. In this paper, employing the modified Greenwood statistic, we corroborated previous findings and determined that such data correspond to an infinite-variance distribution in both the time and time-frequency domains. This result is pivotal for further analysis of such data, particularly in the context of local damage detection; for more information, refer to [53].
The simulation studies and real data analysis presented clearly confirm the usefulness of the modified Greenwood statistic in testing problems and its superiority over classical approaches.

Figure 1: Comparison of the PDF of the asymptotic distribution of $S_n$ and the empirical PDFs of $S_n$ for sample lengths n = 100, 200, 500, 1000 for the three considered distributions. The empirical distributions of the statistic $S_n$ are obtained based on 10000 Monte Carlo simulations.

Figure 3: Comparison of the PDFs of $S_n$ for the α-stable distribution for different values of α and sample lengths n = 10, 50, 100. The results are obtained based on 10000 Monte Carlo simulations.

Figure 4: The power of the MG test for the Gaussian distribution obtained for the class of α-stable distributions. In this case, $H_0$ corresponds to the Gaussian distribution and $H_1$ to an α-stable distribution with α < 2. The power of the MG test is compared with the power of the KS, Kuiper, CM, Watson, DP, JB, SW, AD, LF and LR tests. The simulations were conducted for α ∈ {1, 1.05, . . ., 2} and for sample lengths n = 10, 50, 100, 200, 500, 1000. The black solid line represents the significance level c = 0.05 selected for all analyzed tests. The results are obtained based on 2000 Monte Carlo simulations.

Figure 5: The power of the MG test for the Gaussian distribution obtained for the class of Student's t distributions. In this case, $H_0$ corresponds to the Gaussian distribution and $H_1$ to the Student's t distribution. The power of the MG test is compared with the power of the KS, Kuiper, CM, Watson, DP, JB, SW, AD and LF tests. The simulations were conducted for ν ∈ {1, 2, . . ., 100} and for sample lengths n = 10, 50, 100, 200, 500, 1000. The black solid line represents the significance level c = 0.05 selected for all analyzed tests. The results are obtained based on 2000 Monte Carlo simulations.

Figure 6: Results of the MG test for infinite variance for the generalized Pareto distribution. In this case, $H_0$ corresponds to an infinite-variance distribution and $H_1$ to a finite-variance distribution. The power of the MG test is compared with the power of the T test. The power of each test was calculated based on 2000 Monte Carlo simulations. The simulations were conducted for γ ∈ {0, 0.1, . . ., 2} and for sample lengths n = 10, 50, 100, 200, 500, 1000. The black solid line represents the significance level c = 0.05.

Figure 7: Results of the MG test for infinite variance for the Student's t distribution. In this case, $H_0$ corresponds to an infinite-variance distribution and $H_1$ to a finite-variance distribution. The power of the MG test is compared with the power of the T test. The power of each test was calculated based on 2000 Monte Carlo simulations. The simulations were conducted for ν ∈ {1, 2, . . ., 15} and for sample lengths n = 10, 50, 100, 200, 500, 1000. The black solid line represents the significance level c = 0.05.

Figure 8: Analyzed signal in the time and time-frequency domains. The left panel presents the signal obtained from the crusher (time domain). The right panel presents the spectrogram of the analyzed signal (time-frequency domain).

Table 1: Results obtained in testing for the Gaussian distribution in the time domain and the time-frequency domain. In this case, $H_0$ corresponds to a Gaussian distribution and $H_1$ to a non-Gaussian distribution. We present the percentage of sub-signals for which $H_0$ was rejected.

Table 2: Results obtained in testing for infinite variance in the time domain and the time-frequency domain. In this case, $H_0$ corresponds to an infinite-variance distribution and $H_1$ to a finite-variance distribution. We present the percentage of sub-signals for which $H_0$ was rejected.