Super Generalized Central Limit Theorem: Limit distributions for sums of non-identical random variables with power-laws

Power-laws are ubiquitous in nature and society, and in the era of big data it is important to investigate their mathematical characteristics. In this paper we prove that the superposition of non-identical stochastic processes with power-laws converges in density to a unique stable distribution. This property can explain the universality of stable laws, for example why the sums of the logarithmic returns of non-identical stock price fluctuations follow stable distributions.

Introduction-. Many data in the real world follow power-laws. Examples of recent studies include, but are not limited to, financial markets [1][2][3][4][5][6][7], the distribution of people's assets [8], the distribution of waiting times between earthquakes [9], and the dependence of the number of wars on their intensity [10]. It is therefore important to investigate the general characteristics of power-laws.
In particular, for financial market data, Mandelbrot [1] first argued that the distribution of cotton price fluctuations follows a stable law. Since the 1990s, there has been a controversy as to whether the central limit theorem or the generalized central limit theorem (GCLT) [11], which concerns sums of power-law distributed variables, applies to the logarithmic returns of stock price fluctuations. Mantegna and Stanley argued that the logarithmic return follows a stable distribution with power-law index α < 2 [2, 3], and later revised this view by introducing the cubic law (α = 3) [4]. Even recently, researchers [5][6][7] have debated whether the distribution of logarithmic returns follows a power-law with α > 2 or a stable law with α < 2. On the other hand, very large data sets are needed to elucidate the true tail behavior of distributions [12]. In this respect, a recent study [7] showed that the large, high-frequency arrowhead data of the Tokyo Stock Exchange (TSE) support stable laws with 1 < α < 2.
In this study, we show that the sums of the logarithmic returns of multiple stock price fluctuations follow stable laws, and that this can be explained theoretically. To do so, we extend the GCLT to sums of independent non-identical stochastic processes. We call this extension the Super Generalized Central Limit Theorem (SGCLT).
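Summary of stable distributions and the GCLT-. The probability density function S(x; α, β, γ, µ) of a random variable X following a stable distribution [13] is defined through its characteristic function φ(t) as:

$$S(x; \alpha, \beta, \gamma, \mu) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(t)\, e^{-ixt}\, dt,$$

where φ(t; α, β, γ, µ) is expressed as:

$$\varphi(t; \alpha, \beta, \gamma, \mu) = \exp\left\{ i\mu t - \gamma^{\alpha} |t|^{\alpha} \left( 1 - i\beta\, \mathrm{sgn}(t)\, w(\alpha, t) \right) \right\}, \qquad w(\alpha, t) = \begin{cases} \tan(\pi\alpha/2) & (\alpha \neq 1), \\ -\frac{2}{\pi} \ln|t| & (\alpha = 1). \end{cases}$$

The parameters α, β, γ and µ are real constants satisfying 0 < α ≤ 2, −1 ≤ β ≤ 1, γ > 0, and denote the power-law index of the stable distribution, the skewness, the scale and the location, respectively. When α = 2 and β = 0, the probability density function is that of a normal distribution. Note that explicit forms of stable distributions are not known for general parameters α and β, except for a few cases such as the Cauchy distribution (α = 1, β = 0).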
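As an illustration of this definition, the sketch below evaluates the characteristic function and inverts it numerically to obtain the density. It assumes the common form φ(t) = exp{iµt − γ^α|t|^α(1 − iβ sgn(t) w(α, t))} with w = tan(πα/2) for α ≠ 1 and w = −(2/π) ln|t| for α = 1. The function names `stable_cf` and `stable_pdf`, the cutoff `t_max` and the step count are illustrative choices, not taken from the paper.

```python
import cmath
import math


def stable_cf(t, alpha, beta, gamma, mu):
    """Characteristic function phi(t; alpha, beta, gamma, mu) (assumed form, see lead-in)."""
    if t == 0.0:
        return complex(1.0, 0.0)
    sgn = 1.0 if t > 0 else -1.0
    if alpha == 1.0:
        w = -(2.0 / math.pi) * math.log(abs(t))
    else:
        w = math.tan(math.pi * alpha / 2.0)
    exponent = 1j * mu * t - (gamma * abs(t)) ** alpha * (1.0 - 1j * beta * sgn * w)
    return cmath.exp(exponent)


def stable_pdf(x, alpha, beta, gamma, mu, t_max=60.0, steps=60000):
    """Midpoint-rule approximation of (1/2pi) * integral of phi(t) exp(-ixt) dt.

    Because phi(-t) is the complex conjugate of phi(t), the integral over the
    real line equals (1/pi) times the integral over t > 0 of Re[phi(t) exp(-ixt)].
    The truncation at t_max is only adequate when phi has decayed by then.
    """
    h = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += (stable_cf(t, alpha, beta, gamma, mu) * cmath.exp(-1j * x * t)).real
    return total * h / math.pi


if __name__ == "__main__":
    # Cauchy case (alpha = 1, beta = 0): the density at 0 should be close to 1/pi.
    print(stable_pdf(0.0, 1.0, 0.0, 1.0, 0.0), 1.0 / math.pi)
    # alpha = 2: normal density with variance 2*gamma^2, so 1/(2*sqrt(pi)) at 0.
    print(stable_pdf(0.0, 2.0, 0.0, 1.0, 0.0), 1.0 / (2.0 * math.sqrt(math.pi)))
```

For small α the characteristic function decays slowly, so a larger `t_max` or a dedicated quadrature would be needed there.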
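A stable random variable has the following property with respect to the scale and location parameters. A random variable X follows S(α, β, γ, µ) when

$$X = \begin{cases} \gamma X_0 + \mu & (\alpha \neq 1), \\ \gamma X_0 + \mu + \frac{2}{\pi} \beta \gamma \ln \gamma & (\alpha = 1), \end{cases}$$

where X_0 ∼ S(α, β, 1, 0). When the independent random variables {X_j}, j = 1, ···, n, satisfy X_j ∼ S(x; α, β_j, γ_j, 0), with parameters that differ except for α, the superposition Z_n = (X_1 + ··· + X_n)/n^{1/α} also belongs to the stable distribution family:

$$Z_n = \frac{X_1 + \cdots + X_n}{n^{1/\alpha}} \sim S(\alpha, \bar\beta, \bar\gamma, \bar\mu), \tag{2}$$

where the parameters β̄, γ̄ and μ̄ are expressed as:

$$\bar\beta = \frac{\sum_{j=1}^{n} \beta_j \gamma_j^{\alpha}}{\sum_{j=1}^{n} \gamma_j^{\alpha}}, \qquad \bar\gamma = \left( \frac{1}{n} \sum_{j=1}^{n} \gamma_j^{\alpha} \right)^{1/\alpha}, \qquad \bar\mu = \begin{cases} 0 & (\alpha \neq 1), \\ \frac{2 \ln n}{\pi n} \sum_{j=1}^{n} \beta_j \gamma_j & (\alpha = 1). \end{cases}$$

This follows immediately from the fact that the characteristic function of a sum of independent random variables is the product of their characteristic functions:

$$\varphi(t; \alpha, \bar\beta, \bar\gamma, \bar\mu) = \prod_{j=1}^{n} \varphi\!\left( \frac{t}{n^{1/\alpha}}; \alpha, \beta_j, \gamma_j, 0 \right).$$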
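The product identity can be checked numerically. The sketch below assumes the characteristic function exp{iµt − γ^α|t|^α(1 − iβ sgn(t) w(α, t))} with w = tan(πα/2) for α ≠ 1 and w = −(2/π) ln|t| for α = 1. It also assumes the combined parameters β̄ = Σβ_jγ_j^α / Σγ_j^α and γ̄ = (Σγ_j^α / n)^{1/α}, with μ̄ = 0 for α ≠ 1 and μ̄ = (2 ln n / (πn)) Σβ_jγ_j for α = 1. The helper names `combine` and `max_cf_gap` are hypothetical.

```python
import cmath
import math


def stable_cf(t, alpha, beta, gamma, mu):
    """Assumed characteristic function of S(alpha, beta, gamma, mu)."""
    if t == 0.0:
        return complex(1.0, 0.0)
    sgn = 1.0 if t > 0 else -1.0
    if alpha == 1.0:
        w = -(2.0 / math.pi) * math.log(abs(t))
    else:
        w = math.tan(math.pi * alpha / 2.0)
    return cmath.exp(1j * mu * t - (gamma * abs(t)) ** alpha * (1.0 - 1j * beta * sgn * w))


def combine(alpha, betas, gammas):
    """Parameters of Z_n = (X_1 + ... + X_n) / n**(1/alpha) for X_j ~ S(alpha, beta_j, gamma_j, 0)."""
    n = len(gammas)
    s = sum(g ** alpha for g in gammas)
    beta_bar = sum(b * g ** alpha for b, g in zip(betas, gammas)) / s
    gamma_bar = (s / n) ** (1.0 / alpha)
    if alpha == 1.0:
        # The logarithmic term of the alpha = 1 characteristic function shifts the location.
        mu_bar = 2.0 * math.log(n) / (math.pi * n) * sum(b * g for b, g in zip(betas, gammas))
    else:
        mu_bar = 0.0
    return beta_bar, gamma_bar, mu_bar


def max_cf_gap(alpha, betas, gammas, ts):
    """Largest |product of rescaled CFs - CF with combined parameters| over the points ts."""
    n = len(gammas)
    beta_bar, gamma_bar, mu_bar = combine(alpha, betas, gammas)
    gap = 0.0
    for t in ts:
        prod = complex(1.0, 0.0)
        for b, g in zip(betas, gammas):
            prod *= stable_cf(t / n ** (1.0 / alpha), alpha, b, g, 0.0)
        gap = max(gap, abs(prod - stable_cf(t, alpha, beta_bar, gamma_bar, mu_bar)))
    return gap


if __name__ == "__main__":
    ts = [-2.0, -0.5, 0.3, 1.7]
    for alpha in (0.7, 1.0, 1.5):
        print(alpha, max_cf_gap(alpha, [0.3, -0.5, 1.0], [0.5, 1.0, 2.0], ts))
```

The gap should be at the level of floating-point rounding for every α, including α = 1, where the location shift is needed.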
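We now focus on the GCLT. Let f(x) be the probability density function of a random variable X that satisfies, for 0 < α < 2:

$$f(x) \simeq \begin{cases} c_{+}\, x^{-(\alpha+1)} & (x \to +\infty), \\ c_{-}\, |x|^{-(\alpha+1)} & (x \to -\infty), \end{cases} \tag{3}$$

with c_+, c_− > 0 being real constants. Then, according to the GCLT [11], the superposition of independent, identically distributed random variables X_1, ···, X_n converges in density to a unique stable distribution S(x; α, β, γ, 0) for n → ∞, that is

$$Y_n = \frac{\sum_{i=1}^{n} X_i - A_n}{n^{1/\alpha}} \xrightarrow{d} S(\alpha, \beta, \gamma, 0), \qquad A_n = \begin{cases} 0 & (0 < \alpha < 1), \\ n^{2}\, \Im \ln\!\left( \phi_X(1/n) \right) & (\alpha = 1), \\ n\, E[X] & (1 < \alpha < 2), \end{cases} \tag{4}$$

where ϕ_X is the characteristic function of X, that is, the expected value of exp(itX), E[X] is the expectation value of X, ℑ denotes the imaginary part of its argument, and the parameters β and γ are expressed as:

$$\beta = \frac{c_{+} - c_{-}}{c_{+} + c_{-}}, \qquad \gamma = \left\{ \frac{\pi (c_{+} + c_{-})}{2 \alpha \sin(\pi\alpha/2)\, \Gamma(\alpha)} \right\}^{1/\alpha},$$

with Γ being the Gamma function. When α = 2, we obtain µ = ∫ x f(x) dx and σ² = ∫ x² f(x) dx, and the superposition Y_n of the independent, identically distributed random variables converges in density to a normal distribution.

Our generalization-. We consider an extension of this existing theorem to sums of non-identical random variables. In what follows we assume that the random variables {X_i}, i = 1, ···, n, satisfy the following two conditions.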
(Condition 1): The random variables C_+ > 0 and C_− > 0 obey the distributions P_{c+}(c) and P_{c−}(c), respectively, and satisfy E[C_+] < ∞ and E[C_−] < ∞.

(Condition 2): The probability density function f_i(x) of the random variable X_i satisfies, for 0 < α < 2:

$$f_i(x) \simeq \begin{cases} c_{+i}\, x^{-(\alpha+1)} & (x \to +\infty), \\ c_{-i}\, |x|^{-(\alpha+1)} & (x \to -\infty), \end{cases}$$

where c_{+i} and c_{−i} are samples drawn from C_+ and C_−. We emphasize that a probability distribution function may not be obtained even when we integrate f_i(x) over c_{+i} and c_{−i}.
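Under Condition 2 each X_i has its own pair of tail constants. Assuming the standard GCLT relations β = (c_+ − c_−)/(c_+ + c_−) and γ = {π(c_+ + c_−) / (2α sin(πα/2) Γ(α))}^{1/α}, the stable parameters attached to one pair of tail constants can be computed as in the sketch below. The function name `tail_to_stable` is hypothetical.

```python
import math


def tail_to_stable(alpha, c_plus, c_minus):
    """Map tail constants (c_plus, c_minus) to (beta, gamma) via the assumed GCLT relations."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must satisfy 0 < alpha < 2")
    if c_plus <= 0.0 or c_minus <= 0.0:
        raise ValueError("c_plus and c_minus must be positive")
    beta = (c_plus - c_minus) / (c_plus + c_minus)
    denom = 2.0 * alpha * math.sin(math.pi * alpha / 2.0) * math.gamma(alpha)
    gamma = (math.pi * (c_plus + c_minus) / denom) ** (1.0 / alpha)
    return beta, gamma


if __name__ == "__main__":
    # The standard Cauchy density 1/(pi*(1+x^2)) has tails (1/pi) * |x|^(-2),
    # so alpha = 1 and c_plus = c_minus = 1/pi should give beta = 0, gamma = 1.
    print(tail_to_stable(1.0, 1.0 / math.pi, 1.0 / math.pi))
```

The Cauchy case is a convenient sanity check because its stable parameters are known exactly.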
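The main claim of this paper is the following generalization of the GCLT: the superposition S_n of non-identical random variables with power-laws converges in density to a unique stable distribution S(x; α, β*, γ*, 0) for n → ∞, where

$$S_n = \frac{\sum_{i=1}^{n} X_i - A_n}{n^{1/\alpha}} \xrightarrow{d} S(\alpha, \beta^{*}, \gamma^{*}, 0), \qquad A_n = \begin{cases} 0 & (0 < \alpha < 1), \\ n\, \Im \sum_{i=1}^{n} \ln\!\left( \phi_i(1/n) \right) & (\alpha = 1), \\ \sum_{i=1}^{n} E[X_i] & (1 < \alpha < 2), \end{cases} \tag{6}$$

with ϕ_i being the characteristic function of X_i, that is, the expected value of exp(itX_i), and the parameters β*, γ*, β_i, γ_i are expressed as:

$$\beta^{*} = \frac{E_{C_+,C_-}[\beta_i \gamma_i^{\alpha}]}{E_{C_+,C_-}[\gamma_i^{\alpha}]}, \qquad \gamma^{*} = \left( E_{C_+,C_-}[\gamma_i^{\alpha}] \right)^{1/\alpha}, \qquad \beta_i = \frac{c_{+i} - c_{-i}}{c_{+i} + c_{-i}}, \qquad \gamma_i = \left\{ \frac{\pi (c_{+i} + c_{-i})}{2 \alpha \sin(\pi\alpha/2)\, \Gamma(\alpha)} \right\}^{1/\alpha}.$$

Here E_{C+,C−}[X] denotes the expectation value of X with respect to the random parameter distributions P_{c+} and P_{c−}.

Proof-. Although the following argument is not mathematically rigorous, we give an intuitive proof.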
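The probability density function of the random variables {X_j}, j = 1, ···, N, satisfying Conditions 1-2 is expressed as:

$$f_j(x) \simeq \begin{cases} c_{+j}\, x^{-(\alpha+1)} & (x \to +\infty), \\ c_{-j}\, |x|^{-(\alpha+1)} & (x \to -\infty), \end{cases}$$

where c_{+j} > 0 and c_{−j} > 0 are samples of C_+ and C_−, which satisfy E[C_+] > 0 and E[C_−] > 0. The superposition S_N is then defined as:

$$S_N = \frac{\sum_{j=1}^{N} X_j - A_N}{N^{1/\alpha}}, \qquad A_N = \begin{cases} 0 & (0 < \alpha < 1), \\ N\, \Im \sum_{j=1}^{N} \ln\!\left( \phi_j(1/N) \right) & (\alpha = 1), \\ \sum_{j=1}^{N} E[X_j] & (1 < \alpha < 2), \end{cases}$$

where ϕ_j is the characteristic function of X_j. On the other hand, let N′ = M × N for some M, and let {X_{ij}}, i = 1, ···, M, j = 1, ···, N, be samples drawn from the same parent distribution as X_j for each j. Then the X_{ij} are independent and, at a fixed index j, identically distributed for i = 1, ···, M. We define the superposition S_{N′} as follows:

$$S_{N'} = \frac{\sum_{j=1}^{N} \sum_{i=1}^{M} X_{ij} - A_{N'}}{N'^{1/\alpha}}.$$

Here, we do not consider the convergence in density of S_N for N → ∞ directly, but consider the superposition S_{N′} for N′ → ∞, since S_N will converge to the same limiting distribution as S_{N′} if S_N converges in density.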
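We focus on the convergence in density of S_{N′} for M → ∞ and N → ∞ as follows. We express the term A_{N′} in S_{N′} as A_{N′} = Σ_{j=1}^{N} A_{Mj} with the following A_{Mj} (j = 1, ···, N):

$$A_{Mj} = \begin{cases} 0 & (0 < \alpha < 1), \\ M^{2}\, \Im \ln\!\left( \phi_j(1/M) \right) & (\alpha = 1), \\ M\, E[X_j] & (1 < \alpha < 2). \end{cases}$$

Here, the superposition S_{N′} is described as:

$$S_{N'} = \frac{1}{N^{1/\alpha}} \sum_{j=1}^{N} Y_{Mj}, \qquad Y_{Mj} = \frac{\sum_{i=1}^{M} X_{ij} - A_{Mj}}{M^{1/\alpha}}.$$

Then, Y_{Mj} converges in density to S(α, β_j, γ_j, 0) for M → ∞ according to the GCLT (4), that is

$$Y_{Mj} \xrightarrow{d} S(\alpha, \beta_j, \gamma_j, 0) \quad (M \to \infty),$$

where β_j and γ_j are

$$\beta_j = \frac{c_{+j} - c_{-j}}{c_{+j} + c_{-j}}, \qquad \gamma_j = \left\{ \frac{\pi (c_{+j} + c_{-j})}{2 \alpha \sin(\pi\alpha/2)\, \Gamma(\alpha)} \right\}^{1/\alpha}.$$

Thus, with the stable property (2), we obtain the convergence of the superposition S_{N′} as follows:

$$S_{N'} \xrightarrow{d} S(\alpha, \beta^{*}, \gamma^{*}, 0) \quad (M \to \infty,\ N \to \infty),$$

where β* and γ* are:

$$\beta^{*} = \lim_{N \to \infty} \frac{\sum_{j=1}^{N} \beta_j \gamma_j^{\alpha}}{\sum_{j=1}^{N} \gamma_j^{\alpha}} = \frac{E_{C_+,C_-}[\beta_j \gamma_j^{\alpha}]}{E_{C_+,C_-}[\gamma_j^{\alpha}]}, \qquad \gamma^{*} = \lim_{N \to \infty} \left( \frac{1}{N} \sum_{j=1}^{N} \gamma_j^{\alpha} \right)^{1/\alpha} = \left( E_{C_+,C_-}[\gamma_j^{\alpha}] \right)^{1/\alpha}.$$

This proves that the superposition S_{N′} converges in density to S(α, β*, γ*, 0). Figure 1 illustrates the concept of this proof. As above, the superposition S_{N′} of non-identical stochastic processes converges in density to a unique stable distribution.

[Figure 1: concept of the proof.]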
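The limiting parameters can be estimated from samples of the tail constants. The sketch below assumes β* = E[β_i γ_i^α] / E[γ_i^α] and γ* = (E[γ_i^α])^{1/α}, with β_i and γ_i given by the GCLT relations, and replaces the expectations by sample means. Under those assumed relations β_i γ_i^α is proportional to c_{+i} − c_{−i} and γ_i^α to c_{+i} + c_{−i}, so the result depends only on the means of C_+ and C_−, which the second function uses as a cross-check. All function names and the uniform distributions in the usage example are illustrative.

```python
import math
import random


def tail_to_stable(alpha, c_plus, c_minus):
    """Assumed GCLT relations between tail constants and (beta, gamma)."""
    beta = (c_plus - c_minus) / (c_plus + c_minus)
    denom = 2.0 * alpha * math.sin(math.pi * alpha / 2.0) * math.gamma(alpha)
    gamma = (math.pi * (c_plus + c_minus) / denom) ** (1.0 / alpha)
    return beta, gamma


def sgclt_params(alpha, c_plus_samples, c_minus_samples):
    """Sample-mean estimates of (beta*, gamma*) from paired tail constants."""
    n = len(c_plus_samples)
    mean_g = 0.0   # estimate of E[gamma_i^alpha]
    mean_bg = 0.0  # estimate of E[beta_i * gamma_i^alpha]
    for cp, cm in zip(c_plus_samples, c_minus_samples):
        beta_i, gamma_i = tail_to_stable(alpha, cp, cm)
        mean_g += gamma_i ** alpha / n
        mean_bg += beta_i * gamma_i ** alpha / n
    return mean_bg / mean_g, mean_g ** (1.0 / alpha)


def sgclt_params_from_means(alpha, mean_c_plus, mean_c_minus):
    """Equivalent shortcut: apply the GCLT relations to the mean tail constants."""
    return tail_to_stable(alpha, mean_c_plus, mean_c_minus)


if __name__ == "__main__":
    rng = random.Random(0)
    cps = [rng.uniform(0.5, 1.5) for _ in range(1000)]  # illustrative P_{c+}
    cms = [rng.uniform(1.0, 3.0) for _ in range(1000)]  # illustrative P_{c-}
    print(sgclt_params(1.5, cps, cms))
    print(sgclt_params_from_means(1.5, sum(cps) / len(cps), sum(cms) / len(cms)))
```

Both calls should print the same pair up to rounding, which is why only finite means of C_+ and C_− are needed in Condition 1.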
To verify the main claim numerically, we use two kinds of tests: the two-sample Kolmogorov-Smirnov (KS) test [14] and the two-sample Anderson-Darling (AD) test [15], both at the 5% significance level. We generate two data sets by different methods and examine the P-values of both tests. Unless the null hypothesis is rejected, we judge that the two data sets follow the same distribution.
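The decision rule for the KS test can be sketched as follows. This is a minimal standard-library illustration, not the implementation used for the tables: the P-value uses the asymptotic Kolmogorov series with a common small-sample correction, the AD test is not covered, and the function names are hypothetical.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation equal to x.
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def ks_pvalue(d, na, nb, terms=100):
    """Asymptotic P-value Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    n_eff = na * nb / (na + nb)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return 1.0  # the series is numerically 1 here and converges slowly
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def same_distribution(a, b, level=0.05):
    """True unless the null hypothesis is rejected at the given significance level."""
    return ks_pvalue(ks_statistic(a, b), len(a), len(b)) >= level


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(500)]
    y = [rng.gauss(3.0, 1.0) for _ in range(500)]
    print(ks_statistic(x, y), same_distribution(x, y))
```

A result of `True` only means the test found no evidence against equality at the chosen level, which is the sense in which the paper judges two data sets to follow the same distribution.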
For the first data set, we generate non-identical stochastic processes satisfying Conditions 1-2 and prepare their superposition in the same way as (6). For the second data set, we generate random numbers that follow the stable distribution to which the first data set should converge according to (6). Note that we compare the superposition not with a cumulative distribution function but with random numbers obtained from another numerical method described below, since the cumulative distribution function of a stable distribution cannot be expressed explicitly except for a few cases.
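One widely used way to draw stable random numbers is the Chambers-Mallows-Stuck method. The sketch below uses it as an assumption, since this text does not say which generator the authors used, and it targets the parametrization exp{iµt − γ^α|t|^α(1 − iβ sgn(t) w(α, t))} with w = tan(πα/2) for α ≠ 1 and w = −(2/π) ln|t| for α = 1. The function name `stable_rvs` is hypothetical.

```python
import math
import random


def stable_rvs(alpha, beta, gamma, mu, rng):
    """Draw one sample of S(alpha, beta, gamma, mu) by the Chambers-Mallows-Stuck method."""
    v = (rng.random() - 0.5) * math.pi  # uniform angle between -pi/2 and pi/2
    w = rng.expovariate(1.0)            # standard exponential variate
    while w == 0.0:                     # guard against a (very unlikely) zero draw
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        half_pi = math.pi / 2.0
        x = (2.0 / math.pi) * (
            (half_pi + beta * v) * math.tan(v)
            - beta * math.log(half_pi * w * math.cos(v) / (half_pi + beta * v))
        )
        # Location correction of this parametrization at alpha = 1.
        return gamma * x + (2.0 / math.pi) * beta * gamma * math.log(gamma) + mu
    t = beta * math.tan(math.pi * alpha / 2.0)
    b = math.atan(t) / alpha
    s = (1.0 + t * t) ** (1.0 / (2.0 * alpha))
    x = (
        s * math.sin(alpha * (v + b)) / math.cos(v) ** (1.0 / alpha)
        * (math.cos(v - alpha * (v + b)) / w) ** ((1.0 - alpha) / alpha)
    )
    return gamma * x + mu


if __name__ == "__main__":
    rng = random.Random(2024)
    sample = [stable_rvs(1.5, 0.5, 1.0, 0.0, rng) for _ in range(10)]
    print(sample)
```

Two quick consistency checks follow from the formulas: for α = 1, β = 0 the draw reduces to tan(v), a standard Cauchy variate, and for α = 2 it reduces to 2 sin(v) √w, a normal variate with variance 2γ².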
For the first data set, let us consider the chaotic dynamical system x_{n+1} = g(x_n), where g(x) is the map defined in [16] for 0 < α < 2. This mapping has a mixing property and an ergodic invariant density for almost all initial points x_0. One of the authors (KU) obtained an explicit asymmetric power-law distribution as its invariant density [16]. For x → ±∞, this asymmetric distribution behaves as a power-law of exactly the same form as the condition (3) of the GCLT for the random variable X. Then, by letting the parameters δ_1 and δ_2 of the map be randomly distributed, we can obtain various different distributions with the same power-law index. We regard the parameters δ_{1i} and δ_{2i} as random samples drawn from ∆_1 and ∆_2, where ∆_1 and ∆_2 obey P_{δ1}(δ) and P_{δ2}(δ), respectively. These distributions are defined for δ > 0 and have finite mean.
Then the parameters c_{+i} and c_{−i} are determined by δ_{1i} and δ_{2i}, and E[C_+] < ∞ and E[C_−] < ∞ are also satisfied, since δ_{1i} and δ_{2i} are nonzero samples from random variables ∆_1 and ∆_2 with finite mean. In this way, we obtain stochastic processes satisfying Conditions 1-2.
With the two data sets obtained in this way, we examine whether the superposition S_N = (Σ_{i=1}^{N} X_i − A_N)/N^{1/α} numerically converges in density to the stable distribution S(x; α, β*, γ*, 0). Tables I and II show the P-values of the KS test and the AD test for each α, ∆_1, ∆_2. The constant L is the length of each sequence and N is the number of sequences used for the superposition. U(a, b) denotes the uniform distribution on (a, b). Figure 2 illustrates an example of the correspondence when α = 1. "Crand(0, 1)" denotes random numbers that follow the standard Cauchy distribution. This case shows that the integral average of the probability distribution function with the Cauchy distribution is not uniquely determined.
As can be seen from Tables I and II, the null hypothesis cannot be rejected for any value of α. In other words, the distribution of the superposition S_N and the stable distribution S(x; α, β*, γ*, 0) are close enough in density, in agreement with our SGCLT.
In Figure 3, we can see that the superposition of non-identically distributed random variables converges.
Conclusions-. We have further generalized the GCLT to sums of independent non-identical stochastic processes with the same power-law index α. Our main claim, the SGCLT, may find broad application since various types of power-laws exist in nature. In particular, the SGCLT supports the argument for the ubiquity of stable laws: the logarithmic returns of multiple stock price fluctuations follow a stable distribution with 1 < α < 2 when they are regarded as sums of non-identical random variables with power-laws. Take stock market data as an example. When the distributions of the logarithmic returns of the individual stock price fluctuations have almost the same power-law exponent but different scale parameters (c_+, c_−), we can obtain trends or indicators according to this SGCLT.
The authors thank Dr. Shin-itiro Goto (Kyoto University) for stimulating discussions.