Parameter estimation for stable distributions with application to commodity futures log returns

This paper explores the theory behind the rich and robust family of $\alpha$-stable distributions to estimate parameters from financial asset log-returns data. We discuss four parameter estimation methods: the quantiles method, the logarithmic moments method, maximum likelihood (ML), and the empirical characteristic function (ECF) method. The contribution of the paper is two-fold. First, we discuss the above parametric approaches and investigate their performance through error analysis. In particular, we argue that the ECF performs better than the ML over a wide range of shape parameter values $\alpha$, including values close to 0 and 2, and that the ECF has a better convergence rate than the ML. Secondly, we compare the t location-scale distribution to the general stable distribution and show that the former fails to capture skewness which might exist in the data. This is observed by applying the ECF to commodity futures log-returns data to obtain the skewness parameter.

ABOUT THE AUTHOR Mr M. Kateregga is a PhD student in the final stage of his studies at the University of Cape Town in South Africa. His research is in the field of mathematical finance, and his PhD thesis is entitled Stable Distributions with Applications in Finance. The current paper is a chapter of his thesis, which is due for submission in August 2017. Mr Kateregga is also a researcher at the African Collaboration for Quantitative Finance and Risk Research (ACQuFRR), the research section of the African Institute of Financial Markets and Risk Management (AIFMRM), which delivers postgraduate education and training in financial markets, risk management and quantitative finance. Mr Kateregga also works with the African Institute for Mathematical Sciences (AIMS) in South Africa as a Research Assistant.

PUBLIC INTEREST STATEMENT
This paper, entitled "Parameter estimation for stable distributions with application to commodity futures log returns", is useful to individuals interested in investing their wealth in financial markets. It provides essential information on how historical asset prices can inform future market movements via parameter estimation, which is crucial to portfolio managers, speculators and hedgers. It is therefore imperative that the most accurate estimation method is established. The distribution of market data deviates from the normal distribution: it exhibits skewness, high or low peaks, and fat or skinny tails. The current paper aims to establish the best estimation method for skewed data among the methods known in economic and financial analysis.

Introduction
The motivation for this paper derives from the fact that parameter estimation from historical data is an important analysis for financial market participants. It provides useful information for portfolio managers, speculators and hedgers. It is therefore imperative that the most accurate estimation method is established. It is well known that, in general, market data deviate from the Gaussian distribution: their distribution is skewed, high- or low-peaked, and/or has fat or skinny tails. The current paper aims to establish a better parameter estimation method among the commonly known ECF, ML, quantile, and logarithmic moments methods used in economic and financial analysis for skewed data assumed to follow stable distributions.
The application of stable distributions in finance can be traced back to the late 1950s, when Mandelbrot (1959, 1962, 1963) developed a hypothesis that revolutionized the way economists viewed and interpreted prices in speculative markets such as grain and securities markets. The hypothesis suggested that prices were not Gaussian, as had previously been believed by market participants following Bachelier (1900). Mandelbrot's hypothesis was therefore an extension of the widely embraced breakthrough of Bachelier (1900).
In the following years, Zolotarev (1964) developed integral representations of stable laws, and these results have been used to develop parameter estimation techniques for stable laws. Fama (1963) reviewed the validity of Mandelbrot's hypothesis and developed statistical tools suitable for dealing with speculative prices. Dumouchel (1971) employed this class of distributions in statistical inference for long-tailed data. Graphical representations of their densities and the estimation of their parameters appear in Holt and Crow (1973) via interpolation and in Koutrouvelis (1980) via regression. Quantile-based parameter estimation methods are presented in Fama and Roll (1971) for symmetric stable distributions, but this approach suffers from the discontinuity of the traditional location parameter in asymmetric cases when the exponent parameter passes unity. A remedy and generalization of the quantile approach was later introduced by McCulloch (1986).
A different parameter estimation technique, based on fractional lower order moments (FLOM), appears in Ma and Nikias (1995), where the authors develop new methods for estimating parameters in impulsive signal environments. However, their methods only cover symmetric stable distributions, so an extension to asymmetric distributions was needed. This extension was provided by Kuruoğlu (2001), where a generalized FLOM method is introduced. Generally, FLOM methods require inverting the sinc function, and this in turn affects the accuracy of the results. Consequently, a better estimation approach, the logarithmic moments (LM) method, is also proposed by Kuruoğlu (2001) to avoid having to compute the sinc function.
The third estimation method uses maximum likelihood (ML). The ML approach is widely favored in economic and financial applications due to its generality and asymptotic efficiency (see, for instance, Yu, 2004). However, the ML method can be unreliable, especially when the likelihood function is not tractable, is not bounded over the parameter space, or does not have a closed-form representation. For instance, the densities considered in this paper do not have closed-form expressions. However, since there is a one-to-one correspondence between a density function and its Fourier transform, it is worth exploiting the latter, since it always exists and is bounded. This leads us to the next estimation method.
The fourth estimation approach is the empirical characteristic function (ECF) method discussed in Yu (2004). Although the likelihood function can be unbounded, its Fourier transform is always bounded, and while the likelihood function might not be tractable or have a closed form, the Fourier transform may have a closed-form expression. The Fourier transform of the density function is the characteristic function (CF), hence the name empirical characteristic function (ECF) method. In this paper we aim to show that this approach performs better than all the previously mentioned methods. A useful software package for estimating stable distributions is provided in Nolan (1997). A more theoretical approach to the statistical estimation of the parameters of stable laws is extensively discussed in Zolotarev (1980). Readers interested in how to simulate stable processes can refer to two excellent references: Weron and Weron (1995) and Zolotarev (1986).
This paper explores the theory behind the rich and robust family of $\alpha$-stable distributions to estimate parameters from financial asset log-returns data. We discuss four parameter estimation methods: the quantiles method, the logarithmic moments method, ML, and the empirical characteristic function (ECF) method. The contribution of the paper is two-fold. First, we discuss the above parametric approaches and investigate their performance through error analysis. In particular, we argue that the ECF performs better than the ML over a wide range of shape parameter values, including values close to 0 and 2, and that the ECF has a better convergence rate than the ML. Secondly, we compare the t location-scale distribution to the general stable distribution and show that the former fails to capture skewness which might exist in the data. This is observed by applying the ECF to commodity futures log-returns data to obtain the stable parameters.
The rest of the paper is organized as follows. In Section 2 we define a stable process and its construction from independent and identically distributed random variables based on a generalized central limit theorem, and discuss its characterization. In Section 3 we study the density and distribution properties of stable processes through their characteristic functions. Section 4 explains how the four parameter estimation methods discussed in this paper work and analyzes their accuracy. In Section 5 we study and analyze some commodity data and show that the data deviate from the normal distribution hypothesis. We use the ECF to obtain the four stable parameters from the data and, in addition, fit the data to various distributions to determine the closest shape of the data, which turns out to be the t location-scale distribution for all our data. This distribution is suited to data that are highly peaked and heavy-tailed with outliers. However, since it cannot capture skewness, we propose stable distribution fitting to check for any tail asymmetry. Section 6 concludes.

Stable processes
Stable processes, also known as alpha-stable (or equivalently $\alpha$-stable) processes, belong to the general class of Lévy processes. They are limiting distributions with a characteristic exponent parameter that determines their shape.

Definition and construction
Definition 2.1 Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables and suppose a random variable $S$ is defined by
$$\frac{X_1 + X_2 + \cdots + X_n - b_n}{a_n} \Rightarrow S, \qquad n \to \infty,$$
where "⇒" represents weak convergence in distribution, $a_n$ is a positive constant and $b_n$ is real. Then $S$ is a stable process, and the sequences $a_n$ and $b_n$ need not converge to finite limits. Definition 2.1 allows a number of natural phenomena beyond normality to be modeled using stable distributions. The fact that $a_n$ and $b_n$ need not converge to finite limits leads to the generalized central limit theorem.
Definition 2.2 (Generalized Central Limit Theorem, Rachev (2003)) Suppose $X_1, X_2, \ldots$ is a sequence of independent and identically distributed random variables and let $a_n \in \mathbb{R}$ and $b_n \in \mathbb{R}^+$ be sequences. Then we can define a sequence of sums $Z_n$ whose distribution functions converge weakly to some limiting distribution:
$$Z_n = \frac{\sum_{i=1}^{n} X_i - a_n}{b_n}, \qquad \lim_{n\to\infty} \mathbb{P}(Z_n \le x) = H(x), \tag{2}$$
where $H(x)$ is some limiting distribution.
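The traditional central limit theorem assumes a finite mean $a := \mathbb{E}[X_i]$ and a finite variance $\sigma^2 := \mathrm{Var}[X_i]$, and defines the sequence of sums
$$Z_n = \frac{\sum_{i=1}^{n} X_i - na}{\sigma\sqrt{n}}$$
such that the distribution functions of $Z_n$ converge weakly to $h_{sG}(x)$:
$$\lim_{n\to\infty} \mathbb{P}(Z_n \le x) = h_{sG}(x),$$
where $h_{sG}(x)$ denotes the standard Gaussian distribution function.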
Suppose the independent and identically distributed random variables $X_i$ are equal to a positive constant $c$ almost surely, and the sequences $a_n$ and $b_n$ in (2) are defined by $a_n = (n - 1)c$ and $b_n = 1$; then $Z_n$ is also equal to $c$ for all $n > 0$ almost surely. In this case the random variables $X_i$ are trivially mutually independent and, as a result, the (degenerate) limiting distribution of the sums $Z_n$ belongs to the stable family of distributions by definition. This is one reason why they are regarded as stable.
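A stable distribution $S(\alpha, \beta, \sigma, \mu)$ is characterized by four parameters:
(1) $\alpha \in (0, 2]$ is the stability (exponent) parameter (it determines the peakedness and the heaviness of the tails).
(2) $\beta \in [-1, 1]$ is the skewness parameter.
(3) $\sigma > 0$ is the scale parameter (it narrows or widens the distribution around $\mu$).
(4) $\mu \in \mathbb{R}$ is the location parameter (it shifts the distribution to the left or the right).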
Suppose a random variable $s$ follows a stable distribution $S(\alpha, \beta, \sigma, \mu)$; then, for $\alpha \ne 1$, the random variable $z = (s - \mu)/\sigma$ has the same-shaped distribution as $s$ but with location parameter $\mu = 0$ and scale parameter $\sigma = 1$. This is another reason why these distributions are referred to as stable: the shape is preserved under shifting and rescaling.
The densities are generally computed from the characteristic functions through the inverse Fourier transform. One can also refer to the work of Zolotarev (1964, 1980, 1986) for straightforward, easy-to-compute integral representations of stable distribution and density functions. The distribution functions for different values of $\alpha$ have been tabulated in Dumouchel (1971), Fama and Roll (1968) and Holt and Crow (1973).

Special case
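Let $(X_t, t \ge 0)$ denote a Lévy process. The characterization of $X_t$ is deduced from the Lévy-Khintchine formula.
Definition 3.1 (The Lévy-Khintchine formula) Let $(X_t, t \ge 0)$ be a Lévy process. There exist $b \in \mathbb{R}$ and $\sigma \ge 0$ such that the characteristic function of $X_t$ is given by
$$\mathbb{E}\big[e^{iuX_t}\big] = \exp\Big\{t\Big[ibu - \frac{1}{2}\sigma^2u^2 + \int_{\mathbb{R}\setminus\{0\}}\big(e^{iux} - 1 - iux\,\mathbf{1}_{\{|x|<1\}}\big)\,m(dx)\Big]\Big\}, \tag{7}$$
where $\mathbf{1}_{\{\cdot\}}$ is an indicator function and $m$ is a $\sigma$-finite measure satisfying the constraint
$$\int_{\mathbb{R}\setminus\{0\}} \min(1, x^2)\,m(dx) < \infty. \tag{8}$$
Definition 3.2 (The Lévy-Itô Decomposition, Applebaum (2004)) If $X_t$ is a Lévy process, there exist $b \in \mathbb{R}$, $\sigma \ge 0$, a standard Brownian motion $B(t)$ and an independent Poisson random measure $N$ on $\mathbb{R}^+ \times (\mathbb{R} - \{0\})$ such that, for each $t \ge 0$,
$$X_t = bt + \sigma B(t) + \int_{|x|<1} x\,\tilde{N}(t, dx) + \int_{|x|\ge 1} x\,N(t, dx), \tag{9}$$
where $N(t, A)$ counts the jumps of $X$ with sizes in $A$ up to time $t$. The compensated Poisson random measure is defined by $\tilde{N}(t, dx) = N(t, dx) - t\,m(dx)$ to preserve the martingale property. The Lévy measure $m$ satisfies (8).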
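A stable distribution can be constructed by setting $\sigma$ to zero in (7), that is, setting the second term on the right of (9) to zero, and choosing the Lévy measure in (8) to be
$$m(dx) = \Big(\frac{c_1}{x^{1+\alpha}}\mathbf{1}_{\{x>0\}} + \frac{c_2}{|x|^{1+\alpha}}\mathbf{1}_{\{x<0\}}\Big)dx, \qquad c_1, c_2 \ge 0,\ c_1 + c_2 > 0,\ 0 < \alpha < 2.$$
This gives a pure jump Lévy process, which is a simple example of the stable family of distributions. We discuss the general case in the following.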

General case
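In the following, $(S_t)_{t\ge0}$ will represent a stable process. Its characteristic function $\Phi$ is obtained using the definition of the domain of attraction of stable random variables and the Lévy-Khintchine representation formula in Definition 3.1 (see Applebaum, 2004):
$$\Phi(u) = \mathbb{E}\big[e^{iuS_t}\big] = \begin{cases} \exp\Big(-\sigma^\alpha|u|^\alpha\Big[1 - i\beta\,\mathrm{sign}(u)\tan\frac{\pi\alpha}{2}\Big] + i\mu u\Big), & \alpha \ne 1,\\[4pt] \exp\Big(-\sigma|u|\Big[1 + i\beta\,\frac{2}{\pi}\,\mathrm{sign}(u)\log|u|\Big] + i\mu u\Big), & \alpha = 1. \end{cases} \tag{12}$$
Alternative forms of parametrization are discussed in McCulloch (1986) for easier numerical analysis; more discussion on this follows in Section 3.4.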
The density of $S_t$ is computed from (12) using the Fourier transform:
$$h(s) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ius}\,\Phi(u)\,du. \tag{13}$$
Figure 1 shows density graphs for different exponent parameter values. The density is defined over the whole real line, and in finance applications log-returns data, rather than raw asset prices, are usually used to fit this family of distributions. The drawback in approximating (13) is that elementary techniques, such as expressing the integral in terms of simple functions or using infinite polynomial expansions of the density function, are not sufficient for meaningful numerical analysis. Some authors propose a standard parameterized integral expression of the density (see Ament & O'Neal, 2016), which for $\alpha \ne 1$ reads
$$h(s) = \frac{1}{\pi}\int_{0}^{\infty} e^{-(\sigma u)^\alpha}\cos\Big(u(s-\mu) - \beta(\sigma u)^\alpha\tan\frac{\pi\alpha}{2}\Big)\,du.$$
However, this representation has an oscillating integrand, which in turn leads to an alternative approach presented in Zolotarev (1986), where the density of $S_t$ is expressed as an integral over a finite interval with a non-oscillating integrand.
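To make the oscillation issue concrete, the following Python sketch evaluates the single-integral density above with SciPy's adaptive quadrature (`scipy.integrate.quad`). It is an illustration under stated assumptions, not the authors' implementation: it assumes the parametrization (12), covers $\alpha \ne 1$ plus the symmetric case $\alpha = 1$, $\beta = 0$, and the function name `stable_pdf_oscillatory` is hypothetical. For small $\alpha$ the integrand decays slowly while oscillating, so `quad` may lose accuracy, which is exactly the weakness that Zolotarev's representation addresses.

```python
import numpy as np
from scipy.integrate import quad


def stable_pdf_oscillatory(s, alpha, beta, sigma=1.0, mu=0.0):
    """Stable density via the oscillatory integral
    h(s) = (1/pi) * int_0^inf exp(-(sigma u)^alpha)
           * cos(u (s - mu) - beta (sigma u)^alpha tan(pi alpha / 2)) du.

    Valid for alpha != 1, and for the symmetric case alpha = 1, beta = 0.
    """
    if alpha == 1 and beta != 0:
        raise ValueError("alpha = 1 with beta != 0 needs a different formula")
    # The skewness term vanishes exactly when beta == 0 (avoids tan(pi/2) at alpha = 1).
    skew = 0.0 if beta == 0 else beta * np.tan(np.pi * alpha / 2)

    def integrand(u):
        v = (sigma * u) ** alpha
        return np.exp(-v) * np.cos(u * (s - mu) - skew * v)

    value, _abserr = quad(integrand, 0.0, np.inf, limit=500)
    return value / np.pi
```

For example, with $\alpha = 2$ the stable law is Gaussian with variance $2\sigma^2$, so `stable_pdf_oscillatory(0.0, 2.0, 0.0)` should be close to $1/\sqrt{4\pi} \approx 0.2821$.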

Some properties of stable distribution functions
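Firstly, recall that for any admissible set of parameters with $\alpha \ne 1$ we can find two unique numbers $a > 0$ and $b$ such that
$$H(s; \alpha, \beta, \sigma, \mu) = H(as + b; \alpha, \beta, 1, 0),$$
where $a = 1/\sigma$ and $b = -\mu/\sigma$. The intuition is that a general stable distribution can be expressed in terms of a standard stable distribution. That is, we can write $S(\alpha, \beta, \sigma, \mu) \overset{d}{=} \sigma\,S(\alpha, \beta, 1, 0) + \mu$. Secondly, suppose $h$, $H$ and $\Phi$ denote the respective probability density, cumulative distribution and characteristic functions of a stable random variable $S \sim S(\alpha, \beta, 1, 0)$; then it is readily seen that the following properties hold:
(1) $h(-s, \alpha, \beta) = h(s, \alpha, -\beta)$.
(2) $H(-s, \alpha, \beta) = 1 - H(s, \alpha, -\beta)$.
(3) $\Phi(-u, \alpha, \beta) = \Phi(u, \alpha, -\beta) = \overline{\Phi(u, \alpha, \beta)}$.
The above three relations can be verified from the characteristic function using basic trigonometric properties.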

Simulating $\alpha$-stable random variables
Two excellent references for simulating stable processes are Zolotarev (1986) and Chambers, Mallows, and Stuck (1976).
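Definition 3.3 Suppose $S_t$ is a stable process with parameters $(\alpha, \beta_2, \sigma_2, \mu)$; its characteristic function is given by
$$\Phi(u) = \exp\Big\{-\sigma_2^\alpha|u|^\alpha\exp\Big(-i\beta_2\,\mathrm{sign}(u)\,\frac{\pi K(\alpha)}{2}\Big) + i\mu u\Big\},$$
where $K(\alpha) = \alpha - 1 + \mathrm{sign}(1 - \alpha)$.
Lemma 3.4 Let $\theta$ be uniformly distributed on $\big(-\frac{\pi}{2}, \frac{\pi}{2}\big)$ and let $W$ be an independent exponential random variable with mean 1. Then, for $\alpha \ne 1$,
$$S = \frac{\sin\alpha(\theta - \theta_0)}{(\cos\theta)^{1/\alpha}}\left(\frac{\cos\big(\theta - \alpha(\theta - \theta_0)\big)}{W}\right)^{(1-\alpha)/\alpha}, \qquad \theta_0 = -\frac{\pi\beta_2K(\alpha)}{2\alpha},$$
is a standard $\alpha$-stable process with parameters $(\alpha, \beta_2, 1, 0)$.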
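Proof See Zolotarev (1986). ✷ A stable random variable can easily be generated using Lemma 3.4. Programming languages such as R or MATLAB can be used to generate a random variable $U$ uniformly distributed on $\big(-\frac{\pi}{2}, \frac{\pi}{2}\big)$ and an independent exponential random variable $E$ with mean 1. Then, for $\alpha \ne 1$, a stable random variable with parameters $(\alpha, \beta, 1, 0)$ in the parametrization (12) is generated by computing
$$X = A_{\alpha,\beta}\,\frac{\sin\big(\alpha(U + B_{\alpha,\beta})\big)}{(\cos U)^{1/\alpha}}\left(\frac{\cos\big(U - \alpha(U + B_{\alpha,\beta})\big)}{E}\right)^{(1-\alpha)/\alpha},$$
where $A_{\alpha,\beta} = \big(1 + \beta^2\tan^2\frac{\pi\alpha}{2}\big)^{1/(2\alpha)}$ and $B_{\alpha,\beta} = \frac{1}{\alpha}\tan^{-1}\big(\beta\tan\frac{\pi\alpha}{2}\big)$. The variable $\sigma X + \mu$ then follows $S(\alpha, \beta, \sigma, \mu)$.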
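The Chambers-Mallows-Stuck recipe above can be sketched in Python, with NumPy standing in for R or MATLAB. This is a minimal sketch assuming the parametrization (12) and $\alpha \ne 1$; the function name `rstable` is hypothetical, and the $\alpha = 1$ case, which needs a separate formula, is deliberately rejected.

```python
import numpy as np


def rstable(alpha, beta, sigma=1.0, mu=0.0, size=1, rng=None):
    """Draw from S(alpha, beta, sigma, mu) by the Chambers-Mallows-Stuck method.

    Sketch only: alpha = 1 needs a different formula and is rejected.
    """
    if not 0 < alpha <= 2 or alpha == 1:
        raise ValueError("this sketch requires 0 < alpha <= 2 and alpha != 1")
    if not -1 <= beta <= 1:
        raise ValueError("beta must lie in [-1, 1]")
    rng = np.random.default_rng() if rng is None else rng
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle on (-pi/2, pi/2)
    E = rng.exponential(1.0, size)                 # independent exponential, mean 1
    t = beta * np.tan(np.pi * alpha / 2)
    B = np.arctan(t) / alpha                       # B_{alpha,beta}
    A = (1 + t**2) ** (1 / (2 * alpha))            # A_{alpha,beta}
    X = (A * np.sin(alpha * (U + B)) / np.cos(U) ** (1 / alpha)
         * (np.cos(U - alpha * (U + B)) / E) ** ((1 - alpha) / alpha))
    return sigma * X + mu                          # rescale and shift the standard draw
```

As a quick check, with $\alpha = 2$ the output is Gaussian with variance $2\sigma^2$, and with $\alpha < 1$, $\beta = 1$ all draws are non-negative, since the distribution is then totally skewed to the right with support $[\mu, \infty)$.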

Moments of stable processes
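Statistical moments $\mathbb{E}[|S|^k]$, $k > 0$, of stable distributions with $\alpha < 2$ are finite only when $k < \alpha$. Consequently, for $\alpha < 2$ the variance is infinite, for $\alpha \in (0, 1]$ the mean does not exist, and for $\alpha \in (1, 2]$ the mean exists and equals the location parameter $\mu$. For symmetric stable distributions ($\beta = 0$), $\mu$ is also the center of symmetry and the median for every $\alpha$, even when the mean does not exist.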

Fractional lower order moments
The FLOM is an alternative for computing moments of $\alpha$-stable random variables, especially in situations where the mean and/or variance are infinite. FLOM representation formulas are discussed in Ma and Nikias (1995) for symmetric stable random data and generalized to asymmetric stable random data in Kuruoğlu (2001). In the latter, if $S \sim S(\alpha, \beta, \sigma, 0)$ and $\alpha \ne 1$, then
$$\mathbb{E}|S|^p = \frac{\Gamma(1 - p/\alpha)}{\Gamma(1 - p)\cos(p\pi/2)}\,\frac{\cos(p\theta/\alpha)}{(\cos\theta)^{p/\alpha}}\,\sigma^p, \qquad -1 < p < \alpha,$$
where $\theta = \arctan\big(\beta\tan\frac{\pi\alpha}{2}\big)$ and $\Gamma$ denotes the Gamma function. From the above representation, moments with negative values of $p$ are attainable. This leads to the logarithmic moments approach, which provides an easier way of estimating stable distribution parameters than the FLOM.
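The FLOM formula above can be evaluated directly, which also makes it easy to sanity-check against closed-form special cases. The following Python sketch is a direct transcription under the stated assumptions ($\mu = 0$, $-1 < p < \alpha$, $\alpha \ne 1$ unless $\beta = 0$); the function name `flom` is hypothetical, and the removable singularity at $p = 1$ is simply rejected rather than handled by taking a limit.

```python
import math


def flom(p, alpha, beta, sigma=1.0):
    """Fractional lower-order moment E|S|^p of S ~ S(alpha, beta, sigma, 0).

    E|S|^p = Gamma(1 - p/alpha) / (Gamma(1 - p) cos(p pi / 2))
             * cos(p theta / alpha) / cos(theta)^(p / alpha) * sigma^p,
    with theta = arctan(beta tan(pi alpha / 2)), for -1 < p < alpha.
    """
    if not -1 < p < alpha:
        raise ValueError("need -1 < p < alpha")
    if p == 1:
        raise ValueError("p = 1 is a removable singularity, not handled in this sketch")
    if alpha == 1 and beta != 0:
        raise ValueError("alpha = 1 requires beta = 0 in this sketch")
    theta = 0.0 if beta == 0 else math.atan(beta * math.tan(math.pi * alpha / 2))
    gamma_part = math.gamma(1 - p / alpha) / (math.gamma(1 - p) * math.cos(p * math.pi / 2))
    skew_part = math.cos(p * theta / alpha) / math.cos(theta) ** (p / alpha)
    return gamma_part * skew_part * sigma ** p
```

For the Cauchy case ($\alpha = 1$, $\beta = 0$, $\sigma = 1$) the formula reduces to $1/\cos(p\pi/2)$, and for the totally skewed case $\alpha = 1/2$, $\beta = 1$ it reproduces the moments of the Lévy distribution, $(\sigma/2)^p\,\Gamma(1/2 - p)/\sqrt{\pi}$.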

Logarithmic moments
This approach arose from the challenges encountered with the FLOM method, which requires computing Gamma functions and inverting the sinc function, and which only works for a restricted range of $p$.
The logarithmic moments method instead computes derivatives of the fractional moments with respect to the moment order $p$, which yields moments of the logarithm of the stable random variable. We illustrate this in the following.

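Lemma 3.5 Let $S \sim S(\alpha, 0, \sigma, 0)$ denote a symmetric stable random variable and let $-1 < p < \alpha$. Then
$$\mathbb{E}|S|^p = \sigma^p\,\frac{2^p\,\Gamma\big(\frac{1+p}{2}\big)\,\Gamma\big(1 - \frac{p}{\alpha}\big)}{\sqrt{\pi}\,\Gamma\big(1 - \frac{p}{2}\big)}.$$
The moments of $\log|S|$ follow readily for $n = 1, 2, \ldots$ by differentiating at $p = 0$, i.e.
$$\mathbb{E}\big[(\log|S|)^n\big] = \frac{d^n}{dp^n}\,\mathbb{E}|S|^p\Big|_{p=0},$$
which gives, for instance, $\mathbb{E}[\log|S|] = \gamma_E\big(\frac{1}{\alpha} - 1\big) + \log\sigma$ and $\mathrm{Var}[\log|S|] = \frac{\pi^2}{6}\big(\frac{1}{\alpha^2} + \frac{1}{2}\big)$, where $\gamma_E \approx 0.5772$ is Euler's constant.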

Parameter estimation of stable processes
The four common methods for estimating parameters of stable processes are the quantiles method (see Fama & Roll, 1971; McCulloch, 1986, 1996), the logarithmic moments method (see Kuruoğlu, 2001), the empirical characteristic function method (see Yang, 2012), and the ML method (see Nolan, 2001). We investigate their accuracy in the following.

The quantiles method
The quantile method was pioneered by Fama and Roll (1971), but it became much more widely used through McCulloch (1986), who extended it to asymmetric distributions and to $\alpha \in [0.6, 2]$, unlike the former approach, which restricts it to $\alpha \ge 1$.
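McCulloch's estimator starts from two sample quantile ratios that do not depend on $\sigma$ or $\mu$. The Python sketch below computes only these statistics; turning them into estimates of $\alpha$ and $\beta$ requires McCulloch's (1986) interpolation tables, which are not reproduced here. The helper name `mcculloch_statistics` is hypothetical, and NumPy's default linear interpolation of percentiles is an assumption rather than McCulloch's exact quantile convention.

```python
import numpy as np


def mcculloch_statistics(data):
    """McCulloch (1986) quantile statistics (nu_alpha, nu_beta).

    nu_alpha = (q95 - q05) / (q75 - q25)
    nu_beta  = (q95 + q05 - 2 q50) / (q95 - q05)
    Both are invariant to scale and location and depend only on (alpha, beta);
    mapping them to (alpha, beta) needs McCulloch's tables (not included).
    """
    q05, q25, q50, q75, q95 = np.percentile(data, [5, 25, 50, 75, 95])
    nu_alpha = (q95 - q05) / (q75 - q25)
    nu_beta = (q95 + q05 - 2 * q50) / (q95 - q05)
    return nu_alpha, nu_beta
```

For Gaussian data ($\alpha = 2$) the first statistic is about $(2 \times 1.6449)/(2 \times 0.6745) \approx 2.44$, and symmetric data give a second statistic near zero.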

Empirical characteristic function method
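Suppose a set of observable data $\{s_1, s_2, \ldots, s_N\}$ follows a stable distribution. Then we can approximate the characteristic function of the data by a basic Monte Carlo approach based on the law of large numbers, i.e.
$$\hat{\Phi}(u) = \frac{1}{N}\sum_{j=1}^{N} e^{ius_j}.$$
Using basic trigonometric principles, we can express the characteristic function (12) for $\alpha \ne 1$ in terms of the cosine and sine functions, i.e.
$$\Phi(u) = e^{-\sigma^\alpha|u|^\alpha}\big[\cos\eta(u) + i\sin\eta(u)\big], \tag{36}$$
where $\eta(u) = \mu u + \beta\sigma^\alpha|u|^\alpha\,\mathrm{sign}(u)\tan\frac{\pi\alpha}{2}$. For two chosen positive numbers $u_1 \ne u_2$, the estimated characteristic function relates to the model parameters by
$$-\log|\hat{\Phi}(u_k)| = \sigma^\alpha u_k^\alpha, \qquad k = 1, 2.$$
Solving this system leads to the estimation formulas for the stability and scale parameters:
$$\hat{\alpha} = \frac{\log\big(\log|\hat{\Phi}(u_1)| \,/\, \log|\hat{\Phi}(u_2)|\big)}{\log(u_1/u_2)}, \qquad \log\hat{\sigma} = \frac{\log u_1\,\log\big(-\log|\hat{\Phi}(u_2)|\big) - \log u_2\,\log\big(-\log|\hat{\Phi}(u_1)|\big)}{\hat{\alpha}\,\log(u_1/u_2)}.$$
The real and imaginary parts of the characteristic function (36) provide estimates for $\hat{\beta}$ and $\hat{\mu}$. Let $\Upsilon(u) := \arctan\big(\mathrm{Im}\,\hat{\Phi}(u)/\mathrm{Re}\,\hat{\Phi}(u)\big)$, which estimates $\eta(u)$, and choose another set of distinct positive numbers $u_k$, $k = 3, 4$. Together with $\hat{\alpha}$ and $\hat{\sigma}$, the estimates of the skewness and location parameters are then given respectively by
$$\hat{\beta} = \frac{u_4\Upsilon(u_3) - u_3\Upsilon(u_4)}{\hat{\sigma}^{\hat{\alpha}}\tan\frac{\pi\hat{\alpha}}{2}\big(u_3^{\hat{\alpha}}u_4 - u_4^{\hat{\alpha}}u_3\big)}, \qquad \hat{\mu} = \frac{u_4^{\hat{\alpha}}\Upsilon(u_3) - u_3^{\hat{\alpha}}\Upsilon(u_4)}{u_3u_4^{\hat{\alpha}} - u_4u_3^{\hat{\alpha}}}.$$
Notice that it can be deduced from Equation (36) that
$$\log\big(-\log|\Phi(u)|^2\big) = \log(2\sigma^\alpha) + \alpha\log|u|.$$
This provides an alternative, regression-based estimation method (Koutrouvelis, 1980):
$$y_k = m + \alpha x_k + \epsilon_k,$$
where $y_k = \log\big(-\log|\hat{\Phi}(u_k)|^2\big)$, $m = \log(2\sigma^\alpha)$, $x_k = \log|u_k|$ and $\epsilon_k$ is an error term. The stability and scale parameters can be estimated by selecting $u_k = \pi k/25$, $k = 1, 2, \ldots, M$, with $M$ chosen for real data according to Koutrouvelis (1980, Table I). The estimates $\hat{\alpha}$ and $\hat{\sigma}$ are then used to estimate $\beta$ and $\mu$ through the relation
$$z_l = \mu u_l + \beta\,\hat{\sigma}^{\hat{\alpha}}\tan\frac{\pi\hat{\alpha}}{2}\,\mathrm{sign}(u_l)|u_l|^{\hat{\alpha}} + \xi_l,$$
where $z_l = \Upsilon(u_l) + \pi k(u_l)$, with the integer $k(u_l)$ correcting for the principal branch of the arctangent, and $\xi_l$ is a random error. The proposed set of points for real data (see Koutrouvelis, 1980, Table II) is $u_l = \pi l/50$, $l = 1, 2, \ldots, Q$.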
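The two-point formulas above translate directly into code. The following Python sketch computes the empirical characteristic function and the resulting estimates of $(\alpha, \sigma, \beta, \mu)$. It assumes $\alpha \ne 1$ and data of roughly unit scale; the evaluation points $u_1, \ldots, u_4$ and the function names are hypothetical choices, not values taken from the paper, and Koutrouvelis's regression refinement (many $u$-points, preliminary standardization, and the $\pi k(u_l)$ branch correction) is omitted.

```python
import numpy as np


def ecf(data, u):
    """Empirical characteristic function (1/N) * sum_j exp(i * u * s_j) at each point u."""
    data = np.asarray(data, dtype=float)
    u = np.atleast_1d(np.asarray(u, dtype=float))
    return np.exp(1j * np.outer(u, data)).mean(axis=1)


def ecf_two_point_estimates(data, u1=0.2, u2=0.8, u3=0.1, u4=0.4):
    """Two-point ECF estimates (alpha, sigma, beta, mu), assuming alpha != 1.

    The default evaluation points are hypothetical and suit data of roughly unit scale.
    """
    phi = ecf(data, [u1, u2, u3, u4])
    # Modulus: log|Phi(u)| = -sigma^alpha * u^alpha for u > 0.
    l1, l2 = np.log(np.abs(phi[0])), np.log(np.abs(phi[1]))
    alpha = np.log(l1 / l2) / np.log(u1 / u2)
    log_sigma = (np.log(u1) * np.log(-l2) - np.log(u2) * np.log(-l1)) / (
        alpha * np.log(u1 / u2)
    )
    sigma = np.exp(log_sigma)
    # Phase: Upsilon(u) = mu*u + beta*sigma^alpha*u^alpha*tan(pi*alpha/2) for u > 0;
    # np.angle equals arctan(Im/Re) while Re > 0, which holds for small u.
    y3, y4 = np.angle(phi[2]), np.angle(phi[3])
    mu = (u4**alpha * y3 - u3**alpha * y4) / (u3 * u4**alpha - u4 * u3**alpha)
    beta = (u4 * y3 - u3 * y4) / (
        sigma**alpha * np.tan(np.pi * alpha / 2) * (u3**alpha * u4 - u4**alpha * u3)
    )
    return alpha, sigma, beta, mu
```

Because $\tan(\pi\hat\alpha/2)$ vanishes at $\alpha = 2$ and blows up at $\alpha = 1$, $\hat\beta$ is poorly determined near those values, consistent with the boundary behavior discussed in the error analysis.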

Logarithmic moments method
This approach follows the theory discussed in Section 3.5.2. Its key advantage is that there is no need to compute Gamma functions or to invert the sinc function, as in the FLOM. Secondly, through centro-symmetrization, techniques of parameter estimation for symmetric stable random variables (i.e. $\beta = 0$) can be applied to skewed stable random variables (i.e. $\beta \ne 0$), and techniques for centred stable random variables (i.e. $\mu = 0$) can be applied to non-centred ones (i.e. $\mu \ne 0$). However, this comes at the cost of losing almost half of the sample data. Therefore, to obtain better estimates one has to use large data sets.

Centro-symmetrization of stable random data sets
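Let $S_k$, $k = 1, \ldots, n$, be a sequence of $n$ independent stable random variables distributed according to $S_k \sim S(\alpha, \beta, \sigma, \mu)$. Then, for $\alpha \ne 1$, the distribution of a weighted sum of this sequence with weights $a_k$ can be determined from their characteristic function:
$$\sum_{k=1}^{n} a_kS_k \sim S\left(\alpha,\ \beta\,\frac{\sum_{k} a_k^{\langle\alpha\rangle}}{\sum_{k}|a_k|^\alpha},\ \sigma\Big(\sum_{k}|a_k|^\alpha\Big)^{1/\alpha},\ \mu\sum_{k} a_k\right),$$
where the signed $p$th power of a number $x$ is defined by $x^{\langle p\rangle} = \mathrm{sign}(x)|x|^p$. As a result, it is easy to obtain sequences of independent stable random variables with zero $\mu$, zero $\beta$, as well as both zero $\mu$ and zero $\beta$, for $\alpha \ne 1$. This yields the centred, deskewed, and symmetrized sequences:
$$S_k^{C} = S_{3k} + S_{3k-1} - 2S_{3k-2} \sim S\left(\alpha,\ \frac{2 - 2^\alpha}{2 + 2^\alpha}\,\beta,\ (2 + 2^\alpha)^{1/\alpha}\sigma,\ 0\right), \tag{43}$$
$$S_k^{D} = S_{3k} + S_{3k-1} - 2^{1/\alpha}S_{3k-2} \sim S\left(\alpha,\ 0,\ 4^{1/\alpha}\sigma,\ (2 - 2^{1/\alpha})\mu\right), \tag{44}$$
$$S_k^{S} = S_{2k} - S_{2k-1} \sim S\left(\alpha,\ 0,\ 2^{1/\alpha}\sigma,\ 0\right). \tag{45}$$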
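The centring, deskewing and symmetrizing transformations amount to simple index arithmetic on the sample. The Python sketch below is illustrative only: the helper name `centro_symmetrize` is hypothetical, the deskewing step needs a value of $\alpha$ (in practice a preliminary estimate), and leftover observations that do not complete a pair or triple are dropped.

```python
import numpy as np


def centro_symmetrize(data, alpha):
    """Return the centred, deskewed and symmetrized sequences built from i.i.d. data.

    centred:     S_{3k} + S_{3k-1} - 2 S_{3k-2}
    deskewed:    S_{3k} + S_{3k-1} - 2^(1/alpha) S_{3k-2}
    symmetrized: S_{2k} - S_{2k-1}
    """
    s = np.asarray(data, dtype=float)
    n3 = len(s) // 3 * 3                          # keep only complete triples
    a, b, c = s[0:n3:3], s[1:n3:3], s[2:n3:3]     # S_{3k-2}, S_{3k-1}, S_{3k}
    centred = c + b - 2.0 * a
    deskewed = c + b - 2.0 ** (1.0 / alpha) * a
    n2 = len(s) // 2 * 2                          # keep only complete pairs
    symmetrized = s[1:n2:2] - s[0:n2:2]           # S_{2k} - S_{2k-1}
    return centred, deskewed, symmetrized
```

The output lengths make the data loss mentioned above visible: the centred and deskewed sequences keep about a third of the sample, and the symmetrized sequence about half.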

Parameter estimation
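Suppose $S_k$ is a data set assumed to be drawn from $S(\alpha, \beta, \sigma, \mu)$. Then the exponent parameter $\alpha$ is estimated by setting $\beta = 0$ in (27), with the log moment $M_2$ estimated from the symmetrized data (45). That is, with $\hat{M}_2$ the sample variance of $\log|S_k^{S}|$,
$$\hat{\alpha} = \left(\frac{6\hat{M}_2}{\pi^2} - \frac{1}{2}\right)^{-1/2}.$$
The estimate $\hat{\alpha}$ is then used to estimate $\sigma$ using (26), where $M_1$ is estimated from the deskewed data (44). From the definition $\theta = \arctan\big(\beta\tan\frac{\pi\alpha}{2}\big)$, $|\beta_0|$ can be estimated by
$$|\hat{\beta}_0| = \frac{|\tan\hat{\theta}_0|}{\big|\tan\frac{\pi\hat{\alpha}}{2}\big|}.$$
Since centring (see (43)) multiplies $\beta$ by $(2 - 2^\alpha)/(2 + 2^\alpha)$, $|\hat{\beta}_0|$ must be multiplied by $\big|(2 + 2^{\hat\alpha})/(2 - 2^{\hat\alpha})\big|$ to obtain $|\hat{\beta}|$ of the original data, where the sign of $\beta$ is determined from $S_{\max}$, $S_{md}$ and $S_{\min}$, the maximum, median and minimum of the original data.
Finally, the location parameter is estimated by
$$\hat{\mu} = \frac{\hat{\mu}_0}{2 - 2^{1/\hat{\alpha}}},$$
where $\hat{\mu}_0$ is the median or mean of the deskewed data (44).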
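As an illustration of the log-moment idea, the following Python sketch estimates $\alpha$ and $\sigma$ from the symmetrized sequence $S_{2k} - S_{2k-1}$, using the standard identities for a symmetric stable variable $Y$ with scale $\sigma_Y$: $\mathrm{Var}[\log|Y|] = \frac{\pi^2}{6}\big(\frac{1}{\alpha^2} + \frac12\big)$ and $\mathbb{E}[\log|Y|] = \gamma_E\big(\frac{1}{\alpha} - 1\big) + \log\sigma_Y$, with $\sigma_Y = 2^{1/\alpha}\sigma$. It is a simplification of the procedure described above (it estimates $\sigma$ from the symmetrized rather than the deskewed data and does not estimate $\beta$ or $\mu$), and the function name `log_moment_alpha_sigma` is hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def log_moment_alpha_sigma(data):
    """Log-moment estimates of (alpha, sigma) from the symmetrized sequence S_{2k} - S_{2k-1}.

    alpha comes out as nan if the sample log-variance falls below pi^2/12.
    """
    s = np.asarray(data, dtype=float)
    n_even = len(s) // 2 * 2
    y = s[1:n_even:2] - s[0:n_even:2]          # symmetrized data, scale 2^(1/alpha) * sigma
    log_abs = np.log(np.abs(y[y != 0.0]))      # drop exact zeros before taking logs
    m1 = log_abs.mean()                        # sample E[log|Y|]
    m2 = log_abs.var(ddof=1)                   # sample Var[log|Y|]
    alpha = (6.0 * m2 / np.pi**2 - 0.5) ** -0.5
    sigma_y = np.exp(m1 - EULER_GAMMA * (1.0 / alpha - 1.0))
    return alpha, sigma_y / 2.0 ** (1.0 / alpha)
```

Because the identities only involve $\log|Y|$, no Gamma functions or sinc inversion are needed, which is the practical advantage over the FLOM noted above.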

Maximum likelihood method
The ML method is the most favored parameter estimation method in economic and financial applications. The method relies on the density function, which in the case of stable distributions has no closed-form representation. In this case we propose a numerical estimation of the density function. For a vector $s = (s_1, s_2, \ldots, s_n)$ of independent and identically distributed random variables assumed to follow a stable distribution, the ML estimate of the parameter vector $\Theta = (\alpha, \beta, \sigma, \mu)$ is obtained by maximizing the log-likelihood function
$$\ell(\Theta; s) = \sum_{i=1}^{n}\log\hat{h}(s_i; \Theta),$$
where $\hat{h}(s; \Theta)$ denotes a numerically estimated stable probability density function. It is shown, for instance, in Mittnik, Rachev, Doganoglu, and Chenyao (1999) that the best algorithms for computing the ML estimate use the Fast Fourier Transform (FFT) or direct integration, as in Nolan (2001). The ML algorithms require carefully chosen initial input parameters, which in our case can be obtained, for example, through the quantiles method described above. The FFT is faster for large data sets, while the direct integration approach is suitable for smaller data sets since it can be evaluated at any arbitrary point.
In Section 5, we analyze commodity futures data and apply the empirical characteristic function method to estimate the stable distribution parameters.
It is important to mention the restrictions on the parameters under which the different estimation methods operate.

Error analysis
In this section we simulate datasets from the stable family of distributions based on the theory in Chambers et al. (1976) and Weron and Weron (1995). We then use the above four methods to retrieve the stable parameters from the simulated data. Our focus is on $\alpha$ and $\beta$, but the arguments extend to the other two parameters.
First, it is important to mention that all four methods perform poorly close to the boundaries, i.e. $\alpha \to 0$, $\alpha \to 2$ and $\beta \to \pm 1$. Moreover, the literature shows that the methods operate efficiently under the parameter restrictions in Table 1.
In addition, the MLE appears to be the most preferred and widely used estimation method. However, we observe in our analysis that this method fails for particular parameter ranges and is not robust. For instance, in estimating $0.1 < \alpha < 1.0$ with respect to $\beta$, the MLE fails to converge and returns huge, unrealistic errors. This is why we do not include it in Figure 2(a). Similarly, for $\alpha = 0.4$, in estimation with respect to $\beta$, the logarithmic moments method returns either negative or very large values, which is expected according to the constraints in Table 1. We omit its graph in Figure 2(b). Meanwhile, we notice that in both cases the quantile and ECF methods work well, with the latter providing the best estimates.
The graphs in Figure 3 show the error associated with estimation for $1.0 < \alpha < 2.0$ and different $\beta$ values. Note that all four methods work well, and the ECF is still the most accurate and robust method. Recall that for $\alpha \to 1$ and $\alpha \to 2$ the estimation methods perform poorly. An example is Figure 3(a) (for $\alpha = 1.4$), the value closest to 1 for which the ML would converge, whereas for higher values well below 2.0 (see, for instance, Figure 3(b) for $\alpha = 1.7$) the methods performed relatively better, except for the logarithmic moments method.
The graphs in Figure 4 illustrate the convergence of the quantile, ECF and ML methods in estimating $\alpha = 1.4$ and $\alpha = 1.7$. We simulated 50,000 points and divided them into 100 sets, starting with a set of size 500 and increasing the size by 500 up to 50,000. The logarithmic moments method performed so poorly that it is not comparable to the other three methods, and it is not included in Figure 4(a) and (b). The ECF is seen to perform better than the quantile and ML methods, with a better convergence rate. Similarly, Figure 5 shows the convergence rates of the quantile, ECF and ML estimation methods. The ECF still provides better precision in both cases.
In summary, the empirical characteristic function method outperforms the other three methods discussed in this paper in the following ways: (1) It is robust and can consistently estimate a wide range of $\alpha$ and $\beta$ parameters.
(2) It provides better precision than the quantile, logarithmic moments and ML methods for a wide range of $\alpha$ and $\beta$ parameters.
(3) It has a better convergence rate.
Therefore, the quantile, logarithmic moments or ML methods can be used to provide initial parameters for the ECF method, and the ECF estimates can in turn provide initial parameters for improved future estimators.
The following section is devoted to extracting stable parameters from log-returns commodity futures data using the ECF method.

Commodity data
The data sets used here are obtained from Quandl Financial and Economic Data website. The sets differ in sizes and include settled prices of Corn Futures Continuous Contract C#1 from 1959-07-01

The t-location-scale distribution
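The t-location-scale distribution is best suited to modeling data distributions with heavier tails, that is, more prone to outliers, than the Gaussian distribution. The distribution uses the following parameters: the location parameter $\mu \in \mathbb{R}$, the scale parameter $\sigma > 0$ and the shape parameter $\nu^* > 0$.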
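The probability density function (pdf) of the t-location-scale distribution is given by
$$f(x; \mu, \sigma, \nu^*) = \frac{\Gamma\big(\frac{\nu^*+1}{2}\big)}{\sigma\sqrt{\nu^*\pi}\,\Gamma\big(\frac{\nu^*}{2}\big)}\left[\frac{\nu^* + \big(\frac{x-\mu}{\sigma}\big)^2}{\nu^*}\right]^{-\frac{\nu^*+1}{2}},$$
where $\Gamma(\cdot)$ denotes the gamma function. The mean of the t-location-scale distribution is $\mu$; it is defined for $\nu^* > 1$ and undefined otherwise. The variance is
$$\mathrm{Var}[X] = \sigma^2\,\frac{\nu^*}{\nu^* - 2},$$
defined for $\nu^* > 2$. The t-location-scale distribution approaches the Gaussian distribution as $\nu^*$ approaches infinity, and smaller values of $\nu^*$ yield heavier tails. This distribution does not take skewness into account, and its three parameters are usually estimated using the ML estimation method.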
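The pdf above can be checked against an existing implementation. The following Python sketch evaluates it directly and fits the three parameters by maximum likelihood with SciPy's `scipy.stats.t.fit`, which returns `(df, loc, scale)`. SciPy is a stand-in here: the paper's fits used the algorithms of Sheppard (2012), and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


def t_location_scale_pdf(x, mu, sigma, nu):
    """t-location-scale density with location mu, scale sigma > 0 and shape nu > 0."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    log_const = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(nu * np.pi) - np.log(sigma)
    # [(nu + z^2) / nu]^(-(nu + 1)/2), computed on the log scale for stability
    return np.exp(log_const - (nu + 1) / 2 * np.log1p(z**2 / nu))


def fit_t_location_scale(data):
    """Maximum-likelihood fit with SciPy; returns (mu, sigma, nu)."""
    nu, mu, sigma = stats.t.fit(data)  # SciPy returns (df, loc, scale)
    return mu, sigma, nu
```

`stats.t` uses the same location-scale form, so the hand-written density should agree with `stats.t.pdf(x, nu, loc=mu, scale=sigma)` to floating-point precision.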
Using algorithms by Sheppard (2012) on our log-returns commodity futures data we obtained fittings in Figures 6-8.
According to the $\nu^*$ values, the log-returns data exhibit heavier-than-Gaussian tails. Determining the nature of the tails would require QQ plots, but this can also be observed directly from Figures 6-8. It is important to mention that QQ plots do not on their own provide conclusive evidence about the nature of the tails; further tests would still be needed. For instance, the t-location-scale fit cannot reveal any skewness in the data. We do, however, observe this effect when we fit the data to a stable distribution (see Table 2), as discussed in the following section.

Stable distribution fitting
On the other hand, by assuming stable distribution for our log-returns commodity futures data, we employed the ECF method and obtained the stable parameters in Table 3.
Log-returns of commodity futures are not only highly peaked but also have left and right skinny tails with extreme outliers, as observed from the QQ plots for energy commodities (i.e. crude oil, natural gas and gasoline) in Figure 9, the grain commodities in Figure 10 and the precious metals in Figure 11. Table 3 shows the stable distribution parameters extracted from the log-returns data using the empirical characteristic function parameter estimation method. We notice that the data exhibit some skewness, which is not reflected in the t-location-scale fit.

Conclusion
First, we showed that the ECF provides the best precision in estimating a wide range of $\alpha$ and $\beta$ parameters, is robust, and converges faster than the quantile, ML and logarithmic moments methods. Secondly, we illustrated that, in general, the distribution of the commodity futures log-returns data is closest to a t-location-scale distribution due to its high peaks, skinny tails and extreme outliers. Moreover, using the ECF estimation method, we observe some minor skewness effects not captured by the t-location-scale fit. We recommend the ECF as a suitable approach for estimating the parameters of skewed financial market data; it could also be used to obtain initial input parameters for future, improved estimation techniques.

Funding
This work was supported by funds from the National Research Foundation of South Africa (NRF), the African Institute for Mathematical Sciences (AIMS) and the African Collaboration for Quantitative Finance and Risk Research (ACQuFRR) which is the research section of the African Institute of Financial Markets and Risk Management (AIFMRM), which delivers postgraduate education and training in financial markets, risk management and quantitative finance at the University of Cape Town in South Africa.