Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models

This paper provides a survey of three families of flexible parametric probability density functions (the skewed generalized t, the exponential generalized beta of the second kind, and the inverse hyperbolic sine distributions) which can be used in modeling a wide variety of econometric problems. A figure summarizing the admissible combinations of skewness and kurtosis spanned by the three distributional families is included to facilitate model selection. Applications of these families to estimating regression models demonstrate that they may exhibit significant efficiency gains relative to conventional regression procedures, such as ordinary least squares estimation, when modeling non-normal errors with skewness and/or leptokurtosis, without suffering large efficiency losses when errors are normally distributed. A second example illustrates the application of flexible parametric density functions as conditional distributions in a GARCH formulation of the distribution of returns on the S&P500. The skewed generalized t can be an important model for econometric analysis.


Introduction
Assumptions about the distributions of economic variables are useful for much of economic modeling; however, it is important that the assumed models are consistent with the stylized facts.
For example, selecting a normal distribution permits modeling two data characteristics, the mean and variance, but is not appropriate for data which are skewed or have thick tails. Similarly, the use of other distributions, such as the lognormal or Weibull distributions, is restricted to applications with admissible data characteristics. Efforts to model more diverse data characteristics have led to a rapid development of alternative methodological approaches in economics. Semiparametric procedures provide one approach which reduces the structure imposed in the modeling process.
Because semiparametric procedures impose relatively little structure on the data, they have desirable large sample properties under quite general conditions. However, in specific applications, semiparametric procedures require the user to specify objects such as a kernel and window width in kernel regression, and since little structure is assumed, the resulting models may not be parsimonious. In addition, if the assumed structure in a parametric model is approximately correct, the resulting estimator will typically have superior properties to a semiparametric estimator. Pagan and Ullah (1999) provide an excellent summary of these and related issues.
In this paper, we explore an intermediate position between the specification of a simple parametric form for the probability density function and semiparametric estimation. This approach is based on "flexible" parametric density functions that involve few parameters but can accommodate a wider range of data characteristics than are available with such commonly used distributions as the normal, lognormal, or Student t distribution. Section 2 summarizes three alternative families of flexible probability density functions, some basic distributional characteristics, important special and limiting cases, and a visual representation of skewness and kurtosis combinations spanned by the respective distributions. Section 3 considers two applications of these distributions in economics: quasi-maximum likelihood estimation of regression models and GARCH modeling. Concluding remarks are offered in Section 4.

Alternative Models
The normal and Laplace distributions are two of the first probability density functions to have been considered for model building in economics and statistics. Both are symmetric, with kurtosis of 3 and 6, respectively, and both provide good models for many economic series, with the Laplace being able to model thicker-tailed distributions than the normal. However, it is not uncommon to encounter data that are both skewed and heavy tailed in economics and finance applications. In the following, we summarize three alternative families of distributions that may be used as models for possibly skewed and thick-tailed distributions.

Skewed Generalized T distribution (SGT)
The skewed generalized t distribution (SGT) was obtained by Theodossiou (1998) and is defined by

$$\mathrm{SGT}(y; m, \lambda, \phi, p, q) = \frac{p}{2\phi\, q^{1/p}\, B(1/p,\, q)\left(1 + \dfrac{|y-m|^{p}}{\left(1+\lambda\,\mathrm{sign}(y-m)\right)^{p} q\,\phi^{p}}\right)^{q+1/p}}$$

where B(·,·) is the beta function, m is the mode of y, and the parameters p and q are both positive and control the height and tails of the density. The parameter φ > 0 is a scale parameter, and λ, with -1 < λ < 1, determines the degree of skewness. Setting λ = 0 in the SGT yields the generalized t (GT) of McDonald and Newey (1988). Similarly, setting p = 2 yields the skewed t (ST) of Hansen (1994), which includes the Student t distribution when λ = 0. The h-th order moments of y - m can be shown to be given by

$$E\left[(y-m)^{h}\right] = \frac{(1+\lambda)^{h+1} + (-1)^{h}(1-\lambda)^{h+1}}{2}\; q^{h/p}\,\phi^{h}\; \frac{B\left((h+1)/p,\; q - h/p\right)}{B(1/p,\, q)}, \qquad h < pq;$$

hence, the SGT defines moments of order less than the degrees of freedom (n = pq). The results for the h-th order moments about the mode can be used to derive expressions for moments about the mean using the binomial expansion. Standardized values for skewness and kurtosis in the ranges (-∞,∞) and (1.8,∞), respectively, can be modeled with the SGT. Thus, the SGT allows for significantly more flexibility in modeling skewness and kurtosis than the Student t distribution, which is symmetric and has kurtosis 3 + 6/(n - 4) for degrees of freedom exceeding four.
Another important class of flexible density functions corresponds to a limiting case of the SGT. Letting q → ∞ yields the skewed generalized error distribution (SGED) defined by

$$\mathrm{SGED}(y; m, \lambda, \phi, p) = \frac{p\,\exp\left(-\dfrac{|y-m|^{p}}{\left(1+\lambda\,\mathrm{sign}(y-m)\right)^{p}\phi^{p}}\right)}{2\phi\,\Gamma(1/p)}$$

The parameter p in the SGED controls the height and tails of the density and λ controls the skewness; the SGED has finite moments of all positive orders h. The SGED is symmetric for λ = 0 and positively (negatively) skewed for positive (negative) values of λ. The symmetric SGED, the GED, is also known as the generalized power (Subbotin (1923)) distribution or the Box-Tiao (Box and Tiao (1962)) distribution. The SGED can easily be seen to include the skewed (λ ≠ 0) or symmetric (λ = 0) Laplace or normal, corresponding to p = 1 or 2, respectively. As the parameter p grows larger, the SGT pdf approaches the uniform pdf. Figure 1 provides a visual summary of the interrelationships between some of the pdf's in the SGT family of distributions, where "S" denotes the skewed generalization of the indicated pdf; for example, SLaplace denotes the skewed Laplace pdf.
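The properties described above can be checked numerically. The following Python sketch assumes the SGT density in the Theodossiou (1998) form with mode m, scale φ, skewness parameter λ, and shape parameters p and q; the function names (`sgt_pdf`, `integrate`, `student_t_pdf`) are illustrative and not taken from the paper. It verifies that the density integrates to one, that the area to the left of the mode is (1 - λ)/2, and that p = 2, λ = 0, φ = √2 reproduces a Student t density with pq degrees of freedom.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b), computed through log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sgt_pdf(y, m=0.0, lam=0.0, phi=1.0, p=2.0, q=5.0):
    """Assumed SGT density: mode m, skewness lam in (-1, 1), scale phi, shapes p, q > 0."""
    u = y - m
    side = 1.0 + lam if u >= 0 else 1.0 - lam  # (1 + lam * sign(y - m))
    log_norm = math.log(p) - math.log(2.0 * phi) - math.log(q) / p - log_beta(1.0 / p, q)
    ratio = abs(u) ** p / (q * (side * phi) ** p)
    return math.exp(log_norm - (q + 1.0 / p) * math.log1p(ratio))


def student_t_pdf(y, nu):
    """Standard Student t density, used only as a reference for the special case."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(y * y / nu))


def integrate(f, a, b, n=100_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Total mass and mass to the left of the mode for a positively skewed case.
total = integrate(lambda y: sgt_pdf(y, lam=0.3), -60.0, 60.0)
left = integrate(lambda y: sgt_pdf(y, lam=0.3), -60.0, 0.0)

# Special case: p = 2, lam = 0, phi = sqrt(2) versus a Student t with pq = 10 d.f.
gap = abs(sgt_pdf(1.3, phi=math.sqrt(2.0), p=2.0, q=5.0) - student_t_pdf(1.3, 10))
print(total, left, gap)
```

The integration range of ±60 is an arbitrary truncation that is adequate for pq = 10; heavier-tailed parameter choices would need a wider range.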

Exponential generalized beta of the second kind (EGB2)
The four-parameter EGB2 distribution is defined by the probability density function

$$\mathrm{EGB2}(y; m, \phi, p, q) = \frac{e^{p(y-m)/\phi}}{\phi\, B(p, q)\left(1 + e^{(y-m)/\phi}\right)^{p+q}}$$

where the parameters φ, p, and q are assumed to be positive; cf. McDonald and Xu (1995). The parameters m and φ are, respectively, location and scale parameters, and p and q are shape parameters.
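As a minimal sketch, assuming the standard EGB2 density with location m, scale φ, and shape parameters p and q, the following Python code evaluates the density and integrates it numerically to obtain the total mass, mean, variance, and standardized skewness. The helper names (`egb2_pdf`, `egb2_summary`) and the truncation width are illustrative choices, not part of the paper.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b), computed through log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def egb2_pdf(y, m=0.0, phi=1.0, p=1.0, q=1.0):
    """Assumed EGB2 density: location m, scale phi > 0, shapes p, q > 0."""
    z = (y - m) / phi
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # log(1 + e^z), overflow-safe
    return math.exp(p * z - (p + q) * softplus - math.log(phi) - log_beta(p, q))


def integrate(f, a, b, n=100_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def egb2_summary(m, phi, p, q, half_width=80.0):
    """Return (mass, mean, variance, standardized skewness) by numerical integration."""
    lo, hi = m - half_width * phi, m + half_width * phi
    pdf = lambda y: egb2_pdf(y, m, phi, p, q)
    mass = integrate(pdf, lo, hi)
    mean = integrate(lambda y: y * pdf(y), lo, hi)
    var = integrate(lambda y: (y - mean) ** 2 * pdf(y), lo, hi)
    third = integrate(lambda y: (y - mean) ** 3 * pdf(y), lo, hi)
    return mass, mean, var, third / var ** 1.5


# An asymmetric example (p > q); the values are arbitrary illustrations.
mass, mean, var, skew = egb2_summary(0.0, 1.0, 3.0, 1.0)
print(mass, mean, var, skew)
```

Calling `egb2_summary` with p equal to q gives a quick numerical check of symmetry, since the reported skewness should then be zero up to integration error.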
The EGB2 pdf is symmetric if and only if p and q are equal. The normal distribution is a limiting case of the EGB2 where the parameters p and q are equal and grow indefinitely large. The moment generating function for the EGB2 is

$$M(t) = \frac{e^{mt}\, B(p + t\phi,\; q - t\phi)}{B(p, q)}$$

from which the first four moments can be readily derived as

Mean: $m + \phi\,[\psi(p) - \psi(q)]$
Variance: $\phi^{2}\,[\psi'(p) + \psi'(q)]$
Skewness (third central moment): $\phi^{3}\,[\psi''(p) - \psi''(q)]$
Excess kurtosis (fourth cumulant): $\phi^{4}\,[\psi'''(p) + \psi'''(q)]$

where ψ denotes the derivative of the log of the gamma function and primes denote its successive derivatives. The EGB2 may accommodate standardized values for skewness in the range (-2.0, 2.0) and standardized values of kurtosis in the range (3.0, 9.0).

Inverse hyperbolic sine (IHS)
Johnson (1949, 1994) proposed three families of distributions of random variables that are transformations of normal variables. These transformations allow modeling a wide range of values of skewness and kurtosis. We consider the inverse hyperbolic sine (IHS) transformation, which allows unbounded random variables. For this paper we use a slightly different parameterization than used by Johnson (1949). Specifically, we consider

$$Y = a + b\,\sinh(\lambda + z/k)$$

where sinh is the hyperbolic sine, z is a standard normal, and a, b, λ, and k are scaling constants related respectively to the mean (μ), variance (σ²), skewness, and kurtosis of the random variable Y. By a change of variables (for b > 0), the pdf of y is given by

$$f(y) = \frac{k\,\exp\left\{-\dfrac{k^{2}}{2}\left[\sinh^{-1}\!\left(\dfrac{y-a}{b}\right) - \lambda\right]^{2}\right\}}{\sqrt{2\pi\left(b^{2} + (y-a)^{2}\right)}}$$
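A minimal sketch of the IHS construction, assuming the transformation Y = a + b sinh(λ + z/k) with z standard normal and b > 0. The density below follows from the change of variables z = k(asinh((y - a)/b) - λ), and the mean and variance follow from the moments of sinh(λ + z/k). The function names are illustrative, and the parameterization in terms of (a, b) rather than (μ, σ) is a simplifying assumption of this sketch.

```python
import math
import random


def ihs_draw(rng, a, b, lam, k):
    """One draw of Y = a + b * sinh(lam + z / k) with z standard normal."""
    return a + b * math.sinh(lam + rng.gauss(0.0, 1.0) / k)


def ihs_pdf(y, a=0.0, b=1.0, lam=0.0, k=1.0):
    """Density of Y implied by z = k * (asinh((y - a) / b) - lam); requires b > 0."""
    z = k * (math.asinh((y - a) / b) - lam)
    return k * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * (b * b + (y - a) ** 2))


def ihs_mean_var(a, b, lam, k):
    """Mean and variance of Y from the moments of w = sinh(lam + z / k)."""
    e = math.exp(1.0 / k ** 2)
    mean_w = math.sinh(lam) * math.sqrt(e)
    var_w = 0.5 * (e - 1.0) * (math.cosh(2.0 * lam) * e + 1.0)
    return a + b * mean_w, b * b * var_w


def integrate(f, lo, hi, n=200_000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Illustrative parameter values (assumptions, not estimates from the paper).
A, B, LAM, K = 0.0, 1.0, 0.2, 2.0
mass = integrate(lambda y: ihs_pdf(y, A, B, LAM, K), -200.0, 200.0)
mu, var = ihs_mean_var(A, B, LAM, K)
print(mass, mu, var)
```

Choosing b = σ/σ_w and a = μ - b μ_w, where μ_w and σ_w are the mean and standard deviation of sinh(λ + z/k), would give Y mean μ and variance σ².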

Applications
The flexible pdf's summarized in Section 2 have many applications in economic modeling where a normality assumption may be unnecessarily restrictive. We consider two applications in this section: (1) partially adaptive or robust estimation of regression models and (2) estimating models characterized by generalized autoregressive conditional heteroskedasticity (GARCH).

Regression Models: A Simulation Example
We use a Monte Carlo simulation to illustrate the potential usefulness and efficiency gains available from the application of the flexible distributions discussed above in regression modeling.
Following Hsieh and Manski (1984), Newey (1988), McDonald and White (1994), and Ramirez, Misra, and Nelson (2003), we simulate data from a linear regression model with an intercept and a single regressor (the data generating process, DGP), where the X_t's are drawn from a Bernoulli distribution with Prob(X=1) = 0.5. We consider three different error distributions, each with a zero mean and unit variance. One error distribution is the standard normal, another is a thick-tailed variance mixture or contaminated distribution, and the third corresponds to a skewed error distribution. We consider samples of size fifty and one hundred with one thousand replications. For each model, we estimate the slope and intercept parameters using ordinary least squares (OLS) and least absolute deviations (LAD) as benchmarks and also estimate the parameters using partially adaptive estimation based on the error distributions summarized in Section 2.
The first error distribution is the unit normal, Z_1 = N[0,1]. The thick-tailed variance-contaminated distribution is generated as a mixture, Z_2 = U*N[0, 1/9] + (1-U)*N[0, 9], where U is 1 with probability 0.9 and 0 otherwise; Z_2 is symmetrically distributed with kurtosis of 24.3. The skewed distribution is a lognormal standardized to have zero mean and unit variance. Comparing the standardized skewness and kurtosis (SK, KU) for the normal (0, 3), mixed normal (0, 24.3), and lognormal (6.185, 113.9) error distributions with Figure 2 suggests that IHS and SGT estimation would be compatible with the data, whereas the EGB2 estimators would not. Ramirez, Misra, and Nelson (2003) and McDonald and White (1994) applied the IHS, GT, and EGB2 partially adaptive estimators to this DGP.
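The simulation design can be sketched in Python as follows. This is a simplified illustration, not the authors' code: the intercept and slope values (`BETA0`, `BETA1`) are placeholders chosen here, the standardized lognormal is assumed to be built from a unit normal, and only the OLS and LAD benchmarks are computed. With a constant and a single binary regressor, the OLS slope is the difference in group means and an LAD slope is the difference in group medians.

```python
import math
import random
import statistics

BETA0, BETA1 = -1.0, 1.0  # illustrative values only; an assumption of this sketch


def draw_error(rng, kind):
    """Draw one error with zero mean and unit variance."""
    z = rng.gauss(0.0, 1.0)
    if kind == "normal":
        return z
    if kind == "mixture":
        # N[0, 1/9] with probability 0.9, N[0, 9] otherwise (second argument is a variance)
        return z / 3.0 if rng.random() < 0.9 else 3.0 * z
    if kind == "lognormal":
        # exp(z) has mean e^(1/2) and variance e^2 - e when z is standard normal
        return (math.exp(z) - math.exp(0.5)) / math.sqrt(math.e ** 2 - math.e)
    raise ValueError(kind)


def one_sample(rng, kind, T):
    """Simulate (x, y); redraw x in the rare case that one group is empty."""
    while True:
        x = [1 if rng.random() < 0.5 else 0 for _ in range(T)]
        if 0 < sum(x) < T:
            break
    y = [BETA0 + BETA1 * xi + draw_error(rng, kind) for xi in x]
    return x, y


def slope_estimates(x, y):
    """OLS and LAD slopes for a regression on a constant and a binary regressor."""
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    ols = statistics.fmean(y1) - statistics.fmean(y0)
    lad = statistics.median(y1) - statistics.median(y0)  # one valid LAD minimizer
    return ols, lad


def slope_rmse(kind, T=50, reps=1000, seed=0):
    """Root mean squared error of the OLS and LAD slope estimates."""
    rng = random.Random(seed)
    sq_ols = sq_lad = 0.0
    for _ in range(reps):
        ols, lad = slope_estimates(*one_sample(rng, kind, T))
        sq_ols += (ols - BETA1) ** 2
        sq_lad += (lad - BETA1) ** 2
    return math.sqrt(sq_ols / reps), math.sqrt(sq_lad / reps)


for kind in ("normal", "mixture", "lognormal"):
    print(kind, slope_rmse(kind))
```

A partially adaptive estimator would replace `slope_estimates` with the maximizer of a log-likelihood built from one of the flexible densities of Section 2.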
Adaptive maximum likelihood (AML) estimation, based on a normal kernel, following Hsieh and Manski (1984), and generalized method of moments (GMM) estimation, as outlined in Newey (1988), are also considered. The root mean squared errors (RMSE) for the estimated slope parameters, using each of the previously mentioned methods, are reported in Table 1. Since each of the flexible pdf's considered includes the normal as a special or limiting case, one would expect partially adaptive estimators to perform similarly to OLS for normally distributed errors, but not necessarily for the mixture or skewed error distributions. This intuition seems to be confirmed by the results reported in Table 1, where we also observe that there appears to be relatively little efficiency loss for the partially adaptive estimators relative to the OLS estimator for the data generating process with normally distributed error terms. The DGP for each error distribution satisfies the Gauss-Markov assumptions, with OLS yielding the most efficient linear unbiased estimators, but under the normality assumption OLS is also the most efficient of all unbiased (linear and nonlinear) estimators. The contaminated normal and lognormal error distributions are examples in which the OLS estimator will be the minimum variance unbiased linear estimator, but there will be nonlinear estimators which provide significant improvements in estimator efficiency. Comparing the two panels (T=50 and T=100) in Table 1, the results appear to be generally consistent with √T convergence.
Not surprisingly, OLS has the largest RMSE of any of the estimators considered for the mixture (thick-tailed and symmetric) distribution. AML, GMM, and the partially adaptive estimators all have smaller RMSE than OLS in this case.
In the case of the skewed and thick-tailed error distribution, OLS again performs the worst for estimating the slope. The partially adaptive estimators offer substantial efficiency gains relative to both OLS and LAD. The AML and GMM estimators perform similarly to the partially adaptive estimator based on the thick-tailed, symmetric GT distribution; however, the possibly skewed partially adaptive estimators (SGED, ST, SGT, EGB2, and IHS) appear to outperform AML and GMM in terms of RMSE for both sample sizes. The performance of the EGB2 and IHS for the skewed error distributions is particularly impressive. The strong performance of the EGB2 may be surprising since the moments of the true underlying error distribution do not lie in the portion of the moment space covered by the EGB2, as illustrated in Figure 2. In this sense, it appears that accounting for some potential skewness and kurtosis may be more important than capturing it exactly when estimating parameters characterizing the mean. Of course, if we were interested in estimating other features of the distribution, we would expect the performance of the EGB2 to deteriorate.

GARCH Models
Many economic time series involving financial or macroeconomic data are characterized by volatility clustering, that is, the tendency of large residuals (deviations of the series from the mean) to be followed by large residuals and small residuals by small residuals of unpredictable sign. Engle's (1982) ARCH model and its GARCH generalization are widely used to model such series. A standard normal conditional distribution leads to an unconditional distribution with leptokurtosis, but does not capture all the leptokurtosis present in high frequency speculative prices (Bollerslev, Engle, and Nelson 1994, p. 2979). In an attempt to better model leptokurtosis, Bollerslev (1987) proposed using the standardized t-distribution. Nelson (1991) used the GED and Bollerslev et al. (1994) applied the GT. It was found that the "generalized t-distribution is a marked improvement over the GED, though perhaps not over the usual student's t distribution. Nevertheless the generalized t is not entirely adequate, as it does not account for the fairly small skewness in the fitted z_t's and also appears to have insufficiently thick tails for the S&P500 data." Wang et al. (2001) and Ramirez (2001) apply the EGB2 and IHS to GARCH models, respectively.

To illustrate the application of the SGT distribution family in estimating GARCH models, we report the results of fitting representative SGT-GARCH, EGB2-GARCH, and IHS-GARCH specifications to the Standard and Poor's 500 stock market index (S&P500). The data analyzed were computed as logarithmic changes in the daily level of the series. The data cover the period from January 1992 through December 2001 (10 years), and Table 2 reports sample characteristics. The Dickey-Fuller tests provide support for the hypothesis that the series of logarithmic changes are stationary. The S&P500 data are characterized by thick tails and negative skewness. Based on the estimated skewness, kurtosis, Jarque-Bera, and chi-square goodness of fit statistics, the assumption of normality is rejected. Comparing the sample skewness and kurtosis values with the feasible combinations in Figure 2, we note that for the unconditional distributions the IHS and SGT are consistent with the data, but the EGB2 is not. This does not imply that the same could be said for the GARCH specifications.
MATLAB was used to obtain partially adaptive or quasi-maximum likelihood estimates (QMLE) of the unknown distributional and GARCH parameters for each of the models indicated in Table 3, both with and without GARCH effects. Given that QMLE was used rather than method of moments, sample and theoretical moments may differ. Since the GARCH formulations provide a statistically significant improvement over a non-GARCH specification, only the estimated GARCH formulations, their estimated parameters and robust standard errors, and conditional distributions are reported. The sample standardized skewness and kurtosis for the estimated GARCH residuals and corresponding theoretical skewness and kurtosis are also reported.
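The objective maximized in this kind of estimation can be sketched as follows. The Python code below is an illustration under stated assumptions, not the MATLAB code used in the paper: it assumes a GARCH(1,1) conditional variance recursion with hypothetical parameter names `omega`, `alpha`, and `beta`, requires `alpha + beta < 1`, and uses a standard normal log density for the standardized residual. A flexible density (standardized to unit variance) could be passed through the `logpdf` argument instead.

```python
import math
import random


def normal_logpdf(z):
    """Log density of a standard normal."""
    return -0.5 * (math.log(2.0 * math.pi) + z * z)


def simulate_garch(T, omega, alpha, beta, seed=0):
    """Simulate residuals with sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    eps = []
    for _ in range(T):
        e = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        eps.append(e)
        sigma2 = omega + alpha * e * e + beta * sigma2
    return eps


def garch_loglik(eps, omega, alpha, beta, logpdf=normal_logpdf):
    """Log-likelihood: sum over t of log f(eps_t / sigma_t) - log(sigma_t)."""
    sigma2 = omega / (1.0 - alpha - beta)
    total = 0.0
    for e in eps:
        sigma = math.sqrt(sigma2)
        total += logpdf(e / sigma) - math.log(sigma)
        sigma2 = omega + alpha * e * e + beta * sigma2
    return total


# Compare the likelihood at the simulated parameters with a constant-variance fit.
eps = simulate_garch(4000, omega=0.05, alpha=0.10, beta=0.85, seed=1)
sample_var = sum(e * e for e in eps) / len(eps)
ll_garch = garch_loglik(eps, 0.05, 0.10, 0.85)
ll_const = garch_loglik(eps, sample_var, 0.0, 0.0)
print(ll_garch, ll_const, 2.0 * (ll_garch - ll_const))
```

In practice the parameters would be chosen by a numerical optimizer, and the last printed quantity corresponds to a likelihood ratio comparison of GARCH and non-GARCH specifications.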
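The last six rows of Table 3 report the likelihood ratio (LR) test statistics and the goodness-of-fit measures for each specification. For the LR tests of the normal against each alternative, a χ² distribution with degrees of freedom (m) equal to the number of additional parameters will be used to test the hypothesis. Specifically, for T, GED, and EGB2* (symmetric EGB2 with p = q), m=1; for GT, SGED, EGB2, and IHS, m=2; and for SGT, m=3.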
Since the LR-GARCH statistic is significant at the 5% level for all models, there is strong support for the GARCH specification. Similarly, all of the LR-Normal test statistics are significant at the 5% level, providing a basis for rejecting the normal specification for each model for the S&P500 data. The last row in Table 4 reports LR tests comparing several skewed pdf's with their corresponding symmetric special cases: GT vs. SGT, T vs. SGT, GED vs. SGED, and EGB2* vs. EGB2. Statistical significance only arises in the case of the EGB2; hence, compelling evidence of asymmetry in the conditional pdf's for the S&P500 is lacking. Note that the asymptotic χ² may not be appropriate in comparing nested models in which the parameters of the constrained model lie on the boundary of the parameter space, where the asymptotic distribution may be a mixture of χ²'s.
The sum of squared errors (SSE), sum of absolute errors (SAE), and chi-squared goodness of fit (χ²) provide a basis for comparing non-nested specifications. The χ² statistic is asymptotically distributed as chi-square with degrees of freedom one less than the difference between the number of groups and estimated parameters. The χ² goodness of fit statistic fails to reject the symmetric EGB2 (EGB2*), GT, EGB2, IHS, and SGT at the 5% level of significance.
Looking across all specifications, based on the SSE, SAE, and χ² criteria, the GT appears to provide the best fit.
In summary, the S&P500 data provides an example of the importance of using parametric specifications which are flexible enough to accommodate observed data characteristics. In each case considered the GARCH effects are statistically significant and yield conditional distributions for the standardized residuals with different characteristics than for the unconditional residuals.
That is, the distributions for the standardized residuals exhibit higher (lower) levels of skewness (leptokurtosis) than the non-standardized ones. Nevertheless, the conditional heteroskedasticity itself cannot fully account for the non-normality of the log-return series.

Summary and conclusions
This paper has reviewed three families of flexible parametric probability density functions: the skewed generalized t distribution, the exponential generalized beta of the second kind, and the inverse hyperbolic sine distribution. These distributional families include as limiting or special cases many common parametric distributions, including the normal. They allow one to quite flexibly model the first four moments of a distribution while maintaining the parsimony of a completely specified parametric model as is summarized in Figure 2.
These models can provide the basis for partially adaptive or QML estimation of many econometric models. The use of the flexible distributions to model GARCH specifications was illustrated using S&P500 stock return data. For this data set, the SGT, IHS, and EGB2-GARCH specifications provided statistically significant improvements relative to normal-GARCH specifications and with respect to non-GARCH specifications.

Notes: In the GARCH specification, ε_t = y_t - μ. Parentheses include robust standard errors for the estimates. SK-Residuals and KU-Residuals are the standardized sample skewness and kurtosis for the GARCH residuals computed as in Table 1. SK and KU are the theoretical skewness and kurtosis computed using the formulas in the Appendix. LR-GARCH is a log-likelihood ratio statistic for testing the significance of the GARCH specifications. It follows the χ² distribution with 2 d.f. LR-normal is a log-likelihood ratio statistic for testing the normal against the remaining eight probability specifications. It follows the χ² distribution with m d.f. (m = 1 for the T, GED, and EGB2*; m = 2 for the GT, SGED, EGB2, and IHS; and m = 3 for the SGT). The last row presents the LR statistics for testing the T, GT, and SGED against the SGT, the GED against the SGED, and the EGB2* against the EGB2. SSE is the sum of squared errors, SAE is the sum of absolute errors, and χ² is the chi-square goodness of fit. *Statistically significant at the 5% level. Critical values for the χ² distribution at the 5% level are 3.84, 5.99, 7.81, 21.03, 22.36, 23.68, and 25.00 for 1, 2, 3, 12, 13, 14, and 15 d.f., respectively.
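The quoted 5% critical values and the likelihood ratio tests can be checked with a short Python sketch. It assumes integer degrees of freedom and uses the standard recursion for the chi-square upper-tail probability; the function names and the log-likelihood values in the example are hypothetical and are not results from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(z))  # df = 1
    else:
        k, q = 2, math.exp(-z)             # df = 2
    while k < df:
        # Step from k to k + 2 degrees of freedom.
        q += z ** (k / 2.0) * math.exp(-z) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lr_test(loglik_restricted, loglik_unrestricted, df):
    """Likelihood ratio statistic and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2_sf(stat, df)


# Tail probabilities at the critical values quoted in the table notes.
CRITICAL = {1: 3.84, 2: 5.99, 3: 7.81, 12: 21.03, 13: 22.36, 14: 23.68, 15: 25.00}
for df, crit in CRITICAL.items():
    print(df, crit, round(chi2_sf(crit, df), 4))

# Hypothetical log-likelihoods, for illustration only.
stat, p_value = lr_test(-1005.0, -1000.0, df=2)
print(stat, p_value)
```

As noted in the text, the asymptotic chi-square reference distribution may not be appropriate when the restricted parameters lie on the boundary of the parameter space.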