Bayesian Estimation for the Centered Parameterization of the Skew-Normal Distribution

The skew-normal (SN) distribution is a generalization of the normal distribution in which a shape parameter is added to accommodate skewed forms. The SN distribution shares some of the properties of the univariate normal distribution, which makes it very attractive from a practical standpoint; however, it presents some inference problems. Specifically, the maximum likelihood estimator of the shape parameter tends to infinity with positive probability. This paper proposes a new Bayesian approach that allows inferences to be drawn on the parameters of this distribution by combining improper prior distributions for the location and scale parameters of the "centered parametrization" with a Beta-type prior for the shape parameter. Samples from the posterior distributions are obtained with the Metropolis-Hastings algorithm. A simulation study shows that the mode of the posterior distribution is a good estimator in terms of bias and mean squared error. A comparative study with similar proposals for the SN estimation problem was undertaken; the simulation results provide evidence that the proposed method is easier to implement than previous ones. Some applications and comparisons are also included.


Introduction
The skew-normal distribution is a three-parameter class of distributions with location, scale and shape parameters, and it contains the normal distribution when the shape parameter equals zero. A continuous random variable Z is said to obey the skew-normal law with shape parameter λ ∈ R, denoted by Z ∼ SN(λ), if its density function is

f(z; λ) = 2φ(z)Φ(λz),  z ∈ R,

where φ(·) and Φ(·) denote the density and distribution functions of a standard normal variable.
If Y is a random variable defined by Y = ξ + ωZ, with ξ ∈ R, ω ∈ R+, then Y is said to have a skew-normal distribution with location-scale parameters (ξ, ω) and shape parameter λ, denoted by Y ∼ SN_D(ξ, ω, λ). The subscript D indicates the use of the "direct parametrization" (Azzalini 1985). The density function of Y is

f(y; ξ, ω, λ) = (2/ω) φ((y − ξ)/ω) Φ(λ(y − ξ)/ω),  y ∈ R.  (1)

The mean and variance of the density in (1) are given by

E(Y) = ξ + ωδ√(2/π),  Var(Y) = ω²(1 − 2δ²/π),  where δ = λ/√(1 + λ²).

The coefficient of skewness of Y is given by

γ1 = κ3/κ2^(3/2) = ((4 − π)/2) (δ√(2/π))³ / (1 − 2δ²/π)^(3/2),

where κ2, κ3 are the second and third order cumulants, respectively. The range of γ1 is (−c1, c1), where c1 = √2(4 − π)/(π − 2)^(3/2) ≈ 0.99527.
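As an illustration, the moment formulas above can be checked numerically. The following Python sketch (the paper's own routines were written in R; the function name is ours) computes the centered quantities from the direct parameters and cross-checks them against scipy's skewnorm implementation:

```python
import numpy as np
from scipy import stats

# Sketch: mean, variance and skewness of Y ~ SN_D(xi, omega, lambda),
# using the standard formulas quoted above (not code from the paper).
def sn_moments(xi, omega, lam):
    delta = lam / np.sqrt(1.0 + lam**2)
    mu_z = np.sqrt(2.0 / np.pi) * delta          # E[Z] for Z ~ SN(lambda)
    mean = xi + omega * mu_z
    var = omega**2 * (1.0 - mu_z**2)
    gamma1 = (4.0 - np.pi) / 2.0 * mu_z**3 / (1.0 - mu_z**2) ** 1.5
    return mean, var, gamma1

# Cross-check against scipy's skewnorm (shape parameter a = lambda):
m, v, g1 = sn_moments(0.0, 1.0, 5.0)
ms, vs, gs = stats.skewnorm.stats(5.0, moments="mvs")
assert np.isclose(m, ms) and np.isclose(v, vs) and np.isclose(g1, gs)
```

For λ = 5 this reproduces the centered values SN_C(0.7824, 0.6228, 0.8510) quoted later for the frontier data.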
The maximum likelihood estimation of the parameters is troublesome. Although the likelihood function can be computed without much trouble, several problems arise when maximizing it. For example, if ξ = 0, ω = 1, and all the observations are positive (or negative), then the likelihood function is monotone, and its maximum occurs on the upper (lower) boundary of the parameter space, corresponding to a non-finite estimate of λ. In the case of unknown values for the parameters ξ, ω, λ, the problem can be even more difficult because the likelihood function always has an inflection point at λ = 0, which makes the Hessian singular there (Chiogna 1998, Azzalini & Genton 2008). The second problem is solved by using a "centered parametrization" of the density function. The first remains an open problem, and some proposals already exist: Sartori's (2006) stands out; it uses a modification of the score function to estimate the parameter λ, combined with the maximum likelihood estimators of ξ and ω.
Based on the Bayesian approach, a solution was proposed by Liseo & Loperfido (2006), who focused their inference on λ, treating ξ, ω as nuisance parameters in the direct parameterization. Wiper, Girón & Pewsey (2008) also studied the problem in the case of the half-normal distribution.
In this paper we propose a new Bayesian approach based on the "centered parametrization" to deal with the estimation of the shape parameter in the skew-normal family. The proposed methodology applies MCMC simulation techniques using the Metropolis-Hastings algorithm (Metropolis, Rosenbluth, Teller & Teller 1953).
From a classical point of view, there are at least two reasons to consider the "centered parametrization": i) it provides a more practical interpretation of the parameters, and ii) it solves the well-known problems of the likelihood function under the direct parametrization. Arellano-Valle & Azzalini (2008) stated that standard likelihood-based methods, and also Bayesian methods, are problematic when applied to inference on the parameters of the direct parameterization near λ = 0. This is due to the fact that the direct parameterization is not numerically suitable for estimation.
The structure of this paper is the following: in Section 2 a Bayesian method is proposed to obtain point estimates of the "centered parameters", which can be back-transformed using the equations in (3) to obtain point estimates of the direct parameters. Section 3 contains results from a simulation study concerning the proposed methodology. Applications are presented in Section 4. Some conclusions are included in Section 5.

Bayesian Estimation
Let Y = (Y1, . . . , Yn) be a random sample from the skew-normal distribution with parameters µ, σ and γ1, and suppose we wish to obtain Bayesian estimators for the parameters. Following Azzalini (1985), Azzalini & Capitanio (1999) and Pewsey (2000), we propose to use the "centered parametrization" to obtain Bayesian estimators for the parameters of interest. We propose the following prior distributions:

p(µ) ∝ 1,  p(σ) ∝ 1/σ,  p(γ1) ∝ (c1 + γ1)^(a−1) (c1 − γ1)^(b−1),  γ1 ∈ (−c1, c1).

Note that we have assigned the standard prior for location-scale parameters to µ and σ (Box & Tiao 1973). In the case of γ1, we know that it takes values on (−c1, c1), so if W ∼ Beta(a, b), then the random variable γ1 = 2c1W − c1 has a density whose kernel is shown above. The prior density for γ1 depends on two hyperparameters (a and b) and can produce a rich variety of shapes, just as in the case of the Beta distribution. In this paper we set a = b = 1, which leads to a uniform distribution on (−c1, c1), but other values can be selected, allowing the incorporation of prior information. The uniform prior is non-informative, but proper. An invariant, non-informative prior for γ1 could be derived using an approach similar to that employed by Liseo & Loperfido (2006); however, that approach is computationally complicated because it involves numerical integration in n-dimensional space. The priors proposed in this work allow the implementation of the well-known Metropolis-Hastings algorithm to sample from the posterior density. As the sample size increases, the effect of the prior becomes less relevant.
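These priors translate directly into a joint log-prior. The sketch below (Python; a hypothetical helper, not the paper's code) combines the improper location-scale prior with the shifted Beta(a, b) kernel for γ1, returning −∞ for inadmissible values:

```python
import numpy as np

C1 = np.sqrt(2.0) * (4.0 - np.pi) / (np.pi - 2.0) ** 1.5  # ~0.99527

# Sketch of the joint log-prior described above: improper in mu and sigma,
# shifted Beta(a, b) kernel for gamma1. a = b = 1 gives the uniform prior.
def log_prior(mu, sigma, gamma1, a=1.0, b=1.0):
    if sigma <= 0.0 or not (-C1 < gamma1 < C1):
        return -np.inf                       # outside the parameter space
    return (-np.log(sigma)
            + (a - 1.0) * np.log(C1 + gamma1)
            + (b - 1.0) * np.log(C1 - gamma1))
```

With a = b = 1 the two Beta terms vanish and only the −log σ contribution remains, matching the uniform choice used in the paper.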
Then, under the assumption of independence, the joint prior density in the "centered parameterization" is

p(µ, σ, γ1) ∝ (1/σ)(c1 + γ1)^(a−1)(c1 − γ1)^(b−1).

Applying Bayes' theorem, the joint posterior distribution of µ, σ, γ1 is

p(µ, σ, γ1 | y) ∝ p(µ, σ, γ1) ∏_{i=1}^{n} (2/ω) φ((yi − ξ)/ω) Φ(λ(yi − ξ)/ω),

where (ξ, ω, λ) are the direct parameters implied by (µ, σ, γ1). The prior implied on the parameters of the original parameter space can be obtained by the change-of-variables formula, for which the inverse transformation turns out to be

ξ = µ − ωµz,  ω = σ/√(1 − µz²),  λ = δ/√(1 − δ²),  (3)

with t = (2γ1/(4 − π))^(1/3), µz = t/√(1 + t²) and δ = µz√(π/2). Multiplying the joint prior by the absolute value of the Jacobian of this transformation gives the implied joint prior density in the original parametrization, which is quite different from the one used by Liseo & Loperfido (2006).
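The back-transformation from centered parameters (µ, σ, γ1) to direct parameters (ξ, ω, λ) follows from the standard moment relations of the SN family. A minimal Python sketch (our function name; the paper's routines were written in R):

```python
import numpy as np

# Sketch of the inverse transformation (centered -> direct): delta and mu_z
# are obtained by inverting the skewness equation of the SN family.
def centered_to_direct(mu, sigma, gamma1):
    t = np.cbrt(2.0 * gamma1 / (4.0 - np.pi))
    mu_z = t / np.sqrt(1.0 + t**2)              # E[Z] implied by gamma1
    delta = mu_z * np.sqrt(np.pi / 2.0)
    lam = delta / np.sqrt(1.0 - delta**2)       # shape
    omega = sigma / np.sqrt(1.0 - mu_z**2)      # scale
    xi = mu - omega * mu_z                      # location
    return xi, omega, lam
```

For example, the centered triple (0.7824, 0.6228, 0.8510) maps back to approximately (0, 1, 5), as quoted in the frontier-data example.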
To obtain the marginal posterior distributions of interest, p(µ | y), p(σ | y) and p(γ1 | y), it is necessary to resort to numerical methods such as Markov chain Monte Carlo techniques. To implement the Gibbs sampler one would need the full conditionals p(µ | σ, γ1, y), p(σ | µ, γ1, y) and p(γ1 | µ, σ, y); however, their closed forms are not available, and therefore we propose to use the Metropolis-Hastings algorithm (e.g. Metropolis et al. 1953, Chib & Greenberg 1995).
To apply the Metropolis-Hastings algorithm, the candidate-generating distribution has to be selected first. One has to be very careful in this step, since an inadequate selection of this distribution can cause the Metropolis-Hastings algorithm to perform poorly due to a high rejection rate. Pewsey (2000) obtained large-sample theory results for the moment estimators under the "centered parametrization". For Y ∼ SN_C(µ, σ, γ1), the moment estimators µ̂, σ̂ and γ̂1 are the sample mean, the sample standard deviation and the sample coefficient of skewness. By using the delta method, Pewsey (2000) obtained the asymptotic variances and covariances of these estimators, which involve the quantity τ = (2γ1/(4 − π))^(1/3). As a consequence of the definition of the moment estimators, Slutsky's theorem and the central limit theorem, the joint distribution of the estimators is asymptotically trivariate normal. We can use these results to select the multivariate normal distribution as the candidate generator in the Metropolis-Hastings algorithm.
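The moment estimators that centre the proposal can be computed directly from the data. The sketch below (Python; our reading of this step, not the paper's code) uses the sample mean, sample standard deviation and sample skewness, clipping the skewness to the admissible range (−c1, c1); the clipping tolerance is our choice:

```python
import numpy as np

C1 = 0.99527  # upper bound of the skewness gamma1

# Sketch: moment estimators of (mu, sigma, gamma1) in the centered
# parametrization; the sample skewness is clipped to the admissible range.
def moment_estimators(y):
    y = np.asarray(y, dtype=float)
    mu_hat = y.mean()
    sigma_hat = y.std()                              # divisor n, as for b1
    b1 = np.mean((y - mu_hat) ** 3) / sigma_hat**3   # sample skewness
    gamma1_hat = np.clip(b1, -C1 + 1e-6, C1 - 1e-6)
    return mu_hat, sigma_hat, gamma1_hat
```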
To start the Metropolis-Hastings algorithm, the trivariate normal distribution N3(θ̂, Σ̂) is used as a proposal density, where θ̂ = (µ̂, σ̂, γ̂1) are the moment estimators of (µ, σ, γ1) in the "centered parametrization" and Σ̂ is the estimated asymptotic variance-covariance matrix in (4), obtained by plugging the parameter estimates into the variance and covariance formulae given by Pewsey (2000). It is important to note that Σ̂ in (4) can be adjusted in order to mimic the posterior distribution (Carlin & Louis 2000). In this paper, routines were written in the R software (R Core Team 2015) to estimate the posterior distributions of µ, σ and γ1. These routines are designed to update the covariance matrix after having obtained s Markov chain Monte Carlo samples of the parameters of interest.
In this work we set s = 1,000. The variance-covariance matrix is then estimated as

Σ̂ = (1/(s − 1)) Σ_{j=1}^{s} (θj − θ̄)(θj − θ̄)ᵀ,

where j indexes the Markov chain Monte Carlo samples, θj is the j-th sample and θ̄ = (1/s) Σ_{j=1}^{s} θj. Once the variance-covariance matrix is adjusted, N3(θ̂, Σ̂) is used as the proposal distribution. Since σ is positive and γ1 is restricted to the interval (−c1, c1), inadmissible values can be avoided simply by rejecting samples that do not meet these conditions. In order to verify the convergence of the Metropolis-Hastings algorithm, the Gelman & Rubin (1992) convergence test is used.
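Putting the pieces together, the sampling scheme can be sketched in Python (the paper's routines were written in R). For simplicity this sketch uses a random-walk Metropolis step with a symmetric trivariate normal proposal, rather than the independence proposal centred at the moment estimates, and re-estimates the proposal covariance once after s draws; all function names and defaults are ours:

```python
import numpy as np
from scipy import stats

C1 = 0.99527  # upper bound of gamma1

def centered_to_direct(mu, sigma, g1):
    # Inverse transformation (centered -> direct), standard SN relations.
    t = np.cbrt(2.0 * g1 / (4.0 - np.pi))
    mu_z = t / np.sqrt(1.0 + t**2)
    omega = sigma / np.sqrt(1.0 - mu_z**2)
    delta = mu_z * np.sqrt(np.pi / 2.0)
    return mu - omega * mu_z, omega, delta / np.sqrt(1.0 - delta**2)

def log_post(theta, y, a=1.0, b=1.0):
    mu, sigma, g1 = theta
    if sigma <= 0.0 or not (-C1 < g1 < C1):
        return -np.inf                      # inadmissible values are rejected
    xi, omega, lam = centered_to_direct(mu, sigma, g1)
    loglik = stats.skewnorm.logpdf(y, lam, loc=xi, scale=omega).sum()
    return (loglik - np.log(sigma)
            + (a - 1.0) * np.log(C1 + g1) + (b - 1.0) * np.log(C1 - g1))

def mh_sampler(y, theta0, cov0, n_iter=30000, s=1000, rng=None):
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta, y)
    cov = np.asarray(cov0, dtype=float)
    draws = np.empty((n_iter, 3))
    for j in range(n_iter):
        prop = rng.multivariate_normal(theta, cov)
        lp_prop = log_post(prop, y)
        if np.log(rng.uniform()) < lp_prop - lp:   # symmetric proposal
            theta, lp = prop, lp_prop
        draws[j] = theta
        if j + 1 == s:                  # one-off covariance re-estimation
            cov = np.cov(draws[:s].T) + 1e-9 * np.eye(3)
    return draws
```

Returning −∞ from the log-posterior for σ ≤ 0 or γ1 outside (−c1, c1) implements the rejection of inadmissible values described above, since such proposals are never accepted.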

Simulation Study
In this section, we present results from a simulation study of the proposed Bayesian procedure described in the previous section. Samples of size n = 20, 50 and 100 were simulated from SN_C(µ, σ, γ1) for different values of µ, σ and γ1. A comparison with the method of moments (Pewsey 2000) and the modified maximum likelihood estimators (Sartori 2006), in terms of bias and mean squared error, is also included. For this comparison, the estimates in the "direct parameterization" were transformed to the "centered parameterization".
Next, the algorithm used to estimate the bias and mean squared error under the Bayesian approach is briefly described.
a) Generate a trivariate normal sample using the results of steps 4) and 5).
b) Reject the samples that do not meet the conditions σ > 0 and γ1 ∈ (−c1, c1); keep the samples that meet them, and select one with the Metropolis algorithm.
c) Repeat steps a) and b) B = 30,000 times, updating the variance-covariance matrix of the proposal distribution after 1,000 iterations.
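The point estimate reported in Table 1 is the mode of each marginal posterior. The sampler does not produce this directly; one simple way to read it off from the MCMC draws (our illustrative choice, since the paper does not specify the device) is the midpoint of the fullest histogram bin:

```python
import numpy as np

# Sketch: marginal posterior mode from a vector of MCMC draws, taken as
# the midpoint of the histogram bin with the highest count.
def marginal_mode(draws, bins=50):
    counts, edges = np.histogram(draws, bins=bins)
    k = np.argmax(counts)
    return 0.5 * (edges[k] + edges[k + 1])
```

A kernel density estimate would serve equally well; the histogram version is just the cheapest reasonable choice.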
Revista Colombiana de Estadística 40 (2017) 123-140

Table 1: Bias and mean squared error for Bayesian point estimators. The estimates of µ, σ and γ1 were obtained from 5,000 simulated samples of size n from SN_C(µ, σ, γ1). The mode of the marginal posterior densities was used as a point estimate of the true parameters.

Some Examples
In this section, we present three examples of the proposed Bayesian method. For each example, we estimate the marginal posterior densities of µ, σ and γ1 using three parallel Metropolis sampling chains; each is run for 50,000 iterations with a burn-in period of 25,000 iterations. In all cases the proposal acceptance rate was at least 30%. Convergence was checked by inspecting trace plots and applying the Gelman & Rubin (1992) test.

Example 1: The Frontier Data
The data in this example correspond to n = 50 random numbers generated from SN_D(0, 1, 5), or equivalently SN_C(0.7824, 0.6228, 0.8510). The data are available in R's sn package (Azzalini 2016). The maximum of the likelihood function occurs on the upper boundary of the parameter space, corresponding to an infinite estimate of λ, so one could expect to obtain an estimate near 0.99527 for γ1.
Figure 1 shows histograms of simulated draws from the posterior distributions of µ, σ and γ1, and Table 4 shows the corresponding posterior summaries. If the mode of the marginal posterior distribution is used for estimation purposes, then µ̂ = 0.8873, σ̂ = 0.7323 and γ̂1 = 0.9682. Note that these point estimates are similar to those obtained with the method proposed by Sartori (2006), namely µ̂ = 0.8811 (0.08602), σ̂ = 0.7295 (0.0777) and γ̂1 = 0.9481 (0.0570), where the values in parentheses are standard errors. Table 5 shows the results of the Gelman & Rubin test, based on three parallel Metropolis sampling chains with different initial values. Approximate convergence is diagnosed when the upper C.I. limit is close to 1.

Example 2: Sartori's Data
The data in Table 6 are 20 random numbers from SN_D(0, 1, 10), or equivalently SN_C(0.7939, 0.6080, 0.9556), published in Sartori's article (2006). Both the usual maximum likelihood estimator of λ and the one obtained with Sartori's method are finite for these data. Figure 2 shows histograms of simulated draws from the posterior distributions of µ, σ and γ1, and Table 7 shows the posterior summaries for the mean, standard deviation and coefficient of skewness. When the mode of the marginal posterior distribution is used as the point estimate, µ̂ = 0.8627, σ̂ = 0.6382 and γ̂1 = 0.6179. Note that these point estimates are similar to those obtained with the method proposed by Sartori (2006), namely µ̂ = 0.8812 (0.2376), σ̂ = 0.6338 (0.1171) and γ̂1 = 0.6289 (0.3884).

Example 3: Half-Normal Case
Perhaps one of the hardest settings in which to estimate the parameters of SN_D(0, 1, λ) is when λ is large. Since it is known that, as λ tends to infinity, SN_D(0, 1, λ) tends to the half-normal density, we tested the proposed estimation procedure in the half-normal case and compared it with the results of Wiper et al. (2008). In this case, we expect to obtain an estimate near 0.99527 for γ1.
In Table 8, we present the simulation results for the proposed estimation method when the sample comes from a half-normal random variable with parameters ξ = 0, η = 1, i.e. HN(0, 1). This is approximately an SN_C(0.7979, 0.6028, 0.99527).
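The limiting centered values quoted here are easy to verify numerically: for Y = |X| with X ∼ N(0, 1), E(Y) = √(2/π) ≈ 0.7979 and SD(Y) = √(1 − 2/π) ≈ 0.6028. A quick Python check (ours, not from the paper):

```python
import numpy as np

# Centered location and scale of the half-normal limit of SN_D(0, 1, lambda).
mu_hn = np.sqrt(2.0 / np.pi)        # E|X| for X ~ N(0, 1)
sd_hn = np.sqrt(1.0 - 2.0 / np.pi)  # SD|X| for X ~ N(0, 1)

# Monte Carlo confirmation with simulated half-normal data:
rng = np.random.default_rng(1)
y = np.abs(rng.standard_normal(200000))
assert abs(y.mean() - mu_hn) < 0.01
assert abs(y.std() - sd_hn) < 0.01
```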
The point estimate of γ1 is very close to the expected value, 0.99527, when n = 50, 100; this is a good result because, if one plotted the half-normal and the SN_C(0.7979, 0.6028, 0.99527) densities together, one would not be able to distinguish between them. The results show that the bias and mean squared error of γ̂1 tend to zero as n increases, which provides evidence that the proposed estimator is consistent (Table 8). Although the following is not a fair comparison, because in Wiper's case only two parameters are estimated while in the SN case three are, it at least gives an idea of how well the proposed estimator works in this setting. For the location parameter, the proposed estimator seems to be better, since its mean squared error is smaller than that of Wiper et al.'s (2008) estimator. For the scale parameter, Wiper et al.'s (2008) estimator seems to be better than the one proposed in this paper (Table 9).

Concluding Remarks
The simulation results in Section 3 provide evidence of the advantages of using the Metropolis-Hastings algorithm to estimate the parameters of the skew-normal family.
The results of the simulation study show that the bias and the mean squared error decrease as the sample size increases. Also, the mode appears to be a precise synthetic index of the posterior distribution.
It also turns out that the estimates of the shape parameter provided by the new methodology are always finite, in contrast to maximum likelihood estimation under the direct parameterization, which can yield non-finite estimates and convergence problems.
Since we are using the Bayesian approach, it is possible to obtain HPD intervals for the parameters of interest, similarly to Wiper et al. (2008).

Figure 1: Histogram of simulated draws from the posterior distributions of (a) µ, (b) σ and (c) γ1.

Figure 2: Histogram of simulated draws from the posterior distributions of (a) µ, (b) σ and (c) γ1.

Table 3: Bias and mean squared error for moment estimates.

Table 4: Posterior summary for the frontier data.

Table 8: Performance of the proposed estimation method when the sample comes from an HN(0, 1) distribution. The mean squared error (MSE) and bias are obtained when the mode of the marginal posterior densities is used as a point estimate of µ = 0.7979, σ = 0.6028, γ1 = 0.99527. Results are based on 5,000 samples of size n.