Asymmetric Regression Models Bernoulli/log Proportional-hazard Distribution

In this paper we introduce a kind of asymmetric distribution for non-negative data called log-proportional hazard distribution (LPHF). This new distribution is used to study an asymmetrical regression model for data with limited responses (censored) through the mixture of a Bernoulli distribution with logit link and the LPHF distribution. Properties of the LPHF distribution are studied, maximum likelihood parameter estimation and information matrices are addressed. An illustration with real data shows that the model is a new alternative for studies with positive data censored. Resumen En este artículo se introduce una forma de distribución asimétrica para datos no-negativos llamada distribución log hazard proporcional (LPHF). Esta nueva distribución es usada para estudiar un modelo de regresión asimé-trico para datos con respuestas limitadas (censuradas) a través de mezclas de una distribución Bernoulli con función link logit y la distribución LPHF. Propiedades de la distribución LPHF son estudiadas, se abordan las estima-ciones de máxima verosimilitud de los parámetros y las matrices de informa-ción. Se presenta una ilustración con datos reales, donde se muestra que el modelo propuesto es una nueva alternativa para estudios con datos positivos censurados.


Introduction
The fundamental law of geochemistry enunciated by Ahrens (1954), "the concentration of a chemical element in a rock is distributed log-normal", is an application of the log-normal distribution. This distribution is also widely used to model different types of information, including income in the economy and lifetime distributions from materials, among others.
In many of these situations, both the kurtosis and the asymmetry of the distribution are above or below the expected for the log-normal model, reason why it is necessary to think in a more flexible model that achieves such deviation in modeling positive data.
Its density function is given by: where ξ ∈ R, is a location parameter, η ∈ R + , is a scale parameter, λ is an asymmetry parameter, φ(·) is the density function of a standard normal distribution and Φ(·) is the respective cumulative distribution function. Notice that if λ = 0 then the ordinary log-normal distribution follows as it is the case with the ordinary skew-normal model. Also, the information matrix is singular, thus regularity conditions are no longer satisfied. One consequence of this fact is that likelihood ratio statistics is no longer distributed according to the central chi-square distribution (Arellano-Valle & Azzalini 2008).
Moreover, in many cases the asymmetrical positive random variable in the study is limited, and in turn this is explained by a set of auxiliary covariates X 1 , X 2 , ..., X p , thus extensions to the censured case with covariates should be addressed. The study of random variables with limited responders with covariates was presented by Tobin (1958) who studied the model popularly known as Tobit.
This model has been extensively studied in the case of normally distributed errors and is defined by considering that the observed random variable y i = max{y * i , 0} with y * i = x i β + i , i = 1, 2, . . . , n; where the error term i ∼ N (0, σ 2 ), i = 1, . . . , n, x i is a p × 1 vector of known independent variables and β is a p × 1 unknown parameter vector. Although the Tobit model is an alternative for censoring data; in some situations the proportion of censored data cannot be well explained by the normal model since the tail of this distribution is more or less heavier than the proportion of censored data.
For instance, Moulton & Halsey (1995) show an application with 330 children in Haiti during 1987-1990(see Job, Halsey, Boulos, Holt, Farrell, Albrecht, Brutus, Adrien, Andre, Chan, Kissinger, Boulos & the CiteSoleil/JHU 1991 which examines the; Immunogenecity of children before the implementation of a vaccine, here the number of observations below the detection limit was 86 observations (26.1%), which exceeds the expected value with a normal model, whereby the proportion of censure cannot be explained by the Tobit model. These authors use asymmetric models such as log-normal model.
In this sense, other works have been published, for instance, other distributions: such as the log-skew-normal has been implemented by Chai & Bailey (2008) and recently, Martínez-Florez, Bolfarine & Gómez (2013) implemented the model logalpha-power-normal.
The model proposed by Moulton & Halsey (1995) is a generalization of the Cragg (1971) model, that in the classical literature is known as the two-part model, which is an alternative to Tobit when the data rate below or above the threshold is quite different from the probability of the tail obtained with the normal model.
The probability density function of y i under Cragg (1971) model can be expressed as where p i is the probability determining the relative contribution made by the point distribution to the overall mixture distribution, f is a density function with positive support, and Given the nature of the random variables involved in the Cragg (1971) model, different processes determine the respective components of the model.
A positive response necessarily comes from f, on the other hand, a T value comes from the point mass distribution. This model, however, does not consider the situation of a lower limit and that part of the observations may be below this lower limit.
If allowed to some limiting responses are the result of interval censored to f , we have the generalization of the two-part model exposed by Moulton & Halsey (1995). This means that an observed T value can be either a realization from the point-mass distribution or a partial observation from f with critical value not precisely known but lying somewhat in (0, T ) for a small pre-specified constant T . Formally, where F is the cumulative distribution corresponding to f. If we vary the basic density f and the link function corresponding to p i , we can generate a large family of mixed models. Models such as probit/trucated-normal, logit/lognormal, logit/log-gamma, probit/log-skew-normal and logit/log-alpha-power normal have been considered in practical applications in biology, economy, agricultural and so on (Chai & Bailey 2008, Martínez-Florez, Bolfarine & Gómez 2013. Notice that for p i = 0, i = 1, . . . , n, Moulton & Halsey (1995) model reduces to the Tobit model (Tobin 1958). This is an extension of the log-normal distribution allowing for one extra parameter which will be presented in the next section.

Proportional Hazard Distribution
Recently, Martínez-Florez, Moreno-Arenas & Vergara-Cardozo (2013) introduced a new asymmetric model which is called proportional hazard model, this model is defined as follows: Let F be a continuous cumulative distribution function with probability density function f , and hazard function h = f /(1 − F ). We say that Z has a proportional hazard distribution associated with F , f and the parameter α > 0 if its probability density function is where α is a positive real number. We use the notation Z ∼ P HF (α). The distribution function of the P HF model is given by This is why this type of distribution can also be regarded as an exponentiated distribution or a fractional order statistic distribution, widely studied in the literature.
If Z is a random variable from a standard P HF (α) distribution then the location-scale extension of Z is obtained from the transformation X = ξ + ηZ, where ξ ∈ R and η ∈ R + , is a scale parameter.
In the particular case where F = Φ(·), we have the family of distributions called proportional hazard normal (PHN) and denoted P HN (ξ, η, α).
In Martínez-Florez, Moreno-Arenas & Vergara-Cardozo (2013), we can see the behavior of the P HN (0, 1, α) density and the model hazard function for some values of the α parameter.

Log Proportional-Hazard Distribution
Let Y be a random variable with support in R + , we say that Y follows a univariate log-proportional-hazard distribution with parameter α, if the transformed variable X = log(Y ) ∼ P HF (α). We denote Y ∼ LP HF (α).
Then, the pdf for the random variable Y can be written as where F is an absolutely continuous distribution function with density function f = dF . This model is called standard log proportional-hazard distribution. Let X ∼ P HF (ξ, η, α), where ξ ∈ R is a location parameter and η ∈ R + is a scale parameter. Hence, the transformation X = ln(Y ) leads to the location-scale log proportional-hazard model, with pdf given by We use the notation Y ∼ LP HF (ξ, η, α), so that LP HN (α) = LP HN (0, 1, α).

Its cumulative distribution function can be written as
According to (1), the inversion method can be used for generating from a random variable with distribution LP HF In the special case where f = φ(·) and F = Φ(·), the density and distribution functions of the standard normal distribution, respectively, we have the standard log proportional-hazard-normal distribution.
We will denote this extension by using the notation Y ∼ LP HN (ξ, η, α). Figure 1 shows the pdf's for the LPHN distribution for α equals 0.75, 1, 2 and 3. Is clearly seen that the shape of the distribution is affected when changes the value of α. For the log-normal case, when α = 1, the kurtosis is smaller than when α = 2 and, similarly, for the log-skew case, when α = 3. Furthermore, when α = 0.75 the kurtosis for the log-normal is greater. Asymmetry is always positive and also controlled by parameter α. Hence, α controls asymmetry as well as kurtosis for the LPHN distribution. The r-th moment for the random variable Y ∼ LP HN is calculated numerically. Using the results of the central momentsμ r , the coefficients of variation, asymmetry and kurtosis are obtained. Figure 2 shows the behavior of the mean and the coefficients of asymmetry and kurtosis of the LPHN model.
The survival and hazard functions for the LPHN model are, respectively, given by where r LN (·) is the hazard function of the log-normal distribution. Then, the hazard index T is proportional to the hazard index of the log-normal distribution.

Inference for Log Proportional-Hazard-Normal Model
For a random sample of size n, . The corresponding score equations are similar to the obtained in Martínez-Florez, Moreno-Arenas & Vergara-Cardozo (2013), we only need to consider the change in the log-likelihood function and obtain the MLE estimators using numerical methods.
The observed information matrix for location-scale PHN follows from minus the second derivatives of the log-likelihood function. This result is similar to that obtained by Martínez-Florez, Moreno-Arenas & Vergara-Cardozo (2013) but with minor changes due to the difference in the log-likelihood function.

Expected Information Matrix for the Location-Scale PHN
, the expected information matrix entries are: The expected values of the above variables are generally calculated using numerical integration. When α = 1, ϕ LΦ (x; ξ, η, 1) = 1 ηy φ log(y)−ξ η , the locationscale log-normal density function. Thus, the information matrix becomes Numerical integration shows that the determinant is |I(θ)| = 1 η 4 [2 − a 2 11 − 2a 2 01 ] = 0, so in the case of a log-normal distribution the model's information matrix is nonsingular. The upper left 2 × 2 submatrix is the log-normal distribution's information matrix.
For large n and under regularity conditions we havê θ A → N 3 (θ, I(θ) −1 ) and the conclusion follows thatθ is consistent and asymptotically approaches the normal distribution with I(θ) −1 as covariance matrix, for large samples.
This result shows that the information matrix for the LPHN model is nonsingular and therefore the inference for large samples can be made, contrary to the log-skew-normal model, whose information matrix is singular when λ = 0, that consequently resulting likelihood ratio statistic is not distributed as a chi-square.
Note that as in the LPHN model, the information matrix of the log-skewnormal model has the same structure or shape that the location-scale skew-normal model, SN (ξ, η, λ), where now Z = (log(y) − ξ)/η. Is well known and was demonstrated by Azzalini (1985), that the information matrix of the skew-normal model is singular when its parameter of asymmetry λ = 0.

Asymmetric Regression Model Logit/LPHN
We now extend the LPHN model to the case of random variables with a limit of detection and the presence of covariates. Specifically, we consider the case of models with limited response and excess zeros in the response variable. Considering extensions of the generalized the two-part model of Moulton & Halsey (1995) to the situations logit/log proportional hazard-normal model, jointly with covariates at each step of the model. Initially, we develop the case of censored random variables LPHN. Thus, calling p 0 the proportion of observations at or below threshold point T , the censored model LP HN (ξ, η, α) is represented by the probability density function Now we extend this model to the case of presence of covariates in limited response and when the response is not limited.
The above model can be extended to the situation where only a proportion 100p 0 % of censored observations comes from the censored LPHN, with the remaining 100(1 − p 0 )% of the observations coming from the population of low responders, located below or at the point T.
Modeling this mixture as the outcome of a Bernoulli random variable D with pr(D = 1) = 1 − p 0 while for D = 0, Y ≤ T with probability one. The contribution of y i to the likelihood conditioning on D = 1 when Y is assumed to follow a LPHN model can be written as Then, assuming that the response y i = T is explained by the set of explanatory variables X 11 , X 12 , . . . , X 1p , then we model this mixture as the outcome of a Bernoulli random variable with logit link function with and 1 − p 0i = 1 1 + exp (x (1)i β (1) ) where x (1)i = (1, x 1i1 , . . . , x 1ip ) , is a covariate vector of dimension p + 1 associated with the parameter vectors β (1) = (β 10 , β 11 , . . . , β 1p ) .
Taking into account the LPHN model, we have a covariate vector x (2) = (1, X 21 , X 22 , . . . , X 2r ) of dimension r, possibly different from x (1) and parameter vector β (2) = (β 20 , β 21 , . . . , β 2r ) , for which This mixture of distributions we will call "linear logistic regression model" with proportional hazard-normal distribution and will be denoted by The logarithm of the likelihood function for θ = (β (1) , β (2) , η, α) given X (1) , X (2) and Y, is given by We denote by 0 the sum over censored observations and 1 the sum over noncensored observations. The score function corresponding to the log-likelihood function is given by (for j = 1, 2, . . . , p and k = 1, 2, . . . , r) The system of equations obtained by equating the score to zero has no solution in closed form, and tends to be solved using iterative numerical methods.
The resulting equations require numerical procedures such as the Newton-Raphson or quasi-Newton method. These optimization algorithms can be found in the packages maxLik or optimx of the R software.
The observed information matrix is given by J(θ) = −H(θ) = − ∂ 2 (θ) ∂θ∂θ T , where H(θ) is the hessian matrix, which is obtained in the Appendix for the vector of parameters θ. In addition can be obtained information matrix defined as less n −1 times the expected value of the observed information matrix.

Numerical Illustration
The application of the logit/LPHN model, is carried out using the data described by Moulton & Halsey (1995) in a study of measles vaccines conducted in Haiti during 1987-1990. The detection limit was 0.1 international units (UI), or log(0.1) = −2.306 in the natural log-scale. The codification for the covariates involved in the study were X 1 = EZ (vaccine type; 0:Schwarz , 1:Edmonston-Zagreb); X 2 = HI (vaccine dose; 0:medium, 1:high) and X 3 = F EM (gender; 0:male, 1:female).
Such as Moulton & Halsey (1995), the aim in the present analysis is to study the immunogenicity differential between boys and girls using the logit/ log-proportionalhazard-normal (logit/LPHN) model.

Models
A variety of models can be adjusted given the covariates in the study. We adjust some of these models were carefully chosen from the cases studied by Moulton & Halsey (1995).
Model 1: Covariates and censored data in limited response, without censored data and covariates in the point-mass distribution located at zero; Model 2: Censored data and covariates in limited response, without covariates in the point-mass distribution located at zero; Model 3: Censored data, covariates in limited response and in the point-mass distribution located at zero; Model 4: Censored data, covariates in limited response and in the point-mass distribution located at zero, a particular model.
The summary statistics we have log(y) = −0.1793, s 2 = 1.1055, √ b 1 = 0.7521 and b 2 = 2.6286 where the quantities √ b 1 and b 2 correspond to the sample coefficients of asymmetry and kurtosis for values above 0.1. The high asymmetry degree indicated by the sample coefficient of asymmetry ( √ b 1 ) reveals that it seems worthwhile trying to fit an asymmetric model for this data set. Moulton & Halsey (1995), and Moulton & Halsey (1996) modeled this data using the hybrids logit/log-normal (logit/LN) and logit/log-gamma (logit/LGM) models.
We adjust the mixtures logit/LN and logit/LGM, for 1-4 models, finding in both cases the model 4 presents the best fit. The estimates for these models are given in the Table 1. Note that δ is the shape parameter of the LGM model. To compare model fit, we computed the AIC criterion (Akaike 1974). We started by fitting the censored LPHN model with covariates (Model 1). It is also fitted by the Bernoulli/LPHN model with covariates and logit link (models 2-4), for which the results are presented in the Table 2. According to the criterion AIC, the best fit clearly is presented by the hybrid logit/LPHN model.
In the case of the Bernoulli/LPHN model, we found that of all hybrid models fitted, the best is the Model 4.
In the continuous component we has that E(Y ) = X (2) β (2) since E( ) = 0. In order to have E(Y ) = X (2) β (2) we must correct the intercept taking β * (2)0 = β (0) + E( ), where ∼ LP N (0, η, α), That is, the corrected estimator for the intercept of the regression model corresponding to the continuous part. Therefore, for model 4, we found thatβ * (2)0 = −0.333. Here, covariates EZ and HI entered only in the Bernoulli component, and covariate FEM is the only associated with the LPHN component. Based on the Model 4, for those observations above the detection limit, the girls had exp(0.286) = 1.331, and hence greater measles antibody concentration than boys.
As mentioned at the beginning of this illustration, the goal was to show that the model censored logit/LPHN was a good alternative to adjust the data set vaccine now we are going to show that this model is indeed different from the model censored logit/LN, so, we test the hypothesis H 0 : α = 1 versus H 1 : α = 1 Using the likelihood ratio statistics, we have that −2 log(Λ) = −2(−511.18 + 480.72) = 60.91 which is greater than the 5% critical chi-square value 3.84, then we conclude that the logit/LPHN model fits the data better than the logit/LN model.
As a proof of good fit of the proposed model, we can confirm that the proportion of observations below the detection limit is 26.1% and the estimated proportion from model 2 with the hybrid model logit/LPHN is 25.90%.
Finally, in order to check the fit of the model estimates, we make the QQplot of the standardized residuals or scaled residuals of the continuous part, e i = (log(y i )− x (2)iβ (2) )/η based on the model and to the LN, LGM and LPHN distributions. Figure 3 presents QQplots for the scaled residuals.
Here, we can see that for vaccine data, the model LPHN fits better than the LN and LGM models, and thus, the mixed model logit/LPHN may be a new option to adjust censored data with covariates.

Conclusions
We proposed a new distribution that is used to study an asymmetrical regression model for data with limited responses through the mixture of a Bernoulli distribution with logit link and the LPHF distribution. Additionally, we made an illustration with real data and showed that the proposed model is an alternative for censored positive data.