Bayesian Semi-Parametric Logistic Regression Model with Application to Credit Scoring Data

In this article a new Bayesian regression model, called the Bayesian semi-parametric logistic regression model, is introduced. This model generalizes the semi-parametric logistic regression model (SLoRM) and improves its estimation process. The paper considers Bayesian and non-Bayesian estimation and inference for the parametric and semi-parametric logistic regression models, with an application to credit scoring data, under the squared error loss function. A new algorithm for estimating the SLoRM parameters using Bayes' theorem is described in detail. Finally, the parametric logistic regression model (PLoRM), the SLoRM and the Bayesian SLoRM are fitted and compared using a real data set.


Introduction
Semi-parametric regression models combine parametric and nonparametric components. They are often used in situations where a fully nonparametric model may not perform well, or when the functional form of a subset of the regressors or the density of the errors is not known. Since semi-parametric regression models contain a parametric component, they rely on parametric assumptions and, like a fully parametric model, may be misspecified and inconsistent if those assumptions fail.
Semi-parametric models combine the flexibility of a nonparametric model with the advantages of a parametric model. A fully nonparametric model is more robust than semi-parametric and parametric models since it does not suffer from the risk of misspecification. However, nonparametric estimators suffer from slow convergence rates, which deteriorate further when higher order derivatives and multidimensional random variables are considered. In contrast, parametric models carry the risk of misspecification, but if correctly specified they normally enjoy √n-consistency with no deterioration caused by derivatives or multivariate data. The basic idea of a semi-parametric model is to take the best of both approaches.
Many authors have introduced algorithms to estimate the parameters of semi-parametric regression models. Majumdar and Eubank (2009) studied Bayesian semi-parametric sales projections for the Texas lottery. Meyer et al. (2011) introduced Bayesian estimation and inference for generalized partial linear models using shape-restricted splines. Zhang et al. (2014) studied estimation and variable selection in partial linear single index models with error-prone linear covariates. Guo et al. (2015) studied the empirical likelihood for the single index model with missing covariates at random. Bouaziz et al. (2015) studied semi-parametric inference for the recurrent events process by means of a single-index model. Yousof and Gad (2015) introduced Bayesian estimation and inference for the generalized partial linear model using some multivariate conjugate prior distributions. Finucane et al. (2015) introduced a semi-parametric Bayesian density estimation with disparate data sources.
The aim of this paper is to propose a new method for estimating the semi-parametric logistic regression model; the proposed method is a Bayesian estimation algorithm. The rest of the paper is organized as follows. In Section 2, the generalized partial linear model (GPLM) is presented. Section 3 introduces the semi-parametric logistic regression model (SLoRM). In Section 4, Bayesian estimation and inference for the SLoRM are presented. In Section 5, the proposed method is applied to credit scoring data. Finally, some concluding remarks are presented in Section 6.

The Generalized Partial Linear Model (GPLM)
Consider the generalized linear model (GLM)

E[Y|X] = G(X^T β),  (1)

where G(.) is a known and monotone link function and β is an unknown finite-dimensional parameter vector. A semi-parametric generalized linear model, known as the generalized partial linear model, extends the GLM defined in Eq. (1). The GPLM has the form

E[Y|X, W] = G(X^T β + m(W)),  (2)

where G(.) is a known link function and m(.) is an unknown smooth function. It is a semi-parametric model since it contains both parametric and nonparametric components. For the identity link function, the model in Eq. (2) reduces to E[Y|X, W] = X^T β + m(W), which is called the partial linear model (PLM).
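As a concrete illustration of Eq. (2), the following sketch evaluates the GPLM mean E[Y|X, W] = G(X^T β + m(W)) for given values of β and of the nonparametric part m(W_i). The helper names are hypothetical, and the logistic link is assumed since the paper later specializes to it:

```python
import numpy as np

def logistic(eta):
    """Inverse logistic link: G(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + np.exp(-eta))

def gplm_mean(X, beta, m_W):
    """GPLM mean E[Y|X, W] = G(X @ beta + m(W)) with logistic G."""
    return logistic(X @ beta + m_W)

# Toy illustration: 3 observations, 2 parametric covariates.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
beta = np.array([0.5, -0.5])
m_W = np.array([0.1, -0.2, 0.0])   # nonparametric part evaluated at each W_i
mu = gplm_mean(X, beta, m_W)       # fitted probabilities, each in (0, 1)
```

For the third observation, X^T β = 0.5 − 0.5 = 0 and m(W_3) = 0, so the fitted mean is G(0) = 0.5.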
In the generalized linear model (GLM), the link function G(.) is fixed and monotone and the index X^T β is linear. In the generalized partial linear model (GPLM), a more complex relationship between the response and the regressors is allowed through the nonparametric component m(W).
The estimation methods for the GPLM in Eq. (2) are based on the idea that an estimator β̂ can be obtained for a known m(.), and an estimator m̂(.) can be obtained for a known β. The estimation method considered here uses kernel smoothing for the nonparametric component of the model and is known as the profile-likelihood method.
The profile likelihood method was introduced by Severini and Wong (1992). It is based on assuming a parametric model for the conditional distribution of Y given X and W. The idea of this method is as follows:
1. Fix the parametric component of the model, i.e. the parameter vector β, at some value β*.
2. Estimate the nonparametric component of the model depending on β*, i.e. m_β(.), using a smoothing method to obtain the estimator m̂_β(.).
3. Use the estimator m̂_β(.) to construct a profile likelihood for the parametric component, using either a true likelihood or a quasi-likelihood function.
4. Maximize the profile likelihood function to obtain an estimator of the parametric component of the model.
Thus the profile likelihood method separates the estimation process into two parts: the parametric part, which is estimated by a parametric method, and the nonparametric part, which is estimated by a nonparametric method. Murphy and van der Vaart (2000) show that the full likelihood method is not a good choice for semi-parametric models: in semi-parametric models the observed information, if it exists, would be an infinite-dimensional operator. They use a profile likelihood rather than a full likelihood to overcome this problem. The algorithm for the profile likelihood method is derived as follows.
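The four steps above can be sketched as follows. This is a minimal illustration under assumed choices (a Gaussian kernel, a Bernoulli log-likelihood with logistic link, and a plain Newton step for β that omits the ∇m_β design-matrix correction derived later in the text); all function names are hypothetical:

```python
import numpy as np

def logistic(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def estimate_m_given_beta(beta, X, W, y, h=1.0, n_newton=5):
    """Step 2: for fixed beta, estimate m_beta(W_j) at each observation
    by kernel-weighted Newton steps on the local likelihood."""
    K = np.exp(-0.5 * ((W[:, None] - W[None, :]) / h) ** 2)  # Gaussian kernel
    m = np.zeros(len(y))
    for _ in range(n_newton):
        mu = logistic(X @ beta + m)
        # kernel-weighted score / information of the Bernoulli likelihood
        m = m + (K @ (y - mu)) / (K @ (mu * (1 - mu)) + 1e-10)
    return m

def profile_step_beta(beta, X, W, y):
    """Steps 3-4: one Newton step on the profile likelihood with the
    estimator m_hat_beta(.) plugged in (design correction omitted)."""
    m = estimate_m_given_beta(beta, X, W, y)
    mu = logistic(X @ beta + m)
    score = X.T @ (y - mu)                          # summed first derivatives
    info = X.T @ (X * (mu * (1 - mu))[:, None])     # observed information
    return beta + np.linalg.solve(info + 1e-8 * np.eye(X.shape[1]), score)
```

Iterating `profile_step_beta` to convergence yields the profile-likelihood estimator of β; the full algorithm with the modified design matrix is derived in the following subsections.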

Derivation of the likelihood functions
For the parametric component of the model, the objective function is the parametric profile likelihood function, which is maximized to obtain an estimator of β. This function is given as

ℓ(β) = Σ_{i=1}^n ℓ(μ_{i,β}, y_i),  (3)

where ℓ(.) denotes the log-likelihood or quasi-likelihood function, μ_{i,β} = G(X_i^T β + m_β(W_i)), and ℓ_i = ℓ(μ_{i,β}, y_i).
For the nonparametric component of the model, the objective function is a smoothed, or local, likelihood function, given as

ℓ_H(m_β(w)) = Σ_{i=1}^n K_H(w − W_i) ℓ(G(X_i^T β + m_β(w)), y_i),  (4)

where the local weight K_H(w − W_i) is the kernel weight, with K denoting a multidimensional kernel function and H a bandwidth matrix. The function in Eq. (4) is maximized to obtain an estimator of the smooth function m_β(w) at a point w.
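For the logistic case considered later in the paper, the smoothed likelihood of Eq. (4) can be evaluated directly. The sketch below (hypothetical names; a univariate Gaussian kernel is assumed) computes it for a candidate value of m_β(w), so that a simple grid search illustrates the maximization at a point w:

```python
import numpy as np

def local_loglik(m_at_w, w, beta, X, W, y, h=1.0):
    """Eq. (4): sum_i K_H(w - W_i) * l(G(X_i' beta + m_beta(w)), y_i),
    with a Bernoulli log-likelihood and logistic link G."""
    K = np.exp(-0.5 * ((w - W) / h) ** 2)              # kernel weights
    mu = 1.0 / (1.0 + np.exp(-(X @ beta + m_at_w)))    # G(X_i' beta + m)
    ll = y * np.log(mu) + (1 - y) * np.log(1 - mu)     # l(mu_i, y_i)
    return float(np.sum(K * ll))

# Maximize over a grid of candidate values of m_beta(w) at the point w = 1.0.
X = np.zeros((2, 1)); beta = np.zeros(1)
W = np.array([-1.0, 1.0]); y = np.array([0.0, 1.0])
grid = np.linspace(-3.0, 3.0, 601)
m_hat = grid[np.argmax([local_loglik(m, 1.0, beta, X, W, y) for m in grid])]
```

Since w = 1.0 sits on the observation with y = 1, the distant y = 0 observation receives a small kernel weight and the maximizer m̂_β(w) is pulled towards a large positive value rather than the pooled logit of the sample.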

Maximization of the likelihood functions
The maximization of the local likelihood in Eq. (4) requires solving

Σ_{i=1}^n K_H(w − W_i) ℓ'(G(X_i^T β + m_β(w)), y_i) = 0  (5)

with respect to m_β(w), where ℓ'(.) denotes the first derivative of ℓ(.). The maximization of the profile likelihood in Eq. (3) requires solving

Σ_{i=1}^n ℓ'(μ_{i,β}, y_i) {X_i + ∇m_β(W_i)} = 0  (6)

with respect to the coefficient vector β. The vector ∇m_β(W_i) denotes the vector of all partial derivatives of m_β(W_i) with respect to β. A further differentiation of Eq. (5) with respect to β leads to an explicit expression for ∇m_β:

∇m_β(w) = − [Σ_{i=1}^n K_H(w − W_i) ℓ''(G(X_i^T β + m_β(w)), y_i) X_i] / [Σ_{i=1}^n K_H(w − W_i) ℓ''(G(X_i^T β + m_β(w)), y_i)],  (7)

where ℓ''(.) denotes the second derivative of ℓ(.). Equations (5) and (6) can only be solved iteratively. Severini and Staniswalis (1994) presented a Newton-Raphson type algorithm for this maximization, as follows. Let η_ij = X_i^T β + m_j with m_j = m_β(W_j), ℓ_i = ℓ(G(X_i^T β + m_β(W_i)), y_i) and ℓ_ij = ℓ(G(η_ij), y_i). Also, let ℓ_i', ℓ_i'', ℓ_ij' and ℓ_ij'' be the first and second derivatives of ℓ_i and ℓ_ij with respect to their first argument, respectively. All these quantities are calculated at the observations W_j instead of the free parameter w. Then Eq. (5) and Eq. (6) are transformed to

Σ_{i=1}^n K_H(W_j − W_i) ℓ_ij' = 0,  j = 1, …, n,  (8)

and

Σ_{i=1}^n ℓ_i' (X_i + ∇m_i) = 0,  (9)

respectively. Eq. (8) and Eq. (9) imply the following iterative Newton-Raphson type algorithm.

First: the estimator of ∇m_j = ∇m_β(W_j) based on Eq. (7),

∇m̂_j = − [Σ_{i=1}^n K_H(W_j − W_i) ℓ_ij'' X_i] / [Σ_{i=1}^n K_H(W_j − W_i) ℓ_ij''],

which is necessary to estimate β.

Second: the updating step for β. The vector β is updated as

β^{new} = β − B^{-1} Σ_{i=1}^n ℓ_i' X̃_i,

where X̃_i = X_i + ∇m̂_i and B is a Hessian-type matrix defined as B = Σ_{i=1}^n ℓ_i'' X̃_i X̃_i^T. This updating step can be summarized in the closed matrix form

β^{new} = (X̃^T W̃ X̃)^{-1} X̃^T W̃ Z̃,

where Z̃ = X̃β − W̃^{-1}v, v = (ℓ_1', …, ℓ_n')^T, W̃ = diag(ℓ_1'', …, ℓ_n''), and S_p is a smoother matrix with elements [S_p]_{ji} = K_H(W_j − W_i) ℓ_ij'' / Σ_{k=1}^n K_H(W_j − W_k) ℓ_kj'', so that X̃ = (I − S_p)X.
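The closed matrix form can be made concrete for the logistic case, where −ℓ'' = μ(1 − μ) and the signs in the ratio defining S_p cancel. The sketch below, with hypothetical names and a Gaussian kernel, builds the smoother matrix S_p, the modified design X̃ = (I − S_p)X and the Hessian-type matrix B; the second derivatives ℓ_ij'' are approximated at the current fit, a common simplification:

```python
import numpy as np

def smoother_and_modified_design(X, W, mu, h=1.0):
    """Build S_p, X_tilde = (I - S_p) X and the Hessian-type matrix B for
    the logistic case, where -l'' = mu(1 - mu) enters as the weight w_i;
    the minus signs cancel in the S_p ratio."""
    K = np.exp(-0.5 * ((W[:, None] - W[None, :]) / h) ** 2)  # K_H(W_j - W_i)
    w = mu * (1.0 - mu)                                      # -l''_i, logistic
    KW = K * w[None, :]                                      # K * weights
    S_p = KW / KW.sum(axis=1, keepdims=True)                 # rows sum to one
    X_tilde = X - S_p @ X                                    # (I - S_p) X
    B = X_tilde.T @ (X_tilde * w[:, None])                   # Hessian-type B
    return S_p, X_tilde, B
```

Because each row of S_p is a normalized set of kernel weights, S_p applied to any vector returns a local weighted average, which is exactly how the ∇m̂_j correction removes the smoothed part of X from the design.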

Third: the updating step for m_j = m_β(W_j). The function is updated by

m_j^{(k+1)} = m_j^{(k)} − [Σ_{i=1}^n K_H(W_j − W_i) ℓ_ij'^{(k)}] / [Σ_{i=1}^n K_H(W_j − W_i) ℓ_ij''^{(k)}],

where k = 0, 1, 2, … is the iteration number. Note that the function ℓ_ij''(.) can be replaced by its expectation with respect to Y to obtain a Fisher scoring type algorithm (Severini and Staniswalis, 1994).
For the above procedure we can note the following:
1. The variable Z̃ is a set of adjusted dependent variables.
2. The parameter β is updated by a parametric method with a nonparametrically modified design matrix X̃.
3. The function ℓ_i'' can be replaced by its expectation, with respect to Y, to obtain a Fisher scoring type procedure.
4. The updating step for m_j has a quite complex structure; it can be simplified in some models with identity or exponential link functions G.

The semi-parametric logistic regression model (SLoRM)
Consider the updating steps for estimating the GPLM by the profile-likelihood method given in Section 2. The updated values of m_j and β are

m_j^{(k+1)} = m_j^{(k)} − [Σ_{i=1}^n K_H(W_j − W_i) ℓ_ij'] / [Σ_{i=1}^n K_H(W_j − W_i) ℓ_ij'']  (13)

and

β^{new} = (X̃^T W̃ X̃)^{-1} X̃^T W̃ Z̃.  (14)

The estimators of the parametric component β and the nonparametric component m(w) are now developed for the case where the response variable Y follows a binomial distribution. This is shown for the univariate-case model using the profile likelihood method. The iterative procedure used is a Newton-Raphson type algorithm.

Updates of the nonparametric component
Eq. (13) can be written as

m_j^{(k+1)} = m_j^{(k)} + [Σ_{i=1}^n K_H(W_j − W_i)(y_i − μ_ij)] / [Σ_{i=1}^n K_H(W_j − W_i) μ_ij(1 − μ_ij)],

where μ_ij = G(X_i^T β + m_j) and G(η) = 1/(1 + e^{−η}) is the logistic function, so that ℓ_ij' = y_i − μ_ij and ℓ_ij'' = −μ_ij(1 − μ_ij).

Updates of the parametric component
Eq. (14) can be written as

β^{(k+1)} = (X̃^T W̃^{(k)} X̃)^{-1} X̃^T W̃^{(k)} Z̃^{(k)},

where W̃^{(k)} = diag(μ_i^{(k)}(1 − μ_i^{(k)})) and Z̃_i^{(k)} = X̃_i^T β^{(k)} + (y_i − μ_i^{(k)})/(μ_i^{(k)}(1 − μ_i^{(k)})), for i = 1, 2, …, n, where n is the number of observations and k = 0, 1, 2, … indexes the iterations. The procedure is iterated until convergence.
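Putting the two updating steps together, the following sketch fits a SLoRM on simulated data. It is a minimal illustration under assumed choices (a Gaussian kernel with fixed bandwidth, small ridge terms for numerical stability, hypothetical function names), not the exact implementation used in the paper:

```python
import numpy as np

def fit_slorm(X, W, y, h=0.15, n_iter=25):
    """Iterate the m_j update of Eq. (13) and the beta update of Eq. (14)
    for a logistic SLoRM, using l' = y - mu and l'' = -mu(1 - mu)."""
    n, p = X.shape
    beta, m = np.zeros(p), np.zeros(n)
    K = np.exp(-0.5 * ((W[:, None] - W[None, :]) / h) ** 2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta + m)))
        w = mu * (1.0 - mu)
        # Eq. (13): kernel-weighted Newton step for the nonparametric part
        m = m + (K @ (y - mu)) / (K @ w + 1e-10)
        # Eq. (14): weighted least squares on the modified design
        mu = 1.0 / (1.0 + np.exp(-(X @ beta + m)))
        w = mu * (1.0 - mu)
        KW = K * w[None, :]
        S_p = KW / KW.sum(axis=1, keepdims=True)       # smoother matrix
        X_t = X - S_p @ X                              # (I - S_p) X
        z = X_t @ beta + (y - mu) / np.maximum(w, 1e-10)
        WX = X_t * w[:, None]
        beta = np.linalg.solve(X_t.T @ WX + 1e-8 * np.eye(p), WX.T @ z)
    return beta, m

# Simulated illustration: true beta = 1.5, m(w) = sin(2*pi*w).
rng = np.random.default_rng(0)
n = 200
W_obs = rng.uniform(0.0, 1.0, n)
X_obs = rng.integers(0, 2, n).astype(float).reshape(-1, 1)
p_true = 1.0 / (1.0 + np.exp(-(1.5 * X_obs[:, 0] + np.sin(2 * np.pi * W_obs))))
y_obs = (rng.uniform(size=n) < p_true).astype(float)
beta_hat, m_hat = fit_slorm(X_obs, W_obs, y_obs)
```

Note that no intercept column is included in X: a free constant would be confounded with the nonparametric function m(w), which absorbs the overall level.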

Bayesian estimation and inference for the SLoRM
Bayesian inference derives the posterior distribution from two antecedents: a prior distribution and a likelihood function derived from a probability model for the observed data. The posterior distribution is obtained via Bayes' theorem as

P(β|y) = L(y|β) π(β) / ∫ L(y|β) π(β) dβ ∝ L(y|β) π(β),  (15)

where P(β|y) is the posterior distribution, π(β) is the prior distribution, and L(y|β) is the likelihood function.
The proposed algorithm for estimating the SLoRM parameters is as follows:
1. Obtain the probability distribution of the response variable Y.
2. Obtain the likelihood function from the probability distribution of Y.
3. Choose a suitable prior distribution for β.
4. Use Eq. (15) to obtain the posterior distribution.
5. Obtain the Bayesian estimator under the squared error loss function.
6. Replace the initial value of β by the Bayesian estimator.
7. Use the profile likelihood method and the Newton-Raphson algorithm, with the new initial value of β, to estimate the SLoRM.
Consider the SLoRM in Section 3 and suppose that y_i ~ Binomial(m, β), so that P(y_i|m, β) ∝ β^{y_i}(1 − β)^{m−y_i}. We have four different cases, depending on the assumed prior distribution. These cases are described below.
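Steps 1 to 5 can be sketched for the binomial likelihood with the conjugate Beta prior of Case 1 below; the helper name is hypothetical:

```python
import numpy as np

def beta_binomial_posterior(y, m, xi, tau):
    """For y_i ~ Binomial(m, beta), i = 1..n, with prior beta ~ Beta(xi, tau),
    the posterior is Beta(sum(y) + xi, n*m - sum(y) + tau); the Bayes
    estimator under squared error loss is the posterior mean."""
    y = np.asarray(y, dtype=float)
    n, s = len(y), y.sum()
    a_post = s + xi            # posterior first shape parameter
    b_post = n * m - s + tau   # posterior second shape parameter
    return a_post, b_post, a_post / (a_post + b_post)

# Example: n = 3 groups of m = 5 trials each, uniform Beta(1, 1) prior.
a, b, est = beta_binomial_posterior([3, 4, 2], m=5, xi=1.0, tau=1.0)
```

The posterior mean `est` is the value used in step 6 as the new initial value for the Newton-Raphson iterations.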

Case 1: The general case
Suppose that the conjugate prior distribution of β is π(β) ~ Beta(ξ, τ). Using Eq. (15), the posterior distribution is Beta(Σ y_i + ξ, nm − Σ y_i + τ). Then, the Bayesian estimator under the squared error loss function is the posterior mean, (Σ y_i + ξ)/(nm + ξ + τ).

Case 3:
Suppose that we have no information about β; an ignorance (noninformative) prior distribution of β is then the flat prior π(β) = c, 0 < β < 1. The posterior distribution is Beta(Σ y_i + 1, nm − Σ y_i + 1), with expected value (Σ y_i + 1)/(nm + 2) and variance (Σ y_i + 1)(nm − Σ y_i + 1)/[(nm + 2)^2 (nm + 3)]. Then, the Bayesian estimator under the squared error loss function is (Σ y_i + 1)/(nm + 2).
The proposed algorithm can be generalized to different types of semi-parametric regression models. For instance, Yousof and Gad (2015) used a similar algorithm for estimating the generalized partial linear model using a Bayesian approach. We use the Wald statistic to assess the significance of the parameters β̂_i, and the odds ratio to facilitate the selection of the significant variables.
In the case of the SLoRM, the proposed algorithm is appropriate when the profile likelihood method and the Newton-Raphson algorithm are applied. The SLoRM can be used when the response variable takes only two values and the explanatory variables are divided into two groups: the non-metric (categorical) covariates and the metric (quantitative) covariates.
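The Wald statistic and the odds ratio mentioned above can be computed as follows; the helper name is hypothetical, and the two-sided p-value uses the standard normal distribution via the error function:

```python
from math import erf, exp, sqrt

def wald_test(beta_hat, se):
    """Wald z statistic, two-sided p-value and odds ratio for a single
    logistic-regression coefficient with estimate beta_hat and
    standard error se."""
    z = beta_hat / se
    phi = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))   # standard normal CDF
    p_value = 2.0 * (1.0 - phi)                   # two-sided p-value
    return z, p_value, exp(beta_hat)              # odds ratio = exp(beta)

# Example: an estimate of 0.8 with standard error 0.25.
z, p, odds = wald_test(0.8, 0.25)
```

Here z = 3.2, the p-value is well below 0.05 (so the coefficient would be declared significant), and the odds ratio exp(0.8) ≈ 2.23 means the odds of credit-worthiness roughly double when the covariate increases by one unit.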

Application: Credit Scoring Data
The credit card industry has been growing rapidly in recent years, and consumer credit data are collected by the credit departments of banks. The credit scoring manager often evaluates a consumer's credit using intuitive experience; with the support of a credit classification model, however, the manager can evaluate the applicant's credit score more accurately. Credit scoring denotes a set of common techniques used to decide whether a bank should grant a loan to an applicant (borrower) or not. Credit scoring data contain both metric and non-metric covariates; hence the SLoRM is very convenient for this data type, whereas fully parametric and fully nonparametric models are less suitable for the same reason.
The current data include 400 cases taken from the National Bank of Egypt in the period 2005-2010. These data consist of 15 variables: 14 independent variables (covariates) X_1, …, X_14 and one dependent variable Y. Each independent variable represents a question to the applicant. The first three covariates X_1, X_2 and X_3 are quantitative variables, whereas the remaining variables are categorical. The dependent variable Y is the credibility, which takes two values: 1 means "credit-worthy" and 0 means "not credit-worthy". The variables and their values are presented in Table (1). The PLoRM, the SLoRM and the Bayesian SLoRM are applied to the data. The PLoRM is

P(Y_i = 1|X_i) = G(X_i^T β) = exp(X_i^T β)/(1 + exp(X_i^T β)).
The model gives the probability that the response variable takes the value 1. The parameter estimates of the PLoRM using maximum likelihood are presented in Table (2).
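The maximum-likelihood fit of the PLoRM can be sketched with iteratively reweighted least squares. The following is a minimal illustration on simulated data (hypothetical names; the bank data set itself is not reproduced here):

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Maximum-likelihood estimation of a parametric logistic regression
    by iteratively reweighted least squares (IRLS); returns the estimates
    and their standard errors from the inverse Fisher information."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = np.maximum(mu * (1.0 - mu), 1e-10)
        z = X @ beta + (y - mu) / w                  # adjusted response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)   # weighted least squares
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    return beta, np.sqrt(np.diag(cov))

# Simulated illustration with true coefficients (-0.5, 1.0).
rng = np.random.default_rng(1)
n = 500
X_sim = np.column_stack([np.ones(n), rng.normal(size=n)])
p_true = 1.0 / (1.0 + np.exp(-(X_sim @ np.array([-0.5, 1.0]))))
y_sim = (rng.uniform(size=n) < p_true).astype(float)
beta_hat, se_hat = fit_logistic_irls(X_sim, y_sim)
```

The returned standard errors are exactly the quantities entering the Wald tests reported for Table (2).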
The hypotheses about the parameters are H_0: β_i = 0 against H_1: β_i ≠ 0; the null hypothesis is rejected if the p-value of β̂_i is less than α = 0.05. Based on these results there are four significant independent variables: saving (X_4), whether the applicant has savings; credit card (X_5), whether the applicant has a credit card; guarantees (X_10), the guarantees available to the applicant; and purpose of credit (X_14). These variables influence the credit worthiness, according to our data. The remaining variables are insignificant; in particular, the three metric variables, the age of the borrower, the amount of the loan, and the duration of the loan, are insignificant.
The goodness of fit of the whole model is tested using the log-likelihood ratio test; the p-value is 0.046, which means that the model is significant at the 5% significance level.
The second fitted model is the SLoRM, which assumes that the probability of a good loan is given by

P(Y_i = 1|X_i, W_i) = G(X_i^T β + m(W_i)),

where G(.) is the cumulative logistic function, X_i is a vector of the variables X_4, X_5, …, X_14 and W_i is a vector of the variables X_1, X_2, X_3. The parameter estimates and their standard errors are displayed in Table (3). From the results we conclude that five variables are significant: X_4, X_5, X_10, X_13 and X_14. The three metric variables, the age of the borrower, the amount of the loan, and the duration of the loan, are again insignificant.
The third fitted model is the SLoRM using the Bayesian approach. As presented in Section 4 there are four cases. Here we use the second case, with the conjugate prior distribution π(β) ~ Beta(1, 1). The other cases have also been tried, but for the sake of parsimony we display only the results of case 2, in Table (4). It is found that the five categorical variables X_4, X_5, X_10, X_13 and X_14 are the only significant variables influencing the credit worthiness, according to our data. The three metric variables, the age of the borrower, the amount of the loan, and the duration of the loan, are insignificant in all models.
The three models are compared and the results are displayed in Table (5). In general, the SLoRM estimators are more efficient than the PLoRM estimators, with smaller standard errors and deviance. The Bayesian SLoRM estimators are more efficient than both the PLoRM and the SLoRM estimators, with the smallest standard errors and deviance. Hence, estimating credit worthiness using the Bayesian SLoRM gives estimators with the least deviance.

Conclusions
According to the PLoRM, the four categorical variables X_4, X_5, X_10 and X_14 are the only significant variables influencing the credit worthiness, according to the current data. According to the SLoRM and the Bayesian SLoRM, the five categorical variables X_4, X_5, X_10, X_13 and X_14 are the only significant variables. The three metric variables, the age of the borrower, the amount of the loan, and the duration of the loan, are insignificant in all models.
The SLoRM estimators are more efficient than the PLoRM estimators in the sense that they have smaller standard errors, smaller deviance and a smaller number of iterations. The Bayesian SLoRM estimators are more efficient than both the PLoRM and the SLoRM estimators, with the smallest standard errors, deviance and number of iterations. Hence, estimating credit worthiness using the Bayesian SLoRM gives estimates with the least deviance and the smallest number of iterations.

Table (1): The description of the study covariates

Table (3): The results of the SLoRM model

Table (5): The -2 log-likelihood of the three models