Variable reduction, sample selection bias and bank retail credit scoring

https://doi.org/10.1016/j.jempfin.2009.12.003

Abstract

This paper investigates the effect of including the customer loan approval process in the estimation of loan performance and explores the influence of sample selection bias in predicting the probability of default. A bootstrap variable reduction technique is applied to reduce the variable dimension for a large data-set drawn from a major UK retail bank. The results show a statistically significant correlation between the loan approval and performance processes. We further demonstrate an economically significant improvement in forecasting performance when taking sample selection bias into account. We conclude that financial institutions can benefit by correcting for sample selection bias in their credit scoring models.

Introduction

The sub-prime mortgage lending crisis has shaken the financial stability of many developed countries. Many factors caused the crisis, but it has highlighted the importance of accurately assessing risks in bank retail lending. Accurate assessment of the probability of default in retail bank lending can help banks classify their customers, charge them appropriately, and improve the efficiency of credit funding. It can also give banks a competitive advantage over their rivals. Even a small improvement in predicting the probability of default can bring lenders substantial additional profits (see Blöchlinger and Leippold, 2006). Finally, precise risk assessment is important because the Basel Capital Accord (Basel II) requires the banking industry to meet minimum regulatory capital requirements based on the calculation of the probability of loan default; more accurate default assessment therefore allows for more efficient utilisation of regulatory capital, a further source of competitive advantage.

The primary issue in credit scoring research has been to determine which variables significantly influence the probability of default. A second important issue in the construction of credit scorecards is the shift from monitoring the loan performance process alone to broader criteria that include the loan approval decision process. This paper investigates the effect of including the customer loan approval process in the estimation of loan performance and explores the influence of sample selection bias in predicting the probability of default. We compare the forecasting performance of the bootstrap variable reduction procedure with a single-stepwise procedure. We expect bootstrap variable selection to reduce the likelihood of incorrectly including “noise” variables. As well as investigating further important issues in the probability of default, the motivation for this paper lies in the continuing interest in sample selection bias analysis and, in particular, ongoing concerns regarding the prediction accuracy of models that ignore sample selection bias.

In this paper we use a large personal loan data-set from one of the largest UK banks to conduct a bootstrap variable reduction simulation. We quantify the sample selection bias by taking account of the correlation between the loan approval process and the subsequent loan performance process. The size of our data-set allows us to reserve a large holdout sample, so we can compare the forecast performance of a loan performance model that corrects for sample selection bias against one that does not. We find that the bootstrap variable selection procedure chooses more robust explanatory variables that forecast well out-of-sample when compared with a single-stepwise variable selection procedure. In particular, the variables selected by the bootstrap simulation technique enable us to infer that the UK bank providing our data applies a credit risk minimization lending policy. We confirm that there is a statistically significant correlation between the loan granting and performance processes. We show that there is an economically as well as statistically significant improvement in out-of-sample forecasting performance when taking sample selection bias into account. Our results are important for financial institutions, which can benefit from correcting for sample selection bias in their credit scoring models by including the customer loan approval process in the estimation of the loan performance process.

The organization of this paper is as follows. Section 2 provides some background and a brief review of the relevant literature on retail credit scoring. Section 3 introduces the data-set we use and explains our use of a bootstrap simulation technique. In Section 4, the multi-process probit model is specified for customer approval and loan performance processes. The estimation results are presented in Section 5 where we compare the out-of-sample forecasting performance. Section 6 discusses the implications of these results and summarises our conclusions.

Section snippets

Background and literature review

The primary issue in credit scoring research has been to determine the variables that significantly influence the probability of default (see Thomas, 2000; a recent example is Dinh and Kleimeier, 2007). Typical bank retail loan databases have

Data description and bootstrap variable selection procedure

The unique data-set used in this paper is supplied by one of the largest commercial banks in the UK and covers the years 1995 to 2003. The data comprise personal loans to existing bank customers but exclude both mandatory-accept and automatic-decline customers.
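The logic of a bootstrap variable reduction procedure can be sketched as follows. This is a simplified stand-in on synthetic data (a correlation screen in place of the full stepwise probit, with made-up coefficients and an illustrative 80% retention threshold), not the procedure actually applied to the bank's data: variables that survive selection in most bootstrap replications are kept, which guards against "noise" variables that a single stepwise pass might admit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the bank data: 2 informative and 8 noise variables.
n, p = 2000, 10
X = rng.normal(size=(n, p))
latent = 1.2 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)
y = (latent > 0).astype(int)  # 1 = default

def screen(X, y, k=3):
    """Pick the k candidates most correlated (in absolute value) with y.
    A simplified stand-in for one stepwise-selection pass."""
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return set(np.argsort(corr)[-k:])

# Bootstrap the selection step and keep variables chosen in most replications.
B, counts = 200, np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    for j in screen(X[idx], y[idx]):
        counts[j] += 1

robust = [j for j in range(p) if counts[j] / B >= 0.8]
print(robust)  # the informative columns 0 and 1 should survive
```

Because the third slot of each replication's selection is filled by a different noise variable each time, no noise variable accumulates a high selection frequency, illustrating why bootstrap aggregation yields a more robust variable set than a single pass.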

Model specification

The model is built on the latent-propensity framework of standard probit models. The unit of observation is an individual customer i. One binary outcome is observed for the loan approval process: the bank either grants or rejects the loan request. The decision depends on a latent variable that determines customer i's (i = 1, ⋯, n) unobserved credit quality Q1i*. A credit line is granted according to an implicitly defined threshold q, say q = 0. Let Y1i = 1 if the loan request by customer i is rejected, with Q1i* ≤ 0, and Y1i = 0
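The general shape of such a multi-process model, a probit for approval and a probit for loan performance with correlated errors, where performance is observed only for approved loans, can be sketched on simulated data. Everything below is an illustrative assumption rather than the authors' estimator: the coefficients, the approval-only instrument z, and the quadrature-based bivariate normal CDF are all stand-ins used to show how the joint likelihood is assembled and maximized.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Simulated data: approval and default indices with correlated errors.
n, rho = 3000, 0.5
x = rng.normal(size=n)                  # characteristic in both equations
z = rng.normal(size=n)                  # extra approval-only variable
e = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
approve = (0.5 + 1.0 * x + 1.0 * z + e[:, 0]) > 0
default = ((-0.5 - 0.8 * x + e[:, 1]) > 0) & approve  # seen only if approved

def bvn_cdf(a, b, r, m=32):
    """P(T1 <= a, T2 <= b) for standard bivariate normals with correlation r,
    via Gauss-Legendre quadrature of phi(t) * Phi((b - r t) / sqrt(1 - r^2))."""
    nodes, w = np.polynomial.legendre.leggauss(m)
    lo = np.full_like(a, -8.0)                       # effective lower bound
    half, mid = (a - lo) / 2, (a + lo) / 2
    t = mid[:, None] + half[:, None] * nodes[None, :]
    f = norm.pdf(t) * norm.cdf((b[:, None] - r * t) / np.sqrt(1 - r * r))
    return np.maximum((f * w[None, :]).sum(axis=1) * half, 1e-12)

def negll(theta):
    a1, b1, c1, a2, b2, r = theta
    r = np.clip(r, -0.98, 0.98)
    m1 = a1 + b1 * x + c1 * z           # approval index
    m2 = a2 + b2 * x                    # default index
    ll = np.zeros(n)
    # Rejected applicants contribute only the approval outcome.
    ll[~approve] = norm.logcdf(-m1[~approve])
    # Approved: P(approve, default) = Phi2(m1, m2; r); no default flips signs.
    for mask, s in ((default, 1.0), (approve & ~default, -1.0)):
        ll[mask] = np.log(bvn_cdf(m1[mask], s * m2[mask], s * r))
    return -ll.sum()

fit = minimize(negll, x0=[0, 0.5, 0.5, 0, -0.5, 0.0], method="Nelder-Mead",
               options={"maxiter": 4000, "xatol": 1e-5, "fatol": 1e-5})
print(fit.x)  # last element is the estimated error correlation
```

The estimated correlation (the final parameter) is the quantity whose significance indicates sample selection bias: if it were zero, the two processes could be estimated separately without bias.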

The statistical evidence on sample selection bias

To quantify the sample selection bias arising from non-random sample selection, we first examine the correlation of the two error terms in our multi-process model. We then present a detailed comparison of parameter estimates with and without allowing for sample selection bias. Finally, we examine predictive performance on the holdout data-set. Table 2 presents the estimated coefficients on all customer characteristics using a training data sample. The estimated correlation between the
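One simple way to score such a holdout comparison is a probability loss function like the Brier score. The sketch below uses synthetic probabilities, where the uncorrected forecast understates default risk by a fixed index shift; all numbers are illustrative assumptions, not the paper's estimates.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# A synthetic holdout: true default probabilities vs a biased forecast that
# ignores the selection correction (its index is shifted toward optimism).
n = 20000
x = rng.normal(size=n)
p_corrected = norm.cdf(-0.5 - 0.8 * x)   # stand-in for the corrected PD
p_naive = norm.cdf(-0.9 - 0.8 * x)       # understates default risk
y = rng.binomial(1, p_corrected)         # realised defaults on the holdout

def brier(p, y):
    """Mean squared error of a probability forecast; lower is better."""
    return np.mean((p - y) ** 2)

print(brier(p_corrected, y), brier(p_naive, y))
```

The corrected forecast attains the lower Brier score, which is how an "economically significant improvement" in out-of-sample performance can be made concrete: the score gap can be translated into mispriced loans or misallocated capital.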

Conclusion

This paper investigates the effect of including a customer loan approval process in the estimation of the loan performance process and explores the influence of sample selection bias in predicting the probability of default. The paper also applies a bootstrap variable reduction technique to reduce the dimension of explanatory variables. The motivation for the study lies in the need to reduce explanatory variable dimensions in credit scoring research. There is also continuing interest in

Acknowledgement

We thank the editor (F.C. Palm) and two anonymous referees for their useful comments and suggestions. All errors are of course our own.

