Would Two-Stage Scoring Models Alleviate Bank Exposure to Bad Debt?

The main aim of this paper is to investigate how far applying suitably conceived and designed credit scoring models can properly account for the incidence of default and help improve the decision-making process. Four statistical modelling techniques, namely discriminant analysis, logistic regression, multi-layer feed-forward neural networks and probabilistic neural networks, are used in building credit scoring models for the Indian banking sector. Notably, actual misclassification costs are analysed in preference to estimated misclassification costs. Our first-stage scoring models show that sophisticated credit scoring models, in particular probabilistic neural networks, can help to strengthen the decision-making process by reducing default rates by over 14%. The second stage of our analysis focuses upon the default cases and substantiates the significance of the timing of default. Moreover, our results reveal that state of residence, equated monthly installment, net annual income, marital status and loan amount are the most important predictive variables. The practical implications of this study are that our scoring models could help banks avoid high default rates, rising bad debts, shrinking cash flows and punitive cost-cutting measures.


Introduction
At a time when even the largest banks are not immune to distress, credit decision-making is crucially important. The Reserve Bank of India (RBI) and the Finance Ministry have thus far externally controlled and regulated the banking sector. Deregulation and the decoupling of state control pose new challenges, and intense competition is placing the survival of all but the fittest and the most efficient in doubt. Commercial banks are accordingly striving to adjust to a new economic and technological environment. Sound credit scoring models form an integral part of this adjustment process. This motivates our present purpose, which is to propose suitably conceived and designed credit scoring models for personal loans with due allowance for the incidence of default.
The novel contribution of the present paper consists in integrating two stages of the decision process with reference to the Indian banking sector. Firstly, we build credit scoring models for our unique sample of personal loans, provided by one of the largest Indian banks. The sample includes a significant number of bad debts that is consonant with the current and evolving profile of personal indebtedness. Secondly, we explore in detail the characteristics of the defaulters in our sample. This feature is particularly important given the recent history of rising bad debt. In both stages, we identify the key predictor variables to be used in building models.
Further, we evaluate our models by using actual misclassification costs.

FIGURE (1) ABOUT HERE
Electronic copy available at: https://ssrn.com/abstract=3354373

The sharp increase in household leverage ratios in recent years, shown in Figure 1a, is mirrored by the rates reported in Table 1, against only 4.12% in the year ended March 2010. The rate decreases slightly in the next two years, 2012 and 2013, which is commensurate with the increase in non-performing assets reported on Indian banks' balance sheets (Financial Times, 2011). It should be emphasised that at the end of March 2014 retail credit had increased, driven primarily by housing loans, personal loans and auto loans, representing 47%, 36% and 14% of gross credit respectively (RBI, 2014). While many research papers have discussed credit scoring models for developed countries (Marshall et al., 2010; Akkoc, 2012; Brown and Mues, 2012; Tong et al., 2012; Majeske and Lauer, 2013; Ono et al., 2014; Leow and Crook, 2016; Bequé and Lessmann, 2017), relatively few have focused on building such models for developing and emerging markets (Abdou et al., 2008; Abdou, 2009a-b; Abdou and Pointon, 2009; Khashman, 2011; Bekhet and Eletter, 2014; Abdou et al., 2016; Fernandes and Artes, 2016). While these have addressed a wide range of cases, none, to the authors' knowledge, has examined the Indian banking sector. Given the sensitivity of such data, access to it is significant. Particularly in the light of past financial crises, banks have become increasingly risk averse due to security and client data protection laws. Small samples are widely used in building scoring models in the literature, as this issue is well recognised (see for example Paliwal and Kumar, 2009; Abdou and Pointon, 2011; Lessmann et al., 2015). For instance, consumer loan application models are regularly built using around 1,000 observations or less (see for example Kim and Sohn, 2004; Lee and Chen, 2005; Sustersic et al., 2009; Derelioğlu and Gürgen, 2011; Abdou et al., 2016).
In building scoring models, statistical techniques such as discriminant analysis and logistic regression are widely used (Tsai et al., 2009; Akkoc, 2012; Wang et al., 2012; Abdou et al., 2014; Bekhet and Eletter, 2014; Abdou et al., 2016). The logistic regression model does not necessarily require the assumptions of the discriminant analysis model and may prove to be more robust in practical applications.
In this paper four statistical modelling techniques are applied to analyse bank personal loans using a data-set provided by an Indian bank. As motivated by the above literature, these are discriminant analysis, logistic regression, multi-layer feed-forward neural networks and probabilistic neural networks. Three different criteria, namely correct classification rate, error rates and actual misclassification cost, are used to compare the effectiveness and predictive capabilities of the different models. Moreover, in this paper actual misclassification costs, provided by the bank's own credit officials, are used in preference to the more conventionally used estimated misclassification costs. This underscores the novelty of our contribution.
The layout of this paper is organised as follows: Section 2 reviews the current guidance note on credit risk management by RBI. Section 3 addresses research methodology and data sources.
Section 4 discusses the empirical results. Section 5 concludes and discusses the opportunities for further research.

Current credit risk management practices in Indian banks
In the 21st century banks are confronted with an increasingly complex combination of interdependent financial and non-financial risks. These include credit, interest-rate, liquidity, regulatory, reputational and operational risks. These risks need to be controlled and managed by banks' senior executives. Further, major decisions about whether or not to implement a centralised or decentralised structure to manage these risks are faced by banks all over the world. In India, banks have been guided by a centralised approach to their credit risk from the RBI "Guidance Note on Credit Risk Management" that was issued in 2002 1. These guidelines recommend that banks need a credit risk framework that focuses on policy and strategy, organisational structure and systems, as discussed below.
Credit risk policy and strategy. Banks require a board-approved risk policy and strategy that clearly identifies how to manage the bank's lending portfolio. Strategic plans must establish the credit granting processes that will be utilised by the bank with due consideration for the target market and cost/benefit considerations.
Organisational structure. Risk management committees and credit risk management departments are vital structural components in establishing successful risk systems that clearly identify accountability and ensure that responsibility flows from the Board of Directors down to lending officers.
Credit Risk Frameworks (CRFs) are used to avoid an overly simplistic approach to risk classification. The process used to formulate risk-ratings is as follows:
1. Identify all the principal business and financial risk elements.
2. Allocate weights to the principal risk components.
3. Compare with weights given in similar sectors and check for consistency.
4. Establish the key parameters (sub-components of the principal risk elements).
5. Assign weights to each of the key parameters.
6. Rank the key parameters on the specified scale.
7. Arrive at the credit-risk rating on the CRF.
8. Compare with previous risk-ratings of similar exposures and check for consistency.
9. Conclude the credit-risk calibration on the CRF (RBI, 2015).
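The weighting and averaging in steps 2 to 7 above can be sketched in a few lines of code. The parameter names and weights below are purely illustrative assumptions, not figures from the RBI guidance.

```python
# Hypothetical sketch of the CRF weighted-average rating step.
# Parameter names and weights are illustrative assumptions only.

def weighted_risk_rating(ratings, weights):
    """Combine per-parameter risk ratings (1 = best, 9 = worst) into a
    weighted-average rating, as in step 7 of the CRF process."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(ratings[k] * weights[k] for k in weights)

# Illustrative parameters, ratings and weights (assumed, for demonstration).
ratings = {"income_stability": 2, "leverage": 4, "repayment_history": 1}
weights = {"income_stability": 0.4, "leverage": 0.3, "repayment_history": 0.3}

score = weighted_risk_rating(ratings, weights)
# Under the RBI scale, ratings 1-5 are acceptable and 6-9 are not.
acceptable = score <= 5
```

The consistency checks in steps 3 and 8 would sit around this calculation; the code only shows the numerical core of the calibration.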
Credit risk modelling techniques encourage a more quantitative and less subjective approach to personal lending. These methods have enhanced the measurement of risk and performance in banks' lending portfolios. The modelling techniques suggested by the RBI Guidelines include econometric techniques, neural networks, optimisation models, rule-based or expert systems and hybrid systems. In this paper we explore the first two sets of techniques (for details regarding the credit risk framework, see the Appendix). Credit risk models as described by the RBI Guidance Notes encourage the statistical analysis of historical data, including the Z-score model and the Emerging Market Scoring (EMS) model (RBI, 2015).

Research methodology
The main aim of this paper is to investigate whether apposite credit scoring models can lead to more efficient and discriminating creditworthiness evaluation and ultimately to lower default rates. At an early stage of this research we conducted structured interviews with key decision-makers in a number of private and foreign banks in India. These included state and regional sales managers, territory managers of personal loans, branch managers, credit approval and credit default controllers. The importance of doing this was threefold. Firstly, these interviews enabled us to establish a list of explanatory variables which are used as part of actual lending procedures. Secondly, the results of these interviews form a natural complement to the available academic literature. Thirdly, we were able to establish that there was no set method used in the evaluation of personal loan applications in India; in many cases a predominantly judgemental approach was employed.
In building our proposed scoring models we adopt a two-stage analysis and use four different statistical modelling techniques, namely discriminant analysis, logistic regression, multi-layer feed-forward neural networks and probabilistic neural networks. In the first stage, we build our scoring models and, using actual misclassification costs, test the predictive capabilities of the various scoring models. In the second stage we focus upon the default cases, using 'customer began to default' as a dependent variable and the same set of explanatory variables as used in the first stage of the analysis. Furthermore, a Variable Impact Analysis is conducted as part of the two-stage analysis to identify the key determinants of both successful and defaulted cases.

Data collection and sampling procedures
In order to build our proposed credit scoring models, we use historical data comprising 2,093 personal loans supplied by one of the largest banks in India. Thus, given the data sensitivity, our sample size is in line with the previous literature (see for example Lessmann et al., 2015; Paliwal and Kumar, 2009). The significance of our dataset is as follows. Firstly, based on the literature reviews in Lessmann et al. (2015) and Paliwal and Kumar (2009), our sample size appears to be in the top 20% of the published literature. Secondly, even when reported, larger sample sizes can be misleading. Often studies report results for multiple sub-samples; though the average sub-sample size may be higher than our sample, it is common that several of the sub-samples may be significantly smaller than 2,000 observations (see e.g. Brown and Mues, 2012; Baesens et al., 2003; Lessmann et al., 2015). Thirdly, our application is interesting and important in its own right due to its focus upon developing countries. Of the ten papers identified in Lessmann et al. (2015) as having larger sample sizes than our own, seven focus upon developed countries. In terms of applications to developing countries, larger samples are either derived from externally funded research projects (Lee et al., 2006; Huang et al., 2007) or, whilst slightly larger, are of a similar order of magnitude (Yap et al., 2011; 2,765 cases).
Fourthly, it is important to recognise that our sample derives from a real-world credit scoring problem and data we ourselves collected. This stands in marked contrast to the small number of classical datasets that are regularly used in studies of credit scoring (see e.g. Table 3 in Lessmann et al., 2015). Furthermore, our unique blind data set covers a lending range from Rupees crore 50,000 to Rupees crore 100,800,000 for its customers from 2009 to 2014, of which 1,233 are considered good loans and the remaining 860 are bad loans. Having such a high percentage (41.09%) of bad loans, the dataset can be considered 'pertinent' (see for example Abdou et al., 2008).
The Indian bank provided 20 predictor variables which are mainly used in its decision-making process. However, six predictors are excluded, leaving 14 explanatory variables which are used in building the scoring models, as shown in Table 2. Having a 'land line' is a mandatory decision criterion, without which the application is declined. Similarly, the provision of legal documentation is mandatory. Both 'state' and 'pin code' (equivalent to a postal code in the UK or a zip code in the USA) are highly correlated (97.70%) and therefore pin code is excluded 2. We also excluded both the starting and the ending actual year, as we use 'term' as an explanatory variable 3. The 'customer began to default' variable is excluded when building the scoring models in the first stage. However, this variable is used as a dependent variable when running the sensitivity analysis investigating the incidence of the default cases 4, i.e. in the second stage; see Section 4.3.

TABLE (2) ABOUT HERE
In order to build our scoring models, Palisade Neural Tools, STATGRAPHICS Centurion XVI, IBM-SPSS Statistics 22 and R are used. We use a stratified 10-fold cross-validation technique to test the predictive capabilities of our scoring models. We randomise the data, using R, so that the percentage of bad customers in each group is the same. The training set consists of 1,883 cases (except for three folds, which consist of 1,884 cases) and the hold-out set consists of 209 cases (except for three folds, which consist of 210 cases) 5.
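The stratified splitting described above can be sketched as follows. This is a minimal, library-free illustration (the paper itself uses R); the class counts mirror the sample's 1,233 good and 860 bad loans.

```python
# Library-free sketch of stratified 10-fold splitting: each fold keeps
# roughly the same proportion of good and bad loans as the full sample.
import random

def stratified_folds(labels, k=10, seed=42):
    """Return k folds (lists of indices) with per-class counts balanced
    across folds, as in stratified cross-validation."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# 2,093 loans: 1,233 good (0) and 860 bad (1), as in the paper's sample.
labels = [0] * 1233 + [1] * 860
folds = stratified_folds(labels, k=10)
sizes = sorted(len(f) for f in folds)
bad_per_fold = [sum(labels[i] for i in f) for f in folds]
```

With these counts, the hold-out folds come out at 209 or 210 cases (three folds of 210), matching the fold sizes reported above, and every fold contains exactly 86 bad loans.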

Discriminant Analysis
Discriminant analysis (DA) is a discrimination and classification technique, first popularised in bankruptcy prediction by Altman (1968). The following formula can be used for MDA:

Z = α + β1X1 + β2X2 + … + βnXn

where Z represents the discriminant z-score, α is the intercept term, and βi is the respective coefficient in the linear combination of explanatory variables Xi, for i = 1 to n (see, for example, Abdou, 2009a).

Footnotes:
2 Our sample includes over 200 'pin codes', which makes the variable almost impossible to use as a categorical explanatory variable, and it adds no value as a numerical explanatory variable. However, retaining 'state' as an explanatory variable can capture any loan quality differences between the states.
3 Other Indian banks use a number of different variables as part of their credit evaluation, which include, for example, length at current employment, spouse income and number of dependents.
4 Interestingly, there is a belief stated by credit officials in the Indian banking sector that there is no need to include variables such as guarantees, field visits and feasibility studies in their credit evaluation processes.
5 The correlations between the predictor variables are within an acceptable range, i.e. <0.50.
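Applying a fitted discriminant function is a single weighted sum. The sketch below uses made-up coefficients, a hypothetical applicant and an arbitrary cut-off purely for illustration; it is not the paper's estimated model.

```python
# Sketch of applying a fitted MDA discriminant function:
# Z = alpha + sum_i beta_i * x_i. Coefficients are illustrative assumptions.

def discriminant_z(x, alpha, betas):
    """Return the discriminant z-score for one applicant."""
    return alpha + sum(b * xi for b, xi in zip(betas, x))

# Hypothetical applicant: [net income, loan amount, age] (assumed scaling).
applicant = [7.5, 3.0, 34]
z = discriminant_z(applicant, alpha=-1.2, betas=[0.30, -0.45, 0.01])

# Classify against a cut-off score (0 here, purely for illustration).
label = "good" if z > 0 else "bad"
```

In practice the coefficients and cut-off would be estimated from the training folds rather than chosen by hand.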

Logistic Regression
Logistic Regression (LR) is a widely used statistical modelling technique, in which the probability of a dichotomous outcome is related to a set of predictor variables in the form:

ln(p / (1 − p)) = α + β1X1 + β2X2 + … + βnXn

where p is the probability of default, α is the intercept term, and βi represents the respective coefficient in the linear combination of predictor variables Xi, for i = 1 to n. The dependent variable is the logarithm of the odds ratio, ln(p / (1 − p)) (see, for example, Abdou et al., 2016).
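The same scoring step under the logistic link can be sketched as follows, again with illustrative coefficients rather than the paper's estimates; the 0.50 cut-off matches the default threshold used for the LR models in Section 4.

```python
# Sketch of the logistic link: log-odds are linear in the predictors, so
# the default probability is the logistic transform of the linear score.
# Coefficients below are illustrative assumptions, not estimated values.
import math

def default_probability(x, alpha, betas):
    """p = 1 / (1 + exp(-(alpha + sum_i beta_i * x_i)))."""
    log_odds = alpha + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical applicant with two predictors (assumed scaling).
p = default_probability([7.5, 3.0], alpha=-0.5, betas=[-0.10, 0.40])

# Classify with a 0.50 cut-off on the default probability.
decision = "bad" if p >= 0.5 else "good"
```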

Multi-Layer Feed-forward Network
It is convenient to use Multi-Layer Feed-Forward Networks (MLFNs) to represent complex relationships between a set of variables. Figure 2 presents an example of an MLFN structure, as follows:

FIGURE (2) ABOUT HERE
The following formula expresses the MLFN function for two hidden layers:

Y = f3( Σj w3j · f2( Σk w2jk · f1( Σi w1ki Xi + b1k ) + b2j ) + b3 )

where f1, f2 and f3 are the layer conversion (activation) functions, the w terms are connection weights, and the b terms are neuron biases, so each layer's output is a connection-weighted summation of the previous layer's outputs plus a neuron bias.
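A two-hidden-layer forward pass of the kind Figure 2 depicts can be sketched as below. All weights, biases and layer sizes are illustrative assumptions, and the activation choices (tanh hidden layers, logistic output) are one common configuration, not necessarily the one used by the software in this paper.

```python
# Minimal forward pass for a two-hidden-layer MLFN: each layer is a
# weighted sum plus a neuron bias, passed through an activation function.
# All weights and biases below are illustrative assumptions.
import math

def tanh_layer(x, weights, biases):
    """One hidden layer: tanh(W.x + b), neuron by neuron."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def mlfn_forward(x, layers):
    """Feed x through the hidden layers, then a logistic output neuron."""
    for weights, biases in layers[:-1]:
        x = tanh_layer(x, weights, biases)
    (w_out,), (b_out,) = layers[-1]
    score = sum(w * xi for w, xi in zip(w_out, x)) + b_out
    return 1.0 / (1.0 + math.exp(-score))   # probability-like output in (0, 1)

# Two inputs -> 2 hidden neurons -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]),
    ([[0.2, 0.4], [-0.3, 0.5], [0.6, -0.1]], [0.0, 0.0, 0.1]),
    ([[0.7, -0.4, 0.2]], [0.05]),
]
p = mlfn_forward([1.0, 2.0], layers)
```

Note the second hidden layer here has more neurons than the first, mirroring the structure described for Figure 2.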

Probabilistic Neural Network
A Probabilistic Neural Network (PNN) is primarily a classifier, mapping inputs to a number of classifications, and can be generalised to approximate more general functions. Figure 3 presents an example of a PNN structure, as follows:

FIGURE (3) ABOUT HERE
The Bayesian probability density function, for the respective output from a PNN pattern node, can be represented as follows (see Abdou, 2009a):

fk(x) ∝ Σj exp( −(x − xkj)′(x − xkj) / (2σ²) )

summing over the training cases xkj of class k, where σ is the smoothing factor. The conditional probability can then be written for each class, using the basic Bayes formula (see Abdou, 2009a, p. 100):

P(k | x) = πk fk(x) / Σm πm fm(x)

where πk is the prior probability of class k.
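The pattern, summation and output layers of a PNN amount to a Gaussian kernel density per class combined through Bayes' rule. The sketch below is a minimal illustration with made-up training points; the priors roughly mirror the sample's good/bad proportions.

```python
# Compact Parzen-window sketch of a PNN: pattern nodes apply a Gaussian
# kernel to each training case, the summation layer averages kernels per
# class, and the output node picks the class with the highest prior-weighted
# density. Training points below are illustrative assumptions.
import math

def pnn_classify(x, train, priors, sigma=1.0):
    """Return the class maximising prior * average Gaussian kernel density."""
    scores = {}
    for cls, points in train.items():
        dens = sum(math.exp(-sum((a - b) ** 2 for a, b in zip(x, p))
                            / (2 * sigma ** 2)) for p in points)
        scores[cls] = priors[cls] * dens / len(points)
    return max(scores, key=scores.get)

# Two made-up training cases per class, in a 2-D feature space.
train = {"good": [(1.0, 1.0), (1.2, 0.8)], "bad": [(-1.0, -1.0), (-0.8, -1.2)]}
priors = {"good": 0.59, "bad": 0.41}   # roughly the sample proportions

label = pnn_classify((0.9, 1.1), train, priors)
```

The smoothing factor sigma plays the role of the smoothing factors mentioned for the summation layer; in practice it would be tuned on the training folds.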

Empirical results and analysis
We present descriptive statistics for our predictor variables followed by our two-stage results.
Stage one focuses on presenting the results of the four statistical models (described in Section 3.2) using 10-fold cross-validation. We then compare the predictive capabilities of the different statistical techniques using average classification rates, error rates and actual misclassification costs. In addition, we present a ranking of the relative importance of the predictor variables. Stage two performs an additional sensitivity analysis of the default cases.

Descriptive statistics
As to the continuous predictors, Loan Amount ranges from Rupees crore 50,000 to Rupees crore 100,800,000; and Net Income ranges from Rupees crore 570,000 to Rupees crore 1,310,000.
Actual misclassification cost (AMC) can be expressed as:

AMC = ACR1 · P(B/G) · π1 + ACR2 · P(G/B) · π0

where ACR1 denotes the corresponding actual cost ratio associated with a Type I error; P(B/G) denotes the associated probability of a Type I error; π1 denotes the prior probability of good cases; ACR2 denotes the corresponding actual cost ratio associated with a Type II error; P(G/B) denotes the associated probability of a Type II error; and π0 denotes the prior probability of bad cases.
These actual misclassification cost ratios, provided pre-credit crunch, demonstrated a more favourable outlook in India, with a 2006 ratio of 1.6:6.5 compared to previous studies (see for example Abdou et al., 2008; Abdou, 2009b), which used a ratio of 1:5. However, the later figures reflect a clear deterioration in the Indian lending climate, with a ratio of 1.7:15 being used from 2011. This deterioration is confirmed by observations that the RBI raised interest rates to tame inflation and, due to worsening credit conditions, asked lenders to double their provisions for bad loans (see Financial Times, 2011; 2015).
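Under the definition in the preceding section, AMC is a prior-weighted sum of the two error costs. The sketch below plugs in the cost ratios quoted above (1.6:6.5 for 2006 and 1.7:15 from 2011) with assumed error rates, to show how the post-2011 ratios penalise the same errors far more heavily.

```python
# Sketch of the actual misclassification cost (AMC) criterion:
# AMC = ACR1 * P(B|G) * pi1 + ACR2 * P(G|B) * pi0.
# Cost ratios are those quoted in the text; error rates are assumed.

def amc(acr1, p_bg, pi1, acr2, p_gb, pi0):
    """Weighted sum of Type I and Type II error costs."""
    return acr1 * p_bg * pi1 + acr2 * p_gb * pi0

pi1, pi0 = 0.59, 0.41      # approximate prior proportions in the sample
p_bg, p_gb = 0.10, 0.20    # assumed Type I / Type II error rates

amc_2006 = amc(1.6, p_bg, pi1, 6.5, p_gb, pi0)   # pre-crunch ratio 1.6:6.5
amc_2011 = amc(1.7, p_bg, pi1, 15.0, p_gb, pi0)  # post-2011 ratio 1.7:15
```

Holding the error rates fixed, the 2011 cost ratios more than double the AMC relative to the 2006 ratios, which is the pattern the results tables document over time.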
Furthermore, as an additional robustness test for the two neural network models, namely PNN and MLFN, we run the 10-fold cross-validation again, this time allowing the 10 folds to be chosen at random. As shown in Table 4, AMC has significantly increased over time. This should motivate decision-makers to apply scoring models to reduce default rates.

Logistic regression
Results for the 10 LR scoring models' hold-out sub-samples, using a default cut-off score of 0.50, are shown in Table 5. Again, our results show notable increases in AMC over time. These results are in line with the DA scoring model results shown in Section 4.2.1.

Multi-layer Feed-forward Networks
Tables 6 and 7 give the classification results for the 10 MLFN scoring models' hold-out sub-samples and the additional 10 MLFN scoring models based on random runs, respectively. As per the former, the ACCR ranges from 63.16% to 76.67% with an overall mean of 67.13% (see Table 6). As per the latter, our 10 MLFN scoring models based on random runs show slightly better results under each of the previous criteria. Corresponding results are shown in Tables 8 and 9.

Comparison of different statistical scoring models
Comparing different models where the same 10 folds are used, the neural network models, namely PNN and MLFN, outperform the conventional models, namely DA and LR, used in this paper (see Table 9).
We then use a general linear model, which is a one-way analysis of variance (ANOVA), to investigate whether there are significant differences between the models on the scoring criteria outlined above 6. The general linear model with categorical variables is formed by

y_ij = μ + τ_i + ε_ij

where y_ij is the jth observation of the criterion under model i, μ is the overall mean, τ_i is the effect of model i, and ε_ij is a random error term (see for example Bingham and Fry, 2010). Table 10 shows our results, and there is evidence of statistically significant differences between the scoring models for each criterion. The graphical illustration (see Figure 4) confirms the findings shown in Table 10.

6 The focus here is upon the hold-out sub-samples.
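The one-way ANOVA comparison can be sketched without any statistical library, since the F statistic is just the ratio of between-model to within-model mean squares. The fold-level criterion values below are made-up illustrations, not the figures behind Table 10.

```python
# Library-free one-way ANOVA F statistic for comparing scoring models'
# criterion values across hold-out folds. Fold values are illustrative.

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_b, df_w = len(groups) - 1, len(all_vals) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w)

# Hypothetical correct-classification rates over five folds per model.
da   = [0.63, 0.65, 0.64, 0.66, 0.62]
lr   = [0.66, 0.67, 0.65, 0.68, 0.66]
mlfn = [0.67, 0.70, 0.68, 0.69, 0.67]
pnn  = [0.72, 0.74, 0.73, 0.75, 0.71]

f_stat = one_way_anova_f([da, lr, mlfn, pnn])
# A large F relative to F(3, 16) critical values indicates the models differ.
```

A p-value would then come from the F(3, 16) distribution; packaged routines such as those in R or SPSS report it directly.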

Sensitivity analysis of default credits: Stage 2
The main aim of this stage is to shed light upon the default cases given that they constitute a relatively large proportion of the entire sample (over 41%, 860 out of a total of 2,093 cases).
We use a stratified 5-fold cross-validation technique to explain the timing of the incidence of default. We use the same four statistical modelling techniques shown in Section 3.2. We rerun an additional 5-fold cross-validation with folds randomly chosen by the software for both MLFN and PNN. However, it should be emphasised that the main focus of this section is to identify the key determinants of the incidence of default. Interestingly, in our sample, default occurs only in the first and second years, and never in later years. We randomise the data, using R, so that the percentages of bad customers who start to default in their first year and those who start to default in their second year are the same. The training set consists of 688 cases and the hold-out set consists of 172 cases.

Descriptive statistics for default customers
In building our scoring models, we use the same 14 explanatory variables, as shown in Table 2. However, the dependent variable used in this section is 'customer began to default', replacing 'loan quality' in the original modelling. As to the five continuous predictors, Age ranges from 23 to 56 years old; EMI ranges from Rupees crore 1,469 to Rupees crore 469,920; Loan Amount ranges from Rupees crore 5,000 to Rupees crore 16,000,000; Net Income ranges from Rupees crore 570,000 to Rupees crore 1,250,000; and Term ranges from 2 to 4 years.

Importance of different variables for the default cases
It is crucial for decision-makers to become fully aware of the key determinants of the incidence of default, which in turn may reflect on their final decision. This can have a demonstrable impact on loan quality and subsequently on the overall lending decision-making process. In summary, and as part of our policy implications, recent news reports that high default rates, rising bad debts and shrinking cash flows have led to enforced redundancies and the closure of a significant number of branches throughout India (Quartz India, 2015; Financial Times, 2015).
Thus, the evidence clearly demonstrates that it would have been less costly for the bank had it adopted our credit scoring models rather than implementing its own strategic decisions to downsize. These lessons are not limited to the Indian bank that provided our loan data-set, as confirmed by recent news that four major foreign banks have reduced their exposure to the Indian market (Quartz India, 2015; Financial Times, 2015).

Conclusions and areas for further research
The main aim of our paper is to use a two-stage analysis to investigate whether scoring models can efficiently distinguish Indian banking clients' creditworthiness and reduce default rates. Working alongside the bank, our fresh contribution includes the incorporation of actual misclassification costs when evaluating our models. Our statistically rigorous analysis also stands in marked contrast to the predominantly subjective approach the bank was using to make lending decisions. In building our models we use four statistical modelling techniques, namely discriminant analysis, logistic regression, multi-layer feed-forward neural networks and probabilistic neural networks. This is combined with a bespoke data-set with a default rate of over 41%. State-level effects are also prevalent in the incidence of default. This suggests that, in practice, greater care needs to be exercised when granting loans to clients from different states.
In summary, by applying our proposed scoring models to the Indian banking sector, and alongside successful implementation, we argue that the challenges facing the Indian market could be significantly reduced. In particular, our best scoring models can significantly reduce our sample default rate by 14.24 percentage points (i.e. 41.09%, the original default rate, less 26.85%, the default rate using PNNran). Inter alia, problems such as increasing interest rates in an attempt to restructure default debt, inflation and the increased cost of banks' debt could be mitigated. Other consequences of the high default rates have been the redundancy and branch-closure policies that some Indian banks followed in an attempt to cut costs. We submit that some of these cost-cutting measures could thus ultimately have been avoided.
In terms of the theory of expert and intelligent systems, our proposed two-stage approach forms a natural complement to previous neural network (Gaganis et al., 2007; Öğüt et al., 2009) and hybrid (Li et al., 2016) modelling of credit risk. We also show that methods such as neural networks can lead to better assessments of credit risk than classical statistical methods (Abdou et al., 2016; Abellán and Castellano, 2017). Beyond reproducing aspects of real decision-making, our results show that neural network models can lead to improved financial decision-making in industrial applications. In particular, neural network models may be especially useful when the distribution of instances in the dataset is unbalanced (Zhao et al., 2015) or information is scarce (Falavigna, 2012).
There are a number of opportunities for further work. These include the application of additional techniques, and their possible combination into integrated models with larger sample sizes; in particular, gene expression programming, fuzzy algorithms, proportional hazard models and SVMs. Limitations of our study include potential concerns over the accuracy of industry-standard costings and the need for high computational efficiency in industrial-sized financial applications (see for example Zhao et al., 2015). Results may also be sensitive to the economic conditions associated with the timing of the business cycle (see for example Derelioğlu and Gürgen, 2011). However, recent financial turbulence in India suggests that extending our study to other products, including credit cards, business loans and mortgages, would also be extremely timely.

Appendix: grading system for calibration of credit risk
In this section, we discuss the rating scales and weighted scoring systems as typically applied in the lending departments of Indian banks.

Rating scales:
(i) Numerical values from 1 to 9 are utilised in rating scales, with 1 to 5 representing levels of acceptable credit risk, as shown in Table A1 below, and 6 to 9 representing unacceptable credit risk (RBI, 2015).
(ii) Weighted scoring systems: weighted systems apply a score or grade for risk profiling, with suitably applied percentages assigned to each of the risk-ratings to produce a weighted average risk-rating. The example shown in Table A2 below would be considered a potentially low-risk rating. Clearly, the problem is how the Credit Risk Framework (CRF) assigns those weightings. In this paper, as a starting point, we assign weightings for personal loans based on advanced statistical techniques such as neural networks, to avoid any subjective bias in assigning these weightings.

Notes to tables and figures:
Table 1: Source RBI (2009/10, 2010/11, 2012/13), adapted. *Numbers for these years are converted from billion to crore.
Classification tables (see also Abdou et al., 2016). Notation: GG refers to actual good cases, predicted as good cases; BB refers to actual bad cases, predicted as bad cases; TE refers to total error rates (Type I plus Type II).
Figure 2. Notation: this figure presents a structure with a number of independent predictor variables for the MLFN. The network is configured to have a larger number of nodes in the second hidden layer than in the first. The output at a given layer (for example, the second hidden layer) may be expressed as a connection-weighted summation of outputs from the previous layer (for example, the first hidden layer) plus a neuron bias (a parameter assigned to each neuron). Arriving at a neuron in the output layer, the value from each hidden-layer neuron is multiplied by a weight, and the resulting weighted values are added together. Then, a conversion function for the output layer produces Y values as outputs of the network (Abdou, 2009a, p. 101).
Figure 3. Notation: each of the pattern-node values passes to each of the nodes in the summation layer, which is a function of the distance and the smoothing factors. There is one node per dependent-variable category in the summation layer; each node computes a weighted average using the training cases in that category. The summation layer output values can be interpreted as probability weightings associated with each class. Finally, the output node selects the category with the highest probability weighting as the predicted category (Abdou, 2009a, p. 99).