Neural networks for credit risk evaluation: Investigation of different neural models and learning schemes

https://doi.org/10.1016/j.eswa.2010.02.101

Abstract

This paper describes a credit risk evaluation system that uses supervised neural network models based on the back propagation learning algorithm. We train and implement three neural networks to decide whether to approve or reject a credit application. Credit scoring and evaluation is one of the key analytical techniques in credit risk evaluation, which has been an active research area in financial risk management. The neural networks are trained using real-world credit application cases from the German credit approval dataset, which has 1000 cases, each with 24 numerical attributes, based on which an application is accepted or rejected. Nine learning schemes with different training-to-validation data ratios have been investigated, and a comparison between their implementation results has been provided. Experimental results suggest which neural network model, and under which learning scheme, allows the proposed credit risk evaluation system to deliver optimum performance, so that it may be used efficiently and quickly in the automatic processing of credit applications.

Introduction

Credit risk analysis is an important topic in financial risk management, and has been a major focus of the financial and banking industry. Credit scoring is a method of predicting the potential risk corresponding to a credit portfolio. Models based on this method can be used by financial institutions to evaluate portfolios in terms of risk. Data mining methods, especially pattern classification using real-world historical data, are of paramount importance in building such predictive models (Yu, Wang, & Lai, 2008).

Due to financial crises and the regulatory concerns of the Basel Committee on Banking Supervision (Basel Committee on Banking Supervision, 2000, Basel Committee on Banking Supervision, 2005), a regulatory requirement was made for banks to use sophisticated credit scoring models to enhance the efficiency of capital allocation. The Basel Committee, comprising central bank and banking business representatives from various countries, formulated broad supervisory standards and guidelines for banks to implement. Due to changes in the banking business, risk management practices, supervisory approaches, and financial markets, the committee published a revised framework as the new capital adequacy framework, also known as Basel II (Basel Committee on Banking Supervision, 2005). The commencement of the Basel II requirement, the popularization of consumer loans and the intense competition in the financial market have increased the awareness of the critical delinquency issue for financial institutions in granting loans to potential applicants (Li, Shiue, & Huang, 2006).

Credit scoring tasks can be divided into two distinct types (Laha, 2007, Li et al., 2006, Vellido et al., 1999). The first type is application scoring, where the task is to classify credit applicants into “good” and “bad” risk groups. The data used for modeling generally consists of financial and demographic information about the loan applicant. In contrast, the second type of task deals with existing customers and uses payment history in addition to other information. Because it takes into account the customer’s payment pattern on the loan, this task is called behavioral scoring. In this paper, we shall focus on application scoring.

In credit scoring, a scorecard model lists a number of questions (called characteristics) for loan applicants, who provide their answers based on a set of possible answers (called attributes). As a credit scoring method, neural network models are quite flexible, as they allow the characteristics to interact in a variety of ways. They consist of a group or groups of connected characteristics. A single characteristic can be connected to many other characteristics, which together make up the whole complicated network structure. They have an advantage over decision trees and scorecards because they do not assume uncorrelated relations between characteristics. They also do not suffer from structural instability in the same way as decision trees, because they do not rely on a single first question for constructing the whole network. However, the development of the network relies heavily on the qualitative data that are solicited to specify the interactions among all characteristics (Cheng, Chiang, & Tang, 2007).

The use of neural networks in business applications has been previously investigated by several works (Ahn et al., 2000, Baesens et al., 2005, Baesens et al., 2003a, Becerra-Fernandez et al., 2002, Hsieh, 2005, Huang et al., 2004, Huang et al., 2005, Lee and Chen, 2005, Lee et al., 2002, Malhotra and Malhotra, 2003, Min and Lee, 2005, Smith, 1999, Vellido et al., 1999, West, Dellana, and Qian, 2005). The general outcome of such works is that, in the credit industry, neural networks have been considered to be an accurate tool for credit analysis (Min & Lee, 2008).

Recently, the work in Lim and Sohn (2007) proposed a neural network-based behavioral scoring model which dynamically accommodates the changes in borrowers’ characteristics after the loans are made. This work suggested that the proposed model can replace the currently used static model to minimize the loss due to bad creditors. In (Martens, Baesens, Van Gestel, & Vanthienen, 2007), an overview of rule extraction techniques for support vector machines, applied to medical diagnosis and credit scoring, was presented. This work also proposed two rule extraction techniques taken from the artificial neural networks domain. In (Huang, Chen, & Wang, 2007), hybrid SVM-based credit scoring models were proposed to evaluate an applicant’s credit score from the applicant’s input features. This work used the Australian and German datasets in its implementation.

More recently, the work in Abdou, Pointon, and El-Masry (2008) investigated the ability of neural networks, such as probabilistic neural nets and multi-layer feed-forward nets, and conventional techniques, such as discriminant analysis, probit analysis and logistic regression, to evaluate credit risk in Egyptian banks applying credit scoring models. This work concluded that neural network models gave better average correct classification rates than the other techniques. However, in their neural network training and testing strategy, they used a high ratio of the dataset for training (80%) in comparison to validation (20%), which we consider an imbalanced strategy when attempting to achieve meaningful neural network learning. In (Angelini, Di Tollo, & Roli, 2008), an application of neural networks to credit risk assessment related to Italian small businesses was described. This work presented two neural network systems, one with a standard feed-forward network and the other with a special purpose architecture, and suggested that both neural networks can be very successful in learning and estimating the default tendency of a borrower, provided that careful data analysis, data pre-processing and training are performed.

In (Yu et al., 2008), a multistage neural network ensemble learning model was proposed to evaluate credit risk at the measurement level. The proposed model consisted of six stages: firstly, generating different training data subsets, especially for data shortage; secondly, creating different neural network models with the different training subsets obtained from the previous stage; thirdly, training the generated neural network models with different training datasets and obtaining the classification score; fourthly, selecting the appropriate ensemble members; fifthly, selecting the reliability values of the selected neural network models; and finally, fusing the selected neural network ensemble members to obtain the final classification result by means of reliability measurement. In (Tsai & Wu, 2008), the performance of a single classifier, used as the baseline, was compared with that of multiple classifiers and diversified multiple classifiers, using neural networks on three datasets. In (Setiono, Baesens, & Mues, 2008), a recursive algorithm was presented for extracting classification rules from feed-forward neural networks that have been trained on credit scoring datasets having both discrete and continuous attributes. Lately, in (Šušteršic, Mramor, & Zupan, 2009), Kohonen and error back-propagation neural networks were used as consumer credit scoring models for financial institutions where data usually used in previous research is not available. This work suggested that the error back-propagation neural network showed the best results. In (Lin, 2009), three two-stage hybrid models of logistic regression-artificial neural network were proposed to construct a financial distress warning system suitable for Taiwan’s banking industry, and to provide an optimal model of credit risk for supervising authorities, analysts and practitioners in conducting risk assessment and decision making. In (Wang & Huang, 2009), a back propagation based neural network was used to classify credit applicants. In (Xu, Zhou, & Wang, 2009), a credit scoring algorithm based on support vector machines was proposed to decide whether a bank should provide a loan to the applicant.

In (Chuang & Lin, 2009), a reassigning credit scoring model (RCSM) involving two stages was proposed. The classification stage constructs an ANN-based credit scoring model, which classifies applicants as having accepted (good) or rejected (bad) credit. The reassign stage tries to reduce the Type I error by reassigning the rejected good credit applicants to a conditionally accepted class using a CBR-based classification technique.

In general, we can deduce that using neural networks for credit scoring and evaluation has been shown to be effective over the past decade. The capability of neural networks in such applications is due to the way the network operates, and the availability of training data. This is more evident when using multi-layer perceptron networks based on the back propagation learning algorithm (Haykin, 1999). When feeding the information from a credit applicant to the neural network, the attributes (the applicant’s answers to a set of questions, or characteristics) are taken as the input to the neural network, and a linear combination of them is taken with arbitrary weights. The attributes are linearly combined and subjected to a non-linear transformation represented by a certain activation function (the sigmoid function in this work), then fed as inputs into the next layer for similar manipulation. The final function yields values which can be compared with a cut-off for classification. Each training case is submitted to the network, the final output is compared with the observed value, and the difference (the error) is propagated back through the network, with the weights modified at each layer according to the contribution each weight makes to the error value (Crook, Edelman, & Thomas, 2007). In essence, the network takes data in attribute space, transforms it using the weights and activation functions into hidden value space, then possibly into further hidden value spaces if further layers exist, and eventually into output layer space, which is linearly separable.
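The forward pass and error back propagation described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the weight initialization range, the learning rate, the 0.5 cut-off and the absence of a momentum term are all assumptions made for illustration. It only assumes a single hidden layer, sigmoid activations and a single output neuron.

```python
import math
import random


def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, n_hidden, seed=0):
    """Small random weights (assumed range) for one hidden layer and one output neuron."""
    rng = random.Random(seed)
    # The last weight in every list is the bias of that neuron.
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, inputs):
    """Linear combination of the inputs followed by the sigmoid, layer by layer."""
    hidden_w, output_w = network
    hidden_out = [sigmoid(sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1])
                  for ws in hidden_w]
    y = sigmoid(sum(w * h for w, h in zip(output_w[:-1], hidden_out)) + output_w[-1])
    return hidden_out, y


def train_case(network, inputs, target, learning_rate=0.1):
    """One back propagation update for a single case; returns the squared error."""
    hidden_w, output_w = network
    hidden_out, y = forward(network, inputs)
    error = target - y
    # Output error scaled by the derivative of the sigmoid.
    delta_out = error * y * (1.0 - y)
    # Hidden deltas use the output weights as they were before this update.
    delta_hidden = [delta_out * output_w[j] * h * (1.0 - h)
                    for j, h in enumerate(hidden_out)]
    for j, h in enumerate(hidden_out):
        output_w[j] += learning_rate * delta_out * h
    output_w[-1] += learning_rate * delta_out
    for j, ws in enumerate(hidden_w):
        for i, x in enumerate(inputs):
            ws[i] += learning_rate * delta_hidden[j] * x
        ws[-1] += learning_rate * delta_hidden[j]
    return error ** 2


def classify(network, inputs, cutoff=0.5):
    """Compare the network output with a cut-off (assumed 0.5): True means accept."""
    return forward(network, inputs)[1] >= cutoff
```

For a credit case, `inputs` would hold the normalized attribute values and `target` the observed decision encoded as 0 or 1; repeated calls to `train_case` over the training cases would constitute one possible training loop.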

Despite their successful application to credit scoring and evaluation, neural networks may not deliver robust “judgment” on whether an applicant should be granted credit or not. This problem arises from different reasons and partly depends on the chosen real world dataset for training and validating the trained neural network. Many of the previous works, which we described earlier on in this paper, suffer from problems despite the demonstrated successful implementations of the neural networks.

The first problem when using neural networks is the use of a high ratio of training-to-validation datasets. Depending on which dataset is used (the German credit dataset (Asuncion & Newman, 2007) is used in this work), a high ratio of training-to-validation data does not yield meaningful learning; for example, previously adopted ratios of training-to-validation (training:validation) datasets include: 80%:20% (Abdou, Pointon, and El-Masry, 2008, Li et al., 2006), 71%:29% (Boros et al., 2000), 68%:32% (Hsieh, 2005), 70%:30% (Baesens et al., 2003b, Hsieh, 2005, Kim and Sohn, 2004, Tsai and Wu, 2008), 69%:31% (Šušteršic et al., 2009), 67%:33.3% (Setiono et al., 2008), and 62%:38% (Atiya, 2001). A more appropriate ratio would be closer to (50%:50%) as used in (Sakprasat & Sinclair, 2007), or a lower ratio of training-to-testing dataset.

The second problem with using neural networks for credit evaluation is normalization of the input data. The values fed to the input layer of a neural network are usually between ‘0’ and ‘1’. This is not a problem when using a neural network for image processing, for example, since all input values represent image pixel values, which in turn have a more uniform distribution and a finite difference between the lowest and the largest pixel value (Khashman, 2008). However, with credit evaluation, the numerical values (input values) representing the attributes of a credit applicant vary marginally in value, and if a simple normalization process is applied to the whole dataset, say by dividing each value in the set by the largest recorded value, then much information would be lost across the different attributes. For example, the highest value recorded in the German dataset is 184 (case 916, attribute 4); if all values within the dataset are divided by this maximum value, much of the input data would be close to ‘0’, which does not represent the attributes and thus leads to inefficient neural network training. Therefore, normalization of the credit application input data should be carefully performed, while maintaining the meaning of each attribute.
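The effect described above can be seen in a small Python sketch that contrasts dividing by the largest value in the whole dataset with normalizing each attribute separately. The per-attribute rule shown here (division by the attribute's own maximum) is only an assumed stand-in, since the exact procedure is not given in this passage; the function names and the toy rows are hypothetical and are not taken from the German dataset. Non-negative attribute values are assumed.

```python
def normalize_by_global_max(cases):
    """Divide every value by the single largest value in the whole dataset."""
    global_max = max(max(row) for row in cases)
    return [[value / global_max for value in row] for row in cases]


def normalize_per_attribute(cases):
    """Divide each attribute (column) by that attribute's own maximum."""
    column_max = [max(column) for column in zip(*cases)]
    return [[value / m if m else 0.0 for value, m in zip(row, column_max)]
            for row in cases]


# Hypothetical toy cases with two attributes: a small code in the range 1-4
# and a wide-range attribute whose maximum is 184, the dataset maximum
# mentioned in the text. These rows are illustrative only.
toy_cases = [[1, 46], [2, 92], [4, 184]]

print(normalize_by_global_max(toy_cases))   # first attribute is squeezed towards 0
print(normalize_per_attribute(toy_cases))   # both attributes span up to 1
```

With the global maximum, the first attribute never exceeds 4/184 (about 0.02), so its distinct answers become almost indistinguishable to the network, whereas the per-attribute version keeps them spread over the input range.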

Another problem with using neural networks in financial applications is the computational cost. The simplest MLP neural network has three layers (input, hidden and output). Many of the previously suggested neural networks for credit evaluation use two hidden layers. The problem here is that the more layers are added, the higher the computational cost is, and thus the longer the processing time becomes.

In this paper, we aim to address the above problems when designing neural network models with application to credit risk evaluation. Firstly, using the German credit dataset (Asuncion & Newman, 2007) that contains 1000 real world application–decision cases, we train three neural network models using nine learning schemes. The three neural models differ in topology, and in particular in the number of hidden layer neurons and learning and momentum rates. The nine learning schemes differ in the training-to-validation data ratios (or as we refer to them, learning ratios). The lower the ratio, the more challenging it is for a neural network, but the more robust and meaningful the learning is. We compare the performance of the neural network models under all schemes and then select the ideal neural model and learning scheme.
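As an illustration of a learning ratio, the sketch below splits an ordered list of cases into training and validation subsets by taking the first cases for training, as in the 100:900 example given for learning scheme 1 in the implementation section. The function name and the stand-in case list are hypothetical, and the ratios of the remaining schemes are not assumed here.

```python
def split_cases(cases, n_train):
    """Use the first n_train cases for training and the remainder for validation."""
    if not 0 < n_train < len(cases):
        raise ValueError("n_train must leave at least one case in each subset")
    return cases[:n_train], cases[n_train:]


# Stand-in for the 1000 dataset cases; real cases would be attribute vectors.
cases = list(range(1000))

# A 100:900 training-to-validation ratio.
train, validation = split_cases(cases, 100)
print(len(train), len(validation))
```

Calling `split_cases(cases, 500)` would give the 50%:50% ratio discussed earlier; lower values of `n_train` leave more unseen cases for validation.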

Secondly, we use a simple but efficient normalization procedure that is applied automatically when reading the input data for each numerical attribute value separately. This assures that the 24 input values representing the different 20 attributes are meaningful for the neural network after normalization. Thirdly, we maintain simplicity when designing the back propagation learning algorithm-based neural networks, by using a single hidden layer, and a single neuron at the output layer; thus minimizing the computational and time costs.

The structure of the paper is as follows. In Section 2, a brief explanation of the credit risk evaluation dataset is presented. In Section 3, the credit evaluation system is described, showing the input data normalization procedure and the design strategy of the neural network models. In Section 4, the results of training and testing (validating) the neural models using the nine learning schemes are presented, and a comparison between the evaluation results is provided. Finally, Section 5 concludes this work and suggests future work.

Dataset for credit evaluation

For the implementation of our proposed credit evaluation system we use the German credit dataset; available publicly at UCI Machine Learning data repository (Asuncion & Newman, 2007). This real world dataset, which classifies credit applicants described by a set of attributes as good or bad credit risks, has been successfully used for credit scoring and evaluation systems in many previous works (Eggermont and Kok, 2004, Huang et al., 2006, Huang et al., 2007, Laha, 2007, Li et al., 2006, O’Dea,

The evaluation system

The neural network-based credit risk evaluation system consists of two phases: a data processing phase where each numerical value of the applicant’s attributes within the dataset is normalized separately; this is one of our objectives in this work. The output of this phase provides normalized numerical values representing a credit applicant’s case, which is used in the second phase; evaluating the applicant’s attributes and deciding whether to accept or reject the application using a neural

Implementation and experimental results

The results of implementing the credit risk evaluation neural network models were obtained using a 2.8 GHz PC with 2 GB of RAM, Windows XP OS and Borland C++ compiler. As one of our objectives is to investigate an ideal learning ratio, we follow nine learning schemes to train the neural network.

The learning schemes differ in the training-to-validation data ratio. For example, learning scheme 1 (LS1) uses a ratio of (100:900); i.e. the first 100 credit application cases are used for training the

Conclusions

This paper presented an investigation of the use of supervised neural network models for credit risk evaluation under different learning schemes. We also propose an efficient, fast and simple to use credit evaluation system, based on the results of our investigation. In our approach we trained three models of a three-layer supervised neural network; based on the back propagation learning algorithm, under nine learning schemes. These schemes differ in the ratio of the number of credit

References (47)

  • W. Huang et al. (2005). Forecasting stock market movement direction with support vector machine. Computers and Operations Research.
  • Z. Huang et al. (2004). Credit rating analysis with support vector machines and neural networks: A market comparative study. Decision Support Systems.
  • Y.S. Kim et al. (2004). Managing loan customers using misclassification patterns of credit scoring model. Expert Systems with Applications.
  • A. Laha (2007). Building contextual classifiers by integrating fuzzy rule based classification technique and k-nn method for credit scoring. Advanced Engineering Informatics.
  • T.S. Lee et al. (2005). A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Systems with Applications.
  • T.S. Lee et al. (2002). Credit scoring using the hybrid neural discriminant technique. Expert Systems with Applications.
  • S.T. Li et al. (2006). The evaluation of consumer loans using support vector machines. Expert Systems with Applications.
  • M.K. Lim et al. (2007). Cluster-based dynamic scoring model. Expert Systems with Applications.
  • S.L. Lin (2009). A new two-stage hybrid approach of credit risk in banking industry. Expert Systems with Applications.
  • R. Malhotra et al. (2003). Evaluating consumer loans using neural networks. Omega.
  • D. Martens et al. (2007). Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research.
  • J.H. Min et al. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications.
  • J.H. Min et al. (2008). A practical approach to credit scoring. Expert Systems with Applications.