Neural networks for credit risk evaluation: Investigation of different neural models and learning schemes

doi:10.1016/j.eswa.2010.02.101

Expert Systems with Applications

Volume 37, Issue 9, September 2010, Pages 6233-6239

https://doi.org/10.1016/j.eswa.2010.02.101

Abstract

This paper describes a credit risk evaluation system that uses supervised neural network models based on the back propagation learning algorithm. We train and implement three neural networks to decide whether to approve or reject a credit application. Credit scoring and evaluation is one of the key analytical techniques in credit risk evaluation, which has been an active research area in financial risk management. The neural networks are trained using real world credit application cases from the German credit approval dataset, which has 1000 cases, each with 24 numerical attributes, based on which an application is accepted or rejected. Nine learning schemes with different training-to-validation data ratios have been investigated, and a comparison between their implementation results has been provided. Experimental results suggest which neural network model, and under which learning scheme, allows the proposed credit risk evaluation system to deliver optimum performance, so that it may be used efficiently and quickly in automatic processing of credit applications.

Introduction

Credit risk analysis is an important topic in financial risk management, and has been a major focus of the financial and banking industry. Credit scoring is a method of predicting potential risk corresponding to a credit portfolio. Models based on this method can be used by financial institutions to evaluate portfolios in terms of risk. Data mining methods, especially pattern classification using real-world historical data, are of paramount importance in building such predictive models (Yu, Wang, & Lai, 2008).

Due to financial crises and the regulatory concerns of the Basel Committee on Banking Supervision (2000, 2005), a regulatory requirement was made for banks to use sophisticated credit scoring models for enhancing the efficiency of capital allocation. The Basel Committee, made up of central bank and banking business representatives from various countries, formulated broad supervisory standards and guidelines for banks to implement. Due to changes in the banking business, risk management practices, supervisory approaches, and financial markets, the committee published a revised framework as the new capital adequacy framework, also known as Basel II (Basel Committee on Banking Supervision, 2005). The commencement of the Basel II requirement, the popularization of consumer loans and the intense competition in the financial market have increased the awareness of the critical delinquency issue for financial institutions in granting loans to potential applicants (Li, Shiue, & Huang, 2006).

Credit scoring tasks can be divided into two distinct types (Laha, 2007, Li et al., 2006, Vellido et al., 1999). The first type is application scoring, where the task is to classify credit applicants into “good” and “bad” risk groups. The data used for modeling generally consists of financial information and demographic information about the loan applicant. In contrast, the second type of task deals with existing customers and, along with other information, also uses payment history information. It is distinguished from the first type because it takes into account the customer’s payment pattern on the loan; this task is called behavioral scoring. In this paper, we shall focus on application scoring.

In credit scoring, a scorecard model lists a number of questions (called characteristics) for loan applicants, who provide their answers based on a set of possible answers (called attributes). As a credit scoring method, neural network models are quite flexible as they allow the characteristics to interact in a variety of ways. They consist of a group or groups of connected characteristics. A single characteristic can be connected to many other characteristics, which make up the whole complicated network structure. They have an advantage over decision trees and scorecards because they do not assume that the characteristics are uncorrelated. They also do not suffer from structural instability in the same way as decision trees because they need not rely on a single first question for constructing the whole network. However, the development of the network relies heavily on the qualitative data that are solicited to specify the interactions among all characteristics (Cheng, Chiang, & Tang, 2007).

The use of neural networks in business applications has been previously investigated by several works (Ahn et al., 2000, Baesens et al., 2005, Baesens et al., 2003a, Becerra-Fernandez et al., 2002, Hsieh, 2005, Huang et al., 2004, Huang et al., 2005, Lee and Chen, 2005, Lee et al., 2002, Malhotra and Malhotra, 2003, Min and Lee, 2005, Smith, 1999, Vellido et al., 1999, West, Dellana, and Qian, 2005). The general outcome of such works is that, in the credit industry, neural networks have been considered to be an accurate tool for credit analysis (Min & Lee, 2008).

Recently, the work in Lim and Sohn (2007) proposed a neural network-based behavioral scoring model which dynamically accommodates the changes of borrowers’ characteristics after the loans are made. This work suggested that the proposed model can replace the currently used static model to minimize the loss due to bad creditors. In (Martens, Baesens, Van Gestel, & Vanthienen, 2007), an overview of rule extraction techniques for support vector machines when applied to medical diagnosis and credit scoring was presented. This work also proposed two rule extraction techniques taken from the artificial neural networks domain. In (Huang, Chen, & Wang, 2007), hybrid SVM-based credit scoring models were proposed to evaluate an applicant’s credit score from the applicant’s input features. This work used the Australian and German datasets in its implementation.

More recently, the work in Abdou, Pointon, and Elmasry (2008) investigated the ability of neural networks, such as probabilistic neural nets and multi-layer feed-forward nets, and of conventional techniques, such as discriminant analysis, probit analysis and logistic regression, in evaluating credit risk in Egyptian banks applying credit scoring models. This work concluded that neural network models gave better average correct classification rates than the other techniques. However, in their neural network training and testing strategy, they used a high ratio of the dataset for training (80%) in comparison to validation (20%), which we consider an imbalanced strategy when attempting to achieve meaningful neural network learning. In (Angelini, Di Tollo, & Roli, 2008), an application of neural networks to credit risk assessment related to Italian small businesses was described. This work presented two neural network systems, one with a standard feed-forward network and the other with a special purpose architecture, and suggested that both neural networks can be very successful in learning and estimating the default tendency of a borrower, provided that careful data analysis, data pre-processing and training are performed.

In (Yu et al., 2008), a multistage neural network ensemble learning model was proposed to evaluate credit risk at the measurement level. The proposed model consisted of six stages: firstly, generating different training data subsets, especially for data shortage; secondly, creating different neural network models with the different training subsets obtained from the previous stage; thirdly, training the generated neural network models with different training datasets and obtaining the classification score; fourthly, selecting the appropriate ensemble members; fifthly, selecting the reliability values of the selected neural network models; and finally, fusing the selected neural network ensemble members to obtain the final classification result by means of reliability measurement. In (Tsai & Wu, 2008), the work investigated the performance of a single classifier as the baseline classifier to compare with multiple classifiers and diversified multiple classifiers by using neural networks based on three datasets. In (Setiono, Baesens, & Mues, 2008), a recursive algorithm was presented for extracting classification rules from feed-forward neural networks that have been trained on credit scoring datasets having both discrete and continuous attributes. Lately, in (Šušteršic, Mramor, & Zupan, 2009), Kohonen and error back-propagation neural networks were used as consumer credit scoring models for financial institutions where data usually used in previous research is not available. This work suggested that the error back-propagation neural network showed the best results. In (Lin, 2009), three two-stage hybrid models of logistic regression-artificial neural network were proposed to construct a financial distress warning system suitable for Taiwan’s banking industry, and to provide an optimal model of credit risk for supervising authorities, analysts and practitioners in conducting risk assessment and decision making. In (Wang & Huang, 2009), a back propagation based neural network was used to classify credit applicants. In (Xu, Zhou, & Wang, 2009), a credit scoring algorithm based on support vector machines was proposed to decide whether a bank should provide a loan to the applicant.

In (Chuang & Lin, 2009), a reassigning credit scoring model (RCSM) involving two stages was proposed. The classification stage constructs an ANN-based credit scoring model, which classifies applicants as having accepted (good) or rejected (bad) credits. The reassign stage tries to reduce the Type I error by reassigning the rejected good credit applicants to the conditional accepted class using the CBR-based classification technique.

In general, we can deduce that using neural networks for credit scoring and evaluation has been shown to be effective over the past decade. The capability of neural networks in such applications is due to the way the network operates, and the availability of training data. This is more evident when using multi-layer perceptron networks based on the back propagation learning algorithm (Haykin, 1999). When feeding the information from a credit applicant to the neural network, the attributes (the applicant’s answers to a set of questions, or characteristics) are taken as the input to the neural network and a linear combination of them is taken with arbitrary weights. This linear combination is subjected to a non-linear transformation represented by a certain activation function (the sigmoid function in this work), and the result is fed as input into the next layer for similar manipulation. The final function yields values which can be compared with a cut-off for classification. Each training case is submitted to the network, the final output is compared with the observed value, and the difference (the error) is propagated back through the network, with the weights modified at each layer according to the contribution each weight makes to the error value (Crook, Edelman, & Thomas, 2007). In essence, the network takes data in attribute space, transforms it using the weights and activation functions into hidden value space, then possibly into further hidden value spaces if further layers exist, and eventually into output layer space, which is linearly separable.
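The forward pass and weight update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the hidden layer size, learning rate, weight initialization range, the 0.5 cut-off, the mapping of high outputs to "accept", and the omission of bias and momentum terms are all assumptions made only for illustration.

```python
import math
import random

def sigmoid(z):
    """Sigmoid activation, the non-linear transformation named in the text."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return (hidden activations, network output) for one input vector."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One back propagation update on a single case; returns the squared error.

    Assumption: plain gradient descent on squared error, no bias or momentum.
    """
    hidden, output = forward(x, w_hidden, w_out)
    delta_out = (target - output) * output * (1.0 - output)
    # Hidden deltas are computed with the output weights before they change.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    for j, h in enumerate(hidden):
        w_out[j] += lr * delta_out * h
    for j, row in enumerate(w_hidden):
        for i, xi in enumerate(x):
            row[i] += lr * delta_hidden[j] * xi
    return (target - output) ** 2

def classify(output, cutoff=0.5):
    """Compare the network output with a cut-off (0.5 is an assumed value)."""
    return "accept" if output >= cutoff else "reject"

rng = random.Random(0)
n_inputs, n_hidden = 24, 5  # 24 inputs as in the dataset; 5 hidden neurons is hypothetical
w_hidden = [[rng.uniform(-0.3, 0.3) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_out = [rng.uniform(-0.3, 0.3) for _ in range(n_hidden)]

x = [rng.random() for _ in range(n_inputs)]  # one synthetic, already normalized case
errors = [train_step(x, 1.0, w_hidden, w_out) for _ in range(200)]
decision = classify(forward(x, w_hidden, w_out)[1])
```

Repeated updates on the single synthetic case reduce its squared error, which is the behaviour the paragraph describes for each training case; a real run would cycle through all training cases instead.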

Despite their successful application to credit scoring and evaluation, neural networks may not deliver robust “judgment” on whether an applicant should be granted credit or not. This problem has several causes and partly depends on the real world dataset chosen for training and validating the neural network. Many of the previous works described earlier in this paper suffer from such problems despite their demonstrated successful implementations of neural networks.

The first problem when using neural networks is the use of a high ratio of training-to-validation datasets. Depending on which dataset is used (the German credit dataset (Asuncion & Newman, 2007) is used in this work), a high ratio of training-to-validation data does not yield meaningful learning; for example, previously adopted ratios of training-to-validation (training:validation) datasets include: 80%:20% (Abdou, Pointon, and El-Masry, 2008, Li et al., 2006), 71%:29% (Boros et al., 2000), 68%:32% (Hsieh, 2005), 70%:30% (Baesens et al., 2003b, Hsieh, 2005, Kim and Sohn, 2004, Tsai and Wu, 2008), 69%:31% (Šušteršic et al., 2009), 67%:33.3% (Setiono et al., 2008), and 62%:38% (Atiya, 2001). A more appropriate ratio would be closer to (50%:50%) as used in (Sakprasat & Sinclair, 2007), or a lower ratio of training-to-testing dataset.

The second problem with using neural networks for credit evaluation is normalization of the input data. The values fed to the input layer of a neural network are usually between ‘0’ and ‘1’. This is not a problem when using a neural network for image processing, for example, since all input values would represent image pixel values, which in turn have a more uniform distribution and a finite difference between the lowest and the largest pixel value (Khashman, 2008). However, with credit evaluation, the numerical values (input values) representing the attributes of a credit applicant vary marginally in value, and if a simple normalization process is applied to the whole dataset, say by dividing each value in the set by the largest recorded value, then much information would be lost across the different attributes. For example, the highest value recorded in the German dataset is 184 (case 916, attribute 4); if all values within the dataset are divided by this maximum value, much of the input data would be close to ‘0’, which does not represent the attributes, thus leading to inefficient neural network training. Therefore, normalization of the credit application input data should be carefully performed, while maintaining the meaning of each attribute.
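The effect described above can be seen in a small sketch that contrasts whole-dataset normalization with normalization applied to each attribute separately. The per-attribute rule used here (dividing by each attribute's own maximum) is an assumption for illustration, not necessarily the authors' procedure, and the three toy cases are hypothetical values rather than rows of the German dataset; only the maximum of 184 echoes the text.

```python
def normalize_global(cases):
    """Divide every value by the largest value found anywhere in the dataset."""
    largest = max(max(row) for row in cases)
    return [[v / largest for v in row] for row in cases]

def normalize_per_attribute(cases):
    """Divide each value by the maximum of its own attribute (column).

    Assumption: non-negative values; a column whose maximum is 0 maps to 0.0.
    """
    col_max = [max(col) for col in zip(*cases)]
    return [[v / m if m else 0.0 for v, m in zip(row, col_max)] for row in cases]

# Hypothetical cases: a small-range attribute, a mid-range one, a large-range one.
toy_cases = [
    [1, 6, 184],
    [2, 12, 40],
    [4, 48, 20],
]

global_scaled = normalize_global(toy_cases)
attribute_scaled = normalize_per_attribute(toy_cases)

# First attribute after global scaling: all three values fall below 0.03.
first_attr_global = [row[0] for row in global_scaled]
# First attribute after per-attribute scaling: 0.25, 0.5, 1.0.
first_attr_separate = [row[0] for row in attribute_scaled]
```

With global scaling the first attribute is squeezed into a narrow band near zero, whereas per-attribute scaling keeps its values spread across the input range.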

Another problem with using neural networks in financial applications is the computational cost. The simplest MLP neural network has three layers (input, hidden and output). Many of the previously suggested neural networks for credit evaluation use two hidden layers. The problem here is that the more layers are added, the higher the computational cost is, and thus the longer the processing time becomes.
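A rough way to see this cost argument is to count the weighted connections in a fully connected network, since each one contributes a multiplication in the forward pass and an update during training. The sketch below is only an illustration: the hidden layer size of 10 is hypothetical, bias terms are ignored, and connection count is a proxy for cost rather than a measured processing time.

```python
def connection_count(layer_sizes):
    """Number of weights in a fully connected feed-forward network (no biases)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# 24 inputs and 1 output neuron; 10 neurons per hidden layer is an assumed size.
one_hidden = connection_count([24, 10, 1])      # 24*10 + 10*1
two_hidden = connection_count([24, 10, 10, 1])  # 24*10 + 10*10 + 10*1
extra = two_hidden - one_hidden
```

Under these assumed sizes, adding a second hidden layer adds 100 weights to the 250 of the single-hidden-layer network.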

In this paper, we aim to address the above problems when designing neural network models with application to credit risk evaluation. Firstly, using the German credit dataset (Asuncion & Newman, 2007) that contains 1000 real world application–decision cases, we train three neural network models using nine learning schemes. The three neural models differ in topology, and in particular in the number of hidden layer neurons and learning and momentum rates. The nine learning schemes differ in the training-to-validation data ratios (or as we refer to them, learning ratios). The lower the ratio, the more challenging it is for a neural network, but the more robust and meaningful the learning is. We compare the performance of the neural network models under all schemes and then select the ideal neural model and learning scheme.
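As a minimal sketch of what a learning ratio means in practice, the function below splits an ordered list of cases into a training part and a validation part. Only the 100:900 ratio of learning scheme 1 (LS1), with the first 100 cases used for training, is stated in the section snippets; using the remaining 900 cases for validation is inferred from that ratio, and the training sizes of the other eight schemes are not given in this excerpt. The function name `split_first_n` is hypothetical.

```python
def split_first_n(cases, n_train):
    """Use the first n_train cases for training and the rest for validation."""
    if not 0 < n_train < len(cases):
        raise ValueError("n_train must leave at least one case on each side")
    return cases[:n_train], cases[n_train:]

# Stand-in for the 1000 dataset cases; integers are placeholders, not real data.
cases = list(range(1000))

# LS1 as described in the snippets: a training-to-validation ratio of 100:900.
train, validation = split_first_n(cases, 100)
ratio = (len(train), len(validation))
```

Other schemes would call the same function with a different `n_train`, once their ratios are known.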

Secondly, we use a simple but efficient normalization procedure that is applied automatically, when reading the input data, to each numerical attribute value separately. This assures that the 24 input values representing the 20 different attributes are meaningful for the neural network after normalization. Thirdly, we maintain simplicity when designing the back propagation learning algorithm-based neural networks by using a single hidden layer and a single neuron at the output layer, thus minimizing the computational and time costs.

The structure of the paper is as follows: in Section 2 a brief explanation of the credit risk evaluation dataset is presented. In Section 3 the credit evaluation system is described, showing the input data normalization procedure and the design strategy of the neural network models. In Section 4 the results of training and testing (validating) the neural models using the nine learning schemes are presented, and a comparison between the evaluation results is provided. Finally, Section 5 concludes this work and suggests future work.

Section snippets

Dataset for credit evaluation

For the implementation of our proposed credit evaluation system we use the German credit dataset; available publicly at UCI Machine Learning data repository (Asuncion & Newman, 2007). This real world dataset, which classifies credit applicants described by a set of attributes as good or bad credit risks, has been successfully used for credit scoring and evaluation systems in many previous works (Eggermont and Kok, 2004, Huang et al., 2006, Huang et al., 2007, Laha, 2007, Li et al., 2006, O’Dea,

The evaluation system

The neural network-based credit risk evaluation system consists of two phases: a data processing phase where each numerical value of the applicant’s attributes within the dataset is normalized separately; this is one of our objectives in this work. The output of this phase provides normalized numerical values representing a credit applicant’s case, which is used in the second phase; evaluating the applicant’s attributes and deciding whether to accept or reject the application using a neural

Implementation and experimental results

The results of implementing the credit risk evaluation neural network models were obtained using a 2.8 GHz PC with 2 GB of RAM, Windows XP OS and Borland C++ compiler. As one of our objectives is to investigate an ideal learning ratio, we follow nine learning schemes to train the neural network.

The learning schemes differ in the training-to-validation data ratio. For example, learning scheme 1 (LS1) uses a ratio of (100:900); i.e. the first 100 credit application cases are used for training the

Conclusions

This paper presented an investigation of the use of supervised neural network models for credit risk evaluation under different learning schemes. We also propose an efficient, fast and simple to use credit evaluation system, based on the results of our investigation. In our approach we trained three models of a three-layer supervised neural network; based on the back propagation learning algorithm, under nine learning schemes. These schemes differ in the ratio of the number of credit

References (47)

H. Abdou et al. Neural nets versus conventional techniques in credit scoring in Egyptian banking. Expert Systems with Applications (2008)
B.S. Ahn et al. The integrated methodology of rough set theory and artificial neural network for business failure prediction. Expert Systems with Applications (2000)
E. Angelini et al. A neural network approach for credit risk evaluation. The Quarterly Review of Economics and Finance (2008)
I. Becerra-Fernandez et al. Knowledge discovery techniques for predicting country investment risk. Computers and Industrial Engineering (2002)
E.W.L. Cheng et al. Alternative approach to credit scoring by DEA: Evaluating borrowers with respect to PFI projects. Building and Environment (2007)
C.L. Chuang et al. Constructing a reassigning credit scoring model. Expert Systems with Applications (2009)
J.N. Crook et al. Recent developments in consumer credit risk assessment. European Journal of Operational Research (2007)
N.C. Hsieh. Hybrid mining approach in design of credit scoring model. Expert Systems with Applications (2005)
C.L. Huang et al. Credit scoring with a data mining approach based on support vector machines. Expert Systems with Applications (2007)
J.J. Huang et al. Two-stage genetic programming (2SGP) for the credit scoring model. Applied Mathematics and Computation (2006)
W. Huang et al. Forecasting stock market movement direction with support vector machine. Computers and Operations Research (2005)
Z. Huang et al. Credit rating analysis with support vector machines and neural networks: A market comparative study. Decision Support Systems (2004)
Y.S. Kim et al. Managing loan customers using misclassification patterns of credit scoring model. Expert Systems with Applications (2004)
A. Laha. Building contextual classifiers by integrating fuzzy rule based classification technique and k-nn method for credit scoring. Advanced Engineering Informatics (2007)
T.S. Lee et al. A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Systems with Applications (2005)
T.S. Lee et al. Credit scoring using the hybrid neural discriminant technique. Expert Systems with Applications (2002)
S.T. Li et al. The evaluation of consumer loans using support vector machines. Expert Systems with Applications (2006)
M.K. Lim et al. Cluster-based dynamic scoring model. Expert Systems with Applications (2007)
S.L. Lin. A new two-stage hybrid approach of credit risk in banking industry. Expert Systems with Applications (2009)
R. Malhotra et al. Evaluating consumer loans using neural networks. Omega (2003)
D. Martens et al. Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research (2007)
J.H. Min et al. Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications (2005)
J.H. Min et al. A practical approach to credit scoring. Expert Systems with Applications (2008)

