Feature selection in bankruptcy prediction

https://doi.org/10.1016/j.knosys.2008.08.002

Abstract

For many corporations, assessing the creditworthiness of investment targets and the possibility of their bankruptcy is a vital issue before investment. Data mining and machine learning techniques have been applied to solve the bankruptcy prediction and credit scoring problems. Although feature selection is an important data mining step that selects more representative data from a given dataset to improve the final prediction performance, it is not known which feature selection method performs better. Therefore, this paper compares five well-known feature selection methods used in bankruptcy prediction, namely the t-test, correlation matrix, stepwise regression, principal component analysis (PCA) and factor analysis (FA), to examine their prediction performance. Multi-layer perceptron (MLP) neural networks are used as the prediction model. Five related datasets are used in order to provide a reliable conclusion. According to the experimental results, the t-test feature selection method outperforms the other methods on the two performance measurements.

Introduction

Business and academic communities have paid much attention to predicting bankruptcy. This is because incorrect decision-making in financial institutions may lead to financial difficulty or distress and impose social costs on owners or shareholders, managers, workers, lenders, suppliers, clients, the community and government. As a result, bankruptcy prediction has been one of the most challenging tasks and a major research topic in accounting and finance.

The advancement of information technology allows us to obtain a variety of information about the risk status of a company from many sources, such as professional agencies and mass media. When evaluating a large amount of information, many people rely on an analyst’s judgment. However, some factors can influence the result of the analysis. Statistical and artificial intelligence (AI) methods can be used to identify important factors for bankruptcy prediction.

In the field of bankruptcy prediction, AI methods have long been developed. They are used to build models that evaluate whether corporations face financial distress. The underlying assumption is that financial variables extracted from public financial statements, such as financial ratios, contain a large amount of information about a company’s financial status, which may be a factor in causing bankruptcy [1]. Establishing an effective model from these financial data and other information, ranging from an enterprise’s strategic competitiveness to its operational details, is a complicated process.

Along with the development of AI and database technology, data mining techniques are increasingly applied in various domains. In bankruptcy prediction, data mining techniques are able to predict business failures, which can be very important to the staff concerned in two different ways. First, they can be used as “early warning systems”. These systems are very useful to those (e.g. managers and authorities) who can take action to prevent business failures. These actions include decisions about merger of the distressed firm, liquidation or reorganization, and the associated costs. Second, these systems can help decision makers in financial institutions to evaluate and select firms to collaborate with or to invest in. Such decisions have to take into account the opportunity cost and the risk of failure [2].

Analyzing a huge amount of corporate information in depth is likely to take much time and many human resources. When irrelevant information is overabundant, it is difficult to interpret and absorb. Therefore, how to filter and condense a large amount of data is a very important issue in predicting business failures, especially for bankruptcy prediction.

Feature selection, as a preprocessing step, is one of the most important steps in the data mining process. It aims at filtering out redundant and/or irrelevant features from the original data [3].
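To make the idea of filtering out redundant features concrete, the following is a minimal sketch rather than the procedure used in this paper. It assumes that redundancy is measured by the absolute Pearson correlation between two features; the function names, the 0.9 threshold and the toy financial ratios are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.9):
    """Keep a feature only if it is not highly correlated with one already kept.

    columns: dict mapping feature name -> list of values.
    threshold: hypothetical cut-off on |r|; it is not taken from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data invented for illustration: "debt_ratio_pct" is "debt_ratio"
# rescaled to percent, so it adds no new information.
toy = {
    "debt_ratio": [0.2, 0.5, 0.8, 0.4, 0.9],
    "debt_ratio_pct": [20.0, 50.0, 80.0, 40.0, 90.0],
    "current_ratio": [1.1, 2.5, 0.7, 2.5, 1.2],
}
selected = drop_redundant(toy)
print(selected)
```

The filter is greedy and depends on column order: the first feature of a correlated pair is the one that survives.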

Related work attempts to design various mathematical calculations and/or combine different models to tackle the bankruptcy prediction problem. However, the crucial process of feature selection, that is, selecting more informative data to predict bankruptcy effectively, is not carefully considered in many bankruptcy prediction studies. Superfluous and redundant information fed into a model can consume much time and cost, and can even reduce the accuracy of the model [4], [5].

As a number of statistical feature selection methods are used for bankruptcy prediction, the research question of this paper is which method allows the models to provide the best performance. In this paper, we consider five feature selection methods which have been applied in bankruptcy prediction and compare their prediction accuracy and Type I and II errors. They are the t-test, correlation matrix, stepwise regression, principal component analysis (PCA) and factor analysis (FA).
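As an illustration of the first of these methods, the sketch below ranks features by the absolute value of a two-sample t-statistic computed between the bankrupt and non-bankrupt groups. It is a simplified example under stated assumptions, not the paper's implementation: it uses Welch's unequal-variance form, ranks by |t| instead of testing against a significance level, and the function names and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples.

    Each group needs at least two values, and at least one group
    must have non-zero variance.
    """
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_by_t(rows, labels):
    """Return (feature_index, |t|) pairs, most discriminating feature first.

    rows: list of feature vectors; labels: 1 = bankrupt, 0 = non-bankrupt.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        bankrupt = [r[j] for r, y in zip(rows, labels) if y == 1]
        healthy = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((j, abs(welch_t(bankrupt, healthy))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data invented for illustration: feature 0 separates the two groups,
# feature 1 has the same mean in both.
rows = [
    [0.90, 1.0], [0.80, 2.0], [0.85, 1.5],  # bankrupt firms
    [0.20, 1.2], [0.30, 1.9], [0.25, 1.4],  # non-bankrupt firms
]
labels = [1, 1, 1, 0, 0, 0]
ranking = rank_by_t(rows, labels)
print(ranking)
```

Keeping only the top-ranked features, or those whose statistic exceeds a chosen critical value, yields the reduced feature set that would then be passed to the prediction model.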

It should be noted here that although some machine learning techniques, such as self-organizing maps (SOM) [6] and genetic algorithms [7], can be applied for selecting representative features, they are not widely considered in the business domain, especially for bankruptcy prediction. Therefore, the aim of this paper is to first examine the traditional statistical feature selection methods for bankruptcy prediction.

The contributions of this paper are to identify the best feature selection method for effective bankruptcy prediction and to provide a baseline feature selection method for future related research.

This paper is organized as follows. Section 2 briefly describes the methods of data mining applied in bankruptcy prediction. Related work is also reviewed. Section 3 describes the experimental methodology. Experimental results are presented in Section 4. The conclusion is provided in Section 5.

Section snippets

Bankruptcy prediction

Sometimes a firm can become distressed and continue to operate in that condition for many years. On the other hand, some firms enter bankruptcy immediately after a highly distressing event, such as a major fraud. A number of factors influence these outcomes. Lensberg et al. [8] investigate related work and categorize various factors that potentially affect bankruptcy. They are audit, financial ratios, fraud indicators, start-up and stress which are measured by qualitative or quantitative

Experimental design

The experiment is completed in three stages, shown in Fig. 1. The first stage is to build a multi-layer perceptron (MLP) neural network as the baseline model since it is the most widely used model in bankruptcy prediction [1], [13]. In this stage, we do not apply any feature selection methods. The second stage uses the five feature selection methods individually to generate more appropriate features. Then, there are five different newly generated feature sets which are used to train the MLP

The baseline models

To investigate how different parameters affect the outcome of the MLP on the five datasets, we built sixteen models and performed 5-fold cross validation. Table 4 presents the baseline models for the five datasets in terms of their best setting for training epochs and numbers of hidden nodes, average accuracy, and Type I and II errors.
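The evaluation protocol described here can be sketched as follows. This is an illustrative outline only: the helper names are hypothetical, the folds are taken contiguously without shuffling, and the error convention is an assumption, since this excerpt does not define it. Here a Type I error is taken to be a bankrupt firm predicted as non-bankrupt and a Type II error a non-bankrupt firm predicted as bankrupt. No MLP is trained; the predictions are invented toy values.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def evaluate(y_true, y_pred):
    """Accuracy plus Type I / Type II error rates (1 = bankrupt, 0 = non-bankrupt).

    Assumed convention (not confirmed by the excerpt):
    Type I  = bankrupt firm predicted as non-bankrupt,
    Type II = non-bankrupt firm predicted as bankrupt.
    """
    pairs = list(zip(y_true, y_pred))
    n_bankrupt = sum(1 for t, _ in pairs if t == 1)
    n_healthy = len(pairs) - n_bankrupt
    missed = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_alarm = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "type_i": missed / n_bankrupt if n_bankrupt else 0.0,
        "type_ii": false_alarm / n_healthy if n_healthy else 0.0,
    }


# Each fold serves once as the test set; the remaining folds form the training set.
folds = k_fold_indices(12, k=5)

# Toy labels and predictions invented for illustration.
metrics = evaluate([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
print(folds)
print(metrics)
```

In a full experiment, each candidate setting of training epochs and hidden nodes would be trained on four folds and scored on the fifth, and the three measures would be averaged over the five folds.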

Feature selection performance

Table 5 shows the performance of the t-test, stepwise regression, correlation matrix, FA, and PCA when applied to the baseline model over the five datasets.

Regarding

Conclusion

It is a very important issue to accurately predict business failure in financial decision-making. Bankruptcy prediction has long been regarded as a critical topic and has been studied extensively in the accounting and finance literature.

Data mining techniques have been used to predict bankruptcies in recent years. Feature selection, a pre-processing step in the data mining process, is the step that selects and extracts more valuable information from the massive amount of related material. That is, it aims

Acknowledgement

This research is partially supported by National Science Council of Taiwan (NSC 96-2416-H-194-010-MY3).

References (45)

  • G. Zhang et al. Artificial neural networks in bankruptcy prediction: general framework and cross-validation analysis. European Journal of Operational Research (1999)
  • T.S. Lee et al. Credit scoring using the hybrid neural discriminant technique. Expert Systems with Applications (2002)
  • R. Malhotra et al. Differentiating between good credits and bad credits using neuro-fuzzy systems. European Journal of Operational Research (2002)
  • T.E. McKee et al. Genetic programming and rough sets: a hybrid approach to bankruptcy classification. European Journal of Operational Research (2002)
  • K.S. Shin et al. A genetic algorithm application in bankruptcy prediction modeling. Expert Systems with Applications (2002)
  • M.-J. Kim et al. The discovery of experts’ decision rules from qualitative bankruptcy data using genetic algorithms. Expert Systems with Applications (2003)
  • S. Canbas et al. Prediction of commercial bank failure via multivariate statistical analysis of financial structures: The Turkish case. European Journal of Operational Research (2005)
  • K. Lee et al. A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications (2005)
  • J.H. Min et al. Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications (2005)
  • C.-S. Ong et al. Building credit scoring models using genetic programming. Expert Systems with Applications (2005)
  • K.S. Shin et al. An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications (2005)
  • T.V. Gestel et al. Bayesian kernel based classification for financial distress detection. European Journal of Operational Research (2006)