Feature selection in bankruptcy prediction
Introduction
Business and academic communities have paid much attention to bankruptcy prediction. This is because incorrect decision-making in financial institutions may lead to financial difficulty or distress and impose social costs on owners or shareholders, managers, workers, lenders, suppliers, clients, the community, and government. As a result, bankruptcy prediction has been one of the most challenging tasks and a major research topic in accounting and finance.
The advancement of information technology allows us to obtain a variety of information about the risk status of a company from many sources, such as professional agencies and mass media. When evaluating a large amount of information, many people rely on an analyst’s judgment. However, a number of factors can influence the result of such analysis. Statistical and artificial intelligence (AI) methods can be used to identify the factors that are important for bankruptcy prediction.
In the field of bankruptcy prediction, AI methods have long been developed and are used to build models that evaluate whether corporations face financial distress. The underlying assumption is that financial variables extracted from public financial statements, such as financial ratios, contain a large amount of information about a company’s financial status, which may be a factor in causing bankruptcy [1]. Establishing an effective model from these financial data and from other information, ranging from an enterprise’s strategic competitiveness to its operational details, is a complicated process.
Along with the development of AI and database technology, data mining techniques are gradually being applied in various domains. In bankruptcy prediction, data mining techniques can predict business failures, which matters to the relevant parties in two ways. First, they can be used as “early warning systems”. These systems are very useful to those (e.g. managers and authorities) who can take action to prevent business failures. Such actions include decisions about the merger of the distressed firm, liquidation or reorganization, and the associated costs. Second, these systems can help decision makers in financial institutions evaluate and select firms to collaborate with or invest in. Such decisions have to take into account the opportunity cost and the risk of failure [2].
Analyzing a huge amount of corporate information in depth is likely to take much time and many human resources. When irrelevant information is overabundant, it is difficult to interpret and absorb. Therefore, how to filter and condense a large amount of data is a very important issue in predicting business failures, especially bankruptcy.
Feature selection, as a preprocessing step, is one of the most important steps in the data mining process. It aims at filtering out redundant and/or irrelevant features from the original data [3].
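To make the idea of filtering out redundant features concrete, the following is a minimal sketch of a correlation-based filter. It is an illustration only and not necessarily the correlation matrix procedure used in this paper: the 0.9 threshold, the function names, and the financial ratio values are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / math.sqrt(var_x * var_y)

def drop_redundant(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    `columns` maps a feature name to its list of values (hypothetical layout).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up financial ratios for five firms (illustrative values only).
ratios = {
    "current_ratio": [1.2, 0.8, 2.1, 1.5, 0.6],
    "quick_ratio": [1.1, 0.7, 2.0, 1.4, 0.5],  # nearly a copy of current_ratio
    "debt_to_equity": [0.9, 2.5, 1.0, 2.2, 1.1],
}
selected = drop_redundant(ratios)
print(selected)
```

In this toy data the second ratio is almost perfectly correlated with the first, so it is dropped as redundant, while the third ratio is retained. The result depends on the order in which features are visited, which is one reason a real study would fix that order or use a more principled criterion.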
Related work attempts to design various mathematical calculations and/or combine different models to tackle the bankruptcy prediction problem. However, the crucial process of feature selection, that is, selecting more informative data to predict bankruptcy effectively, is not carefully considered in many bankruptcy prediction studies. Superfluous and redundant information fed into a model can consume much time and cost, and can even reduce the accuracy of the model [4], [5].
A number of statistical feature selection methods have been used for bankruptcy prediction, so the research question of this paper is which method allows the prediction models to perform best. We consider five feature selection methods that have been applied in bankruptcy prediction and compare them in terms of prediction accuracy and Type I and II errors. They are the t-test, correlation matrix, stepwise regression, principal component analysis (PCA) and factor analysis (FA).
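As an illustration of the first of these methods, the sketch below ranks features by the absolute value of a two-sample t-statistic between bankrupt and non-bankrupt firms. It is a simplified example under stated assumptions: it uses Welch's unequal-variance form, it only ranks features rather than testing significance against a p-value threshold, and the feature names and values are invented for the example.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances).

    Assumes each sample has at least two values and the samples are not
    both constant, otherwise the standard error is undefined or zero.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_by_t(features, labels):
    """Return feature names sorted by |t| between the two classes, largest first.

    `features` maps a name to a list of values; `labels` uses 1 for
    bankrupt and 0 for non-bankrupt (an assumed encoding).
    """
    scores = {}
    for name, values in features.items():
        bankrupt = [v for v, y in zip(values, labels) if y == 1]
        healthy = [v for v, y in zip(values, labels) if y == 0]
        scores[name] = abs(welch_t(bankrupt, healthy))
    return sorted(scores, key=scores.get, reverse=True)

# Made-up data: three bankrupt firms followed by three non-bankrupt firms.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "firm_age": [10, 25, 12, 11, 24, 13],               # similar in both groups
    "debt_ratio": [0.90, 0.80, 0.85, 0.40, 0.50, 0.45],  # clearly separated
}
ranking = rank_by_t(features, labels)
print(ranking)
```

A feature whose group means differ strongly relative to its spread receives a large |t| and is ranked first. In practice the statistic would be compared against a significance level to decide which features to keep.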
It should be noted that although some machine learning techniques, such as self-organizing maps (SOM) [6] and genetic algorithms [7], can be applied to select representative features, they are not widely considered in the business domain, especially for bankruptcy prediction. Therefore, the aim of this paper is first to examine the traditional statistical feature selection methods for bankruptcy prediction.
The contributions of this paper allow us not only to identify the best feature selection method for effective bankruptcy prediction but also to provide a baseline feature selection method for future related research.
This paper is organized as follows. Section 2 briefly describes the data mining methods applied in bankruptcy prediction and reviews related work. Section 3 describes the experimental methodology. Experimental results are presented in Section 4. The conclusion is provided in Section 5.
Bankruptcy prediction
Sometimes a firm can become distressed and continue to operate in that condition for many years. On the other hand, some firms enter bankruptcy immediately after a highly distressing event, such as a major fraud. A number of factors influence these outcomes. Lensberg et al. [8] review related work and categorize the factors that potentially affect bankruptcy: audit, financial ratios, fraud indicators, start-up and stress, which are measured by qualitative or quantitative
Experimental design
The experiment consists of three stages, as shown in Fig. 1. The first stage builds a multi-layer perceptron (MLP) neural network as the baseline model, since it is the most widely used model in bankruptcy prediction [1], [13]. In this stage, we do not apply any feature selection method. The second stage applies the five feature selection methods individually to generate more appropriate features. This yields five newly generated feature sets, which are used to train the MLP
The baseline models
To investigate how different parameters affect the MLP's results on the five datasets, we built sixteen models and performed 5-fold cross-validation. Table 4 presents the baseline models for the five datasets in terms of their best settings for training epochs and number of hidden nodes, average accuracy, and Type I and II errors.
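The evaluation protocol described here can be sketched as follows. This is an illustration under several assumptions, not the authors' code: the specific epoch and hidden-node values are hypothetical (the text only states that sixteen models were built), the folds are contiguous and unshuffled for simplicity, and Type I error is taken to mean a bankrupt firm classified as non-bankrupt, with Type II the reverse. The MLP training itself is omitted.

```python
from itertools import product

def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous, near-equal test folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def error_rates(y_true, y_pred):
    """Return (accuracy, type_i, type_ii) with 1 = bankrupt, 0 = non-bankrupt.

    Assumed convention: Type I = bankrupt firm predicted non-bankrupt,
    Type II = non-bankrupt firm predicted bankrupt.
    """
    pairs = list(zip(y_true, y_pred))
    n_bankrupt = sum(1 for t, _ in pairs if t == 1)
    n_healthy = len(pairs) - n_bankrupt
    missed = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_alarm = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    type_i = missed / n_bankrupt if n_bankrupt else 0.0
    type_ii = false_alarm / n_healthy if n_healthy else 0.0
    return accuracy, type_i, type_ii

# Hypothetical 4 x 4 grid giving sixteen candidate MLP settings.
EPOCHS = [50, 100, 200, 300]
HIDDEN_NODES = [8, 16, 24, 32]
grid = list(product(EPOCHS, HIDDEN_NODES))

# Each fold serves once as the test set; the rest would train the model.
folds = k_fold_indices(10, k=5)

# Toy labels and predictions standing in for one fold's MLP output.
acc, type_i, type_ii = error_rates([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(len(grid), folds, acc, type_i, type_ii)
```

For each of the sixteen settings, the three measures would be computed on every fold and averaged, and the setting with the best average would be reported as the baseline for that dataset.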
Feature selection performance
Table 5 shows the performance of the t-test, stepwise regression, correlation matrix, FA, and PCA when applied to the baseline model over the five datasets.
Regarding
Conclusion
Accurately predicting business failure is a very important issue in financial decision-making. Bankruptcy prediction has long been regarded as a critical topic and has been studied extensively in the accounting and finance literature.
Data mining techniques have been used to predict bankruptcies in recent years. Feature selection, a pre-processing step in the data mining process, selects and extracts the more valuable information from a large body of related material. That is, it aims
Acknowledgement
This research is partially supported by the National Science Council of Taiwan (NSC 96-2416-H-194-010-MY3).
References (45)
- et al. Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision Support Systems (2004)
- et al. A survey of business failures with an emphasis on prediction methods and industrial applications. European Journal of Operational Research (1996)
- et al. Optimization-based feature selection with adaptive instance sampling. Computers & Operations Research (2006)
- Evaluating feature selection methods for learning in data mining application. European Journal of Operational Research (2004)
- et al. A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications (2006)
- Data visualisation and manifold mapping using the ViSOM. Neural Networks (2002)
- Genetic algorithm-based feature set partitioning for classification problems. Pattern Recognition (2008)
- et al. Bankruptcy theory development and classification via genetic programming. European Journal of Operational Research (2006)
- et al. Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications (2005)
- et al. An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications (2005)
- Artificial neural networks in bankruptcy prediction: general framework and cross-validation analysis. European Journal of Operational Research
- Credit scoring using the hybrid neural discriminant technique. Expert Systems with Applications
- Differentiating between good credits and bad credits using neuro-fuzzy systems. European Journal of Operational Research
- Genetic programming and rough sets: a hybrid approach to bankruptcy classification. European Journal of Operational Research
- A genetic algorithm application in bankruptcy prediction modeling. Expert Systems with Applications
- The discovery of experts’ decision rules from qualitative bankruptcy data using genetic algorithms. Expert Systems with Applications
- Prediction of commercial bank failure via multivariate statistical analysis of financial structures: The Turkish case. European Journal of Operational Research
- A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications
- Building credit scoring models using genetic programming. Expert Systems with Applications
- Bayesian kernel based classification for financial distress detection. European Journal of Operational Research