An Empirical Analysis of Financially Distressed Italian Companies

This paper investigates the performance of forecasting models for default risk based on the annual balance sheet information of Italian firms. One of the main issues in bankruptcy prediction is the selection of the best set of indicators. Therefore, our main research question concerns the identification of the determinants of corporate financial distress, comparing the performance of innovative selection techniques. Furthermore, several aspects related to default risk analysis have been considered, namely the nature of the numerical information and the sample design. The proposed models take the above-mentioned issues into consideration, and the empirical results, elaborated on a data set of financial indices expressly derived from the annual reports of industrial firms, provide evidence in favor of our proposal over the traditional ones.


Introduction
In recent years, interest in the prediction of corporate financial distress has grown together with the global increase in corporate collapses. This is also due to the consequences of bankruptcy (Riasi, 2015).
Since the fundamental paper of Beaver (1966), which proposed for the first time the use of financial indicators as bankruptcy predictors, and the even more essential work of Altman (1968), which extended the previous intuition to a multivariate framework, there have been many contributions in this field (Agarwal and Taffler, 2008; Aloy Niresh, J. and Pratheepan, T., 2015; Altman et al., 2014; Altman E. I. and Branch B., 2015; Balcaen S. and Ooghe H., 2006; Becchetti and Sierra, 2003; Bellovari et al., 2007; Dimitras et al., 1996; Gunathilaka, C., 2014; Jackson R. and Wood, Morris A., 2013; Platt and Platt, 2002; Poddighe and Madonna, 2006; Sanchez J.A. and Sensini L., 2013). Some authors have studied the role of the financial variables as predictors (Keasey and Watson, 1987; Amendola et al., 2010). Others have compared the performance of static and dynamic models to investigate the impact of time dynamics on both parameter estimation and model performance (Balcaen and Ooghe, 2004; Chava and Jarrow, 2004; Dakovic et al., 2007; Hillegeist et al., 2004).
Few authors have attempted to assess the contribution of variable selection techniques to the performance of a model (Amendola et al., 2011a; Back et al., 1994; Brabazon and Keenan, 2004; du Jardin, 2010).
The exponential growth of micro-data availability and the development of computing techniques have recently attracted new interest in the topic.
Despite the numerous empirical findings, several research issues still need to be addressed, such as the definition of failure, the stability of the data, the choice of the sample design and the variable selection (Amendola et al., 2011b; Härdle et al., 2009; Sensini, 2015; Sexton et al., 2003).
Furthermore, in order to examine the effect of explanatory variables across the diverse states of financial distress, a multi-state approach has been used (Lau, 1987;Shary, 1991).
Starting from a large set of financial indicators considered as potential predictors, the main purpose of a study on corporate failure prediction is to select the best set of predictors in order to discriminate firms with a high probability of failure from healthy firms.
However, a problem that still deserves further investigation is how to choose the best predictors among the large number of financial indicators suitable for predicting bankruptcy and insolvency.
The aim of this work is to investigate the determinants of bankruptcy in Italian industry by means of predictive approaches that take into account the variable selection problem. The use of model selection techniques based on shrinkage methods was proposed by Amendola et al. (2011b), where the performance of this approach was compared with that of traditional variable selection techniques.
In this paper the empirical setting has been extended to a large sample of Italian industrial firms, and the forecasting performance has been evaluated considering different optimal prediction sets and different sampling approaches. The results of the comparative analysis, computed using specific accuracy measures, are in favor of the use of an innovative variable selection procedure and highlight the role of the optimal set of predictors in generating accurate default risk predictions. This paper has six sections and is structured as follows. The next section introduces the data and the sampling procedure. Section 3 illustrates in brief the variable selection techniques. The proposed models are described in Section 4, while the results of the comparison of the predictive power of the different models are reported in Section 5. The last section offers some concluding remarks.

The Data and the Financial Predictors
The notion of business failure has been defined in many different ways in the literature and it is not easy to agree on a widely accepted definition (Karels and Prakash, 1987; Crutzen and van Caillie, 2007).
A failure state has been analysed from diverse perspectives depending not only on the context and the characteristics of the firms but also on the interest of researchers (Dimitras, Zanakis and Zopounidis, 1996).
In part of the literature, business failure is defined as a sequence of several financial situations that lead to the closure of the firm (Morris, 1997). However, this definition only concentrates on financial distress without taking into account other difficulties that can affect a firm's health in the early stages of the failure process (Argenti, 1976).
Given that the empirical literature distinguishes between economic and juridical business failure (Weitzel and Jonsson, 1989), in this study we refer to the juridical concept of business failure, considering those companies that have experienced permanent financial distress. We do not take into consideration firms that enter voluntary liquidation or experience temporary financial distress.
The data set includes industrial companies with limited liability that started the juridical procedure of bankruptcy in Italy in the considered period. The information on the legal status and the annual reports has been extracted from the AIDA database of Bureau Van Dijk (BVD).
In particular, the failed set is composed of those industrial firms that entered the juridical procedure of bankruptcy in Italy at t = 2010, for a total of 5628 failed firms, with five years of financial statement information prior to failure (t - i; i = 1, ..., 5).
Firms in the data set with missing data have been excluded from the analysis.
For this reason a preliminary analysis was carried out and the results are reported in Table 1.
For the considered period, the population of failed firms has been divided into two subsets: a) firms that regularly presented financial statements; b) firms that did not present their financial statements, or presented incomplete information, and are not suitable for the purpose of our analysis. In Table 1 it can easily be observed that firms tend not to present financial statements in the years immediately before failure or, in any case, that the financial information is incomplete.
The final data set used for the analysis includes only those firms that provided full financial statements in the five years prior to failure (2005-2009).
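The complete-case rule above can be sketched as follows. This is a minimal Python illustration, assuming each firm's statements are held in a dictionary keyed by year; the firm identifiers, the `Statements` structure and the function names are hypothetical and do not reflect the layout of the AIDA database.

```python
from typing import Dict, Iterable, List

# Hypothetical structure: firm id -> {year: statement, or None when missing}.
Statements = Dict[str, Dict[int, object]]


def has_full_history(by_year: Dict[int, object], years: Iterable[int]) -> bool:
    """True when a non-missing statement exists for every required year."""
    return all(by_year.get(year) is not None for year in years)


def complete_cases(statements: Statements, years: Iterable[int]) -> List[str]:
    """Ids of firms that filed a statement in each of the required years."""
    required = list(years)
    return sorted(firm for firm, by_year in statements.items()
                  if has_full_history(by_year, required))


# Toy data, invented for illustration only.
toy = {
    "firm_a": {y: {"total_assets": 1.0} for y in range(2005, 2010)},
    "firm_b": {2005: {"total_assets": 2.0}, 2006: None},
}
print(complete_cases(toy, range(2005, 2010)))
```

Only `firm_a` survives the filter here, since `firm_b` lacks statements for most of the 2005-2009 window.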
We take 2010 as the reference period, t, so that a time span of 4 years of subsequent annual reports (at t + i; i = 1, ..., 4) can be used to verify that a company selected as healthy at time t does not fall into financial distress in the next 4 years.
The healthy set was sampled among the Italian industrial firms that were still active at the end of time t (year 2010); that did not incur any kind of bankruptcy procedure, such as composition with creditors, receivership or extraordinary administration, between 2010 and 2014; that did not change name or structure through operations such as mergers and acquisitions in the years of interest; and that provided full information at times (t - i; i = 1, ..., 4) and (t + i; i = 0, ..., 4).
With the aim of achieving a set of full information, i.e. each firm provides complete financial data for each time period, the analysis has been limited to the three years of interest (2006, 2007, 2008). The main aim of the analysis is to investigate the performance of the developed default risk models over different sample designs on real data.
Concerning the selection of the sample design, despite extensive debate in the literature, there is no clear evidence in favor of a unique solution. A possible solution is to adopt a balanced sample, selecting the same sample size for both the failed and the healthy firms. The motivation is that the population proportion strongly favors active firms, so a non-balanced sample would include a small number of failed firms, leading to a biased estimator. However, there are also reasons in favor of different choices, such as oversampling the failing companies with unbalanced proportions (Back, 1997).
In this analysis a cluster scheme for the industrial firms, based on the geographical distribution as auxiliary variable, has been used, and both balanced and unbalanced cluster sampling designs have been considered. A cross-sectional approach has been considered as a benchmark.
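The difference between a balanced and an unbalanced design can be illustrated with a short sketch. It is a simplification under stated assumptions: healthy firms are drawn region by region in proportion to the failed firms of that region, which is closer to a stratified draw than to a formal cluster design, and the `ratio` parameter, the region labels and the firm identifiers are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Firm = Tuple[str, str]  # (firm id, region); both fields are hypothetical


def sample_healthy(failed: List[Firm], healthy: List[Firm],
                   ratio: float, seed: int = 0) -> List[Firm]:
    """Draw healthy firms region by region.

    ratio = 1.0 gives a balanced sample (one healthy firm per failed firm);
    ratio > 1.0 gives an unbalanced sample with more healthy firms.
    """
    rng = random.Random(seed)
    failed_per_region: Dict[str, int] = defaultdict(int)
    for _, region in failed:
        failed_per_region[region] += 1

    healthy_by_region: Dict[str, List[Firm]] = defaultdict(list)
    for firm in healthy:
        healthy_by_region[firm[1]].append(firm)

    sample: List[Firm] = []
    for region, n_failed in sorted(failed_per_region.items()):
        pool = healthy_by_region[region]
        # Never request more firms than the region actually contains.
        k = min(len(pool), round(ratio * n_failed))
        sample.extend(rng.sample(pool, k))
    return sample


# Toy population, invented for illustration only.
failed = [("f1", "North"), ("f2", "North"), ("f3", "South")]
healthy = [(f"h{i}", "North") for i in range(10)] + \
          [(f"h{i}", "South") for i in range(10, 16)]

balanced = sample_healthy(failed, healthy, ratio=1.0)
unbalanced = sample_healthy(failed, healthy, ratio=2.0)
print(len(balanced), len(unbalanced))
```

With three failed firms, the balanced draw returns three healthy firms and the unbalanced draw with `ratio=2.0` returns six, keeping the regional split of the failed set in both cases.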
Different approaches have been proposed in the literature to forecast default risk, and authors have used different sets of variables.
In this paper, following Bellovari et al. (2007), we believe that in order to achieve higher model accuracy the selection of the predictors has to consider different aspects: the indicators should have a relevant financial meaning in a failure context, should have been frequently used in the failure prediction literature and, finally, the information needed to calculate them should be available.
Accordingly, the financial variables have been selected among the most relevant in highlighting current and prospective conditions of operational imbalance, in line with the main previous theoretical and empirical studies on the topic (Altman, 2000; Dimitras et al., 1996).
Finally, we have 55 indicators as potential bankruptcy predictors (reported in Table 2) that take into account all the relevant aspects of the firms' structure: Profitability, Size and Capitalization, Leverage, Liquidity, Operating structure, Turnover.
The predictor database for the considered period (2006, 2007, 2008) was derived from the financial statements of each firm included in the sample. The whole sample has been divided into a training set (70% of the data), used for estimation, and a test set (30% of the data), used for performance evaluation.
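The 70/30 division can be sketched as below. The text does not say how firms were assigned to the two sets, so the simple random split, the fixed seed and the firm identifiers are assumptions made only for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], train_share: float = 0.7,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items and cut it at the training share."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


firms = [f"firm_{i}" for i in range(100)]  # hypothetical ids
train, test = train_test_split(firms)
print(len(train), len(test))
```

Every firm ends up in exactly one of the two sets, so accuracy measured on the test set is not contaminated by observations used for estimation.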

Selection Techniques
A relevant problem in measuring the risk of failure is the selection of the optimal set of financial indicators. Since the seminal paper of Altman (1968) this issue has been widely discussed in the financial literature and, over the years, different selection procedures have been proposed.
The traditional methods refer to subset regression, which aims at choosing the set of the most important predictors to be included in the model. In this class we can distinguish different methods: all-subset; forward (backward) selection; stepwise selection (Furnival and Wilson, 2000).
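As an illustration of the subset-regression family, the following sketch implements greedy forward selection only; a full stepwise procedure would also test whether previously added variables should be dropped. The `score` callable stands for any model criterion (for instance a penalized likelihood of a fitted model) and is an assumption here, as are the variable names and the toy gains.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_selection(candidates: Sequence[str],
                      score: Callable[[FrozenSet[str]], float],
                      min_gain: float = 1e-9) -> List[str]:
    """Greedy forward selection.

    Starting from the empty model, add at each step the candidate that
    raises the score the most; stop when no candidate improves the score
    by more than min_gain.
    """
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(candidates)
    while remaining:
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        trial_score, winner = max(trials)
        if trial_score - best <= min_gain:
            break
        selected.append(winner)
        remaining.remove(winner)
        best = trial_score
    return selected


# Toy criterion, invented for illustration: each ratio adds some fit,
# two of them add very little, and every extra variable costs 0.05.
GAIN = {"roa": 0.30, "leverage": 0.20, "liquidity": 0.04, "turnover": 0.01}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(GAIN[v] for v in subset) - 0.05 * len(subset)


print(forward_selection(list(GAIN), toy_score))
```

In this toy setting the procedure keeps `roa` and `leverage` and stops, because adding either remaining ratio costs more than it contributes.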
A different approach relies on shrinkage procedures based on penalized regression methods. They allow a variable to be partly included in the model via constrained least squares optimization. Shrinkage often improves prediction accuracy, trading a decrease in variance for an increase in bias (Hastie et al., 2009).
Within this framework, a widely used approach is the Least Absolute Shrinkage and Selection Operator (Lasso), proposed by Tibshirani (1996).
The Lasso allows for simultaneous execution of both parameter estimation and variable selection. It shrinks some coefficients in the linear regression and sets others to 0, and hence tries to retain the good features of both subset selection and ridge regression. The Lasso linear regression can be generalized to other models, such as GLM, hazards models, etc. (Park and Hastie, 2007).
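The way the Lasso shrinks some coefficients and sets others exactly to 0 can be seen in a small sketch for the linear case. It uses coordinate descent with soft-thresholding on the objective (1/2n)·||y − Xb||² + λ·||b||₁ without an intercept; this is one common way to compute the Lasso, not necessarily the algorithm used in this study, and the data and the value of `lam` are invented for illustration.

```python
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam and set it to exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to scale the update.
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores beta[j].
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            beta[j] = soft_threshold(rho / n, lam) / col_ss[j]
    return beta


# Toy data: y depends only on the first column (y = 2 * x1).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
print(lasso_cd(X, y, lam=0.5))  # → [1.5, 0.0]
```

The relevant coefficient is shrunk from its least-squares value of 2 to 1.5, while the irrelevant one is set exactly to 0, which is the combined estimation and selection behavior described above.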

Default-risk Models and Performance Evaluation
In this work we aim at developing default risk models for the prediction and diagnosis of the risk of bankruptcy; in particular, we focus on the selection of the optimal set of predictors.
For this purpose we compared different selection strategies, evaluating their performance in terms of prediction accuracy over different sample designs and different time horizons.
We refer to two different approaches: the Logistic Regression with a stepwise variable selection (Model 1) and the regularized Logistic Regression with a Lasso selection (Model 2).
As a benchmark we estimated a Linear Discriminant Analysis with a stepwise selection procedure (Model 3).
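The logistic regression can be written, in standard notation, as:

$$P(y_i = 1 \mid x_i) = \frac{\exp(\beta_0 + x_i^{\top}\beta)}{1 + \exp(\beta_0 + x_i^{\top}\beta)},$$

where $y_i$ equals 1 if firm $i$ has failed and 0 otherwise, $x_i$ is the vector of financial predictors and $\beta$ is the vector of coefficients.

The regularized Logistic Regression considers the penalty term $\lambda \sum_{j} |\beta_j|$, as illustrated in the previous section, via the Lasso and can be written as

$$\hat{\beta} = \arg\max_{\beta_0, \beta} \left\{ \sum_{i=1}^{n} \left[ y_i (\beta_0 + x_i^{\top}\beta) - \log\left(1 + \exp(\beta_0 + x_i^{\top}\beta)\right) \right] - \lambda \sum_{j=1}^{p} |\beta_j| \right\},$$

where $\lambda \geq 0$ is the tuning parameter that controls the amount of shrinkage.

For evaluation purposes the classification results can be summarized in a two-by-two confusion matrix that allows for four possible outcomes, as indicated in Table 4. This information can be used to generate further accuracy measures widely used in bankruptcy prediction studies (Engelmann et al., 2003; Fawcett, 2006). They include some measures based on the Cumulative Accuracy Profile (CAP) and its summary statistic, the Accuracy Ratio, calculated by relating the area under the CAP plot to the area under the CAP of a hypothetical "perfect" rating system. A different approach is based on the Receiver Operating Characteristics (ROC) analysis, which shows the ability of the classifier to rank the positive instances relative to the negative instances.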
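The two-by-two confusion matrix and the Correct Classification Rate derived from it can be sketched as follows. The coding 1 = failed, 0 = healthy and the cell labels TP, FN, FP, TN are assumptions made for this illustration, and the toy labels are invented.

```python
from typing import Dict, Sequence


def confusion_matrix(actual: Sequence[int],
                     predicted: Sequence[int]) -> Dict[str, int]:
    """Counts for a binary problem where 1 = failed and 0 = healthy."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


def correct_classification_rate(cm: Dict[str, int]) -> float:
    """Share of correctly classified firms over all classified firms."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total


# Toy labels, invented for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
cm = confusion_matrix(actual, predicted)
print(cm, correct_classification_rate(cm))
```

Here one failed firm is missed and one healthy firm is flagged, so six of the eight firms are classified correctly and the rate is 0.75.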
Although the construction of the ROC curve differs from the CAP approach, the summary measures of both curves essentially contain the same information. It can be shown that the Accuracy Ratio can be calculated from the Area under the ROC curve with the following equation: AR = 2 * AUC - 1. The Accuracy Ratio is normalized between -1 and 1, while the Area under the ROC curve lies between 0 and 1. The area is 1 for a perfect model. Testing the performance of a default model means investigating its ability to discriminate between different levels of default risk.
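The relation AR = 2 * AUC - 1 can be checked numerically with a short sketch. The AUC is computed here through its pairwise ranking interpretation (the share of failed/healthy pairs in which the failed firm receives the higher risk score, with ties counted as one half); the scores are invented for illustration.

```python
from typing import Sequence


def auc(scores_failed: Sequence[float],
        scores_healthy: Sequence[float]) -> float:
    """AUC as the probability that a failed firm gets a higher score
    than a healthy one, counting ties as one half."""
    wins = 0.0
    for f in scores_failed:
        for h in scores_healthy:
            if f > h:
                wins += 1.0
            elif f == h:
                wins += 0.5
    return wins / (len(scores_failed) * len(scores_healthy))


def accuracy_ratio(auc_value: float) -> float:
    """Accuracy Ratio from the relation AR = 2 * AUC - 1."""
    return 2.0 * auc_value - 1.0


# Toy risk scores, invented for illustration only.
failed_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3, 0.2, 0.1]
a = auc(failed_scores, healthy_scores)
print(a, accuracy_ratio(a))
```

In this example 11 of the 12 pairs are ranked correctly, so the AUC is 11/12 and the Accuracy Ratio is 5/6; a model that ranks at random would give an AUC close to 0.5 and an Accuracy Ratio close to 0.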

Empirical Results
The predictive performance of the developed models has been evaluated in terms of: Correct Classification Rate (CCR); Area under the ROC curve (AUC); Accuracy Ratio (AR).
The accuracy measures have been computed on the training and test sets for each of the forecasting models previously described (Model 1, Model 2 and Model 3) and for each sample design. For the unbalanced sample (Tables 5-6), the correct classification rate of the three models increases when approaching the bankruptcy year, both in the training set and in the test set. The effect of the sample design does not seem to be very relevant: the trend of the accuracy measures for the balanced sample (Tables 7-8) is quite similar to that in the unbalanced sample. Looking at the error rates, the values for the balanced sample are on average slightly worse than those for the unbalanced sample. Comparing the performance of the three models, it can be noted that the Lasso performs better in each year, in both sets and for both samples, than Logistic Regression and Discriminant Analysis.
To sum up, the analysis shows that forecasting models based on an unbalanced sample and shrinkage selection methods perform better than models based on a balanced sample and traditional selection procedures. As expected, the Lasso procedure selects a reduced number of variables and offers an advantage in terms of computational time. Overall, the performance of the models increases as the forecasting horizon decreases, even if some drawbacks can be observed for the Logistic Regression in the year 2007.
The final set of financial variables included in the three estimated models is consistent with those considered, at different levels, in a large part of the empirical literature on the topic (Amendola et al., 2010; Dimitras et al., 1996).

Concluding Remarks
In this study, default risk models for industrial enterprises have been developed by investigating the role of variable selection procedures and sample designs in the overall forecasting performance.
The financial statements of healthy and failed Italian companies, sampled with balanced and unbalanced schemes, have been analysed.
In particular, we aimed at evaluating the opportunity to implement variable selection techniques based on shrinkage regression. The performance of the proposed forecasting models has been evaluated at different time horizons and by means of properly chosen accuracy measures. The results of our analysis seem to answer the research question in favor of the Lasso selection procedure, which performs better than traditional methods, specifically logistic regression and discriminant analysis.
The results are quite similar for both the balanced and unbalanced samples, which underlines the marginal effect of the sample design in terms of forecast accuracy.
Overall, the proposed approach seems to be a promising and valid alternative.
Given the dynamic nature of the problem, we may obtain better results in terms of forecast accuracy if we include the time dimension and the evolutionary behavior of the financial variables in the models.
Furthermore, the empirical findings could be generalized by extending the analysis to a larger data set including other European countries.

Table 1. Failed firms sample

Table 2. Financial Indicators

The explanatory variables considered reflect different aspects of the firms' structure, as synthesized in Table 3.

Table 4. Confusion Matrix

An overall index, the Correct Classification Rate (CCR), i.e. correctly classified instances over total instances, can be computed.

Table 5. Unbalanced sample: Accuracy measures for training set

Table 7. Balanced sample: Accuracy measures for training set

Table 8. Balanced sample: Accuracy measures for test set

Table 9. Cross-sectional sample: Accuracy measures for training set

Table 10. Cross-sectional sample: Accuracy measures for test set