A Bayesian Survival Model Approach for Business Distress Prediction

Early warning signals of corporate distress and failure have long been a major concern for shareholders, policy makers and academicians alike. Numerous approaches have been applied to examine firm insolvency, ranging from the well-known Altman Z-score, traditional econometrics and financial ratio analysis to the more contemporary tools of Artificial Intelligence and Machine Learning. The Cox proportional hazards model is a commonly applied technique, not only in the medical sciences for estimating the occurrence of a specific event but also in failure prediction of private firms. This study investigates distress prediction of firms in the context of an emerging economy, India, where the application of Bayesian survival models has so far been limited. A rich panel of firms spanning ten years and representing varied sectors such as manufacturing, services, mining and construction is compiled for the purpose. The study contributes by developing hazard (survival) modelling from a Bayesian perspective. The advantage of the Bayesian method over the usual frequentist methods lies in dealing effectively with censored and small samples. Both the standard Cox survival model for censored failure time and its Bayesian estimation are performed, and their performance is assessed and compared. It is found that the prediction accuracy of the Bayesian Cox model is significantly higher than that of the classical Cox model. The study thereby provides useful insights for detecting early signs of distress in the Indian corporate sector, a topic on which the literature is otherwise scant.


Introduction
Modelling and analysing survival data is one of the earliest research fields of statistics, marking the beginning of the development of actuarial science and demography in the 17th century. As the name indicates, survival data deal with lifetimes or, more generally, with waiting times from some initial event of interest (such as birth, start of treatment or acute distress of a company) to its consequence (such as death, relapse or bankruptcy). A vital milestone in survival analysis was the paper that introduced an estimator of the survival curve [10]. This path-breaking study not only opened a new avenue of research on survivability but also raised many technical questions for subsequent research.
The proportional hazards model (PHM), which incorporates the impact of covariates (explanatory variables), was introduced in a seminal work [5]. In biomedical sciences, especially in clinical trials, and in demography, an important issue arises when observing "time to event" data where the event of interest is death but certain individuals remain alive at the end of the study period, leading to right-censored data. In survival modelling, the researcher is generally interested in the time span until the occurrence of an event such as death or failure. The primary goal of a censored survival model is to assess the dependence of the time taken until the event on certain exogenous parameters or covariates. The PHM of [5] is applicable to univariate failure time data, wherein each unit under study can experience the specified event at most once. In a parametric setup, wherein the lifetime distribution belongs to a known family of distributions, the regression analysis reduces to estimating the unknowns from the survival time and covariate data. [9] discuss extensively the partial likelihood approach of [5,6] for estimating the unknown parameters of the survival model from a classical perspective. Similarly, [11] summarize the application of the PHM using the classical approach. They discuss the effects of interaction, omission, measurement error, misclassification, monotonicity, multicollinearity and time dependency, as well as the goodness of fit of the approach.
A related strand of literature has also considered prior knowledge for hazard function estimation. [16] utilized the Bayesian perspective to find posterior estimates of several quantities of interest, dealing with complex models and unusual data structures of survival data from different medical studies. A detailed and insightful treatment of Bayesian survival analysis is given by [8]. The authors discuss semi-parametric Bayesian approaches in the context of failure time models, using Gamma, Beta and Dirichlet processes to model functions across time such as baseline hazards and time-dependent covariate effects. Earlier, the applicability of Bayesian methods in survival analysis was limited because the posterior distribution under censoring has a complicated structure that becomes intractable for further algebraic derivation. However, the development of Gibbs sampling has provided numerical algorithms for obtaining samples from the posterior distribution. Bayesian computation for single-level, multi-level or hierarchical processes is thus possible using simulation techniques. The Gibbs sampler is one of the Markov Chain Monte Carlo (MCMC) sampling algorithms and is a widely accepted and popular tool in Bayesian analysis. [7] introduced Gibbs sampling algorithms and established their convergence properties in the context of image processing.
Against this backdrop, distress prediction of Indian firms is carried out using annual data over a ten-year period, from 2006 to 2015. This study proposes an alternative Bayesian semi-parametric approach to solve a PHM, in addition to the standard Cox survival model. The results illustrate an improvement in the prediction accuracy of the Bayesian Cox model compared to the classical Cox model, which is helpful in detecting early signs of distress in the Indian corporate sector. The rest of the paper is structured as follows. Section 2 briefly provides the micro foundations of the Cox hazard model and the importance of incorporating covariates in the distribution for prediction. It also introduces and develops the Bayesian MCMC approach for the Cox model. The dataset and variables are described in Section 3. The empirical outcomes are presented in Section 4, followed by a summary of the major findings in Section 5.

Modelling Strategy
Theoretical foundations of both Cox PHM and Bayesian approach are described herewith.

The Cox Proportional Hazard Model
The survival model is a very popular concept in statistics which characterises the success or failure of an individual or business and measures its survival or hazard. In this framework, let a random variable T indicate the time to failure of a firm.
T is a continuous variable associated with the survival or hazard time t of a firm, and it follows an underlying distribution function that characterises the probability of survival of a firm. Its distribution function is given by:

$$F(t) = P(T \le t)$$

The survivor function is defined as the probability that a firm will survive beyond time t, formulated as:

$$S(t) = P(T > t) = 1 - F(t)$$

Similarly, the hazard function is defined as the rate of death of a firm. Let $P(t \le T < t + \Delta t \mid T \ge t)$ denote the probability that the survival time T lies between $t$ and $t + \Delta t$, given survival up to $t$. Then the hazard function is defined as follows:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

Generally, there are four fundamental concepts in survival analysis, viz., duration, censoring, hazard rate and survival function. Duration refers to the time span from the start of the failure process to the time when the event of interest occurs or the study period ends, whichever happens first. In case the failure does not happen, it is called right censoring. The hazard rate is the instantaneous failure rate at time t given that the individual is still alive just before t, whereas the survival function measures the probability of survival of the individual beyond a specified time. Utilizing the Bayes theorem, we have,

$$h(t) = \frac{f(t)}{S(t)}$$

Here, $f(t)$ is the probability density function of the random variable $T$. The survival function can be formulated as:

$$S(t) = \exp\left(-\int_0^t h(u)\,du\right)$$

Now, $H(t) = \int_0^t h(u)\,du$ is called the cumulative hazard function. The relationship between the cumulative hazard function and the survival function is depicted below:

$$H(t) = -\log S(t)$$

Thus, while performing survival analysis one can not only determine how the explanatory variables shape the hazard curve but also estimate the hazard function of a firm.
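The relationships between the density, hazard, cumulative hazard and survival function can be checked numerically. The following is a minimal sketch, assuming a Weibull lifetime whose shape and scale values are purely illustrative and are not taken from the study; the function names are hypothetical.

```python
import math

# Hypothetical Weibull lifetime; shape and scale are illustrative values only.
K_SHAPE, LAM_SCALE = 1.5, 8.0

def survival(t):
    """S(t) = P(T > t) for the assumed Weibull distribution."""
    return math.exp(-((t / LAM_SCALE) ** K_SHAPE))

def density(t):
    """f(t), the probability density of T."""
    return (K_SHAPE / LAM_SCALE) * (t / LAM_SCALE) ** (K_SHAPE - 1) * survival(t)

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)

def cumulative_hazard(t, steps=20000):
    """H(t) = integral of h(u) du over [0, t], approximated by the midpoint rule."""
    width = t / steps
    return sum(hazard((j + 0.5) * width) for j in range(steps)) * width

t = 5.0
# The two printed numbers should agree up to numerical integration error.
print(survival(t), math.exp(-cumulative_hazard(t)))
```

Any other lifetime distribution with a known density could be substituted; the identity between S(t) and exp(-H(t)) does not depend on the Weibull choice.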
The objective is to calculate the probability of an event happening over some finite time interval $(t, t + \Delta t)$ using the Cox PHM framework. In particular, the baseline hazard rate $\lambda_0(t)$, which is common to all individuals, is defined as the following limit:

$$\lambda_0(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

Here, $\lambda_0(t)$ is the instantaneous rate of death of an individual, provided that it survived until time t. So, $\lambda_0(t)\,\Delta t$ is the approximate probability of failure in the time interval $(t, t + \Delta t)$. Being an arbitrary function of time, the hazard rate is quite difficult to estimate. The baseline cumulative hazard rate is defined as:

$$\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$$

The hazard function for a subject i with covariates $Z_i = (Z_{i1}, Z_{i2}, \dots, Z_{ip})$ is represented as:

$$\lambda(t \mid Z_i) = \lambda_0(t)\exp(\beta' Z_i)$$

Therefore, the hazard rate is the product of the unknown baseline hazard rate $\lambda_0(t)$, which is the non-parametric part, and the exponential function of the linear predictor $\beta' Z_i = \sum_{j=1}^{p} \beta_j Z_{ij}$ with unknown regression coefficients $\beta$, which is the parametric part. The hazard function $\lambda(t \mid Z)$ for an individual with covariate vector equal to zero is $\lambda_0(t)$, which is the hazard function in the absence of covariates, also known as the baseline hazard function. A key assumption here is that relative risks are constant over time. For demonstration, consider a special case with only one covariate. The relative risk of two subjects with covariate values $Z_1$ and $Z_2$ is defined as:

$$\frac{\lambda(t \mid Z_1)}{\lambda(t \mid Z_2)} = \frac{\lambda_0(t)\exp(\beta Z_1)}{\lambda_0(t)\exp(\beta Z_2)} = \exp\{\beta (Z_1 - Z_2)\}$$

The relative risk does not depend on time because only the baseline intensity reflects dependence on time. The result can be generalized to more than one covariate in the model. The analysis of intensity utilizing counting processes and survival data is the main goal of the Cox model. In a theoretical setup, [2] have discussed the counting process framework and proposed the martingale concept as the estimator of such a model.
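The proportionality assumption can be illustrated with a short numerical sketch. The baseline hazard function and the coefficient value below are hypothetical choices made only for illustration; they are not estimates from this study.

```python
import math

def baseline_hazard(t):
    # Hypothetical baseline hazard; any positive function of t would do.
    return 0.02 + 0.01 * t

def cox_hazard(t, z, beta):
    """Hazard of a subject with covariates z: baseline hazard times exp(beta' z)."""
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return baseline_hazard(t) * math.exp(linear_predictor)

beta = [0.8]                  # illustrative coefficient, not an estimate
z_high, z_low = [1.0], [0.0]  # two subjects differing by one unit of the covariate

# The ratio of the two hazards is evaluated at several time points.
relative_risks = [
    cox_hazard(t, z_high, beta) / cox_hazard(t, z_low, beta)
    for t in (1.0, 5.0, 9.0)
]
print(relative_risks)
```

The baseline hazard cancels in the ratio, so every entry equals exp(0.8) regardless of the time point, which is the sense in which the relative risk is constant over time.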
Let there be n subjects/companies under investigation. For subject i, $I_i(t)$ is the intensity process of a counting process given the covariate vector $Z_i = (Z_{i1}, Z_{i2}, \dots, Z_{ip})$. $Y_i(t)$ is defined as the risk indicator, i.e., it marks the subjects still at risk at time $t$. Let $N_i(t)$ be the count process of failures which occur in the interval $[0, t]$. This process is constant and equal to zero between failures and increments by one unit at each failure. Here, the stochastic process $\{N_i(t), t \ge 0\}$ is a counting process with the following properties: $N_i(t)$ takes integer values, and for $s < t$, $N_i(t) - N_i(s)$ represents the number of failures that occur in the interval $(s, t]$. Accordingly, the rate of failure for subject i is defined as $Y_i(t)\,\lambda(t \mid Z_i)$, while the intensity $I_i(t)$ is defined through the probability of failure occurring in the time interval $[t, t + dt)$, given that it has not occurred yet:

$$I_i(t)\,dt = E\left[dN_i(t) \mid \mathcal{F}_{t-}\right]$$

Here, $dN_i(t)$ is the increment of $N_i$ over the small interval $[t, t + dt)$ and $\mathcal{F}_{t-}$ denotes the history of the process up to just before time t. Consequently, $I_i(t)$ is the multiplicative intensity defined as:

$$I_i(t) = Y_i(t)\,\lambda_0(t)\exp(\beta' Z_i)$$

where the intensity process is the product of an observed process and an unobserved function. Therefore, the intensity process for $N_i(t)$ is given as:

$$I_i(t)\,dt = Y_i(t)\exp(\beta' Z_i)\,d\Lambda_0(t)$$

Here, $d\Lambda_0(t) = \lambda_0(t)\,dt$ is the increment of the baseline cumulative hazard, interpreted as the instantaneous probability that a subject at risk at time t experiences the event in the time interval $(t, t + dt)$.
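The counting-process notation can be made concrete for a single firm observed on a yearly grid. The sketch below is an illustration under assumed conventions (a firm is treated as at risk in the year in which it fails, and the function name is hypothetical); it builds the failure count, the at-risk indicator and the increments of the count.

```python
def counting_process(duration, failed, grid):
    """Return (N, Y, dN) evaluated on an increasing time grid for one firm.

    duration: observed time until failure or censoring
    failed:   True if the firm failed at `duration`, False if right censored
    """
    n_path, y_path, dn_path = [], [], []
    previous_n = 0
    for t in grid:
        n_t = 1 if (failed and duration <= t) else 0  # failures in [0, t]
        y_t = 1 if duration >= t else 0               # still at risk at time t
        n_path.append(n_t)
        y_path.append(y_t)
        dn_path.append(n_t - previous_n)              # jump of N at this grid point
        previous_n = n_t
    return n_path, y_path, dn_path

years = list(range(1, 11))
failed_firm = counting_process(4, True, years)      # fails in year 4
censored_firm = counting_process(10, False, years)  # survives the whole window
print(failed_firm)
print(censored_firm)
```

For the failed firm the count jumps once, in year 4, and the at-risk indicator drops to zero afterwards; for the censored firm the count never jumps and the indicator stays at one throughout.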

Bayesian Formulation for Cox Model
Suppose that companies either fail or are censored during the period of study. For such a data set, $D = \{N_i(t), Y_i(t), Z_i;\ i = 1, 2, \dots, n\}$, the unknown parameter $\beta$ and the baseline cumulative hazard $\Lambda_0(t)$ need to be estimated. Under non-informative censoring, the marginal likelihood of the data for subject i is given by:

$$L_i(\beta, \Lambda_0) = \left[\prod_{t \ge 0} I_i(t)^{dN_i(t)}\right] \exp\left(-\int_{t \ge 0} I_i(t)\,dt\right)$$

Further, the joint likelihood of the data set D is given as:

$$L(D \mid \beta, \Lambda_0) = \prod_{i=1}^{n} L_i(\beta, \Lambda_0)$$

Here, $dN_i(t)$ denotes a small increment of $N_i(t)$ in the interval $[t, t + dt)$. $N_i(t)$ and $dN_i(t)$ are equal to 1 if a firm fails during the period $[0, t]$ and $[t, t + dt)$ respectively, and 0 otherwise. Under a non-informative censoring process, the counting process is assumed to follow the Poisson distribution. So, the counting process increment $dN_i(t)$ is represented as:

$$dN_i(t) \sim \text{Poisson}\left(I_i(t)\,dt\right)$$

Once the data set D is available, the focus is to derive the posterior distribution of the unknown parameters. Utilizing Bayes theorem, the joint posterior distribution of the parameters of the Cox PHM is derived as:

$$P(\beta, \Lambda_0 \mid D) \propto L(D \mid \beta, \Lambda_0)\,P(\beta)\,P(\Lambda_0)$$

$P(\beta, \Lambda_0 \mid D)$ and $L(D \mid \beta, \Lambda_0)$ represent the posterior distribution of $(\beta, \Lambda_0)$ given $D$ and the likelihood function respectively. The prior probabilities of the baseline hazard function and the regression coefficients are given as $P(\Lambda_0)$ and $P(\beta)$ respectively. Both can be expanded as a linear function of the vector $x_i = [1, x_{i1}, \dots, x_{ip}]$, whose coefficients $\beta$ are assumed to follow a normal distribution. Next, we need to specify the prior distributions as required by the Bayesian formulation. As per Equation (16), for $N_i(t)$, i.e., the number of distressed firms, the prior is assumed to be a Poisson process. Additionally, loose priors are chosen in each of the above-mentioned cases. The iterative Gibbs sampling algorithm is utilized to carry out the MCMC sampling. Moreover, convergence diagnostics are performed to assess the validity of the results.
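To show how the Poisson likelihood of the counting-process increments combines with a normal prior, the following minimal sketch evaluates an unnormalised log posterior for a single coefficient on a grid. It is not the Gibbs sampler used in the study: the baseline increments are held fixed at an arbitrary value, the prior standard deviation is an assumed "loose" choice, and the six firms are invented toy data.

```python
import math

def log_posterior(beta, d_lambda0, firms, prior_sd=10.0):
    """Unnormalised log posterior of a single coefficient beta.

    firms:     list of (duration_in_years, failed, z) tuples
    d_lambda0: baseline cumulative-hazard increments, one per year (held fixed here)
    """
    log_lik = 0.0
    for duration, failed, z in firms:
        for year, increment in enumerate(d_lambda0, start=1):
            if duration < year:   # at-risk indicator is zero from here on
                break
            mean = math.exp(beta * z) * increment   # intensity times dt
            d_n = 1 if (failed and duration == year) else 0
            log_lik += d_n * math.log(mean) - mean  # Poisson log-density, d_n in {0, 1}
    log_prior = -0.5 * (beta / prior_sd) ** 2       # loose normal prior on beta
    return log_lik + log_prior

# Invented toy data: firms with z = 1 tend to fail earlier than firms with z = 0.
toy_firms = [(2, True, 1.0), (3, True, 1.0), (5, False, 1.0),
             (6, True, 0.0), (8, False, 0.0), (8, False, 0.0)]
toy_increments = [0.05] * 8

grid = [i / 100 for i in range(-300, 301)]
posterior_mode = max(grid, key=lambda b: log_posterior(b, toy_increments, toy_firms))
print(posterior_mode)
```

In a full analysis the baseline increments would receive their own prior and be sampled jointly with the coefficients by MCMC; the grid search here only illustrates where the posterior mass concentrates for one parameter.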

Data and Variables
The Capital IQ platform, which populates company-level data of Indian firms from their annual reports, has been utilized for the empirical analysis. The dataset comprises annual information on 554 firms for ten years, from 2006 to 2015. The firms represent varied sectors such as manufacturing, services, mining and construction. Out of 554 public limited companies, 224 firms are found to be stressed (failure) based on the selection criteria applied to prominent financial variables, whereas 330 firms are non-stressed (success).

Methodology for Distress Classification
A reasonable share of distressed firms in the full sample leads to robust results [12]. Applying this philosophy and following [14], the variables chosen, together with the cutoff value of each attribute used to decide the distress status of a firm, are explained below.
Interest coverage ratio (<1): It measures the firm's capability to pay interest on its outstanding debt. A low ratio implies greater debt burden and higher possibility of bankruptcy or default.
EBITDA (Earnings before interest tax depreciation and amortization) to expense ratio (<1): It is a vital indicator measuring the ability of a firm to meet its overall expenses. A value higher than unity is desirable for smooth functioning of operations.
Net worth to debt ratio (<1): The ratio reflects the capacity of a concern to repay its debt from its net assets during a contingency. A value higher than unity is desirable, as it signals a comfortable position.
Net worth growth (consecutively negative for two years): Positive net worth growth of a firm signifies satisfactory performance in terms of generating profits. However, successive losses may lead to a distress situation.
A firm is classified as distressed if it meets at least three of the four conditions mentioned above. A firm is considered non-distressed if it does not satisfy any of the four conditions. Temporal data are available for the firms. However, the time point at which a firm is categorized as distressed is considered to be the death point of the firm, and the firm is dropped from the dataset thereafter, leading to an unbalanced panel. Additionally, all companies falling under neither the distress nor the non-distress category are dropped from the entire study sample.
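The classification rule described above can be sketched as a small function. The function and argument names are hypothetical, and net worth growth is assumed to be supplied as the values for the two most recent years.

```python
def classify_firm(interest_coverage, ebitda_to_expense, networth_to_debt,
                  networth_growth_last_two_years):
    """Apply the four distress conditions and return the firm's category."""
    conditions = [
        interest_coverage < 1,                               # weak interest coverage
        ebitda_to_expense < 1,                               # EBITDA does not cover expenses
        networth_to_debt < 1,                                # net worth below debt
        all(g < 0 for g in networth_growth_last_two_years),  # two consecutive negative years
    ]
    conditions_met = sum(conditions)
    if conditions_met >= 3:
        return "distressed"
    if conditions_met == 0:
        return "non-distressed"
    return "dropped"  # one or two conditions met: excluded from the study sample

print(classify_firm(0.8, 0.9, 0.7, [-0.10, -0.20]))
print(classify_firm(2.5, 1.4, 1.8, [0.05, 0.10]))
print(classify_firm(0.8, 1.4, 1.8, [0.05, 0.10]))
```

The three calls return "distressed", "non-distressed" and "dropped" respectively; the last case shows why firms meeting only one or two conditions leave the sample.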

Training and Testing Sample Selection
In order to assess the prediction accuracy of a model, the entire dataset is split into two samples, viz., a training and a testing sample. The selected model is initially estimated on the training sample. The testing sample makes it possible to examine the prediction efficacy of the selected model. The size of the testing sample is kept at roughly 25% of the entire database, with similar proportions of distress and non-distress firms. Using the methodology explained above, we finally collected data on 512 companies (288 successful and 224 failed firms) for various financial variables. Our training sample comprises 404 randomly chosen firms, of which 222 are successful (non-distress) and 182 are failed (distress) firms. The testing data consist of 108 companies, with 66 successful and the remaining 42 failed firms. It may be noted that the time point at which a firm fails is called its death point. Such a failed company is excluded from the analysis from the death point onwards, making the panel unbalanced.
The standard Cox survival model and the Bayesian PHM are examined for their ability to classify and predict distress/non-distress firms. The level of accuracy with which a model correctly categorizes firms as successful or failed in the training sample determines its "in-sample" classification ability. Likewise, the precision with which a model labels a business concern as distressed/non-distressed in the testing or "out of sample" dataset indicates its prediction ability. The predictions of the selected models are compared at yearly intervals, beginning from a one-year horizon. Examining the predictive accuracy of both modelling strategies at multiple time horizons and drawing conclusions from the outcome is the prime objective of the analysis.
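A split that preserves the proportions of distress and non-distress firms can be sketched as follows. This is an illustration only: the firm identifiers are invented, the seed is arbitrary, and an exact 25% test share is assumed, so the resulting counts will not reproduce the training and testing sizes reported above.

```python
import random

def stratified_split(firm_labels, test_share=0.25, seed=0):
    """Split firm ids into training and testing lists, keeping class proportions.

    firm_labels: dict mapping a firm id to "distress" or "non-distress"
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(firm_labels.values())):
        members = sorted(f for f, lab in firm_labels.items() if lab == label)
        rng.shuffle(members)                       # random draw within each class
        n_test = round(len(members) * test_share)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Invented ids using the class sizes of the final sample (288 and 224 firms).
firm_labels = {f"S{i}": "non-distress" for i in range(288)}
firm_labels.update({f"F{i}": "distress" for i in range(224)})

train_ids, test_ids = stratified_split(firm_labels)
print(len(train_ids), len(test_ids))
```

Sampling within each class separately is what keeps the distress share of the testing sample close to that of the full sample.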

Covariates Selection Strategy
This segment explains the financial ratios considered in modelling failure prediction. We identify relevant financial ratios of firms based on the established literature on corporate defaults, which has explored an array of parameters and applied diverse parametric and non-parametric methodologies such as logit, hazard, neural network and machine learning models [1,4,13,14,15,17]. However, most of these studies concern advanced economies. So, before applying the methodologies for prediction, we tested the suitability of various indicators and finally selected the variables that indicated substantial prediction ability under Indian conditions.
At the outset, we select the parameters reflecting various aspects of a company and exclude outliers to smooth the data, broadly following the approach of [3]. Subsequently, correlation analysis and step-wise regression have been performed to choose the variables according to their relevance and significance in discriminating between distressed and healthy firms.

Explanatory Variables
A broad array of firm variables was considered for modelling purposes, covering various aspects of a firm such as profitability, liquidity, size, leverage and valuation.

The final choice of variables is based on the aforementioned approaches.
Beginning with profitability, the parameters chosen comprise the ratio of earnings before interest and tax to tax paid (EBIT_TAXPAID), the retained earnings to asset ratio (RET_EARN) and the income to asset ratio (INCOME_ASST). EBIT_TAXPAID is a crucial ratio representing earnings in proportion to tax liability. The ability of a business to convert its assets into cumulative net profits is captured by RET_EARN. INCOME_ASST, also known as return on total assets, is the capacity of a concern to generate net income from its assets. All the profit indicators are expected to improve financial standing, thus reducing the probability of distress of an entity.
The cash to asset ratio (CASH_ASST) is a commonly used liquidity ratio that examines the liquid cash at the disposal of a firm for business continuity, which can be readily realized with minimum transaction cost.
Short-term financial management is imperative for smooth operations. Accordingly, accounts receivable normalized by assets, which constitutes trade credit receivable (ACC_RECEIVABLE), has also been selected. It depicts a firm's short-term usage of funds and forms a vital trade credit measure. A high level of current assets may reduce the risk of liquidity by renting or leasing plant and machinery, whereas a similar policy cannot be followed for the components of working capital.
Amongst the leverage ratios, the debt to asset ratio (DEBT_ASST) is included, as it is a commonly used ratio for assessing the share of company liabilities. Aggressive leverage signifies greater liability, which may create default risk for a concern.
Firm size is an important indicator that can influence a firm's performance in numerous ways. Larger firms may enjoy greater creditworthiness, reputation and market power, leading to higher growth opportunities. However, bigger companies are prone to loss of effective management and control. The variable employed to reflect size is the natural logarithm of net worth (LN_NETWORTH), which accounts for the firm's valuation; net worth is broadly the equity capital together with reserves and surplus.

Results
The descriptive findings of the dataset are presented below followed by the analytical results in subsequent sub-section.

Stylized Findings
A quick glimpse of the dataset utilized is presented in Table 1 (Appendix I). Profitability indicators show profits for successful firms but losses for failed firms, indicating stress build-up amongst distressed companies. Likewise, a better short-term operational position is depicted by accounts receivable, at 0.37 for successful units as against 0.32 for firms classified as failures. CASH_ASST also shows a better figure for successful firms. Failed firms are more leveraged, at 0.20 on average over the period compared to 0.13 for successful firms. Similarly, the sales ratio is better for successful firms vis-à-vis failed firms. Last but not least, the valuation of healthy firms stands at 6.1% in contrast to 5.6% for distressed companies, clearly pointing to better parameters for successful firms. The statistics are in general in accordance with intuition, which substantiates the methodology applied for classifying firms into the two groups. Utilizing the paired t-test, Table 2 shows that the selected variables are good discriminators between distress and non-distress firms.

Model Testing
A detailed comparison of the regression results based on the standard Cox PHM and the Bayesian survival model is carried out here. Before proceeding with the Bayesian framework, we need to specify the prior distribution of the unknown parameters. In this context, the normal distribution is assumed as the prior for the regression coefficients. Thereafter, the posterior distribution is derived by combining the prior with the likelihood function. The convergence of the posterior distribution is checked using kernel density plots of the Cox regression coefficients, based on 100,000 iterations of the MCMC algorithm with a burn-in of 10,000 samples. Additionally, the marginal kernel density is generated to examine the stability of the results (Figure 3, Appendix II). The kernel density plots display near normality for all the selected parameters. The posterior density and overall inferences are based on the summary of these samples.
The regression results based on both the standard Cox and the Bayesian survival model are presented in Table 3 (Appendix I), which compares the output obtained from the two techniques. Nearly all the factors turn out to be significant in determining the default probability of firms under the standard Cox model. Higher profitability ratios, such as the retained earnings to assets ratio, lead to a decline in the distress condition of a firm. Short-term liquidity indicators such as accounts receivable display a mixed impact on firms' financial health. The debt to asset ratio indicates a positive association with the distress situation of a firm under both the standard Cox model and the Bayesian approach. Liquid cash resources, measured by CASH_ASST, also play a major role in the robust performance of companies. Similarly, both firm size and net worth valuation exhibit an inverse impact on the probability of distress. The Bayesian estimation results corroborate the Cox PHM results, albeit with improved significance levels, as evidenced by variables such as DEBT_ASST.
The in-sample classification accuracy of the classical Cox PHM and the Bayesian estimation is shown in Tables 4 and 5 (Appendix I) respectively. The classification tables are constructed assuming a probability cutoff of 0.5 between successful and distressed firms. Classification accuracy improves under the Bayesian methodology for non-distressed and distressed firms separately, leading to an overall gain of nearly 0.7%.
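A classification table of this kind can be sketched as below. The predicted distress probabilities and the true labels are invented toy values, and the function name and cell labels are hypothetical; only the 0.5 cutoff is taken from the text.

```python
def classification_table(distress_probabilities, is_distressed, cutoff=0.5):
    """Cross-tabulate predicted against actual status and return overall accuracy."""
    table = {"true_distress": 0, "missed_distress": 0,
             "true_healthy": 0, "false_alarm": 0}
    for probability, actual in zip(distress_probabilities, is_distressed):
        predicted = probability >= cutoff  # predicted distressed at or above the cutoff
        if actual and predicted:
            table["true_distress"] += 1
        elif actual and not predicted:
            table["missed_distress"] += 1
        elif predicted:
            table["false_alarm"] += 1
        else:
            table["true_healthy"] += 1
    accuracy = (table["true_distress"] + table["true_healthy"]) / len(is_distressed)
    return table, accuracy

toy_probabilities = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
toy_status = [True, True, True, False, False, False]
toy_table, toy_accuracy = classification_table(toy_probabilities, toy_status)
print(toy_table, toy_accuracy)
```

In this toy case two of three distressed firms and two of three healthy firms are classified correctly, giving an overall accuracy of four in six.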
The ROC curve, which represents the predictive ability of the models, has also been plotted. It shows a higher area under the curve for the Bayesian PHM, i.e., 93.0%, in contrast to 90.9% for its counterpart, the standard PHM (Figure 1, Appendix I). The significance of the survival model has increased over time, as it provides the survival and hazard functions of an individual entity and shows how they change over time. This firm-level information can be utilized to predict a firm's performance and distress behaviour in the near future. The survival function for the study duration has been derived and tabulated in Table 6 (Appendix I). The survival probability is based on the average of the explanatory variables that were used to fit the survival model for distress prediction. The table clearly indicates a declining trend in the survival rate over the initial seven years. The survival probability of firms declines sharply, by approximately 7 percent, by the second year. The drop continues until the seventh year, at which point the survival rate is approximately 90%. Afterwards, the survival rate is nearly flat until the end of the period. The result shows that the survival probability has remained quite high for the dataset. The cumulative hazard rate of the same firms, computed on an average basis, shows a consistent trend in the reverse direction to the survival function in Table 6 (Appendix I). The elevated hazard rate during the initial phase represents a higher failure rate that stabilizes by the seventh year, as is also revealed by the survival probability.
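The area under the ROC curve can be read as the share of (distressed, healthy) pairs of firms in which the distressed firm receives the higher predicted distress score. The sketch below computes it that way on invented toy scores; it is a generic illustration and not the procedure used to obtain the figures reported above.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons; ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]  # distressed firms
    negatives = [s for s, y in zip(scores, labels) if y == 0]  # healthy firms
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
print(toy_auc)
```

Eight of the nine pairs are ordered correctly in the toy data, so the area is 8/9; a value of 0.5 would correspond to a model with no discriminating power.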

Prediction Comparison
The classical and Bayesian Cox survival analyses have been performed on the training and testing samples as elaborated earlier. The training sample has been utilized for model building and calibration. Thereafter, the testing sample enables us to assess the accuracy of both modelling approaches. The prediction results based on the training and testing samples for different cutoff values under the standard PHM and the Bayesian survival model are tabulated in Table 7 (Appendix I). Broadly, the Bayesian approach produces more accurate forecasts at different cutoff values for both distressed and non-distressed firms. The finding points to significant forecasting gains from the Bayesian survival technique. Model prediction has also been worked out to assess overall performance. N-period-ahead prediction is performed based on the model estimated on the training sample. The predictions have been made at different cut values, separately for the within-sample and testing samples. Table 6 shows predictions that are based on different cut values for both training and testing samples. It is observed that the higher the cut value, the lower the prediction accuracy and the higher the Type I error.
The results of the N-period-ahead prediction are plotted in Figure 2. It is observed that, as we move farther in time, the Type I error increases and consequently the correct prediction of distressed firms falls.
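The link between the cut value and the Type I error can be illustrated with a short sketch. It assumes the convention, common in distress prediction, that a Type I error is a truly distressed firm predicted as healthy; the source does not spell out its definition, and the probabilities below are invented toy values.

```python
def type_one_error(distress_probabilities, is_distressed, cutoff):
    """Share of truly distressed firms predicted as healthy at the given cutoff."""
    distressed = [p for p, actual in zip(distress_probabilities, is_distressed) if actual]
    missed = sum(1 for p in distressed if p < cutoff)  # below cutoff: predicted healthy
    return missed / len(distressed)

toy_probabilities = [0.9, 0.65, 0.4, 0.55, 0.3, 0.1]
toy_status = [True, True, True, True, False, False]

# Evaluate the error at increasing cut values.
errors_by_cutoff = {
    cutoff: type_one_error(toy_probabilities, toy_status, cutoff)
    for cutoff in (0.3, 0.5, 0.7)
}
print(errors_by_cutoff)
```

Raising the cutoff can only move distressed firms from the "predicted distressed" to the "predicted healthy" group, so under this convention the error cannot decrease as the cut value increases.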

Conclusion
Predicting business failure is a critical area of research, useful for entrepreneurs and policy makers alike. The analysis focuses on firm failures in the Indian context, employing a rich panel over ten years. Apart from the standard Cox survival modelling approach, the study also develops a Bayesian survival technique for comparative purposes. The in-sample classification accuracy reveals gains of around 0.7% from the Bayesian methodology vis-a-vis the Cox procedure. Likewise, sizeable prediction gains have been witnessed with the Bayesian approach. The study contributes by empirically establishing the superiority of the Bayesian approach, which management and analysts alike can use for better policy decisions. Last but not least, it also proposes a methodology for ascertaining distress for firms in a corporate sector with limited bankruptcy details, which can be gainfully applied to identify early signals of stress.

Table 3. Estimation Results.