Using fuzzy c-means clustering algorithm in financial health scoring

Classification of firms according to their financial health is currently one of the major problems in the literature. To our knowledge, this is the first attempt to use the fuzzy c-means clustering algorithm to produce single and sensitive financial health scores, especially for short-term investment decisions, from recently announced accounting numbers. Accordingly, we show the calculation of fuzzy financial health scores step by step, drawing on Piotroski's criteria of liquidity/solvency, operating efficiency and profitability for a sample of firms. The results of the correlation analysis indicate that the calculated scores are coherent with short-term price formations in terms of investors' behavior, and so the fuzzy c-means clustering algorithm could be used to sort firms from a more sensitive perspective.


Introduction
Investors' decisions represent the expectations derived from cumulative beliefs that include past experiences and the formation of recent reasonable differences in prior beliefs (Ball and Brown, 1968; Morris, 1996; Fama, 1998; Core et al., 2003; Cajueiro and Tabak, 2004; Brimble and Hodgson, 2007). In this sense, announced accounting numbers, as quantized signals, play an important role in changes in beliefs and can cause rapid stock price fluctuations, mostly in weakly efficient or inefficient markets where investors must actively manage their portfolios in order to expect a proper return within the frame of speculative investment behavior (Fama et al., 1969; Malkiel and Fama, 1970; Harrison and Kreps, 1978; D'Ambrosio, 1980; Harvey, 1993; Urrutia, 1995; Aitken, 1998; Grieb and Reyes, 1999). In other words, especially in the short term, investors buy or sell stocks based on changes in the financial health of firms, which become clear through recently announced accounting numbers (Core et al., 2003). Therefore, investors need summarized indicators to make investment decisions in the post-announcement short term.
Financial classification is a useful tool for market participants to compare differences in financial situations. Although there are many generally accepted scores, summarizing the large amount of valuable data is currently one of the major problems in the literature. For instance, the F-Score is a widely accepted benchmark developed by Piotroski (2000) to show the financial performance of firms as a single summarized indicator, and it provides many useful insights for identifying financially healthier firms. However, its numerical characteristic makes the F-Score (between '0', the lowest, and '9', the highest qualification) insensitive for sorting and classifying firms, especially for explaining price formations and short-term investors' decisions.
In the literature, there are various studies that utilize a clustering algorithm for the classification problem. However, while most of these studies have tried to integrate clustering techniques into portfolio management (Pattarin et al., 2004; Tola et al., 2008; Nanda et al., 2010, etc.), only a limited number of studies concentrate on classifying firms based on their announced accounting numbers.
Wang and Lee (2008) suggest a clustering method based on a fuzzy relation to classify the financial ratios of different companies, and they state that the clustering method can be applied in conditions where the number of clusters is not determined. On the other hand, their study does not mention the benefit of using this kind of clustering.
The main contribution of this study is to suggest a systematic alternative for sorting firms sensitively according to changes in their financial health, based on recently announced accounting numbers. In other words, we show how the fuzzy c-means (FCM) clustering algorithm could be used to produce a single and more sensitive numerical indicator, hereafter called the 'fuzzy financial health score' (F-FHS, between '0' and '1'), that shows the changes in financial health compared to the previous year. To our knowledge, our study is the first to provide a methodological perspective from this point of view.
We present this methodological perspective through an implementation on a selected sample. Since markets with low efficiency react strongly to recently announced accounting numbers, we select the data of 166 active firms listed and traded on the National Market of the Istanbul Stock Exchange as the sample for the model implementation. We use the delta determinants of the F-Score to calculate the F-FHSs of the selected firms: ∆ROA (change in return on assets), ∆CFO (change in cash flow from operations), ∆LEV (change in leverage), ∆CR (change in current ratio), ∆MARGIN (change in gross margin) and ∆TURN (change in asset turnover).
The annual accounting numbers announced for 2013 and 2014 were used because in that period Turkey initialized its position against the IMF and ranked as the sixth biggest economy in Europe and the sixteenth in the world. Therefore, these years can more clearly reflect firm-specific performance under smooth economic conditions.
In order to see whether the F-FHSs are meaningful summarized single indicators, a correlation analysis is executed between the calculated scores and the realized returns of the firms for a given short period. Ten trading days ($n$) are used as the pre- and post-announcement terms of the financial statements, and three different indicators are used as return inputs: $r_A$, $r_B$ and $r_C$. $r_A$ denotes the percentage price change of the stock computed from $P_{t+n}$ and $P_{t-1}$, which indicate the stock prices at the end of the post-announcement term and one trading day before $t$, respectively, while $t$ indicates the announcement date of the financial statements.
In order to make the return input more explanatory from the viewpoint of investors' active behavior, trading volumes are taken into account for the pre- and post-announcement terms by calculating their weights, and $r_B$, which denotes the weighted average percentage price change of the stock, is added to the analysis as the second return input.
Higher returns or smaller losses compared to the market's return are also perceived as a win by investors. In this sense, $r_C$ is added as another return input by calculating the spread between $r_B$ and the market return ($r_M$), which indicates the percentage change in the market index value (IV) between the $n$th post-announcement trading day and the announcement date.
$$r_C = r_B - r_M \qquad (3)$$
The structure of the paper is as follows. In the next section, a brief overview of the FCM algorithm is provided. In Section 2, the data sources are mentioned, the calculation of the F-FHSs is shown step by step and the results of the correlation analysis are given. In Section 3, conclusions are presented.
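The return inputs described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the market return is assumed to use the index value on the announcement date as its base, and the volume-weighted input is taken as a given number because its weighting formula is not reproduced here.

```python
def pct_change(start, end):
    """Percentage change from `start` to `end`."""
    return (end - start) / start * 100.0


def return_a(prices, t, n=10):
    """Price change from one trading day before t to n trading days after t.

    `prices` is a sequence of closing prices indexed by trading day and
    `t` is the index of the announcement date (both hypothetical inputs).
    """
    return pct_change(prices[t - 1], prices[t + n])


def market_return(index_values, t, n=10):
    """Change in the market index value between day t and day t + n.

    Assumption: the index value on the announcement date is the base.
    """
    return pct_change(index_values[t], index_values[t + n])


def market_adjusted_return(r_b, r_m):
    """Spread between the volume-weighted return r_b and the market return r_m."""
    return r_b - r_m
```

For a firm whose volume-weighted return is 7% while the index rises 5% over the same window, `market_adjusted_return(7.0, 5.0)` gives a spread of 2 percentage points.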

FCM Clustering Algorithm
Clustering algorithms are generally divided into two types according to their structure: fuzzy and non-fuzzy (crisp) clustering. Crisp clustering algorithms give better results if the structure of the data set is well distributed. However, when the boundaries between clusters in the data set are ill defined, the concept of fuzzy clustering becomes meaningful (Nefti and Oussalah, 2004). Fuzzy methods allow partial belonging (membership) of each observation to the clusters, so they are an effective and useful tool for revealing the overlapping structure of clusters (Zhang, 1996). The fuzzy c-means (FCM) clustering algorithm is one of the most widely used methods among fuzzy models (Bezdek and Pal, 1992).
An $n \times p$ dimensional data matrix $X$ is composed of a set of $n$ vectors $x_k \in \mathbb{R}^p$. A fuzzy clustering algorithm separates the data matrix into $c$ overlapping clusters in accordance with the design of a fuzzy partition matrix $U$. The fuzzy partition matrix $U$ is composed of the degrees of membership of the objects $x_k$ ($k = 1, 2, \ldots, n$) in every cluster $i$. The degree of membership of the $k$-th vector in cluster $i$ is represented by $\mu_{i,k} \in U$. Accordingly, the partition matrix is given by
$$U = \begin{bmatrix} \mu_{1,1} & \mu_{2,1} & \cdots & \mu_{c,1} \\ \mu_{1,2} & \mu_{2,2} & \cdots & \mu_{c,2} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{1,n} & \mu_{2,n} & \cdots & \mu_{c,n} \end{bmatrix}$$
In fuzzy clustering methods, each cluster is represented by a vector of cluster centers, which is usually identified as the centroid of the objects, e.g., the average of all the data of the corresponding cluster (Celikyilmaz and Turksen, 2009). The algorithm calculates $c$ cluster center vectors $V = \{v_1, v_2, \ldots, v_c\} \in \mathbb{R}^{c \times p}$, where each cluster center is denoted as $v_i \in \mathbb{R}^p$, $i = 1, 2, \ldots, c$.
The FCM clustering algorithm is a simple and convenient method. In this method, the number of clusters, $c$, is assumed to be known or at least fixed. Because this assumption is considered unrealistic in many data analysis problems, methods for determining the number of clusters, such as Cluster Validity Index (CVI) analysis, have been developed for the FCM clustering algorithm (Pal and Bezdek, 1995; Kim and Ramakrishna, 2005; Celikyilmaz and Turksen, 2008).
In eq. (7), $v_i^{(t-1)}$ denotes the cluster center vector for cluster $i$ obtained in the $(t-1)$th iteration. $\mu_{i,k}^{(t)}$ in eqs. (7) and (8) denotes the optimum membership values obtained at the $t$-th iteration. According to this operation, the membership values and the cluster centers depend on each other. Therefore, Bezdek (1981) proposed an iterative formula for determining the membership values and cluster centers. Accordingly, at each iteration $t$, the objective function $J^{(t)}$ is determined. The FCM algorithm ends after a particular iteration or according to a termination rule defined as $\lVert v_i^{(t)} - v_i^{(t-1)} \rVert \le \varepsilon$ (Celikyilmaz and Turksen, 2009).

The data and empirical implementation
Step 1. Due to the heterogeneity in the measurement units of the variables, it is necessary to perform a homogenization process. Normalizing the variables prevents any variable from being weighted more or less than the others. The normalization process is performed with the following relation:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ is the minimum value and $x_{\max}$ is the maximum value of the corresponding variable. With normalization, all variables are scaled to the range [0, 1].
Step 2. The optimum values of the number of clusters ($c$) and the degree of fuzziness ($m$) are determined by utilizing CVI analysis.
Step 3. The cluster center vectors and the partition matrix are determined by applying the FCM clustering algorithm with the prior information, $c$ and $m$, obtained in the previous step.
Step 4. The Euclidean norm is calculated for each cluster center vector.
In this implementation, it is claimed that the norm values allow an assessment of the general level of financial health for each cluster.Thus, while the value of the calculated norm for each cluster increases, the level of financial health rises in accordance with defined determinants, and while the norm value becomes smaller, the level of financial health of cluster will be reduced similarly.As a result, calculated Euclidean norms for center vectors of five clusters are given in Table 2.
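Steps 1, 3 and 4 can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the authors' code: it uses the textbook FCM updates (memberships from inverse distances, centers as membership-weighted means), initializes the centers from randomly chosen data rows, and stops when no center moves by more than `eps`. The function names and default parameter values are hypothetical, and the CVI analysis of Step 2 is not included, so `c` and `m` must be supplied.

```python
import math
import random


def min_max_normalize(rows):
    """Step 1: scale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def _distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _memberships(data, centers, m):
    """Membership of every object in every cluster; each row sums to 1."""
    result = []
    for x in data:
        dists = [_distance(x, v) for v in centers]
        if min(dists) == 0.0:
            # The object coincides with at least one center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            result.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            result.append([w / sum(inv) for w in inv])
    return result


def fcm(data, c, m=2.0, eps=1e-6, max_iter=300, seed=0):
    """Step 3: return (centers, memberships), memberships[k][i] for object k."""
    rng = random.Random(seed)
    # Assumption: initial centers are c distinct rows picked at random.
    centers = [list(row) for row in rng.sample(data, c)]
    for _ in range(max_iter):
        u = _memberships(data, centers, m)
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            new_centers.append([
                sum(wk * x[j] for wk, x in zip(w, data)) / sum(w)
                for j in range(len(data[0]))
            ])
        shift = max(_distance(a, b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= eps:
            break
    return centers, _memberships(data, centers, m)


def center_norms(centers):
    """Step 4: Euclidean norm of each cluster center vector."""
    return [math.sqrt(sum(x * x for x in v)) for v in centers]
```

With normalized inputs, a cluster whose center lies far from the origin gets a large norm, which is the property the interpretation of the norms above relies on.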

No. 3(147)/2017
Step 5. The advantage of the FCM clustering algorithm is that it produces the degree of membership of each firm in each of the $c$ clusters. Let the degrees of membership of the $k$-th firm in the $c$ clusters be denoted as $\mu_k = [\mu_{1,k}, \mu_{2,k}, \ldots, \mu_{c,k}]$ and let the vector consisting of the norms of the cluster center vectors be represented by $N$.
Accordingly, the F-FHS for each firm is determined with the following formula,
$$\text{F-FHS}_k = \mu_k N^{\mathsf{T}} \qquad (12)$$
Step 6. The F-FHSs of each firm are presented in Table 3, and F-FHS$_k$, $k = 1, 2, \ldots, n$, is calculated with the following relation,
Step 7. A correlation analysis is executed in order to see whether the F-FHSs work. The results of the significance test are given in Table 4, and there is a statistically significant relationship between the F-FHSs and two of the return inputs. This means that the F-FHSs are coherent with short-term price formations, and so the scores could be used as summarized single indicators to sort firms according to changes in their financial health in a more sensitive way.
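Steps 5 and 7 can be illustrated with a short sketch. It rests on two assumptions that should be read as such: the score of a firm is taken to be the sum of the cluster-center norms weighted by that firm's membership degrees, and the correlation is computed as a Pearson coefficient (the type of correlation is not specified in the text). Any further rescaling of the scores is not shown, and the function names are hypothetical.

```python
import math


def membership_weighted_scores(memberships, norms):
    """One score per firm: sum over clusters of membership times center norm.

    `memberships[k][i]` is the membership of firm k in cluster i and
    `norms[i]` is the Euclidean norm of the center of cluster i.
    """
    return [sum(mu * nv for mu, nv in zip(row, norms)) for row in memberships]


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed correlation measure)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

A firm that belongs mostly to a cluster with a large center norm receives a high score, and `pearson(scores, returns)` then measures how closely the ordering of the scores follows the realized short-term returns.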

Conclusion
The paper suggests, for the first time, a methodological perspective on how the fuzzy c-means (FCM) clustering algorithm could be used to sort firms according to changes in their financial health compared to the previous year. To show this methodology, we applied the FCM clustering algorithm and produced fuzzy financial health scores (F-FHSs) for 166 selected active firms listed and traded on the National Market of the Istanbul Stock Exchange, drawing on the F-Score's delta determinants calculated from the accounting numbers of 2013 and 2014. This implementation enables us to classify firms in a more sensitive way based on a single numerical indicator.
A correlation analysis was executed between the calculated scores and the realized returns for a given short term in order to investigate the employability of the F-FHSs. The results indicate that the FCM clustering algorithm is a beneficial tool for sorting firms according to their financial health level and can provide a sensitive, single summarized indicator for investment decisions based on recently announced accounting numbers, especially in markets with low efficiency.
In this paper, we tried to show this methodological perspective through an empirical implementation using the F-Score's delta determinants. However, this is not the only option. The best-fit mix of determinants for producing the most efficient F-FHSs can also be investigated, which is closely related to the subject of the value or behavioral relevance of accounting numbers.

Figure 1. The change in cluster validity indices according to the number of clusters: (left) XB index, (right) Bezdek's partition coefficient (Hammah and Curran, 1998)
The value of $m \in (1, \infty)$ in the objective function is expressed as the degree of fuzziness or fuzzifier, and it determines the degree of overlapping of the clusters. The case $m = 1$, which means that the clusters are not overlapping, represents the crisp clustering structure (Hammah and Curran, 1998). Here, $d(x_k, v_i)$ is the measure of distance between the $k$-th object and the $i$-th cluster center. The FCM clustering algorithm specifically uses the Euclidean distance. The quadratic distance ensures that the objective function is not negative definite, $J > 0$.
(Kim and Ramakrishna, 2005; Celikyilmaz and Turksen, 2008). The FCM clustering method is based on a constrained optimization problem that reaches the optimum solution at the minimum of the objective function. Given two pieces of prior information, the number of clusters $c$ and the fuzziness parameter $m$, the mathematical model of this optimization problem is identified as:
$$\min J(X; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{i,k})^{m} \, d^{2}(x_k, v_i)$$
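As a small illustration of this objective function, the sketch below evaluates it for a given set of memberships and cluster centers using the squared Euclidean distance. The function name and the default `m=2.0` are assumptions made for the example only.

```python
def fcm_objective(data, centers, memberships, m=2.0):
    """Sum over objects and clusters of membership**m times squared distance.

    `memberships[k][i]` is the membership of object k in cluster i.
    """
    total = 0.0
    for x, row in zip(data, memberships):
        for v, mu in zip(centers, row):
            squared_distance = sum((a - b) ** 2 for a, b in zip(x, v))
            total += (mu ** m) * squared_distance
    return total
```

When every object sits exactly on the center it fully belongs to, the value is zero; spreading the memberships evenly across distant centers increases it, which is why minimizing the function pulls centers toward the objects that belong to them most strongly.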

Table 2. Euclidean norms calculated for the cluster center vectors
Source: Developed by authors.