An Integrated Fuzzy Clustering Cooperative Game Data Envelopment Analysis Model with application in Hospital Efficiency

Hospitals are the main component of health care systems, and their evaluation is one of the most important issues for health policy makers. Data Envelopment Analysis (DEA) is a nonparametric method that has recently been used to measure the efficiency and productivity of Decision Making Units (DMUs), and it is commonly applied to compare hospitals. However, one of the important assumptions in DEA is that DMUs must be homogeneous. The crucial issue in hospital efficiency is that hospitals provide different services and so may not be comparable. In this paper, we propose an integrated fuzzy clustering cooperative game DEA approach. Because of the lack of homogeneity among DMUs, we first use a fuzzy C-means technique to cluster the DMUs. Then, within each cluster, we apply DEA combined with cooperative game theory, in which each DMU is considered a player, using the Core and Shapley value approaches. The procedure has been applied to measure the performance of 288 hospitals in 31 provinces of Iran. Finally, since the classical DEA model cannot distinguish between efficient DMUs, the efficient hospitals within each cluster are ranked using the combined DEA model and cooperative game approach. The results show that the Core and Shapley values are suitable for fully ranking efficient hospitals in healthcare systems.


Introduction
Health is one of the most important issues in every society; hence, providing good health care services is central to the well-being of its people. On the other hand, considering the extent and services that are offered in health […] hospitals in East Spain. Their method was useful both for health administrators controlling hospital performance and for hospital management. The results showed that the efficiency of the services was above the mean. Shahhoseini et al. (2011) measured the technical efficiency of 28 hospitals of similar types (public and private) in all provinces of Iran. Their inputs were the number of active beds, number of other professionals, number of nurses and number of physicians, and their outputs were operations, outpatient visits, bed occupancy rate, average length of stay and inpatient bed-days. The results indicated that 60 percent of the hospitals were technically efficient and that there were excess inputs (specifically non-clinical human resources) that managers should address. Rezaee and Karimdadi (2015) proposed a multi-group DEA model that accounts for geographical location in efficiency evaluation. They selected as inputs the number of medical equipment items, the total number of personnel and the number of operational beds, and as outputs the number of inpatients, number of outpatients, number of special patients, bed-days and bed occupancy rate. Lindlbauer and Schreyogg (2014) analyzed the association between hospital specialization and technical efficiency using a dataset covering 11 consecutive years. Their results showed that efficiency was negatively associated with case-mix specialization but positively associated with medical specialization. Fragkiadakis et al. (2014) evaluated the operational and economic efficiency of 87 Greek public hospitals using DEA over the period 2005-2009.
They also explored efficiency trends over time and investigated the factors that can explain the efficiency results. Gholami et al. (2015) examined the influence of IT investment on the efficiency and quality of 187 US hospitals. They […] He used as inputs the number of beds, number of primary care physicians and number of specialists, and the outputs considered were inpatient discharges, outpatient visits and surgical operations. He compared the performance of hospitals and confirmed that the expected benefits of the health reforms in Turkey had been partially achieved in the short run. Using the Malmquist index, Anthun et al. (2017) investigated the productivity growth and optimal size of hospitals in Norway. Based on 16 years of data (1999-2014), they found that mean productivity increased by 24.6%, an average annual change of 1.5%. They also concluded that the estimated optimal size was smaller than the actual size of most hospitals.

ACCEPTED MANUSCRIPT
Several researchers have used advanced DEA models to evaluate hospitals. Ancarani et al. (2009) introduced a two-stage analysis for measuring hospital wards' efficiency. In the first stage, DEA was used to calculate technical efficiency scores of large Italian hospitals, and in the second stage, the variables affecting the DEA scores were examined. Their inputs were number of beds, surgery room utilization, number of physicians, units of non-medical personnel and equipment maintenance costs. The outputs were number of cases multiplied by the average diagnosis related group (DRG) weight, day-hospital and/or day-surgery cases, and ambulatory cases. The results showed that both exogenous reorganization processes and internal decisions affected ward efficiency. Du et al. (2014) developed a slack-based additive super-efficiency DEA model to evaluate 119 general acute care hospitals in Pennsylvania. In their study, the inputs were both physical and financial, and the outputs were health services and health outcomes. They considered quality and quantity indicators for both inputs and outputs. Kawaguchi et al. (2014) presented a dynamic network DEA model to evaluate both the efficiencies of separate hospitals and the dynamic changes of these efficiencies. The purpose of their study was to evaluate the policy effects of the 2007-2009 reform of municipal hospitals in Japan. Kao et al. (2011) presented a two-stage approach integrating independent component analysis and DEA to measure the efficiency of 21 hospitals in Taiwan in 2005. They compared the DEA and principal component analysis-DEA models. The results showed that the proposed model could improve the discriminatory capability of DEA efficiency.
Cross-efficiency DEA has been applied in many studies. For example, Costantino et al. (2013) evaluated hospitals in a region of Southern Italy using a fuzzy cross-efficiency DEA model. They used triangular fuzzy numbers to deal with uncertain data and estimated a triangular fuzzy efficiency for each hospital through cross-evaluation based on a compromise between objectives. Finally, the results were defuzzified to obtain the ranking. Dotoli et al. (2015) presented a novel cross-efficiency fuzzy DEA technique to evaluate the performance of DMUs under uncertainty and applied it to the performance evaluation of healthcare systems in an Italian region. Ruiz and Sirvent (2017) developed a fuzzy cross-efficiency evaluation based on the possibility approach. This method was presented for fuzzy inputs and convex outputs. They also extended benevolent and aggressive fuzzy formulations to deal with alternative optima for the weights. Some previous works focused on generating the weights in the cross-efficiency DEA model. As shown in the literature, the cross-efficiency DEA approach has some drawbacks.
For instance, it may produce weights that are not acceptable to all DMUs (Wu et al., 2009; Lam, 2010). To overcome this problem and produce acceptable and fair weights, researchers have introduced different models. Ramon et al. (2010) focused on the choice of the weight profiles used in calculating the cross-efficiency scores. Their approach allows inefficient DMUs to choose weights that prevent them from using unrealistic weighting schemes. Lam (2010) […] Lin et al. (2016) used an iterative method to determine a unique weight set for positive input and output data and to reduce the number of zero weights in cross-efficiency evaluation.
One of the powerful techniques for producing a set of fair weights is the game theory approach. Liang et al. (2008) presented a new method based on cross-efficiency and a non-cooperative game. Wu and Liang (2012) proposed a game cross-efficiency DEA model in which each DMU is viewed as a player who seeks to maximize its own score under the condition that the cross-evaluation score of each of the other DMUs does not deteriorate. Tavana and Khalili-Damghani (2014) proposed an efficient two-stage fuzzy DEA model with uncertain inputs and outputs to evaluate the efficiency scores of a DMU and its sub-divisions. They decomposed the efficiency score of the two-stage DMU and used the Stackelberg game to calculate the efficiency scores of the sub-divisions. Finally, they used a Monte Carlo simulation procedure to rank the efficient DMUs and sub-divisions discriminately. Liu et al. (2017) used cross-efficiency evaluation in the concept of aggressive game cross-efficiency and proposed an aggressive […]
Some researchers have used the fuzzy C-means (FCM) algorithm for clustering DMUs in the DEA context. The FCM algorithm was initially introduced by Bezdek (1973, 1981) for clustering data. Ben-Arieh and Gullipalli (2012) used the FCM clustering method to apply DEA to sparse input and output data. They applied the optimal completion strategy algorithm to estimate the missing values and investigated the effects of data recovery on DEA results. Amin et al. (2011) clarified the role of alternative optimal solutions in the DEA clustering approach. They showed that different optimal solutions may lead to different clusters with different sizes and different production functions. Samoilenko and Osei-Bryson (2008) increased the discriminatory power of the DEA model in the presence of heterogeneity.
They used cluster analysis to investigate the differences between the DMUs in the sample and then applied DEA to calculate the relative efficiencies of the DMUs in each subset of the sample. Azadeh et al. (2010) integrated fuzzy DEA with fuzzy C-means and applied the model to a cellular manufacturing system, where each cluster indicated a degree of desirability for operator allocation. Herrera-Restrepo et al. (2016) used an integrated principal component analysis (PCA), DEA and clustering approach to assess bank branch operational performance. They detected influential branches by PCA and then clustered branches based on operating characteristics. Finally, they applied DEA to study branch efficiency performance from meta-frontier and cluster-frontier perspectives.
This paper evaluates 288 hospitals in 31 provinces of Iran. The provinces of Iran differ in terms of economic growth, population, gross domestic product (GDP), etc., and the characteristics of each province can be expected to affect the performance of its hospitals. Therefore, in this study, the provinces are first clustered using an FCM algorithm to increase the homogeneity among hospitals. After dividing the provinces into clusters, DEA is applied to estimate the efficiency of hospitals within each cluster. Although the DEA model determines an efficiency score for each hospital, it cannot discriminate between efficient units. In recent years, many studies have focused on ranking efficient DMUs. The super-efficiency model of Andersen and Petersen (1993) is perhaps the most common approach used for ranking efficient DMUs. However, as suggested by Banker and Chang (2006), the Andersen-Petersen super-efficiency procedure may not produce a correct ranking, since it is based on different frontiers for different efficient DMUs; hence the efficiency scores generated may not be fair. To overcome this problem, we propose to combine the DEA model with a cooperative game approach to produce fair efficiency scores using the Shapley value. In addition to the Shapley value, the Core is applied to evaluate the efficient DMUs, and the results of the Shapley value and the Core are compared.
The rest of this paper is organized as follows. In Section 2, the cross-efficiency DEA with the Core and Shapley value approaches is described. In Section 3, the proposed model is compared with two numerical examples from the literature. Section 4 discusses the proposed fuzzy C-means clustering algorithm as well as the selection of input and output variables. In Section 5, the applicability of the proposed integrated DEA and cooperative game approach is shown by applying it to a real dataset of hospitals in Iran. Finally, conclusions and directions for future research are drawn in Section 6.

Methodology
The methodology of this paper is based on fuzzy C-means for clustering provinces, cross-efficiency DEA for estimating the efficiency of hospitals in each cluster, and the Core and Shapley value for fully ranking efficient hospitals. This section describes the foundations of these methods.

Fuzzy C-Means
The fuzzy C-means (FCM) algorithm, developed by Dunn (1973), is one of the common clustering techniques for allocating data points to two or more clusters (Zhang et al., 2016). It is used for pattern recognition and clustering tasks. Clustering is the process of dividing samples into categories with similar members; these categories are called clusters. A cluster is a collection of similar objects that are different from the objects in other clusters. In other words, clustering divides a heterogeneous population into a number of homogeneous sub-categories, or clusters. Various criteria can be used to define similarity. In this paper, the FCM technique is applied to classify Iranian provinces into several clusters. In Iran, some provinces are larger and more developed than others; hence we have classified them based on population and GDP per capita. To classify data based on similar properties, Bezdek (1981) presented the FCM algorithm for clustering n measured DMUs (objects, hospitals, etc.) into C clusters.
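The algorithm clusters the data into two or more clusters by minimizing the objective function $J(U,V)$, defined as follows:

$$J(U,V)=\sum_{i=1}^{C}\sum_{j=1}^{n}u_{ij}^{m}\,\lVert x_j-v_i\rVert^{2} \tag{1}$$

where $x_j$ is the jth data point, $m$ ($1<m<\infty$) is a real number that controls the degree of fuzziness, $v_i$ is the center of the ith cluster, and the term $\lVert x_j-v_i\rVert^{2}$ expresses the similarity between each data point and the center of each cluster with respect to a fuzzy partition matrix $U=[u_{ij}]$ and a set of prototypes $V=\{v_1,\dots,v_C\}$. Minimizing the objective function subject to the constraint $\sum_{i=1}^{C}u_{ij}=1$ for every $j$, and setting the derivatives to zero, gives the matrix $U$ as Formula (2):

$$u_{ij}=\left[\sum_{k=1}^{C}\left(\frac{\lVert x_j-v_i\rVert}{\lVert x_j-v_k\rVert}\right)^{2/(m-1)}\right]^{-1} \tag{2}$$

A new set of prototypes $V$ is then defined as follows:

$$v_i=\frac{\sum_{j=1}^{n}u_{ij}^{m}\,x_j}{\sum_{j=1}^{n}u_{ij}^{m}} \tag{3}$$

Using Formulas (1), (2) and (3), data can be classified according to similar characteristics. The steps of the FCM algorithm are summarized as follows:
Step 1: Choose the number of clusters C, randomly select the initial centers V and initialize the matrix U using Formula (2).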
Step 2: Update the center of each cluster using Formula (3), then update the membership matrix U using Formula (2).
Step 3: Calculate the new objective function using Formula (1).
Step 4: If $|J_{\text{new}}-J_{\text{old}}|<\varepsilon$, stop; otherwise, return to Step 2.
Using the above FCM algorithm, the provinces of Iran are classified into different clusters. Then, the following methodology is run separately within each cluster.
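The four steps above can be sketched in Python as follows. This is a minimal, dependency-free illustration, not the authors' implementation. It assumes Euclidean distance and a random initial membership matrix whose columns sum to one, and it updates both V (Formula (3)) and U (Formula (2)) in each pass. The parameter names `eps` and `max_iter` are hypothetical labels for the convergence criterion and iteration cap. The six two-dimensional points are invented and are not the provincial data of Table (5).

```python
import math
import random

def fcm(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means sketch. Returns (centers, u, J) where u[i][j] is the
    membership of point j in cluster i and J is the last objective (1)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    # Step 1: random memberships, normalised so that sum_i u[i][j] = 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col

    centers = []
    j_old = j_new = math.inf
    for _ in range(max_iter):
        # Step 2a: centres as means weighted by u^m (Formula (3)).
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            centers.append(tuple(sum(w[j] * points[j][k] for j in range(n)) / sum(w)
                                 for k in range(dim)))
        # Step 2b: memberships from distances to the new centres (Formula (2)).
        for j in range(n):
            d = [max(dist(points[j], v), 1e-12) for v in centers]  # avoid division by 0
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
        # Step 3: objective J(U, V) (Formula (1)).
        j_new = sum(u[i][j] ** m * dist(points[j], centers[i]) ** 2
                    for i in range(c) for j in range(n))
        # Step 4: stop once J changes by less than eps.
        if abs(j_new - j_old) < eps:
            break
        j_old = j_new
    return centers, u, j_new

# Invented toy data: two well-separated groups of three points each.
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, u, J = fcm(pts, c=2)
labels = [max(range(2), key=lambda i: u[i][j]) for j in range(len(pts))]
print(labels)  # points 0-2 share one label, points 3-5 the other
```

With the settings reported later in the paper, the call would be along the lines of `fcm(data, c=5, m=2.0)` on the 31 provinces. Assigning each object to the cluster with its largest membership, as in the last lines, is one common way to obtain crisp clusters. In practice, features with very different scales, such as GDP per capita and population, would likely need rescaling first.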
As mentioned before, in the above DEA model, inefficient hospitals can be fully ranked, while the scores of all efficient hospitals are equal to one. In other words, DEA cannot fully rank the efficient hospitals. Several researchers have introduced different approaches for ranking efficient units, including cross-efficiency. The advantage of cross-efficiency is that it uses peer evaluation instead of self-evaluation. The cross-efficiency matrix […]
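The cross-efficiency idea can be made concrete with a small sketch. Assuming the standard definition $E_{dj}=\sum_r u_{rd}y_{rj}\big/\sum_i v_{id}x_{ij}$ (the score of DMU j under the optimal weights of DMU d) for Formula (5), the Python code below builds the cross-efficiency matrix and its row-normalized version $E'$. The data and weights are invented for illustration. With a single unit input, the listed weights are one optimal solution of each DMU's input-oriented CCR multiplier problem, so every diagonal entry equals one.

```python
def cross_efficiency(X, Y, V, U):
    """E[d][j]: efficiency of DMU j evaluated with DMU d's weights (V[d], U[d])."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))
    n = len(X)
    return [[dot(U[d], Y[j]) / dot(V[d], X[j]) for j in range(n)] for d in range(n)]

def row_normalize(E):
    """Divide every row by its sum, giving the normalized matrix E'."""
    return [[e / sum(row) for e in row] for row in E]

# Invented data: one unit input and two outputs for three DMUs.
X = [[1.0], [1.0], [1.0]]
Y = [[4.0, 1.0], [1.0, 4.0], [3.0, 3.0]]
# One optimal CCR multiplier solution per DMU (alternative optima exist).
V = [[1.0], [1.0], [1.0]]
U = [[0.25, 0.0], [0.0, 0.25], [1 / 6, 1 / 6]]

E = cross_efficiency(X, Y, V, U)
E_norm = row_normalize(E)
# Usual aggregate score (Sexton et al., 1986): mean of each column.
mean_ce = [sum(E[d][j] for d in range(len(E))) / len(E) for j in range(len(E))]
for row in E:
    print([round(e, 3) for e in row])
# [1.0, 0.25, 0.75]
# [0.25, 1.0, 0.75]
# [0.833, 0.833, 1.0]
```

The low off-diagonal entries (for example, 0.25 for the second DMU under the first DMU's weights) show the drawback discussed in the text: a unit may be judged by weights that suit only its peer.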

Core and Shapley value
In cooperative game theory, the Core and the Shapley value are used to divide the pay-off gained by a coalition among its members. To use a cooperative game, the pay-off of each coalition must first be calculated. According to Nakabayashi and Tone (2006), […] is computed by equation (7).
To prove equation (6), Nakabayashi and Tone (2006) used model (8) as the characteristic function of the game (N, C), where N is the set of players and C is the characteristic function. Nakabayashi and Tone (2006) proved that the game (N, C) in model (8) is super-additive.

The dual program of model (8) is model (9). It can easily be verified that the optimal solution of model (9) is given by equation (6). Therefore, equation (6) can be used as the characteristic function instead of model (8). After the pay-off of each coalition is calculated, the pay-off of each player in the coalition can be calculated by the Core and Shapley value approaches.
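The Core concept was introduced by Gillies (1959). The Core is the set of pay-off vectors $x=(x_1,\dots,x_n)$ that satisfy (10) and (11):

$$\sum_{i\in N}x_i=C(N) \tag{10}$$

$$\sum_{i\in S}x_i\ge C(S)\quad\text{for every coalition } S\subseteq N \tag{11}$$

where $x_i$ is the pay-off of the ith player in coalition $S$. For more details, the reader can refer to Gillies (1959).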
In general, the Core does not consist of a single vector x: it may contain many vectors satisfying (10) and (11), or none at all. To obtain a vector x that belongs to the least Core, model (12) is introduced. Model (12) does not guarantee that (10) and (11) are satisfied; only when the optimal solution of model (12) is non-negative does the solution found belong to the Core and thus satisfy (10) and (11).
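Conditions (10) and (11) can be checked directly for any candidate allocation. The Python sketch below enumerates coalitions and computes the largest excess $C(S)-\sum_{i\in S}x_i$: an allocation that satisfies (10) and has a non-positive maximum excess lies in the Core. For illustration only, the characteristic function is assumed to be a min-type game in the spirit of Nakabayashi and Tone (2006), $C(S)=\min_d\sum_{j\in S}E'_{dj}$, which is super-additive. The matrix `E_NORM` is a hypothetical three-player example, not data from the paper. Solving the least-Core model (12) itself would also require a linear-programming solver, which is not shown.

```python
from itertools import combinations

# Hypothetical row-normalised cross-efficiency matrix E' (rows: rater d,
# columns: rated player j); every row sums to one.
E_NORM = [
    [0.5, 0.125, 0.375],
    [0.125, 0.5, 0.375],
    [0.3125, 0.3125, 0.375],
]
N_PLAYERS = len(E_NORM[0])

def value(S):
    """Assumed characteristic function C(S) = min_d sum_{j in S} E'[d][j]."""
    S = tuple(S)
    if not S:
        return 0.0
    return min(sum(row[j] for j in S) for row in E_NORM)

def max_excess(x):
    """Largest excess C(S) - x(S) over all proper, non-empty coalitions S."""
    return max(value(S) - sum(x[i] for i in S)
               for size in range(1, N_PLAYERS)
               for S in combinations(range(N_PLAYERS), size))

def in_core(x, tol=1e-9):
    """Check efficiency (10) and coalitional rationality (11) for allocation x."""
    efficient = abs(sum(x) - value(range(N_PLAYERS))) <= tol
    return efficient and max_excess(x) <= tol

print(in_core([0.3125, 0.3125, 0.375]))  # True
print(in_core([0.5, 0.125, 0.375]))      # True: the Core is not a single point
print(in_core([0.6, 0.1, 0.3]))          # False: coalition {1, 2} gets too little
```

The second allocation also passes, which illustrates why a selection rule such as the least Core or the Shapley value is needed to pick one vector.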
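The Shapley value of the ith player is given by formula (13):

$$\phi_i=\sum_{S\subseteq N,\; i\in S}\frac{(s-1)!\,(n-s)!}{n!}\bigl[C(S)-C(S\setminus\{i\})\bigr] \tag{13}$$

where $s$ is the number of players in coalition $S$ and $n$ is the total number of players. The term $C(S)-C(S\setminus\{i\})$ measures how much the value of the coalition increases when the ith player joins it.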
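Formula (13) can be evaluated exactly by enumerating coalitions, as in the Python sketch below. For illustration only, it uses a hypothetical three-player, min-type characteristic function $C(S)=\min_d\sum_{j\in S}E'_{dj}$ built from an invented row-normalized matrix. This function is an assumption and is not the paper's equation (6). Because the number of coalitions grows as $2^n$, exact enumeration is practical only for a modest number of players.

```python
from itertools import combinations
from math import factorial

# Hypothetical row-normalised cross-efficiency matrix (each row sums to one).
E_NORM = [
    [0.5, 0.125, 0.375],
    [0.125, 0.5, 0.375],
    [0.3125, 0.3125, 0.375],
]

def value(S):
    """Assumed characteristic function C(S) = min_d sum_{j in S} E'[d][j]."""
    return min(sum(row[j] for j in S) for row in E_NORM) if S else 0.0

def shapley(n, char_fn):
    """Exact Shapley values, formula (13): weighted marginal contributions."""
    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for t in range(n):                      # t = |S \ {i}| = s - 1
            weight = factorial(t) * factorial(n - t - 1) / factorial(n)
            for T in combinations(others, t):
                phi[i] += weight * (char_fn(T + (i,)) - char_fn(T))
    return phi

phi = shapley(len(E_NORM[0]), value)
print(phi)  # approximately [0.3125, 0.3125, 0.375]
```

The values sum to $C(N)=1$, so the Shapley vector satisfies efficiency (10). In this small example, it also satisfies (11).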
This paper derives fair weights for fully ranking the efficient hospitals in each cluster. To calculate a set of fair common weights, Nakabayashi and Tone (2006) presented model (14). Although Nakabayashi and Tone (2006) used only the Shapley value to obtain the final common weights, this paper also applies the Core approach and compares the results of the two methods. In model (14), $wE'_j$ denotes the weight vector $w$ multiplied by the jth column of the matrix $E'$ (the normalized cross-efficiency matrix). Finally, the DEA-Game efficiency score of the jth hospital is obtained from equation (15).
Here, $w_d^{*}$ denotes the optimal weights calculated by model (14).
We should point out that if there are many efficient DMUs, for example 30, it is necessary to […]

Numerical examples: comparison with state-of-the-art
In this section, the proposed approach is compared with some previous studies. The first example uses the dataset of Wu and Liang (2012). As seen in Table (2), the dataset has one input and four output variables.
[Table 2 here] The results of DEA, arbitrary cross-efficiency DEA (traditional cross-efficiency DEA), aggressive cross-efficiency DEA (proposed by Sexton et al., 1986), cross-efficiency DEA-Game (Wu and Liang, 2012) and the cross-efficiency DEA-Game proposed in this paper are shown in Table (3).

[Table 3 here] The rankings produced by each model are also shown in Table (3). Spearman's rank-order correlation between the proposed DEA-Game model and the DEA-Game model of Wu and Liang (2012) is 0.829, which is significant at the 95% confidence level.
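For reference, Spearman's rank-order correlation between two rankings without ties is $\rho=1-6\sum_j d_j^2\big/\bigl(n(n^2-1)\bigr)$, where $d_j$ is the rank difference of unit j. The Python sketch below computes it from two score vectors. The six scores are hypothetical, not the values in Table (3). Tie handling and the significance test are omitted.

```python
def ranks(scores):
    """Rank 1 = highest score; assumes no ties."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    r = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        r[j] = position
    return r

def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same n units."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical efficiency scores of six DMUs under two models.
model_a = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50]
model_b = [0.85, 0.92, 0.75, 0.70, 0.40, 0.55]
ra, rb = ranks(model_a), ranks(model_b)
print(ra, rb)                          # [1, 2, 3, 4, 5, 6] [2, 1, 3, 4, 6, 5]
print(round(spearman_rho(ra, rb), 3))  # 0.886
```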
We further compare our results with the DEA-Game approaches suggested by Li et al. (2016) and Hinojosa et al. (2017). For this purpose, the data and the results of the different models are reported in Table (4). [Table 4 here] As seen in Table […] (2017) is also significant.

An application in hospital efficiency
The data in this study were gathered from 288 hospitals in 31 provinces of Iran. First, FCM is applied to cluster the provinces. In this paper, the criteria used for clustering the provinces are gross domestic product (GDP) per capita and population. One of the most important indicators of the attention paid to the health sector is GDP. GDP is the monetary value of all finished goods and services produced within a country's borders in a specific time period, including industry, agriculture and services. Healthcare is one of these services, and hospitals are the most important medical centers of health systems. A low share of healthcare in GDP reduces the quality of medical services, and insufficient attention to this share, together with the failure to allocate sufficient funds to the sector, has negative effects on people's health. In addition, GDP per capita is a proxy for income, and income affects how often people go to hospitals. Therefore, the first index for clustering the provinces is GDP per capita, which has a considerable impact on hospital performance in each province.
In other words, the hospitals in provinces with similar GDP per capita should be compared with each other.

The other index used in clustering is the population of the province. Hospitals serve a large portion of society; therefore, in most countries, access to health services is regarded as a basic and essential right of citizens. Moreover, health improvement and the expansion of health services have a significant impact on major factors such as population, fertility, mortality, immigration and family. In addition, provinces with larger populations have more patient admissions and discharges, so they need more beds and equipment. Therefore, population is an important factor in estimating hospital performance. The GDP per capita and population of each province are shown in Table (5).
Figure (1) shows the results of the FCM method for clustering the provinces based on the GDP per capita and population indicators.
[Table 5 here] [Figure 1 here] One of the most important steps in evaluating hospitals is selecting suitable input and output variables, and different researchers have selected different variables. Based on previous studies and the available data, this paper uses as inputs the total number of personnel, the number of medical equipment items in each hospital and the number of active beds, i.e., beds that are available for use.
The personnel comprise staff, permanent staff, contract workers and other staff. The selected outputs are the numbers of inpatients, outpatients and special patients, and the fourth output is bed-days. The number of bed-days is non-discretionary and should be treated as such in output-oriented DEA models. Since this paper applies an input-oriented DEA model, the non-discretionary nature of an output such as bed-days does not affect the results; if an output-oriented version is run, the number of bed-days must be treated as a non-discretionary variable. As in most studies, the number of active beds is considered a proxy for capital in hospitals (Csakvari et al. 2014; Rezaee and Karimdadi, 2015; Lobo et al. 2016). In an input-oriented model, other things being equal, a lower number of active beds implies higher efficiency. The minimum, mean and maximum values of the selected inputs and outputs for each cluster are shown in Table (6). [Table 6 here]
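To show why the orientation matters for bed-days, the standard input-oriented CCR envelopment model for hospital $o$ is written below. This is a sketch that assumes constant returns to scale; the paper's model (4) may differ in detail. Here $x_{ij}$ and $y_{rj}$ denote the ith input and rth output of hospital j in a cluster of n hospitals, and $\lambda_j$ are intensity variables:

```latex
\begin{aligned}
\theta_o^{*} = \min_{\theta,\lambda}\ & \theta \\
\text{s.t. }\ & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, \quad i = 1,\dots,p,\\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, \quad r = 1,\dots,q,\\
& \lambda_j \ge 0, \quad j = 1,\dots,n.
\end{aligned}
```

Only the input constraints carry $\theta$. Each output, including bed-days, enters as a fixed right-hand side $y_{ro}$ that is never scaled, so declaring it non-discretionary leaves the radial score $\theta_o^{*}$ unchanged. In the output-oriented counterpart, the output constraints become $\sum_j \lambda_j y_{rj} \ge \phi\, y_{ro}$, and a non-discretionary output would have to be exempted from the expansion factor $\phi$.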


Results and discussion
In this section, the efficiency estimates of 288 hospitals in Iran are evaluated. First, the FCM algorithm is applied to cluster the provinces based on GDP per capita and population. In this paper, FCM is configured as follows: the number of clusters C and the fuzziness parameter m are set to 5 and 2, respectively. We set the convergence criterion ε and the maximum number of iterations to […]. Based on Bezdek's suggestion, the value of C should be selected between 2 and n (Bezdek, 1998); in our case study, C is between 2 and 31. In Table […]
[Table 7 here] [Figure 2 here] As mentioned before, we analyze the results for C = 5. The Pars / North Dome field, a natural gas condensate field located in the Persian Gulf, is near Bushehr. We have used an input-oriented DEA model (4) […]
[Table 8 here] Note that the proposed methodology is explained only for cluster 1; a similar discussion applies to the other clusters. As shown in Table (8), eleven hospitals in cluster 1 are efficient; hence traditional DEA cannot rank these hospitals. We used the proposed game-theoretic approach to rank the efficient DMUs. First, using the input and output weights from model (4), the cross-efficiency matrix shown in Table (9) is constructed based on Formula (5). As seen in Table (9), some cross-efficiencies are very low; for instance, the efficiency score of East Azarbaijan 6 is 0.107 when evaluated with the weights of East Azarbaijan 5. This is a common drawback of cross-efficiency DEA, since not all DMUs would accept weights generated by a single unit. To overcome this problem and to produce fair and acceptable weights, this paper combines game theory with cross-efficiency DEA.
[Table 9 here] The matrix in Table (9) is then row-normalized, and the cooperative game approach is applied to estimate the scores of the efficient hospitals. In the cooperative game, each efficient hospital is considered a player, and players form coalitions with each other. The Core and Shapley value methods are then applied to the pay-offs of the coalitions: the Core score of each player is calculated using model (12), and the Shapley value of each player using formula (13). Figure (4) shows the Core and Shapley value of each hospital in cluster 1 before common weights are used.
[Figure 4 here] Since the weights of the standard cross-efficiency DEA model are not fair, the common weights generated by model […] Table (10). [Table 10 here] As shown in Table […] Under both methods, the hospitals East Azarbaijan 14 and East Azarbaijan 5 are ranked first and second. East Azarbaijan 4 is ranked third by the Shapley value, and Gilan 6 is ranked third by the Core. Gilan 6 is ranked fourth by the Shapley value, and West Azarbaijan 1 is ranked fourth by the Core. Table (10) also shows the final Shapley value and Core ranks for the other clusters. This shows how efficient units in DEA can be ranked within each cluster in a fair and acceptable way using game theory.
The results of Table (8) show that cluster 1 contains 11 efficient and 46 inefficient hospitals; that is, more than 80% of the hospitals in cluster 1 (46 of 57) are inefficient. In this cluster, the most efficient hospitals are located in East Azarbaijan, which is more developed than the other provinces. Among the efficient hospitals, according to Table (10), two hospitals in East Azarbaijan are ranked first and second. The results indicate that policy makers should give priority to improving the performance of hospitals in the other provinces of this cluster.
In cluster 2, according to the results of Table […] According to the results of Table (8), 64 hospitals in the third cluster are inefficient. All three hospitals investigated in Semnan province are efficient. Yazd province has both the first- and second-ranked efficient hospitals (see Table (10)) and the two least efficient hospitals (see Table (8)).

Cluster 4 includes the large, developed provinces. In this cluster, 17 of the 72 hospitals under evaluation are efficient. The last cluster contains only the hospitals of Tehran province. Tehran is the capital and the most developed province of Iran. The mean efficiency score of hospitals in Tehran province is over 72%, which indicates relatively good performance. The results reported in Tables (8) and (10) are useful to policy makers, who can prioritize steps to improve the performance of hospitals in less developed provinces.
The running times of the proposed DEA-Game model for the different clusters are shown in Table (11). [Table 11 here]

Conclusion
This […]
