Good Governance Problems and Recent Financial Crises in Some EU Countries

The starting point for the research has been the list of 147 banking crises in the period 1976–2011 compiled by the International Monetary Fund. The countries with crises have been analysed with respect to publicly available World Bank indicators in the three-year periods preceding the crises, using a machine learning methodology for subgroup discovery. The analysis identified five subgroups of crises. Two of them have been identified as especially useful for characterizing the EU countries with banking crises in 2008. Fast-growing credit activity is characteristic of the first subgroup, while socioeconomic problems, recognized by non-increasing quality of public health, are decisive for the second. Comparative analysis of the EU countries included in these subgroups demonstrated statistically significant differences in World Bank good governance indicator values for the period before the crisis. Control of corruption, rule of law, and government effectiveness are the indicators that differ significantly between these sets of countries. The significance of the result lies in the segmentation of the corpus of countries with banking crises and in the recognition of connections between banking crises, socioeconomic problems, and governance effectiveness in some EU countries. The authors present the methodology and the results of this paper in a talk available as a video: http://videolectures.net/ktsymposium2013_gamberger_modeling/
JEL C63 E01 H11 J00


Introduction
In systemic banking crises, multiple banks in a country fail simultaneously, and the effects on the country's economy may be significant. The recent crises (2007–2009) stimulated a large body of research on the linkages between financial institutions and interbank exposures (Caldarelli et al., 2013; Haldane and May, 2011) and on their impact on crisis development and systemic risk (Pokutta et al., 2011). This trend has been a consequence of the general opinion that the recent crises originated primarily in the global financial system. However, in most systemic crises in history, one cannot ignore the mutual dependencies between the real economy and the financial sector in the development of the crises (Nicolo and Lucchetta, 2011).
The work presented in this paper started with the identification of financial and socioeconomic risk factors, and combinations of them, that present an environment in which systemic banking crises are more likely to develop. The source of the data is the World Bank database of publicly available country-level indicators. The analysis uses a machine learning methodology aimed at identifying relevant subgroups of cases. The methodology is especially appropriate for descriptive analysis of available data because it generates rule-based models that can be easily interpreted by human experts (Gamberger and Lavrac, 2002). It has already been successfully applied in a few medical domains and in the domain of political stability (Lambach and Gamberger, 2008). Its application here identified five subgroups of banking crises that are relatively homogeneous with respect to the values of World Bank indicators in the three years before the crises. Analysis of the 2008 crises in EU countries included in these subgroups showed that some of them are characterized by socioeconomic problems, recognized by non-increasing or decreasing quality of public health. Comparative analysis showed that these countries had a statistically significant decrease in some good governance indicators before the onset of the crisis. Surprisingly, the result is in accordance with the model constructed by Francis (2003) connecting governance indicators and financial fragility. The relevance of the result is that the model is confirmed on a completely independent set of crises and with a completely different methodology. Additionally, our result demonstrates that Francis's model is not universal, i.e. that there are banking crises that cannot be attributed to problems of governance, and that the applied subgroup discovery methodology is a powerful tool for segmenting the corpus of crises.
The rest of the paper is organized as follows. The next section presents the preparation of the dataset used for the analysis, and Section 3 gives a short description of the methodology. Section 4 presents the induced subgroups, and Section 5 analyses the relevance and meaning of the obtained results. The central part of the work is Section 6, which evaluates the crises in the detected subsets of EU countries with respect to good governance indicators. The discussion and conclusions are in Section 7.

Data
The research presented in this paper is based on the list of banking crises reported by Laeven and Valencia (2012). In total, 147 crises in the period 1976–2011 are described in this document, and they have been used as the positive cases for our analysis; 29 of the 147 crises are in the period 2008–2011. Examples of crisis cases are China in 1998 and the USA in 1988 and 2007. As a control group we have used 287 cases. This control group of non-crisis, or negative, cases has been chosen from the same countries that have experienced banking crises, but in such a manner that negative cases are separated from the positive cases of the same country in 10-year increments. Examples of non-crisis cases are Finland in 1971, 1981, and 2001, and the UK in 1977, 1987, and 1997, because Finland had a banking crisis in 1991 and the UK in 2007. A period of 10 years without a crisis is assumed to be long enough for a country to be a good representative of a non-crisis case. The total number of non-crisis cases is the maximal number of cases that could be generated under these constraints.
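The rule for choosing negative cases can be sketched as follows. This is a minimal illustration, not the authors' code: the bounds on candidate years (1970 to 2010) are an assumption chosen only so that the sketch reproduces the Finland and UK examples, since the paper does not state them.

```python
def negative_years(crisis_years, first_year=1970, last_year=2010, step=10):
    """Return candidate non-crisis years for one country.

    Candidates lie at multiples of `step` years before or after each crisis
    year and must be at least `step` years away from every crisis of the
    same country. `first_year` and `last_year` are assumed bounds.
    """
    candidates = set()
    for crisis in crisis_years:
        # Walk backwards from the crisis year in `step`-year increments.
        year = crisis - step
        while year >= first_year:
            candidates.add(year)
            year -= step
        # Walk forwards from the crisis year in `step`-year increments.
        year = crisis + step
        while year <= last_year:
            candidates.add(year)
            year += step
    # Keep only years far enough from *all* crises of this country.
    return sorted(y for y in candidates
                  if all(abs(y - c) >= step for c in crisis_years))


print(negative_years([1991]))  # Finland: [1971, 1981, 2001]
print(negative_years([2007]))  # United Kingdom: [1977, 1987, 1997]
```

For countries with several crises, such as the USA, the final filter removes candidates that are close to any of the country's crises, not only to the crisis they were generated from.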
The crisis and non-crisis cases have been described by 105 indicators from the World Bank dataset, publicly available from the World Bank Data website (http://data.worldbank.org/indicator). First, we included 5 indicators suggested as potentially relevant by the International Monetary Fund document mentioned above (Laeven and Valencia, 2012): current account balance as a percentage of GDP, central government debt as a percentage of GDP, domestic credit to private sector as a percentage of GDP, foreign direct investment as a percentage of GDP, and bank capital to assets ratio. Besides them, we included 100 other indicators from various fields of the World Bank database, including economic policy, health, agriculture, and gender data; from each field we tried to select a few of the most representative indicators. To compare countries with largely different indicator values (such as GDP) on relatively equal terms, we used only indicators of fractional or relative type instead of absolute values (examples are the percentage of rural population, life expectancy at birth, the percentage of unemployment with tertiary education, and research and development expenditure as a percentage of GDP). Good governance indicators have not been included because their values are available only from 1996 onwards. An important preprocessing step was the transformation of the basic indicators into values from the temporal window preceding the positive or negative case year. As the representative temporal window we used the period of 3 years before the event. Besides the 3 basic values, 6 new attributes are introduced for each window: the mean value, the slope, the minimum value, the maximum value, and the relative years of the minimum and maximum values of the indicator in the window before the event year.
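The expansion of one indicator's three-year window into nine attributes can be sketched as below. The paper does not say how the slope is computed or how missing values propagate; the least-squares slope, the "years before the event" encoding of the minimum and maximum positions, and leaving derived attributes unknown when any yearly value is missing are all assumptions of this sketch, and the attribute names are hypothetical.

```python
from statistics import mean

# Hypothetical labels for the nine attributes derived from one indicator.
ATTRIBUTE_NAMES = ["value_t-3", "value_t-2", "value_t-1", "mean", "slope",
                   "min", "max", "min_year", "max_year"]


def window_features(values):
    """Expand one indicator's 3-year window into nine attributes.

    `values` holds the indicator for years t-3, t-2, t-1 (oldest first),
    where t is the crisis or non-crisis year; None marks a missing value.
    """
    if len(values) != 3:
        raise ValueError("expected exactly three yearly values")
    values = list(values)
    if any(v is None for v in values):
        # Assumption: derived attributes are unknown if any yearly value is.
        return dict(zip(ATTRIBUTE_NAMES, values + [None] * 6))
    years_before = [3, 2, 1]  # relative year of each value before the event
    lo, hi = min(values), max(values)
    derived = [
        mean(values),
        # Least-squares slope over three equally spaced years reduces to
        # (last - first) / 2 per year.
        (values[2] - values[0]) / 2,
        lo,
        hi,
        years_before[values.index(lo)],  # ties resolved to the oldest year
        years_before[values.index(hi)],
    ]
    return dict(zip(ATTRIBUTE_NAMES, values + derived))


print(window_features([40.0, 46.0, 52.0]))
print(105 * len(ATTRIBUTE_NAMES))  # 945 attributes for 105 indicators
```

For the window [40.0, 46.0, 52.0] the sketch gives a mean of 46.0, a slope of 6.0 per year, the minimum three years before the event and the maximum one year before it.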
The result is a dataset of 147 positive and 287 negative examples, each described by 945 (105 × 9) numerical attributes. Some of these attributes have unknown values. The dataset is prepared in a form that may be used by diverse machine learning systems. In our previous work we analysed crises up to 2007. In that work we used as negative cases both countries that had never experienced banking crises and countries that had experienced crises, with negative cases at least 10 years away from the crisis periods. Because crises in the period 2008–2011 occurred in many developed countries and this period is the most interesting for the analysis, it has been decided that negative cases can come only from countries that have experienced banking crises. The reason for this bias in constructing negative cases is to avoid detecting differences between countries experiencing crises and those that never had a banking crisis, especially differences between developed economies, in which crises are relatively frequent, and undeveloped economies, in which banking crises are rare. The aim of the work has been to identify differences in indicator values that precede the outbreak of crises. The underlying problem of an analysis based on this dataset is its skewness between positive and negative cases along the temporal dimension: many positive cases are from the most recent period, while most of the negative cases are from the period before 2000. This bias could lead to the induction of subgroup descriptions that reflect the time-related development of countries rather than the development of crises.
We have tested this possibility by repeating the complete subgroup induction process with the same examples, but classified so that the 147 most recent examples, crises and non-crises alike (after 1996), were positive and all other examples (before 1996) were negative. The resulting subgroups were significantly different from those obtained for the crisis/non-crisis classification of cases. Based on this result, it may be concluded that in spite of the time-related skewness of the data we are able to induce crisis-related results.
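The relabelling used in this control experiment can be sketched as a function that marks the most recent examples as positive regardless of crisis status. This is an illustrative sketch; breaking ties at the cut-off year by the original order of the examples is an assumption, since the text only gives the split around 1996.

```python
def recency_labels(years, n_positive):
    """Label the `n_positive` most recent examples True and all others False.

    `years` lists the event year of each example; the returned labels are
    aligned with it. Python's sort is stable (also with reverse=True), so
    examples sharing a year keep their original order at the cut-off.
    """
    order = sorted(range(len(years)), key=lambda i: years[i], reverse=True)
    recent = set(order[:n_positive])
    return [i in recent for i in range(len(years))]


print(recency_labels([1990, 2008, 1985, 2001], 2))  # [False, True, False, True]
```

In the experiment described above, `n_positive` would be 147, and the relabelled examples would go through the same subgroup discovery run as the crisis/non-crisis dataset so that the two sets of subgroups can be compared.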

Methodology
Subgroup Discovery (SD) was introduced as a data mining methodology by Klösgen (1996) and Wrobel (1997). SD techniques aim to discover distinct but potentially overlapping subsets of the population that are statistically unique or interesting and at the same time as large as possible. The goal of subgroup discovery is the induction of human-interpretable descriptions of subgroups. The input is a set of cases consisting of a group of positive cases P (countries experiencing banking crises in a specific year) and a control group of negative cases N (countries in a period of no banking crisis). The subgroup discovery algorithm constructs rules that are true for positive cases and false for negative cases. The rules need not be true for all positive cases and false for all negative cases; the intention is to find short rules that are true for large subsets of positive cases and at the same time false for large subsets of negative cases. Subgroup sizes are not defined in advance, but the algorithm tends to make them as large as possible. A rule with ideal covering properties is true for all positive cases and false for all negative ones. Positive cases covered by a rule are called true positives, and their number is denoted by TP, while negative cases covered by the rule are called false positives (FP). All remaining negative cases, not covered by the rule, are called true negatives (TN). An ideal rule has TP = |P| and TN = |N|, and because |N| = TN + FP, the ideal rule has FP = 0.
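The counts defined above can be made concrete with a small sketch in which a rule is any Python predicate over a case stored as a dictionary. This representation, and the single attribute `x` in the example, are illustrative assumptions, not the format used by the ILLM system.

```python
def coverage(rule, positives, negatives):
    """Count TP, FP and TN for a rule given as a predicate on one case."""
    tp = sum(1 for case in positives if rule(case))
    fp = sum(1 for case in negatives if rule(case))
    tn = len(negatives) - fp  # |N| = TN + FP
    return tp, fp, tn


def is_ideal(rule, positives, negatives):
    """An ideal rule has TP = |P| and TN = |N| (equivalently FP = 0)."""
    tp, _, tn = coverage(rule, positives, negatives)
    return tp == len(positives) and tn == len(negatives)


positives = [{"x": 3}, {"x": 5}]
negatives = [{"x": 1}, {"x": 4}]
print(coverage(lambda case: case["x"] > 2, positives, negatives))  # (2, 1, 1)
```

Here the rule x > 2 covers both positive cases but also one negative case, so it is not ideal.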
The first step in the rule construction process is the construction of all possible features, which represent the elementary rule building blocks (Fürnkranz et al., 2012). For numerical attributes the features have the form Attribute > value or Attribute < value. Examples of features for attributes in the crisis/non-crisis dataset are percentage of rural population > 40.8 and slope of quasi-liquid liabilities < 0.11. For each input attribute there can be many different features, and the process of their construction is well defined: in practice, for each pair of one positive and one negative case, it is possible to construct one feature for every attribute. For example, if a positive case has percentage of unemployment = 10 and a negative case has percentage of unemployment = 15, then the feature percentage of unemployment < 12.5 may be constructed. This feature discriminates between the two cases because it is true for the positive case and false for the negative one.
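The pairwise construction of features can be sketched as follows. Placing the threshold at the midpoint of the two values follows the unemployment example in the text; the tuple representation (attribute, operator, threshold) and skipping pairs with unknown or equal values are assumptions of this sketch.

```python
def construct_features(positives, negatives, attributes):
    """Build candidate features from every (positive, negative) pair.

    Each feature is a tuple (attribute, "<" or ">", threshold) that is true
    for the positive case and false for the negative case of the pair.
    """
    features = set()
    for attribute in attributes:
        for pos in positives:
            for neg in negatives:
                p, n = pos.get(attribute), neg.get(attribute)
                if p is None or n is None or p == n:
                    continue  # unknown or equal values cannot separate the pair
                threshold = (p + n) / 2  # midpoint between the two values
                features.add((attribute, "<" if p < n else ">", threshold))
    return features


print(construct_features([{"unemployment": 10}], [{"unemployment": 15}],
                         ["unemployment"]))  # {('unemployment', '<', 12.5)}
```

Because every pair can contribute one feature per attribute, the number of candidate features grows with |P| × |N| × (number of attributes), which is why the subsequent search over their combinations matters.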
The central part of the rule construction process is the search for combinations of features with optimal covering properties on the given set of cases. Features can be connected only by logical conjunction: a combination of features is true for a case only if all of its features are true for the case, and it is false if any of the features is false for it. In the subgroup discovery approach, the following rule quality measure is used as the optimization goal in the heuristic search for rules: Q = TP/(FP + g), where g is an appropriately selected generalization parameter. High-quality rules have a large Q value: they cover many positive cases (large TP) and few negative cases (small FP). The parameter g determines how many FP cases are tolerated relative to the number of TP cases covered by the rule. The most relevant rules are typically generated with intermediate values of the parameter, but the final decision on which model is most appropriate depends on human expert evaluation of the included conditions, the unexpectedness of the result, or the possible practical relevance of the rule. For the experiments in the crisis/non-crisis domain, g was varied between 0.2 and 5, and the results reported in the next section are obtained with g = 0.5.
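A toy version of the search can be sketched as follows: features are combined by conjunction and each combination is scored with Q = TP/(FP + g). The ILLM system uses a heuristic search; the exhaustive enumeration below is a deliberately simpler stand-in that is only practical for a handful of features. The attribute names and data are invented for illustration, and treating an unknown value as making a feature false is an assumption.

```python
from itertools import combinations


def holds(feature, case):
    """Evaluate one feature (attribute, "<" or ">", threshold) on a case."""
    attribute, op, threshold = feature
    value = case.get(attribute)
    if value is None:
        return False  # assumption: an unknown value makes the feature false
    return value < threshold if op == "<" else value > threshold


def quality(rule, positives, negatives, g):
    """Q = TP / (FP + g) for a conjunction of features."""
    tp = sum(all(holds(f, c) for f in rule) for c in positives)
    fp = sum(all(holds(f, c) for f in rule) for c in negatives)
    return tp / (fp + g)


def best_rule(features, positives, negatives, g=0.5, max_len=2):
    """Score every conjunction of up to `max_len` features (brute force)."""
    best, best_q = None, float("-inf")
    for size in range(1, max_len + 1):
        for rule in combinations(sorted(features), size):
            q = quality(rule, positives, negatives, g)
            if q > best_q:
                best, best_q = rule, q
    return best, best_q


# Toy data with hypothetical attribute names (not taken from the dataset).
positives = [{"credit_slope": 7.0, "life_exp_f": 81.0},
             {"credit_slope": 6.0, "life_exp_f": 82.0}]
negatives = [{"credit_slope": 7.0, "life_exp_f": 75.0},
             {"credit_slope": 2.0, "life_exp_f": 81.0}]
features = {("credit_slope", ">", 5.8), ("life_exp_f", ">", 80.2)}
# Both features together: TP = 2, FP = 0, so Q = 2 / 0.5 = 4.0.
print(best_rule(features, positives, negatives, g=0.5))
```

Each feature alone covers both positive cases but also one negative case, giving Q = 2/1.5 ≈ 1.33; the conjunction removes the false positives. A small g such as 0.5 strongly rewards FP = 0, whereas a larger g makes the number of covered positive cases weigh more relative to a few false positives.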
The subgroup discovery methodology based on the ILLM (Inductive Learning by Logic Minimization) system has been implemented at the Rudjer Boskovic Institute, Zagreb, Croatia. A publicly available Data Mining Server at http://dms1.irb.hr may be used for subgroup discovery tasks on user-submitted data. The server presents a simple and user-friendly interface to the data analysis process, but it is limited to 1000 cases and 1000 descriptors to prevent server overload. The complete dataset described in this work can be downloaded from http://dms1.irb.hr/do-illm/examples/list_of_examples.php. The results reported in the next section can be reproduced by uploading the dataset at http://dms1.irb.hr/do-illm/bin/levelA/execute_levela.php, selecting a generalization parameter of 0.5 and a model complexity of 4, and pressing the Start induction button. Computation takes about 15 minutes.

Induced subgroups of crises
The result of the descriptive induction process for the prepared dataset is the detection of 5 subgroups of banking crises. The subgroups are defined by the list of included positive cases. Properties of the subgroups are described by necessary conditions and a list of supporting conditions. The necessary conditions are the features used in the body of the rule, and they must be satisfied for a positive case to be included in the subgroup. The supporting conditions are features that are typically true for the positive cases included in the subgroup. They are determined by repeating the same subgroup discovery methodology on a dataset that includes only the positive examples from this subgroup and all negative cases. In this case the complexity of the constructed rules is limited to one feature, with the intention of obtaining a list of independent features characteristic of the subgroup. The generalization parameter is the same as for the basic subgroup induction (g = 0.5).
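The procedure for supporting conditions can be sketched as ranking single features by Q on a dataset made of one subgroup's positive cases and all negative cases. The limit to one feature and g = 0.5 come from the text; keeping the `top` best-scoring features as the list of supporting conditions is an assumption, since the paper does not say how the list is cut off.

```python
def supporting_conditions(features, subgroup_positives, negatives, g=0.5, top=5):
    """Rank single features by Q = TP / (FP + g) for one subgroup.

    Only the subgroup's positive cases and all negative cases are used, and
    each rule consists of exactly one feature. `top` is an assumed cut-off.
    """

    def holds(feature, case):
        attribute, op, threshold = feature
        value = case.get(attribute)
        if value is None:
            return False  # assumption: unknown values make the feature false
        return value < threshold if op == "<" else value > threshold

    scored = []
    for feature in sorted(features):  # sorted for a deterministic order
        tp = sum(holds(feature, c) for c in subgroup_positives)
        fp = sum(holds(feature, c) for c in negatives)
        scored.append((tp / (fp + g), feature))
    # Stable sort: features with equal scores keep their sorted order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [feature for _, feature in scored[:top]]
```

Because each candidate is scored on its own, the returned features describe the subgroup independently of one another rather than as a combined rule.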
According to a preliminary expert evaluation of the detected necessary and supporting conditions, each subgroup received a name.
• slope of domestic credit to private sector as % of GDP in the three years before the crisis > 5.8% per year
• life expectancy of females three years before the crisis > 80.2 years
The supporting conditions:
• under-five mortality rate in the year before the crisis < 8.0 (per 1000 live births)
• population aged 0-14 in the year before the crisis < 21.6% of total population
• population aged 65 and above three years before the crisis > 11.0% of total population
• maximal market capitalization of companies in the three-year period > 51.1% of GDP
• domestic credit to private sector as % of GDP in the year before the crisis > its maximal value in the two previous years
• population aged 15-64 in the year before the crisis < 64.3% of total population
• rural population three years before the crisis < 33.7% of total population
The supporting conditions:
• under-five mortality rate in the year before the crisis < 60.3 (per 1000 live births)
• life expectancy of females in the year before the crisis > 68.8 years
• maximal annual population growth in the three years before the crisis > 0.4
• road sector energy consumption two years before the crisis > 13.5% of total energy consumption
The supporting conditions:
• slope of under-five mortality rate in the three years before the crisis > -2.5 (per 1000 live births per year)
• annual population growth three years before the crisis > population growth one or two years before the crisis
Supporting condition:
• annual money and quasi money growth two years before the crisis > 5.2% of GDP

Analysis of subgroups
Subgroups 1 and 2 are relevant because they include many banking crises in developed countries in 2007 and 2008. There was a strong "avalanche" effect that caused crises to occur in many countries at the same time. But the detected conditions demonstrate that there were also common patterns in many countries, characterized by strongly increasing credit activity in economies with an ageing population (Subgroup 1) or high credit activity in economies with high social security (Subgroup 2). It is interesting to notice that this subgroup also includes the crisis in Spain in 1977 and the crisis in Sweden in 1991. The similarity among Subgroups 1, 2, and 3 is that they all include the same driving force (increased credit activity) in societies that are not able to absorb these credits in a proper way. In Subgroups 1 and 2 this is mainly due to an ageing population, while in Subgroup 3 it is due to a relatively small percentage of active population, which in many developing countries is a consequence of a very high percentage of young population.
For the socioeconomic Subgroups 4 and 5, a common characteristic is decreasing quality of life. Our interpretation is that decreasing quality of life is not a cause of banking crises but rather a sign of problems in the country connected with a worsening macroeconomic situation, which may be an environment for the development of banking crises. There are many possible causes of such systemic problems, including ethnic or civil wars, significant changes of the economic system, and deep political crises. In this respect, individual countries may differ significantly.
Subgroup 4 includes undeveloped and developing countries in which banking crises are in some cases related to turbulent conditions. For example, Congo in the period 1991–1994 faced the suspension of military and financial assistance to the Mobutu regime, Kenya in 1992 experienced significant violence in certain parts of the country before the presidential elections, and Belarus, Kyrgyzstan, Lithuania, and Latvia in 1995 were post-communist countries trying to implement new economic models. In all these cases, the common result was an unstable socioeconomic system prone to banking crises. Besides stagnating life expectancy, Subgroup 5 includes the necessary condition of stagnating mortality of children under the age of 5. Both conditions to some extent reflect the quality of health care in the country, and its decrease or stagnation seems to signal various socioeconomic problems in the society. It is relevant to notice that besides crises in countries like Nigeria and Ukraine, the model is valid for the 1991 crises in Finland, Norway, and Sweden, as well as for the 2008 crises in a few EU countries.
Based on the presented analysis, it may be concluded that there are two types of subgroups. The first type comprises Subgroups 1–3, which have in common increased values of the indicator of domestic credit to the private sector, either as a necessary or as a supporting condition. One of these subgroups also has a supporting condition representing money and quasi money (M2) as a percentage of GDP. Because of these indicators, the first group of subgroups may be recognized as a "financially driven" type of banking crises. Subgroups 4 and 5 have stagnating or decreasing life expectancy of females as a common necessary condition. Additionally, both subgroups have increasing or stagnating mortality of children as either a necessary or a supporting condition. They may be recognized as a "socioeconomic problems" type of banking crises.

Evaluation of EU countries included into Subgroup 5
The necessary conditions of Subgroup 5 are health-related World Bank indicators that identify socioeconomic problems of a country. When accompanied by the supporting condition of significant money growth, they present an environment in which banking crises may break out. The subgroup includes crises in undeveloped and developing countries, such as Sierra Leone in 1990, Kenya in 1992, Burundi in 1994, and Bulgaria in 1996. It is interesting to notice that all three countries of the 1991 Nordic banking crisis (Finland, Norway, and Sweden) are also included in this subgroup. Even more intriguing is that the 2008 crises in six EU countries (Belgium, Hungary, Greece, Italy, Portugal, and Spain) are included as well. With the exception of Hungary, these crises (five of the six) are also in Subgroup 1. This may be interpreted as a sign that, although the crises in these countries were triggered by high credit activity, there were also socioeconomic reasons for them.
An independent set of World Bank indicators has been used in order to test the hypothesis that EU countries included in Subgroup 5 are different from other EU countries with banking crises in the same year. This is the set of Worldwide Governance Indicators (http://info.worldbank.org/governance/wgi/index.asp) which consists of six aggregated indicators representing voice and accountability, political stability and absence of violence, government effectiveness, regulatory quality, rule of law, and control of corruption (Kaufmann et al., 2010).
A useful characteristic of the indicators is that they are available both as absolute values and as percentile ranks (p-rank: the rank over the complete set of 215 economies, expressed in the range 0–100, with 0 as the lowest value). The latter is appropriate for comparative analysis of the performance of a single country or a group of countries.
We used the percentile ranks to compute the differences in ranking between 2007 and 2004, i.e. for the period before the 2008 crises. First, the differences were computed for all six indicators for Belgium, Greece, Hungary, Italy, Portugal, and Spain; the results are presented in the upper part of Table 1. We then selected 5 other EU countries that also experienced banking crises in 2008 but were not included in Subgroup 5: Austria, Denmark, France, Germany, and the Netherlands. Their results are presented in the middle part of Table 1. Finally, the differences between these two groups of countries were evaluated by the t-test, and the levels of statistical significance are presented in the last row. Columns representing governance indicators are ordered by decreasing significance. The most significant difference between the two groups of countries, at the 99.9% level, is in control of corruption, followed by rule of law (97% level) and government effectiveness (96% level). The differences in the remaining three indicator rankings are not significant, but the differences in the total sum (last column) are significant at the 99% level. In view of this result, it is interesting to look also at the most recent World Bank data, available for 2011. Table 2 lists the EU countries with the largest decrease in the relevant governance indicators for the period 2008–2011. It may be noticed that Greece and Italy are still at the top of these lists. A possible interpretation is that the situation in these countries is not improving, which has been practically confirmed by their ongoing financial problems in 2012 and 2013. Perhaps even more relevant is that Cyprus and Slovenia, which are known to have started experiencing financial crises in 2012 and 2013, are both highly positioned in these lists.
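The two computations behind Table 1 can be sketched as follows: the change of each percentile rank between 2004 and 2007 together with its sum over indicators, and a two-sample t statistic comparing the two groups of countries. The paper does not state which t-test variant was used, so the pooled-variance Student's test is an assumption, as is taking the change as the 2007 rank minus the 2004 rank (a negative number is then a drop in rank). The example numbers are invented placeholders, not values from Table 1.

```python
from math import sqrt
from statistics import mean, variance


def rank_changes(rank_2004, rank_2007):
    """Change in percentile rank per indicator (2007 minus 2004) plus the total.

    Both arguments map indicator names to percentile ranks (0-100).
    """
    changes = {name: rank_2007[name] - rank_2004[name] for name in rank_2004}
    changes["total"] = sum(changes.values())  # summed before "total" is added
    return changes


def pooled_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance, and its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Invented placeholder numbers, not values from Table 1.
print(rank_changes({"control_of_corruption": 80.0, "rule_of_law": 75.0},
                   {"control_of_corruption": 72.0, "rule_of_law": 74.0}))
print(pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # t ≈ -3.674, df = 4
```

With six countries in one group and five in the other, df = 9. A p-value is then read from the t distribution with df degrees of freedom; `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives this pooled test, returns the same statistic together with a two-sided p-value. Whether the significance levels reported above are one- or two-sided is not stated in the text.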

Discussion and conclusions
The presented models are obtained by descriptive induction based on the subgroup discovery methodology. The results are potentially relevant for human interpretation and for a better understanding of the connections between publicly available World Bank indicators and the occurrence of banking crises. The results confirm that excessive credit activity and high availability of money and quasi money present a high risk for the outbreak of banking crises. The novelty is that, besides these financial factors, all induced models include demographic and/or public health indicators as necessary conditions. In Subgroup 1 the life expectancy of females should be interpreted as a sign of an ageing population. This result means that high credit activity is dangerous especially in developed economies with an ageing population. Because most of the crises described by Subgroup 1 happened simultaneously in 2008, inter-country dependencies are obviously very strong for this type of crisis. The available data do not include this information, so the induced conditions cannot capture inter-country dependencies, but these relations must be taken into account in the expert evaluation.
Subgroup 5 is much more general because it includes undeveloped, developing, and developed countries over a relatively large time span. The result states that banking crises may be expected in countries with some socioeconomic problems. Although socioeconomic problems may have various origins and causes, it seems that they may be, at least to some extent, identified by indicators such as stagnating or decreasing life expectancy and stagnating or increasing mortality of children. The supporting condition for Subgroup 5 is high money and quasi money growth. The result is useful for understanding that a banking crisis is in many cases a natural consequence of problems that are not financially related, but also that an appropriate (restrictive) monetary policy may help prevent a banking crisis.
The most relevant result is that the socioeconomic problems detected by Subgroup 5 for some EU countries are strongly connected with changes in the values of governance indicators for these countries before the outbreak of the crises. From the available data it is not possible to draw conclusions about causal relations between banking crises, socioeconomic problems, and governance indicators. It is not clear whether socioeconomic problems are the result of problems in governance or vice versa, or how both are connected with banking crises, but the results demonstrate that selected socioeconomic and governance indicators collected and prepared by the World Bank may be used as warning signals for country-level problems.
The results presented in Table 1 demonstrate that the 2008 banking crises in some EU countries, and the financial crises that followed, have a much more complex background than purely financial causes. The result is in accordance with the model developed by Francis (2003), which connects governance indicators and financial fragility. The significance of the result presented in this work is that the model is confirmed on examples, and by a methodology, different from those used to develop it. Additionally, we demonstrate that trends (differences) of governance indicators are more relevant than their absolute values.
The results also make clear that Francis's model is not able to describe all cases of banking crises. According to our results, it is valid only for a relatively small but significant part of crises. This means that appropriate modeling and understanding of banking crises is possible only after successful detection and grouping of similar patterns of events. The results presented in this work are perhaps a first step in this direction.
The importance of the work is that it clearly demonstrates that the prevention of future banking and financial crises should also focus on governance effectiveness, stricter law enforcement, and measures against corruption.

Grants
The work reported in this paper was supported by the EU FP7 FOC-II project (255987) and the Croatian Ministry of Science, Education, and Sport project "Machine Learning Algorithms and their Applications" (098-0982560-2563).