Identifying Excessive Credit Growth and Leverage

This paper aims at providing policymakers with a set of early warning indicators helpful in guiding decisions on when to activate macroprudential tools targeting excessive credit growth and leverage. To robustly select the key indicators we apply the "Random Forest" method, which bootstraps and aggregates a multitude of decision trees. On the identified key indicators we then grow a binary classification tree, which derives the associated optimal early warning thresholds. By using credit-to-GDP gaps, credit-to-GDP ratios and credit growth rates, as well as real estate variables in addition to a set of other conditioning variables, the model is designed to not only predict banking crises, but also to give an indication on which macroprudential policy instrument would be best suited to address specific vulnerabilities.


Non-technical summary
Past financial crises, and in particular the global financial crisis, have shown that excessive credit growth often leads to the build-up of systemic risks to financial stability, which may materialize in the form of systemic banking crises. As mitigating systemic financial stability risks is the objective of macroprudential policy, several macroprudential tools have been designed to curb excessive leverage and/or build up buffers against likely future losses.
Such instruments include the countercyclical capital buffer, the systemic risk buffer as well as a potentially time-varying leverage ratio, and instruments directly targeting borrowers such as loan-to-value (LTV) and loan-to-income (LTI) caps. However, the application of macroprudential policy is still at an early stage and much effort is currently being devoted to providing policymakers with concrete indications on how to actually design macroprudential instruments. Against this background, we propose an early warning model to be used for identifying those periods in which the build-up of leverage can be defined as excessive and may warrant the activation of relevant macroprudential instruments.
As in any early warning exercise, the target event is first defined. In the present case, the model is designed to issue warning signals well ahead of banking crises. The second step is the selection of the candidate early warning indicators: in this respect, the dataset used in this application comprises publicly available aggregate credit-related, macroeconomic, market and real-estate variables.
The modelling technique is based on decision trees, in particular binary classification trees. Based on the results of a Random Forest, which consists in bootstrapping and aggregating several decision trees, we select the most relevant early warning indicators. On these we grow a benchmark early warning tree where the key indicators and the respective early warning thresholds are considered in a unified framework, i.e. by taking into account the conditional relationships between them. As a result, the model is designed to not only predict banking crises, but also to give an indication on which macroprudential policy instrument would be best suited to address specific vulnerabilities.
Indeed, the macroprudential policy strategy has been defined by the European Systemic Risk Board (ESRB) with reference to the guided discretion principle, 1 whereby the exercise of judgement is complemented by quantitative information derived from a set of selected indicators and associated 'early warning' thresholds. In particular, with respect to the countercyclical capital buffer, the Basel Committee on Banking Supervision (BCBS) already identified the aggregate private sector credit-to-GDP gap as a useful buffer guide, as this variable would have performed well in signalling the build-up of excessive leverage in the past. 3 However, policymakers should supplement the signal coming from credit-to-GDP trend deviations with judgement based on a broader information set, as implicitly suggested also in the current Capital Requirements Directive (CRD IV), which tasks the ESRB to provide recommendations on other variables which should inform the policy decision.
1 As is common in the macro-financial literature (see Section 2), this paper defines leverage as the ratio of a credit aggregate to GDP at the country level, while the micro-financial concept of leverage corresponds to debt divided by equity. Leverage in banking is the ratio of lending to equity and is indeed affected by some macroprudential measures. The broader definition of leverage used in this paper covers non-financial corporations' and household debt, i.e. a country's total private sector leverage. We use this definition of leverage to indicate the level of debt, as opposed to the concept of credit growth (and gap).
2 In Europe, the countercyclical and the systemic risk buffers are regulated at the EU level while LTV and LTI limits as well as the leverage ratio are currently based on national law.
Taking into account other conditioning variables is necessary because not all credit expansions are bad for financial stability, and the heroic task of identifying credit bubbles in real time requires assessing whether conjunctural credit developments might be disconnected from fundamentals or reflect excessive risk taking and overly optimistic expectations.
Against this background, we propose an early warning model to be used for identifying those periods in which the build-up of leverage can be defined as excessive and may warrant the activation of relevant macroprudential instruments. In our analysis we consider several variables as a policy guide, select the most relevant ones on the basis of a robust quantitative assessment of their predictive power, and propose a fully-fledged system where the key indicators and the respective early warning thresholds are considered in a unified framework. The benchmark model we derive is a transparent tool which would also enable the public at large to understand and possibly anticipate macroprudential decisions. We achieve our objective by using decision tree learning, a statistical methodology which retains the advantages of the two approaches traditionally used in the Early Warning literature, i.e. the signalling and the discrete choice approach. The model we develop aims at identifying whether the European financial system is, in a given period, particularly vulnerable: a situation in which the increased likelihood and severity of a subsequent banking crisis would suggest building up capital buffers and/or curbing credit growth.
The paper is structured as follows. The next section reviews the related literature on macroprudential tools, in particular the countercyclical capital buffer, and economic applications of recursive trees. In Section 3 we define our target variable, i.e. broadly speaking banking crises in the European Union in the last 40 years. Section 4 describes our candidate early warning indicators. Section 5 illustrates the Classification Tree approach and its extension to Random Forests. The results of the empirical analysis are presented in Sections 6 and 7. Section 8 describes for which countries the tree would issue early warning signals and why, while Section 9 describes the results of an out-of-sample exercise using only pre-2007 information. The policy implications of our findings are discussed in Section 10. Section 11 summarizes the main conclusions.
Review of the Literature

The literature on Early Warning Systems for banking crises has a long tradition (see e.g. Eichengreen and Rose (1998)). However, it has so far focused mostly on emerging markets and on identifying banking crisis determinants without an explicit focus on the policy tools intended to reduce the likelihood and severity of such crises. The recent financial crisis and the subsequent policy responses have spurred efforts towards providing policymakers with concrete indications on how to actually design macroprudential instruments.
Countercyclical capital buffers (CCBs) are one of the main tools envisaged by Basel III and the one on which the analytical framework is most advanced.
The countercyclical capital buffer is designed to increase the resilience of the banking sector and smooth the credit cycle, e.g. by ensuring that the flow of credit is not unnecessarily reduced by pro-cyclical supply-side constraints during a bust phase. BCBS (2010) states that the authorities responsible for operating CCBs should follow a common reference guide, based on the aggregate private sector credit-to-GDP gap. Indeed, Drehmann et al. (2010) and Drehmann et al. (2011) show that deviations of the credit-to-GDP ratio from a long-term trend actually outperform other candidate early warning indicators such as GDP and credit growth, their ratio as such, as well as indicators based on asset prices or measures of banking sector performance.

ECB Working Paper 1723, August 2014
The credit-to-GDP gap, however, suffers from some shortcomings: among others, it may provide misleading signals in real time as it is prone to large revisions (Edge and Meisenzahl (2011)). This is mainly due to the endpoint bias affecting the one-sided Hodrick-Prescott filter, which is widely used to extract the long-term trend. Moreover, this filter is sensitive to the choice of the smoothing parameter, and adjusts very slowly during a reversal after a prolonged period of negative credit growth. Finally, positive deviations from trend could be due to either excessive credit growth or low or negative output growth, two scenarios which arguably require different policy responses (Repullo and Saurina (2011)). 4 Owing to the limitations of the credit-to-GDP gap, it is advisable to complement it with other early warning indicators, ideally in a multivariate framework.
Other capital-based instruments targeting excessive leverage are the leverage ratio and the systemic risk buffer. The former aims at addressing risks directly linked to excessive leverage, namely losses occurring in the wake of fire sales and adjustments in asset valuation. The latter is envisaged to increase resilience in the banking sector by addressing structural systemic risks like the size of the banking sector compared to the rest of the economy.
Hardly any applied research is available on the use of the leverage ratio for macroprudential purposes or on the systemic risk buffer. With respect to the latter, one of the biggest challenges relates to the notion of structural systemic risk itself, which is in practice open to interpretation and difficult to measure in an empirical exercise.
With respect to instruments targeting borrowers, the literature suggests some indicators which could be taken into consideration when deciding whether to impose limits on loan-to-value and loan-to-income ratios, e.g. to prevent a credit boom fuelling an asset price bubble. Quite naturally, these indicators are mainly related to house prices (see e.g. Barrell et al. (2010) and Mendoza and Terrones (2008)). Due to poor commercial property price data coverage and quality, and owing to cross-country comparability issues with respect to LTI and LTV ratios themselves, assessing the 'early warning' performance of these promising indicators has so far been very challenging.
The multivariate methodology we propose to adopt to support decisions on the macroprudential instruments described above is decision tree learning, a greatly underutilized technology in economics. Indeed, while Classification and Regression Trees (CARTs, see Breiman, Friedman, Olshen and Stone (1984)) are extensively used in other disciplines from biology to chemometrics, their economic applications are rare. The Early Warning literature, in particular, has so far almost uniquely relied on two approaches, namely the signalling approach and the categorical dependent variable regression. The signalling approach has the advantage of being extremely straightforward. 5 Indeed, the early warning signal is issued when the considered indicator breaches a pre-specified threshold, set by optimizing the past predictive performance. The downside of this approach is that it considers early warning indicators separately. Logit/probit regression, contrary to the signalling approach, offers a multivariate framework within which one can assess the relative importance of several factors. 6 However, while a desirable feature of an early warning system is to provide clear early warning thresholds for the considered indicators, the logit/probit model offers only an estimate of the contribution of each factor to the increase in the overall probability of a crisis, rather than a threshold value for each regressor. The early warning threshold for the estimated crisis probability is eventually set in a second step, outside of the logit/probit model itself. Moreover, this framework, as it is commonly applied, is unable to handle unbalanced panels and missing data, which is a serious issue in particular with credit data, with the result that the regression can ultimately be estimated only on a relatively short sample. Decision trees, and classification trees in particular, retain the advantages of both approaches: on the one hand they are very easy to explain and use, and on the other hand they provide an early warning system where the relevant indicators are considered in a unified framework. Moreover, decision trees are not sensitive to outliers and can handle nonstationary time series, as the time dimension is not relevant in such a framework. We are aware of only a handful of papers using binary recursive trees for assessing vulnerabilities in relation to financial crises: Ghosh and Ghosh (2002) and Frankel and Wei (2004) analyze the determinants of currency crises, Manasse and Roubini (2009) and Savona and Vezzoli (2014) deal with sovereign crises, while Duttagupta and Cashin (2011) and Manasse et al. (2013) study banking crises in emerging markets.
5 See e.g. Kaminsky and Reinhart (1999) and more recently Alessi and Detken (2011).
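The threshold-setting step of the signalling approach can be sketched as follows. This is our own minimal Python illustration, not the paper's code; the weight `mu`, encoding the policymaker's relative aversion to missed crises versus false alarms, is an assumed parameterization.

```python
import numpy as np

def best_threshold(indicator, labels, mu=0.5):
    """Signalling approach: grid-search the threshold that minimizes a
    weighted sum of type 1 errors (missed pre-crisis quarters) and
    type 2 errors (false alarms). labels: 1 = pre-crisis, 0 = tranquil."""
    indicator = np.asarray(indicator, dtype=float)
    labels = np.asarray(labels)
    best_tau, best_loss = None, np.inf
    for tau in np.unique(indicator):
        signal = indicator > tau
        t1 = np.mean(~signal[labels == 1])   # share of missed pre-crisis quarters
        t2 = np.mean(signal[labels == 0])    # share of false alarms
        loss = mu * t1 + (1 - mu) * t2
        if loss < best_loss:
            best_tau, best_loss = tau, loss
    return best_tau, best_loss
```

Setting `mu` above 0.5 mimics preferences biased against missing crises, lowering the selected threshold.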
Similarly to the latter paper and to Savona and Vezzoli (2014), the present study grows the benchmark tree on the solid ground of a preliminary analysis based on bootstrapping and aggregating a multitude of trees. However, our explicit objective is to provide a set of triggers for macroprudential policy instruments in the European Union; our crisis episodes and the countries considered are therefore carefully selected accordingly. Moreover, as we adopt a strict policy perspective, we aim at a model that allows for timely decision making and therefore focus on identifying pre-crisis periods rather than crisis periods (see Section 3).

The Banking Crises Dataset
The basis for the banking crises dataset used in this paper is provided by the dataset assembled by Babecky et al. (2012). This quarterly dataset covers banking crises identified according to several definitions: i) … (2003)); ii) 'bank runs that lead to the closure, merger, or takeover by the public sector of one or more financial institutions' as well as 'the closure, merging, takeover, or large-scale government assistance of an important financial institution (or group of institutions) that marks the start of a string of similar outcomes for other financial institutions' (Kaminsky and Reinhart (1999)); iii) 'significant signs of financial distress in the banking system (as indicated by significant bank runs, losses in the banking system, and/or bank liquidations)' as well as 'significant banking policy intervention measures in response to significant losses in the banking system', where the considered measures include extensive liquidity support, bank restructuring costs, significant bank nationalizations, significant guarantees put in place, significant asset purchases, deposit freezes and bank holidays (Laeven and Valencia (2008), (2010), (2012)).
None of the above definitions of banking crisis, however, is fully aligned with the objective and operation of the macroprudential tools targeting credit, as these tools aim to avoid a broader array of circumstances than a banking crisis as defined in these terms alone. Therefore, we use an updated and slightly amended dataset with respect to the one constructed by Babecky et al. (2012), which has been built in the framework of a broader project by the European Systemic Risk Board on the basis of country experts' judgement (see Detken et al. (2014)). In this dataset, a banking crisis is defined by significant signs of financial distress in the banking system, as evidenced by bank runs in relevant institutions or losses in the banking system (nonperforming loans above 20% or bank closures of at least 20% of banking system assets); or significant public intervention in response to or to avoid the realization of losses in the banking system (see above). Moreover, crises which country experts could not relate to strong credit cycles during these periods have been removed; in total, 15 banking crises have been deleted from the original databank: one each in Austria, Belgium, the Czech Republic, Ireland, Luxembourg and Slovakia; two each in Estonia, Latvia and the UK; and three in Germany. Among the latter is e.g. the 1974 Herstatt failure, which was due to the materialisation of settlement risk. We refer to Detken et al. (2014) for further details.
For other advanced economies, the available time series are generally relatively short, implying that the overall results would be driven by the evidence linked mainly to the global financial crisis, and in some cases exhibit peculiar patterns which would warn against pooling these countries together with the ones under study. The coverage of the banking crises dataset constructed by the ESRB also prevented us from extending the analysis to other advanced economies. Over the considered period, 25 separate crisis episodes are recorded for euro area countries, the UK, Denmark and Sweden. They are marked in black in Chart 1. While the incidence of crises shows a marked increase for the current financial crisis, only slightly more than half of the 21 country experts thought that for their country the current crisis met one of the above criteria. Moreover, some countries (Austria, Belgium, Luxembourg, Malta and Slovakia) did not record any crisis consistent with the above criteria over the sample period. Of the remaining countries, 8 experienced one crisis, 7 experienced two crises while the UK experienced three crises.
Finally, in constructing our binary target variable we take into account policy lags. For example, with respect to CCBs, banks should usually be given at least one year to meet the additional capital requirements before any increase in the buffer takes effect. An early warning signal leading the inception of the crisis by less than one year, or issued once the crisis is already in place, would thus come too late. At the same time, we do not aim at building a model which predicts exactly when a banking crisis will materialize. Rather, we propose an Early Warning System signalling that financial imbalances are building up and the risk of a systemic crisis in the not-so-far future is increasing. Therefore, we define as correct any warning signal issued in the four years preceding the start of a crisis, excluding from the analysis the three quarters immediately preceding the crisis and the crisis period itself. The pre-crisis periods are marked in red in Chart 1, while the periods excluded from the analysis are marked in grey. We do not remove from the sample the quarters following the crisis because our model is not expected to suffer from any post-crisis bias. 8 With the exception of the Spanish and Cypriot crises, the period after 2009Q1 is de facto not taken into account while optimizing the early warning thresholds, because the dataset ends in 2012Q4 and ignores whether a crisis happened in any of the countries in 2013.
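The construction of the target variable described above can be sketched as follows. This is a minimal illustration in our own notation (quarters indexed by integers, crisis episodes given as inclusive start/end pairs), not the paper's code.

```python
def build_target(n, crises):
    """Build the binary target over n quarters.
    crises: list of (start, end) quarter indices, inclusive.
    Returns 1 for pre-crisis quarters (16 to 4 quarters before a crisis
    starts), None for excluded quarters (the 3 quarters immediately
    preceding a crisis and the crisis itself), 0 for tranquil quarters."""
    y = [0] * n
    for start, end in crises:
        # four-year pre-crisis window, excluding the last 3 quarters
        for t in range(max(0, start - 16), min(start - 3, n)):
            if y[t] == 0:
                y[t] = 1
        # excluded: 3 quarters before the crisis plus the crisis period
        for t in range(max(0, start - 3), min(end + 1, n)):
            y[t] = None
    return y
```

For a crisis starting in quarter 20 and ending in quarter 24, quarters 4 to 16 are labelled pre-crisis and quarters 17 to 24 are excluded.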

Candidate Early Warning Indicators

The broadest credit aggregate we consider is total credit, including debt securities, to the non-financial private sector. We consider the y-o-y rate of growth, as well as the ratio to GDP and the deviations of such a ratio from its trend (i.e. the 'gap'), computed with a backward-looking, slowly-adjusting (λ = 400000) HP filter. This latter transformation assumes that the financial cycle is four times as long as the business cycle and has been suggested by BCBS (2010); we will therefore refer to it as the "Basel gap". However, such an HP trend might adjust too slowly following a prolonged period of negative credit growth, therefore we also consider an alternative gap computed with λ = 26000, corresponding to a financial cycle which is twice as long as the business cycle. We also look at the narrower bank credit aggregate, which we analogously consider as y-o-y rate of growth, ratio to GDP and gap. 9 The level of bank loans as a ratio to GDP is one of the indicators Schularick and Taylor (2012) take as evidence of a story of decades of slowly encroaching risk on bank balance sheets: by including it in our model we aim at exploiting the panel dimension in order to pin down an 'early warning' level of aggregate leverage. 10 With respect to the time dimension, it could be argued that such an 'early warning' level does not make sense for nonstationary series. However, we would argue that the ratio of credit to GDP is theoretically bounded, hence stationary in the long run.
9 Rates of growth are deflated by subtracting the y-o-y CPI changes. Gaps have been constructed by taking a standard HP filter for the first 5 years of available data and then a recursive HP filter. Although it is advisable to only use gaps after 5-10 years of data due to the start-point problem affecting HP trend estimates (see Borio and Lowe (2002)), such an approach would have yielded too short time series. As a result, the evaluation of the predictive performance of gap measures would have been driven mainly by the recent global financial crisis. Also based on the results by Drehmann and Tsatsaronis (2014), who analyze the potential practical consequences of the start-point bias, we decided in favor of keeping the longest possible time series.
10 Other indicators studied by Schularick and Taylor (2012) are e.g. the ratios of bank assets to GDP and money, which we do not analyze owing to lack of long enough quarterly bank balance sheet observations.
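The recursive (one-sided) gap construction described in footnote 9 can be sketched as follows, assuming the `hpfilter` routine from statsmodels; the window length of 20 quarters (5 years) follows the footnote, while the function name is our own.

```python
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

def basel_gap(credit_to_gdp, lamb=400000, min_periods=20):
    """One-sided 'Basel gap': at each quarter t, apply the HP filter only
    to data up to t and keep the final cycle value, avoiding look-ahead.
    The first min_periods quarters (5 years) use a standard two-sided
    filter over that initial window, as described in footnote 9."""
    x = np.asarray(credit_to_gdp, dtype=float)
    gap = np.full(len(x), np.nan)
    cycle0, _trend0 = hpfilter(x[:min_periods], lamb=lamb)
    gap[:min_periods] = cycle0
    for t in range(min_periods, len(x)):
        cycle, _trend = hpfilter(x[: t + 1], lamb=lamb)
        gap[t] = cycle[-1]   # only the endpoint of each filter run is kept
    return gap
```

For a series that grows exactly linearly the HP trend coincides with the series, so the resulting gap is zero throughout; deviations from trend show up as positive or negative gaps.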
Furthermore, our statistical procedure is not affected by 'spurious regression' problems. For this reason, we do include credit-to-GDP levels in the analysis, as they serve as conditioning variables for other indicators. Sectoral credit aggregates, namely credit to households and non-financial corporations, are transformed into y-o-y rates of growth, deflated by CPI inflation, and ratios to GDP. The real rate of growth of housing loans is also considered. 11 Global liquidity is included in the form of global credit growth and gaps. 12 We also consider debt service costs. In particular, we use extended debt service ratio (DSR) series with respect to those in Drehmann and Juselius (2012), computed on high-quality (and sometimes confidential) data. 13 We include the aggregate DSR as well as sectoral DSRs for non-financial corporations and households. Finally, we include public debt, as a ratio to GDP, in the pool of credit-related indicators. 14
The macroeconomic variables we examine are real GDP y-o-y growth and the current account in percentage of GDP (on the properties of the current account as an early warning signal for banking crises, see Kauko (2012)). We also consider the M3 money aggregate, in terms of real y-o-y rate of growth and gap, and the real effective exchange rate. 15 With respect to property prices, house price growth (y-o-y, consumer price deflated) is considered, as well as gap measures. Moreover, we include in the dataset two standard property valuation measures, namely the house price to income ratio and the house price to rent ratio. 16 Finally, the market-based indicators included in our pool are the long (10-year) and short (3-month) interest rates, both deflated by subtracting the y-o-y CPI changes, as well as the deflated y-o-y growth rate of equity prices. 17
The dataset goes from 1970:Q1 to 2013:Q4; however, the last 4 years of data are excluded from the analysis (see previous section). To proxy for publication lags and taking a conservative stand, we lag all the variables by one quarter. In other words, the model aims at classifying the current quarter as pre-crisis or tranquil on the basis of data referring to no later than the previous quarter, although some information on conjunctural developments from higher-frequency indicators would already be available in real time.
11 The source for loans to households for house purchase is the ECB.
12 Global credit variables are computed as GDP (at PPP) weighted averages of broad credit growth rates and gaps. In particular, global credit growth is constructed by averaging the y-o-y credit growth rates across countries, deflated by subtracting the y-o-y changes of the national CPI. The countries considered for the construction of the global credit variables are the ones under study together with Brazil, Canada, China, Hong Kong, India, Indonesia, Japan, Korea, Mexico, Norway, Russia, Singapore, South Africa, Switzerland, Thailand and the US.
13 The DSR at time t is calculated using the standard formula for the fixed debt service costs (DSC_t) of an instalment loan and dividing it by income (Y_t):

DSR_t = DSC_t / Y_t = [ i_t / (1 − (1 + i_t)^(−s_t)) ] · D_t / Y_t

where D_t denotes the aggregate stock of debt, i_t denotes the average interest rate per quarter on the stock, s_t denotes the average remaining maturity on the stock and Y_t denotes quarterly aggregate income. The source for credit aggregates is the BIS, income data are sourced from Eurostat, while lending rates and the average loan maturity are sourced from the ECB (MFI Interest Rate statistics and MFI Balance Sheet Items statistics, respectively). The interest rate is the 3-month average money market interest rate from Eurostat.
14 Eurostat data.
15 The main source for real and nominal GDP data is the OECD; Eurostat data have been used whenever OECD series were not available or shorter (i.e. for Cyprus, Estonia, Greece, Latvia, Malta, Slovakia and Slovenia). The source for the current account balance is Eurostat. M3 is provided by the ECB. The real effective exchange rate is sourced from the IMF's IFS and from Eurostat for Estonia, Latvia and Slovenia.
16 These valuation measures are provided by the OECD in its house price database as indexes and are transformed by subtracting the long-term mean.
17 Interest rates are sourced from Eurostat, while the source for the stock price indexes is the OECD Main Economic Indicators database.
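The debt service ratio of footnote 13 can be sketched as follows, assuming the standard instalment-loan annuity formula for fixed debt service costs; variable names follow the footnote's definitions.

```python
def debt_service_ratio(D, i, s, Y):
    """Debt service ratio: the fixed per-quarter payment of an instalment
    loan, divided by income.
    D: aggregate stock of debt
    i: average interest rate per quarter on the stock
    s: average remaining maturity on the stock, in quarters
    Y: quarterly aggregate income
    DSR = [ i / (1 - (1 + i)**-s) ] * D / Y"""
    return i / (1.0 - (1.0 + i) ** (-s)) * D / Y
```

Two sanity checks: with one quarter of remaining maturity (s = 1) the payment equals the full stock plus one quarter of interest, while as s grows large the payment converges to pure interest, i · D.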

Classification Trees and the Random Forest
A binary classification tree is a partitioning algorithm which recursively identifies the indicators and the respective thresholds which best split the sample into the relevant classes, say pre-crisis and tranquil periods. The output of the predictive model is a tree structure like the one shown in Figure 4, with one root node, only two branches departing from each parent node (hence "binary" classification tree), each entering into a child node, and multiple terminal nodes (or "leaves"). Starting by considering all available indicators and threshold levels, the procedure selects the single indicator and threshold yielding the two purest subsamples in terms of some impurity measure. A standard impurity measure, which we also employ, is the Gini index:

G = Σ_i f_i (1 − f_i) = 1 − Σ_i f_i²

where f_i is the fraction of periods belonging to each category i in a given node, with i = 1, 2 in our case, i.e. pre-crisis and tranquil. The value of the Gini index will be 0 for a node which contains only observations belonging to the same class. The more mixed a sample is, the higher the Gini index will be, reaching a maximum of 0.5 in the case of two categories. It is possible to generalize the above expression for the Gini index in order to take into account different misclassification costs C_ij for the various classes. The Gini index can then be written as follows:

G = Σ_i Σ_j C_ij f_i f_j

with C_ii = 0 and C_ij reflecting the cost of assigning an observation belonging to category i to category j. In our case, for example, it could make sense to be conservative and assume that misclassifying a pre-crisis quarter as tranquil would have more serious consequences than vice-versa, implying that the cost of a banking crisis is in general larger than the cost of prudential pre-emptive measures. In other words, this would amount to assuming unbalanced policymakers' preferences biased against missing crises. Asymmetric misclassification costs will also impact the classification of the tree leaves.
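The cost-weighted Gini index above can be sketched as follows; with unit off-diagonal costs the function reduces to the standard Gini impurity.

```python
def gini(counts, costs=None):
    """Gini impurity of a node, optionally with misclassification costs.
    counts[i]: number of observations of class i in the node.
    costs[i][j]: cost C_ij of assigning a class-i observation to class j
    (C_ii = 0). With unit costs this reduces to 1 - sum(f_i**2)."""
    total = sum(counts)
    f = [c / total for c in counts]
    k = len(f)
    if costs is None:
        costs = [[0 if i == j else 1 for j in range(k)] for i in range(k)]
    return sum(costs[i][j] * f[i] * f[j]
               for i in range(k) for j in range(k))
```

A pure node yields 0; an even two-class split yields the maximum of 0.5; doubling the cost of missing a pre-crisis observation raises the impurity of mixed nodes accordingly.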
Once the first best split is selected, the algorithm proceeds recursively by further partitioning the two subsamples, i.e. finding the best split for each of them. The whole logical structure of the tree is thus constructed recursively, and the algorithm stops when some stopping rule becomes binding. 19 If the policymaker's preferences between missing a crisis (type 1 error) and issuing a false alarm (type 2 error) are balanced, an early warning will be issued if the relevant leaf is associated with a frequency of pre-crisis periods larger than 50%. However, policymakers' preferences after the global financial crisis are likely to have become biased against missing crises, implying a lower threshold.
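Growing such a tree with asymmetric preferences can be sketched with scikit-learn's `DecisionTreeClassifier`, using class weights as a stand-in for the misclassification costs C_ij; the data and the 2:1 weighting below are illustrative, not the paper's.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))            # two candidate indicators
y = (X[:, 0] > 0.8).astype(int)          # "pre-crisis" when indicator 0 is high

# class_weight penalizes missing a pre-crisis quarter (class 1) twice as
# much as a false alarm, mirroring unbalanced policymakers' preferences
tree = DecisionTreeClassifier(criterion="gini", max_depth=2,
                              class_weight={0: 1, 1: 2}).fit(X, y)
print(export_text(tree, feature_names=["indicator_0", "indicator_1"]))
```

The printed structure shows the selected indicator and threshold at each node, which is exactly the transparent output the benchmark early warning tree is meant to deliver.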
The main drawback of the tree technology is that, while it can be very good in-sample, it is known not to be particularly robust when additional predictors or observations are included. We overcome this problem by using the Random Forest method proposed by Breiman (2001). This framework is a state-of-the-art machine learning technique which involves bagging, i.e. bootstrapping and aggregating, a multitude of trees. Each of the trees in the forest is grown on a randomly selected set of indicators and country quarters. 20 Analogously to the tree, the forest allows for interactions across the various indicators, is able to handle large datasets, is not influenced by outliers and does not require distributional or parametric assumptions. Once a new quarter of data is available, the prediction of the forest will be based on how many trees in the forest classify it as a pre-crisis or tranquil period, and it will also reflect policymakers' preferences. Each of the trees in the forest is in itself an out-of-sample exercise, as the observations that are not used to grow the tree (so-called out-of-bag observations) can be put down the tree to get a classification. It is therefore possible to compute the total misclassification error of the forest.
19 Theoretically, one can always grow a tree which has enough branches to yield pure leaves, i.e. correctly classify all sample data, unless the data is contradictory in some dimension. However, to avoid overfitting, such a tree should be pruned by replacing some parent nodes with leaves.
Besides being an extremely powerful predictor, the Random Forest makes it possible to measure the importance of each of the input variables by evaluating the extent to which it contributes to improving the prediction. In practice, this is done by randomly permuting the values of the n-th indicator in the out-of-bag cases, and comparing these tree predictions to those obtained without permuting the values. If the error rate increases substantially when the values of an indicator are permuted, the indicator conveys relevant information for an accurate classification. If, on the contrary, there is no difference between the two error rates, the indicator is useless.

[Footnote 20: Following the Random Forest literature, the number of indicators selected for each tree is equal to √N, where N is the total number of indicators. At each repetition, 70% of the observations are sampled with replacement. However, the Forest is not very sensitive to the value of these parameters.]
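The permutation test can be made concrete as follows. This is an illustrative sketch, not the paper's implementation, assuming each tree is stored as a triple of (prediction function, indicator subset, out-of-bag row indices):

```python
import random

def permutation_importance(forest, X, y, indicator, seed=0):
    """Average increase in out-of-bag misclassification when the values
    of one indicator are randomly permuted across the out-of-bag cases."""
    rng = random.Random(seed)
    increases = []
    for tree, feats, oob in forest:
        if indicator not in feats or not oob:
            continue  # tree never saw this indicator, or no out-of-bag rows
        pos = feats.index(indicator)
        base = sum(tree([X[i][j] for j in feats]) != y[i] for i in oob)
        # permute the indicator's values across the out-of-bag cases
        perm = [X[i][indicator] for i in oob]
        rng.shuffle(perm)
        permed = 0
        for v, i in zip(perm, oob):
            row = [X[i][j] for j in feats]
            row[pos] = v
            permed += tree(row) != y[i]
        increases.append((permed - base) / len(oob))
    return sum(increases) / len(increases) if increases else 0.0
```

An indicator whose permutation leaves the error rate unchanged (importance near zero) is, in the terminology above, useless.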

Results from the Random Forest
The Random Forest could be used as a regular tool for policy purposes. Indeed, based on the error rate of a 100,000-tree forest we have grown on all of the indicators, the chance of misclassifying an incoming quarter of data is 6%. A standard metric for evaluating the performance of a classifier across a range of preferences is the Area Under the Receiver Operating Characteristic curve (AUROC), where the ROC curve plots the combinations of true positive rate (TPR) and false positive rate (FPR) attained by the model. It is constructed by varying the forest's 'early warning' threshold, i.e. the required fraction of trees classifying a particular observation as pre-crisis, beyond which that observation is actually classified as pre-crisis. The ROC curve of a random classifier will tend to coincide with a 45 degree line, corresponding to an AUROC of 0.5, while the AUROC of a good classifier will be closer to 1 than to 0.5. Chart 2 shows the ROC curve of the Random Forest, corresponding to an AUROC above 0.9. This result is derived assuming policymaker preferences biased against missing crises (in particular, we set misclassification costs such that the cost of misclassifying a pre-crisis quarter is twice as large as the cost of misclassifying a tranquil quarter) and is robust to assuming balanced preferences.
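The threshold sweep underlying the ROC curve can be illustrated with a short stand-alone routine (our own sketch, not the paper's code): sorting observations by their forest vote share and lowering the threshold one observation at a time traces out the (FPR, TPR) pairs, and the trapezoidal rule gives the AUROC.

```python
def roc_auc(scores, labels):
    """Build the ROC curve from classifier scores (e.g. forest vote
    shares) and binary labels, and integrate the area under it."""
    pos = sum(labels)
    neg = len(labels) - pos
    # sort by score descending: lowering the threshold adds one point at a time
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    # trapezoidal integration over the (FPR, TPR) points
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2
    return points, auc
```

A classifier that ranks every pre-crisis quarter above every tranquil quarter attains an AUROC of 1, while random ranking hovers around 0.5, matching the 45 degree benchmark described above.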
Notwithstanding the remarkably good performance of the Random Forest, we acknowledge that this is a black-box model and its predictions would be hard to defend, in particular if they were to support the activation of a macroprudential instrument. Therefore, in this paper we rely on the Random Forest to identify the key indicators, on which we then construct our benchmark tree. By doing so, we ensure that the variables selected to grow the tree are truly the most important ones in the pool, and we rule out the possibility that the tree selects a relatively weak indicator which happens to seem useful in-sample but would not survive an out-of-sample robustness check. Chart 3 shows the ranking of the indicators in the forest, with the bars representing a measure of the increase in the classification error associated with randomly permuting the values of the considered indicator across the out-of-bag cases. This measure is computed for every tree, then averaged and divided by the standard deviation over all of the trees. Not surprisingly, since the model is designed to predict banking crises associated with a domestic credit boom, the most important indicator turns out to be bank credit in the form of its ratio to GDP, followed closely by the gap derived with a very slowly adjusting trend. The level of broad credit and the Basel gap rank lower than the narrow credit counterparts, though still in the top half of all the indicators.

[Footnote 21: Given that the Forest includes an element of randomness, multiple runs of the algorithm on the same dataset will not necessarily yield the same indicators' ranking, in particular if the error associated with different indicators is similar. A robustness check based on several 1000-tree forests indicates that there could be a difference of at most two positions with respect to the ranking illustrated here. The exact ranking is anyway not the focus here, as we are only interested in telling the good indicators from the bad ones.]
The Lucas critique, however, applies: economic agents' decisions are not policy-invariant, so one could expect that, as bank lending regulation tightens, such activities will increasingly shift to the non-banking sphere, supporting the use of the total credit aggregate as a more comprehensive indicator for the future.
In general, credit to GDP ratios appear helpful in assessing how vulnerable a country is to excessive structural leverage rather than conjunctural developments, and are therefore useful in conditioning the information provided by gaps and rates of growth. Global liquidity, in the form of both the global credit gap and the growth rate, turns out to be another key concept, ranking among the five most important indicators. The remaining two indicators among the top five are the level of household credit and the aggregate debt service ratio. Immediately following the top six indicators are some measures relating to house prices, namely the house price to income ratio, the house price gap and house price growth. Equity price growth ranks a little lower. Indeed, heated asset price growth might be associated with excessive credit growth fuelling a growing bubble. After considering the housing market, the Random Forest suggests that the real short term rate should be looked at next, most likely because a low rate may encourage risk-taking in a search-for-yield behavior. Also among the top half of all the indicators are the household debt service ratio, bank credit growth, the NFC credit to GDP ratio and M3 gaps.

The first caveat is that the better the tree is at fitting in-sample data, the purer the leaves it will yield, with associated in-sample frequencies close to 1 or 0.
However, in assessing a country's situation one should consider whether the relevant indicators exceed the respective thresholds only marginally.
The second caveat relates to country specificities, which cannot be captured by the model. With respect to this leaf, for example, the concept of the DSR could be misleading for specific countries that, for reasons not harmful to financial stability, have structurally high private sector debt. In such a case, a net debt concept taking into account accumulated private sector wealth would be more suitable. If the bank credit to GDP threshold of 92% is not breached, the next relevant indicator is the bank credit gap, with a threshold of 3.6 p.p. If this threshold is breached, the crisis probability increases to above 60%. In this case, there would be a role for macroprudential tools such as the countercyclical capital buffer, as the credit gap can be associated with cyclical systemic risk. Looking at the left branches of the tree, the main messages are as follows. If the DSR is below 10.6%, the crisis probability is negligible. A relatively large number of countries, however, are in the middle range, with a DSR between 10.6% and 18%. For these countries, essentially depending on the sign of the M3 gap, different variables become relevant.
These indicators relate to the following: i) house prices, in the form of house price growth and gap and in relation to income; ii) equity prices; iii) the Basel gap; iv) the short term real interest rate; v) bank credit level and growth; and vi) household credit. As an example, a country falling in the 'warning' leaf associated with a house price to income ratio 27 points above its long term average might consider adopting measures such as caps to loan-to-value and loan-to-income ratios.
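The upper branches of the benchmark tree can be sketched as a simple decision routine. The thresholds below (DSR 18% and 10.6%, bank credit at 92% of GDP, household credit at 54.5% of GDP, bank credit gap of 3.6 p.p.) and the associated leaf probabilities are those quoted in the text; the function name is ours, deeper nodes of the actual tree are omitted, and the sketch is purely illustrative.

```python
def classify_upper_branches(dsr, bank_credit_gdp, hh_credit_gdp, bank_credit_gap):
    """Walk the upper branches of the benchmark early warning tree.
    Returns an indicative crisis probability, or None where the full
    tree would consult further indicators not reproduced here."""
    if dsr > 18:                      # debt service ratio threshold
        if bank_credit_gdp > 92:      # bank credit to GDP threshold
            if hh_credit_gdp > 54.5:  # household credit threshold
                return 0.90           # 'warning' leaf quoted in the text
            return None               # deeper nodes not reproduced here
        if bank_credit_gap > 3.6:     # bank credit gap, in p.p.
            return 0.60               # "above 60%" per the text (CCyB territory)
        return 0.0                    # leaf with zero crisis probability
    if dsr < 10.6:
        return 0.0                    # 'tranquil' leaf, negligible probability
    return None                       # DSR in the 10.6-18% middle range
```

A leaf's probability is simply the in-sample frequency of pre-crisis quarters that ended up in it, which is why the output is a conditional frequency rather than a fitted parameter.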
With respect to the in-sample predictive performance of this benchmark tree, the true positive rate and the false positive rate (or share of type 2 errors) are equal to 85% and 4%, respectively, while the share of type 1 errors is 15%. The noise to signal ratio is 5%. A more sophisticated measure of the usefulness of the model, taking into account the policymaker's greater aversion towards type 1 errors, indicates that a policymaker using this tree increases his/her utility by 65% compared with ignoring it.[23]
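These performance figures follow mechanically from a confusion matrix. The sketch below is our own reading of a Sarlin-style relative Usefulness measure, with `mu` the policymaker's relative aversion to missed crises (mu = 2/3 corresponds to the 2:1 misclassification-cost ratio used above); the exact formulation in the paper may differ.

```python
def ew_performance(tp, fp, fn, tn, mu=2/3):
    """Early-warning metrics from a confusion matrix of pre-crisis
    (positive) and tranquil (negative) country-quarters."""
    tpr = tp / (tp + fn)                  # true positive rate
    fpr = fp / (fp + tn)                  # false positive rate (type 2 share)
    type1 = fn / (tp + fn)                # share of missed crises
    nts = fpr / tpr                       # noise-to-signal ratio
    p1 = (tp + fn) / (tp + fp + fn + tn)  # unconditional crisis frequency
    p2 = 1 - p1
    # loss of using the model vs. the loss of ignoring it entirely
    loss = mu * type1 * p1 + (1 - mu) * fpr * p2
    u_abs = min(mu * p1, (1 - mu) * p2) - loss
    u_rel = u_abs / min(mu * p1, (1 - mu) * p2)
    return {"TPR": tpr, "FPR": fpr, "type1": type1, "NtS": nts, "U_rel": u_rel}
```

Under this formulation a perfect classifier attains a relative Usefulness of 1, and a model no better than always issuing (or never issuing) a warning attains 0 or less.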

Country classification
According to end-2012 data, the countries above the DSR threshold of 18% are Belgium, Cyprus, Denmark, Greece, Ireland, Italy, the Netherlands, Portugal, Spain, Sweden and the UK. Almost all of these end up in the 'warning' leaf associated with bank credit at more than 92% of GDP and household credit at more than 54.5% of GDP, characterized by a 90% crisis probability.
However, it should be noted that Italy and Greece breach the first and the second threshold, respectively, by only a couple of percentage points. Based on the available data, the probability of a banking crisis in Cyprus would be 35%, which is the in-sample crisis frequency associated with the bank credit to GDP node. Belgium breaches neither the 92% bank credit to GDP threshold nor the 3.6 p.p. bank credit to GDP gap threshold, ending up in a leaf characterized by a zero crisis probability. With respect to the countries for which the DSR is below 18%, Luxembourg and Slovakia end up in the 'tranquil' leaf associated with a DSR lower than 10.6%. France, Slovenia, Austria, Finland and Latvia do not breach the -0.24 p.p. M3 gap threshold, while the real short term rate is below -0.5% in all of these countries.

[Footnote 23: See Sarlin (2013).]
Due to missing data on bank credit for Slovenia and Latvia, these two countries remain associated with the parent node characterized by a 26% crisis probability, while Austria, Finland and France do not breach the 7.3% y-o-y growth threshold and therefore end up in a 'tranquil' leaf. Germany breaches the M3 gap threshold only very marginally, as the gap is still negative, while it does not breach any of the housing-market related nodes, ending up in a leaf characterized by zero crisis probability. The M3 gap in Estonia and Malta is positive. Due to data availability issues, these two countries cannot be classified into any terminal node; the crisis probability associated with the parent nodes they end up in is 8% (house price to income node) and 13% (equity price growth node), respectively.
9 Out-of-sample exercise

The tree built on the indicators listed above (excluding global liquidity) would have had the M3 gap at the root node (see Figure 6). Germany and [...]

[...] systemic banking crises, the onus is on those who aim to use judgement alone to justify why macro-prudential policy tools are not activated. Second, the intuitive nature of a decision tree model and its easy visualization is likely to increase acceptance of an analytical approach as a starting point for policy discussions. As Section 7 has shown, the approach can be used to also trigger discussions on country specificities affecting the risk assessment. Third, a further advantage of the tree model is that, depending on the characteristics of the leaf associated with a certain crisis probability, the nature of the vulnerability can also be identified, which in many cases would then suggest the use of a specific policy instrument over another.
We build an early warning system aiming at supporting policy decisions on when to activate macroprudential tools targeting excessive credit growth and leverage. Together with total credit to GDP deviations from trend (the so-called 'Basel gap') we consider a battery of indicators as a policy guide, including credit ratios and real estate indicators.
By using decision trees, we build a multivariate predictive model which is at the same time extremely accurate and very easy to interpret. Based on the experience of EU countries over the last 40 years, it applies decision tree learning to the problem of identifying excessive credit growth and leverage with a sufficient lead time to allow policy reactions. One of the main advantages of the presented approach is that it takes into account the conditional relations between various indicators when setting early warning thresholds.
At the same time, the model is able to give an indication on which macroprudential tool could be best suited to address specific vulnerabilities.
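The core mechanism by which decision tree learning derives conditional early warning thresholds is the recursive search for the split that best separates pre-crisis from tranquil observations. A minimal sketch of that search for a single indicator, using the standard CART Gini criterion (the function name and data layout are ours, and the paper's software may use a different impurity measure), is:

```python
def best_split(X, y, feature):
    """Find the threshold on one indicator that minimizes the weighted
    Gini impurity of the two resulting child nodes (CART criterion)."""
    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)  # share of pre-crisis observations
        return 2 * p * (1 - p)
    values = sorted({row[feature] for row in X})
    best = (None, float("inf"))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2              # candidate threshold between observed values
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best
```

Applied recursively over all candidate indicators, this search is what yields thresholds that are conditional on the splits made higher up in the tree, rather than unconditional univariate cut-offs.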
The proposed early warning system can be regarded as a useful common reference point informing policy makers when using their judgement. Indeed, it is crucial that the use of judgement be firmly anchored to a clear set of principles to promote sound decision-making in the operationalization of macroprudential instruments.

ECB Working Paper 1723, August 2014
With respect to the in-sample predictive performance, this tree yields a true positive rate of 88% and a false positive rate of 2%, while the share of missed crises is 12%. Notice that, although the benchmark tree described in Section 7 is constructed by placing a higher weight on Type 1 errors, it still yields a higher share of missed crises compared to the balanced-preferences tree, due to the fact that some branches have been pruned and therefore both trees are in some sense 'suboptimal'. Finally, the noise to signal ratio associated with this tree is 2%, while the relative Usefulness measure, i.e. the gain from using this model compared to ignoring it, is equal to 86%.

[Figure caption, beginning truncated: "... income ratio is in terms of index points above/below its long term average, while p.p. stands for percentage points. In each terminal node (leaf) of the tree the crisis probability is indicated, based on the frequency of pre-crisis quarters ending up in that particular leaf, considering the historical data on which the tree has been grown. The total number of country/quarters ending up in each leaf is also indicated. When the crisis probability associated with a leaf exceeds 30% the leaf is labelled as a 'warning' leaf."]

Figure 8: The early warning tree derived by assuming balanced preferences. The threshold for the house price to income ratio is in terms of index points above/below its long term average, while p.p. stands for percentage points. In each terminal node (leaf) of the tree the crisis probability is indicated, based on the frequency of pre-crisis quarters ending up in that particular leaf, considering the historical data on which the tree has been grown. The total number of country/quarters ending up in each leaf is also indicated. When the crisis probability associated with a leaf exceeds 50% the leaf is labelled as a 'warning' leaf.