A proximity based macro stress testing framework

Abstract: In this paper a non-linear macro stress testing methodology with a focus on early warning is developed. The methodology builds on a variant of Random Forests and its proximity measures. It is embedded in a framework in which naturally defined contagion and feedback effects transfer the impact of stressing a relatively small part of the observations to the whole dataset, allowing a stressed future state to be estimated. It will be shown that contagion can be derived directly from the proximities, while iterating the proximity based contagion leads to naturally defined feedback effects. Since the methodology is based on Random Forests, the framework can be estimated on large numbers of risk indicators up to big data dimensions, fostering the stability of the results while reducing inaccuracies in estimated stress scenarios by stressing only a small part of the observations. This procedure allows accurate forecasting of events under stress and of the emergence of a potential macro crisis. The framework also estimates a set of the most influential economic indicators leading to the potential crisis, which can then be used as indications for remediation or prevention.


Introduction
Stress testing is of increasing importance in all industries. Regulatory requirements as well as renewed accounting standards are asking for macro stress tests to better safeguard against a crisis. Macro stress testing is a relatively new field. It requires testing stress effects within the greater and most significant part of the financial system and aims at analyzing its resilience as a whole. The merits of macro stress testing are seen in the context of either crisis management or early warning indication. To manage a crisis, a stress scenario is applied to known key risk indicators (KRIs) and a remediating action is derived, for example the determination of economic capital, whereas in an early warning framework the KRIs themselves are identified. In the case of early warning, scenario design is crucial ([7]). Ideally, the macroprudential scenarios should be plausible, severe and suggestive of mitigation opportunities ([10]). Apart from the obvious choice of historical scenarios, measures for the plausibility of self-constructed, hypothetical scenarios, as well as algorithms and methods to find them, have been suggested. However, in the shadow of the financial crisis, scholars (see for example [7]) suggest that scenarios might have to be implausibly severe to include the expectation of the unexpected, while especially in the case of historical scenarios scholars have expressed doubt as to whether early warning frameworks can actually work (see for example [7]). The failure of prediction ahead of the financial crisis in 2007/08 indeed casts doubt on the usage of historical data to assess the probability of an upcoming crisis.
From a methodological point of view, researchers find that most currently performed macro stress tests do not go beyond the immediate effects in the market and could be enhanced by a longer time horizon and corresponding correlation/contagion and feedback effects, preferably in a non-linear framework ([7]). Additionally, it is often assumed that modeled interdependence remains stable over time, while in stressed states such relations can change quickly ([7]).
The BIS promotes the view that macro stress testing is a toolbox, not a single tool. This paper adds a tool by developing a framework that is a big data suitable model for adverse economic movements with high predictive capabilities, requiring only a few stressed inputs, with the option of finding policy indicators. The proposed model is based on the proximities of a Random Forests variant on an empirical dataset. The framework does not focus on stressing individual risk indicators, as is usually done, but on stressing the values of all risk indicators on a subset of the observations (for example financial institutions). More specifically, a suitable sample of observations is chosen from a current state dataset and, for each of these observations, the values of all risk indicators are stressed. This stress should reflect the values of the variables of the stressed observation in a future state. How these observations are stressed is not the subject of this paper; it can, however, be done by taking into account common econometric models of interaction between the variables or by expert judgment. Once the chosen sample is stressed, all other, non-sampled observations are infected by the values of the variables of the stressed observations by means of the Random Forests proximities. This step is often called contagion. On the infected, estimated stress state data (future) a new Random Forests model is built. By iterating the contagion model, feedback effects are produced. The model thus encompasses the concepts of contagion and feedback effects, which are inherently defined in the model. Based on the number of stress-caused events in the estimated stress state data, the framework indicates whether a potential crisis could emerge due to the applied scenario.
In case the framework suggests a crisis, the importance measures defined within the Random Forests algorithm allow the identification of the variables which had the highest influence on the classification of the observations. These variables can be used to identify remediation actions against the potential crisis. Thus the proposed proximity based stress testing framework can be used as an early warning indicator as well as an instrument to identify actions to manage or prevent a crisis. This paper shows that the resulting Random Forests model predicts future stress events accurately using a relatively small initial sample of observations, while the number of risk indicators is theoretically not limited: being based on Random Forests, the methodology can cope with a large number of indicators, is thus suitable for big data analysis and can consequently model national and international KRIs together. The characteristics of the framework offer the advantage of robustly modeling the interdependence between observations by applying as many risk indicators as possible and by reducing the impact of estimation errors in stress scenarios. The latter is achieved by using only a few stressed inputs and by being able to choose either observations which can be stressed in a straightforward way or observations for which the values of the stressed indicators are known with a high degree of certainty. Again, it is assumed that a model of how to stress observations is already present and is applied to only a few, most suitable observations.
The remainder of the paper is structured as follows: the second section positions the paper in the current literature on macro stress testing and early warning frameworks, with special consideration of Random Forests. In the third section the concepts of the Random Forests variant of recursive conditional partitioning and of the proximities are introduced, and the mathematical foundation of the proposed model is laid out. The fourth section applies the model in an empirical analysis and elaborates the policy indications from the model. The final section concludes the paper.

Literature review
The contribution of this paper is situated in three areas of the macro stress testing literature: in general macro stress test modeling, in modeling early warning frameworks and in the application of machine learning algorithms.
Current trends in macro stress testing encompass integrating different risks as well as contagion and feedback effects. The idea of contagion is the transmission of a shock from a relatively small number of market participants (e.g. banks, sovereigns) to other or most of the other participants. Including the concept of contagion has become a common feature in macro stress testing. Some of the earlier works are by Allen and Gale [3], who model contagion effects of claims between banks; De Bandt and Hartmann [11], who consider contagion effects in the broader context of systemic risk; and Upper and Worms [33], who specifically analyze contagion in the German interbank market.
Feedback effects describe the effects of stress and contagion spreading between the market participants in the periods after the shock and contagion have occurred. Feedback effects are, for example, modeled by Jacobsen and Raszbach [23], who use an aggregate vector autoregressive model integrating several modules linking risk factors and balance sheets of corporates to show feedback from financial stability to the economy.
Elsinger et al. [15] integrate market risk, credit risk, interest rate risk and counterparty credit risk in the Austrian interbank sector. Boss et al. [8] extend the model of Elsinger et al. [15] to a three year horizon and incorporate profit risk. Considering the rules of accounting, Drehmann et al. [13] create a stress test integrating credit and interest rate risk by modeling assets and liabilities simultaneously. State of the art stress tests also increasingly try to include liquidity risk (see for example [5], [34]). The currently most comprehensive model is the risk assessment model of the Bank of England ([1]), which also includes feedback effects.
Additionally, in their studies, Juselius and Kim [24] and also Drehmann et al. [12] have found that the macro econometric relationships are mostly non-linear. The BIS [7] has in its various analytical publications assessed that the focus on non-linearities and contagion/feedback effects is a priority, while it doubts the potential of modeling network effects or aggregation models.
Taking up current research, the proposed model is non-linear, incorporates contagion and feedback effects, and it will be shown that the stress tests performed ahead of the crisis are accurate. On the other hand, the model is integrated in the sense that the dependent variable depends on various macroeconomic factors from different areas of risk, but it only models their influence on each other indirectly through changes in proximities. However, since the proposed model does not specify how the stressed observations are built, another model from the literature which integrates all risk types can be applied to generate the stressed sample of observations which is used in the proposed framework.
Generally, the usage of stress testing for early warning indication is not recommended by the BIS [7]. Reasons are the frequent lack of non-linearity and the usage of historical scenarios. The proposed framework is focused on early warning, yet it is not built in the classic way. The early warning literature in finance mainly encompasses two approaches: firstly signaling approaches, where a threshold for specific early warning indicators is identified, and secondly logit/probit approaches, which model the effects of risk indicators to identify early warning indicators. The proposed framework, on the other hand, estimates the number of events under stress, modeling the interaction of a large or even vast number of indicators, and then reverts back to identify the most important of the indicators for remediation of the stress effects. Additionally, it is inherently non-linear. However, the initial estimation is still done on empirical data.
A recent representative of the signaling approach is Pasricha et al. [27], who apply an imbalance indicator model encompassing a large number of potential indicators. Alternatively, in a recent work Babecky et al. [4] focus on developed economies and find by Bayesian model averaging that domestic housing prices, share prices and credit growth, as well as the global variable private credit, are KRIs. The proposed model, on the other hand, encompasses around 100 indicators and covers mostly developed countries and some emerging economies. Also, for big data applications Random Forests accepts many more indicators.
With a focus on the proposed application of a Random Forests model, several papers precede this one. The first paper is by Gosh and Gosh [18], followed by Frankel and Wei [17], who both apply decision trees to currency crises. Manasse and Roubini [25] use binary recursive trees on sovereign crises. The succeeding paper by Savona and Vezzoli [29] deals again with sovereign crises, while Duttagupta and Cashin [14] and Manasse et al. [26] study banking crises in emerging markets. Alessi and Detken [2] apply regression trees to excessive credit growth and leverage measurement. Savona and Vezzoli [29], Manasse et al. [26] as well as Alessi and Detken [2] all run some sort of Random Forests on their sample. In particular, Alessi and Detken run the classic regression Random Forests by Breiman [9] to identify the most important variables to build the forest and construct a final decision tree with the important variables only. This paper likewise applies a Random Forests model, but firstly not the classic Random Forests by Breiman but the conditional recursive partitioning forest ([21]/[31]), and secondly not with the aim of building a final tree like Alessi and Detken [2] but to construct a stressed state dataset (future). The most important variables are likewise identified. The conditional recursive partitioning framework is chosen over the classic Random Forests because the latter is known to be biased in the choice of splitting variables and thus in the assessment of the most important variables. The classic Random Forests prefers continuous variables to factors or discrete variables, and variables with many different values to those with fewer values on the observations. Thus, to identify the most important variables, the unbiased method is preferred.
Additionally, the proposed model is based on a classification forest and not a regression forest, because the classification model can be stressed and interpreted in an intuitive way: a stress situation will result in more observations being classified as events. On the other hand, regression trees and a regression forest do not produce new estimates of the dependent variable when new data is applied, but allocate the observations to average values of the initial dependent variable. Of course, a stressed state causes more observations to be in nodes with higher predefined average values. Nonetheless, the interpretation of this outcome is much more difficult, especially since the highest average result was estimated before the stress situation.

Mathematical background

Recursive Conditional Partitioning
To define the algorithms the following dataset is introduced: Let $Y \in \{0,1\}^m$ be $m$ observations of the outcome of a binary event. Let $X \in \mathbb{R}^{m \times n}$ be a collection of $m$ observations of $n$ independent variables (risk indicators). A dataset is then denoted by
$$O := (Y, X) \in \{0,1\}^m \times \mathbb{R}^{m \times n}.$$
The $m$ rows $O_i$ of the dataset $O$ will henceforth be referred to as the observations.
The conditional recursive partitioning forest and the classic Random Forests are very similar. The main difference is the splitting framework: within the conditional recursive partitioning framework the variables for splitting are selected by maximizing the association with the dependent variable, calculated by a linear statistic. Like the classic Random Forests algorithm, the nodes in each tree are split on a random sample of the total variables. Unlike the classic Random Forests, the trees are not grown on bootstrap samples but on samples without replacement. Strobl et al. [31] have shown that bootstrap samples increase the bias in variable selection identified in the classic Random Forests. The conditional recursive partitioning framework then grows each tree in the forest in accordance with the following rules ([21]):
1. For each tree a training sample $TS \subset O$ of a predefined size $s$, $s < m$, is drawn without replacement.
2. At each node, test the global hypothesis of independence between $Y_{TS}$ and $X_{TS}$. If the hypothesis cannot be rejected, independence is assumed and the growth of the tree is stopped in the respective branch. If the hypothesis is rejected, in accordance with a predefined confidence level, the association of each independent variable with the dependent variable is tested and the variable with the highest association, as measured by the highest statistical significance (lowest p value), is chosen as the variable to split on.
3. On the variable with the highest association, the point for the best binary split is chosen as the value which maximizes the test statistic for association. The data in the respective node is split by that value as in the classic Random Forests.
4. The steps are repeated within each tree, for all trees in the forest, until the global null hypothesis can no longer be rejected or another stopping criterion, such as a minimum number of observations in the respective nodes, applies.
Owing to the permutation testing framework of Strasser and Weber [30], in which all possible permutations of the values in the learning sample are used, the following test statistics do not require knowledge of the distribution of the tested random variables:
1. The first step is the general linear statistic measuring the association between $Y_{TS}$ and an individual variable $X_{TS.j}$:
$$T_j(TS, w) = \operatorname{vec}\Big(\sum_{i=1}^{m} w_i\, g_j(X_{ij})\, h(Y_i)^{\top}\Big).$$
The variable to split on is $X_{TS.j^*}$ with $j^* = \operatorname{argmin}_{j=1,\dots,n} P_j$ and
$$P_j = P_{H_0^j}\big(c_{quad}(T_j(TS, w), \mu_j, \Sigma_j) \ge c_{quad}(t_j, \mu_j, \Sigma_j) \mid S(TS, w)\big), \qquad c_{quad}(t, \mu, \Sigma) = (t - \mu)\,\Sigma^{+}\,(t - \mu)^{\top},$$
where $t_j$ is the observed value of the statistic, $w$ are the case weights, $h$ is the influence function applied to the response, $\Sigma$ is the covariance matrix, $\Sigma^{+}$ is the Moore-Penrose inverse of the covariance matrix, $\mu$ is the mean and $S$ is the permutation of the responses as developed by Strasser and Weber [30]. Because these statistics are applied to permutations of the samples, they are conditioned on them. In the case of classification the function $g_j$ is the identity mapping or, if a nominal variable with $K$ levels is used, the unit vector $e_K(k)$ with value 1 at level $k$ and zero elsewhere. The vec-operator turns a matrix into a column vector by stacking its columns.
2. If the global null hypothesis cannot be rejected on the basis of the aggregated p values of the $T_j$ tests for association, that is, if no p value is lower than a predefined level, the growth of the classification tree is stopped. Hothorn et al. [20] suggest using Bonferroni adjusted p values or minimum p values for aggregation.
3. Once the variable $X_{TS.j^*}$ with the highest association with the dependent variable is found, a similar test statistic is applied: the best split value on the chosen variable is found by maximizing the test statistic over all possible subsets of the set of values.
Strobl et al. [31] have shown the framework to be unbiased in the choice of splitting variables, and thus the influence of specific variables can be interpreted. For further details please refer to [30] and [21]. The focus in this paper, however, is on the proximity measure, which will therefore be defined in detail after the definition of a random forest.
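The variable selection step described above can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the conditional inference procedure itself: it assumes a single numeric statistic per variable (the sum of the values over the event observations), approximates the permutation distribution by Monte Carlo shuffling instead of the conditional moments of Strasser and Weber, and uses a Bonferroni adjustment for the global test. The function names and the parameters `n_perm` and `alpha` are hypothetical and chosen for illustration only.

```python
import random


def linear_statistic(x, y):
    """Sum of x over the observations with y == 1 (g_j taken as the identity)."""
    return sum(xi for xi, yi in zip(x, y) if yi == 1)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation p value for the association between x and binary y."""
    rng = random.Random(seed)
    expected = sum(y) * (sum(x) / len(x))  # permutation mean of the statistic
    observed = abs(linear_statistic(x, y) - expected)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permute the responses, keep x fixed
        if abs(linear_statistic(x, y_perm) - expected) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(X, y, alpha=0.05, n_perm=2000):
    """Return (index of the variable to split on, raw p values).

    The index is None when the Bonferroni adjusted global test cannot
    reject independence, which corresponds to stopping the branch.
    """
    n_vars = len(X[0])
    p_values = []
    for j in range(n_vars):
        column = [row[j] for row in X]
        p_values.append(permutation_p_value(column, y, n_perm, seed=j))
    adjusted = [min(1.0, p * n_vars) for p in p_values]
    j_star = min(range(n_vars), key=lambda j: adjusted[j])
    if adjusted[j_star] > alpha:
        return None, p_values
    return j_star, p_values


# Hypothetical toy data: variable 0 separates the classes, variable 1 is noise.
y_demo = [0] * 10 + [1] * 10
X_demo = [[float(i), float(i % 2)] for i in range(20)]
j_demo, p_demo = select_split_variable(X_demo, y_demo)
```

On the toy data the first variable is selected, while a dataset containing only the noise variable leads to a stop of the branch.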
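Definition 1. A random forest is a function $RF: \{0,1\}^m \times \mathbb{R}^{m \times n} \to \{0,1\}^m$, $O \mapsto RF(O)$, which assigns a predicted class to each observation, assuming a binary (0,1) classification. The function behind RF is the conditional recursive partitioning framework with sampling of unique records only when building the trees.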
The proximity measure specifies a concept of distance between two observations in a Random Forests model.

Definition 2.
For two observations $O_i$ and $O_j$, $i, j \in \{1, \dots, m\}$, in a random forest, the proximity $\rho_{ij}$ is defined as the share of trees in the forest in which both observations are in the same terminal node. Consequently $(\rho_{ij})_{i,j} \in \mathbb{R}^{m \times m}$ is the matrix of mutual proximities between all $m$ observations.
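As a minimal sketch of Definition 2, the following Python function computes the proximity matrix from terminal node memberships. It assumes that the terminal node of every observation in every tree is already available as a nested list `leaf_ids` (a hypothetical input; how it is obtained depends on the forest implementation used).

```python
def proximity_matrix(leaf_ids):
    """Share of trees in which two observations share a terminal node.

    leaf_ids[t][i] is the terminal node of observation i in tree t.
    """
    n_trees = len(leaf_ids)
    m = len(leaf_ids[0])
    counts = [[0.0] * m for _ in range(m)]
    for tree in leaf_ids:
        for i in range(m):
            for j in range(m):
                if tree[i] == tree[j]:
                    counts[i][j] += 1.0
    return [[c / n_trees for c in row] for row in counts]


# Hypothetical example: 4 trees, 4 observations.
leaf_ids_demo = [
    [1, 1, 2, 2],
    [1, 1, 1, 2],
    [3, 3, 4, 4],
    [1, 2, 2, 3],
]
rho_demo = proximity_matrix(leaf_ids_demo)
```

In the example, observations 0 and 1 share a terminal node in three of the four trees, so their proximity is 0.75, while observations 0 and 3 never meet and have proximity 0.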
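The proximity measure has the following characteristic, which follows directly from Definition 2: $\rho_{ii} = 1$ and $0 \le \rho_{ij} = \rho_{ji} \le 1$ for all $i, j \in \{1, \dots, m\}$.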

Mathematical Derivation of the Proximity based Stress Testing Framework
The idea of proximity based stress testing is simple: the closer two observations are, the more likely they are to influence each other. Additionally, there is no need to stress all observations but only a preferably small share of the total number of observations, and then let the contagion and feedback effects do the rest.
In the end, the aim is to take a dataset of interest (current state data), choose a specific number of observations, stress those based on expert judgment or on an econometric model, and apply the proposed model to construct, from the stressed sample, a stressed future dataset by contagion and feedback effects. On the stressed dataset a new Random Forests model is built, resulting in the estimated stressed state of all observations and a list of the most important variables leading to it.
In detail, in the proposed framework the contagion and feedback effects are produced by proximity weighted averages of the stressed inputs or, for the purpose of modeling feedback, by repeated application of proximity weighted averages. A proximity weight is simply the relative proximity between two observations $i$ and $j$:
Definition 3. Definition of proximity weights $\rho^{\omega}$:
$$\rho^{\omega}_{ij} := \frac{\rho_{ij}}{\sum_{k=1}^{m} \rho_{ik}}$$
with $i, j \in \{1, \dots, m\}$. Consequently $W := (\rho^{\omega}_{ij})_{i,j} \in \mathbb{R}^{m \times m}$ is the matrix of proximity weights. Note that $W$ is not symmetric.
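The proximity weights and a single contagion step can be sketched in a few lines of Python. The sketch assumes that the weights are obtained by dividing each row of the proximity matrix by its row sum, and that one contagion step is the matrix product of the weights with the data; the function names are hypothetical.

```python
def proximity_weights(rho):
    """Row-normalize the proximity matrix so that every row sums to one."""
    return [[value / sum(row) for value in row] for row in rho]


def contagion(W, X):
    """One contagion step: proximity weighted averages of the rows of X."""
    m, n = len(X), len(X[0])
    return [
        [sum(W[i][k] * X[k][j] for k in range(m)) for j in range(n)]
        for i in range(m)
    ]


# Hypothetical example with three observations and one risk indicator.
rho_demo = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
W_demo = proximity_weights(rho_demo)
X_demo = [[3.0], [0.0], [6.0]]
X_infected = contagion(W_demo, X_demo)
```

Although the proximity matrix in the example is symmetric, the weights are not: the weight of observation 1 for observation 0 is 1/3, while the weight of observation 0 for observation 1 is 1/4, because the row sums differ.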
To derive the above described application of the framework, the following assumption must hold:
Assumption of Structural Stability: The proximities of observations evolve similarly over time. Thus, for two sets of data which are sufficiently close in time, the proximity matrices are equal. In other words, for each pair of observations an $\epsilon > 0$ exists such that
$$\rho_{ij}(t) = \rho_{ij}(t + \epsilon)$$
for a specific point in time $t$ and $i, j \in \{1, \dots, m\}$.
It can be shown that the dataset used for a Random Forests prediction can be replaced by a dataset of iterated, proximity weighted averages of the values in the very same dataset and still yield the same predictions. Further, the latter can be shown to be generated by a sample of only some observations and still yield the same predictions. It follows that, assuming structural stability, a generating sample for a future stress scenario can be used to build a stressed state dataset which, inserted in a current state Random Forests model, yields predictions of stressed events.

Proposition 1. (Invariance of Prediction)
The prediction results of a Random Forests model are equal for a dataset $O \in \{0,1\}^m \times \mathbb{R}^{m \times n}$ and the dataset of its respective proximity weighted averages: Assuming an association between $Y$ and $X$ on a perfectly accurate Random Forests model and proximities larger than zero between the observations of the same class and zero else, it follows that a positive integer $l$ exists such that
$$RF(O) = RF\big((Y, W^{l} X)\big),$$
where $RF(\cdot)$ refers to the prediction of the forest of the events and non-events and $W$ is the matrix of proximity weights.
Proof. It needs to be shown that all observations are classified as the same class before and after the application of proximity based contagion and feedback. First, the effects of contagion are analyzed: Because, by assumption, observations from different classes have a proximity measure of zero, the value $X_{ij}$ of observation $O_i$ on variable $X_j$ is transformed by proximity based contagion into
$$\tilde{X}_{ij} = \sum_{k=1}^{m} \rho^{\omega}_{ik} X_{kj} = \sum_{k:\, cl(O_k) = cl(O_i)} \rho^{\omega}_{ik} X_{kj}.$$
All transformed values of a variable $X_j$ are thus weighted averages of the values of observations in the respective same class only.
The matrix of proximity weights $W$ can, without loss of generality, be written as a block diagonal matrix $\bar{W}$, sorted by the classes of the dependent variable. In the rows and columns, the observations with class one come first and those with class zero second:
$$\bar{W} = \begin{pmatrix} W_{1} & 0_{|CL(1)| \times |CL(0)|} \\ 0_{|CL(0)| \times |CL(1)|} & W_{0} \end{pmatrix} \tag{7}$$
where $CL(1)$ and $CL(0)$ are the sets of observations of class one and class zero, $W_1$ and $W_0$ are the blocks of proximity weights within these classes, and $|.|$ is the cardinality of a set. The zero-block matrices $0_{|CL(1)| \times |CL(0)|}$ and $0_{|CL(0)| \times |CL(1)|}$ result because the proximities between observations of different classes are zero. Ordering all observations (rows) in the dataset $O$ in the same way as they are now ordered in the matrix of proximity weights, the estimated dataset $\tilde{O}$ (excluding $Y$), resulting from a one-off application of proximity based contagion, can be written as the matrix multiplication $\tilde{X} := \bar{W} \bar{X}$, with $\bar{X}$ being the set of independent variables $X$ ordered in the same order as $\bar{W}$.
Second, the effects of the feedback iteration on the proximities: Iterating the estimation of the dataset means taking the proximity weighted average of the proximity weighted average iteratively. The proximity weighted average of the proximity weighted average is then $\bar{W} \bar{W} \bar{X}$. This results in a power sequence:
$$\big(\bar{W}^{k} \bar{X}\big)_{k \in \mathbb{N}}.$$
$\bar{W}$ is by definition a row stochastic matrix, as are its non-zero diagonal blocks defined in equation 7. Within $\bar{W}$ the diagonal itself is non-zero and always holds the highest row value (because each observation is closest to itself). Note that it was assumed that observations within the same class have a non-zero proximity, which ensures that the non-zero diagonal blocks contain entries larger than 0.
Thus $\bar{W}$ has no rows in which all entries are zero, but it has square matrices with non-zero entries on its diagonal. Having zero entries in the upper right corner, the whole proximity matrix is also of lower block-triangular form. Under these conditions, Qu, Wang and Hull [28] have shown that the sequence of stochastic matrices of proximity weights, $\bar{W}^{k}$, converges:
$$\lim_{k \to \infty} \bar{W}^{k} = \begin{pmatrix} \mathbf{1}_{|CL(1)|}\,\operatorname{diag}(c_1) & 0 \\ 0 & \mathbf{1}_{|CL(0)|}\,\operatorname{diag}(c_0) \end{pmatrix}$$
with $c_1$ and $c_0$ being stochastic vectors and $\mathbf{1}_{|CL(\cdot)|}$ being a $|CL(\cdot)|$ times $|CL(\cdot)|$ square matrix of ones, so that all rows of a diagonal block of the limit equal the respective stochastic vector. Since $\bar{X}$ is stable throughout the sequence, it follows that the sequence of proximity weighted averages converges likewise.
Third, the effects of the feedback iteration on the observations conclude the proof: Due to the assumption of an association between $Y$ and $X$, it can without loss of generality be assumed that for each variable $X_v$ larger values of this variable are more often associated with class 1 and lower values more often with class 0. Then, because of the convergence of the sequence of averages, the limits of each class must be different, and it must hold that for each variable $X_v$ a positive number $l_v$ of iterations exists such that each weighted average in class one at this point in the sequence is larger than any weighted average in class zero:
$$\min_{i \in CL(1)} \big(\bar{W}^{l_v} \bar{X}\big)_{iv} > \max_{i \in CL(0)} \big(\bar{W}^{l_v} \bar{X}\big)_{iv}.$$
Choosing the total number of iterations $l$ as $l := \max_v l_v$ allows the classes in $Y$ to be perfectly distinguished on each variable. Because it is assumed that $O$ can be perfectly predicted by a Random Forests model, $(\bar{Y}, \bar{W}^{l} \bar{X})$ can likewise be perfectly predicted by a Random Forests model and the observations are classified in the same class as by the original Random Forests analysis, with $\bar{Y}$ being the dependent variable $Y$ ordered in the same order as $\bar{W}$. Since the vertical order of the observations in the datasets does not influence the prediction of the random forest, the result applies likewise to $(Y, W^{l} X)$.
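The argument of the proof can be checked numerically on a small example. The following Python sketch iterates the proximity weighted averages for a block diagonal weight matrix and searches for the first iteration at which the two classes are separated on every variable. The weight matrix, the data and the function names are hypothetical and serve only as an illustration of the feedback iteration, not as the empirical procedure of the paper.

```python
def mat_mul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def iterate_feedback(W, X, l):
    """Apply the proximity weighted average l times, i.e. compute W^l X."""
    for _ in range(l):
        X = mat_mul(W, X)
    return X


def first_separating_iteration(W, X, class1, class0, max_iter=1000):
    """Smallest l with min over class 1 > max over class 0 on every variable."""
    current = X
    for l in range(1, max_iter + 1):
        current = mat_mul(W, current)
        separated = all(
            min(current[i][v] for i in class1) > max(current[i][v] for i in class0)
            for v in range(len(current[0]))
        )
        if separated:
            return l
    return None


# Hypothetical block diagonal weights: observations 0, 1 are class 1,
# observations 2, 3 are class 0; proximities across classes are zero.
W_demo = [
    [0.6, 0.4, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 0.5, 0.5],
]
# One variable whose raw values overlap between the classes.
X_demo = [[5.0], [1.0], [3.0], [0.0]]
l_demo = first_separating_iteration(W_demo, X_demo, class1=[0, 1], class0=[2, 3])
X_limit = iterate_feedback(W_demo, X_demo, 200)
```

In this example the raw values overlap (the class 1 value 1.0 lies below the class 0 value 3.0), the classes are still not separated after one iteration, and they are separated from the second iteration on. In the limit the class 1 rows approach 19/7 and the class 0 rows approach 15/7.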

Corollary 1. The dataset of iterated proximity weighted averages can be generated by a sample of only some observations, and the same predictions result as for RF(O): Assuming an association between $Y$ and $X$ on a perfectly accurate Random Forests model and proximities larger than zero between the observations of the same class and zero else, it follows that a minimal generating set $S \subset O$ exists which generates the same RF results as $RF(O)$ using proximity based contagion and feedback.
Proof. Following the proof of proposition 1, the matrix of proximity weights $W^{l}$ is not changed, as it is still built from the forest on the full dataset $O$. Having drawn $S$, the proximity weighted average for any value is built from those observations in $S$ which are within the same class as the observation of the considered value. These changes in the dataset do not affect the convergence of the product of $X$ with the stochastic proximity weight matrix [28].
Because an association between $Y$ and $X$ is assumed, it holds again that on each individual variable $X_v$, without loss of generality, the higher values can be attributed to class 1 while the lower ones, after a certain threshold, can be attributed to class 0. Additionally, for the sake of the argument every row $i$ with $O_i \in O \setminus S$ is set to a vector of zeros. Note that $O$ is considered as the set of the observations. A case distinction then concludes the proof. As such, the assumed perfect accuracy of the forest is preserved by the sampled transformation, and the prediction remains the same as on the original forest using the whole dataset.
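How a generating sample drives the contagion can be illustrated with a short Python sketch. It rests on an assumption that is not spelled out in the text: the proximity weighted average of an observation is taken over the sampled observations only, with the weights renormalized over the sample, and observations without any proximity to the sample keep their own values. The function name and the example data are hypothetical.

```python
def contagion_from_sample(rho, X, sample):
    """Proximity weighted averages built from the sampled observations only.

    Assumption: the weights are renormalized over the sample; an observation
    with zero proximity to every sampled observation keeps its own values.
    """
    n_vars = len(X[0])
    result = []
    for i in range(len(X)):
        total = sum(rho[i][k] for k in sample)
        if total == 0.0:
            result.append(list(X[i]))
        else:
            result.append(
                [sum(rho[i][k] * X[k][j] for k in sample) / total for j in range(n_vars)]
            )
    return result


# Hypothetical proximities: observations 0, 1 form class 1 and 2, 3 form class 0,
# with zero proximity across classes.
rho_demo = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
]
X_demo = [[5.0], [1.0], [3.0], [0.0]]
# Generating sample with one observation per class.
X_generated = contagion_from_sample(rho_demo, X_demo, sample=[0, 3])
```

With one sampled observation per class, every observation inherits the value of the sampled member of its own class, so the two classes are perfectly separated on the variable after a single step.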

Corollary 2.
Assuming an association between $Y$ and $X$ on a perfectly accurate Random Forests model, corollary 1 likewise holds if the condition that 'proximities between the observations of the different classes are zero' is relaxed to 'proximities between the observations of the different classes are smaller than those between observations of the same class'.
Proof. Following the proof of proposition 1, it needs to be shown that the stochastic matrix $W^{l}$ still converges although it is no longer of lower block-triangular form. This is indeed still the case due to Wolfowitz [37], lemma 2. Subsequently, because of lemma 2.1 of Qu et al. [28], $W^{l}$ converges to the matrix $c \times I$, where $c$ is a constant and $I$ the $m \times m$ identity matrix.
Assuming again, without loss of generality, that the higher values on a variable $X_v$ can be attributed in tendency to class 1 while the lower ones can be attributed in tendency to class 0, it follows that most values attributed to a class 1 observation must be above $c$ while most values attributed to a class 0 observation must be below. Because of the assumption that observations in the same class are closer to each other than to observations of other classes ($\rho_{ij} > \rho_{sl}$ for all $i, j, s, l$ with $cl(O_i) = cl(O_j)$ and $cl(O_s) \neq cl(O_l)$), the proximity weighted averages of most values attributed to a class 1 observation converge to $c$ monotonically decreasing during the iteration, while most values attributed to a class 0 observation converge to $c$ monotonically increasing.
It thus holds that there exists a finite number of iterations $l$ such that the proximity weighted averages of the values of a specific variable within class 1 are all larger than the proximity weighted averages within class 0 of the same variable. The remaining proof then follows from corollary 1.
• The assumption of non-zero proximities within the diagonal stochastic square matrices that are larger than proximities between observations of different classes reflects the expectation that, if a model is built and accurate, the observations of the same class are sensitive to the same risk drivers and 'closer' to each other. However, in the empirical application there will be cases where this assumption does not hold.
• Additionally, the assumption of a perfectly accurate forest will not always hold on an empirical dataset. However, using a large number of variables and at least 5000 trees, experience has shown that the Random Forests models exhibit an average in-sample classification error of less than 1%.
• If all assumptions held and the full proximity matrix were available, a sample size of one observation would be sufficient to conform to corollaries 1 and 2. However, since the assumptions, although reasonable, will not fully hold in reality, the empirical application of the methodology will deviate from the expected theoretical results. As such, additional information in the form of a larger sample will add accuracy, and the actual sample size is best calibrated on historical data in a respective portfolio.

Proposition 2. (Stress Prediction) Assume a time series of datasets of observations $O(t)$ and matrices of proximity weights $W(t)$ built on these datasets, an association between $Y(t)$ and $X(t)$ on a perfectly accurate Random Forests model, and that proximities between the observations of the different classes are smaller than those between observations of the same class and of those observations which change classes between time $t$ and $t+1$. Assuming that the assumption of structural stability holds, it follows that a minimal generating set $S(t+1) \subset O(t+1)$ exists which generates the same RF results as $RF(O(t+1))$ using proximity based contagion and feedback with the proximity information at time $t$.
Proof. Technically, the main difference between proposition 2 and corollary 2 is an unknown number of observations which will change class due to contagion and the iteration of proximity weighted averages (feedback effects). Since these observations, the transition observations, form part of class $cl$ at time $t$ and of class $\neg cl$ at time $t+1$, their inter-class proximities (proximities between the observations of the different classes) can reasonably be assumed to be higher than those of observations which do not change class. Following the proof of corollary 2, the proximity weighted averages converge to a value $c$. Assume also, again without loss of generality, that during the iteration of the proximity weighted averages most values attributed to a class 1 observation converge to $c$ monotonically decreasing, while most values attributed to a class 0 observation converge to $c$ monotonically increasing. Then, because the transition observations have higher proximities to class 1 observations than the observations which remain in class 0, they increase faster towards $c$ than the class 0 observations. There thus exists a number of iterations $l$ after which the maximum proximity weighted average of the values of observations attributed to class 0 on a specific variable $X_v$ is lower than the minimum proximity weighted average of the values of the transition observations.
Since Y(t+) is assumed to be known, proposition 2 follows directly from corollary 2 and the assumption of structural stability.
Proposition 2 is the main result of this paper. As laid out above, it can be shown that the dataset in a Random Forests prediction can be replaced by a dataset of iterated proximity weighted averages generated by a subset of stressed observations, yielding the same predictions as would be derived using the full stressed data. Thus, under the outlined assumptions, it is sufficient to stress a small number of observations (or market participants, as denominated in the literature review; in this paper the market participants are sovereigns) in order to estimate the future stress state of a dependent variable and of the whole dataset. The result can be used for stress testing and early warning and allows, by the concept of importance measurement, the identification of the main risk drivers of future event occurrences.
To apply proposition 2, two issues have to be tackled. First, as mentioned, neither the minimal generating set nor the way to find it is specified. In the next section, 'Empirical Study', the observations are chosen based on their proximity to all other observations to maximize contagion and feedback effects, which suits the model best. The minimum size of the set in an environment of only partially fulfilled assumptions is also calibrated empirically in the next section.
Second, in an empirical application the dependent variable Y(t+) is not known, with the exception of the stressed observations. Y(t+) thus needs to be estimated like the independent variables X(t+). More specifically, the proximity based contagion and feedback is likewise applied to Y. Since Y can be considered equivalent to just another variable, proposition 2 applies as well and the values of the estimated Ŷ will be clearly distinguishable (the accuracy of the distinction depends on whether the assumptions are fulfilled). Yet the estimated values will be weighted averages between 0 and 1, and Ŷ as such is not suitable to be used as dependent variable in a classification Random Forests model. To use the weighted Ŷ the following transformation (rounding) is applied: the values of Ŷ which are above a certain threshold τ are attributed to class 1 and those below to class 0. The threshold τ can be found by calibration as laid out in the next section.

Empirical study

The Dataset
To make full use of the capabilities of Random Forests, a large number of independent variables or risk indicators should be used. Considering the aim to show that the proximity based stress testing framework can predict or warn about future crises, the dataset used should be a time series. Therefore the publicly available online data of the World Bank, "World Development Indicators & Global Development Finance", has been sourced ¹. The independent risk indicators are selected from currently applied theories on GDP growth, such as tax raising, public spending, monetary policy, the liberty of the economic environment, the workforce and its education, and international trade. Indicators with more than 33% missing values are excluded. Indicators that cannot be easily compared between countries, such as indicators measured in local currency or other absolute values, are also not included. In numbers, 104 indicators are chosen between 1998 and 2010 (with an average of 9% missing values between 1999 and 2010). Random Forests can easily cope with the large number of indicators in the model and, as Biau [6] shows, there will be no distortion from variables with no predictive power. The indicators and their descriptions are listed in the appendix. Since the Random Forests based recursive conditional partitioning does not over-fit ([9]), many more indicators could theoretically be introduced.
The 12 years of data in the sample encompass information from Australia, Austria, Belgium, Brazil, Canada, China, Czech Republic, Denmark, Finland, France, Germany, Greece, Hong Kong SAR, Hungary, India, Indonesia, Ireland, Italy, Japan and the United States. The countries included in the sample represent mostly the developed world, including some emerging economies. The specific choice of countries is based upon data availability and quality.
As dependent variable an indicator for financial stability was chosen: this paper considers the changes in the number of non-performing loans per country. Non-performing loans (NPLs) are studied in various scientific papers. Espinoza and Prasad [16] describe NPL as a key macroeconomic indicator for financial stability and investigate its feedback effects over a three year period. They especially find that financial institutions with a high NPL are very sensitive to macroeconomic stress. Likewise, Vatansever and Hepsen [35] argue that NPL is an important economic performance measure and apply a regression and co-integration analysis to show a significant relationship between NPLs and a list of macroeconomic indicators. Finally, Inaba et al. [22] analyze the interrelationship between the increase in NPLs and the performance of the real economy in Japan, modeling first the effect of macroeconomic variables on NPLs and then the respective feedback effect of a rise in NPLs on the economy. They find a significant distorting influence of NPLs.
In this paper the NPL (number of non-performing loans as share of total loans as share of GDP) is again drawn from "World Development Indicators & Global Development Finance" ².
Since the recursive conditional partitioning framework is used as a classification algorithm, the dependent variable has to be binary. It is common that an event based on NPL movement occurs only after a certain threshold, for example when the ratio exceeds 20% (see [27]). However, in this paper an event is not based on the level of the NPL ratio itself but on the level of change between the analyzed points in time. Independent of the level of NPL, a sufficiently large change in NPL indicates a crisis.
In this paper an event is defined as a rise in non-performing loans (NPL) of at least 10% compared to the previous year. The dependent variable Y takes the value 1 for an NPL event and 0 otherwise. Hardy and Schmieder [19] describe in their work that NPLs rise by around 10% from typical levels one year ahead of what they call an average crisis, in comparison to 25% for a severe crisis. Also, Vazquez et al. [36] macro stress tested credit risk in the Brazilian banking sector and found an increase of 3.3% in long term NPL in their GDP scenario as a stress effect. This indicates that a threshold of 10% is high enough to serve as indicator for a crisis in this analysis. The plausibility of the choice is shown by observing that during the years analyzed in figure 1, the share of events defined in such a way evolves as expected. The NPL evolution shows [...] This paper claims that the model will be able to amply reproduce the share of rising NPL of each period.
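The event definition above can be illustrated with a minimal sketch. It assumes that the 10% refers to the relative change of the NPL ratio against the previous year; the function name `npl_events` and the sample values are hypothetical and not taken from the paper's dataset.

```python
def npl_events(npl_by_year, threshold=0.10):
    """Map each year to 1 if NPL rose by at least `threshold` (relative change)
    compared with the previous year in the series, else 0.
    The first year has no predecessor and is therefore skipped."""
    years = sorted(npl_by_year)
    events = {}
    for prev, cur in zip(years, years[1:]):
        change = (npl_by_year[cur] - npl_by_year[prev]) / npl_by_year[prev]
        events[cur] = 1 if change >= threshold else 0
    return events

# Illustrative NPL ratios for one country (assumed values, in percent).
series = {2005: 2.0, 2006: 2.1, 2007: 2.5, 2008: 2.4}
events = npl_events(series)
print(events)
```

In this toy series only 2007 counts as an event, since the ratio rises from 2.1 to 2.5, which is a relative increase of roughly 19%.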

Description of the Analysis Process
To assess the performance of the proposed model, a backtesting approach is applied: for a specific point in time, the succeeding two years are predicted by starting with an RF model and proximity matrix on the preceding two years. Note that a time window of two years is the minimum in this paper in order to have sufficient data points to assure the quality of the results. The results are then compared to the observed individual and macro stress events.
In general, the approach would contain two estimation steps: first, the estimation of a stress state, that is, the stressing of a subsample of the current state data, and second, the estimation of the full stress state in the target future using proximity based contagion and feedback effects. The aim of the paper, however, is to show a methodology that allows using only a small stressed sample of the data of interest and constructing the rest of the stress state/scenario using the proposed methodology, namely proximity based contagion and feedback. As mentioned, the stressing itself does not form part of the paper. For this reason a perfectly accurate stress scenario model is assumed by using the actually observed stress values as stress estimates, which neutralizes potential errors from stress estimation.
In detail: define $\mathrm{stress}_{t \to t+}(O)$ as the function or methodology to stress a set of observations O, and $\mathrm{proxycontfed}_t(S)$ as the application of proximity based contagion and feedback effects on a sample of observations S using the proximities in time t. Then the usual way to backtest the performance of the proposed methodology would be to compare the Random Forests (RF) classification results for RF(O(t+)) with $RF(\mathrm{proxycontfed}_t(\mathrm{stress}_{t \to t+}(\mathrm{Sample}_t)))$, where $\mathrm{Sample}_t$ is a sample of the data at time t and O(t+) is the observed stress state data in time t+. However, this backtesting approach includes the estimation of the stress state itself. In other words, it is unclear whether inaccuracies identified by the backtesting are due to a failure of the methodology of proximity based contagion and feedback or to the chosen approach to stress the data sample. In this paper the following approach is applied to isolate model effects: the stressed sample $\mathrm{stress}_{t \to t+}(\mathrm{Sample}_t)$ is replaced by the actual values S(t+) of the sample in the stressed state, thus backtesting only RF(S(t+)) with $RF(\mathrm{proxycontfed}_t(S(t+)))$.
Since S(t+) is a rather small subset of O(t+) and since the proximities at time t are applied as proposed, the approach backtests the proposed methodology while neutralizing unwanted effects from stressing data. Note that the proximity based contagion is, as described, applied to the dependent variable $Y_t$ as well, while the results are again matched to the classes of Y based on the threshold $\tau_t$ for time t.
As soon as the duration of the feedback effects is one year or longer, the proximity weighted averages that update each value of each variable are calculated on the inserted stress sample as well. The iteration of the proximity based feedback and contagion on all observations reflects the intuition that the feedback of the effects of the initially inserted stress sample observations affects all participants interdependently and that it fades with time. The fading effect is a logical consequence of taking averages of averages.
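The fading of feedback effects can be made concrete with a small sketch. It assumes that one feedback round replaces every value of a variable by its proximity weighted average, using a row-stochastic weight matrix; the helper names and the 3×3 matrix are hypothetical illustrations, not the paper's estimated proximities.

```python
def weighted_average_step(weights, column):
    """Replace every value by the proximity weighted average of all values.
    `weights` is assumed to be row-stochastic (each row sums to 1)."""
    return [sum(w * x for w, x in zip(row, column)) for row in weights]

def iterate_feedback(weights, column, rounds):
    """Apply the averaging step `rounds` times and record the spread
    (max - min) of the values after each round."""
    spreads = [max(column) - min(column)]
    for _ in range(rounds):
        column = weighted_average_step(weights, column)
        spreads.append(max(column) - min(column))
    return column, spreads

# Assumed proximity weights for three observations.
W = [[0.6, 0.3, 0.1],
     [0.3, 0.6, 0.1],
     [0.1, 0.1, 0.8]]
x = [10.0, 8.0, 2.0]  # values of one variable, first observation stressed
x_final, spreads = iterate_feedback(W, x, rounds=3)
print(spreads)
```

Because each round takes averages of averages, the spread between the observations shrinks from round to round, which is the fading behaviour described above.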
On the resulting estimated stress dataset a new random forest is drawn, predicting the stressed state of the economy.
When applied to the whole time series, the future state data is just the next period data, seen from the current state, which is not necessarily a macro stress period. As a matter of fact, in the time series used, only two financial crises are macro stress periods: the crisis in 2000/2001 and the financial crisis around 2007. To consider the financial (subprime) crisis, this paper includes the periods from 2006 to 2009. 2006 is included to consider the level of accuracy the model achieves directly before the crisis and thus to show that it can cope with a steep rise in stress events from one period to the next. However, in each period there are events in individual countries classifying as stress events based on the definition in this paper. The empirical analysis will thus assess the predictive power of the methodology in the next period for both individual stress events and the macro stress events of the crisis in 2000 and the financial crisis around 2007. The current period is considered the 'current normal' while the next period is a future state stress scenario with either individual events or a macro stress event. For clarity, the empirical data, which refers to the total empirical data used in the analysis, is composed of the current dataset (time t), which will be called current state data, and the known next period following the current state data (t+1), called the stress state data in case of one of the two macro stress events or the future state data in case of individual stress events. The dataset resulting from the application of the methodology of proximity based contagion and feedback will be called estimated future state data, predicting either individual or macro stress. The sample drawn to initiate the methodology is the stress sample.

Model Accuracy
As mentioned, to assess the accuracy of the applied model a new forest is drawn on the resulting estimated future state data, including the proximity weighted update of the dependent variable $Y_t$. Then $\hat{Y}_t$ is predicted using this new forest and the estimated future state data. The result is compared to the empirically observed classification of $Y_{t+}$ and the accuracy is measured by three types of error: the type one error, the share of events which have been classified as non-events; the type two error, the share of non-events which have been classified as events; and the average classification error of the two, the average error (or the average accuracy, which is 1 minus the average error). In most Random Forests applications in the literature some form of the average error is reported.
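The three error measures can be written down as a short sketch. It assumes that the average error is the unweighted mean of the type one and type two error and that both classes occur in the observed data; the function name and the label vectors are illustrative only.

```python
def error_rates(observed, predicted):
    """Return (type one, type two, average error).
    Type one: share of observed events (1) predicted as non-events (0).
    Type two: share of observed non-events (0) predicted as events (1).
    Assumes both classes are present in `observed`."""
    for_events = [p for o, p in zip(observed, predicted) if o == 1]
    for_non_events = [p for o, p in zip(observed, predicted) if o == 0]
    type_one = for_events.count(0) / len(for_events)
    type_two = for_non_events.count(1) / len(for_non_events)
    return type_one, type_two, (type_one + type_two) / 2

# Illustrative labels: four events followed by four non-events.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 1]
type_one, type_two, average = error_rates(observed, predicted)
print(type_one, type_two, average)
```

Averaging the two class-wise errors instead of counting all misclassifications keeps the measure meaningful when events are much rarer than non-events.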

Model Parameters and Calibration
The cforest algorithm implemented in the R 'party' package is applied ([20]), using the following parameter settings: quadratic test statistics, splitting only on variables which are associated with Y at a significance level of at least 99%. The number of sampled variables tried at each split is set to the square root of the number of independent variables and the class weight is chosen as the inverse proportion of the number of events or non-events in the dataset, both as proposed by Breiman and Cutler [9]. For the stability of the results, 5000 trees are run for each forest.
In this study three input parameters into the Random Forests (cforest) model are calibrated: the minimum number of observations in a node to perform a split, the classification threshold τ and the stress sample size.
The minimum number of observations in a node to perform a split is calibrated to minimize the average classification error of the fitted forests. This parameter is found to be not very sensitive and is set to the value two.
The classification threshold τ is calibrated for every year t such that error types one and two are as balanced as possible and as small as possible.
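One possible reading of this calibration is sketched below. The paper does not state how balance and size of the errors are traded off, so the lexicographic criterion used here (first the smallest imbalance, then the smallest average error), the candidate grid and the function names are assumptions made for illustration.

```python
def class_errors(observed, y_hat, tau):
    """Type one and type two error when y_hat values above tau are class 1."""
    predicted = [1 if v > tau else 0 for v in y_hat]
    for_events = [p for o, p in zip(observed, predicted) if o == 1]
    for_non_events = [p for o, p in zip(observed, predicted) if o == 0]
    return (for_events.count(0) / len(for_events),
            for_non_events.count(1) / len(for_non_events))

def calibrate_tau(observed, y_hat, candidates):
    """Pick the candidate with the smallest imbalance |type one - type two|;
    ties are broken by the smaller average error (assumed criterion)."""
    def score(tau):
        t1, t2 = class_errors(observed, y_hat, tau)
        return (abs(t1 - t2), (t1 + t2) / 2)
    return min(candidates, key=score)

# Illustrative proximity weighted averages of Y (values between 0 and 1).
observed = [1, 1, 1, 0, 0, 0]
y_hat = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]
candidates = [i / 10 for i in range(1, 10)]
tau = calibrate_tau(observed, y_hat, candidates)
print(tau)
```

With these toy values the balanced threshold is not the one with the lowest average error, which shows why the ordering of the two goals matters and has to be fixed before calibration.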
The stress sample size is calibrated to give accurate forecasts while being as small as possible. Based on the design of the proposed model, those observations which are the most connected in the dataset, thus with the highest proximity measures, are best suited to cause contagion and feedback effects. These observations are therefore chosen to form the minimal generating set, the stress sample. This is done in the following way: the observations are ordered with regard to the mode of their proximities and a predefined share is chosen from amongst the top entries of that list. The predefined share is calibrated on the empirical data as elaborated in the next section, section 4.1. The mode is taken as measure because an observation which is most often most highly connected to other observations is more contagious than an observation which has the highest mean, which could stem from a few close observations only.
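The selection rule can be sketched as follows. The paper does not specify how the mode of continuous proximities is computed, so rounding proximities to one decimal, breaking ties in favour of the larger value, and rounding the sample size up are assumptions of this illustration; the proximity matrix is made up.

```python
from collections import Counter
import math

def proximity_mode(row, self_index, decimals=1):
    """Most frequent (rounded) proximity of one observation to all others.
    Ties are broken in favour of the larger proximity (an assumption)."""
    values = [round(p, decimals) for j, p in enumerate(row) if j != self_index]
    counts = Counter(values)
    best = max(counts.values())
    return max(v for v, c in counts.items() if c == best)

def select_stress_sample(proximity, share):
    """Indices of the top `share` of observations, ranked by proximity mode."""
    modes = [proximity_mode(row, i) for i, row in enumerate(proximity)]
    k = max(1, math.ceil(share * len(proximity)))
    ranked = sorted(range(len(proximity)), key=lambda i: modes[i], reverse=True)
    return sorted(ranked[:k])

# Assumed symmetric proximity matrix for six observations.
P = [
    [1.0, 0.8, 0.8, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.8, 0.8, 0.1, 0.2],
    [0.8, 0.8, 1.0, 0.2, 0.2, 0.2],
    [0.8, 0.8, 0.2, 1.0, 0.2, 0.2],
    [0.1, 0.1, 0.2, 0.2, 1.0, 0.9],
    [0.1, 0.2, 0.2, 0.2, 0.9, 1.0],
]
sample = select_stress_sample(P, share=0.33)
print(sample)
```

In this toy matrix the last two observations share the single highest proximity (0.9) but are weakly connected to everyone else, so ranking by the mode passes them over in favour of the two observations that are strongly connected to most of the dataset.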
The following analysis is done on a historic rolling window of 2 years (thus the analysis starts only in 1999). As implied by the theory in section 3.2, the training samples consist of unique values only. The training samples used to build the trees have a size of 63.2% of the data, drawn randomly, as proposed by Strobl et al. [31].

Validation and Calibration
Before starting the analysis, proposition 1 and corollary 1 are verified empirically by testing whether a Random Forests prediction can be reproduced by proximity weighted input data and a generating set. Note that feedback effects are ignored for this initial proof of concept.
Therefore, a random forest is drawn on a subset of the empirical dataset. Afterwards a sample of observations is chosen and the remaining observations are replaced by the proximity weighted average in accordance with the method outlined above. On the resulting dataset a new random forest is drawn and its prediction of the events is compared to the prediction of the events of the original random forest. Note that to verify proposition 1 and corollary 1 the generating set/stress sample is not drawn from stress/future state data but from the same current state data that the random forest is built on. The size of the sample varies from 0 to 100% to give a flavor of how large a sample should be to derive an accurate approximation to the Random Forests results on the current state data used. Figure 2 shows the evolution of the type one and type two errors in relation to the drawn sample size. Using no sample and all data, class 1 has a classification error of 0 and class 0 a classification error of 6%: in other words the type two error is 0 and the type one error is 6%. As stated in corollary 1, there is a minimal set producing the same Random Forests prediction as the whole dataset, which in this case is 60% of the data (in this example the stress sample is randomly drawn and not based on an analysis of the mode). Both error types, however, remain relatively stable until the sample is reduced to 10% and below, where the errors quickly rise to around 50%, which basically means that the model assigns the classes randomly. This verifies that with a fraction of a dataset (almost down to 10%) and proximity based contagion, the same results can be achieved as with the whole original dataset.
Please note that it is also shown that in an environment where the assumptions made to prove corollary 1 do not fully hold, the application of proximity based contagion on a generating set does not exactly reproduce the results of the full dataset, and a certain minimum number of observations in the sample is needed to achieve a stable accuracy. Based on these findings, the next step is a decision on how large the stress sample should be for the remainder of the analysis. Therefore the above analysis is repeated on a 2 year rolling window, comparing stress sample sizes s of 10%, 33% and 50%. Note that the stress sample is again drawn on the respective current state data and not on stress/future state data, since this analysis aims at finding a suitable sample size on known data and then testing the model, including the chosen sample size, on unknown out-of-sample data. The effects on the average error of the estimated forest are depicted in table 1.
The results show an expected pattern through all years of reduced average errors whenever more data is inserted. In their similar analysis, Alessi and Detken [2] derive a type one error of 38%, a type two error of 25% and thus an average error of 32%. Accordingly, in this paper an average error which is roughly below 33% is considered suitable. Based on the results in table 1, a stress sample of the size of 33% of the dataset leads on average to an average error of 31% and is thus employed throughout the paper.

Results
To assess the performance of the framework, the average error, the type one and the type two errors are calculated for estimated future states using proximity based contagion effects with no feedback effects and with one year feedback effects. One year forecasts are the maximum forecast period considered in this paper.
Additionally, to test whether the application of the proposed proximity based contagion framework adds value at all, a random forest is drawn directly on the current state datasets, where the stress sample has been included (referred to as initial dataset) but the proximity based contagion framework has not been applied. This shows whether all the information to correctly predict a stress/future state is already included in the stress sample or added by the proximity based contagion and feedback framework.
The above described process of backtesting the performance is implemented in the following way for an analysis at time t:
1. The data in t, O(t), is stored as current state data and the data in t+, O(t+), is stored as stress/future data.
2. A random forest is drawn on the current state data and the proximity matrix is stored.
3. Using the proximity matrix, the 33% most connected observations are identified and stressed by replacing their current state values with their stress/future state values. The current state data with the replaced values for the sampled observations is stored as 'initial dataset'.
4. The proximity based contagion and feedback effects are applied to the initial dataset, which is then stored as the estimated dataset.
5. Random forests are drawn on the estimated and initial datasets separately and the respective predicted events are compared to the observed events in the stress/future state data to estimate the type one, type two and average error.
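Steps 3 and 4 of the list above can be sketched on toy data. The sketch assumes that, without feedback, the sampled observations keep their inserted stress values while every other observation is replaced by the proximity weighted average over all rows of the initial dataset; the function names, the weights and the data are hypothetical, and the forest fitting of steps 2 and 5 is not shown.

```python
def build_initial_dataset(current, future, sample_idx):
    """Step 3: current state rows, with the sampled rows replaced by
    their stress/future state values."""
    chosen = set(sample_idx)
    return [list(future[i]) if i in chosen else list(current[i])
            for i in range(len(current))]

def apply_contagion(weights, data, sample_idx):
    """Step 4 without feedback: non-sampled rows become proximity weighted
    averages of all rows; sampled rows are kept (assumed reading)."""
    chosen = set(sample_idx)
    n_obs, n_vars = len(data), len(data[0])
    estimated = []
    for i, row in enumerate(data):
        if i in chosen:
            estimated.append(list(row))
        else:
            estimated.append([sum(weights[i][k] * data[k][v] for k in range(n_obs))
                              for v in range(n_vars)])
    return estimated

# Three observations, two indicators (assumed values and weights).
current = [[1.0, 5.0], [1.2, 5.5], [0.9, 4.8]]
future = [[3.0, 9.0], [1.1, 5.4], [1.0, 5.0]]
W = [[0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2],
     [0.3, 0.2, 0.5]]
initial = build_initial_dataset(current, future, sample_idx=[0])
estimated = apply_contagion(W, initial, sample_idx=[0])
print(estimated)
```

The stressed first observation pulls the indicator values of the other two upwards in proportion to their proximity to it, which is the contagion effect the backtest then evaluates.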
The following summary results (table 2) are those of a proximity based stress testing with a training sample of 63%, a 33% share of stressed original observations, and no feedback effects or one year feedback effects on in-sample data.
Modeling feedback effects increases the accuracy and stability of the model. The latter is measured by the imbalance between type one and type two errors: the lower this imbalance and the lower its volatility, the more stable the results. Also, the macro stress forecast for the periods between 1999 and 2000 and between 2006 and 2009 is more accurate than the forecast of the individual events within the full window of analysis between 1999 and 2010.
[Table 2, values not recoverable. Rows: Forecast (Individual Stress) accuracy (1 − error); Macro Stress Forecast accuracy (1 − error); average imbalance between type one and type two error; standard deviation of the imbalance between type one and type two error.]
However, the results shown in table 2 are derived in-sample and are thus calibrated to be most accurate. To show the accuracy and practicability of the model, it has to be tested out-of-sample: the Y thresholds $\tau_t$ are calibrated in period t and applied to the estimation of the following time period t+1. Table 3 presents the summarized results of a proximity based stress testing with a training sample of 63%, a 33% share of stressed original observations and one year feedback effects on out-of-sample data.
The out-of-sample stress/future forecast is naturally less accurate than the in-sample forecast; however, the decrease in accuracy is low. Compared to the forecast power of the current state data including the stress sample, the proximity based stress testing framework is significantly more accurate, reducing the average error from 48% to 31% in the case of individual stress events and from 44% to 27% in the case of the macro stress events. Considering the measures on the imbalance of the type one and two errors, the model likewise adds stability. Table 4 shows the estimated out-of-sample errors of type one and type two as well as the errors for the forest drawn on the current state data including the stress sample. The analysis is done for a set of two year windows; the column 'years' shows the oldest year of the training dataset and the youngest of the predicted set. The detailed modeled type one and two errors are especially balanced and low in the stress state in 1999-2000 and during the crisis around 2007. Note that the accuracy in the out-of-sample testing in general, and specifically on the stress states, gives support to the assumption of structural stability. However, as pointed out earlier, the assumptions from which proposition 2 is derived do not fully hold on the empirical data. This justifies using a reduced training sample of 63% instead of a higher value or even the full sample, simply because it gives the model some flexibility to cope with violations of the assumptions as well as with inaccuracies in the stress scenario estimates in the stressed samples. Such inaccuracies will undoubtedly occur if the latter are estimated and not known, as they are in the backtesting approach applied in this paper.
On the other hand, this paper argues that one advantage of stressing only a small sample of observations is that those could be chosen to be especially easy to stress or such that their risk indicator values in times of stress are known with great certainty. Thus, if risk indicators of certain observations can be predicted accurately and inserted in a contagion based stress testing model, then the usage of a full instead of a 63% training sample is supposed to increase the accuracy. Table 5 employs a 70% as well as a full training sample size.
Table 4: Forecast with 63% tree growing sample, 33% stressed inputs and 1 year feedback, out-of-sample event calibration. The measures are shown in % of the underlying dataset.
The accuracy of the individual stress forecast is increasing with the training sample size. The macro stress forecast on the other hand is overall increasing, but the accuracy using a training sample with a size of 70% is higher than the one with the full sample. Thus basing the method on a full training sample for each tree leads indeed to higher accuracy, yet obviously increases the risk of over-fitting the model. Depending on the accuracy of the model assumptions and of the stress data inserted in a once built forest, either the full training sample or the 63% tree growing sample size might be suitable. Note that Alessi and Detken [2] have used a training sample size of 70% in their analysis and derived a balance error of 32% for the whole time span of their analysis, while this paper derives an average balance error of 30.6% for the whole sample and an average balance error of 25.7% in macro stress states.
Overall the methodology performs well and is able to predict, conditional on the accuracy of the stress sample, an upcoming future state for individual events and for the whole population. The second application of course warrants the definition of a threshold which, if sufficient observations are predicted to be in a stress state, is breached and the whole population is considered to be in a macro stress.
Analyzing figure 1, a macro stress state could be defined as a state where a third or more of the observations are individually in a stressed state. This threshold encompasses the crisis in 1999/2000 and the financial crisis around 2007. The following graph maps the observed share of events and the estimated share of events within two year windows. Note that the estimated share of events is the weighted average of the estimated events and non-events, using the overall average error as weight. The figure highlights again that the proximity based stress testing framework is able to predict macro stress states.
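The macro stress definition can be stated as a short sketch. It only covers the one-third rule on a vector of individual event labels; the error-weighted estimate of the share mentioned above is not reproduced, and the function name and the label vectors are illustrative.

```python
def macro_stress(labels, threshold=1 / 3):
    """Return (share of observations in stress, macro stress flag).
    A period counts as macro stress if at least `threshold` of the
    observations are individually classified as events."""
    share = sum(labels) / len(labels)
    return share, share >= threshold

# Illustrative periods with 20 observations each.
calm = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1] * 2    # 6 of 20 in stress
crisis = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0] * 2  # 8 of 20 in stress
print(macro_stress(calm), macro_stress(crisis))
```

With 20 observations, as in the country sample used here, the rule is triggered from seven individual events onwards.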

Policy Indicators
The application of the Random Forests model allows assessing the most important variables within the proximity based stress testing framework. This adds value in the following way: the proximity based contagion and feedback model allows identifying periods of individual stress or macro stress states beforehand. Once identified, the most important variables or key risk drivers with regard to the stress/future period can be extracted from the Random Forests model. Thus the model shows, contingent on the correctness of the random forest, which risk drivers are important in a coming stress event and thus which risk drivers could be managed to prevent the results from the scenario. In other words, the model points out policy indicators. The implementation of the proximity based stress testing framework as a whole can thus be summarized as follows:
1. On today's data a proximity based stress testing is applied.
2. The future events are estimated.
3. If there is no significant increase in events, the dataset is not affected by the chosen stress scenario. If on the other hand there is a significant increase in events, the proximity based framework can be used to identify policy indicators to address the weaknesses in the dataset before the crisis assumed by the stress scenario emerges.
4. Identify the most important variables on the estimated future dataset.
5. Translate the most important variables into policy indicators by comparison of the estimated future values and the observed values today.
To be sure whether variables which are identified as important in the estimated future state data are also important in the real stress/future state data, the following backtesting is performed: the upper percentile of the distribution of the importance score of the variables in both datasets is compared and it is assessed how many variables are in both percentiles of each dataset. This shows whether the application of the above outlined methodology leads to the same importance ranking of variables in the estimated future state as in the actually observed stress/future state.
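The comparison of the upper percentiles can be sketched as a set overlap. The cut-off of the upper percentile is not stated at this point, so the `share` parameter, the variable names `x1` to `x8` and the importance scores below are hypothetical; in practice the scores would come from the fitted forests.

```python
import math

def top_variables(importance, share):
    """Names of the top `share` of variables by importance score."""
    k = max(1, math.ceil(share * len(importance)))
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])

def shared_importance(imp_estimated, imp_observed, share):
    """Variables in the top group of both datasets, and their share
    of the top group of the estimated dataset."""
    top_est = top_variables(imp_estimated, share)
    top_obs = top_variables(imp_observed, share)
    common = top_est & top_obs
    return common, len(common) / len(top_est)

# Made-up importance scores for eight indicators.
imp_estimated = {"x1": 0.30, "x2": 0.25, "x3": 0.05, "x4": 0.02,
                 "x5": 0.01, "x6": 0.20, "x7": 0.00, "x8": 0.03}
imp_observed = {"x1": 0.28, "x2": 0.04, "x3": 0.22, "x4": 0.01,
                "x5": 0.02, "x6": 0.25, "x7": 0.00, "x8": 0.05}
common, ratio = shared_importance(imp_estimated, imp_observed, share=0.375)
print(sorted(common), ratio)
```

Here two of the three top variables coincide, so two thirds of the variables flagged in the estimated future state would also be flagged in the observed state.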
To calculate the importance score, the importance measure introduced within the recursive conditional partitioning package (party, cforest) is applied ([20]). It is defined in the following way: the values of a predictor variable are randomly permuted, thus breaking its original association with the response. A reasonable measure for variable importance is then the difference in prediction accuracy before and after permuting a variable, averaged over all trees ([32]). Table 6 points out the share of variables which are important in the estimation as well as in the empirical data during the financial crisis around 2007. In a proximity based stress testing framework with a training sample size of 63%, a stress sample of 33% and one year feedback effects, around 45%-71% of the most important variables in the estimated future state data also contribute to the macro stress data during the stress event of the financial crisis. This is sufficient to state that important variables from the stress testing exercise actually are important in a crisis situation. Table 7 shows in detail which variables are actually important in both datasets. The most important shared [...]
Indicators like the ratio of money and quasi money (M2) to total reserves or tax revenue can be called direct policy indicators. The reserve ratio can be changed by central banks and has an effect, for example, on money supply and the interest rate. Likewise, if taxation is identified as an influential indicator it can be used directly as a policy indicator. On the other hand, many of the indicators, like exports, are not direct policy indicators since they are harder to influence. However, measures can be taken to curb exports.
To translate the most important variables into actual policy indicators, the average estimated level of each indicator can be compared to its average observed current state value. Depending on whether the respective indicator is direct or indirect, instruments are chosen to keep the indicator from reaching the level identified in the stress/future state. For example, the policy indicator tax revenue can be translated into changing existing taxes or creating new ones. Note that a Random Forests methodology does not allow exact thresholds to be derived; it only indicates which variables, at which levels, contributed to the results and thus to the stress/future state. The full extent of how policy indicators are translated into actions is beyond the scope of this paper.
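The comparison of average levels described above can be sketched in a few lines. The function name `indicator_gaps`, the record layout (one dictionary per observation), and all numbers are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def indicator_gaps(estimated, current, indicators):
    """Average estimated future-state level minus average current-state level, per indicator."""
    return {
        name: mean(row[name] for row in estimated) - mean(row[name] for row in current)
        for name in indicators
    }

# Hypothetical observations; values are illustrative only.
current = [{"tax_revenue": 20.0, "exports": 30.0}, {"tax_revenue": 22.0, "exports": 34.0}]
estimated = [{"tax_revenue": 18.0, "exports": 40.0}, {"tax_revenue": 16.0, "exports": 44.0}]

gaps = indicator_gaps(estimated, current, ["tax_revenue", "exports"])
```

The sign and size of each gap show in which direction, and by roughly how much, an indicator would have to move to reach the level seen in the estimated stress/future state. As noted above, these gaps are indications rather than exact thresholds.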

Conclusion
In this paper, a non-linear macro stress testing methodology with a focus on early warning and crisis remediation, the proximity based stress testing framework, was developed. The development was based on heuristic derivation and mathematical proofs. The proposed methodology builds on a conditional recursive partitioning forest: by applying its proximity measures, the effects of a small stressed sample are expanded to the whole dataset. Feedback effects are simulated by iterating the process.
Due to the inherited characteristics of Random Forests, the model is compatible with big data applications, which allows as many variables as possible to be used to estimate the interdependence between observations or market participants as robustly as possible. At the same time, applying stress scenarios to only a few observations reduces the effects of inaccuracies in the scenarios and makes it possible to use observations for which the stress/future state is either easily estimated or known with great certainty.
It was shown that a Random Forests model on the estimated future state data predicts a potential crisis very well, both for an individual observation and for macro stress states, by accurately forecasting the number of stress events. Likewise, it was shown that the most important variables leading to these events can be identified and potentially used as input to manage or prevent crises.
In comparison to the initial dataset and the similar model of Alessi and Detken (2014), the proposed model achieved lower average and type one errors (Table 8). Especially during the years of the crises, the proximity based stress testing framework exhibits a low average classification error and similar type one and type two errors.
The proposed proximity based stress testing framework is designed to meet most of the requirements formulated by the BIS ([7]), such as being non-linear, containing naturally defined contagion and feedback effects, and being able to incorporate national and international KRIs. However, the framework initially still relies on historical data. With regard to the BIS criticism of the application of early warning systems, the proposed framework responds with an alternative modeling of the early warning indicator: the number of modeled stress events is itself the early warning indicator, and the most important risk drivers used to estimate it can be used to remediate the crisis.