Prediction of Systemic Risk Contagion Based on a Dynamic Complex Network Model Using Machine Learning Algorithm

It is well known that the interbank market is able to effectively provide financial liquidity for the entire banking system and maintain the stability of the financial market. In this paper, we develop an innovative complex network approach to simulate an interbank network with systemic risk contagion that takes into account the balance sheet of each bank, from which we can identify if the financial institutions have sufficient capital reserves to prevent risk contagion. Cascading defaults are also generated in the simulation according to different crisis-triggering (targeted defaults) methods. We also use machine learning techniques to identify the synthetic features of the network. Our analysis shows that the topological factors and market factors in the interbank network have significant impacts on the risk spreading. Overall, this paper provides a scientific method for policy-makers to select the optimal management policy for handling systemic risk.


Introduction
Systemic risk is defined by the Bank for International Settlements as the failure of a participant to meet its contractual obligations, which may in turn cause other participants to default with a chain reaction leading to broader financial difficulties [1]. Financial institutions are strongly linked to each other, and, therefore, a single large shock can quickly spread to all the players in the market. In the past decade, there have been several unexpected extreme financial shocks that impacted the banking system. For example, in September 2008, due to the subprime mortgage crisis, a giant US-based investment bank (Lehman Brothers) collapsed, creating large impacts on almost all the financial institutions in the US, and the spread of systemic risk did not stop until a number of sovereign governments started intervening in the market. Another example is Greece's sovereign debt crisis, which gave rise to significant instability among the major European banks from 2009 to 2015.
The interbank market and systemic risk contagion have drawn great attention from academia. Based on complex network theory, previous research papers have focused on the network evolution and the contagion of risk in accordance with different topological structures of the interbank market. Random networks [2], small-world networks [3], and scale-free networks [4] have been acknowledged as successful networks to model the interbank market. Recent studies have also indicated that interbank networks can be classified by three different structures: the community structure [5], the tiered structure [6], and the core-periphery structure [7]. However, real interbank networks cannot be described simply by these structures. Studies have shown that the interbank network can be characterized by the degree distribution of the nodes.
Some studies suggest that the degree distribution of the interbank network follows a power law distribution in many regions (see, e.g., Brazil [8], Japan [9], and Russia [10]), while some other papers show that the degree distribution of the interbank networks follows a two-power-law distribution [11,12] in which there are potentially two pronounced power-law regions [13].
In modern financial systems, financial relationships within the interbank market are linked through an intricate network of claims and obligations from the balance sheets. Therefore, merely studying the network structure of the interbank market may not bring sufficient insight into the essence of systemic risk contagion. A ground-breaking study by Nier et al. [14] proposes an approach to associate the balance sheet of each bank with the banks' financial linkages. Through this process, they investigate how the structure of the interbank network relates to systemic risk by varying parameters such as the degree to which banks are connected, the size of interbank exposures, and the degree of concentration of the system. They show that the effect of the connectivity of the interbank network is nonmonotonic: an initial increase in connectivity amplifies the contagion effect, while beyond a certain threshold, connectivity increases the banking system's ability to absorb shocks. Since then, a number of papers have modeled the risk of the banking system with balance sheet regulations and complex network technology [15][16][17]. Silva et al. [18] proposed a new system to evaluate systemic risk, and their results demonstrated the significance of considering new risk factors besides the traditional interbank market model of systemic risk. The present paper considers only risks internal to the interbank market; other risk factors are outside its scope. This limitation may therefore lead to an underestimation of systemic risk. Research on risk contagion in the interbank market often relies on random networks and scale-free networks to simulate the interbank networks [19]. In addition, the methods of initial failures in the interbank network should be explicitly taken into account. Different methods have been proposed.
Krause and Giansante [20] try to trigger a potential banking crisis by exogenously failing a bank with different bank sizes and under different power-law distributions of degrees, and then they investigate the spread of this failure through the banking system. Unlike prior studies focusing on the attacking methods to investigate the robustness of the interbank network, Georg [21] shows that banks become more vulnerable to endogenous fluctuations and occasional idiosyncratic insolvencies when a common shock strikes the entire banking system. Systemic risk contagion can be well managed if there are sufficient capital reserves. However, risk management tools are needed when capital reserves are limited, in which case the interbank network structure has a non-negligible effect on controlling the risk. For instance, based on the concept of network communicability, Maria Guerra et al. [22] introduce the impact sensitivity index and show how the index can be used as a financial stability monitoring tool. To investigate the influence of network structures in this paper, we consider machine learning the most effective approach. Since 2010, machine learning has been an important method for researching bankruptcy prediction and the features of crisis evaluation. Conventional approaches such as logistic regression, support vector machines, and neural networks focus on the development of automatic risk indicators and feature selection instruments. However, Jörg Döpke et al. [23] show that the classification accuracies of these methods are usually insufficient in this field. To improve the prediction of the model, they use an ensemble of boosted trees, where each base learner is constructed using additional synthetic features.
This technique is able to recognize the impacts of features, which is what we are seeking. Therefore, we decide to investigate the influence of network structures on the financial crisis using the Gradient Boosting Decision Tree (GBDT henceforth).
In this paper, we propose a new method to build interbank networks based on complex network theory and use an empirical topological structure [24] of the interbank market to simulate the internal structure (the balance sheet). According to the statistical features of the complex network, we select crisis-triggering approaches to trigger a cascading default within the interbank network, based on the clustering coefficient, eigenvector centrality, closeness centrality, betweenness centrality, and asset size of banks. Furthermore, we analyze the influences of contagion in an interbank network under an attack on the banking system. Finally, we study the importance of factors in systemic risk diffusion with machine learning technologies. The whole research process can be used to help policy-makers to better identify and monitor systemic risk as well as to mitigate the crisis.

Dynamic Growth Model of Interbank Network.
The interbank market is able to provide an important liquidity supplement for each individual bank. With regard to the banking system, rather than being insolvent, banks are allowed to engage in liquidity demand-driven interbank trading, thus forming the bilateral interbank network. In addition, the banking system is considered as a complex network in the literature [25]. For the characteristics of the interbank network, some important conclusions have been summarized by Krause and Giansante [20]. Nonetheless, previous studies on interbank network structures did not effectively represent the evolution process of a banking system, and thus it is necessary to extend the research. Although we focus on interbank loans in this paper, this research could be easily extended to other financial linkages, such as payment systems or OTC derivative positions, without changing the critical features of the analysis. Regardless of the form of the trading relationship, the formation of linkages in the financial market is a dynamic growth process. Next, we create a dynamic growth model of the interbank network to simulate a real interbank transaction process, taking into account critical factors such as the degree distribution of the network, the scale factor, and social factors.

Generating an Interbank Network.
An interbank network consists of a set of banks and a set of loan relationships, which can be described as a directed graph $G(V, E)$. Banks (vertices) are denoted as $V$ and the loan relationships (edges) are denoted as $E$. If $i$ and $j$ are vertices of $G(V, E)$ and there exists an edge from $i$ to $j$, it denotes the loan relationship between $i$ and $j$, where $i$ and $j$ serve as the lender and borrower, respectively. Supposing that the set of banks is $\aleph = \{1, 2, \ldots, n\}$ and the network of banks is a directed and weighted network in the interbank market, the total structure of the directed and weighted network consists of interbank loans $L_i$, $i \in \aleph$, and interbank borrowing $B_i$, $i \in \aleph$. In fact, there are capital flows rather than credit relationships between nodes $i$ and $j$ in the dynamic growth interbank network.

Starting Trading in the Interbank Market.
At first, the interbank network begins with $n$ initially unconnected bank nodes. At any time, some banks are seeking other banks with which to build lending relationships. Therefore, a set of outgoing (loan) links $\Theta_i^{out}$ of node $i$ and a set of incoming (borrowing) links $\Theta_j^{in}$ of node $j$ will be generated. Then, the probability $P_{ij}^t$ from Formula (1) determines whether there exists a trading linkage from $i$ to $j$. That is, bank $i$, as a lender, makes a loan of capital $w_{ij}$ to another bank $j$, as a borrower, where $w_{ij}$ follows a normal distribution. At that time, we have $w_{ij} > 0$, interbank loans $L_i > 0$, and interbank borrowing $B_i > 0$, respectively:

[Formulas (1) and (2) are not recoverable from the source; only the fragment "Section II: exp" survives.]

Here, the adjacency matrix $R_{n \times n}$ represents the interbank bilateral exposure matrix, $d_t(i, j)$ refers to the social distance from node $i$ to node $j$ at time $t$, and $D$ is the maximum distance over all nodes in the interbank network. $k_i^{out}$ and $k_j^{in}$ denote the out-degree of node $i$ and the in-degree of node $j$, respectively. The trading process can be brought close to real conditions by adjusting the parameters $\alpha$ and $\beta$. Section I of $P_{ij}^t$ captures the impact of the degrees of the nodes: banks with larger degrees are more likely to complete the transaction, and $\alpha$ ($\alpha \leq 0$) is the parameter of the network connection efficiency. Section II of $P_{ij}^t$ captures the impact of the social distance between two nodes, that is, how strongly social factors decide the connections between banks and their new neighbors. Parameter $\beta$ plays a key role in generating different network structures, and the network becomes more centralized as $\beta$ increases. This algorithm is also used by S. Lenzu [18]. Briefly, Section I and Section II of $P_{ij}^t$ reflect the scale factor and the social factor, respectively. Therefore, which trading linkage will be built at time $t$ is decided by the probability $P_{ij}^t$, according to Formula (1).
Just by setting the number of banks (nodes) $n$ and the size of the time step (pace) $t$, we can simulate the whole process of an interbank transaction. Thus, an algorithm for generating the dynamic growth of the interbank network is characterized by the expression $N(n, t, \alpha, \beta)$. Then, we can get the total nominal claims $c_{ij}$ of any bank $i$ towards bank $j$. In summary, we have created a dynamic growth model of the interbank network to simulate a real interbank transaction process, and an interbank network has been simulated according to the model denoted by $N(n, t, \alpha, \beta)$. Table 1 displays all parameters of the model.
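The growth process $N(n, t, \alpha, \beta)$ can be sketched as follows. This is only an illustration under stated assumptions: the link weight is a stand-in (a degree term multiplied by an exponential distance term) and not Formula (1) itself, the ring-based `distance` is a hypothetical choice of social distance, one lender acts per step, and the loan-size parameters are arbitrary.

```python
import math
import random


def grow_interbank_network(n, t, alpha, beta, seed=0):
    """Toy version of N(n, t, alpha, beta); returns the claims matrix c[i][j]."""
    if n < 2:
        raise ValueError("need at least two banks")
    rng = random.Random(seed)
    d_max = max(1, n // 2)  # largest ring distance, plays the role of D

    def distance(i, j):
        # ASSUMPTION: banks sit on a ring; the source does not define d_t(i, j).
        gap = abs(i - j)
        return min(gap, n - gap)

    k_out = [0] * n  # out-degree (loans granted) per bank
    k_in = [0] * n   # in-degree (loans received) per bank
    claims = [[0.0] * n for _ in range(n)]
    for _ in range(t):
        i = rng.randrange(n)  # one lender per step (simplification)
        others = [j for j in range(n) if j != i]
        # ASSUMED stand-in for Formula (1): degree term times distance term.
        weights = [
            (k_out[i] + k_in[j] + 1) ** alpha
            * math.exp(-beta * distance(i, j) / d_max)
            for j in others
        ]
        j = rng.choices(others, weights=weights, k=1)[0]
        w_ij = abs(rng.gauss(1.0, 0.25))  # normal loan size, kept positive
        claims[i][j] += w_ij  # accumulate total nominal claims c_ij
        k_out[i] += 1
        k_in[j] += 1
    return claims


claims = grow_interbank_network(n=20, t=200, alpha=2, beta=5)
```

The returned matrix accumulates every loan from lender $i$ to borrower $j$, so its entries play the role of the total nominal claims $c_{ij}$; its diagonal stays at zero because a bank never selects itself as a counterparty.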

Bilateral Loans Matrix of Claims and Obligations.
The interbank bilateral exposure can be represented by an adjacency matrix $R_{n \times n}$, where each element of $R_{n \times n}$ is the total nominal claims $c_{ij}$ from node $i$ to node $j$. When we calculate the adjacency matrix $R_{n \times n}$, we must consider the fact that a bank cannot have exposure to itself, so the diagonal is zero. The matrix $R_{n \times n}$ is therefore a nonnegative real matrix: each element is either zero or a strictly positive real number. Thus, the gross interbank loan of bank $i$ is given by the matrix's row sum, $l_i = \sum_{j \in \aleph} c_{ij}$. In the same way, the gross interbank borrowing of bank $i$ is given by the matrix's column sum, $b_i = \sum_{j \in \aleph} c_{ji}$:

$$R_{n \times n} = \begin{pmatrix} 0 & c_{12} & \cdots & c_{1n} \\ c_{21} & 0 & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & 0 \end{pmatrix}$$
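As a small worked example of the row and column sums, the following sketch computes gross interbank loans and borrowing from a claims matrix. The three-bank matrix and the function name `gross_positions` are made up for illustration.

```python
def gross_positions(claims):
    """Return (loans, borrowing): row sums l_i and column sums b_i of c_ij."""
    n = len(claims)
    loans = [sum(claims[i][j] for j in range(n)) for i in range(n)]
    borrowing = [sum(claims[j][i] for j in range(n)) for i in range(n)]
    return loans, borrowing


# Hypothetical 3-bank exposure matrix; claims[i][j] is what bank i lent to bank j.
claims = [
    [0.0, 2.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.0, 3.0, 0.0],
]
loans, borrowing = gross_positions(claims)
```

Because every loan is simultaneously someone else's borrowing, the two vectors always have the same total, which is a convenient sanity check on a simulated matrix.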

Building the Balance Sheet of Each Bank.
To build a network structure of the whole banking system, it is necessary to incorporate the balance sheet of individual banks into the network structure. Each bank $i \in \aleph$ is assumed to have a balance sheet with assets $A_i$ and liabilities $L_i$. Figure 1 shows a stylized balance sheet of a bank that participates in the interbank market. The liabilities side $L_i$ of bank $i \in \aleph$ consists of interbank borrowing $b_i$, equity $e_i$, and deposits $d_i$. On the other hand, the assets side $A_i$ of bank $i \in \aleph$ consists of interbank loans $l_i$, external assets $a_i$, and capital reserves $r_i$. We define the capital reserves ratio as $\rho_i = r_i / l_i$. Equation (5) states that total assets must be equal to total liabilities for each bank $i \in \aleph$ in the balance sheet:

$$A_i = l_i + a_i + r_i = b_i + e_i + d_i = L_i \qquad (5)$$
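A minimal sketch of such a balance sheet, assuming the identity "total assets equal total liabilities" is closed by treating deposits as the residual item. The `equity_ratio` parameter and the residual-deposit rule are assumptions made for this example; the text does not specify how $e_i$ and $d_i$ are set.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    interbank_loans: float      # l_i
    external_assets: float      # a_i
    reserves: float             # r_i = rho_i * l_i
    interbank_borrowing: float  # b_i
    equity: float               # e_i
    deposits: float             # d_i

    @property
    def total_assets(self):
        return self.interbank_loans + self.external_assets + self.reserves

    @property
    def total_liabilities(self):
        return self.interbank_borrowing + self.equity + self.deposits


def build_balance_sheet(l_i, b_i, a_i, rho_i, equity_ratio=0.05):
    """Build a sheet where deposits absorb the residual (ASSUMED closure rule)."""
    r_i = rho_i * l_i                 # capital reserves from the reserves ratio
    total = l_i + a_i + r_i           # total assets A_i
    e_i = equity_ratio * total        # hypothetical: equity as a share of assets
    d_i = total - b_i - e_i           # deposits close the identity A_i = L_i
    if d_i < 0:
        raise ValueError("borrowing and equity exceed total assets")
    return BalanceSheet(l_i, a_i, r_i, b_i, e_i, d_i)


sheet = build_balance_sheet(l_i=3.0, b_i=0.5, a_i=10.0, rho_i=0.1)
```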

Interbank Network Cascading Default Mechanism.
To simulate the cascading default of the banking system, it is necessary to combine the bilateral loans matrix and the balance sheet of each individual bank. This is because when we trigger a cascading default in the interbank network, no matter whether the crisis is the bankruptcy of some banks or a common shock to the whole banking system, systemic risk will spread via the interbank network, which is observed directly in the bilateral loans matrix. When systemic risk spreads to each bank, the risk tolerance of each bank decides whether the banking system is healthy or not. By analyzing banks' balance sheets, we can assess whether the banks can absorb further shocks and maintain the system's financial health. If one or more banks have failed, systemic risk is transmitted to the interbank network again. Thus, the whole banking system has to share the consequences of the failures. In this paper, the banking crisis begins by assuming that one or more banks failed due to a series of mistakes. For simplicity, we assume that a single bank fails initially, starting the risk contagion mechanism described above. When a bank $x \in \aleph$ has failed, its interbank borrowing from other banks does not need to be paid back, that is, $b_x = \sum_{j \in \aleph} c_{jx}$ and $c_{ix} = 0$, $i \in \aleph$. Moreover, there are still interbank loans of bank $x \in \aleph$ to other banks. If the condition $C_{xi} - C_{ix} \geq 0$ holds, the interbank loans that bank $i$ has received from bank $x$ are at least as large as the interbank borrowing of bank $x$ from bank $i$. If the condition $C_{xi} - C_{ix} < 0$ holds, the impact of the failure of bank $x \in \aleph$ cannot be absorbed by the interbank market. Then, the adjacency matrix $R_{n \times n}$ is updated accordingly, and we check the balance sheet of bank $i \in \aleph$ to judge whether the capital reserves $r_i$ supply a sufficient buffer to cover the loss $C'_{xi} = (C_{xi} - C_{ix})^{-}$. Moreover, according to the balance sheet, the capital reserves are $r_i = \rho_i l_i$.
Meanwhile, external assets $a_i$ keep enough liquidity for an individual bank when bank $i \in \aleph$ has been affected by the crisis. If bankruptcy is inevitable, the external assets will be forced into a fire sale for extra cash. For simplicity, this paper studies the model without this condition, and this step is omitted. Therefore, for the interbank loans vector $C'_{xi}$, $i \in \aleph$, the following condition holds:

Complexity
Consequently, bank $i \in \aleph$ is solvent in cases 1 and 2, and it defaults in case 3. Then, the current wave of cascading default in the interbank network is finished. If one or more new banks have failed due to the shock of the previous wave, the above process is repeated until all remaining banks are solvent and systemic liquidity shortages no longer spread in the banking system.
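The wave-by-wave mechanism can be sketched as follows. This is a simplified reading under explicit assumptions: the loss of bank $i$ from a failed bank $x$ is taken as its net claim on $x$ floored at zero, losses are assumed to accumulate across waves, and a bank is assumed to default once cumulative losses exceed its capital reserves. Fire sales of external assets are omitted, as in the text.

```python
def run_cascade(claims, reserves, initial_failures):
    """Propagate defaults wave by wave; returns (failed set, list of waves)."""
    n = len(claims)
    buffers = list(reserves)  # remaining capital reserves r_i
    failed = set(initial_failures)
    wave = set(initial_failures)
    waves = [sorted(wave)]
    while wave:
        next_wave = set()
        for x in wave:
            for i in range(n):
                if i in failed:
                    continue
                # Net claim of bank i on the failed bank x, floored at zero.
                loss = max(claims[i][x] - claims[x][i], 0.0)
                buffers[i] -= loss  # ASSUMPTION: losses accumulate over waves
                if buffers[i] < 0:
                    next_wave.add(i)
        failed |= next_wave
        if next_wave:
            waves.append(sorted(next_wave))
        wave = next_wave  # stop when a wave produces no new failures
    return failed, waves


# Hypothetical chain: bank 1 lent to bank 0, bank 2 to bank 1, bank 3 to bank 2.
claims = [[0.0] * 4 for _ in range(4)]
claims[1][0] = 2.0
claims[2][1] = 1.5
claims[3][2] = 0.2
reserves = [0.1, 1.0, 1.0, 1.0]
failed, waves = run_cascade(claims, reserves, initial_failures=[0])
```

In this toy chain the failure of bank 0 wipes out bank 1, which in turn wipes out bank 2, while bank 3 absorbs its small loss and the contagion stops.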

Attack Strategies for Cascading Default.
Based on the interbank network cascading default mechanism above, it is important to assess how to select the initial default banks as attack strategies, since the features of the initially failed banks largely determine the final number of bankrupt banks in the banking system. In this paper, we choose five statistical features of complex networks as selection methods and, for each method, make the top 1% of nodes bankrupt. We evaluate the effects of the selection methods using 1000 Monte Carlo simulations, each of which builds the interbank network and runs the cascading default to measure the risk tolerance of the system.
The selected statistical features are the eigenvector centrality [26], closeness centrality [27], betweenness centrality [28], clustering coefficient [29], and total assets of banks. The total assets of banks are added to the selection methods in consideration of the impact of large banks.
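Selecting the initial failures amounts to ranking banks by a score and taking the top 1%. The sketch below shows this for two of the five scores, the clustering coefficient (computed here on an undirected neighbor map, which is a simplification) and total assets. The rounding rule (round up, at least one bank), the tie-breaking by node id, and the toy data are assumptions for this example.

```python
import math


def top_fraction(scores, fraction=0.01):
    """Return the ids of the highest-scoring `fraction` of nodes (at least one)."""
    count = max(1, math.ceil(fraction * len(scores)))
    # Sort by descending score; ties are broken by node id for reproducibility.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return ranked[:count]


def local_clustering(neighbors):
    """Undirected local clustering coefficient for every node."""
    result = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0
            continue
        # Count links among the neighbors of `node`, each pair once.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in neighbors[u])
        result[node] = 2.0 * links / (k * (k - 1))
    return result


# Hypothetical 4-bank network given as undirected neighbor sets.
neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
clustering = local_clustering(neighbors)
assets = {0: 5.0, 1: 50.0, 2: 20.0, 3: 1.0}

attack_by_clustering = top_fraction(clustering, fraction=0.25)
attack_by_assets = top_fraction(assets, fraction=0.5)
```

The same `top_fraction` helper would apply unchanged to eigenvector, closeness, or betweenness centrality scores once they are computed by a graph library.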

Calculating the Variable Importance (VI) Based on Gradient Boosting Decision Tree.
This section introduces the Gradient Boosting Decision Tree [30] and its derived algorithm for the evaluation of feature importance. The theory of GBDT derives from boosting, which is a stagewise additive method that iteratively adds a function to the combined estimator and adjusts the weights of the training data in order to build a strong classifier by linearly combining a set of weak classifiers. Similar to boosting, GBDT iteratively constructs many regression trees as base-learners; each tree is built through an iterative process that splits each node into child nodes by certain rules, unless it is a terminal node that the samples fall into. The principal idea behind GBDT is to construct the new base-learners to be maximally correlated with the negative gradient of the loss function in order to increase the accuracy of the model. For classification problems, the Gradient Boosting Decision Tree is a competitive, highly robust algorithm and is appropriate for mining less than clean data [31]. One essential advantage of GBDT is its interpretability: the variable importance (VI) of GBDT represents the influence of each feature. VI is computed as the (normalized) total reduction of the split criterion brought by the current variable. The higher it is, the more important the variable is. In this paper, we describe the algorithm of VI in the GBDT framework.
For a GBDT with $M$ trees, the VI of variable $j$ is $I_j = \frac{1}{M} \sum_{T} I_j(T)$, where $T$ is a single tree in the GBDT and $I_j(T)$ represents the VI of variable $j$ in $T$. Then, $I_j(T)$ can be calculated as follows:

$$I_j(T) = \sum_{t \in Z} i_t \, \mathbb{1}(v_t = j)$$

where $Z$ represents all nonterminal nodes in $T$ and $v_t$ represents the variable chosen at node $t$. The function $i_t$ (a function of $t$) is the contribution to $I_j(T)$ generated from the "impurity" that we use to split the node. We let $D_t$ represent the data set at tree node $t$, and $D_t^l$ and $D_t^r$ represent those of its left and right child, respectively. There are two types of impurity, namely, entropy and Gini impurity. We use Gini impurity to present the computation of $i_t$. [The Gini-based expression for $i_t$ and the resulting summary formula are missing in the source.]
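To make the VI computation concrete, here is a minimal sketch that uses the standard weighted Gini impurity decrease of a split as $i_t$. That choice is an assumption (the usual CART-style form), not necessarily the exact expression used by the authors, and the tree encoding, the function names, and the variable names `reserve_ratio` and `attack_strategy` are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_decrease(parent, left, right):
    """ASSUMED i_t: sample-weighted Gini of D_t minus that of its children."""
    return (
        sum(parent) * gini(parent)
        - sum(left) * gini(left)
        - sum(right) * gini(right)
    )


def variable_importance(trees):
    """Average over trees of the summed impurity decrease per split variable."""
    totals = {}
    for tree in trees:
        # Each tree is a list of nonterminal nodes: (variable, D_t, D_l, D_r).
        for variable, parent, left, right in tree:
            gain = impurity_decrease(parent, left, right)
            totals[variable] = totals.get(variable, 0.0) + gain
    m = len(trees)
    return {variable: total / m for variable, total in totals.items()}


# Two hypothetical one-split trees with [no-crisis, crisis] class counts.
trees = [
    [("reserve_ratio", [5, 5], [5, 0], [0, 5])],
    [("attack_strategy", [4, 4], [3, 1], [1, 3])],
]
importance = variable_importance(trees)
```

A perfect split (the first tree) contributes far more than a noisy one (the second), which is the behavior the VI ranking relies on. Dividing each value by the sum of all values gives the normalized form mentioned in the text.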

Results and Simulations
We select 768 banks with over U.S. $100 billion in total assets as of the end of 2018 from the EU15. The EU15 consists of the following 15 countries: Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, Netherlands, Portugal, Spain, Sweden, and the United Kingdom. We first obtain the banks' financial data, such as the number of banks and the assets of each bank, from BANKSCOPE. Figure 2 shows the distribution of banks' total asset value in 2018. According to the empirical analysis of the interbank network structure, the network of the EU15 has an obvious two-power-law degree distribution. Such a network, a special kind of scale-free network, is characterized by a two-power-law functional relation. Technically, a two-power-law degree distribution network is characterized by the complementary cumulative distribution function (CCDF) of assets (weights). Figure 3 shows the log-log plot of the CCDF of the banks' assets. This figure shows that the number of degrees and the total assets do follow a two-power-law distribution.

Simulating an Interbank Network.
In this paper, we develop a dynamic complex network model to simulate the EU15 interbank network in 2018 according to the model denoted by $N(768, 100, 2, 5)$. Then, we obtain a dynamic growth interbank network that results from a complicated dynamic evolution process. Figure 4 shows the CCDF of the degrees and weights, which also fall into two-power-law distributions. Notably, the CCDF of the degrees divides into two sections with different slopes. The first section, with the dotted green line, corresponds to the power-law distribution $p(k) \propto k^{-c}$ with $c = -0.4224$, and the second one corresponds to $c = -8.5737$, where $k$ is the degree of the nodes. This method is based on the assumption that similar distributions possess similar structures, and thus we can simulate the interbank network of the EU15 with the same network structure. By comparing Figures 3 and 4, we conclude that the CCDFs of the simulated and the real interbank networks have similar structures and slopes. Therefore, we can use the model to build the interbank network without the relevant balance sheet data, which is an advantage when simulating the loan relationship between two banks, and to generate a bilateral trading matrix. Figure 5 presents the directed graph of the global interbank network. To better illustrate this network, we select only the 100 banks with the highest degree; in the left panel of Figure 5, the sizes of the nodes and edges are scaled by the degree and weight of the respective bank.
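The comparison of slopes in the two regimes can be reproduced with an empirical CCDF and a least-squares fit in log-log space. The sketch below is one simple way to do this; the breakpoint between the two regimes is supplied by hand (`x_break` is a hypothetical parameter), and ordinary least squares on the CCDF is an assumed fitting method, not necessarily the one used for Figures 3 and 4.

```python
import math


def ccdf(values):
    """Return [(x, P(X >= x))] for each distinct x, in increasing order."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for idx, x in enumerate(ordered):
        if idx == 0 or x != ordered[idx - 1]:
            points.append((x, (n - idx) / n))
    return points


def loglog_slope(points):
    """Least-squares slope of log(ccdf) against log(x)."""
    pairs = [(math.log(x), math.log(p)) for x, p in points if x > 0 and p > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive points")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    den = sum((x - mean_x) ** 2 for x, _ in pairs)
    return num / den


def two_regime_slopes(points, x_break):
    """Fit separate slopes below and above a hand-picked breakpoint."""
    low = [(x, p) for x, p in points if x <= x_break]
    high = [(x, p) for x, p in points if x >= x_break]
    return loglog_slope(low), loglog_slope(high)


# Synthetic CCDF with slope -1 up to x = 4 and slope -2 beyond it.
points = [(1, 1.0), (2, 0.5), (4, 0.25), (8, 0.0625), (16, 0.015625)]
slope_low, slope_high = two_regime_slopes(points, x_break=4)
```

Applying `ccdf` to the simulated degrees or weights and then `two_regime_slopes` yields two exponents that can be set against the empirical ones.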
As mentioned above, substitute data are needed when the data of interbank trading are difficult to obtain. By simulating a network with the same structure, the bilateral trading matrix of claims and obligations can be estimated from a dynamic growth interbank network. Kanno [32] was the first to combine the above method with the balance sheets of banks to address this type of problem. So far, a dynamic growth model of an interbank network has been constructed, including the directed graph of the interbank network, the bilateral trading matrix of claims and obligations, and every bank's balance sheet.

Cascading Default in Interbank Network.
To investigate the likelihood of cascading default, we must trigger a crisis in the interbank network. Here, a financial crisis is one or more bank failures giving rise to a large amount of bad assets in the banking system. We attack some nodes as initial failures, based respectively on the top 1% rank of the clustering coefficient, eigenvector centrality, closeness centrality, betweenness centrality, and assets of banks. To investigate the impacts of different attacking methods on financial stability, Figure 6 shows how systemic risk spreads in the interbank network under different attacking strategies for different reserves ratios r.

Feature Influence of Interbank Network.
Based on the above framework, we evaluate the influence of the capital reserve ratio on the stability of the banking system under the different attack strategies. However, the feature influence of the interbank network has not yet been taken into account comprehensively, and the importance of interbank network structures has largely been ignored. We therefore design experiments to study the influence of the interbank network structures for five different capital reserve ratios r. Following Gai et al. [33], a financial crisis is defined as an environment in which at least 5% of banks go bankrupt. Therefore, when the percentage of bankrupt banks exceeds 5%, we identify that a financial crisis has occurred in the interbank network. Based on the above discussion, 1000 groups of networks N(768, 100, 2, 5) are generated via a Monte Carlo simulation, and for each we randomly select an attack strategy that triggers a cascading default and run it until the end of the risk's contagion. Eventually, we observe whether or not a financial crisis occurred in each group of networks. Based on complex network theory, we calculate the statistical network features of each network, which can be evaluated by machine learning technology. As Table 2 indicates, there are 32 generated features for assessment: the features of the network, the attack strategies, the financial information, and so on. Most of the features can be interpreted in a straightforward way by complex network theory. In this paper, we train GBDT on the 1000 sets of feature data to forecast a financial crisis. In the meantime, the feature influence is assessed by the derived algorithm of GBDT.
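The step from simulation outcomes to a labeled training set can be sketched as follows: each Monte Carlo run contributes one feature vector and one binary crisis label based on the 5% threshold. The feature names, the `runs` structure, and the use of "at least 5%" as the cutoff are assumptions for this example (for 768 banks, "at least 5%" and "more than 5%" select the same runs, since 5% of 768 is not an integer).

```python
def is_crisis(n_failed, n_banks, threshold=0.05):
    """Crisis label: the share of bankrupt banks reaches the threshold."""
    return n_failed / n_banks >= threshold


def build_dataset(runs, n_banks):
    """Turn (features, n_failed) pairs into a feature matrix X and labels y."""
    feature_names = sorted(runs[0][0])  # fixed column order
    X = [[features[name] for name in feature_names] for features, _ in runs]
    y = [1 if is_crisis(n_failed, n_banks) else 0 for _, n_failed in runs]
    return feature_names, X, y


# Two hypothetical simulation runs on a 768-bank network.
runs = [
    ({"reserve_ratio": 0.10, "mean_degree": 12.5}, 39),
    ({"reserve_ratio": 0.20, "mean_degree": 11.0}, 8),
]
feature_names, X, y = build_dataset(runs, n_banks=768)
```

The resulting `X` and `y` are in the row-per-sample layout that gradient boosting implementations generally expect, so they could be passed to a GBDT classifier of one's choice.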

Discussion
Figure 6 shows how the reserves ratio impacts the stability of the interbank network. As financial risk propagates in the interbank market over time, systemic liquidity shortages eventually stop spreading in the banking system. This indicates that there are no more bank failures after a certain wave, and we consider this wave as the final wave (Figure 6, right side). The final wave is worth investigating, and Figure 7 summarizes the forms of the final waves for different initial attack strategies. Under the clustering coefficient attack strategy, the number of defaulting banks shrinks rapidly at the reserves ratio r = 7.44%, while under the other attack strategies, it begins to shrink at reserves ratios in the range from 9% to 11%. An attack on the top 1% of banks by assets requires the highest reserves ratio to prevent financial contagion in the interbank network, compared with the other attack strategies. The specific effects of the initial attack strategy are shown in Figure 8. Of course, we cannot ignore the fact that the topological structure of the network has a significant impact on a potential financial crisis. β is the critical parameter of the network's generation. With the increase of β, the network connections gradually become more centralized. However, a value of β that is too high or too low is harmful to the stability of the banking system. An undesirable value of β can increase the capital reserves required by the banks to control the spread of the cascading default, as shown in Figure 9. In addition, we want to understand the impacts of other features.
In this paper, we evaluate the influence of features by calculating the VI of GBDT. The greater the feature's impact on the stability of the banking system, the bigger the VI of the feature. In Table 3, we present the top 10 features for each of the considered capital reserve ratios r. From this, we study the popularity of the features for different capital reserve ratios. As shown in Figure 6, regardless of which attack strategy triggers the initial failures, the crisis should be controlled by setting the capital reserve ratio r > 20%. On the other hand, the crisis would be nearly inevitable when the capital reserve ratio is less than 5%. Therefore, the influences of features are meaningless for the crisis at reserve ratios of r = 5% and 20%. Furthermore, AS (attack strategies) is the most important feature observed at r = 10% and 15%, and the statistical characteristics of high-ranking banks also tend to have bigger impacts on crises. Some of the features, such as 1% ACCEN and 5% ACCO, are very popular for each reserve ratio. Policy-makers should pay particular attention to high-VI features in order to cope with the effects of changes in those features.
In this paper, we build a dynamic growth interbank network based on complex network theory and empirical analysis with the balance sheets of banks. Based on the above algorithmic framework, we investigate the impacts of the crisis-triggering approaches and network features on the stability of the interbank network. By controlling the capital reserves ratio of each bank, we can satisfy the requirements of financial security at minimum cost. The findings of this paper can guide policy from a macroscopic perspective and help policy-makers quantify systemic risk based on the network structure. We find that when banks with higher total assets have defaulted, the whole financial system appears to be vulnerable. By contrast, when banks with higher clustering coefficients go bankrupt, the impact is milder, and a concentration of the interbank network that is too high or too low facilitates the dispersion of risk. Thus, this paper develops a method for policy-makers to take action to prevent liquidity shortages in the interbank market and discusses the key features that influence a crisis.
Data Availability
The data used in this paper come from BANKSCOPE, produced by Bureau van Dijk (https://bankscope.bvdep.com/).

Conflicts of Interest
The authors declare that they have no conflicts of interest.