A new Copula-CoVaR approach incorporating the PSO-SVM for identifying systemically important financial institutions

Abstract The effective identification of systemically important financial institutions (SIFIs) is key to preventing and resolving systemic financial risks; thus, it is of great research significance for emerging countries to supervise SIFIs and manage systemic financial risks. Since traditional research on identifying SIFIs does not consider emerging machine learning models, it is difficult to properly fit the characteristics of actual financial institutions’ asset distribution. This paper proposes a new method for measuring SIFIs, integrating the PSO-SVM model into the Copula-CoVaR model. This new PSO-SVM-Copula-CoVaR model is meant to evaluate China’s SIFIs based on the publicly traded price data of Chinese listed financial institutions. The empirical results show that, compared with the traditional parameter method (GARCH model) and the nonparametric method (kernel density estimation), the marginal distribution estimation method using the PSO-SVM method can better fit the distribution of an institution’s financial asset return sequence. That is, the model proposed in this paper helps regulatory authorities improve the list of SIFIs more reasonably and implement effective regulatory measures.


Introduction
The global financial crisis of 2008 demonstrated that, when SIFIs get into trouble or go bankrupt, the whole financial system suffers great damage due to the spillover effect, which may eventually lead to a serious economic crisis (Fan et al., 2018; Johnson & Mamun, 2012; Zhang et al., 2021). Thus, the effective identification of SIFIs is key for regulators hoping to deal with the systemic risks of the financial system (Chen et al., 2021).
To date, this paper is one of the few studies that attempt to introduce machine learning models into the identification of SIFIs. The main methods for identifying SIFIs are the indicator-based approach and the market-based approach (Li et al., 2021; Xu, 2011). The indicator-based approach involves constructing a corresponding index system to evaluate the importance of financial institutions based on the core concepts of SIFIs.
The market-based approach measures a single financial institution's risk contribution to the whole system based on financial market data, thereby determining its systemic importance; compared with the indicator-based approach, it can effectively overcome the problem of data lag and improve the accuracy of measuring institutional spillover risk. Among this series of market model methods, the CoVaR method, which considers the spillover effects of financial institutions, has received widespread attention (Bernal et al., 2014).
However, CoVaR is essentially a linear correlation function between returns, and there has yet to be discussion of the specific dependency structure, which has some drawbacks when it comes to describing the tail risk dependency (Benoit et al., 2013; Kleinow et al., 2014; López-Espinosa et al., 2012). Considering the complex structure of interdependence between financial institutions, scholars have introduced different copula functions to capture the linear and non-linear tail-related characteristics of such institutions and the entire financial system (Bernardi et al., 2017; Jaeger-Ambrozewicz, 2013; Liu, 2015).
The main difference between the indicator-based approach and the market-based approach lies in their perspective for understanding the meaning of systemically important financial institutions and their related data. Although it is intuitive and simple to measure SIFIs with an indicator-based approach (Lu & Hu, 2014), such an approach needs more comprehensive data (Brämer & Gischer, 2013); additionally, the frequency of statistical data involved is usually low, so it is difficult to capture the changing characteristics of high-frequency data. As a result, the dynamic spillover effect of systemic risk is not considered. Compared with the indicator-based approach, the market-based approach is more forward-looking because it can adopt higher frequency data.
In summary, the market-based approach based on publicly traded data is more suitable for the identification of systemically important financial institutions, and the CoVaR model combined with the copula function has the advantage of portraying the dependence of tail risk (Sun et al., 2020; Wu et al., 2021).
Despite this approach's suitability, the existing research still has several shortcomings. Firstly, with the development and growing complexity of the financial market, the traditional marginal distribution model based on parametric and nonparametric methods has difficulty fitting the distribution characteristics of actual financial assets. Secondly, while the rapidly internationalized and liberalized Chinese stock market is becoming increasingly important for global financial markets (Yao et al., 2018; Zhang, 2017), few studies have explored China's financial markets. Finally, with the asset structure of China's financial market changing, it is difficult to properly conduct financial stability supervision by focusing only on the banking market.
Compared with traditional statistical or simulation methods, machine learning can optimize loss or return functions by using historical data, which can better quantify the complex and changeable characteristics of financial risks. Among the available frameworks, the support vector machine (SVM) is a theoretical framework and general method for establishing machine learning with limited samples (Dutta, 2022; Luo et al., 2020; Vapnik, 1995, 1998). Not only does SVM have a strict theoretical basis, it can also better solve practical problems (e.g., small sample, nonlinearity, high dimension and local minimum point) (Kim, 2003; Lee, 2007). This paper introduces SVM to model the marginal distribution.
Therefore, this paper endeavors to make up for the deficiencies in the existing research and to make a more accurate and reasonable evaluation of China's SIFIs, based on the publicly traded price data of 28 listed financial institutions in China's capital market from January 2011 to March 2021. To do so, this paper constructs a new PSO-SVM-Copula-CoVaR model. Firstly, the PSO-SVM model, which is optimized by the particle swarm optimization (PSO) algorithm (Eberhart & Kennedy, 1995), is used to model the marginal distribution of stock price data, and the estimation results are compared with the traditional marginal distribution estimation methods (such as the GARCH model and kernel density estimation). Secondly, the Copula-CoVaR function is used to describe the multivariate distribution and tail-dependent risks of financial variables, completing the risk assessment of systemically important financial institutions. The results show that the risk estimation model incorporating the PSO-SVM algorithm can effectively improve the accuracy of SIFI recognition.
This paper contributes by filling the gaps in the existing literature on the identification of systemically important financial institutions, introducing a machine learning algorithm into the traditional Copula-CoVaR model, and exploring the application of machine learning algorithms in the field of financial risk management. In addition, most existing studies are focused on the banking sector, and few have explored the identification of systemically important financial institutions in China's financial market. The research in this paper provides a unique reference for financial market risk research in emerging countries.
The remaining parts of this paper are arranged as follows: Section 2 is a review of the current literature. Section 3 briefly introduces the sub-models that constitute the PSO-SVM-Copula-CoVaR model and the specific steps for constructing the PSO-SVM-Copula-CoVaR model. Section 4 conducts an empirical analysis based on the actual data of China's financial market. Section 5 is the conclusion of this paper.

Literature review
This part reviews the existing literature on the identification of SIFIs by focusing on three aspects: indicator-based approach, market-based model approach, and Copula-CoVaR method.
Research on the indicator-based approach has shown that the index system for identifying global systemically important financial institutions (G-SIFIs) used by the Financial Stability Board (FSB) considers cross-border transaction activities, business scale, relevance, substitutability, and complexity. Lo (2009) analyzed the systemic risks of institutions based on factors such as business concentration, financial leverage, relevance, risk sensitivity, and closeness between institutions. Billio et al. (2012) used the Granger causality test to establish a financial network based on the stock price data of major financial institutions around the world; they then measured the systemic importance of financial institutions based on indicators such as transmittance and acceptance. Guo (2013) ranked China's systemically important banks based on different evaluation index systems used in China and internationally. Thomson (2015) pointed out that assessing the importance of financial institutions should include factors such as scale, concentration, relevance, contagion, and environment.
Furthermore, in the research based on the Copula-CoVaR model, Karimalis and Nomikos (2018) and Hakwa et al. (2012) proposed a model to measure the contribution of marginal system risk based on the copula theory and the CoVaR method; this model not only characterizes the linear correlation between financial institutions and the entire financial system but also captures nonlinear tail correlation characteristics. Reboredo and Ugolini (2015) and Boako and Alagidede (2018) applied the Copula-CoVaR method to the European debt market and African stock market, respectively. Compared with other methods of estimating CoVaR (e.g., quantile regression, tail risk network, and multivariate GARCH model), the Copula-CoVaR method can estimate the dynamic and asymmetric tail correlation between data more flexibly (Bernardi et al., 2017; Mainik & Schaanning, 2014; Patton, 2006; Reboredo & Ugolini, 2015).

Methodology
This section briefly describes the PSO-SVM model and the Copula-CoVaR model that constitute the PSO-SVM-Copula-CoVaR model; it also introduces the main steps for constructing the PSO-SVM-Copula-CoVaR model.

The PSO-SVM model
Estimating the marginal distribution is equivalent to solving the following integral equation: This article uses independent and identically distributed data $x_1, x_2, \cdots, x_l$ to construct the following empirical distribution function: At the same time, the boundary conditions (0,0) and (1,1) are added so that the marginal distribution problem can be solved by using a support vector machine.
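For reference, in the support-vector treatment of density estimation (Vapnik, 1998), the integral equation and the empirical distribution function are usually written in the form below. The symbols $p$, $F$, $F_l$ and $\theta$ are assumptions of this sketch and are not taken from the paper's own formulas, which may differ in notation or convention.

```latex
% Integral equation linking the unknown density p to the distribution function F
\int_{-\infty}^{x} p(t)\,dt = F(x)

% Empirical distribution function built from the i.i.d. sample x_1, ..., x_l
F_l(x) = \frac{1}{l} \sum_{i=1}^{l} \theta(x - x_i),
\qquad
\theta(u) =
\begin{cases}
1, & u \ge 0 \\
0, & u < 0
\end{cases}
```

Under this reading, $F$ is unknown and is replaced by $F_l$, so the density (and hence the marginal distribution) is recovered by regressing on the pairs $(x_i, F_l(x_i))$.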
The steps are as follows. Firstly, define the corresponding regression problem in image space. Secondly, construct the kernel function $K(x_i, x_j)$ by using the support vector machine method. Thirdly, construct the cross kernel function $k(x_i, t)$. Fourthly, according to the kernel function $K(x_i, x_j)$, use the support vector method to solve the regression problem, that is, find the support vectors $x_i$, $i = 1, 2, \ldots, N$, and the corresponding coefficients $\beta$

The Copula-CoVaR model
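Firstly, $\mathrm{VaR}^{i}_{\alpha}$ is the value at the $\alpha$ quantile.
$\mathrm{CoVaR}^{j|i}_{\beta}$ is defined as the VaR of institution $j$ (or the financial system) under the condition that $x^{i}_{t} \le \mathrm{VaR}^{i}_{\alpha,t}$, that is, when a certain event occurs in institution $i$. $\mathrm{CoVaR}^{j|i}_{\beta}$ is therefore the $\beta$ quantile of the conditional probability distribution:
$$\Pr\left(x^{j}_{t} \le \mathrm{CoVaR}^{j|i}_{\beta,t} \,\middle|\, x^{i}_{t} \le \mathrm{VaR}^{i}_{\alpha,t}\right) = \beta \qquad (5)$$
In Formula (5), $x^{j}_{t}$ and $x^{i}_{t}$ are the return sequences of institutions $j$ and $i$, respectively. The following solves for $\mathrm{CoVaR}^{j|i}_{\beta}$ by analyzing the copula function.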
According to Sklar's theorem: Based on Formula (7), $u_j = F_j(\mathrm{CoVaR}^{j|i}_{\beta,t})$ can be obtained, and then: Here,
Step 1: Model the marginal distribution of the stock price data of financial institutions. The PSO-SVM marginal distribution estimation method is used to model the marginal distribution of the stock price data of each listed financial institution, and the copula data of each financial institution are obtained through the probability integral transform.
Step 2: The joint sequence of financial institution assets is constructed, and the joint sequence is obtained by using the definition of the copula function, where $z_i$ represents the overall distribution characteristics of institution $i$.
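The conditional definition of CoVaR can be turned into a copula equation: with $u_i = F_i(\mathrm{VaR}^{i}_{\alpha}) = \alpha$, the condition $\Pr(x^j \le \mathrm{CoVaR} \mid x^i \le \mathrm{VaR}) = \beta$ is equivalent to $C(u_j, \alpha) = \alpha\beta$, and CoVaR is then $F_j^{-1}(u_j)$. The sketch below illustrates this for the Clayton copula only, because that equation has a closed-form solution there. The function names, the empirical quantile used as a stand-in for the PSO-SVM marginal, and the sample values are all assumptions for illustration, not the authors' implementation.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_covar_level(alpha, beta, theta):
    """Solve C(u_j, alpha) = alpha * beta for u_j in closed form."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0) or theta <= 0.0:
        raise ValueError("need 0 < alpha, beta < 1 and theta > 0")
    target = alpha * beta
    return (target ** -theta - alpha ** -theta + 1.0) ** (-1.0 / theta)


def empirical_quantile(sample, prob):
    """Stand-in for the inverse marginal F_j^{-1}; the paper uses PSO-SVM instead."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, max(0, math.ceil(prob * len(ordered)) - 1))
    return ordered[index]


def covar(sample_j, alpha, beta, theta):
    """CoVaR of institution j given that institution i is at or below its VaR."""
    u_j = clayton_covar_level(alpha, beta, theta)
    return empirical_quantile(sample_j, u_j)


# Hypothetical returns of institution j and an assumed Clayton parameter.
returns_j = [k / 100.0 for k in range(-50, 50)]
print(covar(returns_j, alpha=0.05, beta=0.05, theta=2.0))
```

For other copula families the equation $C(u_j, \alpha) = \alpha\beta$ generally has to be solved numerically, for example by bisection on $u_j$.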

Data selection and basic analysis
In an efficient market, the roles, statuses, and interrelationships of institutions in the financial system can be fully reflected in the volatility and interrelationship patterns of returns in the stock market. At the same time, the daily return rates of stocks have the beneficial characteristics of easy access and large sample size, which can better reflect the financial system's real situation.
China's financial institutions can be divided into four categories (i.e., banking, insurance, securities, and trust) according to their industries, among which banking and insurance are the two most important pillars. With the development of interest rate marketization, the traditional asset-liability business of commercial banks has been overtaken by increasingly diversified investment and financing channels; this also makes the business dealings among financial institutions in the system closer. Therefore, this paper uses 28 financial institutions as samples, choosing the daily stock return rate of each listed financial institution as its research object. The total sample comprises 16 banks and 12 insurance/securities/trust institutions.
In addition, considering the balance between the number of sampled institutions (covering more listed companies) and the sample size (covering a longer time interval), the sample period runs from January 4, 2011 to June 30, 2022, covering a total of 2,191 trading days. In this sample, the stock return is computed as $r_t = \ln P_t - \ln P_{t-1}$, $t = 1, 2, \ldots, T$, where $r_t$ is the rate of return and $P_t$ is the closing price at time $t$. The analysis data come from Wind Information; R and Matlab software are used to process the data. Table 1 shows the basic information of the selected financial institutions, and Table A2 in Appendix A shows the results of the descriptive statistical analysis of the data.
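As a small illustration of the return definition $r_t = \ln P_t - \ln P_{t-1}$, the following sketch computes log returns from a list of closing prices. The function name and the price values are hypothetical; the paper itself processes the data in R and Matlab.

```python
import math


def log_returns(prices):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for t = 1, ..., T."""
    if any(p <= 0 for p in prices):
        raise ValueError("closing prices must be positive")
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices, not data from the paper.
closing = [10.00, 10.50, 10.29, 11.00]
returns = log_returns(closing)
print(returns)
```

A convenient property of log returns is that they add up over time: the sum of the daily values equals the log return over the whole period.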
The results in Table A2 in Appendix A show that the extreme values of the daily stock returns of financial institutions are about ±0.1, which is determined by the daily limit on price rises and falls in the Chinese stock market. The standard deviation results show that the volatility of non-bank financial institutions, especially securities institutions, is significantly greater than that of banking institutions; they also show that the volatility of small- and medium-sized joint-stock banks is greater than that of state-owned commercial banks. In the data's distribution pattern, the skewness of the yield data of all financial institutions is greater than 0, and the kurtosis values of most financial institutions are greater than 3. These distributions are representative of the typical 'sharp peak and fat tail' distribution characteristics of financial assets, which is confirmed again by the value of the Jarque-Bera statistics.
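The descriptive statistics discussed above can be computed as in the following sketch. It assumes population (divide-by-n) moments, non-excess kurtosis (so a normal distribution gives 3), and the usual Jarque-Bera form $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$; the function name and the sample are hypothetical and do not reproduce Table A2.

```python
import math


def describe(returns):
    """Mean, standard deviation, skewness, kurtosis and Jarque-Bera statistic."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n  # population variance
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jarque_bera = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "jarque_bera": jarque_bera,
    }


# Hypothetical daily returns, for illustration only.
stats = describe([-0.02, -0.01, 0.0, 0.01, 0.02])
print(stats)
```

A kurtosis above 3 together with a large Jarque-Bera value is what signals the fat-tailed, non-normal behaviour described in the text.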

Construction and comparison of different marginal distribution models
In the calculation of the Copula-CoVaR model, it is particularly important to establish a suitable marginal distribution model for financial assets. At present, the modeling of marginal distribution is mainly achieved through the parametric method and non-parametric method. The parametric method mainly refers to the introduction of parameter distributions (e.g., normal distribution and t distribution) based on the GARCH model. This method, used to describe the residual distribution of yield series, reflects the series' characteristics of sharp peak, fat tail, volatility clustering, and asymmetry. The non-parametric method refers to directly using the kernel density function or the empirical distribution function to describe the marginal distribution of the return sequence. In the interest of effective econometric modeling, the marginal distribution that best reflects the distribution characteristics of financial assets should be selected for research. If the fitting result of the selected marginal distribution model is quite different from the actual financial asset distribution characteristics, it will result in greater errors during the selection of the subsequent copula model, the estimation of parameters, and the calculation of CoVaR. Thus, this paper seeks the marginal distribution closest to the distribution characteristics of financial assets using three methods: the GJR-GARCH model, kernel density estimation, and the PSO-SVM algorithm.

The GJR-GARCH model
The development of the GARCH model in structural form and distribution is meant to better describe the characteristics of asset return series. Empirical evidence shows the advantages of asymmetric specifications within the GARCH family (Abad et al., 2014); the GJR-GARCH model is found to be more effective than other asymmetric models in describing the behavior of asymmetric fluctuations in the financial market. Therefore, this paper adopts the GJR-GARCH model as the representative model of the GARCH family in the parametric method.
In these formulas, $z_t$ is an independent and identically distributed random variable, and $I_{t-1}$ represents the set of all available information in period $t-1$. If $\varepsilon_{t-1} < 0$, then $S_{t-1} = 1$; otherwise $S_{t-1} = 0$. The parameters $\omega$, $\gamma$, $\alpha$, and $\beta$ are all non-random real numbers, and $\gamma$ is the parameter that reflects the asymmetry of fluctuations. In order to ensure a positive conditional variance, the parameters must adhere to the following constraints: is a necessary and sufficient condition for the wide-sense stationarity of the GJR-GARCH model.
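To make the role of the asymmetry term concrete, the sketch below runs the conditional variance recursion of a GJR-GARCH(1,1) model in its commonly used form, $\sigma_t^2 = \omega + (\alpha + \gamma S_{t-1})\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$. The exact specification, the positivity constraints ($\omega > 0$, $\alpha \ge 0$, $\beta \ge 0$, $\alpha + \gamma \ge 0$) and the stationarity condition $\alpha + \beta + \gamma/2 < 1$ (valid for symmetric innovations) are the textbook versions and are assumed here; they may differ from the paper's own formulas. Parameter values are hypothetical.

```python
def gjr_garch_variance(residuals, omega, alpha, gamma, beta):
    """Conditional variances sigma2_t for a GJR-GARCH(1,1) recursion.

    sigma2_t = omega + (alpha + gamma * S_{t-1}) * eps_{t-1}^2 + beta * sigma2_{t-1},
    where S_{t-1} = 1 if eps_{t-1} < 0 and 0 otherwise.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + gamma < 0:
        raise ValueError("parameters violate the assumed positivity constraints")
    persistence = alpha + beta + 0.5 * gamma
    if persistence >= 1:
        raise ValueError("assumed stationarity condition alpha + beta + gamma/2 < 1 fails")

    # Start from the unconditional variance (assumes symmetric innovations).
    sigma2 = [omega / (1.0 - persistence)]
    for eps in residuals[:-1]:
        s = 1.0 if eps < 0 else 0.0  # indicator of a negative shock
        sigma2.append(omega + (alpha + gamma * s) * eps ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical residuals and parameters, for illustration only.
variances = gjr_garch_variance([0.01, -0.01, 0.0],
                               omega=1e-5, alpha=0.05, gamma=0.1, beta=0.8)
print(variances)
```

With $\gamma > 0$, a negative shock raises next-period variance by more than a positive shock of the same size, which is the asymmetric effect the model is chosen for.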

The kernel estimation method
In the kernel density estimation method, the minimum mean square error between the candidate distribution and the empirical distribution is taken as the reference for selecting the optimal kernel function and window width. Firstly, under the condition of the same window width, the Gaussian kernel function is selected as the optimal kernel function, rather than the box kernel function, Epanechnikov kernel function, or triangle kernel function. Secondly, the optimal window width is found to be 0.0001 in the neighborhood (0,5) with the default window width of 0.003 as the center.
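The selection criterion described above can be sketched as follows: for each candidate window width, the Gaussian-kernel estimate of the distribution function is compared with the empirical distribution function at the sample points, and the width with the smallest mean square error is kept. The function names, the evaluation grid (the sample points themselves) and the candidate values are assumptions for illustration; the paper does not spell out these details.

```python
import math


def gaussian_kernel_cdf(x, sample, h):
    """Kernel estimate of F(x): average of normal CDFs centred at the sample points."""
    return sum(0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0))))
               for xi in sample) / len(sample)


def empirical_cdf(x, sample):
    """Share of observations less than or equal to x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


def select_bandwidth(sample, candidates):
    """Return the candidate width with the smallest MSE against the empirical CDF."""
    best_h, best_mse = None, float("inf")
    for h in candidates:
        mse = sum((gaussian_kernel_cdf(x, sample, h) - empirical_cdf(x, sample)) ** 2
                  for x in sample) / len(sample)
        if mse < best_mse:
            best_h, best_mse = h, mse
    return best_h, best_mse


# Hypothetical returns and candidate window widths.
sample = [-0.03, -0.01, 0.0, 0.005, 0.02, 0.04]
best_h, best_mse = select_bandwidth(sample, [0.0001, 0.003, 0.05])
print(best_h, best_mse)
```

The same loop can be repeated over different kernel functions to compare them at a fixed window width.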

The PSO-SVM estimation method
Although the non-parametric kernel density estimation method overcomes the shortcomings of the historical simulation method, problems still persist, including its heavy dependence on sample data selection and slow response to emergencies. The support vector machine is a novel small sample learning method with a solid theoretical foundation. The SVM's goal is to obtain the optimal solution in accordance with the existing information rather than just the optimal value when the number of samples tends to infinity. The algorithm converts the actual problem into a high-dimensional feature space through nonlinear transformation, which can ensure that the algorithm has good generalizability. A small number of support vectors determine the final result, which basically does not involve probability measures and the law of large numbers. These vectors can grasp key samples and eliminate a large number of redundant samples.
Consider the marginal distribution estimated by the SVM method. Firstly, in the parameter-setting scenario with penalty coefficient $C = 1$ and window width $\sigma = 16$, the error rate of three-fold cross-validation is taken as the reference to select the optimal kernel function and window width; the radial basis kernel is selected as the estimation function of the marginal distribution in this scenario. Secondly, the PSO algorithm is used to find the optimal parameter combination, and the search intervals of the penalty coefficient and window width are [0.5, 3] and [10, 20], respectively.
Finally, the optimal parameter combination is found to be penalty coefficient $C = 1.99556$ and window width $\sigma = 16.99585$.
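The parameter search can be pictured with a minimal particle swarm optimizer over the two intervals given above. This is a generic sketch, not the authors' code: the inertia and acceleration settings, the swarm size, and in particular `toy_cv_error` are assumptions. `toy_cv_error` is an arbitrary smooth function standing in for the three-fold cross-validation error of the SVM, which would have to be supplied by an actual SVM fit.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]  # global best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move and clip to the search interval.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def toy_cv_error(params):
    """Hypothetical stand-in for the SVM cross-validation error (not real data)."""
    c, sigma = params
    return (c - 2.0) ** 2 + 0.01 * (sigma - 17.0) ** 2


best, err = pso_minimize(toy_cv_error, bounds=[(0.5, 3.0), (10.0, 20.0)])
print(best, err)
```

In a real run, the objective would train the radial basis SVM with the candidate penalty coefficient and window width and return its cross-validation error, so each evaluation is far more expensive than in this toy case.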

Comparison of the results of different marginal distribution estimation methods
The purpose of using the above three methods to estimate the marginal distribution of the return series is to find the marginal distribution that best reflects the distribution characteristics of financial assets. The other purpose of the marginal distribution estimation is to estimate the parameters of the copula function. Therefore, this paper adopts the maximum likelihood method to estimate six types of copula functions (Gaussian copula, t-copula, Clayton copula, Gumbel copula, Frank copula, and Joe copula), and identifies the optimal copula function form by using the information criterion. Relevant copula functions are described in the Appendix. The minimum mean square error is used to determine the optimal joint distribution under the different methods. The specific estimation results are shown in Table 2.
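The estimation-and-selection step can be illustrated for one of the six families. The sketch below fits a Clayton copula by maximizing the log-likelihood over a grid of parameter values and then computes an information criterion. The paper does not say which criterion it uses, so AIC is an assumption here, as are the grid search (in place of a numerical optimizer), the function names and the pseudo-observations.

```python
import math


def clayton_log_density(u, v, theta):
    """Log of the Clayton copula density for theta > 0."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta)
            - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def fit_clayton(pairs, grid):
    """Grid-search maximum likelihood estimate of theta."""
    best_theta, best_ll = None, -math.inf
    for theta in grid:
        ll = sum(clayton_log_density(u, v, theta) for u, v in pairs)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta, best_ll


def aic(log_likelihood, n_params):
    """Akaike information criterion; smaller values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical pseudo-observations (u_i, u_j) in (0, 1), not data from the paper.
pairs = [(0.10, 0.12), (0.30, 0.25), (0.50, 0.55),
         (0.70, 0.65), (0.90, 0.88), (0.20, 0.22)]
grid = [0.5, 1.0, 2.0, 4.0, 8.0]
theta_hat, loglik = fit_clayton(pairs, grid)
print(theta_hat, aic(loglik, n_params=1))
```

Repeating the fit for each candidate family and comparing the criterion values gives the selection rule described in the text.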
As can be seen from Table 2, the accuracy of the traditional methods' results is affected by the limitations of the econometric model itself, such as the need to assume the conditional distribution of the residuals in advance and its inability to adapt well to the characteristics of the financial price data. However, the marginal distribution estimation method based on PSO-SVM has a clear advantage in both the performance of the marginal distribution and the results of the copula function. The marginal distribution estimation of PSO-SVM not only avoids the loss of data volume caused by excluding some samples, as happens with the traditional nonparametric method, but also obtains a distribution that is closer to the return series of financial assets than the parametric method's distribution. In addition, because of the existence of support vectors, the method is not limited by the sample data. In view of these findings, this paper uses an integrated system composed of a machine learning method (PSO-SVM) and copula function to further study the identification of systemically important financial institutions at the micro level.

Result analysis based on the Copula-CoVaR model
Table 3 shows the CoVaR values of the top five financial institutions calculated through the different marginal distribution estimation methods, based on the Shanghai Securities Composite Index.
According to the efficient market hypothesis, market prices fully reflect all available information; market-based measures therefore capture more micro-level information while taking into account the ever-changing market risks.
Due to the special characteristics of China's financial market, commercial banks were born out of the 'universal' national banking system in the early days of the founding of the People's Republic of China. The table makes it evident that, even though the financial system has diversified, large commercial banks still occupy the dominant position in China's financial system, making them systemically important. The results of different estimation methods consistently show that the banking industry's listed companies have an advantage in systemic importance, indicating that the financial system is still bank-led; some joint-stock commercial banks and city commercial banks also have systemic importance that cannot be ignored and should also be considered in the important matters of financial supervision. Furthermore, in order to verify the robustness of the model, Table 4 shows the CoVaR values calculated based on the Shenzhen Securities Component Index. A comparison of Tables 3 and 4 reveals that the ranking of SIFIs is consistent, which supports the robustness of the model constructed in this paper.
Our results indicate that, more than other types of financial institutions, banks are still the main contributors to systemic financial risk. Since the risk contagion between financial institutions evolves dynamically, the estimation may deviate from the actual figure if the size of financial institutions is considered the most important determinant in identifying SIFIs. In the practice of financial supervision, we should not only focus on the traditional large-scale state-owned commercial banks of systemic importance; we should also strengthen the supervision of financial institutions with high vulnerability, high potential for damage, and high debt ratios, even if their asset scale is not large.
Notes: #1-#5 are the financial institutions ranked 1 to 5, respectively, in the calculation results. Source: Authors.

Appendix A
The bivariate joint distribution built from a copula function consists of two parts: the distribution functions of the variables and the copula function that links the variables. The form of the joint distribution function of the corresponding strength parameters $u_0$ and $\Delta u$ is as follows: In this formula, $u_1 = F_1(u_0)$ and $u_2 = F_2(\Delta u)$ are the distribution functions of the strength parameters, and $\theta$ is the parameter of the copula function. Table A1 shows the relevant information of the six functions involved in this study.
Step 3: Estimate the risk spillover value of institution $i$. Based on Step 2, the copula parameters are estimated by the maximum likelihood method, and CoVaR is calculated with the Copula-CoVaR model.
Step 4: Sort the estimated CoVaR values of each financial institution to get the final result.
, $F_j$ and $F_i$ are the marginal distributions of $x$

Table 1. Basic information of financial institutions.

Table 2. Mean square error of different marginal distribution methods.

Table 3. CoVaR results of different marginal distribution estimation methods (calculation results based on Shanghai Securities Composite Index).

Table 4. CoVaR results of different marginal distribution estimation methods (calculation results based on Shenzhen Securities Component Index).

Table A2. Descriptive statistics of financial institutions' yield series.