Application of one-factor copula with Durante generators to high-dimensional data: empirical study on stock market of China

This paper investigates the performance of the one-factor copula with Durante generators (FDG copula) in high-dimensional applications. We use data with 28, 102 and 227 dimensions to compare the mean absolute error (MAE) in three cases. The results show that the estimation error decreases as the dimension increases, which means that the higher the dimension, the better this model performs. Empirically, we measure the dependence between industrial sectors in the Chinese stock market with FDG copulas. We find that the Machinery and equipment sector has the largest dependence coefficients with other sectors. In addition, by comparing the results before and during the COVID-19 pandemic, we find that the epidemic strengthened the connection between the Computer and Media industries and other industries. FDG copulas, being tractable and flexible, are well suited to high-dimensional estimation. Their potential in other fields remains to be explored.


Introduction
Copulas [1] are mathematical functions that join univariate marginal distributions into a multivariate joint distribution. Once the distributions of all random variables are known, copula functions can be used to model the dependence structure between them. There are many parametric families of copulas, such as Gaussian, Archimedean, and elliptical copulas, and different families allow for different dependence structures. Copula-based models provide a great deal of flexibility in modeling multivariate distributions, allowing the researcher to specify the models for the marginal distributions separately from the dependence structure (copula) that links them to form a joint distribution [2]. Copulas have been widely used in finance to model risk. Recently, they have also been used within machine learning to help generate synthetic tabular data. Their flexibility and convenience make them popular, especially in high-dimensional applications.
The most common copula models for high-dimensional data include, but are not limited to, Archimedean copulas, such as the Gumbel copula [3] and the Clayton copula [4], elliptical copulas, and pair copulas, such as the C-Vine and D-Vine copulas [5]. Some of these models are flexible but hard to compute, while others are tractable but constrained. In 2013, Krupskii and Joe [6] proposed factor copulas, which can handle multivariate data with tail dependence and tail asymmetry, properties that the multivariate normal copula does not possess. Factor copula models based on latent variables are a good alternative for modelling high-dimensional data, as they provide a wide range of dependence and allow for different types of tail behavior. In 2015, Mazo et al. [7] proposed the one-factor copula with Durante generators (FDG copula) by changing the linking copulas in the original model to the Durante class [8]. This model possesses all the advantages of the one-factor copula. Moreover, taking Durante linking copulas allows the integral in the one-factor copula model to be computed, and the resulting multivariate copula is nonparametric.
IOP Publishing, doi: 10.1088/1742-6596/1978/1/012045
In this study, we intend to uncover the estimation ability of FDG copulas in high-dimensional applications. By applying this model to data of different dimensions and comparing the estimation error, we can assess its performance and see whether it is applicable to real datasets. In the empirical study, we measure the dependence between 28 industrial sectors in the stock market of China. Comparison of correlation strength between industries is made to investigate the impact of COVID-19. This paper explores the potential of FDG copulas in high-dimensional applications. Our empirical results will also provide useful information for market participants to help them manage portfolios.
In the remainder of this paper, Section 2 describes the data and Section 3 presents the methodology. Empirical results are shown in Section 4, and conclusions are drawn in Section 5.

Data
In this study, we use daily stock prices of industrial sectors in the Chinese stock market. According to SWS Research, industries in China can be classified at three levels. The first level contains 28 sectors, and the second extends the number to 102. At the third level, the classification is refined to include 227 sectors. In the comparative analysis, we use data of all three levels over the period from January 1, 2019 to April 30, 2021. By applying FDG copulas to data of different levels, we can see how this model responds to the increase in dimension. In the empirical study, only the first-level data from September 3, 2018 to April 30, 2021 are used. Data before January 2, 2020 are treated as the pre-event period and data from January 2, 2020 onward as the COVID-19 period. Thus, we are able to analyze the impact of COVID-19 on the dependence between industrial sectors in the stock market of China. For computation, our data are transformed to log returns by $r_t = \ln P_t - \ln P_{t-1}$, where $P_t$ is the stock price at day $t$. Table 1 summarizes the data of the 28 industries before and after the outbreak of COVID-19. For most of the industries, the standard deviation is greater during the pandemic than before it. The values of skewness and kurtosis indicate that the data of all industries do not follow a normal distribution. The significant Jarque-Bera statistics provide further evidence of non-normality.
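The log-return transformation and the split into pre-event and COVID-19 periods can be sketched as follows. This is a minimal illustration, assuming the usual definition of a log return as the difference of consecutive log prices; the helper names `log_returns` and `split_by_date` and the prices are hypothetical and are not the SWS index data used in the study.

```python
import math
from datetime import date

def log_returns(prices):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for consecutive prices."""
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(prices, prices[1:])]

def split_by_date(dates, values, cutoff):
    """Split values into (strictly before cutoff, on or after cutoff)."""
    before = [v for d, v in zip(dates, values) if d < cutoff]
    after = [v for d, v in zip(dates, values) if d >= cutoff]
    return before, after

# Hypothetical closing prices for one sector index (not the paper's data).
dates = [date(2019, 12, 30), date(2019, 12, 31), date(2020, 1, 2), date(2020, 1, 3)]
prices = [100.0, 101.0, 99.5, 100.2]

returns = log_returns(prices)   # one fewer element than prices
return_dates = dates[1:]        # each return is dated at day t
pre_event, covid = split_by_date(return_dates, returns, date(2020, 1, 2))
```

Log returns add up over time, so the sum of the daily values equals the log of the ratio between the last and first price.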

Methodology
To obtain marginal distributions, ARMA-GJR-GARCH (1,1) is adopted. Then FDG copulas are implemented to model the joint distribution and measure the dependence coefficients. All the following calculations are done in R.

ARMA-GJR-GARCH (1,1)
In this model, ARMA (autoregressive moving average) describes weakly stationary stochastic time series in terms of autoregression and moving averages. GJR-GARCH [9] (the Glosten-Jagannathan-Runkle generalized autoregressive conditional heteroskedasticity model), which accounts for the leverage effect of negative shocks, models asymmetric volatility. In its standard form, the ARMA-GJR-GARCH (1,1) model is expressed as
$$r_t = \mu + \sum_{i=1}^{p} \phi_i r_{t-i} + \sum_{j=1}^{q} \psi_j \varepsilon_{t-j} + \varepsilon_t, \tag{1}$$
$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma I_{t-1} \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \tag{2}$$
where $r_t$ is the stock return at time $t$; $\varepsilon_t$ is the return residual; $\sigma_t^2$ is the variance of $\varepsilon_t$; $\gamma$ is the leverage effect; and $I_{t-1}$ equals 1 if $\varepsilon_{t-1} < 0$ and 0 otherwise. Note that $\varepsilon_t = \sigma_t z_t$, where $z_t$ are i.i.d. standardized innovations. Equations (1) and (2) are applied to obtain the cumulative distribution function (CDF) for each stock. We fit three candidate distributions to the data (the skewed normal, the skewed Student-t and the skewed generalized t distribution), and the best one for each stock is selected by the Bayesian information criterion (BIC).
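To make the asymmetric volatility mechanism concrete, the sketch below iterates the standard GJR-GARCH(1,1) variance recursion, in which the previous squared residual receives weight alpha after a positive shock and alpha + gamma after a negative one. It is an illustration only: the function name, the parameter values and the residuals are hypothetical, the starting variance is passed in by hand, and parameter estimation (done in R in the study) is not shown.

```python
def gjr_garch_variance(residuals, omega, alpha, gamma, beta, initial_variance):
    """Conditional variances sigma2[t] from the GJR-GARCH(1,1) recursion."""
    sigma2 = [initial_variance]
    for prev_eps in residuals[:-1]:
        # The indicator is 1 only after a negative shock, which adds gamma
        # to the weight on the previous squared residual.
        leverage = gamma if prev_eps < 0 else 0.0
        sigma2.append(omega + (alpha + leverage) * prev_eps ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical residuals and parameters, chosen only for illustration.
residuals = [1.0, -1.0, 0.5]
variances = gjr_garch_variance(residuals, omega=0.1, alpha=0.1, gamma=0.2,
                               beta=0.5, initial_variance=1.0)
```

In this toy run the two shocks have the same magnitude, yet the negative one contributes three times as much to the next variance because gamma is added to alpha.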
Since the marginal distributions in copulas should be uniform on [0,1], we standardize the error terms, $\hat{z}_t = \hat{\varepsilon}_t / \hat{\sigma}_t$, and transform them with the fitted CDF $F$:
$$\hat{u}_t = F(\hat{z}_t). \tag{4}$$
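The transformation to uniform margins can be sketched in two steps: divide each residual by its conditional standard deviation, then apply a CDF. The example below uses the standard normal CDF purely as a stand-in, which is an assumption made for brevity; the study selects among skewed distributions by BIC. All names and numbers are hypothetical.

```python
import math

def standardize(residuals, sigmas):
    """z_t = eps_t / sigma_t for each time point."""
    return [eps / sigma for eps, sigma in zip(residuals, sigmas)]

def normal_cdf(z):
    """Standard normal CDF, used here only as a placeholder for the fitted CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def to_uniform(residuals, sigmas, cdf=normal_cdf):
    """Map residuals to (0, 1) through standardization and a CDF."""
    return [cdf(z) for z in standardize(residuals, sigmas)]

# Hypothetical residuals and conditional standard deviations.
uniforms = to_uniform([0.0, 0.02, -0.03], [0.01, 0.02, 0.015])
```

Passing a different `cdf` callable is enough to swap in another fitted innovation distribution.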

FDG copulas
The FDG copula is embedded in the framework of one-factor models [10]. The class of FDG copulas is constructed by choosing appropriate linking copulas for the one-factor copula model. Let $U = (U_1, \ldots, U_d)$ be the margins obtained from ARMA-GJR-GARCH (1,1), with $U_i \sim U(0,1)$. In the one-factor copula model, $U_1, \ldots, U_d$ are assumed to be conditionally independent given the latent variable $U_0$. Then the one-factor copula is given by
$$C(u_1, \ldots, u_d) = \int_0^1 \prod_{i=1}^{d} C_{i|0}(u_i \mid u_0) \, du_0, \tag{5}$$
where $C_{0i}$ is the joint distribution of $(U_0, U_i)$ and $C_{i|0}(\cdot \mid u_0)$ is the conditional distribution of $U_i$ given $U_0 = u_0$. In the FDG copula model, the linking copulas $C_{0i}$, which link the factor $U_0$ to the variables $U_i$, belong to the Durante class. This class allows the integral in equation (5) to be calculated. It takes the form
$$C(u, v) = \min(u, v) \, f(\max(u, v)), \tag{7}$$
where $f: [0,1] \to [0,1]$, the generator of $C$, is a differentiable and increasing function such that $f(1) = 1$ and $t \mapsto f(t)/t$ is decreasing. The function $f$ takes different forms in different parametric families.
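A Durante-class copula is fully determined by its generator, so it can be evaluated with a single line of code. The sketch below implements the form min(u, v) f(max(u, v)) and checks it on two limiting generators that satisfy the stated conditions in the weak (non-strict) sense: f(t) = t, which yields the independence copula, and f(t) = 1, which yields the comonotone copula. The function and variable names are illustrative.

```python
def durante_copula(u, v, generator):
    """C(u, v) = min(u, v) * f(max(u, v)) for a generator f on [0, 1]."""
    return min(u, v) * generator(max(u, v))

def independence(t):
    """Generator f(t) = t, giving C(u, v) = u * v."""
    return t

def comonotone(t):
    """Generator f(t) = 1, giving C(u, v) = min(u, v)."""
    return 1.0

product_value = durante_copula(0.3, 0.7, independence)
minimum_value = durante_copula(0.3, 0.7, comonotone)
```

Because f(1) = 1, setting one argument to 1 returns the other argument, which is the uniform-margin property every copula must have.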
Mazo et al. [7] gave four examples of families: FDG-CA (with Cuadras-Augé generators), FDG-F (with Fréchet generators), FDG-sinus (with Durante-sinus generators) and FDG-exponential (with Durante-exponential generators). They also introduced a new family of extreme-value copulas derived from FDG copulas, called EV-FDG. In this study, we adopt only FDG-CA, FDG-F, and their extreme-value copula EV-FDG-CAF, because the other two families do not fit our data. The Cuadras-Augé family allows for upper but no lower tail dependence, and the Fréchet family allows for both.
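The tail behavior of the two adopted families can be checked numerically on the diagonal of the linking copula. The sketch below assumes the standard generators f(t) = t^(1 - theta) for Cuadras-Augé and f(t) = (1 - theta) t + theta for Fréchet, which are stated here as assumptions rather than quoted from the text, and approximates the usual tail dependence limits at points close to 0 and 1. The coefficients describe the link between the latent factor and a single variable, not the dependence between two sectors.

```python
def cuadras_auge(theta):
    """Generator f(t) = t**(1 - theta), theta in [0, 1] (assumed standard form)."""
    return lambda t: t ** (1.0 - theta)

def frechet(theta):
    """Generator f(t) = (1 - theta) * t + theta, theta in [0, 1] (assumed standard form)."""
    return lambda t: (1.0 - theta) * t + theta

def diagonal(u, generator):
    """C(u, u) for a Durante copula, which reduces to u * f(u)."""
    return u * generator(u)

def lower_tail(generator, eps=1e-6):
    """Approximate the limit of C(u, u) / u as u tends to 0."""
    return diagonal(eps, generator) / eps

def upper_tail(generator, eps=1e-6):
    """Approximate the limit of (1 - 2u + C(u, u)) / (1 - u) as u tends to 1."""
    u = 1.0 - eps
    return (1.0 - 2.0 * u + diagonal(u, generator)) / (1.0 - u)

theta = 0.5  # hypothetical parameter value
ca_lower, ca_upper = lower_tail(cuadras_auge(theta)), upper_tail(cuadras_auge(theta))
f_lower, f_upper = lower_tail(frechet(theta)), upper_tail(frechet(theta))
```

With these generators the Cuadras-Augé lower coefficient shrinks toward zero while its upper coefficient stays near theta, and both Fréchet coefficients stay near theta, in line with the description of the two families.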

FDG-F.
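In equation (7), let the generator be
$$f_i(t) = (1 - \theta_i) \, t + \theta_i, \quad \theta_i \in [0,1]. \tag{12}$$
A copula belonging to the Durante class with generator (12) is the well-known Fréchet copula with parameter $\theta_i$ [12]. For a pair of variables $(U_i, U_j)$, Spearman's rho $\rho_{ij}$, the upper and lower tail dependence coefficients $\lambda^{(U)}_{ij}$ and $\lambda^{(L)}_{ij}$, and Kendall's tau $\tau_{ij}$ in this family are given by
$$\rho_{ij} = \lambda^{(U)}_{ij} = \lambda^{(L)}_{ij} = \theta_i \theta_j, \qquad \tau_{ij} = \frac{\theta_i \theta_j (\theta_i \theta_j + 2)}{3}.$$
The upper and lower tail dependence coefficients are equal, so this family is tail symmetric.
In EV-FDG-CAF, the bivariate margin $C_{\#,ij}$ is a Cuadras-Augé copula with parameter (and therefore extremal dependence coefficient) $\lambda_{ij}$.
The accuracy of the fitted copulas is assessed by the mean absolute error (MAE),
$$\mathrm{MAE}_r = \frac{1}{p} \sum_{i<j} \left| \hat{r}_{ij} - r(\hat{\theta}_i, \hat{\theta}_j) \right|, \tag{15}$$
where $\mathrm{MAE}_r$ is the MAE computed by Kendall's tau or Spearman's rho; $p$ is the number of variable pairs; $\hat{r}_{ij}$ is the empirical estimator and $r(\hat{\theta}_i, \hat{\theta}_j)$ is the value computed through FDG copulas.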
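The MAE comparison can be sketched as an average, over all variable pairs, of the absolute gap between an empirical coefficient and its model-implied value. The example assumes the FDG-F case, where integrating out the factor leaves each pair with a Fréchet copula of parameter theta_i * theta_j, so that Spearman's rho is theta_i * theta_j and Kendall's tau is theta_i * theta_j * (theta_i * theta_j + 2) / 3. These closed forms are supplied here as standard Fréchet-copula results, and the parameter values and empirical matrix are made up for illustration.

```python
from itertools import combinations

def fdg_f_spearman(theta_i, theta_j):
    """Model-implied Spearman's rho for a pair under FDG-F (assumed form)."""
    return theta_i * theta_j

def fdg_f_kendall(theta_i, theta_j):
    """Model-implied Kendall's tau for a pair under FDG-F (assumed form)."""
    a = theta_i * theta_j
    return a * (a + 2.0) / 3.0

def mean_absolute_error(empirical, thetas, coefficient):
    """Average |empirical - model| over all unordered pairs i < j."""
    pairs = list(combinations(range(len(thetas)), 2))
    total = sum(abs(empirical[i][j] - coefficient(thetas[i], thetas[j]))
                for i, j in pairs)
    return total / len(pairs)

# Hypothetical fitted parameters and empirical Spearman's rho (not the paper's values).
thetas = [0.5, 0.8, 0.6]
empirical_rho = [[1.00, 0.42, 0.28],
                 [0.42, 1.00, 0.50],
                 [0.28, 0.50, 1.00]]
mae_rho = mean_absolute_error(empirical_rho, thetas, fdg_f_spearman)
```

Passing `fdg_f_kendall` together with an empirical Kendall's tau matrix gives the corresponding MAE for tau.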

Performance of FDG copulas in different dimensions
As mentioned above, there are 28 first-level industries, 102 second-level industries and 227 third-level industries. When we measure the dependence between sectors by FDG copulas with data of the three levels, there are therefore 28, 102 and 227 dimensions, respectively. We measure Spearman's rho and Kendall's tau with three copula families and calculate their MAE in the three cases. Comparing the MAE values at different dimensions allows us to assess the accuracy of FDG copulas: the smaller the MAE, the more accurate the model. The results are reported in Table 2. They show that, for every copula family, both MAE measures decrease as the dimension increases. For one of the two measures, the change is barely visible when the dimension increases from 28 to 102, in all three copula families; a difference appears only when the dimension reaches 227. The other measure is more sensitive to the dimension, although its decrease diminishes from one step to the next. For example, for FDG-CA it decreases by 1.4% when the dimension increases by 74 (from 28 to 102) and by 1% when the dimension increases by 125 (from 102 to 227). In general, FDG copulas are stable and well suited to high-dimensional applications. It is worth mentioning that FDG-sinus works only for the first-level industrial data and FDG-exponential does not fit the data at all. Thus, these two families are not adopted in the empirical analysis, and the reason why they are not applicable remains to be investigated in future studies.
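For a sense of scale, the number of sector pairs entering each MAE can be counted directly. The sketch assumes that the pair count is the number of unordered pairs, d(d - 1)/2, for a d-dimensional data set; the function name is illustrative.

```python
def number_of_pairs(dimension):
    """Number of unordered variable pairs in a data set of the given dimension."""
    return dimension * (dimension - 1) // 2

pair_counts = {d: number_of_pairs(d) for d in (28, 102, 227)}
# pair_counts == {28: 378, 102: 5151, 227: 25651}
```

The number of pairwise coefficients being averaged grows roughly quadratically with the dimension, from 378 pairs at the first level to 25651 at the third.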

Dependence between industrial sectors
Following the test of estimation ability, we apply FDG copulas in the empirical analysis. Since the second- and third-level classifications are too detailed, we adopt the first-level data, which cover 28 sectors. Dependence is measured by three types of FDG copulas, and the MAE is computed to select the best family for each period. Table 3 shows that FDG-CA, which allows for upper tail dependence, is the most suitable for both periods because its MAEs are the smallest. Empirical results estimated by FDG-CA are displayed in Tables 4 and 5 and Figures 1 and 2. Table 4 reports the mean values of the dependence coefficients of the top five sectors before the COVID-19 pandemic. The Machinery and equipment sector has the largest dependence coefficients. This is not surprising because this sector provides materials and products for most other industries, such as Mining, Steel, and Light manufacturing. Similar to Machinery and equipment sector, Chemical …
During the COVID-19 period (Table 5), Machinery and equipment, Chemical and Light manufacturing remained the most correlated industries. However, Commerce and Transportation were replaced by the Computer and Media industries. We believe that this change did not happen naturally over time but should be attributed to the COVID-19 pandemic. After the outbreak of the disease, China imposed a lockdown throughout the country. Even though it lasted for no more than three months, people's outdoor activities and travel were strictly restricted. For a long time, people could only work, study and spend their leisure time at home. This situation raised the need for computers and media devices, because communication with the outside world took place mostly through the media and the Internet. Although the stock market in China experienced some turbulence at the early stage of the COVID-19 pandemic, it stayed strong and stable compared with the stock markets in the United States and some European countries.
Normally, in a crisis period, correlations between industrial sectors are strengthened by systemic risk. However, we find that the values in Table 5 are smaller than those in Table 4, which means that the correlations became weaker during the epidemic. This may be because economic production activities were restricted, especially in the first half of 2020, which led to a decrease in correlations between industries. From 2018 to 2021, the Machinery and equipment sector has the closest relationship with other industries. We therefore list the five sectors most related to it in the two periods in Figure 1 and Figure 2, respectively. Values in brackets are Kendall's tau dependence coefficients. The sectors and their rankings are the same as those in Table 4 and Table 5. In the pre-event period, Machinery and equipment and Architectural ornament were closely related, since the former provides raw materials and equipment for the latter. This closeness was broken during the COVID-19 pandemic, when Architectural ornament was replaced by the Communication industry. Like Computer and Media, Communication played a critical role because of the constraints on people's daily life.
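The sector rankings discussed in this section rest on averaging each sector's dependence coefficients with the other sectors and sorting the results. A minimal sketch of that step is given below, assuming a symmetric matrix of pairwise coefficients and a plain arithmetic mean over the other sectors; the sector labels and numbers are made up and do not reproduce Tables 4 and 5.

```python
def mean_dependence(names, tau):
    """Average each sector's coefficients with all other sectors."""
    n = len(names)
    return {names[i]: sum(tau[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)}

def top_sectors(names, tau, k=5):
    """Return the k sectors with the largest mean dependence, highest first."""
    means = mean_dependence(names, tau)
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:k]

# Toy Kendall's tau matrix for three hypothetical sectors.
names = ["Sector A", "Sector B", "Sector C"]
tau = [[1.0, 0.6, 0.4],
       [0.6, 1.0, 0.3],
       [0.4, 0.3, 1.0]]
ranking = top_sectors(names, tau, k=2)
```

Running the same function on the coefficient matrices of the two periods and comparing the resulting lists is one way to see which sectors enter or leave the top five.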

Conclusions
In this paper, we first applied FDG copulas to data with three different dimensions and examined their accuracy in the three cases. We found that this model performs better as the dimension increases. We then analysed the dependence between 28 industrial sectors with daily stock prices from September 3, 2018 to April 30, 2021. The results show that the Machinery and equipment sector has the strongest correlation with other industries. Meanwhile, the COVID-19 pandemic is found to have influenced the dependence to some extent. Under the epidemic, the Computer, Media, and Communication sectors play more important roles. We suggest that investors in the stock market could manage their portfolios in these industries as well as in Machinery and equipment, Chemical and Light manufacturing. High upper tail dependence coefficients between them mean a high possibility of making a profit from these sectors simultaneously.