Phase separation and scaling in correlation structures of financial markets

Financial markets, being spectacular examples of complex systems, display rich correlation structures among price returns of different assets. The correlation structures change drastically, akin to critical phenomena in physics, as do the influential stocks (leaders) and sectors (communities), during market events like crashes. It is crucial to detect their signatures for timely intervention or prevention. Here we use eigenvalue decomposition and eigen-entropy, computed from eigenvector centralities of different stocks in the cross-correlation matrix, to extract information about the disorder in the market. We construct a ‘phase space’, where different market events (bubbles, crashes, etc) undergo phase separation and display order–disorder movements. An entropy functional exhibits scaling behavior. We propose a generic indicator that facilitates the continuous monitoring of the internal structure of the market—important for managing risk and stress-testing the financial system. Our methodology would help in understanding and foreseeing tipping points or fluctuation patterns in complex systems.


Introduction
Even before we could completely recover from the long-lasting effects of the global economic downturn of 2007-08 [1], we are threatened by another impending economic crisis, triggered by the coronavirus (COVID-19) pandemic. The last crisis brought us both predicament and hope. Predicament, since the traditional theories in economics could neither predict nor even warn of the near-complete breakdown of the global financial system. Hope, since one began to witness signs of change in economic and financial thinking, including the recognition that there is a deeper (and less understood) link between macroeconomics and finance [2][3][4], which certainly merits more attention. Undoubtedly, the financial market serves as an ideal candidate for modeling a complex system [5,6], which is generally composed of many constituents of diverse forms and nature that are largely interconnected, such that their strong inter-dependencies and emergent behavior change with time. Thus, it becomes almost impossible to describe the dynamics of the complex system through simple mathematical equations, and new tools and interdisciplinary approaches are much needed. Historically, financial markets have often exhibited sharp and largely unpredictable drops at a systemic scale, known as 'market crashes' [7]. Such rapid changes may in some cases be triggered by unforeseen stochastic events or exogenous shocks (e.g., the coronavirus pandemic), or, more often, they may be driven by underlying endogenous processes (e.g., the bursting of a housing bubble). New insights and concepts, such as systemic risk, tipping points, contagion and network resilience, have surfaced in the financial literature, prompting people to better monitor the highly interconnected macroeconomic and financial systems and, thus, anticipate future economic slowdowns or financial crises.
As a spectacular example of a complex system [8,9], the financial market [10][11][12][13] displays rich correlation structures [14][15][16][17][18][19] among the price returns of different assets. These have often been visualized as correlation-based networks [20][21][22], with dominant stocks identified as influential leaders and sectors as communities [23][24][25]. The correlation structures often change drastically, as do the leaders and communities in the market, especially during market events like crashes and bubbles [7]. Therefore, the continuous monitoring of the complex structures of the market correlations is both crucial and practical [18,26,27]. Recently, Pharasi et al [18,19] used the tools of random matrix theory to determine market states and long-term precursors to crashes, and confirmed that during a market crash all the stocks behaved similarly, such that the whole market acted like a single huge cluster or community. In contrast, during a bubble period, a particular sector was overpriced or over-performed, accentuating the disparities among the various sectors or communities. However, there are no formal definitions of market crashes or bubbles; in fact, a certain arbitrariness exists in declaring a market event a crash or a bubble. Hence, it is extremely difficult to detect the signatures of these events early enough to intervene or prevent them.
In this paper, we extract information about the disorder in the market using the eigen-entropy measure [28], computed from the eigenvector centralities (ranks) [25] of different stocks in the market, and show for the first time that different market events (correlation structures) undergo phase separation [29,30] in a constructed 'phase space'. For the construction of the phase space we use the transformed variables $|H - H_M|$ and $|H - H_{GR}|$, computed from the eigen-entropies $[H, H_M, H_{GR}]$ following the eigenvalue decomposition of the correlation matrices ($C$) into the market modes ($C_M$) and the composite group plus random modes ($C_{GR}$). We further show that all market events, characterized by $[H, H_M, H_{GR}]$, are either 'business-as-usual' periods (located toward the interior of the phase space) or 'near-critical' events (located at the periphery). Thus, one can see movements in the order-disorder phases as market events evolve in the phase space, as observed in critical phenomena of physical systems [31][32][33]. For robustness, we chose two different financial markets, the US S&P-500 and the Japanese Nikkei-225, over a 32 year period, and studied the evolution of the cross-correlation structures and their corresponding eigen-entropies. One of the entropy difference measures, $H - H_M$, displays scaling [34] behavior with respect to the mean market correlation $\mu$. Further, a functional of the entropy difference measure, $-\ln(H - H_M)$, acts as a good gauge of the market fear (volatility index VIX) [35]. Analogous to the black-hole entropy, which reveals information about the internal structure of a black hole, our methodology with the eigen-entropy (a measure of market disorder) would also reveal the nature of the internal market structure. Further, the phase separation would help us to label the events as anomalies, bubbles, crashes, or other interesting types of events.
It would also provide a few generic indicators that would facilitate the continuous monitoring of the internal structure of the correlation epochs [18,19,26,27,36,37,38]. We anticipate that this new methodology would help us to better understand the internal market dynamics and characterize the events in different phases as anomalies, bubbles, crashes, etc, which could help in better risk management and portfolio optimization [39]. It could also be easily adapted and broadly applied to the study of other complex systems, such as in brain science [28] or the environment [40]. Note that our analyses include only those stocks that are present in the data for the entire duration; zero-return entries were added for the missing days. The list of stocks (along with the sectors) for the two markets and the sectoral abbreviations are given in the SI (https://stacks.iop.org/JPCOMPLEX/2/015002/mmedia) tables S1 and S2.

Cross-correlation matrix
The return series $r_i(\tau)$ are constructed from $P_i(\tau)$, the adjusted closing price of stock $i$ on day $\tau$, and $\Delta$ is the shift in days. Instead of working with a long time series to determine the correlation matrix for $N$ stocks, we work with a short time epoch of $M$ days with a shift of $\Delta$ days. Then, the equal-time Pearson correlation coefficients between stocks $i$ and $j$ are defined as $C_{ij}(\tau) = (\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle)/(\sigma_i \sigma_j)$, where $\langle \dots \rangle$ represents the expectation value computed over the time epoch of size $M$ ending on day $\tau$, and $\sigma_k$ represents the standard deviation of the $k$th stock evaluated over the same time epoch. We use $C(\tau)$ to denote the return correlation matrix for the time epoch ending on day $\tau$ (see e.g., figure 1). Here, we show the results for $M = 40$ d with a shift of $\Delta = 20$ d (other choices of $M$ and $\Delta$ in SI figure S1).
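The epoch-wise construction can be illustrated with a minimal sketch. It assumes logarithmic one-day returns (an assumption made here for illustration; the exact return definition is not recoverable from the text above) and uses synthetic prices. The function names `log_returns` and `epoch_correlations` are hypothetical.

```python
import numpy as np

def log_returns(prices):
    """(T, N) adjusted closing prices -> (T-1, N) log returns (assumed definition)."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)

def epoch_correlations(returns, M=40, shift=20):
    """Pearson correlation matrix for each epoch of M days, stepped by `shift` days.

    Returns a list of (index of the epoch's last day, C) pairs, C of shape (N, N).
    """
    returns = np.asarray(returns, dtype=float)
    T = returns.shape[0]
    out = []
    for end in range(M, T + 1, shift):
        window = returns[end - M:end]           # M consecutive days
        C = np.corrcoef(window, rowvar=False)   # columns are stocks
        out.append((end - 1, C))
    return out

# Synthetic prices (illustrative only): a common factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
T, N = 201, 5
common = rng.normal(0.0, 0.01, size=(T, 1))
idio = rng.normal(0.0, 0.01, size=(T, N))
prices = 100.0 * np.exp(np.cumsum(common + idio, axis=0))

rets = log_returns(prices)
epochs = epoch_correlations(rets, M=40, shift=20)
```

With $M = 40$ and $\Delta = 20$, consecutive epochs overlap by half their length. A stock with constant returns inside an epoch has zero variance there and would give undefined correlations; the sketch does not handle that case.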

Eigenvector centrality
The correlation matrix $C$ can be used to produce a correlation-based network [17,26,41]. For any given correlation-based network $G := (N, E)$ with $|N|$ nodes and $|E|$ edges, let $A = (a_{i,j})$ be the adjacency matrix, such that $a_{i,j} = 1$ if node $i$ is linked to node $j$, and $a_{i,j} = 0$ otherwise. The relative centrality score $p_i$ of node $i$ can be defined as $p_i = \frac{1}{\lambda} \sum_{j \in M(i)} p_j = \frac{1}{\lambda} \sum_{j=1}^{|N|} a_{i,j} p_j$, where $M(i)$ is the set of neighbors of node $i$ and $\lambda$ is a constant. With a small mathematical rearrangement, this can be written in vector notation as the eigenvector equation $A|p\rangle = \lambda|p\rangle$. In general, there may exist many different eigenvalues $\lambda$ for which a non-zero eigenvector solution $|p\rangle$ exists. We use the characteristic equation $|A - \lambda \mathbb{1}| = 0$ to compute the eigenvalues $\{\lambda_1, \dots, \lambda_N\}$. However, the additional requirement that all the entries in the eigenvector be non-negative ($p_i \geq 0$) implies (by the Perron-Frobenius theorem) that only the maximum eigenvalue ($\lambda_{max}$) results in the desired centrality measure. The $i$th component of the related eigenvector then gives the relative eigenvector centrality score of node $i$ in the network. However, the eigenvector is only defined up to a common factor, so only the ratios of the centralities of the nodes are well defined.
To define an absolute score one must normalise the eigenvector, such that the sum over all $N$ nodes is unity, i.e., $\sum_{i=1}^{N} p_i = 1$. Furthermore, this can be generalized so that the entries of $A$ are real numbers representing the connection strengths. In order to satisfy the conditions of the Perron-Frobenius theorem, throughout this work we use $a_{i,j} = |C_{i,j}|^2$, where $i, j = 1, \dots, N$. In general, we can also consider higher powers $n$ (any positive integer, as discussed and shown in SI figure S2).
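A minimal sketch of this centrality computation, assuming a symmetric correlation matrix as input. The function name `eigenvector_centrality` and the toy matrix are hypothetical; `numpy.linalg.eigh` is used because $A$ is symmetric.

```python
import numpy as np

def eigenvector_centrality(C, power=2):
    """Normalised eigenvector centralities p_i of A = |C|**power (element-wise).

    A is symmetric with non-negative entries, so the eigenvector of its largest
    eigenvalue can be chosen with non-negative components (Perron-Frobenius).
    """
    A = np.abs(np.asarray(C, dtype=float)) ** power
    _, vecs = np.linalg.eigh(A)      # eigenvalues are returned in ascending order
    v = np.abs(vecs[:, -1])          # eigh fixes the vector only up to a sign
    return v / v.sum()               # normalise so that sum_i p_i = 1

# Toy correlation matrix: stocks 0 and 1 are strongly correlated, stock 2 is not.
C_toy = np.array([[1.0, 0.8, 0.1],
                  [0.8, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
p_toy = eigenvector_centrality(C_toy)

# Fully correlated market: every stock is equally central.
p_uniform = eigenvector_centrality(np.ones((3, 3)))
```

If the largest eigenvalue of $A$ is degenerate (for example when $C$ is the identity matrix), the leading eigenvector is not unique and this sketch returns an arbitrary member of the eigenspace.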

Eigenvalue decomposition of cross-correlation matrix
The correlation matrix $C$ of size $N \times N$ has $N$ eigenvalues, say $\{\lambda_1, \dots, \lambda_N\}$, which may be arranged in descending order of magnitude. Random matrix theory tells us that the asymptotic behavior of the eigenvalues of large random matrices (whose entries are independent identically distributed random variables) follows the Marčenko-Pastur distribution [42,43]. By using the eigenvalue decomposition, we can thus separate the true correlations (coming from the signal) from the spurious correlations (coming from the random noise) [14,15,44,45]. The maximum eigenvalue $\lambda_1 = \lambda_{max}$ of the correlation matrix $C$ corresponds to a market mode $C_M$ that reflects the aggregate dynamics of the market common across all stocks, and is strongly correlated with the mean market correlation $\mu$. The group modes $C_G$, which correspond to the next few eigenvalues after the largest eigenvalue of the correlation matrix, capture the sectoral behavior of the market. The remaining eigenvalues capture the random modes $C_R$ of the market. However, in practice it is often difficult to separate the group modes and the random modes distinctly (without arbitrariness). In order to avoid this, we choose to decompose the correlation matrix into the market mode $C_M$ and the composite group plus random mode $C_{GR}$:
$C = C_M + C_{GR} = \lambda_1 |e_1\rangle\langle e_1| + \sum_{i=2}^{N} \lambda_i |e_i\rangle\langle e_i|$, (1)
where $|e_i\rangle$ denotes the normalised eigenvector corresponding to the eigenvalue $\lambda_i$.
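The split into a market mode and a composite group plus random mode can be sketched as follows, assuming that the market mode is the rank-one term of the largest eigenvalue and that everything else is assigned to $C_{GR}$. The function name `decompose` and the synthetic data are hypothetical.

```python
import numpy as np

def decompose(C):
    """Split a correlation matrix into C_M (largest eigenvalue) and C_GR (the rest)."""
    C = np.asarray(C, dtype=float)
    vals, vecs = np.linalg.eigh(C)        # ascending eigenvalues, orthonormal vectors
    lam1, e1 = vals[-1], vecs[:, -1]
    C_M = lam1 * np.outer(e1, e1)         # rank-one market mode
    C_GR = C - C_M                        # group plus random modes
    return C_M, C_GR

# Synthetic returns with a common factor (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)) + rng.normal(size=(200, 6))
C = np.corrcoef(X, rowvar=False)
C_M, C_GR = decompose(C)
```

Because the eigenvectors are orthonormal, subtracting the rank-one term removes exactly the largest eigenvalue, so the leading eigenvalue of `C_GR` is the second eigenvalue of `C`.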

Eigen-entropy
Following the tradition in information theory, we use the eigen-entropy $H = -\sum_{i=1}^{N} p_i \ln p_i$, since all the normalised eigenvector centralities are non-negative ($p_i \geq 0$) and $\sum_{i=1}^{N} p_i = 1$ by construction. The eigen-entropy may be described as a kind of measure of disorder in the matrix $A$, where $a_{i,j} = |C_{i,j}|^2$: the higher the eigen-entropy, the higher the disorder in the matrix, the highest being the case of the Wishart ensemble that is invariant under orthogonal transformations (WOE, Wishart Orthogonal Ensemble [46]), where $H \sim \ln N$. For empirical correlation matrices, the eigen-entropy lies within the limits $[0, \ln N]$.
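Similarly, corresponding to $C_M$ and $C_{GR}$, we can compute $H_M$ and $H_{GR}$, respectively. Thus, from each cross-correlation matrix $C(\tau)$, we can use the eigenvalue decomposition to construct the set of matrices $[C(\tau), C_M(\tau), C_{GR}(\tau)]$ and compute the corresponding eigen-entropies $[H(\tau), H_M(\tau), H_{GR}(\tau)]$.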
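Putting the pieces together, the three eigen-entropies of one epoch can be computed as in the sketch below. It assumes $a_{i,j} = |C_{i,j}|^2$ for all three matrices and uses the convention $0 \ln 0 = 0$; the helper names are hypothetical.

```python
import numpy as np

def centrality(Cmat, power=2):
    """Normalised leading eigenvector of A = |Cmat|**power (element-wise)."""
    A = np.abs(Cmat) ** power
    _, vecs = np.linalg.eigh(A)
    p = np.abs(vecs[:, -1])
    return p / p.sum()

def eigen_entropy(p):
    """H = -sum_i p_i ln p_i, with zero entries contributing nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def entropies(C):
    """Return (H, H_M, H_GR) for one correlation matrix C."""
    C = np.asarray(C, dtype=float)
    vals, vecs = np.linalg.eigh(C)
    C_M = vals[-1] * np.outer(vecs[:, -1], vecs[:, -1])
    C_GR = C - C_M
    return tuple(eigen_entropy(centrality(X)) for X in (C, C_M, C_GR))

# Equicorrelated toy market: all stocks are equivalent, so every centrality
# vector is uniform and each entropy equals ln N.
N, rho = 4, 0.5
C_equi = (1 - rho) * np.eye(N) + rho * np.ones((N, N))
H, H_M, H_GR = entropies(C_equi)

# Phase-space coordinates used later in the text.
x, y = abs(H - H_M), abs(H - H_GR)
```

In this fully symmetric toy case both coordinates vanish, which matches the statement in the text that during crashes $H$ and $H_M$ approach the maximum disorder $\ln N$.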

Market indicators
Traditionally, the market index returns $r$ or the volatility index (VIX) are used to gauge the state of the market and detect market crises. It has been observed that the eigenvalues of the cross-correlation matrix, as well as the mean market correlation $\mu$, can also act as indicators of the market state. In this paper, we propose that the eigen-entropy measures also provide important information about the market. Most importantly, the functional $-\ln(H - H_M)$ acts as an excellent market indicator (as discussed below). Figure 2 shows the eigenvalue decompositions of the correlation matrices for (top to bottom) a normal period, an anomaly, a type-1 event, a crash, and the WOE. The columns of figure 2 show (left to right) the full correlation matrix $C$, the market mode $C_M$, and the group-random mode $C_{GR}$. The last column shows the ranked eigenvector centralities $p_i$ of the different correlation modes: full ($C$, black curve), market mode ($C_M$, turquoise curve) and group-random mode ($C_{GR}$, gray curve). Evidently, the internal structure of the cross-correlation matrix changes considerably with time [18,19,26,27,36,37,38], which changes the importance/hierarchy of the stocks (leaders) and the block structures (communities). This in turn changes the eigen-entropies $[H(\tau), H_M(\tau), H_{GR}(\tau)]$, which are used to create a phase space where each frame is represented by a point. As time evolves, different parts of the phase space are occupied, and this allows us to identify certain phases (restricted to some regions) and characterize the market events as crashes, etc (see supplementary videos 1, 2).
Interestingly, for a normal period, the three curves are distinct and there are hierarchies in the ranks in all curves; for the market anomaly, all three curves almost coincide; in the interesting type-1 period (classified by the position of the point in a certain region of the phase space), the curves corresponding to the full and the group-random modes coincide, while there is a strict hierarchy in the eigenvector centralities of the market mode; for the crash period, the curves corresponding to the full and the market modes coincide, while there is a strict hierarchy in the eigenvector centralities of the group-random mode; and for the WOE (without internal structure), the curves corresponding to the full and the group-random modes once again coincide, while there is a strict hierarchy in the eigenvector centralities of the market mode.

Eigen-entropy and phases
As seen in figure 1, the events in the phase space $[H, H_M, H_{GR}]$ appear to be scattered on a complicated manifold (created by MATLAB's surface interpolation). However, there appear to be different phases. We thus … (d)). For frame-wise evolution, see SI supplementary videos 1, 2. We found that many anomalies occurred just around the major crashes, and that intriguing patterns appeared (termed interesting events of type-1 and type-2, belonging to two distinct regions in the phase space). The crashes occupy the region of the phase space where $H - H_M \simeq 0$. During the crashes, $H$ and $H_M$ almost reach the maximum disorder, $\ln N$ (corresponding to the random WOE). Events like the 'Dot-com bubble' that appear along the $H - H_{GR} \simeq 0$ axis are termed interesting events of type-1. Events that lie far away from the origin and from both axes are termed interesting events of type-2; these include frames with exogenous shocks (like Hurricane Katrina, etc). Events lying close to the origin are anomalies, happening right before or right after major crashes.

Phase separation, order-disorder movements and scaling
The above interesting features led us to try the transformed variables $|H - H_M|$ and $|H - H_{GR}|$ as independent coordinates of the phase space. Very interestingly, as evident from figures 4(a) and (b), the event frames show clear phase separation into anomalies (green region), crashes (red region), normal (gray), type-1 (light blue region) and type-2 (deep blue region), for both the S&P-500 and Nikkei-225 markets. The order-disorder movements, from normal (in the central region) to near-critical phases (in the peripheral regions), are intriguing.
We have also studied in detail sequences of six frames to follow the order-disorder movements (SI figures S4 and S5) in the cases of major crashes and bubbles (SI table S3). The similar nature of the order-disorder movements in all the major crashes and Dot-com bubbles, nine events in USA and eleven events in JPN, indicates the robustness of the method. Moreover, we found that $(H - H_M) \sim \alpha \exp(-\beta\mu)$, where $\alpha$ and $\beta$ are constants (see figure 4(c) for USA and JPN). The best-fit line yields $\alpha \simeq 0.85 \pm 0.03$ and $\beta \simeq 10.22 \pm 0.25$, with adjusted $R^2 = 0.95$. Interestingly, the market event frames segregate into different portions, interspersed by the normal events. This data collapse onto a single curve indicates a scaling behavior [34], which implies that the co-movements in price returns for different financial assets, varying across countries, are governed by the same statistical law: a non-trivial and striking behavior. This suggests that markets have an inherent structure that remains largely invariant; it has an average structure with fluctuations (dispersion). The dispersion around the average behavior is slightly larger in JPN than in USA.
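One simple way to estimate $\alpha$ and $\beta$ is a straight-line fit of $\ln(H - H_M)$ against $\mu$. The sketch below is an illustration under that assumption and is not necessarily the fitting procedure used for figure 4(c). The data are synthetic, generated from the reported best-fit values with multiplicative noise, and `fit_exponential` is a hypothetical helper.

```python
import numpy as np

def fit_exponential(mu, dH):
    """Fit dH ~ alpha * exp(-beta * mu) by least squares on ln(dH) versus mu."""
    mu = np.asarray(mu, dtype=float)
    dH = np.asarray(dH, dtype=float)
    keep = dH > 0                                   # the logarithm needs dH > 0
    slope, intercept = np.polyfit(mu[keep], np.log(dH[keep]), 1)
    return float(np.exp(intercept)), float(-slope)  # (alpha, beta)

# Synthetic data (illustrative only), not market data.
rng = np.random.default_rng(1)
mu = rng.uniform(0.05, 0.6, size=500)
dH = 0.85 * np.exp(-10.22 * mu) * np.exp(rng.normal(0.0, 0.05, size=mu.size))

alpha, beta = fit_exponential(mu, dH)
```

A fit in log space weights relative rather than absolute deviations, so it gives more influence to epochs with small $H - H_M$ (high $\mu$) than a direct non-linear fit would.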
The phase properties are found to be fairly robust, though the phase boundaries are not very sharp (and may depend on parameters like the window choice, shift, etc; SI figures S1 and S2). In fact, once we characterize the epochs (event frames) into different 'phases', we can create different ensembles of anomalies, type-1 events, type-2 events, crashes and normal events. All frames in a certain phase have very similar properties (hierarchies in the ranks of stocks) and can be averaged over to represent that phase (figure 5). For each type of event, we find that the eigenvector centralities have distinct ranges of values and that the sorted eigenvector centrality curves have interesting features (hierarchies) in the eigenmodes. The eigen-entropies quantify these features appropriately. For the S&P-500 and Nikkei-225 markets, we compute the histograms of the eigenvector centralities $p_i$. Figure 5 shows the histograms (S&P-500 (top) and Nikkei-225 (bottom)) for all the characterized events (anomalies, crashes, etc), averaged over the respective ensembles, for the full and decomposed matrices. For comparison, we also plot the results for the WOE (black stars). This helps us understand what happens in the market during these different types of events (characterized as phases) and what types of hierarchies exist among the stocks' eigenvector centralities. It also sheds new light on the formation of type-1 events, their development, crashes, etc. It is interesting to note that the properties remain similar across different markets (USA and JPN) and across various periods of time.
One could also simulate (to be reported elsewhere) various correlation structures from a correlated WOE with the mean correlation as tuning parameter. The non-trivial inherent market structure (sectors or communities) plays a crucial role in the observed scaling behavior. We also observed from the evolution of the

Generic market indicator
Finally, the functional $-\ln(H - H_M)$ is found to act as a good gauge of the market characteristic ($\mu$) and the market fear (VIX) (figure 7). There exist significant and non-trivial correlations between these variables and the other market indicators (SI figure S8). Hence, the functional $-\ln(H - H_M)$ can serve as a very good generic indicator.
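Given the relation $(H - H_M) \sim \alpha \exp(-\beta\mu)$ reported above, one expects $-\ln(H - H_M) \approx \beta\mu - \ln\alpha$, i.e. an indicator that grows roughly linearly with the mean market correlation. The sketch below computes both quantities; the function names are hypothetical, $\mu$ is taken to be the mean of the off-diagonal elements of $C$, and returning NaN where $H - H_M \leq 0$ is a choice made here, not something specified in the text.

```python
import numpy as np

def mean_market_correlation(C):
    """Mean of the off-diagonal elements of a correlation matrix."""
    C = np.asarray(C, dtype=float)
    N = C.shape[0]
    return (C.sum() - np.trace(C)) / (N * (N - 1))

def entropy_indicator(H, H_M):
    """-ln(H - H_M), element-wise; NaN where the difference is not positive."""
    diff = np.asarray(H, dtype=float) - np.asarray(H_M, dtype=float)
    safe = np.where(diff > 0, diff, 1.0)       # placeholder avoids log of <= 0
    return np.where(diff > 0, -np.log(safe), np.nan)

# Toy values (illustrative only): the indicator rises as H - H_M shrinks.
H_series = np.array([1.50, 1.55, 1.60])
H_M_series = np.array([1.20, 1.45, 1.59])
indicator = entropy_indicator(H_series, H_M_series)

C_equi = 0.7 * np.eye(4) + 0.3 * np.ones((4, 4))   # all off-diagonal entries 0.3
mu_equi = mean_market_correlation(C_equi)
```

To compare the indicator with VIX or with $\mu$ over time, one would evaluate it on every epoch and correlate the resulting series, for example with `np.corrcoef`.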

Summary and discussions
We summarize the findings of our paper:
(a) We emphasize here that our eigen-entropy measure has a few advantages: it is uniquely determined, non-arbitrary and computationally cheap (low complexity) when compared to existing methods, e.g., structural entropy [26]. Note that the structural entropy (or any other network-based entropy measure) is very sensitive to the community structure and the construction of the network. An algorithm [24] involves identifying the group mode from the correlation matrix, which may be hard to do without arbitrariness (the boundary determined by the eigenvalues of the correlation matrix is not sharp).
(b) We have shown that different market events (corresponding to different correlation structures) like crashes, bubbles, etc undergo phase separation in a constructed 'phase space'. To demonstrate the phase separation, we used the transformed variables $|H - H_M|$ and $|H - H_{GR}|$, computed from the eigen-entropies $[H, H_M, H_{GR}]$ following the eigenvalue decomposition of the correlation matrices ($C$) into the market modes ($C_M$) and the composite group plus random modes ($C_{GR}$). We reiterate that this type of phase separation behavior has never been recorded for financial markets; it is very distinct from the two-phase behavior in financial markets reported earlier by Plerou et al [47].
(c) We also showed that all market events, characterized by $[H, H_M, H_{GR}]$, are either 'business-as-usual' periods (located toward the interior of the phase space) or 'near-critical' events (located at the periphery). We demonstrated movements in the order-disorder phases for all the major crashes and bubbles (nine events in USA and eleven events in JPN). Our results are fairly robust, as we found similar features in two different financial markets, the US S&P-500 and the Japanese Nikkei-225, and over a very long span of time (32 years).
(d) One of the entropy difference measures, $H - H_M$, displayed scaling behavior (data collapse) with respect to the mean market correlation $\mu$. The data collapse suggests that the fluctuations in price returns for different financial assets, varying across countries, economic sectors and market parameters, are governed by the same statistical law. This scaling behavior may motivate further research to determine which market forces drive the market or are important in determining the price co-movements and correlations. In addition, it may lead to a foundation for understanding scaling in a broader context, and provide altogether new concepts not anticipated previously.
(e) We also showed that a functional of the entropy difference measure, $-\ln(H - H_M)$, acted as a good gauge of the market fear (volatility index VIX). This methodology may be generalized and used in other complex systems (to be reported elsewhere) to understand and foresee tipping points and fluctuation patterns. Our proposed methodology may further help us to understand market events and their dynamics, as well as to find the time-ordering and appearance of bubbles (formations or bursts) and crashes, separated by normal periods.