Identifying long-term precursors of financial market crashes using correlation patterns

The study of the critical dynamics in complex systems is always interesting yet challenging. Here, we choose the financial market as an example of a complex system, and do a comparative analysis of two stock markets, the S&P 500 (USA) and the Nikkei 225 (JPN). Our analyses are based on the evolution of cross-correlation structure patterns of short time-epochs for a 32-year period (1985-2016). We identify "market states" as clusters of similar correlation structures, which occur more frequently than by pure chance (randomness). The dynamical transitions between the correlation structures reflect the evolution of the market states. The power mapping method from random matrix theory is used to suppress the noise in the correlation patterns, and an adaptation of the intra-cluster distance method is used to obtain the "optimum" number of market states. We find that the USA is characterized by four market states and JPN by five. We further analyze the co-occurrence of paired market states; the probability of remaining in the same state is much higher than that of a transition to a different state. The transitions to other states mainly occur among the immediately adjacent states, with a few rare intermittent transitions to remote states. The state adjacent to the critical state (market crash) may serve as an indicator or a "precursor" of the critical state, and this novel method of identifying long-term precursors may be very helpful for constructing early warning systems in financial markets, as well as in other complex systems.


Introduction
A financial market is a highly complex and continuously evolving system [1][2][3]. To understand the statistical behavior of the financial market and its constituent sectors [4][5][6][7][8][9], researchers have focused their attention on the co-movements and correlations among the stocks of the market. It is well known that the mean correlation among the stocks assumes much higher values during market crashes than in normal business periods [10]. Similarly, certain correlation structures seem to occur more frequently than by pure chance (randomness), especially when markets approach a critical period or crash [11,12]. However, identifying such clusters of similar correlation patterns, referred to as "market states", as previously attempted by Münnix et al. [13,14], is rather challenging for several reasons. The first is that financial time series are non-stationary; the second is that there is always noise in correlations computed over finite-length time series [15], and this noise must be suppressed in the correlation matrices to reveal the actual correlations. To tackle non-stationarity, we work with short time series, over which the returns can be considered reasonably stationary. However, with short time series the correlation matrices become highly singular [16][17][18]. To tackle the noise, various techniques [19,20] are available. Here, we use a recent and efficient one, namely the power map method [19,21,22], both for noise reduction and for breaking the degeneracy of the eigenvalues, so that the correlation matrices are no longer singular. Furthermore, finding similar clusters (groups) of correlation patterns is a daunting task by itself.
To go beyond the simple quantification of financial market states in terms of the average correlation, clustering techniques seem promising, as does the study of the eigenvalues of the correlation matrix of the corresponding time series [15]. In clustering, the k-means method has had some success for top-down clustering, but it suffers from one major drawback: the number of clusters, and thus the number of states, is largely arbitrary (ad hoc). Earlier, Münnix et al. [13] provided a scheme in which all the correlation frames at different time-epochs were initially regarded as a single cluster and then divided into sub-clusters by a procedure based on the k-means algorithm. They stopped the division process when the average distance from each cluster center to its members became smaller than a certain threshold. Based on this top-down hierarchical clustering method and a threshold of 0.1465, which represented the best ratio of the distances between clusters to their intrinsic radii, Münnix et al. determined the number of market states for USA to be eight. In the present paper, to determine the "optimal" number of clusters, we use the multidimensional scaling (MDS) technique [23] with two- or three-dimensional representations, which are comparatively easy to visualize and to follow in time. Using the MDS map, we apply k-means clustering to divide the correlation patterns into k groups of similar patterns. We propose a new way of estimating the number of clusters k, based on the cluster radii, which is fairly robust and stable. We thus have a considerable degree of confidence in the "optimal" number of market states identified by the new prescription. For our research, we have used adjusted closure price data from Yahoo finance [24] for the S&P 500 (USA) and Nikkei 225 (JPN) stock exchanges, for the 32-year period 1985-2016.
The stock list has been filtered so that it includes only stocks that were part of the market index for the entire period of 32 years. Among other results, our main finding is that there exist four market states in USA and five in JPN. We then study the dynamical transitions between the market states in a probabilistic manner; we also analyze the co-occurrence of paired market states and find that the probability of remaining in the same state is much higher than that of jumping to another state. The transitions mainly occur among adjacent states, with a few rare intermittent transitions to remote states. The state adjacent to the critical state may indicate a "precursor" of the critical state (market crash), and this novel method of identifying long-term precursors may be very helpful for constructing early warning systems in financial markets and in other complex systems.
The paper is organized as follows. We first briefly present the data description and the methodology. We then present the main part of the data analyses along with the above-mentioned findings. Finally, we present a summary and concluding remarks.

Data description
We have used the Yahoo finance database [24] for the time series of adjusted closure prices of two countries, the United States of America (USA) S&P 500 index and the Japan (JPN) Nikkei 225 index, for the period 02-01-1985 to 30-12-2016, and for the corresponding stocks as follows:
• USA: 02-Jan-1985 to 30-Dec-2016 (T = 8068 days); number of stocks N = 194;
• JPN: 04-Jan-1985 to 30-Dec-2016 (T = 7998 days); number of stocks N = 165,
where we have included only the stocks that were present in the indices for the entire duration. The sectoral abbreviations are given in Table 1.
The list of stocks (along with their sectors) for the two markets is given in Tables S1 and S2 in the Supplementary Information.

Cross-correlation matrix and power mapping method
We present a study of the time evolution of the cross-correlation structures of the return time series of N stocks, the determination of the optimal number of market states (correlation patterns that occur more frequently than by pure chance or randomness), and the dynamical evolution of the market states over different time-epochs. The daily return time series is constructed as

$$r_i(t) = \ln P_i(t) - \ln P_i(t-1),$$

where $P_i(t)$ is the adjusted closure price of stock $i$ ($i = 1, \dots, N$) on day $t$. The cross-correlation matrix $C(\tau)$ of an epoch $\tau$ has elements

$$C_{ij}(\tau) = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sigma_i \sigma_j},$$

where $\langle \dots \rangle$ denotes the average over the epoch and $\sigma_i$ is the standard deviation of $r_i$ within it. In figure 1, we show the time evolution of the return of the market index, r(τ), along with the mean market correlation µ(τ) (the average of all the elements of the cross-correlation matrix) and the Gini coefficient, which characterizes the inequality in the distribution of the correlation coefficients. Evidently, whenever there is a market crash (a fall in r(τ)), the mean market correlation µ(τ) rises sharply and the Gini coefficient falls drastically, indicating that the market is extremely correlated and all the stocks behave similarly (see Ref. [10]). Since the assumption of stationarity manifestly fails for long return time series, it is useful to break the time series of length T into n shorter time-epochs of size M (for non-overlapping epochs, T/M = n), over which stationarity holds better. However, if there are N return time series with N > M, the correlation matrices are highly singular, with N − M + 1 zero eigenvalues, leading to poor statistics. As mentioned in the introduction, we therefore use the power map technique [19,21,22] to suppress the noise present in the correlation structure of short time series. In this method, each cross-correlation coefficient within an epoch is given the non-linear distortion

$$C_{ij} \rightarrow (\operatorname{sign} C_{ij})\, |C_{ij}|^{1+\epsilon},$$

where ε is the noise-suppression parameter. This also gives rise to an "emerging spectrum" of eigenvalues, arising from the breaking of the degeneracy of the zero eigenvalues (see Ref. [15] for a recent review).
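The construction of the noise-suppressed epoch correlation matrices can be sketched in Python with NumPy. This is a minimal illustration, assuming a price array of shape (T, N) with no missing values; the function names (`log_returns`, `power_map`, `epoch_correlations`) are hypothetical, and the defaults M = 20 and a shift of 10 days are taken from the epoch parameters used later in the analysis.

```python
import numpy as np


def log_returns(prices):
    """Daily log returns r_i(t) = ln P_i(t) - ln P_i(t-1).

    prices: array of shape (T, N), one column per stock (no missing values assumed).
    Returns an array of shape (T - 1, N).
    """
    return np.diff(np.log(prices), axis=0)


def power_map(C, eps):
    """Element-wise power map C_ij -> sign(C_ij) * |C_ij|**(1 + eps)."""
    return np.sign(C) * np.abs(C) ** (1.0 + eps)


def epoch_correlations(returns, M=20, shift=10, eps=0.6):
    """Power-mapped Pearson correlation matrices over sliding epochs of M days."""
    T = returns.shape[0]
    frames = []
    for start in range(0, T - M + 1, shift):
        window = returns[start:start + M]          # M x N block of returns
        C = np.corrcoef(window, rowvar=False)      # N x N Pearson correlation
        frames.append(power_map(C, eps))
    return np.array(frames)                        # shape (n, N, N)


# Usage on synthetic prices (5 stocks, 201 days -> 200 returns)
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((201, 5)), axis=0))
frames = epoch_correlations(log_returns(prices))
mu = frames.mean(axis=(1, 2))   # mean market correlation mu(tau) over all elements
```

Because the diagonal elements equal 1, the power map leaves them unchanged, while every off-diagonal coefficient shrinks in magnitude, which is why a larger ε lowers the mean correlation.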

Noise-suppression in a short time cross-correlation frame
First, we study the effect of the noise-suppression parameter ε on the cross-correlation matrix and its eigenvalue spectrum within a time-epoch. The cross-correlation structure can be visualized easily through a two- or three-dimensional map of coordinates generated by a multidimensional scaling algorithm. MDS is a non-linear dimensionality-reduction tool for visualizing the similarity of the objects of a data set in a D-dimensional space: each object is assigned coordinates in the D-dimensional space such that the between-object distances are preserved as closely as possible. The choice D = 2 or D = 3 yields a two- or three-dimensional scatter plot (map). As input to the MDS algorithm, we provide the distance matrix [25], generated from the correlation matrix through the non-linear transformation

$$d_{ij} = \sqrt{2\,(1 - C_{ij})}.$$

The effect of varying ε on noise reduction, and hence on determining the optimal number of market states, can thus be better captured through the MDS. What, then, is the ideal choice of the noise-suppression parameter ε? A very small value, say ε = 0.01, breaks the degeneracy of the eigenvalues (giving rise to an "emerging spectrum" with interesting properties [10]) but contributes little to noise-suppression. On the other hand, a large value, say ε = 0.5, suppresses the noise in the correlation pattern and leads to better clustering; however, the emerging spectrum then approaches the main Marcenko-Pastur distribution [26]. In this paper, we are more interested in noise-suppression in the cross-correlation matrix within a single time-epoch than in the properties of the emerging spectrum; hence, we use ε = 0.6, a high value chosen for its robustness and because it yields distinct clusters of stocks in the MDS maps. The effect can be clearly seen in supplementary figures S2 and S3.
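The distance transformation and an MDS embedding can be sketched as follows. The text does not state which MDS variant is used, so this sketch swaps in classical (Torgerson) MDS, which places points by an eigen-decomposition of the double-centred squared-distance matrix; that choice and the helper names are assumptions for illustration only.

```python
import numpy as np


def correlation_distance(C):
    """d_ij = sqrt(2 (1 - C_ij)); clipping guards against tiny negative round-off."""
    return np.sqrt(np.clip(2.0 * (1.0 - C), 0.0, None))


def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS: n x n distance matrix -> n x dim coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]         # keep the `dim` largest
    scale = np.sqrt(np.clip(vals[idx], 0.0, None))
    return vecs[:, idx] * scale


# Usage: embed the stocks of one (power-mapped) correlation frame in 2D
C = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
coords = classical_mds(correlation_distance(C), dim=2)   # one row per stock
```

Strongly correlated stocks (here the first two) end up close together in the map, which is the property exploited when the clusters become more compact under a larger ε.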
Further, our main aim is to find the optimal number of market states, based on correlation structures that are similar and appear frequently. Hence, we formulate a similarity measure between cross-correlation matrices at different time-epochs τ, and then find groups of similar correlation frames across the time-epochs. We find that with ε = 0.6 the noise-suppressed cross-correlation structures can be grouped well into similar clusters, as described below; we also find that the number of market states is not very sensitive to the noise-suppression parameter. A higher value of ε lowers the mean of the cross-correlation coefficients µ (see supplementary figure S1) and the maximum eigenvalue λ_max of the cross-correlation matrix. Figure 2 shows the results of noise-suppression on a short-time cross-correlation matrix using the power mapping method [10,16,19,27]. Figure 2(a) shows the correlation matrix without noise-suppression (ε = 0); its eigenvalue spectrum and MDS map are shown in figures 2(b) and (c), respectively. As mentioned earlier, for a short time series with M < N, the highly singular correlation matrix has N − M + 1 degenerate eigenvalues at zero. Hence, in our case (N = 194, M = 20) the eigenvalue spectrum consists of 175 eigenvalues at zero, followed by 19 distinct positive eigenvalues. The non-linear power mapping removes the degeneracy of the eigenvalues at zero, leading to an emerging spectrum [10,15]. Figure 2(d) shows the correlation pattern for ε = 0.01; the effect of this small distortion on the corresponding eigenvalue spectrum and MDS map is shown in figures 2(e) and (f), respectively. The effect is barely visible on the MDS map, and λ_max decreases only slightly, from 44.05 to 43.67. Next, we use a high value of the noise-suppression parameter, ε = 0.6, which considerably reduces the noise of the correlation frame (shown in figure 2(g)).
The effect of ε = 0.6 is clearly visible in the corresponding eigenvalue spectrum and MDS map, shown in figures 2(h) and (i), respectively. The shape of the eigenvalue spectrum changes completely: the emerging spectrum from the 175 eigenvalues at zero is now non-degenerate, and it spreads around zero with some negative eigenvalues. The insets of figures 2(e) and (h) show the emerging spectra in greater detail, while the inset of figure 2(b) shows no emerging spectrum. Note that for ε = 0.6 the highest eigenvalue λ_max decreases by a large amount, to 27.27. Whereas the small distortion ε = 0.01 changes λ_max and the height and spread of the emerging spectrum without changing the clustering much, the higher distortion ε = 0.6 changes both the emerging spectrum, which becomes broader, and the MDS map drastically: stocks with high correlations come nearer to each other and form more compact and distinct clusters than for ε = 0 and ε = 0.01.
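The count of N − M + 1 zero eigenvalues, and their lifting by the power map, can be checked numerically. This sketch uses uncorrelated Gaussian returns with the dimensions quoted above (N = 194, M = 20) instead of market data, so only the eigenvalue counts are meaningful, not the values of λ_max; the tolerance of 1e-8 for "zero" is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, eps = 194, 20, 0.6
returns = rng.standard_normal((M, N))            # one short epoch of M days

C = np.corrcoef(returns, rowvar=False)           # N x N, rank at most M - 1
C_eps = np.sign(C) * np.abs(C) ** (1.0 + eps)    # power-mapped matrix

tol = 1e-8
eig = np.linalg.eigvalsh(C)                      # real eigenvalues, ascending
eig_eps = np.linalg.eigvalsh(C_eps)
n_zero = int(np.sum(np.abs(eig) < tol))          # degenerate zero eigenvalues
n_zero_eps = int(np.sum(np.abs(eig_eps) < tol))  # typically none after the map
n_negative_eps = int(np.sum(eig_eps < -tol))     # part of the emerging spectrum
```

The rank bound is M − 1 rather than M because each return series is demeaned within the epoch before correlating, which removes one degree of freedom.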

Noise-suppression in a similarity matrix among correlation frames over different time-epochs
The noise-suppressed cross-correlation matrices C(τ) of the returns at different times τ = 1, ..., n can be compared based on their similarities. For two correlation matrices C(τ1) and C(τ2) at time-epochs τ1 and τ2, each computed over a short time-epoch of M days, the similarity between the correlation structures is quantified by

$$\zeta(\tau_1, \tau_2) = \langle |C_{ij}(\tau_1) - C_{ij}(\tau_2)| \rangle,$$

where |...| denotes the absolute value and ⟨...⟩ denotes the average over all matrix elements {ij} [13]; ζ = 0 for identical correlation structures. We then use the MDS map to visualize the information contained in the n × n similarity matrix, whose elements are ζ(τ_p, τ_q), with p, q = 1, ..., n. Interestingly, the noise-suppression applied to the individual correlation frames of short time-epochs has a dramatic effect on the similarity matrix too. Figure 3 shows the effect of noise-suppression on the similarity matrix [13] and the corresponding MDS map. Each correlation frame is computed with N = 194 stocks of USA; hence, for the time series of length T = 8060 days during the period 1985-2016, there are n = 805 correlation frames constructed from short time-epochs of M = 20 days with shifts of Δτ = 10 days (50% overlapping time-epochs). Similarly, for the N = 165 stocks of JPN, the time series of length T = 7990 days in the same period yields n = 798 correlation frames. The sharp changes in the structural patterns of the similarity matrices become evident at the higher value ε = 0.6. It is noteworthy that figure 3(e) shows a block structure for the USA market, revealing that the USA market was relatively calm until 2002 and became more volatile afterwards; the red-yellow stripes highlight the crash periods. Similarly, figure 3(g) shows that the JPN market became more volatile from 1990 onward, and that it went through more critical periods than the USA market.
Importantly, the MDS maps with the noise-suppression parameter ε = 0.6 are more compact and denser, which leads to better clustering and a better determination of the optimal number of market states (see also supplementary figures S2 and S3).
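The similarity measure ζ and the resulting n × n matrix can be sketched as below, together with the number of overlapping epochs, n = (T − M)/Δτ + 1, which reproduces the frame counts quoted above. The function names are hypothetical, and `frames` is assumed to be an array of shape (n, N, N) of power-mapped correlation matrices.

```python
import numpy as np


def similarity_matrix(frames):
    """zeta(p, q) = mean over all {ij} of |C_ij(p) - C_ij(q)|; 0 means identical."""
    n = len(frames)
    Z = np.zeros((n, n))
    for p in range(n):
        for q in range(p + 1, n):                # symmetric, zero diagonal
            Z[p, q] = Z[q, p] = np.mean(np.abs(frames[p] - frames[q]))
    return Z


def n_epochs(T, M=20, shift=10):
    """Number of epochs of length M whose start points are `shift` days apart."""
    return (T - M) // shift + 1


print(n_epochs(8060), n_epochs(7990))            # prints: 805 798
```

The pairwise loop keeps memory use at a single N × N difference at a time; a fully vectorized version would need an n × n × N × N intermediate, which is impractical for n ≈ 800 and N ≈ 200.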

Determining optimal number of market states
To determine the number of market states, we find the number of clusters that group together the noise-suppressed cross-correlation matrices C(τ) across the time-epochs τ = 1, ..., n, based on their similarities [13]. We use the MDS map to visualize the information contained in the n × n similarity matrix, and then use this MDS map of n objects for k-means clustering. The k-means clustering, a heuristic algorithm, partitions the n correlation frames into k clusters such that each frame belongs to the cluster with the nearest centroid (mean position), which serves as a prototype of the cluster. In k-means clustering, the value of k can be optimized by different techniques [28,29]. Here, we propose a new approach for optimizing k. We measure the mean and the standard deviation of the intra-cluster distances over an ensemble of a fairly large number (say 500) of different initial conditions (random coordinates for the k centroids or, equivalently, a random initial clustering of the n objects); each initial condition may result in a slightly different clustering of the n correlation frames. If the clusters are distinct (far apart in coordinate space), then k-means clustering yields the same result even for different initial conditions, giving a small variance of the intra-cluster distance. The problem of allocating frames to clusters becomes acute when the clusters are very close or overlapping, because the initial conditions then influence the final clustering, giving a larger variance of the intra-cluster distance. Therefore, a minimal variance or standard deviation for a particular number of clusters indicates a robust clustering. To optimize the number of clusters, we propose to look for the largest k that still has a minimal variance (or standard deviation) of the intra-cluster distances across different initial conditions.
We believe this is easier than determining the "elbow point" from the curve of intra-cluster distance versus number of clusters [29].
For each cluster, one computes the average (and variance) of the point-to-centroid distances for all the points belonging to that cluster; the mean (variance) of the intra-cluster distances is then the mean (variance) of the k values obtained from the k clusters. Next, we use 500 different initial conditions for the k-means clustering, each yielding a slightly different clustering result, and compute the average as well as the variance (or standard deviation) of the mean intra-cluster distance over the ensemble of 500 runs. The average intra-cluster distance is then plotted as a function of the number of clusters k, with insets showing the curves for the 500 individual initial conditions. As mentioned earlier, the value of k is optimized by keeping the standard deviation lowest and the number of clusters highest; note that for k = 1 the standard deviation is always trivially zero. We find that for USA the standard deviation is low up to k = 4 and then grows for higher numbers of clusters; thus, k = 4 is the optimal number of clusters. For JPN, whose behavior is more complex than that of USA, the standard deviation is low for k = 1, 2, 3, increases for k = 4, then decreases drastically for k = 5, and is higher again beyond that. Thus, k = 5 is the optimal number of clusters for JPN. The final k-means clustering of the correlation frames in the similarity matrix is therefore performed for k = 4 clusters (USA) and k = 5 clusters (JPN), as shown in figures 5(a) and (b), respectively. We identify the points in each cluster (different colors represent different clusters), which have similar correlation patterns and nearby mean correlations, as one market state.
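The robustness criterion for choosing k can be sketched with a plain Lloyd's-algorithm k-means written in NumPy, used here instead of any particular library implementation. This is a minimal illustration under stated assumptions: `X` holds the MDS coordinates of the n correlation frames, centroids are initialized at randomly chosen frames, and the threshold `tol` in the hypothetical helper `choose_k` stands in for the visual judgment of the "lowest" standard deviation described above.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's algorithm with centroids initialised at k random data points."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # keep the old centroid if a cluster happens to become empty
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1), centroids


def mean_intra_cluster_distance(X, labels, centroids):
    """Average over clusters of the mean point-to-centroid distance."""
    per_cluster = [np.linalg.norm(X[labels == c] - centroids[c], axis=1).mean()
                   for c in range(len(centroids)) if np.any(labels == c)]
    return float(np.mean(per_cluster))


def scan_k(X, k_values, n_seeds=500):
    """Mean and standard deviation of the intra-cluster distance over seeds."""
    stats = {}
    for k in k_values:
        vals = []
        for seed in range(n_seeds):
            labels, cents = kmeans(X, k, np.random.default_rng(seed))
            vals.append(mean_intra_cluster_distance(X, labels, cents))
        stats[k] = (float(np.mean(vals)), float(np.std(vals)))
    return stats


def choose_k(stats, tol):
    """Hypothetical rule: largest k >= 2 whose standard deviation is <= tol."""
    ok = [k for k, (_, sd) in stats.items() if k >= 2 and sd <= tol]
    return max(ok) if ok else 1


# Usage on a toy layout: two well-separated pairs of points
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
stats = scan_k(X, k_values=[1, 2], n_seeds=20)
```

In the toy layout, every seed recovers the same two pairs for k = 2, so the spread across seeds is zero; with overlapping clusters, frames near the boundaries switch clusters between seeds and the standard deviation grows, which is the signal the prescription relies on.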
Based on k-means clustering, figure 5(c) shows the four market states S1, S2, S3 and S4 of USA, where S1 corresponds to the calm state (with low mean correlation) and S4 corresponds to the crash or critical state (with high mean correlation); figure 5(d) shows the five market states S1, S2, S3, S4 and S5 of JPN, where S1 corresponds to the calm state and S5 to the critical state. The states are arranged in increasing order of mean correlation. Clear structural differences among the correlation matrices are also visible; e.g., there are strong intra-sectoral correlations within the energy, finance and utility sectors in each of the market states of USA. It may also be mentioned that the choice of the noise-suppression parameter ε = 0.6 is not arbitrary: we compared the plots of the average intra-cluster distance as a function of the number of clusters for both USA and JPN, with ε ranging from 0.1 to 0.7 (supplementary figures S2 and S3), and found that ε = 0.6 yields the best results.

Co-occurrence probabilities and dynamical transitions of market states
Once the short-time cross-correlation frames have been classified into market states, one can follow the evolution of the market as dynamical transitions between the different market states. Figures 6(a) and (c) show the evolution dynamics of the market states of USA and JPN during 1985-2016. In USA, the market oscillates among the four states S1, S2, S3 and S4. Often the market remains in S1 or S2 (states with relatively low mean correlations) for a long time; at other times, it jumps to a state of higher mean correlation, S3 or S4. Similarly, JPN shows dynamical transitions among its five market states S1, S2, S3, S4 and S5. The probabilistic plots of the market-state dynamics are shown in figures 6(b) and (d) for USA and JPN, respectively; the colored length of each market state is the probability of that state computed over 110 days (10 overlapping epochs). The probability plots show that (i) in USA, the market was mostly in state S1 before 2002 and became more volatile, with more frequent transitions to other states, from 2002 onward, and (ii) in JPN, the market became more volatile from 1990 onward. The same behavior is also observed in the temporal evolution of the mean correlation (see supplementary figure S1). The co-occurrence probabilities of paired market states (first followed by second) are given in Tables 2 and 3. The probability of the co-occurrence of the paired market states (S3, S4) of USA is about 6%. Neglecting the diagonal entries of the bar plot, which show the high probabilities of staying in the same state, we can infer that, with this significant transition probability, state S3 of USA acts as a "precursor" to state S4 (market crash); similarly, for JPN, state S4 acts as a "precursor" to the critical state S5, with a significant transition probability of about 8%.
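The co-occurrence probabilities of paired states, and a row-normalized transition matrix derived from them, can be estimated from the sequence of state labels along the time-epochs. This is a minimal sketch, assuming states are coded as integers 0, ..., K−1 (S1 → 0) and that each pair consists of consecutive epochs; the function names are hypothetical, and since this section does not state whether Tables 2 and 3 report joint or conditional probabilities, both are shown.

```python
import numpy as np


def cooccurrence(states, n_states):
    """Joint probability that state a is followed by state b in the next epoch."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum()


def transition_matrix(states, n_states):
    """Row-normalised W_ab = P(next = b | current = a); unvisited rows stay zero."""
    joint = cooccurrence(states, n_states)
    row_sums = joint.sum(axis=1, keepdims=True)
    return np.divide(joint, row_sums, out=np.zeros_like(joint), where=row_sums > 0)


states = [0, 0, 1, 1, 0]          # toy sequence of market-state labels
J = cooccurrence(states, 2)       # each of the 4 observed pairs has weight 0.25
W = transition_matrix(states, 2)  # rows: [0.5, 0.5] and [0.5, 0.5]
```

The diagonal of either matrix measures how often the market stays in the same state, and the first off-diagonal entries measure transitions between adjacent states.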
Entries just above and below the diagonals of the 3D bar plots are also quite high, which shows that transitions primarily happen between immediately adjacent states, with only rare exceptions of transitions to remote states. Finally, let us test the simple hypothesis that the system jumps randomly from state S_i to S_j with probabilities W_ij. Note that, if we simply look at the curves in figures 7(c) and (d), it is not obvious that this is indeed the case. However, if we make this hypothesis, we can obtain expressions for the probability that the system is in a given state over long times. This follows from the general theory of Markov chains [30], but to keep the paper self-contained, we briefly explain the details below.
Let P_i(n) be the probability that the system is in state i after n steps (time-epochs). Using the definition of W_ij, together with the assumption that a transition depends only on the current state via W_ij and in no way on the previous history, we obtain

$$P_i(n+1) = \sum_j P_j(n)\, W_{ji},$$

where the sum is over all possible states j. After long times, it is plausible, and can in fact be proved rigorously, that the probability distribution becomes independent of n; in other words, the distribution reaches an equilibrium state $P_i^{(0)}$. The latter then satisfies the equations

$$P_i^{(0)} = \sum_j P_j^{(0)}\, W_{ji}.$$

These can be solved explicitly if W_ij is known. The solution can be proved to be always positive, and it can always be normalized such that

$$\sum_i P_i^{(0)} = 1,$$

so that the numbers $P_i^{(0)}$ can indeed be interpreted as a set of probabilities. When the W_ij are given by Table 2 (for the USA) or Table 3 (for JPN), it is straightforward to compute the equilibrium distributions, given in Eq. 4 for the USA and in Eq. 5 for JPN. The actual frequencies of the four characteristic market states S1, S2, S3 and S4 of USA, obtained from figure 6(a), give the probabilities 0.523, 0.287, 0.149 and 0.041, respectively. Similarly, the actual frequencies of the five characteristic market states S1, S2, S3, S4 and S5 of JPN, obtained from figure 6(c), give the probabilities 0.277, 0.308, 0.262, 0.118 and 0.035, respectively. These probabilities are indeed very close to those in Eqs. 4 and 5, which supports our hypothesis.

Table 2. USA: Co-occurrence probability of four market states (MS) (first is followed by second).
Table 3. JPN: Co-occurrence probability of five market states (MS) (first is followed by second).
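The equilibrium distribution P^(0) can be computed numerically as the left eigenvector of W with eigenvalue 1, or by simply iterating P_i(n+1) = Σ_j P_j(n) W_ji until it stops changing. The sketch below assumes W is row-stochastic (W_ij is the conditional probability of going from S_i to S_j, e.g. obtained by row-normalizing co-occurrence counts); the function names are hypothetical and the 2 × 2 matrix is a toy example, not taken from Tables 2 or 3.

```python
import numpy as np


def stationary_distribution(W):
    """Solve P = P W (P a row vector) via the eigenvector of W.T for eigenvalue 1."""
    vals, vecs = np.linalg.eig(W.T)
    i = np.argmin(np.abs(vals - 1.0))   # eigenvalue closest to 1
    p = np.real(vecs[:, i])
    return p / p.sum()                  # normalise so that sum_i P_i = 1


def iterate_chain(W, p0, n_steps=1000):
    """Repeatedly apply P_i(n+1) = sum_j P_j(n) W_ji, i.e. p <- p @ W."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p @ W
    return p


W = np.array([[0.9, 0.1],
              [0.5, 0.5]])
p_eig = stationary_distribution(W)        # [5/6, 1/6]
p_iter = iterate_chain(W, [1.0, 0.0])     # converges to the same distribution
```

Comparing such a stationary distribution with the empirical occupancy frequencies of the states is the consistency check performed above: if the state sequence behaved very differently from a first-order Markov chain, the two would generally disagree.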

Summary and concluding remarks
In summary, we have studied the identification of market states and long-term precursors to critical states (crashes) in financial markets, based on the probabilistic occurrence of correlation patterns determined from noise-suppressed short-time correlation matrices. We analyzed and compared the data of the S&P 500 (USA) and Nikkei 225 (JPN) stock markets over a 32-year period. We used the power mapping method to reduce the noise of the singular correlation matrices and obtained distinct and denser clusters in the two- and three-dimensional MDS maps. The effects are also prominent in the similarity matrices and the corresponding MDS maps. The evolution of the market can be followed through the dynamical transitions between the market states. Using multidimensional scaling maps, we applied k-means clustering to divide the correlation patterns of the different time-epochs into k groups, or market states. We showed that the cluster radii provide a fairly robust determination of the optimal number of clusters. In each market, the optimal number of clusters was chosen by keeping the standard deviation of the intra-cluster distance minimal and the number of clusters highest. Thus, based on this modified prescription for finding clusters of similar correlation patterns, we characterized USA by four market states and JPN by five. It should be mentioned that this method also identifies the correlation frames that correspond to the critical states (crashes); we have verified that these indeed correspond to well-known financial market crashes, and the properties of the emerging spectrum and the characterization of the critical states (catastrophic instabilities) are studied specifically in Refs. [10,15]. We also analyzed the co-occurrence probabilities of the paired market states and observed that the probability of remaining in the same state is much higher than that of a transition to a different state.
This implies that market states show a kind of "inertia": the market tends to stay in the same state for a long time. The most probable transitions are nearest-neighbor transitions, and the co-occurrence tables show that the probability decreases very fast away from the diagonal. Hence, transitions to other states mainly occurred between immediately adjacent states, with a few rare intermittent transitions to remote states. The state adjacent to the critical state (crash) behaved like a long-term precursor of the critical state, and this prescription could be helpful in constructing an early warning system for financial market crashes.

Supplementary information
Figure S1. Plots of the mean correlation without noise-suppression (blue) and with high noise-suppression, ε = 0.6 (magenta), for (a) USA and (b) JPN. The USA market was relatively calm up to 2002 and became turbulent, with high mean correlation, from 2002 onward; the JPN market became turbulent from 1990 onward.
Figure S2. Plots of the intra-cluster distance as a function of the number of clusters k for USA, for different values of the noise-suppression parameter 0.1 ≤ ε ≤ 0.7, using k-means clustering. We used an ensemble of 500 randomly generated seeds to analyze the robustness of the different clusters in the k-means clustering. The error bars show the deviation of the intra-cluster distance arising from the different random seeds: points lying on the boundaries between clusters may change their cluster assignment for different initial centroids in the k-means clustering, which changes the intra-cluster distance. In the inset, lines of different colors correspond to different seeds. The value of k is optimized by simultaneously keeping the standard deviation of the intra-cluster distance lowest and the number of clusters highest. The results are best for ε = 0.6 and show minimum deviation for k = 4 (max), with the deviation growing for k > 4.
Figure S3. Plots of the intra-cluster distance as a function of the number of clusters k for JPN, for different values of the non-linear noise-suppression parameter 0.1 ≤ ε ≤ 0.7, using k-means clustering. We used an ensemble of 500 randomly generated seeds to analyze the robustness of the different clusters in the k-means clustering. The error bars show the deviation of the intra-cluster distance arising from the different random seeds: points lying on the boundaries between clusters may change their cluster assignment for different initial centroids in the k-means clustering, which changes the intra-cluster distance. In the inset, lines of different colors correspond to different seeds. The value of k is optimized by simultaneously keeping the standard deviation of the intra-cluster distance lowest and the number of clusters highest. The results are best for ε = 0.6 and show minimum deviation for k = 5 (max), with the deviation growing for k > 5.
Table S1. List of all stocks of the USA market (S&P 500) considered for the analysis. The first column gives the serial number, the second the abbreviation, the third the full name of the stock, and the fourth the sector as given in the S&P 500.

S.No. Code
Company