An analysis of cross-correlations in an emerging market

https://doi.org/10.1016/j.physa.2006.10.030

Abstract

We apply random matrix theory (RMT) to compare correlation matrix estimators C obtained from emerging market data. The correlation matrices are constructed from 10 years of daily data for stocks listed on the Johannesburg Stock Exchange (JSE) from January 1993 to December 2002. We test the spectral properties of C against random matrix predictions and find some agreement between the distributions of eigenvalues, nearest neighbour spacings, distributions of eigenvector components and the inverse participation ratios for eigenvectors. We show that interpolating both missing data and illiquid trading days with a zero-order hold increases agreement with RMT predictions. For the more realistic estimation of correlations in an emerging market, we suggest a pairwise measured-data correlation matrix. For the data set used, this approach suggests greater temporal stability for the leading eigenvectors. An interpretation of eigenvectors in terms of trading strategies is given, as opposed to classification by economic sectors.

Introduction

Correlation matrices are common to problems involving complex interactions and the extraction of information from series of measured data. Our aim is to determine empirical correlations in price fluctuations of daily sampled price data of distinct shares in a reliable way. Our investigation is based on 10 years of daily data for 250–350 traded shares listed on the JSE Main Board from January 1993 to December 2002.

There are several aspects to the question of how to calculate correlations in financial time series. In particular, missing data and thin trading (no price changes for a stock over several time periods) may be significant. Random correlations in price changes are likely to arise in an ensemble of several shares. Furthermore, for a portfolio of N distinct assets, there will be N(N-1)/2 distinct off-diagonal entries in a correlation matrix which has been determined from time series of length L. When L is not large, the calculated covariance matrix may be dominated by measurement noise. Hence, it is necessary to understand the effects of: (i) noise, (ii) finiteness of time series, (iii) missing data, and (iv) thin trading in the determination of empirical correlations.

The properties of random matrices first became known through Wigner's seminal work in the 1950s, which applied them in nuclear physics to the statistical behaviour of neutron resonances and other complex systems of interactions [1], [2], [3]. More recently, random matrix theory (RMT) has been applied to calibrate and reduce the effects of noise in financial time series and to investigate constraints on rational (empirically based) decision making (cf. Refs. [4], [5], [6], [7], [8], [9], [10], [11], [28], [29], [34]). Correlation matrices are computed for the data under investigation, and quantities associated with these matrices may be compared to those of random matrices. The extent to which properties of the correlation matrices deviate from random matrix predictions clarifies the status of the information derived from the computation of covariances. In several studies of shares traded in the S&P 500 and DAX, it was found that, aside from a small number of leading eigenvalues, the eigenvalue spectra for the measured data coincide with theoretical random matrix predictions, i.e., the estimation of covariances is dominated by random noise. Ref. [12] postulates a model for the correlations to explain the observed spectral properties. RMT has also been shown to yield an improved estimation technique: an estimated correlation matrix can be filtered by removing the contributions of eigenvalues which lie in the RMT range. In Ref. [13] it is shown that noise levels in the correlation matrix depend on the ratio N:L, where N denotes the number of stocks and L denotes the length of the time series.
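The filtering idea mentioned above can be sketched as follows. This is a minimal illustration and not necessarily the procedure used in the cited references: it assumes the noise band ends at the standard asymptotic upper edge (1 + sqrt(1/Q))^2 with Q = L/N, and it uses one common variant in which eigenvalues inside the band are replaced by their mean rather than deleted, so that the trace is preserved. The function name `filter_correlation` and the one-factor toy data are hypothetical.

```python
import numpy as np

def filter_correlation(C, Q):
    """Flatten eigenvalues of C at or below the assumed RMT upper edge; keep the rest."""
    lam_max = (1 + np.sqrt(1 / Q)) ** 2          # assumed noise-band upper edge
    eigvals, eigvecs = np.linalg.eigh(C)         # C is symmetric, so eigh applies
    noisy = eigvals <= lam_max
    filtered = eigvals.copy()
    if noisy.any():
        # Replace noise-band eigenvalues by their mean so the trace is preserved.
        filtered[noisy] = eigvals[noisy].mean()
    C_f = (eigvecs * filtered) @ eigvecs.T       # rebuild from the modified spectrum
    # Rescale so that the diagonal is exactly 1 again.
    d = np.sqrt(np.diag(C_f))
    return C_f / np.outer(d, d)

# Hypothetical toy data: 50 assets sharing one common "market" factor.
rng = np.random.default_rng(1)
N, L = 50, 500
market = rng.standard_normal(L)
returns = 0.5 * market + rng.standard_normal((N, L))
C = np.corrcoef(returns)
C_filtered = filter_correlation(C, Q=L / N)
```

In this toy setting only the eigenvalue associated with the common factor is expected to lie above the band; whether the same cut-off is appropriate for real data with missing values and thin trading is exactly the kind of question the paper examines.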

In this paper we consider the problems of missing data and thin trading in determination of empirical correlation in daily sampled price fluctuations. We analyze the data base containing the prices $S_i(t)$ of assets $i = 1, \ldots, N$ at time $t$ as follows. We first find the change in asset prices

$$r_i(t) = \ln S_i(t + \Delta t) - \ln S_i(t).$$

The usual cross-correlation matrix for idealised data (non-zero price fluctuations and no missing data) is given by

$$C_{ij} := \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sigma_i \sigma_j},$$

where $\langle \cdot \rangle$ denotes average over period studied and $\sigma_i^2 := \langle r_i^2 \rangle - \langle r_i \rangle^2$ is the variance of the price changes of asset $i$. Alternatively, one could write

$$C_{ij} = \frac{1}{L} \sum_{t=1}^{L} R_i(t) R_j(t),$$

where $L$ denotes the uniform length of the time series and $R_i(t)$ denotes the price change of asset $i$ at time $t$ such that the average values of the $R_i$'s have been subtracted off and the $R_i$'s are rescaled so that they all have constant volatility $\sigma_i^2 := \langle R_i^2 \rangle = 1$. This is written as $C = (1/L) M M^T$ where $M$ is an $N \times L$ matrix and $M^T$ is its transpose (cf. Ref. [5]).
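As a rough illustration of the definitions above, the following sketch computes log returns and the idealised correlation matrix in its C = (1/L) M M^T form. It assumes complete data with no zero-variance series; the function names and the toy price array are hypothetical and not taken from the paper.

```python
import numpy as np

def log_returns(prices):
    """Return r_i(t) = ln S_i(t + dt) - ln S_i(t) for an (N, T) price array."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=1)

def correlation_matrix(returns):
    """Return C = (1/L) M M^T, where each row of M is a standardised return series."""
    returns = np.asarray(returns, dtype=float)
    L = returns.shape[1]
    centred = returns - returns.mean(axis=1, keepdims=True)
    # Population standard deviation (divide by L), matching
    # sigma_i^2 = <r_i^2> - <r_i>^2 in the text.
    M = centred / returns.std(axis=1, keepdims=True)
    return M @ M.T / L

# Hypothetical toy data: 3 assets, 6 daily closing prices each.
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(3, 6)), axis=1))
C = correlation_matrix(log_returns(prices))
```

For complete data this agrees with `np.corrcoef` applied to the same returns, which gives a convenient cross-check.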

The pairwise measured-data cross-correlation matrix using the pairwise deletion method [14], [15] for the case when there is missing data in time series of returns is computed as follows:

$$C_{ij} := \frac{\langle \rho_i \rho_j \rangle - \langle \rho_i \rangle \langle \rho_j \rangle}{\sigma_i \sigma_j},$$

where $\rho_i$ and $\rho_j$ denote subseries of $r_i$ and $r_j$ such that there exists measured data for both $\rho_i$ and $\rho_j$ at every time period in the subseries, $\langle \cdot \rangle$ denotes average over period studied, and $\sigma_i^2 := \langle \rho_i^2 \rangle - \langle \rho_i \rangle^2$ is the variance of the price changes of asset $i$.
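The pairwise deletion estimator can be sketched as below. This is an illustrative reading of the formula, assuming missing returns are encoded as NaN and that the variances are computed on the overlapping subseries of each pair; the function name, the minimum-overlap rule and the toy array are hypothetical.

```python
import numpy as np

def pairwise_correlation(returns):
    """Pairwise-deletion correlation matrix for an (N, L) array with NaN for missing data."""
    returns = np.asarray(returns, dtype=float)
    n = returns.shape[0]
    C = np.full((n, n), np.nan)
    observed = ~np.isnan(returns)
    for i in range(n):
        for j in range(i, n):
            both = observed[i] & observed[j]    # periods where both assets have data
            if both.sum() < 2:
                continue                        # too little overlap: leave as NaN
            rho_i, rho_j = returns[i, both], returns[j, both]
            cov = np.mean(rho_i * rho_j) - rho_i.mean() * rho_j.mean()
            denom = rho_i.std() * rho_j.std()   # population std on the overlap
            if denom > 0:
                C[i, j] = C[j, i] = cov / denom
    return C

# Hypothetical toy returns: 3 assets, 5 periods, two missing observations.
nan = np.nan
r = np.array([[0.01, -0.02, nan, 0.03, 0.00],
              [0.02, -0.01, 0.01, nan, -0.01],
              [0.01, -0.02, 0.02, 0.03, 0.00]])
C_pw = pairwise_correlation(r)
```

Because each entry is estimated from a different subset of time periods, a matrix built this way is not guaranteed to be positive semi-definite, which is worth checking before using it in an eigenvalue analysis.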

Random matrix theory (RMT) predictions

We summarise four known universal properties of random matrices, namely the Wishart distribution for eigenvalues, the Wigner surmise for eigenvalue spacing, the distribution of eigenvector components and the inverse participation ratio for eigenvector components, which will be applied in our analysis.
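Two of these properties can be checked numerically with a short simulation. The sketch below assumes the standard asymptotic results from the RMT literature for a matrix A with i.i.d. standard normal entries: the eigenvalues of (1/L) A A^T concentrate between (1 - sqrt(1/Q))^2 and (1 + sqrt(1/Q))^2 with Q = L/N, and the inverse participation ratio of a random unit eigenvector is close to 3/N. The notation may differ from the section's own statement, and the matrix sizes and function names are hypothetical.

```python
import numpy as np

def mp_bounds(Q):
    """Assumed asymptotic eigenvalue bounds for (1/L) A A^T with Q = L/N >= 1."""
    return (1 - np.sqrt(1 / Q)) ** 2, (1 + np.sqrt(1 / Q)) ** 2

def inverse_participation_ratio(eigvecs):
    """IPR of each unit-norm eigenvector (stored as columns): sum of fourth powers."""
    return np.sum(eigvecs ** 4, axis=0)

rng = np.random.default_rng(42)
N, L = 200, 1000                       # hypothetical sizes, Q = 5
A = rng.standard_normal((N, L))        # i.i.d. entries, zero mean, unit variance
C_rand = A @ A.T / L
eigvals, eigvecs = np.linalg.eigh(C_rand)
lam_min, lam_max = mp_bounds(L / N)
ipr = inverse_participation_ratio(eigvecs)

print(f"eigenvalue range: [{eigvals.min():.3f}, {eigvals.max():.3f}]")
print(f"assumed RMT band: [{lam_min:.3f}, {lam_max:.3f}]")
print(f"mean IPR: {ipr.mean():.4f}, 3/N: {3 / N:.4f}")
```

For finite N the extreme eigenvalues fluctuate around the band edges, so small excursions outside the band do not by themselves indicate genuine correlation structure.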

Let $A$ denote an $N \times L$ matrix whose entries are i.i.d. random variables which are normally distributed with zero mean and unit variance. As $N, L \to \infty$ while $Q = L/N$ is kept fixed, the probability density

Analysis of Johannesburg stock exchange data

The JSE is one of the 20 largest national stock markets in the world. We summarise some of its known qualitative features. Although many of the Main Board JSE shares are illiquid, the market as a whole is a fairly liquid one. There is share concentration in half a dozen shares: these dominant shares account for almost a third of the index and have a large bias towards resources. The resources sector in turn is strongly correlated with the dollar-rand exchange rate, an exogenous factor that has a

Conclusions

Our investigation exposes some notable differences in the spectral properties of the correlation matrices computed by the three different methods outlined. As in preceding analyses of financial market data, in all cases we have found that: (1) a significant part of the eigenvalue spectrum falls within the range of random matrix predictions, and (2) there exists a small number of large leading eigenvalues. However, we found that by computing measured-data correlation

Acknowledgments

We thank the referees of Physica A for their pertinent comments and questions. DW thanks National Research Foundation Thuthuka Grant TTK2005081000005 and University of Cape Town Research Council for financial support.

References (30)

  • L. Laloux et al., Noise dressing of financial correlation matrices, Phys. Rev. Lett. (1999)
  • L. Giada et al., Data clustering and noise undressing of correlation matrices, Phys. Rev. E (2001)
  • V. Plerou et al., Random matrix approach to cross correlations in financial data, Phys. Rev. E (2001)
  • J.D. Noh, Model for correlations in stock markets, Phys. Rev. E (2000)
  • W. Wothke, Longitudinal and multi-group modeling with missing data
