Recurrence interval analysis of high-frequency financial returns and its application to risk estimation

We investigate the probability distributions of the recurrence intervals $\tau$ between consecutive 1-min returns above a positive threshold $q>0$ or below a negative threshold $q<0$ for two indices and 20 individual stocks in China's stock market. The distributions of recurrence intervals for positive and negative thresholds are symmetric and display power-law tails, as tested by three goodness-of-fit measures: the Kolmogorov-Smirnov (KS) statistic, the weighted KS statistic and the Cram\'er-von Mises criterion. Both long-term and short-term memory effects are observed in the recurrence intervals for positive and negative thresholds $q$. We further apply the recurrence interval analysis to risk estimation for the Chinese stock markets based on the probability $W_q(\Delta t|t)$, Value-at-Risk (VaR) analysis, and VaR analysis conditioned on preceding recurrence intervals.


Introduction
Recurrence interval analysis has attracted growing interest in extreme value statistics and has been extensively applied to a variety of experimental time series, such as records of climate [1,2], seismic activity [3] and turbulence [4]. In recent years, a great deal of financial data has been compiled, which makes it possible to analyze extreme events in stock markets relatively accurately. Investigating the recurrence intervals between extreme events in stock markets helps us better understand the statistical properties of these events, which is expected to be of great importance for risk assessment. The recurrence intervals between volatilities exceeding a certain threshold q have been carefully studied, and numerous phenomena have been unveiled [5,6,7,8,9,10,11,12,13,14,15,16].
On the other hand, only a few papers have been devoted to the recurrence intervals between large price returns [17,18,19,20]. It has been verified that the long-term correlation of a time series has a remarkable influence on its recurrence interval distribution. Numerical simulations have shown that for linear long-term correlated time series the recurrence interval distribution follows a stretched exponential [2,21,22,23]. Santhanam et al. subsequently provided an analytical expression for the recurrence interval distribution in linear long-term correlated time series, and claimed that the distribution is a combination of a stretched exponential and a power law [24]. Recently, Bogachev and Bunde studied the recurrence interval distribution of artificial multifractal signals with non-linear correlations, and observed power-law behavior whose exponent varies with the threshold value [18,25]. Multifractality has proven to be a main feature of financial records, and consequently the recurrence interval distribution of financial data is expected to display power-law behavior. Power-law behavior with an exponent depending on the threshold value was indeed observed in a variety of representative financial records [5,19].
Since the stock market crash in May 2007, the Chinese stock markets have experienced a strongly fluctuating period. The increasing number of extreme events with large price returns offers an opportunity to better understand the statistical properties of recurrence intervals. Using high-frequency data from recent years, we investigate the recurrence intervals between price returns above a threshold q > 0 or below a threshold q < 0, and find that the tails of the recurrence interval distributions obey power-law scaling independent of the threshold value. This scaling behavior is consistent with some empirical results on volatility recurrence intervals [6,7,9,11,13]. The present analysis focuses mainly on the probability distribution and memory effect of the recurrence intervals. Another purpose of this paper is to estimate risk in the Chinese stock markets based on recurrence interval analysis. We note that previous empirical analyses of the recurrence of financial returns were carried out using daily data [17,18,19,20]. In this work, we investigate the recurrence intervals of 1-min high-frequency returns to obtain better statistics.
The paper is organized as follows. Section 2 describes the data analyzed. Section 3 studies the probability distribution of recurrence intervals using the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) tests. Section 4 studies the memory effects of recurrence intervals. In Section 5, we perform risk estimation based on the recurrence interval analysis. Section 6 concludes.

Data sets
Our analysis is based on a database of China's stock market retrieved from GTA Information Technology Co., Ltd (http://www.gtadata.com/). We study the 1-min intraday data of 20 liquid stocks actively traded on the Shanghai Stock Exchange and the Shenzhen Stock Exchange from January 2000 to May 2009, as well as the Shanghai Stock Exchange Composite Index (SSEC) and the Shenzhen Stock Exchange Composite Index (SZCI) from January 2003 to April 2009. With a sampling time of 1 minute, the number of data points is about 340000 for each of the two indices and about 500000 for each individual stock. The 20 stocks are representative of a variety of industry sectors. Each stock is identified by a unique 6-digit code: stocks whose codes begin with 600 are traded on the Shanghai Stock Exchange, while stocks whose codes begin with 000 are listed on the Shenzhen Stock Exchange.

Probability distribution of recurrence interval of price returns
We study the recurrence intervals between extreme events with large positive or negative price returns. The price return is defined as the logarithmic difference between the prices of two consecutive minutes, that is,
$$r(t) = \ln Y(t) - \ln Y(t-1), \tag{1}$$
where Y(t) is the minute price at time t. Before calculating the recurrence intervals, we normalize the returns of each stock by dividing them by their standard deviation,
$$r(t) \rightarrow \frac{r(t)}{\sigma}, \qquad \sigma = \sqrt{\langle r^2 \rangle - \langle r \rangle^2}, \tag{2}$$
so that the thresholds q are expressed in units of σ. We investigate the interval $\tau \equiv \tau_q$ between two consecutive price returns above a positive threshold q > 0 or below a negative threshold q < 0, as illustrated in figure 1, and then study the statistics of these recurrence intervals between consecutive large price increases or decreases.
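As a minimal sketch of the return definition, the normalization and the interval definition above, the Python/NumPy code below computes normalized log returns from a price array and extracts the recurrence intervals for a threshold q. The function names `normalized_returns` and `recurrence_intervals` are hypothetical, and the sketch assumes a single contiguous price series (gaps between trading sessions are not treated specially); intervals are measured in sampling steps, i.e. minutes for 1-min data.

```python
import numpy as np


def normalized_returns(prices):
    """Log returns r(t) = ln Y(t) - ln Y(t-1), divided by their standard deviation."""
    r = np.diff(np.log(np.asarray(prices, dtype=float)))
    return r / r.std()


def recurrence_intervals(r, q):
    """Intervals (in sampling steps) between consecutive returns above q (q > 0)
    or below q (q < 0)."""
    r = np.asarray(r, dtype=float)
    if q > 0:
        event_times = np.flatnonzero(r > q)
    elif q < 0:
        event_times = np.flatnonzero(r < q)
    else:
        raise ValueError("the threshold q must be nonzero")
    return np.diff(event_times)


# Toy example with already-normalized returns
r = np.array([0.5, 2.5, -2.1, 3.0, 0.1, -3.0, 2.1])
print(recurrence_intervals(r, 2))   # events at t = 1, 3, 6 give intervals 2 and 3
print(recurrence_intervals(r, -2))  # events at t = 2, 5 give a single interval 3
```

Because the returns are normalized first, the same numerical threshold (for example q = 2) corresponds to the same multiple of σ for every stock and index.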

PDF of recurrence interval of price returns
We first calculate the empirical probability distribution function (PDF) $P_q(\tau)$ of the recurrence intervals of price returns. In figure 2, the scaled PDFs $P_q(\tau)\langle\tau\rangle$ are plotted as a function of the scaled recurrence interval $\tau/\langle\tau\rangle$ for both positive and negative thresholds, for the two Chinese indices and four representative stocks, where $\langle\tau\rangle$ is the mean recurrence interval. For both the indices and the individual stocks, $P_q(\tau)\langle\tau\rangle$ for q > 0 shows a profile very similar to that for q < 0 with the same threshold magnitude, e.g. q = 2 and q = −2. This indicates that the recurrence interval distributions for positive and negative thresholds are symmetric. For small $\tau/\langle\tau\rangle$, the scaled PDFs for different q values differ from each other, especially for the two indices, which is partly due to the discreteness effect. In contrast, the tails of the scaled PDFs collapse nicely onto a single curve, displaying scaling behavior. In other words, when $\tau/\langle\tau\rangle$ is large enough, $P_q(\tau)\langle\tau\rangle$ depends only on $\tau/\langle\tau\rangle$ and not on the threshold q:
$$P_q(\tau) = \frac{1}{\langle\tau\rangle}\, f\!\left(\frac{\tau}{\langle\tau\rangle}\right). \tag{3}$$

Table 1: Estimated power-law parameters $\hat{x}_{\min}$, $\delta$ and $c$, KS statistics, bootstrap p-values, and CvM statistic $W^2$ for the two Chinese indices and the 20 individual stocks.
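Because $P_q(\tau)\langle\tau\rangle$ plotted against $\tau/\langle\tau\rangle$ is simply the PDF of the scaled variable $x = \tau/\langle\tau\rangle$, the scaled PDFs of figure 2 can be estimated with a normalized histogram of x. The sketch below uses logarithmically spaced bins, which is our assumption since the binning scheme is not specified here; `scaled_pdf` is a hypothetical helper name. Overlaying its output for several thresholds q is one way to check the collapse of the tails.

```python
import numpy as np


def scaled_pdf(tau, n_bins=30):
    """Return geometric bin centers of x = tau/<tau>, the scaled PDF P_q(tau)<tau>
    estimated with logarithmic bins, and the bin edges."""
    tau = np.asarray(tau, dtype=float)
    x = tau / tau.mean()
    edges = np.logspace(np.log10(x.min()), np.log10(x.max()), n_bins + 1)
    edges[0], edges[-1] = x.min(), x.max()  # guard against round-off at both ends
    # Empty bins (common at small x, because tau takes integer values) get zero density
    density, _ = np.histogram(x, bins=edges, density=True)
    centers = np.sqrt(edges[:-1] * edges[1:])
    return centers, density, edges
```

The empty small-x bins produced by integer-valued τ are one concrete face of the discreteness effect mentioned above.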

Fitting the scaling function of recurrence interval PDFs
We then focus on the tail of the recurrence interval PDF and study the particular form of the scaling function. The curves in figure 2 suggest that the tails of the PDFs may follow a power law. We assume that the empirical PDF above $x_{\min}$ obeys the scaling form of Eq. (3), with the scaling function
$$f(x) = c\,x^{-\delta}, \qquad x = \tau/\langle\tau\rangle \geq x_{\min}, \tag{4}$$
where $x_{\min}$ is the lower bound of the power-law distribution. Since our hypothesis is that the empirical PDFs for different q values above $x_{\min}$ coincide with their common best power-law fit, we aggregate the interval samples for different q values above $x_{\min}$ and fit them with a common power-law function.
To estimate the parameters of this power-law distribution accurately, we use the method proposed by Clauset, Shalizi and Newman, which combines maximum likelihood estimation with the Kolmogorov-Smirnov (KS) statistic [26]. Let $F_q$ be the cumulative distribution function (CDF) of the empirical data and $F_{PL}$ the CDF of the power-law fit. The KS statistic is defined as
$$KS = \max_{x \geq x_{\min}} \left| F_q(x) - F_{PL}(x) \right|. \tag{5}$$
Following the idea of making the empirical distribution and its best power-law fit as similar as possible, the estimate $\hat{x}_{\min}$ is the value that minimizes the KS statistic. The parameters c and δ are estimated by the maximum likelihood method. The fitted power laws are shown in figure 2, and the estimated parameters $\hat{x}_{\min}$, δ and c and the resulting KS statistics are listed in Table 1. Among the 20 individual stocks, 16 have $\hat{x}_{\min} \lesssim 10$, giving a scaling region wider than one order of magnitude, and their power-law exponents are estimated to be δ = 3.0 ± 0.3. The two Chinese indices have scaling regions spanning more than two orders of magnitude, with power-law exponents δ = 2.2 ± 0.1.

Having fitted the empirical recurrence interval PDFs, we need to test how well the power law describes them. We perform a goodness-of-fit test based on the KS statistic using a bootstrap approach [26,27]. We first generate 1000 synthetic samples from the power-law distribution that best fits the empirical distribution, and then compute the empirical CDF $F_{sim}$ of each synthetic sample and the CDF $F_{sim,PL}$ of its best power-law fit. The KS statistic for the synthetic data is
$$KS_{sim} = \max_{x \geq x_{\min}} \left| F_{sim}(x) - F_{sim,PL}(x) \right|. \tag{6}$$
This KS statistic is relatively insensitive at the edges of the two cumulative distributions, where both CDFs approach 0 or 1. To avoid this problem, we also use a weighted KS statistic [27],
$$KS^{W}_{sim} = \max_{x \geq x_{\min}} \frac{\left| F_{sim}(x) - F_{sim,PL}(x) \right|}{\sqrt{F_{sim,PL}(x)\left[1 - F_{sim,PL}(x)\right]}}, \tag{7}$$
which is uniformly sensitive across the whole range.
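The fitting step can be sketched as follows for a continuous power law, assuming the pooled scaled intervals are passed as one array. For each candidate $x_{\min}$ (here, every distinct sample value), δ is set to its maximum-likelihood estimate $\hat\delta = 1 + n / \sum_i \ln(x_i/x_{\min})$ over the n samples above $x_{\min}$, and the candidate with the smallest KS distance $\max |F_q - F_{PL}|$ is kept. The helper name and the `min_tail` cutoff are our own choices; the prefactor c is not estimated here, and the continuous-distribution formulas are only an approximation for integer-valued τ.

```python
import numpy as np


def fit_power_law_tail(x, min_tail=50):
    """Clauset-Shalizi-Newman style fit of p(x) ~ x**(-delta) for x >= x_min.

    Returns (x_min, delta, ks) for the candidate x_min minimizing the KS distance
    between the empirical tail CDF and F_PL(x) = 1 - (x / x_min)**(1 - delta),
    or None if fewer than min_tail samples are available.
    """
    x = np.sort(np.asarray(x, dtype=float))
    best = None
    for x_min in np.unique(x):
        tail = x[x >= x_min]
        n = tail.size
        if n < min_tail:
            break  # candidates increase, so all later tails are even smaller
        log_sum = np.sum(np.log(tail / x_min))
        if log_sum == 0.0:
            continue
        delta = 1.0 + n / log_sum  # maximum-likelihood exponent for this x_min
        fit_cdf = 1.0 - (tail / x_min) ** (1.0 - delta)
        # KS distance: compare F_PL with the empirical CDF just after and just
        # before each sorted sample (this also handles tied values correctly)
        d_after = np.max(np.arange(1, n + 1) / n - fit_cdf)
        d_before = np.max(fit_cdf - np.arange(0, n) / n)
        ks = max(d_after, d_before)
        if best is None or ks < best[2]:
            best = (x_min, delta, ks)
    return best
```

Scanning every distinct value is the simplest choice; since the scaled intervals take relatively few distinct values, the loop stays manageable even for large samples.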
The p-value is the fraction of synthetic samples for which $KS_{sim} > KS$ (or $KS^W_{sim} > KS^W$), where $KS^W$ is the weighted statistic for the empirical data. The tests are carried out for the two Chinese indices and the 20 individual stocks, and the resulting p-values are listed in Table 1.
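A simplified version of this bootstrap test is sketched below. It assumes $x_{\min}$ has already been estimated and, unlike the full procedure of Ref. [26], keeps $x_{\min}$ fixed when refitting each synthetic sample, re-estimating only δ. Synthetic samples are drawn from the fitted power law by inverse-transform sampling, and points where the fitted CDF equals 0 or 1 are skipped in the weighted statistic to avoid division by zero. All function names are hypothetical.

```python
import numpy as np


def mle_delta(tail, x_min):
    """Maximum-likelihood exponent of a continuous power law above x_min."""
    return 1.0 + tail.size / np.sum(np.log(tail / x_min))


def ks_statistics(tail, x_min, delta):
    """Plain and weighted KS distances between the empirical CDF of `tail`
    and F_PL(x) = 1 - (x / x_min)**(1 - delta)."""
    tail = np.sort(tail)
    n = tail.size
    fit = 1.0 - (tail / x_min) ** (1.0 - delta)
    # Largest gap between F_PL and the empirical CDF just after / just before each point
    dev = np.maximum(np.arange(1, n + 1) / n - fit, fit - np.arange(0, n) / n)
    weight = np.sqrt(fit * (1.0 - fit))
    ok = weight > 0
    return dev.max(), (dev[ok] / weight[ok]).max()


def bootstrap_p_values(tail, x_min, n_sim=1000, seed=0):
    """Fractions of synthetic power-law samples with KS_sim > KS and KS^W_sim > KS^W."""
    tail = np.asarray(tail, dtype=float)
    delta = mle_delta(tail, x_min)
    ks_emp, ksw_emp = ks_statistics(tail, x_min, delta)
    rng = np.random.default_rng(seed)
    exceed = np.zeros(2)
    for _ in range(n_sim):
        u = rng.random(tail.size)
        sim = x_min * (1.0 - u) ** (-1.0 / (delta - 1.0))  # inverse-transform sampling
        ks_sim, ksw_sim = ks_statistics(sim, x_min, mle_delta(sim, x_min))
        exceed += [ks_sim > ks_emp, ksw_sim > ksw_emp]
    return exceed / n_sim
```

With this convention, a small p-value means the empirical tail deviates from its power-law fit more than almost all synthetic power-law samples do.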
A p-value close to 1 indicates that the empirical PDFs for different q values agree with their common power-law fit as well as synthetic data generated from that fit do, whereas a relatively small p-value suggests that the empirical PDFs cannot be well described by their common power-law fit. We adopt a significance level of 1%: if the p-value of a stock is less than 1%, the null hypothesis that its empirical PDFs are well fitted by their common power law is rejected. As shown in Table 1, the null hypothesis is rejected for four stocks (600000, 600058, 600100, 600104) using the KS statistic and for SSEC using the $KS^W$ statistic. Since 16 of the 20 stocks pass the test with both the KS and $KS^W$ statistics, we conclude that for most stocks the tails of the recurrence interval PDFs obey scaling behavior and the scaling function is well fitted by a power law.
To further test the goodness of the power-law fit, we use another measure based on the Cramér-von Mises (CvM) statistic [28,29,30],
$$W^2 = N \int_{x_{\min}}^{\infty} \left[ F(x) - F_{PL}(x) \right]^2 \mathrm{d}F_{PL}(x), \tag{8}$$
where F is the CDF of the empirical data, $F_{PL}$ is the CDF of the power-law fit, and N is the total number of scaled interval samples $x = \tau/\langle\tau\rangle$. For a sequence of scaled interval samples $x_1, x_2, \cdots, x_N$ arranged in ascending order, $W^2$ can be calculated from
$$W^2 = \frac{1}{12N} + \sum_{i=1}^{N} \left[ \frac{2i-1}{2N} - F_{PL}(x_i) \right]^2. \tag{9}$$
At the 1% significance level and for N ≫ 1, if $W^2$ exceeds the critical value 0.743 (see Refs. [28,29] for critical values at other significance levels), the hypothesis that the empirical PDFs coincide with their common power-law fit is rejected. The CvM tests are carried out for the two Chinese indices and the 20 stocks, and the corresponding values of $W^2$ are listed in the last column of Table 1. Two stocks (600000, 600100) have $W^2$ greater than the critical value and thus fail the test, so the tails of their empirical PDFs cannot be approximated by a power law. Eighteen of the 20 stocks and both Chinese indices pass the CvM test, which further confirms that for most stocks the tails of the recurrence interval PDFs are well fitted by a power law. The CvM test gives results similar to the KS test: in general, $W^2$ is smaller for stocks with larger p-values.
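The computational formula $W^2 = 1/(12N) + \sum_i [(2i-1)/(2N) - F_{PL}(x_i)]^2$ translates directly into code. The sketch below takes the samples and the fitted CDF as a Python callable; `cvm_w2` and `power_law_cdf` are hypothetical names, and the power-law CDF $F_{PL}(x) = 1 - (x/x_{\min})^{1-\delta}$ used in the usage example assumes the fit is normalized above $x_{\min}$. Recent SciPy versions also provide `scipy.stats.cramervonmises` for the one-sample test against a given distribution.

```python
import numpy as np


def cvm_w2(samples, cdf):
    """Cramer-von Mises statistic W^2 for samples against a fully specified CDF."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum(((2 * i - 1) / (2 * n) - cdf(x)) ** 2)


# Usage with an illustrative power law above x_min (parameter values are made up)
x_min, delta = 1.0, 3.0


def power_law_cdf(x):
    return 1.0 - (x / x_min) ** (1.0 - delta)


rng = np.random.default_rng(0)
sample = x_min * (1.0 - rng.random(5000)) ** (-1.0 / (delta - 1.0))
w2 = cvm_w2(sample, power_law_cdf)
print(w2, "rejected at 1%" if w2 > 0.743 else "not rejected at 1%")
```

The critical value 0.743 applies to the large-N limit at the 1% level, as stated above.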

Memory effect in recurrence interval of price returns
To fully understand the statistical properties of the recurrence intervals of price returns, we further investigate their temporal correlations. Empirical studies have revealed a memory effect in the volatility recurrence intervals of various stock markets [7,9,11,13,15]. In contrast, to the best of our knowledge, the memory effects in the recurrence intervals of financial returns have not been investigated [17,18,19]. Here we do observe memory effects in the recurrence intervals between large positive returns and between large negative returns in the Chinese stock markets. Since the recurrence intervals between positive returns behave very similarly to those between negative returns, we mainly show results for the recurrence intervals between consecutive price returns below negative thresholds q < 0.

Conditional PDF
We first investigate short-term memory by calculating the conditional PDF $P_q(\tau|\tau_0)$ of the recurrence intervals, defined as the probability of finding an interval τ conditioned on the preceding interval $\tau_0$. To obtain better statistics, we condition on a bin of $\tau_0$ values rather than a single value: the entire interval sequence is sorted in ascending order and partitioned into four bins of equal size. Figure 3 plots the scaled conditional PDF $P_q(\tau|\tau_0)\langle\tau\rangle$ of the two Chinese indices and four representative stocks as a function of the scaled recurrence interval $\tau/\langle\tau\rangle$ for $\tau_0$ in the smallest and largest bins, marked with filled and open symbols, respectively. For all negative thresholds, the symbols for $\tau_0$ in the smallest and largest bins approximately collapse onto two separate solid curves, which further supports the scaling behavior of the recurrence interval PDFs. Moreover, for large $\tau/\langle\tau\rangle$, $P_q(\tau|\tau_0)$ with large $\tau_0$ is larger than that with small $\tau_0$, while for small $\tau/\langle\tau\rangle$ the opposite holds. That is, large (small) preceding intervals $\tau_0$ tend to be followed by large (small) intervals τ, which indicates short-term memory in the recurrence intervals of the Chinese stock markets.
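The binning of the preceding interval can be sketched as follows. The helper (a hypothetical name) pairs each interval τ with its predecessor $\tau_0$, splits the $\tau_0$ values into equal-population bins using quantiles of the whole interval sequence, and returns the scaled intervals $\tau/\langle\tau\rangle$ that follow a $\tau_0$ in each bin; a normalized histogram of each group estimates $P_q(\tau|\tau_0)\langle\tau\rangle$. Using quantiles to define the equal-size bins is our assumption, and with many tied integer values the bins can only be approximately equal.

```python
import numpy as np


def conditional_samples(tau, n_bins=4):
    """Group scaled intervals tau/<tau> by the bin of their preceding interval tau_0.

    Bin edges are quantiles of the whole interval sequence, so each bin holds
    (roughly) the same number of intervals. Element b of the returned list holds
    the scaled intervals that follow a tau_0 in bin b (b = 0 is the smallest bin).
    """
    tau = np.asarray(tau, dtype=float)
    mean = tau.mean()
    tau0, tau_next = tau[:-1], tau[1:]
    inner_edges = np.quantile(tau, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bin_of_tau0 = np.digitize(tau0, inner_edges)
    return [tau_next[bin_of_tau0 == b] / mean for b in range(n_bins)]


# A strongly persistent toy sequence: short intervals follow short ones
groups = conditional_samples(np.arange(1, 17))
print(groups[0].mean(), groups[-1].mean())  # smallest-bin vs largest-bin tau_0
```

For a sequence with short-term memory, the group after the smallest $\tau_0$ bin has a visibly smaller mean than the group after the largest bin; for shuffled intervals the groups would look alike.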

Detrended fluctuation analysis
To investigate long-term memory in the recurrence intervals, we adopt detrended fluctuation analysis (DFA) [31,32,33,34,35], a widely used method for detecting long-term correlations in time series. DFA computes the detrended fluctuation F(l) of the time series within windows of l points after removing a linear trend. For long-term power-law correlated time series, F(l) is expected to scale as a power law with respect to the time scale l,
$$F(l) \sim l^{\alpha}, \tag{10}$$
where the DFA scaling exponent α equals the Hurst exponent when α ≤ 1 [36]: if 0.5 < α < 1 the time series is long-term correlated, and if α = 0.5 it is uncorrelated [31,32]. Figure 4 plots F(l) of the recurrence intervals for different negative thresholds q for the two Chinese indices. The curves are linear in the double logarithmic plot, indicating that F(l) obeys the scaling form of Eq. (10), and α is estimated from the slopes of the curves in figure 4. We perform the same calculation for the 20 individual stocks and plot the resulting exponents α in the right panel of figure 4. The exponent α tends to decrease as the negative threshold q decreases, but remains clearly larger than 0.5. This suggests that the recurrence intervals are long-term correlated. Empirical studies have shown that the long-term memory of recurrence intervals may arise from the long-term memory of the original time series [18,23,37,38,39]. To verify this, we calculate F(l) of the negative return series, retaining the positions of the positive returns but excluding their contribution to F(l). As shown in figure 4, the exponents α of the negative return series for all 20 stocks and the two Chinese indices (black circles) are significantly larger than 0.5, so the negative return series are long-term correlated.
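A compact DFA sketch (first-order detrending, non-overlapping windows) is given below. It follows the standard recipe of integrating the mean-subtracted series into a profile, fitting a straight line in each window of l points, and taking the root-mean-square residual; α is the slope of ln F(l) against ln l. The function name and the choice of scales are assumptions for illustration. For white noise the estimated α should be close to the uncorrelated value 0.5.

```python
import numpy as np


def dfa(series, scales):
    """First-order DFA: returns F(l) for each window size l and the fitted exponent alpha."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())
    fluct = []
    for l in scales:
        n_seg = profile.size // l
        segments = profile[: n_seg * l].reshape(n_seg, l)
        t = np.arange(l)
        # Least-squares line in every segment at once (one slope/intercept per segment)
        slope, intercept = np.polyfit(t, segments.T, 1)
        trend = np.outer(slope, t) + intercept[:, None]
        fluct.append(np.sqrt(np.mean((segments - trend) ** 2)))
    fluct = np.array(fluct)
    alpha = np.polyfit(np.log(scales), np.log(fluct), 1)[0]  # slope in log-log coordinates
    return fluct, alpha


rng = np.random.default_rng(42)
scales = np.unique(np.logspace(1, 3, 12).astype(int))
_, alpha_noise = dfa(rng.standard_normal(2**14), scales)
print(round(alpha_noise, 2))  # should be close to 0.5 for uncorrelated noise
```

Applied to a sequence of recurrence intervals, the same function yields the exponents α discussed above; applied to the intervals of shuffled returns, it gives the uncorrelated baseline.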
We then calculate the exponent α of the recurrence intervals of the shuffled price returns, which are artificially uncorrelated. The exponents α for all the individual stocks and the two Chinese indices are very close to 0.5, so the recurrence intervals of the shuffled data are uncorrelated. This observation provides direct evidence for our hypothesis that the long-term memory of the recurrence intervals may be due to the long-term memory of the price returns (either positive or negative returns).

Risk estimation
The study of recurrence intervals between extreme events in stock markets has drawn much attention from scientists and economists. Most studies focus on the statistical properties of recurrence interval sequences and on understanding their underlying dynamics. Little work has been done on applying recurrence interval analysis to risk estimation in real stock markets. In this section, following Refs. [17,18,19,20], we attempt some risk estimation for the Chinese stock markets based on recurrence interval analysis.

Probability $W_q(\Delta t|t)$
In risk estimation, a quantity of great importance is the probability $W_q(\Delta t|t)$ that an extreme event with a price return below q < 0 occurs within a short time interval $\Delta t \ll \langle\tau\rangle$, conditioned on an elapsed time t since the previous extreme event [18]:
$$W_q(\Delta t|t) = \frac{\int_{t}^{t+\Delta t} P_q(\tau)\,\mathrm{d}\tau}{\int_{t}^{\infty} P_q(\tau)\,\mathrm{d}\tau}. \tag{11}$$
We have shown above that $P_q(\tau)$ obeys the power law of Eq. (4) for $\tau/\langle\tau\rangle > \hat{x}_{\min}$. When $t > \hat{x}_{\min}\langle\tau\rangle$, the numerator of Eq. (11) is approximately $P_q(t)\Delta t \propto t^{-\delta}\Delta t$ and the denominator is proportional to $t^{1-\delta}/(\delta-1)$, so that
$$W_q(\Delta t|t) \simeq (\delta - 1)\frac{\Delta t}{t}, \tag{12}$$
i.e. $W_q(\Delta t|t)$ is proportional to Δt and inversely proportional to t. Figure 5 plots $W_q(\Delta t = 10|t)$ for the two Chinese indices and four representative stocks, and Eq. (12) evidently fits $W_q(\Delta t|t)$ well for $t > \hat{x}_{\min}\langle\tau\rangle$. Note that $W_q(\Delta t|t)$ shows an increasing tendency for extremely large t because of the poor statistics of rare events with large recurrence intervals. Measurements for other values of Δt show qualitatively similar results. An intriguing feature is that $W_q(\Delta t|t)$ is independent of the threshold q, which is a direct consequence of the scaling behavior of $P_q(\tau)$ in Eq. (3).
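The probability $W_q(\Delta t|t)$ has a direct empirical counterpart: among all intervals longer than t, the fraction that end within the next Δt. The sketch below (hypothetical helper names) computes this estimate from a list of recurrence intervals, together with the power-law prediction $(\delta - 1)\Delta t/t$ for comparison; δ must come from a tail fit such as those reported in Table 1.

```python
import numpy as np


def empirical_w(tau, dt, t_values):
    """Estimate W_q(dt | t) = #{t < tau <= t + dt} / #{tau > t} for each elapsed time t."""
    tau = np.asarray(tau)
    w = []
    for t in t_values:
        survivors = np.count_nonzero(tau > t)
        hits = np.count_nonzero((tau > t) & (tau <= t + dt))
        w.append(hits / survivors if survivors else np.nan)  # undefined if no interval exceeds t
    return np.array(w)


def power_law_w(dt, t, delta):
    """Asymptotic power-law prediction W_q(dt | t) ~ (delta - 1) * dt / t."""
    return (delta - 1.0) * dt / np.asarray(t, dtype=float)


tau = np.array([5, 12, 15, 25, 40])
print(empirical_w(tau, dt=10, t_values=[10, 30, 50]))  # 0.5 at t = 10, 1.0 at t = 30, nan at t = 50
print(power_law_w(10, [100, 200], delta=3.0))          # 0.2 and 0.1
```

The NaN at large t reflects the sparse statistics of long intervals, which is also why the empirical curves turn upward for extremely large t.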

Loss probability $p^*$
It is well known that the intraday returns $r_{\Delta t}$ of stocks and indices are distributed according to a Student distribution [40] whose tails follow the inverse cubic law [41]. Hence, the tail of the return distribution obeys a power law,
$$P(r_{\Delta t}) \sim |r_{\Delta t}|^{-(1+\beta)}, \tag{13}$$
where the exponent β increases with Δt. When Δt equals one minute, the inverse cubic law holds [40], that is, β ≈ 3 [41]. Our empirical results show that the tails of the return distributions for |r| ≥ 2 follow Eq. (13), with β close to 3, as illustrated in figure 6. For a given risk level VaR = q < 0, the loss probability $p^*$ is
$$p^* = \int_{-\infty}^{q} P(r)\,\mathrm{d}r. \tag{14}$$
The mean recurrence interval $\langle\tau\rangle$ is defined as
$$\langle\tau\rangle = \frac{\text{total number of return samples}}{\text{number of return samples with } r < q}. \tag{15}$$
By definition, the inverse mean recurrence interval equals the loss probability $p^*$, and by Eqs. (13) and (14) it should follow
$$\frac{1}{\langle\tau\rangle} = p^* \sim |q|^{-\beta}. \tag{16}$$
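The relation $1/\langle\tau\rangle = p^* \propto |q|^{-\beta}$ suggests a simple recipe, sketched below under our own assumptions: estimate $p^*$ as the fraction of normalized returns below q for several thresholds, fit $\ln p^*$ against $\ln|q|$ by least squares to obtain a prefactor A and the exponent β (A is our notation for the proportionality constant), and invert the fit to obtain the VaR level q for a target loss probability. The function names are hypothetical.

```python
import numpy as np


def loss_probability(r, q):
    """Empirical p* = fraction of returns below q, which equals 1/<tau>."""
    r = np.asarray(r)
    return np.count_nonzero(r < q) / r.size


def fit_tail(qs, p_star):
    """Least-squares fit of ln p* = ln A - beta * ln|q|; returns (A, beta)."""
    slope, intercept = np.polyfit(np.log(np.abs(qs)), np.log(p_star), 1)
    return np.exp(intercept), -slope


def var_level(p, A, beta):
    """Invert p* = A |q|^(-beta) to get the (negative) VaR threshold q for loss probability p."""
    return -((A / p) ** (1.0 / beta))


qs = np.array([-2.0, -3.0, -4.0])
A, beta = fit_tail(qs, 0.5 * np.abs(qs) ** -3.0)  # synthetic p* with A = 0.5, beta = 3
print(A, beta, var_level(0.5 / 8.0, A, beta))     # approximately 0.5, 3.0 and -2.0
```

Fitting in log-log coordinates keeps the sketch short; in practice only thresholds inside the power-law region (|q| ≥ 2 here) should enter the fit.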

Conditional loss probability $p^*$
The conditional mean recurrence interval $\langle\tau|\tau_0\rangle$ is defined as the mean recurrence interval conditioned on the preceding interval $\tau_0$. In figure 8, we plot the scaled conditional mean recurrence interval $\langle\tau|\tau_0\rangle/\langle\tau\rangle$ as a function of the scaled preceding interval $\tau_0/\langle\tau\rangle$ for the two Chinese indices and four representative stocks. We assume that $\langle\tau|\tau_0\rangle/\langle\tau\rangle$ depends only on $\tau_0/\langle\tau\rangle$ through a fitted function g,
$$\frac{\langle\tau|\tau_0\rangle}{\langle\tau\rangle} = g\!\left(\frac{\tau_0}{\langle\tau\rangle}\right). \tag{17}$$
As shown in figure 8, the relationship between $\langle\tau|\tau_0\rangle/\langle\tau\rangle$ and $\tau_0/\langle\tau\rangle$ is well described by Eq. (17) in the intermediate region $\tau_0/\langle\tau\rangle \in (0.1, 10]$. For large and small $\tau_0/\langle\tau\rangle$, the curves for different q values clearly diverge and cannot be well fitted by Eq. (17). Based on the conditional mean recurrence interval, we further calculate the loss probability conditioned on the preceding interval $\tau_0$. Similar to Eqs. (14) and (16), we expect that
$$p^* = \int_{-\infty}^{q} p(r|\tau_0)\,\mathrm{d}r = \frac{1}{\langle\tau|\tau_0\rangle}, \tag{18}$$
where $p(r|\tau_0)$ is the probability that a return r immediately follows an interval $\tau_0$, and $p^*$ is the loss probability for a risk level VaR = q < 0 conditioned on the preceding interval $\tau_0$ between losses below q. If the preceding interval $\tau_0$ is known, we can thus estimate the risk level corresponding to a given loss probability $p^*$. Substituting Eq. (16) into Eq. (17) shows that $p^* = 1/\langle\tau|\tau_0\rangle$ depends on both $\tau_0/\langle\tau\rangle$ and q. In figure 9, we plot the contours of the theoretical conditional loss probability $p^* = 1/\langle\tau|\tau_0\rangle$ using the parameters estimated from figures 7 and 8. For comparison, we also plot the contours of the empirical conditional loss probability in figure 10. The patterns of the two contour maps are similar except for large and small $\tau_0/\langle\tau\rangle$. The large values of the empirical loss probability for large or small $\tau_0/\langle\tau\rangle$ indicate a high risk of extreme loss events after a long or short elapsed time since the previous extreme event. To better estimate the conditional loss probability theoretically, the relationship between $\langle\tau|\tau_0\rangle/\langle\tau\rangle$ and $\tau_0/\langle\tau\rangle$ in Eq. (17) needs to be refined for large and small $\tau_0/\langle\tau\rangle$.
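The empirical quantities behind figures 8 and 10 can be sketched as follows: each interval is paired with its predecessor, the scaled preceding intervals $\tau_0/\langle\tau\rangle$ are grouped into user-supplied bins, and per bin the scaled conditional mean $\langle\tau|\tau_0\rangle/\langle\tau\rangle$ and the empirical conditional loss probability $p^* = 1/\langle\tau|\tau_0\rangle$ are computed. The bin edges and the function name are illustrative assumptions; bins without data return NaN.

```python
import numpy as np


def conditional_mean_interval(tau, edges):
    """Per bin of tau_0/<tau> (edges on the scaled axis), return the scaled conditional
    mean <tau|tau_0>/<tau> and the conditional loss probability 1/<tau|tau_0>."""
    tau = np.asarray(tau, dtype=float)
    mean = tau.mean()
    scaled_tau0 = tau[:-1] / mean
    tau_next = tau[1:]
    bin_index = np.digitize(scaled_tau0, edges) - 1  # bin b covers [edges[b], edges[b+1])
    cond_mean = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        in_bin = bin_index == b
        if in_bin.any():
            cond_mean[b] = tau_next[in_bin].mean()
    return cond_mean / mean, 1.0 / cond_mean
```

Logarithmically spaced edges, for example `np.logspace(-1, 1, 9)`, cover the intermediate region $\tau_0/\langle\tau\rangle \in (0.1, 10]$ discussed above; repeating the calculation for a range of thresholds q gives the data for a contour map of the conditional loss probability.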

Conclusion
We have studied the probability distribution and the memory effect of recurrence intervals between consecutive returns above a positive threshold q > 0 or below a negative threshold q < 0 for SSEC, SZCI and 20 liquid stocks of the Chinese stock markets. Because our data sets consist of 1-min high-frequency returns, our statistics are much better than those of previous studies using daily data [17,18,19].
We found that the PDFs of recurrence intervals for different thresholds collapse onto a single curve for each index and individual stock, and are symmetric with respect to positive and negative returns. The tails of the recurrence interval distributions for different threshold values q display scaling behavior, and goodness-of-fit tests at the 1% significance level using the KS, $KS^W$ and CvM statistics show that the scaling functions of 16 stocks and SZCI can be approximated by a power law.
The investigation of the conditional PDF P q (τ|τ 0 ) and the detrended fluctuation function F(l) demonstrates the existence of both short-term and long-term correlation in the recurrence intervals. To the best of our knowledge, the memory effects of recurrence intervals of financial returns have not been studied previously, although the memory effects of volatility recurrence intervals are well documented. The long-term correlation in the recurrence intervals is usually attributed to the long-term correlations in the original time series [6]. In the current case, although the returns are uncorrelated, the positive returns are strongly correlated, and so are the negative returns. Therefore, the long memory in the recurrence intervals can also be attributed to the long memory in the positive and negative return series.
We also applied recurrence interval analysis to risk estimation for the Chinese stock markets, and provided relatively accurate estimates of the probability $W_q(\Delta t|t)$ and the conditional loss probability $p^*$. These results are useful for the assessment and management of risk in financial engineering. We submit that the recurrence interval analysis of financial returns pioneered in Refs. [17,18,19,20] has the potential to forge a link between econophysics and financial engineering, and that more work should be done in this direction.