Multi-scale description and prediction of financial time series

A new method is proposed that allows a reconstruction of time series based on higher-order multi-scale statistics given by a hierarchical process. This method is able to model financial time series not only on a specific scale but over a range of scales. The method itself is based on the general n-scale joint probability density, which can be extracted directly from given data. It is shown how, based on these n-scale statistics, general n-point probabilities can be estimated, from which predictions can be derived. Exemplary results are shown for the German DAX index. The ability to model correctly the behaviour of the original process for different scales simultaneously and in time is demonstrated. As a main result it is shown that this method is able to reproduce the well-known volatility clusters, although the model contains no explicit time dependence. Thus a new mechanism is demonstrated by which volatility clustering can emerge in a stationary multi-scale process.


Introduction
A typical feature of complex systems, such as financial markets, is that they have hierarchical, non-trivial structures on different scales. Due to this hierarchical structure such systems are in general very high dimensional; the challenge is thus to find appropriate methods to model these structures efficiently. Successful attempts have been made to describe such hierarchical systems as processes in scale rather than in time or space itself. Examples are the description of financial data [1]-[3], roughness of surfaces [4,5], turbulence [6]-[8] and earthquakes [9]. These successful attempts are characterized by the fact that they can correctly provide the joint probability density function p(y 1 , τ 1 ; . . . ; y n , τ n ) of the log returns y(τ ) of the process variable S at different scales τ . In the following, the joint statistics of these increment processes on different scales τ i are considered, which we refer to as a multi-scale description. Thereby the smaller increments are always nested into the larger increments. Note that the method itself is not restricted to this particular increment definition [10]. It should be pointed out that knowledge of the joint multi-scale probability density function p(y 1 , τ 1 ; . . . ; y n , τ n ) also includes information such as multifractal scaling properties, typically expressed by ⟨y^n (τ )⟩ ∼ τ^{ξ_n}. A multifractal scaling property, often also called a multiscaling property, is not necessary for the approach presented here. Nevertheless, the concept of these multifractal features is a very powerful one and has been demonstrated frequently for different financial data sets. Furthermore, multifractal features are often the basis for setting up powerful models; see e.g. [11]-[13]. For complex systems there is a general challenge: to generate surrogates or future paths based on the knowledge of the statistical properties and the recent history of the system. Most often only the properties in scale or only those in time are modelled.
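For concreteness, right-justified nested log returns of a price S(t) conventionally take the following form; this is the standard convention and is stated here as an assumption, since the displayed increment definition is not reproduced above:

```latex
y(t,\tau) \;=\; \log S(t) \;-\; \log S(t-\tau),
\qquad \tau_1 < \tau_2 < \cdots < \tau_n ,
```

so that for two scales $\tau_i < \tau_j$ the smaller increment $y(t,\tau_i)$ is nested inside the larger one $y(t,\tau_j)$ and both share the right end point $t$.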
In order to model the scale properties, one approach is to propose general models with proper parameter sets; see for example [14,15]. Based on the feature of volatility clustering, which can be measured by correlations or by return intervals of the variables y(t, τ ) and |y(t, τ )| for the same scale τ but different times t, aspects of higher-order multi-point statistics have also been taken into account; see [11,12,16].
Typically, remarkable features such as correlations or the shape of single-scale pdfs p(y, τ ) are used for the parameter selection. Some of these approaches also provide a remarkably good description of higher-order statistics. In order to model the time properties of complex systems, the dependencies of consecutive points are mostly modelled without considering the form of the distribution. Known examples are ARCH [17], GARCH [18], MMAR [11] or MRW [12], just to mention a few; for further models, see e.g. [11,19]. We propose an alternative, more general approach, which is based on the knowledge of multi-scale properties. In particular, we show how a given time series can be extended into the future in such a way that the scale and time properties are modelled simultaneously. In a previous work [20] the initial idea of this method was proposed for simulating turbulence data. Here we extend the analysis to another complex system and show more details using a financial data set. In particular, we will discuss problems such as predicting volatility clustering and verifying the quality of the prediction. The prediction of features of a future time series is a topical issue. Often one tries to predict extreme events from features of 'patterns' in the precursor data. As categorized in [21], either the frequency of these patterns is evaluated or the pattern itself that precedes any event in the record is analysed. As an alternative, the statistics of return intervals of selected events can also be analysed.
Here we do not merely focus on the occurrence of extreme events; we are also interested in determining stochastic features of the process in the future, independent of any threshold consideration. As another important feature, our approach allows us to break down the n-point precursor consideration to three-point or, equivalently, two-increment statistics.

Method
The aim is to compute the probability of occurrence of the event {x * , t * }, conditioned on the knowledge of the past points {x 1 , t * − τ 1 ; . . . ; x N , t * − τ N }, where the following shorthand notation is used: This probability can be expressed as We now assume that expression (3) has no explicit time dependence. This is a weaker assumption than the stationarity assumption for x * itself, since certain information, such as the current volatility, is already accounted for by the knowledge of {x 1 , t * − τ 1 ; . . . ; x N , t * − τ N }. Thus our aim will be to determine The two-point pdf may be introduced in the following form: where the expectation is taken over the ensemble and the time variable t. Statistical independence of the cascade from x = X (T ) may be assumed, because the current price can always be normalized to a reference price by describing a certain fraction or multiple of the asset instead of the asset itself. Therefore the expectation term can be factorized and the following expression is obtained: Now the case of the conditional pdf of higher order will be considered. We start with Thereby again the above-mentioned independence of the cascade from x = X (T ) was used. Therefore it can be written as Using the same argument for and combining both equations yields an expression for the conditional probability density Because of the scales involved, the dimension of the joint probability density is very high. Therefore it is in general very difficult to compute it from empirical time series. However, the description and computation can be simplified considerably if Markov properties can be assumed. This is the case if is true for all i and n > i. Without loss of generality we take τ i < τ i+1 . It should be noted that the Markov property can be tested for a given data set [2,7], where evidence for its validity can also be found.
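In this notation, the Markov property in scale that is assumed here conventionally reads as follows (a standard form, stated as an assumption):

```latex
p(y_i,\tau_i \,|\, y_{i+1},\tau_{i+1};\,\ldots;\,y_n,\tau_n)
  \;=\; p(y_i,\tau_i \,|\, y_{i+1},\tau_{i+1}),
\qquad \tau_i < \tau_{i+1},
```

for all $i$ and $n > i$: conditioning on the increment at the next larger scale alone carries all the information contained in the larger scales.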
In this case the joint probability density can be substantially simplified: p(y 1 , τ 1 ; . . . ; y n , τ n ) = p(y 1 , τ 1 |y 2 , τ 2 ) · · · p(y n−1 , τ n−1 |y n , τ n ) · p(y n , τ n ).
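This factorization can be sketched numerically with simple histogram estimators. The synthetic Gaussian random walk, bin grid and two-scale setup below are illustrative assumptions, not the paper's data or implementation:

```python
import numpy as np

def conditional_pdf(y_small, y_large, bins):
    """Histogram estimate of p(y_small | y_large) on a common grid."""
    joint, _, _ = np.histogram2d(y_small, y_large, bins=[bins, bins], density=True)
    dx = bins[1] - bins[0]
    marginal = joint.sum(axis=0) * dx          # p(y_large) on the grid
    safe = np.where(marginal > 0, marginal, np.inf)
    return joint / safe, marginal              # zero where marginal vanishes

# Synthetic nested increments from a Gaussian random walk (illustration only).
rng = np.random.default_rng(0)
logS = np.cumsum(rng.normal(0.0, 0.01, 200_000))
tau1, tau2 = 5, 10
y1 = logS[tau2:] - logS[tau2 - tau1:-tau1]     # scale tau1, right end point t
y2 = logS[tau2:] - logS[:-tau2]                # scale tau2, same right end point
bins = np.linspace(-0.15, 0.15, 61)
cond, marg = conditional_pdf(y1, y2, bins)

# Joint density of one bin path via the Markov factorization:
# p(y1, tau1; y2, tau2) = p(y1, tau1 | y2, tau2) * p(y2, tau2)
i, j = 30, 30                                  # central bins (y close to 0)
p_joint = cond[i, j] * marg[j]
```

For more than two scales, the same product extends with one first-order conditional pdf per additional scale pair.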
Most interestingly, it has also been shown that these fundamental conditional pdfs p(y i , τ i |y j , τ j ) and their τ dependence can be expressed by a Fokker-Planck or Kolmogorov equation, where D (1) denotes the drift term and D (2) the diffusion term. Both terms can be estimated directly from the data [2,7], [22]-[24]. In the following we restrict the discussion to Markov processes in scale and to right-justified increments, i.e. the smaller increment is nested into the larger one and shares the right end point with it, according to equation (1). In the case of Markov properties, equation (10) simplifies to Using different values for x * it is possible to construct a discrete approximation of p(x * |x 1 , τ 1 ; . . . ; x N , τ N ) using equation (14) and the conditional probabilities of first order, which can be estimated directly from the data.
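The drift and diffusion terms can be estimated from conditional moments of the increment differences between neighbouring scales. The following is a minimal finite-difference sketch; prefactor conventions for the scale variable differ between references, and the Gaussian random walk is used purely for illustration:

```python
import numpy as np

def kramers_moyal_in_scale(y_tau, y_tau_minus, dtau, bins, min_count=50):
    """Finite-difference estimate of drift D1 and diffusion D2 of the
    process in scale, from increments at scales tau and tau - dtau
    sharing the same right end point (one common estimator)."""
    idx = np.digitize(y_tau, bins) - 1
    centers = 0.5 * (bins[:-1] + bins[1:])
    D1 = np.full(centers.size, np.nan)
    D2 = np.full(centers.size, np.nan)
    for k in range(centers.size):
        sel = idx == k
        if sel.sum() < min_count:
            continue                             # too few samples in this bin
        dy = y_tau_minus[sel] - y_tau[sel]
        D1[k] = dy.mean() / dtau                 # first conditional moment
        D2[k] = (dy ** 2).mean() / (2.0 * dtau)  # second conditional moment
    return centers, D1, D2

# Illustration with a Gaussian random walk (not DAX data).
rng = np.random.default_rng(1)
logS = np.cumsum(rng.normal(0.0, 0.01, 200_000))
tau, dtau = 10, 2
y_big = logS[tau:] - logS[:-tau]                 # scale tau
y_small = logS[tau:] - logS[dtau:-(tau - dtau)]  # scale tau - dtau, same right end
bins = np.linspace(-0.12, 0.12, 25)
centers, D1, D2 = kramers_moyal_in_scale(y_big, y_small, dtau, bins)
```

For the uncorrelated walk above, D1 is close to zero and D2 is a positive constant; for data with a genuine cascade structure both terms acquire a characteristic y dependence.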

Results for the DAX index
In order to demonstrate the method described above, it is applied to the DAX index for the years 1994-2003. The financial data sets were provided by the Karlsruher Kapitalmarkt Datenbank (KKMDB) [25]. The time scales used for the reconstruction are chosen as τ n = 5 × 2 n min with n = 0, 1, . . . , 6. The sample size is quite small and therefore the initial estimate of the probability density in the tails on large scales is not very good. For certain large log returns there is no realization in the initial sample, so the corresponding probability is estimated as zero. But in order to estimate the joint probability function, which is necessary for the extrapolation (see equation (12)), all conditional probabilities are multiplied; the joint probability therefore becomes zero in these cases. In order to estimate the probability for these events, the conditional probability density functions are fitted with splines; for more details, refer to [26,27]. In regions with only a few or no data points a kernel density estimation procedure is applied. These smoothed pdfs are then used for the extrapolation.
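The spline fits of [26,27] are not reproduced here, but the smoothing idea can be sketched with a simple Gaussian kernel density estimate, which assigns small positive probability where a raw histogram would return zero; the sample size, grid and bandwidth below are illustrative assumptions:

```python
import numpy as np

def kernel_smooth_pdf(samples, grid, bandwidth):
    """Gaussian kernel density estimate on a fixed grid; unlike a raw
    histogram it yields strictly positive density in sparse tails."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    dens = np.exp(-0.5 * z ** 2).sum(axis=1)
    dx = grid[1] - grid[0]
    return dens / (dens.sum() * dx)            # normalize to unit area

# Small illustrative sample: a raw histogram would have empty tail bins.
rng = np.random.default_rng(2)
samples = rng.normal(0.0, 1.0, 200)
grid = np.linspace(-6.0, 6.0, 241)
pdf = kernel_smooth_pdf(samples, grid, bandwidth=0.3)
```

Because every grid point receives a positive density, products of conditional pdfs built from such estimates no longer collapse to zero for rare large returns.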
To give an idea of how this method works, figure 1 shows three different sets of data points {x 1 ; . . . ; x N } together with the constructed probability P(x * , t * |x 1 , t * − τ 1 ; . . . ; x N , t * − τ N ), which is shown on the right side. In addition the unconditioned probability P(x * − x 1 , τ 1 ) is shown. From these three examples it can be seen clearly how sensitively the statistics change with the choice of the initial points {x 1 ; . . . ; x N }; even bimodal probabilities are obtained for the next step. The main point of our method is that the continually updated conditional pdfs P(x * , t * |x 1 , t * − τ 1 ; . . . ; x N , t * − τ N ) change from situation to situation, delivering dynamics that appear non-stationary, while the underlying scale-dependent pdfs used for the construction are purely stationary. Based on the knowledge of P(x * , t * |x 1 , t * − τ 1 ; . . . ; x N , t * − τ N ), a new data point x * can be drawn in a statistically correct sense. Shifting the above-mentioned procedure by one step, it can be used to predict the next data point, and so on. Thus we are able to generate new artificial data series.
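Drawing the next point x * from the discretely approximated conditional pdf can be done by inverse-CDF sampling. The bimodal example distribution below is only an illustration of the kind of next-step pdfs shown in figure 1, not actual DAX output:

```python
import numpy as np

def draw_next_point(x_grid, cond_pdf, rng):
    """Draw x* from a discrete approximation of P(x*|x1,...,xN)
    by inverse-CDF sampling."""
    w = np.clip(cond_pdf, 0.0, None)
    cdf = np.cumsum(w)
    cdf /= cdf[-1]                             # last entry is exactly 1
    return x_grid[np.searchsorted(cdf, rng.random())]

# Illustrative bimodal next-step pdf (cf. the bimodal cases in figure 1).
x_grid = np.linspace(-3.0, 3.0, 301)
pdf = (np.exp(-0.5 * ((x_grid - 1.0) / 0.3) ** 2)
       + np.exp(-0.5 * ((x_grid + 1.0) / 0.3) ** 2))
rng = np.random.default_rng(3)
draws = np.array([draw_next_point(x_grid, pdf, rng) for _ in range(1000)])
```

Appending each draw to the series and shifting the conditioning window by one step yields the iterative extension procedure described in the text.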
These synthetic time series can be tested with respect to the proper n-scale joint probabilities p(y 1 , τ 1 ; . . . ; y n , τ n ); to this end we use the method to estimate the underlying Fokker-Planck equation. In figure 2 the drift and diffusion terms are shown, as estimated both from the original DAX data and from the reconstructed data. Based on this result we claim that both data sets obey the same hierarchical stochastic equations and thus are stochastically equal in the sense that both data sets have the same n-scale statistics. The extrapolated time series were also used to calculate the return distribution p(y, τ ) at the different scales. The results are shown in figure 3. As can be seen, the agreement of the pdfs is quite good. Although the algorithm already delivers quite good results, the number and magnitude of the scales that should be used is still an open question. Possible solutions may be an analysis of the importance of the scales by wavelet analysis or by the HARCH model [28]. The method presented above for extending time series can be used to simulate many different possible extensions of an existing time series. These simulated paths can be used to analyse the evolution of statistical quantities such as the standard deviation, quantiles of the distribution or the probability of extreme returns above or below a certain threshold. Thus predictions can be made about how long low- or high-volatility states, as shown in figure 1, will persist or when extreme events may occur. This would provide a connection to the return interval approach [21].

Prediction of volatility
The ansatz presented here allows the prediction of the whole pdf for the next point of the time series, as we have shown above. Based on these considerations, the connection between predicted and realized volatility will now be analysed in more detail. It is important to note that, based on one prediction and the corresponding realization, it is not possible to determine the quality of this next-step prediction. This is because a realization of a process with a very broad distribution can, by chance, lie close to the centre, while a realization of a process with a narrow distribution can lie far from the centre. Therefore, in order to determine the quality of the prediction, the following technique was set up. The data set is divided into two parts; the larger one contains the data of the years 1994-2002, while the smaller one contains the data of the year 2003. The larger data set is used to estimate the unconditional and conditional probability density functions on the right side of equation (14). Then a prediction of the volatility for the second, smaller data set is attempted. A sequence of points from this second data set is used as an initial condition for equation (14). Now it is possible to estimate the conditional probabilities for the next data point x * following the sequence of points used as the initial condition. A tuple with the predicted standard deviation and the realized log return is generated. In a second step, these tuples are sorted with respect to the predicted volatility given by the standard deviation of P(x * , t * |x 1 , t * − τ 1 ; . . . ; x N , t * − τ N ). Based on 5000 consecutive sorted tuples, the mean predicted volatility σ prog and the mean realized volatility σ real are estimated. In figure 4 the results for the DAX index are presented. The functional dependence between σ 2 real and σ 2 prog is close to linear.
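The sorting-and-blocking evaluation just described can be sketched as follows. The synthetic (σ, return) pairs stand in for the predicted standard deviations and realized log returns; the block size of 5000 matches the text, everything else is an illustrative assumption:

```python
import numpy as np

def blockwise_variance_check(sigma_pred, returns, block=5000):
    """Sort (predicted sigma, realized return) tuples by predicted
    volatility, then compare block means of predicted and realized
    variance, mirroring the evaluation described in the text."""
    order = np.argsort(sigma_pred)
    s2 = sigma_pred[order] ** 2
    r2 = returns[order] ** 2
    n = (s2.size // block) * block             # drop the incomplete last block
    var_prog = s2[:n].reshape(-1, block).mean(axis=1)
    var_real = r2[:n].reshape(-1, block).mean(axis=1)
    return var_prog, var_real

# Synthetic example: realized returns are drawn with the predicted volatility.
rng = np.random.default_rng(4)
sigma = rng.lognormal(mean=-5.0, sigma=0.5, size=60_000)
ret = sigma * rng.normal(size=sigma.size)
var_prog, var_real = blockwise_variance_check(sigma, ret)
```

If the predicted distributions are adequate, the block means of realized variance track the predicted ones closely, which is the linear dependence reported for the DAX data in figure 4.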
Figure 4. The realized variance σ 2 real with respect to the predicted variance σ 2 prog for the DAX index. The scales used for the forecasts are τ = 5 × 2 n min with n = 0, 1, . . . , 6. A prediction was attempted for the log return in the next 5 min.

Conclusions
A new method has been presented that allows for a multi-scale reconstruction of a time series. Several scales are modelled simultaneously, using conditional probabilities of properly chosen increments. A necessary feature of the underlying complex system is that the scale-dependent process exhibits Markov properties; this can be tested statistically. In this case all conditional probabilities of arbitrary order can be replaced by conditional probabilities of first order, which can be calculated easily from empirical time series. Using the empirical probability densities, a time series can be reconstructed. The reconstructed time series reproduces the probability densities very well. The advantage of this method is the use of distributions that are obtained directly from the data; therefore the modelling of asymmetric or non-Gaussian distributions is possible. The reconstruction can be applied to any range of scales without a change of the underlying method, as long as Markov properties in scale are present. Furthermore, this approach provides the full probability density for the next potential step of a time series. Using an empirical time series as a starting point, the procedure can be applied for prediction purposes. In contrast to many other methods, not only the mean value for the next step is provided, but the full distribution. Using this full distribution of the log returns, many common risk measures can be calculated. An example is the standard deviation of the realizations of the process in the next step, as demonstrated in section 4. More complex risk measures such as VaR (value at risk), EL (expected loss) and ES (expected shortfall) can also be obtained easily by integrating the estimated distribution for the next step of the process.
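Given the predicted next-step distribution on a grid, such risk measures reduce to simple quadrature. The sketch below uses a standard normal return distribution purely for illustration; it is not the paper's implementation:

```python
import numpy as np

def var_es_from_pdf(x_grid, pdf, alpha=0.05):
    """Value at risk and expected shortfall at level alpha from a
    discrete predicted return distribution (losses are negative returns)."""
    dx = x_grid[1] - x_grid[0]
    p = pdf / (pdf.sum() * dx)                 # normalize to unit area
    cdf = np.cumsum(p) * dx
    i = np.searchsorted(cdf, alpha)            # index of the alpha-quantile
    var = -x_grid[i]                           # VaR: negated return quantile
    tail = p[: i + 1]
    es = -(x_grid[: i + 1] * tail).sum() / tail.sum()  # mean loss beyond VaR
    return var, es

# Illustration with a standard normal return distribution.
x = np.linspace(-6.0, 6.0, 2401)
pdf = np.exp(-0.5 * x ** 2)
var, es = var_es_from_pdf(x, pdf, alpha=0.05)
```

For the standard normal at the 5% level this recovers the textbook values of roughly 1.645 for VaR and 2.06 for ES, up to the grid discretization.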
Our method provides, for each historical time series, a unique distribution for the realization of the process in the next step. Using a distribution-based risk measure such as VaR, EL or ES, a direct link between the historical time series and the risk measure is obtained. If a portfolio of assets is considered, then the choice of the portfolio determines the weights for the historical time series, and the time series of the considered portfolio can easily be computed. Taking the portfolio weights as free variables and the risk measure as an error function, an optimization of portfolios with respect to their risk characteristics is possible. It is advantageous here that the calculation of the distribution, and hence of the distribution-based risk measure, is very fast. Therefore this method can be applied in cases where historical simulation-based methods are used, e.g. for portfolio optimization.