Tests of the Efficient Markets Hypothesis


Introduction
Since the French mathematician BACHELIER published his "Théorie de la Spéculation" in 1900, much research has been devoted to the question of whether, and to what extent, stock price movements are predictable. For no obvious reason, a large number of research workers (particularly statisticians) used only past stock prices for prediction and ignored any other available information. This practice clearly does not increase the chance of success. Nevertheless, since our primary goal is the study of the employed statistical methods, we do not break with this unfortunate habit. Accordingly, this paper is confined solely to univariate analysis.
In the next section, we survey some standard tests of the hypothesis that stock price changes are independent and identically distributed (i.i.d.). These tests usually require that second moments exist. Since this assumption has been disputed in the past, we consider alternative methods in Section 3 which remain useful in case of infinite second moments, i.e. the R/S (range over standard deviation) analysis (see, e.g., MANDELBROT and WALLIS, 1969, and WALLIS and MATALAS, 1970) and model selection procedures.
The observation of clusters of high and low volatility in financial data (see, e.g., MANDELBROT, 1963) severely affected the above hypothesis of randomness, hence this hypothesis had to be replaced by the more general hypothesis of non-predictability, which states that the price series is a martingale or, equivalently, that the expected price of an asset at time $t+1$, given the prices up to time $t$, is (apart from a deterministic drift term) just the price at time $t$. ENGLE (1982) introduced a class of conditional heteroscedastic processes for the description of the clustering phenomenon. The unconditional distributions of the variates from these processes may have both fat tails and finite variances. In Section 4, we propose a new test procedure for testing the predictability of price changes, which is robust to conditional heteroscedasticity. In contrast to other heteroscedasticity-robust tests (see, e.g., LO and MACKINLAY, 1988), our test does not rely on implausible assumptions. This test is applied to Austrian stock return series.

Testing the classical random walk hypothesis
The first model for a series $p_0, \ldots, p_n$ of stock prices was independently developed by BACHELIER (1900) and OSBORNE (1959) and refined by MANDELBROT (1963) and FAMA (1965). This model describes the stock returns $x_t = \log(p_t) - \log(p_{t-1})$ as independent and identically distributed random variables. Since stock returns essentially are differenced (log) prices, this model implies that the (log) stock prices themselves follow a random walk, hence it is called the random walk model. Numerous tests were carried out to examine the adequacy of the random walk model. Early empirical studies concentrated upon the search for linear dependence in stock returns. The observed sample autocorrelations of daily and weekly stock returns, although often statistically significant, were usually too close to zero to be of much speculative value (for a summary see FAMA, 1970). The significance of a particular sample autocorrelation

$$\hat\rho_s = \hat c_s / \hat c_0, \quad \text{where} \quad \hat c_s = \frac{1}{n} \sum_{t=1}^{n-|s|} (x_t - \bar x)(x_{t+|s|} - \bar x) \quad \text{and} \quad \bar x = \frac{1}{n} \sum_{t=1}^{n} x_t,$$

was assessed by using the well-known fact that its variance is approximately $1/n$ for $n$ i.i.d. observations $x_1, \ldots, x_n$ from a distribution with finite variance. Clearly, the consideration of only a single sample autocorrelation, typically $\hat\rho_1$, implies blindness against a variety of deviations from uncorrelatedness. On the other hand, performing significance tests for several sample autocorrelations inevitably increases the risk of reporting false findings. A natural alternative is to employ methods of simultaneous assessment of a large number of sample autocorrelations, such as the modified Box-Pierce test, which is based on a weighted sum of squared sample autocorrelations (see BOX and PIERCE, 1970, PROTHERO and WALLIS, 1976, and LJUNG and BOX, 1978). Obviously, the Box-Pierce test can have extremely low power in situations where $n$ is small and only a small fraction of the sample autocorrelations used is of appreciable size.
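The autocorrelation-based checks described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the data are simulated rather than actual stock returns, and the portmanteau statistic is assumed to take the Ljung-Box form $n(n+2)\sum_{s=1}^{K}\hat\rho_s^2/(n-s)$, which may differ in detail from the variant used in the studies cited.

```python
import numpy as np

def sample_autocorrelations(x, max_lag):
    """Return rho_hat_1..rho_hat_max_lag with c_hat_s = (1/n) sum (x_t - xbar)(x_{t+s} - xbar)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    c0 = np.dot(d, d) / n
    return np.array([np.dot(d[:n - s], d[s:]) / n / c0
                     for s in range(1, max_lag + 1)])

def ljung_box(x, max_lag):
    """Portmanteau statistic in the Ljung-Box form (an assumption, see lead-in)."""
    n = len(x)
    rho = sample_autocorrelations(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))

# Simulated i.i.d. "returns" (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

rho = sample_autocorrelations(x, 10)
# Individual two-sided 5% checks based on the approximate variance 1/n.
print(np.abs(rho) > 1.96 / np.sqrt(len(x)))
# Joint check: compare with the 95% quantile of chi-square(10), about 18.31.
print(ljung_box(x, 10))
```

Testing each lag separately at the 5% level will flag about one lag in twenty even for purely random data, which is the multiple-testing problem the joint statistic is meant to avoid.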
An alternative statistic is the variance ratio (see, e.g., LO and MACKINLAY, 1988, 1989), which may be explained as follows. Let $n$ equal $Nq$, where $q$ is any integer greater than 1. The plausibility of the hypothesis of no serial correlation may be checked by comparing the variance estimate of $x_t$,

$$s_1^2 = (n-1)^{-1} \sum_{t=1}^{n} (x_t - \bar x)^2,$$

to that of $x_{t+1} + \ldots + x_{t+q}$, divided by $q$,

$$s_q^2 = q^{-1} N^{-1} \sum_{t=0}^{N-1} (x_{tq+1} + \ldots + x_{tq+q} - q\bar x)^2.$$

A convenient test statistic is given by the centered variance ratio

$$r_q = s_q^2 / s_1^2 - 1.$$

Using overlapping sums, we obtain the more refined statistics

$$s_q^2 = q^{-1} (n-q+1)^{-1} \sum_{t=0}^{n-q} (x_{t+1} + \ldots + x_{t+q} - q\bar x)^2 \quad \text{and} \quad r_q = s_q^2 / s_1^2 - 1.$$

Under the null hypothesis that the $x_t$ are independently and identically distributed variables with finite variance, $n^{1/2} r_q$ is approximately normally distributed with mean zero and variance $2(2q-1)(q-1)(3q)^{-1}$ for large $n$ (LO and MACKINLAY, 1988). The statistic $r_q$ may be approximately written as a linear combination of the first $q-1$ sample autocorrelations with arithmetically declining weights, i.e.

$$r_q \simeq \sum_{s=1}^{q-1} 2(q-s) q^{-1} \hat\rho_s.$$

Since $r_q$ has a representation as a linear combination of sample autocorrelations (and not of squared sample autocorrelations!), it is completely insensitive when the absolute values of the sums of positive and negative autocorrelations, respectively, are of approximately the same size (see also the remark at the end of Section 4.4).
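As a rough illustration of the overlapping variance ratio and its normal approximation under the i.i.d. null, the following sketch may help. The function name `variance_ratio` and the simulated input are assumptions made for this example; the formulas follow the definitions given in the text.

```python
import numpy as np

def variance_ratio(x, q):
    """Centered variance ratio r_q from overlapping q-period sums, plus its z-score.

    The z-score uses the asymptotic variance 2(2q-1)(q-1)/(3q) of sqrt(n) * r_q
    that holds under the i.i.d. null hypothesis with finite variance.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    s1 = np.sum((x - xbar) ** 2) / (n - 1)
    # Overlapping sums x_{t+1} + ... + x_{t+q} for t = 0, ..., n - q.
    csum = np.concatenate(([0.0], np.cumsum(x)))
    qsums = csum[q:] - csum[:-q]
    sq = np.sum((qsums - q * xbar) ** 2) / (q * (n - q + 1))
    r_q = sq / s1 - 1.0
    z = np.sqrt(n) * r_q / np.sqrt(2.0 * (2 * q - 1) * (q - 1) / (3.0 * q))
    return r_q, z

# Simulated i.i.d. data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
for q in (2, 4, 8):
    print(q, variance_ratio(x, q))
```

A one-sided test against positive serial correlation would reject for large positive z; the sketch requires $q > 1$, as in the text.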
GRANGER and MORGENSTERN (1963) promoted the idea that spectral (or frequency domain) methods are the most appropriate statistical techniques for the investigation of stock market data. Spectral analysis (in its simplest form) represents a (real) time series as a weighted sum of harmonic components, i.e.

$$x_t = n^{-1/2} \sum_j f_j \exp(it\omega_j), \quad t = 1, \ldots, n,$$

where the Fourier frequencies $\omega_j = 2\pi j/n$ are the integer multiples of the fundamental frequency $2\pi/n$ and the weights $f_j$ are the inner products

$$f_j = \langle x, e_j \rangle = n^{-1/2} \sum_t x_t \exp(-it\omega_j)$$

of the vector $x = (x_1, \ldots, x_n)'$ with the vectors

$$e_j = n^{-1/2} (\exp(i\omega_j), \exp(i2\omega_j), \ldots, \exp(in\omega_j))',$$

which constitute an orthonormal basis for the $n$-dimensional complex space $\mathbb{C}^n$. The sequence of weights $f_j$ is called the discrete Fourier transform of $x$ and the sequence of squared absolute weights $I_j = |f_j|^2$ is called the periodogram of $x$. The periodogram decomposes the sum of squares of $x$ into a sum of components associated with the Fourier frequencies $\omega_j$, i.e. $\sum_t x_t^2 = \sum_j I_j$; hence a plot of the periodogram ordinates $I_j$ against the Fourier frequencies $\omega_j$ shows the relative importance of each harmonic component. Obviously, $I_{-j}$ equals $I_j$ for all $j$.
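The periodogram and its two basic properties (decomposition of the sum of squares, symmetry in $j$) can be checked numerically. The sketch below assumes NumPy's FFT convention, which indexes time from 0 instead of 1; this only changes the phase of the Fourier coefficients, not their squared absolute values. The function name `periodogram` is chosen for this example.

```python
import numpy as np

def periodogram(x):
    """Periodogram ordinates I_j = |f_j|^2 at the Fourier frequencies 2*pi*j/n, j = 0..n-1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # np.fft.fft computes sum_t x_t exp(-2*pi*i*j*t/n); dividing |.|^2 by n
    # corresponds to the n^{-1/2} normalization of the weights f_j.
    return np.abs(np.fft.fft(x)) ** 2 / n

rng = np.random.default_rng(0)
x = rng.standard_normal(256)        # simulated data, for illustration only
I = periodogram(x)

print(np.isclose(I.sum(), np.sum(x ** 2)))   # sum of squares is decomposed
print(np.allclose(I[1:], I[1:][::-1]))       # I_{-j} = I_{n-j} = I_j for real x
```

Because of the symmetry, only the ordinates with $1 \le j \le [(n-1)/2]$ carry distinct information about a mean-corrected real series.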
If the observations $x_1, \ldots, x_n$ are independent and identically normally distributed with mean $\mu_x$ and variance $\sigma_x^2$, then the periodogram ordinates $I_j$, $j = 1, \ldots, m$ (where $m = \left[\frac{n-1}{2}\right]$), are independent and identically exponentially distributed with mean $\sigma_x^2$. Fortunately, this result does not depend critically on the assumption of normality. For example, if the $x_t$ are i.i.d. and have finite variance, then the periodogram ordinates are still approximately exponentially distributed. Consequently, the frequency domain tests of randomness considered below are rather robust to deviations from normality.
A widely used frequency domain test of randomness is due to BARTLETT (1954, 1955). Observing that the normalized cumulative periodogram

$$J_k = \sum_{j=1}^{k} I_j \Big/ \sum_{j=1}^{m} I_j, \quad k = 1, \ldots, m-1,$$

of a Gaussian random sample has the same distribution as an ordered sample from a uniform distribution, Bartlett proposed to apply the Kolmogorov-Smirnov goodness of fit test to the empirical distribution function of the $J_k$. DURBIN (1969) considered a closely related (but possibly more natural) test, which is based on the maximum deviation of $J_k$ from its expected value $k/m$. Under the null hypothesis, the 95th percentile of

$$C = \max_{1 \le k \le m-1} |J_k - k/m|$$

is approximately given by $1.358(m-1)^{-1/2}$, hence the null hypothesis will be accepted if the normalized integrated periodogram always lies between the two parallel lines $a(\omega) = \omega + 1.358(m-1)^{-1/2}$ and $b(\omega) = \omega - 1.358(m-1)^{-1/2}$, which lie above and below the line representing the null hypothesis. Figure 1 shows an actual application. The hypothesis that this particular series of daily stock returns is purely random cannot be rejected at the 5% level of significance.
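A minimal sketch of the cumulative periodogram test, assuming the usual definition $J_k = \sum_{j \le k} I_j / \sum_{j \le m} I_j$ and the approximate 5% critical value $1.358(m-1)^{-1/2}$ quoted in the text. The function name and the test series are hypothetical.

```python
import numpy as np

def cumulative_periodogram_test(x):
    """Return (C, critical value, reject) for the test based on max |J_k - k/m|."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = (n - 1) // 2
    # Periodogram ordinates at the Fourier frequencies j = 1, ..., m.
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / n
    J = np.cumsum(I)[:-1] / I.sum()          # J_k for k = 1, ..., m - 1
    k = np.arange(1, m)
    C = np.max(np.abs(J - k / m))
    crit = 1.358 / np.sqrt(m - 1)            # approximate 95th percentile
    return C, crit, C > crit

# A pure harmonic at the 5th Fourier frequency puts all mass on one ordinate,
# so the normalized cumulative periodogram jumps from 0 to 1 at k = 5.
n = 101
t = np.arange(n)
print(cumulative_periodogram_test(np.cos(2 * np.pi * 5 * t / n)))

# Simulated white noise (illustration only); no outcome is asserted here.
rng = np.random.default_rng(0)
print(cumulative_periodogram_test(rng.standard_normal(n)))
```

The harmonic example is the situation this test handles best: one distinct peak produces a large jump in $J_k$, whereas several small peaks spread the deviation thinly across frequencies.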
This outcome may be due to the fact that the conventional Kolmogorov-Smirnov test has been designed for the detection of one distinct peak in the periodogram, whereas the periodogram of our stock returns at best shows some vague ups and downs. However, there exist alternative tests which are more sensitive to such indistinct deviations from the null hypothesis.

[Figure 1: Normalized integrated periodogram of a series of daily stock returns with the boundaries of the 95% region (see …, 1968, or SCHLITTGEN and STREITBERG, 1984). The null hypothesis of no serial correlation is accepted since the normalized integrated periodogram always lies between the two lines forming the boundaries of the 95% region.]
The tests proposed by RESCHENHOFER (1989) and RESCHENHOFER and BOMZE (1992) are adaptive in the sense that the forms of their test statistics depend on the data. Essentially they are based on the mutual independence of the "half periodograms"

$$A_j^2 = \Big( n^{-1/2} \sum_t x_t \cos(t\omega_j) \Big)^2, \quad j = 1, \ldots, m,$$

and

$$B_j^2 = \Big( n^{-1/2} \sum_t x_t \sin(t\omega_j) \Big)^2, \quad j = 1, \ldots, m.$$

Independently of using information contained in the $A_j^2$ to construct an appropriate test statistic for the $B_j^2$, information contained in the $B_j^2$ is used to construct an appropriate test statistic for the $A_j^2$. Finally, the two tests for the $A_j^2$ and $B_j^2$, respectively, are combined in some suitable way. While the adaptive tests may have more power than Bartlett's test in case of several peaks in the periodogram, they are not competitive in case of a single peak. Fortunately, this is not true for another test proposed by RESCHENHOFER and BOMZE (1991). This test emerges from Bartlett's test simply by replacing the Kolmogorov-Smirnov goodness of fit measure by another one, namely the length of the graph of the distribution function. Under the null hypothesis, the theoretical distribution function is linear on the interval $(0, 1)$, hence the graph has the smallest possible length $\sqrt{2}$. Any deviation from the null hypothesis implies an increase in the length, and it does not make any difference whether this increase is due to one large peak or to several small peaks. The simplest version of this test is based on the length of the linearly interpolated empirical distribution function. Since this simple test takes into account only the size of the periodogram ordinates but not their arrangement, it cannot be efficient. Moreover, the length of the linearly interpolated empirical distribution function asymptotically overestimates the true length. An obvious modification is to replace the periodogram ordinates in this test by smoother quantities obtained by averaging neighbouring ordinates. The results of a simulation study (RESCHENHOFER and BOMZE, 1991) show that this smoothed length test is extremely powerful both in case of a single peak and in case of several peaks. Since the asymptotic distribution of the smoothed length cannot be used for the determination of critical values even for relatively large sample sizes, DITTRICH et al. (1993) made extensive tables of critical values available. KUNST et al. (1991) applied the smoothed length test to ten Austrian stock return series and rejected the null hypothesis in four cases at the 5% level of significance.
Originally, periodogram analysis was regarded as a technique for the detection of exact periodicities. Since exact periodicities correspond to isolated peaks in the periodogram, formal tests were typically based on single ordinates. While SCHUSTER (1898) examined the periodogram at a given frequency, WALKER (1914) used the largest ordinate as test statistic. FISHER (1929) proposed to divide this statistic by the sum of all ordinates in order to remove the dependence on the unknown variance of the data. The relevance of these tests to the analysis of financial data is due to persistent speculation about the presence of seasonal and/or weekly patterns. However, because of substantial evidence (see, e.g., FAMA and FRENCH, 1988, and LO and MACKINLAY, 1988) for the presence of autocorrelation in stock returns, the rejection of the random walk hypothesis with one of these tests is no serious indication of the existence of an exact periodicity. Thus tests are required which are able to distinguish isolated peaks due to exact periodicities from very narrow peaks due to continuous spectral components. An early test is due to WHITTLE (1952). Whittle's test statistic is similar to that of FISHER's (1929) test. The only difference is that Whittle divides all periodogram ordinates by smoothed ordinates, hence the significance of large ordinates, in the neighbourhood of which there are other large ordinates, is reduced, whereas large ordinates which stand out from their neighbourhood are not affected. Unfortunately, Whittle's test depends critically on the method of smoothing. This is also true for most other tests. However, for a simple special case, it is possible to construct a test which avoids this problem. For the time being, suppose that both the period and the phase of the suspected periodicity are known. We model the data as

$$x_t = \mu + y_t + \beta \cos(\omega_0 t + \varphi_0),$$

where $\omega_0$ and $\varphi_0$ are known constants and $\mu$ and $\beta$ are the unknown model parameters. The $y_t$ have mean zero and are possibly autocorrelated, but show no strictly periodic behavior. The null hypothesis states that $\beta$ equals zero, and it is rejected whenever the regressor $\cos(\omega_0 t + \varphi_0)$ yields a much better fit than the regressor $\sin(\omega_0 t + \varphi_0)$. Clearly, the size of the estimated parameters is a suitable measure for the goodness of fit. It is therefore natural to use the ratio of the estimated regression parameters as test statistic. The resulting test asymptotically detects the hidden periodicity with probability one (see RESCHENHOFER, 1995). In practice, the phase $\varphi_0$ of the suspected harmonic component will not be known exactly. However, this test may still be used as long as the suspected phase does not deviate too much from the true phase. If independent replicates are available, the situation improves considerably. Then discrepancies of up to $\pi/4$ are tolerable. For the practical application of this test to a single series of stock returns, it is suggested to divide the series into a number of segments and to select a subsample of segments being sufficiently far apart to be treated as independent. The specification of $\omega_0$ and $\varphi_0$ depends on the alternative hypothesis one has in mind. For example, one may suspect that daily stock returns show a weekly pattern with the minimum occurring on Monday (see, e.g., KRÄMER and RUNDE, 1992). A similar test procedure, which gets along without any prior information about the phase, has been proposed by RESCHENHOFER (1997b).

Doubts about the existence of the second moment
The above-mentioned tests based on autocorrelations and periodogram ordinates, respectively, depend on the existence of the variance, hence a rejection of the null hypothesis need not necessarily be due to a deviation from randomness but may as well indicate the non-existence of the variance. Since the long tails that have been observed in the empirical distributions of stock returns are inconsistent with the assumption of normality, MANDELBROT (1963) proposed to use the other members of the stable family rather than the normal distribution. Note that the normal distribution is the only stable distribution with finite variance. Stable distributions have the convenient property of being type-invariant under addition. For example, if the distribution of daily returns is stable, the distribution of weekly and monthly returns will follow a stable distribution of exactly the same form, except for origin and scale. It can be shown that these distributions are the only possible limiting distributions for sums of i.i.d. random variables (see GNEDENKO and KOLMOGOROV, 1954). Alternatively, the long-tailed empirical distributions of stock returns have been modeled as Student t distributions or mixtures of several normal distributions with approximately the same mean, but substantially different variances (a more recent reference is KON, 1984). While FAMA (1965) presented evidence that daily stock returns might follow stable distributions with infinite variances, subsequent studies reported contrary evidence (OFFICER, 1972, HSU et al., 1974, and BLATTBERG and GONEDES, 1974). If MANDELBROT (1963) was right, methods would be required which behave reasonably even in case of infinite second moments, e.g. model identification with automatic criteria (see, e.g., AMEMIYA, 1980) and R/S (range over standard deviation) analysis.
The R/S statistic $Q_n$ was independently introduced by HURST (1951) and STEIGER (1964). The numerator […]. The suitability of this statistic for testing the randomness of stock returns is questioned by the fact that asymptotically the behavior of $Q_n$ in case of randomness is quite similar to its behavior in case of weak (short-term) autocorrelation. For illustration, assume that the variance exists. Then

$$n^{-1/2} Q_n \xrightarrow{d} B$$

(see …, 1975) for i.i.d. $x_t$, where "$\xrightarrow{d}$" denotes weak convergence and $B$ is a random variable which is distributed as the range of a standard Brownian bridge on the unit interval, and

$$n^{-1/2} Q_n \xrightarrow{d} \Big( \sum_{s \in \mathbb{Z}} \rho_s \Big)^{1/2} B$$

(see …, 1976) for weakly autocorrelated $x_t$ with summable autocorrelations $\rho_s$. Hence the test of randomness based on $Q_n$ breaks down whenever $\sum_{j \neq 0} \rho_j$ is close to zero, regardless of whether there are some large $\rho_j$ or not. On the other hand, this test can be expected to be quite powerful if either $\sum_{s \in \mathbb{Z}} \rho_s = 0$ or $\sum_{s \in \mathbb{Z}} \rho_s = \infty$ or, equivalently, if the (normalized) spectral density at the origin is either $0$ or $\infty$, which is the case for strongly (long-term) autocorrelated $x_t$.
The parameter $d$ is used as a measure of the degree of strong autocorrelation. It can be estimated by a technique which is based on the fact that $Q_n n^{-d-1/2}$ has a non-degenerate limit (for the details of this technique see MANDELBROT and WALLIS, 1969, and WALLIS and MATALAS, 1970, and for the statistical foundations see MANDELBROT, 1975, TAQQU, 1975, 1977, and MANDELBROT and TAQQU, 1979). In essence, the R/S technique consists of evaluating the R/S statistic for various subsamples, computing average values for each subsample size, regressing the log average values on the log subsample sizes, and finally estimating the parameter $d$ by the slope of the regression line minus $1/2$. HAUSER and RESCHENHOFER (1995) performed extensive computer experiments and concluded that this procedure does not allow reliable estimation of the parameter $d$. GREENE and FIELITZ (1977) applied the R/S technique to stock returns and claimed to have found strong autocorrelation. LO (1991) refined the R/S statistic to make it more robust to weak autocorrelation. He found no evidence of strong autocorrelation in stock returns once the effects of weak autocorrelation have been taken into account.
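The regression step of the R/S technique can be sketched as follows. The sketch assumes the common definition of the R/S statistic (range of the cumulated deviations from the sample mean, divided by the sample standard deviation) and uses non-overlapping subsamples; both choices, the function names, and the subsample sizes are assumptions made for illustration and need not match the cited implementations.

```python
import numpy as np

def rs_statistic(x):
    """Range of cumulated mean deviations divided by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    dev = np.cumsum(x - x.mean())
    return (dev.max() - dev.min()) / x.std()

def rs_estimate_d(x, sizes):
    """Estimate d as the slope of log(average R/S) on log(subsample size), minus 1/2."""
    x = np.asarray(x, dtype=float)
    log_size, log_rs = [], []
    for size in sizes:
        # Non-overlapping subsamples of the given size (an assumption of this sketch).
        blocks = [x[i:i + size] for i in range(0, x.size - size + 1, size)]
        mean_rs = np.mean([rs_statistic(b) for b in blocks])
        log_size.append(np.log(size))
        log_rs.append(np.log(mean_rs))
    slope = np.polyfit(log_size, log_rs, 1)[0]
    return slope - 0.5

# Simulated i.i.d. data (hypothetical); the true value of d is 0 here.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
print(rs_estimate_d(x, [16, 32, 64, 128, 256, 512]))
```

Even for purely random data the estimate tends to deviate noticeably from zero in finite samples, which is in line with the reservations about the reliability of this procedure mentioned above.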
A simple parametric method of examining the randomness of stock returns is to fit autoregressive models

$$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p} + \epsilon_t,$$

where $0 \le p \le P$ and $(\epsilon_t)$ is a purely random process, to the (mean-corrected) returns $x_1, \ldots, x_n$ and to choose the best fitting model according to some model selection criterion. The hypothesis of randomness is accepted whenever the trivial model with $p = 0$ is selected. The most famous model selection criteria are the Akaike information criterion (AIC, AKAIKE, 1973)

$$\mathrm{AIC}(p) = n \log \hat\sigma^2(p) + 2(p+1)$$

and the Schwarz Bayesian information criterion (Schwarz-BIC, SCHWARZ, 1978)

$$\text{Schwarz-BIC}(p) = n \log \hat\sigma^2(p) + (p+1) \log(n),$$

both of which have to be minimized with respect to $p$. Here $\hat\sigma^2(p)$ denotes the ML estimate of the residual variance of the model of dimension $p$. KUNST et al. (1991) selected the trivial model for only four of ten stock return series with the AIC and for nine of ten series with the Schwarz-BIC. As shown, e.g., by BHANSALI (1988), model selection criteria may remain useful in case of infinite variance. However, there are serious shortcomings in the derivations of these criteria. CHOW (1981) and RESCHENHOFER (1996b) pointed out that the Schwarz-BIC is only a poor approximation to the principle of selecting the model with the highest posterior probability of being correct, and RESCHENHOFER (1996a) showed that a related criterion, called Akaike-BIC, proposed by AKAIKE (1977), breaks down in the special case of purely random data, which is of special importance in financial applications. On the other hand, the AIC, which has been designed as an estimator of the expected Kullback-Leibler discrepancy, is severely biased in case of misspecified models. SAWA (1978) and RESCHENHOFER (1994) evaluated the bias for the general case to order $O(n^0)$ and $O(n^{-1})$, respectively. Unfortunately, since the obtained bias terms depend on an unknown parameter which has to be estimated, the consideration of these terms need not necessarily lead to superior criteria.
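Order selection with the two criteria can be illustrated as below. The sketch assumes autoregressive candidate models fitted by least squares on a common effective sample (conditioning on the first $P$ observations), which only approximates exact ML estimation; the function name and the simulated data are hypothetical.

```python
import numpy as np

def select_ar_order(x, max_order):
    """Return the orders (p_AIC, p_BIC) minimizing AIC and Schwarz-BIC over p = 0..max_order."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # mean-corrected returns
    P = max_order
    y = x[P:]                              # common effective sample for all orders
    n_eff = y.size
    aic, bic = [], []
    for p in range(P + 1):
        if p == 0:
            resid = y                      # trivial model: purely random series
        else:
            # Column k holds the lag-k values aligned with y.
            X = np.column_stack([x[P - k:x.size - k] for k in range(1, p + 1)])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ coef
        sigma2 = np.mean(resid ** 2)       # residual variance estimate
        aic.append(n_eff * np.log(sigma2) + 2 * (p + 1))
        bic.append(n_eff * np.log(sigma2) + (p + 1) * np.log(n_eff))
    return int(np.argmin(aic)), int(np.argmin(bic))

# Simulated white noise (illustration only): randomness is "accepted" by a
# criterion whenever it selects p = 0.
rng = np.random.default_rng(0)
print(select_ar_order(rng.standard_normal(1000), 5))
```

Since $\log(n) > 2$ for $n \ge 8$, the Schwarz-BIC penalizes additional parameters more heavily than the AIC and therefore never selects a larger order on the same data.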

Conditional heteroscedasticity
Since the introduction of conditional heteroscedastic processes by ENGLE (1982), interest has shifted from the modeling of the unconditional distribution of stock returns towards the modeling of the conditional distribution, hence the question whether the unconditional variance of stock returns is finite or infinite is no longer of outstanding importance. Another consequence is that the classical random walk model, which implies independence of stock returns, is obsolete and any test for autocorrelation has to allow for conditional heteroscedasticity.
Using a test which is based on a heteroscedasticity-robust estimator for the variance of a sample (auto-)correlation coefficient (see EICKER, 1963, WHITE, 1980, WHITE and DOMOWITZ, 1984), LO and MACKINLAY (1988) found significant positive serial correlation for weekly and monthly returns. Unfortunately, their null hypothesis implies the existence of higher moments, and hence it is inconsistent with the size of the parameter estimates obtained by the fitting of autoregressive conditional heteroscedastic (ARCH) processes (see ENGLE, 1982) and generalized ARCH (GARCH) models (see BOLLERSLEV, 1986) to various series of stock returns (CHOU, 1988, RÜNSTLER, 1992) and exchange rates (BÄRLOCHER, 1990, HAUSER et al., 1994). TAYLOR (1984) arrived at the same heteroscedasticity-robust estimator for the variance of a sample autocorrelation coefficient but under the sole assumption that the multivariate distribution of the sequence of returns is symmetric. However, although TAYLOR (1984) claims that returns have approximately symmetric distributions, this assumption is also too strong. For example, KON (1984) found significant skewness in the distribution of daily returns for 29 out of a sample of 33 common stocks and indexes. A secondary argument for the presence of skewness in daily data is that stock returns typically have a positive mean, which would be inconsistent with the frequent occurrence of zero changes in daily data if the returns were symmetric around the mean. One reason for excess zeros is that missing values caused by bank holidays are usually replaced by the quotations of the last trading day in order to preserve the weekly pattern. Another reason is the low trading activity on small stock markets. Of course, the latter does not apply to indices.
Conditional heteroscedasticity and skewness may cause effects which appear to be paradoxical at first sight. For example, imagine a situation where positive returns are mostly of medium size and negative returns are typically of very small or of very large size. In addition, assume that successive returns tend to be of approximately the same size. Then one would possibly conclude that the returns are conditionally heteroscedastic and that the size of the returns can easily be predicted. But because of the extreme skewness, predictability of the absolute values of the returns is almost equivalent to predictability of the returns themselves. On the other hand, predictability of the returns implies that the conditional second moments around the mean of the unconditional distribution are no longer identical with the conditional variances. Hence it may well be that in reality the conditional variances are constant over time. Thus we should take warning from this example not to rely blindly on some test of conditional heteroscedasticity when the assumption of symmetry is violated. In the next section we first review the heteroscedasticity-robust tests proposed by TAYLOR (1984) and LO and MACKINLAY (1988), respectively. Then we propose a new test procedure which has been designed for situations where both conditional heteroscedasticity and skewness are present.

Robust tests based on the assumption of symmetry
TAYLOR (1984) derived a heteroscedasticity-robust estimator for the variance of $\hat\rho_s$ under the sole assumption, $H_S$, that the multivariate density $f$ of $x_1, \ldots, x_n$ is symmetric around 0. Note that when this hypothesis is true, the bivariate density of $x_s$ and $x_t$, denoted $f_{st}$, satisfies

$$f_{st}(x_s, x_t) = f_{st}(x_s, -x_t) = f_{st}(-x_s, x_t) = f_{st}(-x_s, -x_t).$$

This implies that $x_s$ and $x_t$ are uncorrelated whenever $s \neq t$. Under $H_S$, every sequence $(y_t)_{t=1,\ldots,n}$ for which each $|y_t| = |x_t|$ has the same likelihood as the observed sequence $(x_t)$. These $2^n$ equiprobable realizations $(y_t^{(i)})$, $i = 1, \ldots, 2^n$, provide a discrete conditional distribution of $x_1, \ldots, x_n$ given $|x_t|$, $t = 1, \ldots, n$.
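The conditional distribution over the $2^n$ sign patterns suggests a randomization test, which can be approximated by drawing sign patterns at random. The sketch below is only an illustration of that idea under the symmetry hypothesis; the function name, the choice of statistic (a lag-$s$ autocorrelation without mean correction), the number of draws, and the example series are all assumptions of this example.

```python
import numpy as np

def sign_randomization_pvalue(x, lag=1, n_draws=1000, seed=0):
    """One-sided Monte Carlo p-value for positive autocorrelation at the given lag.

    Under symmetry around 0, all sign patterns with the observed absolute
    values are equally likely, so the statistic is recomputed on series with
    independently redrawn signs.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    def stat(z):
        # The denominator is invariant under sign changes.
        return np.dot(z[:-lag], z[lag:]) / np.dot(z, z)

    observed = stat(x)
    signs = rng.choice([-1.0, 1.0], size=(n_draws, x.size))
    draws = np.array([stat(s * np.abs(x)) for s in signs])
    p_value = (1 + np.sum(draws >= observed)) / (n_draws + 1)
    return observed, p_value

# Artificial series with runs of equal signs (hypothetical data).
x = np.tile([1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0], 25)
print(sign_randomization_pvalue(x))
```

Because the absolute values are held fixed, any volatility clustering in the data is preserved in every synthetic series, which is what makes the comparison robust to heteroscedasticity.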

Robust tests based on certain mixing and moment conditions
Using EICKER's (1963) approach (see also WHITE, 1980, and WHITE and DOMOWITZ, 1984), one arrives at the same heteroscedasticity-robust estimator of the variance of $\hat\rho_s$ as TAYLOR (1984) but under different conditions. This estimator may be used to robustify any conventional test for serial correlation which is based on sample autocorrelations. For example, LO and MACKINLAY (1988) used this estimator to robustify the variance ratio test based on the statistic $r_q$ (see Section 2). Using the fact that $r_q$ may be approximately written as

$$r_q \simeq \sum_{s=1}^{q-1} 2(q-s) q^{-1} \hat\rho_s$$

and assuming that the sample autocorrelations are asymptotically uncorrelated, one obtains that $r_q$ is approximately normally distributed with mean zero and variance

$$\sum_{s=1}^{q-1} \big( 2(q-s)/q \big)^2 v(\hat\rho_s)$$

for large $n$. The mixing and moment conditions (LO and MACKINLAY, 1988, p. 49) required for a formal proof of this result allow for a variety of forms of heteroscedasticity including many (G)ARCH processes. In any case, the existence of higher moments (including fourth moments!) is required.
ARCH processes have been introduced by ENGLE (1982) in order to generalize the implausible assumption of a constant one-period forecast variance. BOLLERSLEV (1986) extended the ARCH process to the GARCH process, which is characterized by

$$h_t = \mathrm{var}(x_t \mid x_{t-1}, x_{t-2}, \ldots) = \alpha_0 + \sum_{i=1}^{q} \alpha_i x_{t-i}^2 + \sum_{i=1}^{p} \beta_i h_{t-i}, \qquad x_t/\sqrt{h_t} \ \text{i.i.d.} \ N(0, 1).$$

For $p = 0$, this GARCH($p$, $q$) process reduces to the ARCH($q$) process. A necessary and sufficient condition for wide-sense stationarity is

$$\sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i < 1.$$

For the widely used GARCH(1, 1) model the condition for the existence of the fourth-order moment is

$$3\alpha_1^2 + 2\alpha_1\beta_1 + \beta_1^2 < 1.$$

The actual parameters of the processes estimated for a number of financial series do not imply finite fourth moments (see CHOU, 1988, RÜNSTLER, 1992, BÄRLOCHER, 1992, HAUSER et al., 1994). Being aware of the fact that these results may rely to some extent on the chosen values of $p$ and $q$, we do not claim that fourth moments are definitely not finite for stock return series. However, empirical evidence prevents us from taking the existence of these moments for granted.
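To see how easily a stationary GARCH(1, 1) process can fail to have a finite fourth moment, one can evaluate the standard conditions of BOLLERSLEV (1986) for conditionally normal innovations, namely $\alpha_1 + \beta_1 < 1$ for wide-sense stationarity and $3\alpha_1^2 + 2\alpha_1\beta_1 + \beta_1^2 < 1$ for the fourth moment. The parameter values below are hypothetical and are not estimates taken from any of the cited studies.

```python
def garch11_moment_conditions(alpha1, beta1):
    """Return (wide-sense stationary, finite fourth moment) for a Gaussian GARCH(1, 1)."""
    stationary = alpha1 + beta1 < 1.0
    finite_fourth = 3.0 * alpha1 ** 2 + 2.0 * alpha1 * beta1 + beta1 ** 2 < 1.0
    return stationary, finite_fourth

# Hypothetical parameter pairs (alpha_1, beta_1), chosen for illustration.
for alpha1, beta1 in [(0.05, 0.90), (0.30, 0.65), (0.20, 0.85)]:
    print(alpha1, beta1, garch11_moment_conditions(alpha1, beta1))
```

The second pair is stationary ($0.95 < 1$) but violates the fourth-moment condition ($0.27 + 0.39 + 0.4225 = 1.0825$), which is the kind of configuration that makes tests requiring finite fourth moments questionable.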

Allowing for infinite fourth moments and skewness
A natural way to make allowance for an asymmetric heteroscedastic null hypothesis is to adapt Taylor's approach (see 4.1) to the case of skewness by considering only sequences $(y_t)$ which have the same unconditional sample distribution as the original (mean-corrected) sequence $(x_t)$. However, because of the continuity of the data, any change of the sign of a nonzero observed entry $x_t$ will lead to a situation where $x_t$ does not occur in the synthetic series and, vice versa, $-x_t$ does not occur in the observed sequence. Therefore the only feasible synthetic series would be the trivial one, which is characterized by no change at all. As a consequence, prior to generating the synthetic sequences $(y_t)$, one has to transform the data in such a way that they become "more discrete".
In our approach, this is achieved by specifying threshold values $(0 =)\ v_0 < v_1 < \ldots < v_k\ (\geq \max |x_t|)$ and replacing each $x_t \neq 0$ by the product of its sign and the mean over all $|x_s|$ contained in the same segment $(v_{j-1}, v_j]$ as $|x_t|$; $x_t$ remains unchanged if $x_t = 0$. The resulting discretized version $(\tilde x_t)$ of the original sequence $(x_t)$ will be compared with synthetic sequences $(\tilde y_t)$, which differ from $(\tilde x_t)$ only in the signs, i.e. $|\tilde y_t| = |\tilde x_t|$ for all $t$. The signs of the $\tilde y_t$ are chosen by some random mechanism which guarantees that for each $j$ the total number of positive $\tilde y_t$ with $|\tilde y_t| \in (v_{j-1}, v_j]$ equals the total number of positive $\tilde x_t$ with $|\tilde x_t| \in (v_{j-1}, v_j]$. It is important to note that this method does not eliminate the asymmetry and also that the resulting sequences $(\tilde y_t)$ may be serially correlated. However, one may still examine whether the synthetic sequences are significantly less correlated than the original discretized sequence. Although this ad hoc procedure should certainly be refined in the future, we think that it is already useful in its present form, particularly in view of the restricted applicability of the conventional procedures. In any case, further developments require a sound understanding of the relationship between conditional heteroscedasticity, skewness, serial correlation and predictability.
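The discretization and the constrained sign resampling can be sketched as follows. The text only requires "some random mechanism" that preserves the number of positive entries per segment; permuting the signs within each segment is one such mechanism and is an assumption of this sketch, as are the function names and the small example series.

```python
import numpy as np

def discretize(x, thresholds):
    """Replace each nonzero x_t by sign(x_t) times the mean of |x_s| in its segment.

    thresholds = [v_0 = 0, v_1, ..., v_k] with v_k >= max |x_t|; segment j is
    the half-open interval (v_{j-1}, v_j]. Zeros are left unchanged.
    """
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    # side="left" yields index j with v_{j-1} < |x_t| <= v_j (and 0 for zeros).
    seg = np.searchsorted(thresholds, a, side="left")
    out = np.zeros_like(x)
    for j in np.unique(seg[a > 0]):
        mask = (seg == j) & (a > 0)
        out[mask] = np.sign(x[mask]) * a[mask].mean()
    return out, seg

def resample_signs(x_disc, seg, rng):
    """Permute the signs within each segment (assumed mechanism, see lead-in)."""
    y = x_disc.copy()
    for j in np.unique(seg[x_disc != 0]):
        idx = np.flatnonzero((seg == j) & (x_disc != 0))
        y[idx] = np.abs(x_disc[idx]) * rng.permutation(np.sign(x_disc[idx]))
    return y

# Small hypothetical example with two segments, (0, 0.5] and (0.5, 1.0].
x = np.array([0.1, -0.2, 0.0, 0.35, -0.9, 0.8, 0.15])
x_disc, seg = discretize(x, [0.0, 0.5, 1.0])
print(x_disc)

rng = np.random.default_rng(0)
print(resample_signs(x_disc, seg, rng))
```

Each synthetic series has exactly the same sample distribution as the discretized series, so any skewness is retained while the temporal arrangement of the signs is randomized; a test statistic such as the variance ratio would then be compared between the discretized original and many such synthetic series.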

Results of an empirical study
The following empirical study serves to illustrate the application of our resampling procedure. We used data from the Vienna Stock Exchange, which is characterized by a modest number of participants and low trading activity. Daily price changes of seventeen Austrian stocks as well as of the GZ index are examined. The stocks are: Erste Allgemeine Stamm (insurance), CA Vorzug, Länderbank Vorzug (both banks), Gösser, Österreichische Brau, Reininghaus (all breweries), Semperit (rubber), Wienerberger (building materials), Constantia (paper etc.), Jungbunzlauer (citric acid), Lenzing (viscose), Leykam (paper), Steyr (vehicles), Veitscher (magnesite), Leipnik Stamm (industry), and A. Bau Porr Stamm (construction). The total number $n$ of price changes is 1231 (from January 1986 to December 1990) for the GZ index and three stocks (Gösser, Leipnik Stamm, A. Bau Porr Stamm) and 1098 (from February 1986 to August 1990) for all other stocks. The sole criterion for the selection of these particular series was the availability of the data.
For five of these series a simple binomial test, which is based on a comparison of the total number of positive returns and the total number of negative returns, rejects the hypothesis of symmetry at the 1% level of significance. If this test is applied to the mean-corrected returns, the hypothesis of symmetry is rejected in all cases but one. Not surprisingly, the only exception is the GZ index.
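A binomial test of this kind can be written down directly. The sketch below computes an exact two-sided p-value under the hypothesis that positive and negative returns are equally likely; dropping zero returns before counting is an assumption of this example, and the function name and the data are hypothetical.

```python
from math import comb

def sign_test_pvalue(x):
    """Exact two-sided binomial p-value for equal numbers of positive and negative values."""
    pos = sum(1 for v in x if v > 0)
    neg = sum(1 for v in x if v < 0)
    n = pos + neg            # zeros carry no sign information and are dropped (assumption)
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical example: nine positive returns and one negative return.
print(sign_test_pvalue([1] * 9 + [-1]))
```

For mean-corrected returns the same function applies after subtracting the sample mean, in which case exact zeros practically no longer occur.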
To assess the quality of the asymptotic approximation of the distribution of the variance ratio statistic $r_q$ in the case of symmetric heteroscedastic null hypotheses (see 4.2), 1000 synthetic series are generated in each case by changing the sign of each observed mean-corrected return at random with probability 1/2. Table 1 gives the empirical sizes of nominal 5 percent one-sided variance ratio tests $r_q$ of the hypothesis of a random walk with possibly heteroscedastic increments. Each row corresponds to a specific nonparametric form of heteroscedasticity. As the empirical sizes are close to 5%, the use of the asymptotic distribution of the variance ratio is justified for our sample sizes and our specific forms of (symmetric) heteroscedasticity. Indeed, for our actual series the use of the asymptotic distribution leads to almost the same results as the exact randomization test (see Table 2). However, a rejection of the null hypothesis only because of asymmetry or an infinite fourth-order moment would not be of much interest. Therefore we also employ the procedure proposed in 4.3.
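The sign-flipping randomization can be sketched as below. For brevity the sketch uses the first sample autocorrelation as a stand-in test statistic rather than the variance ratio defined in Section 2; the function names and the "+1" convention in the p-value are assumptions made for illustration only.

```python
import random


def lag1_autocorr(x):
    """First sample autocorrelation of x (stand-in test statistic)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    c1 = sum(d[t] * d[t + 1] for t in range(n - 1))
    return c1 / c0


def sign_flip_pvalue(x, statistic, n_rep=1000, seed=0):
    """One-sided randomization p-value under a symmetric null hypothesis.

    Each mean-corrected return keeps or changes its sign with
    probability 1/2, independently of all other returns.
    """
    rng = random.Random(seed)
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    observed = statistic(centered)
    exceed = 0
    for _ in range(n_rep):
        synthetic = [v if rng.random() < 0.5 else -v for v in centered]
        if statistic(synthetic) >= observed:
            exceed += 1
    # Counting the observed series itself avoids a p-value of zero.
    return (exceed + 1) / (n_rep + 1)
```

Since the absolute values are left untouched, any pattern of volatility clustering in the observed series is carried over to every synthetic series; only the signs are randomized.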
First the interval $(0, \max(-\min(x_t), \max(x_t))]$ is divided into 54 segments of equal size for the series of length 1098 and into 61 segments for those of length 1231. Then each $x_t \neq 0$ is replaced by the product of its sign and the mean over all $|x_s|$ contained in the same segment as $|x_t|$. Finally, 1000 synthetic series are generated along the lines given in 4.3 and the variance ratio $r_q$ (or an alternative test statistic, see below) is evaluated for both the "original" series and the synthetic series. The null hypothesis, that there are no dependencies left once the effects of conditional heteroscedasticity and skewness have been taken into account, is rejected whenever the value of $r_q$ for the original series is large compared with the values of $r_q$ for the synthetic series. The results obtained (see Table 3) are in line with the results obtained before.
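The choice of equal-width segments and the final comparison can be illustrated by the following sketch. The names `equal_width_thresholds` and `upper_tail_fraction` are hypothetical, and measuring "large compared with" by the fraction of synthetic values at least as large as the original one is an assumption about how the comparison might be quantified.

```python
def equal_width_thresholds(x, k):
    """Thresholds v_0 = 0 < v_1 < ... < v_k that split the interval
    (0, max(-min(x), max(x))] into k segments of equal size."""
    upper = max(-min(x), max(x))
    thresholds = [upper * j / k for j in range(k + 1)]
    thresholds[-1] = upper  # guard against floating-point rounding
    return thresholds


def upper_tail_fraction(original_value, synthetic_values):
    """Fraction of synthetic statistics that are at least as large as the
    statistic of the original (discretized) series. A small fraction means
    the original value is large relative to the synthetic ones."""
    hits = sum(1 for v in synthetic_values if v >= original_value)
    return hits / len(synthetic_values)
```

With `k = 54` for the series of length 1098 and `k = 61` for those of length 1231, each segment would contain about twenty observations on average if the absolute returns were spread evenly, which they typically are not.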
So far, we have used the variance ratio $r_q$ to illustrate our robustification method, but it may as well be applied to other test statistics. Remember that the test statistic $r_q$ can approximately be written as $r_q \approx \sum_{s=1}^{q-1} 2(q-s)q^{-1}\hat{\rho}_s$ (see Section 2) and hence it may be interpreted as a nonparametric estimator for the (normalized) spectral density at frequency 0. But in the absence of any prior information, it is hard to see the sense of testing the constancy of the spectral density by examining it only at a single frequency. An alternative test is the (modified) Box-Pierce test (see Section 2), which is sensitive to a broad class of deviations from uncorrelatedness. However, for our data the results obtained with the robust version of the Box-Pierce test are very similar to those obtained with the robust version of the variance ratio test. One explanation of this finding is that rejections typically originate in a significant first sample autocorrelation. As this sample autocorrelation is usually positive, the use of the one-sided (!) variance ratio even yields slightly better results than the use of the Box-Pierce statistic, which takes into account only the size of this coefficient but not its sign. Of course, it is irrelevant to this argument whether the robust versions of the two tests are considered or the original ones.
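The contrast between the two statistics can be made concrete with a short sketch. It assumes that the weighted sum of sample autocorrelations given above is used for the variance ratio and that the "modified" Box-Pierce statistic is the usual Ljung-Box form; the exact definitions are those of Section 2, and the function names are hypothetical.

```python
def sample_autocorrs(x, max_lag):
    """Sample autocorrelations at lags 1, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + s] for t in range(n - s)) / c0
            for s in range(1, max_lag + 1)]


def variance_ratio_approx(x, q):
    """Weighted sum of the first q-1 autocorrelations; keeps their signs."""
    rho = sample_autocorrs(x, q - 1)
    return sum(2.0 * (q - s) / q * rho[s - 1] for s in range(1, q))


def modified_box_pierce(x, m):
    """Ljung-Box form (assumption): n(n+2) * sum of rho_s^2 / (n-s).
    Squaring discards the signs of the autocorrelations."""
    n = len(x)
    rho = sample_autocorrs(x, m)
    return n * (n + 2) * sum(rho[s - 1] ** 2 / (n - s)
                             for s in range(1, m + 1))
```

For $q = 2$ the weighted sum reduces to the first sample autocorrelation, so a one-sided test based on it rejects only for positive first-order correlation, whereas the squared terms in the Box-Pierce statistic react equally to positive and negative values.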
9 Unfortunately, there is a common tendency to explain contradictory findings solely by the use of different statistical methods and to ignore the fact that different sets of data have been analyzed. Moreover, LO (1991) claimed that the procedure used by GREENE and FIELITZ (1977) can be significantly impaired by weak autocorrelation. He substantiated his criticism by referring to DAVIES and HARTE (1987), who showed that a particular version of the R/S analysis, which is based on subsample sizes down to three, is highly sensitive to short-term effects. In contrast, GREENE and FIELITZ employed the R/S analysis with a minimum subsample size of fifty so that transient effects become negligible.
10 Alternatively, the more general classes of autoregressive moving average (ARMA) models or fractionally integrated ARMA (ARFIMA) models could be used. For definitions and properties of these models see, e.g., BROCKWELL and DAVIS (1987). For ARMA models see also PRIESTLEY (1981). A more demanding reference is HANNAN and DEISTLER (1988).
11 The latter result contains SUGIURA'S (1978) corrected AIC as a special case.

Figure 1: Application of the Durbin test to a series of daily stock returns (IBM, see JENKINS and WATTS, 1968, or SCHLITTGEN and STREITBERG, 1984). The null hypothesis of no serial correlation is not rejected, since the normalized integrated periodogram always lies between the two lines forming the boundaries of the 95% region.
[...] may be regarded as the time domain counterpart of the Kolmogorov-Smirnov statistic for the normalized cumulative periodogram. It compares the time series plot of the sequence of partial sums $S_k = x_1 + \dots + x_k$, $k = 0, \dots, n$, with the straight line through the points $(0, 0)$ and $(n, S_n)$. The denominator statistic $S = (\hat{c}_0)$ [...] the comparison of different time series. The usefulness of the statistic [...] and may serve as estimators for the variances of the unconditional sample autocorrelations. TAYLOR (1984) calculated estimates for 17 financial time series. The median of these estimates is about $2.5/n$, hence standard autocorrelation tests, which are based on an assumed variance $1/n$, are unreliable.

Table 1: Empirical sizes of nominal 5 percent one-sided variance ratio tests $r_q$. For each sequence of observed returns 1000 synthetic sequences are generated by randomly changing the sign of an observed return with probability 1/2.