Long-range correlations in an online betting exchange for a football tournament

We analyze the changes in the market odds of football matches in an online betting exchange, Betfair.com. We identify the statistical differences between the returns that occur when the game play is under way, which we argue are driven by match events, and the returns that occur during half-time, which we ascribe to a trader-driven noise. Furthermore, using detrended fluctuation analysis we identify anti-persistence (Hurst exponent H<0.5) in odds returns and long memory (H>0.5) in the volatilities, which we attribute to the trader-driven noise. The time series of trading volume are found to be short-memory processes.


Introduction
Traditionally, if one wanted to make a bet on a certain outcome (such as the outcome of a football game), the bet would be placed with a bookmaker or with a parimutuel betting market. The bookmaker is an independent agent who sets the odds that he/she considers appropriate for the expected probability of the outcome, and the customer can place bets at those odds. Bookmakers can expect to make a profit in the long run by offering odds whose implied probabilities slightly overestimate the true probability of each outcome. A parimutuel betting market is a system in which all bets on the outcome of a particular event are placed together in a pool, and the payoff odds are calculated by sharing the pool among all winning bets.
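The parimutuel rule can be illustrated with a minimal Python sketch. The function name and the `takeout` parameter (a fractional operator commission, zero by default) are hypothetical additions for illustration; real pools may apply other deductions and rounding.

```python
def parimutuel_return_per_unit(pools: dict[str, float], winner: str,
                               takeout: float = 0.0) -> float:
    """Gross return (stake included) per 1 unit staked on `winner`.

    The whole pool, less a hypothetical fractional `takeout`, is shared
    among the winning bets in proportion to their stakes.
    """
    if not 0.0 <= takeout < 1.0:
        raise ValueError("takeout must lie in [0, 1)")
    total_pool = sum(pools.values())
    winning_pool = pools[winner]
    if winning_pool <= 0.0:
        raise ValueError("no money was staked on the winning outcome")
    return (1.0 - takeout) * total_pool / winning_pool


# Hypothetical pools for the three outcomes of a match.
pools = {"home": 600.0, "draw": 250.0, "away": 150.0}
print(parimutuel_return_per_unit(pools, "home"))  # 1000 / 600, about 1.667
print(parimutuel_return_per_unit(pools, "away"))  # 1000 / 150, about 6.667
```

Unlike fixed bookmaker odds, the payoff is only settled once the pool closes, since every additional bet changes the ratio of the total pool to the winning pool.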
A betting exchange, like Betfair, is different from both of these systems and operates in a manner more akin to a stock exchange. Here, there is no single agent acting as a bookmaker; instead, agents can play the roles of both customer and bookmaker by choosing either to place a bet on an outcome ('back') or to offer a bet ('lay') that can be backed by another agent. An agent who bets $1 on an outcome (a 'back' bet) at 3:1 odds can expect to get $4 back if they win ($3 profit), but loses their $1 stake in the event of a loss. The converse of backing is laying a bet. In this case, the agent plays the role of the bookmaker by taking up the bet of another agent. In the same example, the layer must pay out $4 to the successful bettor in the event of a winning outcome, but retains the $1 stake in the event of a loss. Laying is thus equivalent to betting against an outcome.
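The payoffs in this example can be checked with a short Python sketch. It assumes fractional odds (3.0 meaning 3:1), reports net profit and loss rather than gross payout, and ignores any commission the exchange may charge; the function names are hypothetical. The layer's net loss on a winning back bet is $3, because $1 of the $4 paid out is the backer's own stake.

```python
def back_pnl(stake: float, fractional_odds: float, outcome_occurs: bool) -> float:
    """Net profit of a backer at fractional odds (3.0 means 3:1)."""
    return stake * fractional_odds if outcome_occurs else -stake


def lay_pnl(stake: float, fractional_odds: float, outcome_occurs: bool) -> float:
    """Net profit of the layer who matches that back bet: the mirror image."""
    return -back_pnl(stake, fractional_odds, outcome_occurs)


print(back_pnl(1.0, 3.0, True), back_pnl(1.0, 3.0, False))  # 3.0 -1.0
print(lay_pnl(1.0, 3.0, True), lay_pnl(1.0, 3.0, False))    # -3.0 1.0
```

The two positions always sum to zero, which is why laying is equivalent to betting against the outcome.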
The match-making is managed with a double auction order book as with a regular financial market, except that the bid and ask columns are replaced by back and lay. It is important to note, however, that unlike a stock market, a betting market has a definite conclusion. The odds will inevitably move towards infinity or zero as the outcome becomes a certainty.
The data used in our study are extracted from the online betting website, Betfair.com, based in London, United Kingdom. Since Betfair was launched in June 2000, it has become the largest online betting company in the UK and the largest betting exchange in the world. The website claims over two million clients from all over the world.
Betfair acts as a kind of prediction market [1,2]. Prediction markets are speculative markets where assets traded have a value that is tied to a particular event (e.g. will the next US president be a Republican?). Previous studies of prediction markets have shown that the market price of a contract can provide a good reflection of the true odds of that event taking place, or at least the mean market belief [1], [3]-[7]. One of the oldest and most famous prediction markets, the Iowa Political Stock Market, has been shown to beat opinion polls at forecasting the outcomes of presidential elections [7]; however, differing viewpoints [8] do exist in the literature. Furthermore, there is evidence to suggest that parimutuel betting markets have the capacity to aggregate widely dispersed information held by the agents [9], and this may also be the case with the Betfair betting exchange.
Some Betfair traders may be speculative traders, hoping to back high and lay low, while others may simply bet on their favorite team regardless of the quoted odds. Odds fluctuations may or may not reflect changes in the probability of the event taking place.
A Betfair market is a complex system with many heterogeneous interacting agents, which in many ways operates in a manner akin to a regular stock exchange. We may therefore expect to see similar behavior in the dynamics of its market observables. To this end, we aimed to investigate the presence of long-range temporal correlations in Betfair, a phenomenon that has been observed in financial markets and that has also recently been observed in a political prediction market [10]. What we learn about the dynamics of a betting market may provide insight into the dynamics of a financial market and vice versa.

Long-range correlations in financial markets
To investigate the independence of changes of the price, S(t), of a financial asset in time, one typically calculates the autocorrelation function, γ(τ), for the log-returns, R(t), which, if assumed weakly stationary with zero mean, is given by

$$\gamma(\tau) = \frac{\langle R(t)\,R(t+\tau) \rangle}{\sigma^2}, \tag{1}$$

where

$$R(t) = \ln S(t+\Delta t) - \ln S(t), \tag{2}$$

and the variance, σ², is given by

$$\sigma^2 = \langle R(t)^2 \rangle. \tag{3}$$

Here, the logarithm of the price is used to account for the changing scale of prices [11]. This autocorrelation function for financial returns is usually characterized by an exponential function with a decay time of the order of minutes [12]. A slowly decaying autocorrelation function would allow traders to predict the direction of future price movements with confidence and to buy and sell within the time window of this correlation time to ensure profits. But the rapidly decaying autocorrelation observed would seem to confirm the efficient market hypothesis, which states that a market is 'efficient', meaning that all available information about an asset is instantly processed when it reaches the market and is immediately reflected in the price.
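A minimal NumPy sketch of the sample autocorrelation of log-returns described above. It subtracts the sample mean (the definition assumes zero mean) and averages each lagged product over the n − τ available pairs; other normalizations, such as dividing by n, are also common. The simulated price path is a hypothetical geometric random walk, used only to show that independent returns give near-zero autocorrelation at nonzero lags.

```python
import numpy as np


def log_returns(prices):
    """R(t) = ln S(t + 1) - ln S(t) for a regularly sampled price series."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))


def autocorrelation(r, max_lag):
    """Sample autocorrelation gamma(tau) for tau = 0..max_lag."""
    r = np.asarray(r, dtype=float)
    r = r - r.mean()            # enforce the zero-mean assumption
    var = np.mean(r * r)        # sigma^2
    n = len(r)
    return np.array([np.mean(r[: n - tau] * r[tau:]) / var
                     for tau in range(max_lag + 1)])


rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(10_000)))
acf = autocorrelation(log_returns(prices), max_lag=5)
print(np.round(acf, 3))  # acf[0] is 1; the other lags are close to 0
```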
However, despite this apparent efficiency of financial markets, we see strong correlations in the volatility of price movements and this is already clear from a visual inspection of a price return time series. Small price changes tend to be followed by small price changes, and large price changes tend to be followed by large price changes. In fact, the volatilities of financial returns have been reported to be long-memory processes [13].
A time series, $x_i$, with autocorrelation function γ(τ) is said to exhibit long memory if, in the limit τ → ∞, the following scaling relation is valid:

$$\gamma(\tau) \sim \tau^{-\alpha} L(\tau). \tag{4}$$

Here, 0 < α < 1 and L(τ) is a slowly varying function at infinity. In such a process, values from the distant past can have a significant effect on the present. Furthermore, the resulting non-integrability of the autocorrelation function implies anomalous diffusion. For normal diffusion, the standard deviation of running sums,

$$\sigma(t) = \sqrt{\Big\langle \Big( \sum_{i=1}^{t} x_i \Big)^2 \Big\rangle},$$

asymptotically increases as $t^{0.5}$, whereas for diffusion processes with long-memory increments, the standard deviation asymptotically increases as $t^{H}$ with H in the interval (0.5, 1). H is called the Hurst exponent and is related to the exponent α in the autocorrelation function by H = 1 − α/2 [13].
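The running-sum scaling and the relation H = 1 − α/2 can be sketched in a few lines of NumPy. Here σ(t) is estimated as an ensemble average over independent simulated realisations (an assumption made for illustration; with a single empirical series one would use a method such as DFA instead), and the increments are i.i.d. Gaussian, so the fitted exponent should come out close to the normal-diffusion value 0.5. The function names are hypothetical.

```python
import numpy as np


def hurst_from_alpha(alpha: float) -> float:
    """H = 1 - alpha/2 for gamma(tau) ~ tau**(-alpha) * L(tau), 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("long memory requires 0 < alpha < 1")
    return 1.0 - alpha / 2.0


def running_sum_exponent(increments, ts):
    """Log-log slope of sigma(t) = sqrt(<(x_1 + ... + x_t)^2>) against t.

    `increments` holds one independent realisation per row, and the
    average <.> is taken across rows (an ensemble average).
    """
    ts = np.asarray(ts)
    sums = np.cumsum(np.asarray(increments, dtype=float), axis=1)
    sigma = np.sqrt(np.mean(sums[:, ts - 1] ** 2, axis=0))
    slope, _ = np.polyfit(np.log(ts), np.log(sigma), 1)
    return slope


rng = np.random.default_rng(1)
x = rng.standard_normal((2000, 1024))   # i.i.d. increments: normal diffusion
ts = np.array([4, 16, 64, 256, 1024])
print(running_sum_exponent(x, ts))      # close to 0.5
print(round(hurst_from_alpha(0.6), 10)) # 0.7
```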

The Hurst exponent was originally developed in hydrology for the practical purpose of determining the optimal dam size for the Nile river's volatile rain and drought conditions. Hurst observed the seasonal flooding of the Nile to be a long-memory process. Long-memory processes are typically modeled with fractional Brownian motion [14] or ARFIMA models [15]. Besides the volatility of financial price changes, long memory has also been observed in the time series of trading volumes associated with traded assets [16]. Furthermore, the time series of the order signs of trades on financial markets have been shown to exhibit long memory [13].
If, however, the standard deviation of running sums in a time series increases as t H with 0 < H < 0.5, then such a process is called a mean-reverting or anti-persistent process and possesses anti-correlations at all scales. Such anti-persistent auto-correlations have been observed in the returns of spot prices in commodity markets [17,18].

Data analyzed
The dataset used in our analysis comprises snapshots of the Betfair.com order book for the activity of betting markets for the outcomes of 146 matches that took place during the 2008 Champions League football tournament. For each match, there is a market associated with each potential outcome: win, lose or draw. These data were collected using front-end software provided by Fracsoft, which interfaces with Betfair.com. The data are resolved to one-second accuracy. Also included is a value for the 'odds' at which the last bet for each market was matched during that second and the value of bets matched to date for that outcome. It is important to note that the odds with which Betfair users choose to back or lay outcomes can only take on values from a specified set of numbers between 1.01 and 1000 imposed by the Betfair user interface. If a bet is matched at Betfair 'odds' of 3.0, the backer will triple his/her money in the event of a payout; such a payout would be more traditionally represented in gambling parlance as '2-to-1'.
We take the reciprocal of the Betfair 'odds' to give a value that reflects the probability of that particular outcome. We consider the change in this probability with respect to time to be a stochastic variable analogous to financial price returns and call it the implied probability return. Furthermore, by subtracting the previous value of the total bets matched on the market from the current value, we deduce the 1 s trading volume, which is the total monetary value of bets matched on the market during any given second.
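A minimal NumPy sketch of these two derived quantities, assuming one snapshot per second supplied as hypothetical arrays of last-matched Betfair decimal odds and of the cumulative value of bets matched on the market; missing seconds and data-feed glitches are ignored.

```python
import numpy as np


def implied_probability(last_matched_odds):
    """Reciprocal of the Betfair 'odds' (decimal odds in [1.01, 1000])."""
    odds = np.asarray(last_matched_odds, dtype=float)
    if np.any((odds < 1.01) | (odds > 1000.0)):
        raise ValueError("Betfair odds lie between 1.01 and 1000")
    return 1.0 / odds


def implied_probability_returns(last_matched_odds):
    """Change in implied probability between consecutive 1 s snapshots."""
    return np.diff(implied_probability(last_matched_odds))


def one_second_volume(total_matched):
    """Money matched in each second: current minus previous cumulative total."""
    return np.diff(np.asarray(total_matched, dtype=float))


odds = [3.0, 3.0, 2.5, 4.0]               # hypothetical 1 s snapshots
matched = [100.0, 100.0, 250.0, 400.0]    # hypothetical cumulative totals
print(implied_probability_returns(odds))  # changes of about 0, +0.0667, -0.15
print(one_second_volume(matched))         # 0, 150, 150 matched per second
```

Decimal odds of 3.0 correspond to an implied probability of 1/3, the '2-to-1' bet described above.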
The market remains open for Betfair users to trade bets up until the very end of a match, when the conclusion is known and the market odds naturally move towards either 1.01 or 1000. A typical dataset is composed of about 6500 records representing the state of the betting market at each second during the length of the match, including approximately 15 min of half-time and typically 3 min of stoppage time. Figure 1(a) shows the implied probability for the three possible outcomes of a Champions League match between Manchester United and Roma; at the end of the match, the implied probability tends to 1.0 in the case of Man Utd. but 0.0 in the case of Roma and the draw. As expected, it was Manchester United who won the match. We see a large change in the probability at roughly t = 1400 s remaining, corresponding to a goal scored. We also see in figure 1(a) that, since no match events are occurring, the implied probability remains approximately constant at half-time. In the analysis that we present from now on, graphs will be the result of averaging over all such markets in our dataset.

Figure 1. (a) The implied probability as a function of time remaining for the three possible outcomes of a Champions League football match. As the match draws toward its conclusion, the implied probabilities for the outcomes tend toward 0 and 1 as the market odds tend toward 1000 and 1.01. We see a large probability change for each outcome at the t ≈ 1400 s mark, corresponding to a goal scored by Manchester United (Man Utd.). We see little variation in the implied probability during half-time. (b) The one-second return (change in the implied probability) as a function of time remaining.
In this section, we aim to compare the statistics of Betfair implied probability returns with the model of [10] that describes the evolution of the price of a binary option contract in the Iowa Political Stock Market. A binary option is a contract that pays out $1 if an event occurs and $0 otherwise. Assuming an efficient market, the value of the contract should reflect the probability that the event will occur. The price of a contract in a binary option market is thus likely to behave similarly to the implied probability in a Betfair market.
In [10], the authors model changes in what they consider the 'true' price of a contract in a binary option market by the process of equation (5), where $t_i$ is the time at which the $i$th last trade takes place, $p(t_i)$ is the price of the contract at the $i$th last trade, $T_a$ is the average time between consecutive trades and γ is a free parameter that describes how the magnitude of price changes scales with time remaining. By fitting the model to their empirical market data, they estimate γ = 0.49.
It can be shown that when $t_i$ is large and price changes are small (such that we can consider both $p(t_i)$ and $t_i$ to be constant), the average magnitude of partial sums, $\Delta p(t_i, n) = p(t_{i-n}) - p(t_i)$, has the following form:

$$\langle |\Delta p(t_i, n)| \rangle = \sqrt{\frac{2n}{\pi}} \left[ \frac{1}{4} - \left( p(t_i) - \frac{1}{2} \right)^2 \right] \left( \frac{T_a}{t_i} \right)^{\gamma}. \tag{6}$$

For simplicity, we take γ = 0.5, which gives

$$\langle |\Delta p(t_i, n)| \rangle = \frac{\alpha}{\sqrt{t_i}} \left[ \frac{1}{4} - \left( p(t_i) - \frac{1}{2} \right)^2 \right], \tag{7}$$

where $\alpha = \sqrt{2nT_a/\pi}$ is a constant. In the following analysis, we make a distinction between the implied probability changes that are observed when the game play of the football match is under way, which we call in-play returns, and returns that occur at half-time. When we factor out the time-dependent component, by multiplying the empirical in-play 100 s Betfair returns by the square root of time remaining in the match, the average result is well modeled by the probability-dependent component of equation (7). This is shown in figure 2(a). Furthermore, when we factor out the probability-dependent component, by dividing the returns by $1/4 - (p(t_i) - 1/2)^2$, the result is well approximated by the inverse square root law of equation (7), as seen in figure 3. In-play implied probability returns for Champions League football thus exhibit the 'conditional diverging volatility' reported in [10].

Figure 2. (a) The average magnitude of 100 s in-play returns multiplied by the square root of time remaining is well modeled by the probability-dependent component of equation (7), which is the expected result for the binary option model of [10]. The model is a poor fit to the average half-time returns, which are significantly smaller. (b) When the in-play implied probability is sampled at 1 s frequency, the returns are influenced by a trader-driven noise, which increases with implied probability.

Figure 3. The average magnitude of 100 s in-play returns divided by $1/4 - (p(t_i) - 1/2)^2$ is well approximated by the inverse square root law of equation (7), which is the expected result for the binary option model of [10].
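The two normalizations described above can be sketched as follows, reading equation (7) as ⟨|Δp|⟩ = α[1/4 − (p − 1/2)²]/√t_i with α = √(2nT_a/π). The function names and the example values of n and T_a are hypothetical. Note that 1/4 − (p − 1/2)² is simply p(1 − p), which vanishes at p = 0 and p = 1, so the probability normalization is undefined once a market has effectively settled.

```python
import numpy as np


def model_mean_abs_change(p, t_remaining, n, T_a):
    """Model value of <|dp(t_i, n)|> for gamma = 0.5."""
    p = np.asarray(p, dtype=float)
    t = np.asarray(t_remaining, dtype=float)
    alpha = np.sqrt(2.0 * n * T_a / np.pi)
    return alpha * (0.25 - (p - 0.5) ** 2) / np.sqrt(t)


def factor_out_time(abs_returns, t_remaining):
    """Multiply by sqrt(time remaining): leaves the probability dependence."""
    return np.asarray(abs_returns, dtype=float) * np.sqrt(t_remaining)


def factor_out_probability(abs_returns, p):
    """Divide by 1/4 - (p - 1/2)^2 = p(1 - p): leaves the time dependence."""
    p = np.asarray(p, dtype=float)
    return np.asarray(abs_returns, dtype=float) / (0.25 - (p - 0.5) ** 2)


n, T_a = 100, 2.0                        # hypothetical lag (trades) and mean inter-trade time (s)
p = np.array([0.2, 0.5, 0.8])
t = np.array([400.0, 1600.0, 3600.0])    # time remaining (s)
model = model_mean_abs_change(p, t, n, T_a)
alpha = np.sqrt(2.0 * n * T_a / np.pi)
print(factor_out_time(model, t) / alpha)                       # approximately p(1 - p)
print(factor_out_probability(model, p) * np.sqrt(t) / alpha)   # approximately 1
```

If the model holds, the first normalization collapses the data onto p(1 − p) and the second onto an inverse square root in time remaining, which is how the two figures test the two factors separately.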
However, we see a very different result for the average of $|\Delta p_{\mathrm{imp}}(t)|\sqrt{t}$ in the case of the 100 s half-time returns (figure 2(a)), and the model is also a poor fit in the case of the 1 s in-play returns. In particular, the average magnitude of 1 s in-play returns is seen to increase systematically with implied probability. This is shown in figure 2(b). We propose that, whereas at low frequencies the model may capture the dynamics of the changing probability of the outcome, at very high frequencies in-play returns are influenced by a trader-driven noise.
The authors of [10] do include noise in their full model, which amounts to multiplying the price of equation (5) by an error term $\exp(\epsilon(t_i))$, where $\epsilon(t_i)$ is a Gaussian random variable with a constant variance fitted to experiment. Our results suggest that such a noise term for Betfair market returns should have a variance that is a function of the implied probability. This assertion is supported by the results of figure 4, in which we see a strong dependence of trading volume on implied probability (figure 4(a)) and a square root law relating half-time trading volume to volatility (figure 4(b)).
At half-time, when no match events are occurring, we expect to see little change in the 'true' probability of the outcome, so both 1 and 100 s half-time returns can be attributed entirely to trader-driven noise. We therefore separate the analysis in the next section into in-play and half-time cases. We expect in-play market observables to be affected by systematic trends related to the changing probability of the outcome of the match, whereas half-time market observables will result from the more stationary trader-driven noise. This is in agreement with the results of [22], in which we find very different probability distributions for the returns that occur at half-time and the returns that occur when the game play of the match is under way. In particular, the in-play returns are seen to exhibit fatter tails, indicating that large probability changes are much more likely to occur when the match is under way, due to significant match events such as goals scored creating dramatic changes in the implied probabilities, as seen in figure 1.

Figure 4. (a) Trading volume as a function of implied probability. (b) Half-time trading volume and volatility: we find that the volatility can be fitted to a square root law. This result mirrors similar observations for financial markets [19]-[21].

Detrended fluctuation analysis (DFA)
Because Betfair market observables are non-stationary, we follow [10] and employ DFA to investigate their long-range correlations. DFA is a well-established method for determining the scaling behavior of noisy data in the presence of trends without knowing their origin and shape, and it works as follows [23]. Given a time series $x_i$, we first integrate it to form $X(t) = \sum_{i=1}^{t} x_i$. The integrated time series, X(t), is then divided into blocks of equal length m, and a linear ordinary least squares regression is performed on the points in each block to produce the function $X_m(t)$. We then detrend the integrated time series X(t) by subtracting the local trend, $X_m(t)$, in each block and calculate the root mean square fluctuation,

$$F(m) = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left[ X(t) - X_m(t) \right]^2}, \tag{8}$$

where N is the number of points in the series. Repeating this calculation over a range of box lengths m, a power law $F(m) \propto m^{H}$ identifies the Hurst exponent H, which is estimated from the slope of log F(m) against log m.
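The DFA steps described above can be sketched in NumPy. This version uses non-overlapping boxes, discards the incomplete final box, fits each box with `np.polyfit` and estimates H from an unweighted least-squares fit of log F(m) against log m; practical implementations differ in these details (for example, overlapping boxes or higher-order detrending). The test signals are synthetic: white noise should give H ≈ 0.5 and its running sum H ≈ 1.5.

```python
import numpy as np


def dfa(x, box_lengths):
    """Root mean square fluctuation F(m) of the integrated, box-wise
    linearly detrended series, for each box length m."""
    X = np.cumsum(np.asarray(x, dtype=float))   # X(t) = sum_{i<=t} x_i
    F = []
    for m in box_lengths:
        n_boxes = len(X) // m
        if m < 3 or n_boxes < 1:
            raise ValueError("need 3 <= m <= len(x)")
        boxes = X[: n_boxes * m].reshape(n_boxes, m)
        t = np.arange(m)
        # polyfit fits every column at once and returns (slopes, intercepts).
        slope, intercept = np.polyfit(t, boxes.T, 1)
        trend = np.outer(slope, t) + intercept[:, None]   # X_m(t) in each box
        F.append(np.sqrt(np.mean((boxes - trend) ** 2)))
    return np.array(F)


def hurst_exponent(box_lengths, F):
    """Slope of log F(m) versus log m."""
    slope, _ = np.polyfit(np.log(box_lengths), np.log(F), 1)
    return slope


rng = np.random.default_rng(2)
noise = rng.standard_normal(2 ** 14)
m = np.array([16, 32, 64, 128, 256, 512])
print(hurst_exponent(m, dfa(noise, m)))             # close to 0.5
print(hurst_exponent(m, dfa(np.cumsum(noise), m)))  # close to 1.5
```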

Volatility.
In figure 5, we show the results of DFA on the magnitude of implied probability returns as drawn from half-time and from times in the match when the game play is under way. In both cases, we see scaling behavior characterized by a straight line on a log-log plot and estimate a Hurst exponent of approximately H = 0.61 from a least squares fit to the graph. This value differs somewhat from the values H ≈ 0.66 and 0.71 reported for the volatility in the binary option prediction market of [10].
Since, in the case of the in-play returns, we expect the volatility to systematically increase during the match, the scaling breaks down for large box length m as we approach the length scale of the football match. We conclude that Betfair volatility is a long-memory process, and since the same behavior is also observed during half-time, when little news is hitting the market, the scaling results from the trader-driven noise.

Trading volume.
In figure 6, we show the results of DFA on the 1 s trading volumes as drawn from half-time and in-play. In both cases, we see power-law scaling with an exponent H ≈ 0.5 corresponding to short-term memory. Again, in the case of the in-play trading volumes, the scaling breaks down for large box length. This is because of the strong dependence of trading volume on the implied probability, which tends to zero or one at the end of every match. We conclude that unlike what has been observed in the case of financial markets [16], Betfair trading volumes are not a long-memory process.

Implied probability return.
Finally, we use DFA to investigate the Hurst exponent of the implied probability returns themselves. The results of the analysis for half-time and in-play are shown in figure 7.
For the half-time returns, we observe power-law scaling with an exponent H = 0.21. This suggests that the implied probability at half-time is a mean-reverting (anti-persistent) process. For the in-play returns, F(m) has a very different shape. This is because during game play the implied probability follows the true probability, which fluctuates with match events; furthermore, the implied probability always tends to zero or one at the end of each match.
Such mean-reverting behavior is to be expected, since if the 'true' probability of the outcome remains constant at half-time, then when market odds depart from what is believed to be the true probability, the actions of traders seeking arbitrage will serve to restore the 'true' odds. A surprising result is the self-affinity of the signal: we observe power-law scaling with H = 0.21 spanning at least one order of magnitude. For box lengths m < 25 s, F(m) scales with an exponent H = 0.5, indicating that over very short timescales the implied probability behaves more randomly. This may be due to information delays in the Betfair system or may represent the reaction time with which traders can respond rationally to changes in the Betfair odds.
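The crossover described above can be quantified by fitting separate log-log slopes on either side of a chosen box length. The NumPy sketch below uses a hypothetical helper, `piecewise_hurst`, with the 25 s crossover taken from the text; the F(m) curve in the usage example is synthetic, built from exact power laws with exponents 0.5 and 0.21, and does not reproduce the measured data.

```python
import numpy as np


def piecewise_hurst(box_lengths, F, crossover):
    """Least-squares log-log slopes of F(m) for m < crossover and m >= crossover."""
    m = np.asarray(box_lengths, dtype=float)
    F = np.asarray(F, dtype=float)
    short = m < crossover
    long_ = ~short
    if short.sum() < 2 or long_.sum() < 2:
        raise ValueError("need at least two box lengths on each side of the crossover")
    h_short = np.polyfit(np.log(m[short]), np.log(F[short]), 1)[0]
    h_long = np.polyfit(np.log(m[long_]), np.log(F[long_]), 1)[0]
    return h_short, h_long


# Synthetic, continuous F(m): slope 0.5 below 25 s and 0.21 above.
m = np.array([4, 8, 12, 16, 20, 25, 50, 100, 200, 400], dtype=float)
F = np.where(m < 25, m ** 0.5, 25 ** 0.5 * (m / 25) ** 0.21)
print(piecewise_hurst(m, F, crossover=25))  # approximately (0.5, 0.21)
```

In practice the crossover location is itself uncertain, so the fitted exponents are worth checking for sensitivity to the chosen split point.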
The plot of F(m) for the in-play returns exhibits mixed behavior. Scaling with H < 0.5 is still detectable, but over longer timescales the implied probability return follows the true probability of events, which behaves more like a random walk (H ≈ 0.5). For very long timescales, the DFA is influenced by the systematic trend in the odds to move towards zero or one at the end of every match.
Such anti-correlations are atypical for financial markets but have been observed in electricity spot prices [17,18]. However, it should be noted that the implied probability is the reciprocal of the odds at which the last bet was matched. A change in this value does not necessarily mean a change in the market's best back or lay price. Therefore, the anti-correlation present in the half-time signal cannot necessarily be exploited to beat the market.

Lo's modified rescaled range test
Since we expect the implied probability to remain constant during half-time, we expect implied probability returns and trading volumes occurring during half-time to be stationary. We can therefore supplement the DFA with Lo's modified rescaled range test for long memory in the case of the half-time market data. Lo's test [24] is a modification of the classical rescaled range test of Mandelbrot [25], which compares the maximum and minimum values of running sums of deviations from the sample mean, renormalized by the sample standard deviation. The classical rescaled range test has been shown not to be robust to short-range correlations, which can cause it to indicate long memory when none exists [24].
Given a sample time series $x_1, x_2, \ldots, x_n$ with sample mean $\bar{x}_n = \frac{1}{n}\sum_{j=1}^{n} x_j$, Lo's modified rescaled range statistic is

$$Q_n(q) = \frac{1}{\hat{\sigma}_n(q)} \left[ \max_{1 \le k \le n} \sum_{j=1}^{k} (x_j - \bar{x}_n) - \min_{1 \le k \le n} \sum_{j=1}^{k} (x_j - \bar{x}_n) \right], \tag{9}$$

where

$$\hat{\sigma}_n^2(q) = \hat{\sigma}_x^2 + 2 \sum_{j=1}^{q} \left( 1 - \frac{j}{q+1} \right) \hat{\gamma}_j,$$

$\hat{\sigma}_x^2$ is the sample variance, $\hat{\gamma}_j$ is the sample autocovariance at lag j and q < n. $Q_n(q)$ differs from the classical R/S statistic of Mandelbrot only in the denominator. Following Lo, we use $q = [k_n]$, where

$$k_n = \left( \frac{3n}{2} \right)^{1/3} \left( \frac{2\hat{\rho}}{1 - \hat{\rho}^2} \right)^{2/3}, \tag{10}$$

with $[k_n]$ being the greatest integer less than or equal to $k_n$ and $\hat{\rho}$ the sample first-order autocorrelation coefficient. Provided certain criteria are met [24], $V_n \equiv Q_n/\sqrt{n}$ tends asymptotically to a random variable distributed according to the probability distribution for the range of a Brownian bridge. When $V_n$ is outside the interval [0.809, 1.862], we can reject the null hypothesis of short-range dependence with 95% confidence.

Figure 8. Distribution of $V_n$, computed from equation (9), for the 438 time series of trading volume returns (left) and absolute implied probability returns (right) extracted from the 146 Champions League football matches. The line represents the analytical solution for the distribution of $V_n$ in the case of short-term memory. Also shown is the 95% confidence range for rejecting the null hypothesis of short-range correlations. For trading volumes, the distribution of points compares well with the expectation of a short-memory model. However, in the case of the absolute probability returns, 14% of points lie outside the 95% confidence interval, suggesting that Betfair volatility is a long-memory process.
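Lo's statistic can be sketched in NumPy as follows, assuming Bartlett weights 1 − j/(q + 1) in the denominator, 1/n normalization of the sample variance and autocovariances, and the lag rule $k_n = (3n/2)^{1/3}\,|2\hat\rho/(1-\hat\rho^2)|^{2/3}$ (the absolute value gives the same result as the squared-then-cube-rooted form and keeps the expression real for negative $\hat\rho$). The function names are hypothetical; the critical interval [0.809, 1.862] is the one quoted above.

```python
import numpy as np


def lo_lag(x):
    """q = [k_n] with k_n = (3n/2)^(1/3) * |2 rho / (1 - rho^2)|^(2/3)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    rho = np.sum(d[1:] * d[:-1]) / np.sum(d * d)   # first-order autocorrelation
    k_n = (1.5 * len(x)) ** (1 / 3) * abs(2 * rho / (1 - rho ** 2)) ** (2 / 3)
    return int(np.floor(k_n))


def lo_modified_rs(x, q=None):
    """Q_n(q): range of partial sums of deviations divided by sigma_n(q)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if q is None:
        q = lo_lag(x)
    if not 0 <= q < n:
        raise ValueError("q must satisfy 0 <= q < n")
    d = x - x.mean()
    partial = np.cumsum(d)                          # k = 1..n
    value_range = partial.max() - partial.min()
    sigma2 = np.mean(d * d)                         # sample variance
    for j in range(1, q + 1):
        gamma_j = np.sum(d[j:] * d[:-j]) / n        # sample autocovariance
        sigma2 += 2.0 * (1.0 - j / (q + 1.0)) * gamma_j
    return value_range / np.sqrt(sigma2)


def lo_v_statistic(x, q=None):
    """V_n = Q_n / sqrt(n)."""
    return lo_modified_rs(x, q) / np.sqrt(len(x))


def rejects_short_memory(v, lower=0.809, upper=1.862):
    """True if V_n lies outside the 95% interval for short-range dependence."""
    return not (lower <= v <= upper)


rng = np.random.default_rng(3)
noise = rng.standard_normal(5000)
print(lo_lag(noise), lo_v_statistic(noise))
print(rejects_short_memory(lo_v_statistic(np.cumsum(noise), q=0)))  # True
```

Setting q = 0 recovers the classical R/S statistic, which makes it easy to compare the two tests on the same series.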
In figure 8, we show the distribution of $V_n$ as calculated for the half-time time series of absolute implied probability returns and trading volumes for all 438 markets studied. In the case of the trading volumes, we reject the null hypothesis of short-range dependence with 95% confidence in 7% of the datasets, and the data agree well with the analytical probability distribution for the range of a Brownian bridge.
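The analytical distribution referred to here, that of the range of a standard Brownian bridge, is Kuiper's distribution, with CDF $F(v) = 1 + 2\sum_{k \ge 1}(1 - 4k^2v^2)e^{-2k^2v^2}$. The short Python sketch below evaluates a truncated version of this series (intended for v of order one; for very small v a different expansion is numerically preferable) and checks that the interval [0.809, 1.862] contains 95% of the probability, with 2.5% in each tail.

```python
import math


def brownian_bridge_range_cdf(v: float, terms: int = 100) -> float:
    """P(V <= v) for the range V of a standard Brownian bridge (Kuiper's distribution)."""
    if v <= 0.0:
        return 0.0
    total = 0.0
    for k in range(1, terms + 1):
        a = 2.0 * k * k * v * v          # 2 k^2 v^2
        total += (1.0 - 2.0 * a) * math.exp(-a)
    return 1.0 + 2.0 * total


lower, upper = 0.809, 1.862
print(round(brownian_bridge_range_cdf(lower), 3))  # 0.025
print(round(brownian_bridge_range_cdf(upper), 3))  # 0.975
```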
However, we find larger values of $V_n$ in the case of the absolute returns datasets, and their distribution differs markedly from the analytical solution for the Brownian bridge. In particular, we find that 14% of the volatility results fall outside the 95% confidence interval for short-term memory. 14% may seem small, but it is important to note that Lo's modified rescaled range test is a stringent test that is heavily biased towards accepting the null hypothesis of short-range dependence [26]. These results appear to support the hypothesis of long-range dependence in implied probability volatility but short-range dependence in trading volume, as indicated by the DFA in the previous section.

Summary and conclusions
We have investigated the temporal correlations in football market data extracted from an online betting exchange, Betfair.com. Unlike financial markets, in which traders buy and sell financial assets, on Betfair it is bets that are instead 'backed' or 'laid'. Despite this significant difference between Betfair and financial markets, we find similar long-range correlations in the market returns.
We show that when sampled at low frequencies, the magnitude of changes in the implied probability can be qualitatively described by the model of [10]. However, when the implied probability is sampled at high frequencies, we see the influence of a trader-driven noise. Furthermore, the magnitude of the trader noise is found to be a function of the implied probability.
Having separated the high-frequency trader noise that dominates the half-time activity from the mixed market behavior of the in-play returns, we perform standard tests for long-range correlations on the in-play and half-time returns and trading volumes. We find that the magnitude of the returns exhibits long-range correlations with a Hurst exponent H = 0.61. We find, however, that unlike what has been observed in the case of financial markets, the Betfair trading volumes are a short-memory process with a Hurst exponent H ≈ 0.5. Finally, detrended fluctuation analysis of the implied probability return reveals that the trader-driven noise gives rise to a self-affine mean-reversion of the implied probability at half-time.