Modelling bursty time series

Many human-related activities show power-law decaying interevent time distribution with exponents usually varying between 1 and 2. We study a simple task-queuing model, which produces bursty time series due to the nontrivial dynamics of the task list. The model is characterised by a priority distribution as an input parameter, which describes the choice procedure from the list. We give exact results on the asymptotic behaviour of the model and we show that the interevent time distribution is power-law decaying for any kind of input distributions that remain normalizable in the infinite list limit, with exponents tunable between 1 and 2. The model satisfies a scaling law between the exponents of interevent time distribution (alpha) and autocorrelation function (beta): alpha + beta = 2. This law is general for renewal processes with power-law decaying interevent time distribution. We conclude that slowly decaying autocorrelation function indicates long-range dependency only if the scaling law is violated.


Introduction
Studying human activity patterns is of central interest due to the wide practical usage. Understanding the dynamics underlying the timing of various human activities -such as starting a phone call, sending an e-mail or visiting a web-site -are crucial to modelling the spreading of information or viruses [1]. Modelling human dynamics is also important in resource allocation. It has been shown that for many human activities the interevent time distribution follows a power law with exponents usually varying between 1 and 2. Processes with power-law decaying interevent time distribution look very different from the Poisson process, which has been used to model the timing of human activities before [2,3]. While time series from the latter look rather homogeneous, the former processes produce bursts of rapidly occurring events, which are separated by long inactive periods. Some examples where power-law decaying interevent time distribution has been observed are email-communication (with exponent β ≈ 1, [4]), surface mail communication (β ≈ 1.5, [5]), web-browsing (β ≈ 1.2, [6]) and library loans (β ≈ 1, [7]). In some other cases a monotonous relation has been reported between the user activity and the interevent time exponent, for example in short-message communication (β ∈ (1.5, 2.1), [8]) or in watching movies (β ∈ (1.5, 2.7), [9]). In a recent paper there can be found a distribution of exponents of various channels of communications [10]. These observations make it necessary to find a model in which the interevent time exponent is tunable.
Similar bursty behaviour has been observed in other natural phenomena, for example in neuron-firing sequences [11] or in the timings of earthquakes [12]. The interevent time distribution does not give us any information about the dependency among the consecutive actions. Correlation between events is usually characterised by the autocorrelation function of the timings of detected events. Bursty behaviour is often accompanied by power-law decaying autocorrelation function [13], which is usually thought to indicate long-range dependency, see e.g. [14]. However, time series with independent power-law distributed interevent times show long-range correlations [15,16]. This paper is organized as follows. We start with introducing a task-queuing model, which has an advantage compared to the Barabási-model [17], namely that the observable is not the waiting time of an action (from adding the task to the list until executing it) but the interevent time between similar activities. We determine the asymptotic decay of the interevent time distribution in a simple limit of the model. We give a simple proof of the scaling law between the exponents of the interevent time distribution and the autocorrelation function based on Tauberian theorems. Finally, we demonstrate that the scaling law can be violated if the interevent times are long-range dependent.

The model
We assume that people have a task list of size N and they choose their forthcoming activity from this list. The list is ordered in the sense that the probability w i of choosing the i th activity from the list is decreasing as a function of position i. The task at the chosen position jumps to the front (triggering the corresponding activity) and pushes the tasks that preceded it to the right (figure 1). This mechanism is responsible for producing the bursty behaviour because once a person starts to do an activity, that is going to have high priority for a while.
The model is capable of covering many types of activities but now we only concentrate on one type. The tasks of this type are marked with A in figure 1 (e.g. watching movies). For the sake of simplicity we assume that the list contains only one single item of type A, all the others are activities of type B. It turns out that this restriction is important for the interevent time distribution but irrelevant for the autocorrelation function.
We show in our paper that this model produces power-law decaying interevent time distribution for a wide variety of the w i distributions. First we analyse the model with power-law decaying priority distribution w i , second we analyse the effect of exponentially decaying priority distribution, and finally we discuss the stretched exponential case. If the list is finite, the power-law regime is followed by an exponential cutoff. The cutoff is the consequence of reaching the end of the list, from which a geometrically distributed waiting time follows: P wt (t) ∼ (1 − w N ) t . A marginal result of section 4 shows that the expectation value of the interevent time is equal to the length of the list independently on the priority distribution (as long as ergodicity is maintained). For the sake of simplicity we determine the exponent of power-law decaying region in the case of an infinitely long list where the exponential cutoff does not appear.

Interevent time distribution
The interevent time is equal to the recurrence time of the first element of the list. Let q(n, t) (n ≥ 1) denote the probability of finding the observed element at position n after t timesteps without any recurrences up to time t. We emphasise here that q(n, t) is not a conditional probability and that 1 − ∞ n=1 q(n, t) gives the survival function of the interevent time distribution. The initial condition is set as q(n, 1) = (1 − w 1 )δ n,2 , that is, at the first step the observed element moves from the front of the list to the second position with probability (1 − w 1 ) (otherwise a recurrence would occur). The restriction not to recur is important because this makes large jumps to the front of the list forbidden for the observed element. Time evolution is given by where Q n = 1 − n k=1 w k is the survival function of the priority distribution. Equation (2) expresses that the observed element can be found at position n either if it was there in the previous time step and we choose a position smaller than n or it was at position n − 1 and we choose an activity at position k > n. Our aim is to determine the asymptotic behaviour of ∞ n=1 q(n, t). The main trick we use in analysing the recursion above is to consider the n = const levels first instead of the t = const levels which intuitively one would do. At every step the time coordinate gets increased while the position of the observed element might remain unchanged or get increased by one as well (figure 2). The path of the element in the (n, t) plane can be divided into sections that start with a step on the bias and are followed by some steps upwards (which can be zero as well). These sections can be characterised by their height-difference, which can take values from 1, 2, . . . . These height differences are independent and (optimistic) geometrically distributed with parameter depending on the position. Let ξ k be independent geometrically distributed random variables with parameter Q k , i.e.
With n fixed q(n, t) is the probability that we find the element at the n th position at time t (without any recurrences). This corresponds to paths with n − 1 steps to the right (started from position 1 at time 0) and q(n, t) is the distribution mass function of the sum of the random heights.
We could go so far without specifying the priority distribution, now we have to specify Q n to continue.

Power-law decaying priorities
Here we analyse the model in which the survival function of the priority distribution is power-law decaying, i.e. Q n = cn −σ+1 + O(n −σ ). O(n ω ) means that that term is asymptotically at most of the order of n ω . In this case w i is also power-law decaying with exponent σ. Though the random variables ξ k are not identically distributedby checking the Lyapunov condition -one can show that the central limit theorem holds for this situation (see Appendix A.2). In this approximation q(n, t) is Gaussian Using integral approximation to evaluate the sums and keeping only the highest order terms in n yields: This formula shows that the probability of finding an element at position n at time t is centered on the t = n σ cσ curve which is in agreement with the numerical results ( figure  3). The sum of q(n, t) in variable n is the probability of the observed element has not recurred up to time t. We approximate this sum by integral and we apply the following substitution (with new variable r): For γ > 0 (here γ may refer to: σ − 1/2, σ, 2σ − 1): where o(n ω ) means that that term is asymptotically of strictly smaller order than n ω . From equations (4-6) it follows that Differentiating this equation gives the first main result of our paper,

Exponentially decaying priorities
Now we study the model with Q n = ce −λn priority survival function. In contrast to the previous case the central limit theorem cannot be used, because the last term in n−1 i=1 ξ i is comparable with the complete sum.
However, we can construct a limit theorem even for this situation. First of all, we approximate the geometrically distributed random variables ξ k ∼ Geom(1 − Q k ) by exponential ones:ξ k ∼ Exp(Q k ). We apply the scaling property of the exponential distribution to expressξ k by i.i.d. exponential random variables:ξ k ∼ e λk η k , where η k ∼ Exp(c). In equation (3) we need the distribution of sums of the first n − 1 ξs: X n has a well defined limit as n → ∞, which we denote by X. The probability density function of this (limit) random variable is denoted by ψ(x) (and is given explicitly in Appendix A.1). The finite sum X n in equation (8) can be approximated by X, yielding since t 0 ψ(y)dy tends to 1 as t → ∞. By differentiating we get β = 2 independently on parameter λ.

Stretched exponentially decaying priorities
The last investigated priority distribution is the stretched exponential, i.e. Q n = ce −λn ν .
Faster than exponential decay (ν > 1) In this case n−1 i=1 ξ i is totally dominated by the last term in the sum, hence we approximate the sum with this term.
This means that the β = 2 exponent holds for this case but a logarithmic correction appears.
Slower than exponential decay (ν < 1) In this case the increment of Q n is just small enough for the central limit theorem to be still applicable (see appendix). We approximate the mean and variance of n−1 i=1 ξ i by integral and we use the property a 0 e y y 1/ν−1 a→∞ ≈ e a a 1/ν−1 . With these we get The t → ∞ asymptotic behaviour of q(n, t)dn can be determined by introducing a proper new variable, Keeping the leading order term only, one finds that the survival function of the interevent time distribution is ∞ 0 q(n, t)dn = 1 For more details of the calculations see Appendix A.3. We have got β = 2 with logarithmic correction again. While for ν > 1 the correction makes the decay of the distribution faster than the pure power law, for ν < 1 it is slightly slowed down.

Autocorrelation function
Another characteristic property of the time series is autocorrelation function. Let X(t) denote the indicator variable of the observed activity: X(t) = 1 if the observed activity is at the first position of the list at time t (i.e. an event happened) and X(t) = 0 otherwise. We define the autocorrelation function as usual, The model defines a Markov-chain and the state space is the space of permutations of the list. The transition matrix for a list of length N is given by where Q N = 0. This matrix is doubly stochastic, hence E [X(t)] t = 1 N . According to the recurrence formula of Marc Kac [18] we conclude that the expectation value of the interevent time is equal to the size of the list independently on the priority distribution (as long as ergodicity is maintained). The autocorrelation function can be written in simpler form making use of the knowledge of E [X(t)] t , The probability of finding the activity at the first position can be calculated numerically by successive application of the Markov-chain transition matrix (18). For power-law decaying priority distributions numerical computations show that the autocorrelation function is power-law decaying with an exponential cutoff (figure 4). Given σ, the autocorrelation functions for various list sizes can be rescaled to collapse into a single curve ( figure 4, inset). This property can be written in a mathematical way:  The exponents used for rescaling the autocorrelation functions are listed in table 1. indicating that in the N → ∞ limit the autocorrelation function is power-law decaying with exponent α = 1 σ for σ ≥ 2. With the exact β = 2 − 1 σ result this can be combined into a scaling law: α + β = 2.
If the priority distribution decays faster than power-law, then β = 2 and α = 0. This means that the autocorrelation function decays slower than any power. This is the case when the priority distribution decays like a (stretched) exponential function.

Proof of the scaling law
The essential properties of the model for the scaling relation are that the interevent times are independent and power-law decaying. Let T denote the set of recurrence times and let τ be an interevent time. The autocorrelation function can be written in the following form: which simplifies to A(t) = P (t ∈ T ) if β ≤ 2. The Laplace transform of the autocorrelation function can be expressed by the Laplace transform of the interevent time distribution: where we used the property that the interevent times are independent and identically distributed. Tauberian and Abelian theorems connect the asymptotics of a function with the asymptotics of its Laplace transform [20]. Applying Abelian theorem to the right side of the last equation results in g(λ) ∼ λ 1−β . Then applying the Tauberian theorem yields A(t) ∼ t β−2 or α + β = 2 .
To be precise we had to use an extended version of the Abelian theorem, which can be derived from the original theorem using integration by parts. With similar train of thought the scaling law can be extended to the 2 < β < 3 region where β − α = 2 holds. These results are in agreement with [16,19].

Models with non-independent interevent time distribution
Independence of the interevent times was important in the proof of the scaling law. The violation of the scaling relation indicates dependency among the interevent times but we emphasize that the opposite direction does not necessarily hold. In this section we investigate two models that are characterised by dependent interevent times. One of them satisfies the scaling relation (23) and the other does not.

A model with weak dependence
Our first example is the two-state model of reference [13], because that model was introduced to capture deep correlations, nevertheless, it obeys the scaling law. We start with a brief introduction of the model. The system alternates between a normal state A, in which the events are separated by longer time intervals, and an excited state B, which performs events with higher frequency. After each event executed in state A the system may switch to state B with probability π or remain in state A with probability 1 − π. In contrast, the excited state has a memory. That is, if the system had executed more events in state B since the last event in the normal state, the transition to state A would be less probable. The probability of performing one more event in state B is given by p(n) = (n/(n + 1)) ν , where n is the number of events in the current excited state. This memory function was introduced to model the empirically observed powerlaw decay in the bursty event number distribution (for the definition see the reference). The dynamics is illustrated in figure 5a) for a better understanding. In both states the interevent times are generated from a reinforcement process resulting independent random interevent times τ A and τ B with power-law decaying distributions. We denote the corresponding exponents by β A and β B . The parameters of the model were the following: π = 0.1, ν = 2.0, β A = 1.3, β B = 5.0, and it was found that the model satisfies the scaling law with exponents β = 1.3 and α = 0.7. The number of events performed in a single period of state A or B are i.i.d. random variables N A or N B . N A is geometrically distributed with parameter π. The distribution of N B can be easily obtained from p(n), and it is power-law decaying with exponent 1 + ν. Making use of these new random variables the dynamics simplifies: the system executes N A events with interevent times distributed as τ A , then switches to the excited state and executes N B events with interevent times τ B , then switches back to the normal state, and so on.
It is clear that the decay of the interevent time distribution of the whole process is determined by the smallest of the exponents β A and β B . Since both of E [τ B ] and E [N B ] are finite, the periods of state B have a characteristic length Within the periods of state A there is no characteristic time-scale, hence by looking on the time series at a larger scale the effective length of the B periods will be shorter (figure 5b)). Therefore the existence of the B periods is irrelevant for the asymptotic decay of the autocorrelation function giving rise to the scaling law.

A model with long-range dependence
Another simple way to introduce dependency between the interevent times is to construct a Markov-chain, i.e. the next interevent time depends only on the actual interevent time. As a counterexample to the scaling law we constructed a long-range dependent set of interevent times by Metropolis-algorithm. The base of the algorithm is constructing a Markov chain on the integers that has power-law decaying stationary distribution P (x) ∼ x −β . The algorithm uses a proposal density (transition rate) Q(x ′ , x n ) which generates a proposal sample from the current value of the interevent time. This sample is accepted for the next value with probability otherwise the previous value is repeated. To generate a power-law distributed sample with long-range dependency the mixing of the Markov chain should be slow, i.e. the gap in the spectrum of the Markov chain should vanish. In this order we allow only small differences between consecutive interevent times: Q(x ′ , x n ) = 1 2D I(x n − D, x n + D) or equivalently x ′ as a random variable is discrete uniformly distributed: The simulation results ( figure 6) show that the autocorrelation function decays slower than it would be assumed from the scaling law (23). This example shows that the scaling relation (23) can be violated if there is long range dependency among the interevent times.

Extensions of the model
A trivial extension of the model could be putting more items of the observed activity into the list. Then this parameter would tune the frequency of doing the activity. In this case the interevent times become dependent and the simulation results on finite lists show that the interevent time distribution decays faster than power-law but decays slower than an exponential function. It is easy to prove that in spite of this the autocorrelation function (17) is independent of the number of the observed activity and remains powerlaw decaying. This is another example for models in which the scaling law does not hold. When the list is finite, we have much more freedom in the choice of the priority distribution. However, the method based on expressing q(n, t) as a probability density function of sum of n − 1 independent geometrically distributed random variables is general, and can offer a good base for further calculations, even for finite lists.
The model may be interesting not only for human dynamics but also for the mathematical theory of card shuffling [21]. We can define the time reversed version of the model in which we choose a position from the same distribution as in the original model and we move the first element of the list to the random position. This model has similar properties to the original model, e.g. this model has the same interevent time distribution and autocorrelation function. If we think of the list as a deck of cards, then the time reversed model is a generalisation of the top-in-at-random shuffle method. The latter is a primitive model of shuffling cards: the top card is removed and inserted into the deck at a uniformly distributed random position [21].

Conclusion
In a proper model one should take into consideration the dependency among the interevent times besides the interevent time distribution. In human dynamics the latter is slowly decaying and as a consequence of this the autocorrelation function of the interevent times is not well-defined (i.e. the second moment does not exist). However, the autocorrelation function of the time series exists and it might be a good measure for long range dependency among the interevent times. Time series of messages sent by an individual in an online community are reported to be not more correlated than an independent process with the same interevent time distribution [16]. Similarly, the exponents in the neuron-firing sequences approximately satisfy the scaling law (α = 1.0, β = 1.1 [13]). For these systems the model we studied might be applicable. However, this is not always the case. For example, the autocorrelation function of the e-mail sequence decays slower (α = 0.75 [13]) than it should do estimated from the scaling law. This indicates long range dependency in the time series in addition to the power-law decaying interevent time distribution. In this case a dependent model should be applied, for example a model based on Metropolis algorithm that is similar to the one we studied as a counterexample to the scaling law. Effects of circadian patterns and inhomogeneity in an ensemble of individuals should also be considered [22].
In real networks interactions between individuals have to be taken into account to reproduce some social phenomena, e.g. temporal motifs [23] observed in a mobile call dataset. Interactions could be incorporated in this model by allowing the actual activity of an individual to modify the priority list of some of his/her neighbour. If this effect is rare and can be considered as a perturbation, our results on the dynamics of the list could be a starting point to further calculations covering for example information flow in a network.
can be expressed [24]. Substituting the proper parameters into that formula one gets With some trivial algebraic manipulations this can be written in the following form The probability density function in the n → ∞ limit reads The products s k=1 (1−e −kλ ) are usually cited as q-Pochhammer symbols (e −λ , e −λ ) s and are convergent in the s → ∞ limit. Equation (A.5) shows clearly that the asymptotic behaviour of the limit random variable X is determined by the first term in the sum.
Appendix A.2. Central limit theorem for the power-law decaying case The Lyapunov condition for the central limit theorem reads: If this condition is fulfilled for any δ, then the central limit theorem can be used. We test this condition at δ = 1 with the following moments: Substituting these and approximating the sums by integrals yield We make integral approximation of the sums of the moments above: n k=1 e aλk ν ≈ n 0 e aλk ν dk = 1 νλ 1/ν aλn ν 0 e y y 1/ν−1 ≈ 1 aλν e aλn ν n 1−ν . (A.14) The last approximation comes from the property that a 0 e y y 1/ν−1 a→∞ ≈ e a a 1/ν−1 . With these we have: We underlined the terms that determine the decay of the integral for large t.