Bayesian inference of Lévy walks via hidden Markov models

The Lévy walk (LW) is a non-Brownian random walk model that has been found to describe anomalous dynamic phenomena in diverse fields ranging from biology over quantum physics to ecology. Recurrently occurring problems are to examine whether observed data are successfully quantified by a model classified as LWs or not and extract the best model parameters in accordance with the data. Motivated by such needs, we propose a hidden Markov model for LWs and computationally realize and test the corresponding Bayesian inference method. We introduce a Markovian decomposition scheme to approximate a renewal process governed by a power-law waiting time distribution. Using this, we construct the likelihood function of LWs based on a hidden Markov model and the forward algorithm. With the LW trajectories simulated at various conditions, we perform the Bayesian inference for parameter estimation and model classification. We show that the power-law exponent of the flight-time distribution can be successfully extracted even at the condition that the mean-squared displacement does not display the expected scaling exponent due to the noise or insufficient trajectory length. It is also demonstrated that the Bayesian method performs remarkably inferring the LW trajectories from given unclassified trajectory data set if the noise level is moderate.

It is often challenging to pinpoint the anomalous diffusion model underlying observed complex diffusion dynamics and determine the value of model parameters (e.g., the anomaly exponent).The former is understood as model classification and the latter as parameter estimation.To get proper information on these tasks, conventionally, one analyzes the experimental time-series data with various dynamic measures such as MSD, Van-Hove correlation functions, steplength or flight-time distribution, and ergodicity breaking parameter [31][32][33][34][35][36][37].However, the interpretation of the data can be subjective and sometimes becomes highly nontrivial if the given data are not ideal.The non-ideal conditions include insufficient samples in terms of length and number, the high noise level in the data, and the spatiotemporal heterogeneity in the samples.To overcome these difficulties, recently, new approaches, e.g., Bayesian inference and machine learning, have been developed with keen interest from cross-disciplinary sciences [10,[38][39][40][41][42][43][44][45][46].
In this work, we develop a Bayesian inference method for analyzing Lévy walks.Bayesian inference provides a direct comparison between the possible statistical models and finds the most likely model parameters over a given parameter space by calculating the likelihood functions of given data.In a Bayesian framework, one does not have to define point-estimators and extract the statistical features (e.g., MSD or van-Hove correlation function).From a single raw trajectory, diffusion models and model parameters can be examined to derive the best-fit model and its model parameters.Recently, some efforts have been devoted to developing the Bayesian framework for Brownian motion, fractional Brownian motion, and diffusing diffusivity [10,[42][43][44][45].These studies have proven the success of the Bayesian inference approach in the trajectory analysis.
The Lévy walk model is a physical correction of Lévy flights [27].The corresponding trajectory consists of independent ballistic flights with constant speed v and the random flight times governed by a Lévy distribution.Lévy walks are known to model various non-equilibrium dynamic processes in biology, ecology, physics, and economics.Prominent examples include foraging dynamics of bacteria, animals, and humans [16,[47][48][49][50][51][52][53], active transport of macromolecules inside a living cell [54][55][56], blinking dynamics of quantum dots [57], light in inhomogeneous media [58], a flow motion in a rotating annulus [59], and related Hamiltonian systems [60,61].For modeling animal's foraging dynamics in ecology, a Bayesian inference approach was applied to differentiate Lévy flights/walks from composite correlated random walks via calculating their likelihood functions [46].Their method, however, was based on manually-extracted step lengths and turning angles from trajectories.Moreover, the correlation between the step length and flight time-the essential ingredient of a Lévy walk-was not taken into account in the likelihood function.
Previously, we developed Bayesian inference frameworks for several anomalous diffusion models [10,[43][44][45].In this work, based on these experiences, we set the Bayesian inference method for Lévy walks.Our approach is based on a Markov decomposition method that mathematically describes a Lévy walk process without the manually-extracted distribution features.Using this method, we construct the likelihood function of a Lévy walk trajectory systematically as a function of trajectory length, the power-law exponent of the flight-time distribution, and the noise strength.
The present paper is organized in the following structure.In Sec. 2 we give a brief overview of Bayesian inference theory and the Lévy walk model that are used throughout the paper.In Sec. 3, we propose a theory of the hidden Markov model for Lévy walks.This model is employed to compute the likelihood function of a one dimensional Lévy walk trajectory based on the forward algorithm (we comment on a generalization to two dimensions in the B).Secs. 4 & 5 present the computational results of our Bayesian inference method.Here, we demonstrate that random flight times governed by a power-law distribution are generated from the hidden Markov model.We also visualize the numerically calculated likelihood functions at various conditions.We then systematically investigate the performance of our Bayesian inference tool for extracting the power-law exponent of Lévy walk trajectories upon varying the trajectory length, the exponent α, and the signal-to-noise ratio.In Sec. 5, we study the model classification using the Bayesian method differentiating among Lévy walk, fractional Brownian and scaled Brownian trajectories at various conditions.Finally, in Sec.6 we summarize our study and discuss the generalization of our Bayesian inference theory of Lévy walks for the application to other similar anomalous dynamic processes.
2 Theoretical background

Bayesian inference method
In this subsection, we provide a short overview of Bayesian inference in connection with our Bayesian approach to Lévy walks in the following section.Here we introduce key concepts as well as define essential probability distributions to be used later.For a complete overview of Bayesian inference, see Ref. [62].Suppose that a data set D is given, and we have several statistical models M i to explain the data.For a quantitative comparison between the models, in principle, we need to estimate the conditional probability P (M i |D) that the model M i is selected to explain D. The best-fit model is the one that maximizes the given conditional probability.According to Bayes' theorem [63], P (M i |D) is given by where P (M i ) is the prior probability that the model M i is given before our observation.P (D|M i ) ≡ Z i is the conditional probability that the data D is observed when a model M i is given, referred to as evidence of a model M i .
Assuming that all the prior probabilities are the same, i.e., P (M i ) = P (M j ) for all (i, j) pairs, then the probability that the model M i is true among N possible models is given by the ratio of the model evidence: This is the key idea in the Bayesian approach towards model inference.The best-fit model can be found via the comparison of the model evidences.
To calculate the model evidence, we define the likelihood function L i ( θ i ) ≡ P (D| θ i , M i ) where θ i is the parameter set necessary to specify the model M i .The model evidence (marginal likelihood) is then evaluated by marginalizing the likelihood function L i ( θ i ) over the parameter space Θ with a weight π( θ i ): Here, π( θ i ) ≡ p( θ i |M i ) is a prior probability for the parameter set θ i given based on additional information or intuition [62].
The Bayesian inference approach is effectively used to find the best parameter set for a given data set if a specific model is given or the best model is determined from the above model inference.The posterior probability distribution of the parameters is given by and we report the medians of this distribution as the estimated values of the parameters of a given model.It is computationally nontrivial to perform the marginalization of the likelihood function [Eq.( 4)] and to sample over the posterior distribution (5).For this task, here we use the method of annealed importance sampling [64].For further information on this method, see our MATLAB code for annealed importance sampling in the AnDi challenge GitHub [65].

The discrete-time Lévy walk
In the Lévy walk model, a flight time τ of a ballistic run with speed v and its step length ∆x is given by the joint probability Because of the constraint between flight time and step length, the whole dynamics are solely determined by characteristics of the flight time distribution ψ(τ ).In the Lévy walk we consider the class of ψ(τ ) that asymptotically behaves as a power-law function at large times with a power-law exponent γ (i.e., the stability index of a Lévy distribution) of 0 < γ ≤ 2. The Lévy walk has distinct diffusion dynamics depending on γ [27]: the diffusion becomes ballistic for 0 < γ < 1, sub-ballistic superdiffusive for 1 < γ < 2, and diffusive for γ ≥ 2. In this work, we are interested in the sub-ballistic Lévy walk where the MSD increases with time as [27] x 2 (t) ∝ t 3−γ at t → ∞. (8) Thus, in this range of γ, the index of flight-time distribution [Eq.(7)] is related to the anomaly exponent via For simplicity, hereafter, we denote the flight-time distribution ψ(t) in terms of α [via Eq. ( 9)] throughout the text.
While the Lévy walk is defined with ψ(τ ) in the continuous time-domain of τ > 0, time series in the experiment, such as particle trajectories obtained from a single-particle tracking experiment, are usually given in unit of a time lag that is bound by the time resolution of a measurement.To apply our Bayesian inference approach to such data, we here introduce a discrete-time Lévy walk.Practically, the Lévy walk trajectory is often generated in this way.The anomalous diffusion simulation package (the AnDi package [66,67]) used in this work provides the discrete-time Lévy walk.
In the discrete-time Lévy walk, the flight times are integers, sampled by the following two steps.(1) A continuous random time τ (> 0) is sampled by the transformation using a uniform random number u in (0, 1) and The normalized probability density function for τ is  (2) The integer part of τ is taken using floor function τ = τ .After the rounding-off, the probability mass function for τ is given by From now on, we call Eq. ( 11) the continuous target distribution and Eq. ( 12) the discrete target distribution.With the integer random times given by ψ target d (τ ), we can generate a discrete-time Lévy walk trajectory parametrized by θ = {v, α, σ noise } via the governing equation Figure 1(a) shows a schematic description of our discrete-time Lévy walk.For given observation time t, the walker (or particle) has N complete flight events and an incomplete flight event.The duration time for the latter is described by a residual time τ res = t − N i=1 τ i .The velocity of the walker during the i-th flight is (−1) ri v where r i = 0 or 1 with an equal probability.Additionally, we consider a Lévy walk in a noisy environment.Physically, the noise may originate from localization errors during the tracking measurement or the thermal random agitation in the heat bath.Thus, in a complete picture, we invent the so-called noisy Lévy walk trajectory by superimposing a Gaussian random noise ξ t to the noise-free trajectory in Eq. ( 13).The Gaussian noise is characterized by ξ t ∼ N (0, σ 2 noise ) where v/σ noise (or σ D /σ noise after standardizing the trajectory in the later section) is understood as the signal-to-noise (SNR) ratio.Figure 1(b) shows examples of the noisy Lévy walk trajectories for varying SNRs.When the noise level is significantly high (SNR = 1), the noisy Lévy walk no longer looks like a Lévy walk.

Hidden Markov model for Lévy walks & Bayesian inference 3.1 Hidden Markov model for Lévy walks
In this section, we develop our theoretical formalism of the Bayesian inference framework for Lévy walks.For this, we first build up a solvable model that appropriately describes the discrete-time Lévy walk process in the scheme of the Bayesian formalism.Our approach is to establish a hidden Markov model for non-Markovian processes governed by a power-law relaxation dynamics using the Markovian decomposition method [68].Now let us consider a flight-time distribution with a power-law decay up to a time T , which is referred to as a cutoff time and usually considered the maximum observation time window in the measurement.Such power-law distributions can be represented by the superposition of a finite number of independent exponential distributions (i.e., Ornstein-Uhlenbeck processes); see the method developed in Ref. [68].In this approximation scheme (with a slight modification), the continuous target distribution can be expanded in terms of multiple exponential distributions as such: where the inverse of the relaxation time of the i-th component and its statistical weight are, respectively, given by See the A for the validation of Eq. ( 15).Here, b is a scale parameter, k fast is the high-frequency cutoff, and N k determines the long-time exponential cutoff T ≈ (4 − α)b N k /k fast .We set b = 4, k fast = 8, and N k = 5 through the work.In this scheme, we regard the power-law distributed flight times (within the observed time window) as the inter-arrival times among N k -component independent Markov processes.The sojourn time in the state s i has the probability density function The state transition from state s i to state s j occurs with a probability P j;α when the process leaves the state s i .This process gives the inter-arrival time distribution Eq. ( 15) in the form of Now we set up the proper rounding-off process to make discrete-time Markov chains of N k components having the inter-arrival times governed by ψ target d (τ ).The walker's state is renewed at every time unit such that it is changed to one of the other states (transition) or maintains the same state (self-transition).Figure 2(a) illustrates how the transition and self-transition events are determined from the continuous inter-arrival times.Let us assume that at t = t 0 the walker (Markov chain) is in the state S t0 = s i and samples the transition time τ according to φ i (τ ).Here S t denotes the walker's state at time t.(i) If 1 ≤ τ < 2, it is allocated to τ = 1 from the rounding-off process.Thus, the walker in the next time (t = t 0 + 1) should change its state to a new one, say, S t0+1 = s j .In this case, the state transition occurs with the transition probability Self transition ii;α and P ij;α , which are given by ( 25) and (24), respectively.
(ii) If τ ≥ 2, then τ ≥ 2 and the walker stays in the same state at t = t 0 + 1 (S t0+1 = S t0 = s i ).This is the case called the self-transition, which occurs with the transition probability (Beware that the event of τ < 1 is the null case because such events are not allowed from φ(τ )).The probability that a walker in the i-th state has an integer sojourn time τ is which can be understood in the Markov chain description as the probability that the walker maintains the same state (τ − 1) times and then changes its state, i.e., Therefore, by simply repeating the above process every time step, we obtain the process having integer inter-arrival times of τ ≥ 1 with the distribution Figure 2B illustrates our Markov chain scheme based on the above formalism.Here we incorporate the fact that the Lévy walk (in 1D) has two velocity states, ±v, for given flight time.This is described in our model such that the state s i has two substates s + i (for +v) and s − i (for −v) with the same transition rate k i .Thus, there are 2N k states in total in the model and the transition probability from the i-th state to the j-th state is given by As shown in Fig. 2(b), the change of the velocity state from s + i (s − i ) to s − i (s + i ) is considered as a transition with the probability P ii;α .On the other hand, if the process changes its state from s this is effectively a self-transition.The effective probability of self-transition is then given by We numerically confirm our hidden Markov model for discrete-time Lévy walks in Sec.4.1.

The likelihood function
The above discrete-time Markov chain describes the dynamics of the latent states S t .For given parameter set θ = {α, v, σ noise } and S t , the conditional probability of observing ∆x t = x t+1 −x t , i.e., the so-called emission probability, is given by In this expression, ∆x t − v corresponds to the state S t = s + i (i = 1, . . ., N k ) and ∆x t + v to the state S t = s − i , respectively.The factor √ 2σ noise is attributed to the fact that the displacement is noise ).We consider this term as an approximate way to include measurement noise for the positions, i.e., we should really have had ξ t = ξ t+1 − ξ t with the ξ t independent.This approximation ignores the correlations between steps that measurement noise induces, and we will see later that it does not work well when σ noise is comparable to the Levy walk step size, i.e., when SNR is not larger than unity.For brevity, below we omit the notation of M Lévy in the conditional probabilities.
In the discrete-time Lévy walk, a trajectory x t can be represented by a series of the unit-time displacements D = {∆x 1 , ∆x 2 , • • • , ∆x T }.Thus, when the latent state is given by S = {S 1 , S 2 , • • • , S T }, the probability of finding D (or x t ) is written as the product of the emission probabilities We should marginalize Eq. ( 27 In this work, we use the forward algorithm method to calculate Eq. ( 28) for obtaining the corresponding likelihood function [69].The summation over S is carried out in an iterative way with the help of the probability function This probability function satisfies the recurrence relation In this expression, we figure that P (∆x t |S t = s ± i , θ) is the emission probability (26) and is the transition probabilities between two distinct states [Eq.(24)] or between the same state (self-transition) [Eq.(25)].The initial condition of the recurrence relation is The initial distribution of the states P (S 1 = s ± i ) is given by Eq. ( 17) provided that the process starts at t = 0 (or more generally, the initial time t = 0 is the renewal time).Alternatively, if the process is observed after a long time where it has become stationary, then an extra factor of the average length of the time intervals, C/(1 − e −ki ), should  be multiplied on the right hand side of Eq. ( 17) to obtain the initial distribution.C is a normalizing constant of the distribution.
The recurrence relation Eq. ( 31) is given in terms of the transition probability, emission probability, and the initial distribution of the states that can be computable.Therefore, starting from Eq. ( 31), we can iteratively obtain β(S T = s The likelihood function is then simply obtained by marginalizing β(S T = s ± i ) over all possible states: Note that changes in direction and observations of the Lévy walker can only happen at integer values of time.In case observations happen with a camera that is operated independently of the system it observes, this can be considered an approximation that is valid in the limit of the camera frame rate being much faster than the typical interval between changes of direction.

Prior distributions
For Bayesian inference, we need to specify the prior distributions for parameters θ = {α, v, σ noise }.We use a uniform prior distribution for the anomaly exponent α based on the fact that our Lévy walk describes a superdiffusive and Fickian dynamics with the anomaly exponent in the range of 1 ≤ α ≤ 2. The prior distribution is set to be P (α) = 1 for 1 ≤ α ≤ 2 or zero otherwise.For the walker's speed v(> 0), we use a folded Gaussian distribution in the positive regime.We use the Gaussian distribution because the trajectories we analyze in Sec. 4 have a standard normal distribution of speed v after standardization.The pre-processing (standardization) of the trajectories is further explained in Sec.4.2.In the case of the background noise strength σ noise , its magnitude compared to the signal can be arbitrary.Here, we use a folded Cauchy distribution P (σ noise ) = 2 π(1+σ 2 noise ) with σ noise > 0 as a prior distribution.The Cauchy distribution is a heavy-tailed distribution, allowing σ noise to be an arbitrarily large value.

Results I: Numerical implementation of the Bayesian inference and parameter estimation 4.1 Numerical implementation of the hidden Markov model
We have numerically implemented our hidden Markov model for the discrete-time Lévy walk.In Fig. 3 we plot the flight-time distributions obtained from the simulated inter-arrival times of the Markov chain at α = 1.0, 1.3, 1.6, and 2.0.We allocate 5000 random walkers on every Markov chain (Fig. 2(b)) and collect the inter-arrival times for the transition events between distinct latent states up to t = 1000.For all cases, we find the numerically obtained interarrival times (gray lines, Fig. 3) are in excellent agreement with the expected power-law target distribution (12) (red lines).While the longest exponential cutoff is expected to start at T ≈ O(300), the power-law scaling in the numerical data seems to extend longer than T .The tail of the distribution for t T is noisy because this part is sampled from the rare events of the N k -th latent state.The exponential cut-off time T gets shorter as α increases.For α = 2, the distribution shows the expected power-law up to T ≈ 256, and then it is slightly off the power-law due to the exponential cutoff.
We numerically obtain the likelihood function by the iterative method [Eq.( 32)] and confirm the validity of our method.For this purpose, we generate Lévy walk trajectories of length 100 time steps with the flight-time exponent α = 1.6 and v ≈ 5.92 (i) without the background noise and (ii) with the noise with SNR = 2.We estimate the likelihood function for the trajectories over the parameter space v ∈ [0, 10], α ∈ [1, 2], and σ noise ∈ [0.01, 5]. Figure 4 visualizes the numerically estimated likelihood functions.In the figure, the heatmaps show the likelihood functions as a function of v and α that are marginalized over the background noise σ noise ; the line plots are the likelihood function as a function of α marginalized over both σ noise and v.All the likelihood functions are divided by their maximum value L max for normalization.(1) The noise-free Lévy walk trajectory [Fig.4(a)].When such ideal Lévy walk data are given, the likelihood function expectedly produces the maximum probability at the parameter (v, α) that is very close to the true value.It is noted that the shape of the likelihood function around the maximum is dependent on the parameter.While the likelihood function appears to be a sharply peaked function for v, it is a broad distribution for α (see the line plot).This is because there are much more samples (displacement vectors) for inferring v than for inferring α that relies on fewer samples of flight times.(2) The noisy Lévy walk trajectory [Fig.4(b)].We find that the noise makes the likelihood function dispersed in the parameter space v while the profile of the likelihood function for α is almost unaffected by the noise.The increased uncertainty for v is understandable.The speed of a flight can be fluctuating by the noise in the unit of time step.Despite the dispersion, the likelihood function still has the maximum at around the input speed.This suggests that the Bayesian approach can successfully estimate the true parameter value (α & v) even from noisy trajectories.

Parameter estimation via the Bayesian inference
Based on the likelihood function constructed above, we have implemented a Bayesian inference method to estimate model parameters from the trajectory data.The Lévy walk is characterized by three model parameters: the power-law exponent of flight-time distribution γ = 3 − α (or, equivalently, the anomaly exponent α), the walker's flight speed v, and the background noise strength σ noise .In this work, we focus on inferring the power-law exponent of the flighttime statistics α for given Lévy walk trajectories.The other parameters, such as v, can be easily inferred by available statistical approaches, e.g., by averaging absolute values of unit displacements.
For our study, we generate a total of 1100 Lévy walk trajectories with various trajectory lengths N step ∈ {10, 100, 200, • • • , 900, 1000}.The trajectories are produced by the AnDi package [66].The anomaly exponent α and the speed v are randomly given such that α ∼ unif(1, 2) and v ∼ unif(0, 10), respectively.We standardize the trajectories using the protocol described in Ref. [70], in which the trajectories are regularized so that the standard deviation of the displacement distribution has σ D ∼ N (0, 1).This pre-processing is required to avoid the unexpected effect of step sizes on both the parameter estimation and model classification.We then add the three different levels of Gaussian noises to the 1100 trajectory with an SNR = σ D /σ noise ∈ {1, 2, 10}, thus in total generating 3300 noisy Lévy trajectories.
We perform our Bayesian inference analysis to these noisy Lévy walk trajectories and estimate the anomaly exponent α, more precisely, the power-law exponent of the flight-time distribution.In Fig. 5, we compare the predicted α to the ground-true value α GT used for generating the data.In each scatter plot, the solid line is the average of the predicted α for given α GT .When the background noise is not significant (SNR = 10, 2), the predicted α (solid line) is in good agreement with α GT .We confirm that the estimation of α via the Bayesian inference is more accurate than that from the fitting of the MSD curve (the dashed line).When the noise becomes comparable with the signal (SNR = 1), both methods show poor performance.
For a more quantitative examination of the performance of our Bayesian inference method, we measure the absolute error (AE), |α i − α GT,i |, as a function of N step , α GT , and SNR.In AE, α i and α GT signify the predicted α and α GT for the i-th trajectory, respectively.In Fig. 6(a), we show the mean absolute error (MAE) as a function of N step .For SNR = 10 and 2, the increase of trajectory length leads to more accurate estimation.For all trajectory lengths, the Bayesian method performs much better than the fitting from MSDs.For the case of SNR = 1, on the contrary, the inference goodness is not improved even if the length is increased.In this case, intense noises impede the accurate inference of flight times.
Figure 6(b) shows the MAE from all trajectory lengths as a function of α GT .For SNR = 10 and 2, the inference goodness is almost insensitive to α GT although the error seems to be relatively larger at α GT = 1 and 2. Note that even at SNR = 10, the MAE from the fitting method (MSD) is considerably high for these cases.When the noise level is too high (SNR = 1), the inference performance tends to be poorer as α GT is close to unity.This tendency occurs because the zigzag flight patterns frequently occurring in Lévy walk trajectories with α GT ∼ 1 are easily confused with the noisy dynamics (when an SNR is small), so the short flight times cannot be correctly extracted.Consequently, as seen in Fig. 5 (SNR = 1), the Lévy walk trajectory with α GT ≈ 1 is inferred to the Lévy walk with a predicted α > 1.
In Fig. 6(c) we examine the effect of SNR on MAE.The result shows that MAE abruptly decreases with increasing an SNR from 1 to ≈ 2, and after then it saturates for SNR 2. This suggests that under a noisy environment the Bayesian inference method extracts the information of α (i.e., the power-law exponent in the flight-time distribution) with the accuracy at the noise-free condition.Surprisingly, such a robust performance is achievable against the noise level as significant as SNR ≈ 2. We also notice that even in the strong noise condition (SNR 1), the Bayesian inference is better than the fitting of MSD method in the parameter estimation.

Results II: Model classification
In this section, we apply our Bayesian inference method for classifying models and comparing their likelihoods.Our task is to compare several diffusion models for a given trajectory in the scheme of Bayesian inference and find the most probable model in accordance with the data.For this task, we additionally consider two diffusion models-apart from the Lévy walk (LW)-describing anomalous diffusion with 0 < α ≤ 2. The two anomalous models are fractional Brownian motion (FBM) [26] and scaled Brownian motion (SBM) [29,30].We refer to a recent review paper [1] for the mathematical introduction of these models and their applications to biology and various physical systems.
Although the three models (LW, FBM, and SBM) can describe the superdiffusive process quantified by the MSD (1), they are based on distinct theoretical context and physical mechanisms.Accordingly, their statistical characteristics are distinguished, enabling us to use the Bayesian approach for model inference.
For a numerical test, we generate trajectory data for the three diffusion models.The data preparation is the same as in Sec.4.2 for the noisy Lévy walk trajectory.For each model, there are a total of 1100 trajectories with N step ∈ {10, 100, • • • , 900, 1000} (and 100 trajectories for given N step ).The α is randomly sampled from [1,2] for the three diffusion models.All of these trajectories are standardized and superimposed with background noise in the same way described in Sec.4.2.The likelihood functions for FBM [44] and SBM [71] are available from our previous work.We refer to Ref. [44] for our Bayesian inference of FBM.Based on that, we have implemented the MATLAB code for the Bayesian model inference for LW, FBM, and SBM [65].
Figure 7 shows the confusion matrices summarizing the performance of the Bayesian model classification upon the above-prepared data set.When SNR ≥ 2, our Bayesian inference gives high true positive rates for all three processes.Especially, the Lévy walk trajectory is remarkably well classified by our Bayesian method.On the contrary, if the noise level is significantly high (SNR = 1), the Lévy walk is the most poorly inferred model among the three models.In this case, the Gaussian noise tends to make the Lévy walk falsely detected as FBM (a Gaussian and stationaryincremental process).The high true positive rates for FBM and SBM at SNR = 1, comparable to those at SNR = 2 and 10, are attributed to the same effect.
For more quantitative analysis of the model classification, we study error rates of the Lévy walk model as a function of N step , α, and SNR.We estimate the false discovery rate (FDR) and false negative rate (FNR) from the confusion matrices.The former is the portion of FBM and SBM trajectories in the set of trajectories classified into LW, and the latter is the portion of misidentified Lévy walk trajectories (as FBM or SBM) in the set of LW trajectories.
Fig. 8(a) and (d), show the FDR and FNR as a function of trajectory length.The FDR tends to decrease with trajectory length irrespective of an SNR.Interestingly, FDR at SNR = 1 seems to be smaller than that for SNR ≥ 2 for most trajectory lengths.It appears that non-LW trajectories have an increased chance of being classified into our Lévy walk model when an SNR is sufficiently large.This tendency can be seen in Fig. 7 and Fig. 8(c).The FNR is shown to be very small (except for SNR = 1) and tends to decrease with the trajectory length.We note that FNR behaves against the noise in the opposite way with FDR, such that a higher noise level leads to an increased FNR.As explained, at  the highest noise level (SNR = 1), the LW trajectories are apt to be classified into FBM.Here, increasing trajectory length elevates the FNR because longer trajectories contain more evidence on the noise contribution and, subsequently, increase the chance of being falsely detected as FBM.
In Fig. 8(b) & (d), we examine the variation of FDR and FNR as a function of α GT .There is a clear tendency that the FDR increases as the ground true α approaches unity (Fig. 8(b)).This is because when α = 1 the three models converge to the same process, i.e., Brownian motion.Hence, when a given (anomalous) diffusion process has α ≈ 1, it is very difficult to pinpoint the correct underlying diffusion model.Ideally, the FDR should be 2/3 at α GT = 1 if there are three models.However, FNR (at SNR = 2 and 10) is rarely affected by the ground truth anomaly exponent.We find from further analysis of the data that ambiguous Brownian-like trajectories around α GT ∼ 1 tends to be classified into LW than FBM or SBM.The rise of FNR at SNR = 1 is the falsely detected FBM from the LW trajectories.

Discussion & Summary
In this work, we have proposed a hidden Markov model for Lévy walks (LWs) and, using this model, developed Bayesian inference tools for the analysis of LW-like trajectory data.The essential part of our theory was to approximate a renewal process governed by a power-law sojourn time distribution with a Markovian decomposition method [68].
We have shown that a power-law flight-time distribution can be generated by this method, enabling us to construct the likelihood functions of an LW.Using the Bayesian inference tool, we have demonstrated to extract the value of the power-law exponent of flight-time distribution (γ = 3 − α) or, equivalently, the anomaly exponent α from given LW-like trajectories (containing background noises).It has been confirmed that the Bayesian inference method outperforms the traditional fitting method by extracting α from an MSD.When the noise level is negligible (SNR = 10), the accuracy of parameter estimation is as high as that by the top-performing machine learning methods recently developed [70,72].For the case of 2D Lévy walk trajectories, remarkably, our Bayesian inference method is shown to perform better than the top-performing machine learning tool (which will be separately reported elsewhere).Apart from the parameter estimation, we have employed our Bayesian method to classify anomalous diffusion models.For three distinct diffusion processes (LW, FBM, and SBM) superimposed with noises, our Bayesian method successfully infers the correct model with a high true positive rate when the noise level is low (SNR = 10, 2).
Our Bayesian inference method based on the hidden Markov model can be extended to analyzing multi-dimensional data or potentially applied to other similar problems.Below we provide some remarks on these issues.
Firstly, it is straightforward to generalize the current model for the analysis of two-dimensional (2D) Lévy walks.The 2D LWs are applicable to the modeling of cell motility on the surface [73,74], the projected 2D motion of actively transported cargoes in the cell [56,75], swarming dynamics of bacteria [12,76], and various foraging dynamics in ecology [15,[47][48][49].We can define a Lévy walk in 2D and construct a hidden Markov model for 2D LWs.The method is briefly sketched in the B. A full investigation of the Bayesian inference method to 2D LWs will be reported elsewhere.One remarkable result is that the estimation of α is more accurate in 2D than in 1D.This is attributed to the fact that successive flight events are more easily identified in 2D, which leads to the increased performance in the parameter estimation of 2D LWs at a higher SNR.
Secondly, our Lévy walk Bayesian model can be easily expanded to include composite correlated random walk (CCRW) processes via an analogous Markovian decomposite method introduced in this work.The CCRW is a multistate random walk in which each state has a short-term (Markovian) directional memory leading to an exponential step-length distribution [46,77].The foraging dynamics of animals is an important example of the CCRW [16,78,79].
The stochastic properties of a CCRW are governed by the superposition rule of the multiple Markovian states.In the framework of CCRWs, the Lévy walk can be regarded as a special model where the superposition is specified by Eqs. ( 16) and ( 17), leading to a power-law flight-time (or step-length) distribution.The CCRW may have a different step-length distribution depending on how multiple Markovian states are superposed.By properly modeling this part, a specific CCRW can be constructed and inferred by the corresponding Bayesian method.A potential application is to identify a correct diffusion model for animal's foraging dynamics.There is a stimulating effort in ecology for modeling the foraging motion [16,[78][79][80][81][82].Often the analysis of the data was based on the step-length distribution (or the inverse cumulative distribution of the step lengths), which led to a debate on whether the data follows a Lévy walk having a power-law step length distribution or a different CCRW [16,46,78,79].Using the Bayesian method, one can analyze the data in a different manner and efficiently differentiate between several candidate models.
Thirdly, our hidden Markov model is applicable to construct Bayesian inference methods for variant Lévy walk models.For example, one can expand the current hidden Markov model for a Lévy walk with fluctuating velocities or a Lévy walk with a rest [27].The latter model was shown to explain the fluid dynamics in a rotating plate and the transport of mRNA-protein complex particles in a neuron cell [54].The Bayesian method will be helpful for extracting the information of parameters in the flight-time and rest-time distributions.Beyond the Lévy walk, once the noise correlation in the successive displacement is adequately incorporated, the current hidden Markov model can be applied to continuous-time random walks with a power-law waiting time distribution.
Finally we comment on some potential technical issues of the current Bayesian approach.The high computational cost is often required for Bayesian inference, which needs to sample parameters from high dimensional posterior distributions and calculate likelihood functions for every sampled parameter.It costs hundreds to thousands of seconds to quantify a trajectory of hundreds of steps.Notwithstanding, it is computationally feasible to handle thousands of trajectories of which step length is of < 1000.In our study, we analyzed such trajectory data within a day using a 40multicore workstation.Additionally, a relevant task is to improve the Bayesian inference method under a strong noise signal.In this work, we have found that the Bayesian inference analysis is vulnerable to noise when SNR ∼ 1.We anticipate that if the noise effect term in the likelihood function [Eq.( 26)] was incorporated without approximation, the Bayesian inference would have a better performance to noisy trajectories under the condition of strong noise signals.
We leave this task as future work.
over angle i to s n j is given by P ij;α [Eq.(38)], and the self-transition probability is given by P self ii;α [Eq.(37)].where Σ n = R(θ n )Σ 1 R(θ n ) T is a covariance matrix for Gaussian fluctuations of v n , θ n is an angle from x-axis of a vector v n , and R(θ n ) is a rotation matrix.From the Markov chain illustrated in Fig. 9(b) with transition probabilities (37) and (38) and the emission probability (39), the likelihood function for 2D Lévy walks can be calculated using forward algorithm as in Eqs. ( 27)- (32).The likelihood function for a 3D Lévy walk model can also be obtained in a similar way.

(a) (b) (c)
Figure 10 shows the performance of the parameter estimation.While most of the results are similar to the onedimensional case, we notice some interesting features: The accuracy of parameter estimation has been improved at SNR = 10 compared to the corresponding case in 1D. Figure 10 shows that mean absolute error decays faster than 1D as trajectory length increases.Moreover, the MAE does not change upon the variation of anomaly exponents (Fig. 10(b)).At SNR = 10, the overall MAE decreases by ∼ 0.02 compared to the 1D case (Fig. 10(c)).

Figure 3 :
Figure 3: Flight time statistics sampled from the inter-arrival times for our Markov chain model (gray lines).In the simulation of the Markov chain model we use the parameters N k = 5, b = 4, k fast = 8, and α = 1.0, 1.3, 1.6, and 2.0 (from Left to Right).The red lines show the discrete target distribution ψ target d

Figure 4 :
Figure 4: Heatmap of likelihood functions for a Lévy walk trajectory with α = 1.6 and the speed v ≈ 5.92.(a) The likelihood function for the Lévy trajectory without the noise (σ noise = 0).(b) The likelihood function for the trajectory with σ noise = 3.In (a) & (b), the heatmap shows the likelihood function marginalized over σ noise as a function of v and α.The line plot represents the likelihood function marginalized over σ noise and v, where the dashed line shows the α at which the likelihood function gives the maximum value.

Figure 5 :
Figure5: The predicted α values from our Bayesian method vs the ground-true value α GT .From Left to Right, the plot correspond to SNR = 10, /2,, and 1.The solid line (red) shows the average of the predicted α values for given α GT .For reference, the dashed line depicts the corresponding curve for α estimated from the fitting of time-averaged MSD curves.We fit the linear part of the MSD curves in log-log plot, where the lag time ∆ ranges from 2 ≤ ∆ ≤ 21 for the trajectories whose lengths are over 100.For the shorter ones, we took 2 ≤ ∆ ≤ 8.The dotted line is the guide line of y = x.

Figure 6 :
Figure 6: Mean absolute error (MAE) from our Bayesian parameter estimation (line-dot) and from the fitting of MSD curves (dashed-dot).(a) MAE as a function of the trajectory length.(b) MAE as a function of α GT .Each data point represents the average value for all trajectory lengths at a given α GT with half bin width |∆α GT | = 0.05.(c) MAE as a function of SNR.Each data point represents the averaged AE for all trajectory lengths.For given N step , five hundred trajectories are generated with an SNR uniformly distributed over (0, 10].In x-axis, half bin width is |∆SNR| = 0.5.

Figure 7 :
Figure 7: Confusion matrices obtained from the Bayesian model classification.From Left to Right, the matrices show the results at SNR = 10, 2, and 1, respectively.The (i, j)-component in a matrix indicates the portion of trajectories classified into the model M j upon the true model M i .

Figure 8 :
Figure 8: Error rates obtained from confusion matrices as a function of the trajectory length, α GT , and SNR.(a) & (d) False discovery rate and false negative rate of LWs as a function of the trajectory length.(b) & (e) False discovery rate and false negative rate of LWs as a function of α GT with half bin width |∆α GT | = 0.05.(c) & (f) False discovery rate and false negative rate of LWs as a function of SNR.Each data point represents the averaged value over trajectory length and α GT .

Figure 9 :
Figure 9: (a) The velocity states in our 2D Lévy walk.There are N θ discrete velocity states specified by v n (n = 1, 2, . . ., N θ ).We allow the orientation of v n to fluctuate around the unit vector (cos((n − 1)θ), sin((n − 1)θ)), which is modeled by a Gaussian fluctuation N ( v n , Σ n ).The superposition of the Gaussian fluctuations (the blue solid line) approximates the uniformly distributed velocity states over angle.(b) The Markov chain model for the 2D discretetime Lévy walk.Each velocity states in (a) have the latent state s ni with i = 1, 2, . . ., N k .The superscript n denotes the velocity state.The transition probability from the state s m i to s n j is given by P ij;α [Eq.(38)], and the self-transition probability is given by P self ii;α [Eq.(37)].

Figure 10 :
Figure 10: Parameter estimation for 2D Lévy walk trajectories.(a) Mean absolute error (MAE) as a function of the trajectory length.(b) MAE as a function of the ground-truth anomaly exponent α GT .(c) MAE as a function of SNR.The dotted line is the result from the Bayesian inference and the dot-dashed line from the fitting of MSD curves.
event (top) is zero.If a walker samples 1 ≤ τ < 2 (middle), it changes the state from s ± i to s ± j with probability P j;α ( 1 2 P j;α for each of the two states s + j and s − j ).If a walker chooses 2 ≤ τ (bottom), it maintains its original state (S t+1 = S t ).(b) Schematic illustration of hidden Markov model for one-dimensional Lévy walk.In this model, the '+' states (red circles) have positive velocity +v and '−' states (blue circles) have negative velocity −v.The probabilities of self-transitions and transitions for every time step are denoted by P self Figure 2: (a) Transition strategy of a random walker in our Markovian approximation of discrete-time Lévy walk model.A walker in a state s ± i samples transition time τ from φ i (τ ) every time step.The probability of τ < 1