Bayesian gradient sensing in the presence of rotational diffusion

Biological cells estimate concentration gradients of signaling molecules with a precision that is limited not only by sensing noise, but additionally by the cell's own stochastic motion. We ask for the theoretical limits of gradient estimation in the presence of both motility and sensing noise. We introduce a minimal model of a stationary chemotactic agent in the plane subject to rotational diffusion, which uses Bayesian estimation to optimally infer a gradient direction from noisy concentration measurements. Contrary to the known case of gradient sensing by temporal comparison, we show that for spatial comparison, the ultimate precision of gradient sensing scales not with the rotational diffusion time, but with its square-root. To achieve this precision, an individual agent needs to know its own rotational diffusion coefficient. This agent can accurately estimate the expected variability within an ensemble of agents. If an agent, however, does not account for its own motility noise, Bayesian estimation fails in a characteristic manner.

Previous work suggested that bacteria use Kalman filters to track a time-dependent concentration signal, providing an optimal weighting of past and recent measurements [26], see also [25]. Yet, Kalman filters address linear problems [27], while sensing a direction is a nonlinear problem of circular statistics, which prompts Bayesian updating of an angular likelihood distribution at each time step [28]. Bayesian estimation had been successfully applied, e.g., for monitoring time-varying environments with two states [29], estimating absolute concentration [30][31][32], or temporal changes thereof [24,33], as well as classification tasks [34,35], and even decision making in the immune system [36].
The specific problem of Bayesian sensing of direction was addressed previously [11,12,15,37], yet without considering motility noise. Likewise, the infotaxis algorithm, which computes a likelihood map for the position of a hidden target, does not include motility noise [38]. In the engineering literature, directional sensing is known as 'bearing tracking', and estimation algorithms do indeed account for motility noise [39,40]. However, to the best of our knowledge, an analytical theory of optimal directional sensing that accounts for motility noise is missing.
Here, we derive theoretical limits for the precision of gradient sensing by chemotactic agents such as biological cells in the presence of both sensing and motility noise. We consider a minimal model of a chemotactic agent in the plane that attempts to track the direction of an external concentration gradient. The agent performs noisy concentration measurements, while it undergoes rotational diffusion. This agent integrates subsequent measurements into a likelihood distribution of possible gradients using Bayesian updating.
Our manuscript is structured as follows: We first briefly review Bayesian gradient sensing without motility noise to introduce notation. We recapitulate how temporal averaging improves the precision of gradient estimates as a function of measurement time. We then introduce motility noise and consider an agent subject to rotational diffusion. This agent, however, is first not aware of its own motility noise, which results in grossly erroneous gradient estimates. In contrast, as our main result, we show how an agent that only knows its own rotational diffusion coefficient D can obtain optimal estimates of gradient direction, as well as a self-consistent estimate of the accuracy of this estimate, i.e., the expected dispersion in an ensemble of agents, which scales as D −1/2 . This optimal gradient-sensing strategy corresponds to temporal averaging over a time span that likewise scales as D −1/2 . We discuss why this new result for gradient-sensing by spatial comparison is different from previous results for chemotaxis by temporal comparison, which predicted a substantially longer optimal time span of temporal averaging that scales as D −1 [25].

MINIMAL MODEL
We consider a chemotactic agent in the plane, see Fig. 1. Orthonormal vectors h 1 and h 2 define its material frame. The agent is subject to rotational diffusion with effective diffusion coefficient D = D rot (with units of inverse time), i.e., h 1 (t 0 ) · h 1 (t 0 + t) = exp(−D|t|). For simplicity, the agent is stationary with center position R 0 .
The agent seeks to estimate the strength and direction of an external concentration gradient with base concentration c 0 , dimensionless gradient strength α 0 = |∇c|a/c 0 (normalized by a sensing length-scale a set by the dimensions of the agent), and gradient direction vector g = cos ψ 0 h 1 + sin ψ 0 h 2 of unit length, such that ψ 0 denotes the gradient angle enclosed by h 1 and g. The concentration gradient Eq. (1) represents an unknown state of the environment, S true (t) = (c 0 , α 0 , ψ true (t)), where ψ true (0) = ψ 0 at time t 0 = 0. The agent shall be equipped with N sensors placed equidistantly on the circumference of a disk of radius a at respective positions r j = R 0 + a cos(2πj/N ) h 1 + a sin(2πj/N ) h 2 , see Fig. 1A. To simplify future calculations, we introduce complex vector notation h = h 1 + ih 2 , which gives, g = Re e iψ0 h * and r j = R 0 + Re η j h * , where η = exp(2πi/N ) denotes the N th root of unity. Each sensor detects stochastic binding events of molecules with rate Λ j = λc j /N proportional to local concentration c j = c(r j ) = c 0 [1 + α 0 cos(2πj/N − ψ 0 )] for j = 1, . . . , N . The sensor count n j during a time interval τ becomes a Poissonian random variable with expectation value n j = n j = Λ j τ and variance n j . As fusion of this sensor information, we consider its Fourier transform n k = N j=1 n j η jk for k = 0, . . . , N − 1 (where we use the complex conjugate of the discrete Fourier transform to simplify formulas below). In a linear concentration field as given by Eq. (1), only the first two Fourier coefficients have non-zero expectation values, n 0 = ν 0 , n 1 = α 0 ν 0 e iψ0 /2 (for N ≥ 3), whereas n k = 0 for 2 ≤ k ≤ N − 2. Correspondingly, we assume that the agent stores only a measurement vector M = ( n 0 , n 1 , n 1 ) ∈ R 3 , where n 1 = n 1 + i n 1 denotes decomposition into real and imaginary part.
In the following, we consider discrete time dynamics with subsequent measurement intervals T j = (t j−1 , t j ) of duration τ and discrete rotational diffusion events at times t j = jτ , where the agent rotates by a random angle ∆ j that is normally distributed with zero mean and variance ∆ i ∆ j = 2Dτ δ ij , see Fig. 1B.
The agents estimates the state of the environment as S = c, α, ψ based on the sequence of measurements M 1 , . . . , M m taken during the time intervals T 1 , . . . , T m using a maximum-likelihood estimate as detailed below.

The measurement process for a single measurement
From the fact that the molecule counts n j are independent Poisson random variables, we readily find the expectation value µ 0 = M and covariance matrix Σ 0 = (M − µ 0 )(M − µ 0 ) T of a single measurement M if the true state is S 0 = (c 0 , α 0 , ψ 0 ). Interestingly, if the agent possesses at least four sensors, N ≥ 4, both µ 0 and Σ 0 are independent of the number N of sensors. For N = 2, gradient-sensing obviously becomes impossible if ψ 0 = ±π/2, while the precision of gradient-sensing (weakly) depends on ψ 0 for N = 3, i.e. it depends on the orientation of the agent relative to the gradient direction, see appendix D. For N ≥ 4, we find and covariance matrix where we introduced short-hand ν 0 = λτ c 0 , and ν = λτ c for later use. Interestingly, the covariance matrix Σ 0 possesses non-zero off-diagonal entries, i.e., n 0 ("measuring absolute concentration") and n 1 ("measuring a gradient") are not independent.
In the limit of large molecule counts, n j 1, we can employ a diffusion approximation, and approximate the probability distribution of each n j by a normal distribution with mean n j and variance n j . The discrete Fourier transform is a linear transformation, hence the distribution of measurement vectors M can likewise be approximated as a multi-variate Gaussian, using the mean values and co-variance matrix computed above, P (M | S 0 ) = N (µ 0 , Σ 0 ). A. In our minimal model, a chemotactic agents seeks to estimate an external concentration gradient ∇c of signaling molecules (relative to its material frame h1 and h2) by counting binding events at N sensor sites spaced equidistantly on the agent's circumference. During each measurement interval Tj of duration τ , the agent obtains molecule counts n1, . . . , nN , which are combined into a (Fourier-transformed) measurement vector Mj. Between measurements, the agent is subject to rotational diffusion with rotational diffusion coefficient D. B. The angle ψtrue(t) enclosed between gradient direction ∇c and material frame vector h1 becomes a random walk with stochastic increments ∆j. This motility noise limits the precision of the gradient estimate ∇c and its direction angle ψ as estimated by the agent. C. The relative direction of the concentration gradient represents a time-dependent state of the environment, Strue(t) = Sj for t ∈ Tj where Sj = (c0, α0, ψj). The agent computes a likelihood distribution L(S) of possible concentration gradients, iteratively executing a prediction step that accounts for its rotational diffusion (which flattens the distribution), and an update step that incorporates a new measurement Mm (which usually sharpens the distribution).

Signal-to-noise ratio
We introduce the signal-to-noise ratio of gradient sensing, SNR (with prefactor matching [41]), which characterizes sensing noise Here, we used Eqs. (2) and (3) in the last step, see also appendix A.1. Explicitly, SNR = λτ |∇c| 2 a 2 /(2c 0 ), i.e., the signal-to-noise ratio scales with measurement time τ . Below, we show that the SNR sets the precision of a single measurement.

Bayesian update rule for likelihood function
The agent computes a likelihood L m = L(S | M 1:m ) for each possible state S = (c, α, ψ) of the environment, based on all previous measurements M 1 , . . . , M m . The corresponding maximum-likelihood state estimate at time t m reads S m = argmax S L m . We are especially interested in the maximum-likelihood estimate ψ m of the gradient direction, and the estimated precision of this estimate (quantified below in terms of a so-called measure of concentration).
After each measurement, the agent updates the likelihood function L m (S), using Bayes' rule Here, P (M | S) is the probability to measure M given a specific state S (measurement process, approximated by Eq. (4)

Likelihood function in the limit of weak gradients
After a single measurement M 1 that yielded a measured angle ψ τ , i.e., n 1 = | n 1 |e iψτ , the likelihood function L 1 = L(S|M 1 ) for S = (c, α, ψ) reads which follows from Eqs. (4) and (6). Here, Λ = Λ(c, α, M 1 ) is a prefactor independent of ψ, see appendix A.3, and we used short-hand A = 2α/(2 − α 2 ) ≈ α for α 1. Of note, L 1 contains as a factor a von-Mises distribution for ψ, p(ψ) ∼ exp[κ τ cos(ψ − ψ τ )] with measure of concentration κ τ = n 0 | n 1 |A/ν, see also appendix B. In the limit of weak concentration gradients, α 0 1, this factor dominates (for likely S and typical M). The second exponential factor in Eq. (7), which results from the offdiagonal entries of the covariance matrix Σ, represents a von-Mises distribution for 2ψ. The corresponding measure of concentration | n 1 | 2 αA/(4ν) ∼ α 0 κ τ is small compared to that of the first factor (for likely S and typical M). Hence, we can approximate this factor by a constant.
The measure of concentration κ τ corresponding to a single measurement depends on M 1 and is thus itself a random variable. We can compute the expectation value of κ 2 τ exactly see also appendix A.1. For the first moment of κ τ , we find an analytic expression in the limit of high signal-to-noise ratio see appendix A.2. Accordingly, we can interpret κ 2 τ as the sum of a squared mean κ τ 2 ≈ SNR 2 , and a variance κ 2 τ − κ τ 2 ∼ SNR. The asymptotic scaling κ τ ∼ SNR is consistent with geometric intuition: the measure of concentration κ τ is closely related to the circular variance, which in turn can be estimated by considering a typical right-angled triangle in the complex plane with small angle ψ τ − ψ 0 and catheti |Re n 1 e −iψ0 | ≈ α 0 ν 0 /2 and |Im n 1 e −iψ0 | ∼ √ ν 0 for SNR 1.
Next, we give an explicit approximation for L 1 . For simplicity, we consider the special case, where the Bayesian prior L 0 (S) is itself a von-Mises distribution in ψ, centered at ψ 0 with measure of concentration κ 0 , and the agent possesses perfect knowledge of the other two environmental variables, c 0 and α 0 , . This is not a severe restriction, at least not for the absolute concentration c, as agents can estimate c 0 very precisely for ν 0 1. Now, the updated likelihood distribution L 1 is again a von-Mises distribution with new measure of concentration κ 1 and maximum-likelihood angle ψ 1 , see also appendix B As expected, ψ 1 is a weighted circular mean of the prior ψ 0 and the measured ψ τ For the measure of concentration κ 1 of the updated likelihood function L 1 , we find from Eq. (11) with κ τ ≈ α 0 n 0 | n 1 |/ν 0 3.5. Sequential estimates in the absence of rotational diffusion We are interested in the marginal likelihood distribution L m (ψ) of the estimated gradient angle ψ after m subsequent measurements. By applying Bayes' update rule Eq. (6) iteratively m times, we obtain an approximate formula for L m (ψ) as a von-Mises distribution Next, we compute the measure of concentration κ m . In the absence of rotational diffusion, subsequent measurements are independent random variables. This allows us to compute the second moment of κ m analogous to Eqs. (S10) and (12), see appendix A.4 Eq. (14) corroborates how chemotactic agents increasingly become more confident of their gradient estimates as the number m of sequential measurements increases. Eq. (14) is equivalent to the result for a single long measurement of duration mτ , for which the effective signal-to-noise ratio reads mSNR, see also appendix D. Asymptotically, the root-mean-square expectation value of the measure of concentration, normalized by the number m of measurements, approaches the signal-to-noise ratio SNR

GRADIENT SENSING IN THE PRESENCE OF ROTATIONAL DIFFUSION
We now consider a chemotactic agent subject to rotational diffusion with D > 0. The orientational angle ψ true (t) that specifies the direction of the gradient vector g relative to the material frame of the agent at time t, i.e., cos[ψ true (t)] = h 1 (t) · g, thus becomes a stochastic process. For simplicity, we consider discrete time dynamics, where rotational diffusion events occur at discrete times t j = jτ . Thus, the orientation angle ψ true (t) is constant dur-

Rotational diffusion jeopardizes gradient measurements if agents are unaware of it
We calculate the expected measure of concentration of sequential gradient estimates for D > 0, following the calculation for the special case D = 0 in section 3 3.5. For simplicity, we again assume a Bayesian prior of the form . For agents with rotational diffusion, subsequent measurements M j and M k are not independent random variables because they depend on the underlying stochastic process ψ true (t). Specifically, This correlation marks a crucial difference to the case D = 0 treated above in Eq. (14), where we exploited that subsequent measurements are independent. For D > 0, Eq. (S12a) in appendix A.4 still holds, but Eq. (S12b) does not.
We now only use the approximation that n 1 are approximately independent for each measurement, but account for the correlation of subsequent measurements in Eq. (16). With this approximation, we obtain We compute the sums of expectation values in Eq. (17) by evaluating a (double) geometric series using Eq. (16), see appendix A.4. As result, we find In the limit of slow diffusion, Dτ 1, we find to leading order in Dτ , This provides an asymptotic scaling for κ m , valid in the limit SNR 1 and slow diffusion, Dτ 1 For D > 0, κ m will initially increase linearly with m, and cross-over to the asymptotic scaling κ m ∼ m −1/2 beyond a characteristic measurement time t = mτ on the order of D −1 . In fact, the condition SNR 1 is not needed for this asymptotic scaling, provided D SNR/τ and m SNR −1 . (The first condition ensures Φ 2 Φ 1 , while the second condition implies that the contribution of the Bayesian prior is negligible.) Note that in the continuum limit Eq. (19) shows how an agent subject to rotational diffusion that does not take into account its own stochastic motion in its update process will erroneously believe that its gradient direction estimate becomes increasingly more accurate if measurement time t = mτ is increased. Yet, this is wrong.
In fact, in an ensemble of agents, the estimation errors δ j = ψ j − ψ j will eventually become completely randomized. To illustrate this behavior, we characterize the distribution of estimation errors δ within an ensemble of agents, approximating it by a wrapped normal distribution with variance parameter σ 2 m . We show that σ 2 m increases as a function of time t m . For the estimation error of an individual agent, we have an approximate iteration rule, valid for early times, mτ D −1 , and high signal-to-noise ratio, SNR 1, which expresses the new error as an affine interpolation of the previous error and the error of the last measurement Here, we introduced the variance parameters σ 2 τ and σ 2 m of wrapped normal distributions approximating von-Mises distributions with measures of concentration κ τ and κ m computed above in Eqs. (10) and (14), such that respective distributions have the same circular variance. Mathematically, σ 2 = −2 ln I 1 (κ)/I 0 (κ), see appendix B. From Eq. (20), we obtain an approximate iteration rule for σ 2 This expression suggests that σ 2 m grows asymptotically as √ 2Dτ m, see appendix A.5. Correspondingly, the circular variance CV = 1−e −σ 2 /2 of the distribution p(δ) should increase, eventually converging to 1 for mτ D −1 . Although the specific assumptions made in the derivation of Eq. (21) do not hold in this limit, simulations corroborate this simple picture, see Fig. 2A.
In conclusion, agents not aware of their own rotational diffusion, will arrive at erroneous gradient estimates. The reason is that past measurements will have become partially invalidated by rotational diffusion, yet are nonetheless incorporated in the gradient estimates with full weight. Concomitantly, the precision that individual agents estimate for their own gradient measurement does not reflect the true accuracy, i.e., the dispersion of maximum-likelihood estimates in an ensemble of agents. Individual agents are 'over-confident' of their own estimates.  18) and (21), for estimated precision and accuracy, respectively. The accuracy converges to one, corresponding to the randomization of estimated angles ψ. At the same time, the estimated precision converges to zero, displaying a cross-over between two scaling regimes as predicted by Eq. (19), see inset. B. Agents aware of own rotational diffusion. Same as panel A, but agents take into account their rotational diffusion in a prediction step for L(ψ). Solid lines represent the analytical results, Eq. (23) and (21). Estimated precision and accuracy converge to the same limit value, CV∞. Inset illustrates circular distributions with circular variance 0.07 (black), 0.14 (gray), 0.64 (light-gray), using von-Mises distributions centered at an arbitrary ψ0. Error bars represent s.e.m. (determined by bootstrapping for an ensemble of n = 5000 agents, occasionally smaller than symbols). Parameters: ν0 = 5000, α0 = 0.03, D τ = 0.05; Bayesian prior: κ0 = 3.09 ≈ κ 2

Agents aware of own rotational diffusion
We now consider an agent that correctly takes into account its own rotational diffusion before updating the likelihood distribution L(S) of estimated concentration gradients. Following the terminology of the known Kalman filter algorithm, we consider in addition to the update step, Eq. (6), which describes the incorporation of new measurement information, an additional prediction step that describes the change of L(S) due to rotational diffusion, see Fig. 1C. In its most general form, this prediction step is given by a Chapman-Kolmogorov equation Here, P (S |S) is the transition probability from state S to state S at time t m−1 . In our case, P (ψ |ψ) is a wrapped normal distribution with zero mean and variance 2Dτ , while the other two state variables do not change, i.e., P (S |S) To make analytical progress, we again assume perfect knowledge of concentration c 0 and gradient strength α 0 , i.e., a Bayesian prior of the form L 0 (S) ∼ exp[κ 0 cos(ψ − ψ 0 )] δ(c − c 0 ) δ(α − α 0 ). In the limit of high signal-to-noise ratio, SNR 1, and slow diffusion, Dτ 1, we can approximate all factors in Eqs. (6) and (22) by von-Mises distributions with appropriate measures of concentrations. In this limit, the update step corresponds to the (normalized) product of two von-Mises distributions, while the prediction step corresponds to the convolution of two such distributions. From the calculus of directional distributions, see appendix B, we obtain a recursive relation for the measure of concentration κ m of a von-Mises distribution approximating L m (ψ) Here, we assume κ m ≈ κ m , κ τ ≈ κ τ , which is valid for SNR 1. For completeness, we list all assumptions made in deriving Eq. (23): (i) high molecule count, ν 0 1 (enabling the diffusion approximation in the measurement model Eq. (4)), (ii) weak gradient, α 0 1 (allowing us to approximate likelihood distributions by von-Mises distributions), (iii) high signal-to-noise ratio, SNR 1 (allowing us to equate the precision estimated by an agent with its expectation value), (iv) slow diffusion, Dτ 1 (which, together with SNR 1, ensures that the simple formulas Eq. (S22) and Eq. (S23) can be used for the measure of concentration of convolutions and normalized products of von-Mises distributions, respectively).
We can understand the scaling for the limit value in Eq. (24) intuitively as follows. We expect κ m to increase linearly with total measurement time t = mτ for t t ∞ , and to saturate to a limit value κ ∞ for t t ∞ , where the cross-over time t ∞ = t ∞ (D) is a yet unknown function of the rotational diffusion coefficient D. Such a saturation curve suggests a scaling relation, t ∞ /τ ∼ k ∞ /k τ . Any measurements taken at a time t in the past will have been corrupted by rotational diffusion to an extent that they do not serve to increase the measure of concentration above a value (Dt) −1 . Thus, the cross-over time t ∞ must satisfy Dt ∞ ∼ k −1 ∞ . We conclude t 2 ∞ ∼ τ /(Dκ τ ), hence κ ∞ ∼ κ τ /(Dτ ). With Eq. (24), we can characterize the distribution of estimation errors δ m = ψ m − ψ m within an ensemble of agents. By Eq. (21), the variance parameter of this distribution converges to σ 2 i.e., the estimated precision σ 2 ∞ ≈ κ −1 ∞ = [2Dτ /SNR] 1/2 of gradient sensing of each individual agent is a faithful estimator for the true accuracy, i.e., the dispersion σ 2 ∞ of maximum-likelihood estimates within an ensemble, provided the agents know their rotational diffusion coefficient D. Fig. 2B corroborates this finding for the equivalent measure of circular variance.
More generally, we can consider agents that assume a value D for their rotational diffusion coefficient when performing their prediction step Eq. (22), while the true rotational diffusion coefficient is D. In this case, the variance parameter σ 2 ∞ of the distribution p(δ) of estimation errors follows from Eqs. (21) and (23) in the long-time limit as which attains the minimal value σ 2 ∞ = σ 2 ∞ exactly for D = D.

Summary of results.
We considered a minimal model of gradient sensing in the presence of both sensing and motility noise. We derived analytical results for sequential Bayesian estimation by chemotactic agents that undergo rotational diffusion. Gradient sensing fails if agents are not aware of their own rotational diffusion, because agents extend temporal averaging infinitely into the past. Concomitantly, the estimated gradient direction decorrelates from the true direction on the time-scale of rotational diffusion, while agents erroneously believe that their estimates become more and more precise as function of total measurement time t. Interestinlgy, we find an abnormal asymptotic scaling of the estimated variance of gradient estimates, CV[L(ψ)] ∼ t −1/2 , see Eq. (19). This abnormal scaling is a signature of erroneous state estimation and is intimately related to the properties of circular statistics. This signature could be tested for in real-world applications, e.g., of bearing tracking.
In contrast, if agents know their own diffusion coefficient, sequential Bayesian estimation yields accurate estimates of gradient direction. These estimates are both faithful and self-consistent, i.e., estimation errors are unbiased and individual agents can estimate how large their estimation error is in each time step. In fact, the estimated precision that individual agents assign to their individual direction estimate converges to the true accuracy, i.e., the dispersion of maximum-likelihood estimates within an ensemble of agents. Remarkably, the ultimate precision of gradient sensing scales with the square root of the rotational diffusion coefficient D as CV[L(ψ)] ∼ √ D. Intuitively, agents extend temporal averaging only over the recent past defined by a time span of duration t ∞ ∼ D −1/2 . Measurements taken before this time span still contain partial information on the current gradient direction as long as they do not extend beyond the rotational diffusion time D −1 . Yet, these measurements are already corrupted too much by rotational diffusion as that they could add to the precision achieved by temporal averaging only over the time span t ∞ . In fact, these past measurement would make the estimate worse if they were included. Mathematically, this reflects a difference between estimating a vectorial quantity, e.g., gradient direction, as opposed to estimating a scalar quantity, see appendix C.
Previous theoretical limits. Our result on the optimal time span for temporal averaging is different from previous work [25], which suggested that temporal comparison should be extended over a time span set by the rotational diffusion time, i.e., t ∞ ∼ D −1 . This previous work addressed a different sensing and chemotaxis strategy based on temporal comparison, which measures a scalar quantity. It turns out that this makes a crucial difference. In short, temporal comparsion relies on active motion of chemotactic agents within a spatial concentration gradient, such that concentration differences in space become encoded in a temporal change of local concentration measurements. Thereby, the cells estimate only a scalar quantity, the component of the concentration gradient along their direction of motion (v · ∇c). If a cell's swimming direction v/|v| decorrelates on a time-scale D −1 due to rotational diffusion, an optimal filter should indeed discount previous measurements on the same time-scale, i.e., downweight measurements taken a time ∆t before by a factor ∼ exp(−D ∆t). The strategy of temporal comparison is different from spatial comparison as considered here, where chemotactic agents estimate concentration gradients by comparing concentrations across their diameter. Bacteria performing run-and-tumble chemotaxis employ temporal comparison [19], while most eukaryotic cells with crawling motility employ spatial comparison [4,5,[42][43][44]. A third chemotaxis strategy is represented by marine sperm cells navigating along helical paths [3]. Although these cells effectively use only a single sensor, their chemotaxis can be mapped on the case of spatial comparison considered here: while moving along helical paths, these cells 'visit' different sensor positions during one helical turn.
In conclusion, our work identified a crucial difference regarding the optimal time span of temporal averaging between the chemotaxis strategies of temporal and spatial comparison if both sensing and motility noise are present.
Typical parameters. Typical rotational diffusion coefficients for the bacterium E. coli are D ∼ 0.1 s −1 , close to the theoretical lower limit of a passive particle of same size and shape. For ten-fold larger sperm cells, active fluctuations dominate [45], resulting in an estimate D ∼ 0.01 − 0.1 s −1 [46]. The motility of crawling Dictyostelium cells was characterized by a persistence time of ∼ 5 min in the absence of chemoactractant [47], which sets an effective rotational diffusion coefficient D ∼ 0.003 s −1 . For immune T cells in three-dimensional tissue, a persistence time of ∼ 1 min was found (displaying a characteristic speed dependence) [48]. If binding of signaling molecules to receptors on the cell surface is diffusion-limited, we can estimate the rate of binding by λc 0 = 4πD c ac 0 for a perfectly absorbing spherical cell of radius a, where D c denotes the translational diffusion coefficient of signaling molecules [49]. For typical values (D c ∼ 300 µm 2 s −1 , a ∼ 10 µm, c 0 ∼ 1 nM), we estimate λc 0 ∼ 10 4 s −1 . Thus, for a concentration gradient of either α 0 = 1%, 0.5%, or 0.1% across the diameter of a cell, and a measurement time τ = 10 s, we estimate signal-to-noise ratios of gradient sensing, SNR ≈ 3.5, ≈ 0.9, and ≈ 0.03, respectively. Assuming D ∼ 0.003 s −1 [47], our main result, Eq. (24), predicts for the ultimate precision of gradient sensing CV ∞ ≈ 0.07, ≈ 0.14, and ≈ 0.64 for these three gradient strengths, respectively (see inset of Fig. 2B for visualization). Reversible binding of signaling molecules effectively increases sensing noise, thus decreasing the signal-to-noise ratio by a constant prefactor [8,9,12]. Some cells, such as sperm cells, respond chemotacticly even at pico-molar concentrations [50], corresponding to respectively lower signal-to-noise ratios [20,41].
Bayesian estimation sets a lower bound for the precision of gradient sensing. This theoretical limit is relevant at high noise levels, but may be less so if noise is low. For the slime mold Dictyostelium, it was shown that at signal-to-noise ratios (SNR) below one, the efficacy of chemotaxis was well characterized by SNR alone, while noise of downstream intracellular signaling [51] becomes also relevant at high SNR [12,16,17]. While several of our analytical results were derived for high SNR, scaling relations persist for low SNR and are confirmed by numerical simulations.
Biochemical implementation. Storing the likelihood distribution of estimated gradient directions, or just a proxy thereof, requires internal memory. We speculate that the distribution of chemotactic effector molecules on the cell boundary as considered in recent models such as LEGI (local excitation, global inhibition) [52], or balanced-inactivation models [53] can indeed serve as a such a proxy. While the position of a concentration peak in such a distribution can represent a maximum likelihood estimate, its amplitude could encode a level of certainty. Similarly, the directional persistence and polarization of crawling cells represents a form of effective memory [54,55]. For marine sperm cells, the axis of their helical paths likewise represents a consolidated memory of previous noisy concentration measurements [56].
Time-varying gradients. Here, we considered only static concentration gradients. Yet, the general framework developed here generalizes in a straight-forward manner to time-varying environments, provided their temporal statistics is known to the agent. For example, analogous results apply if instead of rotational diffusion of the chemotactic agent, it is the direction of the concentration gradient that changes stochastically with a correlation time D −1 (e.g., due to an explicit time dependence of concentration fields, or due to active motion of the agent within a spatially complex concentration field). The seminal infotaxis strategy proposed a Bayesian framework for navigation in time-dependent concentration fields, using time-averaged properties of scalar turbulence [38]. In the presence of motility noise, this problem becomes considerably harder, for which our work can serve as a first step.

S1
A. DETAILS ON ANALYTICAL CALCULATIONS

A.1. Expectation values of higher moments
In deriving Eqs. (2) and (3) for the mean µ 0 and covariance matrix Σ 0 of a single measurement M, valid for N ≥ 4, we used n 0 = ν 0 , n 2 0 = n 0 2 + ν 0 , Similarly, For the special case N = 2, we find different from Eq. (S1) For N = 3, we find the same first moments as in Eq. (S1), but different covariance matrix where Σ (N ≥4) 0 denotes the result for N ≥ 4 from Eq. (3). The mathematical reason is that the calculation of n 1 involves a sum of squared roots of unity, N j=1 η 2j , which is nonzero for N = 2, while the calculation of n 2 1 for Σ 0 involves a sum of cubed roots of unity, N j=1 η 3j , which is nonzero for N = 3.
To compute | n 1 | , we use the law of cosines The isotropy of the covariance matrix in Eq. (3) implies that ϕ is a uniformly distributed random angle with probability distribution p(ϕ) = (2π) −1 , while b 2 follows a χ 2 -distribution for 2 degrees of freedom (namely, n 1 − n 1 and ϕ). The first integration, , results in an elliptic integral, which, however, can be well approximated by The second integration can now be easily done, yielding where the ellipses represents terms that decay exponentially fast for SNR 1.

S2
A.3. Prefactor in Eq. (7) The prefactor Λ in Eq. (7) is independent of ψ and reads In the limit of low signal-to-noise ratio, SNR 1, and α ≈ α 0 , this expression simplifies to

. Precision of sequential measurements
We compute the second moment κ 2 m of the measure of concentration of the marginal likelihood distribution L m (ψ) of the estimated gradient angle ψ after m subsequent measurements, approximating said distribution by a von-Mises distribution.
Analogous to Eq. (11), we have where κ (S11) Thus, we find analogous to Eq. (12) In the presence of rotational diffusion, D > 0, Eq. (S12a) still holds, but Eq. (S12b) does not. We use the approximation that n 1 are approximately independent for each measurement, hence We first compute the second sum of expectation values in Eq. (S13) by evaluating a double geometric series (S14) Similarly, we find for the first sum (S15) S3 By inserting Eqs. (S14) and (S15), as well as Eq. (S1) into Eq. (S13), we obtain from which Eq. (18) follows.

C. OPTIMAL AVERAGING TIME FOR ESTIMATION OF A SCALAR QUANTITY
Previous work addressed the optimal time span for temporal averaging for chemotaxis by temporal comparison [19]. In our notation, this amounts to estimating the scalar quantity s true (t) = d dt c(R(t)) = v 0 h 1 · ∇c from a noisy input signal s(t), where the agent moves with velocityṘ = v 0 h 1 .
As a minimal pedagogical model, we approximate s true (t) as an Ornstein-Uhlenbeck process with s true (t) = 0 and s true (t)s true (t − ∆t) = (α 0 c 0 v 0 ) 2 /2 exp(−D|∆t|). We assume a measurement process with additive sensing noise, s(t) = s 0 (t) + ξ(t), where ξ(t) denotes Gaussian white noise with ξ(t) = 0 and ξ(t)ξ(t ) = (σ 2 /τ ) δ(t − t ). The agent shall perform temporal averaging using a linear filter χ(∆t) We restrict ourselves to filters of exponential form, χ(∆) = A exp(−∆t/t χ ), and ask for the optimal averaging time span t χ . From the condition for a faithful estimtor, s(t) = s true (t), we obtain the prefactor A as A = D + 1/t χ (where this and subsequent expectation values are conditioned on s true (t)). We minimize the variance of estimation errors δ(t) = s(t) − s true (t), which is equivalent to minimizing s 2 . Using the autocorrelation function of s true (t) above, we find We compute this integral using the change of variables z 1 = ∆t 1 + ∆t 2 and z 2 = ∆t 1 − ∆t 2 , and find This expectation value becomes minimal exactly for t χ = D −1 , irrespective of the value of s true (t). Thus, the optimal averaging time equals the rotational diffusion time for estimating this scalar quantity. Similar results were found for detailed models of bacterial chemotaxis [19]. Note that in this case, the time-derivative d/dt is incorporated into the filter function itself, representing a smoothed time derivative.

D. INTERMEDIATE MEASUREMENTS GIVE (ALMOST) NO ADVANTAGE
As in the main text, we consider an agent that monitors a static environment with constant state S(t) = S 0 , where the agent takes subsequent measurements M j during time intervals T j = ((j − 1)τ, jτ ). We ask if knowing the results M j of the intermediate measurements confers any advantage compared to a single measurement of same total duration mτ , corresponding to the sum M = where the left-hand and right-hand side denote the likelihood of state S estimated from two intermediate measurements M 1 and M 2 , each of duration τ , or a single measurement of duration 2τ given by M + = M 1 + M 2 , respectively.

S5
We assume that measurements are Poisson distributed, i.e., P (M|S; τ ) = e −µ µ M /M!, where the mean number of binding events µ = λcτ within a time-interval τ is proportional to concentration c. A straight-forward application of the Binomial formula yields P (M + | S; 2τ ) = P (M 1 | S; τ )P (M 2 | S; τ ) for the measurement probabilities. Bayes' theorem implies Eq. (S27) for arbitrary prior L 0 (S). The case of k ≥ 2 measurements follows by induction.
In the limit dσ/dµ 1, the pre-factor on the right-hand side of Eq. (S28) will be approximately constant, displaying only relative changes on the order of [M − /σ(µ)] 2 dσ/d µ | µ=µ , which, with probability close to 1, is small. Specifically, this will hold true for a diffusion approximation of a Poissonian measurement model, whereμ = σ 2 (μ), providedμ is large, corresponding to the limit where Poissonian and Gaussian measurement models converge to each other.
The above result generalizes in a straight-forward manner to the case vectorial measurements that are distributed according to a multi-variate normal distribution (by diagonalizing the co-variance matrix). As a corollary, we thus obtain an analogous argument for the estimate of the direction of a concentration gradient, see also Eq. (14).
In conclusion, taking intermediate measurements confers a minute advantage for gradient estimation, which vanishes in the limit of large molecular counts.