Statistical parameter inference of bacterial swimming strategies

We provide a detailed stochastic description of the swimming motion of an E.coli bacterium in two dimension, where we resolve tumble events in time. For this purpose, we set up two Langevin equations for the orientation angle and speed dynamics. Calculating moments, distribution and autocorrelation functions from both Langevin equations and matching them to the same quantities determined from data recorded in experiments, we infer the swimming parameters of E.coli . They are the tumble rate ${\lambda}$, the tumble time $r^{-1}$ , the swimming speed $v_0$ , the strength of speed fluctuations ${\sigma}$, the relative height of speed jumps ${\eta}$, the thermal value for the rotational diffusion coefficient $D_0$ , and the enhanced rotational diffusivity during tumbling $D_T$ . Conditioning the observables on the swimming direction relative to the gradient of a chemoattractant, we infer the chemotaxis strategies of E.coli . We confirm the classical strategy of a lower tumble rate for swimming up the gradient but also a smaller mean tumble angle (angle bias). The latter is realized by shorter tumbles as well as a slower diffusive reorientation. We also find that speed fluctuations are increased by about 30% when swimming up the gradient compared to the reversed direction.


I. INTRODUCTION
One of the most prominent model swimmers in the field of biological microswimmers is the gut bacterium E.coli equipped with peritrichous flagella [1]. Its well known run-and-tumble swimming motion and chemotaxis strategy has been thoroughly studied [2][3][4][5][6][7]. Nowadays, modern imaging techniques allow for high-throughput recording of bacterial trajectories [8][9][10][11][12][13][14][15]. The method of labeling flagella by fluorescent markers allows to unravel the diverse swimming mechanisms of microorganisms [16,17]. These refined techniques require an appropriate theoretical modeling of the bacterium's stochastic swimming path, including the dynamics of tumbling. They also require a rational and effcient method how to analyze the recorded data in experiments. In this article we provide such a theoretical framework and illustrate it for the model bacterium E.coli . Thereby, we also reveal some new and detailed insights into its chemotaxis strategy.
The E.coli bacterium resides in the run phase, when all of its flagella form a bundle and rotate counterclockwise. The bacterium swims along a straight line, only thermal rotational diffusion affects its persistence. When at least one of the flagella reverses its sense of rotation, it leaves the bundle and the bacterium is in the tumble phase, where it strongly reorients [18,19]. Typically, the tumble phase is much shorter than the run phase [1]. Therefore, in theoretical models tumbling is considered as instantaneous and a single event is described by a tumble angle drawn from a distribution [8,[20][21][22]. However, a recent and instructive work by Saragosti et al. showed that re-orientation during tumbling can be modeled by enhanced rotational diffusion [23].
In order to analyze large amounts of data from recorded trajectories, specialized computer algorithms, called tumble recognizers, have widely been used to identify tumble events [3,8,9]. In order to distinguish runs from tumbles, these automated tumble recognizers compare turning rate and speed to threshold parameters. They are necessary to distinguish variations of speed and turning rate due to the ubiquitous noise from a real tumble event. The threshold parameters have to be chosen a-priori and adjusted until results from the automatized tumble recognition agree with a visual inspection of the trajectories. There is no general rule how to set these parameters and indeed they vary quite substantially [3,8].
In an earlier work [22], we presented a parameter inference technique that allows to quantify the swimming behavior of bacteria without the need of setting parameters a priori. Kramers-Moyal coefficients were calculated from a suitable stochastic model for the dynamics of the orientation angle and matched to the coefficients determined from experimental data. In particular, the stochastic model treated tumble events as instantaneous. This procedure provided the main characteristics of E.coli and the bacterium Pseudomonas putida: tumble rate, distribution of tumble angles, and the thermal rotational diffusivity. For E.coli it also confirmed an angle bias during chemotaxis reported earlier [8]: the mean tumble angle is larger when swimming against a chemical gradient compared to moving along it. Other parameter inference techniques use the framework of Bayesian inference [9,24]. However, they pose a complex numerical challenge as one has to maximize a likelihood function that contains the data of all the recorded trajectories.
In this article we considerably extend our earlier work by resolving tumble events in time and by incorporating a stochastic process for the speed dynamics (see  The dynamics of the orientation angle is diffusive, where the rotational diffusivity switches via a telegraph process [25] between its thermal (run phase) and enhanced value (tumble phase). The dynamics of the speed contains a shot-noise process [26,27]. It initiates a tumble event by decreasing the speed value, which then relaxes back to the swimming speed according to an Ornstein-Uhlenbeck process [28]. Calculating moments, distribution and autocorrelation functions for orientation angle as well as speed and matching them to the same quantities calculated from experimental data, we are able to infer the swimming parameters of E.coli . Their values are in good agreement with the parameters determined using a tumble recognizer. Compared to the Bayesian framework, our method of parameter inference considerably lowers the efforts of the numerical optimization.
To explore the chemotaxis strategy of E.coli , we condition [29,30] moments and autocorrelation functions on the swimming direction relative to the chemical gradient and infer the swimming parameters as a function of the orientation angle. Besides the well-known chemotaxis strategy (modulation of the tumble rate), we confirm the recently discovered angle bias [8]. We show that the increased angular persistence when swimming up the gradient is caused by both shorter tumbles as well as smaller rotational diffusivity. Moreover, for the same swimming direction we identify larger fluctuations in the speed value.
The article is organized as follows. In Sect. II we introduce the two Langevin equations of our stochastic model and calculate moments, distribution functions, and autocorrelation functions for speed and orientation angle. Section III reviews details of the experiments. Section IV first explains the inference method and then presents our results in a uniform buffer solution (control experiment) and in the gradient of a chemoattractant. We close with a summary and an outlook in Sect. V.

II. MODEL AND METHOD
A. Stochastic model for the random walk of E.coli A typical trajectory of bacteria such as E.coli is described by a run-and-tumble random walk. During the run phase the bacterium moves forward along a nearly straight line, only rotational thermal noise affects its persistence. During the tumble phase the bacterium's speed is reduced and it reorients strongly into a new direction. The angle between the orientations before and after the tumble event is the tumble angle β. We express the velocity of the bacterium in two dimensions as the product of speed v(t) and unit vector e(t) = (cos Θ, sin Θ), where the orientation angle Θ is measured with respect to the x axis. We set up two overdamped Langevin equations for speed and orientation angle, which fully describe the bacterial motion, We introduce both Langevin equations in more detail.
(1) The equation for speed v(t) contains three terms, which are associated with drift, diffusion, and jumps. We start with the last term, It initiates each tumble event at time t i by a shot-noise process, while the occurrence of times t i follows a Poisson process with tumble rate λ. At the beginning of each tumble, the bacterial speed is reduced by the relative jump height η to (1 − η)v t and N λ is the actual number of tumble events. The first and second term represent a conventional Ornstein-Uhlenbeck process. After a tumble event the speed relaxes with relaxation rate r towards the swimming speed v 0 of the run phase. Thus r −1 is the mean duration of a tumble event, which we call tumble time in the following. The Gaussian white noise term is fully determined by ξ sp = 0 and ξ sp (s)ξ sp (t) = σ 2 δ(t − s), where we introduce the white noise strength σ. It describes the ubiquitous noise due to internal noise of the swimming mechanism, variations between individual bacteria, and measurement errors. Note that the actual tumble time of a bacterium is exponentially distributed. In our model the white noise term also induces stochastic fluctuations in the duration of the tumble events as visible in Fig. 2(b). Altogether, the stochastic speed process is determined by five parameters: {λ, r, v 0 , σ, η}.
(2) The stochastic equation for the orientation angle Θ is fully described by rotational diffusion, where the white noise process is defined by ξ an = 0 and ξ an (s)ξ an (t) = δ(t − s). Following Ref. [31], we model tumbles as a random walk on a unit sphere with enhanced rotational diffusion. Thus, the rotational diffusion coefficient D rot (t) is no longer a constant but alternates between two values: the thermal rotational diffusion coefficient D 0 during run phases and an enhanced value D T during tumble phases. We describe each transition between the two states by a Poisson process and thus obtain a telegraph process. The transition rate from the run to the tumble phase is the tumble rate λ, whereas the transition rate in the opposite direction is the speed relaxation rate r or the inverse tumble time. A full definition and basic properties of the telegraph process are given in the appendix B 2 or can be found in [25].
To link the telegraph process to the shot-noise process for the speed value in Eq. (4), the diffusion coefficient switches at the same times t i from the thermal (D 0 ) to the enhanced (D T ) value. Note, while the speed process allows a second tumble although the first one is not finished yet, this is not possible in the telegraph process for rotational diffusion. However, for bacteria like E.coli the time between tumble events is typically one order of magnitude larger than the tumble time r −1 . This makes these double events very rare and tumble events in both speed and angular processes coincide. All in all, we have four parameters governing the stochastic process for the orientation angle: {λ, r, D 0 , D T }. Figure 2 shows a typical simulated trajectory (a) and the corresponding time series for speed and angular displacement ∆Θ during time step ∆t = 0.1 s (b). It has to be compared to the experimental time series of both quantities in (c). Note that ∆Θ ∆t represents the turning rate of the bacterium. In the following we will always work with the angular displacement ∆Θ.

B. Basics of the inference method
In this section we state moments, stationary distributions, and time autocorrelation functions for the stochastic processes of speed and orientation angle in Eqs. (2) and (3). They depend on the swimming parameters introduced above. Matching the theoretical expressions of these quantities to the values determined by averaging over all individual tracks of the experiments, we are able to infer the mean swimming parameters of an E.coli population. We refer to appendix B for details of the derivations and only state the final expressions in the following.

Speed
The moments m V n = v(t) n of Eq. (2), where the average is taken over all times t and all tracks in the long-time limit, can be calculated as a function of the reduced parameter set λ/r, η, v 0 , σ 2 /r . For the first moment, the    mean speed, we obtain The mean speed is smaller than the swimming speed v 0 since during the tumble phase speed is reduced by a factor η. More generally, a recursive formula for the nth moment is given by where the zeroth moment is m 0 = 1 due to normalization. We now have access to all the speed moments. As an example, Fig. 3(a) shows a histogram for the distribution of speed values recorded in an experiment, from which the speed moments can be calculated. The orange line represents the distribution obtained from numerically solving the speed equation (2) using the actual parameters inferred from this experiment. The two distributions nicely agree, which is an a-posteriori verification of our Langevin equation.
From the moments we can only infer the ratios λ/r and σ 2 /r. In order to determine the full set of parameters of (a) (b) Eq.
(2), we also use the speed autocorrelation function for our model. It has an exponential form with relaxation rate r + ηλ, Figure 3(b) shows the autocorrelation function for the experimental data of E.coli . Indeed, the curve is well-fitted by an exponential over two decades up to τ 1s, which is around half the mean track length. This agreement supports the validity of our stochastic description of the speed process in Eq. (2).

Angle
Here, we work directly with the steady-state probability distribution p(|∆Θ|) for the absolute angular displacement |∆Θ| during a finite time step ∆t. We determine p(|∆Θ|) from Eq. (3) for the orientation angle as a function of the reduced parameter set (λ/r, D 0 , D T ). In the long-time limit the probability distribution p(|∆Θ|) becomes stationary and is given by where N (0, σ) denotes the normal distribution with zero mean and standard deviation σ. For our parameter inference we use the same time step ∆t = 0.1s −1 as in Ref. [31]. Figure 4(a) presents a histogram for all angular displacements in time step ∆t recorded in the experiment. It shows a deviation from the theoretical distribution of Eq. (8) in the tail at angles larger than π/2, which is visible only in the semi-logarithmic plot. Note that the region |β| > π/2 only represents roughly 3% of all angular displacements. There are two possible reasons for this deviation: First, we record angular displacements Θ = π + as a displacement −(π − ) since we cannot distinguish between tumbles to the right and left during one time step. Second, it is also possible that the diffusion model for tumbling does not apply for such large angles.
For completeness we also give the nth moment of the absolute angular displacement, m ∆Θ n = |∆Θ| n . It follows directly from the probability distribution of Eq. (8): where n!! denotes the double factorial. Similar to the speed process, we can only infer the ratio λ/r from fits to the probability distribution p(|∆Θ|) of Eq. (8). In order to determine the full set of parameters of Eq. (3), we use again the autocorrelation function of our model, now for the swimming direction e(t). Numerical investigations of our model (see appendix B 3) suggest that it has a simple exponential form with relaxation rate α Θ for parameters relevant to the experiments: Analytically, we are not able to calculate this exponential form. However, in the time interval (λ + r) −1 < τ < D rot −1 relevant to the experiments, we can derive the linear approximation and thereby obtain an expression for the relaxation rate α Θ . Here we have introduced the respective mean D rot and variance ∆ 2 D rot of the telegraph process D rot (t), Figure 4(b) shows the directional autocorrelation function for the experimental data of E.coli moving in a uniform buffer medium. Indeed, the curve is well-fitted by an exponential up to τ 5 s excluding the first point. This agreement supports the validity of our stochastic description of the angle process in Eq. (3). The deviation in the experimental data for the first point is caused by the offset for angular displacements larger than π/2, where the experimental distribution function in Fig. 4(a) deviates from theory. For two and more time steps the influence of this offset becomes smaller and smaller.

III. EXPERIMENTAL MATERIALS AND METHODS
A. Cell culture E.coli AW405 strain was cultured overnight in liquid Tryptone Broth (TB) (10 g/l Difco Bacto TM -Tryptone and 5 g/l NaCl) at 37 • C on a rotary shaker at 300 rpm. The cell suspension was diluted 1:100 into fresh TB, and grown to mid-exponential phase (OD 600 = 0.5). Then the bacterial suspension was washed and resuspended in motility buffer (11.2 g/l K 2 HPO 4 , 4.8 g/l KH 2 PO 4 , 3.93 g/l NaCl, 0.029 g/l EDTA and 0.5 g/l glucose; pH 7.0). Afterward, the cell suspension was divided into two fractions. One was centrifuged and resuspended in the same motility buffer, and the other was centrifuged and resuspended in motility buffer supplemented with the chemoattractant α-methyl-aspartate (Sigma-Aldrich, USA) in a final concentration of 0.5 mM. In both cases, the final OD 600 of the cell suspensions was 0.07 before filling them into chemotaxis chambers.

B. Chemotaxis assay
In this study, a µ-Slide Chemotaxis 3D (ibidi, Martinsried, Germany) was used in order to maintain a stable linear gradient of the chemoattractant α-methylaspartate. This chemotaxis chamber consists of two large reservoirs connected to a central observation area. For the chemotaxis assay, the cell suspension with chemoattractant was filled into the reservoir on the right hand side and the chemoattractant-free cell suspension into the reservoir on the left hand side. The central observation area was filled with motility buffer (see appendix A). A stable linear chemoattractant gradient is generated by diffusion in the observation area and maintained for several hours [32]. For the control assay, both reservoirs were filled with chemoattractant-free cell suspension. In this case, a homogeneous environment without any gradient was established in the observation area.

C. Cell imaging and tracking
An IX71 inverted microscope with a 20× UPLFLN-PH objective (both Olympus, Germany) in phase contrast mode was used for imaging cell trajectories. Five image sequences were taken with 10 min intervals between them using a Orca Flash 4.0 CMOS camera (Hamamatsu Photonics, Japan). For each sequence, the images were acquired at 20 frames per second for 30 s. The field of the view was placed in the center of the gradient region at 30 µm above the bottom of the chamber (total height in the observation area was 70 µm).
A custom Matlab program based on the Image Processing Toolbox (version R2015a, The MathWorks, USA) was used to process the image sequences automatically. For each image sequence, a background image was calculated by pixel-wise time average projection. It was subtracted from each frame to eliminate non-motile objects and shading effects. The built-in Matlab function imerode was then applied for morphological erosion (with a disk of radius 0.6 µm) to reduce the background noise. The putative bacterial cells are distinguished from background using the maximum entropy thresholding algorithm by Kapur et al. [33]. The threshold was calculated for each image in the sequence separately. The median of all threshold values was used to segment the whole sequence. The binary images were further processed with the morphological operations, imopen and imclose (with a disk of radius 0.3 µm) to eliminate any noise caused by segmentation. The built-in function bwconncomp was used to find all connected objects in the binary images. Size and centroid of the objects were determined using the regionprops function. Afterwards, particles with an area between 1 µm 2 to 15.6 µm 2 were considered as single bacterial cell. Finally, trajectories were obtained employing the tracking algorithms by Crocker and Grier [34].
To avoid tracking artifacts caused by tumble events when cells enter and leave the focal plane, the first and last 0.5 s of each track were removed. Highly curved tracks as well as tracks with a total displacement < 10 µm were eliminated, as they most likely result from damaged flagella. The minimal track length is 0.5 s and the maximal length is 19.35 s. The control data set consists of 769 tracks with a total length of 1629 s. The gradient data set consists of 3498 tracks with a total length of 7206 s.

D. Heuristic run-tumble analysis
The trajectories were smoothed using a second-order Savitztky-Golay filter with a window size of 5 data points corresponding to 250 ms [35]. Instantaneous speed v = ∆s ∆t , direction of propagation θ, and turning rate ω = ∆θ ∆t were evaluated on the smoothed tracks. The tumble events were detected as described previously [9,22,36]. Briefly, in the time series of speed and turning rate, local minima and local maxima were detected, respectively, to identify tumble events. Four parameters, two for the speed and two for the turning rate, were adjusted such that the recognition of tumble events was correct as checked by visual examination (threshold parameters α = 3 and β = 6.5 and tumble duration parameters 0.55×∆v and 0.65×∆ω, see the Supporting Information S5 in Ref [22]).

IV. RESULTS
We are now equipped to infer the swimming parameters from experimental data for different experimental settings. We first illustrate the inference method by applying it to a control experiment, where E.coli swims in a homogeneous buffer solution. We validate the inference method by comparing the inferred parameters to their values determined by a heuristic tumble recognizer. Then we demonstrate that our method also reveals the chemotaxis strategy of E.coli when moving in a chemical gradient. In particular, we apply it to data, which was recorded in a linear gradient of α-methyl-aspartate.
A. Inferring the swimming parameters for E.coli in a uniform environment Figures (3) and (4) show distributions and autocorrelation functions for speed and angular displacements recorded for E.coli when swimming in a homogeneous buffer without any chemical gradient. Note that speed and angle inference are performed separately from each other but they are linked by the tumble rate λ and the inverse tumble time r.
Speed inference: From the histogram of the recorded speed values in Fig. 3(a) we determine the moments of the experimental speed data: m v,exp n . The sums are taken over all tracks i = 0, . . . , N and all times t, where T i is the length of track i. Figure 3(b) shows the exponential fit to the speed auto-correlation function, which yields the experimental relaxation rate α V = −5.1 ± 0.2 s −1 . Note that the error estimate and all the following ones are obtained by the method of bootstrapping (see appendix C for more details). We match the first eight speed moments and the relaxation rate to the their theoretical expressions of Eqs. (5)-(7) and obtain 9 non-linear equations for the speed swimming parameters. We solve these equations numerically using a simplex-downhill optimization algorithm from the python package scipy.
Angle inference: Independently, we match the theoretical distribution function for the angular displacement [given in Eq. (8)] to the experimentally recorded histogram in Fig. 4(a) and thereby extract the parameters D 0 , D T , and λ/r. We perform the fit up to ∆Θ = π/2 to avoid the offset for angular displacements larger than π/2. Last, by matching the experimental relaxation rate α Θ of the directional autocorrelation function to the theoretical expression of Eq. (11), we obtain the full set of parameters [see also Eqs. (B33) and (B34) in appendix B 2]. Figure 4(b) shows the linear fit with relaxation rate α Θ = 0.33 s −1 ± 0.02 (green line) and the exponential fit with rate α Θ = 0.32 s −1 ± 0.01 (orange line) in a semi-logarithmic plot.
Inferred Parameters: Table I gives an overview of the inferred swimming parameters for the two stochastic processes for speed and angle. The two inferred tumble rates λ are very close together and the inverse tumble times r agree within the error bars. Our results are in good agreement with tumble rate λ = 0.84 s −1 and swimming velocity v 0 = 20.7 µms −1 determined with a heuristic tumble recognizer (see Sect. III D and Ref. [22]). This validates our inference method. Moreover, our findings are in good agreement with previously measured tumble rates [3,9,22] and swimming speeds [37]. The inferred value for the thermal rotational diffusivity D 0 agrees with previously reported values in the literature, which range from 0.06 s −1 [20,22] to 0.18 s −1 [38].
We use the enhanced rotational diffusion coefficient D T = 2.31 s −1 and the inverse tumble time r = 3.81 s −1 of the angle stochastic process to determine the distribution function of absolute tumble angles, P (|β|), by recording the angular displacement for exponentially distributed tumble times with mean r −1 . The corresponding three-dimensional distribution function is obtained by multiplying the two-dimensional quantity with sin β from the solid angle element. The resulting distribution is shown in orange in Fig. 5 for |β| < π. It has a maximum at β max = 0.78 = 45°and the mean tumble angle is |β| = 1.06 = 61°, which are remarkably close to the values β max = 45°and |β| = 62°from Ref. [3]. The shape of the distribution function is similar to the one obtained with the heuristic tumble recognizer (blue bars). Also, the maximum values are very close. While the main characteristics of the two curves agree well, the heuristic tumble recognizer determines more tumbles for angles close to π. As a result, it finds a larger mean tumble angle |β| = 1.43 = 82°. This might be explained as follows. Some tumbles occur only in one time interval, where one cannot distinguish between a leftward tumble angleβ and a rightward tumble |β − 2π|. Thus, the heuristic tumble recognizer chooses always the smaller angle and, therefore, the distribution of tumble angles close to π is enhanced. In contrast, our inference for the angle process only uses angular displacements up to π/2 in Fig. 4. Thus, it gives a more correct account of the distribution.   Compared to literature we define the tumble time differently by setting τ t = r −1 . Usually, one employs a tumble recognizer and identifies the tumble state when the angular displacement (per time step) exceeds a threshold value [3,7,8,16]. The duration of this period is then the tumble time [see also Fig. 6(a)], for which values of τ t = 0.12 s and 0.14 s were measured using different thresholds [3,8]. However, this procedure underestimates the duration of a tumble event, which starts when a flagellum leaves the bundle and ends when it returns to the bundle. At the beginning and end of this period the angular displacement (per time step) can of course be below the given threshold value. Indeed, Ref. [16] showed that the duration of a tumble event obtained from visualizing the flagellar dynamics during tumbling is significantly larger than the time determined by tumble recognizers.
In contrast to tumble recognizers, our method defines the tumble time as the inverse relaxation rate τ t = r −1 . This is a more rational quantification of the tumble time without the need of an a-priori threshold value. Tumbles are initiated when the speed jumps below the swimming speed and they end when the speed has relaxed back to the swimming speed. We argue that the higher value τ t = 0.23 s obtained by our method describes the tumble process more precisely.

B. Chemotaxis
Next, we apply our method to experimental data of E.coli recorded in a constant gradient of a chemoattractant concentration. Conditioning the analysis on the swimming direction, we are able to determine how the swimming parameters depend on the orientation or swimming angle θ. Thus, we divide the experimental data into eight subsets or sectors each spanning a range of orientation angles centered at θ n = 2πn/8 for n = {0, 1, ..., 7}. Here, θ = 0, 2π means swimming up the gradient and θ = π against the gradient. In practice, instead of dividing the data for the orientation angle into 8 disjunct sectors, we use smooth weighting based on Gaussian kernels as in Ref. [22] (for further details see appendix D). Figure 7 shows the results from applying our inference method to the moments of speed and to the distribution of angular displacements. Graph (a) plots the tumble bias λ/r, the ratio of tumble time to run time, versus orientation angle. It is lowered when swimming up the gradient (θ = 0, 2π) and increased when swimming down the gradient (θ = π). This confirms the classical chemotaxis strategy. The curves from angle inference (orange) and speed inference (blue) show good agreement. Again, we recognize that both inference strategies give coherent results, even though they are performed independently from each other. In Fig. 7(b) the rotational diffusion coefficient D T during tumbling also depends on the swimming direction. It is lowered when swimming up the gradient and increased when swimming down the gradient. This suggests angular persistence or a reduced mean tumble angle, when swimming in a favorable direction, as a chemotaxis strategy. It was already reported in Refs. [8,22]. We will comment more on this strategy in the following.
Adding the speed autocorrelation function to the pa- rameter inference, we investigate whether tumble rate λ and tumble time r −1 are separately modulated during chemotaxis. Figure 8 shows the results for the speed parameters λ, r, v 0 , σ. Indeed, we recover the classical chemotaxis strategy in plot (a) with a strong reduction of the tumble rate when swimming up the chemical gradient. The tumble rate for θ = 0 is less than half of the tumble rate for θ = π. The same trend occurs for the tumble time r −1 , which increases when swimming down the gradient. This bias in tumble time together with the same trend for the diffusion coefficient D T found above confirms a bias in the mean tumble angle β . It is enhanced when swimming in an unfavorable direction, which confirms the alternative chemotaxis strategy identified in Refs. [8,22]. No significant modulations are visible for the swimming speed v 0 plotted in (c). So there is no chemokinesis. The same applies to the jump height η, which is shown in Fig. 11 of appendix E. In the last plot (d) we identify a novel bias in speed fluctuations. The swimming speed is significantly more volatile when swimming up a chemical gradient compared to swimming against it. To the best of our knowledge, this has not been reported yet.

V. CONCLUSIONS AND OUTLOOK
In this article, we provide a detailed stochastic description of the swimming motion of an E.coli bacterium in two dimensions, where we also resolve tumble events in time. We set up an overdamped Langevin equation for the speed dynamics, which contains three terms associated with drift, diffusion, and jumps that initiate a tumble event. A second Langevin equation for the angular dynamics describes rotational diffusion of the orientation angle, where the diffusion coefficient alternates between its thermal value during run phases and an enhanced value during tumbling. The transition between both phases is described by a telegraph process. An analysis of experimental data verifies our description a-posteriori: distribution and autocorrelation functions for both speed and orientation angle agree with theoretical predictions from our model and with numerically determined functions using the inferred swimming parameters.
We considerably extent earlier work [22] by resolving tumble events in time and by incorporating a stochastic process for the speed dynamics. Based on moments as well as distribution and autocorrelation functions, we provide a robust methodology for inferring the full set of swimming parameters that characterize the run-andtumble motion. The inferred swimming parameters are the tumble rate λ, the tumble time r −1 , the swimming speed v 0 , the strength of speed fluctuations σ, the jump height η, the thermal value for the rotational diffusion coefficient D 0 , and the enhanced coefficient during tumbling D T . Although the inference of angle and speed parameters are carried out completely independent from each other, they show good and very good agreement for the two common swimming parameters, r and λ, respectively.
We validated our results by comparing the swimming parameters to the results of a heuristic tumble recognizer and obtained good agreement. However, our approach of inferring parameters has three advantages. First, it does not need to set a-priori threshold parameters for speed and angular displacement. Second, it is able to infer the strength σ of speed fluctuations and the thermal rotational diffusion coefficient D 0 . Third, it provides a more rational and precise choice for the tumble time that encompasses the whole tumble event instead of just the part which is determined by threshold parameters.
The inference method allows to condition the swimming parameters on a specific situation and monitor how they change with the situation by dividing the full data set into subsets. In particular, while conditioning on the swimming direction, we are able to confirm the classical chemotaxis strategy, which modulates the tumble rate λ when changing the swimming direction relative to the chemical gradient. We also confirm the recently discovered modulation of the mean tumble angle (angle bias) [8]. Resolving the tumble event in time, we realize that this angle bias is due to modulations of both the tumble time and the enhanced rotational diffusivity during tumbling. This has not been reported so far. As the tumble rate we expect the tumble time to be determined by the internal chemotaxis machinery of E.coli , which monitors the changing chemoattractant concentration during swimming. The higher rotational diffusivity during a tumble phase, which follows swimming against the gradient, may be caused by more flagella leaving the flagellar bundle, as argued in [23]. Finally and also not reported so far, we show that speed fluctuations are larger by 30% when E.coli swims up the chemical gradient.
Our method of conditioning can be applied to other quantities, for example, the concentration c of the chemoattractant. In particular, the tumble rate of a bacterium, which is adapted to a chemoattractant, should not depend on the concentration c [39]. In an earlier analysis of experiments we already verified this for E.coli and Pseudomonas putida [40]. Other possible conditions explore the biological variability in properties such as the swimming speed v 0 of a bacterium or its size.
In the following we mention some further directions, where our method of inference can be applied or needs to be extented. Recent experimental techniques allow to record tracks of length of the order of 100 s [12,15]. Such long tracks provide enough data to apply our method to a single track and thereby measure swimming parameters for individual bacteria. This can then reveal and quantify heterogeneities in a bacterial population.
To apply the method of inference to other bacterial swimming mechansim, the Langevin equations (2) and (3) need to be modfied. For example, run-reverse bacteria such as the soil bacterium Pseudomonas putida, possess a tumble angle distribution with a sharp peak centered around π [36]. The marine bacteria Vibrio alginolyticus has a bimodal distribution of tumble angles with two maxima as measured in Ref. [41]. In both cases, rotational diffusion with an enhanced diffusivity cannot reproduce such distributions. A possibility to address these cases is to extend the approach of Ref. [22]. There, instantaneous tumbling was modeled by a shot noise process with a delta-peaked angular turning rate and tumble angles drawn from an appropriate distribution. Broadening the delta function to a Gaussian function with the tumble time τ t as standard deviation, one can again re-solve the tumble event in time. Furthermore, an elaborate model of the speed dynamics for Pseudomonas putida should include the alternating swimming speeds reported in Ref. [36], which belong to different swimming modes [17].
Once such models are established, the inference method provides a rational way of analyzing experimental data in order to determine the relevant swimming parameters and to understand important processes such as chemotaxis by conditioning the available data on subsets. Thus, in this article we have introduce a powerful methodology for analyzing properties of bacterial populations, which can handle large amounts of experimental data.  Note that the second and fourth term on the RHS are martingales [42]. Thus, their expectation values vanish. We will use this property when calculating the moments and autocorrelation function of the speed variable. Taking the expectation value . . . on both sides, we obtain the first moment: (B3) To ease the notation, we dropped the superscript V from the main text. Taking the time derivative on both sides, we obtain a non-homogeneous ordinary differential equation (ODE): Its full solution with initial value C at time t 0 reads Taking the long-time limit t → ∞, we recover the equation (5) from the main text.
Next, we calculate the n-th moment m n = v n . Using Ito's lemma [43], we first formulate a SDE for an arbitrary function f (v t ) of the speed variable: Here, v t− denotes the value right before a jump. Setting f (v t ) = v n t , integrating Eq. (B6), and taking the expectation value on both sides yields: where we again extracted the deterministic part of the Poisson process and all martingales dropped out. Taking the time derivative on both sides, we obtain an ODE, which also contains the lower-order moments m n−1 and m n−2 : The solution of this ODE in the long-time limit t → ∞, where dm n /dt = 0, yields Eq. (6) in the main text, Finally, we calculate the speed autocorrelation function g(s, t) = (v s − m 1 )(v t − m 1 ) of Eq. (B1). We define the probability distributions for the speed process P (v ) and the conditional probability P (v, t|v , s) of having v at time t given that we have v at time s and obtain where we have have used Eq. (B5) with C = v in the second last step. We recover Eq. (7) after setting s = t + τ . Identifying the relaxation rate α V , we can write the following formulas for λ and r: (B12)

Angle
We rewrite the Langevin equation (3) from the main text as a SDE using mathematical notation: The SDE contains two stochastic processes: the telegraph process D t , where we drop here the subscript rot used in the main text, and the white noise process dW t . These two processes are stochastically independent of each other. Thus, the moments for the angular displacement during time step ∆t factorize into contributions from each process, The probability distribution function (pdf) p(∆W t ) and the absolute moments of the white noise increments ∆W t during time step ∆t are given by where N (0, σ) denotes the normal distribution with zero mean and standard deviation σ and n!! denotes the double factorial. For the telegraph process D t with states D 0 and D T , the two probabilities for being in one of the states at time t obey the following master equations: Here, λ is the transition rate from D 0 to D T and r the transition rate for the reverse process. The variable C indicates the initial condition at time t 0 . We first state the pdf p(D) in the long-time limit t → ∞ as well as the auto-correlation function D t D s from literature [25]: In the last equation we have introduced the mean D and the variance ∆ 2 D in the long time limit. They are given by The mean value of D t for any time t with initial condition C at time t 0 is given by We can use the pdf p(D) to calculate the first factor on the RHS of Eq. (B14) in the long time limit, Inserting this expression and Eq. (B16) in Eq. (B14) leads to Eq. (9) stated in the main text. The pdf of the absolute angular displacement p(|∆Θ|) can be calculated straightforwardly. Using the independence of the two stochastic processes and combining Eqs. (B15), (B17), and (B18), we obtain (B24) This agrees with Eq. (8) from the main text.
Finally, we calculate the directional autocorrelation function g(τ ) = e(τ ) · e(0) = cos (Θ(τ ) − Θ(0)) . Integrating Eq. (3) and using the real part of the Euler identity e ix = cos(x) + i sin(x) yields: The term in the real part operator can be interpreted as the characteristic function of the random variable X(τ ) = τ 0 √ 2D s dW s for wavenumber k = 1. Using the moment representation of the characteristic function, we obtain where we have defined the moments m n = X n . For symmetry reasons, the odd moments vanish, and the real part operator can be skipped. First, we calculate m 2 , where we use again the independence of the two stochastic processes dW t and D t in the second line, Next, we calculate the fourth moment m 4 , where we use the correlation function of Eq. (B19) in the fourth line: Replacing in the double integral τ 0 . . . ds 1 by 2 s2 0 . . . ds 1 , the integral I is calculated as follows: Truncating the sum of Eq. (B26) for n > 4 , we finally obtain: This form suggests a slope − D of the correlation function for times τ < (λ + r) −1 , which in our case means τ < 0.2 s and is just valid for the very initial time range of the correlation function. From Eq. (B31) we can extract another linear approximation by concentrating on the time range (λ + r) −1 < τ < D −1 . It gives Eq. (11) from the main text, from which we obtain an expression for the relaxation rate α Θ measured in experiments. It is determined by D and the second term in the brackets is a correction. But it is sufficient to determine separate values for r and λ, when r/λ is known from the analysis of the pdf p(|∆Θ|). Solving the equation for α Θ for either λ or r, we obtain the formulas

Numerical investigations of the directional autocorrelation function
The directional autocorrelation function g Θ (τ ) = e(τ ) · e(0) has an exponential form in experiments up to ca. 2s (see Fig. 4). Here, we validate this dependence by numerically solving Eq. (3) with the inferred parameters of table I. The semi-logarithmic plot in Fig.  10(a) shows the resulting autocorrelation function (blue data points). It is in good agreement with the exponential decay of Eq. (10) using the relacation rate α Θ from Eq. (11), which we derived in the previous section in Eq. (B32). This validates our proposition for the relaxation rate.
Moreover, we can further validate the exponential fit to the experimental directional autocorrelation function using the theoretical value for the relaxation rate. After having inferred the reduced parameter set (λ/r, D 0 , D T ) as described in the main text using the pdf p(|Θ|), we determine the directional autocorrelation function by simulating the angle process with the reduced parameter set for different values of the parameter λ. Figure 10(b) shows the mean squared error Σ of the simulated autocorrelation function compared to the experimental function plotted versus the tumble rate λ. The best match is for a λ very close to the value shown in table I, which was determined using the theoretical prediction of Eq. (11)  The orange line shows an exponential decay with the relaxation rate from Eq. (11). Right: Mean squared deviation between the simulated directional autocorrelation function gΘ(τ ) = e(t + τ ) · e(t) and the experimental curve for different tumble rates λ. The global minimum at λ = 0.81 s −1 verifies the use of the theoretical expression (11) for the relaxation rate.
for the relaxation rate α Θ .

Appendix C: The method of bootstrapping
Bootstrapping allows to derive an estimate of the standard deviation of the inferred parameters without the need of repeated experiments [44]. Similar to Ref. [22], we create synthetic ensembles by randomly mixing subsets of the original data set. Let T 0 = {t 1 , ...., t N } be the set of original trajectories. Pulling N random trajectories of this set and laying them back after each pull, one obtains a bootstrap sample T 1 = {t 1 , ...,t N }, where single trajectories can appear several times. We create K = 100 of these bootstrap samples, apply our inference technique to each sample, and obtain a distribution of values for each swimming parameter. The error bars in the main text are the standard deviation from the mean of each swimming parameter.

Appendix D: Smooth weighting of data
The conditioning of section IV B needs the division of the data in different sectors. Instead of a discrete di-vision, we use the whole data set for each sector but weight the data by a Gaussian kernel similar to Ref. [22]. The speed moments for N experimental trajectories when conditioning on a specific swimming angle θ are then calculated according to where we have introduced the width of a section, ∆θ = 0.125π, and their centers θ. Note that we use the actual orientation angle Θ i (t − 2∆t) of the second previous time step to calculate the moments. Tumble events have a finite duration of around 2∆t and this ensures that the whole tumble is connected to the condition of the previous run. The same Gaussian kernels are applied when we calculate the histogram of angular displacements and the autocorrelation functions for speed and direction.
Appendix E: Jump height conditioned on swimming angle θ. Figure 11 shows the relevant plot. There is no systematic dependence of η on the swimming angle.