Optimal navigation of microswimmers in complex and noisy environments

We design new navigation strategies for travel time optimization of microscopic self-propelled particles in complex and noisy environments. In contrast to strategies relying on the results of optimal control theory, these protocols allow for semi-autonomous navigation as they do not require control over the microswimmer motion via external feedback loops. Although the strategies we propose rely on simple principles, they show arrival time statistics strikingly similar to those obtained from stochastic optimal control theory, as well as performances that are robust to environmental changes and strong fluctuations. These features, as well as their applicability to more general optimization problems, make these strategies promising candidates for the realization of optimized semi-autonomous navigation.


I. INTRODUCTION
The problem of finding the most efficient route towards a desired target has important technological and medical applications, such as targeted drug delivery at the microscale [1,2], environmental monitoring [3], and the optimization of plane routes [4]. With the advent of theoretical and experimental prescriptions for how to make microswimmers [5-8], the once fictional idea of microscopic devices capable of delivering molecular cargo at a desired location in the human body has moved closer to reality.
A solution of this classical problem can be traced back to the work of E. Zermelo [9], who derived the steering policy that minimizes the travel time of a vessel moving at constant speed in the presence of wind. In the context of microswimmers, Zermelo's work has been extended to more general settings, including fuel consumption and dissipation minimization [10], time-varying flows [11], motion on non-Euclidean spaces [12], and the role of hydrodynamic interactions [13]. Zermelo's approach, however, does not account for thermal fluctuations, which play a prominent role at the microscale.
Optimal navigation in the presence of noise falls into the class of problems addressed by stochastic optimal control theory [14-18]. Considering a cost function defined on the configuration space (e.g. the mean arrival time at a given target starting from a specific position), a stochastic optimization principle can be used to show that it obeys a so-called Hamilton-Jacobi-Bellman equation, from which the optimal control map can be obtained [16,19]. In parallel, machine learning algorithms such as reinforcement learning provide convenient and increasingly popular routes to determining optimal control landscapes [20-25]. In practice, implementations of such optimal policies can then be achieved via external feedback loops for the actuation of microswimmer motion [26-32].

* ramin.golestanian@ds.mpg.de

On the other hand, a number of natural and artificial microswimmers exhibit tactic behaviour [33], i.e. they are able to adapt their motility in response to external stimuli such as light [34-36], chemical concentration [37-40] or viscosity [41,42] gradients, as well as magnetic [6,43-45] or gravitational [46] fields, in an autonomous fashion. Harnessing the guidance provided by taxis allows microswimmers to perform complex tasks [38,47-49] in a semi-autonomous way, as it does not rely on real-time external feedback mechanisms.
Here, we show how these ideas can be applied to the problem of optimal navigation in complex and noisy environments. Considering a minimal but non-trivial optimization problem in two dimensions, we propose novel navigation policies, inspired by the control maps provided by the exact global optimization problem, which can be implemented in a semi-autonomous fashion. Using extensive Brownian dynamics simulations, we show that the new policies achieve performances comparable to those obtained from stochastic optimal control theory, and demonstrate their robustness to changes in the environment, as well as to positional and rotational fluctuations. Lastly, we illustrate how the semi-autonomous policies can be conveniently adapted to a broader class of problems, such as navigation on curved manifolds.

A. Supervised optimal navigation
We consider an overdamped self-propelled particle moving at a fixed speed v_0 in the presence of a stationary force field f(r), which may in general include a contribution due to advection by the solvent flow velocity, and translational diffusion with diffusivity D. For simplicity, we set the friction coefficient (and mobility) to unity. The position r of the self-propelled particle obeys the following stochastic differential equation:

ṙ = v_0 û + f(r) + √(2D) ξ,   (1)

[Fig. 1: (a) The Taylor-Green flow setup. (b) Control map obtained from Eqs. (2) and (3) for the same parameters as in panel (a). (c) Schematic defining the quantities used for the implementation of the semi-autonomous navigation strategies. The red dot indicates the active particle position r and the red arrow its heading direction û. r_c (red cross) marks the closest point to r on the optimal trajectory, while the two vectors n̂ and t̂ are respectively normal and tangent to the optimal trajectory at r_c.]
where û is the unit vector setting the direction of self-propulsion, and ξ is a Gaussian white noise vector whose components have zero mean and unit variance. Within this setting, the only degree of freedom accessible to the self-propelled particle for navigation is its orientation û. For the sake of presentation, we will first assume full control over û, either by external sources or by the particle itself; this constraint will be relaxed later.
We now want to determine the optimal navigation protocol for the particle to reach a target position r_T, given a force profile and an initial position r_0. Following standard techniques that invoke the backward Fokker-Planck equation [50], the mean first-passage time (MFPT) T(r) to reach r_T starting from r can be found as the solution of the following equation (see Methods for a derivation):

v_0 |∇T(r)| − f(r)·∇T(r) − D ∇²T(r) = 1,   (2)

which is to be solved subject to the boundary condition T(r_T) = 0. Here, the prescription for optimal control consists in tuning the orientation at every point in space according to the rule

û_opt(r) = −∇T(r)/|∇T(r)|.   (3)

We note that Eqs. (2) and (3) provide strategies which, by design, lead to the fastest possible trajectories on average, as can be seen in the vanishing-diffusivity limit, where Zermelo's solution [9] is recovered. Hereafter, we will refer to the strategy corresponding to the solution of Eqs. (2) and (3) as the optimal policy (OP), and to the D = 0 optimal path as the Zermelo path.

We now illustrate OP by considering a simple but non-trivial setup in which a self-propelled particle navigates in the two-dimensional plane spanned by the unit vectors {ê_x, ê_y} between neighbouring stationary points of a Taylor-Green vortex flow (see the colour map and black arrows in Fig. 1a). This configuration corresponds to f(r) = v_f [cos(ky) sin(kx) ê_x − cos(kx) sin(ky) ê_y], with k = 2π/ℓ and ℓ the characteristic length scale of the flow. Rescaling space by ℓ and time by ℓ/v_0, the dynamics (1) is characterized by only two nondimensional parameters: the ratio between the flow intensity and the self-propulsion speed, γ ≡ v_f/v_0, and the Péclet number Pe = v_0 ℓ/D. Here, we will focus only on cases where the self-propulsion dominates the flow, namely 0 ≤ γ ≤ 1.
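As a concrete illustration, the rescaled Taylor-Green force field introduced above can be evaluated as follows. This is a minimal sketch in Python; the function name and the unit-length convention (ℓ = 1, v_0 = 1) are ours.

```python
import numpy as np

def taylor_green_flow(r, gamma, k=2.0 * np.pi):
    """Taylor-Green vortex flow with relative amplitude gamma = v_f / v_0,
    evaluated at position r = (x, y) in rescaled units."""
    x, y = r
    return gamma * np.array([
        np.cos(k * y) * np.sin(k * x),   # f_x
        -np.cos(k * x) * np.sin(k * y),  # f_y
    ])
```

At stagnation points such as the origin the flow vanishes, and its magnitude is bounded by γ in the rescaled units, consistent with the regime 0 ≤ γ ≤ 1 considered in the text.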
With the setup shown in Fig. 1a, the most direct route between the departure and arrival points requires travelling counter-flow all the way. Consequently, the straight path becomes increasingly disadvantageous as the flow amplitude grows, such that for γ ≳ 0.4 the Zermelo path makes use of the flow profile and takes a bell-shaped curve (see the green line in Fig. 1a). For finite Péclet number (D ≠ 0), such a feature is moreover consistent with the control map provided by OP, since the latter generally orients the self-propulsion away from the straight path (Fig. 1b). Simulations of the Brownian dynamics (1) with the control map solving Eqs. (2) and (3) indeed reveal that the OP trajectories tend to remain close to the Zermelo path for a broad range of Péclet numbers, provided that the flow strength and particle self-propulsion dominate over fluctuations (see e.g. the density map in Fig. 2d, and Methods for more details on the stochastic dynamics simulations).

B. Semi-autonomous optimal navigation
The above observations suggest that optimized navigation in the finite Péclet number regime may be achievable using only local information about the relative positions of the stochastic swimmer and the Zermelo path, as opposed to OP, which requires a global control map ∼ ∇T(r). We now explore a number of such local strategies and probe their efficiencies in comparison with OP. For a given particle position r, we define r_c ≡ argmin_{r′ ∈ path} |r − r′| as the corresponding closest point on the Zermelo path. Moreover, we suppose that the latter is smooth and can be parametrized by the moving frame {t̂, n̂} of tangent and normal vectors, with t̂ heading towards the target, as shown in Fig. 1c. Assuming that the swimmer is able to measure its position relative to the Zermelo path, it can regulate this distance by steering its self-propulsion direction û via the following rule:

û = G(∆r) n̂ ± √(1 − G²(∆r)) t̂,   (4)

where ∆r ≡ (r − r_c)·n̂ is the signed distance to the path and the function G ∈ [−1, 1] depends on the amount of information available to the swimmer. As the rhs of Eq. (4) depends on r solely through ∆r and the external flow f, it defines a class of optimal navigation policies relying only on the swimmer's local knowledge of its environment.
In the simplest case where the swimmer can only determine the direction n̂ to the Zermelo path (from its current position), it can choose to keep a constant angle α between its self-propulsion direction and n̂. Such an aligning policy (AP) corresponds to a protocol G_AP = ±cos(α), where the ± sign ensures that û·t̂ ≥ 0.
Although AP allows the swimmer to remain in the vicinity of the Zermelo path, it also slows it down, as it imposes a finite angle between û and t̂ even for (arbitrarily) small separations. For swimmers able to evaluate their distance to the Zermelo path, AP can thus be refined by allowing G to depend on ∆r. This defines the adaptive aligning policy (AAP). Here, we choose, for simplicity, a piecewise linear form for G(∆r), namely

G_AAP(∆r) = −∆r/ε for |∆r| ≤ ε,  and  G_AAP(∆r) = −sign(∆r) otherwise,   (5)

where the parameter ε sets a cut-off scale above which the stochastic particle points normally to the Zermelo path.
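To make the protocols concrete, here is a minimal Python sketch of the protocol functions and the resulting heading rule of Eq. (4). The sign conventions (G opposing the signed displacement ∆r, and the positive tangential branch) are our reading of Eqs. (4)-(5), not a verbatim transcription.

```python
import numpy as np

def G_AAP(dr, eps):
    """Adaptive aligning policy, Eq. (5): linear in the signed distance dr
    up to the cut-off eps, saturating at -sign(dr) beyond it (convention assumed)."""
    return -np.clip(dr / eps, -1.0, 1.0)

def G_AP(dr, alpha):
    """Aligning policy: constant angle alpha with the normal; the sign is
    chosen here so that the swimmer heads back towards the path (assumed)."""
    return -np.sign(dr) * np.cos(alpha)

def heading(dr, G, n_hat, t_hat):
    """Heading of Eq. (4): u = G n + sqrt(1 - G^2) t, with the tangential
    branch chosen so that u . t >= 0."""
    g = G(dr)
    return g * n_hat + np.sqrt(max(0.0, 1.0 - g * g)) * t_hat
```

On the path (∆r = 0) the AAP swimmer moves purely along the tangent; beyond the cut-off it points straight back at the path, matching the description in the text.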
The parameters α and ε introduced above essentially play the role of sensitivities for AP and AAP, respectively. Their optimal values (those that minimize the mean travel time in this case) cannot be selected a priori, and need to be determined empirically. However, the existence of such optimal values can be intuitively understood from the control maps obtained from Eq. (5) for various ε values (the generalization to AP being straightforward). As shown in Figs. 2a-c, exceedingly small ε values force the swimmer to mostly point normally to the Zermelo path, while for excessively large ε stochastic trajectories are less efficiently confined and can visit less favourable flow regions. It is therefore natural to expect an intermediate value of ε to provide the optimal trade-off between efficient confinement and tangential motion along the Zermelo path. This heuristic picture is confirmed by numerical simulations showing that the mean arrival time ⟨t⟩ indeed exhibits a minimum at a value ε = ε_opt (see Fig. 2f). We moreover note that ⟨t⟩ varies relatively little with ε, such that in practice the policy implementation does not require fine tuning of this parameter. The heat map of trajectories obtained from simulations of AAP at optimal ε shows that they globally follow the Zermelo path (Fig. 2e), similarly to the OP case. Contrary to OP, however, they are not distributed symmetrically with respect to the desired path, due to a non-zero transverse component of the flow (see Fig. 1a). Better conformity with the OP results thus requires additional features, such as a function G in Eq. (4) that depends explicitly on the local flow field f(r). Here, we restrict ourselves to the case where the swimmer is unaware of the local flow structure around it, and will address such more elaborate policies in a forthcoming publication [51].

C. Performance assessment of the navigation policies
We now compare the performances of the two policies introduced above (AP and AAP) with that of OP by simulating Eq. (1) with the controls defined by Eqs. (3) and (4) in the Taylor-Green flow setup (Fig. 1a). To illustrate the relevance of non-trivial policies, we moreover consider the straight policy (SP), for which the swimmer always points towards the target regardless of its current position (see the cyan curve in Fig. 1a for a representative trajectory).
We first work at fixed Pe = 400 and vary the relative flow amplitude γ ∈ [0, 1]. Figure 3a shows the arrival time distributions P(τ), with τ ≡ t/⟨t⟩, for each of the four policies (OP, AP, AAP, and SP) at γ = 0.7. A remarkable overlap between the OP and AAP distributions can be observed. We moreover find that they are both well described by a so-called inverse Gaussian distribution with variance σ², which in terms of the rescaled time τ reads

P(τ) = √(⟨t⟩²/(2πσ²τ³)) exp[−⟨t⟩²(τ − 1)²/(2σ²τ)].   (6)

To quantify this correspondence, we furthermore calculate the Kullback-Leibler divergence D_KL = ⟨ln[P_num(τ)/P(τ)]⟩_{P_num} between the numerically obtained distribution P_num and the prediction of Eq. (6), with ⟨t⟩ and σ determined from the data. As shown in Fig. 3b, D_KL remains almost zero for both OP and AAP over a wide range of γ values, highlighting the robustness of Eq. (6). As the inverse Gaussian corresponds to the first-passage time distribution of a driven Brownian particle in one dimension [52], Eq. (6) is closely related to the confinement of the OP and AAP trajectories along the Zermelo path, as shown in Figs. 2d-e. In fact, for both OP and AAP the loss of correspondence with the inverse Gaussian coincides with the regime of strong fluctuations that prevent the swimmers from being efficiently guided along the Zermelo path (details in SI). In contrast, for sufficiently large flow amplitudes the simpler aligning policy shows arrival time distributions that do not follow the inverse Gaussian law (red squares in Figs. 3a,b). These distributions indeed exhibit a crossover from inverse Gaussian-like behaviour at τ < 1 to an exponential decay at τ > 1, with a characteristic time τ_AP that is systematically larger than the value 2σ²/⟨t⟩² predicted by Eq. (6). As detailed in the SI, these deviations from the inverse Gaussian law correspond to asymmetric trajectory distributions around the Zermelo path, similarly to the case observed for OP and AAP in the large noise regime.
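The inverse Gaussian fit and the Kullback-Leibler diagnostic described above can be sketched as follows (Python; the histogram-based D_KL estimator and its binning are our choice, and the density normalization follows our reconstruction of Eq. (6)):

```python
import numpy as np

def inverse_gaussian(tau, mean_t, sigma):
    """Arrival-time density of Eq. (6) for the rescaled time tau = t/<t>;
    the shape parameter lam = <t>^2/sigma^2 gives mean 1 and variance sigma^2/<t>^2."""
    lam = (mean_t / sigma) ** 2
    return np.sqrt(lam / (2.0 * np.pi * tau ** 3)) * np.exp(
        -lam * (tau - 1.0) ** 2 / (2.0 * tau))

def kl_divergence(samples, bins=50):
    """Histogram estimate of D_KL = <ln(P_num/P)>_{P_num} for a set of arrival times,
    with <t> and sigma determined from the data as in the text."""
    mean_t, sigma = samples.mean(), samples.std()
    tau = samples / mean_t
    hist, edges = np.histogram(tau, bins=bins, density=True)
    centres = 0.5 * (edges[1:] + edges[:-1])
    mask = hist > 0                              # skip empty bins
    p_model = inverse_gaussian(centres[mask], mean_t, sigma)
    return np.sum(hist[mask] * np.diff(edges)[mask] * np.log(hist[mask] / p_model))
```

Applied to samples drawn from an actual inverse Gaussian (Wald) distribution, the estimator returns a value close to zero, as expected.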
In particular, for AP most trajectories land to the left of the target, such that in most cases the swimmer has to navigate counter-flow in the last part of its journey. Finally, SP shows arrival time distributions globally compatible with the inverse Gaussian (6), with only slight deviations at large flow amplitudes (see blue circles in Figs. 3a,b), indicating that in this case too the trajectories are nearly one-dimensional. As SP trajectories are mostly oriented against the flow, they are characterized by a lower effective drift on average, resulting in a larger ratio σ/⟨t⟩.
As is customary in this context, we compare the performances of the navigation policies by measuring the mean arrival time at the target, ⟨t⟩ [53], here normalized by the optimal value t_opt at D = 0 obtained from Zermelo's solution. The corresponding results, shown in Fig. 3c, reveal that, naturally, OP performs best, with mean arrival times exceeding t_opt by only a few percent. Conversely, the performance of the trivial SP strongly deteriorates as γ increases. For small flow amplitudes, where the Zermelo path is almost straight (γ ≲ 0.4), SP performs similarly to OP with ⟨t⟩ ≈ t_opt, whereas for sufficiently large γ values it exhibits mean arrival times reaching five to six times t_opt (see the inset of Fig. 3c). Despite the presence of fluctuations, the performances of the policies are thus primarily set by their ability to make efficient use of the stationary flow profile. This feature is moreover illustrated by both AP and AAP, which show mean arrival times no more than 10% higher than that of OP, regardless of the relative flow amplitude. As expected, in the non-trivial cases (γ ≳ 0.4) the performances of the different strategies reflect the amount of information they require for navigation, such that, in order of increasing efficiency, one finds SP, AP, AAP and OP.

D. Robustness of the new protocols
The above analysis shows that AAP displays arrival time statistics similar to those of OP. Both AP and AAP moreover exhibit performances comparable to OP, despite relying only on local information. We now assess the generality of these results, focusing on AAP, by considering more general situations with different model parameters and evaluation setups.

The role of Pe. The two colour maps of Fig. 4a show how the ratio ⟨t⟩/t_opt varies with the Péclet number and the relative flow amplitude γ for OP and AAP. [Figure caption fragment: in (a-c) the value of the Péclet number is set to Pe = 10³; all data in (b,c) are averaged over 10⁵ independent trajectories.] In agreement with previous results, ⟨t⟩/t_opt for OP does not significantly depend on γ, while we observe a slight increase with decreasing Pe. The AAP case, on the other hand, exhibits two distinct regimes. At small flow strengths (γ ≲ 0.4), ⟨t⟩/t_opt remains nearly constant upon varying Pe, such that the AAP performances are not significantly altered by the amplitude of the noise. Conversely, at larger γ values, where the Zermelo path is more curved, the mean arrival time is more affected by translational noise. As shown in Fig. 4b for γ = 0.7, all non-trivial strategies show a slight decrease in performance as Pe is lowered, whereas SP becomes slightly more favourable upon increasing the noise, since in this case stronger fluctuations lead the swimmer to visit less unfavourable flow regions.
Misalignment of self-propulsion. We have so far assumed full control over the self-propulsion orientation û. In reality, however, û is subject to fluctuations, e.g. due to rotational Brownian motion or inaccuracies in the evaluation of the desired direction, which affect the performances of the policies using it as a control. To model the effect of rotational noise, we applied random rotations û → R(β)û to the controls (Eqs. (3) and (4)), where the angle β is sampled from a uniform distribution on (−ηπ, ηπ]; the corresponding results are presented in Fig. 4c. For small η values, we find that the inverse of the mean arrival time normalized by its value at η = 0, ⟨t⟩₀/⟨t⟩_η, decays linearly with η², with a policy-dependent slope. In particular, OP and AAP show similar trends and appear to be much more robust to rotational noise than SP: for the latter, the mean arrival time increases by a factor of 10 at η = 1/2, while the corresponding drop in performance for OP and AAP is about 40%. We moreover show in Methods how the scaling 1 − ⟨t⟩₀/⟨t⟩_η ∼ η² can be derived from a simple argument in the small-noise limit, and how it is related to the quasi-one-dimensional nature of the problem. This scaling is thus not expected to hold for large noise, as suggested by the deviations from the linear decay observed for the largest η values in Fig. 4c.
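The rotational-noise perturbation û → R(β)û can be sketched as follows (Python; the half-open sampling interval of NumPy's `uniform` differs immaterially from (−ηπ, ηπ]):

```python
import numpy as np

def perturb_heading(u_hat, eta, rng):
    """Apply a random rotation u -> R(beta) u with beta drawn uniformly from
    (-eta*pi, eta*pi), modelling imperfect control of the heading direction."""
    beta = rng.uniform(-eta * np.pi, eta * np.pi)
    c, s = np.cos(beta), np.sin(beta)
    rot = np.array([[c, -s], [s, c]])   # 2D rotation matrix R(beta)
    return rot @ u_hat
```

The rotation preserves the unit norm of û, and η = 0 reproduces the fully controlled case.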

Complex navigation tasks.
Increasing the distance between the initial and target points in the Taylor-Green flow allows us to design more complex paths. Upon translating the initial swimmer position along ê_x, as shown in Figs. 4d and 4e, both OP and AAP lead to trajectories focused on average around the Zermelo path. Consequently, the corresponding arrival time distributions remain well characterized by the inverse Gaussian law (Fig. 4f). The performances of AAP and OP are moreover stable upon increasing the total travel distance, with mean arrival times ⟨t⟩ exceeding t_opt by no more than a few percent.

E. Optimal navigation on a manifold
We next show how the AP and AAP navigation protocols are applicable to motion on curved landscapes. Self-propelled motion on curved surfaces has recently attracted growing attention at both the individual [12,54-56] and collective [57-61] levels. As stochastic motion on a generic Riemannian manifold involves multiplicative noise, solving the corresponding MFPT equation (2) requires advanced computational techniques [62], which introduces additional challenges for determining the stochastic optimal control (3). On the other hand, Zermelo's approach was recently generalized to self-propelled motion on curved surfaces using a mapping to Finsler geometry [12]. The corresponding noiseless optimal path, also known as a Randers geodesic, can then straightforwardly be used to extend the AP and AAP policies to non-Euclidean spaces.
For the sake of illustration, let us consider the case of active motion on a sphere in the presence of a unidirectional flow f(θ, φ) = v_f sin θ ê_φ, where θ and φ respectively denote the polar and azimuthal angles of the spherical coordinate system. As shown in Fig. 5a (see the black arrows), this flow, which is characterized by a pair of vortices at the poles and is maximal at the equator, generally leads to non-trivial Randers geodesics (solid green line) between two arbitrary points on the sphere.
Simulating the counterpart of the Langevin equation (1) on the sphere (details in Methods) at fixed Pe = 10³, we are able to compare the performances of AP, AAP and SP. The corresponding arrival time distributions are shown in Fig. 5b. As for the Taylor-Green flow in flat space, we find that they are all in good agreement with the inverse Gaussian law (6). In the AP case, this result is probably due to the rather large value of Pe chosen for convenience, which allows all policies to exhibit trajectories well distributed around a one-dimensional path. Figure 5c moreover shows that at small flow strengths all policies perform similarly, with ⟨t⟩ ≈ t_opt, while for larger γ values, leading to more complex Randers geodesics, SP becomes increasingly disadvantageous. On the contrary, both AP and AAP always remain close to optimality, as they exploit the information of the noiseless optimal path.
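The spherical constructions used by AP and AAP (great-circle distance, local direction to the path, and the Rodrigues rotation of the heading; see Methods) can be sketched in Python as follows, with illustrative function names:

```python
import numpy as np

def great_circle_distance(r, rc, R=1.0):
    """Great-circle distance between points r and rc on a sphere of radius R."""
    cosang = np.clip(np.dot(r, rc) / R ** 2, -1.0, 1.0)
    return R * np.arccos(cosang)

def direction_to(r, rc):
    """Unit tangent at r along the shortest arc towards rc
    (assumes r and rc are neither parallel nor antiparallel)."""
    r_hat = r / np.linalg.norm(r)
    n = rc - np.dot(rc, r_hat) * r_hat   # project rc onto the tangent plane at r
    return n / np.linalg.norm(n)

def rotate_about(v, axis, beta):
    """Rodrigues rotation of v by angle beta about the unit vector `axis`."""
    a = axis / np.linalg.norm(axis)
    return (v * np.cos(beta) + np.cross(a, v) * np.sin(beta)
            + a * np.dot(a, v) * (1.0 - np.cos(beta)))
```

Rotating the direction vector n̂ about the local radial axis by ±arccos(G) then yields the heading prescribed by the protocols.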

III. DISCUSSION
We have introduced a class of policies that allow for semi-autonomous optimal navigation of microswimmers in complex and noisy environments. These policies rely on the swimmer's knowledge of its local environment, such as its relative position and distance to a desired path, and their performances were found to improve with the amount of information available to the swimmer. In particular, extensive numerical simulations reveal that the adaptive aligning policy (for which the self-propulsion orientation varies with the distance to the Zermelo path) shows performances comparable to those obtained from the optimal MFPT control. The two strategies lead to statistically similar trajectories and nearly identical inverse Gaussian arrival time distributions. Our analysis moreover shows that the best performing strategies are also the most robust to environmental changes, such as stronger translational diffusion and the introduction of rotational noise. Finally, the newly introduced navigation strategies have the additional advantage of being easily applicable to the problem of optimal navigation on curved surfaces. In an illustrative example in spherical geometry, the semi-autonomous navigation strategies were once again found to perform significantly better than the trivial one consisting of pointing straight at the target.
Although the analysis carried out here focused on the problem of travel time optimization, the policies we proposed are fully determined by the noiseless optimal path. Therefore, they are straightforwardly generalizable to a broader class of optimization problems like energy dissipation or fuel consumption minimization. Furthermore, even though the new policies show remarkable performances as compared to OP, some differences persist. In particular, while trajectories following OP are symmetrically distributed with respect to the Zermelo path, this is not the case for AP and AAP (Figs. 2(d,e)). Our ongoing work investigates how further improvement could be reached by designing policies based on the ability of some swimmers to adapt their swimming direction according to the local flow field [39,41].
Lastly, we note that the constraint of imposing a constant sensitivity (represented by the parameters α and ε) throughout the swimmer motion might restrict the performances of the policies. While making these parameters explicitly space dependent would break the semi-autonomous nature of the policies, determining the functions α(r) or ε(r) is certainly a simpler problem than calculating the full optimal control map of the swimmer orientation. Therefore, the framework presented here could serve as a basis for reinforcement learning based approaches applied to complex navigation problems [20,21,25].

IV. METHODS
The optimal control for an active particle in a stationary flow. The scope of this section is to provide a derivation of the mean first-passage time equation and the corresponding optimal control (Eqs. (2) and (3) of the main text). The derivation relies on standard techniques presented in Ref. [50].
First of all, we aim to derive a policy that minimizes the mean arrival time at the target given an initial position x; let us therefore denote by T(x) the mean travel time. Now, consider the probability p(y, t|x, 0) that at time t the swimmer is at position y, given that it started from x at t = 0. This probability obeys the backward Fokker-Planck equation associated with (1),

∂_t p(y, t|x, 0) = [v_0 û(θ)·∇ + f(x)·∇ + D∇²] p(y, t|x, 0),   (7)

where the gradients are taken with respect to x. Moreover, p is related to T via T(x) = ∫₀^∞ dt ∫ dy p(y, t|x, 0), such that we obtain

[v_0 û(θ)·∇ + f(x)·∇ + D∇²] T(x) = −1.   (8)

The optimal choice of the heading direction θ (our control parameter) can be obtained by taking the variational derivative of both sides of (8) with respect to θ itself, leading to

û_opt(x) = −∇T(x)/|∇T(x)|,   (9)

which corresponds to Eq. (3) of the main text. Putting together Eqs. (8) and (9), we get the MFPT equation of our problem (Eq. (2) of the main text).
Numerical methods. The Brownian dynamics simulations of Eq. (1) were performed using the Euler-Maruyama scheme with a time step dt = 10⁻³.
We have verified that the selected time step is sufficiently small, such that the results presented here do not depend on its value. In all our simulations, a given run ends when the active particle is within a distance δr = 0.025 of the target. We have also checked that the choice of the disk radius δr does not significantly influence the results as long as the length scale of thermal fluctuations over a time step is kept relatively small, i.e. √(2D dt) ≪ δr.
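A minimal sketch of this simulation protocol follows (Python; the `policy` and `flow` callables, the time cap `t_max`, and the convention v_0 = 1 in rescaled units are our assumptions):

```python
import numpy as np

def simulate_arrival(r0, r_target, policy, flow, D, dt=1e-3, delta_r=0.025,
                     t_max=1e3, rng=None):
    """Euler-Maruyama integration of Eq. (1) in rescaled units (v_0 = 1);
    a run ends once the particle enters the disk of radius delta_r around the target."""
    if rng is None:
        rng = np.random.default_rng()
    r = np.array(r0, dtype=float)
    noise_amp = np.sqrt(2.0 * D * dt)
    t = 0.0
    while t < t_max:
        if np.linalg.norm(r - r_target) < delta_r:
            return t                              # arrival time
        u_hat = policy(r)                         # unit heading from the chosen protocol
        r += (u_hat + flow(r)) * dt + noise_amp * rng.standard_normal(2)
        t += dt
    return np.inf                                 # did not reach the target in time
```

With no flow and weak noise, a straight policy covers the distance to the target disk at unit speed, which provides a quick sanity check of the scheme.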
Details on the implementation of the protocols.
Here we provide details regarding the implementation of the navigation strategies presented in the main text. The MFPT equation (2) was solved numerically using the finite element method implemented in the NDSolve routine of Wolfram Mathematica 12.3.1 [63]. The optimal policy was then implemented from the corresponding solution of (3) by discretizing the simulation domain on a square grid of step l = 0.01 and assigning to each box the optimal control orientation û_opt(r_b), with r_b the position of the centre of the box. In the stochastic simulations, a swimmer following OP then aligned its direction of motion with the orientation associated with its current position on the grid.
Both AP and AAP rely on the evaluation of the point r c on the Zermelo path closest to the particle position r.
For numerical efficiency, the Zermelo path was thus discretized, and the distance between the particle and the curve was calculated from the positions of the midpoints of the segments. In all simulations, the initial particle orientation was taken equal to the one prescribed by the Zermelo solution.
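The midpoint-based closest-point search and the associated local frame can be sketched as follows (Python; for a two-dimensional path, with the normal defined as the 90° rotation of the tangent, a convention of ours):

```python
import numpy as np

def closest_point(r, path):
    """Index and midpoint of the path segment closest to position r;
    `path` is an (N, 2) array of points discretizing the Zermelo path."""
    mids = 0.5 * (path[:-1] + path[1:])          # segment midpoints
    i = np.argmin(np.linalg.norm(mids - r, axis=1))
    return i, mids[i]

def local_frame(path, i):
    """Unit tangent (towards the target, i.e. increasing index) and unit
    normal of segment i."""
    t_hat = path[i + 1] - path[i]
    t_hat = t_hat / np.linalg.norm(t_hat)
    n_hat = np.array([-t_hat[1], t_hat[0]])      # 90-degree rotation of the tangent
    return t_hat, n_hat
```

The signed distance ∆r = (r − r_c)·n̂ then feeds directly into the protocol function G.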
The scaling of the mean arrival time with rotational noise. Here, we show how the linear scaling of the mean arrival time ⟨t⟩_η with the rotational noise strength η², shown in Fig. 4c, can be understood from an effective one-dimensional model of driven Brownian motion.
Assuming that the particle remains in the vicinity of the mean path and neglecting effects due to the curvature of the latter, we consider the following dynamics:

ṙ_∥ = v_∥ cos θ + √(2D) ξ_∥,   (10)

where the subscript ∥ stands for quantities projected along the mean path. The first term on the rhs of (10) thus accounts for the total mean velocity of the particle along the mean path, which includes the combined effects of flow and self-propulsion. In general, the angle θ obeys a non-trivial and policy-dependent dynamics. However, in the limit of small η and D, where the particle remains close to the mean path, we approximate θ by a Gaussian noise with zero mean and variance ∝ η². Expanding the cosine and averaging over the noises, we thus obtain

⟨ṙ_∥⟩ = v_∥(1 − κη²),   (11)

with κ > 0 a constant that depends on the navigation details, e.g. the protocol used. In the one-dimensional approximation, and assuming that v_∥ varies little with r_∥, the mean travel time to reach an absorbing barrier at a distance L scales as ⟨t⟩_η ∼ L/⟨ṙ_∥⟩, such that we obtain

⟨t⟩₀/⟨t⟩_η ≃ 1 − κη²,   (12)

which corresponds to the scaling observed in Fig. 4c.
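The Gaussian average of cos θ underlying the small-η expansion above can be checked numerically: for a zero-mean Gaussian angle of standard deviation s, the exact average is exp(−s²/2) ≈ 1 − s²/2.

```python
import numpy as np

def mean_cos(s, n=10**6, seed=0):
    """Monte-Carlo estimate of <cos(theta)> for theta ~ N(0, s^2)."""
    theta = np.random.default_rng(seed).normal(0.0, s, n)
    return np.cos(theta).mean()

# compare the sampled average with the exact result and its quadratic expansion
for s in (0.1, 0.3):
    print(s, mean_cos(s), np.exp(-s**2 / 2.0), 1.0 - s**2 / 2.0)
```

The agreement at small s illustrates why the drift reduction, and hence the arrival time correction, is quadratic in the noise strength.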
Langevin simulations on the sphere. To describe the motion of an overdamped particle on the sphere, the Langevin equation (1) has to be adjusted to take into account the multiplicative noise induced by the curvature of the space. Namely, it is given by

ṙ = (1 − r̂r̂ᵀ)[v_0 û + f(r) + √(2D) ξ],   (13)

where r̂ ≡ r/|r| and the projector (1 − r̂r̂ᵀ) constrains the dynamics to the local tangent plane, while the noise ξ shares the same statistics as in Eq. (1) and is interpreted in the Stratonovich sense. Furthermore, in contrast with the Taylor-Green flow case studied in the main text, for this setup the characteristic length scale of the flow is comparable to the sphere radius, ℓ ∼ R, such that the Péclet number is here defined as Pe = Rv_0/D. Lastly, the extension of AP and AAP to motion on a sphere follows straightforwardly from the presentation in the main text, as it only requires generalizing the definitions of the distance and relative direction between two points of interest. On a sphere of radius R, the shortest distance between the points r and r_c, also known as the great-circle distance, is defined as

∆r = R arccos(r̂·r̂_c).   (14)

The direction n̂ from r to r_c at the point r is likewise defined from the shortest arc linking the two points, namely

n̂ = [r̂_c − (r̂·r̂_c) r̂] / |r̂_c − (r̂·r̂_c) r̂|.   (15)

The desired heading direction û for AP and AAP was then obtained from rotations of n̂ around the axis set by r̂ using the Rodrigues rotation formula [64],

û = G(∆r) n̂ ± √(1 − G²(∆r)) (r̂ × n̂),   (16)

where the protocol function G(∆r) is defined as in the main text and the ± sign ensures that û·t̂ ≥ 0.

Appendix A: An alternative policy performance indicator

As is customary in the context of optimal navigation, the primary performance indicator is the mean arrival time at the target, ⟨t⟩. The corresponding results shown in the main text reveal that the new, simple protocols introduced here exhibit performances very close to that of OP, regardless of the relative flow strength.
However, as our numerical simulations give us access to the full arrival time distribution, we may gain further insight into the performance of a navigation protocol by considering additional observables that also explicitly take into account the effect of fluctuations.
The specific choice made here stems from the following remark: owing to the presence of thermal fluctuations, the active particle may reach the target in less time than at D = 0. The frequency of these events can be quantified by the probability of arriving before the optimal noiseless time t_opt (defined by the Zermelo solution, see the main text): Prob(t < t_opt) ≡ π_<. The latter measures how well the policies manage to exploit fluctuations by maximizing the frequency of small arrival times. Note that since in general ⟨t⟩ ≥ t_opt, π_< is bounded from above by 1/2.
The new navigation policies introduced in the main text both depend on a free parameter representing the protocol sensitivity. Figure 6 shows the policies' performances, measured from $\langle t \rangle$ and $\pi_<$, as a function of the sensitivity for both the adaptive aligning policy (AAP, left) and the aligning policy (AP, right). Similarly to the mean arrival time, the performance indicator $\pi_<$ yields a clear optimal sensitivity value for both AP and AAP. Moreover, the optimal values obtained independently from $\langle t \rangle$ and $\pi_<$ generally coincide for AAP, while small differences are observed for AP. In the latter case, choosing either of the two estimates does not lead to significant variations of $\langle t \rangle$ and $\pi_<$, and the corresponding arrival time distributions do not differ significantly.
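The selection of the optimal sensitivity from the two indicators can be sketched as below (names and toy data are illustrative); for AP the two returned values may differ slightly, as discussed above:

```python
import numpy as np

def optimal_sensitivity(sensitivities, mean_times, pi_less):
    """Return the sensitivity minimizing <t> and the one maximizing pi_<."""
    s = np.asarray(sensitivities, dtype=float)
    return s[np.argmin(mean_times)], s[np.argmax(pi_less)]
```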
Performance and robustness assessment. In Fig. 7a we show the probability $\pi_<$ as a function of the relative flow strength $\gamma$. On the one hand, the policies' performances show trends analogous to those reported in the main text for the mean arrival time, with the hierarchy preserved at strong flows: in order of increasing performance we again find SP, AP, AAP, and OP. On the other hand, the differences between the various protocols appear more striking. For example, at $\gamma = 1$ the probability that a swimmer following AAP reaches the target in a time shorter than $t_{\rm opt}$ is around 35% lower than for a swimmer following OP, while the corresponding mean arrival times deviate by only a few percent (see main text). As shown in Fig. 7b, similar conclusions can be reached by examining the behaviour of $\pi_<$ as a function of the Péclet number. Namely, OP, AAP, and AP all show a slight decrease in performance upon increasing the strength of fluctuations, while SP becomes comparatively more advantageous at large noise. For completeness, we also show in Fig. 7c the heat maps comparing the values of $\pi_<$ for OP and AAP as functions of $\gamma$ and Pe. Here again, the behaviour of the performance indicator is similar to that of the mean arrival time.
Overall, although the new indicator $\pi_<$ explicitly accounts for the effect of fluctuations and thus provides a more refined evaluation of the protocols' performances, its behaviour is largely dominated by that of the mean arrival time. $\pi_<$ therefore leads to qualitatively analogous conclusions regarding the policies' efficiency, which highlights the robustness of the results presented in the main text.
Appendix B: Crossover from inverse Gaussian to exponential decay for AP

In the main text, we showed that for most flow and noise strengths the arrival time distributions of AAP and OP essentially follow inverse Gaussian laws, whose expression we report here for convenience:

$$P(\tau) = \sqrt{\frac{\langle t\rangle^2}{2\pi\sigma^2\tau^3}}\,\exp\!\left[-\frac{\langle t\rangle^2(\tau-1)^2}{2\sigma^2\tau}\right], \qquad \mathrm{(B1)}$$

where $\sigma^2$ denotes the corresponding variance and $\tau = t/\langle t\rangle$. However, we observed that while the arrival time distributions for AP are well described by the inverse Gaussian law at small $\gamma$, significant deviations arise when increasing the intensity of the flow. In particular, these deviations mainly lie in the large-time tail of the distribution, which shows an exponential decay with a characteristic time $\tau_{\rm AP}$ generally larger than the value $2\sigma^2/\langle t\rangle^2$ predicted by the inverse Gaussian law (B1) (see Fig. 8b). Defining $\tau^* \equiv 2\sigma^2/\langle t\rangle^2$ from the variance-to-squared-mean ratio of the arrival time for each protocol, its scaling with the flow strength is shown in Fig. 8a for all policies. As a sign that AAP and OP on average lead the swimmer to travel faster as the flow strength is increased, $\tau^*$ decays with $\gamma$ for these two policies. In contrast, swimmers following SP always travel counter-flow and thus get slower on average as $\gamma$ increases. They are therefore more subject to fluctuations, such that for SP $\tau^*$ grows with $\gamma$. Lastly, for AP $\tau^*$ undergoes a crossover from a decay with flow strength at small $\gamma$ to a growth with $\gamma$ at large flows (red squares in Fig. 8a). On the other hand, the value $\tau_{\rm AP}$ obtained from the large-time tails of the distribution always grows with $\gamma$, similarly to SP (purple inverted triangles in Fig. 8a). These observations can be rationalized from the heat map of AP trajectories shown in Fig. 8c. Namely, it shows that most of the trajectories end to the left of the target, such that the swimmers have to travel counter-flow and are thus generally slower in the final stretch.
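As a sketch, the rescaled inverse Gaussian law (B1) and the tail constant $\tau^* = 2\sigma^2/\langle t\rangle^2$ can be written as below, parametrized here by the mean arrival time and its variance (function names are illustrative). For large $\tau$, the exponent of (B1) behaves as $-\tau/\tau^*$, which is the exponential decay referred to in the text.

```python
import numpy as np

def inverse_gaussian_pdf(tau, mean_t, var_t):
    """Inverse Gaussian law (B1) for the rescaled arrival time tau = t/<t>.

    Its large-tau tail decays as exp(-tau/tau_star) with tau_star = 2 var_t / mean_t**2."""
    s2 = var_t / mean_t**2                    # sigma^2 / <t>^2
    return np.sqrt(1.0 / (2.0 * np.pi * s2 * tau**3)) \
        * np.exp(-(tau - 1.0) ** 2 / (2.0 * s2 * tau))

def tau_star(mean_t, var_t):
    """Characteristic decay time of the exponential tail predicted by (B1)."""
    return 2.0 * var_t / mean_t**2
```

By construction, the density is normalized and the rescaled time $\tau$ has unit mean.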
This suggests that, for AP, the events characterized by $\tau > 1$ are dominated by this last part of the swimmer's journey, ultimately caused by the pronounced overall asymmetry with respect to the Zermelo path. The inverse Gaussian law moreover describes the arrival time distributions well over a wide range of noise amplitudes, with significant deviations arising only for $\mathrm{Pe}^{-1} \gtrsim 0.01$. As shown in the heat maps in Fig. 9b (left and central panels), when fluctuations are strong the stochastic trajectories are indeed less focused around the optimal path and also more asymmetrically distributed around it. As already discussed in the previous section, this asymmetry leads to larger tails in the probability distributions (data not shown) and therefore to the observed deviations from the inverse Gaussian law.
In this scenario, the shape of the SP distribution turns out to be the most robust to fluctuations. This can be better understood by looking at the corresponding heat map in Fig. 9b (rightmost panel). Despite being more dispersed, the stochastic trajectories still appear symmetrically distributed around the straight path. This is closely related to the symmetry of the flow with respect to the line connecting the starting point to the target (see Fig. 1a in the main text).