Inferring characteristics of bacterial swimming in biofilm matrix from time-lapse confocal laser scanning microscopy

Biofilms are spatially organized communities of microorganisms embedded in a self-produced organic matrix, conferring to the population emerging properties such as an increased tolerance to the action of antimicrobials. It was shown that some bacilli were able to swim in the exogenous matrix of pathogenic biofilms and to counterbalance these properties. Swimming bacteria can deliver antimicrobial agents in situ, or potentiate the activity of antimicrobial by creating a transient vascularization network in the matrix. Hence, characterizing swimmer trajectories in the biofilm matrix is of particular interest to understand and optimize this new biocontrol strategy in particular, but also more generally to decipher ecological drivers of population spatial structure in natural biofilms ecosystems. In this study, a new methodology is developed to analyze time-lapse confocal laser scanning images to describe and compare the swimming trajectories of bacilli swimmers populations and their adaptations to the biofilm structure. The method is based on the inference of a kinetic model of swimmer populations including mechanistic interactions with the host biofilm. After validation on synthetic data, the methodology is implemented on images of three different species of motile bacillus species swimming in a Staphylococcus aureus biofilm. The fitted model allows to stratify the swimmer populations by their swimming behavior and provides insights into the mechanisms deployed by the micro-swimmers to adapt their swimming traits to the biofilm matrix.


May 24, 2022
Abstract Biofilms are spatially organized communities of microorganisms embedded in a self-produced organic matrix, conferring to the population emerging properties such as an increased tolerance to the action of antimicrobials. It was shown that some bacilli were able to swim in the exogenous matrix of pathogenic biofilms and to counterbalance these properties. Swimming bacteria can deliver antimicrobial agents in situ, or potentiate the activity of antimicrobial by creating a transient vascularization network in the matrix. Hence, characterizing swimmer trajectories in the biofilm matrix is of particular interest to understand and optimize this new biocontrol strategy in particular, but also more generally to decipher ecological drivers of population spatial structure in natural biofilms ecosystems.
In this study, a new methodology is developed to analyze time-lapse confocal laser scanning images to describe and compare the swimming trajectories of bacilli swimmers populations and their adaptations to the biofilm structure. The method is based on the inference of a kinetic model of swimmer populations including mechanistic interactions with

Introduction
Biofilm is the most abundant mode of life of bacteria and archaea on earth [15,14]. They are composed of spatially organized communities of microorganisms embedded in a self-produced extracellular polymeric substances (EPS) matrix. EPS are typically forming a gel composed of a heterogenous mixture of water, polysaccharides, proteins and DNA [13]. The biofilm mode of life confers to the inhabitant microbial community strong ecological advantages such as resistance to mechanical or chemical stresses [3] so that conventional antimicrobial treatments remain poorly efficient against biofilms [6]. Different mechanisms were invoked such as molecular diffusionreaction limitations in the biofilm matrix and the cell type diversification associated with stratified local microenvironments [5]. Biofilms can induce harmful consequences in several industrial applications, such as water [2], or agri-food industry [12], leading to significant economic and health burden [23]. Indeed, it was estimated that the biofilm mode of life is involved in 80% of human infection and usual chemical control leads to serious environmental issues [3]. Hence, finding efficient ways to improve biofilm treatment represents important societal sustainable perspectives.
Motile bacteria have been observed in host biofilms formed by exogenous bacterial species [18,27,36,13]. These bacterial swimmers are able to penetrate the dense population of host bacteria and to find their way in the interlace of EPS. Doing so, they visit the 3D structure of the biofilm, leaving behind them a trace in the biofilm structure, i.e. a zone of extracellular matrix free of host bacteria (1 a and Appendix A A.3). Hence, bacterial swimmers are digging a network of capillars in the biofilm, enhancing the diffusivity of large molecules [18], allowing the transport of biocide at the heart of the biofilm, reducing islands of living cells. The potentiality of bigger swimmers has also been studied for biofilm biocontrol, including spermatozoa [31], protozoans [11] or metazoans [22]. Recent results suggest a deeper role of bacterial swimmers in biofilm ecology with the concept of microbial hitchhiking: motile bacteria can transport sessile entities such as spores [33], phages [50] or even other bacteria [41], enhancing their dispersion within the biofilm. Hence, characterizing microbial swimming in the very specific environment of the biofilm matrix is of particular interest to decipher biofilm spatial regulations and their biocontrol, but more generally in an ecological perspective of microbial population dynamics in natural ecosystems.
Bacterial swimming is strongly influenced by the micro-topography and bacteria deploy strategies to sense and adapt their motion to their environment [25], with specific implications for biofilm formation and dynamics [9]. Model-based studies were conducted to characterize bacterial active motion in interaction with an heterogeneous environment. An image and modelbased analysis showed non-linear self-similar trajectories during chemotactic motion with obstacles [24]. Theoretical studies explored Brownian dynamics of self-propelled particles in interaction with filamentous structures such as EPS [20] or with random obstacles, exhibiting continuous limits and different motion regimes depending on obstacle densities [8,7]. Image analysis characterized different swimming patterns in polymeric fluids [34], completed by detailed comparisons between a micro-scale model of flagellated bacteria in polymeric fluids and high-throughput images [30]. Models of bacterial swimmers in visco-elastic fluids were also developed to study the force fields encountered during their run [26]. However, to our knowledge, no study tried to characterize swimming patterns in the highly heterogeneous environment presented by an exogenous biofilm matrix.
In this study, we aim to provide a quantitative characterization of the different swimming behaviours in adaptation to the host biofilm matrix observed by microscopy. We focus on identifying potential species-dependent swimming characteristics and quantifying the swimming speed and direction variations induced by the host biofilm structure. To address these goals, three different Bacillus species presenting contrasted physiological characteristics are selected. First, different trajectory descriptors accounting for interactions with the host biofilm are defined, allowing to discriminate the swim of these bacterial strains by differential analysis. Then, a mechanistic random-walk model including swimming adaptations to the host biofilm is introduced. This model is numerically explored to identify the sensitivity of the trajectory descriptors to the model parameters. An inference strategy is designed to fit the model to 2D+T microscopy images. The method is validated on synthetic data and applied to a microscopy dataset to decipher the swimming behaviour of the three Bacillus.

Ultrastuctural bacterial morphology
Images of three bacterial swimmers -Bacillus pumilus (B. pumilus), Bacillus sphaericus (B. sphaericus) and Bacillus cereus (B. cereus) -acquired by Transmitted Electron Microscopy (TEM) are displayed in 2. Important structural and physiological differences can be observed between these Bacillus. First, they show noticeable difference in length and diameter, B. sphaericus being the longest bacteria by a factor of approximatively 1.5, Figure 1: Microscopy data and model outlines. (a) Temporal stacks of 2D images are acquired, with different fluorescence colors for host bacteria (Staphylococcus aureus, green) and swimmers (Bacillus pumilus, Bacillus sphaericus or Bacillus cereus, red). Bacterial swimmers navigate in a host biofilm and are tracked in the different snapshots. Swimmer trajectories are represented with white lines. High density and low density zones of host cells are visible in the biofilm (green scale). (b) Additionally to speed and acceleration distributions, three trajectory descriptors are considered. Distance is the total length of the trajectory path. Displacement is the distance between the initial and final points of the trajectory. Visited area is the total area of the pores left by the swimmer during its path. Hence, when a swimmer retraces its steps, the displacement is incremented but not the visited area. (c)Three different mechanisms are considered in the mechanistic model.
Biofilm-dependant speed. A target speed is defined accordingly to the local density of biofilm and asymptotically reached after a relaxation time. Biofilm-dependent direction. Swimming direction is defined accordingly to the local biofilm density gradient. Random walk. A Brownian motion is added. (d) The image acquisition workflow is composed of a first step at the wet lab where host biofilm and swimmer are plated and imaged in different color channels. Then a post-processing phase recomposes the swimmer trajectories with tracking algorithms. Finally, temporal positions, speeds and accelerations are computed. On the biofilm channel, density and density gradient maps are processed at each time step. Images at lower scale are made with a zoom in on the flagella insertion (right panel). Note that the zoom in is optical so that the zoomed in image do not correspond to a zone of the larger scale images. and B. cereus and B. pumilus having similar size, but B. cereus showing a higher aspect ratio. Secondly, they do not have the same type of flagella: B. pumilus and B. sphaericus present several long flagella distributed over the whole surface of the membrane while B. cereus shows a unique brush-like bundle of very thin flagella, at its back tip.
We now wonder if these ultrastructural differences could impact their swimming behaviour in a host biofilm or in a Newtonian control fluid: could the longer body of B. sphaericus be an impediment in a crowded environment such as a biofilm or on the contrary could its larger size give it a higher strength to cross the biofilm matrix? Is the unique brush-like flagella of B. cereus an advantage or a disadvantage to swim in a Newtonian fluid or in a host biofilm?
, is plotted versus the direction change, defined as the angle θ s i (t) between the incoming and outgoing velocity vectors θ s . The left and bottom panels indicate the marginal distributions, with the mean (dashed line) and quantiles 0.05, 0.5 and 0.95 (plain lines). Lower panel: we display the distribution of the instantaneous acceleration norm respectively to the local biofilm density gradient (i.e. ||A i (t)|| function of ∇b(X i (t))) and of the instantaneous velocity norm respectively to the local biofilm density (i.e. ||V i (t)|| function of b(X i (t)), structured by population. The point cloud of each species is approximated by a gaussian kernel and gaussian kernel isolines enclosing 5, 50 and 95% of the points centered in the densest zones are displayed to facilitate comparisons between species (see Materials and Methods Plots and statistics). 8

Characterizing bacterial swimming in a biofilm matrix through image descriptors
2D+T Confocal Laser Scanning Microscopy (CLSM) of the three Bacillus swimming in a Staphylococcus aureus (S. aureus) host biofilm or in a control Newtonian buffer are acquired (see 1 d). Swimmers and host biofilms are imaged with different fluorescence dyes, allowing their acquisition in different color channels, and to recover in the same spatio-temporal referential the swimmer trajectories and the host biofilm density (see Materials and Methods and 1). Namely, for each species s and individual swimmer i, we recover the initial (T s 0,i ) and final (T s end,i ) observation times (when the swimmer goes in and out the focal plane, see Material and Methods sect. Confocal Laser Scanning Microscopy (CLSM)), and the number T s i of time points in the trajectory. We then extract from the 2D+T images the observed position, instantaneous speed and acceleration time-series x) the dynamic biofilm density maps obtained from the biofilm images, we also compute the local biofilm density and density gradient along trajectories t → b s (t, X s i (t)), and t → ∇b s (t, X s i (t)). The angle θ s i (t) and the average velocityV s i (t) between two successive speed vectors are also collected (see 4 and Material and method sec. Post-processing of image data).
Different swimming patterns can be deciphered by qualitative observations of the trajectories X s i (t) (3) in the biofilm and in the control Newtonian buffer, and run-and-tumble swimming patterns are quantified with θ s i (t) andV s i (t) (4). B. sphaericus has a similar run-and-reverse behaviour in the biofilm and the control buffer with trajectories divided between back and forth paths around the starting point and long runs, the biofilm strongly impairing its speed and increasing the number of reverse events. By contrast, B. pumilus clearly switches its swimming behaviour in the biofilm, from quasi-straight runs in the Newtonian buffer to a pronounced run-andreverse behaviour in the biofilm with decreased speeds and chaotic trajectories. On the contrary, B. cereus swimmers manage to conserve comparable trajectories and distributions of swimming speed and direction in the biofilm compared to control. Interestingly, the number of reverse events is even reduced in the host biofilm for B. cereus.
For further quantitative analysis, trajectory descriptors are defined. We first investigate the distribution of the population-wide average acceleration and velocity norms 1 where · denotes the Euclidian norm. We also quantify the swimming kinematics by computing the travelled distance dist s i along the path and the total displacement disp s i , i.e. the distance between the initial and final trajectory points, We finally compute the total biofilm area visited by a swimmer along its path (see 1 b). The same descriptors are computed in the control Newtonian buffer.
The three species present contrasted distributions for these descriptors (5). B. sphaericus has the smallest mean (||A|| = 0.58 and ||V || = 0.70) and median ( A = 0.50 and V = 0.53) values of acceleration and speed, while B. pumilus has the widest distributions (difference between 95 and 5% centiles of 2.76 for A and 2.45 for V compared to 1.00, 1.51 and 1.90, 1.49 for B. sphaericus and B. cereus respectively). B. cereus for its part shows the highest accelerations, indicating larger changes in swimming velocities, but median and mean speeds comparable to B. pumilus (5, A and V panels). We also note that B. sphaericus and to a lower extent B. pumilus trajectories have a significant amount of null or small average speeds, while B. cereus trajectories have practically no zero velocity, consistently with the qualitative analysis (5, V panels). Small velocities episodes of B. sphaericus and B. pumilus could occur during their back-and-forth trajectories, which produce small displacements and pull the displacement distribution towards lower values than B. cereus (5 , Disp panel). B. pumilus displacement is intermediary. Conversely, back-and-forth trajectories can produce large swimming distances for B. sphaericus and B. pumilus(mean adimensioned value of 32.2 and 43.2 respectively) so that B. sphaericus has a distance distribution comparable to B. cereus (mean adimensioned value of 29.6,5 , Dist panel), but lower than B. pumilus. Observing conjointly displacement and distance (5, lower-right panel) provides consistent insights: B. sphaericus shows a large variability of small displacement trajectories, from small to large distances, while B. cereus trajectory displacement seems to vary almost linearly with the distance at least for the points inside the isoline 50%. B. pumilus has again an intermediary distribution, with a large range of displacement-distance couples. The distributions of visited areas of B. pumilus and B. cereus are almost identical, and higher than B. sphaericus one. Compared to the control buffer, all descriptors are reduced in the biofilm. Consistently with previous observations, the displacement (disp) is strongly reduced for B. pumilus, and less impacted for B. sphaericus and B. cereus. These observations must be related to the behavioural switch for B. pumilus and to the identical swimming patterns for the two other Bacilii in the biofilm compared to the control fluid.
All together, this data depict 1) a long-range species, B. cereus, which moves efficiently in the biofilm during long, relatively straight, rapid runs, almost identically as in a Newtonian fluid 2) a short-range species, B. sphaericus, that moves mainly locally in small areas in the biofilm and in the con-trol buffer with lower accelerations and speeds except few exceptions (only 6% of its trajectories induced a displacement higher than 10µm compared to 28% for B. cereus and 26% for B. pumilus) and 3) a medium-range species, B. pumilus, with a large diversity of rapid trajectories, from small to large displacement, and a behavioural change from straight runs in a Newtonian fluid to frequent run-and-reverse events in the biofilm. These kinematics discrepancies for B. pumilus and B. cereus allow them however to cover identical visited areas.
Though, these global descriptors do not inform about potential adaptations of the swimmers to the biofilm matrix. We first check if swimmer velocities are directly linked to the local biofilm density, and if the swimmers adapt their trajectory according to density gradients by plotting the points ( ∇b(t, X s i (t)) , A s i (t) ) and (b(t, X s i (t)), V s i (t) ) (5, lower panel). Clear differences between the three species can be deciphered. First, the three Bacillus do not have the same distribution of visited biofilm density and gradient. B. pumilus swimmers visit denser biofilm with higher variations than the other species while B. sphaericus and B. cereus stay in less dense and smoother areas, the quantile 0.5 of these species being circumscribed in low gradient and low density values. Next, B. cereus has a wider distribution of accelerations, specially for small density gradients, compared to B. pumilus and B. sphaericus. This could indicate that when the biofilm is smooth, B. cereus samples its acceleration in a large distribution of possible values. Finally, we observe that the speed distribution rapidly drops for increasing biofilm densities for B. sphaericus and B. cereus, while the decrease is much smoother for B. pumilus. These observations provide additional insights in the species swimming characteristics: B. pumilus swimmers seem to be less inconvenienced by the host biofilm density than the other species, while B. cereus and B. sphaericus bacteria appear to be particularly impacted by higher densities and to favor low densities where it can efficiently move. Though, B. sphaericus has lower motile capabilities than B. cereus when the biofilm is not dense.

Analysis of swimming data with an integrative swimming model
This descriptive analysis does not allow to clearly identify potential mechanisms by which the swimmers adapt their swim to the biofilm structure or to simulate new species-dependant trajectories. We then build a swimming model based on a Langevin-like equation on the acceleration that involves several swimming behaviours modelling the swimmer adaptation to the biofilm. Furthermore, after inference, new synthetic data can be produced by predicting swimmer random walks sharing characteristics comparable to the original data. We consider bacterial swimmers as Lagrangian particles and we model the different forces involved in the update of their velocity v. We assume that the swimmer motion can be modelled by a stochastic process with a deterministic drift (1 c): where the right hand side is composed of two deterministic terms in addition to a gaussian noise, each weighted by the parameters γ, β and . The first term implements the biological observation (5 b) that the bacterial swimmers adapt their velocity to the biofilm density. This term can be interpreted as a speed selection term that pulls the instantaneous speed of the swimmer towards a prescribed target velocity α(b) that depends on the host biofilm density b. The weight γ can be interpreted as a penalization coefficient. In such a formalism, the difference between the swimmer and the prescribed speed is divided by a relaxation time τ to be homogeneous to an acceleration. Hence, γ is proportionally inverse to τ , γ ∼ 1 τ . As a first order approximation of the speed drop observed in 5 b for increasing b, the target speed α(b) is modeled as a linear variation between v 0 and v 1 , where v 0 is the swimmer characteristic speed in the lowest density regions, where b = 0, and v 1 in the highest density zones where b = 1: The second term updates the velocity direction according to the local gradient of the biofilm density ∇b. The sign of β indicates if the swimmer is inclined to go up (negative β) or down (positive β) the host biofilm gradient, while the weight magnitude indicate the influence of this mechanism in the swimmer kinematics. We note that this term does not depend on the gradient magnitude but only on the gradient direction: this reflects the implicit assumption that the bacteria are able to sense density variations to find favorable directions, but that the biological sensors are not sensitive enough to evaluate the variation magnitudes.
The third term is a stochastic 2-dimensional diffusive process that models the dispersion around the deterministic drift modelled by the two first terms. We define η ∼ N (0, ) The term η can also be interpreted as a model of the modelling errors, tuned by the term . Eq. (1) is supplemented by an initial condition by swimmer. For vanishing v or ∇b leading to an indetermination, the corresponding term in the equation is turned off. Eq.(1) links the observed biofilm density and the swimmer trajectories trough mechanistic swimming behaviours. The model fitting can be seen as an ANOVA-like integrative statistical analysis of the image data. It decomposes the observed acceleration variance between mechanistic processes describing different swimming traits in order to decipher their respective influence on the swimmer trajectories while integrating heterogeneous data (density maps b and trajectories kinematics).
We can define characteristic speed and acceleration V * and A * in order to set a dimensionless version of Eq. (1) This dimensionless version will strongly improve the inference process and will allow an analysis of the relative contribution of the different terms in the kinematics. An extended numerical exploration of this model is performed in Appendix B Sec. Numerical exploration on mock biofilm images to illustrate the impact of the different parameters on the trajectories, showing in particular the interplay between γ and : counter-intuitively, straight lines are induced when the stochastic part is high compared to the speed selection parameter γ (see also Appendix B).

Inferring swimming parameters from trajectory data
For each bacterial swimmer population, we now seek to infer with a Bayesian method population-wide model parameters governing the swimming model of a given species from microscope observations.

Inference model setting
Equation (2) is re-written as a state equation on the acceleration for the bacterial strain s and the swimmer i where θ s := (γ s , v s 0 , v s 1 , β s ) are species-dependant equation parameters. The function f A can be seen as the deterministic drift of the random walk, gathering all the mechanisms included in the model. The inter-individual variability of the swimmers of a same species comes from the swimmer-dependent initial condition, the resulting biofilm matrix they encounters during their run, and the stochastic term.
Inferring the parameters θ s can then be stated in a Bayesian framework as solving the non linear regression problem and additional constrains on the parameters We note that Equation (5) can be seen as a likelihood equation of the pa- The parameter s can now be seen as a corrector of both modelling errors in the deterministic drift and observation errors between the observed and the true instantaneous acceleration. Alternative settings where these uncertainties sources are separated and a true state for position and acceleration is inferred can be defined (see Annex Various inference models). The inference problem is implemented in the Bayesian HMC solver Stan [43] using its python interface pystan [39]. Inference accuracy is thoroughly assessed on synthetic data (see Appendix A Assessment of the inference with synthetic data and 6).

Analysis of the confocal microscopy dataset
We now solve the inference problem (5)-(6) on the confocal microscopy dataset to identify population-wide swimming model parameters in order to decompose the swimmer kinematics in three mechanisms: biofilm-related speed selection, density-induced direction changes and random walk. The inference process is assessed by comparing the descriptors obtained on trajectories predicted by the fitted model (7 a) with descriptors of real trajectories (5 b). The mean values of acceleration and speeds are accurately predicted for the three species (7 a, panels A and V , dashed lines). Relative positions of distance, displacement and visited area mean values are also correctly simulated (5 b and 7 a, upper panel). B. sphaericus presents the lowest predicted accelerations and speeds while B. pumilus has the widest speed and acceleration distributions and B. cereus shows the highest accelerations, consistently with the data. The visited area and the distances are slightly over estimated, but the relative position and the shape of the distributions are conserved. The amount of null velocities for B. sphaericus is under estimated by the fitted model and not rendered for B. pumilus. The distance distributions of the three species are accurately predicted by Figure 6: Inference assessment on synthetic data.(a) Predicted vs true trajectories. Trajectories are recovered by sampling the parameter posterior distribution starting from the same initial condition than in the data. We represent a ground truth trajectory extracted randomly from the original dataset in red, the corresponding sampled trajectories with thin gray lines, and the trajectory obtained with the posterior means in orange. Note that in this simulation, the stochastic part is the same for all simulations, so that the only source of uncertainties comes from the inference procedure. (b) trajectory descriptors. Trajectories are re-computed replacing the original parameters (ground truth) by the inferred parameters. The trajectory descriptors introduced in Characterizing bacterial swimming in a biofilm matrix through image descriptors are computed on the synthetic data (blue curves) and on the data obtained with the inferred parameters (orange curves). Ground truth vs fitted trajectories. (c) The ground truth, i.e. the original trajectories (blue) and fitted (red) trajectories are displayed and show common characteristics. Qqplot of fitted model output vs ground truth(d-e). After inference, the fitted model is used to recompute the synthetic dataset. We plot the x (d) and y (e) components of the accelerations in a qqplot: the fitted model output quantiles are plotted against the quantiles of the original dataset (ground truth) with blue dots, together with the y = x line (red). the fitted model. When displaying conjointly the distance and the displacement (7 a, right lower panel), the distribution of B. sphaericus is correctly predicted by the simulations, but B. cereus and B. pumilus displacements are underestimated. Some qualitative features can be recovered, such as the higher distribution of distance-distribution couples for B. cereus or higher displacement for B. cereus compared to B. sphaericus.
Descriptors of swimming adaptations to the host biofilm are also correctly preserved for the main part (5 b and 7 a, lower panel). B. pumilus is the species that crosses the highest biofilm densities in the fitted model simulations, showing the highest speeds in this crowded areas, and that visits the most frequently areas with high density gradients, consistently with the data. As in the confocal images, the simulated B. sphaericus and B. cereus favor smoother zones of the biofilm with lower biofilm densities. The B. cereus fitted model correctly render the highest acceleration variance observed in the data for low biofilm gradients, while B. sphaericus speed and acceleration variance is the lowest for all ranges of biofilm densities and gradients, both in the data and in the fitted model predictions. The drop of speeds and accelerations for increasing biofilm densities and gradients is well predicted for B. pumilus, but is smoother in the simulation compared to the data for B. sphaericus and B. cereus. In particular, the sharp drop of speeds for b 0.25 observed in the data for B. cereus and B. sphaericus is underestimated by the fitted model.
All together, the model reproduces very accurately the mean values of acceleration, speed and visited area, renders relative positions and the main characteristics of distributions for distance, displacement and interactions with the host biofilm matrix, but produces less variable outputs than observed in the data, meaning that the model is less accurate in the distribution tails. The main features of the swimmer adaptation to the underlying biofilm are however correctly predicted by the model.
To further inform the fitted model accuracy, the coefficient of determina- (3), in order to quantify the goodness of fit of the friction and gradient terms of eq. (2) that represent interactions with the biofilm. These results highlight that B. cereus bacteria do present an important stochastic part in the accelerations, while the B. pumilus species is the best represented by our deterministic modelling.
The three species present very different inferred parameter values (7 b and 2), showing that the model inference captures contrasted swimming characteristics of these Bacillus. Due to the mechanistic terms introduced in Eq. (1), these differences can be interpreted in term of speed and direction adaptations to the host biofilm. First, B. pumilus shows the highest v 0 value, and the highest amplitude between v 0 and v 1 , inducing a higher ability for B. pumilus to swim fast in low density biofilm zones and strong deceleration in crowded area. In comparison, B. sphaericus presents the smallest amplitude between v 0 and v 1 showing a poor adaptation to biofilm density. B. cereus has the highest γ value, showing a reduced relaxation time toward the density dependant speed: in other words, B. cereus is able to adapt its swimming speed more rapidly than the other species when the biofilm density varies. B. cereus swimmers are also better able to change their swimming direction in function of the biofilm variations they encounter along their way, their β distribution being markedly higher than the other species which have very low β. Finally, the stochastic parameter is also contrasted, from a low distribution for B. sphaericus to high values for B. cereus. All together, the inference complete the observations made in 5 b: B. pumilus poorly adapts its swimming direction to the host biofilm (low β) but has a wide range of possible speeds when the biofilm density varies (high v 0 , low v 1 ), that it can reach quite rapidly (intermediary γ) with intermediary stochastic correction ( ). In contrast, B. cereus reaches lower speed values (intermediary v 0 , low v 1 ) but is more agile to adapt its swimming to its environment by changing rapidly its speed when the biofilm density is more favorable (highest γ) and adapting its swimming direction to biofilm variations, with higher stochastic variability (large ). Finally, B. sphaericus is the less flexible of the three bacteria: less fast (smallest difference between v 0 and v 1 ), they are also less responsive to biofilm variations (small γ and β) with low random perturbations (small ).
Finally, after inference, the impact of each term in the overall acceleration data can be quantified and analyzed by displaying its relative contribution in a ternary plot (Appendix B B.6). This relative contribution can be measured thanks to the swimming model which integrates these different mechanisms in the same inference problem. The direction selection is the less influential mechanism for the three species, with a slightly higher impact for B. cereus (50 and 95 % isolines slightly shifted towards A(∇b) in Appendix B B.6 a). When zooming in, the three Bacillus show differences in the balance between speed selection and the random term (Appendix B B.6 b): while B. pumilus is slightly more influenced by the friction term than by stochasticity, these mechanisms are perfectly balanced in B. sphaericus accelerations, while B. cereus is more influenced by the random term.

Interpretation of the bacterial swimming at the light of their morphology
Kinematics descriptors and swimming parameters can then be reinterpreted through the insights provided by the morphology of each bacteria species as shown in 2.      bling events of which are induced by reverse rotation of the cellular motor of its multiple flagella [34]. Additional functional characteristics may discriminate B. pumilus and B. sphaericus, since run-and-reverse swimming is the natural behaviour of B. sphaericus even in the Newtonian control buffer, whereas B. pumilus drastically reduces its speed in high-density biofilms (7, a) and starts tumbling in the host biofilm (4). B. pumilus has the highest number of flagella and is the bacteria that reaches the highest speeds specially in the Newtonian buffer and in low-density areas, indicating that this characteristic may be an advantage for swimming fast in the extracellular matrix. The kind, size and disposition of the flagella bundle may help B. cereus swimmers to adapt their runs to their environment by changing directions to follow lower density areas (higher impact of direction selection term of the three Bacillus in Appendix B B.6) or to adapt rapidly when biofilm density varies (largest γ). B. cereus being the bacteria with the strongest stochastic part (highest , density shifted towards A( ) in Appendix B B.6), this morphology could also help the swimmer to go through the biofilm by random navigation, which helps to maintain comparable straight trajectory with or without biofilm when the stochastic part is higher than the speed selection term (Appendix A A.1, Appendix B B.3 and Appendix B B.6). Finally, B. sphaericus bacteria are much longer than the other two species, which may explain why this species is the less motile in terms of acceleration and kinematics, both in biofilms and in the Newtonian control buffer. 20 3 Discussion

Modelling and analysis of swimming trajectories
When analyzing microbial swimming trajectories, two general strategies can be found in the literature. The first one aims at designing statistical tests quantifying similarities with or deviations from typical motion of interest such as diffusion [34]. Another strategy consists in providing a generative model of the data, analyzing it [8,7] and comparing model outputs with real data [24,20], possibly after inference. The model that is studied in this paper belong to the second category: the model includes deterministic mechanisms describing interactions with the host biofilm, together with a random correction counterbalancing the modelling errors. The parameter inference allows to interpret the data variance relatively to speed or direction adaptations to the host biofilm versus residual effects gathered in the stochastic term. This method is comparable to ANOVA-like multivariate analysis: the parametric phenomelogical mappings between explicative co-variables and a swimming behaviour (for example the function defining speed selection from biofilm density) are gathered in the same inference problem, enabling to decompose acceleration variability between the different swimming behaviours. This integrative method allows for multi-data integration and co-analysis. Furthermore, the fitted model allows to simulate typical swimming trajectories of a given species.

Population-wide swimming characteristics vs true-state
inference.
In this study, we do not aim to recover 'true' swimmer trajectories (a.e. the blue trajectory in Appendix B B.4), i.e. identifying through smoothing techniques an approximation of the specific realization of the stochastic modeling and observation errors that lead to a given 'observed' trajectory. Rather, the goal is to identify common characteristics shared by a population of trajectories by inferring the 'population-wide' parameters (the parameters α, β, v 0 , v 1 , γ and ) that best explain the whole set of observed accelerations in a same population of swimmers. For this reason, we did not introduce swimmer-specific terms nor individual noise: they would have increased the model accuracy, but to the price of a blurrier characterization of the species specificities. This choice determined our inference framework. Despite several alternative options for recovering hidden states, in particular SSM (space state models) which are common in spatial ecology [1], the Bayesian method we opted for is a simpler non-linear regression problem that proved to be sufficient to recover macroscopic swimmer trajectories and species stratification. We discuss in Appendix C Various inference models the different options that were tested and present in Material and Method Sec. Inference the method for noise model selection. Among other interesting features, the Bayesian method provides confidence intervals on the final parameter estimation, and on the resulting trajectories as in 6 a.

Predictive capabilities of the model
The deterministic terms of the model explain only half of the variance (3). A major part of the underlying mechanisms is not correctly described by our model which is a common feature since it is a phenomenological model which only considers interactions with the underlying biofilm at a macroscopic level, without taking into account nanoscale physical mechanisms. A more detailed description of the underlying physics could have been designed as in [30], but it would have made more complex the analysis of the interactions between the host biofilm and the swimmer trajectories and the extraction of species-specific patterns. However, we note that our model correctly renders observations made through macroscopic trajectory descriptors, even though the inference process has not been made based on these observables. Furthermore, several repetitions of the same models with different samples of the stochastic terms give very similar values for the trajectory descriptors (see Appendix B B.5 and section Influence of inference and stochastic terms on the trajectory descriptors), showing that these descriptors are robust to stochastic perturbations. Hence, the model (2) can be used to produce synthetic data sharing the same global characteristics than the original ones specifically taking into accounts interactions between the swimmers and the host biofilm. Furthermore, these predictions also reproduce the species stratification observed in the original data using the global descriptors.

Biological interpretation of the fitted models
The direction selection term of the equation driven by β has little impact in the swimmer model fitted on real data. The parameter β can however have a sensible impact on the kinematics as shown in the sensitivity analysis, and on the trajectories in mock biofilms (Appendix B B.1 d). This could indicate that direction selection based on biofilm gradients is marginally effective in real-life swimming trajectories in a biofilm matrix. On the contrary, the speed selection term is more effective for the three Bacillus, showing that these micro-swimmer are able to adapt their swimming velocity to the biofilm density faced during their run. This term acts as an inertial term which enhances the stochastic term to provide direction and velocity changes.
The model has been used to decipher different adaptation strategies to the host biofilm of the three species during their swim. It confirms that B. sphaericus are the less motile bacteria in the biofilm, with reduced speeds and adaptation capabilities as indicated by the smallest model parameter values and a stereotypic run-and-reverse behaviour inside or outside the biofilm. B. pumilus on the contrary drastically changes its swimming behaviour in the biofilm compared to the Newtonian control buffer, which is reflected in the model by a high amplitude between v 0 and v 1 and a high γ that indicates a rapid adaptation for varying biofilm densities. B. cereus shows the highest adaptation ability to the biofilm matrix, with the highest γ and β reflecting biofilm-induced speed and direction changes. Furthermore, the high stochastic effects (highest ) higher than the speed selection term tuned by γ (see Appendix B B.6) allows this swimmer to conserve straight runs in the biofilm (see Appendix B Sec. Friction and random term in Langevin equations) in the same way than in the control Newtonian fluid.
This characterization methodology could be used to drive species selection for improved biofilm control. Furthermore, the model can be used to predict new trajectories and the resulting biofilm vascularization, in a similar framework as in [18]. Coupled with a model of biocide diffusion, these simulations could be used to test numerically the efficiency of monoor multi-species swimmer pre-treatment to improve the removal of the host biofilm.

Flagellated bacteria in polymeric solutions.
Characterization of flagellated bacteria motility in polymeric solutions is a very active research area [30,34,51,37,38]. Speed and direction variations have been measured for various polymeric fluids with different viscoelastic properties. For the model bacteria E.coli in polymeric solutions, enhanced viscosity decreases tumbling while increased elasticity speeds up the swimmers [34,51]. In our experiments on the contrary, we observed decreased speeds and strong enhancement of reverse events for the flagellated B. sphaericus and B. pumilus in the biofilm compared to the Newtonian control buffer. However, the experimental set-up strongly differ: the complex rheology of S. aureus biofilms may strongly differ from polymeric fluids even if under certain condition they can be considered as visco-elastic fluids [16], impacting differently the swimmer behaviours. Furthermore, the physiology of the motor cell in the Gram-positive Bacillus differs from the one of the Gram-negative E.coli [46,45,44]. Finally, the particular brush-like flagella bundle of B. cereus may allow this species to conserve the same swimming in Newtonian and crowded environments, by adapting its swimming speed to the local density and otherwise randomly selecting swimming directions across the host biofilm. To generalize this approach to other contexts, these study should be reproduced for other swimmers and other host biofilms, together with polymeric fluids and porous media, including biochemical interactions. 23

Infiltration of host biofilms by bacilli swimmers
Infiltration of S. aureus biofilms by bacilli swimmers were prepared in 96-well microplates. Submerged biofilms were grown on the surface of polystyrene 96-well microtiter plates with a µclear ® base (Greiner Bio-one, France) enabling high-resolution fluorescence imaging [4]. 200 µL of an overnight S. aureus RN4220 pALC2084 expressing GFP [28] cultured in TSB (adjusted to an OD 600 nm of 0.02) were added in each well. The microtiter plate was then incubated at 30°C for 60 min to allow the bacteria to adhere to the bottom of the wells. Wells were then rinsed with TSB to eliminate nonadherent bacteria and refilled with 200 µL of sterile TSB prior incubation at 30 celsius for 24 h. In parallel, B. sphaericus 9A12, B. pumilus 3F3 and B. cereus 10B3 were cultivated overnight planktonically in TSB at 30°C. Overnight cultures were diluted 10 times and labelled in red with 5 µM of SYTO 61 (Molecular probes, France). After 5 minutes of contact, 50 µL of labelled fluorescent swimmers suspension were added immediately on the top of the S. aureus biofilm. All microscopic observations were collected within the following 30 minutes to avoid interference of the dyes with bacterial motility. Three replicates were conducted. The same protocol has been repeated without the host biofilm (control experiments): the swimmers are added to the buffer only which is a Newtonian fluid.

Confocal Laser Scanning Microscopy (CLSM)
The 96 well microtiter plate containing 24h S. aureus biofilm and recently added bacilli swimmers were mounted on the motorized stage of a Leica SP8 AOBS inverter confocal laser scanning microscope (CLSM, LEICA Microsystems, Germany) at the MIMA2 platform (https://www6.jouy.inra. fr/mima2_eng/). Temperature was maintained at 30 celsius during all experiments. 2D+T acquisitions were performed with the following parameters: images of 147.62 x 147.62 µm were acquired at 8000 Hz using a 63×/1.2 N.A. To detect GFP, an argon laser at 488 nm set at 10% of the maximal intensity was used, and the emitted fluorescence was collected in the range 495 to 550 nm using hybrid detectors (HyD LEICA Microsystems, Germany). To detect the red fluorescence of SYTO61, a 633 nm helium-neon laser set at 25% and 2% of the maximal intensity was used, and fluorescence was collected in the range 650 to 750 nm using hybrid detectors. Images were collected during 30 seconds (see 1 for sampling period).
Bacterial swimmers navigate within a three-dimensional biofilm matrix and confocal microscope refreshment time is not small enough to allow 3D+T images. To limit 3D trajectories, a focal plane near the well edge has been selected, where the well wall physically constrains the swimmer trajectories in one direction, which select longer trajectories in the 2D plane that can be tracked in time. Therefore, experimental data are composed of two-dimensional trajectories captured between the swimmer arrival and departure times in the focal plane, and the associated 2D+T biofilm density images that change over time due to swimmer action.
To check that the host biofilm structure is identical near the well's edge compared to other 2D slices, we took 4 replicates of S.aureus biofilms that were imaged in 3D using a stack of 6 horizontal images, starting from z = 0 near the well's edge, to z = 6∆z, at the interface between the biofilm and the bulk solution. To study the between and within biofilm density variability in the horizontal images, we subsampled them with a regular Cartesian 4x4 grid, resulting in a 4x6x(4x4)=384 2D images database supplemented by metadata (stack, z and x−y coordinate of the subsample), before computing a clustered pairwise correlation similarity matrix and a permanova.

Transmitted Electron Microscopy
Materials were directly adsorbed onto a carbon film membrane on a 300mesh copper grid, stained with 1% uranyl acetate, dissolved in distilled water, and dried at room temperature. Grids were examined with Hitachi HT7700 electron microscope operated at 80 kV (Elexience -France), and images were acquired with a charge-coupled device camera (AMT).

Post-processing of image data
See 1 for a sketch of the datastream from microscope raw images to model inputs and Appendix A A.1 for data visualization at each step of the postprocessing pipeline.
Swimmer tracking has been applied on the red channel of the raw temporal stacks with IMARIS software (Oxford Instruments) using the tracking function after automated spots detection to get position time-series for each swimmer. Time-series with less than 8 time steps were filtered out.
Then, swimmer speed and acceleration time-series were computed from their position by finite-difference approximations and trajectory descriptors were extracted. The RGB green channel corresponding to the biofilm density temporal images were converted into grayscale and rescalled between 0 and 1 (linear scalling).
Trajectory descriptors are defined as follows. The mean acceleration and speed values, distance and displacement are computed with To compute the visited area, each trajectory piece was subsampled by computing X s i (t k ) = k ns X s i (t) + (1 − k ns )X s i (t + ∆t) for k = 0, n s , with n s = 10 and the pixels included in the ball B(X s i (t k ), r) with radius r = 2 were labelled. The total area of the labelled pixels is defined as the visited area of the swimmer i of species s.
To assess run-and-tumble behaviour, the angle θ s i (t) and the mean ve-locityV s i (t) between two consecutive speed vectors are defined with θ s ∆t) )/2, for t ∈ (T s 0,i + ∆t, T s end,i ). Post-processed data are available at https://forgemia.inra.fr/bioswimmers/swiminfer/SwimmerData.

Computation of the forward swimming model
Time integration of equations (2) has been solved with an explicit Euler scheme regarding positions x s i,t and velocities v s i,t of the swimmer i of species s at time t: where dv s i,t is given by eq. (2), and depends on θ s , V s i,t , x s i,t , b(t, x s i,t ) and ∇b(t, x s i,t ). In practice, the biofilm density and gradient maps b and ∇b are discretized with a Cartesian grid corresponding to the image pixels.
During random walks, swimmer may exit the biofilm domain. When the swimmer reaches the domain boundary, a new swimmer is introduced with a velocity oriented towards the interior of the domain while the original trajectory is stopped at the boundary.

Sensitivity analysis
A local sensitivity analysis (B.1) is performed by comparing basal simulation obtained with γ = β = = 1 (v 0 and v 1 where taken as in Appendix A A.3) with 3 simulations where γ, β and are alternatively set to 0, resulting in 3 alternative models where the speed or the direction selection or the random term is turned off. The interaction between the speed selection term (set by γ) and the random term is illustrated in Appendix B B.3 where 5 repetitions of the same trajectory of a simplified Langevin equation (11) are displayed with or without friction (γ = 1 or γ = 0), but with the same random seed for the stochastic term so that the stochastic part is strictly identical.
To analyze the impacts of the non-dimensionalized swimming parameters γ, v 0 , v 1 , β, on the locomotion behaviour, a global sensitivity analysis has been performed. The parameter space [0, 1] 5 was uniformly sampled with n = 1, 000 points using the Fourier Amplitude Sensitivity Test (FAST) sampler of the SALib library i.e. the function SALib.sample.fast sampler.sample [10,40]. We note that the interval [0, 1] covers a large parameter domain for some parameters, in particular β which remains small after inference. For this parameter, the sensitivity analysis will show potential impact on the output, that may be ineffective in the parameter range of the inferred model.
For each point in the parameter space, a forward simulation is conducted on a population of swimmers on a representative biofilm extracted from the dataset (first batch of the B. pumilus dataset). Trajectory descriptors are then extracted and taken as observable of the sensitivity anaylsis that requires both the parameters sampling and the associated descriptors. Sobol indices of first order are then returned and pairwise partial correlations matrix has been calculated. Convergence of the Sobol indices has been checked by taking sub-samples containing less than 1, 000 points.

Inference
Numerical implementation The inverse problem (4)- (6) has been implemented using a Hamiltonian Monte Carlo (HMC) method to solve this Bayesian inference problem.
The three replicates for each swimmer species are pooled (trajectories and biofilm density maps) and the input data required for the inference procedure (velocity yV and acceleration yA times series for the whole batch of swimmers, biofilm densities yb and gradient yGb extracted at swimmer positions) were assembled in a customed data structured. Normal standard prior distributions were set for all swimming parameters θ = (γ, v 0 , v 1 , β, ). Additional positivity constrained were imposed for all parameters but β. Therefore, the implemented model can be summarized as: A warmup of 1,000 runs is followed by the Markov chains construction (4,000 iterations for 4 Markov chains). Markov chain convergence is assessed by direct visualization (Appendix A A.4) by checking for biaised covariance structures in pair-plots (Appendix A A.5). Standard convergence index were additionnaly computed: effective sample size per iteration (n ef f ) and potential scale reduction factor (R hat ).
Noise model selection Different noise models have been evaluated for the regression model (5) to take into account batch or individual effects. Namely, we decomposed the noise in Eq. (5) by replacing η s by η s i and/or η s,b for individual i and experimental batch b. Model selection has been conducted by computing the WAIC for the different noise models. A huge degradation of the WAIC has been observed for individual or batch dependant noises, indicating that the enhancement of the inference accuracy provided by the additional parameters can be considered as over-fitting and discarded.

Inference validation on synthetic data
Ground truth data construction Ground truth synthetic data (see section Assessment of the inference with synthetic data) were computed by solving eq. (8) and (2)  Comparing ground truth data with the fitted model After inference, a new dataset is obtained by solving eq. (8) with the fitted parameters. The same initial conditions for speeds and positions as the ground truth data are taken. Trajectories are stopped after the same number of time step as in the corresponding trajectory of the ground truth dataset. To discard spurious stochastic uncertainties, the same random seed as the ground truth simulations was taken, so that the unique uncertainty source was inference errors.
Checking the sensitivity to biofilm image noise To produce Appendix A A.6, the biofilm density and the biofilm density gradient maps have been noised with an additive gaussian noise with increasing variance, before inference: we set b ∼ N (0, √ lσ b ) and ∇b ∼ N (0, where σ b is the variance observed in the original data, and b and ∇b are respectively the noise applied to the biofilm density and the biofilm density gradient. The parameter l ∈ [0, 0.01, 0.02, 0.03, 0.04, 0.05] is increased to apply a noise from 0 to 5%.

Inference validation on experimental data
Comparing microscopy data with the fitted model The same procedure is repeated on the microscopy data: after inference, a new dataset is obtained by solving eq. (8) with the fitted parameter, taking the same initial conditions for speeds and positions. Trajectories are stopped after the same number of time step as in the corresponding trajectory of the ground truth experimental dataset.
Measuring the deterministic reconstruction The deterministic coefficient of determination R 2 det was computed to measure how much the dataset is explained by the deterministic part of the model. Setting A s,det i = f A γ, v 0 , v 1 , β|yb, yV, yb, yGb, dt : whereȳ A s is the acceleration mean. R 2,s det is expected to tend towards 1 when the stochastic term η = N (0, ) becomes negligible with respect to A det . Ternary plots (Appendix B B.6) were obtained by first computing the contribution of each term of equation (4) to acceleration estimate. Namely, note

Plots and statistics
, and s(η) s i = η s .
We compute the proportions A(k) s i for k ∈ {b, ∇b, η} are then plotted in ternary plots using the Ternary python package [29] and approximated by gaussian KDE. Isolines are finally plotted as previously described.
To construct the plot in Appendix A A.2, pairwise correlation of the biofilm density in the 384 samples has been computed (scikit-learn pairwise distances, 'correlation' metric parameter [35]), and the resulting similarity matrix has been displayed using Seaborn package clustermap function [49] after hierarchical clustering (scipy.cluster.hierarchy linkage function [48]). Additional permanova has been computed to assess the significance of between-group dissimilarities using stats.distance package permanova function [42].

Code availability
All the image pre-and post-processing, calculations and statistics have been performed with custom scripts using the standard python libraries numpy [17], scipy [48], imageio [21] and pandas [32]. The forward swimming problem computation is computed using customed scripts built upon numpy [17] and H5py (https://www.h5py.org). Sensitivity analysis has been conducted with the SALib library [10,40] (Sobol index, function SALib.analyze.fast.analyze) and the pingouin library [47] (PCC, pcorr method). The Bayesian inference has been conducted using the STAN library [43] through its python interface pystan [39]. All plots have been made with the matplotlib python library [19].
The whole python code has been made available and accessible at the following git repository https://forgemia.inra.fr/bioswimmers/swim-infer. density map with trajectories.The contrast of the original 2 chanel image has been enhanced for visualization. The biofilm density maps (see Material and methods) are The RGB biofilm density temporal images were converted into grayscale and rescalled between 0 and 1 (linear scalling). In this images, for illustrations, trajectories are mapped into the biofilm density map and rescaled density map at initial condition of the first B. pumilus batch. In the dataset, the trajectories are associated with the corresponding biofilm map : X s i (t) is associated with the value b(t, X s i (t)) for swimmer i of species s at time t. As the biofilm density map is also a time-series, the trajectories can hardly be represented on the underlying biofilm that also changes in time.

A.1.2 Assessing the 3D structure of the biofilm
We check that the selection of a 2D focal plan does not induce an additional bias by over-selecting biofilm areas with specific structures near the well's edge. To do so, we assembled an additional dataset of 4 replicates of S.aureus 3D images (see Material and Methods, section 4.2, and A.2 A for the dataset assembly) of horizontal image subsamples, and computed their within and between dissimilarities (see Material and Methods, section Plots and statistics. The resulting pairwise correlation matrix is displayed in A.2 after hierarchical clustering. It shows that the z direction does not structure the information, since the images are not clustered according to their z coordinates contrary to the stack or the x − y coordinate labels. Permanova analysis shows that the differences between stacks and x − y subsamples are significant (p − value = 1e − 4) but not between horizontal images (p − value = 1). Figure A.2: Assessment that the biofilm structure does not strongly vary in the z direction. A) Subsampling procedure. We illustrate the subsampling procedure in one of the 4 replicates. The 2D images constituting the 3D stack are sampled with a 4x4 cartesian grid. We can also visually observe that the biofilm variation between horizontal images are weak. B) Pairwise correlation matrix. The correlation dissimilarity between sample pairs is displayed (black=0 value, indicating identical samples, to light orange ¿ 1, indicating dissimilar samples) after hierarchical clustering. We indicate the stack,z and x − y label in the 3 first columns with a color code. We can observe that the samples are not gathered by z, but rather by stacks and x − y groups, indicating that images with identical x − y labels are clustered together, showing that they are more similar to samples with the same x − y coordinates in other z slices, than other samples in the same z slice with other x − y coordinates.

A.1.3 Illustration of pore formation
As strongly documented in [18], swimmers can dig pores in a exogenous biofilm, which enhance the biofilm innervation and facilitate the penetration of macromolecules. To illustrate the pore formation, we show two successive images taken from a 2D temporal stack of B. sphaericus swimmers in a S.aureus host biofilm in A.3. In the dashed ellipse, we can see a swimmer that has moved in the two successive images, letting behind it an empty space free from host bacteria.

A.2 Statistical tests
T-tests were performed to compare mean differences between 1D distribution of Figure 5. Resulting p-values are displayed in Appendix A A.2.

A.3 Assessment of the inference with synthetic data
To assess the inference method, synthetic data are built and will be used as reference for assessment. We arbitrarily fix a parameter vector and solve system (1) from random initial positions, in a host biofilm arbitrarily chosen in the image dataset. We then extract the swimmer positions at given timesteps and recover accelerations and speeds with the same post-processing pipeline as for microscopy images and solve the inverse problem (5)- (6). If the inference process correctly works, we expect to recover the original parameters (the ground truth).
The ground truth parameters are correctly recovered by the inference procedure (Appendix A A.3), indicating that the parameters are correctly identifiable and that the inverse problem is well-posed. An error of respectively 1.28, 1.34, 2.98 and 0.68% on the parameters γ, v 0 , v 1 and is observed in this controlled situation, β being inferred with lower accuracy (9.59 %). This estimate is robust to noise on the biofilm data, with highest impact on β (Appendix A A.6). To assess the impact of parameter inference uncertainties on trajectory computation, the posterior parameter distribution is sampled and new trajectories are computed, replacing the ground-truth parameters by the sampled ones. The swimmer ground truth trajectories are accurately recovered: the sampled trajectories tightly frame the original swimmer path as illustrated on a randomly chosen trajectory (6 a). We note that an identical random seed has been taken for these simulations, including the ground truth trajectory, in order to turn off the stochastic uncertainties and only focus on the propagation of inference errors during simulations of swimmer trajectories. Finally, we re-assemble a synthetic dataset by replacing the groundtruth parameters by the inferred ones, i.e. the posterior mean. Qqplot of the fitted model accelerations versus the ground truth accelerations give an excellent accuracy (6 d-e), with all the points lying on the bisector, except slight divergences on the distribution tails. The fitted model trajectories visually reproduce the qualitative characteristics of the original dataset (6 c). The trajectory descriptors of section Characterizing bacterial swimming in a biofilm matrix through image descriptors are then computed on both datasets (ground truth and inferred) and compared (6 b). The kinematics descriptors, i.e. acceleration and speed distributions, are very accurately recovered with a relative error of 0.1%, 3.2%, 5% for respectively the mean, quantiles 0.05 and 0.95 of the acceleration (resp. 0.9%, 2.5% ,2% for speed). Some small discrepancies can be observed on the distance and displacement distributions, even if the mean and the quantiles 0.05 and 0.95 are close. The interactions between the host biofilm and the acceleration and speed distribution are also recovered with high accuracy. We note that part of the observed discrepancies comes from an additional source of variability of the simulation framework: when a swimmer reaches a domain boundary during a simulation, its trajectory is stopped and a new swimmer is randomly introduced elsewhere in the biofilm (see Materials and Methods for more details). This simulation strategy seems to be responsible of the over-representation of short trajectories in the inferred dataset, compared to the ground truth (6 b upper panel, distance and displacement distributions).

A.4 Markov chains convergence and correlation
Markov chain (Appendix A A.4) and markov chain pairplots (Appendix A A.5) are displayed. Direct visualization of the posterior sampling allows to detect convergence failure (strong autocorrelation or stationnary markov chain). Markov chain pairplot informs on potential correlation between different parameters posterior samples, showing an interaction between parameter and an identification issue. In Appendix A A.4, the markov chains correctly converged for all the parameter. No strong correlation can be observed in Appendix A A.5.

A.5 Impact of noise on biofilm data
The impact of noise on the parameter inference is assessed by noising the biofilm density and the biofilm density gradients with an additive gaussian noise with increasing variance (Appendix A A.6). The noise variance is scaled with the variance observed in the original data. Namely, we set b ∼ N (0, and ∇b ∼ N (0, where σ b is the variance observed in the original data, b and ∇b are respectively the noise applied to the biofilm density and the biofilm density gradient and ∆x is a pixel width. The parameter l ∈ [0, 0.01, 0.02, 0.03, 0.04, 0.05] is increased to apply a noise from 0 to 5%. We can observe that the estimate of only two parameters is impacted by noising the biofilm inputs: the estimate of β and v 1 . The parameter β is also the parameter which is the less accurately inferred when no noise is added (5%). Its estimation relative error increases up to 35 % when 5% noise is added. The parameter β tunes the direction selection, which is the less effective process in the swimmer model. The other parameters are recovered with correct accuracy (kept under 18% for v 1 , and under 6% otherwise).  No strong covariance effect can be observed, showing that the model can not be reduced by analytical dependence between parameters. Slight correlation is observed between the parameters v 0 , v 1 and γ: this feature is not surprising since γ, v 0 and v 1 are in the same term of equation (2). The correlation is however too low to expect a model reduction.  To illustrate the impact of each parameter on the interplay between the host biofilm and the swimmers trajectories, the model (2) was first computed on two mock biofilms. The first one is a square linear density gradient and the second is composed of large pores on a textured background mimicking the dense biofilm zones (Appendix B B.1 a). A basal simulation is computed with γ = β = = 1 and will be used later on as reference for comparisons.These three parameters are alternatively set to zero to assess the resulting trajectories when the speed selection, the direction selection or the random term is shut down. Suppressing speed selection results in rectilinear trajectories (γ = 0, Appendix B B.1 c), which is rather counter-intuitive since the remaining terms are designed to tune the direction. A discussion of this phenomenon is provided in Appendix B Influence of inference and stochastic terms on the trajectory descriptors. When suppressing direction selection (β = 0, Appendix B B.1 d), the trajectories are no longer drifted downwards the gradient in the upper panel as in the basal simulation, and no longer follow the pores (lower panel). If the stochastic term is shut down ( = 0, Appendix B B.1 e), the trajectories directly go down the gradients and are trapped in the center of the image in the upper panel. When a pore is found along the run, the swimmer keeps following it without being able to escape the pore any longer unlike the basal situation (lower panel).
The link between the model parameters and the global trajectory descriptors introduced in Section Characterizing bacterial swimming in a biofilm matrix through image descriptors is less intuitive. A global sensitivity analysis of the trajectory descriptors (mean acceleration and speed, distance, displacement and visited areas) with respect to the parameters γ, v 0 , v 1 , β and is conducted in Model sensitivity analysis by computing their first order Sobol index (SI) and their pairwise correlation coefficient (PCC). The sensitivity analysis shows that the mean speed is mainly influenced by γ and with slightly negative and positive impact respectively, while acceleration is rather influenced by β and with positive impact. The link between the parameters and the other descriptors is more complex, including non linear effects (strong SI and small PCC) and parameter interactions (higher SI residuals, see Appendix B B.2).

B.2 Model sensitivity analysis
The link between the model parameters and the global trajectory descriptors introduced in Section Characterizing bacterial swimming in a biofilm matrix through image descriptors is not intuitive. A global sensitivity analysis of the trajectory descriptors (mean acceleration and speed, distance, displacement and visited areas) with respect to the parameters γ, v 0 , v 1 , β and is The residual variance is small for the median speed and acceleration but slightly larger for the distance, displacement and visited area indicating larger effects of parameter interactions for these outputs, i.e. output variations induced by joint shifts of the parameters (Appendix B B.2). The SI of the parameters v 0 and v 1 are negligible, except for the displacement and the visited area. The parameters γ, β and , i.e. the three weights associated to each component of the state equation (2), are more influential. Distance and speed have several main drivers. The distance is impacted nearly equally by γ, β and and the PCC of these parameters is quite small, indicating that this parameters may induce indistinctly negative or positive variations of the travelled distance, except for which is slightly negatively correlated. The median speed is mainly impacted by (slightly positively) and γ (slightly negatively), with relatively small PCC (Appendix B B.2). The mean acceleration, the displacement and the visited area are preponderantly impacted by a main driver: the mean acceleration and the visited area are particularly impacted by , the stochastic term weight, with positive influence. The displacement is mainly influenced by γ with no preponderant variation direction (null PCC, Appendix B B.2). Figure B.2: Sensitivity analysis of state observable respectively to state-equation parameters. The sensitivity of different state observables to parameter shifts is systematically studied through sensitivity analysis methods. a) Sobol indices are displayed for each output through barplots indicating the part of variance explained by a given parameter. The bars do not reach the value 1, indicating a residual variance reflecting interactions between parameters. b) Pairwise correlation coefficient (PCC) of the observable respectively to the input parameters are displayed. A negative PCC indicates a negative impact on the output, and conversely. We note that the red dot indicating the PCC of β for V is confounded with the purple one indicating the PCC of .

B.3 Friction and random term in Langevin equations.
To illustrate the interplay between the friction and the random term during a random walks, we solve the problem dv = − γvdt f riction + ηdt random term (11) v(0) = (0, 0) X(0) = (0, 0) (13) in an unconstrained domain, with η a 2 dimensional white noise with unitary variance. The friction parameter γ is alternatively set to 1 (Appendix B Fig  B.3, upper panel) or 0 (Appendix B Fig B.3, lower panel). We note that the random seed is the same for the simulations with or without the friction term, so that the stochastic contribution is completely identical in the upper and lower panels. The trajectories produced without the friction term are much more regular and rectilinear that those produced with the friction term, that are much chaotic. The reason of that behaviour may come from the null mean of the white noise. Roughly speaking, in average, the acceleration shows small variations around zero which leads after temporal integration to regular speeds and Trajectories produced by different repetitions of eq. (11) are displayed with γ = 1 (upper panel) and γ = 0 (lower pannel). We note that the same random seed has been taken for the simulations of the same column with or without the friction term, meaning that the stochastic term is strictly identical in both simulations.
rectilinear-like trajectories. By contrast, the friction term reduces the particle inertia, enhancing the impact of the stochastic term, which produces much more chaotic trajectories.

B.4 Impact of the stochastic term
We illustrate the impact of the random walk term on the overall swimmer trajectory with Appendix B B.4. In this figure, we display two trajectories computed from model (2) with identical parameters (α, β, v 0 , v 1 , γ and ), initial condition, host biofilm and time length. Different random samplings of the stochastic term of Equation (2) lead to these very different trajectories. This example illustrate the difference between identifying population-wide characteristics and inferring true trajectories : while the later try to detect the differences between the two trajectories (i.e. in this example, identifying and smoothing the different stochastic samples leading to these trajectories), the former focuses on the common features between these apparently different trajectories.

B.5 Influence of inference and stochastic terms on the trajectory descriptors
We wonder if the uncertainty sources involved in the inference process and in the stochastic term of the random walk have a decisive impact on the trajectory descriptors. To address this question, a first dataset is assembled by integrating in time Eq.(2) for given parameters (see Appendix A A.3), initial  (2), including the same parameters α, β, v 0 , v 1 , γ and , the same initial condition and identical host biofilm. The only uncertainty source comes from the different random samplings of the stochastic term. In this simulation, the ground truth (with default random seed) is plotted in blue.
conditions and host biofilm. Then, this dataset is used as inputs of the inference method to infer the initial parameters (ground truth). Another dataset is produced by replacing the initial parameters by the inferred parameters. We note that we take the same seed for the random number generator than for the initial dataset, so that the only uncertainty that has been introduced until this step comes from the inference procedure. Finally, we produce a last dataset by solving the model with the same inferred parameters as in the second dataset, but changing the seed of the random number generator. Hence, this last dataset involves uncertainties coming from the stochastic terms and from the inference process. This variation results in modifying the sampling of the stochastic terms and leads to strong modifications of the trajectories, like in Appendix B B.4. At end, the trajectory descriptors are computed and plotted in Appendix B B.5. We can see that the trajectory descriptor distributions are very similar across the different dataset, except for the total distance and the displacement where discrepancies can be noted. However, these differences are relatively small compared to the mean and the width of the distributions. We can also observe that the interactions with the underlying biofilm is very well conserved, even when the sampling of the stochastic term is very different. This observation grounds the initial guess that these trajectory descriptors captures common global features of the different trajectories rather than specificities of given trajectories. B.6 Relative impact of the different swimming processes on the species swim.
The ternary plot presented in Appendix B B.6 shows the balance between the different swimming processes. The contribution of each term of equation (4) to acceleration estimate was first computed. Namely, the relative value of the speed selection term s(b) s i , the direction selection term s(∇b) s i and the stochastic term s(η) s i where , s(∇b) s i = β s ∇b(t, X s i (t)) ∇b(t, X s i (t)) , and s(η) s i = η s . for k ∈ {b, ∇b, η}. As the contribution of the direction selection was limited compared to the other processes, we zoomed in the plot near the edge s(∇b) s i = 0 to allow inter-species comparisons. To assess the influence of the random term on the populationwide trajectory descriptors and overall prediction accuracy, we repeated the experiment displayed in 6 a. A synthetic database was first assembled (ground truth) and prediction were performed with a fitted model (After inference). Then, a second repetition of the prediction of the fitted model was computed with another seed for the random number generator, resulting in modifying the sampling of the stochastic terms and strong modifications of individual trajectories, like in B.4. The population-wide trajectory descriptors are however slightly impacted by this random effect, indicating that the main characteristics of the trajectory populations marginally depend on the stochastic term. Figure B.6: Respective influence of stochastic effects, speed or direction adaptation to the host biofilm. We plot in a ternary plot the respective influence of the speed selection (V ), the direction selection (D) and the random term ( ) of Eq.(1) in the acceleration distribution of each species. Each squared instantaneous acceleration is mapped in the ternary plot coordinates through the relative contribution of V 2 , D 2 and 2 , and this point cloud is approximated in the ternary plot coordinates with a gaussian kernel to display the point distributions. The 0.05, 0.5 and 0.95 quantile isovalues of these distributions are plotted. (a) The entire ternary plot is displayed. The dashed line represents the zoom box represented in Fig. (b), where the same isolines are displayed, but with a zoom in in the y direction to highlight differences between species.