Enchained growth and cluster dislocation: A possible mechanism for microbiota homeostasis

Immunoglobulin A is a class of antibodies produced by the adaptive immune system and secreted into the gut lumen to fight pathogenic bacteria. We recently demonstrated that the main physical effect of these antibodies is to enchain daughter bacteria, i.e. to cross-link bacteria into clusters as they divide, preventing them from interacting with epithelial cells, thus protecting the host. These links between bacteria may break over time. We study several models using analytical and numerical calculations. We obtain the resulting distribution of chain sizes, which we compare with experimental data. We study the rate of increase in the number of free bacteria as a function of the replication rate of bacteria. Our models show robustly that at higher replication rates, bacteria replicate before the link between daughter bacteria breaks, leading to growing cluster sizes. Conversely, at low growth rates, two daughter bacteria have a high probability of breaking apart. Thus the gut could produce IgA against all the bacteria it has encountered, but the most affected bacteria would be the fast-replicating ones, which are more likely to destabilize the microbiota. Linking the effect of the immune effectors (here the clustering) with a property directly relevant to the potential bacterial pathogenicity (here the replication rate) could avoid the need to make complex decisions about which bacteria to produce effectors against.

A Order of magnitude of the encounter time between two bacteria
The typical time to find one target of radius $a$ in a sphere of radius $b$ by diffusion is of the order of $b^3/(Da)$, so the typical time when there are $N$ bacteria in a volume $V$ is of the order of $V/(NDa)$. For bacteria, $a$ is in the micrometer range. Bacteria such as Salmonella or E. coli typically swim at 10 µm/s and change direction every second, which gives a diffusion coefficient of the order of $10^{-10}$ m$^2$/s [1,2,3]. (The peristaltic motions of the digesta are large-scale movements rather than local diffusion, so we assume they have a smaller effect on diffusion.) The mouse's cecum has a volume of the order of (1 cm)$^3$. In the experiments of [4], the smallest inoculum consists of $N = 10^5$ bacteria, which is already large compared to what could be a realistic number of pathogenic bacteria in food poisoning ($10^5$ is the typical number of Salmonella for food poisoning in humans [5], which are much larger than mice). With these numbers, the typical encounter time is of the order of $10^5$ s, i.e. 30 h, about 10 times longer than the typical digestion time in mice.
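The order-of-magnitude estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name is illustrative, and the target radius is set to exactly 1 µm as an assumption, since the text only states that $a$ is in the micrometer range.

```python
def encounter_time(volume_m3, n_bacteria, diffusion_m2_s, radius_m):
    """Order-of-magnitude encounter time V / (N * D * a), in seconds."""
    return volume_m3 / (n_bacteria * diffusion_m2_s * radius_m)


V = 1e-6   # cecum volume, (1 cm)^3 expressed in m^3
N = 1e5    # smallest inoculum used in [4]
D = 1e-10  # diffusion coefficient of swimming bacteria, m^2/s
a = 1e-6   # target radius in m (assumption: exactly 1 µm)

t_enc = encounter_time(V, N, D, a)
print(f"encounter time ~ {t_enc:.0f} s, i.e. about {t_enc / 3600:.0f} h")
```

With these inputs the result is of the order of $10^5$ s, which is the value quoted in the text.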
B Argument for high enchainment probability upon replication
When a bacterium replicates, the time for septation is of the order of a few minutes. We intuitively think that this time is much larger than the time $\tau_k$ required for bacteria to stick together when they randomly meet. The aim of this section is to check this intuition by giving an overestimate of $\tau_k$. If the diffusion coefficient is high enough, the time for bacteria to stick to each other is limited by the proportion of the time they spend in close vicinity, and by the rate $k$ at which bacteria stick to each other when they are in close vicinity, $k$ being the inverse of $\tau_k$. If the diffusion coefficient is smaller, then the time to first encounter also plays a role, but as we calculate an overestimate of $\tau_k$, we can neglect this scenario.
We use the data of figure 1k of [4] on non-dividing bacteria (so the only sticking is from random encounters). The majority of them are aggregated after $\tau_{exp}$ of up to 8 hours (from the inoculum ingestion to the sampling used for imaging) for a concentration of $10^7$ to $10^8$ bacteria. As we will see, this estimate of $\tau_k$ is proportional to $\tau_{exp}$ and $N$, so, to be conservative, since we calculate an overestimate of $\tau_k$, we take the highest concentration and the maximum experimental time, i.e. $N = 10^8$ bacteria in $V = 1$ cm$^3$ (cecum volume) and $\tau_{exp} = 8$ hours.
The typical size of a bacterium is a few micrometers; we thus take 3 µm as an overestimate of the maximum bacterial size. Thus, to be in close contact, two bacteria must be at most $a = 3$ µm apart. Let us assume that the volume of possible contact is $4\pi a^3/3$, which is also an overestimate, because only certain orientations allow bacteria to touch each other. Then the proportion of time spent in close contact is of the order of $4\pi a^3 N/(3V)$, and the typical time to stick to each other is $\tau_{exp} = 3V\tau_k/(4\pi a^3 N)$, so that $\tau_k = 4\pi a^3 N \tau_{exp}/(3V)$. Numerically, we obtain about 5 minutes as an overestimate of $\tau_k$.
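As a quick check of the numerical value, the expression $\tau_k = 4\pi a^3 N \tau_{exp}/(3V)$ can be evaluated directly. The sketch below uses the conservative inputs stated above; the function and variable names are illustrative.

```python
import math


def sticking_time_overestimate(tau_exp_s, n_bacteria, contact_radius_m, volume_m3):
    """Overestimate of tau_k: tau_exp times the fraction of time spent in contact."""
    contact_fraction = n_bacteria * 4 * math.pi * contact_radius_m**3 / (3 * volume_m3)
    return tau_exp_s * contact_fraction


tau_k = sticking_time_overestimate(
    tau_exp_s=8 * 3600,      # maximum experimental time, 8 hours
    n_bacteria=1e8,          # highest concentration considered
    contact_radius_m=3e-6,   # a = 3 µm
    volume_m3=1e-6,          # cecum volume, 1 cm^3
)
print(f"tau_k overestimate: {tau_k / 60:.1f} minutes")
```

The result is between 5 and 6 minutes, consistent with the value of about 5 minutes given in the text.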
Note that this is a large overestimate. Indeed, when bacteria get clumped to each other, their effective concentration decreases, so it takes longer for the last bacteria to meet others, and thus the time for most bacteria to be clumped is significantly larger than the inverse of the early clumping rate.
With all these highly conservative estimates, we find $\tau_k$ at the very most of the same order of magnitude as the septation time, and very likely much smaller. Hence the probability for bacteria to escape enchainment is small, which justifies that we take in general the limit of no escape.
C Model with bacterial escape ($\delta > 0$) and differential loss ($c \neq c'$)
Figure A shows how the growth rate depends on $r$ for different $\delta$, $\delta'$, $\delta''$, $c$ and $c'$.
Our numerical study of the system showed that there is a critical value $\delta_c$ below which the behavior is qualitatively similar to that of the system with $\delta = 0$, i.e. with a finite maximum of the growth rate of the free bacteria as a function of the replication rate, and above which the growth rate continues to increase with the replication rate. In fact, for $\delta > 0.5$, the growth rate necessarily continues to increase with the replication rate. Indeed, upon replication, one free bacterium becomes two daughter bacteria, an average of $2\delta$ of them staying free. Thus the net gain in free bacteria is $2\delta - 1$, and for $\delta > 0.5$ the growth rate of free bacteria is at minimum $r(2\delta - 1)$. Consequently, $\delta_c \leq 0.5$.
We detail here how to obtain the approximation for the chain length distribution. In the long time limit, the number of chains of length $i$ is of the order of $C p_i \exp(\lambda t)$, with $\lambda$ the largest eigenvalue. Equation (8) of main text simplifies to: Assuming that i is large, is required. Using this approximation for all $i$, the proportion of chains of length k is: Free bacteria are released at a rate 2rδ + 2α per chain. This rate is independent of the chain length. The direct contributions to the increase of free bacteria from chains of length $i$ compared to all the larger chains will be (with K = (1 − δ )r/(r + α)): If $r$ is small compared to $\alpha$ (replication rate $\ll$ breaking rate), then this ratio is small. Thus the larger chains are quickly negligible. Indeed, in this regime, chains typically dislocate before new replications, so there are few larger chains.

(5) of main text. Approximation (9) of the main text predicts that the distribution should depend only on δ , and not δ nor δ . In these figures where δ = 0 but δ (and in the left panel δ ) have non-zero values, we do observe that the distribution, in particular its slope, is closest to the result for

D Chain length distribution with a fixed replication time - approximation
Below, we present in detail the assumptions and calculations used to obtain the approximation of the chain length distribution when bacteria replicate every $\tau$. We define $n_i(t)$ as the number of chains of length $i$ at time $t$, with $t$ taken just before a replication. Assuming $i$ even,
$$n_i(t+\tau) = \sum_{j \geq 0} n_{i/2+j}(t)\, l(i+2j, 2j, \tau).$$
This is because just before a replication, there are $n_{i/2+j}(t)$ chains of length $i/2+j$. Then, just after the replication, these chains are of length $i+2j$. Time $t+\tau$ is just before the next replication. With probability $l(i+2j, 2j, \tau)$, these chains of length $i+2j$ have lost $2j$ bacteria on their edges and are now chains of length $i$. We sum over all the possible $j$. In the long time limit, $n_i(t) = C p_i \exp(\lambda t)$, with $\lambda$ the long-term growth rate, which is such that $\exp(\lambda\tau) = N$, with $N$ the largest eigenvalue of the matrix. Replacing $l(i+2j, 2j, \tau)$ by its expression as in equation (11) of the main text, the previous equation leads to: We compare the first term of the sum to the rest of the sum. The first term is $p_{i/2}\, e^{-\alpha\tau(i-1)}$, the rest of the sum is: We divide both by $e^{-\alpha\tau(i-1)}$. Then this is equivalent to comparing $p_{i/2}$ with: When $\alpha\tau$ is large, links typically break before the next replication, so there is little cluster formation, and it is thus expected that the chain length distribution decreases fast with $i$, so that for $j > 0$, $p_{i/2+j} \ll p_{i/2}$. When $\alpha\tau$ is small, replication is fast compared to the typical time for one link to break. However, for a chain of length $i/2$, $\tau$ has to be compared to $1/((i/2-1)\alpha)$, the typical time for the first of its $i/2-1$ links to break, thus we expect $n_i$ to decrease with $i$ for $i$ large enough, thus $p_{i/2+j}$ Thus in the case of $\alpha\tau$ large, $S$ is small relative to $p_{i/2}$ because $S$ is smaller than a few units times $B$, with $B$ much smaller than $p_{i/2}$. In the case of $\alpha\tau$ small, $S$ is small relative to $p_{i/2}$ because $S$ is of the order of $(\alpha\tau)^2 B$, with $B$ of the order of $p_{i/2}$. This justifies the assumption that only the first term of the sum matters:
$$N p_i \approx p_{i/2}\, e^{-\alpha\tau(i-1)}.$$
We assume $i = 2^k$, with $k$ an integer.
This is obviously true only for a very restricted set of $i$, but it still yields an approximation for how the distribution depends on $i$ for large $i$. Then, by recursion, If $i$ is large enough, $1 + 1/2 + 1/2^2 + ... + 1/2^{k-1} \approx 2$. Remembering that $k$ was defined by $i = 2^k$, the result is: When $\alpha\tau \gg 1$, links typically break before the next replication, so clustering has little impact on growth. Consequently, the growth will be close to its value in the absence of clustering, i.e. doubling every $\tau$, and thus in this limit $N = 2$: This rough approximation explains the core of the observed distribution.
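For reference, the recursion described in this section can be written out explicitly. The following is a sketch under stated assumptions: it starts from the first-term approximation $N p_i \approx p_{i/2}\, e^{-\alpha\tau(i-1)}$ with $i = 2^k$, and the prefactor $p_1$ and the way the final expression is arranged are one possible reading of the steps named in the text, not a quotation of the original equations.

```latex
\begin{align}
p_{2^k} &\approx \frac{p_{2^{k-1}}}{N}\, e^{-\alpha\tau(2^k-1)} \\
        &\approx \frac{p_1}{N^k}\,
           \exp\Big(-\alpha\tau \sum_{m=1}^{k} (2^m - 1)\Big)
         = \frac{p_1}{N^k}\,
           \exp\Big(-\alpha\tau \Big[\, i\Big(1+\tfrac{1}{2}+\dots+\tfrac{1}{2^{k-1}}\Big) - k \Big]\Big), \\
p_i     &\approx p_1\, N^{-\log_2 i}\,
           \exp\big(-\alpha\tau\,(2i - \log_2 i)\big)
           \qquad (i \text{ large}), \\
p_i     &\approx \frac{p_1}{i}\,
           \exp\big(-\alpha\tau\,(2i - \log_2 i)\big)
           \qquad (N = 2).
\end{align}
```

The second line uses $\sum_{m=1}^{k} 2^m = 2^k\,(1 + 1/2 + \dots + 1/2^{k-1})$, and the third replaces the bracketed geometric sum by its large-$i$ limit of 2.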
E Model with force-dependent breaking rate

E.1 Model and equations
A link between bacteria may consist of several sIgA bonds, and the number of bound sIgA may not be exactly the same from one inter-bacteria link to the next. However, as sIgA are likely well mixed and numerous on each bacterium, and as bacteria are similar to each other, let us assume that link heterogeneity is negligible. The links could break if some process degraded the sIgA, but sIgA are thought to be very stable [6]. Another possible explanation for link breaking is that the antigen gets extracted from the bacterial membrane, at a rate which may depend exponentially on the force applied to the link [7,8]. If the forces are produced by the bacteria themselves (such as by flagella rotation), they are likely to fluctuate on timescales which are short compared to the time between two bacterial replications, and their distribution is likely to be the same for all links, so it would be appropriate to model their effect as a fixed breaking rate, the same for all the links. Another force is the hydrodynamic force exerted by the flow on the bacterial chain. The flow in the digestive system is complex and not precisely characterized. Longer bacterial chains may also bend, and their shape may have complex interactions with the flow. Here, we present the simplest model taking into account the effect of the forces exerted by the flow on the link breaking rate. We aim to capture the main plausible effects of the flow when the link breaking rate is force-dependent.
Figure F: Schematic of the forces applied to the chain. A We assume a straight chain of beads with no hydrodynamic interactions between them. B We subtract the average force to place ourselves in the reference frame of the center of the chain, as the total force translates the whole chain and does not affect the forces on the links. We focus on the forces parallel to the chain, which affect the tension in the links. C Sum of the forces on each bead, for chains with even and odd numbers of beads.
Let us take a linear chain of $N$ bacteria, each of length $B$. Let us approximate it by a rigid chain of beads linked by straight rods of length $B$ (panel A of figure F). Let us assume that the rods are infinitely thin, so that they do not interact with the flow, and let us neglect the hydrodynamic interactions between the beads, so that each bead is subject to the same frictional force for a given fluid velocity. Given that the typical Reynolds numbers in the digestive tract are relatively low [9], the viscous force on each bead is proportional to the flow velocity.
Then, let us assume that the velocity gradient in the fluid is constant around the chain. The rationale for this approximation is that the typical scales of the flow are of the order of the centimeter to millimeter (for instance, in a mouse, the typical size of the cecum is in the cm range), much larger than typical bacterial chains (the length of one bacterium is about 2 µm, so even chains of dozens of bacteria remain small compared to the typical flow scale). We thus take a linear approximation of the velocity field in the vicinity of a bacterial chain.
Then, the sum of the forces on the whole chain is equal to $mN$ multiplied by the acceleration of the center of mass of the chain, with $m$ the mass of each bead. When all the beads move together, there is no force on the links; thus let us take the reference frame of the center of the chain, and subtract the mean force from each bead (panel B of figure F). Then, there remain forces perpendicular to the axis of the chain, and forces parallel to it. The forces perpendicular to the axis of the chain make it rotate and, as they are perpendicular, have no effect on the tension in the rods. Let us thus consider only the forces parallel to the chain.
In the example portrayed here, the chain is elongated. The reverse could happen, but in this case the chain would likely buckle, and the force applied to the links would be small. The flow varies considerably in time, due to peristaltic motions [9,10]. There would be moments with no force and little breaking, and moments with larger forces and more breaking. The flow due to peristaltic motions changes on time scales that are short compared to the typical bacterial division time; thus we assume that periods of low and high breaking rates are equivalent to an average effective breaking rate. Let us then consider the case of elongation only, as portrayed here.
As we assume here that the velocity gradient is constant, the relative fluid velocity grows linearly with the distance from the center of mass of the chain. Then the force on each bead is equal to $F_0$ multiplied by the distance to the center divided by $B$. We assume, following [7,8], that the breaking rate depends on the force. Thus, we define $\alpha$ and $\beta$ such that the breaking rate of a link is $\alpha\exp(\beta F/F_0)$ if a force $F$ is applied to the link. In the limit of small force, the breaking rate is $\alpha$, the same for all links, as in the base model. $\beta$ is a constant characterizing how strongly the stability of the link depends on the force.
We can write the force on each bead (panel C of figure F). Because the chain is rigid and straight, the sum of the forces on each bead has to be zero. The tension on the outermost link is simply equal to the flow force on the outermost bead, i.e. $F_0$ multiplied by its distance to the center divided by $B$, i.e. $(N-1)/2$ (both for chains of odd and even numbers of beads). On the next link, the tension has to compensate for the flow force on the second bead, plus the tension applied by the outermost link. Thus the tension on this link is $F_0((N-1)/2 + (N-1)/2 - 1)$, and so forth (this is analogous to the modelling of the breaking of polymer chains in elongational flows, as in [11]).
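The bead-by-bead force balance described above can be checked numerically. The sketch below accumulates the flow forces on the beads from one end of the chain, in units of $F_0$, and compares the result with the closed form $j(N-j)/2$ for the $j$th link, which follows from summing $(N-1)/2 - m$ for $m = 0, \dots, j-1$. The function names are illustrative, and the closed form is derived here from the description in the text rather than copied from the original equations.

```python
def link_tensions(n_beads):
    """Tension (in units of F0) on each link of a straight rigid chain.

    Bead k (k = 0..n-1) sits at position k - (n-1)/2 (in units of B) and
    feels a flow force proportional to that position. The tension on link j
    (j = 1..n-1, counted from one end) balances the sum of the flow forces
    on the j beads located on the outer side of that link.
    """
    positions = [k - (n_beads - 1) / 2 for k in range(n_beads)]
    tensions = []
    outward_pull = 0.0
    for j in range(1, n_beads):
        # Force on bead j-1, counted positive when directed away from the center.
        outward_pull += -positions[j - 1]
        tensions.append(outward_pull)
    return tensions


def closed_form_tension(n_beads, j):
    """Closed form j (N - j) / 2 for the j-th link from the end."""
    return j * (n_beads - j) / 2


print(link_tensions(4))
print([closed_form_tension(4, j) for j in range(1, 4)])
```

The tension is largest on the central link, where it reaches $N^2/8$ for even $N$, so the central links are the most likely to break when the breaking rate is force-dependent.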
For $N$ even, the force on the $j$th link starting from the outermost link will be:
$$F_j = F_0 \sum_{m=0}^{j-1}\left(\frac{N-1}{2} - m\right).$$
Using $\sum_{i=1}^{n} i = n(n+1)/2$, it can be rewritten as:
$$F_j = F_0\, \frac{j(N-j)}{2}.$$
There are two links $j$th away from the extremities, for $j$ from 1 to $N/2-1$, and one central link, for which $j = N/2$. The breaking rate of a given link is $\alpha\exp(\beta F/F_0)$ with $F$ the total force applied to the link. Then the total breaking rate of one chain of length $N$ even is:
$$\alpha \exp\left(\frac{\beta N^2}{8}\right) X, \qquad X = 1 + 2\sum_{j=1}^{N/2-1} \exp\left(-\frac{\beta j^2}{2}\right), \qquad \text{(S1)}$$
where $j$ in the sum counts the links starting from the central one. An outermost link of a chain of length $N+1$ (with $N$ even, $N+1$ is odd) breaks at rate $\alpha\exp(\beta N/2)$. There are two such links for each chain. This and equation (S1) lead to equation (30) of the main text:

For $N$ odd, the force on the $j$th link starting from the outermost link will be:
$$F_j = F_0 \sum_{m=0}^{j-1}\left(\frac{N-1}{2} - m\right).$$
Similarly to the $N$ even case, we can rewrite:
$$F_j = F_0\, \frac{j(N-j)}{2}.$$
Because of the two sides, there are two links $j$ for each chain, for $j$ from 1 to $(N-1)/2$. The breaking rate of a given link is $\alpha\exp(\beta F/F_0)$ with $F$ the total force applied to the link. Then the total breaking rate of one chain of length $N$ odd is:
$$\alpha \exp\left(\frac{\beta N^2}{8}\right) X, \qquad X = 2\sum_{j=1}^{(N-1)/2} \exp\left(-\frac{\beta (j-1/2)^2}{2}\right), \qquad \text{(S2)}$$
where $j$ in the sum counts the links starting from the center. An outermost link of a chain of length $N+1$ (with $N$ odd, $N+1$ is even) breaks at rate $\alpha\exp(\beta N/2)$. There are two such links for each chain. Then, this and equation (S2) lead to equation (31) of the main text for the evolution in time of the mean number of chains of odd length $i$:

E.2 Additional figure for the force-dependent model: replication rate maximizing the growth rate as a function of β
Figure G shows that the replication rate maximizing the growth rate of free bacteria increases exponentially with $\beta$, which represents the strength of the dependence of the breaking rate on the force applied to the link.

We start from equations (30) and (31), and assume that for $t$ long enough, $n_i \approx C p_i \exp(\lambda t)$ (with $\lambda$ the largest eigenvalue). Then, Let us now determine which terms dominate in this expression.
For $i$ large enough, $\lambda \ll ri$. Thus $\lambda p_i$ is negligible relative to $r i p_i$. For both $i$ even and odd, $X$ is a converging sum which tends to a finite number when $i$ increases. Let us denote its limit $Y = 1 + 2\sum_{j=1}^{\infty} \exp(-\beta j^2/2) = \theta_3(0, \exp(-\beta/2))$ in the even case, and $Z = 2\sum_{j=1}^{\infty} \exp(-\beta (j-1/2)^2/2) = \theta_2(0, \exp(-\beta/2))$ in the odd case, with $\theta_i$ the Jacobi theta functions. Thus, because $\beta$ is positive, for $i$ large enough, $ri \ll \alpha\exp(\beta i^2/8) X$. The remaining main terms in equation (S3) are:
$$-\alpha p_i \exp(\beta i^2/8)\, X, \qquad r(i-1)\, p_{i-1}, \qquad 2\alpha p_{i+1} \exp(\beta i/2).$$
The first term is negative, the two others are positive. Then we have to determine which of $r(i-1)p_{i-1}$ and $2\alpha p_{i+1}\exp(\beta i/2)$ dominates. If $2\alpha p_{i+1}\exp(\beta i/2)$ dominates, then $\alpha p_i \exp(\beta i^2/8) X \approx 2\alpha p_{i+1}\exp(\beta i/2)$, thus $p_{i+1}/p_i \sim \exp(\beta i(i/8 - 1/2)) X$, which for $i$ large enough means that the longer the chains, the more numerous they are, which would diverge and does not make sense in this system. Thus
$$\alpha p_i \exp(\beta i^2/8)\, X \approx r(i-1)\, p_{i-1}. \qquad \text{(S4)}$$
This approximation is valid for large chain sizes. We assume that it is valid for any chain length. As this expression is small and decreases quickly with increasing $i$, $p_1$ will be close to 1. Then, as: and using the known expression for the sum of the squares $\sum_{j=1}^{n} j^2 = (n + 3n^2 + 2n^3)/6$ and expression (S4): These two equations can be combined, and ultimately lead to:
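A minimal numerical sketch of this large-$i$ approximation is given below. It assumes the recursion $p_i \approx p_{i-1}\, r(i-1) \exp(-\beta i^2/8)/(\alpha X)$ applied to all $i$, with $X$ replaced by its limits $Y$ (even $i$) and $Z$ (odd $i$), and with the distribution normalized at the end. The function names, the truncation of the theta sums at 200 terms and the cut-off `i_max` are illustrative choices; this is not the closed-form expression of the original derivation.

```python
import math


def theta_limits(beta, n_terms=200):
    """Truncated sums for Y = theta_3(0, e^{-beta/2}) and Z = theta_2(0, e^{-beta/2})."""
    y = 1 + 2 * sum(math.exp(-beta * j**2 / 2) for j in range(1, n_terms + 1))
    z = 2 * sum(math.exp(-beta * (j - 0.5) ** 2 / 2) for j in range(1, n_terms + 1))
    return y, z


def approx_distribution(r_over_alpha, beta, i_max=30):
    """Normalized p_1..p_imax from p_i = p_{i-1} * r (i-1) exp(-beta i^2 / 8) / (alpha X)."""
    y, z = theta_limits(beta)
    p = [1.0]  # unnormalized p_1
    for i in range(2, i_max + 1):
        x = y if i % 2 == 0 else z  # limit of X for even / odd chain length
        ratio = r_over_alpha * (i - 1) * math.exp(-beta * i**2 / 8) / x
        p.append(p[-1] * ratio)
    total = sum(p)
    return [value / total for value in p]


dist = approx_distribution(r_over_alpha=1.0, beta=1.0)
y_lim, z_lim = theta_limits(1.0)
print([round(v, 4) for v in dist[:5]])
```

For $\beta = 1$ both limits are numerically very close to $\sqrt{2\pi}$, so the parity of $i$ has little influence on the ratio $p_i/p_{i-1}$ in this example.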

E.4 Additional figure for the force-dependent model: chain length distribution for other values of r/α
In panel 2J of the main text, we represented the distribution of chain lengths in the model with force-dependent link breaking rate for $r/\alpha = 1$. In figure H we represent the distributions for different values of $r/\alpha$. Overall, the shapes are similar, and the smaller $r/\alpha$ is (as well as the larger $\beta$ is), the better the analytical approximation works.

We perform a new analysis on images that were produced for [4]. We briefly describe below the experiments from which the images were produced, and describe our analysis. Mice, which were previously vaccinated with a peracetic-acid inactivated S. Typhimurium strain (PA-S.Tm), were pretreated with 0.8 g/kg ampicillin sodium salt in sterile PBS. 24 h later, mice received $10^5$ CFU of a 1:1 mix of mCherry- (pFPV25.1) and GFP- (pM965) expressing attenuated S.Tm M2702. For imaging, cecum content was diluted gently 1:10 w/v in sterile PBS containing 6 µg/ml chloramphenicol to prevent growth during imaging. 200 µl of the suspension were transferred to an 8-well Nunc Lab-Tek Chambered Coverglass (Thermo Scientific) and imaged at 100x using the Zeiss Axiovert 200m microscope. To determine the distribution of bacteria in aggregates, n=25 high power fields per mouse were randomly selected and imaged for mCherry and GFP fluorescence. For some mice, sequential sampling was done: these mice were terminally anaesthetised and artificially respirated, and cecum content was sampled by tying off part of the cecum each hour for 3 h. More details about the experimental procedures can be found in [4].
We analyzed all the images for the early data points (4 and 5 hours) of experiments starting from a low inoculum ($10^5$), to minimize the clustering from random encounters. Only the linear chains were counted. Images are for the red and green fluorescence, so complex clusters with two colors were not counted. The data were analyzed manually. The images are available as supplementary materials:
• File S1 images4h.zip contains the images for 3 of the mice only sampled at 4h.
• File S2 images4h_others.zip contains the images for the other 4 mice only sampled at 4h.
• File S3 images5h.zip contains the images of the mice only sampled at 5h.
• File S4 imagesseq4h.zip contains the images at 4h of mice sampled sequentially.
• File S5 imagesseq5h.zip contains the images at 5h of mice sampled sequentially.

G.2 Results
For linear chains, we obtained the length distribution detailed in table 1 and shown in figure 4 of the main text. Given the bumpy shape of the experimental distribution, we chose to fit the data with the fixed replication time model for figure 4 of the main text. In this model, the only adjustable parameter is $r_{eff}/\alpha$. For a given $r_{eff}/\alpha$, we obtain the theoretical chain length distribution $p_i$ by numerical resolution

chain length   4h PI o     4h PI s     5h PI o     5h PI s     total
               (7 mice)    (2 mice)    (4 mice)    (2 mice)
2              21          30          17          38          106
3              22          4           9           5           40
4              51          9           25          9           94
5              7           0           1           3           11
6              5           3           3           4           15
7

In practice, here, there are no clusters observed longer than $i_{max} = 14$. We compute numerically this log-likelihood as a function of $r_{eff}$ ($p_i$ depends on $r_{eff}$). The value of $r_{eff}$ maximizing the log-likelihood is 4.1.
A 95% confidence interval can be approximated by the interval of $r_{eff}$ for which the difference between the log-likelihood and its maximum is less than 1.92 [12]. This results in a confidence interval of $3.7 \leq r_{eff} \leq 4.6$.
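The likelihood-drop rule used for this interval can be sketched in Python as follows. Everything model-specific here is a placeholder: `toy_model` is a hypothetical one-parameter stand-in (a truncated geometric distribution), not the fixed replication time model, the counts are synthetic, and the multinomial form of the log-likelihood is an assumption. Only the threshold of 1.92, which is half the 95% quantile of a chi-squared distribution with one degree of freedom, is taken from the text.

```python
import math


def toy_model(theta, lengths):
    """Hypothetical stand-in model: p_i proportional to q**i with q = theta / (1 + theta)."""
    q = theta / (1.0 + theta)
    weights = [q**i for i in lengths]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(theta, lengths, counts, model=toy_model):
    """Multinomial log-likelihood (up to a constant) of the observed counts."""
    probabilities = model(theta, lengths)
    return sum(n * math.log(p) for n, p in zip(counts, probabilities) if n > 0)


def likelihood_interval(lengths, counts, grid, drop=1.92):
    """Grid-search maximum and the range of theta within `drop` of the maximum."""
    values = [log_likelihood(t, lengths, counts) for t in grid]
    best_value = max(values)
    best_theta = grid[values.index(best_value)]
    inside = [t for t, v in zip(grid, values) if best_value - v < drop]
    return best_theta, min(inside), max(inside)


lengths = list(range(2, 8))
counts = [100, 50, 25, 12, 6, 3]             # synthetic counts, for illustration only
grid = [0.2 + 0.01 * k for k in range(400)]  # candidate parameter values
best, lower, upper = likelihood_interval(lengths, counts, grid)
print(f"maximum at {best:.2f}, approximate 95% interval [{lower:.2f}, {upper:.2f}]")
```

To use this with the actual analysis, `toy_model` would be replaced by a function returning the numerically computed $p_i$ for a given $r_{eff}/\alpha$.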
To quantify our impression that fewer long chains are observed than expected, we performed the following calculations. Taking $r_{eff}/\alpha = 4.1$ and $N_{exp} = 313$, the expected number of chains of length 15 or longer is 3.7, whereas none is observed, which for a multinomial distribution has a probability of 0.024 of occurring. This probability seems low. Either this is a low-probability event that nevertheless happened (and if we look at slightly shorter chains, the expected number of chains of length 9 and longer is 11.7, and 9 are actually observed, which is relatively close); or there is some process limiting the number of long chains. There are two main possibilities for the number of long chains to be limited: there could be an experimental bias limiting the observation of long chains (see discussion below); or there could be some force-dependence of the breaking rates, which would effectively act as a cut-off for the chain length (see figure 3 of main text), as in this case breaking rates increase considerably with chain length.
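The probability quoted above can be recovered from the two numbers given in the text. In a multinomial sample of $N_{exp}$ chains, the probability that none falls in a category of total probability $p$ is $(1-p)^{N_{exp}}$, with $p$ equal to the expected count divided by $N_{exp}$. The sketch below assumes this is how the value was obtained; the function name is illustrative.

```python
def prob_none_observed(expected_count, n_total):
    """Probability that a category with the given expected count is empty
    in a multinomial sample of n_total items."""
    p_category = expected_count / n_total
    return (1 - p_category) ** n_total


p_zero = prob_none_observed(expected_count=3.7, n_total=313)
print(f"probability of observing no chain of length >= 15: {p_zero:.3f}")
```

The result rounds to 0.024, in agreement with the value stated in the text.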

G.3 Discussion
The data may be biased. The mass of one bacterium is about one pg, and its density is about 10% more than the density of water [13,14]; the thermal energy at ambient temperature is of the order of $4 \times 10^{-21}$ J, and the gravitational acceleration $g$ is of the order of 10 m/s$^2$; thus thermal fluctuations will lift an individual bacterium typically 4 µm above the bottom. Thermal fluctuations will have two effects:
• The average height of the center of gravity of chains will decrease with their length. This is confocal microscopy, whose typical optical section is less than 1 µm, focused close to the cover slip. This may bias the distribution by missing smaller chains.
• Longer chains are not rod-like; their shape fluctuates. It is apparent on the microscopy images that parts of long chains may get out of focus. The longer the chain, the less likely it is to be entirely in focus, and thus chains will look smaller than they are.
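The thermal height quoted at the start of this discussion follows from balancing the thermal energy against the buoyant weight, $h \sim k_B T/(\Delta m\, g)$. The sketch below assumes that the buoyant mass $\Delta m$ is 10% of the bacterial mass, as suggested by the density excess given in the text; the function name and this rounding are illustrative.

```python
def thermal_height(thermal_energy_j, mass_kg, excess_density_fraction, gravity_m_s2):
    """Typical height k_B T / (buoyant mass * g) reached by thermal fluctuations."""
    buoyant_mass = excess_density_fraction * mass_kg
    return thermal_energy_j / (buoyant_mass * gravity_m_s2)


h = thermal_height(
    thermal_energy_j=4e-21,        # thermal energy at ambient temperature
    mass_kg=1e-15,                 # one bacterium, about 1 pg
    excess_density_fraction=0.1,   # assumption: buoyant mass is 10% of the mass
    gravity_m_s2=10.0,
)
print(f"typical thermal height: {h * 1e6:.1f} µm")
```

Since the buoyant mass of a chain grows with its length, the same estimate gives a height inversely proportional to the number of bacteria in the chain, which is the origin of the first bias listed above.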
We focus on the chain length distribution because this quantity is more easily accessible by experimental measurements, at the end of an experiment. Comparing models and experiments makes it possible to check whether the data are compatible with a process of growth and breaking of clusters, and to determine which specific model is closest to the data. However, some models cannot be distinguished, no matter how much data is available for the chain length distribution. For example, in the model with bacterial escape and in the model where chains can remain independent after breaking, there are two parameters to fit ($r/\alpha$ and $\delta$, or $r/\alpha$ and $q$). It is likely that fitting would mainly select a value for $r/\alpha$, since the distribution does not depend much on the second parameter in either case. These models could not be distinguished from the base model. On the other hand, models with different distribution shapes, such as the force-dependent model or the fixed division time model, could be distinguished, provided that the bias can be overcome and that more data can be collected. We could fit the fixed replication time model to the data, and this strengthened our hypothesis that the chains are generated by a process of enchained growth and link breaking. However, somewhat fewer long chains are observed than expected (especially in the range of lengths 14 to 16). One possibility is that the breaking rate is force-dependent. If we had 10 times more unbiased data, we could determine whether there really is a deficit of longer chains. If there is indeed such a deficit, then we should combine the model of force-dependent breaking rates with the fixed replication time model, to be able to make quantitative comparisons. To be effective, this comparison would likely require more data, as there would be two free parameters, $r/\alpha$ and $\beta$.
If there is no deficit of longer chains in the range up to 16, then the simple model with fixed replication time predicts that we would get access to the distribution up to length 24 (and be at the limit for lengths 28 and 32) with 100 times more data than in current experiments (see figure J). Thus overall we would Increasing the amount of data would not necessarily require sacrificing more mice, but merely taking more images for each cecum content. The challenge would be to do so with no bias, and under conditions standardized enough to automate the chain detection and length count.
It would also be very useful to have ways to estimate the breaking rate in independent experiments, for instance by injecting chains of non-replicating bacteria of controlled length (without breaking them or perturbing the system), and measuring how the length distribution changes over time. Then, as the replication rate can be estimated by other measures (dilution of non-replicating plasmids), we could get an estimate of the ratio of the replication rate to the breaking rate, which would considerably constrain the fitting of the chain length distribution, and thus strengthen the conclusions.